
Enhanced X-NAND flash memory architecture promises faster, denser memories
by Dave Bursky on 08-25-2022 at 10:00 am

Figure: X-NAND timing vs. SLC

Although the high-performance X-NAND memory cell and architecture were first introduced in 2020 by Neo Semiconductor, designers at Neo haven’t rested on that accomplishment and recently updated the cell and the architecture in a second-generation implementation to achieve 20X the performance of conventional quad-level-cell (QLC) 3D NAND memories. The Gen2 X-NAND, unveiled at this month’s Flash Memory Summit, achieves that improvement through an enhanced architecture that allows the 3D NAND flash programming (i.e., data writes) to occur in parallel using fewer planes. Able to deliver a 2X performance improvement over the first-generation X-NAND technology, the Gen2 architecture remains compatible with current manufacturing technologies and processes, thus giving adopters of the technology a competitive advantage over existing 3D NAND flash memory products.

According to Andy Hsu, Neo’s CEO, the Gen2 technology incorporates zero-impact architectural and design changes that do not increase manufacturing costs while offering improvements in throughput and reductions in latency. The X-NAND technology can be implemented with all flash cell structures – SLC, MLC, TLC, QLC, and PLC – while delivering higher performance at comparable manufacturing costs. Neo is currently looking to partner with memory manufacturers who will license the X-NAND technology and then design and manufacture the high-performance memory chips.

NAND-flash memories based on the QLC cell and 3D stacking have been widely adopted in many applications thanks to the high storage capacities possible and their relatively low cost per bit. The one drawback is their slow write speed. The X-NAND technology overcomes that limitation and improves the QLC NAND read and write speeds threefold and the sequential read/write throughput by 15 to 30X. Further improvements in the Gen2 technology let memories deliver SLC-like (single-level cell) performance but with the higher capacity and lower cost of QLC implementations. Transfer rates of up to 3.2 Gbytes/s are possible with the Gen2 X-NAND technology.

In its Gen1 design, Neo employs a unique SLC/QLC parallel-programming scheme that allows data to be programmed to QLC pages at SLC speed across the entire memory capacity. This also solves conventional NAND’s SLC cache-full problem (see the figure). When a traditional NAND’s SLC cache fills, data is written directly to QLC cells and the write speed drops to less than 12% of the cached rate. X-NAND avoids this problem, explains Hsu, making it an excellent fit for write-heavy systems such as data centers and NAS appliances.
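
To make the cache-full effect concrete, below is a back-of-the-envelope Python sketch of sustained write time; the cache size and transfer speeds are invented round numbers for illustration, not Neo’s published figures.

```python
# Illustrative model of the "SLC cache-full" slowdown described above.
# All speeds and cache sizes are invented round numbers, not Neo's figures.

def sustained_write_time(total_gb, cache_gb, slc_speed, qlc_speed):
    """Time (s) to write total_gb when the first cache_gb goes at SLC speed
    and the remainder falls back to the much slower direct-to-QLC speed."""
    fast = min(total_gb, cache_gb)
    slow = max(total_gb - cache_gb, 0.0)
    return fast / slc_speed + slow / qlc_speed

# Conventional drive: e.g. 2 GB/s into the SLC cache, ~0.2 GB/s once it is full.
conventional = sustained_write_time(100, cache_gb=20, slc_speed=2.0, qlc_speed=0.2)
# X-NAND claim: QLC pages programmed at SLC speed across the whole capacity.
x_nand = sustained_write_time(100, cache_gb=100, slc_speed=2.0, qlc_speed=2.0)
print(conventional, x_nand)   # 410 s vs 50 s for this made-up 100 GB workload
```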

Furthermore, the X-NAND’s 16-64 plane architecture provides parallelism at the chip level. Compared to a conventional NAND that uses 2 to 4 planes, one X-NAND chip can provide the same parallelism as 4 to 16 conventional NAND chips, enabling small-form-factor packaging such as M.2 and eMMC memory modules. Additionally, X-NAND’s bit-line capacitance is only 1/4 to 1/16 that of a conventional NAND, so the bit line’s power consumption for read and write operations can be reduced to about 1/4 to 1/16 of conventional levels (a reduction of roughly 75% to 94%). This significantly increases battery life for smartphones, tablets, and IoT devices.
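
The power claim follows from the usual scaling of dynamic switching energy with capacitance (E ≈ C·V² per bit-line charge/discharge). The short sketch below works through that arithmetic with assumed, illustrative values.

```python
# Back-of-the-envelope: dynamic bit-line energy scales with capacitance,
# so cutting C to 1/4 or 1/16 of the conventional value cuts bit-line
# switching energy by ~75% to ~94%. The absolute numbers are arbitrary.

def bitline_energy(c_farads, v_volts):
    return c_farads * v_volts**2        # energy per full charge/discharge

conventional = bitline_energy(2e-12, 2.5)        # assume a 2 pF bit line at 2.5 V
for divider in (4, 16):
    reduced = bitline_energy(2e-12 / divider, 2.5)
    print(f"1/{divider} capacitance -> {100 * (1 - reduced / conventional):.1f}% less energy")
```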

www.neosemic.com

Also read:

WEBINAR: A Revolution in Prototyping and Emulation

ARC Processor Summit 2022 Your embedded edge starts here!

WEBINAR: Design and Verify State-of-the-Art RFICs using Synopsys / Ansys Custom Design Flow


Getting Ahead with Semiconductor Manufacturing Equipment and Related Plasma Reactors
by Kalar Rajendiran on 08-25-2022 at 6:00 am

Advanced semiconductor fabrication technology is what makes it possible to pack more and more transistors into each square millimeter of a wafer. The rapidly increasing demand for advanced-process-based chips has created huge market opportunities for semiconductor manufacturing equipment vendors. According to SEMI, worldwide sales of semiconductor manufacturing equipment in 2021 rose 44% to an all-time record of $102.6 billion. While the opportunities are big, delivering cost-effective equipment optimized for mass production involves overcoming a number of challenges.

In June 2022, SemiWiki published an article titled “Leveraging Simulation to Accelerate the Design of Plasma Reactors for Semiconductor Etching Processes.” That article was a brief introduction, leading up to a webinar on simulation techniques to accelerate design of plasma reactors for semiconductor etching processes. The webinar was presented by Richard Cousin of Dassault Systèmes and is now available on-demand for viewing. This article covers some salient points from that webinar.

Benefits of Simulation When Designing Plasma Reactors

Simulation helps engineers understand how a device will behave before it is actually manufactured. Whether the device being designed is a chip or a plasma reactor, the savings in time and money make a good simulation tool a worthwhile investment.

In the case of plasma reactors, the design has to balance many parameters to accommodate various requirements and plasma characteristics, including the density profiles, the ionization rate, the effects of pressure and gas type, and the influence of the geometry in preventing damage. Simulation can, for example, help designers experiment with different numerical and physical settings to improve the uniformity of ion-density profiles, perform thermal-coupling analysis aimed at eliminating vulnerabilities to reactor damage, and explore different gases, pressures and power levels.

The benefits of simulation fall into three categories.

  • Predicting and explaining experimental results, particularly when no diagnostics are available in advance
  • Reducing cost of development by optimizing performance and reliability before actually manufacturing the plasma reactor (or modifying an existing reactor)
  • Accelerating device validation and decision-making on the right manufacturing processes

Focus of Dassault Systèmes’ SIMULIA Tools

Depending on the pressure level, the dry etching process is either a physical process or a chemical process. Refer to Figure 1 below for the dry-etching classification spectrum and the range of that spectrum on which the SIMULIA tools focus.

Figure 1: Dry Etching Process Classification as a function of the Pressure level

The anisotropic etching process is well suited for nanoscale features. More physical parameters can also be controlled to characterize the plasma, such as the input power and the pressure of the neutral gas, which controls the plasma density.

Simulation Techniques Deployed by SIMULIA Tools

The SIMULIA tools help analyze not only the steady state after the plasma has formed but also the transient behavior of how and when it forms. SIMULIA can take either a microscopic or a macroscopic approach to the simulations.

Microscopic Approach

Under the microscopic approach, the tools use a time-domain kinetic method with a Poisson-based Particle-In-Cell (PIC) code to analyze space-charge interactions. Several particle interactions are taken into account in a global Monte-Carlo Collision (MCC) model: ionization of the neutral gas, its excitation, and elastic collisions are considered simultaneously to compute the plasma kinetics.
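
For readers unfamiliar with the technique, here is a heavily simplified sketch of one particle-in-cell cycle with a crude Monte-Carlo collision step. It is our own illustration of the general PIC/MCC method in Python, not the SIMULIA implementation; physical normalization is omitted and all constants, grid sizes and the collision model are arbitrary assumptions.

```python
import numpy as np

# Schematic 1D electrostatic PIC loop with a toy Monte-Carlo collision step.
# Purely illustrative: units are not physically consistent.
np.random.seed(0)
L, Ng, Np = 1.0, 64, 5000     # periodic domain length, grid cells, macro-particles
dt = 1e-10                    # time step (s)
qm = -1.76e11                 # electron charge/mass ratio (C/kg)
nu = 1e7                      # assumed total collision frequency (1/s)

dx = L / Ng
x = np.random.rand(Np) * L            # particle positions
v = np.random.randn(Np) * 1e5         # particle velocities

for step in range(100):
    # 1) deposit charge density on the grid (nearest-grid-point weighting)
    idx = (x / dx).astype(int) % Ng
    rho = np.bincount(idx, minlength=Ng).astype(float)
    rho -= rho.mean()                 # neutralizing ion background

    # 2) solve Poisson's equation spectrally (periodic boundaries, eps0 = 1)
    k = 2 * np.pi * np.fft.fftfreq(Ng, d=dx)
    k[0] = 1.0                        # avoid divide-by-zero for the mean mode
    phi_hat = np.fft.fft(rho) / k**2
    phi_hat[0] = 0.0
    E = np.real(np.fft.ifft(-1j * k * phi_hat))

    # 3) gather the field at particle positions and push (leapfrog)
    v += qm * E[idx] * dt
    x = (x + v * dt) % L

    # 4) Monte-Carlo collisions: scatter with probability 1 - exp(-nu*dt);
    #    here a "collision" simply randomizes the velocity direction.
    hit = np.random.rand(Np) < (1.0 - np.exp(-nu * dt))
    v[hit] *= np.random.choice([-1.0, 1.0], hit.sum())

print("mean kinetic-energy proxy:", np.mean(v**2))
```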

Macroscopic Approach

Under the macroscopic approach, the SIMULIA tools treat the plasma as a bulk medium for RF plasma analysis and matching-network optimization. Both linear and non-linear Drude dispersion models are available as options, an approach well suited to designing capacitively coupled plasma (CCP) reactors. With the application of a bias magnetic field, an electric gyrotropic dispersion model is also available, for example when designing inductively coupled plasma (ICP) reactors.
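
As a point of reference, the linear Drude model describes such a bulk plasma through a complex relative permittivity. The snippet below evaluates that textbook formula for assumed, illustrative discharge parameters; it is not SIMULIA code.

```python
import numpy as np

# Textbook (linear) Drude permittivity of a collisional electron plasma.
e, me, eps0 = 1.602e-19, 9.109e-31, 8.854e-12   # charge, mass, vacuum permittivity

def drude_permittivity(freq_hz, n_e, nu_c):
    """Complex relative permittivity.

    freq_hz : drive frequency (e.g. 13.56e6 for an RF reactor)
    n_e     : electron density (1/m^3), assumed value
    nu_c    : electron-neutral collision frequency (1/s), assumed value
    """
    w = 2 * np.pi * freq_hz
    wp2 = n_e * e**2 / (eps0 * me)          # plasma frequency squared
    return 1.0 - wp2 / (w * (w + 1j * nu_c))

# Example with typical low-pressure argon-discharge orders of magnitude (assumed).
print(drude_permittivity(13.56e6, n_e=1e16, nu_c=1e7))
```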

SIMULIA Tool Use Cases

The SIMULIA tool can be used to simulate various types of plasma reactors. The following are three examples that were presented.

The DC Magnetron Sputtering Reactor Example

This design setup is as follows:

Pressure range: 1 to 5 mTorr

Target voltage: -400V

Target materials: Al, Cu or Ti

Target thickness: 1 mm to 3 mm

Goal: Estimating the target erosion profile for understanding long term sputtering efficiency

Figure 2 below shows the close agreement between the simulated and measured results for the target erosion profile prediction.

Figure 2: Very good agreement between the predicted results and the experimental data for the Target erosion profile in a DC Magnetron Sputtering Device example

GEC CCP Reactor (Capacitive Coupled Plasma) Example

The design setup is as follows:

Pressure: 200 mTorr

Temperature: 300K

Gas: Argon neutral gas

RF-Voltage: 60V Peak-to-Peak

Discharge over 13.56MHz RF-Voltage

Goal: Control of plasma homogeneity

Figure 3 below shows the ion density profile as predicted by the simulator. This compares very well to density profiles presented in the technical literature for CCP reactors.

Figure 3: Well-known GEC Cell CCP Reactor. Ion Density Profile in good agreement with the published results for this Device 

The VHF ICP Reactor (Inductive Coupled Plasma) Example

The design setup is as follows:

Pressure: 30 mTorr

Gas: Argon neutral gas

Input power: 300 W

EM-Field distribution: 13.56MHz

Goal: Characterize the physical parameters, understand the physical principles and identify potential issues and damages

Figure 4 below shows the plasma homogeneity issue and potential for damage from energy enhancement.

Figure 4: Electron Energy Enhancement localized which affects the plasma homogeneity and could lead to potential damage of the ICP Reactor

 


Automating and Optimizing an ADC with Layout Generators
by Daniel Payne on 08-24-2022 at 10:00 am

I first got involved with layout generators back in 1982 while at Intel, and about 10% of a GPU was automatically generated using some code that I wrote. It was an easy task for one engineer to complete, because the circuits were digital and no optimization was required. At the 2022 18th International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design, an IEEE paper was authored by experts from Fraunhofer IIS/EAS, MunEDA, IMST GmbH and Dresden University of Technology. I’ll share what I learned from their article, “A Multi-Level Analog IC Design Flow for Fast Performance Estimation Using Template-based Layout Generators and Structural Models.”

Analog designs like an ADC require that you start with a transistor-level schematic, do an initial IC layout, extract the parasitics, simulate, then measure the performance to compare against the specifications. This manual process is well understood, yet it requires iterations that can take weeks to complete, so there has to be a better approach. In the paper they describe a more automated approach, built upon three combined techniques:

  • Template-based generator
  • Parasitic estimation from object-oriented templates
  • Fast model-based simulation

The following diagram shows the interaction and flow between optimization, performance estimation and layout generators for an ADC:

Generator Template, Model, Performance Estimation

There’s a SystemC AMS model of the pipeline ADC circuit; its parameters define things like the number of device rows, while the model defines the behavior of non-ideal capacitors and OpAmp offsets. The flow is executed iteratively until it reaches acceptable performance criteria.
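
To give a feel for what such a behavioral model captures, here is a minimal Python sketch of a single 1.5-bit pipeline ADC stage with a capacitor-ratio error and an op-amp offset. The structure and numbers are our own illustration, not the paper’s SystemC AMS code.

```python
import numpy as np

# Behavioral sketch of one 1.5-bit pipeline ADC (MDAC) stage with non-idealities.
# cap_ratio_error models a capacitor mismatch; opamp_offset models the amplifier.
def stage(vin, vref=1.0, cap_ratio_error=0.0, opamp_offset=0.0):
    """Return (digital code, residue) for a 1.5-bit stage."""
    if vin > vref / 4:
        d = 1
    elif vin < -vref / 4:
        d = -1
    else:
        d = 0
    gain = 2.0 * (1.0 + cap_ratio_error)     # ideal interstage gain is exactly 2
    residue = gain * vin - d * vref + opamp_offset
    return d, residue

# Sweep the input and compare the ideal stage to a mismatched one.
for v in np.linspace(-1, 1, 11):
    _, r_ideal = stage(v)
    _, r_real = stage(v, cap_ratio_error=0.01, opamp_offset=0.005)
    print(f"{v:+.2f}  ideal residue {r_ideal:+.3f}  with errors {r_real:+.3f}")
```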

The inner loop estimates the layout parasitics in about 5 seconds, and then the ADC is optimized and a layout generated in about 1 minute. The layout generator uses the best parameter set and generates the capacitor structures. Layout capacitance values for devices and wires were pre-characterized to enable fast estimates in the template approach. The optimization step uses estimated parasitics, not extracted parasitics, saving time.

A SystemC AMS model of a pipeline ADC has both behavioral and structural details, so that engineers can trade off accuracy versus runtime. Using an analytical model enables a thousand runs in just a few minutes. The outer loop adds the ADC model, and that run takes about 50 seconds to complete.

This generator template approach even estimates layout parasitics, capacitor variation and device mismatch. Both global and local process variations were taken into account.

Results

Starting from transistor-level schematics for the ADC, a parameterized model was built. Having a model enabled fast simulation and optimization, with the goals of:

  • Reduced layout area
  • Specific layout aspect ratio
  • Minimal error in the effective capacitor ratio
  • Robustness against process variations and mismatch

Layout-level optimization used an EDA tool from MunEDA called WiCkeD, which takes a simulator-in-the-loop approach; here, the template served as the simulator:

Optimization with WiCkeD

When the optimizer needs the performance of a set of design parameters, it asks the template to evaluate them, then determines the direction in which to change the parameters to improve the layout. Evaluating the template takes under 5 seconds, so the optimization quickly converges on an optimal set of layout parameters.
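
Conceptually, the loop resembles the toy random-search sketch below, in which a fast template evaluation stands in for the simulator. This is a generic illustration with an invented cost model, not MunEDA’s WiCkeD algorithm.

```python
import random

# Toy "simulator in a loop": the optimizer asks a fast template evaluation for
# the cost of each candidate parameter set and keeps only improving moves.
def evaluate_template(params):
    """Fast surrogate standing in for the ~5 s layout-template evaluation."""
    rows, unit_w = params["rows"], params["unit_w"]
    area = rows * unit_w * 3.0                     # pretend-area model
    ratio_error = abs(8.0 / rows - unit_w) * 0.1   # pretend capacitor-ratio error
    return area + 50.0 * ratio_error               # single weighted cost

def optimize(start, iters=200):
    best, best_cost = dict(start), evaluate_template(start)
    for _ in range(iters):
        cand = dict(best)
        cand["rows"] = max(1, best["rows"] + random.choice([-1, 0, 1]))
        cand["unit_w"] = max(0.2, best["unit_w"] + random.uniform(-0.05, 0.05))
        cost = evaluate_template(cand)
        if cost < best_cost:                       # keep only improving moves
            best, best_cost = cand, cost
    return best, best_cost

print(optimize({"rows": 4, "unit_w": 1.0}))
```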

To get the best capacitor-array aspect ratio, they selected an input parameter range of W and L for the unit devices, plus the number of rows in the array. Next, they simulated the worst-case performance, including offsets, in under two hours with 114 individual parameterizations. The worst-case transfer functions of the ADC model, for different numbers of rows in the capacitor array and various W/L values, are shown below, where the ideal curve is dashed:

Worst-case transfer functions

Summary

Analog design and optimization are more difficult than digital design because there are more interdependencies and trade-offs involved. A new approach combining layout generators, template-based layout estimates and optimization has been demonstrated successfully for an ADC circuit, using optimization technology from MunEDA called WiCkeD. Instead of taking days to weeks, this approach met specifications for an ADC in just minutes.

Related Blogs


Hazard Detection Using Petri Nets. Innovation in Verification
by Bernard Murphy on 08-24-2022 at 6:00 am

Modeling and verifying asynchronous systems is a constant challenge. Petri net models may provide an answer. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and now Silvaco CTO) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is Hazard Detection in a GALS Wrapper: a case study. The paper was published at the 2005 International Conference on Application of Concurrency to System Design. The authors are/were from Humboldt-Universität zu Berlin and IHP Microelectronics, Frankfurt.

This is an old paper yet interesting for use of Petri nets in modeling asynchronous processes, a problem domain that continues to crop up in modern designs. The authors apply their analysis to a GALS wrapper for a baseband block. Hazards can occur in handoff between a local clock driving the pipeline in the absence of external activity and the clock associated with an external input.

The method constructs Petri net patterns for a range of elementary gates, as state transition graphs (STGs) based on input signal states and edges. These nets are composable into a full circuit. In this model, if a marking in which an output-edge place carries more than one token is reachable from the starting state, or could lead to such a marking, that represents a hazard. Reachability is a widely studied problem and is the basis of the method. Through a series of abstractions, the authors were able to detect several hazards and a potential deadlock.
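
As a toy illustration of that idea (not the authors’ tooling, which relied on model checkers such as LoLA and SMV), the following Python sketch enumerates the reachable markings of a small, invented net and flags any marking that puts more than one token on an “edge” place.

```python
from collections import deque

# Toy Petri-net reachability check: a hazard is flagged when any reachable
# marking puts more than one token on a designated "edge" place.
# A transition is (consume, produce): dicts mapping place -> token count.
transitions = [
    ({"req": 1}, {"edge_c": 1, "busy": 1}),   # a request fires an edge on c
    ({"ack": 1}, {"edge_c": 1}),              # an ack also fires an edge on c
    ({"busy": 1, "edge_c": 1}, {"done": 1}),  # the gate consumes one edge
]
initial = {"req": 1, "ack": 1}

def enabled(marking, consume):
    return all(marking.get(p, 0) >= n for p, n in consume.items())

def fire(marking, consume, produce):
    m = dict(marking)
    for p, n in consume.items():
        m[p] -= n
    for p, n in produce.items():
        m[p] = m.get(p, 0) + n
    return {p: n for p, n in m.items() if n > 0}

def hazard_reachable(initial, transitions, hazard_place="edge_c"):
    seen, queue = set(), deque([initial])
    while queue:
        m = queue.popleft()
        key = tuple(sorted(m.items()))
        if key in seen:
            continue
        seen.add(key)
        if m.get(hazard_place, 0) > 1:        # two pending edges -> glitch risk
            return m
        for consume, produce in transitions:
            if enabled(m, consume):
                queue.append(fire(m, consume, produce))
    return None

print(hazard_reachable(initial, transitions))   # finds a marking with edge_c: 2
```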

Paul’s view

This month’s paper was a total blast from the past for me, bringing back memories from my own PhD in the formal verification of asynchronous circuits. The paper describes a method for detecting hazards (the potential for wires to glitch) in control circuits for a “globally asynchronous locally synchronous” (GALS) design style, but the star of the show for me is the authors’ use of a wonderful notation called a Petri net.

A Petri net looks like a state machine diagram, but it overlays the traditional node and arc notation with the concept of a “token”, literally drawn as a black dot that can go inside nodes. These tokens enable Petri nets to elegantly visualize concurrent interactions between state machines. The notation is not new and dates to the 1960s. Much of what we do in modern SOC verification involves wrestling with concurrency and this paper reminded me that Petri nets may warrant some fresh attention to improve complex protocol verification (CXL, USB4, etc.), cache coherency verification, or EDA tools based on the new Portable Stimulus language for transaction level randomization like Cadence’s Perspec.

As a further side comment, we all know about the duality between state machines and regular expressions. Likewise with Petri nets, there is an extension of regular expression syntax called “Trace Expressions” that introduces a special concurrency construct called the “weave” operator enabling a similar duality between Petri nets and Trace Expressions…but now we’re getting way too close to my PhD for comfort.

OK, back to the paper: the authors develop a system to represent every gate in a circuit with a Petri net fragment. They then compose these Petri net fragments together to form a giant Petri net for their GALS control circuits. Hazards can be formally expressed as certain states of this giant Petri net, and so proving that a circuit is hazard-free reduces to proving that these states are unreachable, which can be done with standard off-the-shelf model checking tools.

It’s a tight paper, and the authors’ use of Petri nets to formalize hazards is beautiful.  But unfortunately, the scalability of the underlying model checking problem generated by their method is very poor, and in the end the authors have to make some significant simplifying assumptions to get their checker to converge. That said, even with these assumptions they did find some real hazards in the control circuits that needed fixing.

Raúl’s view

The authors show how to model a chip that implements baseband processing compliant with IEEE 802.11a. The implementation is globally asynchronous locally synchronous (GALS). The asynchronous part, which is the focus of the paper, is implemented as a wrapper around the synchronous parts. It consists of 5 blocks and ensures that a synchronous pipeline functions properly. They build this up from elementary gate models, each represented by a Petri net. Once built, they make a key observation: a hazard can occur when an edge place c carries more than one token. The problem of determining whether a particular token pattern in a Petri net can occur is called the reachability problem.

They aim to solve this problem with a model checker, e.g., LoLA [10] or SMV [6]. The initial model results in a Petri net of 288 places and 526 transitions, too complex to be solved. The model is simplified by applying hazard-preserving abstraction techniques such as merging gates and abstracting structures, and by considering each of the 5 blocks separately. They also use VHDL simulations to confirm whether a “dangerous” state transition could really occur. Several hazards could be detected with the simplified model. The authors conclude that “Fortunately, in the usual application scenario for GALS blocks no reachable state generates any of the detected potential hazards. Nevertheless, in order to increase reliability of the asynchronous wrapper we have fixed all those potential hazards in the system. Without the formal verification this would not have been feasible.”

I think this is an interesting modeling technique that enables formal verification. The modeling part is very clear; I found it elegant to model both signal values and edges, and to detect hazards (markings with multiple pending edges) by solving reachability in Petri nets. The formal verification part, however, applies several techniques which are cumbersome and difficult to follow. Concluding thoughts: Petri nets were popular in the 80s and 90s, particularly for modeling asynchronous systems, but asynchronous systems never became ubiquitous. Almost all digital systems are synchronous and just communicate, necessarily, with the outside world in a well-understood asynchronous manner. There was no follow-up to this paper. But even if not practical, it is a stimulating contribution to a hard problem.

My view

Despite the theoretical elegance of Petri Nets, I struggled to find papers on this topic in our domain. I did find several related papers based on different methods which all commented on scalability problems for Petri Nets. Perhaps this realization accounts in part for the lack of papers.


EDA Product Mix Changes as Hardware-Assisted Verification Gains Momentum
by Lauro Rizzatti on 08-23-2022 at 10:00 am

The Design Automation Conference, as always, is a good barometer on the state of EDA and my area of interest, verification. The recent DAC offered plenty of opportunities to check on trends and the status quo.

Remarkably, exhibitors and attendees were upbeat about the chip design landscape despite concerns about supply chain shortages and the impending semiconductor downturn. Based on my informal analysis, the hardware-assisted verification (HAV) segment and the future for chip design verification look exceedingly bright. The big three players (Cadence, Siemens EDA and Synopsys) all hint at big gains in HAV tool adoption and point to emerging market segments for this growth.

Three executive-level presentations from three financial and market analysts that were part of the DAC program confirmed what I think and helped put into perspective the outlook for the EDA and semiconductor industry.

Sunday night, Charles Shi, Principal, Senior Analyst at Needham & Company, talked about “EDA to Power Through Semiconductor Cycles.” Shi reminded us that the semiconductor industry is cyclical and appears to be on the brink of a downturn. More positively, he outlined the reasons why the EDA industry will power through semiconductor cycles and emerge stronger on the other side.

Shi pointed to the slowdown of Moore’s Law driving secular growth of EDA and reminded attendees that EDA is part of an interdependent ecosystem that includes fabless, IP, foundry, equipment and materials. He added that chiplets are creating greater design complexity as the industry moves from 2D to 2.5/3D ICs, forcing stronger EDA and IP collaboration and tightly coupled package/chip co-design for faster design convergence. According to Shi, this means design and verification are getting harder and system-level design and analysis will increase EDA spending.

His last slide predicted that EDA is going to be okay if there is a recession and a semiconductor downcycle. Foundry, EDA and IP growth will continue to outperform the semiconductor industry, Shi concluded.

From my perspective, this is good news and portends many more opportunities for HAV as we move into more diverse applications areas and help manage chip functionality.

Rich Wawrzyniak, Senior Market Analyst from Semico Research, seemed to concur with Shi. He presented a well-balanced talk on “Semiconductor Market Trends: Tying it all Together for the Big Picture” Monday at the DAC Pavilion, beginning with a current assessment of the landscape. While semiconductor sales are strong –– 26% growth in 2021 and 6.3% projected for 2022 –– the forecast for 2023 is a potential decline of 0.9%. Mitigating factors could create a downturn despite ongoing demand, a tight labor market and new application areas, he cautioned.

Much like Shi, Wawrzyniak pointed to advanced SoCs with elevated complexity levels and gate counts as drivers for increased functionality and performance. He further noted that the number of IP blocks per chip rises with each new process node, a key reason for rising design costs.

As he wrapped up, Wawrzyniak predicted continued EDA revenue growth with more designs and more complex design starts. EDA is starting to move into new areas for growth and diversification.

Again, I concluded that more functionality and performance in a chip combined with more chip applications demand HAV solutions.

Jay Vleeschhouwer, Managing Director of Griffin Securities, presented “The State of EDA: A View from Wall Street” Tuesday in the DAC Pavilion. The most analytical of the three presentations, it nonetheless painted a positive view of EDA. For example, he noted that the EDA industry has grown each year for more than a decade with revenue surpassing $10 billion in 2021. The growth, he said, has been sustained across multiple product categories, a result of semiconductor and system engineering requirements in design, process and system complexity.

One of the most important product-mix changes over the past five to 10 years, according to Vleeschhouwer, has been the growth of HAV, both hardware emulation and FPGA prototyping. While he didn’t break out HAV revenue specifically, I estimate it’s approaching $1 billion, a huge increase from the $300-$500 million range where it hovered for almost two decades, since 2000.

Vleeschhouwer identified two areas where semiconductor companies are investing –– software development and silicon development. He added that investment in silicon development remains the source of the majority of EDA revenue. The attention to software development is welcome news for HAV suppliers, since HAV is the only verification solution that supports both software and silicon.

Given we’re more than halfway through 2022, we should expect more good news through yearend. Next year’s DAC could have a vibrant exhibit floor with attendees flocking to see the latest product news. I look forward to it and think you will, too.

Also Read:

An EDA AI Master Class by Synopsys CEO Aart de Geus

ARC Processor Summit 2022 Your embedded edge starts here!

WEBINAR: Design and Verify State-of-the-Art RFICs using Synopsys / Ansys Custom Design Flow


WEBINAR: A Revolution in Prototyping and Emulation
by Daniel Nenni on 08-23-2022 at 6:00 am

This webinar will introduce a revolutionary new way to do prototyping and emulation with best-in-class performance, productivity, and pricing by unifying the hardware and a new software stack, so that one system can handle prototyping and deliver essential emulation functionality.

Register Here

The speed of Moore’s law has slowed. However, with plans for 3nm production silicon in the second half of 2022 and with 2nm and 1nm in the works, the amount of IP being put on a single SOC is exploding. Also, the ability to verify IP both stand-alone and in the overall context of the SOC is becoming a greater challenge for the verification community at large.

Take the area of consumer electronics: we all know what our cell phones can do and deliver in terms of productivity and entertainment. Another market representative of massive amounts of IP placed on an SOC is drones. Drones carry DSLR-class sensors and record 5K+ video, requiring incredibly high-speed data interfaces. They leverage multiple GPS networks, acquiring signals from well over 20 satellites to maintain a perfectly still position even on a breezy day. Even with Moore’s law slowing down, it is still driving us toward smaller nodes and packing more and more IP into a single SOC, creating a massive verification and software development challenge.

The same can be said at the enterprise level, with networking bandwidth exploding and customized silicon now found at the desktop compute level. Data centers are all looking for faster and faster networking, and these complex SOCs need to be verified, with firmware developed, before silicon comes back from the fab. The need for a faster and lower-cost platform for emulation and prototyping is apparent, and Corigine’s EDA product line is aimed directly at those requirements.

Today, companies invest millions in costly emulation systems that have special power and cooling requirements yet deliver sub-par performance for running verification. In this session we will introduce you to Corigine’s approach to solving these issues using their unified platform for prototyping and essential emulation with a new level of performance.

Traditionally, costly emulators have been the tools deployed in top fabless companies around the world. With VC funding on the rise for fabless start-ups, these players need similar capabilities, and Corigine, with the only unified platform for prototyping and emulation functionality, is answering these market requirements.

Corigine’s goal was to create an FPGA-based system bridging the gap between prototyping and major emulation functions. Doing so would reduce the investment in key emulation functionality and require only a single platform for both teams, meaning much lower overall infrastructure-support costs. In addition, scalability and automation were key elements: with push-button, automated, timing-optimized partitioning, fully automated handling of complex clock domains, and the elimination of time-consuming recompiles, Corigine’s MimicPro product line is a revolutionary step forward for both prototyping and emulation.

Register Here

Also read:

Bringing Prototyping to the Desktop

A Next-Generation Prototyping System for ASIC and Pre-Silicon Software Development


A clear VectorPath when AI inference models are uncertain
by Don Dingee on 08-22-2022 at 10:00 am

The chase to add artificial intelligence (AI) to many complex applications is surfacing a new trend. There’s a sense these applications need a lot of AI inference operations, but very few architects can say precisely what those operations will do. Self-driving may be the best example, where improved AI model research and discovery proceed at a frantic pace. What should a compute environment look like when AI inference models are uncertain?

Software adds AI inference flexibility but at a cost

A familiar reflex in the face of uncertainty is opting for software programmability. This dynamic has dominated large-scale CPU core development for generations. Faster processors debut, programmers write more software until it takes all the new-found capacity, then another round of even faster processors appears. But there’s a mismatch between a bulked-up CPU core and the fine-grained parallelized workloads in AI inference, and inefficiency becomes overwhelming.

Then GPUs showed up at the AI inference party with many smaller, parallelized cores and multithreading. On the surface, scaling up a software-programmable field of fast GPU cores seems a better fit for the fine-grained inference workload. If one has room for a rack of GPU-based hardware, it’s possible to pack a lot of TOPS in a system. But bigger GPUs start presenting other issues for AI inference in the form of sub-optimal interconnects and memory access. Hardware utilization isn’t great, and determinism and latency are suspect. Power consumption and cooling also head in the wrong direction.

Hardware could optimize around a known workload

If this is starting to sound like the case for a workload optimized custom SoC, that’s because it is. Design high-performance execution units, optimize memory access and interconnects, and organize them around running an AI inference model.

We’re seeing off-the-shelf AI inference SoCs popping up all over – primarily targeting one specific class of AI inference problem. There are SoCs designed to run YOLO models on facial recognition. Others optimize for driver assistance functions like lane adherence or emergency braking. AI inference is starting to get traction in areas like pharmaceutical research. If the AI inference models are well-defined, optimizing the workload in hardware is achievable.

But different AI inference models do not map onto layers or execution units the same way. Optimizing hardware around one model can be utterly inefficient for running another model. Making optimization matters worse, some of these more complex problems call for running different types of AI inference models concurrently on separate parts of the problem.

Niching down a custom SoC too tightly can result in lock-in, possibly preventing an enhanced AI inference model from running efficiently without a hardware redesign. That’s terrible news for a long life cycle project where the breakthrough AI inference innovations are yet to happen. It’s also not healthy for return on investment if volumes on a custom SoC are too low.

If only there were fast, programmable AI inference hardware

Several IP vendors are working on the specifics of reconfigurable AI inference engines with higher utilization and efficiency. Most are after the premise of co-design, where one looks at the AI inference models at hand and then decides how to configure the engine to run them best.

Recapping, we don’t know what the best hardware solution looks like when we start the project. We need a platform to explore combinations of IP quickly and change the design accordingly, maybe many times during development. And we must respond promptly and keep pace with state-of-the-art AI inference methods from new research. Also, if we’re going to a custom SoC, we need an inexpensive platform for software development before silicon is available.

Before thinking about designing a workload-optimized SoC, or even thinking about one at all if volumes are low, we should be thinking about an FPGA-based solution. The fact that an application may depend on AI inference models currently in flux reinforces that choice.

Against that backdrop comes the Achronix VectorPath Accelerator Card, jointly designed with BittWare and now in general availability. It carries the Achronix Speedster 7t1500 FPGA, with its unique multi-fracturable MAC array matched to high-performance LRAM and BRAM. Much of the attention on this design focuses on its blazing Ethernet connectivity for applications like high-frequency trading. It’s also an 86 TOPS engine with a 2-dimensional NoC for optimizing IP interconnects, plus 4 Tbps of bandwidth to GDDR6 memory. Sensor data can come in via those Ethernet ports, via MCIO lanes at PCIe Gen5 data rates, or on legacy interfaces over GPIO.

In short, it’s a powerful platform for AI inference, whether starting with third-party IP or designing it in-house. It drops into a host system easily with its PCIe form factor. More importantly, it allows designers to cope with projects starting when AI inference models are uncertain. We expect AI inference software and IP vendors to adopt Achronix into their ecosystems soon, and we’ll watch for future developments.

Visit the Achronix VectorPath Accelerator Card page for videos, datasheets, white papers, and more information on how this can help your AI inference project.


Protecting Critical IP With Next-Generation Zero Trust Security
by Kalar Rajendiran on 08-22-2022 at 6:00 am

While semiconductors are the enablers for high-tech solutions, the semiconductor industry was not at the forefront of Cloud adoption. There were many valid concerns behind the slow adoption, a primary reason being the threat to intellectual property (IP) security. IP in this context refers to not just chip building blocks but rather everything including tools, design methodologies and operational processes. Naturally, cloud providers took the IP security concern seriously and developed strong security mechanisms. They invested heavily into developing a secure infrastructure with tight security protocols and procedures for data at rest and data in motion.

With this increased level of cloud security, the chip industry started migrating to cloud computing. But corporate data was still accessed via on-prem equipment or via company-supplied laptop computers and other mobile devices. The company’s IT department typically installed security software to protect against viruses and malware. As many other industries started accommodating bring-your-own-device (BYOD) policies for their employees, the semiconductor industry was still lagging on this front.

Evolving Workplace

The semiconductor work environment has changed a lot over the recent years. Data and tools are nowadays accessible using both personal and company supplied devices. Access is made from within the firewall as well as from outside. When the Covid-19 pandemic hit, businesses were pushed into a work from home (WFH) model with predominant data access from outside the firewall. Of course, the primary mode for remote accessing data and tools was via a virtual private network (VPN).

VPNs are risky because once one is compromised, an attacker can potentially reach the entire network. Having breached the network through a compromised device, attackers can try to steal corporate IP, customer or financial information, or launch a ransomware attack.

Security Challenges

In spite of elaborate security measures put in place, corporate networks are still hacked into on a regular basis. The usual hacks are made possible through a combination of phishing and breaking into a VPN client. With employees, contractors, vendors, supply-chain partners and customers needing different levels of access to various data, the threat of break-in is ever present.

Just last week, Cisco announced that their corporate network was hacked into.

Following is an excerpt from a blog post by Cisco Talos Intelligence Group, one of the largest commercial threat intelligence teams in the world.

  • The attacker conducted a series of sophisticated voice phishing attacks under the guise of various trusted organizations attempting to convince the victim to accept multi-factor authentication (MFA) push notifications initiated by the attacker. The attacker ultimately succeeded in achieving an MFA push acceptance, granting them access to VPN in the context of the targeted user.
  • Initial access to the Cisco VPN was achieved via the successful compromise of a Cisco employee’s personal Google account.

Following is an excerpt from SecurityWeek.com as it relates to the above Cisco news.

  • The attacker managed to enroll new devices for MFA and authenticated to the Cisco VPN. Once that was achieved, they started dropping remote access and post-exploitation tools. The hackers escalated their privileges, created backdoors for persistence, and moved to other systems in the environment, including Citrix servers and domain controllers.

Security risks are only growing over time, prompting companies to favor a Zero Trust security model over a VPN-based model. As per a report on helpnetsecurity.com, attacks against VPNs were up nearly 2,000% in 2021. According to a VPN Risk Report based on Zscaler’s survey of over 350 cybersecurity professionals:

  • 72% of organizations are concerned that VPN may jeopardize IT’s ability to keep their environments secure
  • 67% of enterprises are considering remote access alternatives to a traditional VPN
  • 72% of companies are prioritizing the adoption of a Zero Trust model

Zero Trust Security (ZTS)

Even before Covid-19 impacted the workplace forever, ZTS was already in motion as a solution for addressing security threats. The ZTS model takes the approach of verifying everyone and every device that is seeking access to confirm they are who they claim to be. It does not matter whether the access is being sought from within the network or from outside.

Fundamental Requirements for Enterprise ZTS

  • Access visibility – Full visibility of all access to critical applications and data
  • Access control – Authenticate and control of who can access what resources, applications, and data
  • Protocol control – Whitelisted software and protocol controls to block unapproved or malicious traffic
  • Universal enforcement – Unified method across enterprise on-prem, in the cloud, or remote
  • Business agility – Dynamic project-based controls, no infrastructure “rip and replace”

A primary challenge of the Zero Trust approach is ensuring security without slowing things down. Access to sensitive data is needed on an ongoing basis to communicate and collaborate on projects. Team members’ roles and access levels vary and change frequently. A ZTS implementation must not cause people to lose access, which would lead to lost productivity.

The Figure below highlights where data leaks typically happen and how attacks propagate through a network.

Zentera CoIP® Platform and Technology

Zentera has developed a next-generation ZTS solution using its patented technology, taking a differentiated, in-line approach to ZTS. The Zentera CoIP Platform implements a Zero Trust security fabric using an overlay technology architecture. The platform authenticates users through a company’s existing identity providers, and authenticates endpoints and applications using certificates and fingerprints. This enables role-based access to specific applications and resources. Accordingly, secure virtual zones can be created, managed and torn down rapidly as requirements call for. Refer to the Figure below for the solution concept.

The CoIP platform integrates the following three core engines to implement ZTS.

The CoIP Chamber

The chamber implements the concept of a secure virtual development environment wherein applications are cloaked and lateral network access is eliminated. Inbound and outbound flows are allowed only for authorized users, apps and services.

The CoIP Network

The CoIP network connects applications rather than connecting networks across silos. Application connectivity is based on application identity, and IP collisions across environments are avoided through the use of overlay IP addressing.

CoIP Access

This engine authorizes each access based on a user’s role and privileges. Users and devices are authenticated through MFA. This engine also sets up non-step and one-stop access between specific applications instead of connecting entire networks together.
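
As a minimal illustration of the kind of role-based, default-deny check such an engine performs on every request, consider the sketch below; the roles, applications and policy logic are invented for this example and do not represent Zentera’s CoIP implementation.

```python
# Toy role-based, default-deny access check in the spirit of a Zero Trust
# policy engine. Policy table, roles, and application names are invented.
POLICY = {
    # role            -> applications that role may reach
    "design_engineer": {"eda_license_server", "regression_farm"},
    "contractor":      {"regression_farm"},
    "it_admin":        {"identity_provider", "eda_license_server"},
}

def authorize(user_role: str, mfa_verified: bool, device_trusted: bool,
              application: str) -> bool:
    """Every request is evaluated; nothing is trusted by network location."""
    if not (mfa_verified and device_trusted):
        return False                        # identity and device must both check out
    allowed = POLICY.get(user_role, set())  # unknown roles get an empty set
    return application in allowed           # default deny

# A contractor on a verified device may reach the regression farm...
print(authorize("contractor", True, True, "regression_farm"))    # True
# ...but not the license server, and never without MFA.
print(authorize("contractor", True, True, "eda_license_server"))  # False
print(authorize("contractor", False, True, "regression_farm"))    # False
```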

CoIP Technology in Action

Whether they are an employee, contractor, vendor or customer, users gain access to the remote-desktop chamber via the CoIP platform. From there, they gain access to the application chamber and other applications based on the CoIP access-control setup. Even someone working from an on-prem workstation still has to go through the remote-desktop chamber before accessing applications and data. Refer to the Figure below.

In the particular use-case shown in the above Figure, blanket access to the internet has been prohibited. For use-cases where internet access is required for specific purposes, the technology supports those requirements. For example, if access to a company’s remote data server is required, that specific connectivity can be authenticated and set up.

Also Read:

Post-quantum cryptography steps on the field

CEO Interview: Jaushin Lee of Zentera Systems, Inc.

Memory Security Relies on Ultra High-Performance AES-XTS Encryption/Decryption


Podcast EP102: A Brief History of eFPGA with Geoff Tate of Flex Logix
by Daniel Nenni on 08-19-2022 at 10:00 am

Dan is joined by Geoff Tate, CEO and Co-founder of Flex Logix. Geoff explains the embedded FPGA market, including some history, applications and challenges to deliver a product that customers really want. He provides some very relevant background on why Flex Logix has been so successful in this market, and what lies ahead.

Geoff Tate: BSc, Computer Science, University of Alberta; MBA, Harvard; MSEE (coursework), Santa Clara University. 1979-1990: AMD, Senior VP, Microprocessors and Logic, with >500 direct reports. In 1990, joined two PhD founders as founding CEO to grow Rambus from 4 people through its IPO to a $2 billion market cap, serving until 2005.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.