SemiWiki – Page 397 – The Open Forum for Semiconductor Professionals

May 10, 2020 | January 11, 2022

MOSFET Gate Length Scaling Limit at Reduced Threshold Voltages

by Fred Chen on 05-10-2020 at 6:00 am
Categories: Lithography

As transistor dimensions shrink to follow Moore’s Law, the functionality of the gate used to switch on or off the current is actually being degraded by the short channel effect (SCE) [1-5]. Moreover, the simultaneous reduction of voltage aggravates the degradation, as will be discussed below.

A Practical Lower Limit of Threshold Voltage
First, we will estimate a practical lower limit for the threshold voltage Vth, i.e., the gate voltage at which the transistor is said to turn on. Below the threshold voltage, the current drops off exponentially, in the best case at a rate of 60 mV/decade, i.e., every 0.06 V reduction below Vth results in the current dropping to 10% of its value (Figure 1). So if the leakage current at 0 V is to be 0.1% (already a large allowance) of its value at Vth, three decades of drop are needed and the threshold voltage must be at least 3 × 0.06 V = 0.18 V. In turn, the power supply voltage Vdd is expected to be several times Vth, e.g., ~1 V. A slope of 60 mV/decade also means the current changes by a factor of about 2 (10^(1/3) ≈ 2.15) for every 0.02 V shift. This is important for considering changes in the threshold voltage itself.
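
The arithmetic in this estimate can be checked with a short Python sketch. It assumes an ideal exponential subthreshold characteristic with a constant 60 mV/decade swing all the way down to 0 V; the function names are made up for illustration and do not come from any device-modeling library.

```python
import math

SS_MV_PER_DECADE = 60.0  # best-case subthreshold swing quoted in the text


def current_ratio(delta_v_mv, ss=SS_MV_PER_DECADE):
    """Factor by which subthreshold current changes for a voltage shift in mV."""
    return 10.0 ** (delta_v_mv / ss)


def min_vth_mv(leakage_fraction, ss=SS_MV_PER_DECADE):
    """Smallest Vth (mV) so that I(Vg = 0) / I(Vg = Vth) <= leakage_fraction."""
    return ss * math.log10(1.0 / leakage_fraction)


if __name__ == "__main__":
    print("Vth needed for 0.1% leakage (mV):", min_vth_mv(1e-3))
    print("Current change for a 20 mV shift:", current_ratio(20.0))
```

A 20 mV shift comes out slightly above 2X, which is why the text treats it as roughly a doubling.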

Figure 1. Subthreshold slope of 60 mV/decade gives ~0.1% leakage at 0V for Vth ~0.2V. A 20 mV drain-induced barrier lowering (DIBL) leads to ~2X change in current due to the shift of the Ids vs. Vg curve.

The Short Channel Effect: Drain-Induced Barrier Lowering
Normally, in order to turn the transistor on or off, the gate voltage controls the depletion of charges under the gate, between the source and drain terminals. Basically, as shown in Figure 2, as the gate length Lg is reduced, the source and drain terminals are closer, and the respective depletion layer widths Ws and Wd take up a significant portion of Lg. Specifically, the depths of the source and drain depletion layers cause electric field bending under the gate, which becomes more severe as the source-drain distance is narrowed.

Figure 2. The origin of drain-induced barrier lowering (DIBL). A larger gate (left) has a flat potential contour over most of the gate length, while a shorter gate (right) shows bending of the potential contour.

As a result, when the voltage from the source to drain is increased, the barrier in between is reduced fairly significantly, to the same degree as the voltage on the gate itself. This phenomenon is also known as drain-induced barrier lowering (DIBL). DIBL is generally given as the shift in threshold voltage (the reduction of the barrier) for a given shift in drain-source voltage. Usually the reference drain-source voltage is near zero, while the shifted voltage is near the supply voltage, and the threshold voltage shift is on the order of tens of millivolts. But given that a 20 mV shift already constitutes a factor of 2 change, when Vth ~ 0.2V and Vdd ~ 0.7-1V, a DIBL of 20 mV/V as shown in Figure 1 can therefore be considered an upper limit of tolerance.
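
To see why tens of millivolts matter, here is a minimal Python sketch that converts a DIBL figure into a threshold shift and then into a subthreshold current multiplier. It assumes the ideal 60 mV/decade swing from the previous section and a reference drain-source voltage of 0 V. The DIBL values in the loop are illustrative inputs, not measured data, and the function names are hypothetical.

```python
SS_MV_PER_DECADE = 60.0  # best-case subthreshold swing quoted in the text


def dibl_vth_shift_mv(dibl_mv_per_v, vds_high, vds_low=0.0):
    """Threshold-voltage reduction (mV) when Vds rises from vds_low to vds_high."""
    return dibl_mv_per_v * (vds_high - vds_low)


def off_current_multiplier(dibl_mv_per_v, vdd, ss=SS_MV_PER_DECADE):
    """Factor by which subthreshold current grows because of the DIBL shift."""
    return 10.0 ** (dibl_vth_shift_mv(dibl_mv_per_v, vdd) / ss)


if __name__ == "__main__":
    for dibl in (20.0, 50.0, 100.0):  # illustrative values only
        factor = off_current_multiplier(dibl, vdd=1.0)
        print(f"DIBL {dibl:5.1f} mV/V at Vdd = 1 V -> current x {factor:.2f}")
```

Under these assumptions, 20 mV/V at Vdd = 1 V gives the ~2X change discussed above, and the multiplier grows exponentially with DIBL from there.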

Have we already reached minimum Lg?
A minimum gate length of ~20 nm has already been predicted by scientists at IBM [1,5] as well as IMEC [6]. This holds for both SiO2 (minimum 1 nm) and high-k (HfO2 ~4-5 nm) gate dielectrics. It is derived from the characteristic decay length of the lateral electric field under the gate [1].

Figure 3. 2017 field FinFET data showing DIBL degradation for Lg of 20 nm and below [5].

A lower Lg limit of ~20 nm for the planar MOSFET means alternative transistor architectures need to be considered for achieving smaller gate lengths. The most well-known are the FinFET [5] and the surround-gate [7]. On the other hand, a similar Lg limit also appears to have been confirmed by field FinFET data [5] (Figure 3). This is not hard to imagine, as field bending toward the substrate is still possible within the fins. Moreover, in the case of the gate surrounding all sides of the silicon, the gate + 2x oxide thickness (>10 nm) must be added to the silicon body thickness, which hinders scaling of cell height (perpendicular to the gate pitch). By also considering drive current requirements [8], it is also preferred to widen the cell height [7], i.e., there is potential reverse scaling perpendicular to the gate pitch.

Implications
The limitation of the lateral scaling of transistors could portend greater reliance on 3D extension by wafer bonding, such as that implemented in the HBM interface [9]. Or it could be that the future of computing will shift more to memory, particularly those with 3D capacity expansion capability. Thus, the current ongoing developments toward in-memory computing, e.g., [10], are very timely.

References
[1] Y. Taur and T. Ning, Fundamentals of Modern VLSI Devices, 2nd Edition, Cambridge University Press, 2009.

[2] http://www.cs.ucl.ac.uk/staff/ucacdxq/projects/vlsi/report.pdf

[3] https://web.stanford.edu/class/ee316/MOSFET_Handout5.pdf

[4] http://www-inst.eecs.berkeley.edu/~ee130/sp03/lecture/lecture27.pdf

[5] A. Razavieh et al., “Scaling Challenges of FinFET Architecture below 40nm Contacted Gate Pitch,” 75th Annual Device Research Conference, 2017.

[6] http://www1.semi.org/eu/sites/semi.org/files/events/presentations/07_Hans%20Mertens_imec.pdf

[7] N. Loubet et al., “Stacked Nanosheet Gate-All-Around Transistor to Enable Scaling Beyond FinFET,” 2017 Symp. VLSI Technology.

[8] U. K. Das et al., “Limitations on Lateral Nanowire Scaling Beyond 7-nm Node,” IEEE Elec. Dev. Lett. 38, 9 (2017).

[9] https://en.wikipedia.org/wiki/High_Bandwidth_Memory

[10] https://www.researchgate.net/publication/335070394_RRAM_Based_In-Memory_Computing_From_Device_and_Large-Scale_Integration_System_Perspectives

Flex Logix CEO Update 2020

by Daniel Nenni on 05-08-2020 at 10:00 am
Categories: AI, CEO Interviews, eFPGA, Flex Logix, IP

We started working with Flex Logix more than eight years ago and let me tell you it has been an interesting journey. Geoff Tate was our second CEO Interview so this is a follow-up to that. The first one garnered more than 15,000 views and I expect more this time given the continued success of Flex Logix pioneering the eFPGA market, absolutely.

What is Flex Logix’ core strength?
My co-founder Cheng Wang invented and refined a superior programmable interconnect which we apply to a range of applications to solve major market needs; then we combine this with the software tools to program the resulting solution. Combined with our design methodology, we can create scalable and portable IP products very quickly and economically.

What markets/applications does Flex Logix play in?
Embedded FPGA (eFPGA)
AI inference
DSP acceleration

You started in eFPGA, how is that market developing for Flex Logix?
We are the “ARM of FPGA technology”: we license eFPGA for integration into SoCs, but we do not build chips.

Using our superior programmable interconnect we are able to achieve Xilinx-like density and performance in any process node using standard cells for rapid development and fewer metal layers.

We have proven eFPGA silicon with numerous customers and chips in 180nm, 40nm, 28/22nm, 16nm and 12nm process nodes. There are >10 working chips using eFPGA and >>10 more in fab and in design and many more planned. Our technology is mature and robust: our 2nd generation architecture is now 3 years old and every chip has worked 1st time.

Our early adopter market segment has been Aerospace (Sandia, Boeing, etc) but commercial design activity is now taking off as well (Morning Core, Dialog, etc). Our eFPGA technology has become strategically critical to many of our customers and they have extensive roadmap plans for a series of chips and they are driving us to improve our offerings to even better meet their needs, creating very high “stickiness”.

Half of our customers are using FPGA chips and want to integrate to reduce power/size/cost. Half of our customers have never used FPGA but use eFPGA for customizability and acceleration.

We provide software tools to program our eFPGA using Verilog.

The eFPGA market now is profitable for us and the cash flow is helping fund our AI Inference initiative.

How did Flex Logix get into AI Inference and why is it synergistic?
Companies like Microsoft use FPGAs in wide deployment to accelerate workloads including inference. Inference uses a lot of MAC operations – FPGAs have a lot of MACs, as do GPUs.

Customers a couple years ago asked us if we could optimize our eFPGA for AI Inference. Cheng studied the neural network models, like YOLOv3, and realized we could take our existing DSP MACs and optimize them for INT8/BF16, and that we could also increase MAC density by clustering MACs into 1-dimensional systolic arrays of 64 MACs each. Using our programmable interconnect we can wire up MACs in very flexible ways to achieve high MAC utilization and throughput at low die cost for a wide range of neural network models. The resulting product is our nnMAX AI Inference IP which, like our eFPGA, is a tile that can be arrayed to achieve whatever throughput the customer needs for their SoC.

But initially we expect most customers to want to buy chips so we have designed and are taping out now our InferX X1 which is very compact and low cost but has performance that rivals chips 5-10x larger. We will also build PCIe boards and expect to sample in Q3 this year. We recently shared benchmarks vs Nvidia’s leading Xavier NX and Tesla T4, showing we have superior price/performance.

The interesting thing is that the relative performance of X1/NX/T4 is very different from one model to another. Our customers did not expect this – they assumed they could get a benchmark for, say, ResNet-50 batch=1 and that would show relative performance. The reason it doesn’t is that different models stress different aspects of the hardware (and software) architectures. For example, ResNet-50 has very small images and activations so it does not stress the memory subsystem, whereas YOLOv3 for megapixel images definitely does.

Our inference technology is available now for 16nm. Our roadmap is to make it available on 7/6nm and 12nm (for our Aerospace customers who want US fabrication).

So then what about DSP?
Just as customers led us to explore AI Inference, customers have asked us “gee, your nnMAX IP has so many MACs in such a small area, can we use it for DSP?”

It turns out nnMAX is excellent for DSP, doing FIR filters at up to Gigasample rates with hundreds, thousands or even tens of thousands of taps using the arrayable nnMAX tile. For our ports to 7/6nm and 12nm we are exploring adding similar FFT performance.
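
To make the link between FIR filters and MAC count concrete, here is a minimal Python sketch of a direct-form FIR filter. It is a generic textbook formulation, not Flex Logix code and not a model of the nnMAX tile; the function names and the tap and sample-rate numbers in the usage lines are assumptions chosen for illustration.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0  # accumulator for this output sample
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]  # one multiply-accumulate per tap
        out.append(acc)
    return out


def macs_per_second(num_taps, sample_rate_hz):
    """MAC operations per second needed to sustain the filter in real time."""
    return num_taps * sample_rate_hz


if __name__ == "__main__":
    print(fir_filter([1, 0, 0, 0], [1, 2, 3]))   # impulse response equals the taps
    print(macs_per_second(1000, 1_000_000_000))  # hypothetical 1,000 taps at 1 GS/s
```

Each output sample costs one MAC per tap, so the required MAC rate is simply taps times sample rate, which is why a dense array of MACs maps well onto long filters.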

WEBINAR: eFPGA what’s available now, what’s coming & what’s possible to optimize your SoC

About Flex Logix
Flex Logix provides solutions for making flexible chips and accelerating neural network inferencing. Its eFPGA platform enables chips to be flexible to handle changing protocols, standards, algorithms and customer needs and to implement reconfigurable accelerators that speed key workloads 30-100x compared to processors. Flex Logix’s second product line, nnMAX, utilizes its eFPGA and interconnect technology to provide modular, scalable neural inferencing from 1 to >100 TOPS with higher throughput/$ and throughput/watt compared to other architectures. Flex Logix is headquartered in Mountain View, California. https://flex-logix.com/

Also Read:

CEO Interview: Jason Xing of Empyrean Software

Executive Interview: Howie Bernstein of HCL

CEO Interview: Adnan Hamid of Breker Systems

May 8, 2020 | March 21, 2022

How to Modify, Release and Update IP in 30 Minutes or Less

by Mike Gianfagna on 05-08-2020 at 6:00 am
Categories: Cliosoft, EDA, Events

I had the opportunity to attend a ClioSoft webinar recently on the topic of IP traceability. ClioSoft provides a broad range of tools for design data management and IP reuse. Entitled The New Trend in IP Traceability that IP Developers and Design Managers Rely On, the webinar was presented by Karim Khalfan, director of applications engineering at ClioSoft. Karim has been at ClioSoft for almost 17 years, so he knows a lot about the company’s products and how they are used.

I’ve attended and produced many webinars over the years. There have been a lot more opportunities to do so in recent times. After a while, you identify the winning formula for that special medium of streaming delivery. Focus, clarity, clear examples and above all, brevity are all ingredients that work. I can say that this ClioSoft webinar did everything right. Covering the complexity of IP tracing, along with a clear demonstration of how to address its pitfalls from the perspective of three different users, all in under 30 minutes including a Q&A session, is impressive. Karim hit all the highlights perfectly.

If you didn’t have the opportunity to attend the event, don’t despair. There is a replay link coming in a bit. Before we get to that, I’ll give you some highlights of Karim’s presentation.

First of all, why is IP traceability important? There are lots of intuitive responses to this question. Here are three concrete points to consider:

Increase visibility: whether it’s a third-party or internally developed piece of IP, knowing where it has been used and how successfully is important
Improve quality: through tracking what projects are using the IP and how they’re using it
Reduce risk: by knowing if you’re using the right version and knowing how it works

Beyond the commonsense reasons for IP traceability, very clear and well documented IP tracing is the price of admission for standards-driven design projects such as those required by ISO 26262 and MIL-STD-882.

With some motivation as to why IP traceability is important, Karim discussed the various stakeholders that would be involved in his live demonstration. There are three:

IP Owner: Reviews Jira tickets, modifies IP, releases new versions
IP Consumer: Selects the right IP, updates it as needed and integrates the IP into the design project
Design Manager: Reviews all aspects of IP updates to ensure the correct IP is being used, analyzes and addresses any conflicts, approves the design and propagates results

So, what could go wrong in the lives of these folks on a real project without the right methodology and tools? Lack of proper notification of IP changes to all the teams and projects that use the IP, inability to find and review all the changes made (especially for binary representations) and incomplete propagation of required changes to all those using the IP are just a few of the headaches one could face.

Karim then ran a series of live demos on a real IP update example from the point of view of the IP owner, IP consumer and design manager. ClioSoft’s SOS7 design management platform formed the backbone of the demo, along with integrations to other key tools like the Jira issue tracking system and the Cadence Virtuoso layout editing platform.

The demo began with the IP owner logging into Jira to find that an important IP enhancement was needed – reduce the finger count on a precision op amp. Logging into the ClioSoft SOS environment allowed the IP owner to find the IP and see all the design teams and projects that were using the IP. The IP owner then made the required changes in Virtuoso, which is integrated into the SOS environment. A new version of the IP was then checked back into SOS and the users of the IP were notified.

The IP consumer had many paths of notification for this change – it was actually hard to miss. This person then used additional analysis tools provided in SOS to examine the changes in the new version to make sure it was appropriate to update the instances. The changes were then reviewed by the design manager who identified an inconsistency in the use of ATPG in two blocks of the design. This was remedied with a quick query regarding available versions.

That’s a very short overview of the demo. I highly recommend you watch the live version; it shows a lot more details about the capabilities available to all stakeholders in the ClioSoft tools. You can access the webinar replay here. While you’re on the ClioSoft website, you can check out all of their products to support design data management and IP reuse. At my prior company, eSilicon, we were a ClioSoft customer and found their tools to work well and their customer support to be excellent.

Also Read

Best Practices for IP Reuse

WEBINAR REPLAY: AWS (Amazon) and ClioSoft Describe Best Cloud Practices

WEBINAR REPLAY: ClioSoft Facilitates Design Reuse with Cadence® Virtuoso®

May 7, 2020 | July 28, 2020

High-Level Synthesis and Open Source Software Algorithms

by Daniel Payne on 05-07-2020 at 10:00 am
Categories: CircuitSutra, Semiconductor Services

The DVCon conference and exhibition finished up in California just as the impact of the COVID-19 pandemic was ramping up in March, but at least they finished the conference by altering the schedule a bit. Umesh Sisodia, CEO at CircuitSutra Technologies, presented at DVCon on the topic Using High-Level Synthesis to Migrate Open Source Software Algorithms to Semiconductor Chip Designs, and I had a chance to review his presentation.

My first exposure to High-Level Synthesis (HLS) was back in 2005 when I worked at Y Explorations, Inc., a company that started out using VHDL or Verilog as the input language, then later focused on C input.

So why would an engineer choose to use an HLS approach over a more traditional RTL coding methodology? With a higher level of abstraction as an input, designers can separate design from implementation, use up to 10X less code, which reduces design effort, and benefit from 10 to 1,000X faster simulation speeds, making them much more productive.

Additional reasons to consider using an HLS flow:

One source, many implementations (ASIC, FPGA, eFPGA)
Optimize for wide array of Power, Performance and Area
Embedded SW engineers can use FPGAs
Lots of C/C++ tools available
Open Source algorithms can be re-used

The focus area of this blog is how to migrate the open source software algorithms to Verilog and accelerate these inside the semiconductor chips.

Many semiconductor companies are designing custom SoCs for emerging domains like Vision, Speech, Video / Image processing, 5G, Deep learning, etc. In these domains, lots of algorithms are already available as a software implementation, either as a free open source version, or the companies have their own software implementation.

In general, the software world has a huge code base available as free and open source code, most of which is widely used by the industry and is thoroughly verified. Many popular algorithms are available as an open source implementation, along with comprehensive reference test suites.

CircuitSutra is in the process of defining a robust methodology where an existing software implementation can be quickly implemented into silicon, creating a big game changer for the industry.

Engineers at CircuitSutra migrated the open source C implementation of a Sobel filter to Verilog using a High Level Synthesis design flow.

Sobel Filter Example

For computer vision and image processing applications there’s an edge detection algorithm called a Sobel Filter, and it’s found on GitHub as Open Source. The filter generates a 2D map of the gradient, and it finds the direction of largest increase from light to dark, and then rates the change in that direction. Here’s an example starting image shown on the left, the gradients, and the filtered result:
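
For readers who have not seen one, here is a minimal pure-Python sketch of a Sobel filter. It is not the GitHub implementation that CircuitSutra migrated; it uses the standard 3x3 Sobel kernels and, as a simplifying assumption, approximates the gradient magnitude as |Gx| + |Gy| instead of the square root of Gx² + Gy². Border pixels are simply left at zero.

```python
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # horizontal gradient kernel
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # vertical gradient kernel


def sobel(image):
    """Return |Gx| + |Gy| per pixel for a 2D list of integer pixel values."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            out[y][x] = abs(gx) + abs(gy)  # approximate gradient magnitude
    return out


if __name__ == "__main__":
    # A vertical dark-to-light edge: the interior pixels next to it light up.
    edge = [[0, 0, 255, 255] for _ in range(4)]
    for row in sobel(edge):
        print(row)
```

Note the fixed 3x3 loops and fixed-size kernels: that shape is what makes this kind of algorithm a natural candidate for hardware.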

HLS Flow

The team then modified the C code for the Sobel Filter to make it work with the synthesizable subset, then generated Verilog code using the Mentor Catapult tool.

There is a comprehensive and well defined set of guidelines for using a synthesizable subset of C / C++ / SystemC which needs to be followed. The important points of these guidelines are listed below:

HLS tools parse the code to extract the design intent, and the entire functionality should be extractable at compile time. Any functionality that is determined at run time cannot be extracted by the tool. Constructs must be unambiguous and of fixed size.
C functions synthesize into RTL blocks, and function arguments synthesize to RTL I/O. Arrays in the C code synthesize to memory: RAM / ROM / FIFO.
Datatypes of the variables impact the precision, area and performance of the RTL. A generic 32bit integer can be avoided if a 10 bit integer is sufficient. HLS tool vendors provide their own implementation of datatypes for usage in synthesizable code. The Algorithmic C datatypes (AC datatypes) from Mentor Graphics were used in this exercise.
The synthesizable code cannot use function calls from other libraries which are not synthesizable; you need to find a corresponding synthesizable library or implement it yourself. The math.h functions used in the code were replaced with the corresponding function calls from ac_math.h / ac_dsp.h from Mentor Graphics.

Not all C / C++ constructs can be synthesized. You should avoid dynamic memory allocation, OS system calls, function pointers, STL classes, non-const global variables, utility libraries, etc.
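
The point about datatypes is easy to quantify. The sketch below is plain Python that only models the arithmetic; a real HLS flow would use the vendor's bit-accurate types, which are not shown here. The helper names are hypothetical. The example range of -1020 to +1020 is my own assumption: it is the span of a single 3x3 Sobel gradient when pixels are 8 bits.

```python
def signed_bits_needed(min_value, max_value):
    """Smallest two's-complement width that can hold [min_value, max_value]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= min_value
               and max_value <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


def wrap_signed(value, bits):
    """Value a signed register of the given width would hold (wrap-around)."""
    mask = (1 << bits) - 1
    value &= mask
    if value >> (bits - 1):      # sign bit set, so interpret as negative
        return value - (1 << bits)
    return value


if __name__ == "__main__":
    print(signed_bits_needed(-1020, 1020))  # width needed for one Sobel gradient
    print(wrap_signed(1020, 11))            # fits, value preserved
    print(wrap_signed(1020, 10))            # one bit too narrow, value wraps
```

Sizing a variable this way rather than defaulting to 32 bits is what saves area, and the wrap-around case shows why the width has to be derived from the real value range and then verified.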

One of the benefits of the proposed methodology is to reuse the test suite of the original software implementation to verify the final RTL implementation.

Most of the time, the original software implementation will have a comprehensive functional test suite; if not, it is a good idea to start by creating one. At this stage, the code base is smallest and execution speed is fastest, so comprehensive functional verification requires minimal effort.

After refining the source code to make it compliant with the synthesizable subset, you re-use the same test suite to ensure that functionality is still intact. The synthesizable code is still C / C++ code which can be compiled using the gcc compiler, and does not require any specific tool set or specialized setup to take it through the original test suite. Some minor updates in the testbench may be required. It is good practice to use the original software implementation as the golden reference to verify the synthesizable implementation.

Next, you synthesize the refined implementation using the HLS tool to generate RTL. For the functional verification of the RTL it is advisable to re-use the same original test suite and use the original software implementation as the golden reference. So, you can very quickly ascertain that the resulting RTL is functionally correct. This kind of setup will require Verilog-C/C++ co-simulation, and Mentor Catapult provides the SCVerify flow for verification setup.

A testbench was created to validate that the algorithm is working properly, and that testbench can be used at both the C++ and RTL levels.
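
The golden-reference idea can be sketched in a few lines of Python. This is only an analogy for the flow described above: both functions are hypothetical stand-ins (a floating-point "original" and an integer-only "refined" version with a fixed loop count), and the real flow would compare C++ against RTL through co-simulation, which is not modeled here.

```python
import math
import random


def magnitude_reference(gx, gy):
    """Stand-in for the original software implementation (floating point)."""
    return math.sqrt(gx * gx + gy * gy)


def isqrt_fixed(n, bits=11):
    """Integer square root with a fixed trip count; valid for n < 2**(2 * bits)."""
    root = 0
    for i in range(bits - 1, -1, -1):
        trial = root | (1 << i)
        if trial * trial <= n:
            root = trial
    return root


def magnitude_refined(gx, gy):
    """Stand-in for the refined, integer-only implementation."""
    return isqrt_fixed(gx * gx + gy * gy)


def compare(reference, candidate, vectors, tolerance=1.0):
    """Run both implementations on the same vectors and collect mismatches."""
    mismatches = []
    for gx, gy in vectors:
        expected, got = reference(gx, gy), candidate(gx, gy)
        if abs(expected - got) >= tolerance:
            mismatches.append((gx, gy, expected, got))
    return mismatches


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the suite is repeatable
    suite = [(rng.randint(-1020, 1020), rng.randint(-1020, 1020))
             for _ in range(1000)]
    print("mismatches:", len(compare(magnitude_reference, magnitude_refined, suite)))
```

The same `suite` and the same `compare` call are reused unchanged for every candidate, which is the property that makes each later optimization iteration cheap to check.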

With the flow explained so far, software experts can easily take the original software implementation and generate functionally correct RTL, without requiring in-depth knowledge of RTL. They just need to understand the synthesizable subset.

The RTL generated with these steps will be functionally correct, but it will not yet be in a usable form, as it is not fully optimized for the specific target implementation (FPGA / ASIC / technology node) or for a specific target application requiring certain Power, Performance and Area metrics. To get the optimized RTL, you will have to experiment with the HLS tool directives and constraints, and may have to further refine the synthesizable code by using optimization directives or tool pragmas at the right places. It also requires restructuring the synthesizable code to capture some of the macro-architecture: registers, memory, interfaces, etc. This exercise requires a strong understanding of RTL, and cannot be done by software experts. The good news is that by now you already have a robust functional verification setup, and with each optimization iteration you can quickly ascertain that the implementation (C as well as RTL) is still functionally correct. This cycle of refine, optimize, verify continues until you get the final RTL that meets requirements.

Open Source HLS Libraries

Software developers generally have access to lots of free general-purpose libraries; however, these cannot be readily used in synthesizable code. You need to find a corresponding HLS library, or implement it yourself within the synthesizable subset. A few HLS libraries come bundled with HLS tools, and there are also a few open source HLS libraries available:

Nvidia MatchLib – SystemC/C++ library of common HW functions

Open source libraries from Xilinx
- Vitis Accelerated Libraries
- Vivado HLS library for FINN – Quantized Neural Network (QNN) using FINN
- Vivado HLS Tiny Tutorial – Algorithm, Interface and miscellaneous
- Vivado HLS Libraries for Networking
Open source libraries from Mentor Graphics
- HLSLIBS
Others

Xilinx provides an HLS tool named Vivado HLS that is widely used in the FPGA community to implement designs at a higher abstraction level using C / C++ / SystemC. The HLS libraries provided by Xilinx work with their HLS tool, but not with Mentor Catapult and other tools. The HLS libraries provided by Mentor Graphics work with Catapult only, so it is recommended to write the synthesizable code in a tool-independent fashion, so that the same code can be easily re-used across multiple projects targeted for different technologies (FPGA, ASIC / SoC). There are some minor differences in how the synthesizable code has to be written for different tools. The article ‘Porting Vivado HLS Designs to Catapult HLS Platform’ provides a good summary of the differences in writing code for Xilinx Vivado HLS and Mentor Catapult. The CircuitSutra team used these concepts to migrate some of the open source Vivado HLS libraries to work with Mentor Catapult.

CircuitSutra is also in the process of developing tool independent HLS libraries corresponding to widely used software libraries.

Advanced ESL Flows

The methodology under consideration opens the door for various advanced ESL flows that were mostly a wish list.

Apart from High-Level Synthesis, the other widely used use-case of ESL methodologies is virtual prototyping. Virtual prototypes are fast simulation models of SoCs and systems. These are used for pre-silicon embedded software development, SoC-level and system-level co-design and co-verification, automated unit testing of firmware, architecture exploration, etc. Virtual prototyping uses a CPU Instruction Set Simulator (ISS) along with IP models and memory models. The models for virtual prototypes are developed using SystemC, which is a C++ library.

There has always been talk in the industry of having a single model which can be used in virtual prototypes and also synthesized using an HLS tool to generate RTL. However, the two use-cases require different kinds of high-level code. HLS requires code which is compliant with the synthesizable subset, while virtual prototypes require code which can simulate as fast as possible. Virtual prototype models use concepts like Transaction Level Modeling (TLM) and loosely timed (LT) modeling, and can use any constructs of C and C++.

Starting with the same original open source software implementation, you can now create models for both High-Level Synthesis and virtual prototypes. While I explained how to make the code synthesizable, developing the model for a virtual prototype is even simpler: you just need to wrap the software implementation in SystemC and implement the transaction-level interfaces and other macro-architecture details of the IP like registers, memory, etc. The same test suite can be used for the verification of both models.

You can also add hybrid modeling by simulating parts of your design in virtual prototype, RTL simulation, FPGA chips or emulator boxes.

A virtual prototype enables verification at the SoC level using bare-metal tests, firmware and embedded applications. For maximum productivity and re-usability, you can move step by step. As a first step, run these tests on the pure virtual prototype having TLM models of IP. In the next step, you can replace the TLM version of a specific IP block with the synthesizable version of the C / C++ / SystemC implementation and verify it with the same test suite. Finally, through co-simulation you replace the specific IP block with the RTL implementation and verify it with the same test suite. With each step you are moving to a slower simulation, and the objective is to catch as many bugs as possible early in the cycle when simulation is fast.

The RTL IP has to be thoroughly verified using SystemVerilog and a UVM environment. The same environment can be used to further verify the TLM models and synthesizable models. This will ensure complete equivalence at all abstraction levels.

These advanced flows are also likely to enable the effective usage of the upcoming standard Portable Stimulus, which allows you to generate different flavors of test cases from the same verification intent.

Summary

HLS is a proven approach for ESL design, and moving up from RTL coding to HLS will give you time to actually explore the design space and make early trade-offs. Because SW is written in C and C++, you can simulate both SW and early HW together early, always a good thing instead of waiting for silicon to arrive. Virtual Platforms allow you to decide what goes into SW and HW.

Companies like CircuitSutra have deep experience using these ESL approaches to implement new design products quickly and correctly.

About CircuitSutra

CircuitSutra is an Electronics System Level (ESL) design IP and services company, headquartered in India, having development centers in Noida and Bangalore, and an office in Santa Clara, CA. It enables customers to adopt advanced methodologies based on C, C++, SystemC, TLM, IP-XACT, UVM-SystemC, SystemC-AMS, Verilog-AMS. Its core competencies include Virtual Prototype (Development, Verification, Deployment), High-Level Synthesis, Architecture & Performance modeling, SoC and System-Level co-design and co-verification.

CircuitSutra’s mission is to accelerate the adoption of ESL methodologies in the Industry.

CircuitSutra provides best-in-class ESL experts, who work as an extension of the customer’s R&D team, either remotely through an offshore development center (ODC) model or onsite at the customer location. CircuitSutra provides re-usable modeling IP and methodology that helps customers quick-start their modeling projects. It also provides specialized SystemC training that helps customers groom non-SystemC professionals to become virtual prototyping experts.

Related Blogs

System Level Flows for SoC Architecture Analysis and Design – DVCON 2020

May 7, 2020 | July 6, 2020

Ultra-Low Power Inference at the Extreme Edge

by Bernard Murphy on 05-07-2020 at 6:00 am
Categories: Eta Compute, IP

I wrote last year about Eta Compute and their continuously tuned dynamic voltage-frequency scaling (CVFS). That piece was mostly about the how and why of the technology, that in self-timed circuits (a core technology for Eta Compute) it is possible to continuously vary voltage and frequency, whereas in conventional synchronous logic it’s only possible to switch between a few discrete voltage and frequency options. You might think ‘self-timed, this must be about performance’ but in fact Eta Compute is pushing it for ultra-low power at the extreme edge in AI applications.

I haven’t talked with them in a while, so I confess I’m catching up. From what I see, it looks like they’ve found their sweet spot, a very targeted application in power constrained applications where some level of inference is required. They cite as examples intelligent sensing and/or voice activation and control in:

- Building: thermostats, smoke detector, alarm sensors
- Home consumer: washing machines, remote control, TV, earbuds
- Medical and fitness: fitness band, health monitor, patches, hearing aid
- Logistics: asset tracking, retail beacon, remote monitoring
- Factory: motors, industrial networks, industrial sensors

The most recent Eta Compute solution is realized in their ECM3532 neural sensor processor. This is a system on chip with an Arm Cortex-M3 processor and an NXP CoolFlux DSP, 512KB of Flash, 352KB of SRAM, and supporting peripherals. All of this is built with Eta Compute’s proprietary CVFS (continuous voltage frequency scaling) technology, operating near threshold voltage.

The dual-MAC DSP handles signal processing from sensors, feature extraction and inferencing. The MCU handles application software, control and networking. I’ve seen this combo in other products (though not built on CVFS technology) so it looks like an up and coming architecture to me.

Eta Compute’s benchmarking shows the Cortex MCU running at up to 10X lower power than competitive solutions across a wide range of temperatures and process corners. Even more important, they ran a range of neural net benchmarks: image recognition, sound recognition (e.g., glass breaking), motion sensing, always-on keyword recognition and always-on command recognition. In all cases they are running at a few hundred microamps and performing multiple inferences per second (up to 50 for motion sensing).

Overall, Eta Compute say they can already reduce power in AI at the extreme edge by a factor of 10. This is for published networks, not specifically optimized to extreme edge applications. They have been running trials with partners to further optimize networks and have already demonstrated an additional 10X increase in efficiency in image recognition through reducing operations by a factor of 10 and weight sizes by a factor of 2. Comparing that with a common MCU-only implementation, they claim 1000X higher efficiency.

At these numbers, intelligence at the extreme edge could become ubiquitous, even down to truly remote, coin-cell operated devices, asset-tracking devices, even energy-harvesting devices. Eta Compute don’t yet want to provide customer names, but it sounds like they have quite a few already in development.

Eta Compute recently released a white paper – Deep learning at the extreme edge: a manifesto – on their vision and technology. You can download the white paper HERE.

May 6, 2020 | July 6, 2020

Tech Shows up for COVID-19: Time to Expand Horizons

by Terry Daly on 05-06-2020 at 10:00 am
Categories: 5G, China, IoT

Bring digital technology solutions to bear on more of our toughest societal problems

“We are all in this together”. The world faces 250,000 COVID-19 deaths, each a tragic human story. The pandemic will bring a litany of “lessons learned” including lack of preparedness, slow response and uneven recovery. The rapid pivot by the political class from response to recrimination amplifies the tragedy and the economic pain from shutdowns. But a secondary story line holds hope for the future: the response by the technology sector (Tech) in rising to the challenge of this humanitarian crisis. The actions taken by Tech were immediate, generous, targeted and impactful. Thank you! Regrettably, there are equally perilous humanitarian crises hiding in plain sight. Each deserves from Tech the same spirit of collaboration and intensity of response as with COVID-19.

What a heartwarming response by Tech! Millions of dollars were donated to charitable organizations world-wide, along with tens of millions of masks and personal protective equipment provided to front-line medical staff. Real estate was made available for hospital overflow. Product shipments were prioritized and expedited for medical applications. Specific examples abound. AMD created a $15 million initiative to provide high-performance computing (HPC) platforms and resources to accelerate medical research. Intel carved out $50 million to fund access to its technology at medical points of care, speed scientific research and increase access to on-line education. NVIDIA offered free access to its Parabricks offering, enabling researchers to analyze genomic sequences 50 times faster.

Apple and Google joined to create a contact tracing app. Google provided free access to Hangouts Meet videoconferencing to support remote education. Microsoft helped the CDC develop a tool to assess COVID symptoms and suggest patient courses of action. One million messages per day were fielded helping doctors and nurses prioritize and provide care for those most directly in need. IBM led a public-private COVID-19 HPC Initiative with free supercomputer use, free access to its patent portfolio for COVID research and blockchain support to help governments and healthcare groups address supply shortages. These examples are merely illustrative of a much broader Tech response.

COVID has extracted an enormous toll, a loss of life and livelihood across the globe. But there are other humanitarian catastrophes hiding in plain sight. Much of the world at large seems to have become numb to these crises, to have developed a collective immunity from response. Poverty, hunger and a lack of access to clean water, shelter, education and basic health services extract annual death tolls far exceeding COVID. United Nations data shows that at least ten percent of the global population, over 700 million people, lack access to clean water and live in extreme poverty. Five million children die every year due to poor health services; 265 million children are out of school due to lack of access to education and the need to focus on survival. Political strife has created 70 million refugees.

With a COVID-awoken sensibility, can Tech mobilize to solve these intractable problems? Innovation is emerging to make it happen: AI, 5G, blockchain, IoT, quantum computing, autonomous transport, and others. COVID has accelerated our transformation into the digital economy at a pace unimaginable in December 2019, notably in on-line education, telemedicine and digital payments. In its pandemic response announcement, Intel said its “… technology underpins critical products and services that global communities, governments and healthcare organizations depend on every day. We hope that by harnessing our expertise, resources, technology and talents, we can help save and enrich lives by solving the world’s greatest challenges through the creation and development of new technology-based innovations and approaches.” Spot on! But the world needs Tech to move beyond “hope” and establish a concrete path to close massive inequality gaps and establish a truly inclusive global society.

The familiar Moore’s Law can provide inspiration. Although 50 years old, Moore’s law was futuristic, predicting a doubling in circuit density every two years. But it was not pre-ordained; rather it was achieved by innovation, hard work, collaboration, investment and commitment across Tech. Process technology “roadmaps” and targeted parameters pointed to the milestones necessary to stay on the curve. The industry made Moore’s Law a reality, and it continues to set audacious goals, invest, compete, collaborate, innovate, solve challenges and reward success. This is the approach needed to tackle society’s toughest problems. The private sector is best positioned to make it happen.

But is this the proper role for Tech and the private sector? Companies need a maniacal focus on product development, market validation, execution and scaling to be successful. Wealth creation is the incentive engine that drives success. Contributing to the larger societal benefits as envisioned here seems out of scope. How to proceed? Tech and Venture Capital can support passionate non-profit entrepreneurs with know-how, access to IP, funding and emerging digital technology to solve the toughest societal issues.

Take the example of “charity: water”, founded in 2006 by a non-Tech entrepreneur with the mission to bring clean and safe drinking water to every person living without it. His team established a technology-based digital marketing and fund-raising platform that matched donors to water projects. They established ecosystem partners for local implementation and a remote monitoring tool using IoT sensors and cloud computing technology to provide real-time data on water system performance, assuring sustainability of investment. By year-end 2019, 1 million donors contributed $450 million to over 51,000 water projects in 28 countries, ultimately providing more than 11 million people with clean, safe drinking water. Charity: water is inspiring, impactful and scalable. Imagine how much broader and faster the impact could be with concerted and sustained support from Tech!

The opportunity is now to bring the enormous power of digital technologies to tackle poverty, hunger, water, shelter, health and education with the same focus, purpose, partnership and investment that was brought to bear on COVID-19. Will Tech be all in this battle together?

Terry Daly is a retired semiconductor industry executive

May 6, 2020 | January 11, 2022

Reliable Line Cutting for Spacer-based Patterning

by Fred Chen on 05-06-2020 at 6:00 am
Categories: Lithography

Spacer-defined patterning is an expected requirement for advanced semiconductor patterning nodes with feature sizes of 25 nm or less. As the required gaps between features go well below the lithography tool’s resolution limit, cut exposures are used more often to separate features, especially in chips produced by TSMC or Intel, where “cut poly” and “cut metal” are applied [1,2]. However, line cutting introduces new concerns, such as placement error as illustrated in Figure 1.

Figure 1. The effect of line cut placement error is to increase the risk of arcing across the narrowest portion of the gap (right).

The cut itself is expected to be round when confined to very small spaces. This rounding leads to burrs or spurs at the cut locations. Moreover, the cut cannot be perfectly placed all the time, so the spurs narrow the gap toward one side. Consequently, unwanted arcing across the gap is likely. Fortunately, there are a number of ways to address this issue today.

Solution 1: Design rule/layout restrictions

The quickest way to avoid these issues is to have enough clearance for the cuts not to be rounded. This would mean the layout of Figure 1, with the gaps close to one another, would be forbidden and flagged as a Design Rule Check (DRC) violation [2]. On the other hand, some layouts such as DRAM active area (see Figure 2 for example) are more tolerant.

Figure 2. The cut for the DRAM active area shown here (18.4 degrees from vertical) is achievable by immersion single exposure with a phase shift mask for features < 0.5 wavelength/NA [3], for cut pitches of 80 nm and above. For smaller cut pitches, double patterning, e.g., spacer double patterning, would be required.
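To put a number on the 0.5 wavelength/NA threshold in the caption, here is a minimal sketch. It assumes typical ArF immersion values (193 nm wavelength, NA = 1.35), which the article does not state, so treat the result as a ballpark only. It also checks that the quoted 18.4 degree tilt is numerically what a 1:3 slope would give; that reading of the angle is an observation, not something the source claims.

```python
import math

# Assumed ArF immersion scanner parameters (not given in the article).
WAVELENGTH_NM = 193.0
NUMERICAL_APERTURE = 1.35


def half_wavelength_over_na(wavelength_nm: float, na: float) -> float:
    """Return the 0.5 * wavelength / NA feature-size threshold in nm."""
    return 0.5 * wavelength_nm / na


threshold_nm = half_wavelength_over_na(WAVELENGTH_NM, NUMERICAL_APERTURE)

# Angle from vertical of a line with a 1:3 horizontal-to-vertical slope.
tilt_deg = math.degrees(math.atan(1 / 3))

print(f"0.5 * wavelength / NA = {threshold_nm:.1f} nm")
print(f"atan(1/3) = {tilt_deg:.1f} degrees")
```

Under these assumed values the threshold lands a little above 70 nm, which is the same order as the 80 nm cut pitch quoted in the caption.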

Solution 2: Cut grid with selection mask

If the layout of Figure 1 must be used, then a different process must be used to ensure a straight edge cut. One possible approach is to use a cut grid with a selection mask [4]. This is illustrated in Figure 3.

Figure 3. Process sequence for cut selection from a pre-defined cut grid.

This approach requires three masks for the cut instead of one. The first two masks define the cut grid in an etch mask over the pre-patterned lines: the first mask defines a grid of lines perpendicular to the lines to be cut, and the second mask defines posts that separate the rectangular target cut locations. A third mask then selects the actual cut locations from the grid. The advantage of the cut selection approach is that the cut grid is already predefined with straight edge cuts. However, it does require more masks and process steps.
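The selection idea can be sketched in a few lines of code. This is only a toy model under my own assumptions: cut sites are represented as (row, line) index pairs, and the function names `build_cut_grid` and `select_cuts` are hypothetical, not taken from the cited work. It shows the key constraint, namely that the third mask can only pick from sites the first two masks already defined.

```python
from itertools import product


def build_cut_grid(cut_rows, line_indices):
    """Masks 1 and 2 (toy model): every (row, line) crossing becomes a
    candidate cut site with predefined straight edges."""
    return set(product(cut_rows, line_indices))


def select_cuts(cut_grid, selection):
    """Mask 3 (toy model): keep only the requested sites and reject any
    request that does not sit on the predefined grid."""
    selection = set(selection)
    off_grid = selection - cut_grid
    if off_grid:
        raise ValueError(f"cuts not on the predefined grid: {sorted(off_grid)}")
    return selection


# Three perpendicular grid rows crossing four pre-patterned lines.
grid = build_cut_grid(cut_rows=[0, 1, 2], line_indices=range(4))

# The selection mask opens two of the twelve candidate sites.
cuts = select_cuts(grid, [(0, 1), (2, 3)])
print(sorted(cuts))
```

The trade-off described above shows up directly: the cut shapes are fixed by the grid, so the selection step only has to be accurate enough to pick the right site.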

Solution 3: Self-aligned blocking (or cutting)

A reduction of masks is possible with the self-aligned blocking (SAB) approach [5]. In this approach, the spacer-defined lines are divided into two groups in an ABAB... fashion, where any two adjacent lines separated by spacers belong to different groups (A or B). Two different materials in the process flow are used to represent each group (Figure 4). These two materials are selected so that one may be etched without affecting the other. The spacers in between the two materials are also not etched. Consequently, a single cut drawn across five lines removes only the two lines of the selected material, and the longer cut length allows straight edges. There is one cut mask for removing A material only and one for removing B material only. Note that the cut masks may also make use of spacer-defined double patterning [6]. The emergence of SAB means that two masks (for A and B) will be used independently of wavelength.

Figure 4. Self-aligned blocking (or cutting) approach makes use of etch selectivity to avoid unwanted cutting of adjacent feature lines.
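The etch-selectivity argument can be illustrated with a small sketch. It is a simplified model under my own assumptions: lines are indexed from 0, even-indexed lines are material A and odd-indexed lines are material B, and the helper names `assign_groups` and `apply_cut` are hypothetical. A wide cut spanning several lines only removes the lines whose material matches the cut mask.

```python
def assign_groups(num_lines):
    """Alternate materials A and B across adjacent spacer-defined lines."""
    return ["A" if i % 2 == 0 else "B" for i in range(num_lines)]


def apply_cut(groups, span, target):
    """Return the indices of lines actually cut by a mask that is
    selective to `target` material and covers lines span[0]..span[1]."""
    start, end = span
    return [i for i in range(start, end + 1) if groups[i] == target]


groups = assign_groups(5)              # A, B, A, B, A

# One oversized cut drawn across all five lines on the B-only mask.
cut_b = apply_cut(groups, (0, 4), "B")

# The same footprint on the A-only mask hits the other lines instead.
cut_a = apply_cut(groups, (0, 4), "A")

print(groups, cut_b, cut_a)
```

In this model the B-mask cut across five lines removes exactly two lines, which is the case described in the text; the neighboring A lines are untouched even though the cut footprint overlaps them.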

References

[1] M. C. Smayling, V. Axelrad, “Simulation-Based Lithography Optimization for Logic Circuits at 22nm and Below,” SISPAD 2009.

[2] https://www.design-reuse.com/articles/45832/design-rule-check-drc-violations-asic-designs-7nm-finfet.html

[3] L. W. Liebmann, J-A. Carballo, “Layout Methodology Impact of Resolution Enhancement Techniques,” Proc. 2003 Intl. Symp. Phys. Design, 110.

[4] http://www.tela-inc.com/wordpress/wp-content/uploads/2012/05/SPIE-2013_8683.pdf

[5] https://www.researchgate.net/profile/Angelique_Raley/publication/316087783_Self-aligned_blocking_integration_demonstration_for_critical_sub-40nm_pitch_Mx_l5vel_patterning/links/59f08353aca272cdc7ca3200/Self-aligned-blocking-integration-demonstration-for-critical-sub-40nm-pitch-Mx-level-patterning.pdf

[6] US Patent 9240329, assigned to Tokyo Electron Limited, filed Feb. 17, 2015.

DFT Innovations Come from Customer Partnerships

by Tom Simon on 05-05-2020 at 10:00 am
Categories: EDA, Siemens EDA

There is an adage that says that quality is not something that can be slapped on at the end of the design or manufacturing process. Ensuring quality requires careful thought throughout development and production. Arguably this adage is more applicable to the topic of Design for Test (DFT) than almost any other area of IC development and verification. DFT is quite literally built into the design and involves every part of the chip. In addition to helping validate completed chips to ensure they meet their functional requirements, DFT is also used to feed yield and reliability statistics back into the design process to help prevent quality issues.

Detecting and preventing defects in ICs is a major area of research and technology development. Over the last 15 years DFT has been the topic of numerous academic papers and articles that rely on investigations and data derived from real-world designs produced by leading semiconductor companies. To help enable these investigations, and to apply their results to DFT tool and flow development, Mentor, a Siemens business, has worked with many of these companies on improved methods for DFT.

Mentor uses a partnership model for developing new and advanced features in their DFT tools. In a recent publication titled “How to Maximize Your Competitiveness in the Semiconductor Industry Using Advanced DFT” Mentor talks about how important these partnerships are and highlights some of the advances they have made possible.

One such example is how the traditional fault models dating back many years have been expanded with the cell-aware fault model that proves useful for understanding defects that can occur within cells. The Mentor publication references a jointly authored paper with AMD and ON Semiconductor that shows in detail how the cell-aware test approach can reduce defect rates.

The Mentor publication also discusses the origins of their Tessent TestKompress. Back in 2001-2002 as 130nm designs were being developed using copper interconnect it became necessary to run at-speed tests with new fault models. Mentor devised TestKompress to allow the new patterns to fit in limited tester memories. TestKompress has continued to evolve since then to help accommodate larger design sizes at smaller nodes and new test types that require more patterns. The paper cites their work with Broadcom in 2015 that resulted in an IEEE paper for DAC.

Mentor also talks about their work partnering with foundries, such as GLOBALFOUNDRIES, to improve physical failure analysis (PFA). Looking at large numbers of wafers and dies, scan diagnosis can help pinpoint defect locations and lead to understanding the failure/defect mechanisms. Scan diagnosis is used for both yield ramp and yield improvement. Mentor has also worked with fabless semiconductor companies, foundries and integrated device manufacturers to develop root-cause deconvolution (RCD) which uses AI algorithms to estimate the defect Pareto from volume diagnosis results in the presence of noise.

Another significant advancement that has been presented in an IEEE paper is Mentor’s hierarchical DFT. Performing DFT on smaller blocks individually and then wrapping them so each instance can be tested in-situ makes the process much more manageable. Mentor mentions how both Amazon and Samsung benefited from hierarchical DFT because of improvements in runtime, pattern count and compute resources. Hierarchical DFT is so compelling that it has become the standard practice.

The Mentor paper does a good job of discussing their unified Tessent Connect DFT environment for managing the entire DFT process. One of the innovations here is how it helps move parts of the DFT process into the RTL stage. They also touch on how they have helped their customers meet functional safety requirements, such as ISO 26262 for the automotive market. Built-in self-test (BIST) required new features to operate in these systems. Mentor worked with ARM to put together a safety ecosystem that works for these applications.

The Mentor paper includes references to many of the partnerships that have driven and proven their DFT offerings. It also makes clear that without this level of cooperation and collaboration it would not be possible to develop such a rich and full-featured solution for DFT. The full paper and its citations make interesting reading for those looking for a deep dive into their state-of-the-art test solutions.

May 5, 2020 | April 20, 2022

Accellera Tackles Functional Safety, Mixed-Signal

by Bernard Murphy on 05-05-2020 at 6:00 am
Categories: Accellera, Automotive, Events, Semiconductor Services

I managed a few meetings at DVCon this year in spite of the Coronavirus problems. One of these was with Lu Dai, Chairman of Accellera. I generally meet with Lu each year to get an update on where they are headed, and he had some interesting new topics to share.

Membership and headcount remain pretty stable. Any changes (at the associate level) are more in composition as some join for new areas, some drop out as their topic of interest wraps up. Since DVCon is Accellera’s show, Lu walked me through conference status by region. DVCon US happened, but was noticeably cut back since both Cadence and Synopsys dropped out. DVCon China was cancelled (surprise, surprise) and we’ll have to wait to see what will happen to the DVCon Europe show (Oct 27-28). These are strange times.

On a brighter note they have two new functional working groups, on functional safety and UVM AMS. Functional safety started as a proposed working group late last year and became a full working group in February. UVM AMS also moved pretty quickly from proposed working group to full working group. Lu said that the detail in the proposals and contribution from participants convinced them pretty quickly that both these topics were ready to be developed more fully.

Functional Safety
Functional safety work within Accellera is attracting a lot of support, sometimes from companies that haven’t been involved with Accellera before. Lu didn’t want to tell me who, but I did notice that Bosch is listed as an associate member. I’m told there are now 19 companies participating in the working group. I’m not surprised there is momentum behind standards here. ISO 26262 is famously good at defining what in general you must demonstrate, without getting into details of how you do that. Good for them but that leaves a lot of freedom, some of it unnecessary, in how builders comply. Adding more guidelines can only be a good thing.

Lu told me a big focus area is traceability of requirements between stages – architecture to implementation. He said he has personal experience of this need on one of his own projects. When they aim to certify functional safety, they’re asked to provide traceability. Right now they have no tool to automate that task, to trace between assets in the specs, their architecture, design and development flows.

Without automation and structure this is a lot of overhead and still comes down to trust. That’s OK when you’re dealing with someone you’ve worked with for many years, but not when you want to expand to new customers. Take testing as an example. You might start with a high-level spec in PDF. Then you write your test plan in a vendor tool flow, or maybe in Excel or some in-house format, then you implement the test plan, perhaps in Verilog. When you run the test, some of it runs in C on a processor for which you can’t get coverage, but you can get a logfile dump. Then you have to demonstrate fault coverage, say in an FMEDA. How do you convince your customer that their higher-level safety specs map down cleanly into what you have done? Even if you can, you’ve proven compliance for your tool flow but what happens if your customer wants to integrate components tested with other sets of tools? This is an area ripe for standardization and for automation.

UVM-AMS
On AMS, Lu told me he has an analog designer friend who expresses frustration that Accellera develops lots of “toys” for the digital folks but nothing for people like him. Apparently he’s not the only one concerned. According to Lu, pent-up demand culminated at DVCon Europe last year (there are a lot of analog designers in Europe). He said the analog voice is still a minority but it has become very consistent (and perhaps insistent).

Accellera held an open session in which some groups shared proposals and others expressed interest in participating in working groups. It became pretty clear there were concrete things that could be done. One point they were all clear on is that the methodology they propose should be language-independent, working for both SystemC and SystemVerilog. The next milestone is a whitepaper, intended to be released in October, though the virus and travel constraints might have something to say about that date.

Lu also briefly mentioned progress on PSS 1.1, not enough for me to comment since I missed the tutorial unfortunately. Generally, I’m happy to see they are working on some important areas!

May 4, 2020 | July 6, 2020

Webinar: Build Your Next HBM2/2E Chip with SiFive

by Mike Gianfagna on 05-04-2020 at 10:00 am
Categories: Events, IP, RISC-V, SiFive

I have been watching the trend for quite some time now that many advanced FinFET designs today are actually 2.5D systems in package. All of these 2.5D silicon interposer-based designs have high-bandwidth memory (HBM) stacks on board. Often there are multiple memory stacks in both 4-high and 8-high configurations. If you follow what’s been called the “more than Moore” revolution associated with 2.5D and 3D design, you know that HBM memory stacks essentially paved the way for this revolution. The HBM memory specification is alive and well with new, high performance versions available today and more in the pipeline.

That’s why an upcoming webinar from SiFive caught my eye. Entitled SiFive HBM2/2E IP Subsystem: Features and Integration Guidelines, this webinar will take you through everything you need to know about the latest HBM standards and how to add HBM memory to your next chip. If you’re even considering the benefits of HBM memory stacks, you need to attend this webinar. The event will be hosted on Thursday, May 14, 2020 from 9AM-10AM and 6PM-7PM, Pacific Daylight Time. One of those time slots should work for you and you can register for the SiFive webinar here.

If you’re still on the fence about attending, here is some information that should help. The event will cover three aspects of HBM-based designs:

Ketan Mehta, Director, SoC IP Product Marketing at SiFive will cover markets, applications and roadmaps
Pranav Kale, Staff Engineer, SoC IP Engineering at SiFive will cover the features of the new HBM2/2E standards
Ritam Das Adhikari, Manager, SoC IP Applications at SiFive will cover implementation guidelines for these technologies

I will have the honor of moderating the event. I’ve reviewed the presentation material with the SiFive team and I can tell you it’s quite complete. You should have an honorary degree in HBM design after attending. Let’s look at a few details to whet your appetite.

First of all, what is HBM and why do you need it? HBM memory stacks are actually very dense memory subsystems implemented with 3D packaging technology. The latest version of the specification is HBM2E. SiFive prepared a useful table below that summarizes what the latest HBM2E spec can deliver when compared with more traditional memory technologies. I would pay particular attention to the density, power efficiency and bandwidth rows.

If your application needs ultra-dense, high-performance memory, HBM is really the only practical path forward. SiFive offers an HBM2/HBM2E IP subsystem that provides the critical elements of an integrated HBM controller and HBM PHY that support both the HBM2 and HBM2E standards in multiple fab technologies. Below is a table summarizing the substantial technology and support offered by SiFive and their HBM2/2E subsystem. Note that CoWoS stands for chip-on-wafer-on-substrate, a 2.5D technology offered by TSMC.

The webinar goes on to cover popular applications for HBM technology, including high-performance computing, AI training/inference and networking. SiFive’s experience in these areas, as well as their technology roadmap are presented. You will also get a detailed overview of the extensive features and options supported by the SiFive HBM2/2E IP subsystem.

The webinar concludes with an overview of implementation guidelines for your next HBM design. The following items are all addressed:

Key implementation guidelines
HBM2/2E bump map
HBM2/2E PHY orientation in ASIC
Loopback support for testability
DFT methodology debug tools
Collateral available
Support infrastructure

In short, everything you need to embark on an advanced 2.5D HBM-based design. There will also be a Q&A session with the presenters moderated by yours truly. As I’ve said, if you’re even thinking about HBM for your next design, you need to attend this webinar. You can register here. I hope to see you there.