IT-industry marvels like augmented reality and artificial intelligence, which epitomized technological utopianism in the science-fiction movies of the 1970s and 1980s, are here now, enabled by a machine-learning technique called deep learning.
Continue reading “How to Make Smartphone Even Smarter? With Deep Learning”
Free Copy of EDAgraffiti!
Last month we offered a free PDF version of our book “Fabless: The Transformation of the Semiconductor Industry” for the greater good. More than thirteen thousand people have downloaded it thus far, so we would like to keep the momentum going with another book giveaway. Paul McLellan has graciously offered his book on EDA, so here we go again:
Continue reading “Free Copy of EDAgraffiti!”
Eliminating the Chasm of Computing
The world has come a long way, from the first UNIVAC computer in 1951 and the IBM mainframes and minicomputers in secured computer rooms to the laptops, tablets, mobile phones, and so on in our hands. Imagine the compute power of a minicomputer then and the compute power of your smartphone or tablet today. And do you know the cost of the initial DEC 12-bit PDP-8 in 1964? It was more than USD 16,000. There were only a few computer companies, and computers were accessible to a lucky few. My own journey started in the 1980s with a computer lab of mainframes and IBM desktops. I had to book a slot of a few hours of computer time, which was generally available only in the evening or at night.
The basic principle of computing with a CPU, memory, and I/O interfaces still prevails. What has changed is the efficiency of processing, power, interaction, mobility, and so on for all of these components, along with the large number of other components integrated on what we call an SoC. In the midst of this enormous change, which is not yet complete, many companies had to cross their own chasms. But who were the heroes who unleashed that change, and what is being done to eliminate the chasm of computing itself?
Compare the Apollo Guidance Computer of the 1960s with today’s Cypress SoC with an ARM Cortex-M0 core. Today’s SoC is millions of times smaller in weight, size, and power consumption. However, we are still very far from the ultimate. So far, a substantial portion of things has evolved, but the connection between all of those things, what we call IoT or IoE, is yet to take place. It’s a natural progression, just as the internet came after computers. But what does that mean? It means computers everywhere. Along with the evolution of IoT, we also have to fix many things to process and manage that explosion of data. Companies are struggling to manage the expense of maintaining their data centers and paying their power bills. A lot of power is still being wasted due to poor management of public amenities across the world.
Coming back to what has made such a transformation possible and what more is being done to reach the ultimate: it’s the power of collaboration between companies, each diligently leading its own space. The transformation has taken us from a few computer companies to a large ecosystem of IP companies, fabless design companies, IDMs, and pure-play foundries. While the foundries have excelled in bringing process technologies down to the few-nanometer scale, there has been tremendous progress in the semiconductor design industry, with a large number of IP providers and SoC companies. ARM is one of the earliest initiators of the IP-based business model for chip and SoC design. It’s in their philosophy not to manufacture chips but to enable chip manufacturers and fabless designers to build the best-optimized chips by providing ARM’s most efficient processor cores and other building blocks in terms of power, performance, density, cost, interaction, and so on. Today, ARM IP (CPU, GPU, multimedia, interconnect, software, etc.) goes into most consumer and other electronic products; more than 60 billion chips have ARM IP inside them. Under different licensing schemes, ARM’s products have long lifecycles ranging to 20+ years, about as long as ARM itself, which came into existence in November 1990. They will soon be celebrating their silver jubilee!
Now, in the realm of IoT, what is required to proliferate computing everywhere, bridging the chasms between endpoints, hub, and cloud? It requires mobile devices with many sensors integrated into the system, with intelligent software, data security, integrated RF, very high energy efficiency, low weight and size, and cost brought down to a couple of cents.
ARM® Cordio Radio Core IP provides on-chip radio connectivity to endpoint devices that can operate at 1 V or lower, consuming ultra-low power that can extend battery life by a number of years.
What more is required? We require a new class of memory with high endurance, low cost, and high performance with low latency. Samsung, SK hynix, Micron, and others are coming up with new types of memory such as STT-RAM, phase-change RAM, ReRAM, and so on. Seamless integration of the various components is also required: on a single chip as much as possible, and then in a single package including flash, MEMS, and other discrete components.
What about the cloud? This needs a new architecture too: a distributed and intelligent architecture that moves data to where it is consumed, thus eliminating a lot of overhead. The data center servers also need high performance for accelerated storage and energy efficiency to save power. Although Intel is the undisputed leader in processors for data center servers, a new breed of server processors with ARM cores inside is on the horizon. Cavium’s ThunderX™ uses the 64-bit ARMv8 architecture for workload-optimized cloud processing. AppliedMicro also leads in the ARM server CPU market, and most recently Qualcomm has prototyped an ARM server CPU that is claimed to be power-efficient, high-performance, and cost-efficient.
There is no slowdown in the evolution of newer technologies, architectures, methodologies, software, and so on. There is an interesting presentation by Simon Segars, CEO of ARM Holdings, given as part of the “View from the Top” lecture series at Berkeley Engineering. It is freely available on YouTube HERE.
Simon provided an in-depth view of the technological progress and of what ARM is doing to unlock newer opportunities and newer markets. Today, the ARM Connected Community has 1000+ partners, which puts the company at the center of intense collaboration with design companies, foundries, EDA tool providers, IP providers, and other service providers.
Access to technology has become cheap; development boards with ARM processors can be obtained for a few tens of dollars and can run Linux, Android, or Windows. The momentum behind IoT has produced several open platforms for IoT development and innovation. ARM® mbed is the IoT platform promoted by ARM, and there are other IoT platforms promoted by Intel, Samsung, Google, and others. The Raspberry Pi is being used by school students in space projects promoted by the UK Space Agency.
Why Your IP Release Methodology Can Make or Break Reuse Success
When the term IP first came into popular usage for IC design, it was primarily conceived as blocks of design content that were bought occasionally from external sources. A customer might use one or two in a design, and expect one delivery with perhaps some minor updates before tapeout. Over the last 18 years, this notion has changed quite a bit as design complexity has increased and the level of integration required for SOC development has risen dramatically.
What have we learned in the intervening years? For one thing, IP is frequently developed internally as well as externally, so IP management and release systems are not just for so-called IP providers anymore. Furthermore, the necessities of design reuse have formalized a lot of the processes around sharing design blocks internally. It is easy to think that simply freezing a design block and handing it off is still a workable method of releasing IP. However, this is far from the case.
Methodics recently posted an article on their website that peels back the onion on the issues involved in building a truly effective way of developing and releasing IP. Here are some of the key messages in this article.
Because IPs are essentially design blocks, they possess a variety of views, some of which have their own independent release criteria. Good examples are documentation, schematics, RTL, layout, test files, generated views for external processing steps, etc. The list of view types is vast. Each view type can have its own release schedule and dependencies. Lumping them all together in the IP management system can lead to inefficient, coarse-grained data management.
Internally, during development of an IP, a variety of people will work on it. They may each work in different domains and have no need to populate their workspaces with all of the files in the IP. They need a way to selectively populate their workspaces with just the entities they need to do their jobs.
Because different view types will inevitably pass regression tests out of sync with each other, IP developers will often need to have selected versions of certain views in their workspaces so they can work with known-good versions. This may vary from user to user. Additionally, in some cases designers may want the latest version of the layout data, which is in flux, but also reference a specific version of the schematic, for instance. Just grabbing a single release of the physical design view will not accomplish this.
It gets even more interesting when you add in dependency checks to ensure that the data all represents a consistent flow from start to finish. It can be a big problem if the RTL is newer than the gate-level data, for instance, or if the verification output is older than the design data it is derived from.
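To make the dependency idea concrete, here is a minimal sketch in C of the kind of check a release system performs: a view derived from another view should never be older than its source. The names and structure are illustrative assumptions, not the Methodics API.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical sketch, not the Methodics API: each view of an IP carries its
 * own release label and the time it was last built/released. */
typedef struct {
    const char *view_name;   /* e.g. "rtl", "gate", "verif" */
    const char *release;     /* e.g. "v2.3"                 */
    time_t      built;       /* when this view was generated */
} ip_view_t;

/* A derived view (gate-level netlist, verification results, ...) is only
 * consistent if it was built no earlier than the source view it depends on. */
static int view_is_consistent(const ip_view_t *source, const ip_view_t *derived)
{
    return difftime(derived->built, source->built) >= 0.0;
}

int main(void)
{
    ip_view_t rtl  = { "rtl",  "v2.3", 1700000000 };
    ip_view_t gate = { "gate", "v2.2", 1690000000 };  /* built before the RTL */

    if (!view_is_consistent(&rtl, &gate)) {
        printf("WARNING: %s %s is stale relative to %s %s\n",
               gate.view_name, gate.release, rtl.view_name, rtl.release);
    }
    return 0;
}
```

A real system tracks these relationships per view type and per release, rather than per file, which is exactly the fine-grained bookkeeping the article argues for.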
At the end of the article, Methodics discusses how they have implemented their MDX_VIEWS to allow all the views needed for an IP release to be set up dynamically and easily, in such a way that all of these needs are met and the data is consistent. I won’t spoil this for you, but instead refer you to the article on the Methodics website.
What is clear from reading their blog is that they have encountered many real-world, large-team design problems and have been able to devise features in their software that effectively accommodate these needs.
Simulating Full-System EMI for a Car in Just 28 Minutes
While there’s a lot of cool technology in modern semiconductors, it’s important to raise our sights periodically to understand how well these chips will work in the systems for which they are designed. One area driving a lot of semiconductor growth is automobile electronics. We’ve had drive-train control forever it seems, but now that is enhanced to provide user-selectable ride-control. More recently we have seen advanced driver assistance systems, high-end infotainment options, mobile hot-spots, Bluetooth-enabled links to smart phones and more. But all that fancy electronics won’t be very useful (and might well become dangerous) if disrupted by external and internal EM interference. One big concern is the antenna represented by as much as 5 kilometers of interconnect in modern cars, which can both receive and transmit EM. Multiple radio sources from phones, tablets and Bluetooth devices inside or outside the car add to EMI concerns.
Amazingly (ha-ha), there are standards for automobile electronics beyond ISO 26262. ISO 11451-2 defines expectations for minimum EMI immunity in vehicle electronics. This requires measurements in and around the complete vehicle in an anechoic chamber. An antenna in the chamber generates RF waves swept in strength and frequency over specified ranges, and interference is monitored at selected test points in the car. Final testing must follow this methodology, but it is an expensive approach to require for experimentation during the design phase.
As always, simulation is the optimum approach for design, but this is a complex analysis problem. You have to model the electronics and cabling, the car frame, and the EM sources. ANSYS® pioneered in their HFSS software a Finite Element Method (FEM) simulation approach called the “domain decomposition method” (DDM), which makes it possible to parallelize the analysis across multiple machines. This itself was a big step forward in making numerical simulation a practical alternative to chamber testing.
For those of you who don’t know this domain so well, HFSS is the industry standard for full-wave electromagnetic field simulation. It is used extensively for design of antennas, communications systems, radar systems, satellites, smart phones, tablets, and transportation electronics. It also integrates with ANSYS multiphysics solutions so you can, for example, combine electromagnetic simulations with thermal simulations.
ANSYS recently introduced a refinement to HFSS called FE-BI, using an integral equation approach to truncate boundaries around EM emitters and around the car; this greatly reduces the space that must be considered in analysis while preserving the accuracy of a complete FEM analysis. The impact of that reduction in a joint test with Fiat-Chrysler was a 10-fold increase in performance and a 10-fold reduction in memory demand. A realistic simulation of an external antenna transmitting from a distance to a car frame completed in just 28 minutes and calculated electric fields matched closely with a full FEM analysis.
Fiat-Chrysler engineers were also able to model the effect of a wiring harness as a transmitter inside the car, allowing them to determine the effects of coupling between the harness and a connected PCB. What impresses me about all of this analysis is the scale and detail it comprehends – from PCBs and individual wire connections, through the full car frame, to external EM sources some distance away from the car. Designing for the complexity we demand today in auto electronics, while still being able to model EMI to a level that allows detailed design decisions to optimize immunity, is a high order of modeling.
To learn more about HFSS-BI and the Fiat-Chrysler experiments, click HERE.
ARM Keil Ecosystem Integrates Atmel SAM ESV7
Even the best system-on-chip is useless without software, just as the best-designed software needs hardware to flourish. The “old” embedded world has exploded into many emerging markets like IoT and wearables, and automotive is no longer restricted to motor control or airbags, as innovative products from entertainment to ADAS are being developed. What is the common denominator of these emerging products? All of them require more software functionality and fast algorithms with deterministic code execution, and consequently innovative hardware supporting these requirements, like the ARM Cortex-M7 based Atmel SAM ESV7.
ARM has released Keil MDK, a complete software development environment for a wide range of ARM Cortex-M based microcontroller devices. Keil is part of ARM’s wide ecosystem, allowing developers to speed a system’s release to market. MDK includes the µVision IDE/debugger, the ARM C/C++ compiler, and essential middleware components and software packs. If you are familiar with the stacked description of a run-time environment, you can recognize the various layers. Let’s focus on “CMSIS-Driver”. CMSIS is the standard software framework for Cortex-M microcontrollers, extending the SAM ESV7 chip library with standardized drivers for middleware and generic component interfaces.
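For a feel of what the CMSIS-Driver layer looks like from the application side, here is a minimal sketch using the standard CMSIS USART driver API. The driver instance name (Driver_USART0) and the baud-rate settings are assumptions that depend on the specific SAM device pack and board, so treat the specifics as illustrative rather than copy-paste ready.

```c
#include "Driver_USART.h"

/* Assumption: the device software pack exports this driver instance; the exact
 * name (Driver_USART0, Driver_USART1, ...) depends on the SAM part and board. */
extern ARM_DRIVER_USART Driver_USART0;
static ARM_DRIVER_USART *uart = &Driver_USART0;

static void uart_event(uint32_t event)
{
    if (event & ARM_USART_EVENT_SEND_COMPLETE) {
        /* transmission finished; signal an RTOS flag or set a ready bit here */
    }
}

void uart_demo(void)
{
    uart->Initialize(uart_event);                /* register the event callback  */
    uart->PowerControl(ARM_POWER_FULL);          /* clock and power the USART    */
    uart->Control(ARM_USART_MODE_ASYNCHRONOUS |  /* 115200 baud, 8-N-1           */
                  ARM_USART_DATA_BITS_8 |
                  ARM_USART_PARITY_NONE |
                  ARM_USART_STOP_BITS_1, 115200);
    uart->Control(ARM_USART_CONTROL_TX, 1);      /* enable the transmitter       */
    uart->Send("hello\r\n", 7);                  /* non-blocking send            */
}
```

The value of the standardized interface is that the same application code runs on any Cortex-M device whose pack supplies a CMSIS-Driver implementation.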
By definition, an MCU is designed to address multiple applications, and the Atmel SAM ESV7 is dedicated to supporting performance-demanding and DSP-intensive systems. Thanks to its 300 MHz clock, the SAM ESV7 delivers up to 640 DMIPS, and its DSP performance is double that available in the Cortex-M4. A double-precision floating-point unit and a dual-issue instruction pipeline further position the Cortex-M7 for speed.
Let’s review some of the applications where the SAM ESV7 is the best choice.
Fingerprint Module
The goal is to provide a human biometric authentication module for office or house access control. The key design requirements are:
- 300+ MHz CPU performance to process the recognition algorithms.
- Image Sensor Interface to read raw finger image data from finger sensor array.
- Low cost and small module size: on-chip flash/memory to reduce BOM cost and module size.
- Memory interface to expand the module with external memory, just in case.
The performance requirement and the need for an image sensor interface are essential, but what will make the difference is offering both a lower BOM cost and a smaller module size than the competition. The SAM S70 integrates up to 2 MB of embedded flash, twice as much as its direct competitor, which may allow the BOM and module size to be reduced.
Automotive Radio System
Every cent counts in automotive, and OEMs prefer using an MCU over an MPU, primarily for cost reasons. Building an attractive radio for tomorrow’s car requires developing very high-performance DSP algorithms. Such algorithms used to be developed on expensive standard DSP parts, leading to a large module size including external flash and an MCU, and obviously to a heavy BOM. In a 65nm embedded-flash process device, the Cortex-M7 can achieve a 1500 CoreMark score while running at 300 MHz, and its DSP performance is double that available in the Cortex-M4. This DSP power can be used to manage 8 channels of speaker processing, including 6 stages of biquads plus delay, scaler, limiter, and mute functions. This workload uses only 63% of the Atmel SAM S71’s CPU, leaving enough headroom to support an Ethernet AVB stack, very popular in automotive.
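To give a flavor of what one of those biquad cascades looks like in code, here is a minimal sketch using the CMSIS-DSP library that ships alongside MDK. The block size and the coefficient values are placeholders, not the actual speaker-processing design.

```c
#include "arm_math.h"   /* CMSIS-DSP */

#define NUM_STAGES  6    /* 6 biquad stages per channel, as mentioned above */
#define BLOCK_SIZE  64   /* samples processed per call (illustrative)       */

/* 5 coefficients per stage: b0, b1, b2, a1, a2 (pass-through placeholders) */
static float32_t coeffs[5 * NUM_STAGES] = {
    1.0f, 0.0f, 0.0f, 0.0f, 0.0f,
    1.0f, 0.0f, 0.0f, 0.0f, 0.0f,
    1.0f, 0.0f, 0.0f, 0.0f, 0.0f,
    1.0f, 0.0f, 0.0f, 0.0f, 0.0f,
    1.0f, 0.0f, 0.0f, 0.0f, 0.0f,
    1.0f, 0.0f, 0.0f, 0.0f, 0.0f
};
static float32_t state[4 * NUM_STAGES];     /* 4 state variables per stage */
static arm_biquad_casd_df1_inst_f32 filter;

void channel_init(void)
{
    arm_biquad_cascade_df1_init_f32(&filter, NUM_STAGES, coeffs, state);
}

/* Run one block of one speaker channel through the 6-stage cascade;
 * the Cortex-M7 FPU and dual-issue pipeline keep this inexpensive. */
void channel_process(float32_t *in, float32_t *out)
{
    arm_biquad_cascade_df1_f32(&filter, in, out, BLOCK_SIZE);
}
```

One such cascade per speaker channel, eight channels in total, is the kind of workload the 63% CPU-loading figure refers to.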
One of the secret sauces of the Cortex-M7 architecture is that it provides a way to bypass the standard memory hierarchy using “tightly coupled memories,” or TCMs. There is an excellent white paper describing the TCM implementation in the ARM Cortex-M based Atmel SAM S70/E70 series, “Run Blazingly Fast Algorithms with Cortex-M7 Tightly Coupled Memories” by Lionel Perdigon and Jacko Wilbrink, which you can find here.
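The white paper covers the details, but the basic usage pattern is simply to place the hottest code and data into the TCM regions. Below is a minimal sketch using GCC-style section attributes; it assumes the project’s linker script maps sections named .itcm and .dtcm to the instruction and data TCMs and that the startup code copies .itcm from flash at boot, since the actual section names and setup vary by toolchain and software pack.

```c
#include <stdint.h>

/* Assumption: the linker script places ".dtcm" in the data TCM and ".itcm" in
 * the instruction TCM, and the startup code copies .itcm from flash at boot.
 * Section names and initialization differ between toolchains and projects. */

/* Audio working buffer in the data TCM: deterministic, cache-miss-free access. */
__attribute__((section(".dtcm")))
static float sample_buffer[1024];

/* The inner-loop kernel in the instruction TCM so fetches never stall on flash. */
__attribute__((section(".itcm")))
void scale_block(float gain, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        sample_buffer[i] *= gain;
    }
}
```

Because TCM accesses do not go through the cache or the bus matrix, code and data placed this way execute with deterministic, single-cycle-class latency, which is exactly what the DSP-style workloads above need.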
More about Keil MDK
Getting started with Atmel SAM V7 Video
From Eric Esteve from IPNEST
Three New Things from ITC this year
The NFL has its annual Super Bowl contest each year, EDA vendors attend DAC, and the test folks attend ITC, which was in Anaheim a few weeks ago. I’ve marketed ATPG, BIST, and DFT tools before, so I like to keep updated on what’s happening at conferences like ITC. Robert Ruiz from Synopsys spoke with me by phone to provide an update on three new things that they shared at ITC this year.
- New ATPG Technology
- ISO 26262-5 certification (automotive functional safety standard)
- Improved support of FinFET and emerging node tests
New ATPG Technology
You can only enhance EDA software so much by adding incremental new features; sometimes you have to start over from scratch in order to achieve something great. In this case the engineers at Synopsys rewrote their ATPG algorithm to produce results that complete some 10X faster than before. Just think about that: having to wait either 10 hours or just 1 hour. Of course, we all want the 1-hour result because of the productivity gains it provides.
This faster ATPG engine takes more effective advantage of all the cores in your machine. In the previous ATPG algorithm each core would read in the entire design, while with the new algorithm the paths for each fault are distributed to an available core, using far less RAM. Here’s a look at some actual ATPG run-time speed-up results across five different designs:
The time you spend diagnosing where faults are located in actual silicon is also reduced. This improvement helps foundries and IDMs bring up new process nodes more quickly.
Another benefit of the new ATPG engine is that the quality of the patterns has improved, so you can expect a reduction in the number of test patterns by around 25% as shown below:
ISO 26262-5
The automotive market is adopting an increasing amount of semiconductor content with each new generation of vehicles, and this market has a well-defined requirement for functional safety as defined by the ISO 26262 certification standard. The actual design process and EDA software must meet the ISO standards. To that end, the following Synopsys EDA tools have achieved functional safety certification:
- TetraMAX ATPG
- DFTMAX Ultra
- DesignWare STAR Memory System
- STAR Hierarchical System
FinFET and Emerging Node Tests
With FinFET transistors there are new process defects that require new fault models like:
- Metal shorts
- Open vias
- Resistive shorts
- Resistive opens
- Litho induced defects
You really need cell-aware faults based upon transistor-level defects found inside of the cells.
You can now perform cell-aware characterization using a SPICE tool like HSPICE in about one day, which is another 10X speed improvement.
Summary
The faster, more efficient ATPG algorithms from Synopsys are certainly going to make test engineers more productive in their jobs. Complying with the ISO standard for automotive safety will be another benefit of using the Synopsys DFT tools. Finally, you can be confident that silicon defects in FinFET designs are properly modeled.
The big question is, when can I get hold of this new technology? Early customers should get first use of it by the end of the year.
Related Blogs
Why FPGA synthesis with Synplify is now faster
The headline of the latest Synopsys press release drops quite a tease: the newest release of Synplify delivers up to 3x faster runtime performance in FPGA synthesis. In our briefing for this post, we uncovered the surprising reason why – and it’s not found in their press release. Continue reading “Why FPGA synthesis with Synplify is now faster”
FinFET Reliability Analysis with Device Self-Heating
At the recent TSMC OIP symposium, a collaborative presentation by Synopsys and Xilinx highlighted the importance of incorporating the local FinFET device self-heating temperature increase on the acceleration of device reliability mechanisms.
Continue reading “FinFET Reliability Analysis with Device Self-Heating”
How Virtualization Makes Network Processor Verification Efficient
When Ethernet was introduced in 1983 it ran at 10Mbps and mostly relied on hubs and coaxial cable. Twelve years later a faster speed was introduced, running at 100Mbps. Since then we have seen an acceleration of new data rate introductions. According to the Ethernet Alliance, Ethernet could have 12 speeds before 2020, with 6 of those new speeds introduced in the next 5 years.
At the high end, 25G, 50G, and 100G data rates are necessary to support the massive growth of internet traffic. The first networks in the ’80s used hubs with very simple circuits and tens of ports. Today’s routers and switches have sophisticated network processors that provide numerous services to maintain throughput and manage traffic. The complexity of networking chips is second only to that of CPU and GPU chips. The number of ports is climbing beyond 256, heading toward 1024.
I find it fascinating that the compute power of the network itself often exceeds the compute power of the things we are connecting to it. The number of gates required is well into the hundreds of millions. With the increasing complexity of network processors comes a need for increasing verification performance. HDL simulators can run at effective speeds of ~100Hz. That translates into only 1,000 packets per day, which is not enough to permit adequate functional verification prior to tape out. The cost of a missed bug is potentially millions of dollars.
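Those throughput numbers can be sanity-checked with a quick back-of-the-envelope calculation using only the figures in the article; the cycles-per-packet value and the implied emulator clock below are derived, not quoted from Mentor.

```c
#include <stdio.h>

int main(void)
{
    const double sim_hz          = 100.0;    /* effective HDL simulation speed (article) */
    const double seconds_per_day = 86400.0;
    const double sim_packets_day = 1000.0;   /* packets/day in simulation (article)      */
    const double emu_packets_day = 11e6;     /* packets/day on the emulator (article)    */

    /* Design cycles implied per packet: roughly 8,640. */
    double cycles_per_packet = sim_hz * seconds_per_day / sim_packets_day;

    /* Effective emulator clock implied by 11M packets/day: roughly 1.1 MHz. */
    double emu_hz = emu_packets_day * cycles_per_packet / seconds_per_day;

    printf("cycles/packet ~ %.0f, implied emulator speed ~ %.2f MHz\n",
           cycles_per_packet, emu_hz / 1e6);
    return 0;
}
```

In other words, the four-orders-of-magnitude jump in packets per day corresponds to the familiar gap between simulation speeds in the hundreds of hertz and emulation speeds around a megahertz.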
In-circuit emulation (ICE) in a verification lab has been the fallback for many companies to achieve better verification. However, this still is not an ideal solution. Huge rat’s nests of cables are required. An engineer often needs to be physically present to set up, fix cables, and run tests. Unfortunately, the need to run tests with the system software running on the system hardware remains as critical as ever.
Taking a page from the virtualization trend seen in data centers, Mentor Graphics decided to implement what they call a VirtuaLab for verifying next generation protocols in a tractable fashion. The key to this is using their Veloce hardware emulator to accelerate simulation speed to be able to simulate over 11 million packets per day. Veloce offers visibility and traceability that is not available in ICE approaches. This means that when bugs are found, they are more easily tracked down.
The traditional deal breaker for emulation is capacity. To solve this problem Mentor took the approach of designing their own ASIC, called Crystal 2, for their emulator. Higher capacity per chassis means a cleaner setup in the data center. Their largest configuration supports running designs with 2 billion gates. No tangles of wires connecting boxes are needed. The Ethernet testers can run on a regular workstation and can drive many ports per workstation.
The VirtuaLab is not new, but Mentor just announced support for the latest and highest speed Ethernet specifications, 25G, 50G and 100G. They also provided a lot of customer testimonial information to illustrate the efficacy of the Veloce emulation solution for network chip design.
I had a chance to listen to Jean-Marie Brunet, Marketing Director for the Emulation Division at Mentor Graphics, discuss the new features in Veloce VirtuaLab. During the presentation he discussed two compelling case studies. The first was Juniper Networks using Veloce to verify a network switch with multiple 10G ports. They were able to run billions of transactions, with packet sizes from 64 bytes up to jumbo frames, across all ports. They went from ¼ frame per second with simulation to ~3,662 frames per second with Veloce. The second case study featured Cavium, which was able to go from 1,000 packets per day up to 11 million by switching to Veloce.
Mentor continues to show its prowess in the emulation arena. It’s good to see how emulation can be used to dramatically improve pre-silicon verification of complex hardware-software systems.

