One of the most exciting recent developments in low power design and verification is the successive refinement flow developed by ARM® and Mentor Graphics®.
Continue reading “Shifting Low Power Verification to an IP to SoC Flow”
Do 8 Cores Really Matter in Smartphones?
As the smartphone industry has begun to mature, one-upmanship among smartphone manufacturers and SoC vendors has bred a dangerous trend: ever-increasing processor core counts and the association between increased CPU core count and greater performance. This association originated as SoC vendors and OEMs have tried to find ways to differentiate themselves from one another through core counts. Some vendors are creating confusion, as phones today have core counts from 2 up to 8 and vary wildly in performance and, even more importantly, experience. One reason for this confusion is many users and reviewers have used inappropriate benchmarks to illustrate smartphone user experience and real world performance. As a result, we believe that some consumers are misled in their buying decisions and may end up with the wrong device and the wrong experience.
The 8 Core Myth…
The 8 Core Myth, also known as the Octacore Myth, is the perception that more CPU cores are better and having more cores means higher performance. Today’s smartphones range from 2 cores up to 8 cores, even though performance and user experience are not a function of CPU core count. The myth, however, will not be limited to 8 cores, as there are plans for SoCs with up to 10 cores, and we could even see more in the future.
Not All Cores Are the Same…
In some phones, users are getting Octacore designs with up to 8 ARM Cortex-A53 cores. These 8 cores perform differently than 4 ARM Cortex-A57 cores paired with 4 ARM Cortex-A53 cores in what is called a big.LITTLE configuration. Core designs vary wildly from ARM’s own A53 and A57 64-bit CPUs to Intel’s x86 Atom 4-core processors to Apple’s 2-core A8 ARM processor. All these processors are designed differently and behave differently across application workloads and operating systems. Some cores are specifically designed for high performance, some for low power. Others are designed to balance the two through dynamic clocking and higher IPC (instructions per clock). As a result, no two SoCs necessarily perform the same when you take clock speed and core count into account.
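One rough way to see why core count alone says little is to combine per-core throughput (IPC times clock) with Amdahl's law, which limits the speedup from extra cores by the fraction of work that can run in parallel. The sketch below is only an illustration: Amdahl's law is not part of the original tests, and the IPC, clock, and parallel-fraction values are made-up assumptions, not measurements of any real SoC.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only part of the work can run in parallel."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def relative_performance(ipc: float, clock_ghz: float,
                         parallel_fraction: float, cores: int) -> float:
    """Single-core throughput (IPC x clock) scaled by the Amdahl speedup."""
    return ipc * clock_ghz * amdahl_speedup(parallel_fraction, cores)


# Hypothetical SoCs: IPC and clock values are invented for illustration only.
eight_small = relative_performance(ipc=1.0, clock_ghz=1.5,
                                   parallel_fraction=0.5, cores=8)
four_big = relative_performance(ipc=2.0, clock_ghz=2.0,
                                parallel_fraction=0.5, cores=4)

print(f"8 small cores: {eight_small:.2f}")
print(f"4 big cores:   {four_big:.2f}")
```

Under these assumed numbers the four faster cores come out ahead of the eight slower ones, because half the workload cannot use the extra cores at all.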
Through the different benchmarks, tools, and applications, we showed that CPU core count in a modern smartphone is not an accurate measurement of performance or experience. More CPU cores are not always better. We do acknowledge that having many smaller cores is one way to simplify power management, but these tests are not focused on power; they are focused on performance and user experience.
CPU core counts are not the way that phone manufacturers or carriers should be promoting their devices. CPU core count is only one factor in Android when the SoC has fewer than 4 cores. The marketing of core counts as a primary driver of performance and experience must end and be replaced with improved benchmarking practices and education.
Hopefully this will be the start of a meaningful discussion in the comment section…
Follow the adventures of SemiWiki on LinkedIn HERE!
To err is runtime; to manage, NoC
Software abstraction is a huge benefit of a network-on-chip (NoC), but with flexibility comes the potential for runtime errors. Improper addresses and illegal commands can generate unexpected behavior. Timeouts can occur on congested paths. Security violations can arise from oblivious or malicious access attempts.
Runtime errors also tend not to happen in isolation, especially if the first error in a sequence goes unmitigated. If there are natural causes such as congestion, further errors are likely to pile up as operation continues. For unnatural causes such as a malicious app, small errors can be a precursor to larger exploits. A chain of runtime errors can eventually render part or all of an SoC unable to function.
Not all errors are created equal. Many errors simply happen silently, producing an incorrect response but otherwise undetected. Others are seen but unacted upon. Depending on the source and severity of the error condition, recovery might be possible, or it might be prohibitively expensive in terms of extra gates and layers of software. The last resort is the dreaded hardware reset, an increasingly archaic response that irritates users to no end.
Without the right NoC infrastructure, even the first few phases of error management are difficult, making simple errors hard to handle. In architectures such as automotive and the IoT, where real-time and safety-critical operation becomes more important, error management is taking on more importance in SoC design. With the right NoC architecture, built-in features make robust error management easier.
There are five phases in error management: detection, aggregation, logging, reporting, and recovery. In the SonicsGN architecture, detection starts with configurable initiator agents and target agents. A transaction begins at an initiator, flows through routers, is received at a target, and is acknowledged with a response that flows back to the initiator. Each agent has what amounts to a watchdog timer, looking at four situations: burst failure, target flow control, return ack fail, and initiator flow control.
Other types of in-band errors can occur. Each initiator agent has a map of the targets it is permitted to reach; an access attempt can fall into an address “hole” in the map, or might be trying to access a powered-down domain. An initiator agent might see an unsupported command, a target agent might see an access violation, or both might report some type of safety error (as in a firewall, or what Sonics terms a protection mechanism). Another common error is the out-of-band variety, such as a violation of the AXI non-modifiable burst. When possible, errors are handled at the initiator agent to minimize network traffic.
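The address-map checks described above can be sketched in software terms. This is a minimal model, not the SonicsGN implementation: the `Region` type, the `decode` function, the target names, and the addresses are all hypothetical, chosen only to show how an initiator-side lookup can catch an address hole or a powered-down domain before any traffic enters the network.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class DecodeError(Enum):
    ADDRESS_HOLE = "address hole"
    POWERED_DOWN = "target domain powered down"


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    target: str
    powered: bool = True

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def decode(address_map: List[Region],
           addr: int) -> Tuple[Optional[str], Optional[DecodeError]]:
    """Return (target, None) on success or (None, error) on failure."""
    for region in address_map:
        if region.contains(addr):
            if not region.powered:
                return None, DecodeError.POWERED_DOWN
            return region.target, None
    # No region matched: the access fell into a hole in the map.
    return None, DecodeError.ADDRESS_HOLE


# Hypothetical map for one initiator agent.
address_map = [
    Region(base=0x0000_0000, size=0x1000, target="sram"),
    Region(base=0x4000_0000, size=0x1000, target="dsp", powered=False),
]

for addr in (0x0000_0800, 0x0000_2000, 0x4000_0010):
    print(hex(addr), decode(address_map, addr))
```

Rejecting the access at the lookup step mirrors the point that handling errors at the initiator keeps bad transactions off the network.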
The SonicsGN agents detect, aggregate, and log errors – but what happens then? Reporting is configurable, with responses ranging from simple in-band messages to sideband techniques up to processor interrupt. One interesting scenario is an attack on a sensitive IP block. It may be futile to report those errors back to the initiator, who would be generating the attack entirely on purpose. Recovering errors is also up to the customer. Software can go into the agents and sweep the error logs, looking at different classes of severity and frequency, then decide what to do.
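As a rough picture of what such a software sweep might look like, the sketch below walks a list of logged errors and picks an action per agent based on the worst severity seen and how often the agent has reported. The record layout, the severity scale, the thresholds, and the action names are all assumptions made for illustration; the actual SonicsGN log format is not described here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ErrorRecord:
    agent: str      # hypothetical agent name
    kind: str       # e.g. "timeout", "address hole"
    severity: int   # assumed scale: 1 = minor, 3 = critical


def sweep(log: List[ErrorRecord], critical: int = 3,
          repeat_limit: int = 3) -> Dict[str, str]:
    """Return one action per agent from a single pass over the log."""
    counts = Counter(rec.agent for rec in log)
    worst: Dict[str, int] = {}
    for rec in log:
        worst[rec.agent] = max(worst.get(rec.agent, 0), rec.severity)

    actions = {}
    for agent, count in counts.items():
        if worst[agent] >= critical:
            actions[agent] = "isolate"      # severity outweighs frequency
        elif count >= repeat_limit:
            actions[agent] = "interrupt"    # frequent but not critical
        else:
            actions[agent] = "log-only"
    return actions


# Hypothetical log contents.
log = [
    ErrorRecord("cpu", "timeout", 1),
    ErrorRecord("dma", "timeout", 2),
    ErrorRecord("dma", "timeout", 2),
    ErrorRecord("dma", "address hole", 2),
    ErrorRecord("gpu", "protection violation", 3),
]
print(sweep(log))
```

Where the thresholds sit, and whether a repeated minor error should escalate, is exactly the kind of policy decision the article leaves to the customer.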
The point is customers can use the SonicsGN capability to engineer as little or as much error management into their product as needed. Much of the original work on NoC error management was done in conjunction with TI on various OMAP family members, and Sonics has a detailed error management microarchitecture (under NDA).
There are always tradeoffs. For a fully certifiable, safety-critical system, the investment in both hardware and software for a SoC with robust error reporting and recovery in some scenarios may be well worth it. Even for less hardened systems where recovery might be expensive in silicon, the ability to recognize and report suspicious activity could be instrumental in IoT and other applications. Imagine an IoT edge device that could tell the provisioning system it is being hacked and going offline – while the attack is in progress, rather than after the fact when bad data has propagated all over the network.
To me, this seems like the early days of the Internet, when IT types were looking through logs of traffic from routers, firewalls, packet shapers, load balancers, and other appliances looking for who was trying to do what to whom. The difference is now it is all happening within a single chip running a NoC. Without the type of visibility SonicsGN provides, errors could easily run out of control all over a chip – and users would never know until it was too late. With the error management capability in SonicsGN, SoC designers have a lot more control.
Are FinFETs too Expensive for Mainstream Chips?
One of the most common things I hear now is that the majority of the fabless semiconductor business will stay at 28nm due to the high cost of FinFETs. I wholeheartedly disagree, mainly because I have been hearing that for many years and it has yet to be proven true. The same was said about 40nm since 28nm HKMG was more expensive, which is one of the reasons why 28nm poly/SiON was introduced first.
SpyGlass World 2015 User Group Meeting
I attended SpyGlass World this week – to give you an update, to catch up with old friends, including users, and to meet some of the new (to me) players from the Synopsys side of the event. The event was held in the United Club at Levi’s Stadium, just like last year. I don’t know if this will continue. Merging the SpyGlass User Group into SNUG would be logical, and attendance wasn’t as strong as last year, perhaps because no one expected significant news such a short time after the merger. The marketing guys confirmed it was indeed too soon to share well-developed merger plans. I’ll spare you a blow-by-blow of the event and focus on just a few presentations that captured my interest. The detailed schedule is here. As always, it’s a lot more useful to learn about tool usage from real users than from a product company, and each of the presenters delivered.
Philippe Magarshack, CTO of ST, gave an enlightening keynote on IoT and FD-SOI. Some highlights:
- For an IoT example, he mentioned a UK telematics-based insurance company called “Drive like a Girl”. You plug a device into your car which tracks your driving habits. The company uses that information to adjust your insurance rates. Not a new idea but interesting for the company name!
- He claims FD-SOI has better soft-error (radiation-induced error) reliability than FinFET. This is interesting not just for obvious applications like satellites but also for cars, which demand higher reliability standards than those found in consumer electronics. The claim is that FinFETs have to compensate with logic triplication (with higher area, cost and power), which is not necessary with FD-SOI.
- To help jumpstart the nascent IoT market, ST is incubating startups in Grenoble; this also helps ST tune their products to real applications. An example is sevenhugs, an application which monitors sleep habits and adjusts temperature and humidity to improve sleep.
Vitor Antunes from the (Synopsys) DesignWare group gave a presentation on the group’s use of SpyGlass:
- Has been used for internal quality checking since 2012
- Became the default choice for DW customer validation of configured cores, through coreConsultant, since 2014
- Today they run CDC and Lint, but plan to extend this to other GuideWare checks over time
- The DW group aims to stick close to the SpyGlass GuideWare 2.0 ruleset. This should be a hint to end-users still custom-blending their own SpyGlass rulesets that the need to be different is looking increasingly difficult to defend.
Nathan Hsiung from Broadcom explained their use in validating CDC correctness in large networking chips. What I found especially interesting was their use of the hierarchical CDC flow. Designers generally do everything they can to avoid hierarchical verification flows anywhere – in CDC, timing, you name it. Nathan offered several reasons for why they went hierarchical.
- They had no choice. They build huge networking chips – many hundreds of millions of gates. Flat CDC analysis on designs this size would take too long to complete.
- Hierarchical approaches allow running top-level CDC verification many more times
- Analysis is much simpler when you follow a bottom-up flow. If you run everything flat, you are deluged with warnings and errors with no obvious place to start debug. If you clean up bottom-up, each stage of debug is manageable.
- They checked carefully that abstracted models used in higher-level runs do not make unreasonable assumptions, which gives them high confidence that those higher-level runs are not masking potential problems.
Finally, Michael Sanie from Synopsys presented the Synopsys Verification vision. This is worthy of a separate blog, so I won’t detail it here. You can learn more about the SpyGlass products, now on the Synopsys website, HERE.
GlobalFoundries 14nm Process Update
Last Monday Daniel Nenni and I had a conference call with Jason Gorss and Shubhankar Basu of GlobalFoundries to get an update on their 14nm process. Shubhankar is the product line manager for 14nm.
GlobalFoundries’ 14nm process is a FinFET-on-bulk process licensed from Samsung. Both companies supply the same process although, as Shubhankar pointed out, they have different targets for it, especially in light of Global acquiring IBM’s chip business.
The 14nm process is run in Global’s Fab 8 in upstate New York. The 14LPE process was the first generation and was qualified in January. A second generation 14LPP process was qualified in September. They are now shipping 14nm parts to customers.
Shubhankar said that Global is succeeding at getting customers to design for its 14nm process and is not just a “second source”. In the mobility space a lot of consumer parts need high performance. Global has a huge IP library for LPE and LPP, and they are having success in mobile, diversifying their customer base.
14LPE and 14LPP share the same design rules and most of the equipment is the same. 14LPP offers a 10% to 14% performance boost over 14LPE. The Back-End-Of-Line (BEOL) is the same but 14LPP has some transistor enhancements. I asked about the transistor enhancements and Shubhankar said he couldn’t give specifics. I mentioned enhancements such as taller fins. Shubhankar would only comment that you can make geometry enhancements and you can reduce parasitics by tailoring things such as implants.
My analysis of his comments is as follows: He did say the pitches are the same, so my guess would be a combination of taller fins and implant adjustments. This would suggest to me that manufacturing costs aren’t very different for LPE and LPP. Taller fins would require a longer etch and likely have some yield impact, but I would expect the costs to be similar, say within 10% (just my opinion). I also think this is basically what TSMC did with 16FF and 16FF+; 16FF+ is a tuned version.
Production qualification is greater than 60% yield on a 128Mb SRAM. Yields on LPP are >20 points higher than that now (>80%) and LPE is ahead of that.
Daniel mentioned that processes used to be performance first but are now mobile-power first. He asked how the FPGA and processor guys get what they want.
Shubhankar notes that FinFET changes the game: performance is so much better versus planar that it is a no-brainer. Further, the 3D FinFET structure has much lower leakage than planar (fully depleted). Their IP is also characterized for high performance.
Shubhankar believes 14nm will be a long-lived node. There is a lot more to be gotten out of it, and they aren’t standing still. I asked him if this would be like what we see at 28nm, where companies such as TSMC have HP, HPL, HPM, LP, HPC and other variants. He said they would continue to tune performance and cost and that tier two and even tier three customers are adopting the process.
I asked how they segment 14nm versus the 22nm SOI family Global recently announced. Shubhankar said that certain IoT applications that are middle spectrum or on the lowest end of mobility are still on 28nm and reluctant to move to FinFETs. 22nm SOI is an intermediate process and can be pushed close to FinFET performance. You can also run 22nm SOI at 0.4 volts, and 14LPP is not ready for that space yet.
In terms of cost, a 22nm SOI wafer is less expensive than a 14nm FinFET wafer, but die cost depends on how much shrink you can get. Some die will be cheaper in 22nm SOI and some will be cheaper in 14nm FinFET if you get enough die size shrink. If you need the longest battery life and performance is less important, 22nm SOI wins; if you need maximum performance, 14FF wins.
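The wafer-cost versus die-shrink tradeoff can be worked through with a simple model: cost per good die is wafer cost divided by the number of good dies. The sketch below makes several loud assumptions. The wafer prices, die areas, and yield are invented for illustration (none came from GlobalFoundries), and the die count is a crude area ratio that ignores edge loss and scribe lines.

```python
import math

WAFER_DIAMETER_MM = 300.0


def cost_per_good_die(wafer_cost: float, die_area_mm2: float,
                      yield_fraction: float) -> float:
    """Wafer cost spread over the good dies on a 300mm wafer."""
    wafer_area = math.pi * (WAFER_DIAMETER_MM / 2) ** 2
    gross_dies = wafer_area / die_area_mm2  # ignores edge loss and scribe lines
    return wafer_cost / (gross_dies * yield_fraction)


# Hypothetical numbers: a 50 mm^2 die on 22nm SOI, and the same design
# ported to 14nm FinFET with either a strong or a weak area shrink.
soi_22 = cost_per_good_die(wafer_cost=3000.0, die_area_mm2=50.0, yield_fraction=0.8)
finfet_strong_shrink = cost_per_good_die(wafer_cost=4500.0, die_area_mm2=30.0, yield_fraction=0.8)
finfet_weak_shrink = cost_per_good_die(wafer_cost=4500.0, die_area_mm2=40.0, yield_fraction=0.8)

print(f"22nm SOI:             {soi_22:.2f}")
print(f"14nm, shrink to 30mm2: {finfet_strong_shrink:.2f}")
print(f"14nm, shrink to 40mm2: {finfet_weak_shrink:.2f}")
```

With equal yield, the break-even point in this model is where the die area ratio equals the wafer cost ratio: the more expensive wafer only wins if the die shrinks by more than the wafer price went up.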
Daniel commented that Qualcomm and others are doing server chips. Will a foundry do a very high performance process for server chips? This led to a discussion about the IBM chip business acquisition and whether IBM’s 14nm FinFET on SOI process will be available to outside customers. Global is committed to support IBM’s SOI technology for 10 years, but beyond that they can’t comment on IBM technology plans, although they did say they think it is a game changer.
My analysis: An interesting thing here is that IBM’s 14nm FinFET on SOI process is a server process with embedded DRAM for very large on-chip cache. This could potentially be an interesting process for very high performance applications if Global could or would offer it externally. Once again, this section is just my opinion; they wouldn’t comment on this.
Daniel also commented that he thinks 10nm will be kind of a short node like 20nm because 10nm and 7nm will use the same equipment (the same way that 20nm and 16nm used the same equipment).
More information HERE.
How to Make Smartphone Even Smarter? With Deep Learning
IT industry marvels like augmented reality and artificial intelligence, which marked technological utopianism in the science fiction movies of the 1970s and 1980s, are here now, enabled by a machine-learning technique called deep learning.
Free Copy of EDAgraffiti!
Last month we offered a free PDF version of our book “Fabless: The Transformation of the Semiconductor Industry”, for the greater good. More than thirteen thousand people have downloaded it thus far so we would like to keep the momentum going with another book giveaway. Paul McLellan has graciously offered his book on EDA so here we go again:
Eliminating the Chasm of Computing
The world has come a long way from the first UNIVAC computer in 1952, and from IBM mainframes and minicomputers in secured computer rooms, to the laptops, tablets, and mobile phones in our hands. Imagine the compute power of a minicomputer then and the compute power of your smartphone or tablet today. And do you know the cost of the initial DEC 12-bit PDP-8 in 1964? It was USD 16,000+. There were only a few computer companies, and computers were accessible to a lucky few. My own journey started in the 1980s in a computer lab with a mainframe and IBM desktops. I had to book a slot for a few hours of computer time, which was generally available during the evening or night.
The basic principle of computing with a CPU, memory, and I/O interface still prevails. What has changed is the efficiency of processing, power, interaction, mobility, and so on for all of these components along with a large number of other integrated components on what we call an SoC. In the realm of this enormous change, which is not yet complete, many companies had to cross their own chasms but who were the heroes to unleash that change and what’s being done to eliminate the chasm of the computing itself?
Compare an Apollo Guidance Computer of the 1960s with today’s Cypress SoC with an ARM Cortex-M0 core. Today’s SoC is millions of times smaller in weight, size, and power consumption. However, we are still very far from the ultimate. So far only a substantial portion of things have evolved; the connection between all the things, what we call IoT or IoE, is yet to take place. It’s a natural progression, as the internet came after computers. But what does that mean? It means computers everywhere. Along with the evolution of IoT, we also have to fix many things to process and manage that explosion of data. Companies are struggling to manage the expenses they incur in maintaining their data centers and power bills. A lot of power is still being wasted due to poor management of public amenities across the world.
Coming back to what has made such transformation possible and what more is going on to accomplish the ultimate: it’s the power of collaboration between companies, each company diligently leading its own space. The transformation has taken place from a few computer companies to a large ecosystem of IP companies, fabless design companies, IDMs, and pure-play foundries. While foundries have excelled in bringing the process technologies to a few nm scales, there has been tremendous progress in the semiconductor design industry with a large number of IP providers and SoC companies. ARM is one of the earliest initiators of the IP-based business model for chip and SoC design. It’s in their philosophy not to manufacture chips but to enable chip manufacturers and fabless designers to design the best optimized chips by providing them with ARM’s most efficient processor cores and other building blocks in terms of power, performance, density, cost, interaction, and so on. Today, ARM IP (CPU, GPU, multimedia, interconnect, software etc.) goes inside most consumer and other electronic products; more than 60 billion chips have ARM IP inside them. Under different licensing schemes, ARM’s products have long lifecycles ranging into 20+ years, roughly as long as ARM itself has existed; the company came into existence in Nov 1990. They will soon be celebrating their silver jubilee!
Now in the realm of IoT what’s required to proliferate computing everywhere, cementing the chasms between Endpoints, Hub and Cloud? It requires mobile devices with many sensors integrated together into the system with intelligent software, data security, integrated RF, very high energy efficiency, light weight and size, and cost brought down to a couple of cents.
ARM® Cordio Radio Core IP provides on-chip radio connectivity to Endpoint devices that can operate at 1 V or lower, consuming ultra-low power that can enhance battery life by a number of years.
What more is required? We require a new class of memory with high endurance, low cost, and high performance with low latency. Samsung, SK hynix, Micron and others are coming up with new types of memory such as STT-RAM, Phase-change RAM, ReRAM, and so on. A seamless integration of various components, on a single chip as much as possible and then in a single package including flash, MEMS, and other discrete components, is required.
What about the cloud? This needs a new architecture too: a distributed and intelligent architecture which moves data to where it is consumed, thus eliminating a lot of overhead. Also, the data center servers need high performance for accelerated storage and energy efficiency to save power. Although Intel is the undisputed leader in processors for data center servers, a new breed of server processors with ARM processor cores inside is on the horizon. Cavium ThunderX™ uses the 64-bit ARMv8 architecture for workload optimized cloud processing. Also, AppliedMicro leads in the ARM server CPU market, and most recently Qualcomm has prototyped an ARM server CPU that is supposed to be very power efficient, high performance and cost efficient.
There is no slowdown in the evolution of newer technologies, architectures, methodologies, software and so on. There is an interesting presentation given by Simon Segars, CEO of ARM Holdings, as part of the “View from the Top” lecture series at Berkeley Engineering. This is freely available on YouTube HERE.
Simon provided an in-depth view about the technological progress and what ARM is doing to unleash newer opportunities, newer markets. Today, the ARM connected community has 1000+ partners which put the company in the middle of intense collaboration with design companies, foundries, EDA tool providers, IP providers, and other service providers.
Access to technology has become cheap; development boards with ARM processors can be obtained for a few tens of dollars and those can work with Linux, Android, or Windows. The momentum in IoT has pushed several open platforms for development and innovation of IoT. ARM® mbed is the IoT platform promoted by ARM. Also, there are other IoT platforms promoted by Intel, Samsung, Google, and others. Raspberry Pi is being used by school students in space projects promoted by the UK Space Agency.
Why Your IP Release Methodology Can Make or Break Reuse Success
When the term IP first came into popular usage for IC design, it was primarily conceived as blocks of design content that were bought occasionally from external sources. A customer might use one or two in a design, and expect one delivery with perhaps some minor updates before tapeout. Over the last 18 years, this notion has changed quite a bit as design complexity has increased and the level of integration required for SOC development has risen dramatically.
What have we learned in the intervening years? For one thing, IP is frequently developed internally as well as externally. So, IP management and release systems are not just for so-called IP providers anymore. Furthermore, the necessities of design reuse have formalized a lot of the processes regarding sharing of design blocks internally. It’s easy to think of it still being a matter of just freezing a design block and handing it off as being a workable method of releasing IP. However, this is far from the case.
Methodics recently posted an article on their website that peels the onion regarding the issues related to developing a truly effective way of developing and releasing IP. Here are some of the key messages in this article.
Because IPs are essentially design blocks, they possess a variety of views, some of which have their own independent release criteria. Good examples are documentation, schematics, RTL, layout, test files, generated views for external processing steps, etc. The list of view types is vast. Each view type can have its own release schedule and dependencies. Lumping them all together in the IP management system can cause inefficient, coarse-grained data management.
Internally during development of IP there will be a variety of people who work on the IP. But they each may work in different domains and do not have a need to populate their workspace with all the files in the IP. They need a way to selectively populate their workspace with just the entities they need to do their job.
Because different view types will inevitably pass regression tests out of sync with each other, IP developers will often need to have selected versions of certain views in their workspaces so they can work with known good versions. This may vary from user to user. Additionally, in some cases designers may want the latest version of the layout data, which is in flux, but also reference a specific version of the schematic for instance. Just grabbing a single release of the physical design view will not accomplish this.
It gets even more interesting when you add in the dependency checks to ensure that the data all represents a consistent flow from start to finish. It can be a big problem if the RTL is newer than the gate level data for instance. Or another example is where the verification output is older than the design data it is derived from.
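A dependency check of this kind can be sketched as a comparison between the version each derived view was built from and the current version of its source. This is only an illustrative model, not how Methodics implements it: the `View` record, the integer version numbers, and the view names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class View:
    name: str
    version: int  # current release number of this view
    # For derived views: {source view name: source version it was built from}
    built_from: Dict[str, int] = field(default_factory=dict)


def stale_views(views: List[View]) -> List[str]:
    """Report every derived view whose source has moved on since it was built."""
    by_name = {view.name: view for view in views}
    problems = []
    for view in views:
        for source, used in view.built_from.items():
            current = by_name[source].version
            if current > used:
                problems.append(
                    f"{view.name} was built from {source} v{used}, "
                    f"but {source} is now v{current}"
                )
    return problems


# Hypothetical IP: the gate-level netlist lags the RTL by one release,
# while the verification results match the current RTL.
views = [
    View("rtl", version=5),
    View("gates", version=3, built_from={"rtl": 4}),
    View("verification", version=2, built_from={"rtl": 5}),
]

for problem in stale_views(views):
    print(problem)
```

A release would be blocked, or at least flagged, whenever this list is non-empty, which covers both examples above: RTL newer than the gate-level data, and verification output older than the design it checks.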
At the end of the Methodics article they discuss how they have implemented their MDX_VIEWS to dynamically allow for easily setting up all the views needed for an IP release in such a way that all the needs are met and the data is consistent. I won’t spoil this for you, but instead refer you to the article on the Methodics website.
What is clear from reading their blog is that they have encountered many real-world, large-team design problems and have been able to devise effective features in their software to accommodate these needs.

