
Samsung 10nm and 7nm Strategy Explained!

by Daniel Nenni on 04-23-2016 at 7:00 am

Samsung Foundry had an intimate gathering recently for 200 customers and partners that I missed, but I know several people who attended. This event was a precursor to #53DAC where Samsung has the largest foundry presence. I was able to clarify what I had heard via a phone call with Kelvin Low so here is my version of what is important:

Samsung is all in on the foundry business
Samsung is opening up their 200mm fabs, internal IP, design methodologies (i.e., low power), and related services (packaging) to foundry customers. To me this is a definitive statement of their foundry commitment. Samsung is not, however, going into the captive ASIC business like TSMC (GUC), UMC (Faraday), GlobalFoundries (Invecas), and SMIC (Brite Semiconductor). Samsung could easily buy an established ASIC supplier like eSilicon, Open-Silicon, or Verisilicon, but Samsung is choosing not to compete with their ASIC partners, which makes complete sense since the other foundries do. I would bet Samsung will get a much larger share of the ASIC business in the not too distant future (it’s a safe bet since I have already asked my ASIC friends about this).

Samsung Foundry is continuing to focus on 28nm FD-SOI
I saw this at the FD-SOI symposium where Kelvin presented “28FDS – Industry’s first mass produced FDSOI technology for IoT era, with single platform benefits.” Unfortunately the slides are not up yet; I will let you know when they are posted. For China, Samsung is FD-SOI enabling their ASIC partners, which is a great strategy; Verisilicon, for example, is very active in China.

Key FD-SOI takeaways from the Symposium:

Proven manufacturability

  • Variability lower than bulk
  • No reliability concerns – all WLR and PLR completed
  • No FD-SOI specific in-line defect generation and systematic failure
  • Proven performance benefits on silicon

28FDS commercial products are in production

  • Technology deployed in actual products
  • 12 tapeouts in 2015 and >10 tapeouts so far in 2016

Full foundry support from design to manufacturing

  • Samsung Foundry supports foundation and basic IP
  • Other IP by 3rd party vendors (ARM, Synopsys, etc…)
  • Regular MPWs available for design validation

28FDS will be a long-lived node

  • Derivative offerings including RF and eNVM
  • Increase reach into new markets (Auto, IoT, Industrial, etc…)

Samsung Foundry is offering a low cost version of 14nm
This was not surprising at all given the TSMC 16FFC announcement last year, but I am told that Samsung Foundry LPC (the cost-down version) offers process simplifications (fewer masks) without compromising performance. LPC is also PDK compatible with LPP for seamless design migration. Thus far Samsung has shipped more than 0.5M 14nm wafers, making them the largest holder of FinFET foundry market share today, and that’s a fact.

Samsung Foundry 10nm will be in production by the end of 2016
Samsung is approaching 10nm differently than TSMC. Rather than doing a quick node transition from 10nm to 7nm, Samsung will focus on 10nm as a full node by building out different versions targeted at multiple markets. According to Samsung, a “true” 10nm can be done using double patterning, thus saving the cost of triple or quad patterning. Samsung does use triple patterning on one of the metal layers but still allows bidirectional routing, which is easier to design to.

Samsung Foundry 7nm will use EUV for cost reduction
As I was told at SPIE, Samsung will use EUV for 7nm logic before using EUV for memory. An executive from ASML EUV (Dr. Hans Meiling) even presented at the Samsung event to bring everyone up to date. Given that Samsung 10nm will be a full node, delaying 7nm until 2020 (EUV ETA) should not be a problem.

Bottom line: Samsung is showing significant foundry leadership skills again with FD-SOI and FinFETs. Not only does this greatly benefit the fabless semiconductor ecosystem by giving us more innovative foundry choices, it also benefits the semiconductor industry by continuing to push the cost per gate to affordable levels.


Enterprise Design Management Engineered for SoCs

by Don Dingee on 04-22-2016 at 4:00 pm

In my initial look at ClioSoft’s design management system created from the ground up for the semiconductor industry, I made the opening case for managing and reusing IP across an ASIC design organization. Let’s for a moment say we agree on the need for an enterprise software package to do design management…


Static Timing Analysis Keeps Pace with FinFET

by Daniel Payne on 04-22-2016 at 12:00 pm

At SemiWiki we’ve been blogging for several years now on the semiconductor design challenges of FinFET technology and how it requires new software approaches to help chip designers answer fundamental questions about timing, power, area and design closure. When you mention the phrase Static Timing Analysis (STA), probably the first commercial EDA tool that pops into mind is PrimeTime from Synopsys. I learned more about what’s been recently updated in PrimeTime by talking by phone with Robert Beanland of Synopsys; we’ve kept in touch over the years since we both worked at Viewlogic in the 90’s.

Synopsys engineers have focused on three major areas of improvement with the latest release of PrimeTime 2015.12: Performance, Accuracy and Productivity.

Performance
As with most EDA tools, there is a never-ending demand from engineers to see results quickly, within the same work day instead of waiting multiple days. One way to get faster results from STA is to exploit multiple CPUs, so the Holy Grail is linear scalability when going from 1 to 2, 4, 8 and 16 cores. Clever engineers at Synopsys have figured out how to eke out a further 2X overall speedup in PrimeTime by using up to 16 cores with the 2015.12 release. With the latest version, just comparing 1 core to 16 cores you can expect a speed improvement of 10-15X, pretty close to the ideal speedup.
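To put the quoted 10-15X figure in perspective, here is a rough back-of-the-envelope sketch (not from Synopsys) that converts a speedup on 16 cores into parallel efficiency and, assuming Amdahl's law is a fair model of the run, the implied serial fraction. The function names are purely illustrative.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cores


def amdahl_serial_fraction(speedup: float, cores: int) -> float:
    """Serial fraction s implied by Amdahl's law:
    speedup = 1 / (s + (1 - s) / cores), solved for s.
    """
    return (cores / speedup - 1) / (cores - 1)


# The 10X and 15X endpoints come from the article; 16 is the quoted core count.
for speedup in (10.0, 15.0):
    eff = parallel_efficiency(speedup, 16)
    serial = amdahl_serial_fraction(speedup, 16)
    print(f"{speedup:.0f}X on 16 cores: efficiency {eff:.1%}, "
          f"implied serial fraction {serial:.2%}")
```

Under that assumption, a 10X speedup corresponds to about 4% of the work remaining serial, and 15X to well under 1%, which is why the last few X of scaling are the hardest to win.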

Another important performance metric for EDA tools is RAM usage, because running flat STA on a big SoC (designs in the range of 1 billion transistors) can consume 1TB of RAM. A technique called HyperScale reduces RAM usage, which is quite helpful for large designs, because it supports partitioning your design into smaller pieces and distributing them across multiple smaller machines.

Accuracy
Faster timing results are great, but only if the accuracy is acceptable. STA tools have used a couple of approaches: Graph-based Analysis (GBA) and Path-based Analysis (PBA). GBA produces full coverage results across the entire timing graph with some pessimism. PBA is more accurate, but it runs on a path-by-path basis, requiring longer run times. From a methodology viewpoint you typically start out running STA with a GBA approach to get results quickest, then near the end of your project use PBA to get the highest accuracy on critical timing paths. With the 2015.12 release the accuracy of GBA has been improved using Parametric On-Chip Variation, getting timing results even closer to PBA and, ultimately, HSPICE results.
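One common source of GBA pessimism is that the worst arrival time and the worst slew at a node are propagated together even when they come from different paths. The toy sketch below illustrates that effect only; the linear delay model, the node names and every number are made up, and this is not how PrimeTime is implemented.

```python
# Toy circuit: a two-input gate G drives a buffer B.
# The delay of B grows with the slew (edge rate) seen at its input.
# All values are in ns and are invented for illustration.

def buffer_delay(input_slew: float) -> float:
    """Hypothetical linear model: intrinsic delay plus a slew-dependent term."""
    return 0.10 + 0.5 * input_slew


# For each path reaching G's output: (arrival time, slew) at that point.
paths_into_g = {
    "A->G": (1.00, 0.05),  # late arrival, sharp edge
    "C->G": (0.60, 0.30),  # early arrival, slow edge
}

# Graph-based: keep one worst arrival and one worst slew per node,
# even though they originate from different paths.
gba_arrival = max(arrival for arrival, _ in paths_into_g.values())
gba_slew = max(slew for _, slew in paths_into_g.values())
gba_result = gba_arrival + buffer_delay(gba_slew)

# Path-based: evaluate each path with its own slew, then take the worst.
pba_result = max(arrival + buffer_delay(slew)
                 for arrival, slew in paths_into_g.values())

print(f"GBA arrival at B output: {gba_result:.3f} ns")
print(f"PBA arrival at B output: {pba_result:.3f} ns")
print(f"GBA pessimism:           {gba_result - pba_result:.3f} ns")
```

GBA needs only one pass over the graph, which is why it is fast and gives full coverage; PBA has to revisit each path, which is why it is usually reserved for the critical ones.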

Productivity
Engineers are constantly being given changes to the spec in terms of features and requirements, which leads to the practice of Engineering Change Orders (ECOs). As designs get close to tapeout, one key to managing the tapeout schedule is to tightly control any changes that are introduced to the design and only permit changes which move the design closer to meeting PPA (Power, Performance, Area) targets. Achieving the lowest possible power use can drive ECO changes right up until tapeout. In the area of power ECO capabilities for 14nm FinFET, I was impressed to see that Samsung was able to get a 20% total power reduction using signoff timing. This release also supports downsizing, where cells with smaller transistors replace initial cells, plus techniques like Vth swapping. HiSilicon reported that on a recent 16nm FinFET tapeout, PrimeTime provided fast, accurate and predictable design closure, helping them reach performance and power targets.
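The idea behind Vth swapping can be sketched in a few lines: cells that have timing slack to spare are replaced with slower, lower-leakage high-Vth variants. The sketch below is a deliberately simplified illustration, not the PrimeTime ECO algorithm; the `Cell` type, the delay penalty, the leakage ratio and the cell data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    slack_ps: float     # worst timing slack through this cell
    leakage_nw: float   # leakage of the current low-Vth variant


# Hypothetical cost of swapping one cell from low-Vth to high-Vth.
DELAY_PENALTY_PS = 8.0  # extra delay added by the swap
LEAKAGE_RATIO = 0.3     # high-Vth variant leaks 30% of the low-Vth value


def vth_swap(cells, margin_ps=5.0):
    """Swap every cell whose slack stays at or above margin_ps after the penalty.

    Simplification: each cell's slack is treated independently. A real tool
    re-times the design, because cells on the same path share slack.
    """
    swapped, saved_nw = [], 0.0
    for cell in cells:
        if cell.slack_ps - DELAY_PENALTY_PS >= margin_ps:
            saved_nw += cell.leakage_nw * (1 - LEAKAGE_RATIO)
            swapped.append(cell.name)
    return swapped, saved_nw


cells = [Cell("u1", 40.0, 12.0), Cell("u2", 6.0, 9.0), Cell("u3", 13.0, 20.0)]
swapped, saved_nw = vth_swap(cells)
print(f"swapped {swapped}, leakage saved: {saved_nw:.1f} nW")
```

Using signoff-accurate slack matters here because every picosecond of pessimism in the slack numbers is a swap the tool cannot safely make.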


Summary
There are plenty of challenges in designing SoCs with FinFET technology, and users of STA tools like PrimeTime can benefit from the latest release to help meet those challenges with improved performance, accuracy and productivity. Even early designs for 10nm FinFET will benefit from support for the new and complex placement rules that work together between PrimeTime and IC Compiler.



Webinar: How to Implement an ARM Cortex-A17 Processor in 22FDX 22nm FD-SOI Technology

by Daniel Nenni on 04-22-2016 at 7:00 am

Who doesn’t like a good webinar? I certainly do, as it is one of the most time-efficient ways to interact with the fabless semiconductor ecosystem, absolutely. Especially when it addresses two of the top trending topics on SemiWiki: ARM and FD-SOI. Here is a quick summary of what you will learn:

GLOBALFOUNDRIES
Technical Webinar Series:

How to Implement an ARM Cortex-A17 Processor in 22FDX 22nm FD-SOI Technology

Moore’s Law has progressed unabated for decades, pushing the laws of physics and helping to power unprecedented innovation throughout the world. Soon science fiction will become reality, as the fastest, most computationally powerful devices will have transistors consisting only of a molecule and a few atoms. However, the best solution isn’t always the biggest chip with the smallest, fastest transistors. For the mobile, pervasive and intelligent computing space, other factors such as ultra-low-power consumption and RF integration have equal or higher priority. For these applications, GLOBALFOUNDRIES 22FDX platform with 22nm fully depleted silicon-on-insulator (FD-SOI) technology offers optimized, differentiated solutions, with an optimal combination of performance, low power and cost.

One of the essential building blocks of these apps is a high-performance, low-power processor. This webinar outlines the physical architecture considerations and physical design steps of implementing an ARM Cortex-A17 quad-core processor in 22FDX FD-SOI technology, including:

  • Digital implementation flow with industry-standard EDA tools
  • Application of body-bias for specific design intents and scenarios
  • Initial PPA results of an ARM Cortex sub-module
  • Analysis of details and results, including comparison to a 28nm implementation

Adopting a technology platform usually includes a new design flow. Not in this case, since the 22FDX digital design flow is similar to the bulk flow with support from all of the major EDA vendors. The flows use EDA techniques (implant-aware, source/drain-aware, double patterning, UPF support) which have been deployed on earlier nodes. This case uses the Cadence tool suite from initial design creation to signoff.

GLOBALFOUNDRIES design IP for the ARM Cortex-A17 processor includes standard cell base cells, power management cells, and cache memory instances, each with support for body-biasing. Strategic use of software-controlled, dynamic body-biasing enables specific application scenarios and optimization criteria to be applied on a block-by-block basis, resulting in optimized tradeoffs of performance and power. Sample scripts show how this is done.

The concept of an optimizable technology platform is great, but PPA results are what really count. How do the performance and power consumption of 22FDX 22nm FD-SOI with body-bias compare to 28nm bulk technologies? This implementation shows ~30% higher frequency at the same power. Optimized for power reduction, there is ~45% power reduction at the same frequency. Both optimizations have ~45% area reduction. The implementation of an ARM Cortex-A9 sub-module based on an initial release of the Invecas 8-track continuous RX standard cell library shows a significant boost in frequency and power efficiency compared to 28SLP.
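As a quick sanity check on what those percentages mean as ratios, the sketch below converts them into frequency-per-watt and density gains. It uses only the approximate figures quoted above; the helper names are illustrative, and the results are simple arithmetic, not measured data.

```python
# Approximate figures quoted in the webinar summary.
FREQ_GAIN_AT_SAME_POWER = 0.30  # ~30% higher frequency
POWER_CUT_AT_SAME_FREQ = 0.45   # ~45% lower power
AREA_CUT = 0.45                 # ~45% smaller area


def gain_from_increase(fraction: float) -> float:
    """Ratio gained when the numerator (frequency) rises by `fraction`."""
    return 1.0 + fraction


def gain_from_reduction(fraction: float) -> float:
    """Ratio gained when the denominator (power or area) shrinks by `fraction`."""
    return 1.0 / (1.0 - fraction)


speed_opt = gain_from_increase(FREQ_GAIN_AT_SAME_POWER)
power_opt = gain_from_reduction(POWER_CUT_AT_SAME_FREQ)
density = gain_from_reduction(AREA_CUT)

print(f"frequency per watt, speed-optimized: {speed_opt:.2f}x")
print(f"frequency per watt, power-optimized: {power_opt:.2f}x")
print(f"logic per unit area:                 {density:.2f}x")
```

Note that a 45% reduction in power at constant frequency is a larger efficiency gain (about 1.8x) than a 30% frequency increase at constant power (1.3x), so the two operating points are not equivalent.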

The 22FDX platform is ready to adopt for new designs, with the starter kit of 22FDX digital design flow available now. More information, including webinars and white papers, is available at GLOBALFOUNDRIES.com/22fdx.

Also Read: ARM and FD-SOI are like Peanut Butter and Jelly!


Feeding the Startup Cycle

by Zach Shelby on 04-21-2016 at 12:00 pm

I am a technologist, an entrepreneur and most recently an angel investor. As I have announced my investments in promising young companies over the last couple of years, many people have asked me why. Isn’t the stock market easier (well…), isn’t that risky (yep), what does that mean for your role at ARM (business as usual), how do you choose a company, etc.? Maybe a little background first.

For me, being an entrepreneur was a natural choice, something picked up from my father and a drive to succeed at building new things. I spent the first decade of my career creating new technologies for the Internet of Things, and like most technologists, became frustrated when big companies in the mid-2000s did nothing with the technology. My solution was to go and deploy the technology myself, and my first technology startup Sensinode was born in 2005. We had a big vision – bring Internet and Web technology to embedded devices, and really create a scalable platform of innovation instead of the silos of lock-in we had in automation systems at the time.

What I didn’t realise at that time was that the resistance we were experiencing to the adoption of IoT was a disruption point. Startups are a great way to change an industry, and in the best cases change the world, through the application of new technologies or business models that the status quo isn’t ready for (crossing the disruption point). We succeeded with Sensinode by being an early innovator, not growing too fast, and having plenty of luck. In 2013 we had a successful exit to ARM. For me, the most exciting and fulfilling thing was being able to help realize a vision and then find a home where it can scale. In our case it was helping to realize the Internet of Things, for which ARM mbed is an awesome home.

It took a little inspiration before I built up the courage to become an angel investor: Could I really help other startups? Was the time and risk manageable? Why? That all changed for me in December 2014, at the Nokia Foundation awards, when I heard Jorma Ollila (former CEO of Nokia) tell why he personally donated millions of Euros to provide grants for technology graduate students. His logic was simple and inspiring: as a university student, a similar grant allowed him to do graduate studies in the UK, which he felt helped in the success of his career. He was feeding a positive circle, kiitos Jorma. I realised that the angel investors in my company, in particular Vesa Raudaskoski (Nokia, Elektrobit, Eden Rock), played a key part in helping us succeed (and keeping our sanity).

For me investing in startups is about playing my part in the positive startup cycle of the technology industry (and it keeps life exciting!). If I can help startups succeed through encouragement, my experience and early funding, then I’m helping what makes Silicon Valley such a powerful centre for innovation. My startups have their roots in Finland with international plans, as I find Finland to be one of the best startup scenes on the planet. Great technical resources, reliable people, a positive attitude to startups, reasonable cost, and right-sized VC. And hey, we’re the home of Slush, the biggest startup event in the world 🙂

My first investment was in a company bringing natural gesture recognition to Augmented Reality (AR) industrial applications called Augumenta. Where VR can be compared to the PC, AR is the mobile in a new age of computing. More recently, I invested in CubiCasa, who created a technology platform for indoor floor plans and have already achieved a scalable business. Content that will some day be used in navigation, AR and VR. Just last week I closed on an investment in a fast growing company helping to save energy for entire industries (stealth for now).


For my next project I plan on realising something a little bigger and more personal. As a kid I had an opportunity to play with a lot of technology, building solid state electronics and sensors in the garage, running a BBS and coding my much beloved C64. I would like to bring that same opportunity to every child in Finland, something fellow technologist and CTO of Espotel, Jaakko Ala-Paavola, and I are actively working on. Stay tuned!


Digital Design Trends – A Cadence Perspective

by Bernard Murphy on 04-21-2016 at 7:00 am

I talked with Paul Cunningham (VP front-end digital R&D) at CDNLive recently to get a Cadence perspective on digital design trends. He sees needs from traditional semiconductor companies evolving as usual, with disruption here and there from consolidation. But on the system side there is an explosion in demand for wearables, furniture, grid and power delivery management and in many more domains. Since many of these teams are starting from a blank sheet, they’re looking (especially in Asia) for high-productivity front-to-back solutions that will get them running at full speed as fast as possible.

Everyone is squeezing cost and power and needs to build in security, and delivery is (unsurprisingly) schedule-sensitive, but other factors counter traditional expectations of IoT design. Devices are complex (now that we have realized you can’t push all the heavy compute to the cloud), so designers want to go to advanced nodes. System designers also want to build a diverse range of solutions and therefore require flows that support fast turn-around without needing vast teams of engineers. In short, direct involvement from systems companies pushes the well-known problem even harder: design complexity and diversity continue to rise much faster than engineering resources, schedules are getting tighter and cost-sensitivity is climbing.

On turn-around time, physical synthesis has to handle 3-5X the number of gates in the same time, which demands all kinds of fundamental changes for massive parallelism, for coupling to physical design and for handling advanced technologies. The Cadence Genus Synthesis Solution is now handling 3-5 million placeable instances in production designs and should already be scalable to 10+ million instances flat in overnight runs, making it practical to optimize all but the largest IPs and even some sub-systems in single physical synthesis runs.

On high-productivity solutions, physical synthesis requires very accurate correlation between physical estimates at this stage and what will actually be implemented in physical design. You have to use the same placer, the same global router, the same extractor and the same delay calculator, which is exactly what Genus does, sharing the same engines with the Innovus Implementation System.

But productivity is not just about engine correlation. Who among us didn’t curse Microsoft Office when Word, PowerPoint and Excel supported what should have been exactly the same features in different ways? Didn’t we feel more productive when those features became common? The same thing applies to design. Front-end and implementation designers need to be able to easily exchange bounding timing and physical constraints, timing reports and scripts without confusion between different flavors of format. Genus provides this again through deep engine integration with Innovus, even extending to report formats.

On cost pressure, a major contributor to device unit cost is test time (which can be as much as 50% of unit cost). System and IoT applications are pushing to reduce this further by looking for even higher levels of test compression. Current compression approaches compress, in effect, linearly by splitting scan chains into multiple chains. Unfortunately, this increases routability problems since every scan chain must connect to compression logic. The result can be increased die area, which puts the cost burden back on silicon. Effectiveness is also bounded by the need to keep chains sufficiently long that they can deliver test patterns for challenging cases.
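The linear trade-off described above can be sketched with a simple first-order model: shift time scales with the number of patterns times the length of the longest chain, while the number of chain connections to the compression logic grows with the chain count. The model and all numbers below are illustrative assumptions, not Cadence data.

```python
import math


def scan_shift_cycles(flops: int, chains: int, patterns: int) -> int:
    """First-order estimate: every pattern is shifted through the longest chain."""
    longest_chain = math.ceil(flops / chains)
    return patterns * longest_chain


# Made-up design size and pattern count, for illustration only.
FLOPS, PATTERNS = 1_000_000, 5_000

for chains in (8, 80, 800):
    cycles = scan_shift_cycles(FLOPS, chains, PATTERNS)
    print(f"{chains:4d} chains -> {cycles:>13,} shift cycles, "
          f"{chains} chain connections to route")
```

In this model a 10X cut in test time costs a 10X increase in chain connections, and the chains get 10X shorter, which is exactly where the routing and pattern-length limits mentioned above start to bite.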

Cadence recently announced a 2D-based elastic compression in the Modus Test Solution which through a grid-based approach can greatly reduce routing overhead; the elastic part represents the ability to borrow from previous test clock cycles to extend scan patterns for challenging test cases. Between these innovations, Modus allows for much higher levels of compression, reducing time on the tester, without bloating die area. Experience with production designs shows 2-3X reduction in test time with 2.6X reduction in routing overhead.

You can read more detail about recent Genus and Modus advances HERE and HERE.



Cross-viewing improves ASIC & FPGA debug efficiency

by Don Dingee on 04-20-2016 at 4:00 pm

We introduced the philosophy behind the Blue Pearl Software suite of tools for front-end analysis of ASIC & FPGA designs in a recent post. As we said in that discussion, effective automation helps find and remedy issues as each re-synthesis potentially turns up new defects. Why do Blue Pearl users say their tool suite is easier to use…


Top Mobile OEM Uses NetSpeed to Boost Its Next Gen Application Processor

by Eric Esteve on 04-20-2016 at 12:00 pm

The smartphone segment is certainly the most competitive market for chip makers today, and the yearly product launch cadence puts a lot of pressure on the application processor design cycle. End-users expect to benefit from higher image definition, better sound quality, and ever faster and more complex applications, which push the limits of application processor performance in terms of higher frequency, lower latency, and reduced power consumption. The race for ever better performance is also translating into ever more cores, CPU or GPU.

Optimizing processing by integrating cache memory is a well-known architecture, but the multiplication of cores is creating a new challenge: cache coherency. Because the memory has to be shared between many cores (6 GPU and 2 CPU in the picture below), when one core reads a memory location after another core has written that same location, the read must return the last written value, not an older one. You may define cache coherency as the ability to maintain consistency between the caches and memory. Cache coherency adds to design complexity (a specific function has to be developed), but it has a severe impact on overall system performance; that’s why it will become a must-have function in complex multi-core SoCs, even in consumer or mobile applications, as it is today in networking and data center.
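The stale-read problem described above can be shown with a toy model: two cores with private write-through caches, with and without a simple write-invalidate scheme. This is a minimal sketch for illustration only; the classes are hypothetical and it does not represent NetSpeed's Gemini protocol or any real coherency protocol in detail.

```python
class Memory:
    """Shared main memory (toy model)."""

    def __init__(self):
        self.data = {}


class Core:
    """A core with a private write-through cache (toy model)."""

    def __init__(self, memory, peers, coherent):
        self.memory = memory
        self.cache = {}
        self.peers = peers        # shared list of all cores in the system
        self.coherent = coherent

    def read(self, addr):
        if addr not in self.cache:                # miss: fetch from memory
            self.cache[addr] = self.memory.data.get(addr, 0)
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value
        self.memory.data[addr] = value            # write-through to memory
        if self.coherent:                         # invalidate stale copies
            for peer in self.peers:
                if peer is not self:
                    peer.cache.pop(addr, None)


def run(coherent):
    memory, cores = Memory(), []
    cpu = Core(memory, cores, coherent)
    gpu = Core(memory, cores, coherent)
    cores.extend([cpu, gpu])
    gpu.read(0x40)           # GPU caches the old value (0)
    cpu.write(0x40, 7)       # CPU updates the same location
    return gpu.read(0x40)    # what does the GPU see now?


print("without coherency:", run(False))
print("with coherency:   ", run(True))
```

Without invalidation the GPU keeps returning the value it cached before the CPU's write; with it, the next GPU read misses and fetches the updated value. The extra invalidation traffic is the kind of work the interconnect has to carry in a coherent design.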

One of NetSpeed’s customers is a mobile OEM developing its own Application Processor (AP), which it then integrates into its flagship smartphone product. This latest generation application processor was defined as a future-proof platform. To ensure that the processor would be adaptable for future generations, the spec required support for cache coherency. In light of a long list of stringent requirements (2x performance, lower power, complex QoS requirements), the team was relieved that they were not locked into a legacy design or forced into using a low bandwidth crossbar-based interconnect design.

There is only one commercially available on-chip interconnect solution capable of satisfying both coherent and non-coherent requirements, and that is NetSpeed’s Gemini NoC IP. By selecting NetSpeed IP, the company was able to implement a single solution today that satisfied current requirements for a non-coherent design and future requirements for coherent designs and even designs with a mix of coherent and non-coherent traffic. This approach allowed the company to minimize the risk for future SoC designs because later, when the design team needs to implement a new cache-coherent architecture, they will be working with an interconnect IP that is already known and well understood.

Not all interconnects (or NoCs) are created equal. NetSpeed provided a physically aware interconnect synthesis engine, an innovative solution that optimizes the interconnect architecture based on workload models and is able to deliver the right topology within minutes. Implementation of NetSpeed’s NoC led to a new generation SoC that delivers 20% lower latency and 15% higher maximum frequency than the target set by the customer. Because NetSpeed synthesizes a pre-verified interconnect design within minutes, the direct impact on design schedule is to shrink six months of analysis down to a few hours.

Designing a heterogeneous multi-core SoC for mobile requires meeting very aggressive targets for power consumption and also for Quality of Service (QoS). QoS is not the same as performance (in terms of frames/second or MIPS), but mediocre QoS can degrade a performance figure that looks excellent on paper. For example, NetSpeed’s Gemini NoC allows building a real-time bandwidth allocation mechanism through an automated virtual assignment. The number of wires after P&R directly impacts SoC power consumption, and also SoC performance itself due to wiring congestion. You can see why obtaining 65% fewer wires next to the memory controller is such an important result. Not only will power consumption decrease, but easier routability in this critical area will also help meet more stringent timing constraints.

Using NetSpeed’s NoC solution to design this heterogeneous multi-core application processor SoC for mobile has helped meet or exceed the incredible TTM requirement for this kind of SoC, improve QoS, push the maximum frequency limit, prepare for the future by integrating a cache-coherent NoC, and finally help NetSpeed’s customer launch an AP SoC with power consumption in line with mobile customer expectations.

This blog is extracted from NetSpeed “Mobile” Success Stories. You can read more about this story and Data Center AP, Automotive SoC, Networking, Digital Home SoC or Data Center Storage stories here

From Eric Esteve from IPNEST


The Semicon Industry Keeps Wafer Fabs Moving Up

by Pawan Fangaria on 04-20-2016 at 7:00 am

The worldwide revenue of the semiconductor industry has remained flat in the last few years; to be more precise, overall semiconductor revenue declined by 1.9% in 2015 and Gartner forecasts a further decline of 0.6% in 2016. Total revenue was at a record high of $340.3 billion in 2014.

Well, the semiconductor industry has matured. A market with growth of anything between 0 and 5% for five years or more can be considered mature. However, the health of the semiconductor industry has been evergreen. This can be best judged by the way wafer fabs are moving up the chain. Although 450mm wafer volume production is not expected in this decade, the number of 300mm wafer fabs has been increasing continuously since the beginning of 2002. Semiconductor companies are moving their production facilities up from 150mm and 200mm wafers to 300mm wafers.

The number of 200mm wafer fabs with volume production reached a maximum of 210 and then started declining. By the end of 2015, the number of 200mm wafer fabs was reduced to 148.


An IC Insights report provides this chart depicting the number of 300mm IC wafer fabs that have been in volume production since 2002. The chart shows steady growth in the number of 300mm IC wafer fabs and forecasts it to reach 117 by 2020.

The only year when the number declined slightly was 2013. This was because of the closure of 2 fabs by ProMOS in 2013 and delays in the schedule of some new fabs that were to open that year. In 2014, the number of 300mm fabs jumped from 81 to 90. Naturally, this was reflected in a handsome 8% increase in semiconductor revenue in 2014, from $315 billion in 2013.
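The revenue figures quoted in this post can be cross-checked with a few lines of arithmetic. The 2013 and 2014 totals and the percentage changes come from the text; the 2015 and 2016 values below are simply derived from those percentages and are not reported figures.

```python
revenue_2013 = 315.0   # $B, from the article
revenue_2014 = 340.3   # $B, record high quoted in the article

# Growth from 2013 to 2014; the article rounds this to 8%.
growth_2014 = revenue_2014 / revenue_2013 - 1

# Derived, not reported: apply the quoted percentage changes.
revenue_2015 = revenue_2014 * (1 - 0.019)   # 1.9% decline in 2015
revenue_2016 = revenue_2015 * (1 - 0.006)   # forecast 0.6% decline in 2016

print(f"2014 growth:  {growth_2014:.1%}")
print(f"2015 implied: ${revenue_2015:.1f}B")
print(f"2016 implied: ${revenue_2016:.1f}B")
```

The numbers are self-consistent: $315 billion growing by roughly 8% lands on the quoted $340.3 billion, and the two small declines leave the total only a few percent below the 2014 peak, which is what "flat" means here.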

In the semiconductor industry, growth of 0-5% in dollar terms means much higher growth in terms of wafer area produced, because increasing the scale of wafer production drastically reduces the price per transistor.
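Part of that scale comes from wafer size itself, since area grows with the square of the diameter. The sketch below compares the wafer sizes mentioned in this post; it is plain geometry and ignores edge exclusion, die size and yield, so the ratios are upper bounds on usable area rather than die counts.

```python
import math


def wafer_area_cm2(diameter_mm: float) -> float:
    """Area of a circular wafer in square centimetres (no edge exclusion)."""
    radius_cm = diameter_mm / 20.0   # mm diameter -> cm radius
    return math.pi * radius_cm ** 2


for small, large in ((150, 200), (200, 300), (300, 450)):
    ratio = wafer_area_cm2(large) / wafer_area_cm2(small)
    print(f"{small}mm -> {large}mm: {ratio:.2f}x the area per wafer")
```

Each of the 200mm to 300mm and 300mm to 450mm steps gives 2.25x the area per wafer, which is why a fab count that grows slowly can still represent a large increase in silicon produced.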

The arrival of 450mm wafer fabs will determine how the number of 300mm wafer fabs grows in the future. Also, the price advantage of 450mm wafers is yet to be seen because of their higher production cost. Lithography is a big challenge in 450mm wafer production; the equipment is not there yet and will be highly priced when available.

Within the semiconductor industry, the wafer fab segment of the market is also maturing. The number of companies with fabs is declining. There are fewer companies with higher investment fabs like 300mm. Today, 200mm fabs are spread across 61 fab companies in the world, whereas 300mm fabs are concentrated among just 22 companies.

There will be still fewer companies with 450mm wafer fab facilities. In the last two years, we saw a large consolidation among semiconductor companies. With wafer fabs increasingly concentrated among fewer companies, reducing the price per transistor any further may be a challenge. However, wafer production is expected to remain healthy.
