
A Brief History of TSMC’s OIP part 2
by Paul McLellan on 09-18-2013 at 11:00 pm

The existence of TSMC’s Open Innovation Platform (OIP) program further sped up disaggregation of the semiconductor supply chain. Partly, this was enabled by the existence of a healthy EDA industry and an increasingly healthy IP industry. As chip designs had grown more complex and entered the system-on-chip (SoC) era, the amount of IP on each chip was beyond the capability or the desire of each design group to create. But, especially in a new process, EDA and IP qualification was a problem.

See also Part 1.

On the EDA side, each new process came with discontinuous new requirements that demanded more than just expanding the capacity and speed of the tools to keep up with increasing design size. Strained silicon, high-K metal gate, double patterning and FinFETs each required new support in the tools and designs to drive the development and test of the innovative technology.

On the IP side, design groups increasingly wanted to focus all their efforts on parts of their chip that differentiated them from their competition, and not on re-designing standard interfaces. But that meant that IP companies needed to create the standard interfaces and have them validated in silicon much earlier than before.

The result of OIP has been to create an ecosystem of EDA and IP companies, along with TSMC’s manufacturing, to speed up innovation everywhere. Because EDA and IP groups need to start work before everything about the process is ready and stable, the OIP ecosystem requires a high level of cooperation and trust.

When TSMC was founded in 1987, it really created two industries. The first, obviously, is the foundry industry that TSMC pioneered before others entered. The second was the fabless semiconductor industry: companies that do not need to invest in fabs. This has been so successful that two of the top 10 semiconductor companies, Qualcomm and Broadcom, are fabless, and all the top FPGA companies are fabless.

The foundry/fabless model largely replaced IDMs and ASIC. An ecosystem of co-operating specialist companies innovates fast. The old model of having process, design tools and IP all integrated under one roof has largely disappeared, along with the “not invented here” syndrome that slowed progress since ideas from outside the IDMs had a tough time penetrating. Even some of the earliest IDMs from the “real men have fabs” era have gone “fab lite” and use foundries for some of their capacity, typically at the most advanced nodes.

Legendary TSMC Chairman Morris Chang’s “Grand Alliance” is a business model innovation of which OIP is an important part, gathering all the significant players together to support customers. Not just EDA and IP but also equipment and materials suppliers, especially high-end lithography.

Digging down another level into OIP, there are several important components that allow TSMC to coordinate the design ecosystem for their customers.

  • EDA: the commercial design tool business flourished when designs got too large for hand-crafted approaches and most semiconductor companies realized they did not have the expertise or resources in-house to develop all their own tools. This was driven first in the front end by the invention of ASIC, especially gate arrays, and then in the back end by the invention of foundries.
  • IP: this used to be a niche business with a mixed reputation, but is now very important, with companies like ARM, Imagination, CEVA, Cadence and Synopsys all carrying portfolios of important IP such as microprocessors, DDRx, Ethernet, flash memory and so on. In fact, large SoCs now contain over 50% and sometimes as much as 80% IP. TSMC has well over 5,500 qualified IP blocks for customers.
  • Services: design services and other value-chain services calibrated with TSMC process technology help customers maximize efficiency and profit, getting designs into high-volume production rapidly.
  • Investment: TSMC and its customers invest over $12B/year. TSMC and its OIP partners alone invest over $1.5B. On advanced lithography, TSMC has further invested $1.3B in ASML.

Processes are continuing to get more advanced and complex, and the size of a fab that is economical also continues to increase. This means that collaboration needs to increase as the only way to both keep costs in check and ensure that all the pieces required for a successful design are ready just when they are needed.

TSMC has been building ecosystems of increasing richness for over 25 years, and feedback from partners is that they see benefits sooner and more consistently than when dealing with other foundries. Success comes from integrating usage, business models, technology and the OIP ecosystem so that everyone succeeds. There are a lot of moving parts that all have to be ready. It is not possible to design a modern SoC without design tools, more and more SoCs involve more and more 3rd-party IP, and, at the heart of it all, the process and the manufacturing ramp with its associated yield learning all need to be in place at TSMC.


The proof is in the numbers. Fabless growth in 2013 is forecast to be 9%, over twice the 4% increase for the overall industry. Fabless companies have doubled their share of the semiconductor market from 8% to 16%, during a period when the growth in the overall semiconductor market has been unimpressive. TSMC’s own contribution to semiconductor revenue grew from 10% to 17% over the same period.

The OIP ecosystem has been a key pillar in enabling this sea change in the semiconductor industry.

TSMC’s OIP Symposium is October 1st. For details and to register, go here.


Texture decompression is the point for mobile GPUs
by Don Dingee on 09-18-2013 at 9:00 pm

In the first post of this series, we named the popular methods for texture compression in OpenGL ES, particularly Imagination Technologies’ PVRTC, found on all Apple and many Android mobile devices. Now, let’s explore what texture compression involves, what PVRTC does, and how it differs from other approaches.

Continue reading “Texture decompression is the point for mobile GPUs”


Samsung 28nm Beats Intel 22nm!
by Daniel Nenni on 09-18-2013 at 6:00 pm

There was some serious backlash to the “Intel Bay Trail Fail” blog I posted last week, mostly personal attacks by the spoon-fed Intel faithful. There are, however, some very interesting points made amongst the 30+ comments, so be sure to read them when you have a chance.

The Business Insider article “The iPhone 5S Is By Far The Fastest Smartphone In The World…” reinforces what I have been saying regarding technology versus the customer experience. Yes, Intel has very good technology and “factory assets,” but that no longer translates to market domination in the new world order of smartphones and tablets.

The chips in the chart below are called SoCs or System on Chips. Per Wikipedia:

A system on a chip or system on chip (SoC or SOC) is an integrated circuit (IC) that integrates all components of a computer or other electronic system into a single chip. It may contain digital, analog, mixed-signal, and often radio-frequency functions—all on a single chip substrate.


The technological benefits of SoCs are self-evident: everything required to run a mobile device is on a single chip that can be manufactured in high volumes for a few dollars each. The industry implications of SoCs are also self-evident: as more functions are consolidated into one SoC, semiconductor companies will also be consolidated. For example, Apple is a systems company by definition, but they also make their own SoC. Samsung, Google, Amazon, and others are following suit. For those who can’t make their own highly integrated SoC, QCOM is the smart choice, as they are best positioned to continue to dominate the merchant SoC market.

The most interesting thing to note is that the Apple A7 chip is manufactured by Samsung on a 28nm LP (Gate-First HKMG) process while the Bay Trail chip is built on Intel 22nm (Gate-Last HKMG). So tell me: why is a 28nm chip faster than a 22nm chip? Simply stated, the Intel 22nm process is not a true 22nm process by the standard semiconductor definition. It also has to do with architecture and the embedded software that runs the chip, which directly translates to customer experience. The other factor is cost: 28nm silicon is probably the cheapest transistor we will ever see.

At the IDF keynote, Intel CEO Brian K. clearly stated that the new Intel 14nm microprocessor will be out in the first half of 2014 while the new Intel 14nm SoC will be out in the first half of 2015. Okay, so the first thing that comes out of Intel fabs is the microprocessor, then a year or so later the SoC? Either SoCs are harder for Intel than microprocessors, or SoCs are just not a priority? You tell me. Unfortunately for Intel, Samsung 14nm will also be out in the first half of 2015 and, based on what I have heard thus far, it will be VERY competitive in terms of cost, speed, and density.

Bottom line: Intel is not an SoC company, Intel is not a systems company, Intel is far removed from the user experience. All things considered, Intel is still not relevant in the mobile SoC market and I do not see that changing anytime soon. Just my opinion of course.



How Do You Do Computational Photography at HD Video Rates?
by Paul McLellan on 09-18-2013 at 2:22 pm

Increasingly, a GPU is misnamed as a “graphics” processing unit. GPUs are really highly parallel compute engines with specialized architectures. You can use these compute engines for graphics, of course, but people are inventive and find ways of using GPUs for other tasks that can take advantage of the highly parallel architecture. For example, in EDA, Daniel Payne talked to G-Char at DAC in June, who use GPUs to accelerate SPICE circuit simulation.

Users don’t actually want to program on the “bare metal” of the GPU. Firstly, the hardware interface to the GPU is often not fully revealed in the documentation, since it is only intended to be accessed through libraries. And secondly, whatever the interface is to this GPU, it will not be the same for the next-generation GPU, never mind for a competitor’s GPU. So standards have sprung up not just for using GPUs for graphics but also for using GPUs for general-purpose computation. The earliest of these was OpenCL, and Imagination was a pioneer in mobile GPU compute, being the first mobile IP vendor to achieve OpenCL conformance.

Indeed, to encourage wider adoption of GPU compute in mobile, Imagination has delivered a number of OpenCL extensions related to advanced camera interoperability. They can be used with the Samsung Exynos SoC, which is the application processor in the Galaxy S4 and some other smartphones, and is also available on a development board. These new OpenCL API extensions enable developers to implement Instagram-like functionality on real-time camera data, including computational photography and video processing, while offloading the main CPU.
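To make that concrete, here is roughly what one of those per-pixel filters looks like as a GPU kernel. This is an illustrative sketch in OpenCL C (a C dialect), not code from Imagination’s SDK; the kernel name and the sepia-style coefficients are my own. Each work-item processes one pixel of the camera frame, which is exactly the kind of embarrassingly parallel work a GPU handles far more efficiently than a CPU.

```c
/* Illustrative OpenCL C kernel: a simple "warm tone" filter applied to one
   camera frame. Each work-item handles one pixel; the mixing coefficients
   are a common sepia-style example, not anything vendor-specific. */
__kernel void warm_tone(__read_only image2d_t src,
                        __write_only image2d_t dst)
{
    const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                          CLK_ADDRESS_CLAMP_TO_EDGE |
                          CLK_FILTER_NEAREST;
    int2 pos = (int2)(get_global_id(0), get_global_id(1));
    float4 px = read_imagef(src, smp, pos);

    /* Re-mix the RGB channels to give the frame a warm, faded look. */
    float r = dot(px.xyz, (float3)(0.393f, 0.769f, 0.189f));
    float g = dot(px.xyz, (float3)(0.349f, 0.686f, 0.168f));
    float b = dot(px.xyz, (float3)(0.272f, 0.534f, 0.131f));

    write_imagef(dst, pos, (float4)(fmin(r, 1.0f),
                                    fmin(g, 1.0f),
                                    fmin(b, 1.0f),
                                    px.w));
}
```

The host code enqueues this over a (width, height) global work size once per frame, leaving the CPU free for everything else.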

The challenge to doing all this, especially in HD (and 4K is coming soon too), is that video generates a lot of data to swallow, and it has tight constraints: if you don’t like this frame there is another one coming real soon.

Historically, computational photography like this has been done on the main CPU, which is fine for still images (you can’t press the shutter 60 times a second even if the camera would let you) but it doesn’t work for HD video because the processor gets too hot. It is no good adding extra cores either, since the CPU will just overheat and shut down. But modern application processors in smartphones contain a GPU (often an Imagination PowerVR). The trick is to use a heterogeneous solution that combines these blocks and lives under the power and thermal budgets.

In an application performing, for example, real-time airbrushing on a teleconference, there are a camera, the CPU, the GPU and a video codec involved: four components all requiring access to the same image data in memory. Historically, all OpenCL implementations in the market created a behind-the-scenes copy of the image data while transferring its ownership between components. This increases memory traffic, burns power and reduces performance, which reduces, or perhaps even eliminates, the advantage of using the GPU in the first place.

Imagination has been working with partners to develop a set of extensions that allow images to be shared between multiple components using the same system memory: no increased memory traffic, lower power and better processing performance. The mechanism is based on Khronos (the group behind OpenGL) EGL images, which handle the issues related to binding and synchronization.
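Here is a minimal sketch of what that zero-copy path looks like from the host side, assuming a platform that exposes the Khronos cl_khr_egl_image extension and that the camera and codec buffers have already been wrapped in EGLImages elsewhere. Error handling is mostly omitted and the surrounding helper function is hypothetical; on real platforms the KHR entry points are normally fetched via clGetExtensionFunctionAddressForPlatform rather than called directly as below.

```c
#include <CL/cl.h>
#include <CL/cl_egl.h>   /* cl_khr_egl_image extension declarations */
#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Hypothetical helper: run an image-filter kernel directly on a camera frame
   and write the result into the buffer the video codec will consume, with no
   intermediate copies. Both buffers are assumed to be wrapped in EGLImages. */
cl_int process_frame(cl_context ctx, cl_command_queue q, cl_kernel filter,
                     EGLDisplay dpy, EGLImageKHR cam_frame, EGLImageKHR enc_frame,
                     size_t width, size_t height)
{
    cl_int err;

    /* Import the EGLImages as cl_mem objects: zero-copy, the GPU works on the
       same memory the camera wrote and the codec will read. */
    cl_mem src = clCreateFromEGLImageKHR(ctx, dpy, cam_frame,
                                         CL_MEM_READ_ONLY, NULL, &err);
    cl_mem dst = clCreateFromEGLImageKHR(ctx, dpy, enc_frame,
                                         CL_MEM_WRITE_ONLY, NULL, &err);

    /* Acquire/release brackets handle ownership transfer and synchronization
       between camera, GPU and codec, replacing the behind-the-scenes copy. */
    cl_mem objs[2] = { src, dst };
    clEnqueueAcquireEGLObjectsKHR(q, 2, objs, 0, NULL, NULL);

    clSetKernelArg(filter, 0, sizeof(cl_mem), &src);
    clSetKernelArg(filter, 1, sizeof(cl_mem), &dst);
    size_t global[2] = { width, height };
    err = clEnqueueNDRangeKernel(q, filter, 2, NULL, global, NULL, 0, NULL, NULL);

    clEnqueueReleaseEGLObjectsKHR(q, 2, objs, 0, NULL, NULL);
    clFinish(q);

    clReleaseMemObject(src);
    clReleaseMemObject(dst);
    return err;
}
```

The details of Imagination’s camera interoperability extensions differ, but this acquire/process/release pattern on shared EGL images is the essence of the zero-copy flow they enable.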

See a video explaining and showing it all in action (2 mins)

To accelerate adoption, Imagination is releasing the PowerVR GPU compute SDK and programming guidelines for PowerVR Series5XT GPUs. In the future they will release an SDK supporting PVRTrace for PowerVR Series6 GPUs, which will allow compute and graphics workloads to be tracked simultaneously.

More details are in the Imagination blog here.


Early Test → Less Expensive, Better Health, Faster Closure
by Pawan Fangaria on 09-18-2013 at 11:00 am


I am talking about the health of electronic and semiconductor designs, which, if made sound at the RTL stage, can be set right for the rest of the design cycle, giving faster closure at lower cost. Last week was the week of ITC (International Test Conference) for the semiconductor and EDA community. I was looking forward to what ITC would prescribe for the community this year, and it so happened that I was able to talk to Mr. Kiran Vittal, Senior Director of Product Marketing at Atrenta. I knew SpyGlass had good things to unveil at ITC this year about its RTL-level test methodology, so this was an opportune time to indulge in an interesting conversation with the right expert, the author of the SpyGlass whitepaper “Analysis of Random Resistive Faults and ATPG Effectiveness at RTL”.

So let’s learn about the trends, challenges, issues and solutions in test automation from this conversation with Mr. Vittal.

Q: How’s the experience from ITC? What’s the top news from there?

We had a very good show at ITC this year, with over 100 of the conference attendees checking out our latest advances in RTL testability solutions and product demonstrations at our exhibitor booth. We also had several dedicated meetings with existing and potential customers from leading semiconductor companies discussing how SpyGlass RTL solutions would address their key challenges.


[Customers watching SpyGlass product demonstrations]

Samsung’s Executive VP, Kwang-Hyun Kim gave the keynote address on “Challenges in Mobile Devices: Process, Design and Manufacturing”. He said that the quality of test for mobile devices is of utmost importance due to the large number (typically millions) of parts shipped. Any degradation in quality can badly impact the rejection ratio which can affect the bottom line as well as the reputation of the particular company.

One of the best paper awards was also related to RTL validation –
ITC 2012 Best Student Paper Award: Design Validation of RTL Circuits Using Evolutionary Swarm Intelligence by M. Li, K. Gent and M. Hsiao, Virginia Tech

Q: What is the industry trend? Is it converging towards RTL level test?

The industry trend for deep submicron designs is to ship parts with the highest test quality, because new manufacturing defects surface at smaller geometry nodes. These defects need additional tests, and the biggest concerns are the quality and cost of testing while meeting time-to-market requirements. Every semiconductor vendor performs at-speed tests at 45nm and below. The stuck-at test coverage goal has also increased from ~97% to over 99% in the last couple of years. The other concern has to do with design size, the large quantity of parts being shipped in the mobile and handheld space, and the danger of rejects from missed manufacturing defects due to the lack of high-quality tests.

The only way to get very high test quality is to address testability at an early stage in the design cycle, preferably at RTL. The cost of test can be reduced by reducing the test data volume and test time. This problem can also be addressed by adhering to good design for test practices at the RTL stage.

Q: So, there must be great interest in SpyGlass test products which work at RTL level?

Yes, there is great interest in Atrenta’s solution for addressing testability at the RTL stage. We have a majority of the large semiconductor companies adopting our RTL testability solution. They say that by adopting SpyGlass testability at RTL, they are able to significantly shorten design development time and improve test coverage and quality.

Q: What’s the benefit of using SpyGlass? How does it save testing time? What kind of fault coverage does it provide compared to ATPG?

Fault coverage estimation at RTL is very close to that of ATPG, typically within 1%.

SpyGlass DFT has a unique ability to predict ATPG (Automatic Test Pattern Generation) test coverage and pinpoint testability issues as the RTL description is developed, when the design impact is greatest and the cost of modifications is lowest. That eliminates the need for test engineers to design test clocks and set/reset logic for scan insertion at the gate level, which is expensive and time consuming. This significantly shortens development time, reduces cost and improves overall quality.

The test clocks in traditional stuck-at testing are designed to run on test equipment at frequencies lower than the system speed. At-speed testing requires test clocks to be generated at the system speed, and therefore is often shared with functional clocks from a Phase Locked Loop (PLL) clock source. This additional test clocking circuitry affects functional clock skew, and thus the timing closure of the design. At-speed tests often result in lower than required fault coverage even with full-scan and high (>99%) stuck-at coverage. Identifying reasons for low at-speed coverage at the ATPG stage is too late to make changes to the design. The SpyGlass DFT DSM product addresses these challenges with advanced timing closure analysis and RTL testability improvements.

SpyGlass MBIST has the unique ability to insert memory built-in self test (BIST) logic at RTL with any ASIC vendor’s qualified library and validate the new connections.

Q: What’s new in these products which caught attention at ITC?

We introduced a new capability at ITC for analyzing the ATPG effectiveness early at the RTL stage, especially for random-resistive or hard-to-test faults.

ATPG tools have been traditionally efficient in generating patterns for stuck-at faults. The impact on fault coverage, tool runtime and pattern count for stuck-at faults is typically within reasonable limits.

However, the impact of “hard-to-test” faults in transition or at-speed testing is quite large in terms of pattern count, runtime or test coverage. This problem can now be analyzed with the SpyGlass DFT DSM product to allow RTL designers to make early tradeoffs and changes to the design to improve ATPG effectiveness, which in turn improves test quality and overall economics. Details can be seen in our whitepaper on Random Resistive Faults.

Q: Any customer experience to share with these products?

Our customers have saved at least 3 weeks on every netlist handoff by using SpyGlass DFT at the RTL stage. The runtime of SpyGlass at RTL is at least 10 times faster than running ATPG at the netlist level. Our customers have claimed about a 25x overall productivity improvement using SpyGlass at RTL vs. other traditional methods.

This interaction with Mr. Vittal was very much focused on the testing aspects of semiconductor design. After the conversation, I did take a look at the new whitepaper on Random Resistive Faults and ATPG Effectiveness at RTL. It’s worth reading to learn about effective methods to improve fault coverage.


Mentor Teaches Us About the Higgs Boson
by Paul McLellan on 09-17-2013 at 4:46 pm

Once a year Mentor has a customer appreciation event in Silicon Valley with a guest speaker on some aspect of science. This is Silicon Valley, after all, so we all have to be geeks. This year it was Dr Sean Carroll from Caltech on The Particle at the End of the Universe, the Hunt for the Higgs Boson and What’s Next.

Wally Rhines asked who Higgs was, but Dr Carroll didn’t give much detail. Peter Higgs is actually an emeritus professor of physics at the University of Edinburgh in Scotland. Since the physics department, like computer science, was in the James Clerk Maxwell Building while I was doing my PhD and then working for the university, I presume we had offices in the same building (it’s large). As an aside, Edinburgh is famous for naming its buildings after people that it snubbed. James Clerk Maxwell, he of the equations of electromagnetism, was turned down for a lectureship. Biology is in the Darwin building. Charles Darwin wasn’t much interested in the courses in medicine he took at Edinburgh and either failed and was kicked out or moved on to London of his own volition, depending on which version of the story you like. Anyway, he never graduated.

So what about particles? In the early 20th century, everything was simple: we had protons and neutrons forming the nucleus of atoms and electrons whizzing around the outside like planets. There were all those quantum mechanical things about where the orbits had to be to stop the electrons just spiraling into the nucleus. And how the nucleus stayed together was unclear.

Around the time I was an undergraduate, the existence of the strong nuclear force was postulated (well, something had to stop the nucleus flying apart), along with the realization that protons and neutrons are actually made of three quarks (“three quarks for Muster Mark”, James Joyce in Finnegans Wake). The strong nuclear force was transmitted by gluons (which glued the nucleus together). Luckily, in my quantum mechanics courses we only had to worry about Bohr’s simple model of the atom. The weak nuclear force had actually been proposed by Fermi as another way to keep the nucleus together, and it was a force that only operated over tiny distances (unlike, say, electromagnetism, the strong nuclear force and gravity, which operate over infinite distances, albeit vanishingly weakly as distances get large).

Gradually the standard model was filled in (and it turned out that Group Theory had some practical applications outside of pure math). Various quarks were discovered. The gauge bosons that carry the weak nuclear force were found. But there was an empty box in the standard model, the Higgs boson. Rather like Mendeleev predicting certain elements nobody had ever seen because there was a hole in the periodic table, the standard model had a hole. The Higgs boson is required to explain why some particles have mass and why they don’t move at the speed of light all the time.

You probably know that the Higgs boson was discovered (technically, it is extremely likely that it was detected, but these things are probabilistic) at the Large Hadron Collider at CERN, which runs underneath parts of Switzerland and France and cost $9B to build. There was also a US collider, started in Texas, that would have been 3 times as powerful, but in the wonderful way of Congress it was canceled…but only after $2B had been spent for nothing.


EDAC Export Seminar: Don’t Know This Stuff…Go Directly to Jail…Do Not Pass Go
by Paul McLellan on 09-17-2013 at 2:04 pm

I am not making this up: All exports from the United States of EDA software and services are controlled under the Export Administration Regulations, administered by the U.S. Department of Commerce’s Bureau of Industry and Security (BIS). You need to understand these regulations. Failure to comply can result in severe penalties including imprisonment. Many smaller companies do not have the resources to track these complex and ever changing sets of export regulations, but ignorance of the laws is not an excuse. Non-compliance can be costly.

Tomorrow evening, Wednesday 18th September, you can learn from one of the experts. Cadence actually has an employee responsible for all this: Larry Disenhof, Cadence’s Group Director, Export Compliance and Government Relations, and the chairman of the EDAC Export Committee. He is an expert in export regulations and will share his knowledge of the current state of US export regulations. Learn the basics as well as best practices. But there is homework you should do before tomorrow: watch the export overview presentation here so you already know the basics.

The seminar will be held at EDAC, 3081 Zanker Road, San Jose. There is a reception at 6pm and the seminar runs from 7 to 8:30pm.

If you work for an EDAC member company it is free. If you don’t, then you can’t go. Even though it is free, you do need to register here.

BTW, I assume you already know that one of the benefits of being an EDAC member company, in addition to seminars such as this one, is that you get a 10% discount off the cost of your DAC booth floorspace. That can make EDAC membership close to free.

Talking of EDAC, don’t forget the 50th Anniversary of EDA on October 16th at the Computer History Museum (101 and Shoreline, just near the Googleplex). Join previous Kaufman Award recipients like Bob Brayton and Randy Bryant. And the founders of EDAC, Rick Carlson and Dave Millman. Previous CEOs such as Jack Harding, Penny Herscher, Bernie Aronson, Rajeev Madhavan, and Sanjay Srivastava. Current EDAC board members: Aart de Geus, Lip-bu Tan, Wally Rhines, Simon Segars, John Kibarian, Kathryn Kranen, Ravi Subramanian, Dean Drako, Ed Cheng, and Raul Camposano. Investors who have focused on EDA, like Jim Hogan and John Sanguinetti. Not to mention Dan Nenni and me (SemiWiki is one of the sponsors of the event). To register, go here.


Are 28nm Transistors the Cheapest…Forever?
by Paul McLellan on 09-17-2013 at 10:43 am

It is beginning to look as if 28nm transistors, which are the cheapest per million gates compared to any earlier process such as 45nm, may also be the cheapest per million gates compared to any later process such as 20nm.

What we know so far: FinFET seems to be a difficult technology because of the 3D structure and the novel manufacturing it requires, but it appears to be stable once mastered. Intel ramped it at 22nm and TSMC says they are on track to have it at 16nm. What Intel doesn’t have at 22nm is double patterning, which TSMC does have at 20nm. Double patterning seems to have severe variability problems even when mastered. TSMC have not yet ramped 20nm to HVM, so there is still an aspect of wait-and-see there.

The cheap form of double patterning is non-self-aligned, meaning that the alignment of the two patterns on a layer is entirely up to the stepper repeatability, which is of the order of 4nm apparently. This means there is huge variation in any sidewall effects (such as sidewall capacitance), since the distance between the “plates” of the capacitor may vary by up to 4nm. This is variability that is very hard to remove (the stepper people are, of course, trying to tighten up repeatability, which will be needed in any case for later processes). Instead, EDA tools need to analyze it and designers have to live with it, but the margins to live with are getting vanishingly small.
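For a rough sense of scale (my numbers, not from the article): treating two adjacent wires drawn on the two masks as a parallel-plate capacitor, the fractional change in sidewall capacitance roughly tracks the fractional change in spacing. With a hypothetical 32nm spacing, about the minimum metal spacing you would expect at 20nm, and the ~4nm overlay error:

\[ \frac{\Delta C}{C} \approx \frac{\Delta d}{d} = \frac{4\,\mathrm{nm}}{32\,\mathrm{nm}} \approx 12\% \]

A parasitic that can swing by roughly ten percent purely from mask-to-mask overlay is exactly the kind of variation that has to be analyzed in extraction and signoff rather than absorbed into margin.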

There is a more expensive form of double patterning that is self-aligned, using a spacer (or mandrel). The material required on the wafer is laid down, a mandrel is patterned on top, spacers are formed on the edges of the mandrel as sidewalls, and then the mandrel is removed. The sidewall pattern is then used to etch the underlying material. This involves a lot more process steps and is a lot more expensive, but it does have less variability, since the two sidewalls are closely aligned due to the way they were manufactured. It looks like we will need to use this approach to construct the FinFET transistors and their gates for 10nm and below.

A general rule in fabrication is to touch the wafer as few times as possible, and double patterning inevitably drives the number of touches up. One way to get costs down is to use bigger wafers, and of course there is a big push towards 450mm wafers. These provide about the same reduction in cost per million transistors as we used to get from a process generation (where the rule of thumb was twice as many transistors for a 15% increase in wafer cost, leaving about a 35% cost reduction per million transistors). But 450mm reduces the cost of all processes, and so probably the only thing that will ever be cheaper than 28nm on 300mm wafers will be 28nm on 450mm wafers. Or perhaps 28nm on 300mm wafers running in a fully depreciated fab.
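As a quick check of that rule of thumb (idealized numbers: a full 2x density gain and no yield penalty, so this is the best case):

\[ \frac{\text{new cost per transistor}}{\text{old cost per transistor}} = \frac{1.15 \times \text{wafer cost}}{2 \times \text{transistors per wafer}} \Big/ \frac{\text{wafer cost}}{\text{transistors per wafer}} = \frac{1.15}{2} \approx 0.58 \]

That is roughly a 40% reduction in the ideal case; with real density gains below 2x and some yield loss, the number lands nearer the 35% in the rule of thumb. Double patterning attacks the cycle from the other end: the wafer-cost increase goes well beyond 15% while the effective density gain shrinks.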

The other hope for cost reduction is EUV lithography. I’m skeptical about it, as you know if you’ve read my other blogs about it. Even if it works, people appear to be planning to do 10nm without it (except in pilot stuff). EUV is almost comical if you describe it to someone. Droplets of molten tin are shaped with a small laser. Then a gigawatt-sized power plant’s worth of energy drives half a dozen huge lasers that blast the molten tin, vaporizing it and producing a little bit of EUV light. But everything absorbs EUV light, so everything also has to be in a vacuum. Then the light is bounced off half a dozen mirrors and a reflective mask. And I use “reflective” in a relative way: each mirror absorbs roughly 30% of the light, since these are actually mirrors that work by interference and Bragg reflection (a regular polished metal mirror would simply absorb the EUV). So maybe 4% of the light reaches the photoresist. And if that isn’t enough, the masks cannot be made defect-free. And contamination on the mask will print, since we (probably) can’t put a pellicle on it to keep contamination out of the focal plane: the pellicle would absorb all the EUV too. But maybe it will all come good. After all, when you first hear about immersion lithography or CMP, they sound like pretty unlikely ways to make electronics too.

If this scenario is true, there are a couple of big problems. The first is that electronics will stop getting cheaper. You can have faster processors, lower power, more cores or whatever, but it will cost you. In the past we have always had a virtuous cycle where costs got reduced, performance and power improved, and design size increased. So even if you didn’t want to move from 90nm to 65nm for performance, power or size reasons, the cost reduction made you do it anyway. That will no longer be true. Yes, Apple’s Ax chips for high-end smartphones will move even if the chips cost twice as much: in a $600 phone you won’t notice. But the mainstream smartphone market, and the area with the predicted high growth, is sub-$100. Those phones will all have to be made at 28nm for cost reasons, and make do without the stuff 20nm and below offers. Products that can support a premium for improved performance will benefit, of course, but we’ve never been in an era where next year’s quad-core chip costs twice what this year’s dual-core chip did.

The other big problem is that if only a few designs move to these later nodes (the bleeding-edge designs that really need the performance), will that be enough to justify the multi-billion dollar investment in developing the processes and building the fabs? Those leading-edge smartphone, router and microprocessor chips can go to 22/20nm for a year, then move to 16/14nm. But then…crickets. All the other designs can’t afford to pay the premium and stay at 28nm. Chip design will be like other industries, batteries say, improving at most by a few percent per year and no longer with any exponential component.

To be fair, Intel have said publicly that they see costs continuing to come down. Various theories are around as to why this is. It seems likely that they believe it rather than just posturing. Maybe they know something nobody else does. I know equipment people who say that they no longer get any access to Intel fabs so don’t really know everything their equipment is being used for. Maybe they are mistaken. Or maybe it is true for Intel who are transitioning from a very high-margin microprocessor business that is not very cost-sensitive to a foundry/SoC business that is very cost-sensitive, and so are also transitioning from not having good wafer costs to being forced to be competitive with everyone else. I’ve said before that managers at Intel often think that they are better than they are since there is so much margin bleedthrough from microprocessors that everyone else looks good. Maybe this is just another facet of that phenomenon.

See also my report on EUV from Semicon West in July.
See also my take on Intel’s cost-reduction statements.


TSMC’s 16FinFET and 3D IC Reference Flows
by Paul McLellan on 09-17-2013 at 2:01 am

Today TSMC announced three reference flows that they have been working on along with various EDA vendors (and ARM and perhaps other IP suppliers). The three new flows are:

  • 16FinFET Digital Reference Flow. Obviously this has full support for non-planar FinFET transistors including extraction, quantized pitch placement, low-vdd operation, electromigration and power management.
  • 16FinFET Custom Design Reference Flow. This supports the non-digital stuff, allowing full custom transistor-level design and verification including analog, mixed-signal, custom digital and memory.
  • 3D IC Reference Flow, addressing vertical integration with true 3D stacking using TSVs through active silicon and/or interposers.


There have been multiple silicon test vehicles. The digital reference flow uses an ARM Cortex-A15 multicore processor as a validation vehicle and helps designers understand the challenges of full 3D RC modeling and quantized transistor widths, which are the big “new” gotchas in the FinFET world. The flow also includes methodology and tools for improving PPA in 16nm including low voltage operation analysis, high resistance layer routing optimization, path based analysis and graph based analysis correlation to improve timing closure.

By definition there is less automation in the custom reference flow because it’s custom and the designer is expected to do more by hand. But obviously it includes the verification necessary for compliance with 16nm manufacturing and reliability requirements.

The 3D IC flow allows everything to move up into the third dimension. This is still a work in progress, so I don’t think this will be any kind of final 3D flow. But it supports what you would expect: the capability to stack die using through-transistor stacking (TTS), through-silicon vias and microbumps, backside metal routing, and TSV-to-TSV coupling extraction.

So what is TTS? It is TSMC’s own name for TSV on wafers containing active devices (as opposed to on interposers, which typically only contain metal routing and decaps, where they still use the TSV name). The 3D test vehicle has stacked memories on top of 28nm SoC logic die (connected via microbumps). The 28nm logic die has TSVs through active silicon and connects to the backside routing (also called re-distribution layer or RDL) and C4 bumps on the backside of the logic die. The bumps then connect to standard substrate on the module. So this is true 3D, not 2.5D where die are bumped and flipped onto an interposer, and only the interposer (which doesn’t contain active devices) has TSVs. One of the challenges of TSVs is that the stress of manufacturing them alters transistor threshold voltages in the vicinity, and probably other stuff I’ve not heard about.


So FinFETs are coming at 16nm and the flows are ready to start designs, already validated in silicon. Plus a true 3D More than Moore flow.

OIP is coming up on October 1st. I’m sure that one of the keynotes will have some more about 16nm and 3D. For details and to register go here.


How to Design an LTE Modem
by Paul McLellan on 09-16-2013 at 4:24 pm

Designing an LTE modem is an interesting case study in architectural and system level design because it is pretty much on the limit of what is possible in a current process node such as 28nm. I talked to Johannes Stahl of Synopsys about how you would accomplish this with the Synopsys suite of system level tools. He is the first to admit that this is not a push-button flow where everything flows cleanly from one tool to the next, but more of a portfolio of technologies that can be used to get a modem done. Another complication over previous generations is that multiple radios can be used simultaneously.

LTE is actually a whole series of different standards with different uplink and downlink data rates, but one thing is constant: no matter what the data rate, the power dissipation of the modem must be such that the battery of the phone will last all day. So efficient tradeoff analysis is required to meet power and performance goals.

A high-end LTE modem requires approximately 1 TOPS (tera operations per second) at 1W. To get there requires a complex architecture in which things happen in parallel. The picture above shows the type of architecture involved, with dedicated FFT units and multiple SIMD execution units.

In principle it is possible to design a modem entirely in software, but the power dissipation would be unacceptably high. It is also possible to design highly optimized RTL but the design cycle would stretch out unacceptably and it would be too inflexible to cope with changes in the standards and the phone price points.
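To see why software alone cannot get there, turn the 1 TOPS at 1W budget into energy per operation (the CPU figure below is a typical order of magnitude, not a number from the article):

\[ \frac{1\,\mathrm{W}}{10^{12}\ \mathrm{ops/s}} = 1\,\mathrm{pJ\ per\ operation} \]

A general-purpose CPU typically burns tens to hundreds of picojoules per operation once instruction fetch, decode and data movement are counted, so a pure software modem would overshoot the budget by one to two orders of magnitude. Dedicated blocks such as FFT engines and wide SIMD units are what bring the energy per useful operation down toward that 1pJ mark, which is the tradeoff the architectural exploration below has to settle.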


So step 1 is architectural exploration to answer questions such as:

  • Application-level parallelism?
  • How many cores?
  • Which parts in HW and SW?
  • Memory architecture?
  • Interconnect topology?
  • Performance, power?


The verification of the architecture then requires a flow that takes both the basic block level architecture and the actual software loads as input, with a goal of refining the architecture so that the block level performance and power envelopes are defined, and the interconnectivity (such as bus widths) is determined. This can involve cycle accurate models, virtual platforms, Zebu emulation boxes and FPGA prototypes.


One possible type of block to include in the design is an application specific processor (ASIP). Configurable processors are one approach to modem design, but they don’t necessarily hit the PPA sweet spot as well as an ASIP created with Synopsys’s processor design tool (the LISA technology that came to Synopsys via CoWare). The processor will require specialized functions useful for modems, such as matrix inversion, error control coding (ECC) and so on.


One nice side effect of the model-based approach is that at the end there is a virtual platform that can be used to accelerate software development before silicon is available (and perhaps after, since control and visibility are so much better in a virtual platform). Usually people don’t set out to change their software development methodology, but once the virtual platform has been created for architectural reasons, it is ideal for the very complex debugging involved: several loads of software running on different processors (control processor, DSP software, protocol software, hardware, etc.), often each with its own debugger.

This approach doesn’t make LTE modem design easy but it does at least make it possible.

More details on Platform Architect, Processor Designer, and Virtualizer.