
TSMC Breaks More Records in Q3 2014!

by Daniel Nenni on 10-16-2014 at 4:00 pm

As previously predicted, TSMC is having another record-breaking SoC quarter. TSMC is my favorite economic bellwether, and from what I can see the semiconductor industry will continue to grow at a rapid rate this year and next, thanks to TSMC and the fabless semiconductor ecosystem:

We have set a new record of revenue and profitability thanks to strong demand and our successful ramp of 20nm. Our revenue increased 14% sequentially and 29% on a year-over-year basis to reach NT$209 billion. Our gross margin exceeded 50% to reach 50.5%, which is also a record since the second half of 2006.
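As a quick sanity check on the quoted figures, the growth rates can be run backwards to the prior-quarter and year-ago revenue they imply. This is a sketch that treats the quoted percentages as exact; TSMC rounds them, so the backed-out values are approximations, not reported numbers.

```python
# Figures quoted above for TSMC Q3 2014 (revenue in NT$ billions).
Q3_2014_REVENUE = 209.0
QOQ_GROWTH = 0.14      # sequential growth vs. Q2 2014
YOY_GROWTH = 0.29      # year-over-year growth vs. Q3 2013
GROSS_MARGIN = 0.505


def implied_prior_revenue(revenue: float, growth: float) -> float:
    """Back out earlier-period revenue from current revenue and a growth rate."""
    return revenue / (1.0 + growth)


q2_2014 = implied_prior_revenue(Q3_2014_REVENUE, QOQ_GROWTH)
q3_2013 = implied_prior_revenue(Q3_2014_REVENUE, YOY_GROWTH)
gross_profit = Q3_2014_REVENUE * GROSS_MARGIN

print(f"Implied Q2 2014 revenue: NT${q2_2014:.1f}B")            # NT$183.3B
print(f"Implied Q3 2013 revenue: NT${q3_2013:.1f}B")            # NT$162.0B
print(f"Implied Q3 2014 gross profit: NT${gross_profit:.1f}B")  # NT$105.5B
```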

20nm is the focus of this quarter and rightly so. Let’s not forget that a famed semiconductor analyst, Dr. Handel Jones of IBS, predicted at the beginning of this year that:

  • 20nm will be a high volume technology node in 2015 but mostly 2016.
  • 16/14nm will provide low cost gates with volume production only in 2017.
  • 10nm will be postponed. Cost per gate will be prohibitive and unclear where demand will come from outside high-speed processors and FPGAs.

Also Read: Handel Jones Predicts Process Roadmap Slips

To be clear, 20nm is in high volume production TODAY and 14nm/16nm will be in high volume production in 2015. As for 10nm:

On 10 nanometer development our 10 nanometer development is progressing according to plan. Currently we are working on early customer collaboration for product tape outs in 4Q of 2015. The risk production date remains targeted at the end of 2015. Our goal is to enable our customers’ production in 2016.

To meet this goal we are getting our 10 nanometer design ecosystem ready now. We have completed certification of over 35 EDA tools using ARM CPU core as a vehicle. In addition we have started the IP validation process six months earlier than previous nodes with our IP partners.

TSMC has a dozen 10nm early access customers designing SoCs, baseband/LTE chips, CPUs, GPUs, network processors, FPGAs, and game consoles. So, rather than postponing, TSMC has pulled in 10nm to better align with the Intel 10nm road map. Intel’s famed process lead, with regard to SoCs, has slowed since 22nm. The Intel Bay Trail 22nm SoC was released in Q3 2013 while Apple’s 20nm A8 was released in Q3 2014. The Intel 14nm Cherry Trail SoC has now been delayed to Q1 2015, and the Intel 14nm Broxton SoC targeting phones is no longer being discussed publicly.

We are happy to say that 16 nanometer has achieved the best technology maturity at the same corresponding stage as compared to all TSMC’s previous nodes. On the yield the progress is much better than our original plan. This is because the 16 nanometer uses similar process to 20 SoC except for the transistors and since 20 SoC has been in mass production with good yield.

On the performance side compared with the 20 SoC, 16 FinFET is greater than 40% speed, faster than the 20 SoC at the same total power or consume less than 50% power at the same speed. Our data shows that in high-speed application it can run up to 2.3 GHz or on the other hand for low power application it consumes as low as 75 milliwatts per core.

Samsung still seems to have a production lead over TSMC at 14nm. My expectation is that Samsung 14nm SoCs will start revenue in Q2 2015 and TSMC 16nm FF+ SoCs will start in Q3 2015, so Intel’s process lead is narrowing. At 10nm I expect the foundries to be in lockstep with Intel (for SoCs). Just my opinion of course.

You can find TSMC’s quarterly results and presentation materials HERE.


Finding Logic Issues Early that Impact Physical Implementation

by Daniel Payne on 10-16-2014 at 7:00 am

Complex SoC project teams typically use a divide-and-conquer approach in which specialized engineers work in separate domains, like front-end or back-end. The five major engineering tasks in IC design are RTL design, synthesis, floor planning, place and route, and finally design analysis.

What if you could detect physical implementation bottlenecks earlier in the design process, like during RTL design? That ability would save you not only design time, but also reduce the number of iterations in trying to reach design closure and sign-off. Here’s what the design iteration cycle looks like if you wait until physical implementation and then analyze for design closure:

Related: A Complete Timing Constraints Solution – Creation to Signoff

The steps in red show that engineering has to review the results of Static Timing Analysis (STA), identify the critical paths that are limiting the clock rate, rework the floor plan, try various design optimizations, re-run STA, then maybe go all the way back to RTL and try refactoring code to eliminate the logic creating the bottleneck for closure. One EDA vendor that has studied this issue is Atrenta, and their approach to reduce design closure iterations is called Physical Lint – a step used during RTL coding that predicts the physical implementation effects of logic at an early stage:


RTL analysis can identify logic structures that are known to cause physical implementation issues, such as:

  • Large logic cones
  • Large muxes
  • Registers or memories
  • Excessive register count
  • Unintended black boxes for physical implementation

Related: Assertion Synthesis – From Startup to Mainstream

During physical lint, the SpyGlass tool applies rules based on your actual technology-mapped design to improve accuracy. For the large logic cone rule, the tool can automatically identify flip-flops with high fan-in cones, then show you how to split them up. Likewise, for flip-flops with high fan-out, the fix is to add a pipeline.
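To make the fan-in and fan-out rules concrete, here is a toy sketch of that kind of check. It is not how SpyGlass works internally: the netlist format, the `ff_`/`in_` naming convention, and the thresholds are all hypothetical, and cone size is measured simply as the number of startpoints (registers or primary inputs) feeding each flip-flop.

```python
from collections import defaultdict

# Hypothetical toy netlist: each node maps to the list of nodes that drive it.
# Naming convention (an assumption for this sketch): "ff_" = flip-flop,
# "in_" = primary input, anything else = combinational gate.
NETLIST = {
    "ff_a": ["in_0"], "ff_b": ["in_1"], "ff_c": ["in_2"],
    "ff_d": ["in_3"], "ff_e": ["in_4"],
    "g1": ["ff_a", "ff_b"], "g2": ["ff_c", "ff_d"],
    "g3": ["g1", "g2"], "g4": ["g3", "ff_e"],
    "ff_out": ["g4"],
    "g5": ["ff_a"], "g6": ["ff_a"], "g7": ["ff_a"],
}


def is_startpoint(node: str) -> bool:
    return node.startswith(("ff_", "in_"))


def fanin_startpoints(netlist, ff):
    """Walk the combinational cone behind ff and collect its startpoints."""
    seen, starts = set(), set()
    stack = list(netlist.get(ff, []))
    while stack:
        node = stack.pop()
        if is_startpoint(node):
            starts.add(node)              # the cone stops at registers/inputs
        elif node not in seen:
            seen.add(node)                # visit each gate once
            stack.extend(netlist.get(node, []))
    return starts


def fanout_counts(netlist):
    """Count how many loads each node drives."""
    counts = defaultdict(int)
    for drivers in netlist.values():
        for driver in drivers:
            counts[driver] += 1
    return counts


def physical_lint(netlist, max_cone=4, max_fanout=3):
    """Flag flip-flops whose fan-in cone or fan-out exceeds (made-up) limits."""
    flops = [n for n in netlist if n.startswith("ff_")]
    loads = fanout_counts(netlist)
    issues = []
    for ff in flops:
        cone = len(fanin_startpoints(netlist, ff))
        if cone > max_cone:
            issues.append((ff, "large fan-in cone: split the logic", cone))
        if loads[ff] > max_fanout:
            issues.append((ff, "high fan-out: add a pipeline stage", loads[ff]))
    return issues


for issue in physical_lint(NETLIST):
    print(issue)
```

Here `ff_out` is flagged for its five-startpoint cone and `ff_a` for driving four loads; a production tool would work from the technology-mapped netlist and Liberty data rather than a hand-written dictionary.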

For muxes, the SpyGlass Physical tool identifies both wide and deep muxes and can show you where to refactor the RTL by mux spreading, thereby reducing physical congestion.
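The article does not spell out what Atrenta’s mux spreading transformation looks like, so the following is only a generic illustration of one common restructuring: replacing a single wide n:1 mux with a tree of narrower k:1 muxes. The function names and the default `k=4` are hypothetical choices for this sketch; `tree_select` is a behavioral model used to check that the tree selects the same input as the flat mux.

```python
import math


def mux_tree_levels(n_inputs: int, k: int = 4) -> list[int]:
    """Number of k:1 muxes needed at each level to replace one n:1 mux."""
    if n_inputs < 2 or k < 2:
        raise ValueError("need n_inputs >= 2 and k >= 2")
    levels, remaining = [], n_inputs
    while remaining > 1:
        remaining = math.ceil(remaining / k)   # each k:1 mux merges up to k signals
        levels.append(remaining)
    return levels


def tree_select(inputs, sel: int, k: int = 4):
    """Behavioral model of the tree: consume sel one base-k digit per level."""
    level = list(inputs)
    while len(level) > 1:
        digit, sel = sel % k, sel // k
        groups = [level[i:i + k] for i in range(0, len(level), k)]
        # Only the group on the selected path matters; clamping just avoids
        # indexing past a short final group that is not on that path.
        level = [g[min(digit, len(g) - 1)] for g in groups]
    return level[0]


print(mux_tree_levels(64))    # [16, 4, 1]
print(mux_tree_levels(100))   # [25, 7, 2, 1]
```

Each level is a smaller, more local structure, which is the general idea behind spreading a wide mux out; the exact split a tool recommends would depend on the physical context.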

All of the rule violations identified by SpyGlass Physical are presented in a spreadsheet viewer and ranked by their physical impact, so that engineers can quickly locate the offending RTL source code. Modifications are suggested for each rule violation, keeping the designer in control of all changes.

Related: Automatic RTL Restructuring: A Need Rather than Convenience

Mark Baker of Atrenta presented the SpyGlass Physical approach in a 29-minute webinar, now archived and available online. The basic premise is that an ounce of prevention is worth a pound of cure, so why not use the physical lint approach on your next IC project?

Q&A

Q: Will SpyGlass Physical attempt to update my RTL?

A: Today we identify the RTL code, then suggest how to change your RTL.

Q: What physical data is required to run Physical Lint?

A: We’re using the standard Liberty format for your cell library along with the RTL source code.

Q: How does SpyGlass compare to actual logic synthesis results?

A: In physical lint we’re doing a fast synthesis process, and there’s not a need for heavy timing optimizations.

Q: How does the actual implementation compare to what SpyGlass predicts?

A: We see excellent correlation in identifying physical congestion.


Demler: Quad Core is Just For Marketing; Intel Will Not Succeed in Mobile

by Paul McLellan on 10-15-2014 at 9:00 pm

At Memcon today Mike Demler of the Linley Group (and coincidentally someone who used to work for me back at Cadence and who now run Memcon, small world) gave an interesting presentation on Trends in Mobile Processors. A mobile application processor (AP) is a highly integrated SoC to run the applications in a mobile device. Mostly they run Android or iOS although there are a few other mobile operating systems around. The AP always contains MMU, GPU, ISP and VPU but increasingly they might contain cellular baseband (typically LTE these days), Bluetooth, Wi-Fi and GPS although sometimes those are on separate chips. The power is 2W or less in a phone, 4-5W in a tablet.

Starting with Samsung’s Exynos Octa, these often contain four main cores, or up to 8 in ARM big.LITTLE configurations. All cores get counted since all can be running at the same time (initially big.LITTLE could only run either the big or the little core of each pair). The smaller cores are slower but consume a lot less power (and area). For example, Cortex-A7 is 3.5x more efficient in MIPS/W and 2x in MIPS/mm².


But there is a dirty secret. When the Chinese smartphone test service Testin measured multicore utilization on a Qualcomm APQ8064 (a quad-core Snapdragon AP), utilization was 58% on the first core (actually the zeroth one; computer scientists like to start counting at zero) and 49% on core 1, but only 2% and 1% on cores 2 and 3. The numbers vary a little with the benchmark, but the basic fact remains true: only two cores are really getting any use.
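Summing the per-core figures shows how little of the quad-core chip is doing work. A minimal sketch using the Testin numbers quoted above; the 10% cutoff for calling a core "busy" is an arbitrary assumption.

```python
# Per-core utilization Testin reported for the quad-core APQ8064 (cores 0..3).
UTILIZATION = [0.58, 0.49, 0.02, 0.01]

# Sum of utilizations = how many cores' worth of work was actually done.
core_equivalents = sum(UTILIZATION)
average = core_equivalents / len(UTILIZATION)

# Cores above an arbitrary 10% threshold count as "really used" here.
busy_cores = [core for core, u in enumerate(UTILIZATION) if u > 0.10]

print(f"{core_equivalents:.2f} cores' worth of work")   # 1.10 cores' worth of work
print(f"average utilization: {average:.1%}")            # average utilization: 27.5%
print(f"busy cores: {busy_cores}")                      # busy cores: [0, 1]
```

In other words, the four cores together deliver only about 1.1 cores’ worth of work in this measurement.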

Also Read: Samsung Profits Fall 60%

So if 4 cores are not even being used, why 8 cores? Basically, marketing. One thing I’d not thought of is that in Chinese culture 8 is a symbol of prosperity (because it sounds like the word for wealth) and 4 differs from the word for death only in tone. My ex-father-in-law used to work for Wedgwood, and for the Chinese market they had to ship a tea service, say, with 5 cups and saucers, not 4. Some games can use 4 cores, but the improvement over 2 is minimal.


So where are we? Single core is pretty much gone. Quad-core is the most common configuration, and the average will be 4 cores next year. It is also all going to be 64-bit. Apple designed the first 64-bit ARM core (before ARM itself did), followed by Qualcomm (both have architectural licenses, so they don’t have to wait to get the IP from ARM). Same for the Intel processors: quad-core Bay Trail for tablets and quad-core Moorefield for smartphones (although so far only used in phablets). But again, the transition is driven more by marketing than technology, since 64-bit is not needed until devices with 4GB of DRAM appear. There is actually a performance advantage, since A53 is faster than A7 even on 32-bit code, and the area penalty is minimal.
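The 4GB threshold comes from 32-bit addressing: a 32-bit pointer can distinguish 2^32 byte addresses. A minimal check of that arithmetic, ignoring extensions such as ARM’s LPAE, which let a 32-bit core address more physical memory while each process still sees a 32-bit virtual address space:

```python
# A 32-bit address can name 2**32 distinct bytes.
ADDRESS_BITS = 32
max_bytes = 2 ** ADDRESS_BITS
max_gib = max_bytes / 2 ** 30   # 1 GiB = 2**30 bytes

print(f"{max_bytes:,} bytes = {max_gib:g} GiB")  # 4,294,967,296 bytes = 4 GiB
```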


I wrote yesterday about Intel losing $1B/quarter in mobile, and Linley’s projection is not optimistic that this will change. They predict a rapid decline in 32-bit ARM APs, rapid growth in 64-bit ARM APs (A57 at the high end and A53 at the low end, plus Apple and Qualcomm doing their own thing) and nothing significant in x86.

Graphics will get a lot more powerful, driven by games and the transition to what is called 4K video (approximately 4,000 pixels of horizontal resolution). The driver is not so much that phone screens will have that resolution, but that people will want to connect their phones and tablets to high-resolution TVs, if and when we have them (which remains to be seen).

Next week is the Linley Processor Conference, October 22nd to 23rd. I’ll be there (so will Mike Demler). It is at the Hyatt Santa Clara (yes, in the hotel, not the conference center). Details here. This is not the mobile processor conference, which is in the spring; this one is focused on networking and base stations. But note, registration closes at 5pm today.


More articles by Paul McLellan…


Did we forget non-volatile memory?

by Don Dingee on 10-15-2014 at 7:00 pm

In our rush to shrink SoC nodes more and more to achieve better performance and more complex devices, we may have forgotten a passenger in the back seat: non-volatile memory. There has been little discussion of this in the pages of SemiWiki until now. Let’s give it a closer look.

Embedded flash has usually been associated with microcontrollers, replacing ROM as the code storage mechanism and offering reprogrammability. MCUs have been on less aggressive nodes for a number of reasons, including integrated mixed-signal circuitry, and enough density on mature nodes to achieve high yields and very low cost.

The lines between SoC and MCU are blurring, now lining up with whether a core has an MMU or not. For larger code and data storage requirements, discrete flash is still a choice – and it ran into problems first.

Discrete NAND flash fabrication was happily leading the way in process shrinks. Then, issues developed as we arrived at 1x nm processes: reliability and endurance started dropping, rapidly. Noise started affecting cells packed too closely together. Write endurance, always a lurking concern with flash, became a problem as smaller cells wear out faster.

The problem with planar 2D NAND flash is now so bad, we are at the end of the line as far as process shrinks. The solution is 3D vertical NAND, or V-NAND. Relaxing geometries back to 3x nm ranges to get past the noise and wear issues, V-NAND stacks cells vertically in channeled layers to expand capacity. Scalability relies on increasing the number of vertical layers, instead of shrinking the geometry.

V-NAND presumably solves the issues for discrete flash, but does little for embedded flash. The V-NAND process is so radically different from those used for SoCs, it is hard to see V-NAND flash ever co-existing on an SoC outside of a stacked 3D-IC approach.

Similarly, a few MCU manufacturers are moving toward FRAM for integrated non-volatile memory. FRAM is fast, very durable, and low power – if it can be fabricated reliably, with exotic materials in the mix. It also does not scale in capacity or geometry very well, but may be more than adequate for MCU architectures on mature nodes.

Embedded flash is not as easy as it sounds. A quick scan shows that the most advanced publicly disclosed node belongs to Renesas, teamed with TSMC for embedded flash at 28nm. Getting beyond that point may expose exactly the same problems discrete flash has already run into at smaller geometries.

One-time programmable (OTP) antifuse memory, such as Sidense SHF, is headed for 16nm FinFET as we covered last month. OTP can emulate reprogrammability using multiple blocks for code storage. It also provides enhanced security against reverse engineering, and more than enough read endurance and reliability.
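One way to picture "emulating reprogrammability using multiple blocks" is a set of write-once slots, where each update burns the next blank slot and the newest slot holds the active image. The sketch below is a hypothetical toy model, not Sidense’s implementation; a real design would also need a per-block valid marker or checksum so a partially programmed block is never treated as active.

```python
class OtpSlotStore:
    """Toy model of emulating reprogrammability with one-time-programmable blocks.

    Hypothetical scheme: the OTP array is divided into n_slots blocks, each of
    which can be programmed exactly once. The most recently programmed block
    holds the active code image; an update burns the next blank block.
    """

    def __init__(self, n_slots: int):
        if n_slots < 1:
            raise ValueError("need at least one slot")
        self._slots = [None] * n_slots   # None = blank (unprogrammed) block

    def program(self, image: bytes) -> int:
        """Burn image into the first blank block; return that block's index."""
        for index, slot in enumerate(self._slots):
            if slot is None:
                self._slots[index] = bytes(image)
                return index
        raise RuntimeError("OTP exhausted: no blank blocks left")

    def active_image(self):
        """Return the newest programmed image, or None if nothing is programmed."""
        programmed = [s for s in self._slots if s is not None]
        return programmed[-1] if programmed else None

    def updates_remaining(self) -> int:
        return sum(1 for s in self._slots if s is None)


store = OtpSlotStore(n_slots=3)
store.program(b"firmware v1")
store.program(b"firmware v2")
print(store.active_image(), store.updates_remaining())   # b'firmware v2' 1
```

The number of slots directly sets how many field updates are possible, which is the budget the "10 or maybe 20 times" argument below is about.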

What’s the solution? As badly as we’d like a single non-volatile memory bullet, there may not be one at sub-20nm for a while.

When OTP is working at 16nm, it will go where embedded flash has not ventured yet. Not enough reprogrammability cycles for code storage? If you are reprogramming production code more than about 10 or maybe at the worst 20 times over the life of an embedded device, you don’t need better non-volatile memory, you need better programmers and QA folks.

Yes, the model of pluggable apps such as used in Android does present issues – that use case is probably best for large external discrete flash, anyway. We’re assuming a class of SoC emerges needing the performance that sub-20nm offers, but running a more MCU-like application with embedded code.

The tradeoff for OTP may be with serial SPI flash – slower (you went to sub-20nm for a reason, and executing code out of SPI would undo that), but able to capture frequently written data. The question becomes whether the data really needs to be non-volatile. Configuration data is well served with OTP, but if the requirement is writing a continuous data stream from a sensor or other source, serial SPI flash may be necessary.
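Whether SPI flash can absorb a continuous sensor stream is largely an endurance question. A back-of-the-envelope sketch, assuming ideal wear leveling across the whole device and ignoring erase-block granularity and write amplification (both of which shorten real lifetimes). The capacity, endurance and data-rate values are hypothetical, not from the article.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def flash_lifetime_years(capacity_bytes: int, endurance_cycles: int,
                         write_bytes_per_s: float) -> float:
    """Ideal lifetime: total writable bytes divided by the write rate.

    Assumes perfect wear leveling, so every byte is rewritten the same number
    of times before the device reaches its rated endurance.
    """
    total_writable = capacity_bytes * endurance_cycles
    return total_writable / (write_bytes_per_s * SECONDS_PER_YEAR)


# Hypothetical example: 4 MiB part, 100,000 erase cycles, 100 bytes/s logged.
years = flash_lifetime_years(4 * 2**20, 100_000, 100.0)
print(f"{years:.0f} years")   # 133 years

# The same part logging 10 KiB/s (also hypothetical) wears out far sooner.
print(f"{flash_lifetime_years(4 * 2**20, 100_000, 10 * 1024):.2f} years")
```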

It is an interesting tradeoff. The point of this is not to overlook integrated OTP NVM for code storage and configuration data requirements, because the assumption of embedded flash existing for all non-volatile needs may not be valid as geometries continue to shrink. Never say never, but right now it is a problem at and beyond 28nm.

How are you thinking about this issue? Have you rediscovered OTP NVM for code storage? What about data storage? Do we need research on another non-volatile memory technology for advanced nodes? Or is there just no reason to ever head south of 28nm for some applications?



"How do I …?" Fast help for specific EDA software questions
by Beth Martin on 10-15-2014 at 4:55 pm

It’s a question that we ask all the time. How do I replace a door? How do I set up a printer? For most questions in today’s world, you can open a browser, type a question into your favorite search engine…and voilà, there you go! Ten seconds of typing, a few more sorting through the results, and you have an answer (or many answers!). This process works very well for ordinary needs like replacing a door or estimating the height of a tree, and even for some technical questions. However, what about questions like—How do I extract a net from a GDSII file? How do I quickly merge hundreds of GDSII files together? When the question is how to do specific tasks with an EDA software tool, you are often out of luck with this approach.

So, what happens when design engineers get fed up with the hacked-together code or process they’ve used for the past two projects, and want to find a better way to do it? The first step is still usually to search for it—who knows, someone might have posted it somewhere in an EDA blog or forum. If that approach doesn’t pan out, it’s time to ask Bob how he did it…although Bob is in Korea (or is it India?) for a couple of months, so you might not want to wait until he has time to respond. Hey, let’s ask the account engineer…after you find the phone number. Oh, the AE’s not available for 2 days? Sigh…well, let’s go to the support site…anybody remember their password? Eventually, someone logs on to the support site, and gets contacted by a support engineer, who usually answers the question pretty quickly.

But why is it so hard to find out how to do a specific task using a tool that costs many thousands of dollars? Shouldn’t the software interface be intuitive, or contain enough online help to make any task easy to accomplish? Shouldn’t the company provide training and documentation?

Actually, EDA companies work really hard to make their software “easy to use” (really, we do!), and the documentation and training is pretty good. The disconnect comes in three parts:

  • Use it or lose it. Designers use a dizzying array of tools to accomplish their jobs, often from several vendors. Some of those tools are used frequently, others infrequently. If you don’t use a tool often, or if you perform a particular task infrequently, you forget the details, and there are an awful lot of details in EDA software.
  • Specificity. How-to questions are often very specific to an end-user problem, but manuals are generic descriptions of the product. Manuals describe what every button does, but not how or when to use it. Examples tend to describe how to use the product features, rather than solve a specific user problem.
  • Accessibility. Even if the correct information is available, end-users need to be able to get to it when needed, in a form that allows quick review and application. Surprisingly, users often don’t have time to read a 30-page technical reference when they have a tapeout schedule to maintain. EDA companies provide a colossal volume of useful content…and it’s usually behind a support firewall (for good reason, of course). But how many end-users are actually familiar with all the support centers for all the EDA tools that they use? And once they get into a support site, they find that every EDA vendor has a different format and framework, making it challenging to find the information they need in a timely manner.

    What EDA engineers need is a better way to get practical information on how to do specific tasks. And like I said, where does everyone go these days, when they want to get quick information about how to do something? The Internet, of course. And if we have this medium available to us, which is accessible to everyone in the industry, why not use it? Why not take some of that useful information currently behind the support firewalls, and make it available on the general Internet? Let’s make it task-specific, not generic, so people will find it when they type their question into the search engine. In fact, why not make videos, so that engineers can see what you mean when you say “choose the Verification menu”? Make them short, to the point, and designed from the engineer’s point of view. It probably won’t go viral and get five million hits, but somewhere out there, a design engineer will be gratefully getting a task done quickly and efficiently.

    For example, type your tool headache into the Google search bar, or YouTube (as in this screenshot), and you might find what you’re looking for.

    Support like this won’t take away from the value of the EDA vendor’s support site, because any big problems still have to go through support. And it won’t even obsolete your technical documentation, because all those new engineers still need the full roadmap through your software. Sure, competitors can see your GUI, but that doesn’t mean they can make it work as well as yours. Take a little risk, and reward your customers. Isn’t that the end game?


    Mentor Wins v Synopsys
    by Paul McLellan on 10-15-2014 at 10:00 am

    Just a couple of days ago I read a curious press release: “Mentor Graphics Corp. (NASDAQ: MENT) today announced that a Portland, Oregon jury delivered a verdict in favor of Mentor in a trial in which Mentor asserted infringement of one of its patents against Emulation and Verification Engineering S.A. (EVE) and Synopsys, Inc. (Nasdaq: SNPS). The jury in the United States District Court for the District of Oregon found that one Mentor patent – U.S. Patent No. 6,240,376 – was directly and indirectly infringed by EVE and Synopsys. As part of the verdict, the jury awarded damages of approximately $36 million and certain royalties to be paid to Mentor Graphics.”

    I don’t really know a lot of the details. Emulation has been a heavily contested area for patent lawsuits, maybe because infringement is easier to prove for hardware than for software. After all, if your place & route tool has some patents, it is really hard to work out whether a competing tool violates them or uses some alternative technology. With hardware it is a lot easier to take a look.

    For a time a dozen years ago the shoe was on the other foot. Mentor was in a set of lawsuits with Quickturn (now part of Cadence, the Palladium product line). Quickturn had previously sued Mentor for patent infringement and successfully blocked the sale of the Meta-developed SimExpress emulator in the United States. Mentor engaged in a lengthy and ultimately unsuccessful hostile takeover bid for Quickturn. Then in 1998, just a couple of months before Cadence acquired Ambit, where I was VP of engineering, Cadence became a white knight and acquired Quickturn too. More recently emulation has really come into its own. I think it was Wally Rhines, CEO of Mentor, who pointed out at DVCon a year or so ago that emulation was now the cheapest way to simulate a cycle. The box costs a lot, but it runs so fast.

    Mentor was then in various lawsuits and agreements with EVE, and licensed some patents to them, although in such a way that if Synopsys acquired EVE the licenses would lapse. Then, a couple of years later, Mentor sued EVE over some different patents.

    It was a well-known secret that Intel was EVE’s biggest customer (Synopsys too, I believe) and that they put pressure on Synopsys to acquire EVE. In fact, since emulation was increasing in importance, Synopsys had no solution, and EVE was the only game in town, they pretty much had to acquire them. I think it is unlikely that Cadence or Mentor would have been allowed to acquire them without raising objections. Weirdly, Synopsys counter-sued Mentor over EVE before they had even done the acquisition. Anyway, the history is complicated and I probably have some of the details wrong.

    What is more important is the effect, if any, on future business. The above press release is about EVE, which was finally acquired by Synopsys almost exactly two years ago. $36M is noise level for both Synopsys and Mentor, but not being able to sell emulators going forward would be a big problem. Since the release doesn’t say that, I assume that at most Synopsys has to pay an ongoing royalty to Mentor. Synopsys has said that it will appeal.

    If anyone knows more please comment. It is certainly an interesting note in a very long story and worthy of discussion.




    EDA and the Nobel Prize in Physics!

    by Daniel Nenni on 10-15-2014 at 7:00 am

    What do EDA and the Nobel Prize in Physics have in common? Our very own Dr. Walden Rhines (CEO of Mentor Graphics):

    The Nobel Prize in Physics 2014 was awarded jointly to Isamu Akasaki, Hiroshi Amano and Shuji Nakamura “for the invention of efficient blue light-emitting diodes which has enabled bright and energy-saving white light sources”.

    What they don’t say, however, is how America pioneered much of the work on the blue LED that a Japanese team of researchers “perfected”. Herbert Maruska actually developed the initial blue LED while at Stanford with fellow PhD student Wally Rhines (Wally is listed as a co-inventor on the patent issued in 1974 for this discovery). You can read more about it in the Oral History of Dr. Walden C. Rhines and A Brief History of GaN Blue Light-Emitting Diodes.


    Gallium nitride metal-semiconductor junction light emitting diode

    Publication number: US3819974 A
    Publication type: Grant
    Publication date: Jun 25, 1974
    Filing date: Mar 12, 1973
    Priority date: Mar 12, 1973
    Inventors: D Stevenson, W Rhines, H Maruska
    Original Assignee: D Stevenson, H Maruska, W Rhines

    As the story goes, in 1990 while having dinner with Satoru Ito (who later became Chairman & CEO of Hitachi Semiconductor) Wally expressed concerns about the collapsing semiconductor industry in the U.S. The previous decade was a period of transition where the momentum of the U.S. dominated semiconductor industry had moved rapidly to Japan because of their superior manufacturing capabilities, particularly for memory devices. Ito-san said that the 1980s favored Japan because, “We are a nation skilled at optimization while the United States is a nation skilled at invention. The coming decades will be difficult for Japan because optimization requires stable standards like the MOS dynamic RAM, where improvements are evolutionary and there are no sharp discontinuities.” Ito’s brilliant insight was that as the computer and cell phone industries evolved in the 1990s and beyond it would be more difficult for Japanese semiconductor companies because the products were changing radically in unexpected ways, unlike the semiconductor memories of the 1980s. Ito turned out to be right and the U.S. made a remarkable recovery through innovation and invention rather than optimization and manufacturing efficiencies while Japan struggled with rapidly changing and diverse standards.

    It is interesting to note that the Nobel Prize announcement was carefully worded to emphasize that Akasaki, Amano, and Nakamura were being recognized “for the invention of EFFICIENT blue light-emitting diodes”. The inventor of the magnesium-doped gallium nitride light emitting diode, Herbert P. Maruska, was excluded from the prize. The Nobel Prize winners chose to base their work on Maruska’s patented discovery, including the specific materials used. The resulting press noted that Maruska was the inventor of the first blue light emitting diode in 1972 and questioned why RCA and the U.S. were unable to capitalize upon the invention. While Nobel Prize decisions involve many factors, there appears to be a more fundamental issue for the United States.

    The free society fostered in the U.S. throughout its history has led to revolutionary inventions by people who could freely try things that were contrary to accepted norms. As Ito noted, the U.S. has, as one of its greatest strengths, the ability to invent. Where the U.S. has fallen short is in optimization. Manufacturing, which requires rigid repeatability of operations without variation, has become a lesser strength in the U.S. as Asian countries have capitalized upon their own optimization capabilities. The issue is a more important one for the U.S. than the question of who should receive a Nobel Prize (although most believe that Maruska should have been included).

    For the U.S. to capitalize upon its innately superior inventing ability, it must either partner with other nations who are most skilled at optimization, or develop a similar optimization skill in the U.S. Today’s outsourcing trends to Asia are effectively executing the partnering approach. Cell phones designed in the U.S. are manufactured in China. Innovative start-up companies outsource the production of their products to Asia. Longer term though, invention and optimization need to come together in close cooperation. The semiconductor industry grew up with offshore production for a large share of its labor intensive work but, until the last 20 years, the manufacturing was largely integrated with the parent company. For new products and ideas, foundries and offshore assemblers are available for just about any type of manufacturing. While this may be a very efficient approach, one might question whether this will lead to partnerships in new technology areas that will achieve the kind of partnering with U.S. companies that produces the best results. Would the optimization of the magnesium-doped gallium nitride blue LED have taken twenty years from the invention if the teams had been closely coupled?


    GPU Benchmarks? Try to See the Complete Picture…

    by Eric Esteve on 10-15-2014 at 3:29 am

    We all know benchmarks, but do we really understand benchmark results? Benchmark users should always look beyond the simple score when making an in-depth technical analysis, and ask to see all the facts. There are many graphics benchmarks to choose from, but today let’s consider these three:

    3DMark Ice Storm from Futuremark
    GFXBench 3.0 from Kishonti
    Basemark X from Rightware

    We tend to look first at raw GFLOPS results, but we should also analyze performance in both GFLOPS/mm² and GFLOPS/mW. A smartphone user probably won’t care about GFLOPS/mm², but will certainly care about the price of the phone, and the two are linked: just take a look at the Apple A8 layout. The A8 is manufactured in 20nm and integrates a quad-core GPU from Imagination (the PowerVR…). The quad-core GPU and shared GPU logic represent 22% of the chip area (89 mm²), so the impact on chip cost, and then on smartphone price, is indisputable! But Apple and iPhone 6 users will certainly care about GFLOPS/mW. When you look at graphics benchmark results, you should care about power efficiency too. And we will see that you may have to look further than the raw GFLOPS results.
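The per-area and per-power metrics are simple ratios. A sketch using the two A8 figures quoted above (89 mm² die, 22% for the GPU); the GFLOPS and power values are placeholders, not measured A8 numbers.

```python
A8_DIE_MM2 = 89.0     # A8 die area quoted above
GPU_SHARE = 0.22      # quad-core GPU plus shared GPU logic, as a fraction of the die

gpu_area_mm2 = A8_DIE_MM2 * GPU_SHARE


def gflops_per_mm2(gflops: float, area_mm2: float) -> float:
    return gflops / area_mm2


def gflops_per_mw(gflops: float, power_mw: float) -> float:
    return gflops / power_mw


# Placeholder throughput and power, for illustration only.
GFLOPS, POWER_MW = 100.0, 2000.0

print(f"GPU area: {gpu_area_mm2:.1f} mm^2")                       # GPU area: 19.6 mm^2
print(f"{gflops_per_mm2(GFLOPS, gpu_area_mm2):.2f} GFLOPS/mm^2")  # 5.11 GFLOPS/mm^2
print(f"{gflops_per_mw(GFLOPS, POWER_MW):.3f} GFLOPS/mW")         # 0.050 GFLOPS/mW
```

Two GPUs with the same raw GFLOPS can differ widely on both ratios, which is the point of looking past the headline number.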

    Benchmark results are expressed in terms of triangles, pixels and GFLOPS:

    Triangles
    Real-world applications have modest triangle-rate requirements; moreover, high triangle rates in mobile become memory-bandwidth limited well before they become GPU-limited. In fact, on most GPUs today, triangle throughput is no longer a problem, or even a relevant metric. Mobile GPUs today can easily support 100 to 200 million triangles per second (MTri/s), providing more than enough resources for real-world cases.

    Pixels and Texels
    Pixel rates, on the other hand, are probably the most important metric for all market segments and typical usage scenarios. User interfaces or browsers running at 60 fps are all about pushing textured pixels. If you are looking for an easy top-level requirement calculation, the formula below gives you the headline million pixels per second (MPix/s) figure:
    Screen resolution x fps = pixels/sec
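    As a quick worked example of the formula, the sketch below computes the headline MPix/s figure for two common screen resolutions (chosen for illustration) at 60 fps. It assumes each pixel is written exactly once per frame; overdraw and blending would raise the real requirement.

```python
# Sketch of the headline fill-rate formula: screen resolution x fps = pixels/s.
# The resolutions and the 60 fps target are illustrative ASSUMPTIONS.

def mpix_per_s(width: int, height: int, fps: float) -> float:
    """Pixel rate (MPix/s) needed to redraw every pixel once per frame."""
    return width * height * fps / 1e6

print(mpix_per_s(1920, 1080, 60))  # 1080p at 60 fps, about 124.4 MPix/s
print(mpix_per_s(2560, 1440, 60))  # 1440p at 60 fps, about 221.2 MPix/s
```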

    GFLOPS
    Floating point operations per second (FLOPS) are increasingly becoming a critical parameter for mobile GPUs when it comes to graphics and compute performance. The FLOPS metric indicates the number crunching ability of a graphics processor and can be compared to the million instructions per second (MIPS) that a CPU can deliver.
    This graphic shows the evolution of graphics performance for mobile devices. As we can see, the rate of progress is simply amazing: the PowerVR GX6650, to be used in application processors integrated in smartphones sold in 2015, delivers 40 times the performance of the PowerVR SGX530, integrated for example in TI’s OMAP3, developed less than 10 years ago!
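    Peak GFLOPS figures are usually theoretical numbers. Below is a minimal sketch of the common way they are derived, assuming one fused multiply-add (counted as 2 FLOPs) per ALU lane per cycle; the lane count and clock are hypothetical and do not describe any specific PowerVR core.

```python
# Sketch: theoretical peak GFLOPS = ALU lanes x FLOPs per lane per cycle x clock.
# Counting an FMA as 2 FLOPs is a common convention; the lane count and clock
# used below are HYPOTHETICAL.

def peak_gflops(alu_lanes: int, clock_ghz: float, flops_per_lane_per_cycle: int = 2) -> float:
    """Peak throughput in GFLOPS, assuming every lane issues every cycle."""
    return alu_lanes * flops_per_lane_per_cycle * clock_ghz

# Hypothetical GPU: 128 FP32 lanes at 0.6 GHz, one FMA per lane per cycle
print(peak_gflops(128, 0.6))
```

    Benchmarks rarely keep every lane busy every cycle, which is one reason measured results sit below this ceiling.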

    Raw GFLOPS results give you a precise evaluation of the “brute” force of a graphics device. “Brute” because analyzing graphics performance without taking into account the power consumption or power efficiency of a specific GPU may lead to surprises: you may have to plug in your smartphone for charging more often than expected. How can you clarify this point? Look at benchmark results for long-term performance:

    Long-term performance
    If you are looking for a better indication of real-world workloads on current-generation graphics hardware, long-term performance is probably your best bet. One benchmark to look for is GFXBench 3.0 from Kishonti.

    Do you remember this plot, posted as a comment on Dan Nenni’s blog “The TSMC iPhone6”? Now that you have the complete picture, you can better understand why Apple throttles the A8 CPU after 10 minutes! That’s the only way to offer very good graphics performance at the beginning (to impress a potential buyer) and decent power consumption in the long term.
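    One simple way to express this kind of throttling is the ratio between sustained and initial frame rates. The sketch below assumes a hypothetical list of per-minute average fps values; it is not the scoring method GFXBench itself uses.

```python
# Sketch: quantify throttling as sustained fps / initial fps.
# The sample data is HYPOTHETICAL; a long-term test would supply real
# measurements taken over many minutes of sustained load.
from statistics import mean

def sustained_ratio(fps_samples: list, window: int) -> float:
    """Mean fps of the last `window` samples divided by the mean of the first `window`."""
    if window <= 0 or len(fps_samples) < window:
        raise ValueError("need window > 0 and at least `window` samples")
    return mean(fps_samples[-window:]) / mean(fps_samples[:window])

# Hypothetical per-minute averages: fast start, then thermal throttling
samples = [52, 51, 52, 50, 49, 44, 40, 38, 37, 37, 36, 36]
ratio = sustained_ratio(samples, window=3)
print(f"Sustained performance: {ratio:.0%} of initial")
```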

    You will find a lot more information about benchmarks and how to use them in this very interesting blog from Alexandru Voica (he is with Imagination Technologies, hence the focus on the PowerVR family!): “A consumer’s guide to graphic benchmark

    From Eric Esteve from IPNEST



    How ST Designs with Layout Dependent Effects (LDE)

    by Daniel Payne on 10-15-2014 at 12:00 am

    I first visited ST at their Agrate, Italy site, where Flash memory development is done. At DAC this year, Antonio Bogani talked about how ST designs with LDE while using EDA tools and a PDK (Process Design Kit) from Cadence. The 17-minute presentation was recorded, and you can view it without having to register. Antonio’s group provides the technology R&D for Smart Power products, which are used in markets like:

    • Digital consumer products
    • Microcontrollers
    • MEMS & Sensors
    • Automotive products
    • Smart Power products (digital, analog and power modules combined) using Bipolar, CMOS and DMOS (BCD)

    Process technologies range from 90nm to 0.32um nodes, which are well suited for high-voltage applications. Transistor-level custom IC design for BCD must take into account LDEs such as:

    • Shallow Trench Isolation (STI)
    • Well Proximity Effects (WPE)

    Process rules are difficult to manage because of:

    • Halation and other complex back-end rules
    • Rounded corners of polygons to smooth high-voltage electrical fields

    The old design approach was quite sequential and had a long feedback loop requiring slow iterations during design:

    A new design approach performs LDE analysis of STI and WPE during layout, then annotates the effects onto the schematics, ready for simulation:


    You can also run an LDE analysis flow on an incomplete layout inside the Virtuoso XL layout tool, where Spectre is used to simulate and then display LDE deviations on device parameters such as Vth and Ids. This verifies that the constraints on your critical devices are still met in the presence of LDE.
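    The constraint check described above can be pictured as comparing each critical device’s parameters with and without LDE. This is a hypothetical Python sketch, not how Virtuoso or Spectre implement it; the device names, values and 2% limits are invented for illustration.

```python
# Sketch: flag critical devices whose LDE-induced parameter shift exceeds a limit.
# Device names, parameter values and tolerances are HYPOTHETICAL.

def pct_deviation(nominal: float, with_lde: float) -> float:
    """Signed deviation of the LDE-aware value from the nominal value, in percent."""
    return 100.0 * (with_lde - nominal) / nominal

def check_constraints(results: dict, limits_pct: dict) -> list:
    """results: {device: {param: (nominal, with_lde)}}; limits_pct: {param: max |%|}.
    Returns (device, param, deviation %) for every violation."""
    violations = []
    for device, params in results.items():
        for param, (nominal, with_lde) in params.items():
            shift = pct_deviation(nominal, with_lde)
            limit = limits_pct.get(param)
            if limit is not None and abs(shift) > limit:
                violations.append((device, param, round(shift, 2)))
    return violations

results = {
    "M1": {"vth": (0.450, 0.462), "ids": (1.00e-4, 0.97e-4)},
    "M2": {"vth": (0.450, 0.452), "ids": (1.00e-4, 0.995e-4)},
}
print(check_constraints(results, {"vth": 2.0, "ids": 2.0}))
```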

    While you are drawing an IC layout, you can turn on real-time DRC verification, which takes about a second or less and shows violations as visual markers. Another way to verify is DRC on demand, typically run on the visible area, which takes 15 seconds or less.
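    For readers new to DRC, here is a toy sketch of checking one minimum-spacing rule only on shapes that touch the visible window, which is the idea behind DRC on demand. It is not Cadence’s algorithm; the shapes and the 0.2 rule value are hypothetical, and production DRC handles many more rule types.

```python
# Toy sketch: one minimum-spacing rule, checked only within the visible window.
# Shapes (x1, y1, x2, y2) and the rule value are HYPOTHETICAL.
import math
from itertools import combinations

def intersects(a: tuple, b: tuple) -> bool:
    """True if two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def spacing(a: tuple, b: tuple) -> float:
    """Edge-to-edge distance (Euclidean at corners); 0 if shapes touch or overlap."""
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return math.hypot(dx, dy)

def drc_visible(shapes: list, window: tuple, min_space: float) -> list:
    """Return (shape_a, shape_b, distance) markers for spacing violations in view."""
    visible = [s for s in shapes if intersects(s, window)]
    markers = []
    for a, b in combinations(visible, 2):
        d = spacing(a, b)
        if 0.0 < d < min_space:  # touching/overlapping shapes are treated as merged
            markers.append((a, b, d))
    return markers

shapes = [(0, 0, 1, 1), (1.1, 0, 2, 1), (5, 5, 6, 6), (50, 50, 51, 51)]
for a, b, d in drc_visible(shapes, window=(0, 0, 10, 10), min_space=0.2):
    print(f"spacing violation {d:.2f} between {a} and {b}")
```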

    By using the Cadence PDK with MODGENs or a partial layout, you can get to a simulation netlist even with just a prototype placement, then simulate the LDE effects using the Analog Design Environment XL (ADE XL). Layout parameters are quickly added to the schematic.
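    To illustrate what adding layout parameters to the schematic might look like, here is a hypothetical 1-D sketch that derives BSIM4-style LDE instance parameters from layout geometry: SA and SB (gate-edge to diffusion-edge distances, which drive STI/LOD stress) and SC (distance to the nearest well edge, for WPE, simplified here to be measured from the gate center). The geometry, device name and output format are assumptions, not the Cadence PDK’s actual annotation.

```python
# Hypothetical 1-D sketch: derive LDE instance parameters from layout geometry
# and render them for back-annotation. Geometry, names and the output format
# are ASSUMPTIONS; sa/sb/sc are modeled on BSIM4-style instance parameters.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    diff_left: float   # diffusion (active) left edge, um
    diff_right: float  # diffusion right edge, um
    gate_left: float   # poly gate left edge, um
    gate_right: float  # poly gate right edge, um

def lde_params(dev: Device, well_edges: list) -> dict:
    """SA/SB: gate edge to STI on each side; SC: gate center to nearest well edge."""
    sa = dev.gate_left - dev.diff_left
    sb = dev.diff_right - dev.gate_right
    gate_center = 0.5 * (dev.gate_left + dev.gate_right)
    sc = min(abs(gate_center - e) for e in well_edges)
    return {"sa": sa, "sb": sb, "sc": sc}

def annotate(dev: Device, params: dict) -> str:
    """Render extracted values as instance parameters (illustrative format only)."""
    fields = " ".join(f"{k}={v:.3f}u" for k, v in params.items())
    return f"{dev.name} {fields}"

m1 = Device("M1", diff_left=0.0, diff_right=1.2, gate_left=0.4, gate_right=0.6)
print(annotate(m1, lde_params(m1, well_edges=[-1.0, 3.0])))
```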

    Such quick feedback gives you early detection of LDE variability hotspots and makes layout fixes much easier than waiting for a final layout. The DFM PDK flow is shown below:

    Groups at ST are using this type of flow on their bcd8 and bcd8 families PDK. The benefit of using a Cadence in-design flow is faster IC development for Smart Power projects in process technologies subject to LDE.


    Intel: Spectacular PC, Some Progress in Mobile

    by Paul McLellan on 10-14-2014 at 8:30 pm

    Intel announced their quarterly results today. They beat consensus by a cent. Not surprisingly, their business is driven heavily by the microprocessor businesses, which Intel calls PC and Datacenter. Revenue and earnings set new records in both groups.

    As Brian Krzanich, the CEO, said: “In the Data Center, we saw double-digit revenue growth across all four major market segments. Enterprise grew 11%; networking grew 16%; and HPC; and cloud service providers grew 22% and 34%, respectively. We also launched the new Xeon E5 processor, formerly known as Grantley. This product family provides leadership, features and performance for compute; storage and network workloads. Formally launched just five weeks ago, E5 is already 10% of our DP or two-socket volume.”

    Intel is clearly still having problems getting 14nm to yield. They didn’t give an actual number, but did say that while yields improved substantially during the quarter, they are behind where they expect to be. In what has to count as a huge understatement, Brian said that “these challenges highlight just how difficult it has become to ramp advanced process technology.” Stacy Smith, the CFO, gave a bit more color later in the call. It looks as if 14nm is pulling overall margin down by at least a few percentage points: “Moving to gross margin, third quarter gross margin of 65% was up 0.5 point from the second quarter and down 1 point from our guidance. The increase from the second quarter was primarily due to lower platform unit costs on 22-nanometer and higher platform volume, mostly offset by higher production costs on 14-nanometer products.”

    As always, the interesting thing about Intel is not how well its main microprocessor business is doing (that is almost a given) but how it is doing in other areas, in particular mobile. In mobile, Intel lost $1B this quarter, which is actually an improvement of $81M from last quarter. They reckon they are now the second-biggest supplier of chips into the tablet market, and the biggest merchant supplier (Apple is obviously #1, but it doesn’t sell its chips to anyone else). They are on track to ship 40 million parts into the tablet market by the end of the year. But remember that they are also shipping money with each part (what they call contra revenue dollars). In the Q&A, they admitted that they will probably continue to lose money in this sector through next year.

    They did have a couple of design wins that may turn out to be significant. Samsung selected their Cat6 LTE modem for the Galaxy Alpha and Galaxy Note 4 products. What may be especially important for them is the prediction that PCs will increasingly have LTE capability too. By 2018 they expect 15% of PCs to have a cellular baseband, and the percentage of tablets that do to double. But reading between the lines, they are still losing money (contra revenue) in this part of the market too.

    And Intel foundry? Didn’t get mentioned.

    A transcript of the call is available online. Intel’s investor meeting is on November 20th.

