
Dear Meg, HP is Still a Goner
by Ed McKernan on 10-11-2012 at 9:30 pm


A year ago, Meg Whitman decided it was time to venture back into the business world by grabbing the HP CEO baton from a badly wounded Leo Apotheker. What for? My best guess is to enter the pantheon of great turnaround CEOs of failing companies, best exemplified by the work of Lou Gerstner at IBM in the early 1990s. It comes too late, though, as the $120B company has none of the core legacy that IBM had with mainframes, the kind that ultimately allows a company to rise from the ashes. HP, along with Dell and the other PC vendors, is locked into a business model whose fate is determined by others. It wasn’t supposed to be this way when the decision was made in the 1990s to cast off Bill and Dave’s legacy for a promise of dominating computing from big iron all the way down to PCs. Many people forget that HP once designed and built RISC processors in its own leading-edge fabs. The surrender of that capability, ironically, was never supposed to open HP up to eventual destruction at the hands of system competitors who would build their own processors (i.e., Apple and Samsung).

Whitman’s warning a week ago that HP would wander the valley of death for at least another year and then rise again in 2014 or 2015 sent nervous investors scampering for the exits. As a result, HP’s stock is now down 50% over the past six months (Dell is also down 50%). The squeeze in PC profits is most painful in the consumer market, where preferences and pocketbooks favor sleek smartphones and tablets. A retreat to the corporate market is in motion, but it is hardly a rampart against the coming onslaught.

A little over a year ago, I wrote in a blog that the disaster culminating in the dismissal of then-HP CEO Leo Apotheker, over his attempt to spin out the PC Group, could be traced all the way back to July 6, 1994, when HP signed an agreement with Intel to stop its own RISC development in exchange for a partnership to define the 64-bit Itanium architecture. This set in motion the decisions whereby the highly profitable instrumentation groups (the true legacy of Bill and Dave) were spun off and, in their place, a full range of proprietary architectures was acquired (Tandem, DEC VAX and Alpha, etc.). All would be converted to Itanium, and customers would have their one-stop shop fulfilled by HP. Intel would make it all seamless by paying billions to port the software industry over to the new architecture, all the while telling a story that 32-bit x86 processors would fairly soon hit a performance wall. The two were married to the vision of dominance, and off they went.

Software, however, would not be so easily led to the new architecture, and in the late 1990s the never-ending drive toward smaller and more mobile computers extended x86 while Itanium wallowed in overextended development. WiFi entered the picture in the early 2000s and drove mobile volumes past desktops. AMD saw an opening in the server space and pressed ahead with 64-bit x86 processors; within a year Intel was forced to concede with its own 64-bit x86 chips. Itanium was de facto dead for all the world to see. De jure, though, the parking meters would be left running, as the big-iron computers required constant R&D maintenance dollars from both Intel and HP. At the end of the day, one has to wonder what the bill would have been had HP stayed with its own RISC development for workstations.

Steve Jobs has been called a genius control freak for good reason. It must have occurred to him sometime after the PortalPlayer-driven iPod was released that to truly own the marketplace, without competitors nipping at his heels, he had to do the opposite of what HP did and get into the processor business. Let’s not gloss over this: building a mobile processor on a near-leading-edge process has resulted in many $100M+ sinkholes in Silicon Valley (I speak from experience). However, freedom from Intel’s $200 mobile processors adds up quickly. The new A6 processor has been described in glowing terms for its outstanding performance and low-power capabilities. I think it is somewhat overkill for the iPhone 5, but it is most likely targeting a broad set of new devices in the coming year. The greater point of Apple’s effort, and that of Samsung, is that it has proven that going vertical in the computer industry was the only path to success; something HP abandoned in 1994, and which in the end will make Meg Whitman’s turnaround effort an exercise in futility.

FULL DISCLOSURE: I am Long AAPL, INTC, ALTR, QCOM


Altera’s Use of Virtual Platforms
by Paul McLellan on 10-11-2012 at 9:00 pm

Altera have been using Synopsys’s virtual platform technology to accelerate time to volume: software development proceeds in parallel with semiconductor development, so the software does not have to wait for hardware to become available.

In the past, creating the virtual platform was comparatively time-consuming, but the whole ecosystem of partners providing models, TLMcentral, DesignWare cores and peripherals has come together, so creating the virtual prototype is no longer hard. Plus, borrowing the fast-iteration approach of agile software development, people have realized that the entire prototype does not need to exist before it can be used. The software and platform can be co-developed incrementally.

The software development environment does not need to change at all; the same tools can be used with the virtual prototype as with the actual silicon. Furthermore, whereas actual hardware is really tough to debug once you get down to the signal level, a virtual prototype is simple. There is complete observability, with no need for oscilloscopes and logic state analyzers to pick up the signals of interest, assuming those signals are even accessible.

So what did Altera actually do? They used Synopsys’ Virtual Prototyping Solution to create a virtual target of their new Cyclone V and Arria V SoC FPGA devices. The Altera SoC FPGA Virtual Target is a fast functional simulation of a dual-core ARM Cortex-A9 MPCore embedded processor development system. This complete prototyping tool, which models a real development board, runs on a PC and enables customers to boot the Linux operating system out of the box. Designed to be binary- and register-compatible with the real hardware that it simulates, the Virtual Target enables the development of device-specific, production software that can run unmodified on real hardware.

The result of all this, the big advantages that have always been there coupled with the lower barriers to use, is that broad market adoption really (finally!) seems to be starting. For a company like Altera, which doesn’t ship its own systems, there are two big advantages, one direct and one indirect.


The indirect advantage is that Altera’s customers and OEMs can accelerate their own software development and thus bring their systems to market sooner and with higher quality. Application software development can’t start until the operating system is at least partially running, which in turn can’t happen until there is some substrate (virtual or silicon) on which to run and test it. Virtual prototypes pull in the start, and thus the finish, of software development.


The direct advantage is that by accelerating the software development of their OEMs, Altera accelerates the point at which it can ship in volume, which only happens when the customers’ software is ready for market. In turn, that means less delay between, say, introducing a new product or a new process node and the time Altera starts to see a real return on the investment.
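As a back-of-the-envelope illustration of that pull-in, here is a minimal sketch; every duration in it is an invented placeholder, not an Altera or Synopsys figure.

```python
# Hypothetical schedule arithmetic for the software pull-in described above.
# All durations are invented for illustration.

silicon_months = 12        # time until first real silicon is available
software_months = 9        # OS bring-up plus application development
virtual_proto_month = 3    # virtual prototype available this early

# Serialized flow: software development waits for silicon.
serial_finish = silicon_months + software_months

# Parallel flow: software starts on the virtual prototype and only needs
# the silicon for final validation.
parallel_finish = max(silicon_months, virtual_proto_month + software_months)

print(f"serial: {serial_finish} months, parallel: {parallel_finish} months, "
      f"pull-in: {serial_finish - parallel_finish} months")
```

With these placeholder numbers the finish date pulls in from 21 months to 12, i.e. software, not silicon, stops being the critical path.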

The Altera/Synopsys success story is here. The Altera Q&A is here.


A Brief History of Moore’s Law

A Brief History of Moore’s Law
by Sam Beal on 10-11-2012 at 9:00 pm


I recently read a news article in which the author referred to Moore’s Law as a ‘Law of Science discovered by an Intel engineer’. Readers of SemiWiki would call that Dilbertesque. Gordon Moore was Director of R&D at Fairchild Semiconductor in 1965 when he published his now-famous paper on integrated electronics density trends. The paper doesn’t refer to a law, but rather to the lack of any fundamental barrier to doubling component density annually (at least through 1975). He later revised the trend to a doubling roughly every two years. [The attribution “Moore’s Law” was coined by Carver Mead in 1970.] As a co-founder and later President of Intel Corporation, Gordon Moore led what is now a 40+ year continuation of that trend.
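To see why the 1965 observation was so striking, it helps to run the compounding out; a minimal sketch, where the roughly 64-component baseline is an approximation of the densities of that era:

```python
# Compounding of the 1965 observation: component count doubling every year.
# The ~64-component 1965 baseline is approximate.
components_1965 = 64
for year in range(1965, 1976):
    print(year, components_1965 * 2 ** (year - 1965))
# Ten annual doublings give a 2**10 = 1024x increase, which is how the 1965
# paper arrived at its projection of ~65,000 components per chip by 1975.
```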

Continue reading “A Brief History of Moore’s Law”


The Protocol Processing Dataplane
by Paul McLellan on 10-11-2012 at 8:48 pm

At the Linley processor conference this week, Chris Rowen, the CTO of Tensilica, presented on the protocol processing dataplane. That sounds superficially like networking, but in fact true protocol processing is just one part of adding powerful compute features to the dataplane. Other applications are video, audio, security, voice recognition and so on. All of these applications are inherently parallel and data-rich, and are either impossible to run on a general-purpose control processor such as an ARM (not enough performance) or extremely power-hungry when forced onto one.

Depending on the application, different kinds of parallelism are required, from single-instruction multiple-data (SIMD) vector processing to homogeneous threads (all doing the same thing) or heterogeneous threads.
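For readers who want that distinction concrete, here is a rough Python sketch of the three styles; the workload and the numpy/thread-pool machinery are illustrative stand-ins, not anything Tensilica uses.

```python
# A rough illustration of the parallelism styles named above.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

samples = np.arange(1024, dtype=np.float32)

# SIMD-style: one instruction stream applied across a whole vector of data.
vector_result = samples * 2.0 + 1.0

# Homogeneous threads: identical workers, each processing its own data slice.
def worker(chunk):
    return chunk * 2.0 + 1.0

with ThreadPoolExecutor(max_workers=4) as pool:
    chunks = np.array_split(samples, 4)
    thread_result = np.concatenate(list(pool.map(worker, chunks)))

# Heterogeneous threads would instead run *different* stages concurrently,
# e.g. one thread parsing packets while another encrypts the previous batch.
assert np.allclose(vector_result, thread_result)
```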

The Tensilica Xtensa dataplane processor units (DPUs) are highly customizable and thus suitable for all these applications. The processors generated range from 11.5K gates up to huge beasts with large numbers of execution units. In addition, they can have a huge range of I/O architectures with FIFOs, lookup tables, or very wide direct connections. After all, a high-performance DPU isn’t much use if you can’t get the data in and out to the rest of the design with high enough bandwidth.


Probably the most demanding application, requiring very high I/O performance and high performance in the compute fabric, is network data forwarding (such as in a high-performance router). The most generic way to do this would be to use a cache-coherent memory system and just put the packets in off-chip DRAM. But Chris has a rule of thumb that, since energy is proportional to distance, if a direct wire connection is 1 unit of energy, local memory is 4, on-chip NoC is 16 and going off-chip is 256.
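To make that rule of thumb concrete, here is a minimal sketch of what the ratios imply for power at a given line rate; the 0.1 pJ/bit baseline and the 100 Gb/s rate are assumed placeholders, since the talk only gave the ratios.

```python
# Chris Rowen's rule of thumb: energy to move a bit scales with distance.
# Ratios are from the talk; the absolute pJ/bit baseline and the line rate
# are assumed placeholders for illustration.
RELATIVE_ENERGY = {
    "direct wire":   1,
    "local memory":  4,
    "on-chip NoC":   16,
    "off-chip DRAM": 256,
}

BASE_PJ_PER_BIT = 0.1     # assumed energy for a direct wire hop
LINE_RATE_GBPS = 100      # hypothetical forwarding workload

for path, ratio in RELATIVE_ENERGY.items():
    pj_per_bit = BASE_PJ_PER_BIT * ratio
    watts = pj_per_bit * 1e-12 * LINE_RATE_GBPS * 1e9
    print(f"{path:13s}: {pj_per_bit:6.1f} pJ/bit -> {watts:5.2f} W")
```

Under these assumptions, keeping packets in off-chip DRAM burns 2.56 W just moving data that a direct-connect fabric would move for 0.01 W, which is the 256x gap the next paragraph describes.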


There is thus an enormous difference in energy efficiency between building the best possible on-chip fabric to keep everything fed and building something completely general purpose, as can be seen from the above diagram showing the difference between a cache-coherent cluster, a system where DMA offloads the processors, and one with direct connections.


The savings from using a DPU versus a standard microprocessor are huge. The pink bars show the efficiency of the Tensilica Xtensa DPU, the blue are ARM and the green is Intel Atom. Higher numbers are better (this is efficiency; Xtensa has been scaled to 1).


To take another demanding example, LTE-Advanced level 7: the block diagram is complex and requires a huge amount of data, 6.5Gb/s, to be moved around between the blocks. Again, comparing the general-purpose solution with direct on-chip connections shows the enormous difference in efficiency.



ARM in Networking/Communications
by Paul McLellan on 10-11-2012 at 7:15 pm

I was at the Linley Processor Conference yesterday. There are two of these each year, one focused on mobile and this one, focused on networking and communications (so routers, base-stations and the like). You probably know that ARM is pretty dominant in mobile handsets (and Intel is trying to get a toe-hold although I’m skeptical they will succeed). A story that I haven’t heard before is their potential in the networking and communications space.

So if you look at the architecture share pie chart you can see that PowerPC has the strongest position, with about 50% market share, heavily used by Cisco, Ericsson and Alcatel-Lucent. Next is x86, which is not heavily used in routers but in various network appliances that are more like a PC in a box with network connectivity. MIPS is, apparently (and surprisingly), increasing its share and leads in multicore, in use at Cavium, Broadcom/Netlogic, Nokia-Siemens and Huawei (and some Cisco). ARM has a toe-hold and is only used at Marvell today.

So that doesn’t look like much of a story. ARM is a nobody, the other guys together have 95% market share.

But ARM has developed a 64-bit architecture, and that has opened new markets at the higher end. There is potential for ARM-based servers to replace general-purpose PC servers in specialized datacenters. If you are Netflix and all you do is pump video, then a general-purpose PC may not be the most power- and cost-effective way to do it. Or uploading billions of photos to Facebook. But the jury is still out on whether that will happen.

But there are some ARM plans. AppliedMicro (PowerPC-based) is now developing an ARM-based system called X-Gene, the world’s first 64-bit ARM server on a chip. Freescale (PowerPC) is developing a hybrid ARM/PPC system called Layerscape. LSI (PowerPC) is developing an ARM version of its Axxia product line, going beyond 4 cores. And Cavium, which builds high-core-count multicore systems called Octeon, is developing an ARM-based complement called Thunder.

But these dual-architecture strategies are expensive: the CPU cores must be built and verified twice, and the software maintained on two architectures. So Linley’s prediction is that this won’t endure, and that there is a good chance the migration will be to ARM, although it may take more than 5 years. ARM is seen as a safe choice. MIPS has been up for sale for most of the year, and whether it can continue to make the engineering investment needed to remain competitive for the long term has to be in doubt. PowerPC, originally a joint development of Apple, Freescale (then still part of Motorola) and IBM, also seems to be gradually losing support. Apple has gone, of course. IBM is still committed to the architecture, but really only for its own high-end servers.

x86, of course, is also a safe choice, but Intel’s roadmap is driven by the PC, so it tends to require a higher chip count to build a high-data-throughput platform for communications. The requirements of routers and base-stations are very different from those of general-purpose PCs, with very different latency and data-plane demands.

During the day, ARM announced two new products: CoreLink CCN-504 (CCN is cache-coherent network), an ARM-specific network-on-chip (NoC), and the CoreLink DMC-520 dynamic memory controller. The two products work together to provide on-chip and off-chip connectivity with a real-world throughput of 1Tbps and a peak of 1.5Tbps over a 128-bit internal bus. There is support for the existing Cortex-A15 (32-bit) and future 64-bit ARMv8 architecture cores. As always, ARM is focused on power efficiency, not just raw performance at any cost. These will be available early next year, with the first production designs expected in 2014. LSI discussed some work based on CCN-504, where they have been partnering with ARM on interconnect technology requirements, although they didn’t announce any specific products.

So ARM today has pretty much zero market share in communications (outside of mobile, where they are dominant). But they have new technology for the space from 64-bit processor cores to on-chip NoC technology that can link up to 16 cores, and high performance access to off-chip memory. And several of the big players in the networking/communication spaces have ARM-based products that they will bring to market in the near future. The landscape could look very different in two or three years.


Challenges in Managing Power Consumption of Mobile SoC Chipsets: And What Lies Ahead When Your Hand-Held Is Your Compute Device!
by Daniel Nenni on 10-10-2012 at 6:00 pm

Qualcomm VP of Engineering Charlie Matar will be keynoting the Apache/ANSYS seminar in Santa Clara next Thursday. Charlie is a great guy and a great speaker, so you won’t want to miss this, and it’s FREE! I spoke to Charlie; he will be speaking on:

Today’s complex SoC design is driven by constant demand for high-performance capabilities, rich feature sets, concurrency modes and low-power operation. This creates many challenges in delivering what users want from their devices, and these challenges impact the whole SoC ecosystem, from design, analysis and sign-off to reliability and time to market. These trends will continue to challenge SoC designers, especially now that compute and mobile devices are merging.

My talk will focus on the key challenges designers are facing, especially in areas like performance, power, thermal and reliability, and will discuss some of the future trends in the industry that require a more efficient model between foundries, SoC designers and EDA companies.

Charles Matar is a Vice President of Engineering at Qualcomm. He joined Qualcomm in 2003 and formed a new CPU team that delivered ARM-based CPU cores for Qualcomm’s mobile SoCs. He was then the chip lead who delivered the first 65nm mobile SoC in 2007, and went on to manage the physical design team, the low-power implementation team, SerDes and the foundation IP design. His responsibilities grew to include delivering all of the SoC tape-outs for San Diego’s QCT division and enabling new technology nodes for next-generation SoCs: backend infrastructure, IP, methodology and working closely with foundries, design and EDA companies. Presently, he manages the graphics hardware team at Qualcomm, responsible for delivering all of Qualcomm’s Adreno GPU cores.

Prior to joining Qualcomm, Charles held multiple positions as a CPU designer and manager. His technical interests are SoC and processor design, low-power design and process technology.

Charles Matar holds a BSEE from the University of Texas at Austin and an MSEE from Southern Methodist University.

Charlie will be followed by technical tracks/breakout sessions featuring application-specific presentations focused on 20nm low-power design and high-speed I/O verification. They will include presentations by leading companies such as Samsung SSI and Texas Instruments sharing their experiences designing for low-power applications.

The 20nm Low-Power IC Design track will discuss tools and methodologies that address power and reliability challenges for advanced low-power designs. For those designing high-speed I/O interfaces such as DDR3/4, a special technical presentation entitled “Chip-Package-System (CPS) Methodology for Giga-hertz Performance Mobile Electronics” should be of key interest.

Learn More

View the full Agenda and Technical Track abstracts.

For more information on the series.

Reserve Your Seat Today!

I will see you there!


Hynix View on New, Emerging Memories
by Ed McKernan on 10-10-2012 at 11:11 am


The recent (August) Flash Memory Summit in Santa Clara had a session devoted to ReRAM, which also featured prominently in the keynote address by Sung Wook Park of SK Hynix. The talk included a summary of NAND’s well-known scaling issues along with approaches to 3D NAND. It turns out that SK Hynix is working on three different technologies: PCRAM (phase-change RAM) with IBM, STT-RAM (next-generation MRAM) with Toshiba, and the better-known ReRAM program with HP. The HP collaboration has been ongoing since 2010; the other two date publicly at least from earlier this year. While SK Hynix has a vision for the three apparently competing technologies, we were a bit surprised and wondered whether the additional collaborations were a reaction to internal concerns about the progress of the longer-standing collaboration with HP. Christie Marrian has more over at ReRam-Forum.com


Tensilica Ships 2 Billionth Core
by Paul McLellan on 10-10-2012 at 7:00 am

It was in June of last year that Tensilica announced that they (or rather their licensees) had shipped one billion cores. Now they have announced shipping their two billionth. They are shipping at a run rate of 800 million cores per year, 50% higher than in June last year. If business continues to grow, they will be at a run rate of over a billion cores per year sometime next year. They’ll have to put up one of those signs McDonald’s used to have, counting the billions served, before the numbers get so big that it becomes impossible to keep up.

Since Tensilica is still a private company they don’t announce their financials, but they did also announce that their product license revenue is about 25% bigger than that of any other DSP licensing company (presumably CEVA is #2), and that they are number 2 in product license revenue among all CPU IP licensing companies, behind ARM at #1 of course.

The accelerating growth in the number of cores is driven by designs ramping to volume in smartphones, digital TV, tablets, personal and notebook computers, and storage and networking applications. But the real driver is power: dataplane processor units (DPUs) provide better performance/power/area than classic DSPs and scale from function-specific micro-cores up to large general-purpose DSPs. Click on the picture to see a subset of the mobile phones that contain Tensilica DPUs.

New design wins are important for future revenue too, once royalties start to be paid. Typically a core is licensed at the start of a project, and it can be a couple of years before the design is complete and systems start to ship in high volume.

And for anyone visiting Tensilica’s offices: they move buildings the weekend of 20th October (ten days’ time). They are only moving to the two-storey building across the road, so if you go to the old offices you are almost in the right place!



The Middle is A Bad Place to Be if You’re a CPU Board
by Don Dingee on 10-09-2012 at 10:45 pm

In a discussion with one of my PR contacts recently, I found myself thinking out loud: if the merchant SoC market is getting squeezed hard, that validates something I’ve been thinking, namely that the merchant CPU board market is dying from the middle out.

Continue reading “The Middle is A Bad Place to Be if You’re a CPU Board”


ICCAD: 30 years
by Paul McLellan on 10-09-2012 at 9:00 pm

ICCAD is November 5th to 8th in the Hilton San Jose (downtown).

It is very off topic, but if you are British then November 5th is the rough equivalent of July 4th, when there are fireworks displays all over the country. Britain is one of the very few countries that transitioned from some sort of autocracy to a democracy without having a revolution, so there is no “national day” to celebrate with fireworks. Instead, Britain celebrates a failed revolution: in 1605 Guy Fawkes attempted to blow up the Houses of Parliament while the king was there. Except he got caught. In addition to fireworks, bonfires are lit in Britain and an effigy of Guy Fawkes is burned.

Anyway, back to ICCAD. In all the hoopla that DAC next year is the 50th anniversary I only recently realized that ICCAD is celebrating its 30th anniversary this year. It started in 1982 (coincidentally also the year I immigrated to the US so a sort of 30th anniversary for me too, I guess).

There is a complete technical program, of course. But there are three keynotes that are especially noteworthy:

  • On Monday at 9am, John Gustafson of AMD will talk about The Limits of Parallelism for Simulation. Now that we can put dozens of cores and thousands of GPU processing elements in a workstation, how can we scale up problems to take advantage of all this? A simple but powerful speedup model that includes all overhead costs shows how we can predict the limits of parallelism, and in many cases it predicts that it is possible to apply billions of processors to simulation problems that have traditionally been viewed as “embarrassingly serial” (a sketch of the scaled-speedup arithmetic follows this list).
  • At lunchtime on Tuesday, CEDA has invited Alberto Sangiovanni-Vincentelli to talk about ICCAD at Thirty Years: where have we been, where are we going. Alberto has, of course, been a key contributor to EDA research at UC Berkeley throughout that whole 30-year period, along with Richard Newton until his untimely death in 2007. He also found time to found both Cadence (where he is still on the board) and Synopsys. He received the EDAC/CEDA Kaufman award in 2001.
  • On Wednesday at 9am, Sebastian Thrun of Udacity will talk about Designing for an Online Learning Community. A year ago Sebastian ran a Stanford course on artificial intelligence that hundreds of thousands of people all over the world followed online, which made him very aware of the power of online education. He has since left Stanford and founded Udacity to provide scalable online education. Plus, he is a Google Fellow, and in his second life he is a key contributor to Google’s driverless car project.
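The Gustafson abstract above doesn’t spell out its speedup model, but Gustafson is best known for the scaled-speedup law that bears his name, under which speedup grows almost linearly with processor count as long as the problem grows too; a minimal sketch, with an assumed 1% serial fraction:

```python
# Gustafson's scaled-speedup law: S(N) = s + (1 - s) * N, where s is the
# fraction of the *parallel* run spent in serial work. The 1% serial
# fraction below is an illustrative assumption, not a figure from the talk.
def scaled_speedup(n_processors, serial_fraction):
    return serial_fraction + (1.0 - serial_fraction) * n_processors

for n in (1, 64, 1024, 1_000_000):
    print(f"{n:>9} processors -> speedup {scaled_speedup(n, 0.01):,.0f}")
```

Unlike Amdahl’s fixed-size view, this model lets speedup keep growing with the machine, which is the sense in which billions of processors could attack problems long viewed as serial.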

Details of ICCAD, including more about the above keynotes, are here.