
ARM on Moore’s Law at 50: Are we planning for retirement?
by Scotten Jones on 01-16-2016 at 7:00 am

On Monday morning, December 7, 2015, Greg Yeric of ARM gave an excellent and wide-ranging plenary talk at IEDM entitled “Moore’s Law at 50: Are we planning for retirement?”. You can download Greg’s slide deck here.

Moore’s law is now fifty years old, and for those fifty years the number of transistors per IC has been doubling roughly every two years! The question now is how long we can keep this up. Greg went on to review several previous predictions that Moore’s law was coming to an end, including a chart from The Economist plotting when each prediction was made against its predicted end date. The interesting thing is that in The Economist’s chart the two trend lines are converging, with a convergence date of 2029!
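The compounding here is easy to underestimate. A minimal sketch of the arithmetic, assuming the commonly quoted two-year doubling cadence (the exact cadence and years are illustrative, not from the talk):

```python
# Compound growth under Moore's law: doubling roughly every two years
# from 1965 to 2015 gives 25 doublings.
years = 2015 - 1965
doublings = years // 2      # 25 doublings over fifty years
growth = 2 ** doublings     # total multiplier on transistors per IC

print(doublings, growth)    # 25 doublings -> ~33.5 million-fold increase
```

Even at the slower two-year cadence, fifty years compounds to a factor of tens of millions, which is why "how long can we keep this up?" is the question that matters.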

There have been a number of reports that cost per transistor is now going up, but almost all of them trace back to work done by IBS. Greg went on to discuss why this prediction is likely not true. First, some of the assumptions around yield and wafer costs at 16/14nm appear pessimistic. Second, stepper speed at 7nm will be 50% faster than at 28nm, etch is also improving, and there are mask cost reductions in the pipeline. Greg presented an alternate scenario that shows continued cost-per-transistor reductions through 7nm. The rate of cost improvement does slow down, and rising mask and design costs mean large volumes are required to make a part economical to design.

But cost is just one part of the problem: today’s systems have to meet cost, performance, and power requirements. In systems today everything has a power budget, and the goal is to maximize performance within that budget.

The key problem is that parasitics now rule transistor performance. By 7nm, parasitic RC delay will represent over 60% of the device delay! Contact resistance is a big problem, but there doesn’t seem to be much work being done on it. This is a big problem for future devices as well; TFETs have better electrostatics but will likely not be able to produce the required drive current. FinFETs have given us improved electrostatics and better performance by virtue of the 3D folded width of the gate, but this also drives up parasitics. Stacked nanowires will have the same problem, and although they may help, they won’t fix it. In theory a flat 2D device is attractive, but you have to get a lot of current through a 2D channel. If you can’t solve the parasitic problem, the only solution is very low voltage. Spintronics is an interesting technology that could potentially run in the millivolt range, but there are many challenges to overcome. The bottom line of all this is that simple scaling no longer gets us higher speed at the same power.
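To make the "over 60%" figure concrete, here is a toy sketch of how a parasitic share of stage delay is computed; the picosecond values are made up for illustration and are not from the talk:

```python
def delay_fraction(parasitic_rc_ps, intrinsic_ps):
    """Fraction of total stage delay attributable to parasitic RC.

    Stage delay is modeled as the simple sum of intrinsic device delay
    and parasitic RC delay (a deliberate simplification).
    """
    total = parasitic_rc_ps + intrinsic_ps
    return parasitic_rc_ps / total

# Illustrative split: 6 ps parasitic vs 4 ps intrinsic -> parasitics
# account for 60% of the stage delay.
print(delay_fraction(6.0, 4.0))
```

The point of the sketch: once the parasitic term dominates the sum, shrinking the intrinsic device delay further buys very little, which is why contact resistance and wiring parasitics now matter more than channel improvements.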

The lack of improvement in power per transistor has led to dark silicon: you can’t turn on all the transistors and still meet your power budget. As you scale from 28nm to 20nm, 16/14nm, 10nm, 7nm, and 5nm, the percentage of transistors you have to leave off to stay within your original power budget goes to 33%, 45%, 56%, 75%, and 80% respectively. This leads to the practice of spending transistors to reduce power by implementing techniques such as multi-core and memory assist; designs and new technologies have to be co-optimized. In some cases today, in order to achieve high-performance logic in the minimum area we actually need to use longer-gate-length devices (this is likely not true in the memory and GPU areas). Supporting multiple gate pitches on the same die could be very advantageous.

Offering additional threshold-voltage options on a process yields lower-power products. In one example shown, a baseline design is compared to iterations where transistors on non-timing-critical paths are changed to higher-Vt transistors, reducing power while maintaining the same performance. This design change drops the power by the equivalent of at least one node! “The ideal process will support the fastest transistor and the most energy efficient transistor, as well as several intermediate options.” Limiting the options a process offers limits the achievable product performance. One option may be to pursue technologies that offer heterogeneous integration of multiple devices.
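The physical reason higher-Vt cells on non-critical paths save so much power is that subthreshold leakage falls roughly exponentially with threshold voltage. A hedged sketch of that relationship; the subthreshold swing value and Vt shift are illustrative, not figures from the talk:

```python
def leakage_ratio(delta_vt_mv, swing_mv_per_decade=90.0):
    """Approximate leakage reduction factor for raising Vt by delta_vt_mv.

    Subthreshold current scales roughly as 10^(-Vt / S), where S is the
    subthreshold swing in mV per decade (~60 mV/decade ideal at room
    temperature; 90 mV/decade assumed here as a more realistic value).
    """
    return 10 ** (-delta_vt_mv / swing_mv_per_decade)

# Raising Vt by 90 mV on non-critical cells cuts their leakage ~10x,
# while the timing-critical paths keep the fast low-Vt devices.
print(leakage_ratio(90.0))
```

This exponential sensitivity is why a process with several Vt options lets designers buy large power savings at zero performance cost on paths with timing slack.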

One really interesting point in this talk, repeated at the Coventor event at IEDM, was the importance of reducing variation. Device engineers focus on improving the mean, but designers are more concerned with the distribution tails. Reducing variation is better even if the mean is lower! It was also noted that many of the proposed future devices will likely have more variability, and therefore their actual performance may be less impressive than originally expected.

Up to this point the talk focused on device-to-circuit scaling. To summarize the key points:

  • Moore’s law is slowing down – back end of line (BEOL) cost in particular is growing.
  • Dennard scaling is even worse – within existing device electrostatics, invest in variability reduction and focus on reducing contact resistance. For the long term, devices that can operate at lower voltage while still providing high drive current are needed.
  • As Moore’s law slows – more radical change becomes viable such as heterogeneous integration.
  • Reliability and yield are also potential limiters for the future.

This brings us to architecture – technology interactions.

In the age of dark silicon we are trading utilization for efficiency. Multicore designs now feature heterogeneous cores with various levels of power and performance. These cores may not all be on at the same time depending on the task. The differences in the cores take advantage of the various threshold voltages available on the process. In the future we will likely see more dedicated accelerators.

This brings us to the interconnect issue. As processes scale to smaller nodes, the distance a transistor can push a signal is decreasing. In order to maintain performance more transistors are being used, but this adds to the power problem.

One way to mitigate the interconnect problem is three-dimensional stacking of ICs (3DIC). For very high transistor-count designs it may also make sense to split the device into three smaller stacked ICs, where the yield and cost improvements of the smaller ICs may offset the 3D stacking costs. Relatively simple estimates suggest a one-node improvement in power, but there are many hurdles to overcome.
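The yield side of the die-splitting argument can be sketched with a simple Poisson yield model, Y = exp(-D·A); the defect density and die area below are illustrative assumptions, not numbers from the talk:

```python
import math

def poisson_yield(defect_density_per_cm2, area_cm2):
    """Poisson yield model: probability a die of given area is defect-free."""
    return math.exp(-defect_density_per_cm2 * area_cm2)

D = 0.5   # defects per cm^2 (illustrative)
A = 2.0   # monolithic die area in cm^2 (illustrative)

monolithic = poisson_yield(D, A)        # ~37% of big dies are good
split_die = poisson_yield(D, A / 3.0)   # ~72% per one-third-size die

print(monolithic, split_die)
```

Because yield falls exponentially with area, three small dies each yield far better than one large one; whether that gain survives the cost of stacking and testing the three dies together is the open question the talk flags.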

With respect to memory, software thinks of memory as one big pool, but in reality it is a complex multi-tier architecture. SRAM is trending to smaller bank sizes and more overhead (<50% area efficiency); six-transistor cells are transitioning to eight-transistor cells with bigger transistors, and SRAM is often the voltage-scaling bottleneck. For system memory, DRAM consumes up to one half of the system power, including consuming power when it isn’t even “doing anything”. There are also only 2 to 3 “known” nodes left for DRAM. NAND has transitioned to multiple levels and 3D, but for how much longer? NAND has problems with cost, power, speed, and endurance, but it is still better than the alternatives. A super memory is needed that can scale and is denser, faster, and lower power. A fast, high-endurance non-volatile memory would enable new paradigms. The contenders are RRAM, with variability, endurance, and scaling issues; PCM, with variability and endurance issues; and MRAM, with power/speed tradeoffs, disturb, and cost issues.

The children of Moore’s law are now coming into their own:

  • Replacing a $370 trash can with a $5,000 smart trash can is expected to save the city of Barcelona $4 billion over 10 years.
  • Photovoltaics are available for $0.50 per watt on Alibaba.
  • Accelerometers have gone from $3.00 in 2007 to $0.54 in 2014!
  • 1 trillion sensors are expected in 10 years. The Internet of Things is now emerging, driven by sensors and the ubiquity of the internet, but energy efficiency will be key.

To achieve the next 100x improvement we need to put it all together. We need optimized layer processing for 3DICs, new bandwidth paradigms and new form factors.

In summary:

  • Moore’s law is slowing and there are no near-term magic bullets.
  • Moore’s law slowing is an opportunity for technology investment in fab/tools/process integration/device/circuit (DTCO). You have to optimize the whole chain.
  • Dennard scaling is also challenged, and equally important, with both FETs and wires facing issues.
  • SOC informed innovation – leverage 3DIC and novel memory. Energy efficiency versus utilization gives way to heterogeneity. Future technology choices will need transistor to system benchmarks.
  • The breadth of future systems strains simple Moore’s law scaling. The slowing of Moore’s law opens the opportunity for radical change. New systems leverage the children of Moore’s law: sensors and MEMS. System requirements must be used to understand and guide underlying technology options.

And Greg left us with a last thought: “Together, at the system level, I expect Moore’s Law level progress well past my own retirement” (~2029).

