
How to be Smart About DFT for AI Chips

by Tom Simon on 01-31-2019 at 12:00 pm

We have entered the age of AI specific processors, where specialized silicon is being produced to tackle the compute needs of AI. Whether they use GPUs, embedded programmable logic or specialized CPUs, many AI chips are based on parallel processing. This makes sense because of the parallel nature of AI computing. As a result, in silicon for these applications we are seeing large numbers of replicated processing elements and distributed memories. These large AI designs fortunately lend themselves to advanced DFT solutions that can take advantage of their architectural characteristics.

Mentor has produced a white paper, titled “AI Chip DFT Techniques for Aggressive Time to Market”, that talks about how the properties of many large AI chips can be leveraged to save DFT, ATPG and test time. The first step they recommend is to take advantage of AI chip regularity. They propose doing test insertion and pattern generation/verification at the core level. Hierarchical DFT, like that found in Mentor’s Tessent, can use hierarchically nested cores that are already signed off for DFT to run DFT on the entire design from the top level. Higher level blocks can include blocks or cores that have already had DFT sign-off. These in turn can be signed off and used repeatedly within a chip.

Tessent’s IJTAG allows plug-and-play core replication and integration. It also offers automation for chip-level DFT configuration and management. The flexibility this provides allows for some interesting optimizations. One such case is a design with a large number of very small cores, where Mentor suggests hierarchically grouping cores for test to reduce overhead and save time. This is a happy middle ground between overly granular and completely flat ATPG.

Another optimization that their approach allows is channel broadcasting. This allows the same test data to be used for identical groups of cores. It reduces test time and the number of pins required. Tessent is smart enough to help optimize the configuration for channel broadcasting.
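The savings from channel broadcasting come down to simple arithmetic: test data is shifted in once per group of identical cores instead of once per core. A rough sketch of that arithmetic, using made-up core counts, pattern counts, shift lengths and clock rates (none of these figures come from the white paper):

```python
# Illustrative arithmetic only: core counts, pattern counts, and shift
# lengths are invented numbers, not figures from the white paper.

def scan_test_time(num_groups, patterns, shift_len, clock_mhz):
    """Approximate scan test time in ms: each group of identical cores
    shares one broadcast pattern set, so test data is applied once per
    group rather than once per core."""
    cycles = num_groups * patterns * shift_len
    return cycles / (clock_mhz * 1e6) * 1e3

# 64 identical cores tested individually vs. as 8 broadcast groups of 8
flat = scan_test_time(num_groups=64, patterns=1000, shift_len=500, clock_mhz=100)
broadcast = scan_test_time(num_groups=8, patterns=1000, shift_len=500, clock_mhz=100)
print(f"flat: {flat:.0f} ms, broadcast: {broadcast:.0f} ms")  # 8x reduction
```

The same grouping also cuts the number of scan channels (and hence pins) needed at the chip level, which is the other saving the white paper highlights.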

In addition to repeating logic elements, AI chips have a large number of smaller distributed memory elements. If each memory core had its own BIST controller this would require a large area overhead. With Tessent it is possible for one BIST controller to be shared among multiple memory cores. To go along with this they offer a shared-bus interface to optimize the connections to the BIST controller.

Another topic the white paper covers is their move to RTL for test insertion. With RTL insertion, test verification can run before synthesis. RTL verification runs much faster than gate-level verification, and the debug process is easier. Moving test debug and verification to the RTL level means that synthesis is not required each time a test fix is made. Mentor has also implemented a number of testability checks at RTL that can save downstream iterations during ATPG.

While AI is making the lives of end users easier, it is certainly creating a demand for increasingly powerful silicon for processing. Despite this growing complexity of silicon, there is a bright spot in the test arena. Mentor clearly has been investing in their DFT product line. The good news is that many of the characteristics of these AI chips create opportunities for improving the efficiency of the design process and the resulting design, particularly in the area of test. If you want to delve into the specifics of how Mentor proposes designers take advantage of DFT optimizations for AI chips, the white paper is available on their website.


Secretary Chao Unchained @ CES 2020

by Roger C. Lanctot on 01-31-2019 at 10:00 am

U.S. Department of Transportation Secretary Elaine Chao has agreed to mount the stage at the upcoming Consumer Electronics Show in Las Vegas to share her vision of the positive economic impact of technology unleashed from regulatory oversight. It’s a powerful message but it’s going to be a tough sell.

Chao is likely taking the keynote slot vacated by General Motors CEO Mary Barra in the wake of the catastrophic United Auto Workers strike which left dealer lots bare in time for Christmas and delayed plans for the company’s first electric pickup truck – at least according to official GM statements.

The front burner issue for Chao at CES 2020 will be enhanced vehicle safety from inter-vehicle (V2V) connections. To her credit Secretary Chao has carved out a technology agnostic stance on the issue which has conveniently left the door open for the Federal Communications Commission to pass a Notice of Proposed Rule Making (NPRM) last week re-allocating the 5.9GHz spectrum intended for V2V applications.

“The Commission proposes to designate the lower 45 megahertz of the band for unlicensed uses like Wi-Fi,” writes the FCC in its NPRM. “This 45 megahertz sub-band can be combined with existing unlicensed spectrum to provide cutting-edge high-throughput broadband applications on channels up to 160 megahertz wide.

“The Commission is proposing to dedicate the remaining 30 megahertz of the band for use by transportation and vehicle safety-related communication services. Specifically, in the NPRM, the Commission proposes to revise its rules to provide Cellular Vehicle to Everything (C-V2X), an emerging standard for transportation applications, with exclusive access to the upper 20 megahertz of the band.

“Under the Commission’s current rules, no spectrum is allocated for C-V2X. The NPRM seeks comment on whether to retain the remaining 10 megahertz for use by DSRC systems or to dedicate it for C-V2X use.”

If adopted, the FCC plan likely puts a fork in the plans of the National Highway Traffic Safety Administration’s efforts to mandate dedicated short range communication (DSRC) technology for the same application. This 20-year-old effort appears to have arrived at the end of the road – which one might imagine is welcome news at the Trump administration’s USDOT where regulations are being eliminated, not promulgated.

Just last week Chao released a statement that she had signed a “rule on rules” ensuring the department’s regulations aren’t “too complicated, out of date, or contradictory.” The new Transportation Department action formalized a Trump administration requirement that for each regulatory step a department takes, it must undertake two deregulatory moves.

News reports quoted USDOT claims that it had exceeded its own standard, establishing a ratio of 23 deregulatory steps for each regulatory initiative – estimating unspecified resulting industry savings of $3.7B. (The agency provided no details regarding the source of these savings or industries impacted.)

Maintaining that 23-1 ratio of deregulation to regulation may pose a challenge as the USDOT faces a growing clamor for more regulatory guidance in the development of self-driving cars and the advancement of active safety systems. Secretary Chao and the Trump administration may have painted themselves into a corner with the new mandate or simply written themselves out of the normal NHTSA script intended to reduce highway fatalities by guiding the future of automotive design.

Of course, the claim of a $3.7B contribution to industry savings from reduced regulatory oversight must be considered in the context of what is shaping up as a far more enduring industry impact from Trump administration policies as Boeing ponders the termination of 737 MAX production in January. With two fatal crashes occurring under the guidance of an acting USDOT secretary (preceding Secretary Chao), the Trump administration will have to come to terms with the $3.7B earnings hit Boeing took months ago, to which $5B in victim compensation and liability has since been added on Boeing’s books.

Some have taken to describing the USDOT’s deregulation binge as “win-ovation.” Nobody at Boeing in Chicago or Renton, Washington, would use such a word – especially as Boeing ponders what may be the demise of its most popular plane.

Maybe Secretary Chao can ask fellow CES keynoter Delta Airlines CEO Ed Bastian what he thinks about deregulation.


Qualcomm Attests Benefits of Mentor’s RealTime DRC for P&R

by Tom Simon on 01-31-2019 at 7:00 am

When floor planning (FP) and place & route (P&R) tools took over from custom layout tools for standard cell based designs, life became a lot better for designers of large digital chips. The beauty of the new flows was that all the internals of the standard cells and many IP blocks were hidden from view, lightening the load on the tools and designers. So-called footprints, in the form of Library Exchange Format (LEF) files, filled in for the internals of these cells and blocks.

A properly constructed LEF cell view is frequently adequate to give the P&R tool the ability to produce design-rule-correct final GDS-level layout, but not always, so a final DRC run is always needed. As design sizes increased, the loop from final whole-chip DRC back to fixes and re-verification became unwieldy and impractical for highly sensitive chip design schedules.

This is the issue that Qualcomm encountered in their flow while designing extremely large state of the art SOCs. I recently had the chance to read a white paper that talks about the issues that Qualcomm encountered and how they solved them with Mentor’s Calibre RealTime Digital in-design DRC. Let’s look at the primary issues that they encountered.

LEF files contain an abstract of the underlying layout called a footprint. However, as the cells and blocks are placed in a design there are a large number of interactions that can take place between these cells that could lead to DRC violations. Waiting until late in the design cycle to verify all the base layer design rule correctness can leave serious issues baked into the design until they are much more difficult to fix.

The other source of issues is interactions between routing in the P&R tool and the geometry inside the placed cells. One cause of this issue can be errors in the LEF cell views. In this case the errors might not be caught until the entire design is exported and merged with the contents of the placed cells. For large designs like those in development at Qualcomm, the chip-level operation to merge all the geometry in the design can take a long time.

The white paper describes the process that Qualcomm adopted using Mentor’s Calibre RealTime Digital solution. Instead of doing a merge of the entire design and running batch DRC, Calibre RealTime Digital makes direct calls to the Calibre engine running foundry-qualified sign-off Calibre rule decks. It is able to run analysis on the area in proximity to where designers are working. It enables incremental DRC while in the P&R tool, including all the layout layers and nested cells and blocks that are normally not available from the P&R tool. As a result, designers get nearly instantaneous feedback on violations and feedback on potential fixes.

When geometry inside placed cells is part of a violation, selected shapes contained in the cells or blocks are shown to help designers fully understand the error so a fix can quickly be made. This is really the best of both worlds: designers can still work efficiently in the P&R tool environment, yet they are able to detect and understand problems that arise from the fully merged geometry across all layers.

It was years ago that Calibre completely disrupted the DRC market with the introduction of its easy to use and highly effective hierarchical capabilities. Customers immediately understood the competitive advantage that they would gain by using Calibre. The Calibre RealTime Digital interface looks to be another game changing capability to come from this formidable development team. Based on the white paper it seems that Qualcomm agrees with this sentiment.


The 50th Year of Intel, What Happened in 2018

by Daniel Payne on 01-30-2019 at 12:00 pm

2018 was the 50th year for Intel in the semiconductor business, and their Q4 2018 conference call just happened last week, so I’ll get you all caught up on what they talked about. Bob Swan is the CFO and interim CEO, as the company continues to search for a new CEO after Brian K. was ousted for misconduct. Here’s a quick financial summary for the Q4 2018 quarter:

  • EPS of $1.28, beat estimates by $0.06
  • Revenue of $18.66B, a 9.4% increase, but missed estimates by $360M

For the entire year revenue passed $70B, a growth of 13% thanks to data center, IoT, programmable, memory and modem businesses. I spend all day in the cloud by browsing, banking, reading, writing and networking, so no surprise that the Data Center Group did $23B in revenue, up 21% for the year.

I kind of expected the PC business to be flat or slightly down, but surprisingly that business grew 9% for the year. There was no big Q4 revenue uptick, so the product areas blamed are:

  • Modem
  • China slow-down
  • Slower cloud growth
  • NAND pricing

Hopes are high for three areas: AI, Autonomous Driving, 5G. All of these topics are popular on SemiWiki, so let’s see if Intel can compete on a global scale with so many competitors in each field.

Related blog: Intel Swaggers at CES

3D chip packaging (aka Foveros) has arrived, and their first product is coming later in 2019, Lakefield – a 10nm CPU, four Atom cores, Gen 11 graphics. I’m still curious about how they managed to remove all of that heat effectively.

The 9th generation of Intel Core desktop products was launched in 2018, so that benefits the scientific, gaming and content creating segments. The 10nm Ice Lake client CPUs were previewed, but not quite ready for sale in 2018.

Vision processing is a typical AI task using CNNs (Convolutional Neural Networks), so Intel has a 3rd-generation vision processing unit along with a toolkit called OpenVINO. Another tease for AI is something Intel calls the Neural Network Processor for Inference (NNPI), due in 2H 2019, under the Nervana product family name.

Intel’s automotive acquisition Mobileye added some 28 new design wins and 78 vehicle model launches last year, so that segment for collision detection and avoidance is in growth mode for ADAS levels one to three.

Shareholders of INTC stock enjoyed receiving $5.5B in dividends, and the company bought back 217 million shares as the stock traded between $42 – $57.

One statistic that really jumped out at me was that revenue per employee has improved 25% in the past three years.

Just the Data Center Group at Intel produced $6B of revenue in Q4, more than most companies in the world.

Back in 2006 Intel and Micron created IM Flash Technologies, LLC, but in 2018 Micron announced plans to fully own the technology, known within Intel as 3D XPoint and sold under the Optane product name. Intel can still purchase chips from Micron, so let’s see if this segment continues to grow in 2019.

Forecasts for 2019 at Intel are:

  • Revenue of $71.5B
  • EPS of $4.60

The Q1 2019 forecast is $16B in revenue and EPS of $0.87, so no growth is expected.

The transition at Intel from PC-centric to data-centric is slowly taking shape.

Data Center revenues for the first nine months of 2018 were up 45%, but the fourth quarter slowed way down. For 2019 they hope that the second half shows pickup in this area.

In 2018 we saw Intel exit Wind River and the wearables market. The 10nm node slowly continues to ramp up production volumes in 2019, some four years later than expected.


Why High-End ML Hardware Goes Custom

by Bernard Murphy on 01-30-2019 at 7:00 am

In a hand-waving way it’s easy to answer why any hardware goes custom (ASIC): faster, lower power, more opportunity for differentiation, sometimes cost though price isn’t always a primary factor. But I wanted to do a bit better than hand-waving, especially because these ML hardware architectures can become pretty exotic, so I talked to Kurt Shuler, VP Marketing at Arteris IP, and I found a useful MIT tutorial paper on arXiv. Between these two sources, I think I have a better idea now.

Start with the ground reality. Arteris IP has a bunch of named customers doing ML-centric design, including for example Mobileye, Baidu, HiSilicon and NXP. Since they supply network on chip (NoC) solutions to those customers, they have to get some insight into the AI architectures that are being built today, particularly where those architectures are pushing the envelope. What they see and how they respond in their products is revealing.

I talked a bit about this in an earlier blog (On-Chip Networks at the Bleeding Edge of ML). There is still very active development, in CPUs and GPUs, around temporal architectures, primarily to drive performance: single instruction, multiple data (SIMD) fed into parallel banks of ALUs. More intriguing, there is rapidly-growing development around spatial architectures where specially-designed processing elements are arranged in a grid or some other topology. Data flows through the grid between processors, though flow is not necessarily restricted to neighbor-to-neighbor communication. The Google TPU is in this class; Kurt tells me he is seeing many more of this class of design appearing in his customer base.


Why prefer such strange structures? Surely the SIMD approach is simpler and more general-purpose? Neural net training provides a good example to support the spatial approach. In training, weights have to be updated iteratively through a hill-climbing optimization, aiming at maximizing the match to a target label (there are more complex examples, but this is good enough for here). Part of this generally requires a back-propagation step where certain values are passed backwards through the network to determine how much each weight influences the degree of mismatch (read here to get a more precise description).

Which means intermediate network values need to be stored to support that back-propagation. For performance and power you’ll want to cache data close to compute. Since processing is quite massively distributed, you need multiple levels of storage, with register file storage (perhaps 1kB) attached to each processing element, then layers of larger shared caches, and ultimately DRAM. The DRAM storage is often high-bandwidth memory, perhaps stacked on or next to the compute die.

Here it gets really interesting. The hardware goal is to optimize the architecture for performance and power for neural nets, not for general-purpose compute. Strategies are based on operation ordering, e.g. minimizing updates of weights or partial sums, or minimizing the size of local memory by providing methods to multicast data within the array. The outcome, at least here, is that the connectivity in an optimized array is probably not going to be homogeneous, unlike a traditional systolic array. Which I’m guessing is why every reference I have seen assumes that the connectivity is NoC-based. Hence Arteris IP’s active involvement with so many design organizations working on ML accelerators.

More interesting still are architectures to support recurrent NNs (RNNs). Here the network needs to support feedback. An obvious way to do this is to connect the right side of the mesh to the left side, forming (topologically) a cylinder. You can then connect the top side to the bottom side, making (again topologically) a torus. Since Kurt tells me he hears about these topologies from a lot of customers, I’m guessing a lot of folks are building RNN accelerators. (If you’re concerned about how such structures are built in semiconductor processes, don’t be. These are topological equivalents, but the actual structure is still flat. Geometrically, feedback is still routed in the usual way.)
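Topologically, those cylinder and torus connections amount to nothing more than modular indexing of processing elements. A small sketch (the mesh size and direction labels are my own illustration, not from any particular accelerator):

```python
# Sketch: neighbor lookup in an R x C mesh with wraparound links.
# Wrapping only columns gives a cylinder; wrapping both rows and
# columns gives a torus -- the topologies mentioned for RNN accelerators.

def neighbors(r, c, rows, cols, wrap_rows=True, wrap_cols=True):
    """Return the N/S/E/W neighbors of processing element (r, c)."""
    out = {}
    if wrap_cols or c > 0:
        out["W"] = (r, (c - 1) % cols)
    if wrap_cols or c < cols - 1:
        out["E"] = (r, (c + 1) % cols)
    if wrap_rows or r > 0:
        out["N"] = ((r - 1) % rows, c)
    if wrap_rows or r < rows - 1:
        out["S"] = ((r + 1) % rows, c)
    return out

# On a 4x4 torus, the "east" neighbor of the rightmost column wraps
# back to column 0, which is how the feedback path closes.
print(neighbors(0, 3, 4, 4))
```

In silicon the wraparound links are just ordinary routed wires back across the (flat) die, as the parenthetical above notes.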

I see two significant takeaways from this:

  • Architecture is driven from top-level software goals and is more finely tuned to the NN objective than you will find in any traditional application processor or accelerator; under these circumstances, there is no one “best” architecture. Each team is going to optimize to their goals.
  • What defines these architecture topologies, almost more than anything else, is the interconnect. This has to be high-throughput, low-power, routing friendly and very highly configurable, down to routing-node by routing-node. There needs to be support for multicast, embedding local cache where needed and high-performance connectivity to high-bandwidth memory. And of course in implementation, designers need efficient tools to plan this network for optimal floorplan, congestion and timing across some very large designs.

I hope Kurt’s insights and my own voyage of discovery added a little more to your understanding of what’s happening in this highly-dynamic space. You can learn more about what Arteris IP is doing to support AI in these leading-edge ML design teams HERE. They certainly seem to be in a pretty unique position in this area.


Switch Design Signoff with IC Validator

by Alex Tan on 01-29-2019 at 12:00 pm

The surge of network traffic at data centers has driven an increase in network bandwidth, doubling every 12-15 months according to a study conducted on Google’s data centers. The primary drivers of this uptick include the proliferation of cloud computing, more distributed storage architectures, and emerging applications in AI, 5G and video streaming.

Innovium is a provider of high performance and highly scalable switching silicon solutions for data centers. Its TERALYNX™ Ethernet Switch family supports switch capacities ranging from 3.2 Tbps through 12.8 Tbps with large buffers. Key to its network switch design requirements are power efficiency in terms of performance per watt, very low latency and inherently high port-count connectivity. Innovium has rolled out several generations of silicon, starting with 28nm at 3.2Tbps and most recently 16nm at 12.8Tbps. One of the main challenges for its silicon design is the need to reduce physical verification time.

IC Validator™ is a comprehensive physical verification solution from Synopsys. It delivers both performance scalability and a broad runset support for advanced process nodes including 7nm FinFET. IC Validator’s Design Rule Checking (DRC) and Layout Versus Schematic (LVS) physical verification engine has near-linear scalability performance across hundreds of CPU cores and substantially reduces the time to results as shown in Figure 2.

Earlier this month, Synopsys announced Innovium’s adoption of IC Validator as its TERALYNX physical signoff tool. Innovium was able to take advantage of IC Validator’s performance scaling across more than 250 CPU cores to complete full-chip DRC/LVS signoff in the TSMC 16nm FinFET process within a day.

“Physical verification is on the critical path to our tapeout. Early physical verification closure is essential to ensure that design schedules are met,” said Keith Ring, vice president of Technology at Innovium. “IC Validator performance enabled us to complete full-chip DRC and LVS signoff within a day for our flagship network switch design.”

“Designers are challenged to close physical verification within schedule because of the increasing manufacturing complexity at advanced technology nodes,” said Christen Decoin, senior director of business development, Design Group at Synopsys. “Through high performance, scalability, and readily available optimized runsets from all major foundries, IC Validator is providing designers with the fastest path to production silicon.”

I had the opportunity to talk about this announcement with Manoz Palaparthi, Synopsys Technical Marketing Manager. The following are his excerpted responses to my inquiries:

What types of challenges did Innovium overcome by migrating to IC Validator physical signoff?
Performance was a key concern for Innovium. The design is large with several billions of transistors to verify. As such, the traditional full-chip DRC signoff happens late in the design flow and its long runtimes can lead to tapeout delays. With IC Validator, Innovium could complete full chip DRC and LVS runs under one day. Innovium used IC Validator across more than 250 CPU cores to take advantage of IC Validator’s distributed processing and scalability.

Which IC Validator verification features were utilized by Innovium?
Innovium deployed IC Validator for all of their physical verification needs, including DRC, LVS, Antenna checks and metal-Fill.

Could you comment on how IC Validator’s smart memory-aware load scheduling and balancing technology works?
Yes. The memory aware scheduling and smart load sharing technologies are built into the IC Validator scheduler.

Memory aware scheduling enables jobs to be scheduled based on their individual memory requirements. IC Validator scheduler estimates memory needs in advance. If a job requires large memory, for example 512GB or 1TB, it is scheduled on a large machine. Lighter jobs are scheduled on smaller machines. And as the jobs progress, the tool dynamically adjusts and reschedules as needed. With smart load sharing technology, IC Validator continuously monitors jobs and dynamically optimizes them: are the jobs progressing well? Have some machines died? Is load balancing and rescheduling required from some jobs?

Once the jobs start running, scheduler ensures that jobs run as efficiently as possible. This technology, combined with massive distributed processing scalability, accelerates time to results for customers like Innovium.
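The scheduling behavior described above can be sketched as a simple greedy bin-packing loop. To be clear, this is a generic illustration of memory-aware scheduling, not IC Validator’s actual algorithm, and the job names and memory figures are invented:

```python
# Generic sketch of memory-aware job scheduling -- not IC Validator's
# actual algorithm, just the idea described above: estimate each job's
# memory need and place it on the smallest machine that fits.

def schedule(jobs, machines):
    """jobs: {name: est_mem_gb}; machines: {name: free_mem_gb}.
    Returns {job: machine}, placing big jobs first (best-fit greedy)."""
    placement = {}
    free = dict(machines)
    for job, mem in sorted(jobs.items(), key=lambda kv: -kv[1]):
        # pick the smallest machine with enough free memory
        fits = [m for m, f in free.items() if f >= mem]
        if not fits:
            raise RuntimeError(f"no machine can hold {job} ({mem} GB)")
        m = min(fits, key=lambda name: free[name])
        placement[job] = m
        free[m] -= mem
    return placement

jobs = {"drc_metal": 900, "drc_base": 400, "lvs": 120, "fill": 60}
machines = {"big": 1024, "mid": 512, "small": 256}
print(schedule(jobs, machines))
```

The dynamic rescheduling and dead-machine monitoring Manoz describes would sit in a loop around a placer like this, re-running it as estimates and machine availability change.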

IC Validator has evolved into a comprehensive physical signoff tool; could you comment on how you keep adding value to the tool?
The focus of the IC Validator product is on delivering the highest productivity to the physical verification engineer. In that direction, we recently introduced a lot of technology to help our customers get to their tapeouts faster: massively parallel distributed processing with scalability to thousands of CPUs, Explorer to quickly identify and fix gross design weaknesses during chip integration, and fusion technology for automated DRC repair, timing-aware metal fill and more.

For potential users who would like to migrate to IC Validator, could you share the expected pre-adoption collaboration time?
Potential customers can migrate to IC Validator very quickly. IC Validator runsets are readily available for all mainstream process nodes from foundry partners such as TSMC, GF, Samsung and more.

After the runsets are ready, it is just a matter of a few hours to set up the tool and start running PV jobs with IC Validator. Over the next few days or weeks, customers typically run some designs to evaluate the tool’s performance and features. Several of our recent customers have been able to deploy IC Validator in production within a month.

To find out more details on IC Validator, please check HERE


Mathematics are Hard – That is Why AI Needs Mathematics Hardware

by Tom Simon on 01-29-2019 at 7:00 am

The field of artificial intelligence has relied on heavy inspiration from the world of natural intelligence, such as the human mind, to build working systems that can learn and act on new information based on that learning. In natural networks, neurons do the work, deciding when to fire based on huge numbers of inputs. The relationship between the inputs, in the form of incoming synapses, and the act of firing an outgoing synapse is called the transfer function. In the wetware of our brains there is a complex interplay of neurotransmitters and ion flux across cell membranes that define the activation function for a given neuron. In some cases, thousands of incoming inputs can control firing. In each part of the brain where information is processed there might be a multitude of parallel neurons layered in series to perform a task.

So too, in artificial neural networks, there are layers of highly interconnected neurons that receive input and must decide to fire based on an activation function. Over the years, AI researchers have experimented with a wide range of activation functions, e.g. step, linear, sigmoid, etc. Sigmoid and sigmoid-like functions have become popular because of their responsiveness and suitability for building multilayered networks. There are, of course, other desirable types, but many of these share the need to vary sensitivity as a function of input level.
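For concreteness, here is the classic logistic sigmoid next to a step function, showing the varying sensitivity just mentioned; the sample points are arbitrary:

```python
import math

def sigmoid(x):
    """Classic logistic activation: smooth, bounded in (0, 1), with
    sensitivity that varies with input level -- steep near 0, nearly
    flat at the extremes."""
    return 1.0 / (1.0 + math.exp(-x))

def step(x):
    """Hard-threshold activation for comparison."""
    return 1.0 if x >= 0 else 0.0

# The sigmoid responds gradually where the step switches abruptly.
for x in (-4, -1, 0, 1, 4):
    print(f"x={x:+d}  step={step(x):.0f}  sigmoid={sigmoid(x):.3f}")
```

That graded, input-dependent response is exactly what makes fixed-point or integer-only evaluation awkward, which leads into the floating point discussion below.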

A direct consequence of the use of sigmoid functions, with their varying sensitivity, is the use of floating point operations to implement the activation functions for many classes of neural networks. Initially it was sufficient to rely on the processing capabilities of standard processors for the implementation of neural networks, but with increasing requirements for throughput and optimization, specialized hardware is used more and more often. Specialized floating point processing has proven to offer big benefits. In addition to lacking parallelization, CPU-based floating point units often do not fit the precision, area, power or performance needs of neural networks. This is especially true for neural networks tailored for specific tasks.

It makes more sense for designers of SOCs that perform neural network processing to pick and choose the best floating point IP for their application. The considerations might include optimization through fusing specific functions, controlling rounding, and minimizing redundant normalization operations. It is also worth pointing out that when lower precision will work, reductions in area and power can be made.
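One way to see the precision trade-off is to evaluate the same sigmoid at different float widths and compare against a double-precision reference. In this sketch, standard float16 stands in for a reduced-precision custom format; it does not model Synopsys’ FFP format itself:

```python
import numpy as np

# Sketch of precision trade-offs: evaluate a sigmoid in float32 vs
# float16 and measure worst-case error against a float64 reference.
# float16 is a stand-in for a reduced-precision custom format.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-8, 8, 1001, dtype=np.float64)
ref = sigmoid(x)                                    # double-precision reference
f32 = sigmoid(x.astype(np.float32)).astype(np.float64)
f16 = sigmoid(x.astype(np.float16)).astype(np.float64)

print(f"max |err| float32: {np.max(np.abs(f32 - ref)):.2e}")
print(f"max |err| float16: {np.max(np.abs(f16 - ref)):.2e}")
```

Whether the larger half-precision error is acceptable depends entirely on the network, which is why configurable-precision IP is attractive for special-purpose designs.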

It was with great interest that I read an article by Synopsys on how their DesignWare library offers floating point functions that can be used for neural network implementations. When their floating point operations are pipelined it is possible to use their Flexible Floating Point (FFP) format to transfer intermediate results and meta data between operations. This gives designers the ability to precisely specify all aspects of the float operations. To ensure ideal performance and behavior, Synopsys supplies bit-accurate C++ libraries, allowing system simulation that includes desired floating point operations.

In some ways, the future of neural networks is about lower precision. A lot of work is going on into driving towards sparse matrix operations at the lowest resolutions. However, at these precisions the sensitivity and accuracy of the operations must be ensured. Designers need the ability to explore and then define the optimal configurations and operation of special purpose neural networks. The Synopsys DesignWare IP offerings for floating point functions seem like a useful tool to accomplish this. The article referenced above can be found on their website.


Accuracy of In-Chip Monitoring for Thermal Guard-banding

by Daniel Payne on 01-28-2019 at 12:00 pm

I remember working at Intel and viewing my first SPICE netlist for a DRAM chip, because there was this temperature statement with a number after it, so being a new college graduate I asked lots of questions, like, “What is that temperature value?”

My co-worker answered, “Oh, that’s the estimated junction temperature of the chip.”

The next question was, “What do you mean estimated, don’t we know what the junction temperature actually is?”

With a slight grimace, the co-worker replied, “Well, we don’t know what the actual junction temperature is until we fabricate it, package it up, place it in the tester, then measure it. So we just estimate the junction temperature based on past experience, then put that number in SPICE.”

My naive engineering bubble had just burst, because I assumed that my professional colleagues in industry knew a lot more about the DRAM chips that they were designing than to simply guess at a temperature for use in SPICE, then hope for the best when silicon came back. Today, however, the IC design landscape has changed quite a bit, even to the point that engineers can place an actual IP block on their chip that will dynamically measure the local temperature in real time, aka In-Chip Monitoring.

Going back to the DRAM example at Intel: we first packaged the memory device in a rather expensive ceramic package with excellent thermal properties, but then, for cost savings, we migrated to a cheaper plastic package with poor thermal properties. Knowing the junction temperature made a huge difference in both the operation of the DRAM and the profit margin of our product.

Chips are being designed today across a wide range of process nodes, from the mature 40nm down to research nodes like 3nm, and at each node you have to keep your chip operating within a safe thermal limit in order to meet power and reliability requirements. Many design segments are limited by thermal considerations for semiconductor devices: datacenter, IoT, consumer and automotive. If you can sense the die temperature and then manage the operation of the chip to stay within thermal limits, you will save power and improve reliability.

Let's consider using an in-chip thermal monitor where we have a target junction temperature of 85C. If the temperature sensor accuracy is plus or minus 5C, then a sensor reading of 80C corresponds to an actual junction temperature anywhere from 75C to 85C. To guarantee the 85C target is never exceeded, software must take action, such as slowing a clock frequency or reducing the Vdd level to one or more IP blocks, as soon as the sensor reads 80C. The cost of this thermal guard-banding scheme is that when software thinks 80C has been reached, the actual junction temperature could be as low as 75C, so the chip may throttle a full 10C early.

In comparison, what happens if we instead use a temperature sensor with a tighter accuracy of plus or minus 2C?

The good news is that the more accurate sensor lets software act at a reading of 83C, at which point the actual junction temperature is between 81C and 85C. The worst-case lower limit is therefore 81C with the second sensor, versus 75C with the first, a difference of 81C - 75C = 6C. That 6C difference means a lot, and could be worth between 5W and 10W of power savings, depending on the architecture.

When talking about a consumer hand-held device running on battery power, that 5W-10W savings means longer battery life, a real benefit. At the other end of the electronics power spectrum, in a data center or telecom system, the savings show up in system energy consumption, speed and data throughput. For automotive, the benefit is tighter reliability management of the semiconductor device.

Moortec, an IP supplier based in the UK, focuses on in-chip monitoring of temperature, and its CEO Stephen Crosher recently made a video on this thermal topic. Stay tuned for a video series from Moortec, because they also have IP sensors for Process and Voltage, the other parts of the PVT troika.



Why we will all benefit from the next space race

by Vivek Wadhwa on 01-27-2019 at 7:00 am

Until January 3, no spacecraft had ever landed on the far side of the moon: the side always facing away from the Earth. It had long remained a mystery. But no longer. China's National Space Administration successfully landed a lunar lander, Chang'e-4, in the South Pole-Aitken basin, the moon's largest and deepest. Its lunar rover Yutu-2 is sending home dozens of pictures so that we can see the soil, rocks, and craters for ourselves. Seeds it carried on the journey have also just germinated, making this the first time any biological matter from Earth has been cultivated on the Moon.

Scientists had long speculated about the existence of water on the Moon — which would be necessary to grow crops and build settlements. India’s Chandrayaan-1 satellite confirmed a decade ago that there was water in the Moon’s exosphere, and in August 2018, it helped NASA find water ice on the surface of the darkest and coldest parts of its polar regions.

India’s Mangalyaan satellite went even further, to Mars, in 2014, and is sending back stunning images. Prime Minister Narendra Modi has promised a manned mission to space by 2022, and one Indian startup, Team Indus, has already built a lunar rover that can help with the exploration.

The Americans and Soviets may have started the space race in their quest for global domination, but China, India, Japan, and others have joined it. The most interesting entrants are entrepreneurs such as Elon Musk, Jeff Bezos, Richard Branson, and Team Indus' Rahul Narayan. They are space explorers like the ones we saw in science fiction, driven by ego, curiosity, and the desire to make an impact on humanity. Technology has levelled the playing field so that even startups can compete and collaborate with governments.

In the fifty years since the Apollo 8 crew became the first to orbit the moon and return, the exponential advance of technology has dramatically lowered the entry barriers. The accelerometers, gyroscopes, and precision navigation systems that cost millions and were national secrets are now available for a few cents on Alibaba. These are what enable the functioning of Google Maps and Apple health apps, and what make space travel possible. Satellites, rockets, and rovers are also much more affordable.

The NASA space-shuttle program cost about $209 billion over its lifetime and made a total of 135 flights, an average of nearly $1.6 billion per launch. Its single-use rockets were priced in the hundreds of millions of dollars. Elon Musk's company SpaceX now offers launch services for $62 million on its reusable Falcon 9 rockets, which can carry a load of 4,020 kg. And yes, discounts are available for bulk orders. Team Indus built its lunar rover with only $35 million of funding and a rag-tag team of engineers in Bangalore.

NASA catalyzed the creation of technologies as diverse as home insulation, miniature cameras, CAT scans, LEDs, landmine removal, athletic shoes, foil blankets, water-purification technology, ear thermometers, memory foam, freeze-dried food, and baby formula. We can expect the new forays into space to yield even more. The opportunities are endless: biological experimentation, resource extraction, figuring out how to live on other planets, space travel, and tourism. The technologies will include next-generation nano satellites, image sensors, GPS, communication networks, and a host of innovations we haven't conceived of yet. We can also expect to be manufacturing in space and 3D-printing buildings for space colonies.

Developments in each of these frontiers will provide new insights and innovations for life on earth. Learning about growing plants on the moon can help us to grow plants in difficult conditions on this planet. The buildings NASA creates for Mars will be a model for housing in extreme climates.

As with every technology advance, there are also new fears and risks. Next-generation imagery can provide military advantages through intelligence gathering. The military already has an uncanny ability to track specific people and watch them in incredible detail. For any sort of space station or base on another planet or moon, there is the question of who sets the rules, standards, and language that’s used in outer space. Then there’s the larger question of whose ethical and social values will guide the space communities of the future — and the even larger question of whether places beyond Earth are ethically claimable as property at all.

Regardless of the risks, the era of space exploration has begun and we can expect many exciting breakthroughs. We can also start dreaming about the places we want to visit in the heavens.

For more on how we can create the amazing future of Star Trek, please read my book: The Driver in the Driverless Car


ASML and Memory Loss 2019

by Robert Maire on 01-25-2019 at 7:00 am

ASML reported a more or less in-line quarter, as expected, coming in at EUR3.14B in revenues and EPS of EUR1.87. However, guidance was worse than most analysts were expecting, with Q1 revenues expected to be EUR2.1B, down about one third.

This cut is something we have been talking about for a while, as we have expected sharp memory CAPEX cuts, with recent logic/foundry cuts adding to the downturn. Since things were going to be bad anyway, ASML decided to throw in the kitchen sink and settle its lawsuit with Nikon, whose patent-infringement payments will cut Q1 even further. Orders dropped sharply from EUR2.2B in Q3 to EUR1.59B in Q4, but more importantly, 80% of those orders were for logic/foundry, which means that memory spending has virtually ground to a halt.

This means that memory spending is down about 50% from peak levels.

The breakout of system sales tells the story: the USA (meaning Intel, since GF is dead) went from 5% in Q3 to 32% in Q4; China stayed at 18%; Korea dropped from 33% to 26%; and Taiwan (TSMC) dropped from 30% to 20%. In Q3, memory was 58% of sales, whereas in Q4 it fell to 40%. Five EUV systems were shipped, the same as in Q3.

Memory orders fell from 63% of orders in Q3 to 20% of orders in Q4 as memory drove off a cliff without skid marks.

Unfortunately, ASML is stuck in a typical downcycle about which it can do little. In our view, it is clear that ASML will be one of the least impacted companies, as litho systems have the longest lead times and customers never want to step out of the queue for fear of not having litho capacity during an upturn. In addition, ASML has the significant benefit of the EUV transition, which we think remains on track.

Perhaps the biggest question from an investment perspective is how long and how deep the downcycle will be. On the call, management echoed statements from its customers suggesting things will get better in H2 2019. However, there is zero evidence to support a second-half recovery other than hope and good wishes. Even if memory prices stabilize, it doesn't mean that capex spending will pick up again. We would expect capex increases to come only after memory pricing has stabilized and perhaps started to move north again.

In the meantime we have a slowdown in foundry spending, which will likely add to the length of the downturn as the negative Apple news trickles down through the supply chain.

The company did announce a 50% increase in its dividend, which will offset some of the negative news and the Nikon settlement.

ASML the stock
As we have seen with other stocks, there is increasing downside resistance to bad or worse-than-expected news. In addition, ASML has the benefit of more European investors, who tend to be longer term and more resistant to near-term negative news. European investors will also like the dividend increase. As such, we expect the stock to trade flat to up, as "it could have been worse" is the likely prevailing sentiment. We still favor the shares of ASML given their monopoly position, EUV progress and solid financial performance.

Collateral Damage
As we have been saying for a while, ASML and KLAC will be less impacted while LRCX and AMAT will be much more impacted by collapsing memory capex.

Lam, the poster child of memory-making equipment, is the most impacted, as memory was 84% of its revenues at the peak. Given that TEL and Hitachi have a bigger share of Intel's spend than Lam does, we do not expect Intel spending to offset memory weakness as it did at ASML. AMAT will be impacted more by weakened spending from TSMC. LRCX reports tonight, and we would expect them to reset their guidance to a lower range than most are expecting, as analysts have been slow to reflect or admit the reality of the down cycle.