
Key Applications for Chip Monitoring
by Daniel Nenni on 04-24-2020 at 2:00 pm

Richard McPartland

One of the side benefits of working with SemiWiki is that you get to meet a broad range of people, and in the semiconductor industry that means a broad range of very smart people. Recently I had the pleasure of meeting Richard McPartland of Moortec. Richard and I started in the semiconductor industry at the same time, but from across the pond, as they say. Richard started at UK semiconductor pioneer Plessey in the early 1980s as an IC designer. The stories he can and does tell…

Richard and I are working on an upcoming webinar on optimizing power and increasing data throughput in advanced multi-core AI/ML/DL devices. Artificial intelligence, machine learning, and deep learning are touching just about every new design, so this webinar will be a full one. Be sure to register to attend the event and you will get a link to the replay. Here is the webinar abstract:

If you are working on complex Artificial Intelligence (AI) or Machine Learning (ML) or Deep Learning (DL) designs using advanced node processes, you will understand the motivations for optimising CPU utilisation, device power and processing speed. Cutting-edge AI, ML & DL chips, by their very nature, are susceptible to intra-die process variability. Designers are often walking a fine line between optimal performance and failure.

This webinar from Moortec looks at how close-to-real-time analysis of dynamic conditions, together with identification of process corners, using embedded in-chip monitoring fabrics on advanced node processes, can greatly improve the power consumption, data throughput and computational performance of the overall system design.

Topics covered will include how tight dynamic guard-banding enables better optimisation of multi-core utilisation, thermal load balancing and fine-grain SVS/AVS control whilst the device is in mission mode.

Due to their experience and dedication to In-Chip Monitoring, Moortec are able to support companies who are operating at the cutting edge of AI and Machine Learning chip design. Such companies have utilised Moortec’s highly accurate, highly featured sensors within their in-chip monitoring subsystem to ensure optimal performance and enhanced reliability.
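
To make "tight dynamic guard-banding" and fine-grain AVS control a little more concrete, here is a minimal, hypothetical sketch of the kind of closed-loop control that in-chip monitoring enables while a device is in mission mode. The sensor-access functions, voltages and thresholds below are invented purely for illustration (they are not Moortec APIs), and a production scheme would also fold in process-speed and thermal data.

```c
#include <stdint.h>

/* Hypothetical fine-grain AVS loop. Instead of carrying a fixed
 * worst-case voltage guard-band, measure the point-of-load supply and
 * trim the regulator target so the measured value stays just above
 * what the silicon actually needs. All names and numbers are made up. */

extern int32_t read_supply_mV(int monitor_id);            /* voltage monitor */
extern void    set_regulator_mV(int domain, int32_t mv);  /* PMIC / LDO trim */

#define V_MIN_REQ_MV 700   /* minimum supply the logic needs at speed */
#define V_MARGIN_MV   15   /* dynamic guard-band kept above that      */
#define V_STEP_MV      5

void avs_control_step(int domain, int v_mon, int32_t *target_mv)
{
    int32_t v = read_supply_mV(v_mon);

    if (v < V_MIN_REQ_MV + V_MARGIN_MV)
        *target_mv += V_STEP_MV;        /* margin eroded: raise the target */
    else if (v > V_MIN_REQ_MV + 2 * V_MARGIN_MV)
        *target_mv -= V_STEP_MV;        /* excess guard-band: trim it back */

    set_regulator_mV(domain, *target_mv);
}
```

Thermal and process-speed readings from the same monitoring fabric would feed analogous loops for throttling and for steering work onto cooler cores.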

Richard is also a fellow blogger. You can catch his musings on the Moortec blog site Talking Sense with Moortec.

He’s got two up thus far and they are very good:

Talking Sense with Moortec – Key Applications for In Chip Monitoring…In-Die Process Speed Detection
Chip designers working on advanced nodes typically include a fabric of sensors spread across the die for a number of very specific reasons. In this, the second of a three-part blog series, Richard McPartland, Moortec's Technical Marketing Manager, continues to explore some of the key applications and benefits of these types of sensing solutions. In this instalment the focus is in-die process speed detection and why understanding in-chip process speed alongside thermal and supply conditions is essential if you want to maximise performance, optimise power, improve reliability and ultimately reduce costs in your cutting-edge design…

Talking Sense With Moortec – Key Applications For In Chip Monitoring…Thermal Sensing
The latest SoCs on advanced semiconductor nodes typically include a fabric of sensors spread across the die, and for good reason. But why, and what are the benefits? This first blog of a three-part series explores some of the key applications for in-chip thermal sensing and why embedding in-chip monitoring IP is an essential step to maximise performance, maximise reliability, minimise power, or achieve a combination of these objectives…

Moortec is one of the more collaborative companies I have worked with. They participate in most of the events I frequent and many more. It would probably be easier to list the companies they do not work with because their customer list is extensive. And when you do chip monitoring, analytics and optimization, you get first-hand experience with leading-edge design challenges. So who better to partner with?

 


CEO Interview: Jason Xing of Empyrean Software
by Daniel Payne on 04-24-2020 at 10:00 am

It’s been about seven years since Randy Smith last interviewed Jason Xing, the President/CEO of North America for Empyrean Software, so the timing felt good for a fresh update. I’ve been watching Empyrean at DAC for several years now, and have come away impressed with their growth and focus on some difficult IC design problems:

  • GPU-powered SPICE circuit simulator (ALPS-GT)
  • High capacity, parallel SPICE circuit simulator (ALPS)
  • IC Layout Analysis and chip finishing (Skipper)
  • Timing ECO (Empyrean XTop)

Q&A

How has Empyrean adapted to the pandemic? Is work continuing, just remotely?

Every year Empyrean makes a significant investment in R&D, which accounts for a big portion of its staffing. After the pandemic broke out, we quickly enabled most employees to work from home by setting up remote server access and expanding online meeting capacity with Zoom and Webex. The pandemic limited employee mobility but saved commute time. Our customer engagement teams use this extra time for deeper business planning, reviewing case studies, and catching up on product technologies.

Has Empyrean been using video technology to keep in contact with its employees and customers, or have you adopted any new apps to keep the business running and support customers?

Empyrean has used most major video technologies, such as Zoom, Webex, and XYLink. We also use WeChat for small-scale and casual meetings.

With fewer trade shows and conferences happening in 2020, how will Empyrean connect with new customers?

Relationships with some of our new target customers were already being incubated before the pandemic. During the pandemic we simply continue those business engagements through video communication and tool evaluations. We also try to reach new customers with online articles and webinars.

2019 just wrapped up, so what kind of progress did Empyrean make last year in the EDA industry?

In 2019, Empyrean successfully released a new product, the GPU-powered, high-performance, parallel SPICE simulator ALPS-GT, which achieved over 10X speedup over competitors on large post-layout simulations while maintaining high accuracy. This product has been adopted by several top-tier design houses and IP vendors.

Empyrean has also developed and refined a design flow for flat-panel display designs, which has been adopted by major FPD IDMs.

What are the current semiconductor design challenges that your company is addressing in 2020?

We’re focused on the following four design challenges:

  • Analog design verification, including traditional analog design, structured memory design, and RF designs
  • Difficult-to-debug, post-layout AMS designs at advanced nodes
  • STA signoff that is too pessimistic and cannot properly guard-band designs for very advanced processes and IoT designs
  • Library characterization that is too power- and time-consuming for advanced processes

What kind of events will Empyrean be attending this year?

Empyrean will attend DAC, TSMC Symposium and OIP.

Which customers can you talk about from 2019 that were using Empyrean tools for IC design?

NVIDIA and Xilinx.

What would a successful 2020 look like for Empyrean?

Successfully rolling out the planned technology in our disruptive products and earning customer satisfaction with our products and support.

How do you compete with the solutions from big EDA companies?

Empyrean builds a competitive edge with innovative or disruptive technologies, creating products for niche or underserved markets. Empyrean also tries to provide the best available customer support.

Did you see much competition for ALPS in the circuit simulation segment and how did you address it?

Yes, we did see competition. However, major competitors used massive RC reduction to gain simulation speed, which is not acceptable for high-accuracy post-layout simulation of designs at advanced nodes. Our ALPS/ALPS-GT circuit simulators excel at this type of long, accurate simulation, and our product team is still working hard to deliver innovations that achieve fast and accurate simulation for such designs.

Summary

I like the new products coming out of Empyrean for IC designers, and their customers include tier-one semiconductor companies, all good signs. To take the next step, just contact Empyrean in Silicon Valley, China, Japan, Korea or Singapore. Empyrean also has a new webinar coming up. Even if you cannot attend on that day, register and you will get a link to the replay:

WEBINAR: IP Integration Challenges of Complex SoC Platforms

Also read:

More CEO Interviews:

Executive Interview: Howie Bernstein of HCL

CEO Interview: Adnan Hamid of Breker Systems

CEO Interview: Cristian Amitroaie of AMIQ EDA


Using ML Acceleration Hardware for Improved DSP Performance
by Tom Simon on 04-24-2020 at 6:00 am

Some amazing hardware is being designed to accelerate AI/ML, most of which features large numbers of MAC units. Given that MAC units are like the Lego blocks of digital math, they are also useful for a number of other applications. System designers are waking up to the idea of repurposing AI accelerators for DSP functions such as FIR filter implementation. Word of this demand comes from Flex Logix, who have seen customers asking whether their nnMAX-based InferX X1 accelerator chips can be used effectively for DSP-style FIR filtering. Flex Logix's response is a resounding yes, with supporting information presented at the Spring Linley Processor Conference in March.

Flex Logix Senior VP Cheng C. Wang gave a presentation titled "DSP Acceleration using nnMAX" that shows how effective the nnMAX architecture is when applied to DSP functions. Applications such as 5G, testers, base stations, radar and imaging all need high sample rates and large numbers of taps. Some system designers are using expensive FPGAs or high-end DSP chips to get the performance they need. In fact, expensive FPGAs are often used just for their DSP units.

The nnMAX tiles used in the InferX X1 each contain 1024 configurable MACs that run at 933MHz. They support INT8x8 and INT16x8 at full throughput; BFloat16x16 and INT16x16 run at half throughput. There is also support for mixed precision. nnMAX also provides Winograd acceleration for INT8, which can boost performance by 2.25x. For AI/ML, nnMAX can be programmed via TensorFlow Lite or ONNX, with multiple models running simultaneously.

The nnMAX tiles can be arrayed to add compute capacity. In the InferX X1, each tile has 2MB of L2 SRAM. Going to a 2×2 array, or even up to 7×7, scales performance up accordingly. nnMAX clusters are assembled from the arrays, with each cluster performing a 32-tap filter. When longer filters are needed, nnMAX clusters are chained to form thousands or tens of thousands of taps.

Cheng gave several examples of possible configurations. For instance, at 1,000 megasamples per second (a 1GHz clock), an nnMAX cluster gives 16 taps, an nnMAX 1K tile gives 256 taps and a 2×2 nnMAX array gives 1024 taps. So, what does all this translate to in terms of FIR operation?
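
To make the tap arithmetic concrete, here is a plain-C sketch of a direct-form FIR filter (a generic textbook kernel, not Flex Logix code). Every tap costs one multiply-accumulate per output sample, so an N-tap filter at S samples per second needs N×S MACs per second, which is exactly the work being spread across nnMAX clusters and arrays.

```c
#include <stddef.h>
#include <stdio.h>

/* Direct-form FIR: y[n] = sum over k of h[k] * x[n-k].
 * Each coefficient h[k] costs one multiply-accumulate (MAC) per output
 * sample, so an N-tap filter at S samples/s needs N*S MACs per second. */
static void fir_filter(const float *x, float *y, size_t len,
                       const float *h, size_t taps)
{
    for (size_t n = 0; n < len; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps && k <= n; k++)
            acc += h[k] * x[n - k];          /* one MAC per tap */
        y[n] = acc;
    }
}

int main(void)
{
    /* Illustrative 4-tap moving-average filter. */
    const float h[4] = { 0.25f, 0.25f, 0.25f, 0.25f };
    const float x[8] = { 1, 2, 3, 4, 4, 3, 2, 1 };
    float y[8];

    fir_filter(x, y, 8, h, 4);
    for (int n = 0; n < 8; n++)
        printf("y[%d] = %.2f\n", n, y[n]);
    return 0;
}
```

Chaining clusters for longer filters amounts to partitioning that inner loop: each cluster accumulates a partial sum over its slice of the coefficients, and the partial sums are cascaded into the final output.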

In one of the slides from the Linley presentation, Cheng compares nnMAX to a Xilinx Virtex UltraScale. The comparison shows that an nnMAX 2×2 array can run a 1,000-tap FIR at the same rate as a 21-tap FIR on the UltraScale. Considering that the UltraScale is hundreds of mm² and hundreds of dollars, while the nnMAX 2×2 array with 8MB of SRAM is just 26 mm², these are impressive results. Cheng also provides an eye-opening comparison between the CEVA XC16 and nnMAX. There is a link below to the full presentation with all the comparison numbers.

Cheng pointed out that the FIR application was the first one they tackled with the nnMAX. They already have a major customer for this and are working on improved usability by taking MATLAB output and mapping it onto the nnMAX. There will also be a technology port for the nnMAX, adding GF12LLP and TSMC N7/N6 as new nodes. Their next target application will be fast FFTs.

So, it seems that there is collateral benefit from the development of AI accelerators. Of course, even just for AI/ML, the nnMAX technology offers very high performance and performance per dollar. The full slide deck for the Linley presentation is available on the Flex Logix website. It offers more detail than can be provided here. I suggest taking a look at it if you are looking for AI/ML or DSP acceleration.


Tracing Technology’s Evolution with Patents
by Arabinda Das on 04-23-2020 at 10:00 am

Figure 1

We live in an age of abundant information. There is a tremendous exchange of ideas crisscrossing the world, enabling new, innovative types of products to pop up daily. In this era there is therefore a greater need for competitive intelligence. Companies today are interested in what their competitors are brewing in their R&D labs and in predicting what novel applications are coming to market, so as to determine the best possible plan of action in response. Moreover, new players with radically innovative ideas are emerging rapidly, as can be partly deduced from the massive shift in patent filings in the past years. For example, in 2000 the three countries that filed the most patents were the US, Japan and Germany. But since 2019, China has been the largest filer of patents with the World Intellectual Property Organization (WIPO), surpassing the USA, Japan, and Germany. South Korea has also emerged as a top-five patent producer [1]. Companies around the world are looking for a synthesis of information from this data deluge. They rely on industry experts to provide the technological know-how, but also on patent engineers or analysts to analyze the intellectual property (IP) of a particular company and/or a whole industry. Their aim is to understand the activities of the main players as well as the fields in which they dominate. Creating such a detailed patent landscape is time-consuming and complex; however, the end result can provide deep insights into the technology and the market.

I have come across several thorough patent landscapes that predicted emerging technologies quite accurately. However, I have found mixed results for semiconductor roadmaps, especially those related to advanced logic devices. Specifically, some of the major breakthrough concepts in advanced logic devices were not predicted in time by market analysts or industry experts. The most striking example is the introduction of the finFET device (a tri-gate transistor where the gate wraps around the silicon fin for better control of the channel) by Intel in 2012 for its i5-3550 processor, which came as a complete surprise to the industry.

The story gets even more interesting after the introduction of finFET devices. Very quickly there were multiple reports that finFET devices would not be extendable beyond the 10 nm node. Solutions were proposed in public forums such as IEEE papers and the IEDM and VLSI conferences. Needless to say, prior to the publication of each proposed solution in the public literature, multiple related patents were filed by all the major device manufacturers. All the patents and non-patent literature could be grouped into two categories: new materials or new device architectures. They either discussed new materials with existing technologies or suggested radical solutions where new device architectures were fabricated with new materials. For example, some of the serious propositions with prototype data were the following device structures: the ultra-thin-body (UTB) field-effect transistor (FET) based on silicon-on-insulator (SOI), gate-all-around (GAA) devices involving nano-wires/nano-sheets stacked horizontally or vertically, the tunneling FET (TFET), and the stacked FET. Meanwhile, the materials side mainly focused on silicon-germanium (SiGe) replacing the silicon (Si) channel for PMOS, or on using III-V compounds. However, today we are at the 7 nm node and slowly transitioning to the 5 nm node, still moving forward with the original finFET configuration.

I wondered why these predictions were inaccurate and came to the following conclusions. Firstly, all these suggested devices, in spite of their strengths, had some serious concerns too.

The ultra-thin-body (UTB) architecture offered the possibility of back biasing and also had low power consumption, but the initial wafer cost was high at the time. UTB itself is not used now, yet SOI-based technology is widely prevalent in the market, despite not being used in high-speed processors.

Similarly, the GAA concepts provided better electrostatic control of the channel but required two materials that could be deposited on top of each other, each having a very different etch selectivity for the same etching chemistry. The burden on deposition and etching was high, which made the overall process flow very expensive. Vertical GAA FET devices, which required a major integration change because the wire-shaped channel regions are perpendicular to the substrate (implying that the source and drain regions are not on the same plane), were especially hindered by their requirements. This implied additional deposition and etching steps, which would make the manufacturing of advanced logic devices even more expensive.

Regarding the TFET, there was the promise of beating the roughly 60 mV/dec sub-threshold slope limit, which could open new applications for low-power computing. However, band-to-band-tunneling-based TFET devices unfortunately lacked a robust drive current.

Next, consider stacked FET devices. This idea had been floating around in technical forums for a long time. In this concept, transistors are stacked one on top of another: either the transistors are made in separate wafers and bonded, or they are fabricated directly on top of the lower layer of transistors. This requires good bonding techniques or careful control of the thermal budget for the top devices. Additionally, controlling the implant process could be difficult in the stacked layer. Back in 2012, these solutions were not ready.

What about SiGe replacing Si? Most of the patents filed and literature published highlighted two possible scenarios, both of which involved integration methods after fin formation. One requires growing SiGe on the sidewalls of the fin, while the other recesses the fins between the isolation structures and grows SiGe on top of the fin (see figure 1). Both methods required at least additional mask sets and numerous process steps, which suggested that the end result would be expensive.

If you look at the track record of semiconductor manufacturers, it becomes evident why none of these concepts ever made it into the mainstream. The continuous miniaturization, or scaling, of devices has maintained the transistor count trend in accordance with Moore's law even today [2]. Scaling is essentially the shrinkage of all the dimensions of the metal-oxide-semiconductor field-effect transistor (MOSFET). Every time semiconductor manufacturers were faced with process challenges or design difficulties due to scaling, they analyzed the smallest change that could be made to the integration scheme in order to keep using the existing tool set and process flows at the new technology node. They also had to consider whether new processes to be introduced could be extended to future nodes. The strategy is that in every technology node, when some new process-integration step is introduced, the majority of the other process steps are kept unaltered. The direct result of this strategy is that with each coming generation the process flow becomes more stable and reliable.

This strategy of minimum change for every new generation is well exemplified by Intel's processors. Intel's 22 nm node had the 5th generation of strained-silicon engineering, with raised source-drains having embedded graded SiGe for the PMOS channel and embedded Si for NMOS. Similarly, for channel and gate engineering, high-k with replacement metal gates was introduced at the 45 nm node, further improved at the 32 nm node and finally implemented in the 22 nm finFET structure. Intel has maintained the same finFET architecture down to 10 nm, yet device performance has improved and the number of transistors per unit area has increased. TSMC's record is equally impressive: TSMC introduced its finFET device at the 16 nm node in the iPhone 7 processor in 2016, and has since produced three new generations of finFET devices. According to its press release, it will continue to use finFET devices at 5 nm [3].

Needless to say, the devil is in the details; detailed structural analyses are needed to understand the process evolution. Even though the finFET configuration has remained the workhorse since 2012, the evolution of the integration process flow and the design layout is impressive. In a broad sense, the biggest changes and new process steps in advanced logic nodes take place near the gate structure, especially in the lowest interconnect levels closest to the gate. A glimpse of the process sophistication can be gained from an old Intel presentation, along with Dick James' comments on Intel's 10 nm process, which include cross-sections and detailed explanations of the changes in contact formation [4]. That material highlights how, by changing the layout and the integration scheme, the standard cell could be shrunk, increasing the number of transistors per unit area. A detailed survey of finFET process technology from 14 nm to 10 nm is collected in a presentation from Siliconics [5]. This presentation is full of cross-sections and detailed explanations, and is quite a treasure trove. It elaborates on some of the major innovations that have been introduced in finFET devices. For example, it discusses fin geometries and pitches, the work-function metal layers of NMOS and PMOS transistors, the solid-source diffusion punch-through stop and its role, the introduction of novel materials in the lower interconnect structure, the structure of dummy gates at the fin ends, post-patterning fin removal, the coming of super vias that connect directly from metal 1 to the gate without the need for an intermediate metal 0 layer, the implementation of multi-stage contacts to the source-drain regions, the introduction of quadruple patterning for the front end, and air gaps in the back end of line. Figure 2, taken from this presentation, shows a variety of contacts, which is only one of the novelties in finFET devices. And of course each of these process steps is backed by a family of patents. This illustrates the point that massive innovations were implemented on the same finFET device configuration.

Predicting near-future technologies for semiconductor devices would require looking for patents that make incremental changes yet affect the cell area or the layout of the interconnect structure closest to the gate. Such patents would enable further miniaturization without much disruption while maintaining the integration flow, thus keeping manufacturing costs low. Modern technology will accelerate the use of patents to predict the near-future technologies of semiconductor devices more effectively. Related ideas are already being tried with the help of deep learning, as in the case of Google, which announced that it is experimenting with artificial intelligence to make more efficient chips: not by looking for radical changes in device structures, but by optimizing what is available [6]. Semiconductor technology has never stopped innovating and will not stop surprising us, and a thorough understanding of current process steps and their corresponding patents could be key to predicting what is still to come.

The ideas expressed in this article are solely the opinion of the author and do not represent the author’s employer or any other organization with which the author may be affiliated.

References

1/ https://twitter.com/WIPO/status/1247498105135566848

2/ https://www.semiconductor-digest.com/2020/03/10/transistor-count-trends-continue-to-track-with-moores-law/

3/ https://www.tsmc.com/english/dedicatedFoundry/technology/5nm.htm

4/ https://newsroom.intel.com/newsroom/wp-content/uploads/sites/11/2017/09/10-nm-icf-fact-sheet.pdf

https://sst.semiconductor-digest.com/chipworks_real_chips_blog/2017/04/10/intel-unveils-more-10nm-details/

5/ https://nccavs-usergroups.avs.org/wp-content/uploads/JTG2018/JTG718-4-James-Siliconics.pdf

6/ https://www.zdnet.com/article/google-experiments-with-ai-to-design-its-in-house-computer-chips/


Wi-Fi Bulks Up
by Bernard Murphy on 04-23-2020 at 6:00 am

Wireless discussion these days seems to be dominated by 5G, but that’s not the only standard that’s attracting attention. The FCC just circulated draft rules to dramatically expand bandwidth available to Wi-Fi in the new Wi-Fi 6e standard.

Is this a tragic plea for attention from a once-important standard, now eclipsed by its cellular big brother? Not at all. According to Cisco, 60% of mobile traffic will be offloaded to Wi-Fi by 2022. That’s both an opportunity and a challenge. The challenge comes in bandwidth.

Vanilla Wi-Fi at 2.4GHz is competing with other standards like Bluetooth and ZigBee (and garage-door openers). It's become a crowded space. In 2014, Wi-Fi 5 (then more cryptically known as 802.11ac) appeared, offering more bandwidth at 5GHz – that other option on your home router. But that band is also getting crowded because we're pushing more through our internet connections (e.g. streaming video). Wi-Fi 6 added to the noise in 2019.

Where's all that mobile traffic going to go? Onto a new band at 6GHz with 1200MHz of bandwidth, that's where. This will be called Wi-Fi 6e and should fix an important limitation in Wi-Fi 6 – practically attainable bandwidth. I should add here that 6e is not a new standard, it is simply an expansion of the frequency range.

In principle the Wi-Fi 6 standard offers channel bandwidths up to 160MHz; in practice that high-end option has never been very useful thanks to interference from other traffic. So when you're wandering around an airport (remember those days?), Wi-Fi access points are more likely configured to one of five 80MHz channels. There's no loss in net available bandwidth, but you don't have access to fat 160MHz pipes.

Wi-Fi 6e will have 1200MHz of total bandwidth, offering up to seven 160MHz channels or fourteen 80MHz channels. In total, this increases the bandwidth available to Wi-Fi by almost a factor of 5.

One place this added capacity will be appreciated is in wireless VR/AR headsets. Today those run on an earlier Wi-Fi standard called WiGig, starting at 60GHz. Loads of bandwidth and very low latency, but extremely power hungry, and it must be built in specialized processes, also making it expensive. 6e won't be able to offer the same extreme performance but is viewed as still good enough, at much lower power, and it can be built in mass-market mobile processes, i.e. much more cheaply.

Wi-Fi 6e will have a somewhat shorter range than Wi-Fi 6 and power will be a little higher. For access points (APs) in airports, stadiums and our homes, that shouldn't be a big problem. We will need some more APs to provide decent coverage, and our phone charge will run down a little more quickly, but not very noticeably.

Wireless chips are already being built by Qualcomm and Broadcom, though certification hasn't yet been announced. I've seen that Intel is also planning to release a chip later in the year. So I'm guessing Wi-Fi 6e-enabled products aren't likely to appear before late 2020 or early 2021.

Wireless IP to take advantage of Wi-Fi 6e will be essential for those who want to jump on this bandwagon. CEVA tells me that this will only require a simple upgrade to its RivieraWaves 802.11ax IP (you'll need some work in the RF stage as well). The IP will be released as soon as the standard is ratified. You can learn more about RivieraWaves Wi-Fi HERE.

Also Read:

5G Infrastructure Opens Up

Using IMUS and SENSOR FUSION to Effectively Navigate Consumer Robotics

A Bundle of Goodies in Bluetooth 5.2, LE Audio


ASML A Scenario More Lumpy While Demand and Tech Remain Solid Despite Covid Delays
by Robert Maire on 04-22-2020 at 2:00 pm

Covid issues create “lumpy” quarters due to delays
Orders & demand remain solid and strong
2020 Year financials intact so far but ignore Qtrs
Taking prudent actions- no buybacks or guidance

As expected, Covid impacts both shipments & supply chain, ignore the near term lumpiness…
ASML reported revenues of Euro 2.4B and EPS of Euro 0.93 per share, obviously well short of prior expectations set before Covid19.

The results were impacted by the loss of Euro 200M of DUV sales and Euro 500M of EUV sales, primarily due to Covid19-related delays and interruptions. There were issues both with the supply chain needed to build tools and with getting tools delivered. It seems the part in the middle, manufacturing, was spared significant impact.

In our view investors have to take a deep breath and ignore near term and quarterly results as large lumps of revenue will flow in and out of quarters while Covid19 issues still exist.

We are much better off focusing on order rates and yearly goals of tool shipments and ramping of new technology than counting revenue in a specific quarter.

Order intake was a very solid Euro 3B and the company maintains its goal of shipping 35 EUV systems in 2020. Plans for High NA have not changed, so we view the overall long-term story as very much intact.

Business will remain lumpy given EUV pricing…
When tools cost over Euro 100M each, a couple of tools slipping into or out of a quarter can make things look exceedingly good or bad. As EUV becomes a bigger percentage of the business over time, this lumpiness will continue. This time the shifting of revenues was caused by a pandemic; next time it could be some other global or trade-related event that is just as unforeseen as Covid19 was three months ago.

From a short term investor perspective we think it would be natural to take advantage of this variability by buying into light quarters and selling into inflated quarters.

Obviously, long term investors just have to sit back and focus on order intake, backlog, annual goals and shipments and technology progress.

It would be incorrect to assume that, just because there is a long lead time and a large backlog, ASML can manage tool delivery or build times during such global disturbances.

Taking prudent steps…
As we have heard from a number of other companies, ASML is stopping share buybacks and managing cash more conservatively by slowing expenses and hiring. This is nothing more than correct behavior and not an indication of expectations.

It would also be prudent at this time to use cash to help manage the supply chain with existing and new suppliers to maintain a flow of materials, which ASML is doing.

Q2 up 50%????
Even though the company is not giving “official” guidance they are softly guiding to sales recognition being up 50% in Q2 as systems get recognized and impacts of some delays get worked out.

Obviously, just as we would ignore the Q1 weakness, we would also ignore a 50% rise in Q2, since it simply reflects tools pushed out of Q1 into Q2.

If we look at 35 EUV systems in 2020, we still view that as a perfectly “doable” number and we should just pay attention to progress on that.

Supply chain is the biggest issue…
Top of the list of things to be worried about is the supply chain. While ASML can manage its own manufacturing, the supply of critical components from outside suppliers is less under its control, especially those parts that originate from all over the globe. Given the very specialized and advanced nature of the tools, many components are single sourced; lenses are obviously the most critical example. Many components, such as lenses, have long lead times and "safety stock", which helps buffer near-term disruptions, but it is unclear how long the Covid impact will last and what permanent impact it could have on the supply chain. A longer-term impact could use up the "safety stock" or WIP buffer.

Long term demand not impacted…yet…
Second on our list of concerns is longer term demand. As we have pointed out in recent notes, we remain concerned about the demand for semiconductors in general and more specifically memory as the economic impact of Covid19 trickles down to consumer demand for electronics.

So far the company stated, and we agree, that customers have not canceled or pushed out orders, as the lead times for tools are so long that no one wants to get out of line and risk future delays to a technology roadmap; litho tools are the critical gating item in increasing fab capacity or pushing Moore's Law forward.

Right now TSMC, Samsung and Intel remain in a race to move forward with Moore’s Law as quickly as possible. If anything this race has heated up as AMD is a bigger threat to Intel and Samsung wants to get business back from TSMC, while TSMC wants to maintain its lead.

Memory feels like it's coming back, but it is also most at risk of seeing orders slow or push out, so we will try to keep a sharper eye on the memory outlook. It would likely take a couple of quarters of Covid-induced economic slowdown to trickle down to chip makers, with memory fabs the first to be impacted.

At this point we are in a bit of a wait-and-see mode on future demand, but in the meantime business remains solid and plans are on track.


Accelerating Edge Inference with Flex Logix’s InferX X1
by Mike Gianfagna on 04-22-2020 at 10:00 am

For a long time, memories were the primary technology driver for process development. If you built memories, you got access to cutting-edge process information. If you built other products, this could give you a competitive edge. In many cases, FPGAs are replacing memories as the driver for advanced processes. The technology access benefits still apply and at least one company, Flex Logix, is reaping those benefits.

Flex Logix has been known for their embedded FPGA technology, providing the best of both custom logic and programmable logic in one chip. On April 9, the company disclosed real-world AI inference benchmarks for its InferX X1 product. This new product contains both custom logic and embedded programmable logic.

An eye-popping overview of the benchmark results was presented at the Spring Linley Processor Conference on the same day. This popular conference was conducted as a virtual event.  I got a chance to attend many of the sessions and I can say that The Linley Group did a great job capturing their live event in a virtual setting, delivering both high-quality presentations and providing informal access to the presenters. Expect to see more events like this.

The presentation was given by Vinay Mehta, AI inference technical marketing manager at Flex Logix. Prior to joining Flex Logix, Vinay spent two years at Lyft designing next generation hardware for Lyft’s self-driving systems. His activities included demonstration of quantization and hardware acceleration of neural networks and evaluation of edge and data center inference accelerator hardware and software. Vinay is a very credible speaker on AI topics.

Vinay's presentation covered an overview of edge computing, customer requirements, characterizing workloads, a discussion of throughput vs. latency for streaming, benchmark details and convolution memory access pattern strategies. Here are some of the highlights of Vinay's talk…

The InferX X1 is completing final design checks and will tape out soon (TSMC 16FFC). It contains 4,000 MACs interconnected with Flex Logix’s EFLX eFPGA technology. This flexible interconnect helps the product achieve high utilization. Total power is 13.5 watts max, with typical power consumption substantially lower. Chip samples and a PCIe evaluation card are expected in Q3 2020. The part has flexibility built-in to support low-latency operation for both complex and simpler models at the edge.

Vinay covered many aspects of customer benchmarks for the InferX X1. To begin with, power and size stack up as shown in the figure below. The Flex Logix part appears to be lower power and less expensive (thanks to the small die size).

Regarding performance benchmarking, Vinay spent some time reviewing the various benchmarks (e.g., MobileNet, ResNet-50, Inception v4, YOLOv3). He also explained that many benchmarks assume perfectly ordered data, which often is not the case in real-world workloads. Putting this all together to examine benchmark performance for latency-sensitive edge applications yields the figure below.  Note these results focus on latency without regard to power and cost.

Vinay pointed out that the view above isn’t holistic in the sense that customers will be interested in the combination of performance, power and cost. Looking at the benchmark data through the lens of throughput relative to die size, which is a proxy for cost, you get the figure below. The InferX X1 has a clear advantage thanks to its small size and efficient utilization of resources.

Vinay then spent some time explaining how various convolutional neural network (CNN) algorithms are mapped to the InferX X1 architecture. The ability to “re-wire” the part based on the memory access patterns of the particular convolutional kernel or series of convolutional kernels being implemented is a key reason for the results portrayed in the figure above. Flex Logix’s embedded FPGA technology provides this critical level of differentiation, as it allows for more complicated operations (such as 3D convolutions) to map efficiently to its unique 1D systolic architecture.
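
For readers who want to picture what "mapping a convolution onto MACs" means, here is a generic, naive 2D convolution in C (a textbook kernel, not the InferX X1 dataflow). The nested loops are nothing but multiply-accumulates over a sliding window; an accelerator earns its keep by wiring the datapath so the window streams through the MAC array according to the kernel's memory access pattern, instead of re-reading memory for every output.

```c
#include <stddef.h>

/* Naive single-channel 2D convolution ("valid" padding).
 * in  : H x W input feature map, row-major
 * k   : KH x KW kernel
 * out : (H-KH+1) x (W-KW+1) output feature map
 * Every iteration of the two inner loops is one MAC; an inference
 * accelerator keeps those MACs busy by streaming the sliding window
 * through the array rather than fetching each operand from memory. */
static void conv2d_valid(const float *in, int H, int W,
                         const float *k, int KH, int KW,
                         float *out)
{
    const int OH = H - KH + 1;
    const int OW = W - KW + 1;

    for (int oy = 0; oy < OH; oy++) {
        for (int ox = 0; ox < OW; ox++) {
            float acc = 0.0f;
            for (int ky = 0; ky < KH; ky++)
                for (int kx = 0; kx < KW; kx++)
                    acc += k[ky * KW + kx] *
                           in[(oy + ky) * W + (ox + kx)];   /* MAC */
            out[oy * OW + ox] = acc;
        }
    }
}
```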

Vinay’s talk covered many more aspects of real-time inference requirements at the edge. There was also a very useful Q&A session. If you weren’t able to attend his presentation at the Linley Conference, there is a replay available. I highly recommend you catch that. You can register to access the replay here. The presentation is on day four, April 9, 2020 at 11:10 AM.


How does TensorFlow Lite on Tensilica HiFi DSP IP Sound?
by Tom Simon on 04-22-2020 at 6:00 am

In all the hubbub about AI/ML, it's easy to see why visual ML gets more attention: it has obvious appeal because of applications such as autonomous driving. Because of this, it's easy to overlook the importance of audio ML. I own a Tesla, and putting it into Autopilot is very cool, but even it has voice recognition built in as an important feature to reduce driver distraction. At home I have numerous Google Minis that we use every day for controlling our heating, lights, etc. When all is said and done, I am sure I use our voice ML appliances more often than Tesla Autopilot.

Audio ML is most useful when it runs on the edge, not in the cloud. This enhances security because the audio stream does not need to travel to the cloud. Local ML processing also improves latency and lowers network loading. With this in mind, Cadence just announced support for TensorFlow on their Tensilica HiFi DSPs. Along with the announcement on their website, Cadence also gave a presentation, titled “Efficient Machine Learning on HiFi DSPs Using TensorFlow”, at the Linley Processor Conference in March. This year, of course, the conference was held online due to Covid-19.

Yipeng Liu, Technical Marketing Director at Cadence for Tensilica IP, started off the talk with a discussion of how audio ML was recently found useful in China to help prevent the spread of the Covid-19 virus. Once elevator buttons were found to be a transmission risk, a team of engineers rapidly developed a voice activated system that worked in the acoustically difficult elevator environment to let riders control the elevator.

Embedded edge devices present several constraints for deploying ML, including small memories and limited processing capability, both of which make it hard to code effective solutions in a timely manner. There are optimization methods that can be employed; however, they make the process more manual and ad hoc. Edge-based hardware usually has fixed-point math units and also requires C/C++ rather than Python code.
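
To see why fixed-point math units matter here, consider the arithmetic inside an int8-quantized layer. The sketch below is a generic example of TensorFlow-Lite-style affine quantization (real value = scale × (q − zero_point)), not code from the HiFi NN library: the dot product accumulates in 32 bits and the result is rescaled back into int8 range.

```c
#include <stdint.h>
#include <math.h>

/* Generic int8 affine quantization: real = scale * (q - zero_point).
 * The dot product is accumulated in 32 bits and then "requantized" back
 * to int8. Production kernels do the rescale with integer fixed-point
 * multipliers and SIMD instructions; float is used here for clarity. */
static int8_t quantized_dot(const int8_t *a, const int8_t *b, int n,
                            int32_t a_zp, int32_t b_zp,
                            float a_scale, float b_scale,
                            float out_scale, int32_t out_zp)
{
    int32_t acc = 0;                                   /* wide accumulator */
    for (int i = 0; i < n; i++)
        acc += (int32_t)(a[i] - a_zp) * (int32_t)(b[i] - b_zp);

    float   real = a_scale * b_scale * (float)acc;     /* back to real units */
    int32_t q    = (int32_t)lrintf(real / out_scale) + out_zp;
    if (q >  127) q =  127;                            /* saturate to int8   */
    if (q < -128) q = -128;
    return (int8_t)q;
}
```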

Cadence is offering TensorFlow Lite for Microcontrollers (TFLM) targeted at its Tensilica HiFi DSP IP, which effectively deals with these limitations. Included with TFLM are the HiFi Neural Network library (NN lib), the HiFi NatureDSP library (NDSP lib), and 8/16/32-bit SIMD and VFPU support. The libraries are optimized for all Tensilica HiFi DSPs and are also framework agnostic.

Yipeng also talked about the Tensilica XAF middleware that allows for faster system integration. It handles memory allocation and management. It also has the ability to install and uninstall components to save memory. Yipeng said that XAF middleware simplifies integration of ML and also traditional audio components from multiple providers.

We live in a world where audio information plays a huge role. Even though image processing gets the limelight, audio processing enables many important tasks. These range from voice processing, to song recognition, and a host of other important tasks. Now with the ability to easily add ML to devices at the edge, the applications and uses for this technology will expand further. Cadence is now providing a way to develop and deploy these applications with much less effort with the widely used TensorFlow platform. Full information about Cadence TensorFlow Lite for Microcontrollers can be found on the Cadence website.

Also Read:

Ultra-Short Reach PHY IP Optimized for Advanced Packaging Technology

Cadence Dives Deeper at Linley Fall Processor Conference

Leveraging Virtual Platforms to Shift-Left Software Development and System Verification


The Quiet Giant in Verification IP and More
by Mike Gianfagna on 04-21-2020 at 10:00 am

In the technology industry, we’re all used to the hype about the latest and greatest. Semiconductor IP participates in the over-drive news cycle from time to time as well. So, when I see a company that has real, solid credentials but has resisted the temptation to over-hype, it gets my attention. I had an experience like this recently relative to SmartDV.

The company is a new sponsor for SemiWiki, and I spent some time recently speaking with Barry Lazow, their vice president of worldwide sales and marketing. Barry has been doing high technology sales work for quite a while, all the way back to VLSI Technology, arguably one of the true pioneering companies in semiconductor. I took the opportunity to probe Barry about the story behind SmartDV.

What I found was, in a word, breathtaking. First of all, the company is self-funded. No VC commitments, no need to waver from their core focus, which is stated as "being #1 in verification and design IP," a lofty goal. Barry describes the company as a family business, with the founding team of chip design and verification experts still in place after 12 years. The company's development team of over 250 people is located in Bangalore, India.

They maintain a three-shift operation there to ensure worldwide coverage for customer support, which they earn high marks for from their customers. Doing centralized support this way can be challenging and it seems SmartDV has figured out how to do it right. Speaking of customers, Barry explained that more than 100 companies worldwide rely on SmartDV’s products, including seven of the top ten worldwide semiconductor companies.

So, what does SmartDV offer? Their focus is on verification IP (VIP) and design IP, with over 400 titles in their portfolio. The figure below drives home the point about being #1.

In the VIP area, the SmartDV portfolio is quite robust. It includes models for simulation, emulation (synthesizable transactors (SimXL) for accelerating emulation), assertion-based VIP (formal verification), FPGA prototyping and verification (with a supplied Linux Perl driver), post-silicon verification and a visual debugger called SmartViP debug for rapid analysis of protocol issues. Probing a bit more, SmartDV's VIP has native support for UVM, SystemVerilog, VMM, OVM, Vera, Verilog and SystemC/TLM. Quite a list. Each VIP also includes a compliance test suite and a functional coverage model, which is not always the case with VIP.

Barry pointed out another key attribute of this VIP, the ability to seamlessly transition from simulation to emulation. I have plenty of stories of that transition taking many months. I recall the term “design for emulation”. You probably have some of your own stories. An easy transition to emulation is a big deal. All the major emulators are supported of course. Another statistic Barry shared was from his customers, who report compile times for SmartDV VIP being 2-3X faster than the competition. Impressive.

In the design IP area, SmartDV offers synthesizable RTL in Verilog or VHDL to cover popular interfaces such as MIPI, AMBA, PCI, CAN, RapidIO and so on. After our tour of IP products, just when I thought I had heard all the juicy stuff, Barry gave me one more tidbit – perhaps the key secret of SmartDV’s success. SmartDV’s products are typically sold as soft, compilable IP. Nothing really new there. What is new is that ALL of their products are built with a proprietary compiler technology which utilizes a proprietary language that drives the process.

This has some significant implications. Let’s start with the myth of standard, off-the-shelf IP. We all know that does happen sometimes, but often there are tweaks needed to use an IP block effectively. That usually translates into additional manpower at the vendor to implement the tweaks and associated delivery delays. This is not the case at SmartDV. Thanks to their proprietary compiler, the company can implement modifications in days. This is one of the reasons for their high marks in customer support. Barry explained that an entirely new piece of VIP can typically be done in 4-6 weeks, but typical customization is done in 1-2 weeks. This allows the company to be first to market supporting virtually all new and emerging protocols, as depicted in the figure below. As an example, SmartDV was first to market with VIP to support TileLink, the RISC-V interconnect fabric.

SmartDV also provides verification services – they are a one-stop shop for a huge part of a project's verification needs. You can learn more about this quiet but potent company at the SmartDV website. If you have verification or design IP needs for your next project, I strongly recommend you start there.

Also Read:

SemiWiki and SmartDV on Verification IP

Secret Sauce of SmartDV and its CEO’s Vision

SmartDV at DAC and More


That Last Level Cache is Pretty Important
by Bernard Murphy on 04-21-2020 at 6:00 am

Last-level cache seemed to me like one of those "yeah, I get it" but sort of obscure technical corners that only uber-geek cache specialists would care about. Then I stumbled on an AnandTech review of the iPhone 11 Pro and Pro Max and started to understand that this contributes to more than just engineering satisfaction.

Caching

A brief refresh on caching. This is a method used in processors and in more general SoCs to minimize the need to go off-chip to read or write memory. Off-chip accesses are slow and burn a lot of power. If you already have the data around in on-chip memory – a cache – you should read or write that memory instead. This is a frequency-of-access play. You can’t get rid of the off-chip memory, but you can reduce the number of times you have to go there, speeding up net performance and reducing net power consumption.

It’s common to have a hierarchy of caches. These start with small caches really close to a CPU or IP for very fast access, then bigger caches a bit further away (but still on chip) to serve a bigger demand though not quite as fast.

The last level provides caching for the whole chip, bigger still, a little slower still and the last line of defense before you have to go to main memory.
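
For the mechanics-minded, here is a toy model of how a cache decides whether it already holds a piece of data. A real last-level cache is set-associative, with replacement policies, coherence and partitioning layered on top, so treat this direct-mapped sketch purely as a conceptual illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy direct-mapped cache: 256 lines of 64 bytes (16 KB total).
 * An address splits into [ tag | index | offset ]: the offset picks a
 * byte within the 64-byte line, the index picks which line to check,
 * and the stored tag records which block of memory lives there. */
#define LINE_BYTES 64
#define NUM_LINES  256

typedef struct {
    bool     valid[NUM_LINES];
    uint64_t tag[NUM_LINES];
} toy_cache_t;

/* Returns true on a hit; on a miss the line is (re)filled. */
static bool cache_access(toy_cache_t *c, uint64_t addr)
{
    uint64_t line  = addr / LINE_BYTES;   /* drop the offset bits        */
    uint64_t index = line % NUM_LINES;    /* which cache line to check   */
    uint64_t tag   = line / NUM_LINES;    /* identifies the memory block */

    if (c->valid[index] && c->tag[index] == tag)
        return true;                      /* hit: no off-chip access     */

    c->valid[index] = true;               /* miss: fetch and fill line   */
    c->tag[index]   = tag;
    return false;
}
```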

In a modern SoC with CPUs and GPUs and AI accelerators and who knows what else, this cache has to serve a lot of different masters. That matters because how effectively a cache does its job is very dependent on the application, which is extra tricky when the last-level cache has to serve a bunch of different applications at once.

Pointer-chasing

There are a number of pointer-based programming data structures, especially popular in data analytics and machine learning kernels, which can particularly stress caching. These include linked lists, trees and graphs. In all these cases, to trace through a structure you have to chase through pointers to work through a list, or a path through a tree or graph. Because this is pointer-based data, there's no guarantee it will all fall close together in memory, which makes it harder to get all of it into a cache limited to holding some fixed number of memory blocks. Chasing through those pointers is most likely to stress the ability of a cache to keep the relevant data local without needing to go out to main memory.
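
A common way to expose this effect (and roughly the shape of the latency-versus-depth tests discussed below) is a pointer-chase microbenchmark: build a randomly ordered cycle of indices whose footprint you can vary, walk it, and divide the elapsed time by the number of hops. Once the footprint outgrows a cache level, the nanoseconds per hop step up toward the next level's latency. This is a generic sketch, not AnandTech's actual test harness.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Walk a random cycle of indices; every load depends on the previous
 * one, so the hardware cannot prefetch and each hop pays the latency
 * of whichever memory level currently holds the data. */
static double chase_ns_per_hop(size_t count, size_t hops)
{
    size_t *next = malloc(count * sizeof *next);
    if (!next) return -1.0;

    /* Build a random single cycle over all elements (Sattolo's algorithm). */
    for (size_t i = 0; i < count; i++) next[i] = i;
    for (size_t i = count - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    struct timespec t0, t1;
    volatile size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t h = 0; h < hops; h++)
        p = next[p];                           /* dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(next);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)hops;
}

int main(void)
{
    /* Sweep the working set from ~32 KB to ~32 MB; watch ns/hop step up
     * as the footprint falls out of each cache level. */
    for (size_t kb = 32; kb <= 32 * 1024; kb *= 2) {
        size_t count = kb * 1024 / sizeof(size_t);
        printf("%6zu KB  %.2f ns/hop\n", kb, chase_ns_per_hop(count, 20000000));
    }
    return 0;
}
```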

The iPhone 11 review (more exactly, the Apple A13 SoC review)

So AnandTech ran a pretty detailed set of pointer-chasing tests on the A13 and compared these with results for the A12 (and, in some graphs, various other phones). In most of the charts they compare latency (speed of access) versus test depth (how many pointers are chased).

They show a latency versus test depth curve for the A12 and A13, with identical or improved latencies in the first- and second-level caches but consistently better latency in the last-level cache (which they call the system level cache – SLC). In the upper right of the graph, the A13 is slower than the A12, but that's for off-chip memory accesses; SLC accesses sit one level below and to the left of those.

AnandTech concludes that Apple is very effectively using that last-level cache to get better overall system performance. And they note later that the A13 SLC maintains bandwidth versus test depth much better than the A12 does. All of this adds up to better system performance across many applications. If it's that important to Apple, I'd have to guess it should be just as important to everyone else.

Arteris IP CodaCache

Caching and interconnect are very tightly tied together. When multiple IPs each have their own local caches, those must be kept coherent. When an attempt to find a piece of memory in a local cache fails (a cache miss), the request must be passed on to the next level, and so on; if the last-level cache can't help either, the request has to go out to main memory. A big drag on latency and power in those cases.

Arteris has built a general-purpose, highly configurable last-level cache, CodaCache, for such applications. They've had this for a little while. What's new about this implementation is that it now meets ISO 26262 functional safety compliance, in line with Arteris IP's general direction in safety. You can learn more about CodaCache HERE.

Also Read:

Trends in AI and Safety for Cars

Autonomous Driving Still Terra Incognita

Evolving Landscape of Self-Driving Safety Standards