
Can/Will NHTSA Rein in AVs Bad Boys?
by Roger C. Lanctot on 09-08-2021 at 10:00 am

There’s a new sheriff in town at the U.S. Department of Transportation in the form of Department Secretary Pete Buttigieg with an acting deputy in Acting Administrator Dr. Steve Cliff at the National Highway Traffic Safety Administration (NHTSA). NHTSA has served notice on bad boy Tesla CEO Elon Musk that it is investigating the circumstances connected with 11 fatal crashes of Tesla vehicles operating in Autopilot mode.

This is a key turning point in the industry for the undermanned and underfunded NHTSA. After four years without an administrator, the agency appears to be finally taking the reins, building a regulatory agenda (much of it already baked into the pending infrastructure bill), and slipping into the driver seat to guide the industry.

One of the higher profile and frankly embarrassing issues facing the agency has been periodic crashes of Tesla vehicles operating in Autopilot mode. Many of these crashes received high priority on-site investigative treatment by the agency and its regulatory cousin the National Transportation Safety Board (NTSB).

The NTSB most recently concluded that Tesla ought to be instructed to add an effective driver monitoring system and limit the use of Autopilot to divided highways. For its part, NHTSA expressed concern, but accepted Tesla’s dodges that A) drivers in fatal crashes of their vehicles were misusing Autopilot by not paying attention; and B) that data showed Tesla vehicles with Autopilot were safer to operate than non-Autopilot equipped vehicles.

The concern among regulators is clearly that other auto makers might follow Tesla’s lead, flooding highways with semi-autonomous driving systems being similarly misused resulting in similar fatal crash scenarios. It is perhaps for this reason that the agency sent a letter to George Hotz, founder of self-driving startup Comma.ai, expressing concern that the company’s aftermarket self-driving system would create hazardous driving circumstances for its users and other drivers sharing the road with them.

Hotz took two steps in response to the NHTSA outreach. He cancelled plans to introduce his aftermarket device, and he shifted to offering his software on a downloadable open source basis and sold the devices separately. Hotz also added a driver monitor to his device, which earned Comma.ai a top ranking in a Consumer Reports evaluation of self-driving systems.

Until now, both Musk and Hotz have found ways to work around NHTSA, while more traditional robotaxi developers, like Kyle Vogt at General Motors’ Cruise, have sought self-driving car exemptions from regulatory oversight by NHTSA. Safety advocates are outraged at the behavior of both NHTSA and Tesla. Tesla fans are thrilled with their cars and the freedom with which they are trusted to operate their Teslas in semi-autonomous mode.

NHTSA certainly faces a challenge in coming to grips with the Tesla Autopilot application. It is clear that the system fails when it is being misused, but it also fails when drivers are paying attention – there are multiple YouTube videos attesting to the wandering guidance of Tesla systems and their ability to mistake a Burger King sign for a Stop sign under the right circumstances.

There is an even more salient concern: the challenge of properly activating and de-activating the system in the car. Regardless of the reliability of Tesla’s sensing systems, it is quite simple for a driver to make an incorrect selection in his or her attempt to turn on or maintain Autopilot – creating a significant gap between the driver’s expectations and the vehicle’s actual performance.

It remains to be seen whether NHTSA has the resources and expertise necessary to evaluate Tesla’s Autopilot – especially as the system itself is a moving target, with regular updates to its algorithms and sensing capabilities. Perhaps NHTSA could start with Tesla’s own warning message – most recently received by drivers of the Model 3 with the Full Self Driving beta. The message stated: “(Your vehicle) may do the wrong thing at the worst time…(keep your) hands on the wheel and pay extra attention to the road.”

As soon as that message was sent to Tesla owners, NHTSA ought to have stepped in. It is comparable to GM telling owners of older model Chevrolet Bolts to park their cars outdoors. Of course, those Bolts were already subject to a NHTSA-initiated recall.

The latest initiative by NHTSA – to investigate Tesla crashes – is at least a step toward action. As President Biden said about climate change: doing nothing is not an option. The best news of all is that NHTSA is no longer asleep at the wheel. It is an open question whether the agency is ready to take the wheel. The bad boys of AV tech will be watching closely.


Verification Horizons 2021, Now More Siemens
by Bernard Murphy on 09-08-2021 at 6:00 am

In a discussion with Tom Fitzpatrick of Siemens EDA, he recalled that their Verification Horizons newsletter started 17 years ago, back when they were Mentor. We’ve known about the Siemens acquisition for a while; the deal closed in March 2017, but it wasn’t until January 1, 2021 that the legal entity merger was complete. That makes this the first issue of the newsletter in which they’ve had enough time to absorb and express a Siemens slant to Verification Horizons.

Tom reiterated that a major motivation in the acquisition was filling out the Siemens vision of digital twins. These start from a big system view (like an aircraft for example). Mechanical, fluidics, thermal, software and so on. In modern systems there’s now so much new electronic content that modeling must also reach down inside those subsystems. The September issue of Verification Horizons covers multiple topics underlying this trend. I’ll just touch on a few.

Digital Threads, Twins, MBSE and IC Development

Model-based Systems Engineering (MBSE) is a new favorite topic of mine, driving modeling and design all the way from the ultimate system (e.g. an aircraft) down to SoCs. Siemens outlines a methodology called Arcadia which they use in their System Modeling Workbench to describe and decompose from high-level requirements and block functions down to individual components. SysML is a modeling language commonly used to describe behaviors and constraints at these higher levels.

How does IC design and particularly verification interact with these higher levels? In the example shown in the newsletter, they bridge using TLM (i.e. software) models for IC component behaviors, and verification of requirements through coverage analysis. In the aircraft example, they talk about a DO-254 list of requirements, each requiring a test and confirmation that the test passed. To this they would add coverage metrics to complete requirements coverage.
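
To make the requirements-coverage idea concrete, here is a minimal sketch in Python of mapping DO-254-style requirements to tests and reporting which requirements are covered by a passing test. The requirement IDs, descriptions and test names are invented for illustration; this is not Siemens tooling.

```python
# Illustrative only: a toy requirements-coverage check, not the Siemens flow.
# Requirement IDs, descriptions and test names are invented for the example.
requirements = {
    "REQ-001": "Altimeter interface reports altitude at 1 ft resolution",
    "REQ-002": "Watchdog resets the controller after 3 missed heartbeats",
}

# Each test declares which requirement it verifies and whether it passed.
test_results = [
    {"test": "test_altimeter_resolution", "requirement": "REQ-001", "passed": True},
    {"test": "test_watchdog_timeout",     "requirement": "REQ-002", "passed": False},
]

covered = {t["requirement"] for t in test_results if t["passed"]}
for req_id, text in requirements.items():
    status = "COVERED" if req_id in covered else "NOT COVERED"
    print(f"{req_id}: {status} - {text}")
```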

Verifying AI-enabled SoCs for HPC

Time to market pressures are just as active today in these SoCs as elsewhere, even though such systems are monsters and require very extensive hardware and software testing. (Much of what they describe in this article applies equally to big non-AI systems, but I’m guessing this write-up was motivated by an actual AI design experience 😀.) They talk here particularly about the need for parallel development of hardware and software. This starts from a virtual platform and progresses to IP RTL development in parallel with driver development, and so on through to pre-silicon prove-out with apps and post-silicon bring-up.

The article makes the point that this style of development must be supported by a combination of emulation and prototyping. Emulation through hardware design development and early software apps development, since even here validation must comprehend hardware test loads. Prototyping during late hardware development and through software app development because there you need software performance. The article stresses the advantages of the Siemens two-part prototyping solution here: Veloce Primo for up to 12B gates and ICE support and Veloce ProFPGA for shipping prototypes to customers.

Verifying a DDR5 Memory Subsystem

I like talking about applications, so this is my last selection from the set of articles in the September newsletter. High bandwidth memory is more commonly integrated in big server processors, AI systems and other large SoCs. For this we need even faster links from the main digital die(s) to these in-package DRAMs. The latest released standard here is DDR5, providing double the bandwidth at lower power than DDR4.
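
As a rough illustration of that bandwidth claim, the back-of-the-envelope calculation below compares peak bandwidth per 64-bit channel, assuming DDR4 at its 3200 MT/s ceiling and DDR5 at its specified top rate of 6400 MT/s; the numbers are only meant to show the scale of the doubling.

```python
# Rough peak-bandwidth comparison per 64-bit channel; the data rates are
# assumptions: DDR4 at its 3200 MT/s ceiling, DDR5 at its 6400 MT/s top rate.
def peak_bandwidth_gb_s(mega_transfers_per_sec, bus_bits=64):
    return mega_transfers_per_sec * 1e6 * (bus_bits / 8) / 1e9  # GB/s

ddr4 = peak_bandwidth_gb_s(3200)
ddr5 = peak_bandwidth_gb_s(6400)
print(f"DDR4-3200: {ddr4:.1f} GB/s  DDR5-6400: {ddr5:.1f} GB/s  ({ddr5 / ddr4:.1f}x)")
```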

Siemens provides QVIPs for both chip and DIMM DDR5 memories. This write-up goes into quite a bit of detail on connecting and configuring your design, creating compile and simulation scripts, and running simulation and debug. I won’t attempt to summarize these other than to note they provide help in generating scenarios, plus assertions, transaction and performance analysis, and more. If this is an area of interest to you, follow the link; they provide much more detail on all of these topics.

Lots of good material in this issue of Verification Horizons. Tom also created his own blog post. Both well worth your time to read!

Also Read:

Optimize AI Chips with Embedded Analytics

AMS IC Designers need Full Tool Flows

Symmetry Requirements Becoming More Important and Challenging


Ansys IDEAS Digital Forum 2021 Offers an Expanded Scope on the Future of Electronic Design
by Daniel Nenni on 09-07-2021 at 10:00 am

For those of you following the latest developments in electronic design, it has become clear that the industry is transitioning through an inflection point that is shifting some of the ground rules of design. The increases in speed and integration density in today’s systems are blurring the lines between chip design and system design, epitomized by multi-die, 3D integrated circuits. Multiphysics – the simultaneous analysis of multiple physical effects – is at the heart of another profound and challenging shift in electronic design practice. By merging previously distinct physics disciplines while adding novel physics into the equation, it is driving a step-function increase in the technical expertise required by electronic design teams.

Here Comes the Multiphysics Revolution

A concrete example of how the facts on the ground are rapidly evolving is the increasing focus on thermal analysis, as it has become apparent that heat dissipation is probably the #1 limiting factor in 3D-IC integration density. But thermal gradients across heterogeneous components inevitably lead to differential expansion, which results in mechanical stress and warpage of a package.

Warping impacts system reliability directly, but temperature also has less direct design effects. For example, it determines the maximum current in wires to avoid electromigration reliability issues. The higher speeds of signals, coupled with the larger physical sizes of multi-die systems, make electromagnetic simulation a must – not just for radio frequency (RF) designers, but also for high-performance computing (HPC) and artificial intelligence/machine learning (AI/ML) hardware. Inter-related physical effects such as these are driving the multiphysics revolution.

New Signoff Requirements

The semiconductor foundries have responded to the rise in 3D-IC design starts by supplementing their sign-off requirements and their recommended IC design flows to include the thermal, electromagnetic, and other tools that were previously relegated to OSATs or other outside vendors following fabrication. Most of the advanced 3D systems that have been brought to market so far were designed by large, leading semiconductor companies that have the resources and expertise to take advantage of the new technical opportunities. But, in order to make 3D-IC design more accessible to mainstream design teams, the industry needs tools and design platforms that capture and automate these advanced multiphysics design requirements in practical workflows.

The old ways of working won’t cut it in this new reality. Design specialists with specific domain knowledge dispersed over multiple groups need to be brought together into vertically integrated design teams that make expertise available right from the get-go during system prototyping. Designers will need new tools, new training, and new methodologies to compete in this environment.

Register for IDEAS to Discover Leading Electronic Design Techniques

The best way to learn about the newest electronic design techniques from industry experts is to attend this year’s Ansys IDEAS Digital Forum: Innovative Designs Enabled by Ansys Solutions.

IDEAS is a digital event that takes place Sept. 22-23, 2021. It gathers many of the leading electronic design companies from across the world where you’ll access C-level executive keynote speeches as well as advanced techniques from leading-edge design teams. Take a look at the IDEAS Agenda to see the unparalleled breadth and scope of multiphysics tools, solutions, and practical implementations with 40 presentations in 10 technical tracks for electronic systems analysis, semiconductor signoff, photonics, cloud, and workflow solutions.

– Power Integrity
– Silicon to System Reliability
– 3D-IC & Electrothermal Analysis
– Voltage-Timing Signoff
– Silicon Photonics
– System Analysis & Simulation
– Low Power Design
– Designing with Electromagnetics
– Cloud & Workflow Automation

With a roundtable panel and executives from the design community and solution providers, you can get a quick and accurate impression of the state of the art in electronic design today – all from the comfort of your home office.

Registration for IDEAS is now open to all at ansys.com/ideas. Sign up and reserve your front seat for a showcase of the future of electronic design.

Also Read

Have STA and SPICE Run Out of Steam for Clock Analysis?

Extreme Optics Innovation with Ansys SPEOS, Powered by NVIDIA GPUs

Ansys Multiphysics Platform


IoT and 5G Convergence
by Ahmed Banafa on 09-05-2021 at 6:00 am

The convergence of 5G and the Internet of Things (IoT) is the next natural move for two advanced technologies built to make users’ lives more convenient, easier and more productive. But before talking about how they will unite, we need to understand each of the two technologies.

Simply defined, 5G is the next-generation cellular network. Compared to 4G, the current standard, which offers upload speeds of roughly 7 Mbps to 17 Mbps and download speeds of 12 Mbps to 36 Mbps, 5G transmission speeds may be as high as 20 Gbps. Latency will also be close to 10% of 4G’s, and the number of devices that can be connected scales up significantly, which motivates the convergence with IoT. [1]
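
To put those numbers in perspective, the small, purely illustrative calculation below compares download times for a 2 GB file using the figures quoted above (the upper end of the 4G download range versus the 5G peak):

```python
# Illustrative download-time comparison using the speeds quoted above.
FILE_SIZE_BITS = 2 * 8 * 1e9            # a 2 GB file expressed in bits

def download_seconds(rate_bits_per_sec):
    return FILE_SIZE_BITS / rate_bits_per_sec

lte_peak = 36e6   # 36 Mbps, upper end of the 4G download range quoted above
nr_peak = 20e9    # 20 Gbps, theoretical 5G peak

print(f"4G at 36 Mbps : {download_seconds(lte_peak) / 60:.1f} minutes")
print(f"5G at 20 Gbps : {download_seconds(nr_peak):.2f} seconds")
```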

The Internet of Things (IoT) is an ecosystem of ever-increasing complexity: a universe of connected things providing key physical data, with further processing of that data in the cloud to deliver business insights. It presents a huge opportunity for many players in all businesses and industries. Many companies are organizing themselves to focus on IoT and the connectivity of their future products and services. IoT can be better understood by its four components: sensors, networks, cloud/AI and applications, as shown in Figure 1. [2,3,9]

Figure 1: Components of IoT

When you combine both technologies, 5G will affect all components of IoT, directly or indirectly: sensors will have more bandwidth to report actions, networks will deliver more information faster, real-time data will become a reality for the cloud and AI, and applications will gain more features and options given the wide bandwidth provided by 5G.

 

Benefits of using 5G in IoT  

1. Higher transmission speed

With transmission speeds that can reach 15 to 20 Gbps, we can access data, files and programs on remote applications much faster. By increasing use of the cloud and making devices depend less on internal memory, it won’t be necessary to install numerous processors on a device, because computing can be done in the cloud. This will increase the longevity of sensors and open the door to more types of sensors with different types of data, including high-definition images and real-time motion, to list a few. [4]

2. More devices connected

5G’s impact on IoT is clearly the increased number of devices that can be connected to the network. All connected devices are able to communicate with each other in real time and exchange information. For example, smart homes will have hundreds of devices connected in every possible way to make our lives more convenient and enjoyable, with smart appliances, energy, security and entertainment devices. In the case of industrial plants, we are talking about thousands of connected devices streamlining the manufacturing process and providing safety and security; add to that, the concept of building a smart city becomes possible and manageable on a large scale. [4]

3. Lower latency

In simple words, latency is the time that passes between an order being given to your smart device and the action occurring. Thanks to 5G, this time will be ten times less than it was in 4G. For example, lower latency means the use of sensors can be increased in industrial plants: control of machinery, control over logistics and remote transport are all now possible. As another example, lower latency lets healthcare professionals intervene in surgical operations from remote locations with the help of precision instrumentation that can be managed remotely. [4]

Challenges facing 5G and IoT convergence

1. Operating across multiple spectrum bands

5G will not replace all existing cellular technologies any time soon; it is going to be an option alongside what we have now, and new hardware is needed to take full advantage of the power of 5G. IoT’s second component, networks, will have more options and will be able to deal with a wide spectrum of frequencies as needed, instead of being limited to a few options. [5]

2. A gradual upgrade from 4G to 5G

The plan is to replace 4G gradually, using the infrastructure available now, and this must be done on multiple levels and in phases: software, hardware and access points. This requires big investment from both users and businesses. Different parts of the country will have different timelines for replacing 4G, which will create challenges for services based on 5G. In addition, the ability and desire of users to upgrade their devices to 5G-compatible ones is still a big unknown; a lot of incentives and education will be needed to convince individuals and businesses to make the move. [5]

3. Data interoperability

This is an issue on the IoT side. As the industry evolves, the need for a standard model to perform common IoT backend tasks, such as processing, storage, and firmware updates, is becoming more relevant. In that sought-after model, we are likely to see different IoT solutions work with common backend services, which will guarantee levels of interoperability, portability, and manageability that are almost impossible to achieve with the current generation of IoT solutions. Creating that model will not be easy: there are hurdles and challenges facing the standardization and implementation of IoT solutions, and the model needs to overcome all of them. Interoperability is one of the major ones. [6]

4. Establishing 5G business models

The bottom line is a big motivation for starting, investing in, and operating any business. Without a sound and solid business model for 5G-IoT convergence, we will have another bubble. This model must satisfy all the requirements of all kinds of e-commerce: vertical markets, horizontal markets, and consumer markets. But this category is always a target of regulatory and legal scrutiny. [6]

Examples of Applications of 5G in IoT

1.    Automotive

One of the primary use cases of 5G is the concept of connected cars: enhanced vehicular communications services that include both direct communication (vehicle to vehicle, vehicle to pedestrian, and vehicle to infrastructure) and network-facilitated communication for autonomous driving. In addition, supported use cases will focus on vehicle convenience and safety, including intent sharing, path planning, coordinated driving, and real-time local updates. This brings us to the concept of edge computing, a promising derivative of cloud computing in which computing, decision-making and action-taking happen on IoT devices, and only relevant data is pushed to the cloud (a small sketch follows below). These devices, called edge nodes, can be deployed anywhere with a network connection: on a factory floor, on top of a power pole, alongside a railway track, in a vehicle, or on an oil rig. Any device with computing, storage, and network connectivity can be an edge node; examples include industrial controllers, switches, routers, embedded servers, and video surveillance cameras. 5G will make communication between edge devices and the cloud a breeze. [5,7]
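
As a sketch of that edge-node idea (filter locally, push only relevant data upstream), the illustrative Python snippet below shows an edge device that forwards only readings crossing a threshold. The threshold, function names and transport are assumptions made up for the example.

```python
# Illustrative edge-node filter: process sensor readings locally and push
# only the "relevant" ones (here, values above a threshold) to the cloud.
from typing import Iterable

ALERT_THRESHOLD = 90.0   # e.g. a temperature limit; the value is arbitrary here

def to_cloud(reading: dict) -> None:
    # Placeholder for an uplink over the 5G connection (MQTT, HTTPS, ...).
    print(f"pushed to cloud: {reading}")

def process_locally(readings: Iterable[float]) -> None:
    for i, value in enumerate(readings):
        if value > ALERT_THRESHOLD:      # decision taken at the edge node
            to_cloud({"sample": i, "value": value})
        # below-threshold samples stay local (logged, aggregated, or dropped)

process_locally([72.5, 88.1, 93.4, 76.0, 95.2])
```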

2.    Industrial

The Industrial Internet of Things (IIoT) is a network of physical objects, systems, platforms and applications that contain embedded technology to communicate and share intelligence with each other, the external environment and people. The adoption of the IIoT is being enabled by the improved availability and affordability of sensors, processors and other technologies that have helped facilitate capture of and access to real-time information. 5G will not only offer a more reliable network but will also deliver an extremely secure network for industrial IoT by integrating security into the core network architecture. Industrial facilities will be among the major users of private 5G networks. [5,8]

3.    Healthcare

The requirement for real-time networks will be met using 5G, which will significantly transform the healthcare industry. Use cases include live transmission of high-definition surgery video that can be remotely monitored. The concept of telemedicine with real-time, higher-bandwidth connections will become reality, and IoT sensors will be sophisticated enough to give more in-depth medical information about patients on the fly; for example, a doctor can examine and diagnose a patient while they are in the emergency vehicle on the way to the hospital, saving minutes that can be the difference between life and death. The 2020 pandemic taught us the significance of alternative channels for seeing our doctor besides in person, and many startups created telemedicine apps during that period. 5G will propel the use of such apps and make our doctor visits more efficient, with less waiting. [5]

Ahmed Banafa, author of the books:

Secure and Smart Internet of Things (IoT) Using Blockchain and AI

Blockchain Technology and Applications

Read more articles at: Prof. Banafa website

Article originally published in IEEE-IoT

 References

[1] https://davra.com/5g-internet-of-things/

[2] https://www.linkedin.com/pulse/iot-blockchain-challenges-risks-ahmed-banafa/

[3] https://www.linkedin.com/pulse/three-major-challenges-facing-iot-ahmed-banafa/

[4] https://appinventiv.com/blog/5g-and-iot-technology-use-cases/

[5] https://www.geospatialworld.net/blogs/how-5g-plays-important-role-in-internet-of-things/

[6] https://www.linkedin.com/pulse/iot-standardization-implementation-challenges-ahmed-banafa/

[7] https://www.linkedin.com/pulse/why-iot-needs-fog-computing-ahmed-banafa/

[8] https://www.linkedin.com/pulse/industrial-internet-things-iiot-challenges-benefits-ahmed-banafa/

[9] https://www.amazon.com/Secure-Smart-Internet-Things-IoT/dp/8770220301/


Podcast EP36: Semiconductor Design Acceleration
by Daniel Nenni on 09-03-2021 at 10:00 am

Dan and Mike are joined by Michael Johnson (MJ), CTO at NetApp. MJ provides “behind the scenes” insights into NetApp technology and how it has quietly revolutionized information storage and management for chip design. The key enabling technologies along with specific use cases are discussed. MJ also discusses moving to the cloud and how NetApp addresses the major hurdles for this migration, along with a specific customer example.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


The Arm China Debacle and TSMC
by Daniel Nenni on 09-03-2021 at 6:00 am

Having spent 40 years in the semiconductor industry, many years working with Arm and even publishing the definitive history book “Mobile Unleashed: The Origin and Evolution of ARM Processors in Our Devices” plus having spent more than 20 years working with China based companies, I found the recent Arm China media circus quite entertaining.

While I have zero firsthand information on this situation I do have numerous contacts and have had discussions on the topic. I also have many years of experience with Arm management, enough to know that the Arm China situation as described in the media is complete nonsense.

Rather than rehash the whole fiasco, here are links to one of the inflammatory articles and a retraction, which is quite rare for today’s media. After publishing false information most sites just move on to the next topic leaving the fake news up in spite of the collateral damage. I would guess that Arm made some calls on this one, absolutely.

ARM China Seizes IP, Relaunches as an ‘Independent’ Company [Updated]

ARM Refutes Accusations of IP Theft by Its ARM China Subsidiary

This Arm China false narrative started as most do, with a misread publication and a provocative title whose sole purpose was feeding clicks to the advertising monster within. The author didn’t even get the name of the original publication’s author right, and that still has not been corrected:

“As Devin Patel reports...” It’s Dylan Patel; he is a SemiWiki member, and he said nothing about “ARM China Seizing IP”. And by the way, it’s Arm, not ARM; that name was changed some time ago.

The author of the unfortunate article is a prime example of the problem at hand. While not the worst by any means, he has zero semiconductor education or experience. He does not know the technology, the companies, or the people, yet flocks of sheep come to his site for the latest semiconductor news. Pretty much the same as getting accurate political information from Facebook.

One of the reasons we started SemiWiki ten plus years ago was that semiconductors did not get their fair share of media attention. TSMC was a prime example. Even though they were the catalyst for the fabless semiconductor revolution that we all know and love, very few people knew their name or what they accomplished.

Now the pendulum has completely swung in the other direction with false TSMC narratives running amok. This one is my favorite thus far:

Intel locks down all remaining TSMC 3nm production capacity, boxing out AMD and Apple

And yes that one reverberated throughout the faux semiconductor media even though it was laughably false.

Here are a couple more recent ones that went hand-in-hand:

Taiwan’s TSMC asking suppliers to reduce prices by 15%

TSMC to hike chip prices ‘by as much as 20%’

Imagine the financial windfall here…

The upside I guess is that TSM stock is at record levels as it should be.  There is an old saying, “There is no such thing as bad publicity” (which was mostly associated with circus owner and self-promoter extraordinaire Phineas T. Barnum). The exception of course being your own obituary as noted by famed Irish writer Brendan Behan.

With today’s cancel culture, bad press can be your own obituary which is something to carefully consider before publishing anything, anywhere, at any time. Of course, there is that insatiable click monster that needs to be fed so maybe not.


Why Optimizing 3DIC Designs Calls for a New Approach
by Synopsys on 09-02-2021 at 10:00 am

The adoption of 3DIC architectures, while not new, is enjoying a surge in popularity as product developers look to their inherent advantages in performance, cost, and the ability to combine heterogeneous technologies and nodes into a single package. As designers struggle to find ways to scale with complexity and density limitations of traditional flat IC architectures, 3D integration offers an opportunity to continue functional diversity and performance improvements, while meeting form-factor constraints and cost.

3D structures offer a variety of specific benefits. For example, performance is often dominated by the time and power needed to access memory. With 3D integration, memory and logic can be integrated into a single 3D stack. This approach dramatically increases the width of memory busses through fine-pitch interconnects, while decreasing the propagation delay through the shorter interconnect line. Such connections can lead to memory access bandwidth of tens of Tbps for 3D designs, as compared with hundreds of Gbps bandwidth in leading 2D designs.
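
A back-of-the-envelope calculation shows why a wide, fine-pitch 3D interface changes the bandwidth picture. The pin counts and per-pin rates below are assumptions chosen only to illustrate the orders of magnitude quoted above, not figures from any specific product.

```python
# Illustrative only: peak bandwidth = interface width x per-pin data rate.
def bandwidth_tbps(width_bits, gbps_per_pin):
    return width_bits * gbps_per_pin / 1000.0   # Tb/s

# A narrow, fast off-package interface (assumed 64 bits at 16 Gb/s per pin)
flat_2d = bandwidth_tbps(64, 16.0)        # ~1 Tb/s, i.e. hundreds-of-Gb/s class
# A wide, slower fine-pitch 3D interface (assumed 16,384 bits at 2 Gb/s per pin)
stacked_3d = bandwidth_tbps(16384, 2.0)   # ~33 Tb/s, i.e. tens-of-Tb/s class

print(f"2D-style link: {flat_2d:.1f} Tb/s   3D-style link: {stacked_3d:.1f} Tb/s")
```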

From a cost perspective, a large system with different parts has various sweet spots in terms of silicon implementation. Rather than having the entire chip at the most complex and/or expensive technology node, heterogeneous integration allows the use of the ‘right’ node for different parts of the system, e.g., advanced/expensive nodes for only the critical parts of the system and less expensive nodes for the less critical parts.

In this post, which was originally published on the “From Silicon to Software” blog, we’ll look at 3DIC’s ability to leverage designs from heterogeneous nodes, and the opportunities and challenges of a single 3D design approach to achieve optimal power, performance, and area (PPA).

Adding a Vertical Dimension Changes the Design Strategy

While 3D architectures elevate workflow efficiency and efficacy, 3DIC design does introduce new challenges. Because of the distinct physical characteristics of 3D design and stacking, traditional tools and methodologies are not sufficient to solve these limitations and require a more integrated approach. In addition, there is a need to look at the system in a much more holistic way, compared to a typical flat 2D design. Simply thinking about stacking 2D chips on top of each other is insufficient in dealing with the issues related to true 3D design and packaging.

Since the designs must be considered in three dimensions, as opposed to the typical x, y aspects of a flat 2D design, everything must be managed with the addition of the z dimension – from architectural design to logic verification and route connection, including bumps and through-silicon vias (TSVs), thermal, and the power delivery network (PDN) – with opportunities for new tradeoffs (such as interposer-based versus 3D stacks, memory on logic or logic on memory, and hybrid bonding versus bumps). Optimization of the ‘holy grail’ of PPA is still a critical guiding factor; however, with 3DICs it becomes cubic-millimeter optimization, because not just two directions but also the vertical dimension must be considered in all tradeoff decisions.

Further complicating matters, higher levels of integration available with 3DICs obsolete traditional board and package manual-level techniques such as bump layout and custom layout for high-speed interconnects, which cause additional bottlenecks. Most importantly, interdependency of previously distinct disciplines now needs to be considered in a co-design methodology (both people and tools), across all stages of chip design, package, architecture, implementation, and system analysis.

Let’s look at an example of a specific design challenge – the goal to improve memory bandwidth. Traditionally, designers would look at how to connect the memory and CPU to get the highest possible bandwidth. But with 3DICs, they need to look at the memory and CPU together to figure out the optimal placement in the physical hierarchy, as well as how they connect, through TSVs for example. While performance is critical, designers need a way to evaluate the power and thermal impact of stacking these types of elements together in different ways, which introduces new levels of complexity and new design options.

Taking a Silicon-First Approach

While it might seem obvious to consider a 3D architecture in a similar manner as a printed circuit board (PCB) design, 3DICs should ideally take a silicon-first approach – that is, optimize the design IP (of the entire silicon) and co-design this silicon system with the package. Within our approach to 3DICs, Synopsys is bringing key concepts and innovations of IC design into the 3DIC space. This includes looking at aspects of 3DICs such as architectural design, bringing high levels of automation to manual tasks, scaling the solution to embrace the high levels of integration from advanced packaging, and integrating signoff analysis into the design flow.

3DICs integrate the package, traditionally managed by PCB-like tools, with the chip. PCB tools are not wired to deal with both the scale complexity and the process complexity. In a typical PCB there may be 10,000 connections; in a complex 3DIC there are hundreds of millions of connections, introducing a whole new level of scale that far outpaces what older, PCB-centric approaches can manage. Existing PCB tools offer no help with stacking dies, where there may be no package or PCB involved at all. Further, PCB tools cannot look at RTL or system design decisions. The reality is that there cannot be one single design tool for all aspects of a 3DIC (IC, interposer, package), yet there is an acute need for assembling and visualizing the complete stack.

The Synopsys 3DIC Compiler does just that. It is a platform that has been built for 3DIC system integration and optimization. The solution focuses on multi-chip systems, such as chip-on-silicon interposer (2.5D), chip-on-wafer, wafer-on-wafer, chip-on-chip, and 3D SoC.

The PPA Trifecta

Typically, when you think of large complex chips, the first optimization considered is area.  SoC designers want to integrate as much functionality into the chip and deliver as high performance as possible. But then there are always the required power and thermal envelopes, particularly critical in applications such as mobile and IoT (although also increasingly important in areas such as high-performance computing in a data center when overall energy consumption is prioritized as well). Implementing 3D structures enables designers to continue to add functionality to the product, without exceeding the area constraints and, at the same time, lowering silicon costs.

But a point tool approach only addresses sub-sections of the complex challenges in designing 3DICs. This creates large design feedback loops that don’t allow for timely convergence to an optimal solution for the best PPA per cubic millimeter. In a multi-die environment, the full system must be analyzed and optimized together. It isn’t enough to perform power and thermal analysis of the individual die in isolation. A more effective and efficient solution would be a unified platform that integrates system-level signal, power, and thermal analysis into a single, tightly coupled solution.

This is where 3DIC Compiler really shines – by enabling early analysis with a suite of integrated capabilities for power and thermal analysis. The solution reduces the number of iterations through its full set of automated features while providing power integrity, thermal, and noise-aware optimization. This helps designers better understand the performance of the system and facilitates exploration around the system architecture. It also allows a more efficient way to understand how to stitch together various elements of the design, and even connects design engineers in some ways to traditional 2D design techniques.

3DICs Are an Ideal Platform for Achieving Optimal PPA Per Cubic Millimeter

Through the vertical stacking of silicon wafers into a single packaged device, 3DICs are proving their potential as a means to deliver the performance, power, and footprint required to continue to scale Moore’s law.

Despite the new nuances of designing 3D architectures using an integrated design platform, the possibility of achieving the highest performance at the lowest achievable power makes 3D architectures appealing. 3DICs are poised to become even more widespread as chip designers strive to achieve the optimum PPA per cubic millimeter.

By Kenneth Larsen, Product Marketing Director, Synopsys Digital Design Group

Also Read:

Using Machine Learning to Improve EDA Tool Flow Results

How Hyperscalers Are Changing the Ethernet Landscape

On-the-Fly Code Checking Catches Bugs Earlier


Optimize AI Chips with Embedded Analytics
by Kalar Rajendiran on 09-02-2021 at 6:00 am

The foundry model, multi-source IP blocks, advanced packaging technologies, cloud computing, hyper-connectivity and access to open-source software have all contributed to the incredible electronics products of recent times. Along with this, the complexity of developing and taking a chip to market has also increased. And that is just from the effort perspective to implement a chip that performs to its specification. Add to this, the competitive market forces that demand faster time to market cycles.

While companies overcome these challenges by leveraging a combination of top-notch talent, tools, processes and proprietary methodologies, a new generation of chips is taking these challenges to a higher level. Artificial Intelligence (AI) driven applications such as security, visual cognition, and natural language comprehension/processing are behind the demand for these AI chips. Are the time-proven techniques for overcoming time-to-market challenges sufficient when dealing with these AI chips? This is the backdrop for a whitepaper authored by Richard Oxland and Greg Arnot, both from Siemens EDA.

The whitepaper describes how new tools and methodologies may be required to help designers optimize hardware and software, not only during the development phase but also after the chips are deployed in the field. It establishes that giving designers intimate visibility into the operation of the chip is imperative for the on-time development of these AI chips. It explains how analytics capabilities embedded within these chips can not only help take a chip to market faster but also assist with optimizing system performance. This blog covers the salient points I gleaned from the whitepaper.

As embedded electronic systems get more complex, the interaction between hardware and software also becomes more complex. This makes debugging and optimizing for performance a very challenging and extremely time-consuming endeavor. Not only must root causes of bugs be determined and corrected but sub-optimal performance of a correctly functioning system must also be resolved under severe time-to-market pressures.

The authors discuss the Tessent Embedded Analytics platform and use an AI accelerator chip as an example to showcase the value product developers stand to gain from utilizing embedded analytics. The Tessent Embedded Analytics architecture has been designed and the platform implemented from the bottom up as a scalable, flexible and powerful solution to harness complexity in SoCs and embedded systems. The platform comprises a portfolio of silicon IP and software interface, together with APIs, an SDK, and database and IDE functionality. Refer to figure below.

Figure: Tessent Embedded Analytics Architecture

Source: Siemens EDA

The analytics platform combines IP and software designed to provide functional insights into complex SoC behavior. Tessent silicon IP can monitor internal bus transactions, processor execution, and other system-level activity within the device, correlated across the system, and at the right level of detail for the task in hand. The platform also contains the SW tools, APIs, and libraries required to process functional data and give designers a detailed understanding of the behavior of the hardware and software in the embedded system.

The whitepaper goes into details of how different embedded analytics modules bring value to the chip development process. Refer to figure below for the different analytics modules used with the AI accelerator chip. You can learn about all available embedded analytics modules by downloading the Tessent Embedded Analytics Product Guide.

Figure: Tessent Embedded Analytics Modules in an AI Accelerator Chip

Source: Siemens EDA

System Validation and Optimization

The customers for this example AI accelerator chip are Machine Learning (ML) application developers. Their software must be able to take advantage of all the unique hardware capabilities of the accelerator chip and maximize performance. As the limiting factor for system performance is the data throughput between memory and functional units, the chip design team must be able to optimize the high-bandwidth-memory (HBM) controller and memory banking schemes with confidence.

Assuming that memory corruption events are observed during the system validation phase, the team would have traditionally looked to simulation for debugging the issue. But as the use case is large, as in the case with many AI chips, debugging this way could consume many days or even weeks. This is where embedded analytics comes in as the savior. Using the supplied Python API and library of tests, the validation team configures the embedded analytics subsystem to find the root cause of the memory corruption. The DMA module is used to write to and examine the contents of the HBM in a precisely determined timeframe. The Bus Monitor is set up to look for transactions within a fixed address range and capture bus trace into a circular buffer. And the Enhanced Trace Encoder provides a mechanism to monitor the program execution of the relevant CPU.
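
The whitepaper describes this flow but does not reproduce the API itself. The stub-based Python sketch below only illustrates the shape of such a script; the class, module and method names are invented for this example and are not the actual Tessent Embedded Analytics API.

```python
# Hypothetical sketch only: the class and method names below are invented to
# show the shape of a Python-driven setup; they are NOT the Tessent API.

class MonitorStub:
    """Stand-in for an on-chip analytics module reachable over a debug link."""
    def __init__(self, name):
        self.name = name
    def configure(self, **kwargs):
        print(f"[{self.name}] configure {kwargs}")
    def start(self):
        print(f"[{self.name}] capture started")

# 1. Scope a DMA window over the suspect HBM region (write a pattern, read back).
dma = MonitorStub("dma0")
dma.configure(addr=0x8000_0000, length=4096, pattern=0xA5)

# 2. Arm the bus monitor on a fixed address range, tracing into a circular buffer.
busmon = MonitorStub("busmon0")
busmon.configure(addr_lo=0x8000_0000, addr_hi=0x8000_FFFF, buffer="circular")
busmon.start()

# 3. Enable instruction trace on the CPU suspected of causing the corruption.
trace = MonitorStub("trace_encoder0")
trace.configure(core=2)
trace.start()
```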

With the memory corruption issue resolved, engineers can now focus on measuring response latencies of the HBM for different banking schemes using the built-in functionality of the Bus Monitor and the Python API. This mechanism allows for quick and easy experimentation with different hardware configurations.

Optimization in the Field

After system validation and optimization, a chip vendor may learn during field trials that customers’ own applications are not meeting expected performance. Fortunately, the same embedded analytics used during system validation can be leveraged to optimize memory bandwidth and latency.

Summary

The Tessent Embedded Analytics platform provides a solution that not only helps with the debug of an AI SoC during its development phase but also performance optimization of the product throughout its lifecycle. For full details, you can download the whitepaper here.

Also Read:

AMS IC Designers need Full Tool Flows

Symmetry Requirements Becoming More Important and Challenging

Debugging Embedded Software on Veloce


Intel Architecture Day – Part 2: GPUs, IPUs, XeSS, OneAPI
by Tom Dillinger on 09-01-2021 at 10:00 am

Introduction

At the recent Intel Architecture Day presentations, a breadth of roadmap plans was provided – an earlier article focused on the x86 client and data center cores and products. This article focuses on the GPU and IPU announcements.

Xe Graphics Core

The Intel GPU architecture for embedded, discrete, and data center acceleration is based on the Xe graphics core.  The figure below illustrates the integration of multiple cores with other units to provide a “render slice” block in the overall GPU hierarchy.

The Xe core supports both fp64/fp32 and fp16/bf16 data operands, to address both high-performance computing and AI workloads.

As was highlighted in the x86 core announcements, the integration of a “matrix engine” utilizes instructions and 2D data structures optimized for deep learning applications.

Arc

Intel introduced the Arc brand, to refer to the discrete graphics card product roadmap.   Arc also incorporates new “unified, re-factored” graphics software drivers.

The first Arc card is codenamed “Alchemist”, with a Xe chip fabricated using TSMC’s N6 process (1Q2022).

XeSS Super Sampling

A unique software feature added to the GPU family is Xe super sampling.  This image sampling method utilizes a combination of spatial and temporal data to upscale frame resolution, such as a 1080p to 4K video stream.

The graph on the right in the figure above illustrates the time to render an image for various methods (shorter is better). The XeSS algorithm combined with the XMX vector accelerator unit enables excellent 4K image throughput. Intel provided a demo of a 1080p video upscaled using XeSS to 4K resolution – the distinction between the upscaled and native 4K video (@60fps) was imperceptible, offering a unique power/performance optimization. (Intel indicated that XeSS was “neural-network driven”, but did not delve into detail.) The software development kit with XeSS support will be available shortly (for hardware with and without XMX, utilizing the DP4a instruction).

Ponte Vecchio High Performance Computing GPU

The most advanced illustration of Intel’s packaging technology was provided as part of the Ponte Vecchio data center GPU presentation.

The (massive) package integrates various tiles, and utilizes both Intel’s 2.5D EMIB interconnect bridges and 3D Foveros vertical stacked die. Of particular note, the constituent tiles are sourced from both TSMC (e.g., compute tile: TSMC N5) and Intel (e.g., base tile: Intel 7).

The Xe link tile (TSMC N7) enables direct connection of a variety of GPU topologies, as shown below.

Preliminary (A0 silicon) performance measurements indicated an extremely competitive positioning relative to the GPUs in prevalent use in today’s data centers.

Infrastructure Processing Unit (IPU)

Intel provided a very compelling picture of the inefficiencies in current cloud data center services.  The figure below shows that a “traditional” CPU plus SmartNIC cloud server architecture requires that the CPU spends considerable cycles performing infrastructure micro-services, such as storage management, security authentication, data encryption/decryption – a range of 31% to 83% overhead, as illustrated below.

These cycles are non-billable from the cloud services provider (CSP) to the client running “tenant code”, a considerable loss of revenue for the (expensive) CPUs in the data center.

Intel indicated they have been working closely with a “major CSP” on the design of an Infrastructure Processing Unit (IPU), to offload the CPU from these tasks and thus, increase cloud-based revenue.  (SmartNIC cards help accelerate some infrastructure tasks, but are a peripheral device under the control of the CPU.  The IPU offloads infrastructure functions and offers an additional layer of security and greater flexibility in host-to-storage configurations, by separating tenant tasks on the CPU and CSP infrastructure functions on the IPU.)

Intel showed both an FPGA-based solution, and a new ASIC-based IPU named Mount Evans, as shown below.  The cores in Mount Evans are based on the new Arm Neoverse (N1) architecture that is tightly coupled with the best-in-class packet processing pipeline and hardware accelerators.

OneAPI

Briefly, Intel described their work on the industry-standard “OneAPI” software toolkit development, an effort to provide:

  • a data-parallel software language (e.g., DPC++, especially for accelerators)
  • an open S/W development stack for CPUs and XPUs (e.g., GPUs, accelerators)
  • software library APIs for machine learning, video/media stream processing, matrix/vector math
  • a full development toolkit (compiler, debugger, accelerator hardware interface models)

Key areas of focus are:

  • the definition of required hardware accelerator capabilities and services to interface with the software libraries
  • acceleration of data de/compression
  • optimization of the map-reduce framework (for faster database searches)
  • optimization of the data storage footprint

For more information on OneAPI, please refer to:  www.oneapi.com .

Summary

Although best known for their CPU offerings, Intel’s breadth encompasses a much richer set of computing hardware and software products.  At the recent Intel Architecture Day, they presented an aggressive roadmap for integrated, discrete, and (especially) data center GPUs, vying for leading performance across the full range of enthusiast/gamer and data center applications.

A close collaboration with a major CSP promises to significantly upgrade the efficiency of cloud operations, replacing the SmartNIC with a richer set of functionality in the IPU.

The OneAPI initiative will no doubt lead to higher software development productivity across a myriad of CPU plus accelerator architectures.

The Ponte Vecchio GPU deserves special mention, as an example of the tradeoff decisions in building a complex GPU accelerator, integrating silicon tiles from both TSMC and Intel foundries with Intel’s advanced packaging capabilities.

-chipguy


Intel Architecture Day – Part 1: CPUs
by Tom Dillinger on 09-01-2021 at 6:00 am

Introduction

The optimization of computing throughput, data security, power efficiency, and total cost of ownership is an effort that involves managing interdependencies between silicon and packaging technologies, architecture, and software.  We often tend to focus on the technology, yet the architecture and software utilities have as important a contribution to competitive product positioning, if not more so.  Intel recently held their annual “Architecture Day”, providing an extensive set of presentations on their product roadmap.

The breadth of topics was vast, encompassing:

  • (client and data center) x86 CPUs
  • (discrete and integrated) GPUs, from enthusiast and gaming support to high performance AI-centric workloads
  • Infrastructure Processing Units (IPUs), to optimize cloud service provider efficiency
  • operating system features for managing computing threads in a multi-core complex
  • open industry standards for software development application interfaces, focused on the integration of CPU and accelerator devices

This article will attempt to summarize key features of the upcoming CPU releases; a subsequent article will summarize the balance of the presentations.

 “Performance” and “Efficient” x86 Cores

Intel introduced two new x86 core implementations – an “efficient” (e-core) and a performance-centric (p-core) offering.

The design considerations for the e-core included:

  • cache pre-fetch strategy
  • instruction cache size, and data cache size
  • L2$ (shared memory) architecture across cores
  • branch prediction efficiency, branch target buffer entries
  • instruction prefetch bandwidth, instruction retire bandwidth
  • x86 complex instruction micro-op decode and reuse strategy
  • out-of-order instruction dependency management resources (e.g., allocate/rename register space)
  • configuration of various execution units, and address generation load/store units

To maximize the power efficiency of the e-core, a wide (dynamic) supply voltage range is supported.

In the figure above, note the units associated with the x86 instructions using vector-based operands, to improve performance of the “dot-product plus accumulate” calculations inherent to deep learning software applications:

  • Vector Neural Network Instructions (VNNI, providing int8 calculations)
  • Advanced Vector Extensions (AVX-512, for fp16/fp32 calculations)

These instruction extensions accelerate neural network throughput.  Active research is underway to determine the optimal data format(s) for neural network inference (with high accuracy), specifically the quantization of larger data types to smaller, more efficient operations – e.g., int4, int8, bfloat16.  (The p-core adds another extension to further address machine learning application performance.)
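
The core operation these extensions accelerate is easy to show in plain code: multiply int8 operands and accumulate the products into a wider integer so nothing overflows. The NumPy sketch below illustrates only the arithmetic pattern, not the VNNI instruction encoding or its vector widths.

```python
import numpy as np

# Dot-product-plus-accumulate on int8 operands with a 32-bit accumulator:
# the arithmetic pattern VNNI-style instructions execute on vector registers.
a = np.array([12, -7, 88, 100], dtype=np.int8)
b = np.array([-3, 45, 17, 2], dtype=np.int8)

# Widen before multiplying so the products and their sum don't wrap at 8 bits.
partial = np.sum(a.astype(np.int32) * b.astype(np.int32), dtype=np.int32)

accumulator = np.int32(1000)    # running sum carried over from earlier chunks
accumulator = accumulator + partial
print(int(accumulator))         # 1000 + 1345 = 2345
```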

An indication of the e-core performance measures is shown below, with comparisons to the previous generation “Skylake” core architecture – one core executing one thread on the left, and four e-cores running four threads on the right:

Whereas the efficient-core is a highly scalable microarchitecture focusing on multi-core performance per watt in a small footprint, the performance-core focuses on low latency and single-threaded and multi-threaded performance, with additional AI acceleration.

For example, the p-core buffers for OOO instruction reordering management and for data load/store operations are deeper.

As mentioned above, applications are selecting a more diverse set of data formats – the p-core also adds fp16 operation support.

Perhaps the most noteworthy addition to the p-core is the Advanced Matrix Extension instruction set.  Whereas vector-based data serve as operands for AVX instructions, the AMX operations work on two-dimensional datasets.

Silicon “tiles” representing 2D register files are integrated with “TMUL” engines providing the matrix operations, as illustrated above.

The addition of AMX functionality is an indication of the diversity of AI workloads.  The largest of deep learning neural networks utilize GPU-based hardware for both training and (batch > 1) inference.  Yet, there are many AI applications where a relatively shallow network (often with batch = 1) is utilized – and, as mentioned earlier, the use of smaller data types for inference may provide sufficient accuracy, with better power/performance efficiency.  It will be very interesting to see how a general purpose CPU with AMX extensions competes with GPUs (or other specialized hardware accelerators) for ML applications.
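
The idea behind tile-based matrix extensions is straightforward to sketch: operate on small 2D sub-blocks of the matrices and accumulate partial products, rather than issuing long chains of vector instructions. The NumPy example below illustrates only the blocking pattern; the tile size is arbitrary and this is not a model of the AMX instruction set.

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Multiply A @ B by accumulating tile x tile sub-blocks, the blocking
    pattern that 2D 'tile' register files and TMUL-style engines exploit."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=A.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # One "tile op": multiply two small blocks, accumulate into C.
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

A = np.random.rand(8, 8)
B = np.random.rand(8, 8)
assert np.allclose(tiled_matmul(A, B), A @ B)
```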

Thread Director

A key performance optimization in any computer architecture is the scheduling of program execution threads by the operating system onto the processing resources available.

One specific tradeoff is the allocation of a new thread to a core currently executing an existing thread.  “Hyperthread-enabled” cores present two logical (virtual) processors to the O/S scheduler.  Dual architectural state is provided in the core, with a single execution pipeline.  Register, code return stack buffers, etc. are duplicated to support the two threads, at a small cost in silicon area, while subsets of other resources are statically allocated to the threads.  Caches are shared.  If execution of one thread is stalled, the other is enabled.  The cache memory offers some benefit to the two threads, as shared code libraries may be common between threads of the same process.

Another option is to distribute thread execution across separate (symmetric) cores on the CPU until all cores are busy, before invoking hyperthreading.

A combination of p-cores and e-cores in the same CPU (otherwise known as a “big/little” architecture) introduces asymmetry into the O/S scheduler algorithm.  The simplest approach would be to distinguish threads based on foreground (performance) and background (efficiency) processes – e.g., using “static” rules for scheduling.  For the upcoming CPUs with both p- and e-cores, Intel has integrated additional power/performance monitoring circuitry to provide the O/S scheduler with “hints” on the optimum core assignment – i.e., a runtime-based scheduling approach.  An illustration of Intel’s Thread Director is shown below.

Additionally, based on thread priority, an executing thread could transition between a p-core and e-core.  Also, threads may be “parked” or “unparked”.

P-cores support hyperthreading, whereas e-cores execute a single thread.

Intel has collaborated with Microsoft to incorporate Thread Director support in the upcoming Windows-11 O/S release.  (Windows-10 will still support p- and e-core CPUs, without the real-time telemetry-based scheduler allocation.  At the Architecture Day, no mention was made of the status of Thread Director support for other operating systems.)
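
A toy sketch of the scheduling idea (send latency-sensitive, compute-hungry threads to p-cores when telemetry suggests they will benefit, otherwise prefer e-cores) is shown below. The heuristic, thresholds and field names are invented for illustration and do not reflect the actual Thread Director hardware or the Windows 11 scheduler logic.

```python
# Toy illustration of hint-driven core selection; not the real Thread Director.
from dataclasses import dataclass

@dataclass
class ThreadHint:
    name: str
    priority: int         # higher means more latency-sensitive (foreground work)
    ipc_estimate: float   # runtime telemetry: instructions per cycle on a p-core

def pick_core_class(hint: ThreadHint) -> str:
    # Foreground, compute-hungry threads go to performance cores; background or
    # low-IPC threads are scheduled on efficiency cores to save power.
    if hint.priority >= 5 and hint.ipc_estimate > 1.5:
        return "p-core"
    return "e-core"

for t in [ThreadHint("game_render", 8, 2.4),
          ThreadHint("indexing_service", 2, 0.9)]:
    print(t.name, "->", pick_core_class(t))
```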

Alder Lake

The first release of a client CPU with the new p- and e-cores will be Alder Lake, expected to be formally announced at the Intel InnovatiON event in October.

In addition to the new cores, Alder Lake incorporates PCIe Gen 5 and DDR5 memory interfaces. The Alder Lake product family will span a range of target markets, from desktop (125W) to mobile to ultra-mobile (9W), with an integrated GPU core (Intel 7 node).

Sapphire Rapids

The first data center part with the new p-core will be Sapphire Rapids, to be formally announced in 1Q2022 (Intel 7 node).

The physical implementation of Sapphire Rapids incorporates “tiles” (also known as “chiplets”), which utilize the unique Intel EMIB bridge silicon for adjacent tile-to-tile dense interconnectivity.

Note that Sapphire Rapids also integrates the Compute Express Link (CXL1.1) industry-standard protocol, to provide a cache-coherent implementation of (heterogeneous) CPU-to-memory, CPU-to-I/O, and CPU-to-device (accelerator) architectures.  (For more information, please refer to:  www.computeexpresslink.org.)

The memory space is unified across devices – a CPU host manages the cache memory coherency.  An I/O device typically utilizes a DMA memory transfer to/from system memory, an un-cached architectural model.  With CXL, I/O and accelerator devices enable some/all of their associated memory as part of the unified (cache-coherent) memory space.  The electrical and physical interface is based on PCIe.  PCIe Gen 5 defines auto-negotiation methods for PCIe and CXL protocols between the host and connected devices.

Another unique feature of Sapphire Rapids is the application of HBM2 memory stacks integrated into the package, as depicted below.

The intent is to provide memory “tiering” – the HBM2 stacks could serve as part of the “flat” memory space, or as caches to external system memory.

Summary

Intel recently described several new products based on p-core and e-core x86 architectural updates, to be shipped in the next few calendar quarters – Alder Lake (client) and Sapphire Rapids (data center). A new Advanced Matrix Extension (AMX) microarchitecture offers a unique opportunity to accelerate ML workloads, fitting a gap in the power/performance envelope between AVX instructions and dedicated GPU/accelerator hardware. The execution of multiple threads on asymmetric cores will benefit from the real-time interaction between the CPU and the (Windows-11) O/S scheduler.

These products also support new PCIe Gen5, DDR5, and the CXL1.1 protocol for unified memory space management across devices.

As mentioned in the Introduction, optimization of systems design is based on tradeoffs in process technology, architecture, and software.  The announcements at Intel Architecture Day provide an excellent foundation for current and future product roadmaps, through successive technology generations.

-chipguy