
Performance, Power and Area (PPA) Benefits Through Intelligent Clock Networks

by Kalar Rajendiran on 12-10-2021 at 10:00 am

What is Maestro ICN

One of the sessions at the Linley Fall Processor Conference 2021 was the SoC Design session. With a horizontal focus, it included presentations of interest to a variety of different market applications. The talk by Mo Faisal, CEO of Movellus, caught my attention as it promises to solve a chronic issue relating to synchronizing clock networks. While clock synchronization reduces the chance of signal hazards, the act of synchronization leads to performance, power and area inefficiencies. Over the years, many different approaches have been deployed to reduce these inefficiencies. But most of these techniques still depend on clock mesh and/or clock tree trunks and traces and use clock buffers for fanning out the clock signals.

While Mo’s talk was titled “Clock Networks in a multi-core AI SoC,” the solution he presented is applicable to all SoCs. The following is a synthesis of what I gathered from his presentation.

Drawbacks of Traditional Solutions

Traditional clock networks are either a mesh or a tree implemented with wires and buffers. The buffers have no insight into what is going on in the SoC, so the implementation is typically overdesigned with clock buffers. Movellus claims that SoCs lose about 30%-50% of their performance to inefficiencies introduced by clock networks. In addition, the clock network imposes a significant overhead on the SoC total dynamic power (TDP) budget and introduces latency. Improving the quality of the clock distribution network can therefore improve the PPA of the entire SoC.

Movellus’ Solution

Through its intelligent clock network technology, named Maestro, Movellus can ameliorate or eliminate the inefficiencies introduced by traditional clock networks. Maestro consists of multiple components to achieve this. In his presentation, Mo shows a smart clock module (SCM) that senses and compensates for on-chip variation (OCV) effects and skew across an entire SoC. The SCM is aware of OCV, skew and temperature drift and dynamically aligns the clock network across the entire SoC, pushing the common clock point very close to the flops the clocks drive.

Movellus’ architectural innovation drives the delivery of the following three benefits.

      • Latency Reduction
      • Energy Efficiency
      • Max Throughput

While the above attributes are typical requirements for most applications, they are particularly critical for today’s AI-driven edge applications.

The Maestro solution is offered in soft IP form and fits into any EDA tool flow, making it easy to integrate into any SoC.

Some Use Cases

The Maestro technology can bring benefits to both heterogeneous and homogeneous SoCs. A heterogeneous SoC consists of many different subsystems with different priorities, whether speed, power or timing closure. Refer to the Figure below.

While Mo showcases the value of Maestro technology using a homogeneous SoC example through the bulk of his presentation, the insights gained can be directly applied to the different subsystems of a heterogeneous SoC such as the one shown above. One example is the ability to do multi-rate communication without clock-domain-crossing (CDC) FIFOs: consider an SoC with a compute core running at a higher frequency while the rest of the chip runs at half the clock rate. With the Maestro solution, data can be moved from I/O flop to I/O flop without having to add retiming flops and CDC FIFOs. In an AI SoC where the data bus is very wide, the Maestro solution will save a lot of retiming flops, reducing latency and improving PPA.

Mo calls the Maestro solution a very high-quality large-scale synchronization method at the lowest power possible.

Higher Speed

With Maestro, the common clock point is pushed very close to the flops by using the SCM. Refer to the Figure below for the intra-core example used. The core is 3 sq.mm in an N7 node, running at 2.5GHz. The divergent insertion delay was reduced from 750ps to 200ps. Even with the 5ps Maestro overhead, the OCV-driven speed sacrifice is driven down from 26% to 8.3%, delivering about an 18% gain in useful cycle time.
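To see where the roughly 18% figure comes from, here is my own back-of-envelope arithmetic, a sketch based only on the numbers quoted above and on the assumption that the "speed sacrifice" is the fraction of each cycle reserved as OCV margin:

```python
# Back-of-envelope check of the quoted intra-core numbers.
# Assumption (mine): the speed sacrifice is the fraction of each
# clock cycle held back as margin for OCV and skew.
clock_ghz = 2.5
cycle_ps = 1000 / clock_ghz          # 400 ps period at 2.5 GHz

sacrifice_before = 0.26              # 26% of the cycle lost to OCV margin
sacrifice_after = 0.083              # 8.3% with Maestro (incl. 5 ps overhead)

useful_before = cycle_ps * (1 - sacrifice_before)   # 296 ps of useful time
useful_after = cycle_ps * (1 - sacrifice_after)     # 366.8 ps of useful time

gain = sacrifice_before - sacrifice_after           # fraction of cycle recovered
print(f"useful cycle time gained: {gain:.1%}")      # about 18%, as quoted
```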

Lower Power

Traditional global clock networks typically use some variation of a clock mesh to bring the clock to all the cores, and the mesh is always on and consuming power. Refer to the Figure below for the example used. In this example, the traditional approach burns 2.5W all the time, independent of the SoC’s run-time utilization level. The total dynamic power (TDP) of the example SoC is 50W, so under the traditional approach the global clock distribution’s 2.5W is 5% of the TDP. At a 20% utilization level, however, the 2.5W is 25% of the 10W dynamic power consumption. Generally speaking, average utilization levels are well below 100%.

For this example, a Maestro implementation helps keep the global clock distribution power at or below 2.5% of the TDP under various utilization levels.
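A quick sketch of the arithmetic behind those percentages, using only the example figures above:

```python
# Clock-network power as a share of consumed dynamic power at
# different utilization levels, using the article's example figures.
tdp_w = 50.0            # total dynamic power budget of the example SoC
clock_w = 2.5           # a traditional global clock burns this constantly

def clock_share(utilization: float) -> float:
    """Fraction of the consumed dynamic power spent on the clock network."""
    dynamic_w = tdp_w * utilization
    return clock_w / dynamic_w

print(f"{clock_share(1.0):.0%} of power at full utilization")   # 5% of 50 W
print(f"{clock_share(0.2):.0%} at 20% utilization")             # 25% of 10 W
```

The fixed clock power is what hurts: as utilization drops, the 2.5W stays constant while everything else scales down, so the clock's share of the power actually consumed balloons.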

Resultant Benefits

While the above examples quantified the efficiency gains along speed and energy dimensions, there are other tangible benefits from using the Maestro technology. For example, the ease of handling multi-rate clocks in a heterogeneous SoC. Another example is the ease of implementing the global level clock network. Once the intra-core clock network is fixed, the global clock network gets automatically corrected. All that is needed is to hook it up with a normal global level clock tree straight out of clock tree synthesis. There is no need to balance the global clock distribution. The die area savings and latency reduction through the avoidance of a large number of buffers and/or retiming flops could be significant too.

New Opportunities to Innovate

Mo encourages SoC architects and implementation specialists to think of new use cases Maestro technology could enable in their designs. What can one do with a large-scale synchronization capability like this? Does this help with simplification of software? What can you do with extra timing margin?

Mo closes his talk with the following teaser. He suggests that the performance sacrificed to accommodate OCV effects is only a third of the performance gain that the Maestro solution can deliver to an SoC. There are other details of the Maestro architecture which were not disclosed during the presentation. For more details, contact Movellus.

Also Read:

Advantages of Large-Scale Synchronous Clocking Domains in AI Chip Designs

It’s Now Time for Smart Clock Networks

CEO Interview: Mo Faisal of Movellus


Podcast EP52: A Preview of the Upcoming IEDM Meeting

by Daniel Nenni on 12-10-2021 at 10:00 am

Dan is joined by Srabanti Chowdhury, the publicity co-chair for IEDM, which will be an in-person conference December 11-15 at the Hilton San Francisco Union Square. Dan explores the topics to be discussed at the upcoming meeting and what they suggest about the future of semiconductors.

Srabanti Chowdhury is an associate professor of Electrical Engineering (EE) and a Senior Fellow of the Precourt Institute at Stanford. She leads the Wide Bandgap (WBG) Lab at Stanford, where her research focuses on wide bandgap (WBG) and ultra-wide bandgap (UWBG) materials and device engineering for energy-efficient and compact system architectures for various applications, including power, RF, computation, and emerging ones. Besides gallium nitride, her group is exploring diamond for various active and passive electronic applications, particularly thermal management.

Srabanti received her M.S. and Ph.D. in Electrical Engineering from the University of California, Santa Barbara, working on vertical GaN switches.

She received the DARPA Young Faculty Award, NSF CAREER award, and AFOSR Young Investigator Program (YIP) award in 2015. In 2016 she received the Young Scientist award at the International Symposium on Compound Semiconductors (ISCS). She is a senior member of IEEE and an alumna of the NAE Frontiers of Engineering. She received the Alfred P. Sloan fellowship in Physics in 2020. To date, her work has produced over 6 book chapters, 90 journal papers, 110 conference presentations, and 26 issued patents. She serves on the program committees of several IEEE conferences, including IRPS and the VLSI Symposium, and on the executive committee of IEDM. She serves as an Associate Editor of IEEE Transactions on Electron Devices, as well as on two committees under the IEEE Electron Device Society (the Compound Semiconductor Devices & Circuits Committee and the Power Devices and ICs Committee).

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Ansys CEO Ajei Gopal’s Keynote on 3D-IC at Samsung SAFE Forum

by Tom Simon on 12-09-2021 at 10:00 am

Ajei Gopal talks about 3D IC

System on chip (SoC) based design has long been recognized as a powerful method to offer product differentiation through higher performance and expanded functionality. Yet, it comes with a number of limitations, such as high cost of development.  Also, SoCs are monolithic, which can inhibit rapid adaptation in the face of changing market needs. Furthermore, integration of mixed elements into a single die, such as memory, RF, FPGA, CMOS, optical etc. can complicate product delivery. These factors have led to the growth of 2.5D and 3D-IC which can offer a high degree of package level integration while providing flexibility and freedom from yield risks and extra costs associated with single die SoCs.

At the recent Samsung Advanced Foundry Ecosystem Forum, Ajei Gopal, President and Chief Executive Officer of Ansys, gave a keynote address that focused on this issue and the new types of analysis that will be needed to enable system growth through 3D-IC. Ajei spoke about how Samsung’s eXtend-Cube (X-Cube) can offer integration of multi-die assemblies to create compact, high-performance systems. According to Ajei, X-Cube is suitable for 5G, AI and high-performance computing, as well as wearables and IoT.

Ajei said that to facilitate rapidly building 3D-ICs, physics-based simulation can be used to account for all the effects that need to be considered in these new designs. The twist is that many differing materials are now being combined in a single package. There are new requirements for structural and fluid simulations that are critical to predict cooling and thermal warping and to ensure reliable solder ball connections. Electromagnetic interactions will also become more significant.

Ajei cited an example where a customer used RedHawk-SC to model current flowing through thousands of microbumps and predicted that in some locations there would be enough heat to melt the bumps. This would have led to a catastrophic failure of the 3D-IC module.

The real crux of what Ajei had to say was that while 3D-ICs are necessary for the innovations that the market calls for, to meet these needs a partnership is needed between multiple vendors to offer a complete and comprehensive solution. Not only has Ansys partnered with Samsung in areas like sign-off for EM effects in 3D-IC modules, but a broader partnership is required to satisfy design needs.

Ansys has partnered with Synopsys to integrate RedHawk, HFSS and IcePak into Synopsys 3D-IC compiler to provide highly accurate signal, thermal and power data. This combination of tools assures faster design closure with fewer iterations. Designers can also use Ansys SeaScape to apply machine learning algorithms to help filter analysis scenarios and dramatically trim analysis time.

It’s been widely understood for decades that no single vendor can provide the optimal solution for the complexities of IC, and now 3D-IC module, design. Ajei emphasized that any given analysis tool for simulation of multi-physics can take decades of effort to implement and validate. It makes the most sense to leverage several vendors to create an optimal solution. It’s best for designers when vendors work together proactively, instead of asking users to cobble something together. It was heartening to see this spirit of cooperation emphasized at this Samsung event. The only way designs that meet market needs will be produced is through multilateral cooperation. The Samsung SAFE event is available for on-demand viewing online, including the keynote address and the individual partner presentations.

Also Read

Ansys to Present Multiphysics Cloud Enablement with Microsoft Azure at DAC

Big Data Helps Boost PDN Sign Off Coverage

Bonds, Wire-bonds: No Time to Mesh Mesh It All with Phi Plus


RedCap Will Accelerate 5G for IoT

by Bernard Murphy on 12-08-2021 at 6:00 am


You could be forgiven for wondering why I should push 5G when it might seem marketing is still ahead of deployment. While we may not all have it today, GlobeNewswire (September 22, 2021 12:30 ET) estimates there will be 700 million 5G connections across the world by the end of this year. That’s pretty rapid growth already, though still mostly driven by subscriber adoption. However, a key goal of 5G was always to extend cellular far beyond our phones, to trillions of IoT endpoints. Release 17, and later Release 18, from 3GPP are already moving to make these use models much more real, especially around a new standard called 5G RedCap.

Redefining the network

5G is a major advance over LTE, designed not only for performance but also for scalability to trillions of nodes. The classical cellular model, endpoints communicating with base stations which then communicate with central stations, is not scalable to that level. The way to resolve this problem is through disaggregation, distributing compute and radio management within the network. A central unit communicates with distributed units, which in turn connect with radio units (base stations or small cells) which in turn connect to endpoints. The tree structure is more scalable than a star structure. In a tree, each gateway and radio head can make manageably large numbers of connections.
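A toy calculation makes the scalability point concrete. With some fan-out per tier (the 1000 below is purely illustrative, not a 3GPP figure), a tree reaches an enormous endpoint count while no single node ever terminates more than its own fan-out of connections:

```python
# Tree vs star scalability sketch. In a tree with fan-out f per
# tier and d tiers, the reachable endpoint count is f**d, but each
# node only manages f direct links. A star needs one direct link
# per endpoint at the central station.
def endpoints_reachable(fanout: int, tiers: int) -> int:
    return fanout ** tiers

# Central unit -> distributed units -> radio units -> endpoints,
# with a hypothetical fan-out of 1000 at each tier:
print(endpoints_reachable(1000, 3))   # one billion endpoints
# A star topology would need the central station to terminate a
# billion connections itself; in the tree, no node handles more
# than 1000.
```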

But the infrastructure must locally handle a lot more processing, because each node must condense raw traffic for upstream nodes. More compute, AI/ML capability and beamforming features move into those cells and distributed units to handle many different classes of traffic, from safety-critical functions for cars and remote surgery, to factory automation, to mobile gaming and 8K streaming on your phone. Network operators are then able to provide software-driven network slicing to tier these services, so cat videos can’t crowd out traffic-safety or surgical traffic.

The hardware supporting these functions can’t be general purpose CPUs. The hardware must provide a lot of horsepower certainly, but also AI/ML and signal processing, as well as general compute. Which is why you see the big mobile network equipment makers (and even operators) getting active again in chip design and chip partnerships. Open-RAN accelerates competition in this area, stimulating product advances not only from existing infrastructure builders but also new players.

Next RedCap

The IoT is not a monolithic producer and consumer of mobile traffic. Some devices can get by with short and infrequent bursts, suitable for a standard like NB-IoT. But some endpoints need more bandwidth. Surveillance cameras and AI glasses will work with video streams, or at least frequent abstracted streams (detected objects and AR overlays). Vehicle V2X and telemetry, on the other hand, aim to support safety, traffic updates, emergency reporting and over-the-air software updates. All of these require decent performance and bandwidth.

This is where RedCap, short for Reduced Capability, comes in. Nir Shapira (Director of Strategic Technologies at CEVA) explained RedCap to me this way. The 5G triangle splits usage into enhanced mobile broadband (eMBB) at the top, with ultra-reliable low latency communications (URLLC) and massive machine type communications (mMTC) forming the bottom two corners. RedCap sits somewhere between eMBB and URLLC, offering performance similar to LTE while also being able to take advantage of 5G infrastructure features such as network slicing and local intelligence in nearby infrastructure.

More disaggregation, more options, more opportunity

Disaggregation and Open-RAN create a lot of opportunity for chip and module makers in the infrastructure. RedCap adds opportunity for IoT solution builders who need bandwidth and potentially some of the 5G infrastructure services, at lower power/energy consumption than a 5G mobile phone. That’s likely to cover a lot of use-cases. Maybe you should talk to CEVA when you’re building your 5G cellular IoT product plans😀. They already have an impressive footprint in endpoint and infrastructure applications.

Also read:

CEVA Fortrix™ SecureD2D IP: Securing Communications between Heterogeneous Chiplets

AI at the Edge No Longer Means Dumbed-Down AI

Ultra-Wide Band Finds New Relevance


Webinar on Dealing with the Pain Points of AI/ML Hardware

by Tom Simon on 12-07-2021 at 6:00 am

Achronix FPGA for AI/ML

Ever-increasing data handling demands make creating hardware for many applications extremely difficult. In an upcoming webinar Achronix, a leading supplier of FPGAs, talks about the data handling requirements for AI/ML applications – which are growing at perhaps one of the highest rates of all. Just looking at data generated and consumed in general, the webinar host Tom Spencer, Senior Manager of Product Marketing at Achronix, points to the 294 million emails, 230 million tweets and over a billion searches performed daily. The worldwide total of stored data has accelerated from 4.4 zettabytes in 2018 to 44 ZB in 2020 and is expected to grow to 175 ZB by 2025. A zettabyte is 10^21 bytes.

AI/ML applications are especially burdened because they rely on rapidly growing training sets, network models and data used for inference. According to Tom, there are a number of significant pain points associated with developing hardware for AI/ML. Indeed, the title of the webinar is “How to Overcome the Pain Points of AI/ML Hardware”. Tom artfully narrows down the competing accelerator choices: GPU, FPGA and ASIC. He sees FPGAs as offering the most flexibility. FPGAs provide low latency and can get much more work done in a clock cycle than the alternatives. Also, FPGAs can handle massive amounts of data due to their dataflow structure.

OK, but what are the pain points? Tom is prepared to talk about the three pain points that must be dealt with to deliver hardware that can handle the task.

Compute power has been a limiting factor in building AI/ML solutions. AI/ML requires trillions of integer and/or floating-point operations per second. The data formats needed include fixed- and floating-point from 3 bits to 64 bits, and now often include newer formats such as Block Floating Point (BFP) and bfloat16.
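To make one of those formats concrete: bfloat16 is essentially an IEEE-754 float32 with the low 16 bits dropped, keeping the full 8-bit exponent and a 7-bit mantissa, which is why hardware conversion is so cheap. A minimal Python sketch (simple truncation; some hardware rounds to nearest even instead):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    # Reinterpret as IEEE-754 float32 and keep the top 16 bits:
    # 1 sign bit + 8 exponent bits + 7 mantissa bits.
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16              # truncation, no rounding

def bfloat16_bits_to_float32(b: int) -> float:
    # Zero-pad the dropped mantissa bits to recover a float32.
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

x = 3.140625                         # fits exactly in a 7-bit mantissa
roundtrip = bfloat16_bits_to_float32(float32_to_bfloat16_bits(x))
print(roundtrip)                     # 3.140625 -- no precision lost here
```

Because bfloat16 keeps float32's exponent range, converting between the two is just a shift, while the narrower mantissa halves storage and datapath width.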

Data has to be able to move on and off chip rapidly, otherwise processing will fall behind. Applications such as autonomous driving need to support high frame rates for high-resolution video. The need to achieve timing closure and build interfaces from scratch adds to the burden.

Similar to external data movement, FPGAs need the ability to move data internally to facilitate the data flow in the neural network. AI/ML requires huge numbers of parallel processing elements that store and pass data internally. In many cases the result is timing closure issues, or precious FPGA logic resources being used up for this task.


The webinar will talk about how the Achronix Speedster7t FPGA family can address each of these pain points, making system design much easier and delivering improved performance. The Speedster7t is available as a stand-alone FPGA device, embeddable FPGA IP or in a packaged solution – such as the VectorPath accelerator card.

Achronix Speedster7t has specific features that work together to enable AI/ML workloads. The webinar will discuss each of them in detail; I can summarize here. First of all, there are specialized Machine Learning Processors (MLPs) available as resources for AI/ML operations such as MAC. There are over 2,500 MLPs per device, each with control, arithmetic and storage functions.

Next, the Speedster7t FPGA fabric is built with a 2D Network on Chip (NoC) that handles data transfers from one element to another. Because it is separate from the FPGA fabric elements, valuable resources are not used just to transfer data across the array. The NoC is high speed, with more than 20 Tbps of aggregate bidirectional throughput.

Lastly, moving data on and off chip to external storage is accelerated by high-speed GDDR6 and DDR4 interfaces. The GDDR6 support provides 8 controllers with 16 lanes for massive parallelism and flexibility. The DDR4 support provides 64-bit interfaces to 128 GBytes of RAM.

Achronix offers comprehensive software support for AI/ML applications with a wide selection of frameworks, neural network models and development systems. They are targeting solutions such as CNNs, RNNs, Transformer Networks and Feed Forward.

This webinar should provide a lot of useful information to developers of AI/ML hardware who are looking for a smoother path to a working product. Achronix has proven that they offer innovation, such as their embeddable FPGA fabric, 2D NoC and high-speed interfaces. The webinar can be viewed on December 16th at 10AM PST. Reserve your spot here.


CEO Interview: Fares Mubarak of SPARK Microsystems

by Daniel Nenni on 12-06-2021 at 10:00 am

Fares Mubarak profile

Fares Mubarak is a seasoned Global Executive with more than 30 years of broad management and hands-on experience spanning semiconductor design, software development, operations, sales, marketing, applications, EDA and healthcare IT.

Mubarak was VP/GM of the Semiconductor Business Unit followed by VP of Semiconductor Industry Sales and Business Development at ANSYS, the world’s leader in engineering simulation.

Before ANSYS, Mubarak was President of TeleResults, a Healthcare IT company focused on transplant and organ disease patient management. In his prior role, Mubarak was Sr. Vice President of Marketing and Engineering at Actel Corporation, a fabless Field Programmable Gate Array leader that was acquired by Microsemi Corporation.

Prior to his 18-year tenure at Actel, Mubarak held various management and engineering roles at AMD and Samsung Semiconductor. Mubarak holds an MSEE degree from Case Western Reserve University and an MBA from Golden Gate University.

What is the SPARK Microsystems backstory?
Analysts have predicted that the number of connected devices may reach 29.3 billion by 2023, indicating a CAGR of 20% since 2011. At this growth rate there will be seven devices for every human being on the planet within the next 5 years. Some of this growth is driven by traditional long-range communications and networking applications. Advanced wireless communication technologies such as 5G and WiFi 6 support these markets. However, a significant portion of this growth is expected to be fueled by new and exciting short-range wireless applications such as personal area networks, AR/VR, gaming, positioning and IoT edge devices. These markets are expected to grow beyond $2 trillion by 2030. Legacy short-range wireless protocols still rely on radio architectures developed in the 1990s, forcing engineers to make significant compromises in their designs and product offerings. SPARK Microsystems is at the forefront of developing advanced ultra-wideband technologies for the next generation of short-range wireless devices.

SPARK Microsystems is unique in the ultra-wideband (UWB) market in that we recognized UWB’s untapped potential for high-speed multimedia and data communications at extremely low latency and low power. The SPARK Microsystems suite of UWB transceivers, the SR1000 family, has been designed specifically to meet these needs while operating reliably in noisy RF environments. Moreover, SPARK Microsystems’ UWB ICs consume an order of magnitude less power than Bluetooth Low Energy (BLE), the lowest energy, short-range wireless connectivity technology commercially deployed today.

While UWB is mostly being leveraged for ranging and positioning applications today, big opportunities are also in store for a new realm of short-range wireless connectivity applications – well beyond what we can imagine today. The capabilities of the SPARK Microsystems SR1000 family will be invaluable for these types of wireless application – and it’s potentially a long list of apps. We’re encouraged to see some of the world’s largest technology powerhouses together pouring billions of dollars into UWB technology today – collectively we’re looking forward to advancing some major market opportunities.

What are SPARK Microsystems’ product differentiators?
With SPARK Microsystems UWB wireless transceivers, huge volumes of data and high-quality, uncompressed audio and multimedia can be delivered with 60X lower latency and 40X better energy efficiency than legacy wireless ICs. This is hugely beneficial not only for consumer wireless applications, but also for the myriad IoT, smart city and AI applications on the horizon that will require UWB-caliber, high-speed communication among sprawling networks of battery-powered wireless sensors.

The SPARK Microsystems SR1000 UWB IC family fully leverages the UWB spectrum to simultaneously deliver industry-leading energy efficiency, latency and bandwidth, enabling consumers to wirelessly connect to a broad range of devices within their personal area network. Simply put, we can finally have wired-like experiences without any of the wires. With a proven sub-250-microsecond latency, longer battery life, faster data transmission and uncompressed audio, SPARK Microsystems delivers to gamers a new generation of wireless mice, headsets and other peripherals that close the performance gap with wired alternatives once and for all. These benefits transfer into other applications, like audio streaming and AR/VR/XR, as well.

In the IIoT environment, SPARK Microsystems allows UWB wireless sensor solutions to last 5X-10X longer on the same battery, and its ultra-low latency enables robust and high-performance mesh networks in noisy RF environments. SPARK Microsystems UWB-based sensors ensure that a mere 20% or less of the sensor power budget is consumed by the wireless comms chip. Depending on how you’re using your sensors, this could enable operations for many years before a drained battery ever becomes an issue. With so little power consumed by the UWB chip, this also opens the door to a future of battery-less sensors powered by nothing more than ambient indoor light, or even body heat.

Where have you seen the most market traction?
We’re seeing vast technology and market potential for UWB within the consumer technology market, with major implications for the next-generation of smartphones, wireless gaming peripherals, audio earphones and much more. And UWB is great for positioning apps, but this represents only a minor share of UWB’s potential. Our customer traction is predominantly in low power, low latency, high bandwidth data communications for high-res audio and consumer devices, such as gaming accessories and Extended Reality (XR) applications.

SPARK Microsystems’ UWB chips are ideally and uniquely positioned to excel within the next generation of XR (a superset of Augmented Reality (AR), Virtual Reality (VR) and Mixed Reality (MR)) eyewear, headsets and peripherals. Analysts have projected that XR could deliver a $1.5 trillion boost to the global economy by 2030, observing that “XR technology can benefit virtually all industries.” Relative to Bluetooth, the gains in data throughput, reductions in latency and increases in energy efficiency afforded by SPARK Microsystems’ UWB improve responsiveness and reduce lag to synchronously harness all our senses and deliver ultra-immersive XR experiences in a way we’ve never experienced before.

These benefits are what make SPARK Microsystems’ UWB so exceptionally attractive to the XR market going forward and the technology will no doubt prove to be a major asset for future AR/VR/MR/XR hardware development initiatives.

How is SPARK Microsystems contributing to the advancement of the UWB standards? What are some elements that must be included in the next evolution of the UWB standards?
The accelerated development and commercialization of UWB technology presents a massive market opportunity for low latency, low power wireless sensing and communications. As such, SPARK Microsystems is a member of both the UWB Alliance and the FiRa™ Consortium to accelerate the development and adoption of UWB technology. We are working with both organizations to influence regulatory matters and develop international UWB technology standards.

In the next iteration of the IEEE UWB standards, we hope to see a stronger emphasis placed on the data communications architecture. Data delivered over the UWB spectrum can be sent in microseconds with extremely low latency, enabling ultra-efficient wireless data communication. Contributing our knowledge and expertise to both the UWB Alliance and the FiRa Consortium allows us to have an influence on the nascent UWB technology and IEEE standards in multiple industries.

What opportunities will this technology enable in the future?
We see a massive opportunity for UWB to improve the use of AI and edge computing, especially in IoT and IIoT sensor node applications. AI’s benefits are reliant on vast amounts of data being transmitted in real-time, but current low-power wireless solutions significantly restrict the amount of data that systems can transmit. SPARK Microsystems’ UWB enables high-speed, high-bandwidth data transmission and low power processing at the edge to feed AI engines. We envision a future of smart homes and smart buildings with wireless connectivity and battery-less sensor operations, which significantly reduces the carbon footprint.

There is also an opportunity for UWB to serve as the last mile alongside long-haul 5G. With considerably more efficient data transmission, inherently lower latency, and substantially lower power requirements, UWB allows for increased connectivity and reliability, as well as better coverage of large areas. SPARK’s UWB can make it possible to wirelessly connect devices and wirelessly stream rich multimedia and audio content with zero latency over emerging 5G networks.

Also Read:

CEO Interview: Mo Faisal of Movellus

CEO Interview: Da Chaung of Expedera

CEO Interview: Charbel Rizk of Oculi


Enlisting Entropy to Generate Secure SoC Root Keys

by Tom Simon on 12-06-2021 at 6:00 am

NVM attacks

Most methods of securing SoCs involve storing a root key that provides the basis for all derived keys and for encryption of communication. The weakness with these methods is that even if the root key is stored in secure non-volatile memory, there are often ways to read it out. Once a key has been divulged, the device can be cloned and its security is compromised. With long and complex supply chains, physical devices are likely to come within reach of attackers at some point. With physical access, made easy through supply chains or remote deployment, as is often the case with IoT devices, keys stored in eFuses, Flash EEPROM or even OTP NVM can be extracted.

Weaknesses of Traditional Non-Volatile Storage

Taking Advantage of Variation

It turns out that designers can enlist the silicon physical properties that usually annoy them to solve this problem. Entropy is normally the enemy of chip designers because it leads to variations in chip operation that affect performance and yield. Intrinsic ID, however, uses the unavoidable small variations that occur during manufacturing to create unique and secure root keys. As any chip designer knows, before memories are initialized their contents are unknown. Small variations among the devices in an SRAM cell determine whether it powers on in the 1 or the 0 state, and these variations are stable enough to give each cell a high probability of entering the same state every time. So, like a fingerprint, there is a repeatable but unique pattern that can be read. This behavior can be used to create what is called a Physically Unclonable Function (PUF).
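
The power-on behavior described above can be illustrated with a small simulation. The following Python sketch is a toy model, not Intrinsic ID's implementation; the cell count, bias distribution and noise level are invented for illustration. Each SRAM cell gets a fixed manufacturing bias plus a little thermal noise, and repeated power-ups yield a mostly stable fingerprint:

```python
import random

random.seed(42)
N_CELLS = 256

# Each cell gets a fixed manufacturing bias toward 0 or 1.
# Most cells are strongly biased; a few sit near the 0.5 threshold.
bias = [random.gauss(0.5, 0.25) for _ in range(N_CELLS)]

def power_up(noise=0.05):
    """Simulate one power-on: bias plus small thermal noise decides each bit."""
    return [1 if b + random.gauss(0, noise) > 0.5 else 0 for b in bias]

reference = power_up()   # "fingerprint" captured once, e.g. at enrollment
later = power_up()       # a later power-up of the same (simulated) chip

# Count how many bits differ between the two readouts.
flips = sum(r != l for r, l in zip(reference, later))
print(f"{flips} of {N_CELLS} bits flipped ({100 * flips / N_CELLS:.1f}%)")
```

Most cells reproduce the same value on every power-up; only the handful of weakly biased cells flip, which is exactly the noise that the error-correction step described next must absorb.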

Intrinsic ID uses the initial values of a region of SRAM in combination with algorithms that account for any inconsistencies in the result to generate a root key on the fly for use by the root of trust. Derived keys can be created from this root key as well. To facilitate the generation of the root key, the enrollment process generates helper data that get stored locally. This helper data cannot be used to reverse engineer the root key, so even if it is read out, the root key is still secure.
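
A common way to realize this helper-data idea is the code-offset construction from fuzzy-extractor theory. The sketch below is an illustrative textbook construction, not Intrinsic ID's proprietary algorithm; the key size and the simple 5x repetition code are made up for clarity. Enrollment XORs the SRAM response with a random codeword to form helper data; reconstruction XORs the helper data with a later (noisy) response and majority-votes the bit errors away:

```python
import random

random.seed(7)
REP = 5          # repetition factor: corrects up to 2 bit errors per key bit
KEY_BITS = 16

def encode(key):
    """Repetition-code encode: each key bit becomes REP identical bits."""
    return [b for bit in key for b in [bit] * REP]

def decode(codeword):
    """Majority-vote each group of REP bits back into one key bit."""
    return [1 if sum(codeword[i * REP:(i + 1) * REP]) > REP // 2 else 0
            for i in range(KEY_BITS)]

def enroll(sram_response):
    """One-time enrollment: helper data = codeword XOR response.
    In this toy model the random codeword masks the key, so the helper
    data can be stored in ordinary non-volatile memory."""
    key = [random.randint(0, 1) for _ in range(KEY_BITS)]
    helper = [c ^ s for c, s in zip(encode(key), sram_response)]
    return key, helper

def reconstruct(noisy_response, helper):
    """Every boot: XOR helper data with the (noisy) SRAM response and
    let the error-correcting code remove the bit flips."""
    return decode([h ^ s for h, s in zip(helper, noisy_response)])

# Simulate an enrollment readout, then a later noisy readout of the same SRAM.
response = [random.randint(0, 1) for _ in range(KEY_BITS * REP)]
key, helper = enroll(response)

noisy = list(response)
for i in (0, 1, 7, 12, 33):   # flip a few bits (at most 2 per repetition group)
    noisy[i] ^= 1

assert reconstruct(noisy, helper) == key
print("root key reconstructed despite bit errors")
```

Real deployments use far stronger codes than repetition, but the flow is the same: the key is regenerated from the silicon on every boot and never sits in storage.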

Flexible Implementation

Intrinsic ID offers three ways to take advantage of PUF-based secure key storage. For SoCs, their QuiddiKey hardware IP can be used in conjunction with their software driver. All that is needed is standard SRAM; no new mask layers or special processes are required. The hardware and drivers contain attack countermeasures, and the solution is standards compliant and NIST CAVP certified. For reliability, advanced error correction guarantees operation from -55˚C to +155˚C. There is even anti-aging to ensure consistency over a long useful life, along with support for multiple derived keys that are also secure.

Intrinsic ID’s Security Solutions

For FPGA-based designs they offer their Apollo product, which includes RTL for the FPGA fabric and software drivers that support all the necessary functionality. For MCU-based systems, the on-chip SRAM can be used with key generation taking place in software; their BK software suite serves this application. Regardless of which implementation is used, the root key is never stored in non-volatile memory. The key never leaves the security subsystem, and the only data that is stored is public.

High Security and Convenience

Intrinsic ID’s solution offers many advantages. Along with extremely high security, it is low cost because it can be used on any conventional process. It comes with hardware-based random number generation (RNG), accessible through their certified software driver. The PUF-enabled products have been certified by EMVCo, CC EAL6+, PSA, ioXt and GlobalPlatform. With 300 million ICs already using this technology in areas such as G&D, banking and IoT, they have plenty of experience meeting customer needs for security. More information is available at www.intrinsic-id.com/products.

Also Read:

Using PUFs for Random Number Generation

Webinar: How to Protect Sensitive Data with Silicon Fingerprints


Live 58th Design Automation Conference Coverage!

Live 58th Design Automation Conference Coverage!
by Daniel Nenni on 12-05-2021 at 10:00 am

Dan and Shushana Friday Harbor San Juan Islands

My beautiful first mate and I will be together at DAC this year. Her first DAC was 1985 in Las Vegas and we lived happily ever after. SemiWiki bloggers Tom Dillinger and Daniel Payne will also be at DAC attending sessions and meeting with exhibiting companies to learn and blog about the latest innovations inside the semiconductor ecosystem.

This year DAC will start with the traditional Sunday night reception and opening keynote, “EDA Growth Accelerates as Moore’s Law Slows”, at 5pm on Sunday by Charles Shi, PhD, Vice President and Research Analyst, Semiconductors & Semiconductor Equipment, at Needham & Company, LLC. This is a must-attend event for all EDA people.

Description:
It may be counter-intuitive to argue that the electronic design automation (EDA) industry could see accelerated growth because Moore’s Law is slowing down. In this presentation, I will walk you through my reasons why this could be the case. We believe the slowing of Moore’s Law has led to design diversification, with domain-specific chip designs replacing one-size-fits-all designs; has motivated systems companies to enter the silicon race; and has nurtured the recent renaissance of semiconductor startups. EDA, IP, and foundry are key enablers and beneficiaries of these trends. In addition, the slowing of Moore’s Law means chip-level scaling must be complemented with package- or system-level scaling, which creates a greater need for system design and analysis that will significantly expand the scope of EDA as well as its market size.

We are convinced that the strong growth of EDA in 2020 and 2021 was not a “Covid phenomenon” but the beginning of a new era that will feature strong double-digit growth for the EDA industry. Last but not least, we believe EDA can play a key role in mitigating the global chip shortage, which may last beyond 2022, as foundries push more designs to migrate to sub-20nm nodes. We argue that design migration to sub-20nm nodes is an underappreciated alternative to massive capacity additions at 28nm and above for easing the chip shortage.

Charles joined our podcast last week if you would like to hear our banter on EDA. Next is the welcome reception, normally the best networking opportunity of DAC. If you are there, please introduce yourself. It would be a pleasure to meet you all.

There will be a couple of interesting book signings on the exhibition floor. On Monday and Tuesday Wally Rhines will be signing free copies of his book “Predicting Semiconductor Business Trends” in the Infinisim booth #1652. My beautiful wife and I will be there as well. In booth #1543 S2C EDA will be giving away copies of my book “Prototypical II The Practice of FPGA Prototyping for SoC Design“. I will be there from 1-2pm on Monday and Tuesday for signings.

The rest of my time will be spent at the DAC keynotes and walking the exhibition floor, meeting with friends and people from the semiconductor ecosystem who I consider family.

I hope to see you there!

About DAC

The Design Automation Conference (DAC) is recognized as the premier conference for design and automation of electronic systems.  DAC offers outstanding training, education, exhibits and superb networking opportunities for designers, researchers, tool developers and vendors.


A Next-Generation Prototyping System for ASIC and Pre-Silicon Software Development

A Next-Generation Prototyping System for ASIC and Pre-Silicon Software Development
by Kalar Rajendiran on 12-05-2021 at 6:00 am

Corigine Prototyping Systems

Every now and then, a disruptive technology is brought to market, challenging the way things have been done to that point. We are all familiar with many such technologies. The rhetorical question is, how many of us were aware of, recognized and acknowledged those technologies before they became well established? A startup called Corigine has been rethinking prototyping and emulation solutions for semiconductor products. They are on a mission to make prototyping capability accessible to a wider audience of software and hardware engineers, right from their desktops, and to offer far greater onboard capability than is currently available in traditional prototyping solutions, thereby relieving some of the dependence on expensive emulation infrastructure. Corigine expects to disrupt the widely used traditional prototyping/emulation model with their recently announced solutions. Before a product gets recognized and acknowledged for its value, awareness of the product needs to happen. This article is about bringing that awareness.

Before reviewing these products from Corigine, a little background is useful to understand the impact such solutions could have on the prototyping and emulation market. Whether inventing or innovating, prototyping comes into play. In invention, prototyping is part of the process itself; in innovation, though not essential, prototyping is invariably done for pragmatic reasons, including cost optimization and time-to-market reduction for the complete product. While the basic reasons for prototyping are to verify conformance to specifications and validate performance against customer and product expectations, there are other reasons too. Most products are not very useful without software running on them, so a product launch requires having both the hardware and software ready at the same time. Software developers seek a head start rather than waiting for the hardware to be ready in its final form.

Hardware/software co-verification has become a very important aspect of product development, enabling integration of software (sw) with hardware (hw) well before final chips and boards become available. A good prototyping system should allow not only easy verification of the hardware but also hardware validation, software development, debugging and hw/sw integration. It is in this context that the two product announcements from Corigine are of interest. Corigine is a fabless semiconductor company that designs and delivers leading-edge I/O and networking products, IP and some EDA solutions.

In the summer of this year, Corigine introduced a prototyping platform called the MimicPro™ system for SoC, ASIC and IP subsystem verification and pre-silicon software development. The platform makes prototyping capabilities easily accessible to hardware verification engineers and software developers at the pre-silicon stage, shortening the R&D cycle for final products. It is a system built using Xilinx UltraScale™ FPGAs with the goal of optimizing functionality and performance. The press release can be accessed here.

In November, Corigine expanded their toolkit in this space with the MimicTurbo GT card. This PCIe-based card brings silicon verification and software R&D capabilities right to an engineer’s desktop. The press release can be accessed here.

Requirements For Next-Generation Prototyping Solutions

A next-generation prototyping solution should cost-effectively enable hardware/software co-verification in addition to offering the following functionality.

  • Enable Software development
  • Greater automation, for partitioning and more
  • Necessary debug capabilities and system view
  • Multi-user access to hardware for cost-management
  • Useful scalability for handling small/simple designs to large/complex SoCs
  • Suitability for use over the cloud and within the enterprise
  • Security for IP in use

In addition, the automotive market has some very stringent safety requirements for semiconductor/electronic products that should be addressed by a prototyping solution.

Corigine Solutions

The following is a synthesis of what I gathered by reviewing/previewing Corigine’s product brochures on MimicPro and MimicTurbo GT solutions, respectively.

MimicPro FPGA Prototyping System

The system enables early software development, system validation and regression testing, while significantly reducing development time and workload. It offers a full system view with high visibility for rapid debug through a System Scope Logic Analyzer. Its automatic clock handling eliminates the manual handling of gated clocks, which is error-prone and adds unnecessary engineering workload. The auto-partitioning feature reduces the need for manual intervention by automating pin-muxing and instrumentation. While the system addresses all the requirements identified in the earlier section, the following are worth special mention.

Scalability

  • Scales from 30 million gates to 1 billion gates (1 to 32 FPGAs)

Support for Automotive Safety

  • FMEDA, HIL, ADAS (ISO 26262)
  • Fault injection (Force/Release) capabilities that are essential for debugging and functional safety prototyping

MimicPro System

MimicTurbo GT Card

The card can be deployed in a 16-lane PCI Express slot and supports 64 GTY transceivers (16 quads) along with the essential I/O interfaces, including FMC and FMC+ connectors. Bundled with the MimicTurbo software, the solution delivers best-in-class automated partitioning for larger system-on-chip designs. The MimicTurbo software leverages the Xilinx VU19P HSTPM IP for extraordinarily low latency with I/O pin-muxing across transceivers.

Deployment Ease

The Corigine MimicTurbo GT card is designed for quick installation in a PCIe system. The card provides the necessary interfaces and pre-built connectors that enable users to quickly deploy the hardware environment with the Xilinx® Virtex® UltraScale+™ FPGA XCVU19P. To further ease deployment, the card can also be configured as a benchtop standalone platform.

Performance

The Corigine MimicTurbo GT offers automated FPGA partitioning and interconnect while leveraging the high-speed Xilinx GT (gigabit transceiver) I/O connections between multiple FPGAs to deliver multi-gigabit-per-second performance. The card provides GT pin-muxing and automatic clock control, and features a 16-lane PCI Express interface along with DDR4 component memory for performance.

Scalability

The Corigine MimicTurbo GT provides modular upgradability from an entry point of a single FPGA card with 48 million gates to multiple cards deployed in a single or multiple PCIe systems.

MimicTurbo GT Card

Availability

MimicPro Systems: General availability of 1-, 4- and 32-FPGA based systems. Configurations containing more FPGAs will follow.

MimicTurbo GT Card: Sample availability starts in December 2021.

Summary

The MimicPro system offers the scalability needed for verifying and validating complete SoCs and IP subsystems and delivers deep local debug capabilities for quicker elimination of bugs. The MimicTurbo GT card, along with the MimicTurbo prototyping software, simplifies the deployment of FPGA-based prototyping at the desktop. Together, they greatly accelerate silicon verification and pre-silicon software development at semiconductor companies. The offerings are designed for deployment within the enterprise as well as over the cloud. These Corigine EDA solutions are architected to address the needs of SoCs for AI, automotive, communication, processing and vision applications.

You can access a copy of the MimicPro product brochure here. For a copy of the MimicTurbo GT product brochure, contact Corigine via marketing@corigine.com. For additional information, visit www.corigine.com.

Corigine @DAC 2021

You can meet with Corigine and check out their latest products at Booth 2443, Dec 6-8, 2021.

DAC 2021 is being held at Moscone Center, San Francisco, CA

Also Read

Facebook or Meta: Change the Head Coach

The Roots Of Silicon Valley

CMOS Forever?