Webinar on Dealing with the Pain Points of AI/ML Hardware

by Tom Simon on 12-07-2021 at 6:00 am

Achronix FPGA for AI/ML

Ever-increasing data handling demands make creating hardware for many applications extremely difficult. In an upcoming webinar, Achronix, a leading supplier of FPGAs, talks about the data handling requirements for AI/ML applications, which are growing at perhaps one of the highest rates of all. Looking at all data generated and consumed in general, the webinar host Tom Spencer, Senior Manager of Product Marketing at Achronix, points to the 294 million emails, 230 million tweets and over a billion searches performed daily. The worldwide totals for stored data have accelerated from 4.4 zettabytes (ZB) in 2018 to 44 ZB in 2020 and are expected to grow to 175 ZB by 2025. A zettabyte is 10^21 bytes.

AI/ML applications are especially burdened because they rely on rapidly growing training sets, network models and data used for inference. According to Tom, there are a number of significant pain points associated with developing hardware for AI/ML. Indeed, the title of the webinar is “How to Overcome the Pain Points of AI/ML Hardware”. Tom artfully narrows down the competing accelerator choices: GPU, FPGA and ASIC. He sees FPGAs as offering the most flexibility. In his view, FPGAs provide low latency and can get much more work done in a clock cycle than the alternatives. Also, FPGAs can handle massive data due to their data flow structure.

OK, but what are the pain points? Tom is prepared to talk about the three pain points that must be dealt with to deliver hardware that can handle the task.

Compute power has been a limiting factor in building AI/ML solutions. AI/ML requires trillions of integer and/or floating point operations per second. The data formats needed include fixed-point and floating-point types from 3 to 64 bits wide, and now often include newer formats such as Block Floating Point (BFP) and bfloat16.
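To make the block floating point idea a little more concrete, here is a minimal Python sketch. It is not a description of how Achronix hardware implements BFP; it assumes one common scheme in which a block of values shares a single power-of-two exponent and each value keeps only a small signed integer mantissa. The function names and the 8-bit mantissa width are illustrative assumptions.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of floats to one shared exponent plus signed integer mantissas."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # Largest magnitude a signed mantissa of this width can hold (assumed symmetric range).
    max_mant = (1 << (mantissa_bits - 1)) - 1
    # Smallest power-of-two scale that lets the largest value fit in the mantissa range.
    shared_exp = math.ceil(math.log2(max_abs / max_mant))
    scale = 2.0 ** shared_exp
    # Round each value to the nearest mantissa step; clamp to guard against rounding at the edge.
    mantissas = [max(-max_mant, min(max_mant, round(x / scale))) for x in block]
    return shared_exp, mantissas


def bfp_dequantize(shared_exp, mantissas):
    """Rebuild approximate float values from the shared exponent and mantissas."""
    scale = 2.0 ** shared_exp
    return [m * scale for m in mantissas]


exp, mants = bfp_quantize([0.5, -1.25, 3.0, 0.01])
print(exp, mants, bfp_dequantize(exp, mants))
```

The trade-off shows up in the last element: because 0.01 has to share an exponent with 3.0, it rounds to a zero mantissa. Storing one exponent per block saves bits and lets the multiply-accumulate work run on integers, at the cost of precision for small values that sit next to large ones.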

Data has to be able to move on and off chip rapidly, otherwise processing will fall behind. Applications such as autonomous driving need to support high frame rates for high-resolution video. The need to achieve timing closure and build interfaces from scratch adds to the burden.

Similar to external data movement, FPGAs need to have the ability to move data internally to facilitate the data flow in the neural network. AI/ML requires huge numbers of parallel processing elements to store and pass data internally. In many cases this leads to timing closure issues or uses up precious FPGA logic resources.

The webinar will talk about how the Achronix Speedster7t FPGA family can address each of these pain points, making system design much easier and delivering improved performance. The Speedster7t is available as a stand-alone FPGA device, embeddable FPGA IP or in a packaged solution – such as the VectorPath accelerator card.

Achronix Speedster7t has specific features that work together to enable AI/ML workloads. The webinar will discuss each of them in detail; I can summarize them here. First of all, there are specialized Machine Learning Processors (MLPs) available as resources for AI/ML operations such as multiply-accumulate (MAC). There are over 2500 MLPs per device. Each one has control, arithmetic and storage functions.

Next, the Speedster7t FPGA fabric is built with a 2D Network on Chip (NoC) that handles data transfers from one element to another. Because it is separate from the FPGA fabric elements, valuable resources are not used just to transfer data across the array. The NoC is high speed, with more than 20 Tbps bidirectional throughput in aggregate.

Lastly, moving data on and off chip to external storage is accelerated by high speed GDDR6 and DDR4 interfaces. The GDDR6 support provides 8 controllers with 16 lanes for massive parallelism and flexibility. The DDR4 provides 64b interfaces to 128 GByte of RAM.

Achronix offers comprehensive software support for AI/ML applications with a wide selection of frameworks, neural network models and development systems. They are targeting solutions such as CNNs, RNNs, Transformer Networks and Feed Forward.

This webinar should provide a lot of useful information to developers of AI/ML hardware who are looking for a smoother path to a working product. Achronix has proven that they offer innovation, such as their embeddable FPGA fabric, 2D NoC and high-speed interfaces. The webinar can be viewed on December 16th at 10 AM PST. Reserve your spot here.


CEO Interview: Fares Mubarak of SPARK Microsystems

by Daniel Nenni on 12-06-2021 at 10:00 am

Fares Mubarak profile

Fares Mubarak is a seasoned Global Executive with more than 30 years of broad management and hands-on experience spanning semiconductor design, software development, operations, sales, marketing, applications, EDA and healthcare IT.

Mubarak was VP/GM of the Semiconductor Business Unit followed by VP of Semiconductor Industry Sales and Business Development at ANSYS, the world’s leader in engineering simulation.

Before ANSYS, Mubarak was President of TeleResults, a Healthcare IT company focused on transplant and organ disease patient management. In his prior role, Mubarak was Sr. Vice President of Marketing and Engineering at Actel Corporation, a fabless Field Programmable Gate Array leader that was acquired by Microsemi Corporation.

Prior to his 18-year tenure at Actel, Mubarak held various management and engineering roles at AMD and Samsung Semiconductor. Mubarak holds an MSEE degree from Case Western Reserve University and an MBA from Golden Gate University.

What is the SPARK Microsystems backstory?
Analysts have predicted that the number of connected devices may reach 29.3 billion by 2023, indicating a CAGR of 20% since 2011. At this growth rate there will be seven devices for every human being on the planet within the next 5 years. Some of this growth is driven by traditional long-range communications and networking applications. Advanced wireless communication technologies such as 5G and WiFi 6 support these markets. However, a significant portion of this growth is expected to be fueled by new and exciting short-range wireless applications such as Personal Area Networks, AR/VR, gaming, positioning and IoT edge devices. These markets are expected to grow beyond $2 trillion by 2030. Legacy short-range wireless protocols still rely on radio architectures developed in the 1990s, forcing engineers to make significant compromises in their designs and product offerings. SPARK Microsystems is at the forefront of developing advanced ultra-wideband technologies for the next generation of short-range wireless devices.

SPARK Microsystems is unique in the ultra-wideband (UWB) market in that we recognized UWB’s untapped potential for high-speed multimedia and data communications at extremely low latency and low power. The SPARK Microsystems suite of UWB transceivers, the SR1000 family, has been designed specifically to meet these needs while operating reliably in noisy RF environments. Moreover, SPARK Microsystems’ UWB ICs consume an order of magnitude less power than Bluetooth Low Energy (BLE), the lowest energy, short-range wireless connectivity technology commercially deployed today.

While UWB is mostly being leveraged for ranging and positioning applications today, big opportunities are also in store for a new realm of short-range wireless connectivity applications – well beyond what we can imagine today. The capabilities of the SPARK Microsystems SR1000 family will be invaluable for these types of wireless applications – and it’s potentially a long list of apps. We’re encouraged to see some of the world’s largest technology powerhouses together pouring billions of dollars into UWB technology today – collectively we’re looking forward to advancing some major market opportunities.

What are SPARK Microsystems’ product differentiators?
With SPARK Microsystems UWB wireless transceivers, huge volumes of data and high-quality, uncompressed audio and multimedia can be delivered with 60X lower latency and 40X better energy efficiency than legacy wireless ICs. This is hugely beneficial not only for consumer wireless applications, but also for the myriad IoT, smart city and AI applications on the horizon that will require UWB-caliber, high-speed communication among sprawling networks of battery-powered wireless sensors.

The SPARK Microsystems SR1000 UWB IC family fully leverages the UWB spectrum to simultaneously deliver industry-leading energy efficiency, latency and bandwidth, enabling consumers to wirelessly connect to a broad range of devices within their personal area network. Simply put, we can finally have wired-like experiences without any of the wires. With a proven sub-250-microsecond latency, longer battery life, faster data transmission and uncompressed audio, SPARK Microsystems delivers to gamers a new generation of wireless mice, headsets and other peripherals that close the performance gap with wired alternatives once and for all. These benefits transfer into other applications, like audio streaming and AR/VR/XR, as well.

In the IIoT environment, SPARK Microsystems allows UWB wireless sensor solutions to last 5X-10X longer on the same battery, and their ultra-low latency enables robust and high-performance mesh networks in noisy RF environments. SPARK Microsystems UWB-based sensors ensure that a mere 20% or less of the sensor power budget is consumed by the wireless comms chip. Depending on how you’re using your sensors, this could enable operations for many years before a drained battery ever becomes an issue. With so little power consumed by the UWB chip, this also opens the door to a future of battery-less sensors powered by nothing more than ambient indoor light, or even body heat.

Where have you seen the most market traction?
We’re seeing vast technology and market potential for UWB within the consumer technology market, with major implications for the next generation of smartphones, wireless gaming peripherals, audio earphones and much more. UWB is great for positioning apps, but this represents only a minor share of UWB’s potential. Our customer traction is predominantly in low power, low latency, high bandwidth data communications for high-res audio and consumer devices, such as gaming accessories and Extended Reality (XR) applications.

SPARK Microsystems’ UWB chips are ideally and uniquely positioned to excel within the next generation of XR, a superset of Augmented Reality (AR), Virtual Reality (VR) and Mixed Reality (MR), eyewear, headsets and peripherals. Analysts have projected that XR could deliver a $1.5 trillion boost to the global economy by 2030, observing that “XR technology can benefit virtually all industries.” Relative to Bluetooth, the gains in data throughput, reductions in latency and increases in energy efficiency afforded by SPARK Microsystems’ UWB improve responsiveness and reduce lag to synchronously harness all our senses and deliver ultra-immersive XR experiences in a way we’ve never experienced before.

These benefits are what make SPARK Microsystems’ UWB so exceptionally attractive to the XR market going forward and the technology will no doubt prove to be a major asset for future AR/VR/MR/XR hardware development initiatives.

How is SPARK Microsystems contributing to the advancement of the UWB standards? What are some elements that must be included in the next evolution of the UWB standards?
The accelerated development and commercialization of UWB technology presents a massive market opportunity for low latency, low power wireless sensing and communications. As such, SPARK Microsystems is a member of both the UWB Alliance and the FiRa Consortium to accelerate the development and adoption of UWB technology. We are working with both organizations to influence regulatory matters and develop international UWB technology standards.

In the next iteration of the IEEE UWB standards, we hope to see a stronger emphasis placed on the data communications architecture. Data delivered over the UWB spectrum can be sent in microseconds with extremely low latency, enabling ultra-efficient wireless data communication. Contributing our knowledge and expertise to both the UWB Alliance and the FiRa Consortium allows us to have an influence on the nascent UWB technology and IEEE standards in multiple industries.

What opportunities will this technology enable in the future?
We see a massive opportunity for UWB to improve the use of AI and edge computing, especially in IoT and IIoT sensor node applications. AI’s benefits are reliant on vast amounts of data being transmitted in real-time, but current low-power wireless solutions significantly restrict the amount of data that systems can transmit. SPARK Microsystems’ UWB enables high-speed, high-bandwidth data transmission and low power processing at the edge to feed AI engines. We envision a future of smart homes and smart buildings with wireless connectivity and battery-less sensor operations, which significantly reduces the carbon footprint.

There is also an opportunity for UWB to serve as the last mile alongside long haul for 5G. Considerably more efficient data transmission, inherently lower latency and substantially lower power requirements allow for increased connectivity and reliability, as well as better coverage of large areas. SPARK’s UWB can make it possible to wirelessly connect devices and wirelessly stream rich multimedia and audio content with zero latency over emerging 5G networks.

Also Read:

CEO Interview: Mo Faisal of Movellus

CEO Interview: Da Chuang of Expedera

CEO Interview: Charbel Rizk of Oculi


Enlisting Entropy to Generate Secure SoC Root Keys

by Tom Simon on 12-06-2021 at 6:00 am

NVM attacks

Most methods of securing SoCs involve storing a root key that provides the basis of all derived keys and encryption of communication. The weakness with these methods is that even if the root key is stored in secure non-volatile memory, there are often methods to read the key. Once a key has been divulged, the device can be cloned and its security is compromised. With long and complex supply chains there is a likelihood that physical devices may come within reach of attackers. With physical access, made easy through supply chains or remote deployment, such as is often the case with IoT devices, keys stored in eFuses, Flash EEPROM or even OTP NVM can be detected.

Weaknesses of Traditional Non-Volatile Storage

Taking Advantage of Variation

It turns out that designers can help solve this problem by enlisting physical properties of silicon that are usually an annoyance. Entropy is normally the enemy of chip designers because it can lead to variations in chip operation that affect performance and yield. However, Intrinsic ID utilizes the unavoidable small variations that occur during manufacturing to create unique and secure root keys. As any chip designer knows, before memories are initialized, their values are unknown. Small variations among the devices in an SRAM cell can lead to either a 1 or 0 state at power on. These variations are stable enough that a given cell has a high probability of entering the same state at every power-up. So, like a fingerprint on your hand, there is a repeatable but unique pattern that can be read. This behavior can be used to create what is called a Physically Unclonable Function (PUF).

Intrinsic ID uses the initial values of a region of SRAM in combination with algorithms that account for any inconsistencies in the result to generate a root key on the fly for use by the root of trust. Derived keys can be created from this root key as well. To facilitate the generation of the root key, the enrollment process generates helper data that is stored locally. This helper data cannot be used to reverse engineer the root key, so even if it is read out, the root key is still secure.
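The enrollment and reconstruction flow can be sketched with a toy simulation. This is not Intrinsic ID's algorithm, which the article does not describe; it assumes a textbook "code-offset" construction with a simple repetition code, and the names `power_up`, `enroll`, `reconstruct` and `derive_key`, the 5% flip rate and the repetition factor are all illustrative assumptions. A real product would use far stronger error correction and a carefully analyzed key derivation.

```python
import hashlib
import random

REP = 7  # assumed repetition factor: 7 SRAM cells protect each secret bit


def power_up(preferred, rng, noise=0.05):
    """Simulate one power-up: each cell lands on its preferred value,
    except that it flips with probability `noise`."""
    return [bit ^ (rng.random() < noise) for bit in preferred]


def enroll(response, rng, key_bits=16):
    """One-time enrollment: pick random secret bits, repeat each REP times,
    and XOR the result with the SRAM response to form public helper data.
    (Simulation only: a real device would use a hardware-grade random source.)"""
    secret = [rng.getrandbits(1) for _ in range(key_bits)]
    codeword = [bit for bit in secret for _ in range(REP)]
    helper = [c ^ r for c, r in zip(codeword, response)]
    return secret, helper


def reconstruct(response, helper):
    """Every later power-up: XOR out the helper data, then majority-vote
    each group of REP cells to absorb the few cells that flipped."""
    noisy = [h ^ r for h, r in zip(helper, response)]
    return [int(sum(noisy[i:i + REP]) > REP // 2)
            for i in range(0, len(noisy), REP)]


def derive_key(bits):
    """Hash the recovered bits into a fixed-length root key (hex string)."""
    return hashlib.sha256(bytes(bits)).hexdigest()


# Simulated device: random per-cell preferences stand in for process variation.
rng = random.Random(2021)
preferred = [rng.getrandbits(1) for _ in range(16 * REP)]
secret, helper = enroll(power_up(preferred, rng), rng)
recovered = reconstruct(power_up(preferred, rng), helper)
print("key recovered:", recovered == secret)
```

Only `helper` would ever be written to storage. The secret bits exist just long enough to derive the key, and they can only be rebuilt on the device whose SRAM cells have the matching power-up preferences. With this repetition factor, up to three flipped cells per group of seven are tolerated.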

Flexible Implementation

Intrinsic ID offers three methods to take advantage of PUF-based secure key storage. For SoCs, their QuiddiKey hardware IP can be used in conjunction with their software driver. All that is needed is standard SRAM, with no new mask layers or special processes. Their hardware and drivers contain attack countermeasures. It is standards compliant and NIST CAVP certified. For reliability they use advanced error correction that guarantees operation from -55°C to +155°C. There is even anti-aging to ensure consistency over a long useful life and support for multiple derived keys that are also secure.

Intrinsic ID’s Security Solutions

For FPGA-based designs they offer their Apollo product that includes RTL for the FPGA fabric and software drivers that support all the necessary functionality. If the system is implemented in an MCU-based system, the on-chip SRAM can be used with the key generation taking place in software. Their BK software suite is used for this application. Regardless of which implementation is used, the root key is never stored in non-volatile memory. The key never leaves the security sub-system and the only data that is stored is public.

High Security and Convenience

Intrinsic ID’s solution offers many advantages. Along with extremely high security, it is low cost because it can be used on any conventional process. It comes with random number generation (RNG) that is hardware based and is accessible through their certified software driver. The PUF enabled products have been certified by EMVCo, CC, EAL6+, PSA, ioXt and Global Platform. With 300 million ICs already using this technology in areas such as G&D, banking and IoT, they have plenty of experience with meeting customer needs for security. More information is available at www.intrinsic-id.com/products.

Also Read:

Using PUFs for Random Number Generation

Webinar: How to Protect Sensitive Data with Silicon Fingerprints


Live 58th Design Automation Conference Coverage!

by Daniel Nenni on 12-05-2021 at 10:00 am

Dan and Shushana Friday Harbor San Juan Islands

My beautiful first mate and I will be together at DAC this year. Her first DAC was 1985 in Las Vegas and we lived happily ever after. SemiWiki bloggers Tom Dillinger and Daniel Payne will also be at DAC attending sessions and meeting with exhibiting companies to learn and blog about the latest innovations inside the semiconductor ecosystem.

This year DAC will start with the traditional Sunday night reception and opening keynote: “EDA Growth Accelerates as Moore’s Law Slows” at 5pm on Sunday by Charles Shi, PhD, Vice President and Research Analyst, Semiconductors & Semiconductor Equipment, at Needham & Company, LLC. This is a must-attend event for all EDA people.

Description:
It may be counter-intuitive to argue that the electronic design automation (EDA) industry could see accelerated growth because Moore’s Law is slowing down. In this presentation, I will walk you through my reasons why such could be the case. We believe the slowing Moore’s Law has led to design diversification with domain-specific chip designs replacing one-size-fits-all designs, has motivated systems companies to enter the silicon race, and has nurtured the recent renaissance of semiconductor startups. EDA, IP, and foundry are key enablers and beneficiaries of these trends. In addition, the slowing Moore’s Law means chip-level scaling must be complemented with package- or system-level scaling, which creates a greater need for system design and analysis that will significantly expand the scope of EDA as well as its market size.

We are convinced that the strong growth of EDA in 2020 and 2021 was not a “Covid phenomenon” but the beginning of a new era that will feature strong double-digit growth for the EDA industry. Last but not least, we believe EDA can play a key role in mitigating the global chip shortage that may last beyond 2022, as foundries push more designs migrating to sub-20nm nodes. We argue design migration to sub-20nm nodes is an underappreciated alternative to ease chip shortage other than massive capacity additions at 28nm and above.

Charles joined our podcast last week if you would like to hear our banter on EDA. Next is the welcome reception. This is normally the best networking opportunity of DAC. If you are there please introduce yourself. It would be a pleasure to meet you all.

There will be a couple of interesting book signings on the exhibition floor. On Monday and Tuesday Wally Rhines will be signing free copies of his book “Predicting Semiconductor Business Trends” in the Infinisim booth #1652. My beautiful wife and I will be there as well. In booth #1543 S2C EDA will be giving away copies of my book “Prototypical II: The Practice of FPGA Prototyping for SoC Design”. I will be there from 1-2pm on Monday and Tuesday for signings.

The rest of my time will be spent at the DAC keynotes and walking the exhibition floor meeting with friends and people who I consider family from the semiconductor ecosystem, absolutely.

I hope to see you there!

About DAC

The Design Automation Conference (DAC) is recognized as the premier conference for design and automation of electronic systems. DAC offers outstanding training, education, exhibits and superb networking opportunities for designers, researchers, tool developers and vendors.


A Next-Generation Prototyping System for ASIC and Pre-Silicon Software Development

by Kalar Rajendiran on 12-05-2021 at 6:00 am

Corigine Prototyping Systems

Every now and then, disruptive technology is brought to market, challenging the way things have been done to that point. We are all familiar with many such technologies. The rhetorical question is, how many of us were aware of, recognized and acknowledged those technologies before they became well established? For example, a startup called Corigine has been rethinking prototyping and emulation solutions for semiconductor products. They are on a mission to make prototyping capability accessible to a wider audience of software and hardware engineers, right from their desktop, and to offer far greater onboard capability than is currently available on traditional prototyping solutions, thereby relieving some of the dependence on expensive emulation infrastructure. Corigine expects to disrupt the widely used traditional prototyping/emulation model with their recently announced solutions. Before a product gets recognized and acknowledged for its value, awareness of the product needs to happen. This article is about bringing that awareness.

Before reviewing these products from Corigine, a little backdrop would be useful to understand the impact such solutions could bring to the prototyping and emulation market. Whether inventing or innovating, prototyping comes into play. During inventions, prototyping is part of the invention process. During innovations, though not essential, prototyping is invariably done for pragmatic reasons. The reasons include cost optimization and time to market reduction for the complete product. While the basic reasons for prototyping are to verify conformance to specifications and validate performance to customer/product expectations, there are other reasons too. Most products are not that useful without software running on them. Consequently, a product launch requires having both the hardware and software ready at the same time. Software developers seek a head start rather than waiting for the hardware to be ready in its final form.

Hardware/software co-verification has become a very important aspect of product development, for integration of software (sw) with hardware (hw) well before final chips and boards become available. A good prototyping system should allow for not only ease of verification of the hardware but also enable hardware validation, software development, debugging and hw/sw integration. It is in this context that the two product announcements from Corigine are of interest. Corigine is a fabless semiconductor company that designs and delivers leading edge I/O and networking products, IPs and some EDA solutions.

In summer of this year, Corigine introduced a prototyping platform called the MimicPro™ system for SoC, ASIC and IP subsystem verification and pre-silicon software development. The platform makes prototyping capabilities easily accessible to hardware verification engineers and software developers at pre-silicon stage, thus shortening the R&D cycle for final products. It’s a system built using Xilinx UltraScale™ FPGAs with the goal of optimizing functionality and performance. The press release can be accessed here.

In November, Corigine expanded their toolkit in this space with their MimicTurbo GT card. This offering makes the capabilities accessible to engineers right at their desktop. Through a PCIe-based MimicTurbo GT card, Corigine makes silicon verification and software R&D possible right from an engineer’s desktop. The press release can be accessed here.

Requirements For Next-Generation Prototyping Solutions

A next-generation prototyping solution should cost-effectively enable hardware/software co-verification in addition to offering the following functionality.

  • Enable software development
  • Greater automation, for partitioning and more
  • Necessary debug capabilities and system view
  • Multi-user access to hardware for cost-management
  • Useful scalability for handling small/simple designs to large/complex SoCs
  • Suitability for use over the cloud and within the enterprise
  • Security for IP in use

In addition, the automotive market has some very stringent safety requirements for semiconductor/electronic products that should be addressed by a prototyping solution.

Corigine Solutions

The following is a synthesis of what I gathered by reviewing/previewing Corigine’s product brochures on MimicPro and MimicTurbo GT solutions, respectively.

MimicPro FPGA Prototyping System

The system enables early software development, system validation and regression testing, while significantly reducing development time and workload. It offers a full system view with high visibility for rapid debug through a System Scope Logic Analyzer. Its automatic clock handling eliminates manual handling of gated clocks, which is error-prone and adds unnecessary engineering workload. The auto-partitioning feature reduces the need for manual intervention by automating pin-muxing and instrumentation. While the system addresses all the requirements identified in the earlier section, the following deserve special mention.

Scalability

  • Scales from 30 million gates to 1 billion gates (1 to 32 FPGAs)
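
As a rough way to read that scaling figure, the Python sketch below estimates which system size a design might need. It is a back-of-the-envelope aid only: it assumes capacity scales linearly (1 billion gates across 32 FPGAs, so about 31.25 million gates per FPGA) and uses the 1, 4 and 32-FPGA configurations listed under Availability. The function name is hypothetical, and real partitioning overhead would lower the usable capacity per FPGA.

```python
import math

MAX_GATES = 1_000_000_000  # top of the quoted range
MAX_FPGAS = 32             # FPGAs at the top of the quoted range
GATES_PER_FPGA = MAX_GATES / MAX_FPGAS  # assumed linear scaling: 31.25M gates each
CONFIGS = (1, 4, 32)       # system sizes listed as generally available


def smallest_config(design_gates):
    """Return the smallest listed system size that fits the design under the linear assumption."""
    needed = max(1, math.ceil(design_gates / GATES_PER_FPGA))
    for size in CONFIGS:
        if size >= needed:
            return size
    raise ValueError("design exceeds the largest listed configuration")


for gates in (30_000_000, 100_000_000, 1_000_000_000):
    print(f"{gates:>13,} gates -> {smallest_config(gates)}-FPGA system")
```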

Support for Automotive Safety

  • FMEDA, HIL, ADAS (ISO 26262)
  • Fault injection and force/release capabilities, which are essential for debugging and functional safety prototyping

MimicPro System

MimicTurbo GT Card

The card can be deployed in a 16-lane PCI Express slot and supports 64 GTY transceivers (16 Quads) along with the essential I/O interfaces, and it includes FMC and FMC+ connectors. Bundled with the MimicTurbo software, the solution delivers best-in-class automated partitioning for larger System-on-Chip designs. The MimicTurbo software leverages the Xilinx VU19P HSTPM IP for extraordinarily low latency with I/O pin-muxing across transceivers.

Deployment Ease

The Corigine MimicTurbo GT card is designed for quick installation in a PCIe system. The card is designed with the necessary interfaces and pre-built connectors that enable users to quickly deploy the hardware environment with the Xilinx® Virtex® UltraScale+™ FPGA XCVU19P. To further ease deployment, the card can be configured as a benchtop standalone platform as well.

Performance

The Corigine MimicTurbo GT offers automated FPGA partitioning and interconnect while leveraging the high-speed Xilinx GT (gigabit transceiver) I/O connection between multiple FPGAs to deliver multi-gigabit-per-second performance. The card provides GT pin muxing and automatic clock control, and features a sixteen-lane PCI Express interface along with DDR4 component memory for performance.

Scalability

The Corigine MimicTurbo GT provides modular upgradability from an entry point of a single FPGA card with 48 million gates to multiple cards deployed in one or more PCIe systems.

MimicTurbo GT Card

Availability

MimicPro Systems: General availability of 1, 4 and 32-FPGA based systems. Configurations containing more FPGAs to follow.

MimicTurbo GT Card: Sample availability starts in December 2021.

Summary

The MimicPro system offers the scalability needed for verifying and validating complete SoCs and IP subsystems and delivers deep local debug capabilities for quicker elimination of bugs. The MimicTurbo GT card along with the MimicTurbo prototyping software solution simplifies the deployment of FPGA-based prototyping at the desktop. Together, they greatly accelerate silicon verification and pre-silicon software development at semiconductor companies. The offerings are designed for deployment within the enterprise as well as over the cloud. These Corigine EDA solutions are architected to address the needs of SoCs for AI, Automotive, Communication, Processing and Vision applications.

You can access a copy of the MimicPro product brochure here. For a copy of the MimicTurbo GT product brochure, contact Corigine via marketing@corigine.com. For additional information, visit www.corigine.com.

Corigine @DAC 2021

You can meet with Corigine and check out their latest products at Booth 2443, Dec 6 - Dec 8, 2021.

DAC 2021 is being held at Moscone Center, San Francisco, CA.

Also Read

Facebook or Meta: Change the Head Coach

The Roots Of Silicon Valley

CMOS Forever?


Podcast EP51: A Preview of the Needham Keynote at DAC

by Daniel Nenni on 12-03-2021 at 10:00 am

Dan is joined by Charles Shi, Vice President & Research Analyst for Semiconductors & Semiconductor Equipment at Needham & Company. Charles will be doing an opening keynote next week at the Design Automation Conference. He covers EDA as well as semiconductor equipment at Needham.

Dan explores why Charles is bullish on the EDA sector and what he sees ahead for the industry and its customers.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Interview: Da Chuang of Expedera

by Daniel Nenni on 12-03-2021 at 6:00 am

Da Chuang CEO Expedera

Da is cofounder and CEO of Expedera. Previously, he was cofounder and COO of Memoir Systems, an optimized memory IP startup, leading to a successful acquisition by Cisco. At Cisco, he led the Datacenter Switch ASICs for Nexus 3/9K, MDS, CSPG products. Da brings more than 25 years of ASIC experience at Cisco, Nvidia, and Abrizio. He holds a BS in EECS from UC Berkeley and an MS and PhD in EE from Stanford.

Tell us about Expedera.
Expedera has developed deep learning accelerator (DLA) IP that has the industry’s best performance per watt: 18 TOPS/W. Our solutions are scalable to 128 TOPS with a single core and to PetaOPS with multicore. We started from the ground up with a hardware/software codesign approach that enables us to deliver the most power efficient and scalable DLA for AI inference.

Our design outperforms other DLA blocks from leading vendors such as Arm, MediaTek, Nvidia, and Qualcomm by at least 4–5x. We’ve validated this using our 7nm test chip.

We are targeting AI inference, particularly for edge applications. We have at least one customer in this space currently in production—a top smartphone manufacturer.

My co-founders and I founded Expedera in 2018. Our office is in Santa Clara.

What problems/challenges are you solving?
We provide a highly efficient AI inference solution. If a customer needs deterministic performance or a guaranteed level of performance with the best possible power and area efficiency, we can do that. If they need a solution that doesn’t require off-chip memory, we can do that. If they need a flexible, future-proof solution that can handle mixed models, we can do that. We also bring efficiency to model deployment because our co-designed platform reduces software complexity dramatically and ensures predictable performance.

What markets does Expedera address today?
We have announced a top-10 smartphone customer, so it’s fair to say mobile and edge AI are a sweet spot for us. Because of our scalability and determinism, we are a good fit for automotive and industrial automation. In fact, we are engaged with customers from GOPS to PetaOPS.

What are the products Expedera has to offer?
Our Origin deep learning accelerator IP platform addresses a wide variety of AI inference applications. The platform includes silicon IP and a comprehensive SDK built on TVM, with a compiler that achieves high performance out of the box. The platform allows us to easily support different precisions and features, so it’s very flexible.

What keeps your customers [architects, system designers] up at night? 
The reality is that most AI processors underperform, stalling at around 30-50% utilization or less and wasting most of their potential TOPS. So system architects and designers overdesign their SoCs to compensate for unpredictable performance. Expedera provides predictable, deterministic performance with 90% utilization. Greater utilization results in better throughput for customers. Our platform gives architects the end-to-end visibility needed to right-size their AI-accelerator solutions early in the development cycle.

Another issue is the difficulty, the delays, and the uncertainty in model deployment. Data scientists can spend tremendous amounts of time to achieve minimal performance improvement. With Expedera, engineers can deploy trained models without further changes. That increases confidence in the design and avoids difficult development tradeoffs, bottlenecks, and product uncertainty.
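The utilization point in this answer comes down to simple arithmetic: delivered throughput is roughly the peak TOPS rating multiplied by utilization. The Python sketch below is only an illustration of that relationship. The 10 TOPS peak rating is a hypothetical number chosen for the example, not an Expedera specification, and the function names are ours.

```python
def effective_tops(peak_tops: float, utilization: float) -> float:
    """Delivered throughput: peak rating times the fraction doing useful work."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_tops * utilization


def required_peak_tops(target_tops: float, utilization: float) -> float:
    """Peak rating that must be provisioned to sustain target_tops."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return target_tops / utilization


PEAK = 10.0  # hypothetical peak rating in TOPS, for illustration only

# Utilization figures quoted in the interview: 30-50% typical, 90% claimed.
for u in (0.30, 0.50, 0.90):
    print(f"{u:.0%} utilization -> {effective_tops(PEAK, u):.1f} effective TOPS")

# Overdesign: peak TOPS needed to sustain 9 TOPS at each utilization level.
for u in (0.30, 0.50, 0.90):
    print(f"{u:.0%} utilization -> {required_peak_tops(9.0, u):.1f} peak TOPS needed")
```

Under these assumptions, sustaining the same workload at 30% utilization takes about three times the peak rating needed at 90%, which is the overdesign the answer describes.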

What added value do you bring to your customers?
Confidence in their solution. Efficient operation. Ease of deployment. Reduced BOM costs.

Providing an AI solution as IP has huge implications for both our business and our customers. The IP licensing approach allows us to address a broad set of edge-AI markets, and potentially license to leading vendors that already hold large shares in these markets. At the same time, we can enable startups and new market entrants that may not have the in-house expertise to design their own AI hardware and would otherwise be unable to participate or compete with incumbents.

What makes Expedera unique?
We’ve taken a fundamentally different approach to AI acceleration, in part, because we come from a networking background. We’ve taken a network-centric approach—rather than the CPU-centric approach—to neural network processing. We are able to segment the neural network into packets which are essentially command streams that can be efficiently scheduled and executed by our hardware in a very fast, efficient and deterministic manner. Additionally, our co-design approach enables a simpler software stack and a more productive and system-aware design and development experience.

What’s driving the company’s global expansion/growth?
The market expectation of AI-everywhere is driving growth and creating a competitive necessity for ODMs to provide increasingly intelligent and autonomous products. We are still in the hockey stick of AI deployment.

Also Read:

CEO Interview: Pradeep Vajram of AlphaICs

CEO Interview: Charbel Rizk of Oculi

CEO Update: Tuomas Hollman, Minima Processor CEO


Low Power High Performance PCIe SerDes IP for Samsung Silicon

by Tom Simon on 12-02-2021 at 10:00 am

SerDes IP for PCIe

No matter how impressive the specifications are for an SoC, the power, performance and area of the finished design all depend on the IP selected for the IO blocks. In particular, most SoCs designed for consumer and enterprise applications rely heavily on PCI Express. Because PCIe analog IP is critical to design success, Samsung has developed a solid relationship with the IP provider Analog Bits, which was highlighted in a talk given by Analog Bits Executive VP Mahesh Tirupattur at the recent Samsung Advanced Foundry Ecosystem (SAFE) Forum. The talk is titled “PCIe/CXL SERDES- Gen4/5 Enterprise Class Serdes & Lowest Power Gen3/4 Consumer SERDES in Samsung 28nm to 5nm Processes”. Mahesh offers extensive information on their SERDES IP on Samsung processes from 32nm in 2012 up to the present, with 7LPP and 5LPE support in 2021.


According to Mahesh, their primary focus is helping customers create highly differentiated designs targeted at customer needs. To this end they emphasize low power, small silicon footprint and flexible configurations, among other things.

For the consumer market, Analog Bits delivers PCIe Gen2 and Gen3 with multiprotocol ability for SATA, eDP, XFI, etc. Their full-rate architecture offers industry-leading picojoules per bit combined with the lowest system and BOM cost. They support wire bond packages (up to 10G) and integration with clock chips. Their small form factor lowers silicon costs. Lastly, they include some programmability across the supported protocols.

Their enterprise and high-performance PCIe SERDES offer Gen4/5 and extensibility to support SAS4, Ethernet, etc. Designs can have lane counts of 195 to over 300 placed on multiple sides. They offer automotive grades and provide multiple channel and chassis support with channel equalization. These SERDES have been used in data center storage, GPUs, aggregators, bridges, re-timers and AI/ML.

Analog Bits’ PCIe SERDES can be arrayed to varying link widths, i.e. x1, x2, x4, x8, x16. Lanes can be independently programmed to support any PCIe spec, SAS, Ethernet, etc. They also offer flexibility for placement anywhere on an SoC and wider packaging options to improve cost and performance. One of Mahesh’s slides highlights the low power obtained for PCIe Gen3 in 28 FDSOI and 28LP Bulk. Coming in at 0.1 sq. mm, they both have similar PCIe power and dynamic power, ~54 mW and ~6.8 mW/Gbps respectively. The FDSOI leakage of 30.5 microwatts betters the LP Bulk at 46.6 microwatts.

Mahesh spends some time discussing sample layouts for low power SERDES in wire bond packages. He includes test result eye diagrams at 5Gb/s, 8Gb/s and 10Gb/s that all look wide open and clear. Even the eye diagram for the high-performance full rate PCIe Gen4 SERDES that is used on Samsung NVMe SSD is impressive. It uses 117.7 mW per lane at 16Gbps (7.35mW/Gbps) in an area of 0.26 sq. mm.

Analog Bits has silicon-proven test chips and production tape-outs for their Gen3 and Gen4 SERDES on Samsung 7LPP/5LPE. The Gen4 silicon runs at 1-16G with power coming in at 6 pJ/bit. The Gen3 runs at 1-8G with power at 4 pJ/bit. Gen5/SAS4 is on Samsung 8LPP with working silicon. It occupies 0.583 sq. mm, consumes 7.6 pJ/bit, and is configurable across multiple lanes.
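The power figures in this article are quoted in two interchangeable units: mW/Gbps and pJ/bit. They are numerically identical, since 1 mW divided by 1 Gbps is 10^-3 W / 10^9 bit/s = 10^-12 J/bit. The Python sketch below checks that arithmetic against the numbers quoted above. The function name is ours, and the 8 Gbps rate used for the Gen3 case is an assumption based on the PCIe Gen3 per-lane rate and the "1-8G" range mentioned above.

```python
def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Convert lane power (mW) and line rate (Gbps) to energy per bit (pJ/bit)."""
    if rate_gbps <= 0:
        raise ValueError("rate_gbps must be positive")
    power_w = power_mw * 1e-3    # mW -> W
    rate_bps = rate_gbps * 1e9   # Gbps -> bit/s
    joules_per_bit = power_w / rate_bps
    return joules_per_bit * 1e12  # J -> pJ


# Gen4 SERDES from the article: 117.7 mW per lane at 16 Gbps.
gen4 = energy_per_bit_pj(117.7, 16.0)

# Gen3 in 28nm from the article: ~54 mW, assumed here to be at 8 Gbps.
gen3 = energy_per_bit_pj(54.0, 8.0)

print(f"Gen4: {gen4:.3f} pJ/bit (same number as mW/Gbps)")
print(f"Gen3: {gen3:.3f} pJ/bit (same number as mW/Gbps)")
```

The results land close to the ~7.35 and ~6.8 mW/Gbps figures cited from the slides, so the quoted power and efficiency numbers are mutually consistent under these assumptions.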

The presentation goes into extensive detail on test results and available layout configurations. I highly recommend it because of the level of detail that it provides. Analog Bits has a long history of developing IP for the full range of Samsung processes. As noted above, it is impressive that Samsung chose Analog Bits to provide SERDES IP for their own NVMe SSD. The presentation is available at analogbits.com/analog-bits-pcie-cxl-serdes-in-samsung-video/

Also Read:

On-Chip Sensors Discussed at TSMC OIP

Package Pin-less PLLs Benefit Overall Chip PPA

Analog Sensing Now Essential for Boosting SOC Performance


Continuous Integration of RISC-V Testbenches

by Daniel Nenni on 12-02-2021 at 6:00 am

RISC V Results

In my last blog post about AMIQ EDA, I talked with CEO and co-founder Cristian Amitroaie about their support for continuous integration (CI). We discussed in some detail how their Design and Verification Tools (DVT) Eclipse Integrated Development Environment (IDE) and Verissimo SystemVerilog Linter are used in CI flows. Cristian gave a fascinating example: AMIQ EDA runs CI lint checks every few hours on the contents of the GitHub repository for the Universal Verification Methodology (UVM) reference implementation, and makes the results publicly available. Any time that anyone contributing to this project checks in new or changed code, it will be linted quickly. This helps to improve the quality of the code, and publishing the reports fits the whole open-source ethos.

Cristian concluded by hinting that this same process could be applied to other SystemVerilog/UVM design and verification IP available from public repositories. Last week we found out what he meant, in a new press release announcing that they have set up a CI flow for the open-source UVM RISC-V verification environment from OpenHW Group. The members of the group are using the AMIQ EDA tool results to enhance the quality, portability, and readability of their code. I asked Cristian to tell me more and, when we talked, he was kind enough to bring along Mike Thompson, the OpenHW Group Director of Engineering, Verification Task Group, and Gabriel Raducan, R&D Team Lead at AMIQ EDA. Here are the highlights of our conversation.

Thanks for joining me today. Can you please start by telling me about OpenHW Group?

Mike: I expect that your readers know about RISC-V, the widely adopted free and open instruction set architecture (ISA). Many companies, organizations, and academic institutions have developed processor cores, verification tools, and many kinds of supporting software for this ISA. Of course, there is widely varying quality across these offerings. We formed OpenHW Group to develop very robust and flexible RISC-V open-source cores and best-in-class open-source verification testbench environments.

Cristian: We’re seeing increasing interest in RISC-V among our users. It’s clearly a hot topic in the industry.

So where does AMIQ EDA come into the picture?

Mike: As individual members of the OpenHW Group use their own simulators to develop testbenches, it is important to have readable and maintainable SystemVerilog/UVM code that can run on any commercial simulator. We looked for a lint tool that could play the central role in this effort, but there are few, if any, open-source or commercial linters that support testbench code, particularly SystemVerilog/UVM. I looked at the available options and tried Verissimo because I heard good things about it.

Cristian: Mike contacted us, and we collaborated to set up an environment to check the OpenHW testbench code with our tool and then deploy it.

What does that mean? What specifically did you do?

Gabriel: There were really four parts to the project. The first was us doing some initial linting runs on the testbench and discussing the results with Mike and members of his team.

Mike: Next, Gabriel explained that the rules to be checked by Verissimo are highly customizable and he proposed an initial set. We worked together to refine this set to fit our verification goals. If we didn’t deem a particular rule important, it was easy to waive or suppress the check.

Gabriel: The third phase was setting up the CI flow that we mentioned in the press release. Any time that anyone in the Verification Task Group checks in code, it is linted within a few hours and the results are posted openly in a dashboard format. These regression runs ensure that everyone’s contributions meet the OpenHW coding guidelines and quality metrics. Finally, we added the rule and waiver files to the OpenHW repository so that they are accessible to the team.

Isn’t this a whole lot like the UVM CI flow we talked about last time?

Cristian: It’s really very similar; in both cases we run regular lint regressions on an open-source repository. Engineers working on open-source projects invest a lot of time and energy, and we are happy if we can help. We see this as an ongoing collaborative process from which both parties benefit. In fact, we constantly monitor OpenHW discussions on GitHub to help with linting topics and interact with more team members.

Have you found any issues with the RISC-V testbench code in this process?

Mike: Yes, we have fixed many dozens of issues reported by Verissimo. Some were violations of our SystemVerilog/UVM coding guidelines that we previously had no automated way to detect, and some were due to rules we had not considered before. I especially like the rules that warn us about constructs that may work inconsistently on different simulators or that are not even supported on all simulators. It is important for our code to be vendor-neutral and portable.

Could you give some examples of these issues?

Gabriel: Sure! SystemVerilog prohibits using a null class in a logical expression. Some simulators allow this, but we report it as non-standard code. UVM specifies that the Verilog “$random” call should be avoided, but we found a few usages in some older testbench code. We also detected some cases of overrides that didn’t actually make any changes to the base classes, which is a waste of simulation time and resources.
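To make the “$random” example concrete, here is a toy Python sketch of what a single lint rule of this kind might look like. It is purely illustrative and is not how Verissimo works; the interview does not describe the tool's internals. The function name, the regular expression, and the sample SystemVerilog snippet are all hypothetical, and the scan only strips `//` line comments, ignoring block comments and string literals.

```python
import re
from typing import List, Tuple

# Matches the $random system function but not $urandom or $urandom_range,
# because the literal "$" must be followed immediately by "random".
RANDOM_CALL = re.compile(r"\$random\b")


def find_random_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for each line that uses $random."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # drop any trailing line comment
        if RANDOM_CALL.search(code):
            hits.append((lineno, line.strip()))
    return hits


# Hypothetical testbench fragment used only to exercise the check.
SAMPLE = """\
class my_seq extends uvm_sequence;
  int a = $random;
  int b = $urandom();
  // $random mentioned only in a comment
endclass
"""

for lineno, text in find_random_calls(SAMPLE):
    print(f"line {lineno}: avoid $random: {text}")
```

A production linter works on a parsed representation of the code rather than on raw text, which is what allows checks like the redundant-override rule Gabriel mentions.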

How has the experience been working together?

Mike: AMIQ EDA has been a wonderful partner. They’ve been proactive, responsive, and fully supportive of our project goals.

Cristian: The same is true of Mike and the OpenHW folks. Like our other advanced users, their feedback is extremely valuable in improving our products and adding useful new features.

Where do you go from here?

Mike: I think that Verissimo is now an indispensable part of our RISC-V testbench development efforts. We are using GitHub issues to track lint violations flagged by Verissimo so that individual members can address the issues found in their sections of code. This will be an on-going process. Even with only a few months of experience so far, I can’t imagine not having Verissimo in our flow.

Thank you all very much for your time.

Cristian and Gabriel: Thank you, Dan, and thank you, Mike, for making time to join us today.

Mike: Thanks to the three of you as well; it’s been a pleasure!

Also Read

Continuous Integration of UVM Testbenches

What’s New with UVM and UVM Checking?

Why Would Anyone Perform Non-Standard Language Checks?


Ansys to Present Multiphysics Cloud Enablement with Microsoft Azure at DAC

by Daniel Nenni on 12-01-2021 at 2:00 pm


Ansys and Microsoft have collaborated extensively over the past year to optimize and test Ansys’ signoff multiphysics simulation tools on the Azure cloud. Microsoft has invited Ansys to present the joint results in Azure’s DAC booth theater in San Francisco this year.

Two presentations are planned: one covering the enablement of Ansys RedHawk-SC™ for power integrity signoff, and one discussing electromagnetic simulation of large electronic systems with Ansys HFSS™. Today’s advanced node designs and compact 3D-IC systems can require large amounts of compute resources to verify, which makes them ideal candidates for distributed processing in the cloud.

Microsoft Azure and Ansys have set up and tested both tools on Azure to determine the optimal hardware and system configurations for maximum speed, usability, licensing, and resource efficiency. These performance results will be revealed using real customer examples by examining the impacts of memory size and instance counts on throughput.

The Ansys-Azure collaboration results will be presented live at the Microsoft Azure DAC booth #1253 on:

  • Monday 6th at 3:15PM – Ansys HFSS on Azure
  • Tuesday 7th at 12:15PM – Ansys RedHawk-SC on Azure

These are key data points for any electronic designers moving to the cloud or interested in hearing about the state-of-the-art in cloud computing. If you can’t make it to DAC this year, see these blogs for more information: “Ansys RedHawk-SC on Azure: Hold on to Your Socks” and “How Azure FX VM Makes Ansys RedHawk-SC™ Run Faster the Less You Spend”.

About Ansys
If you’ve ever seen a rocket launch, flown on an airplane, driven a car, used a computer, touched a mobile device, crossed a bridge or put on wearable technology, chances are you’ve used a product where Ansys software played a critical role in its creation. Ansys is the global leader in engineering simulation. Through our strategy of Pervasive Engineering Simulation, we help the world’s most innovative companies deliver radically better products to their customers. By offering the best and broadest portfolio of engineering simulation software, we help them solve the most complex design challenges and create products limited only by imagination. Founded in 1970, Ansys is headquartered south of Pittsburgh, Pennsylvania, U.S.A. Visit www.ansys.com for more information.

Also Read

Big Data Helps Boost PDN Sign Off Coverage

Bonds, Wire-bonds: No Time to Mesh Mesh It All with Phi Plus

Optical I/O Solutions for Next-Generation Computing Systems