
Perforce IP and Design Data Management #61DAC

by Daniel Payne on 07-24-2024 at 10:00 am

Helix IPLM, Helix Core

I recall first blogging about Helix IPLM (formerly Methodics IPLM) at DAC in 2012, then Perforce acquired the company in July 2020, so I stopped by the Perforce booth this year at DAC to get an update from Martin Hall, Principal Solutions Engineer at Perforce. Martin’s background includes working at Dassault Systemes, Synchronicity, Innoveda and Texas Instruments. The four big messages this year were:

  1. Managing costs and footprint in an AI world
    More effective management of costs through IP-centric design practices and managing/enforcing traceability for high value IP — such as AI GPUs and cores, and low power components — to help reduce power running costs and footprints.
  2. The critical need for AI data set management
    AI depends on curating large amounts of data on which to train models, but this data needs to be reviewed and cleaned to avoid pollution. Plus, new data needs to be onboarded in a measured way, and secondary data sets need to be weighted to influence AI outcomes appropriately.
  3. Plans for the commoditization of AI
    As the world of AI design evolves, expect to see a move away from proprietary models to third-party AI solutions (standard AI processing units) that can be used as building blocks. Efficiently managing the AI IP supply chain is going to be vital to reduce complexity, enable scale, improve security, and prevent IP leakage.
  4. What’s new in Perforce Helix IPLM and Helix Core
    Helix IPLM and Helix Core together provide a unified, scalable IP and design data management platform that tracks IP and its metadata across projects, providing end-to-end traceability and enabling IP reuse. Some tier-one semiconductor firms use Perforce solutions for IP and data management, like NVIDIA, Micron, and Samsung.

Martin walked me through a demonstration of the Helix tools, where Helix Core takes in design files and Helix IPLM performs configuration management operations to enable an IP-level abstraction of these files. Beyond the base design file content, these configurations can also include data sheets, requirements, and meta-data representing the quality and state of the IP in its lifecycle. The resulting database is used to build workspaces, a centralized, corporate IP catalog, and generally organize the IP ecosystem across the enterprise.

This increased level of transparency into corporate IP assets will increase IP reuse, saving time and money. This approach also provides complete traceability for a project and its constituent IP hierarchy. As a project moves through its lifecycle, each release can be memorialized as an object in the Helix IPLM platform. Important releases can be tagged and easily identified, for example, when an LVS/DRC-clean physical implementation is reached for an IP. Releases can also be controlled to implement certain design methodology steps and then further qualified to manage upstream integration, improving IP quality and control throughout the enterprise.

The Bill of Material (BoM) defines the complete SoC IP hierarchy, including its subsystems, PDKs, SW and all of the dependencies. In Helix IPLM, an IP can be any SoC component, including design data, material meta-data, or even the design tool versions being used. Traceability is enforced by Helix IPLM using immutable releases for each asset, including the parent project. Each engineer on a project has a workspace that renders the design files from the BoM hierarchy, and team members are notified as changes are made or bugs are filed. Helix IPLM is integrated with EDA vendor flows, Jira for bug tracking, and includes helpful analytics. Git, Subversion, and ClearCase are also supported as alternatives to Helix Core for the data management layer.
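As a rough sketch of this idea, a BoM hierarchy of immutable releases can be modeled as a tree of frozen objects. This is purely illustrative Python, not the Helix IPLM API; every class and field name here is invented:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: a release, once made, cannot be mutated
class IPRelease:
    name: str
    version: str
    children: tuple = ()  # pinned child releases in the BoM hierarchy

    def flatten(self):
        """Yield every release in the hierarchy, depth-first."""
        yield self
        for child in self.children:
            yield from child.flatten()

# A toy SoC BoM: the parent project pins exact versions of its children.
adc = IPRelease("adc", "2.1.0")
pdk = IPRelease("pdk_n5", "1.4.2")
soc = IPRelease("soc_top", "0.9.0", children=(adc, pdk))

print([f"{r.name}@{r.version}" for r in soc.flatten()])
# ['soc_top@0.9.0', 'adc@2.1.0', 'pdk_n5@1.4.2']
```

A workspace render would walk a tree like this and fetch each pinned version from the data management layer.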

Helix IPLM, Helix Core

Part of Martin’s demo showed how an SoC with many IP blocks had an issue with an ADC block. In this scenario, the ADC vendor had changed their IP and then re-ran DRC/LVS, so a new version of the ADC block was released. The team was informed of the new release, reviewed the details, then integrated the new version into their SoC. This tight communication loop improves the design team’s velocity.

With Helix IPLM, the user can quickly view all of the library elements as IP blocks in the web interface, or via the command line if preferred, to query the design. In Martin’s demo, Virtuoso was used to make schematic edits on an IP, and meta-data was used to tag this as a work in progress. A new release was then made, with consistency checks run along the way: before the release could be created, the DRC had to be clean, and finally the IP version was updated. Users can view the complete history for any IP to understand what has changed in each release version. Your team gets to enforce its own methodology as a set of rules, scripts, triggers, and schema, so that each IP has management governance.
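A methodology gate of the kind just described (DRC must be clean before a release) can be sketched as a simple rule function. This is a hypothetical illustration, not Helix IPLM’s actual trigger mechanism or API:

```python
def can_release(ip_metadata: dict) -> tuple:
    """Check methodology rules before allowing an IP release."""
    if ip_metadata.get("status") == "work_in_progress":
        return False, "IP is still tagged work-in-progress"
    if not ip_metadata.get("drc_clean", False):
        return False, "DRC is not clean"
    return True, "ok to release"

# An IP that has cleared both checks:
ok, reason = can_release({"status": "ready", "drc_clean": True})
print(ok, reason)  # True ok to release
```

A real deployment would attach checks like this as pre-release triggers, so governance is enforced automatically rather than by convention.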

Summary

Designing an SoC is a complex endeavor, requiring scalable data management, IP lifecycle management and an open architecture to co-exist with all popular EDA tool flows. Perforce has been offering an IP Lifecycle Management tool for many years now with Helix IPLM, along with data management through Helix Core. The combination of Helix IPLM and Helix Core has been demonstrated at major semiconductor design companies, so it’s worth taking a closer look for your organization.

Related Blogs


IROC Introduces an Upgraded Solution for Soft Error Analysis and Mitigation #61DAC

by Mike Gianfagna on 07-24-2024 at 6:00 am

DAC Roundup – IROC Introduces an Upgraded Solution for Soft Error Analysis and Mitigation

#61DAC is the place to go for the latest ideas, technology and products for semiconductor design and manufacturing. Between the exhibit floor and the technical program, you can get a vast education on almost any topic. In this post, I will focus on a unique company and a new version of a unique solution. IROC Technologies specializes in helping the semiconductor industry evaluate and manage reliability risks during chip design to minimize soft errors over the life of the design. Advanced semiconductor processes make circuits more sensitive to soft errors, and the growing use of these circuits in reliability-critical applications demands protection against glitches of all kinds. Here are some useful details from the show floor, where IROC introduces an upgraded solution for soft error analysis and mitigation.

What’s New and Why It Matters

I recently covered a critical part of the technology portfolio from IROC – TFIT. This tool delivers a best-in-class transistor/cell-level soft error simulator. It essentially performs a comprehensive analysis of the circuit and particle interactions to determine if there is a potential for soft errors to occur. What is unique about the software is that it runs models using a standard SPICE simulator. Other approaches require 3D TCAD simulators, which are hard to set up and run slowly, so TFIT makes detailed analysis of circuits much more accessible since it runs 100X faster than TCAD simulators. Partnerships with major foundries also ensure accurate results.

As discussed in the prior post, TFIT can be used to calculate the SER of basic cells and helps optimize the layout of radiation-hardened designs. Once a system is built with these basic cells, the next question to answer is how resilient the overall system is to soft errors. IROC’s SoC Failure in Time (SoCFIT) addresses this challenge, and a new version of the tool was announced at #61DAC.

SoCFIT and Its Role in Soft Error Analysis and Mitigation

Dr. Maximilien Glorieux

I was fortunate to be able to spend some time at the IROC booth with Dr. Maximilien Glorieux, the CTO at IROC. Max has been a key driving force for tools like SoCFIT, so it was a very informative discussion. Max began by explaining that SoCFIT essentially provides the next level of analysis after TFIT.

The tool embeds a fault simulator, but it’s not like the ones used for test coverage that most of us are familiar with. These products inject faults (typically stuck at one or zero) into a circuit to see if a set of test vectors will find the fault. After applying the test vectors, if the output of the faulty circuit is different from the good circuit, that fault is deemed to be “covered”. 

Max explained that SoCFIT performs a different kind of analysis. In this case, faults from single event upsets are injected into the circuit and the focus is on how these glitches propagate through the circuit. Many don’t propagate and so don’t represent high risk. But some do, and those logic paths must be fortified with approaches such as redundant logic and arbitration circuits that monitor the outputs of the redundant elements. If there is a discrepancy, the faulty data is filtered out, and the back-up copies are used.
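The redundancy-and-arbitration scheme described above reduces, in its simplest form, to 2-of-3 majority voting across triplicated logic. A minimal software sketch of the idea follows (illustrative only; real designs implement the voter in hardware):

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority across three redundant copies of a value."""
    return (a & b) | (a & c) | (b & c)

# A single-event upset flips one bit in one copy; the vote filters it out.
good = 0b1011
upset = good ^ 0b0100  # copy c took a hit
print(bin(majority_vote(good, good, upset)))  # 0b1011
```

As long as at most one copy is corrupted at a time, the voted output always matches the good value, which is exactly the property the arbitration circuit relies on.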

Protecting the whole SoC is an expensive process in terms of area and power, so a tool like SoCFIT is critical to ensure only the risky areas of the system are treated. The tool coordinates and analyzes a large amount of information about the system as shown in the graphic at the top of this post. Max explained that this work helps meet stringent functional safety standards by identifying critical circuit weaknesses from the earliest stages and throughout the design cycle.

Some of the features of SoCFIT include:

  • Comprehensive error propagation analysis: evaluates fault propagation based on circuit structure and simulation vectors
  • Detailed vulnerability reporting: computes logical (LDR), temporal (TDR), functional (FDR) de-rating/vulnerability factors
  • Broad design language support: compatible with SystemVerilog, Verilog, and VHDL, fitting seamlessly into existing workflows
  • Scalable for large designs: handles over 1 million flip-flops per partition; a bottom-up approach makes it ideal for even the most complex SoCs
  • Ultra-fast simulation: achieves over 1,000X faster simulations than typical approaches, drastically reducing analysis time
  • Extensive reporting: generates detailed reports highlighting the contribution of each cell, module, and instance to the overall FIT rate
  • Efficient mitigation strategies: provides clear guidelines for mitigating vulnerabilities, helping you develop robust and reliable designs
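To illustrate how de-rating factors like the LDR, TDR, and FDR above are typically used: assuming they combine multiplicatively, which is the common convention in soft-error analysis (this is a generic sketch, not IROC’s specific formula), each factor shrinks the raw FIT rate in turn:

```python
def effective_fit(raw_fit: float, ldr: float, tdr: float, fdr: float) -> float:
    """Apply logical, temporal, and functional de-rating factors (each in 0..1)."""
    return raw_fit * ldr * tdr * fdr

# 1000 raw FIT, but only 50% of upsets propagate logically, 30% land in a
# vulnerable time window, and 20% matter functionally: about 30 effective FIT.
print(effective_fit(1000.0, 0.5, 0.3, 0.2))
```

This is why per-cell de-rating reports matter: hardening only the flops whose effective contribution is high keeps the area and power cost of mitigation contained.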

Max went on to describe the features of the newest release of SoCFIT, which includes FDR FastSIM, an ultra-fast fault propagation simulation engine. This capability enables functional de-rating analysis about 1,000 times faster than conventional methods. Max also mentioned that the tool is designed to integrate seamlessly into the whole digital design flow, significantly improving end-product reliability by mitigating transient fault threats early. Its advanced features and speed make it ideal for handling complex SoC designs, maintaining accuracy and efficiency throughout the process.

To Learn More

I came away from my visit with Max knowing a lot more about what IROC can do for a wide range of designs, and why the work they are doing is so important. If high-reliability operation is important in your design work, you should learn more about how IROC can help. You can get an overview of how IROC fits into many markets here. And you can get more details on SoCFIT here.  And that’s how IROC introduces an upgraded solution for soft error analysis and mitigation at #61DAC.


Cadence® Janus™ Network-on-Chip (NoC)

by Kalar Rajendiran on 07-23-2024 at 10:00 am

Design Flow when using Janus NoC

A Network-on-Chip (NoC) IP addresses the challenges of interconnect complexity in SoCs by significantly reducing wiring congestion and providing a scalable architecture. It allows for efficient communication among numerous initiators and targets with minimal latency and high speed. A NoC facilitates design changes, enabling quick iterations to meet specific design goals regarding bandwidth, latency, area, and power. Cadence recently expanded their system IP portfolio with the addition of the Janus NoC IP. On the surface, it may prompt the question: what is the big deal? NoC IP is not a new concept, and this type of IP is common in the industry. I got deeper insights by chatting with Cadence’s George Wall, group director of product marketing, and Ronen Perets, senior product marketing manager, both in the Cadence Silicon Solutions Group.

Integral Subsystem Component

The Cadence Janus NoC IP was developed in response to requests from the company’s customer base for expanded system-level solutions. This IP is an integral part of Cadence’s silicon solutions strategy, aimed at providing significant value to its licensee partners. It leverages Cadence’s extensive design expertise and best-in-class verification tools and methodologies, ensuring that the NoC meets the highest standards of quality and performance. This strategic addition enhances Cadence’s portfolio, making it a crucial component for advanced SoC designs. The IP is designed to handle inter-chiplet communication efficiently, using programmable routing and supporting dynamic configurations, and to support the evolving multi-chip module and chiplet-based design architectures. This adaptability ensures future-proofing for increasingly complex SoC designs.

Leverages Cadence’s Extensive Portfolio of Software and Hardware Offerings

Cadence offers a comprehensive system solution that includes processors with a full set of Software Development Tools (SDT) and Software Development Kits (SDKs), Digital Signal Processors (DSPs), libraries and frameworks, I/O controllers to facilitate various interface requirements, and PHYs for physical layer implementations ensuring reliable data transmission. The Cadence Janus NoC enhances performance, power, and area (PPA) by efficiently managing high-speed communications within and between silicon components with minimal latency. By optimizing RTL for PPA and utilizing packetized messages, the NoC reduces wire count and mitigates timing closure challenges, thereby accelerating time to market.

Architectural Exploration and Verification

Cadence offers extensive simulation and emulation options to support architectural exploration and verification. The Palladium Accelerator provides full visibility and increases simulation speed, making it ideal for extensive performance benchmarking. The Protium Platform maps the full SoC onto FPGAs for extremely fast emulation, which is particularly useful for debugging at the SoC level. SystemC modeling allows for fast debugging and firmware bring-up using a functional SystemC model generated alongside the RTL. Additionally, the Cadence Helium Virtual and Hybrid Studio enables the mixing of different model types and running each module on different platforms, facilitating performance monitoring and rapid iteration.

Designed for Ease of Use

The Cadence Janus NoC is designed with ease of use in mind, offering a highly configurable and flexible architecture. It features a GUI configuration tool that allows users to easily configure and generate NoC RTL, and comes with a comprehensive package that includes synthesis scripts, a testbench, and a functional model, streamlining the design process. Early optimization of NoC design is facilitated through iterative design exploration and performance validation using Cadence simulation and emulation technologies, along with the Cadence System Performance Analysis (SPA) tool, ensuring that the architecture meets performance needs.

Cadence Janus NoC Architecture

The Cadence Janus NoC architecture consists of three main components: the Initiator Endpoint Adapter (IEA), which connects initiator endpoints to the NoC; the Target Endpoint Adapter (TEA), which connects target endpoints to the NoC; and the Routing Node, which routes packets between IEAs and TEAs to their respective destinations. A typical NoC comprises multiple IEAs, TEAs, and routing nodes. These nodes are interconnected, allowing messages to traverse from origin to destination efficiently. Routing nodes can be configured to optimize bandwidth and latency, with pipeline stages added to maintain the desired speed despite physical distance challenges.
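As a generic illustration of what routing nodes do between an IEA and a TEA (the article does not disclose Janus’s topology or routing algorithm, so this assumes a simple mesh with dimension-ordered XY routing):

```python
def xy_route(src: tuple, dst: tuple) -> list:
    """Dimension-ordered (X first, then Y) path through a mesh of routing nodes."""
    x, y = src
    path = [src]
    while x != dst[0]:                # travel along X until aligned
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                # then travel along Y to the target
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

# A packet from an IEA at node (0, 0) to a TEA at node (2, 1):
print(xy_route((0, 0), (2, 1)))
# [(0, 0), (1, 0), (2, 0), (2, 1)]
```

In a physical implementation, hops that span long distances are where pipeline stages would be inserted to hold the target clock speed.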

Summary

The Cadence Janus NoC architecture offers a scalable, efficient, and adaptable approach to addressing the complex interconnect requirements of modern SoCs. With advanced configuration tools, robust simulation and emulation options, and comprehensive power management and verification strategies, Cadence’s NoC technology empowers designers to create optimized, high-performance SoCs efficiently and effectively. By managing high-speed communications efficiently, the Janus NoC helps design teams achieve their PPA targets faster and with lower risk, freeing up valuable engineering resources for SoC differentiation. As the industry continues to evolve, Cadence Janus NoC stands as a future-proof platform, enabling designers to meet current and future demands with confidence.

You can learn more about the Janus NoC System IP from here.

Also Read:

Accelerating Analog Signoff with Parasitics

Novelty-Based Methods for Random Test Selection. Innovation in Verification

Using LLMs for Fault Localization. Innovation in Verification


A Joint Solution Toward SoC Design “Exploration and Integration” released by Defacto #61DAC

by Daniel Nenni on 07-23-2024 at 6:00 am

Design flow: Arm IP Explorer and Defacto SoC Compiler

When I was at DAC last month, I had the chance to talk with Chouki Aktouf and Bastien Gratréaux from Defacto and they told me about a new innovative solution to generate Arm-based System-on-Chips. I heard that this solution has now been released.

Defacto and Arm developed a joint SoC design flow to help Arm users cover all needed automation—from SoC design architecture and exploration to top-level generation of all needed files for implementation and verification flows.

The intuitive graphical interface of the Arm design platform, Arm IP Explorer, makes specification of the SoC easy and user friendly. Once SoC exploration is complete, RTL and IP-XACT design files are automatically generated using Defacto’s SoC Compiler design solution.

The jointly developed solution is built around a strong link between Arm IP Explorer and Defacto’s SoC Compiler to enable users to quickly generate several SoC design configurations. The speed of the Defacto SoC Compiler enables the generation of a multitude of SoC configurations based on different user specifications. With this solution, the overall design time from specification to an SoC ready for synthesis can be significantly reduced.

Why was this solution needed?

With the complexity of current SoC designs and the design space possibilities, designers and architects face significant challenges when exploring SoC architectures. They traditionally access an IP database, where they select, configure, and download IP. The following step is to connect the IPs to build the complete SoC design database which is ready for logic synthesis. Iterative work is usually needed for each of the configurations created, which impacts overall turn-around time (TAT).

Providing a comprehensive and automated design solution from specification to implementation with all necessary exploration metrics, such as chip size, power consumption, and so on, is needed more than ever.

How does it work?

The joint Arm IP Explorer/SoC Compiler solution is the shortest path from the definition of Arm-based system architecture to implementation and design verification.

First, users access Arm IP Explorer and start selecting IP cores from the catalog. IP parameters can be set at this level, and IP configuration in general is made easy. With the selected IPs, users can architect the complete system. The platform also gives the flexibility to add custom IPs to reflect the desired system. At this stage, an estimation of the overall size of the SoC is provided.

Integration checks are performed on the fly to ensure that the assembled SoC is correct, including all required, and often complex, connectivity. The completed and validated system is then exported into the Defacto SoC Compiler, which automatically generates the top-level IP-XACT / RTL / UPF files, along with various reports. These reports detail connectivity density, chip size, and power consumption.
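To make "top-level generation" concrete, here is a toy sketch of what emitting an RTL top from a validated IP list amounts to. This is purely illustrative; Defacto’s SoC Compiler does far more (IP-XACT, UPF, reports), and the module names here are invented:

```python
def generate_top(module: str, instances: dict) -> str:
    """Emit a trivial Verilog top that instantiates each selected IP (toy example)."""
    lines = [f"module {module};"]
    for inst_name, ip_type in instances.items():
        lines.append(f"  {ip_type} {inst_name} ();")  # port connections omitted here
    lines.append("endmodule")
    return "\n".join(lines)

print(generate_top("soc_top", {"u_cpu": "cortex_m55", "u_adc": "adc12b"}))
```

The value of automating this step is that every configuration explored in Arm IP Explorer can be turned into a synthesis-ready netlist skeleton without hand-editing the top level.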

The generated files are fully compatible with standard RTL2GDS SoC design flows and can be provided directly to both logic synthesis tools and design verification tools. With the simplicity, speed, and flexibility of this solution, users can quickly and automatically explore and generate several SoC design configurations.

Who is this solution for?

This solution has been developed for Arm users who need to quickly build new Arm-based SoC configurations. Using this solution, users increase efficiency and productivity, making it easy to find and compare Arm IPs from a single source. With the simplified IP configuration, coupled with the automatic generation of the top-level SoC, users drastically reduce costs and time to market.

This flow has already been validated for a large number of systems and is ready to be used for several applications such as IoT, automotive, mobile, 5G, cloud computing, HPC, AI, etc.

More information can be found on the Defacto page on the Arm partner website: https://www.arm.com/partners/catalog/defacto-technologies

To have a dedicated demo and presentation of the flow, feel free to reach out to Defacto by email. (info_req@defactotech.com)

Also Read:

Defacto at the 2024 Design Automation Conference

WEBINAR: Joint Pre synthesis RTL & Power Intent Assembly flow for Large System on Chips and Subsystems

Lowering the DFT Cost for Large SoCs with a Novel Test Point Exploration & Implementation Methodology

Defacto Celebrates 20th Anniversary @ DAC 2023!


TSMC Foundry 2.0 and Intel IDM 2.0

by Daniel Nenni on 07-22-2024 at 10:00 am

TSMC 2Q2024 Investor Call

When Intel entered the foundry business with IDM 2.0 I was impressed. Yes, Intel had tried the foundry business before, but this time they changed the face of the company with IDM 2.0 and went “all-in” so to speak. The progress has been impressive and today I think Intel is well positioned to capture the NOT TSMC business by providing a trusted alternative to the TSMC leading edge business. The one trillion dollar question is: Will Intel take business away from TSMC on a competitive basis? I certainly hope so, for the greater good of the semiconductor industry.

On the most recent TSMC investor call, which is the first call with C.C. Wei as Chairman and CEO, TSMC branded their foundry strategy as Foundry 2.0. It is not a change of strategy; it is a new branding based on what TSMC has been successfully doing for years now: adding additional products and services to keep customers engaged. 3D IC packaging is a clear example but certainly not the only one. The Foundry 2.0 brand is well earned and is clearly targeted at Intel IDM 2.0, which I think is funny and a great example of CC Wei’s sharp wit.

I thought for sure that Intel 18A would be the breakout foundry node for Intel but according to the TSMC investor call, that is not the case. TSMC N3 was a runaway hit with 100% of the major design wins. Even Intel used TSMC N3. I hadn’t seen anything like this since TSMC 28nm which was on allocation as a result of being the only viable 28nm HKMG node out of the gate. History repeated itself with N3 due to the delay of 3nm alternatives. This made the TSMC ecosystem the strongest I have ever witnessed with both the domination of N3 and TSMC’s rapidly expanding packaging success. I had originally thought that some customers would stick with N3 until the second generation of N2 appeared but I was wrong. On yesterday’s investor call:

CC Wei: We expect the number of the new tape-outs for 2-nanometer technologies in its first two years to be higher than both 3-nanometer and 5-nanometer in their first two years. N2 will deliver full node performance and power benefits, with 10% to 15% speed improvement at the same power, or 25% to 30% power improvement at the same speed, and more than 15% chip density increase as compared with the N3E.

CC had mentioned this before but I can now confirm this based on my hallway discussions inside the ecosystem at recent conferences: N2 designs are in progress and will start taping out towards the end of this year.

I really don’t think the TSMC ecosystem gets enough credit, especially after the overwhelming success of N3, but the N2 node is a force in itself:

CC Wei: N2 technology development is progressing well, with device performance and yield on track or ahead of plan. N2 is on track for volume production in 2025 with a ramp profile similar to N3. With our strategy of continuous enhancement, we also introduce N2P as an extension of our N2 family. N2P features a further 5% performance at the same power or 5% to 10% power benefit at the same speed on top of N2. N2P will support both smartphone and HPC applications, and volume production is scheduled for the second half of 2026. We also introduce A16 as our next nanosheet-based technology, featuring Super Power Rail, or SPR, as a separate offering.

And, of course, the TSMC freight train continues:

CC Wei: TSMC’s SPR is an innovative, best-in-class backside power delivery solution. It is the first in the industry to incorporate a novel backside contact scheme to preserve gate density and device width flexibility. Compared with N2P, A16 provides a further 8% to 10% speed improvement at the same power, or 15% to 20% power improvement at the same speed, and an additional 7% to 10% chip density gain. A16 is best suited for specific HPC products with complex signal routes and dense power delivery networks. Volume production is scheduled for the second half of 2026. We believe N2, N2P, A16, and its derivatives will further extend our technology leadership position and enable TSMC to capture the growth opportunities way into the future.

Congratulations to TSMC on their continued success, it is well deserved. I also congratulate the Intel Foundry team for making a difference and I hope the 14A foundry node will give the industry a trusted alternative to TSMC out of the starting gate.  In my opinion, had it not been for Intel and of course CC Wei’s leadership and response to Intel’s challenge, we as an industry would not be quickly approaching the one trillion dollar revenue mark. Say what you want about Nvidia, but as Jensen Huang openly admits, TSMC and the foundry business is the real hero of the semiconductor industry, absolutely.

Also Read:

Has ASML Reached the Great Wall of China

The China Syndrome- The Meltdown Starts- Trump Trounces Taiwan- Chips Clipped

SEMICON West- Jubilant huge crowds- HBM & AI everywhere – CHIPS Act & IMEC


A New Class of Accelerator Debuts

by Bernard Murphy on 07-22-2024 at 6:00 am

Chimera GPNPU Block diagram

I generally like to start my blogs with an application-centric viewpoint: what end-application is going to become faster, lower power or whatever because of this innovation? But sometimes an announcement defies such an easy classification because it is broadly useful. That’s the case for a recent release from Quadric, based on an architecture which seems to carve out a new approach to acceleration. It is able to serve a wide range of applications, from signal processing to GenAI, with performance scaling up to 864 TOPs per their announcement.

The core technology

Quadric’s roots are in AI acceleration, so let’s start there. By now we are all familiar with the basic needs for AI processing: a scalar engine to handle regular calculations, a vector engine to handle things like dot-products, and a tensor engine to handle linear algebra. And that’s how most accelerators work – 3 dedicated engines coupled in various creative ways. The Quadric Chimera approach is a little different. The core processing element is built around a common pipeline for all instruction types. Only at the compute step does it branch to an ALU for scalar operations or a vector/matrix unit for vector/tensor operations.

Both signal processing and AI demand heavy parallelism to meet acceptable throughput rates, handled through wide-word processing, lots of MACs and multi-core implementations. The same is true for the latest Quadric architecture, but again in a slightly different way. Their new cores are built around systolic arrays of processing elements, each supporting the same common pipeline, each with its own scalar ALU, bank of MACs and local register memory.

This structure, rather than a separate accelerator for each operator class, has two implications for product developers. First, it simplifies software development: still highly parallel to be sure, but it abstracts out a level of complexity found in multi-engine accelerator architectures, where operations must be steered to the appropriate engines.

Second, the nature of parallelism in transformer-based AI models (LLMs or ViT for example) is much more complex than for earlier generation ResNet-class accelerators which process through a sequence of layers. In contrast, transformer graphs flip back and forth between matrix, vector and scalar operations. In disaggregated hardware architectures traffic flows similarly must alternate between engines with inevitable performance overhead. In the Quadric approach, any engine can handle a stream of scalar, vector and tensor operations locally. Of course there will be overhead in traffic between PE cores, but this applies to all parallel systems.
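The contrast with disaggregated engines can be sketched as a single processing element whose pipeline only branches by operand class at the compute step. This is pure illustration in Python; the operation names are invented and this is not Quadric’s actual ISA:

```python
def pe_execute(op: str, a, b):
    """One PE, one pipeline: only the compute step selects the ALU or MAC path."""
    if op == "scalar_add":      # scalar ALU path
        return a + b
    if op == "vector_dot":      # vector path through the MAC bank
        return sum(x * y for x, y in zip(a, b))
    if op == "matrix_mul":      # tensor path through the MAC bank
        return [[sum(r * c for r, c in zip(row, col)) for col in zip(*b)]
                for row in a]
    raise ValueError(f"unknown op: {op}")

# A transformer-style stream alternates operand classes on the same engine:
print(pe_execute("scalar_add", 2, 3))                  # 5
print(pe_execute("vector_dot", [1, 2, 3], [4, 5, 6]))  # 32
print(pe_execute("matrix_mul", [[1, 0], [0, 1]], [[7, 8], [9, 10]]))
```

The point of the sketch is that no hand-off between separate engines is needed when the operation class changes, which is exactly where the traffic overhead arises in disaggregated architectures.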

Steve Roddy (VP Marketing for Quadric) tells me that in a virtual benchmark against a mainstream competitor, Quadric’s QC-Ultra IP delivered 2X more inferences/second/TOPs for a lower off-chip DDR bandwidth and at less than half the cycles/second of the competing solution. Quadric is now offering 3 platforms for the mainstream NPU market segment: QC Nano at 1-7 TOPs, QC Perform at 4-28 TOPs, and QC Ultra at 16-128 TOPs. That high end is already good enough to meet AI PC needs. Automotive users want more, especially for SAE-3 to SAE-5 applications. For this segment Quadric is targeting their QC-Multicore solution at up to 864 TOPs.

All these platforms are supported by the proven Chimera SDK. Steve had an interesting point here also. AI accelerator ventures will commonly mention their “model zoos”. These are standard AI models adapted through tuning to run on their architectures, like function libraries in the conventional processor space. As with those libraries, model zoo entries must be optimized to take full advantage of the target architecture. By implication, a new model requires the same level of tuning, a concern for new customers who must depend on the AI developer to handle that porting for them each time they add or refine a model.

In contrast, Steve says Quadric already hosts hundreds of models on their site which simply compile without changes onto their platforms (you can still tune quantization to meet your specific needs). It’s not a model zoo, but simply a demonstration that their SDK is already mature enough to directly map a wide class of models without modification. And he notes that if your model needs an operator outside the ONNX set they already support, you can simply define that operator in C++, just as you would for say an NVIDIA accelerator.

Applications and growth

Quadric is a young company, shipping their first IP just over a year ago. Since then, they can already boast a handful of wins, especially in automotive. Customer names of course are secret, but DENSO is an investor of record. Other customer wins are in domains that reinforce the general-purpose value of the platform, in traditional camera functions, perhaps also in femtocell basebands (for MIMO processing). These two cases may or may not need AI support, but they do heavily lean on the DSP value of the platform.

This DSP capability is itself pretty interesting. Each PE can handle a mix of scalar and vector operations, up to 32b integer or 16b float, and these can run in parallel across up to 1,024 PEs in a QC Ultra. So you can serve your immediate signal processing needs with high-end DSP word widths and add transformer-grade functionality to your engine later.
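As a rough illustration of that lane-parallel model, here is a plain-Python sketch (purely conceptual, not Quadric’s ISA or SDK): every “PE” runs the same multiply-accumulate kernel on its own tile of the signal.

```python
# Conceptual sketch of lane parallelism in a DSP-style array: the same
# multiply-accumulate (MAC) kernel runs on every lane's tile of the input,
# mimicking identical PEs each working on a slice of a larger signal.
# Lane count and kernel values are illustrative, not a real architecture.
def mac(samples, coeffs):
    """Dot product: the basic DSP multiply-accumulate kernel."""
    return sum(s * c for s, c in zip(samples, coeffs))

def parallel_mac(signal, coeffs, lanes):
    """Split the signal into per-lane tiles and run the MAC on each tile."""
    tile = len(signal) // lanes
    return [mac(signal[i * tile:(i + 1) * tile], coeffs) for i in range(lanes)]

signal = list(range(16))   # 16 samples
coeffs = [1, -1, 1, -1]    # toy 4-tap kernel
print(parallel_mac(signal, coeffs, lanes=4))  # prints [-2, -2, -2, -2]
```

In hardware the lanes execute simultaneously rather than in a list comprehension, which is where the throughput scaling across hundreds of PEs comes from.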

Sounds like a new breed of accelerator engine to me. You can learn more HERE.

Also Read:

2024 Outlook with Steve Roddy of Quadric

Fast Path to Baby Llama BringUp at the Edge

Vision Transformers Challenge Accelerator Architectures


Podcast EP236: Why Comprehensive Development Support for AI/ML is Important with Clay Johnson

by Daniel Nenni on 07-19-2024 at 10:00 am

Dan is joined by Clay Johnson, CEO of CacheQ. Clay has decades of executive experience in computing, FPGAs, and development flows, including serving as Vice President of the Xilinx Spartan Business Unit; Xilinx was later acquired by AMD.

Clay discusses the changes occurring in system design to leverage AI/ML and technologies such as large language models. Clay points out that enabling these changes doesn’t end with the development of a new chip that performs AI algorithms faster.

Rather, the availability of a comprehensive development environment to integrate new technologies into existing systems becomes the key enabler to progress. Clay describes several examples of this trend.

CacheQ’s heterogeneous development platform enables easy development, deployment and orchestration of applications across multiple cores and heterogeneous distributed compute architectures. This results in significant increases in application performance and a dramatic reduction in development time.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Interview: Orr Danon of Hailo

by Daniel Nenni on 07-19-2024 at 6:00 am

Orr Danon CEO Hailo

Orr Danon is the CEO and Co-Founder of Hailo. Prior to founding Hailo, Orr spent over a decade working at a leading IDF Technological Unit. During this time he led some of the largest and most complex interdisciplinary projects in the Israeli intelligence community. For the projects he developed and managed, Danon received the Israel Defense Award from the President of Israel and the Creative Thinking Award from the Head of Military Intelligence. Danon holds a B.Sc. in Physics & Mathematics from the Hebrew University as part of the “Talpiot” program and an M.Sc. in Electrical Engineering (cum laude) from Tel Aviv University.

Tell us about your company?
Hailo is an edge AI-focused chipmaker. We develop specialized AI processors that enable high performance machine learning applications on edge devices such as NVRs, cameras, personal computers, vehicles, robots and more.

Hailo’s current key offerings include the Hailo-8 AI accelerator, which allows edge devices to run deep learning applications at full scale more efficiently, effectively, and sustainably; the Hailo-15 vision processor, which can be placed directly into the next generation of intelligent cameras; and the Hailo-10 GenAI accelerator, which empowers users to run generative AI locally and minimize reliance on cloud-based platforms.

What problems are you solving?
The Hailo AI processors bring data-center-class performance to edge devices, enabling real-time, high-accuracy processing of advanced deep learning models at very low power consumption and attractive cost. Users can now run sophisticated AI tasks such as object detection, image enhancement, and content creation on edge devices without compromising on cost, solving previous issues with AI at the edge.

What application areas are your strongest?
We see a number of key application areas, including security, automotive, personal computers, and industrial automation.

Hailo is already serving more than 300 customers in these market segments.

Earlier in the year we announced that our Hailo-8 AI accelerator has been chosen alongside the Renesas R-Car V4H SoC to power the iMotion iDC High domain controller, advancing the future of autonomous driving. A Chinese automaker is expected to begin mass production with the domain controller in the second half of this year.

Additionally, we announced in June that Raspberry Pi had selected Hailo to provide AI accelerators for the Raspberry Pi AI Kit, the computing company’s AI-enabled add-on for Raspberry Pi 5. The partnership will empower both professional and enthusiast creators to elevate their projects and solutions in home automation, security, robotics and beyond, with advanced AI capabilities.

What keeps your customers up at night?
Our customers are concerned with ensuring high-quality machine learning and AI services independent of network connectivity, and with making sure their AI offering delivers strong performance-to-cost and performance-to-power ratios.

Another aspect which customers are always concerned about is the software tools which we as a silicon company provide. AI is a rapidly developing field, and the ability to respond fast to the dynamic market environment in which our customers operate depends heavily on the quality of the software toolchain, its documentation and of course the support we provide to them.

What does the competitive landscape look like and how do you differentiate?
Hailo is the only chipmaker to have designed a processor specifically for running AI applications on edge devices, taking into consideration factors like cost, size, power consumption, and memory access. Other AI processors, such as GPUs, were not designed to run edge AI applications and are therefore more costly and power-hungry.

Additionally, Hailo is the only chipmaker offering a full range of AI processors in the single-digit-Watt range, from accelerators that operate as co-processors handling only the AI models, to full-blown camera SoCs that handle both vision processing and AI video enhancement and analytics, all with a single, robust software suite that allows developers to use the same applications on different platforms.

What new features/technology are you working on?
We recently announced a $120M extended Series C fundraising round, which will be used for continued research and development and for the Hailo-10 generative AI accelerators that unlock the power of GenAI on edge devices such as personal computers, smart vehicles, and commercial robots. Hailo-10 allows users to completely own their GenAI experiences, making them an integral part of their daily routine.

How do customers normally engage with your company?
To support the thousands of AI developers using Hailo devices, and to accommodate the growing Hailo community, we recently introduced an online developer community featuring tutorials, FAQs, and other resources to foster innovation among creators and developers. Registered members will have the opportunity to engage with a team of Hailo experts and connect with each other to share code, experiences, resources, knowledge, and more.

Visit https://hailo.ai/ for more information about our products, solutions and latest case studies or contact us here.

Also Read:

CEO Interview: David Heard of Infinera

CEO Interview: Dr. Matthew Putman of Nanotronics

CEO Interview: Dieter Therssen of Sigasi


Has ASML Reached the Great Wall of China

by Claus Aasholm on 07-19-2024 at 6:00 am

ASML Holdings 2024

Is it time to abandon the ASML stock?

The first tool company to report Q2-24 results is ASML, and the lithography leader delivered a result above the guidance of EUR5.95B. Revenue of EUR6.242B is 4.9% above guidance and 18% above last quarter’s result of EUR5.29B.

Both operating profit and gross profit grew but not to the level of the end of last year. ASML management calls 2024 a transition year in investor communications, indicating a stronger 2025.

Tool revenue increased after a significant dip. Service Revenue is much more resilient than tool revenue, as it is dependent on the installed base of tools.

Almost all of the tool revenue growth came from memory tool sales, indicating that the memory companies are finally ready to make substantial investments in new capacity, which is much needed after the shift to HBM production.

From a product perspective, the short-term trend of EUV revenue decline continued while the immersion product sales were solid.

Immersion lithography fills the gap between the final lens element and the wafer with water; the water’s higher refractive index raises the effective numerical aperture of the optics, allowing better resolution at the same light wavelength.
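The benefit can be made concrete with the standard Rayleigh resolution criterion; the NA and k1 values below are typical illustrative numbers, not ASML specifications.

```python
# Rayleigh criterion: minimum printable half-pitch R = k1 * wavelength / NA.
# Immersion fills the lens-wafer gap with water (refractive index ~1.44),
# letting the effective numerical aperture (NA) exceed the dry limit of 1.
def resolution_nm(wavelength_nm, na, k1=0.35):
    """Minimum half-pitch in nm for a given wavelength and numerical aperture."""
    return k1 * wavelength_nm / na

dry = resolution_nm(193, 0.93)   # dry ArF scanner, NA ~0.93
wet = resolution_nm(193, 1.35)   # immersion ArF scanner, NA ~1.35
print(f"dry: {dry:.1f} nm, immersion: {wet:.1f} nm")  # dry: 72.6 nm, immersion: 50.0 nm
```

Same 193nm light, roughly 30% finer features, which is why immersion tools carry the 7-14nm class nodes.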

Given the Chips Act and other subsidies, the ASML result is somewhat counter-intuitive as EUV is used for 3-7nm leading-edge manufacturing nodes, and immersion is used for 7-14nm. Given the US attempt to become a leading-edge manufacturing location, it could be expected that leading-edge tools would dominate revenue. This indicates that the new factories are not yet in the tooling phase.

The other significant consumer of leading-edge tools is TSMC, which reported its Q2-24 results right after ASML.

Although Capex spending was up, it was still just slightly above the maintenance investment level, the investment needed to offset the deterioration of the existing manufacturing assets. TSMC is likely waiting for ASML’s High-NA tool to become available. ASML has confirmed it shipped one of these babies last quarter and installed another in Veldhoven on the joint IMEC/ASML manufacturing line. The tool is priced north of $350M, and ASML is aiming for a production capacity of 20 systems annually across the 24/25 timeline.

Despite the guidance beat and reasonable growth, the ASML share price plunged. Are the markets losing confidence in the lithography leader?

What about China?

The key reason for the decline is that the ASML results coincided with news that further export limitations are in the works.

Since the signing of the Chips Act, tool sales to China have exploded. While this could be expected, it seems like the US administration’s patience has run out.

The Chinese companies have not had access to the EUV systems since 2019, and the latest embargo, which began on September 23, banned sales of the immersion systems. This makes 80% of ASML’s products (from a revenue perspective) unavailable for Chinese customers.

As ASML has been allowed to ship the backlog, the effect has been delayed, and China still accounted for 49% of all tool sales in Q2-24.

This, however, is about to end abruptly as the Chinese backlog has been depleted.

The ASML backlog now reflects the embargo revenue view, and from now on, the Chinese revenue will fall to 20% of the total from the current level of 49%.

The potential new embargo will impact ASML’s service revenue, which is currently 24% of total revenue. Under a potential new embargo, ASML can lose the ability to service its Chinese customers, which is incredibly important for keeping the tools alive and productive. As the Chinese manufacturing base could deteriorate fast, this could create new opportunities for ASML as mature node capacity would grow outside China.

The longer-term view

With the likely dip in China business and a potential embargo impacting service revenue, investors are starting to panic and flee ASML. It is worth noting that this is an amazing company, founded on a philosophy of long-term cooperation with its suppliers and other stakeholders. Constant innovation keeps driving higher productivity, but tool pricing has reached levels that are alarming (for customers) and still rising.

While each tool increases productivity, it is still a hefty price if you want to be at the bleeding edge of Semiconductor manufacturing.

The current ASML manufacturing plan will enable the company to deliver a $20B+ quarter (at current pricing) by the end of 2026. This is not a given or a forecast and can change with industry developments. However, it is a very strong indication that the company has faith in the long-term future of its current strategy.

Our research is focused on the business results and not on investment advice. However, if you have faith in the long-term plan of ASML, it might be too early to dump ASML shares.

Also Read:

Will Semiconductor earnings live up to the Investor hype?

What if China doesn’t want TSMC’s factories but wants to take them out?

Blank Wafer Suppliers are not Totally Blank


Blue Cheetah Advancing Chiplet Interconnectivity #61DAC

by Daniel Payne on 07-18-2024 at 10:00 am

blue cheetah 61dac min

At #61DAC, I love it when an exhibitor booth uses a descriptive tagline to explain what they do, like when the Blue Cheetah booth displayed Advancing Chiplet Interconnectivity. Immediately, I knew they were an IP provider focused on chiplets. What sets them apart, I learned, is how customizable their IP is: it supports specific physical and system bandwidth requirements, can be configured for cost-sensitive or high-performance cases, optimizes energy and performance from 32 Gb/s down to 8 Gb/s and lower, is process-ready at nodes from 16nm to 3nm, and has been silicon-proven with reference board designs. I sat down with John Lupienski, VP Product Engineering at Blue Cheetah, to better understand what they were all about. John’s background covers roles at Cadence, Broadcom, and Motorola.

Blue Cheetah at #61DAC

Chiplet designers can opt for an industry-standard interconnect, such as UCIe or BOW, or something custom; Blue Cheetah supports either approach. Blue Cheetah tracks the emerging chiplet standards and is an active participant in both organizations. Smaller IO core area, lower energy per bit, and tailor-fit designs are compelling reasons to talk with this IP vendor. The company can customize its IP links for each unique application and deliver solutions using advanced process technologies across multiple foundries, supporting both standard and advanced packaging technologies. Its IP has been used in tape-outs for chiplet interconnects ranging from 16nm down to the 4nm node.

During DAC, Baya Systems and Blue Cheetah announced their combined chiplet-optimized Network on Chip (NoC) and Physical Layer (PHY) interconnect IP offerings, making it easier and less risky to design with chiplets. Tenstorrent announced in February that it uses the Blue Cheetah die-to-die interconnect IP for its AI and RISC-V products. Tenstorrent recently announced that it also uses Baya Systems’ NoC fabric IP.

The demonstration at the booth showed test packages integrating 12nm chiplets (availability announced in May 2023) with channel lengths spanning 2mm up to 25mm. Blue Cheetah’s customers develop products for a wide variety of end markets; in addition to Tenstorrent, publicly known examples of Blue Cheetah’s customers and partners include DreamBig Semiconductor, FLC, and Ventana Microsystems.

Blue Cheetah test chip, various channel lengths

The architecture of the interconnect IP is modular, making it quicker to port to newer process nodes. John mentioned that packaging for chiplets requires an engineer to perform SI/PI analysis, since customers often use an OSAT for assembly and each chiplet can be fabricated at a different node, so you really want interconnect IP that has been silicon-proven. To help you get started with chiplets, they offer reference boards and software to speed up the learning curve.

Summary

SoCs have been around for decades, while the trend of using chiplets has just started in the last several years. Blue Cheetah is a trailblazer in the industry and has solidified its position with high-speed, low-latency, power-efficient D2D BlueLynx™ interface products. The company’s standards-based and customizable IP solutions are available now in 16nm, 12nm, 7nm, 6nm, 5nm, 4nm, 3nm, and below across multiple semiconductor foundries.

You can follow up with John directly or contact the company on its website for more info. The company appears at many events throughout the year, including DAC, Chiplet Summit, ISSCC, OCP Global Summit, SemIsrael Expo, and foundry events.

Related Blogs