

Application-Specific Lithography: Avoiding Stochastic Defects and Image Imbalance in 6-Track Cells
by Fred Chen on 02-07-2024 at 6:00 am


Discussions of any particular lithographic application often refer to imaging a single pitch, e.g., a 30 nm pitch for a 5nm-family metal track scenario. However, it is always necessary to confirm the selected patterning technique against the actual use case. The 7nm, 5nm, or 3nm 6-track cell has four minimum-pitch tracks, flanked by wider lines for the ground and power rails. Figure 1 shows how a uniform 30 nm pitch exposure aggravates the stochastic defect hot spots. When the illumination gives rise to fewer diffraction orders, a valley appears where the rail peak is expected, while image intensity imbalance appears among the four inner trenches when more diffraction orders are included [1].

Figure 1. 30 nm pitch 6-track cell with exposed trenches (0.33NA EUV). The illumination angle (left) strongly affects both the optical image (center) and the expected stochastic defect location (right). The red circles indicate the stochastic defect hot spot locations.
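A simplified way to see why the number of captured diffraction orders matters is to synthesize a coherent aerial image of the cell directly from its filtered spatial-frequency spectrum. The sketch below is a minimal 1-D model with an assumed toy layout and a single illumination tilt; it is not the rigorous simulation behind Figures 1 and 2, but it reproduces the qualitative dependence of image balance on the pupil cutoff and illumination angle.

```python
# Minimal coherent-imaging sketch (assumption: simplified 1-D model and toy
# cell layout, not the rigorous simulation used for the figures).
import numpy as np

wavelength = 13.5   # nm, EUV
NA = 0.33           # low-NA EUV pupil
period = 180.0      # nm: assumed cell period (four 30 nm pitch tracks plus a wider rail region)
n = 4096
x = np.linspace(0.0, period, n, endpoint=False)

# Assumed toy layout: one wide rail trench plus four 15 nm trenches at 30 nm pitch.
mask = np.zeros(n)
for center, width in [(20.0, 30.0), (60.0, 15.0), (90.0, 15.0), (120.0, 15.0), (150.0, 15.0)]:
    mask[np.abs(x - center) < width / 2] = 1.0

def aerial_image(mask, tilt=0.0):
    """Coherent image for one plane-wave illumination tilt (tilt in units of NA/wavelength)."""
    spectrum = np.fft.fft(mask)
    freqs = np.fft.fftfreq(n, d=period / n)        # spatial frequencies, cycles/nm
    cutoff = NA / wavelength                       # pupil edge in cycles/nm
    keep = np.abs(freqs - tilt * cutoff) <= cutoff # diffraction orders collected by the lens
    return np.abs(np.fft.ifft(np.where(keep, spectrum, 0.0))) ** 2

on_axis = aerial_image(mask, tilt=0.0)   # symmetric pupil: the 30 nm pitch orders fall outside it
tilted = aerial_image(mask, tilt=0.9)    # strong off-axis tilt: an asymmetric set of orders passes

# Compare the intensity at the four inner trench centers to see the imbalance.
for name, img in [("on-axis", on_axis), ("tilted", tilted)]:
    samples = [img[int(c / period * n)] for c in (60.0, 90.0, 120.0, 150.0)]
    print(name, [round(float(s), 3) for s in samples])
```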

When using a more absorbing metal oxide resist, unexposed trenches can be patterned. However, the aggravated stochastic defect locations are still present, and the image imbalance is worse, with pitch walking also evident (Figure 2).

Figure 2. 30 nm pitch 6-track cell with unexposed trenches (0.33NA EUV). The illumination angle (left) strongly affects both the optical image (center) and the expected stochastic defect location (right). The red circles indicate the stochastic defect hot spot locations. Red arrows indicate pitch walking (shift of peak positions).

A High-NA EUV system does not change the outcome [1], and is also not yet available. To avoid the above issues, the only single-mask solutions for patterning four 30 nm pitch tracks together with the two wide rails would be EUV Self-Aligned Double Patterning (SADP) and DUV Self-Aligned Quadruple Patterning (SAQP), both starting from the same core cell pitch. The difference in complexity and cost between EUV and DUV far exceeds the difference in processing complexity and cost between SAQP and SADP, since SAQP has long been mature on established tools [2,3]. The expected spacer layout and arrangement for the SAQP approach are shown in Figure 3.

Figure 3. Spacer layout and arrangement for DUV SAQP for the 6-track cell. The blue and red lines would be cut or trimmed separately.

The SAQP flow would then be followed by a cut mask applied at least twice, each time on alternating lines. The same dual cut mask approach has been applied as part of the well-known Self-Aligned Litho-Etch-Litho-Etch (SALELE) approach used with EUV, which uses four EUV masks in total [4]. The LELE EUV masks may also be substituted by DUV LELE with SADP at lower cost, though up to four extra DUV masks may be needed [5].

References

[1] F. Chen, “6-Track Cell Imaging in Low-NA and High-NA EUV Lithography,” https://www.youtube.com/watch?v=Z9MQuKrqLYw

[2] L-A. Ragnarsson et al., “The Environmental Impact of CMOS Logic Technologies,” 2022 EDTM.

[3] L. Liebmann et al., “The daunting complexity of scaling to 7NM without EUV: Pushing DTCO to the extreme,” Proc. SPIE 9427, 942702 (2015).

[4] R. Venkatesan et al., “Direct print EUV patterning of tight pitch metal layers for Intel 18A process technology node,” Proc. SPIE 12292, 1229202 (2022).

[5] F. Chen, https://www.linkedin.com/pulse/extension-duv-multipatterning-toward-3nm-frederick-chen; F. Chen, https://www.linkedin.com/pulse/beol-mask-reduction-using-spacer-defined-vias-cuts-frederick-chen

This article first appeared in LinkedIn Pulse: Application-Specific Lithography: Avoiding Stochastic Defects and Image Imbalance in 6-Track Cells

Also Read:

Non-EUV Exposures in EUV Lithography Systems Provide the Floor for Stochastic Defects in EUV Lithography

Application-Specific Lithography: Sense Amplifier and Sub-Wordline Driver Metal Patterning in DRAM

Is Intel cornering the market in ASML High NA tools? Not repeating EUV mistake



2024 Outlook with Thomas Gerner Nørgaard, Founder Comcores
by Daniel Nenni on 02-06-2024 at 10:00 am


SemiWiki has been working with Comcores for two years. They are very active in the semiconductor ecosystem and highly regarded.

Comcores specializes in providing digital intellectual property (IP) solutions and design services for communication and networking technologies. They focus on developing and delivering IP cores, which are pre-designed and pre-verified building blocks of digital systems, to help companies accelerate their product development in areas such as wireless communication, networking, and digital signal processing.

Comcores’ offerings often include IP cores for technologies like Ethernet, optical networking, 5G, and other communication standards. These cores are designed to be integrated into system-on-chip (SoC) solutions, helping companies reduce development time and costs while still meeting the requirements of modern communication systems.

Tell us a little bit about yourself and your company:
My name is Thomas Nørgaard, and I am a serial entrepreneur with a keen interest in advanced technology. I founded Comcores in 2014, after my first venture, Radiocomp. At Comcores, we are driven by creating technologies that enable the future. Our focus is on developing digital IP components and solutions that lower the barrier to building new products by providing easy access to complex technology, reducing the total cost of integrating such functionality into new projects, and offering the convenience of a quality-proven, interoperability-tested, and silicon-agnostic solution.

What was the most exciting high point of 2023 for your company?
2023 was a very active and exciting year for us at Comcores. We worked on new complex projects within Time Sensitive Networking solutions and launched the new JESD204D IP. We are proud to be leading these areas and look forward to working with our customers in integrating these complex technologies.

What was the biggest challenge your company faced in 2023?
The continuous development of the TSN standards has been a difficult area to prioritize from a product development perspective for us. There are many standards and many profiles to account for and we want to make sure that our Ethernet switching solutions are meeting the latest profiles and features within this domain. Likewise, the software orchestration of these systems has been an area of intensive research.

How is your company’s work addressing this biggest challenge?
Comcores has launched multiple streams of development to meet the many profiles of TSN under development and has invested in expanding our teams with engineers having expertise in this field. To orchestrate these systems Comcores also has developed its first version of a Centralized Network Controller to handle the QoS policies in the TSN networks.

What are your expectations for 2024 and your plans for the trends?
2024 will hold a lot of exciting product launches that will contribute to anchoring our status as a leader of IP solutions in our respective areas. Ethernet Solutions is a particular area of interest where we will launch many new solutions with an enhanced content of software. Likewise, we will continue to launch support for new standards and features within JESD, Ethernet, MIPI and eCPRI. We have a lot to look forward to in 2024, and we see it being a fantastic year. We are perfectly poised to support our customers in developing their new solutions and help mature them.

About Comcores
We are a key supplier of digital IP cores and solutions with a focus on Ethernet, Wireless, and Chip-to-chip interfaces. We offer both stand-alone IP components and Ethernet systems for ASIC and FPGA. Our solutions can be customized to meet your design specifications. By purchasing our IPs you drastically reduce your product cost, risk, and time to market. We enable you to focus on your core competences.

We pride ourselves on providing the best-in-class, quality IP components and systems to ASIC, FPGA, and System vendors. Our core competence is the development and thorough validation of the IPs. 

Comcores is headquartered north of Copenhagen in Kgs. Lyngby. Our long-term background in mobile communication and being a first mover in remote radio heads, along with expertise in wireless networks, gives us a solid foundation for understanding the complex requirements of modern communication tasks.

This know-how, coupled with more than 40 dedicated professionals, helps us define and build state-of-the-art, high-quality solutions.

Also Read:

WEBINAR: Understanding TSN and its use cases for Aviation, Aerospace and Defence

JESD204D: Expert insights into what we Expect and how to Prepare for the upcoming Standard

WEBINAR: O-RAN Fronthaul Transport Security using MACsec



proteanTecs Addresses Growing Power Consumption Challenge with New Power Reduction Solution
by Mike Gianfagna on 02-06-2024 at 6:00 am


proteanTecs is a unique company, delivering electronics visibility from within. Its core mission is to enable the electronics industry to continue to scale. The company achieves this goal by first embedding on-chip monitors, called Agents, during the design process to generate deep data on the chip’s profiling, health, and performance. Machine-learning algorithms process this on-chip data throughout the device’s lifecycle, in both test and mission modes. Users gain deep data analytics with actionable insights and alerts on a cloud-based platform with edge applications on the automated test equipment (ATE) or embedded on the board itself.

It’s well-known that power optimization is a huge and growing problem across many markets, so the application of this unique technology to this problem holds great promise. Read on to learn how proteanTecs is addressing the power consumption challenge.

Overview of the Power Reduction Solution

Recently, proteanTecs announced a new power reduction solution aimed at high-performance markets. Using its proven technology, chip makers and system operators can now optimize power and performance without the risk of system failure. This unique capability uses on-chip telemetry, machine learning and predictive analytics to enable workload-aware system-on-chip (SoC) power reduction during production testing and in-field operations. 

For in-field power savings, proteanTecs introduced AVS Pro™, a real-time application for functional-workload-aware adaptive voltage scaling (AVS). Based on timing margin Agents, AVS Pro harvests excess guard-bands to reduce power while preventing failures. This capability is silicon-proven in leading-edge technologies and has enabled customers to reduce power consumption by 8-14% on average.

For power reduction during production, proteanTecs offers prediction-based VDDmin optimization per individual chip and system, with deep data analytics for process grading. These applications are deployed on the tester, with advanced analysis on the proteanTecs cloud platform. Through partnerships with leading ATE vendors, parametric power and performance visibility is available for inline decision making.

The technology finds application across a broad range of reliability-critical markets, such as cloud computing, mobile, telecommunications and automotive. The figure below provides an overview of how the pieces fit together.

Case Study of Use in a Real Design

Results in a real design project are the most compelling proof for any new optimization technology. proteanTecs has made a detailed case study available to demonstrate impact. Some key details and achievements disclosed in this case study include:

  • The customer is a fabless chipmaker making 5nm networking chips for datacenters
  • The challenge faced by this customer was high power consumption due to excessive voltage guard-bands
  • Using proteanTecs AVS Pro, power optimization with a safety-net was achieved in real-time
  • The result was a 12.5% dynamic power reduction which translates into more than $5M per year energy cost savings for hyperscale cloud vendors
  • An 11% performance increase was also achieved because of a higher utilization rate per system

There is more to this impressive story. You can download a complete copy of the case study here.

To Learn More about proteanTecs AVS Pro

Application-specific power-performance optimization based on chip telemetry involves many moving parts. The proteanTecs AVS Pro application monitors the margin to timing failure of millions of real paths, in real time, under real workloads, to reduce voltage to the lowest point that still allows error-free functionality. It also provides an inherent safety-net to prevent failures when events like voltage drops occur, while enabling fast frequency and voltage scaling.
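To make the control loop concrete, here is a minimal sketch of a margin-driven adaptive voltage scaling policy. It is illustrative only: the thresholds, step sizes, and interfaces are assumptions, and proteanTecs' actual AVS Pro algorithm is not public.

```python
# Hedged sketch of a closed-loop adaptive voltage scaling policy (illustrative
# assumptions throughout). Idea: read the worst-case timing margin reported by
# on-chip monitors, step the supply down while excess guard-band remains, and
# step it back up immediately when a droop erodes the margin.

V_MIN, V_MAX, V_STEP = 0.65, 0.90, 0.005   # volts (assumed supply range and step)
MARGIN_FLOOR_PS = 15.0                     # assumed safety-net threshold

def avs_step(vdd: float, worst_margin_ps: float) -> float:
    """Return the next supply setpoint given the current worst path margin."""
    if worst_margin_ps < MARGIN_FLOOR_PS:      # safety-net: back off right away
        return min(V_MAX, vdd + 2 * V_STEP)
    if worst_margin_ps > 2 * MARGIN_FLOOR_PS:  # excess guard-band: harvest it
        return max(V_MIN, vdd - V_STEP)
    return vdd                                 # inside the target band: hold

# Example run: margins shrink as voltage drops, with a droop event injected mid-run.
vdd, margin = 0.85, 60.0
for cycle in range(8):
    vdd = avs_step(vdd, margin)
    margin = 60.0 - 400.0 * (0.85 - vdd) - (25.0 if cycle == 5 else 0.0)
    print(f"cycle {cycle}: vdd={vdd:.3f} V, margin={margin:.1f} ps")
```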

The company has published a comprehensive white paper that dives into the details of how its technologies are combined to deliver these results. The piece covers the power, performance, reliability equilibrium that can be achieved with properly developed guard bands. The various approaches to dynamic power management are also discussed. The AVS Pro solution is explained in detail to illustrate how significant power reduction can be achieved. You can download your copy of this very informative white paper here.

Calculate Your Savings

If you visit the power reduction solution web page, there is a calculator to help you estimate the improvements that are possible for CPU, GPU and AI accelerator chips in the datacenter. If power optimization is a concern for a current or next design, I highly recommend you check out the savings that are possible. 

Also Read:

Fail-Safe Electronics For Automotive

Building Reliability into Advanced Automotive Electronics

Unlocking the Power of Data: Enabling a Safer Future for Automotive Systems



2024 Outlook with Toshio Nakama of S2C
by Daniel Nenni on 02-05-2024 at 10:00 am


Tell us a little bit about yourself and your company.
I am Toshio Nakama, the founder and CEO of S2C and a strong advocate of FPGA accelerated ASIC/SoC design methodology. I first started my career at Altera as an FAE and served in technical and sales management roles at Aptix Corporation. I co-founded S2C in Silicon Valley in 2003 and established R&D and manufacturing teams in Shanghai, China in 2004.

S2C is a leading global supplier of digital EDA solutions for today’s innovative SoC and ASIC designs, now with the second largest share of the global prototyping market. S2C has been successfully delivering rapid SoC prototyping solutions for 20 years.

With over 600 customers, including 6 of the world’s top 10 semiconductor companies, our world-class engineering team and customer-centric sales team are experts at addressing our customer’s SoC and ASIC verification needs. S2C has offices and sales representatives in the US, Europe, mainland China, Hong Kong, Korea and Japan.

What was the most exciting high point of 2023 for your company?
Our 20th anniversary! An exciting time for the company. Also, new products are coming, including OminiArk emulation, PegaSim simulation, and the Claryti debugger.

What was the biggest challenge your company faced in 2023?
New applications and new technologies are arriving quickly. In terms of new applications, the industry is experiencing growth in various areas, including market expansion and the emergence of technologies like RISC-V, chiplets, and AI. New technology trends, including fresh architectures, evolving standards, changing requirements, and innovative ideas, continue to push the industry ahead.

How is your company’s work addressing this biggest challenge?
Delivering solutions tailored to meet customer demands for new technologies and applications. Collaborating with environmentally conscious partners to create a mutually beneficial ecosystem.

What do you think the biggest growth area for 2024 will be, and why?
New applications and technologies will be the primary growth area in 2024: chiplets and AI specifically, plus newly developed IP such as RISC-V and its many variations. We are also seeing growth in the ARM ecosystem. With software being a barrier for systems companies, prototyping is even more important so that software can be developed in concert with hardware. 2024 will definitely be a growth year.

How is your company’s work addressing this growth?
The recently introduced S2C Precision Chip Strategy:

A heterogeneous verification method accelerates large-scale digital circuits, boosting user productivity. By seamlessly combining its various product lines, S2C empowers designers to tackle complex challenges more efficiently, ultimately leading to more robust and optimized designs.

Parallel Drive & Shift-left Method. S2C assists our valued customers in two crucial aspects of their projects: designing the right chip and designing the chip right.

What conferences did you attend in 2023 and how was the traffic?
We were very active in conferences around the world. DAC and DVCon are a long tradition with our company. We are also participating in RISC-V conferences and other regional events. In 2024 we expect even more conference activity around the world with much higher attendance. It should be a very big growth year for S2C and the semiconductor industry.

Final Comment?
Thank you very much for this opportunity. SemiWiki is a valued partner and we look forward to seeing you again at DVCon next month.

Also Read:

Prototyping Chiplets from the Desktop!

S2C’s FPGA Prototyping Accelerates the Iteration of XiangShan RISC-V Processor

ViShare’s Rapid Market Entry with FPGA-Based Prototyping Solution from S2C



Expedera Proposes Stable Diffusion as Benchmark for Edge Hardware for AI
by Bernard Murphy on 02-05-2024 at 6:00 am


A recent TechSpot article suggests that Apple is moving cautiously towards release of some kind of generative AI, possibly with iOS 18 and A17 Pro. This is interesting not just for Apple users like me but also for broader validation of a real mobile opportunity for generative AI. Which honestly had not seemed like a given, for multiple reasons. Finding a balance between performance and memory demand looks daunting for models baselining at a billion or more parameters. Will power drain be a problem? Then there are legal and hallucination issues, which perhaps could be managed through carefully limited use models. Despite the apparent challenges, I find it encouraging that a company which tends to be more thoughtful about product releases than most sees a possible path to success. If they can then so can others, which makes a recent blog from Expedera enlightening for me.

A quick recap on generative image creation

Generative imaging AI is a field whose opportunities are only just starting to be explored. We’re already used to changing our backgrounds for Zoom/Google Meet calls, but generative AI takes this much further. Now we can re-image ourselves in different costumes with different features in imaginary settings – a huge market for image-conscious consumers. More practically, we should be able to virtually try on clothing before we buy or explore options when remodeling a kitchen or bathroom. This technology is already available in the cloud (for example Bing Image Creator) but with all the downsides of cloud-based services, particularly in privacy and cost. Most consumers want to interact with such services through mobile devices; a better solution would be local AI embedded in those platforms. Generative AI through the open-source Stable Diffusion model is a good proxy for hardware platforms to serve this need and more generally for LLM models based on similar core technologies.

Can on-board memory and performance be balanced at the edge?

First, we need to understand the Stable Diffusion pipeline. It starts with a text encoder that processes a prompt (“I want to see a pirate ship floating upside down above a sea of green jello”). That step is followed by a de-noising neural net which handles the diffusion part of the algorithm, creating the information for a final image from trained parameters over multiple iterations. I think of this as a kind of inverse of conventional image recognition: instead of classifying an existing image, the network draws on its training to synthesize an image that matches the prompt. Finally, a decoder stage renders the image from the data constructed in the previous step. Each of these stages is a transformer model.
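For readers who want to see these stages in code, the open-source tooling exposes them directly. The snippet below is a minimal sketch using the Hugging Face diffusers library; the model name, step count, and GPU assumption are illustrative and are not taken from the Expedera blog.

```python
# Minimal Stable Diffusion run via the Hugging Face diffusers pipeline
# (assumptions: this particular checkpoint, 25 steps, and a CUDA GPU).
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a pirate ship floating upside down above a sea of green jello"

# One call runs all three stages described above:
#   1) the text encoder turns the prompt into embeddings,
#   2) the de-noising network iterates num_inference_steps times,
#   3) the decoder renders the final image from the de-noised data.
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("pirate_ship.png")
```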

The Expedera blog author, Pat Donnelly (Solutions Architect), gives a detailed breakdown of parameters, operations and data moves required throughout the algorithm which I won’t attempt to replicate here. What stood out for me was the huge number of data moves. Yet he assumes only an 8MB working memory based on requirements he’s seeing with customers rather than optimal throughput. When I asked him about this, he said that operation would clearly depend on a DDR interface to manage the bulk of this activity.

This is a switch from one school of thought I have heard – that model execution must keep everything in local memory to meet performance requirements. But that would require an unreasonably large onboard SRAM. DRAM makes sense for handling the capacity, but another school of thought suggests that no one would want to put that much DRAM in a mobile device. That would be too expensive. Also slow and power hungry.

DRAM or some other kind of off-chip memory makes more sense but what about the cost problem? See the above reference on Apple. Apparently they may be considering flash memory so perhaps this approach isn’t so wild. What about performance? Pat told me that for Stable Diffusion 1.5, assuming an 8K MAC engine with 7 MB internal memory and running at 750 MHz with 12 GBps external memory bandwidth, they can process 9.24 images/second through the de-noiser and 3.29 images/second through the decoder network. That’s very respectable consumer-ready performance. Power is always tricky to pin down since it depends on so many factors, but numbers I have seen suggest this should also be fine for expected consumer use models.
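As a sanity check on those figures, the arithmetic below works out the implied compute and DRAM traffic budgets per image. It assumes full utilization of the quoted 8K MAC engine at 750 MHz and reads the 12 GBps figure as 12 GB/s of external bandwidth; it is a rough envelope, not Expedera's published analysis.

```python
# Back-of-envelope check on the quoted numbers (assumptions: peak utilization,
# single-pass figures, and 12 GBps read as 12 GB/s).
mac_units = 8 * 1024            # 8K MAC engine
clock_hz = 750e6                # 750 MHz
peak_macs_per_s = mac_units * clock_hz          # ~6.1 TMAC/s peak

denoiser_ips = 9.24             # images/second through the de-noiser (quoted)
decoder_ips = 3.29              # images/second through the decoder (quoted)

# Implied compute budget per image if the engine were fully utilized.
print(f"peak: {peak_macs_per_s/1e12:.2f} TMAC/s")
print(f"<= {peak_macs_per_s/denoiser_ips/1e9:.0f} GMAC per de-noiser image")
print(f"<= {peak_macs_per_s/decoder_ips/1e9:.0f} GMAC per decoder image")

# External memory: 12 GB/s shared across the pipeline bounds the data moved per image.
ext_bw = 12e9
print(f"<= {ext_bw/denoiser_ips/1e6:.0f} MB of DRAM traffic per de-noiser image")
```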

A very useful insight. Seems like we should lay to rest the theory that big transformer AI for the edge cannot depend on off-chip memory. Again you can read the Expedera blog HERE.



What ChatGPT has to say about the Chiplet Summit
by Daniel Nenni on 02-04-2024 at 12:00 pm


Quite a few of the articles I now see about the semiconductor industry use AI. It is not hard to tell, especially for someone like myself, a semiconductor professional with 40 years of experience who also writes. In the last 13 years (yes, SemiWiki is now a teenager) we have published 9,058 blogs. SemiWiki was the first of now many semiconductor blogging sites and it is an honor to work with so many talented people on both sides of the table.

I already did a post on the Chiplet Summit but since chiplets are the next big semiconductor thing I thought it would be fun to see what ChatGPT had to say.

ChatGPT: Write an article about the Chiplet Summit:

Exploring the Future of Semiconductor Innovation: A Recap of the Chiplet Summit

Introduction: The technology landscape is constantly evolving, and one of the key drivers behind this evolution is the semiconductor industry. In recent times, a groundbreaking concept has been gaining traction in the semiconductor community – chiplets. The Chiplet Summit brings together industry experts to explore the role of chiplets in shaping the future of semiconductor design.

Understanding Chiplets: At its core, a chiplet is a modular component of a semiconductor design that serves a specific function. Rather than creating monolithic chips that encompass all functionalities, chiplets allow for the assembly of smaller, specialized components into a cohesive system. This modular approach has the potential to revolutionize the semiconductor industry by offering greater flexibility, scalability, and efficiency in the design and manufacturing processes.

Key Themes Explored at the Chiplet Summit:
  1. Interoperability and Standardization: One of the primary discussions at the Chiplet Summit revolves around establishing standards for chiplet design and interoperability. Creating a common framework ensures that chiplets from different manufacturers can seamlessly integrate into a single system. Standardization not only simplifies the design process but also encourages collaboration among industry players.
  2. Scalability and Customization: Chiplets pave the way for scalable semiconductor architectures, allowing designers to mix and match components based on specific application requirements. This scalability enables the creation of highly customized solutions for various industries, from consumer electronics to data centers. Attendees at the summit explore the potential applications and advantages of scalable chiplet architectures.
  3. Economic and Environmental Impacts: Chiplets have the potential to significantly reduce the economic and environmental costs associated with semiconductor manufacturing. By reusing proven chiplet designs across different products, manufacturers can streamline their production processes and minimize waste. The summit addresses how chiplets contribute to a more sustainable and cost-effective semiconductor ecosystem.
  4. Innovations in Integration Technologies: The Chiplet Summit serves as a showcase for the latest advancements in integration technologies. From advanced packaging techniques to novel interconnect solutions, participants explore the cutting-edge technologies that enable seamless integration of chiplets. These innovations play a crucial role in realizing the full potential of chiplet-based designs.
  5. Challenges and Solutions: While chiplets offer tremendous opportunities, they also present challenges, such as ensuring robust communication between chiplets, managing thermal issues, and addressing security concerns. The summit provides a platform for experts to discuss these challenges and collaboratively work towards viable solutions.

Conclusion: The Chiplet Summit stands at the forefront of semiconductor innovation, bringing together visionaries and thought leaders to explore the vast potential of chiplet technology. As the industry continues to embrace this modular approach to semiconductor design, the summit plays a pivotal role in shaping the future of electronics. With ongoing advancements and collaborative efforts, chiplets are poised to revolutionize the way we perceive and create semiconductor devices, unlocking new possibilities for technological innovation.

End:

In my opinion AI will in fact replace most of the people who write “popular” semiconductor articles because ChatGPT in effect does what they do: rehash what has already been written. SemiWiki bloggers, however, share their semiconductor experience, observations, and opinions, so we are safe, for now.

Also Read:

Chiplets Open Pandora’s Box

How Disruptive will Chiplets be for Intel and TSMC?

Will Chiplet Adoption Mimic IP Adoption?

UCIe InterOp Testchip Unleashes Growth of Open Chiplet Ecosystem



Podcast EP206: An assessment of the Chiplet Ecosystem Movement with Alphawave’s Tony Chan Carusone
by Daniel Nenni on 02-02-2024 at 10:00 am

Dan is joined by Tony Chan Carusone, Chief Technology Officer of Alphawave Semi. Tony has been a professor of Electrical and Computer Engineering at the University of Toronto since 2001 and has served as a Distinguished Lecturer for the IEEE Solid-State Circuits Society and on the Technical Program Committees of the world’s leading circuits conferences.

He co-authored the classic textbooks “Analog Integrated Circuit Design” and “Microelectronic Circuits” and he is a Fellow of the IEEE. Tony has also been a consultant to the semiconductor industry for over 20 years, working with both startups and some of the largest technology companies around the world.

Dan and Tony discuss the emerging movement toward a chiplet ecosystem. Tony describes some of the significant benefits of this approach, including ease of prototyping, power reduction and achieving density beyond the reticle limit. The flexibility offered by a chiplet ecosystem and the ability to more easily integrate optical interconnect are also discussed.

Tony describes Alphawave’s commitment to building a chiplet ecosystem, the technology and products that the company offers and an assessment of what the future holds. He offers a realistic view of how the chiplet ecosystem will mature that is based on past achievements in the industry.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.



CEO Interview: Jay Dawani of Lemurian Labs
by Daniel Nenni on 02-02-2024 at 6:00 am


Jay Dawani is the co-founder and CEO at Lemurian Labs, a startup developing an accelerated computing platform tailored specifically for AI applications. The platform breaks through the hardware barriers to make AI development faster, cheaper, more sustainable, and accessible to more than just a few companies.

Prior to founding Lemurian, Jay founded two other companies in the AI space. He is also the author of the top-rated “Mathematics for Deep Learning.”

An expert across artificial intelligence, robotics and mathematics, Jay has served as the CTO of BlocPlay, a public company building a blockchain-based gaming platform, and served as Director of AI at GEC, where he led the development of several client projects covering areas from retail, algorithmic trading, protein folding, robots for space exploration, recommendation systems, and more. In his spare time, he has also been an advisor at NASA Frontier Development Lab, Spacebit and SiaClassic.

The last time we featured Lemurian Labs you were focused on robotics and edge AI. Now you’re focused on data center and cloud infrastructure. What happened that made you want to pivot?

Indeed, we did transition from focusing on building a high performance, low latency, system-on-chip for autonomous robotics applications that could accelerate the entire sense-plan-act loop to building a domain specific accelerator for AI focusing on datacenter-scale applications. But it wasn’t just an ordinary pivot; it was a clarion call we felt we had the responsibility to answer.

In 2018, we were working on training a 2.1 billion parameter model, but we abandoned the effort because the cost was so extraordinarily high that we couldn't justify it. So imagine my surprise that GPT-3, which OpenAI released as ChatGPT in November 2022, was a 175 billion parameter model. This model is more than 80X larger than what we were working on merely 4 years earlier, which is both exciting and frightening.

The cost of training such a model is staggering, to say the least. Based on current scaling trends, we can expect the cost of training a frontier AI model to exceed a billion dollars in the not too distant future. While the capabilities of these models will be astounding, the cost is ridiculously high. Based on this trajectory, only a handful of very well resourced companies with their own datacenters will be able to afford to train, deploy and fine-tune these models. This isn’t purely because compute is expensive and power hungry, but also because the software stacks we rely on were not built for this world.

Because of geographical and energy constraints, there are only so many places to build datacenters. To meet the compute demands of AI, we need to be able to build zettascale machines without requiring 20 nuclear reactors to power them. We need a more practical, scalable and economical solution. We looked around and didn't see anyone on a path to solving this. And so, we went to the drawing board to look at the problem holistically as a system of systems and reason about a solution from first principles. We asked ourselves how we would design the full stack, from software to hardware, if we had to economically serve 10 billion LLM queries a day. We've set our sights on a zettascale machine in under 200MW by 2028.
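Taken at face value, those two targets imply a per-query energy budget that is easy to work out. The quick calculation below is purely illustrative, using assumed readings of the stated figures (10 billion queries per day within 200 MW, and "zettascale" read as 1e21 operations per second).

```python
# Back-of-envelope on the stated goals (assumptions: the 200 MW budget serves
# the quoted 10 billion LLM queries/day; "zettascale" read as 1e21 ops/s).
queries_per_day = 10e9
power_w = 200e6
seconds_per_day = 86400

energy_per_query_j = power_w * seconds_per_day / queries_per_day
print(f"energy budget per query: {energy_per_query_j:.0f} J "
      f"(~{energy_per_query_j/3600:.2f} Wh)")

ops_per_s = 1e21
print(f"implied efficiency: {ops_per_s/power_w/1e12:.0f} tera-ops per joule")
```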

The trick is to look at it from the point of view of incommensurate scaling –  different parts of a system follow different scaling rules, so at some point things just stop working, start breaking or the cost benefit tradeoff no longer makes sense. When this happens, the only option is to redesign the system. Our assessment and solution encompasses the workload, number system, programming model, compiler, runtime and hardware holistically.

Thankfully, our existing investors and the rest of the market see the vision, and we raised a $9M seed round to develop our number format, PAL, explore the design space and converge on an architecture for our domain-specific accelerator, and architect our compiler and runtime. In simulations, we've been able to achieve a 20X throughput gain in a smaller energy footprint than modern GPUs, and we project an 8X benefit in system performance for total cost of ownership on the same transistor technology.

Needless to say, we’ve got a lot of work ahead of us, but we’re pretty excited about the prospect of being able to redefine datacenter economics to ensure a future where AI is abundantly available to everyone.

That certainly sounds exciting and those numbers sound impressive. But you have mentioned number systems, hardware, compilers and runtimes as all the things you’re focused on – it sounds like a lot for any company to take on at once. It seems like a very risky proposition. Aren’t startups supposed to be more focused? 

It does sound like a lot of different efforts, but it is, in fact, one effort with a lot of interconnected parts. Solving only one of these components in isolation from the others would only hinder the potential for innovation because it overlooks the systemic inefficiencies and bottlenecks. Jensen Huang said it best, “In order to be an accelerated computing company, you have to be a full stack company,” and I fully agree. They are the current market leader for a reason. But I would challenge the notion that we are not focused. Our focus lies in how we think about the problem holistically and how best to solve it for our customers.

Doing that requires a multidisciplinary approach like ours. Each part of our work informs and supports the others, enabling us to create a solution that is far more than the sum of its parts. Imagine if you had to build a racecar. You wouldn’t arbitrarily pick a chassis, add racing tires and drop in the most powerful engine you can find and race it, right? You would think about the aerodynamicity of the car’s body to reduce drag and enhance downforce, optimize the weight distribution for good handling, custom design the engine for maximum performance, get a cooling system to prevent overheating, spec a roll cage to keep the driver safe, etc. Each one of these elements builds upon and informs the other.

That said, it is risky to try and do all of it at once for any company in any industry. To manage the risks we are taking a phased approach, allowing us to validate our technology with customers and adjust our strategy as needed. We have proven our number format works and that it has better power-performance-area than equivalent floating point types, while also having better numerical properties which make it easier to quantize neural networks to smaller bit-widths. We have designed an architecture which we feel confident in, and it is suitable for both training and inference. But more important than all of that is getting the software right, and that is the bulk of our immediate focus. We need to ensure we make the right decisions in our software stack for where we see the world being a year or two or more from today.

Building a hardware company is tough, expensive and takes a long time. The focus on software first sounds like a very viable business on its own, and potentially more appealing to investors in the current climate. Why are you also doing hardware, given that so many well-funded companies in the space are closing their doors, struggling to get adoption with customers, and larger players are building their own hardware?

You’re absolutely correct that software businesses have generally been able to raise capital much more easily than hardware companies, and that hardware is very tough. Our current focus is very much on software because that’s where we see the bigger problem. Let me be clear, the problem isn’t whether I can get kernels running on a CPU or GPU with high performance; that’s a long-solved problem. The problem of today is how we make it easier for developers to productively get more performance out of several-thousand-node clusters of heterogeneous compute without asking them to overhaul their workflow.

That’s the problem we’re currently focused on solving with a software stack that gives developers superpowers and unlocks the full capability of warehouse scale computers, so we can more economically train and deploy AI models.

Now, regarding investment, yes, VCs are being more selective in the kind of companies they back, but it also means VCs are looking for companies with the potential to offer truly groundbreaking products that have a clear path to commercialization while having significant impact. We’ve learned from the challenges and mistakes of others and have actively designed our business model and roadmap to address the risks. It’s also important to take note that what’s made startups successful has rarely been how easily they can raise VC funding, but has more to do with their resourcefulness, stubbornness and customer focus.

And before you ask, we are still working on hardware, but primarily in simulation right now. We don’t intend to tape out for a while. But we can save that conversation for another time.

That is certainly compelling and your phased approach is very different compared with what we’ve seen other hardware companies do. I understand the problem you’re saying your software stack will address, but how does your software differentiate from the various efforts in the market?

Most of the companies you’re referring to are focusing on making it easier to program GPUs by introducing tile-based or task-mapping programming models to get more performance out of GPUs, or building new programming languages to get high performance kernels scheduled on different platforms with support for in-line assembly. Those are important problems that they’re addressing, but we see the problem we’re solving as almost orthogonal.

Let’s for a moment think about the cadence of hardware and software transitions. Single-core architectures gained performance from clock speed and transistor density, but eventually clock speeds hit a plateau. Parallelism using many cores circumvented this and provided sizable speedups. It took software roughly a decade to catch up, because programming models, compilers and runtimes had to be rethought to help developers extract the value in this paradigm. Then, GPUs started becoming general purpose accelerators, again with a different programming model. Again, it took almost a decade for developers to extract value here.

Again, hardware is hitting a plateau – Moore’s law, energy and thermal constraints, memory bottlenecks, and the diversity of workloads plus the need for exponentially more compute are pushing us towards building increasingly heterogeneous computer architectures for better performance, efficiency and total cost. This shift in hardware will of course create challenges for software because we don’t have the right compilers and runtimes to support the next evolution of computing. This time though, we shouldn’t have to wait another 10 years for software to extract the value of heterogeneous architectures or large clusters, especially when they are going more than 80% unutilized.

What we’re focusing on is building a heterogeneity-aware programming model with task-based parallelism, addressing portable performance with cross processor optimizations, context-aware compilation and dynamic resource allocation. And for us, it doesn’t matter whether it’s a CPU, GPU, TPU, SPU (Lemurian’s architecture) or a mesh of all of them. I know that sounds like a lot of fancy words, but what it’s really saying is that we’ve made it possible to program any kind of processor with a single approach, and we can port code from one kind of processor over to another with minimal effort without needing to sacrifice performance, and schedule work adaptively and dynamically across nodes.
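As a purely illustrative aid (Lemurian's actual programming model, APIs, and scheduler are not public), the toy sketch below shows the general shape of task-based parallelism with heterogeneity-aware placement: each task can run on any backend, and a scheduler picks a device at runtime from a simple cost estimate.

```python
# Toy illustration only (assumed device names and relative throughputs; no
# relation to Lemurian's real software stack). Tasks are placed on whichever
# device would finish them earliest given the load already assigned to it.

DEVICES = {"cpu": 1.0, "gpu": 8.0, "spu": 20.0}   # assumed relative throughputs

def schedule(task_cost: float, busy: dict) -> str:
    """Pick the device that would finish this task earliest given current load."""
    return min(DEVICES, key=lambda d: busy[d] + task_cost / DEVICES[d])

def run_graph(tasks: list[float]) -> dict:
    busy = {d: 0.0 for d in DEVICES}
    placement = {}
    for i, cost in enumerate(tasks):          # tasks assumed independent here
        dev = schedule(cost, busy)
        busy[dev] += cost / DEVICES[dev]      # account for the newly assigned work
        placement[f"task{i}"] = dev
    return placement

print(run_graph([5.0, 40.0, 1.0, 200.0, 12.0]))
```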

If what you say is true you may just completely redefine computing. Let’s talk about funding. You raised $9 million in seed funding last year which signifies strong investor support and belief in your vision. What have you done since?

Over the past year, fueled by the seed funding, we have made significant strides. With our team now at 20 members, we meticulously addressed challenges, engaged with customers and refined our approach.

We focused on enhancing PAL for training and inference, explored computer architecture for our accelerator and developed a simulator for performance metrics. Simultaneously, we reimagined our software stack for datacenter applications, emphasizing heterogeneous computing.

This effort resulted in a well-defined architecture, showcasing PAL’s efficacy for AI at scale. Beyond tech advancements, we pursued collaborations and outreach to democratize access. These efforts position Lemurian Labs to address immediate customer challenges, gearing up for the release of our production silicon.

What are Lemurian Labs’ medium-term plans regarding software stack development, collaborations, and the improvement of the accelerator’s architecture?

Our immediate goal is to create a software stack that targets CPUs, GPUs and our AI accelerators with portable performance, which will be made available to early partners at the end of the year. We’re currently in conversations with most of the leading semiconductor companies, cloud service providers, hyperscalers and AI companies to give them access to our compiler and runtime. In parallel, we continue to work on and improve our accelerator’s architecture for a truly co-designed system of hardware and software. And of course, we have just started raising our series A with very strong interest from the investor community, which will enable us to grow our team and meet our target for software product delivery at the end of the year.

In closing, how do you see Lemurian Labs contributing to changing the landscape of AI development, accessibility and equity in the coming years?

We didn’t set out to redefine computing only for commercial gain or for the fun of it. As Lemurians, our driving force is that we believe in the transformative potential of AI and that more than just a few companies should have the resources to define the future of this technology and how we use it. We also don’t find it acceptable that the datacenter infrastructure for AI is on track to consume as much as 20% of the world’s energy by 2030. We all came together because we believe there is a better path forward for society if we can make AI more accessible by dramatically lowering its associated cost, accelerate the pace of innovation in AI and broaden its impact. By addressing the challenges of current hardware infrastructure, we seek to pave the path to empowering a billion people with the capabilities of AI, ensuring equitable distribution of this advanced technology. We hope our commitment to product-focused solutions, collaboration and continuous innovation positions us as a driving force in shaping the future of AI development to be a positive one.

Also Read:

Luc Burgun: EDA CEO, Now French Startup Investor

CEO Interview: Sridhar Joshi of TenXer

CEO Interview: Suresh Sugumar of Mastiska AI



2024 Outlook with Niels Faché of Keysight EDA
by Daniel Nenni on 02-01-2024 at 10:00 am


We have been working with Keysight EDA for the past two years and it has been an honor. The technical depth we have reached with them is impressive. Niels Faché, VP & GM, Keysight EDA, is responsible for Keysight’s design and simulation portfolio. Niels has been part of the Keysight-Agilent-HP family since 1994 when HP acquired Alphabit, a company he co-founded.

Tell us a little bit about Keysight EDA.
Keysight EDA is the design and simulation business inside Keysight Technologies’ test and measurement business. Our EDA solutions are focused on high-speed, high-frequency design problems. They include RF and microwave circuit and system design for MMICs, RFICs and modules, phased arrays, wireless and satellite systems; high-speed digital design for signal integrity and power integrity of complex modules and boards; memory and interconnect design; semiconductor device modeling; power electronics; and, of course, process and data management (Cliosoft).

What was the most exciting high point of 2023 for your company?
The acquisition of Cliosoft in the design data and IP management space. Progress on partnerships with Synopsys, Ansys, and Altium, plus TSMC and Samsung on the foundry side. Introduction of new capabilities in our software suite that support mmWave workflows, Python automation, AI/ML in device modeling, and more.

What was the biggest challenge your company faced in 2023?
Being recognized as a top-tier rather than niche EDA industry player. We made headway on this with our involvement in the ESD Alliance and showing at DAC.

How is your company’s work addressing this biggest challenge?
We are broadening our product portfolio to appeal to a larger audience. We are addressing customer pain points with design complexity and time to market requirements by focusing on connected design and test workflows and shift left from physical to virtual prototypes where issues can be resolved earlier and more cost effectively.

What do you think the biggest growth area for 2024 will be, and why?
In 2024, engineers will continue embracing shift left with their electronic product development cycles. As design moves from the physical into the virtual space, engineers are able to quickly discover and fix problems in the most efficient manner, providing greater insights and performance improvements. The next few years will see a continuing emphasis on connecting design and test workflows to handle rising complexity and more demanding time-to-market requirements for electronic products in wireless, wired, aerospace/defense, and other industries. Here are the emerging electronic design innovations we see for 2024:

3DIC and Heterogeneous Chiplets: New Standards Come into View
New standards such as UCIe are emerging for the creation of chiplets and the disaggregation of system-on-chip designs into smaller pieces of intellectual property that can be assembled into 2.5D and 3D integrated circuits using advanced packaging. Accurately simulating die-to-die physical-layer interconnect will require high-speed, high-frequency channel simulation to UCIe and other standards.

EDA Turns to AI: From Complexity to Clarity
The application of AI and ML techniques in EDA is still in the early adopter phase, with design engineers exploring use cases to simplify complex problems. The intelligence is particularly valuable in model development and validation for simulation, where it assists in processing large volumes of data. In 2024, organizations will increasingly adopt both technologies for device modeling of silicon and III-V semiconductor process technologies, as well as system modeling for forthcoming standards such as 6G, where research is well underway.

Software Automation Empowers Engineers
As Moore’s Law reaches its limits, improving design processes through workflow automation will provide a pathway to increasing the productivity of design engineers. In 2024, software automation techniques, such as Python APIs, will take a more significant role in integrating “best-in-class” tools into open, interoperable design and test ecosystems.

Navigating the Digital Shift: Design Management Essentials
With the creation of digital enterprise workflows, many organizations are investing in design management across tool sets, data, and IP. Moving forward, design data and IP management software will play a critical role in the success of complex SoC and heterogeneous chiplet designs supporting large, geographically distributed teams. Creation of digital threads between requirements definition and compliance, and establishing tighter links with enterprise systems such as PLM also play a role in digital transformation of product development cycles.

Next-Gen Quantum Design: Optimizing System Performance
Quantum computing is advancing at a rapid pace and is transitioning from predominantly free research tools to commercial products and workflows in quantum design. Next-generation quantum design will require more integrated simulation workflows that provide developers with fast and accurate capabilities to optimize system performance.

Silicon Photonics Fuels Data Center Transformation
Data centers are evolving to provide higher compute performance to support the exponential growth in AI and ML workloads, as well as the need for more efficient power utilization and thermal management. Silicon photonics will play a critical role in accelerating the transformation of data centers to meet the appetite for compute performance. As design engineers develop high-speed data center chips that incorporate silicon photonics interconnect, they will need process design kits (PDKs) and accurate simulation models that support the advanced development work.

How is your company’s work addressing this growth?
We work closely with customers to solve their design, simulation, and workflow problems. We focus on growth sub-segments of the market and are seeing strong growth in our high-speed digital business. We are launching new simulation engines and workflows, for example focused on UCIe simulation for chiplets. We incorporate measurement science into our products to make them more accurate.

What conferences did you attend in 2023 and how was the traffic?
DesignCon, OFC, TSMC Tech Symposium and OIP Forum, DAC, IMS, European Microwave Week and others. DAC was the highlight for us last year. We had a very strong presence with many customer meetings. It was a big success leveraging Cliosoft’s past approach to DAC for Keysight EDA. Our EDA business is relationship driven and we value show participation as a way to foster those relationships and initiate new relationships. We will be participating in the same conferences this year in addition to the Chiplet Summit and SNUG.

Also Read:

Higher-order QAM and smarter workflows in VSA 2023

Advanced EM simulations target conducted EMI and transients

Seeing 1/f noise more accurately



Cadence Claims the CFD High Ground with a New GPU-Based Accelerator
by Bernard Murphy on 02-01-2024 at 6:00 am


For observers of EDA markets there is an easily overlooked opportunity for new growth. Today around 50% of EDA revenues come from systems companies rather than semiconductor companies, spanning datacenters, automotive, aerospace, energy, and others. In most of these industries total system design depends as much on mechanical and other multiphysics optimizations (aerodynamics, stress, thermal, electromagnetics, etc.) as on electronic design. Multiphysics analysis has already penetrated semiconductor design, for example in-package to in-system thermal analysis and management using computational fluid dynamics (CFD) for cooling analytics. In short, multiphysics bridges electronic system design and total system design, which is critical to supporting power generation, airline, and automotive markets. As in chip design, system problems in these domains keep getting harder, demanding active and continuous innovation from solution providers to address modern design needs. Exploiting synergies between EDA and multiphysics expertise, Cadence claims the Millennium platform delivers a breathtaking performance advance for multiphysics analytics, solving industrial-scale problems in hours rather than weeks and opening large new growth opportunities.

A Fast Layman’s Guide to CFD

CFD simulates the flow of a fluid (liquid or gas) around/through mechanical structures like circuit boards, aircraft, gas turbines and cars. Without CFD, these measurements must be made on prototypes, for example in wind tunnels, an expensive and time-consuming process. With CFD, engineers can shift-left (a familiar concept in EDA), to study performance of a digital twin against simulated fluid flows.

Simulations are based on the Navier-Stokes differential equations, mapped across discrete meshes to enable numerical solving. Meshes are designed with finer spacing around critical zones and coarser spacing elsewhere, and commonly run to many millions of elements. Factors considered in solving across the mesh include pressure, temperature, and viscosity, because all fluids flow more slowly near boundaries. Compressibility can be important when considering acoustics or Mach speeds; turbulence is another factor at high speeds. These factors have enough impact on mesh and solver methods that CFD must provide a family of technology solutions.
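To give a flavor of what "mapping a differential equation across a discrete mesh" means, here is a deliberately tiny sketch. It solves a 1-D viscous Burgers' equation, a classic toy stand-in for the Navier-Stokes system, on a uniform mesh with an explicit time step. Production CFD codes differ in essentially every respect (3-D unstructured meshes, implicit solvers, turbulence models), so treat this only as an illustration of the discretization idea.

```python
# Toy 1-D viscous Burgers' solver (assumption: a stand-in for Navier-Stokes,
# uniform mesh, explicit time stepping, fixed boundary values).
import numpy as np

nx, L = 200, 1.0
dx = L / (nx - 1)
nu = 0.01                         # viscosity
dt = 0.2 * dx * dx / nu           # stable explicit step (diffusion-limited)
u = np.where(np.linspace(0, L, nx) < 0.5, 1.0, 0.0)   # step initial condition

for _ in range(500):
    un = u.copy()
    # upwind convection + central diffusion on the interior mesh points
    u[1:-1] = (un[1:-1]
               - dt / dx * un[1:-1] * (un[1:-1] - un[:-2])
               + nu * dt / dx**2 * (un[2:] - 2 * un[1:-1] + un[:-2]))
    u[0], u[-1] = 1.0, 0.0        # fixed boundary values

print("front position ~", np.argmax(u < 0.5) * dx)
```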

Turbulence is the most challenging condition to simulate accurately. The most widely practiced technique in industry today develops static averages, a weak approximation for a dynamic phenomenon, which can deliver accurate CFD around an airplane wing at cruising altitude but not during ascent or descent. A different technique called Large Eddy Simulation (LES) can model turbulence much more accurately and dynamically but is more computationally expensive, making extensive turbulence modeling through a digital twin impractical. Thus critical analyses have been limited to physical modeling using prototypes in wind tunnels, effective but too cumbersome to explore thousands of scenarios for optimization.

Cadence Authority in CFD and LES

CFD is a high expertise domain with a lot of history. Tool departments and often product teams are staffed with armies of PhDs. Algorithms for meshes and solvers, together with software, have evolved significantly and of course continue to evolve. In other words this is a domain an EDA company must enter inorganically.

Cadence started here in 2021 with a series of acquisitions. These include NUMECA, with strong meshing and solver technologies and an established reputation in marine and turbomachinery applications. Shortly after, Cadence acquired Pointwise, with proven strength in CFD meshing and an established position in aerospace and defense markets. In late 2022 they acquired Cascade Technologies, a Stanford spinout with compelling technology for LES. Through these acquisitions Cadence has built a stable of thoroughbred technology and experts in CFD, adding to their established strength in other aspects of multiphysics. But it seems they didn’t stop there.

Industries are desperate for higher LES performance for more accurate digital twin modeling. As one example, 50% of the energy consumed by a car goes to overcoming aerodynamic drag, directly affecting ICE fuel consumption or EV range. Designers need digital twins to simulate over thousands of operating conditions to find and optimize the many small improvements they can make around the car structure to reduce drag. How did Cadence step up to this need?

Cadence Millennium M1 and the Fidelity LES Solver

CFD is very parallelizable, so an obvious solution is to run a job across many server/CPU clusters. This was already possible on big CPU farms or supercomputers, but the cost becomes prohibitive when running complex LES algorithms over very large meshes with experiments spanning thousands of runs. Overcoming this barrier has been one of the drivers behind the development of Millennium M1, Cadence’s first GPU-based accelerator.

Cadence has a proven track record in hardware acceleration across multiple generations of the Palladium and Protium platforms for hardware verification. They have worked out the design, operations, and supply chain kinks to build these platforms, and they have established infrastructure to provide cloud-based access. (All platforms including Millennium can also be purchased for on-premises use.) Extending this expertise to a GPU-based platform is both obvious and brilliant. In one stroke (though I’m sure it took them time to get there 😀) they can accelerate CFD simulations. Adding new generative AI methods for design and analysis exploration, they claim up to 100X design impact in accuracy, speed, and scale at much lower power when compared with massive CPU server parallelism. Hardware acceleration from Cadence hardware know-how, combined with genAI expertise from both EDA and CFD teams, demonstrates the synergy required to deliver the active and continuous innovation I mentioned earlier.

CFD algorithm development has also been very active. Software is designed from the ground up to be GPU native. Problem preparation for analysis includes low-touch optimized mesh generation. And there are new numerical methods to ensure high stability in LES simulations (normally prone to unphysical behavior in turbulence modeling).

This capability is available today for CFD multiphysics modeling, in the cloud or on-premises.

Millennium is Not Just for CFD

It is obvious that a GPU-based accelerator should be able to do more than accelerate CFD. It could accelerate finite element analyses such as stress, thermal diffusion, and electromagnetics. It can also run generative AI. But why not just use one of the giant hyperscaler GPU banks for that purpose? For me one reason is simply availability and latency in competition with chatbots and creative image apps. Equally, it is hard to believe that application-specific fine-tuning on top of a mass-market LLM model could serve the high-complexity, high-accuracy, and domain-specific needs of modern EDA and multiphysics software. Dedicated hardware is the way to go, accessible through the cloud or in on-premises installations.

It will be very interesting to see what capabilities Millennium will offer in the future both for electronic design and for multiphysics. You can learn more HERE.

Also Read:

2023 Retrospective. Innovation in Verification

Information Flow Tracking at RTL. Innovation in Verification

ML-Guided Model Abstraction. Innovation in Verification