
Neural Network Growth Requires Unprecedented Semiconductor Scaling

by Tom Simon on 10-20-2021 at 6:00 am


The truth is that we are just at the beginning of the Artificial Intelligence (AI) revolution. The capabilities of AI are only now starting to hint at what the future holds. For instance, cars are using large, complex neural network models not only to understand their environment, but also to steer and control themselves. For any application, there must be training data to create useful networks. The size of both training and inference operations is growing rapidly as useful real-world data is incorporated into models. Let’s look at the growth of models over recent years to understand how this drives the need for processing power for training and inference.

Neural Network Growth

In a presentation at the Ansys 2021 Ideas Digital Forum, the VP of Engineering at Cerebras, Dhiraj Mallik, provided some insight into the growth of neural network models. In the last two years model size has grown 1,000X, from BERT Base (110 million parameters) to GPT-3 (175 billion parameters). And in the offing is the MSFT-1T model, with 1 trillion parameters. The GPT-3 model – which is an interesting topic of its own – was trained with conventional hardware using 1024 GPUs for 4 months. It’s a natural language processing (NLP) model trained on most of the text data on the internet and other sources. It was developed by OpenAI, and is now the basis for the OpenAI Codex, an application that can write useful programming code in several languages from plain-language instructions from users. GPT-3 can be used to write short articles that a majority of readers cannot tell were written by an AI program.
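To put that growth in perspective, a quick back-of-the-envelope calculation (using the 1,000X-in-two-years figure above) shows how far it outpaces Moore's Law:

```python
import math

# Model size grew ~1000x over two years (BERT Base to GPT-3).
growth_factor = 1000
years = 2

# Implied doubling time: growth_factor = 2 ** (years / t_double)
t_double = years / math.log2(growth_factor)
print(f"model-size doubling time: {t_double:.2f} years")  # ~0.20 years

# Moore's Law doubles transistor density roughly every 2 years,
# so silicon gains only 2x over the same window.
moore_gain = 2 ** (years / 2)
print(f"Moore's-Law gain over {years} years: {moore_gain:.0f}x")
```

In other words, model sizes are doubling roughly every two and a half months, about ten times faster than the transistor budget.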

As you can see above, running 1024 GPUs for 4 months is not sustainable as models keep growing. In his talk titled “Delivering Unprecedented AI Acceleration: Beyond Moore’s Law”, Dhiraj makes the point that the advances needed to support this level of growth go far beyond what we have been used to seeing with Moore’s Law. In response to this perceived market need, Cerebras released the WSE-1, its wafer-scale AI engine, in 2019 – 56 times larger than any chip ever produced. A year and a half later they announced the WSE-2, again the largest chip ever built, with:

  • 2.6 trillion transistors
  • 850,000 AI-optimized cores
  • 40 GB on-chip RAM
  • 20 petabytes/s memory bandwidth
  • 220 petabits/s fabric bandwidth
  • Built with TSMC’s N7 process
  • A wafer contains 84 dies, each 550 mm²

The CS-2 system that encapsulates the WSE-2 can fit AI models with 120 trillion parameters. What is even more impressive is that CS-2 systems can be built into 192-unit clusters to provide near linear performance gains. Cerebras has developed a memory subsystem that disaggregates memory and computation to provide better scaling and improved throughput for extremely large models. Cerebras has also developed optimizations for sparsity in training sets, which saves time and power.

Dhiraj’s presentation goes into more detail about their capabilities, especially in the area of scaling efficiently with larger models to maintain throughput and capacity. From a semiconductor perspective it is also interesting to see how Cerebras analyzed the IR drop, electromigration, and ESD signoff on a design that is 2 orders of magnitude bigger than anything else ever attempted by the semiconductor industry. Dhiraj talks about how at each level of the design – tile, block, and full wafer – Cerebras used Ansys RedHawk-SC across multiple CPUs for static and dynamic IR drop signoff. RedHawk-SC was also used for power electromigration and signal electromigration checks. Similarly, they used Ansys Pathfinder for ESD resistance and current density checks.

With a piece of silicon this large at 7nm, the tool decisions are truly “make or break”. Building silicon this disruptive requires many well-considered choices in the development process, and unparalleled capacity is of course a primary concern. Yet, as Dhiraj’s presentation clearly shows, the CS-2’s level of processing power is necessary to keep pace with the rate of growth we are seeing in AI/ML models. Doubtless we will see innovations that are beyond our imagination today in the field of AI. Just as the web and cloud have altered technology and even society, we can expect the development of new AI technology to change our world in dramatic ways. If you are interested in learning more about the Cerebras silicon, take a look at Dhiraj’s presentation on the Ansys IDEAS Digital Forum at www.ansys.com/ideas.

Also Read

SeaScape: EDA Platform for a Distributed Future

Ansys Talks About HFSS EM Solver Breakthroughs

Ansys IDEAS Digital Forum 2021 Offers an Expanded Scope on the Future of Electronic Design


Take the Achronix Speedster7t FPGA for a Test Drive in the Lab

by Mike Gianfagna on 10-19-2021 at 10:00 am


Achronix is known for its high-performance FPGA solutions. In this post, I’ll explore the Speedster7t FPGA. This FPGA family is optimized for high-bandwidth workloads and eliminates performance bottlenecks with an innovative architecture. Built on TSMC’s 7nm FinFET process, the family delivers ASIC-level performance while retaining the full programmability of an FPGA. There is a lot to learn about the Speedster7t, and Achronix now has a video available that will answer a lot of those questions. A link to that video is below, but first let’s see what happens when you take the Achronix Speedster7t FPGA for a test drive in the lab.

Steve Mensor

Steve Mensor, VP of sales and marketing at Achronix, introduces the video. Steve has been with Achronix for almost ten years and spent 21 years at Altera before that. He certainly knows a lot about FPGAs – both design and application. Steve begins by outlining some of the elements of the previously mentioned innovative architecture. There is a lot of dedicated capability on board the Speedster7t. This includes:

  • 112 Gbps SerDes
  • 400G Ethernet
  • PCIe Gen5
  • GDDR6 running at 4 Tbps
  • DDR4 running at 3,200 Mbps
  • A proprietary machine learning processor
  • 2D network on chip (NoC)

The proprietary machine learning processor delivers a lot of functionality, including floating point, block floating point and integer operations. The 2D NoC is a new-to-the-industry capability for FPGAs from Achronix. The NoC can route data from any of the high-speed interfaces to the core FPGA fabric at 2 GHz without consuming any of the FPGA logic resources. All of this on-board technology allows you to get to ASIC-level performance in an FPGA.

Katie Purcell

Steve then hands the presentation over to Katie Purcell, application engineering manager at Achronix. Katie has been with Achronix for four years. Prior to that she was an ASIC designer. She also spent time at Xilinx. Katie is the one who takes the Speedster7t FPGA for a test drive in the lab, and she is definitely up to the challenge.

Katie takes the viewer into the Achronix lab where bring-up of the Speedster7t is being performed – validation and characterization. The demo Katie presents shows the device running 400G Ethernet traffic on the Achronix VectorPath accelerator card. Katie begins by summarizing the key elements of the demonstration, which include:

  • 8 × 50G external interfaces
  • A single 400G interface in the Ethernet subsystem
  • Data divided into four separate streams in the 2D NoC
  • Each stream processed independently

Katie spends some time on the 2D NoC. She points out that this capability makes the design simpler and makes timing easier to close. This unique 2D NoC came up several times during the demo, so it’s worth digging in a bit more to understand it. Achronix previously presented a webinar about this unique capability, covered on SemiWiki, called 5 Reasons Why a High Performance Reconfigurable SmartNIC Demands a 2D NoC. The good news is that a replay of this very informative webinar is now available. You can watch it here.

Katie takes you through a detailed look at what’s going on inside the Speedster7t device as it processes the data packets. Knowing those details helps you understand the ease of setup and the accuracy delivered during the demo. If you think a unique device like this could help your design project, I highly recommend you watch the demo. It’s short, but very useful. You can access the demo video here.

Now you know how to take the Achronix Speedster7t FPGA for a test drive in the lab. You can find out more details about this unique FPGA family here.


Using PUFs for Random Number Generation

by Kalar Rajendiran on 10-19-2021 at 6:00 am


In our daily lives, few of us, if any, would want randomness to play any role. We look for predictability in order to plan our lives. But the reality is that random numbers have been playing a role in our lives for a long time. The more conspicuous use cases of random numbers involve key fobs and, nowadays, mobile phones. And then there are a whole lot of behind-the-scenes use cases that leverage random numbers for security purposes. As such, generating true random numbers is a topic of ongoing interest within mathematics and computer science.

There have been instances where faulty random number generators caused security and authentication issues. Nonces are random numbers used in authentication protocols. The term nonce as used here carries the meaning “number only used once” – not to be confused with the slang meaning of this word in some parts of the world. There have been incidents in the past where a nonce was generated more than once due to weak random number generators. A lot of progress has been made since then, and such lapses are far less common. Nonetheless, due to the explosive growth in internet-connected devices, we face increasing levels of threats to device and information security.
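The danger of a weak generator can be quantified with a standard birthday-bound estimate; the state sizes below are illustrative assumptions, not figures from any specific incident:

```python
import math

def collision_probability(n: int, space_bits: int) -> float:
    """Birthday-bound estimate: chance that at least two of n nonces collide
    when drawn uniformly from a space of 2**space_bits values."""
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2 ** space_bits)

# A generator with an effective state of only 32 bits collides quickly:
print(f"{collision_probability(100_000, 32):.3f}")    # ~0.69
# A full 128-bit nonce space stays collision-free at the same scale:
print(f"{collision_probability(100_000, 128):.2e}")   # ~1.5e-29
```

With a 32-bit effective state, a busy device has better-than-even odds of repeating a nonce within 100,000 authentications, which is exactly the kind of lapse described above.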

It is in this context that the recent announcement by Intrinsic ID of a NIST-certified software Random Number Generator (RNG) is of interest. You can read the press release “Intrinsic ID Announces NIST-Certified Software Random Number Generator (Zign RNG) for IoT Devices” here. It points to a report that identified critical vulnerabilities in billions of IoT devices due to poor random number generation mechanisms. The announcement states that Zign RNG ensures a source of true randomness, addressing this critical security flaw in IoT devices, and that it is the industry’s first and only embedded RNG software solution.

One of the most coveted features of any solution is the ability to serve as a drop-in replacement without requiring any hardware change to already designed and manufactured devices, particularly those already deployed in the field. Zign RNG claims to be such a solution: a cryptographically secure, NIST-certified RNG that can even be retrofitted onto already-deployed devices.

Naturally, we would all want to know how Intrinsic ID is able to achieve this. The answer can be found in a webinar that Intrinsic ID recently hosted on PUF Café. The webinar, titled “Using PUFs for Random Number Generation”, was delivered by Dr. Nicolas Moro, an embedded security systems engineer at Intrinsic ID. This blog is a synthesis of the salient points from that webinar.

 

Attractiveness of SRAM PUF for RNG

PUFs can be seen as the fingerprint of a device. In the case of SRAM PUFs – as with most PUFs – this fingerprint comes with noise. In the normal use of a PUF, algorithms are used to remove this noise. Random number generators, on the other hand, need a non-deterministic source of entropy, and the noise in an SRAM PUF fingerprint can serve that purpose. Given that virtually every IoT device already contains SRAM, it makes sense to use the noise from the SRAM PUF as a source of entropy from which to extract a true random seed. Not only that, SRAM has additional benefits, as seen in the Figure below.

 

Harvesting the Entropy for RNG

The non-deterministic part of the start-up values of the SRAM is used as a source of entropy. This entropy is used as a seed for a Deterministic Random Bit Generator (DRBG).

The output of an SRAM PUF is slightly different each time – largely repeatable, but not 100% so. For PUF purposes, this noise is eliminated through error-correction techniques. But for random number generation, the data derived from the noise needs to pass statistical tests, so some processing is needed to create the seed for the DRBG.
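As a rough sketch of that pipeline, the following conditions noisy SRAM start-up bytes into a fixed-size seed and feeds a toy HMAC-based DRBG. The hash conditioning and the simplified DRBG are illustrative stand-ins, not Intrinsic ID's implementation, and `os.urandom` stands in for reading real uninitialized SRAM:

```python
import hashlib
import hmac
import os

def condition_sram_noise(sram_bytes: bytes) -> bytes:
    """Condition raw (noisy) SRAM start-up values into a 256-bit seed."""
    return hashlib.sha256(sram_bytes).digest()

class SimpleHmacDrbg:
    """Toy HMAC-based deterministic random bit generator (illustrative only)."""
    def __init__(self, seed: bytes):
        self.key = b"\x00" * 32
        self.value = b"\x01" * 32
        self._update(seed)

    def _update(self, data: bytes = b""):
        self.key = hmac.new(self.key, self.value + b"\x00" + data, hashlib.sha256).digest()
        self.value = hmac.new(self.key, self.value, hashlib.sha256).digest()
        if data:
            self.key = hmac.new(self.key, self.value + b"\x01" + data, hashlib.sha256).digest()
            self.value = hmac.new(self.key, self.value, hashlib.sha256).digest()

    def generate(self, n_bytes: int) -> bytes:
        out = b""
        while len(out) < n_bytes:
            self.value = hmac.new(self.key, self.value, hashlib.sha256).digest()
            out += self.value
        self._update()
        return out[:n_bytes]

# Stand-in for 2 KB of uninitialized SRAM read at power-up.
sram_startup = os.urandom(2048)
seed = condition_sram_noise(sram_startup)
drbg = SimpleHmacDrbg(seed)
print(drbg.generate(16).hex())
```

The same seed always produces the same output stream, which is why the non-determinism of the SRAM start-up values is the critical ingredient.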

The SRAM PUF source has enough noise in it to generate the required number of entropy bits for the desired level of security. For example, Intrinsic ID has established through its experiments that only 2 kilobytes of uninitialized SRAM are required to get 192 bits of entropy.
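Those figures imply a very conservative per-bit entropy claim, which is easy to check:

```python
# 2 KB of uninitialized SRAM -> 192 bits of entropy (per the webinar).
sram_bits = 2 * 1024 * 8        # 16,384 start-up bits
entropy_bits = 192
rate = entropy_bits / sram_bits
print(f"claimed min-entropy rate: {rate:.4f} bits per SRAM bit")  # ~0.0117
```

Claiming only about 0.012 bits of entropy per SRAM cell leaves a large safety margin, since measured start-up noise in SRAM is typically far higher than that.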

 

PUF-based RNG

The National Institute of Standards and Technology (NIST) has established a clear set of specifications for secure random number generators. Refer to the Figure below for the specification for each topic relating to this subject.

 

The entropy derived from the PUF is fed as the seed to a DRBG to yield random data. A maximum of 2^19 bits can be returned per call to the DRBG, and a maximum of 2^48 calls can be made before the reseed counter runs out. That is more than 281 trillion calls – everyone would agree that is a large number. In the event the counter does run out, a power cycle of the device provides fresh entropy to reseed the DRBG.

For implementing a 128-bit security level, a total of 192 bits of entropy is needed from the SRAM PUF. As noted earlier, this is achievable with just 2 kilobytes of uninitialized SRAM. Once the seeding is done, those 2 kilobytes of memory are available for other purposes within the application.
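These numbers are straightforward to sanity-check. The 192 = 128 + 64 split matches NIST SP 800-90A's seeding requirement of security strength plus a nonce of half that strength – an inference on my part rather than a statement from the webinar:

```python
security_strength = 128
nonce_bits = security_strength // 2        # half the security strength
seed_entropy_bits = security_strength + nonce_bits
print(seed_entropy_bits)                   # 192, matching the talk

bits_per_call = 2 ** 19                    # max DRBG output per call
max_calls = 2 ** 48                        # calls before a reseed is required
print(f"{max_calls:,} calls")              # 281,474,976,710,656 (> 281 trillion)
```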

 

NIST Certification

NIST has qualified third-party certification labs to conduct validation programs. The Cryptographic Algorithm Validation Program (CAVP) certifies the RNG algorithm; the Cryptographic Module Validation Program (CMVP) certifies the system/module that implements and uses the algorithm. In addition to a GetEntropy function, an RNG solution must include a GetNoise and a HealthTest function. These are NIST requirements and must be made available through an API at the top level. Refer to the Figure below.

 

The GetEntropy function is of course the one that returns the seed for the DRBG.

The GetNoise function is for use by a third-party certification lab during their evaluation to check the entropy source behavior against the specification.

The HealthTest function is used at run time to protect against catastrophic failures. For example, if all zeroes are received as input for seeding the DRBG, it will abort and raise a flag.
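A hypothetical top-level API along these lines might look as follows. The function names mirror the NIST requirements described above, but the signatures, the oversampling factor, and the all-zero check are illustrative assumptions, not the actual Zign interface:

```python
import hashlib
import os

class EntropySourceError(Exception):
    pass

def get_noise(n_bytes: int) -> bytes:
    """Raw, unconditioned noise for certification-lab evaluation.
    (Stand-in: a real implementation reads uninitialized SRAM.)"""
    return os.urandom(n_bytes)

def health_test(noise: bytes) -> None:
    """Run-time guard against catastrophic failures, e.g. a stuck-at-zero source."""
    if noise == b"\x00" * len(noise):
        raise EntropySourceError("entropy source failure: all-zero output")

def get_entropy(n_bytes: int) -> bytes:
    """Return a conditioned seed for the DRBG (n_bytes <= 32 for SHA-256)."""
    noise = get_noise(n_bytes * 4)   # oversample the source, then condition
    health_test(noise)
    return hashlib.sha256(noise).digest()[:n_bytes]

seed = get_entropy(24)               # 24 bytes = 192 bits, for 128-bit security
print(len(seed) * 8, "bits")
```

GetNoise exposes the raw source for lab evaluation, HealthTest guards the source at run time, and GetEntropy ties them together to deliver the conditioned seed.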

 

Summary

Zign RNG software can be implemented at any stage of a device’s lifecycle, even after a device is already deployed in the field. It has passed all standard NIST randomness tests and is a NIST/FIPS-compliant software solution. It addresses the issue of hardware RNG peripherals used in IoT devices running out of entropy and leaving the devices vulnerable.

Intrinsic ID’s Zign RNG is available now and is applicable to anyone making IoT devices or chips for IoT. More details about the product can be accessed here and a product brief can be downloaded from here.

You can access the full press release of the Zign RNG product announcement here.

And you can watch and listen to the webinar by registering here at PUF Café.

Also Read:

Quantum Computing and Threats to Secure Communication

Webinar: How to Protect Sensitive Data with Silicon Fingerprints

CEO Interview: Pim Tuyls of Intrinsic ID


APR Tool Gets a Speed Boost and Uses Less RAM

by Daniel Payne on 10-18-2021 at 10:00 am


Automatic Place and Route (APR) tools have been around since the 1980s for IC design teams to use, and before that routing was done manually by very patient layout designers. Initially the big IDMs had their own internal CAD groups coding APR tools in house, but eventually the commercial EDA market picked up this automation area, and it’s been a highly competitive segment ever since then. Users of APR tools have a few metrics that they follow closely:

  • Capacity: number of cells in a block
  • Runtime
  • Quality of Results: did I meet area, timing and power goals?
  • Memory usage
  • Number of DRC violations to fix after APR

I just had a video call with Henry Chang at Siemens EDA about their latest APR tool release, Aprisa 21.R1, and it was an eye-opener. On capacity, he mentioned that many users run blocks in Aprisa that are 3-4 million instances, although some teams prefer to run larger blocks of 8-9 million instances. The tool can be run in either flat or hierarchical mode, and the big news is that on the largest designs the run times are up to 2X faster than the previous release. So, how did they do that?


You may recall that Siemens EDA acquired Avatar in July 2020. The engineers who worked on the previous place-and-route tool, Nitro, joined the Aprisa effort, and management then hired even more developers to achieve the runtime improvements, so developer headcount increased by more than 2X. Not only did runtime improve by an average of 30% across all designs, but RAM usage decreased by up to 60% as well. Most APR jobs are run locally on a big server, so being more efficient is attractive to engineering departments, especially when their biggest jobs can run for a few days.

Aprisa developers improved pretty much all parts of the tool:

  • Placement optimization
  • Clock Tree Synthesis (CTS) optimization
  • Route optimization
  • Timing analysis

APR, like most critical EDA tools, goes through a qualification process at the foundries. Aprisa is fully certified for the TSMC 6nm process, and for the 5nm and 4nm nodes all of the required design rules and features have been implemented, so stay tuned for the official announcement of qualification for those nodes.

Modern SoCs often use multiple power domains (MPD), and Aprisa handles that too. It sounds like the QoR with Aprisa comes about in part from their approach to detailed routing, and there’s a previous blog on that topic.

Henry Chang

I first met Henry at Mentor Graphics, back in 2000, when we worked on a Fast-SPICE circuit simulator called Mach TA. He started his EDA career in 1993 as a co-founder and architect at Anagram, and had stints at Avant! and Atoptech, so he really understands the IC design process from SPICE to AMS and APR.

Summary

APR tool users have choices when it comes to finding and using a leading-edge tool for design implementation, and having a multi-vendor tool flow is a solid choice. Siemens EDA has bulked up its Aprisa team, and the improvements now revealed – faster run times with a smaller memory footprint – look impressive, with foundry support qualified at 6nm and a pipeline to include the 5nm and 4nm nodes well underway. Read the press release online.



IEDM 2021 – Back to in Person

by Scotten Jones on 10-18-2021 at 6:00 am


Anyone who has read my previous articles about IEDM knows I consider it the premier conference on process technology.

Last year, due to COVID, IEDM was virtual, and although a virtual format offers some advantages, the hallway conversations that can be such an important part of the conference are lost. This year IEDM is returning as a live event in San Francisco from Dec. 11-15, 2021, with on-demand access to materials starting Dec. 17.

Saturday, December 11 will be the tutorials:

90-Minute Tutorials – Saturday, Dec. 11

The 90-minute Saturday tutorial sessions on emerging technologies have become a popular and growing part of the IEEE IEDM. They are presented by experts in the field to bridge the gap between textbook-level knowledge and leading-edge current research, and to introduce attendees to new fields of interest:

2:45 p.m. – 4:15 p.m.

4:30 p.m. – 6:00 p.m.

Sunday, December 12 will be the short courses:

IEDM Short Courses – Sunday, Dec. 12

In contrast to the Tutorials, the full-day Short Courses are focused on a single technical topic. Early registration is recommended, as they are often sold out. They offer the opportunity to learn about important areas and developments, and to network with global experts.

  • Future Scaling and Integration Technology, organized by Dechao Guo, IBM Research
  • Practical Implementation of Wireless Power Transfer, organized by Hubregt Visser, IMEC

The conference itself begins Monday with the plenary talks:

Plenary Presentations – Monday, Dec. 13

  • The Smallest Engine Transforming Our Future: Our Journey Into Eternity Has Only Begun, Kinam Kim, Vice Chairman & CEO, Samsung Electronics Device Solutions Division
  • Creating the Future: Augmented Reality, the Next Human-Machine Interface, Michael Abrash, Chief Scientist, Facebook Reality Labs
  • Quantum Computing Technology, Heike Riel, Head of Science & Technology, IBM Research and IBM Fellow

The full conference program then runs Monday through Thursday.

As every year, IEEE IEDM 2021 will offer Special Focus Sessions on emerging topics with invited talks from world experts to highlight the latest developments.

Monday, December 13, 1:35 PM

Session 3 – Advanced Logic Technology – Focus Session – Stacking of devices, circuits, chips: design, fabrication, metrology – challenges and opportunities

Tuesday, December 14, 9:05 AM

Session 14 – Emerging Device and Compute Technology – Focus Session – Device Technology for Quantum Computing

Wednesday, December 15, 9:05 AM

Session 25 – Memory Technology/Advanced Logic Technology – Focus Session – STCO for memory-centric computing and 3D integration

Wednesday, December 15, 1:35 PM

Session 35 – Sensors, MEMS, and Bioelectronics/Optoelectronics, Displays, and Imaging Systems – Focus Session – Technologies for VR and Intelligence Sensors

Session 38 – Emerging Device and Compute Technology/Optoelectronics, Displays, and Imaging Systems – Focus Session – Topological Materials, Devices, and Systems

For more information and to access the full program as it becomes available, please go to: https://www.ieee-iedm.org/

IEEE International Electron Devices Meeting (IEDM) is the world’s preeminent forum for reporting technological breakthroughs in the areas of semiconductor and electronic device technology, design, manufacturing, physics, and modeling. IEDM is the flagship conference for nanometer-scale CMOS transistor technology, advanced memory, displays, sensors, MEMS devices, novel quantum and nano-scale devices and phenomenology, optoelectronics, devices for power and energy harvesting, high-speed devices, as well as process technology and device modeling and simulation.


Silicon Startups, Arm Yourself and Catalyze Your Success…. Spotlight: Semiconductor Conferences

by Kalar Rajendiran on 10-17-2021 at 10:00 am


The arrival of fall typically raises the number of conferences hosted by semiconductor ecosystem companies. The conferences may go by different names, but whether called a forum, summit, conference, or some other creative name, the purpose is the same: to bring technologists and business people together to share ideas. More specifically, to discuss industry opportunities and challenges and, of course, to tout the respective ecosystem partners’ accomplishments. The Samsung Foundry Forum and GlobalFoundries Technology Summit were held in September. October has some interesting ones too.

As important as it is for everyone in the industry to participate in these conferences, the value for smaller companies is potentially much higher. Why? Because where else, and how else, would a smaller company with smaller financial resources be able to get all of this, compressed into a few days and happening in one location? Of course, nowadays that one location is often a virtual one. That is even better.

Just because something is free and you can attend without leaving your home or office does not mean you should, or could, attend all these conferences. So how do you go about choosing? To some extent it depends on who you are, where you are in terms of your product idea, and what kind of assistance and insights you need at that stage.

Leveraging Conference Opportunities

Arm Dev Summit

One that is in the immediate offing is the Arm Dev Summit, a 3-day virtual event scheduled for Oct 19-21, 2021, with an extensive agenda. There, you will get a chance to network with Arm’s global community of hardware designers and software developers, and an opportunity to hone your ideas, whether in AI, IoT, 5G, wired communications or super-computing. Whether you are just starting up, simply crazy about autonomous vehicles, a myth-buster type, or for that matter a 5G Campfire kind of person, there is something of value for everyone. Arm’s entire ethos is about creating a diverse, multi-participant ecosystem to unlock new possibilities.

Register and attend all sessions that are relevant to your areas of interest. If you are a startup, you would certainly want to check out the following back-to-back sessions. Attend a live panel session that will focus on helping startups succeed, followed by a networking session with the same theme.

Of course, navigating a startup to market success demands a lot more than just attending developer conferences. Building investor confidence to secure early rounds of funding is key. Innovating while keeping costs down is essential. Flexibility to test and iterate without overrunning the budget is a big advantage. The importance of reducing risk in project and product schedules cannot be overstated. Getting a product to market faster than the competition is crucial. A series of future blogs will tackle these topics and how Arm and Silicon Catalyst can help.

TSMC OIP Forum

If you’re looking at the siliconization aspects of your products, you would want to register for the TSMC Open Innovation Platform (OIP) Forum. This one is scheduled for Oct 26, 2021 for the Americas Time Zone and for Oct 27, 2021 for Europe and Asia Time Zones. You can register for TSMC OIP Forum using this link.

Snapshot of some Sessions: TSMC OIP Forum, Oct 26, 2021

Catalyzing Your Business

Attending the right conferences certainly is helpful. But if you’re an emerging startup, you could benefit from more help on an ongoing basis, at least until you gain escape velocity.

Silicon Valley is known all over the world, and semiconductors are the life force behind all electronics. All the modern conveniences we take for granted are powered by advanced and complex chips, and designing, manufacturing and producing these chips in high volumes are challenging tasks. Yet, hard as it is to believe, there are virtually no incubators focused on semiconductors.

Silicon Catalyst is the world’s only incubator focused on semiconductor solutions, including MEMS, sensors and intellectual property. Silicon Catalyst’s mission is to help semiconductor startups succeed. Through a coalition of in-kind and strategic partners, investors and advisors, Silicon Catalyst helps startups accelerate their ideas through prototypes, and onto a path to volume production.

As a strategic and in-kind partner, Arm participates in the incubation selection process and actively looks for opportunities to partner with these startups. As an in-kind partner, TSMC provides MPW shuttles for companies in the incubator. The Silicon Catalyst coalition provides everything startups need to design, fabricate, and market semiconductor solutions. The startups gain millions of dollars’ worth of EDA tools, IP, PDKs, prototypes, design and test services, packaging and business solutions and expert guidance from accomplished advisors. For more details, check out the full list of Silicon Catalyst partners.

Upcoming Blog Series

This blog is the first in a series. Future blogs will cover challenges and opportunities that silicon startups commonly face, and how the Silicon Catalyst ecosystem can significantly help accelerate their growth. The primary goal of the series is to identify tried-and-true solutions to the many problems to be faced, ultimately helping accelerate a silicon startup’s transformation from initial concept to business success in the market.

Also Read:

WEBINAR: Maximizing Exit Valuations for Technology Companies

Silicon Catalyst and Cornell University Are Expanding Opportunities for Startups Like Geegah

Silicon Catalyst is Bringing Its Unique Startup Platform to the UK


Chiplet: Are You Ready For Next Semiconductor Revolution?

by Eric Esteve on 10-17-2021 at 6:00 am


During the 2010s, the benefits of Moore’s law began to fall apart. Moore’s law stated that as transistor density doubled every two years, the cost of compute would shrink by a corresponding 50%. The breakdown is due to increased design complexity and the evolution of transistor structure from planar devices to FinFETs, which require multiple patterning lithography to achieve device dimensions below the 20-nm node.

At the beginning of this decade, computing needs have exploded, mostly due to the proliferation of datacenters and the amount of data being generated and processed. Artificial Intelligence (AI) and Machine Learning (ML) techniques are now used to process this ever-increasing data, which has led servers to significantly increase their compute capacity.

Servers have added many more CPU cores, have integrated larger GPUs used exclusively for ML rather than graphics, and have embedded custom AI-accelerator ASICs or complementary FPGA-based AI processing. Early AI chip designs were implemented as large monolithic SoCs, some of them reaching the size limit imposed by the reticle, about 700 mm².

At this point, disaggregation into a smaller SoC plus various compute and IO chiplets appears to be the right solution. Several chip makers, like Intel, AMD and Xilinx, have selected this option for products going into production. In the excellent white paper from The Linley Group, “Chiplets Gain Rapid Adoption: Why Big Chips Are Getting Small”, it was shown that this option leads to better costs compared to monolithic SoCs, due to the yield impact of larger dies.
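The yield argument can be sketched with a simple Poisson die-yield model; the 0.1 defects/cm² density and the 700 mm² vs. four-175 mm²-chiplet split below are illustrative assumptions, not figures from the white paper:

```python
import math

def die_yield(area_mm2: float, d0_per_cm2: float = 0.1) -> float:
    """Poisson yield model: Y = exp(-A * D0), with area converted to cm^2."""
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)

monolithic = die_yield(700)     # one near-reticle-limit die
chiplet = die_yield(175)        # one of four smaller chiplets
print(f"monolithic yield:  {monolithic:.2f}")   # ~0.50
print(f"per-chiplet yield: {chiplet:.2f}")      # ~0.84

# Silicon cost per good unit scales as area / yield; ignoring packaging
# and D2D overhead, the monolithic part costs ~1.7x more per good unit:
ratio = (700 / monolithic) / (4 * 175 / chiplet)
print(f"cost ratio: {ratio:.2f}")
```

The real comparison must also account for packaging, test, and D2D interface overhead, but the direction of the yield effect is clear: smaller dies yield disproportionately better.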

The major impact of this trend on IP vendors is mostly on the interconnect functions used to link SoCs and chiplets. At this point (Q3 2021), several protocols are in use, and the industry is trying to build formalized standards for many of them.

Current leading D2D standards include: i) the Advanced Interface Bus (AIB, AIB2), initially defined by Intel and now offered for royalty-free usage; ii) High Bandwidth Memory (HBM), where DRAM dies are stacked on each other on top of a silicon interposer and connected using TSVs; and iii) Bunch of Wires (BoW) and OpenHBI, two interfaces defined by the Open Domain-Specific Architecture (ODSA) subgroup, an industry group.

Heterogeneous chiplet design allows us to target different applications or market segments by modifying or adding just the relevant chiplets while keeping the rest of the system unchanged. New developments can be launched more quickly and with significantly lower investment, as a redesign impacts only the relevant chiplet and the package substrate that houses the chiplets.

For example, the compute chiplet can be redesigned from TSMC 5nm to TSMC 3nm to integrate a larger L1 cache or higher-performing CPU cores, while keeping the rest of the system unchanged. At the opposite end of the spectrum, only the chiplet integrating the SerDes can be redesigned on a new process node for faster rates, offering more IO bandwidth for better market positioning.

Intel PVC (Ponte Vecchio) is a perfect example of heterogeneous integration (various functional chiplets: compute, switch, etc.) that we could call vertical integration, since the same chip maker owns the various chiplet components (except for the memory devices).

Chip makers developing SoCs for high-end applications, such as HPC, datacenter, AI or networking, are likely to be the early adopters of chiplet architectures. Specific functions, like SRAM for larger L3 cache or AI accelerators, along with the Ethernet, PCIe and CXL standards, should be the first candidates for chiplet designs.

When these early adopters have demonstrated the validity of heterogeneous chiplets leveraging multiple different business models, and obviously the manufacturing feasibility of test and packaging, an ecosystem will have been created that is critical to supporting this new technology. At that point, we can expect wider market adoption, not only for high-performance applications.

We could imagine heterogeneous products going further, with a chip maker launching a system made of various chiplets targeting compute and IO functionality. This approach makes convergence on a D2D protocol mandatory, as an IP vendor offering chiplets with an in-house D2D protocol will not be attractive to the industry.

An analogy to this is SoC building in the 2000s, when semiconductor companies transitioned to integrating various design IPs coming from different sources. The IP vendors of the 2000s will inevitably become the chiplet vendors of the 2020s. For certain functions, such as advanced SerDes, or complex protocols like PCIe, Ethernet or CXL, IP vendors have the best know-how to implement them in silicon.

For complex Design IP, even if simulation-based verification has been run before shipping to customers, vendors have to validate the IP in silicon to guarantee performance. For digital IP, the function can be implemented in an FPGA, because this is faster and far less expensive than making a test chip. For mixed-signal IP, like a SerDes-based PHY, vendors select the Test Chip (TC) option, enabling them to characterize the IP in silicon before shipping to customers.

Even though a chiplet is not simply a TC, because it will be extensively tested and qualified before being used in the field, the amount of incremental work needed from the vendor to develop a production chiplet is far less. In other words, the IP vendor is best positioned to quickly release a chiplet built from its own IP, offering the best possible time-to-market (TTM) with minimized risk.

The business model for heterogeneous integration favors the various chiplets being made by the relevant IP vendor (e.g. ARM for ARM-based CPU chiplets, SiFive for RISC-V-based compute chiplets, and Alphawave for high-speed SerDes chiplets), since they own the underlying Design IP.

None of this prevents chip makers from designing their own chiplets and sourcing complex design IPs, to protect their unique architectures or implement in-house interconnects. As with SoC Design IP in the 2000s, the buy-or-make decision for chiplets will weigh core-competency protection against the sourcing of non-differentiating functions.

We have seen that Design IP business growth since the 2000s has been sustained by the continuous adoption of external sourcing. Both models will coexist (chiplets designed in-house or by an IP vendor), but history has shown that the buy decision eventually overtakes the make.

There is now consensus in the industry that a maniacal focus on Moore’s Law is no longer valid for advanced technology nodes, e.g. 7nm and below. Chip integration is still happening, with more transistors being added per sq. mm at every new technology node. However, the cost per transistor is also growing at every new node.

Chiplet technology is a key initiative to drive increased integration for the main SoC while using older nodes for other functionality. This hybrid strategy decreases both the cost and the design risk associated with integration of other Design IP directly onto the main SoC.

IPnest believes this trend will have two main effects on the interface IP business: one is the strong growth of D2D IP revenues in the near term (2021-2025), and the other is the creation of a heterogeneous chiplet market that augments the high-end silicon IP market.

This market is expected to consist of complex protocol functions like PCIe, CXL or Ethernet. IP vendors delivering interface IP integrated in I/O SoCs (USB, HDMI, DP, MIPI, etc.) may decide to deliver I/O chiplets instead.

The other IP category impacted by this revolution will be SRAM memory compiler IP vendors, for L3 cache. By nature, the cache size is expected to vary depending on the processor. Nevertheless, designing an L3 cache chiplet can be a way for an IP vendor to increase Design IP revenues by offering a new product type.

As well, the NVM IP category can be positively impacted, as NVM IP is no longer integrated in SoCs designed on advanced process nodes. Offering chiplets would be a way for NVM IP vendors to generate new business.

We think that FPGA and AI accelerator chiplets will be a new source of revenue for ASSP chip makers, but we don’t think they can strictly be classed as IP vendors.

If interface IP vendors will be major actors in this silicon revolution, the silicon foundries addressing the most advanced nodes, like TSMC and Samsung, will also play a key role. We don’t think foundries will design chiplets themselves, but they could decide to support IP vendors and push them to design chiplets to be used with SoCs in 3nm, just as they do today when supporting advanced IP vendors in marketing their high-end SerDes as hard IP in 7nm and 5nm.

Intel’s recent transition to third-party foundries is expected to also leverage third-party IPs, as well as heterogeneous chiplet adoption by semiconductor heavyweights. In this case, there is no doubt that hyperscalers like Microsoft, Amazon and Google will also adopt chiplet architectures… if they don’t precede Intel in chiplet adoption.

By Eric Esteve (PhD.) Analyst, Owner IPnest

Also Read:

IPnest Forecast Interface IP Category Growth to $2.5B in 2025

Design IP Sales Grew 16.7% in 2020, Best Growth Rate Ever!

How SerDes Became Key IP for Semiconductor Systems


Podcast EP43: Navigating the Architecture Exploration Jargons and What Do They Mean to a Chip Architect?

by Daniel Nenni on 10-15-2021 at 10:00 am

Dan is joined by Deepak Shankar, founder of Mirabilis Design. Dan explores the application and impact of architectural exploration on chip and system design.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Interview: Rick Seger of SigmaSense

by Daniel Nenni on 10-15-2021 at 6:00 am


Rick Seger is a pioneer of the PCAP touch and pen industry. Seger formerly was the President of N-trig, which incorporated PCAP technology with active pen solutions into devices for nearly all major PC OEMs. Microsoft acquired N-trig in 2015, and Seger is now the CEO of SigmaSense, a global leader in touch sensing performance. SigmaSense brings the best user experiences to products ranging from mobile phones and laptops to large monitors and digital signage. Their revolutionary new approach to sensing delivers 100 to 1,000 times improved Signal-to-Noise Ratio (SNR) performance in many instances.

We are speaking to Rick as SigmaSense is embarking on an exciting next step in their Company’s growth: they have just concluded a Series B round totaling $24 million, which will help the Company begin mass production to deliver its innovations and features to a range of applications, starting with touch and improved user experiences.

Q: Can you provide a brief background on the technology behind SigmaSense®?

Troy Gray, one of our founders, is the inventor who originally conceived the idea to use the concurrent bi-directional drive and sense for detecting changes in electric fields instead of scanning voltage thresholds. Troy is an inventor at his core and has nearly 30 years in the touch industry. Several years ago, he came up with this exciting concept based on his innate understanding of how to detect electron movement: the ability to manipulate and sense electron fields concurrently and adaptively. From this idea of concurrent driving and sensing on the same pin, SigmaSense was born.

Q: How does this technology fundamentally revolutionize the semiconductor industry? 

Human factors decide the winners in almost every market. Digital devices are becoming more intuitive by leveraging AI capabilities that delight customers. For this reason, the future of semiconductors is becoming as much about sensing high fidelity analog interactions as it is about processor performance.  As sensing becomes a top priority, we are shifting the industry from mature voltage mode Analog-to-Digital Converters (ADCs) to current and frequency-based ADCs. This shift moves much of what is done in analog today to 90% digital silicon.

The analog voltage-based sensing industry has been stuck for the past 40 years fighting RC time constraints to detect voltage thresholds above the electrical noise using analog processes that are difficult to scale.  The technology injects latency, uses too much battery power, is limited in canceling noise, and cannot rapidly adapt to changing conditions.  By contrast, current mode ADCs take advantage of scalable digital semiconductor processes that are faster, lower power, higher SNR, and less expensive to manufacture.

At the hardware level, each pin on a semiconductor IC is typically dedicated to transmitting, receiving, power, OR other communications purposes. Troy’s breakthrough can reduce multiple pins to a single pin for many applications.   We apply current mode ADCs to enable a single pin to transmit, receive, deliver power, and encode communications concurrently and without muxing.  Combined with AI or edge processors, rather than transmitting massive amounts of data up to the cloud, designers can be selective on the data they want to deliver, capturing the highest fidelity data from any target analog system with far higher efficiency.

We are not the only ones to see SigmaSense revolutionizing the analog-to-digital interface. With our Series B, we are adding Aurelio Fernandez to our Board, who served as the first VP of Worldwide Sales for Broadcom and is exceptionally well connected in the semiconductor industry. Aurelio joins another semiconductor veteran, David French, who was added to our Board in our previous funding round and has a long history in the semiconductor industry with NXP, Cirrus Logic, ADI, and TI.

Q: How does the SigmaSense technology work? 

Our SigmaDrive® current-mode ADC technology can be applied to any impedance-based sensing problem, but let’s use the PCAP touch industry as a good example. With SigmaSense, all channels (rows and columns) are programmable and can sense temperature, pressure, capacitance, voltage, current, resistance, and impedance changes. Now, the presence of a hand or the movement of an object, whether it’s a conductive object or a dielectric, can be detected using a unique ability to see and sense all changes in the electric fields.

Q: What are the problems you are solving? 

Our first product, a software-defined touch-sensing solution we call SigmaVision®, is faster and more robust than current systems.  Touch defines the Human Machine Interactions (HMI), which defines the experience, which ultimately defines the brand. In the last 12 months, leading phones have struggled with firmware upgrades and failures in the field due to voltage mode touch solutions that are pushed to their limit.  The touch solutions have been found to be highly sensitive to noise, slowing report rates, increasing lag, and generally compromising customers’ experiences.  HMI is the wrong place to make compromises.  Imagine a touch response speed that is faster and smoother for high-speed gaming or one that readily works through gloves or can even perform gesture recognition above the screen without touching. Now, picture yourself in front of a 100-inch screen that you touch with gloves through a storefront window while it’s raining.  This technology enables a new class of current mode ADCs for use within an entirely new generation of faster, more responsive, and interactive devices.

Q: Why are Signal-to-Noise ratios and higher-quality data collection essential?

Data starts at the sensor, at the conversion point from analog to digital, and ends with the desired output or expected response. Our silicon systems need better, faster data capture, especially as AI becomes more prevalent. We are watching now as AI systems are at the mercy of the data we load into them. The data determines the experience.

Are we surprised by garbage in, garbage out? We have nearly unlimited sensing data everywhere: flowing through our bodies, coming off a touchscreen, or inside our vehicles, including all the changes and movements of electrons through various disruptions and interactions. Identifying which data should be processed, which has the highest value, and which provides the best results will require high-fidelity data provided by software-defined sensing systems that are adaptive and flexible. Analog systems are chaotic: changes are continuous and happen in real time, and cloud processing is not efficient, so we see significant silicon investments to improve processing performance at the edge.

Q: Why is SigmaSense’s ultra-low voltage a breakthrough in the semiconductor space? 

We have developed a single pin on a semiconductor device that can concurrently transmit, receive, communicate and provide power using ultra-low voltages, up to a thousand times lower voltages than what our competitors need for sensing in that same environment. The breakthrough means we no longer need high voltage signals to get above a noise floor. The benefits of ultra-low-voltage sensing are lower power consumption, longer battery life, lower-cost materials, better display optics, improved sensor reliability, and lower emissions.

Q: What trends do you see in the semiconductor space?

Recent semiconductor shortages have driven an increased focus on semiconductors’ importance in all our lives. Markets will drive semiconductor designs to higher efficiency, specifically a renewed focus on more efficient processing at the edge. Better sensing data is critical for our devices’ “end-to-end” processing performance and ultimately determines the human factors we want.

Many industry leaders are beginning to prioritize end-to-end processing performance. The focus on semiconductors delivering raw processing performance will not end, but many of the most significant gains in mixed-signal performance and efficiency come from silicon that enables better data capture. Silicon enablement of adaptive sensing systems is sure to win the new end-to-end processing challenge.

Our recent Series B fundraise and the addition of two semiconductor veterans to our board in the past year make us very well positioned to make substantial impacts in a range of mixed-signal markets. While we’re initially focused on capturing sizable market share in the touch and HMI (Human Machine Interface) markets, we will then extend our technology into wearables, bio-sensing, IoT and automotive applications.

Rick Seger’s Bio:

Rick Seger is a pioneer of the PCAP touch and pen industry. As President of N-trig Inc. from 2006 to 2015, he helped define the first customer products to incorporate PCAP technology, enabling pen solutions in devices from nearly all major PC OEMs. Since 2006 he has driven the adoption of modern touch and pen-based input methods, influencing hardware and software design decisions. Laptop Magazine named him one of the “25 Most Influential People in Mobile Technology” for his evangelism in this area.

Mr. Seger is a leading advocate for the advancement of Interactive Displays and is passionate about helping manufacturers deliver products that will drive broad adoption. Most exciting to Mr. Seger is the impact interactive touch and pen-based displays can have on Education Markets. From Healthcare to Education, from Business to the Arts, the promise of interactivity is strongly sought after, driving design decisions. He has been integral to defining some of the best touch solutions, pens, applications, and sensing devices poised for rapid adoption.

Mr. Seger started his career at Intel Corporation and further developed his leadership experience in the semiconductor industry as VP of Sales for the Motorola SPS Consumer Group. During his 13 years at Motorola, Mr. Seger consistently grew the business, managing more than 80 IC design wins with major OEMs and generating more than $500M in annual revenues.


Webinar – Comparing ARM and RISC-V Cores

by Daniel Payne on 10-14-2021 at 10:00 am

Mirabilis Webinar, October 21

Operating systems and Instruction Set Architectures (ISA) can have long lifespans, and I’ve been an engineering user of many ISAs since the 1970s. For mobile devices I’ve followed the rise to popularity of the ARM architecture, and more recently the RISC-V ISA, which has successfully made the leap from university project to commercialization with a widening ecosystem of support. Naturally, the question arises as to which ISA fits a specific workload best, measured by metrics like MIPS, latency, and the number of instructions.

One company that has expertise in answering these questions is Mirabilis Design, and they’re hosting a webinar about how to model and measure the efficiency of three popular cores:

  • ARM Cortex A53
  • ARM Cortex A77
  • SiFive U74

Mirabilis Design will show their models of these processors, and how to configure each processor model with settings for:

  • Clock Speed
  • Caches: L1, L2, DSU
  • AXI Speed
  • DRAM Speed
  • Custom switches

The same C code will be used across each processor, compiled with each target’s specific C compiler. Simulations with the compiled code are run in the VisualSim tool, then the results of the simulations are compared to show metrics, like:

  • # of Instructions
  • Latency
  • Maximum MIPS
  • Cache hit-ratio
  • Memory bandwidth
  • Power

You will find out which ISA has the smallest # of instructions, ARM or RISC-V, meaning the best compiler efficiency, along with latency and MIPS numbers. With the Mirabilis Design approach it only takes minutes to run a simulation on your own C code, then collect all of the efficiency numbers for an ISA that you have configured. This information helps a system architect to detect any bottlenecks, and then optimize the architecture for best performance.

Summary

System architects and SoC design teams trying to decide which ISA to go with on their next project should be interested in this webinar. You can see the replay HERE.

Compare Performance-power of Arm Cortex vs RISC-V for AI applications.

Abstract:
In the webinar, we will show you how to construct, simulate, analyze, validate, and optimize an architecture model using pre-built components. We will compare micro and application benchmarks on system SoC models containing clusters of ARM Cortex A53/A77/A65AE/N1, SiFive U74, and other vendor cores.

Aside from the processor resources such as cache and memory, the system will contain custom switches, Ingress/Egress buffers, credit flow control, DMA AI accelerators, NoC and AMBA AXI buses.

The evaluation and optimization criteria will be task latency, dCache hit-ratio, power consumed/task and memory bandwidth.

The parameters to be modified are bus topology, cache size, processor clock speed, custom arbiters, task thread allocation and changing the processor pipeline.

Key Takeaways:
1. Validate architecture models using mathematical calculus and hardware traces
2. Construct custom policies and arbitrations, and configure processor cores
3. Select the right combination of statistics to detect bottlenecks and optimize the architecture
4. Identify the right mix of stochastic, transaction-level, cycle-accurate and trace-driven modeling to construct the model

Speaker Bios:
Alex Su is an FPGA solution architect at E-Elements Technology, Hsinchu, Taiwan. Prior to that, Mr. Su worked at ARM Ltd for 5 years in technical support of Arm CPU and System IP.

Deepak Shankar is the Founder of Mirabilis Design and has been involved in the architecture exploration of over 250 SoC and processors. Deepak has published over 50 articles and presented at over 30 conferences in EDA, semiconductors, and embedded computing.

About Mirabilis Design
Mirabilis Design, a Silicon Valley company, designs cutting-edge software solutions that identify and eliminate risks in product performance. Its flagship product, VisualSim Architect, is a system-level modeling, simulation, and analysis environment that relies on libraries and application templates to vastly improve model construction and reduce the time required for analysis. The seamless design framework enables designers to work on a design together, cohesively, to meet intermeshed timing and power requirements. For maximum results, it is typically used early in the design stage, in parallel with the development of the product’s written specification. It precedes the implementation stages (RTL, software code, or schematic), providing greater design flexibility.

Related Blogs