Siemens EDA Offers a Comprehensive Guide to PCIe® Transport Security
by Mike Gianfagna on 09-17-2024 at 6:00 am


It is well known that more data is being generated all the time. The need to store and process that data with less power and higher throughput dominates design considerations for virtually all systems. There is another dimension to the problem – ensuring the data remains secure as all this movement and processing occurs. Within computing systems, the Peripheral Component Interconnect Express (PCIe) standard is the de facto method for moving data, and it has gained tremendous momentum. If you’d like to peruse the various versions of the standard, I recommend you visit the PCI-SIG website. How to secure PCIe channels, and how to verify the robustness of those channels, is the subject of this post. The options to consider are many, as are the technical requirements to design and validate a robust architecture. The good news is that a market leader has published a white paper to help guide you. Let’s see how Siemens EDA offers a comprehensive guide to PCIe transport security.

Framing the Problem

The concept of a secure PCIe link is easy to imagine. Making it work reliably with real world constraints is not as easy, however. It turns out there are many tradeoffs to face, and many decisions to make. And once you’ve done that, verifying the whole thing will work reliably is yet another challenge. As I read the white paper from Siemens EDA, I got an appreciation for the complexity of this task. If you plan to use PCIe channels in your next design, you’ll want to get a copy. A link is coming, but first let’s look at some of the items covered.

Suprio Biswas

The white paper is written by Suprio Biswas, an IP Verification Engineer at Siemens EDA. He has been working in the field of digital design and communication at Siemens EDA for over four years and has presented his work at a recent PCI-SIG conference. Suprio has a knack for explaining complex processes in an approachable way. I believe his efforts on this new white paper will help many design teams.

Before we get into some details, I need to define two key terms that will pop up repeatedly in our discussion:

  • Security protocol and data model (SPDM) specification – defines a message-based protocol to offer various security processes for authentication and setting up a secure session for the flow of encrypted packets.
  • Component measurement and authentication (CMA) – defines a mapping of the SPDM specification for PCIe implementation.

With that out of the way, let’s look at some topics covered in the white paper.

Some Details

The white paper begins with an overview of the topics to consider and the decisions that need to be made. Authentication, access control, data confidentiality/integrity and nonrepudiation are discussed. This last item prevents either the sender or the receiver from denying the transmission of a message. There is a lot of coordination to consider among these topics.

The aspects of implementation are then covered. This discussion centers on the various approaches to encryption and decryption and how keys are handled. The design considerations are interrelated. For example, there can be a single key (secret key) or a pair of keys (public key and private key), depending on the chosen cryptographic algorithm.

Getting back to the terms defined above, there is a very useful discussion about implementing security through the CMA/SPDM flow. There are many considerations to weigh here and trade-offs to be made. It is best to read the white paper and get the direct guidance of Suprio. To whet your appetite, below is a high-level CMA/SPDM flow for establishing a secure connection.

CMA/SPDM flow for establishing a secure connection
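
For readers who want a feel for what that flow looks like in practice, below is a minimal Python sketch of the request/response sequence a CMA/SPDM requester walks through, using message names from the public DMTF SPDM specification. It is illustrative only – the exact exchanges in the figure and in Siemens’ VIP may differ – and the `send` transport hook is a hypothetical stand-in.

```python
# Illustrative sketch (not Siemens' implementation): the high-level request/response
# exchanges a CMA/SPDM requester and responder walk through before encrypted
# traffic can flow. Message names follow the public DMTF SPDM specification.

SPDM_FLOW = [
    ("GET_VERSION",          "VERSION"),            # agree on the SPDM version
    ("GET_CAPABILITIES",     "CAPABILITIES"),       # advertise supported features
    ("NEGOTIATE_ALGORITHMS", "ALGORITHMS"),         # pick hash/signature suites
    ("GET_DIGESTS",          "DIGESTS"),            # certificate chain digests
    ("GET_CERTIFICATE",      "CERTIFICATE"),        # fetch the device certificate
    ("CHALLENGE",            "CHALLENGE_AUTH"),     # authenticate the responder
    ("KEY_EXCHANGE",         "KEY_EXCHANGE_RSP"),   # derive session secrets
    ("FINISH",               "FINISH_RSP"),         # confirm the secure session
]

def establish_secure_session(send):
    """Drive the requester side of the flow; `send` is a hypothetical transport hook."""
    for request, expected_response in SPDM_FLOW:
        response = send(request)
        if response != expected_response:
            raise RuntimeError(f"{request} failed: got {response}")
    return "secure session established - encrypted packets may now flow"

if __name__ == "__main__":
    lookup = dict(SPDM_FLOW)               # toy transport that echoes the expected responses
    print(establish_secure_session(lambda req: lookup[req]))
```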

Suprio then covers the Siemens Verification IP (VIP) for PCIe. This VIP verifies that a design successfully establishes a secure connection through CMA/SPDM before the flow of encrypted packets begins. The IP is compliant with the CMA Revision 1.1 and SPDM version 1.3.0 specifications.

Many more details are provided in the white paper.

To Learn More

If you’d like to learn more about PCIe Gen6 verification, you can find that here. And finally, download your own copy of this valuable white paper here. You will find it a useful asset for your next design. And that’s how Siemens EDA offers a comprehensive guide to PCIe transport security.



Semiconductor Industry Update: Fair Winds and Following Seas!
by Daniel Nenni on 09-16-2024 at 10:00 am


Malcolm Penn did a great job on his semiconductor update call. This covers the whole semiconductor industry (logic and memory), versus what I track, which is mostly logic based on design starts and the foundries. Malcolm has been doing this a lot longer than I have and he has a proven methodology, but even then, semiconductors are more of a rollercoaster than a carousel, so predictability is a serious challenge.

Malcolm feels that we are at the bottom of the downturn that followed the pandemic boom. He calls it the Golden Cross breach, which should lead to a sustained period of growth. The Golden Cross breach occurs when the green 3/12 curve breaches the blue 12/12 curve. Again, this is memory and logic. Inventory is a much bigger factor with memory, and depleting the excess inventory built up during the pandemic shortage scare is a big part of hitting bottom.
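
To make the Golden Cross test concrete, here is a small sketch that computes the 3/12 and 12/12 curves from monthly sales figures and flags the crossover. It assumes the common reading of those curves – a rolling 3-month total and a rolling 12-month total, each compared with the same period a year earlier – and uses made-up numbers, not Malcolm’s data.

```python
# A minimal sketch of the "Golden Cross" test, assuming 3/12 = rolling 3-month total
# vs. the same 3 months a year earlier and 12/12 = rolling 12-month total vs. the
# prior-year total. The monthly sales data below is synthetic, for illustration only.
import numpy as np

def rolling_yoy(series, window):
    """Year-over-year growth of a rolling `window`-month total."""
    totals = np.convolve(series, np.ones(window), mode="valid")
    return totals[12:] / totals[:-12] - 1.0   # compare with the total 12 months earlier

def golden_cross(monthly_sales):
    g3 = rolling_yoy(monthly_sales, 3)        # short-term momentum
    g12 = rolling_yoy(monthly_sales, 12)      # long-term trend
    n = min(len(g3), len(g12))
    g3, g12 = g3[-n:], g12[-n:]               # align on the most recent months
    # Golden Cross: the 3/12 curve breaches (rises above) the 12/12 curve.
    crossed = (g3[:-1] <= g12[:-1]) & (g3[1:] > g12[1:])
    return np.flatnonzero(crossed)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 4 years of hypothetical monthly sales: a downturn followed by a recovery.
    trend = np.concatenate([np.linspace(50, 40, 24), np.linspace(40, 55, 24)])
    sales = trend + rng.normal(0, 1, trend.size)
    print("Golden Cross detected at month offsets:", golden_cross(sales))
```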

Remember, at the end of 2023 forecasters suggested double-digit growth for 2024. TSMC also predicted double-digit industry growth (10%) and TSMC revenue growth of more than double the industry growth. Today TSMC is at a 30% revenue increase and I see that continuing for the rest of the year, with 2025 being even better when more customers hit N3 HVM, absolutely.

Unfortunately, after the new year the semiconductor industry crashed, going from a +8.4% growth rate in Q4 2023 to a -5.7% growth rate in Q1 2024, sending the forecasters back to the magic 8-ball for revised predictions. Q2 2024 came back with a vengeance at +6.5% growth, giving forecasters whiplash. We have been very forecast positive since then, with double-digit revenue growth expected for 2024.

Malcolm’s forecasting formula looks at four things: the economy, unit shipments, ASPs (average selling prices), and fab capacity.

As Malcolm explained, the economy determines what we can buy. This means consumers and suppliers (CAPEX). Unit shipments are critical in my opinion because that is what we actually buy, but that number depends on inventory in the financial forecasting sense. According to Malcolm we still have excess inventory right now, which is still liquidating. Unit shipments are a much bigger indicator for me than ASPs, the prices we sell chips for (supply versus demand). Given the AI boom and the excessive GPU prices (Nvidia), that number is artificially inflated in my opinion.

Fab capacity is also a big one for me. The semiconductor industry generally runs with fab utilization averaging 80%-90%. During the pandemic, orders were cancelled and then restarted, so some fabs rebounded to 100%+ utilization and then fell back to 60%-70%. Today I have read that average capacity utilization is edging back up to 80%-90%, which I believe will be the case for the rest of 2024 and 2025.

My big concern, which I have mentioned in the past, is overcapacity. If you look at the press releases in 2022 and 2023, the fab build plans were out of control. It really was an arms race. I blame Intel for that, since the IDM 2.0 plan included huge growth and fab builds, and the rest of the foundries followed suit. We also have re-shoring going on around the world, which is more of a political issue in my opinion. Reality has now hit, so the fab builds will scale down, but China is still overspending (more than 50% of the total worldwide CAPEX) on semiconductor equipment. Malcolm covered that in more detail in his update.

Moving forward, Malcolm updated his forecast to 15% growth for the semiconductor industry in 2024 and 8% growth in 2025. We will hear from other forecasters in Q3, but I would guess that they will follow Malcolm’s double-digit number this year and back down to the normal semiconductor industry single-digit growth for 2025, absolutely.

Malcolm’s presentation had 50+ slides with a Q&A at the end. For more information give him a ring:

Future Horizons Ltd
Blakes Green Cottage
Sevenoaks, Kent
TN15 0LQ, England
T: +44 (0)1732 740440
E: mail@futurehorizons.com
W: https://www.futurehorizons.com/

Also Read:

Robust Semiconductor Market in 2024

Semiconductor CapEx Down in 2024, Up Strongly in 2025

Automotive Semiconductor Market Slowing

2024 Starts Slow, But Primed for Growth



Samsung Adds to Bad Semiconductor News
by Robert Maire on 09-16-2024 at 6:00 am

  • Samsung follows Intel in staff reductions due to weakness in chips
  • Chip industry split between haves & have nots (AI & rest of chips)
  • Capital spend under pressure – Facing Eventual China issues
  • Stick with monopolies, avoid commodities

Samsung announces layoffs amid weak chip business and outlook
Samsung announced staff reductions across the company, with some areas seeing a potential reduction of up to 30% of staff. In addition, the Taylor, Texas fab appears to be in trouble, with further delays likely on the horizon.

Samsung Cuts staff and Texas Fab

Samsung changes Chip leader & worker issues

Samsung CHIPS Act funding in jeopardy just like Intel

As with Intel, CHIPS Act grants and loans are milestone based and if Samsung doesn’t hit the milestones they may not get the money.

We remain concerned about the progress of CHIPS Act projects and Intel and Samsung are already at risk.

Given that the memory market is not in great shape we are also very concerned about Micron’s future progress in CHIPS Act fabs. We have stated from the beginning that the planned fabs in Clay NY would likely take a while given the volatile conditions in the memory market.

TSMC appears to be on track, more or less, but is still having issues getting qualified operators in the US.

GlobalFoundries will likely spend its CHIPS Act money on its existing fab, but it certainly doesn’t need a second fab in New York when there isn’t enough demand for the first and China-based competition is breathing down its neck.

DRAM pricing dropping like a stone in market share fight

DRAM pricing has been dropping over the past few months as it appears to be a typical market share fight that we have seen in the past……

In past cycles, Samsung has used its cost of manufacture advantage to try and drive the market away from weaker competitors by cutting pricing.

This time around it’s a bit different, as Samsung does not appear to have the price advantage it previously enjoyed, so cutting pricing doesn’t gain market share; it just becomes a race to the bottom which benefits no one.

Unseasonal weakness even more concerning

We are at the point in the annual seasonality where memory pricing should be at its strongest, as new iPhones are coming out and products are being built in anticipation of the holiday selling season…..but not so……

Memory pricing is going down when it should usually be going up….not good.

We hear that there is a lot of product/excess inventory in the channel……

HBM not to the rescue

As we have said a number of times in the past, HBM and AI are nothing short of fantastic, but HBM memory is single-digit percentages of the overall memory market.

When SK Hynix was the only supplier of HBM, prices were obviously high due to a monopoly. Now that Samsung and Micron are adding to the mix, not so much a monopoly anymore……

HBM is a commodity just like every other type of memory…..don’t forget that fact and act accordingly

Memory makers becoming unhinged

Everyone for the past couple of years had been complimenting the memory makers for their “rational” behavior….well, not so anymore. Perhaps the world of politics is infecting the memory industry with irrational, unhinged behavior. It feels as if memory makers are back to their old ways of irrational spend, pricing and market share expectations.

As we have seen in prior times, this type of behavior suggests they are just shooting themselves in the foot and creating their own oversupply/declining-price downcycle.

We think memory maker stocks should reflect this irrational behavior, much as their stock prices were previously rewarded for rational behavior…it means the recent stock price declines are well justified and will likely continue.

The Stocks
Commodities & Monopolies

As always, we would avoid commodity chip producers (AKA memory) unless there is an extended shortage (which we are obviously over) for demand or technology based reasons.

We prefer monopoly-like companies in both chips as well as chip equipment.

In chips, the best monopoly is clearly Nvidia as no one else seems to come close in AI devices (at least not yet).

In equipment companies, we continue to prefer the monopoly of ASML despite the China issues and regulatory problems.

In foundries, TSMC has a virtual monopoly as Samsung’s foundry business appears to have fallen even further behind TSMC in technology and yield. There is no other foundry within striking distance of TSMC, the rest are behind Samsung or not in the same universe.

We have been repeating for quite some time now that the chip industry is a one trick pony (AI) and the rest of the industry, which is the majority, is not in great shape and memory looks to be in decline.

Stock prices seem to finally have figured out what we have been saying.

It’s equally hard to come up with a recovery scenario for semiconductor equipment stocks given the likely negative bias of Intel & Samsung (and others soon to follow).

If CHIPS Act related projects start to unravel due to industry downturns – in Ohio, Texas, New York, or similarly subsidized projects in Germany, Israel, Korea, etc. – capital spending will also unravel.

If we can’t take advantage of essentially “free money” in a capital intensive industry, something’s wrong…..

Then, on top of everything else we have the 800 pound gorilla that is China, both in Chip production as well as equipment purchases.

Rising China production is an existential threat to second tier foundries and the 40% of all equipment that continues to flow to China is keeping the equipment industry in the black.

Sooner or later, all the equipment that China has purchased will come on line. Sooner or later, China will slow its purchases from non-China equipment suppliers.

Things are shaky and getting shakier in the overall chip industry. Hardly a confidence inspiring situation as the news flow seems to be more negative when it should be getting more positive on a seasonal basis.

We still love AI and all related things and continue to own Nvidia, but the headwinds in the rest of the semiconductor industry may be building………

About Semiconductor Advisors LLC

Semiconductor Advisors is an RIA (a Registered Investment Advisor),
specializing in technology companies with particular emphasis on semiconductor and semiconductor equipment companies.
We have been covering the space longer and been involved with more transactions than any other financial professional in the space.
We provide research, consulting and advisory services on strategic and financial matters to both industry participants as well as investors.
We offer expert, intelligent, balanced research and advice. Our opinions are very direct and honest and offer an unbiased view as compared to other sources.

Also Read:

AMAT Underwhelms- China & GM & ICAP Headwinds- AI is only Driver- Slow Recovery

LRCX Good but not good enough results, AMAT Epic failure and Slow Steady Recovery

The China Syndrome- The Meltdown Starts- Trump Trounces Taiwan- Chips Clipped



Podcast EP247: How Model N Helps to Navigate the Complexities of the Worldwide Semiconductor Supply Chain
by Daniel Nenni on 09-13-2024 at 10:00 am

Dan is joined by Gloria Kee, Vice President of Product Management at Model N. During her 15 years at Model N, she has focused on product management, building an in-depth understanding of implementing and designing innovative software across a variety of business challenges. She is committed to product innovation and development within the high tech industry.

In this far-reaching discussion, Dan explores the geopolitical forces at play in the semiconductor industry with Gloria. Global supply chain dynamics, including the evolving relationship with China, are discussed. Gloria comments on national security considerations, workforce development, the role of international collaboration and the importance of sustainability. How technologies such as cloud and AI fit in is also reviewed.

Gloria explains the broad charter for Model N to support the complex business needs of the world’s leading brands in pharmaceutical, medical device, high tech, manufacturing and semiconductors across more than 120 countries.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.



Executive Interview: Michael Wu, GM and President of Phison US
by Daniel Nenni on 09-13-2024 at 6:00 am


Michael Wu is the GM and President of Phison US. Michael is an acclaimed technology expert in the NAND storage sector with over 17 years of industry experience. Over the course of his 14+ years at Phison, Michael has held positions such as GM, Director of Global Customer Relations and Project Manager, helping position Phison as the world’s leading NAND storage solution company. Today he manages and oversees the company’s US operation to drive sales growth and profitability. Earlier in his tenure at Phison, he coordinated customer activities and support for the North America, South America and European regions, provided technical support for design-in projects, and managed product life cycles. Previously, Michael was a verification engineer at RF Micro Devices. Michael holds a Master of Science and a Bachelor of Science in Electrical Engineering from Virginia Tech.

Tell us about your company?

Phison is a global leader in NAND controllers and storage solutions that delivers industry-first solutions built for the modern era of AI workloads and enterprise computing.

Today, Phison is one of the best-kept secrets in the global tech sector: it has been the largest SSD supplier for over 20 years, yet many are unaware of it due to its white-label business model origin.

For over two decades, Phison’s focus has been on offering customized design solutions and delivering turnkey SSDs on behalf of huge brands to address today’s enterprise storage challenges, particularly with the onslaught of AI applications. In May, with the launch of its PASCARI brand, Phison came out of stealth mode and showed its commitment to foster rapid advancements in enterprise innovation to keep up with evolving dynamic needs of the enterprise storage market. Then, by combining the power of PASCARI with Phison’s proprietary software, Phison launched aiDAPTIV+ Pro Suite, designed for optimal memory and storage usage to meet the demands of SMBs looking to leverage GenAI. The debut of these products demonstrates that Phison is on a mission to democratize AI so that enterprises of ALL sizes can participate.

What problems are you solving?

With aiDAPTIV+, Phison is removing the barriers of entry for small and medium size businesses to access the power of GenAI. While the hype cycle for GenAI has been robust, it is at an inflection point, with financial market wobbles, because many companies are not yet investing due to concerns about the ROI. CIOs and decision makers will not buy in until it becomes accessible, affordable and profitable. Additionally, it requires that companies of all sizes can participate to realize the true potential and usher in the next Industrial Revolution. This remains a hurdle, as Generative AI requires massive computing power that is mostly inaccessible for small and mid-size companies. If these companies want to do their own AI projects, they typically must do them through a cloud service provider, meaning the company suddenly loses a portion of control over its data.

At Phison, we have developed the aiDAPTIV+ Pro Suite as a solution to enable any organization to run generative AI projects and large language models (LLMs) from regular on-premises workstations. Our solution is an AI-specialized aiDAPTIV+ SSD with proprietary software and tools, along with an ecosystem of support through our trusted partners.

With this aiDAPTIV+ SSD solution, companies can develop their own AI projects and retain complete control over their data on-premises. The “entry fee” to generative AI is no longer exclusive to large enterprises.

What application areas are your strongest?

Prior to the release of aiDAPTIV+ Pro Suite to the market, small and medium-sized businesses dealt with limited technology options offering small and imprecise training without the ability to scale beyond 7B model training. Phison’s aiDAPTIV+ solution enables significantly larger model fine-tuning, allowing users to run workloads previously reserved for data centers. For the first time in the U.S., aiDAPTIV+ Pro Suite was demonstrated as an end-to-end on-premises generative AI solution at FMS 2024, where it earned “Best of Show, Most Innovative AI Application”. Now even users with limited resources are empowered to train large language models. This AI appliance solution allows system integrators to build turnkey large language model training systems from start to successful finish. With a simple user experience that turns raw data into tokenized data, it runs the fine-tuning process with Llama-3 70B precision and offers the ability to ask questions about the data, all from a local domain on premises. Data can be fine-tuned in about 4 hours, which is sufficient for most business applications.

At a game-changing price point of about $40K, aiDAPTIV+ removes the barriers to entry for small and medium-sized businesses: organizations fully own their data and can fine-tune it with a turnkey solution, no additional IT or engineering staff is required to run it, and security risks and expenses are minimized by keeping data on premises. Most importantly, trained data delivers immediate business value.

In less than a year, the market response has been remarkable with over 100 enterprises using it in a variety of use cases. The solution has addressed universal pain points like onboarding new employees, keeping up with professional development needs, keeping up with coding demands, and the need to automate tasks to keep up with huge data volumes.

What keeps your customers up at night?

IT managers who have to keep up with evolving technology demands on a smaller budget and with less staff worry about security, loss of data, failures of legacy systems, unpredictable cloud service bills, vendor lock-in from cloud providers, and fear of missing out on promising technologies like GenAI.

What does the competitive landscape look like and how do you differentiate?

Prior to the launch of Phison’s aiDAPTIV+, only large enterprises with deep pockets could access and afford GenAI. We were the first to use storage as memory, lowering the price point from $1 million-plus for equipment (with associated fees for cloud services) to $40K located on premises. It takes about 4 hours to train LLMs, which is acceptable for most businesses.

Now even users with limited resources are empowered to train large language models. Our end-to-end AI appliance solution has been recognized for allowing system integrators to build turnkey large language model training systems from start to successful finish. With a simple user experience that turns raw data into tokenized data, aiDAPTIV+ runs the fine-tuning process with Llama-3 70B precision and offers the ability to ask questions about the data, all from a local domain on premises.

What new features/technology are you working on?

Phison continues to invest in R&D and engage with customers to understand business challenges to best align roadmap to modern customer requirements. As the aiDAPTIV+ and Pascari brands grow, so will the robust set of features to meet the current and future industry needs.

How do customers normally engage with your company?

Previously, customers relied solely on channel partners and system integrators to purchase Phison products. One in four SSDs in use today is built on our white-label SSDs and NAND controllers. With Pascari and aiDAPTIV+, we bring the technology to the end user ourselves and with select partners.

Also Read:

CEO Interview: Wendy Chen of MSquare Technology

CEO Interview: BRAM DE MUER of ICsense

CEO Interview: Anders Storm of Sivers Semiconductors



Ansys and eShard Sign Agreement to Deliver Comprehensive Hardware Security Solution for Semiconductor Products
by Marc Swinnen on 09-12-2024 at 10:00 am


Integrated circuits, or chips, lie at the heart of today’s electronic systems that are mission critical for almost every sector of the economy – from healthcare to banking, military equipment, cars, planes, telecommunications, and the internet itself. The data flowing through these systems is the lifeblood of modern life and we go to great lengths to protect it from unauthorized access.

We are all familiar with security passwords, PIN codes, and two-factor authentication that aim to secure electronic systems against code viruses and software vulnerabilities. But there is also a completely different category of hacking vulnerabilities that is rooted in hardware, not software. This area of hardware security seeks to prevent unauthorized leakage of critical information through so-called “side-channel attacks”. A side-channel can be any physical phenomenon that can be exploited to reveal the internal workings of a chip. Some of the most common side-channels are power noise, electromagnetic radiation, and thermal emission.

Power noise refers to the changes in power supply current drawn by a chip as it executes different instructions. By monitoring how much the supply current goes up and down it is possible to reveal a cryptographic key embedded in the chip. Similarly, an electromagnetic probe hovering a few millimeters above the surface of a chip can detect the switching activity of internal signals. Thermal side-channel attacks monitor the amount and location of heat produced by the chip. These are all examples of hardware security vulnerabilities that can be used to reveal secure data and cryptographic keys. A side-channel attack leaves no trace that the data has been compromised and may not even require any physical contact with the chip. It also cannot be defended against with traditional software security techniques.

Examples of probing integrated circuits for electromagnetic side-channel emissions
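
To see why a trace of supply current can give away a key, consider the toy correlation attack below. It is purely illustrative, not the Ansys or eShard flow, and it assumes a deliberately simplistic leakage model in which measured power tracks the Hamming weight of the data being processed.

```python
# Toy illustration of power-noise leakage: if supply current correlates with the
# Hamming weight of data being processed, correlating key guesses against the
# measured traces recovers a secret key byte. Not any vendor's actual flow.
import numpy as np

rng = np.random.default_rng(1)
SECRET_KEY_BYTE = 0x3C                      # the value an attacker wants to recover
N_TRACES = 2000

def hamming_weight(x):
    bits = np.unpackbits(np.atleast_1d(x).astype(np.uint8).reshape(-1, 1), axis=1)
    return bits.sum(axis=1)

plaintexts = rng.integers(0, 256, N_TRACES, dtype=np.uint8)
# Simulated "measurement": leakage proportional to HW(plaintext XOR key) plus noise.
traces = hamming_weight(plaintexts ^ SECRET_KEY_BYTE) + rng.normal(0, 1.0, N_TRACES)

# Attack: correlate the traces against the leakage predicted by every key guess.
correlations = [
    np.corrcoef(hamming_weight(plaintexts ^ guess), traces)[0, 1]
    for guess in range(256)
]
recovered = int(np.argmax(correlations))
print(f"recovered key byte: {recovered:#04x} (actual {SECRET_KEY_BYTE:#04x})")
```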

Much of our modern infrastructure relies on cryptography to secure data storage and communication. The internet has moved to secure “https://” website addresses; credit card and online banking transactions are secured with encryption in silicon; healthcare data is kept confidential with encryption; and military units rely on encryption to communicate on the battlefield. Hardware security vulnerabilities have already been exploited in the real world with examples ranging from the cloning of car fobs to identified microprocessor security vulnerabilities called ‘Meltdown’, ‘Spectre’, and ‘Platypus’. Data security has become a pervasive concern for many leading companies and semiconductor designers are eager to strengthen their chips against hardware vulnerabilities.

Traditionally, side-channel vulnerabilities have been measured and evaluated in a lab by examining the actual device. While accurate and reliable, lab testing is expensive and slow and – most importantly – it is closing the barn door after the horse has bolted. There is no way to fix any detected vulnerability! Often millions of dollars have already been spent on creating photomasks and the chip has been manufactured. Any fixes will require an expensive redesign, more money for new photomasks, and months of extra delay. It would be better to be able to simulate and predict side-channel vulnerabilities at the design stage before manufacturing. Simulation is faster, cheaper, and more flexible than actual physical measurements. It requires less operator expertise, and it can point the way to improving and safeguarding the design before any money is spent on manufacturing.

Ansys and eShard Technologies are both leading experts in hardware security and have announced an agreement to collaborate on a comprehensive solution that includes both pre-silicon and post-silicon security verification. eShard is a leading provider of chip security testing with the esDynamic™ platform for side-channel testing, fault injection, and failure analysis. With dozens of customers, including major semiconductor companies, eShard deploys proven techniques for verifying many advanced cryptographic algorithms, including AES, RSA, ECC, and HMAC. esDynamic can efficiently evaluate the security of these standards in physical hardware and generate relevant metrics for the strength of the security implementation.

eShard’s  agreement with Ansys allows Ansys RedHawk-SC Security™ to perform the same extensive suite of cryptographic analyses at the pre-silicon design stage and flag potential areas of weakness. RedHawk-SC Security is built on the foundry-certified Ansys RedHawk-SC™ golden signoff tool for power integrity analysis of digital chips. RedHawk-SC Security simulates pre-silicon layouts with Ansys’s industry-leading physics simulation engines for electromagnetic and power noise analysis. These pre-silicon predictions cover all anticipated usage modes by combining user-generated activity vectors, automatic ‘vectorless’ activity, and real-world activity based on actual software execution. RedHawk-SC Security ties into esDynamic’s platform for management of the entire security verification workflow.

Building a best-in-class pre-silicon to post-silicon security testing solution

The collaboration brings together eShard’s expertise in security with Ansys’s foundry-certified expertise in physical simulation to deliver a uniquely broad and capable hardware security solution. It offers the joint advantages of pre-silicon simulation for fast, cost-effective problem avoidance, and post-silicon verification for the most reliable accuracy. Customers can now deploy an integrated solution platform that gives chip designers at all stages of the design flow – from RTL to layout – the means to verify a comprehensive suite of security protocols. This easy-to-use workflow delivers proven levels of hardware security at every stage of semiconductor product development.

Marc Swinnen, Director of Product Marketing – Semiconductors, Ansys

Also Read:

Ansys and NVIDIA Collaboration Will Be On Display at DAC 2024

Don’t Settle for Less Than Optimal – Get the Perfect Inductor Every Time

Simulation World 2024 Virtual Event



Gazzillion Misses – Making the Memory Wall Irrelevant
by admin on 09-12-2024 at 6:00 am


Memory Hierarchy and the Memory Wall

Computer programs mainly move data around. Along the way they do some computation on the data, but the bulk of execution time and energy is spent moving data around. In computer jargon we say that such applications are memory bound: memory is the main performance-limiting factor. A plethora of popular applications are memory bound, such as Artificial Intelligence, Machine Learning and Scientific Computing.

By memory we mean any physical system able to store and retrieve data. In a digital computer, memories are built out of electrical parts, such as transistors or capacitors. Ideally, programmers would like the memory to be fast and large, i.e. they demand quick access to a huge amount of data. Unfortunately, these are conflicting goals. For physical reasons, larger memories are slower and, hence, we cannot provide a single memory device that is both fast and large. The solution that computer architects found to this problem is the memory hierarchy, illustrated in the next figure.

The memory hierarchy is based on the principle of locality, which states that data accessed recently are very likely to be accessed again in the near future. Modern processors leverage this principle of locality by storing recently accessed data in a small and fast cache. Memory requests that find the data in the cache can be served at the fastest speed; these accesses are called cache hits. However, if the data is not found in the cache we have to access the next level of the memory hierarchy, the Main Memory, largely increasing the latency for serving the request. These accesses are called cache misses. By combining different memory devices in a hierarchical manner, the system gives the impression of a memory that is as fast as the fastest level (Cache) and as large as the largest level (Hard Drive).

Cache misses are one of the key performance limiting factors for memory bound applications. In the last decades, processor speed has increased at a much faster pace than memory speed, creating the problem known as the memory wall. Due to this disparity between processor speed and memory speed, serving a cache miss may take tens or even hundreds of CPU cycles, and this gap keeps increasing.
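
The impact of that gap is easy to quantify with the textbook average memory access time (AMAT) formula; the cache and memory latencies below are hypothetical but representative.

```python
# Back-of-the-envelope illustration of why misses dominate: average memory access
# time (AMAT) = hit_time + miss_rate * miss_penalty. The numbers are hypothetical.
def amat(hit_time_cycles, miss_rate, miss_penalty_cycles):
    return hit_time_cycles + miss_rate * miss_penalty_cycles

# A 4-cycle cache backed by a 200-cycle main memory:
for miss_rate in (0.01, 0.05, 0.20):
    print(f"miss rate {miss_rate:4.0%}: AMAT = {amat(4, miss_rate, 200):5.1f} cycles")
# Even a 5% miss rate more than triples the average access time of a 4-cycle cache.
```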

In a classical cache, whenever a cache miss occurs, the processor will stall until the miss is serviced by the memory. This type of cache is called a blocking cache, as the processor execution is blocked until the cache miss is resolved, i.e. the cache cannot continue processing requests in the presence of a cache miss. In order to improve performance, more sophisticated caches have been developed.

Non-Blocking Caches

In case of a cache miss, there may be subsequent (younger) requests whose data are available in the cache. If we could allow the cache to serve those hits while the miss is being serviced, the processor could continue doing useful work instead of sitting idle. This is the idea of non-blocking caches [1][2], a.k.a. lockup-free caches. Non-blocking caches allow the processor to continue doing useful work even in the presence of a cache miss.

Modern processors use non-blocking caches that can tolerate a relatively small number of cache misses, typically around 16-20. This means that the processor can continue working until it reaches 20 cache misses and then it will stop, waiting for the misses to be serviced. Although this is a significant improvement over blocking caches, it can still result in large idle times for memory intensive applications.

Gazzillion Misses

Our Gazzillion Misses™ technology takes the idea of non-blocking caches to the extreme by supporting up to 128 cache misses per core. With such a large number of outstanding misses, our Avispado and Atrevido cores can avoid idle time waiting for main memory to deliver the data. Furthermore, we can tailor the aggressiveness of Gazzillion to fulfill each customer’s design targets, providing an efficient area-performance trade-off for each memory system.
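
The value of tolerating more outstanding misses can be illustrated with a toy timing model. This is not the Avispado/Atrevido pipeline; the miss rate, latency and request count below are invented for illustration.

```python
# Toy timing model: one request issues per cycle unless the outstanding-miss limit
# is reached; each miss takes MISS_LATENCY cycles to return from main memory.
import heapq, random

def run(n_requests, miss_rate, miss_latency, max_outstanding, seed=0):
    rng = random.Random(seed)
    cycle = 0
    completions = []                        # min-heap of cycles at which misses return
    for _ in range(n_requests):
        # Retire any misses whose data has come back.
        while completions and completions[0] <= cycle:
            heapq.heappop(completions)
        # If the miss buffer is full, stall until the oldest miss returns.
        if len(completions) >= max_outstanding:
            cycle = heapq.heappop(completions)
        if rng.random() < miss_rate:
            heapq.heappush(completions, cycle + miss_latency)
        cycle += 1                          # one new request per non-stalled cycle
    return cycle

args = dict(n_requests=100_000, miss_rate=0.10, miss_latency=300)
for limit in (1, 16, 128):
    print(f"{limit:3d} outstanding misses -> {run(max_outstanding=limit, **args):,} cycles")
```

With these assumed parameters, a blocking-style cache (limit 1) spends most of its time stalled, a 16-miss cache is still throttled by the miss buffer, and a 128-miss limit keeps the core issuing nearly every cycle.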

There are multiple reasons why Gazzillion Misses results in significant performance improvements:

Solving the Memory Wall

Serving a cache miss is expensive. Main memories are located off chip, on dedicated memory circuits based on DDR [3] or HBM [4] technology and, hence, just doing a round-trip to memory takes a non-negligible amount of time. This is especially concerning with the advent of CXL.mem [5], which locates main memory even further away from the CPU. In addition, accessing a memory chip also takes a significant amount of time. Due to the memory wall problem, accessing main memory takes a large number of CPU cycles and, therefore, a CPU can quickly become idle if it stops processing requests after a few cache misses. Gazzillion Misses has been designed to solve this issue, largely improving the capability of Avispado and Atrevido cores to tolerate main memory latency.

Effectively Using Memory Bandwidth

Main memory technologies provide a high bandwidth, but they require a large number of outstanding requests to maximize bandwidth usage. Main memory is split in multiple channels, ranks and banks, and it requires a large number of parallel accesses to effectively exploit its bandwidth. Gazzillion Misses is able to generate a large amount of parallel accesses from a small number of cores, effectively exploiting main memory bandwidth.
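
Little’s Law makes the point quantitatively: the number of misses that must be in flight is roughly bandwidth times latency divided by the cache line size. The bandwidth and latency figures below are illustrative, not a specific product’s numbers.

```python
# Little's Law applied to the paragraph above: keeping a memory system busy requires
# roughly bandwidth x latency worth of data in flight. Figures are illustrative.
def outstanding_misses_needed(bandwidth_gbs, latency_ns, line_bytes=64):
    bytes_in_flight = bandwidth_gbs * latency_ns   # GB/s * ns = bytes (1e9 * 1e-9 cancel)
    return bytes_in_flight / line_bytes

# E.g. a 64 GB/s channel with a 120 ns round-trip latency:
print(outstanding_misses_needed(64, 120))          # ~120 cache-line misses in flight
```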

A Perfect Fit for Vectorized Applications

Vectorized codes put high pressure on the memory system. Scatter/gather operations, such as indexed vector load/store instructions, can generate a large number of cache misses from just a few vector instructions, so tolerating a large number of misses is key to delivering high performance in vectorized applications. A paradigmatic example of such applications is sparse, i.e. pruned, Deep Neural Networks [7], which are well known for exhibiting irregular memory access patterns that result in a large number of cache misses. Gazzillion Misses is a perfect solution for such applications.
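
A quick way to see the pressure a single gather creates is to count how many distinct cache lines a random index vector touches; the vector length and element size below are assumptions chosen for illustration.

```python
# One indexed vector gather with random indices touches almost as many distinct
# cache lines as it has elements, so a single instruction can create dozens of
# simultaneous misses. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(2)
VLEN_ELEMS, ELEM_BYTES, LINE_BYTES = 64, 8, 64
indices = rng.integers(0, 1_000_000, VLEN_ELEMS)   # sparse/pruned access pattern
lines_touched = np.unique((indices * ELEM_BYTES) // LINE_BYTES).size
print(f"one {VLEN_ELEMS}-element gather touched {lines_touched} distinct cache lines")
```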

What does this have to do with Leonardo Da Vinci?

To better illustrate Gazzillion Misses, we would like to borrow an analogy from the classical textbook “Computer Organization and Design” [8]. Suppose you want to write an essay about Leonardo Da Vinci and, for some reason, you do not want to use the Internet, Wikipedia or just tell ChatGPT to write the essay for you. You want to do your research the old-fashioned way, by going to a library, either because you feel nostalgic or because you enjoy the touch and smell of books. You arrive at the library and pull out some books about Leonardo Da Vinci, then you sit at a desk with your selected books. The desk is your cache: it gives you quick access to a few books. It cannot store all the books in the library, but since you are focusing on Da Vinci, there is a good chance that you will find the information that you need in the books in front of you. This capability to store several books on the desk close at hand saves you a lot of time, as you do not have to constantly go back and forth to the shelves to return a book and take another one. This is similar to having a cache inside the processor that contains a subset of the data.

After spending some time reading the books in front of you and writing your essay, you decide to include a few words on Da Vinci’s study of the human body. However, none of the books on your desk mention Da Vinci’s contributions to our understanding of human anatomy. In other words, the data you are looking for is not on your desk, so you just had a cache miss. Now you have to go back to the shelves and start looking for a book that contains the information that you want. During this time, you are not making any progress on your essay, you are just wandering around. This is what we call idle time, you stop working on your essay until you locate the book that you need.

You can be more efficient by leveraging the idea of non-blocking caches. Let’s assume that you have a friend that can locate and bring the book while you continue working on your essay. Of course, you cannot write about anatomy because you do not have the required book, but you have other books on your desk that describe Da Vinci’s paintings and inventions, so you can continue writing. By doing this you avoid stopping your work on a cache miss, reducing idle time. However, if your friend takes a large amount of time to locate the book, at some point you will be again idle waiting for your friend to bring the book.

This is when Gazzillion Misses comes in handy. Our Gazzillion technology gives you 128 friends who will be running up and down the library, looking for the books that you will need to write your essay and making sure that, whenever you require a given book, it will be available on your desk.

To sum up, our Gazzillion Misses technology has been designed to effectively tolerate main memory latency and to maximize memory bandwidth usage. Due to its unprecedented number of simultaneous cache misses, our Avispado and Atrevido cores are the fastest RISC-V processors for moving data around.

Further information at www.semidynamics.com

References

[1] David Kroft. “Lockup-Free Instruction Fetch/Prefetch Cache Organization”. Proceedings of the 8th Int. Symp. on Computer Architecture, May 1981, pp. 81-87.

[2] Belayneh, Samson, and David R. Kaeli. “A discussion on non-blocking/lockup-free caches”. ACM SIGARCH Computer Architecture News 24.3 (1996): 18-25.

[3] DDR Memory: https://en.wikipedia.org/wiki/DDR_SDRAM

[4] High Bandwidth Memory: https://en.wikipedia.org/wiki/High_Bandwidth_Memory

[5] Compute Express Link: https://en.wikipedia.org/wiki/Compute_Express_Link

[6] Memory Wall: https://en.wikipedia.org/wiki/Random-access_memory#Memory_wall

[7] Han, S., Mao, H., & Dally, W. J. (2015). Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149.

[8] Patterson, D. A., & Hennessy, J. L. (2013). Computer organization and design: the hardware/software interface (the Morgan Kaufmann series in computer architecture and design). Paperback, Morgan Kaufmann Publishers.

Gazzillion Misses is a trademark of Semidynamics

Also Read:

CEO Interview: Roger Espasa of Semidynamics

Semidynamics Shakes Up Embedded World 2024 with All-In-One AI IP to Power Nextgen AI Chips

RISC-V Summit Buzz – Semidynamics Founder and CEO Roger Espasa Introduces Extreme Customization



Bluetooth 6.0 Channel Sounding is Here
by Bernard Murphy on 09-11-2024 at 6:00 am


I posted a blog on this topic a year ago. Now that the Bluetooth SIG has (just) ratified the standard, it is timely to provide a reminder of what this new capability can offer. Channel Sounding, introduced in Bluetooth Core Specification version 6.0, is a method to significantly increase the accuracy of Bluetooth-based distance measurements, from around 3-5 meters down to around 30-50 centimeters. Channel Sounding opens new, more efficient and more secure options to support keyless entry, Find My Device, and other applications.

Why do we need a new standard for ranging?

Bluetooth is everywhere and already provides ranging support through RSSI (received signal strength indication). Does that suggest there is no need for a new standard and new devices? Can’t we just use what we already have? RSSI is as simple as it gets, measuring the attenuation in signal strength from the transmitter. However, attenuation depends on more than distance; obstacles and diffraction also contribute. Multiple beacons can help improve accuracy in, say, retail or hospital settings, but such solutions are obviously not helpful for personal applications like keyless car entry.
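
To see why RSSI ranging is coarse, here is the usual log-distance path-loss inversion. The reference power and path-loss exponent are environment-dependent guesses, which is exactly where the multi-meter error comes from; the values below are illustrative.

```python
# Rough sketch of RSSI ranging: invert the log-distance path-loss model
# d = d0 * 10^((P0 - RSSI) / (10 * n)). P0 and n must be guessed per environment.
def rssi_to_distance(rssi_dbm, p0_dbm=-50.0, n=2.0, d0_m=1.0):
    return d0_m * 10 ** ((p0_dbm - rssi_dbm) / (10 * n))

for rssi in (-50, -60, -70):
    # The same RSSI maps to very different distances if the path-loss exponent is off.
    print(rssi, "dBm ->", [round(rssi_to_distance(rssi, n=n), 1) for n in (2.0, 2.5, 3.0)], "m")
```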

Angle of arrival (AoA) and angle of departure (AoD) measurements, introduced in Bluetooth 5.1, can help improve accuracy through trigonometric refinements, though they require multiple antennas on receiver or transmitters. And these methods are equally compromised by multipath propagation resulting from reflection from surfaces around the path between receiver and transmitter.

Bluetooth 6.0 Channel Sounding instead uses phase-based ranging. A device sends a sine wave to a peer device which then sends the same signal back to the original device. The phase difference between the initial signal and the received signal gives a quite precise measure of distance. Better yet, a reflection off an obstacle will travel a longer path than the direct return, exhibiting a bigger phase shift and making it easy to ignore in distance estimation.
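
A textbook two-tone version of phase-based ranging looks like the sketch below. It is a simplification, not the exact Bluetooth 6.0 Channel Sounding procedure, but it shows how a phase difference measured across a known frequency spacing converts to distance.

```python
# Two-tone phase ranging: compare the round-trip phase of two tones spaced delta_f
# apart; distance d = c * delta_phi / (4 * pi * delta_f). Illustrative only.
import math

C = 299_792_458.0                                    # speed of light, m/s

def phase_ranging_distance(phi1_rad, phi2_rad, delta_f_hz):
    delta_phi = (phi2_rad - phi1_rad) % (2 * math.pi)  # wrapped phase difference
    return C * delta_phi / (4 * math.pi * delta_f_hz)

# Simulate a reflection-free channel at d = 7.5 m with tones 1 MHz apart.
d_true, delta_f = 7.5, 1e6
phi = lambda f: (2 * math.pi * f * 2 * d_true / C) % (2 * math.pi)
print(phase_ranging_distance(phi(2.402e9), phi(2.402e9 + delta_f), delta_f))  # ~7.5 m
```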

On the other hand, UWB (ultra-wide band) is very accurate, able to deliver positions with accuracy down to a centimeter or below. The tradeoff is that UWB requires an additional and specific MAC, modem and radio (either integrated or in separate chipset), adding to a device bill of materials (given that the device will also need to support Bluetooth for other reasons). And UWB is more energy hungry than Bluetooth, draining a battery faster unless used only when absolutely needed.

Is there a best solution?

One idea might be to combine RSSI methods for approximate ranging with UWB for accuracy close-in. There are two problems here. First, earlier generation Bluetooth versions are vulnerable to relay attacks. In a raw ranging attempt an attacker can intercept the BT communication and relay it to another device, allowing them to open your car door if UWB is not used for added security.

The second problem is with power consumption under some conditions. Suppose your car is parked near your front door and your car keys are on a table just inside the door, within range of the car. It’s not hard to imagine your key fob constantly trying to communicate with the car, triggering UWB ranging sessions and quickly draining the key fob battery.

There is significant support for the view that practical applications must depend on a combination of methods. The FiRa Consortium advancing UWB supports this direction, as does the Car Connectivity Consortium (CCC). One suggestion is to use RSSI for initial rough estimates out to perhaps 30 meters, then switch to Channel Sounding as you get closer. Here the devices can exchange credentials to first establish a secure connection, and Channel Sounding can refine the RSSI estimates as the driver approaches the car. At some point within closer range, UWB can further refine the estimate.

Switching between these methods should of course be transparent to a user and should be tuned to avoid the problem of unnecessarily turning on UWB when not required. Optimizing these choices is an area where product builders can differentiate if they work with a supplier offering all three options.

Talk to an expert to optimize your product options

Now for a no apologies company plug. If you really want the best solution in your product, you should probably talk to a company that has been at the forefront of embedded Bluetooth since the very early releases. Ceva is well established here, offering backward compatible Bluetooth Classic, Bluetooth LE, and dual mode over many generations with countless chips in production. They are just as active in the newer UWB standard, with supporting IP proven across several customers.

On every generation Ceva tracks emerging standards, ready to release once the standard has been ratified, and ready to announce certification once the test spec has been announced. They already have compliant Bluetooth 6.0 IP for early testing and expect to be certified shortly after the test spec is released.

With this depth of expertise, they encourage you first to simply talk with them about your goals. They can advise how you can best meet your objectives. Once you feel comfortable, they have all the IP, software stacks and profiles to help you deliver the very best of what you have to offer.

Also check out Ceva’s Bluetooth solutions and UWB solutions web pages.



TetraMem Integrates Energy-Efficient In-Memory Computing with Andes RISC-V Vector Processor
by Wenbo Yin on 09-10-2024 at 10:00 am


The rapid proliferation of artificial intelligence (AI) across a growing number of hardware applications has driven an unprecedented demand for specialized compute acceleration not met by conventional von Neumann architectures. Among the competing alternatives, one showing the greatest promise is analog in-memory computing (IMC). Unleashing the potential of multi-level Resistive RAM (RRAM) is making that promise more real today than in the past. Leading this development, TetraMem, Inc., a Silicon Valley based startup, is addressing the fundamental challenges that have held this solution back. The company’s unique IMC approach, which employs multi-level RRAM technology, provides more efficient, low-latency AI processing that meets the growing needs of modern applications in AR/VR, mobile, IoT, and beyond.

Background on the Semiconductor Industry

The semiconductor industry has seen significant advancements over the past few decades, particularly in response to the burgeoning needs of AI and machine learning (ML). Innovations in chip design have pushed the boundaries of performance and efficiency. However, several intrinsic, persistent challenges remain, such as the von Neumann bottleneck and the memory wall, which limit data transfer rates between the CPU and memory, and the escalating power consumption and thermal management issues associated with advanced node technologies.

In-memory computing (IMC) represents a ground-breaking computing paradigm shift in how data processing is accomplished. Traditional computing architectures separate memory and processing units, resulting in significant data transfer overheads, especially for the data centric AI applications. On the other hand, IMC integrates memory and processing within the same physical location, enabling faster and more efficient data computations with a crossbar array architecture to further eliminate the large quantity of intermediate data from those matrix operations. This approach is particularly beneficial for AI and ML applications, where large-scale data processing and real-time analytics are critical.

Selecting a suitable memory device for IMC is crucial. Traditional memory technologies like SRAM and DRAM are not optimized for in-memory operations due to their device and cell constraints and their volatility idiosyncrasies. RRAM, with its high density, multilevel capability and non-volatility with superior retention, overcomes these challenges with no refresh needed. The working principle of RRAM involves adjusting the resistance level of the memory cell through controlled voltage or current, mimicking the behavior of synapses in the human brain. This capability makes RRAM particularly suited for analog in-memory computing.

TetraMem has focused its efforts on multi-level RRAM (memristor) technology, which offers several advantages over traditional single level cell memory technologies. RRAM’s ability to store multiple bits per cell and perform efficient matrix multiplications in situ makes it an ideal candidate for IMC. This technology addresses many of the limitations of conventional digital computing, such as bandwidth constraints and power inefficiency.

The RRAM programmable circuit element remembers its last stable resistance level. This resistance level can be adjusted by applying voltage or current. Changes in magnitude and direction of voltage and current applied to the element alters its conductance, thus changing its resistivity. Akin to how a human neuron functions, this mechanism has diverse applications: memory, analog neuron, and, at TetraMem, in-memory computing. The operation of an RRAM is driven by ions. With control of the conductive filament size, ion concentration and height, different multi-levels for cell resistance can be precisely achieved.

Processing data in the same physical location where it is stored, with minimal intermediate data movement and storage, results in low power consumption. Massive parallel computing via the crossbar array architecture with device-level grain cores yields high throughput. And computing directly through physical laws (Ohm’s law and Kirchhoff’s current law) produces low latency. TetraMem’s nonvolatile compute-in-memory cell reduces power consumption by orders of magnitude over a conventional digital von Neumann architecture.

Notable Achievements

TetraMem has achieved significant milestones in the development of RRAM technology. Notably, the company has demonstrated an unprecedented device with 11 bits per cell, achieving over 2,000 levels in a single element. This level of precision represents a major breakthrough in memory compute technology.

Recent publications in prestigious journals such as Nature [1] and Science [2] highlight TetraMem’s innovative approaches. Techniques to improve cell noise performance and to enhance multi-level IMC have been key areas of advancement. For example, TetraMem has developed proprietary algorithms to suppress random telegraph noise, resulting in superior memory retention and endurance characteristics for RRAM cells.

Operation of IMC

TetraMem’s IMC technology utilizes a crossbar architecture, where each cross-point in the array corresponds to a programmable RRAM memory cell. This configuration allows for highly parallel operations, which are essential for neural network computations. During a Vector-Matrix Multiplication (VMM) operation, input activations are applied to the crossbar array, and the resulting computations are collected on the bit lines. This method significantly reduces the need to transfer data between memory and processing units, thereby enhancing computational efficiency.
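
A minimal numerical sketch of that idea is shown below. It is illustrative rather than TetraMem’s implementation: weights are mapped onto a fixed number of conductance levels (2,048 here, in the spirit of the 11-bit cell mentioned earlier), inputs are applied as word-line voltages, and the bit-line currents are the analog result of the vector-matrix multiply.

```python
# Illustrative crossbar VMM: each cross-point conductance is one matrix weight,
# word-line voltages are the inputs, and Ohm's/Kirchhoff's laws sum the currents
# on each bit line, i.e. I = G^T . V. Conductances are quantized to fixed levels.
import numpy as np

def program_conductances(weights, levels=2048, g_max=1e-4):
    """Map a weight matrix onto `levels` discrete conductance states (in siemens)."""
    w = np.clip(weights, 0, None) / np.abs(weights).max()   # toy unipolar mapping
    return np.round(w * (levels - 1)) / (levels - 1) * g_max

rng = np.random.default_rng(3)
W = rng.random((8, 4))                 # 8 inputs x 4 outputs (hypothetical weights)
G = program_conductances(W)            # one RRAM cell per cross-point
v = rng.random(8) * 0.2                # word-line voltages (volts)

bitline_currents = G.T @ v             # Kirchhoff's current law: summed per column
print(bitline_currents)                # analog vector-matrix multiply result (amps)
```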

Real-World Applications

TetraMem’s first evaluation SoC built through a commercial fab process, the MX100 chip, exemplifies the practical applications of its IMC technology. The chip has been demonstrated in various on-chip demos, showcasing its capabilities in real-world scenarios. One notable demo, the Pupil Center Net (PCN), illustrates the chip’s application in AR/VR for face tracking and authentication monitoring in autonomous vehicles.

To facilitate the adoption of its technology, TetraMem provides a comprehensive Software Development Kit (SDK). This SDK enables developers to define edge AI models seamlessly. Furthermore, the integration with Andes Technology Inc.’s NX27V RISC-V CPU with Vector extensions streamlines operations, making it easier for customers to deploy TetraMem’s solutions in their products.

The TetraMem IMC design is great for matrix multiplication but not as efficient for other functions such as vector or scalar operations, which are used frequently in neural networks. For these functions, Andes provides the flexibility of a CPU plus a vector engine, as well as an existing SoC reference design and a mature compiler and library, to accelerate our time to market.

TetraMem collaborated with Andes Technology to integrate its IMC technology with Andes’ RISC-V CPU with Vector Extensions. This partnership enhances the overall system performance, providing a robust platform for a variety of AI tasks. The combined solution leverages the strengths of both companies, offering a flexible and high-performance architecture.

Looking ahead, TetraMem is poised to introduce the MX200 chip based on 22nm, which promises even greater performance and efficiency. This chip is designed for edge inference applications, offering low-power, low-latency AI processing. The MX200 is expected to open new market opportunities, particularly in battery-powered AI devices where energy efficiency is paramount.

Conclusion

TetraMem’s advancements in in-memory computing represent a significant leap forward in the field of AI hardware. By addressing the fundamental challenges of conventional computing, TetraMem is paving the way for more efficient and scalable AI solutions. As the company continues to innovate and collaborate with industry leaders like Andes Technology, the future of AI processing looks promising. TetraMem’s solution not only enhances performance but also lowers the barriers to entry for adopting cutting-edge AI technologies.

By Wenbo Yin, Vice President of IC Design, TetraMem Inc.

  1. “Thousands of conductance levels in memristors monolithically integrated on CMOS”, Nature, March 2023, https://rdcu.be/c8GWo

  2. “Programming memristor arrays with arbitrarily high precision for analog computing”, Science, February 2024, https://www.science.org/doi/10.1126/science.adi9405

Also Read:

Unlocking the Future: Join Us at RISC-V Con 2024 Panel Discussion!

LIVE WEBINAR: RISC-V Instruction Set Architecture: Enhancing Computing Power

Andes Technology: Pioneering the Future of RISC-V CPU IP



Samtec Demystifies Signal Integrity for Everyone
by Mike Gianfagna on 09-10-2024 at 6:00 am


As clock speeds go up, voltages go down and data volumes explode the need for fast, reliable and low latency data channels becomes critical in all kinds of applications. Balancing the requirements of low power and high performance requires the mastery of many skills. At the top of many lists is the need for superior signal integrity, or SI. In its most basic form, SI ensures that a signal is transmitted with sufficient quality (or integrity) to allow effective communication. This definition is deceptively simple. Implementing effective SI requires mastery of many disciplines. There are signal integrity experts among us who have spent an entire career learning how to create reliable, efficient signal channels. But what about the rest of us?  It is often difficult to even know where to start. The good news is that Samtec has an extensive set of resources to help. Let’s look at how Samtec demystifies signal integrity for everyone.

The Tools of the Trade

Understanding signal integrity and how to optimize it demands a working knowledge of a vast array of concepts, technologies and metrics. While most of us know what a baud rate and a differential signal are, concepts like Nyquist frequency, bit error rate, insertion loss and what exactly an eye diagram illustrates may be less obvious.

The world of frequency domain analysis, S-parameters and electromagnetic compatibility opens a lot of new concepts and analysis regimes as well. The terminology is easy to get lost in, such as evaluating crosstalk through a NEXT or FEXT lens.

The good news is that Samtec, a company that has mastered signal integrity to deliver high performance data channels of all types, also provides a wide range of tools to help us all understand signal integrity – why it’s important, how to measure it and how to optimize it.

If you are unfamiliar with Samtec, you can get a good overview of what the company has to offer on SemiWiki here.

Samtec to the Rescue

A core resource to help navigate the subtleties and semantics of signal integrity is the document referenced at the top of this post. Samtec’s Signal Integrity Handbook provides everything you need to decode what good signal integrity practices look like and why they are so important. A link is coming so you can get your copy. You’ll be glad to have it. Let’s first look at what’s covered in this handbook and touch on some of the other ways Samtec helps its customers achieve the best system performance possible.

The handbook begins by covering the basics – the definition of signal integrity and the differences in dealing with single-ended and differential signals. Key signaling terms such as NRZ and PAM4 are then explained.  We then explore the entire world of frequency domain analysis and the role of S-parameters. Topics such as insertion loss, return loss and crosstalk are covered here.
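
As a taste of the material, here is a small worked example of two of those terms: the Nyquist frequency for NRZ versus PAM4 signaling at the same data rate, and insertion loss expressed in dB from an |S21| magnitude. The figures are illustrative, not Samtec specifications.

```python
# Worked example of two handbook terms. Figures are illustrative only.
import math

def nyquist_ghz(data_rate_gbps, bits_per_symbol):
    baud = data_rate_gbps / bits_per_symbol      # symbol rate in Gbaud
    return baud / 2                              # Nyquist = half the symbol rate

print("56G NRZ :", nyquist_ghz(56, 1), "GHz")    # 28 GHz
print("56G PAM4:", nyquist_ghz(56, 2), "GHz")    # 14 GHz - why PAM4 eases channel loss

def insertion_loss_db(s21_magnitude):
    return -20 * math.log10(s21_magnitude)       # smaller |S21| means more loss

print(insertion_loss_db(0.5), "dB")              # ~6 dB of loss through the channel
```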

The basics of time domain analysis are then explored, with a discussion of topics such as impedance, apparent impedance, propagation delay and skew. The various metrics used to characterize channel performance are then covered. This is followed by a discussion of modeling considerations, where topics such as simulation tools and validation are explored.

System analysis is then covered. This is where we learn about eye diagrams. Methods to select appropriate connectors and how to use them are then detailed. The handbook concludes with a discussion of signal integrity for RF systems. 

You will not likely read this handbook cover to cover, although it is quite well organized and easily consumed. Rather, it will become a reference for you as you navigate the challenges of your next system design.

Beyond the Signal Integrity Handbook, Samtec offers a broad set of resources to dig into the many topics associated with signal integrity. These take the form of short videos and informative blogs. There are links to some of these resources embedded in the handbook, which is quite convenient. If you want a quick overview of the Samtec Signal Integrity Group, you can get that in under two minutes with this informative video. If you want broader access to experts and materials, you can find that on Samtec’s Signal Integrity Center of Excellence. If you’ve moved beyond the basics, Samtec also offers its monthly gEEk spEEK webinar series.

To Learn More

At last, here is where you can download your copy of the Samtec Signal Integrity Handbook. Keep it handy when you start your next design. And that’s how Samtec demystifies signal integrity for everyone.