eSilicon Bucking the Trend at OFC with 7nm SerDes
by Daniel Nenni on 03-11-2019 at 8:00 am

A recent press release from eSilicon caught my eye. The company has been touting their 7nm SerDes quite a bit lately – reach, power, flexibility, things like that. While those capabilities are important, any high-performance chip needs to work in the context of the system, which usually contains technology from multiple sources. So, interoperability does matter, and eSilicon's press release announcing the addition of an interoperability demo with a mainstream FPGA at a major show is relevant. The release also talked about working with another ecosystem partner, Precise-ITC, to validate that its forward error correction (FEC) IP works with the eSilicon SerDes as well.

Interoperability demo at OFC: eSilicon 56G SerDes and Precise-ITC 400G FEC

“Our current SerDes demonstration showcases the robustness, low power and flexibility of our 7nm device,” said Hugh Durdan, vice president of strategy and products at eSilicon. “It is also important to demonstrate interoperability with other popular hardware. I am delighted we can showcase this additional aspect of our SerDes capabilities at OFC.”

“Precise-ITC is a leading provider of Ethernet and optical transport (OTN) intellectual property products for ASIC and FPGA,” said Silas Li, Director of Engineering at Precise-ITC. “OFC2019 is a showcase event for the partnerships we have with FPGA vendors, ASIC developers, like eSilicon, and test equipment developers. Together, we’re enabling rapid deployment of 400GbE.”

Digging a bit more, the release announced additions to the demo complement eSilicon will showcase at OFC. The Optical Fiber Communication Conference and Exposition (OFC) is a huge technical conference and trade show that is over 40 years old. According to their website: "OFC is the largest global conference and exhibition for optical communications and networking professionals." There are over 700 exhibitors on 350,000 square feet of exhibit space. The show takes up the entire San Diego Convention Center, which is also where Comic-Con is held. This is, absolutely, a huge show.

OFC show floor

Digging further, you can find some more interesting news in the press release. In addition to the interoperability demo, eSilicon is demonstrating a complete HBM2 memory subsystem using eSilicon's latest 7nm HBM2 PHY, Northwest Logic's memory controller and an HBM DRAM stack from a leading memory supplier. And they're demonstrating the performance, flexibility and extremely low power consumption of their 7nm SerDes using a five-meter ExaMAX Backplane Cable Assembly from Samtec.

eSilicon booth
Five-meter cable demo

eSilicon is demonstrating high-speed communications over a five-meter copper cable at the biggest optical networking show in the world. I would say that takes a lot of confidence. I had some spies at the show, and they reported quite a bit of interest in eSilicon’s copper cable demo. They appear to be driving the longest electrical cable at the show. Getting high speed and low power with a proven, simpler technology such as copper is certainly appealing. I’ll be watching to see what eSilicon announces next.


Ultra low-power Analog Design using a Multi-Project Wafer approach
by Daniel Payne on 03-10-2019 at 1:00 pm

On SemiWiki we often talk about bleeding-edge technology like 7nm, 5nm or even 3nm, but for analog IC designs there's a low-cost alternative for getting your ideas validated and prototyped without taking out a multi-million-dollar loan: Multi-Project Wafers (MPW). Starting with a mature process node like 180nm still produces adequate silicon for low-power applications like IoT, where analog sensors and converters are the main part of the chip functionality along with some digital control logic; think "big A, little D" applications.

My industry contact Wladek Grabinski shared information with me this week about a company in France called CMP (Circuits Multi-Projets, or Multi-Project Circuits in English) that has been offering MPW foundry services since 1981 to keep costs down for IC designers at universities, research laboratories or industrial companies that want to prototype their analog ideas economically.

For an MPW project you likely want from dozens to thousands of pieces manufactured for you, either packaged or as bare die, ready for testing. In total, some 7,900 projects have been prototyped through 1,043 MPW runs at CMP over the years, helping 614 customers turn their analog ideas into silicon. CMP certainly has its act together and provides a much-needed service for companies needing quick prototypes of big A, little D designs.

A Swiss company, EM Microelectronic (EM), has an ultra-low-power IP library and foundry, all ready to use with the MPW services provided by CMP. Here's what EM has to offer you:

  • Mature 180nm node for ultra low-power analog design (APL018)
  • NVM (EE or Flash)
  • EKV models accurate for near- and sub-Vth operation
  • Analog and digital IP libraries characterized for low voltage (down to 0.4V), low current (nA bias)
  • I/O pads, low leakage ESD protection
  • Design Kit for Cadence
  • Digital flow for Synopsys

EM really knows IC design, as they’ve been in business since 1975 and their ultra-low power silicon is used in six major application areas:

  • Energy – harvesting, power management, storage
  • Interfaces – displays, tactile surfaces, computer peripherals, motion sensing, sound production
  • Sensing – interfaces, sensors
  • Communications – RF technologies, RF long range communication, RFID, beacons
  • Smart Processing – wearables, cryptography and security
  • Time – watches, fobs

Even though the EM headquarters are in Marin, Switzerland, you can also find their facilities around the globe in:

  • Colorado Springs, USA
  • Prague, Czech Republic
  • Bangkok, Thailand

If you've ever shopped for a watch you likely have seen the iconic Swatch brand in retail stores and online; EM is the semiconductor company of the Swatch Group.

Looking at the most recent press releases at EM I conclude that this company is well suited for IC designs that require Bluetooth, IoT, RF and anything that is battery-powered and requires ultra-low power consumption.

CMP invited EM to present at a seminar last month, so check out the slides here.


Lyft IPO Paints Perilous Profitless Picture
by Roger C. Lanctot on 03-10-2019 at 8:00 am

Lyft’s S1 filing for its IPO is a sobering read, as such documents often are, requiring, as they do, the full disclosure of current financial circumstances and, everyone’s favorite: risk factors. Lyft identifies 18 risk factors (below) which could interfere with the long-term success of the operation. I think there are more. Continue reading “Lyft IPO Paints Perilous Profitless Picture”


Data Centers and AI Chips Benefit from Embedded In-Chip Monitoring
by Daniel Payne on 03-08-2019 at 12:00 pm

Webinars are a quick way to come up to speed with emerging trends in our semiconductor world, so I just finished watching an interesting one from Moortec about the benefits of embedded in-chip monitoring for data center and AI chip design. My first exposure to a data center was back in the 1960s, during an elementary school class where they wheeled in a Teletype machine connected to a telephone line; at the other end was a centralized computer system in some air-conditioned room running a Civil War game that had us students choosing how to run a campaign with our resources and then predicting the outcome of the battle. In the 1970s at the University of Minnesota our data center was powered by machines from Control Data Corporation, and then at my first job with Intel in 1978 the data center was powered by IBM mainframes in a remote location that we accessed from Oregon.

Living in Oregon we know something about data centers because of the low cost of electricity from our plentiful hydro power generators, moderate climate, and generous tax breaks for companies like Google to locate here. In 2018 the data centers in the US consumed some 90 billion kilowatt-hours of electricity, while globally that consumption was 416 terawatt-hours, about 3% of total electrical output. This growing trend in data center power consumption causes heat-induced reliability issues for each of the semiconductor components mounted on boards, stuffed into racks of equipment.

Source: Google Data Center

A lot of new VC money poured into AI chip startups in 2018, so let's just summarize both the data center and AI chip design challenges:

Data Center
· Reliability and long MTBF (Mean Time Between Failures)
· Low service interruption
· Big die sizes at advanced nodes
· High volume with high manufacturing yield required
· Fine grain DVFS (Dynamic Voltage and Frequency Scaling) control
· Chip supply voltage noise

AI
· High data throughput
· Intense and bursty computations
· Constrained power
· Variable CPU core usage, or utilisation
· Continual optimisation of algorithms for data analysis and manipulation

One method to deal with all of these chip design challenges is to place PVT (Process, Voltage, Temperature) monitors in your AI or data center chips, allowing you to measure in real time what's happening deep within each chip, then use that information to make decisions about changing Vdd values or local clock speeds to ensure chip reliability and meet MTBF goals. Take the example of a typical AI chip, which may have CPU clusters with thousands of cores in use, as shown below, where 16 cores form each cluster and PVT sensor blocks (the colored blocks) are placed around each cluster:

CPU Clusters with PVT monitors
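
To make that concrete, here is a minimal sketch of the kind of control loop firmware could run against on-die PVT monitor readings. The thresholds, frequency limits and accessor functions are hypothetical assumptions, not Moortec's API:

    # Hypothetical throttling loop driven by on-die PVT monitor readings.
    # Thresholds, limits and the accessor functions are illustrative only.
    TJ_MAX_C = 110.0     # the 110C junction spec discussed in the article
    VDD_MIN_V = 0.76     # e.g. a nominal 0.8V rail minus a 5% droop allowance

    def throttle_step(read_temp_c, read_vdd_v, set_freq_mhz, freq_mhz):
        """One control iteration for a cluster: back off the clock when hot
        or droopy, creep back up when there is margin."""
        t, v = read_temp_c(), read_vdd_v()
        if t > TJ_MAX_C or v < VDD_MIN_V:
            freq_mhz = max(200, int(freq_mhz * 0.9))    # throttle by 10%
        elif t < TJ_MAX_C - 15 and v > VDD_MIN_V + 0.02:
            freq_mhz = min(2000, freq_mhz + 25)         # recover performance
        set_freq_mhz(freq_mhz)
        return freq_mhz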

The temperature monitors will let you know if junction temperatures are within specification, for example 110C. Thermal monitors can be used to:

· Avoid Electrical Over Stress (EOS)
· Mitigate Electromigration effects
· Limit hot carrier aging
· Prevent thermal runaway

Semiconductor processes are not uniform, so you cannot expect silicon to be centered on the TT corner. Instead you can expect:
· Process variability across each die
· Variation caused by lithography
· Reliability effects like aging
· FinFET variations

IC designers start out with an ideal power supply concept, like a Vdd value of 1.1V, but then you have to deal with the non-ideal physical realities of on-chip voltages, like:
· Interconnect resistance causing dynamic IR drops along Vdd paths
· Dynamic versus static power
· Electromigration effects on power, clock and interconnect

Static Timing Analysis (STA) tools are run on chips before tapeout to ensure that your design meets speed criteria across all PVT corners, but with actual physical local variations on advanced nodes it’s conceivable that one die region has a temperature of 50C, Vdd of 0.8V and SS corner, while another region has a slightly different temperature of 65C, Vdd of 0.9V and TT corner. Your STA tool needs to handle these on-chip variations (OCV) while calculating path delays.

Not all thermal monitors are created equal, so if Moortec provides a thermal monitor with +/- 2C accuracy, and another vendor has a +/- 5C accuracy thermal monitor, go with the 2C monitor in order to provide tighter control to your thermal throttling system, which in turn provides greater power savings and allows for the highest data throughput.

Consider the power savings for a data center with 100,000 servers (Facebook has ~400,000, for example), where you could save 2W per chip by using a Moortec PVT approach versus a less accurate monitor that requires 6C more thermal guard-banding. The webinar provided a case study with calculations showing that, if this saving per chip were scaled up, a data center could save around $2M per year in electricity costs.
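
The webinar's exact inputs weren't spelled out here, but the shape of the arithmetic is easy to sketch. The chips-per-server count, electricity price and PUE below are my assumptions, not Moortec's figures:

    # Back-of-envelope electricity savings from tighter thermal guard-banding.
    # Assumed inputs (mine, not the webinar's): 10 monitored chips per server,
    # $0.10/kWh, and a PUE of 1.4 to account for cooling overhead.
    servers, chips_per_server = 100_000, 10
    watts_saved_per_chip = 2.0                  # from the example above
    price_per_kwh, pue, hours = 0.10, 1.4, 8760

    kw_saved = servers * chips_per_server * watts_saved_per_chip / 1000
    dollars = kw_saved * pue * hours * price_per_kwh
    print(f"~${dollars / 1e6:.1f}M per year")   # ~$2.5M with these assumptions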

Just as tighter thermal guard-banding benefits data center chips and systems, the same can be said for voltage guard-banding: highly accurate 1% voltage monitoring from Moortec means fewer watts wasted in a system compared with less accurate voltage monitoring. An example system using 0.8V for Vdd and a 20W target shows a worst-case value of 20.4W with Moortec voltage monitors, while a less accurate voltage monitor gives a worst-case value of 22.1W, which is about 10% more wasted power. Again, Moortec outlined material cost savings for data center operators.
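
Those figures are consistent with power scaling roughly as the square of the worst-case supply voltage, which a quick sketch confirms. The 5% figure for the less accurate monitor is my inference from the 22.1W number, not from the webinar:

    # Worst-case power vs. voltage-monitor accuracy, assuming P scales as V^2.
    p_target = 20.0                      # watts at the nominal 0.8V Vdd
    for accuracy in (0.01, 0.05):        # 1% monitor vs. an assumed 5% monitor
        worst = p_target * (1 + accuracy) ** 2
        print(f"{accuracy:.0%} accuracy -> {worst:.2f} W worst case")
    # 1% -> 20.40 W and 5% -> 22.05 W, in line with the 20.4W and 22.1W above.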

SoCs that use Adaptive Voltage Scaling (AVS) in closed loop benefit from using embedded Process or Voltage Monitors that tell the PMIC (Power Management IC) what the actual silicon values are.


Voltage scaling optimization

Summary
There's only one IP vendor dedicated 100% to PVT monitoring for ICs, and that's Moortec. They started in the UK back in 2005 and now have customers around the globe using the most popular nodes from the major foundries. You can take the next step and contact the office nearest your timezone: UK, USA, China, Taiwan, Israel, Europe, South Korea, Russia, Japan.

Watch the entire 35-minute webinar recording online, after a brief registration process.



Arm Deliver Their Next Step in Infrastructure
by Bernard Murphy on 03-08-2019 at 7:00 am

Arm announced their Neoverse plans not long ago at TechCon 2018. Neoverse is a brand, launched by Arm, to provide the foundations for cloud to edge infrastructure in support of their vision of a trillion edge devices. To a cynic this might sound like marketing hype. Sure, they’re widely used in communications infrastructure and certainly in edge devices, but they never really cracked the datacenter, or so conventional wisdom held. They put that concern to rest not long after TechCon when AWS announced immediate availability of EC2 A1 instances in their services. These are built on Arm-based Graviton processors, developed by AWS Annapurna Labs.

Continue reading “Arm Deliver Their Next Step in Infrastructure”


Newer cryptocurrencies highlight need for agile mining strategies
by Tom Simon on 03-07-2019 at 12:00 pm

Cryptocurrencies represent a radical departure from traditional forms of money. Currencies like Bitcoin, Ethereum and Monero offer many unique advantages over traditional currencies, and are changing how money is created and used. Bitcoin, the pioneer of cryptocurrencies, relies on pure computational power for so-called mining, the process whereby transactions are verified and providers of this service are rewarded with newly minted bitcoins. Starting with CPUs, then GPUs, this led to an inexorable spiral toward more powerful and dedicated mining hardware. The mining activity moved to FPGAs and then to dedicated ASICs; at the same time, it moved to very specific geographies with low electricity costs. And the democratization of cryptocurrency gave way to a smaller group of niche players.

Fortunately, this trend has been challenged by newer cryptocurrencies that have imposed new requirements on mining that make it more democratic. For instance, newer currencies such as Monero regularly perform forks, which change the algorithm for mining, rendering dedicated ASICs obsolete. Another strategy is requiring random memory access in a large address space. Both of these features make it more challenging to develop silicon specifically targeted at gaining an advantage in mining.
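
To illustrate the memory-hard idea, here is a toy sketch, deliberately simplified and not Monero's actual algorithm. The large scratchpad and data-dependent reads mean a miner cannot strip out memory the way a Bitcoin SHA-256 ASIC does:

    import hashlib

    def memory_hard_hash(block_header: bytes, mem_mb: int = 4,
                         rounds: int = 10000) -> bytes:
        """Toy memory-hard proof-of-work hash: the result depends on reads at
        unpredictable addresses across a large scratchpad."""
        words = (mem_mb * 1024 * 1024) // 32
        # Fill the scratchpad with chained hashes of the block header.
        h = hashlib.sha256(block_header).digest()
        pad = []
        for _ in range(words):
            h = hashlib.sha256(h).digest()
            pad.append(h)
        # Mix in data-dependent reads; each address depends on prior state.
        for _ in range(rounds):
            idx = int.from_bytes(h[:8], "big") % words
            h = hashlib.sha256(h + pad[idx]).digest()
        return h

    print(memory_hard_hash(b"example header").hex()[:16])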

Interestingly, Achronix has developed a radical departure from traditional FPGAs in the form of embedded FPGA (eFPGA) fabric, which coincidentally offers some compelling advantages in the mining of these newer cryptocurrencies. Achronix has written a white paper that outlines how their Speedcore eFPGA is well suited to the task of mining. However, their treatise on how well suited their eFPGA is to mining also speaks indirectly to how eFPGA can be used to solve a wide variety of challenges that either traditional ASICs or FPGAs may struggle with.

Achronix's Speedcore eFPGA is highly configurable, and at the same time does not drag a lot of unnecessary blocks into the finished design. In an amusing section of their white paper Achronix notes how some writers refer to standard FPGAs as programmable piles of parts. In all seriousness, standard FPGA parts are often mismatched to the task at hand, and nowhere is this truer than in cryptocurrency mining. Things like Ethernet, PCIe, MACs, SerDes, etc. are not needed and just end up taking up valuable real estate for no actual benefit. Also, a multitude of small memories does not suffice for the memory needs associated with mining.

When a precisely configured eFPGA core can be married to custom memory instances, it leads to big performance, power and area advantages. Their white paper compares a case study that uses eFPGA in an ASIC to the performance of GPU or standard FPGA based alternatives. A traditional ASIC based alternative was ruled out because it lacks the re-programmability to deal with forks that require new algorithms for mining.

While some readers of the white paper may be compelled to embark on designing a new mining chip – the white paper certainly makes the case that it would be a wise choice – the bigger takeaway is that Speedcore eFPGA offers numerous advantages for a wide range of problems currently being addressed with CPUs, GPUs, ASICs or standard FPGAs. It is also an interesting read on the directions in which cryptocurrencies are headed. If you want to learn more, the white paper is available on their website, and makes for good reading.


Intelligent Electronic Design Exploration with Large System Modeling and Analysis
by Camille Kokozaki on 03-07-2019 at 7:00 am

At the recent DesignCon 2019 in Santa Clara, I attended a couple of sessions where Cadence and their research partners provided some insight on machine learning/AI and on large system design analysis. The first focused on real-world cloud and machine learning/AI deployment for hardware design, and the second on design space exploration for analyzing large system designs.

I. Intelligent Electronic Design and Decision

The first session was kicked off by Dr. David White of Cadence and was entitled Intelligent Electronic Design and Decision. He contrasted internet-driven image recognition AI problems with EDA-related AI. The characteristics of image recognition include natural or man-made static objects with a rich set of online examples, whereas EDA characteristics are dynamic and require learning adaptability with sparse data sets, where verification is critical and optimization very important.

White pointed out that not a lot of large data sets exist, that verification is essential to all we do in EDA/SoC design, and that optimization plays a role in finding design solutions for large designs. The ML/DL space additionally draws on a few different technologies such as optimization and analytics. He also noted that these approaches can be computationally heavy, so massive parallel optimization is used to get the performance back. In the development of design automation solutions, uncertainty arises in one of two forms:

  • Factors/features that are unobservable
  • Factors/features that are observable but change over time.

Design intent is not always captured in EDA tools: designers have an objective and intention in mind and then tune to an acceptable solution. This can be problematic at recent silicon technologies, where uncertainty is greatest and there is a low volume of designs to learn from. The goal is to use AI technology and tools to learn from a prior design database, to explore, and to reach an acceptable solution. At PCB West 2018, auto-router results presented by Intel took 120 hours, but with AI-based smart routing the runtime came down to 30 minutes.


There are five challenges for intelligent electronic design:
1. Developing real-time continuous learning systems:

  • Uncertainty requires the ability to adapt quickly
  • Limited observability requires ways to determine design intent

2. Creation of contextual learning for hierarchical decision structures:
There is a series of design decisions a designer makes to design a chip, package or board, and those decisions drive a number of sub-goals. This leads to a number of complicated objective functions, a complicated optimization problem that requires solving in order to automate large chunks of the design flow.

3. Robust flexibility and verification:
Most designs are used behind firewalls, and solutions need autonomy. Formalized verification processes are needed to ensure stable learning and inference. Robust optimization approaches are needed to ensure stable decisions.

4. Cold start issues:
Learning and model development is difficult when a new silicon technology is ramped. Typically very little data is available and there is no model to transfer. This is typical of early silicon nodes (like 7nm) when there are few designs to learn from and overall uncertainty is largest.

5. Synthesizing cost functions to drive large-scale optimization is complex and difficult.

II. Design Space Exploration Models for Analyzing Large System Designs

The second session addressed Design Space Exploration with Polynomial Chaos Surrogate Models for Analyzing Large System Designs.[1] Cadence is collaborating with and supporting the academic work that was presented in that session.

Design space exploration usually involves tuning multiple parameters. Traditional approaches (sweeping, Monte Carlo) are time-consuming, costly, and non-optimal. The challenge is quantifying uncertainty from un-measurable sources. Polynomial Chaos (PC) provides more efficient uncertainty quantification methods and addresses the curse of dimensionality (too many parameters to track, which may or may not be significant). Because the size of the PC surrogate model increases near-exponentially with dimension, less important variables that have a negligible effect on the output can be removed as follows:

• Only sensitive variables are considered as random.
• The rest are fixed at their average value.
• A full PC model is developed based on the selected terms.

Polynomial Chaos theory was presented (with intimidating math that was well explained, including sensitivity analysis). A multi-stage approach for developing surrogate models was proposed, as follows (a toy numerical sketch appears below the list):

• First, a simplified Polynomial Chaos (PC) model is developed.
• The simplified model is used for sensitivity analysis.
• Sensitivity analysis results are used for dimension reduction.
• The sensitivity of different ranges of variables is evaluated.
• Training samples are placed based on the results.
• A full PC surrogate model is developed and used for design space exploration.

A numerical example with a DDR4 topology was presented for validation, with results summarized in a table and diagram.
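
To give a feel for the approach, here is a toy sketch of a least-squares PC surrogate with coefficient-based sensitivity analysis. The two-parameter "simulator" stands in for an expensive channel simulation and is entirely my own invention, not the paper's DDR4 example:

    import math
    import numpy as np
    from numpy.polynomial.hermite_e import hermevander

    # Stand-in for an expensive channel simulation with two Gaussian inputs.
    def simulator(x):
        return np.sin(x[:, 0]) + 0.1 * x[:, 1] ** 2

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 2))          # standardized training samples
    y = simulator(X)

    # Truncated tensor-product Hermite (probabilists') basis, total order <= 3.
    order = 3
    terms = [(i, j) for i in range(order + 1)
                    for j in range(order + 1) if i + j <= order]
    V1, V2 = hermevander(X[:, 0], order), hermevander(X[:, 1], order)
    A = np.column_stack([V1[:, i] * V2[:, j] for i, j in terms])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares PC fit

    # He_n has variance n! under a standard normal, so each term contributes
    # coef^2 * i! * j! to the output variance (the (0,0) term is the mean).
    pvar = np.array([c * c * math.factorial(i) * math.factorial(j)
                     for c, (i, j) in zip(coef, terms)])
    total = pvar[1:].sum()
    s1 = pvar[[k for k, (i, j) in enumerate(terms) if i > 0]].sum() / total
    s2 = pvar[[k for k, (i, j) in enumerate(terms) if j > 0]].sum() / total
    print(f"total-effect sensitivities: x1={s1:.2f}, x2={s2:.2f}")

Variables with small total-effect indices would then be frozen at their mean values before fitting the full model, which is exactly the dimension-reduction step described above.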



I had a chance to chat with Ambrish Varma, Sr. Principal Software Engineer working in the Sigrity high-speed analysis division, and Ken Willis (product engineering architect, signal integrity). Their products cover system-level topology end-to-end from transmitters to receivers, not just for SerDes but also for parallel buses. Anything on the board can be extracted, making models for the transmitter and receiver, so pre-layout and post-layout simulations can be done. Now, one can use machine learning algorithms to hasten the simulations. Even if a simulation takes only 30 or 90 seconds, a million of them takes weeks, so one needs to figure out which parts of the SerDes to focus on; one could make a model of the layout and then never be able to run all the simulations. This R&D is the first foray into smart technology for simulation analysis.

ML training gathers the data and, to ensure the training data is not biased, testing uses random data. You then decide which parameters and variables to focus on. This is the first phase of the analysis. Next you abstract to a behavioral model, so a simulation lasts a couple of minutes, and then with more training data you can dial in the accuracy. Final results get within 1% of the predicted value. When sensitivity analysis is run, the models developed have an objective function or criteria. They use a metric called NJN, Normalized Jitter Noise, a measure of how open or closed an eye is within one unit interval, but the metric could also be overshoot, channel operating margin, power ripple, or signal-to-noise ratio.

Picking that objective function is important, and then the sensitivity analysis can focus on the major contributors. Cadence is helping academia as part of a consortium of industry and three universities: Georgia Tech, NC State and UIUC. This is still in the research stage and no release to production has occurred yet. One can tune the R, L, and C values, and the sensitivity analysis helps in choosing the optimum settings. A model will be part of a library of use cases. Design reuse is enhanced with physicality, a snippet of layout, logic, or netlist; if those reusable blocks are augmented with ML models for different objective functions, you can leverage the analysis in the reuse. It is possible that the ML models get standardized so that they can be used across all EDA tools. The solution space will have different designs with models that can be standardized, and whole solutions could be tool-agnostic or tool-specific.

Cooperation with academia and making the tools smarter are ongoing objectives, for example minimizing the input required from the user. Today a design cell is used as input and everything runs locally, but one can imagine the computation and sampling being sent to a cloud engine that returns the data: a computationally intensive, one-step, push-button flow. The team is working on firming up the model with tangible applications in mind. There is a tendency to think this replaces traditional methods; it is, however, more an augmentation than a replacement. Advanced analysis becomes far more democratized, more simulation will be needed in the future, and this capability comes at the right time.

[More on Cadence signal integrity with artificial neural networks and deep learning]

[1] Majid Ahadi Dolatsara (1), Ambrish Varma (2), Kumar Keshavan (2), and Madhavan Swaminathan (1); (1) Department of Electrical and Computer Engineering, Georgia Institute of Technology, Center for Co-Design of Chip, Package, System (C3PS) and Center for Advanced Electronics Through Machine Learning (CAEML); (2) Cadence.


PCIe 5.0 Jumps to the Fore in 2019
by Tom Simon on 03-06-2019 at 12:00 pm

2019 will be a big year for PCIe. With the approval of version 0.9 of the base specification for PCIe 5.0, implementers have a solid foundation to begin working on designs. PCIe 4.0 was introduced in 2017; before that, PCIe 3.0 was introduced in 2010 – ages ago in this industry. In fact, 5.0 follows so close on the heels of 4.0 that many products may simply leapfrog the 4.0 version and go directly to 5.0. Each version of PCIe has doubled the throughput, with 5.0 coming in at 63 GB/s for a 16-lane implementation. Compare that to the 4 GB/s throughput of the 2003 PCIe 1.0 with 16 lanes.
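
As a sanity check on those numbers, here is a quick sketch (my own arithmetic, not from the PCI-SIG specification text) of per-generation payload bandwidth for a 16-lane link, accounting for 8b/10b encoding in Gen1/2 and 128b/130b from Gen3 on:

    # Payload bandwidth of a 16-lane PCIe link by generation. Gen1/2 use
    # 8b/10b encoding; Gen3 onward use 128b/130b.
    gens = {"1.0": (2.5, 8 / 10), "2.0": (5.0, 8 / 10),
            "3.0": (8.0, 128 / 130), "4.0": (16.0, 128 / 130),
            "5.0": (32.0, 128 / 130)}
    lanes = 16
    for gen, (gt, coding) in gens.items():
        gbytes = gt * coding * lanes / 8        # GT/s -> GB/s over 16 lanes
        print(f"PCIe {gen}: {gbytes:5.1f} GB/s")
    # Prints 4.0 GB/s for 1.0 and ~63 GB/s for 5.0, matching the article.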

It's even more amazing to go back to the specs of the original PCI from Intel in 1992. Back then the clock rate was 33.33 MHz, with a data rate of 133 MB/s for a 32-bit bus. Of course, the original PCI used parallel synchronous data lines, which limited throughput due to clocking and bus arbitration issues. All of the PCIe specifications rely on high-speed serial data transfers, with each connected device having a dedicated full-duplex pair of transmit and receive lines. As with modern serial links, the clock is embedded in the data stream, eliminating the need for external clock lines. Multiple lanes are used to increase throughput, with the added requirement of limited lane skew so that the controller can reassemble the striped data.

Indeed, designers of PCIe IP and teams integrating PCIe 5.0 need to be mindful of a number of technical considerations. Synopsys recently posted an informative article about PCIe 5.0 on their website that discusses many of these issues. At a rate of 32 GT/s the Nyquist frequency increases to 16 GHz. This higher frequency for transmitting data complicates the channel design: insertion loss increases at this higher operating frequency, and crosstalk becomes a more serious problem. FR4 as a choice of PCB material is completely ruled out for most designs, unless retimers can be used. The maximum allowed channel loss for PCIe 5.0 is 36dB, and a 16-inch 100-Ohm differential stripline pair on FR4 would have a loss of 33.44dB at 16 GHz, leaving virtually nothing in the budget for the other elements of the channel such as packaging, connectors and cabling. Fortunately, there are alternatives that perform better, if the right design decisions are made.
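
The remaining budget is easy to see with the numbers quoted above; a trivial sketch:

    # Channel loss budget at the 16 GHz Nyquist frequency of PCIe 5.0,
    # using the figures quoted above.
    budget_db = 36.0        # maximum allowed end-to-end channel loss
    fr4_16in_db = 33.44     # 16-inch 100-Ohm differential stripline on FR4
    print(f"left for package/connectors/cabling: "
          f"{budget_db - fr4_16in_db:.2f} dB")
    # ~2.56 dB, which is why FR4 is effectively ruled out without retimers.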

In their article Synopsys also points out that the interplay between the PHY and controller becomes more interesting. There is an interface, known as the PHY Interface for PCIe (PIPE), for integrating the PHY and controller, with the latest PIPE 5.1.1 supporting the changes for PCIe 5.0. In the latest version, the pin count has been reduced by moving side-band pins into register bits, the Physical Coding Sublayer (PCS) moved from the PHY to the controller to permit the use of more general purpose PHY designs, and a 64-bit option has been added to help reduce the speed needed in the PIPE interface.

The Synopsys white paper offers an excellent description of the trade-offs relating to timing closure on 8- and 16-lane interfaces running at the highest transfer rates. Using a 512-bit controller with a 32-bit PIPE, running at 32 GT/s with 16 lanes, the controller logic timing can be closed at a 1 GHz clock rate. Other options either require much higher clock rates, making timing closure infeasible, or call for a larger controller that is not available in today's market.
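
The 1 GHz figure falls straight out of the arithmetic; a minimal sketch using the rates quoted above:

    # Why a 512-bit controller datapath closes timing at 1 GHz: the aggregate
    # raw bit rate divided by the datapath width sets the required clock.
    gt_per_s, lanes, datapath_bits = 32e9, 16, 512
    clock_ghz = gt_per_s * lanes / datapath_bits / 1e9
    print(f"required controller clock: {clock_ghz:.0f} GHz")   # 1 GHz
    # A 256-bit datapath would need 2 GHz, which is why narrower controllers
    # struggle to close timing at 32 GT/s.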

Synopsys also provides a lot of useful information about packaging and signal integrity considerations for PCIe 5.0. They conclude with a section on modeling and testing of the interfaces.

Synopsys offers a complete solution for PCIe 5.0, including controllers, PHYs, and verification IP. This should come as some comfort to design teams that are looking to add the latest generation to their products.

There are a lot of considerations and choices to be made in order to build the right interface for a given application. The Synopsys DesignWare IP for PCIe includes configurability with support for multiple data path widths, including a silicon proven 512-bit architecture. The article on their website is very informative and helps clarify some of the biggest issues relating to the move to PCIe 5.0.


Mentor Showcases Digital Twin Demo
by Bernard Murphy on 03-06-2019 at 6:00 am

Mentor put on a very interesting tutorial at DVCon this year. Commonly DVCon tutorials center around a single tool; less commonly (in my recent experience) they will detail a solution flow but still within the confines of chip or chip + software design. It is rare indeed to see presentations on a full system design including realistic use-case development, system design and end-application validation together with an electro-mechanical model. That’s what Mentor presented in this tutorial and my hat is off to them. Obviously synergy with Siemens is starting to have an impact.

Jacob Wiltgen (Mentor, as were all the speakers) kicked off by outlining their goal for a level 4/5 autonomous car: to develop a computer vision system from scratch, to functionally verify that system and optimize it for PPA, to plan, measure and integrate safety into the system to meet an ASIL-B safety goal, and then to validate the operation of that system in a digital twin, all the way from sensing in simulated but realistic driving scenarios, through compute (recognition), to actuation, electro-mechanically simulating braking. In this case the goal was to detect a pedestrian on the highway and apply the brakes autonomously.

David Aerne started this flow with a presentation on using high-level (C++) synthesis to build a CNN recognition engine. It's pretty clear that architectures in this space are very dynamic; in automotive applications, where response time and accuracy are paramount, it would not be surprising to see a lot of custom implementations. HLS, often associated with image and similar processing functions, is a natural fit for CNNs. Optimizing the CNN for an application involves many architectural tradeoffs – number of layers, pooling choices, sliding window architecture, memory architecture, fixed-point word sizes at each layer, and more. Trying to manage this at RTL would be impossible, but it is a natural process in C++, using abstraction and complexity-hiding to easily compare alternative implementations. Another very important advantage of designing at this level is that you can also verify at the same level, which means you can verify against very large image databases, orders of magnitude faster than would be possible in RTL.
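
To give a flavor of that exploration (a plain Python sketch of the word-size tradeoff, not Mentor's HLS flow or C++ code; the kernel and data are invented), one can sweep fixed-point fraction widths and watch the error against a full-precision reference:

    import numpy as np

    def quantize(x, int_bits, frac_bits):
        """Round to signed fixed-point with the given integer/fraction widths."""
        scale = 2.0 ** frac_bits
        lo, hi = -(2 ** (int_bits - 1)), 2 ** (int_bits - 1) - 1 / scale
        return np.clip(np.round(x * scale) / scale, lo, hi)

    rng = np.random.default_rng(1)
    weights = rng.normal(0, 0.5, (3, 3))   # stand-in 3x3 convolution kernel
    patch = rng.normal(0, 1.0, (3, 3))     # stand-in image patch
    exact = float((weights * patch).sum())

    # Sweep per-layer fraction widths the way an HLS exploration might,
    # comparing the quantized result against the full-precision reference.
    for frac_bits in (4, 6, 8, 10):
        q = float((quantize(weights, 2, frac_bits)
                   * quantize(patch, 4, frac_bits)).sum())
        print(f"{frac_bits:2d} fraction bits: |error| = {abs(q - exact):.5f}")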

Jacob followed to talk about the functional safety part of this flow. This is a topic that gets a lot of coverage, so I’ll just pick out a few points that struck me. First this is clearly an area of strength for Mentor. They have the broadest range of tools in this space that I have seen:

  • Safety analysis through SafetyScope (through acquisition of Austemper) – still relatively unchallenged in EDA as far as I know
  • Design for safety through Annealer and Radioscope (also from Austemper) and Tessent BIST
  • Safety verification through Kaleidoscope (again Austemper), Questa Formal, Veloce FaultApp and Tessent DefectSim
  • Lifecycle management through Siemens Polarion and Questa verification management

Through this suite of tools, they are able to do early safety exploration, estimating what level of diagnostic coverage may be realistically achievable. Then they can automate insertion of planned safety mechanisms and assess their PPA impact. Finally they can plan a fault campaign for FMEDA analysis, classifying faults and grading and filtering tests to optimize fault simulation throughput. Which they then manage in parallelized concurrent fault sims.

The last part of the tutorial was a real eye-opener. Richard Pugh presented a flow using emulation hardware-in-the-loop for a true system-of-systems verification, something I consider a digital twin to the real-life application. A challenge in proving level 5 autonomy is (at minimum) the number of miles of testing required – Toyota have estimated over 14 billion miles. Doing this level of testing live isn’t practical; it has to be simulated in large part, hence the need for digital twins.

This is where being a part of Siemens becomes a real advantage. Scenario modeling starts with PreScan from Tass International (also a Siemens company). This generates photo-realistic driving simulations across a wide range of conditions – city, highway, complex road networks, nighttime, fog, congestion, pedestrians, etc, etc, etc. That feeds into (in this example) pedestrian detection running on a Veloce system. Which in turn feeds into LMS AMEsim (another Siemens product) to model the autonomous emergency braking system in the context of the real electro-mechanical response of the braking system and the frequency response of the chassis (because a real car won’t stop on a dime).

Richard wrapped up with a quick view of a range of digital twin flows of this type, for the dashboard ECU, engine control, transmission control, braking control (the example above) and ADAS control. Powerful stuff. If you want to see the future of verification of sense-compute-actuate systems for transportation, you might want to check them out.


A Preview of Spring Symposium on AIoT
by Alex Tan on 03-05-2019 at 12:00 pm


The trend of AI augmentation across many facets of silicon-based hardware applications is on the rise. During the CASPA press conference in Santa Clara last week, Silvaco CEO David Dutton and SiFive VP/GM Christopher Moezzi were present to share their insights.

Silvaco CEO David Dutton mentioned that we are in a new era in which many decisions in our day-to-day life will be augmented by views from compute-based analytics, such as a traffic heads-up every morning to fit one's schedule. This augmented era will bring society to a new level of productivity. It also comes with the need for continuing improvements in AI-related technologies. It is a high-growth segment in China, with over 30 AI startups and counting. He will elaborate more on this in his upcoming presentation at this week's CASPA Spring Symposium.

Chris Moezzi from SiFive was also very upbeat on the growth trend of SiFive and the RISC-V ecosystem. He pointed out the three verticals in the semiconductor industry: the client segment (such as drones, IoT, AR/VR, smartphones), the data center (cloud, edge) and autonomous vehicles (with ADAS). With fragmented IoT markets, a faster and cheaper development cycle is needed. His view on product customization is that only 10% or so is needed for differentiation, while 90% can be pre-selected early (as IP). He will elaborate more on this at the coming symposium as well.

According to Danny Hua, CASPA Chairman and President, this year's theme will reflect AI's impact at the edge: AI of Things (the equivalent of AI on the IoT). He mentioned about 500 registered attendees so far. David Dutton and a number of other speakers from industry and academia will be sharing the state of the AI applications landscape.


CASPA (Chinese American Semiconductor Professional Association) has been sponsoring the symposium semi-annually. Many semiconductor industry luminaries have participated in presenting their views regarding the current technology trends.

For this coming event, the scheduled talks are as follows:

For more info on the event please check HERE, or find more on CASPA HERE