Artificial Intelligence calls for Smart Interconnect
by Tom Simon on 04-18-2018 at 7:00 am

Artificial Intelligence based systems are driving a metamorphosis in computing and consequently precipitating a large shift in SOC design. AI training is often done in the cloud and must handle huge amounts of data with forward and backward data connections. Inference usually occurs at the edge and must be power efficient and fast. Each of these imposes new requirements on computing systems: training puts a premium on throughput, while inference relies on low latency, especially for real-time applications like ADAS.

To accommodate these new requirements, there are sweeping changes occurring in computational architectures. In much the same way that mini- and then micro-computers changed the landscape of computing, the changes needed to support AI will permanently alter how things are done.

The what and how of these changes were the topic of a presentation given by NetSpeed at the Linley Processor Conference on April 11[SUP]th[/SUP] in Santa Clara. The presentation by Anush Mohandass, VP of Marketing at NetSpeed, discussed how a smart interconnect fabric helps enable embedded AI applications. His first point was that AI is making its way into a large and broad range of applications, including vision, speech, forecasting, robotics and diagnostics, among others.

Inside these new SOCs there is a new data flow. A large number of small, efficient compute elements need to perform peer-to-peer data exchange rapidly and efficiently. There will be many multicast requests and the transfers should be non-blocking. Indeed, QoS becomes very important. Previous architectures operated differently, with processing units using a central memory as an interchange system.
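
As a rough back-of-the-envelope illustration (with made-up numbers, not NetSpeed data), the sketch below shows why a single shared-memory interchange becomes a bandwidth choke point as the number of compute elements grows, while a peer-to-peer fabric spreads the same traffic across many links:

```python
# Illustrative comparison only: all figures below are assumed for the example.
N = 64                 # compute elements (assumed)
B = 1e9                # bytes/s exchanged by each element (assumed, 1 GB/s)

# Central-memory interchange: every transfer is a write followed by a read,
# so the shared memory port must sustain roughly 2*N*B.
central_memory_bw = 2 * N * B

# Peer-to-peer fabric: traffic is spread over many links; with an average of
# L links per element, each link carries roughly B / L.
L = 4                  # links per element in the fabric (assumed)
per_link_bw = B / L

print(f"central memory must sustain ~{central_memory_bw/1e9:.0f} GB/s")
print(f"each fabric link carries    ~{per_link_bw/1e9:.2f} GB/s")
```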

AI systems need ‘any-to-any’ data exchanges that benefit from wide interfaces and must support long bursts. The central requirement, however, is that all the elements need to be active simultaneously. This can easily lead to power management issues, which should be addressed with aggressive clock gating and traffic-sensitive optimizations.

NetSpeed talked about their approach, which can help enable SOCs that have requirements like those imposed by AI applications. They provide the logic needed to integrate, coordinate and control the large number of types and instances of IPs in an SOC. This covers many facets: interconnect, cache coherency, system level cache, system level debug, bandwidth allocation, QoS controls, power management, and clock crossings. With so many parameters and requirements, what is really needed is a design environment specifically geared to implementing the optimal solution.

This is something NetSpeed offers. It supports an architectural design approach that starts with a specification and then helps work through the various tradeoffs. Their design environment provides feedback along the way and continually checks for design correctness.

NetSpeed offers Orion for creating non-coherent interconnect. Their Gemini offering is for coherent system backbones. Their Crux backbone is architecture agnostic. Finally, for programmable L2, L3, and LLC cache they offer Pegasus. Their design environment assists with design and assembly. They use a machine learning based cognitive engine to help with implementation. The system outputs extensive data analytics and visualizations.

In much the same way that TCP/IP offers a multi-layered protocol that provides abstraction for data transmission on the internet, NetSpeed’s SOC solution uses a multi-layer protocol implementation to provide optimal performance and highest throughput. With this come the QoS, multicast support and non-blocking behavior needed for AI processing.

The NetSpeed presentation went into greater depth on the technology and is well worth reviewing. The big takeaway is that entirely new design approaches will be necessary to accommodate the needs of AI in future SOCs. It may come to pass that we look back at CPU based computing the way we do punched cards and magnetic tapes.


Tensilica 5th Generation DSP: Mix of Vision and AI
by Eric Esteve on 04-17-2018 at 12:00 pm

Cadence has launched the new Tensilica Vision Q6 DSP IP, delivering 1.5X more performance than the former Vision P6 DSP IP and 1.25X better power efficiency. According to Cadence, the mobile industry is moving from traditional feature-based embedded vision to AI-based algorithms, even if all use cases still mix vision and AI operations. The result is a need for both vision and AI processing in the camera pipeline, translating into the implementation of both the Vision Q6 DSP and the C5 DSP to cover the complete camera processing pipeline.

Implemented in the Huawei Mate 10, the Cadence Vision DSP enables advanced imaging applications like HDR video, image stabilization, or hybrid zoom with two scene-facing cameras. Compared to a CPU or GPU, the Vision P6, and now the Q6, helps meet high-resolution video capture requirements thanks to its high performance, and battery life requirements thanks to much better energy efficiency. The Vision P6 IP core also serves as the processing unit for AI in the MediaTek P60, which MediaTek calls the Mobile APU.

If you look at the way MediaTek communicates about the P60, AI capability is highlighted as much as the power of the four ARM Cortex-A73 CPUs: “users can enjoy AI-infused experiences in apps with deep-learning facial detection (DL-FD), real-time beautification, novel, real-time overlays, object and scene identification, AR/MR acceleration, enhancements to photography or real-time video previews and much more.”

Cadence Vision DSPs are also implemented in chips supporting automotive applications, like the GW5400 camera video processor (CVP) from GEO Semiconductor, where the Vision DSP enables ADAS functions such as pedestrian detection, object detection, blind spot detection, cross traffic alert, driver attention monitoring, lane departure warning, as well as target-less auto calibration (AutoCAL®). For such devices, energy efficiency is key to meeting the very-low-power, zero-airflow requirements of automotive cameras.

According to Mike Demler, senior analyst at The Linley Group: “SoC providers are seeing an increased demand for vision and AI processing to enable innovative user experiences like real-time effects at video capture frame rates. The Q6 offers a significant performance boost relative to the P6, but it retains the programmability developers need to support rapidly evolving neural network architectures. This is a compelling value proposition for SoC providers who also want the flexibility to do both vision and AI processing.”

The race for higher performance in vision processing is impacting all kinds of applications, as is the emerging need to implement local AI engines. If we take a look around, we can list:

Mobile

Over the next four years there will be a 3X increase in dual cameras, and projections show that smartphone shipments with dual sensors will reach roughly a 50/50 split by 2020.
On-device AI experiences at video capture rates are now a feature that helps differentiate smartphone suppliers.

AR/VR Headsets

In robotic mapping and navigation, simultaneous localization and mapping (SLAM) is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent’s location within it. Latency requirements for SLAM and image processing are tightening, again pushing the need for speed.
On-device AI is required for object detection/recognition, gesture recognition and eye tracking.

Surveillance cameras

These need increased camera resolution and image enhancement techniques, as well as on-device AI for family/stranger recognition and anomaly detection.

Automotive

This is probably the most demanding segment, as it requires an increase in both the number of cameras and camera resolution. On-device AI is clearly a “must have” for ADAS functions such as pedestrian/object recognition.

Drones and robots

360° capture at 4K or greater resolutions and advanced computer vision for autonomous navigation are required, as well as on-device AI for subject and scene recognition.

To increase performance, the obvious solutions, such as widening SIMD or adding VLIW slots for more parallelism, implementing N cores to multiply processing power, or simply running the processor at a higher frequency, have severe drawbacks in terms of power consumption, area, or programming model.

Cadence has reworked the processor architecture, now based on a 13-stage pipeline, and the Vision Q6 can reach a 1.5 GHz peak frequency. Compared with the Vision P6, the Q6 delivers 1.5X the performance for vision and AI applications, 1.5X the frequency in the same floorplan area, and 1.25X better energy efficiency at the Vision P6’s peak performance. To compare apples with apples, these data come from implementations on a 16nm process in both cases.

As we can see in the picture above, the complete architecture of the Tensilica Vision Q6 DSP has been reworked, with a deeper pipeline, improved system bandwidth, and imaging and AI enhancements for this 5[SUP]th[/SUP] generation Vision DSP IP.

By Eric Esteve from IPnest


Sometimes a Solver is a Suitable Solution
by admin on 04-17-2018 at 7:00 am

Traditional rule-based RC extractors rely on a substantial base of assumptions, which are increasingly proving unreliable. Accurate RC extraction results for parasitic R’s and C’s are extremely important for ensuring proper circuit operation and for optimizing performance and power. Advanced process nodes are making it more difficult to get sufficiently accurate parasitics using rule-based extractors. The problem is twofold: the design data given to the extractor looks less and less like the actual fabricated physical design, and rules are becoming less accurate due to increasingly complex structures in the circuits. These problems occur in both the BEOL and MEOL.

During a webinar in March, Dr. Garrett Schlenvogt at Silvaco gave some examples of the divergence between rule-based extraction and the more accurate solver-based approach. Using a ring oscillator, Garrett showed how, as metal structures become more complex, delays simulated with rule-based extraction diverged from measured and solver-based delays. The figure below illustrates this.

Another point that Garrett made during the webinar is that the 3D geometry to be analyzed needs to match the results from the fabrication process, not just the idealized 3D extrapolation of the 2D layout. He outlined the many factors that need to be considered. In advanced designs there are multiple dielectrics and metals. The geometries are not nicely stratified and metals frequently are not planar. In addition, the metal cross sections are not rectangular. The image below gives an idea of the complexity of fabricated 3D structures.
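
A tiny illustration of why the fabricated geometry matters: the sketch below compares the resistance of a wire segment under the idealized rectangular cross-section assumption against an assumed etch-tapered (trapezoidal) cross-section. The dimensions and taper are invented for illustration, not taken from the webinar:

```python
# Illustrative only: how a non-rectangular (tapered) metal cross-section
# shifts resistance relative to the idealized rectangular assumption.
rho = 1.9e-8          # Cu resistivity, ohm*m (bulk value; real BEOL Cu is higher)
length = 10e-6        # 10 um wire segment (assumed)
width_top = 40e-9     # drawn width, 40 nm (assumed)
thickness = 80e-9     # metal thickness, 80 nm (assumed)

# Idealized rectangular cross-section, as a simple rule might assume
area_rect = width_top * thickness
r_rect = rho * length / area_rect

# Assumed fabricated reality: etch taper leaves the bottom 25% narrower,
# so the cross-section is a trapezoid with a smaller area
width_bottom = 0.75 * width_top
area_trap = 0.5 * (width_top + width_bottom) * thickness
r_trap = rho * length / area_trap

print(f"rectangular assumption: {r_rect:.1f} ohm")
print(f"tapered cross-section : {r_trap:.1f} ohm "
      f"({100*(r_trap/r_rect - 1):.0f}% higher)")
```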

Clearly a solver cannot be used on large designs, but there are many cases where it can be applied not just at the device level but also at the circuit level. With Silvaco’s Victory Process, each step in the fabrication process is described in a simple step-by-step sequence and then applied to the mask information. Users can toggle between precise physical modeling and a simplified final representation, depending on accuracy requirements.

The output of Victory Process is passed to Victory Mesh for meshing. For non TCAD users it’s easy enough to take the interconnect portion of the design into Clever 3D, their Field Solver based extraction tool. This will produce a netlist including parasitics suitable for SPICE simulation. This provides a flow that is much easier to deploy than a classic TCAD approach, but gives the benefits of extremely high accuracy.

Because their modeling of physical fabrication steps is comprehensive, there are applications for this flow in many other domains besides FinFET/CMOS. Garrett touched on TFT/LCD/OLED, power devices such as DMOS/IGBT/SiC/GaN, optical, and even rad-hard applications. Another of his examples showed a conformal metal interconnect modeled with and without 3D considerations. The figure below shows the difference in the resistance value results.

During the webinar Garrett mentioned several interesting applications that can benefit from accurate RC extraction. One of these was MEMS capacitors. Another application he highlighted was CCD sensors. Garrett closed with an example containing a memory cell. Along with the parasitics, Silvaco generates a 3D model that can be viewed to ensure the processing steps are properly defined and that the resulting structure is correct.

For engineers looking for the most accurate results, field solver based extraction is the first choice. A field solver based extractor can also be used to verify a rule based approach. However, for full chip and high capacity designs a rule based approach will be needed. The entire webinar, with much more information than we could cover here, is available on the Silvaco website.


Functional Safety – the Analytics
by Bernard Murphy on 04-17-2018 at 7:00 am

ISO 26262 is serious stuff, the governing process behind automotive safety. But, as I have observed before, it doesn’t make for light reading. The standard is all about process and V-diagrams, mountains of documentation and accredited experts. I wouldn’t trade a word of it (or my safety) for a more satisfying read, but all that process stuff doesn’t really speak to my analytic soul. I’ve recently seen detailed tutorials / white-papers from several sources covering the analytics, which I’ll touch on in extracts in upcoming blogs but I’ll start with the Synopsys functional safety tutorial at DVCon, to set the analytics big picture (apologies to the real big picture folks – this is a blog, I have to keep it short).

To open, chip and IP suppliers have to satisfy Safety Element out of Context testing requirements under Assumptions of Use, which basically comes down to demonstrating fault avoidance/control and independent verification for expected ASIL requirements under expected integration contexts. Which in turn means that random hardware failures/faults can be detected/mitigated with an appropriate level of coverage (assuming design/manufacturing faults are already handled).

Functional safety analysis/optimization then starts with a failure mode and effects analysis (FMEA), a breakdown of the potential functional modes of failure in the IP/design. Also included in this analysis is an assessment of the consequence of the failure and the likely probability/severity of the failure (how important is this potential failure given the project use-modes?). For example, a failure mode for a FIFO would be that the FULL flag is not raised when the FIFO is full, and a consequence would be that data could be overwritten. A safety mechanism to mitigate the problem (assuming this is a critical concern for projected use-cases) might be a redundant read/write control. All of this obviously requires significant design/architecture expertise and might be captured in a spreadsheet or a spreadsheet-like tool automating some of this process.
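
As a minimal sketch (not any particular tool’s format), the FIFO example above might be captured as an FMEA record along these lines; the fields and values are illustrative:

```python
# Hypothetical FMEA record for the FIFO failure mode discussed above.
fmea_entry = {
    "block":            "async_fifo",
    "failure_mode":     "FULL flag not asserted when FIFO is full",
    "effect":           "incoming data overwrites unread entries",
    "severity":         "high",          # depends on projected use-modes
    "safety_mechanism": "redundant read/write control with comparison",
    "diagnostic_coverage_target": 0.99,  # fraction of faults the mechanism must catch (assumed)
}

for field, value in fmea_entry.items():
    print(f"{field:28s}: {value}")
```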

The next step is called failure mode and effects diagnostic analysis (FMEDA) which really comes down to “how well did we do in meeting the safety goal?” This document winds up being a part of the ISO 26262 signoff so it’s a very important step where you assess safety metrics based on the FMEA analysis together with planned safety mechanisms where provided. Inputs to this step include acceptable FIT-rates or MTBF values for various types of failure and a model for distribution of possible failures across the design.
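
For the bookkeeping side, the relationship between FIT and MTBF is simple arithmetic: 1 FIT is one failure per 10^9 device-hours, so a FIT budget rolls up to an MTBF directly. The FIT allocations below are assumed, purely for illustration:

```python
# 1 FIT = 1 failure per 1e9 device-hours, so MTBF_hours = 1e9 / total_FIT.
fit_budget = {            # assumed per-failure-mode FIT allocations
    "fifo_overflow":   2.0,
    "parity_logic":    0.5,
    "clock_crossing":  1.5,
}

total_fit = sum(fit_budget.values())
mtbf_hours = 1e9 / total_fit
mtbf_years = mtbf_hours / (24 * 365)

print(f"total FIT : {total_fit}")
print(f"MTBF      : {mtbf_hours:.3g} hours (~{mtbf_years:.0f} years)")
```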

Here’s where we get to fault simulation along with all the usual pros and cons of simulation. First, performance is critical; a direct approach would require total run-times comparable to logic simulation time multiplied by the number of faults being simulated, which would be impossibly slow when you consider the number of nodes that may have to be faulted. Apparently, Synopsys’ Z01X fault simulator is able to concurrently simulate several thousand faults at a time (I’m guessing through clever overlapping of redundant analysis – only branch when needed), which should significantly improve performance.

There are two more challenges: how comprehensively you want to fault areas in the design and, as always, how good your test suite is. Synopsys suggests that at the outset of what they call your fault campaign, you start with a relatively low percentage of faults (around a given failure mode) to check that your safety mechanism meets expectations. Later you may want to crank up fault coverage depending on confidence (or otherwise) in the safety mechanisms you are using. They also make a point that formal verification can significantly improve fault-sim productivity by pre-eliminating faults that can’t be activated or can’t be observed (see also Finding your way through Formal Verification for a discussion on this topic).
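
To make the idea of a fault campaign concrete, here is a toy sketch in Python rather than a commercial fault simulator: a small function is protected by a lockstep copy plus comparator, a sampled subset of stuck-at faults is injected, and each fault is classified as detected or undetected under the available stimulus. It illustrates the workflow only and says nothing about how Z01X is implemented:

```python
import random

def golden(a, b, c):
    # reference (lockstep) copy of a tiny combinational function
    n1 = a & b
    n2 = n1 | c
    return n2

def faulty(a, b, c, fault_node, stuck_val):
    # same function re-evaluated with one internal node forced to a stuck-at value
    n1 = a & b
    if fault_node == "n1":
        n1 = stuck_val
    n2 = n1 | c
    if fault_node == "n2":
        n2 = stuck_val
    return n2

fault_list = [(n, v) for n in ("n1", "n2") for v in (0, 1)]
sampled = random.sample(fault_list, k=2)          # start with a low fault percentage

stimulus = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

for node, val in sampled:
    # the comparator "alarm" fires whenever the faulty copy disagrees with the golden copy
    detected = any(golden(a, b, c) != faulty(a, b, c, node, val)
                   for a, b, c in stimulus)
    print(f"stuck-at-{val} on {node}: {'detected' if detected else 'undetected'}")
```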

An area I find especially interesting in this domain is coverage – how well is simulation covering the faults you have injected? The standard requires determining whether the effect of a fault can be detected at observation points (generally the outputs of a block) and whether diagnostic points in a safety mechanism are activated in the presence of the fault (e.g. a safety alarm pin is activated). A natural concern is that the stimulus you supply may not be sufficient for a fault to propagate. This is where coverage analysis, typically specific to fault simulation, becomes important (Synopsys provides this through Inspect Fault Viewer).

At the end of all this analysis, refinement and design improvement you get estimated MTBFs for all the different classes of fault which ultimately roll up into 26262 metrics for the design. These can then be aligned to the standard required for the various ASIL levels.

Now that’s analytics. You can learn more about the Synopsys safety solutions HERE.


Samsung is Starting 7nm Production with EUV in June
by Scotten Jones on 04-16-2018 at 12:00 pm

There is a report in the Seoul Economic Daily that Samsung has completed development of their 7nm process using EUV and that production will begin in June. What is claimed in the report is:

  • The process is installed in the Hwaseong S3 Fab
  • Samsung has more than 10 EUV systems installed
  • Production starts in June with Qualcomm, Xilinx, Apple and HiSilicon as customers (Author’s correction: the original article was in Korean and the source I used got the translation wrong; apparently only Qualcomm was listed in the original article)

Initially when I read this I was skeptical, but the more I have thought about it and investigated the various elements of this claim, the more I have come to believe this report is largely true. The following is my rationale:

According to my tracking of 300mm wafer fabs as published in the IC Knowledge 300mm Watch Database, the Hwaseong fab has 4 phases. Phase 1 is for DRAM, phase 2 is for 3D NAND, and phases 3 and 4 are known as S3 phase 1 and 2 and are for logic. S3 phase 1 is 10nm and S3 phase 2 is for 7nm. The S designation is used by Samsung for their foundry logic fabs. This fab “cluster” is also known to be Samsung’s EUV hub, so 7nm production with EUV in “S3” makes sense and is consistent with this site and our expectations for how it will be used.

I gave a talk on EUV at ISS this year that I wrote up here. While researching EUV status for that talk I tried to determine where every EUV system installed to-date is located. It was my conclusion that Samsung had approximately 10 EUV systems installed, consistent with the article’s assertion of more than 10 EUV systems installed.

The biggest surprise in this article is the idea that development of 7nm with EUV is done. At the SPIE Advanced Lithography Conference this year it seemed like everyone just woke up to the stochastic issues with EUV (I wrote that up here).

Simply put, dose is given by:

Dose = photon energy x number of photons per unit area

EUV photons have roughly 10x the energy of deep UV photons, so for the same dose there are roughly 10x fewer EUV photons. This contributes to a variety of stochastic effects such as shot noise and photoresist issues. There is however a simple fix to this – run a higher dose. The problem with running a higher dose is the impact on throughput.
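
For the curious, the arithmetic is easy to reproduce. Taking the standard wavelengths (13.5 nm for EUV, 193 nm for ArF DUV) and E = hc/λ, a quick sketch gives the photon count behind a given dose; the roughly order-of-magnitude gap in photons per cm² is what drives the shot-noise concern:

```python
h, c, q = 6.626e-34, 3.0e8, 1.602e-19   # Planck constant (J*s), speed of light (m/s), eV (J)

def photons_per_cm2(dose_mj_cm2, wavelength_nm):
    e_photon = h * c / (wavelength_nm * 1e-9)      # J per photon
    return (dose_mj_cm2 * 1e-3) / e_photon         # photons per cm^2

dose = 20.0   # mJ/cm^2, the baseline dose quoted for ASML throughput numbers
for name, wl in [("EUV 13.5 nm", 13.5), ("DUV 193 nm", 193.0)]:
    e_ev = h * c / (wl * 1e-9) / q
    print(f"{name}: {e_ev:5.1f} eV/photon, "
          f"{photons_per_cm2(dose, wl):.2e} photons/cm^2 at {dose} mJ/cm^2")
```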

The following slide from my ISS talk illustrates the effect of dose on throughput.

Figure 1. EUV Throughput

To-date the throughput numbers that ASML has published are based on a 20mJ/cm[SUP]2[/SUP] dose with 96 steps and no pellicle. Logic devices generally require around 110 steps and Samsung is expected to be using EUV for metal layers, so they will need a pellicle. I have been hearing rumors that Samsung is using a 50mJ/cm[SUP]2[/SUP] dose. The overall impact of the higher dose, pellicle and more steps is that throughput will only be around 60 wafers per hour (wph) (please note that since this slide was created ASML has achieved 140 wph for 96 steps, 20mJ/cm2 and no pellicle).
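
A deliberately crude scaling exercise shows how those factors stack up. Starting from the published 140 wph point and scaling the exposure-limited portion of wafer time by dose, field count and an assumed pellicle transmission (the split between exposure time and fixed overhead is also an assumption), you land in the same ballpark as the ~60 wph figure above:

```python
# All parameters marked "assumed" are illustrative, not ASML or Samsung data.
baseline_wph = 140.0                  # published point: 20 mJ/cm^2, 96 steps, no pellicle
baseline_wafer_s = 3600.0 / baseline_wph
exposure_fraction = 0.6               # assumed share of wafer time spent exposing

expose_s = baseline_wafer_s * exposure_fraction
fixed_s  = baseline_wafer_s - expose_s

# Scale exposure time: 20 -> 50 mJ/cm^2, 96 -> 110 steps, assumed pellicle transmission ~0.83
scale = (50.0 / 20.0) * (110.0 / 96.0) / 0.83
new_wafer_s = fixed_s + expose_s * scale

print(f"estimated throughput: {3600.0 / new_wafer_s:.0f} wph")
```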

Assuming Samsung is in fact running at 50mJ/cm[SUP]2[/SUP] that may be sufficient to get around most of the worst stochastic issues and produce a usable process.

The question then becomes whether they would be willing to accept such low throughput and therefore increased costs. Once again there is a relevant rumor, and that is that “foundries” are accepting that they may have to absorb higher initial costs for EUV wafers. Samsung is also a company that is rumored to use brute force to get a process started. At 10nm it was said that in the beginning, when yields were low, they simply ran a lot of wafers to get shippable parts. Perhaps in order to get EUV started they will accept low throughput and high costs to be first to production and to start the high-volume learning process.

The combination of what is known about EUV and the rumors about Samsung make me believe that we will in fact see Samsung begin to ship 7nm wafers using EUV starting in June. Likely this will be by running the EUV systems in a way that delivers low throughput and high costs and there may be yield issues as well, but this will make Samsung the first to enter production with EUV.

I will say the customer list surprises me; I thought Apple was at TSMC for 100% of their 7nm business, and I thought Qualcomm and Xilinx were also TSMC 7nm customers. But the rest of this report is credible in my opinion.


EDA CEO Outlook 2018 Partly Cloudy
by Daniel Nenni on 04-16-2018 at 7:00 am

The funniest line of the EDA CEO Outlook event was that we should rename our Amazon Echos Wally. Yes, Wally is that smart and he remembers pretty much everything. I wish I could rename my Echo Wally because my daughter-in-law is named Alexa, so we have to turn it off when she is over. The discussion took an interesting turn with EDA in the cloud. Dean Drako, CEO of IC Manage, hijacked the evening with a prepared statement on how now is the time for EDA in the cloud. I posted a poll in the forum so let’s wait and see what the crowd thinks. My bet is that it will be overwhelmingly in favor of EDA in the cloud, so the $5B question is: why aren’t we there yet?

Poll: Do you want your EDA tools in the cloud?


During my time with Berkeley Design Automation and Solido Design over the last ten years we would ask customers how many simulations they ran, and the answer was always not as many as they would like due to time constraints. What that really meant is that they did not have enough software licenses and/or compute resources, which is why we have fast SPICE and statistical verification tools like Mentor AFS and Solido Variation Designer.

Based on that experience I gladly accepted a business development job with an EDA cloud start-up. My task was to get the EDA companies on board with cloud and work out a profit sharing deal. Interestingly enough ALL of the EDA companies we met with were ready, willing, and able to jump into the cloud. This was the second or third time they had been asked and their only requirement was to get a name brand customer to collaborate with. We failed of course as have many other commercial EDA cloud efforts including the one from IBM in 2015. We never really got past the lawyers much less the internal IT groups.

Today EDA server farms are either in the back room or configured as a private cloud somewhere cheaper depending on the size of the company. TSMC of course has a huge private cloud for its customers and partners. ARM also has a significant server farm and when the subject of cloud came up Simon jumped in and pretty much begged for an EDA cloud. Wally added that most EDA companies including Mentor already have a cloud option for things like emulation but it really is customer driven.

The reality of the EDA cloud situation has not really changed much from when I was at the cloud company. As a cloud user myself (SemiWiki.com) my experience is good and bad. Yes, it is definitely much more cost effective and I have a staff of support engineers at my disposal 24/7. But when there are problems I really am at the mercy of my cloud provider. My site was recently migrated to a new server and it caused a whole host of problems even though we only have a handful of software products running. Software updates are always a challenge and not “owning” the problem is very unnerving. Big EDA customers have been telling me lately that they prefer “one throat to choke” (fewer vendors to deal with) when it comes to EDA and leading edge design. Even though EDA is consolidating, adding in a cloud vendor (another throat to choke) still seems to be a risky proposition for medium to large chip companies, in my opinion.

Here is one of many cloud discussions on SemiWiki (in 2012). Have things really changed other than semiconductors now being a $400B market?

Chip in the Clouds – “Gathering”


Is Facebook causing the end of happiness?
by Vivek Wadhwa on 04-15-2018 at 7:00 am

For the past 30 years, most of us around the globe have welcomed modern technology with few questions and fewer reservations. We have treated each new product as a “solution” and paid little attention to its accompanying problems.

The past six months, though, have seen a rapid change of opinion in the United States, as many in the technology elite have called GAFA (Google, Apple, Facebook, Amazon) and other tech giants to account. One of the most outspoken of Silicon Valley’s moguls, Roger McNamee, who was a mentor to Mark Zuckerberg, has published several articles highly critical of Facebook and launched a campaign, “Truth About Tech”, to educate the world about the evils of Big Tech and strategies for healthier interactions with technology.

I was not surprised by this turn of events, because I had begun work, more than a year ago with Alex Salkever, on a new book on precisely this topic: technology’s impacts on all of us. In the forthcoming Your Happiness Was Hacked, we were fashioning a narrative in which technology companies’ interactive products have been robbing us of fulfillment and connection by deliberately limiting our choices, using sophisticated manipulation to entice us into ever more consumption of their wares.

This may at first be counter-intuitive. The promise of the Internet, the smartphone, social media, and virtual and augmented realities is of enrichment and improvement of our lives by the additional choices they offer. But it is a mirage. Though the Internet may seem to offer an endless range of applications, content, and communication tools, the unhappy reality is that the options available are rapidly decreasing in utility and reward and increasingly herding us into habits of mindless consumption.

Witness what has become of Google. The search engine that originated as a means of finding the most relevant answers to search queries has degenerated into a massive online advertising medium that heavily prioritizes whatever others pay it to. A search on a mobile phone — say, for the best hotel in Mumbai — yields a handful of results of which every one of the top 10 has either been paid for specifically or represents a giant media or hotel company.

Facebook too manipulates the information we would imagine it supplying unfiltered. Its deep detective work into our individual lives is its basis for manipulating our news feeds with the aim of maximizing our clicks and taps — without actually asking us whether we enjoy the endless array of pictures of our friends’ weddings. (We must, because we spend time there, right?)

Then there are the incessant beeps, noises, and interruptive alerts of WhatsApp. Intrusions of this type are now common to most communication applications, and they take a large toll on our well-being. They make it harder for us to do our jobs in a concentrated or thoughtful fashion. We accomplish less, which makes us miserable. Economists are even suggesting that the very technologies we suppose make us all so productive have, through their distractions, instead become responsible for a plateau in the growth of worker productivity in the past decade.

Yet we find ourselves unable to break the habit: we are afraid of missing out; we are expected to respond quickly to friends, relatives, and co-workers; and all of these technologies embed addictive characteristics — the most obvious being psychological rewards such as “likes” — that use the same techniques of beguilement as casinos’ computerized gambling machines do to ensnare us.

The raw truth is that smartphones and applications foster psychological addictions without consideration of the human cost or of design principles that might be less profitable for them but healthier for people in the long run.

How can we alter our technology lives such that we enjoy real choice, understand the trickery of enticements, and regain the agency necessary to human happiness? How can we make the tech companies back off and allow us to establish our own cadence in our use of their tech?

Pushed to operate ethically, smartphone makers could allow us, on their phones’ home screens, to select a “focus mode” that would disable all notifications and social media, even taking the additional step of reverting them to greyscale to reduce the attractiveness of their screens’ brightly colored notification bubbles. YouTube could ask us whether we wish to always play another video automatically when we first sign up for the service, in order to help us avert binge watching.

As for our own defenses, we will need to work hard to insert pauses into periods of thoughtless enthrallment. Turning off most applications’ alerts, checking e-mail only in batches at designated times, and using our phones to call family and friends and talk to them rather than sending them incessant smartphone messages would help most of us make a great start on rejoining the living.

For more, you can preorder my next book, Your Happiness Was Hacked; it will show you how you can take control and live a more balanced technology life.

This article is one in a series related to the 10th Global Peter Drucker Forum, with the theme “Management. The Human Dimension,” taking place on November 29 & 30, 2018 in Vienna, Austria. #GPDF18


Enabling A Data Driven Economy
by Alex Tan on 04-13-2018 at 12:00 pm


The theme of this year’s CDNLive Silicon Valley keynote, given by Cadence CEO Lip-Bu Tan, revolved around data and how it drives Cadence to transition from System Design Enablement (SDE) to Data Driven Enablement (DDE). Before elaborating further, he noted some CDNLive conference statistics: 120 sessions, 84% given by users, 1,200 registered attendees, and for the first time the event was extended to two days.

Lip-Bu provided snapshots of data growth hitting a volume of 5-8 zettabytes. He indicated that in a data-driven economic cycle, we need to understand and address how data gets created, stored, transmitted and analyzed, as illustrated in Figure 1.

Admitting to a more financially oriented perspective, Lip-Bu shared his upbeat take on how data has driven the economic cycle. Last year semiconductor growth was 22%, crossing the $400 billion mark for the first time, and he noted it shows encouraging strength going forward. The enablers, which he coined ‘key waves’, are mobile, automotive, machine learning, edge computing and the data center.

For 2018-2019, these segments will bring in growth ranging from a 4.2% CAGR in cellular (5G, 3D sensing) and an 11.4% CAGR in automotive (ADAS, infotainment, etc.) to a 13.1% CAGR in IoT (with distributed edge clouds closer to the user, shorter latency and a different way to compute). He also reiterated growth coming from hyperscale web services and data centers. He shared research data projecting AI-related Venture Capital (VC) funding to top $14 billion across 1,600 deals, and a 42% projected CAGR for Deep Learning chipsets covering the 2016-2025 period.

The opportunities are there, spanning from sensors and devices feeding data to the intelligent edge (where protocol translation and device management take place), through ML and neuromorphic processing, and eventually ending with the cloud. On the horizon, Lip-Bu pointed out some emerging disruptive technologies, such as silicon photonics, neuromorphic computing, quantum computing, nanotubes, and blockchain, as future growth drivers. There will be a push towards 400Gb/s and 800Gb/s interface speeds; augmenting quantum with AI to gain stability and performance; neuromorphic applications in ultra-low-power environments; blockchain-assisted transactions through semiconductors/GPUs; and brain-related applications (wake or sleep controls).

AI and Hardware Design
Halfway into his presentation, Lip-Bu introduced two guest speakers addressing hardware solutions optimized for machine learning and data analytics applications. The first was Rodrigo Liang, a former Oracle SPARC hardware executive turned CEO, who just received the first round of funding for his SambaNova Systems startup. “Semi(conductor) is a capital-intensive effort,” he said. He believes semiconductor (silicon) is at the center of AI. “We need to consider the software stack, what the software wants.”

Rodrigo replayed the compute evolution from scale-up in nature, to scale-out, and eventually to AI-oriented computing. Each domain has its own unique bottleneck to tackle: from the CPU instruction set, to network latency or bandwidth, to the current memory bandwidth and capacity limits. Furthermore, each is also characterized by its own business challenges (such as power, cooling and cost constraints) and technology issues (memory, implementation platform: FPGA vs. custom ASIC, new software development: neural network types).

The second guest speaker was Gopal Raghavan, CEO of Eta Compute, a startup founded in 2015 aiming to deliver an enabling solution for intelligent IoT devices. He showcased an embedded platform for low-power, machine learning audio/speech and visual/image recognition, allowing training to be done on the edge. This approach was intended to avoid the need to transmit high-volume data over power-intensive RF networks.

The hardware design was asynchronous and utilized several Cadence tools (JasperGold formal verification, Modus test insertion, Virtuoso ADE, Variety and Tempus Statistical). The power consumed by the demonstrated device was between 1.0 and 1.5 mW, and it only needed a low-cost 55nm process technology.

Enabling Cadence Solution Offering
In the second half of his talk, Lip-Bu showed more growth data in design starts by technology node, including 29.2% for 10nm and smaller. He projected an EDA CAGR of 6.2% for 2017-2022 (up from the 2.1% level in 2011-2016). He noted that, as proof of Cadence’s culture of innovation, more than 25 new organically developed tools were introduced between 2015 and 2017. He stressed three areas in moving from SDE to DDE, namely system integration, package and board, and CHIP (core EDA).


He announced an enhanced Virtuoso Design Platform to support advanced process nodes including 5nm (more coverage on this in my subsequent blog). He highlighted solution support for photonics (high-speed) and packaging (2.5D, 3D); the ongoing AI/ML augmentation in the implementation fabric (from design creation and physical implementation to electrical and physical signoff); and addressing mixed-signal, low-power and safety across the verification spectrum (from formal/static, simulation and emulation to prototyping). His take on the key technologies to address the uncertainty of design intent is parallelization, optimization and ML or data analytics.

Closer to the design ecosystem, the IP segment shows 18% growth and tends to be more vertically focused (HPC, auto, mobile/communication). Cadence has a comprehensive portfolio, including for advanced nodes, with further work in PCIe, USB and memory related areas. Commenting on the recent nuSemi acquisition as enabling hyperscale data centers to address high-speed I/O connectivity needs, he alluded to the Star-IP notion as applied to Tensilica. He described the Tensilica processor as an ideal core to power various kinds of applications, such as upcoming sentiment analysis, song analysis, etc. Its accompanying software stack includes the Xtensa Neural Network Compiler on top of the Xtensa C/C++ compiler.

In his closing remarks, Lip-Bu argued that the existing design ecosystem comprising four spheres (foundry, IP, EDA/Cadence, customer) should now include additional, smaller spheres (software, channel, standards and compliance, design tools). It is a $400 billion IT industry with a new frontier of requirements. “It is not a sunset industry”, he quipped.


Intel Based FPGA Prototyping Webinar Replay
by Daniel Nenni on 04-13-2018 at 7:00 am

Due to the overwhelming response, here is the first part of the webinar that I did with S2C and a link to the replay. Richard Chang, Vice President of Engineering at S2C, did the technical part of the webinar. Richard has a master’s degree in electrical engineering from the University at Buffalo and more than 20 years of experience designing chips, including two US patents. Here is the agenda:

 

Achieve High-performance & High-throughput with Intel based FPGA Prototyping

FPGAs have been used for ASIC prototyping since the beginning of FPGAs (1980s), allowing hardware and software designers to work in harmony developing, testing, and optimizing their products. The high-density Intel Stratix 10 and Arria 10 FPGAs are available now, with Stratix 10 delivering breakthrough advantages in performance, density, and system integration on a single logic die using the Intel 14nm Tri-Gate process. In this webinar, we will highlight the advantages of using Intel FPGAs for prototyping and walk through the implementation flow for both single and multi-FPGA boards.

  • Stratix 10 & Arria 10 FPGA Highlights
  • S2C S10 & A10 Prototyping Platforms
  • Single FPGA Design and Debug Flow
  • Multi-FPGA Design and Debug Flow
  • Demonstration – Implementing DDR4
  • Q&A

It really did bring me back to the good old Altera vs Xilinx days where they used to beat each other up and provide customers with the most cost competitive products. Based on what I have learned by working with S2C the past few months, Intel/Altera is now superior to Xilinx for FPGA prototyping, absolutely.

    Webinar: Intel’s latest Stratix-10 and Arria-10 FPGAs have considerably improved FPGA prototyping applications. Using the Intel 14nm process, the Stratix-10 FPGA is more than twice as fast and offers more than five times the capacity of the previous generation. Today, we will start the webinar with highlights of Stratix-10 and Arria-10 features for FPGA prototyping. We will then introduce the new S2C Intel-based product line. We will also illustrate the compile flows for both single and multi FPGA designs. Finally, we will walk through a quick design implementation using a DDR4 reference design followed by questions and answers.

    Intel is now shipping the production version of its flagship Stratix-10 FPGA 2800 devices. The 2800 is about 3 times the density of the Stratix V generation, which makes design fitting and partitioning much easier. In addition, the Intel Stratix-10 FPGA uses a single logic die architecture versus multiple dies, which enables higher utilization and better performance. Intel is also planning to ship the Stratix-10 5500 device that will almost double the capacity of the 2800. Additionally, the 5500 will have a package footprint that allows easy upgrading from the 2800.

    The Intel 14nm process also makes a big difference in performance. The maximum frequency has increased from 174MHz to 427MHz compared with the previous Stratix V generation. There is also significant improvement in Stratix-10 FPGA I/O and high-speed transceivers. LVDS is now fully configurable and can run at 1.6GHz, making pin-multiplexing between FPGAs more efficient (see the sketch below). The high-speed transceivers can run at up to 58G, which is more than enough for most SoC prototyping applications such as video streaming and high-speed data transfer.
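
To put the pin-multiplexing point in perspective, here is a quick back-of-the-envelope sketch; the system clock, overhead factor and partition cut size are assumed numbers, not S2C or Intel specifications:

```python
# Assumed example: a partitioned design must carry many logical signals
# across a limited number of physical FPGA-to-FPGA LVDS pairs.
lvds_rate_mbps   = 1600      # per-pair serial rate quoted for Stratix-10 LVDS
system_clock_mhz = 25        # assumed prototyping clock for the partitioned design
overhead_factor  = 0.8       # assumed: framing/sync eats ~20% of the raw rate

signals_per_pair = int(lvds_rate_mbps * overhead_factor / system_clock_mhz)
print(f"each LVDS pair can time-multiplex ~{signals_per_pair} signals per cycle")

cut_signals = 2400           # assumed signals crossing the FPGA partition boundary
pairs_needed = -(-cut_signals // signals_per_pair)   # ceiling division
print(f"{cut_signals} boundary signals need ~{pairs_needed} LVDS pairs")
```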

    The Arria-10 has most of the features of the Stratix-10 except it is smaller. The largest Arria-10 device, the 1150, is about half the size of the Stratix-10 2800. With its attractive entry price point, the 1150 is suitable for a variety of small to mid-sized IoT/SoC applications. The Arria-10 has abundant internal memories and lots of DSP cores. In fact, the DSP cores are the industry’s only hardened floating-point DSP blocks, making the Arria-10 the top choice for computation-intensive applications.

    Many of today’s applications, such as AI, IoT, computer vision, and autonomous driving, require intensive software and firmware development, so the ability to deploy an array of pre-silicon platforms for software development and compatibility testing dramatically increases the chance of a successful product launch. With its affordable pricing, the Arria-10 1150 FPGA is an ideal candidate for those applications.

    Next I will introduce S2C’s complete FPGA prototyping solution for Stratix-10 and Arria-10 FPGA but first a quick overview of S2C. S2C is a worldwide leader in providing both hardware and software solutions for FPGA prototyping. The S2C 60+ member team is fully dedicated to delivering FPGA prototyping solutions and they have served over 400 customers in the past 15 years. S2C is headquartered in San Jose, CA with direct support centers in Shanghai, Beijing, Hsinchu, Shin-Yokohama and Seoul.

    S2C offers a wide range of Intel Stratix-10 and Arria-10 based FPGA prototyping hardware. For the S10-series S2C offers Single, Dual, and Quad Prodigy Logic Modules that can go from 28M gates to 220M gates when the 5500 is available from Intel. The 2800 Dual and Single Prodigy Logic Module are shipping now and the 2800 Quad Prodigy Logic Module will be available in July. For smaller and medium sized designs, the A10-1150 is a good alternative with 2 form factors to choose from: standard expandable chassis with flexible I/O or the PCIe finger form factor.

    The S10 and A10 Prodigy Logic Modules are S2C’s 6[SUP]th[/SUP] generation FPGA prototyping systems, which are easy to expand for different applications, scale for different design sizes, and reuse across projects. Next is a one-minute video that highlights the key features of the new S10 and A10 Prodigy Logic Module chassis system…

    Another key feature of S2C’s S10 and A10 FPGA prototyping systems is the many off-the-shelf daughter cards that are available. The use of daughter cards for FPGA prototyping is an important concept as it allows flexibility in case design specs change, expandability for design growth, and reusability for future designs.

    S2C provides 80 different types of memory, interface, and accessory cards for customers to quickly put together prototyping platforms that closely resemble final products. Some examples are ARM processors, PCIe, Ethernet, USB, DDR4, Flash memories, HDMI, and many others.

    S2C also provides daughter card design guidelines in case you prefer to develop your own application daughter cards. If you choose not to build your own but still want a customized application-specific daughter card, S2C also provides daughter card design services.

    Next Richard will explain the FPGA prototyping software flows for Intel Stratix-10 and Arria-10 FPGAs….

     

     


HCM Is More Than Data Management
by Alex Tan on 04-12-2018 at 12:00 pm

While tracking Moore’s Law has become a more expensive and difficult endeavor in HPC design, the mobile SOC design space is also increasingly heterogeneous and complex. Strict safety guidelines such as ISO 26262 imposed on automotive applications further exacerbate the situation.

Looking closer at the design ecosystem, we can view the segregated landscape as being occupied by four key players, namely foundry, EDA, IP and design service providers. For example, the first ADAS computer vision SOC tapeout in February last year was the result of a collaboration among three IP companies (Dreamchip, ARM, Arteris), an EDA vendor (Cadence), a design services provider (INVECAS) and a foundry (GlobalFoundries). It is intuitively clear that collaboration should serve as a common denominator in order to ensure a seamless design implementation and a successful product rollout.

Design realization involves taking its formulation through different levels of abstraction, which then get optimized, verified, analyzed and aligned with foundry requirements. All of this implies frequent and occasionally massive data generation, in binary and ASCII formats alike. Key to a proper handshake among these ecosystem players is a formal process or policy for data and version control management. Last month, the use of ClioSoft’s Hardware Configuration Management (HCM) platform, SOS7, as an embedded agent in various underlying point tools and flow interfaces was discussed in this blog. In this article, we will expand on its usage scenarios within the ecosystem.

Foundry Files
When a new or derivative process node is introduced, it is normally accompanied by the foundry’s Process Design Kit (PDK). A PDK is a collection of foundry-specific data files and script files used with EDA tools in a chip design flow. A PDK’s main components are models, symbols, technology files, parameterized cells (PCells), and rule files. Any process-related fine-tuning and control variations could result in an incremental release of the PDK. On the other hand, timing models and their related parameters are captured and released as SPICE models, as illustrated in Figure 2.

Once the PDK is passed to the foundry customers, the chain reaction starts. The design and IP teams must decide which design steps in the flow need a respin. PDK changes usually drive changes to routing vias and metal stacks, parasitic parameters or extraction setup, although they may or may not be relevant to the integrity of the standard cell library. SPICE model updates, however, would trigger a library recharacterization and a timing respin. With ClioSoft’s HCM SOS7, such PDK updates can be captured as separate reference projects, allowing ease of retrieval for correlating with prior versions and tracking trade-offs of chosen design metrics. It is normal to expect between 4 and 6 iterations for a new process. For example, TSMC annually releases between 500 and 700 techfiles and 50 to 70 PDK updates across all supported processes.

Aside from the PDK, there is usually a validated reference flow accompanying each foundry process node rollout. A reference flow is adopted by the foundry and IP providers such as ARM to address critical design challenges associated with the new process technology and to pipe-clean the flow so it is ready for performance, power and area optimization.

Packaging
Other variations which might require creating different design implementation scenarios come from the IP and packaging selections. Depending on the market segment (automotive, IoT or mobile), the form factor, power or thermal requirements may drive the package selection. Figure 5 shows various package technologies vs market segments. With stringent requirements such as ISO 26262 and the availability of advanced packaging analysis, it is becoming common to analyze the impact of the project’s targeted packaging on the system’s silicon. For example, FOWLP (Fan-Out Wafer Level Packaging) is known for its low cost and high performance and is selected for low-power, high-performance mobile applications. A thermal-stress analysis can be performed to assess its reliability. Another example is the System-in-Package (SiP), which is targeted at IoT wearable, RF and automotive applications. Each of these packaging analyses, such as 3D electromagnetic simulation, thermal and stress analysis, needs to be aligned and synchronized with the upstream system silicon. Since the SOS7 platform is methodology agnostic, data management from this downstream analysis can be folded into the ecosystem.

IP Reuse and Management
We often discuss design reuse as it applies to both internal and third-party IPs. The steps in generating, maintaining and propagating design changes, as well as user experience as manifested in scripts, documents or other file formats, are daunting, especially with increasingly shortened deliverable schedules to meet time to market. ClioSoft SOS7 addresses most of these requirements. It helps the design team streamline IP development and management, ensuring efficient collaboration while dealing with many design collaterals.

ClioSoft’s SOS7 platform can be easily integrated with different applications. The development environment is separated from the release environment. Normal procedural access steps (such as checkout, modify, check-in) are enforced with either a corresponding locking mechanism or concurrent checkout (with merging capability), similar to Software Configuration Management (SCM) features. Several other neat SOS7 features include:

  • Customizable triggers as a condition, e.g., no check-in prior to a clean code linting (see the sketch after this list).
  • The use of symbolic labels/tags on revisions to communicate a revision status.
  • Customizable composite object, treating multiple files as single object.
  • Sandbox for local workspace, while SOS7 monitors and sends project-level periodic updates.
  • Rewind or snapshot features add flexibility to move along progress or debug timelines.
  • Simplified IP release through a script, copying collaterals from development to the release environment.
  • Tool level ‘diff’-ing of two revisions.
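
As a generic illustration of the trigger idea in the first bullet (this is a hypothetical policy script, not ClioSoft’s SOS7 trigger API), a pre-check-in hook might look like this:

```python
# Generic sketch of a pre-check-in trigger: refuse the check-in unless the RTL
# passes lint cleanly. The "run_lint" wrapper is hypothetical.
import subprocess
import sys

def pre_checkin_trigger(files):
    rtl = [f for f in files if f.endswith((".v", ".sv"))]
    if not rtl:
        return 0                      # nothing to lint, allow the check-in
    result = subprocess.run(["run_lint"] + rtl,   # hypothetical lint command
                            capture_output=True, text=True)
    if result.returncode != 0:
        print("check-in rejected: lint is not clean")
        print(result.stdout)
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(pre_checkin_trigger(sys.argv[1:]))
```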

Unlike Software Configuration Management (SCM), which may be confined to a distinct set of files and formats, Hardware Configuration Management (HCM) involves handling many kinds of design parts and formats. ClioSoft SOS7 offers an integrated development and management platform for not only design data but also design knowledge.

For more info on ClioSoft HCM SOS7 please check HERE

Also Read

ClioSoft and SemiWiki Winning

IoT SoCs Demand Good Data Management and Design Collaboration

ClioSoft’s designHUB Debut Well Received