
A DVCon Tutorial on Advanced Formal Usage

by Bernard Murphy on 03-27-2018 at 7:00 am

Synopsys has been quite active lately in their messaging around formal verification. One such event at DVCon this year was a tutorial on some of the more advanced techniques/methodologies that are accessible to formal teams, mostly presented by customers, though opened by a Synopsys presentation. The tutorial covered so many bases that I won’t try your patience by reviewing it all in one blog. Here I talk about the Synopsys and Samsung talks; there will be a follow-on blog covering the NVIDIA and Qualcomm presentations.


Iain Singleton was a hardware engineer at Imagination Tech before he moved into formal verification support at Synopsys, and he published at least a couple of papers (with Ashish Darbari and others) on formal methods while at IMG. His topic for this talk was the use of invariants and induction to improve convergence in formal proving. I was confused at first about these methods, but I’ve got (I think) a clearer picture now.

The point of invariants is to reduce the size of the proof state-space to encourage convergence. Invariants are related to assume-guarantee methods, where you split a big problem into two parts, say A and B, assume certain properties must hold on inputs to A in your first sub-proof, then prove/guarantee those assumed properties in your second sub-proof. Invariants are like those intermediate properties except that no design-split is required; you plant them in the design. In his example, Iain starts with a complex assertion, breaks it down into sub-assertions, and continues breaking those down until he reaches assertions simple enough to prove. Turn those into constraints and the state-space shrinks! Then prove the next more complex invariant candidates, and so on up to the main property you want to prove.
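
As a rough sketch of that flow (module, signal and property names are hypothetical, not taken from Iain’s example), the SystemVerilog pattern looks something like this:

// Hypothetical sketch of the invariant flow: prove the simple bound first,
// then re-use it as an assumption so the harder property searches a smaller space.
module fifo_invariant_sketch #(parameter int FIFO_DEPTH = 16)
  (input logic clk, rst_n, wr_en, push_accepted,
   input logic [$clog2(FIFO_DEPTH+1)-1:0] fifo_count);

  // Step 1: an easy sub-assertion, provable on its own.
  as_count_bound: assert property (@(posedge clk) disable iff (!rst_n)
    fifo_count <= FIFO_DEPTH);

  // Step 2: once proven, the same expression becomes an invariant (constraint).
  am_count_bound: assume property (@(posedge clk) disable iff (!rst_n)
    fifo_count <= FIFO_DEPTH);

  // Step 3: the harder top-level property now explores a much smaller state-space.
  as_no_overflow: assert property (@(posedge clk) disable iff (!rst_n)
    (wr_en && fifo_count == FIFO_DEPTH) |-> !push_accepted);
endmodule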

Induction is a little trickier to grasp but very useful. The idea is that if you can prove a property is true in one case, and you can prove that if it is true at some cycle it must also be true at the next cycle, then it must always be true. Iain demonstrated this through a proof on a small FSM with significant sequential depth (always challenging for formal). If you start in the reset state with a straightforward assertion that the design state always equals the testbench state, it will either take forever to find the bug or, more likely, the tool will give up. Instead Iain replaced the assertion with this:

as_ind_state_equal: assert property (design_state == tb_state |=> design_state == tb_state);

See the difference? Instead of jumping straight to trying to prove that the design state and TB state are always equal, you go for a proof that if they are equal in some cycle, then they must also be equal in the next cycle. One more thing – you don’t start in the reset state; you start in an unconstrained state and rely on the magic of the formal engines to find a counter-example (CEX) quickly. Which they do in 2 cycles in this example!

Finally, you can combine induction and invariants into something truly powerful – inductive invariants. From a non-reset state, find a CEX to a property you would like to use as a constraint. Then use an inductive approach to prove this CEX is not possible. Then use your constraint as an invariant in larger proofs, again with the goal of greatly reducing state-space for your primary objective. Pretty clever stuff but well within the grasp of sharp engineers.

Shaun Feng from Samsung Austin R&D provided a view of efficient formal modeling techniques. He should know; he has led formal verification for over 10 years at Samsung, Oracle Labs and NVIDIA. Shaun opened with the basics of how to get to useful proofs as quickly as possible: cut-points, black-boxes, assume-guarantee, abstracting the design by shrinking its size, grouping assertions by similar cones of influence, and using proven assertions as invariants to cut down the state space (see above). One interesting idea for me was symbolic constants. These are values you want to remain stable (after reset) through the proof, but at an unknown/random value. Shaun showed how this could be useful in proving properties for two types of arbiter (round-robin and priority) and for an in-order transport proof (for a bus protocol check).

Think about a priority arbiter. This has equal numbers of req and grant signals. One requirement is that if there is a request on req(n) and a request on req(m) and m is less than n, there cannot be a grant on n before there has been a grant on m. You could enumerate and test all the possibilities, but that would be painful. Instead, you use symbolic constants for m and n; they’re randomized coming out of reset but also stable through the proof, so you get to cover all possibilities in one proof. Shaun elaborated similarly on the round-robin and in-order transport examples, which are worth studying for the methods he describes.
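
A minimal sketch of the idea (with hypothetical module and signal names rather than Shaun’s actual code) might look like this:

// Hypothetical symbolic-constant checker: m and n are left undriven so the
// formal tool treats them as free variables, then held stable for the whole
// proof, so one property covers every legal (m, n) pair with m < n.
module arb_priority_check #(parameter int N = 8)
  (input logic clk, rst_n,
   input logic [N-1:0] req, gnt);

  logic [$clog2(N)-1:0] m, n;   // symbolic constants (undriven)

  am_sym_stable: assume property (@(posedge clk) $stable(m) && $stable(n));
  am_sym_order:  assume property (@(posedge clk) m < n);

  // Simplified priority requirement: while the higher-priority req[m] is
  // pending and not yet granted, the lower-priority req[n] must not be granted.
  as_priority: assert property (@(posedge clk) disable iff (!rst_n)
    (req[m] && req[n] && !gnt[m]) |-> !gnt[n]);
endmodule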

Shaun had some interesting comments on formal and simulation. First, he noted that symbolic constants obviously don’t map directly into simulation, so you have to use a little conditional-compile code to enable the symbolic constant in formal runs and a random value in simulation runs. Why does he bother with simulation at all, instead of just going with the formal proof? Shaun echoed something I have heard elsewhere – that even though he has proven his assertions in formal, he still runs them in simulation signoff.
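
The conditional-compile split he described might look roughly like this (the macro, signal and parameter names are my own placeholders, and the snippet is assumed to live inside the formal/simulation testbench):

`ifdef FORMAL_ENV
  // Formal runs: sym_port is left undriven (free) but held stable, so the
  // engines cover every possible value of the symbolic constant in one proof.
  am_sym_port_stable: assume property (@(posedge clk) $stable(sym_port));
`else
  // Simulation runs: pick one concrete random value per test at time zero.
  initial sym_port = $urandom_range(0, NUM_PORTS-1);
`endif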

A member of the audience asked if Shaun had ever missed a bug in formal. He implied this was rare, but he had once missed a clock-gating bug. Naturally this could be attributed to incomplete assertion coverage (though he added that the RTL design had changed after formal signoff was wrapped up), but how do you prove you don’t have that problem? There are coverage techniques leading us towards better confidence, but Shaun admitted that because of that bug he’s still not comfortable with formal-only signoff and will continue to use simulation with proven assertions for handoff.

You can learn more about the Synopsys VC Formal solution HERE.


Vertical Prototyping with Intel FPGAs

by Bernard Murphy on 03-26-2018 at 7:00 am

It has been an article of faith in the design tools business that there’s little to be gained from targeting market verticals because as far as tools are concerned, all verticals have the same needs. Which is good in some respects; you maximize the breadth of the market to which tooling can appeal. But in so doing the depth of contribution is bounded, as is the value that can be extracted in any given application. (The story for IP is different in some cases but here I’m talking about tooling.)


This picture is starting to change as new and big markets are emerging, most obviously in automotive where safety and security, ADAS and autonomy requirements are inching their way into EDA product expectations. One example can be found in fault simulation capabilities for failure testing in ASIL-D hardware certification. Another big (possibly bigger) market driver is in cloud applications, serving the design of the specialized hardware that is becoming so popular in that area. Particularly in support of these emerging needs, S2C has just released their single A10 Prodigy Logic Module for FPGA-based prototyping.

I should say up front that S2C positions this as a general prototyping solution for any small to medium SoC (the board is built on a single Intel Arria 10 GX1150 FPGA). And I’m sure it is. But it seems particularly well suited to prototyping cloud acceleration applications, ADAS imaging and communications designs. The FPGA hosts extensive DSP resources, well suited for imaging and, I would guess, also for prototyping deep learning subsystems for image-based search and inference for real-time object recognition, to name just a few examples. It also hosts the fast transceivers needed to prototype networking/communications systems.

The single logic module system is very compact, fitting all components – FPGA board, extendable power control module, and power supply – into a single low-profile unit for flexibility, durability, and portability. I could see this being appealing for a lot of budget and space-constrained operations building modest-sized designs, and needing the real-time performance that an FPGA-prototype can offer for software development and debug.

The FPGA itself is fabricated in 20nm, so offers leading-edge performance. I already mentioned the DSP resources (over 3000 of them); there are also over a million programmable logic elements. It provides 53Mb of on-FPGA memory, along with a high-performance on-board DDR4 interface. You get 48 transceivers that will run at 16Gbps (ideal to prototype backplane communication in your acceleration solution). It is also optionally compatible with the Intel SDK for OpenCL, again a hint to the value of this system in cloud acceleration and other heterogeneous processor applications.


The A10 Prodigy platform comes with extensive support through supplied and available options including run-time interface, a bridge to host-based prototyping, a wide range of prototype-ready IP daughter cards which can plug directly into the chassis, and cloud-based job monitoring. It also supports scale-up to Cloud Cube 32, an enterprise-class solution based on 16 Single A10 logic modules or 8 quad-A10 logic modules, supporting larger designs and/or multi-user usage. And of course the multi-user debug capability I mentioned in my last S2C blog is supported.

If you’re building solutions for cloud acceleration, ADAS, 5G or software defined networking, you face a big software development task on top of your hardware development objectives. You probably really shouldn’t wait for first silicon to start developing software. You could hack together your own FPGA prototype board, but by the time you have it debugged and ready to use you’ll probably have those first silicon samples. Or you could start with a turnkey prototype like the A10 Prodigy. Time is money – perhaps you should look harder at the turnkey solution. You can learn more about the A10 Prodigy HERE.

Webinar: Achieve High-performance & High-throughput with Intel based FPGA Prototyping


Aart de Geus At the Heart of Impact!

by Alex Tan on 03-26-2018 at 7:00 am

At the Silicon Valley SNUG 2018, Synopsys Chairman and co-CEO Dr. Aart de Geus gave his keynote speech, addressing attendees on how far we have evolved and how, at times, we encounter the “aha” factor that helps propel us to the next level. He explored trends as well as the current state of his company’s solution offerings.


Moore’s Law, Digital Twin and Security

Aart gave snapshots of how the Synopsys synthesis product (Design Compiler) was conceived in 1986 and called SOCRATES, which stands for Synthesis and Optimization of Combinatorial logic using Rule-based And Technology independent Expert System. He quipped that it already contained some intelligent aspects, and pointed out that it has three main interacting components: timing, library and synthesis.

Next, he noted that the progression towards automation usually involves capturing the concept or requirements, modeling it, simulating, analyzing the results, and subsequently finding ways to optimize and eventually automate. He indicated that the march of Moore’s Law, although slower and more expensive, is still continuing towards the 5nm process node. Aart said that device structures are getting more “funky”; one can see interfaces of only a few atomic layers. He alluded to the importance of having a digital twin, a replica of the design or system one is building, and gave a few examples along this line. GE, for instance, was able to build and simulate the operation of a turbine or jet engine, apply various conditions over a period of time to simulate usage in the field, and identify potential failure spots. Similarly, a designer can use Synopsys virtual prototyping to analyze a design before its implementation is complete. With the current tool offerings, one can also build a new device, build the cell, and simulate and characterize its timing.

Security is also crucial and getting more attention, as systemic complexity and ownership demand measures to respond to vulnerability to attacks. To this end, Synopsys offers solutions from its acquired security companies: Coverity for static code analysis; Black Duck for software composition analysis; and Seeker for dynamic analysis (a debugging method that evaluates a program during execution), which can be used to check for vulnerabilities at runtime. To illustrate his case, Aart showed four access scenarios corresponding to four quadrants.

The rise of AI
Aart provided a flashback, using slides, on how we got here: the millennium timescale of agriculture and tooling; the century timescale of the steam engine and the industrial revolution; and the recent decades of the physical (computing), biological (mobility) and, at present, intellectual (smart everything) phases. He noted the increased valuations resulting from the inflection points of these past decades:
● 100B – AI: IoT, Big Data, ML
● 50B – Compute: PC, Internet, Networking Server, Cloud
● 25B – Mobility: phone, smartphones, social media

Just like the invention of the printing press and the use of the alphabet, we are harnessing AI and the digitization of things, through a binary-logic concept, to create designs. “Systemic understanding blossoming worldwide”, he said. Applications such as autonomous vehicles, although not yet perfected (witness recent fatal incidents), raise ownership as well as privacy issues around the massive amount of data being generated. This has accelerated the impact on compute power, demanding faster performance; he touted that you have to be fast in order to meet the fast-performance demand. He also highlighted the company’s photonic design solution, which was complemented recently through the acquisition of Phoenix B.V. The acquisition enables Synopsys to enhance its current photonics design automation with physical layout handling capabilities.

He added that on the path from capture to automation, AI (through machine learning) will add learning and interpretation at the interface of the analysis and optimization steps; iterating on that interface is where the action is. He highlighted Synopsys’ current tool offering as a total fusion. Previously the company introduced a common platform as a framework for synthesis, P&R, STA and extraction. Now Fusion comes in four flavors: Design, ECO, Signoff and Test. In Design Fusion, a tight correlation is achieved between Design Compiler and the ICC2 tool; the integration provides a few percentage points of gain in both power and area on a hard-to-close timing design. Another example is ECO Fusion, which enables PrimeTime and Star-RC/XT usage within ICC2. Ongoing and future work will involve folding in machine learning for additional QoR and improved TTR (Time to Results).

Aart also shared the status of the first Synopsys collaboration with ANSYS, bundling the RedHawk analysis tool in a fusion-like approach to address power integrity analysis within the ICC2 implementation stage, and a roadmap towards providing a complementary set of ANSYS features by midyear.

Aart stated that we have the privilege to be in the middle of profound changes – changes that will impact our lives and the lives of our children, presenting big opportunities, absolutely. In closing he thanked everyone for their continued support.


FPGA, Data and CASPA: Spring into AI (2 of 2)

by Alex Tan on 03-23-2018 at 12:00 pm

Adding color to the talks, Dr. Jeff Welser, VP and IBM Almaden Research Lab Director, showed how AI and recent computing resources can be harnessed to contain the data explosion. Unstructured data growth by 2020 would be on the order of 50 zettabytes (a number with 21 zeros). As one example, the Summit supercomputer developed by IBM for use at the Oak Ridge National Lab utilizes over 27K NVIDIA GPUs targeting 100+ PetaFlops.

The next question is where AI algorithms run. Figure 4 shows the segregation of AI algorithms and their respective compute platforms. Advanced analytics of Big Data is usually done on CPUs, while ML (learning without explicit programming) is performed on a mix of CPUs, FPGAs and GPUs. Deep learning (many-layer neural networks) uses GPUs to train and CPUs or FPGAs to inference, with the current trend being a race to ASICs.

Jeff showed how, over the years, error rates in image and speech recognition have dropped dramatically to single digits, approaching human accuracy levels. Interestingly, he also shared how custom hardware implementation for AI has circled from FPGA through ASIC and back to FPGA again, as illustrated in Figure 5.

How can we continue the rate of progress beyond the GPU? The trend is a 2.5x improvement in GFlops/Watt per year. We could use reduced or mixed-precision accelerators. One example is the use of Phase Change Memory (PCM) to deliver up to 500X speed-up with respect to current NVM devices, at accuracy equivalent to a GPU. Some researchers are attempting to optimize resistive devices for symmetric switching, exploring new materials and devices as building blocks for new AI chips with neuron-like networks. A spiking-based architecture (non-Von Neumann), TrueNorth, is adopted for low-power inference; it consists of 1 million neurons and 256 million synapses, consumes ~70 mW and has a ~4 cm2 footprint. IBM collaborated with the Air Force Research Lab last year on a TrueNorth 64-chip array consisting of 100 million neurons.

Another approach is massively distributed deep learning, optimizing communication and data movement based on the available hardware. Scaling ResNet training to 64 Minsky nodes (the IBM nickname for Power8-based HPC systems) and 256 GPUs cut the training time from 16 days to 7 hours. To sum up, he anticipates the following key technologies for the next era of computing:

Context and Learning; Visual Analytics and Interaction; Software Defined Environment; Data-centric Systems; and Atomic and Nano-scale.

While previous speakers examined system-level aspects of the AI ecosystem, Dr. Steven Woo, VP of Systems and Solutions at Rambus, shared his perspective from a different vantage point: a bottom-up view of the memory element. Growing data has spurred demand for high-speed memory to allow the quick data movement required by AI training and inferencing.

The performance bottleneck has shifted due to growing data from interconnected devices, and AI has driven new system architecture development, as manifested in NVIDIA’s Tesla V100 and Wave Computing’s Dataflow Processing Units. The Roofline Model can be used for performance prediction: one can analyze the sweet-spot of an application’s optimal performance given the underlying hardware’s memory bandwidth and processing power. Rooflines vary for different system architectures. The plot in Figure 6 captures performance (operations per second) versus operational intensity (operations per byte). Two architectural limits are illustrated by the green lines, which intersect at a ridge point, forming a roofline shape.
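
In rough terms (my formulation, not a figure from the talk), the roofline caps attainable throughput at the lesser of the peak compute rate and the bandwidth-limited rate:

\[ P_{\text{attainable}} \;=\; \min\bigl(P_{\text{peak}},\; B_{\text{mem}} \times I\bigr), \qquad I = \frac{\text{operations}}{\text{bytes moved}} \]

Below the ridge point (low operational intensity) an application is memory-bound; above it, it is compute-bound.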

How to ease the bandwidth issue? There are several ways, each coming with its own trade-offs:

  • Reduce the precision of the data – smaller bit width.
  • On-chip memory – the highest bandwidth and power efficiency (tens of Tb/sec), but less storage than DRAM.
  • High Bandwidth Memory (HBM) – very high bandwidth and density (256GB/sec), high IO count, but with interposer-related challenges.
  • Graphics Double Data Rate (GDDR) – a good tradeoff between bandwidth, power efficiency, cost and reliability; traditionally for graphics, now also AI; its application challenge is ensuring clean signal integrity.

As random-access latency is inherently long, architects may convert random accesses into more streaming-oriented access patterns.

Let’s look into how AI may lend a hand in perfecting our senses.
Dr. Achin Bhowmik, CTO and EVP of Engineering at Starkey Hearing Technologies, pointed out the $3.2 billion healthcare market value and the hearing technology products his company provides. Formerly an Intel engineering executive in the Perceptual Computing Group, covering various aspects of IoT designs and AI applications, his passion is to push the envelope of hearing technology to augment human perception, leading to a better life.

As we know, our sensory system comprises vision, hearing, touch, taste and smell (he mentioned that we should also include spatial sense, or sense of balance, in the list). He quoted statistics from the National Council on Aging that every 13 seconds an older adult is treated in an emergency room for a fall (and every 20 minutes an older adult dies from a fall), costing an estimated $67.7B by 2020. This data accentuates how restoring hearing (and with it, spatial sense) helps reduce such incidents.

Unlike computer vision and face recognition technologies, hearing technology is still evolving. Just like IoT devices, hearing aids require a small form-factor: small enough to be practical, while still packing in many interacting components (six sensors including a spatial one, a DSP chip and a radio/receiver). He highlighted the latest hearing aid, recently introduced, which can fit inside the ear canal. It offers 7 days of battery life, runs at 5 mW and can use a phone as a gateway. The device has a two-layer data security protocol to prevent snooping, one at the device and another at the smartphone app. Figure 8 provides a snapshot of the miniaturization trend.

The product utilizes technology referred to as Acuity Immersion, which leverages microphone placement to preserve high-frequency information for improved sound quality and a sense of spatial awareness, helping users relearn key acoustic brain cues to support clear speech, a sense of presence and the spatial attention needed for a vital connection to their environment. It also provides a sense of directionality (restoring front-to-back cues for a more natural listening experience).

Completing the talks, Dr. Yunsup Lee, CTO and co-founder of SiFive, the first fabless company to build customized silicon based on the free and open RISC-V instruction set architecture, pointed out the high barrier to entry for building custom chips at scale. The company provides scalable chip development in Amazon’s AWS environment and offers two flavors. The first, the E300 Everywhere, targets embedded processing, IoT and wearables markets and is designed in TSMC 180nm.

The second, the U500 Unleashed customizable RISC-V SoC IP, contains the configurable U5-Coreplex, a 1.6GHz+ cache-coherent 64-bit multiprocessor with up to eight cores, plus application-specific custom hardware. The company’s selling point is that these fully Linux-capable IPs can reduce NRE and time-to-market for customized SoC designs in markets such as ML, storage and networking. High-speed peripherals such as PCIe 3.0, USB 3.0, GbE and DDR3/4 are also available. The design is compatible with a TSMC 28nm process.

Delivering intelligence from the edge to the compute brain involves an entire AI ecosystem. Against the backdrop of an Intel-based environment, one can anticipate that scalability and elasticity across the networked components are key to keeping up with the data explosion. This includes not only pushing the envelope of current switching technology (CMOS, FinFET) and memory limits, but also probing non-conventional solutions such as neuron-like approaches.


Uber’s Monkey in the Wrench

by Roger C. Lanctot on 03-23-2018 at 7:00 am

The news of a pedestrian fatality in Tempe, Ariz., resulting from the operation of an Uber autonomous vehicle has set off alarm bells throughout the AV development community. As always in such circumstances there will be a simultaneous rush to judgement and the immediate termination of all such testing, as well as a call for calm as investigators investigate. For the time being Uber’s testing has stopped.

The highly likely outcome is a finding in favor of Uber after the pattern set by Tesla Motors and followed recently by Cruise Automation: the parties responsible for the robot driver will attribute responsibility for the fatality to the human – either the “driver” in the Uber or the pedestrian. Actually, this is a pattern established more than a century ago by the makers of human driven cars, which allowed car makers to almost completely ignore safety enhancements to cars until well into the second half of the 20th century, by which time millions of humans (drivers, passengers and pedestrians) had been killed by vehicles.

Humans have long been the fly in the ointment of automation. Whether chewed up or maimed by ravenous factory equipment or simply savaged by trains or other moving vehicles, human beings have long been, in the words of “Die Hard”’s John McClane, “the fly in the ointment, the monkey in the wrench.”

Humans have served as the speed bumps along the path to the very progress intended to enrich human existence. In the case of the Uber fatality, though, an essential element has changed. The machines are now able to testify in their own defense, while the humans are left defenseless.

This new pattern started with the now two-year-old fatal crash of a Tesla in Florida. Tesla not only used the vehicle data to show that the Model S in question was being misused by the human (a fact made manifest by the location of the crash); the company also used that data, and data from thousands of other Teslas, to assert that vehicles equipped with Autopilot generated fewer crashes and claims.

Cruise Automation has been quick to follow up reports of multiple collisions with non-automated vehicles in the San Francisco area by pointing out the blamelessness of its own automated cars. The Cruise vehicles invariably had stopped short to avoid collisions and were, in turn, rear-ended by other vehicles.

Cruise is unique in experiencing at least one incident of human-inflicted violence (mild) on its cars, suggesting a visceral reaction among humans to the existence of self-driving vehicles. Similar efforts to interfere with or impede self-driving cars have occurred elsewhere.

Open hostility to self-driving cars, though, has largely been limited to more formal resistance from safety advocates who oppose the SELF Drive Act currently before the U.S. Senate. The bill has stalled despite widespread support among car makers and previous passage in the House of Representatives.

I ran into a senior auto industry executive at the Geneva Motor Show who expressed strong support for the bill, only to later run into a senior executive of self-driving car maker Waymo on the show floor who indicated indifference to the legislation (even though Waymo itself is a member – along with Ford and Volvo – of the Self-Driving Coalition for Safer Streets – which supports the bill). The bill provides exemptions from Federal Motor Vehicle Safety Standards which call for safety equipment on self-driving vehicles including steering wheels and brake pedals.

The legislation may be less important to Waymo because Waymo is positioning itself as a solution provider, not a car maker. Waymo is providing a transportation service. If anyone is interested in the operation of its vehicles, the company’s guidance is: “Read our safety report.”

Waymo’s safety report describes the operation of its vehicles. This description is seen as adequate to fulfill the requirements of regulators and law makers. There is no need for an exemption nor an abdication of responsibility or liability.

Thankfully, Waymo has neither killed nor injured any drivers or pedestrians, suggesting that, indeed, the company’s operational safety vision – so far – is a sound one. More saliently, this is in the context of Waymo’s long-held and oft-stated intention of delivering cars without steering wheels or brake pedals. Waymo’s is not an incremental approach to automation. It is all or nothing.

Waymo’s approach is not exactly an expression of hostility toward human drivers. It is a simple founding philosophy of its program. Uber, on the other hand, arguably has a history of hostility or at least ill treatment of its drivers, which puts a different paintjob on the manner in which the public will process news of an Uber-inflicted pedestrian fatality.

It is appropriate that Uber suspend its testing to determine what precisely failed during the human-monitored self-driving process that caused a pedestrian fatality. But the vehicle will surely defend itself and Uber AVs will be back on the road within days or weeks. The humans (driver and pedestrian), meanwhile, will be left to plead their case, or at least the driver will, to a skeptical jury of their peers: transportation investigators. (Uber successfully exonerated itself in a previous crash with a Volvo, also in Tempe, AZ. Too bad Uber and Waymo can’t collaborate, in spite of being neighbors.)

Should all such self-driving car testing be suspended as a result of the Uber failure? Probably not. But the incident highlights the challenge of proving a negative – i.e. proving that a crash or injury did not occur because of the presence of safety systems or a self-driving robot. Has Waymo simply been lucky?

Autonomous driving development is now universally seen by legislators and regulators as a source of technological progress and leadership as well as a creator of jobs and a sponge for investment capital. This automated economic engine itself is on autopilot and not likely to be slowed by the humans any time soon – even if the humans throw themselves in the path of progress. The “roads must roll,” to steal a phrase from Robert Heinlein’s book of the same title.

It is only a matter of time before self-driving cars begin clamoring digitally for the complete removal of human drivers from the roadways in the name of efficiency and safety. Is it too late for the human drivers (and pedestrians) to rise and resist? Yippee ki yay! Old rules die hard.

– Self-Driving Car Kills Pedestrian in Arizona, Where Robots Roam – NYTimes.com

– Explanation Of Why The Uber Self Driving Car Might Have Killed a Pedestrian at Dangerous Pedestrian Intersection – Bad Intersections blog

– Uber’s Self-Driving Car Showed No Signs of Slowing before Fatal Crash, Police Say – Theverge.com


Qualcomm, AMD on Verification with Synopsys

by Bernard Murphy on 03-22-2018 at 7:00 am

Synopsys hosts a regular lunch at DVCon each year (at least over the last few years I have checked), a nice meal and a show, opening with a marketing update followed by 2-3 customer presentations on how they use Synopsys verification in their flows. This year’s event was moderated by Piyush Sancheti from Synopsys Verification marketing, a buddy of mine from way back in my Atrenta days.

As promised, Piyush provided a market update on Synopsys growth in verification. He reminded us that their emulation business has been growing nicely (50% CAGR) and that they are viewed by their customers as particularly strong in accelerating software bring-up (backed up by their speaker from AMD). The Verification Group is expanding its focus from platforms to solutions, particularly in automotive, networking, 5G, AI and storage (I may have missed a few). Nonetheless they continue to invest heavily in platforms. They’re seeing good traction with their fine-grained parallelism in simulation, now available to all VCS customers with all simulation flows. VC Formal is also getting strong pickup and continues to add apps and assertion IP.

Next up was Deepak Manoharan, SoC verification manager at Qualcomm North Carolina, on the power of focus in QCOM datacenter technologies. Deepak, a good story-teller, talked more about his philosophies than technical details. He split his topic into two main areas, reacting to change and conserving time, illustrated by work he manages on the Qualcomm Centriq 2400 server processors. These beasts host 48 cores along with DDR and PCIe interfaces, and one of the bigger verification challenges they face is, guess what, proving cache coherence under all possible circumstances. He noted that in general this verification is a lot more challenging than for mobile platforms, because servers must very efficiently support many use-cases.

Deepak pointed out that change is a fact of life in real design projects; ability to react quickly is essential and depends on a very clear and complete verification plan. When you need to adapt quickly to a change, interoperability between (verification) platforms is important as is bring-up time. Equally you must always remember why you are doing a certain task in verification. Plodding through the plan is important but when the ground changes underneath you, sometimes certain line-items in that plan lose value and new critical objectives emerge. Handling this effectively is part of reacting quickly.

Conserving time depends on continuing platform performance improvements of course (he noted a 2-3X speed-up in the latest release of VCS), but also on test-planning by platform and on features that simplify and accelerate adapting to varying needs and changes (backdoor load, save/restore, efficient debug and portability between platforms).

Brian Fisk, principal MTS at AMD, introduced us to IP-level hybrid emulation. This is a very interesting direction in shift-left, an approach in which you can pre-verify a fairly extensive software stack while the rest of the hardware is still in development. Brian opened this discussion with an interesting question: are we approaching practical limits for full-SoC simulation and emulation? He pointed out that the amount of software development we have to shift-left (BIOS, drivers, etc.) is growing, and we now must also worry about power and performance. He suggested that SW/HW development demands are now accelerating faster than verification platform improvements, which points to the value of developing and proving out stacks at the IP level.

AMD has had for some time now, I think, a very useful capability they call SimNow. This allows them to model a full SoC, say a discrete GPU, with different parts potentially at different levels of abstraction and/or running on different verification platforms – simulation, emulation, prototyping or even early silicon. From a software developer’s point of view, the details are transparent, except in performance. Brian cited a possible configuration where the GFX engine runs on a ZeBu emulator and all the other stuff (peripherals, etc.) runs in SystemC models. The software stack runs in VirtualBox connected to this SimNow hardware model, and the SW user can (I believe) fairly transparently configure the SimNow model to manage accuracy/performance tradeoffs (through component swaps) as needed.

Now back to the IP+SW topic. Brian said that they used to compile the whole design into ZeBu. This worked fine but obviously tied down a limited resource during that period, limiting sub-component prove-out at various stages. Now they have switched to the hybrid emulation model approach (using SimNow), where prove-out of sub-components of the GFX core can share an emulator. They are now at a point where 4 independent SW teams can work simultaneously on hybrid models, testing their code against different aspects of the GFX.

In Brian’s view, ZeBu has been a game-changer for AMD. Hybrid models are passing first regressions 8-10 weeks ahead of previous milestones, and each is seeing several orders of magnitude more real-world stimulus than they had been able to exercise before. Power and performance testing also starts much earlier. As a result, they now see SW developers finding HW bugs and (the nirvana of system development) the gap between SW and HW developers is closing. Brian wrapped up by answering the question with which he opened: in his view, yes, we have hit the limit of SoC emulation; system-level SW development and verification must move to the IP level. Food for thought.

You can register HERE to watch a recording of the panel.


FPGA, Data and CASPA: Spring into AI

by Alex Tan on 03-21-2018 at 12:00 pm

Just as good ideas take time to percolate, we have seen the pace of AI adoption picking up speed, propelled by faster GPUs. Some recent data points provide a good indication that FPGAs are making a comeback, bridging chip-design needs to keep up with AI’s ML applications.

According to research firm Deloitte, there is a projected increase in ML chips using FPGA, GPU and ASIC implementations in 2018 (refer to Figure 1a). There are also more collaborative efforts between FPGA makers (Intel, Xilinx) and cloud providers in addressing neural network implementations of ML algorithms.

Let’s take a closer look at how FPGAs play a role in the AI space. This year CASPA rolled out a sequel to last year’s symposium on AI and Semiconductor Fusion. Several Silicon Valley technology experts and leaders representing large companies (Intel, IBM, Rambus, Starkey) and startups (SiFive, ADS) shared their perspectives on AI’s current impacts and projected trajectories within their respective domains.

A few takeaways from this symposium:
– FPGAs seem to be gaining traction in supplementing GPU horsepower for running AI-based algorithms.
– Just as Moore’s Law on design density was challenged a few years ago, the Von Neumann Architecture (VNA) is likely to face a similar challenge as design teams explore new techniques to resolve computing performance and bandwidth bottlenecks.
– A preview of humanizing AI applications in medical wearable devices, closer to the left end of the intelligence spectrum: the human brain.

Vincent Hu, Intel’s VP of Strategy Innovation and Planning, whose charter includes assisting Intel’s CTO office and driving the FPGA roadmap for Intel’s Programmable Solutions Group (formerly Altera), outlined how AI is transforming industries and the dynamics driving data growth (refer to Figure 2).


His take is that Intel’s recent second pivot is data-centric in nature, replacing the first one, which revolved around processor and memory.

Furthermore, our need for FPGAs stems from the increased programmability of software coupled with the performance demands on hardware. One major projected application is the 5G migration, requiring high bandwidth plus many programmable components (DSP/MIMO/security); China deployment is expected in early 2020. To this end, the FPGA provides deterministic system latency. It can leverage parallelism across the entire chip, reducing compute time to a fraction and bringing system latency (i.e., I/O latency + compute latency) to a reasonable level, allowing high throughput.

In what context does the FPGA fit into the AI ecosystem? Consider the two major steps in AI’s deep learning (DL) process, commonly described when covering ML/DL: training and inferencing. The former still requires heavy cloud infrastructure comprised of Xeon-based power servers, while the FPGA boxes focus on inferencing (Figure 3a).

His counterpart, Dr. Wei Li, Intel’s VP of Software and Services Group and GM of Machine Learning and Translation, highlighted Intel’s complete AI ecosystem, which provides the scalability and advanced analytics necessary to power AI (Figure 3b). Growth in AI compute cycles is anticipated to be 12x by 2020. AI accelerators such as Intel’s Nervana neural network processors are utilized in the cloud and data center. Libraries and frameworks have been hardware-optimized, since AI performance is the sum of software and hardware throughputs. As a point of comparison, a Xeon Platinum based server alone offers a 2.2X performance improvement versus its predecessor, but 113X better if software speed-up is included. Both inference and training throughputs improved by over two orders of magnitude with the most recent Xeon Platinum based processors.

In order to scale deep learning to multi-node, a three-pronged approach is applied:

1. Distributed software implementation: optimizing DL frameworks with the Intel ML Scaling Library (MLSL) and Intel MPI.
2. Increased scaling efficiency: large mini-batch training methods and communication volume reduction.
3. Support for parallel training algorithms: data/model parallelism; synchronous, asynchronous and hybrid variants of the Stochastic Gradient Descent optimization technique.

In a tabulated comparison, Intel Xeon’s INT8 mode, designated for deep learning inferencing, delivers lower response time, higher throughput and a smaller memory footprint, with little accuracy loss compared to single-precision floating point (FP32).
The scaling trend keeps increasing: from the initial 32/64 nodes (GoogLeNet) to 256 CPU nodes (Barcelona Supercomputing Center MareNostrum 4) with 90% scaling efficiency; more recently, UC Berkeley used 1600 Skylake CPU nodes and the Layer-wise Adaptive Rate Scaling (LARS) algorithm to outdo Facebook’s prior results, finishing a time-to-train (TTT) of ResNet-50 in 32 minutes (AlexNet in 11 minutes) with much higher accuracy. To show how elastic Intel Xeon-based applications can be, he pointed out that Clemson University researchers utilized 1.1 million Amazon nodes for topic modeling. Companies such as Facebook also leverage off-hours deep learning training to gain compute capacity.

[Part 1 of 2] To continue reading, please refer to [Part 2 of 2]


Siemens Leverages Mentor Embedded IoT Framework for Industry 4.0

by Daniel Nenni on 03-21-2018 at 7:00 am

For those of you who wondered at the logic behind Siemens’ acquisition of Mentor Graphics last year, look no further than a recent announcement by Mentor, now a Siemens business, regarding the release of their new Mentor Embedded IoT Framework (MEIF). To help connect the dots, we need to back up a bit and review a few things about how the IoT and Industry 4.0 work.


Since we are primarily concerned with ICs, we note that as part of Industry 4.0, sensors, actuators, micro-controllers and sometimes multi-core SoCs are being added to industrial equipment to better enable management and control of industrial processes. The basic premise is that IoT edge devices are co-located with industrial equipment. Those devices monitor industrial processes and then send data to IoT gateways for data fusion and some processing. Eventually the data is sent to the IoT Cloud for further processing.

In reality, it’s much more complex than that. IoT systems are typically made up of many Clouds and many edge and gateway devices from different providers that must talk to each other. Back-end processing in the cloud is usually handled by service providers such as Amazon, Microsoft and, surprise, Siemens!

We typically think of Cloud companies providing services for things like data storage, data analytics, and transforming data into actionable business intelligence. However, cloud providers also have services to help manage IoT devices, including monitoring of device health, security, and providing over-the-air software updates for devices with embedded software. This is usually done through a set of application programming interface (API) calls supplied by a software developer’s kit (SDK) that is unique to the cloud provider. The SDK and APIs help to massage data from edge and gateway devices into a form that can be used by the Cloud provider’s IoT operating system and data analysis applications.


Now let’s connect a couple more dots. For sophisticated SoCs in gateway (and sometimes edge) devices, there will be embedded software that must deal with not one SDK but multiple SDKs from multiple cloud vendors. This is where Mentor’s MEIF product comes in. It acts as a standard software switch box that enables the embedded software provider to write their software using one set of libraries, routines and API calls, regardless of the operating system or hardware it is running on, and regardless of the number of different cloud SDKs to which it must talk.

Some of the key attributes of Mentor’s Embedded IoT Framework are that it is:

• scalable from micro-controllers up to large multi-core SoCs
• portable across operating systems, including Mentor’s own Embedded Linux and Nucleus RTOS
• portable across hardware architectures like ARM and X86
• extensible and customizable
• equipped with infrastructure and tools to enable IoT security
• able to support multi-cloud environments


MEIF maps embedded software calls in edge and gateway devices to Cloud-provider-specific API calls from Amazon Web Services (AWS), Eclipse IoT, Microsoft Azure, and Siemens MindSphere. It also adds code that takes care of a breadth of services that enable backend communications and applications, including: device authentication and provisioning; configuration and control; monitoring and diagnostics; and software updates and maintenance. Mentor’s framework is open and customizable for industrial customers who want scalable and portable solutions for multi-cloud environments.

MEIF also supports features that help manufacturers manage reliability, uptime and overall quality by using smart-device information to report device utilization, do system profiling and provide support for alarms and events, all of which are key value propositions for Industry 4.0.


MEIF incorporates Mentor’s Embedded runtime platforms, which provide hardware-based root-of-trust and software chain-of-trust security both during runtime and during device boot-up. This also includes support for secure data transfers both up to the cloud and down to the devices on the manufacturing floor.

So, to complete the picture, when large companies make acquisitions, as Siemens did with Mentor, you need to first look at the adjacencies. It’s clear in this case that Siemens has a large interest in the IoT Cloud infrastructure with their MindSphere family of products. The potential business is huge, but it could be bigger if they could make it easier for IoT edge and gateway device suppliers to use MindSphere alongside other cloud providers. Mentor’s embedded systems group was a key enabler to doing just that.

While it may not be the only reason Siemens acquired Mentor, it certainly looks like one of the good ones. It makes a lot of sense and it’s good to see Siemens leveraging their Mentor investment back into their other products and services.

See Also:
Press release – Mentor advanced Industry 4.0 for Smart, Connected Devices
Data sheet – Mentor Embedded IoT Framework
Siemens MindSphere


Free Webinar: Silvaco 3D Solver Based Extraction for Device and Circuit Designers

by admin on 03-20-2018 at 12:00 pm

Designers spend a lot of time looking at their layouts in 2D. This comes naturally, because viewing in 2D is faster and simpler than in 3D, and it helps that humans are good at extrapolating from 2D to 3D. Analysis software, such as extraction software, also spends a lot of time looking at layouts in 2D. While this is fine for approximate results, it turns out that looking at the design this way is not accurate enough for a wide variety of target designs. Fortunately, there is an alternative offered by Silvaco that uses field solvers on 3D structures derived from layout data.


Using realistic 3D TCAD structures for RC extraction may be new to many readers. Fortunately, Silvaco is offering a webinar on March 29th that goes into depth on their preferred solution for highly accurate extraction. Dr. Garret Schlenvogt will present a detailed agenda covering an introduction to RC extraction, and specifically the topic of solver-based 3D extraction, while highlighting the important differences in approaches.

The webinar will also cover Silvaco’s solution from extraction through SPICE. Their solution works for devices, cells and interconnect – offering a versatile method to work at every level of critical designs. Their 3D structure generation takes advantage of Silvaco’s deep experience with TCAD.

This free webinar will be held at 10AM Pacific time on March 29th. It’s easy to register online through the Silvaco website. The webinar will be informative for a wide audience, including circuit designers who need better RC parasitic information, as well as process and device engineers looking to improve their optimization efforts.

About Silvaco, Inc.
Silvaco, Inc. is a leading EDA and IP provider of software tools used for process and device development and for analog/mixed-signal, power IC and memory design. Silvaco delivers a full TCAD-to-signoff flow for vertical markets including: displays, power electronics, optical devices, radiation and soft error reliability, and advanced CMOS process and IP development. For over 30 years, Silvaco has enabled its customers to bring superior products to market with reduced cost and in the shortest time. The company is headquartered in Santa Clara, California and has a global presence with offices located in North America, Europe, Japan and Asia. www.silvaco.com


Formal: Going Deep and Going Early

by Bernard Murphy on 03-20-2018 at 7:00 am

This year I got a chance to talk with Cadence at DVCon on a whole bunch of topics, so expect a steady stream of blogs over the next couple of months. First up was an update from Pete Hardee (Director of Product Management) on, surprise, surprise, formal verification. I’m always trying to learn more about this space, so I picked a couple of topics from our discussion to highlight here.

First, going deep. Formal methods, particularly bounded model-checking, typically operate breadth-first. The engine steps forward one cycle from the current state and checks active assertions, steps out another cycle and so on until assertions are proved, or counter-examples are found, or proving exceeds a specified depth (all modulo constraints of course). This method of proving is exhaustive but limited in the number of cycles it can analyze, since analysis size expands more or less exponentially with depth.
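
Roughly speaking (my summary, not Cadence’s formulation), a bounded check of depth k unrolls the transition relation and asks a solver whether the property can fail within k steps:

\[ \mathrm{BMC}_k \;=\; I(s_0) \,\wedge\, \bigwedge_{i=0}^{k-1} T(s_i, s_{i+1}) \,\wedge\, \bigvee_{i=0}^{k} \neg P(s_i) \]

A satisfying assignment is a counter-example of length at most k; the growth of this formula with k is what limits practical proof depth.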

Does this mean you are stuck with checking properties inside that depth? Apparently not. While property-proving in the classical sense is restricted to breadth-first methods, bug-hunting to significantly deeper levels is also possible. One technique the Cadence JasperGold toolset supports is called cycle-swarm, in which the prover works exhaustively for a number of cycles, then advances forward some number of cycles while either not testing or only lightly testing, then restarts full proving from that point, and so on.

This trick, in which the engine ignores big chunks of the state space in order to reach further out, can be directed in other ways. State-swarm follows a trail of cover properties you plant on the design. It doesn’t guarantee to follow them in any particular order, only that it will hit each at some point. Guide-pointing follows a similar approach but guarantees to hit your cover properties in a specified order. In both cases you want to define cover-properties close to where you think something might fail.


I have struggled before to understand these methods, but I believe I have got it now. The engine starts a new search from each such property (once hit), effectively resetting the span of the search at each such point. From each property you are starting a new (forward) cone of analysis, which is what allows you to reach out so far; you’re advancing step-wise, in bounded cones of analysis. This probably also means you need to scatter cover properties in increments of provable cycle depths ahead of your goal property, so that none of the intermediate cones blows up before hitting the next such property.
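
To make that concrete, here is a hypothetical set of waypoint covers (signal names invented for illustration) of the kind you might scatter ahead of a deep FIFO-underflow target:

// Hypothetical waypoints for a state-swarm or guide-pointing run: each cover,
// once hit, becomes the starting point for a fresh cone of analysis on the way
// to the deep underflow scenario.
cov_fifo_half:      cover property (@(posedge clk) fifo_count == FIFO_DEPTH/2);
cov_fifo_full:      cover property (@(posedge clk) fifo_count == FIFO_DEPTH);
cov_drain_started:  cover property (@(posedge clk) fifo_count == 1 && rd_en);
cov_underflow_risk: cover property (@(posedge clk) fifo_count == 0 && rd_en);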

Pete and others freely acknowledge this is bug-hunting rather than proving; however, several customers find it can still be a very productive exercise, whether or not you also deploy “classic” formal. At JUG 2015, an HP user presented a case where he found a potential bug (FIFO underflow) using state-swarm at nearly 3000 cycles, far beyond the normal scope of formal proofs.

The second topic that always interests me is how to put more verification in the hands of RTL designers. Naturally there is a self-serving aspect to this for any tool provider but there are also broader benefits. One is in supporting continuing shift-left. To the extent that RTL designers can hand off cleaner code, verification engineers spend less time tracking down bugs and iterating on RTL updates. A second benefit is in supporting reusability. For a current application, you know bugs will be shaken out in block or system verification. But if this has to happen again and again on new applications of that block, reuse loses a lot of its appeal.

Formal apps can be a valuable contributor to ensuring high quality handoff (against certain objectives) in both cases. Formal lint (Cadence calls it Superlint) should be a familiar starting point for any RTL designer since it requires little more effort than running a regular linter in most cases.

The other app in the designer’s desktop is CDC. This app will run structural checks (e.g. requiring approved synchronizers at domain crossings), functional checks (e.g. checking grey-coding on FIFO read/write pointers) and reconvergence checks (e.g. requiring one-hot coding). Handing this kind of analysis to RTL designers ought to be a no-brainer for handoff, though I’m not sure how far this has progressed across the industry and how much it’s still farmed out to the verification team. Perhaps the inevitability of the shift-left squeeze will make the transition inescapable.
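
As an illustration of the functional flavor of these checks, a grey-coded pointer check might look something like this (hypothetical names, not the app’s internal implementation):

// A grey-coded FIFO write pointer must change by at most one bit per
// source-domain clock, so at most one bit is ever in flight across the crossing.
as_wr_ptr_gray: assert property (@(posedge wr_clk) disable iff (!rst_n)
  $countones(wr_ptr_gray ^ $past(wr_ptr_gray)) <= 1);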

Go HERE to learn more about JasperGold verification apps and watch a video in which Pete explains the advantages of RTL designer’s desktop.