
Nvidia’s Pegasus Putsch!

by Roger C. Lanctot on 10-29-2017 at 7:00 am

There hasn’t been this much excitement in Munich since the 1920s. Nvidia’s great pivot was on display at the GPU Technology Conference Munich 2017. Digital dashboards are out and robotaxis are in as Nvidia narrows its focus on the tip of the automotive industry disruption spear.

To be clear, Nvidia is triangulating on the automotive industry from a number of different angles ranging from augmented and virtual reality design tools (Holodeck) to server technology implemented across the spectrum of cloud service providers. It just so happens that Nvidia is also trying to reduce the “supercomputer-in-the-trunk” phenomenon of self-driving cars down to a device the size of a license plate.

Notes Nvidia: “Of the 225 partners developing on the Nvidia Drive PX platform, more than 25 are developing fully autonomous robotaxis using Nvidia Cuda GPUs. Today, their trunks resemble small data centers, loaded with racks of computers with server-class Nvidia GPUs running deep learning, computer vision and parallel computing algorithms. Their size, power demands and cost make them impractical for production vehicles.”

The significance and manifestation of Nvidia’s Pegasus pivot was the introduction at the GTC event in Munich of a license-plate sized device capable of displacing all that trunk-hogging hardware in self-driving cars. Setting aside the cost and performance conversation, the shift of focus at Nvidia to so-called robotaxi startups is a massive turning point for the industry.

Intel, too, has turned away from its prior fixation on infotainment in automotive dashboards. Intel’s tie-up with BMW and acquisition of Mobileye marked an equally tectonic shift toward safety and autonomy.

Nvidia prides itself on being the coolest kid on the technology block – from Jensen’s signature keynote leather jacket (did he borrow that from GM CEO Mary Barra or did she mimic him?) to the whooping and chanting Nvidia fan-boys (at least in Silicon Valley) when the company talks about petaflops and such at press events. But robotaxis? Nvidia? Really?

The big news out of Munich is that Nvidia is seeking to confer its uber-coolness upon robotaxis, by which the company clearly means Uber and Lyft judging by the slide image (pictured above). Of course Nvidia is conflating definitions and business models by using the term robotaxis along with the super-frothy Goldman Sachs forecast of $285B.

The real change that is coming along to blindside the entire transportation industry is multipassenger shared and autonomous transportation vehicles. The mass of startups emerging across the planet is primarily focused on luring consumers out of their individually owned and operated vehicles into shared public transportation resources.

But somehow, announcing to the world that Nvidia was pushing pods and shoving shuttles likely didn’t have the same “Total Recall,” Philip K. Dick sizzle of launching a robotaxi assault on people moving. It doesn’t matter. The bottom line is that Nvidia has taken the first step toward making shared, public transportation cool. What’s not to like?

So, Jensen, bring on the petaflops and let’s save some lives, reduce some emissions and congestion and feel uber-cool in the process. Hail Nvidia! Hail shared vehicles! Hail disruption!


Software Defined Networks (on Chip) – NetSpeed Systems and UltraSoC Team Up to Use Embedded Analytics to Enable Next Generation SoCs

by Mitch Heins on 10-28-2017 at 7:00 am

NetSpeed Systems is known for their network-on-chip (NoC) IP that enables complex heterogeneous SoC architectures. NetSpeed IP supports both non-coherent and coherent memory and I/O schemes as well as configurable, customized last level cache optimization through their Orion, Gemini and Pegasus IP respectively. They are also known for their NoCStudio software which uses artificial intelligence (AI) techniques to synthesize and analyze a NoC based on different security, power, performance, area and quality of service (QoS) requirements.

NoCStudio works exceptionally well, assuming you have good estimations of the types of network traffic that will be seen between various master-slave combinations of the SoC. The better the understanding of the expected network traffic, the better job NoCStudio can do to come up with an optimized NoC solution for your SoC. Depending on the end application however, this can be more complex than it first sounds. Obviously, designers can simulate the system but keeping track of the myriad of interactions going on between master-slave combinations while accounting for factors such as dynamically changing traffic profiles and varying workloads is not trivial. This approach requires greater insight into the interactions among the various subsystems and components on the chip itself.
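
To make the traffic-estimation point concrete, here is a minimal sketch of how per-link bandwidth demand might be aggregated from a master-slave traffic matrix before NoC synthesis. The names, numbers, and the grouping model are illustrative assumptions, not NetSpeed's actual algorithm.

```python
# Hypothetical sketch: estimating per-slave bandwidth demand from a
# master->slave traffic matrix prior to NoC synthesis. All names and
# numbers are illustrative, not NoCStudio's actual model.

# Peak traffic estimates in GB/s for each (master, slave) pair.
traffic = {
    ("cpu", "dram"): 12.0,
    ("gpu", "dram"): 25.0,
    ("dma", "dram"): 6.0,
    ("cpu", "sram"): 4.0,
}

def demand_per_slave(traffic):
    """Aggregate the peak bandwidth each slave port must sustain."""
    demand = {}
    for (_master, slave), gbps in traffic.items():
        demand[slave] = demand.get(slave, 0.0) + gbps
    return demand

demand = demand_per_slave(traffic)
# A slave link is oversubscribed if aggregate demand exceeds its capacity.
link_capacity = {"dram": 38.4, "sram": 16.0}
oversubscribed = {s: d for s, d in demand.items() if d > link_capacity[s]}
```

Even this toy version shows why good traffic estimates matter: get the matrix wrong and the synthesized network is sized against the wrong bottlenecks.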

NetSpeed addressed this challenge by announcing that they have teamed up with UltraSoC, a technology startup based in Cambridge, UK that specializes in embedded analytics IP. UltraSoC and NetSpeed have integrated UltraSoC’s monitors, debug ports, and analytical reporting to work seamlessly with NetSpeed’s IP and NoCStudio software. UltraSoC’s IP lets designers intelligently monitor and control the activity of on-chip structures such as custom logic, buses and CPU cores. This data can be used to better understand the system interactions within the SoC, revealing hard-to-find bugs, increasing quality and removing development risks and potential liability costs.

UltraSoC’s IP is modular, hierarchical and scalable and consists of three classes of IP. The first is made up of analytic modules that enable monitoring and control of system components. The second is a dedicated messaging infrastructure fabric that connects all the UltraSoC components. The third is a communicators interface module that connects the UltraSoC system to on-chip or external systems. Because UltraSoC’s IP is modular and scalable, it is an exceptionally good fit with NoCStudio, as it has the degrees of freedom required to allow NoCStudio’s synthesis engine to make the necessary system-level trade-offs.

A key feature of the joint solution is that the UltraSoC IP gives SoC designers an unprecedented capability for post-silicon validation and debug of the SoC when it is first coming off the manufacturing line. The IP gives engineers much better insight into how the system and NoC are really behaving when running the SoC for the first time. Designers can monitor a plethora of different traffic patterns through the system and get real-time data on system performance by having the embedded monitors report data through the communicators module, either to embedded software or to the outside world through different types of interfaces.

While the partnering companies are not necessarily advertising the capability, you’ve got to wonder how long it will be before a clever designer uses the joint solution to parameterize the NoC and system at design time and then use the embedded monitors and analytics to configure the system at run time through the UltraSoC messaging infrastructure. Imagine a system that can be monitoring itself using the analytic data and then adjusting its NoC setup to achieve software defined requirements based upon the application being handled by the SoC. Embedded software could be monitoring the SoC’s performance for things such as how long it is taking to make DMA transfers, or looking for mismatches between processors and dedicated hardware accelerators during certain compute intensive tasks and then take action to temporarily optimize the network traffic paths, all the while ensuring that QoS of other paths is being maintained.
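
As a thought experiment, the run-time tuning loop described above might look something like this: embedded software reads latency counters and nudges a QoS weight when a path misses its target. The monitor values, thresholds and adjustment policy here are all hypothetical.

```python
# Illustrative sketch of run-time NoC tuning: embedded software samples
# a latency counter (here, fake samples) and adjusts a QoS weight when
# the DMA path misses its target. Names and thresholds are invented.

TARGET_DMA_LATENCY_NS = 500

def adjust_qos(qos_weight, latency_samples_ns, step=1, max_weight=15):
    """Raise the DMA path's QoS weight while observed latency misses target,
    and lower it when the path has ample headroom."""
    avg = sum(latency_samples_ns) / len(latency_samples_ns)
    if avg > TARGET_DMA_LATENCY_NS and qos_weight < max_weight:
        return qos_weight + step  # prioritize the DMA path a bit more
    if avg < TARGET_DMA_LATENCY_NS * 0.5 and qos_weight > 0:
        return qos_weight - step  # give bandwidth back to other paths
    return qos_weight

w = adjust_qos(4, [620, 700, 580])  # observed latency too high -> weight rises
```

A real implementation would of course have to bound the adjustments so that the QoS guarantees of other paths are never violated, as the article notes.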

Combining UltraSoC’s IP with NetSpeed Systems’ NoCStudio and NoC IP is a very complementary and smart connection. It enables SoC designers to embed structures that will let them more quickly bring up the SoC and then integrate the chip into various systems by looking at data from the chip itself. And, as alluded to earlier, the system has legs to enable some truly unique capabilities in the future for software-defined NoCs (SDNoCs). While UltraSoC’s IP is scalable and parameterized, what better way to make use of those features than to have NoCStudio’s synthesis engines use them during its trade-offs. Add the capability to synthesize in flexibility based on embedded system monitors and you’ve got a system that can literally tune itself to its workload. Very impressive.

If you want to find out more about this new approach, NetSpeed Systems and UltraSoC will be hosting a joint webinar to discuss the solution in more detail. The webinar, entitled “Debug, Analytics, NoC, and beyond… Exploring uncharted galaxies of interconnects!”, will take place on November 2 at 17:00 GMT (UK time). To see more details of the webinar and to register for the event use this link.

See also:
NetSpeed Systems offerings
UltraSoC offerings


Arm 2017 TechCon Keynote Simon Segars!

by Daniel Nenni on 10-27-2017 at 7:00 am

Now that the dust has settled with the Softbank acquisition I must say that Arm is truly a different company. There are now a lot of new faces from outside the semiconductor industry, which is a good thing, and a lot less stress from Wall Street which is an even better thing. Simon can now wear whatever he wants without the worry of lowering the stock price…

Simon’s keynote “Humanizing Technology” was very interesting and the panel session with Cyber Psychologist Dr. Mary Aiken is definitely worth your time when it goes up for replay. Simon mentioned that October is National Cyber Security Awareness Month (NCSAM), which I did not know. In fact, nobody else in the audience seemed to know either, so that is a problem. NCSAM is sponsored by Homeland Security, our tax dollars at work…


Simon also covered the new Arm IoT Security Manifesto which you can download:

A battle is raging to keep systems secure as we race to realize the immense value data insights can bring. As part of this battle, technology companies have a responsibility to society that extends beyond just delivering products. In our Manifesto document, we describe how the threat to the data-driven world is increasing and detail technical directions we can follow to confront that risk. Beyond that, we explore the nature of that responsibility as guardians of the Information Revolution and discuss the Social Contract all technology providers need to rally behind.

It is a quick but important read, especially if you have children or grandchildren on the way. The foreword is by Mary, and that led me to her book “The Cyber Effect,” which is now in my Kindle library. Unfortunately, my children are oversharing millennials and a lost cause when it comes to privacy and security, so I will focus on my grandchildren.

Dr Mary Aiken is the world’s leading expert in forensic cyberpsychology – a discipline that combines psychology, criminology and technology to investigate the intersection between technology and human behaviour. In this, her first book, Aiken has created a starting point for all future conversations about how the Internet is shaping our perception of the world, development and behaviour, societal norms and values, children, safety and security…

Security was mentioned in every one of the keynotes I attended and I would say it was the most popular topic of discussion on the show floor. It really is daunting when you think about a trillion devices on the internet just waiting to be hacked. Even if you are diligent about security (as I am) you may still fall prey to one of your inner circle (friends and family) who got hacked.

The other big concern is bandwidth. Not only are more devices being added to the internet everyday, much more data per device is being generated. When AI starts to hit our silicon the amount of data will increase exponentially causing data jams of epic proportions.

If you put these two things together:

“Cybersecurity is a mess and the bad news is unless we do something it’s going to get worse.” Simon Segars, Arm TechCon, 2017.

Absolutely.


Navigating the System-in-a-Package Manufacturing Ecosystem

by Mitch Heins on 10-26-2017 at 12:00 pm

Being an old ASIC physical design guy, I tend to think of ASICs from a “bond-pads-in” perspective. This week however, I had a very eye-opening discussion with Dan Leung, Director of Packaging and Assembly for Open-Silicon, that totally changed my perspective. While I had been exposed many times to the concept of systems-in-a-package (SiPs) I had never thought of it from the view point of an ASIC or IP provider. The point to be made here is that one can’t afford to think “pads-in” ASICs anymore.

The more-than-Moore effect has resulted in a very robust manufacturing ecosystem for SiPs. As a result, ASIC and IP vendors alike really need to be thinking about the full system-in-package solution. In my conversation with Dan, he walked me through a presentation done by Open-Silicon at the 24th annual IEEE Electronic Design Process Symposium (EDPS) on Efficient Design and Manufacturing that was held last month in Milpitas, CA. The presentation was entitled “High Volume Manufacturing Supply Chain Ecosystem for 2.5D HBM2 ASIC SiPs”. While the presentation focused on manufacturing for HBM2-based systems, it quickly became apparent that this ecosystem is key to enabling not only high-bandwidth memory applications, but also the quickly growing internet-of-things (IoT) market. SiPs have gone mainstream and how you build your IP and ASICs will be highly dependent upon how you plan to manufacture your SiPs.

As an example, Open-Silicon recently released a HBM2 memory control IP subsystem. When doing this IP, they went through the process of designing their own HBM2 SiP so that they could understand the trade-offs that must be made during the design process.

It turns out that there are many challenges to properly designing a SiP and the number of players in the ecosystem with whom you must work is daunting to say the least. It includes foundries, interposer foundries, OSATs (outsourced assembly and test companies), ASIC and IP houses, known good die (KGD) vendors, package vendors, test companies and EDA vendors.

What Open-Silicon found in doing their IP design is that it is key to understand the manufacturing ecosystem and the impact it will have on your design. You can’t afford to be only thinking pads-in; instead, you must also be thinking about the constraints the silicon interposer and package will place on the complexity and cost of your design.

In Open-Silicon’s HBM2 memory subsystem, they spent a lot of time optimizing the pad locations and drivers of their ASIC IP so that they could meet the stringent HBM2 interface specs while minimizing the footprint and cost of their interposer. Open-Silicon also had to think about how to make their design IP as agnostic as possible to different design rules from the various interposer manufacturers so that their IP could be readily usable in both foundry and OSAT manufacturing flows.

Open-Silicon also found it key to consider the proposed interposer’s complexity. State-of-the-art manufacturing enables wafer-level testing of die, so you have known good die before assembly. The interposer, however, is another story. Interposers play an important role in the overall yield and cost function of the SiP. Silicon interposers are not bleeding-edge technology in terms of printing; in terms of assembly, however, they are unique in that interposers with through-silicon vias (TSVs) must go through many more manufacturing steps to thin down the interposer (and in some cases the system die as well, if you are stacking die on the interposer). These ultra-thin dice are easily deformed, require special assembly techniques and are highly susceptible to yield loss.

Additionally, since interposers don’t have active devices on them, testing can be problematic. On an interposer die the size of a reticle field, there can be hundreds if not thousands of traces running through a two- to three-level metallization. The large die size can negatively impact yield both in terms of manufacturing defects and handling defects. Interposers bigger than the reticle field can be costly to print, and the pitches, even at 65nm, are fine enough that it can be very expensive to build probe cards capable of testing every trace through the interposer.
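
The yield sensitivity of a large die can be seen with a back-of-the-envelope Poisson yield model, Y = exp(-A·D0). The defect density and areas below are made-up numbers for illustration, not any foundry's actual data.

```python
import math

# Back-of-the-envelope Poisson yield model, illustrating why a
# reticle-sized interposer is yield-sensitive. The defect density (D0)
# and die areas are invented numbers, not real foundry data.

def poisson_yield(area_cm2, d0_defects_per_cm2):
    """Fraction of die expected to be defect-free: Y = exp(-A * D0)."""
    return math.exp(-area_cm2 * d0_defects_per_cm2)

small_die = poisson_yield(0.5, 0.1)   # a modest 0.5 cm^2 die yields well
interposer = poisson_yield(7.0, 0.1)  # a ~reticle-sized interposer yields far worse
```

The same defect density that barely dents a small die's yield cuts a reticle-sized interposer's yield roughly in half in this toy model, which is why interposer test strategy matters so much.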

To keep costs down, manufacturers put test structures on the interposer and use those to test the overall manufacturing process. The interposer function however is usually not fully tested until the interposer can be placed onto the package substrate with at least one of the known good die. It’s at that point that you have electrical signals that can be generated by the die along with a probe fixture that can be easily used on a tester. The bad news is that if you have a bad interposer, you likely just wasted an expensive known good die and possibly the package. Having a well thought out test strategy that can be used to check the interposer before adding the most expensive die can save you a lot of money.

So, how do you navigate the fast waters of this new SiP manufacturing ecosystem? The answer is to work with those who have traveled those paths before you. Working with a company like Open-Silicon who has gone through the SiP design, manufacturing and testing process multiple times with multiple different vendors in the ecosystem can mitigate a lot of risk, and save you a lot of time and money, especially if this is your first SiP design.

For ASIC designers who have had a pads-in mentality, it’s time to wake up and start drinking your early morning coffee with companies like Open-Silicon who can help you navigate the new frontier of the SiP manufacturing ecosystem.

About Open-Silicon
Open-Silicon transforms ideas into system-optimized ASIC solutions within the time-to-market parameters desired by customers. The company enhances the value of customers’ products by innovating at every stage of design — architecture, logic, physical, system, software and IP — and then continues to partner to deliver fully tested silicon and platforms.

See Also:
Open-Silicon web site
Electronic Design Process Symposium (EDPS)


Good Library Hygiene Takes More Than an Occasional Scrub

by Bernard Murphy on 10-26-2017 at 7:00 am

You don’t shower only before you have to go to an important meeting (teenagers excepted). Surgical teams go further, demanding a strict regimen of hygiene be followed before anyone is allowed into an operating room. Yet we tend to assume that libraries and physical IP (analog, memories, other physical blocks) are checked and pronounced clean by the provider and thereafter require no further hygiene-checks.


That view is based on a presumption that libraries and physical IP were somehow frozen in time and were perfectly checked (or, more likely, that that is somebody else’s problem). In fact, library and other physical IP (and even hardened digital IP) are just as subject to change (and errors) as any soft IP. Bugs in design and characterization are found and fixed, parametric models are improved and models are updated to reflect process and design refinements. As a result, it is common that careful teams walk a library hygiene path close to surgical expectations to minimize surprises.

Here I’m not thinking about the functionality and general parametrics of the IP, but more the consistency, completeness and basic reasonableness of the library models. If a supplier or an internal group gives you a NAND gate or a PHY or a memory which doesn’t function as advertised, you have to start a different discussion. But you should be able to detect and demand correction to bad library models before they contaminate active designs.

Fractal Technologies have just published an entertaining white paper detailing the daily routine of a user of their Crossfire product in ensuring that good library hygiene is maintained. They illustrate this based on a new rev of a design they call Enigma-II, very successful in the first rev, now being ported to a smaller process node with a few additional interfaces. And naturally they have a short window to release this updated product.


The library verification engineer’s day (the engineer is JT Kirk in the WP, next update hoping to see Michael Burnham) starts by checking the nightly regression – sounds familiar. In Enigma-I the design team ran into a bunch of library inconsistencies, some caught late in design. Now using Crossfire, Jim can quickly detect mismatches between views or missing pin-labels in rarely-used corner-case files and fire off a “fix ASAP” note to the IP owners, with all relevant details.

Jim also has to complete the Liberty power model for an IP with 7 different power domains and nearly 100 power terminals. Lots of opportunities for mistakes in power arcs and power pin attributes. Getting this right requires careful checking between Spice and Liberty files with power-domain-aware schematic support using SpiceVisionPro (from one of my favorite companies, Concept Engineering). Jim finds a problem in the schematic which hadn’t been caught in Spice testing completed so far. Note that here he caught not a characterization bug but a design bug – potentially much more damaging.


Then Jim has to inspect a new foundry library update. Who knows what changes this might represent, across hundreds of cells and hundreds of process corners? With Crossfire, Jim can fire off regression runs across a server farm to retest all required checks, yet allow for some acceptable variation in parameters like terminal capacitances for example. A challenge here is that this many checks across hundreds of cells and corners could lead to a deluge in violations from a few root causes. Crossfire has a neat way to visualize such problems through what they call an error fingerprint, to quickly identify a possible root cause for multiple violations. Once isolated, he can start a discussion with the design team and possibly the foundry. No need for surprises at signoff – significant changes become visible immediately.
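
The fingerprinting idea is essentially grouping raw violations by a shared signature so one root cause surfaces once instead of hundreds of times. A minimal sketch of that grouping, with an assumed violation record format (Crossfire's actual data model is certainly richer):

```python
from collections import Counter

# Sketch of the "error fingerprint" idea: collapse many raw violations
# into a few signatures so a single root cause shows up once. The
# violation fields and grouping key are assumptions for illustration.

violations = [
    {"cell": "NAND2_X1", "corner": "ss_0p72v", "check": "missing_pin_cap"},
    {"cell": "NAND2_X2", "corner": "ss_0p72v", "check": "missing_pin_cap"},
    {"cell": "NAND2_X1", "corner": "ff_0p88v", "check": "missing_pin_cap"},
    {"cell": "DFF_X1",   "corner": "ss_0p72v", "check": "arc_mismatch"},
]

def fingerprints(violations):
    """Count violations per (check, corner) signature, most common first."""
    return Counter((v["check"], v["corner"]) for v in violations).most_common()

fps = fingerprints(violations)  # the top signature points at a likely root cause
```

Four raw violations collapse to three signatures here; at the scale of hundreds of cells and corners, the compression is what makes triage tractable.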

Enigma-I was a big enough success that Jim’s company wants a second source for the derivative design, so now he has to qualify another library. But he can’t afford to double his effort, so he communicates acceptable quality expectations to that foundry in the Crossfire Transport format; using this the foundry can run all required checks and make corrections as needed, so Jim’s final incoming inspection should always pass clean.

Multiple libraries, multiple updates, frequently updated IP – that’s life in design these days. We all need a process to ensure that what we are getting in these updates is as thoroughly scrubbed as we expect it to be – not occasionally, but every time we get a new drop, because we will be accountable for not finding problems, even if the root-cause was somewhere else. You can read the white paper HERE.


Open source RISC-V ISA brings a new wrinkle to the processor market

by Tom Simon on 10-25-2017 at 12:00 pm

By now most people are quite comfortable with the idea of using an open source operating system for many computing tasks. It speaks volumes that Unix, and Linux in particular, is used in the vast majority of engineering, financial, database, machine learning, data center, telecommunications and many other applications. It was not always so.

The history of commercial operating systems is replete with proprietary OS’s. At first there was tremendous resistance to the idea of using open source for something so fundamental. However, the advantages are pretty clear. One thing that adoption of open source OS’s led to was a reevaluation of where value in the ecosystem resides. RedHat made a successful business model of offering superior support with an open source product. The point being that companies in these markets now go looking for places to add value rather than attempting to generate revenue by locking customers in.

Now you say this is all well and good for software, but what about processors? With the x86 architecture we have seen decades of litigation and conflict. Think of the millions of dollars spent on legal and court costs in the battles over that instruction set architecture (ISA). Indeed, the current licensing arrangement for x86 and its 64 bit variant boggles the mind. Even now Intel is shaking their swords at Qualcomm over ISA emulation of the x86 instruction set.

So the question needs to be asked: where is the value in processor design? Is the ISA a big competitive advantage, or if there was an open source ISA would the value shift to the specific implementation, and would the entire industry benefit by shared development? Well, we are about to find out. And the progress to-date is impressive.

Taking a quick survey of the processor market, we see that the big players are ARM and x86. The x86 ISA is of course divided up between Intel and AMD – just go to Wikipedia to read the whole gory story. There are a number of smaller processors serving the embedded market such as AVR, MIPS, etc. But, for the most part the big players in the ISA market are ARM and x86, both of which have evolved over many years. ARM for its part is trying to move up the food chain into servers, and Intel is trying to move down into the IoT and embedded markets. Each architecture comes with its own baggage and is having to adapt to make their move.

Reduced instruction set computer (RISC) ISA based processors have been around for quite a while, but none of them is enjoying huge commercial success right now. Many years ago, in an effort to create a vehicle for processor design research, computer scientists at Berkeley started working on a non-proprietary RISC ISA. Fast forward many iterations to today and we have the RISC-V initiative. They have published a complete, usable and implementable ISA that is open source with no license and no royalties.

The RISC-V foundation now has over 65 members, including some of the biggest names in semiconductors, hardware and software. The ISA is modular, with a minimum base and standard extensions, as well as provisions for custom extensions. It supports 32, 64 and 128 bit architectures, along with operating modes for User, Supervisor and Machine.

There are bit streams for use in FPGAs, RTL implementations, and off-the-shelf ICs you can buy. One company, SiFive, even has an Arduino-compatible development board available for purchase based on their working silicon. The San Mateo-based SiFive recently presented their latest offerings at the Linley Processor Conference in Santa Clara.

During their presentation SiFive covered many interesting points about RISC-V and their specific implementations. They have partnered with TSMC and have an off-the-shelf implementation of their E310 core available as a part or on their Arduino-compatible development board. The Freedom E310 chip incorporates SiFive’s E31 RISC-V 32-bit core running at over 320 MHz. This specific core, the RV32IMAC, includes the base integer instruction set, the extension for integer multiply and divide, the extension for atomic instructions, the extension for compressed instructions, the privileged ISA specification, and external debug support. It also comes with a 16KB L1 instruction cache, a 16KB data SRAM scratchpad, onboard OTP NVM, and a wide variety of clock and interface options.
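
The core name itself encodes the ISA: "RV32IMAC" means a 32-bit base integer ISA (I) plus the M (multiply/divide), A (atomics) and C (compressed) standard extensions. A tiny decoder makes the naming scheme concrete, using the single-letter extension names from the RISC-V specification (the dictionary below covers only the common letters):

```python
# Decode a RISC-V ISA string such as "RV32IMAC" into its width and
# standard extensions, per the spec's single-letter naming convention.
# Only a subset of the defined extension letters is included here.

EXTENSIONS = {
    "I": "base integer instructions",
    "M": "integer multiply/divide",
    "A": "atomic instructions",
    "F": "single-precision floating point",
    "D": "double-precision floating point",
    "C": "compressed (16-bit) instructions",
}

def decode_isa(name):
    """Split an ISA string like 'RV32IMAC' into (width, extension list)."""
    assert name.startswith("RV")
    width = int(name[2:4])          # "32", "64" or "128"
    exts = [EXTENSIONS[ch] for ch in name[4:]]
    return width, exts
```

So `decode_isa("RV32IMAC")` describes exactly the E31 configuration in the article, while a string like "RV64IMAFDC" would describe a 64-bit Linux-class core.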

The E31 core is also available for integration into SoCs. It is available as an FPGA bitstream for evaluation, or as RTL for synthesis for evaluation prior to full licensing. Moving up their product hierarchy there is the E51, a 64-bit core that they suggest is ideal for applications such as SSD controllers or networking applications.

However, the star of their presentation at the Linley Processor Conference was their new U54-MC core complex, which combines four of their U54 cores with an E51. It is capable of running a full-featured Linux OS. This quad-core processor is suitable for AI, machine learning, networking, gateways and smart IoT devices. In TSMC’s 28nm it runs at 1.5GHz typical.

Here is a summary of the features of SiFive’s U54-MC, which they favorably compare to the ARM Cortex-A35.

The final point is that a system designer might be concerned about the availability of development tools for processors with a new ISA. Because of the interest in RISC-V there has been a lot of development in this area. This is evident if you take a look at the RISC-V GitHub repository at https://github.com/riscv. There is a wide range of support for things like OpenOCD, GNU, Linux, etc. Additionally, SiFive is making sure that users of their cores can access their Freedom Studio, which works on top of the Eclipse IDE. Freedom Studio is available on Windows, Mac and Linux.

SiFive is also bringing a radically different business model to processor IP. They have streamlined the process so you can get the specifications without any NDA. FPGA bit streams are downloadable, and RTL is also easy to get. RISC-V, and SiFive along with it, are gaining a lot of momentum. Any new processor has to compete on technology, but it seems that RISC-V is a solid and stable specification and that SiFive is making big strides in implementation. I look forward to seeing how this plays out in the market. In the meantime, I might just go and order one of their Arduino boards to get some hands-on experience with a RISC-V processor based system. The SiFive website has a lot more information on RISC-V, their own cores and the development tools and environments.


Timing Analysis for Embedded FPGAs

by Tom Dillinger on 10-25-2017 at 7:00 am

The initial project planning for an SoC design project faces a difficult engineering decision with regard to the “margin” that should be included as part of timing closure. For cell-based blocks, the delay calculation algorithms within the static timing analysis (STA) flow utilize various assumptions to replace a complex RC interconnect load after routing and parasitic extraction with an effective capacitance for gate delay modeling. The library characterization data is then used to launch an (effective) waveform at the gate output to calculate the arrival times and slews at the RC network fanout pins. These calculations have an implicit error tolerance that is incorporated into the margins added to the STA flow path delay histograms.
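
One way to see where that error tolerance comes from: the delay of a distributed RC ladder differs from what you would get treating the same total capacitance as a single lumped load. The sketch below uses the classic Elmore delay approximation with arbitrary illustrative values; it is a simplification of what production delay calculators actually do.

```python
# Illustration of RC interconnect modeling error: the Elmore delay of a
# distributed RC ladder versus lumping all R and C together. The segment
# values (100 ohm / 10 fF) are arbitrary illustrative numbers.

def elmore_delay(segments):
    """Elmore delay of an RC ladder: for each segment, its resistance
    times the total capacitance downstream of it."""
    delay = 0.0
    for i, (r, _c) in enumerate(segments):
        downstream_c = sum(c for _r, c in segments[i:])
        delay += r * downstream_c
    return delay

# Three identical segments, 100 ohm and 10 fF each.
segs = [(100.0, 10e-15)] * 3
distributed = elmore_delay(segs)  # 100*(30fF) + 100*(20fF) + 100*(10fF)
lumped = 300.0 * 30e-15           # total R times total C, as one lump
```

The lumped estimate is 50% pessimistic here; effective-capacitance methods exist precisely to close that kind of gap, and the residual inaccuracy is what the STA margins absorb.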

The other day, I was having coffee with Geoff Tate and Cheng Wang from Flex Logix Technologies, providers of embedded FPGA IP on leading process nodes. Naively, I asked what guidelines Flex Logix provides to their customers, in terms of timing margins for the delay calculator and STA reporting features of their eFLEX compiler.

Cheng smiled, and said, “We do not have to provide margins. The reported path timing will accurately reflect what the customers will ultimately measure in silicon.”

He could tell that I looked a little puzzled.

Cheng continued, “An embedded FPGA implementation is different from a typical SoC block physical design. Yes, both approaches utilize a synthesis flow to a target library, followed by placement and routing steps. Yet, whereas timing analysis for the cell-based block has the challenge of modeling the interconnect load for a general fanout network, all the interconnects in an eFPGA are pre-defined. We invest significant resource to accurately characterize all elements of the eFPGA fabric to determine their signal delays and arrival slews — the LUT cells, all the route segments, the logic switches. And, then we confirm those models during our silicon qualification. The accuracy of the timing reports for a customer design is built-in, due to the extensive characterization data that is directly applicable.”

I finally got it. The building blocks of the eFPGA enable detailed characterization to be completed prior to customer release.

“So, how does an eFPGA customer run STA?”, I asked.

Cheng replied, “The eFLEX customer follows the same familiar flow used for a general SoC block. A set of timing constraints is input to define clocks and operating modes. A multi-corner, multi-mode (MCMM) set of scenarios is defined.” (see the figure below)


Geoff added, “The eFLEX compiler exercises synthesis, place, and route for the highest priority MCMM setting, to achieve the optimum implementation. The compiler provides STA path timing results for all the MCMM scenarios.”
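As a rough picture of what an MCMM scenario list looks like, here is a hypothetical sketch in Python. The field names, corner strings, and priority scheme are invented for illustration and are not the eFLEX input format:

```python
# Illustrative MCMM scenario list. Field names and corner labels are
# hypothetical, not the actual eFLEX compiler input format.

from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    corner: str      # process/voltage/temperature corner label
    mode: str        # functional, scan, etc.
    priority: int    # lower number = higher priority

scenarios = [
    Scenario("func_slow", corner="ss_0p72v_125c", mode="functional", priority=0),
    Scenario("func_fast", corner="ff_0p88v_m40c", mode="functional", priority=1),
    Scenario("scan_slow", corner="ss_0p72v_125c", mode="scan",       priority=2),
]

# Implementation is driven by the highest-priority scenario;
# STA results are then reported for every scenario in the list.
primary = min(scenarios, key=lambda s: s.priority)
```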

“Given the pre-qualified characterization detail, STA is simplified to a summation of individual circuit and net segment delays, once switch assignment and routing are complete. eFLEX uses a path-based delay propagation algorithm,” Cheng described. “And, as clock arrival skews are also accurately characterized, any necessary hold time corrections by LUT delay insertion are applied judiciously. As the clock routes in the eFPGA fabric are highly optimized, very little functional delay path padding is typically required, perhaps in DFT scan mode.”
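The summation Cheng describes can be sketched in a few lines. This is a minimal illustration under assumed, invented element delays; real characterization data covers many more element types, corners, and slew dependencies:

```python
# Minimal sketch: path delay as a sum of pre-characterized eFPGA fabric
# element delays, plus hold-time padding by LUT insertion. Element names
# and delay values below are invented for illustration.

ELEMENT_DELAY_PS = {           # per-element characterized delays (ps), assumed
    "lut4": 150.0,
    "switch": 40.0,
    "route_seg_short": 25.0,
    "route_seg_long": 60.0,
}

def path_delay(elements):
    """With every element pre-characterized, path delay is just a sum."""
    return sum(ELEMENT_DELAY_PS[e] for e in elements)

def fix_hold(delay_ps, hold_req_ps, lut_delay_ps=ELEMENT_DELAY_PS["lut4"]):
    """Pad a too-fast path by inserting pass-through LUT delays."""
    luts = 0
    while delay_ps < hold_req_ps:
        delay_ps += lut_delay_ps
        luts += 1
    return delay_ps, luts

# Example: LUT -> short route -> switch -> long route -> LUT
d = path_delay(["lut4", "route_seg_short", "switch", "route_seg_long", "lut4"])
```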

Geoff and Cheng shared some example screen shots of the STA results from the eFLEX compiler. The figures below depict a path delay histogram, and upon selecting a specific path, the detailed breakdown of its individual delay contributions.


There was one caveat that Geoff and Cheng shared. “Recall that an eFPGA design recommendation is to register the I/O signals. Whereas a general SoC block designer may invest significant effort in time budgeting and constraint file settings for inter-block paths, that is not the focus for our customers. They are seeking accurate, predictable register-to-register path timing for the functionality implemented in the programmable eFPGA logic.”

eFPGA IP is certainly unique. The detailed characterization of all the fabric elements enables accurate path timing analysis results and eliminates the need to allocate significant timing margins.

For more information on the Flex Logix eFLEX compiler and the path timing analysis features, please follow this link.

-chipguy


Silicon Creations talks about 7nm IP Verification for AMS Circuits

Silicon Creations talks about 7nm IP Verification for AMS Circuits
by Daniel Payne on 10-24-2017 at 12:00 pm

Designing at 7nm is a big deal because of the costs to make masks and then produce silicon that yields at an acceptable level, and Silicon Creations is one company that has the experience in designing AMS IP such as PLLs, serializer-deserializers, I/Os, and oscillators. Why design at 7nm? Lots of reasons – lower power, higher speeds, longer battery life.
Continue reading “Silicon Creations talks about 7nm IP Verification for AMS Circuits”


DSP-Based Neural Nets

DSP-Based Neural Nets
by Bernard Murphy on 10-24-2017 at 7:00 am

You may be under the impression that anything to do with neural nets necessarily runs on a GPU. After all, NVIDIA dominates a lot of what we hear in this area, and rightly so. In neural net training, their solutions are well established. However, GPUs tend to consume a lot of power and are not necessarily optimal in inference performance (where learning is applied). Then there are dedicated engines like Google’s TPU which are fast and low power, but a little pricey for those of us who aren’t Google and don’t have the clout to build major ecosystem capabilities like TensorFlow.


Between these options lie DSPs, especially embedded DSPs with special support for CNN applications. DSPs are widely recognized to be more power-efficient than GPUs and are often higher performance. Tying that level of performance into standard CNN frameworks like TensorFlow and a range of popular network models makes for more practical use in embedded CNN applications. And while DSPs don’t quite rise to the low power and high performance of full-custom solutions, they’re much more accessible to those of us who don’t have billion-dollar budgets and extensive research teams.

CEVA offers a toolkit they call CDNN (for Convolutional Deep Neural Net), coupling to their CEVA-XM family of embedded imaging and vision DSPs. The toolkit starts with the CEVA network generator, which will automatically convert offline pre-trained networks / weights to a network suited to an embedded application (remember, these are targeting inference based on offline training). The converter supports a range of offline frameworks, such as Caffe and TensorFlow, and a range of network models, such as GoogLeNet and AlexNet, with support for any number and type of layers.

The CDNN software framework is designed to accelerate development and deployment of the CNN in an embedded system, particularly through support functions connecting the network to the hardware accelerator and in support of the many clever ideas that have become popular recently in CNNs. One of these is “normalization”, a way to model how a neuron can locally inhibit response from neighboring neurons to sharpen signals (create better contrast) in object recognition.
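As a rough illustration of the normalization idea, here is an AlexNet-style local response normalization across channels in NumPy. The parameter values are common published defaults, not CEVA’s implementation, and the code is purely a conceptual sketch:

```python
import numpy as np

# Local response normalization (LRN) across channels: each activation is
# damped by the squared activations of its channel neighbors, so strong
# responses locally inhibit weaker neighboring ones. Parameter defaults
# follow common published conventions, not any specific DSP library.

def lrn(x, depth_radius=2, k=2.0, alpha=1e-4, beta=0.75):
    """x: array of shape (channels, height, width)."""
    c = x.shape[0]
    sq = x ** 2
    out = np.empty_like(x)
    for i in range(c):
        lo, hi = max(0, i - depth_radius), min(c, i + depth_radius + 1)
        scale = (k + alpha * sq[lo:hi].sum(axis=0)) ** beta
        out[i] = x[i] / scale
    return out

x = np.random.default_rng(0).random((8, 4, 4)).astype(np.float32)
y = lrn(x)
```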

Another example is support for “pooling”. In CNNs, a pooling layer performs a form of down-sampling, to reduce both the complexity of recognition in subsequent layers and the likelihood of over-fitting. The range of possible network layer types like these continues to evolve, so support for management and connection to the hardware through the software framework is critical.
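The down-sampling a pooling layer performs is easy to show concretely. Here is a minimal 2x2 max-pooling sketch in NumPy (the example image values are invented):

```python
import numpy as np

# 2x2 max pooling with stride 2: each output pixel is the maximum of a
# 2x2 input block, halving each spatial dimension.

def max_pool_2x2(x):
    """x: (H, W) array with H and W even. Returns an (H/2, W/2) array."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.array([[1, 2, 0, 1],
                [3, 4, 1, 0],
                [5, 0, 2, 2],
                [1, 1, 3, 4]])
pooled = max_pool_2x2(img)   # -> [[4, 1], [5, 4]]
```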

This framework also provides the infrastructure for the functions you are obviously going to need in recognition applications, like DMA access to fetch the next tiles (in an image, for example), store output tiles, and fetch filter coefficients and other neural net data.

The CDNN hardware accelerator connects these functions to the underlying CEVA-XM platform. While CEVA don’t spell this out, it seems pretty clear that providing a CEVA-developed software development infrastructure and hardware abstraction layer will simplify delivery of low-power and high-performance for embedded applications on their DSP IP. An example of application of this toolkit to development of a vision / object-recognition solution is detailed above.

Back to the embedded / inference part of this story. It has become very clear that intelligence can’t live only in the cloud. A round-trip to the cloud won’t work for latency-sensitive applications (industrial control and surveillance are a couple of obvious examples) and won’t work at all if you have a connectivity problem. Security isn’t exactly enhanced by sending biometrics or other certificates upstream and waiting for clearance back at the edge. And the power implications are unattractive in streaming the large files required for CNN recognition applications to the cloud. For all these reasons, inference needs to move to the edge, though training can still happen in the cloud. And at the edge, GPUs characteristically consume too much power, while embedded DSPs are battery-friendly. DSPs benchmark significantly better on GigaMACs and on GigaMACs/Watt, so it seems pretty clear which solution you want for embedded edge applications.

To learn more about what CEVA has to offer in this area, go HERE.


IoT Security Hardware Accelerators Go to the Edge

IoT Security Hardware Accelerators Go to the Edge
by Mitch Heins on 10-23-2017 at 12:00 pm


Last month I did an article about Intrinsix and their Ultra-Low Power Security IP for the Internet-of-Things (IoT). As a follow up to that article, I was told by one of my colleagues that the article didn’t make sense to him. The sticking point for him, and perhaps others (and that’s why I’m writing this article) is that he couldn’t see why you would want hardware acceleration for security in IoT edge devices. He wasn’t arguing the need for security. He was simply asking why you would spend the extra hardware area in a cost-sensitive device when you could just use the processor you already have in the device to do the work in software.

I thought this was a good question and one that needed more than a flippant answer from me, so I went back to Intrinsix and had an interesting discussion with Chuck Gershman, director of strategic development at Intrinsix. It turns out the short answer is “power.” Edge devices spend a large percentage of their life not doing much. Many of the newest edge devices run off the tiniest of batteries and use energy harvesting from vibrations, pressure, light, etc. to fuel themselves. To do this, however, they must literally be able to shut themselves down for long periods of time.

So, how do security accelerators help? Well, most IoT edge devices that use their CPUs for security tasks don’t really shut down. They go into a sleep mode that keeps system registers alive so that they don’t lose device state. If you lose device state, the device must do a secure boot when it is time to wake up, and that takes CPUs both time and energy.

Intrinsix has shown that by using their hardware-accelerated IP, they can fully shut down the device and then do a secure reboot in milliseconds instead of the multiple seconds it takes a CPU to do the same thing. By using a dedicated hardware accelerator, they can boot the system up to 800 times faster, and the power saved by being fully shut down instead of simply sleeping can lead to a 1000X power reduction and up to 10X better battery life.
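The arithmetic behind the claim is straightforward duty-cycle math. Here is a back-of-envelope sketch with invented power and timing numbers (they are placeholders, not Intrinsix’s measured figures) showing why a fast reboot makes full shutdown the lower-energy choice:

```python
# Back-of-envelope average power over a wake/sleep cycle: retention sleep
# vs. full shutdown with a fast hardware-accelerated secure reboot.
# All power and timing values are assumed for illustration only.

SLEEP_POWER_W = 1e-4       # retention-sleep draw, assumed
SHUTDOWN_POWER_W = 1e-7    # leakage when fully off, assumed
ACTIVE_POWER_W = 1e-2      # draw while booting/working, assumed

def avg_power(idle_power_w, boot_time_s, active_time_s, period_s):
    """Average power over one cycle: active burst plus idle remainder."""
    busy = boot_time_s + active_time_s
    return (ACTIVE_POWER_W * busy + idle_power_w * (period_s - busy)) / period_s

PERIOD_S = 600.0   # wake every 10 minutes, assumed

# Sleeping device: no reboot cost, but pays retention power all cycle
sleep_avg = avg_power(SLEEP_POWER_W, 4.0, 1.0, PERIOD_S)        # slow CPU secure boot
# Shut-down device: near-zero idle draw, millisecond hardware reboot
shutdown_avg = avg_power(SHUTDOWN_POWER_W, 0.005, 1.0, PERIOD_S)
```

With these placeholder numbers the fully shut-down device averages roughly an order of magnitude less power; the longer the idle interval, the larger the advantage grows.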

Not to be totally rebuffed, my colleague then made the point that we were talking about IoT edge devices that were supposed to cost in the sub $1 / device range (in some cases in the pennies per device range). Hardware accelerators implied bigger die which means higher costs. In one sense my colleague was correct. It’s well known that cloud servers and IoT network hubs are expected to have lots of encrypted traffic as the cloud servers could be dealing with hundreds of network hubs, and the network hubs could be dealing with thousands of edge devices. One would expect to see dedicated security hardware in these devices to handle all the secure connections. The edge device, however, is likely only to be talking to just a few or maybe only one network hub.

Time for another discussion with Chuck who was all too happy to explain that the beauty of the Intrinsix security IP was that it was highly scalable. It turns out when Intrinsix designed their IP, they used an architecture that let them use configurable parallel computing for the security features. This means that they can optimize the design to meet different power, performance, and area (PPA) trade-offs while still giving you the benefit of having hardware acceleration.

So, you can still get the power benefits provided by the accelerators while having a minimal area penalty (which could be insignificant depending on the silicon technology used, pinout and package). And, since the IP is configurable, you can optimize the IP for whatever work load the device is expected to see. For network hubs and servers that means you can significantly boost their performance by adding more parallel compute lanes in the IP.

The last statement from my doubting colleague was, “OK, so it sounds like I have to be a security guru to know how to optimize this IP to make the implied trade-offs.” For this one, I already knew the answer, which was, “No, you don’t.” Intrinsix is a design services firm that has the platforms, process, and people required to ensure first-turn success of your semiconductor project. They already have security expertise in-house and the necessary knowledge to optimize the IP for you. You tell them what you are trying to do and they can generate a fully optimized security IP for the job that is ready to drop into your ASIC. And… If you so desire, they can also help you to embed the IP into your ASIC or do the entire ASIC as well.

So, for those readers who had the same doubts as my colleague, I hope this article has cleared things up. Of course, if you want more details the Intrinsix team will be happy to talk to you.

If you want to learn more about Intrinsix and their IoT offerings, you can find them online at the link below. You may also want to download their IoT eBook.

See also:
eBook: IoT Security: The 4th Element
Intrinsix Fields Ultra-Low Power Security IP for the IoT Market
Intrinsix Website