
Navigating the System-in-a-Package Manufacturing Ecosystem

by Mitch Heins on 10-26-2017 at 12:00 pm

Being an old ASIC physical design guy, I tend to think of ASICs from a “bond-pads-in” perspective. This week, however, I had a very eye-opening discussion with Dan Leung, Director of Packaging and Assembly for Open-Silicon, that totally changed my perspective. While I had been exposed many times to the concept of systems-in-a-package (SiPs), I had never thought of it from the viewpoint of an ASIC or IP provider. The point here is that one can’t afford to think of ASICs as “pads-in” anymore.

The more-than-Moore effect has resulted in a very robust manufacturing ecosystem for SiPs. As a result, ASIC and IP vendors alike need to be thinking about the full system-in-package solution. In my conversation with Dan, he walked me through a presentation Open-Silicon gave at the 24th annual IEEE Electronic Design Process Symposium (EDPS) on Efficient Design and Manufacturing, held last month in Milpitas, CA. The presentation was entitled “High Volume Manufacturing Supply Chain Ecosystem for 2.5D HBM2 ASIC SiPs”. While it focused on manufacturing for HBM2-based systems, it quickly became apparent that this ecosystem is key to enabling not only high-bandwidth memory applications but also the fast-growing internet-of-things (IoT) market. SiPs have gone mainstream, and how you build your IP and ASICs will depend heavily on how you plan to manufacture your SiPs.

As an example, Open-Silicon recently released an HBM2 memory controller IP subsystem. While developing this IP, they designed their own HBM2 SiP so that they could understand the trade-offs that must be made during the design process.

It turns out that there are many challenges to designing a SiP properly, and the number of ecosystem players you must work with is daunting, to say the least. They include foundries, interposer foundries, OSATs (outsourced assembly and test companies), ASIC and IP houses, known good die (KGD) vendors, package vendors, test companies and EDA vendors.

What Open-Silicon learned while designing their IP is that understanding the manufacturing ecosystem, and the impact it will have on your design, is key. You can’t afford to think only pads-in; you must also consider the constraints the silicon interposer and package will place on the complexity and cost of your design.

In Open-Silicon’s HBM2 memory subsystem, they spent a lot of time optimizing the pad locations and drivers of their ASIC IP so that they could meet the stringent HBM2 interface specs while minimizing the footprint and cost of their interposer. Open-Silicon also had to think about how to make their design IP as agnostic as possible to the design rules of the various interposer manufacturers, so that it could be readily used in both foundry and OSAT manufacturing flows.

Open-Silicon also found that it was key to consider the complexity of the proposed interposer. State-of-the-art manufacturing enables wafer-level testing of die, so that you have known good die before assembly. The interposer, however, is another story. Interposers play an important role in the overall yield and cost of the SiP. Silicon interposers are not bleeding-edge technology in terms of printing; in terms of assembly, however, they are unique. Interposers with through-silicon vias (TSVs) must go through many more manufacturing steps to thin them down (in some cases the system die must be thinned as well, if you are stacking die on the interposer). These ultra-thin dice are easily deformed, require special assembly techniques, and are highly susceptible to yield loss.

Additionally, since interposers have no active devices on them, testing can be problematic. On an interposer die the size of a reticle field, there can be hundreds if not thousands of traces running through two to three levels of metallization. The large die size can hurt yield through both manufacturing defects and handling damage. Interposers larger than the reticle field are costly to print, and the pitches, fine even at 65nm, make it very expensive to build probe cards capable of testing every trace through the interposer.
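To make the die-size point concrete, a common first-order approximation is the Poisson yield model, where the fraction of defect-free die falls exponentially with area: Y = exp(-A × D0). The sketch below assumes that model and a hypothetical defect density D0; real interposer yields also suffer from handling damage, which this model ignores.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to be defect-free under a simple Poisson model."""
    return math.exp(-area_cm2 * defects_per_cm2)


# Hypothetical defect density; real values are process- and fab-specific.
D0 = 0.1  # defects per cm^2 (assumed)
# 8 cm^2 is close to a full reticle field (about 26 mm x 33 mm).
for area in (1.0, 4.0, 8.0):
    print(f"{area:4.1f} cm^2 -> yield {poisson_yield(area, D0):.3f}")
```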

To keep costs down, manufacturers put test structures on the interposer and use them to monitor the overall manufacturing process. The interposer function, however, is usually not fully tested until the interposer is placed onto the package substrate with at least one of the known good die. Only then do you have electrical signals that can be generated by the die, along with a probe fixture that can easily be used on a tester. The bad news is that if the interposer is bad, you have likely just wasted an expensive known good die and possibly the package. A well-thought-out test strategy that checks the interposer before the most expensive die is added can save you a lot of money.
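The economics of testing the interposer early can be sketched with a simple expected-cost model. Everything here is an assumption for illustration: the cost figures, the interposer yield, and a hypothetical pre-test that catches a fraction of bad interposers before the known good die and substrate are committed.

```python
def expected_scrap_cost(interposer_yield, interposer_cost, kgd_cost, substrate_cost,
                        pretest_cost=0.0, pretest_coverage=0.0):
    """Expected scrap cost per interposer started (hypothetical cost model).

    A bad interposer caught by the pre-test is scrapped on its own; a bad
    interposer that escapes takes the known good die and substrate with it.
    """
    p_bad = 1.0 - interposer_yield
    caught = p_bad * pretest_coverage
    escaped = p_bad - caught
    return (pretest_cost
            + caught * interposer_cost
            + escaped * (interposer_cost + kgd_cost + substrate_cost))


# Hypothetical inputs: 90% interposer yield, an expensive die, a cheap pre-test.
no_test = expected_scrap_cost(0.90, interposer_cost=50, kgd_cost=300, substrate_cost=30)
with_test = expected_scrap_cost(0.90, 50, 300, 30, pretest_cost=2.0, pretest_coverage=0.9)
print(f"no pre-test: {no_test:.1f}  with pre-test: {with_test:.1f}")
```

With these made-up inputs, skipping the pre-test costs about 38 (in whatever currency unit the inputs use) per interposer started, versus about 10.3 with it; where the crossover lies depends on how expensive the die is relative to the test.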

So, how do you navigate the fast waters of this new SiP manufacturing ecosystem? The answer is to work with those who have traveled these paths before you. Working with a company like Open-Silicon, which has gone through the SiP design, manufacturing and testing process multiple times with many different vendors in the ecosystem, can mitigate a lot of risk and save you a lot of time and money, especially if this is your first SiP design.

For ASIC designers who have had a pads-in mentality, it’s time to wake up and start drinking your early morning coffee with companies like Open-Silicon who can help you navigate the new frontier of the SiP manufacturing ecosystem.

About Open-Silicon
Open-Silicon transforms ideas into system-optimized ASIC solutions within the time-to-market parameters desired by customers. The company enhances the value of customers’ products by innovating at every stage of design — architecture, logic, physical, system, software and IP — and then continues to partner to deliver fully tested silicon and platforms.

See Also:
Open-Silicon web site
Electronic Design Process Symposium (EDPS)


Good Library Hygiene Takes More Than an Occasional Scrub

by Bernard Murphy on 10-26-2017 at 7:00 am

You don’t shower only before you have to go to an important meeting (teenagers excepted). Surgical teams go further, demanding that a strict hygiene regimen be followed before anyone is allowed into an operating room. Yet we tend to assume that libraries and physical IP (analog, memories, other physical blocks) are checked and pronounced clean by the provider and thereafter require no further hygiene checks.


That view is based on a presumption that libraries and physical IP are somehow frozen in time and were perfectly checked (or, more likely, that that is somebody else’s problem). In fact, libraries and other physical IP (and even hardened digital IP) are just as subject to change (and errors) as any soft IP. Bugs in design and characterization are found and fixed, parametric models are improved, and models are updated to reflect process and design refinements. As a result, careful teams commonly follow a library hygiene regimen close to surgical standards to minimize surprises.

Here I’m not thinking about the functionality and general parametrics of the IP, but rather the consistency, completeness and basic reasonableness of the library models. If a supplier or an internal group gives you a NAND gate, a PHY or a memory that doesn’t function as advertised, that is a different discussion. But you should be able to detect bad library models, and demand their correction, before they contaminate active designs.

Fractal Technologies have just published an entertaining white paper detailing the daily routine of a user of their Crossfire product in maintaining good library hygiene. They illustrate this with a new rev of a design they call Enigma-II; the first rev was very successful, and the design is now being ported to a smaller process node with a few additional interfaces. Naturally, there is a short window to release this updated product.


The library verification engineer’s day (the engineer is JT Kirk in the white paper; in the next update I’m hoping to see Michael Burnham) starts by checking the nightly regression, which sounds familiar. On Enigma-I the design team ran into a bunch of library inconsistencies, some caught late in design. Now, using Crossfire, Jim can quickly detect mismatches between views or missing pin labels in rarely used corner-case files and fire off a “fix ASAP” note to the IP owners, with all relevant details.
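Crossfire’s own checks aren’t described in the article, but the basic idea of a cross-view consistency check can be sketched. The example below is a hypothetical stand-in, not Crossfire’s implementation: it assumes each view (Liberty, LEF, Verilog and so on) has already been parsed into a map from cell name to its set of pin names, and it reports cells or pins missing from some view.

```python
def find_pin_mismatches(views: dict[str, dict[str, set[str]]]) -> list[str]:
    """views maps a view name (e.g. 'liberty', 'lef') -> {cell name: set of pin names}."""
    reports = []
    all_cells = set().union(*(cells.keys() for cells in views.values()))
    for cell in sorted(all_cells):
        pins_by_view = {view: cells.get(cell) for view, cells in views.items()}
        # A cell absent from a view entirely is reported once.
        for view, pins in pins_by_view.items():
            if pins is None:
                reports.append(f"{cell}: missing from {view} view")
        # Every pin seen in any view should appear in every view that has the cell.
        present = {v: p for v, p in pins_by_view.items() if p is not None}
        union = set().union(*present.values())
        for view, pins in present.items():
            for pin in sorted(union - pins):
                reports.append(f"{cell}: pin {pin} missing in {view} view")
    return reports


views = {
    "liberty": {"NAND2": {"A", "B", "Y"}, "INV": {"A", "Y"}},
    "lef": {"NAND2": {"A", "Y"}},  # hypothetical: INV missing, NAND2 lost pin B
}
for line in find_pin_mismatches(views):
    print(line)
```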

Jim also has to complete the Liberty power model for an IP with 7 different power domains and nearly 100 power terminals, which leaves plenty of opportunity for mistakes in power arcs and power pin attributes. Getting this right requires careful checking between SPICE and Liberty files, with power-domain-aware schematic support from SpiceVisionPro (from one of my favorite companies, Concept Engineering). Jim finds a problem in the schematic that the SPICE testing completed so far had not caught. Note that what he caught here was not a characterization bug but a design bug, which is potentially much more damaging.
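A related check in a Liberty power model is making sure every signal pin points at a power and ground pin that actually exists. Liberty uses `pg_pin` groups and the `related_power_pin` / `related_ground_pin` attributes for this; the sketch below assumes those have already been parsed into Python dictionaries, and the pin names are made up.

```python
def check_power_pins(pg_pins: set[str], signal_pins: dict[str, dict[str, str]]) -> list[str]:
    """Flag signal pins whose related power/ground pin is missing or undeclared."""
    issues = []
    for pin, attrs in sorted(signal_pins.items()):
        for attr in ("related_power_pin", "related_ground_pin"):
            ref = attrs.get(attr)
            if ref is None:
                issues.append(f"{pin}: no {attr}")
            elif ref not in pg_pins:
                issues.append(f"{pin}: {attr} '{ref}' is not a declared pg_pin")
    return issues


# Hypothetical parsed data for a multi-domain IP.
pg_pins = {"VDD_CORE", "VDD_IO", "VSS"}
signal_pins = {
    "DIN": {"related_power_pin": "VDD_IO", "related_ground_pin": "VSS"},
    "DOUT": {"related_power_pin": "VDD_AON"},  # typo'd domain, no ground pin
}
print(check_power_pins(pg_pins, signal_pins))
```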


Then Jim has to inspect a new foundry library update. Who knows what changes it might contain, across hundreds of cells and hundreds of process corners? With Crossfire, Jim can launch regression runs across a server farm to retest all required checks, while allowing some acceptable variation in parameters such as terminal capacitances. One challenge is that running this many checks across hundreds of cells and corners can produce a deluge of violations stemming from a few root causes. Crossfire has a neat way to visualize such problems, which they call an error fingerprint, to quickly identify a likely root cause for multiple violations. Once the cause is isolated, he can start a discussion with the design team and possibly the foundry. No surprises at signoff: significant changes become visible immediately.
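The tolerance-based comparison and the root-cause grouping can be illustrated with a small sketch. It is not Crossfire’s error-fingerprint algorithm, whose details the article doesn’t give; it simply assumes capacitances keyed by (cell, corner, pin), flags changes beyond an assumed 5% relative tolerance, and counts violations per (pin, kind) so that one systemic cause stands out from the noise.

```python
from collections import Counter


def compare_caps(old, new, rel_tol=0.05):
    """old/new map (cell, corner, pin) -> pin capacitance; return out-of-tolerance keys."""
    violations = []
    for key, old_cap in old.items():
        new_cap = new.get(key)
        if new_cap is None:
            violations.append((key, "missing"))
        elif abs(new_cap - old_cap) > rel_tol * abs(old_cap):
            violations.append((key, "cap_shift"))
    return violations


def fingerprint(violations):
    """Group violations by (pin, kind): many hits on one group hint at one root cause."""
    return Counter((key[2], kind) for key, kind in violations)


# Hypothetical library revisions (capacitances in fF).
old = {("INV", "ss", "A"): 1.0, ("INV", "ff", "A"): 1.0,
       ("NAND2", "ss", "A"): 1.2, ("NAND2", "ss", "B"): 1.1}
new = {("INV", "ss", "A"): 1.2, ("INV", "ff", "A"): 1.03,  # 3% shift is tolerated
       ("NAND2", "ss", "A"): 1.5}
print(fingerprint(compare_caps(old, new)).most_common())
```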

Enigma-I was a big enough success that Jim’s company wants a second source for the derivative design, so now he has to qualify another library. He can’t afford to double his effort, so he communicates his quality expectations to that foundry in the Crossfire Transport format. Using this, the foundry can run all the required checks and make corrections as needed, so Jim’s final incoming inspection should pass cleanly.

Multiple libraries, multiple updates, frequently updated IP: that’s life in design these days. We all need a process to ensure that what we get in these updates is as thoroughly scrubbed as we expect, not occasionally but with every new drop, because we will be held accountable for missed problems even if the root cause was somewhere else. You can read the white paper HERE.


Open source RISC-V ISA brings a new wrinkle to the processor market

by Tom Simon on 10-25-2017 at 12:00 pm

By now most people are quite comfortable with the idea of using an open source operating system for many computing tasks. It speaks volumes that Unix, and Linux in particular, is used in the vast majority of engineering, financial, database, machine learning, data center, telecommunications and many other applications. It was not always so.

The history of commercial operating systems is replete with proprietary OSes. At first there was tremendous resistance to the idea of using open source for something so fundamental. However, the advantages are pretty clear. One thing the adoption of open source OSes led to was a reevaluation of where value in the ecosystem resides. Red Hat built a successful business model by offering superior support for an open source product. The point is that companies in these markets now look for places to add value rather than trying to generate revenue by locking customers in.

Now you say this is all well and good for software, but what about processors? With the x86 architecture we have seen decades of litigation and conflict. Think of the millions of dollars spent on legal and court costs in the battles over that instruction set architecture (ISA). Indeed, the current licensing arrangement for x86 and its 64-bit variant boggles the mind. Even now Intel is rattling its saber at Qualcomm over emulation of the x86 instruction set.

So the question needs to be asked: where is the value in processor design? Is the ISA a big competitive advantage, or, if there were an open source ISA, would the value shift to the specific implementation, and would the entire industry benefit from shared development? Well, we are about to find out. And the progress to date is impressive.

Taking a quick survey of the processor market, we see that the big players are ARM and x86. The x86 ISA is of course divided between Intel and AMD; just go to Wikipedia to read the whole gory story. There are a number of smaller architectures serving the embedded market, such as AVR and MIPS. But for the most part, the big ISA players are ARM and x86, both of which have evolved over many years. ARM is trying to move up the food chain into servers, and Intel is trying to move down into the IoT and embedded markets. Each architecture comes with its own baggage and is having to adapt to make its move.

Many reduced instruction set computer (RISC) ISAs besides ARM have been around for quite a while, but none of them is enjoying huge commercial success right now. Many years ago, in an effort to create a vehicle for processor design research, computer scientists at Berkeley started working on a non-proprietary RISC ISA. Fast forward many iterations to today and we have the RISC-V initiative. They have published a complete, usable and implementable ISA that is open source, with no license fees and no royalties.

The RISC-V Foundation now has over 65 members, including some of the biggest names in semiconductors, hardware and software. The ISA is modular, with a minimal base and standard extensions, as well as provisions for custom extensions. It supports 32-, 64- and 128-bit variants, along with User, Supervisor and Machine privilege modes.

There are bitstreams for use in FPGAs, RTL implementations, and off-the-shelf ICs you can buy. One company, SiFive, even has an Arduino-compatible development board available for purchase based on its working silicon. San Mateo-based SiFive recently presented its latest offerings at the Linley Processor Conference in Santa Clara.

During their presentation SiFive covered many interesting points about RISC-V and their specific implementations. They have partnered with TSMC on an off-the-shelf implementation, the Freedom E310, available as a part or on their Arduino-compatible development board. The Freedom E310 chip incorporates SiFive’s E31 32-bit RISC-V core running at over 320 MHz. The core implements RV32IMAC: the base integer instruction set plus the standard extensions for integer multiply and divide (M), atomic instructions (A) and compressed instructions (C). It also supports the privileged ISA specification and external debug. The chip comes with a 16KB L1 instruction cache, a 16KB data SRAM scratchpad, onboard OTP NVM, and a wide variety of clock and interface options.
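The ISA naming convention makes the modularity easy to see: the RV32/RV64/RV128 prefix gives the register width, and each following letter names a standard extension. The sketch below decodes strings like the E31’s RV32IMAC; it only knows a few single-letter extensions and deliberately ignores the G shorthand and multi-letter Z extensions, so treat it as illustrative rather than a complete parser.

```python
import re

# Single-letter standard extensions (a subset, not exhaustive).
EXTENSIONS = {
    "I": "base integer instruction set",
    "M": "integer multiply/divide",
    "A": "atomic instructions",
    "C": "compressed instructions",
    "F": "single-precision floating point",
    "D": "double-precision floating point",
}


def decode_isa(isa: str) -> tuple[int, list[str]]:
    """Split an ISA string such as 'RV32IMAC' into (register width, extension names)."""
    m = re.fullmatch(r"RV(32|64|128)([A-Z]+)", isa.upper())
    if not m:
        raise ValueError(f"unrecognized ISA string: {isa}")
    xlen, letters = int(m.group(1)), m.group(2)
    unknown = [c for c in letters if c not in EXTENSIONS]
    if unknown:
        raise ValueError(f"extensions not in this sketch's table: {unknown}")
    return xlen, [EXTENSIONS[c] for c in letters]


print(decode_isa("RV32IMAC"))
```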

The E31 core is also available for integration into SoCs, either as an FPGA bitstream for evaluation or as RTL for synthesis, again for evaluation prior to full licensing. Moving up their product hierarchy there is the E51, a 64-bit core that they suggest is ideal for applications such as SSD controllers or networking.

However, the star of their presentation at the Linley Processor Conference was their new U54-MC core, which combines four U54 cores with an E51. It is capable of running a full-featured Linux OS, and SiFive positions this quad-core processor for AI, machine learning, networking, gateways and smart IoT devices. In TSMC 28nm it runs at 1.5GHz typical.

Here is a summary of the features of SiFive’s U54-MC, which they compare favorably with the ARM Cortex-A35.

The final point is that a system designer might be concerned about the availability of development tools for processors with a new ISA. Because of the interest in RISC-V, there has been a lot of development in this area, as you can see in the RISC-V GitHub repository at https://github.com/riscv. There is a wide range of support for tools such as OpenOCD, the GNU toolchain and Linux. Additionally, SiFive makes sure that users of their cores can access their Freedom Studio, which runs on top of the Eclipse IDE and is available on Windows, Mac and Linux.

SiFive is also bringing a radically different business model to processor IP. They have streamlined the process so you can get the specifications without an NDA. FPGA bitstreams are downloadable, and RTL is also easy to get. RISC-V, and SiFive along with it, are gaining a lot of momentum. Any new processor has to compete on technology, but RISC-V appears to be a solid and stable specification, and SiFive is making big strides in implementation. I look forward to seeing how this plays out in the market. In the meantime, I might just order one of their Arduino boards to get some hands-on experience with a RISC-V based system. The SiFive website has a lot more information on RISC-V, their own cores and the development tools and environments.


Timing Analysis for Embedded FPGA’s

by Tom Dillinger on 10-25-2017 at 7:00 am

The initial planning for an SoC design project faces a difficult engineering decision with regard to the “margin” that should be included as part of timing closure. For cell-based blocks, the delay calculation algorithms within the static timing analysis (STA) flow use various assumptions to replace the complex RC interconnect load (after routing and parasitic extraction) with an effective capacitance for gate delay modeling. The library characterization data is then used to launch an (effective) waveform at the gate output to calculate the arrival times and slews at the fanout pins of the RC network. These calculations have an implicit error tolerance, which is covered by the margins added to path delays in the STA flow.

The other day, I was having coffee with Geoff Tate and Cheng Wang from Flex Logix Technologies, providers of embedded FPGA IP on leading process nodes. Naively, I asked what guidelines Flex Logix provides to their customers, in terms of timing margins for the delay calculator and STA reporting features of their eFLEX compiler.

Cheng smiled, and said, “We do not have to provide margins. The reported path timing will accurately reflect what the customers will ultimately measure in silicon.”

He could tell that I looked a little puzzled.

Cheng continued, “An embedded FPGA implementation is different from a typical SoC block physical design. Yes, both approaches use a synthesis flow to a target library, followed by placement and routing steps. Yet, whereas timing analysis for a cell-based block has the challenge of modeling the interconnect load of a general fanout network, all the interconnects in an eFPGA are pre-defined. We invest significant resources to accurately characterize all elements of the eFPGA fabric to determine their signal delays and arrival slews: the LUT cells, all the route segments, the logic switches. We then confirm those models during our silicon qualification. The accuracy of the timing reports for a customer design is built in, due to the extensive characterization data that applies directly.”

I finally got it. The building blocks of the eFPGA enable detailed characterization to be completed prior to customer release.

“So, how does an eFPGA customer run STA?”, I asked.

Cheng replied, “The eFLEX customer will follow a flow familiar from a general SoC block. A set of timing constraints is input to define clocks and operating modes, and a multi-corner, multi-mode (MCMM) set of scenarios is defined.” (see the figure below)


Geoff added, “The eFLEX compiler exercises synthesis, place, and route for the highest priority MCMM setting, to achieve the optimum implementation. The compiler provides STA path timing results for all the MCMM scenarios.”

“Given the pre-qualified characterization detail, STA is simplified to the summation of individual circuit and net segment delays once switch assignment and routing are complete. eFLEX uses a path-based delay propagation algorithm,” Cheng explained. “And, as clock arrival skews are also accurately characterized, any necessary hold time corrections by LUT delay insertion are applied judiciously. As the clock routes in the eFPGA fabric are highly optimized, very little delay padding of functional paths is typically required, perhaps in DFT scan mode.”
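Cheng’s description (summing pre-characterized element delays along a routed path, checking every MCMM scenario, and padding short paths with LUT delays for hold) can be sketched in a few lines. The corner names, element types and delay values are hypothetical, and this is not the eFLEX compiler’s algorithm, just the arithmetic it implies.

```python
# Hypothetical characterized delays (ns) per fabric element, per corner.
DELAYS = {
    "ss_slow": {"clk_to_q": 0.20, "lut": 0.30, "route_seg": 0.12, "switch": 0.05,
                "setup": 0.08, "hold": 0.04},
    "ff_fast": {"clk_to_q": 0.10, "lut": 0.15, "route_seg": 0.06, "switch": 0.02,
                "setup": 0.04, "hold": 0.03},
}


def path_delay(elements, corner):
    """Path-based STA: data arrival is launch clock-to-Q plus the sum of element delays."""
    table = DELAYS[corner]
    return table["clk_to_q"] + sum(table[e] for e in elements)


def setup_slack(elements, corner, period, capture_skew=0.0):
    """Required time (period + capture clock skew - setup) minus data arrival."""
    return period + capture_skew - DELAYS[corner]["setup"] - path_delay(elements, corner)


def hold_luts_needed(elements, corner, capture_skew=0.0):
    """Number of LUT delays to insert so the path meets hold at this corner."""
    table = DELAYS[corner]
    slack = path_delay(elements, corner) - (capture_skew + table["hold"])
    luts = 0
    while slack < 0:
        slack += table["lut"]
        luts += 1
    return luts


path = ["route_seg", "switch", "lut", "route_seg", "switch", "lut", "route_seg"]
worst = min(setup_slack(path, corner, period=2.0) for corner in DELAYS)
print(f"worst setup slack across scenarios: {worst:.2f} ns")
print("hold padding for a short path:", hold_luts_needed(["route_seg"], "ff_fast", capture_skew=0.2))
```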

Geoff and Cheng shared some example screen shots of the STA results from the eFLEX compiler. The figures below depict a path delay histogram, and upon selecting a specific path, the detailed breakdown of its individual delay contributions.


There was one caveat that Geoff and Cheng shared. “Recall that an eFPGA design recommendation is to register the I/O signals. Whereas a general SoC block designer may invest significant effort in time budgeting and constraint file settings for inter-block paths, that is not the focus for our customers. They are seeking accurate, predictable register-to-register path timing for the functionality implemented in the programmable eFPGA logic.”

eFPGA IP is certainly unique. The detailed characterization of all the fabric elements enables accurate path timing analysis results and eliminates the need to allocate significant timing margins.

For more information on the Flex Logix eFLEX compiler and the path timing analysis features, please follow this link.

-chipguy


Silicon Creations talks about 7nm IP Verification for AMS Circuits

by Daniel Payne on 10-24-2017 at 12:00 pm

Designing at 7nm is a big deal because of the cost of masks and of producing silicon that yields at an acceptable level. Silicon Creations is one company with experience designing AMS IP such as PLLs, serializer-deserializers (SerDes), IOs and oscillators. Why design at 7nm? Lots of reasons: lower power, higher speed, longer battery life.


DSP-Based Neural Nets

by Bernard Murphy on 10-24-2017 at 7:00 am

You may be under the impression that anything to do with neural nets necessarily runs on a GPU. After all, NVIDIA dominates a lot of what we hear in this area, and rightly so. In neural net training, their solutions are well established. However, GPUs tend to consume a lot of power and are not necessarily optimal in inference performance (where learning is applied). Then there are dedicated engines like Google’s TPU which are fast and low power, but a little pricey for those of us who aren’t Google and don’t have the clout to build major ecosystem capabilities like TensorFlow.


Between these options lie DSPs, especially embedded DSPs with special support for CNN applications. DSPs are widely recognized to be more power-efficient than GPUs, and they often deliver higher performance. Tying that level of performance into standard CNN frameworks like TensorFlow and a range of popular network models makes them more practical for embedded CNN applications. And while DSPs don’t quite match the low power and high performance of full-custom solutions, they’re much more accessible to those of us who don’t have billion-dollar budgets and extensive research teams.

CEVA offers a toolkit they call CDNN (for Convolutional Deep Neural Net), coupled to their CEVA-XM family of embedded imaging and vision DSPs. The toolkit starts with the CEVA network generator, which automatically converts offline pre-trained networks / weights to a network suited to an embedded application (remember, these are targeting inference based on offline training). The converter supports a range of offline frameworks, such as Caffe and TensorFlow, and a range of network models, such as GoogLeNet and AlexNet, with support for any number and type of layers.
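The article doesn’t say exactly what the network generator does to pre-trained weights. One step commonly needed when moving a floating-point model onto a fixed-point DSP is quantization, sketched below as an assumption rather than CEVA’s method: weights are scaled by 2^frac_bits, rounded and saturated to a signed 8-bit range.

```python
def quantize_weights(weights, frac_bits=7, total_bits=8):
    """Scale by 2**frac_bits, round, and saturate to the signed total_bits range."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return [max(lo, min(hi, round(w * scale))) for w in weights]


def dequantize(values, frac_bits=7):
    """Map fixed-point integers back to floats, e.g. to measure quantization error."""
    return [v / (1 << frac_bits) for v in values]


w = [0.5, -0.25, 1.2, -1.5]
q = quantize_weights(w)
print(q, dequantize(q))
```

Values outside roughly ±1 saturate (1.2 and -1.5 here), which is why practical converters usually choose the scaling per layer or per channel from the weight statistics.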

The CDNN software framework is designed to accelerate development and deployment of the CNN in an embedded system, particularly through support functions connecting the network to the hardware accelerator and support for the many clever ideas that have become popular recently in CNNs. One of these is “normalization”, a way to model how a neuron can locally inhibit the response of neighboring neurons to sharpen signals (create better contrast) in object recognition.
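One well-known form of this is cross-channel local response normalization as described in the AlexNet paper: each activation is divided by a term that grows with the squared activations of nearby channels, so a strong neighbor suppresses a weaker response. The sketch below uses that paper’s published hyperparameters (n=5, k=2, alpha=1e-4, beta=0.75) as defaults; it says nothing about how CDNN implements normalization.

```python
def local_response_norm(activations, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Cross-channel LRN at one spatial position:

    b_i = a_i / (k + alpha * sum of a_j**2 over channels j within n//2 of i) ** beta
    """
    num_channels = len(activations)
    half = n // 2
    out = []
    for i, a in enumerate(activations):
        lo, hi = max(0, i - half), min(num_channels - 1, i + half)
        energy = sum(activations[j] ** 2 for j in range(lo, hi + 1))
        out.append(a / (k + alpha * energy) ** beta)
    return out


# A strong response in channel 1 damps its neighbors more than distant channels.
print(local_response_norm([0.5, 80.0, 0.5, 0.5, 0.5], n=3, k=1.0, alpha=1e-3))
```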

Another example is support for “pooling”. In CNNs, a pooling layer performs a form of down-sampling, to reduce both the complexity of recognition in subsequent layers and the likelihood of over-fitting. The range of possible network layer types like these continues to evolve, so support for management and connection to the hardware through the software framework is critical.
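Max pooling with a 2x2 window and stride 2 is the most common variant; it keeps the strongest response in each window and quarters the amount of data passed to the next layer. A plain-Python sketch, assuming a single-channel feature map stored as a list of rows:

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; an odd trailing row or column is dropped."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


fm = [[1, 3, 2, 0],
      [4, 2, 1, 5],
      [0, 1, 7, 2],
      [3, 2, 0, 6]]
print(max_pool_2x2(fm))  # 4x4 input -> 2x2 output
```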

This framework also provides the infrastructure for functions you will obviously need in recognition applications, like DMA access to fetch the next tiles (of an image, for example), store output tiles, and fetch filter coefficients and other neural net data.
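The tile-fetch pattern can be sketched as a loop that walks the output in tiles and computes which input region a DMA engine would need to fetch for each one, including the halo of extra pixels a convolution kernel reads around the tile edges. The function below is hypothetical and only computes coordinates; in a real system each input region would be fetched by DMA (often double-buffered) while the DSP works on the previous tile.

```python
def input_tiles(height, width, tile_h, tile_w, kernel=3):
    """Yield (output_tile, input_region) pairs, each as (row, col, h, w).

    A 'same'-padded kernel x kernel convolution reads kernel // 2 extra
    pixels around each output tile, clipped at the image border.
    """
    halo = kernel // 2
    for r in range(0, height, tile_h):
        for c in range(0, width, tile_w):
            h, w = min(tile_h, height - r), min(tile_w, width - c)
            r0, c0 = max(0, r - halo), max(0, c - halo)
            r1, c1 = min(height, r + h + halo), min(width, c + w + halo)
            yield (r, c, h, w), (r0, c0, r1 - r0, c1 - c0)


for out_tile, region in input_tiles(6, 5, 4, 4):
    print("compute", out_tile, "after fetching", region)
```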

The CDNN hardware accelerator connects these functions to the underlying CEVA-XM platform. While CEVA don’t spell this out, it seems pretty clear that providing a CEVA-developed software development infrastructure and hardware abstraction layer will make it simpler to deliver low power and high performance for embedded applications on their DSP IP. An example of application of this toolkit to development of a vision / object-recognition solution is detailed above.

Back to the embedded / inference part of this story. It has become very clear that intelligence can’t live only in the cloud. A round trip to the cloud won’t work for latency-sensitive applications (industrial control and surveillance are a couple of obvious examples) and won’t work at all if you have a connectivity problem. Security isn’t exactly enhanced by sending biometrics or other certificates upstream and waiting for clearance back at the edge. And the power cost of streaming the large files required for CNN recognition applications to the cloud is unattractive. For all these reasons, inference needs to move to the edge, though training can still happen in the cloud. And at the edge, GPUs characteristically consume too much power while embedded DSPs are battery-friendly. DSPs benchmark significantly better on GigaMACs and on GigaMACs/Watt, so it seems pretty clear which solution you want for embedded edge applications.
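The GigaMACs/Watt metric translates directly into energy per inference: divide the network’s MAC count by the efficiency (1 GMAC/W is 10^9 MACs per joule). The MAC count and both efficiency figures below are hypothetical placeholders, not CEVA or GPU benchmark results.

```python
def energy_per_inference_mj(macs_per_inference, gmacs_per_watt):
    """Energy in millijoules; 1 GMAC/W equals 1e9 MACs per joule."""
    joules = macs_per_inference / (gmacs_per_watt * 1e9)
    return joules * 1e3


def average_power_mw(energy_mj, inferences_per_s):
    """Average power in milliwatts at a given inference rate (mJ per second == mW)."""
    return energy_mj * inferences_per_s


MACS = 1.5e9  # hypothetical network size, MACs per inference
for name, eff in (("engine A", 100.0), ("engine B", 500.0)):  # hypothetical GMAC/W
    e = energy_per_inference_mj(MACS, eff)
    print(f"{name}: {e:.1f} mJ/inference, {average_power_mw(e, 30):.0f} mW at 30 fps")
```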

To learn more about what CEVA has to offer in this area, go HERE.


IoT Security Hardware Accelerators Go to the Edge

by Mitch Heins on 10-23-2017 at 12:00 pm


Last month I wrote an article about Intrinsix and their Ultra-Low Power Security IP for the Internet-of-Things (IoT). As a follow-up, one of my colleagues told me that the article didn’t make sense to him. The sticking point for him, and perhaps others (which is why I’m writing this article), is that he couldn’t see why you would want hardware acceleration for security in IoT edge devices. He wasn’t arguing against the need for security. He was simply asking why you would spend extra hardware area in a cost-sensitive device when you could use the processor already in the device to do the work in software.

I thought this was a good question and one that needed more than a flippant answer from me, so I went back to Intrinsix and had an interesting discussion with Chuck Gershman, director of strategic development at Intrinsix. It turns out the short answer is “power.” Edge devices spend a large percentage of their life not doing much. Many of the newest edge devices run off the tiniest of batteries and use energy harvesting from vibrations, pressure, light, etc. to fuel themselves. To do this, however, they must literally be able to shut themselves down for long periods of time.

So, how do security accelerators help? Well, most IoT edge devices that use their CPUs for security tasks don’t really shut down. They go into a sleep mode that keeps system registers alive so that they don’t lose device state. If the device loses state, it must do a secure boot when it is time to wake up, and that costs the CPU both time and energy.

Intrinsix has shown that by using their hardware-accelerated IP, they can fully shut down the device and then do a secure reboot in milliseconds instead of the multiple seconds it takes a CPU to do the same thing. By using a dedicated hardware accelerator, they can boot the system up to 800 times faster, and the power saved by being fully shut down instead of simply sleeping can amount to a 1000X power reduction and up to 10X better battery life.
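Why full shutdown only pays off with a fast boot can be seen with a simple average-power model of a duty-cycled device. All figures here are hypothetical (battery size, power levels, a 2 s software boot versus a 2.5 ms accelerated boot); they are chosen to show the shape of the trade-off, not to reproduce Intrinsix’s measurements.

```python
def battery_life_days(capacity_mah, volts, period_s, active_s, active_mw,
                      idle_uw, boot_s=0.0, boot_mw=0.0):
    """Battery life for a device that wakes once per period (hypothetical model)."""
    idle_s = period_s - active_s - boot_s
    # mW x s = mJ; idle power is given in microwatts, hence the / 1e3.
    energy_mj = active_s * active_mw + boot_s * boot_mw + idle_s * idle_uw / 1e3
    avg_mw = energy_mj / period_s
    return capacity_mah * volts / avg_mw / 24  # mWh / mW = hours -> days


common = dict(capacity_mah=100, volts=3.0, period_s=60, active_s=0.1, active_mw=10)
sleep = battery_life_days(**common, idle_uw=50)  # state-retention sleep, no reboot
sw_off = battery_life_days(**common, idle_uw=0.05, boot_s=2.0, boot_mw=10)
hw_off = battery_life_days(**common, idle_uw=0.05, boot_s=0.0025, boot_mw=10)
print(f"sleep: {sleep:.0f} days, off + software boot: {sw_off:.0f} days, "
      f"off + accelerated boot: {hw_off:.0f} days")
```

With these made-up numbers, shutting down and rebooting in software is worse than just sleeping, while the accelerated boot makes full shutdown several times better than sleeping.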

Not to be totally rebuffed, my colleague then made the point that we were talking about IoT edge devices that are supposed to cost under $1 per device (in some cases pennies per device). Hardware accelerators imply bigger die, which means higher costs. In one sense my colleague was correct. Cloud servers and IoT network hubs are expected to carry lots of encrypted traffic, since a cloud server could be dealing with hundreds of network hubs, and a network hub with thousands of edge devices. One would expect to see dedicated security hardware in these devices to handle all the secure connections. The edge device, however, is likely to be talking to only a few network hubs, or perhaps only one.

Time for another discussion with Chuck, who was all too happy to explain that the beauty of the Intrinsix security IP is that it is highly scalable. When Intrinsix designed their IP, they used an architecture with configurable parallel computing for the security features. This means they can optimize the design for different power, performance, and area (PPA) trade-offs while still giving you the benefit of hardware acceleration.

So, you can still get the power benefits provided by the accelerators while having a minimal area penalty (which could be insignificant depending on the silicon technology used, pinout and package). And, since the IP is configurable, you can optimize the IP for whatever work load the device is expected to see. For network hubs and servers that means you can significantly boost their performance by adding more parallel compute lanes in the IP.
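The scaling argument can be illustrated with a toy PPA model: pick the smallest number of parallel compute lanes that meets a throughput target, then estimate the area. The per-lane throughput, area figures and workloads are hypothetical; the article doesn’t describe the actual Intrinsix architecture.

```python
import math


def lanes_needed(required_ops_per_s, ops_per_s_per_lane, max_lanes=16):
    """Smallest lane count that meets the throughput target (toy model)."""
    lanes = max(1, math.ceil(required_ops_per_s / ops_per_s_per_lane))
    if lanes > max_lanes:
        raise ValueError("throughput target exceeds the configurable range")
    return lanes


def area_mm2(lanes, base_mm2=0.02, lane_mm2=0.01):
    """Shared control logic plus a fixed area per lane (hypothetical figures)."""
    return base_mm2 + lanes * lane_mm2


for device, ops in (("edge sensor", 10), ("network hub", 5000)):
    n = lanes_needed(ops, ops_per_s_per_lane=400)
    print(f"{device}: {n} lane(s), ~{area_mm2(n):.2f} mm^2")
```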

The last statement from my doubting colleague was, “OK, so it sounds like I have to be a security guru to know how to optimize this IP to make the implied trade-offs.” For this one, I already knew the answer: “No, you don’t.” Intrinsix is a design services firm that has the platforms, process, and people required to ensure first-turn success of your semiconductor project. They already have security expertise in-house and the knowledge needed to optimize the IP for you. You tell them what you are trying to do, and they can generate a fully optimized security IP for the job, ready to drop into your ASIC. And if you so desire, they can also help you embed the IP into your ASIC, or do the entire ASIC as well.

So, for those readers who had the same doubts as my colleague, I hope this article has cleared things up. Of course, if you want more details the Intrinsix team will be happy to talk to you.

If you want to learn more about Intrinsix and their IoT offerings, you can find them online at the link below. You may also want to download their IoT eBook.

See also:
eBook: IoT Security The 4th Element
Intrinsix Fields Ultra-Low Power Security IP for the IoT Market
Intrinsix Website


Arm TechCon Preview with the Foundries!
by Daniel Nenni on 10-23-2017 at 9:00 am

This week Dr. Eric Esteve, Dr. Bernard Murphy, and I will be blogging live from Arm TechCon. It really looks like it will be a great conference, so you should see some interesting blogs in the coming days. One of the topics I am interested in this year is foundation IP, and I will tell you why.

During the fabless transformation of the semiconductor industry, semiconductor IP became a key enabler alongside EDA tools and ASIC services. Today, as non-traditional chip companies start designing chips from scratch, Foundation IP (SRAM, Standard Cells, and I/Os) from leading IP companies will again be front and center. And when you want to know the latest about Foundation IP, you talk to the foundries, absolutely.

In case you did not know, one of our leading foundry executives recently moved to semiconductor IP, which will bring a whole new perspective. Kelvin Low started at Chartered Semiconductor, then GLOBALFOUNDRIES, followed by Samsung Foundry, and is now Vice President of Marketing at Arm Physical Design Group, where he will soon celebrate his 20th year in semiconductors. I had lunch with Kelvin recently and he told me what to look for with regard to foundries this week at Arm TechCon, which starts with a free lunch with TSMC, Cadence, Xilinx, and Arm:

Unprecedented Industry Collaboration Delivers Leading 7nm FinFET HPC Solutions
Join us for an ecosystem lunch and joint presentations from our Ecosystem partners focusing on FinFET collaboration! In the first section of this set of four sessions, you will hear how Arm® and its Ecosystem partners delivered industry-leading 7nm FinFET solutions to address applications of the High Performance Computing (HPC) segment. With the implementation complexity at small geometries and more demanding product requirements, it is imperative that the Ecosystem collaborate closely to meet the most stringent system-level performance and power targets. Speakers from TSMC®, Cadence®, Xilinx® and Arm will share details of our combined effort and discuss key challenges and future opportunities.

Transforming Markets with Arm and Intel FinFET Solutions
In the second of four sessions, extend your lunch with us to hear from Arm and Intel® on our new partnership focusing on our collaborative solutions for 10hpm and 22ffl. The second part of the sponsored session covers the joint strategy bringing Arm and Intel Custom Foundry to the ecosystem. Together, we will share our planned journey to enable smart mobile computing on these key process nodes. Speakers from Arm and Intel will also discuss co-optimization of the process technology, and how we will expand the collaboration for broader solutions.

Samsung Foundry Roadmap to Advanced FinFET Nodes
In the third of four sessions, we welcome presenters from Samsung Foundry and Arm. Samsung Foundry will showcase their latest FinFET roadmap at 14nm, 11nm and beyond, including the value proposition and target markets for their advanced nodes. Samsung and Arm will highlight the results of our collaborative efforts in this space with Arm detailing their 14LPP and 11LPP platform offering and support of the Samsung Foundry roadmap for the benefit of the ecosystem.

Arm Physical Design Solutions
In the fourth of four sessions, we invite you to close out your lunch and hear direct from Arm on our physical design solutions for the ecosystem. We will cover cross-foundry roadmaps with a focus on POP™ IP, and bring new optimizations to Arm Cortex™-A cores targeting improved design turnaround time. And we have an exciting announcement for our product availability on DesignStart.

If you would like to meet us at Arm TechCon, message us on SemiWiki and I will make sure it happens. You can meet me in the Open-Silicon booth #918 Wednesday morning, where we will be giving away 300 copies of “Custom SoCs for IoT: Simplified”. It would be a pleasure to meet you. Or you can download the free PDF version.


TSMC: Semiconductors in the next ten years!
by Daniel Nenni on 10-23-2017 at 6:00 am

The TSMC 30th Anniversary Forum just ended, so I will share a few notes before the rest of the media chimes in. The forum was live-streamed on tsmc.com; hopefully it will be available for replay. The ballroom at the Grand Hyatt in Taipei was filled with cameras, semiconductor executives, and security personnel.

Here is the replay

The event started with a video about TSMC over the last 30 years followed by comments from Chairman Morris Chang. The keynotes were by Nvidia CEO Jensen Huang, Qualcomm CEO Steve Mollenkopf, ADI CEO Vincent Roche, ARM CEO Simon Segars, Broadcom CEO Hock Tan, ASML CEO Peter Wennink, and Apple COO Jeff Williams. Next was a panel discussion led by Chairman Morris Chang.

First, the jokes. Jensen Huang was supposed to go first, but his presentation was not ready and Morris roasted him a bit over it. Jensen replied that it took him longer because he actually prepared for the event. Funny, because it was a joke with a bit of truth to it: the other presentations were standard stock. Jensen gave the best presentation, which was all about AI, and AI is in fact the future of semiconductors for the next ten years.

The best joke, however, was in response to a question about legal matters: if AI goes wrong, who is held accountable? Morris pointed out that Steve Mollenkopf probably has the most legal experience of the group, referring to Qualcomm’s massive legal challenges of late. Steve recused himself from the question, of course. Even at 86 years old, Morris still has a quick wit and provided most of the humor for the evening.

As I have mentioned before, AI will touch almost every chip we make in the coming years, which will bring an insatiable compute demand that general-purpose CPUs will never satisfy. This year Apple put a neural engine on the A11 SoC that’s capable of up to 600 billion operations per second. Nvidia GPUs do trillions of operations per second, so we still have a ways to go for edge devices.

A couple more interesting notes: the Apple-TSMC relationship started in 2010 but didn’t produce silicon until the iPhone 6 in 2014. Morris described the Apple-TSMC relationship as intense, but Jeff Williams (Apple) said that you cannot double plan for the volumes of technology that Apple requires, so partnerships are key. My take is that the TSMC-Apple relationship is very strong and will continue for the foreseeable future. Who else is going to be able to do business the Apple (non-competing) way and still make big margins?

Jeff also predicts that medical will be the most disruptive AI application, and Morris agreed, suggesting mediocre doctors will be replaced by technology. This is something I feel VERY strongly about. Medical care is barbaric by technology standards and we as a population are suffering as a result. Apple is focused on proactive medical care versus the reactive care you see in most hospitals. Predicting strokes or heart events is possible today, for example. AI-enabled medical imaging systems are another example for tomorrow.

Security and privacy were discussed, with Apple insisting that your data is more secure on your device than it is in the cloud. Maybe that’s why the new phones have a huge amount of storage (64-256 GB) while free iCloud storage is still only 5 GB. We use a private 1 TB cloud for just that reason, by the way: our data stays in our possession. I certainly agree about security, but privacy seems to be lost on millennials, and they are the target market for most devices.

Bottom line: Congratulations to the TSMC support staff; this event was well done. And congratulations to TSMC for an amazing 30 years. The room was filled with C-level executives and a smattering of media folks like myself. It really was an honor to be there, being part of semiconductor history, absolutely.


Webinar: Optimizing QoR for FPGA Design
by Bernard Murphy on 10-22-2017 at 12:00 pm

You might wonder why, in FPGA design, you would go beyond simply using the design tools provided by the FPGA vendor (e.g. Xilinx, Intel/Altera and Microsemi). After all, they know their hardware platform better than anyone else, and they’re pretty good at design software too. But there’s one thing none of these providers want to support – a common front-end to all these platforms. If you want flexibility in device providers, making a vendor change will force you back to an implementation restart. Which is one reason why tools like Synplify Premier from Synopsys have always had and always will have a market.


REGISTER HERE for this webinar on October 25th at 10am PDT

The other reason is that a company whose primary focus is design software, and which started and still leads the design synthesis market, is likely to have an edge in synthesis QoR, features and usability over the device vendors. Of course, the physical design part of implementation still comes from the vendors, but Synplify tightly couples with these tools, not just in the sense of “you can launch Vivado from Synplify” but also in the sense that you can iteratively refine the implementation, as you’ll see soon.

As an example of what you get in synthesis from a tool in the Synopsys stable, Synplify Premier handles optimization for state machines (including recoding to other styles such as Gray encoding), resource sharing, pipelining and retiming. And of course, it supports DesignWare IP.
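For readers who haven’t met Gray-encoded state machines, here is a minimal sketch of what the encoding means. It is only an illustration of the concept, not how Synplify Premier chooses or applies an encoding. The state names are hypothetical, and the sketch assumes a simple sequential FSM that steps through its states in order, which is the case where binary-reflected Gray code guarantees a single flipped state bit per transition.

```python
from typing import Dict, List


def gray_code(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def gray_state_encoding(states: List[str]) -> Dict[str, str]:
    """Assign Gray codes to states in their listed (sequential) order."""
    width = max(1, (len(states) - 1).bit_length())
    return {s: format(gray_code(i), f"0{width}b") for i, s in enumerate(states)}


def hamming(a: str, b: str) -> int:
    """Number of bit positions that differ between two equal-length codes."""
    return sum(x != y for x, y in zip(a, b))


states = ["IDLE", "LOAD", "RUN", "DONE"]  # hypothetical sequential FSM
enc = gray_state_encoding(states)
print(enc)  # {'IDLE': '00', 'LOAD': '01', 'RUN': '11', 'DONE': '10'}

# Each step, including the DONE -> IDLE wrap-around, flips exactly one bit.
codes = [enc[s] for s in states]
print([hamming(codes[i], codes[(i + 1) % len(codes)]) for i in range(len(codes))])  # [1, 1, 1, 1]
```

Fewer bits toggling per transition can reduce switching activity and glitching on state decode logic; for FSMs with many branching transitions, one-hot or binary encodings may fit better, which is why a tool that can recode FSMs is useful.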

This webinar provides a fairly detailed overview of what is possible using Synplify Premier as your FPGA design front-end. Much of this will be familiar to ASIC designers or to FPGA designers already familiar with tools from device vendors. One topic is optimal RTL coding styles: for FSMs (to optimize for the target device, map away unreachable states, add safe recovery from invalid states or change the encoding), for math and DSP functions to get efficient packing (filters, counters, adders, multipliers, etc.), and for optimized RAM inferencing based on available resources (block RAMs, etc.).
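“Safe recovery from invalid states” is easiest to see in a small model. The sketch below is a Python behavioral model, not RTL and not Synplify output: a hypothetical three-state FSM uses a 2-bit register, so one code (0b11) is unreachable in normal operation. The catch-all branch plays the role of a `default`/`others` case in RTL, sending any illegal code (for example, one caused by a bit upset) back to IDLE instead of leaving the machine stuck. Whether synthesis preserves such a branch rather than optimizing it away as unreachable depends on the tool’s safe-FSM settings.

```python
from enum import IntEnum


class State(IntEnum):
    IDLE = 0b00
    BUSY = 0b01
    DONE = 0b10
    # 0b11 is never reached in normal operation


def next_state(code: int, start: bool, finished: bool) -> State:
    """Next-state logic for the hypothetical FSM, with a safe default."""
    if code == State.IDLE:
        return State.BUSY if start else State.IDLE
    if code == State.BUSY:
        return State.DONE if finished else State.BUSY
    if code == State.DONE:
        return State.IDLE
    # Safe recovery: any illegal code returns the machine to IDLE.
    return State.IDLE


# A corrupted register value recovers on the next clock edge.
print(next_state(0b11, start=False, finished=False).name)  # IDLE
```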

Static timing analysis will look very familiar, except that the Synopsys constraint format is called FDC (FPGA design constraints) rather than SDC. Synplify Premier provides a nice feature to automatically create a quick set of constraints early in the design to help you get through the basic flow-flush. Naturally, you’ll want to develop real constraints (real clocks, clock groups, I/O constraints, timing exceptions, etc.) before you move to physical design.

I mentioned earlier that interoperability between Synplify Premier and the vendor physical design tools isn’t just about compatibility in libraries, tech files and data passed from the synthesis tool to the vendor tool. A great example is in congestion and QoR management. These problems happen for well-known reasons – high resource utilization, over-aggressive constraints, logic packing problems and others.

One particularly important root cause can arise on Xilinx devices that are multi-die (each die is known as a super-logic region, or SLR), mounted on an interposer and connected by super-long line (SLL) interconnects. You already know where this is going; there are only so many SLLs, which means they can be over-used (I assume there might also be reduced timing margin on SLLs). So lots of congestion and timing closure problems can happen, which is no news to implementation experts. What is interesting here, though, is that Synplify Premier can take this information from Xilinx or Intel project files and use it to drive re-synthesis to reduce congestion and timing closure problems. It can also drive many runs in parallel on a server farm so you can quickly explore different implementation strategies. That’s real and very useful interoperability.
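To see why SLL over-use is a budgeting problem, here is a toy sketch that counts how many nets cross each SLR boundary for a given cell-to-SLR assignment. It is a hypothetical illustration, not how Synplify Premier or Vivado analyze congestion. It assumes SLRs are stacked in a line and that a net spanning SLR i through SLR j consumes one SLL on every boundary in between. The cell names, nets and the capacity value are placeholders; real per-boundary SLL counts come from the device documentation.

```python
from collections import Counter
from typing import Dict, List


def sll_usage(nets: Dict[str, List[str]], slr_of: Dict[str, int]) -> Counter:
    """Count nets crossing the boundary between SLR b and SLR b+1.

    Simplifying assumption: a net whose cells span SLR lo..hi uses one
    SLL on each boundary lo, lo+1, ..., hi-1.
    """
    usage: Counter = Counter()
    for cells in nets.values():
        slrs = [slr_of[c] for c in cells]
        for boundary in range(min(slrs), max(slrs)):
            usage[boundary] += 1
    return usage


def overused(usage: Counter, capacity: int) -> Dict[int, int]:
    """Boundaries whose crossing count exceeds the (hypothetical) capacity."""
    return {b: n for b, n in usage.items() if n > capacity}


slr_of = {"a": 0, "b": 0, "c": 1, "d": 2}  # placeholder cell placement
nets = {"n1": ["a", "c"], "n2": ["a", "d"], "n3": ["c", "d"],
        "n4": ["a", "b"], "n5": ["b", "c"]}
usage = sll_usage(nets, slr_of)
print(dict(usage))                    # {0: 3, 1: 2}
print(overused(usage, capacity=2))    # {0: 3}
```

A re-synthesis or re-partitioning step would aim to move logic so that over-budget boundaries (here, boundary 0) carry fewer nets.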

If you’re not familiar with Synplify Premier, this should be a must-see. Remember to REGISTER HERE for this webinar on October 25th at 10am PDT