
Tensilica Vision P6 DSP is Powering Huawei Kirin 970
by Eric Esteve on 11-17-2017 at 7:00 am

Cadence has recently announced two key design-ins for their Vision DSP IP family: MediaTek’s Helio P30 integrates the Tensilica Vision P5 DSP, and HiSilicon has selected the Cadence® Tensilica® Vision P6 DSP for its 10nm Kirin 970 mobile application processor. Because the Kirin 970 is integrated into Huawei’s new Mate 10 Series mobile phones (HiSilicon is a subsidiary of Huawei), we can expect the IC production volumes to be huge. In fact, Huawei ranked #3 in worldwide smartphone shipments in 2017, with a market share closing in on #2 Apple. Let’s take a look at the various Tensilica DSP IP cores and figure out their positioning with respect to imaging, vision processing and emerging applications such as 3D sensing, human/machine interfaces, AR/VR and biometric identification for the mobile platform.


The picture above helps discriminate between the Vision P5, Vision P6 and Vision C5 DSPs. The architecture, a wide vector/SIMD (Single Instruction Multiple Data) machine, is identical across all three; the difference is that the instruction set is optimized for imaging applications in the P5 and P6 and for neural networks (NN) in the Vision C5.

The Vision P5 DSP was released in September 2015. It’s a general-purpose imaging DSP with 64 MACs, optimized for computational photography algorithms, like the Vision P6, but the latter (released in September 2016) adds occasional-use neural network recognition and 4X the MAC count, with 256 MACs. Designers can select an optional 32-way SIMD vector FPU with 16-bit floating point (FP16).
The Vision C5 DSP, released in September 2017, again quadruples the MAC count, with 1,024 8-bit MACs or 512 16-bit MACs. The main difference with the P family is that the complete DSP is optimized to run all NN layers and can run neural networks full time (always-on NN), supporting face, people, object and gesture detection as well as video analysis and AI. The Vision C5 is optimized for vision, radar/lidar and fused-sensor applications and targets the surveillance, drone and mobile/wearable markets.
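To make the MAC counts concrete, here is a minimal, purely illustrative C++ sketch (not Cadence code) of the operation such MAC arrays accelerate: the 8-bit multiply-accumulate inner loop at the heart of neural-network convolutions. A DSP with 1,024 8-bit MAC units can compute on the order of a thousand of these products per cycle instead of one.

```cpp
#include <cstdint>
#include <cstddef>

// Scalar reference for an 8-bit dot product: one multiply-accumulate (MAC)
// per loop iteration. Wide-SIMD DSP hardware executes many of these MACs in
// parallel each cycle; this sketch only shows the arithmetic being parallelized.
std::int32_t dot_product_i8(const std::int8_t* weights,
                            const std::int8_t* activations,
                            std::size_t n) {
    std::int32_t acc = 0;  // wide accumulator so repeated 8-bit products don't overflow
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<std::int32_t>(weights[i]) *
               static_cast<std::int32_t>(activations[i]);
    }
    return acc;
}
```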

I remember the time when the only DSP in a mobile phone was integrated to support the modem… if you look at the Kirin 970 block diagram (above), you realize that digital signal processing is now used intensively throughout a mobile phone.

The global-mode modem is an LTE Category 18 modem with 1.2 Gbps download capability, and it is DSP-based. Dual back sensors are now integrated into mobile phones, and a dual camera increases the computational requirements of the imaging functionality, another DSP-like function, supported by a dual-camera ISP in the Kirin 970. Huawei’s i7 sensor processor is also a DSP-based function, as obviously are the Cadence Tensilica Vision P6 image DSP handling pre- and post-image processing and the HiFi audio processing.

Let’s address the dual-back-sensor capability of the Kirin 970. According to a report from IBS, “Image Sensor and Image Signal Processing” (Sep 2016), dual back image sensors will replace the single back sensor, with shipments reaching a 50%/50% split in 2020. The CMOS Image Sensor (CIS) market is not as well known as the apps processor or modem market, but it weighed in at $12 billion in 2016 and will grow at a 10% CAGR through 2025 (according to IBS and to Yole, a French analyst firm specialized in the sensor industry). As far as I am concerned, I didn’t know that much about this market one year ago, as I discovered it in 2017 while working for an IP start-up targeting CIS, but I can now tell that it will be one of the fastest-growing semiconductor segments, thanks to dual-back-sensor adoption in mobile and also to CIS pervasion in automotive, with new cars integrating from 3 up to 10 cameras! Stay tuned, as the CIS IP segment could be one of the sessions of the next DAC IP track in 2018…

If the phone integrates two back sensors, it makes sense to use a dual ISP to extract the best picture quality (in fact, during a presentation in Berlin in September, Huawei showed a comparison of the same image taken with a Samsung Galaxy S8 and a Huawei Mate 10 Pro, and the Samsung picture was the blurry one). But two ISPs also mean that the vision processor (the Tensilica Vision P6 DSP here) has to be much more powerful than before. Cadence claims the highest per-cycle processing, with 4 vector operations per cycle (each 64-way SIMD), and the widest memory interface at 1024 bits. As the Vision P6 runs at 1.1 GHz on 16nm FF, we may expect it to run even faster in the Kirin 970, which targets 10nm.

I will end with this last picture showing the great energy efficiency of the Tensilica Vision DSPs, up to 25X better than a CPU. More than just low power, energy efficiency will become the key concern for the semiconductor devices of the future… and yes, we will most probably also address this topic during the IP sessions at DAC 2018!

For how MediaTek is using the Vision P5 DSP in their Helio P30 SoC, you can find some information on AnandTech here:

http://www.anandtech.com/show/11770/mediatek-helio-p23-helio-p30-midrange-socs

By Eric Esteve from IPnest


Scale the tools not your expectations
by Frederic Leens on 11-16-2017 at 12:00 pm

The complexity of silicon chips is exploding. Actually, it has been growing at a tremendous speed for decades. So far, the semiconductor industry has been successful at providing new ways to master new levels of complexity, over and over again.

Standardizing hardware platforms, using higher-level languages with a knowledge of the underlying hardware (like OpenCL), heavily reusing IP, simulating from the system level down to the gate level, speeding up our EDA software, leaving the bulk of functionality to software…

All this has been a relentless quest for new strategies to master complexity.

We all know that this is not always a rosy picture, however. Despite adopting new strategies and deploying new tools, making an ASIC, an SoC or an FPGA ready for production is a long path paved with sweat, doubts and excitement too.

How about ‘disillusion’? Maybe that as well.

We, FPGA engineers, have run into quite a lot of it. For some time, we thought that we had been blessed among electronic engineers because, well, the chip can be changed and fixed in the field, right? No worries about the high NRE cost, no risk of wasting hundreds of thousands of dollars because of a respin.

Using a prototype provides much faster execution than simulation and reveals the imperfections of the models we use in simulation.

In the end, we can just take a sample in the field before production, so there is no risk of any unforeseen behavior…

Right? Right?

Well, not quite. Prototyping a complex chip on an FPGA is a good approach only if you can get reasonable visibility out of it. Once on a board, a chip is fast and essentially opaque.

Should we give up debugging and analysis on FPGA prototypes because we lack visibility? Actually, using FPGA prototypes is not the problem. The problem is that the tools we use with them did not scale with FPGAs’ gigantic complexity.

‘Did not’ scale, you say?

Please watch the recording below: it shows a capture of live data from inside an FPGA deployed in the field, spanning more than 1 hour – far more than what simulation usually shows – and this in real conditions.

For more information, you can contact me or go to: www.exostivlabs.com

Thank you for reading –

-Frederic Leens

Exostiv Labs provides innovative solutions for debugging FPGAs. Our software/hardware products provide up to 100,000 times more observability than the usual embedded instrumentation solutions, at the FPGA’s speed of operation. Today’s FPGA complexities require a new generation of debugging tools. Exostiv Labs focuses on reaching ultra-large observability while preserving the target FPGA’s memory and I/O resources. This enables new debugging scenarios, with extended reach in time and unprecedented gigabyte-range recordings of debug information.


High-Level Design for Automotive Applications
by Bernard Murphy on 11-16-2017 at 7:00 am

Automotive markets have added pressure on semiconductor/systems design through demand for ISO 26262 compliance – this we all know. But they have also changed the mix of important design types. One class of design that has become very significant in ADAS, and ultimately autonomous applications, is image signal processing (ISP). Collision avoidance, pedestrian detection, lane departure and many other features depend on processing images from a variety of sources. In all these cases, fast response is essential, especially in image preconditioning, where hardware-based solutions will typically have an edge.


These functions handle a wide range of operations: defect correction, noise filtering, white balance, sharpening and many others. These can be handled through sequences of custom-tuned algorithms which are data-processing-centric rather than control-centric, and so particularly well suited to high-level synthesis (HLS).
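As a purely illustrative sketch (not ST’s code, with hypothetical image dimensions and no vendor pragmas), this is the kind of loop-based C++ that HLS tools can synthesize into hardware – here a simple 3x3 sharpening kernel of the sort listed above:

```cpp
#include <cstdint>
#include <algorithm>

// Hypothetical frame size chosen for the example; a real ISP block would be
// parameterized and pipelined via tool-specific directives, omitted here.
constexpr int W = 640, H = 480;

// 3x3 sharpening: centre pixel weighted 5, the four direct neighbours -1.
// Border pixels are left untouched to keep the sketch short.
void sharpen3x3(const std::uint8_t in[H][W], std::uint8_t out[H][W]) {
    for (int y = 1; y < H - 1; ++y) {
        for (int x = 1; x < W - 1; ++x) {
            int v = 5 * in[y][x]
                  - in[y - 1][x] - in[y + 1][x]
                  - in[y][x - 1] - in[y][x + 1];
            out[y][x] = static_cast<std::uint8_t>(std::min(255, std::max(0, v)));  // clamp to 8 bits
        }
    }
}
```

The fixed pixel and accumulator word widths in a kernel like this are exactly the kind of architectural knobs discussed below.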

I check in on HLS periodically to see how usage is evolving. You can look at this from a generic point of view – will it eventually replace RTL-based design across the majority of designs? I’m not sure that perspective is very illuminating. Methods often change not because a better way emerges to do exactly the same thing, but because needs change and a new method better handles those new needs. In this case, automotive needs may be a stimulus for change.

ISP is one example. If you need the highest performance along with the lowest power and cost, you want a hardware-based solution, but this needs careful co-design with the associated image-processing software. Getting to the best possible solution also demands a lot of experimentation across architecture options, for example word widths, pipelining and operand sharing.

This is where HLS can really shine. Once you have got through the learning curve (with C++ or SystemC) and you have built up a library of reusable templates, development, verification and maintenance for a new IP are much faster, in part because there are simply fewer lines of unique code than would be required for the same IP in RTL. This isn’t just my view. In a recent webinar, the Imaging group at ST said that over the last 3 years they have built a library of 50+ IPs, ranging from 10K gates to 2M gates, with a team of fewer than 10 designers. Naturally the templates are an important part of this and are, I would guess, fairly application-specific. But once they are built up, it seems new IPs can be added or adapted quite quickly. This is a pay-now-or-pay-more-later proposition.

The payback in this flow is quite compelling. First, an IP group is able to deliver a model for basic integration testing to the SoC verification team very quickly. After that the IP team goes into detailed functional design for the IP, while exploring architecture tradeoffs and synthesizing to RTL (they are using Mentor Catapult). Their verification methodology is a very interesting aspect of this stage. First, the team uses the same UVM testbench for both the high-level model and the generated RTL, which means that at RTL no new verification development should be required, and indeed they find that once the C-level model verifies clean, the generated RTL also verifies clean.

Second, these testbenches are largely developed by the IP designers (with verification experts jumping in to handle special cases). Nice idea. Few product teams are swimming in under-utilized resources (and if they are, that’s probably not a good thing). Better leveraging the resources you have is always a plus, in this case getting more of the verification assets (TB, sequences, constraints, etc.) developed within the design team.

In the final phase of IP development, the team focuses on PPA optimization. Again, the high-level nature of the design provides a lot of flexibility to make late-stage changes, in architecture if needed, to get to the most competitive solution that can be delivered. Unlike late-stage changes in RTL, which can be very disruptive, here there’s no drama. HLS simply regenerates the RTL, also adapting to parameter-controlled option changes as needed, and the same UVM testbench is again used to test the generated RTL, a very quick process since it has already been proven on the C-level design.

The ST speaker wraps up with a few other observations. Following their methodology, they have been able to reduce total development/verification time on a typical IP by nearly 70%. And by doing the bulk of their verification development at the C++ level (where verification runs much faster), they are able to run thousands of tests in minutes rather than the hours that would be required at RTL, which means they can get to coverage closure much faster.

ST has been in the ISP business for a long time so their suggestions have to be considered expert. If you want to learn more about how they are using Mentor Catapult, you can read the white-paper HERE and view the webinar HERE.


IoT and Blockchain: Challenges and Risks
by Ahmed Banafa on 11-15-2017 at 12:00 pm

The Internet of Things (IoT) is an ecosystem of ever-increasing complexity; it’s the next wave of innovation that will humanize every object in our life, and it is the next level of automation for every object we use. IoT is bringing more and more things into the digital fold every day, which will likely make IoT a multi-trillion-dollar industry in the near future. To understand the scale of interest in the Internet of Things, just check how many conferences, articles and studies have been conducted about IoT recently. This interest hit fever pitch in 2016, as many companies see a big opportunity and believe that IoT holds the promise to expand and improve business processes and accelerate growth.

However, the rapid evolution of the IoT market has caused an explosion in the number and variety of IoT solutions, which has created real challenges as the industry evolves, chief among them the urgent need for a secure IoT model to perform common tasks such as sensing, processing, storage and communication. Developing that model will never be an easy task by any stretch of the imagination; there are many hurdles and challenges facing a truly secure IoT model.

The biggest challenge facing IoT security comes from the very architecture of the current IoT ecosystem; it’s all based on a centralized model known as the server/client model. All devices are identified, authenticated and connected through cloud servers that support huge processing and storage capacities. The connection between devices has to go through the cloud, even if they happen to be a few feet apart. While this model has connected computing devices for decades and will continue to support today’s IoT networks, it will not be able to respond to the growing needs of the huge IoT ecosystems of tomorrow.

The Blockchain Model
Blockchain is a database that maintains a continuously growing set of data records. It is distributed in nature, meaning that there is no master computer holding the entire chain. Rather, the participating nodes have a copy of the chain. It’s also ever-growing — data records are only added to the chain.

When someone wants to add a transaction to the chain, all the participants in the network will validate it. They do this by applying an algorithm to the transaction to verify its validity. What exactly is understood by “valid” is defined by the Blockchain system and can differ between systems. Then it is up to a majority of the participants to agree that the transaction is valid.

A set of approved transactions is then bundled in a block, which gets sent to all the nodes in the network. They, in turn, validate the new block. Each successive block contains a hash, which is a unique fingerprint, of the previous block.
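As a minimal sketch of that chaining idea (not any production blockchain, and using C++’s std::hash as a stand-in for a real cryptographic hash such as SHA-256), each block carries the fingerprint of the previous block, so tampering with an old block breaks every link after it:

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// A toy block: a bundle of approved transactions plus the hash of the block
// that came before it. Real blockchains add timestamps, nonces, signatures, etc.
struct Block {
    std::vector<std::string> transactions;
    std::size_t prev_hash;

    std::size_t hash() const {
        std::string data = std::to_string(prev_hash);
        for (const auto& t : transactions) data += t;
        return std::hash<std::string>{}(data);  // illustrative only, not cryptographic
    }
};

int main() {
    std::vector<Block> chain;
    chain.push_back({{"genesis"}, 0});
    chain.push_back({{"A pays B 5", "B pays C 2"}, chain.back().hash()});
    chain.push_back({{"C pays A 1"}, chain.back().hash()});

    // Each block must reference its predecessor's hash; any edit upstream breaks this check.
    for (std::size_t i = 1; i < chain.size(); ++i)
        std::cout << "block " << i << " link intact: " << std::boolalpha
                  << (chain[i].prev_hash == chain[i - 1].hash()) << "\n";
}
```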

Principles of Blockchain Technology
Here are five basic principles underlying the technology:

1. Distributed Database
Each party on a blockchain has access to the entire database and its complete history. No single party controls the data or the information. Every party can verify the records of its transaction partners directly, without an intermediary.

2. Peer-to-Peer Transmission

Communication occurs directly between peers instead of through a central node. Each node stores and forwards information to all other nodes.

3. Transparency

Every transaction and its associated value are visible to anyone with access to the system. Each node, or user, on a blockchain has a unique 30-plus-character alphanumeric address that identifies it. Users can choose to remain anonymous or provide proof of their identity to others. Transactions occur between blockchain addresses.

4. Irreversibility of Records
Once a transaction is entered in the database and the accounts are updated, the records cannot be altered, because they’re linked to every transaction record that came before them (hence the term “chain”). Various computational algorithms and approaches are deployed to ensure that the recording on the database is permanent, chronologically ordered, and available to all others on the network.

5. Computational Logic
The digital nature of the ledger means that blockchain transactions can be tied to computational logic and in essence programmed. So users can set up algorithms and rules that automatically trigger transactions between nodes.
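For a flavor of what “programmed” transactions mean, here is a deliberately over-simplified, hypothetical sketch (nothing like a real smart-contract platform): a rule pairs a condition with the transaction it triggers, so the ledger logic runs without a human in the loop.

```cpp
#include <functional>
#include <iostream>

// A rule bundles a condition with the transaction it should trigger.
// Both the scenario and the threshold below are invented for illustration.
struct Rule {
    std::function<bool()> condition;  // e.g. data reported on-chain by an IoT sensor
    std::function<void()> trigger;    // the transaction to submit automatically
    void evaluate() const { if (condition()) trigger(); }
};

int main() {
    int max_transit_temp_c = 30;  // pretend this came from a shipment's temperature sensor
    Rule pay_on_safe_delivery{
        [&] { return max_transit_temp_c < 40; },  // goods never overheated in transit
        []  { std::cout << "submit transaction: release payment to supplier\n"; }
    };
    pay_on_safe_delivery.evaluate();  // prints the trigger message, since 30 < 40
}
```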

Public vs. Private Blockchain

Blockchain technology can be implemented as a public or a private blockchain, with clear differences. For example, the benefits offered by a private blockchain are faster transaction verification and network communication, the ability to fix errors and reverse transactions, and the ability to restrict access and reduce the likelihood of outsider attacks. The operators of a private blockchain may choose to unilaterally deploy changes with which some users disagree. To ensure both the security and the utility of a private blockchain system, operators must consider the recourse available to users who disagree with changes to the system’s rules or are slow to adopt the new rules. By contrast, developers who work to maintain public blockchain systems like Bitcoin still rely on individual users to adopt any changes they propose, which serves to ensure that changes are only adopted if they are in the interest of the entire system.

Just as a business will decide which of its systems are better hosted on a more secure private intranet and which on the internet (and will likely use both), systems requiring fast transactions, the possibility of transaction reversal and central control over transaction verification will be better suited to a private blockchain, while those that benefit from widespread participation, transparency and third-party verification will flourish on a public blockchain.

Challenges of Blockchain in IoT

In spite of all its benefits, the Blockchain model is not without its flaws and shortcomings:

Scalability issues: the size of the blockchain ledger might lead to centralization as it grows over time and requires some kind of record management, which casts a shadow over the future of blockchain technology.

Processing power and time: these are required to perform encryption for all the objects involved in a blockchain-based IoT ecosystem. IoT ecosystems are very diverse and comprise devices with very different computing capabilities, and not all of them will be capable of running the same encryption algorithms at the desired speed.

Storage will be a hurdle: blockchain eliminates the need for a central server to store transactions and device IDs, but the ledger has to be stored on the nodes themselves, and it will only grow in size as time passes. That is beyond the capabilities of a wide range of smart devices, such as sensors, which have very low storage capacity.


Risks of Using Blockchain in IoT

It goes without saying that any new technology comes with new risks. An organization’s risk management team should analyze, assess and design mitigation plans for the risks expected to emerge from the implementation of blockchain-based frameworks.

Vendor Risks: Practically speaking, most organizations looking to deploy blockchain-based applications today lack the technical skills and expertise required to design and deploy a blockchain-based system and implement smart contracts completely in-house, i.e. without turning to vendors of blockchain applications. The value of these applications is only as strong as the credibility of the vendors providing them. Given that the Blockchain-as-a-Service (BaaS) market is still developing, a business should meticulously select a vendor that can properly craft applications that address the risks associated with the blockchain.

Credential Security: Even though the blockchain is known for its high security levels, a blockchain-based system is only as secure as the system’s access point. In a public blockchain-based system, any individual who gains access to a given user’s private key, which enables him or her to “sign” transactions on the public ledger, will effectively become that user, because most current systems do not provide multi-factor authentication. Also, loss of an account’s private keys can lead to complete loss of the funds or data controlled by that account; this risk should be thoroughly assessed.

Legal and Compliance: It’s new territory in all aspects, without any legal or compliance precedents to follow, which poses a serious problem for IoT manufacturers and service providers. This challenge alone will scare many businesses away from using blockchain technology.

The Optimum Secure IoT Model
For us to achieve that optimal secure model of IoT, security needs to be built in as the foundation of the IoT ecosystem, with rigorous validity checks, authentication and data verification, and all data needs to be encrypted at all levels. Without a solid bottom-up structure, we will create more threats with every device added to the IoT. What we need is a secure and safe IoT with privacy protected. That’s a tough trade-off, but it is possible with blockchain technology if we can overcome its drawbacks.

Ahmed Banafa was named the No. 1 Top Voice to Follow in Tech by LinkedIn in 2016. Read more articles at IoT Trends by Ahmed Banafa.


Arm and Mentor Use DesignStart Program to Accelerate Proof-of-Concept for IoT Designs
by Mitch Heins on 11-15-2017 at 7:00 am

Sometimes the hardest thing about bringing a new idea to fruition is overcoming the inertia to get started with a proof-of-concept. You must be able to put together enough parts of the solution to prove to those controlling budgets that an idea has merit and is worth taking to the next level. It’s a bit of a chicken-and-egg scenario, as you can’t get funding to do the proof-of-concept without a proof-of-concept to get the funding. At this point, many good internet-of-things (IoT) applications die for lack of finding a way out of this catch-22 scenario.

Arm and Mentor have come up with a winning proposal to help entrepreneurs, whether they be individuals in their home office, in small or medium-sized enterprises or even in large corporations. It’s something Arm calls their DesignStart program. The DesignStart program is meant to be a simple, fast, low-risk route to using Arm’s industry-leading IP with no upfront fee. To complement their IP, Arm has teamed up with Mentor, a Siemens business, to offer designers a limited-time free-access license to Tanner EDA tools to overcome the cost barrier of acquiring EDA tools. Arm is also offering approved design partners training and support for SoC development.

The DesignStart program comes in two flavors, DesignStart Eval and DesignStart Pro. DesignStart Eval is for anyone. Users can instantly get free, click-through access to the Arm Cortex-M0 and Cortex-M3 processors as well as Arm subsystem IP. The cores and IP can be configured or modified, including the ability to add your own IP and peripherals. The resulting design can then be prototyped on an FPGA, giving designers a fast way to design, simulate and prototype their proof-of-concept. Forum-based support is provided, giving designers access to others who have gone down the same path.

When it comes time to commercialize your idea, Arm provides the DesignStart Pro option. This option is for companies looking to develop their own chip. Companies register on the DesignStart website, sign and return a contract with no upfront fees and a simple success-based royalty once the chip goes into production, and then start work. DesignStart Pro also includes a verified subsystem, enhanced design services and the mbed OS platform.

As mentioned, Mentor provides designers with their Tanner EDA tools to enable design capture and simulation of their proof-of-concept SoC. This is done through a free 30-day evaluation license of the Tanner AMS flow with S-Edit schematic capture and the T-Spice and ModelSim simulators. These tools enable designers to create a demonstration of how their design concept will work, which can then be used as the proof-of-concept needed to gain funding for implementing and finalizing their idea.

Arm is providing both the Arm Cortex-M0 and Cortex-M3 processors along with their IP subsystems. The Cortex-M0 is a 32-bit processor with exceptionally small silicon area, low power and a minimal code footprint. This processor is especially well suited to Internet-of-Things (IoT) edge devices.

Alternatively, designers can choose to use the Cortex-M3 processor, an industry-leading 32-bit processor for smart embedded applications. The Cortex-M3 is a high-performance processor used in microcontrollers, automotive applications, industrial control systems and wireless networking and sensors. The Cortex-M3 is especially well suited for IoT gateway devices.

Designers using the DesignStart Pro option also get the Cortex-M System Design Kit (CMSDK), which provides an example system, a selection of either Arm AMBA or APB infrastructure components and many other key system IP components. DesignStart Eval makes available a subset of the CMSDK capabilities.

A nice feature of both DesignStart programs is they provide the required CAD views and documentation necessary to use either the Arm Cortex-M0 or Cortex-M3 cores in the Tanner EDA flow. This is crucial for entrepreneurs who don’t have the help of a corporate CAD team to put a design flow together with verified IP for their proof-of-concept. DesignStart makes available the libraries to quickly assemble and configure designs in the Tanner S-Edit schematic capture tool and enables debugging of design interfaces between cores, peripherals and sensors using the mixed-mode simulation capabilities of Mentor’s Tanner T-Spice and ModelSim simulators.

Once funding is secured, the next step is to implement the layout of the design and then fabricate it. This requires that designers purchase the Mentor tools to complete the design, giving them access to Mentor’s synthesis, placement, routing and verification tools. Once the design is completed and verified, it can be taped out to the designer’s chosen foundry. To speed up the implementation phase, Arm DesignStart also provides free access to a comprehensive library of physical IP.

So, if you have an idea and need a little help to get the ball rolling, you might want to check out the Arm DesignStart program and Mentor Tanner EDA tools. There is a white paper and a webinar that are available to get you going (see links below). You might find that they are exactly what you need to get your idea through the proof-of-concept stage and to get your ball rolling downhill to make the next big IoT application a reality.

See Also:
Arm/Mentor White Paper: “Fast SoC Proof-of-Concept with ARM DesignStart”
Arm/Mentor Webinar: “The Fastest Lowest-Cost Route to Developing Mixed Signal SoCs”
Arm DesignStart Program
Mentor Tanner EDA Tools


A Brief History of PSS at Breker
by Daniel Payne on 11-14-2017 at 12:00 pm

Verification engineers are hearing a lot about the Portable Stimulus Standard (PSS), and for good reason, because it could potentially save them time and effort and help them do their jobs much better. To get the big picture on what PSS is all about, I contacted Adnan Hamid, founder and CEO of Breker Verification Systems, because he’s been involved with the formation of PSS since its inception.
Continue reading “A Brief History of PSS at Breker”


The Practice of Low Power Design
by Bernard Murphy on 11-14-2017 at 7:00 am

For any given design objective, there is what we in the design automation biz preach that design teams should do, and then there’s what design teams actually do. For some domains the gap between these two may be larger than for others, but we more or less assume that methodologies which have been around for years and are considered to be “givens” among leaders in the field will be, at least conceptually, well embedded in the thinking of other teams.


In low-power design, here are some of those givens:

  • Designer intuition for performance and area of a function can be quite good, but intuition for power is horrible: +/- 50% if you’re lucky.
  • ~80% of power optimization is locked down in architecture (leftmost designer above), ~20% or less in RTL (middle designer) and the best reductions you can hope for at implementation are in single digits (rightmost designer).
  • From which we conclude that power is an objective that must be addressed all the way through the design flow – from architecture all the way to GDS, with architecture most important.
  • Power is heavily dependent on use-case, more so perhaps in mobile, but still important in wired applications. You have to run power estimation on a lot of applications, which is going to run a lot faster at RTL than at gate-level.
  • Peak-power is also important, especially for reliability. So averages alone are insufficient.

For anyone designing for mobile applications, all of this is old news. They have been using these principles (and more) for a long time. What came as a shock to me, in discussion with Dave Pursley (Sr. PPM at Cadence), was that some design teams, at well-known companies, seem unaware of or indifferent to even the first two concepts. I can’t speak to why, but I can speculate.

Perhaps the view is that “this isn’t a mobile application, so we just have to squeeze a bit and we can do that in RTL”. Or “between RTL and layout optimizations we’ll get close enough”. Or “we can’t estimate power accurately until we have an implementation, so let’s get there first, then see where we stand”. Or maybe it’s just “we’ve always done design this way, power is just another thing we have to tweak, we’ll run that near the end”. Whatever the reason, as you might guess, sometimes this doesn’t turn out so well. Dave cited examples (no names) of power estimates at layout that were 50% over budget.

Some of these were saved by Hail Marys. Which sounds heroic, but it’s not a good way to run a business. A more likely outcome is a major redesign for the next product cycle or scrapping the product. For those of you who have found yourselves in this position and didn’t enjoy the experience, let’s review what you should have done.

First, just because your target won’t go into a mobile application doesn’t mean you can skip steps in low-power design. If you’re just doing a small tweak to an existing well-characterized design, to be used by the same customer in the same applications, then maybe. Otherwise you need to start from architecture just like everyone else. You don’t have to get anywhere near as fancy as the mobile design teams, but HPC/cloud applications, cost-sensitive applications without active cooling and high-reliability systems now also have tight and unforgiving power budgets.

How do you estimate power at the architecture level? For the system as a whole, simulation coupled with power estimation, if you don’t have any other choice. Emulation coupled with power estimation will be massively more effective in getting coverage across many realistic use cases and particularly in teasing out peak power problems.


For IP power characterization, you’ll start with RTL or gate-level models. If you’re planning to build a new IP, you might consider starting with high-level design (e.g. SystemC). That can be synthesized directly to RTL, where you can run power estimation driven by the testbench you developed at that same high level (also faster to build than an RTL testbench). Developing at a high level allows quick-turnaround architecture exploration to optimize, say, power. You may be surprised to hear that a lot more functions are being developed this way today (see the results of a Cadence 2017 survey above). If this isn’t an option, you’ll have to stick with RTL models. Either way, you know power estimation will be as accurate as RTL power estimation.

Which these days is within ~15% of signoff power estimates. Might seem like a significant error bar, but it’s still a lot better than your intuition. And it’s actually better than 15% for relative estimates, so a pretty good guide for comparing architecture/micro-architecture options.
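For intuition only, here is a back-of-the-envelope sketch (with invented toggle rates, capacitances, voltage and frequency) of the switching-power sum that any dynamic power estimator, whether at architecture, RTL or gate level, is effectively computing: P ≈ Σ α·C·V²·f.

```cpp
#include <iostream>
#include <vector>

// Each net contributes alpha * C * V^2 * f of dynamic power, where alpha is the
// switching activity, C the net capacitance, V the supply and f the clock.
// Real tools extract activity from simulation/emulation traces and capacitance
// from the library/parasitics; the numbers below are purely illustrative.
struct Net { double activity; double cap_farads; };

double dynamic_power_watts(const std::vector<Net>& nets, double vdd, double freq_hz) {
    double p = 0.0;
    for (const auto& n : nets)
        p += n.activity * n.cap_farads * vdd * vdd * freq_hz;
    return p;
}

int main() {
    std::vector<Net> nets = {{0.20, 5e-15}, {0.05, 12e-15}, {0.50, 2e-15}};  // hypothetical
    std::cout << dynamic_power_watts(nets, 0.8 /*V*/, 1.0e9 /*Hz*/) << " W\n";
}
```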

Next, unless they have a Hail Mary play in mind, don’t believe the RTL team if they tell you they can cut 50% of the power (I’m not talking about power and voltage switching here). More common might be 15-30% starting from an un-optimized design, more like ~5% if already optimized. If they can save more than that, it doesn’t speak well to the quality of their design. Clock gating will save some, not much if you only do register-level gating, more if you gate at the IP level (also gating the clock tree), and memory gating can save quite a bit too. Hail Marys could include power gating, but hang on to your hat if you start thinking of that late in the design. Verification is going to get a whole lot more complicated, as is adding on-board power management, power grid design, floorplanning and more complex power integrity signoff.

Most of all, squeeze everything you can out of power before you hand over to implementation, and make sure you are within ~5-10% of budget before the handoff. Implementation can fine-tune, but they absolutely cannot bail you out if you’re way off on power. What they need to focus on is those last few percent, power and signal integrity, and thermal (again, not just average but also peak – local thermal spikes affect EM and timing and can easily turn into thermal runaway). And, of course, they worry about timing closure and area.

So now you know power optimization isn’t a back-end feature or tool. It’s a whole string of tools and a responsibility all the way from architecture to implementation, with the bulk of that responsibility starting at architecture. The right way to handle this is through an integrated flow like that offered by Cadence. Why is integration important? Because a lot of getting this right depends on consistency in handling constraints and estimation through the flow, which you’ll sacrifice if you mix and match tools. And that will be even more frustrating.


Is there anything in VLSI layout other than “pushing polygons”? (3)
by Dan Clein on 11-13-2017 at 12:00 pm

In late 1986 the Layout Project Leader of the DSP96000 got married and left for a 6-month vacation, so I inherited the biggest chip MSIL had in stock. Floorplanning a chip of that size was a challenge from day one. Even the 68030-based SUN workstation was too slow. I started to ask around and go to demos for any other tool that could help me do my job. Our CEO came by one night to see “how it’s going” and found me “routing” while listening to music. He stood behind my back for about 15 minutes, during which time I managed to route 3 (three) signals at the top level. I had a list of about 57,000 signals, so he calculated that even if I worked 24 hours a day I would be done way after tapeout time. He “empowered” me to find a solution “outside the box”, even if it went against Motorola policy.


I knew about a revolutionary tool called BPR (Block Place & Route) from ECAD, later CADENCE. The issue was not only software but hardware. The SUN workstations based on the 68000 family were too slow and limited in how much DRAM and disk they could handle. RISC-based machines were available, but they didn’t have a Motorola chip inside ☹. Guess what, I sold the idea to our CEO and requested a RISC machine with 4 disks of 1 GB each. It was approved immediately, and the Israeli distributor, who had used this machine for demos only 2 weeks earlier, had to bring it into MSIL and install it in record time.

We contacted ECAD/Cadence and again got another great AE from the UK, Steve Upham. He planned to stay for 2 weeks and stayed for 5 months. The biggest strength of this software was that all signals were routed with “preplanned” constraints, and it knew how to maintain metal direction. This meant a new flow: each block’s circuit owner provided a list of signals with EM/IR requirements (widths and currents), and the layout person made sure they were all applied according to the architecture. The challenge was to deal with power grid issues, each block being connected many times to the same supply. All lines were “symbolic” during routing, so they were easy to move from channel to channel or manipulate. Once you were happy with the location, you pushed the “compaction” button and everything became real WIDTHS with a proportional number of VIAS, and this was in 1988.

It looked promising, as we completed the first trial in 3 weeks, routing about 50,000 signals in 3 layers. Now the problem was how to import/export the GDSII data from/into CAECO, where we needed it for the real chip. CAD came to the rescue again: Nachshon Gal worked with me to generate empty boxes of the real layout containing only the pins of all blocks, so the total data we used at the top level had only boundaries and routing pins! We invented block ABSTRACTs in 1988. Steve and Nachshon had to write a new interface, break the final data from BPR into 2 pieces and reassemble it in CAECO. Even with abstracts, the data was too big for the translation engine. One cool feature to reduce data in BPR was to reduce jogs and jumpers, so smooth planning and clean channels reduced the final data. A win/win situation: a smaller chip and smaller data for the tools to handle.

Once we figured out that data size could be an issue, we started planning the final GDSII verification. Remember, everything was FLAT in 1988, so the data footprint for verification was very big, and the virtual memory needed temporarily for DRC and LVS was also big. We envisioned that our chip would be 3×4 times the previous chip in area, so we multiplied everything by 12. We did a local benchmark and found out that the only way to run this data size would be on the IBM servers in Austin, Texas, at Motorola Semiconductor headquarters. We spent some effort and, after one month of trials, figured out what we needed for final verification. We booked the servers, virtual memory and disk 2 years in advance. It proved to be a good proactive move, and we taped out on the promised day. This was another interesting “non-layout” task.

As the DSP required a lot of datapath blocks for address controllers, multipliers, adders, etc., we had to find ways to automate this type of layout block. We developed special cells for each function, but adding a way to deal with last-minute ECOs (Engineering Change Orders) was a priority in our minds. After the TEXT layer used in standard cell libraries, the taste for “smarter automation” grew. We invented text layers for everything: contacts, diffusion, implants, vias and metals. We developed programmable datapath software that we called STDGEN, or Standard Generator. We built cells complete with all layers and, once they were DRC/LVS clean, we changed the real layers into text ones. Based on the functions needed from each cell, STDGEN replaced the text layer with the real one and reran verification against the new netlist. No change in area, all external pins maintained, no possible errors, and the final “coding” could come one day before final chip verification.

On this one I worked with Eythan Weiberger and other CAD guys… I was involved only in the specification stage and chip-level verification, as other people on the team helped CAD in this case. For this chip we spent approximately 300 man-months on layout and about the same effort on circuit and system design, and we were on schedule as planned 3 years earlier by Ika Reuveni, the original layout project leader. How is that for inventing new things “as” and “when” you need them? A big part of doing all this was the unconditional support from management and all adjacent groups. It was challenging, fun and many times exhausting, but we never gave up.

Here is one interesting fact little known outside Motorola Semiconductor Israel and Cadence Israel. Around 1988 CADENCE released the first version of OPUS. Once the first customer (MSIL) bought a CADENCE license, they decided to hire a salesperson and an AE for support. But the OPUS software was about 500 MB, and CADENCE did not have a computer big enough to handle it. Cadence came to MSIL and negotiated to put the OPUS corporate software on my machine (which at that time had 4x1 GB disks). Every time CADENCE needed keys to enable a new customer to use any OPUS software, they would fax the CPU ID and I would generate the license keys. This let me get to know more people from other Israeli companies, sometimes competitors, a little different from layout designer duties.

Dan Clein
CMOS IC Layout Concepts, Methodologies and Tools

Also Read: Is there anything in VLSI layout other than pushing polygons? (2)


Finding the Right Needle in the IP Haystack
by Daniel Nenni on 11-13-2017 at 7:00 am

As the percentage of pre-configured IP in semiconductors increases, design teams are able to reduce design cycle times. But one of the challenges for design teams is the inability to quickly and easily find IP, because it’s incorrectly classified, sitting in a designer’s home directory, or it’s been put into the ‘repository’ by an IP developer and someone forgot to update the spreadsheet to notify the design team that it’s available.


These challenges of finding the right “needle in the IP haystack” introduce delays to achieving design closure that run into millions of dollars of lost opportunity costs.

It sounds far-fetched that in this day and age companies would use a spreadsheet to track IP, but many semiconductor companies do not have a specialized IP management solution; instead they use a PLM or ERP system like Oracle, or they use an in-house/home-grown system that needs constant enhancement and maintenance by a small army of developers.

The result of these ‘legacy’ processes and solutions is that design teams spend unnecessary time and energy trying to find the right version of the right IP for their design. Whether it is internally developed or externally procured, the information about all of the IP in an enterprise needs to be accessible to those who need to find it and qualify that it’s the required IP. It also needs to be hidden or non-discoverable to those who don’t need to see it.

Consensia, a channel partner of Dassault Systèmes, will be holding a webinar (moderated by me) later this month to demonstrate how DelphIP helps its customers’ IP consumers quickly and easily find the IP they need to add to their Bill of IP/SOC BOM. Consensia’s description of IP is anything that includes software, hardware, firmware or documentation.

DelphIP is an IP lifecycle management solution. It is based on the Dassault Systèmes ENOVIA platform, so it understands semiconductor nomenclature like foundries, process nodes and other attributes that IP developers and consumers use to create or search for IP. These attributes are added when the IP is created or goes into the repository, so it’s easy to track it through its lifecycle.

WEBINAR REGISTRATION Tuesday, November 28, 2017 8:00am PT – 9:00am PT

During the webinar, Consensia will demonstrate how design team members can search for, and easily locate, IP that meets their exact requirements – both internally developed or externally procured IP.

In an ideal world, IP would be developed and validated before a new design start commences. But in reality, IP is often developed in parallel with the chip design. DelphIP enables designers to see progress of IP that is being internally developed, as well as being notified when it has been published internally.

Consensia says that some of their customers have also benefitted from the Issue & Defect functionality. This allows ASIC/SoC design leads to report issues, have them assessed by the developer (or external vendor) and be notified of the status of the IP, so that they can get a rolled-up, hierarchical view of all of the defects in an SoC, something that bug-reporting tools don’t typically provide.

Consensia will also show something called collaborative workspaces, a secure area used by customers working with external joint development partners. It enables semiconductor or IP companies to provide configurable access to specific IP for their partners. This accelerates the design process by giving partners visibility into the specific IP on which they may want to undertake IP lifecycle management functions, without compromising the security of other IP data.

The webinar should be an interesting insight into how IP lifecycle management is undertaken by some of the companies that use DelphIP. Again, you can register for Consensia’s webinar here. I hope to see you there.


Free PDF Version of PROTOTYPICAL for SoC Design
by Daniel Nenni on 11-12-2017 at 7:00 am

In our quest to further enlighten the masses, SemiWiki has published four books; we have two more eBooks in post-production due out in Q1 2018 and two more topics in research. All of the books are available free as PDF versions, or you can get printed versions on Amazon.com, or free printed versions at book signings or if you happen to meet me during my travels. Continue reading “Free PDF Version of PROTOTYPICAL for SoC Design”