
Intel is NOT Quitting Mobile!

by Daniel Nenni on 12-04-2014 at 9:00 pm

Judging from the presentation Hermann Eul gave at NASDAQ this week, Intel is still in mobile. This presentation was probably inked before Intel Mobile was folded into the PC Group, however. The first clue is that Hermann’s title was listed as “General Manager and Vice President, Mobile Communications Group,” which he no longer holds. Even so, I found the presentation very interesting and Intel’s mobile positioning solid. But first let’s talk about Prof. Dr. Hermann Eul (I’m a big fan of his from the Infineon days, so this comes with some bias).

If you look at the Intel website the executive hierarchy is:


  • Three in the Executive Office
  • Four Executive Vice Presidents
  • Eight Senior Vice Presidents
  • Twenty-nine Corporate Vice Presidents

    Last November Intel announced a “new” mobile strategy which put Hermann Eul on the front page with the SoFIA family of SoCs. Hermann joined Intel in 2011 when Infineon’s Wireless group was acquired (SoFIA comes from Infineon). At first he was President and General Manager of the newly formed Intel Mobile, which was responsible for developing wireless products for connected devices (phones and tablets).

    Also read: Intel Quits Mobile

    At some point Hermann joined the ranks of the 29 Corporate Vice Presidents, which is where he officially sits today. The latest announcement puts mobile under Kirk Skaugen in the PC Client Group and puts Hermann’s future at Intel in question. In my opinion, if Hermann does leave, it is a big loss for Intel and signals a mobile shift, if not a complete Atom exit. We will know more in Q1 2015 when the move is finalized.

    Back to the presentation: Hermann definitely hit on the important points of mobile, specifically connectivity, integration, and the IP required for both. The first SoFIA chip uses commercial IP such as ARM cores and is currently manufactured at TSMC on 28nm. Moving forward, SoFIA will use Intel IP and manufacturing, starting at 14nm. This is not as easy as it sounds. I remember back in 2006 when AMD acquired ATI, which also used commercial IP and TSMC as a manufacturing partner. As far as I know, the ATI chips and IP never made it over to AMD’s manufacturing process, nor did they move to GlobalFoundries after it acquired AMD’s manufacturing assets.


    Seriously, it is VERY hard to go from an open fabless semiconductor ecosystem to a closed IDM ecosystem. Can Intel make this SoFIA transition without falling even farther behind in the mobile race? Tough to say, really, but I give it a much higher probability of success if Hermann and his Infineon inner circle stay with Intel, absolutely.

    Bottom line: Will Captain Kirk put red shirts on Hermann and his Infineon team? Will x86 kill the next generation of Atom-based designs (SoFIA, Cherry Trail, and Broxton)? My guess is yes, because that is where his heart is:

    Kirk Skaugen is senior vice president and general manager of the PC Client Group for Intel Corporation. Skaugen is leading Intel’s efforts in once again transforming the personal computer industry with Ultrabook™, All-in-Ones, and a new category of 2 in 1 computing devices. Skaugen manages the consumer and business client segments which includes the Intel® Core™, Pentium® and Celeron™ family of processors and related chipsets, wired and wireless client Ethernet, Thunderbolt™, and home gateways. He is also responsible for driving Intel’s corporate-wide user experience initiatives.

    Prior to this role, Skaugen led Intel’s Datacenter and Connected Systems Group from $6.1B to over $10 billion in 3 years. He was responsible for strategy and product development for Intel’s enterprise datacenter, cloud computing, communications infrastructure, high performance computing, workstations, storage and networking solutions, and intelligent connected device platforms powering the “internet of things”. His product responsibilities included Intel® Xeon® processors, Intel® Xeon Phi™ coprocessors and Itanium® processors and related chipsets, Intel’s wired enterprise networking, server motherboards and systems, and related software and services.


    IoT Financial Outlook

    by Tom Simon on 12-04-2014 at 7:00 am

    As exciting as the Internet of Things (IoT) is, the question of how, and which, companies stand to make money in this market remains open. Previous waves of internet markets have produced surprising wins and epic losses. How is the IoT market shaping up, and what are the real business drivers? According to a Silicon Valley Bank analysis, it’s important to look at IoT company market segmentation, and also at the relative size of these companies.

    There are three areas of advancement fueling the current generation of IoT. First is the increasing affordability of MCUs, wireless devices and the other hardware needed. Second is the ability to power the back-end analytics needed to properly harness the information collected at the edge of the IoT. Last are the network economies provided by increasing numbers of connected devices: Metcalfe’s Law asserts that the value of a communications network increases as the square of the number of attached devices.
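As a rough illustration of that third driver (the unit value per link here is made up for illustration, not taken from the SVB analysis), Metcalfe’s Law can be sketched in a few lines of Python:

```python
def metcalfe_value(n_devices: int, value_per_link: float = 1.0) -> float:
    """Metcalfe's Law: a network's value scales with the number of
    possible pairwise connections, which grows roughly as n^2."""
    return value_per_link * n_devices * (n_devices - 1) / 2  # n-choose-2 links

# Doubling the number of connected devices roughly quadruples the value
print(metcalfe_value(100))  # 4950.0
print(metcalfe_value(200))  # 19900.0
```

Doubling the attached-device count roughly quadruples network value, which is why this driver compounds the falling hardware costs and cheaper analytics above.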

    Each of these drivers, however, faces potential hurdles. At some point decreasing hardware costs flatten out; for instance, older nodes are used for a lot of RF and IoT devices, and those will not see big cost-per-function decreases from Moore’s Law. Big data and cloud resources used for analytics may need to be updated to deal with ever larger amounts of data collected at the edge. Ambiguity exists over whether large numbers of connected devices have the same effect as ‘users’ for the purposes of valuing a communications network. We also see the need to harmonize LTE networks and to make sure that other forms of connectivity are easy to set up, secure, and deploy.

    Additionally, other risk factors for business growth are power sources, software architectures and the brute-force cost of adding connectivity to our existing ‘things’.

    But the real question the SVB report raises is where the pull for IoT business will come from. Many large companies have been promulgating use scenarios for their IoT products and services but are not seeing a return on their investment. At the same time, smaller companies are making inroads into markets and seeing real growth; their challenge, of course, is to scale that growth so it becomes significant. VC investors are looking closely at their IoT funding. The SVB report states that angel money is supporting many more start-ups than the available series A funds from VCs can support in the long term, which means there will be a pruning.

    Even within the start-up segment, critical mass is necessary for financial success. Early stage IoT companies making over $10M are seeing better sales growth than others in their peer group. It takes a big investment to get an IoT company to a higher valuation. The SVB report shows that series D rounds are the most likely to be up-rounds for IoT companies, while C rounds are the most perilous, with close to 40% of companies seeing decreased valuations. If investors can stomach an average of $30M to get a company past series C, those companies will have a better chance of giving their investors a good return.

    The most interesting information in the report showed which market segments within IoT are prospering. Not surprisingly the IoT enablement category is performing well, with good growth, but hindered by higher costs of goods sold (COGS) and operating expenses. This category includes the hardware and software that is used as a foundation for IoT products.

    The healthcare market seems to be a made-to-order segment for IoT. Despite long lead times for product certification and the extra effort required to meet security requirements, healthcare has a growing demographic and many drivers for growth. Improved care levels and lowered costs for all kinds of medical monitoring, both at home and in hospitals, are a big win. There is a clear-cut ROI benefit for IoT devices in this market. Presently a lack of scale is holding some of these companies back, but increasing demand can remedy this.

    The third market segment featured in the SVB report is energy. Once again we see a very strong ROI for utilities and consumers, and in a less tangible manner there is a large ROI for our planet. Most of us already have smart meters and can monitor our power usage with increased granularity and almost in real time. This has ripple effects that go beyond reduced bills, to things like lower capital infrastructure costs from optimizing power plant utilization. By monitoring and controlling power-consuming devices, consumers can take advantage of dynamic rate structures that help manage power grid efficiency. This is a great example of how the IoT has grown from a simple point-to-point connection to a multi-node, multi-directional system that can provide big leverage. But it will also require all the elements listed above, such as big data, connectivity, and sophisticated edge technology in the form of MCUs and communications devices.

    As the IoT matures many people will be looking to see where there is growth and potential for profit. The development of real high value use cases will be essential. As is the case with disruptive technology shifts the winners and losers will be hard to predict this early in the game. But investors have learned to proceed cautiously, so hopefully we will not see a bubble, but rather rational growth and expansion.


    No IoT No Justice!

    by Daniel Nenni on 12-03-2014 at 9:00 am

    It looks like even President Obama is on the IoT bandwagon now with his $263M in matching funds for state and local police body cameras and training. It is a shame that it took a tragic event to spur this type of Government investment in semiconductor technology but I appreciate it just the same.

    As I have mentioned before, when I read something I do my best to understand what the author is saying but also why they are saying it. After following the media frenzy surrounding the events in Ferguson, Missouri I’m having trouble understanding either one in this case. Was the media’s intent to pour gasoline on a fire causing millions of dollars of damage? One thing I can tell you, as a result of this event all sorts of video surveillance equipment containing our precious semiconductor devices will be flying off the shelves, absolutely.

    If you read the Grand Jury transcripts and the autopsy report (yes, the autopsy report is public), the so-called eyewitness reports and media accounts are seriously conflicting. In case you are interested, there is a Wikipedia page which does a decent job of capturing everything I have read thus far:

    http://en.wikipedia.org/wiki/Shooting_of_Michael_Brown

    We will probably never know what really happened that day which is why we will all be wearing personal video systems in the not too distant future. I do have an opinion on WHY it happened however which I will share with you now.

    In the words of Iron Mike Tyson, one of the most feared boxers of my time, “Everyone has a plan until they get punched in the mouth.” As a former fighter I know this by experience. Do you remember the second Evander Holyfield fight where Tyson bit off part of Holyfield’s ear? I can assure you that was not Mike’s “plan” before he was hit in the face repeatedly. I’m also confident it was not Officer Darren Wilson’s plan to shoot an unarmed man six or more times that fateful day but as reported he was also hit in the face.

    From what I have read I agree with the Grand Jury about criminal charges not being brought against Officer Darren Wilson. I do however think he should have been fired immediately for his careless approach to this situation which also opens up civil action. This man is an experienced Police Officer with no previous complaints lodged against him for anything. He had also never used his service weapon in the line of duty. So I ask you, just what was his plan exactly confronting two men through his car window without backup?

    Hearing the world’s take on American current events first hand is very interesting to me. I was in Europe during the 9/11 attacks and was amazed at the response from people around me. Some good, some bad, enlightening just the same. I also remember being in Japan when the remake of the Pearl Harbor movie was released. Now that was interesting. Since SemiWiki is an international community of semiconductor professionals I would be interested to hear your opinions on this tragic event in Ferguson, MO and what I have written here.


    Getting up close and personal with symmetric session key exchange

    by Bill Boldt on 12-03-2014 at 2:00 am


    In today’s world, the three pillars of security are confidentiality, integrity (of the data), and authentication (i.e. “C.I.A.”). Fortunately, Atmel CryptoAuthentication crypto engines with secure key storage can be used in systems to provide all three of these.

    Focusing on the confidentiality pillar: in a symmetric system it is advantageous for the encryption/decryption key shared by each side to change for every encryption/decryption session. This process, called symmetric session key exchange, helps provide a higher level of security. Makes sense, right?

    So, let’s look at how to use the capabilities of the ATSHA204A CryptoAuthentication device to create exactly such a changing cryptographic key. The key is changed with each session by using a new (and unique) random number for each session, which gets hashed with a stored secret key (number 1 in the diagram below). While the stored key in the ATSHA204A devices never changes, the key used in each session (the session key) does, meaning no two sessions are alike by definition.

    The video below will walk you through the steps, or you can simply look at the diagram which breaks down the process.

    http://www.youtube.com/watch?v=_WNxFtI5A9E

    The session key created by the hashing of the stored key and random number gets sent to the MCU (number 2) and used as the AES encryption key by the MCU to encrypt the data (number 3) using the AES algorithm. The encrypted data and the random number are then sent (number 4) to the other side.

    Let’s explore a few more details before going on. The session key is a 32-byte Message Authentication Code, or “MAC.” (A MAC is defined as a hash of a key and a message.) 16 bytes of that 32-byte (256-bit) MAC become the AES session key that is sent to the MCU to run the AES encryption algorithm over the data to be encrypted.

    It is obvious why the encrypted data is sent, but why the random number as well? That is the magic of this process. The random number is used to recreate the session key by running it through the same SHA-256 hashing algorithm together with the key stored in the decryption side’s ATSHA204A (number 5). Because this is a symmetric operation, the secret keys stored in both ATSHA204A devices are identical, so when the same random number is hashed with the same secret key using the same algorithm, the resulting 32-byte digest will be exactly the same on the decrypting side as on the encrypting side. Just as on the encrypting side, only 16 bytes of that hash value (i.e. the MAC) are needed for the AES encryption/decryption key (number 6). At this point these 16 bytes can be used by the receiving side’s MCU to decrypt the message (number 7). And that’s it!
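The round trip described above can be sketched in a few lines of Python. This is only an illustration of the concept: in a real system the stored key never leaves the ATSHA204A, and the device’s actual MAC command hashes additional fields beyond the simple key-plus-nonce concatenation assumed here.

```python
import hashlib
import os

def derive_session_key(stored_key: bytes, nonce: bytes) -> bytes:
    """Hash the shared stored key with a per-session random number;
    the first 16 bytes of the 32-byte SHA-256 digest (the MAC) become
    the AES-128 session key."""
    mac = hashlib.sha256(stored_key + nonce).digest()  # 32-byte digest
    return mac[:16]

# Encrypting side (steps 1-4): fresh random number, derive the session key
stored_key = bytes(32)   # placeholder for the secret held inside each ATSHA204A
nonce = os.urandom(32)   # unique random number for this session
session_key = derive_session_key(stored_key, nonce)

# The random number travels alongside the ciphertext, so the decrypting
# side (steps 5-7) recreates the identical session key from its own copy
# of the stored secret
assert derive_session_key(stored_key, nonce) == session_key
```

Because both sides hold the same stored secret, only the non-secret random number needs to cross the wire, yet every session gets a fresh key.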

    Note how easy the ATSHA204A makes this process: it stores the key, generates the random number, and creates the digest. There’s a reason we call it a crypto engine! It does the heavy cryptographic work, yet the ATSHA204A is simple to configure using Atmel’s wide range of tools.

    http://www.youtube.com/watch?v=rGethCEF3C8

    Not to mention, the devices are tiny, low-power, cost-effective, work with any micro, and most of all, store the keys in ultra-secure hardware for robust security. By offering easy-to-use, highly-secure hardware key storage crypto engines, it’s simple to see how Atmel has you covered.

    Bill Boldt, Sr. Marketing Manager, Crypto Products Atmel Corporation


    HLS – Major Improvement through Generations

    by Pawan Fangaria on 12-02-2014 at 6:30 pm

    I am a believer in continuous improvement in everything we do, and it’s pleasant to see rapid innovation in technology these days, especially in the semiconductor space: technology, design, tools, methodologies… Imagine a 100K- to 1M-gate design running at a few hundred MHz at a technology node in the range of 0.18 to 0.35 microns in the late 1990s and early 2000s, when designers were struggling to optimize PPA and shorten design cycle time. Synopsys pioneered RTL-to-gate-level synthesis, which proved very successful. Today, with billion-gate SoCs operating at GHz frequencies and fabricated at cutting-edge technology nodes, it is imperative to optimize PPA at the system level. True, FinFET technology at 14nm provides excellent PPA, but it has huge cost and dynamic power implications, and SoC verification cost has gone up tremendously. It’s time we looked at ways to optimize power and other critical success criteria at the system level (not necessarily only for FinFET nodes) and also reduce burgeoning design and verification cost, including that of running huge regressions through large server farms.

    I admire that people had foresight on High Level Synthesis (HLS) in the form of the behavioral compiler in the 1990s. Then in 2004, Mentor unveiled Catapult, which synthesized pipelined, multi-block subsystems from C/C++. During the same period, Forte introduced Cynthesizer, which synthesized hardware from SystemC. Continuous refinements followed to address design issues such as control logic improvements, power optimization, and levels of timing abstraction (the TLM standard came up), and so on. Other HLS tools such as Cadence C-to-Silicon also appeared. These tools demonstrated the value of high level synthesis through a top-down design methodology from the system level that optimized design architecture and cut design time significantly. However, wider adoption of HLS tools in the design community remained distant because they catered to specific types of hardware designs that exhibited mostly one-way data movement. The other reason was a weak economic and business push to adopt HLS amid the several issues still to be resolved.

    Calypto, which was excelling in SLEC (Sequential Logic Equivalence Check) and PowerPro (a power optimization tool at RTL), found complementary value in Catapult and acquired it from Mentor in 2011 to provide a comprehensive HLS solution. Since then, Catapult has proven its value for differentiated IP in video processing, image processing and advanced communications. Recently, by using Catapult, Google was able to cut the design time of its VP9 video decoder in half, and it collaborated with VeriSilicon, with whom it was easy to share C code. Now there is a critical mass of designers seeing value in HLS that can optimize design architecture for best PPA at the system level, greatly reduce design time, accelerate verification and debugging at the C/SystemC level, and facilitate collaboration and reuse through sharing of technology- and architecture-neutral designs. By using Catapult, customers have seen significant savings in area (up to 18%) and time (up to 16x) at the best QoR for their designs.

    However, HLS is still not a mainstream design methodology. Why? A recent survey conducted by Calypto shows that designers need more control for design closure, a seamless flow with their RTL verification, a choice of C++ or SystemC, and help learning through use of HLS. What is Calypto doing to address these issues now?

    The first of its kind in the 3rd generation of HLS, Calypto’s newly announced Catapult 8 Platform has unmatched capabilities to make designers more productive through HLS. In a brief telephone call with Sanjiv Kaul, CEO of Calypto, Mark Milligan, VP of Marketing, and Bryan Bowyer of Catapult 8 Product Engineering, I learned that this newly architected product is the result of a multi-year investment in Catapult since 2011. Interestingly, before this full production release, Calypto migrated its major partner customers to Catapult 8 through a limited access release in 2014. Naturally, the input of key active designers was taken into account in architecting this platform! What’s new?

    Unlike older generations of HLS, where any incremental change in C++/SystemC could lead to very different RTL, the configurable hierarchical design architecture of Catapult 8 gives designers full control over design hierarchy: they can assemble the design in top-down or bottom-up fashion, synthesize and verify one block at a time while keeping the rest of the design locked, and import Verilog or VHDL IP as needed. This methodology provides automatic as well as designer-controlled synthesis, with a 10x capacity improvement in design assembly and synthesis.

    Catapult 8 moves verification up and addresses designers’ major concern about C and RTL mismatches by synthesizing assertions and cover points, identifying and guaranteeing key equivalence points, providing cross-probing between RTL and C++/SystemC, and using integrated formal tools to identify unreachable states. With Catapult 8, designers are able to obtain full RTL verification coverage, which has been a requirement for widespread adoption of HLS. The methodology provides full functional coverage with a much smaller (100x to 1000x) server need and less code to debug. It also integrates with verification flows based on industry-standard methodologies such as UVM.

    The platform is flexible enough to accept C++ or SystemC; designers may use both on different projects. Catapult LP, available on the Catapult 8 platform, provides power-optimized RTL using Calypto’s patented deep sequential analysis technology and also enables designers to try different microarchitectures to explore low power.

    What’s more? Catapult 8 includes a brand new Catapult Catware library of pre-built, synthesizable components that can be used for faster deployment and adoption of HLS. Expect widespread adoption of HLS with this new innovative platform! Stay tuned to hear more details on the specific state-of-the-art capabilities in Catapult 8.



    Design Rule Checking (DRC) Meets New Challenges

    by Daniel Payne on 12-02-2014 at 7:00 am

    The traditional batch-oriented DRC process run as a final check to ensure compliance with foundry yield goals is quickly moving toward a concurrent DRC process performed early and often throughout design, especially at the 28 nm and smaller process nodes. What are the technology factors causing this change?

    • Increasing number of rules and their complexity
    • Coloring – the multi-pattern mask requirements
    • Metal fill is more complex and impacts timing results
    • Place and Route (P&R) has to be concurrent with DRC to get closure

    FinFET transistors, starting at the 22 nm node, added some complexity to the DRC process; however, the double patterning technology (DPT) required at 20 nm caused more computational complexity for DRC jobs than FinFET did. Having to comply with 1,000 design rules at 10 nm does not look fun to me. Finally, the effort required to keep the CMOS process planar by adding fill patterns has mushroomed.


    DRC Challenges

    Related – FinFETs for your Next SoC

    DRC tools can identify and automatically fix odd-cycle loops found with DPT, but when we start using triple patterning at the 10 nm node, you will initially just get a warning and have to make the fixes yourself.
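The underlying check here is graph 2-coloring: layout polygons are nodes, spacing conflicts are edges, and an odd cycle in that conflict graph makes a two-mask assignment impossible. A minimal sketch of the idea (hypothetical data, not any DRC tool’s actual algorithm):

```python
from collections import deque

def assign_dpt_masks(conflict_graph):
    """Try to 2-color a layout conflict graph (nodes = polygons,
    edges = spacing conflicts). Returns a node->mask dict, or None
    when an odd-cycle loop makes double patterning impossible."""
    colors = {}
    for start in conflict_graph:
        if start in colors:
            continue
        colors[start] = 0
        queue = deque([start])
        while queue:            # BFS, alternating masks across each edge
            node = queue.popleft()
            for neighbor in conflict_graph[node]:
                if neighbor not in colors:
                    colors[neighbor] = 1 - colors[node]
                    queue.append(neighbor)
                elif colors[neighbor] == colors[node]:
                    return None  # odd cycle: layout fix or third mask needed
    return colors

# A triangle of mutual conflicts is the classic odd-cycle DPT violation
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(assign_dpt_masks(triangle))  # None

# A simple chain of conflicts splits cleanly across two masks
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(assign_dpt_masks(chain))     # {'a': 0, 'b': 1, 'c': 0}
```

Triple patterning is harder precisely because 3-coloring has no comparable linear-time test, which is why the text notes that heuristics are needed to keep run times down.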


    DPT Auto Repair

    The computational effort required to verify the coloring in triple patterning is increasing, so including heuristics is one way to keep run times down. Quad patterning could even be required at the 7 nm node.


    Triple Patterning coloring conflict

    Metal fill insertion now has to deal with coloring, alignment with signal direction, uni-directional track-based fill and balancing out the density to avoid large gradients:


    Metal Fill Insertion

    A side effect of adding metal fill is the extra capacitance it introduces, which affects the timing of nearby nets. Metal fill always degrades worst-case negative slack (WNS) and total negative slack (TNS), causing iterations to get timing fixed. A smart approach is to identify these effects early and mitigate them before tape out.

    Related – Challenges of 20 nm IC Design

    The latest DRC tool from Synopsys is called IC Validator, and it has features to address each of the new challenges discussed so far. Concurrent DRC during P&R is called In-Design, and it saves you turnaround time.

    Traditional vs In-Design Flows

    Starting at the 28 nm node is where you see the biggest benefit of using In-Design instead of the older iterative flow. Something like 20 of the top 25 IC Compiler customers are already using In-Design. Having a single database shared between IC Compiler and IC Validator means there are no data streaming, tool setup or translation steps required. IC Validator runs natively and incrementally on just the changed areas, saving more time over tools that are forced to run on the whole IC layout. One Synopsys customer used the automatic DRC repair feature on their 28 nm design and saved two weeks in their schedule.

    Many of the DPT errors that are identified can be automatically repaired, and the router is part of making the needed repairs. This tight integration between P&R and DRC is really needed to cut down iteration times.

    Related – Enabling 14 nm FinFET Design

    Critical nets can meet their timing requirements by having the metal fill insertion add spacing on the same layer, which lowers the parasitic capacitance values.


    Critical nets with metal fill spacing

    IC designers can be somewhat sheltered from coloring and fill, but to get the best chip performance they may adopt a DRC methodology that is concurrent with P&R, run early and often. Using IC Validator is straightforward because the leading foundries (TSMC, UMC, SMIC) and Intel Custom Foundry all support run sets for their process nodes. You can even work with ST’s FD-SOI process at 28 nm today, with smaller nodes coming soon.


    Fitness Watch Anyone?

    by mbriggs on 12-01-2014 at 4:30 pm

    I’m an exercise junkie. I’m also not a spring chicken, so I like having the time on my wrist. I’ve been anxiously awaiting an iWatch to go with my iPhone 6, but patience is not a virtue of mine, and the iWatch is rumored to be expensive ($400-500), so I decided to try a fitness watch.

    This is a crowded area that includes activity trackers and runner’s watches with GPS. Fitbit and Nike seem to be leading the way in the activity tracker space, with GPS companies such as Garmin and Magellan trying to stay relevant with GPS-enabled runner’s watches.

    For me, a key component of a fitness watch is the heart rate monitor. Many of the fitness devices, including the iWatch, use optical sensors. These are generally green LEDs which shine through the skin. This seems to be a technology that hasn’t quite been perfected as active monitoring is a power drain, and the signal/accuracy is sometimes suspect. The runner’s watches on the other hand connect to a chest strap for heart rate monitoring. This is more reliable, but of course bulky and inconvenient.

    Jawbone recently introduced a different technology they call Bioimpedance. The sensor technology comes from BodyMedia, a health-monitoring startup Jawbone bought last year. It measures the resistance of body tissue to a tiny electric current, enabling the capture of a wide range of physiological signals, including your heart rate. If you’ve ever measured your body composition, such as fat content, this is very similar. There are metal studs inside that take the electrical bioimpedance measurements. The bad news is that Jawbone’s newest band, the UP3, still doesn’t have a display, just a few LEDs.

    I read all the reviews I could find, and several publications really liked the Basis Peak. To cut to the chase, the reviewer at re/code got it mostly right with her comment, “I think it earns the title of One of the Best Activity Trackers Available While We All Wait to See What the Heck the Apple Watch Is Really Like.” But there are a few areas where it falls short.

    I really wanted to love the Basis Peak. A fitness tracker that is also a watch is right up my alley. The experience started well: the packaging was extremely well done, even Apple-like. The problems started the morning after, and every morning after that. I wore the watch to bed, but I do not charge my phone in the same room, and it was a struggle to resync with my iPhone 6. It required quitting the app, unpairing Bluetooth, and restarting the watch. I struggle with almost all my Bluetooth devices, including the car and headsets; my Logitech keyboard is my only Bluetooth device that is anywhere close to reliable.

    The heart rate monitor often took a long time to get going, as in a minute or two. I shaved the hair off my wrist, thinking a poor connection with my skin was the problem, but it didn’t help. There were also times when the heart rate shown on the watch was obviously inaccurate.

    The calorie reading seemed to be right on, even though an exercise bike is considered walking, so you don’t get credit for steps; the Stairmaster 6000 (real steps) is also considered walking. The straw that broke the camel’s back was when water seeped into the watch during a hot tub incident, and now it doesn’t work at all.

    Net net, the concept is great, but I think seamless operation is a generation or two away. I haven’t quite decided whether I’ll try the Withings, the Garmin, the Fitbit, or wait for the iWatch. My top contender at the moment is the Fitbit Surge, available in early 2015.

    Suggestions anyone?


    3DIC in Burlingame

    by Paul McLellan on 12-01-2014 at 7:00 am

    Every year in December comes what I think of as the main 3D IC conference, where you can get up to speed on all the latest. Officially it is called 3D Architectures for Semiconductor Integration and Packaging, or 3D ASIP. It is held at the Hyatt Regency in Burlingame (the one right by 101 near the airport). This year it runs December 10-12.

    The first day is a pre-conference symposium. In the morning Herb Reiter is the master of ceremonies for a session on 3D-IC design tools and flows, with presentations by Bill Martin of eSystem Design, Zafer Kutlu of GlobalFoundries, Brandon Wang of Cadence, Norman Chang of ANSYS/Apache, John Ferguson of Mentor, Ming Li of Rambus, Durodami Lisk of Qualcomm and Jerry Frenckil of Si2.

    That afternoon Herb passes the baton to Phil Garrou for a discussion of 3D process technology. Dean Malta of RTI will talk about TSV Formation. Severine Cheramy of CEA-Leti will talk about Temporary Bonding and Via Reveal. Laura Mirkarimi of Invensas will talk about the rather broad topic of Assembly.

    The conference proper starts on Thursday December 11th at 8am. The opening keynote sessions are:

    • Steve Schultz of Si2 titled A Design Ecosystem for Internet of Things, How 3D IC Standards will Enable a New Growth Paradigm. He will talk about how IoT will be a driver for 3D and how important standards will be to making it happen in a timely manner.
    • Robert Sturgill of Micron on 2.5D and 3D Memory Solutions and Outlook. Micron builds the hybrid memory cube which has 4 memory die on top of a logic die, and so arguably they have as much experience at 3D in a commercial context as anyone.
    • Xin Wu of Xilinx on An Ultrascale 3D FPGA. Xilinx built what is regarded as the first 2.5D chip in commercial production. Since it is a very high end FPGA it does not ship in enormous volume nor does it have to meet a consumer price point, but it was clearly an experiment to serve as a learning vehicle.


    The next session is IoT, Memory and More than Moore, with presentations from Yole Development, GE, Novati and NVIDIA.

    After lunch, it is on to Perspectives on Manufacturing and Cost with presentations from Techsearch, Invensas and SavanSys Solutions. Since the main issues in 3D seem to be more about getting the cost down than about the technology of 3D manufacturing (we know how to make TSVs pretty well), this should be an interesting session.

    Herb then will moderate a panel with Qualcomm, Atrenta, EVG and UC Santa Barbara on how to further strengthen 2.5D/3D IC pathfinding.

    Finally, to wrap up the day, Ansys/Apache and Synopsys will talk about modeling, signal integrity and more.

    On Friday we start by dropping half a dimension with a session on 2.5D interposers, with presentations from GlobalFoundries, Nanium, CEA-Leti, RTI International and Unimicron.

    There is then a session on monolithic 3D, which is not building separate die and then using TSV to stack them but rather building a 3D chip by laying down more and more layers on a starting wafer. Presentations are by CEA-Leti, EV Group and Monolithic 3D.

    The final session is 2.5/3D Systems — Bringing It All Together. There are presentations from Fraunhofer Institute for ICs, ON Semiconductor and IBM TJ Watson Research.


    More articles by Paul McLellan…


    Don’t Mess with SerDes!

    Don’t Mess with SerDes!
    by Eric Esteve on 12-01-2014 at 2:23 am

    SerDes stands for Serializer/Deserializer, and a SerDes is a serious piece of design, requiring an extremely experienced team of analog engineers (below 10 years’ experience, you’re still a quasi-beginner). Better to rely on an analog guru to draw the SerDes architecture and manage the team! Why is SerDes becoming more and more important? First, because next-generation peripherals, tablets, servers, and other applications are demanding greater bandwidth at lower cost and power. To meet these demands, communications protocols like PCI Express® (PCIe®) have gotten substantially faster: PCIe Gen4 calls for signal transmission speeds of 16Gbps. Such a protocol-based function (PCIe, MIPI, SATA, etc.) is made of a Controller (100% digital) plus a PHY.

    The PHY itself can be broken into a PIPE interface plus a Physical Coding Sublayer (PCS), both digital, and the famous SerDes. We could imagine running some of the SerDes functions with Digital Signal Processing (DSP) circuitry, but the power consumption would explode; thus SerDes are completely analog, based on full-custom design. The second reason SerDes design is becoming more critical as the bit rate goes up (16Gbps for PCIe 4, 28Gbps for certain communication protocols) is that it requires new design techniques to compensate for the channel losses (as high as 27dB at Nyquist for PCIe 4) AND keep the power consumption as low as possible.
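    To get a feel for the numbers involved, here is a back-of-the-envelope sketch (my own, not from the Cadence white paper) of why 27dB of channel loss at Nyquist is so punishing for a 16Gbps link:

    ```python
    # Back-of-the-envelope SerDes link numbers (illustrative only).

    def nyquist_ghz(bit_rate_gbps: float) -> float:
        """For NRZ signaling, the fundamental (Nyquist) frequency is half the bit rate."""
        return bit_rate_gbps / 2.0

    def db_to_amplitude_ratio(loss_db: float) -> float:
        """Convert a channel loss in dB to the surviving linear amplitude ratio."""
        return 10 ** (-loss_db / 20.0)

    # PCIe Gen4: 16Gbps means an 8GHz Nyquist frequency...
    print(nyquist_ghz(16.0))                         # 8.0
    # ...and 27dB of loss leaves under 5% of the transmitted amplitude
    print(round(db_to_amplitude_ratio(27.0), 4))     # 0.0447
    ```

    With less than 5% of the signal amplitude surviving at the fundamental, it is clear why aggressive equalization (CTLE, DFE) is mandatory at these rates.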
    Until I read this white paper from Cadence, I was under the impression that I knew enough about SerDes design technology to discuss SerDes at a conference or a show… but I learned so many new design and architecture features that I really suggest you read the white paper “Defining a New High-Speed, Multi-Protocol SerDes Architecture for Advanced Nodes” from Cadence.

    Let’s try to summarize the main points.
    The author claims that a SerDes supporting the latest communications protocol specifications, including PCIe, calls for a new type of SerDes architecture that addresses the following needs with minimal power dissipation:

    • Data and clock recovery requirements in high-speed, high-dB-loss, and high-crosstalk channels
    • Critical loop timing specifications for the DFE
    • Environmental and process variations
    • Transmitter performance under low-supply conditions
    • High-speed clock distribution

    Each of these points is made explicit in the course of the white paper. Please keep in mind that the paper describes a SerDes supporting multiple protocols and multiple process nodes, going up to 16nm FinFET.

    Clock generation and distribution

    For clock generation and distribution, a low-jitter clean-up phase-locked loop (PLL) in the common area allows the use of a cost-effective reference. Un-buffered clock distribution on high-level metal avoids jitter that’s induced by power supply noise. The architecture also includes an in-lane local PLL operating at the TX baud rate.



    Dual-path Reception

    Traditional SerDes architectures have a limit in maximum achievable equalization. Typically, the DFE can’t begin opening the eye until clock recovery has occurred. Clock recovery, in turn, can’t start until the eye is slightly open. The clock recovery and data recovery paths also place conflicting demands on the continuous time linear equalizer (CTLE) frequency response. Thus the architecture consists of separate optimized paths for clock recovery and data recovery. For each path, the relative timing is adjusted by an adaptive loop, which saves power. Decoupled clock recovery also allows for much better jitter tolerance because the CTLE and edge samplers are optimized for the clock path. Unlike many SerDes architectures, this new architecture allows use of every edge in the data stream for clock recovery.

    The optimized clock path gets more signal and less noise than in a single-path design, due to a number of factors. For one, a separate CTLE for clock recovery, shown in the lower portion of Figure 3 (above), allows high-frequency peaking optimization for clock recovery. The equalizer is converged at the clock sample time, without having to rely on incorrect discrete equalization converged at the data sample time. And, all data patterns can contribute to clock recovery.

    In Figure 3, the red blocks show the adaptive loops in the receiver. A digital controller manages all of the loops. Some of the adaptive loops are for start-up only. Others run in the background, so if there are changes in, for instance, humidity or temperature, then the backplane automatically adjusts to accommodate the changes. This approach allows continuous uptime, as the background adaptive loops do not interrupt the flow of data through the system.
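    To illustrate what the DFE in the data path is doing, here is a deliberately simplified toy model (my own sketch, not Cadence’s implementation, with invented channel coefficients): already-decided symbols are fed back to cancel post-cursor inter-symbol interference.

    ```python
    # Toy decision feedback equalizer (DFE) over NRZ symbols (+1 / -1).

    def dfe_slice(samples, taps):
        """Slice each sample after subtracting the ISI contributed by past decisions."""
        decisions = []
        for x in samples:
            # taps[0] weights the most recent decision, taps[1] the one before, etc.
            recent = list(reversed(decisions[-len(taps):]))
            isi = sum(t * d for t, d in zip(taps, recent))
            decisions.append(1 if (x - isi) > 0 else -1)
        return decisions

    # Lossy channel with two post-cursor taps (0.6, 0.5): each symbol leaks forward
    tx = [1, 1, -1, -1, 1, -1]
    rx = [tx[i] + 0.6 * (tx[i-1] if i >= 1 else 0) + 0.5 * (tx[i-2] if i >= 2 else 0)
          for i in range(len(tx))]

    print([1 if x > 0 else -1 for x in rx] == tx)   # False: raw slicing makes errors
    print(dfe_slice(rx, taps=[0.6, 0.5]) == tx)     # True: the DFE cancels the ISI
    ```

    A real DFE must of course adapt its taps and close its feedback loop within one baud interval, which is exactly the “critical loop timing” challenge the white paper calls out.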

    Hybrid Tx Path

    The hybrid TX path (Figure 4) in the multi-protocol, high-speed SerDes architecture is designed with a hybrid driver with true emphasis, not just de-emphasis. This path offers better rise times thanks to a boost circuit, lower output capacitance than an H-bridge, and less wasted power in de-emphasis. The hybrid TX path addresses transmitter effects by:

    • Maintaining power advantages inherent to non-emphasized voltage mode
    • Allowing additional amplitude in excess of what the voltage mode can produce
    • Requiring less power in emphasis than conventional voltage mode or current mode driver
    • Allowing, through the use of broadband matching, the use of larger and better protecting electrostatic discharge (ESD) diodes
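    The effect of transmit emphasis can be sketched with a simple 2-tap FIR model (my own illustration with hypothetical coefficients, not the actual driver design): transitions are sent at boosted amplitude relative to repeated bits, pre-compensating the channel’s high-frequency loss.

    ```python
    # Toy 2-tap transmit FIR: main cursor plus one post-cursor tap.

    def tx_fir(bits, main=1.0, post=-0.25):
        """Each output = main*current symbol + post*previous symbol (NRZ +1/-1)."""
        out = []
        prev = 0
        for b in bits:
            out.append(main * b + post * prev)
            prev = b
        return out

    # Transitions come out boosted (1.25), repeated bits de-emphasized (0.75)
    print(tx_fir([1, 1, -1, -1]))   # [1.0, 0.75, -1.25, -0.75]
    ```

    The distinction the paper draws is where that extra transition amplitude comes from: true emphasis adds swing above the nominal level instead of only attenuating the repeated bits, which is what a plain de-emphasis voltage-mode driver does.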

    Since the data path is a full-speed DFE, it avoids the substantial increase in required circuitry and clock distribution that unrolling would need. In addition, the IP:

    • Eliminates the need for a critical IQ phase-aligned clock distribution
    • Uses a reduced area and power, wide-frequency-range phase interpolator
    • Features lower frequency, top-level clock distribution

    Summary
    A new multi-protocol, high-speed SerDes architecture, designed for advanced nodes, addresses all of the existing and new challenges while offering the following characteristics:

    • Support for data rates of 1Gbps up to 16Gbps, with a continuous frequency range
    • Compliance with:

      • PCIe Gen4: 2.5Gbps, 5Gbps, 8Gbps, 16Gbps
      • 10G-KR: 10.3125Gbps, 12.5Gbps
      • XAUI: 3.125Gbps
      • RXAUI: 6.25Gbps
      • Gigabit Ethernet/SGMII: 1.25Gbps
      • SATA: 1.5, 3, 6Gbps
      • HMC-SR: 10Gbps, 12.5Gbps, 15Gbps
    • Equalization up to 30dB channel loss (in the presence of -48dB crosstalk)
    • Low power
    • Flexibility and robustness

    An IP vendor targeting the interface IP market, expected to grow at a 10%+ CAGR until 2020, has to offer an integrated solution (Controller + PHY). This means that such an IP vendor has to perfectly master SerDes design technology, not only on mainstream nodes but also on advanced nodes like 16nm FinFET, supporting data rates up to 16Gbps. Developing such a multi-protocol SerDes is a real challenge, but the ROI will be high. In fact, the market demand for higher bandwidth is growing incredibly fast: every year, the demand for storage grows by 60%. Before storing data you need to exchange it, and increasing system bandwidth is a good way to keep the size and cost of networking systems reasonable. But you need to increase the frequency of the various protocols (Ethernet, PCI Express, etc.), and to do so, you need new, power-efficient SerDes.

    Learn more about Cadence’s multi-protocol, high-speed SerDes PHY IP at:
    http://ip.cadence.com/ipportfolio/ip-portfolio-overview/interface-ip/serdes-ip

    By Eric Esteve from IPNEST


    How to Optimize for Power at RTL

    How to Optimize for Power at RTL
    by Daniel Payne on 11-30-2014 at 7:00 pm

    Last week I was traveling in Munich attending the MunEDA User Group meeting, so I missed a live webinar on the topic of optimizing for power at RTL. I finally got caught up on my email this week and had time to view this 47 minute webinar, presented by Guillaume Boillet of Atrenta. He recommended using a combination of automatic, semi-automatic and manual approaches to reduce power. At the SoC level you can make decisions about power and voltage domain partitions, critical blocks can use manual optimization like coarse or fine-grained clock gating, and non-critical blocks can use automatic power optimization techniques.


    Example SoC Block Diagram: Broadcom BMC2153

    The proposed power optimization flow consists of several steps, beginning with your initial RTL as input to the process, and ending up with a gate-level netlist after logic synthesis, placement and routing.


    Power Optimization Flow

    The first step, Power Estimation, is where your RTL or even a netlist is parsed along with any input stimulus or statistical toggling estimates, producing power numbers per cycle or averaged over time.
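    Conceptually, such an estimate combines per-net switching activity with capacitance and supply voltage using the classic dynamic power relation P = α·C·V²·f. A minimal sketch (my own, not SpyGlass’s actual model; the capacitance and activity numbers are invented):

    ```python
    # Toy RTL-level dynamic power estimate: P = alpha * C * Vdd^2 * f per net.

    def dynamic_power_mw(nets, vdd=0.9, freq_hz=1e9):
        """nets: list of (toggle_rate 0..1, capacitance in farads). Returns milliwatts."""
        total_w = sum(alpha * c * vdd**2 * freq_hz for alpha, c in nets)
        return total_w * 1e3

    # Two nets: a busy data net and a mostly-idle control net (values assumed)
    nets = [(0.5, 10e-15), (0.02, 5e-15)]
    print(round(dynamic_power_mw(nets), 6))   # ~0.004131 mW
    ```

    The point of the stimulus (or statistical toggling estimate) in the flow is precisely to supply realistic per-net α values; the tool supplies the capacitances from a technology model or, in the physical-aware mode described below, from actual placement.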

    Related – Improve Test Robustness & Coverage Early in Design

    There’s an optional Power Calibration step shown in the upper-left of the flow, and this is for designers that want to correlate a gate-level netlist with capacitances for interconnect against RTL numbers. The steps for power calibration are:


    Power Calibration

    SPEF is the interconnect parasitics, SGDC is the SpyGlass Design Constraint file, ACM is the Advanced Capacitance Model, and SIM is your input stimulus. The difference between the RTL power estimates and the calibrated gate-level numbers should be within 15%.

    Another option within the Power Estimation phase is to do a physical-aware step, where you use actual placement information about cells and IP blocks like memory. Timing comes from a path delay calculation, so run times are slower:


    Physical-aware Power Estimation

    Related – Finding Logic Issues Early that Impact Physical Implementation

    Going back to the power optimization flow, the power profiling block is where a designer gets feedback on power estimates for each block in the SoC. Numbers on clock gating efficiency and activity levels give the designer the analysis needed to decide which blocks should use clock gating techniques. A power browser displays the numbers in a visual GUI to show you which blocks consume the most power.


    Power Browser

    Clicking on a block you can see more details like the registers, memories, micro-architecture and clocks being used, all opportunities for power reduction techniques.

    Related – A Complete Scalable Solution for IP Signoff

    There are several sequential power reduction techniques available; one is called the stability condition, where the enable to a downstream register can be identified and controlled.


    Stability Control

    Another sequential power reduction technique is called Observability Don’t Care Condition, where the enable to a register can be identified so that it doesn’t toggle node Q and the following logic stays dormant.


    Observability Don’t Care Condition
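    The two sequential gating conditions can be boiled down to a toy predicate (my own deliberately simplified sketch; a real ODC implementation must also guarantee the value is re-latched before it is next observed, which this toy ignores):

    ```python
    # Toy model of the two sequential clock-gating conditions.

    def should_clock(d, q, downstream_observes):
        """Return True only if the register actually needs a clock edge this cycle."""
        stable = (d == q)                # stability condition: D already equals Q
        # ODC condition: skip the edge if downstream logic ignores Q this cycle
        return (not stable) and downstream_observes

    print(should_clock(d=1, q=1, downstream_observes=True))   # False: value unchanged
    print(should_clock(d=0, q=1, downstream_observes=False))  # False: result unobserved
    print(should_clock(d=0, q=1, downstream_observes=True))   # True: clock it
    ```

    In hardware these predicates become extra enable logic feeding an integrated clock-gating cell, which is why the tool reports clock gating efficiency per block: the savings only materialize when the enable is false often enough to pay for the added gating logic.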

    Power reduction techniques applied to memories include:

    • Input data registers clocked
    • Redundant access removal
    • Light sleep mode activation

    Feedback on how to modify your micro-architecture for lower power can come through FIFO optimization, counter gating and glitchy input identification. Activity trigger detection can find and show you the root causes of power changes from idle to active, spikes or surges.


    Activity Trigger Detection

    The RTL Power Verification step in the flow is where you want to double-check that your power goals have been met as specified by UPF 2.0 or 2.1, that power lint checks have been run, and that the RTL is consistent with the power intent. Power Verification also includes a step where the post-synthesis netlist is checked for the same consistency.

    Related – A Complete Timing Constraints Solution, Creation to Signoff

    The SpyGlass tool suite continues to expand over time, initially starting with lint and now helping RTL designers create power optimized designs. View the entire webinar here including a Q&A session, after a brief registration process.