
Designing an IDCT for H.265 using High Level Synthesis

by Daniel Payne on 07-27-2015 at 8:00 pm

Math geeks know all about the Inverse Discrete Cosine Transform (IDCT), and a popular use is in the hardware architecture of High Efficiency Video Coding (HEVC), also known as H.265, the new video compression standard that is widely used in consumer and industrial video devices. You could hand-code RTL to create an IDCT function, but that would take many more lines of code and more precious engineering time than using a higher-level language like C++ or SystemC. The promise of High Level Synthesis (HLS) is that you can code your video algorithms in much less time and with much less code than RTL, thus getting to market quicker with less engineering effort.

Uday Das from Calypto presented a tutorial at the #52DAC event last month in San Francisco titled “Building an IDCT for H.265 Using Catapult”, so I reviewed the 46 slides and share my impressions in this brief blog. The HEVC specification calls for four transform unit sizes (4×4, 8×8, 16×16 and 32×32) to code the prediction residual. The hardware architecture here uses a row-column decomposition approach that performs a 1-D operation on each row, followed by another 1-D operation on each column:

Related – NVIDIA and Qualcomm Talk about High Level Synthesis, Samsung on Low Power for Mobile

Algorithm
The IDCT algorithm can be described as a lower order matrix embedded in a higher order matrix, then detailed in a signal flow graph as an 8 point IDCT A8, made up of 4 point 1D IDCT A4 and an odd matrix M4:

Data flow for this algorithm can be designed using two major functions: Butterfly, Mult_odd.

An interface description can then be written in either C or SystemC, where C code is more compact:


A core class can be written and then re-used for the 4, 8, 16 and 32 points of Mult_odd and Butterfly member functions:

The Butterfly function is common for all sizes, and notice that there is no timing information at this level. The HLS tool Catapult will unroll the loop to create hardware for parallel execution.
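To make the Butterfly / Mult_odd split concrete, here is a minimal functional sketch in Python. It is not the C++ from the tutorial slides, and it is not Catapult input: it uses floating-point cosines and no scaling instead of the integer coefficients that HEVC defines, and the function names are my own. It only illustrates how an N-point 1-D IDCT can be built from an N/2-point IDCT on the even coefficients, an odd-matrix multiply on the odd coefficients, and a butterfly that combines the two.

```python
import math


def mult_odd(odd_coeffs, n_points):
    """Multiply the odd-indexed coefficients by the (N/2 x N/2) odd matrix."""
    half = n_points // 2
    return [
        sum(
            c * math.cos((2 * n + 1) * (2 * j + 1) * math.pi / (2 * n_points))
            for j, c in enumerate(odd_coeffs)
        )
        for n in range(half)
    ]


def butterfly(even_part, odd_part):
    """Combine even and odd halves: sums on top, mirrored differences below."""
    half = len(even_part)
    top = [even_part[n] + odd_part[n] for n in range(half)]
    bottom = [even_part[half - 1 - n] - odd_part[half - 1 - n] for n in range(half)]
    return top + bottom


def idct_1d(coeffs):
    """Unscaled N-point 1-D IDCT (N a power of two) via even/odd decomposition."""
    n_points = len(coeffs)
    if n_points == 1:
        return [coeffs[0]]
    even_part = idct_1d(coeffs[0::2])             # lower-order IDCT, embedded
    odd_part = mult_odd(coeffs[1::2], n_points)   # odd matrix multiply
    return butterfly(even_part, odd_part)


def idct_direct(coeffs):
    """Reference: the same unscaled transform as a full matrix multiply."""
    n_points = len(coeffs)
    return [
        sum(
            c * math.cos((2 * n + 1) * k * math.pi / (2 * n_points))
            for k, c in enumerate(coeffs)
        )
        for n in range(n_points)
    ]
```

The same three functions serve 4, 8, 16 and 32 points, which mirrors the re-use argument above. In the Python model the recursion runs sequentially, whereas an HLS tool would be expected to unroll the loops into parallel hardware.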

Related – Shorten the Learning Curve for High Level Synthesis

Our functional model of the 1-D IDCT has instances of function calls and some muxes:

To meet the H.265 specification we have to make a parallel implementation and create a 2-D IDCT using some hierarchy:
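The row-column idea behind the 2-D IDCT can be sketched in a few lines of Python. This is an illustrative model only, assuming an unscaled floating-point 1-D IDCT (the helper names are hypothetical, and the real design uses HEVC's integer coefficients with intermediate scaling): run the 1-D transform over every row, transpose, run it again, and transpose back.

```python
import math


def idct_1d_direct(coeffs):
    """Unscaled 1-D IDCT written as a plain matrix-vector product."""
    n_points = len(coeffs)
    return [
        sum(
            c * math.cos((2 * n + 1) * k * math.pi / (2 * n_points))
            for k, c in enumerate(coeffs)
        )
        for n in range(n_points)
    ]


def transpose(block):
    """Swap rows and columns of a square block stored as a list of lists."""
    return [list(col) for col in zip(*block)]


def idct_2d(block):
    """2-D IDCT by row-column decomposition: 1-D on rows, then on columns."""
    rows_done = [idct_1d_direct(row) for row in block]
    cols_done = [idct_1d_direct(col) for col in transpose(rows_done)]
    return transpose(cols_done)
```

In hardware the transpose step typically becomes a memory between the two 1-D stages, which is one reason the 2-D block is described with hierarchy.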

Using HLS
Designers use the HLS tool Catapult by adding design files, clicking on the hierarchy tab to select the top-level blocks, then clicking on libraries to select a specific technology and RAM models. Next you click on mapping and choose a target clock frequency, then map your data_in and data_out as RAM.

You next select your main loop and see which resources are being used in the design:

To schedule when operations are to occur you click on the schedule tab and work with a Gantt chart. Finally, you are ready to generate RTL code.

Verification
To double check that the generated RTL code is actually performing what we had in mind with our algorithm we need to create a testbench and verification flow. Most of this process is now push-button automated for us:

The transactors convert function calls into pin-level signal activity.
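As a toy illustration of what a transactor does, the Python sketch below turns one function-call argument list into a sequence of per-cycle pin values and back again. The valid/data protocol and the function names are hypothetical, chosen only for this example; the transactors that the tool generates are not shown in the slides I reviewed.

```python
def drive_transaction(values):
    """Expand one function call's data into per-cycle pin values."""
    cycles = [{"valid": 1, "data": value} for value in values]
    cycles.append({"valid": 0, "data": 0})  # one idle cycle closes the burst
    return cycles


def collect_transaction(cycles):
    """Rebuild the function-level data from sampled pin values."""
    return [cycle["data"] for cycle in cycles if cycle["valid"] == 1]
```

A testbench built this way can feed the same stimulus to the C++ model as a function call and to the RTL as pin activity, then compare results at the function level.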

Related – Verifying the RTL Coming out of a High-Level Synthesis Tool

Summary
The tutorial from DAC showed me that C++ and SystemC coding are more compact to describe my video hardware than using RTL code. The Catapult tool for HLS is used to control micro-architectural decisions so that I can trade off power, performance and area metrics.

Companies like Google have found that using HLS on their VP9 video compression design was 2X faster than the previous approaches using hand-coded RTL, while dramatically reducing the number of lines written. Give the folks at Calypto a call to start discussing how appropriate HLS is for your hardware architecture. You may just find out that you can get your next IP or SoC to market in less time with fewer engineers, a nice benefit.


6 Memory Considerations for IoT Designs Built Around Cortex-M7 MCUs

by Majeed Ahmad on 07-27-2015 at 12:00 pm

Tightly coupled memory (TCM) is a salient feature of the Cortex-M7 microcontrollers: it boosts MCU performance by offering single-cycle access for the CPU and by securing the high-priority, latency-critical requests from the peripherals.



Synopsys’ Andreas Kuehlmann on Software Development

by Paul McLellan on 07-27-2015 at 7:00 am

Andreas Kuehlmann is the general manager of what is officially now known as the Software Integrity Group of Synopsys, what you might think of as Coverity although they have made some acquisitions too, so they now have a broader technology base. I sat down to talk to him last week.

He was brought up in Germany and came to the US in 1991 to join the IBM TJ Watson Research Center. He was involved with high level synthesis and worked on equivalence verification, in time enabling IBM’s custom verification.

In 2000 he joined the Cadence Berkeley Labs (which was where I think I first met him since I was at Cadence at the time). In 2003 Andreas was promoted to being in charge of running the labs. In 2010 he joined Coverity as the VP of R&D. Funnily enough he had also become the president of IEEE Council on Electronic Design Automation (CEDA) so he became president of CEDA just before leaving design automation! Since 2002 he has also been an adjunct professor at Berkeley.

Coverity was acquired by Synopsys and in May of this year Andreas was appointed GM of the Software Integrity Group. Despite being part of Synopsys, Andreas emphasized that their business is not EDA. They are serving the software industry, which is much larger than semiconductor. To give you an idea, there are about 100,000 design engineers, 1M embedded software engineers and 10+M software engineers in total, growing 10% per year.

A year ago they acquired a small startup in France which manages software test execution, finding which tests need to be run when a change is made. Recently, they added two acquisitions in the area of dynamic security testing, complementing the static analysis approach used by the Coverity technology with dynamic analysis.

The mission of the group is to make software development a more mature process. There is a great diversity in the maturity level companies apply to software development and many don’t use modern methodologies. In chip design you don’t get to “run” the code by taping out the chip, so if you don’t use modern methodologies you don’t get working chips. Software development is not like that, and quality and security suffer as a result.

What is needed is a more general approach like we use in hardware design, with a combination of different approaches. Static analysis under the hood uses some of the same technology as formal verification, but there is no code reuse; you can’t just yank out some Synopsys product and make a software version. Some Synopsys products in the system space, such as virtual platforms, are also involved in embedded software, where there is a much stronger awareness of the disciplined approach since the developers see what the IC designers do day-to-day.

Embedded software is simply any software that runs in a box: a car, a washing-machine, a router. It is not small scale. There is a lot of code in your smartphone as you probably know, although that is a lot less mission critical than your car.

In the IC world, the tool investment is $50-100K per engineer. In the software world it is more like $10-12K. This will change. Software development is a labor-intensive process and with modern tools it can be done much better. It makes no sense to pay a software engineer $150K/year and then not give him or her good tools, any more than it makes sense with an IC designer.

I asked Andreas about open source competitors. He says they are inferior. Anything with high algorithmic content is hard to develop using open source projects because it depends on deep expertise, not just manpower. It really doesn’t make any more sense for a software engineer to write their own C++ static analysis than it does for a design engineer to write their own static timing analysis. Apart from the opportunity cost, they almost certainly don’t know how to do it.

Having said that, they are involved with the open source community. They have scan.coverity.com which allows open source projects to use the Coverity technology for free. It has been applied to several thousand projects already. And they have found their share of bugs, even in some high profile projects like Linux and Apache (the webserver, not the EDA company).

See also Bijan Kiani Talks Synopsys Custom Layout and More
See also Antun Domic, on Synopsys’ Secret Sauce in Design


Power Analysis Needs Shift in Methodology

by Pawan Fangaria on 07-26-2015 at 7:00 am

Most of the time, we do not realize that our focus is in the wrong spot until we hit a bottleneck. That is the case with power analysis at the SoC level. Power has become as important as, if not more important than, the functionality and other parameters of an SoC, and therefore has to be verified earlier, along with the functional verification of the SoC. Today, an SoC is a complete system with various functional units that have different power profiles and requirements. Hence the traditional method of running simulation, generating switching activity for a number of cycles, using a power analysis tool to analyze this activity, and then extrapolating to estimate power for the whole SoC is no longer appropriate; it was fine for smaller chips with limited functionality. Today, we need to analyze actual switching activity for the complete run of applications on an SoC.

Emulation is an emerging solution to capture switching activity over long emulation runs, typically in a SAIF (Switching Activity Interchange Format) file. However, SAIF does not have temporal information, which is a key need to identify power peaks at different times. VCD and FSDB formats have temporal information; however, they are inefficient due to their long generation times and read/write times. There are also other inefficiencies in their data organization, storage and access mechanisms. Moreover, the power analysis tools may not be able to handle such large files generated by emulation. Overall, even an emulation methodology based on a file-based flow is not the right solution for exploring and analyzing power at the SoC level.

So, what’s the alternative for detailed power analysis of design regions and applications that cause high switching in SoCs in real scenarios? During the 52nd DAC we heard from Mentor and ANSYS about an innovative approach where Veloce generates real-time dynamic power data and PowerArtist reads it directly for power measurement and analysis without any file-based interface. I have already written about some details on the PowerArtist side; see the link at the end of this article. Now it’s my pleasure to write about some actual details of what happens on the Veloce side.

The Veloce emulation system is used to boot the OS and run live applications. The Veloce Activity Plot is its unique capability to identify high-switching regions over long emulation runs and enable designers to trace back to the logic blocks or applications that have power concerns. One can view the activity plot of a full design and analyze its power consumption pattern in an order of magnitude less time than a file-based system takes. As an example, the activity plot of a 100 million gate design for 75 million clock cycles can be generated in just 15 minutes by the Veloce emulation system.

After identifying high switching activity regions at the top level of the design, the sub-blocks or applications responsible for high switching activity can be analyzed further. The time zone information thus obtained can be captured in a tzf (Time Zone File) and sent to Veloce to generate complete data for the selected time windows for detailed power analysis.

During the emulation run, live switching data can be sent to PowerArtist through the Dynamic Read Waveform API (DRW-API). This approach enables accurate power calculation at the system level, where booting an OS and running software applications is required. The dynamic API-based streaming of switching data between emulation and power analysis tools allows all operations to run in parallel, including emulation of the SoC, capture of switching data, reading of the switching data, power analysis and generation of power numbers.

Both tools work on the same data model, which improves the efficiency of data exchange between them. The compile times of both tools are aligned. Also, a native ‘critical signal list’ (typically 10-20% of total signals in a design) is used to further improve time to power by reducing data exchange between the tools.

Veloce along with PowerArtist enables complete RTL power exploration for power budgeting and tradeoffs, as well as accurate power analysis and signoff at the gate level in a targeted application environment. The direct data exchange provides a large improvement in the time of the overall flow. The verification cycles to collect design switching activity can be very long compared to simulation. Data-driven decisions for accurate power analysis are enabled over a variety of test scenarios.

This particular flow to generate power numbers has provided up to 4.25x speed improvement over a file-based flow on real customer designs. A table of designs along with their speed improvement numbers is given in a whitepaper written by Vijay Chobisa and Gaurav Saharawat at Mentor Graphics. The whitepaper contains more details about the new methodology.

Also read: How PowerArtist Interfaces with Emulators

This innovative methodology offers a powerful and accurate way to measure and analyze power in a new environment where multiple functions such as computing, gaming, video streaming and watching movies can be integrated on a single device such as a smartphone. We can expect more such innovations for power analysis in the near future.

Pawan Kumar Fangaria
Founder & President at www.fangarias.com


A Candid Conversation with the GlobalFoundries CEO!

by Daniel Nenni on 07-25-2015 at 8:00 pm

I did not know Dr. Sanjay Jha prior to this meeting but I certainly knew of him from his time at Qualcomm. It seemed a bit odd for me to fly to Dresden to meet a man who is based here in Silicon Valley but that made the meeting all the more interesting. Especially after finding out that German Chancellor Angela Merkel would also be visiting Fab 1, absolutely.

While at Qualcomm Sanjay was known as a technically brilliant man with exemplary business skills, an assessment with which I agree wholeheartedly. After 20 years at Qualcomm, which started in design and ended in the executive ranks, Sanjay joined Motorola as co-CEO and spun out the mobility division which he later sold to Google for a whopping $12.5B. The next time I heard Sanjay’s name was on the short list for the next Intel CEO which I think would have been an excellent choice. That position of course went to an Intel insider (Brian Krzanich) and much to my surprise and great pleasure Sanjay joined GlobalFoundries. Knowing what I do about him I expected great things but I never would have expected the acquisition of the IBM semiconductor division. Not only did GF acquire it, they got paid $1.5B! Talk about the epic deal of the century!

One thing I should warn you about when talking to Sanjay is to be careful what you ask technically because he can go down to the transistor level, no problem. We had a very interesting discussion about Vts and body bias constraints. It is interesting to note that body bias is not mandatory when designing for FD-SOI, but if you choose to use it, it can be applied at the chip and block level, so a designer can do what-ifs and decide how best to leverage it, but I digress… I will have one of our PhD bloggers write about body biasing in more detail because it seems to be a point of contention between CMOS and FD-SOI design.

After the interview I was afforded a trip through the Fab 1 clean room. Certainly not my first time through a clean room but it is always exciting to see the inner workings of semiconductor manufacturing. In fact one of my first jobs out of college was in a fab on Mathilda Avenue in Sunnyvale circa 1984. I remember seeing the “No Accidents in X Days” sign for the first time and the X was in single digits, which was frightening considering how toxic fabs could be back then. And I have no idea why they call clean room outfits bunny suits because there is nothing cute about them!

Other bloggers covered the technical details of the 22nm FD-SOI announcement:

GlobalFoundries 22nm FD-SOI: What Happens When

GlobalFoundries FD-SOI. Yes, It’s True

GlobalFoundries Endorse ST/LETI FD-SOI 22nm!

And Paul McLellan just added two more interesting FD-SOI blogs here:

FD-SOI: a Gentle Introduction

Thomas Skotnicki: FD-SOI 26 Years in the Making

Sanjay also committed an additional $250M investment for 22nm development and capacity. This brings the total Dresden investment to more than $5B since 2009, ensuring that Fab 1 will continue to be the largest semiconductor manufacturing facility in Europe. Hopefully Chancellor Merkel was impressed and offers matching funds or government incentives of some sort.

Also Read:

CTO Interview with Dr. Wim Schoenmaker of Magwel

IROC Technologies CEO on Semiconductor Reliability

CEO Interview: Jens Andersen of Invarian


Device Noise Analysis, What Not to Do for AMS IC Designs

by Daniel Payne on 07-24-2015 at 12:00 pm

AMS IC designers have a lot to think about when crafting transistor-level designs to meet specifications and schedules, so the most-used tool in their kit is the trusted SPICE or FastSPICE circuit simulator to help analyze timing, power, sensitivity and even device noise. I just did a Google search for “device noise analysis ic” and the first five of ten search results all have links to Mentor Graphics, so that led me to read their recent White Paper entitled “Ten Common Device Noise Analysis Mistakes”.

The three types of Device Noise Analysis are listed in this table for a variety of AMS circuits:


The basic idea is to run the appropriate type of noise analysis depending on the circuit type. If you are using a standard SPICE circuit simulator, then the following list of 10 mistakes to avoid will be helpful, or you could consider using the Analog FastSPICE (AFS) tool from Mentor, which is equipped to handle each issue.

1. Insufficient Transient Accuracy
An AMS circuit may require 80 dB to 120 dB of dynamic range; however, SPICE tolerances typically default to reltol=1e-3, providing just 60 dB of dynamic range. For each additional 20 dB of dynamic range, tighten reltol by a factor of 10.
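The rule of thumb above (1e-3 gives 60 dB, and every extra 20 dB costs another factor of 10) is the same as saying dynamic range in dB equals -20·log10(reltol). Here is a small Python sketch of that arithmetic; the function names are mine, and the mapping is only the rule of thumb from this section, not a guarantee from any particular simulator.

```python
import math


def dynamic_range_db(reltol):
    """Dynamic range implied by a relative tolerance, per the rule of thumb."""
    return -20.0 * math.log10(reltol)


def reltol_for_dynamic_range(target_db):
    """Relative tolerance needed to reach a target dynamic range in dB."""
    return 10.0 ** (-target_db / 20.0)
```

By this rule an 80 dB requirement calls for reltol around 1e-4, and a 120 dB requirement calls for reltol around 1e-6.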

2. Periodic Noise Analysis with Too Few Sidebands
The default number of sidebands in SPICE for periodic noise analysis is under 50; however, for 45 nm and smaller geometry process nodes you will want to use more than 1,000 sidebands. Traditional SPICE run times increase quadratically with the number of sidebands, and you can quickly run out of RAM. The magic sauce with AFS RF allows you to run full-spectrum device noise analysis, which has an accuracy equivalent to an unlimited number of sidebands per run.
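To see why quadratic scaling hurts, here is a one-function Python sketch of the arithmetic. It assumes a default of exactly 50 sidebands purely for illustration (the white paper only says the default is under 50) and takes the quadratic relationship at face value.

```python
def runtime_scale(old_sidebands, new_sidebands):
    """Relative run time if cost grows with the square of the sideband count."""
    return (new_sidebands / old_sidebands) ** 2


# Going from an assumed 50 sidebands to 1,000 is 20x the sidebands,
# so under a quadratic cost model the run time grows by 20 squared.
scale_50_to_1000 = runtime_scale(50, 1000)
```

Under those assumptions, a run that took one minute at the default setting would take several hours at 1,000 sidebands.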


Switched-Capacitor Filter Noise Voltage Comparison

3. Simplifying Circuits for Periodic Noise
Using SPICE for periodic noise analysis consumes too much RAM, so you may be tempted to just simplify your netlist by removing transistors and other elements. The big downside is that you could make a mistake in your reduced netlist, plus you are taking up valuable engineering time. With the AFS RF tool you can simulate circuits with more than 100K elements, a much easier approach.

Related – Full Spectrum Analog FastSPICE Useful for RF Designs on Bulk CMOS

4. Not Including Parasitics in VCO device noise analysis
With limited capacity in SPICE it’s tempting to simulate a VCO using a pre-layout netlist with no parasitics, although the values your simulation returns aren’t really accurate without parasitics.

5. Using Oscillator Noise instead of VCO noise
Having a circuit simulator that can perform only free running oscillator noise is quite a limitation if you really have a voltage controlled oscillator (VCO), because the analysis results are quite different. The AFS RF tool handles both free running oscillator and VCO noise analysis.

6. Manually Analyzing VCO Sensitivity
One part of optimizing a VCO is to minimize phase noise, and you can perform this analysis manually by taking more time, increasing the possibility of errors, and really only approximating the values. A more elegant approach is to visually see the instantaneous noise from every device as it contributes to the VCO output noise. Here’s what that analysis looks like with AFS RF for the noise intensity, sensitivity and jitter:


​VCO Noise Contribution Sensitivity

7. Not Updating Transient Noise for Every Device at Every Time Step
A traditional transient noise simulator may update the random device noise injection at fixed time intervals, but not at every simulator time step, causing inaccuracy. With the AFS simulator you will see accurate transient noise results by injecting noise at every time step.

8. Setting Transient Noise NoiseFmax Too Low
With SPICE you can set the noisefmax simulator setting to trade off runtime and accuracy: choosing a smaller noisefmax gives a shorter runtime but lower accuracy, while choosing a higher noisefmax gives better accuracy but a longer runtime. It’s hard to tell when you’ve made the right trade-off with noisefmax. Since AFS uses the full noise spectrum from noisefmin to noisefmax, there is no truncation of the device noise spectrum.

Related – DAC Update on IC Design Tools at Mentor

9. Setting Transient Noise Tstop Too Short
Users get to set their tstop value controlling how many cycles a simulation runs, but choosing a value that is too short means that the results could be statistically uncertain. When using the AFS simulator for transient noise analysis you get a recommended tstop value that gives the desired statistical confidence level.

10. Post-Processing Mistakes
Many AMS designs have performance metrics measured in the frequency domain, while a transient noise analysis is in the time domain, so you have to use fast Fourier transform (FFT) post-processing on the data. Be cautious about post-processing because accuracy suffers from:

  • Extra spectral leakage beyond 2.5 FFT bins
  • Signal frequencies that are not perfectly centered on an FFT bin
  • An FFT window which doesn’t minimize spectral leakage
  • Use of the MATLAB default FFT windows


Excessive Spectral Leakage

With AFS there’s CalcPad that does the post-processing of transient noise waveforms to ensure correct FFT-based results.
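The leakage pitfalls listed above are easy to reproduce with a few lines of plain Python. This sketch is my own illustration, not something from the white paper or from CalcPad: it evaluates single DFT bins for a sine wave that lands exactly on a bin, for one that falls halfway between bins, and for the off-bin tone after a Hann window is applied. The sample count, tone frequencies and the bins inspected are arbitrary choices for the demo.

```python
import cmath
import math


def tone(cycles, n_samples):
    """Sine wave with the given number of cycles across the record."""
    return [math.sin(2.0 * math.pi * cycles * i / n_samples) for i in range(n_samples)]


def hann_window(n_samples):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n_samples) for i in range(n_samples)]


def bin_magnitude(samples, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    n = len(samples)
    return abs(sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)))


def leakage_db(samples, peak_bin, far_bin):
    """Energy at a distant bin relative to the bin nearest the tone, in dB."""
    peak = bin_magnitude(samples, peak_bin)
    far = max(bin_magnitude(samples, far_bin), 1e-300)  # floor avoids log10(0)
    return 20.0 * math.log10(far / peak)


N = 256
off_bin = tone(20.5, N)                      # not centered on an FFT bin
windowed = [s * w for s, w in zip(off_bin, hann_window(N))]

centered_leak = leakage_db(tone(20.0, N), 20, 60)  # tone centered on bin 20
rect_leak = leakage_db(off_bin, 20, 60)            # no window applied
hann_leak = leakage_db(windowed, 20, 60)           # Hann window applied
```

Comparing the three numbers shows the pattern the list warns about: the centered tone leaks almost nothing into a distant bin, the off-bin tone with no window leaks the most, and the Hann window pulls the distant leakage back down at the cost of a wider main lobe.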

Summary
Device noise analysis is required for AMS IC designs to reach performance goals, but there are many pitfalls when using a traditional SPICE circuit simulator. Choosing a SPICE circuit simulator like Analog FastSPICE from Mentor Graphics will help you avoid them safely and quickly. Read the complete 6-page White Paper here.


FD-SOI: a Gentle Introduction

by Paul McLellan on 07-24-2015 at 7:00 am

Over the last couple of weeks, FD-SOI has been in the news with GlobalFoundries’ announcement of a 22nm FD-SOI process that will run in the Dresden fab. Also, earlier in the week I talked to Thomas Skotnicki about the saga (and it is a saga) of how FD-SOI got from his PhD thesis to volume manufacturing and global deployment. But far less is known about FD-SOI than about FinFET, so I thought it would be good to go back and see where the motivation for a process like that came from.

See GlobalFoundries 22nm FD-SOI: What Happens When

See Thomas Skotnicki: FD-SOI 26 Years in the Making


By 28/20nm, planar processes were running into problems. The channel area underneath the gate was getting very short and the gate was no longer powerful enough to control it properly. It could control the top part of the channel, but the further from the gate, the less the control. In particular, when the gate was off there were paths between source and drain that remained on and so there was very high leakage. See the diagram above. It was clear that a new transistor architecture would be required.

The basic constraint was that all of the channel needed to be close to the gate so that it could be controlled properly. One way to do this was to make the channel into a thin vertical fin (like a shark’s fin, that is where the name comes from) and wrap the gate around it which gives you FinFET. Since the fin is thin, it is never far from the gate and control is good and leakage is low.

The alternative is to go horizontal. If a thin channel is put on top of an insulator, and the gate is built on top of that then there is once again good control and low leakage. There are simply no paths through the channel that are far from the gate and so poorly controlled because the insulator is…well, an insulator. Current cannot flow there. The transistor is not quite as good as FinFET since it only controls one side of the channel, but it is a lot easier to manufacture. That is thick-box FD-SOI (box just stands for buried oxide, the insulator underneath the channel). If, however, the box is very thin then in effect the substrate itself becomes a sort of second back gate and can further be used to control the channel, not to turn it on and off but to affect its performance. See the diagram above.

The above diagram shows how forward body bias (FBB) works. FBB of up to about 1.5V can be applied (GF reckon they can go to 1.8V at 22nm). When higher performance is required the FBB is applied. The transistors are faster at a cost of slightly higher leakage. When the design is in a standby mode of some sort then the FBB can be removed to reduce the leakage again. Alternatively, the power supply voltage can be reduced, which would slow the transistor down too much to meet timing, and then FBB can be used to speed it up again. But the voltage is still down, reducing leakage and, of course, dramatically reducing dynamic power (voltage is squared in the power equation).
Since the FBB is under software control, it can be used very much like dynamic voltage and frequency scaling (DVFS), and the EDA and signoff flows required for FBB are almost identical. The above diagram shows the effect for the GF 22FDX process. The red line is 28HKMG, the green is 22FDX and the blue is 22FDX with FBB of 1.5V.
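The "voltage is squared in the power equation" remark refers to the usual first-order model for dynamic power, P = α·C·V²·f. The short Python sketch below works through what that means for a supply reduction. The 0.8V starting point is a hypothetical value picked for the arithmetic, not a number from GlobalFoundries, and the model ignores leakage and any frequency change.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order switching power: activity * C * Vdd^2 * f, in watts."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def dynamic_power_ratio(vdd_new_v, vdd_old_v):
    """Dynamic power after a supply change, relative to before, at fixed C and f."""
    return (vdd_new_v / vdd_old_v) ** 2


# Hypothetical example: halving the supply from 0.8V to 0.4V at the same
# frequency leaves one quarter of the original dynamic power.
ratio_08_to_04 = dynamic_power_ratio(0.4, 0.8)
```

This is why restoring speed with FBB at a lowered supply is attractive: the timing is recovered while most of the quadratic power saving is kept.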

FBB can also be used to reduce overdesign by recentering the parts, moving slow and typical parts to the fast end of the distribution and narrowing the distribution. One of the big selling points of the GlobalFoundries 22FDX process(es) is that it can reduce the power supply voltage all the way to 0.4V.

You might have heard that FD-SOI uses an expensive starting material (wafer blank) and that is true. But, in effect, that is because a couple of “mask steps” have already been done. I put it in quotes since no masks are involved. The wafer blank looks like the above picture. For 28nm the top silicon layer is 12nm thick and the box is 20nm. Some (5-6nm) of the top silicon is lost during processing leaving a channel depth of about 6nm. For 22nm these numbers are further reduced. The rest of the process is actually simpler than the equivalent bulk process involving fewer masks and fewer mask steps (and about half the number of FEOL masks as the equivalent FinFET process, although the BEOL is, of course, the same).

Any SOI technology has some inherent advantages over bulk:

  • no well taps required
  • isolation makes on-chip RF much simpler
  • higher resistance to latch up
  • inherently more radiation hard (in fact SOI was first used in space)
  • reduced parasitic capacitance (this is one of the challenges with FinFET)

Thin box FD-SOI has the further advantage of forward body bias. And the FD (fully depleted) part makes manufacture easier since zero is a very easy number to keep under control.


Intel to Skip 10nm to Stay Ahead of TSMC and Samsung?

by Daniel Nenni on 07-23-2015 at 12:00 pm

Quarterly earnings calls are a great source of information but they can also be a source of confusion, and generally it is an unhealthy combination of the two. On one hand these earnings calls are to appease the financial community. On the other hand, in my opinion, these calls are also used to generate fear, uncertainty, and doubt amongst the competition, absolutely.

Intel’s (INTC) CEO Brian Krzanich on Q2 2015 Results – Earnings Call Transcript

Foundry Business
Before we get to the prepared statement I just wanted to point out that Intel Custom Foundry (ICF) was not mentioned in the call nor was it part of the Q&A. Intel (and most publicly traded companies) generally do not announce when they are quitting something (Itanium for example), they just let it go quietly off into the sunset. My guess is that Intel will not continue in the foundry business.

Today the foundry business really does revolve around mobile SoCs. It started with Qualcomm at 28nm but Apple took over at 20nm and again at 14nm. By revolve around I literally mean that the semiconductor manufacturing processes are built for SoCs and adapted for everything else. Apple writes some REALLY big checks and that is what makes the capital intensive foundry business work. Apple will again “influence” 10nm and 7nm with their technical and financial prowess and that just does not fit with Intel’s process development culture, my opinion.

Mobile Business

“We have also updated our mobile roadmap. Our OEMs’ first Atom x3, x5, and x7 products were announced and are ramping using our previously code named Cherry Trail SoFIA 3G and SoFIA 3G-R products. The 4G version of our Atom x3 platform, SoFIA LTE, is sampling now for network certification, and is expected to ship in volume in the first half of next year. Our latest LTE modem, the CAT-10 7360, is on track for shipments to customers this year.”

Again, I doubt Intel will ever announce that they are quitting mobile but this is a strong signal that they are scaling back. I have also been told that Intel is cutting mobile staff and shutting down complete mobile groups all over the world.

At the top end of mobile you have device makers Apple and Samsung (who make their own SoCs) which control the majority of market share and profits. Then you have SoC giants Qualcomm and MediaTek who control the majority of the rest of the mobile sockets leaving Intel and the Chinese SoC companies fighting unprofitably for the final few. So yes, Intel will quit mobile at some point in time, my opinion.

Altera FPGA
“I’d like to shift gears now and talk about a couple of important strategic updates. Last month, we announced our plan to acquire Altera, a leading FPGA vendor. We see four key strategic drivers behind this acquisition. First, we believe we can enhance Altera’s base FPGA ARM-based business substantially. We plan to do this through our leadership in Moore’s Law and our ability to execute designs using our tools and silicon more quickly, allowing us to continue to support and develop their ARM-based products.”

I really don’t understand the “ARM-based products support” statement. If you were designing an FPGA with an ARM core inside, would you really choose Altera/Intel? FPGA designs have a VERY long shelf life and I would not bet my career on the remote possibility of a healthy relationship between Intel and ARM. Not today anyway. ARM is making another play for Intel’s Data Center business, right?

“Second, history tells us that the FPGA vendor who is first to a manufacturing process node enjoys a market segment share advantage over the life of that node.”

Yes, the FPGA vendor that hits a new process node first is rewarded with extra market share. Altera beat Xilinx to 40nm by a year or more (depending on whom you ask) and dominated that node. Xilinx came back and beat Altera to 28nm by a couple of months and reclaimed leadership. Xilinx again beat Altera to 20nm by a significant margin and will dominate that node. At 14/16nm both Xilinx and Altera taped out last quarter, so it is too close to call, but my bet is on Xilinx and TSMC 16FF+. It really is an amazing process and will ramp quickly. 10nm will again be close but, as we have read, Intel has delayed 10nm and Xilinx is already working with TSMC on 7nm, so the smart money is on Xilinx.

Intel 10nm Update

“The last thing I’d like to share with you is an update related to our 10-nanometer technology transition…”

BK did a nice job of spinning the 10nm delay in the prepared statement. In the Q&A however there was a more direct question and response:

“No, I’d call it similar to what happened on 14-nanometer. Remember, on all of these technologies, each one has its own recipe of complexity and difficulty, 14-nanometer to 10-nanometer same thing that happened from 22-nanometer to 14-nanometer.”

One of the possible scenarios I see here is that Intel will improve the performance of 14nm (similar to what TSMC did with 16nm and 16FF+) and skip 10nm in favor of accelerating 7nm. This makes complete sense if Intel wants to maintain their process lead against TSMC and Samsung. It also makes sense if Intel wants to continue to cut expenses. Sound reasonable?

Everything Else

There was also a lot of good news in there about the PC, Data Center, and NAND businesses, which Intel is dominating. The Intel IoT business is of interest to me but I don’t really understand it. This is the old embedded group, right? What growth path was it on before they renamed it IoT? And where exactly is that 4% growth coming from?

You can find the full transcript HERE.

Also read: TSMC (Apple) Update Q2 2015!


Thomas Skotnicki: FD-SOI 26 Years in the Making

by Paul McLellan on 07-23-2015 at 7:00 am

It seems to be FD-SOI week yet again. I talked to Thomas Skotnicki this morning. He is the father of thin-box FD-SOI and its birth is an interesting story. The story began 26 years ago (so not quite as far back as the photo!).

Thomas is of Polish origin (he is actually Tomeczek) and grew up in Warsaw, where he earned his PhD. In 1983 in Canterbury, England (famous for tales and archbishops), he presented a paper on his PhD work at the prestigious ESSDERC conference. France Telecom had a research lab in Grenoble (the French equivalent of the Bell Labs of the era) and they offered him a job. But this was before Europe was unified, and Poland was still in the Soviet bloc, so emigrating/immigrating took a while.

Thomas worked for France Telecom for 14 years. Eventually, in 1999, they decided this research was better suited to a semiconductor firm, so they offered to transfer Thomas and his team to STMicroelectronics. At the time, although he had a team of 14 engineers, he was the only one who accepted the transfer. So he had to recreate his team and hire PhDs to continue the work. Thomas was the front-end team leader. He went on to be named the first Fellow at ST in 2006 and was recently promoted to Technical Vice-president and Company Fellow.

But the story goes back all the way to France Telecom in 1988, when Thomas first published his new approach (voltage-doping transformation) to the physics of short-channel transistors. Back then the conventional wisdom was that the “box” (actually the buried oxide under the channel) should be thick. A thick box, however, precluded body bias. In thin-box FD-SOI, on the other hand, the amplitude of body bias is not limited by diode leakage, so body bias becomes a very important feature of the technology. It enables a large performance boost or leakage reduction. In addition, Thomas’ equations suggested the thin box was optimal for suppressing short-channel effects.

Moreover, thin box simply did not exist, as no one had previously thought to ask for it. Now that Thomas was asking, the thin box turned out to be an extremely difficult technical challenge. As a result, the whole thin box idea went into standby mode for a decade. Then, in the late 1990s, Thomas and his team, including Dr. Malgorzata Jurczak, then a post-doc on the team, developed a way to create a thin box on bulk CMOS. They called it “silicon on nothing” and it was the subject of 135 papers from Thomas and his internal and external colleagues and partners. This paper trail helped the ideas get some traction. Suddenly, people who had been arguing with him at conferences and on panel sessions were publishing their own papers, promoting thin box.

In one particular instance, Thomas had a long fight over a key paper at IEEE Transactions on Electron Devices, whose editors didn’t want to publish it. Then a serendipitous change of editor opened the door to publication; the paper was given the Rappaport Award, as “best publication of the year” by the IEEE Electron Devices Society. As a saying often attributed to Mahatma Gandhi goes: First they ignore you, then they laugh at you, then they fight you, then you win.

With these successes building momentum, the semiconductor community finally started to believe in the idea. One important believer was Carlos Mazure from SOITEC where they make wafer blanks. SOITEC was excited by the potential of these thin-box, short-channel devices, but at the time they could only make a box 145nm thick, not the 10-20nm that was required. Under Carlos’ leadership, SOITEC was instrumental in launching the R&D program that successfully delivered thin box SOI wafers.

At this point LETI got involved. Although most of their work was on thick-box devices, they decided to collaborate with Thomas to actually fabricate his ideas into real silicon. LETI helped with both silicon-on-nothing and then with thin-box FD-SOI. Up until then it had all been equations. The whole idea gained speed once the project was transferred from the whiteboard to silicon.

Then, in 2011, Intel announced FinFET. Everyone already knew about FinFET and it was known to be a really difficult technology. The complexity of FinFETs and the concerns about producing them efficiently led to raucous debate within the industry and within companies. Thomas sold the deal at ST when he showed that by turning a FinFET on its side you pretty much had silicon-on-nothing, FD-SOI with a thin box. It was the biggest day of Thomas’ professional life when ST’s top management, including CEO Carlo Bozotti, COO Jean-Marc Chery, and EVP of Front-End Manufacturing Joël Hartmann, made the decision to take its Ultra-Thin Body and Box FD-SOI to manufacturing. Thomas recounted that from initial conception and equations to industrial fabrication it took 26 years.

Industrialization of the manufacturing process went fast since the technology worked even better than the equations and FD-SOI is a much simpler technology than FinFET—it leverages the learnings of planar (bulk) silicon with fewer masks and processing steps, albeit with a slightly more expensive wafer.

Still, selling FD-SOI beyond ST took a bit more time, as initially ST was alone and customers require partners, second sources, alliances and not just a single manufacturer. Today, however, the technology is being deployed worldwide not just at ST but also at Samsung and GlobalFoundries.

As a marketing guy, I can’t help but notice a missed opportunity. “Silicon on nothing” is a much better name than FD-SOI.


Taking prototyping beyond prototypes

by Don Dingee on 07-22-2015 at 12:00 pm

Everyone has heard the expression, “Half the job is having the right tool.” In the case of FPGA-based prototyping, however, the right tool for the job is only the beginning. What teams really need to think through is what exactly should be done with an FPGA-based prototyping tool.

The obvious answer is prototyping an SoC, pre-silicon. We go get some third party IP, some legacy IP from the previous design, and a few new IP blocks, and toss all the RTL into an FPGA-based prototyping system. Every new release of FPGA-based prototyping systems brings bigger FPGAs, so in theory more SoC designs fit in a given system. But is it worth the hassle of partitioning a design and tweaking it for debugging?

I’d submit that the challenge is not getting your RTL to “work”. Competent design teams can create an IP block to a functional specification, and run a simulator on it, and figure out what needs to be fixed, iterating to goodness. IP blocks can then be strung together into a design and simulated – the more IP, the slower the simulation – and eventually, a design is deemed as working.

As far as you know, at least. Are there corner cases in timing between the integrated blocks? Are the I/O blocks compliant with interfacing standards? Were enough test suites run to completely validate the design? Were the IP blocks exercised simultaneously to find problems in interaction?

These are incredibly hard questions to answer comprehensively with a functional simulator. That’s why people have turned to emulator platforms – but they are budget busters. Getting those answers in emulation is expensive and still relatively slow.

What about the “what-if” factor? Is there a more efficient way to fix a problem, or even implement a functional requirement? The process of system exploration is often skipped because it is just too time consuming – fix it, and move on as quickly as possible.

S2C explores these and many other thoughts in a new 8-minute presentation on their Videos page:

Challenges and Benefits of FPGA-based Prototyping

They take on many of the objections we hear to using FPGA-based prototyping systems. Some of these have been solved simply by using ultra-large FPGAs, but others are addressed through a solid engineering approach designed to increase the flexibility and usefulness of a platform in the prototyping process.

The bottom line here is these FPGA-based prototyping solutions are not just huge FPGAs glued to a board. S2C explores ideas like deep trace capture, real-world I/O via daughtercards, and the benefits of distributed development using remote system management capability. The combination of architecture, hardware, and software makes this more than just “a tool.”

I’d like to get some feedback and discussion, not so much about product features as about the state of the FPGA-based prototyping concept. Are the challenges and benefits S2C is describing in the presentation ones you are experiencing? What other concerns are there with using an FPGA-based prototyping system? Is there another strong benefit that isn’t being talked about much? We’ll ask an S2C representative to respond to your ideas.