
Develop High Performance Machine Vision in the Blink of an Eye

by Paul McLellan on 08-21-2014 at 7:01 am

The growing capability of silicon, along with improved algorithms, means that machine vision is becoming increasingly important: more and more systems can be built in areas such as manufacturing, intelligent traffic management, bar-code scanning, counterfeit detection and even sports simulation. Is that a 3X driver? No, it’s a 3 wood.

The traditional approach would use an FPGA for frame-grabbing and then a PC for image analysis. The FPGA was fast and flexible and could do some pixel-level processing; the PC was easy to program and there were lots of open-source algorithms. Now the best of both worlds can be combined in a Xilinx Zynq-7000 all-programmable system, with the FPGA fabric for speed and real-time response and the processor for the software environment. No PC, no cable bottlenecks, small and power efficient. What’s not to like?

Above is a video showing the difference, made at the SPS/IPC Drives conference held in Nuremberg earlier this year. To compare the old way of doing things to the new, Xilinx built two implementations.

  • HALCON machine vision software runs on the Cortex-A9 in the Zynq but the FPGA fabric is not used. Result: 16 frames per second (fps) with a 50% error rate
  • FPGA used for image processing. No errors, even when running at 90 fps


So how do you program one of these machine vision systems? The old way would be to learn a hardware description language if you don’t already know one (and if you are a software guy you almost certainly don’t). Or you can use embedded Visual Applets (eVA) from Silicon Software and accelerate your productivity. There are over 200 machine vision operators, and they map directly into the FPGA. So even though you are a software guy you can do hardware design!


The design entry is done at a very high level, but the code efficiency is very close to hand-crafted HDL. This produces extremely short design cycles. Like 15 minutes. Obviously great for time-to-market, ease of experimentation and more. It is intended for software engineers, application engineers and machine vision experts; FPGA programming skills are not required. The eVA core sits entirely inside the glue logic, all on the same silicon fabric.

Programming is a four step process:

  • Enter eVA platform description in a simple XML file. Image and output ports, memory interfaces, FPGA type, clock, bit width and so on
  • Generate the eVA platform installer which produces a VHDL/Verilog black box and testbench to integrate, simulate and synthesize the generated IP code. Your platform is now ready to use Visual Applets in the next step
  • Design your application. Just install the VA and the platform installer. Simple verification with the built-in DRC, simulation and bandwidth analyzer tools
  • One click build in VA generates the bitstream to program the FPGA, along with SDK C code for easy integration and a GenICam XML file and C-code API for parameterization. You can then use it!

    To see a webinar on this topic, go here. Learn more about Xilinx Smarter Vision here.

    Coming up soon is the Avnet-Xilinx XFest (not specifically focused on vision). This is a grand international tour. First pick your city:

    • US: Irvine | San Diego | San Jose | Boston | Baltimore/North Virginia/DC | Minneapolis | Dallas | Toronto | Vancouver | Montreal | Chicago
    • Europe: Gaydon, Warwicks | Oslo | Madrid | Antwerp | Milano | Paris | Stuttgart | Odense | Warsaw
    • Asia: Beijing | Xian | Sydney | Singapore | Shanghai | Shenzhen | Seoul | Guangzhou | Chengdu | Shenyang | Nanjing | Hsinchu | Bangalore | Taipei | Hangzhou
    • Japan: Tokyo | Osaka

    Then find the dates and register to attend XFest here. It’s free.


    More articles by Paul McLellan…



    Substrate coupling analysis method and tool
    by Jean-Francois Debroux on 08-20-2014 at 4:00 pm

    There has been a lot written on this topic, and some expensive tools have been proposed to solve this issue, but it is still a concern and a mystery for many designers. The point is that whatever efforts you make, the substrate is common to the entire chip and can cause undesired coupling if not managed properly and at an early stage. As a starting point, we can state that any block in a chip can inject current into the substrate, either through capacitive coupling from metal lines or from wells, or through parasitic devices such as bipolar transistors if a PN junction is forward biased.

    All the currents that are injected in the substrate combine through an impedance network that has one pin per injection point. As a result, each injection point, instead of having a quiet voltage, has some “noise” resulting from its own activity but also from the activity of other blocks/devices. And since each block exhibits some sensitivity to the noise on its substrate, the overall circuit may fail to reach target performance.

    It is easy to have a substrate pin for every block in a circuit. You just have to connect all the device and well-diode substrate pins to a dedicated substrate net/pin instead of a global one. The point is to choose the right granularity. My advice here is that a coarse grain is much better than nothing and can always be refined later as needed.

    The hard point is to get the impedance network that connects all the substrate pins. Again, an approximate network is much better than nothing. A detailed analysis I could write about in another post shows that most of the series resistance to a substrate tie is located close to the tie. This is why substrate ties are usually grouped in long lines, and why in many processes one has to place substrate ties at regular intervals. If this is done properly, the substrate inside a cell can be considered a constant-voltage area AT THE SILICON SURFACE. Of course, as you go deeper into the silicon, things change.

    Then, as a first approximation, one can consider that there is one access to the substrate per cell. As a starting point, one can consider only the top-level cells, or those that are far enough from the others. For many analog and mixed-signal chips, the number of cells at that point is rather small, a few tens at most. I used my home-made 3D field solver “EZMod3D” to perform the substrate equivalent-schematic extraction. Here is an example for a small mixed-signal chip I designed a while ago:


    In this picture, one can see the 11 top-level blocks: 1 digital, 4 noisy analog and 6 sensitive analog. The blocks are described as constant-voltage boundaries on top. The substrate has been described as a low-resistivity P+ thick layer at the bottom and a higher-resistivity epitaxial P- thin layer on top. The data for these layers were available in the process design manual. No need to create complicated tech files; just enter the physical characteristics of the layers.

    N nodes are connected by N*(N-1)/2 impedances. For N = 11, that gives 11*(11-1)/2 = 55 impedances.

    Considering the resistivity values, permittivity values and operating frequencies, in this case the impedances simplify to resistances. EZMod3D will automatically set all the boundaries but one to GND, drive the non-GND one with a voltage source, simulate the structure and extract all the currents in all the GND pins. This allows computing the N-1 impedances between the non-GND pin and all the GND pins. Moving the non-GND pin around in N-1 simulations then gives the whole picture, and the equivalent netlist can be created.
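A crude sketch of that extraction loop (the function and data names are mine, not EZMod3D's): with all boundaries but one grounded, the current into each ground pin flows only through the direct resistor to the driven node, so each measured ground-pin current yields one pairwise resistance directly.

```python
# Illustrative sketch of the pairwise extraction described above.
# Driving node k at V with every other node grounded, the current
# into ground pin j flows only through the direct resistor R_kj,
# so R_kj = V / I_j.

def extract_pairwise_resistances(drive_voltage, gnd_currents):
    """gnd_currents: {driven_node: {gnd_node: measured current}}.

    Returns {(node_a, node_b): resistance}, keeping the first
    value extracted for each pair."""
    resistances = {}
    for k, currents in gnd_currents.items():
        for j, i_j in currents.items():
            pair = tuple(sorted((k, j)))
            resistances.setdefault(pair, drive_voltage / i_j)
    return resistances

# Three-node example with R_AB = 10, R_AC = 20, R_BC = 40 ohms:
# driving nodes A and B at 1 V in turn yields these ground-pin currents.
net = extract_pairwise_resistances(1.0, {
    "A": {"B": 0.100, "C": 0.050},
    "B": {"A": 0.100, "C": 0.025},
})
```

Note that N-1 drive simulations cover every pair: any pair involving a driven node is measured, and only the two never-driven nodes could be missed, which cannot happen when N-1 of the N nodes are driven. That matches the 55-resistor count quoted above for N = 11.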

    I had to run the extraction again to get the figures for this post. EZMod3D required 40 Mbytes of memory and completed in about an hour and a half on a four-year-old Core i7 laptop. It directly produced the following netlist:

    .SUBCKT EQUIV
    + Node2
    + Node3
    + Node4
    + Node5
    + Node6
    + Node7
    + Node8
    + Node9
    + Node10
    + Node11
    + Node12
    R1130 Node2 Node3 1.02408199664998670e+001
    R2130 Node2 Node4 2.17593122679717650e+001
    R3130 Node2 Node5 1.62515077027132730e+001
    R4130 Node2 Node6 8.51530376823227900e+000
    R5130 Node2 Node7 2.62288581224144860e+001
    R6130 Node2 Node8 2.52566606961607540e+001
    R7130 Node2 Node9 6.66757628924332550e+001
    R8130 Node2 Node10 1.13673924166039660e+001
    R9130 Node2 Node11 3.30231595959797420e+001
    R10130 Node2 Node12 2.56098355567696920e+001
    R2131 Node3 Node4 1.48623723718801100e+001
    R3131 Node3 Node5 1.13061579624792760e+001
    R4131 Node3 Node6 7.77357320381043330e+000
    R5131 Node3 Node7 1.98342928529872720e+001
    R6131 Node3 Node8 1.88483495757762750e+001
    R7131 Node3 Node9 6.07654417616208920e+001
    R8131 Node3 Node10 9.92209211715984910e+000
    R9131 Node3 Node11 2.57862772544557010e+001
    R10131 Node3 Node12 2.02746422406297140e+001
    etc…

    No need to go further here, there is no added value in reading this stuff!

    At the time I designed the chip, the netlist could be simulated together with the circuit schematic and the package equivalent schematic, connecting the substrate pins of the cells to the substrate schematic and connecting the package pins appropriately. The simulation showed a number of issues that were not visible without substrate coupling. In particular, some fast current pulses resulting from simultaneous conduction in large CMOS inverters coupled through and caused undesired signals in the quiet area. Usually, when you face a substrate coupling issue, you have to reduce the amplitude of the disturbing signal through all possible design techniques, reduce the sensitivity of the disturbed cell, and reduce the coupling through the substrate by a couple of techniques I may detail in another post.

    In the example case I decided to use two substrate zones, with two separate pads and package pins. EZMod3D predicted a 6 ohm resistance between the two substrate zones. Together with 0.5 ohms for bonding, this approach improved substrate coupling by more than 20 dB. When the samples came in, one of the first measurements I did was the resistance between the two substrate pins. The measured value was 7 ohms. Pretty good agreement, showing the validity of the assumptions. And the coupling between noisy and quiet cells met the specification. Since then, I have simulated substrate coupling for many chips, and it has often helped me optimize the floorplan before any layout was done, saving lots of manpower and probably some runs.


    EDA Ice Bucket Challenge!

    by Daniel Nenni on 08-20-2014 at 7:00 am


    In case you have not heard, the Ice Bucket Challenge is a social media program aiming to increase the awareness of ALS (Amyotrophic lateral sclerosis AKA Lou Gehrig’s Disease). One of our neighbors recently passed away as a result of ALS so this challenge is dedicated to Barbara Letts. After hearing about the challenge my daughter Ciara eagerly volunteered to help, a little too eagerly if you ask me. For whatever reason she REALLY REALLY REALLY wanted to dump a bucket of ice water on me so here it is:

    http://www.semiwiki.com/forum/files/IMG_0593.MOV

    Now I get to challenge other people to do the same. For the greater good of ALS and the EDA industry I challenge:

    You gentlemen have until Saturday midnight to get creative and douse yourself with a bucket of ice water and send me the video to post on SemiWiki. For those who do not accept the challenge you can make a donation to the ALS charity of your choice or pass the challenge to a coworker. For every video we publish SemiWiki will make a donation to the ALS Association.

    Also Read: The EDA Ice Bucket Challenge Just Got Real!

    As of yesterday the ALS Association has received in excess of $20 million in donations which is more than ten times the amount at the same period last year. More than 450k new donors were also added so yes, this is more than just water play. It’s all about branding and building awareness, absolutely.

    Bill Gates getting iced was the first video I saw and there are many more worth viewing. Search #IceBucketChallenge on YouTube or Twitter and have some fun!

    About ALS
    ALS was first described in 1869 by French neurologist Jean-Martin Charcot, but it wasn’t until 1939 that Lou Gehrig brought national and international attention to the disease. Ending the career of one of the most beloved baseball players of all time, the disease is still most closely associated with his name. Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease that affects nerve cells in the brain and the spinal cord. Motor neurons reach from the brain to the spinal cord and from the spinal cord to the muscles throughout the body. The progressive degeneration of the motor neurons in ALS eventually leads to their death. When the motor neurons die, the ability of the brain to initiate and control muscle movement is lost. With voluntary muscle action progressively affected, patients in the later stages of the disease may become totally paralyzed.

    Most commonly, ALS strikes people between the ages of 40 and 70, and as many as 30,000 Americans have the disease at any given time. ALS has cut short the lives of other such notable and courageous individuals as Hall of Fame pitcher Jim “Catfish” Hunter, Senator Jacob Javits, actors Michael Zaslow and David Niven, creator of Sesame Street Jon Stone, television producer Scott Brazil, boxing champion Ezzard Charles, NBA Hall of Fame basketball player George Yardley, pro football player Glenn Montgomery, golfer Jeff Julian, golf caddie Bruce Edwards, British soccer player Jimmy Johnstone, musician Lead Belly (Huddie Ledbetter), photographer Eddie Adams, entertainer Dennis Day, jazz musician Charles Mingus, former vice president of the United States Henry A. Wallace and U.S. Army General Maxwell Taylor.


    Silvaco News: Silicon Valley, China and Korea

    by admin on 08-20-2014 at 3:00 am

    Silvaco is one of the sponsors of the GSA Executive Forum to be held over in VC Land at the Rosewood Sand Hill on September 10th. Note that it starts at 11.45am with a networking lunch.

    • The featured keynote speakers are Fareed Zakaria and Rana Foroohar, both of CNN. Rana is also Senior Managing Editor of Time.
    • The first panel session is about Driving Contextual Technology. The moderator is Gary Shapiro of CEA. Participants are from Intel, Cisco, Plantronics and SoftKinetic.
    • The second panel session is about Powering the Digital Economy. The moderator is Stephen Gray of CSR (now the official name of what we used to know as Cambridge Silicon Radio). The participants are from Intel-McAfee, Juniper, AT&T, Ericsson and Alcatel-Lucent.
    • At the end of the day, just before Fareed’s keynote and Q&A, is the CEO panel. You know the panel has to be good when the moderator is himself a major CEO, Aart de Geus of Synopsys. The panel consists of Syed Ali, CEO of Cavium; Tzu-Yin Chiu, CEO of SMIC; Scott McGregor, CEO of Broadcom; and Steve Mollenkopf, CEO of Qualcomm.

    Full details including a link for registration are on the GSA website here.


    Silvaco’s commitment to the China market was given a significant boost last month when the company entered into an agreement with Shanghai Research Institute of Microelectronics of Peking University (SHRIME/PKU) to jointly establish a development laboratory. This development platform will enable a better understanding of the requirements for TCAD and EDA tools by the design community in China. It will enable them to rapidly develop solutions targeted to the specific needs of the local market.


    Iliya Pesic of Silvaco gave an overview of the company and the business development situation in China. He expressed his appreciation of the close cooperation between SHRIME/PKU and Silvaco over the past seven years, and hoped that in the future, building on the jointly established laboratory platform, both sides can get closer to Shanghai’s microelectronics industry users and develop and provide more professional and precise technology solutions and services to promote the development of the local industry.


    Also last month in Asia, the IC Design Education Center (IDEC), located at KAIST, and Silvaco Korea announced that IDEC has adopted Silvaco’s complete AMS design flow solutions to deliver its VLSI design instruction and chip fabrication services for leading national, public and private universities and colleges in Korea.

    They also announced the availability of some TowerJazz PDKs at IDEC. TowerJazz has been working with IDEC for over four years:

    • TS18PM (Power Management 0.18um)
    • TS18IS (CMOS image sensor 0.18um)
    • CA18HD (Analog/RF CMOS 0.18um)
    • SBC18HA (SiGe 0.18um)

    As David Halliday, Silvaco’s CEO, said: “Silvaco is committed to supporting Korea’s excellent higher education system. This partnership ensures that Korean universities have the most advanced tools to train tomorrow’s engineers for the challenges they will face when they enter Korea’s pioneering semiconductor industry.”




    USB 3.0 IP on FinFET may stop port pinching

    by Don Dingee on 08-19-2014 at 5:00 pm

    Sometimes a standard is a victim of its own success, at least for a while as the economics catch up to the technology. When a standard like USB 3.0 is announced, with a substantial performance increase over USB 2.0, some of the use cases come on board right away. Others, where vendors enjoy a decent ROI with good-enough performance, take longer to embark.

    Good question, Harry, and it isn’t just PCs. Just as small numbers produce some spectacular CAGR figures when in reality there isn’t a lot of news yet, large numbers can mask a significant amount of progress being made. If we look at some of the USB 3.0 use cases:

    • Keyboards, mice, speakers, microphones: nice volumes, don’t need 190 MB/sec
    • Printers, scanners: sweet spot, benefit a lot from faster transfers
    • External hard drives: if only people understood how critical backup is, more would sell
    • Smartphones, tablets: faster charging may be more important than faster transfers
    • Flash drives: big volume, need speed, difference between cheap and inexpensive

    In that analysis, in a few cases performance clearly wins – in other cases, people would take more speed if it were free. Therein lies the problem.

    TrendForce released a study in September 2013 indicating that USB 3.0 penetration of USB flash drives was only about 10% at the time, expected to grow to somewhere between 20% and 25% this year. What’s the holdup? In large part, the popularity and profitability of USB 2.0 flash drives has made it unnecessary to invest right away in transitioning to USB 3.0. Combine that with fluctuations in NAND supply and the overall lack of USB 3.0 flash controllers so far, and the result is slow going in terms of percentages.

    It takes two to tango; both hosts and targets need to be up to speed on USB 3.0 to fully realize the benefits. The other side of this coin is real estate, especially on the host where four, six, eight, or more ports can be involved. Just upgrading all ports to USB 3.0 might seem like a no brainer, but there is only so much footprint available on SoCs. A port pinched is a dollar earned.

    All this points to a need for more space-efficient USB 3.0 IP, which is where Synopsys comes in. The recent announcement of DesignWare USB 3.0 femtoPHY IP cuts the implementation area to less than 0.5 sq mm, and reduces pin count for further savings on the periphery. Concurrently, Synopsys introduced an enhanced USB 2.0 IP package as well.


    Synopsys hasn’t skimped on performance or features, in spite of a claimed 50% footprint reduction over the previous picoPHY IP packages. These are fully certified to the latest USB-IF specifications by a third party. They claim this is the first certified USB 3.0 IP for 14nm FinFET at Samsung, and both implementations are also available on 28nm.

    In a new Synopsys white paper, Gervais Fong, senior USB product marketing manager, steps through five key benefits beyond the raw transfer rates USB 3.0 offers: area reduction, power reduction, production yield, reliability, and evolution of features such as the improved battery charging. He outlines the challenges faced in getting 1.8V I/O working, and in meeting specifications in 14nm FinFET.

    Meeting the USB IP Requirements of SoC Designs from 180-nm to 14/16-nm FinFET

    There is also a video starring Fong discussing the insights (scroll down the page).

    So, Harry, to answer that Twitter question: it might be that there hasn’t been much small, solid, certified USB 3.0 IP out there until now. DesignWare USB 3.0 femtoPHY IP may finally remove the urge to port pinch in SoC designs and help expand USB 3.0 adoption more quickly.


    Vlang – Opportunities Galore for Productivity & Performance

    by Pawan Fangaria on 08-19-2014 at 2:01 pm

    Yes, verification technologies are open to innovation for improved productivity and performance in the face of ever-growing SoC/IP design sizes and complexities. There is not much scope left for processor speed to improve, other than multi-core processors in servers, which again need software properly architected to be threadable and scalable to speed up simulation runs. So, where are the opportunities? The hidden secret is that there are many programming features, styles, verification features, compiler technologies and other performance enablers which can be built into hardware design and verification languages and exploited to improve simulation performance and the productivity of design and test infrastructure development.

    Since there is ample scope for improvement in the design of a hardware description language from a productivity and performance standpoint, we regularly see new languages coming up, often sidelined by the standardization paradigm, leaving designers a divided choice: the standard language or the more powerful language. Today we have Verilog, VHDL, SystemVerilog, SystemC, SpecC, SystemRDL, ‘e’ and many others. Usually the powerful ones are not widely heard of but are used by designers and developers internally; they switch to the standard ones when interoperability is needed. I need not mention that there are smart SoC/IP vendors who use proprietary languages/compilers for faster verification.

    Coming to Vlang, what is this new incarnation? It is a new, powerful hardware verification language derived from the powerful open-source language ‘D’. It appears to have the best of both worlds: while it enhances the power of verification in several ways, it retains the ease of generic programming, clean syntax and semantics, safety and object-oriented methodology of a standard language. The result is a powerful language with high performance and high productivity that enables designers and verification engineers to start early and reach higher verification coverage faster.

    Vlang is ABI (Application Binary Interface) compatible with C++, which provides much better integration than SystemVerilog, which is limited to C through DPI (Direct Programming Interface), resulting in an inefficient interface between SystemVerilog and SystemC. What is important is that any method (including virtual methods) on C++ objects can be called directly from Vlang without the need for any boilerplate code. These advantages make Vlang ideal for ESL model verification and for creating high-performance testbenches to drive emulation platforms.

    Let’s look at some of the key performance features of Vlang. It is uniquely architected to support full-blown multi-core concurrency, taking advantage of multi-core processor architectures. Concurrency is available at any convenient abstraction level and can be scaled effectively for VIPs. The language compiles at lightning speed with little generated code (no boilerplate needed) and simulates fast. It can support multiple simulators in a single simulation with excellent simulation management, and it supports fast, advanced constrained randomization that takes less simulation time.
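Constrained randomization deserves a quick illustration. This is not Vlang (or SystemVerilog) syntax, just a generic Python sketch of the underlying idea: draw random stimulus values and keep only those satisfying a constraint predicate. The names here are mine.

```python
import random

def randomize(constraint, lo=0, hi=255, max_tries=10000):
    """Naive rejection sampling: retry until the constraint holds."""
    for _ in range(max_tries):
        value = random.randint(lo, hi)  # candidate stimulus value
        if constraint(value):
            return value
    raise RuntimeError("constraint not satisfied within budget")

# e.g. a word-aligned address that avoids a reserved window
addr = randomize(lambda a: a % 4 == 0 and not (64 <= a < 128))
```

Real constrained-random engines solve the constraints directly instead of rejecting candidates, which is one place a language implementation can win or lose a lot of simulation time.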

    From a productivity standpoint, it provides clean and easy syntax that is quick to learn and program, a high degree of reusability with true object orientation, generic programming, template meta-programming and concepts, automatic garbage collection like any other modern language, and UVM 1.1d support with multi-core concurrency, parallelism and many uvm_root instantiations.

    A good aspect of Vlang is that it is free and open source. I admire open source because it is a source of innovation with a dedicated user base. Of course there can be concerns about long-term support of a version, but reliable partners shouldn’t disappoint you.

    Vlang base can be downloaded from https://github.com/coverify/vlang.
    Vlang UVM can be downloaded from https://github.com/coverify/vlang-uvm.

    You will need the latest version of the dmd compiler, which can be obtained from the D language homepage, http://dlang.org. You can clone Vlang from GitHub using the following commands:

    git clone git@github.com:coverify/vlang-uvm.git
    git clone git@github.com:coverify/vlang.git

    Vlang is developed and maintained by Puneet Goel. He can be reached at puneet@vlang.org.

    More Articles by Pawan Fangaria…..


    Learning Cache Coherency and Cache Coherent Interconnects: ARM White Paper

    by Eric Esteve on 08-19-2014 at 11:08 am

    Cache coherency is the type of concept that you think you understand, until you try to explain it. It can be wise to come back to the fundamentals and ask what coherency means to an expert. I surfed the web, found several white papers on the ARM site, and now I can try to share these fresh lessons learned (or you may prefer to download the white papers directly!).

    Starting from the fundamentals is always a good idea, so I suggest you first read the Cache Coherency Fundamentals Part 1 white paper from Neil Parris. You will find the definition of coherency: “Coherency is about ensuring all processors, or bus masters in the system see the same view of memory”, and the problem quickly arises: I have a processor which is creating a data structure, then passing it to a DMA engine to move. If that data were cached in the CPU and the DMA reads from external DDR, the DMA will read old, stale data.


    The author describes the three mechanisms for maintaining coherency: disabling caching, software-managed coherency and hardware-managed coherency. The first two clearly impact performance and power, while hardware-managed coherency through a Cache Coherent Interconnect (CCI) is power- and performance-friendly and simplifies software.
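The stale-data hazard and the software-managed fix can be caricatured in a few lines. This toy model (all names are mine, not an ARM API) has a CPU writing through a write-back cache while the DMA engine reads DDR directly, so the driver must flush the cache line before starting the DMA:

```python
# Toy model of the coherency problem from the white paper's example.

class ToySystem:
    def __init__(self):
        self.ddr = {}    # external memory, the only thing the DMA sees
        self.cache = {}  # CPU write-back cache: dirty lines live here

    def cpu_write(self, addr, value):
        self.cache[addr] = value               # dirty, not yet in DDR

    def cache_flush(self, addr):               # software-managed step
        if addr in self.cache:
            self.ddr[addr] = self.cache.pop(addr)  # write line back

    def dma_read(self, addr):
        return self.ddr.get(addr)              # DMA bypasses the cache

s = ToySystem()
s.ddr[0x1000] = "old"
s.cpu_write(0x1000, "new")
stale = s.dma_read(0x1000)   # -> "old": the coherency problem
s.cache_flush(0x1000)        # what software-managed coherency costs
fresh = s.dma_read(0x1000)   # -> "new"
```

Hardware-managed coherency removes the flush call: the interconnect snoops the CPU cache on the DMA's behalf, which is exactly what the CCI provides.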

    Extending hardware coherency to the system requires a coherent bus protocol, and in 2011 ARM released the AMBA 4 ACE specification, which introduces the “AXI Coherency Extensions” on top of the popular AXI protocol. The ACE interface allows hardware coherency between processor clusters (remember the processor and DMA engine example); it was also the key enabler for a Symmetric Multi-Processor (SMP) operating system to extend to more cores. We can see that chip makers like Samsung or Qualcomm, designing quad-core if not octa-core application processors, have taken full benefit of CCI. These products have been the gate openers for big.LITTLE designs as well as GPU compute in mobile applications. GPU compute includes computational photography, computer vision, modern multimedia codecs targeting Ultra HD resolutions such as HEVC and VP9, complex image processing and gesture recognition.

    Now, since you have read this first white paper, it may be time to download Cache Coherency Fundamentals Part 2 and discover the products developed to support massive cache coherency:

    • CoreLink CCI-400 Cache Coherent Interconnect

      • Up to 2 clusters, 8 cores
    • CoreLink CCN-504 Cache Coherent Network

      • Up to 4 clusters, 16 cores
      • Integrated L3 cache, 2 channel 72 bit DDR
    • CoreLink CCN-508 Cache Coherent Network

      • Up to 8 clusters, 32 cores
      • Integrated L3 cache, 4 channel 72 bit DDR

    CoreLink products have allowed ARM to target enterprise applications such as networking and server, supporting high performance serial interfaces such as PCI Express, Serial ATA and Ethernet. In most applications all of this data will be marked as shared as there will be many cases where the CPU needs to access data from these serial interfaces.

    This second white paper gives an exhaustive description and feature list of these CoreLink products. If you want to go deeper into enterprise-dedicated solutions, I recommend reading this blog from Ian Forsyth: Coherent Interconnect Technology supports Exponential Data Flow Growth.

    CoreLink CCN-508, described in this paper, has been designed to support the performance requirement of up to 32 cores, also including the following low power features:

    • Extensive clock gating
    • Leakage mitigation hooks
    • Granular DVFS (Dynamic Voltage and Frequency Switching) and CPU shutdown support
    • Partial or full L3 (level-3) cache shutdown and retention modes.

    In fact, low power, or rather power efficiency, has been ARM’s differentiator, explaining the incredible success of the company in mobile applications, with probably more than 95% penetration of the billions of phones/smartphones shipped every year. Power efficiency will be the key to enterprise market penetration. Better power consumption is no longer a “nice to have” feature in such a power-hungry market; it is becoming a “must have” and could be the Trojan horse for ARM to penetrate this high-performance market. Just take a look at the (above) 32-core architecture: it is highly complex and high performing, but imagine that you have to integrate, package and cool thousands of such ICs in the same space. And pay the electricity bill, about two-thirds of which goes to the cooling system alone!

    Eric Esteve from IPNEST

    More Articles by Eric Esteve…..


    SEMulator3D: GlobalFoundries Process Variation Reduction

    by Paul McLellan on 08-19-2014 at 7:01 am

    At SEMICON last month, Rohit Pal of GlobalFoundries gave a presentation on their methodology for reducing process variation. It was titled Cpk Based Variation Reduction: 14nm FinFET Technology.

    Capability indices such as Cpk are a commonly used technique for assessing the variation maturity of a technology. Cpk looks at a given parameter’s variability and compares it to the specification limits in units of sigma. The higher the number the better: 1.33 should have the process yielding close to 100% (for that parameter), and 2 is the full six sigma. Using Cpk makes it easy to track metrics to assess variation improvement for a technology, and the indices can also be used as a gating item for technology milestone achievement. However, Cpk is not truly an absolute value; it is a function of the specification limits.
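For reference, the standard textbook definition behind those numbers (parameter names are mine): Cpk is the distance from the mean to the nearer specification limit, in units of three standard deviations.

```python
# Cpk: min distance from the mean to a spec limit, over 3 sigma.

def cpk(mean, sigma, lsl, usl):
    """lsl/usl: lower/upper specification limits."""
    return min(usl - mean, mean - lsl) / (3.0 * sigma)

# Spec limits at +/-6 sigma around the mean give the "full six sigma":
full_six_sigma = cpk(mean=0.0, sigma=1.0, lsl=-6.0, usl=6.0)   # 2.0
# Limits at +/-4 sigma give the ~100%-yield threshold quoted above:
near_full_yield = cpk(mean=0.0, sigma=1.0, lsl=-4.0, usl=4.0)  # ~1.33
```

This also makes the article's caveat concrete: widen the spec limits and Cpk goes up with no change in the process at all.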


    One of the big challenges in modern processes is that variation at one stage of the process can depend critically on variation at an earlier stage, so the steps cannot be considered individually. Plus, with fab cycles measured in months and mask costs measured in millions, doing experiments on real silicon is prohibitively expensive. At a high level, the approach GlobalFoundries used is structural simulation with Coventor’s SEMulator3D virtual fabrication platform. By analyzing the output, it is possible to assess the knock-on effects of process changes, meaning effects later in the process: you can see which early factors have a major effect on variation later in the process, and thus where to focus the improvement effort. On the other hand, factors which make little difference later can be left alone.

    Structural simulation in SEMulator3D works by taking a specification of all the process parameters along with the layout data. SEMulator3D then builds up the result of building that layout on the process with those particular parameters. This structural output can then be used to derive electrical and other data. The picture at the top of this blog entry shows some example output, the bright green being the gates for the FinFETs and the purple being the fins themselves. SEMulator3D has modules that understand the implications of almost everything that might be used in a process, such as directional deposition, anisotropic etch, chemical mechanical polishing (CMP), implant and so on. Just as in actual fabrication, the virtual fabrication lays the various steps down one after another and builds up the outcome. But in the form of a 3-dimensional model of the outcome rather than an actual chip, of course. In a lot less time, and for no mask or fab charges.
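    The step-by-step idea, and why early variation propagates, can be sketched with a toy one-dimensional model. This is not the SEMulator3D API, just an illustration of sequential virtual fabrication: each step transforms the evolving structure, so a deviation introduced early is carried into every later step unless some step (here, CMP) absorbs it.

```python
# Toy sketch of sequential virtual fabrication: track a single film
# height (in arbitrary units) through deposit / CMP / etch steps.

def deposit(height, thickness):
    return height + thickness

def cmp_polish(height, target):
    # CMP planarizes down toward a target height; it never adds material.
    return min(height, target)

def etch(height, depth):
    return max(height - depth, 0.0)

def run_flow(steps, height=0.0):
    for op, arg in steps:
        height = op(height, arg)
    return height

nominal = [(deposit, 120.0), (cmp_polish, 100.0), (etch, 30.0)]
print(run_flow(nominal))  # 70.0

# A 5-unit over-deposit is erased here because CMP planarizes it away...
print(run_flow([(deposit, 125.0), (cmp_polish, 100.0), (etch, 30.0)]))  # 70.0
# ...but without the planarizing step the early deviation reaches the end:
print(run_flow([(deposit, 125.0), (etch, 30.0)]))  # 95.0
```

    Sweeping each step’s parameter in a model like this (but in full 3D, with real physics) is exactly how one finds which early factors matter downstream and which can be left alone.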


    The example that Rohit went into in detail was FinFET gate height. Insufficient gate height was identified as a yield problem. But gate height is influenced by many steps (fin definition, dummy poly definition, junction, poly open, work function patterning, tungsten fill, tungsten etching, CMP and probably more). For example, the picture below shows an adjustment made to the eSiGe space RIE (reactive ion etching). After simulating more steps, you can clearly see a big difference in the eSiGe epitaxy.


    For the gate height improvement, a nine-factor, two-level DOE (design of experiments) was executed, and based on the simulation they could determine that fin reveal, poly CMP, poly open CMP and tungsten CMP were statistically significant. So the specification limits were redefined and the variation spread amongst the contributing steps.
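    The mechanics of a two-level DOE can be sketched as follows. This is an illustrative toy, not the actual GlobalFoundries data: three hypothetical factors are swept over a full factorial, each factor’s main effect on a made-up gate-height response is estimated, and the factors are ranked so the dominant ones stand out.

```python
from itertools import product

def gate_height(fin_reveal, poly_cmp, w_cmp):
    # Toy response surface (invented coefficients): fin reveal and
    # poly CMP matter; the tungsten CMP term is deliberately small.
    return 100.0 + 4.0 * fin_reveal + 3.0 * poly_cmp + 0.2 * w_cmp

factors = ["fin_reveal", "poly_cmp", "w_cmp"]

# Full factorial: every combination of low (-1) and high (+1) levels.
runs = [dict(zip(factors, levels), y=gate_height(*levels))
        for levels in product((-1, +1), repeat=len(factors))]

# Main effect = mean(response at +1) - mean(response at -1).
effects = {}
for f in factors:
    hi = [r["y"] for r in runs if r[f] == +1]
    lo = [r["y"] for r in runs if r[f] == -1]
    effects[f] = sum(hi) / len(hi) - sum(lo) / len(lo)

for f, e in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f, e)  # fin_reveal and poly_cmp dominate; w_cmp is negligible
```

    With nine factors the same full factorial needs 2^9 = 512 runs, which is exactly where cheap virtual fabrication pays off versus real wafers.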


    For example, one step is poly open CMP. The original process had poor yield and an unacceptable Cpk of 0.36. Adding steps to the process, using a two-level deposition before the first CMP and then performing a second CMP, got the Cpk up to 1.1.

    The conclusion is that the Cpk approach, along with structural simulation (Coventor’s SEMulator3D) and physical, electrical and yield correlations, was used to define specification limits for physical measurements. Gate height variation for the 14nm FinFET technology was successfully improved using this methodology.

    The slides for Rohit’s presentation are here.


    More articles by Paul McLellan…


    Enable a new generation of connected devices?

    Enable a new generation of connected devices?
    by Eric Esteve on 08-19-2014 at 4:08 am

    Imagination Technologies has designed a complete environment, FlowCloud, to address the needs of emerging IoT and other connected devices. The technology has been engineered by Imagination to optimize device-to-cloud connectivity for embedded applications. FlowCloud is a cloud-based, application-independent development platform, but it is also a set of core services and supporting infrastructure that form building blocks specifically designed to accelerate the deployment of cloud-based applications. FlowCloud provides a platform that enables rapid construction and management of machine-to-machine and man-to-machine connected services, equally suitable for the hobbyist programmer through to large corporate clients. In other words, anybody can design an IoT or cloud-connected device using FlowCloud. But the platform is also able to handle complex services relying upon subscription, billing and payment mechanisms.

    The white paper introducing FlowCloud can be accessed here (you may have to register before downloading).

    In the real world, once the concept of an innovative Internet of Things (IoT) or machine-to-machine (M2M) cloud-connected device has been sketched, you want to be able to prototype it, and do it fast. Imagination’s silicon partners have created several low-cost reference platforms with full support for FlowCloud. For example, the chipKIT Wi-Fire development platform from Digilent is an ideal starting point; it uses a PIC32 microcontroller (MCU) with a MIPS microAptiv CPU and boasts on-board Wi-Fi, so you can exercise your application while still using FlowCloud. State-of-the-art data centres host the FlowCloud platform and supported services, using cluster server technology and built-in redundancy to deliver high reliability and guarantee system uptime.


    Let’s take a look at some real case applications.
    Case 1 is a simple home-control application. PowerBox is an example application that tracks and controls the electricity consumed by an appliance, enabling running costs to be monitored by the consumer. The application also delivers remote control and scheduling of individual power sockets.

    Case 2 is an example of a secure system, electronic healthcare, where high integrity, security and privacy of data are primary considerations. In the example of the SensiumVitals system from Sensium Healthcare, the device itself takes the form of a medical patch worn by the patient which provides live monitoring of vital signs including respiration, heartbeat and body temperature.

    Reading Imagination’s white paper, you will discover how FlowCloud can be used to build a complete system competing with Apple itself: Case 3, a cloud music service. The service supports full subscriptions, billing and micropayments.
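    To make the Case 1 idea concrete, a PowerBox-style device-to-cloud report might look something like the sketch below. The field names and message shape are invented for illustration; FlowCloud’s actual messaging API is described in Imagination’s white paper.

```python
import json
import time

def make_power_report(device_id, socket_id, watts, switched_on):
    """Build one telemetry message for a monitored power socket.
    All field names here are hypothetical, not the FlowCloud schema."""
    return {
        "device_id": device_id,
        "socket_id": socket_id,
        "timestamp": int(time.time()),
        "power_w": watts,
        "state": "on" if switched_on else "off",
    }

report = make_power_report("powerbox-001", 2, 63.5, True)
payload = json.dumps(report)
# On a real device this payload would go out over an authenticated
# channel to the cloud service, which stores it for the consumer's
# monitoring and scheduling views.
print(payload)
```

    The interesting part is everything around such a message that the platform supplies for free: registration, authentication, storage, and the analytics described below.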


    Major features of FlowCloud include device and user management, asynchronous messaging services, event logging, data storage facilities, secure transactions and electronic payments. From the analytics standpoint, a full suite of administration and reporting tools provide dynamic views into the data stored server-side, enabling monitoring and management of all user interactions plus the status of all devices registered to your cloud-based services. Moreover, the tools allow both aggregation and deep analysis of this data, enabling the creation of advanced intelligent services.

    If you read this white paper too quickly, you may only get the cloud-based “development platform” view and miss the innovative set of FlowCloud services, including registration, authentication, association, security, notifications, updates and remote control. Imagination also proposes optional plug-in modules to accelerate development, including FlowTalk (VoIP), FlowFunds (electronic payments), FlowMusic (audio subscription services) and many others.

    FlowCloud has been designed like a consumer interface: a cloud-based set of services allowing everyone from the hobbyist programmer through to large corporate clients to realize the rapid construction and management of machine-to-machine and man-to-machine connected services. FlowCloud may allow a garage start-up to quickly develop and demonstrate not only the system but also the business model, and exercise it in real life…

    Eric Esteve from IPNEST

    More Articles by Eric Esteve…..


    Intel 14nm is NOT in Production Yet!

    Intel 14nm is NOT in Production Yet!
    by Daniel Nenni on 08-18-2014 at 2:01 pm

    Okay, maybe I’m the only one questioning Intel 14nm yield but I think it will be an interesting discussion in the comments section. Here are the questions I would have asked Intel during their recent 14nm PR tour: Has the P1272 process been rolled out to the production fabs in OR, AZ, and Ireland? Is the process officially in production (at Intel this means yield is in a specific range)? Before I share the answers I dug up to those questions, let’s take a look at the slide show Intel presented last November during the analyst meeting. Here are the most interesting yield slides:

    Please note that some of the slides have *Forecast at the bottom. Just last week Intel shared an updated yield slide with notably less detail. Wait, is Broadwell really an SoC?

    Clearly Intel missed the Q1 2014 “matching yield” projection, but the question is why? Given that 14nm is a second-generation FinFET process, it really boggles the mind why yield is such a challenge. The consensus at SEMICON West last month was that there was a significant materials change at 14nm. If you know more about this please let us know in the comments section. Another slide Intel shared recently also shows a FinFET change which was predicted/discussed by Asen Asenov of GSS: Has Intel Learned from Predictive Simulations?

    Gold Standard Simulations (GSS) offers complete solutions for Design Technology Co-Optimisation (DTCO), PDK development and exploration and screening of future technology options. Our tool chain integrates predictive Monte Carlo and statistical TCAD simulations, statistical compact model extraction and high sigma statistical circuit simulation using ‘push button’ cluster-based technology. Our tools are the ‘gold standard’ in terms of physical accuracy, efficiency and usability.

    Why is Intel releasing this information now? My guess is that they are under considerable pressure from Wall Street (I have received several calls on it and have another one coming up). The last comment on 14nm production I remember is from BK on the Q2 2014 conference call last month:

    “We also expect the first 14-nanometer Broadwell Core M processor-based systems including fanless two-in-ones will be on shelves for the holiday selling season, followed by broader OEM availability in the first half of 2015.”

    Since BK is an experienced Intel operations person it would have been nice if he had said, “The P1272 14nm process has been moved from R&D (copy exact) to production fabs in Oregon, Arizona, and Ireland. P1272 is currently in pre-production at those fabs with production targeted by the end of 2014.” It’s all about transparency Brian, absolutely.

    It would also be interesting to know why Intel chose such an aggressive metal fabric for 14nm. Is Intel bound by Moore’s Law and the ability to go where no transistor has gone before? Or was there a technical method in their madness?

    Hopefully the foundries will have an easier time with yield since they chose to reuse the 20nm metal fabric for their first FinFET implementation. In the foundry business it’s all about manufacturability and servicing a very large customer base so the method in TSMC’s madness is easy to understand.

    Also Read: Intel Versus TSMC 14nm Processes

    More Articles by Daniel Nenni…..