
Xilinx UltraScale leads the way on connectivity

by Luke Miller on 09-21-2014 at 10:00 am

Even though Xilinx FPGAs seem to keep growing in density and gobbling up entire boards into a single part, there is still a need for chip-to-chip and, of course, backplane connectivity. Xilinx 20nm UltraScale, TODAY, can really move 28 Gb/s over the backplane. That is something you cannot do with Altera at 20nm: Arria 10 is limited to 17 Gb/s over the backplane, and chip to chip it is behind as well at only 28 Gb/s, where Xilinx 20nm UltraScale will get you 33 Gb/s. A higher data rate means fewer lanes doing more.
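To put those numbers in perspective, here is a quick back-of-the-envelope sketch of how many serial lanes a 400G link would need at each of the quoted line rates (raw rates, ignoring encoding overhead):

```python
import math

def lanes_needed(total_gbps, lane_gbps):
    """Number of serial lanes needed to carry a given aggregate rate."""
    return math.ceil(total_gbps / lane_gbps)

# Line rates quoted above: 17 Gb/s (Altera backplane), 28 Gb/s, 33 Gb/s (Xilinx chip-to-chip)
for rate in (17, 28, 33):
    print(f"{rate} Gb/s lanes -> {lanes_needed(400, rate)} lanes for 400G")
```

Fewer lanes mean fewer pins, less board routing, and less power for the same aggregate bandwidth.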

Need some proof? Please watch this video showing REAL Xilinx UltraScale hardware working, and available now. It also demonstrates that Xilinx is the leader in 400G/500G solutions. This is a great video and really well done.

Now Altera will say Arria is midrange: wait for the Stratix 10. That is an interesting scenario. Do you know how hard it is to manage a family/product with one fab? Xilinx is the leader in execution, and it requires vast amounts of engineering resources. So say you do end up designing in an Arria 10: do you think you are going to get the engineering support once the smooth Intel 14nm process starts to crank out wafers for Altera? Has Altera doubled its engineering resources? Both used the 20nm TSMC fab, yet Xilinx is faster and denser. What happened to Arria? Same process, so why is its DSP limited to 550 MHz while Xilinx runs at 741 MHz? It is all about engineering and architecture.

The Xilinx Gigabit Transceivers are unleashing what I would call nothing short of a revolution. High-speed serial is here to stay, and JESD204B and Hybrid Memory Cube are absolutely tremendous. Of course Xilinx leads the way with these technologies. Here is an excellent video of the world’s first Hybrid Memory Cube 15G demonstration using a Xilinx UltraScale design. Please watch and enjoy, once again, technology that is available here today.

For some of you, and that is many, who still need DDR4, Xilinx UltraScale does that too, with the world’s fastest and best DDR3/DDR4 solutions, not in simulation or in tools but in REAL silicon!

For more videos on Xilinx UltraScale I encourage you to check out this link and really spend the time to clearly understand the advantages Xilinx has over Altera. And may I ask you, why are you using Altera? Wait till I get to write about 16nm, my oh my…


TCAD to SPICE

by admin on 09-21-2014 at 7:00 am

Power devices have historically been made from silicon (Si), which has reached the limit of electric power loss reduction. With the superior physical and electrical properties of silicon carbide (SiC), we can expect to see a significant expansion in the amount of electric power conversion in electrical equipment as well as reduced loss during conversion. One of the big steps in bridging the TCAD and design worlds is the capability to define the process in the TCAD environment and end up with SPICE models that allow designers to simulate the device’s capabilities.

Next Tuesday, September 23rd, Silvaco is hosting a webinar, TCAD to SPICE Simulation of SiC and Si Power Devices. The webinar runs from 10-11am Pacific time and is presented by Dr. Eric Guichard, Silvaco’s Vice President of the TCAD Division. He is responsible for all aspects of TCAD from R&D to field operations. Since joining Silvaco in 1995, he has held numerous positions including director of Silvaco France and, most recently, Director of Worldwide TCAD Field Operations. Dr. Guichard holds an MS in materials science and a PhD in semiconductor physics from Ecole Nationale Polytechnique de Grenoble, France.

The webinar will discuss the methods used to design, simulate and optimize the performance of power devices using TCAD and SPICE simulations. Silicon has long been the semiconductor of choice for high-voltage power electronics applications. However, wide-bandgap semiconductors such as SiC have begun to attract attention due to their projected improved performance over silicon. Simulating SiC devices is more challenging than simulating silicon-based devices. In this webinar Eric will review the requirements to accurately simulate SiC-based power devices. He will also present a completely automated TCAD to SPICE flow that helps reduce the cost and time taken to develop a silicon-based IGBT power device.


What attendees will learn:

  • Key challenges of power device TCAD simulation
  • Key challenges of SiC TCAD simulation
  • TCAD simulation of SiC IGBT (insulated-gate bipolar transistor), Trench MOS and DMOS
    • 2D and 3D TCAD simulations (meshing, solver, physical models)
    • When to use 3D over 2D
  • Full TCAD to SPICE IGBT flow example
    • Process and Device simulations for IV curve generation
    • TCAD-based SPICE parameter extraction using HiSIM-IGBT compact model
    • Correlation between circuit performance and process variation
    • Circuit performance optimization

More details on the webinar, including a registration link, are here. If you cannot make this date and time then register anyway; you will be sent a replay link to the webinar soon afterwards.

More articles by Paul McLellan…


Intel’s 35% Density Advantage Claim Explored

by Daniel Nenni on 09-20-2014 at 1:00 pm

The previous blog I did on the density difference between Intel 14nm and TSMC 20nm caused quite a stir and drew many interesting comments which I would like to address. After writing thousands of blogs on a wide variety of topics I have found that playing the devil’s advocate stimulates the most productive conversations, and in this case it proved to be true. The Intel Core M vs Apple A8! blog went viral last week and resulted in some very interesting points made in the comment section that I feel should be explored in greater detail.

First is how we measure density. The semiconductor industry is all about packing more transistors in a smaller space. It is part of Moore’s Law, it is how we get less expensive consumer electronics, it is a badge of honor really. There are two transistor numbers you can use: the number of transistors in a design schematic and the number of transistors in the final layout which is then manufactured. The difference between these numbers varies but after taking a quick poll amongst leading edge design and layout people the range is 0-10% more transistors in the layout. Since density is a badge of honor most companies use the layout transistor count but if it serves a marketing purpose they will use the schematic transistor count. Either way, considering the point I’m trying to make, it doesn’t really matter.

Second, comparing the Intel Core M processor and the Apple A8 SoC is like comparing an orange to an apple but this is the only data we have today and it is a good starting point for a density discussion. The architectures are different (CPU vs SoC), the processes are different (20nm planar vs 14nm FinFET), and the companies are very different (IDM versus Fabless).

Third, the performance, power, and functionality of the chips are not part of this discussion. Tear downs and third party benchmarks will be required and they are not available yet. When they are, we can look back on this discussion and see if we were right and if not we can see where we went wrong. All in the interest of science, right?

Here is the argument: Intel claimed a 35% density advantage over TSMC during their November 2013 Investor Meeting using the middle slide above. Intel also used the Altera slide as support for their claim. TSMC rebuffed that claim during a quarterly conference call using the slide on the left.

According to Apple, the A8, which is manufactured by TSMC on a 20nm planar process, has about 2B transistors on an 89mm2 die. According to Intel, the Core M manufactured on a 14nm FinFET process has about 1.3B transistors on an 82mm2 die.

Given that:


  • According to TSMC, 16nmFF+ has a 15% density advantage over 20nm planar
  • We do not know what type of transistor count Intel and Apple use, but assume the worst case with a 10% variance (upsize Intel by 10%)
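Given those two assumptions, the arithmetic is simple enough to sketch (die sizes and transistor counts as quoted above; the per-mm² figures are my calculation, not Intel’s or TSMC’s):

```python
# Die sizes and transistor counts as quoted above; densities in millions
# of transistors (Mtr) per mm^2 are computed here.
a8_density = 2000 / 89            # Apple A8: ~2B transistors, 89 mm^2, TSMC 20nm planar
core_m_density = 1300 / 82        # Intel Core M: ~1.3B transistors, 82 mm^2, 14nm FinFET

# Worst case for the comparison: upsize Intel's count by 10% (point 2 above).
core_m_upsized = core_m_density * 1.10

# TSMC's claim (point 1 above): 16FF+ is ~15% denser than 20nm planar.
tsmc_16ffp_density = a8_density * 1.15

print(f"A8, 20nm planar:     {a8_density:.1f} Mtr/mm^2")
print(f"Core M, 14nm (+10%): {core_m_upsized:.1f} Mtr/mm^2")
print(f"16FF+ (projected):   {tsmc_16ffp_density:.1f} Mtr/mm^2")
```

Even with the 10% upsize, the Core M lands well below the A8’s 20nm density, let alone 35% above it.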

    Intel’s 35% density advantage claim just does not hold up, not even close. Time will tell, silicon does not lie, but for now TSMC’s density slide is much more honorable than Intel’s. And let’s not forget that Intel’s processes are highly specialized for a single product while TSMC’s processes serve a much wider range of applications. If true, this lack of a density gap is really big news for the fabless semiconductor ecosystem, absolutely!

    More Articles by Daniel Nenni…


Who will be “lucky dog” in 4G LTE basebands?

    by Don Dingee on 09-19-2014 at 5:00 pm

    The official term is “beneficiary rule”, but among colorful racing broadcasters, drivers, and fans it is more commonly referred to as the “lucky dog”: the driver who is down a lap, but gets to advance to the lead lap by virtue of being farthest ahead when a caution flag is raised.

    Qualcomm has lapped the entire field when it comes to cellular baseband chipsets, holding 66% market share according to the latest figures from Strategy Analytics. Continue reading “Who will be “lucky dog” in 4G LTE basebands?”


    MEMS+, Bringing MEMS into the Electronic World

    by Paul McLellan on 09-19-2014 at 1:59 pm

    One of the things about MEMS devices is that they almost always live on a chip that also contains the electronics necessary to process the output from the sensor. For example, an on-chip accelerometer for a car airbag deployment will contain the electronics necessary to process the signal from the sensor and end up with something much closer to “we’re crashing, deploy the airbags” versus “we’re OK, don’t fire off the airbags.”

    The design of the MEMS devices themselves is typically done with some form of finite-element analysis (FEA), a very general approach to designing mechanical structures. However, these models of the device are very complex and slow to evaluate due to the huge number of degrees of freedom. This is fine for designing the device itself, but for working with the electronics a simpler model of the device is required, one that is accurate enough for the purpose but also fast to evaluate.

    What is required is a model that can be imported into Mathworks/Simulink or Cadence/Virtuoso and allows the circuits being designed to be evaluated with the MEMS device in place. In effect we want the input to the electrical simulation to be the input to the MEMS device, which is typically mechanical (force, acceleration, temperature), not the electrical signal it produces. So we input deceleration and then can see the signals that the sensor creates and how they are processed all the way up, potentially, to how the software in a microcontroller reacts. Other MEMS devices are more on the output side, such as mechanical switches or DLP mirrors, but the same idea applies. The electronics and the MEMS devices need to be cosimulated with enough accuracy on the MEMS side to ensure that the electronics is designed correctly, but without requiring a model with such fidelity that the simulation is prohibitively slow. The traditional approach has been to hand-craft a model. To make that tractable, the model is often over-simplified, which can let errors slip through the cracks.
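As an illustration of what such a hand-crafted behavioral model looks like, here is a minimal sketch of an accelerometer reduced to a mass-spring-damper, stepped with a simple semi-implicit Euler integrator. All parameter values are invented for illustration; a real model would come from FEA extraction:

```python
# Hypothetical hand-crafted behavioral model of a MEMS accelerometer:
# a single mass-spring-damper whose proof-mass displacement is what the
# readout electronics would sense. Parameter values are illustrative only.

def simulate_accelerometer(accel_input, dt=1e-6, m=1e-9, k=1.0, c=2e-6):
    """Semi-implicit Euler integration of m*x'' + c*x' + k*x = m*a(t).

    accel_input: external acceleration samples (m/s^2), one per time step dt
    Returns the proof-mass displacement at each step (metres).
    """
    x, v = 0.0, 0.0
    out = []
    for a in accel_input:
        force = m * a - c * v - k * x
        v += (force / m) * dt
        x += v * dt
        out.append(x)
    return out

# Step input: a constant 50 g deceleration (a crash-like event), 20 ms long.
g = 9.81
disp = simulate_accelerometer([50 * g] * 20000)
# Displacement settles toward m*a/k, roughly half a micrometre here.
print(f"final displacement: {disp[-1]*1e6:.2f} um")
```

A model at this level runs in milliseconds, which is what makes cosimulation with the readout electronics practical.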


    Coventor’s MEMS+ and other products are the tools of choice for MEMS designers: over half of the top 10, top 20 and top 30 MEMS companies are Coventor customers. So what is MEMS+? It is a tool for creating high-order finite element models that run in MATLAB, Simulink, and Cadence instead of proprietary field solvers. It allows models of MEMS components to be constructed from parametric finite elements such as rigid shapes, flexible shapes, side electrodes, interdigitated combs and more: the basic building blocks of MEMS devices. Models can then be generated automatically for use in MATLAB/Simulink and Cadence. The models include mechanical, electrical and gas damping effects, and are small and fast enough for transient simulations, simulating in minutes on a standard laptop.

    The models are parametric, so it is straightforward to vary the design to take account of, for example, manufacturing variability. An interdigitated comb, for instance, can take account of the thickness of the elements, the height, the sidewall incidence from over-etching, etc.


    MEMS+ 5.0 was announced by Coventor recently. The key new features are:

    • Improved Reduced Order Model generation and export
    • Now exports Verilog-A and MATLAB/Simulink models (up to 100X faster than complete nonlinear models)
    • MATLAB/Simulink ROMs support 3D result visualization
    • New option to include mechanical nonlinearities for frequency hysteresis (Duffing effect) and quadrature
    • Improved model library

      • New comb models for movable flexible structures
      • Improved side electrode and contact models
      • New squeezed film damping models for side electrodes
      • Support for modeling out-of-plane structures such as corrugations
      • New charge output for piezo-electrical layers
      • New generic spring and damper

    The SUN will NOT set on Oracle!

    by Daniel Nenni on 09-19-2014 at 7:00 am

    Larry Ellison resigning as CEO of Oracle caught me by surprise. I definitely did not see that one coming. Talk about the end of an era: as a 30+ year Silicon Valley veteran I have seen quite a few industry icons that stand out amongst the others, and Dave Packard, Bill Hewlett, Gordon Moore, Andy Grove, Bill Gates, Steve Jobs, and Larry Ellison are definitely in my top ten. SUN Microsystems was also one of my favorites since I’m a computer person at heart and they changed the world. So when Larry Ellison rescued SUN from IBM I was ecstatic. Unfortunately, the emails I have received today suggest that clouds may be gathering over SUN, a view with which I totally disagree.

    Early in my career I almost joined SUN but opted for a SUN compatible start-up called Solbourne Computer. Not one of my best ideas. In fact, it was my second worst career decision, joining Avant! being the first. I started my career with the mini-computer and saw a room full of compute power end up on a desktop with the first SUN System. That same compute power is now in a watch by the way. SUN really created the internet in the 1980s and rode the dot-com bubble into oblivion. Larry Ellison bailed SUN out in a $7.4 billion acquisition in 2010 to better position Oracle in the server business against HP/Intel and IBM. Larry Ellison and SUN co-founder Scott McNealy were also good friends so that probably had something to do with it as well.

    Larry is probably the most ego-fueled CEO on my list, and I will never forget what he said about cloud computing:

    “The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do. … The computer industry is the only industry that is more fashion-driven than women’s fashion. Maybe I’m an idiot, but I have no idea what anyone is talking about. What is it? It’s complete gibberish. It’s insane. When is this idiocy going to stop?”

    I feel the same way about the Internet of Things. It’s called embedded IC design and we have been doing it for a lifetime, right?

    I had never really watched America’s Cup until Larry turned it upside down. Now I not only watch the races but my beautiful wife and I attend them whenever possible.

    “I enjoy the competition and the process of learning as we compete. The whole thing is just fascinating. I don’t know what I’ll do when I retire. When I go sailing, I look around … anyone want to race? I just love competing as opposed to just going out and watching the sunset.”

    Not only is Larry ultra competitive, he has no problem writing some very big checks and since the SUN acquisition is on his shoulders I’m confident those checks will continue. Thanks to Larry Ellison, Oracle is now a leading edge fabless semiconductor company and a key collaborator within our ecosystem. If you read the fine print on Larry’s resignation it says he will now lead software and hardware engineering full time without CEO interrupts. I think this is a VERY good thing for the fabless semiconductor industry, absolutely.

    More Articles by Daniel Nenni…..


    MIPI Alliance introduces C-PHY, Synopsys launch C-PHY VIP

    by Eric Esteve on 09-18-2014 at 12:05 pm

    The set of MIPI PHY specifications grew overnight, as the MIPI Alliance introduced the new C-PHY spec on September 17th, a physical layer interface for camera and display applications. “The MIPI C-PHY specification was developed to reduce the interface signaling rate to enable a wide range of high-performance and cost-optimized applications, such as very low-cost, low-resolution image sensors; sensors offering up to 60 megapixels; and even 4K display panels,” said Rick Wietfeldt, chair of the MIPI Alliance Technical Steering Group.

    The next day, Synopsys released a new native SystemVerilog-based MIPI C-PHY Verification IP to help engineers verify interfaces such as MIPI CSI-2 v1.3, which includes MIPI C-PHY. The MIPI C-PHY™ specification uses three-phase digital coding techniques. This means that a chip integrating MIPI C-PHY will use 3 pins to form 1 unidirectional lane (in fact a trio), with the clock embedded. Thus we no longer speak of Gb/s (gigabits per second) but of Gsym/s (gigasymbols per second), each symbol being carried by the lane trio. Where a differential signaling technique (like MIPI M-PHY) uses two wires to carry one symbol equal to one bit (minus the encoding overhead; with 8b/10b, for example, only 0.8 bit of payload), MIPI C-PHY uses three wires to carry one symbol equal to 2.28 bits. The benefit is that you reach (about) the same bandwidth with a MIPI C-PHY running at 2.5 Gsym/s on 3 wires as with a MIPI M-PHY running at 5.8 Gb/s on 2 wires. Designing at a lower frequency (in this range) is probably easier, and a 2.5 GHz lane should generate less perturbation than a 5.8 GHz one… don’t forget that the first application is the mobile phone, so avoiding perturbation of RF signals can only be good!
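The bandwidth claim is easy to check with the figures quoted above:

```python
# Effective bandwidth comparison using the figures quoted above.
cphy_bits_per_symbol = 2.28      # each C-PHY symbol on a 3-wire trio carries 2.28 bits
cphy_gsym_per_s = 2.5            # Gsym/s

cphy_gbps = cphy_gsym_per_s * cphy_bits_per_symbol
mphy_gbps = 5.8                  # M-PHY line rate on 2 wires (before 8b/10b overhead)

print(f"C-PHY: {cphy_gbps:.2f} Gb/s over 3 wires at {cphy_gsym_per_s} Gsym/s")
print(f"M-PHY: {mphy_gbps:.2f} Gb/s over 2 wires")
```

5.7 Gb/s on a trio versus 5.8 Gb/s on a differential pair: effectively the same raw bandwidth at less than half the toggle rate.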

    Synopsys providing the MIPI C-PHY Verification IP the same day the specification is introduced is already great news. But another PR was launched the same day: Synopsys has released MIPI D-PHY v1.2, running up to 2.5 Gbps per lane, for an aggregated data throughput of up to 20 Gbps for high-resolution imaging applications. According to Synopsys, this new “MIPI D-PHY is 50 percent lower in area and power compared to competitive solutions, reducing silicon cost and extending battery life”. There is also an interesting quote from Sean Mitchell, senior vice president and COO at Movidius, saying that “The DesignWare MIPI D-PHY offered low power consumption, high performance and configurability options that were critical to the success of our Myriad 2 Vision Processing Unit”. If you take a look at the Movidius web site, the Myriad 2 Vision Processing Unit targets the following applications:

    • Smartphone / tablet cameras
    • Wearables, action cameras, and electronic eyewear
    • Embedded devices (home automation, industrial, and robotics)

    If using MIPI D-PHY for smartphones and media tablets looks pretty obvious, listing other applications like wearables and embedded devices is very interesting: MIPI technology is moving outside the mobile phone (and tablet) industry! In fact, we expect such information to become more common in the future. The benefits that come with MIPI technology, like better power-per-bit efficiency, interoperability, and the availability of off-the-shelf ASSPs shipping in hundreds of millions of units in production (with a positive impact on price), should make MIPI-powered ICs a very attractive solution for wearable, IoT and embedded devices!


    I was about to miss the latest, but not least, MIPI-related PR released the same day by Synopsys: “Leadcore Achieves First-Pass Silicon Success with DesignWare MIPI IP in Smartphone Application Processor SoC”. Leadcore is a Chinese application processor SoC maker, targeting what is probably the most competitive market on a worldwide basis today, the Chinese mobile market.

    “With the tight time-to-market windows in the mobile market, we needed an established IP supplier that would provide high-quality and reliable solutions,” said Dijun Liu, vice president, Leadcore Technology. “We successfully integrated the DesignWare MIPI IP into our design within two weeks, letting us focus our efforts on the differentiating portions of our design. The DesignWare IP helped us meet our project schedule and improved our product’s time-to-market. We fulfilled our customer’s requirements by using DesignWare IP from Synopsys.”

    The MIPI specification integrated into the Leadcore INNOPOWER LC810 is DesignWare MIPI D-PHY, compliant with the MIPI D-PHY interface specification v1.1; it supports up to 1.5 Gbps and is configurable for host or device applications. So, to summarize, Synopsys has launched Verification IP for MIPI C-PHY, compliant with MIPI CSI-2 v1.3; released MIPI D-PHY v1.2, running up to 2.5 Gbps per lane for an aggregated data throughput of up to 20 Gbps; and shared a customer success story, the Leadcore LC810 integrating MIPI D-PHY v1.1… We understand this quote from Joel Huloux, chairman of the board of MIPI Alliance, “Over the last 10 years, Synopsys has played an active role in MIPI Alliance working groups, contributing to the development and proliferation of MIPI Alliance technology,” as Synopsys’ investment in MIPI technology is clearly strong.

    Eric Esteve – See “MIPI IP Survey & Forecast” from IPNEST


    Ultra low light CMOS biosensor helps tackle infectious diseases

    by Daniel Nenni on 09-18-2014 at 9:00 am

    The recent outbreak of Ebola in West Africa underscores the urgent need for globally affordable tools to help fight infectious diseases. Among these, a method to rapidly and accurately identify the infectious pathogen is of particular importance.

    In recent years, researchers have tried many ways to achieve easily portable and affordable molecular diagnostics solutions to help curb infectious diseases globally. Much effort has been put into developing novel microfluidics technology (e.g. DNA microarrays) to achieve miniaturization. Microfluidic biochips allow reactions of samples and diagnostic assays to take place in a small, disposable chip platform. However, these biochips still need to be “read out” by instruments such as fluorescence microscopes, and these instruments are often bulky and expensive.

    For this reason, innovation in microfluidic biochips fell short of providing a complete solution for point-of-care diagnostics. Engineers at Anitoa (Palo Alto, CA, www.anitoa.com) took a different approach by focusing on technologies that enable compact instrumentation. Anitoa just introduced a CMOS bio-optical sensor that is highly integrated and low power. Combined with microfluidics technology, this CMOS biosensor can enable truly portable and affordable molecular diagnostic solutions.


    Figure 1, Molecular tests (DNA, antibody) provide precise answers about the type of virus and bacteria behind infectious diseases

    Anitoa recently announced the availability of the industry’s first 3×10-6 lux ultra low-light CMOS bio-optical sensor, called ULS24. Anitoa’s single-chip CMOS bio-optical sensor is capable of detecting 3×10-6 lux with a better than 13dB signal-to-noise ratio (SNR) while consuming only 30mW. With this performance, the CMOS bio-optical sensor can now replace the bulky and expensive PMTs (photomultiplier tubes) and cooled CCDs widely used today in molecular diagnostic instruments, which sense molecular reactions using fluorescence or chemiluminescence signaling principles.

    Anitoa’s ultra low-light CMOS biosensor enables the realization of a wide range of low-cost and portable medical and scientific instruments. A field-portable Nucleic Acid Test (NAT) system that can precisely identify infectious pathogens is just one example. Such a DNA test instrument can be deployed in the field, allowing physicians to respond in a timely manner to potential epidemics globally by prescribing life-saving treatments, such as targeted antibiotics or antiviral drugs, on site.

    CMOS biosensor enables portable qPCR
    One of the most powerful methods to detect and quantify infectious pathogens is real-time quantitative polymerase chain reaction, or qPCR. qPCR achieves sensitivity and specificity through combined amplification and real-time detection. qPCR causes target DNA strands to be selectively replicated millions of times with the help of a special enzyme called a polymerase. As the target DNA is replicated, it binds with specially designed molecular probes that are labeled with fluorescent materials. A sensor captures the fluorescent signal emitted from these probes to monitor the reaction, detecting and quantifying the target bacterial or viral DNA.
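The quantification works because the amplification is exponential: under ideal doubling, the cycle at which the fluorescence crosses the detection threshold (the Ct value) is a logarithmic measure of the starting copy number. A small sketch (threshold and copy numbers are illustrative, not Anitoa's figures):

```python
import math

def copies_after(initial_copies, cycles, efficiency=1.0):
    """Ideal PCR: template multiplies by (1 + efficiency) each cycle (1.0 = perfect doubling)."""
    return initial_copies * (1 + efficiency) ** cycles

def ct_value(initial_copies, threshold, efficiency=1.0):
    """Cycle number at which the amplified copy count crosses the detection threshold."""
    return math.log(threshold / initial_copies, 1 + efficiency)

# 10x less starting material shifts Ct by log2(10) ~ 3.3 cycles:
# that shift is the basis of quantification.
print(f"Ct for 100 copies: {ct_value(100, 1e9):.1f}")
print(f"Ct for 10 copies:  {ct_value(10, 1e9):.1f}")
```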

    Figure 2 Result of qPCR detection of E. coli DH5a bacteria DNA with Anitoa’s CMOS biosensor

    With the high sensitivity and signal-to-noise ratio of Anitoa’s CMOS biosensor, system designers can take full advantage of microfluidics innovations to achieve total system miniaturization. Microfluidics uses very small reaction volumes and densely packed reaction sites, which also leads to faster reaction times. Anitoa’s CMOS biosensor has the fast integration time and imaging capability needed to take advantage of microfluidic systems. With this combination, engineers at Anitoa are in the process of building a palm-sized microfluidic qPCR system that will work with today’s off-the-shelf qPCR diagnostic assays (see Figure 2 for validation results).

    In addition to enabling small size and fast detection, an ultra low-light CMOS biosensor can also help achieve better diagnostic results. For example, it can help overcome an issue called photobleaching by using a much lower dose of excitation light (an LED instead of a laser) to generate fluorescence. Reducing photobleaching significantly improves assay repeatability, making diagnostic results more trustworthy. (A link to the full paper can be found at http://www.anitoa.com/docs/anitoa-whitepaper-l.pdf)

    Also Read: CMOS Biosensor Breakthrough Enables Portable Diagnostics Solution


    GlobalFoundries on the Road

    by Paul McLellan on 09-17-2014 at 5:12 pm

    Every year in the fall GlobalFoundries has a series of technical seminars they take on the road around the US. This year it kicks off on Tuesday, October 21 at the Doubletree Hotel in San Jose. Two days later it is at Dana Point (southern CA) and on the 30th it goes to Austin (you don’t need me to tell you where Austin is, I’m assuming).

    GlobalFoundries is the #2 foundry in the world. The most significant recent announcement was that their Fab 8 in Malta, NY will run a copy-exact version of the Samsung 14nm process. This will not only give GF a solid 14nm process, but also make it possible for customers to second-source designs into either fab. Since Samsung is both a foundry and a major player in mobile and other industries, there is a high level of competition with its own customers. For sure that is one reason Apple moved away from Sammy to TSMC; it was just too weird being both the largest (essentially only) foundry customer and at the same time playing number 2 in market share to Samsung’s mobile offerings.

    Building on last year’s inaugural seminars, executives from GF will reveal the latest developments in technology and discuss industry trends and solutions throughout the region. From leading-edge advanced SoC design and packaging to HV analog and embedded NVM & RF solutions, the event will cover a wide range of topics.

    The seminar also features a panel discussion on techno-economic pressure in the semiconductor value chain and how it may impact consumers and the global economy. Panelists include leading figures and industry experts from the semiconductor ecosystem. Another highlight is the partner area featuring GF’s ecosystem partners, who will showcase their collaborations. A special networking reception will be held in the afternoon, giving attendees the opportunity to meet and mingle with partners in a relaxed setting.

    So here are the details. The agendas vary slightly from seminar to seminar.

    • October 21st, Doubletree Hotel San Jose. 8.30am to 4pm.
    • October 23rd, Doubletree Hotel, Dana Point. 8.30am to 4pm.
    • October 30th, Omni Austin Southpark, 8.30am to 4pm.

    The agenda for the San Jose seminar is as follows:

    • GF strategy and roadmap update
    • Panel discussion: Techno-Economics Pressure in Semiconductor Value Chain may impact Consumers and Global Economy – What is our Solution?
    • 2.5D/3D The new paradigm of advanced packaging
    • RF, BCDlite and NVM
    • 14-20nm leading edge

    Full details, including a link for registration, are here.


    More articles by Paul McLellan…


    Designing the Right Architecture Using HLS

    by Pawan Fangaria on 09-17-2014 at 9:05 am

    With the advent of HLS tools, the general notion that comes to mind is that there is now an automated tool which can optimize your design description written in C++/SystemC and provide you a perfect RTL. In real life it’s not so: any design description needs a hardware designer’s expertise to adopt the right algorithm and architecture in order to fulfill the intent of the design; the desired RTL architecture must be understood before writing the design description. Effectively it’s hardware design, not software synthesis. So, more than the transformation of an abstract-level h/w description to RTL, the major contribution of an HLS tool is in improving the QoR (Quality of Results) by tuning the micro-architecture according to HLS constraints and taking the design from technology independence to a technology-specific implementation. Calypto’s HLS process using Catapult has a dedicated ‘Architecture Refinement’ stage between the ESL Reference Model and the ESL Synthesizable Model.

    Consider the above example of a simple filter model where the ‘multiply and accumulate’ loop can be unrolled for parallelism. The s/w code has bit-accurate types (Algorithmic C or SystemC) with proper rounding, known sizes, internal taps and external coefficients. This s/w model can be easily synthesized.
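The reference-model idea can be illustrated with a minimal software model of such a filter (coefficients are invented for this sketch, and plain Python stands in for the bit-accurate Algorithmic C or SystemC types a real ESL model would use):

```python
# Illustrative software reference model of a 5-tap FIR filter.

class Fir5:
    def __init__(self, coeffs):
        assert len(coeffs) == 5
        self.coeffs = coeffs
        self.taps = [0] * 5          # internal tap registers (delay line)

    def step(self, sample):
        # Shift the delay line and inject the new sample.
        self.taps = [sample] + self.taps[:-1]
        # Multiply-accumulate loop: HLS can unroll this for parallelism,
        # or keep it rolled to share a single multiplier.
        acc = 0
        for c, t in zip(self.coeffs, self.taps):
            acc += c * t
        return acc

f = Fir5([1, 2, 3, 2, 1])            # symmetric coefficients, as in the folded design
out = [f.step(x) for x in [1, 0, 0, 0, 0, 0]]
print(out)                           # impulse response reproduces the coefficients
```

With symmetric coefficients, taps that share a coefficient can be pre-added before multiplying, which is exactly the folding that reduces five multipliers to three.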

    Now consider an optimized architecture (reduced area and complexity) of the folded 5-tap filter as shown in the above picture, where the coefficients are reduced to 3. The decision to share or unroll the summing adders can be made in HLS. As shown in the s/w model, loop merging in HLS can share the folding adder, which makes the design technology dependent.

    HLS untimed model is technology and performance neutral. Depending on the system clock, sampling frequency and other design parameters such as throughput, the number of taps and appropriate level of folding or unrolling are decided. The area saving by folding becomes more pronounced with fully unrolled solutions with one sample per clock cycle.

    Above is an example of a circular-buffer RAM implementation with mutually exclusive read and write, which allows a single-port RAM to be used for tap storage. A circular-buffer RAM is appropriate when the filter requires a large number of taps.

    Decimation is a technique to reduce sample rate by discarding samples, say 3 out of 4, and therefore it’s wise to reduce computational overhead for those discarded samples. Polyphase decimation is a concept that computes the required result in phases to reduce this overhead.
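The saving is easy to illustrate: computing only the outputs that will be kept gives identical results with a fraction of the multiply-accumulates (the filter and signal below are illustrative):

```python
# Sketch of the decimation idea: with decimate-by-4, only every 4th filter
# output is kept, so a polyphase-style structure computes only those outputs.

def fir(x, h):
    """Plain FIR: one output per input sample."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def fir_decimate(x, h, m):
    """Compute only every m-th output: roughly 1/m of the multiply-accumulate work."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(0, len(x), m)]

h = [1, 2, 2, 1]
x = list(range(16))
assert fir_decimate(x, h, 4) == fir(x, h)[::4]   # same kept results, 4x fewer MACs
```

A full polyphase implementation goes further by splitting the filter into m sub-filters fed at the low rate, but the work saving comes from the same observation.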

    A more complex example comes from image processing. Below is sample code for image windowing – an edge detector.

    It is inefficient to read an image 9 times for a single image out. For such cases, a window & line buffer architecture is needed; a line buffer is a circular-buffer delay-line implementation with a write and a read every cycle. In the above example, treating positions 0 through 8 as registers, injecting pixels into position 8 and shifting (with appropriate delay of inputs) will produce the first pixel_out result at position 4. The line buffer can be implemented using a dual-port RAM with one read and one write, a single-port RAM with guaranteed read-before-write behavior, or double-width ping-pong read/write buffering.
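The shift-and-inject scheme can be sketched as follows (a behavioral model only, with an invented image width, not the synthesizable C++): each pixel enters the stream once, two line buffers supply the two delayed rows, and the nine window registers shift left each cycle.

```python
# Behavioral sketch of a 3x3 window with two line buffers on a row-major
# pixel stream: each pixel is read from memory only once, instead of
# re-reading the image 9 times. Image width is illustrative.

WIDTH = 8

def windows(pixels, width=WIDTH):
    """Yield full 3x3 windows (as row-major 9-tuples) from a pixel stream."""
    line0 = [0] * width   # buffered row y-2
    line1 = [0] * width   # buffered row y-1
    win = [0] * 9         # 3x3 window registers
    for i, p in enumerate(pixels):
        x = i % width
        # Shift each window row left; inject one pixel per row on the right:
        # the two delayed pixels come from the line buffers, the newest from input.
        win = [win[1], win[2], line0[x],
               win[4], win[5], line1[x],
               win[7], win[8], p]
        line0[x], line1[x] = line1[x], p   # update the delay lines
        if i >= 2 * width + 2 and x >= 2:  # window is full and does not wrap
            yield tuple(win)

img = list(range(WIDTH * 4))              # 4 rows of test pixels, values 0..31
first = next(windows(img))
print(first)   # the 3x3 neighborhood centered on row 1, column 1
```

The edge-detector arithmetic then operates purely on the nine window registers, one result per cycle.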

    In order to implement appropriate h/w for single port RAM, a template can be defined for SPRAM hardware_window class and corresponding SPRAM class constructor and member function are defined. The RAM access operations are appropriately defined for mutually exclusive read and write operations. Similarly, shifting of window pixels, injecting data from delay line and updating the window registers are defined appropriately.

    The above image shows the synthesis process in Catapult. The RAM array from the SPRAM class instance can be mapped to the SPRAM library. A 3×3 window on a 1920-pixel image width needs a 958-deep, double-width RAM: 12-bit pixels, two lines to buffer and double width require a 48-bit wide RAM.

    It’s clear from the above examples that the hardware expertise of an RTL designer proves quite valuable when writing the description at a higher level of abstraction, which improves productivity in design exploration and optimization and accelerates verification and validation. To learn more about these examples and the actual synthesis process, attend the on-line webinar (a quick on-line registration is needed) presented by Stuart Clubb from Calypto. Stuart explains the code in great detail, pointing to specific variables, data and operations. It’s a must-attend webinar for designers and ESL specialists looking to write hardware descriptions for SoCs at the system level.

    More Articles by Pawan Fangaria…..