
Any MIPI CSI-3 Host IP Solution for SoCs Interfacing with Sensors?

Any MIPI CSI-3 Host IP Solution for SoCs Interfacing with Sensors?
by Eric Esteve on 07-25-2013 at 4:37 am

For those taking a quick look at the various MIPI interface specifications, the first reaction is to realize that they will have to look at MIPI more closely, and that it will take longer than expected to really understand the various specifications! Let's start with the PHY. One specification defines the D-PHY, up to 1 Gbps (1.5 Gbps is also defined, but not really used); another defines the M-PHY, to support higher data bandwidth and higher speed. Looks simple? In fact, we have not yet mentioned the various "gears" supported by M-PHY (per lane): Gear 1 runs up to 1.25 Gbps, Gear 2 up to 2.5 Gbps, while Gear 3 is defined up to 5 Gbps. There are many more differences between D-PHY and M-PHY; if you take a look at the MIPI Alliance web site, you will find this comprehensive picture:

Now you clearly understand the various MIPI PHYs, and you know that a PHY is nothing without a controller, the digital part of the function in charge of processing the protocol layers ("Link Layer", "Transport Layer" and so on). Let's stay with the M-PHY example. If life were simple, you would attach one MIPI controller to this M-PHY. But if we are (more or less) well paid engineers, it's because SoC-related life is not simple… Just take a look at the picture below:

In order to ease SoC integration, M-PHY can support up to six different protocols. This means that when a chip maker decides to integrate several MIPI protocols on the same chip, he will also instantiate the same PHY IP several times, along with the various controllers attached to it. All controllers are not made equal: DigRF (interfacing with the RF chip), LLI (linking the SoC and a modem chip so they can share a single DRAM) and SSIC (SuperSpeed USB Inter-Chip protocol, for board-level chip-to-chip connection) can be connected directly to the M-PHY. But another group of controllers (CSI-3, DSI-2 and UFS) requires an additional piece of IP, UniPro, to be inserted between the M-PHY and, for example, the MIPI CSI-3 controller (Camera Serial Interface specification).

When a chip maker designs an application processor for a smartphone or media tablet, he is integrating over 100 IP blocks, from an ARM A9 down to I2C or SRAM. Such a chip maker will certainly appreciate the fact that Synopsys proposes a complete Camera Serial Interface 3 (CSI-3) host solution, including the new DesignWare MIPI CSI-3 Host Controller IP combined with the MIPI UniPro Controller and multi-gear MIPI M-PHY IP. With support for up to four lanes in Gear1 to HS-Gear3 operation, the CSI-3 host solution simplifies the system-on-chip (SoC) interface for a wide range of image sensor applications, giving SoC designers maximum flexibility to increase throughput while reducing pin count requirements and integration risk.

I agree with Joel Huloux, MIPI Alliance Chairman, when he says: "IP supporting the MIPI CSI-3 v1.0 specification, along with a HS-Gear3 M-PHY, gives designers the ability to rapidly build host configurations into their SoCs. Synopsys' DesignWare MIPI CSI-3 Host Controller promotes the MIPI ecosystem while furthering the realization and reach of the latest MIPI specifications." Having worked in the IP business for about 10 years, I have realized how important it is for a chip maker who decides to outsource a function split into PHY and controller to be able to acquire the complete solution from a single supplier. This is the guarantee that the function has already been integrated by the vendor, and also validated and verified, before he integrates it himself. In this case of a camera solution, we are talking about three different functions! Last but not least, this new MIPI CSI-3 Host Controller, which simplifies CSI-3 image sensor interface integration, is a low power solution.

By Eric Esteve from IPNEST



Semicon: Multiple Patterning vs EUV, round #2

Semicon: Multiple Patterning vs EUV, round #2
by Paul McLellan on 07-24-2013 at 9:00 pm

Round #1 was here.

In the EUV corner were Stefan Wurm of Sematech (working mostly on mask issues) and Skip Miller of ASML, the only company making EUV steppers (and light sources, since they acquired Cymer).

You may know that the biggest issue in EUV is getting the source bright enough that an EUV stepper has a throughput of at least 120 wafers per hour, so that it is competitive with multiple patterning. And the source is like something out of science fiction. First, you make tiny droplets of molten tin. Then you hit them with a laser to shape the drop. Then you hit it with a really big laser, so big that it needs a whole power infrastructure in the sub-fab, and this vaporizes the tin droplet into a plasma. With a 20kW laser at a power efficiency of 10%, that means 0.2MW of input power. The plasma lets out a little bit of EUV. Oh, and do that about 100M times per hour.


But EUV is absorbed by everything, so you can't use normal (transmissive) masks, and the stepper has to hold a high vacuum (because even air absorbs EUV). Nor can you use conventional mirrors like the one in your bathroom; they absorb EUV too. Instead, mirrors and masks are built from multiple alternating layers of silicon and molybdenum, which reflect through interlayer interference. But they still don't reflect very well, around 70% (about 30% of the light is absorbed at each surface), so after a couple of mirrors to focus the EUV light, six more to direct it, and a reflective mask, roughly 96% of the light has been absorbed and only 4% hits the wafer.

So with that background, are we there yet?

Stefan started off by pointing out that he is working on the assumption that the light source issue will be solved. He can't do anything about it at Sematech, and waiting for it to be solved before looking at the other problems is clearly silly.

First the good news. EUV resists seem to be in good shape, the first production-type EUV tools are being delivered, mask blanks are being made, and there is some experience with pilot runs. Line width roughness (LWR) and CD uniformity (CDU) are getting better, but so far only by accepting slower resists.

Mask blanks are still a big issue. We would like defect-free masks, but that is not really going to happen, and here is why. As I said above, the masks are mirrors built up by depositing layers of silicon and molybdenum onto a glass blank using ion beam deposition (IBD). A big problem is that defects on the glass that are too small to see get amplified by this process and become real defects that affect the mask, and that you can then see (but by then it is too late). The best masks have about 12 defects at 45nm. Those 12 break down into 10 pits from the substrate (too small to see before deposition started), one handling defect and one from deposition. Marathon runs of IBD over 100 blanks yield 20-30% acceptable masks. And there are long-term issues: IBD may not remain viable as process feature sizes continue to shrink.


Another issue is that EUV masks don't have a cover (known as a pellicle), because that cover would absorb the EUV. So any defect that lands on the mask is in the focal plane. There is an assumption in developing EUV that there is no contamination in the chamber, but of course that is not completely realistic. To me this is a huge issue, and one we don't have with optical masks, whose pellicle keeps contamination out of the focal plane. So EUV masks need to be cleaned regularly. But degradation of the patterns is starting to appear after 50-100 cleanings.

There is some work on pellicles with materials that are transparent(ish) to EUV. The most promising material seems to be single crystal silicon.

Takeaway:

  • IBD can produce usable mask blanks but may not be viable long-term
  • Substrate quality is an issue, hampered by lack of defect metrology
  • Need to ensure adders (particles that get on the mask after it was made) do not print
  • Mask lifetime learning has just begun (backside coating damage, clean handling)
  • EUV mask supply chain is a weak link. It will not be ready at the quality and volume needed for the HVM ramp, so the industry needs to strengthen the mask supply ecosystem.


Next was Skip from ASML. Cost is a big concern across the whole industry. In 2000, 1GB would set you back $1182 in lithography costs; by 2015 it should be about $0.17. But post-28nm, cost per transistor is flat. Only EUV can deliver full scaling for the 10nm node, due to the litho/layout restrictions that come with multiple patterning. EUV also reduces cycle time by 30-75%. They have 11 systems in various stages of construction in their clean room.


Currently the source generates 55W of power, which gives 43 wafers per hour (wph). They expect 80W and 60 wph by the end of the year. EUV is expected to reach volume production for 10nm logic and 1x-nm DRAM in 2015-2016.

My opinion: I understand the need for EUV to keep Moore's law on track without insanely high costs and insanely long turnaround times. But I still don't see how everything can be made to work in time. Intel is already planning 10nm without it. The pellicle issue I have always considered a killer, but perhaps a silicon pellicle can be made to work; this meeting was the first time I'd heard a hint of the possibility of an EUV-transparent pellicle. The fact that masks will not be defect-free seems like a big issue. So much is being invested in the light source that I can believe it will be solved. But the almost laser-like focus (see what I did there) on that one issue has obscured the many other issues that stand between EUV and high volume production lithography.


Constrain all you want, we’ll solve more

Constrain all you want, we’ll solve more
by Don Dingee on 07-24-2013 at 8:30 pm

EDA tool development is always pushing the boundaries, driven in part by bigger, faster chips and more complex IP. For several years now, the trend has been toward tools that spot problems faster, without waiting for the "big bang" synthesis result that takes hours and hours. Vendors, with help from customers, are tuning tools to real-world results.

Continue reading “Constrain all you want, we’ll solve more”


Metastability Starts With Standard Cells

Metastability Starts With Standard Cells
by Daniel Nenni on 07-24-2013 at 8:05 pm

Metastability is a critical SoC failure mode that occurs at the interface between clocked and clockless systems. It's a risk that must be carefully managed as the industry moves to increasingly dense designs at 28nm and below. Blendics is an emerging technology company that I have been working with recently; their MetaACE product can be used throughout the design flow, starting with foundation IP.
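To picture what the synchronizer cells discussed below look like in RTL, here is a minimal sketch of the classic two-flop structure. This is my own illustration, not Blendics code; each added stage simply gives a metastable first flop more time to settle:

    // Classic two-flop synchronizer for moving an asynchronous signal
    // into a clock domain. The first flop may go metastable; the second
    // samples it a full cycle later, when it has (very probably) settled.
    module sync_2ff (
      input  logic clk,      // destination-domain clock
      input  logic rst_n,    // active-low reset
      input  logic d_async,  // signal arriving from another clock domain
      output logic q_sync    // synchronized version of d_async
    );
      logic meta;            // first-stage output; may be metastable

      always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
          meta   <= 1'b0;
          q_sync <= 1'b0;
        end else begin
          meta   <= d_async; // can violate setup/hold of this flop
          q_sync <= meta;    // sampled one settling period later
        end
      end
    endmodule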

For standard cells, there are at least three groups that benefit from MetaACE:

  • The designer of the standard-cell synchronizer
  • The individual responsible for characterizing the synchronizer cell
  • The integrator of the synchronizer cell into a SoC product

MetaACE is used to refine the cell design by minimizing the settling time constant (tau) while maintaining the other cell specifications. MetaACE is then used to obtain the parameters that characterize the synchronizer. The results can then be used to determine the MTBF of the synchronizer as it will be used in the SoC product.

Let’s look in detail, as it was explained to me during customer meetings, at how a standard-cell characterization team might use MetaACE as part of their flow. Assume that the design is sent to characterization; this includes the extracted cell netlist as well as device models for the process in question.


The typical characterization flow would be run to find things like setup/hold times, propagation delay, input loads, etc. Using MetaACE, one can also determine the four parameters needed for metastability analysis: TW(1), TW(2), tau-m and tau-s. This uses the same extracted cell netlist and device models as the characterization, but with a few twists:

1. A small netlist should be created that instantiates the design.
2. A few parameters should be defined that MetaACE will use for its analysis:

    * include process models (SS corner, for example)
    .include '$models/processModSS.sp'
    * include cell to test
    .include '$CellLib/DFF.sp'
    * include the file MetaACE creates to drive simulation
    .include '$MetaACE/ic.sp'

    * define SUPPLY and wire it to Vdd
    Vdd vdd 0 DC 'SUPPLY'

    * wire up the flip-flop/synchronizer cell(s) to be simulated
    xdff1 Vdd 0 D C QN Sync DFF_X1

    * bring out any internal nodes you may want to plot/analyze
    Vm3 xdff1.z9 n11 0
    Vm4 xdff1.z10 n21 0

In the above netlist example, the model file and the file that MetaACE modifies are included, as well as the flip-flop/synchronizer to be simulated. "SUPPLY" is wired up, as are "C" and "D", which MetaACE uses to set the supply voltage and the clock/data inputs. Finally, any internal nodes in the circuit needed for analysis should be brought out to this top level.

3. MetaACE is now run, specifying the netlist created above as its input, along with a few other parameters. The main items needed are the location of the simulator (HSPICE) plus:
   • The temperature of the run,
   • Vdd for the run,
   • The name of the clock and its rise/fall time and width,
   • The name of the data input and its rise/fall times,
   • The device's setup/hold times (if known), and
   • Which node(s) should be plotted and analyzed.
4. For a master-slave type device, the first simulation run may specify the node that is the input to the slave as the node to analyze; this will give tau-m.
5. Once tau-m is found, the same circuit is run again, this time looking at the output of the first slave stage (if there is more than one stage). This will give the results for tau-s and TW(1).
6. After this second run, the simulation can be rerun a third time looking at the output of the second flip-flop (for a multi-stage device), which will give TW(2).
7. At each run, the configuration used can be saved for future use (in GUI or command-line mode). Steps 4-6 could be run from the command line as part of a script, automatically extracting all the parameters for each submission of the circuit for characterization.

These general procedures can be run over various process corners by copying the configuration files (which are XML) and the top-level netlist, modifying the netlist to call different corner models, and changing the configuration file to point to the appropriate netlist to simulate. In this way one could simulate the SS corner and the FF corner, for example. Even more complex cases can be run, such as an N-stage synchronizer cell where each flip-flop is assumed to be at a different process corner. Whatever you can specify in your netlist, MetaACE can simulate.

After the characterization process concludes, the results for the taus and TWs are tabulated along with the other parameters of the cell and passed back to the library folks for design updates, datasheets, etc. This data is all that is needed to calculate the MTBF of the cell once the input clock frequency, duty cycle and data arrival rate are determined. It can also be used to see whether recent changes to the cell influenced tau; for example, increasing the drive strength of the cell may have caused tau-s to get a bit larger. That may be acceptable, based on a specification of the maximum allowable tau. But for the first time, one can actually gain some insight into how cell changes affect not only things like propagation delay and loading, but also whether they make the cell perform better or worse when used as a synchronizer.
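To make the MTBF arithmetic concrete, here is the commonly quoted first-order model, MTBF = exp(tr/tau) / (TW · fclk · fdata), as a small SystemVerilog sketch. This is the textbook formula, not necessarily MetaACE's exact formulation (which distinguishes tau-m/tau-s and TW(1)/TW(2)), and the numbers in the example are hypothetical:

    // First-order synchronizer MTBF estimate (textbook model):
    //   MTBF = exp(t_r / tau) / (T_W * f_clk * f_data)
    module mtbf_demo;
      // t_r   : settling time available before the next flop samples (s)
      // tau   : metastability settling time constant (s)
      // T_W   : metastability window (s)
      // f_clk : clock frequency (Hz); f_data : data toggle rate (Hz)
      function automatic real mtbf_seconds(real t_r, real tau, real T_W,
                                           real f_clk, real f_data);
        return $exp(t_r / tau) / (T_W * f_clk * f_data);
      endfunction

      initial begin
        real mtbf;
        // Hypothetical numbers: tau = 20 ps, T_W = 30 ps, 1 GHz clock,
        // 100 MHz data rate, one full clock period of settling time.
        mtbf = mtbf_seconds(1.0e-9, 20.0e-12, 30.0e-12, 1.0e9, 100.0e6);
        $display("MTBF = %e seconds", mtbf);
      end
    endmodule

The exponential dependence on t_r/tau is the whole story: shaving tau, or adding a stage to gain another period of settling time, changes the MTBF by orders of magnitude.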

Bottom line: MetaACE is a powerful tool that gives any engineer who has the extracted cell netlists and device models the ability to obtain the metastability parameters well before fabrication, and even during the cell design process. It also allows any product engineer who integrates synchronizer cells into his design to calculate the overall MTBF of all the synchronizers in the system.



The FPGA Blob is Coming…

The FPGA Blob is Coming…
by Luke Miller on 07-24-2013 at 5:00 pm

I never understood when I was a kid how 'the Blob' could actually catch someone, but it sure did. It caught the unsuspecting, the off guard. I mean, you'd have time for a soda and a shower if you saw it coming up your road. And no, your manager is not the Blob; don't think like that, it's always his boss. The Blob comes to consume the worker who was unaware that he could leave at 4pm on a Friday to avoid the next mini design crisis, only to learn that his FPGA fix wasn't needed; it was the software reading the wrong register all along.

I better write something techy… So do you know what I liken the Blob to? The FPGA… Yikes, on the surface that does not sound like a compliment, but it is, and maybe when I'm done you'll want to be the Blob too. I better stop.

Over the last decade, have you noticed what the FPGA has consumed from your marvelous circuit board? Hmm, have you? I have personally seen the FPGA Blob eat a whole RADAR chassis into one part. Now that is blobbish. We are not only doing more math in FPGAs, but handling massive amounts of IO. The IO is how the Blob eats and spits out data. No more proprietary IO chips; implement whatever you want. The Blob loves 'à la carte'. Shoot, even Richard Simmons can't stop this thing. VPX FPGA COTS boards have an IO FPGA tied to the VPX high speed fabric. That means the board can use SRIO, PCIe etc… and is not locked into a particular 'open' (that's funny) architecture.

Remember the bridge chips? I do… It makes me wonder what else is going to be consumed. I have some ideas but will keep them to myself for now. The big question I have is: what will be left for the microchip makers? Am I really going to buy a video encoder/decoder chip? Memory is safe, but everything else is fair game. Could it be that in the future, instead of solely making chips, a company like TI also designs IP for some FPGA house? Thus the single-chip IP solutions will find themselves as hardened IP, or even soft IP for that matter, in an FPGA. I do not know many engineers at this point using SHARC processors. Why? Well, the FPGAs are doing much of that DSP now. Blobbed.

It really is a new reality for FPGAs, and I like what NVIDIA is starting to think about, which is 'Network on a Chip'. Yes, we do need a System on a Chip (my kids actually think that is a dip), but that system needs to talk to other systems over a medium called a network. Who is the biggest FPGA Blob victim, you ask? ASICs! I remember the days of weighing the trade space of ASICs vs. FPGAs. No longer is the ASIC a real competitor in the way we used to think of it. In fact, IBM laid off many employees from its ASIC division last month, no doubt due in part to the FPGA Blob. You see, the Blob can change shape; unlike the ASIC it is very flexible, just like our friend the FPGA. The Blob is coming and there is no stopping the momentum, just don't get eaten.



    TSMC Q2 Results: Up 17%; 20nm and 16nm on track

    TSMC Q2 Results: Up 17%; 20nm and 16nm on track
    by Paul McLellan on 07-24-2013 at 10:47 am

    TSMC announced their Q2 financial results yesterday. Revenue was $5.2B (at the high end of guidance) with net income of $1.6B. This is up 17.4% on Q1 and up 21.6% year-to-year. Gross margin is up too, at 49% which is up 3.2 points on Q1 and 0.3 points year-to-year. As usual the financial results are not directly that interesting since I don’t much care whether TSMC is a buy next quarter. What is more interesting is trying to read the tea-leaves for the big strategic picture on a multi-year timescale.


Their business breaks down as 57% communication, 16% computer, 20% industrial and 7% consumer. Pretty much all the growth since last quarter is in the communication area, which isn't really a big surprise: up 22% on the biggest base, although the other areas are all up 10-20% too, from smaller bases.

It is interesting to see the shift taking place between process generations. 28nm is 29% of revenue, 40/45nm is 21%, 65nm is 16% and everything else is older. Surprisingly, 15% is in 0.15/0.18um (I'm guessing mostly analog and other specialist stuff, since there is almost nothing in 0.13um or 90nm).

ARM also announced their results yesterday, and these are significant for TSMC for one reason: if ARM starts to lose share to Intel in mobile (or Intel starts to lose share to ARM in servers), this will impact TSMC negatively (or positively, in the server case). Simon Segars, in his first quarterly presentation since becoming CEO, was very bullish on both areas. Perhaps the most interesting little factoid from the ARM presentation is that royalties are up 24% year-on-year, much bigger than the growth in overall semiconductor revenue (2%). And perhaps even more interesting is that a large number of cores that ARM has licensed are not yet shipping (and so not yet producing royalties). For instance, the Cortex-M (a microcontroller core) has 180 licensees but only 50 are shipping so far. Not all of these ARM-based chips will be manufactured by TSMC, of course, but certainly TSMC will get their unfair share as the biggest foundry. That's an attractive pipeline. ARM-based servers are now starting to ship, and AMD (admittedly a biased observer) is predicting double-digit market share by 2016/17, which is huge if it turns out to be true. And while AMD themselves do a fair bit with GF, other server licensees work with TSMC. And those are big chips (mostly 64-bit) which will need a lot of wafers.


    What is TSMC’s total capacity? Their forecast for the end of the year is for 16.5M 8″ equivalent wafers per year. Fab 14 alone is 2.2M 12″ wafers (5M 8″ equivalents). That’s a lot of silicon, up 11% from last year with 12″ capacity up 17% (new fabs are all 12″ of course). Their capex spending remains on-track for $9.5B to $10B for this year (of which 55% has already been spent in the first half).

    When Morris Chang spoke he was bullish too. For overall semiconductor they are cutting their forecast from 4% to 3%. But for fabless they predict 9% growth. And for the foundry industry (not just TSMC) they are raising the forecast to 11% from 10%. And for TSMC bigger than that.

As for 28nm: "Our 28-nanometers is on track to triple in wafer sales this year and our 28-nanometer high-K metal gate is ramping fast, and will exceed the Oxynitride solution starting this quarter. For the Oxynitride solution in which we do have competitors, we believe that we have a substantial lead in yield. For the high-K metal gate solution, we do not have any serious competitors yet. We believe we have a substantial lead in performance. If you recall, ours is a gate-last version and our competitors are mainly in the gate-first version."

    20nm: Risk production has started and volume production starts Q1 2014. Doesn’t see any real competition.

16nm: Volume production starts a year after 20nm, in early 2015.

    Morris again:“On the 16, if we put it on a foundry to foundry or foundry to IDM basis, we are competitive. If you put it on a grand alliance to IDM basis, we are more than competitive.”
(BTW the transcript for this part keeps saying IBM, but that makes no sense; it must really mean IDM, integrated device manufacturer. Or, to be precise, Intel. What Morris is saying is that they will be competitive with Intel at the 16/14nm generation.)

    Presentation is here. Transcript of call is here. Transcript of ARM’s call is here.


    ♫ IMG Sitting on the DOK of the Bay…Closin’ Timin’

    ♫ IMG Sitting on the DOK of the Bay…Closin’ Timin’
    by Paul McLellan on 07-24-2013 at 7:00 am

    Scott Fitzgerald is supposed to have said “the rich are not like other people” to Ernest Hemingway (he didn’t). In the same way, processors are not like other blocks, and not because they have more gates (they don’t). However, special approaches to optimizing processors are important because the clock rate of the processor(s) sets the overall performance of the system, and any effort made to optimize a processor is amortized over the many systems in which that processor is used.

Today, Imagination announced their first Design Optimization Kits (DOKs), co-developed with Synopsys, which deliver substantial silicon PPA (power, performance, area) gains while reducing design cycle times. The first DOK reduces dynamic power by up to 25% and area by up to 10% for PowerVR Series6 GPUs. Imagination DOKs consist of optimized reference design flows, tuned libraries from partners, characterization data and documentation.

The first DOK allows customers using Imagination's PowerVR Series6 "Rogue" GPUs to accelerate time-to-market and reduce power and area through a package of core IP and physical IP co-developed by Imagination and Synopsys. The DOK includes Synopsys's new DesignWare HPC (High Performance Core) Design Kit. Imagination has significant internal SoC development expertise in their IMGworks SoC design group, and through this team they can optimize not just the core itself but also how it works in the system.


    Reducing the power by 25% is actually pretty significant since PowerVR GPU cores already consume only a fraction of the power of competitive solutions. The fully validated DOK enables customers to optimize the implementation further and hit the precise PPA that they want for their specific application.

Of course this is just the first of many DOKs. The initial focus will be on PowerVR GPUs and MIPS CPUs (remember, Imagination's acquisition of MIPS was announced last November and completed in February), but there are plans to extend the collaboration across other key members of Imagination's IP portfolio, including video and vision processors (VPUs) and radio communication processors (RPUs).


Another reason this is so important is that the area occupied by the GPU is growing relative to the rest of the cores on a typical SoC. You only have to look at die photos of Apple's Ax series chips to see how large the GPU has become, especially on the iPad versions that have quad-core GPUs, since a retina display the size of an iPad's actually has more resolution than the HD TV in your living room (of course the pixels are a lot smaller, but then you are not sitting ten feet away from your iPad). Next generation SoCs are going to have more of everything: more pixels, more GPU compute, more…well, more of everything except power. So customers of Imagination who are using these GPUs in systems need to be able to get the performance they need without the power that they don't.

    The first DOK for PowerVR Series6 cores will be available in Q3…wait, isn’t July in Q3?


    Debugging Verification Constraints

    Debugging Verification Constraints
    by Paul McLellan on 07-23-2013 at 3:44 pm

In his DAC keynote last year (2012), Mike Mueller of ARM compared how much CPU power was required to verify the first ARM versus one of the latest ARM Cortex CPUs. The newer CPU is hundreds of times larger than the first ARM, but the amount of verification required was millions of times as much, requiring ARM to construct their own datacenter outside their office buildings. The reasons are twofold. First, verification suffers from a sort of combinatorial explosion: it doesn't scale linearly with the size of the design, because the number of interactions goes up closer to exponentially. Second, we switched from directed verification (where all the vectors are written by hand) to constrained random verification, where random stimulus substitutes computer power (cheap) for brainpower (expensive), and constraints keep it under control (avoiding duplicated verification) and push coverage into areas that were not exercised. This is what makes verifying large systems tractable at all.
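For anyone who has not written constrained-random code, here is the idea in miniature. This is a toy sketch of my own (the class and constraint names are made up): the solver picks random values that satisfy the declared constraints, so the stimulus stays legal without being hand-written:

    // A toy constrained-random transaction.
    class bus_txn;
      rand bit [7:0]  addr;
      rand bit [31:0] data;

      constraint c_addr { addr inside {[8'h10 : 8'h7F]}; } // legal window
      constraint c_data { data != 32'h0; }                 // no empty payload
    endclass

    module tb;
      initial begin
        bus_txn t = new();
        repeat (4) begin
          if (!t.randomize()) $error("randomization failed");
          $display("addr=%h data=%h", t.addr, t.data);
        end
      end
    endmodule

Multiply this by hundreds of transaction classes and third-party VIP, and you get the tens of thousands of constraint lines discussed below.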


A second thing that has happened, with the development of the Universal Verification Methodology (UVM) and the growth of the verification IP (VIP) business, is that much of the verification code is no longer written by the verification engineer. It comes from other groups in the company or from third parties.

All of this together means that designs can now have 50-100K lines of constraints, which leads to performance issues. The constraint solver under the hood of Synopsys's verification environment has been improved, and that has sped things up, sometimes by as much as 25 times, but more often by a factor of 2.


    But the big problem is now debugging constraints. When the simulation doesn’t do what you expect, then it can be a problem with the constraints. There are three big areas where problems occur:

    • randomization failure due to inconsistent constraints
    • unexpected solutions from the solver
    • unwanted solution distribution


    Jason Chen, a CAE in the verification group at Synopsys, presents a webinar on how to use the latest VCS verification environment to debug constraints in a more efficient way, often without requiring recompilation (which for a large design makes the whole process a lot nimbler).

    The first thing is the capability to set solver breakpoints: when certain things occur in the solver then the simulation is halted and control is in the debug environment allowing a deeper investigation at just the point of interest. For example, an unexpected solution from the solver.


I'm not a verification engineer, but as a C programmer (well, an emeritus C programmer) I can sympathize with a problem like this, where the intent is that if y is 1, x should be 10, otherwise 20:

x == y ? 10 : 20

But '==' binds tighter than '?:', so this expression is actually evaluated as:

(x == y) ? 10 : 20

    which doesn’t constrain x at all.
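The fix is just a pair of parentheses. A minimal sketch of both versions as they would appear in a constraint block (the class and names are my own illustration):

    // A toy class showing the precedence trap and its fix.
    class item;
      rand int x;
      rand bit y;

      // Broken: '==' binds tighter than '?:', so this is parsed as
      // (x == y) ? 10 : 20 -- a plain value (10 or 20, both nonzero),
      // hence always satisfied, and x is left unconstrained.
      constraint c_broken { x == y ? 10 : 20; }

      // Intended: if y is 1 then x is 10, otherwise x is 20.
      constraint c_fixed { x == (y ? 10 : 20); }
    endclass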

Another problem looming on the horizon is the (new! improved!) soft constraints, which can be honored or dropped depending on whether they cause consistency problems. Of course, if you write a constraint it is pretty important to know whether it is being honored and actually used in the verification, or dropped and contributing nothing.
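A soft constraint looks like this (a small sketch; `soft` is part of IEEE 1800-2012 SystemVerilog, and the pkt class here is hypothetical):

    class pkt;
      rand bit [7:0] len;
      // A default the solver honors only while it stays consistent
      // with all hard and inline constraints.
      constraint c_len_default { soft len == 64; }
    endclass

    module tb2;
      initial begin
        pkt p = new();
        void'(p.randomize());                      // len == 64, soft honored
        void'(p.randomize() with { len > 100; });  // conflict: soft dropped
      end
    endmodule

The second randomize() call silently discards the soft constraint, which is exactly why you want the debugger to tell you which soft constraints were kept and which were thrown away.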

    Another feature that improves efficiency is on-the-fly re-randomization. You can make changes to the constraints (such as disabling a constraint block) and then re-randomize without requiring a recompilation or a restart of the simulation. This lets you zoom in on the cause-effect relationship immediately.
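In plain SystemVerilog the equivalent moves look like this (reusing the hypothetical pkt class from the sketch above); the point of the VCS feature is that you can do the same thing interactively, on a live simulation, without touching the source:

    module tb3;
      pkt p = new();
      initial begin
        void'(p.randomize());                // all constraint blocks active
        p.c_len_default.constraint_mode(0);  // disable one constraint block
        void'(p.randomize());                // re-randomize under new rules
      end
    endmodule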

    The link to the webinar is here.


    Around the World in 80 Engineers…Actually Well Over 200

    Around the World in 80 Engineers…Actually Well Over 200
    by Paul McLellan on 07-23-2013 at 12:19 pm

Atrenta today announced that Dr. Ajith Pasqual, Head of the Department of Electronic & Telecommunication Engineering at the University of Moratuwa in Sri Lanka (which used to be known as Ceylon), has joined Atrenta's technical advisory board (TAB). OK, academics join EDA companies' TABs all the time, so that's not exactly front-page news. In fact I've written before that part of the purpose of TABs in EDA companies is not just advice, but to build stronger connections with senior technical staff at customers (both to improve the business environment and to get high-level unvarnished feedback) and with academics (to help in recruiting the best students, getting access to important research, etc). So almost every EDA company has a few academics on its TAB.

    But EDA in Sri Lanka? Really?


OK, it's not yet the next Silicon Valley, or even Noida (New Delhi, India). But Atrenta has built an R&D team of 35 engineers in the city of Colombo, Sri Lanka, and plans to expand it to 50 by the end of 2013. They are the first EDA company to have a team in the country and have been working with several local universities to recruit engineers. Given their expansion plans, they are obviously very happy with the way everything has gone.


    Talking of Noida, Atrenta has an R&D operation there. Not surprising really. Ajoy Bose, Atrenta’s CEO, is Indian and when Atrenta started there was a huge difference between Indian and Silicon Valley salaries, since much reduced. Atrenta have 150 engineers there today. In fact they announced just a couple of months ago a move to new bigger premises.


    Two years ago they announced their European R&D center in Grenoble France. In fact they recently had a two-year anniversary party (complete with wine-tasting, somebody confused SpyGlass with WineGlass) that Eric Esteve went along to and blogged about here.


Where else? Well, Atrenta's HQ is in San Jose, just near the airport, and amongst all those marketing, PR and application people is another small engineering team of 15 people.


And what self-respecting global EDA company can get by these days without a presence in China? Atrenta is no exception, with another 15 or so engineers in Shanghai.

    Geographically distributed development like this can be very effective. I set it up myself when I went to Europe to open an R&D center for VLSI Technology in 1986. At that point, we hadn’t decided we would set up the R&D center in France, and so the first thing I had to do was site selection. Like Atrenta in Sri Lanka, we were the first EDA (or electronics) company to locate in Sophia Antipolis, although Texas Instruments was only about 10 miles away at Villeneuve-Loubet (that site shut down at the end of last year when TI withdrew from the cell-phone market). Subsequently the European Telecommunication Standards Institute (ETSI) located there and so every major telecoms company had an office of some size in the area.

So what are the advantages of having a remote development team? I think there is one big one and a couple of smaller ones. The big one is that it allows you to tap into a new pool of talent. Furthermore, if you are in an area without much competition, you can hire the very best. Try doing that in Silicon Valley (or Noida, for that matter). For example, Wally Rhines was telling me that they do a lot of the most mathematical optical proximity correction (OPC) work for Calibre in Cairo, Egypt. They can hire the very best mathematicians in the whole country, since there are very few other opportunities for them. Clearly Atrenta in Sri Lanka is the same, and they consider the graduates they are able to recruit to be on a par with those of the top IITs in India.

Another good thing about remote development is that engineers are more productive; there just aren't enough people on-site to spend all day in meetings with! Compared to Silicon Valley, a lot of these places are also very stable, and turnover of engineers is much lower. This means that over time these remote sites become really deep knowledge centers for certain areas of the product.


    The fixed and the finite: QoR in FPGAs

    The fixed and the finite: QoR in FPGAs
    by Don Dingee on 07-22-2013 at 1:00 pm

    There is an intriguingly amorphous term in FPGA design circles lately: Quality of Results, or QoR. Fitting a design in an FPGA is just the start – is a design optimal in real estate, throughput, power consumption, and IP reuse? Paradoxically, as FPGAs get bigger and take on bigger signal processing problems, QoR has become a larger concern.

    FPGAs started out as logic aggregators, but quickly evolved into signal processing machines because they offer a way to create and connect fast multiply accumulate blocks without most of the surrounding overhead of a general purpose machine. Data comes in, and is subjected to the same predictable operations compiled – or in modern terms, synthesized – into a machine. As FPGAs improve in clock speed, fabric throughput, and logic capability, they have outstripped other approaches to signal processing in terms of raw performance.

    It would seem at first there would be few concerns over how an algorithm fits into FPGA logic; after all, the logic is customizable, and designers control what goes where. However, that is only true to a point – FPGAs are not magic. There are specific architectural elements that support flexible, but not infinite programmability. At some point, an infinite range of design choices has to be distilled into finite blocks.

In a recent webinar, Chris Eddington, Sr. Technical Marketing Manager at Synopsys, points out that today's FPGA devices are huge and complex, with as many as 4,000 DSP blocks, each capable of around 1,000 modes of operation. While the general intent of DSP blocks is quite similar, the exact capability and programming of the blocks varies widely between FPGA vendors.

How does a designer map their algorithm to actual logic blocks in an FPGA? Are the right blocks in the right modes in the right places? How are they clocked? Are there flow control issues? How are resources like block RAM utilized? This may sound foreign to a designer used to hand-coding, but in the reality of gigantic FPGAs and third-party IP, the chances that things fall into place optimally are getting smaller. Eddington proposes the solution: a high-level synthesis tool with knowledge of multiple FPGA architectures to help.

Many developers are modeling systems today in MATLAB, which makes evaluating and tuning signal processing algorithms very productive. Eddington ran an informal poll of his audience, and a bit more than half of those who responded were already doing so. "Even if you have a high-level algorithmic design, there may be many, many choices in mapping," observes Eddington.

By using high-level synthesizable code that leverages inference of resources instead of explicit instances, synthesis tools can boost productivity and QoR. With a global view of the on-chip resources and the design, the tools can evaluate options and make informed mapping choices. In a simple example of a multiplier, Eddington showed how parameterized code is more readable, more synthesizable, and more portable all at the same time.
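A flavor of what that looks like (my own sketch, not the example from the webinar): by inferring the multiplier instead of instantiating a vendor DSP primitive, the same code can target different FPGA families, and the tool is free to pull the pipeline registers into the DSP block:

    // Parameterized, inferred multiplier with a retimeable output pipeline.
    module pipelined_mult #(
      parameter int W      = 18,  // operand width (DSP-block friendly)
      parameter int STAGES = 2    // output pipeline depth
    ) (
      input  logic                  clk,
      input  logic signed [W-1:0]   a, b,
      output logic signed [2*W-1:0] p
    );
      logic signed [2*W-1:0] pipe [STAGES];

      always_ff @(posedge clk) begin
        pipe[0] <= a * b;           // synthesis infers a DSP multiplier
        for (int i = 1; i < STAGES; i++)
          pipe[i] <= pipe[i-1];     // registers the tool can retime into the DSP
      end
      assign p = pipe[STAGES-1];
    endmodule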

Another example looks at how the MATLAB model represents data in floating or fixed point formats, which may differ from the fixed point mode of operation chosen for the FPGA DSP blocks involved. Mismatches in precision that lead to uncontrolled truncation in a pipeline can be a QoR disaster waiting to happen. One capability of Synphony Model Compiler targets this fixed point problem: RTL can be instantiated inside Simulink, and its fixed point operation simulated and verified using MATLAB scripts at the expected precision. Once verified, that exact same RTL can be synthesized into the FPGA.
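The failure mode itself is easy to reproduce. A toy sketch of my own in SystemVerilog: a Q1.15 × Q1.15 product is Q2.30, and simply slicing bits truncates toward negative infinity, while adding half an output LSB first gives round-to-nearest (overflow handling omitted for brevity):

    // Q1.15 * Q1.15 -> Q2.30 product, reduced back to 16 bits two ways.
    module fixpt_demo (
      input  logic signed [15:0] a, b,   // Q1.15 operands
      output logic signed [15:0] trunc,  // truncated result
      output logic signed [15:0] rnd     // rounded result
    );
      logic signed [31:0] full, biased;

      assign full   = a * b;               // Q2.30 full-precision product
      assign trunc  = full[30:15];         // uncontrolled truncation
      assign biased = full + 32'sd16384;   // add 0.5 output LSB (1 << 14)
      assign rnd    = biased[30:15];       // round to nearest
    endmodule

One LSB of bias per stage sounds harmless, but across a deep pipeline the accumulated error is exactly the kind of precision drift the tool flow is meant to catch before synthesis.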

    Eddington goes on to talk about the uses of cycle-accurate C models for simulation, offering as much as a 40x speedup over RTL simulation, and what to look for in high-level IP to help reduce issues. For example, he has a good discussion on using flow control, which helps with mapping storage requirements to FPGA block memory. He also brings in an example of a parallel FFT and how the flow goes from model to verification.

    View the entire Synopsys webinar:
    High-Throughput FPGA Signal Processing: Trends, Tips & Tricks for QoR and Design Reuse

    While the examples invoke the Synopsys tool chain and some of its unique capability, the webinar is very worthwhile in pointing out the general steps to avoid QoR trouble in larger FPGA designs. Experienced designers may think they have good reasons to hand-code some FPGA resources, but as designs get bigger and faster and IP is being reused more, the case for FPGA high-level modeling and synthesis grows stronger.
