
Debugging Complex Embedded System – How Easy?

by Pawan Fangaria on 11-08-2013 at 9:00 am

In today’s world of semiconductor design, with SoCs combining complex IPs, hardware and software on a single chip, it’s hard to imagine a system without embedded software in it. But it is easy to guess how difficult testing such a hardware-software embedded system must be, and often there is only a limited window of time in which to test it. Recalling my early days of software development, when I used to test every function bottom-up with printf() statements, it is unimaginable that testing these systems that way would ever complete. Today, on-chip debug logic is built into the embedded processor and testing is done smartly in real time, on the fly.

On-chip debug logic does allow run-control debugging by setting breakpoints, but that approach is intrusive: it stops the system at those points to examine its state. This is fine for small, simple systems, but it is not suitable for complex multi-processing systems, or for systems that require the embedded processor to run continuously, control feedback loops and maintain mechanical stability. It cannot detect problems at full clock speed and provides limited insight into asynchronous interrupt handlers.

The other approach to on-chip debugging uses a ‘real-time trace’ mechanism, which is much more advanced and non-intrusive. Cycle-accurate timing information about program execution and data accesses, and details about the surroundings (forward and backward) of specific events, can be obtained with single or multiple cores running together. This approach has proved significant and is widely accepted in safety- and performance-critical applications such as aerospace, automotive, medical, banking and mobile communications.

So, how does ‘real-time trace’ work? Traces of all instructions executed by the processor are captured in real time in a buffer (along with data as required) to be analysed later. It is possible to analyse trigger conditions as well. It provides deep insight into the system code without any effect on CPU performance, and enables engineers to figure out what exactly brought the system to the state or condition it is in.
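The idea can be sketched in a few lines of Python (a toy model for illustration only, nothing like Mentor’s or Ashling’s actual tooling): executed instructions stream into a fixed-size circular buffer, old entries are silently overwritten, and after a fault you walk backwards from the event to see what led up to it.

```python
from collections import deque

class TraceBuffer:
    """Toy model of an on-chip real-time trace buffer.

    Executed instructions are appended non-intrusively; when the
    buffer is full the oldest entries are discarded, so it always
    holds the most recent window of execution.
    """
    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)

    def capture(self, timestamp_ns, pc, instruction):
        self.entries.append((timestamp_ns, pc, instruction))

    def backtrace(self, n):
        """Return the last n captured entries, newest first."""
        return list(self.entries)[-n:][::-1]

# Simulate a program run that ends in a fault.
buf = TraceBuffer(capacity=4)
for i, (pc, ins) in enumerate([(0x100, "ldr"), (0x104, "add"),
                               (0x108, "str"), (0x10C, "ldr"),
                               (0x110, "udf")]):  # udf = faulting instruction
    buf.capture(timestamp_ns=i * 5, pc=pc, instruction=ins)

# The oldest entry (0x100) has been overwritten; the fault and its
# immediate history remain available for post-mortem analysis.
print(buf.backtrace(2))   # → [(20, 272, 'udf'), (15, 268, 'ldr')]
```

The point of the sketch is the non-intrusiveness: capturing never stalls the "processor", and analysis happens entirely after the fact.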


[Working of real-time trace with trace buffers]

Mentor Graphics recommends the Vitra-XD trace probe, developed by Ashling Microsystems, for real-time trace debug. It is clear from the above picture that the combination of Mentor’s Sourcery CodeBench and Vitra-XD makes debugging much faster than a system with a normal on-probe trace buffer of limited capacity. Vitra-XD provides a large buffer (supplemented by an on-device 500 GB hard drive) for storing a vast amount of trace data, thus increasing the probability of capturing and identifying most problems. It has a 12.8 GB/sec, 16-bit parallel trace data capture rate and a timestamp generator which can stamp the captured data at a resolution of 5 ns.

As real-time trace is non-intrusive, it provides accurate timing information, which helps in investigating timing bugs that may not be caught by other methods. This method also works exceptionally well for catching hard-to-detect intermittent problems, or cases where the system reaches an unexpected faulty state. Similarly, it is handy to trace back from the point of a data corruption or processor exception to analyse what exactly caused it.

In a modern world where most applications are becoming performance critical, it has become essential to use such debug systems in semiconductor design and SoC development. Most ARM processors include an Embedded Trace Macrocell (ETM) for this purpose. Mentor Graphics and Ashling Microsystems have rightly identified this opportunity to provide a robust, state-of-the-art system to trace, inspect and debug large amounts of real-time data. There is a whitepaper at Mentor’s website which provides a lot of detail about real-time trace, how it works and several of its advantages. It’s worthwhile to read it and gain the advantage!

More Articles by Pawan Fangaria…..



Data Management in Russia

by Paul McLellan on 11-07-2013 at 5:06 pm

Milandr is a company based in Moscow that makes high-reliability semiconductor components for the aerospace, automotive and consumer markets, primarily in Russia. They work with multiple foundries, including X-FAB and TSMC, in technologies from 1um down to 65nm. The corporate headquarters and main IC design house are located in Zelenograd, the Russian silicon valley. R&D centers are in St. Petersburg and Nizhny Novgorod.

Over in Semiconductor Engineering there is an interesting article about how their data management got out of hand and how they got it back under control. With over 150 engineers spread across multiple sites, they certainly had the opportunity for plenty of miscommunication: to reliably use each other’s blocks, engineers would have to create custom scripts and communicate extensively to understand which updates were made and when. More importantly, operational complexity was so high that new users took a long time to come up to speed when trying to locate the data they needed. Problems included use of wrong configurations, mismatches between schematics and their layouts, and verification teams testing outdated versions.

They evaluated the data management solutions out there and picked ClioSoft’s SOS. They were already using Cadence’s Virtuoso layout environment, and SOS integrates cleanly into it, so that most of the time engineers do not need to go to other windows and do explicit data management.


ClioSoft also provides a visual design diff (VDD) which allows designers to compare two schematics or layouts and see the differences highlighted. For example, it is easy to compare two versions of a file and see what change someone else made (or what you changed yourself).

The ultimate proof is how happy management are with their decision: Milandr strongly believes that deployment of the SOS tool and methodology has been one of the critical success factors that has allowed them to scale their design activities to more than 150 engineers across multiple design centers in Russia.

The full article, which contains a lot more detail about day to day use and the initial adoption, is here.

Also Read

Managing Multi-site Design at LBNL

Analog ECOs and Design Reviews: How to Do Them Better

ClioSoft at GenApSys


Xilinx and TSMC: Volume Production of 3D Parts

by Paul McLellan on 11-07-2013 at 1:23 pm

A couple of weeks ago, Xilinx and TSMC announced the production release of the Virtex-7 HT family, the industry’s first heterogeneous 3D ICs in production. With this milestone, all Xilinx 28nm 3D IC families are now in volume production. These 28nm devices were developed on TSMC’s Chip-on-Wafer-on-Substrate (CoWoS) 3D IC process.

A couple of years ago Xilinx announced what was then the first production part using a 2.5D silicon interposer. The part consisted of four rectangular die on a square silicon interposer. The die were regular FPGA die, flipped onto the interposer. The interposer had routing and through-silicon vias (TSVs) to power and connect the four die. It was a huge part. At a 3D symposium in 2011, eSilicon ran the publicly available information about this part through their cost model and reckoned yield would go from 25% for one huge square die to 75% with the smaller rectangular die. Even with the cost of the interposer, eSilicon reckoned the cost saving to be around 50%. This was a very high-end Xilinx part, with a list price apparently in the tens of thousands of dollars, so even though it was in production the volumes would have been very limited. Plus, it was not a heterogeneous 3D design: all four slices were identical. I am sure Xilinx made this part not just because it was cost-effective but as a pipe-cleaner for interposer-based FPGAs, and I’m sure they learned a lot.

The significance of 3D is twofold. Firstly, it allows Xilinx to manufacture FPGAs that are too big to build monolithically. Remember that as die size increases, you get fewer die per wafer, and the chance of an entire FPGA avoiding a critical defect decreases exponentially. Going 3D doesn’t avoid the first cost: the die-per-wafer count drops whether you build multiple small die or one big one (actually that is not quite true; there are probably some improvements around the edge of the wafer, where a little less of the silicon out there is wasted). However, the second cost can make a big difference, with the smaller die yielding very much higher than a large one that might barely yield at all. For some applications, such as SoC prototyping, having one very big array is a big advantage over multiple smaller arrays, since it avoids having to partition the SoC design and worry about the very different timing of paths between arrays compared to on-array paths.
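The yield argument can be made concrete with the classic Poisson yield model, where the probability of a die having zero critical defects is exp(−D·A) for defect density D and die area A. The numbers below are illustrative assumptions chosen to land near the 25%/75% figures eSilicon quoted; they are not eSilicon’s actual model.

```python
import math

def poisson_yield(defect_density_per_cm2, die_area_cm2):
    """Classic Poisson yield model: P(zero critical defects on the die)."""
    return math.exp(-defect_density_per_cm2 * die_area_cm2)

D = 0.5          # defects/cm^2 (illustrative assumption)
big_area = 2.8   # one monolithic die, cm^2 (illustrative assumption)
small_area = big_area / 4   # one of four slices on an interposer

y_big = poisson_yield(D, big_area)
y_small = poisson_yield(D, small_area)

print(f"monolithic die yield: {y_big:.0%}")   # ~25%
print(f"per-slice yield:      {y_small:.0%}") # ~70%
```

Note that all four slices must be good to build one part, but since each slice can be tested before assembly (known-good-die), a defect costs one small slice rather than the whole huge die.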


The other significance is that by mixing die (and probably processes), Xilinx can make parts like the Virtex-7 H580T, which in addition to the array contains sixteen 28 Gbps transceivers. That means you can drive a 100 Gbps optical module using only four transceivers, compared to the ten 10 Gbps transceivers you would need on a normal (non-heterogeneous-3D) FPGA. That is what makes these parts the first heterogeneous 3D designs in volume production.

The Xilinx press release with more details is here.


More articles by Paul McLellan…


TSMC on Semiconductor IP Quality

by Daniel Nenni on 11-07-2013 at 9:00 am

It is important to note that the System on Chip (SoC) revolution currently driving mobile electronics has one very important enabling technology: Semiconductor Intellectual Property. Where would we be without the commercial IP market segment? Computers and phones would still be on our desks, for one thing, and our appliances certainly would not be talking to us. Semiconductor IP (soft cores, hard cores, foundation IP, interface IP, etc.) not only reduces the cost and time-to-market of SoCs, it also dramatically raises the innovation bar via competition.

Don’t ever forget that TSMC is in the business of selling wafers and IP is a key enabler which is why TSMC spends an incredible amount of time and money on IP quality. A bad IP block can delay or even kill a wafer sale, right? Dan Kochpatcharin, Deputy Director, IP Portfolio Management at TSMC, presented some interesting IP Quality data at the Semico Impact Conference this week. You can find the slides HERE.

“Nobody in this room makes money until the IP comes together on a piece of silicon in production. TSMC’s role is to help host the ecosystem for interoperability, quality, and availability, so that our customers can deliver products. That is what OIP and the TSMC9000 IP quality program is all about.”

The TSMC9000 program consists of a set of rigorous quality requirements for IP designed for TSMC process technologies. Members of the TSMC “Grand Alliance” submit reports and test chip results. This data is available online, enabling customers to better judge the quality and risk level of the IP before integrating it into their design. Taking IP quality a step further, TSMC has an IP Validation Center (staffed by 30+ TSMC employees) which audits silicon test chip results. This effort is all about trust, and is clearly in support of the TSMC credo:

Our mission is to be the trusted technology and capacity provider of the global logic IC industry for years to come.

If you look at the IP usage trends over the last five process nodes (65nm, 40nm, 28nm, 20nm, 16nm), the number of unique IPs per tape-out is increasing while the ability to re-use IP across nodes is dropping. And thanks to the ultracompetitive mobile market, with new products coming at us every day, design cycles are incredibly short and complex. Do you really want to spend your precious time and resources qualifying IP? Even worse, do you want to risk integrating into your SoC one of the many pieces of IP that does NOT pass TSMC9000 scrutiny?

The fabless semiconductor ecosystem started with TSMC more than 25 years ago, and today it is a force of nature that no one company can control. Hundreds of companies, thousands of products, hundreds of thousands of people, and more than a trillion dollars in yearly investment drive this ecosystem, and there is absolutely no stopping it. You will be able to read more about IP and the fabless ecosystem in the soon-to-be-bestselling book “Fabless: The Transition of the Semiconductor Industry”, which can be previewed HERE.

More Articles by Daniel Nenni…..



Start With The End In Mind – For Complete & Fast Success!

by Pawan Fangaria on 11-07-2013 at 6:00 am

There is always a rush to converge a semiconductor design toward faster closure, amid the increasingly divergent trends of multiple IPs and the high complexity of the various functionalities on a single chip. Every design house struggles to evolve its customized design flows, with several short paths patched in to fix issues, global or local, at each stage of the flow. That becomes severe at the final layout stage, which acts like a point of no return and warns: fix what you can, here and now, or you are bound to lose the market window. Often one needs to compromise on area and/or performance. Not a pleasant situation!

I wanted to understand how SpyGlass deals with layout situations early, at RTL. What I found is this: start at RTL with the layout in mind! I was delighted to have the opportunity to talk to Sanjiv Mathur, Sr. Director at Atrenta’s Noida site. I have known Sanjiv since my Cadence days, when he was in my Physical Design team. He used to provide innovative ideas in the physical design domain, so my guess was right: he is part of a team which has done something really great in SpyGlass Physical. Here is our talk:

Q: Sanjiv, you have been associated with multiple facets of SpyGlass Physical, what are those you are currently working with?

SpyGlass Physical Base functionality concentrates on the structural aspects of the design. It works at RTL to determine a design construction which can lead to an optimal distribution at the layout level, thus reducing long iterations. It takes into account various rules at the pre-floorplan level, satisfies them, and then provides RTL signoff to post-floorplan processes.

Q: What are the pre-floorplan rule checks?

These rules are broadly classified into congestion, timing and area rules, and there are critical structures that violate most of them. Critical structures need special attention to make them free of these issues. For example, a large mux, high fan-in and fan-out cones at flip-flops, or high cell-pin density in a module can cause congestion, and these can easily be detected at RTL through the use of rules. Similarly, a long depth of logic levels can lead to timing violations. So, there are specific rules used to capture these issues at RTL, which are then resolved by employing special techniques. For example, pipelining is used to reduce a high fan-out cone and splitting is used to reduce a high fan-in cone.


[Physical rules employed at RTL]
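As a toy illustration of the kind of structural rule described above (hypothetical Python over a made-up netlist, not SpyGlass’s proprietary implementation): represent the RTL netlist as simple driver-to-load and cell-to-input maps, then flag anything exceeding a fan-out or fan-in threshold.

```python
# Toy RTL structural check: flag high fan-out nets and high fan-in cells.
# The netlist, cell names and limits below are all illustrative.

netlist = {
    # driver net -> loads driven by that net
    "clk_gate_en": ["ff0", "ff1", "ff2", "ff3", "ff4", "ff5"],
    "data_valid":  ["ff6"],
    "wide_mux_sel": ["mux0"],
}
cell_inputs = {
    # cell -> input nets feeding it
    "mux0": [f"in{i}" for i in range(64)],   # a 64:1 mux: fan-in hotspot
    "ff0":  ["clk_gate_en", "d0"],
}

FANOUT_LIMIT, FANIN_LIMIT = 4, 16

fanout_violations = [net for net, loads in netlist.items()
                     if len(loads) > FANOUT_LIMIT]
fanin_violations = [cell for cell, ins in cell_inputs.items()
                    if len(ins) > FANIN_LIMIT]

print("high fan-out nets:", fanout_violations)   # → ['clk_gate_en']
print("high fan-in cells:", fanin_violations)    # → ['mux0']
```

A real tool would then go on to suggest the fixes mentioned above: pipelining or buffering the high fan-out net, and splitting the wide mux.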

Q: Do physical rules follow any standard language for their description?

No, these are SpyGlass-specific, built into the tool. A designer can enter them through graphical entry, or write them as text into a project file (prj file), which holds all kinds of descriptions such as HDL (Hardware Description Language), SDC (Synopsys Design Constraints), technology information and the rules. For analyzing these rules, cross-probing between RTL and schematic is provided.

Q: This looks like a novel idea to resolve issues at RTL. What is your experience on the kind of change in the design that takes place after RTL handoff?

Yes, we have a patent pending on this unique way of estimating congestion based on RTL structures. The change in design is quite appreciable. Below is an example of a real design before and after RTL modification.

Q: Great!! These rules and methods seem to be good to detect and resolve gross structural violations. What do you do when issues come after RTL handoff?

Yes, that is where SpyGlass Physical Advanced comes into the picture; it provides handoff-quality design by signing off post-floorplan metrics. In an SoC design flow (which can have multiple complex IPs and hard macros along with other design elements), it does partitioning and data flow analysis, divides the design into multiple physical units (PUs), and partitions the bus fabric to cater to each PU such that there are minimal logic and clock crossings and timing closes. By employing floorplanning, standard-cell placement and global routing, it can provide physical congestion scores, thus optimizing area and satisfying timing closure. It also provides physical guidance, such as channel planning and estimation, to improve the efficiency of downstream implementation tools.

Q: O.K., that’s a good differentiation between Base and Advanced. How do you position them as separate offerings?

The best combination is to use SpyGlass Physical Base and Advanced together, which can provide an optimized floorplan in the form of DEF (Design Exchange Format) or physical constraints. However, for small designs, customers can use just SpyGlass Physical Base for RTL signoff and then do the rest of the physical implementation on their own. For SoC and complex IP designs, SpyGlass Physical Advanced is the ultimate choice, since predicting design closure before entering the implementation flow is becoming essential with shrinking technology nodes. It makes the downstream flow smoother by reducing iterations and optimizing performance and area.

Q: SpyGlass Physical appears to be a great set of tools. Who is using it?

It’s integrated into TSMC’s IP Kit 2.0. Major design and IP companies and foundries in the US, Europe and Japan are using SpyGlass Physical.

It was a great eye-opening session with Sanjiv. It reminded me of two of my patents from my earlier job, about five years ago, which at that time tried to predict layout congestion at the floorplanning stage based on block placement and pin assignment in the floorplan. However, that approach lacked the broader, global context available at the RTL level, and it had no pre-floorplan consideration. The SpyGlass approach is much superior: it starts design refinement to reduce layout congestion right from the beginning, at the RTL level.

More Articles by Pawan Fangaria…..



Dassault Patent on Hierarchy Management

by Paul McLellan on 11-05-2013 at 5:05 pm

Dassault have recently been granted a patent on their approach to managing design hierarchy. I asked them how long it took from filing the patent until it was granted and they said the whole process had taken 8 years. It is a bit of an indictment of the patent system when it takes 8 years, also known as 4 or 5 process nodes, for a patent to issue in an industry as fast-moving as semiconductor design. Even in Dassault’s slower moving main business of mechanical design, 8 years is a long time. That’s longer than the 787 took to create from when Boeing decided to start until delivery started a couple of years ago. In electronics, it is so far back that it predates the iPhone, which launched the smartphone era and the transition from PCs to mobile.

The patent is 8,521,736. As with all patents, it is pretty unreadable. There are only so many times you can read about a “plurality of modules” or “a machine-readable storage device” (aka a file) before your eyes glaze over. Despite the official rationale that a patent is a limited monopoly in return for disclosing how to do something, I’ve never come across anyone in the software world who has used a patent as a basis for implementation. In fact, lawyers at companies where I’ve worked have told us not to look at patents, so we can’t accidentally be accused of “willful” infringement, which carries punitive damages.

So what is Dassault’s patent all about? Static and dynamic selectors are maintained on each module version that has a hierarchical reference to another module. Static selectors are fixed as each module is checked in, but dynamic selectors point to the latest version unless specifically tagged. So the static selectors give a consistent view of the entire design as checked in, but the dynamic view allows the latest versions of everything to be examined (or manually can pick a mixture of different modules at different stages).


Since modules can be tagged arbitrarily, it is possible to use the dynamic configuration to pick up specific, known-stable versions such as “GOLD” or “TAPEOUT”. This means it is possible to easily control whether to have a fully static design, a fully dynamic one or, usually, some sort of mixture.
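The selector mechanism can be sketched in miniature (illustrative Python only, not Dassault’s actual implementation): each module keeps numbered versions, a selector is either pinned to a version (static), follows the newest check-in (dynamic), or resolves through a tag.

```python
# Toy model of static vs. dynamic version selectors on a design hierarchy.
# Module names, versions and tags below are all hypothetical.

versions = {
    # module -> {version number: payload}; higher number = newer check-in
    "cpu":   {1: "cpu_v1", 2: "cpu_v2", 3: "cpu_v3"},
    "cache": {1: "cache_v1", 2: "cache_v2"},
}
tags = {"cpu": {"GOLD": 2}, "cache": {"GOLD": 1}}

def resolve(module, selector):
    """selector: an int (static, pinned at check-in), 'LATEST' (dynamic),
    or a tag name such as 'GOLD'."""
    if selector == "LATEST":
        return versions[module][max(versions[module])]
    if isinstance(selector, str):               # a tag
        return versions[module][tags[module][selector]]
    return versions[module][selector]           # a pinned version

# A top-level configuration mixing dynamic and tagged selectors:
config = {"cpu": "LATEST", "cache": "GOLD"}
print({m: resolve(m, s) for m, s in config.items()})
# → {'cpu': 'cpu_v3', 'cache': 'cache_v1'}
```

The same resolve step, applied recursively down a hierarchy, gives the mixture described above: the latest work-in-progress for some blocks, frozen known-good versions for others.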

In a small example, this capability doesn’t seem all that useful, but it comes into its own when configuring modern SoC designs, which may contain literally millions of files developed by geographically dispersed teams. Tags can be used to specify design locations, particular IP (e.g. blocks in an ARM microprocessor), phases of the design, or anything else that will turn out to be useful when trying to pull a particular configuration together (all the latest stuff in San Jose with the stable version from Bangalore, for example).

This capability is used within DesignSync for electronic design. Although mechanical design also uses hierarchy, it is much simpler than electronics, because a lot of it is just black boxes that cannot be pushed down into, unlike a chip design, where ultimately everything is a polygon on a mask or it doesn’t exist. When designing a plane, you can’t just push down into the engine internals.

More articles by Paul McLellan…


Can Intel Compete in the IoT?

by Daniel Nenni on 11-05-2013 at 5:00 pm

Kevin Ashton, a British technology pioneer, is credited with coining the term “The Internet of Things” to describe an ecosystem where the Internet is connected to the physical world via ubiquitous sensors. Simply stated: rather than humans creating content for the internet, IoT devices create the content. To be clear, this does not include PCs, smartphones, smart TVs or wearable electronics. Think everyday things like thermostats, appliances, parking meters and medical devices enabling physical-to-digital communication via the internet. Today there are an estimated 2B IoT devices in play, and that number is expected to grow exponentially in the coming years, so yes, this is a big deal.

The question I have is this: Does Intel have a chance here or will ARM and the fabless semiconductor ecosystem continue to dominate the IoT market?

The annual ARM user gathering was last month, and IoT was a major focus. You can read about the ARM and the Internet of Things keynote and visit the ARM TechCon website for more information. My agenda at the conference was gathering 14nm silicon data, but I attended the IoT presentations as well, and that led me to where I am today, at the IEEE IoT workshop.

“The great promise of the Internet of Things is about the transformation of the world based on the convergence of numerous disjointed systems into a fully connected environment where complex tasks are synchronized and performed by a unified platform,” said Oleg Logvinov, member of the IEEE-SA Standards Board, member of the IEEE-SA Corporate Advisory Group, and director of market development, Industrial and Power Conversion Division with STMicroelectronics. “During the workshop in Silicon Valley, we will explore how various technologies can be applied across multiple verticals and how convergence is fueling IoT’s endless potential and opportunities.”

I also attended the IDF 2013 Forum last September, where Intel announced their IoT contender, Quark. For you Star Trek fans, Quark was the beloved con man pictured above. For Intel, Quark is a synthesizable core based on the 486 instruction set, which they claim uses 1/10th the power of Atom and is 1/5th the size. At the time this was just slides with little technical data, but details are now starting to emerge. The first Quark will be manufactured on a 32nm SoC process. The main problem I see here is that Intel’s 32nm is HKMG, which is neither cost- nor power-optimized and will compete unfavorably with TSMC’s 28nm poly/SiON, but I digress…. Let’s get back to business.


The IoT value proposition is similar to mobile, with low power and cost being the primary drivers. Business models and ecosystems are also going to be determining factors. Do you even know what silicon is inside your mobile devices? I do, but most people don’t. Do you even care? I do, but again, you probably don’t. Is IoT going to be any different? Absolutely not, so say goodbye to the old-school benchmarks and transistor one-upmanship.

Also read: Intel Quark: Synthesizable Core but you can’t have it

The first questions during the IDF Q&A were about Quark and the Intel business model. By definition, a synthesizable core can be licensed and customized by the customer. ARM takes this to a deeper level by licensing the architecture and instruction set, so customers have complete control over implementation. So the first question to Intel CEO Brian K. was: Will Intel license the Quark cores? The answer was “No”. Can Quark be manufactured outside of Intel? No. Can customers synthesize Quark? No. Can Intel be successful in the IoT market with their current Quark business model? No (my incredibly biased opinion). Fortunately, business models can change faster than technology, so Intel still has a chance with IoT and Quark, but they had better hurry.

More Articles by Daniel Nenni…..



nVidia: Virtual Platform/Emulation Hybrid

by Paul McLellan on 11-05-2013 at 11:57 am

I was the VP of marketing at VaST Systems Technology and then at Virtutech. Both companies sold virtual platform technology, which consisted of two parts:

  • an extremely fast processor emulation technology that worked by doing a binary translation of the target binary code (e.g. ARM) into the native instruction set of the server on which the emulation was running (usually an x86 workstation of some sort). This was done dynamically, as the code executed, in the same way that JIT compilers for Java (and other bytecodes) work
  • a modeling technology for other devices in the system (or on the chip if it was an SoC design) along with libraries of pre-existing models
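The binary-translation trick can be sketched in miniature (a toy in Python, nothing like a production translation engine): each basic block of "target" instructions is translated once into a native callable, cached, and thereafter executed at native speed instead of being re-interpreted.

```python
# Toy sketch of dynamic binary translation. The instruction set and
# register names are invented for illustration.

def translate(block):
    """Translate a basic block of toy target instructions into one
    native Python function (our stand-in for host machine code)."""
    ops = []
    for op, *args in block:
        if op == "movi":                       # movi reg, imm
            reg, imm = args
            ops.append(lambda r, reg=reg, imm=imm: r.__setitem__(reg, imm))
        elif op == "add":                      # add dst, a, b
            dst, a, b = args
            ops.append(lambda r, dst=dst, a=a, b=b:
                       r.__setitem__(dst, r[a] + r[b]))
    def native(regs):
        for f in ops:
            f(regs)
    return native

cache = {}   # block address -> translated "native code"

def run_block(addr, block, regs):
    if addr not in cache:          # translate on first execution only
        cache[addr] = translate(block)
    cache[addr](regs)

regs = {"r0": 0, "r1": 0, "r2": 0}
run_block(0x1000, [("movi", "r0", 2), ("movi", "r1", 3),
                   ("add", "r2", "r0", "r1")], regs)
print(regs["r2"])   # → 5
```

The cache is what makes this fast in practice: hot code (an OS boot loop, say) is translated once and then runs repeatedly at near-host speed.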

Both companies were moderately successful at selling their technology, VaST especially in automotive and Virtutech especially in networking and base stations. VaST ended up being acquired by Synopsys, and Virtutech by the Wind River division of Intel.

But the big hurdle to using the technology was the need to create models for all the “other” devices. Everyone loved the processor technology and found its performance unbelievable. But if building those models took 3 of the 6 months you were going to save, the ROI was a lot less compelling. Plus, blocks were always changing, so there was always the problem of ensuring that the models matched the RTL (or whatever representation was being used). One answer would have been to use emulation and just run the RTL fast. But in the mid-2000s, emulators were million-dollar boxes used at only a handful of the biggest semiconductor companies, certainly not available to embedded software developers. The economics have changed, though. By some estimates, emulation is now the cheapest way to simulate per cycle, beating even Verilog simulators on big server farms, which had been the simulation infrastructure of choice for a long time.


At ARM TechCon last week, nVidia reported on how they had used Cadence’s hybrid virtual platform and emulation system to bring up their latest Tegra chips.

You might think that with a big emulator lying around, you just load the RTL for the ARM processor into the box too. But the problem is that real software development requires first booting an operating system before you get to run the code you are really working on. Linux takes 1B instructions to boot, Android 20B and Windows RT 50B. Booting Windows would take days on the emulator. Instead, the fast processor models are used, with the rest of the chip loaded into the emulator. Obviously there is some clever glue making all this work cleanly together as the Palladium/VSP Hybrid.
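A quick back-of-the-envelope makes the scale obvious. The effective instruction rates below are inferred from the boot times reported later, not published figures, so treat them as rough assumptions:

```python
# Rough boot-time estimates. EMU_IPS and HYBRID_IPS are inferred
# assumptions, chosen to roughly match the reported boot times.
EMU_IPS    = 4e5   # emulator alone: ~0.4 MIPS effective
HYBRID_IPS = 8e6   # hybrid (fast binary-translated CPU model): ~8 MIPS

boot_instructions = {"Linux": 1e9, "Android": 20e9, "Windows RT": 50e9}

results = {}
for os_name, n in boot_instructions.items():
    emu_minutes = n / EMU_IPS / 60
    hybrid_minutes = n / HYBRID_IPS / 60
    results[os_name] = (emu_minutes, hybrid_minutes)
    print(f"{os_name:10s}: emulator ~{emu_minutes:6.0f} min, "
          f"hybrid ~{hybrid_minutes:4.0f} min")
```

At these assumed rates, Windows RT alone takes well over a day on the emulator, which is why nobody bothered to try.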


Results? Linux boots in 2 minutes versus 45 minutes using just the emulator. Android in 45 minutes versus hours. Windows RT in 90 minutes versus days (nobody knows how many days, since nobody bothered to try).

Ultimately, though, what is important is whether all this made any difference to the design. Here is nVidia’s experience:

  • eliminated reliance on other pre-silicon platforms
  • found some software race conditions
  • found some memory management bugs
  • found some code completeness issues

Then silicon came back from the fab. Of course system bringup requires running the software on the actual silicon:

  • this was the smoothest bringup they had done
  • software was ready to demo product at Speed-Of-Light (SOL)
  • fewer bugs meant they could put more effort into optimizing for power and performance.

The nVidia presentation is on the Cadence website here.


More articles by Paul McLellan…


Synopsys Creates a High-performance ARC Core

by Paul McLellan on 11-05-2013 at 10:00 am

ARC is a family of configurable processors. Originally it was a standalone company in the UK (what is it with the UK and processor cores?) spun out of Argonaut Software; the A in ARC originally stood for Argonaut. ARC International was acquired by Virage, and then Virage was acquired by Synopsys, so ARC is now part of Synopsys’ DesignWare IP offering.

The family of cores has always been focused on low power and configurability, but there has not been a high-performance core in the offering. Today Synopsys announced a new generation of high-speed processors, following a sneak preview at the Linley Microprocessor Conference a couple of weeks ago:

  • advanced ARCv2 architecture
  • 18% improvement in code density
  • Real-time and high-end embedded focus
  • >3000 DMIPS per core at under 60mW, 0.15mm²
  • power efficient 10-stage scalar pipeline
  • out of order execution
  • branch prediction
  • late-stage ALU to improve throughput
  • 64-bit loads and stores to move data
  • 64-bit multiply and multiply-accumulate
  • hardware integer divider (4 to 19 cycles)
  • IEEE 754 compliant FPU with single/double precision
  • ECC protection for all memories in processor
  • I/O coherency for DMA and peripherals

From an implementation point of view, the new core makes less severe demands on the types of memories required on the chip. Only single-port SRAMs are required, even for the branch prediction cache. Two pipeline stages are dedicated to accessing CCMs and caches.


Compared to older ARC cores such as the ARC700, there is a lot more throughput per megahertz, with 50% higher CoreMark per megahertz. The maximum operating frequency is up from 1.1GHz to 1.6GHz. Just as importantly, it can run at clock rates as low as 2MHz, so it offers lots of scope for dynamic frequency scaling to save power.

The core is optimized for the high-end embedded market: automotive driver assistance, solid-state drives, digital TV, home networking and so on. Many of these have strong real-time requirements, so to meet them it has:

  • Single-cycle peripheral and memory access
  • Fast context switch with a second register file
  • Configurable to hit the sweet spot for performance vs power
  • Custom instructions (as all ARC processors have always had)
  • Robust interrupt architecture with up to 240 interrupts, 16 levels of priority, auto save and restore
  • Optional ECC hardware on all memories (to correct single-event upsets etc)

Of course there is a full development tool chain. ARChitect optimizes and configures the processor. There is a range of compilers. ARC works with virtual prototypes. Out of the box it has support for Linux and Android.

More details on ARC processors are on Synopsys’ website here.


GlobalFoundries and ARM

by Paul McLellan on 11-04-2013 at 4:56 pm

GlobalFoundries had several interesting things at the ARM TechCon last week. Firstly, GlobalFoundries won the Best in Show award in the chip-design category, recognizing the best-in-class technologies introduced since the last TechCon.

Earlier in the summer GlobalFoundries and ARM announced the ARM Cortex-A12 processor, for which GlobalFoundries was the foundry launch partner. The A12 is expected to be a very high volume processor since it is targeted at the low end of the smartphone market (which cannot afford a Cortex-A57 class processor). The low end of the smartphone market is expected to be the fastest growing going forward, the high end already being largely saturated.

Donar is the name GlobalFoundries uses for the family of A12 test chips. They are based on Semper, an older family of A9 test chips taped out several times in 28nm and 20nm. They were created using a Cadence flow, and in a joint presentation GF and Cadence gave details on what was done. The design was a quad-core A12 and will tape out imminently in GF's 28nm-SLP process. It was the first experience with the A12 for the entire project team of ARM, Cadence and GF.


The work was split up as follows. ARM developed the Cortex-A12, optimized POP components and the initial reference methodology. Cadence supplied the RTL to GDS2 implementation flow, methodology and tool support. GlobalFoundries supplied the Donar test chip design, 28nm-SLP MPW, full Cadence design flow enablement, development of a set of fast cache instances and design resources. The whole design was done on a very tight schedule between May and October.

Another interesting design that GF was showing in the exhibit hall is a 2.5D interposer-based design jointly created with OpenSilicon. The design is called Avatar and consists of two ARM-based die in 28nm on a 65nm silicon interposer. This was a pipe-cleaner design to shake out problems with this sort of design, rather than anything expected to enter volume production.

Since there were no acceptable I/Os for this sort of design, OpenSilicon developed specialized die-to-die I/Os (which, of course, are now available for other designs). The problem with "normal" I/Os in this application is that the ESD requirements for full-chip I/Os are much too high, the drive strength is too high since they are designed to drive a PCB trace, and the I/O needs to be small enough to fit under the microbump pitch.


The interposer has 4 front-side and 1 back-side layers of metal and through-silicon-vias (TSVs). The two die are assembled on the interposer. A lot of additional testing needs to be done on the die at wafer sort compared to a normal assembly because of what is called the “known good die” issue. If a faulty die slips through then not only does that bad die get discarded, an interposer and a second good die are also wasted.
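The cost impact of the known-good-die issue can be sketched with simple yield arithmetic; the 95% yield figure below is illustrative, not a GF number:

```python
# Known-good-die arithmetic: an assembled part only works if both die
# are good, so per-die escapes multiply. The 95% figure is illustrative.
die_yield = 0.95

# Without extra screening at wafer sort, the chance a 2-die assembly
# works is the product of the individual die yields:
assembly_yield = die_yield ** 2
print(assembly_yield)  # ~0.9025, so roughly 10% of assemblies are scrap

# Each failed assembly throws away the bad die, one good die and an
# interposer, which is why the additional wafer-sort testing pays off.
```

The effect compounds with more die per interposer, so aggressive known-good-die screening becomes more important as stacks get larger.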

2.5D interposer-based designs allow different technologies to be mixed in the same design (although that was not done with Avatar): an SoC with very wide memory, an SoC with analog and high-speed interfaces, or even an SoC and an FPGA. As we move below 20nm it is hard to put analog on the same die, so using a mature process optimized for analog design and then putting two or more die on an interposer is an attractive solution.

Watch a video about GlobalFoundries 28nm here.

More articles by Paul McLellan…