
Why ARM Enabling Easy Access to Customized SoCs Matters

by Tom Simon on 10-14-2015 at 7:00 am

The introduction of the Arduino heralded the huge growth and interest in MCU based designs by people who could never before easily put together the hardware and software system required for implementation of their ideas. I remember the first time I saw the Arduino in use. I was at a talk on how a system for controlling propane jet solenoids for an art project had been put together in a matter of a few days on site at an art installation. Seeing how easily the presenter was able to connect the devices to a user interface and write the firmware got my immediate attention. This was around 2008.

The Arduino project has grown from a few open source developers into major initiatives at companies like ARM, Freescale, ST, Atmel and many others. Arduinos have moved from Atmel AVR-based processors to ARM-based processors, very often using the Cortex-M0. Other open platforms have evolved too, for instance the ARM mbed initiative. What all this has done is enable a huge wave of innovation, allowing people to get their ideas implemented quickly and at low cost, lowering the barriers to product development.

Unfortunately, up until now, people who wanted to go to the next level and get their products into custom SoCs were out of luck. Companies whose products are implemented on PCBs, with SMDs for each separate component, could benefit hugely from the advantages of a custom SoC. Their products would be much smaller and could fit into wearable-size enclosures. Power consumption would be reduced, reliability would go up, and assembly and PCB costs would go down. The alternative today is to buy standard parts and solder them up, but off-the-shelf parts often force compromises in functionality and form factor.

ARM has just announced a revolutionary new initiative to bring custom SoCs within reach of system and product designers who would never before have been able to take advantage of their benefits. ARM is making its popular low-power 32-bit Cortex-M0 available for download, along with the design kit and three months of free access to the ARM Keil MDK development tool. With this comes system IP, peripherals, a test bench and software. There is even an option to buy an FPGA board for under $1,000 to aid in prototyping.

Once it's time to move to the next stage of product development, a license to manufacture products containing the Cortex-M0 can be purchased for a straightforward $40,000 IP fee from ARM. The Cortex-M0 is a very popular processor for IoT, mobile and wearable applications; it is extremely low power yet offers 32-bit computing performance.

Even with this radically different technology access and licensing model, there may still be concern about how to actually implement an SoC, so ARM is further enabling SoC development by linking its design house partners with product developers. Partners like Brite, Dream Chip, S3 Group, Open-Silicon and others offer turnkey development as well as consultancy and help for custom SoC development. There is also cooperation with a large number of foundries, including licensing models that fold ARM IP costs into the foundry fabrication cost as a minimal add-on.

This is a game changer for people who have product ideas but cannot effectively implement them without a custom SoC. We saw a proliferation of design ideas and collateral technology spin-offs from the Arduino. One notable example is 3D printers, most of which have a controller board descended from the Arduino. In the same vein, this initiative from ARM will continue the democratization of technology that the maker movement started. Only now, even more complex and traditionally more difficult design options will be available to a larger audience.


Optimizing Quality-of-Service in a Network-on-Chip Architecture

by Tom Dillinger on 10-13-2015 at 12:00 pm

The Linley Group (TLG) is well-known for their esteemed Microprocessor Report publication, now in its 28th year. Accompanying their repertoire of industry reports, TLG also sponsors regular conferences, highlighting the latest developments in processor architecture and implementation.

One of the highlights of the conference was the presentation from Benoit de Lescure, Director of Application Engineering at Arteris, and Marc Greenberg, Director of Product Marketing at Synopsys. Benoit provided an update on Network-on-Chip (NoC) architecture design, with an emphasis on optimizing transactions to the unique capabilities of LPDDR4 memory. Marc joined Benoit to describe how the Synopsys memory controller IP integrates with the new Arteris NoC memory transaction scheduling unit for LPDDR4.

NoC Basics
As SoC designs integrate a greater number and diversity of processing units, traditional crossbar or hierarchical (multi-level) bus architectures do not scale. Routing congestion becomes a major issue for physical implementation, while satisfying Quality-of-Service (QoS) requirements becomes a difficult timing closure task.

Most users associate the term Quality-of-Service with the allocation and scheduling of resources to provide computation that meets critical deadline constraints, typically under the supervision of a real-time operating system. In the context of an SoC with many disparate processing blocks, similar QoS considerations apply. The heterogeneous data traffic on a processor SoC comes from blocks whose interfaces differ in:

  • protocol
  • clock frequency
  • data width
  • peak throughput
  • traffic patterns – e.g., transaction length, address alignment
  • reaction to latency and/or “back pressure” from pending requests

These traffic characteristics require specific focus on:

  • throughput of multiple concurrent links (bandwidth)
  • delay from a request initiated by a master through the interconnect to the target (latency)
  • memory efficiency (% of maximum memory throughput realized, sharing the finite memory bandwidth across many command requests)

In essence, a Network-on-Chip implementation involves encapsulating data traffic between processing units into packets, and transporting those packets serially, in a pipelined manner. This enables scaling of SoC complexity, while managing physical routing congestion and satisfying QoS requirements.

As Marc put it, “The goal of the NoC and memory controller is to get the right data to the right master at the right time.”
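The packet-encapsulation idea can be sketched in a few lines of much-simplified Python. The `Transaction` and `Packet` structures and the 4-byte flit width are illustrative assumptions for this sketch, not actual Arteris data formats:

```python
from dataclasses import dataclass, field
from typing import List

FLIT_BYTES = 4  # hypothetical flit (link word) width

@dataclass
class Transaction:
    """A simplified socket transaction, e.g. an AXI-style burst read."""
    master_id: int
    address: int
    burst_len: int  # number of data beats

@dataclass
class Packet:
    """A NoC packet: routing header plus a payload split into flits."""
    src: int
    dst: int
    priority: int
    flits: List[int] = field(default_factory=list)

def packetize(txn: Transaction, dst: int, priority: int = 0) -> Packet:
    """Encapsulate a transaction into one packet of serial flits,
    one flit per data beat -- the transmit-side job of an NIU."""
    flits = [txn.address + i * FLIT_BYTES for i in range(txn.burst_len)]
    return Packet(src=txn.master_id, dst=dst, priority=priority, flits=flits)
```

The flits then travel through the switch fabric in a pipelined stream, and the receiving NIU reassembles the transaction for the target socket.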

QoS in a NoC Architecture
The Arteris NoC architecture consists of Network Interface Units (NIU), which communicate directly with each IP core. This logic converts traditional (AMBA AXI, AHB, OCP, or customer proprietary) protocol transactions into packets for transport across the NoC network fabric. At the receiving end, the NIU communicates with a core using an IP socket interface.

The Arteris FlexNoC solution provides synthesizable RTL modules for physical implementation of the NoC. The NIU logic is typically placed close to its related IP core in the chip floorplan, to optimize timing and minimize routing congestion; pipeline register insertion is also supported for timing optimization. In addition, the FlexNoC package includes a suite of SystemC TLM simulation and performance analysis tools.

The NoC fabric specifically addresses QoS bandwidth and latency requirements in several ways:

  • packet priority assignment (by packet, or by all socket transactions)
  • dynamic “pressure” relief (provide a low latency path to high-priority packets when traffic is high)
  • communication between cascaded arbiters at each network switch (to avoid deadlocks)

and the main emphasis of Benoit’s and Marc’s presentation:

  • optimization of the memory scheduler DDR commands to the memory controller IP block, for highest memory efficiency and fewest wait states
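A toy arbiter conveys the flavor of the packet-priority mechanism. This plain-Python sketch (not Arteris RTL) grants the highest-priority packet first and breaks ties by arrival order, so high-priority traffic sees a fast path while lower-priority traffic still drains:

```python
import heapq
from itertools import count

class SwitchArbiter:
    """Priority arbiter for one NoC switch output port (sketch)."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # monotonic tie-breaker: oldest first

    def push(self, packet_id, priority):
        # heapq is a min-heap, so negate priority: larger value wins
        heapq.heappush(self._heap, (-priority, next(self._arrival), packet_id))

    def grant(self):
        """Return the next packet to forward, or None when idle."""
        if not self._heap:
            return None
        _, _, packet_id = heapq.heappop(self._heap)
        return packet_id
```

A real fabric adds the dynamic pressure relief and inter-arbiter communication described above; this sketch shows only the static priority ordering.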

NoC QoS with LPDDR4
The advent of LPDDR4 introduces new features in physical memory addressing and timing – and with these new features come opportunities for additional QoS optimizations. Benoit and Marc described how Arteris and Synopsys have collaborated to leverage these new capabilities.

The NoC memory request scheduler and memory controller optimize the sequence of LPDDR4 commands to the shared memory, managing:

  • multiple, independent LPDDR4 channels
  • memory interleaving (logical-to-physical mapping), to optimize addresses for low “locality of reference” packets
  • coherent (and non-coherent) memory requirements for different IP cores
  • power dissipation options, separating critical functions from active/standby memory areas
  • per bank refresh scheduling
  • PHY training/calibration cycles
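One scheduler optimization in this spirit is grouping pending requests that hit an already-open DRAM row, so fewer activate/precharge cycles are wasted. The sketch below is a generic greedy reordering; the address split and `ROW_BITS` value are illustrative assumptions, not the actual LPDDR4 address mapping:

```python
from collections import defaultdict

ROW_BITS = 10  # hypothetical: address bits below this select the column

def schedule(requests):
    """Greedy open-row scheduler sketch.

    `requests` is a list of (bank, address) pairs in arrival order.
    Requests are grouped by (bank, row) so consecutive commands hit
    an already-open row, one of the reorderings a memory scheduler
    performs to raise memory efficiency.
    """
    by_row = defaultdict(list)
    order = []  # (bank, row) keys in first-seen order, for fairness
    for bank, addr in requests:
        key = (bank, addr >> ROW_BITS)
        if key not in by_row:
            order.append(key)
        by_row[key].append((bank, addr))
    out = []
    for key in order:
        out.extend(by_row[key])
    return out
```

A production scheduler also weighs packet priority, refresh deadlines and read/write turnaround; this shows only the row-locality grouping.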

As SoC processors continue to integrate a greater number and complexity of IP cores, the scalability of the NoC architecture provides a key advantage. The QoS performance requirements of the various core functions will result in a wide set of characteristic “traffic” through the network switch fabric, with varying bandwidth and latency requirements. Specifically, the schedule of memory commands needs to be optimized to achieve QoS metrics, taking maximum opportunity to leverage new LPDDR4 features.

Arteris and Synopsys recently illustrated how a collaborative development partnership between a NoC architecture provider and a memory controller IP provider can achieve significantly improved power/performance.

For a great summary of the recent TLG Processor Conference, check out Tom Simon’s recent article:

http://www.semiwiki.com/forum/content/5086-processors-rule-day.html

-chipguy


Convolutional Deep Neural Network is now a reality with CEVA-XM4

by Eric Esteve on 10-13-2015 at 7:00 am

The XM4 DSP has been enriched with the CEVA Deep Neural Network (CDNN) software framework. Some explanation may be useful before jumping into CDNN. The "Deep" in CDNN comes from "deep learning", a family of neural network methods that use a large number of layers, hence a deep network. The most popular deep learning method is the convolutional neural network (CNN). Why is it popular? Because CNNs learn the feature representations required to support applications like object recognition, driver assistance (ADAS) and augmented reality, emerging applications that are generating developments in segments from automotive to consumer. CNNs offer two major benefits that justify this enthusiasm. First, a CNN provides the best recognition quality when compared with alternative recognition algorithms. The second benefit is linked to the machine-learning nature of the algorithm: the designer implements it once and can reuse it many times without code changes, simply through re-training. This could greatly accelerate machine learning deployment in embedded systems since, by definition, you want such systems to run as long as possible without intervention.

CEVA has partnered with Phi Algorithm Solutions to optimize Phi's CNN-based Universal Object Detection (UOD) algorithm and port it to the CEVA-XM4 via CDNN. The "N" in CNN stands for Neural, indicating that researchers strive to mimic the human brain in computers. Such work was long limited by computing horsepower, power constraints and algorithmic quality, but technology progress now allows neural networks to be brought into the embedded world. Harnessing the computing power of the CEVA-XM4 imaging & vision DSP, the partners have created the lowest-power, lowest-memory-bandwidth deep learning solution providing real-time, efficient object recognition and vision analytics.

The concept of pre-trained networks is brilliant: the designer receives the network model and weights as design inputs from offline (pre-trained) training, and these are automatically converted into a real-time network model via the CEVA Network Generator. The designer can then use this real-time network model in a CNN application on the CEVA-XM4 DSP. This usage flow is described below (Caffe is a popular open source framework used to build, train and run neural networks).

If you look at the usage flow, CEVA's main contribution is the Network Generator, which merges two distinct bodies of know-how. The purely software-based science on the left side, using floating-point algorithms, has to be converted into a fixed-point, power-aware, hardware-compatible customized network that can be implemented on an embedded DSP, while keeping high recognition accuracy. CEVA claims less than 1% degradation in accuracy compared to the original network, which means that in less than 1% of cases the pictured Labrador retriever could be confused with a Beagle.
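The floating-point-to-fixed-point step can be illustrated with a minimal sketch. The per-layer power-of-two scaling shown here is a generic textbook quantization approach, not CEVA's actual Network Generator algorithm:

```python
import math

def quantize_weights(weights, total_bits=16):
    """Quantize floating-point weights to signed fixed point (sketch).

    Picks the largest power-of-two scale that keeps every weight
    inside a `total_bits`-bit signed integer, then rounds --
    conceptually what a float-to-fixed conversion step does when
    porting a trained network to an embedded DSP.
    """
    max_abs = max(abs(w) for w in weights) or 1.0
    limit = 2 ** (total_bits - 1) - 1
    frac_bits = int(math.floor(math.log2(limit / max_abs)))
    scale = 2.0 ** frac_bits
    q = [max(-limit - 1, min(limit, round(w * scale))) for w in weights]
    return q, frac_bits

def dequantize(q, frac_bits):
    """Recover approximate floating-point values for accuracy checks."""
    scale = 2.0 ** frac_bits
    return [v / scale for v in q]
```

Comparing the original and dequantized weights layer by layer is how the small (claimed sub-1%) accuracy degradation would be measured.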

The CNN-based Universal Object Detector algorithm (from Phi Algorithm Solutions) is now available for application developers and OEMs to run a variety of applications, including pedestrian detection and face detection, for security, ADAS and other embedded devices built around low-power camera-enabled systems.

Taking the example of pedestrian detection, the real-time detection application utilizing CDNN and optimized for the CEVA-XM4 DSP consumes less than 30 mW at 1080p, 30 fps, and provides a 15x average memory bandwidth reduction compared to typical neural network implementations. This makes it the lowest-power deep learning solution for embedded systems: 30x lower power and 3x faster processing when compared to leading GPU-based systems.

According to Eran Briman, vice president of marketing at CEVA, "Our new Deep Neural Network framework for the CEVA-XM4 is the first of its kind in the embedded industry, providing a significant step forward for developers looking to implement viable deep learning algorithms within power-constrained embedded systems." The CEVA-XM4 imaging & vision DSP, together with the CDNN framework, paves the way for advances in artificial intelligence devices using deep learning techniques in the coming years.

By Eric Esteve from IPNEST


Wi-Fi Pioneers: Where are they now?

by Majeed Ahmad on 10-12-2015 at 4:00 pm

Wi-Fi is the unsung hero of the mobile revolution. Some people even call it the real Internet. In retrospect, smartphones took off partly because Apple forced mobile operators to seriously consider handsets with Wi-Fi capabilities. Now Wi-Fi is an intrinsic networking component serving smartphones, tablets and notebook computers in home, office and public environments.

The origins of Wi-Fi can be traced back to the 1980s, when wired networks like Ethernet were just taking off. This article briefly chronicles the work of three pioneers who laid the groundwork for one of the most successful facets of the Internet.

Michael Marcus:


The man who proposed the idea of the ISM band

In 1980, an engineer named Michael Marcus proposed the idea of opening up the industrial, scientific and medical (ISM) bands at 900 MHz, 2.4 GHz and 5.8 GHz. Marcus theorized that license-free radio spectrum in the hands of technology entrepreneurs would stimulate innovation and thus yield productive benefits.

After five years of prodding from Marcus, in 1985, the Federal Communications Commission (FCC) opened up the so-called garbage band allocated for household devices like microwave ovens and radio-controlled toy cars. However, the commission mandated that this new license-free band would use spread spectrum technology to ensure that there was no interference with the existing devices using the ISM band.

Where is he now?

Marcus now heads a consultancy firm, Marcus Spectrum Solutions LLC, located in the Washington D.C. area, which provides wireless spectrum-related services like certifications and training.

Victor Hayes:


Hayes led the IEEE 802.11 committee through its first decade

In the 1980s, IT equipment maker NCR Corp. faced a problem: its retail customers changed their floor plan from time to time, and when they did that, the NCR-provided cash registers had to move and be re-connected to the computer servers. So in 1988, NCR, which wanted to use the unlicensed radio spectrum to hook up its wireless cash registers, asked protocols specialist Victor Hayes to look into this technology prospect.

The engineering teams of NCR and its joint venture partner AT&T eventually developed the Wi-Fi technology in 1991 in Nieuwegein, the Netherlands. The product, called WaveLAN and offering speeds of 1 Mbit/s to 2 Mbit/s, would serve cashier systems in a wireless environment.

Hayes was the first chair of the IEEE 802.11 group, which in 1997 finalized the wireless standard that later became known to the world as Wi-Fi. The Netherlands native, who had joined NCR in 1974, is often referred to as the "Father of Wi-Fi" for his role in establishing and chairing the IEEE 802.11 Standards Working Group for Wireless LANs.

Where is he now?

Hayes is now senior research fellow at the Delft University of Technology, the Netherlands, where he carries out development work for flexible wireless spectrum management.

Bruce Tuch:


Tuch was a lead innovator in the Wi-Fi development arena

Bruce Tuch was working as an RF engineer at Bell Labs when he got involved in work related to WaveLAN, a product of NCR, which had by then been acquired by Bell Labs' parent company AT&T. Subsequently, Hayes and Tuch approached the Institute of Electrical and Electronics Engineers (IEEE), where a committee named 802.3 had earlier developed the Ethernet LAN standard. Consequently, a new committee called 802.11 was formed, and work on a new wireless LAN standard began.

After his early contributions to the development of the WaveLAN product and to technology standardization in the IEEE 802.11 committee, he continued to lead Wi-Fi development efforts at Agere Systems' Utrecht Systems Engineering Centre in the Netherlands.

Where is he now?

Tuch is now based in Amsterdam, the Netherlands, where he is Vice President of Development at PowerOasis, a firm that provides turnkey mobile communications solutions for renewable energy markets.

The article is based on excerpts from the book “Age of Mobile Data: The Wireless Journey to all Data 4G Networks.”


SpyGlass World at Levi's Stadium, October 21st

by Bernard Murphy on 10-12-2015 at 2:00 pm

I suppose you might have something better to do next Wednesday but, seriously, it had better be pretty good. I admit I’m biased (I was the Atrenta CTO until very recently) but even given that and mixing metaphors, Atrenta really knocked it out of the park when they got the 49er stadium for their User Group meetings. You don’t have to be a 49er fan (perhaps not easy to admit these days) or even a football fan to enjoy the event. It’s held in the United Club, a luxurious and relaxing environment with a panoramic view of the field and ample room to stretch your legs, both inside and outside. Oh, and the food is pretty darn good too. And they have raffle prizes and a grand prize of 49er tickets. And a luxury cruise… OK, I made that last one up. But you get the point. This is a much easier way to spend a day than the way you typically spend your day.

Synopsys says they are committed to maintaining continuity for SpyGlass, which is why they are continuing these user meetings. Philippe Magarshack, now CTO of ST and a long-standing supporter of Atrenta, opens with a keynote on the application of FD-SOI in fast-growing markets, especially IoT. You're not often going to get to hear directly from the technical #1 at the leader in FD-SOI how he thinks it stacks up against FinFETs. The day continues with user experiences from Marvell, Infineon, Synopsys (which should also be fascinating), Xilinx and Broadcom. The event wraps up with a forward look at Synopsys verification directions, which will be a must-see for anyone wanting to know how SpyGlass will fold into the Synopsys family and where Synopsys is headed with static verification.

When you need to explain to your supervisor why you should be doing this rather than re-running those simulations for the 95th time, you might mention that user groups are important for a perspective into how competitors in your industry are using these tools, and that your company needs to stay current on best-in-class practices. The funny thing is, this is actually true. You can't improve your own competitiveness and your company's competitiveness unless you focus some of your time on important tasks (learning) over urgent tasks (fighting the latest fire).

So get started on talking your boss into letting you spend a day away from the company treadmill. You won't regret it. Register HERE.

More articles by Bernard…


3 Self-Service Semiconductor Design and Manufacturing Wins!

by Daniel Nenni on 10-12-2015 at 12:00 pm

As the semiconductor consolidation continues and thousands of semiconductor professionals update their LinkedIn profiles, the march to create new silicon opportunities is increasing at a rapid pace. It is 1980s deja vu all over again when the fabless business model reenergized the semiconductor industry and brought affordable electronics to our homes and hands.

One of the leading enablers for the coming onslaught of wearable, IoT, robotic, autonomous, virtual reality, security, etc… designs is an automated online secure environment that provides a self-service, transparent, accurate, real-time experience from IC design through volume ASIC production. I’m talking about the STAR online design virtualization platform of course. STAR helps you manage complexity and make the right decisions on your ASIC journey from concept to volume production, absolutely.

The eSilicon STAR platform is closing in on 1,000 users (think design starts). These folks have explored many IP, prototyping and production tape-out options, several thousand in all. The 24/7, transparent experience offered by the platform is starting to change the way folks do business. Here are three examples:

A small startup found the eSilicon STAR platform through a Google search last year in late December – during eSilicon’s year-end shutdown. This company ran several MPW prototype scenarios with MPW Explorer over the shutdown period, and made one phone call to eSilicon to get clarification on a technical question. One phone call. A few days after New Year’s, they sent a signed contract. eSilicon closed a deal when no one was at work. This same company got their MPW prototypes and things went well from there with their end customer. They then generated a production tapeout quote with GDSII Explorer, signed the quote and taped out three weeks later. This startup used the data provided by eSilicon’s STAR platform to build their business plan.

Another startup booked an MPW run through MPW Explorer with no verbal communication at all. The entire deal was booked online. Another startup registered for a STAR account and committed an MPW run 90 minutes later. No phone calls. No human interaction at all.

eSilicon recently booked a tapeout deal with a large design services organization. This company had set up a STAR account and generated a quote with GDSII Explorer for their end customer before they ever spoke with anyone at eSilicon.

Self-service semiconductor design and manufacturing is becoming real…

The STAR Navigator tool allows you to quickly search, compare and evaluate IP online to find the eSilicon memories and I/Os that best meet your chip's power, performance and area (PPA) targets, and to transparently access key data without wading through complex documentation or engaging in time-consuming evaluation processes.


Tensilica 4th generation DSP IP is a VPU

by Eric Esteve on 10-12-2015 at 7:00 am

You may not know the Tensilica DSP IP cores by name, but you probably use Tensilica DSP-powered systems in your day-to-day life. Every year, over 2 billion DSP cores ship in ICs across thousands of designs supporting IoT, mobile phones, storage/SSD, networking, video, security, cameras and more. Why is DSP processing, the foundation of all Tensilica processors, seeing such high adoption? Because DSP processing is more energy- and area-efficient than CPU or even CPU/GPU processing. If you compare the energy dissipated to process an image (in millijoules per frame) when offloading to a host CPU (4 cores), to a host CPU (4 cores) plus a 3-pipe GPU (4 cores), or to a Vision P5 DSP like Tensilica's 4th-generation IP core, you see a 30X reduction factor.

Imaging/vision processing is required in more applications every day: mobile phones processing the raw image from the camera, automotive supporting the multiple applications linked to Advanced Driver Assistance Systems (ADAS), 4K Ultra-HD, or IoT. That is why the 4th generation of DSP from Tensilica is labeled "Vision P5". Before looking at the DSP architecture, let's clarify a point: this DSP is both an Image Signal Processor (ISP) and a Vision Processing Unit (VPU). Directly interfacing with sensors, the ISP used to be implemented in hardwired logic (RTL), but the trend is to move to a "soft ISP"; taking the example of face detection, moving to a soft ISP cuts energy consumption by 5X. The Cadence imaging/vision DSPs focus on image post-processing and on image/video analysis, and the slide below helps explain this focus. Post-processing includes 2D/3D noise reduction, image stabilization, super resolution, etc., while image/video analysis (face detection, people detection and more) is part of vision processing.

The Cadence Vision P5 DSP core is a deeply pipelined design running at up to 1.1 GHz (in 16nm FF technology), kept low power thanks to massive clock gating. The core supports 256 ALU ops per cycle due to its vector architecture: 4 vector ops per cycle, each 64-way SIMD, with vector data of 8, 16 or 32 bits. To support vision-based ADAS, drones and augmented reality, the designer can implement an optional Vector Floating Point Unit (VFPU), which delivers 32 GFLOPS for a core running at 1 GHz. The core supports the industry's widest memory interface at 1024 bits, and memory system performance is greatly improved thanks to scatter/gather data registers: up to 16X faster random memory access can be achieved for non-uniform access algorithms like image warping, edge tracing and non-rectilinear patch access. Finally, like the other Tensilica DSP IP cores, the Vision P5 allows customers to add their own instructions.
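To see why gather support matters, consider the access pattern of image warping. In the plain-Python sketch below, `gather` is a scalar stand-in for what the hardware does as a single wide vector operation; the clamping and row-warp helper are illustrative, not Tensilica APIs:

```python
def gather(src, indices):
    """Fetch src[i] for every index at once -- the access pattern a
    hardware gather unit services in one vector operation, but which
    defeats linear prefetching when done element by element."""
    return [src[i] for i in indices]

def warp_row(image_row, offsets):
    """Non-rectilinear access sketch: each output pixel reads an
    arbitrary (clamped) source position, as in image warping."""
    last = len(image_row) - 1
    idx = [max(0, min(last, i + d)) for i, d in enumerate(offsets)]
    return gather(image_row, idx)
```

Because the source indices are data-dependent and non-contiguous, a conventional load unit serializes them; dedicated scatter/gather registers are what make the claimed speedup on such algorithms possible.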

To support massive computing needs, multiple Vision P5 DSP cores can be implemented; multi-core configurations support both shared-memory and message-passing architectures. A five-core implementation can deliver up to 1 tera-op per second (1,000,000,000,000 operations per second!), but what is really amazing is the footprint: 2 sq mm in 16nm FF technology.

Imaging and vision processing is a very fast-moving market: OEMs are constantly changing optics and sensors, and new algorithms have to be developed to support emerging applications like face detection and face recognition. ADAS is becoming a reality, and in the near future we can expect car manufacturers and their suppliers to imagine many new applications and create new algorithms. They need a highly programmable and flexible engine, scalable thanks to multi-core capability. The 4th generation of Tensilica DSP, the Vision P5, has been developed by Cadence to support such applications.

By Eric Esteve from IPNEST


Meeting DFM Challenges with Hierarchical Fill Data Insertion

by Tom Dillinger on 10-11-2015 at 12:00 pm

To describe the latest methodology for the addition of Design for Manufacturability fill shapes to design layout data, it’s appropriate to borrow a song title from Bob Dylan – The Times They Are A Changin’. The new technical requirements are best summarized as: “The goal is now to add as much fill as possible, which (ideally) looks like the actual design.”

At the recent TSMC Open Innovation Platform symposium, Zhe Lui from HiSilicon and Bill Graupp from Mentor Graphics presented results of their collaboration with TSMC to optimize the fill database for a HiSilicon N16 design. Specifically, Mentor and TSMC enhanced the algorithms in Mentor’s Calibre YieldEnhancer with Smart Fill tool (and the DFM Data Kit runsets), while Mentor and HiSilicon implemented a hierarchical methodology for managing the additional data volume and subsequent design verification runtimes.

Background

The addition of fill data for DFM originated with the Chemical-Mechanical Polishing (CMP) process for improved BEOL metallization planarity. To reduce the extent of metal line “dishing” during polishing, a rule for minimum local metal density measured across a small stepping window was established, and metal fill algorithms were implemented to meet this requirement.
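The windowed density check can be sketched as follows. The binary coverage grid and the threshold are illustrative assumptions; real DRC tools measure polygon areas, but the stepping-window idea is the same:

```python
def window_densities(grid, win):
    """Metal density in each win x win stepping window (sketch).

    `grid` is a 2D list of 0/1 cells marking metal coverage.
    Windows step in non-overlapping win-sized increments, row-major.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - win + 1, win):
        for c in range(0, cols - win + 1, win):
            covered = sum(grid[r + i][c + j]
                          for i in range(win) for j in range(win))
            out.append(covered / (win * win))
    return out

def needs_fill(grid, win, min_density):
    """Flag windows that fall below the minimum local density rule,
    i.e. where fill shapes would have to be inserted."""
    return [d < min_density for d in window_densities(grid, win)]
```

A fill algorithm would then insert shapes into the flagged windows until every window clears the rule, which is exactly the check that motivated CMP fill in the first place.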

The shapes were typically of a size/spacing to have minimal impact on electrical performance. The shapes were added “at the end, in the top cell” of the design, to meet tapeout criteria. That has all changed…

Current fill data requirements

Fill data must now meet a much more stringent set of lithography requirements – e.g., stepping window density (as before), density gradient limits, multi-patterning decomposition “color assigned” fill data, color-specific density on a layer.

No longer solely a methodology to address BEOL metal thickness uniformity after CMP, fill data is now directly related to:

  • lithographic uniformity of FEOL layers (especially for FinFET processes)
  • induced mechanical stress on devices (affecting their electrical characteristics)
  • etch rate uniformity (from the “loading” of material to be removed)
  • electrical behavior (due to the proximity of fill data to design data)

As a result, the volume of fill data required has exploded – “often greater than 1B shapes/layer, or 3-4X the size of the design database”, according to Mentor and HiSilicon. The full-chip (flat fill) DRC verification resources alone quickly became intractable.

Hierarchical fill

Mentor and HiSilicon developed a hierarchical “fill as you go” methodology, starting with lower-level cells. This necessitated enhancements to the Calibre YieldEnhancer Smart Fill algorithms. Optimizations were required at the perimeter of cells – “no empty space”. Interconnects were “wrapped” with fill data to match multi-patterning color assignments.

HiSilicon described the engineering approach needed to implement hierarchical fill:

  • (lower-level) cell selection
  • addition of blockage layers for subsequent fill steps
  • the flow for layout database management

This new methodology deployed at HiSilicon enabled them to maintain suitable turn-around time in final full-chip verification.

(Although HiSilicon and Mentor didn’t explicitly mention this feature in their presentation, Calibre Smart Fill also includes algorithmic support for ECO fill, a key feature required to keep the verification and analysis iteration time for a last-minute design change in check.)

As Moore’s Law proceeds, the complexity of DFM requirements will certainly continue to grow. This OIP presentation highlights that a foundry + EDA vendor + design company collaboration is extremely beneficial to drive tool and methodology enhancements, to address these complexity challenges. The times they are (definitely) a changin’…

-chipguy


Applying EDA Concepts Outside Chip Design

by Bernard Murphy on 10-11-2015 at 7:00 am

(I changed the title of this piece as an experiment) Paul McLellan recently wrote on the topic of new ventures crossing the chasm (getting from initial but bounded success to a proven scalable business). That got me to thinking about the EDA market in general. In some ways it has a similar problem, stuck at $5B or so and single-digit growth rates, on the left side of a chasm separating it from an at least conceivably much broader market. EDA isn’t going to get more of the semiconductor pie, so now we look for ways to expand upward into software and embedded systems. That’s one way to grow the market, but are there different, or at least complementary ways to expand? One opportunity may be network architecture design and analysis, an emerging (and therefore potentially fast-growing) domain to which it seems we could adapt EDA techniques and principles.

It doesn’t take a lot of thought to realize that a network looks a lot like a netlist. Of course there are differences. All or most connections are bidirectional, “signals” are a lot more complex than 1’s and 0’s and the nodes are a lot more complex than logic gates. However, if obstacles like that were insuperable we’d still be using Spice to simulate logic, so differences aside, perhaps there are opportunities to apply netlist tool concepts to networks.

This idea is not new, but neither is it fully developed. SIGCOMM, the ACM’s special interest group on data communications, held a conference this August at which half a day was devoted to a tutorial on network verification. An extract from the tutorial introduction makes this point: “One can also view a network as a circuit using an EDA (Electronic Design Automation) lens …. If design rule checking is analogous to static checking, what is the analog of synthesis? … These analogies have led networking researchers to frame a new research agenda, made compelling by the ubiquity of cloud services, called Network Verification. They ask: what are the equivalents of compilers/synthesis tools, debuggers, and static checkers for networks?”. If that isn’t a clarion call for EDA innovators searching for a new direction, I don’t know what is.

One example of static analysis checks configurations for potential errors where routers may learn routes that are not usable or, conversely, fail to learn routes that are usable. More recent efforts aim to formally assess reachability of IP addresses and to define semantics for networks which might provide a foundation for proofs of correctness. Motivated by Software Defined Networks (SDNs), there is work on how to specify requirements above the fairly atomic level at which individual routers are programmed, moving the abstraction up to network policies so that individual router configurations can be derived automatically by a synthesis / compilation step from that higher-level requirement. There are analogs to ATPG (in this case, automatic test packet generation) and coverage analysis to perform end-to-end testing and performance analysis in networks. And there’s more – this seems to be a very fertile area of research.
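To make the netlist analogy concrete, here is a deliberately tiny Python sketch of the kind of static reachability check described above; the router names, prefixes, and table format are invented for illustration and bear no relation to any real tool:

```python
# Toy static reachability analysis over router forwarding tables.
# Each router maps a destination prefix to a next hop; None means drop.
tables = {
    "r1": {"10.0.0.0/24": "r2", "10.0.1.0/24": "r3"},
    "r2": {"10.0.0.0/24": "host_a"},
    "r3": {"10.0.1.0/24": None},   # misconfigured: black hole
    "host_a": {},                  # end host, traffic delivered
}

def reachable(src, prefix, tables, max_hops=16):
    """Follow forwarding entries for `prefix` from `src`; detect black
    holes (no route / explicit drop) and suspected loops (hop limit)."""
    node = src
    for _ in range(max_hops):
        if node not in tables or not tables[node]:
            return True, node       # delivered to an end host
        nxt = tables[node].get(prefix)
        if nxt is None:
            return False, node      # black hole at this router
        node = nxt
    return False, node              # hop limit exceeded: loop suspected

print(reachable("r1", "10.0.0.0/24", tables))  # delivered at host_a
print(reachable("r1", "10.0.1.0/24", tables))  # black hole at r3
```

The analogy to netlist traversal should be obvious: routers are nodes, forwarding entries are nets, and the check walks the “connectivity” without ever sending a packet, exactly as a static checker walks a design without simulating it.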

A very interesting aspect of analysis in this domain is that it can and often will be applied to live networks, rather than networks in the design stage. Application in field deployment was always a holy grail for EDA because it would take you past the very small universe of designers who might use your tool to the potentially much larger universe of field-deployment and maintenance specialists. That is what may make this direction so compelling – to grow from a total market of say 10-20K chip designers to a market of hundreds of thousands or millions of IT/IS and networking engineers.

Of course this won’t be easy, but in a slowly-growing, mature market returning interesting value to investors isn’t easy either. EDA principles will carry over and maybe some techniques too, but a lot of invention and new development will be required. And you have to worry about the small detail of whether or when this market will actually take off; you don’t want to get too far ahead of the parade. That said, there are indicators. The ACM tutorial referred to the growth of cloud services. The potentially significant growth of IoT will further compound the complexity of networks beyond those we understand today, and software defined networks seem likely to become more commonplace. In this new reality, wouldn’t you think automated design, optimization and verification tools would become essential? The ACM certainly seems to think so.

More articles by Bernard…


S2C ships UltraScale empowering SoFPGA

S2C ships UltraScale empowering SoFPGA
by Don Dingee on 10-10-2015 at 7:00 am

Most of the discussion around Xilinx UltraScale parts in FPGA-based prototyping modules has been on capacity, and that is certainly a key part of the story. Another use case is emerging, one that may be even more important than simply packing a bigger design into a single part without partitioning. The real win with this technology may be system-on-FPGA (SoFPGA).

In the early days of FPGAs, everything was basic rows and columns, without much visibility inside. The good news was this allowed logic blocks to be laid down like tiles. Combinational logic was happy with this approach, and simple sequential logic benefitted from the flexibility. Complexity rose, and generations of more sophisticated FPGAs with improved clocking structures, logic enhancements, and reduced propagation delays raised the bar.

FPGA-based prototyping systems evolved into real prototyping platforms. The prime directive was to reproduce behavior of RTL intended for an ASIC as faithfully as possible in an FPGA. This was easiest if a design fit entirely within a single FPGA, but innovators quickly found creative ways to support larger designs with partitioning and interconnect strategies connecting two, four, or more FPGAs. Debug capability was enhanced, enabling teams to see what was happening inside a design when things were not quite right. Speeds increased, allowing actual software to run, and synthesis times for revisions dropped allowing changes to be made quickly.

That all adds up to a strong value proposition for FPGA-based prototyping of SoCs.

S2C’s announcement of production shipments of single UltraScale VU440 (Single VU) Prodigy Logic Modules represents the state of the art in single-module capacity and debug capability. With dual and quad modules on the way soon (available for ordering now), the ability to partition big SoC designs across four UltraScale VU440s is a given.

What sets S2C apart from other FPGA-based prototyping systems is the potential for large-scale system-on-FPGA design, where the deployment system is the FPGA-based platform. Workload-optimized platforms for hardware acceleration of processing and analysis are taking advantage of high-speed FPGA interconnect and advanced DSP capability found in the UltraScale VU440. The Prodigy Cloud Cube from S2C connects up to 16 Single VU Logic Modules today in a massively configurable SoFPGA.

Such a SoFPGA can tackle parallelism on a scale few other architectures can achieve. SoFPGAs also excel in relatively low-volume applications where justifying a SoC would be difficult. Applications like big data processing, broadcast video, image processing, financial trading, and others with unique high performance requirements in select deployment are ripe for this kind of innovation.

In a departure from previous generations, design of SoFPGA systems with advanced FPGAs like UltraScale can now leverage SoC-class IP, as opposed to only brute-force FPGA tactics or basic RTL for synthesis. The biggest development so far is that SoFPGAs are now adopting AXI as the IP interconnect. This has two distinct advantages: it abstracts the hardware interconnect, making IP blocks reusable, and it allows managed traffic flow for advanced software design.

Startup Wave Semiconductor is beginning to emerge from stealth mode, and is providing a look at how they are leveraging AXI in large FPGA designs. At the October 14th session of the DVClub in Milpitas sponsored by S2C, Wave will present how they are using deep packet inspection to verify AXI traffic. We usually associate DPI with an external networking interface such as Ethernet, but the use of DPI within SoFPGA designs could provide significant advantages in scalability and security.
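As a purely hypothetical sketch of what inspecting AXI traffic for protocol violations might look like (this is not Wave’s method, and real AXI has separate address and data channels with their own handshake signals), consider a checker that matches read responses against outstanding request IDs:

```python
# Toy AXI read-traffic checker. An event is either a read request
# ("AR", id) or a read response ("R", id, resp); names are invented.
def check_axi_reads(events):
    """Return a list of violation strings; an empty list means clean."""
    outstanding = {}
    violations = []
    for ev in events:
        if ev[0] == "AR":
            _, txn_id = ev
            outstanding[txn_id] = outstanding.get(txn_id, 0) + 1
        elif ev[0] == "R":
            _, txn_id, resp = ev
            if outstanding.get(txn_id, 0) == 0:
                violations.append(f"response with no request: id={txn_id}")
            else:
                outstanding[txn_id] -= 1
            if resp != "OKAY":
                violations.append(f"error response on id={txn_id}: {resp}")
    for txn_id, n in outstanding.items():
        if n:
            violations.append(f"{n} unanswered request(s) on id={txn_id}")
    return violations

trace = [("AR", 1), ("R", 1, "OKAY"), ("R", 2, "OKAY")]
print(check_axi_reads(trace))  # flags the orphan response on id=2
```

Because AXI abstracts the interconnect, a monitor like this can sit on any AXI boundary between reusable IP blocks, which is what makes packet-level inspection inside a SoFPGA plausible in the first place.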

More details on Single VU production shipments and the upcoming special event are on the S2C site:

S2C Shipping Prodigy Virtex UltraScale and Kintex UltraScale FPGA Prototyping Boards to Customers Worldwide

Wave Semiconductor to Present at S2C-Sponsored DVClub in Silicon Valley on October 14

Again, I’d emphasize that the new S2C Single VU modules are usable in stand-alone configurations, in a more traditional FPGA-based prototyping role. The potential for SoFPGA as a processing platform is fascinating, and we’re excited to see where cloud interconnect, AXI-based IP, and system level approaches like DPI can take applications.

More articles from Don…