

DAC versus SEMICON ES Design West!
by Daniel Nenni on 12-19-2018 at 12:00 pm

As I mentioned in a previous post, the big drama at last year’s Design Automation Conference was the acquisition of the Electronic Systems Design Alliance (formerly EDAC) by SEMI, the owner of the SEMICON West conference franchise. The plan is to add an ES Design West wing to SEMICON West in San Francisco next year. DAC is in June and SEMICON West is in July, thus the conflict. Given that EDAC was a big DAC supporter, some consider this a treasonous act, which makes it all the more entertaining.

The timing is right to launch a competitive conference in San Francisco because in 2019 DAC will be in Las Vegas, a location criticized for, among other things, a lack of local technical community support. DAC has been in Las Vegas twice over the last 35 years, if my memory serves. The first time was in 1985. It was memorable not only as my second DAC but also because I was newly married and my beautiful bride joined me. It really was an exciting time in EDA and Las Vegas is an exciting venue, absolutely.

In fact, my beautiful wife and I returned to Las Vegas 30 years later to renew our wedding vows. Funny story: we actually went to Las Vegas on our anniversary to see Elton John, and I had planned on surprising her with a quick chapel reenactment by an Elvis impersonator, but the hotel had a last-minute wedding cancellation so we got a large room with all of the trimmings. My wife was duly impressed, Las Vegas baby!!!

Here is the official ESDA announcement:

Oct 24, 2018: ESD Alliance Announces ES Design West Debut in Conjunction with SEMICON West 2019 in San Francisco

One thing I can say about the Design Automation Conference organizers is that they listen, and last week is proof. DAC will be in San Francisco from 2020-2025 and probably beyond. Talk about returning fire, wow! Truthfully, I will miss the exotic DAC venues like Las Vegas, New Orleans, San Diego, Los Angeles, Austin, Miami, Dallas, Albuquerque, and other locations that I attended but can’t recall.

But I do agree that the San Francisco DAC provides the most value add for EDA and IP vendors exhibiting their wares, absolutely. The question is: Will there be enough demand for two Design Automation Conferences in San Francisco a month apart in 2020 and beyond? You tell me in the comments section and I will add my thoughts.

Here is the complete DAC PR:

The Design Automation Conference Secures a Five-Year Conference Location at San Francisco’s Moscone Center

The world’s premier event devoted to the design and design automation of electronic chips to systems returns to the Bay Area starting June 2020 and beyond

LOUISVILLE, Colo. – December 13, 2018 – Tapping into a resurgent interest in electronic design automation and the proximity to Silicon Valley, sponsors of the Design Automation Conference (DAC) announced they will hold the annual event, now in its 56th year, in San Francisco for five consecutive years, starting in 2020.

DAC’s sponsors – the Association for Computing Machinery’s Special Interest Group on Design Automation (ACM SIGDA), and the Institute of Electrical and Electronics Engineer’s Council on Electronic Design Automation (IEEE CEDA) – secured the following dates at San Francisco’s Moscone Center to hold their annual event.

  • DAC 2020, June 17 – 26 – North and South Hall
  • DAC 2021, June 23 – July 2 – West Hall
  • DAC 2022, June 15 – 24 – North and South Hall
  • DAC 2023, June 21 – 30 – North and South Hall
  • DAC 2024, June 19 – 27 – West Hall

After 55 years, DAC continues to be the world’s premier event devoted to the design and design automation of electronic chips to systems, where attendees learn today and create tomorrow. A rise in attendance and participation at the 2018 DAC in San Francisco spurred the sponsors to return to the city after the 2019 event, which will be held in Las Vegas, June 2-6. More than 6,000 people attended and more than 170 companies exhibited at DAC 2018, held at Moscone West.

DAC is the only event for top-notch researchers to present their cutting-edge discoveries, the platform for leading field engineers to share their experiences in using EDA (electronic design automation) tools to tame the ever-growing circuit and system design monsters, and the largest exhibition of EDA tools, software, IP cores, and other related products and services. DAC has been and will continue to be the must-attend event for the design and design automation community.

“We attend DAC to build our knowledge of what is possible so that we can continue to innovate and stay at the leading edge of design,” stated Chris Collins, senior vice president, products & technology enablement at NXP. “DAC has been and always will be a place we look to for such guidance. It is a place where we can meet with pioneers, innovators, and solutions providers to understand the technology and provide feedback on the technology that will drive our next products.”

The abstract submission deadline for DAC 2019 technical papers closed November 20, 2018, with a record 1,049 abstracts received and 819 papers accepted for review. The number of papers accepted for review for DAC 2019 surpasses each of the last five years by approximately 19%. Among the hot topics for 2019 are submissions in areas such as machine learning and artificial intelligence architectures, which increased by 61%.

“DAC continues to evolve to satisfy the needs of our industry. It’s always been the premier place for presenting EDA research and showcasing all vendors under one roof,” said John Busco, director, logic design implementation at NVIDIA. “In recent years, it’s added tracks to satisfy the interests of working designers and IP consumers. DAC brings together academia, commercial EDA, and electronic system designers—a true cross-section of semiconductor design. As technology progresses, and both our challenges and opportunities multiply, DAC offers an ideal forum to explore, exchange ideas, and innovate.”

The call for contributions for the 56th DAC in Las Vegas is now open for the Designer Track and IP Track. The submission deadline is January 15, 2019. For more information visit: https://www.dac.com/submission-categories/designer-track

About DAC
The Design Automation Conference (DAC) is recognized as the premier event for the design of electronic circuits and systems, and for electronic design automation (EDA) and silicon solutions. A diverse worldwide community representing more than 1,000 organizations attends each year, ranging from system designers and architects, logic and circuit designers, validation engineers, CAD managers, and senior managers and executives to researchers and academicians from leading universities. Close to 60 technical sessions selected by a committee of electronic design experts offer information on recent developments and trends, management practices and new products, methodologies and technologies. A highlight of DAC is its exhibition and suite area with approximately 175 of the leading and emerging EDA, silicon, intellectual property (IP) and design services providers. The conference is sponsored by the Association for Computing Machinery’s Special Interest Group on Design Automation (ACM SIGDA) and the Institute of Electrical and Electronics Engineers’ Council on Electronic Design Automation (IEEE CEDA).

Design Automation Conference acknowledges trademarks or registered trademarks of other organizations for their respective products and services.



Ampere: More on Arm-Based Servers
by Bernard Murphy on 12-19-2018 at 7:00 am

Since I talked recently about AWS adding access to Arm-based server instances in their cloud offering, I thought it would be interesting to look further into other Arm-based server solutions. I had a meeting with Ampere Computing at Arm TechCon. They offer server devices and are worth closer examination as a player in this game.


First, the people at Ampere are heavy hitters. Start with Chairman and CEO Renee James, a past president of Intel. The CFO/COO is ex-Apple and Intel, and almost everyone else is ex-Intel, either immediately before joining or at some point in the past, including the architect and the VP of engineering, all with solid server backgrounds. I’ve also heard that they are raiding Marvell/Cavium for talent. I met with Matt Taylor, SVP of worldwide sales and business development. Between Intel and Ampere, Matt was VP of sales for Qualcomm’s datacenter group. All in all, a pretty impressive lineup for a business targeting the cloud space. The company is funded by the Carlyle Group (first round), though no word on how much.

I had to ask Matt for his view on why QCOM exited servers. No real surprises, but good to hear from an insider. He said the business opportunity was strong (well, he would), but QCOM was distracted (just a bit). Paul Jacobs and Derek Aberle, who were supporters, left, and QCOM had to cut $1B, for which the datacenter group was an easy target. Multiple reasons, fairly specific to QCOM, none of which really says anything about the general Arm-based server opportunity.

Ampere is going after the same target as Annapurna (AWS), except Ampere isn’t captive, so it is aiming at all the top-end cloud providers (the hyperscalers/super 8) – Google, Amazon, Microsoft, Facebook, Baidu, Alibaba, Tencent, and China Mobile – all of whom buy servers by the railcar load.

On specs, Matt offered that in current 16nm implementations the Ampere eMAG solution is comparable to Xeon Gold devices at half the cost, and to Epyc devices at half the power. Side-note on power: some analysts think cloud users won’t care – they just pay for usage time, so performance should be the only metric that matters. Wrong – power contributes significantly to total datacenter overhead through the cost of keeping the whole thing cooled. Your bill as a user is part runtime (and price) on the instance type you choose and part overhead, including cooling costs. So yeah, power matters, even though it’s an indirect cost.
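
A rough way to see that indirect cost, with purely illustrative numbers (the PUE, power figures, electricity price, and amortized hardware cost below are assumptions, not Ampere or AWS data):

```python
# Hypothetical illustration of why power shows up in a cloud bill even when
# users only "pay for time": facility overhead (cooling, power distribution)
# scales with server power and is commonly summarized as PUE
# (Power Usage Effectiveness).

def hourly_cost(server_power_w, pue=1.5, energy_price_kwh=0.10,
                amortized_hw_cost_hr=0.25):
    """Rough hourly cost of running one server (all inputs illustrative)."""
    facility_kw = server_power_w * pue / 1000.0   # server draw plus overhead
    return amortized_hw_cost_hr + facility_kw * energy_price_kwh

# Two hypothetical servers with equal performance but different power draw.
print(f"400W server: ${hourly_cost(400):.3f}/hr")
print(f"200W server: ${hourly_cost(200):.3f}/hr")
```

The lower-power box carries less cooling and distribution overhead on every billable hour, which is exactly the kind of saving a cloud provider can pass through as a lower instance price.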

Lenovo recently released its ThinkSystem HR350A rack server based on the eMAG processor, so it’s already possible to deploy servers based on these devices. Just like Arm, they stress scale-out applications (highly parallel operations like video serving, where it is easy to add more processors to handle more parallel requests) and similar applications where performance per dollar and performance per watt are important considerations.

Matt told me that they are at various stages (from eval to deployment) with the big cloud service providers and are hearing similar themes on workload trends well suited to Arm-based servers, including storage, internal and external search, content delivery, in-memory database applications and (interestingly, in China) mobile gaming with cloud-based rendering. Some of these are accelerator options, but he also stressed standard server applications with differentiated capabilities that you couldn’t easily get on the usual platforms. Sadly, he didn’t want to share specific examples.

Overall, sounds very consistent with the Arm story I wrote about earlier. Arm-based servers may not be as fast, unit for unit, as the best of the best from Intel and AMD but (a) they’re a lot cheaper and lower power than those options and (b) you can build your own customized solutions optimized to higher throughput per dollar/watt for specific workloads. In some pretty high traffic datacenter applications, the best of the best may not always be the best total system solution.



SoC Design Partitioning to Save Time and Avoid Mistakes
by Daniel Payne on 12-18-2018 at 12:00 pm

I started designing ICs in 1978 and continued through 1986. Each chip used hierarchy and partitioning, but our methodology was totally ad hoc and documented on paper, so it was time-consuming to make revisions to a chip or train someone else on its history, let alone re-use any portion of our chips again. Those old, manual ways of doing chip design are happily far behind us now, so much so that recent smartphone chips routinely have processors with billions of transistors and massive amounts of semiconductor IP reuse, all enabled by more modern and automated IC design flows. This blog idea springs from information gleaned from a white paper written by Methodics, a software company founded in 2006 and headquartered in San Francisco. The big-picture view at Methodics is to model your entire SoC as related sets of functional blocks, then automate the workflow to ensure that your chip design is consistent and that changes and dependencies are easy to update and communicate.

Here’s a picture of what they call an IP configuration and how it maintains multiple relationships to design data and versions:

The specific software tool from Methodics is called Percipient, and using this IP configuration approach you can do top-down designs more easily, because along the way the tool tracks the content of each IP and the hierarchical relationships between them. These IP objects and relationships can be quickly captured at the very start of a project, even before the design details are ready. Everyone on the design team can visualize where their part of the project sits in the hierarchy and what its dependencies are going to be. Metadata is attached to each IP, so for example a Bluetooth IP block may require a specific PDK version from your foundry of choice, and you can quickly determine if all IP blocks are compatible with that PDK version.
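
To make the idea concrete, here is a minimal sketch (plain Python, not the Percipient API) of an IP hierarchy with attached metadata and a check for PDK compatibility; the block names and PDK labels are hypothetical:

```python
# Toy model of the approach described above: each IP carries metadata such as
# a required PDK version, and the whole hierarchy can be checked for
# consistency before detailed design work begins.

class IP:
    def __init__(self, name, pdk=None, children=None):
        self.name = name
        self.pdk = pdk                  # e.g. required foundry PDK version
        self.children = children or []

    def walk(self):
        yield self
        for child in self.children:
            yield from child.walk()

def pdk_conflicts(top, target_pdk):
    """Return names of IP blocks whose metadata conflicts with the chosen PDK."""
    return [ip.name for ip in top.walk()
            if ip.pdk is not None and ip.pdk != target_pdk]

# Hypothetical SoC hierarchy captured before the design details exist.
soc = IP("soc_top", children=[
    IP("cpu_subsystem", pdk="N7_v1.2"),
    IP("bluetooth_ip", pdk="N7_v1.1"),   # older PDK requirement
    IP("sram_wrapper"),                  # no PDK constraint yet
])

print(pdk_conflicts(soc, "N7_v1.2"))     # -> ['bluetooth_ip']
```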

In the first diagram there’s a blue area showing that IP can be imported for re-use from many Data Management (DM) sources:

  • Perforce
  • git
  • Subversion
  • Custom

If your particular DM system isn’t listed, then just contact Methodics to see if they’ve already got an import available. The files in your DM can be primarily binary, text or a mixture of the two, so it’s your choice and there’s no restriction on DM type or how you make relationships between IPs of each type.

Workspaces are used to save specific configurations of your own choosing. Changes to an IP and its metadata can then be saved as a release, and each release captures the relationships between all IPs in your hierarchy at that one point in time. With any particular release you can run simulation and functional verification, and the results are attached to that release. Everyone on your team can be notified when a new release happens on some IP.

There are even third-party integrations with requirements management and bug-tracking tools, so team members always know, for each IP, its associated requirements along with any bug reports. Here’s another diagram to show how an IP configuration connects with other tools in your IC flow:

So with the Percipient methodology you can go to one place and find all the information about your electronic system, from the top level all the way down to the lowest block levels. You will know where each block is being used and how often it is being re-used, along with its requirements and performance, plus the history of changes made to it and by whom. Searching through the Percipient catalog is quick and easy, so it takes a lot of the guesswork out of complex IC design projects.

Projects that need to comply with Functional Safety (FuSa) will enjoy the traceability features built-in to Percipient, so that you can validate every safety function automatically at each release. Another benefit to automating FuSa compliance is that user responses to questionnaires can be attached to specific IPs, and then managed throughout the design hierarchy.

OK, this sounds promising so far, but how do I know how best to partition my specific design with this tool? The best practice is to place anything that could be re-used into a functional block as its own IP object. A functional block can contain sub-blocks too; here’s another example of a hierarchy:

Your design team is typically composed of groups, and each group can be responsible for its own releases. The best practice is to release early and often as progress is made and milestones are reached. Both producers and consumers of IP blocks use the Percipient tool; a producer may be most interested in the latest version, while a consumer could be more interested in using a fixed version that doesn’t change until they request an update. The producers are doing design work, running simulations and validations, and reaching some quality goal, and then they make a new release, alerting consumers that a new version is ready to consume. All team members are in the loop and quickly learn to choose the proper release.

Conclusion
Your SoC projects can be quite complex, containing terabytes of data, so consider the benefits of using a proven, modern system to manage your IP with traceability, quickly and easily. Just look in one place to know the state of your design, while avoiding communication mistakes that could cost you an expensive silicon spin. The complete seven-page white paper can be read here.




Cadence Automotive Summit Sensor Enablement Highlights
by Camille Kokozaki on 12-18-2018 at 7:00 am

At the November 14 Cadence Automotive Summit, Ian Dennison, Senior Group Director, outlined sensor enablement technologies and SoC mixed-signal design solutions, from Virtuoso electrically aware design to high-current, high-reliability, yield and performance tools and methodologies, all enabling ADAS/AV sensors for vehicle perception.

An ADAS/AV camera system was described as containing, on the transmit side, a Cadence Ethernet MAC, a BroadR-Reach PHY, filters, cables, and connectors to a decision-making board on the receiving side, with accompanying simulation-based EMI verification, modeling, PCB power integrity analysis, and S-parameter models to ensure incoming data is successfully received without resending.

The automotive Ethernet MAC IP is available at 10Mbps, which replaces CAN and FlexRay; at 100Mbps, which still demands image compression; and at rates above 1Gbps, which avoid the need for image compression for the highest image quality and best object classification. It also includes DMA, APB configuration interface registers, and a time stamp unit for Time Sensitive Networking to ensure camera data is not delayed. The automotive Ethernet MAC has received ASIL-B ready certification under the automotive ISO 26262 standard.
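
A quick back-of-the-envelope check shows why the lower rates force compression; the camera resolution, bit depth, and frame rate below are assumptions for illustration, not figures from the Cadence presentation:

```python
# Raw bandwidth of a single uncompressed camera stream versus link rates.

def raw_video_mbps(width, height, bits_per_pixel, fps):
    return width * height * bits_per_pixel * fps / 1e6

stream = raw_video_mbps(width=1920, height=1080, bits_per_pixel=12, fps=30)
print(f"Raw stream: {stream:.0f} Mbps")              # ~746 Mbps

for link_mbps in (100, 1000, 10000):
    verdict = "fits uncompressed" if stream <= link_mbps else "needs compression"
    print(f"{link_mbps:>5} Mbps link: {verdict}")
```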

ASIL compliance requires a quality management process and certification, a safety manual for the SEooC, a safety features description, failure mode, effects and diagnostic analysis, and automotive safety kits for tools and flows. In one safety mechanism, implemented with Innovus, the time stamp unit (TSU) block is duplicated and the timer outputs are compared on a cycle-by-cycle basis to detect any faults. Other considerations exist, like creating safety boundaries where internal nets are kept inside each TSU and interface nets are not routed over them, to avoid common-mode failures in the duplicated TSUs.
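
A minimal behavioral sketch of that duplicate-and-compare safety mechanism (plain Python, not RTL or the Innovus flow; the fault-injection point is contrived for illustration):

```python
# Two copies of a time stamp counter run in lockstep; their outputs are
# compared every cycle and any divergence raises a fault.

class TimeStampUnit:
    def __init__(self):
        self.count = 0

    def tick(self):
        self.count += 1
        return self.count

def run_lockstep(cycles, inject_fault_at=None):
    primary, shadow = TimeStampUnit(), TimeStampUnit()
    for cycle in range(cycles):
        a, b = primary.tick(), shadow.tick()
        if cycle == inject_fault_at:
            b += 1                       # model a transient upset in one copy
        if a != b:
            return f"fault detected at cycle {cycle}"
    return "no fault"

print(run_lockstep(1000))                       # no fault
print(run_lockstep(1000, inject_fault_at=42))   # fault detected at cycle 42
```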

ISO 26262 SoC design compliance is ensured with the functional safety methodology described in the diagram. Some requirements for proper classification include continuous checks on the image sensor so that failure signals can be raised within two frames. On-chip checkers are placed inside the chip to identify analog or digital functional failures that can result in image sensor row, column, ADC, and clock failures. Some types of image sensor failures rely on downstream DSP processes to be properly detected.

Several CIS ADAS/AV considerations determine object classification success, driven primarily by moving-vehicle image quality. CIS ADAS/AV issues include the high dynamic range (HDR) needed for bright/dark conditions, vehicle-motion-induced rolling shutter distortion, rolling shutter flicker mitigation for LED street and vehicle lighting, real-time shutter compensation, noise vulnerability, moving-vehicle stabilization with gyroscope fusion, and finally cost in a price-sensitive automotive market.

A complete simulation platform for CIS analysis uses the ADE Product Suite and the Spectre family of simulators.

Designing the needed CIS ADC high dynamic range includes considering the CIS fps/shutter speed, which sets the ADC conversion rate, and the CIS dynamic range, which sets the ADC resolution (60 dB, a range of 1000:1, means a 10-bit ADC). The Cadence methodology characterizes the ADCs in the presence of temporal noise.
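
The arithmetic behind that parenthetical, as a small sketch:

```python
import math

# 60 dB of dynamic range corresponds to a 1000:1 ratio (20*log10(1000) = 60),
# and resolving 1000 distinct levels requires ceil(log2(1000)) = 10 bits.

def adc_bits_for_dynamic_range(dr_db):
    ratio = 10 ** (dr_db / 20.0)          # dB (amplitude) -> linear ratio
    return ratio, math.ceil(math.log2(ratio))

ratio, bits = adc_bits_for_dynamic_range(60)
print(f"{ratio:.0f}:1 range -> {bits}-bit ADC")   # 1000:1 range -> 10-bit ADC
```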

Uniformity of the CIS arrays is essential for proper design-in of electrical performance and reliability. Cadence Virtuoso electrically-aware design offers on-screen, real-time parasitic and resistance analysis with colormaps, voltage drop summaries, and electromigration current flow, so these effects can be considered during analysis and design.

Lidar uses several technologies: CMOS in SoCs for the controller, Ethernet, and image sensors; MEMS for the scanning mirror; silicon photonics; III-V materials for the laser source; and system-in-package. The drive is towards low-cost, small form-factor lidar for automotive, medical, and industrial applications.

The end of Moore’s law is driving disaggregated SoCs where packaging is the glue between different die, and where thermal integrity, AC coupling, losses, reflections, crosstalk, warping mitigation, and electromagnetic integrity need to be comprehended and dealt with.

Silicon photonics for frequency-modulated continuous wave (FMCW) automotive lidar requires 10 cm depth precision, addressed with tighter control of laser modulation and an electro-optical phase-locked loop (PLL). A MEMS tunable laser produces a source that is split into two waveguides; one is sent to the target, and the return signal is blended with the reference. Because the FMCW frequency is changing all the time, this signature generates a beat frequency.
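
For a sense of the numbers, here is a sketch of the standard FMCW relations (generic textbook formulas, not Cadence’s flow; the target range and chirp time are assumed values):

```python
# Range resolution is set by the chirp bandwidth, and the returned echo mixed
# with the outgoing chirp produces a beat frequency proportional to distance.

C = 3.0e8   # speed of light, m/s

def required_chirp_bandwidth(range_resolution_m):
    return C / (2 * range_resolution_m)             # Hz

def beat_frequency(target_range_m, chirp_bandwidth_hz, chirp_time_s):
    slope = chirp_bandwidth_hz / chirp_time_s       # Hz per second of sweep
    return 2 * target_range_m * slope / C           # Hz

bw = required_chirp_bandwidth(0.10)                 # 10 cm depth precision
print(f"Chirp bandwidth needed: {bw / 1e9:.1f} GHz")   # ~1.5 GHz

fb = beat_frequency(target_range_m=50, chirp_bandwidth_hz=bw, chirp_time_s=10e-6)
print(f"Beat frequency for a 50 m target: {fb / 1e6:.0f} MHz")
```

Hitting 10 cm precision implies sweeping the optical frequency over roughly 1.5 GHz, which is why tight control of the laser modulation and the electro-optical PLL matter.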

Silicon photonics and MEMS co-design are enabled with Spectre APS and AMS Designer, Virtuoso ADE, schematic and layout suites, along with tools from partners like Lumerical and Coventor.

Cadence’s Legato Reliability Solution has a design-for-reliability approach that extends the lifetime of chips. When a failure occurs, functional safety kicks in to stop the car, but tools are needed to support a design-for-reliability mindset, where analog defect analysis reduces test cost and eliminates test escapes, electro-thermal analysis prevents the thermal overstress that causes premature failures, and advanced aging analysis accurately predicts product wear-out.

In ADAS radar sensor design, antenna sizes are shrinking allowing on-chip integration. A 122GHz radar includes a low-noise amplifier (LNA), power amplifier, mixer, and two on-chip antennas.

The Virtuoso RF Solution allows multi-fabric RF in PCB, SiP and SoC and interfaces with Spectre RF, Allegro Sigrity and National Instruments’ Axiem. An ADAS radar transceiver design was illustrated showing stretchable transmission lines with pCells, matched RX & TX antennas, Spectre RF and Virtuoso ADE Assembler showing the noise figure, input matching, gain, and stability.

Virtuoso RF allows a layered extraction of modules with EM solvers using QRC (a parasitic extractor), Sigrity PowerSI (a 3D-EM solver), and NI’s Axiem (a 2.5D solver for planar elements). The Sigrity PowerSI 3D-EM has an RF-module package extraction and critical path S-parameter model extraction for layered structure designs (on-chip, package, and PCB).

Datacenters are well suited for labeling training datasets, with a training engine run only once per dataset, versus an inference engine that runs on every image from the various sensors onboard the vehicle, which in turn feeds new data back to the datacenter.

The labeling of the datasets generates a set of coefficients with various weights pushed to the car, which then does a single pass evaluation on the image and generates the most probable label for proper decision making.

System and software design follow a spectrum, starting from workstation simulation with no specialized hardware, to Xcelium parallel simulation with hardware running at about 1KHz for software execution, moving to Palladium Z1 emulation with hardware running at ~1MHz, then to Protium S1 FPGA prototyping at ~10MHz, and finally to first silicon on a prototype board. This allows development of the OS, middleware, firmware, and drivers in parallel with hardware-based simulation, accelerating functional verification. An early start to software development for the new hardware accelerates time-to-market.
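
To put that speed ladder in wall-clock terms, here is a rough sketch assuming an illustrative one-billion-cycle software workload (the cycle count and the ~1GHz silicon clock are assumptions, not figures from the presentation):

```python
# Wall-clock time to execute the same workload at each platform speed.

workload_cycles = 1_000_000_000   # assumed software workload

platforms = {
    "Xcelium simulation (~1 kHz)":       1e3,
    "Palladium Z1 emulation (~1 MHz)":   1e6,
    "Protium S1 prototyping (~10 MHz)":  1e7,
    "First silicon (~1 GHz, assumed)":   1e9,
}

for name, hz in platforms.items():
    seconds = workload_cycles / hz
    if seconds >= 86400:
        print(f"{name:<36} {seconds / 86400:6.1f} days")
    elif seconds >= 3600:
        print(f"{name:<36} {seconds / 3600:6.1f} hours")
    else:
        print(f"{name:<36} {seconds:8.1f} seconds")
```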

The Cadence design enablement allows system and DSP design, advanced node SoC development, MEMS and Silicon photonics implementation, SiP integration and CNN software development, all in one interoperable environment that greatly enhances sensor design and opens design fabrics and opportunities leveraging improved accuracy, decision making, and reliability.

Read more here: Automotive Summit 2018 Proceedings



Photonics with CurvyCore
by Alex Tan on 12-17-2018 at 12:00 pm

As a preferred carrier of data or energy, photonics technology is becoming broad and diverse. In IC design, silicon photonics has been an enabler of new capabilities and has revolutionized many applications as Moore’s-law-based scaling started to slow down. It serves as a new kind of on-chip interconnect in HPC design and provides fast connectivity in network infrastructure.

At the Cadence Photonics Summit and Workshop 2018 held in San Jose last month, Cadence showcased its CurvyCore Infrastructure, a new technology intended for photonics applications. It is a native infrastructure in the Cadence Virtuoso custom IC design platform, allowing designers to create and edit complex curvilinear shapes common in photonics, RF, MEMs, microfluidics and conformal metal routing.

The State of Photonics Technology
The CurvyCore technology addresses markets and technologies spanning silicon photonics switches and interconnects for HPC/datacenter, medical sensing applications, LiDAR, aerospace, MEMS, and carbon nanotube conductors. According to Dr. Vladimir Stojanovic from UC Berkeley, who gave a keynote at the Cadence 2018 Photonics Summit, current photonics integration with advanced electronics leverages CMOS transistor performance, process fidelity, and package integration to enable emerging SoCs for applications ranging from computing to sensing and imaging.

Based on his team’s research, the sweet spot for a “zero-change” silicon photonics platform used in this monolithic integration technology is either a 45nm or a 32nm SOI CMOS process – both are suitable for adding photonic capability and enhancing integrated system applications, such as the main communication links for computing tasks, without involving complicated 3D integration efforts or double patterning for EUV. Figure 1 captures the optical I/O landscape as well as the application of photonics to a RISC-V microprocessor and DRAM.


Silicon photonics for fast interconnects has also evolved from a datacenter, off-chip-centric role to a more integrated, on-chip feature in both microprocessor and PIM (photonic-interconnected DRAM) designs. The open-source RISC-V microprocessor with a photonic on-chip interconnect was first implemented as a single electronics-optics hybrid chip, a dual-core 1.65GHz processor in 45nm SOI CMOS.

Challenges to Silicon Photonics
Embracing silicon photonics is an evolving process, as handling curvilinear physical shapes is challenging. Unlike traditional Manhattan polygons, custom design of curvilinear geometries is prone to misalignment, roundoff errors and manufacturing problems.
It is an effort-intensive undertaking, as the associated PCell creation is cumbersome and time consuming. Additionally, the lack of a common infrastructure leads to many ad-hoc, non-replicable and sub-optimal flows – translating into complex DRC/LVS problems to fix. All of this drives the need for a robust platform to address the overall physical design automation.

CurvyCore Technology and Its Key Benefits
For optimal performance, the CurvyCore technology has been natively implemented in the Virtuoso platform. The CurvyCore infrastructure has a three-tier data model, is an extension of the Virtuoso advanced-node platform, and sits on a high-performance symbolic mathematical engine.

As part of the expanded Virtuoso data model, the CurvyCore infrastructure provides full access to all levels of design capture – from building blocks down to the actual symbolic expressions – enabling the creation and maintenance of differentiated curvy IP. For example, figure 3 shows phase shifters for a LiDAR (Light Detection and Ranging) application comprising orthogonal polygons for the electrical connections and curvy geometries for the optical interconnect – both of which can be concurrently viewed and manipulated.

The diagram in figure 4 illustrates the CurvyCore data model: it starts with the mathematical core, which consists of an accurate mathematical representation through symbolic equations; followed by the second layer (in magenta), which captures curve geometries discretized into their equivalent shapes; and the top, physical layer (in pink), which contains the layout polygons as OA shapes. The combination of curvilinear discretization, boolean and sizing operations enables design rule fixing for complex photonic shapes.
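
The middle, discretization layer is the easiest to picture with a toy example (plain Python, not the CurvyCore API): a curved waveguide defined symbolically by center, radius, width, and sweep angle is sampled into a closed polygon that a layout database can store.

```python
import math

# Discretize a curved waveguide (an annular arc) into a closed polygon.

def arc_waveguide(cx, cy, radius, width, start_deg, end_deg, segments=32):
    """Return polygon vertices approximating a curved waveguide of given width."""
    outer, inner = [], []
    for i in range(segments + 1):
        theta = math.radians(start_deg + (end_deg - start_deg) * i / segments)
        outer.append((cx + (radius + width / 2) * math.cos(theta),
                      cy + (radius + width / 2) * math.sin(theta)))
        inner.append((cx + (radius - width / 2) * math.cos(theta),
                      cy + (radius - width / 2) * math.sin(theta)))
    return outer + inner[::-1]   # closed boundary: outer edge, then inner edge reversed

poly = arc_waveguide(cx=0, cy=0, radius=10.0, width=0.5, start_deg=0, end_deg=90)
print(len(poly), "vertices for a 90-degree bend")   # 66 vertices
```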

Aside from enabling the creation and editing of complex curvilinear shapes, the CurvyCore integration with the Cadence Virtuoso custom IC design platform leverages a unified design environment for the development of multi-fabric systems. It adds new APIs for creating complex PCells and an efficient data model to support high-performance editing and storage of curvilinear shapes within the Virtuoso design platform.

CurvyCore also supplements the Virtuoso Layout Suite and works seamlessly with the most advanced Virtuoso features, allowing true co-design and integration of electronics and curvilinear features. For example, during the Cadence Photonics workshop attendees were given the opportunity to use the Virtuoso custom IC design platform to view or edit a LiDAR photonics IC and to perform co-simulation of beam steering using Spectre® AMS Designer, MATLAB and Lumerical INTERCONNECT as part of a test-drive of the CurvyCore infrastructure implementation in the Virtuoso platform.

The CurvyCore technology is planned for general availability in Q1-2019.

For more about CurvyCore check HERE and for the Cadence 2018 Photonics Summit check HERE.



Intel Discontinues the Custom Foundry Business!
by Daniel Nenni on 12-17-2018 at 7:00 am

After mentioning as an aside in a SemiWiki forum discussion what I heard at IEDM 2018 – that Intel was officially closing the merchant foundry business – I got a lot of email responses, so let me clarify. Honestly, I did not think it was a big surprise. Intel Custom Foundry was an ill-conceived idea (my opinion) from the very start and was not successful by any measure. To be clear, it is not something I just heard; it is something I have verified through multiple sources, so I believe it to be true, absolutely.

Just a little background: we started blogging about Intel in the early days of SemiWiki and have posted 202 Intel-related blogs that, as of today, have been viewed 2,822,613 times, an average of 13,973 views per blog. Big numbers in the semiconductor blogging world, in my experience. Intel has a very large group of entrenched supporters and even more naysayers, neither of which are easily swayed, so there are plenty of blog comments, some of which had to be deleted. My argument against Intel opening up its leading-edge manufacturing facilities to the fabless community was that it would be a distraction from Intel’s core competency of making microprocessors. As we know, ecosystem is everything in the foundry business, and that takes time, money, and technical intimacy, three things that Intel seemed to greatly underestimate.

Also read: Intel Custom Foundry Explained!

Altera was the big win for the Intel Custom Foundry business. I was having coffee with a friend in TSMC Fab 12 when it was announced. If my memory serves, it was Dr. Morris Chang who made the announcement, and it honestly felt like parents divorcing. It was mentioned that TSMC viewed this as a learning experience and would make sure that losing an intimate partner like Altera would never happen again.

Also read: Apple will NEVER use Intel Custom Foundry!

Altera was founded in 1984, the same year I started my semiconductor career. Some of my school friends joined Altera and I worked with Altera as a customer during my EDA and IP career down to 20nm so I had a front row seat. It was a very close relationship between Altera and TSMC up until Xilinx came to TSMC at 28nm. TSMC gave Xilinx equal access which soured the Altera relationship. Altera then moved to Intel at 14nm which led to the acquisition at a premium price.

One of the funniest stories I heard was about the first copy of the Intel 14nm design rules Altera got from Intel. They were heavily redacted, which is something I had never seen in the foundry business. After many delays Intel put its own implementation team on the first 14nm Altera tapeout and the result was a very competitive FPGA chip. If not for the continued delays, Xilinx would have been in serious trouble, as the Intel 14nm FPGA, based on my experience with customers, beats the Xilinx 16nm in both density and performance.

You can see the 2014 Intel Custom Foundry pitch HERE. Great intentions, good effort, too many broken promises, but doomed from the very beginning, my opinion.



Next-Generation Formal Verification
by Daniel Nenni on 12-14-2018 at 12:00 pm

As SoC and IP designs continue to increase in complexity while schedules accelerate, verification teams are looking for methodologies to improve design confidence more quickly. Formal verification techniques provide one route to improved design confidence, and the increase in papers and interest at industry conferences like DVCon and DAC reflects the growing usage of formal verification tools in the industry. Despite this increase in usage, there are still few opportunities for verification engineers interested in formal techniques to exchange ideas, knowledge, and best practices.

The Synopsys VC Formal Special Interest Group (SIG) events are a step towards broadening knowledge of formal verification. In the inaugural year of the VC Formal SIG, Synopsys held events in India, Japan, and the United States. At all three events, VC Formal customers shared their experience and successes in using VC Formal to address verification problems, alongside Synopsys AEs who presented new or advanced applications of formal verification. The most recent event was held in November in Santa Clara, California, with a keynote from Sean Safarpour and Pratik Mahajan of Synopsys on the history and future of formal verification. The Santa Clara event showcased experts from AMD, ST Microelectronics, Qualcomm and Juniper Networks, highlighting the use of formal verification to solve challenging verification problems.

Formal Sign-Off of a Control Unit (AMD)
Wayne Yun discussed the verification of a complex control block using formal methods. The block included an AHB arbiter, microprocessor, data collector, accumulator, and glue logic. First, each block was considered in isolation. Formal techniques such as formal model creation, case splitting, invariant identification, and symbolic variables were applied to each sub-block. Assume-guarantee reasoning validated assertions on interfaces between sub-blocks. He also discussed how AMD used the coverage collection, overconstraint analysis, and fault injection features of VC Formal to sign off the design. VC Formal’s formal core analysis and fault injection identified several areas that required additional assertions. At the end of the project, over 94% of assertions were proven, all blocks had full formal core coverage, and most blocks detected over 99.8% of injected faults.


Formal Verification of a GPU Shader Sequencer (AMD)
Chirag Dhruv and Vaibhav Tendulkar showed the benefits of a bug-hunting effort with the VC Formal FPV App on a GPU shader sequencer. A wide variety of parallel instructions, asynchronous events, and dynamic configuration changes make simulation coverage closure difficult in this block. Sub-block decomposition allowed quick bug exposure and rapid iteration time. Formal reachability analysis, COI coverage analysis, and formal core coverage analysis identified dead code and design areas requiring more assertions. Formal verification found over 30 RTL bugs, several of which would have been difficult to discover through simulation, absolutely.

Accelerate Digital IP Formal Verification with Machine Learning Technology (ST Microelectronics)
Giovanni Auditore described a shift of verification resources within ST towards formal verification. They used Verdi Planner to combine the formal and simulation verification plans, allowing a single view of verification progress as formal use increased from project to project. A variety of VC Formal Apps were applied to the design, including FCA for unreachability analysis, FRV for register verification, and FPV for property and protocol verification. Giovanni also described how the Regression Mode Acceleration (RMA) feature of VC Formal sped up formal regressions. RMA uses machine learning techniques to accelerate proof time on future runs of the same or incremental versions of the RTL. After applying RMA learning to an initial release of RTL, proofs for subsequent RTL releases took one-third the time of identical regressions run without RMA. RMA also reduced the runtime of fault injection qualification from 21 to 16.5 hours.

Verification Sign-Off with Formal (Qualcomm)
Anmol Sondhi shared how to layer the various coverage metrics available in VC Formal to build confidence in assertion quality throughout the design cycle. Early in the project, cone-of-influence-based property density identifies testing holes in the design. Unreachability analysis through the FCA App and overconstraint analysis identify areas of the design that formal stimulus won’t reach, allowing targeted review of RTL and formal constraints. Formal core coverage represents the logic that the formal engines use to prove a property; uncovered areas represent potential test holes. Finally, fault injection identifies whether modifications to RTL behavior trigger assertion failures. He also showed how RMA resulted in between 2X and 10X improvement in regression runtime. Using VC Formal, the example project achieved over 99.5% property density and 90% formal core coverage and identified only 12 areas of undetected faults that required further investigation.

Designing for Formal Verification (Juniper Networks)
Anamaya Sullerey explained how RTL designers can be involved in formal verification through design methodology and short, frequent formal regressions of the RTL. He described how changing an event-driven implementation, with a complicated state machine and complex, interacting side effects, into a functionally driven implementation with many small modules that perform simple tasks can simplify and accelerate formal verification. Efficient decomposition of the design allows for meaningful sub-block formal verification regressions of no more than five minutes. Other recommendations for formal-friendly design include early parameterization of RTL code, isolating complex blocks into separate modules for easy abstraction, creating meaningful intermediate expressions, and coding assertions for design invariants such as one-hot bit vectors. With a high level of formal-friendly design methods, designers or verification engineers can quickly build module-level formal testbenches that catch a majority of bugs within five minutes of regression time.

The Synopsys formal verification team presented tutorials on datapath operations and on how to discover design invariants. JT Longino talked about using Synopsys tools for formally proving datapath operations. Datapath correctness continues to be a challenge for the industry. High confidence in datapath operations is difficult or impossible to achieve using simulation, but datapath operations have historically exceeded the capacity of formal property verification tools. Synopsys HECTOR technology provides users the ability to prove equivalence between an implementation RTL design and a reference design. The two designs can have different latencies, and the reference design can be untimed C or C++ code. The new VC Formal DPV App integrates HECTOR technology into the VC Formal GUI, allowing formal verification engineers to work on datapath verification problems in a familiar environment.

Iain Singleton described how to use VC Formal to discover design invariants that help converge complex properties. Invariants describe properties that remain unchanged when a specific transformation is applied, and they can restrict the state space of subsequent proofs if used as assumptions. Although they are powerful tools for assisting convergence, invariants can be difficult to identify and write. The VC Formal Iterative Convergence Methodology (ICM) provides users a methodical, tool-assisted approach to identifying design invariants. Using ICM, convergence time for a selected set of difficult properties was reduced from over three hours to around one minute. To learn more about VC Formal and to stay up to date on VC Formal SIG 2019 events, visit HERE.



Embeddable FPGA Fabric on TSMC 7nm
by Tom Simon on 12-14-2018 at 7:00 am

With their current line-up of embeddable and discrete FPGA products, Achronix has made a big impact on their markets. They started with their Speedster FPGA standard products, and then essentially created a brand-new market for embeddable FPGA IP cores. They have just announced a new generation of their Speedcore embeddable FPGA IP that targets leading edge compute applications such as AI/ML. More than just being a process node advancement, they have made a number of strategic architectural changes to improve performance and adapt to certain classes of problems.

Yes, as you might expect this announcement includes moving to the latest process node, TSMC 7nm, and there will be a back port to 16nm later in 2019. However, the really interesting stuff in this announcement has to do with further improvements in the already optimized architecture of the fabric.

I had a chance to speak to Robert Blake, Achronix CEO, at the time of the announcement to gain deeper insight into the specifics. He mentioned that they have successful 7nm validation silicon back that meets their target specifications. The motivation for many of the changes in this new generation are based on the AI/ML market and the big changes in how FPGA technology is being used.

FPGAs have made a dramatic shift over the decades from glue logic and interface uses to becoming a major element in data processing, such as networking and AI. Microsoft demonstrated how FPGAs offer huge acceleration for compute-intensive applications. Classic CPUs have seen their year-to-year performance gains flatten out, and with this there has been a concomitant growth in the use of specialized processors such as GPUs to fill the gap. FPGAs represent an even more flexible tool for implementing computational processing. Achronix likes to point out that CPUs are rapidly becoming FPGA helpers that can deal with exceptions but are not necessarily in the main data path as much anymore.

The beauty of embeddable FPGA fabric IP is that the significant overhead of an off-chip resource is avoided, including off-chip driver loads, board real estate, and interface speed limits.

The Speedcore 7t, which is built with their Gen4 architecture, provides significant PPA improvements. Robert told me that they see simultaneous gains in performance, power and area, namely a 60-300% boost in performance, a 50% decrease in power, and a 65% decrease in area. Any one of these would be noteworthy, but they have a combined win. Robert walked me through some of the changes that contribute to these numbers.

Based on the needs of several important applications, Achronix has added or enhanced certain logic blocks. For instance, there is an 8-1 mux, which is critical for networking applications. Another is an 8-bit ALU that is heavily used for AI/ML. Robert also talked about their bus max function, dedicated shift registers, and LUT changes, all of which improve the compute power of their FPGA fabric.

Robert talked about numerous other additions, such as their programmable bus routing. This 4-to-1 bus routing capability can be cascaded to create wider busses. This will save LUT resources and offers a 2X performance improvement.

Going one step further, they have added a new compute block – a Machine Learning Processor (MLP). It is optimized for neural network (NN) matrix-vector multiplication. It is clocked at 750 MHz and is flexible in the number formats it can handle: fixed point, Bfloat16, 16-bit half-precision FP, 24-bit FP, and block FP. The flexibility provided by varying configurations allows customization to adapt to different NN algorithms. It also provides future proofing, because the programmable array can be altered as NN algorithmic technology advances.
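
For readers less familiar with what such a block computes, here is a minimal software sketch of a reduced-precision matrix-vector multiply, emulating bfloat16 by truncating a float32 to its top 16 bits (illustrative only, not Achronix’s implementation; the weights and activations are made up):

```python
import struct

def to_bfloat16(x):
    """Approximate bfloat16 by truncating the float32 mantissa to its top bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def matvec_bf16(matrix, vector):
    """Matrix-vector product with bfloat16 inputs and float32 accumulation."""
    result = []
    for row in matrix:
        acc = 0.0
        for w, v in zip(row, vector):
            acc += to_bfloat16(w) * to_bfloat16(v)   # accumulate at higher precision
        result.append(acc)
    return result

weights = [[0.125, -0.7, 0.33], [1.5, 0.01, -0.2]]   # made-up NN layer weights
activations = [0.9, 0.1, -0.4]
print(matvec_bf16(weights, activations))
```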

There is so much in this announcement, I suggest referring to the Achronix website for all the details. However, it is clear that Achronix intends to maintain its technical and business advantage in this space using a wide range of targeted technical improvements. Rather than rest on their laurels, they are using their experience to help meet the emerging computational requirements for AI/ML, which is poised to become pervasive.



Sequential Equivalency Checks in HLS
by Alex Tan on 12-13-2018 at 12:00 pm

High-level synthesis (HLS) of an IP block takes its high-level design specification – usually captured in SystemC or C++ – and synthesizes it to generate its RTL equivalent. HLS provides a faster convergence path to design code stability, promotes design reuse, and lowers front-end design inception cost.

HLS and Mentor Catapult Platform
The current rise in HLS adoption has been partly attributed to the availability of verification solutions that facilitate validation of the generated RTL code against the high-level reference design. The two design abstractions can be checked for equivalence through either simulation-based verification (such as coverage-metric and assertion-driven approaches) or formal verification.

Mentor’s Catapult® HLS Platform provides a complete C++/SystemC verification solution that interfaces with Questa® (or third party simulators) for RTL verification as shown in figure 1. The platform consists of a design checker (Catapult DesignChecks or CDesign Checker), a coverage tool (Catapult Code Coverage or CCOV), a high-level synthesis (Catapult HLS) and a formal tool SLEC HLS (Sequential Logic Equivalence Check).

Logic and Sequential Transformations
During RTL-to-GDSII design implementation, timing optimization frequently necessitates logic restructuring and transformation to meet PPA (power, performance, area) tradeoffs. While critical timing paths can be resolved through manipulation of the logic cone topology – such as buffering, drive strength adjustments, better resource sharing, and promotion to a higher interconnect layer – sequential design manipulation provides opportunities for solving critical timing paths that are otherwise impossible to tackle. Such design transformation is rarely available in the later stages of gate-level timing optimization, as it may introduce state changes and complicate the verification tasks.

Several timing-driven sequential modifications include pipelining (managing the number of stages along data or control paths to meet a throughput target); register retiming (shifting registers to balance logic cone latency); state recoding in FSM blocks (such as from binary-encoded to one-hot implementation); and resource scheduling (what-if scenarios to meet optimal area and performance targets).

Furthermore, a number of design refinements may also change the sequential element count, such as converting a block interface from abstract data types to bit-accurate busses, or augmenting test mode operation by inserting scan path logic.

Designers also run Catapult HLS to generate power-optimized, verification-ready RTL, which involves sequential transformations through enable manipulation techniques (enable extraction or strengthening) – yielding further clock gating that reduces switching activity. Figure 2 shows the additional clock gating resulting from sequential analysis.

All of the previously described design changes alter the mapping between registers of the two design abstractions and render traditional combinational equivalence checkers ineffective. Instead, designers can run SLEC HLS to validate C++/SystemC against RTL as well as RTL against RTL (before and after an incremental power optimization).
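
A toy illustration of why the register-to-register mapping breaks down (plain Python models, not RTL and not how SLEC HLS works internally): a pipelined version of the same function produces identical results shifted by one cycle, so equivalence has to be judged on aligned output sequences rather than on matching registers cycle by cycle.

```python
def reference(inputs):
    """Unpipelined model: each result is available in the same cycle."""
    return [x * 3 + 1 for x in inputs]

def pipelined(inputs):
    """Two-stage model: an added register stage delays every result by one cycle."""
    stage = None
    outputs = []
    for x in inputs:
        outputs.append(None if stage is None else stage * 3 + 1)
        stage = x
    outputs.append(stage * 3 + 1)        # flush the value still held in the register
    return outputs

def sequentially_equivalent(ref_out, dut_out, latency):
    """Compare the output streams after discarding the pipeline fill cycles."""
    return dut_out[latency:latency + len(ref_out)] == ref_out

stim = [2, 5, 7, 11]
print(sequentially_equivalent(reference(stim), pipelined(stim), latency=1))   # True
```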

The Mechanics of SLEC HLS
Unlike a traditional equivalence checker for combinational logic, proving equivalence across two design abstractions, such as C++ versus RTL, requires a different approach, one that identifies the sequential differences. SLEC HLS has advanced analysis that allows designers to explore what-if refinements that would normally cause traditional equivalence checkers to generate false positives. It employs fine-grain partitioning of design sections to provide scalability in handling large design code.

As illustrated in figure 3, by fracturing a function into its associated basic blocks, SLEC HLS analyzes and maps them to the corresponding control FSM to schedule the dataflow analysis. Comparison is then performed at the interfaces of these “state-like” basic blocks along the time axis using micro-transactions. HLS synthesis provides both the basic block boundaries and the information about the micro-transactions.

Running and Analyzing SLEC HLS Results
SLEC HLS run setup is quite straightforward: black-box the non-synthesizable design parts, including large memory blocks, and specify the same reset states and sequences usually captured for synthesis. Other design settings, such as state correspondence (which reduces verification complexity), clocking, and port mapping, are automatically derived from the high-level reference code. To improve run time and quality of results, the tool performs hierarchical verification using function-based partitioned blocks called CCOREs (Catapult C Optimized Reusable Entities), which are called multiple times in the design.

At SLEC HLS run completion there are three possible outcomes: a full proof of equivalence, a mismatch, or a partial proof, which indicates that some remaining compare points need further analysis. A neat feature of SLEC HLS is its behavior when a mismatch occurs: it generates counterexample testbenches containing the stimulus sequences that designers can use to trace the design differences. These testbenches include flip-flop initialization values and all primary input stimuli needed to demonstrate the difference; they are simulation-ready and can be used for further analysis with a functional verification tool. Subsequent fixes to mismatches may involve source code or I/O cycle/sampling adjustments, constraint changes, and a rerun.

In the case of a partial proof, SLEC HLS generates a formal coverage report that quantifies the exploration of all possible inputs and states, which is helpful for root-causing issues such as dead code. Such information can also be used to identify incorrect assumptions or constraints provided to SLEC HLS, such as conflicting dual assignments to an input. Hence, adding formal verification to the flow reduces the need for full-blown RTL simulation and cuts the overall verification time.

As part of the Catapult HLS Platform integrated verification solution, SLEC HLS provides designers with formal validation of designs across different abstractions (C++, SystemC, RTL) or refinement stages (pre- vs. post-power optimization). Such vectorless validation can be used to complement simulation, delivering more comprehensive verification while reducing the overall effort and cost.

Check HERE for HLS and HERE for SLEC HLS.



Big Data Analytics in Early Power Planning
by Bernard Murphy on 12-13-2018 at 7:00 am

ANSYS recently hosted a webinar talking about how they used the big-data analytics available in RedHawk-SC to do early power grid planning with static analytics, providing better coverage than would have been possible through pure simulation-based approaches. The paradox here is that late-stage analysis of voltage drops in the power distribution network (PDN), when you can do accurate analysis, may highlight violations which you have no time left to fix. But if you want to start early, say at floorplanning where you can allow time to adjust for problems, you don’t have enough information about cell placement (and therefore possible current draw) to do accurate analysis.

ANSYS have a solution based on something they call Build Quality Metrics (BQM), and in the webinar they talk about the general methodology. There are multiple ways to approach BQM; one starts with a static analysis of the design (no simulation) and doesn’t require placement info. For this you build heatmaps based on simultaneous switching (SS) calculations, likely issues in the planned power grid, and likely timing criticality. For SS, you calculate peak current per cell based on library parameters and operating voltage. You then combine these values for nearby instances which have overlapping timing windows (taken from STA analysis), summing these currents to generate an SS heatmap.
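
Here is a minimal sketch of that SS calculation (Python for illustration only, not RedHawk-SC code; the cell coordinates, peak currents, switching windows, and grid binning are made up, and the overlap test against a single reference window is a simplification of what a real tool does):

```python
from collections import defaultdict

# (name, x_um, y_um, peak_current_mA, switching window in ps from STA)
cells = [
    ("u_and1", 12.0, 30.5, 0.8, (100, 180)),
    ("u_ff7",  13.5, 31.0, 1.2, (150, 260)),
    ("u_mux3", 14.0, 29.0, 0.6, (400, 480)),   # switches later: no overlap
    ("u_buf2", 80.0, 75.0, 1.0, (120, 200)),   # far away: different bin
]

def overlaps(w1, w2):
    return w1[0] < w2[1] and w2[0] < w1[1]

def ss_heatmap(cells, bin_um=20.0):
    """Per-bin sum of peak currents for cells whose switching windows overlap."""
    bins = defaultdict(list)
    for name, x, y, i_pk, window in cells:
        bins[(int(x // bin_um), int(y // bin_um))].append((i_pk, window))
    heat = {}
    for key, members in bins.items():
        ref_window = members[0][1]               # simplification: compare to first cell
        heat[key] = sum(i for i, w in members if overlaps(w, ref_window))
    return heat

print(ss_heatmap(cells))   # {(0, 1): 2.0, (4, 3): 1.0}
```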

Next you want to look at where you may have excessive IR drop in the planned grid. In BQM, since you don’t yet have cell instance placements you fake it by placing constant current sources at a regular pitch on the low metal segments and then do a static solve to generate an IR-drop heatmap. The evenly-spaced current draw won’t match exact cell instance current draws but it should be a reasonable proxy, allowing these heatmaps to be generated early in implementation and refined as placement data becomes available.

You can further refine this analysis using timing slack data generated from STA analysis data to prioritize timing critical cases. Combining all these heatmaps together generates the ultimate BQM heatmaps. ANSYS and their customers have shown that there is excellent correlation in observed hotspots between these and heatmaps generated through the traditional RedHawk (non-SC) path.

All of this analysis leverages the ANSYS Seascape architecture underlying RedHawk-SC to elastically distribute compute to build heatmaps. Which means that analysis can run really quickly, allowing for an iterative flow through block place and route. Which is really the whole point of the exercise. Instead of building a PDN based on early crude analyses like shortest path resistance checks, then doing detailed analysis on the finished PnR to find where you missed problems with real vectors, the BQM approach provides high coverage earlier in the flow, without need for vectors or cell placement, enabling incremental refinement to the PDN as you approach final PnR.

ANSYS reports that runtime of the BQM approach can be 3X faster than a dynamic analysis based on just a single vector. Note that the static approach in BQM provides essentially complete instance coverage (all instances are effectively toggled), whereas dynamic coverage is inevitably lower. You can raise dynamic coverage by adding more vectors, but then runtime becomes even higher. Overall, you can build and refine your PDN early, avoiding late-stage surprises, and you can do this quickly enough that it makes sense as an iterative step in the PnR flow. You’ll still do signoff at the end with whatever method you feel comfortable with. Just without nasty surprises. What’s not to like?

ANSYS tells me they have scripts to automatically set up the SC flow from your RedHawk setup, so it seems like there’s really no excuse not to give this a whirl 🙂 You can register to watch the webinar HERE.