CES 2019 Stormy Weather for IBM
by Roger C. Lanctot on 02-03-2019 at 12:00 pm

Ginni Rometty, chairman, president and CEO of IBM, was kind enough to take on the task of an hour-long keynote at CES 2019 in Las Vegas last week. She used the opportunity to highlight three areas of computational innovation at IBM – deep data, broad AI and quantum systems – with the help of three partners: Delta Air Lines, Wal-Mart and ExxonMobil.

CES 2019: CTA State of the Industry Address and IBM Keynote

Rometty proclaimed IBM’s advances in weather forecasting and the importance of building trust, transparency and security around data analytics. She also talked about the positive societal impacts of these technological advances and how artificial intelligence, while changing everything about how humans work in the future, will be a force for good.

Unfortunately, there were two big gaps in Rometty’s comments. First, Rometty had nothing to say about the automotive industry – an industry where IBM is deeply embedded and invested, and therefore implicated in the nearly 40,000 annual fatalities that occur on the nation’s highways. And, second, she had nothing to say about the Weather Channel’s alleged sharing of location data via its smartphone application, which has 45 million monthly active users.

Rometty correctly pointed out that IBM is not thought of as a consumer technology company, yet IBM underpins many if not most consumer interactions with technology on a daily basis. A week before her keynote at CES, the Los Angeles city attorney filed a lawsuit claiming that the Weather Company, which is owned by IBM, “unfairly manipulated users into turning on location tracking by implying that the information would be used only to localize weather reports,” according to a report in the New York Times.

“Yet the company, which is owned by IBM, also used the data for unrelated commercial purposes, like targeted marketing and analysis for hedge funds, according to the lawsuit,” reported the Times. The lawsuit alleges a violation of California’s Unfair Competition Law.

(Two years earlier, IBM had announced a partnership with General Motors to create a contextual marketing platform in GM vehicles called OnStar Go. The cooperation was announced at CES 2017, but by CES 2018 IBM had handed the opportunity over to a company called Xevo, which now manages the application as the “GM Marketplace.”)

The report in the New York Times sounds like a sad tale of corporate malfeasance unlikely to end well for anyone involved. For its part, IBM has asserted its innocence.

For me, though, the bigger issue is the underlying question of weather and automotive safety. In Nate Silver’s “The Signal and the Noise: Why So Many Predictions Fail – but Some Don’t,” the author notes that weather is a rare area where prediction and forecasting have shown steady improvement in accuracy.

There is a lot of promise in weather forecasting, and it may be for that reason that IBM acquired The Weather Company in 2016. In fact, weather has become a contentious issue around enabling automated vehicle operation, with some analysts – including professors at Michigan State – suggesting that autonomous vehicles will never be able to overcome the challenges posed by weather.

Interestingly enough, a growing cadre of private weather researchers including Global Weather Corporation, Foreca, Weather Cloud, Weather Telematics and others are applying their analytic platforms to better understand and predict road surface conditions. These efforts are integrating data sources including roadside and on-vehicle sensors along with atmospheric indicators to enable automated vehicles to advise drivers when it may be necessary for a human operator to intervene.

IBM’s keynote focused on how the company is integrating new weather forecasting data sources, including inputs from airplanes, to enhance the accuracy and granularity of global weather forecasts. The focus for IBM is its Global High-Resolution Atmospheric Forecasting System, or GRAF, which uses IBM supercomputers to aggregate data from millions of sources, including the smartphones of Weather Channel app users (on an opt-in basis). GRAF will be rolled out later this year, IBM says.

Clearly, the app is playing a role in the larger weather-related message for IBM. In an age of Europe’s General Data Protection Regulation and California’s new privacy legislation, one would expect IBM to make sure it gets privacy and disclosure right.

Rometty told USA Today: “Your ATM doesn’t work without us, you can’t get an airline ticket without us, you cannot fill your car with gas without us, you (won’t have a) supply at Wal-mart without us. We really are underneath almost all of it.”

IBM wants and deserves credit for changing our lives every day – maybe even saving our lives and making our lives better. It is important, therefore, that IBM take on the most challenging and important tasks facing society and avoid the trivialities of potential legal violations via smartphone apps.

Better weather forecasts are a powerful value proposition – but accountability matters too. At a trade event dominated by personal transportation transformation, IBM was oddly silent on its contribution to resolving automotive safety issues and automated driving, in particular. How can the company solve these big challenges if it can get tripped up by an app-related end user disclosure?

Roger C. Lanctot is Director, Automotive Connected Mobility in the Global Automotive Practice at Strategy Analytics. Roger will be participating in the Future Networked Car Workshop, Feb. 7, at the Geneva Motor Show – https://www.itu.int/en/fnc/2019.

More details about Strategy Analytics can be found here: https://www.strategyanalytics.com/access-services/automotive#.


Open-Silicon SiFive and Customizable Configurable IP Subsystems
by Daniel Nenni on 02-01-2019 at 12:00 pm

After 8 SemiWiki years, 4,386 published blogs, and more than 25 million blog views, I can tell you that IP is the most-read semiconductor topic, absolutely, and that trend continues. Another correlating trend (from IP Nest) is semiconductor IP revenue relative to the semiconductor market (minus memory), which more than doubled from 2006 to 2016 and is set to double again by 2026.

If you really want to know how important IP is, ask an ASIC expert like Open-Silicon, which specializes in chip differentiation using customizable and configurable IP subsystems. Here is a quick look at the proprietary and third-party IP that is an integral part of the customizable system and physical design solutions offered by Open-Silicon:

By using an HBM2 IP Subsystem (controller + PHY + I/O), with silicon validation completed in TSMC’s FinFET and CoWoS technologies, customers can minimize the integration risk. Open-Silicon can also do a complete ASIC for you if time-to-market is a challenge.

Hybrid Memory Cube (HMC) is an innovative memory architecture in terms of performance, bandwidth, power efficiency, and reliability: 15x the performance of a DDR3 module, 70% less energy per bit than DDR3 DRAMs, and 90% less space than today’s RDIMMs.

Open-Silicon, a founding member of the Interlaken Alliance formed in 2007, launched the 8th generation of Interlaken IP core supporting up to 1.2 Tbps bandwidth. This high-speed chip-to-chip interface IP features an architecture that is fully flexible, configurable and scalable. Open-Silicon provides a complete Networking IP Subsystem which includes MAC IP + FlexE IP + PCS IP + MCMR FEC IP + Interlaken IP for ease of integration and as a one-stop solution to customers designing ASICs in TSMC FinFET technologies.

Investigating, evaluating, and integrating IP is rapidly becoming the biggest challenge of the SoC/ASIC industry. The success of a chip depends on the careful selection of reliable IP. Open-Silicon has a dedicated IP team that works with a wide variety of IP providers and is continually qualifying and ranking IP and updating a portfolio of recommended IP. The goal is to help you make informed IP decisions that differentiate your product, assure IP quality and reusability, and deliver first-time working silicon.

As you may have read, Open-Silicon is now a SiFive company. It was a very disruptive move that greatly accelerated SiFive’s mission of becoming a fabless custom SoC powerhouse by leveraging Open-Silicon’s large customer base and ASIC implementation expertise.

The SiFive Tech Symposiums start this month in North America where you can spend time learning about the latest RISC-V offerings differentiated by customizable and configurable IP subsystems.

The RISC-V ISA has spawned a worldwide revolution in the semiconductor ecosystem by democratizing access to custom silicon with robust design platforms and custom accelerators. SiFive is fueling the momentum with myriad hardware and software tools for new and innovative RISC-V based solutions for IoT, AI, networking and storage applications. Attendance is free and includes lunch and plenty of time to meet and network with the speakers.

Mohit Gupta, SiFive Vice President of SoC IP, will be talking about all of the offerings described above and more at the SiFive Tech Symposium at the Computer History Museum in Mountain View.

About Open-Silicon
Open-Silicon is a system-optimized ASIC solution provider that innovates at every stage of design to deliver fully tested IP, silicon and platforms. To learn more, please visit www.open-silicon.com

About SiFive
SiFive is the leading provider of market-ready processor core IP based on the RISC-V instruction set architecture. www.sifive.com

Also Read:

Ethernet Enhancements Enable Efficiencies

RISC-V End to End Solutions for HPC and Networking

A 2021 Summary of OpenFive


How to be Smart About DFT for AI Chips
by Tom Simon on 01-31-2019 at 12:00 pm

We have entered the age of AI specific processors, where specialized silicon is being produced to tackle the compute needs of AI. Whether they use GPUs, embedded programmable logic or specialized CPUs, many AI chips are based on parallel processing. This makes sense because of the parallel nature of AI computing. As a result, in silicon for these applications we are seeing large numbers of replicated processing elements and distributed memories. These large AI designs fortunately lend themselves to advanced DFT solutions that can take advantage of their architectural characteristics.

Mentor has produced a white paper, titled “AI Chip DFT Techniques for Aggressive Time to Market”, that talks about how the properties of many large AI chips can be leveraged to save DFT, ATPG and test time. The first step they recommend is to take advantage of AI chip regularity. They propose doing test insertion and pattern generation/verification at the core level. Hierarchical DFT, like that found in Mentor’s Tessent, can use hierarchically nested cores that are already signed off for DFT to run DFT on the entire design from the top level. Higher level blocks can include blocks or cores that have already had DFT sign-off. These in turn can be signed off and used repeatedly within a chip.

Tessent’s IJTAG allows plug-and-play core replication and integration. It also offers automation for chip-level DFT configuration and management. This flexibility allows for some interesting optimizations. One such case is when there are a large number of very small cores. Mentor suggests using hierarchical grouping of cores for test to reduce overhead and save time, a happy middle ground between overly granular and completely flat ATPG.

Another optimization that their approach allows is channel broadcasting. This allows the same test data to be used for identical groups of cores. It reduces test time and the number of pins required. Tessent is smart enough to help optimize the configuration for channel broadcasting.
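To make the benefit concrete, here is a back-of-the-envelope sketch, entirely my own illustration with made-up core counts and pattern sizes rather than anything from the Mentor white paper, of how broadcasting one scan stream to identical cores cuts the data the tester must deliver:

```python
# Rough illustration only (not Tessent): with channel broadcasting, identical
# cores share one scan-in stream, so delivered test data scales with the number
# of distinct streams instead of the number of core instances.

def scan_data_bits(patterns, scan_bits_per_core, num_streams):
    """Total scan-in bits the tester must deliver."""
    return patterns * scan_bits_per_core * num_streams

cores = 256                 # hypothetical count of identical processing elements
patterns = 2000             # hypothetical ATPG pattern count
bits_per_core = 4096        # hypothetical scan bits per core per pattern

per_core = scan_data_bits(patterns, bits_per_core, num_streams=cores)
broadcast = scan_data_bits(patterns, bits_per_core, num_streams=1)

print(f"one stream per core : {per_core / 8 / 1e6:8.1f} MB")
print(f"broadcast stream    : {broadcast / 8 / 1e6:8.1f} MB")
```

The same reasoning applies to pin count: identical groups need one set of scan-in pins rather than one per instance.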

In addition to repeated logic elements, AI chips have a large number of smaller distributed memory elements. If each memory had its own BIST controller, the area overhead would be large. With Tessent it is possible for one BIST controller to be shared among multiple memory cores. To go along with this, they offer a shared-bus interface to optimize the connections to the BIST controller.

Another topic the white paper covers is their move to RTL for test insertion. When this is used, it is possible to run test verification before synthesis. RTL verification runs much faster than gate-level verification. Also, the debug process is easier. Moving test debug and verification to the RTL level means that synthesis is not required each time a test fix is made. Mentor has also implemented a number of testability checks at RTL that can save downstream iterations during ATPG.

While AI is making the lives of end users easier, it is certainly creating a demand for increasingly powerful silicon for processing. Despite this growing complexity of silicon, there is a bright spot in the test arena. Mentor clearly has been investing in their DFT product line. The good news is that many of the characteristics of these AI chips create opportunities for improving the efficiency of the design process and the resulting design, particularly in the area of test. If you want to delve into the specifics of how Mentor proposes designers take advantage of DFT optimizations for AI chips, the white paper is available on their website.


Secretary Chao Unchained @ CES 2020
by Roger C. Lanctot on 01-31-2019 at 10:00 am

U.S. Department of Transportation Secretary Elaine Chao has agreed to mount the stage at the upcoming Consumer Electronics Show in Las Vegas to share her vision of the positive economic impact of technology unleashed from regulatory oversight. It’s a powerful message but it’s going to be a tough sell.

Chao is likely taking the keynote slot vacated by General Motors CEO Mary Barra in the wake of the catastrophic United Auto Workers strike which left dealer lots bare in time for Christmas and delayed plans for the company’s first electric pickup truck – at least according to official GM statements.

The front-burner issue for Chao at CES 2020 will be enhanced vehicle safety from vehicle-to-vehicle (V2V) connections. To her credit, Secretary Chao has carved out a technology-agnostic stance on the issue, which has conveniently left the door open for the Federal Communications Commission to pass a Notice of Proposed Rulemaking (NPRM) last week re-allocating the 5.9GHz spectrum intended for V2V applications.

“The Commission proposes to designate the lower 45 megahertz of the band for unlicensed uses like Wi-Fi,” writes the FCC in its NPRM. “This 45 megahertz sub-band can be combined with existing unlicensed spectrum to provide cutting-edge high-throughput broadband applications on channels up to 160 megahertz wide.

“The Commission is proposing to dedicate the remaining 30 megahertz of the band for use by transportation and vehicle safety-related communication services. Specifically, in the NPRM, the Commission proposes to revise its rules to provide Cellular Vehicle to Everything (C-V2X), an emerging standard for transportation applications, with exclusive access to the upper 20 megahertz of the band.

“Under the Commission’s current rules, no spectrum is allocated for C-V2X. The NPRM seeks comment on whether to retain the remaining 10 megahertz for use by DSRC systems or to dedicate it for C-V2X use.”

If adopted, the FCC plan likely puts a fork in the National Highway Traffic Safety Administration’s efforts to mandate dedicated short-range communication (DSRC) technology for the same application. This 20-year-old effort appears to have arrived at the end of the road – which one might imagine is welcome news at the Trump administration’s USDOT, where regulations are being eliminated, not promulgated.

Just last week Chao released a statement that she had signed a “rule on rules” ensuring the department’s regulations aren’t “too complicated, out of date, or contradictory.” The new Transportation Department action formalized a Trump administration requirement that for each regulatory step a department takes, it must undertake two deregulatory moves.

News reports quoted USDOT claims that it had exceeded its own standard, establishing a ratio of 23 deregulatory steps for each regulatory initiative – estimating unspecified resulting industry savings of $3.7B. (The agency provided no details regarding the source of these savings or industries impacted.)

Maintaining that 23-1 ratio of deregulation to regulation may pose a challenge as the USDOT faces a growing clamor for more regulatory guidance in the development of self-driving cars and the advancement of active safety systems. Secretary Chao and the Trump administration may have painted themselves into a corner with the new mandate or simply written themselves out of the normal NHTSA script intended to reduce highway fatalities by guiding the future of automotive design.

Of course, the claim of a $3.7B contribution to industry savings from reduced regulatory oversight must be considered in the context of what is shaping up as a far more enduring industry impact from Trump administration policies as Boeing ponders the termination of 737 MAX production in January. With two fatal crashes occurring under the guidance of an acting USDOT secretary (preceding Secretary Chao), the Trump administration will have to come to terms with the $3.7B earnings hit Boeing took months ago, to which $5B in victim compensation and liability has since been added on Boeing’s books.

Some have taken to describing the USDOT’s deregulation binge as “win-ovation.” Nobody at Boeing in Chicago or Renton, Washington, would use such a word – especially as Boeing ponders what may be the demise of its most popular plane.

Maybe Secretary Chao can ask fellow CES keynoter Delta Air Lines CEO Ed Bastian what he thinks about deregulation.


Qualcomm Attests Benefits of Mentor’s RealTime DRC for P&R
by Tom Simon on 01-31-2019 at 7:00 am

When floor planning (FP) and place & route (P&R) tools took over from custom layout tools for standard cell based designs, life became a lot better for designers of large digital chips. The beauty of the new flows was that all the internals of the standard cells and many IP blocks were hidden from view, lightening the load on the tools and designers. So-called footprints, in the form of Library Exchange Format (LEF) files, filled in for the internals of these cells and blocks.

A properly constructed LEF cell view is frequently adequate to give the P&R tool the ability to produce design-rule-correct final GDS-level layout. But not always; final DRC is always needed. As design sizes increased, the loop from final whole-chip DRC to fixes and re-verification became unwieldy and impractical for schedule-sensitive chip designs.

This is the issue that Qualcomm encountered in their flow while designing extremely large, state-of-the-art SoCs. I recently had the chance to read a white paper that talks about the issues Qualcomm encountered and how they solved them with Mentor’s Calibre RealTime Digital in-design DRC. Let’s look at the primary issues that they encountered.

LEF files contain an abstract of the underlying layout called a footprint. However, as the cells and blocks are placed in a design there are a large number of interactions that can take place between these cells that could lead to DRC violations. Waiting until late in the design cycle to verify all the base layer design rule correctness can leave serious issues baked into the design until they are much more difficult to fix.

The other source of issues is interactions between routing in the P&R tool and the geometry inside the placed cells. One cause of this issue can be errors in the LEF cell views. In this case the errors might not be caught until the entire design is exported and merged with the contents of the placed cells. For large designs like those in development at Qualcomm, the chip-level operation to merge all the geometry in the design can take a long time.

The white paper describes the process that Qualcomm adopted using Mentor’s Calibre RealTime Digital in-design solution. Instead of merging the entire design and running batch DRC, Calibre RealTime Digital makes direct calls to the Calibre engine running foundry-qualified sign-off Calibre rule decks. It is able to run analysis on the area in proximity to where designers are working. It enables incremental DRC while in the P&R tool, including all the layout layers and nested cells and blocks that are normally not available from the P&R tool. As a result, designers get nearly instantaneous feedback on violations and on potential fixes.

When geometry inside placed cells is part of a violation, selected shapes contained in those cells or blocks are shown to help designers fully understand the error so a fix can quickly be made. This is really the best of both worlds: designers can still work efficiently in the P&R tool environment, yet they are able to detect and understand problems that arise from the fully merged geometry across all layers.

It was years ago that Calibre completely disrupted the DRC market with the introduction of its easy to use and highly effective hierarchical capabilities. Customers immediately understood the competitive advantage that they would gain by using Calibre. The Calibre RealTime Digital interface looks to be another game changing capability to come from this formidable development team. Based on the white paper it seems that Qualcomm agrees with this sentiment.


The 50th Year of Intel, What Happened in 2018
by Daniel Payne on 01-30-2019 at 12:00 pm

2018 was the 50th year for Intel in the semiconductor business, and their Q4 2018 conference call just happened last week, so I’ll get you all caught up on what they talked about. Bob Swan is the CFO and interim CEO, as the company continues to search for a new CEO after Brian Krzanich was ousted for misconduct. Here’s a quick financial summary for the Q4 2018 quarter:

  • EPS of $1.28, beat estimates by $0.06
  • Revenue of $18.66B, a 9.4% increase, but missed estimates by $360M

For the entire year revenue passed $70B, a growth of 13% thanks to data center, IoT, programmable, memory and modem businesses. I spend all day in the cloud by browsing, banking, reading, writing and networking, so no surprise that the Data Center Group did $23B in revenue, up 21% for the year.

I kind of expected the PC business to be flat or slightly down, but surprisingly that business grew 9% for the year. There was no big Q4 revenue uptick, so the product areas blamed are:

  • Modem
  • China slow-down
  • Slower cloud growth
  • NAND pricing

Hopes are high for three areas: AI, Autonomous Driving, 5G. All of these topics are popular on SemiWiki, so let’s see if Intel can compete on a global scale with so many competitors in each field.

Related blog: Intel Swaggers at CES

3D chip packaging (aka Foveros) has arrived, and their first product is coming later in 2019, Lakefield – a 10nm CPU, four Atom cores, Gen 11 graphics. I’m still curious about how they managed to remove all of that heat effectively.

The 9th generation of Intel Core desktop products was launched in 2018, so that benefits the scientific, gaming and content creating segments. The 10nm Ice Lake client CPUs were previewed, but not quite ready for sale in 2018.

Vision processing is a typical AI task using CNN (Convolutional Neural Networks), so Intel has a 3rd generation vision processing unit along with a toolkit called OpenVINO. Another tease for AI is something Intel calls a Neural Network Processor for Inference (NNPI), due in the 2H2019, with a product family name of Nervana.

Intel’s automotive acquisition Mobileye added some 28 new design wins and 78 vehicle model launches last year, so that segment for collision detection and avoidance is in growth mode for ADAS levels one to three.

Shareholders of INTC stock enjoyed receiving $5.5B in dividends, and the company bought back 217 million shares as the stock traded between $42 and $57.

One statistic that really jumped out at me was that revenue per employee has improved 25% in the past three years.

Just the Data Center Group at Intel produced $6B of revenue in Q4, more than most companies in the world.

Back in 2006 Intel and Micron created IM Flash Technologies, LLC, but in 2018 Micron announced plans to take full ownership of the venture, whose technology is known as 3D XPoint and sold by Intel under the Optane product name. Intel can still purchase chips from Micron, so let’s see if this segment continues to grow in 2019.

Forecasts for 2019 at Intel are:

  • Revenue of $71.5B
  • EPS of $4.60

The Q1 2019 forecast is $16B in revenue and EPS of $0.87, so no growth is expected.

The transition at Intel from PC-centric to data-centric is slowly taking shape.

Data Center revenues for the first nine months of 2018 were up 45%, but the fourth quarter slowed way down. For 2019 they hope that the second half shows pickup in this area.

In 2018 we saw Intel exit Wind River and the wearables market. The 10nm node slowly continues to ramp production volumes in 2019, although some four years later than expected.


Why High-End ML Hardware Goes Custom
by Bernard Murphy on 01-30-2019 at 7:00 am

In a hand-waving way it’s easy to answer why any hardware goes custom (ASIC): faster, lower power, more opportunity for differentiation, and sometimes cost, though price isn’t always a primary factor. But I wanted to do a bit better than hand-waving, especially because these ML hardware architectures can become pretty exotic, so I talked to Kurt Shuler, VP Marketing at Arteris IP, and I found a useful MIT tutorial paper on arXiv. Between these two sources, I think I have a better idea now.

Start with the ground reality. Arteris IP has a bunch of named customers doing ML-centric design, including for example Mobileye, Baidu, HiSilicon and NXP. Since they supply network on chip (NoC) solutions to those customers, they have to get some insight into the AI architectures that are being built today, particularly where those architectures are pushing the envelope. What they see and how they respond in their products is revealing.

I talked a bit about this in an earlier blog (On-Chip Networks at the Bleeding Edge of ML). There is still very active development, in CPUs and GPUs, around temporal architectures, primarily to drive performance – single-instruction, multiple-data (SIMD) fed into parallel banks of ALUs. More intriguing, there is rapidly-growing development around spatial architectures where specially-designed processing elements are arranged in a grid or some other topology. Data flows through the grid between processors, though flow is not necessarily restricted to neighbor-to-neighbor communication. The Google TPU is in this class; Kurt tells me he is seeing many more of this class of design appearing in his customer base.


Why prefer such strange structures? Surely the SIMD approach is simpler and more general-purpose? Neural net training provides a good example to support the spatial approach. In training, weights have to be updated iteratively through a hill-climbing optimization, aiming at maximizing a match to a target label (there are more complex examples, but this is good enough for here). Part of this generally requires a back-propagation step where certain values are passed backwards through the network to determine how much each weight influences the degree of mismatch (read here to get a more precise description).

Which means intermediate network values need to be stored to support that back-propagation. For performance and power you’ll want to cache data close to compute. Since processing is quite massively distributed, you need multiple levels of storage, with register file storage (perhaps 1kB) attached to each processing element, then layers of larger shared caches, and ultimately DRAM. The DRAM storage is often high-bandwidth memory, perhaps stacked on or next to the compute die.
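As a concrete illustration of why those intermediate values pile up, here is a tiny numpy sketch of my own, not drawn from the MIT paper or any particular accelerator, of a two-layer network whose forward-pass activations must be cached so the backward pass can compute the weight gradients:

```python
import numpy as np

# Minimal illustration (not any vendor's implementation): a 2-layer MLP where
# the forward pass must cache intermediate activations so the backward pass
# can compute how much each weight influenced the output error.

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))          # batch of inputs
t = rng.normal(size=(8, 4))           # target values
W1 = rng.normal(scale=0.1, size=(16, 32))
W2 = rng.normal(scale=0.1, size=(32, 4))

# Forward pass: h is the intermediate value that must be stored
# (in register files / local SRAM / HBM on a real accelerator).
h = np.tanh(x @ W1)
y = h @ W2
err = y - t

# Backward pass re-uses the cached h and x to form the weight gradients.
grad_W2 = h.T @ err
grad_W1 = x.T @ ((err @ W2.T) * (1.0 - h**2))   # tanh'(a) = 1 - tanh(a)^2

lr = 0.01
W1 -= lr * grad_W1
W2 -= lr * grad_W2
print("loss:", float((err**2).mean()))
```

Scale this to millions of weights and many layers and the pressure for multi-level local storage becomes obvious.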

Here it gets really interesting. The hardware goal is to optimize the architecture for performance and power for neural nets, not for general-purpose compute. Strategies are based on operation ordering, e.g. minimizing updates of weights or partial sums, or minimizing the size of local memory by providing methods to multicast data within the array. The outcome, at least here, is that the connectivity in an optimized array is probably not going to be homogeneous, unlike a traditional systolic array. Which I’m guessing is why every reference I have seen assumes that the connectivity is NoC-based. Hence Arteris IP’s active involvement with so many design organizations working on ML accelerators.

More interesting still are architectures to support recurrent NNs (RNNs). Here the network needs to support feedback. An obvious way to do this is to connect the right side of the mesh to the left side, forming (topologically) a cylinder. You can then connect the top side to the bottom side, making (again topologically) a torus. Since Kurt tells me he hears about these topologies from a lot of customers, I’m guessing a lot of folks are building RNN accelerators. (If you’re concerned about how such structures are built in semiconductor processes, don’t be. These are topological equivalents, but the actual structure is still flat. Geometrically, feedback is still routed in the usual way.)
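For readers who like to see the wrap-around concretely, here is a tiny sketch of my own (not from Arteris IP) showing that the torus is just modular index arithmetic on a flat grid:

```python
# A small sketch (my own illustration) of how a 2-D mesh becomes a torus purely
# through index arithmetic: wrap-around neighbors are computed with modulo, so
# the physical layout stays flat while the logical topology gains feedback paths.

def torus_neighbors(row, col, rows, cols):
    """Return the logical (row, col) neighbors of a processing element on a torus."""
    return {
        "north": ((row - 1) % rows, col),
        "south": ((row + 1) % rows, col),
        "west":  (row, (col - 1) % cols),
        "east":  (row, (col + 1) % cols),
    }

# On a 4x4 grid, the right edge "feeds back" to the left edge:
print(torus_neighbors(0, 3, 4, 4))   # east neighbor is (0, 0)
print(torus_neighbors(3, 0, 4, 4))   # south neighbor is (0, 0)
```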

I see two significant takeaways from this:

  • Architecture is driven from top-level software goals and is more finely tuned to the NN objective than you will find in any traditional application processor or accelerator; under these circumstances, there is no one “best” architecture. Each team is going to optimize to their goals.
  • What defines these architecture topologies, almost more than anything else, is the interconnect. This has to be high-throughput, low-power, routing friendly and very highly configurable, down to routing-node by routing-node. There needs to be support for multicast, embedding local cache where needed and high-performance connectivity to high-bandwidth memory. And of course in implementation, designers need efficient tools to plan this network for optimal floorplan, congestion and timing across some very large designs.

I hope Kurt’s insights and my own voyage of discovery added a little more to your understanding of what’s happening in this highly-dynamic space. You can learn more about what Arteris IP is doing to support AI in these leading-edge ML design teams HERE. They certainly seem to be in a pretty unique position in this area.


Switch Design Signoff with IC Validator
by Alex Tan on 01-29-2019 at 12:00 pm

The surge of network traffic at data centers has driven an increase in network bandwidth, doubling every 12-15 months according to a study of Google’s data centers. The primary drivers of this uptick include the proliferation of cloud computing, more distributed storage architectures, and emerging applications in AI, 5G and video streaming.

Innovium is a provider of high-performance and highly scalable switching silicon solutions for data centers. Its TERALYNX™ Ethernet switch family supports switch capacity ranging from 3.2 Tbps through 12.8 Tbps with large buffers. Key to its network switch design requirements are power efficiency in terms of performance per watt, very low latency, and inherently high port-count connectivity. Innovium has rolled out several generations of silicon, starting with 28nm at 3.2 Tbps and most recently 16nm at 12.8 Tbps. One of the main challenges in its silicon design is the need to reduce physical verification time.

IC Validator™ is a comprehensive physical verification solution from Synopsys. It delivers both performance scalability and broad runset support for advanced process nodes, including 7nm FinFET. IC Validator’s Design Rule Checking (DRC) and Layout Versus Schematic (LVS) physical verification engine has near-linear scalability across hundreds of CPU cores and substantially reduces the time to results, as shown in Figure 2.

Earlier this month, Synopsys announced Innovium’s adoption of IC Validator as its TERALYNX physical signoff tool. Innovium was able to take advantage of IC Validator’s performance scaling across more than 250 CPU cores to complete full-chip DRC/LVS signoff in TSMC’s 16nm FinFET process within a day.

“Physical verification is on the critical path to our tapeout. Early physical verification closure is essential to ensure that design schedules are met,” said Keith Ring, vice president of Technology at Innovium. “IC Validator performance enabled us to complete full-chip DRC and LVS signoff within a day for our flagship network switch design.”

“Designers are challenged to close physical verification within schedule because of the increasing manufacturing complexity at advanced technology nodes,” said Christen Decoin, senior director of business development, Design Group at Synopsys. “Through high performance, scalability, and readily available optimized runsets from all major foundries, IC Validator is providing designers with the fastest path to production silicon.”

I had the opportunity to talk about this announcement with Manoz Palaparthi, Synopsys Technical Marketing Manager. The following are his excerpted responses to my inquiries:

What types of challenges did Innovium overcome by migrating to IC Validator physical signoff?
Performance was a key concern for Innovium. The design is large, with several billion transistors to verify. As such, traditional full-chip DRC signoff happens late in the design flow, and its long runtimes can lead to tapeout delays. With IC Validator, Innovium could complete full-chip DRC and LVS runs in under one day. Innovium used IC Validator across more than 250 CPU cores to take advantage of IC Validator’s distributed processing and scalability.

Which IC Validator verification features are being utilized by Innovium?
Innovium deployed IC Validator for all of their physical verification needs, including DRC, LVS, antenna checks and metal fill.

Could you comment on how IC Validator’s smart memory-aware load scheduling and balancing technology works?
Yes. The memory aware scheduling and smart load sharing technologies are built into the IC Validator scheduler.

Memory-aware scheduling enables jobs to be scheduled based on their individual memory requirements. The IC Validator scheduler estimates memory needs in advance. If a job requires a lot of memory, for example 512GB or 1TB, it is scheduled on a large machine; lighter jobs are scheduled on smaller machines. As the jobs progress, the tool dynamically adjusts and reschedules as needed. With smart load-sharing technology, IC Validator continuously monitors jobs and dynamically optimizes them: Are the jobs progressing well? Have some machines died? Is load balancing or rescheduling required for some jobs?

Once the jobs start running, the scheduler ensures that they run as efficiently as possible. This technology, combined with massive distributed processing scalability, accelerates time to results for customers like Innovium.
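To illustrate the general idea in the answer above, here is a simplified sketch, my own construction rather than the actual IC Validator scheduler, of memory-aware placement where each job is assigned to the smallest machine that can hold its estimated footprint:

```python
# A simplified sketch of the general idea behind memory-aware scheduling
# (my own illustration, not the IC Validator scheduler): each job carries an
# estimated memory footprint, and the scheduler places it on the smallest
# machine that can hold it, keeping big machines free for big jobs.

from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    free_gb: int

@dataclass
class Job:
    name: str
    est_mem_gb: int

def schedule(jobs, machines):
    """Greedy memory-aware placement: largest jobs first, tightest fit wins."""
    placement = {}
    for job in sorted(jobs, key=lambda j: j.est_mem_gb, reverse=True):
        candidates = [m for m in machines if m.free_gb >= job.est_mem_gb]
        if not candidates:
            placement[job.name] = None          # wait / reschedule later
            continue
        best = min(candidates, key=lambda m: m.free_gb)   # tightest fit
        best.free_gb -= job.est_mem_gb
        placement[job.name] = best.name
    return placement

machines = [Machine("big-1", 1024), Machine("mid-1", 256), Machine("small-1", 64)]
jobs = [Job("full-chip-merge", 900), Job("block-drc", 200), Job("fill-check", 48)]
print(schedule(jobs, machines))
```

A production scheduler would also monitor progress and rebalance on failures, as described above; the sketch only shows the initial placement decision.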

IC Validator has evolved into a comprehensive physical signoff tool. Could you comment on how you keep adding value to the tool?
The focus for the IC Validator product is on delivering the highest productivity to the physical verification engineer. In that direction, we recently introduced a lot of technology to help our customers get to their tapeouts faster: massively parallel distributed processing with scalability to thousands of CPUs, Explorer to quickly identify and fix gross design weaknesses during chip integration, and fusion technology for automated DRC repair, timing-aware metal fill and more.

For potential users who would like to migrate to IC Validator, could you share the expected pre-adoption collaboration time?
Potential customers can migrate to IC Validator very quickly. IC Validator runsets are readily available for all mainstream process nodes from foundry partners such as TSMC, GF, Samsung and more.

After the runsets are ready, it is just a matter of a few hours to set up the tool and start running PV jobs with IC Validator. Over the next few days or weeks, customers typically run some designs to evaluate the tool for performance and features. Several of our recent customers were able to deploy IC Validator in production within a month.

To find out more details on IC Validator, please check HERE


Mathematics are Hard – That is Why AI Needs Mathematics Hardware
by Tom Simon on 01-29-2019 at 7:00 am

The field of artificial intelligence has relied on heavy inspiration from the world of natural intelligence, such as the human mind, to build working systems that can learn and act on new information based on that learning. In natural networks, neurons do the work, deciding when to fire based on huge numbers of inputs. The relationship between the inputs, in the form of incoming synapses, and the act of firing an outgoing synapse is called the transfer function. In the wetware of our brains there is a complex interplay of neurotransmitters and ion flux across cell membranes that defines the activation function for a given neuron. In some cases, thousands of incoming inputs can control firing. In each part of the brain where information is processed there might be a multitude of parallel neurons layered in series to perform a task.

So too, in artificial neural networks, there are layers of highly interconnected neurons that receive input and must decide to fire based on an activation function. Over the years, AI researchers have experimented with a wide range of activation functions, e.g. step function, linear, sigmoid, etc. Sigmoid and sigmoid-like functions have become popular because of their responsiveness and suitability for building multilayered networks. There are, of course, other desirable types, but many of these share the need to vary sensitivity as a function of input level.
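As a quick illustration of that varying sensitivity, here is a minimal Python sketch of my own showing the sigmoid and its derivative; the derivative, which is the neuron’s sensitivity to its input, peaks near zero and collapses in the tails:

```python
import numpy as np

# Minimal sketch (my own illustration) of a sigmoid activation and why its
# sensitivity varies with input level: the derivative is largest near zero
# and shrinks toward the extremes.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_sensitivity(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # d(sigmoid)/dx

for x in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  sensitivity={sigmoid_sensitivity(x):.4f}")
```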

A direct consequence of the use of sigmoid functions, with their varying sensitivity, is the use of floating point operations to implement the activation functions for many classes of neural networks. Initially it was sufficient to rely on the processing capabilities of standard processors for the implementation of neural networks, but with increasing requirements for throughput and optimization, specialized hardware is used more and more often. Specialized floating point processing has proven to offer big benefits. In addition to their lack of parallelization, CPU-based floating point units often do not fit the precision, area, power or performance needs of neural networks. This is especially true for neural networks tailored to specific tasks.

It makes more sense for designers of SoCs that perform neural network processing to pick and choose the best floating point IP for their application. The considerations might include optimization through fusing specific functions, controlling rounding, and minimizing redundant normalization operations. It is also worth pointing out that when lower precision will work, reductions in area and power can be made.
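To show the kind of trade-off being made, here is a small sketch, again my own illustration and not DesignWare code, comparing the same dot-product-plus-sigmoid neuron evaluated with float32 and float16 operands; the reduced-precision result drifts slightly while halving the storage per value (and, in hardware, shrinking the datapath):

```python
import numpy as np

# A small sketch (my own illustration, not DesignWare code) of the precision
# trade-off: the same neuron evaluated with float32 and float16 operands.
# Operands are quantized to the chosen dtype; the result reflects the
# reduced precision while each value takes half the storage.

rng = np.random.default_rng(1)
w = rng.normal(scale=0.05, size=1024).astype(np.float32)
x = rng.normal(scale=0.05, size=1024).astype(np.float32)

def neuron(w, x, dtype):
    acc = np.dot(w.astype(dtype), x.astype(dtype))     # weighted sum of quantized operands
    return 1.0 / (1.0 + np.exp(-np.float64(acc)))      # sigmoid activation

y32 = neuron(w, x, np.float32)
y16 = neuron(w, x, np.float16)
print(f"float32: {y32:.6f}   float16: {y16:.6f}   |diff|: {abs(y32 - y16):.2e}")
```

Whether that drift is acceptable depends on the network, which is exactly why configurable precision in the floating point IP matters.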

It was with great interest that I read an article by Synopsys on how their DesignWare library offers floating point functions that can be used for neural network implementations. When their floating point operations are pipelined, it is possible to use their Flexible Floating Point (FFP) format to transfer intermediate results and metadata between operations. This gives designers the ability to precisely specify all aspects of the float operations. To ensure ideal performance and behavior, Synopsys supplies bit-accurate C++ libraries, allowing system simulation that includes the desired floating point operations.

In some ways, the future of neural networks is about lower precision. A lot of work is going into driving towards sparse matrix operations at the lowest practical resolutions. However, at these precisions the sensitivity and accuracy of the operations must be ensured. Designers need the ability to explore and then define the optimal configurations and operation of special-purpose neural networks. The Synopsys DesignWare IP offerings for floating point functions seem like a useful tool to accomplish this. The article referenced above can be found on their website.


Accuracy of In-Chip Monitoring for Thermal Guard-banding
by Daniel Payne on 01-28-2019 at 12:00 pm

I remember working at Intel and viewing my first SPICE netlist for a DRAM chip, because there was this temperature statement with a number after it, so being a new college graduate I asked lots of questions, like, “What is that temperature value?”

My co-worker answered, “Oh, that’s the estimated junction temperature of the chip.”

The next question was, “What do you mean estimated, don’t we know what the junction temperature actually is?”

With a slight grimace, the co-worker replied, “Well, we don’t know what the actual junction temperature is until we fabricate it, package it up, place it in the tester, then measure it. So we just estimate the junction temperature based on past experience, then put that number in SPICE.”

My naive engineering bubble had just burst, because I assumed that my professional colleagues in industry knew a lot more about the DRAM chips that they were designing than to simply guess at a temperature for use in SPICE, then hope for the best when silicon came back. Today, however, the IC design landscape has changed quite a bit, even to the point that engineers can place an actual IP block on their chip that will dynamically measure the local temperature in real time, aka In-Chip Monitoring.

Going back to the DRAM example at Intel, we first started out packaging the memory device in a rather expensive ceramic package with excellent thermal properties, but then for cost savings we would migrate to a cheaper plastic package with poor thermal properties, so knowing the junction temperature made a huge difference in the operation of the DRAM and the profit margin of our product.

Chips are being designed today across a wide range of process nodes from the mature 40nm down to research nodes like 3nm, and at each node you have to keep your chip operating within a safe thermal limit in order to meet power and reliability requirements. Many design segments are limited by thermal considerations for semiconductor devices, like: Datacenter, IoT, consumer and automotive. If you can sense the die temperature and then manage the operation of the chip to keep within thermal limits, you will save power and improve reliability.

Let’s consider using an in-chip thermal monitor with a target junction temperature of 85 degrees C. If the temperature sensor accuracy is plus or minus 5C, then our expected temperature range is 80C to 90C. When the worst-case lower point of 80 degrees C is reached, the chip could slow down a clock frequency or even reduce the Vdd level to one or more IP blocks. So, within software you may decide to set such actions to be taken at 80C, to be on the safe side. However, by setting the software limit to 80C you still need to account for the worst-case thermal sensor accuracy. Therefore, within the thermal guard-banding scheme, software thinks 80C has been reached but the actual junction temperature could be as low as 75C.

In comparison what happens if we instead use a temperature sensor with a tighter accuracy of plus or minus 2C?

The good news is that this more accurate temperature sensor has a tighter range of 83C to 87C, and with guard-banding a lower limit of 81C. The difference between the first temperature sensor limit and the second one then becomes 81C – 75C = 6C. That 6C difference means a lot, and could be worth between 5W and 10W of power savings, depending on the architecture.
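Here is the same arithmetic written out as a short sketch of my own, using the numbers from the example above, to make the guard-band math explicit:

```python
# A small worked example (my own sketch) of the guard-band arithmetic described
# above: the usable thermal margin is the target temperature minus twice the
# sensor accuracy, so tightening accuracy from +/-5C to +/-2C recovers 6C.

def worst_case_floor(target_c, accuracy_c):
    """Lowest actual junction temperature at which software may already throttle."""
    software_limit = target_c - accuracy_c        # trip point set conservatively
    return software_limit - accuracy_c            # sensor may still read high by accuracy_c

target = 85.0
loose = worst_case_floor(target, 5.0)   # 75C
tight = worst_case_floor(target, 2.0)   # 81C
print(f"+/-5C sensor: may throttle at an actual {loose:.0f}C")
print(f"+/-2C sensor: may throttle at an actual {tight:.0f}C")
print(f"recovered thermal margin: {tight - loose:.0f}C")
```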

When talking about a consumer hand-held device running on battery power, that 5W-10W savings means longer battery life, a real benefit. At the other end of the electronics power spectrum, in a data center or telecom system, the savings would be seen in system energy consumption, speed and data throughput. An automotive benefit is tighter reliability management of the semiconductor device.

Moortec is a UK-based IP supplier focused on in-chip monitoring of temperature, and its CEO, Stephen Crosher, recently made a video on this thermal topic. Stay tuned for a video series from Moortec, because they also have IP sensors for process and voltage, the other parts of the PVT troika.

Related Blogs