
Emulation Evaluation for the Ages!

by Daniel Nenni on 12-24-2018 at 7:00 am

One of the more entertaining things I get to observe in the semiconductor ecosystem is competitive customer evaluations of tools and IP. Seriously, this is where the rubber meets the road no matter what the press releases say.

This time it was emulators, one of the most interesting EDA market segments since there is no dominant vendor. So it really is three big dogs eating out of one bowl, as former Cadence CEO Joe Costello so elegantly put it many years ago. We all know Mentor dominates verification, Cadence AMS design, and Synopsys synthesis and IP. But for emulation, Mentor, Cadence, and Synopsys all have sizable dogs at this $300M+ bowl.

In my experience with emulation evaluations, if there is an incumbent (an already installed system) they have a distinct advantage, unless of course the customer has “outgrown” them which is what happened in this case. I had the inside track since I know Wave and have been waiting patiently for the press release to blog it:

Wave Computing selects the Mentor Veloce Strato platform for verification and validation of artificial intelligence SoC designs

“Wave Computing is revolutionizing artificial intelligence and deep learning with our dataflow technology-based solutions, which are pushing the boundaries of AI system design. The Veloce Strato roadmap not only addresses growing capacity needs, but it also maps to the diverse and expanding challenges of hardware/software verification and validation,” said Darren Jones, vice president of Engineering, Wave Computing. “When we saw that our reliance on hardware emulation was growing beyond early software validation, we evaluated all available tools. The Veloce Strato platform was the best solution that met our needs. It enables a robust virtual emulation environment that tackles complex AI design challenges.”

Coincidentally, I was interviewed by CNBC last week and asked why smartphone companies are making their own SoCs instead of just buying them from Qualcomm. One of the reasons of course is emulation. When you design an SoC you can quickly debug the chip on an emulator, then get started with software development before the silicon is back. Bigly advantage for a company like Apple, which has lots of software floating around its SoCs, absolutely.

After talking to Jean-Marie Brunet, Director of Marketing, Emulation Division at Mentor Graphics, and getting some slides, I sent some questions to Darren Jones and Edmund Jordan at Wave:

Q: I would like to know why you chose Mentor?
We selected Mentor because the platform performed debugging processes far better, which is where we were spending the majority of our time. By helping us significantly reduce the amount of time we spent debugging, Mentor helped us speed time-to-market. Mentor’s Veloce® Strato™ emulation platform also included several different interface options and other features which led to Veloce being a far more complete solution, out-of-the-box.

Q: Can you elaborate on the details of your design?
We have a rather large design within a leading-edge process node. As is industry standard, we believe emulation is a key part of verifying design before the proposed solution is put on silicon, which is why we turned to Mentor’s Veloce® Strato™ emulation platform for assistance. Our design includes memory interfaces such as DDR, PCIE cards, etc. as well as a large array of custom processors.

Q: How fast did your design run on the emulator?
A rough estimate would be that we run about 1MHz, which is a notable improvement over what we ran on our previous product when debug is enabled. However, we use the emulator primarily for debugging the design so absolute speed is not as important as the speed of debugging.

Q: What is your chip design methodology?
We primarily use standard Verilog RTL, Synthesis, and place and route design methodologies. We do use custom blocks when performance is required.

Q: Is the chip taped out?
Yes.

Q: First silicon working?
Yes.

Q: Anything else interesting to add?
One advantage—and one of the reasons we chose Veloce—is that it’s very easy to use for hardware debug when running real-world software applications. What we liked most is that we could run different simulation cycles without having to recompile the design. This helps speed our development cycles, which is certainly advantageous.

We’ll always have simulation, but the advantage of emulation is it’s much faster than simulation. Thus, it enables us to run software that would take too long to run on a simulator. However, I’m certainly not going to run every software scenario on an emulator because doing so is not cost effective.

If you want to know who the incumbent was you will have to login and read the comments…


CAPEX Cuts and Microns Memory Markdown

by Robert Maire on 12-21-2018 at 7:00 am

For those who have been paying any attention to the semiconductor industry, it’s no surprise that memory demand, and therefore pricing, is down from its peak earlier in the year. It’s not getting better anytime soon.

After several strong years of demand and pricing, followed by enormous CAPEX spending, we are seeing the standard reverse pattern of the cycle as we head back down to low pricing and low demand coupled with low capex. There may still be a few investors, inexperienced analysts and company management who cling to “it’s no longer cyclical”, “it’s different this time” and “Santa Claus and the Easter Bunny are real…”.

Micron reported both EPS and revenue more or less in line with expectations, with revenue slightly low at $7.91B versus an expected $8B and EPS slightly ahead at $2.97 versus an expected $2.95.

The big issue that caused the stock to drop after hours was revenue guidance of $5.7B to $6.3B versus expectations of $7.3B. EPS is projected to be $1.75 ± $0.10 versus an expected $2.44.
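To put the size of the guidance miss in perspective, here is a quick back-of-the-envelope sketch using only the figures quoted above. The helper name `shortfall_pct` and the choice to compare against the midpoint of the guided range are illustrative assumptions, not something Micron or the analysts published.

```python
# Figures quoted in the text (revenue in USD billions, EPS in USD).
guide_low, guide_high = 5.7, 6.3
expected_revenue = 7.3
guided_eps, expected_eps = 1.75, 2.44


def shortfall_pct(actual, expected):
    """Percentage by which `actual` falls short of `expected`."""
    return (expected - actual) / expected * 100


# Assumption: use the midpoint of the guided range as the comparison point.
guide_mid = (guide_low + guide_high) / 2
revenue_gap = shortfall_pct(guide_mid, expected_revenue)
eps_gap = shortfall_pct(guided_eps, expected_eps)

print(f"Revenue guidance midpoint ${guide_mid:.1f}B is {revenue_gap:.1f}% below expectations")
print(f"EPS guidance midpoint ${guided_eps:.2f} is {eps_gap:.1f}% below expectations")
```

On these numbers the EPS shortfall is noticeably larger in percentage terms than the revenue shortfall, which is what you would expect when pricing falls faster than costs.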

When memory slows down it goes off a cliff without skid marks……

We are somewhat surprised that other analysts did not cut their estimates in front of the quarter as it was obvious that things have been deteriorating for a long time. It was not rational to expect revenues to hold up given a double whammy of lower demand and lower pricing.

Capex was cut by roughly $1.25B, about 12%. Bit growth in both NAND and DRAM sounds like it will be in the low teens in 2019. The company is cutting supply growth to match the reduced bit growth.

The one thing that is different this time is that Micron’s financial and cash position is much better and stronger to weather this downturn. The company spoke a lot about managing costs. Buybacks will help support the stock.

The company views this as an “air pocket” that will be short lived, perhaps the first half of 2019 given the underlying strong demand but we would be more cautious about all of 2019 rather than just the first half.

More impact on semi equipment companies
We think the more negative impact that investors should focus on is the one on semi equipment companies. Though they may try to downplay the capex cuts as only $1.25B, the reality is that the cuts are obviously much more widespread across the industry than just Micron.

Samsung capex memory spend down 75%??
We have mentioned previously that we have heard discussion of Samsung cutting its memory capex by as much as 75%. Given how much Samsung increased capex over the past couple of years, that level of cut would get them back to a more normal prior level. So the real concern is not Micron’s capex cut as Micron never went crazy but rather Samsung’s cut as Samsung was spending at way crazy, unsustainable levels.

Capex cuts will not be even: LRCX and AMAT suffer more
Micron made it clear on the call that they will keep up technology spending and slow capacity spending. The simple translation is continued spending on KLAC and ASML equipment and slower spending on dep and etch from Lam and Applied Materials. This does not mean EUV, as memory uses standard DUV scanners, but rather a focus on going from 1X to 1Y and then 1Z, which requires yield management and lithography. Micron said its technology progress and costs were ahead of schedule.
Memory peaked at 84% of Lam’s business and we won’t see those levels again for quite a while.

Micron, the stock, is still very cheap
The new estimate of $1.75 a quarter is an annual run rate of $7, which suggests the company is trading at a bit over 4 times forward EPS, which is cheap by any standard.

The obvious question is where EPS will really trough. If we trough at $1 a share per quarter, then we are currently trading at 8 times trough earnings. The next question is: could we get below that? What’s the worst case scenario?
Given our guess that weakness lasts at least two quarters and perhaps more, we think $1 a quarter is a likely trough EPS.
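The multiples above can be checked with a few lines of arithmetic. This is only a sketch: the share price is not quoted in the text, so the value below is inferred from the "8 times trough earnings" statement and should be read as an assumption. All variable names are illustrative.

```python
# Per-quarter EPS figures quoted in the text (USD).
quarterly_eps_guidance = 1.75
trough_quarterly_eps = 1.00
trough_multiple = 8  # "8 times trough earnings"

# Annualize by multiplying a single quarter by four (a simple run rate).
annual_run_rate = 4 * quarterly_eps_guidance      # 7.00
trough_annual_eps = 4 * trough_quarterly_eps      # 4.00

# Assumption: back out the share price implied by the trough multiple.
implied_price = trough_multiple * trough_annual_eps

# Forward multiple on the guided run rate at that implied price.
forward_multiple = implied_price / annual_run_rate

print(f"Annual run rate: ${annual_run_rate:.2f}")
print(f"Implied share price: ${implied_price:.2f}")
print(f"Forward multiple: {forward_multiple:.2f}x")
```

The implied price and the forward multiple line up with the "bit over 4 times" description, so the two valuation statements are consistent with each other.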

Micron has a much better financial position in the current cycle than previous cycles with over $3B in net cash and an ongoing buy back program.

$30 feels like a pretty solid bottom in the near term for the shares of Micron. If the stock were below that we would be more aggressive buyers.

Much as we have seen recently, the after-hours knee-jerk reaction can sometimes reverse on the following trading day (as we experienced with AMAT), so we don’t think the stock will be down as much as it was after hours.

Equipment stocks
Equipment stocks had a slight recovery only to retest lower levels. We think the Micron news is more negative for equipment companies than perhaps for Micron itself.

Lam remains the poster child, or most direct victim, of memory issues. We think the equipment companies could see more downside or limited upside as the real prospects of slashed Samsung spending come into view. It’s quite clear that Samsung’s spending in memory will be down; the only question is how much… and we think that number is underestimated.

Quarterly outlook into 2019
Q1 is always the weakest quarter for the chip industry. With Chinese New Year and a postpartum depression after the holidays, memory pricing has historically been at its weakest.

Q2 may be a little better than Q1 but we see no reason for a bounce back.

Q3 2019 seems the earliest we could see any kind of recovery in demand and/or pricing of memory components.
The stocks may bounce along the bottom here for a while as we are in waiting mode.

With the addition of the China sword dangling above our head, upward movement in the near term is going to be very difficult at best.

This year’s new song…
Sung to the melody of “Baby It’s Cold Outside”


Memory chips really can’t stay (Baby it’s cold outside)
NAND & DRAM prices have gone away (Baby it’s cold outside)
This long cycle has been (Been hoping that you’d dropped in)
So very nice (I’ll hold your hands they’re just like ice)
Investors will start to worry (Beautiful what’s your hurry?)
Stock prices haven’t found the floor (Listen to the fireplace roar)
So to cash I’d better scurry (Beautiful please don’t hurry)
Well maybe just a half a drink more (I’ll put some records on while I pour)

Synopsys Offers Smooth Sailing for OTP NVM

by Tom Simon on 12-20-2018 at 12:00 pm

Nobody likes drama. Wait, let me narrow that down a bit. Chip designers really hate drama. They live in a world of risk and uncertainty, a world that tool and IP vendors spend considerable resources trying to make safer and more rational. It is ironic, then, that Sidense and Kilopass were duking it out in patent litigation in the earlier part of the decade. Their products, one-time programmable (OTP) non-volatile memory (NVM), exist solely to provide certainty and reliability in a wide range of ICs, both digital and analog, from 180nm down to 16nm. It is noteworthy that both of these companies have been acquired by Synopsys, probably the EDA/IP company most renowned for its no-nonsense approach to business and technology.

While a diversity of suppliers is usually a good thing for customers, these acquisitions have probably been for the better. Kilopass and Sidense each had their own strengths in this market, while both used similar 1T and 2T antifuse technology that has a lot to offer. Of course, there is more to OTP NVM than the bit cell: controller circuitry is also responsible for many aspects of the NVM memory block operation.

What are chip designers looking for when evaluating OTP NVM memory? In many cases OTP NVM is used for security related features, such as unique device identity, crypto key or secure boot code storage. It needs to be compact, cost effective, power efficient, secure and reliable. With the consolidation of this technology by Synopsys, customers should expect to have access to a full range of OTP NVM technologies. This ranges from small register size blocks up to megabit size storage for boot code.

Antifuse OTP NVM is easy to use because it requires no additional layers for CMOS processes. This helps manage production costs and risks. Antifuse also has some useful characteristics. Because the programming involves oxide breakdown during the write operation, it is nearly impossible to read the logic state through mechanical or visual inspection. Symmetric storage strategies also eliminate side channel attacks to read data. The oxide breakdown is also non-reversible, so it is not prone to EM driven metal regrowth, like eFuse can experience.

Also, unlike NVM techniques that use stored charge, antifuse NVM is not vulnerable to UV, thermal or aging issues that can lead to charge depletion and associated data loss. Antifuse is a robust and efficient method for OTP NVM. Larger instances can support few-time programmable (FTP) operation, which is implemented using remapping in the controller to provide re-write functionality. This is useful for trim information on PMICs, calibration on sensors, re-provisioning of security keys, or limited code updates.
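To make the remapping idea concrete, here is a toy model of how a controller could offer a few re-writes on top of cells that can each be programmed only once. The class name, the fixed number of slots per word, and the in-memory remap pointer are all hypothetical simplifications; the article does not describe the actual Synopsys controller scheme, and a real device would also have to keep the remap state in non-volatile storage.

```python
class FewTimeProgrammable:
    """Toy FTP model: each logical word owns a fixed set of one-time slots."""

    def __init__(self, words, rewrites):
        # Physical OTP slots; None means "not yet programmed".
        self.slots = [[None] * rewrites for _ in range(words)]
        # Remap pointer: how many slots each logical word has consumed.
        self.used = [0] * words

    def write(self, address, value):
        index = self.used[address]
        if index >= len(self.slots[address]):
            raise RuntimeError("no unprogrammed slots left for this word")
        # Program a fresh slot; earlier slots are never erased or reused.
        self.slots[address][index] = value
        self.used[address] = index + 1

    def read(self, address):
        index = self.used[address]
        # The most recently programmed slot holds the current value.
        return None if index == 0 else self.slots[address][index - 1]


ftp = FewTimeProgrammable(words=2, rewrites=2)
ftp.write(0, 0xA5)   # initial trim value
ftp.write(0, 0x5A)   # one allowed update, remapped to the next slot
current = ftp.read(0)
untouched = ftp.read(1)
```

The trade-off is visible in the constructor: every allowed re-write costs another full copy of the word in antifuse area, which is why this approach suits small items such as trim values and keys rather than bulk storage.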

Synopsys, Sidense and Kilopass were all known for extensive qualification work on a wide range of processes. A lot of sensor and analog chips use antifuse on older legacy process nodes, but Synopsys antifuse has been qualified on the latest FinFET nodes as well. This makes it attractive because other NVM techniques have had trouble migrating to smaller, more advanced nodes.

Synopsys DesignWare OTP NVM is ideal for automotive applications because of its AEC-Q100 grade 0, 1, and 2 qualification. It has very high temperature stability, with operating temperatures up to 175°C. Synopsys DesignWare OTP NVM is available on TSMC, SMIC, UMC, and GLOBALFOUNDRIES processes.

Synopsys recently added an excellent overview of their full range of OTP NVM offerings and their advantages on their website. Despite the dramatic history, it seems that antifuse OTP NVM is a sound solution when looking for security, safety and optimal PPA.


AI at the Edge

by Tom Dillinger on 12-20-2018 at 7:00 am

Frequent Semiwiki readers are well aware of the industry momentum behind machine learning applications. New opportunities are emerging at a rapid pace. High-level programming language semantics and compilers to capture and simulate neural network models have been developed to enhance developer productivity. Researchers are pursuing various hardware implementations to accelerate throughput of both the dataset training and subsequent inference steps. These hardware projects include a processor architecture for software-based execution of neural network models (CPU- and GPU-based), an ASIC specifically developed for ML applications (e.g., a TPU), and the deployment of commercial field-programmable logic arrays.

To date, however, the main production-level ML applications have been exercised on datacenter-class resources. The effective throughput is derived from evaluating the NN model on a batch of input samples, to compensate for the delays in loading the NN layer weights and activation functions from memory. Yet, the data is generated at the edge, and the connectivity bandwidth, latency, and power consumption required to communicate with the central processing datacenter for inference evaluation simply do not scale. Thus, the R&D focus is shifting to innovations required in the (distributed) edge computing architecture.

The forecasts for the edge computing market are extremely robust – e.g., 35%+ CAGR, TAM exceeding $30B in 5 years. The application areas being pursued are diverse. (Although edge computing is often associated with the Internet of Things, I’ll stick with edge computing in this article – several of the application examples given below are not usually associated with the IoT.)

The most publicized edge computing AI activity is certainly the development work on autonomous vehicles. Another fast-growing ML area is in the field of factory automation and robotics. There are applications where a human operator is currently glued to a monitor, where machine learning technology can accelerate and alleviate the task of classification. Increasingly, video surveillance technology will incorporate more sophisticated ML inferencing computation. The detection of an attempted network intrusion security breach will also expand the adoption of ML decision support.[1]

An interesting area that I recently read about is the opportunity to provide emergency call centers and responders with more accurate information. Analysis and classification of various patient inputs (e.g., breathing, voice patterns) will improve the triage steps immediately pursued. In these examples, note that the primary characteristic is that the inferencing throughput must be optimized for batch = 1.

Perhaps the best indication of the tremendous growth and interest in edge computing for AI applications is the introduction of an industry conference specific to the topic – the Edge AI Summit – recently held in San Francisco. One of the presentations at the conference was from the team at Flex Logix, who provided details of their NMAX embedded FPGA IP, optimized for edge AI applications.

Last month, the Flex Logix team shared an overview of NMAX with me. Recently, I met with the team again to review the details of their Edge AI Summit presentation.

NMAX is a field-programmable logic architecture specifically developed for edge AI (batch = 1). The NMAX “tile” building block for embedded IP integration consists of:

 

  • MAC logic (64 8-bit MACs in each cluster, 8 clusters per tile)

From the detailed (floating point-based) training weights, the NMAX edge AI MAC utilizes a scaled int8 implementation, with minimal loss in classification accuracy.

 

  • traditional programmable logic

The activation function calculations for each neural network layer are mapped to this EFLX logic. The state machine control for layer-by-layer evaluation and NN model reconfiguration sequencing is also mapped to this logic.

 

  • internal L1 SRAM

The L1 SRAM stores the node weights – more on that shortly.

 

  • connectivity to embedded L2 SRAM

The composite NMAX tile array can be integrated with a range of L2 SRAM sizes. The L2 stores NN layer data values while the tiles are being reconfigured for successive layers. As will be discussed shortly, developers will optimize the tile array and SRAM configuration for their NN model.

 

  • high embedded IP I/O pin count for external connectivity

The goal of the NMAX implementation for edge AI is to minimize the number of external DRAM devices (and the latency to reach them) needed to retrieve the layer configuration data and weights and to store intermediate results. Real-time NN object classification examples provided in the Edge AI Summit presentation illustrate images/second classification results on the YOLOv3 benchmark using only 2 LPDDR4 DRAM modules, achieving 24 fps with a 6×6 tile array plus 36MB L2 SRAM @ 1GHz. (YOLOv3 is a demanding image classification benchmark, i.e., utilizing >100 layers and >60M weights, and requiring 800B calculations/image.)

 

  • compiler support, from a NN model description to full physical eFPGA programming

The figure below illustrates the functionality of the NMAX compiler. The user provides the NN model (e.g., TensorFlow, Caffe), and the proposed tile array plus SRAM topology.

The intermediate compiler output provides the allocation and sequencing of NN model layers to the hardware in the tiles, allowing the user to quickly optimize the area versus throughput tradeoff. The final compiler output data provide the physical programmable logic assignment and interconnect implementation throughout the tile array.

Specifically, the NMAX compiler addresses NN throughput using several unique algorithms. Cheng Wang, Senior VP of Engineering at Flex Logix, indicated that there are features to provide an optimal assignment of layer nodes to the tiles, as well as overlapping of model reconfiguration steps. For example, while a layer is executing, the weights for the next sequence of nodes to be configured are pulled from L2 SRAM (or external DRAM) into L1 memory. Once the next logic reconfiguration is complete, the corresponding set of node weights is transferred to L0 inside the tile MAC to execute.

Geoff Tate, Flex Logix CEO, indicated that his discussions with Edge AI Summit participants showed great interest in larger NMAX configurations, able to contain multiple NN layers in the tile array, executing in a direct pipeline without the latency of reconfiguring the tile logic and weights for successive layers.
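For readers unfamiliar with the "scaled int8" MAC mentioned in the tile description, the sketch below shows one common way to map floating-point weights and activations onto 8-bit integers: symmetric per-tensor scaling. The article does not say which quantization scheme NMAX actually uses, so treat the scheme, the function names, and the sample values as assumptions for illustration only.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization; returns (integer codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the signed 8-bit range used here.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(weights, activations):
    """Dot product using integer multiply-accumulate, rescaled at the end."""
    w_codes, w_scale = quantize_int8(weights)
    a_codes, a_scale = quantize_int8(activations)
    accumulator = sum(w * a for w, a in zip(w_codes, a_codes))  # integer MACs
    return accumulator * w_scale * a_scale


# Made-up sample values, not taken from any real network.
weights = [0.42, -1.30, 0.07, 0.88]
activations = [1.5, 0.25, -2.0, 0.6]

exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(weights, activations)
print(f"float result: {exact:.4f}, int8 result: {approx:.4f}")
```

The multiply-accumulate loop touches only small integers, which is what makes a dense array of 8-bit MACs so much cheaper in area and power than floating-point hardware; the two scale factors are applied once per dot product rather than once per multiply.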

FPGA logic offers a unique hardware alternative for ML applications, with improved performance over software-centric architectures. For the transition to ML inference at the edge, embedded FPGA IP is an excellent fit, as long as the following characteristics are available:

  • optimized for the high MAC demand
  • integrated with a significant capacity of L2 SRAM
  • wide connectivity to L2 and external DRAM to minimize latency to load new network input data and layer configuration information
  • expandable to allow optimization of area, power, and computational throughput (especially the ability to represent multiple network layers without logic reconfiguration)

The NMAX architecture from the Flex Logix team strives to meet these requirements for edge AI applications.
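The "high MAC demand" requirement can be sized roughly from the numbers quoted earlier in this article. The sketch below assumes each MAC unit completes one multiply-accumulate per clock cycle, and it shows the result under two readings of "calculation", since the article does not say whether the 800B figure counts MACs or individual multiply and add operations. Both assumptions are mine, not Flex Logix's.

```python
# Figures quoted in the article.
macs_per_cluster = 64
clusters_per_tile = 8
tiles = 6 * 6                 # 6x6 tile array
clock_hz = 1e9                # 1 GHz
calcs_per_image = 800e9       # YOLOv3, as quoted
frames_per_second = 24

macs_per_tile = macs_per_cluster * clusters_per_tile    # 512
total_macs = macs_per_tile * tiles                       # 18432

# Assumption: one multiply-accumulate per MAC unit per clock cycle.
peak_macs_per_second = total_macs * clock_hz
required_calcs_per_second = calcs_per_image * frames_per_second

# Reading 1 (assumption): a "calculation" is one multiply or one add,
# so each MAC contributes two calculations.
utilization_if_ops = required_calcs_per_second / (2 * peak_macs_per_second)

# Reading 2 (assumption): a "calculation" is one full MAC.
utilization_if_macs = required_calcs_per_second / peak_macs_per_second

print(f"MAC units in the array: {total_macs}")
print(f"Utilization if calculation = op:  {utilization_if_ops:.0%}")
print(f"Utilization if calculation = MAC: {utilization_if_macs:.0%}")
```

Under the second reading the required rate would exceed the peak, so the quoted 24 fps result is only consistent with the first reading or with MAC units that do more than one operation per cycle. Either way, the array has to stay busy most of the time, which is why the L2 SRAM capacity and DRAM connectivity requirements in the list above matter as much as the raw MAC count.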

-chipguy

References

[1]Xin, Y., et al., “Machine Learning and Deep Learning Methods for Cybersecurity”, IEEE Access, Volume 6, p. 35365-35381, 2018.


DAC versus SEMICON ES Design West!

by Daniel Nenni on 12-19-2018 at 12:00 pm

As I mentioned in a previous post, the big drama at last year’s Design Automation Conference was the acquisition of the Electronic Systems Design Alliance (formerly EDAC) by SEMI, the owner of the SEMICON West Conference franchise. The plan is to add an ES Design West wing to the SEMICON West conference in San Francisco next year. DAC is in June, SEMICON West is in July, thus the conflict. Given EDAC was a big DAC supporter, some consider this a treasonous act, which makes it all the more entertaining.

The timing is right to launch a competitive conference in San Francisco because in 2019 DAC will be in Las Vegas, a location criticized for a lack of local technical community support amongst other things. DAC has been in Las Vegas twice over the last 35 years if my memory serves me. The first time was in 1985. It was memorable as my second DAC but also because I was just married and my beautiful bride joined me. It really was an exciting time in EDA and Las Vegas is an exciting venue, absolutely.

In fact, my beautiful wife and I returned to Las Vegas 30 years later to renew our wedding vows. Funny story, we actually went to Las Vegas on our anniversary to see Elton John and I had planned on surprising her with a quick chapel reenactment by an Elvis impersonator but the hotel had a last minute wedding cancellation so we got a large room with all of the trimmings. My wife was duly impressed, Las Vegas baby!!!

Here is the official ESDA announcement:

Oct 24, 2018: ESD Alliance Announces ES Design West Debut in Conjunction with SEMICON West 2019 in San Francisco

One thing I can say about the Design Automation Conference organizers is that they listen, and last week is a case in point. DAC will be in San Francisco from 2020-2025 and probably beyond. Talk about returning fire, wow! Truthfully, I will miss the exotic DAC venues like Las Vegas, New Orleans, San Diego, Los Angeles, Austin, Miami, Dallas, Albuquerque, and other locations that I attended but can’t recall.

But I do agree that the San Francisco DAC provides the most value add for EDA and IP vendors exhibiting their wares, absolutely. The question is: Will there be enough demand for two Design Automation Conferences in San Francisco a month apart in 2020 and beyond? You tell me in the comments section and I will add my thoughts.

Here is the complete DAC PR:

The Design Automation Conference Secures a Five-Year Conference Location at San Francisco’s Moscone Center

The world’s premier event devoted to the design and design automation of electronic chips to systems returns to the Bay Area starting June 2020 and beyond

LOUISVILLE, Colo. – December 13, 2018 – Tapping into a resurgent interest in electronic design automation and the proximity to Silicon Valley, sponsors of the Design Automation Conference (DAC) announced they will hold the annual event, now in its 56th year, in San Francisco for five consecutive years, starting in 2020.

DAC’s sponsors – the Association for Computing Machinery’s Special Interest Group on Design Automation (ACM SIGDA), and the Institute of Electrical and Electronics Engineers’ Council on Electronic Design Automation (IEEE CEDA) – secured the following dates at San Francisco’s Moscone Center to hold their annual event.

  • DAC 2020, June 17 – 26 – North and South Hall
  • DAC 2021, June 23 – July 2 – West Hall
  • DAC 2022, June 15 – 24 – North and South Hall
  • DAC 2023, June 21 – 30 – North and South Hall
  • DAC 2024, June 19 – 27 – West Hall

After 55 years, DAC continues to be the world’s premier event devoted to the design and design automation of electronic chips to systems, where attendees learn today and create tomorrow. A rise in attendance and participation in the 2018 DAC in San Francisco spurred the sponsors to return to the city after the 2019 event that will be held in Las Vegas, June 2-6. More than 6,000 people attended and more than 170 companies exhibited at DAC 2018 held at Moscone West Center.

DAC is the only event for top-notch researchers to present their cutting-edge discoveries, the platform for leading field engineers to share their experiences in using EDA (electronic design automation) tools to tame the ever-growing circuit and system design monsters, and the largest exhibition of EDA tools, software, IP cores, and other related products and services. DAC has been and will continue to be the must-attend event for the design and design automation community.

“We attend DAC to build our knowledge of what is possible so that we can continue to innovate and stay at the leading edge of design,” stated Chris Collins, senior vice president, products & technology enablement at NXP. “DAC has been and always will be a place we look to for such guidance. It is a place where we can meet with pioneers, innovators, and solutions providers to understand the technology and provide feedback on the technology that will drive our next products.”

The abstract submission deadline for DAC 2019 technical papers closed November 20, 2018, with a record 1049 abstracts received and 819 accepted papers for review. The number of accepted papers for review for DAC 2019 has surpassed the last five years of accepted for review papers by approximately 19%. Among the hot topics for 2019 are submissions in areas such as machine learning and artificial intelligence architectures, which increased by 61%.

“DAC continues to evolve to satisfy the needs of our industry. It’s always been the premier place for presenting EDA research and showcasing all vendors under one roof,” said John Busco, director, logic design implementation at NVIDIA. “In recent years, it’s added tracks to satisfy the interests of working designers and IP consumers. DAC brings together academia, commercial EDA, and electronic system designers—a true cross-section of semiconductor design. As technology progresses, and both our challenges and opportunities multiply, DAC offers an ideal forum to explore, exchange ideas, and innovate.”

The call for contributions for the 56th DAC in Las Vegas is now open for the Designer Track and IP Track. The submission deadline is January 15, 2019. For more information visit: https://www.dac.com/submission-categories/designer-track

About DAC
The Design Automation Conference (DAC) is recognized as the premier event for the design of electronic circuits and systems, and for electronic design automation (EDA) and silicon solutions. A diverse worldwide community representing more than 1,000 organizations attends each year, including system designers and architects, logic and circuit designers, validation engineers, CAD managers, senior managers and executives, as well as researchers and academicians from leading universities. Close to 60 technical sessions selected by a committee of electronic design experts offer information on recent developments and trends, management practices and new products, methodologies and technologies. A highlight of DAC is its exhibition and suite area with approximately 175 of the leading and emerging EDA, silicon, intellectual property (IP) and design services providers. The conference is sponsored by the Association for Computing Machinery’s Special Interest Group on Design Automation (ACM SIGDA), and the Institute of Electrical and Electronics Engineers’ Council on Electronic Design Automation (IEEE CEDA).

Design Automation Conference acknowledges trademarks or registered trademarks of other organizations for their respective products and services.


Ampere: More on Arm-Based Servers

by Bernard Murphy on 12-19-2018 at 7:00 am

Since I talked recently about AWS adding access to Arm-based server instances in their cloud offering, I thought it would be interesting to look further into other Arm-based server solutions. I had a meeting with Ampere Computing at Arm TechCon. They offer server devices and are worth closer examination as a player in this game.


First, the people at Ampere are heavy hitters. Start with Chairman and CEO Renee James, a past president of Intel. The CFO/COO is ex Apple and Intel and almost everyone else is ex Intel, immediately or at some time in the past, including the architect and VP of engineering, all with solid server backgrounds. I’ve also heard that they are raiding Marvell/Cavium for talent. I met with Matt Taylor, SVP of WW Sales and Biz Dev. Between Intel and Ampere, Matt was VP of sales for Qualcomm’s Datacenter group. All in all, a pretty impressive lineup for a business targeting the cloud space. The company is funded by the Carlyle group (first round), though no word on how much.

I had to ask Matt for his view on why QCOM exited servers. No real surprises but good to hear from an insider. He said the business opportunity was strong (well he would), but QCOM was distracted (just a bit). Paul Jacobs and Derek Aberle, who were supporters, left and QCOM had to cut $1B, for which datacenter was an easy target. Multiple reasons, fairly unique to QCOM, which didn’t really say anything about the general Arm-based server opportunity.

Ampere is going after the same target as Annapurna (AWS), except Ampere isn’t captive so is aiming at all the cloud top-end providers (the hyperscalers/super 8) – Google, Amazon, Microsoft, Facebook, Baidu, Alibaba, Tencent, and China Mobile – all of whom buy servers by the railcar load.

On specs, Matt has offered that in current 16nm implementations the Ampere eMAG solution is comparable to Xeon Gold devices at half the cost, and to Epyc devices at half the power. Side-note on power: some analysts think cloud users won’t care – they just pay for usage time, so performance should be the only metric that matters. Wrong – power contributes significantly to total datacenter overhead in the cost of keeping the whole thing cooled. Your bill as a user is part runtime (and price) on the instance type you chose and part overhead, including cooling costs. So yeah, power matters, even though it’s an indirect cost.

Lenovo recently released their ThinkSystem HR350A rack server based on the eMAG processor, so it’s already possible to deploy servers based on the devices. Just like Arm, they stress scale-out applications (highly parallel operations like video serving, where it is easy to add more processors to handle more parallel requests) and similar applications where performance per dollar and performance per watt are important considerations.

Matt told me that they are at various stages (eval to deployment) with big cloud service providers and are hearing similar themes for workload trends well-fitted to Arm-based servers, including storage, internal and external search, content delivery, in-memory db applications and (interestingly, in China) mobile gaming with cloud-based rendering. Some of these are accelerator options but he also stressed standard server applications with differentiated capabilities that you couldn’t easily get on the usual platforms. Sadly he didn’t want to share specific examples.

Overall, sounds very consistent with the Arm story I wrote about earlier. Arm-based servers may not be as fast, unit for unit, as the best of the best from Intel and AMD but (a) they’re a lot cheaper and lower power than those options and (b) you can build your own customized solutions optimized to higher throughput per dollar/watt for specific workloads. In some pretty high traffic datacenter applications, the best of the best may not always be the best total system solution.


SoC Design Partitioning to Save Time and Avoid Mistakes

by Daniel Payne on 12-18-2018 at 12:00 pm

I started designing ICs in 1978 and continued through 1986. Each chip used hierarchy and partitioning, but our methodology was totally ad-hoc and documented on paper, so it was time consuming to make revisions to the chip or train someone else on the history of our chip, let alone re-use any portion of our chips again. Those old, manual ways of doing chip designs are happily far behind us now, so much so that recent smartphone chips routinely have processors with billions of transistors, with massive amounts of semiconductor IP reuse, all enabled by more modern and automated IC design flows. This blog idea springs from information gleaned in a White Paper written by Methodics, a software company founded in 2006 with a headquarters in San Francisco. The big picture view at Methodics is to model your entire SoC as related sets of functional blocks, then automate the workflow to ensure that your chip design is consistent and that changes and dependencies are easy to update and communicate.

Here’s a picture of what they call an IP configuration and how it maintains multiple relationships to design data and versions:

The specific software tool at Methodics is called Percipient, and using this IP configuration approach you can do top-down designs more easily, because along the way the tool is tracking the content of each IP and the hierarchical relationship between them. These IP objects and relationships can be quickly captured at the very start of a project, even before the design details are ready. Everyone on the design team can visualize how their part of the project is being placed in a hierarchy and what its dependencies are going to be. Metadata is attached to each IP, so for example a Bluetooth IP block may require a specific PDK version from your foundry of choice and you can quickly determine if all IP blocks are compatible with that PDK version.
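The PDK compatibility check mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea and not Percipient's API: the `IP` class, its `metadata` dictionary, the `pdk_version` key, and all block names and version strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IP:
    """Hypothetical IP object: a name, free-form metadata, and child IPs."""
    name: str
    metadata: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def walk(ip):
    """Yield an IP and every IP below it in the hierarchy."""
    yield ip
    for child in ip.children:
        yield from walk(child)


def incompatible_with_pdk(top, pdk_version):
    """Return names of IPs whose required PDK version differs from pdk_version.

    IPs that declare no 'pdk_version' are treated as unconstrained
    (an assumption made for this sketch).
    """
    return [
        ip.name
        for ip in walk(top)
        if ip.metadata.get("pdk_version") not in (None, pdk_version)
    ]


# Illustrative hierarchy; names and versions are made up.
bluetooth = IP("bluetooth", {"pdk_version": "1.2"})
usb_phy = IP("usb_phy", {"pdk_version": "1.1"})
cpu = IP("cpu_subsystem")
soc = IP("soc_top", {"pdk_version": "1.2"}, [bluetooth, usb_phy, cpu])

print(incompatible_with_pdk(soc, "1.2"))
```

Because the hierarchy and the metadata live in one model, the check is a single tree walk rather than a hunt through per-block documentation.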

In the first diagram there’s a blue area showing that IP can be imported for re-use from many Data Management (DM) sources:

  • Perforce
  • git
  • Subversion
  • Custom

If your particular DM system isn’t listed, then just contact Methodics to see if they’ve already got an import available. The files in your DM can be primarily binary, text or a mixture of the two, so it’s your choice and there’s no restriction on DM type or how you make relationships between IPs of each type.

Workspaces are used to save specific configurations of your own choosing. Changes to an IP and its metadata can then be saved as a release, and each release captures the relationships between all IPs in your hierarchy at that one point in time. With any particular release you can run simulation and functional verification, and the results are then attached to that release. Everyone on your team can be notified when a new release happens on some IP.

There are even third-party integrations with requirements management and bug-tracking tools, so team members always know for each IP what the requirements associated with it are, along with any bug reports. Here’s another diagram to show how an IP configuration connects with other tools in your IC flow:

So with the Percipient methodology you can go to one place and find out all information about your electronic system, from the top-level all the way down to the lowest block levels. You will know where each block is being used and how often it is being re-used, along with the requirements and performance, plus the history of changes made to it and by whom. Searching through the Percipient catalog is quick and easy, so it takes a lot of the guesswork out of complex IC design projects.

Projects that need to comply with Functional Safety (FuSa) will enjoy the traceability features built into Percipient, so that you can validate every safety function automatically at each release. Another benefit to automating FuSa compliance is that user responses to questionnaires can be attached to specific IPs, and then managed throughout the design hierarchy.

OK, this sounds promising so far, but how do I know how to best partition my specific design with this tool? The best practice is to place anything that could be re-used into a functional block as its own IP object. A functional block can contain sub-blocks too. Here’s another example of a hierarchy:

Your design team is typically composed of groups, and each group can be responsible for its own releases. The best practice is to release early and often as progress is made and milestones are reached. Both producers and consumers of IP blocks use the Percipient tool: a producer may be most interested in the latest version, while a consumer could be more interested in a fixed version that doesn’t change until they request an update. The producers are doing design work, running simulations and validations, reaching some quality goal and then making a new release, alerting consumers that a new version is ready to consume. All team members are in the loop and quickly learn to choose the proper release.
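The producer and consumer views of a release history can be illustrated with a small Python sketch. Everything here is hypothetical and not taken from Percipient: the `Release` record, the `passed_checks` flag (standing in for "reached some quality goal"), and the `ddr_ctrl` block are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """Hypothetical release record: an IP name, a number, and a quality flag."""
    ip: str
    number: int
    passed_checks: bool


def latest(releases, ip):
    """Producer view: the newest entry for an IP, whatever its status."""
    candidates = [r for r in releases if r.ip == ip]
    return max(candidates, key=lambda r: r.number)


def pinned(releases, ip, number):
    """Consumer view: a fixed release that only changes on request."""
    for r in releases:
        if r.ip == ip and r.number == number:
            return r
    raise LookupError(f"{ip} has no release {number}")


def update_available(releases, ip, current):
    """True if a newer entry that met its quality goal exists."""
    return any(
        r.ip == ip and r.number > current and r.passed_checks
        for r in releases
    )


history = [
    Release("ddr_ctrl", 1, True),
    Release("ddr_ctrl", 2, True),
    Release("ddr_ctrl", 3, False),  # work in progress, not yet qualified
]

print(latest(history, "ddr_ctrl").number)
print(pinned(history, "ddr_ctrl", 1).number)
print(update_available(history, "ddr_ctrl", 1))
```

The consumer stays on release 1 until it chooses to move, while `update_available` plays the role of the notification that a qualified newer version exists.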

Conclusion
Your SoC projects can be quite complex, containing Terabytes of data, so consider the benefits of using a proven, modern system to manage your IP with traceability, quickly and easily. Just look in one place to know the state of your design, while avoiding communication mistakes that could cost you an expensive silicon spin. The complete 7 page White Paper can be read here.



Cadence Automotive Summit Sensor Enablement Highlights

by Camille Kokozaki on 12-18-2018 at 7:00 am

At the November 14 Cadence Automotive Summit, Ian Dennison, Senior Group Director, outlined sensor enablement technologies and SoC mixed-signal design solutions, including Virtuoso electrically aware design and tools and methodologies for high current, high reliability, yield, and performance that enable ADAS/AV sensors for vehicle perception.

An ADAS/AV camera system was described as containing, on the transmit side, a Cadence Ethernet MAC, a BroadR-Reach PHY, filters, cables, and connectors leading to a decision-making board on the receiving side. Accompanying it are simulation-based EMI verification, modeling, PCB power integrity analysis, and S-parameter models to ensure that incoming data is received successfully without resending.

The automotive Ethernet MAC IP is available at 10Mbps, which replaces CAN and FlexRay; at 100Mbps, which still demands image compression; and at rates above 1Gbps, which avoid the need for image compression and so give the highest image quality and best object classification. It comes with DMA, APB configuration interface registers and a time stamp unit for Time Sensitive Networking to ensure camera data is not delayed. The automotive Ethernet MAC has received ASIL-B ready certification under the automotive ISO 26262 standard.

ASIL compliance requires a quality management process and certification, a safety manual for SEooC, a safety features description, failure modes, effects, and diagnostic analysis (FMEDA), and automotive safety kits for tools and flows. In implementing a time stamp unit (TSU) using Innovus, the TSU block is duplicated, with timer outputs compared on a cycle-by-cycle basis to detect any faults, as one of the safety mechanisms. Other considerations include creating safety boundaries, where internal nets are kept inside each TSU and interface nets are not routed over them, to avoid common mode failures in the duplicated TSUs.
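The duplicate-and-compare safety mechanism can be modeled behaviorally in Python. This is a toy sketch, not Cadence's implementation: the `TimestampUnit` counter, the `run_lockstep` helper, and the injected bit flip are all assumptions chosen to show why a cycle-by-cycle comparison catches a fault in one copy.

```python
class TimestampUnit:
    """Toy timer: a free-running counter that increments once per cycle."""

    def __init__(self):
        self.count = 0

    def tick(self):
        self.count += 1
        return self.count


def run_lockstep(cycles, fault_cycle=None):
    """Clock two TSU copies together and compare outputs every cycle.

    If fault_cycle is given, one counter bit is flipped in the second
    copy on that cycle to emulate a random hardware fault.
    Returns the cycle at which the comparator saw a mismatch, or None.
    """
    primary, shadow = TimestampUnit(), TimestampUnit()
    for cycle in range(1, cycles + 1):
        out_a = primary.tick()
        out_b = shadow.tick()
        if cycle == fault_cycle:
            shadow.count ^= 0b100  # injected fault: flip one counter bit
            out_b = shadow.count
        if out_a != out_b:
            return cycle  # comparator raises the fault signal
    return None


print(run_lockstep(100))
print(run_lockstep(100, fault_cycle=42))
```

A fault that hit both copies identically would pass this comparison unnoticed, which is why the physical separation of the two TSUs matters as much as the comparator itself.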

ISO 26262 SoC design compliance is ensured with the functional safety methodology described in the diagram. Some requirements for proper classification include continuous checks that are needed on the image sensor to enable failure signals to be raised within two frames. On-chip checkers are placed inside the chip to identify analog or digital functional failures that can result in image sensor row, column, ADC and clock failures. There are some types of image sensor failures that rely on DSP processes downstream to be properly detected.

There are some CIS ADAS/AV considerations that determine object classification success driven primarily by moving vehicle image quality. CIS ADAS/AV issues include high dynamic range (HDR) needed for bright/dark conditions, vehicle motion-induced rolling shutter distortion, LED street/vehicle lighting rolling shutter flicker mitigation needs, real-time shutter compensation, noise vulnerability, moving vehicle stabilization and gyroscope fusion and finally cost in a price sensitive automotive market.

A complete simulation platform for CIS analysis uses the ADE Product Suite and the Spectre family of simulators.

Designing the CIS ADC for the needed high dynamic range starts from two relationships: the CIS fps/shutter speed sets the ADC conversion rate, and the CIS dynamic range sets the ADC resolution (60 dB, a range of 1000:1, means a 10-bit ADC). The Cadence methodology characterizes the ADCs in the presence of temporal noise.
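The arithmetic behind the 60 dB / 1000:1 / 10-bit figures can be checked with a short Python sketch. It assumes the 1000:1 range is an amplitude ratio (hence the factor of 20 in the dB conversion) and that the ADC needs at least as many codes as the ratio; the function names are illustrative.

```python
import math


def dynamic_range_db(ratio):
    """Dynamic range in dB for a given max:min amplitude ratio."""
    return 20 * math.log10(ratio)


def adc_bits(ratio):
    """Smallest bit count whose 2**bits codes cover the ratio."""
    return math.ceil(math.log2(ratio))


print(dynamic_range_db(1000))  # about 60 dB
print(adc_bits(1000))          # 10, since 2**10 = 1024 >= 1000
```

A sensor targeting a wider range, for example 120 dB for HDR scenes, would need about 20 bits by the same rule, which is one reason HDR imagers often combine multiple exposures instead of relying on a single conversion.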

Uniformity of the CIS arrays is essential for proper design-in of electrical behavior and electrical reliability. Cadence Virtuoso electrically-aware design offers on-screen real-time parasitics and resistance analysis with colormaps and voltage drop summaries, and electromigration current flow, so they can be considered in the analysis and design.

Lidar uses several technologies such as CMOS used in SoCs for controller and Ethernet and image sensors, MEMS for scanning mirror, silicon photonics, III-V material for laser source, and system-in-package. The drive is towards low-cost, small form-factor lidar for automotive, medical, and industrial applications.

The end of Moore’s law is enabling a disaggregated SoC where packaging is the glue between different dies, and where AC coupling, losses, reflections, crosstalk, warping mitigation, and thermal and electromagnetic integrity need to be comprehended and dealt with.

Silicon photonics for frequency modulated continuous wave (FMCW) lidar must meet the 10 cm depth precision required for automotive use, which is addressed with tighter control of laser modulation and an electro-optical phase-locked loop (PLL). The output of a MEMS tunable laser is split into two waveguides. One waveguide is sent to the target, and the return signal is blended with the other; because the FMCW frequency is changing all the time, the mix generates a beat frequency as the signature.
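How a beat frequency maps to distance can be shown with the textbook FMCW relations in Python. This sketch assumes an ideal linear chirp and a stationary target (no Doppler term); the 1.5 GHz sweep, 10 µs chirp time, and 50 m target are made-up example values, not figures from the talk.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def beat_frequency(range_m, bandwidth_hz, chirp_s):
    """Beat frequency for a stationary target under a linear chirp."""
    round_trip_s = 2 * range_m / C
    slope = bandwidth_hz / chirp_s  # Hz of sweep per second
    return slope * round_trip_s


def range_from_beat(beat_hz, bandwidth_hz, chirp_s):
    """Invert beat_frequency to recover target range in metres."""
    return C * beat_hz * chirp_s / (2 * bandwidth_hz)


def range_resolution(bandwidth_hz):
    """Classic FMCW range resolution, c / (2B)."""
    return C / (2 * bandwidth_hz)


# Assumed example values.
B, T = 1.5e9, 10e-6
f_beat = beat_frequency(50.0, B, T)
print(f"beat frequency: {f_beat / 1e6:.2f} MHz")
print(f"recovered range: {range_from_beat(f_beat, B, T):.3f} m")
print(f"range resolution: {range_resolution(B) * 100:.1f} cm")
```

With these numbers the resolution comes out near 10 cm, which shows why sweep bandwidth and chirp linearity, and hence tight control of the laser modulation, drive depth performance. Resolution and precision are related but not identical, so treat the match as indicative only.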

Silicon photonics and MEMS co-design are enabled with Spectre APS and AMS Designer, Virtuoso ADE, schematic and layout suites, along with tools from partners like Lumerical and Coventor.

Cadence’s Legato Reliability Solution has a design-for-reliability approach extending the lifetime of the chips. When a failure occurs, functional safety kicks in to stop a car, but tools are needed to help a design-for-reliability mindset where analog defect analysis occurs to reduce the test cost and eliminate test escapes, electro-thermal analysis prevents thermal overstress avoiding premature failures, and advanced aging analysis accurately predicts product wear-out.

In ADAS radar sensor design, antenna sizes are shrinking allowing on-chip integration. A 122GHz radar includes a low-noise amplifier (LNA), power amplifier, mixer, and two on-chip antennas.

The Virtuoso RF Solution allows multi-fabric RF in PCB, SiP and SoC and interfaces with Spectre RF, Allegro Sigrity and National Instruments’ Axiem. An ADAS radar transceiver design was illustrated showing stretchable transmission lines with pCells, matched RX & TX antennas, Spectre RF and Virtuoso ADE Assembler showing the noise figure, input matching, gain, and stability.

Virtuoso RF allows a layered extraction of modules with EM solvers using QRC (a parasitic extractor), Sigrity PowerSI (a 3D-EM solver), and NI’s Axiem (a 2.5D solver for planar elements). The Sigrity PowerSI 3D-EM has an RF-module package extraction and critical path S-parameter model extraction for layered structure designs (on-chip, package, and PCB).

Datacenters are well suited for labeling training datasets: the training engine is run only once per dataset, whereas the inference engine runs on every image from the various sensors onboard the vehicle, which in turn feeds new data to the datacenter.

Training on the labeled datasets generates a set of coefficients with various weights that are pushed to the car, which then does a single-pass evaluation on the image and generates the most probable label for proper decision making.
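The "single pass, most probable label" step can be illustrated with a deliberately tiny Python model. A production network would be a multi-layer CNN; here a single linear layer with softmax stands in for it, and the labels, weights, biases, and two-element feature vector are all made-up values.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases, labels):
    """One forward pass of a linear classifier; returns (label, probability)."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up coefficients standing in for what the datacenter would push.
LABELS = ["pedestrian", "vehicle", "sign"]
WEIGHTS = [[2.0, -1.0], [-1.0, 2.0], [0.5, 0.5]]
BIASES = [0.0, 0.0, -1.0]

label, prob = classify([0.9, 0.1], WEIGHTS, BIASES, LABELS)
print(label, round(prob, 3))
```

The car never updates `WEIGHTS` itself in this picture; it only evaluates them, which is what keeps the in-vehicle workload to one pass per image.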

System and software design follow a spectrum starting from workstation simulation with no specialized hardware, to Xcelium parallel simulation with hardware running at about 1 kHz for software execution, moving to Palladium Z1 emulation with hardware running at ~1 MHz for software execution, then to Protium S1 FPGA prototyping at ~10 MHz and finally to first silicon on a prototype board. This allows development of OS, middleware, firmware, and drivers in parallel, with hardware-based simulation accelerating functional verification. An early start to software development on the new hardware accelerates time-to-market.
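To see what those speed steps mean in practice, the Python sketch below converts a fixed number of design clock cycles into wall-clock time on each platform. The platform speeds are the approximate figures quoted above; the one-billion-cycle workload is an assumed example, not a measured boot sequence.

```python
# Approximate execution speeds quoted in the text, in Hz.
PLATFORM_HZ = {
    "Xcelium parallel simulation": 1e3,
    "Palladium Z1 emulation": 1e6,
    "Protium S1 FPGA prototyping": 1e7,
}


def wall_clock_seconds(cycles, hz):
    """Time needed to execute a given number of design clock cycles."""
    return cycles / hz


# Assumed workload: one billion cycles of software execution.
WORKLOAD_CYCLES = 1_000_000_000

for name, hz in PLATFORM_HZ.items():
    seconds = wall_clock_seconds(WORKLOAD_CYCLES, hz)
    print(f"{name}: {seconds:,.0f} s ({seconds / 86_400:.2f} days)")
```

Under that assumption the same workload takes more than eleven days in simulation, under seventeen minutes in emulation, and under two minutes on an FPGA prototype, which is why software teams move up the spectrum as soon as the hardware is stable enough.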

The Cadence design enablement allows system and DSP design, advanced node SoC development, MEMS and Silicon photonics implementation, SiP integration and CNN software development, all in one interoperable environment that greatly enhances sensor design and opens design fabrics and opportunities leveraging improved accuracy, decision making, and reliability.

Read more here: Automotive Summit 2018 Proceedings


Photonics with CurvyCore

by Alex Tan on 12-17-2018 at 12:00 pm

As a preferred carrier of data or energy, photonics technology is becoming broad and diverse. In IC design, silicon-photonics technology has been the enabler of new capabilities and has revolutionized many applications as Moore’s-based scaling started to experience a slowdown. It acts as a new on-chip inductor in HPC design and as fast connectivity in network infrastructure.

At the Cadence Photonics Summit and Workshop 2018 held in San Jose last month, Cadence showcased its CurvyCore Infrastructure, a new technology intended for photonics applications. It is a native infrastructure in the Cadence Virtuoso custom IC design platform, allowing designers to create and edit complex curvilinear shapes common in photonics, RF, MEMS, microfluidics and conformal metal routing.

The State of Photonics Technology
The CurvyCore technology addresses markets and technologies that span from silicon photonics switches and interconnects for HPC/Datacenter, medical sensing applications, LiDAR, aerospace, and MEMS to carbon nanotube conductors. According to Dr. Vladimir Stojanovic from UC Berkeley, who gave a keynote at the Cadence 2018 Photonics Summit, the current photonics integration with advanced electronics leverages CMOS transistor performance, its process fidelity and package integration, to enable emerging SoCs for various applications ranging from computing to sensing and imaging.

Based on his team’s research, the sweet spot for the “zero-change” silicon photonics platforms used in this monolithic integration technology is either a 45nm or a 32nm SOI CMOS process. Both are suitable for adding photonic capability and enhancing integrated system applications such as main communication of computing tasks without involving complicated 3D integration efforts or double-patterning for EUV. Figure 1 captures the optical I/O landscape as well as the application of photonics on RISC-V microprocessor and DRAM.


Silicon photonics applications for fast interconnects have also evolved from a data center, off-chip centric type to a more integrated on-chip feature in both microprocessor and PIM (Photonic-Interconnected DRAM) designs. The open-source RISC-V microprocessor with photonic on-chip interconnect was first implemented as a single electronics-optics hybrid chip, a dual-core 1.65GHz processor in 45nm SOI CMOS.

Challenges to Silicon Photonics
Embracing silicon photonics is an evolving process, as handling curvilinear physical shapes is challenging. Unlike the traditional Manhattan polygons, custom design of curvilinear geometries is prone to misalignment, roundoff errors and manufacturing problems.
It is an effort-intensive undertaking, as the associated PCell creation is cumbersome and time consuming. Additionally, a lack of common infrastructure leads to many ad-hoc, non-replicable and sub-optimal flows, which translate into complex DRC/LVS problems to fix. All of these drive the need for a robust platform to address the overall physical design automation.

CurvyCore Technology and Its Key Benefits
For an optimal performance, the CurvyCore technology has been natively implemented in the Virtuoso platform. The CurvyCore infrastructure has a three-tier data model and is an extension to the Virtuoso advanced-node platform, which sits on a high-performance symbolic mathematical engine.

As part of the Virtuoso expanded data model, the CurvyCore infrastructure provides full access to all levels of design capture, from building blocks to the actual symbolic expressions, enabling the creation and maintenance of differentiated curvy IP. For example, figure 3 shows phase shifters for a LiDAR (Light Detection And Ranging) application comprising orthogonal polygons for the electrical connections and curvy geometries for the optical interconnect, both of which can be concurrently viewed and manipulated.

The diagram in figure 4 illustrates the CurvyCore data model. It starts with the mathematical core, which holds an accurate mathematical representation through symbolic equations. The second layer (in magenta) captures curve geometries discretized to their equivalent shapes, and the top, physical layer (in pink) contains the layout polygons as OA shapes. The combination of curvilinear discretization, boolean and sizing operations enables design rule fixing related to complex photonic shapes.
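The step from an exact curve to layout polygons, and the roundoff it introduces, can be illustrated with a generic Python sketch. This is not CurvyCore's algorithm or API: it simply samples a circular arc, snaps vertices to a manufacturing grid, and reports the two error sources. The 10-unit radius, 16 segments, and 0.001-unit grid are assumed example values.

```python
import math


def discretize_arc(radius, start_deg, end_deg, segments, grid):
    """Approximate a circular arc by polygon vertices snapped to a grid.

    All lengths share one unit (for example microns). Returns a list of
    (x, y) vertices, each rounded to the nearest multiple of `grid`.
    """
    points = []
    for i in range(segments + 1):
        angle = math.radians(start_deg + (end_deg - start_deg) * i / segments)
        x = radius * math.cos(angle)
        y = radius * math.sin(angle)
        points.append((round(x / grid) * grid, round(y / grid) * grid))
    return points


def chord_error(radius, start_deg, end_deg, segments):
    """Largest gap between the true arc and a chord joining exact vertices."""
    step = math.radians(abs(end_deg - start_deg)) / segments
    return radius * (1 - math.cos(step / 2))


def snap_error(points, radius):
    """Largest radial offset introduced by grid snapping."""
    return max(abs(math.hypot(x, y) - radius) for x, y in points)


# Assumed example: 10-unit radius quarter circle, 16 segments, 0.001 grid.
pts = discretize_arc(10.0, 0.0, 90.0, 16, 0.001)
print(len(pts), "vertices")
print(f"chord error: {chord_error(10.0, 0.0, 90.0, 16):.5f}")
print(f"snap error:  {snap_error(pts, 10.0):.5f}")
```

Keeping the symbolic description as the master copy means the polygon layer can be regenerated at a finer segmentation whenever these errors matter, rather than editing already-rounded vertices.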

Aside from enabling the creation and editing of complex curvilinear shapes, the CurvyCore integration with the Cadence Virtuoso custom IC design platform leverages a unified design environment for the development of multi-fabric systems. It has new APIs to allow complex PCell creation and an efficient data model to support high-performance editing or storage of curvilinear shapes within the Virtuoso design platform.

CurvyCore also supplements the Virtuoso Layout Suite and works seamlessly with the most advanced Virtuoso features, allowing true co-design and integration of electronics and curvilinear features. For example, during the Cadence Photonics workshop attendees were given the opportunity to use the Virtuoso custom IC design platform to view or edit a LiDAR photonics IC and to perform co-simulation of beam steering using Spectre® AMS Designer, MATLAB and Lumerical INTERCONNECT as part of a test-drive of the CurvyCore infrastructure implementation in the Virtuoso platform.

The CurvyCore technology is planned for general availability in Q1-2019.

For more about CurvyCore check HERE and for the Cadence 2018 Photonics Summit check HERE.


Intel Discontinues the Custom Foundry Business!

by Daniel Nenni on 12-17-2018 at 7:00 am

After I mentioned, as an aside in a SemiWiki forum discussion, what I heard at IEDM 2018, that Intel was officially closing the merchant foundry business, I got a lot of email responses, so let me clarify. Honestly I did not think it was a big surprise. Intel Custom Foundry was an ill-conceived idea (my opinion) from the very start and was not successful by any measure. To be clear, it is not something I just heard, it is something I have verified through multiple sources so I believe it to be true, absolutely.

Just a little background: we started blogging about Intel in the early days of SemiWiki and have posted 202 Intel-related blogs that as of today have been viewed 2,822,613 times, which is an average of 13,973 per blog. Big numbers in the semiconductor blogging world in my experience. Intel has a very large group of entrenched supporters with even more naysayers that are not easily swayed so there are plenty of blog comments, some of which had to be deleted. My argument against Intel opening up their leading edge manufacturing facilities to the fabless community was that it would be a distraction from Intel’s core competency of making microprocessors. As we know, ecosystem is everything with the foundry business and that takes time, money, and technical intimacy, three things that Intel seemed to greatly underestimate.

Also read: Intel Custom Foundry Explained!

Altera was the big win for the Intel Custom Foundry business. I was having coffee with a friend in TSMC Fab 12 when it was announced. If my memory serves it was Dr. Morris Chang who made the announcement and it honestly felt like parents were divorcing. It was mentioned that TSMC viewed this as a learning experience and would make sure that losing an intimate partner like Altera would never happen again.

Also read: Apple will NEVER use Intel Custom Foundry!

Altera was founded in 1984, the same year I started my semiconductor career. Some of my school friends joined Altera and I worked with Altera as a customer during my EDA and IP career down to 20nm so I had a front row seat. It was a very close relationship between Altera and TSMC up until Xilinx came to TSMC at 28nm. TSMC gave Xilinx equal access which soured the Altera relationship. Altera then moved to Intel at 14nm which led to the acquisition at a premium price.

One of the funniest stories I heard was about the first copy of the Intel 14nm design rules Altera got from Intel. They were heavily redacted, which is something I had never seen in the foundry business. After many delays Intel put their own implementation team on the first 14nm Altera tapeout and the result was a very competitive FPGA chip. If not for the continued delays, Xilinx would have been in serious trouble as the Intel 14nm FPGA, based on my experience with customers, beats the Xilinx 16nm in both density and performance.

You can see the 2014 Intel Custom Foundry pitch HERE. Great intentions, good effort, too many broken promises, but doomed from the very beginning, my opinion.