
Customizing and Standardizing IP with eSilicon at the Linley Conference
by Camille Kokozaki on 04-22-2019 at 12:00 pm

During the SoC Design Session at the just-concluded Linley Spring Processor Conference in Santa Clara, Carlos Macian, Senior Director of AI Strategy and Products at eSilicon, gave a talk entitled ‘Opposites Attract: Customizing and Standardizing IP Platforms for ASIC Differentiation’.

Standardization is key to IP in modern systems-on-chip (SoC), yet without customization a huge amount of revenue, performance and area optimization is left on the table. The spectrum from standard to custom IP runs from common functions that become standard IP, which can evolve into an IP platform, which in turn becomes what eSilicon terms an ASIC chassis, and finally to fully customized IP.

A recent customer design, a machine learning ASIC, included a large amount of IP: 400 Mb of embedded SRAM, 48 lanes of 28G SerDes, PCIe SerDes, an HBM2 PHY, custom and compiled memories, PLLs, eFuse, analog blocks and PVT monitors, to name a few. Standard IP is the opposite of your special secret sauce, but it is critical to your schedule, cost and efficiency. In a typical 7nm data center chip these days, 40-50% of the area and power of the ASIC is related to the IP, and 30-50% of the unit cost depends on the IP, before even accounting for royalties. IP-specific NRE also accounts for 30-50% of the NRE development cost, making it the single highest cost after mask tooling at 7nm and more expensive than the total design labor cost from RTL to tape-out.

In addition, there is the effort needed for IP integration, from test and bring-up to integration/verification interfaces and interaction with providers. The IP may not be as critical as your secret-sauce solution, but it needs to be as cost efficient, as easy to integrate and test, and as power/performance/area efficient as you can possibly get. When IP is used, the number one goal is to reduce or eliminate the development and integration effort for the parts of the design that are not critical.

How do standards help? Common interfaces aid interconnectivity and shared functionality, allowing easy integration and fewer interoperability disconnects. Deliverables in standardized form also simplify the task. Beyond standardized IP, platforms enable the harmonization of various IPs so they work well together for a particular node or market niche. Verification is always needed, and for certain applications there is enough commonality in desired functionality and deployment that templates can greatly facilitate deployment and implementation, with cost and time savings. IP, by definition, includes overhead, since not all use cases need the same functional features, so a common denominator of sorts injects additional circuitry to cover all bases. Customizing down to just the needed features can therefore save a lot of power and real estate. Trimming unneeded memory is one example of customization that pays dividends.
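
As a rough illustration of why trimming pays off, here is a minimal back-of-the-envelope sketch; the capacities, density and leakage figures are hypothetical placeholders rather than eSilicon data, and the point is only that unused provisioned memory translates directly into wasted area and power:

```python
# Hypothetical back-of-the-envelope estimate of the savings from trimming an
# oversized SRAM down to the capacity a given design actually uses.
# All numbers below are illustrative assumptions, not eSilicon figures.

def trimming_savings(provisioned_mb, used_mb, area_mm2_per_mb, leakage_mw_per_mb):
    """Return (area saved in mm^2, leakage saved in mW) from right-sizing the memory."""
    trimmed_mb = provisioned_mb - used_mb
    return trimmed_mb * area_mm2_per_mb, trimmed_mb * leakage_mw_per_mb

# Example: a standard IP ships with 8 MB of SRAM but the application needs only 5 MB.
area_saved, leak_saved = trimming_savings(
    provisioned_mb=8, used_mb=5,
    area_mm2_per_mb=0.5,   # assumed compiled-SRAM density at the target node
    leakage_mw_per_mb=2.0  # assumed leakage per MB
)
print(f"Area saved: {area_saved:.2f} mm^2, leakage saved: {leak_saved:.1f} mW")
# -> Area saved: 1.50 mm^2, leakage saved: 6.0 mW (37.5% of the provisioned memory)
```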

Carlos Macian closed by stressing that IP matters. In summary: use standard IP for consistent, predictable results, in conjunction with custom IP to increase your market advantage, be it through features, performance, power or area. Though opposite in means, the two are synergistic, and together they represent optimal design practice for timely market success at desirable customer value.

During the panel discussion at the end of the session, Carlos Macian was asked about eSilicon’s approach to pre-verification. The approach is to verify the RTL, then the netlist operation, and then, in the other dimension, to go beyond standalone functionality and verify integration with the cores. That last step of course cannot be pre-verified; it is only possible toward the end, when the implementation is complete, and it is the responsibility of the customer. But at all levels the verification cycle is streamlined and more straightforward. Silicon-verified IP blocks are mandatory to increase the confidence factor in functionality. On the AI front, building blocks are provided to generate the AI tiles. The difference between inference and training affects the functionality you place in the AI tile, but it does not affect what surrounds it, such as the ASIC chassis. When asked about the RISC-V value proposition, eSilicon said it believes RISC-V greatly facilitates integration, though other processors are also used in its solutions.

The neuASIC™ platform provides compiled, hardened and verified 7nm functions, greatly simplifying the design task by providing fixed functions and streamlined data-flow architectures. One reason we see hesitation to optimize every system to the last degree is that the field is evolving rapidly. The semiconductor community at large is conservative and risk averse, given the multimillion-dollar cost of advanced silicon nodes, yet certain workloads are becoming more mature, better understood and more prevalent. As those mature workloads are better understood, even while new workloads and new network models are still being discovered, the incentive to build optimized implementations that will scale extremely well for a larger user base becomes more attractive. For the newer workloads being discovered, programmability plays a key role; modern accelerators provide programmability, for example through RISC-V processors, sitting next to hardware-optimized solutions.


As far as deliverables are concerned, eSilicon provides the standard list for hard IP, from Verilog models to GDSII, test benches, integration guidelines, data sheets, timing constraints and silicon reports. The integrity of the overall solution is covered, given eSilicon’s ASIC heritage and knowledge of where potential interoperability issues arise, and ongoing communication with customers ensures their functionality is assured as well as possible. Best-in-class IP is no longer enough; what is needed is compatible IP and an architecture that allows the IP to work together. Programmability and configurability, such as compilable memory, provide customization. Configurability also extends to the SerDes, with parameterized control of the transmit and receive channels.

IP integration in the ASIC matters and IP customization matters. In order to differentiate your product, you need to take advantage of those two aspects and bring out the value in both dimensions.


User2User Silicon Valley 2019
by Daniel Nenni on 04-22-2019 at 7:00 am

This will be one of the more interesting Mentor User Group meetings now that the Siemens acquisition has fully taken effect and the new management team is in place. The Mentor User Conference is at the Santa Clara Marriott, Santa Clara, California on May 2, 2019 from 9:00 am to 6:00 pm.

Remember, in 2017 Siemens acquired Mentor Graphics for $4.5B, representing a 21% stock premium. Acquisition rumors had been flying around the fabless semiconductor ecosystem, but no one would have guessed the buyer would be the largest industrial manufacturing company in Europe. At first the rumors were that Siemens would break up and sell Mentor, keeping only the groups that were part of Siemens’ core business; specifically, they would sell the Mentor IC group. Those rumors were flatly denied at the following Design Automation Conference during a CEO roundtable, and now Mentor, including the IC group, is an integral part of the Siemens corporate strategy.

Last year Wally Rhines transitioned from Chairman and CEO of Mentor to CEO Emeritus. It’s not just an honorary title; Wally still spends 20% of his time at Mentor, mostly with customers. Joseph Sawicki is now in charge as the Executive VP, Mentor IC EDA, a Siemens Business. Everyone knows Joe; he has been with Mentor for close to 30 years and is a “leading expert in IC nanometer design and manufacturing challenges. Formerly responsible for Mentor’s industry-leading design-to-silicon products, including the Calibre physical verification and DFM platform and Mentor’s Tessent design-for-test product line, Sawicki now oversees all business units in the Mentor IC segment.” Not only that, Joe is a helluva good guy. I will be there for Joe’s keynote. If you see me please say hello, it would be a pleasure to meet you.

The event details and registration can be found here.

Join Mentor on May 2, 2019 at the Santa Clara Marriott in Santa Clara, California for User2User Silicon Valley, a one-day conference and exhibition dedicated to end-users of Mentor EDA/IC solutions. Admission and parking for U2U are always free, and registration includes access to 45+ technical presentations, lunch, an end-of-day networking reception, and more! U2U gives you the opportunity to learn from and meet face-to-face with technical experts who design leading-edge products using Mentor tools. Stay all day and you’ll have a chance to win some fantastic prizes at the closing session!

U2U Silicon Valley is focused on these key areas:

  • Analog/Mixed-Signal Verification
  • Functional Verification and Emulation
  • Design-for-Test and Semiconductor Data Analytics
  • IC Design and Verification
  • High Density Advanced Packaging
  • MEMS and Custom/Analog Design for the IoT Era

Keynotes

Joe Sawicki
Executive VP, Mentor IC EDA, Mentor, a Siemens Business

Vicki Mitchell
VP, Technology Services Group, IPG, Arm

Allen Sansano
VP Engineering, Wave Computing

Session Highlights

  • How to Close Coverage 10X Faster Using Questa inFact – Microsoft
  • Integrated Approach to Power Domain/Clock Domain Crossing Checks – Challenges and Implementation – Cypress Semiconductor
  • Analog/Mixed-Signal (AMS) Design Challenges for high speed SerDes in nm-scale CMOS for 5G and Automotive Applications – Qualcomm
  • Accelerating verification of high precision MEMS sensor SoCs with Symphony – Invensense
  • Parasitic Extraction for GLOBALFOUNDRIES 22FDX-EXT PDK – GLOBALFOUNDRIES
  • Enabling faster top-level DRC runtimes through targeted optimizations and Mtflex – Customer Presentation
  • Maximizing Veloce Value for AI Design Verification – WAVE Computing
  • SSD Controller Verification with Veloce Solutions – SK Hynix Memory Solutions
  • What’s Driving Heterogeneous Integration and Which Packaging Option is Best? – TechSearch International
  • Package Assembly Design Kits Bring Value to Semiconductor Designs – Amkor
  • Improving Test and Fault Coverage with Tessent Cell-Aware Models using Artisan Physical IP Library – Arm
  • An AI Chip DFT Design Flow for Catching Time-To-Market (Gyrfalcon) – Gyrfalcon Technology Inc.
  • A Case Study of Testing Strategy for AI SoC (Enflame) – Mentor
  • GLOBALFOUNDRIES 22FDX® Custom Design with Mentor Tanner Tools – GLOBALFOUNDRIES
  • Supersede 5G – Gain Ics
  • Accelerating AR/VR Computer Vision Algorithms in a Hybrid HLS/RTL Approach – Facebook
  • In Depth Power Optimizations of Ultra Low Power STM32 Microcontroller with Nitro-SoC – STMicroelectronics

View the full conference agenda here.

U2U Exchange
Meet experts in the U2U Exchange, your hub for information, product demos, and technical advice. Share your experience and hear about the latest solutions from Mentor as well as featured partners. This year’s exhibitors include Arm, GLOBALFOUNDRIES, Samsung, TowerJazz, TSMC, and Oski Technology.


Auto Shows No Connection
by Roger C. Lanctot on 04-21-2019 at 12:00 pm

The Washington Auto Show, one of the largest auto shows in the U.S., has a problem and it is a problem shared by other auto shows in the U.S. and around the world. It is a problem that plagues the entire industry and it may spell trouble for connecting with car customers.

I visited the Washington Auto Show last week. The event closed on Sunday. I visited with my wife who wanted to know more about vehicle connectivity services. I was skeptical that the personnel manning the booths at the event could answer my wife’s questions and my skepticism was validated.

There was not a single booth with adequate literature or exhibits to explain connected vehicle services. It seems like a trivial matter. It’s not. It’s a big deal.

More than 20 years ago General Motors defined the meaning of vehicle connectivity with the launch of OnStar – a cellular-based system designed to provide post-crash emergency response. In the event of an airbag deployment, the OnStar system will call the nearest public safety answering point (PSAP) to connect the car to a call center capable of arranging the dispatch of fire fighters, police or medical personnel.

The elegance of the OnStar service resides in its simplicity. OnStar has always done this one thing well and, at the time of its launch, actually came to be adopted by several competing auto makers as a critical, foundational application for a connected car.

GM ultimately terminated its OnStar licenses to other auto makers, some of which created their own OnStar-like systems. The European Union liked the idea so much it introduced an emergency call (eCall) mandate that went into effect about one year ago.

A lot has changed about vehicle connectivity since the time of OnStar’s launch. The cellular network changed from analog to digital, for one, enabling Internet access to cars.

In turn, GPS and embedded navigation have become widespread and now car makers are experimenting with digital assistants integrated with vehicle systems. In spite of these advances, though, car makers have yet to clearly define what vehicle connectivity is or means to consumers.

The strangeness of this failure arises from the fact that the automotive industry is on the cusp of a transformative change in vehicle connectivity. According to Strategy Analytics, 2019 marks the first year that more than 50% of new cars being shipped from factories globally will come with built-in wireless connections.

This reality will have escaped your notice if you attended the Washington Auto Show. Booth personnel were unfamiliar with the fundamentals of vehicle connectivity and most messaging beyond horsepower and cosmetics was focused on smartphone connectivity and safety systems. Even the electric vehicles, always a point of emphasis at the Washington show, were shown off primarily for their performance characteristics in an on-floor driving demo.

As far back as 2016, Strategy Analytics consumer research conducted in China, Europe and North America showed strong interest in embedded connectivity for a range of mission critical driving services.


My own shorthand for consumer interest in connected services is TWP – traffic, weather, parking. The list does need to be revised though, for 2019: TWPCS – as in, traffic, weather, parking, cybersecurity and software updates.

We have a major cybersecurity problem in the world today and connected cars are just one source of vulnerability. It may seem counter-intuitive but no car will be cyber-secure if it isn’t connected – though a connected car itself is vulnerable.

With millions of lines of software code embedded in most new cars, supporting increasingly sophisticated safety and infotainment systems, connectivity is a necessity. Yet consumer-focused car shows continue to neglect the vital messaging essential to educating the driving public.

The issue is even worse at dealerships. Vehicle connectivity ought to be seen as a means for reinforcing existing customer retention tools, but the average dealer sales person, with notable exceptions (BMW), is poorly trained on connected services and more inclined to tout smartphone connectivity to the average customer.

Conquering connectivity is a monumental task for the automotive industry. The world is poised on the threshold of a global roll out of 5G wireless technology designed to transform the fundamental nature of connectivity itself.

Cars will benefit mightily from 5G connectivity which will enable cars to avoid collisions in real-time while anticipating traffic light changes and being warned of road hazards ahead. Cars will be able to detect and prevent cyber attacks and receive vital software updates for navigation and safety systems.

There’s no excuse for car makers and their representatives to be incapable of accurately explaining connected vehicle systems and selling those systems to consumers. Horsepower and leather interiors are cool, but connectivity will save lives.


A Tale of Two Semis
by Robert Maire on 04-21-2019 at 7:00 am

It was the best of times (for stocks)
It was the worst of times (for memory chips)
The disconnect between stock & chip prices

The Venn Diagram of Stocks and Chips

Having been involved with semiconductor and tech stocks for a long time, I can say there has always been a loose correlation between the fortunes of the industry and the fortunes of the stocks, and it varies over time. Right now we are in one of those periods where the Venn diagram has little overlap, as the stocks have been on a tear while the industry wallows in the mud. Memory chip pricing and demand have been bad, to say the least, and logic demand has not lit the world on fire as the whole smartphone industry has clearly slowed, led by Apple.

You don’t buy equipment when cutting capacity
With Micron cutting back on wafer starts by 5%, they are voting with their feet. A 5% cut back in wafer starts does not correlate to a 5% cut in equipment purchases. Equipment purchases are almost binary. In times of glut, such as we are in now, equipment purchases related to capacity go to zero, while equipment purchases related to technology slow. We doubt that Samsung or other memory makers are spending to build capacity as they also have too much.

A third order derivative market
We have said many times that higher-order derivative markets are more volatile than first-order markets, and semi equipment is a third-order derivative market:

  • Smart phone & consumer sales slow – sneezes
  • Semi market sees memory crater – catches a cold
  • Semi Equip memory sales go to zero – gets pneumonia

Life at the end of the food chain is always more volatile. We saw it in the up cycle and we are seeing it now in the down cycle.

The “new normal” is likely lower than the “old normal”
The chip industry went through a “perfect storm” of circumstances that is not likely to be repeated when the industry recovers. The last cycle was driven by the move from rotating media to SSDs, the conversion from 2D to 3D memory, and multi-patterning due to the lateness of EUV, among other factors, all of which are one-time events that will not be repeated.

This most recent up cycle was higher and longer than normal because of these unique one-time events, which drove demand beyond what we would otherwise have seen.

Although AI, VR, and 5G are on the horizon, it’s unclear that they will drive the industry to equal the previous unusual high. None of these requires significant new technology such as the 2D-to-3D conversion or multi-patterning; these drivers are primarily just different chips, not huge increases in demand or new technology that will force big equipment buys.

Gravitational attraction of realities
The realities of the stocks and of the industry are Venn diagrams that vary over time but always seem to have a gravitational attraction that brings the stocks back to the “real reality” of what’s happening in the market. Right now the stocks do not reflect the continuing weak chip market, but sooner or later either the stocks will fall or the chip market will pick up, as stocks and the industry get back into alignment.

Betting on a bounce back
Right now, given the divergence, investors are betting that the industry “bounces back” fairly quickly and will support the overly extended stock prices. We are not so sure of this. We have yet to see any hard evidence of any kind of recovery in the chip market, let alone the quick “bounce back” in the second half of 2019 that many bulls are calling for. It seems a bit difficult to suggest that memory prices will bounce back with Micron cutting production and Samsung doing the same. 5G is still a long way off (and not happening at Intel…) and smartphones are certainly slower.

There is no evidence or calculation that can predict when the semi industry will recover. Each down cycle and up cycle is different in its shape and duration. Anyone who claims to be able to predict the cycle is lying. Right now all the talk about a 2H 2019 recovery is no more than hope and conjecture, about as accurate as those who said the industry was going to have a “one quarter” air pocket in the summer of last year. That one-quarter air pocket is going on about a year now….

Will investors get impatient?
Perhaps the biggest question of Q1 earnings reports is about investor patience. Reports will not likely be very rosy nor particularly upbeat about Q2 or future quarters. Will investors continue to sit on stocks whose price is based on a significant recovery that has no basis in reality? So far, the tear that stocks have been on has lasted over a quarter, and some of the stocks keep gaining ground and have hit 52-week highs.

What’s priced in?
It seems that investors are pricing in a second-half recovery and an industry getting back to a pace similar to what it had over a year ago. We think this is essentially priced for perfection, as it’s hard to have a lot of upside from those assumptions and there is certainly more downside risk of a slower or lower recovery, or both.

The view from the trenches
People in the industry we speak to seem incredulous at the stock prices but are obviously more than willing to accept the benefit. Very simply, business is not as good as the stocks would imply and we would challenge someone to suggest their business has improved as much as their stock has.

Has the China risk gone away?
Also absent from stock valuations in the semiconductor group is any discount related to existing and potential trade issues. The March deadline for tariffs set by the White House has come and gone, and it doesn’t sound like we have significant confidence in a solid deal. China trade seems to have gone the way of the Korean de-nuclearization talks….lots of initial bluster and promises followed by deafening silence.

Maybe this is a good thing for the stocks as ignoring the issue may make it go away…at least as far as stock impact is concerned.

The stocks – Time to take money off the table?
From a high-level perspective we find it harder to paint an upside scenario for the stocks from here than a downside scenario. We think the downside beta is higher than the upside beta. Taking some money off the table in some of the chip stocks that have been on a tear since the beginning of the year seems a reasonable strategy to lock in some of those short-term gains.

If we had a more aggressive attitude there are likely some of the stocks that could be shorted here.

Our negative bias is more on memory-related companies, as that part of the chip market is not getting better any time soon and investors may lose patience there first. Logic and foundry, while not great, are in better shape, likely less oversupplied, and have more new drivers.

No matter what, this earnings season is critical given the rapid rise we have seen in the stocks in Q1 and we are at a crossroads that will see volatility.


SPIE Advanced Lithography Conference – Imec and Veeco on EUV
by Scotten Jones on 04-19-2019 at 12:00 pm

At the SPIE Advanced Lithography Conference Imec presented several papers on EUV and Veeco presented about etching for EUV masks. I had the opportunity to see the presentations and speak with some of the authors. In this article I will summarize the key issues around EUV based on this research.

EUV is ramping up into high volume 7nm production at Samsung and TSMC, and Intel plans to introduce EUV with their 7nm process next year. Although EUV is ramping for 7nm there is still a lot of room for improvement in the technology and going forward 5nm and 3nm will introduce additional challenges.
Continue reading “SPIE Advanced Lithography Conference – Imec and Veeco on EUV”


TSMC Q1 2019 Earnings Call Discussion!
by Daniel Nenni on 04-19-2019 at 7:00 am

It’s no coincidence that the TSMC Symposium is right after the Q1 earnings call. This will allow TSMC to talk more freely, and they certainly will, in my opinion. It is a very interesting time in the semiconductor industry, and TSMC, being the bellwether, can tell us what will happen the rest of the year and give us some 2020 insights.

TSMC CEO C.C. Wei again led the call with a prepared statement. This time I will paste the entire statement (minus the packaging stuff) with my embedded comments.

  • Thank you, Lora. Good afternoon, ladies and gentlemen. Let me start with our near-term demand and inventory. We concluded our first quarter with revenue of TWD 280.7 billion or USD 7.1 billion, in line with our revised guidance. Our business in the first quarter was impacted by three factors: first, the overall global economic condition, which dampened the end market demand; second, customers are ongoing inventory adjustment; and third, the high-end mobile product seasonality. Meanwhile, the net effect from the photoresist defect material incident also impact our first quarter revenue by about 3.5%.

My question here is: Who is liable for this defect? Is the supplier being held accountable? Accounts of this incident from South Korea painted TSMC as negligent, which I have found to be fake news.

  • Moving into second quarter this year. While the economical factor and mobile product seasonality still linger, we believe we may have passed the pattern of the cycle of our business as we are seeing customers’ demand stabilizing. Based upon customer indications for their business and wafer loading in second quarter, we also expect our customers’ overall inventory to be substantially reduced and approach the seasonal level around the middle of this year.

Personally, I feel the second quarter will be stronger than expected based on 2018 year end CEO comments. It is better to under predict than over predict and I believe that is what is happening here. Let’s not forget the Q1 2019 semiconductor guidance we previously published:

  • In the second half of this year, TSMC’s business will be supported by this year’s inventory base as well as strong demand from our industry-leading 7-nanometer technology, which support high-end smartphone new product launches, initial 5G deployment and HPC-related applications. For the whole year of 2019, we forecast the overall semiconductor market is good in memory as well as foundry growth to both be flattish. For TSMC, we reiterate that we expect to grow slightly in 2019.

To me this is low single digits but closer to 5% than 1%. Here are the previously published analyst forecasts for 2019:

  • Now let me update the photoresist material incident. On February 15, in order to ensure quality of wafer delivery, TSMC announced it will scrap a large number of wafers as a result of a batch of bad photoresist material from a chemical supplier. This batch of photoresist contain a foreign polymer that created a desirable – undesirable effect and resulted in yield deviation on 12-and 16-nanometer wafers at Fab 14B.
  • We have since taken corrective action to enhance our defenses and minimize future risk. Our actions including the following: improved TSMC’s own in-house incoming material, conforming test and controls; upgrade control and methodology with all suppliers for incoming material quality certification; establish robust in-line and off-line monitoring process to prevent defect escape.

TSMC does not point fingers but again I would like to know more about this event.

  • Now I will talk about our N5 status. Our N5 technology development is well on track. N5 has entered risk production in first quarter, and we expect customer tape-outs starting this quarter and volume production ramp in first half of 2020. With 1.8 times logic density and 15% speed gain and an ARM A72 core compared with 7-nanometer, we believe our N5 technology is the most competitive in the industry. With the best density performance, power and the best transistor technology, we expect most of our customers who are using 7-nanometer today will adopt 5-nanometer. With N5, we are expanding our customer product portfolio and increasing our addressable market. Thus, we are confident that 5-nanometer will also be a large and long-lasting node for TSMC.

To be clear, TSMC 5nm chips will be in Apple products next year. I have read reports that TSMC released 6nm because 5nm was late, which is fake news. I know many companies that are taping out at 5nm and it is on track and meeting expectations. More details will be available on SemiWiki after the symposium so stay tuned.

  • Now I’ll talk about the ramp up of N7 and N7+ and introduction of N6. We are seeing strong tape-out activity at N7, which include HPC, IoT and automotive. Meanwhile, our N7+, which adopts EUV for few critical areas, has already started volume production now. The yield rate is comparable to N7. We’ll reaffirm N7 and N7+ will contribute more than 25% of our wafer revenue in year 2019.

If you look at TSMC’s Q4 2018 revenue split, 50% is FinFET processes and 50% is mature CMOS nodes. In Q4 2017 FinFET processes were 45% and in Q4 2016 it was 33%. In Q1 2019 FinFET revenue dropped to 42%, not a good sign, let’s blame cryptocurrency.

  • As we continue to improve our 7-nanometer technology and by leveraging the EUV landing form, N7+, we now introduce N6 process. N6 has three major advantage. First, N6 have 100% compatible design rules with N7, which allows customer to directly migrate from N7-based design, which substantially shorten the time-to-market. Second, N6 can deliver 18% higher logical density as compared to N7 and provide customer with a highly competitive performance-to-cost advantage. Third, N6 will offer shortened cycle time and better defect density. Risk production of N6 is scheduled to begin in first quarter year 2020 with volume production starting before the end of 2020.

N6 is a little bit confusing thus far. Hopefully we can get it cleared up at the TSMC Symposium. From what I understand N7 and N7+ are not design rule compatible since N7+ has EUV. N6 is N7+ with an additional layer of EUV which helps with density. Saying N6 and N7+ are design rule compatible makes sense but is N6 really design rule compatible with N7?

  • Finally, I will talk about the HPC as our most important growth driver in the next five years. CPU, AI accelerator and networking will be the main growth area for our HPC platform. With the successful ramp of N7, N7+ and the upcoming N6 and N5, we are able to expand our customer product portfolio and increase our addressable market to support applications, such as data center, PC and tablets. Meanwhile, we also see networking querying thanks to 5G infrastructure deployment over the next few years. We are truly excited about our growth opportunities in HPC. Thank you for your attention.

AI is a trending term on SemiWiki and readership is all over the map. I seriously doubt it will be a quick bubble like cryptocurrency or even a 10 year bubble like mobile. In my opinion AI will be with us for a very long time and it will consume leading edge wafers like a zombie apocalypse, absolutely.

From what I have heard EUV throughput is still ramping up so my fingers are crossed for 5nm. Hopefully EUV is covered in more detail next week at the TSMC Symposium. I will also get a refresh from our resident EUV expert Scotten Jones. In fact, he has just posted an EUV blog from SPIE:

SPIE Advanced Lithography Conference – Imec and Veeco on EUV

Bottom line: The second half of 2019 will be good for TSMC and 2020 will be even better. My prediction today for TSMC in 2020 is back to double digit growth. Remember, now that Intel is out of 5G modems TSMC will get the modem business back from Apple next year via the 7nm QCOM modem plus other 5G modem business. 2020 will be the beginning of a beautiful 5G friendship.


Flex Logix InferX X1 Optimizes Edge Inference at Linley Processor Conference
by Camille Kokozaki on 04-18-2019 at 12:00 pm

Dr. Cheng Wang, Co-Founder and SVP Engineering at Flex Logix, presented the second talk in the ‘AI at the Edge’ session at the just-concluded Linley Spring Processor Conference, highlighting the InferX X1 Inference Co-Processor’s high throughput, low cost, and low power. He opened by pointing out that existing inference solutions are not optimized for edge requirements, even though high-end server solutions exist. The edge needs to process images one at a time, within fixed power budgets, using larger images and larger models, with higher prediction accuracy. Since cameras at the edge see one image at a time, batching is not practical, and even high-end devices perform less well at low batch sizes.

Flex Logix started off with embedded FPGA and programmable interconnect technology and is now using it as the foundation of its technology stack. The nnMAX technology utilizes embedded FPGA that is integrated into SoCs, with density and performance comparable to leading FPGAs, using its XFLX™, ArrayLINX™ and RAMLINX™ technologies.
Flex Logix Technology Stack consists of:

  • Hardware

    • InferX™ PCIe Cards
    • InferX Edge Inference co-processor ICs
    • nnMAX™ Inference IP
    • eFPGA/ Interconnect Technology
  • Software

    • TensorFlow Lite, ONNX
    • Software driver
    • InferX/nnMAX Inference Compiler
    • eFPGA place and route back-end

Inferencing customers needed large numbers of EFLX DSP MACs, so a 1K-MAC nnMAX tile was developed. A detailed look at the 1K configurable-MAC inference tile shows the following architecture and features.


Winograd acceleration¹ for INT8 provides a 2.25x performance gain for applicable layers and is invoked automatically by the nnMAX compiler. The tile is also programmed via TensorFlow Lite/ONNX, with multiple models able to run simultaneously. The 1K tiles can be configured in any array size, with configurable L2 SRAMs supporting 1-4 MB per tile and variable DRAM bandwidth through reconfigurable I/Os, typically connecting x32 or x64 LPDDR4. The key advantage here is the ability to reconfigure the ports and controls of the data path for each layer; once configured, the data path runs with ASIC-like performance, with routing to memory and interconnect and with localized data access and compute.
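
To put those figures in context, here is a minimal back-of-the-envelope sizing sketch using only the numbers quoted in this article (1K MACs and a few MB of SRAM per tile, LPDDR4, a roughly 1.067 GHz clock); the formulas are generic peak-rate estimates, not Flex Logix’s own model, and they ignore the Winograd gain discussed below:

```python
# Rough sizing sketch for an nnMAX-style tiled array, using the figures quoted
# in the article. Generic peak-rate arithmetic, not Flex Logix's methodology.

def array_estimate(tiles, macs_per_tile=1024, sram_mb_per_tile=2,
                   clock_ghz=1.067, lpddr4_width_bits=32, lpddr4_mtps=3200):
    macs = tiles * macs_per_tile
    # Each MAC is a multiply plus an add, i.e. 2 ops per cycle (no Winograd credit).
    peak_tops = macs * 2 * clock_ghz * 1e9 / 1e12
    sram_mb = tiles * sram_mb_per_tile
    dram_gbps = lpddr4_mtps * 1e6 * lpddr4_width_bits / 8 / 1e9
    return macs, peak_tops, sram_mb, dram_gbps

macs, tops, sram, bw = array_estimate(tiles=4)   # X1-like: 4 tiles -> 4K MACs
print(f"{macs} MACs, ~{tops:.1f} TOPS peak, {sram} MB SRAM, {bw:.1f} GB/s DRAM")
# -> 4096 MACs, ~8.7 TOPS peak, 8 MB SRAM, 12.8 GB/s DRAM
# The article quotes 8.5 TOPS for the X1; this naive peak formula lands close.
```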

Winograd acceleration speeds up 3×3 convolutions with a stride of 1 by a factor of 2.25x. Though the algorithm creates 2x larger weights and a more complex data path, nnMAX performs the transformations on the fly, removing the weight penalty. Winograd is essentially free to the user from a performance perspective, because there is no penalty in DRAM power or precision; it is not free in hardware, but costs only some additional bits in the multipliers.
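
For readers who want to see where the 2.25x comes from, the sketch below is the textbook Winograd F(2×2, 3×3) algorithm in numpy (the generic Lavin-Gray formulation, not Flex Logix’s implementation): a 2×2 output tile of a stride-1 3×3 convolution takes 16 elementwise multiplies instead of the 36 a direct computation needs, and 36/16 = 2.25.

```python
import numpy as np

# Winograd F(2x2, 3x3) transform matrices (standard Lavin & Gray formulation).
BT = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_2x2_3x3(d, g):
    """One 2x2 output tile of a stride-1 3x3 convolution over a 4x4 input tile."""
    U = G @ g @ G.T              # transformed 3x3 filter -> 4x4
    V = BT @ d @ BT.T            # transformed 4x4 input tile -> 4x4
    return AT @ (U * V) @ AT.T   # 16 elementwise multiplies instead of 36

def direct_2x2_3x3(d, g):
    """Reference: direct 3x3 cross-correlation producing the same 2x2 tile."""
    return np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)] for i in range(2)])

rng = np.random.default_rng(0)
d, g = rng.standard_normal((4, 4)), rng.standard_normal((3, 3))
assert np.allclose(winograd_2x2_3x3(d, g), direct_2x2_3x3(d, g))
print("multiply reduction:", 36 / 16)   # -> 2.25, the gain quoted for nnMAX
```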


Some layers have large intermediate frame sizes that may not fit in on-chip SRAM (e.g., YOLOv3 layer 0 outputs 64MB), resulting in DRAM writes and re-reads, putting a strain on DRAM bandwidth and potentially causing pipeline stalls when those layers are processed. To address this, multiple layers are run in parallel: in the YOLOv3 case, layers 0 and 1 run simultaneously, with layer 0 streaming directly into the nnMAX clusters processing layer 1, avoiding the need to store the 64MB.
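
The sketch below illustrates the streaming idea in plain Python (a toy pipeline with stand-in layer functions and made-up sizes, not the nnMAX dataflow): by consuming the first layer’s output a few rows at a time, only a small rolling buffer is ever resident instead of the full intermediate frame.

```python
# Minimal sketch of layer fusion: instead of materializing the full layer-0
# output, stream it to layer 1 one row-stripe at a time so only a small rolling
# buffer is ever resident. The "layers" are placeholders; sizes are illustrative.
import numpy as np

def layer0_stripes(frame, stripe_rows=8):
    """Yield layer-0 output a few rows at a time (stand-in for a conv layer)."""
    for r in range(0, frame.shape[0], stripe_rows):
        yield np.tanh(frame[r:r + stripe_rows])        # placeholder compute

def layer1_streaming(stripes):
    """Consume layer-0 stripes as they arrive (stand-in for the next conv layer)."""
    acc = 0.0
    for s in stripes:
        acc += float(np.sum(np.maximum(s, 0)))         # placeholder compute
    return acc

frame = np.random.rand(416, 416).astype(np.float32)    # one camera image, batch = 1
result = layer1_streaming(layer0_stripes(frame))
# Peak intermediate storage is one 8-row stripe (~13 KB here) rather than the
# whole layer-0 output, which is the effect the nnMAX multi-layer mode targets.
print(result)
```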

InferX X1 applications include edge devices such as surveillance cameras, robots and set-top boxes, and edge servers such as edge gateways and low-end edge servers.

The InferX X1 edge inference co-processor, which runs at 1.067 GHz on TSMC 16FFC, is scheduled for Q3 2019 tape-out with 8.5 TOPS, 4K MACs, 8MB SRAM, x32 LPDDR4 DRAM and x4 PCIe Gen 3/4 lanes. Total worst-case dynamic power for YOLOv3, the most demanding model, on the PCIe card and including DRAM and regulators, is 9.6W. InferX X1 silicon and PCIe cards will sample by the end of 2019. Typical power is 2.2 Watts on ResNet-50 and varies by model.

InferX X1 throughput is 3 to 11 times that of existing edge inference ICs, and devices can be chained for higher inference throughput. The performance gain is greater on large models such as YOLOv2 and YOLOv3. Furthermore, its throughput/Watt is 4 to 26 times better, allowing edge devices to stay within their power budget.
The nnMAX compiler front-end flow performs the neural-network-model-to-soft-logic translation. The back-end flow performs place-and-route, retiming, pipelining and binary generation. The compiler first estimates performance, then accepts the X1 floorplan and TensorFlow Lite (soon ONNX) models, automatically partitions the model across multi-layer configurations, and computes performance, latency, MAC utilization and DRAM bandwidth per layer and per model.
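
To give a feel for the per-layer arithmetic such a compiler has to reason about, here is a minimal sketch of generic conv-layer bookkeeping (MACs, weight bytes and activation bytes from layer shapes); these are standard first-order formulas with illustrative shapes, not the nnMAX compiler’s actual cost model:

```python
# First-order, per-layer arithmetic of the kind an inference compiler performs
# when partitioning a model: MACs, weight bytes and activation bytes per conv
# layer. Generic formulas for a stride-s KxK convolution, not Flex Logix code.

def conv_layer_stats(h, w, c_in, c_out, k=3, stride=1, bytes_per_elem=1):
    h_out, w_out = h // stride, w // stride
    macs = h_out * w_out * c_out * c_in * k * k
    weight_bytes = c_out * c_in * k * k * bytes_per_elem
    act_out_bytes = h_out * w_out * c_out * bytes_per_elem
    return macs, weight_bytes, act_out_bytes

# Example: an early, high-resolution layer (large activations, small weights)
# versus a late, low-resolution layer (small activations, large weights).
for name, shape in [("early", (608, 608, 32, 64)), ("late", (19, 19, 512, 1024))]:
    macs, wb, ab = conv_layer_stats(*shape)
    print(f"{name}: {macs/1e9:.1f} GMAC, weights {wb/1e6:.2f} MB, activations {ab/1e6:.1f} MB")
# Early layers are activation-bound (tens of MB of intermediate data), which is
# exactly why fusing them, as described above, matters more than for late layers.
```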

During the panel discussion Dr. Wang was asked about the perception of complexity; he stressed that the architecture is simple and that verification is no more difficult than verifying an FPGA design. When asked about memory requirements, he stated that since there is no way to store all parameters in SRAM in edge devices, you can try to train the model to be sparse or partition it over multiple devices; certain aspects can also be serialized across multiple devices. When asked about the time it takes to reconfigure the fabric, the answer was about 1 microsecond per layer: for video processing at 30 frames per second with a model having 100 configurations, that means cycling through 3,000 configurations per second, about 300 microseconds per layer, so the user will not experience a drop in performance, with what was designed to be an acceptable hardware impact and complexity. When asked how one addresses models other than CNNs, he said that the embedded FPGA runs a lookup table from anything to anything. Most such functions are not throughput-intensive, and FPGAs handle them beautifully; the activation function is all in the lookup table. Most models come down to matrix multiplication or data movement, and FPGAs are optimized for that. Delivering enough bandwidth at the edge, when GDDR or HBM are not in the cards, is the reason the architecture was designed the way it was, so that not much DRAM is required.

Edge applications are all about efficiency: how much throughput one can get for a certain amount of power and a certain amount of cost. The goal is to extract as much performance as possible, and their solution is as close to data-center performance as possible while still in the edge space. FPGAs typically have a problem with capacity because they try to map everything at once. Flex Logix uses multiple configurations to address the capacity issue: a certain amount of resources is required to map a model, and the compiler decides how to multiplex data and what degree of parallelism to use based on how many resources are available and how many the model requires.

Geoffrey Tate, Flex Logix CEO, emphasized how reusing their FPGA technology delivers very high-throughput inference capability for the more challenging models that customers want to run at low power and low cost. The chip customer programs ONNX or TensorFlow Lite models, and Flex Logix software takes care of the FPGA internals. The interconnect technology can reconfigurably program non-blocking paths from on-chip memory through the hardware units such as the MACs and back to memory, giving much more efficient computation than other processor architectures.
___
1 Distinct from the conventional convolution algorithm, the Winograd algorithm uses fewer computing resources but puts more pressure on memory bandwidth. The Flex Logix architecture mitigates that.


ML and Memories: A Complex Relationship
by Bernard Murphy on 04-18-2019 at 7:00 am

No, I’m not going to talk about in-memory-compute architectures. There’s interesting work being done there, but here I’m going to talk about mainstream architectures for memory support in Machine Learning (ML) designs. These are still based on conventional memory components/IP such as cache, register files, SRAM and various flavors of off-chip memory, including the not yet “conventional” high-bandwidth memory (HBM). However, the way these memories are organized, connected and located can vary quite significantly between ML applications.

At the simplest level, think of an accelerator in a general-purpose ML chip designed to power whatever edge applications a creative system designer might dream up (Movidius provides one example). The accelerator itself may be an off-the-shelf IP, perhaps FPGA or DSP-based. Power may or may not be an important consideration; latency typically is not so important. The accelerator is embedded in a larger SoC controlled by maybe an MCU or MCU cluster along with other functions, perhaps the usual peripheral interfaces and certainly a communications IP. To reduce off-chip memory accesses (for power and performance), the design provides on-chip cache. Accesses to that cache can come from both the MCU/MCU cluster and from the accelerator, so these must be coherently managed.

Now crank this up a notch, to ADAS applications, where Mobileye is well-known. This is still an edge application, but performance is much more demanding from latency, bandwidth and power consumption standpoints. Complexity is also higher; you need to support multiple accelerator types to support different types of sensor and sensor fusion for example. For scalability in product design, you cluster accelerators in groups, very likely with local scratchpad memory and/or cache; this enables you to release a range of products with varying numbers of these groups. As you increase the numbers and types of accelerators, it makes sense to cluster them together using multiple proxy cache connections to the system interconnect, one for each accelerator group. In support of your product strategy, it should then be easy to scale this number by device variant.

Arteris IP supports both of these use-cases through their Ncore cache-coherent NoC interconnect. Since this must maintain coherence across the NoC, it comes with its own directory/snoop filters. The product also provides proxy caches to interface between the coherent domain and non-coherent domains, and you can have multiple such caches to create customized clusters of IP blocks that use non-coherent protocols like AXI, but can now communicate as a cluster of equals in the cache coherent domain. Arteris IP also provides multiple types of last-level cache including the Ncore Coherent Memory Cache, which is also tied into coherency management to provide a final level of caching before needing to go to main memory. For non-coherent communications, Arteris IP also provides a standalone last-level cache integrating through an AXI interface (CodaCache).

These ML edge solutions are already proven in the field: Movidius and Mobileye are two pretty compelling examples (the company will happily share a longer list).

Moving now to datacenter accelerators, memory architectures look quite different based on what’s happening in China. I’ve talked before about Baidu and their leading-edge work in this area, so here I’ll introduce a new company: Enflame (Suiyuan) Technology, building high-performance but low-cost chips for major machine-learning frameworks. Enflame is a Tencent-backed startup based in Shanghai with $50M in pre-series A funding, so they’re a serious player in this fast-moving space. And they’re going after the same objective as Cambricon, and Baidu with their Kunlun chip – the ultimate in ML performance in the datacenter.

I’ve also talked before about how design teams are architecting for this objective – generally a mesh of accelerators to achieve massive parallelism in 2-D image processing. The mesh may be folded over into a ring, or folded twice into a torus, to implement RNNs and support processing of temporal sequences. The implementation is often tiled, with say 4 processors per tile plus local memory, and tiles are abutted to build up larger systems, simplifying some aspects of place and route in the back-end.
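
As a minimal illustration of what folding a mesh into a torus means for the interconnect, the sketch below computes neighbour coordinates with wrap-around modular arithmetic; it is a generic topology toy, and the 4×4 array size and tile contents are assumptions for the example, not any particular vendor’s design:

```python
# Minimal sketch of a tiled 2-D torus: each tile has four neighbours, with the
# edges wrapped around so a "folded" mesh behaves like a ring in each dimension.
# Purely illustrative; real tiled accelerators choose their own topologies.

def torus_neighbours(row, col, rows, cols):
    """Return the (row, col) coordinates of the N/S/E/W neighbours on a torus."""
    return {
        "north": ((row - 1) % rows, col),
        "south": ((row + 1) % rows, col),
        "west":  (row, (col - 1) % cols),
        "east":  (row, (col + 1) % cols),
    }

# A 4x4 array of tiles (say 4 processors per tile, so 64 cores): the corner tile
# still has four neighbours because the mesh wraps around.
print(torus_neighbours(0, 0, rows=4, cols=4))
# -> {'north': (3, 0), 'south': (1, 0), 'west': (0, 3), 'east': (0, 1)}
```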

Designs like this quickly get very big and they need immediate access to a lot of off-chip working memory, without the latency that can come with mediation through cache coherency management. There are a couple of options here: HBM2 at high-bandwidth but at high cost, versus GDDR6 at lower cost but also lower bandwidth (off-chip memory on the edge is generally LPDDR). Kurt Shuler (VP Marketing at Arteris IP) tells me that GDDR6 is popular in China for cost reasons.
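
The bandwidth side of that HBM2-versus-GDDR6 trade-off is easy to sketch from nominal, publicly quoted per-pin data rates; the figures below vary by device and speed grade, so treat them as rough assumptions rather than spec-sheet values:

```python
# Back-of-the-envelope bandwidth comparison behind the HBM2 vs GDDR6 trade-off.
# Pin counts and data rates are nominal, commonly quoted figures and may differ
# per device/speed grade; the point is the rough trade, not exact specs.

def peak_gbps(pins, gbit_per_pin):
    return pins * gbit_per_pin / 8          # GB/s

hbm2_stack = peak_gbps(pins=1024, gbit_per_pin=2.0)   # one HBM2 stack
gddr6_chip = peak_gbps(pins=32,   gbit_per_pin=16.0)  # one x32 GDDR6 device
lpddr4_x32 = peak_gbps(pins=32,   gbit_per_pin=3.2)   # edge-class LPDDR4, for contrast

print(f"HBM2 stack:   {hbm2_stack:.0f} GB/s")   # -> 256 GB/s
print(f"GDDR6 device: {gddr6_chip:.0f} GB/s")   # -> 64 GB/s
print(f"LPDDR4 x32:   {lpddr4_x32:.1f} GB/s")   # -> 12.8 GB/s
# Matching HBM2-class bandwidth with GDDR6 means ganging several devices, which
# is cheaper per GB/s but costs board area, pins and power: the trade-off noted above.
```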

Another wrinkle in these mesh/tiled-mesh designs is that memory controllers are placed around the periphery of the mesh to minimize latency between cores in the mesh and the controllers. Traffic through those controllers must then be managed through to channels on the main memory interface (e.g. HBM2). That calls for a lot of interleaving, reordering, traffic aggregation and data-width adjustment between the memory interface and the controllers, while preserving the benefits of high throughput from these memory standards. The Arteris IP AI-package provides the IP and necessary interfacing to manage this need. On customers, they can already boast Baidu, Cambricon and Enflame at minimum; two of these (that I know of) have already made it through to deployment.
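
To make the interleaving idea concrete, here is a minimal sketch of the simplest possible channel-interleaving scheme (a plain modulo mapping with an assumed channel count and granule size; real NoC and memory-controller hashing is considerably more sophisticated than this):

```python
# Minimal sketch of channel interleaving: spread consecutive cache-line-sized
# chunks of the physical address space across the memory controllers placed
# around the mesh, so sequential traffic from any tile exercises all channels.
# Illustrative only; the channel count and granule here are assumptions.

def channel_of(address, num_channels=8, interleave_bytes=256):
    """Map a physical address to a memory channel by interleaving granule."""
    return (address // interleave_bytes) % num_channels

# Sequential 256-byte bursts rotate across all 8 peripheral controllers:
for addr in range(0, 8 * 256, 256):
    print(hex(addr), "-> channel", channel_of(addr))
# A stride equal to num_channels * interleave_bytes, by contrast, would hammer a
# single channel, which is the kind of hotspot the interconnect has to avoid.
```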

Clearly there is more than one way to architect memory and NoC interconnect for ML applications. Kurt tells me that they have been working with ML customers for years, refining these solutions. Since they’re now clearly king of the hill in commercial NoC solutions, I’m guessing they have a bright future.


TechInsights Gives Memory Update at IEDM18 DRAM and Emerging Memories
by BHD on 04-17-2019 at 12:00 pm

On the Sunday evening at IEDM last year, TechInsights held a reception in which Arabinda Das and Jeongdong Choe gave presentations that attracted a roomful of conference attendees.

This is the second part of the review of Jeongdong’s talk; we covered NAND flash technology in the last post. Jeongdong is a Senior Technical Fellow at TechInsights, and their subject-matter expert for memory technology. Before joining the company, he worked as a Team Lead in R&D for SK Hynix and Samsung advancing next-generation memory devices, so he knows whereof he speaks.
Continue reading “TechInsights Gives Memory Update at IEDM18 DRAM and Emerging Memories”


Hogan Fireside Chat with Paul Cunningham at ESDA
by Bernard Murphy on 04-17-2019 at 7:00 am

If you’re in verification and you don’t know who Paul Cunningham is, this is a guy you need to have on your radar. Paul has risen through the Cadence ranks fast, first in synthesis and now running the verification group, responsible for about a third of Cadence revenue and a hefty percentage of verification tooling in the semiconductor industry. Since he was honored as one of the outstanding innovators under 40 at DAC 2017, you should realize he really is on the fast track and is likely to significantly influence how you will be verifying in the future. The ESD Alliance hosted an event recently at which Jim Hogan interviewed Paul, to help us learn more about this rising star and his entrepreneurial journey.

Paul is a fellow Brit/ex-Brit; there are a lot of us around (at least 5 at the ESDA meeting). He took his first degree (CS) at Cambridge, also rowed for the university, then stayed at Cambridge to get his Ph.D. in formal verification of asynchronous circuits. He was quite open about his journey of discovery in async circuits, saying he originally drank the Kool-Aid, believed this design style would conquer the world and decided he wanted to start a company to build compilers for self-timed chips.

Together with a co-founder, Paul started Azuro in Cambridge, raising ~$100k. Talking to prospects, they got a quick reality check on the difference between what is academically interesting and what can make serious money. They found that prospects weren’t interested in self-timed circuits but were very interested in better clock gating and useful skew. Paul/Azuro reworked their PowerPoint to reflect this reality and started doing deals with well-known companies. That woke up the big VCs; ultimately Benchmark Capital, who have a branch in London, put in $4M. Benchmark required, unsurprisingly, that Azuro move their HQ to the Bay Area (though there’s still an R&D operation in Cambridge, now driving clocks for Cadence).

Jim asked Paul what he learned from being a CEO. Pay attention here, would-be CEOs. He said that intense customer focus and agility to meet customer needs are primary. At the same time there’s a need for balance and a broad set of skills. No one, not even a CEO, has everything it takes, so it’s important to build a strong team, to fill gaps in expertise and ensure priorities are balanced. One of the gaps was marketing. In the early stages some wins were self-marketing; new prospects called them. Azuro got to escape velocity, but generally you can’t assume technology alone will get you there. If he were to do it over again, he’d be a lot more vocal, even shameless: not over-optimize the pitch, but pump up the volume and ensure that everyone knew the name. Jim added that now social media has to be a part of the strategy.

Charlie Huang, back then running strategy in Cadence, called Paul in 2010-2011. At that time, smartphones were really taking off and the ARM A9 had caught the wave. ARM were using aggressive clock gating and useful skew, giving them a 10% advantage in PPA. That’s massive in this business; Charlie (whose background was in timing) wanted it to be exclusive to Cadence. Paul had no ambitions to take Azuro public and Charlie saw the opportunity to have a powerful differentiator and grow market share. They just had to do the deal.

Again, for would-be CEOs, if you’re lucky enough to get there, this is one of the most painful stages in a startup; Paul said the due-diligence process was brutal. For several weeks they were gathering and assembling legal and financial docs (NDAs, patents, patent searches, customer contracts, audits, …), a very stressful, sleep-deprived time when the technologists are in a holding pattern while the lawyers and accountants do their thing. Even after that part is done, the transition from a small, tightly-knit startup group to being one group among many in a large enterprise is also traumatic. But Paul never regretted it, or the immense leverage it has enabled for the technology and for him personally.

At Cadence, Paul applied the Azuro technology to clock tree synthesis, then quickly took on a broader portfolio managing the digital back-end products. Logic synthesis, these days tightly coupled to implementation, is a solid pier in Cadence’s pretty clearly dominant implementation solution.

Not bad, but Paul wanted more. He saw one of the IBS charts at a kickoff event, the chart that shows growing investment in various phases of design. What stands out for everyone is that system and software verification dominate everything else. In his view if he wanted to make a real dent that was where he had to focus. Anirudh asked him about 15 months ago to run verification; Paul said this wasn’t a hard decision.

He believes the opportunity is boundless if Cadence can deliver new and compelling approaches. This starts with what I find to be a differentiated top-line goal – throughput. By this he means bugs found per dollar per day. He’s very single-minded about this goal; objective by objective, he asks does this move the throughput needle or not? I consider this goal to be an important new direction. When I look at verification pitches over the last 10+ years, it can be difficult to isolate a unifying metric or philosophy other than run faster! ease of use! more features! Laudable goals of course, but how do specific advances affect customer success and profitability? Implementation flows and teams don’t have this problem – they’re always optimizing for PPA. There’s no confusion about the right metric. Verification needs the same singular objective. That’s what I see in this direction.

Of course execution has to be broken down into sub-goals. For Paul this starts with the underlying bare-metal verification hardware – today x86 (Intel/AMD) and Arm-based servers for simulation, then emulation and FPGA prototyping. He sees hardware platforms as a variable; they will continue to evolve. Above the bare metal, he sees a heterogeneous compute layer, a hybrid mix of platforms to optimize throughput versus accuracy and bug-finding visibility. On top of that, smart analysis – isolating bugs faster and more intelligently in the always exponentially huge state-space.

Jim asked about compliance, safety and security. Paul likes Simon Segars’ (ARM) view, that all of us in the ecosystem enabling and building these transformational products have a responsibility to ensure these solutions are safe and secure. Verification has a big part to play in this, but for Paul this must be guided by the Lip-Bu/Anirudh philosophy of having the right to win. If you don’t have proven domain expertise, you need to work with people who do, which is why he’s so excited about the partnership with Green Hills, a company with proven leadership in automotive and in high-level security solutions.

For me this was a wake-up discussion, the first time in quite a while that I’ve seen someone who’s going to re-engineer the verification tooling business and move it onto a new level. I’m looking forward to hearing more.