
Intel's Manufacturing Day Materials

I see. So, that would leave Intel with a slight edge over TSMC with the 36nm MMP. Of course, there may be additional factors that make a pure apples-to-apples comparison impossible.

The source that told me 40nm MMP was wrong, as was the resulting 5-track calculation.

I will be writing up a density comparison shortly based on all the new data. Intel 10nm is far denser than TSMC or Samsung 10nm, but TSMC 7nm will likely be slightly denser than Intel 10nm.
 
What I like most is this slide https://3s81si1s5ygj3mzby34dq6qf-wp...el-moores-logic-transistor-density-metric.jpg and the consequent comparison with the competition https://3s81si1s5ygj3mzby34dq6qf-wp...es-logic-transistor-density-metric-others.jpg .

Intel promised here that they will deliver a stunning 100 MTr/mm2, which is, I must say, amazing. Even more amazing is that the others will deliver "just" 50 MTr/mm2 (which is, I guess, an accurate number).

But let's look at the other numbers they showed here. If we start from 45nm, they showed 3,3 MTr/mm2, which I guess is roughly in line with the delivered 3,08 MTr/mm2. But if we look at the rest...

I averaged all the Intel processors I know of across the nodes shown in the slide, and here is the table:

[table]
node (nm) | Intel slide (MTr/mm2) | Intel delivered (MTr/mm2) | Intel blend of reality (slide / delivered)
45        | 3,3                   | 3,08                      | 1,07
32        | 7,5                   | 5,25                      | 1,43
22        | 15,3                  | 7,80                      | 1,96
14        | 37,5                  | 14,81                     | 2,53
10        | 100,8                 | ??                        | ??
[/table]
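
To be explicit about how I got the last column: it is just the slide number divided by my delivered average. A minimal sketch of the arithmetic (Python, values copied from the table above, delivered figures rounded):

[code]
# Ratio of the density Intel showed on the slide to the density I calculated
# from shipping processors (rounded values from the table above).
slide     = {45: 3.3,  32: 7.5,  22: 15.3, 14: 37.5}    # MTr/mm^2
delivered = {45: 3.08, 32: 5.25, 22: 7.80, 14: 14.81}   # MTr/mm^2, my averages

for node in sorted(slide, reverse=True):
    ratio = slide[node] / delivered[node]
    print(f"{node}nm: {slide[node]:5.1f} / {delivered[node]:5.2f} = {ratio:.2f}x")
[/code]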

The Intel slide you are referencing is logic density only! SRAM density, I/O density and analog density will all be different and the mix of those with logic will determine the overall transistor density on the die. Just deciding to design with more or less cache will change the numbers a lot because SRAM transistor density is generally higher than random logic transistor density.
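
To illustrate the mix effect with made-up numbers (these are invented round figures, not measurements of any real die or process), chip-level density is roughly an area-weighted blend of the block densities, so shifting area toward SRAM raises the chip-level number without the process changing at all:

[code]
# Illustrative only: chip-level density as an area-weighted blend of block
# densities (MTr/mm^2). The densities and area fractions are invented round
# numbers, not data from any real process or product.
def die_density(blocks):
    """blocks: list of (area_fraction, density_MTr_per_mm2) tuples."""
    assert abs(sum(f for f, _ in blocks) - 1.0) < 1e-9
    return sum(f * d for f, d in blocks)

logic_heavy = [(0.70, 35.0), (0.20, 80.0), (0.10, 5.0)]  # logic, SRAM, analog/IO
cache_heavy = [(0.40, 35.0), (0.50, 80.0), (0.10, 5.0)]  # same process, more cache

print(die_density(logic_heavy))  # ~41 MTr/mm^2
print(die_density(cache_heavy))  # ~55 MTr/mm^2 - the mix alone moved the number
[/code]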
 
The Intel slide you are referencing is logic density only!

One of the problems of Intel's logic density metric is that it fails to account for different layout styles.
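
For reference, my understanding of how the metric in the slide is defined is Intel's published weighted NAND2 / scan flip-flop formula. The cell areas and the flip-flop transistor count below are placeholders for illustration, not real library data. Because only those two cell footprints enter the formula, anything that changes density elsewhere in the layout is invisible to it:

[code]
# Intel's proposed logic transistor density metric, as I understand it:
# a 60/40 weighted mix of a 2-input NAND cell and a scan flip-flop cell.
# The cell areas and the flip-flop transistor count used below are
# placeholders for illustration, not actual library data.
def intel_density(nand2_area_um2, sff_area_um2, nand2_tr=4, sff_tr=36):
    """Return density in MTr/mm^2 (numerically equal to transistors per um^2)."""
    return 0.6 * (nand2_tr / nand2_area_um2) + 0.4 * (sff_tr / sff_area_um2)

# Example with made-up cell areas:
print(intel_density(nand2_area_um2=0.045, sff_area_um2=0.30))  # ~101 MTr/mm^2
[/code]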


The following quotation is from one of your articles here at SemiWiki:

Using multi patterning is driving a lot of 1D layout restrictions. You are getting base scaling but it isn’t enough and you are adding a lot of cost. We need EUV so we can get back to 2D layouts. A 2D layout versus a 1D layout is worth about a node of scaling.

David Fried, CTO Coventor

SemiWiki.com - Coventor ASML IMEC: The last half nanometer
 
So, does that mean Intel's 8th generation is going to have a 20% bigger die? (70 to 84 is a 20% increase.)
 
I am not really a layout expert, but I believe that the advantages of 2D layout would be reflected in the resulting cell sizes and therefore in the Intel metric.

If you have a difficult-to-route (standard cell) library or are forced by lithography to do 1D routing on many layers, I would expect you to pay a density penalty.

[Attached image: 2_D_vs_1_D.jpg]
 

I agree that 2D is better than 1D for routing, but the question is whether that will show up in the cell sizes.
 
So, does that mean Intel's 8th generation is going to have a 20% bigger die? (70 to 84 is a 20% increase.)

I don't think it is that simple. If by opening up the CPP you get higher-performance transistors, maybe you need fewer of them, or you can go from a 3-fin to a 2-fin transistor, shrinking the standard cell. At SEMICON West this year there was a presentation where the author discussed how increasing pitches in some cases shrank the design.
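
A crude way to see how that can work is to treat a standard cell footprint as (gate pitches x CPP) by (tracks x metal pitch). The numbers below are purely illustrative, not taken from any real library:

[code]
# Crude illustration: standard cell footprint = (gate pitches * CPP) x (tracks * MMP).
# All values are illustrative, not from any real 14nm library.
def cell_area_nm2(cpp_nm, mmp_nm, gate_pitches, tracks):
    return (gate_pitches * cpp_nm) * (tracks * mmp_nm)

tight   = cell_area_nm2(cpp_nm=70, mmp_nm=52, gate_pitches=3, tracks=9)  # taller, 3-fin-style cell
relaxed = cell_area_nm2(cpp_nm=84, mmp_nm=52, gate_pitches=3, tracks=7)  # shorter, 2-fin-style cell

print(tight, relaxed, relaxed / tight)  # 98280 91728 ~0.93 - the relaxed-CPP cell is smaller
[/code]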
 
OK, thanks for the answer.
 
The Intel slide you are referencing is logic density only! SRAM density, I/O density and analog density will all be different and the mix of those with logic will determine the overall transistor density on the die. Just deciding to design with more or less cache will change the numbers a lot because SRAM transistor density is generally higher than random logic transistor density.

Well, I understand what you mean, but then it is an apples-to-oranges comparison. Why are they comparing Intel's logic density with numbers the competition achieved in real products?

Or a different question: why is Intel's logic density so far from what they are achieving in processors? AMD Zen achieved 25 MTr/mm2, about 10 more than Intel's 14nm processors. And here the obligatory portions of analog, I/O and SRAM are similar.

Or you have probably heard about Adapteva. They achieved almost 39 MTr/mm2 on the 16FFC node (in the Epiphany-V chip). That is the average over all the parts implemented (logic, I/O, SRAM...) and it is better than what Intel stated as their 14nm density. Why?

Sorry for the probably stupid questions. I don't want to say that you are missing something, since I know that you are significantly better informed than me (I am more into bigger nodes, where we don't count transistors in billions), but I see what Intel is promising and it doesn't quite match what they are delivering in their products...
 

Intel is comparing their density to other companies' density calculated the same way using the same metric, so it isn't apples versus oranges, although I am working to confirm their calculations now with my own data. The objective is to compare processes without design differences confusing the issue.

As I said previously, SRAM has a higher native transistor density than logic.

AMD's Ryzen 7 1800X looks good compared to Intel's Core i7 Skylake, but if you look at the cache sizes:

L1 - Skylake 64KB, Ryzen 768KB
L2 - Skylake 256KB, Ryzen 4MB
L3 - Skylake 8MB, Ryzen 16MB

The much larger caches will make Ryzen look a lot better than it is because of SRAM density. You would also have to dig into on-chip I/O and analog, and that might be very different as well.
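
As a back-of-the-envelope check on how much raw transistor count sits in those caches (assuming 6T cells for the data arrays, ignoring tags, ECC and periphery, and using the sizes exactly as quoted above):

[code]
# Back-of-the-envelope: raw 6T SRAM transistor count for the cache sizes
# quoted above (data arrays only; tags, ECC and peripheral circuits ignored).
def sram_transistors(size_bytes):
    return size_bytes * 8 * 6          # 8 bits per byte, 6 transistors per bit cell

KB, MB = 1024, 1024 * 1024
skylake = sum(map(sram_transistors, (64 * KB, 256 * KB, 8 * MB)))
ryzen   = sum(map(sram_transistors, (768 * KB, 4 * MB, 16 * MB)))

print(f"Skylake caches ~ {skylake / 1e6:.0f}M transistors")  # ~418M
print(f"Ryzen caches   ~ {ryzen / 1e6:.0f}M transistors")    # ~1044M
[/code]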

I am working on a blog now that will use consistent logic and memory density metrics to compare processes. Wherever possible I will be checking the claims by the different companies against third-party data.
 

Why are you comparing per-core data for Intel Skylake with per-chip data for Zen? And why are you comparing a 4-core version with an 8-core version?

Here is a detailed comparison of more comparable models:

[table]
                           | Intel Broadwell-E            | AMD Zen
L1 per core                | 32KB I-cache + 32KB D-cache  | 64KB I-cache + 32KB D-cache
L1 total                   | 640KB                        | 768KB
L2 per core                | 256KB                        | 512KB
L2 total                   | 2,5MB                        | 4MB
L3 per ring-bus stop / CCX | 2,5MB                        | 8MB
L3 total                   | 25MB                         | 16MB
[/table]


Also, you probably remember this table from ISSCC: http://img.deusm.com/eetimes/2016/02/1331317/Zen-comparison.png

It describes approximately a 1,2x advantage in SRAM density for AMD. But the chip itself is almost 1,7x better for AMD.

I/O should make a difference, but the number of attachable devices is similar. AMD has fewer PCIe lanes, but on the other hand it has Infinity Fabric used to communicate with the other dies in the MCM... As for the rest, high-current USB is supported by the chipset on the motherboard...

The memory interface is also similar. Both have 2 to 4 memory channels, which means a 128- to 256-bit memory interface (or 288-bit with ECC), IIRC. The point is that graphics cards have a 384- or even 512-bit memory interface, while density remains good.
 
This conversation is getting outside of what I follow and analyze. I was trying to make a quick point to help you understand why the chip-level comparison doesn't relate to process density. I don't generally look at the chip level and I just grabbed an example from the internet; my expertise is in process technology and cost, not chip design.

When process technologists compare processes they do it using low-level elements that are common across processes, to avoid mixing up design-related decisions with actual process density and performance. At conferences such as IEDM and VLSIT where processes are presented and compared, the metrics are contacted poly pitch, minimum metal pitch, on-state drive current, off-state leakage, SRAM cell size, operating voltage, etc. These are metrics that allow apples-to-apples comparisons of processes without mixing in design decisions.

In your first post on this thread you compared "Intel slide" - the transistor density for a specific pair of logic elements - to "Intel delivered" - the transistor density for an entire die mixing many types of elements - and calculated "Intel blend of reality". This is an invalid and unfair way to look at it. Your title suggests marketing spin, when in fact the two numbers shouldn't match, because the delivered density is so tied to design decisions that have nothing to do with the process.

If a designer was told to design a die in the Intel process and their only goal was to maximize transistor density, they could design a low-performance SRAM array with HD SRAM cells and achieve a higher transistor density than Intel reports. Conversely, a die with lots of analog elements, lots of interconnect and other things could be designed with a much lower transistor density. Even designing an SRAM cache array presents many different approaches that result in different areas and performance on the same process. A single-ported SRAM array is much smaller than a double-ported SRAM array for the same number of bits, and there are many other decisions in cache design that also drive different areas and performance.

At the die level, if Intel and AMD have processors with the same feature set, the same performance and the same power consumption, and one die is bigger than the other, all it really says is that one company did a better job of optimizing die size than the other; but even to conclude this you need to make sure the delivered performance really is the same. Maybe the design is better, maybe the process is better; you just don't know. Once again, I don't really follow this, but if the examples in your previous posts have the same performance, features and power, then AMD is doing a better job of optimizing area; until you make sure the delivered performance and power consumption are the same, though, you don't know, because maybe that was the design goal. The only way you could even start to compare process technology at the die level would be to take the same design, say a specific ARM core, and implement it on two different processes, and even then there would be some question as to whether the design just happened to use elements that match up better with one process than with the other.

You say things like "of course you are familiar with" this, or reference an ISSCC paper. ISSCC is really a chip design conference, not a process technology conference. I sometimes read ISSCC papers but not very often, and I don't go to the conference because it is outside my area of study.

I have looked at the GLOBALFOUNDRIES 14nm process that AMD uses and the Intel 14nm process, and at a process level the Intel process is denser. Even the ISSCC table you link to shows tighter pitches and a smaller SRAM cell size for competitor A than for Zen. The smaller L3 cache area for Zen versus competitor A, for the same 8MB of memory with a larger SRAM cell size, means they designed the cache differently. Maybe it is smaller but lower performance, or maybe it is just better designed; even with all the data in the table you can't tell which it is. Assuming the processors have similar performance, all you can say is that AMD did a better job of designing a smaller die.
 
There are three basic sets of optimisations going on here; one is process (CPP, MMP, Ion/Ioff, SRAM cell size); one is cell design (cell library height, routing method, SRAM configuration, library Vth and Lgate options), and a third is chip architecture. In all cases there is often no "best for everything" answer; choices which increase speed (="performance") can also increase power (e.g. taller cells), choices which increase density (e.g. smaller CPP) can add capacitance and reduce performance, choices which increase clock speed (e.g. more pipelining) can increase latency and power consumption.

Scott is concentrating on the first set of basic process parameters because this is a test of the capability of the raw process, including how difficult it is to manufacture; all the others are down to library and chip design decisions on top of the process, which makes chip-level "apples-to-apples" comparisons almost impossible. If company A has an inferior basic process to company B but makes better decisions about libraries and architecture they can end up with a better-performing chip in spite of the process disadvantage -- and the reverse also applies.
 
Does this mean 16FF+ is faster than Intel's first-generation 14nm?
View attachment 19551

Both 16FF+ and 14LPP, actually (and to make their 14+ look better, they also took the foundry 20nm node into account). We are talking about transistors only, of course.
What I really find funny here is that with one hand they show us their superior scaling, while with the other hand they show the performance improvement (but only one hand at a time). Except they had to move from a poly pitch of 70nm up to 84nm from 14 to 14++ to increase transistor performance.
 