
NVIDIA announces Tesla GV100 GPU on TSMC 12nm FFN

I just got through the NVIDIA collateral and let me say, WOW, what an incredible run they are having. The question I have is who will catch them? Certainly not AMD or Intel.

I am wondering about the inference side. NVDA seems to be chasing that market with GPUs, but it seems like overkill to me; FPGAs seem much more suited. I do understand the single-vendor approach to AI (one vendor for both training and inference), but on a chip-vs-chip level FPGAs seem much better suited. Thoughts?
 
NVIDIA now commands such power that they can get a half-node tailor-made for them? Wow.

AI is a target market for TSMC so yes they will work closely with NVDA. You should also know that NVDA has been working closely with TSMC since the beginning of both companies 20+ years ago. In fact, Jensen Huang and Morris Chang are very close friends.
 
Agreed on inference. The trend is toward skinnying down neural nets for inference to get much lower power and area: one- to four-bit multiplication and sparse-matrix handling, for example. That motivates more specialized hardware/IP, potentially even in high-volume applications. FPGAs? Maybe, though power remains a concern for edge nodes.
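If anyone wants to see what "skinnying down" means concretely, here's a toy sketch (plain NumPy, my own illustration, not any particular chip's scheme) of pruning near-zero weights and quantizing the rest to 4-bit integers:

```python
import numpy as np

# Toy example: prune near-zero FP32 weights and quantize the rest to 4-bit
# signed integers - the two tricks mentioned above for low-power inference.
rng = np.random.default_rng(0)
weights = rng.normal(0, 0.1, size=(256, 256)).astype(np.float32)

# Prune: zero out small weights so a sparse engine can skip them entirely.
weights[np.abs(weights) < 0.05] = 0.0
sparsity = (weights == 0).mean()

# Quantize: map the remaining values onto 16 levels (-8..7) with one scale factor.
scale = np.abs(weights).max() / 7.0
q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)  # 4-bit range

# Inference then multiplies tiny integers and skips zeros; dequantize with q * scale.
print(f"sparsity: {sparsity:.0%}, distinct levels: {len(np.unique(q))}")
```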
 
I think the training of NNs is better suited to GPUs. Here is a bit of a reference.

https://www.quora.com/Why-and-how-are-GPUs-so-important-for-Neural-Network-computations

Most commercial and educational NN software has built-in support for GPUs; the same can't be said for FPGAs.

Neural Networks with Parallel and GPU Computing - MATLAB & Simulink

As I've said before, GPU computing is ideal for algorithms that can benefit from parallelism, and NNs and inference certainly can. FPGAs are highly suitable for algorithms that benefit from reprogrammability; think search.
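To illustrate the "built-in GPU support" point, here's roughly what it looks like on the software side; a minimal PyTorch sketch (the framework choice is just for illustration), where moving the model and batch to the GPU is a one-line device switch:

```python
import torch
import torch.nn as nn

# Minimal illustration of how NN frameworks expose the GPU: the same model
# and tensors run on CPU or CUDA depending on a single device setting.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 784, device=device)      # a fake batch of inputs
y = torch.randint(0, 10, (64,), device=device)

loss = loss_fn(model(x), y)                   # forward pass (on GPU if available)
loss.backward()                               # backprop - the parallel-friendly part
optimizer.step()
```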
 
I was just at the Eurocrypt conference. The limiting step in breaking RSA public-key encryption is large matrix computations. That step is hard to parallelize, and the best known for sparse matrices is n squared (number of rows times number of columns). The matrices are huge. I wonder what special hardware for sparse matrix multiplication looks like. The best known for non-sparse matrices is about n**2.8 (Strassen's algorithm).

Also interesting is work on password-hashing algorithms that inherently force password cracking (exhaustive search) to use maximum memory and maximum non-parallelizable compute time, so cracking ASICs can't be built.
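On the password side, the idea is memory-hard key derivation (scrypt/Argon2 style). A minimal sketch with Python's built-in scrypt; the n and r parameters force each guess to touch roughly 128*r*n bytes of memory, which is what makes dense cracking ASICs uneconomical:

```python
import hashlib, os

# Memory-hard password hashing sketch: each guess must touch ~128*r*n bytes
# (~16 MB here), so an exhaustive-search ASIC needs lots of RAM per core.
salt = os.urandom(16)
key = hashlib.scrypt(b"correct horse battery staple",
                     salt=salt, n=2**14, r=8, p=1,
                     maxmem=64 * 1024 * 1024, dklen=32)
print(key.hex())
```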
 
Good question. I think it would depend very much on how the matrix is sparse. I got my info from the Cadence summit on embedded neural nets (https://www.semiwiki.com/forum/content/6589-notes-neural-edge.html and Embedded Neural Network Summit | Cadence IP), where sparsity is being exploited to reduce area and power over general neural algos. Here's one paper on handling sparse matrices for CNNs: http://www.cv-foundation.org/openac...arse_Convolutional_Neural_2015_CVPR_paper.pdf
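On how the sparsity actually gets exploited: the usual trick is a compressed format (CSR and friends) where only the nonzeros are stored and multiplied, so the work scales with the nonzero count rather than rows x cols. A rough sketch of the idea in plain Python (my own illustration, not any accelerator's actual scheme):

```python
# Sparse matrix-vector multiply in CSR form: only nonzeros are stored,
# so the work is proportional to nnz instead of rows * cols.
def to_csr(dense):
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]   # the zeros are never touched
        y.append(acc)
    return y

A = [[0, 2, 0, 0],
     [1, 0, 0, 3],
     [0, 0, 0, 0],
     [0, 4, 5, 0]]
print(csr_matvec(*to_csr(A), [1, 1, 1, 1]))   # -> [2.0, 4.0, 0.0, 9.0]
```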
 
Jensen said the 815 mm² chip was "reticle limited". Working backward from the photos, I'm guessing the mask is around 10 cm by 13 cm, so indeed you can only fit one on a 6-inch square reticle. Do you people think that will be a trend for processors? What does that mean for yields?
 
It can't get much bigger; it's already bumping up against 858 mm² (26 mm x 33 mm), the litho tool field size.
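Rough numbers behind that, assuming the usual 4x reduction masks: a 26 mm x 33 mm field becomes roughly 104 mm x 132 mm on the mask, which is about all a 6-inch (152 mm) reticle plate can hold:

```python
# Back-of-envelope check on "reticle limited", assuming a 4x reduction scanner.
field_x_mm, field_y_mm = 26, 33                    # max scanner field (858 mm^2)
mask_x, mask_y = field_x_mm * 4, field_y_mm * 4    # image is 4x larger on the mask
reticle_mm = 6 * 25.4                              # 6-inch mask plate

print(f"field: {field_x_mm * field_y_mm} mm^2 (GV100 die is ~815 mm^2)")
print(f"mask image: {mask_x} x {mask_y} mm on a {reticle_mm:.0f} mm plate")
# -> 104 x 132 mm, so a single GV100 image uses essentially the whole reticle.
```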
 
Just wondering: 12nm is 16nm with a 6-track cell instead of 7.5 or 9. But I've heard that reducing the track height also reduces performance, and yet 12nm claims 10% better performance. So how do they do it? Presumably they've changed something else apart from the track height?
 
That's 12FFC, the mobile node. We do not know the specs of 12FFN. It's possible that 12FFN still uses 7.5 track cells, because if you compare the transistor densities of GP100 and GV100, there's hardly any difference.
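A quick density check using NVIDIA's published transistor counts and die sizes (approximate, and it says nothing about the cell library directly):

```python
# Transistor density from NVIDIA's published figures (approximate).
gp100 = 15.3e9 / 610    # transistors per mm^2 on 16FF+
gv100 = 21.1e9 / 815    # transistors per mm^2 on 12FFN

print(f"GP100: {gp100/1e6:.1f} MTr/mm^2, GV100: {gv100/1e6:.1f} MTr/mm^2, "
      f"delta: {(gv100/gp100 - 1):.0%}")
# -> roughly 25.1 vs 25.9 MTr/mm^2, only a few percent apart.
```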
 
That's interesting. So 12FFN should really be called 16FFN.
 
>800 mm² ==> yield *way* below 50% -- probably the number of good die per wafer is in the single digits...

Nothing good. Max dice per wafer is ~60 and yield is below 50%, so most likely 30 good chips per wafer at best. Of course, the prime dice sell for thousands of dollars, so the profit is still there.
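For anyone who wants to sanity-check those numbers, here's the back-of-envelope version: the standard gross-die-per-wafer approximation plus a simple Poisson yield model. The defect density is my own assumption, not TSMC data:

```python
import math

# Gross die per 300 mm wafer (standard approximation) and Poisson yield.
die_area_mm2 = 26 * 33                 # GV100-class exposure, ~858 mm^2
wafer_d_mm = 300
gross = (math.pi * wafer_d_mm**2 / (4 * die_area_mm2)
         - math.pi * wafer_d_mm / math.sqrt(2 * die_area_mm2))

d0 = 0.1                               # defects/cm^2 -- an assumption, not TSMC data
yield_est = math.exp(-die_area_mm2 / 100 * d0)

print(f"gross die/wafer: {gross:.0f}, yield: {yield_est:.0%}, "
      f"good die/wafer: {gross * yield_est:.0f}")
# -> roughly 60 gross, ~42% yield, ~25 good die at D0 = 0.1/cm^2.
```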
 
What is the 12nm wafer price?

Because yield might not be the problem. A system with 8 V100 accelerators costs $149,000:

DGX-1 Essential Instrument of AI Research | NVIDIA

12FFN wafer pricing isn't public, but for a die this big the manufacturing cost per good chip could easily be a couple of thousand dollars. I once heard, for a similarly sized chip in a different process, that a "good wafer" was one that yielded *a* working die -- hopefully 12FFN does better than that...
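To put some hypothetical numbers on that (the wafer prices and yields below are assumptions, not quotes):

```python
# Hypothetical cost per good die; wafer price and yield are assumptions, not quotes.
def cost_per_good_die(wafer_price_usd, gross_die, yield_frac):
    return wafer_price_usd / (gross_die * yield_frac)

for price, y in [(7000, 0.40), (10000, 0.20), (10000, 0.05)]:
    print(f"${price} wafer, {y:.0%} yield -> "
          f"${cost_per_good_die(price, 60, y):,.0f} per good die")
# -> roughly $290, $830 and $3,300: the "couple of thousand dollars" range only
#    needs a pricey wafer and a rough yield, before HBM2, interposer and packaging.
```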
 