
NVIDIA announces Tesla GV100 GPU on TSMC 12nm FFN

I just got through the NVIDIA collateral and let me say, WOW, what an incredible run they are having. The question I have is who will catch them? Certainly not AMD or Intel.

I am wondering about the inference side. NVDA seems to be chasing that market with GPUs, but it seems like overkill to me; FPGAs seem much more suited. I do understand the single-vendor approach to AI (one vendor for both training and inference), but on a chip-vs-chip level FPGAs seem much better suited. Thoughts?
 
NVIDIA now commands such power that they can get a half-node tailor-made for them? Wow.

AI is a target market for TSMC so yes they will work closely with NVDA. You should also know that NVDA has been working closely with TSMC since the beginning of both companies 20+ years ago. In fact, Jensen Huang and Morris Chang are very close friends.
 
Agreed on inference. The trend is toward skinnying down neural nets for inference to get much lower power and area: one- to four-bit multiplication and sparse-matrix handling, for example. That motivates more specialized hardware/IP, potentially even in high-volume applications. FPGAs? Maybe, though power remains a concern for edge nodes.
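If anyone wants to see what "skinnying down" means concretely, here's a toy sketch (plain NumPy, my own illustration, not any particular chip's scheme) of pruning near-zero weights and quantizing the rest to 4-bit integers:

```python
import numpy as np

# Toy example: prune near-zero FP32 weights and quantize the rest to 4-bit
# signed integers - the two tricks mentioned above for low-power inference.
rng = np.random.default_rng(0)
weights = rng.normal(0, 0.1, size=(256, 256)).astype(np.float32)

# Prune: zero out small weights so a sparse engine can skip them entirely.
weights[np.abs(weights) < 0.05] = 0.0
sparsity = (weights == 0).mean()

# Quantize: map the remaining values onto 16 levels (-8..7) with one scale factor.
scale = np.abs(weights).max() / 7.0
q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)  # 4-bit range

# Inference then multiplies tiny integers and skips zeros; dequantize with q * scale.
print(f"sparsity: {sparsity:.0%}, distinct levels: {len(np.unique(q))}")
```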
 
I think the training of NNs is better suited to GPUs. Here is a bit of a reference.

https://www.quora.com/Why-and-how-are-GPUs-so-important-for-Neural-Network-computations

Most commercial and educational NN software has built-in support for GPUs; the same can't be said for FPGAs.

Neural Networks with Parallel and GPU Computing - MATLAB & Simulink

As I've said before, GPU computing is ideal for algorithms that can benefit from parallelism, and NNs and inference certainly can. FPGAs are highly suitable for algorithms that benefit from reprogrammability; think search.
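To illustrate the "built-in GPU support" point, here's roughly what it looks like on the software side; a minimal PyTorch sketch (the framework choice is just for illustration), where moving the model and batch to the GPU is a one-line device switch:

```python
import torch
import torch.nn as nn

# Minimal illustration of how NN frameworks expose the GPU: the same model
# and tensors run on CPU or CUDA depending on a single device setting.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 784, device=device)      # a fake batch of inputs
y = torch.randint(0, 10, (64,), device=device)

loss = loss_fn(model(x), y)                   # forward pass (on GPU if available)
loss.backward()                               # backprop - the parallel-friendly part
optimizer.step()
```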
 
I was just at the Eurocrypt conference. The limiting step in breaking RSA public-key encryption is large matrix computations. That step is hard to parallelize, and the best known for sparse matrices is n squared (number of rows times number of columns). The matrices are huge. I wonder what special hardware for sparse matrix multiplication looks like. The best known for non-sparse matrices is about n**2.8 (Strassen's algorithm).

Also interesting is work on password-hashing algorithms that inherently force password cracking (exhaustive search) to use maximum memory and maximum non-parallelizable compute time, so cracking ASICs can't be built.
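On the password side, the idea is memory-hard key derivation (scrypt/Argon2 style). A minimal sketch with Python's built-in scrypt; the n and r parameters force each guess to touch roughly 128*r*n bytes of memory, which is what makes dense cracking ASICs uneconomical:

```python
import hashlib, os

# Memory-hard password hashing sketch: each guess must touch ~128*r*n bytes
# (~16 MB here), so an exhaustive-search ASIC needs lots of RAM per core.
salt = os.urandom(16)
key = hashlib.scrypt(b"correct horse battery staple",
                     salt=salt, n=2**14, r=8, p=1,
                     maxmem=64 * 1024 * 1024, dklen=32)
print(key.hex())
```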
 
Good question. I think it would depend very much on how the matrix is sparse. I got my info from the Cadence summit on embedded neural nets (https://www.semiwiki.com/forum/content/6589-notes-neural-edge.html and Embedded Neural Network Summit | Cadence IP), where sparsity is being exploited to reduce area and power over general neural algos. Here's one paper on handling sparse matrices for CNNs: http://www.cv-foundation.org/openac...arse_Convolutional_Neural_2015_CVPR_paper.pdf
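On how the sparsity actually gets exploited: the usual trick is a compressed format (CSR and friends) where only the nonzeros are stored and multiplied, so the work scales with the nonzero count rather than rows x cols. A rough sketch of the idea in plain Python (my own illustration, not any accelerator's actual scheme):

```python
# Sparse matrix-vector multiply in CSR form: only nonzeros are stored,
# so the work is proportional to nnz instead of rows * cols.
def to_csr(dense):
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]   # the zeros are never touched
        y.append(acc)
    return y

A = [[0, 2, 0, 0],
     [1, 0, 0, 3],
     [0, 0, 0, 0],
     [0, 4, 5, 0]]
print(csr_matvec(*to_csr(A), [1, 1, 1, 1]))   # -> [2.0, 4.0, 0.0, 9.0]
```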
 
Jensen said the 815 mm² chip was "reticle limited". Working backward from the photos, I'm guessing the mask is around 10 cm by 13 cm, so indeed you can only fit one on a 6-inch square reticle. Do you people think that will be a trend for processors? What does that mean for yields?
 
It can't get much bigger; it's already bumping up against 858 mm² (26 mm x 33 mm), the litho tool field size.
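Rough numbers behind that, assuming the usual 4x reduction masks: a 26 mm x 33 mm field becomes roughly 104 mm x 132 mm on the mask, which is about all a 6-inch (152 mm) reticle plate can hold:

```python
# Back-of-envelope check on "reticle limited", assuming a 4x reduction scanner.
field_x_mm, field_y_mm = 26, 33                    # max scanner field (858 mm^2)
mask_x, mask_y = field_x_mm * 4, field_y_mm * 4    # image is 4x larger on the mask
reticle_mm = 6 * 25.4                              # 6-inch mask plate

print(f"field: {field_x_mm * field_y_mm} mm^2 (GV100 die is ~815 mm^2)")
print(f"mask image: {mask_x} x {mask_y} mm on a {reticle_mm:.0f} mm plate")
# -> 104 x 132 mm, so a single GV100 image uses essentially the whole reticle.
```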
 
Just wondering: 12nm is 16nm with a 6-track cell instead of 7.5 or 9. But I've heard that reducing the track height also reduces performance, and yet 12nm claims 10% better performance. So how do they do it? Presumably they've changed something else apart from the track height?
 
That's 12FFC, the mobile node. We do not know the specs of 12FFN. It's possible that 12FFN still uses 7.5 track cells, because if you compare the transistor densities of GP100 and GV100, there's hardly any difference.
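A quick density check using NVIDIA's published transistor counts and die sizes (approximate, and it says nothing about the cell library directly):

```python
# Transistor density from NVIDIA's published figures (approximate).
gp100 = 15.3e9 / 610    # transistors per mm^2 on 16FF+
gv100 = 21.1e9 / 815    # transistors per mm^2 on 12FFN

print(f"GP100: {gp100/1e6:.1f} MTr/mm^2, GV100: {gv100/1e6:.1f} MTr/mm^2, "
      f"delta: {(gv100/gp100 - 1):.0%}")
# -> roughly 25.1 vs 25.9 MTr/mm^2, only a few percent apart.
```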
 
That's interesting. So 12FFN should really be called 16FFN.
 
>800 mm² ==> yield *way* below 50% -- probably the number of good die per wafer is in the single digits...

Nothing good. Max dice per wafer is ~60 and yield is below 50%, so most likely 30 good chips per wafer at best. Of course, the prime dice sell for thousands of dollars, so the profit is still there.
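For anyone who wants to sanity-check those numbers, here's the back-of-envelope version: the standard gross-die-per-wafer approximation plus a simple Poisson yield model. The defect density is my own assumption, not TSMC data:

```python
import math

# Gross die per 300 mm wafer (standard approximation) and Poisson yield.
die_area_mm2 = 26 * 33                 # GV100-class exposure, ~858 mm^2
wafer_d_mm = 300
gross = (math.pi * wafer_d_mm**2 / (4 * die_area_mm2)
         - math.pi * wafer_d_mm / math.sqrt(2 * die_area_mm2))

d0 = 0.1                               # defects/cm^2 -- an assumption, not TSMC data
yield_est = math.exp(-die_area_mm2 / 100 * d0)

print(f"gross die/wafer: {gross:.0f}, yield: {yield_est:.0%}, "
      f"good die/wafer: {gross * yield_est:.0f}")
# -> roughly 60 gross, ~42% yield, ~25 good die at D0 = 0.1/cm^2.
```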
 
What is the 12nm wafer price?

Because yield might not be the problem. A system with 8 V100 accelerators costs $149,000:

DGX-1 Essential Instrument of AI Research | NVIDIA

12FFN wafer pricing isn't public, but for a die this big the manufacturing cost per good chip could easily be a couple of thousand dollars. I once heard, for a similarly sized chip in a different process, that a "good wafer" was one that yielded *a* working die -- hopefully 12FFN does better than that...
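To put some hypothetical numbers on that (the wafer prices and yields below are assumptions, not quotes):

```python
# Hypothetical cost per good die; wafer price and yield are assumptions, not quotes.
def cost_per_good_die(wafer_price_usd, gross_die, yield_frac):
    return wafer_price_usd / (gross_die * yield_frac)

for price, y in [(7000, 0.40), (10000, 0.20), (10000, 0.05)]:
    print(f"${price} wafer, {y:.0%} yield -> "
          f"${cost_per_good_die(price, 60, y):,.0f} per good die")
# -> roughly $290, $830 and $3,300: the "couple of thousand dollars" range only
#    needs a pricey wafer and a rough yield, before HBM2, interposer and packaging.
```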
 