Google's TPU (Tensor Processing Unit) was a surprising reveal yesterday. Anyone have thoughts on die size and tech node? The heat sink in the EE Times story looks to be about 3 cm x 4 cm, which is roughly 1200 mm². Is there a rule of thumb, say 10% of the heat-sink area, for estimating the actual die size? What foundry is making this chip? How many TPUs can you slot into a 1RU server, say a 2-socket Xeon box? It also looks like SATA connections are being used?
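Here is a back-of-the-envelope sketch in Python of that estimate; the 10% heat-sink-to-die area ratio is only an assumed rule of thumb, not a published figure:

    # Rough die-size estimate from the heat-sink dimensions in the EE Times photo.
    heatsink_w_mm = 30.0                              # ~3 cm
    heatsink_h_mm = 40.0                              # ~4 cm
    heatsink_area = heatsink_w_mm * heatsink_h_mm     # 1200 mm^2
    die_ratio = 0.10                                  # assumed heat-sink-to-die area ratio
    die_area = heatsink_area * die_ratio
    print(f"estimated die area: {die_area:.0f} mm^2") # ~120 mm^2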
Google's Tensor Processing Unit (TPU) fits in a hard-drive slot of a server.
Since it is a custom ASIC design, I would exclude Intel as the manufacturer. It's bad news for NVIDIA too, I would say (even though it is not clear whether Google has any plan to make this chip available on the open market).
Technology node? Most likely 20-22nm, based on Google's statement that they have been running TPUs inside their data centers for more than a year (for that reason I would exclude 14nm). Foundry? No idea; my bet is IBM at the moment.
Oh God, Skynet is coming
The technology could even be the more cost-effective 28nm node, since very few customers used the 20nm node. We kind of have to wait for Google to open up their story a bit more to better understand the specifications.
So, can we logically predict that in-house SoCs/processors designed by Facebook, Amazon, or even Microsoft will be showing up in their data centers or inside their products very soon? And then there is no reason Apple won't develop its own SoCs/processors for Apple's servers and Macs.
This is really an exciting moment. But it's bad news for Intel.
Tensor math isn't a trade you can learn in many places. In my experience, the people who taught it where I was didn't know what they were doing. For selected applications it will be incredible, but the general public doesn't want it.
I've seen research on implementing neural networks in analog 130nm chips, and they get really good power-consumption numbers; they could fit Google's 10X perf/W improvement. There is even research on using 40nm analog to get another 10X in perf/W.
I don't know if Google went there, but it fits their brand of R&D.
The TPU project actually began with FPGAs, but we abandoned them when we saw that the FPGAs of that time were not competitive in performance with the GPUs of that time, and that the TPU could be much lower power than GPUs while being as fast or faster, giving it potentially significant benefits over both FPGAs and GPUs.
Note: today's FPGAs are better than the FPGAs of that time, but today's GPUs are also much better (Nvidia is talking about 10x improvements in specific cases).
Catapult V1 runs CNNs (using a systolic matrix multiplier) 2.3X as fast as a 2.1 GHz, 16-core, dual-socket server [Ovt15a]. Using the next generation of FPGAs (14-nm Arria 10) of Catapult V2, performance might go up to 7X, and perhaps even 17X with more careful floorplanning [Ovt15b]. Although it's apples versus oranges, a current TPU die runs its CNNs 40X to 70X versus a somewhat faster server (Tables 2 and 6). Perhaps the biggest difference is that to get the best performance the user must write long programs in the low-level hardware-design language Verilog [Met16][Put16] versus writing short programs using the high-level TensorFlow framework. That is, reprogrammability comes from software for the TPU rather than from firmware for the FPGA.
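To illustrate that last point, here is a minimal sketch (not Google's actual code, and assuming the TensorFlow Keras API) of the kind of short, high-level program that drives the TPU's matrix unit, where an equivalent FPGA design would require far longer Verilog:

    import numpy as np
    import tensorflow as tf

    # One convolution plus one dense layer: the matmul-heavy operations
    # that map naturally onto a systolic matrix multiplier.
    x = tf.constant(np.random.rand(1, 28, 28, 1).astype(np.float32))
    conv = tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation="relu")
    dense = tf.keras.layers.Dense(units=10)

    y = dense(tf.reshape(conv(x), (1, -1)))
    print(y.shape)  # (1, 10)

A few lines of framework code like this get retargeted to the hardware by the software stack, which is what "reprogrammability comes from software" means in practice.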
I'm currently in Taiwan and recently visited Taichung. I was utterly shocked at how fast TSMC is building out its facilities around there. There are apparently two BIG new buildings that didn't exist a year ago. The growth is simply amazing. That said, TSMC is a HUGE and important part of Taiwan's economy.