Arthur Hanson
Well-known member
Will the AMD/Microsoft alliance become a serious player and power in AI/ML? Microsoft has the money and software skills, and AMD may have the processor skills. Any thoughts on the impact of this alliance are appreciated.
Quote: "Correct me if I am wrong, but MSFT is unique among the hyperscalers in not having any AI hardware/CPUs of their own."
Microsoft has a long history of using FPGAs for inferencing, and Altera added many features to their product line to support Azure/Bing use. Read up on the Catapult project, which started around 2010 with use in Bing for inference to rank search results. The advantage of FPGAs has been that you can roll out new models with custom data flows as fast as they can be trained, while an ASIC has a minimum 2-year on-ramp for fundamental changes, and the AI-customized FPGAs could keep up with throughput.
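To make the model-turnaround point above concrete, here is a minimal, hypothetical Python sketch of the kind of fixed-point scoring kernel a Catapult-style ranking FPGA runs; the function names, Q8.8 format, and numbers are illustrative assumptions, not Microsoft's actual pipeline.

# Illustrative only: a toy fixed-point document-scoring kernel of the sort
# Catapult-style FPGAs accelerated for search ranking. The names, the Q8.8
# quantization, and the values are hypothetical, not Microsoft's design.

def quantize_q8_8(x: float) -> int:
    """Clamp and convert a float to 16-bit fixed point (Q8.8), as an FPGA datapath might."""
    return max(-32768, min(32767, int(round(x * 256))))

def score_document(features: list[float], weights: list[float]) -> float:
    """Fixed-point dot product: the inner loop of a simple ranking model."""
    acc = 0
    for f, w in zip(features, weights):
        acc += quantize_q8_8(f) * quantize_q8_8(w)
    return acc / (256 * 256)  # scale back to a real-valued score

# When the model is retrained, only the weights (and perhaps the pipeline
# structure) change; on an FPGA that is a new bitstream in hours or days,
# not a new mask set and a multi-quarter ASIC respin.
if __name__ == "__main__":
    print(score_document([0.5, 1.25, -0.75], [0.9, -0.1, 0.4]))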
Quote: "while an ASIC has a minimum 2-year on-ramp for fundamental changes"
2 years for fundamental changes? Have you been talking to Mr. Blue? Stop with the corporate travel meetings. Use the phone!
Quote: "2 years for fundamental changes? Have you been talking to Mr. Blue? Stop with the corporate travel meetings. Use the phone!"
I thought my ears were ringing. When was the last time, Cliff, that you led a design team which produced a new generation of a complex ASIC with new custom functionality, starting from the high-level design down to a customer PoC-ready A0?
Quote: "I just do simple stuff. Would that complex code change also apply to the Verilog that would be loaded into an FPGA?"
That's often how the design changes are prototyped before spending money on backend chip development. You analog guys are expensive, not to mention mask sets and stuff like that.
Quote: "Pictures are worth 1000 lines of specs."
You need both.
Yeah...MPWs are a must!
Quote: "Now this brings up the problem. 12 weeks is too long for an MPW run."
I'm out of date, so I don't know what a typical shuttle turn-around time is anymore.
Quote: "This is Fred's fault. He never works a full day, just 18 hours maximum. Fred, work harder."
Who is Fred?
Quote: "I heard that Amazon is primed to take over the foundry business. 24 hour gds2mask2wafer2test2slice2package for prime members only."
Smoking pot in the morning is bad for your health.
Quote: "FPGA for prototyping... yes,"
But the algorithm guys crank out a new prototype every week and want to ship the best one every month. Hence they like FPGAs.
Quote: "FPGAs for high performance and flexible changes, sure, but..."
Two years is not the time to "adjust" an ASIC. No one spends ASIC development dollars on "adjustments" anymore, at least for advanced processes. This is for a significant new generation of an existing design of a complex ASIC, meaning new features. For a new design of a new architecture (say, a CXL 3.0 switch ASIC with caching and embedded cores for running fabric management, among other things), two years would be an outstanding performance. I suspect most (all?) of the first-generation designs appearing for CXL 3.0 will be realized in FPGAs.
2 years (less manufacturing time) to "adjust" most existing designs is ridiculous. I guess you are also used to working with bureaucracy, inefficiency, and incompetence, or perhaps you lost your original staff and didn't shrink-wrap the designs (DUTs with testbenches and lots of notes on the schematics), which is already covered by the incompetence statement.
Quote: "What you coulda, shoulda done, more calibration, and more testing."
Not in any ASIC project I'm aware of. ASIC development is very expensive, as proven by your experience that TSMC doesn't think a small ASIC design company is serious unless it has $50M+ in the bank. To justify an ASIC all-layer spin you need a significant business case: major new features that are necessary for competitive parity or leadership, much higher performance/throughput, support for new interconnects or radios (e.g., PCIe Gen 5 or 5G), stuff like that. This is one reason why FPGAs are getting more attention; ASIC development is becoming so expensive that it rules out marginal business cases. Also, as TanJ mentioned, the FPGA industry is getting smarter and adding additional capabilities to their FPGAs.
Quote: "The $50M was for 5nm (EUV). For ASICs, you would use 16/14. I read in articles NREs of $20M and up, but we can do high performance for $3M to get to MPW. Of course, I am talking in generalities, and I already have the tools and lots of IP for the customers."
Remember, this thread began as speculation on Azure and ML chips. If they are doing that, their benchmarks are alternatives like the Google TPU, Nvidia H100, and Graphcore Colossus. You are not going to beat those with a 16/12 ASIC, not even a large one. The stage for tinkering like that was a few years ago. Azure needs to be leading edge in this area.
Edit: We are talking ASICs here. I took exception to the 2-year time frame, and for chips created by us practical types: 1 ps jitter, 10 bits (0.1% error), 28 Gsps, RISC-V, etc.
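A footnote on the cost argument in the last few posts: a minimal Python sketch of the break-even arithmetic, assuming the NRE figures quoted above ($20M+ full NRE, roughly $3M to MPW at 16/14) and purely hypothetical per-unit prices.

# Illustrative break-even arithmetic for the ASIC-vs-FPGA business case
# discussed above. The NRE echoes figures quoted in this thread; the
# per-unit costs are made-up placeholders, not real quotes.

def breakeven_units(asic_nre: float, asic_unit_cost: float, fpga_unit_cost: float) -> float:
    """Units needed before the ASIC's NRE is recovered by its lower per-unit cost."""
    if fpga_unit_cost <= asic_unit_cost:
        return float("inf")  # FPGA is also cheaper per unit; the ASIC never pays off
    return asic_nre / (fpga_unit_cost - asic_unit_cost)

# Hypothetical numbers: $20M NRE, $150/unit ASIC vs. $2,000/unit large FPGA.
units = breakeven_units(20e6, 150, 2000)
print(f"~{units:,.0f} units to break even")  # ~10,811 units

# Below that volume the FPGA's zero NRE wins, which is one reason marginal
# business cases stay on FPGAs and only strong ones justify an all-layer spin.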