AI LLMs are mostly bandwidth limited; would Optane DIMMs potentially have been a good way to lower the cost of running larger LLMs locally (typically requiring 768+ GB of RAM)? Would the latency have been too high?
Too high latency, imo.
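For a rough sense of what "bandwidth limited" implies, here is a minimal Python sketch. It assumes that generating each token streams every model weight from memory once, so bandwidth divided by model size gives an upper bound on tokens per second. The function name and the numbers in the usage lines are illustrative placeholders, not measured figures for any DRAM or Optane module.

```python
def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if every weight is read once per token.

    Assumption: dense model, batch size 1, no caching effects, and a memory
    system that sustains the given bandwidth. Real throughput will be lower.
    """
    if model_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("model size and bandwidth must be positive")
    return bandwidth_bytes_per_s / model_bytes


GB = 1e9
# Placeholder numbers for illustration only (not vendor specifications).
model_size = 400 * GB    # hypothetical model footprint in memory, bytes
fast_memory = 200 * GB   # hypothetical sustained read bandwidth, bytes/s
slow_memory = 50 * GB    # hypothetical slower memory tier, bytes/s

print(max_tokens_per_second(model_size, fast_memory))  # 0.5
print(max_tokens_per_second(model_size, slow_memory))  # 0.125
```

Under this simplified model, latency only matters to the extent that it reduces the bandwidth the memory can actually sustain, which is the open question in this thread.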
I wonder -- it doesn't seem to matter for models that GDDR6X has much higher latency than main DRAM.
It does. GDDR7 should be better, since it is aimed specifically at AI.
Thanks!
One of the problems with DCPMM was that the modules were somewhat more expensive than DRAM. They only made sense for very specific applications (considering that you are displacing DRAM). I don't know whether AI was one of them.
You can test it. There are some used modules and workstation boards: https://www.ebay.com/sch/i.html?_nkw=dcpmm https://www.ebay.com/sch/i.html?_nkw=lga+3647 https://www.ebay.com/sch/i.html?_nkw=lga+4677 But I have no idea how well supported they are.
There are also Xeons with HBM, which would probably make the most sense to combine with Optane.
But 64 GB is kind of small. They would need at least 8 stacks (like the 192 GB AMD MI300) and a fast on-package interconnect...
Intel Xeon MAX 9480 Deep-Dive 64GB HBM2e Onboard Like a GPU or AI Accelerator
We deep-dive into the Intel Xeon MAX 9480 and see several surprises when combining Xeon cores and HBM2e memory (like a GPU uses). www.servethehome.com
This is why I was curious whether they might have been useful for AI applications. Testing would be a bit expensive, but I may do it for science (and then re-eBay the parts when done). The test would be to load up a system with DDR4 ECC DIMMs, then replace them with Optane DIMMs and see whether the latency affects token speed at all for LLMs of various sizes.
I had the same idea, but shipping for the board in the EU is around 50€ (or 100€+ from the US to the EU, plus tax, duties, and fees).
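The DDR4-versus-Optane comparison described above could be scripted roughly as follows. This is only a sketch: `generate` is a hypothetical stand-in for whatever inference call you actually use (it is not an API of any particular library), and the helper names are made up for illustration. Run it once per memory configuration and per model size, then compare the results.

```python
import time
from statistics import median
from typing import Callable


def measure_tokens_per_second(
    generate: Callable[[int], int],
    n_tokens: int,
    repeats: int = 3,
) -> float:
    """Return the median tokens/s over several timed runs.

    `generate(n_tokens)` is a hypothetical callable that runs inference and
    returns the number of tokens it actually produced.
    """
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        produced = generate(n_tokens)
        elapsed = time.perf_counter() - start
        rates.append(produced / elapsed)
    return median(rates)


def slowdown(baseline_tps: float, candidate_tps: float) -> float:
    """Ratio of baseline to candidate speed; values above 1 mean slower."""
    return baseline_tps / candidate_tps


if __name__ == "__main__":
    # Dummy generator so the sketch runs without a model: sleeps, then
    # reports that it produced the requested number of tokens.
    def fake_generate(n: int) -> int:
        time.sleep(0.01)
        return n

    tps = measure_tokens_per_second(fake_generate, n_tokens=32)
    print(f"{tps:.1f} tokens/s (dummy generator)")
```

Taking the median of a few runs keeps one-off stalls (page cache warm-up, background tasks) from skewing the comparison between the two memory setups.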