
Would Optane have been good for AI inference?

Xebec

Well-known member
AI LLMs are mostly bandwidth-limited; would Optane DIMMs have potentially been a good way to lower the cost of running larger LLMs locally (which typically require 768+ GB of RAM)? Would the latency have been too high?
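(For context, a rough back-of-envelope sketch of the "bandwidth-limited" framing: during single-stream decode, each generated token has to stream roughly the whole set of weights from memory, so tokens/sec is capped at bandwidth divided by model size. The precision and bandwidth figures below are illustrative assumptions, not measurements.)

# Back-of-envelope model: weight footprint and the bandwidth-only ceiling
# on decode speed. All inputs are illustrative assumptions.
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * bytes_per_param

def tokens_per_s_ceiling(params_billion: float, bytes_per_param: float,
                         bandwidth_gb_s: float) -> float:
    # Assumes one full pass over the weights per generated token (ignores KV cache).
    return bandwidth_gb_s / weights_gb(params_billion, bytes_per_param)

# Example: a 405B-parameter model at fp16 needs ~810 GB of RAM (hence the
# 768+ GB ballpark above); at ~38 GB/s of dual-channel DDR4 the decode
# ceiling is well under a tenth of a token per second.
print(f"{weights_gb(405, 2):.0f} GB of weights at fp16")
print(f"<= {tokens_per_s_ceiling(405, 2, 38):.3f} tokens/s at 38 GB/s")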
 
It does. GDDR7 should be better, as it is aimed specifically at AI.

One of the problems with DCPMM was that the modules were somewhat more expensive than DRAM, and they only made sense for very specific applications (considering that you are displacing DRAM). I don't know whether AI was one of them.

You can test it. There are some used modules and workstation boards... https://www.ebay.com/sch/i.html?_nkw=dcpmm https://www.ebay.com/sch/i.html?_nkw=lga+3647 https://www.ebay.com/sch/i.html?_nkw=lga+4677 But I have no idea how well supported they are.
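(If anyone does try it, a crude way to compare sustained read bandwidth between a DRAM-only and an Optane Memory Mode configuration is to time a streaming pass over a buffer much larger than the caches. A minimal sketch, assuming Python with numpy; it is single-threaded, so it understates multi-channel peak, but it is fine for an A/B comparison on the same box.)

# Rough sustained-read bandwidth check. Assumes numpy is installed and that
# the buffer is large enough to defeat the CPU caches.
import time
import numpy as np

def read_bandwidth_gbs(size_gib: float = 8.0, repeats: int = 5) -> float:
    n = int(size_gib * 2**30 // 8)          # number of float64 elements
    buf = np.ones(n, dtype=np.float64)      # allocation also touches every page
    best = 0.0
    for _ in range(repeats):
        t0 = time.perf_counter()
        _ = float(buf.sum())                # streams the whole buffer once
        dt = time.perf_counter() - t0
        best = max(best, (n * 8) / dt / 1e9)
    return best

if __name__ == "__main__":
    print(f"sustained read bandwidth: ~{read_bandwidth_gbs():.1f} GB/s")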

There are also Xeons with HBM; combining those with Optane would probably make the most sense.
But 64 GB of HBM is rather small. They would need at least 8 stacks (like AMD's MI300 with 192 GB) and a fast on-package interconnect...
 
Thanks!

While Optane DIMMs weren't cheap, at launch they were significantly cheaper per GB (5x+) than regular DIMMs, and they offered higher capacities:


"Equipping a full server with this kit is pricey, though. Intel outfitted its example configurations with a copious amount of capacity: twelve 128GB sticks weigh in at a total of $6,924. Stepping up to twelve 512GB sticks pushes the needle up to $81,012, but this is relatively tame pricing compared to what you'd pay for an equivalent amount of DRAM.

For instance, even with the currently low DRAM pricing, the highest-capacity 128GB ECC DDR4 sticks are roughly $4,500 apiece, making the ~$850 asking price for Optane look particularly attractive. New 256GB DRAM sticks are coming to market soon, and pricing hasn't been announced, but you can also expect them to land at significantly higher prices than the Optane DIMMs. Finally, there is no option on the near horizon for 512GB DRAM modules."
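(A quick sanity check on those numbers; the per-stick prices are taken straight from the quote above, this is just arithmetic, not new data.)

# Price-per-GB comparison using the figures quoted above.
configs = {
    "Optane 128GB (12x for $6,924)": (6924 / 12, 128),
    "Optane 512GB (12x for $81,012)": (81012 / 12, 512),
    "Optane 128GB (~$850 asking price)": (850, 128),
    "DDR4 ECC 128GB (~$4,500)": (4500, 128),
}
for name, (price, gb) in configs.items():
    print(f"{name}: ${price / gb:.2f}/GB")

# Ratio behind the "5x+ cheaper" claim: ~$4,500 vs ~$850 for 128GB sticks.
print(f"DRAM / Optane price ratio: {4500 / 850:.1f}x")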

..

This is why I was curious whether they might have been useful for AI applications. Testing would be a bit expensive, but I may do it for science (and then re-eBay the parts when done :) ). The test would be to load up a system with DDR4 ECC DIMMs, then replace them with Optane DIMMs and see whether the latency affects token speed at all for LLMs of various sizes.
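(A minimal sketch of how that A/B test could be timed. It assumes the llama-cpp-python bindings and a GGUF model file on disk, both of which are my assumptions; any local runner that reports how many tokens it generated would work just as well. Run the identical script on both memory configurations and compare.)

# Minimal token-throughput timer for the DDR4-vs-Optane swap described above.
import time
from llama_cpp import Llama

# model_path, n_ctx and n_threads are placeholders for whatever the test rig uses.
llm = Llama(model_path="model.gguf", n_ctx=2048, n_threads=32)

prompt = "Explain DDR4 memory interleaving in one paragraph."
t0 = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - t0

# Fall back to max_tokens if the response doesn't include a usage block.
generated = out.get("usage", {}).get("completion_tokens", 256)
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.2f} tokens/s")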
 
128GB DRAM modules were really expensive, but I don't remember them being that expensive. Anyway, most people used lower-capacity modules, where they got much better prices per GB. On the other hand, I don't think Optane was actually available at that price...

By the way, Intel later admitted they were losing money like crazy on it, so it would not have been sustainable at that price...

I had the same idea (testing with used parts), but shipping for the board within the EU is around €50 (or €100+ from the US to the EU, plus tax, duties, and fees).
 
Optane was faster than NAND but slower than DRAM, and it was expensive (though pricing was still cheaper than DRAM for large companies). I don't think that is what inference wants (please correct me if I'm wrong).

To broaden the topic: what is optimal for inference? Latency? Bandwidth? Fast random-access memory? Fast reads from storage? Lots of cheap storage? What would an "ideal inference memory system" look like?
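(For the single-user decode case, a rough roofline argument suggests bandwidth dominates: each generated token does a matrix-vector pass over the weights, roughly 2 FLOPs per ~2-byte weight, i.e. about 1 FLOP per byte moved. The hardware numbers in the sketch below are illustrative assumptions, not specs of any particular machine.)

# Roofline-style check: is a kernel limited by compute or by memory bandwidth?
def bound(peak_flops_per_s: float, bytes_per_s: float,
          intensity_flop_per_byte: float) -> str:
    # Attainable FLOP/s = min(peak compute, bandwidth * arithmetic intensity).
    return ("compute-bound"
            if peak_flops_per_s < bytes_per_s * intensity_flop_per_byte
            else "memory-bound")

decode_intensity = 2 / 2      # ~2 FLOPs per fp16 weight / ~2 bytes per weight
prefill_intensity = 512       # rough intensity when a large prompt is batched

# Illustrative CPU: ~1 TFLOP/s of compute, ~40 GB/s of DRAM bandwidth.
print("decode: ", bound(1e12, 40e9, decode_intensity))    # -> memory-bound
print("prefill:", bound(1e12, 40e9, prefill_intensity))   # -> compute-bound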
 
I gathered more info about the Optane DIMMs and their "actual bandwidth":

1. It looks like an Optane DIMM (128GB or 256GB) has "6.8GB/sec" of bandwidth when reading*** 256B at a time (source below). It's pretty easy to build a system with 6 to 8 channels of these, which works out to ~41 to ~54 GB/sec.

2. Dual-channel DDR4-3200*, by comparison, has a theoretical 51GB/sec, though real-world measurements show 35-38GB/sec** (with an outlier of 18GB/sec).

3. For AI inference, contemporary Optane has about 30% of the usable bandwidth per channel of DDR4. (Let's assume DDR5 and future Optane generations would have scaled similarly; see the quick calculation after this list.)
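(The arithmetic behind items 1 and 3, using only the figures already cited in this thread. The per-channel ratio depends on whether you compare against theoretical or measured DDR4 bandwidth; the two cases bracket the ~30% figure above.)

# Aggregate Optane bandwidth and the per-channel comparison from the list above.
optane_per_dimm = 6.8                   # GB/s for 256B reads, per the spec brief
ddr4_theoretical_per_channel = 51.2 / 2
ddr4_measured_per_channel = 36.5 / 2    # midpoint of the 35-38 GB/s dual-channel range

for channels in (6, 8):
    print(f"{channels} Optane channels: ~{channels * optane_per_dimm:.0f} GB/s aggregate")

print(f"Optane vs. theoretical DDR4-3200 per channel: {optane_per_dimm / ddr4_theoretical_per_channel:.0%}")
print(f"Optane vs. measured DDR4-3200 per channel:    {optane_per_dimm / ddr4_measured_per_channel:.0%}")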

While it would be very slow (probably below 0.01 tokens/sec****), used Optane DIMMs + server parts would allow you to run Meta's 405B Llama 3.1 or DeepSeek's 671B-parameter R1 cheaply. Both "only" require 512GB of RAM. With Optane memory, you could send a query, let it run overnight, and maybe get something interesting in the morning.
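(A bandwidth-only ceiling on that token rate, assuming ~1 byte per parameter after quantization and one full pass over the weights per generated token; both are my assumptions for the arithmetic. This is only an upper bound: the sub-0.01 tokens/sec estimate above allows for substantial further losses from Optane's 256B access granularity and media management.)

# Naive decode-speed ceiling for an 8-channel Optane box.
aggregate_bw_gb_s = 8 * 6.8
for name, model_gb in [("Llama 3.1 405B @ ~1 byte/param", 405),
                       ("DeepSeek-R1 671B @ ~1 byte/param", 671)]:
    print(f"{name}: <= {aggregate_bw_gb_s / model_gb:.2f} tokens/s")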

I still might do this as an experiment. I always wondered how Optane DIMM memory worked anyway.

..

Optane 2nd gen specs: https://www.intel.com/content/dam/w...-briefs/optane-dc-persistent-memory-brief.pdf

*Servers were using 2666-2933 DDR4 around the time Optane 2nd gen launched. Desktop PCs were a bit faster.
** / *** Grok thinks AI inference is 99-99.9% reads (vs. writes).

****P.S. Here's the 405B version of Llama 3.1 running on a 512GB RAM / 8-channel DDR5 / 96-core Threadripper machine (embedded video): less than 1 token per second. The video states this is a $50,000 machine.
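(That figure is at least consistent with a bandwidth-only estimate. The ~307 GB/s number below assumes 8-channel DDR5-4800 theoretical bandwidth and ~1 byte per parameter; neither detail is stated in the video, so treat this as a plausibility check only.)

# Sanity check on the "<1 token/s" figure for the 8-channel DDR5 machine.
ddr5_8ch_gb_s = 8 * 38.4          # assumed DDR5-4800, theoretical
weights_gb = 405                  # assumed ~1 byte/param for the 405B model
print(f"ceiling: {ddr5_8ch_gb_s / weights_gb:.2f} tokens/s")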
 