
Intel Arc B580

Yes... most people do not know exactly what it means. Very confusing. Going through the calculation again: since the B580 has some overclocking headroom, it is possible that the theoretical INT4 throughput can reach 500 TOPS.
Hmm. Let's think about that for a sec.

From what I've read, the RTX 4090 doesn't really benefit from more than 300W when processing AI models, because there isn't enough memory bandwidth on the card to feed the CUDA cores any faster when running LLMs. The 4090 has (per Google AI) ~1320 INT4 TOPS, but is effectively reduced to ~900 because of bandwidth limitations (300W/450W * 1320 INT4 TOPS).

If this ratio holds true for the B580, then the B580 has enough bandwidth (~450 GB/s, about 45% of the 4090's ~1008 GB/s) for roughly 400-450 INT4 TOPS. Overclocking the B580's VRAM would help, of course.
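Using the post's round numbers (4090 effective ~900 INT4 TOPS when bandwidth-limited, B580 ~450 GB/s, 4090 ~1008 GB/s), the back-of-envelope scaling works out like this:

```python
# Back-of-envelope estimate: scale the 4090's bandwidth-limited INT4
# throughput by the ratio of memory bandwidths. Figures are approximate.
RTX4090_BW_GBS = 1008          # RTX 4090 memory bandwidth
B580_BW_GBS = 450              # B580 memory bandwidth (as quoted above)
RTX4090_EFF_INT4_TOPS = 900    # bandwidth-limited 4090 estimate from above

bw_ratio = B580_BW_GBS / RTX4090_BW_GBS            # ~0.45
b580_est_tops = bw_ratio * RTX4090_EFF_INT4_TOPS   # ~400

print(f"bandwidth ratio: {bw_ratio:.2f}")
print(f"B580 bandwidth-limited INT4 estimate: ~{b580_est_tops:.0f} TOPS")
```

That lands at the low end of the 400-450 range; VRAM overclocking would push the estimate up proportionally.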

I think 2 x 24GB B580s would be pretty good value for LLM usage.
 
I redid the analysis. Thanks siliconbruh999 for pointing to the right information.

A viewer commented on my video with more PyTorch testing:
(attached screenshot of the PyTorch test results)
 
Made a video to share my experience running Deepseek R1 on B580:
Good video on the topic. Nice to see it just works fairly easily on the Intel card.

I think the cost model discussion is a good starter on the topic. Of course more privacy locally, but you also have to have a PC you can dedicate to the task on top of the card.

I really liked the comparison of token output speed vs. human reading speed (oral and silent reading) - that gave great perspective.

I am curious if you run that same model on your i5-13500 (CPU only) - what token rate do you get?
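A rough sketch of that token-rate vs. reading-speed comparison (the words-per-token and words-per-minute figures below are generic assumptions, not numbers from the video):

```python
# Convert LLM output speed (tokens/sec) to words/minute and compare it to
# typical human reading speeds. All constants are rough assumptions:
# ~0.75 words per token, ~150 wpm read aloud, ~240 wpm silent reading.
WORDS_PER_TOKEN = 0.75
ORAL_WPM = 150
SILENT_WPM = 240

def tokens_per_sec_to_wpm(tps):
    return tps * WORDS_PER_TOKEN * 60

for tps in (5, 15, 30):
    wpm = tokens_per_sec_to_wpm(tps)
    print(f"{tps:>2} tok/s ≈ {wpm:.0f} wpm "
          f"(oral {wpm / ORAL_WPM:.1f}x, silent {wpm / SILENT_WPM:.1f}x)")
```

Anything much above ~5 tok/s already outpaces reading aloud, which is why even modest local token rates feel usable.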
 
It’s my wife’s PC. To do that, I would need to remove the specific version of Ollama and install a standard version that currently uses the CPU. I might do it later. I’m waiting for a new power supply for my workstation, which will make it easier to handle these tasks.
 
Do you know offhand how to force Ollama to use the CPU? I tried a while back and kept failing; I couldn't find any instructions that actually worked. I can benchmark Deepseek on my 9800X3D for a data point (and also an i5-12600KF + DDR4, if interested).
 
Are you using an Intel GPU or an Nvidia GPU? For an Intel GPU, if you install the standard build of Ollama, I think it will run on the CPU by default.

You can see what happens via task manager.

You could try the following suggestion:

In Ollama, run "/set parameter num_gpu 0"
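The same thing can be scripted against Ollama's REST API: the "num_gpu" option (0 = offload no layers to the GPU) forces CPU-only inference, and responses report eval_count and eval_duration, from which tokens/sec follows. This is only a sketch; the model name below is an example, not a recommendation:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def tokens_per_second(eval_count, eval_duration_ns):
    # Ollama reports eval_count (generated tokens) and eval_duration (ns).
    return eval_count / (eval_duration_ns / 1e9)

def benchmark_cpu(model="deepseek-r1:8b", prompt="Why is the sky blue?"):
    # "num_gpu": 0 asks Ollama to offload zero layers to the GPU,
    # i.e. run the model entirely on the CPU.
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_gpu": 0},
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return tokens_per_second(body["eval_count"], body["eval_duration"])
```

Watching GPU and CPU utilization in Task Manager while this runs confirms whether the offload setting took effect.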

 
Did you really mean it? Or was that a bit of sarcasm? :)
I really hope Intel can turn itself around.
Definitely mean it.

I hope they can. They have been the biggest risk takers in moving manufacturing and state-of-the-art tech. TSMC is very risk-averse: they first observe, then jump, unlike Intel, which jumps, then observes.

I don't want the whole of Intel to suffer because of a few dumb and greedy people.
 
Interesting to see the B580 inference videos, many thanks for them. I am keenly waiting for the 24GB B580 to try/potentially replace my Tesla P40.

I was wondering something: the amount of local RAM is clearly a constraint for inference. The RAM chips themselves are not too expensive, but there are no spare footprints on the boards or exposed address lines. Old computers used to get around a lack of address lines by using a register for some extra address bits, i.e. bank switching. Could a bank-switched expansion board be made cheaply for the B580, or another similar GPU, with a few registered bits and footprints for N RAM chips, allowing for, say, 256GB of local RAM fairly cheaply? Perhaps clock speed will be too much of a constraint to keep the traces short enough?
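For a sense of scale, a bank-switching scheme only needs a handful of registered bits. Assuming an illustrative 16 GB natively addressed window (an assumption for the arithmetic, not the B580's actual address layout), reaching 256 GB takes 4 extra bank bits:

```python
import math

TOTAL_BYTES = 256 * 2**30    # target: 256 GB of local RAM
WINDOW_BYTES = 16 * 2**30    # illustrative natively-addressed window

total_bits = int(math.log2(TOTAL_BYTES))    # address bits for 256 GB: 38
window_bits = int(math.log2(WINDOW_BYTES))  # address bits for 16 GB: 34
bank_bits = total_bits - window_bits        # extra registered bits needed: 4
banks = 2**bank_bits                        # 16 banks of 16 GB each

print(f"{bank_bits} bank bits -> {banks} banks of {WINDOW_BYTES // 2**30} GB")
```

So the register itself is trivial; as the replies note, the real obstacle is signal integrity on the lengthened traces, not address bits.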
 
Clock speed is the main problem with this approach on modern computers. The traces / wires will be too long to maintain signal integrity. Unfortunately.
 