Intel Arc B580

soAsian · Dec 23, 2024

Besides Linus, the Intel Arch B580 received pretty good reviews on price and performance from nearly every hardware reviewer. You see, Intel can get 'lucky' too, like Nvidia, if it's run by a competent board and leadership.

XYang2023 · Dec 23, 2024

Brady · Dec 23, 2024

It’s a nearly unequivocal home run product for Intel during a time where that seems nearly impossible for them to achieve.

I don’t think it’s very good for margins (they might be negative) but you don’t really worry about margins when breaking into a new segment…

Far more important is that the GPU division appears to be executing. Battlemage in Lunar Lake is phenomenal and now this first discrete release of Battlemage is also phenomenal.

This bodes well for the hardware side of Intel’s AI aspirations because of the shared Xe core IP, but as we read from that SemiAnalysis deep dive into MI300X vs H100, hardware only gets you a lane on the race track. Software makes the race car go… is OneAPI going to be able to succeed where ROCm is seemingly still nascent / treading water?

siliconbruh999 · Dec 23, 2024

OneAPI is still better than ROCm though it's supported from CPU/GPU/FPGA similar to CUDA is supported across Nvidia Products

staf · Dec 29, 2024

Brady said:
I don’t think it’s very good for margins (they might be negative) but you don’t really worry about margins when breaking into a new segment…

Don't know as the GPU chip is made by TSMC not Intel's foundry

Xebec · Dec 29, 2024

I bought one for a niece as a Christmas present -- she loves it so far:

(Newegg pre-order, arrived 12/23)

There are also rumors of an "ARC PRO" version of this with 24GB coming in 2025: https://videocardz.com/newz/intel-preparing-arc-pro-battlemage-gpu-with-24gb-memory . This could be a great deal for people running AI or certain other applications locally.

XYang2023 · Dec 29, 2024

I sold my 4090 and purchased the B580. After some testing, I think it has potential. I used it with DaVinci Resolve 19 Pro to edit a 4K timeline, and it got the job done. However, its driver and software support need improvement to compete with NVIDIA's offerings. This should be achievable. Once the 24GB Battlemage model is available, I plan to purchase it and give my current one to my wife for video editing.

Xebec · Dec 30, 2024

XYang2023 said:
I sold my 4090 and purchased the B580. After some testing, I think it has potential. I used it with DaVinci Resolve 19 Pro to edit a 4K timeline, and it got the job done. However, its driver and software support need improvement to compete with NVIDIA's offerings. This should be achievable. Once the 24GB Battlemage model is available, I plan to purchase it and give my current one to my wife for video editing.

Just curious - did you sell the 4090 now to get as much out of it as possible before the 5090 arrives?

re: DaVinci Resolve -- does that app hit both the CPU and GPU when you're editing and rendering? and by "it got the job done" - were there any bugs with the B580 or just a bit slow?

XYang2023 · Dec 30, 2024

Xebec said:
Just curious - did you sell the 4090 now to get as much out of it as possible before the 5090 arrives?

re: DaVinci Resolve -- does that app hit both the CPU and GPU when you're editing and rendering? and by "it got the job done" - were there any bugs with the B580 or just a bit slow?

I roughly broke even with what I spent on the 4090. I won't be getting the 5090 since the leaked pricing seems way too high. However, I am planning to go for the Battlemage 24GB version instead

Regarding DaVinci Resolve, specifically version 19, it allows you to specify the GPU. Since I also use the 14700K, the list includes the iGPU as well. I manually set it to the B580, but it crashed once or twice during editing but timeline data were not lost. It's hard to determine if the crashes were due to version 19 or the B580, as I hadn't used the 4090 with version 19 to compare. The B580 is a bit slower than the 4090, but it doesn’t significantly impact editing performance.

I still have two 3090s for work and also an NVLink. I'm really hoping Intel resolves the PyTorch issues soon so I can use the Battlemage 24GB along with the pair of 3090s for development.

XYang2023 · Dec 30, 2024

The PyTorch issue I raised has been assigned. Clearly, the more people use it, the quicker Intel can prepare its AI software support.

RuntimeError: could not create an engine · Issue #143914 · pytorch/pytorch

🐛 Describe the bug Hi, I experienced the following error (the message before the exception): File c:\Users\xiaoy\anaconda3\envs\llm2\Lib\site-packages\torch\nn\modules\linear.py:125, in Linear.forw...

github.com

As for the notion that Intel is "losing money" with this card, this is unfounded. First, there is no concrete evidence to support this claim, only tangential speculations. Second, this appears to be a strategic investment for Intel, as sizable market share is essential to compete in this sector (graphics/AI). Finally, if Intel wants to make a meaningful impact on cost savings, it should focus on addressing inefficiencies in its manufacturing division, optimizing capital expenditures, reducing redundancies in middle management, and enhancing IP sharing among different business units. Addressing these core issues would improve efficiency far more effectively than focusing on short-term, superficial "fixes" or pseudo-issues, as they might look simple to do.

PG should have right-sized the company from the first day he took over, but he made the matter worse in that regard.

XYang2023 · Jan 3, 2025

Intel B580 GPU Review: Training a LLM/GPT Model in PyTorch (Follow-Up and vs. the Nvidia RTX 3090)

Xebec · Jan 4, 2025

This probably does not affect professional applications, but it looks like ARC Battlemage needs a really strong CPU to get the most out of it:

(Source is HardwareUnboxed, also known as Techspot.com). They have a lot of other games benchmarked showing the same thing (or worse) - but basically the pattern implies the value of B580 is reduced with less CPU performance.

XYang2023 · Jan 4, 2025

x.com

XYang2023 · Jan 4, 2025

I think in this case, Intel B580 is a very good option for new builds.

12400F is a budget CPU from Intel.

Intel 12th Gen Core i5-12400F 4.4 GHz LGA1700 Desktop Processor - Intel CPU | BX8071512400F

Shop now for the Intel 12th Gen Core i5-12400F 4.4GHz LGA1700 Desktop Processor, featuring 6 cores & 12 threads, a 2.50 GHz base clock speed & 18 MB Intel Smart Cache. Enjoy fast dispatch, quick delivery of local stock & Australia wide shipping. Upgrade your PC with this powerful Intel CPU today.

www.centrecom.com.au

Xebec · Jan 4, 2025

XYang2023 said:
Intel B580 GPU Review: Training a LLM/GPT Model in PyTorch (Follow-Up and vs. the Nvidia RTX 3090)

This is a great information video XYang!

For the iGPU issue; you might try messaging Tom A Peterson on Twitter or some other source. "TAP" is their main outreach person for GPUs (he used to work at Nvidia) and if you can catch his eye it might go somewhere. Or alternatively open a bug report (/r/Intel might have some tips on how to do this properly).

I'm kinda curious what limits training performance. I know for running an actual AI (inference?) LLM, memory bandwidth is often more of a limiting factor than CPU or GPU performance. (This is why running Ollama models on a 3090 isn't necessarily slower than a 4090). I'll research this later. I liked that you provided 3 good data points here - B580, 14700K, and 3090.

For training models - could you buy 2 x B580s and get the benefit of more VRAM like you can for actually running models? (I think when you run models, if you get 2 cards, you get effectively 1.6X-1.8X the VRAM available).

Thanks for this video! (Subbed

).

XYang2023 · Jan 4, 2025

Xebec said:
This is a great information video XYang!

For the iGPU issue; you might try messaging Tom A Peterson on Twitter or some other source. "TAP" is their main outreach person for GPUs (he used to work at Nvidia) and if you can catch his eye it might go somewhere. Or alternatively open a bug report (/r/Intel might have some tips on how to do this properly).

I'm kinda curious what limits training performance. I know for running an actual AI (inference?) LLM, memory bandwidth is often more of a limiting factor than CPU or GPU performance. (This is why running Ollama models on a 3090 isn't necessarily slower than a 4090). I'll research this later. I liked that you provided 3 good data points here - B580, 14700K, and 3090.

For training models - could you buy 2 x B580s and get the benefit of more VRAM like you can for actually running models? (I think when you run models, if you get 2 cards, you get effectively 1.6X-1.8X the VRAM available).

Thanks for this video! (Subbed ).

Thank you!

For training on multiple GPUs, you can either:
* Shard the data, or
* Shard the model

For more details, see:

Data Parallelism VS Model Parallelism In Distributed Deep Learning Training

Some Simple Logic and Theory Behind

leimao.github.io

I think data parallelism is more common, but weight updates need to be synchronized. Therefore, NVLink is useful. I have a pair of RTX 3090s and an NVLink bridge.

You can also refer to this discussion:

Does NVLink accelerate training with DistributedDataParallel?

Nvidia's NVLink accelerates data transfer between several GPUs on the same machine. I train large models on such a machine using PyTorch. I see why NVLink would make model-parallel training faster,...

stackoverflow.com

Another way to use multiple GPUs for training is by using a Granite Rapids/Sapphire Rapids-based workstation board. PCIe 5.0 lanes can facilitate inter-GPU communication. I am not sure the performance though.

PCI Express 6.0 Specification | PCI-SIG

pcisig.com

XYang2023 · Jan 4, 2025

Training performance is also affected by batch size selection. I was quite surprised by the results in the GPT-2 training test scenario, which can be extrapolated to problems like fine-tuning. The difference between the RTX 3090 and the B580 is not dramatic. By the way, one difference between the RTX 3090 and the RTX 4090 lies in the latter's support for lower-bit numerical representations. When I first acquired the RTX 4090, I tried to test this feature but couldn’t find a way to do so, likely due to a lack of support in the NVIDIA drivers. Therefore, the actual performance difference between the 3090 and the 4090 for practical use (for me) might not be significant.

XYang2023 · Jan 4, 2025

I think you can use multiple GPUs to serve models relatively easily, as that does not require synchronization. However, I have not tried it yet:

Serving large models with Torchserve — PyTorch/Serve master documentation

Xebec · Jan 4, 2025

XYang2023 said:
Another way to use multiple GPUs for training is by using a Granite Rapids/Sapphire Rapids-based workstation board. PCIe 5.0 lanes can facilitate inter-GPU communication. I am not sure the performance though.

PCI Express 6.0 Specification | PCI-SIG

pcisig.com

Thanks for the articles -- the first link was really interesting with how it's sorta not parallel when training but you do get benefits of multiple GPUs. The extra bandwidth of NVLink makes a lot of sense for this.

I think today the only PCIe 5.0 GPUs available are the new Blackwell professional GPUs, and the future 50 series?

Xebec · Jan 4, 2025

XYang2023 said:
I think you can use multiple GPUs to serve models relatively easily, as that does not require synchronization. However, I have not tried it yet:

Serving large models with Torchserve — PyTorch/Serve master documentation

Very interesting - I need to play with Pytorch and TorchServe. So far I've just been using Ollama (under the Windows subsystem for Linux (WSL), and also on some dedicated Linux machines I have). It's able to effectively split workloads among multiple GPUs or split between GPU and CPU (i.e. you actually get some (small) speedup running a 70B model on a GPU+CPU vs CPU only). This is for inference and not training of course.

You probably already know this stuff - but here's a list of downloadable models, maybe for others interested: https://ollama.com/library . You can choose from many different quantization's within each model for usage too.

Intel Arc B580

Active member

Well-known member

Active member

Well-known member

Member

Well-known member

Well-known member

Well-known member

Well-known member

Well-known member

Well-known member

Well-known member

Well-known member

Well-known member

Well-known member

Well-known member

Well-known member

Well-known member

Well-known member

Well-known member