Hmm, let's think about that for a second. Yes, most people don't know exactly what it means; it's very confusing. Going through the calculation again: since the B580 has some overclocking headroom, its theoretical INT4 throughput could plausibly reach 500 TOPS.
From what I've read, the RTX 4090 doesn't really benefit from more than 300W when processing AI models, because the card doesn't have enough memory bandwidth to feed the CUDA cores any faster when running LLMs. The 4090 has (per Google AI) ~1320 INT4 TOPS, but that is effectively reduced to ~880 by the bandwidth limit (300W/450W × 1320 INT4 TOPS ≈ 880).
If this ratio holds for the B580, then its ~450 GB/s of memory bandwidth (about 45% of the 4090's) would support roughly 400-450 INT4 TOPS, since 45% of the 4090's ~880 effective TOPS is about 400. Overclocking the B580's VRAM would help, of course.
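The back-of-envelope scaling above can be sketched in a few lines. This is just the thread's own arithmetic, not measured data; the 4090's ~1008 GB/s bandwidth figure is an assumption I'm adding for the ratio, and all other numbers come from the comments above.

```python
# Bandwidth-scaling estimate for B580 INT4 throughput, using the
# rough figures quoted in this thread (not official benchmarks).

RTX4090_PEAK_TOPS = 1320          # theoretical INT4 TOPS (per Google AI, per thread)
RTX4090_EFFECTIVE = RTX4090_PEAK_TOPS * 300 / 450   # power/bandwidth-limited, ~880
RTX4090_BW_GBPS = 1008            # GB/s nominal (assumption added for the ratio)
B580_BW_GBPS = 450                # GB/s (figure used in the thread)

ratio = B580_BW_GBPS / RTX4090_BW_GBPS        # ~0.45
b580_estimate = RTX4090_EFFECTIVE * ratio     # ~400 INT4 TOPS

print(f"4090 effective: ~{RTX4090_EFFECTIVE:.0f} INT4 TOPS")
print(f"B580/4090 bandwidth ratio: {ratio:.2f}")
print(f"B580 bandwidth-limited estimate: ~{b580_estimate:.0f} INT4 TOPS")
```

Overclocking the VRAM raises `B580_BW_GBPS` and scales the estimate linearly, which is why a memory overclock matters more here than a core overclock.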
I think two 24GB B580s would be pretty good value for LLM usage.