Latest Intel Product Presentation

Daniel Nenni

Admin
Staff member
Intel said the 4th-generation Xeon chip, which is used for data centers, had more than 450 design wins, the most ever for a Xeon chip, and was being shipped by more than 50 original equipment manufacturers and original design manufacturers. The 5th-generation Xeon, previously codenamed Emerald Rapids, will be available in the fourth quarter of the year. The company's low-power chips, known as Sierra Forest, are slated to be delivered in the first half of 2024. The next-generation chip, known as Granite Rapids, which is being tested by customers now, is also set to be released in 2024. Intel also said that the next version of its low-power chip, known as Clearwater Forest, would be available in 2025.



(Attached image: Intel 4th Generation Xeon.jpg)
 
The core counts and launch time for Sierra Forest are consistent with the leaks by Moore's Law Is Dead last month:
"1. As for now, the 512 core and 344 core SRF designs are cancelled. A key customer to Intel is demanding that SRF launch by the end of 2024 Q2 in high volume or they will switch to AMD. This is unacceptable.
2. The "Key Customer" is focused on SRF-SP with a requirement of 144 cores. As such, Intel is axing any competing SRF designs that could pull resources away. This makes 144 core SRF-SP a top priority to launch by Q2, and this means any SRF-AP is likely to be limited to 288 cores (2*144 cores tile)"
 
Intel has just been a nightmare, and everything I hear is just more pain. It's the delays that are seriously screwing Intel up, and it's only going to get worse. Meteor Lake, which is Intel 4, then what? There's 3? There's 2.0, 1.8. It just sounds like they're all launching about 6 months apart from one another: Intel 4 by the 4th quarter of 2023, Intel 3 by the 2nd quarter of 2024, Intel 2.0 by the 4th quarter of 2024, Intel 1.8 by the 2nd quarter of 2025... Then they get their fancy new EUV machines and are up and running for Intel 1.6 another 6 months later.
 
I fail to see the problem here. If we stick with your schedule, they launch a small client CPU "tile" and one year later they launch another small client die "tile" on a new node. About 6 months after MTL, Intel launches a server product on a full-device-enabled version of Intel 4 with some process improvements (Intel 3). Given that Intel server parts are also disaggregated now, it's not like Intel needs to wait for the Intel 3 family of nodes to become super mature to launch SRF or GNR parts (at least compared to something like the 28-core Skylake, which would have required a very mature 14nm to work). What aspect of this cadence is all that different from how Intel has been operating for decades (one new gen per year, with client coming before Xeon)?
 
Why so many cores? Isn't 128 enough in a package? Is this a marketing thing?

Who's in charge here? There are only a few leading edge foundries. Tell that customer to stick it.

The future: Intel data centers.
 
Why so many cores? Isn't 128 enough in a package? Is this a marketing thing?
Cloud computing workloads, among some others like transaction processing, are highly parallel, so lots of cores mean more throughput per server. For these workloads, more cores per package generally improves power efficiency and reduces the physical footprint in a datacenter by requiring fewer server racks. Ampere, an Arm-based competitor to Intel for cloud computing, already supports 128 cores per socket in its Altra Max processors. Lots of cores per socket also allows some customers to meet their needs with single-socket servers, which are more efficient than Intel's current reference designs, which are mostly dual-socket. Ampere has design wins at Microsoft and Oracle, and Oracle is an investor. AMD already supports 96 cores per socket in the Genoa version of EPYC.
 
Doesn't that take away from memory per core, cause bottlenecks on shared memory, and cause thermal issues when many run at the same time? Are you really gaining that much?

Is there a parallel equivalent of Amdahl's law? Perhaps you should create "Mr. Blue's parallel cpu law", whereby the number of CPUs (and temperature) doubles every 2 years.
 
Less memory, more cores. Genius! Memory takes up so much room. It doesn't scale that well. They should just get rid of it.
 
Doesn't that take away from memory per core, cause bottlenecks on shared memory, and cause thermal issues when many run at the same time? Are you really gaining that much?

Is there a parallel equivalent of Amdahl's law? Perhaps you should create "Mr. Blue's parallel cpu law", whereby the number of CPUs (and temperature) doubles every 2 years.
The Altra Max has eight DDR4-3200 memory channels, which equates to 25.6 GB/s of memory throughput per channel; considering typical cache hit rates, that is enough to satisfy thousands of active application threads. This CPU was designed with Arm v8.2 cores, which are more power efficient than any x86 cores, and the Max has a TDP of 250 watts, which is pretty modest for a datacenter chip with 128 cores. Of course, the Ampere uses variable-speed clocking like all modern CPUs, so you get lower clock speeds, not thermal issues.
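For anyone who wants to sanity-check the bandwidth claim, here is the back-of-envelope arithmetic (the channel math is standard DDR4; the thread-count comment is just an illustration, not an Ampere spec):

```python
# DDR4-3200 bandwidth back-of-envelope for an 8-channel part like Altra Max.
transfers_per_sec = 3200e6      # DDR4-3200: 3200 MT/s per channel
bytes_per_transfer = 8          # 64-bit channel = 8 bytes per transfer
channels = 8

per_channel = transfers_per_sec * bytes_per_transfer   # 25.6 GB/s
total = per_channel * channels                          # 204.8 GB/s

print(f"per channel: {per_channel / 1e9:.1f} GB/s")
print(f"total:       {total / 1e9:.1f} GB/s")

# If a few thousand active threads mostly hit in cache, only a small fraction
# of their accesses reach DRAM, so aggregate demand can stay under this
# ceiling -- which is the point being made above.
```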

The number of cores per socket is going up so quickly because workloads with thousands of independent threads are more prevalent than they were historically, and they are growing at the rate cloud computing is growing as a proportion of the entire datacenter computing market. Amdahl's Law gives a formula for how much improving a specific resource can affect the performance of an overall workload. The bottlenecks in servers are still largely memory latency and memory bandwidth, which is why some applications tolerate the expense of HBM. And I admit that this fact makes me somewhat skeptical of what I read about CXL 3.0 memory pooling in racks and beyond. Stranded memory in servers is one of the biggest hardware inefficiencies in datacenters, but I suppose we'll have to "just see what we'll see", to quote a movie script, whether memory pooling works well enough to be as prevalent as the CXL promoters predict.
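Since Amdahl's Law came up, here is the formula as it is usually written, in a small sketch; the 5% serial fraction is an arbitrary example, not a measurement of any real workload:

```python
# Amdahl's Law: speedup of a workload where a fraction p is parallelizable
# across n cores and the remaining (1 - p) stays serial.
def amdahl_speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

for n in (8, 64, 128, 256):
    print(f"{n:>3} cores: {amdahl_speedup(p=0.95, n=n):.1f}x")

# With even a 5% serial fraction the speedup saturates near 20x, no matter how
# many cores are added. Cloud workloads with thousands of independent threads
# effectively have p close to 1, which is why huge core counts still pay off.
```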
 
I put our layout expert onto creating a 4-CPU 32-bit RISC-V tile (WARP-V) last year on 14 and 16nm (real numbers) with our own version of SRAM. Our area was a bit larger than the foundries', but it was really to prove out the flow (we now have P&R) and get educated on CIM, etc. We know analog well (we have some IP), but we suck at CPUs and RTL. We were shocked at how large an area the SRAM (L1) took. I concluded that the reason so many CPU cores were made is for marketing reasons (as the Count says), since the CPU area was so small compared to the SRAM area.

I called a CPU expert (note again, anybody is an expert compared to me in this area) who started developing CPUs in the late 1960s. He agreed with this assessment, but perhaps we are fossils.
 
This CPU was designed with Arm v8.2 cores, which are more power efficient than any x86 cores
Can I extrapolate that to mean that these RISC-V CPUs can fight the battle at the data centers against Intel and AMD? I assumed they were mostly just for ASICs and other mobile devices.
 
Intel has just been a nightmare, and everything I hear is just more pain. It's the delays that are seriously screwing Intel up, and it's only going to get worse. Meteor Lake, which is Intel 4, then what? There's 3? There's 2.0, 1.8. It just sounds like they're all launching about 6 months apart from one another: Intel 4 by the 4th quarter of 2023, Intel 3 by the 2nd quarter of 2024, Intel 2.0 by the 4th quarter of 2024, Intel 1.8 by the 2nd quarter of 2025... Then they get their fancy new EUV machines and are up and running for Intel 1.6 another 6 months later.

I don't agree. Intel is adopting the same process cadence as TSMC which is great for yield learning. Intel 4 will do only CPU chiplets while Intel 3 will do full chips and chiplets. It is just a new naming convention versus doing 14nm, 14nm+, 14nm++ etc... to better match TSMC.

True, Intel is short on EUV machines, but they need fewer with this new process methodology. Intel can get companion chiplets from TSMC or Samsung so they can focus on high-performance CPUs and GPUs. It really is a brilliant strategy.
 
Can I extrapolate that to mean that these RISC-V CPUs can fight the battle at the data centers against Intel and AMD? I assumed they were mostly just for ASICs and other mobile devices.
Eventually, yes. There is one RISC-V datacenter CPU chiplet vendor, Ventana, but I suspect they’re still early in their development journey. Ampere has also said they’re going to move from standard Arm core IP to custom cores to better compete.
 
I put our layout expert onto creating a 4-CPU 32-bit RISC-V tile (WARP-V) last year on 14 and 16nm (real numbers) with our own version of SRAM. Our area was a bit larger than the foundries', but it was really to prove out the flow (we now have P&R) and get educated on CIM, etc. We know analog well (we have some IP), but we suck at CPUs and RTL. We were shocked at how large an area the SRAM (L1) took. I concluded that the reason so many CPU cores were made is for marketing reasons (as the Count says), since the CPU area was so small compared to the SRAM area.

I called a CPU expert (note again, anybody is an expert compared to me in this area) who started developing CPUs in the late 1960s. He agreed with this assessment, but perhaps we are fossils.
Yup. You guys need to enter the 2020s…
 
I put our layout expert onto creating a 4-CPU 32-bit RISC-V tile (WARP-V) last year on 14 and 16nm (real numbers) with our own version of SRAM. Our area was a bit larger than the foundries', but it was really to prove out the flow (we now have P&R) and get educated on CIM, etc. We know analog well (we have some IP), but we suck at CPUs and RTL. We were shocked at how large an area the SRAM (L1) took. I concluded that the reason so many CPU cores were made is for marketing reasons (as the Count says), since the CPU area was so small compared to the SRAM area.

I called a CPU expert (note again, anybody is an expert compared to me in this area) who started developing CPUs in the late 1960s. He agreed with this assessment, but perhaps we are fossils.
Well, performance per core does matter of course, since the cloud providers offer performance, standard, and efficiency variants of their instances. But if you are a customer you will probably have an application that requires, say, 4 performance cores, and most of the time you'll pick the cheapest instance that can give you that, unless it's a more specialized application that you know performs better on Intel or AMD.
 
Well, performance per core does matter of course, since the cloud providers offer performance, standard, and efficiency variants of their instances. But if you are a customer you will probably have an application that requires, say, 4 performance cores, and most of the time you'll pick the cheapest instance that can give you that, unless it's a more specialized application that you know performs better on Intel or AMD.
Cloud is a cost play. Each server packs in many customers. Packing hundreds of cores per chip and thousands per rack, done right, improves energy efficiency and reduces physical overhead, footprint, and logistics costs. Learning to do it right has been a journey. Chips like Sierra Forest (and AMD's Bergamo) incorporate a lot of these learnings.

These include things like fast multicore fabrics, advanced coherency logic, low-latency inter-chiplet links, high-throughput DDR5 channels, QoS control of noisy neighbors, advanced security and privacy, packaging with upwards of 10k pins, liquid cooling, etc. For hyperscalers these translate to lower costs and consumables per core, which is the unit of functionality they sell.
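To make the density argument concrete, here is a toy calculation; every number in it (fleet size, rack density, sockets per server) is a made-up assumption purely for illustration:

```python
# Toy illustration: how cores per socket changes server and rack counts for a
# fixed core demand. All numbers are hypothetical assumptions.
cores_needed = 100_000          # hypothetical fleet requirement
sockets_per_server = 2          # assumed
servers_per_rack = 20           # assumed rack density

for cores_per_socket in (32, 64, 128, 144):
    cores_per_server = cores_per_socket * sockets_per_server
    servers = -(-cores_needed // cores_per_server)   # ceiling division
    racks = -(-servers // servers_per_rack)
    print(f"{cores_per_socket:>3} cores/socket -> {servers:>5} servers, {racks:>3} racks")
```

Fewer servers and racks for the same core count is where the energy, footprint, and logistics savings come from, assuming the per-socket power and cooling can be handled.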
 
Why so many cores? Isn't 128 enough in a package? Is this a marketing thing?

Who's in charge here? There are only a few leading edge foundries. Tell that customer to stick it.

The future: Intel data centers.
The answer is that many cloud players and applications rely on many threads running on a distributed microservices architecture, rather than the monolithic architectures used in the past (and especially for EDA).

 
The answer is that many cloud players and applications rely on many threads running on a distributed microservices architecture, rather than the monolithic architectures used in the past (and especially for EDA).
That is the cloud propaganda and is indeed true for many "cloud native" services which adopted containers from the start. However, the bulk of accounts in the cloud are leased by customers running VMs, usually "lift and shift" from corporate apps. Even many users who have containerized apps run them inside their own VMs rather than using a container service offered by the cloud vendor.

Thus the dominant view of the cloud vendor is how to partition the machines into VMs of whatever size the customer would like to lease. Bin-packing problems are more flexible and have fewer problems with unusable trim if the bins are large - in other words, if the CPUs have a large and flexible core count.
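A toy example of the trim point, with made-up VM and host sizes (real schedulers solve a much messier mixed-size bin-packing problem, but even the fixed-size case shows the effect):

```python
# Toy illustration of "unusable trim": if customers lease 24-core VMs (a
# made-up size), how many host cores are stranded per host? Host core counts
# are also just illustrative.
vm_cores = 24

for host_cores in (32, 64, 128, 192):
    vms_per_host = host_cores // vm_cores
    trim = host_cores - vms_per_host * vm_cores
    print(f"{host_cores:>3}-core host: {vms_per_host} VMs, "
          f"{trim:>2} cores stranded ({100 * trim / host_cores:.0f}% trim)")
```

The bigger and more flexible the host, the easier it is to carve it into whatever VM sizes customers actually ask for without leaving cores stranded.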

There is actually some good experience running containers on CPUs with small core counts since containers take just one or two cores and do not share resources. So they have no bin-packing issues. But even for them, the overall cost effectiveness per core is likely to be best on high core-count CPUs.
 