Exploring CPU Pipeline Depths: What's the Deepest Pipeline in Modern CPUs?

ref

New member
Dear fellow engineers,

First of all, thank you in advance to everyone who is willing to help.

I am a new Ph.D. candidate working with Imec in Leuven, Belgium. For my research, I am excited to explore the potential benefits of monolithic 3D integration by leveraging CMOS 2.0 for complex contemporary CPUs. The idea behind this research is that complex CPUs with many pipeline stages (>12 stages) require a large amount of sequential logic in the front end. Considering the flip-flop (FF) stages inserted during the BEOL design, the interconnect between the sequential and combinational logic and the cache levels could be very complex. Leveraging technologies such as monolithic 3D ICs for CMOS 2.0 could potentially improve both PPA and the branch misprediction penalty. To my knowledge, no exploratory research has been done on this idea.

To start with, I'm curious to learn about the deepest pipelines found in modern CPUs, covering both commercially available and research-based designs. While I understand that commercial CPUs often strike a balance between pipeline depth, clock frequency, and architectural complexity, I'm interested in any research or experimental CPU designs that push the limits of pipeline depth. According to ChatGPT, the AMD Zen 3 core, one of the fastest-clocked CPUs, has an estimated 32 pipeline stages. To your knowledge, what is the deepest and most complex instruction pipeline known in a modern CPU, whether mobile, high-performance server, or experimental?

I'd love to hear your insights. Please share your knowledge, experiences, or any references you might have regarding the subject.

I would appreciate it if you could share your experiences and knowledge!
Sincerely
 
I am not an expert on all things CPU, but long-pipeline CPUs were something of a morbid curiosity of mine a year ago (I also love the concept of CMT). If memory serves, NetBurst (30-something stages, I think) and Bulldozer (I think in the high 20s or mid 30s) had REALLY long pipelines. Also, given the nature of how RISC ISAs work, my understanding is that they often have shorter pipelines. Given that Zen 4 and GLC (Golden Cove) can clock much faster than Zen 3, maybe those microarchitectures have longer pipelines to achieve that? If conceptual designs are okay, the NetBurst follow-on was supposed to have something like 40 or 50 stages before it got axed for Core.

Hope this small fragment of knowledge is helpful.
 
Thank you so much for this knowledge.

I took a look at NetBurst. I know that there was some effort on hyper-pipelined architectures in the early 2000s, which led to improved branch predictor performance, yet the longer pipelines were still not the best option in terms of IPC, as you pointed out. I'll try to find some information on AMD Zen 4.

For my research, I hope to work on multi-GHz cores such as AMD's or ARM's, and I hope to obtain a license for them!
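
As a rough illustration of the depth-versus-IPC trade-off mentioned above, here is a minimal toy model in Python. Everything in it is an assumption made for illustration: the function names, the parameter values (logic delay, latch overhead, branch fraction, misprediction rate), and the simplification that a mispredicted branch costs a full pipeline refill. None of the numbers describe a real CPU; the point is only that a deeper pipeline shortens the cycle time while lowering IPC, so time per instruction has a minimum somewhere in between.

```python
# Toy pipeline-depth model. All parameter values are hypothetical.

def cycle_time_ns(depth, logic_delay_ns=5.0, latch_overhead_ns=0.15):
    """Cycle time if the total logic delay is split evenly over `depth`
    stages and every stage pays a fixed latch (flip-flop) overhead."""
    return logic_delay_ns / depth + latch_overhead_ns


def cpi(depth, base_cpi=1.0, branch_fraction=0.2, mispredict_rate=0.1):
    """Cycles per instruction, assuming each mispredicted branch flushes
    the whole pipeline and therefore costs `depth` cycles."""
    return base_cpi + branch_fraction * mispredict_rate * depth


def ipc(depth):
    """Instructions per cycle under the same assumptions."""
    return 1.0 / cpi(depth)


def time_per_instruction_ns(depth):
    """Average time per instruction: CPI multiplied by cycle time."""
    return cpi(depth) * cycle_time_ns(depth)


def best_depth(depths):
    """Depth with the lowest time per instruction among the candidates."""
    return min(depths, key=time_per_instruction_ns)


if __name__ == "__main__":
    for d in (5, 10, 20, 30, 40, 60, 80):
        print(f"depth={d:3d}  clock={1.0 / cycle_time_ns(d):5.2f} GHz  "
              f"IPC={ipc(d):.3f}  ns/instr={time_per_instruction_ns(d):.4f}")
    print("best depth in 1..100:", best_depth(range(1, 101)))
```

In this sketch the clock frequency keeps rising with depth, but with diminishing returns because of the fixed latch overhead, while IPC falls steadily because each misprediction becomes more expensive. A better branch predictor (a lower `mispredict_rate`) pushes the best depth deeper, which matches the link between hyper-pipelining and branch predictor work.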
 