AMD and Meta Announce Expanded Strategic Partnership to Deploy 6 Gigawatts of AMD GPUs

user nl

Well-known member
The AI deals continue. I guess TSMC will have no trouble filling their new fabs till 2030 and beyond:

AMD and Meta Announce Expanded Strategic Partnership to Deploy 6 Gigawatts of AMD GPUs
February 24, 2026 7:00 am

News Highlights
  • Meta is partnering with AMD to rapidly scale AI infrastructure and accelerate the development and deployment of cutting-edge AI models
  • AMD and Meta agree to a definitive multi-year, multi-generation partnership to deploy up to 6 gigawatts of AMD Instinct™ GPUs

  • Built on the AMD Helios rack-scale architecture announced at the 2025 Open Compute Project Global Summit, shipments to support the first gigawatt deployment are expected to begin in 2H 2026, powered by a custom AMD Instinct GPU based on the MI450 architecture and optimized for Meta’s workloads

  • AMD and Meta are deepening their collaboration to align their GPU and CPU silicon, systems and software roadmaps
https://ir.amd.com/news-events/pres...partnership-to-deploy-6-gigawatts-of-amd-gpus
 
The AI deals continue. I guess TSMC will have no trouble filling their new fabs till 2030 and beyond:

https://ir.amd.com/news-events/pres...partnership-to-deploy-6-gigawatts-of-amd-gpus

The big deal is AMD potentially giving away even more of the company:

" As part of the agreement, to further align strategic interests, AMD has issued Meta a performance-based warrant for up to 160 million shares of AMD common stock, structured to vest as specific milestones associated with Instinct GPU shipments are achieved. The first tranche vests with the initial 1-gigawatt of shipments, with additional tranches vesting as Meta’s purchases scale to 6 gigawatts. Vesting is further tied to AMD achieving certain stock price thresholds and exercise is tied to Meta achieving key technical and commercial milestones."

Google says AMD has ~ 1.63B common shares outstanding, so this is roughly 10%.
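A quick back-of-envelope check of that figure, treating both the 160M warrant shares and the ~1.63B share count quoted above as approximate:

```python
# Rough dilution math for the warrant, using the figures quoted above.
warrant_shares = 160e6
shares_outstanding = 1.63e9

print(f"{warrant_shares / shares_outstanding:.1%} of the current share count")        # ~9.8%
print(f"{warrant_shares / (shares_outstanding + warrant_shares):.1%} of the "
      f"enlarged count if the warrant were fully exercised")                          # ~8.9%
```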

The "pay your customer who pays you" loop continues..
 
The AI deals continue. I guess TSMC will have no trouble filling their new fabs till 2030 and beyond:

AMD and Meta Announce Expanded Strategic Partnership to Deploy 6 Gigawatts of AMD GPUs
February 24, 2026 7:00

This is very true. Given TSMC owns the 3nm and 2nm nodes, the next 5 years are going to be a busy time. Building fabs is the easy part; hiring people to do the actual fab work will be the challenge! Exciting times in the semiconductor industry, absolutely!
 
The press has decided that power is the issue and buzzword for all compute. Next we will judge datacenters by social media followers.
In my opinion, power affects everyday people far more than wafer constraints, packaging constraints, or architectural constraints, and it is what industry leaders (Jensen, Altman, Nadella, et al.) are all saying is the bottleneck. So it makes sense that this is what the media talks about.

But as a “deal target”, is setting a power consumption target not completely antithetical to efficiency goals? I get there are probably some more complex provisions in the deal, but wouldn’t you want to target FLOPS? GPU count? Both of those would structure the deal to have everyone involved optimize for deploying what matters (compute capacity) and not just stuffing bad products in a data center to hit a power consumption goal.

What a weird time we’re in. You can bet that these type of deals would never, ever fly in a normally operating market environment…
 
Worth noting, this isn't actually "the press" reporting - this is AMD themselves.

Their PR dept is purposefully choosing a metric that is 1. Somewhat meaningless in terms of actual capability, and 2. Draws attention to the tug of war between data centers and consumers in terms of energy needs.

I can see a few years from now when Meta gets close and AMD legal says "well we were talking about base, but not boost/turbo power" or something like that.

On the flip side -- the one good reason I could see this metric being "OK" is that the type of compute required for AI is still changing. Is it FP32? FP8? INT8? Etc. What will it be in five years as models mature? (Not to mention inference has different needs than training.)
 
Worth noting, this isn't actually "the press" reporting - this is AMD themselves.

Their PR dept is purposefully choosing a metric that is 1. Somewhat meaningless in terms of actual capability, and 2. Draws attention to the tug of war between data centers and consumers in terms of energy needs.

I can see a few years from now when Meta gets close and AMD legal says "well we were talking about base, but not boost/turbo power" or something like that.

On the flip side -- the one good reason I could see this metric being "OK" is that the type of compute required for AI is still changing. Is it FP32? FP8? INT8? Etc. What will it be in five years as models mature? (Not to mention inference has different needs than training.)

I'm no expert on this, so I asked Gemini, and this is the answer in terms of AMD compute. Maybe this helps?

1. The Core Infrastructure: AMD Helios Racks

The AMD Helios is a "rack-scale" system designed to compete with NVIDIA’s GB200 NVL72. It uses a liquid-cooled, double-wide architecture.

  • Capacity per Helios Rack: 72 GPUs and 18 CPUs.
  • Power Density: A fully loaded Helios rack pulls between 120kW and 140kW (reflecting the transition to kilowatt-level chips).
  • Total Racks for 1 GW: After accounting for data center overhead (cooling, lighting, and networking with a PUE of ~1.15), a 1 GW facility provides roughly 870MW of usable IT power.
  • Result: You would need approximately 6,200 to 7,200 Helios Racks.
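For what it's worth, the rack-count arithmetic reproduces easily from the figures Gemini quotes above (1 GW of facility power, PUE ~1.15, 120kW–140kW per Helios rack); a minimal sketch:

```python
# Back-of-envelope rack count for a 1 GW Helios build-out, using the numbers above.
facility_power_kw = 1_000_000            # 1 GW of total facility power
pue = 1.15                               # Power Usage Effectiveness assumed above
it_power_kw = facility_power_kw / pue    # ~870,000 kW of usable IT power

racks_min = it_power_kw / 140            # at 140 kW per rack (fewest racks)
racks_max = it_power_kw / 120            # at 120 kW per rack (most racks)
print(f"IT power: ~{it_power_kw / 1000:.0f} MW")
print(f"Helios racks: ~{racks_min:,.0f} to ~{racks_max:,.0f}")   # ~6,211 to ~7,246
```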

2. The Silicon: MI450 GPUs and "Venice" CPUs

The hardware inside these racks represents a generational leap in power and density.

| Component | Per-Unit Spec | Total for 1 GW Facility |
|------|------|------|
| Instinct MI450 GPU | ~1,200W–1,400W TBP | ~450,000 to 520,000 GPUs |
| 6th Gen EPYC "Venice" | 256 Zen 6 Cores | ~110,000 to 130,000 CPUs |
| HBM4 Memory | 432GB per GPU | ~200 Petabytes of HBM4 |
| Aggregate Compute | ~2.9 EFLOPS (FP4) per Rack | ~18 to 21 ZettaFLOPS (FP4) |


  • AMD Instinct MI450: These custom GPUs are the first to use TSMC’s 2nm process. Each chip pulls upwards of 1.2kW, necessitating the "backside quick-disconnect" liquid cooling integrated into the Helios rack.
  • 6th Gen EPYC "Venice": Pairing one 256-core Venice CPU for every four MI450 GPUs (in an 18-tray configuration) ensures that the CPU "head node" never bottlenecks the massive 19.6 TB/s memory bandwidth of the GPUs.
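Scaling the ~6,200–7,200 rack range by the per-rack contents quoted above (72 GPUs, 18 CPUs, 432GB of HBM4 per GPU, ~2.9 FP4 EFLOPS per rack) reproduces the table's totals:

```python
# Component totals for 1 GW, derived from the rack range and per-rack figures above.
for racks in (6_200, 7_200):
    gpus = racks * 72
    cpus = racks * 18
    hbm_pb = gpus * 432 / 1e6            # 432 GB per GPU -> petabytes
    zflops = racks * 2.9 / 1_000         # 2.9 EFLOPS (FP4) per rack -> ZettaFLOPS
    print(f"{racks:,} racks: {gpus:,} GPUs, {cpus:,} CPUs, "
          f"~{hbm_pb:.0f} PB HBM4, ~{zflops:.0f} ZFLOPS FP4")
# 6,200 racks: 446,400 GPUs, 111,600 CPUs, ~193 PB HBM4, ~18 ZFLOPS FP4
# 7,200 racks: 518,400 GPUs, 129,600 CPUs, ~224 PB HBM4, ~21 ZFLOPS FP4
```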

3. Networking and Connectivity

A 1 GW facility isn't just a collection of racks; it's a singular "supercluster" that behaves like one giant computer.

  • Scale-Up Fabric: The facility requires over 1.6 million Ultra Ethernet / PCIe Gen7 ports to handle the 260 TB/s of scale-up bandwidth within the racks.
  • Internal Storage: Using the Venice CPUs' expanded PCIe lanes, these clusters typically utilize roughly 1 to 2 Exabytes of NVMe storage for checkpointing large-scale models like Llama 5.
 
Just for the record here, if Meta were to put the new NVIDIA Rubin Ultra (Kyber rack) systems (2H 2027) in a 1 GW facility, it could look like this according to Gemini:

To build a 1 Gigawatt (GW) AI facility using NVIDIA’s next-generation Rubin Ultra NVL576 (the "Kyber" architecture), we move into a realm of density that redefines data center engineering.

The Rubin Ultra NVL576, built on TSMC’s 2nm node and scheduled for 2027, represents a massive leap in power density. While current high-end racks (Blackwell NVL72) pull about 120kW, the NVL576 is designed to pull a staggering 600 kilowatts (kW) per rack.

1. The Rack Count for 1 GW

A 1 GW facility provides 1,000,000 kW of total power. However, we must account for the PUE (Power Usage Effectiveness). For a cutting-edge 2nm liquid-cooled facility, a PUE of 1.15 is the industry standard.

  • Total IT Power Available: ~870 Megawatts (MW)
  • Power per NVL576 Rack: 600 kW (0.6 MW)
  • Total Racks Required: ~1,450 Kyber Racks

2. Component Breakdown

Each NVL576 rack is essentially a supercomputer in a single cabinet. To fill a 1 GW facility, you would need:

| Component | Per NVL576 Rack | Total for 1 GW Facility |
|------|------|------|
| Rubin Ultra GPUs (2nm) | 576 | ~835,000 GPUs |
| Vera CPUs (Arm-based) | ~144 (est.) | ~208,000 CPUs |
| HBM4e Memory | ~576 TB | ~835 Petabytes |
| NVLink 7 Interconnect | 1.5 PB/s aggregate | ~2.1 Zetta-bits/s |
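The same arithmetic applied to Gemini's Kyber figures (PUE 1.15, 600 kW per NVL576 rack, 576 GPUs per rack) gives the rack count above and the density gap versus the Helios estimate in the previous post:

```python
# Rack count and density comparison for a 1 GW Rubin Ultra (Kyber) build-out.
it_power_kw = 1_000_000 / 1.15           # ~870 MW of IT power at PUE 1.15
kyber_racks = it_power_kw / 600          # 600 kW per NVL576 rack
helios_racks = 6_500                     # midpoint of the Helios estimate above

print(f"Kyber racks: ~{kyber_racks:,.0f}")                              # ~1,449
print(f"Rubin Ultra GPUs: ~{kyber_racks * 576:,.0f}")                   # ~835,000
print(f"Helios/Kyber rack ratio: ~{helios_racks / kyber_racks:.1f}x")   # ~4.5x
```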



3. Comparison: Rubin vs. AMD Venice/Helios

You'll notice that for the same 1 GW of power, you need far fewer NVIDIA racks (1,450) than AMD Helios racks (~6,500). This is because NVIDIA is choosing to concentrate nearly 5x the power into a single physical footprint.

  • The Density Challenge: One Rubin Ultra rack draws as much power as a small neighborhood (approx. 400–500 homes).
  • Cooling: At 600kW, traditional air cooling is impossible. The facility would require Direct-to-Chip (DTC) liquid cooling or full immersion, moving roughly 1,000 liters of coolant per minute through each rack.


  • Performance: A single 1 GW Rubin Ultra cluster would deliver approximately 21.7 ZettaFLOPS of AI inference (FP4), making it the most powerful single-site compute engine ever conceived.
 
I'm no expert on this, so I asked Gemini, and this is the answer in terms of AMD compute. Maybe this helps?

3. Networking and Connectivity

A 1 GW facility isn't just a collection of racks; it's a singular "supercluster" that behaves like one giant computer.

  • Scale-Up Fabric: The facility requires over 1.6 million Ultra Ethernet / PCIe Gen7 ports to handle the 260 TB/s of scale-up bandwidth within the racks.
  • Internal Storage: Using the Venice CPUs' expanded PCIe lanes, these clusters typically utilize roughly 1 to 2 Exabytes of NVMe storage for checkpointing large-scale models like Llama 5.
This section is incorrect. AMD uses UALink within a rack for scale-up. Ultra Ethernet is a scale-up scale-out network used between racks.

Your post is a classic example of why using AI (in its current SOTA) for engineering analysis may produce incorrect answers. I haven't fact-checked the rest of the response; it feels like a waste of time.
 
This section is incorrect. AMD uses UALink within a rack for scale-up. Ultra Ethernet is a scale-up network used between racks.

Your post is a classic example of why using AI (in its current SOTA) for engineering analysis may produce incorrect answers. I haven't fact-checked the rest of the response; it feels like a waste of time.

OK, I asked Gemini for more details about the connectivity. Maybe this is more in line with your reply?


Gemini:
Within an AMD Helios rack, the networking strategy is built entirely on Ethernet to challenge NVIDIA’s proprietary NVLink. This is achieved through two distinct layers that allow 72 GPUs to act as a single, coherent machine.

1. Scale-Up: UALink over Ethernet (UALoE)

For communication inside the rack (GPU-to-GPU), AMD uses UALink (Ultra Accelerator Link) transported via an Ethernet fabric.

  • The Switch: Developed in collaboration with Broadcom and HPE/Juniper, the Helios rack uses a purpose-built scale-up switch powered by the Broadcom Tomahawk 6 ASIC.


  • The Protocol: It utilizes UALoE (UALink over Ethernet). This allows the 72 MI450 GPUs to share memory and maintain hardware coherence with a staggering 260 TB/s of aggregate scale-up bandwidth.


  • The Advantage: Unlike NVIDIA’s NVLink, which requires specialized physical cables and proprietary switches, UALoE runs over standard Ethernet physical layers, reducing vendor lock-in.

2. Scale-Out: Ultra Ethernet Consortium (UEC)

For communication between racks (Rack-to-Rack) to build the 1 GW cluster, AMD utilizes the Ultra Ethernet Consortium (UEC) standards.

  • The Controller: Each GPU can be equipped with up to three AMD Pensando “Vulcano” DPUs (Data Processing Units).

  • The Speed: These next-generation AI-NICs provide up to 800 Gbps of throughput per port.

  • The Fabric: This layer is optimized for RoCE (RDMA over Converged Ethernet) and is fully aligned with UEC specifications to ensure lossless, low-latency data movement across the thousands of racks in a 10 GW facility.
 
OK, I asked Gemini for more details about the connectivity. Maybe this is more in line with your reply?


Gemini:
Within an AMD Helios rack, the networking strategy is built entirely on Ethernet to challenge NVIDIA’s proprietary NVLink. This is achieved through two distinct layers that allow 72 GPUs to act as a single, coherent machine.

1. Scale-Up: UALink over Ethernet (UALoE)

For communication inside the rack (GPU-to-GPU), AMD uses UALink (Ultra Accelerator Link) transported via an Ethernet fabric.

  • The Switch: Developed in collaboration with Broadcom and HPE/Juniper, the Helios rack uses a purpose-built scale-up switch powered by the Broadcom Tomahawk 6 ASIC.


  • The Protocol: It utilizes UALoE (UALink over Ethernet). This allows the 72 MI450 GPUs to share memory and maintain hardware coherence with a staggering 260 TB/s of aggregate scale-up bandwidth.


  • The Advantage: Unlike NVIDIA’s NVLink, which requires specialized physical cables and proprietary switches, UALoE runs over standard Ethernet physical layers, reducing vendor lock-in.

2. Scale-Out: Ultra Ethernet Consortium (UEC)

For communication between racks (Rack-to-Rack) to build the 1 GW cluster, AMD utilizes the Ultra Ethernet Consortium (UEC) standards.

  • The Controller: Each GPU can be equipped with up to three AMD Pensando “Vulcano” DPUs (Data Processing Units).

  • The Speed: These next-generation AI-NICs provide up to 800 Gbps of throughput per port.

  • The Fabric: This layer is optimized for RoCE (RDMA over Converged Ethernet) and is fully aligned with UEC specifications to ensure lossless, low-latency data movement across the thousands of racks in a 10 GW facility.
Still nonsense. UALink uses the Ethernet PHY, but not Ethernet as a network. The part about the UEC "fabric" is especially bad. A "word salad" as the saying goes here in the US.
 
This section is incorrect. AMD uses UALink within a rack for scale-up. Ultra Ethernet is a scale-up scale-out network used between racks.

Your post is a classic example of why using AI (in its current SOTA) for engineering analysis may produce incorrect answers. I haven't fact-checked the rest of the response; it feels like a waste of time.
I mistyped. I shouldn't post before I finish my morning coffee. UEC is not a scale-up network, it's a scale-out network. :(
 
Which model/plan did you use?

This is what Grok gives you on the free plan with the 4.20 model:
**UALink (Ultra Accelerator Link)** is an open industry standard for high-speed, low-latency interconnects that lets AI accelerators (such as GPUs, ASICs, or other XPUs) communicate directly within a tightly coupled “pod” of up to 1,024 devices.

It uses a lightweight memory-semantic protocol (load/store and atomic operations) so multiple accelerators behave like a single large system with shared memory access, while keeping latency under 1 μs round-trip for short-reach cables (<4 m).

**Key Points**
- **Bandwidth**: 200 GT/s effective per lane (212.5 GT/s signaling), configurable as x1/x2/x4 links for up to 800 Gbps bidirectional per port.
- **Scale**: Supports 1,024 accelerators per pod via dedicated UALink switches in a non-blocking fabric.
- **Advantages**: Open multi-vendor standard (no single-vendor lock-in), leverages existing Ethernet ecosystem for lower cost and easier deployment, high efficiency (>93 % bandwidth utilization).

**What is UALink?**
Launched by the UALink Consortium in 2024 (founded by AMD, Intel, Google, Microsoft, Meta, Broadcom, Cisco, and others; now >90 members including Alibaba, Apple, and Synopsys), UALink 1.0 specification was released in April 2025. It is built on IEEE 802.3dj Ultra Ethernet PHY technology with a streamlined protocol stack optimized purely for accelerator-to-accelerator traffic inside a pod (1–4 racks). It is **not** a replacement for general networking but a dedicated scale-up fabric. Products and switches are expected in 2026.
[ualinkconsortium.org](https://ualinkconsortium.org/)

**Comparison to NVLink**
NVLink is NVIDIA’s proprietary scale-up fabric with similar goals: ultra-high-bandwidth, low-latency direct GPU-to-GPU communication that makes clusters of GPUs act like one giant accelerator. Latest NVLink 5.0 (Blackwell) delivers up to 1.8 TB/s bidirectional per GPU (doubling prior generations), with NVIDIA’s NVL72 rack offering 130 TB/s aggregate across 72 GPUs; next-gen Rubin targets 3.6 TB/s per GPU.
UALink currently offers lower per-device bandwidth (~800 Gbps x4 link) but supports a larger theoretical scale (1,024 vs. NVIDIA’s practical 72–576 GPUs per domain) and is completely open, allowing mixed-vendor accelerators at potentially lower TCO. NVLink remains the performance leader in homogeneous NVIDIA deployments today, while UALink is positioned as the open, future-proof alternative. Both target sub-microsecond latency.

**Comparison to InfiniBand**
InfiniBand is a mature, open scale-out networking technology used for inter-node/cluster communication across an entire data center. Current generations (NDR 400 Gbps, XDR ~800 Gbps+) deliver excellent low-latency RDMA (typically 0.5–2 μs end-to-end) but operate at a higher level with packet overhead.
UALink is a complementary scale-up technology focused on the tightest, highest-bandwidth links *inside* a pod—much like NVLink—while InfiniBand (or its open Ethernet counterpart Ultra Ethernet/UEC) connects multiple pods together. In practice, large AI clusters combine scale-up fabrics (UALink or NVLink) with scale-out networks (InfiniBand or Ethernet).

---

**Survey Note: Detailed Analysis of UALink and Its Position Relative to NVLink and InfiniBand in Modern AI Infrastructure**

The explosive growth of AI models has made interconnect performance the new bottleneck. Training trillion-parameter models requires massive parallelism across hundreds or thousands of accelerators, where every microsecond of communication latency and every percentage point of bandwidth efficiency directly impacts training time and cost. This is where dedicated scale-up fabrics like UALink and NVLink become critical.

**UALink Architecture and Technical Specifications (1.0, April 2025)**
UALink defines a complete layered stack optimized exclusively for accelerator-to-accelerator traffic:
- **Physical Layer (PL)**: Based on IEEE 802.3dj 200 Gb/s Ethernet PHY. Supports 212.5 GT/s signaling per lane (200 GT/s effective after FEC/encoding). Links are configurable as x1, x2, or x4 lanes; a full x4 “station” delivers 800 Gbps TX + 800 Gbps RX. Lower-speed 106.25 GT/s option also defined. Uses standard Ethernet cables, connectors, and retimers for dramatically lower ecosystem cost.
- **Data Link Layer (DL)**: Packs 64-byte transaction flits into fixed 640-byte flits with 32-bit CRC. Link-level replay for reliability. Reduced interleaving modes for lower FEC latency.
- **Transaction Layer (TL)**: Converts protocol messages into efficient flits; supports streaming address compression for high bidirectional efficiency. Maintains ordering model consistent with host-attached memory.
- **Protocol Layer (UPLI)**: Symmetrical request/completion channels for load/store, atomics, and large data transfers.
Key operational targets: round-trip request-to-response latency <1 μs, switch hop latency ~100–150 ns, >93 % effective bandwidth utilization, lossless credit-based flow control, and fault containment via virtual pods. A 10-bit routing identifier uniquely addresses up to 1,024 accelerators per pod. Pods can span 1–4 racks and be partitioned into isolated virtual pods. The design reuses AMD Infinity Fabric concepts but strips unnecessary PCIe-style overhead for pure AI workloads.
The white paper emphasizes three core benefits: (1) maximum link efficiency for bidirectional memory access, (2) dramatically lower TCO via Ethernet ecosystem reuse, and (3) simplified software model with consistent memory ordering across local, host, and remote accelerator memory.
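To make the link-width and pod-size numbers above concrete, here is a trivial sketch using only the figures quoted from the spec (200 GT/s effective per lane, x1/x2/x4 widths, a 10-bit routing identifier):

```python
# Per-link bandwidth and pod addressing limits from the UALink 1.0 figures above.
effective_gbps_per_lane = 200            # 212.5 GT/s signaling, 200 GT/s effective
for lanes in (1, 2, 4):
    print(f"x{lanes} link: {lanes * effective_gbps_per_lane} Gbps per direction")
# An x4 "station" therefore carries 800 Gbps TX + 800 Gbps RX, as stated above.

routing_id_bits = 10
print(f"Max accelerators per pod: {2 ** routing_id_bits}")   # 1,024
```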

**Roadmap and Ecosystem**
The consortium has already announced follow-on work: 128G DL/PL specification (mid-2025), in-network collectives (INC) for hardware-accelerated reductions (late 2025), and UCIe PHY chiplet specs for on-package integration and future hardware coherency. Over 90 companies are engaged, ensuring broad silicon availability starting in 2026.

**Direct Comparison Table**

| Feature | UALink 1.0 | NVLink 5.0 (Blackwell) / 6.0 (Rubin) | InfiniBand (NDR / XDR) |
|--------------------------|-------------------------------------|---------------------------------------|-------------------------------------|
| **Primary Purpose** | Scale-up accelerator fabric (intra-pod) | Scale-up GPU fabric (intra-pod) | Scale-out cluster networking (inter-node) |
| **Openness** | Fully open industry standard | Proprietary (NVIDIA only) | Open standard |
| **Bandwidth** | 200 GT/s per lane; 800 Gbps bidirectional per x4 port | 1.8 TB/s per GPU (Blackwell); 3.6 TB/s per GPU (Rubin) | 400–800+ Gbps per port |
| **Maximum Scale** | 1,024 accelerators per pod | 72 GPUs (NVL72 practical); up to 576 theoretical | Thousands of nodes across fabric |
| **Latency** | <1 μs RTT target; ~100–150 ns hop | Sub-microsecond | 0.5–2 μs end-to-end |
| **Signaling** | Ethernet SerDes (standard ecosystem) | Custom high-speed SerDes | IB-specific |
| **Protocol Style** | Memory semantic (load/store/atomics), software coherency | Full hardware coherency & memory semantics | RDMA / message passing |
| **Typical Reach** | <4 m cables | Very short reach (intra-rack) | Longer intra- and inter-rack |
| **Power / Die Area** | Optimized (½–⅓ vs. equivalent Ethernet) | Highly optimized for NVIDIA GPUs | Higher overhead |
| **Ecosystem** | Multi-vendor (AMD, Intel, Google, etc.) | NVIDIA-only + NVSwitch | Broad (NVIDIA, Intel, others) |
| **Maturity / Availability** | Spec 2025; silicon 2026+ | Production today (NVL72 systems) | Mature, widely deployed |

**Market and Strategic Implications**
UALink directly challenges NVIDIA’s dominant position in high-end AI systems by offering an open, interoperable alternative that hyperscalers and cloud providers have long sought. By enabling true multi-vendor accelerator pods (AMD + Intel + custom ASICs in the same fabric), it lowers barriers to entry and reduces vendor lock-in. Cost advantages come from reusing the massive Ethernet supply chain for cables, connectors, retimers, and management tools—potentially saving hundreds of watts and dollars per accelerator compared to proprietary fabrics.

In full AI clusters, the architecture is typically layered:
- **Scale-up layer** (UALink or NVLink): highest-bandwidth, lowest-latency links inside each pod.
- **Scale-out layer** (Ultra Ethernet/UEC or InfiniBand): connects many pods into a massive training cluster.

InfiniBand remains extremely strong for scale-out because of its proven ultra-low latency and mature software ecosystem, but open Ethernet variants (Spectrum-X, UEC) are rapidly closing the gap on cost and availability. UALink therefore fits perfectly as the open scale-up counterpart to open scale-out Ethernet, creating a fully standards-based stack that could dramatically reshape AI infrastructure economics.

**Current Status and Outlook**
As of early 2026, UALink 1.0 is fully ratified and available to members; first controller IP, PHY, and switch silicon from Cadence, Synopsys, Marvell, Astera Labs, and others are in design or early sampling. Real-world deployments are expected in hyperscale AI pods throughout 2026–2027. While NVLink currently holds the performance crown in homogeneous NVIDIA environments, UALink’s openness, larger theoretical scale, and ecosystem momentum position it as the long-term standard for the broader AI accelerator market.

**Key Citations**
- UALink Consortium Official Site and Specifications: https://ualinkconsortium.org/ and https://ualinkconsortium.org/specifications/
- UALink 1.0 White Paper (PDF): https://ualinkconsortium.org/wp-content/uploads/2025/04/UALink-1.0-White_Paper_FINAL.pdf
- Wikipedia – Ultra Accelerator Link: https://en.wikipedia.org/wiki/UALink
- Tom’s Hardware – “UALink has Nvidia’s NVLink in the crosshairs”: https://www.tomshardware.com/tech-i...port-up-to-1-024-gpus-with-200-gt-s-bandwidth
- Next Platform – “UALink Fires First GPU Interconnect Salvo At Nvidia NVSwitch”: https://www.nextplatform.com/2025/0...st-gpu-interconnect-salvo-at-nvidia-nvswitch/
- NVIDIA Official NVLink Page (for bandwidth and scalability figures): https://www.nvidia.com/en-us/data-center/nvlink/

This comprehensive view confirms UALink as a pivotal open standard that complements rather than replaces existing technologies while offering a compelling path toward more flexible, cost-effective AI supercomputing.
I don't think it's that bad. The biggest problem is that it pulled 225 sources, most of them Wikipedia and mainstream media (for example Tom's Hardware, Medium...).

The next chat, where I specifically asked it to focus on patent applications, is below. Interestingly, this time it pulled only 30 sources (and still a few of them were Reddit).
### Key Points on UALink in AMD Systems
- **Overview and Role in AMD**: UALink (Ultra Accelerator Link) is an open-standard interconnect for AI accelerators, with AMD as a founding member and key contributor. It enables high-speed, low-latency connections in AMD systems like Instinct MI400 and MI500-series GPUs, supporting up to 1,024 accelerators per domain for scale-up AI workloads.
- **Hardware Focus (PHY and Medium)**: The PHY leverages modified IEEE 802.3 Ethernet standards for 200 GT/s per lane, emphasizing low-latency FEC and optimized flit alignment; the medium supports short-reach copper or backplane connections under 4 meters, prioritizing efficiency over long distances.
- **Patent Basis**: While direct UALink patents are limited due to its open nature, related AMD patent applications (e.g., on GPU chiplet interconnects with passive PHY layers) inform hardware implementations, focusing on high-bandwidth, low-latency links akin to UALink's design.
- **Potential Challenges**: Research suggests trade-offs in FEC interleaving for latency versus error correction, with evidence leaning toward balanced configurations in AMD's AI-focused systems; controversy around vendor interoperability exists, but open standards like UALink aim to mitigate lock-in.

### Core Hardware Layers
UALink's PHY is engineered for AI-scale efficiency, building on Ethernet fundamentals with custom tweaks. In AMD systems, it integrates with accelerators via stations of four lanes, delivering up to 800 Gbps bidirectional bandwidth. The medium relies on standard 802.3-compatible cabling, ensuring compatibility with existing data center infrastructure.

### Integration in AMD Ecosystems
AMD's involvement includes technical leadership in the consortium, influencing PHY modifications for Instinct accelerators. Hardware layers emphasize seamless memory sharing and low-latency operations, with patent-inspired crosslinks enhancing chiplet-to-chiplet communication in multi-GPU setups.

### Performance and Optimization
It seems likely that UALink's reduced-interleave FEC optimizes for AI workloads in AMD systems, reducing round-trip times below 1 μs. Evidence from specifications and related patents highlights scalability, though real-world medium constraints (e.g., short cables) require careful deployment planning.

---

UALink, or Ultra Accelerator Link, represents a collaborative effort in the AI hardware ecosystem, with AMD playing a pivotal role as a founding promoter and technical contributor. This interconnect standard addresses the growing demands of scale-up AI systems by providing a high-bandwidth, low-latency link between accelerators, such as those in AMD's Instinct series. The focus here is on the hardware layers, particularly the Physical Layer (PHY) and the physical medium, drawing primarily from patent applications and related technical specifications. Given the nascent stage of UALink (announced in 2024 with specifications released in 2025), direct patents are sparse, as the technology is an open standard designed to avoid proprietary silos. Instead, the analysis bases much of its foundation on AMD's broader patent portfolio for high-speed interconnects, which parallels UALink's architecture, including passive PHY designs for GPU chiplets and adaptive caching for efficient data translation in accelerator fabrics. These patents, such as US20200409859A1 on programmable devices with hardened PCIe circuits and US10929310B2 on adaptive address translation caches, provide insights into how AMD might implement UALink's PHY and medium in systems like the MI400 and MI500 GPUs.

The UALink stack is layered, with the PHY at the base, interfacing directly with the physical medium. It adopts a symmetrical protocol, supporting identical messaging in transmit and receive paths. Messages traverse the Protocol Layer, Transaction Layer (TL), Data Link Layer (DL), and PHY. The TL packs requests and responses into fixed payloads (64B or 640B FLITs), while the DL adds CRC protection and headers, forming 640-byte DL FLITs passed to the PHY. The PHY then encodes these into FEC-protected codewords for serialization over the medium. This design prioritizes low-latency operations, targeting request-to-response round-trip times under 1 μs, suitable for AI clusters spanning 1-4 racks with up to 1,024 endpoints.
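As a rough illustration of the layering described above (64-byte transaction flits packed into 640-byte DL flits with a 32-bit CRC), here is a toy packer; the field layout, header contents, and CRC polynomial are assumptions made for illustration only, not the UALink 1.0 wire format:

```python
# Toy model of DL flit assembly from the sizes quoted above (illustrative only;
# the real UALink header layout and CRC polynomial differ from this sketch).
import struct
import zlib

TX_FLIT_SIZE = 64        # bytes per transaction-layer flit (from the text)
DL_FLIT_SIZE = 640       # bytes per data-link flit (from the text)
OVERHEAD = 8             # assumed: 4-byte DL header + 4-byte (32-bit) CRC trailer

def pack_dl_flit(tx_flits, seq):
    """Pack 64-byte transaction flits into one padded 640-byte DL flit."""
    assert all(len(f) == TX_FLIT_SIZE for f in tx_flits)
    payload = b"".join(tx_flits)
    assert len(payload) <= DL_FLIT_SIZE - OVERHEAD
    payload = payload.ljust(DL_FLIT_SIZE - OVERHEAD, b"\x00")  # pad unused slots
    header = struct.pack(">I", seq)                            # assumed 4-byte sequence header
    crc = struct.pack(">I", zlib.crc32(header + payload))      # stand-in 32-bit CRC
    return header + payload + crc

flit = pack_dl_flit([bytes([i]) * TX_FLIT_SIZE for i in range(9)], seq=1)
print(len(flit))   # 640
```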

#### PHY Architecture and Sublayers
The UALink PHY is fundamentally derived from IEEE 802.3dj (Draft 1.4), an Ethernet standard, but incorporates modifications to optimize for AI-specific needs like reduced latency and fixed FLIT alignment. It supports serial rates of 212.5 GT/s per lane (effective 200 GT/s after overhead), with an optional half-speed mode at 106.25 GT/s (100 GT/s effective). Lanes can be grouped into x1, x2, or x4 configurations, forming a "Station" of four lanes for up to 800 Gbps bidirectional bandwidth. Bifurcation allows flexible port layouts (e.g., one x4 link or four x1 links), determined pre-link training via a front-side network.

Key sublayers include:

- **Reconciliation Sublayer (RS)**: Bridges the DL and PCS, handling FLIT alignment, control blocks, and fault signaling. Based on 802.3 Clauses 81, 117, and 170, it uses a subset of 64B/66B encoding (e.g., Idle/Error: 0x1E, Start Flit: 0x78). Modifications ensure 640-byte DL FLITs align precisely to 680-byte codewords, minimizing replays. It generates continuous 64B/66B blocks without xGMII interfaces, inserting idles every 1,024 codewords for rate matching (no inter-packet gaps). Faults are signaled via control codes, and alignment markers are overwritten in start-FLIT sequences.

- **Physical Coding Sublayer (PCS)**: Performs 64B/66B encoding/decoding, a restricted subset of 802.3. It aligns DL FLITs to codewords and supports reduced FEC interleaving. Alignment markers are inserted every 4,096 FLITs (100/200 G), 8,192 (400 G), or 16,384 (800 G). Scrambling is bypassed for these markers. For 200GBASE-R, it includes transcoding and FEC encoding; for higher rates, it handles pre/post-FEC distribution.

- **Physical Medium Attachment (PMA)**: Interfaces to the PMD, with tweaks for low-interleave FEC (e.g., omitting delays in 1-way mode). It supports GAUI and VDCI interfaces, distributing bits across lanes without 10-bit overhead in reduced modes.

- **Physical Medium Dependent (PMD)**: Unmodified from 802.3, supporting KR (backplane) and CR (copper cable) variants like 200GBASE-KR1/CR1.

- **Forward Error Correction (FEC)**: Uses RS(544,514) coding with interleaving options: standard 4-way, or UALink-specific 1-way/2-way for lower latency (at higher bit-error rates). For example, 200GBASE-KR1 supports 1-way/2-way, while 800GBASE-KR4 defaults to 2-way. This balances burst-error correction with AI's low-latency demands.

Auto-negotiation and link training remain standard 802.3, ensuring compatibility. Each port can operate at different rates under errors, but uniform rates are expected in normal pods.

#### Physical Medium Characteristics
The medium is not exhaustively specified, relying on 802.3 PMD for compatibility with existing Ethernet infrastructure. It supports short-reach applications, with cable lengths limited to under 4 meters to maintain signal integrity at high speeds. This implies copper-based media (e.g., twinaxial cables for CR variants) or backplane traces for KR, using standard connectors like QSFP-DD or OSFP. No optical options are emphasized, as the focus is on intra-rack or short inter-rack links. The medium must handle 212.5 GT/s signaling with minimal attenuation, supporting up to 800 Gbps per station.

In AMD systems, this medium integrates with accelerator fabrics, potentially leveraging patent-described passive interposers for chiplet connectivity. For instance, AMD's US20200409859A1 details a passive crosslink PHY for GPU chiplets, using electrical interfaces (e.g., silicon interposers) that mirror UALink's lane-based medium, enabling high-bandwidth bridges without active components.

#### Patent-Driven Insights into AMD Implementations
AMD's patent applications provide the primary lens for UALink's hardware layers, as direct UALink patents remain unpublished or consortium-shared. Key related filings include:

- **US10929310B2 (Adaptive Address Translation Caches, filed 2019, granted 2021, assignee: AMD)**: Describes optimizing Address Translation Caches (ATCs) in I/O devices like NICs for DMA in accelerator systems. The PHY context involves PCIe-based interfaces with ATS support, where adaptive multi-level caching reduces thrashing in high-speed links. This aligns with UALink's DL/PHY integration, using QoS classes (e.g., CoS, DSCP) to prioritize traffic over the medium. Key claims: A NIC writes reservation info to ATC levels; replacement policies based on QoS to minimize misses in fabrics akin to UALink.

- **US20200409859A1 (Programmable Device with Hardened PCIe Circuits, filed 2020, assignee: AMD)**: Outlines GPU chiplet arrays linked via passive crosslink PHYs on interposers. The medium is electrical pathways (e.g., silicon bridges), supporting high-bandwidth inter-chiplet communication. Descriptions focus on partitioning SoC functions into chiplets, with PHY handling serialization and low-latency bridging—paralleling UALink's lane configurations. Abstract: Connects GPU chiplets to CPUs via buses, with crosslinks for secondary links; emphasizes passive PHY for efficiency.

- **US20020126684A1 (Serial Media Independent Interface, filed 2001, assignee: AMD)**: An older but foundational patent on PHY interfaces connecting MACs to media. It details serial transmission over networks, with PHY handling medium access (e.g., twisted-pair copper). Relevant to UALink: Supports high-speed serialization, fault detection, and auto-negotiation—core to modified 802.3 PHY. Claims: PHY connected to physical medium; MAC with FIFO buffers for NIC-to-NIC links.

These patents underscore AMD's emphasis on passive, efficient PHY designs for accelerators, informing UALink implementations in Instinct GPUs. For example, adaptive caching reduces latency in shared-memory scenarios, while crosslink PHYs enable scalable mediums without power-hungry actives.

#### Comparative Analysis of PHY Configurations
The following table compares UALink PHY modes, drawing from specifications and patent parallels:

| Mode | Lane Width | Serial Rate (GT/s) | Effective Bandwidth (Gbps) | FEC Interleave | Latency Impact | Medium Suitability (from Patents) |
|------|------------|---------------------|----------------------------|----------------|----------------|-----------------------------------|
| 200GBASE-KR1/CR1 | x1 | 212.5 | 200 | 1-way/2-way | Low (optimized) | Short copper cables (<4m); akin to AMD's interposer mediums in US20200409859A1 |
| 400GBASE-KR2/CR2 | x2 | 212.5 | 400 | 2-way | Moderate | Backplane traces; supports QoS caching per US10929310B2 |
| 800GBASE-KR4/CR4 | x4 | 212.5 | 800 | 2-way (standard) | Balanced | Twinaxial copper; serial interfaces as in US20020126684A1 |
| 100GBASE-KR1/CR1 (Optional) | x1 | 106.25 | 100 | 1-way/2-way | Very low | Legacy mediums; adaptive for error-prone links |

This table highlights trade-offs: Reduced interleaving lowers latency but increases BER, as noted in 802.3 modifications. AMD patents suggest hardware mitigations, like QoS-based ATC reservations, to handle such variability.

#### Broader Hardware Ecosystem and Challenges
In AMD systems, UALink's PHY enables direct load/store semantics across accelerators, with security via E2E encryption (patent-aligned with AMD's ATS protections). Virtual channels and credit-based flow control in upper layers complement the PHY, ensuring lossless transmission over the medium. Challenges include medium limitations—short cables restrict pod sizes—addressed in patents via passive designs for denser integrations.

Implementation in AMD's MI400/MI500 involves Synopsys or Marvell IP for PHY, as consortium members. Related patents like US20200409859A1 indicate chiplet-level scaling, where UALink extends intra-silicon links to inter-accelerator fabrics. Overall, while patents focus on foundational tech, they substantiate UALink's hardware as an evolution of AMD's interconnect strategies, promoting open AI infrastructure.

### Key Citations
- [UALink 200G 1.0 Specification Evaluation Copy](https://ualinkconsortium.org/wp-content/uploads/2025/04/UALink200_Specification_v1.0_Evaluation_Copy.pdf)
- [UALink roadmap plots course to optimized AI data center interconnects](https://www.tomshardware.com/tech-industry/artificial-intelligence/ualink-roadmap-plots-course-to-optimized-ai-data-center-interconnects-examining-the-open-standard-designed-to-combat-vendor-lock-in-while-offering-cost-and-performance-optimization)
- [US10929310B2 - Adaptive address translation caches](https://patents.google.com/patent/US10929310B2/en)
- [AMD Files Patent on New GPU Chiplet Approach](https://www.hpcwire.com/2021/01/05/amd-files-patent-on-new-gpu-chiplet-approach)
- [UALink: Powering The Future Of AI Compute](https://semiengineering.com/ualink-powering-the-future-of-ai-compute)
- [UALink 1.0 White Paper](https://ualinkconsortium.org/wp-content/uploads/2025/04/UALink-1.0-White_Paper_FINAL_UPDATED.pdf)
- [Synopsys UALink IP Solution](https://www.synopsys.com/designware-ip/interface-ip/ualink.html)

Btw, I asked Gemini (thinking) and it straight up told me to try tomorrow. Claude created a professional-looking report (4,000 words); I will just add the sources.

References and Source Material

The following primary and secondary sources were used in preparing this report:



UALink Consortium. UALink 200G 1.0 Specification (Evaluation Copy). UALink Consortium Inc., April 2025.

UALink Consortium. UALink 200G 1.0 White Paper. ualinkconsortium.org, April 2025.

Onufryk, P. (Intel Fellow). UALink Technical Briefing. The Next Platform, April 2025.

AMD. Patent US11,036,658 — Light-weight memory expansion in a coherent memory system. USPTO, June 2021.

Intel. Patent Application US20210232520A1 — Logical physical layer interface specification support for PCIe 6.0, CXL 3.0, and UPI 3.0. USPTO / Google Patents, 2021.

Wheeler's Network. AMD Looks to Infinity for AI Interconnects. August 2024.

Wheeler's Network / NVIDIA Networks NVLink. NVLink comparative PHY analysis. 2022–2024.

WikiChip. AMD Infinity Fabric (IF) — comprehensive SerDes and CAKE documentation. wikichip.org.

GlobalTechResearch. NVLink vs UALink vs NeuronLink — PHY and SerDes comparative analysis. Substack, August 2025.

Marvell Technology. Press Release: Marvell Expands Custom Compute Platform with UALink. June 2025.

Synopsys. DesignWare UALink 200G IP Solution (controller, PHY, VIP) product disclosure. synopsys.com.

Astera Labs. Building the Case for UALink: A Dedicated Scale-Up Memory Semantic Fabric. asteralabs.com, 2025.

Alphawave Semi. 224 Gbps SerDes technology for UALink applications. awavesemi.com.

Tom's Hardware. UALink Roadmap Plots Course to Optimized AI Data Center Interconnects. February 2026.

IEEE. P802.3dj standard: 200 Gb/s PHY — task force documentation and PAM4 encoding specifications.

HPCwire. Everyone Except Nvidia Forms Ultra Accelerator Link Consortium. May 2024.

NAND Research. UALink Consortium Releases UALink 1.0. nand-research.com, April 2025.

ABI Research. Interconnect Innovation: NVLink, UALink, and the Race to Scale AI. January 2026.
 
Which model/plan did you use?

This is what Grok gives you on the free plan with the 4.20 model:

I don't think it's that bad. The biggest problem is that it pulled 225 sources, most of them Wikipedia and mainstream media (for example Tom's Hardware, Medium...).

The next chat, where I specifically asked it to focus on patent applications, is below. Interestingly, this time it pulled only 30 sources (and still a few of them were Reddit).


Btw, I asked Gemini (thinking) and it straight up told me to try tomorrow. Claude created a professional-looking report (4,000 words); I will just add the sources.
Much better.
 
Worth noting, this isn't actually "the press" reporting - this is AMD themselves.

Their PR dept is purposefully choosing a metric that is 1. Somewhat meaningless in terms of actual capability, and 2. Draws attention to the tug of war between data centers and consumers in terms of energy needs.

I can see a few years from now when Meta gets close and AMD legal says "well we were talking about base, but not boost/turbo power" or something like that.

On the flip side -- the one good reason I could see this metric being "OK" is that the type of compute required for AI is still changing. Is it FP32? FP8? INT8? Etc. What will it be in five years as models mature? (Not to mention inference has different needs than training.)

Blame the lawyers... They approve all PRs. I have gone through this a few times. The final version looked nothing like the draft and was devoid of all intelligence.
 