
PCIe 7.0 Is Complete. 512 GB/s Is the New Limit.


This week PCI-SIG completed the PCIe 7.0 spec. True to form, it doubles throughput over the previous generation, continuing a cadence of roughly one doubling every three years. The raw bit rate is 128 GT/s, twice that of PCIe 6.0.

x16 slot: 512 GB/s bidirectional
x4 slot (NVMe SSD): 128 GB/s bidirectional
x1 lane: equivalent to a PCIe 3.0 x16 slot

The numbers are almost ridiculous at this point. A full x16 slot provides 512 GB/s bidirectional, that is, 256 GB/s in each direction. An x4 slot, which is what most NVMe SSDs actually use, reaches 128 GB/s bidirectional. And a single x1 lane now matches the raw bandwidth of a full PCIe 3.0 x16 slot. Sit with that a moment.
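For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes raw figures only: one bit per transfer per lane, with no allowance for encoding, FEC, or protocol overhead, so real-world throughput will be somewhat lower. The function name and the table are illustrative, not anything from the spec.

```python
# Raw per-lane transfer rates in GT/s for each PCIe generation.
RAW_GT_PER_S = {"3.0": 8, "4.0": 16, "5.0": 32, "6.0": 64, "7.0": 128}


def raw_bandwidth_gb_s(gen: str, lanes: int, bidirectional: bool = False) -> float:
    """Raw link bandwidth in GB/s (assumption: overhead is ignored)."""
    # GT/s per lane times lane count gives Gb/s; divide by 8 for GB/s.
    per_direction = RAW_GT_PER_S[gen] * lanes / 8
    return per_direction * 2 if bidirectional else per_direction


if __name__ == "__main__":
    print("7.0 x16 bidirectional:", raw_bandwidth_gb_s("7.0", 16, bidirectional=True))
    print("7.0 x4 bidirectional: ", raw_bandwidth_gb_s("7.0", 4, bidirectional=True))
    print("7.0 x1 per direction: ", raw_bandwidth_gb_s("7.0", 1))
    print("3.0 x16 per direction:", raw_bandwidth_gb_s("3.0", 16))
```

In raw terms the last two lines come out equal, which is where the "one lane equals a Gen 3 x16 slot" comparison comes from.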

Hitting 128 GT/s over copper is really difficult. PCIe 7.0 keeps the PAM4 signaling introduced in PCIe 6.0, which uses four signal levels rather than two and so carries two bits per symbol instead of one. Power management got some refinement too. The spec tries to keep power consumption from exploding along with the speed, which is, frankly, optimistic, but we will see.
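To make the "four levels rather than two" point concrete, here is a toy sketch of the idea in Python. It is not the PCIe encoder: the Gray-coded mapping from bit pairs to levels is an illustrative assumption, and the real link adds FEC and FLIT framing on top. The names `pam4_encode` and `nrz_encode` are made up for this example.

```python
# Illustrative Gray-coded mapping: adjacent levels differ by a single bit.
GRAY_PAM4 = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}


def nrz_encode(bits):
    """Two-level signaling: one symbol per bit."""
    return list(bits)


def pam4_encode(bits):
    """Four-level signaling: one symbol per pair of bits."""
    if len(bits) % 2 != 0:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


if __name__ == "__main__":
    data = [0, 0, 0, 1, 1, 1, 1, 0]
    print("NRZ symbols: ", nrz_encode(data))   # 8 symbols for 8 bits
    print("PAM4 symbols:", pam4_encode(data))  # 4 symbols for the same 8 bits
```

The same bits need half as many symbols on the wire, which is why a 128 GT/s lane can run at a symbol rate of 64 GBaud instead of 128. The catch is that the four levels sit closer together, so the signal is more sensitive to noise.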

The more interesting addition is optical support. The spec is now described as Optical Aware, meaning it can support light-based connections between server racks. At these frequencies, signals over copper degrade badly over longer distances. This isn't really a surprise. It's more of an admission that pure-copper interconnects at these speeds have a visible expiry date.

Backward compatibility is still intact, as always. Your PCIe 4.0 card will work perfectly in a PCIe 7.0 slot.

Enterprise silicon is generally 12 to 18 months behind the spec, so it is a safe bet that AI accelerators and high-end network controllers will appear in data centers by 2027. Consumer motherboards and Gen 7 NVMe drives are a 2028 to 2029 story at the earliest. Probably later if we're being honest.

The obvious question is why any of this matters when most GPUs are not even saturating PCIe 4.0 right now. The answer is that your GPU isn't really the point anymore. The point is AI inference at scale, where raw compute is not the bottleneck. The bottleneck is how fast data moves between the CPU, accelerators, and storage. Unified memory and DRAM-less architectures make this worse, since everything goes over the interconnect all the time. When bandwidth is the ceiling, raising it matters.

Then again, you could say the same about every generational leap, and most people got by just fine.
