How backbone networks support AI infrastructure

This article was originally published by Data Center Knowledge.

AI workloads are rapidly reshaping infrastructure demands and design. Currently, AI model training is mostly concentrated in centralized campuses with access to affordable power, typically at sites farther from dense network infrastructure. At the same time, these workloads are driving growth in data center interconnection traffic, particularly as distributed architectures become more common and latency requirements tighten.

These demands will continue to evolve as the emphasis moves from training to inference. Distributed AI workloads, agentic systems and emerging neocloud platforms are increasing the need for resilient connectivity across core, edge and cloud environments. As inferencing moves closer to users, low-latency delivery between edge infrastructure and end users becomes even more critical. These shifts may play out similarly to earlier cloud adoption cycles, where workloads and data movement expanded outward from centralized hubs.

Each stage of AI adoption has distinct requirements for infrastructure design, traffic patterns and connectivity. As these shifts unfold, backbone networks are as vital to distributing AI-driven services as they were to earlier cloud build-outs. Let’s first explore how these factors converge to support the training phase, enabling companies to develop high-performance AI models.

AI training: Traversing global highways

Training involves massive compute workloads built on huge datasets, all processed through large GPU clusters within centralized AI data center campuses. These data centers are typically built in more remote areas with access to affordable power, as processing these workloads consumes immense energy.

While training centers require high-capacity interconnects to transfer data between campuses, they are generally less sensitive to factors like latency and availability than user-facing inference systems, because much of this processing happens within the data center.

This is where emerging neoscalers operate, building large training data centers and deploying huge GPU clusters. A high-capacity backbone is crucial here, with the network underlay serving as the foundational enabler of AI training.

Ultimately, global backbone networks act as the “highways” of AI infrastructure, moving massive training datasets between globally distributed data centers and feeding GPU clusters fast enough to keep large-scale training workloads running efficiently. Optical transport systems leveraging coherent pluggables are essential for scaling data center interconnection capacity for these purposes.
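As a rough illustration of why that backbone capacity matters, consider how long it takes to move a training dataset between campuses at different line rates. The sketch below uses illustrative assumptions (a hypothetical 500 TB dataset, 80% link utilization and common optical line rates), not figures from any specific network or deployment.

```python
# Back-of-envelope sketch: time to move a training dataset between two
# data center campuses over a backbone link. All inputs are illustrative
# assumptions, not measurements from any specific network.

def transfer_time_hours(dataset_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimate transfer time in hours for a dataset of `dataset_tb` terabytes
    over a link of `link_gbps` gigabits per second at a given utilization."""
    dataset_bits = dataset_tb * 1e12 * 8            # terabytes -> bits
    effective_rate = link_gbps * 1e9 * utilization  # usable bits per second
    return dataset_bits / effective_rate / 3600

if __name__ == "__main__":
    dataset_tb = 500  # hypothetical multi-hundred-terabyte training dataset
    for rate in (100, 400, 800):  # common optical line rates in Gb/s
        print(f"{dataset_tb} TB over {rate} Gb/s: "
              f"{transfer_time_hours(dataset_tb, rate):.1f} hours")
```

On these assumptions, even a 400 Gb/s link is occupied for several hours by a single dataset transfer, which is why operators keep scaling interconnection capacity with coherent optics.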

But once these models are trained up, what role will operators play in inferencing at the edge?

AI inference: Cruising down country roads

AI inference refers to utilizing trained AI models to generate and distribute responses to end users, with examples including enterprise AI agents, AI copilots and chatbots. Instead of running in one giant data center, the model is replicated and distributed across regional and edge infrastructure. This requires low latency, high availability and redundancy to ensure reliable operations and minimize outages.

Inference must take place closer to end users to ensure uninterrupted, real-time functionality. As a result, inferencing requires access to dense, reliable connectivity near population centers, not just affordable power. This balance is a growing challenge for many data center operators.
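To see why proximity matters here, a useful rule of thumb is that light in optical fiber covers roughly 200 km per millisecond, so every 100 km of fiber path adds about 1 ms of round-trip delay before any switching, queuing or processing time. The sketch below applies that rule to a few illustrative distances and a hypothetical latency budget; none of the figures describe a specific deployment.

```python
# Rough sketch: fiber propagation delay alone, ignoring routing, queuing and
# processing. Distances and the latency budget are illustrative assumptions.

SPEED_IN_FIBER_KM_PER_MS = 200  # ~2/3 of the speed of light; a common rule of thumb

def round_trip_ms(fiber_km: float) -> float:
    """Round-trip propagation delay in milliseconds for a given fiber path length."""
    return 2 * fiber_km / SPEED_IN_FIBER_KM_PER_MS

if __name__ == "__main__":
    latency_budget_ms = 20  # hypothetical budget for an interactive AI response
    for label, km in [("metro edge site", 50),
                      ("regional data center", 500),
                      ("remote training campus", 3000)]:
        rtt = round_trip_ms(km)
        verdict = "within" if rtt <= latency_budget_ms else "exceeds"
        print(f"{label} ({km} km): {rtt:.1f} ms round trip, {verdict} {latency_budget_ms} ms budget")
```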

These factors will likely push AI infrastructure closer to the edge, similar to how cloud services developed in the past. In this phase, companies rely on small “country roads” to distribute responses to end users who are living “out in the sticks” from a networking perspective. So, where does the backbone fit here?

Instead of moving huge datasets to (or between) a few compute locations, backbone connectivity distributes trained models and links the regional infrastructure that supports inference workloads. Backbone networks allow operators to move trained models from centralized training clusters to regional data centers, where they can be replicated across cloud and edge locations to serve users locally.

This functionality relies on backbone connectivity tying smaller data centers and edge locations together to function as a more coordinated system.
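As a minimal sketch of that distribution step, the snippet below replicates a trained model artifact from a central site to several regional locations and verifies each copy. The site names and local directories are illustrative stand-ins; a real deployment would push artifacts over the backbone to remote storage and orchestration systems rather than to local folders.

```python
# Minimal sketch: replicate a trained model artifact from a central training
# site to several regional locations and verify each copy. Names and paths
# are illustrative stand-ins, not any operator's actual systems.

import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path: Path) -> str:
    """Checksum used to confirm each regional copy matches the original."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def replicate(model: Path, regions: list[Path]) -> None:
    """Copy the artifact to each regional location and verify integrity."""
    reference = sha256(model)
    for region in regions:
        region.mkdir(parents=True, exist_ok=True)
        copy = region / model.name
        shutil.copy2(model, copy)
        status = "ok" if sha256(copy) == reference else "MISMATCH"
        print(f"{region.name}: {status}")

if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    model = root / "central" / "model-v1.bin"
    model.parent.mkdir(parents=True)
    model.write_bytes(b"trained model weights placeholder")  # stand-in artifact
    replicate(model, [root / "region-eu", root / "region-us", root / "region-apac"])
```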

Differing traffic patterns, differing network designs

With these differing use cases, operators must now contend with two distinct traffic patterns, each with its own optimal network design. While operators were accustomed to designing networks for centralized cloud workloads, they must now accommodate both training and inference traffic patterns.

Training traffic involves immense datasets transferred in large bursts between data centers; it requires massive bandwidth but is less sensitive to downtime.

Inference typically involves smaller traffic flows that rely more on low-latency, redundant connectivity. These evolutions mirror the progression of cloud infrastructure. When cloud computing came onto the scene, workloads were hosted in hyperscale data centers. Content distribution networks (CDNs) eventually emerged, and services were moved closer to end users. AI may follow a similar path, shifting from centralized training clusters to AI agents and inference services at the edge.

These differing traffic patterns and demands also cascade down to enterprises. Unlike traditional enterprise applications, AI environments generate immense east-west traffic volumes that depend on low-latency transport between distributed compute resources.

As AI training and inference environments operate across multiple regions, reliable interconnection and resilience become even more vital. Many enterprises are also adopting hybrid architectures that combine centralized training environments with distributed inference platforms, creating new demands for resilient backbone connectivity and improved network visibility.

The road ahead: How connectivity infrastructure drives AI

If the years have taught us anything, it’s that technological innovation is an unpredictable ride. However AI plays out, high-capacity backbone connectivity remains the underlying enabler of each use case. Whether it’s training datasets riding along AI superhighways, or inferencing responses cruising down country roads, many of the connectivity requirements are familiar.

Operators that enhance reliability, redundancy, capacity and reach will position themselves to support the needs of neoscalers, enterprises, wholesale service providers and others as we speed headfirst into the AI era.

Mattias Fridström, Chief Evangelist