Arelion Blog

Rethinking Internet Backbone Architectures

During the past couple of years, we’ve made several public ad-hoc announcements on technology transitions that keep us busy building and operating the world’s #1 Internet Backbone. This has been done without ever providing a context or connection to the trends believed to be fueling the industry going forward and the path we’ve taken to reinvent ourselves in a highly competitive and ever-changing market.

The purpose of this initial post is to provide an overview and introduction to the extensive work started in early 2018. To stay in that period briefly, this was a time during which we as a company were doing relatively well by a host of key metrics – including NPS and rising to the very top of the DYN rankings as the world’s #1 Internet Backbone. In addition, the increases in edge traffic offset the wider and aggressive market price erosion for services like IP Transit – a market declared dead for the better part of the last decade. In other words, and although these are far from perfect metrics, there were signs of us doing something right.

It wasn’t until we closely analyzed the impact of scaling requirements for the next few years that it became clear that conventional architectures built on yesterday’s truths, yesterday’s technology strategy, and decades of accumulated complexity would fall short in helping us keep up with demands.

One of my favorite and much over-used analogies for this is a quote that then-Nokia CEO Stephen Elop made amid the acquisition by Microsoft back in 2013:

“We didn’t do anything wrong, but somehow, we lost”


This serves as the perfect lesson: even if you don’t necessarily do anything wrong, if you fail to catch (and exploit) the trends of tomorrow, someone else will – and you still lose.

In a highly commoditized wholesale networking market especially, we’re more than used to having to differentiate on caring, transparency, great people, and software. Although software, for all the right reasons, consumes the vast majority of development cycles, this is a time during which hardware matters more than ever in enabling disruptive architectural shifts through all layers of the network to keep up with customer demands for more and consistent bandwidth and a high-quality experience. This holds irrespective of whether one is building networks to accommodate the insatiable bandwidth requirements of 5G or building the world’s #1 Internet backbone.

Because this transition requires a tremendous amount of discipline and long-term conviction, we set out on a very simple (but not easy) transition path revolving around three key areas illustrated in Figure 1:

Figure 1 – Network Transformation Path
  1. Radical simplification to make use of routing silicon with very different design trade-offs and implications on operations and forward-looking performance evolutions
  2. Partially disaggregate Optical Networks to drive vendor competition and standardize alien wavelengths as the default deployment paradigm – at an acceptable OPEX overhead
  3. Convergence of IP and Optical, starting with short-range, point-to-point, deployments covering the full range of operational, organizational, cultural, and technology scope

1. An Evolving Routing Silicon Landscape

Overly simplified and for lack of a better word, traditional network processor units (NPUs) built for high-touch Service Provider deployments usually share the characteristics of ultra-deep buffering, flexible and re-programmable stages, granular statistics, and large memories for general logical scaling – preferably all at the same time. This is achieved by massively sacrificing attributes such as bandwidth and – in turn – cost and power efficiency. On average, the total performance (as defined by bandwidth) of NPUs in this class has increased by just roughly 300% since January 2010, and they still provide less than 1 Tbps of duplex throughput. In parallel, we’ve also seen the emergence of chipsets that strike a far better balance to meet bandwidth scaling demands while still maintaining the characteristics of deep (partial) buffering, Internet-scale FIB, and just enough general logical scaling and flexibility to serve a wide array of use-cases.

One merchant silicon example is the Strata DNX (aka ‘Jericho’) series from Broadcom, originally from the acquisition of Dune Networks. Looking back over the same period mentioned above and as visualized in Figure 2, the cumulative performance increase for this chipset family, which is optimized for power and bandwidth efficiency, has been 5900%. With Jericho2c+ just around the corner, the total goes up to 8900% at 7.2 Tbps per ASIC.
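To make these percentages concrete, here is a minimal Python sketch of the arithmetic. It assumes that an increase of N% means the end value is (1 + N/100) times the starting value, and that both DNX figures are measured against the same baseline. The implied baseline and the throughput matching the 5900% figure are back-calculated under those assumptions; they are not numbers stated in this post.

```python
# Back-of-the-envelope check of the cumulative growth figures quoted above.
# Assumption: an increase of N% means end = start * (1 + N / 100), and both
# DNX percentages are measured against the same baseline.

def growth_multiple(increase_pct: float) -> float:
    """Convert a cumulative percentage increase into a growth multiple."""
    return 1.0 + increase_pct / 100.0

# Figures taken from the text.
NPU_INCREASE_PCT = 300.0
DNX_INCREASE_PCT = 5900.0
DNX_NEXT_INCREASE_PCT = 8900.0
DNX_NEXT_TBPS = 7.2

npu_multiple = growth_multiple(NPU_INCREASE_PCT)            # 4x
dnx_multiple = growth_multiple(DNX_INCREASE_PCT)            # 60x
dnx_next_multiple = growth_multiple(DNX_NEXT_INCREASE_PCT)  # 90x

# Derived values (assumptions, not stated in the post): the implied baseline
# and the throughput that corresponds to the 5900% figure.
implied_baseline_gbps = DNX_NEXT_TBPS * 1000.0 / dnx_next_multiple
implied_current_tbps = implied_baseline_gbps * dnx_multiple / 1000.0

print(f"High-touch NPU growth multiple: {npu_multiple:.0f}x")
print(f"DNX growth multiple:            {dnx_multiple:.0f}x")
print(f"Implied baseline:               {implied_baseline_gbps:.0f} Gbps")
print(f"Implied throughput at 5900%:    {implied_current_tbps:.1f} Tbps")
```

Under these assumptions, the bandwidth-optimized family has grown roughly 15 times more than the high-touch class over the same period (60x versus 4x).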

Figure 2 – Routing Silicon Evolution

While it should not be forgotten that high-touch NPUs excel in a certain and ever-decreasing number of roles, the emergence of chipsets with characteristics similar to those of DNX drives fundamentally different architectural, operational, accounting, and process incentives. From what can be visually derived in Figure 2 when over-simplifying a more complex topic involving electrical lane speeds, transistor sizes, and other noise-creating factors:

As a result of being on a Moore’s Law-like trajectory where 2Y = 2X (performance doubles every two years), the economically viable lifetime of equipment based on cost per incremental bit is becoming significantly shorter. If that period used to be 4-6Y, spending 1Y in sourcing and validation wasn’t all that bad, all things considered. However, if the same period has shrunk to just 2Y or less, all else being equal, 50% of the lifetime is wasted before deployment even starts. With legacy methodologies, this of course doesn’t work. The way we approached this was by looking carefully at three distinct areas:

This fundamental shift also amplifies the IP/MPLS architectural trends seen over the past few years:

If 2020, with more than 50% edge traffic growth (albeit for all the wrong reasons), has taught us anything, it is that the ability to build capacity consistently and swiftly is a key differentiator. The recent COVID-driven growth wave has certainly helped to catapult this shift in architecture, building blocks, and 400GE deeper and wider into AS1299. Thus, we can now pre-build capacity in far more places, which in turn enables fewer truck rolls, fewer reactive network build-outs, and overall quicker commissioning of new circuits.

I believe one of the key ways of driving long-term conviction, discipline, and stamina through trying times of major transitions is finding meaningful metrics to measure progress. One such metric used internally is looking at the chipset family distribution over time in AS1299. As evident in Figure 3, more than 70% of the active/in-use capacity is now run on chipsets optimized for bandwidth and power efficiency.

Figure 3 – Chipset Family Distribution

The dedicated migration teams will accelerate the efforts throughout 2021. The results are also very much visible in the line overlaying Figure 3, showing the three-month moving average of relative spending on traditional high-touch NPUs and system components. By moving around, consolidating, and retiring those systems and components, a critical mass was achieved towards the end of 2019. With the exception of a COVID-driven explosion in capacity demands, this has enabled us to make no new investments in those technologies. That same explosion has also accelerated the transition from 25G to 50G SerDes as indicated in Figure 4. For the sake of clarity, it should be mentioned that while the previous chart showed relative bandwidth in terms of active ports, Figure 4 denotes capacity installed (but not necessarily in use) normalized to 100G ports:

Figure 4 – Committed Builds on target ASICs

Out of the roughly 20,000 x 100G-normalized ports installed on these ASICs, almost 3,500 are 400GE-capable and widely distributed amongst tier-1 North America and Europe markets. Thanks to their efficiency in providing up to 10.8 Tbps of forwarding per chip with on-package high-bandwidth memories, the number of components and footprint required to produce the same capacity is greatly reduced. Although skewed by growth and higher speed efficiencies, these systems, as shown in Figure 3, host more than 70% of the active capacity – but only consume around 30% of the total IP/MPLS systems power. In a time when OPEX (power and cooling specifically) is becoming the key constraining factor to new deployments and an ever-increasing part of the total network cost, the importance of this shift cannot be stressed enough when looking at the impact over the next couple of years.
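As a rough illustration of what these capacity and power shares imply, the sketch below compares capacity delivered per unit of power between the two groups of systems. It treats "more than 70%" and "around 30%" as exactly 0.70 and 0.30, and assumes the remaining capacity and power belong entirely to the traditional systems. Both are simplifying assumptions, so the result is indicative only.

```python
def relative_capacity_per_watt(capacity_share: float, power_share: float) -> float:
    """Ratio of capacity-per-power between one group of systems and the rest.

    capacity_share: fraction of total active capacity hosted by the group.
    power_share:    fraction of total system power consumed by the group.
    """
    if not (0.0 < capacity_share < 1.0 and 0.0 < power_share < 1.0):
        raise ValueError("shares must be strictly between 0 and 1")
    group_efficiency = capacity_share / power_share
    rest_efficiency = (1.0 - capacity_share) / (1.0 - power_share)
    return group_efficiency / rest_efficiency


# Shares from the text, rounded to exact values (an assumption).
ratio = relative_capacity_per_watt(capacity_share=0.70, power_share=0.30)
print(f"Capacity per unit of power vs. remaining systems: {ratio:.1f}x")
```

With these inputs, the bandwidth-optimized systems deliver a bit more than five times the capacity per unit of power compared with the rest of the IP/MPLS estate.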

The other notable effect is the acceleration of the long-foreseen trend shift of relative spending towards the optical layer, thus driving very different architectural and strategy incentives that will be discussed later in this post. While there are technology, component (material), volume, and complexity reasons why this trend is poised to continue over time, a significant part can also be attributed to the historical lack of dynamic competition in the optical networking space. Therefore, it’s only natural that addressing this issue is a major focus area for most network operators.

2. Partially Disaggregated Optical Networks

Disaggregation is a massive topic with a plethora of different definitions and meanings, and is now relevant across all layers of networks. Ultimately, we focus on the technological, functional, commercial, and accounting split of parts with very different innovation cycles and/or general life cycles.

Over the past few years, I believe we’ve come a long way, towards a much healthier discussion in the wider industry that goes way beyond the initial “Because company x does y, it has got to make sense for us as well” narrative, which never considered other input factors and drivers. This naturally hasn’t been helped by the marketing debacle from vendors, integrators, and analysts (paid by the former). Taking the router disaggregation discussion, which is a few years ahead, as an example, disaggregation-driven enablers and benefits are still often mixed with completely unrelated (yet equally important) factors such as:

… none of which has much to do with the functional and/or component disaggregation of closed and vertically integrated systems including, for example, NOS, forwarding hardware, and optics. While there are certainly different flavors and levels of extremes, the focus needs to be on what actual problems are being solved after adjusting for any associated complexity and overhead – especially when having to maintain, develop and test across multiple generations of technologies. It should also be pointed out that while the most predominant driver is seemingly bottom-up cost competition, there’s also an element of control when components are consumed in massive volumes and/or with specific capability requirements. As the cost delta between the models shrinks, and it does thanks to continuous and ruthless competition, the risk/reward modeling and subsequent analysis need to follow accordingly.

For us and in the context of optical networks, the right balance is struck at what is commonly referred to as partial disaggregation. In this model, transponders are split from optics and from line-systems. As such, the static, mature and highly commoditized line-systems are open in the sense that they transparently serve any alien wavelength while being:

In contrast, healthy and dynamic competition is brought to the much faster evolving and costly transponders with bottom-up pricing and best-of-breed selections. A deployment model further (or fully) disaggregating the optical layer adds very little net value when weighed against the added burden on multi-vendor integration and interoperability in software and APIs, operational demarcation points, and other tools.
This topic is fully deserving of its own in-detail post and although the implementation is far from perfect, we are reaping the benefits of:

Public Press Resources:
Orange, Telia Initiate TransportPCE Project in OpenDaylight
Ciena Lands Open Optical Line System Deal with Telia Carrier
Telia Carrier, Coriant and Facebook collaborate on successful trial of Voyager
European Industry Consortium Successfully Demonstrates SDN-based Reach Planning in a Multi-Vendor Optical Network Field Trial

3. DWDM Pluggables and Standardization

IP over DWDM has been around for more than a decade. As a concept, it revolves around placing coherent pluggable optics directly into routers. The value propositions of collapsing an entire layer include the elimination of back-to-back grey optics, lower power consumption and unification of monitoring, control planes and management. However, and although architecturally making all the sense in the world, it never really took off outside of a few niche deployment scenarios. This raises the question: what’s different this time around?

Two major differences stem from the previous sections:

  1. The evolution of routing silicon has led to multi-Tbps ASICs and exponential, Moore’s Law-like, performance increases in throughput and power efficiency. This performance trajectory now exceeds the needs of most networks, reduces the overall cost structures, and shifts the relative spending even more to optical networks – as evident in Figure 5.
  2. The partial disaggregation of optical networks with a uniform approach to deploying alien wavelengths over open optical line-systems; thus mitigating many of the traditional challenges in operations, vendor competition, culture, process, and technology

Figure 5 – Avg. CAPEX distribution per 100G Circuit

As recently as last year, deploying IPoDWDM incurred a huge density penalty on switch-router blades. Because the application revolved around the CFP2 form-factor, the other natural side effect was a need for dedicated single-purpose cages/line cards. Through standardization work by the OIF and other MSAs, this all changed with the introduction of 400G-ZR(+), which can be realized using industry-standard QSFP-DD modules. This, for the very first time, enables true mix-and-matching of all client applications on the same switch-router building block at full face-plate density and in turn:

Perhaps even more important, as the relative spending shifts heavily towards transponders, is a standards-based DWDM layer without bookending. By being pluggable and interoperable, these modules make just-in-time investments possible without front-heavy discrete transponder chassis/card costs. This also enables continuous and healthy vendor competition at per-module granularity, making it the most powerful pay-as-you-grow schema ever seen in this space. In addition, these modules can subsequently be reused in any 400GE-capable host, offering unparalleled investment protection.

The latter is yet another testament as to why conventional sourcing methodologies fall short. In that model, multi-year investments and the associated lock-in are based on pricing given at a single point in time. Throughout the lifetime and fill-up of transponder systems, the disconnect from bottom-up supplier COGS usually grows, without any credible leverage to exercise competition. This has always been true for modular systems in general, as they truly are (as mentioned in Section 1 for routers) the ultimate lock-in for vendors because of the proprietary nature of the components loaded into them. With standards-based DWDM optics being by far the most expensive components, competition can be exercised at any given point without much operational or administrative overhead. It also serves as cheap insurance against some cases of supply chain constraints.

The other often forgotten aspect and benefit of IPoDWDM concerns availability schemas. As previously mentioned, traditional transponder chassis often become front-heavy in terms of cost unless a high initial fill-rate is assumed. This is partially driven by discrete systems each coming with their own additional space, power and cost components including:

As a result, you’ve effectively deployed another whole set of hardware with its own intra-system redundancies, software, support, and life-cycle management overhead. Limiting the blast radius of any single failure to, for example, one fiber direction in many cases necessitates deploying more (surplus) components, which amplifies the negative ROI effects even further.

This is by no means to say that high-performance discrete transponders will go away, at least not in challenging fiber-scarce (ultra) long-haul and subsea applications where spectral efficiency will continue to be key for network operators needing to squeeze out every last bit. However, as a consequence of forever changed cost structures, overall topological and architectural principles have to evolve accordingly. With the existing and as-is topology of AS1299, almost 80% of the total circuit bandwidth is between router pairs less than 1,000 km apart.
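A metric like this one can be derived from a circuit inventory. The following is a minimal sketch of one way to compute it. The `Circuit` type, its field names, and the sample values are all hypothetical and chosen purely for illustration; they are not AS1299 data.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Circuit:
    """Hypothetical inventory record for a circuit between two routers."""
    distance_km: float
    bandwidth_gbps: float


def bandwidth_share_within(circuits: Iterable[Circuit], max_km: float) -> float:
    """Fraction of total circuit bandwidth on circuits shorter than max_km."""
    circuits = list(circuits)
    total = sum(c.bandwidth_gbps for c in circuits)
    if total == 0:
        raise ValueError("inventory carries no bandwidth")
    near = sum(c.bandwidth_gbps for c in circuits if c.distance_km < max_km)
    return near / total


# Made-up sample inventory, for illustration only.
sample = [
    Circuit(distance_km=80, bandwidth_gbps=1600),
    Circuit(distance_km=450, bandwidth_gbps=1000),
    Circuit(distance_km=900, bandwidth_gbps=400),
    Circuit(distance_km=2400, bandwidth_gbps=1000),
]

share = bandwidth_share_within(sample, max_km=1000)
print(f"Bandwidth on circuits under 1,000 km: {share:.0%}")
```

Weighting by bandwidth rather than counting circuits is the relevant choice here, since a few short, very large circuits can dominate the spending picture.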

It will also be interesting to see the subsequent spending effects any IPoDWDM adoption may have on conventional DWDM technologies amongst network operators in general, as its potential is not limited to cannibalizing new deployments. As with the case of high-touch NPUs in routers discussed in the first section of this post, there is probably a similar case to be made for replacing already (and recently) deployed transponder shelves and modem cards with pluggable coherent optics placed directly into switch-routers. The ripple effects of such a scenario for traditional DWDM components would then be, just as with high-touch NPU-based routers:

  1. A stock is built as migrations are made, thus reducing or even eliminating the need for new investments in those technologies
  2. Redeployment where their premium performance adds value, which is arguably also a shrinking number of scenarios. At the very least, using the freed-up modem cards to consolidate and fill up already deployed chassis with slot capacity elsewhere
  3. By converting to IPoDWDM, each A-Z circuit also releases four (2x back-to-back) grey optics

Telia Carrier was one of the first operators to validate Acacia’s 400G-ZR and 400G-ZR+ across multiple switch-router vendors and on top of an open, third-party, optical line-system. With the imminent delivery of the first batch of 400G-ZR modules for production, we’re very excited to start the deployment.

Public Press Resources:
Telia Carrier Embraces Coherent Pluggables using Acacia’s Open-ZR+ Modules

3.1 Pluggable Amplifiers for DCI Line-Systems

On a related topic and using the same rationale, we are working with industry partners on standardizing a simple and pluggable point-to-point line-system, with the EDFAs integrated into QSFP. This would be optimized for around 100 km and 8-16 channels of 400G-ZR wavelengths, for which most existing solutions are over-capable in the context of IPoDWDM applications. The short target distances also have a strong positive correlation to scenarios in which fiber is plentiful, capacity requirements are limited, and thus spectral efficiency is less of a concern. Traditional line-systems for this application come with all the caveats mentioned earlier in Section 3 about discrete transponders, including surplus components (CPUs, power feeds, fans/cooling), intra-device redundancy, management interfaces and software, monitoring, and front-heavy deployments from a CAPEX perspective. Because these pluggable EDFAs will be implemented in a QSFP-DD, they will also be backward compatible with QSFP28 and as such (as opposed to existing solutions):

In addition, it provides a much-needed reset of traditional vendor commercial models, fosters ease-of-adoption of 400G-ZR and is very straightforward operationally.
Although embedding EDFAs into pluggable optics for booster and pre-amplification functionality is nothing new, it’s becoming much more relevant when there’s finally a high-volume use-case for IPoDWDM in many SPs’ access and aggregation domains. This is especially true considering that the reach of direct-detect technologies becomes ever shorter as data rates go up beyond 400G. It can potentially also address inter-working with brownfield deployments when 400G-ZR wavelengths launching at -10 dBm have to co-exist with traditional ones. Consider Figure 5 and an illustrative subset of <120 km circuits and their respective capacity in Mbps:

Figure 5 – Distance to Capacity relationship in sample metro

In this subset, there is no circuit larger than 1.8 Tbps. Assuming a 16-channel system of 400G-ZR yielding a total of 6.4 Tbps, the spectrum utilization would be just 28%.
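The utilization figure follows directly from the channel count and per-channel rate. Here is a small sketch of that calculation; the function name and parameters are illustrative, and the 8-channel case is an extrapolation based on the 8-16 channel range mentioned earlier rather than a number from the sample data.

```python
def spectrum_utilization(circuit_tbps: float, channels: int,
                         channel_gbps: float = 400.0) -> float:
    """Fraction of a line-system's total capacity used by a single circuit."""
    if channels <= 0 or channel_gbps <= 0:
        raise ValueError("channels and channel_gbps must be positive")
    system_tbps = channels * channel_gbps / 1000.0
    return circuit_tbps / system_tbps


# Largest circuit in the subset (1.8 Tbps) on a 16-channel 400G-ZR system.
util_16 = spectrum_utilization(circuit_tbps=1.8, channels=16)
print(f"16 channels: {util_16:.0%}")  # prints "16 channels: 28%"

# Same circuit on a hypothetical 8-channel system (extrapolation).
util_8 = spectrum_utilization(circuit_tbps=1.8, channels=8)
print(f" 8 channels: {util_8:.0%}")   # prints " 8 channels: 56%"
```

Even the smaller channel count leaves a lot of headroom for the largest circuit in this subset, which is why spectral efficiency matters less for these deployments.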
The other aspect, when taking into account the radical cost structure changes mentioned earlier, is that for these (relatively) low-capacity and short-distance deployments primarily for access and aggregation, a traditional OLS (albeit a simple point-to-point one) may now very well represent the most significant part of CAPEX (and OPEX, respectively) for new deployments. Having a uniform solution to address scenarios for brownfield inter-working, distances >40 km and/or muxing/demuxing of multiple 400G-ZR then becomes very powerful and presents the very first potential high-volume deployment for (Q)SFP pluggable EDFAs across network operators worldwide. This is especially so for scenarios within the teal dotted box of Figure 7, which have previously had a relatively high deployment cost.

4. Conclusions

Relying on yesterday’s truths, cost structures, routing silicon, operational paradigms, tools and two decades of accumulated complexity falls short in meeting insatiable bandwidth requirements consistently and at a sustainable power consumption. As the interval between meaningful performance increases is significantly reduced, vastly different incentives emerge:

Open and partially disaggregated optical networks provide a framework for commissioning and operating alien wavelengths uniformly as the default deployment paradigm.
Standards-based 400G DWDM pluggable optics put directly into high-scale routing systems bring much-needed fluency to vendor competition in addition to redefining network economics, operational simplicity, and architectures.
For many network operators, this requires breaking down the cultural, process, planning, and organizational barriers that have kept the IP and Optical domains apart and as such often lead to sub-optimal decisions. With the emergence of disruptive technology innovations discussed in this post, there have never been stronger incentives to finally get it right.

Rest assured, someone else will.

Johan Gustawsson,
Head of Network Engineering & Architecture

