Internet services increasingly employ service-oriented architectures, where the retrieval of a single web page can require coordination and communication with many individual sub-services running on remote nodes. At present, data center networks are expensive due to the use of non-commodity switches at the highest levels of the hierarchy. Even with these costly switches, bandwidth can become a bottleneck if many servers in one section of the network try to communicate with servers in another section.
In this paper, the authors show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. The paper attempts to design a data center communication architecture that meets the following goals: scalable interconnection bandwidth, economies of scale through commodity Ethernet switches, and backward compatibility with hosts running Ethernet and IP.
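To make the "tens of thousands of elements" claim concrete, here is a minimal sizing sketch. It assumes the k-ary fat-tree layout the paper builds from identical k-port switches (k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches); this summary does not spell out the topology, so treat the layout as an assumption. The function name and dictionary keys are illustrative only.

```python
def fat_tree_sizes(k: int) -> dict:
    """Element counts for a k-ary fat-tree built from identical k-port switches.

    Assumed layout: k pods, each with k/2 edge switches and k/2 aggregation
    switches; each edge switch serves k/2 hosts; (k/2)^2 core switches.
    """
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    edge = k * half            # k pods x k/2 edge switches
    aggregation = k * half     # k pods x k/2 aggregation switches
    core = half * half         # (k/2)^2 core switches
    hosts = edge * half        # k/2 hosts per edge switch, i.e. k^3 / 4
    return {
        "hosts": hosts,
        "edge": edge,
        "aggregation": aggregation,
        "core": core,
        "switches": edge + aggregation + core,
    }


if __name__ == "__main__":
    for ports in (4, 24, 48):
        print(ports, fat_tree_sizes(ports))
```

Under these assumptions, 48-port switches reach 27,648 hosts using 2,880 switches, which is where a figure in the tens of thousands comes from.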
Today, the price differential between commodity and non-commodity switches provides a strong incentive to build large-scale communication networks from many small commodity switches rather than fewer, larger, and more expensive ones. (The table showing the cost decrease of GigE and 10 GigE illustrates the strength of the authors' argument well.) Commodity switches also use less power and give off less heat, which makes them even more compelling given the increasing energy and heat density of data centers. The authors make an important point: an approach based on commodity switches will be the only way to deliver full bandwidth for large clusters once 10 GigE switches become commodity at the edge.
Another strong motivation for using such an interconnected network topology is the fault tolerance provided by multiple paths. Such a network would be able to continue operating for some time without immediate repair, or indefinitely if sealed in a modular data center.
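The degree of path redundancy can be estimated with a short sketch. It again assumes a k-ary fat-tree, in which each core switch carries exactly one shortest path between any two hosts in different pods. The host address tuple `(pod, edge_switch, host_index)` and both function names are hypothetical, chosen only for this illustration.

```python
def equal_cost_paths(k: int, src: tuple, dst: tuple) -> int:
    """Number of shortest paths between two hosts in an assumed k-ary fat-tree.

    Hosts are addressed as (pod, edge_switch, host_index); this addressing is
    illustrative and not taken from the summary.
    """
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    if src[0] != dst[0]:
        return half * half   # different pods: one path per core switch
    if src[1] != dst[1]:
        return half          # same pod: one path per aggregation switch
    return 1                 # same edge switch: a single hop through it


def surviving_inter_pod_paths(k: int, failed_core_switches: int) -> int:
    """Inter-pod shortest paths left after some core switches fail."""
    total = (k // 2) ** 2
    if not 0 <= failed_core_switches <= total:
        raise ValueError("failed_core_switches must be between 0 and (k/2)^2")
    # Each failed core switch removes exactly one path per inter-pod host pair.
    return total - failed_core_switches


if __name__ == "__main__":
    print(equal_cost_paths(48, (0, 0, 0), (1, 0, 0)))
    print(surviving_inter_pod_paths(48, 10))
```

This counts only core-switch failures; edge and aggregation failures reduce connectivity differently and are not modeled here.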
The authors chose to ignore wiring costs in evaluating their strategy. However, wiring up such a system correctly is not trivial, since setup errors occur frequently in data centers. This actually provides an opportunity for switch manufacturers to sell prepackaged solutions, with the switches preloaded with the modified routing algorithms and the interconnections pre-wired.
In the future, bandwidth will increasingly become the scalability bottleneck in large-scale clusters. Existing solutions for addressing this bottleneck center around hierarchies of switches, with expensive, non-commodity switches at the top of the hierarchy. Larger numbers of commodity switches have the potential to displace high-end switches in data centers in the same way that clusters of commodity PCs have displaced high-end servers. While this specific approach may not be adopted, this paper and the DCell paper will hopefully inspire more research on the subject.