1. Technical Field
The present invention relates to network architectures, and more particularly, to software defined, scalable, hybrid packet/circuit switching architectures for data centers.
2. Description of the Related Art
Many data center applications are bandwidth-intensive, and as such, the data center network (DCN) is a limiting factor in their performance. For example, a virtual machine migration application in cloud computing (e.g., an Amazon Elastic Compute Cloud (EC2) application) requires a large amount of bandwidth for a significant duration, and MapReduce applications in a Hadoop system may generate one-to-many, many-to-one and all-to-all communication patterns among servers in the Map and Reduce phases. These different types of communication requirements impose challenges on the data center network, which generally ends up as a significant source of Capital Expenditure (CAPEX) and Operational Expenditure (OPEX) in overall data center construction.
Currently, several different network architectures are employed to handle the heterogeneous communication requirements within a data center. One method for constructing a large-scale DCN is the continuous “scaling up” of a hierarchical tree network, where the leaves of the tree (e.g., the top-of-rack (TOR) switches) remain low-cost commodity switches, while the higher tiers of the tree employ increasingly high-end switches. In addition to the high cost of the high-end electrical switches, their high line-rate features are generally enabled by high-speed serializer/deserializer (SerDes) circuits and parallel high-speed electrical connections. Such connections are limited by distance, Printed Circuit Board (PCB) layout, Input/Output (I/O) port densities, power dispatch, etc. Therefore, the continuous “scaling up” of high-end electrical switches is extremely difficult, if not impossible, from a technical point of view.
Another method currently employed is to “scale out” rather than “scale up” the DCN, that is, to use commodity switches to build a Fat-Tree network in order to increase network scalability. The Fat-Tree network is essentially a folded Clos network, which inherits both the benefits and drawbacks of the Clos network (e.g., an advantage is that the network can be built as a non-blocking switch that scales to a very large port count, and a drawback is that the number of small commodity switches required scales at the same pace as the number of servers the Fat-Tree can support). The advantages of the Fat-Tree network make a large-size DCN technically feasible, but its drawbacks still leave the cost of building and operating a relatively large DCN prohibitively high.
A system for packet switching in a network is provided, including: two or more hybrid packet/circuit switching network architectures configured to connect two or more core-level switches in the network architectures, the network architectures being controlled and managed using a centralized software defined network (SDN) control plane; an optical ring network configured to interconnect the two or more hybrid network architectures; one or more hybrid electrical/optical packet/circuit switches configured to perform switching and traffic aggregation; and one or more high-speed optical interfaces and one or more low-speed electrical/optical interfaces configured to transmit data.
A method for packet switching in a network is provided, including: connecting two or more core-level switches in the network architectures using two or more hybrid packet/circuit switching network architectures, the network architectures being controlled and managed using a centralized software defined network (SDN) control plane; interconnecting the two or more hybrid network architectures using an optical ring network; performing switching and traffic aggregation using one or more hybrid electrical/optical packet/circuit switches; and transmitting data using one or more high-speed optical interfaces and one or more low-speed electrical/optical interfaces.
These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:
In accordance with the present principles, systems and methods are provided for a hybrid electrical-optical switching network architecture that is low-cost, power-efficient, scalable to support large-scale data centers and/or multiple data centers, and that provides software defined network (SDN) control and virtualization functionalities. The system/method according to the present principles may include a hybrid electrical/optical data center network architecture which does not require a high port-count optical switching fabric in a centralized optical switching architecture in the DCN. Rather, it may employ small-scale optical switches and a Fat-Tree based network for the optical implementation of the hybrid network, and may keep a similar topology for both the electrical packet switching network and its optical counterpart. An optical ring based network may be established at the core layer of the Fat-Tree to extend all-optical reachability.
In one embodiment, identical mixed electrical/optical switches may be employed at all layers of the Fat-Tree network. There may be k Optical/Electrical (O/E) and Electrical/Optical (E/O) conversion ports attached to the electrical switching fabric, functioning similarly to the add/drop ports in a reconfigurable optical add/drop multiplexer (ROADM). These ports may be responsible for aggregating/de-aggregating the electrical traffic from servers, racks and even pods, and for converting it between the optical domain and the electrical domain. In one embodiment, the optical ring networks may connect the corresponding groups of core switches and provide hop-by-hop all-optical connectivity to extend the reach of the all-optical paths. The optical ring network is potentially a blocking network that provides a reconfigurable optical bandwidth resource pool.
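By way of illustration only, the following minimal Python sketch models such a mixed electrical/optical switch with k O/E-E/O conversion ports bridging the two fabrics; the class, field and method names are hypothetical and are not drawn from the embodiments described herein.

```python
# Illustrative sketch only; all names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class HybridSwitch:
    name: str
    electrical_ports: int       # ports on the electrical (e.g., crossbar) fabric
    optical_ports: int          # ports on the optical (e.g., MEMS) fabric
    k_conversion_ports: int     # O/E-E/O ports acting like ROADM add/drop ports
    cross_connects: dict = field(default_factory=dict)   # optical in-port -> out-port

    def add_drop(self, electrical_flows):
        """Aggregate electrical flows onto the k conversion ports (E/O direction)."""
        groups = [[] for _ in range(self.k_conversion_ports)]
        for i, flow in enumerate(electrical_flows):
            groups[i % self.k_conversion_ports].append(flow)   # simple round-robin packing
        return groups   # one entry per conversion port / wavelength

    def set_cross_connect(self, in_port, out_port):
        """Configure an all-optical pass-through, analogous to a ROADM express path."""
        self.cross_connects[in_port] = out_port
```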
In one embodiment, the Fat-Tree based hybrid switching network may be implemented at a smaller scale (e.g., a two-layer Fat-Tree), while the all-optical ring network above the core layer of the Fat-Tree may use a more complex topology (e.g., a 2D/3D Torus, Flattened Butterfly or mesh network). In one embodiment, so-called “super servers” may be employed to generate or aggregate large traffic volumes, and may be equipped with long-reach optics to reach beyond racks, pods, or even data centers with large bandwidth and server-to-server all-optical connectivity.
In one embodiment, one or more software defined network (SDN) controllers and orchestrators may be employed in the data center network to support the centralized control and the network resource virtualization functionalities. It is noted that network virtualization may include virtualization of both the packet granularity and the circuit granularity bandwidth resources offered by the hybrid electrical/optical data center network.
Embodiments described herein may be entirely hardware, entirely software, or may include both hardware and software elements. In a preferred embodiment, the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. A computer-usable or computer-readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be a magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. The medium may include a computer-readable storage medium such as a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, an optical disk, etc.
A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution. Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) may be coupled to the system either directly or through intervening I/O controllers.
Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
Referring now to
Switching and traffic aggregation may be performed using one or more hybrid electrical/optical packet/circuit switches in block 108, and the data may be transmitted using one or more high-speed optical interfaces and one or more low-speed electrical/optical interfaces in block 110. It is noted that the data may also be transmitted using high-speed optical interfaces alone, low-speed electrical/optical interfaces alone, low-speed electrical interfaces, or any combination thereof.
Referring now to
In one embodiment, servers may be organized in one or more racks 201. The number of servers that can be mounted in a rack may be determined by the size of the rack, the power management plan and the number of downward (e.g., switch-to-server) links that the top-of-rack (TOR) switch can support. As the computing power, storage and I/O speed of servers continuously evolve, the servers within data centers may gradually be upgraded. At an intermediate stage of this evolution, the servers in one rack may include a mix of both traditional servers 202 and super servers 203. The traditional servers may be equipped with low-speed network interface cards (NICs) that are connected to the TOR switch using either electrical cables or low-speed optical cables 204 (e.g., 1 Gigabit per second (Gb/s) Ethernet cables). The super servers may be equipped with high-speed NICs and may be connected to the TOR switch using high-speed optical fiber cables 205 (e.g., 10 Gb/s or 40 Gb/s wavelength division multiplexed (WDM) single mode fibers (SMF)). The super servers may also be equipped with one or more low-speed network interface cards which may be connected to the low-speed network interfaces in the TOR switch.
In one embodiment, the TOR switches 211 may include hybrid electrical/optical switching fabrics and multiple high-speed optical interfaces 214, as well as low-speed electrical/optical interfaces 216. The electrical switching fabric 217 may be implemented using, for example, the crossbar-based electrical switching technologies that are used in conventional commodity switches. The optical switching fabric 212 may be implemented using, for instance, Micro-Electro-Mechanical Systems (MEMS) based technologies as the switching matrix. Both switching fabrics 212, 217 may be controlled using, for example, existing distributed L1 control technologies or a software defined network (SDN) controller 241, 242, 243. However, the SDN-based control technology may enhance the network virtualization functionalities.
In one embodiment, without loss of generality, the system and method according to the present principles do not restrict the control plane technologies applied to the hybrid network architecture, but employment of an SDN-based control plane may be advantageous. The Optical/Electrical/Optical (O/E/O) interfaces connecting the electrical switching fabric 217 and the optical switching fabric 212 may simply be pluggable modules (e.g., Small Form-Factor Pluggable (SFP), SFP+, 10 Gigabit Small Form-Factor Pluggable (XFP), 100 Gigabit C Form-Factor Pluggable (CFP), etc.) or special-purpose O/E/O converters. The number of such O/E/O interfaces is not restricted, and it may be determined based on a variety of factors, including, for example, the budget, the fanout constraint, and the traffic aggregation/de-aggregation requirements.
The number of upward interfaces (e.g., connected to the aggregation layer 219), including both the electrical and optical interfaces, can be equal to or less than the number of downward interfaces (connected to the servers), depending on the oversubscription ratio in the Fat-Tree design. In one embodiment, the oversubscription ratio may be 1:1, and there may be an equal number of upward and downward interfaces.
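As a simple, non-limiting illustration, the split between downward and upward interfaces can be derived from the oversubscription ratio as in the following Python sketch; the port counts shown are assumed values.

```python
# Illustrative sketch: split a switch's ports between downward (server-facing) and
# upward (aggregation-facing) links for a given oversubscription ratio.
def port_split(total_ports, oversubscription=1.0):
    # oversubscription = downward bandwidth / upward bandwidth; 1.0 means non-blocking
    down = round(total_ports * oversubscription / (1 + oversubscription))
    return down, total_ports - down

print(port_split(48))        # (24, 24): 1:1 oversubscription, equal up/down interfaces
print(port_split(48, 2.0))   # (32, 16): 2:1 oversubscription
```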
In one embodiment, as in the traditional Fat-Tree based data center network, the aggregation switches interconnecting the TOR switches 211 in the second tier of the Fat-Tree follow the same low-cost design as the TOR switches. It is noted that the switches in all layers of the Fat-Tree network may follow the same design, as switches 212 and 221 show in
In one embodiment, using a modular data center design, one pod of clusters 210 may be the second hierarchy of server groups, which generally includes several rows and columns of racks. One pod may be considered a mini data center. The N/2 TOR switches and N/2 aggregation switches in one pod may be connected in a full-mesh topology using (N/2)² low-speed electrical/optical wires 204. The interconnection pattern is shown in
For example, in one embodiment, in Pod 1 210 in
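The full-mesh wiring within a pod can be enumerated as in the following Python sketch, given for illustration only; the switch naming convention is hypothetical.

```python
# Illustrative sketch of the full-mesh wiring inside one pod: each of the N/2 TOR
# switches connects to each of the N/2 aggregation switches, giving (N/2)**2 links.
def pod_full_mesh(N, pod_id=1):
    half = N // 2
    links = [(f"pod{pod_id}-tor{t}", f"pod{pod_id}-agg{a}")
             for t in range(half) for a in range(half)]
    assert len(links) == half ** 2
    return links

for link in pod_full_mesh(N=4):
    print(link)   # four links for N=4: (tor0, agg0), (tor0, agg1), (tor1, agg0), (tor1, agg1)
```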
In one embodiment, similarly to the conventional Fat-Tree network, the second-layer aggregation switches may be interconnected through the third-layer (core layer) switches 221. The core-layer switches 221 may follow the same design as the TOR switches 211 and the aggregation switches 212. If there is no fourth layer of switches in the Fat-Tree, the third-layer switches may use all of their electrical ports to interconnect the second-layer switches. Therefore, the electrical part of the network may follow the same topology as the conventional Fat-Tree. As for the all-optical part of the network, in the three-layer Fat-Tree, if N=M, then the optical network topology may be the same as the electrical network. If N<M, then the optical network can be constructed as a single (e.g., big) Fat-Tree, while the electrical network may be segmented into different (e.g., small) Fat-Trees. If N>M, as illustrated in
It is noted that the interconnection pattern drawn in
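The three cases above may be summarized, purely for illustration, by the following Python sketch, assuming N and M denote the numbers of electrical and optical ports per switch, respectively, as the surrounding discussion suggests.

```python
# Illustrative summary of how the all-optical core relates to the electrical Fat-Tree,
# assuming N electrical ports and M optical ports per switch (assumed meanings).
def optical_core_layout(N, M):
    if N == M:
        return "optical topology mirrors the electrical Fat-Tree"
    if N < M:
        return "one large optical Fat-Tree; electrical network segmented into smaller Fat-Trees"
    return "electrical Fat-Tree intact; core switches grouped into smaller optical Fat-Trees"

for n, m in [(8, 8), (4, 8), (8, 4)]:
    print(f"N={n}, M={m}: {optical_core_layout(n, m)}")
```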
In one embodiment, the present principles may employ software defined network (SDN) control plane technology to control, abstract and virtualize the underlying data plane resources in the hybrid packet/circuit switching network. In such cases, the switches may be SDN capable. The SDN controllers 241, 242, 243 may be responsible for controlling the hybrid switching fabrics at all layers in the Fat-Tree network. They may be responsible for monitoring switch status, computing routing paths, setting up/modifying/deleting the flow tables and/or optical cross-connections, etc. The SDN controllers may also be responsible for providing northbound interfaces to one or more SDN orchestrators 251 and/or other control/management plane applications.
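A hypothetical controller skeleton is sketched below in Python to illustrate these responsibilities; it is not an actual SDN controller API, and the path computation is a trivial placeholder.

```python
# Hypothetical skeleton illustrating the controller responsibilities described above;
# not an actual SDN controller API.
class HybridSdnController:
    def __init__(self, switch_names):
        self.switch_names = list(switch_names)
        self.flow_tables = {name: [] for name in self.switch_names}      # packet entries
        self.cross_connects = {name: [] for name in self.switch_names}   # optical circuits

    def compute_path(self, src, dst):
        # A real controller would run shortest-path / load-aware routing over the
        # Fat-Tree topology; this placeholder returns a direct two-hop path.
        return [src, dst]

    def setup(self, path, circuit=False):
        """Install flow-table entries (packet) or cross-connections (circuit) along a path."""
        for hop in path:
            table = self.cross_connects if circuit else self.flow_tables
            table[hop].append((path[0], path[-1]))

    def northbound_status(self):
        # Summary exposed to an SDN orchestrator coordinating multiple controllers.
        return {name: {"flows": len(self.flow_tables[name]),
                       "circuits": len(self.cross_connects[name])}
                for name in self.switch_names}
```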
The one or more SDN orchestrators 251 may be responsible for orchestrating different controllers within a data center or even between data centers. It is noted that although the present invention advocates the use of the SDN control plane technologies for the hybrid data center network, and a main focus of the present principles involves the hybrid packet/circuit switching network architecture, the present principles do not restrict the type of control plane technologies used in the hybrid data center network.
In one embodiment, at the core layer 218, the core switches 221, 222, 223 may be grouped into different small optical Fat-Trees (assuming N>M). In each optical Fat-Tree group, assuming there are in total j optical ports from all of the optical switching fabrics in that group to be connected to the optical ring layer, each of the j ports may be connected to a separate optical ring through one or more optical add/drop modules 235, 236, 237. For example, the ith optical port (1≤i≤j) of group 1 may be connected to the ith optical port of groups 2, 3, 4, etc. in the same optical ring, while the optical ports in the same group may not be connected to each other through the upper-layer optical rings. Therefore, there may be j optical rings above the core layer in the network architecture according to the present principles.
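The ring membership rule described above may be expressed, for illustration, as the following Python sketch: with j optical ports per core-switch group, the ith port of every group joins the same (ith) optical ring, yielding j rings in total; the group and port counts shown are assumed values.

```python
# Illustrative sketch of the ring membership rule: the i-th optical port of every
# core-switch group joins the i-th optical ring, so there are j rings in total and
# ports within the same group never share a ring.
def ring_membership(num_groups, j_ports):
    return {i: [(g, i) for g in range(num_groups)]   # (group, port) pairs on ring i
            for i in range(j_ports)}

for ring_id, members in ring_membership(num_groups=4, j_ports=3).items():
    print(f"ring {ring_id}: " + ", ".join(f"group{g}.port{p}" for g, p in members))
# ring 0: group0.port0, group1.port0, group2.port0, group3.port0, and so on
```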
It is noted that the one or more optical add/drop modules 235, 236, 237 in the hybrid data center network may have different designs, and
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
In one embodiment, a 2-layer hybrid Fat-Tree 701 may be connected directly to the optical network on top. To ensure connectivity and scalability, an all-optical network may be constructed in a more scalable fashion (e.g., in the topology of a Torus network (2D, 3D, 4D, etc.)). For simplicity of illustration, the network shown in
In one embodiment, in the case of a 2D Torus network, each optical 2-layer Fat-Tree 711, 712, 713 may be considered one communication unit (e.g., node) that may add/drop traffic to the Torus network. The PODs 701, 702, 703 may be equivalent to the POD 210 in
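For illustration only, the wrap-around adjacency of such a 2D Torus of Fat-Tree units can be computed as in the following Python sketch; the grid dimensions are assumed values.

```python
# Illustrative sketch of 2D Torus adjacency, where each 2-layer hybrid Fat-Tree unit
# is treated as a single node at coordinates (x, y) on a cols-by-rows grid.
def torus_neighbors(x, y, cols, rows):
    return [((x + 1) % cols, y), ((x - 1) % cols, y),   # east / west, wrapping around
            (x, (y + 1) % rows), (x, (y - 1) % rows)]   # north / south, wrapping around

# Example: in a 3x3 Torus of Fat-Tree units, node (0, 0) has four neighbors.
print(torus_neighbors(0, 0, cols=3, rows=3))
# [(1, 0), (2, 0), (0, 1), (0, 2)]
```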
In one embodiment, the optical links 725, 726, 728, 729 may be conventional WDM links that interconnect the neighboring optical cross-connect boxes. The SDN controllers 731, 732, 733 may be equivalent to the SDN controllers 241, 242, 243 in
It is noted that although the above network types and configurations are illustratively depicted according to the present principles, other network types and configurations are also contemplated, and may also be employed according to the present principles.
Referring now to
To simplify the discussion, we use only the example which takes advantage of the technology 600 in
It is noted that although the above configurations are illustratively depicted according to the present principles, other sorts of configurations are also contemplated, and may also be employed according to the present principles.
The foregoing is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. Additional information is provided in an appendix to the application entitled, “Additional Information”. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that those skilled in the art may implement various modifications without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention. Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.
This application claims priority to provisional application Ser. No. 61/920,592, filed on Dec. 24, 2013, incorporated herein by reference.