Computer networks can have a large number of servers or other types of computing devices interconnected with one another by routers, switches, bridges, firewalls, or other network nodes via wired or wireless network links. The network nodes can enable communications among the computing devices by exchanging messages via the network links in accordance with one or more network protocols.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Computer networks in datacenters can include multiple interconnected switches, routers, and other network nodes organized into a hierarchy, a mesh, or other suitable arrangements. For example, in one implementation, a single enclosure (e.g., a rack) can house multiple servers that are coupled to a single switch associated with the enclosure. Such a switch is sometimes referred to as “top-of-rack” or “TOR” switch. Multiple TOR switches can then be connected to one or more Tier 1 or “T1” switches, each of which can in turn be connected to one or more Tier 2 or “T2” switches.
Typically, redundancy of T1, T2, or other upper-level switches can be readily provided, for example, by adding one or more extra switches. In contrast, providing redundancy for TOR switches can be challenging due to added costs and operating complexity. For instance, one solution includes installing two TOR switches for each enclosure housing multiple computing devices and provisioning two network interface controllers (“NICs”) in each of the computing devices. However, such an arrangement can easily double the capital investments associated with the TOR switches. Also, the dual TOR switches may confuse the computing devices during operation because both TOR switches may be operating at the same time. As such, the computing devices can be more prone to communications failures when dual NICs communicate with dual TOR switches than when each computing device uses a single NIC.
Several embodiments of the disclosed technology can provide efficient and cost effective TOR switch redundancy by implementing optical switching between multiple primary TOR switches and one or more standby TOR switches. In one implementation, computing devices in an enclosure can be individually coupled to an optical multiplexer via fiber optic cables. A primary optical switch can then couple the optical multiplexer to a primary TOR switch. The primary optical switch can switch the computing devices from being connected to the primary TOR switch to a standby optical switch when the primary TOR switch encounters abnormal operation conditions. In turn, the standby optical switch can couple the primary optical switch to a standby TOR switch operating in place of the primary TOR switch.
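The failover topology described above can be sketched as a simple path model. The path names and the `active_path` helper below are illustrative assumptions for exposition, not elements of the disclosure:

```python
# Hypothetical path model: an enclosure's computing devices reach a TOR
# switch through an optical multiplexer and a primary optical switch.
PRIMARY_PATH = ["computing devices", "optical multiplexer",
                "primary optical switch", "primary TOR switch"]

# On abnormal operation of the primary TOR switch, the primary optical
# switch redirects traffic through the standby optical switch instead.
STANDBY_PATH = ["computing devices", "optical multiplexer",
                "primary optical switch", "standby optical switch",
                "standby TOR switch"]

def active_path(primary_tor_healthy):
    """Select which optical path carries an enclosure's traffic."""
    return PRIMARY_PATH if primary_tor_healthy else STANDBY_PATH
```

Note that the first three hops are shared: failover only re-routes the segment downstream of the primary optical switch.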
The standby TOR switch can be generally similar to the primary TOR switch in structure and function. As such, a single standby TOR switch can provide redundancy for two, four, eight, sixteen, thirty-two, or any other suitable number of primary TOR switches. Thus, capital investments for providing redundancy to the primary TOR switches can be much lower than using dual TOR switches for each enclosure. Several embodiments of the disclosed redundancy scheme can also be more efficient than using dual TOR switches per enclosure because switching an optical switch is a simple operation. Optical switches can be switched more reliably than traffic can be shifted between a pair of active TOR switches. Thus, communications reliability of computer networks in datacenters can be improved.
Certain embodiments of systems, devices, components, modules, routines, and processes for managing backup capability of primary network nodes in a computer network are described below. In the following description, specific details of components are included to provide a thorough understanding of certain embodiments of the disclosed technology. A person skilled in the relevant art will also understand that the disclosed technology may have additional embodiments or may be practiced without several of the details of the embodiments described below with reference to
As used herein, the term “computer network” generally refers to an interconnected network that has a plurality of network nodes connecting a plurality of computing devices (e.g., servers) to one another and to other networks (e.g., the Internet). One example computer network can include a Gigabit Ethernet network implemented in a datacenter for providing various cloud-based computing services. The term “network node” generally refers to a physical or software emulated network device. In one example, a network node can include a TOR switch. In other examples, network nodes can include routers, other types of switches, hubs, bridges, load balancers, security gateways, firewalls, network address translators, and name servers. Each network node may be associated with one or more ports. As used herein, a “port” generally refers to a physical and/or logical communications interface through which data packets and/or other suitable types of communications can be transmitted and/or received. For example, switching one or more ports can include switching routing data from a first optical port to a second optical port, or switching from a first TCP/IP port to a second TCP/IP port.
The term “optical switch” generally refers to a switch configured to selectively switch signals in optical fibers or integrated optical circuits from one circuit or optical pathway to another. An optical switch can have a number of input and output ports. For example, a “1:2” optical switch includes a single input port and two selectively switchable output ports. A “32:1” optical switch includes thirty-two input ports selectively connectable to a single output port. In another example, a “16:2” optical switch includes sixteen input ports each selectively connectable to one of two output ports. An optical switch can include mechanical, electro-optic, magneto-optic, or other suitable switching mechanisms. Example optical switches suitable for various embodiments of the disclosed technology include N77 series optical switches provided by Agilent Technologies of Santa Clara, Calif. and S Series optical circuit switches provided by Calient Technologies of Goleta, Calif.
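As a rough illustration of the N:M port semantics just defined, an optical switch's routing state can be modeled in a few lines of Python. The `OpticalSwitch` class and its method names are hypothetical, not part of the disclosure or of any vendor's API:

```python
class OpticalSwitch:
    """Minimal model of an N:M optical switch: each input port can be
    routed to at most one selectable output port at a time."""

    def __init__(self, num_inputs, num_outputs):
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        # routes[input_port] -> output_port, or None if not connected
        self.routes = {i: None for i in range(num_inputs)}

    def switch(self, input_port, output_port):
        """Connect one input port to one output port."""
        if not 0 <= input_port < self.num_inputs:
            raise ValueError(f"invalid input port {input_port}")
        if not 0 <= output_port < self.num_outputs:
            raise ValueError(f"invalid output port {output_port}")
        self.routes[input_port] = output_port

# A "16:2" switch: sixteen inputs, each selectable to one of two outputs.
sw = OpticalSwitch(16, 2)
sw.switch(0, 0)   # route input 0 to the primary-side output
sw.switch(0, 1)   # fail input 0 over to the standby-side output
```

In this model, failover for an enclosure is a single `switch` call per affected input port, which reflects why the operation can be simpler than coordinating two active TOR switches.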
The term “standby” is used herein to denote a readiness for duty and/or immediate deployment. For example, a standby network node (e.g., a standby switch or router) can be generally similar in structure and/or function to a corresponding primary network node. The standby network node can also be suitably connected to other computing devices, network nodes, or other components of a computer network via, for example, fiber optic, Ethernet, or other suitable types of cables. In certain embodiments, the standby network node can be powered up and await instructions to perform certain functions in a computer network in place of the corresponding primary network node. In other embodiments, the standby network node can be in a power-save mode and may be awakened upon receiving certain instructions to perform the functions in place of the corresponding primary network node.
The network nodes 102 can be organized into a hierarchy, a mesh, or other suitable organizations. For instance, in the illustrated embodiment, the network nodes 102 can include primary network nodes 112 (illustrated as first primary network node 112a and second primary network node 112b), tier one network nodes 114, and tier two network nodes 116 interconnected with one another in a hierarchy. In particular, the primary network nodes 112 are individually connected with one or more tier one network nodes 114. In turn, the tier one network nodes 114 are individually connected with one or more tier two network nodes 116. Though not shown in
As shown in
As shown in
Each enclosure 104 can also be associated with one of the primary network nodes 112. For example, as illustrated in
As shown in
The standby network node 118 can have generally similar connectivity with higher level network nodes 102 as the primary network nodes 112. For example, in the illustrated embodiment, the standby network node 118 can be connected to one or more of the tier one network nodes 114. In other embodiments, the standby network node 118 can also be connected to one or more of the tier two or other suitable network nodes 102. In certain embodiments, the standby network node 118 can be generally similar in structure and function to the primary network nodes 112. In other embodiments, the standby network node 118 can have a different structure and/or function from the primary network nodes 112. One example is described in more detail below with reference to
The network controller 120 can include a server, a virtual machine, or other suitable computing facilities operatively coupled to the computing devices 106, the primary optical switches 110, the primary network nodes 112, the standby optical switch 111, the standby network node 118, and/or other components of the computer network 100. In
In operation, the network nodes 102 can facilitate communications with the computing devices 106. For example, in certain embodiments, messages (e.g., packets) from a computing device 106a in the first enclosure 104a can be routed to another computing device 106b in the second enclosure 104b via a first optical connection along the first optical multiplexer 108a, the first primary optical switch 110a, and the first primary network node 112a to a tier one network node 114. The tier one and/or tier two network nodes 114 and 116 can then route the messages to the computing device 106b following a suitable protocol. The tier one and/or tier two network nodes 114 and 116 can also route the messages to a destination outside the computer network 100 via upper-level network nodes (not shown), core network nodes (not shown) or other suitable components.
During operation, the network controller 120 can be configured to monitor for an abnormal operating condition of one or more of the primary network nodes 112 and provide backup capabilities with the standby network node 118 accordingly. For example, in response to a detected abnormal operating condition at, for instance, the first primary network node 112a, the network controller 120 can be configured to cause the first primary optical switch 110a to switch from the first optical connection 113a to a second optical connection 113b between the first primary optical switch 110a and the standby network node 118. The network controller 120 can also be configured to cause the standby optical switch 111 to connect the first primary optical switch 110a to the standby network node 118. The network controller 120 can then enable the standby network node 118 to facilitate communications with the computing devices 106 in the first enclosure 104a in place of the first primary network node 112a. Similarly, in response to a detected abnormal operating condition at the second primary network node 112b, the network controller 120 can also cause the standby network node 118 to provide backup capability for the second primary network node 112b.
As such, the standby network node 118 can provide standby backup capabilities to two, three, or any suitable number of primary network nodes 112. Thus, capital investments for providing such standby backup capabilities can be much lower than providing dual primary network nodes (not shown) for each enclosure 104. Several embodiments of the computer network 100 can also operate more efficiently and reliably than using dual primary network nodes per enclosure. Optical switches such as the primary optical switches 110 and the standby optical switch 111 can be operated more reliably than traffic can be switched between a pair of active primary network nodes. Operations and components of the network controller 120 are described in more detail below with reference to
In
The computer program, procedure, or process may be compiled into object, intermediate, or machine code and presented for execution by one or more processors of a personal computer, a network server, a laptop computer, a smartphone, and/or other suitable computing devices. Equally, components may include hardware circuitry. A person of ordinary skill in the art would recognize that hardware can be considered fossilized software, and software can be considered liquefied hardware. As just one example, software instructions in a component can be burned to a Programmable Logic Array circuit, or can be designed as a hardware circuit with appropriate integrated circuits. Equally, hardware can be emulated by software. Various implementations of source, intermediate, and/or object code and associated data may be stored in a computer memory that includes read-only memory, random-access memory, magnetic disk storage media, optical storage media, flash memory devices, and/or other suitable computer readable storage media excluding propagated signals.
As shown in
As shown in
The processor 130 can execute instructions to provide a plurality of software components 140 configured to facilitate providing backup capabilities to the primary network nodes 112. As shown in
The detection component 133 can be configured to detect an abnormal operating condition at the individual primary network nodes 112. In certain embodiments, the detection component 133 can be configured to receive one or more operating parameters 154 from the individual primary network nodes 112 and indicate an abnormal condition based on the received operating parameters 154. For example, the operating parameters 154 can include an average, cumulative, or other suitable type of throughput value at the primary network nodes 112. In other examples, the operating parameters 154 can include instantaneous or average transmission speed, instantaneous or average change in throughput, network load balancing parameters, and/or other suitable parameters. In certain embodiments, the detection component 133 can poll the primary network nodes 112 for the operating parameters on a continuous or periodic basis. In other embodiments, the primary network nodes 112 can be configured to automatically transmit the operating parameters 154 to the detection component 133.
The detection component 133 can then compare the received operating parameters 154 with a corresponding threshold value to indicate whether the primary network nodes 112 are associated with abnormal operating conditions. For example, in certain embodiments, the detection component 133 can indicate an abnormal operating condition at the primary network node 112a based on comparisons indicating the following:
An associated average throughput over a period of time is below a threshold;
An accumulated throughput over a period of time is below a threshold;
An instantaneous transmission speed is below a threshold for a pre-determined period of time; or
In other embodiments, the detection component 133 can be configured to detect abnormal operating conditions by receiving one or more status indicators 156 from the primary network nodes 112. For example, the status indicator 156 can indicate that one of the primary network nodes 112 is in a non-operating mode, e.g., device failure, software update, system maintenance, or other suitable modes. The detection component 133 can then indicate an abnormal operating condition at the corresponding primary network node 112 based on the status indicators 156.
In certain embodiments, the detection component 133 can indicate an abnormal condition at the individual primary network nodes 112 with an impact period associated with the indicated abnormal condition. For example, if the status indicator 156 indicates that a primary network node 112 is undergoing a software update, the detection component 133 can indicate the abnormal operating condition with an associated impact period (e.g., 10 minutes). At the expiration of the impact period, the detection component 133 may re-check a status of the corresponding primary network node 112. In other embodiments, the detection component 133 can indicate an abnormal condition (e.g., system failure) at the primary network nodes 112 without an impact period. Thus, the indication of the abnormal operating condition can be indefinite. In further embodiments, the detection component 133 can re-check a status of the primary network nodes 112 even without an associated impact period, for instance, at pre-determined intervals. The detection component 133 can also be configured to forward an indicated abnormal operating condition at the individual primary network nodes 112 to the control component 135 for further processing.
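The detection logic described above, thresholds on operating parameters 154 combined with status indicators 156 and an optional impact period, can be sketched as follows. The parameter names, threshold values, and the 10-minute impact period are illustrative assumptions, not values prescribed by the disclosure:

```python
# Illustrative thresholds; real values would be deployment-specific.
THRESHOLDS = {
    "avg_throughput": 1.0e9,          # bits/s over a period of time
    "accumulated_throughput": 5.0e12,  # bits over a period of time
}

# Non-operating modes reported via a status indicator.
NON_OPERATING_STATUSES = {"device_failure", "software_update",
                          "system_maintenance"}

def detect_abnormal(operating_params, status=None):
    """Return (abnormal, impact_period_seconds or None).

    A transient mode such as a software update carries an impact period
    so the controller re-checks the node afterwards; a device failure
    yields an indefinite indication (no impact period).
    """
    if status in NON_OPERATING_STATUSES:
        impact = 600 if status == "software_update" else None
        return True, impact
    for name, threshold in THRESHOLDS.items():
        value = operating_params.get(name)
        if value is not None and value < threshold:
            return True, None
    return False, None
```

In this sketch, the returned impact period is what the detection component would hand to the control component alongside the abnormal-condition indication.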
The control component 135 can be configured to provide standby backup capabilities to a primary network node 112 associated with an indicated abnormal operating condition from the detection component 133.
The control component 135 can also retrieve a set of configuration information 152 associated with the first primary network node 112a from the memory 150. The control component 135 can then be configured to cause the output component 137 to transmit the retrieved configuration information 152 to the standby network node 118 along with an instruction (not shown) to configure the standby network node 118 based on the transmitted configuration information 152. In certain embodiments, the standby network node 118 can provide a confirmation message (not shown) to the control component 135 confirming successful completion of configuration based on the transmitted configuration information 152. Upon receiving the confirmation message, the control component 135 can cause the output component 137 to transmit another instruction 160c to the standby network node 118 to facilitate communications with the computing devices 106 (
The output component 137 is configured to transmit instructions, configuration information 152, and/or other suitable types of data to the various components of the computer network 100 (
The control component 135 can also be configured to determine to provide standby backup capabilities to one or more selected primary network nodes 112 having abnormal operating conditions.
If the determined number of available standby network node(s) 118 is less than the number of primary network nodes 112 with abnormal operating conditions, in certain embodiments, the control component 135 can be configured to select one or more of the primary network nodes 112 based on, for example, an operating profile of the computing devices 106 associated with the primary network nodes 112, administrator preference, or other suitable criteria. The operating profile can include priority of tasks for execution, current operating modes of the computing devices 106, service availability guarantee associated with the computing devices 106, and/or other suitable characteristics. For instance, with respect to
Based on the selection, the control component 135 can be configured to provide standby backup capabilities to the selected primary network node(s) 112 as discussed in more detail above with reference to
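The selection step above, choosing which failed primary nodes receive scarce standby capacity, can be sketched as a priority ranking. The numeric priority scores standing in for the operating profile are a hypothetical simplification:

```python
def select_for_backup(failed_primaries, num_standbys, operating_profile):
    """Pick which failed primary nodes receive a standby node.

    Nodes with higher operating-profile scores (e.g., stricter service
    availability guarantees) are served first; unknown nodes rank last.
    """
    ranked = sorted(failed_primaries,
                    key=lambda node: operating_profile.get(node, 0),
                    reverse=True)
    return ranked[:num_standbys]

# Hypothetical profile: higher score = stricter availability guarantee.
profile = {"112a": 3, "112b": 1}
selected = select_for_backup(["112a", "112b"], 1, profile)
```

A real implementation would fold in task priority, current operating modes, or administrator preference rather than a single scalar score.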
If the determined number of available standby network node(s) 118 is not less than the number of primary network nodes 112 with abnormal operating conditions, the control component 135 can be configured to provide standby backup capabilities to all of the primary network nodes 112, as illustrated in
Upon receiving indication of abnormal operating conditions at both the first and second primary network nodes 112a and 112b, the control component 135 can be configured to cause the output component 137 to transmit:
Even though only two standby network nodes 118a and 118b are illustrated in
As shown in
The process 200 can then include a decision stage 204 to determine whether an abnormal operating condition is detected at the network node. In response to determining that an abnormal operating condition is not detected at the network node, the process 200 includes reverting to detecting an abnormal operating condition at stage 202. In response to determining that an abnormal operating condition is detected at the network node, the process 200 includes switching optical connections from the network node to a standby network node at stage 206, for example, by utilizing the control component 135 of
As shown in
Optionally, the process 200 can include re-checking the condition of the network node by reverting to detecting an abnormal operating condition at stage 202. In one embodiment, re-checking the condition of the network node can be based on an impact period associated with the indicated abnormal operating condition, as described in more detail above with reference to
The process 206 can then include switching one or more primary optical switches 110 at stage 224 by, for example, utilizing the output component 137 of
The process 208 can then include replicating the retrieved configuration information at the standby network node at stage 234. In one embodiment, replicating the configuration information includes transmitting the retrieved configuration information to the standby network node with an instruction to configure based on the configuration information. In other embodiments, configuration information may be replicated manually or via other suitable techniques. The process 208 can then include activating the standby network node with the replicated configuration information at stage 236. In one embodiment, activating the standby network node can be automatic. In other embodiments, activating the standby network node can include transmitting an activation instruction to the standby network node.
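The stages described above, detection, optical switching, configuration replication, and activation, can be sketched as a single routine with pluggable steps. The callable names are hypothetical stand-ins for operations of the network controller 120, and the stage numbers in the comments refer to the process described in the text:

```python
def provide_backup(detect, switch_optics, retrieve_config,
                   replicate, activate):
    """One pass of the backup process for a monitored network node."""
    if not detect():               # stages 202/204: abnormal condition?
        return False
    switch_optics()                # stage 206: switch optical connections
    replicate(retrieve_config())   # stage 234: replicate retrieved config
    activate()                     # stage 236: activate standby node
    return True
```

The steps are deliberately sequential: the standby node is activated only after the optical path is switched and its configuration replicated, mirroring the ordering of stages 206 through 236.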
As shown in
Depending on the desired configuration, the processor 404 may be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 404 may include one or more levels of caching, such as a level one cache 410 and a level two cache 412, a processor core 414, and registers 416. An example processor core 414 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof. An example memory controller 418 may also be used with processor 404, or in some implementations memory controller 418 may be an internal part of processor 404.
Depending on the desired configuration, the system memory 406 may be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof. The system memory 406 can include an operating system 420, one or more applications 422, and program data 424. As shown in
The computing device 400 may have additional features or functionality, and additional interfaces to facilitate communications between basic configuration 402 and any other devices and interfaces. For example, a bus/interface controller 430 may be used to facilitate communications between the basic configuration 402 and one or more data storage devices 432 via a storage interface bus 434. The data storage devices 432 may be removable storage devices 436, non-removable storage devices 438, or a combination thereof. Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), and tape drives to name a few. Example computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
The system memory 406, removable storage devices 436, and non-removable storage devices 438 are examples of computer readable storage media. Computer readable storage media include storage hardware or device(s), examples of which include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other media which may be used to store the desired information and which may be accessed by computing device 400. Any such computer readable storage media may be a part of computing device 400. The term “computer readable storage medium” excludes propagated signals and communication media.
The computing device 400 may also include an interface bus 440 for facilitating communication from various interface devices (e.g., output devices 442, peripheral interfaces 444, and communication devices 446) to the basic configuration 402 via bus/interface controller 430. Example output devices 442 include a graphics processing unit 448 and an audio processing unit 450, which may be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 452. Example peripheral interfaces 444 include a serial interface controller 454 or a parallel interface controller 456, which may be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 458. An example communication device 446 includes a network controller 460, which may be arranged to facilitate communications with one or more other computing devices 462 over a network communication link via one or more communication ports 464.
The network communication link may be one example of a communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A “modulated data signal” may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) and other wireless media. The term computer readable media as used herein may include both storage media and communication media.
The computing device 400 may be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions. The computing device 400 may also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.
Specific embodiments of the technology have been described above for purposes of illustration. However, various modifications may be made without deviating from the foregoing disclosure. In addition, many of the elements of one embodiment may be combined with other embodiments in addition to or in lieu of the elements of the other embodiments. Accordingly, the technology is not limited except as by the appended claims.
Publication: US 20170078015 A1, Mar. 2017, US.