1. Technical Field
The present invention relates to networks.
2. Related Art
Network systems are commonly used to move network information (which may also be referred to interchangeably as frames, packets or commands) between computing systems (for example, servers) or between computing systems and network devices (for example, storage systems). Various hardware and software components are used to implement network communication, including network switches.
A network switch is typically a multi-port device where each port manages a point-to-point connection between itself and an attached system. Each port can be attached to a server, peripheral, input/output subsystem, bridge, hub, router, or another switch. The term network switch as used herein includes a Multi-Level switch that uses plural switching elements within a single switch chassis to route data packets.
A network switch port may be routinely taken offline for maintenance, because of credit loss, for reconfiguration of virtual lanes, for collecting statistics, or for any other reason. It is desirable to reduce packet loss when a port is taken offline and then brought back online.
The present disclosure provides a system and associated method for delaying packet delivery to a port to be taken offline while maintaining in-order packet delivery.
In one embodiment, a method for network communication is provided. The method includes identifying a first network port to be taken offline and, before taking the first network port offline, processing any pending packet tag for the first network port. The method further includes taking the first network port offline and, while the first network port is offline, storing a packet tag destined for the first network port at a second network port. Thereafter, the method includes bringing the first network port online and routing the packet tag that was stored at the second network port while the first network port was offline from the second network port to the first network port.
In another embodiment, a system for network communication is provided. The system includes a first network port configured to receive and transmit a network packet; and a second network port configured to communicate with the first network port. The first network port is identified to be taken offline and before taking the first network port offline, any pending packet tag at the first network port is processed. While the first network port is offline, the second network port is configured to stop a packet tag that is destined for the first network port and store the packet tag at the second network port. When the first network port is brought online, the second network port routes the packet tag stored at the second network port to the first network port.
In another embodiment, a method for network communication is provided. The method includes identifying a first network port to be taken offline; and before taking the first network port offline, processing all pending tags for the first network port. Thereafter, stopping all packet flow to the first network port from other network ports that communicate with the first network port; and storing all tags received at the other network ports while the first network port is offline.
The method further includes bringing the first network port online and releasing all stored tags to the first network port from the other network ports after the first network port is back online.
This brief summary has been provided so that the nature of the disclosure may be understood quickly. A more complete understanding of the disclosure can be obtained by reference to the following detailed description concerning the attached drawings.
The foregoing features and other features of the present disclosure will now be described with reference to the drawings of the various embodiments. In the drawings, the same components have the same reference numerals. The illustrated embodiments are intended to illustrate, but not to limit the disclosure. The drawings include the following Figures:
Definitions:
The following definitions are provided for convenience as they are typically (but not exclusively) used in InfiniBand and general networking environments implementing the various adaptive aspects described herein.
InfiniBand (“IB”) is a switched fabric interconnect standard for servers. IB technology is deployed for server clusters/enterprise data centers ranging from two to thousands of nodes. The IB standard is published by the InfiniBand Trade Association, and is incorporated herein by reference in its entirety.
“Inter switch link” or “ISL”: A physical link that is used for connecting two or more switches.
“Offline”: Status of a network port that is not receiving or transmitting network packets at a given time. A network port may be taken offline, for example, for maintenance.
“Online”: Status of a network port when it is operating to send and receive network packets.
“Packet”: A group of one or more network data word(s) used for network communication.
“Switch”: A device that facilitates network communication.
“Virtual Lane” (VL): The term VL as defined by Section 3.5.7 of the IB Specification provides a mechanism for creating virtual links within a single physical link. A virtual lane represents a set of transmit and receive buffers in a port. A data VL is used to send IB packets and, according to the IB Specification, is configured by a subnet manager based on a Service Level field in a packet.
Any of the embodiments described with reference to the figures may be implemented using software, firmware, hardware (e.g., fixed logic circuitry), manual processing, or a combination of these implementations. The term “logic,” “module,” “component,” “system” or “functionality” as may be used herein generally represents software, firmware, hardware, or a combination of these elements. For instance, in the case of a software implementation, the term “logic,” “module,” “component,” “system,” or “functionality” represents program code that performs specified tasks when executed on a processing device or devices (e.g., processors). The program code can be stored in one or more computer readable memory devices.
Generally, the illustrated separation of logic, modules, components, systems, and functionality into distinct units may reflect an actual physical grouping and allocation of software, firmware, and/or hardware, or can correspond to a conceptual allocation of different tasks performed by a single software program, firmware program, and/or hardware unit. The illustrated logic, modules, components, systems, and functionality may be located at a single site (e.g., as implemented by a processing device), or may be distributed over plural locations.
The terms “machine-readable media” or the like when used, refer to any kind of medium for retaining information in any form, including various kinds of storage devices (magnetic, optical, static, and the like). The term machine-readable media also encompasses transitory forms for representing information, including various hardwired and wireless links for transmitting the information from one point to another.
The embodiments disclosed herein may be implemented as a computer process (a method), a computing system, or as an article of manufacture, such as a computer program product or computer readable media. The computer program product may be a computer storage media readable by a computer device and encoding a computer program of instructions for executing a computer process. The computer program product may also be a propagated signal on a carrier readable by a computing system and encoding a computer program of instructions for executing a computer process.
Various industry standards, hardware and software components are typically used to implement network communication. The IB is one such industry standard used with computing systems and input/output (I/O) devices. The IB is used to create fabrics that are complex networks, which may encompass hundreds and even thousands of interconnected hosts/switches/servers, all working in parallel to solve complex problems.
It is noteworthy that the disclosed embodiments are not limited to the IB environment. The capabilities disclosed herein are applicable to other network protocols and standards, for example, the Fibre Channel (FC) standard, the Fibre Channel over Ethernet (FCoE) standard and others.
To facilitate an understanding of the various embodiments, the general architecture and operation of a network system with respect to the IB standard will be described. The specific architecture and operation of the various embodiments will then be described with reference to the general architecture of the network system.
An IB switch is typically a multi-port device. Physical links (optical or copper) connect each port in a switch to another IB switch or an end device (for example, Target Channel Adapter (TCA) or a Host Channel Adapter (HCA)).
In one embodiment, switch 102 may be coupled to system 106, network device 114 and network 116, via ports 113, 122 and 124, respectively. Switch 104 may be operationally coupled to storage system 108, network 112 and host system 110 via ports 134, 138, and 136, respectively. In one embodiment, port 120 of switch 102 may be coupled to port 132 via a network link 128. A plurality of virtual lanes 130 (shown as VL0 to VLn) may be used between port 120 and port 132.
Systems 106, 108 and 110 typically include several functional components. These components may include a central processing unit (CPU), main memory, input/output (“I/O”) devices, and streaming storage devices (for example, tape drives). In conventional systems, the main memory is coupled to the CPU via a system bus or a local memory bus. The main memory is used to provide the CPU access to data and/or program information that is stored in main memory at execution time. Typically, the main memory is composed of random access memory (RAM) circuits. A computer system with the CPU and main memory is often referred to as a host system.
Switch 102 may be coupled to an external processor 142 that is coupled to an Ethernet port 144 and serial port 145. In one embodiment, processor 142 may be a part of computing system 106. A network administrator may use processor 142 to configure switch 102.
Each port 120, 132 and 162 may include a receive buffer 152, 154 and 164, respectively, to receive and temporarily store a network packet, such as packet 168. Each port 120, 132 and 162 may also include a transmit buffer 146, 156 and 166, respectively, to temporarily store a packet before the packet is sent to its destination.
Generally, to ensure proper flow control, credit (i.e. available space) should be available at a receive buffer before a packet is transmitted by a port. For example, before ingress port 120 sends packet 168 to egress port 132, space should be available at receive buffer 154 of egress port 132. Egress port 132 sends a flow control packet to ingress port 120 to synchronize available credit information between egress port 132 and ingress port 120.
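As an illustrative, non-limiting sketch, the credit check described above may be modeled as follows. The class names, the slot-based accounting and the `on_flow_control` method are assumptions made only for this example; they are not taken from the disclosure or from the IB Specification.

```python
from collections import deque


class ReceiveBuffer:
    """Receive buffer with a fixed number of packet slots (hypothetical model)."""

    def __init__(self, slots):
        self.slots = slots
        self.packets = deque()

    def credits(self):
        # Credit is the free space currently available in the buffer.
        return self.slots - len(self.packets)

    def store(self, packet):
        if self.credits() <= 0:
            raise OverflowError("no credit available")
        self.packets.append(packet)

    def drain(self):
        # Removing a packet frees one slot, which returns one credit.
        return self.packets.popleft()


class Sender:
    """Sending port that tracks the credit last advertised by its peer."""

    def __init__(self, peer):
        self.peer = peer
        self.known_credits = 0

    def on_flow_control(self):
        # Models receipt of a flow control packet carrying the peer's credit.
        self.known_credits = self.peer.credits()

    def send(self, packet):
        if self.known_credits == 0:
            return False  # hold the packet; never transmit without credit
        self.peer.store(packet)
        self.known_credits -= 1
        return True


# Receive buffer 154 of port 132 is modeled with two slots.
rx_154 = ReceiveBuffer(slots=2)
port_120 = Sender(rx_154)
port_120.on_flow_control()
results = [port_120.send(f"packet-{i}") for i in range(3)]

# Once port 132 drains a packet and advertises the credit, the held packet goes out.
rx_154.drain()
port_120.on_flow_control()
retry = port_120.send("packet-2")
```

In this sketch the third packet is held rather than dropped, and it is transmitted only after the receiving side reports that space has become available.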
An incoming packet is received and stored at receive buffer 202 in receive segment 210. A tag writer module 204 in receive segment 210 generates a tag 218.
Tag writer 204 forwards tag 218 at 206 to the transmit segment 212. The transmit segment 212 includes a tag buffer 214 used to store a plurality of tags and an arbiter 216, which receives requests for processing tags 218. Arbiter 216 selects one of the plurality of tags 218. A packet 200 associated with tag 218 is then fetched from a receive buffer location and transmitted to its destination 222 by the transmit segment 212, via transmit buffer 220.
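The tag path described above may be sketched, for illustration only, as shown below. The tag fields (`source_port`, `dest_port`, `buffer_slot`) and the oldest-first arbitration policy are assumptions made for this example; the disclosure does not specify them in this passage.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Tag:
    # Illustrative fields only; the actual tag layout is not given here.
    source_port: int
    dest_port: int
    buffer_slot: int


class ReceiveSegment:
    """Stores incoming packets and writes one tag per packet."""

    def __init__(self, port_id):
        self.port_id = port_id
        self.receive_buffer = {}
        self.next_slot = 0

    def receive(self, packet, dest_port):
        slot = self.next_slot
        self.next_slot += 1
        self.receive_buffer[slot] = packet
        # The tag writer describes where the packet is stored.
        return Tag(self.port_id, dest_port, slot)

    def fetch(self, slot):
        return self.receive_buffer.pop(slot)


class TransmitSegment:
    """Holds tags in a tag buffer and arbitrates among them."""

    def __init__(self):
        self.tag_buffer = deque()
        self.sent = []

    def accept(self, tag):
        self.tag_buffer.append(tag)

    def arbitrate(self, receive_segments):
        # Assumed policy: the oldest tag wins arbitration.
        if not self.tag_buffer:
            return None
        tag = self.tag_buffer.popleft()
        # Fetch the packet from the receive buffer location named by the tag.
        packet = receive_segments[tag.source_port].fetch(tag.buffer_slot)
        self.sent.append(packet)
        return packet


rx = {120: ReceiveSegment(120), 162: ReceiveSegment(162)}
tx_132 = TransmitSegment()
tx_132.accept(rx[120].receive("packet-a", dest_port=132))
tx_132.accept(rx[162].receive("packet-b", dest_port=132))
first = tx_132.arbitrate(rx)
second = tx_132.arbitrate(rx)
```

Only the small tag moves to the transmit segment; the packet itself stays in the receive buffer until its tag is selected.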
At any given time, as an example, egress port 132 (shown as Port “O”) is to be taken offline. Firmware for ports 120 and 162 programs “Destination Port Reject Masks” 240 and 244. When port 132 is taken offline, the destination port reject masks stop all tag/packet flow to port 132. Tags 238 and 242 destined for egress port 132 are stored at ports 120 and 162. When port 132 is brought back online, tags 238 and 242 are released and sent to port 132.
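One possible way to model a destination port reject mask is as a bitmask with one bit per destination port, as in the following sketch. The bit layout, the class and the method names are assumptions for illustration; a hardware implementation would likely index ports compactly rather than by reference numeral.

```python
class SendingPort:
    """Port that holds tags for any destination set in its reject mask."""

    def __init__(self, name):
        self.name = name
        self.reject_mask = 0   # bit n set: hold tags destined for port n
        self.held_tags = []    # (destination, tag id) pairs stored locally

    def set_reject(self, dest):
        self.reject_mask |= 1 << dest

    def clear_reject(self, dest):
        # Clearing the bit releases the stored tags in their arrival order.
        self.reject_mask &= ~(1 << dest)
        released = [t for t in self.held_tags if t[0] == dest]
        self.held_tags = [t for t in self.held_tags if t[0] != dest]
        return released

    def forward(self, dest, tag_id):
        if self.reject_mask & (1 << dest):
            self.held_tags.append((dest, tag_id))
            return None        # tag is stored, not dropped
        return (dest, tag_id)  # tag flows to its destination as usual


port_120 = SendingPort("120")
port_120.set_reject(132)                 # port 132 is about to go offline
held = port_120.forward(132, 238)        # stored at port 120
passed = port_120.forward(124, 1)        # other destinations are unaffected
released = port_120.clear_reject(132)    # port 132 is back online
```

The mask affects only traffic for the masked destination, so traffic to other ports continues while the tags for the offline port wait.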
The process begins in block S300, when a port that is to be taken offline (“Port O”), for example, port 132, is identified. In one embodiment, a network administrator (not shown) identifies the port that is to be taken offline.
In block S302, a destination port mask is set in ports (for example, 120 and 162).
In block S304, all the pending tags for Port “O” are processed.
In block S306, Port “O” is taken offline.
In block S308, Port “O” is brought back online. The destination mask is then cleared. In block S310, tags stored at the masked ports (238 and 242) are received by Port “O” and processed.
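The sequence of blocks S302 through S310 may be sketched end to end as follows. This is a simplified software model under stated assumptions: the `Port` class, the `take_offline_and_restore` function and the way traffic is injected while Port “O” is offline are hypothetical and serve only to show the ordering of the steps.

```python
from collections import deque


class Port:
    def __init__(self, number):
        self.number = number
        self.online = True
        self.pending = deque()  # tags waiting to be processed at this port
        self.processed = []     # tags processed so far, in order
        self.masked = set()     # destinations whose tags are held here
        self.held = deque()     # (destination port, tag) pairs

    def send(self, dest, tag):
        if dest.number in self.masked:
            self.held.append((dest, tag))  # store the tag at the sender
        else:
            dest.pending.append(tag)

    def process_pending(self):
        if not self.online:
            raise RuntimeError("an offline port cannot process tags")
        while self.pending:
            self.processed.append(self.pending.popleft())

    def release(self, dest):
        keep = deque()
        while self.held:
            d, tag = self.held.popleft()
            if d is dest:
                dest.pending.append(tag)
            else:
                keep.append((d, tag))
        self.held = keep


def take_offline_and_restore(port_o, senders, traffic_while_offline):
    for s in senders:                        # S302: set the destination mask
        s.masked.add(port_o.number)
    port_o.process_pending()                 # S304: process pending tags
    port_o.online = False                    # S306: take Port "O" offline
    for sender, tag in traffic_while_offline:
        sender.send(port_o, tag)             # held at the masked senders
    port_o.online = True                     # S308: bring Port "O" back online
    for s in senders:
        s.masked.discard(port_o.number)      # S308: clear the mask
    for s in senders:                        # S310: release and process tags
        s.release(port_o)
    port_o.process_pending()


port_o, p120, p162 = Port(132), Port(120), Port(162)
p120.send(port_o, "t1")  # arrives before the mask is set
take_offline_and_restore(
    port_o, [p120, p162], [(p120, "t2"), (p162, "t3"), (p120, "t4")]
)
```

In this model no tag reaches Port “O” while it is offline, and the tags from each sending port arrive in the order in which that port stored them.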
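In one embodiment, fewer packets are lost when a port is taken offline, because tags destined for that port are stored at the other ports and released once the port is back online.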
Although the present disclosure has been described with reference to specific embodiments, these embodiments are illustrative only and not limiting. Many other applications and embodiments of the present disclosure will be apparent in light of this disclosure and the following claims.
This application claims the benefit and priority of U.S. Provisional Application Ser. No. 61/114,406, entitled Method and System for Taking A Network Port Offline, filed Nov. 13, 2008, which is incorporated herein by reference in its entirety for all purposes.