1. Field of the Invention
The present invention relates to network technology. More particularly, the present invention relates to methods and apparatus for processing packets within a switch.
2. Description of the Related Art
In recent years, the capacity of storage devices has not increased as fast as the demand for storage. Therefore, a given server or other host must access multiple, physically distinct storage nodes (typically disks). To address these storage limitations, the storage area network (SAN) was developed. Generally, a storage area network is a high-speed special-purpose network that interconnects different data storage devices and associated data hosts on behalf of a larger network of users. However, although a SAN enables a storage device to be configured for use by various network devices and/or entities within a network, data storage needs are often dynamic rather than static.
In a network such as a SAN, the speed of transmission of data is particularly important. However, there are a variety of limitations to the speed of data transmission within a network. One such limitation is the number of packets that may be transmitted per second. Moreover, attempting to meet or exceed this limitation may result in congestion within the switch unless the switch is designed to avoid the congestion.
One system commonly used to prevent congestion of packets at output ports within a switch is an arbitration system. During conventional arbitration processes, a packet is received by the switch via one of a plurality of input ports. More specifically, each packet received by the switch is addressed for transmission via one of a plurality of output ports. Rather than automatically forwarding the packets to the appropriate output port as they are received, the arbitrator arbitrates the transmission of packets to prevent congestion at the output ports. Unfortunately, arbitration is typically performed on a per-packet basis. Thus, the arbitration process introduces a substantial delay in the transmission of each packet.
In view of the above, it would be desirable if the speed of transmission of data within a network such as a storage area network could be increased. Moreover, it would be beneficial if data transmission could be expedited in a switch implementing an arbitrator.
The present invention enables data transmission within a switch implementing an arbitrator to be expedited. This is accomplished, in part, through the generation of a frame including multiple packets (or frames). In this manner, the arbitrator manages the transmission of frames rather than single packets.
In accordance with one aspect of the invention, methods and apparatus for transmitting a plurality of packets in a switch having a plurality of input ports and a plurality of output ports are disclosed. Each port may support input port functionality as well as output port functionality. However, for purposes of this application, the terms input port and output port are used to refer to these separate functions. Two or more packets (or frames) are received at one or more of the plurality of input ports. One of the plurality of output ports via which to send each of the two or more packets is identified. A request message is sent to an arbitrator. A grant message is then received from the arbitrator in response to the request message. A frame including the two or more received packets, referred to as a superframe, is generated. The frame is then sent to the one of the plurality of output ports when the grant message is received. Once the frame or associated packets are transmitted via the one of the plurality of output ports, a corresponding available message is sent to the arbitrator indicating that the output port is now capable of receiving the next frame or one or more packets. In other words, the available message indicates the availability of one or more buffers capable of receiving a frame or associated packet(s). For instance, the available message may indicate the availability of one or more buffers capable of receiving a pre-determined number of bytes.
In accordance with another aspect of the invention, the request message is sent immediately upon receipt and/or queueing of a packet. This enables the arbitration process to begin while additional packets may be received and placed in a virtual output queue for subsequent transmission in a superframe. Alternatively, the request message may be sent upon the queueing of two or more packets destined for the same output port. When a grant message is received from the arbitrator, a frame including multiple packets destined for the same output port may be transmitted to the output port. In this manner, transmission of multiple packets may be managed by the arbitrator as well as transmitted by the switch while requiring only a single request and grant message to be processed by the arbitrator. For instance, each request and grant message may correspond to a maximum number of bytes, depending upon output storage resources.
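By way of illustration only, the request behavior described above may be sketched as follows. The sketch assumes a hypothetical `InputPort` class that sends a request when the first packet is queued for an output port and sends no further request while that one is outstanding. The class, method, and attribute names are assumptions made for illustration and do not appear in the disclosure.

```python
from collections import deque


class InputPort:
    """Hypothetical input port with one virtual output queue per output port."""

    def __init__(self, num_outputs):
        self.voqs = [deque() for _ in range(num_outputs)]
        self.request_pending = [False] * num_outputs
        self.requests_sent = []  # output port indices, in the order requested

    def enqueue(self, output, packet):
        """Queue a packet and send a request only if none is outstanding."""
        self.voqs[output].append(packet)
        if not self.request_pending[output]:
            self.request_pending[output] = True
            self.requests_sent.append(output)

    def on_grant(self, output):
        """On a grant, release every packet queued so far as one frame."""
        frame = list(self.voqs[output])
        self.voqs[output].clear()
        self.request_pending[output] = False
        return frame
```

In this sketch, three packets queued for the same output port before the grant arrives produce one request and one frame, so the arbitrator processes a single request and grant for all three. The sketch does not enforce any byte limit on the frame.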
In accordance with yet another aspect of the invention, an arbitrator is used to coordinate the sending of a plurality of packets or frames received at one or more input ports for transmission by one or more output ports. The arbitrator receives one or more request messages from one or more of the input ports, each of the request messages indicating a request to send one or more packets or frames via one of the output ports. For instance, multiple packets or frames may be sent together in what will be referred to as a “superframe.” The arbitrator determines whether the one of the output ports is capable of receiving the one or more packets or frames. For instance, the arbitrator may determine whether a credit is available for the requested output port. A grant message is then generated or sent when it is determined that the one of the output ports is capable of receiving the one or more packets or frames, the grant message indicating that the one of the output ports is capable of receiving the one or more packets or frames.
In accordance with another aspect of the invention, the present invention is implemented on a per-port basis. More particularly, the present invention may be implemented in hardware and/or software dedicated to each port within a switch. In other words, selected ports of one or more network devices may implement the disclosed functionality in hardware and/or software. This allows processing to scale with the number of ports. Accordingly, the present invention provides far greater bandwidth for data transmission than traditional arbitration-based switching schemes as a result of the processing of multiple packets per “available-request-grant” cycle.
Various network devices may be configured or adapted for intercepting, generating, modifying, and transmitting packets or frames to implement the disclosed functionality. These network devices include, but are not limited to, servers (e.g., hosts), routers, and switches. Moreover, the functionality for the above-mentioned processes may be implemented in software as well as hardware.
Yet another aspect of the invention pertains to computer program products including machine-readable media on which are provided program instructions for implementing the methods and techniques described above, in whole or in part. Any of the methods of this invention may be represented, in whole or in part, as program instructions that can be provided on such machine-readable media. In addition, the invention pertains to various combinations and arrangements of data generated and/or used as described herein. For example, packets and frames having the format described herein and provided on appropriate media are part of this invention.
These and other features of the present invention will be described in more detail below in the detailed description of the invention and in conjunction with the following figures.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be obvious, however, to one skilled in the art, that the present invention may be practiced without some or all of these specific details. In other instances, well known process steps have not been described in detail in order not to unnecessarily obscure the present invention.
Within a fibre channel network, a number of buffers is typically allocated on a per-port basis at some initial time. Fibre channel credits are then issued according to usage of the assigned buffers. However, such a credit mechanism has not been used in combination with an arbitration mechanism.
In accordance with various embodiments of the invention, an arbitrator is used to prevent congestion at the output ports of the switch. However, rather than arbitrating on a per-packet basis, multiple packets are appended and transmitted in a single frame. In this manner, the delay associated with the transmission of multiple packets by a switch is reduced.
Various embodiments of the invention may be implemented in a network device such as a switch. Note that the frames and/or packets being received and transmitted by such a switch possess the frame format specified for a standard protocol such as Ethernet or fibre channel. Hence, software and hardware conventionally used to generate such frames may be employed with this invention. Additional hardware and/or software is employed to modify and/or generate frames compatible with the standard protocol in accordance with this invention.
The frame is generated by a network device such as a host, switch, or storage device. Obviously, the appropriate network devices should be configured with the appropriate software and/or hardware for performing the disclosed functionality. Of course, all network devices within the storage area network need not be configured with the disclosed functionality. Rather, selected switches and/or ports may be configured with or adapted for the disclosed functionality. Similarly, in various embodiments, such functionality may be enabled or disabled through the selection of various modes. Moreover, it may be desirable to configure selected ports of network devices as ports capable of performing the disclosed functionality, either continuously, or only when in an enabled state.
The standard protocol employed in the storage area network (i.e., the protocol used to frame the data) will typically, although not necessarily, be synonymous with the “type of traffic” carried by the network. As explained below, the type of traffic is defined in some encapsulation formats. Examples of the type of traffic are typically layer 2 or corresponding layer formats such as Ethernet, Fibre Channel, and InfiniBand.
As described above, a storage area network (SAN) is a high-speed special-purpose network that interconnects different data storage devices with associated network hosts (e.g., data servers or end user machines) on behalf of a larger network of users. A SAN is defined by the physical configuration of the system. In other words, those devices in a SAN must be physically interconnected. Within a storage area network 131 such as that illustrated in
As indicated above, this invention pertains to data transmission in networks such as storage networks. Although it is possible that the present invention may be implemented within a single switch, multiple switches making up a network fabric may together implement the present invention. Further, various embodiments of this invention are implemented on a per-port basis. In other words, a multi-port switch will have the disclosed functionality separately implemented on one or more of its ports. Individual ports have dedicated logic for handling the disclosed functions for packets or frames handled by the individual ports. This allows processing of packets and frames to scale with the number of ports, and provides far greater bandwidth for data transmission than can be provided with traditional arbitration schemes.
In a specific and preferred embodiment of the invention, the disclosed logic is separately implemented at individual ports of a given switch—rather than having centralized processing for all ports of a switch. This allows the processing capacity to be closely matched with the exact needs of the switch on a per port basis. If a central processor is employed for the entire switch (serving numerous ports), the processor must be designed/selected to handle maximum traffic at all ports. For many applications, this represents extremely high processing requirements and a very large/expensive processor. If the central processor is too small, the switch will at times be unable to keep up with the switching demands of the network.
Many storage area networks in commerce run a SCSI protocol to access storage sites. Frequently, the storage area network employs fibre channel (FC-PH (ANSI X3.230-1994, Fibre Channel-Physical and Signaling Interface)) as a lower level protocol and runs IP and SCSI on top of fibre channel. Note that the invention is not limited to any of these protocols. For example, fibre channel may be replaced with Ethernet, InfiniBand, and the like. Further, the higher level protocols need not include SCSI. For example, other protocols may be used by hosts to access storage.
In order to send a frame for transmission by one of the output ports, a request message 422 is sent to an arbitrator 424. When the arbitrator 424 receives an available message 426 indicating the ability of an output port to accept one or more packets or frames for transmission, a grant message 428 is sent by the arbitrator 424 to the input port. A frame 430 including the two or more received packets is then generated and sent to the appropriate output port. Alternatively, and preferably, the frame 430 is generated in whole or in part prior to receiving the grant message 428.
When the frame 430 is transmitted, it is stored in an available buffer associated with the appropriate output port. It is important to note that the buffers are a limited resource per output port. Thus, each buffer represents a specified number of bytes to which “ownership” is granted when a request and available message are matched.
In accordance with one embodiment, a frame is generated by appending two or more packets obtained from a virtual output queue associated with one of the output ports. This frame, referred to as a “superframe,” will be described in further detail below with reference to
Communication between the input ports and the arbitrator, and between the arbitrator and the output ports, may be accomplished in a variety of ways. For instance, an available message, request message, and grant message may be implemented through various control lines within one or more line cards. However, this is merely illustrative, and alternate methods of communicating with an arbitrator may be implemented. It is important to note that packets or frames need not be intercepted by the arbitrator, but may be transmitted from an input port to an output port directly. However, in various embodiments, it may be desirable to send the packets or frames to an output port via the arbitrator.
As described above, various switches within a storage area network may be switches supporting the disclosed switching functionality.
The frame or packet is received by a forwarding engine 512, which obtains information from various fields of the frame, such as source address and destination address. The forwarding engine 512 then accesses a forwarding table 514 to determine whether the source address has access to the specified destination address. The forwarding engine 512 also determines the appropriate port of the switch via which to send the frame, and generates an appropriate routing tag for the frame. In one embodiment, the port via which the frame is to be sent is identified in a header that is appended to the packet or frame.
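By way of illustration only, the lookup performed by the forwarding engine may be sketched as follows. The table layout (destination address mapped to an output port and a set of permitted source addresses), the example addresses, and the one-byte routing tag are all assumptions made for this sketch. The disclosure does not specify the format of the forwarding table or of the routing tag.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    src: str
    dst: str
    payload: bytes


# Hypothetical forwarding table: destination -> (output port, permitted sources).
FORWARDING_TABLE = {
    "disk-1": (2, {"host-a", "host-b"}),
    "disk-2": (3, {"host-a"}),
}


def forward(frame, table=FORWARDING_TABLE):
    """Return (output_port, tagged_frame), or None if the frame is not forwarded."""
    entry = table.get(frame.dst)
    if entry is None:
        return None  # unknown destination
    port, permitted = entry
    if frame.src not in permitted:
        return None  # source address has no access to this destination
    # Assumed tag format: a single byte carrying the output port number.
    return port, bytes([port]) + frame.payload
```

Here a frame from `host-a` to `disk-2` is tagged for output port 3, while the same frame from `host-b` is rejected because the table does not list `host-b` as a permitted source for that destination.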
Once the packet or frame is appropriately formatted for transmission, the frame will be received by a buffer queueing block 516, which will be referred to interchangeably as a virtual output queue engine or scheduler, prior to transmission. Rather than transmitting frames or packets as they are received, they are stored temporarily in a buffer or virtual output queue 518, as described above with reference to
The above-described functionality is performed in combination with an arbitrator as described above with reference to
The above-described functionality is preferably performed on a per-port basis rather than per switch. Thus, each switch may have one or more ports that are capable of performing the disclosed functions, as well as ports that are not capable of such functions.
Although the network devices described above with reference to
Once received by the output port, the output port may either send the entire frame or parse the packets for separate transmission via the output port.
In order to ensure that the superframe complies with the size of the memory (e.g., buffer) in which the superframe will be stored at the output port, there will typically be a maximum number of packets, or otherwise be a limit to the number of bits or bytes within a superframe. Thus, as two or more packets within a virtual output queue are appended, the sum of the lengths of those packets in bits or bytes may be maintained to ensure that the sum is less than or equal to a pre-defined number of bits or bytes. For instance, a number of packets may be appended to provide a maximum of approximately 1.5 maximum transmission units. It is important to note that the packets transmitted by the switch may include a variety of information or data.
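By way of illustration only, the running-sum check described above may be sketched as follows. The function name `build_superframe` and the 1500-byte maximum transmission unit used in the usage note are assumptions made for this sketch.

```python
from collections import deque


def build_superframe(voq, max_bytes):
    """Append packets from the head of the queue while the running sum fits.

    Returns the list of packets taken and their total length in bytes.
    Packets that do not fit remain in the queue for a later superframe.
    """
    packets, total = [], 0
    while voq and total + len(voq[0]) <= max_bytes:
        packet = voq.popleft()
        packets.append(packet)
        total += len(packet)
    return packets, total
```

Assuming a 1500-byte maximum transmission unit, a limit of 1.5 maximum transmission units is 2250 bytes. With queued packets of 600, 700, and 1000 bytes, the first two are appended (1300 bytes) and the third is left in the queue, since adding it would bring the sum to 2300 bytes. In this sketch, a single packet larger than the limit is never taken, so a practical implementation would need to handle that case separately.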
When the arbitrator receives the request at block 808, it waits until it receives an available message (i.e., credit) indicating an available inbound buffer capable of receiving a frame or one or more packets addressed to the output port identified in the request message as shown at block 810. The arbitrator then sends a grant message identifying the output port identified in the request message at block 812. More specifically, the grant message may indicate that a frame of one or more packets addressed to the specified output port can be sent. Of course, if the arbitrator receives the request and it has already previously received a credit, it will send the grant message immediately.
It is important to note that the arbitrator may receive multiple requests associated with the same output port. Thus, in accordance with one embodiment, the arbitrator selects which one of those requests is to be matched with an “available” message based on the order of request message arrival.
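By way of illustration only, the matching of request messages with available messages may be sketched as follows. The `Arbitrator` class and its method names are assumptions made for this sketch. Requests for each output port are held in arrival order, and a grant is issued whenever both a pending request and a credit exist for the same output port.

```python
from collections import deque


class Arbitrator:
    """Hypothetical arbitrator matching requests to credits per output port."""

    def __init__(self, num_outputs):
        self.pending = [deque() for _ in range(num_outputs)]
        self.credits = [0] * num_outputs
        self.grants = []  # (input_port, output_port) pairs, in grant order

    def request(self, input_port, output):
        """Record a request message, then grant at once if a credit is banked."""
        self.pending[output].append(input_port)
        self._match(output)

    def available(self, output):
        """Record an available message (credit), then grant the oldest request."""
        self.credits[output] += 1
        self._match(output)

    def _match(self, output):
        while self.credits[output] and self.pending[output]:
            self.credits[output] -= 1
            self.grants.append((self.pending[output].popleft(), output))
```

In this sketch a request that arrives before any credit waits in the queue, while a request that arrives after a credit has been received is granted immediately, consistent with the two cases described above.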
When the input port or associated hardware and/or software receives the grant message from the arbitrator at block 814, it generates a superframe such as that described above with reference to
As described above with reference to block 814, in accordance with one embodiment, the superframe is generated after the grant message is received from the arbitrator. Thus, regardless of when the request message is sent, the grant message will trigger the generation of the superframe. However, generation of the frame may also be performed, or begun, before the grant message is received. In other words, the frame may be generated in whole or in part prior to receipt of the grant message. For instance, various packets received in the virtual output queue may be appended as they are received. In that case, receipt of the grant message may trigger the sending of a frame that has already been generated.
Once the superframe is received by the output port, or associated hardware and/or software, the superframe addressed to the output port is then stored in a buffer associated with the output port at block 818. The frame is then obtained from the buffer and parsed to obtain the two or more packets. For instance, the packet length may be obtained from the header of the packets in the frame in order to parse the frame. The packets are then transmitted via the output port at block 820. The output port then sends an available message (i.e., credit) to the arbitrator at block 822. In this instance, the available message indicates an available buffer capable of receiving a frame (or one or more packets) addressed to the output port.
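By way of illustration only, length-based parsing of a superframe may be sketched as follows. The disclosure states only that the packet length may be obtained from the header of each packet. This sketch stands in for that header with an assumed two-byte big-endian length field placed before each packet, and the function names are likewise assumptions.

```python
import struct

LENGTH_FIELD = ">H"  # assumed header: 2-byte big-endian packet length


def pack_superframe(packets):
    """Concatenate packets, each preceded by its assumed length field."""
    return b"".join(struct.pack(LENGTH_FIELD, len(p)) + p for p in packets)


def parse_superframe(frame):
    """Walk the frame, using each length field to find the next packet."""
    packets, offset = [], 0
    while offset < len(frame):
        if offset + 2 > len(frame):
            raise ValueError("truncated length field")
        (length,) = struct.unpack_from(LENGTH_FIELD, frame, offset)
        offset += 2
        if offset + length > len(frame):
            raise ValueError("truncated packet")
        packets.append(frame[offset:offset + length])
        offset += length
    return packets
```

With this assumed layout, parsing a frame built from a list of packets returns the same packets in the same order, which the output port may then transmit separately. The two-byte field limits each packet to 65535 bytes in this sketch.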
Although the above-described embodiment is described with reference to obtaining a frame in order to separately transmit two or more packets via an output port, this example is merely illustrative. Thus, the superframe may also be transmitted via the output port in its entirety. For instance, the superframe may be parsed by another switch receiving the frame, as well as compressed for efficient transmission.
As described above with reference to
In accordance with one embodiment, the cross-bar is a frame-based buffered cross-bar 1002. As shown, in this example, the buffered cross-bar includes a plurality of vertical bars 1004 and horizontal bars 1006. In addition, the cross-bar 1002 includes a plurality of switches 1008. The switches 1008 are together configured such that any switch may be turned on at a given time. Typically, two inputs cannot be connected simultaneously to the same output. However, this limitation is eliminated with the use of a buffered cross-bar.
A buffered cross-bar includes one or more buffers 1010 at each input and output to the cross-bar 1002. With the use of the buffers 1010, data may be transmitted simultaneously at all inputs until the buffers 1010 at the inputs and outputs are full. The usage of a buffered cross-bar allows the arbitrator to send the grant as soon as possible to match the credit with a request without concern about whether the previous grant has resulted in a packet being sent or not, since the buffer(s) in the cross-bar allows multiple packets to be sent to the same output without blocking. Effectively, one could match the number of credits used by the arbitrator with the size of buffers in the cross-bar. In this manner, the efficiency of traffic through the cross-bar is maximized.
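By way of illustration only, matching the number of credits to the size of a cross-bar buffer may be sketched as follows. The `BufferedOutput` class and its fixed number of frame-sized slots are assumptions made for this sketch. The number of outstanding credits is defined as the number of free slots, so a frame sent under a grant always finds room.

```python
class BufferedOutput:
    """Hypothetical cross-bar output buffer whose free slots serve as credits."""

    def __init__(self, slots):
        self.slots = slots
        self.stored = []

    @property
    def credits(self):
        """Credits the arbitrator may hand out: one per free buffer slot."""
        return self.slots - len(self.stored)

    def accept(self, frame):
        """Store a frame that was sent under a grant."""
        if self.credits == 0:
            raise RuntimeError("no credit: frame was sent without a grant")
        self.stored.append(frame)

    def drain(self):
        """Remove the oldest frame, which frees a slot and returns a credit."""
        return self.stored.pop(0)
```

With two slots, the arbitrator in this sketch could issue two grants for the same output back to back without waiting for the first frame to leave the buffer. A third grant would become possible only after a frame is drained.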
In accordance with various embodiments, request messages sent by the input ports are intercepted by the line card and sent to the arbitrator. For instance, the request messages may be sent in a list to the arbitrator.
A second line card 1224 includes input port C 1226 and input port D 1228. Input port C 1226 has an associated set of virtual output queues 1230-1236 and input port D 1228 has an associated set of virtual output queues 1238-1244. Each set of virtual output queues 1230-1236 and 1238-1244 includes a virtual output queue for each output port A-D, as shown. In this example, line card 2 1224 has three pending requests from input port C and zero pending requests from input port D. The second line card 1224 sends a corresponding list of requests 1250 via the cross-bar 1246 to the arbitrator (not shown). The list of request messages may include one or more requests associated with one or more of the input ports, as shown. Thus, the line card sends a list of request messages on behalf of one or more input ports of the line card. As described above, each list of request messages indicates an order in which the associated request messages were sent by the line card to the arbitrator for processing. In the above description, request messages are generated for each packet and sent to the arbitrator. However, in a preferred embodiment, the number of request messages sent is less than the number of requests (e.g., packets) in the virtual output queues.
When the arbitrator receives the request messages from the line cards, it preferably stores or tracks these request messages to enable grant messages to be generated and transmitted to the appropriate line cards and/or input ports in the corresponding order.
As shown, each request message is sorted and stored in one of the output queues. For instance, a line card identifier identifying the line card from which the request message was received may be stored in the output queue. In this example, the first output queue 1302 associated with output port A includes a single request message associated with the first line card. The second output queue 1304 associated with output port B includes a set of request messages 1312 including three request messages received from the first line card and one request message received from the second line card. The third output queue 1306 associated with output port C includes a single request message 1314 received from the first line card, while the fourth output queue 1308 associated with output port D includes two request messages 1316 received from the second line card. Each of the request messages may also identify one of the output ports. Note that the requests are inserted into the queue in the order that they are received from the line cards in order to provide fairness between the line cards. For example, in Queue B 1304, the request from line card 2 is behind the requests from line card 1 because the request from line card 2 arrives later in time.
When the arbitrator determines that the output port is capable of receiving one or more packets or frames for transmission, it generates or sends a grant message. For instance, when the arbitrator receives an available message (e.g., credit) or determines that a credit is available for the output port, it may send a grant message to the line card identified in the output queue. In this manner, grant messages for a given output port may be processed in the order in which request messages were received.
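By way of illustration only, the sorting of request messages into per-output queues and the resulting grant order may be sketched as follows. The tuple representation of a request message and the function names are assumptions made for this sketch. The `arrivals` list encodes the example described above, under the further assumption that all requests from the first line card arrive before those from the second.

```python
from collections import deque

OUTPUT_PORTS = "ABCD"


def sort_requests(arrivals):
    """Store each request's line card identifier in its output port's queue."""
    queues = {port: deque() for port in OUTPUT_PORTS}
    for line_card, port in arrivals:
        queues[port].append(line_card)
    return queues


def grant(queues, port):
    """Use one credit for the port: return the line card to grant, or None."""
    return queues[port].popleft() if queues[port] else None


# (line card, output port) in assumed arrival order, per the example above.
arrivals = [
    (1, "A"), (1, "B"), (1, "B"), (1, "B"), (1, "C"),
    (2, "B"), (2, "D"), (2, "D"),
]
```

Sorting these arrivals yields a queue for output port B holding three entries for line card 1 followed by one entry for line card 2. Successive credits for output port B are therefore granted to line card 1 three times before line card 2 is granted.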
As described above, the arbitrator keeps track of requests and credits for each output port. In this manner, it may determine when a grant may be sent to the requesting input port and/or line card. As described above with reference to
While the arbitrator may transmit a grant message immediately upon generating it, the arbitrator may instead temporarily store grant messages before transmitting them.
Through the generation and transmission of a superframe within a switch using an arbitration system, it is possible to maximize the amount of data transmitted by a switch while controlling the congestion at the output ports. Accordingly, the throughput of the switch is maximized while minimizing the time delay imposed by an arbitrator.
Although illustrative embodiments and applications of this invention are shown and described herein, many variations and modifications are possible which remain within the concept, scope, and spirit of the invention, and these variations would become clear to those of ordinary skill in the art after perusal of this application. For instance, the present invention is described as being applied to frames. However, it should be understood that the invention is not limited to such implementations, but instead would equally apply to packets as well. In addition, it is possible to support intentional re-ordering of packets and/or frames by attaching a priority to the request, credit, and/or grant messages, which may then be matched with the priority of the packets/frames. Moreover, the present invention would apply regardless of the context and system in which it is implemented. Thus, broadly speaking, the present invention need not be performed using the operations described above, but may be used to support other operations in a network such as a storage area network.
In addition, although an exemplary switch is described, the above-described embodiments may be implemented in a variety of network devices (e.g., servers) as well as in a variety of mediums. For instance, instructions and data for implementing the above-described invention may be stored on a disk drive, a hard drive, a floppy disk, a server computer, or a remotely networked computer. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
This application is related to co-pending application Ser. No. ______, Attorney Docket No. ANDIP012, entitled “Arbitration System,” by Kloth et al., filed on the same day as the present application.