Within the SAN 104, one or more switches 112 provide connectivity, routing, and other SAN functionality. Some of the switches 112 may be configured as a set of blade components inserted into a chassis or as rackable or stackable modules. The chassis, for example, may comprise a back plane or mid-plane into which the various blade components, such as switching blades and control processor blades, are inserted. Rackable or stackable modules may be interconnected using discrete connections, such as individual or bundled cabling.
In the illustration of
The second level arbitration system then arbitrates between information flows received from the arbitration segments of the first level arbitration system based upon the aggregate weights received along with those information flows. The second level arbitration system then forwards a selected information flow to an egress point of the stage. The stage may, for example, comprise a portion of a switch, a switch, or a switch network. The stage may also be scalable such that the second level arbitration system further aggregates the aggregate weights received from active arbitration segments of the first level arbitration system to determine a stage weight associated with the information flow forwarded to the egress point of the stage. This stage weight is then forwarded to an ingress point of a second stage disposed downstream of the stage. The second stage receives input information flows at a plurality of ingress points including the information flow received from the egress point of the prior stage. The second stage then uses the stage weight received along with the information flow of the prior stage to arbitrate between its information flow inputs as described above.
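By way of a non-limiting illustration, the following minimal sketch (in Python; the names SegmentOutput and second_level_arbitrate, the flow identifiers, and the weighted random pick standing in for whatever arbitration policy the second level actually employs are all hypothetical) models this feed-forward hand-off: each first level segment forwards a selected information flow together with its aggregate weight, the second level arbiter selects among the segments based upon those weights, and the stage weight it computes accompanies the selected flow to the ingress point of the next stage.

```python
from dataclasses import dataclass
from typing import List, Tuple
import random

@dataclass
class SegmentOutput:
    flow_id: str           # identifier of the information flow selected by a first level segment
    aggregate_weight: int  # sum of the weights of that segment's active ingress points

def second_level_arbitrate(segments: List[SegmentOutput]) -> Tuple[str, int]:
    """Select one segment output based upon its aggregate weight and return it
    together with the stage weight (the sum of all active aggregate weights)."""
    stage_weight = sum(s.aggregate_weight for s in segments)
    # A weighted random pick is used here only as a stand-in for a weighted arbiter.
    selected = random.choices(segments, weights=[s.aggregate_weight for s in segments])[0]
    # The stage weight accompanies the selected flow to the next stage's ingress point.
    return selected.flow_id, stage_weight

if __name__ == "__main__":
    outputs = [SegmentOutput("flow-A", 4), SegmentOutput("flow-B", 1), SegmentOutput("flow-C", 3)]
    print(second_level_arbitrate(outputs))
```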
The computing and storage framework 100 may further comprise a management client 114 coupled to the switches 112, such as via an Ethernet connection 116. The management client 114 may be an integral component of the SAN 104 or may be external to the SAN 104. The management client 114 provides user control and monitoring of various aspects of the switch and attached devices, including without limitation, zoning, security, firmware, routing, addressing, etc. The management client 114 may identify at least one of the managed switches 112 using a domain ID, a World Wide Name (WWN), an IP address, a Fibre Channel address (FCID), a MAC address, or another identifier, or may be directly attached (e.g., via a serial cable). The management client 114 therefore can send a management request directed to at least one switch 112, and the switch 112 will perform the requested management function. The management client 114 may alternatively be coupled to the switches 112 via one or more of the application clients 106, the LAN 102, one or more of the application servers 108 and 109, one or more of the application data storage devices 110, directly to at least one switch 112 (such as via a serial interface), or via any other type of data connection.
The stage 200 of the computing and storage framework may comprise, for example, a portion of a LAN or a SAN. In the embodiment shown in
The stage 200 comprises a dual-level fairness arbitration system in which each level comprises an independent arbiter. The independent arbiters of each stage, for example, may be used to approximate a global arbiter while only requiring a single direction of control communication (i.e., the system only requires feed-forward control communication, not feedback control communication, although feedback control communication may also be used). The stage 200 comprises a first level arbitration system 202 and a second level arbitration system 204. For simplicity, only two levels of arbitration are shown, although the stage 200 may include any number of additional levels. The first level arbitration system 202 comprises a plurality of ingress points 206, such as input ports of a switch, ultimately providing a path through the second level arbitration system 204 to a common egress point 208, such as an output terminal of a switch. Although only a single egress point 208 is shown in the example of
Each ingress point 206 and egress point 208 receives and transmits any number of “flows.” Each flow, for example, may comprise a uniquely identifiable series of frames or packets that arrive at a specific ingress point 206 and depart from a specific egress point 208. Other aspects of a frame or packet may be used to further distinguish one flow from another and there can be many flows using the same ingress point 206 and egress point 208 pair. Each flow may thus be managed independently of other flows.
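As a minimal sketch of such flow identification (in Python; the field names ingress, egress, and vf_id are hypothetical), a flow may be keyed by its ingress/egress pair together with any further distinguishing frame fields:

```python
from collections import defaultdict

# A flow is keyed here by its (ingress point, egress point) pair; an additional
# frame field (a hypothetical virtual fabric identifier, vf_id) further
# subdivides flows that share the same ingress/egress pair.
flow_table = defaultdict(list)

def classify(frame: dict) -> tuple:
    key = (frame["ingress"], frame["egress"], frame.get("vf_id"))
    flow_table[key].append(frame)
    return key

print(classify({"ingress": 1, "egress": 8, "vf_id": 3, "payload": b"..."}))
print(classify({"ingress": 1, "egress": 8, "vf_id": 7, "payload": b"..."}))  # distinct flow, same port pair
```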
The first level arbitration system 202 comprises a plurality of segments 210, 212, and 214 that provide separate paths to the second level arbitration system 204 of the stage 200. At least one of these segments receives information flow inputs (e.g., packets or frames) from at least one ingress point 206, arbitrates between one or more of the inputs provided to the segment, and provides an output information flow corresponding to a selected one of the ingress points 206 to the second level arbitration system 204. Although the first and third segments 210 and 214 of the example shown in
In the example shown in
As shown in
In
The arbiters 218 may arbitrate among information flows received at their corresponding ingress points 206 targeting a single virtual output queue 220 (e.g., a FIFO queue) based upon the weights assigned to or otherwise associated with the ingress points 206, the virtual input queues 216, or a combination thereof. For example, the weights of the ingress points 206 may be used to determine a portion of the bandwidth or a portion of the total frames or packets available to the arbiter 218 that is allocated to information flows received from each ingress point 206. As shown in
The arbiters 218, alternatively, may utilize weighted round robin queuing to arbitrate between information flows in the virtual input queues 216 of the segments 210, 212, and 214 based upon the weights associated with the flows. The selected information flows are then forwarded to the second level arbitration system 204 for further arbitration. Alternatively, the arbiters 218 may bias their input information flows (e.g., bias their packet or frame grant) to achieve a weighted bandwidth allocation based upon the assigned weights of the ingress points or virtual input queues. In one configuration, for example, the arbiter may back pressure the ingress points 206 exceeding their portion of the bandwidth.
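A simplified sketch of one possible realization of such a weighted round robin arbiter is shown below (in Python; the class name, the per-ingress FIFO queues, and the integer weights are illustrative assumptions rather than a definitive implementation of the arbiters 218):

```python
from collections import deque

class WeightedRoundRobinArbiter:
    """Simplified arbiter: each virtual input queue is granted up to 'weight'
    frames per round, approximating a weight-proportional bandwidth share."""

    def __init__(self, weights):
        # weights: mapping of ingress point id -> assigned weight (e.g., a, b, c, d)
        self.queues = {ingress: deque() for ingress in weights}
        self.weights = dict(weights)

    def enqueue(self, ingress, frame):
        self.queues[ingress].append(frame)

    def arbitrate_round(self):
        """Yield the frames selected for the virtual output queue in one round."""
        for ingress, weight in self.weights.items():
            queue = self.queues[ingress]
            for _ in range(weight):     # grant at most 'weight' frames per round
                if not queue:           # empty (inactive) queue forfeits its grants
                    break
                yield queue.popleft()

arb = WeightedRoundRobinArbiter({"in0": 2, "in1": 1, "in2": 1})
for i in range(4):
    arb.enqueue("in0", f"p0-{i}")
    arb.enqueue("in1", f"p1-{i}")
print(list(arb.arbitrate_round()))  # roughly 2:1 grants for in0 vs. in1; in2 is inactive
```

In this sketch an empty queue simply forfeits its grants for the round, so bandwidth unused by an inactive ingress point is naturally consumed by the remaining active queues.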
The weights associated with each of the ingress points 206, the virtual input queues 216, or the input flows of a particular segment 210, 212, or 214 are aggregated to provide an aggregate weight for information flows forwarded from that segment. The aggregate weight associated with an information flow is forwarded to the second level arbitration system 204 along with its associated information flow. The aggregate weight forwarded to the second level arbitration system 204 may be forwarded in-band with the information flow (e.g., within a control frame of the information flow) or may be forwarded out-of-band with respect to the information flow (e.g., along a separate control path).
The aggregate weight, for example, may comprise the total weight assigned to active ingress points 206 of the segment 210, 212, or 214. An active ingress point, for example, may be defined as an ingress point that has had at least one information flow (e.g., at least one packet or frame) received within a predetermined period of time (e.g., one millisecond prior to the current time) or may comprise an ingress point having at least one information flow (e.g., at least one packet or frame) within its corresponding virtual input queue 216 that is vying for resources of the stage 200 at the present time. Thus, assuming each ingress point 206 of the first segment 210 is active, the aggregated weight (a+b+c+d) of the first segment 210 is determined as the sum of the weights assigned to the ingress points 206 of the first segment 210 and is passed forward with an information flow from the first segment 210. If the second ingress point 206 of the first segment 210 (i.e., the ingress point assigned a weight of “b”) is inactive, however, the aggregated weight passed forward with an information flow at that time from the first segment 210 would be a+c+d. Where the weights of the ingress points 206 are equal (e.g., one), the aggregated weight determined for each segment corresponds to the number of active ingress points contributing to the segment at any particular point in time. The aggregated weight, however, may also be merely representative of such an algebraic sum or ratio. For example, the aggregate weight may be “compressed” so that fewer bits are required, or discrete levels (e.g., high, medium, and low) may be used to indicate one or more thresholds being met.
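A minimal sketch of this aggregation and optional compression is shown below (in Python; the function names, the representation of activity as a set of active ingress identifiers, and the threshold values are assumptions made only for illustration):

```python
def aggregate_weight(ingress_weights, active):
    """Sum the weights of the currently active ingress points of a segment.
    ingress_weights: mapping of ingress id -> assigned weight (a, b, c, d, ...)
    active: set of ingress ids considered active (e.g., a recent arrival or a
    non-empty virtual input queue)."""
    return sum(w for ingress, w in ingress_weights.items() if ingress in active)

def compress(weight, thresholds=(2, 5)):
    """Optional 'compression' of the aggregate weight into coarse levels so that
    fewer bits are carried with the flow (the thresholds here are arbitrary)."""
    if weight <= thresholds[0]:
        return "low"
    if weight <= thresholds[1]:
        return "medium"
    return "high"

segment_weights = {"in0": 1, "in1": 1, "in2": 1, "in3": 1}               # a, b, c, d
print(aggregate_weight(segment_weights, active={"in0", "in2", "in3"}))    # b inactive -> 3
print(compress(3))                                                        # -> "medium"
```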
The second level arbitration system 204 receives information flows from the segments 210, 212, and 214, and arbitrates between these flows based on the aggregated weights received from the corresponding segments 210, 212, and 214. Assuming each ingress point 206 is active, the information flow received from the virtual output queue 220 of the first segment 210 has an aggregated weight associated with it of a+b+c+d (i.e., the sum of the weights of the four active ingress points of the first segment 210), the information flow received from the virtual output queue 220 of the second segment 212 has an aggregated weight associated with it of “e” (i.e., the weight associated with the active single ingress point of the second segment 212), and the information flow received from the virtual output queue 220 of the third segment 214 has an aggregated weight associated with it of f+g+h (i.e., the sum of the weights associated with the three active ingress points of the third segment 214). The arbiter 222 then arbitrates between the information flows based upon the aggregated weights associated with each of the information flows, such as described above with respect to the arbiters 218 of the first level arbitration system 202. The arbiter 222, for example, may utilize weighted round robin queuing to arbitrate between information flows in the virtual output queues 220 of the segments 210, 212, and 214 based upon the aggregated weights received from the segments. The mathematical algorithm used here, for example, may comprise the same algorithm described above with respect to the segments 210, 212, and 214. The selected one of the information flows is forwarded to the egress point 208 of the stage 200. Alternatively, the arbiter 222 may bias its selection of input information flows (e.g., bias their packet or frame grant for each input) to achieve a weighted bandwidth, frame, or packet allocation based upon their assigned aggregate weights. In one configuration, for example, the arbiter may back pressure the segments exceeding their portion of the bandwidth.
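As a brief worked illustration (assuming unit weights a = b = ... = h = 1 and all ingress points active, which is only one possible assignment), the long-run share of the egress bandwidth granted to each segment by the arbiter 222 is proportional to its aggregate weight:

```python
# Aggregate weights seen by the second level arbiter under unit weights:
# 4 for the first segment, 1 for the second, and 3 for the third.
aggregates = {"segment_210": 4, "segment_212": 1, "segment_214": 3}
total = sum(aggregates.values())
shares = {segment: weight / total for segment, weight in aggregates.items()}
print(shares)  # {'segment_210': 0.5, 'segment_212': 0.125, 'segment_214': 0.375}
```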
The arbitration system of the stage 200 further allows for scaling between multiple stages. Where at least one further stage is located downstream of the stage 200 shown, the arbiter 222 of the second level arbitration system 204 may aggregate the weights of the information flows received from the virtual output queues 220 of the segments 210, 212, and 214 to produce an aggregated weighting associated with the information flow forwarded to the egress point 208 of the stage 200. Thus, in the example shown in
Alternatively, such as where scaling across multiple stages is not required, an information flow selected by the arbiter 222 may be forwarded to the egress point 208 of the stage 200 without a weight associated with it (or with the weight associated with the flow prior to arbitration by the arbiter 222).
The arbitration system of the stage 200 thus comprises dual levels of arbitration that only require a single direction of control communication (i.e., a feed-forward system) and do not require feedback control (although feedback control may be used). The system may further be variable to compensate for inactive ingress points and arbitrate based upon the number of active ingress points competing for resources of the stage. Thus, as one or more ingress points become inactive, the arbiters 218 and 222 may immediately dedicate the remaining bandwidth to other information flow inputs that are still active. Feedback loops that change upstream conditions, and the corresponding delays they introduce, are unnecessary.
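The following short sketch (in Python; the function name and ingress identifiers are hypothetical) illustrates this behavior: the weight-proportional shares are simply recomputed over whatever set of ingress points is currently active, so no feedback to upstream senders is required when an ingress point goes idle:

```python
def shares(weights, active):
    """Instantaneous weight-proportional shares over the active ingress points.
    The next arbitration round recomputes the split from the current active set,
    so bandwidth freed by an idle ingress point is redistributed immediately."""
    total = sum(weights[i] for i in active)
    return {i: weights[i] / total for i in active}

weights = {"in0": 1, "in1": 1, "in2": 1, "in3": 1}
print(shares(weights, active={"in0", "in1", "in2", "in3"}))  # each active flow gets 0.25
print(shares(weights, active={"in0", "in2"}))                # remaining flows get 0.5 each
```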
The allocated segment 310 comprises at least one virtual input queue 316, an arbiter 318, and a virtual output queue 320. The virtual input queues 316 in this example, however, are not tied to a particular ingress point 306, but rather are shared between one or more ingress points providing a path to a common egress point 308. In one configuration, for example, a time division multiplexing (TDM) bus may be used to allow flows received at various ingress points 306 to be transmitted to a particular one of the virtual input queues 316 of the allocated segment 310 or to the unallocated segment 312. Other configurations, however, may also be used. In this manner, a particular stage may share virtual input queues 316 without the need to provide a virtual input queue 316 for every ingress point 306 and egress point 308 combination in the stage. Once an information flow input is received by one of the virtual input queues 316, the allocated segment operates as described above with respect to
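A minimal sketch of such shared virtual input queues is shown below (in Python; the queue pool keyed by egress point, the MAX_SHARED_QUEUES limit, and the overflow path to the unallocated segment are illustrative assumptions rather than a required arrangement):

```python
from collections import deque

shared_viqs = {}         # egress point id -> shared virtual input queue
MAX_SHARED_QUEUES = 4    # hypothetical limit on the allocated segment's queue pool

def admit(frame):
    """Place a frame into a virtual input queue shared by all ingress points
    targeting the same egress point, rather than one queue per ingress/egress
    combination; overflow traffic falls through to the unallocated segment."""
    egress = frame["egress"]
    if egress not in shared_viqs and len(shared_viqs) >= MAX_SHARED_QUEUES:
        return "unallocated"
    shared_viqs.setdefault(egress, deque()).append(frame)
    return f"allocated viq for egress {egress}"

print(admit({"ingress": 0, "egress": 8}))
print(admit({"ingress": 5, "egress": 8}))  # different ingress point, same shared queue
```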
In the unallocated segment 312, however, information flow inputs received from at least one of the ingress points targeting the egress point 308 are directed into a virtual output queue 321. From the virtual output queue 321, the information flows are forwarded to the second level arbitration system 304, where they are processed without regard to fairness concerns. High priority flows (e.g., fabric traffic or management traffic) may be directly provided to the second level arbitration system 304, where they are associated with a weight greater than the aggregated weight received from the allocated segment and thus have a higher relative priority than the flows received from the allocated segment. Low priority flows (e.g., background flows) may, for example, be associated with a weight lower than the aggregated weight received from the allocated segment and thus have a lower relative priority than the flows received from the allocated segment. The stage 300 may, for example, comprise a plurality of allocated segments and/or unallocated segments (e.g., a high priority unallocated segment and a low priority unallocated segment). In this example, medium priority information flows comprising the bulk of the traffic (e.g., user data traffic flows) are forwarded through the allocated segment 310 and have a relative priority lower than the unallocated high priority information flows and a relative priority higher than the unallocated low priority information flows.
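As a small illustrative sketch (in Python; the function name and the margin parameter are hypothetical), the unallocated segments may be given fixed weights positioned above and below the allocated segment's current aggregate weight:

```python
def unallocated_weights(allocated_aggregate, margin=1):
    """Choose fixed weights for the unallocated segments relative to the
    allocated segment's current aggregate weight (the margin is arbitrary):
    high priority traffic outranks the allocated flows, while low priority
    (background) traffic ranks below them."""
    return {
        "high_priority_unallocated": allocated_aggregate + margin,
        "allocated": allocated_aggregate,
        "low_priority_unallocated": max(allocated_aggregate - margin, 1),
    }

print(unallocated_weights(allocated_aggregate=4))
# {'high_priority_unallocated': 5, 'allocated': 4, 'low_priority_unallocated': 3}
```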
The information flows (e.g., packets or frames) are received at the ingress points 306 targeting the egress point 308. The information flows comprise at least a destination identifier and other information from which the egress point 308 can be derived. The information flows may further comprise additional fields, such as a source identifier and/or a virtual fabric identifier, that may be used to assign the information flow to one of the allocated virtual input queues 316. The information flows thus may be assigned to the virtual input queues 316 of the allocated segment 310. In addition, one or more of the individual virtual input queues may be individually assignable, e.g., information flows may be directly assigned to a particular virtual input queue instead of merely to the allocated segment. If an information flow does not identify a virtual input queue 316, however, the information flow is transferred to the virtual output queue 321 of the unallocated segment 312. Frames that were not assigned to the allocated segment may thus be transferred to the unallocated segment and treated with a fixed weight by the arbiter 322. Alternatively, a look up table, such as a content addressable memory (CAM), may be used by the stage to identify a path for an information flow received at an ingress point 306 of the stage 300. If an information flow comprises a destination ID identifying the egress point 308, and the flow is received by the stage at a particular ingress point 306, the look up table may identify a particular virtual input queue 316 or the virtual output queue 321 of the unallocated segment 312. In this example, the path of the information flow is tied to the ingress point 306 at which it is received and the egress point 308 it is targeting.
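A minimal sketch of such a lookup is shown below (in Python; the table entries, queue names, and destination identifiers are illustrative, and a hardware CAM is modeled here simply as a dictionary):

```python
# A CAM-like lookup keyed by (ingress point, destination identifier): the result
# names either a specific allocated virtual input queue or the unallocated
# segment's virtual output queue. Entries and identifiers are illustrative only.
path_table = {
    (0, "D_8"): ("allocated", "viq_0"),
    (1, "D_8"): ("allocated", "viq_1"),
    (2, "D_8"): ("unallocated", "voq_321"),
}

def route(ingress, destination_id):
    # Frames with no matching entry fall through to the unallocated segment
    # and are treated with a fixed weight by the second level arbiter.
    return path_table.get((ingress, destination_id), ("unallocated", "voq_321"))

print(route(1, "D_8"))  # ('allocated', 'viq_1')
print(route(7, "D_8"))  # ('unallocated', 'voq_321')
```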
The switch segments 410, 412, and 414 receive information flows from the ingress points 406. Each of the ingress points 406 has a weight assigned to it. The switch segments arbitrate between information flows received from active ingress points 406 based on the weights of those ingress points 406. Weights assigned to the active ingress points 406 are aggregated for each of the switch segments 410, 412, and 414 to determine aggregate weights for the output ports of the switch segments 410, 412, and 414. The aggregate weight of each switch segment at a particular point in time is forwarded with information flows passed from the switch segments 410, 412, and 414 to the switch 422 of the second level arbitration system 404. The switch 422 then uses the aggregated weights received with the information flows from the switch segments 410, 412, and 414 of the first level arbitration system 402 to arbitrate between those information flows and forwards the selected information flow to the egress point 408 of the stage 400.
Although only two hierarchical levels of the switch system are shown for the stage 400, any number of additional switch levels may be utilized. In such an example, each level may arbitrate between information flows received from active ingress points based upon weights associated with the information flows and aggregate those weights to determine an aggregated weight for that level. The level forwards a selected information flow along with the aggregate weight determined for that level. The switch of the next level receives information flows from a plurality of upstream switches, along with their associated aggregate weights, and arbitrates between these received information flows based upon the associated aggregate weights. That level also aggregates the received aggregate weights and forwards the newly aggregated weight with a selected information flow to another downstream switch, until a final switch provides the selected information flow to the egress point of the stage 400.
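The following short sketch (in Python; the function name and the list-of-levels representation are assumptions made only for illustration) shows how the weight carried with a selected information flow may be re-aggregated at each successive switch level:

```python
def forward_through_levels(levels):
    """levels: a list of lists, where each inner list holds the weights of the
    active inputs arriving locally at one switch level. Each level re-aggregates
    the weights it receives (local inputs plus the aggregate forwarded from the
    prior level) and passes a single aggregate weight to the next level."""
    aggregate = None
    for local_inputs in levels:
        received = list(local_inputs) + ([aggregate] if aggregate is not None else [])
        aggregate = sum(received)   # weight forwarded with the selected flow
    return aggregate

# Two upstream switch segments with 4 and 3 active unit-weight ingress points,
# followed by a downstream switch that contributes 2 additional active inputs:
print(forward_through_levels([[4, 3], [2]]))  # -> 9
```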
Although the embodiments shown in
The embodiments of the invention described herein are implemented as logical steps in one or more computer systems. The logical operations of the present invention are implemented (1) as a sequence of processor-implemented steps executing in one or more computer systems and (2) as interconnected machine or circuit modules within one or more computer systems. The implementation is a matter of choice, dependent on the performance requirements of the computer system implementing the invention. Accordingly, the logical operations making up the embodiments of the invention described herein are referred to variously as operations, steps, objects, or modules. Furthermore, it should be understood that logical operations may be performed in any order, unless explicitly claimed otherwise or a specific order is inherently necessitated by the claim language.
The above specification, examples and data provide a complete description of the structure and use of exemplary embodiments of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended. Furthermore, structural features of the different embodiments may be combined in yet another embodiment without departing from the recited claims.