This invention relates to virtual channel arbitration in switched fabric networks.
PCI (Peripheral Component Interconnect) Express is a serialized I/O interconnect standard developed to meet the increasing bandwidth needs of the next generation of computer systems. PCI Express was designed to be fully compatible with the widely used PCI local bus standard. PCI is beginning to hit the limits of its capabilities, and while extensions to the PCI standard have been developed to support higher bandwidths and faster clock speeds, these extensions may be insufficient to meet the rapidly increasing bandwidth demands of PCs in the near future. With its high-speed and scalable serial architecture, PCI Express may be an attractive option for use with or as a possible replacement for PCI in computer systems. The PCI Special Interest Group (PCI-SIG) manages PCI specifications as open industry standards, and provides the specifications to its members.
Advanced Switching (AS) is a technology which is based on the PCI Express architecture, and which enables standardization of various backplane architectures. AS utilizes a packet-based transaction layer protocol that operates over the PCI Express physical and data link layers. The AS architecture provides a number of features common to multi-host, peer-to-peer communication devices such as blade servers, clusters, storage arrays, telecom routers, and switches. These features include support for flexible topologies, packet routing, congestion management (e.g., credit-based flow control), fabric redundancy, and fail-over mechanisms. The Advanced Switching Interconnect Special Interest Group (ASI-SIG) is a collaborative trade organization chartered with providing a switching fabric interconnect standard, specifications of which it provides to its members.
Each switch element 102 and end point 104 has an Advanced Switching (AS) interface that is part of the AS architecture defined by the “Advance Switching Core Architecture Specification” (available from the Advanced Switching Interconnect-SIG at www.asi-sig.com). The AS architecture utilizes a packet-based transaction layer protocol that operates over the PCI Express physical and data link layers 202, 204, as shown in
AS uses a path-defined routing methodology in which the source of a packet provides all information required by a switch (or switches) to route the packet to the desired destination.
A path may be defined by the turn pool 402, turn pointer 404, and direction flag 406 in the route header 302, as shown in
The PI field in the AS route header 302 determines the format of the encapsulated packet 304. The PI field is inserted by the end point 104 that originates the AS packet and is used by the end point that terminates the packet to correctly interpret the packet contents. The separation of routing information from the remainder of the packet enables an AS fabric to tunnel packets of any protocol.
PIs represent fabric management and application-level interfaces to the switched fabric network 100. Table 1 provides a list of PIs currently supported by the AS Specification.
PIs 0-7 are used for various fabric management tasks, and PIs 8-254 are application-level interfaces. As shown in Table 1, PI-8 is used to tunnel or encapsulate a native PCI Express packet. Other PIs may be used to tunnel various other protocols, e.g., Ethernet, Fibre Channel, ATM (Asynchronous Transfer Mode), InfiniBand®, and SLS (Simple Load Store). An advantage of an AS switch fabric is that a mixture of protocols may be simultaneously tunneled through a single, universal switch fabric, which makes the fabric attractive for next generation modular applications such as media gateways, broadband access routers, and blade servers.
The AS architecture supports the establishment of direct endpoint-to-endpoint logical paths through the switch fabric known as Virtual Channels (VCs). This enables a single switched fabric network to service multiple, independent logical interconnects simultaneously, each VC interconnecting AS end points for control, management and data. Each VC provides its own queue so that blocking in one VC does not cause blocking in another. Each VC may have independent packet ordering requirements, and therefore each VC can be scheduled without dependencies on the other VCs.
The AS architecture defines three VC types: Bypass Capable Unicast (BVC); Ordered-Only Unicast (OVC); and Multicast (MVC). BVCs have bypass capability, which may be necessary for deadlock free tunneling of some, typically load/store, protocols. OVCs are single queue unicast VCs, which are suitable for message oriented “push” traffic. MVCs are single queue VCs for multicast “push” traffic.
The AS architecture provides a number of congestion management techniques, one of which is a credit-based flow control technique that ensures that packets are not lost due to congestion. Link partners (e.g., an end point 104 and a switch element 102) in the network exchange flow control credit information to guarantee that the receiving end of a link has the capacity to accept packets. Flow control credits are computed on a per-VC basis by the receiving end of the link and communicated to the transmitting end of the link. Typically, packets are transmitted only when there are enough credits available for a particular VC to carry the packet. Upon sending a packet, the transmitting end of the link debits its available credit account by an amount of flow control credits that reflects the packet size. As the receiving end of the link processes (e.g., forwards to an end point 104) the received packet, space is made available on the corresponding VC and flow control credits are returned to the transmitting end of the link. The transmitting end of the link then adds the flow control credits to its credit account.
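The credit accounting described above can be illustrated with a minimal sketch. The class name CreditedLink, its method names, and the credit amounts are assumptions made for illustration only; they are not taken from the AS Specification, which defines its own credit units and exchange mechanism.

```python
class CreditedLink:
    """Transmit side of one link, tracking flow control credits per VC.

    Hypothetical model: credit units and initial values are illustrative.
    """

    def __init__(self, initial_credits):
        # Maps a VC identifier to the credits advertised by the receiving end.
        self.credits = dict(initial_credits)

    def try_send(self, vc, packet_credits):
        """Send only if the VC has enough credits; debit the account on success."""
        if self.credits.get(vc, 0) < packet_credits:
            return False  # the packet is held back, not dropped
        self.credits[vc] -= packet_credits
        return True

    def on_credit_return(self, vc, returned):
        """The receiving end freed space on this VC and returned credits."""
        self.credits[vc] = self.credits.get(vc, 0) + returned


link = CreditedLink({0: 4, 1: 2})
sent_first = link.try_send(0, 3)    # VC0 has 4 credits, so the packet goes out
sent_second = link.try_send(0, 3)   # only 1 credit remains, so the packet waits
link.on_credit_return(0, 3)         # the receiver has processed the first packet
sent_retry = link.try_send(0, 3)    # credits are available again
```

Because each VC keeps a separate credit account, exhausting the credits of one VC in this sketch does not affect transmission on another.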
The AS architecture supports the implementation of an AS Configuration Space in each AS device (e.g., AS end point 104) in the network. The AS Configuration Space is a storage area that includes fields to specify device characteristics as well as fields used to control the AS device. The AS Configuration Space includes up to 16 apertures where configuration information can be stored. Each aperture includes up to 4 Gbytes of storage and is 32-bit addressable. The configuration information is presented in the form of capability structures and other storage structures, such as tables and a set of registers. Table 2 provides a set of capability structures (“AS Native Capability Structures”) that are defined by the AS Specification and stored in aperture 0 of the AS Configuration Space.
Legend:
O = Optional normative
R = Required
R w/OE = Required with optional normative elements
N/A = Not applicable
The information stored in the AS Native Capability Structures can be accessed through node configuration packets, e.g., PI-4 packets, which are used for device management.
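As a rough illustration of the addressing model, an AS Configuration Space can be pictured as a sparse store keyed by aperture number and address. The sketch below is a simplified model under stated assumptions: the class ConfigSpace, the convention that unwritten locations read as zero, and the example address are hypothetical and are not defined by the AS Specification.

```python
MAX_APERTURES = 16          # up to 16 apertures per AS Configuration Space
APERTURE_SPAN = 1 << 32     # each aperture is 32-bit addressable (4 Gbytes)


class ConfigSpace:
    """Sparse model of a configuration space (hypothetical, for illustration)."""

    def __init__(self):
        self._cells = {}  # (aperture, address) -> stored value

    @staticmethod
    def _check(aperture, address):
        if not 0 <= aperture < MAX_APERTURES:
            raise ValueError("aperture number out of range")
        if not 0 <= address < APERTURE_SPAN:
            raise ValueError("address does not fit in 32 bits")

    def read(self, aperture, address):
        self._check(aperture, address)
        # Assumption: locations never written read back as zero.
        return self._cells.get((aperture, address), 0)

    def write(self, aperture, address, value):
        self._check(aperture, address)
        self._cells[(aperture, address)] = value


space = ConfigSpace()
space.write(0, 0x100, 0xABCD)        # aperture 0 holds the native capability structures
stored = space.read(0, 0x100)
untouched = space.read(0, 0x104)
```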
In one implementation of a switched fabric network, the AS devices on the network are restricted to read-only access of another AS device's AS Native Capability Structures, with the exception of one or more AS end points that have been elected as fabric managers.
A fabric manager election process may be initiated by any of a variety of hardware or software mechanisms to elect one or more fabric managers for the switched fabric network. A fabric manager is an AS end point that “owns” all of the AS devices, including itself, in the network. If multiple fabric managers, e.g., a primary fabric manager and a secondary fabric manager, are elected, then each fabric manager may own a subset of the AS devices in the network. Alternatively, the secondary fabric manager may declare ownership of the AS devices in the network upon a failure of the primary fabric manager, e.g., resulting from a fabric redundancy and fail-over mechanism.
Once a fabric manager declares ownership, it has privileged access to its AS devices' AS Native Capability Structures. In other words, the fabric manager has read and write access to the AS Native Capability Structures of all of the AS devices in the network.
As previously discussed, the AS Native Capability Structures of an AS device are accessible through PI-4 packets. Accordingly, each AS device in the switched fabric network can be implemented to include an AS PI-4 unit for processing PI-4 packets received through the network from a fabric manager or another AS device. In the examples to follow, the term “local AS device” refers to an AS device that has received a PI-4 packet and is processing the PI-4 packet, and the term “remote AS device” refers to an AS device, e.g., a fabric manager or another AS device, on the network that is attempting to access the local AS device's AS Native Capability Structures.
Referring to
Packets received at the local AS device 500 over a switch fabric 524 are passed from the physical layer 504 and data/link layer 506 to the inbound packet director 508. The inbound packet director writes each incoming packet to a VC receive queue 516a-516d of a VC unit based on a TC-to-VC mapping that is stored at (or is otherwise accessible by) the inbound packet director 508. In the example of
Each VC receive queue 516a-516d can be implemented as a first-in-first-out (FIFO) structure that passes packets to its corresponding VC packet dispatch unit 518a-518d in the order it receives them. For example, packets on the VC2 receive queue 516c are passed to its corresponding VC2 packet dispatch unit 518c. Upon receipt of a packet, the VC packet dispatch unit 518a-518d determines the format of the packet by inspecting the PI field of the AS route header. For PI-4 packets, the VC packet dispatch 518a-518d performs one or more PI-4 packet validation operations. In one example, the VC packet dispatch unit 518a-518d performs a payload check to determine whether the actual payload size of the packet is equal to the payload size indicated in the packet header. In another example, the VC packet dispatch unit 518a-518d performs a configuration space permissions check to determine whether the AS device from which the PI-4 packet originated has the appropriate permission, e.g., a write permission, to access the target AS device's AS Native Capability Structures 512a.
If the PI-4 packet is invalid, the VC packet dispatch unit 518a-518d discards the PI-4 packet, generates an error signal, and sends the error signal to a processor (not shown) external to the VC packet dispatch unit 518a-518d. In one implementation, the external processor generates a PI-5 (event notification) packet in response to the error signal.
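The two validation operations and the discard path can be sketched as follows. This is a minimal illustration, not the dispatch unit's actual logic: the Pi4Packet fields, the set of permitted writers, the error strings, and the use of a list in place of the error signal are all assumptions introduced here, and the payload size is assumed to be counted in bytes.

```python
from dataclasses import dataclass


@dataclass
class Pi4Packet:
    """Hypothetical, simplified view of a received PI-4 packet."""
    source_id: int              # identifies the originating AS device
    is_write: bool              # True for a write request
    declared_payload_size: int  # size stated in the packet header (assumed bytes)
    payload: bytes              # payload actually received


def validate_pi4(packet, writers):
    """Return None if the packet is valid, otherwise a short error description."""
    # Payload check: actual size must equal the size stated in the header.
    if len(packet.payload) != packet.declared_payload_size:
        return "payload size mismatch"
    # Permissions check: only devices with write permission may write.
    if packet.is_write and packet.source_id not in writers:
        return "configuration space permission error"
    return None


def dispatch(packet, writers, error_log):
    """Discard invalid packets and record an error; accept valid ones."""
    error = validate_pi4(packet, writers)
    if error is not None:
        error_log.append(error)  # stands in for the error signal
        return False             # packet is discarded
    return True


good = Pi4Packet(source_id=1, is_write=True, declared_payload_size=4, payload=b"\x00" * 4)
short = Pi4Packet(source_id=1, is_write=True, declared_payload_size=8, payload=b"\x00" * 4)
intruder = Pi4Packet(source_id=9, is_write=True, declared_payload_size=4, payload=b"\x00" * 4)

errors = []
results = [dispatch(p, writers={1}, error_log=errors) for p in (good, short, intruder)]
```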
If the PI-4 packet is valid, the VC packet dispatch unit 518a-518d identifies the packet type using the field values associated with an Operation Type field in the AS route header. Table 3 shows how a packet is identified using the Operation Type field.
For each valid PI-4 packet, the VC packet dispatch unit 518a-518d sends an access request (e.g., a read request or a write request) to the ASCSA unit 514 for processing. The access request includes an aperture number and address (corresponding to the PI-4 packet header) of a location in the AS Configuration Space 512.
In one implementation, the ASCSA unit 514 arbitrates access to the AS Configuration Space 512 between the multiple access requests sent by the VC packet dispatch units 518a-518d in a round-robin fashion. The ASCSA unit 514 optionally arbitrates between access requests including those originated at a processor (not shown) local to the AS device 500 and received over communications paths other than a virtual channel of the switch fabric 524. If an access request is a write request, the ASCSA unit 514 writes data to a location in an AS Native Capability Structure 512a identified by the aperture number and address specified by the access request.
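One simple way to picture round-robin arbitration among several requesters is sketched below. The class RoundRobinArbiter and the request labels are hypothetical; the sketch assumes one pending-request queue per requester and does not model how the ASCSA unit is actually built.

```python
from collections import deque


class RoundRobinArbiter:
    """Grants pending requests to requesters in rotating order (illustrative)."""

    def __init__(self, num_requesters):
        self.queues = [deque() for _ in range(num_requesters)]
        self.next_index = 0  # requester to be considered first on the next grant

    def submit(self, requester, request):
        self.queues[requester].append(request)

    def grant(self):
        """Return (requester, request) for the next pending request, or None."""
        count = len(self.queues)
        for offset in range(count):
            index = (self.next_index + offset) % count
            if self.queues[index]:
                # Start the next search just after the requester served now.
                self.next_index = (index + 1) % count
                return index, self.queues[index].popleft()
        return None


arbiter = RoundRobinArbiter(4)       # e.g., four VC packet dispatch units
arbiter.submit(0, "read A")
arbiter.submit(0, "read B")
arbiter.submit(2, "write C")
grants = [arbiter.grant() for _ in range(4)]
```

In this sketch, requester 0 has two requests pending but is not served twice in a row while requester 2 is waiting. A local processor could be modeled as one more requester index.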
If an access request is a read request, the ASCSA unit 514 retrieves data from a location in the AS Native Capability Structure 512a specified by an aperture number and address specified by the access request. If a failure occurs before or while the data is being retrieved from the AS Native Capability Structure 512a, the ASCSA unit generates an error signal and sends the error signal to the VC packet dispatch unit 518a-518d that originated the access request (“originating VC packet dispatch unit”). In response to the error signal, the originating VC packet dispatch unit 518a-518d generates an AS payload having a PI-4 Read Completion with Error packet header. Within the PI-4 Read Completion with Error packet header, the VC packet dispatch unit 518a-518d provides a value in a Status Code field that indicates the type of failure that occurred during the data retrieval process. Any partial data that may have been retrieved is typically discarded rather than included in the payload of the generated packet for transmission to the remote AS device.
If the data retrieval is successful, the originating VC packet dispatch unit 518a-518d generates an AS payload by appending the retrieved data to a PI-4 Read Completion with Data packet header. Within the PI-4 Read Completion with Data packet header, the originating VC packet dispatch unit 518a-518d provides a value in the Payload Size field that indicates the size of the retrieved data.
In both cases, the VC packet dispatch unit 518a-518d generates a PI-4 packet by attaching an AS route header to the AS payload. The VC packet dispatch unit 518a-518d sends the generated PI-4 packet to its corresponding VC arbiter 520a-520d. For example, a PI-4 packet generated by the VC2 packet dispatch unit 518c is sent to the VC2 arbiter 520c, which passes the PI-4 packet to the VC2 transmit queue 522c.
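The two read completion cases can be sketched together. The dictionary layout, the ReadFailure exception, the placeholder route header, and the status code value 1 are assumptions chosen for illustration; the actual header formats and status codes are defined by the AS Specification and are not reproduced here.

```python
class ReadFailure(Exception):
    """Raised by the hypothetical read function; carries a status code."""

    def __init__(self, status_code):
        super().__init__(f"read failed with status {status_code}")
        self.status_code = status_code


def build_read_response(read_fn, aperture, address, length, route_header):
    """Build the outgoing packet for a read request (simplified model)."""
    try:
        data = read_fn(aperture, address, length)
    except ReadFailure as failure:
        # Partial data is discarded; only the status code is reported.
        payload = {"header": "Read Completion with Error",
                   "status_code": failure.status_code,
                   "data": b""}
    else:
        payload = {"header": "Read Completion with Data",
                   "payload_size": len(data),
                   "data": data}
    # In both cases a route header is attached to the payload.
    return {"route_header": route_header, "payload": payload}


def good_read(aperture, address, length):
    return bytes(length)  # zero-filled stand-in for register contents


def bad_read(aperture, address, length):
    raise ReadFailure(status_code=1)  # 1 is an arbitrary illustrative value


ok = build_read_response(good_read, 0, 0x10, 4, route_header="to-remote")
err = build_read_response(bad_read, 0, 0x10, 4, route_header="to-remote")
```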
In one implementation, the outbound packet arbiter 510 retrieves the PI-4 packets from the VC transmit queues 522a-522d in round-robin fashion and transfers the PI-4 packets to a remote AS device (not shown) through the switch fabric 524.
Referring to
When access to the local memory 614 is requested by the AS-Core receive unit 608, the AS-Core transmit unit 610, and the embedded micro-processor 612, a bus arbiter 616 authorizes the access using an arbitration scheme. Arbitration schemes typically try to balance two factors in choosing which device (i.e., the AS-Core receive unit 608, the AS-Core transmit unit 610, and the embedded micro-processor 612) to grant access to the bus. First, each device has a bus priority, and the highest-priority devices are serviced first. Second, to assure that no device, even with low priority, is completely locked out, the bus arbiter 616 uses a round-robin fairness protocol that does not grant a device which has just completed a bus operation access to the bus for a second operation until all the requesting devices have first been granted access to the bus.
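The balance between priority and fairness can be sketched as follows. This is one possible reading of the scheme, assuming that a device served in the current round becomes ineligible until every other requesting device has been served; the class name, device labels, and priority values are hypothetical.

```python
class FairPriorityArbiter:
    """Priority arbitration with a round-robin fairness rule (illustrative)."""

    def __init__(self, priorities):
        self.priorities = dict(priorities)  # device -> priority (larger wins)
        self.served = set()                 # devices already granted this round

    def grant(self, requesting):
        """Pick one device from those currently requesting the bus."""
        requesting = set(requesting)
        if not requesting:
            return None
        # Devices already served wait until the other requesters have had a turn.
        eligible = requesting - self.served
        if not eligible:
            self.served.clear()  # everyone has been served: start a new round
            eligible = requesting
        winner = max(eligible, key=lambda device: self.priorities[device])
        self.served.add(winner)
        return winner


arbiter = FairPriorityArbiter({"receive": 3, "transmit": 2, "processor": 1})
everyone = ["receive", "transmit", "processor"]
sequence = [arbiter.grant(everyone) for _ in range(4)]
```

With all three devices requesting continuously, the highest-priority device is served first, but the lowest-priority device is still served before the first device gets a second turn.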
Packets received at the local AS device 600 are passed from the physical layer 604 and data/link layer 606 to the AS-Core receive unit 608. For each incoming packet, the AS-Core receive unit 608 allocates a packet descriptor from a packet descriptor pool stored in the local memory 614 to the packet, stores the packet in a buffer location corresponding to the allocated packet descriptor, and pushes the packet descriptor onto a VC receive queue 618. In one implementation, the packet descriptor is pushed onto one of the VC receive queues 618 based on a TC-to-VC mapping that is stored at (or is otherwise accessible by) the AS-Core receive unit 608. In the example of
The embedded micro-processor 612 may be notified of an incoming packet by an interrupt that is generated when a descriptor is pushed onto a VC receive queue 618 or by periodically polling the VC receive queues 618. In one implementation, the embedded micro-processor 612 services the multiple VC receive queues 618 in a weighted round-robin fashion, and processes the packets within each VC receive queue 618 in the order in which they are received.
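Weighted round-robin servicing can be sketched as below. The function name, the weights, and the packet labels are assumptions for illustration; the sketch visits the queues in a fixed order and takes up to a queue's weight in packets per visit, which is only one of several ways to realize weighted round-robin.

```python
from collections import deque


def weighted_round_robin(queues, weights):
    """Drain the queues, taking up to weights[i] packets from queue i per visit.

    Packets within each queue are taken in arrival (FIFO) order.
    """
    order = []
    while any(queues):  # a deque is truthy while it still holds packets
        for queue, weight in zip(queues, weights):
            for _ in range(weight):
                if not queue:
                    break
                order.append(queue.popleft())
    return order


vc0 = deque(["a1", "a2", "a3"])
vc1 = deque(["b1", "b2"])
serviced = weighted_round_robin([vc0, vc1], weights=[2, 1])
```

Here the first queue receives twice the service of the second per round, yet the second queue is never starved.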
To process a packet, the embedded micro-processor 612 pulls a packet descriptor from the head of a VC receive queue 618 and stores VC context information in the local memory 614. The VC context information identifies the VC receive queue 618 from which the packet descriptor was pulled. The embedded micro-processor 612 examines the packet stored in the buffer location corresponding to the packet descriptor to determine the format of the packet, e.g., by inspecting the PI field of the packet's AS route header. For PI-4 packets, the embedded micro-processor 612 performs one or more PI-4 packet validation operations (e.g., a payload check or a configuration space permissions check).
If the PI-4 packet is invalid, the embedded micro-processor 612 discards the PI-4 packet, generates an error signal, and sends the error signal to a processor (not shown) external to the embedded micro-processor 612. In one implementation, the external processor generates a PI-5 (event notification) packet in response to the error signal.
If the PI-4 packet is valid, the embedded micro-processor 612 identifies the packet type using the field values associated with an Operation Type field in the AS route header. For a valid PI-4 write request packet, the embedded micro-processor 612 extracts data from the PI-4 packet and writes the data to a register location (corresponding to the aperture number and address specified by the PI-4 packet header) in the AS Native Capability Structures region in the local memory 614.
For a valid PI-4 read request packet, the embedded micro-processor 612 reads the data from a register location (corresponding to the aperture number and address specified by the PI-4 packet header) in the AS Native Capability Structures region 616 in the local memory 614. If a failure occurs before or while the data is being retrieved from the AS Native Capability Structures region 616, the embedded micro-processor 612 generates an AS payload having a PI-4 Read Completion with Error packet header. Within the PI-4 Read Completion with Error packet header, the embedded micro-processor 612 provides a value in a Status Code field that indicates the type of failure that occurred during the data retrieval process. Any partial data that may have been retrieved is typically discarded rather than included in the payload of the generated packet for transmission to the remote AS device.
If the data retrieval is successful, the embedded micro-processor 612 generates an AS payload by appending the retrieved data to a PI-4 Read Completion with Data packet header. Within the PI-4 Read Completion with Data packet header, the embedded micro-processor 612 provides a value in the Payload Size field that indicates the size of the retrieved data.
In both cases, the embedded micro-processor 612 generates a PI-4 packet by attaching an AS route header to the AS payload. For each outgoing PI-4 packet, the embedded micro-processor 612 allocates a packet descriptor from a packet descriptor pool stored in the local memory 614 to the packet, stores the outgoing PI-4 packet in a buffer location corresponding to the allocated packet descriptor, and uses the VC context information stored in memory to select a VC transmit queue 620 onto which the packet descriptor is pushed. For example, if a packet descriptor allocated to an incoming packet is pulled from the VC1 receive queue, the packet descriptor allocated to the corresponding outgoing packet is pushed onto the VC1 transmit queue.
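The descriptor handling and the use of VC context to pick the transmit queue can be sketched as follows. The DescriptorPool class, the respond function, the queue dictionaries, and the step that releases the incoming descriptor after processing are hypothetical simplifications; the description above does not say when a descriptor is returned to the pool.

```python
from collections import deque


class DescriptorPool:
    """Fixed pool of packet descriptors, each tied to one buffer (illustrative)."""

    def __init__(self, size):
        self.free = deque(range(size))
        self.buffers = {}  # descriptor -> packet stored in its buffer

    def allocate(self, packet):
        if not self.free:
            raise MemoryError("descriptor pool exhausted")
        descriptor = self.free.popleft()
        self.buffers[descriptor] = packet
        return descriptor

    def release(self, descriptor):
        packet = self.buffers.pop(descriptor)
        self.free.append(descriptor)  # assumption: descriptors are recycled
        return packet


def respond(pool, rx_queues, tx_queues, vc, handler):
    """Process the packet at the head of one VC receive queue."""
    descriptor = rx_queues[vc].popleft()
    vc_context = vc                      # remembers which VC the request used
    request = pool.release(descriptor)
    response = handler(request)
    out_descriptor = pool.allocate(response)
    tx_queues[vc_context].append(out_descriptor)  # reply on the same VC
    return out_descriptor


pool = DescriptorPool(size=4)
rx_queues = {0: deque(), 1: deque()}
tx_queues = {0: deque(), 1: deque()}
rx_queues[1].append(pool.allocate("read request"))
out = respond(pool, rx_queues, tx_queues, vc=1, handler=lambda req: "completion for " + req)
```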
The AS-Core transmit unit 610 retrieves the packet descriptors from the VC transmit queues 620 in round-robin fashion, and transfers the PI-4 packets from the corresponding buffer locations in the local memory 614 to a remote AS device through the switch fabric 622.
Referring to
When access to the local memory 718 is requested by the AS-Core receive unit 714, the AS-Core transmit unit 716, and the multiple micro-engines 708, 710, 712, a bus arbiter 726 authorizes the access using one or more arbitration schemes. Unlike the arbitration scheme described above with respect to
Packets received at the local AS device 700 are passed from the physical layer 704 and data/link layer 706 to the AS-Core receive unit 714. For each incoming packet, the AS-Core receive unit 714 determines the format of the packet by inspecting the PI field of the packet's AS route header. The AS-Core receive unit 714 then allocates a packet descriptor from a packet descriptor pool stored in the local memory 718 to the packet, stores the packet in a buffer location corresponding to the allocated packet descriptor, and pushes the packet descriptor onto an appropriate VC receive queue 722. In one implementation, the packet descriptor is pushed onto a VC receive queue 722 based on a TC-to-VC mapping that is stored at (or is otherwise accessible by) the AS-Core receive unit 714. In the example of
The micro-engines 708, 710, 712 may be notified of an incoming packet by an interrupt that is generated when a descriptor is pushed onto their respective VC receive queues 722 or by periodically polling their respective VC receive queues 722. In one implementation, each of the micro-engines 708, 710, 712 services its respective VC receive queues 722 in a weighted round-robin fashion, and processes the packets within each VC receive queue 722 in the order in which they are received.
For example, to process a PI-4 packet at the head of the PI-4 VC1 receive queue 722a′, the PI-4 micro-engine 712 first performs one or more PI-4 packet validation operations (e.g., a payload check or a configuration space permissions check). If the PI-4 packet is invalid, the PI-4 micro-engine 712 generates an error signal, sends the error signal to the PI-5 micro-engine 708, and discards the PI-4 packet.
In one implementation, the PI-5 micro-engine 708 generates a PI-5 (event notification) packet in response to the error signal. The PI-5 micro-engine 708 uses the turn pool, turn pointer, and other information provided in the route header of the PI-4 packet to form an AS route header, which is appended to an AS payload that identifies the event condition (e.g., configuration space permissions protection error). The generated PI-5 packet is written to a buffer location in the local memory 718. The PI-5 micro-engine 708 pushes a packet descriptor (e.g., with a pointer to the buffer which stores the outgoing PI-5 packet) to a PI-5 VC transmit queue 724b.
If the PI-4 packet is valid, the PI-4 micro-engine 712 identifies the packet type using the field values associated with an Operation Type field in the AS route header. For a valid PI-4 write request packet, the PI-4 micro-engine 712 extracts data from the PI-4 packet and writes the data to a register location (corresponding to the aperture number and address specified by the PI-4 packet header) in the AS Native Capability Structures region 720 in the local memory 718.
For a valid PI-4 read request packet, the PI-4 micro-engine 712 reads the data from a register location (corresponding to the aperture number and address specified by the PI-4 packet header) in the AS Native Capability Structures region 720 in the local memory 718. If a failure occurs before or while the data is being retrieved from the AS Native Capability Structures region 720, the PI-4 micro-engine 712 generates an AS payload having a PI-4 Read Completion with Error packet header. Within the PI-4 Read Completion with Error packet header, the PI-4 micro-engine 712 provides a value in a Status Code field that indicates the type of failure that occurred during the data retrieval process. Any partial data that may have been retrieved is typically discarded rather than included in the payload of the generated packet for transmission to the remote AS device.
If the data retrieval is successful, the PI-4 micro-engine 712 generates an AS payload by appending the retrieved data to a PI-4 Read Completion with Data packet header. Within the PI-4 Read Completion with Data packet header, the PI-4 micro-engine 712 provides a value in the Payload Size field that indicates the size of the retrieved data.
In both cases, the PI-4 micro-engine 712 generates a PI-4 packet by attaching an AS route header to the AS payload, and writes the generated PI-4 packet to a buffer location in the local memory 718. The PI-4 micro-engine 712 pushes a packet descriptor (e.g., with a pointer to the buffer which stores the outgoing packet) to a PI-4 VC transmit queue 724a.
The AS-Core transmit unit 716 retrieves the packet descriptors from the multiple VC transmit queues 724 in round-robin fashion, and transfers the packets from the corresponding buffer locations in the local memory 718 to a remote AS device through the switch fabric 726.
The invention and all of the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The invention can be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
Method steps of the invention can be performed by one or more programmable processors executing a computer program to perform functions of the invention by operating on input data and generating output. Method steps can also be performed by, and apparatus of the invention can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
The invention can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the invention, or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The invention has been described in terms of particular embodiments. Other embodiments are within the scope of the following claims. For example, the steps of the invention can be performed in a different order and still achieve desirable results.