This invention relates to packet validity checking in switched fabric networks.
PCI (Peripheral Component Interconnect) Express is a serialized I/O interconnect standard developed to meet the increasing bandwidth needs of the next generation of computer systems. PCI Express was designed to be fully compatible with the widely used PCI local bus standard. PCI is beginning to hit the limits of its capabilities, and while extensions to the PCI standard have been developed to support higher bandwidths and faster clock speeds, these extensions may be insufficient to meet the rapidly increasing bandwidth demands of PCs in the near future. With its high-speed and scalable serial architecture, PCI Express may be an attractive option for use with or as a possible replacement for PCI in computer systems. The PCI Special Interest Group (PCI-SIG) manages PCI specifications as open industry standards, and provides the specifications to its members.
Advanced Switching (AS) is a technology which is based on the PCI Express architecture, and which enables standardization of various backplane architectures. AS utilizes a packet-based transaction layer protocol that operates over the PCI Express physical and data link layers. The AS architecture provides a number of features common to multi-host, peer-to-peer communication devices such as blade servers, clusters, storage arrays, telecom routers, and switches. These features include support for flexible topologies, packet routing, congestion management (e.g., credit-based flow control), fabric redundancy, and fail-over mechanisms. The Advanced Switching Interconnect Special Interest Group (ASI-SIG) is a collaborative trade organization chartered with providing a switching fabric interconnect standard, specifications of which it provides to its members.
Each switch element 102 and end point 104 has an Advanced Switching (AS) interface that is part of the AS architecture defined by the “Advanced Switching Core Architecture Specification” (available from the Advanced Switching Interconnect-SIG at www.asi-sig.com). The AS architecture utilizes a packet-based transaction layer protocol that operates over the PCI Express physical and data link layers 202, 204, as shown in
AS uses a path-defined routing methodology in which the source of a packet provides all information required by a switch (or switches) to route the packet to the desired destination.
A path may be defined by the turn pool 402, turn pointer 404, and direction flag 406 in the AS route header 302, as shown in
The PI field in the AS route header 302 determines the format of the encapsulated packet 304. The PI field is inserted by the end point 104 that originates the AS packet and is used by the end point that terminates the packet to correctly interpret the packet contents. The separation of routing information from the remainder of the packet enables an AS fabric to tunnel packets of any protocol.
PIs represent fabric management and application-level interfaces to the switched fabric network 100. Table 1 provides a list of PIs currently supported by the AS Specification.
PIs 0-7 are used for various fabric management tasks, and PIs 8-254 are application-level interfaces. As shown in Table 1, PI-8 is used to tunnel or encapsulate a native PCI Express packet. Other PIs may be used to tunnel various other protocols, e.g., Ethernet, Fibre Channel, ATM (Asynchronous Transfer Mode), InfiniBand®, and SLS (Simple Load Store). An advantage of an AS switch fabric is that a mixture of protocols may be simultaneously tunneled through a single, universal switch fabric, which makes AS well suited to next generation modular applications such as media gateways, broadband access routers, and blade servers.
The AS architecture supports the establishment of direct endpoint-to-endpoint logical paths through the switch fabric known as Virtual Channels (VCs). This enables a single switched fabric network to service multiple, independent logical interconnects simultaneously, each VC interconnecting AS end points for control, management and data. Each VC provides its own queue so that blocking in one VC does not cause blocking in another. Each VC may have independent packet ordering requirements, and therefore each VC can be scheduled without dependencies on the other VCs.
The AS architecture defines three VC types: Bypass Capable Unicast (BVC); Ordered-Only Unicast (OVC); and Multicast (MVC). BVCs have bypass capability, which may be necessary for deadlock free tunneling of some, typically load/store, protocols. OVCs are single queue unicast VCs, which are suitable for message oriented “push” traffic. MVCs are single queue VCs for multicast “push” traffic.
The AS architecture provides a number of congestion management techniques, one of which is a credit-based flow control technique that ensures that packets are not lost due to congestion. Link partners (e.g., an end point 104 and a switch element 102) in the network exchange flow control credit information to guarantee that the receiving end of a link has the capacity to accept packets. Flow control credits are computed on a per-VC basis by the receiving end of the link and communicated to the transmitting end of the link. Typically, packets are transmitted only when there are enough credits available for a particular VC to carry the packet. Upon sending a packet, the transmitting end of the link debits its available credit account by an amount of flow control credits that reflects the packet size. As the receiving end of the link processes (e.g., forwards to an end point 104) the received packet, space is made available on the corresponding VC and flow control credits are returned to the transmitting end of the link. The transmitting end of the link then adds the flow control credits to its credit account.
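The credit accounting described above can be illustrated with a minimal sketch of the transmitting end of a link. The class name `CreditedLink`, its method names, and the credit values are hypothetical and chosen for illustration only; the AS Specification defines the actual credit units and exchange mechanism.

```python
class CreditedLink:
    """Transmitting end of one link, tracking credits per virtual channel (VC)."""

    def __init__(self, initial_credits):
        # initial_credits: dict mapping VC id -> credits advertised by the receiver
        self.credits = dict(initial_credits)

    def try_send(self, vc, packet_credits):
        """Debit the VC's account and return True if the packet may be sent."""
        if self.credits.get(vc, 0) < packet_credits:
            return False  # not enough credits: the packet is held back
        self.credits[vc] -= packet_credits
        return True

    def credit_return(self, vc, returned):
        """The receiving end freed space on this VC: add the credits back."""
        self.credits[vc] = self.credits.get(vc, 0) + returned


link = CreditedLink({0: 4, 1: 2})
sent_first = link.try_send(0, 3)    # accepted, VC 0 is left with 1 credit
sent_second = link.try_send(0, 2)   # refused, only 1 credit remains on VC 0
link.credit_return(0, 3)            # receiver has processed the first packet
sent_retry = link.try_send(0, 2)    # accepted after the credits were returned
```

Because each VC has its own account, exhausting the credits of VC 0 in this sketch has no effect on VC 1.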
The AS architecture supports the implementation of an AS Configuration Space in each AS device (e.g., AS end point 104) in the network. The AS Configuration Space is a storage area that includes fields to specify device characteristics as well as fields used to control the AS device. The AS Configuration Space includes up to 16 apertures where configuration information can be stored. Each aperture includes up to 4 Gbytes of storage and is 32-bit addressable. The configuration information is presented in the form of capability structures and other storage structures, such as tables and a set of registers. Table 2 provides a set of capability structures (“AS Native Capability Structures”) that are defined by the AS Specification and stored in aperture 0 of the AS Configuration Space.
Legend:
O = Optional normative
R = Required
R w/OE = Required with optional normative elements
N/A = Not applicable
The information stored in the AS Native Capability Structures can be accessed through node configuration packets, e.g., PI-4 packets, which are used for device management.
In one implementation of a switched fabric network, the AS devices on the network are restricted to read-only access of another AS device's AS Native Capability Structures, with the exception of one or more AS end points that have been elected as fabric managers.
A fabric manager election process may be initiated by a variety of either hardware or software mechanisms to elect one or more fabric managers for the switched fabric network. A fabric manager is an AS end point that “owns” all of the AS devices, including itself, in the network. If multiple fabric managers, e.g., a primary fabric manager and a secondary fabric manager, are elected, then each fabric manager may own a subset of the AS devices in the network. Alternatively, the secondary fabric manager may declare ownership of the AS devices in the network upon a failure of the primary fabric manager, e.g., resulting from a fabric redundancy and fail-over mechanism.
Once a fabric manager declares ownership, it has privileged access to its AS devices' AS Native Capability Structures. In other words, the fabric manager has read and write access to the AS Native Capability Structures of all of the AS devices in the network.
As previously discussed, the AS Native Capability Structures of an AS device are accessible through PI-4 packets. Accordingly, each AS device in the switched fabric network can be implemented to include an AS PI-4 unit for processing PI-4 packets received through the network from a fabric manager or another AS device. In the examples to follow, the term “local AS device” refers to an AS device that has received a PI-4 packet and is processing the PI-4 packet, and the term “remote AS device” refers to an AS device, e.g., a fabric manager or another AS device, on the network that is attempting to access the local AS device's AS Native Capability Structures.
Referring to
PI-4 packets received at the local AS device 500 are passed from the physical and data link layers to the PI-4 unit 510 for processing through the AS-Core receive unit 504. In one implementation, an inbound arbiter 512 in the PI-4 unit 510 arbitrates access between multiple VCs in the AS-Core receive unit 504 and a single receive queue 514 in round-robin fashion. The receive queue 514 provides buffer space for incoming PI-4 packets so that the PI-4 packets can be removed from the VCs in the AS-Core receive unit 504 as quickly as possible. There is an inherent latency involved in accessing the AS device's AS Configuration Space 508. Having the receive queue 514 in the PI-4 unit 510 shields this latency from the AS-Core receive unit 504, thus allowing flow control credits to be made available quickly to remote AS devices.
The receive queue 514 can be implemented as a first-in-first-out (FIFO) structure that passes PI-4 packets to a transaction executor 516 in the order it receives them. The receive queue 514 is sufficiently wide to enable the transaction executor 516 to simultaneously examine all of the fields of an incoming PI-4 packet's header in a single clock cycle prior to processing the PI-4 packet.
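As a software illustration only, the round-robin arbitration between multiple VCs and the single FIFO receive queue might be sketched as follows. The function name `round_robin_drain` and the packet labels are hypothetical, and a hardware arbiter would grant one VC per cycle rather than loop until all VCs are empty.

```python
from collections import deque


def round_robin_drain(vc_queues, receive_queue, start=0):
    """Move packets from several VC queues into one FIFO, one packet per VC per turn."""
    n = len(vc_queues)
    idx = start
    while any(vc_queues):  # an empty deque is falsy
        if vc_queues[idx]:
            receive_queue.append(vc_queues[idx].popleft())
        idx = (idx + 1) % n  # move on to the next VC whether or not it had a packet
    return receive_queue


vcs = [deque(["a1", "a2"]), deque(["b1"]), deque(["c1", "c2", "c3"])]
rx = round_robin_drain(vcs, deque())

# The transaction executor then consumes packets in arrival order.
first_for_executor = rx.popleft()
```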
As shown in
The header fields under test may include all or a subset of the header fields of a PI-4 packet having the format shown in
In the following clock cycle, the packet validity check block 516a performs the set of checks on the header fields under test concurrently.
The expected header values can be pre-loaded into the packet validity check block 516a. In one implementation, for each header field under test, a single expected header value is provided with the expectation that all incoming PI-4 packets contain a header value that should match the expected header value. However, in other implementations, multiple expected header values can be stored in the packet validity check block 516a for each header field, with one of the multiple expected header values selected to perform the comparison with the header value in a received PI-4 packet. The selection may be made based on packet type. For example, if an incoming PI-4 packet is determined to be a read request packet, the packet validity check block 516a uses one set of expected header values. If an incoming PI-4 packet is determined to be a write packet, a different set of expected header values is used.
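The selection of expected header values by packet type can be sketched as a table lookup. The table layout, the field keys, and the function name `failed_fields` are assumptions made for illustration; the values shown for the PPI, OO, and TS fields follow the field-by-field description in this specification.

```python
# Hypothetical pre-loaded table: one set of expected values per packet type.
EXPECTED = {
    "read_request": {"ppi": 4, "oo": 0, "ts": 1},
    "write": {"ppi": 4, "oo": 0, "ts": 0},
}


def failed_fields(packet_type, header):
    """Return the names of header fields that differ from the expected values."""
    expected = EXPECTED[packet_type]
    return [name for name, value in expected.items() if header.get(name) != value]


good_read = failed_fields("read_request", {"ppi": 4, "oo": 0, "ts": 1})
bad_write = failed_fields("write", {"ppi": 4, "oo": 0, "ts": 1})
```

An empty result corresponds to a packet that passes every table-driven check; a non-empty result names the fields that failed.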
In the following description, the check performed on each header field under test is described on a field-by-field basis with reference to the PI-4 packet format of
The least significant 7 bits of dword 1 contain a Primary Protocol Interface (PPI) field. For PI-4 packets, the PPI field must contain the value “4”. If the PPI field contains any other value, the PPI field check fails and the packet validity check block 516a automatically discards the incoming packet.
Bit 8 of dword 1 contains the Packet CRC Enable (PCRCE) field. The PCRCE flag indicates whether a dword of PCRC has been appended to the packet's payload (e.g., dword 13) or the end of the PI-header if no payload is included (e.g., dword 5). If bit 8 is set, the packet validity check block 516a calculates a 32-bit cyclic redundancy code value over dwords 3-12 and checks the resulting value against the value provided in the PCRC field in dword 13. If the two values match, the PCRCE field check passes. If not, the PCRCE field check fails and the PI-4 packet is discarded.
The Ordered Only (OO) field is in bit 12 of dword 1. For PI-4 packets, the OO flag must be set to 0, otherwise the OO field check fails.
The Type Specific (TS) field is in bit 13 of dword 1. For PI-4 read request packets, this bit must be set to 1. For PI-4 write packets and PI-4 read completion packets, this bit must be set to 0, otherwise the TS field check fails.
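The PPI, OO, and TS checks above amount to extracting bit fields from dword 1 and comparing them with fixed values. A minimal sketch follows, assuming that bit 0 denotes the least significant bit of the dword; the helper names `bits` and `check_dword1` are hypothetical. Every check is evaluated and reported, in keeping with the checks being performed concurrently rather than stopping at the first failure.

```python
def bits(word, lo, hi):
    """Return bits lo..hi (inclusive, bit 0 = least significant) of a 32-bit word."""
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def check_dword1(dword1, is_read_request):
    """Evaluate the PPI, OO and TS checks and report each result."""
    ppi = bits(dword1, 0, 6)    # Primary Protocol Interface, must be 4
    oo = bits(dword1, 12, 12)   # Ordered Only, must be 0
    ts = bits(dword1, 13, 13)   # Type Specific, 1 for read requests, else 0
    return {
        "ppi": ppi == 4,
        "oo": oo == 0,
        "ts": ts == (1 if is_read_request else 0),
    }


read_header = 4 | (1 << 13)                 # PPI = 4, OO = 0, TS = 1
read_result = check_dword1(read_header, is_read_request=True)

bad_header = 5 | (1 << 12)                  # PPI = 5, OO = 1, TS = 0
bad_result = check_dword1(bad_header, is_read_request=False)
```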
Bits 25 to 31 contain the Header CRC (HCRC) field. The source of a PI-4 packet is required to generate a header cyclic redundancy check value using the first two dwords of the PI-4 packet and store the value in the HCRC field prior to transmitting the PI-4 packet on the AS fabric. The packet validity check block 516a calculates the HCRC value using a polynomial defined in the AS Core specification, and compares the calculated HCRC value against the value provided in the incoming PI-4 packet's HCRC field. If the two values match, the HCRC field check passes, otherwise the HCRC field check fails.
Bits 0 to 30 of dword 2 contain the Turn Pool (TP) field. The TP value provided in the incoming PI-4 packet is checked against the forward routed turn pool values of the primary and secondary fabric managers. The forward routed turn pool value for each fabric manager is provided in the spanning tree capability structure 508a stored in aperture 0 of the AS Configuration Space 508. The TP value is also checked against the forward routed turn pool value of each AS device that is permitted to access the AS Configuration Space 508. If the TP value provided in the incoming PI-4 packet matches a forward routed turn pool value stored in the spanning tree capability structure 508a, this indicates to the packet validity check block that the incoming PI-4 packet originated from a remote AS device that is permitted some form of access to the AS Configuration Space 508.
The least significant four bits of dword 3 contain the Aperture Number (APT) field. The APT field is used in performing Configuration Space Permissions (CSP) checks. The AS Configuration Space 508 includes up to 16 apertures and is implemented with an access protection mechanism that is applied on a per aperture basis using a Configuration Space Permission structure 508a. Each configuration space aperture can be programmed to allow either read or write access, or both, from one, some or all of the remote devices through the setting of particular bits of the Configuration Space Permission structure 508a. The value provided in the APT field indicates the AS Configuration Space aperture that contains the location (“target aperture”) to be read from or written to. The packet validity check block 516a performs a lookup operation of the Configuration Space Permission structure 508a to determine the access permissions for the target aperture. The APT field check passes only if the access permissions for the target aperture correspond with the packet type of the incoming PI-4 packet.
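The APT field check can be sketched as a per-aperture permission lookup. The encoding below (one read bit and one write bit per aperture) is a simplifying assumption for illustration; it omits the per-device permissions described above and does not reflect the actual layout of the Configuration Space Permission structure.

```python
READ, WRITE = 0b01, 0b10  # hypothetical permission bits


def apt_check(dword3, permissions, packet_type):
    """Pass only if the target aperture permits the access the packet requests."""
    apt = dword3 & 0xF  # least significant four bits select one of 16 apertures
    needed = WRITE if packet_type == "write" else READ
    return bool(permissions[apt] & needed)


# Aperture 0 is readable and writable; apertures 1-15 are read-only.
permissions = [READ | WRITE] + [READ] * 15

write_to_0 = apt_check(0x0, permissions, "write")
write_to_1 = apt_check(0x1, permissions, "write")
read_from_1 = apt_check(0xF1, permissions, "read_request")  # upper bits are ignored
```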
Bits 22-25 contain the Request Code (RC) field. For PI-4 write packets, the RC field's value provides the payload packet length in dwords. The packet validity check block 516a converts the 4-bit binary value to its decimal form and determines the (requested) payload packet length of the PI-4 packet according to the following:
As the PI-4 write packet is being received, the packet validity check block 516a counts the number of dwords that form the payload. If the AS device 500 supports block writes to the AS Configuration Space 508, larger payloads of up to a maximum of 8 dwords can be written, and the Block Write Supported bit in the Device Control and Status register of the Baseline Device Capability structure 508a must be set to 1. The detected packet length is compared against the packet length specified by the RC field. If the two values match, the RC field check passes, otherwise, the RC field check fails and the PI-4 write packet is discarded. If the Block Write Supported bit is set to 0, any value in the RC field other than 15 is illegal, and the PI-4 write packet is discarded.
In the case of a PI-4 read request packet, if the Request Scale (RS) field in dword 4 is set to 1 and the PI-4 unit supports single dword reads, then the RC field's value must also be 1, otherwise the RC field check fails. If the PI-4 unit supports block reads, the RC field's value provides the number of dwords to be read, and the PI-4 unit 510 must be able to return up to 8 dwords of requested data in a single PI-4 read completion packet. If the RC field's value provides an unsupported number (i.e., a request for more than 8 dwords of data), the RC field check fails and the transfer control block 516b is notified of a malformed packet event.
Bits 2 to 31 of dword 4 contain the Offset (OFF) field. The OFF field's value is used to generate the address within the specified aperture of the AS Configuration Space 508 that is being accessed for a read or write operation. If the Block Write Supported bit or the Block Read Supported bit in the Device Control and Status register of the Baseline Device Capability structure 508a is set to 1, the packet validity check block 516a adds the OFF field's value to the RC field's value to determine whether the block write or block read access operation will cross an aligned 8-dword boundary. If so, the OFF field check fails and the packet validity check block notifies the transfer control block 516b of a malformed packet event.
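The boundary test in the OFF field check can be sketched as follows, assuming that the OFF value is treated as a dword index within the aperture and that the number of dwords has already been decoded from the RC field. The function name `crosses_8_dword_boundary` is hypothetical.

```python
def crosses_8_dword_boundary(off_dwords, num_dwords):
    """True if an access of num_dwords starting at off_dwords spans two aligned 8-dword blocks."""
    first_block = off_dwords // 8
    last_block = (off_dwords + num_dwords - 1) // 8
    return first_block != last_block


aligned_full_block = crosses_8_dword_boundary(0, 8)   # dwords 0-7: one block
shifted_full_block = crosses_8_dword_boundary(1, 8)   # dwords 1-8: two blocks
short_at_end = crosses_8_dword_boundary(6, 2)         # dwords 6-7: one block
short_over_edge = crosses_8_dword_boundary(7, 2)      # dwords 7-8: two blocks
```

An access for which the function returns True corresponds to the case in which the OFF field check fails and a malformed packet event is reported.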
Once the set of packet validity checks has been performed and the PI-4 packet is determined to be valid, the packet validity check block 516a passes the valid packet to the transfer control block 516b for processing.
If a valid PI-4 packet is identified as a write packet, the transfer control block 516b processes a write command to write data, e.g., extracted from the payload of the received PI-4 packet, to a location in an AS Native Capability Structure 508a specified by an aperture number and address in the received PI-4 packet header.
If the valid PI-4 packet is identified as a read request packet, the transfer control block 516b processes a read request command to retrieve data from a location in the AS Native Capability Structure 508a specified by an aperture number and address in the PI-4 packet header. If a failure occurs before or while the data is being retrieved from the AS Native Capability Structure 508a, the transfer control block 516b generates an AS payload having a PI-4 Read Completion with Error packet header. Within the PI-4 Read Completion with Error packet header, the transfer control block provides a value in a Status Code field that indicates the type of failure that occurred during the data retrieval process. Any partial data that may have been retrieved is typically discarded rather than provided to the remote AS device.
If the data retrieval is successful, the transfer control block 516b generates an AS payload by appending the retrieved data to a PI-4 Read Completion with Data packet header. Within the PI-4 Read Completion with Data packet header, the transfer control block 516b provides a value in the Payload Size field that indicates the size of the retrieved data.
In both cases, the transfer control block 516b generates a PI-4 packet by attaching an AS route header to the AS payload. The generated PI-4 packet is sent to the transmit queue 518, which requests access and transfers the generated PI-4 packet to one of multiple VCs in the AS-Core transmit unit 506. The PI-4 packet is then returned to the remote AS device through the switched fabric network 100.
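The two read completion cases can be sketched in software as follows. The dictionary standing in for the AS payload, the use of an exception name as the Status Code, and the names `build_read_completion`, `read_dwords`, and `config_space` are all hypothetical; the actual header formats and status code values are defined by the AS Specification.

```python
def build_read_completion(read_dwords, aperture, offset, count):
    """Return a dict standing in for the AS payload of a read completion."""
    try:
        data = list(read_dwords(aperture, offset, count))
    except LookupError as err:
        # Any partial data is dropped; only a status code is reported.
        return {"header": "PI-4 Read Completion with Error",
                "status_code": type(err).__name__}
    return {"header": "PI-4 Read Completion with Data",
            "payload_size": len(data),
            "data": data}


config_space = {0: [0x11, 0x22, 0x33, 0x44]}  # aperture 0 holds four dwords


def read_dwords(aperture, offset, count):
    words = config_space[aperture]  # raises KeyError for an unknown aperture
    if offset + count > len(words):
        raise IndexError("read past end of aperture")
    return words[offset:offset + count]


ok = build_read_completion(read_dwords, 0, 1, 2)
bad = build_read_completion(read_dwords, 3, 0, 1)
```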
The invention and all of the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The invention can be implemented as a computer-program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
Method steps of the invention can be performed by one or more programmable processors executing a computer program to perform functions of the invention by operating on input data and generating output. Method steps can also be performed by, and apparatus of the invention can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in special purpose logic circuitry.
The invention can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the invention, or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The invention has been described in terms of particular embodiments. Other embodiments are within the scope of the following claims. For example, the steps of the invention can be performed in a different order and still achieve desirable results.