The aforementioned and other features and objects of the present invention and the manner of attaining them will become more apparent and the invention itself will be best understood by reference to the following description of a preferred embodiment taken in conjunction with the accompanying drawings, wherein:
The Figures depict embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
Dual ported I/O cards (also referred to herein as I/O routers) couple I/O devices to a cross-coupled switching fabric to provide multiple levels of path redundancy. Each I/O router possesses two or more internal ports allowing each I/O router to access multiple switches in a cross-coupled switching fabric. I/O cards, or routers, possess ports on both the front side and the back side of the card. Traditionally, and as is well known in the art, I/O cards have between 2 and 8 ports on the front side of the card for connecting the card to I/O devices, but only a single port on the back side of the I/O card. According to one embodiment of the present invention, I/O routers of the present invention possess two or more back end, or internal, ports so as to provide additional redundant paths between each I/O device and each microprocessor complex, thus supplying additional means to balance data traffic and thereby maximize bandwidth utilization. Hereinafter, I/O routers referred to in this detailed description are those I/O cards comprising two or more back end or back side ports. In one embodiment of the present invention, I/O routers using a plurality of back end ports access a switching fabric that uses cross-coupled nontransparent ports, providing each I/O device with multiple paths upon which to pass data.
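For purposes of illustration only, the following C sketch shows one way such a multiple back side ported I/O router might be represented. The structure, field names, and port limits are assumptions made for this example and are not taken from the specification.

/* Illustrative representation of an I/O router: several front side ports
 * facing I/O devices and two or more back side ports facing the switches
 * of the switching fabric.  Sizes are assumptions for the sketch. */
#define MAX_FRONT_PORTS 8  /* traditionally 2 to 8 ports to I/O devices */
#define MAX_BACK_PORTS  4  /* two or more internal (back side) ports    */

struct io_router {
    int num_front_ports;                     /* ports facing I/O devices     */
    int front_port_device[MAX_FRONT_PORTS];  /* attached I/O device ids      */
    int num_back_ports;                      /* ports facing the fabric      */
    int back_port_switch[MAX_BACK_PORTS];    /* switch id reachable per port */
};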
One embodiment of the present invention utilizes I/O routers that can themselves determine which virtualization services are needed and thereafter send the control portion of a data transfer to the microprocessor complex on which that service is running. Simultaneously, I/O routers can select a (possibly different) microprocessor complex to store the data in a local buffer. Thus, control of the data can be passed to the appropriate microprocessor complex, independent of the storage location of the data, without first visiting the microprocessor complex that owns the I/O router.
Control of which path is selected between any I/O device and any particular microprocessor complex rests with the microprocessor complex connected to the I/O device via the transparent port on a switch. Note that data can be sent over any port (i.e. path) of the I/O router. In one embodiment of the present invention, each port of an I/O router is connected to a different switch. The I/O router can continue to function to deliver data to the designated microprocessor complex even when one of the switches coupled to the I/O router fails. In another embodiment, the I/O router is not only connected to a functioning switch of the switching fabric but, when the utilized switch employs a nontransparent port or ports to allow connection between all microprocessor complexes, as is described in co-pending U.S. patent application Ser. No. ______ entitled “Cross-Coupled Peripheral Component Interconnect Express Switch,” the I/O router can maintain connectivity to all microprocessor complexes, with minimal performance degradation, even in the event of an individual switch failure.
For example, when a dual ported router is connected to a cross-coupled switching fabric and the cross-coupled switching fabric is connected to a microprocessor complex having two or more microprocessors, the data for each I/O transaction traveling from one I/O device to a second I/O device will have many possible routes. The sending I/O device, via a dual ported router, has available to it two possible switches. Due to the cross coupling of switches, each switch itself has access to two microprocessors, providing a total of four routes. From the microprocessor on which the transaction lands, there exist paths to two switches, each again having at least two paths to an I/O router coupled to the destination I/O device. By adding additional microprocessors the number of possible paths can increase substantially. Any of these routes can be selected to balance data traffic as well as serve as redundant paths should any component on a primary or selected path fail.
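By way of illustration only, the combinatorial effect described above can be expressed as a short calculation. The following C sketch is a hypothetical example; the function and parameter names are assumptions, and the count assumes every coupled component remains reachable, which a real fabric need not guarantee.

#include <stdio.h>

/* Hypothetical sketch: count end-to-end routes assuming the sending
 * router reaches r1 switches, each such switch reaches c microprocessor
 * complexes, and the landing complex reaches s switches, each with p
 * paths to the destination I/O router. */
static int count_routes(int r1, int c, int s, int p)
{
    return r1 * c * s * p;
}

int main(void)
{
    /* The dual ported example above: 2 switches x 2 complexes = 4 routes
     * to a complex, then 2 switches x at least 2 paths onward. */
    printf("routes = %d\n", count_routes(2, 2, 2, 2)); /* prints 16 */
    return 0;
}

Under these assumptions the dual ported example yields at least sixteen distinct end-to-end routes.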
According to one embodiment of the present invention, each of a plurality of I/O devices 490 has at its disposal multiple paths by which to reach any microprocessor complex 425. Each I/O device 490 is coupled to the switching fabric 430 via an interface. The interface can be in the form of a single ported HBA 450, a NIC 460 or an I/O router. According to one embodiment, each I/O device 490, such as the depicted arrays 470, is provided with multiple paths to access the switching fabric 430. In this depiction, each array 470 can be coupled to the switching fabric 430 via a dual ported I/O router 440. As one skilled in the art can appreciate, an I/O router 440 with additional ports can also be utilized, thus allowing the present invention to be scaled accordingly.
Each I/O router 440 shown in the accompanying figure possesses two back side ports, each coupled to a unique switch 432, 438 of the switching fabric 430, thereby giving the coupled array 470 two independent points of entry into the switching fabric 430.
Likewise, a network connection 480 through multiple NICs 460 can again provide multiple data paths. Each NIC 460 possesses a single port by which it can couple the network 480 to the switching fabric 430. In the embodiment shown in the accompanying figure, the network 480 is coupled to the switching fabric 430 through two NICs 460, each connected to a unique switch 432, 438, thereby providing two independent data paths.
In like manner, an array 470 can also achieve the same level of path reliability by being coupled to the switching fabric 430 through two single ported HBAs 450. Again, the array 470 has two possible paths by which to interface with the switching fabric 430 at unique points of entry, i.e. unique switches 432, 438.
The switching fabric 430 is also cross-coupled as shown in the accompanying figure, with each switch 432, 438 coupled to each microprocessor complex 425, so that additional redundant paths exist between the I/O devices 490 and the microprocessor complexes 425.
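As a hedged illustration of this cross-coupled arrangement, the C sketch below models each switch as exposing a transparent port to the microprocessor complex that owns it and a nontransparent port cross-coupled to the other complex, consistent with the description elsewhere in this specification; the identifiers are illustrative assumptions only.

/* Illustrative model of two cross-coupled switches (e.g. 432 and 438):
 * each reaches its owning complex through a transparent port and the
 * other complex through a non-transparent (NT) port. */
struct fabric_switch {
    int owning_complex;        /* reached through the transparent port     */
    int cross_coupled_complex; /* reached through the non-transparent port */
};

static const struct fabric_switch fabric[2] = {
    { .owning_complex = 0, .cross_coupled_complex = 1 },
    { .owning_complex = 1, .cross_coupled_complex = 0 },
};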
As each I/O device 490 possesses multiple data paths at its disposal, the selection of which path is used must be managed. In one embodiment, the microprocessor complex owning the HBA or NIC controls which path is selected. With standard operating systems, target and initiator drivers would be placed on the microprocessor complex that owns an associated HBA to control movement of data. A request from a host would arrive at the controlling microprocessor complex, and the target driver and the initiator driver within that microprocessor complex would process the request. Control information from the target driver to a virtualization service is conveyed via a SCSI server.
As will be appreciated by one skilled in the art, control information typically passes from an HBA to a virtualization service via a number of steps. Generally, control information originating in an HBA 540, 542, 544, as shown in the accompanying figure, passes through the target mode driver and a SCSI server to a virtualization service on the owning microprocessor complex, and may then pass through SCSI class drivers and a second SCSI server before reaching an additional virtualization service residing on another microprocessor complex.
According to one embodiment of the present invention, the passing of control information can be simplified by using a Remote Procedure Call (RPC) mechanism associated with I/O routers in place of the SCSI class drivers and a second use of the SCSI server. Using such a mechanism, control information can be passed by using the SCSI server virtualization service on the first microprocessor complex and then calling directly to the additional virtualization service on the second microprocessor complex. Alternatively, and according to another embodiment of the present invention, the target mode driver can determine which microprocessor complex to use and go directly to the SCSI server on the second microprocessor complex. In yet another embodiment, and as illustrated in the accompanying figures, the I/O router itself determines which virtualization service is needed and sends the control information directly to the microprocessor complex on which that service runs, as described above.
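A minimal C sketch of such an RPC style handoff follows. The function names (select_service_complex, rpc_send_control) and the control block layout are hypothetical placeholders introduced for illustration and do not correspond to any actual driver interface.

/* Hypothetical sketch of the simplified control path: the target mode
 * driver (or the I/O router itself) picks the microprocessor complex
 * hosting the required virtualization service and sends the control
 * portion of the transfer there directly, bypassing the second pass
 * through SCSI class drivers and a second SCSI server. */
struct control_block {
    int lun;            /* logical unit addressed by the request */
    int opcode;         /* SCSI command code                     */
    unsigned long len;  /* transfer length in bytes              */
};

/* Placeholder declarations; an actual system would supply these. */
int select_service_complex(const struct control_block *cb);
int rpc_send_control(int complex_id, const struct control_block *cb);

static int forward_control(const struct control_block *cb)
{
    int complex_id = select_service_complex(cb); /* locate the service  */
    return rpc_send_control(complex_id, cb);     /* direct RPC delivery */
}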
In other embodiments of the present invention, a data balancing manager communicates with each microprocessor complex to manage the data flow between the I/O devices and the microprocessor complexes so as to balance the load across the microprocessor complexes. In another embodiment of the present invention, path selection is random. As will be appreciated by one skilled in the art, other data balancing routines may be invoked without departing from the intent and scope of the present invention. These and other implementation methodologies can be successfully utilized by the present invention. These implementation methodologies are known within the art, and the specifics of their application within the context of the present invention will be readily apparent to one of ordinary skill in the relevant art in light of this specification.
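Purely by way of example, two of the simplest such policies, a rotating (round-robin) selection and the random selection mentioned above, might be sketched in C as follows; neither is prescribed by the invention, and an actual data balancing manager would weigh measured traffic per microprocessor complex instead.

#include <stdlib.h>

/* Illustrative only: rotate through the available paths. */
static int next_path_round_robin(int num_paths)
{
    static int last = -1;
    last = (last + 1) % num_paths;
    return last;
}

/* Illustrative only: pick an available path at random. */
static int next_path_random(int num_paths)
{
    return rand() % num_paths;
}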
In another embodiment of the present invention, an address scheme is established to provide each I/O device with a static address map of destination microprocessor complexes. For example, one particular I/O device, coupled to an I/O router, may select from a menu of 8 different data paths so as to reach a desired microprocessor complex.
A simple address routing associated with a single stage switching complex is extended to a multi-stage switch through a recursive application of address based routing. The steps to construct these address mappings proceed from the microprocessor complexes themselves up through the switching fabric and I/O routers. For example, let the largest address range of any of the switch complexes using non-transparent ports be 0 to M−1 bytes. Then the address range of the transparent port of each switch at the lowest level will also be 0 to M−1 bytes, while the address range of the non-transparent ports will be M to 2M−1, with an offset of −M applied to the addresses of requests that map to the non-transparent port. Similarly, the next level of switches will have a transparent port range of 0 to 2M−1 bytes, and the non-transparent range will be 2M to 4M−1 with an offset of −2M. As with the lower switch level, addresses 0 to M−1 map to the microprocessor complex serving as the root complex of the tree which owns the switch, while addresses M to 4M−1 all map to a non-transparent port at one or the other or both of the levels of the switch.
When “L” is defined as the level number of a switch, with L=1 being the level closest to the microprocessor complexes, then at each level the transparent port covers a range of 0 to L*M−1, while the non-transparent port covers a range of L*M to 2*L*M−1, with an offset of −L*M. A dual ported I/O router essentially adds an additional level of switches for that particular I/O device. Based on these assignments and the actual switch connectivity, a static map of address ranges to microprocessor complexes can be produced for each switch tree. Then, when setting up an I/O router or HBA (I/O device) to microprocessor complex direct memory access transfer, i.e. a data path, the destination and owning microprocessor complex numbers are simply used to index a table of direct memory access address offsets that are added to the local address of the allocated buffers, as shown in Table 1.
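The level based ranges above can be captured in a few lines of C. The sketch below is illustrative only; M, the number of complexes, and the table layout are assumptions made for the example and merely stand in for Table 1.

/* Sketch of the level based mapping described above.  For a switch at
 * level L (L = 1 being nearest the microprocessor complexes):
 *   transparent port:     addresses 0   .. L*M - 1
 *   non-transparent port: addresses L*M .. 2*L*M - 1, offset -L*M
 * where M is the largest address range of any switch complex. */
static unsigned long transparent_limit(int level, unsigned long M)
{
    return (unsigned long)level * M;          /* exclusive upper bound */
}

static long nontransparent_offset(int level, unsigned long M)
{
    return -(long)((unsigned long)level * M); /* applied when a request
                                                 crosses the NT port */
}

/* Hypothetical stand-in for Table 1: direct memory access address
 * offsets indexed by owning and destination microprocessor complex,
 * filled in from the actual switch connectivity at boot. */
#define NUM_COMPLEXES 2
static long dma_offset[NUM_COMPLEXES][NUM_COMPLEXES];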
Thus, using the techniques discussed here, an address mapping table can easily be developed for any size cross-coupled system of switches with non-transparent ports coupled to multiple ported I/O routers. In one embodiment, the tables would be derived during boot up with relevant information programmed into the switches at that time. In another embodiment of the present invention, the information would be saved for each microprocessor complex, so it could immediately translate a destination complex plus local memory address into the correct offset memory address for that complex's devices, thus enabling efficient and rapid communication from any I/O device to any microprocessor complex in the system.
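With such a table in hand, setting up a direct memory access transfer reduces to one lookup and one addition, as the short sketch below suggests; it reuses the hypothetical dma_offset table from the previous example and is not a definitive implementation.

/* Hedged sketch: translate a buffer address local to the destination
 * complex into the address the owning complex's I/O device must use by
 * adding the pre-computed offset for that (owner, destination) pair. */
static unsigned long dma_address(int owner, int dest, unsigned long local_addr)
{
    return local_addr + (unsigned long)dma_offset[owner][dest];
}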
Switches within the switching fabric are further coupled 640 to a plurality of microprocessor complexes, wherein each microprocessor complex is coupled to at least two unique switches of the switching fabric. The resulting network of switches and I/O routers produces multiple redundant paths between each I/O device and each microprocessor complex.
An address scheme is configured 650 to establish an address based routing or map between each microprocessor complex and each I/O device. From these multiple paths or routes, a data path is selected 660 based, in one embodiment of the present invention, on data path balancing. A query 670 is then performed to determine whether all of the I/O associated with the processing is complete. When the I/O is complete, the process terminates 695. When additional I/O is ongoing, path selection 660 is again conducted.
While I/O routers having a plurality of ports provide a plurality of options for balancing data traffic as well as switch and microprocessor complex utilization, such I/O routers with multiple ports also improve system availability. In another embodiment of the present invention, a scheme allows both single and dual ported I/O devices to co-exist in the same I/O chassis. This configuration provides maximum configuration flexibility without limiting the capability of single ported systems. In yet another embodiment of the present invention, single ported I/O cards are interleaved with dual ported I/O routers.
As will be understood by those familiar with the art, the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Likewise, the particular naming and division of the modules, managers, functions, systems, engines, layers, features, attributes, methodologies and other aspects are not mandatory or significant, and the mechanisms that implement the invention or its features may have different names, divisions and/or formats. Furthermore, as will be apparent to one of ordinary skill in the relevant art, the modules, managers, functions, systems, engines, layers, features, attributes, methodologies and other aspects of the invention can be implemented as software, hardware, firmware or any combination of the three. Of course, wherever a component of the present invention is implemented as software, the component can be implemented as a script, as a standalone program, as part of a larger program, as a plurality of separate scripts and/or programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every and any other way known now or in the future to those of skill in the art of computer programming. Additionally, the present invention is in no way limited to implementation in any specific programming language, or for any specific operating system or environment and can be stored on any applicable storage media or medium that can possess program code. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention.
While there have been described above the principles of the present invention in conjunction with specific computer architecture, it is to be clearly understood that the foregoing description is made only by way of example and not as a limitation to the scope of the invention. Particularly, it is recognized that the teachings of the foregoing disclosure will suggest other modifications to those persons skilled in the relevant art. Such modifications may involve other features which are already known per se and which may be used instead of or in addition to features already described herein. Although claims have been formulated in this application to particular combinations of features, it should be understood that the scope of the disclosure herein also includes any novel feature or any novel combination of features disclosed either explicitly or implicitly or any generalization or modification thereof which would be apparent to persons skilled in the relevant art, whether or not such relates to the same invention as presently claimed in any claim and whether or not it mitigates any or all of the same technical problems as confronted by the present invention. The Applicant hereby reserves the right to formulate new claims to such features and/or combinations of such features during the prosecution of the present application or of any further application derived therefrom.
The present application relates to U.S. patent application Ser. No. ______ filed on ______ entitled, “Data Buffer Allocation in a Non-blocking Data Services Platform using Input/Output Switching Fabric” and U.S. patent application Ser. No. ______ filed on ______ entitled, “Cross-Coupled Peripheral Component Interconnect Express Switch”. The entirety of both applications is hereby incorporated by this reference.