Traditional server systems were designed so that each server had dedicated input/output (I/O) devices. The I/O devices were either integrated onto the server motherboard or added by the vendor or customer in the form of an add-in card, such as a PCI (Peripheral Component Interconnect) or PCI-Express adapter card. All resources of the I/O device were utilized only by the associated server. When multiple servers are deployed together, say in a network, each server has a dedicated network adapter that performs the required I/O functions. These servers are usually connected to a network switch, which has a port reserved for each server.
Each server is usually limited to hosting a single application to avoid operating system (OS) conflicts. When an application is deployed onto a server, I/O devices are allocated and the system is configured to host that particular application. For example, in certain networking applications, dedicated I/O devices (a network adapter and a storage adapter) are allocated to the server. The system configuration involves installing an OS and application software on the server, configuring the local adapters, connecting the server to switches, configuring the network and storage fabrics to associate those connections with the required network and storage devices, and so on. When an application needs to be moved because of a server failure or for other reasons, the server to which the application is moved must be reconfigured. Such reconfiguration consumes resources and causes long server downtime, which increases the cost of operating the server.
One or more embodiments of the present invention relate to an apparatus comprising: a server comprising n operating system images and an IOV aware root complex; a plurality of physical I/O devices comprising n virtual I/O functions; and a PCI Express bus operatively connected to the server and the plurality of physical I/O devices via the root complex, wherein the root complex is operable to provide communication between the n operating system images and the n virtual I/O functions, and wherein the server and the plurality of physical I/O devices are modules in a chassis.
One or more embodiments of the present invention relate to an apparatus comprising: a plurality of servers, each server comprising n operating system images and an IOV aware root complex; a plurality of physical I/O devices, each physical I/O device comprising n virtual I/O functions; a PCI Express switch fabric comprising a plurality of upstream ports respectively connected to the plurality of servers and a plurality of downstream ports connected to the plurality of physical I/O devices; an IOV management entity operable to provide communication between any one of the n operating system images and at least one I/O virtual function, wherein the plurality of servers and the plurality of devices are modules in a chassis.
One or more embodiments of the present invention relate to an interconnect fabric comprising: a plurality of ports configured as upstream ports, each upstream port operatively connected to a server; a plurality of ports configured as downstream ports, each downstream port operatively connected to a physical I/O device; and an I/O virtualization management entity operable to provide communication between at least one of the upstream ports and at least one of the downstream ports, wherein the interconnect fabric supports I/O virtualization of the I/O devices connected to the downstream ports.
Other aspects and advantages of the invention will be apparent from the following description and the appended claims.
In one aspect, embodiments disclosed herein relate to systems for sharing I/O devices among multiple servers, hosts, and applications. In particular, embodiments of the present invention relate to virtualization of I/O devices based on PCI-Express I/O virtualization.
Embodiments of the present invention are described in detail below with respect to the drawings. Like reference numbers are used to denote like parts throughout the figures.
Virtualization is a set of technologies that allow multiple applications to securely share the server hardware, allow applications to be moved easily and efficiently from one server to another, and allow network and storage connections to track changes in the allocations of applications to hardware without requiring administrative action on the network or storage fabrics.
With I/O virtualization, the I/O devices themselves have logic that allows them to serve multiple entities. The servers may run multiple OS images, where each OS image may run a particular application. I/O virtualization allows multiple OSs to share a single I/O device.
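For illustration only, the following minimal C sketch models this kind of sharing: a single physical device exposes several virtual functions, and each OS image is bound to its own function. The structure names, the fixed count NUM_VFS, and the first-free binding policy are assumptions made for the example and are not elements of the embodiments described below.

```c
#include <stdio.h>

#define NUM_VFS 4  /* hypothetical number of virtual functions on one device */

/* One virtual function exposed by a shared physical I/O device. */
struct virtual_function {
    int vf_index;      /* index of the virtual function within the device       */
    int os_image_id;   /* OS image currently bound to this function, -1 = free  */
};

/* A physical I/O device whose resources are partitioned into virtual functions. */
struct physical_device {
    const char *name;
    struct virtual_function vfs[NUM_VFS];
};

/* Bind an OS image to the first free virtual function, if any. */
static int bind_os_image(struct physical_device *dev, int os_image_id)
{
    for (int i = 0; i < NUM_VFS; i++) {
        if (dev->vfs[i].os_image_id < 0) {
            dev->vfs[i].os_image_id = os_image_id;
            return i;  /* virtual function index granted to this OS image */
        }
    }
    return -1;         /* device fully subscribed */
}

int main(void)
{
    struct physical_device nic = { .name = "shared-nic" };
    for (int i = 0; i < NUM_VFS; i++)
        nic.vfs[i] = (struct virtual_function){ .vf_index = i, .os_image_id = -1 };

    /* Three OS images share the single physical device. */
    for (int os = 1; os <= 3; os++)
        printf("OS image %d -> %s VF %d\n", os, nic.name, bind_os_image(&nic, os));
    return 0;
}
```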
The root complex 213 connects the processor and memory subsystem (not shown) of the server 201 to the PCI-Express bus 225 through a PCI-Express port (not shown). Its function is similar to that of a host bridge in a PCI system. The root complex 213 generates transaction requests on behalf of the processor, to which it is connected through a local bus (not shown). The root complex 213 may be implemented as a discrete device (e.g., a custom-designed CMOS chip or an FPGA) or may be integrated with the processor. The root complex 213 may have more than one PCI-Express port, which may, in turn, be connected to multiple PCI-Express buses or PCI-Express switches.
Each of the virtual I/O functions, Virtual I/O-1 217 to Virtual I/O-n 223, may include a direct memory access (DMA) engine. The DMA engine moves data back and forth between the memory of the associated OS image in the server 201 and the virtual I/O function in the I/O device 203. The root complex 213 is used to directly map each OS image to a virtual I/O function within the I/O device 203.
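As a purely illustrative sketch of the data movement described above, the following C program simulates one DMA transfer between a buffer owned by an OS image and a buffer owned by a virtual I/O function. The descriptor layout, the field names, and the use of memcpy in place of real bus transactions are assumptions made for the example; actual hardware would use bus addresses, descriptor rings, and doorbell registers.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Hypothetical DMA descriptor: one transfer between an OS image's buffer
 * and a buffer owned by the virtual I/O function. */
struct dma_descriptor {
    void    *os_image_buf;   /* memory belonging to the associated OS image */
    void    *vf_buf;         /* memory belonging to the virtual function    */
    size_t   length;
    int      to_device;      /* 1 = OS image -> VF, 0 = VF -> OS image      */
};

/* Simulated DMA engine owned by one virtual function. */
static void dma_engine_run(const struct dma_descriptor *d)
{
    if (d->to_device)
        memcpy(d->vf_buf, d->os_image_buf, d->length);
    else
        memcpy(d->os_image_buf, d->vf_buf, d->length);
}

int main(void)
{
    char os_buf[16] = "hello from OS-1";
    char vf_buf[16] = {0};

    struct dma_descriptor tx = {
        .os_image_buf = os_buf, .vf_buf = vf_buf,
        .length = sizeof(os_buf), .to_device = 1,
    };
    dma_engine_run(&tx);
    printf("VF received: %s\n", vf_buf);
    return 0;
}
```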
The hypervisor 215 allows multiple OS images, OS Image-1 205 to OS Image-n 211, to run simultaneously on a single server. The hypervisor 215 may be considered an operating system in its own right, on which multiple guest OSs are installed. Each guest OS operates as if it owned all of the server hardware. The guest OSs may also run simultaneously.
As can be seen in the figure, two physical I/O devices, device-1 307 and device-2 309, are connected to the downstream ports of the PCIe IOV Fabric 311. Each I/O device includes n virtual I/O functions. The n virtual I/O functions included in device-1 307 are labeled 337 to 341, and the n virtual I/O functions included in device-2 309 are labeled 343 to 347.
The upstream ports of the shared PCIe IOV Fabric 311 are connected to the servers, while the downstream ports are connected to the physical I/O devices. The PCIe IOV Fabric 311 may be composed of a single switch or multiple switches and an I/O management unit (not shown). The I/O management unit maintains port mappings that allow each server to build its own I/O device tree and assign device addresses independently of other systems. The mappings depend on the system design, which determines the server and I/O device connectivity architecture. When address mappings are established prior to a system being booted, the BIOS in the system determines the available I/O devices behind the PCIe IOV Fabric 311 and proceeds to configure them in a manner similar to the way it configures dedicated I/O devices. When mappings are established or torn down while the server is running, changes in the I/O configuration are conveyed as PCI-Express "hot-plug events," which result in the operating system adding or removing the affected devices from its device tree. The hot-plug capability allows insertion and removal of I/O devices while main power to the system is maintained. Therefore, powering down the entire platform in order to plug and unplug devices is not necessary.
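A minimal C sketch of this port-mapping behavior is shown below, assuming a small fixed number of upstream and downstream ports and a simple boolean mapping table. The function names and the way events are reported are illustrative stand-ins for the I/O management unit's internal bookkeeping and for real PCI-Express hot-plug notifications.

```c
#include <stdio.h>

#define UPSTREAM_PORTS   4   /* server-facing ports (assumed count)  */
#define DOWNSTREAM_PORTS 4   /* device-facing ports (assumed count)  */

enum hotplug_event { DEVICE_ADDED, DEVICE_REMOVED };

/* map[u][d] != 0 means upstream port u (a server) may reach downstream
 * port d (a physical I/O device). Each server sees only its own mappings
 * and can therefore number its devices independently of other servers. */
static int map[UPSTREAM_PORTS][DOWNSTREAM_PORTS];

/* In place of a real PCI-Express hot-plug interrupt, simply report the event. */
static void notify(int upstream, int downstream, enum hotplug_event ev)
{
    printf("server on port %d: device on port %d %s\n",
           upstream, downstream, ev == DEVICE_ADDED ? "added" : "removed");
}

static void map_device(int u, int d)   { map[u][d] = 1; notify(u, d, DEVICE_ADDED); }
static void unmap_device(int u, int d) { map[u][d] = 0; notify(u, d, DEVICE_REMOVED); }

int main(void)
{
    map_device(0, 2);     /* server 0 is granted the device on downstream port 2 */
    map_device(1, 2);     /* server 1 shares the same physical device            */
    unmap_device(0, 2);   /* server 0 later releases it while running            */
    return 0;
}
```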
The PCIe IOV fabric 311 establishes a hierarchy associated with each root complex 325, 327, and 329. A hierarchy includes all the devices and links associated with a root complex that are either directly connected to the root complex via its ports, or indirectly connected via switches and bridges.
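The notion of a hierarchy can be illustrated with the following small C sketch, which walks every device reachable from a root complex, either directly or through a switch. The node structure, the names, and the topology are assumptions chosen for the example.

```c
#include <stdio.h>

/* A node in one root complex's hierarchy: either a switch/bridge or an
 * endpoint I/O device. Names and topology are illustrative only. */
struct node {
    const char  *name;
    struct node *children[4];
    int          nchildren;
};

/* Walk the hierarchy the way enumeration proceeds: every device reachable
 * from the root complex, directly or through switches, belongs to it. */
static void enumerate(const struct node *n, int depth)
{
    printf("%*s%s\n", depth * 2, "", n->name);
    for (int i = 0; i < n->nchildren; i++)
        enumerate(n->children[i], depth + 1);
}

int main(void)
{
    struct node dev1 = { .name = "I/O device-1" };
    struct node dev2 = { .name = "I/O device-2" };
    struct node sw   = { .name = "PCIe switch", .children = { &dev1, &dev2 }, .nchildren = 2 };
    struct node root = { .name = "root complex", .children = { &sw }, .nchildren = 1 };

    enumerate(&root, 0);
    return 0;
}
```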
The physical I/O devices described above are designed in an industry standard form factor, the PCI Express ExpressModule (EM). The ExpressModule form factor is specified by the PCI Special Interest Group (PCI-SIG). The physical I/O devices 403-407 may be separate modules within a chassis supporting the system. Alternatively, they may be grouped into a single module called the Network Express Module (NEM). An NEM provides aggregation of I/O resources within a single module.
Two EMs are dedicated to each Blade server module. Express module-1 607 and Express module-2 609 are directly connected to Blade server module-1 601. Similarly, Express module-19 611 and Express module-20 613 are directly connected to Blade server module-10 603. The dedicated EMs are not sharable by multiple servers. However, each dedicated EM may be shared by multiple operating systems installed on the associated blade server module.
Four Network Express Modules (NEMs) are also connected to the blade servers through the midplane 605. NEM-1 615 is connected to each Blade server module 601-603. Similarly, NEM-4 617 is connected to each Blade server module 601-603. The configuration shown allows each NEM to be shared by all the blade server modules on the computer system chassis. NEM-1 615 includes a PCI Express IOV fabric 619 and two Express modules 621 and 622. The root complexes of the Blade servers 601-603 access the virtual I/O functions of Express modules 621 and 622 of NEM-1 615 via the PCI Express IOV fabric 619. Similarly, NEM-4 617 also includes a PCI Express IOV fabric 625 and two Express modules 627 and 629.
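The connectivity just described can be summarized with the following illustrative C sketch, which prints which Express Modules are dedicated to each blade server module and which Network Express Modules are shared by all of them. The numbering formula and loop structure are assumptions for the example; the counts follow the configuration described above.

```c
#include <stdio.h>

#define NUM_BLADES    10   /* blade server modules in the chassis          */
#define EMS_PER_BLADE  2   /* dedicated Express Modules per blade          */
#define NUM_NEMS       4   /* Network Express Modules shared by all blades */

int main(void)
{
    /* Dedicated EMs: blade b owns two EMs that no other blade can reach
     * (they are shared only by the OS images running on that blade). */
    for (int b = 0; b < NUM_BLADES; b++)
        printf("blade %2d <- EM-%d, EM-%d (dedicated)\n",
               b + 1, EMS_PER_BLADE * b + 1, EMS_PER_BLADE * b + 2);

    /* Shared NEMs: every NEM is reachable from every blade through the
     * midplane and the NEM's internal PCI Express IOV fabric. */
    for (int n = 0; n < NUM_NEMS; n++)
        printf("NEM-%d <- blades 1..%d (shared via IOV fabric)\n", n + 1, NUM_BLADES);
    return 0;
}
```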
Advantages of the present invention may include one or more of the following. In one or more embodiments of the present invention, resources of a physical I/O device are shared by multiple servers using I/O virtualization. Each of the servers may have multiple operating systems running different applications. This configuration allows full utilization of the resources of the physical I/O device, reducing operating costs and increasing efficiency.
In one or more embodiments of the present invention, blade server modules share physical I/O devices in industry standard form factors using I/O virtualization. The modular design allows for higher computing density by providing more processing power per rack unit than conventional rack-mount systems; allows increased serviceability and availability by sharing common system components such as power, cooling, and I/O interconnects; allows reduced complexity through fewer required components, cable and component aggregation, and consolidated management; and allows lower cost through ease of serviceability and low acquisition cost.
The industry standard form factor eliminates the disadvantages associated with being locked in to a single vendor. The user is no longer limited by a single vendor's innovation. The ability to use I/O devices from several vendors drives costs lower and at the same time increases availability. The industry standard form factor, along with the modular design, provides greater efficiency and lower operating costs to the end user.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.