1. Technical Field
The present invention relates generally to communication protocols between a host computer and a storage server via an input/output (I/O) adapter. More specifically, the present invention is directed to a system and method for enabling user space middleware or applications to pass I/O storage requests to a storage server which authenticates the I/O storage requests before processing them with respect to particular storage devices associated with the storage server. Moreover, the present invention is directed to a system and method for enabling user space middleware or applications to pass such I/O storage requests to the storage server without run-time involvement from the local Operating System (OS), or, in a virtual system, the local hypervisor.
2. Description of Related Art
Operating systems, according to the present state of the art, do not permit user space middleware or applications, such as a database, to directly access persistent storage that is identified through the Operating System's Raw Mode Storage I/O interface or the Operating System's Logical Volume Storage I/O interface. As a result, the user space middleware must invoke an Operating System (OS) call and incur several task switches every time an I/O operation is performed. The first task switch is caused when the middleware or application transfers a storage request to the OS. A second task switch occurs when the OS passes control back to the user space middleware or application, after the OS completes processing the middleware or application storage request and passes the storage request to the storage adapter.
A third task switch occurs when the storage adapter completes the associated I/O storage operations and interrupts the processing being performed by an application so that the OS may process the storage adapter's completion. The final task switch occurs when the OS finishes processing the storage adapter's completion and gives control back to the middleware or application that transferred the storage request to the OS. In addition to these task switches the storage adapter typically has a single request queue to process work from the operating system.
The four task switches described above may be considered wasted processor cycles because all work on the thread being switched is stopped until the task switch is complete. On some servers, the number of storage operations performed by a user space middleware or application program may be quite large. Modern, high-end servers may have millions of these operations per second, resulting in several million task switches per second.
In view of the above, it would be beneficial to have a method, system and computer program product having computer readable instructions for handling input/output (I/O) storage requests in which such task switches are minimized. Moreover, it would be advantageous to have an improved method, system, and computer instructions that enable user space middleware or applications to pass I/O storage requests directly to a physical I/O adapter, and thereafter to a storage server, without run-time involvement from the local Operating System (OS), or, in a virtual system, the local hypervisor. It would also be advantageous to have the mechanism apply to InfiniBand, TCP/IP Offload Engines, RDMA (Remote Direct Memory Access) enabled NICs (Network Interface Controllers), iSCSI adapters, iSER (iSCSI Extensions for RDMA) adapters, parallel SCSI adapters, Fibre Channel Adapters, Serial Attached SCSI Adapters, ATA Adapters, Serial ATA Adapters, and any other type of storage adapter.
Further, it would be advantageous to have an improved method, system, and computer instructions that enable protection mechanisms in a storage server to ensure that I/O storage requests that are directly sent to a physical I/O adapter from an application instance, and received by a storage server, are only completed to portions of storage devices associated with the storage server that have been previously allocated for out of user space I/O with the application instance. Moreover, it would be beneficial to have a method, system and computer instructions for enabling the creation, modification, querying and deletion of data structure entries used to facilitate direct I/O operations between an application instance, a physical I/O adapter, and a storage server. In addition, it would be beneficial to have a method, system and computer instructions for processing user space operations so as to perform storage device resource management and direct I/O operation data structure management. Finally, it would be beneficial to have a method, system and computer instructions for achieving the above objectives using the file system of the operating system running on the host system.
The present invention provides a method, computer program product, and data processing system that enables user space middleware or applications to pass I/O storage requests directly to a physical I/O adapter, and thus a storage server via the physical I/O adapter, without run-time involvement from the local Operating System (OS), or, in a virtual system, the local hypervisor. The mechanism described in this invention applies for InfiniBand Host Channel Adapters, TCP/IP Offload Engines, RDMA (Remote Direct Memory Access) enabled NICs (Network Interface Controllers), iSCSI adapters, iSER (iSCSI Extensions for RDMA) adapters, parallel SCSI adapters, Fibre Channel Adapters, Serial Attached SCSI Adapters, ATA Adapters, Serial ATA Adapters, and any other type of storage adapter.
Specifically, the present invention is directed to a mechanism for providing and using a translation protection table (TPT) data structure to control user space, and out of user space, Input/Output (I/O) operations. In one exemplary embodiment of the present invention, the TPT includes a file name protection table (FNPT) having entries for each file managed by the operating system's file system. The entries in the FNPT include pointers to a segment of a file extension protection table (FEPT) that corresponds to the file name. Entries in the FEPT may include a storage device identifier number, a logical unit number (LUN), an offset and a length, and optional other protection table context information. With the TPT of this exemplary embodiment, an application instance or middleware may submit an I/O request by using a file key to find an entry in the FNPT that corresponds to the file that is the target of the I/O request. The entry in the FNPT includes a pointer to an entry in the FEPT. The offset in the FEPT entry is used to calculate a starting storage block address corresponding to the I/O request. The storage device identifier number, LUN and length in the FEPT entry are retrieved and placed, along with the calculated starting storage block address and an authentication key provided by the requesting application instance or middleware, into a storage command in a storage server command queue.
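The two-level lookup described above can be illustrated with a minimal sketch. It is not the claimed implementation: the class and function names are hypothetical, addresses are counted in blocks, and the starting block address is assumed to be the FEPT offset plus the block position requested within the file, since the text says only that the offset is "used to calculate" it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeptEntry:
    device_id: int   # storage device identifier number
    lun: int         # logical unit number
    offset: int      # first block of the file extent on the LUN (assumed unit: blocks)
    length: int      # extent length in blocks


@dataclass(frozen=True)
class FnptEntry:
    fept_index: int  # pointer into the FEPT for this file


@dataclass(frozen=True)
class StorageCommand:
    device_id: int
    lun: int
    start_block: int
    length: int
    auth_key: bytes


def build_command(fnpt, fept, file_key, file_block, auth_key):
    """Translate a file-relative block number into a storage command."""
    extent = fept[fnpt[file_key].fept_index]
    if not 0 <= file_block < extent.length:
        raise ValueError("request falls outside the file extent")
    # Assumed address calculation: extent offset plus file-relative block.
    start_block = extent.offset + file_block
    return StorageCommand(extent.device_id, extent.lun,
                          start_block, extent.length, auth_key)


# Hypothetical tables: one file, keyed 0x10, backed by one extent.
fnpt = {0x10: FnptEntry(fept_index=0)}
fept = [FeptEntry(device_id=3, lun=1, offset=2048, length=64)]
cmd = build_command(fnpt, fept, file_key=0x10, file_block=5, auth_key=b"k-123")
```

The resulting command would then be placed in the storage server command queue; queueing is omitted here to keep the translation step isolated.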
When an application instance initializes, the application instance may request the opening of a file with which I/O operations are to be performed by the application instance. In response to such a request, the operating system may create the file and execute a bootstrap algorithm to authorize the application instance to access portions of a storage device associated with the created file. This bootstrap algorithm may cause the operating system to send a request to a storage server to allocate one or more portions of one or more storage devices for the file. The storage server may associate an authentication key with the file and may return this authentication key to the application instance. The authentication key may be placed in the translation protection table, e.g., the FNPT or FEPT, in association with the file.
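As a rough illustration of this bootstrap step, the sketch below models a storage server that hands out a logical unit together with a freshly generated authentication key, and a host-side helper that records the key in a table entry for the file. All names are hypothetical, the table is a plain dictionary, and the 16-byte random key is an assumption; the text does not say how keys are generated.

```python
import secrets


class StorageServer:
    """Toy stand-in for the storage server's allocation logic."""

    def __init__(self, free_luns):
        self._free = list(free_luns)
        self.lun_keys = {}  # LUN -> authentication key kept on the server

    def allocate(self):
        if not self._free:
            raise RuntimeError("no free logical units")
        lun = self._free.pop(0)
        key = secrets.token_bytes(16)  # assumed key format
        self.lun_keys[lun] = key
        return lun, key


def bootstrap_file(tpt, server, file_name):
    """Allocate storage for a file and store the returned key in the table."""
    lun, key = server.allocate()
    tpt[file_name] = {"lun": lun, "auth_key": key}
    return tpt[file_name]


server = StorageServer(free_luns=[4, 5])
tpt = {}
entry = bootstrap_file(tpt, server, "db.dat")
```

Both sides now hold the same key for the allocated logical unit, which is what lets the server authenticate a later "open" command.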
Thereafter, when the application instance is to perform I/O operations with the file, the application instance requests opening of the created file. The translation protection table is then used to retrieve the authentication key for the file in addition to the identifiers of portions of the storage device(s) that are associated with the file, e.g., the logical unit numbers (LUNs). The authentication key is packaged into an “open” storage device command that is stored in a queue for sending to the storage server to thereby initiate a session between the application instance and the storage server.
The storage command is then retrieved from the queue by the I/O adapter which forwards the “open” storage command to a storage server controller of the storage server. The storage server controller then authenticates the “open” storage command based on the authentication key. The storage server controller checks, for example, a storage server based data structure that identifies a correspondence between authentication keys and logical unit numbers (LUNs). The storage server controller looks up the LUN(s), identified in the “open” storage command, in the storage server based data structure and identifies a corresponding authentication key. The authentication key retrieved from the storage server based data structure is then compared against the authentication key passed to the storage server controller in the “open” storage command. If there is a match, the storage server controller initiates a session between the application instance and the identified portion(s) of the storage device(s). If there is not a match between the authentication keys, then opening of the portion(s) of the storage device(s) is denied and an error message may be returned. Assuming that the session is initiated between the application instance and the portion(s) of the storage device(s), the application instance may then send storage commands to these opened portion(s) of the storage device(s) without further authentication until the session is discontinued.
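The authentication performed by the storage server controller might look like the following sketch. The class, its methods, and the use of application identifiers to track sessions are assumptions made for illustration; the key comparison uses the standard library's constant-time `hmac.compare_digest`, which is a design choice of this sketch rather than something the text requires.

```python
import hmac


class StorageServerController:
    def __init__(self, lun_keys):
        self._lun_keys = dict(lun_keys)  # LUN -> expected authentication key
        self._sessions = set()           # (app_id, lun) pairs that are open

    def open(self, app_id, luns, auth_key):
        """Authenticate an "open" command; start a session on a match."""
        for lun in luns:
            expected = self._lun_keys.get(lun)
            if expected is None or not hmac.compare_digest(expected, auth_key):
                return False  # opening denied; caller may report an error
        self._sessions.update((app_id, lun) for lun in luns)
        return True

    def accepts(self, app_id, lun):
        """Later commands need only an open session, not the key again."""
        return (app_id, lun) in self._sessions

    def close(self, app_id):
        """Discontinue every session held by the application instance."""
        self._sessions = {s for s in self._sessions if s[0] != app_id}


controller = StorageServerController({1: b"secret-a", 2: b"secret-b"})
opened = controller.open("app-1", [1], b"secret-a")
denied = controller.open("app-2", [2], b"wrong-key")
```

After a successful `open`, `accepts` returns true for that application and LUN until `close` is called, mirroring the session behavior described above.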
In one embodiment of the present invention, the TPT includes a logical volume protection table having entries for each logical volume created for the OS or system image of the host system. The entries in the logical volume protection table identify the storage device identifier number, logical unit number, offset and length, similar to the entries in the FEPT described above. An application instance or middleware may use a volume key to identify an entry in the logical volume protection table (LVPT) corresponding to a logical volume that is the target of an I/O storage request. The volume offset in the corresponding LVPT entry may be used to calculate a starting storage block address for the I/O storage request. The storage device identifier number, logical unit number, and length may be retrieved from the LVPT entry and placed, along with the calculated starting storage block address and an authentication key provided by the requesting application instance or middleware, in a storage command in a storage command queue.
A similar mechanism as described above for opening a file may be used when opening a logical volume in this embodiment. That is, a request to allocate one or more portions of one or more storage devices may be sent to a storage server to thereby initialize a logical volume for I/O operations from the application instance. This process may involve the establishment of an authentication key which is returned to the application instance and used by the application instance to access the portion(s) of the storage device(s) associated with the logical volume. The authentication key may be stored in the LVPT in a similar fashion as described previously with regard to the FNPT or FEPT. Having stored the authentication key in the LVPT, the authentication key may be used when initiating a session with the storage server to thereby open the logical volume for use with I/O operations from the application instance.
In this way, only those portions of the storage device(s) associated with the storage server that have been allocated to the application instance or middleware may be accessed by the application instance or middleware. Moreover, only the application instance or middleware for which the portion of the storage device(s) is allocated may access the portion of the storage device(s). In addition, during the processing of the I/O request, the operating system or system image of the host system is not required to be involved beyond maintaining storage commands in a storage command queue. Thus, the multiple context switches required in prior art mechanisms are eliminated by the present invention.
As set forth hereafter, in one exemplary embodiment of the present invention, a computer program product, method and apparatus are provided in which an input/output (I/O) request is received from an application instance, the I/O request including a key value for identifying an entry in a translation protection table data structure, and wherein the I/O request targets a portion of a storage device in a remotely located storage system upon which an I/O operation is to be performed. An entry is retrieved from a translation protection table based on the key value, the entry including an identifier of the storage device and a logical unit number corresponding to the portion of the storage device targeted by the I/O request. A storage command is generated based on the identifier of the storage device and the logical unit number retrieved with the entry from the translation protection table. The storage command is placed in a storage command queue for transmission to the remotely located storage system.
The portion of the storage device that is targeted may be a logical volume comprising one or more logical unit numbers (LUNs) of the storage device. Alternatively, the portion of the storage device that is targeted may be a file comprising one or more logical unit numbers (LUNs) of the storage device.
In addition to the above, a request may be received, from the application instance, to open the portion of the storage device, the request including an authentication key. A command, having the authentication key, may be sent to the remotely located storage system to open the portion of the storage device. Results of the command to open the portion of the storage device may be returned. The remotely located storage system may perform authentication on the command to open the portion of the storage device based on the authentication key.
Moreover, a request may be received, from the application instance, to allocate a logical unit of the storage device to the portion of the storage device for input/output operations of the application instance. An allocate command, generated based on the received request to allocate the logical unit, may be sent to the remotely located storage system and a response may be received from the remotely located storage system identifying an authentication key for use in opening the logical unit of the portion of the storage device for I/O operations. An identifier of the authentication key may be stored, in the translation protection table data structure, in association with an identifier of the portion of the storage device.
Furthermore, the authentication key may be retrieved from the entry retrieved from the translation protection table and an open command may be generated to open the portion of the storage device. The open command may include the authentication key. The open command may be transmitted to the remotely located storage system to thereby open the logical unit of the portion of the storage device for I/O operations during a current session between the application instance and the remotely located storage system. The remotely located storage system may open the logical unit of the portion of the storage device for I/O operations only when the remotely located storage system verifies that the application instance is permitted to access the logical unit of the portion of the storage device based on the authentication key in the open command.
In addition to the above, a request may be received to create an entry in the translation protection table data structure for the portion of the storage device. An entry may be created in the translation protection table data structure for the portion of the storage device and a translation protection table key may be returned that corresponds to the created entry. The entry may include a storage device identifier number, a logical unit number, an offset, and a length. The entry may further include at least one of a protection domain, access control information, or an authentication key.
In another embodiment of the present invention, a request to query an entry in the translation protection table data structure for the portion of the storage device may be received and an entry in the translation protection table data structure for the portion of the storage device may be identified. Attributes of the entry in the translation protection table data structure may then be returned.
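The create and query operations described above can be sketched as follows. The class and field names are hypothetical, and issuing keys from a simple counter is an assumption; the text specifies only that a translation protection table key corresponding to the created entry is returned.

```python
import itertools
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TptEntry:
    device_id: int                            # storage device identifier number
    lun: int                                  # logical unit number
    offset: int
    length: int
    protection_domain: Optional[int] = None   # optional context fields
    access_control: Optional[str] = None
    auth_key: Optional[bytes] = None


class TranslationProtectionTable:
    def __init__(self):
        self._entries = {}
        self._next_key = itertools.count(1)  # assumed key scheme

    def create(self, **attrs):
        """Create an entry and return the key that identifies it."""
        key = next(self._next_key)
        self._entries[key] = TptEntry(**attrs)
        return key

    def query(self, key):
        """Return the attributes of an existing entry as a dictionary."""
        return asdict(self._entries[key])


table = TranslationProtectionTable()
key = table.create(device_id=3, lun=1, offset=0, length=128, auth_key=b"k")
attrs = table.query(key)
```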
In a further embodiment, a request to modify an entry in the translation protection table data structure for the portion of the storage device may be received. The entry in the translation protection table data structure may be modified and attributes of the modified entry in the translation protection table may be returned. A determination may be made as to whether there are any active transactions on the entry in the translation protection table data structure and the modification of the entry in the translation protection table data structure may be performed only if there are no active transactions on the entry.
If there is an active transaction on the entry in the translation protection table data structure, a timer may be initiated. A determination may be made as to whether the timer times out prior to a quiescent point being reached. An error may be generated if the timer times out prior to the quiescent point being reached.
In another embodiment of the present invention, a request to delete an entry in the translation protection table data structure for the portion of the storage device may be received. The entry in the translation protection table data structure may be marked as being invalid. A determination may be made as to whether there are any active transactions on the entry in the translation protection table data structure. The marking of the entry in the translation protection table data structure may be performed only if there are no active transactions on the entry.
If there is an active transaction on the entry in the translation protection table data structure, a timer may be initiated. A determination may be made as to whether the timer times out prior to a quiescent point being reached. An error may be generated if the timer times out prior to the quiescent point being reached.
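The modify and delete paths described above share the same pattern: wait, under a timer, until the entry has no active transactions, then either apply the change or mark the entry invalid. The sketch below shows one way to express that pattern with a condition variable. The class and method names are hypothetical, and counting active transactions per entry is an assumption about how the quiescent point might be detected.

```python
import threading


class GuardedEntry:
    def __init__(self, attrs):
        self.attrs = dict(attrs)
        self.valid = True
        self._active = 0                     # transactions currently using the entry
        self._cond = threading.Condition()

    def begin_transaction(self):
        with self._cond:
            self._active += 1

    def end_transaction(self):
        with self._cond:
            self._active -= 1
            if self._active == 0:
                self._cond.notify_all()      # quiescent point reached

    def _wait_quiescent(self, timeout_s):
        # Caller holds self._cond. wait_for returns False if the timer expires
        # while transactions are still active.
        if not self._cond.wait_for(lambda: self._active == 0, timeout=timeout_s):
            raise TimeoutError("timer expired before a quiescent point was reached")

    def modify(self, changes, timeout_s):
        """Apply changes only once no transactions are active."""
        with self._cond:
            self._wait_quiescent(timeout_s)
            self.attrs.update(changes)
            return dict(self.attrs)

    def invalidate(self, timeout_s):
        """Mark the entry invalid only once no transactions are active."""
        with self._cond:
            self._wait_quiescent(timeout_s)
            self.valid = False


entry = GuardedEntry({"lun": 1, "length": 64})
modified = entry.modify({"length": 128}, timeout_s=0.05)
```

If a transaction is still active when the timer expires, `modify` and `invalidate` raise `TimeoutError`, which corresponds to the error generated in the text.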
In one exemplary embodiment of the present invention, the apparatus may include a processor and a memory. The memory may contain instructions for performing the various operations described above. These instructions may be accessed and processed by the processor coupled to the memory.
These and other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the exemplary embodiments of the present invention.
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
The present invention applies to any general or special purpose host that accesses portions of remotely located storage devices associated with a storage server. For example, the present invention applies to a host that communicates, via an I/O adapter, such as a PCI family I/O adapter, virtual I/O adapter, endpoint device, virtual endpoint device or the like, with a storage server over one or more networks. The one or more networks may consist of end nodes, switches, routers and links interconnecting these components. The network links may be Fibre Channel, Ethernet, InfiniBand, Advanced Switching Interconnect, another standard storage network interconnect, or a proprietary link that uses proprietary or standard protocols. While the depictions and description hereafter will make reference to particular arrangements of networks and host nodes, it should be appreciated that the following exemplary embodiments are only exemplary and modifications to the arrangements specifically depicted and described may be made without departing from the spirit and scope of the present invention.
It is important to note that the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In an exemplary embodiment, the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, and the like.
Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters are coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters that may be used with the present invention.
With reference now to the figures, and in particular with reference to
As shown, processor I/O hierarchy 100 includes a processor chip 107 which includes one or more processors and their associated caches. Processor chip 107 is connected to memory 112 through a link 108. One of the links on the processor chip, such as link 120, connects to a PCI family I/O bridge 128. The PCI family I/O bridge 128 has one or more PCI family (PCI, PCI-X, PCI-Express, or any future generation of PCI) links that are used to connect other PCI family I/O bridges or a PCI family I/O adapter, such as PCI family adapter 1 145 and PCI family adapter 2 144, through a PCI link, such as links 132, 136, and 140. A PCI family adapter, such as PCI family adapter 1 145, may be used to connect to a network attached storage 152 through a network link, such as link 156 to network 164, that connects to either a switch or router 160, which in turn connects to the network attached storage 152 via link 158 and storage server 159. A PCI family adapter, such as PCI family adapter 2 144, may also be used to connect a direct attached storage device 162 through a link 148.
It is important to note that a PCI family adapter, such as PCI family adapter 1 145 or PCI family adapter 2 144, may be integrated with other components on the host node 102. For example, PCI family adapter 1 145 or PCI family adapter 2 144 may be integrated with PCI family I/O bridge 128. Another example is that the PCI family adapter, such as PCI family adapter 1 145 or PCI family adapter 2 144, may be integrated with processor chip 107.
With the exemplary embodiments of the present invention, the network attached storage devices and direct attached storage devices, such as network attached storage device 152 and direct attached storage device 162, are Small Computer System Interface (SCSI) storage devices. Each SCSI storage device has a unique SCSI ID number. This SCSI ID number uniquely identifies the SCSI storage device and may also be used in determining a priority associated with the SCSI storage device. Each SCSI storage device may further be broken up into logical units, identified by logical unit numbers (LUNs), e.g., eight logical units numbered 0 to 7.
In the exemplary embodiments of the present invention, the storage server 159 is an Internet Small Computer System Interface (iSCSI) storage server. The iSCSI protocol is an Internet Protocol (IP)-based storage networking standard for linking data storage facilities, developed by the Internet Engineering Task Force (IETF). By carrying SCSI commands over IP networks, iSCSI is used to facilitate data transfers over intranets and to manage storage over long distances.
With iSCSI based storage, when an end user or application sends a request, the operating system generates the appropriate SCSI commands and data request, which then go through encapsulation and, if necessary, encryption procedures. A packet header is added before the resulting IP packets are transmitted over an Ethernet connection.
When a packet is received, it is decrypted (if it was encrypted before transmission), and disassembled, separating the SCSI commands and request. The SCSI commands are sent on to the SCSI controller, and from there to the SCSI storage device. Because iSCSI is bi-directional, the protocol can also be used to return data in response to the original request.
It should be noted that in the iSCSI protocol, the operating system is required for generating the SCSI commands and data request. With the present invention, the iSCSI protocol is enabled in such a manner that the operating system need not be involved in the generation of the SCSI commands and data request, as discussed hereafter.
While the exemplary embodiments of the present invention will be described with regard to network attached storage devices and storage servers being SCSI storage devices and iSCSI storage servers, the present invention is not limited to such. Rather, other hardware interfaces may be used without departing from the spirit and scope of the present invention.
In addition, while the exemplary embodiments of the present invention will be described with regard to a PCI family adapter, it should be appreciated that the present invention is not limited to this type of adapter. Rather, the physical I/O adapter may be any type of I/O adapter including a PCI family adapter, a virtual I/O adapter, an endpoint device, a virtual endpoint device, a virtual I/O adapter endpoint device, or the like. One example of a virtual I/O adapter that may be used with the present invention is described in, for example, commonly assigned and co-pending U.S. patent application Ser. No. 11/065,829 entitled “Data Processing System, Method and Computer Program Product for Creation and Initialization of a Virtual Adapter on a Physical Adapter that Supports Virtual Adapter Level Virtualization,” filed on Feb. 25, 2005, which is hereby incorporated by reference. Other types of I/O adapters may be used without departing from the spirit and scope of the present invention.
With reference now to
As shown in
When the application instance 224 requires a portion of one or more storage device(s) 280 for data storage, the application instance 224 submits a request to the operating system (OS) 230. In response, the OS 230 creates a logical volume, assuming there is sufficient capacity for the portion of storage requested by the application instance, and returns information to the OS 230 identifying the logical volume that has been allocated to the application instance 224. This information may include, for example, a storage device identifier number, a logical unit number, an authentication key, and the like. The OS 230 generates one or more entries in the TPT 228 based on this information. The OS 230 may then return key values to the application instance which may be used by the application instance to submit I/O storage requests directed to the allocated logical volume.
After allocation of the logical volume for use by the application instance 224, the application instance 224 may open the logical volume and submit I/O storage requests targeting the logical volume directly. That is, the application instance 224 may generate an I/O storage logical volume open request having the key values provided to it for opening the logical volume for a current session. The application instance 224 invokes the application library 226 to perform translation of the I/O storage logical volume request into a storage command that can be processed by the storage server 270 to thereby open the logical volume for access during a session between the application instance 224 and the storage server 270. As part of this opening process, the storage server 270 may perform authentication of the open I/O request based on an authentication key passed to the storage server 270 by the application instance 224. Thereafter, the application instance 224 may submit I/O storage requests to the opened logical volume via translation by the TPT 228 into appropriate storage commands that are encapsulated into network data packets, transmitted to the storage server 270, and processed by the storage server 270 to perform I/O operations on the storage device(s) 280.
Based on information in the “open” I/O storage request from the application instance 224, the application library 226 looks up storage device information in the TPT 228 and generates a storage command based on the retrieved information from the TPT 228. As mentioned above, the “open” storage command may include an authentication key passed to the application library 226 by the application instance 224. This authentication key may be obtained from the TPT 228, for example, based on the key value passed into the application library 226 by the I/O request from the application instance 224. Alternatively, the application instance 224 itself may have a related register or other storage device that stores the authentication key such that the application instance 224 may supply the authentication key for the “open” storage command. The “open” storage command is placed in the storage command queue 232 and is eventually dispatched, by the adapter 250, to the storage server 270.
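The host-side path just described, from TPT lookup to queueing to dispatch by the adapter, can be sketched as below. The function names, the dictionary-based TPT entry, and the `send` callback standing in for the adapter's network transmission are all hypothetical. The sketch covers both ways of obtaining the authentication key: from the TPT entry or supplied directly by the application instance.

```python
from collections import deque


def submit_open(tpt, command_queue, tpt_key, auth_key=None):
    """Build an "open" storage command from a TPT entry and queue it."""
    entry = tpt[tpt_key]
    # Use the key supplied by the application if given, else the one in the TPT.
    key = auth_key if auth_key is not None else entry["auth_key"]
    command = {
        "op": "open",
        "device_id": entry["device_id"],
        "lun": entry["lun"],
        "auth_key": key,
    }
    command_queue.append(command)
    return command


def adapter_dispatch(command_queue, send):
    """Drain the queue in FIFO order, handing each command to the adapter."""
    while command_queue:
        send(command_queue.popleft())


tpt = {7: {"device_id": 3, "lun": 1, "auth_key": b"from-tpt"}}
queue = deque()
sent = []
submit_open(tpt, queue, tpt_key=7)
submit_open(tpt, queue, tpt_key=7, auth_key=b"from-app")
adapter_dispatch(queue, sent.append)
```

No operating system call appears on this path: the library writes to the queue and the adapter reads from it, which is the property the text relies on to avoid task switches.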
The storage server 270 receives the open storage command and performs authentication on the open storage command to ensure that the application instance 224 may be granted access to the portions of storage device(s) 280 referenced by the open storage command. Such authentication may be performed based on the authentication key included in the open storage command. The storage server 270 may perform a lookup, in a local data structure, of an entry corresponding to an identifier of a portion of storage device(s) 280 targeted by the storage command. An associated authentication key in the identified entry may then be compared against the authentication key received in the open storage command.
If the two keys match, the storage server 270 processes the open storage command and the corresponding logical volume is thereby opened for access during a current session between the application instance 224 and the storage server 270. Thereafter, subsequent I/O storage requests may be submitted by the application instance 224, which are then converted to storage commands, encapsulated in network data packets, transmitted to the storage server, and processed by the storage server to execute I/O operations on the LUN associated with the logical volume in the storage device(s) 280. If the two keys do not match, then opening of the logical volume for access by the application instance 224 is denied and an error message may be returned to the application instance 224.
As will be described with reference to
Using the above mechanisms of the present invention, once portions of the storage device(s) 280 are allocated for access by an application instance 224, the operating system 230 need not be included in the processing of I/O storage requests between the application instance and the storage server 270. That is, the application instance 224 may open a logical volume associated with a LUN in the storage device(s) 280 and submit I/O storage requests directly to the storage server via the application library 226, storage command queue 232, and adapter 250. Thus, the context switching that is required in prior art systems, where the operating system 230 is involved in I/O storage request processing, is eliminated.
In addition, since authentication of “open” storage commands is offloaded to the storage server 270, the host system 200 need not be modified to include authentication mechanisms and utilize its resources to perform authentication when processing “open” I/O storage requests. Thus, the adapter 250 and operating system 230 of the host system 200 may remain as generally known in the prior art. The only modification needed to the host system 200 is the modification to the application library 226 to include the TPT 228 mechanism of the present invention. The storage server 270 is modified to include authentication logic for authenticating “open” storage commands and returning results of the authentication.
Turning next to
As shown in
Logical volume protection table entry N 320 depicts an example entry in the logical volume protection table segment. Each entry in the logical volume protection table (LVPT) segment 304 contains a set of fields that are used to define that entry. Logical volume protection table entry N 320 contains the following fields: Protection Domain, Access Controls, Storage Device Identifier Number, Logical Unit Number, Offset, Length, and Authentication Key. The Protection Domain is a value that is provided to the application instance 305 when the LVPTE 320 is created in the LVPT segment 304. When the application instance 305 submits an I/O request directed to a particular logical volume, the application instance 305 provides a key value for identifying a particular LVPTE and a protection domain value that may be used to authenticate the application instance's ability to access the identified LVPTE. If the protection domain value in the I/O request matches the protection domain value in the LVPTE, then the I/O request may be further processed; otherwise, an error message may be returned.
The Access Controls identify whether the LVPTE is still valid or if it has been flagged as having been deleted or deallocated. If the LVPTE is no longer valid, then further processing of an I/O request targeting the LVPTE is aborted with an error message being returned. The Access Controls may further identify what types of I/O operations may be performed on the portion(s) of storage device(s) corresponding to the logical volume. If an I/O request identifies an I/O operation that is not permitted, then further processing of the I/O request may be aborted with an error message being returned.
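The Protection Domain and Access Controls checks described above can be sketched as follows. This is a minimal illustration only: the `Access` bit encoding, the `LvpteProtection` and `RequestRejected` names, and the use of an exception in place of the returned error message are all assumptions made for the example and are not part of the described embodiments.

```python
from dataclasses import dataclass
from enum import Flag


class Access(Flag):
    """Hypothetical bit encoding of the Access Controls field."""
    VALID = 1   # entry has not been deleted or deallocated
    READ = 2
    WRITE = 4


@dataclass
class LvpteProtection:
    """Only the two protection-related LVPTE fields are modeled here."""
    protection_domain: int
    access: Access


class RequestRejected(Exception):
    """Stands in for the error message returned to the application instance."""


def check_request(entry: LvpteProtection, protection_domain: int, op: Access) -> None:
    """Raise RequestRejected unless the request passes every check."""
    # The protection domain in the request must match the one in the entry.
    if protection_domain != entry.protection_domain:
        raise RequestRejected("protection domain mismatch")
    # An entry flagged as deleted or deallocated aborts further processing.
    if Access.VALID not in entry.access:
        raise RequestRejected("entry deleted or deallocated")
    # The requested I/O operation must be one the entry permits.
    if op not in entry.access:
        raise RequestRejected("I/O operation not permitted")
```

For example, an entry created with `Access.VALID | Access.READ` would accept a read request carrying the matching protection domain and reject a write request.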
The Storage Device Identifier Number may be, for example, a SCSI Identifier Number for the SCSI storage device or devices that make up the logical volume identified by the Logical Unit Number. The Logical Unit Number may be a SCSI LUN, for example. The SCSI Identifier Number (ID) and SCSI Logical Unit Number (LUN) are used to associate the LVPTE 320 with a specific SCSI device and a specific LUN within that device, respectively.
The Offset and Length are values that identify an offset to a starting address for a storage block in the logical volume and a length of the storage block. The Offset may be used to calculate a storage block address for the start of the storage block, e.g., a logical block address (LBA) for the storage block. The calculation of an LBA from the Offset is performed in a manner generally known in the art and thus, a detailed explanation is not provided herein.
The information contained in the fields of the LVPTE 320 may be used to create a storage command queue entry in the SCQ 322. That is, the Offset in the LVPTE 320 may be used to calculate a starting storage block address for a targeted storage block. This starting storage block address may be combined with the storage device identifier number, logical unit number, and length obtained from the LVPTE 320 and the authentication key to generate an “open” storage command for opening the logical volume corresponding to the LVPTE 320. Once opened, the starting storage block address may be combined with the storage device identifier number, logical unit number and length to generate subsequent I/O storage commands. These storage commands are placed in the SCQ 322 after they are generated so that they may be dispatched to the remotely located storage server.
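By way of illustration only, the generation of storage commands from an LVPTE might be sketched as below. The description leaves the address calculation to the known art, so the 512-byte `BLOCK_SIZE`, the treatment of Offset and Length as byte values, and the `LvpteAddressing`, `StorageCommand`, and `build_command` names are assumptions of this sketch rather than features of the embodiment.

```python
from dataclasses import dataclass
from typing import Optional

BLOCK_SIZE = 512  # assumed bytes per storage block


@dataclass(frozen=True)
class LvpteAddressing:
    device_id: int    # Storage Device Identifier Number
    lun: int          # Logical Unit Number
    offset: int       # assumed to be a byte offset
    length: int       # assumed to be a byte count
    auth_key: bytes   # Authentication Key


@dataclass(frozen=True)
class StorageCommand:
    opcode: str
    device_id: int
    lun: int
    lba: int
    block_count: int
    auth_key: Optional[bytes] = None


def build_command(entry: LvpteAddressing, opcode: str) -> StorageCommand:
    """Combine the LVPTE fields into a storage command."""
    if entry.offset % BLOCK_SIZE:
        raise ValueError("offset is not block aligned")
    lba = entry.offset // BLOCK_SIZE              # starting storage block address
    block_count = -(-entry.length // BLOCK_SIZE)  # round up to whole blocks
    # Only the "open" command carries the authentication key in this sketch.
    key = entry.auth_key if opcode == "open" else None
    return StorageCommand(opcode, entry.device_id, entry.lun, lba, block_count, key)
```

Under these assumptions, an entry with an Offset of 4096 and a Length of 1025 yields a starting block address of 8 and a block count of 3.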
The authentication key is a key value that was provided to the application instance 305 from the storage server when the application instance 305 requested allocation of a logical volume in the storage system, i.e., the storage server and storage device(s). The authentication key uniquely identifies the application instance 305 as the only source that is capable of accessing the storage locations of the logical volume created by the storage server for that application instance 305. In one embodiment, the authentication key is stored in a register or other storage device in association with the application instance 305 such that only that application instance 305 can provide the authentication key when opening a logical volume. In another exemplary embodiment, the authentication key is stored in an entry of the logical volume protection table 302 for the logical volume and, by use of the protection domain, only the application instance 305 may access that entry in the logical volume protection table 302.
As shown in
The storage command queue 322 is a queue of storage commands, such as storage command n 330, that are to be processed by the I/O adapter 316. The storage command queue 322 may be, for example, a SCSI command queue that contains SCSI commands that are processed by the I/O adapter 316 for transmission to a remotely located storage server via one or more networks. The use of a storage command queue and an I/O adapter is generally known in the art and thus, a detailed explanation of the processing performed by the I/O adapter 316 in transmitting and receiving I/O transactions is not included herein. Suffice it to say, the storage command queue 322 and I/O adapter 316 operate in a manner generally known in the art.
Once storage commands are processed by the storage server (not shown), a completion message may be returned to the system image 300 via the adapter 316 and completion queue 350. The system image 300 or operating system may retrieve completion queue entries and process them to thereby inform the application instance 305 of the completion of a particular I/O storage request submitted by the application instance 305. The use of completion messages and completion queues is also generally known in the art and thus, a further detailed explanation of the processing involved is not provided herein.
With reference next to
As part of this processing of storage commands, the storage server controller 420 may perform authentication checks on the storage commands to ensure that the application instances that are the source of the storage commands are permitted to access the storage locations targeted by the storage commands. Such authentication checks may be performed when an “open” storage command is received for opening a logical volume for access by an application instance during a session between the application instance and the storage server 410, for example.
In one exemplary embodiment of the present invention, the authentication check involves looking up, in the authentication data structure 440, the storage device identifier number and logical unit number referenced in the storage command. This lookup operation retrieves an entry that correlates the storage device identifier, the logical unit number, and the associated authentication key that was generated when the logical volume corresponding to the storage device identifier and logical unit number was created.
The storage server controller 420 may then compare the authentication key retrieved from the authentication data structure 440 with the authentication key received in the storage command. If there is a match, then the storage command originated from an application instance that is permitted to access the storage locations of the logical volume targeted by the storage command. As a result, the logical volume is opened for access by the application instance during a current session between the application instance and the storage server. Subsequent I/O requests from the application instance are translated into storage commands in a similar manner, except that the authentication key may or may not be incorporated into the storage command. These storage commands may be processed by the storage server in a manner generally known in the art so as to perform the requested I/O operations on the targeted portion(s) of the storage device(s) 450, 452 and/or 454. Upon completion of the requested I/O operations, the storage server controller 420 may return a completion message to the host system which is placed in a completion queue for processing by the system image. In this way, the application instance is informed of the completion of the requested I/O operations.
If the authentication keys do not match, then the storage command originated from an application instance that is not permitted to access the storage locations of the logical volume targeted by the storage command. In such a case, the logical volume is not opened for access by the application instance and an error message may be returned to the host system. This error message may be returned as a completion message that is placed in the completion queue for processing by the system image to thereby inform the application instance of the inability to complete the requested I/O operations.
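The authentication check and its two outcomes may be sketched as follows, purely as an illustration. The `StorageServer` class, the dictionary standing in for the authentication data structure, the session identifier, and the dictionary-shaped completion messages are hypothetical. The constant-time comparison via `hmac.compare_digest` is a choice made for this sketch; the description only requires that the two keys be compared.

```python
import hmac
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

VolumeId = Tuple[int, int]  # (storage device identifier, logical unit number)


@dataclass
class StorageServer:
    # Stands in for the authentication data structure: volume -> key.
    auth_table: Dict[VolumeId, bytes]
    # Volumes opened per session: (session id, device id, LUN).
    open_volumes: Set[Tuple[str, int, int]] = field(default_factory=set)

    def handle_open(self, session: str, device_id: int, lun: int, key: bytes) -> dict:
        """Authenticate an "open" storage command and build a completion message."""
        expected = self.auth_table.get((device_id, lun))
        if expected is None or not hmac.compare_digest(expected, key):
            # Keys do not match: the volume is not opened.
            return {"status": "error", "reason": "authentication failed"}
        self.open_volumes.add((session, device_id, lun))
        return {"status": "ok"}

    def handle_io(self, session: str, device_id: int, lun: int) -> dict:
        """Later commands need no key, but the volume must already be open."""
        if (session, device_id, lun) not in self.open_volumes:
            return {"status": "error", "reason": "volume not open"}
        return {"status": "ok"}
```

In this sketch the returned dictionary plays the role of the completion message that the host system places in its completion queue.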
With reference next to
Accordingly, blocks of the flowchart illustrations support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the flowchart illustrations, and combinations of blocks in the flowchart illustrations, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or by combinations of special purpose hardware and computer instructions.
As shown in
The operating system then sends a request to the storage server to allocate one or more LUNs on one or more storage devices for the OS logical volume (step 540). The storage server determines if there is a storage device with sufficient SCSI LUN capacity available to satisfy the application instance's OS logical volume size (step 550). If so, then the LUN(s) are created on the identified storage device and are associated with the OS logical volume (step 560), such as associating the LUN(s) with the logical volume in the logical volume protection table, in a logical volume manager, or the like, for example.
An authentication key is then associated with the LUN and returned to the application instance (step 570). This may involve storing the authentication key in the logical volume protection table entry corresponding to the logical volume or storing the authentication key in a register or other storage device in association with the application instance. Thereafter, the application instance may use the authentication key to open the LUN associated with the OS logical volume in order to perform I/O operations on storage locations within the LUN.
If there is not sufficient SCSI LUN capacity on a storage device (step 550), then a determination is made as to whether the logical volume can be created using multiple SCSI devices with one or more LUNs (step 580). If so, the operation continues to step 560. If not, then the operation cannot be completed successfully and an error message may be returned to the application instance (step 590). The operation then terminates.
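The allocation decisions of steps 550 through 590 can be illustrated with the following sketch. The `allocate` function, the dictionary of free capacity per device, the first-fit selection order, and the random 16-byte key are assumptions introduced for the example; the description does not specify how devices are chosen or how the authentication key is generated.

```python
import secrets
from typing import Dict, List, Tuple


class AllocationError(Exception):
    """Stands in for the error message of step 590."""


def allocate(free: Dict[str, int], size: int) -> Tuple[List[Tuple[str, int]], bytes]:
    """Reserve `size` units of capacity and return (plan, authentication key)."""
    if size <= 0:
        raise ValueError("size must be positive")
    plan: List[Tuple[str, int]] = []
    # Step 550: look for one device that can hold the whole volume.
    for device, capacity in free.items():
        if capacity >= size:
            plan = [(device, size)]
            break
    else:
        # Step 580: otherwise try to span several devices.
        if sum(free.values()) < size:
            raise AllocationError("insufficient capacity")  # step 590
        remaining = size
        for device, capacity in free.items():
            take = min(capacity, remaining)
            if take:
                plan.append((device, take))
                remaining -= take
    # Step 560: create the LUN(s) by reserving the planned capacity.
    for device, amount in plan:
        free[device] -= amount
    # Step 570: associate an authentication key (random here, by assumption).
    return plan, secrets.token_bytes(16)
```

The capacity table is only updated once a complete plan exists, so a failed request leaves it unchanged.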
In response, the operating system executes a bootstrap algorithm to authorize the application instance's access to the SCSI device(s) that contain the SCSI LUN(s) associated with the OS logical volume (step 620). The “open” I/O storage request is then translated into a storage command and sent to the storage server (step 630). The storage server performs an authentication check to determine if the authentication key provided in the “open” I/O storage request matches an authentication key corresponding to the LUN(s) of the OS logical volume that is targeted by the “open” I/O storage request (step 640). If the authentication keys match, then the LUN(s) of the OS logical volume are opened (step 650) and a successful completion message is returned to the application instance (step 660). If the authentication keys do not match, then the “open” operation cannot be completed and an error message is returned to the application instance (step 670).
Having opened the LUNs of the OS logical volume for the current session between the application instance and the storage server, the application instance may thereafter submit I/O storage requests to the storage server using the translation protection table data structure of the present invention, as described previously. The authentication described above need only be performed when an “open” storage command is processed. Subsequent I/O requests from the application instance, and thus storage commands, need not be authenticated in the same manner. Once the application instance no longer needs to access the LUN(s), or upon the occurrence of a session close event, the LUN(s) may be closed.
As discussed above, once the application instance opens an OS logical volume, the application instance may perform I/O storage operations on the LUN(s) of the OS logical volume. For example, the application instance may perform read/write I/O operations on the opened OS logical volume using the translation protection table data structure available in the application library. The translation protection table in the application library is used to convert the I/O storage request obtained from the application instance into a storage command that may be processed by the storage server at a remotely located storage system.
The I/O storage request may be verified as coming from an application instance that may access the logical volume identified in the I/O storage request. This may be done by using a protection domain and access controls as mentioned above. This verification process is optional and is not necessary to the operation of the present invention and thus, is not explicitly shown in
The storage device identifier, logical unit number, offset and length information in the retrieved entry are used to generate a storage command (step 830). The storage command is then placed in a storage command queue (step 840). The adapter then retrieves the storage command from the storage command queue and encapsulates the storage command into network data packets for transmission to the storage server (step 850). The operation then ends.
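Steps 840 and 850 may be sketched as below. The wire layout in `COMMAND`, the two-field `PACKET_HEADER`, the tiny `MAX_PAYLOAD`, and the function names are hypothetical and chosen only to make the queueing and encapsulation visible; an actual adapter would use whatever transport encapsulation it implements.

```python
import struct
from collections import deque
from typing import Deque, List

# Hypothetical layout: opcode, device id, LUN, block address, block count.
COMMAND = struct.Struct("!BHHQI")
# Hypothetical packet header: sequence number, total packet count.
PACKET_HEADER = struct.Struct("!HH")
MAX_PAYLOAD = 8  # deliberately tiny so that one command spans several packets

storage_command_queue: Deque[bytes] = deque()


def enqueue(opcode: int, device_id: int, lun: int, lba: int, block_count: int) -> None:
    """Step 840: place the generated storage command in the queue."""
    storage_command_queue.append(
        COMMAND.pack(opcode, device_id, lun, lba, block_count)
    )


def dispatch_next() -> List[bytes]:
    """Step 850: take the oldest command and encapsulate it into packets."""
    command = storage_command_queue.popleft()
    chunks = [command[i:i + MAX_PAYLOAD] for i in range(0, len(command), MAX_PAYLOAD)]
    return [
        PACKET_HEADER.pack(seq, len(chunks)) + chunk
        for seq, chunk in enumerate(chunks)
    ]
```

A receiver can rebuild the command by concatenating the payloads in sequence order and unpacking the result with the same `COMMAND` layout.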
In addition to opening/closing OS logical volumes and submitting read/write I/O operations, various translation protection table management operations may be performed including creating logical volume protection table entries, querying logical volume protection table entry attributes, modifying logical volume protection table entry attributes, and deleting or destroying logical volume protection table entries. Each of these operations will be described in detail with reference to
Thereafter, or if the logical volume protection table has already been created, an entry in the logical volume protection table is created (step 950). This logical volume protection table entry may identify the storage device identifier number, logical unit number, offset to the starting address of the logical volume, the length of the LUNs, and an associated authentication key that is received from a storage server, as described above. The operating system then returns the logical volume protection table entry key values for the created entry to the application instance (step 960). The operation then ends.
A determination is made as to whether there are any I/O transactions active on the entry or entries in the logical volume protection table that are to be modified (step 1120). If there are I/O transactions pending on the entry or entries, a timer is initiated (step 1130) and a determination is made as to whether the timer times out before a quiescent point is reached (step 1140). A quiescent point is a point at which no I/O transactions are active on the entry or entries. If the timer times out before the quiescent point is reached, then the application library generates an error result and returns it to the application instance (step 1150).
If there are no I/O transactions active on the entry or entries, or if the quiescent point is reached before the timer times out, the application requests a logical volume modification through the application library (step 1160). The application library then requests that the operating system perform the logical volume modification (step 1170). As part of this logical volume modification, the operating system may obtain an allocation of additional LUNs for the logical volume, for example, and thus, modifications to the attributes in the logical volume protection table entry for the logical volume may be required. The application library then returns the attributes of the modified logical volume protection table entry (step 1180) and the operation ends.
A determination is made as to whether there are any I/O transactions active on the entry or entries in the logical volume protection table that are to be deleted or destroyed (step 1220). If there are I/O transactions pending on the entry or entries, a timer is initiated (step 1230) and a determination is made as to whether the timer times out before a quiescent point is reached (step 1240). A quiescent point is a point at which no I/O transactions are active on the entry or entries. If the timer times out before the quiescent point is reached, then the application library generates an error result and returns it to the application instance (step 1250).
If there are no I/O transactions active on the entry or entries, or if the quiescent point is reached before the timer times out, the application requests a logical volume destruction or deletion through the application library (step 1260). The application library then requests that the operating system perform the logical volume deletion or destruction (step 1270) and the operation terminates.
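The wait for a quiescent point that guards both the modify and the delete operations can be sketched as follows. The `EntryActivity` class and its method names are hypothetical, and a condition variable with a timeout is only one possible way to realize the timer described above. Running the operation while the lock is held is a design choice of this sketch, made so that no new I/O transaction can begin on the entry midway through the change.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class EntryActivity:
    """Tracks I/O transactions active on one protection table entry."""

    def __init__(self) -> None:
        self._active = 0
        self._cond = threading.Condition()

    def begin_io(self) -> None:
        with self._cond:
            self._active += 1

    def end_io(self) -> None:
        with self._cond:
            self._active -= 1
            if self._active == 0:
                self._cond.notify_all()  # a quiescent point has been reached

    def run_when_quiescent(self, operation: Callable[[], T], timeout: float) -> T:
        """Run `operation` once no I/O is active, or fail when the timer expires."""
        with self._cond:
            # wait_for returns False if the timeout elapses first.
            if not self._cond.wait_for(lambda: self._active == 0, timeout):
                raise TimeoutError("quiescent point not reached")
            return operation()
```

Here `operation` stands in for the request that the application library forwards to the operating system, and the `TimeoutError` stands in for the error result returned to the application instance.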
It should be noted that while the above flowcharts make reference to logical volume protection tables, the operations outlined above may also be applied to file mode I/O based embodiments of the present invention. That is, rather than creating, querying, modifying, and deleting entries in a logical volume protection table, the mechanisms of the present invention may also be used to perform such operations on a file name protection table and file extension protection table that includes entries for each file and its corresponding file extensions maintained in a file system of the operating system. Thus, during creation, a file name protection table entry may be generated along with its file extension protection table entry or entries. The file name protection table entry may include a protection domain, access control information, and a pointer to one or more file extension protection table entries corresponding to the file name protection table entry. The file extension protection table entry or entries may identify a storage device identifier number, logical unit number, offset and length. A file name key may then be returned to the application instance for later use in accessing the file name protection table entry to thereby access the file. Similarly, querying, modifying, and deletion may be performed with respect to such a file name protection table entry and/or file extension protection table entry or entries.
In addition, the file name protection table 1302 may further include the authentication key that was assigned to the file corresponding to the file name protection table 1302 when the file was created. This authentication key may be used to open the file for a current session between an application instance and the storage server so that the application instance may perform I/O operations on the file. The process for opening a file is similar to the process described above for opening a logical volume. That is, an “open” storage command is sent via the adapter 1316 to the remotely located storage system (not shown), where it is authenticated by the storage server of the remotely located storage system based on the authentication key in the “open” storage command. If the “open” storage command is verified as being authentic, the file is opened, in a current session between the application instance 1305 and the storage server, for I/O operations from the application instance 1305.
The file extension protection table 1312 contains an entry for each file extension. Each of these entries describes the storage device identifier number for the storage device(s) that make up the logical volume or LUN, the logical unit number, an offset into the logical volume or LUN address space for a storage block, and a length of the storage block in the logical volume or LUN. In the depicted example, the file extension protection table 1312 contains entries for each SCSI logical unit number (LUN).
The file extension protection table 1312 may be segmented into a set of file extension protection table segments. The segments may be interconnected using several data structures, including a B-tree, a tree made up of pointers in non-leaf nodes and pointers in leaf nodes, simple linked list, or the like. In the depicted example, file extension protection table segment 1314 uses a simple linked list where the first entry in the table is a pointer to the next table that contains file extension protection table entries.
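A segmented table chained as a simple linked list, with the first slot of each segment pointing to the next segment, might look like the following sketch. The `SegmentedTable` class, the fixed number of entries per segment, and the use of the insertion index as the lookup key are assumptions of the example.

```python
from typing import Any, List


class SegmentedTable:
    """Table split into fixed-size segments chained through slot 0."""

    def __init__(self, entries_per_segment: int) -> None:
        if entries_per_segment < 1:
            raise ValueError("a segment must hold at least one entry")
        self.entries_per_segment = entries_per_segment
        self.head: List[Any] = [None]  # slot 0: pointer to the next segment
        self.count = 0

    def append(self, entry: Any) -> int:
        """Store an entry and return its key (here, its insertion index)."""
        segment = self.head
        while len(segment) == self.entries_per_segment + 1:  # segment is full
            if segment[0] is None:
                segment[0] = [None]  # link a fresh, empty segment
            segment = segment[0]
        segment.append(entry)
        self.count += 1
        return self.count - 1

    def lookup(self, key: int) -> Any:
        """Follow the chain to the segment that holds the keyed entry."""
        if not 0 <= key < self.count:
            raise KeyError(key)
        segment = self.head
        for _ in range(key // self.entries_per_segment):
            segment = segment[0]
        return segment[1 + key % self.entries_per_segment]
```

A linked list keeps the structure simple at the cost of a lookup that walks one segment per hop; the B-tree alternative mentioned above would shorten that walk for large tables.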
File extension protection table entry N 1320 depicts an example entry in the file extension protection table segment. Each entry in the file extension protection table segment 1314 contains a set of fields that are used to define that entry. File extension protection table entry N 1320 contains the following fields: Storage Device Identifier Number, Logical Unit Number, Offset and Length. The Storage Device Identifier Number may be, for example, a SCSI Identifier Number for the SCSI storage device or devices that make up the logical volume identified by the Logical Unit Number. The Logical Unit Number may be a SCSI LUN, for example. The SCSI Identifier Number (ID) and SCSI Logical Unit Number (LUN) are used to associate the FEPT 1312 entry with a specific SCSI device and a specific LUN within that device, respectively.
As mentioned above, the Offset and Length are values that identify an offset to a starting address for a storage block in the logical volume and a length of the storage block. The Offset may be used to calculate a storage block address for the start of the storage block, e.g., a logical block address (LBA) for the storage block.
The information contained in the fields of the FEPT 1312 entry may be used to create a storage command queue entry in the SCQ 1322. That is, the Offset in the FEPT entry 1320 may be used to calculate a starting storage block address for a targeted storage block. This starting storage block address may be combined with the storage device identifier number, logical unit number, and length obtained from the FEPT 1312 entry. For an “open” I/O storage command, these values may further be combined with the authentication key passed by the application instance 1305, as obtained from the file name protection table 1302, based on a file name key referenced in the I/O request from the application instance.
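As a hedged illustration of how the file name and file extension entries could work together, the sketch below resolves a byte range within a file into per-entry device addresses. It assumes that the file extension entries of a file are concatenated in order and that Offset and Length are byte values; neither assumption is stated in the description, and the `Extent`, `FileEntry`, and `resolve` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Extent:
    """One file extension protection table entry."""
    device_id: int
    lun: int
    offset: int  # assumed byte offset within the LUN
    length: int  # assumed byte length


@dataclass
class FileEntry:
    """One file name protection table entry (access controls omitted)."""
    protection_domain: int
    auth_key: bytes
    extents: List[Extent]


def resolve(entry: FileEntry, file_offset: int, length: int) -> List[Tuple[int, int, int, int]]:
    """Map a file byte range to (device id, LUN, device offset, length) pieces."""
    if file_offset < 0 or length <= 0:
        raise ValueError("invalid range")
    end = file_offset + length
    pieces = []
    position = 0  # file offset at which the current extent starts
    for extent in entry.extents:
        low = max(file_offset, position)
        high = min(end, position + extent.length)
        if low < high:  # the requested range overlaps this extent
            pieces.append(
                (extent.device_id, extent.lun, extent.offset + (low - position), high - low)
            )
        position += extent.length
    if end > position:
        raise ValueError("range extends past the end of the file")
    return pieces
```

Each returned piece carries the values that would be combined into one storage command, with `entry.auth_key` added only for the “open” command.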
As shown in
The file mode I/O embodiment described above may make use of similar operations as outlined in
This authentication key may likewise be used to open the file once the file has been allocated by the storage system in a similar manner as outlined in
In addition to these operations, the closing of a file and creation of file name protection table and file extension protection table entries may be performed in a similar manner as the logical volume based operations in
Thus, with the present invention, mechanisms are provided for enabling direct I/O between an application instance and a remotely located network attached storage device via a storage server. It should be noted that, while the above mechanisms of the exemplary embodiments of the present invention make use of the operating system or system image to perform a number of operations with regard to the creation and management of the translation protection table entries, these operations are not generally performed with each I/O storage request. That is, the operating system or system image is only involved in the opening/closing of an OS logical volume and the setup of the translation protection table entries. The operating system or system image is not required in order to process each actual I/O storage request submitted by the middleware or application instance since the application can use the translation protection table and mechanisms described above to process the I/O storage request. Furthermore, the authentication is performed by the storage server and not the operating system. As a result, the present invention eliminates the context switches, and their associated overhead, required by prior art mechanisms, as explained in the background of the invention above.
It is important to note that while the present invention has been described in the context of a fully functioning data processing system, those of ordinary skill in the art will appreciate that the processes of the present invention are capable of being distributed in the form of a computer readable medium of instructions and a variety of forms and that the present invention applies equally regardless of the particular type of signal bearing media actually used to carry out the distribution. Examples of computer readable media include recordable-type media, such as a floppy disk, a hard disk drive, a RAM, CD-ROMs, DVD-ROMs, and transmission-type media, such as digital and analog communications links, wired or wireless communications links using transmission forms, such as, for example, radio frequency and light wave transmissions. The computer readable media may take the form of coded formats that are decoded for actual use in a particular data processing system.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.