1. Field of the Invention
This invention relates to computer systems and, more particularly, to off-host virtualization within storage environments.
2. Description of the Related Art
Many business organizations and governmental entities rely upon applications that access large amounts of data, often exceeding a terabyte of data, for mission-critical applications. Often such data is stored on many different storage devices, which may be heterogeneous in nature, including many different types of devices from many different manufacturers.
Configuring individual applications that consume data, or application server systems that host such applications, to recognize and directly interact with each different storage device that may possibly be encountered in a heterogeneous storage environment would be increasingly difficult as the environment scaled in size and complexity. Therefore, in some storage environments, specialized storage management software and hardware may be used to provide a more uniform storage model to storage consumers. Such software and hardware may also be configured to present physical storage devices as virtual storage devices (e.g., virtual SCSI disks) to computer hosts, and to add storage features not present in individual storage devices to the storage model. For example, features to increase fault tolerance, such as data mirroring, snapshot/fixed image creation, or data parity, as well as features to increase data access performance, such as disk striping, may be implemented in the storage model via hardware or software. The added storage features may be referred to as storage virtualization features, and the software and/or hardware providing the virtual storage devices and the added storage features may be termed “virtualizers” or “virtualization controllers”. Virtualization may be performed within computer hosts, such as within a volume manager layer of a storage software stack at the host, and/or in devices external to the host, such as virtualization switches or virtualization appliances. Such external devices providing virtualization may be termed “off-host” virtualizers, and may be utilized in order to offload processing required for virtualization from the host. Off-host virtualizers may be connected to the external physical storage devices for which they provide virtualization functions via a variety of interconnects, such as Fiber Channel links, Internet Protocol (IP) networks, and the like.
Traditionally, storage software within a computer host consists of a number of layers, such as a file system layer, a disk driver layer, etc. Some of the storage software layers may form part of the operating system in use at the host, and may differ from one operating system to another. When accessing a physical disk, a layer such as the disk driver layer for a given operating system may be configured to expect certain types of configuration information for the disk to be laid out in a specific format, for example in a header (located at the first few blocks of the disk) containing disk partition layout information. The storage stack software layers used to access local physical disks may also be utilized to access external storage devices presented as virtual storage devices by off-host virtualizers. Therefore, it may be desirable for an off-host virtualizer to provide configuration information for the virtual storage devices in a format expected by the storage stack software layers. In addition, it may be desirable for the off-host virtualizer to implement a technique to flexibly and dynamically map storage within external physical storage devices to the virtual storage devices presented to the host storage software layers, e.g., without requiring a reboot of the host.
Various embodiments of a system and method for dynamic logical unit (LUN) mapping are disclosed. According to a first embodiment, a system may include a first host and an off-host virtualizer, such as a virtualization switch or a virtualization appliance. The off-host virtualizer may be configured to present a virtual storage device, such as a virtual LUN, that comprises one or more regions that are initially unmapped to physical storage, and make the virtual storage device accessible to the first host. The first host may include a storage software stack including a first layer, such as a disk driver layer, configured to detect and access the virtual storage device as if the virtual storage device were mapped to physical storage. A number of different techniques may be used by the off-host virtualizer in various embodiments to present the virtual storage device as if it were mapped to physical storage. For example, in one embodiment, the off-host virtualizer may be configured to generate metadata formatted according to a requirement of an operating system in use at the host and map a portion of the virtual storage device to the metadata, where the metadata makes the virtual storage device appear to be mapped to physical storage. The recognition of the virtual storage device as a “normal” storage device that is backed by physical storage may occur during a system initialization stage prior to an initiation of production I/O operations. In this way, an unmapped or “blank” virtual LUN may be prepared for subsequent dynamic mapping by the off-host virtualizer. The unmapped LUN may be given an initial size equal to the maximum allowed LUN size supported by the operating system in use at the host, so that the size of the virtual LUN may not require modification after initialization. 
In some embodiments, multiple virtual LUNs may be pre-generated for use at a single host, for example in order to isolate storage for different applications, or to accommodate limits on maximum LUN sizes.
In one embodiment, the system may also include two or more physical storage devices, and the off-host virtualizer may be configured to dynamically map physical storage from a first and a second physical storage device to a respective range of addresses within the first virtual storage device. For example, the off-host virtualizer may be configured to perform an N-to-1 mapping between the physical storage devices (which may be called physical LUNs) and virtual LUNs, allowing storage in the physical storage devices to be accessed from the host via the pre-generated virtual LUNs. Configuration information regarding the location of the first and/or the second address ranges within the virtual LUN (i.e., the regions of the virtual LUN that are mapped to the physical storage devices) may be passed from the off-host virtualizer to a second layer of the storage stack at the host (e.g., an intermediate driver layer above a disk driver layer) using a variety of different mechanisms. Such mechanisms may include, for example, the off-host virtualizer writing the configuration information to certain special blocks within the virtual LUN, sending messages to the host over a network, or using special extended SCSI mode pages. In one embodiment, two or more different ranges of physical storage within a single physical storage device may be mapped to corresponding pre-generated virtual storage devices such as virtual LUNs and presented to corresponding hosts. That is, the off-host virtualizer may allow each host of a plurality of hosts to access a respective portion of a physical storage device through a respective virtual LUN. In such embodiments, the off-host virtualizer may also be configured to implement a security policy isolating the ranges of physical storage within the shared physical storage device; i.e., to allow a host to access only those regions to which the host has been granted access, and to prevent unauthorized accesses.
In another embodiment, the off-host virtualizer may be further configured to aggregate storage within one or more physical storage devices into a logical volume, map the logical volume to a range of addresses within a pre-generated virtual storage device, and make the logical volume accessible to the second layer of the storage stack (e.g., by providing logical volume metadata to the second layer), allowing I/O operations to be performed on the logical volume. Storage from a single physical storage device may be aggregated into any desired number of different logical volumes, and any desired number of logical volumes may be mapped to a single virtual storage device or virtual LUN. The off-host virtualizer may be further configured to provide volume-level security, i.e., to prevent unauthorized access from a host to a logical volume, even when the physical storage corresponding to the logical volume is part of a shared physical storage device. In addition, physical storage from any desired number of physical storage devices may be aggregated into a logical volume using a virtual LUN, thereby allowing a single volume to extend over a larger address range than the maximum allowed size of a single physical LUN. The virtual storage devices or virtual LUNs may be distributed among a number of independent front-end storage networks, such as fiber channel fabrics, and the physical storage devices backing the logical volumes may be distributed among a number of independent back-end storage networks. For example, a first host may access its virtual storage devices through a first storage network, and a second host may access its virtual storage devices through a second storage network independent from the first (that is, reconfigurations and/or failures in the first storage network may not affect the second storage network).
Similarly, the off-host virtualizer may access a first physical storage device through a third storage network, and a second physical storage device through a fourth storage network. The ability of the off-host virtualizer to dynamically map storage across pre-generated virtual storage devices distributed among independent storage networks may support a robust and flexible storage environment.
a is a block diagram illustrating one embodiment of a computer system.
b is a block diagram illustrating an embodiment of a system configured to utilize off-host block virtualization.
a is a block diagram illustrating the addition of operating-system specific metadata to a virtual logical unit (LUN) encapsulating a source volume, according to one embodiment.
b is a block diagram illustrating an example of an unmapped virtual LUN according to one embodiment.
While the invention is susceptible to various modifications and alternative forms, specific embodiments are shown by way of example in the drawings and are herein described in detail. It should be understood, however, that drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
a is a block diagram illustrating a computer system 100 according to one embodiment. System 100 includes a host 110 coupled to a physical block device 120 via an interconnect 130. Host 110 includes a traditional block storage software stack 140A that may be used to perform I/O operations on a physical block device 120 via interconnect 130.
Generally speaking, a physical block device 120 may comprise any hardware entity that provides a collection of linearly addressed data blocks that can be read or written. For example, in one embodiment a physical block device may be a single disk drive configured to present all of its sectors as an indexed array of blocks. In another embodiment the physical block device may be a disk array device, or a disk configured as part of a disk array device. It is contemplated that any suitable type of storage device may be configured as a block device, such as fixed or removable magnetic media drives (e.g., hard drives, floppy or Zip-based drives), writable or read-only optical media drives (e.g., CD or DVD), tape drives, solid-state mass storage devices, or any other type of storage device. The interconnect 130 may utilize any desired storage connection technology, such as various variants of the Small Computer System Interface (SCSI) protocol, Fiber Channel, Internet Protocol (IP), Internet SCSI (iSCSI), or a combination of such storage networking technologies. The block storage software stack 140A may comprise layers of software within an operating system at host 110, and may be accessed by a client application to perform I/O (input/output) on a desired physical block device 120.
In the traditional block storage stack, a client application may initiate an I/O request, for example as a request to read a block of data at a specified offset within a file. The request may be received (e.g., in the form of a read() system call) at the file system layer 112, translated into a request to read a block within a particular device object (i.e., a software entity representing a storage device), and passed to the disk driver layer 114. The disk driver layer 114 may then select the targeted physical block device 120 corresponding to the disk device object, and send a request to an address at the targeted physical block device over the interconnect 130 using the interconnect-dependent I/O driver layer 116. For example, a host bus adapter (such as a SCSI HBA) may be used to transfer the I/O request, formatted according to the appropriate storage protocol (e.g., SCSI), to a physical link of the interconnect (e.g., a SCSI bus). At the physical block device 120, an interconnect-dependent firmware layer 122 may receive the request, perform the desired physical I/O operation at the physical storage layer 124, and send the results back to the host over the interconnect. The results (e.g., the desired blocks of the file) may then be transferred through the various layers of storage stack 140A in reverse order (i.e., from the interconnect-dependent I/O driver to the file system) before being passed to the requesting client application.
In some operating systems, the storage devices addressable from a host 110 may be detected only during system initialization, e.g., during boot. For example, an operating system may employ a four-level hierarchical addressing scheme of the form <“hba”, “bus”, “target”, “lun”> for SCSI devices, including a SCSI HBA identifier (“hba”), a SCSI bus identifier (“bus”), a SCSI target identifier (“target”), and a logical unit identifier (“lun”), and may be configured to populate a device database with addresses for available SCSI devices during boot. Host 110 may include multiple SCSI HBAs, and a different SCSI adapter identifier may be used for each HBA. The SCSI adapter identifiers may be numbers issued by the operating system kernel, for example based on the physical placement of the HBA cards relative to each other (i.e., based on slot numbers used for the adapter cards). Each HBA may control one or more SCSI buses, and a unique SCSI bus number may be used to identify each SCSI bus within an HBA. During system initialization, or in response to special configuration commands, the HBA may be configured to probe each bus to identify the SCSI devices currently attached to the bus. Depending on the version of the SCSI protocol in use, the number of devices (such as disks or disk arrays) that may be attached on a SCSI bus may be limited, e.g., to 15 devices excluding the HBA itself. SCSI devices that may initiate I/O operations, such as the HBA, are termed SCSI initiators, while devices where the physical I/O may be performed are called SCSI targets. Each target on the SCSI bus may identify itself to the HBA in response to the probe. In addition, each target device may also accommodate up to a protocol-specific maximum number of “logical units” (LUNs) representing independently addressable units of physical storage within the target device, and may inform the HBA of the logical unit identifiers.
A target device may contain a single LUN (e.g., a LUN may represent an entire disk or even a disk array) in some embodiments. The SCSI device configuration information, such as the target device identifiers and LUN identifiers may be passed to the disk driver layer 114 by the HBAs. When issuing an I/O request, disk driver layer 114 may utilize the hierarchical SCSI address described above.
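For purposes of illustration only, the four-level addressing scheme and the boot-time device database described above may be sketched as follows. The names ScsiAddress, build_device_database, and probe_results are hypothetical and do not correspond to any particular operating system interface; the shape of the probe results is likewise an assumption of this sketch.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class ScsiAddress(NamedTuple):
    """Hypothetical four-level address of the form <hba, bus, target, lun>."""
    hba: int
    bus: int
    target: int
    lun: int


def build_device_database(
    probe_results: Iterable[Tuple[ScsiAddress, str]]
) -> Dict[ScsiAddress, str]:
    """Populate a device database from (address, description) pairs.

    probe_results stands in for whatever the HBAs report during
    initialization; its shape is an assumption made for this sketch.
    """
    database: Dict[ScsiAddress, str] = {}
    for address, description in probe_results:
        database[address] = description
    return database


# Two LUNs behind one target on the first HBA, one LUN on a second HBA.
probe_results = [
    (ScsiAddress(hba=0, bus=0, target=3, lun=0), "disk"),
    (ScsiAddress(hba=0, bus=0, target=3, lun=1), "disk"),
    (ScsiAddress(hba=1, bus=0, target=5, lun=0), "disk array"),
]
device_db = build_device_database(probe_results)
```

In such a sketch, a disk driver layer may look up a device by its full address, e.g., device_db[ScsiAddress(1, 0, 5, 0)], when issuing an I/O request.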
When accessing a LUN, disk driver layer 114 may expect to see OS-specific metadata at certain specific locations within the LUN. For example, in many operating systems, the disk driver layer 114 may be responsible for implementing logical partitioning (i.e., subdividing the space within a physical disk into partitions, where each partition may be used for a smaller file system). Metadata describing the layout of a partition (e.g., a starting block offset for the partition within the LUN, and the length of a partition) may be stored in an operating-system dependent format, and in an operating system-dependent location, such as in a header or a trailer, within a LUN. In the Solaris™ operating system from Sun Microsystems, for example, a virtual table of contents (VTOC) structure may be located in the first partition of a disk volume, and a copy of the VTOC may also be located in the last two cylinders of the volume. In addition, the operating system metadata may include cylinder alignment and/or cylinder size information, as well as boot code if the volume is bootable. Operating system metadata for various versions of Microsoft Windows™ may include a “magic number” (a special number or numbers that the operating system expects to find, usually at or near the start of a disk), subdisk layout information, etc. If the disk driver layer 114 does not find the metadata in the expected location and in the expected format, the disk driver layer may not be able to perform I/O operations at the LUN.
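The dependence of a disk driver layer on metadata at an expected location may be illustrated by the following minimal sketch. The magic value, its offset, and the 512-byte block size are assumptions chosen only for illustration; actual operating systems define their own formats and locations.

```python
BLOCK_SIZE = 512

# Hypothetical magic value and offset, chosen only for illustration.
MAGIC = b"OSMD"
MAGIC_OFFSET = 0


def read_block(device: bytes, block_number: int) -> bytes:
    """Return one block of a device modeled as a flat byte string."""
    start = block_number * BLOCK_SIZE
    return device[start:start + BLOCK_SIZE]


def driver_recognizes(device: bytes) -> bool:
    """Return True if the expected magic value is found in the first block."""
    header = read_block(device, 0)
    return header[MAGIC_OFFSET:MAGIC_OFFSET + len(MAGIC)] == MAGIC


# A device whose first block carries the expected metadata, and one without.
labeled = MAGIC + bytes(BLOCK_SIZE - len(MAGIC)) + bytes(BLOCK_SIZE * 3)
blank = bytes(BLOCK_SIZE * 4)
```

In this sketch the driver would accept the labeled device and reject the blank one, which corresponds to the case in which I/O operations cannot be performed at the LUN.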
The relatively simple traditional storage software stack 140A has been enhanced over time to help provide advanced storage features, most significantly by introducing block virtualization layers. In general, block virtualization refers to a process of creating or aggregating logical or virtual block devices out of one or more underlying physical or logical block devices, and making the virtual block devices accessible to block device consumers for storage operations. For example, in one embodiment of block virtualization, storage within multiple physical block devices, e.g. in a fiber channel storage area network (SAN), may be aggregated and presented to a host as a single virtual storage device such as a virtual LUN (VLUN), as described below in further detail. In another embodiment, one or more layers of software may rearrange blocks from one or more block devices, such as disks, and add various kinds of functions. The resulting rearranged collection of blocks may then be presented to a storage consumer, such as an application or a file system, as one or more aggregated devices with the appearance of one or more basic disk drives. That is, the more complex structure resulting from rearranging blocks and adding functionality may be presented as if it were one or more simple arrays of blocks, or logical block devices. In some embodiments, multiple layers of virtualization may be implemented. That is, one or more block devices may be mapped into a particular virtualized block device, which may be in turn mapped into still another virtualized block device, allowing complex storage functions to be implemented with simple block devices. Further details on block virtualization, and advanced storage features supported by block virtualization, are provided below.
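As one simplified illustration of aggregating several block devices into a single virtual block device, the following sketch resolves a virtual block number to an underlying device and block by simple concatenation. Concatenation is only one possible arrangement, and the class name ConcatenatedDevice and its interface are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatenatedDevice:
    """Virtual block device formed by concatenating underlying devices."""

    def __init__(self, device_sizes):
        # device_sizes[i] is the number of blocks in underlying device i.
        self.sizes = list(device_sizes)
        # ends[i] is the first virtual block number past device i.
        self.ends = list(accumulate(self.sizes))

    def resolve(self, virtual_block):
        """Map a virtual block number to (device index, block on device)."""
        if not self.ends or not 0 <= virtual_block < self.ends[-1]:
            raise IndexError("virtual block out of range")
        index = bisect_right(self.ends, virtual_block)
        start = self.ends[index] - self.sizes[index]
        return index, virtual_block - start


# A 150-block virtual device backed by a 100-block and a 50-block device.
virtual_device = ConcatenatedDevice([100, 50])
```

A storage consumer addressing block 100 of the virtual device in this sketch would reach block 0 of the second underlying device, while seeing only a single linear array of blocks.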
Block virtualization may be implemented at various places within a storage stack and the associated storage environment, in both hardware and software. For example, a block virtualization layer in the form of a volume manager, such as the VERITAS Volume Manager™ from VERITAS Software Corporation, may be added between the disk driver layer 114 and the file system layer 112. In some storage environments, virtualization functionality may be added to host bus adapters, i.e., in a layer between the interconnect-dependent I/O driver layer 116 and interconnect 130. Block virtualization may also be performed outside the host 110, e.g., in a virtualization appliance or a virtualizing switch, which may form part of the interconnect 130. Such external devices providing block virtualization (i.e., devices that are not incorporated within host 110) may be termed off-host virtualizers or off-host virtualization controllers. In some storage environments, block virtualization functionality may be implemented by an off-host virtualizer in cooperation with a host-based virtualizer. That is, some block virtualization functionality may be performed off-host, and other block virtualization features may be implemented at the host.
While additional layers may be added to the storage software stack 140A, it is generally difficult to remove or completely bypass existing storage software layers of operating systems. Therefore, off-host virtualizers may typically be implemented in a manner that allows the existing storage software layers to continue to operate, even when the storage devices being presented to the operating system are virtual rather than physical, and remote rather than local. For example, because disk driver layer 114 expects to deal with SCSI LUNs when performing I/O operations, an off-host virtualizer may present a virtualized storage device to the disk driver layer as a virtual LUN. In some embodiments, as described below in further detail, an off-host virtualizer may encapsulate, or emulate the metadata for, a LUN when providing a host 110 access to a virtualized storage device. In addition, as also described below, one or more software modules or layers may be added to storage stack 140A to support additional forms of virtualization using virtual LUNs.
b is a block diagram illustrating an embodiment of system 100 configured to utilize off-host block virtualization. As shown, the system may include an off-host virtualizer 180, such as a virtualization switch or a virtualization appliance, which may be included within interconnect 130 linking host 110 to physical block device 120. Host 110 may comprise an enhanced storage software stack 140B, which may include an intermediate driver layer 113 between the disk driver layer 114 and file system layer 112. In one embodiment, off-host virtualizer 180 may be configured to present to disk driver layer 114 a virtual storage device (e.g., a virtual LUN or VLUN) that includes one or more regions that are not initially mapped to physical storage, using a technique (such as metadata emulation) that allows the disk driver layer to detect and access the virtual storage device as if it were mapped to physical storage. After the virtual storage device has been detected, off-host virtualizer 180 may map storage within physical block device 120, or multiple physical block devices 120, into the virtual storage device. The back-end storage within a physical block device 120 that is mapped to a virtual LUN may be termed a “physical LUN (PLUN)” in the subsequent description. In another embodiment, off-host virtualizer 180 may be configured to aggregate storage within one or more physical block devices 120 as one or more logical volumes, and map the logical volumes within the address space of a virtual LUN presented to host 110. Off-host virtualizer 180 may further be configured to make the portions of the virtual LUN that are mapped to the logical volumes accessible to intermediate driver layer 113.
For example, in some embodiments, off-host virtualizer 180 may be configured to provide metadata or configuration information on the logical volumes to intermediate driver layer 113, allowing intermediate driver layer 113 to locate the blocks of the logical volumes and perform desired I/O operations on the logical volumes located within the virtual LUN on behalf of clients such as file system layer 112 or other applications. File system layer 112 and applications (such as database management systems) configured to utilize intermediate driver layer 113 and lower layers of storage stack 140B may be termed “virtual storage clients” or “virtual storage consumers” herein. While off-host virtualizer 180 is shown within interconnect 130 in the embodiment depicted in
As described above, in some embodiments, disk driver layer 114 may expect certain operating system-specific metadata to be present at operating-system specific locations or offsets within a LUN. When presenting a virtual LUN to a host 110, therefore, in such embodiments off-host virtualizer 180 may logically insert the expected metadata at the expected locations.
The metadata inserted within virtual LUN 210 may be stored in persistent storage, e.g., within some blocks of physical block device 120 or at off-host virtualizer 180, in some embodiments, and logically concatenated with the mapped blocks 220. In other embodiments, the metadata may be generated dynamically, whenever a host 110 accesses the virtual LUN 210. In some embodiments, the metadata may be generated by an external agent other than off-host virtualizer 180. The external agent may be capable of emulating metadata in a variety of formats for different operating systems, including operating systems that may not have been known when the off-host virtualizer 180 was deployed. In one embodiment, off-host virtualizer 180 may be configured to support more than one operating system; i.e., off-host virtualizer may logically insert metadata blocks corresponding to any one of a number of different operating systems when presenting virtual LUN 210 to a host 110, thereby allowing hosts with different operating systems to share access to a storage device 120.
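The logical concatenation of emulated metadata with mapped blocks may be sketched as follows, under the assumption that the metadata occupies a header preceding the mapped blocks and a trailer following them. The class EncapsulatingVirtualLun and the string block contents are hypothetical and serve only to show how a read at a given virtual LUN offset may be redirected.

```python
class EncapsulatingVirtualLun:
    """Virtual LUN = emulated header + source volume blocks + emulated trailer."""

    def __init__(self, header, source, trailer):
        self.header = list(header)    # emulated operating system metadata
        self.source = source          # blocks of the encapsulated volume
        self.trailer = list(trailer)  # emulated operating system metadata

    def __len__(self):
        return len(self.header) + len(self.source) + len(self.trailer)

    def read(self, block):
        """Return the contents of one virtual LUN block."""
        if not 0 <= block < len(self):
            raise IndexError("block outside virtual LUN")
        if block < len(self.header):
            return self.header[block]
        block -= len(self.header)
        if block < len(self.source):
            return self.source[block]
        return self.trailer[block - len(self.source)]


vlun = EncapsulatingVirtualLun(
    header=["hdr0", "hdr1"],
    source=["vol0", "vol1", "vol2"],
    trailer=["trl0"],
)
```

In this sketch the header and trailer lists could equally be produced on demand for a particular operating system, which corresponds to the dynamically generated metadata described above.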
While logical volumes such as source volume 205 may typically be created and dynamically reconfigured (e.g., grown or shrunk, imported to hosts 110 or exported from hosts 110) efficiently, similar configuration operations on LUNs may typically be fairly slow. Some LUN reconfiguration operations may be at least partially asynchronous, and may have unbounded completion times and/or ambiguous failure states. On many operating systems, LUN reconfiguration may only be completed after a system reboot; for example, a newly created physical or virtual LUN may not be detected by the operating system without a reboot. In order to be able to flexibly map logical volumes to virtual LUNs, while avoiding the problems associated with LUN reconfigurations, therefore, it may be advisable to generate unmapped virtual LUNs (e.g., to create operating system metadata for virtual LUNs that are not initially mapped to any physical LUNs or logical volumes) and pre-assign the unmapped virtual LUNs to hosts 110 as part of an initialization process. The initialization process may be completed prior to performing storage operations on the virtual LUNs on behalf of applications. During the initialization process (which may include a reboot of the system in some embodiments) the layers of the software storage stack 140B may be configured to detect the existence of the virtual LUNs as addressable storage devices. Subsequent to the initialization, off-host virtualizer 180 may dynamically map physical LUNs and/or logical volumes to the virtual LUNs (e.g., by modifying portions of the operating system metadata), as described below in further detail. The term “dynamic mapping”, as used herein, refers to a mapping of a virtual storage device (such as a VLUN) that is performed by modifying one or more blocks of metadata, and/or by communicating via one or more messages to a host 110, without requiring a reboot of the host 110 to which the virtual storage device is presented.
b is a block diagram illustrating an example of an unmapped virtual LUN 230 according to one embodiment. As shown, the unmapped virtual LUN 230 may include an operating system metadata header 215 and an operating system metadata trailer 225, as well as a region of unmapped blocks 235. In some embodiments, the size of the region of unmapped blocks (X blocks in the depicted example) may be set to a maximum permissible LUN or volume size supported by an operating system, so that any subsequent mapping of a volume or physical LUN to the virtual LUN does not require an expansion of the size of the virtual LUN. In one alternative embodiment, the unmapped virtual LUN may consist of only the emulated metadata (e.g., header 215 and/or trailer 225), and the size of the virtual LUN may be increased dynamically when the volume or physical LUN is mapped. In such embodiments, disk driver layer 114 may have to modify some of its internal data structures when the virtual LUN is expanded, and may have to re-read the emulated metadata in order to do so. Off-host virtualizer 180 may be configured to send a metadata change notification message to disk driver layer 114 in order to trigger the re-reading of the metadata.
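A pre-sized, initially unmapped virtual LUN to which backing storage is later attached may be sketched as shown below. The class UnmappedVirtualLun, its map_range and resolve methods, and the backing name "plun0" are hypothetical; the sketch models only the address bookkeeping and not the metadata contents or any notification to the host.

```python
class UnmappedVirtualLun:
    """Virtual LUN with a metadata header, an unmapped region, and a trailer."""

    def __init__(self, header_blocks, unmapped_blocks, trailer_blocks):
        self.header_blocks = header_blocks
        self.unmapped_blocks = unmapped_blocks  # the "X blocks" region
        self.trailer_blocks = trailer_blocks
        # Each mapping: (offset in region, length, backing name, backing start)
        self.mappings = []

    def map_range(self, offset, length, backing, backing_start=0):
        """Map part of the region to backing storage without resizing."""
        if length <= 0 or offset < 0 or offset + length > self.unmapped_blocks:
            raise ValueError("range does not fit in the pre-sized region")
        for start, size, _, _ in self.mappings:
            if offset < start + size and start < offset + length:
                raise ValueError("range overlaps an existing mapping")
        self.mappings.append((offset, length, backing, backing_start))

    def resolve(self, block):
        """Return where a virtual LUN block lives, or None if unmapped."""
        total = self.header_blocks + self.unmapped_blocks + self.trailer_blocks
        if not 0 <= block < total:
            raise IndexError("block outside virtual LUN")
        if block < self.header_blocks:
            return ("header", block)
        offset = block - self.header_blocks
        if offset >= self.unmapped_blocks:
            return ("trailer", offset - self.unmapped_blocks)
        for start, size, backing, backing_start in self.mappings:
            if start <= offset < start + size:
                return (backing, backing_start + offset - start)
        return None


vlun = UnmappedVirtualLun(header_blocks=2, unmapped_blocks=1000, trailer_blocks=2)
before = vlun.resolve(10)          # region is still unmapped
vlun.map_range(0, 100, "plun0")    # later, map 100 blocks of a physical LUN
after = vlun.resolve(10)
```

Because the region is sized in advance, the total size of the virtual LUN in this sketch is the same before and after the call to map_range.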
After VLUN 230 has been recognized by disk driver layer 114 (e.g., as a result of the generation of operating system metadata such as a partition table in an expected format and location), a block at any offset within the VLUN address space may be accessed by the disk driver layer 114, and thus by any other layer above the disk driver layer. For example, intermediate driver layer 113 may be configured to communicate with off-host virtualizer 180 by reading from, and/or writing to, a designated set of blocks emulated within VLUN 230. Such designated blocks may provide a mechanism for off-host virtualizer 180 to provide intermediate driver layer 113 with configuration information associated with logical volumes or physical LUNs mapped to VLUN 230 in some embodiments.
In one embodiment, off-host virtualizer 180 may be configured to map storage from a back-end physical LUN directly to a VLUN 230, without any additional virtualization (i.e., without creating a logical volume). Such a technique of mapping a PLUN to a VLUN 230 may be termed “PLUN tunneling”. Each PLUN may be mapped to a corresponding VLUN 230 (i.e., a 1-to-1 mapping of PLUNs to VLUNs may be implemented by off-host virtualizer 180) in some embodiments. In other embodiments, as described below in conjunction with the description of
In addition, off-host virtualizer 180 may also be configured to increase the level of data sharing using PLUN tunneling. Disk array devices often impose limits on the total number of concurrent “logins”, i.e., the total number of entities that may access a given disk array device. In a storage environment employing PLUN tunneling for disk arrays (i.e., where the PLUNs are disk array devices), off-host virtualizers 180 may allow multiple hosts to access the disk arrays through a single login. That is, for example, multiple hosts 110 may log in to the off-host virtualizer 180, while the off-host virtualizer may log in to a disk array PLUN once on behalf of the multiple hosts 110. Off-host virtualizer 180 may then pass on I/O requests from the multiple hosts 110 to the disk array PLUN using a single login. The number of logins (i.e., distinct entities logged in) as seen by a disk array PLUN may thereby be reduced as a result of PLUN tunneling, without reducing the number of hosts 110 from which I/O operations targeted at the disk array PLUN may be initiated. The total number of hosts 110 that may access storage at a single disk array PLUN with login count restrictions may thereby be increased, thus increasing the overall level of data sharing.
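The effect of PLUN tunneling on login counts may be illustrated with the following simplified sketch, in which a disk array accepts a limited number of distinct logins and a virtualizer logs in once on behalf of several hosts. The classes DiskArray and TunnelingVirtualizer and the limit of one login are assumptions made for illustration.

```python
class DiskArray:
    """Disk array that limits the number of distinct logged-in entities."""

    def __init__(self, max_logins):
        self.max_logins = max_logins
        self.logins = set()

    def login(self, initiator):
        if initiator not in self.logins and len(self.logins) >= self.max_logins:
            raise RuntimeError("login limit reached")
        self.logins.add(initiator)


class TunnelingVirtualizer:
    """Virtualizer that holds a single array login shared by many hosts."""

    def __init__(self, name, array):
        self.name = name
        self.array = array
        self.hosts = set()

    def host_login(self, host):
        # The array sees only the virtualizer, however many hosts log in.
        self.array.login(self.name)
        self.hosts.add(host)


array = DiskArray(max_logins=1)
virtualizer = TunnelingVirtualizer("virtualizer-180", array)
for host in ("host-A", "host-B", "host-C"):
    virtualizer.host_login(host)
```

In this sketch three hosts reach the array while the array records a single login, whereas a direct login from a second host would exceed the limit.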
As described earlier, in addition to mapping physical storage directly to VLUNs 230, in some embodiments off-host virtualizer 180 may be configured to aggregate physical storage into a logical volume, and map the logical volume to an address range within a VLUN 230. For example, in some implementations a set of two or more physical storage regions, either within a single physical storage device or from multiple storage devices, may be aggregated into a logical volume. (It is noted that a logical volume may also be created from a single contiguous region of physical storage; i.e., the set of physical storage regions being aggregated may minimally consist of a single region). Mapping a logical volume through a VLUN may also be termed “volume tunneling” or “logical volume tunneling”.
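Volume tunneling may be sketched, in simplified form, as two successive address translations: from a VLUN block to a logical volume block, and from the logical volume block to a physical storage region. The names Extent, LogicalVolume, and resolve_vlun_block, and the device names "plun0" and "plun1", are hypothetical.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    """A contiguous region of physical storage."""
    device: str
    start: int
    length: int


class LogicalVolume:
    """Logical volume aggregated from one or more physical regions."""

    def __init__(self, extents):
        self.extents = list(extents)

    @property
    def size(self):
        return sum(extent.length for extent in self.extents)

    def resolve(self, volume_block):
        """Map a volume block to (device, block on device)."""
        if volume_block < 0:
            raise IndexError("block outside volume")
        remaining = volume_block
        for extent in self.extents:
            if remaining < extent.length:
                return extent.device, extent.start + remaining
            remaining -= extent.length
        raise IndexError("block outside volume")


def resolve_vlun_block(vlun_block, volume, volume_offset):
    """Translate a VLUN block through a volume mapped at volume_offset."""
    return volume.resolve(vlun_block - volume_offset)


# A 150-block volume built from regions on two devices, mapped at block 2048.
volume = LogicalVolume([Extent("plun0", 1000, 100), Extent("plun1", 0, 50)])
VOLUME_OFFSET = 2048
```

A volume built from a single Extent is also valid in this sketch, which corresponds to the case of a single contiguous region noted above.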
Many storage environments utilize storage area networks (SANs), such as Fiber Channel fabrics, to access physical storage devices. SAN fabric reconfiguration (e.g., to provide access to a particular PLUN or logical volume from a particular host that did not previously have access to the desired PLUN or logical volume), which may require switch reconfigurations, recabling, rebooting, etc., may typically be fairly complex and error-prone. The techniques of PLUN tunneling and volume tunneling, described above, may allow a simplification of SAN reconfiguration operations. By associating pre-generated, unmapped VLUNs to hosts, and mapping PLUNs and logical volumes to VLUNs dynamically as needed, many reconfiguration operations may require only a change of a mapping table at a switch, and a recognition of new metadata by intermediate driver layer 113. Storage devices may be more easily shared across multiple hosts 110, or logically transferred from one host to another, using PLUN tunneling and/or volume tunneling. Allocation and/or provisioning of storage, e.g., from a pool maintained by a coordinating storage allocator, may also be simplified.
In addition to simplifying SAN configuration changes, PLUN tunneling and volume tunneling may also support storage interconnection across independently configured storage networks (e.g., interconnection across multiple Fiber Channel fabrics).
Each storage network 910 (i.e., storage network 910A, 910B, 910C, or 910D) may be independently configurable: that is, a reconfiguration operation performed within a given storage network 910 may not affect any other storage network 910. A failure or a misconfiguration within a given storage network 910 may also not affect any other independent storage network 910. In some embodiments, hosts 110 may include multiple HBAs, allowing each host to access multiple independent storage networks. For example, host 110A may include two HBAs in the embodiment depicted in
In one embodiment, volume tunneling may also allow maximum LUN size limitations to be overcome. For example, the SCSI protocol may be configured to use a 32-bit unsigned integer as a LUN block address, thereby limiting the maximum amount of storage that can be accessed at a single LUN to 2 terabytes (for 512-byte blocks) or 32 terabytes (for 8-kilobyte blocks). Volume tunneling may allow an intermediate driver layer 113 to access storage from multiple physical LUNs as a volume mapped to a single VLUN, thereby overcoming the maximum LUN size limitation.
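The size limits quoted above follow directly from the width of the block address. The short calculation below reproduces them, treating a terabyte as 2^40 bytes; the function name `max_lun_bytes` is hypothetical.

```python
TIB = 2 ** 40  # bytes per terabyte, using the binary convention


def max_lun_bytes(address_bits, block_size_bytes):
    """Largest capacity addressable with an unsigned block address of the given width."""
    return (2 ** address_bits) * block_size_bytes


limit_512 = max_lun_bytes(32, 512) // TIB        # 512-byte blocks
limit_8k = max_lun_bytes(32, 8 * 1024) // TIB    # 8-kilobyte blocks
```

With a 32-bit address, `limit_512` evaluates to 2 and `limit_8k` to 32, matching the 2-terabyte and 32-terabyte figures.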
In various embodiments, off-host virtualizer 180 may implement numerous different types of storage functions using block virtualization. For example, in one embodiment a virtual block device such as a logical volume may implement device striping, where data blocks may be distributed among multiple physical or logical block devices, and/or device spanning, in which multiple physical or logical block devices may be joined to appear as a single large logical block device. In some embodiments, virtualized block devices may provide mirroring and other forms of redundant data storage, the ability to create a snapshot or static image of a particular block device at a point in time, and/or the ability to replicate data blocks among storage systems connected through a network such as a local area network (LAN) or a wide area network (WAN), for example. Additionally, in some embodiments virtualized block devices may implement certain performance optimizations, such as load distribution, and/or various capabilities for online reorganization of virtual device structure, such as online data migration between devices. In other embodiments, one or more block devices may be mapped into a particular virtualized block device, which may be in turn mapped into still another virtualized block device, allowing complex storage functions to be implemented with simple block devices. More than one virtualization feature, such as striping and mirroring, may thus be combined within a single virtual block device in some embodiments, creating a logically hierarchical virtual storage device.
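As an illustration of how striping and mirroring may be combined in a hierarchical virtual device, the following sketch maps a logical block to a device and block under a simple round-robin striping layout, and then fans that location out to each mirror. The layout, the function names, and the parameters are assumptions chosen for the example; actual virtualizers may use different layouts.

```python
def stripe_location(logical_block, stripe_unit_blocks, num_devices):
    """Map a logical block to (device index, block on that device).

    Assumes round-robin striping: consecutive stripe units are placed on
    consecutive devices, wrapping around after the last device.
    """
    unit_index = logical_block // stripe_unit_blocks
    offset_in_unit = logical_block % stripe_unit_blocks
    device = unit_index % num_devices
    row = unit_index // num_devices
    return device, row * stripe_unit_blocks + offset_in_unit


def mirrored_stripe_targets(logical_block, stripe_unit_blocks, num_devices, num_mirrors):
    """Return one (mirror, device, block) target per mirror for a write.

    Each mirror is assumed to be an identically laid out striped device.
    """
    device, block = stripe_location(logical_block, stripe_unit_blocks, num_devices)
    return [(mirror, device, block) for mirror in range(num_mirrors)]


location = stripe_location(13, stripe_unit_blocks=4, num_devices=3)
targets = mirrored_stripe_targets(13, stripe_unit_blocks=4, num_devices=3, num_mirrors=2)
```

Logical block 13 lies in the fourth stripe unit, which wraps around to the first device, so `location` is `(0, 5)`; `targets` lists that same location once for each of the two mirrors.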
The off-host virtualizer 180, either alone or in cooperation with one or more other virtualizers such as a volume manager at host 110 or other off-host virtualizers, may provide functions such as configuration management of virtualized block devices and distributed coordination of block device virtualization. For example, after a reconfiguration of a logical volume shared by two hosts 110 (e.g., when the logical volume is expanded, or when a new mirror is added to the logical volume), the off-host virtualizer 180 may be configured to distribute metadata or a volume description indicating the reconfiguration to the two hosts 110. In one embodiment, once the volume description has been provided to the hosts, the storage stacks at the hosts may be configured to interact directly with various storage devices 340 according to the volume description (i.e., to transform logical I/O requests into physical I/O requests using the volume description). Distribution of a virtualized block device as a volume to one or more virtual device clients, such as hosts 110, may be referred to as distributed block virtualization.
As noted previously, in some embodiments, multiple layers of virtualization may be employed, for example at the host level as well as at an off-host level, such as at a virtualization switch or at a virtualization appliance. In such embodiments, some aspects of virtualization may be visible to a virtual device consumer such as file system layer 112, while other aspects may be implemented transparently by the off-host level. Further, in some multilayer embodiments, the virtualization details of one block device (e.g., one volume) may be fully defined to a virtual device consumer (i.e., without further virtualization at an off-host level), while the virtualization details of another block device (e.g., another volume) may be partially or entirely transparent to the virtual device consumer.
In some embodiments, a virtualizer, such as off-host virtualizer 180, may be configured to distribute all defined logical volumes to each virtual device client, such as host 110, present within a system. Such embodiments may be referred to as symmetric distributed block virtualization systems. In other embodiments, specific volumes may be distributed only to respective virtual device consumers or hosts, such that at least one volume is not common to two virtual device consumers. Such embodiments may be referred to as asymmetric distributed block virtualization systems.
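The distinction between symmetric and asymmetric distribution may be expressed as a simple check over a distribution table. In the sketch below, the function name, the table layout (host name mapped to the set of volumes distributed to it), and the volume names are hypothetical; as a simplification, any distribution in which some host does not receive every defined volume is classified as asymmetric.

```python
def classify_distribution(defined_volumes, distribution):
    """Classify a volume distribution as 'symmetric' or 'asymmetric'.

    defined_volumes: iterable of all defined logical volumes.
    distribution: dict mapping each host to the set of volumes it receives.
    """
    all_volumes = set(defined_volumes)
    if all(set(volumes) == all_volumes for volumes in distribution.values()):
        return "symmetric"
    return "asymmetric"


volumes = ["vol1", "vol2"]
symmetric = classify_distribution(
    volumes, {"host-110A": {"vol1", "vol2"}, "host-110B": {"vol1", "vol2"}}
)
asymmetric = classify_distribution(
    volumes, {"host-110A": {"vol1", "vol2"}, "host-110B": {"vol1"}}
)
```

In the second call, `vol2` is not common to both hosts, so the distribution is classified as asymmetric.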
It is noted that off-host virtualizer 180 may be any type of device, external to host 110, that is capable of providing the virtualization functionality, including PLUN and volume tunneling, described above. For example, off-host virtualizer 180 may include a virtualization switch, a virtualization appliance, a special additional host dedicated to providing block virtualization, or an embedded system configured to use application specific integrated circuit (ASIC) or field-programmable gate array (FPGA) technology to provide block virtualization functionality. In some embodiments, off-host block virtualization may be provided by a collection of cooperating devices, such as two or more virtualizing switches, instead of a single device. Such a collection of cooperating devices may be configured for failover, i.e., a standby cooperating device may be configured to take over the virtualization functions supported by a failed cooperating device. An off-host virtualizer 180 may incorporate one or more processors, as well as volatile and/or non-volatile memory. In some embodiments, configuration information associated with virtualization may be maintained at a database separate from the off-host virtualizer 180, and may be accessed by the off-host virtualizer 180 over a network. In one embodiment, an off-host virtualizer may be programmable and/or configurable. Numerous other configurations of off-host virtualizer 180 are possible and contemplated.
A host 110 may be any computer system, such as a server comprising one or more processors and one or more memories, capable of supporting the storage software stack described above. Any desired operating system may be used at a host 110, including various versions of Microsoft Windows™, Solaris™ from Sun Microsystems, various versions of Linux, other operating systems based on UNIX, and the like. The intermediate driver layer 113 may be included within a volume manager in some embodiments.
Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
Number | Date | Country | Kind |
---|---|---|---|
PCT/US04/39306 | Nov 2004 | WO | international |
This application is a continuation-in-part of U.S. patent application Ser. No. 10/722,614, entitled “SYSTEM AND METHOD FOR EMULATING OPERATING SYSTEM METADATA TO PROVIDE CROSS-PLATFORM ACCESS TO STORAGE VOLUMES”, filed Nov. 26, 2003.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 10722614 | Nov 2003 | US |
Child | 11156821 | Jun 2005 | US |