The present invention relates generally to computer systems, and more particularly, but without limitation, to maintaining system configuration integrity during updates associated with memory allocations.
Computer systems can comprise input devices, output devices, one or more CPUs and storage media such as semiconductor RAM, EEPROM, disc drives, CD drives, or other storage media. An operating system provides an application environment and a file system for allocating (and deallocating) storage capacity as files are created, modified, or deleted. Specialized computer systems, such as servers and storage arrays, for example, also employ a file system for allocating storage capacity that is accessed through a network or other connection. Servers and storage arrays store files across a plurality of disc drives, depending on a desired storage format, such as a RAID level, for example. User data files are mapped to one or more areas on one or more disc drives. Mapping includes storage of mirror data or parity data. Configuration information describing the manner in which data files are mapped to one or more disc drives is contained in tables or other data structures termed metadata. As files are created, modified, or deleted, metadata is updated to reflect the allocation or deallocation of storage capacity.
Systems can be multi-threaded and multi-tasking, simultaneously executing a number of processes. Abnormal execution of one process (such as a system process or user application) can cause one or more processes to end in an incomplete manner. While operating systems strive to provide an operating environment where abnormal execution of one application does not affect other applications, conditions such as bus failures, memory errors, code errors, power failures, power surges, or other conditions can result in a system crash. Storage capacity allocation or de-allocation processes can be operating when system execution is halted by a crash, possibly resulting in erroneous allocation or de-allocation and loss of data.
As embodied herein and as claimed below, the present invention is generally directed to a device and associated method for updating computer system configuration information.
In some embodiments a recovery record is provided that is stored in a memory space. The recovery record comprises memory allocation information associated with a change in a system configuration of memory allocation of the space, and a completion indicator comprising a first value when the memory allocation information is included in the system configuration and comprising a second value when the memory allocation information is not included in the system configuration.
In other embodiments a method is provided comprising: storing memory allocation information associated with a change in a system configuration of memory allocation of a memory space; and assigning a first value to a completion indicator indicating that the memory allocation information is not included in the system configuration.
In other embodiments a data storage system is provided comprising system configuration change information associated with a change in a system configuration, and means for updating the system configuration by saving the system configuration change information before updating the system configuration.
These and various other features and advantages which characterize the embodiments of the claimed invention will become apparent upon reading the following detailed description and upon reviewing the associated drawings.
To illustrate an exemplary environment in which presently preferred embodiments of the present invention can be advantageously practiced,
The system 100 includes a number of host computers 102, respectively identified as hosts A, B, and C. The host computers 102 interact with each other as well as with a pair of data storage arrays 104 (denoted A and B, respectively) via a fabric 106. The fabric 106 is preferably characterized as a fibre-channel based switching network, although other configurations can be utilized as well, including the Internet.
Each array 104 includes a pair of controllers 108 (denoted A1, A2 and B1, B2) and a set of data storage devices 110 preferably characterized as hard disc drives operated as a RAID (redundant array of independent discs). The controllers 108 and data storage devices 110 preferably utilize a fault tolerant arrangement so that the various controllers 108 utilize parallel, redundant links and at least some of the user data stored by the system 100 is stored in a redundant format within at least one set of the data storage devices 110.
It is further contemplated that the A host computer 102 and the A data storage array 104 can be physically located at a first site, the B host computer 102 and B storage array 104 can be physically located at a second site, and the C host computer 102 can be yet at a third site, although such is merely illustrative and not limiting.
A fabric interface (I/F) circuit 118 communicates with the other controllers 108 and the host computers 102 via the fabric 106, and a device I/F circuit 120 communicates with the storage devices 110. The I/F circuits 118, 120 and a path controller 122 form a communication path to pass commands and data between the storage array 104 and the host 102, such as by employing the cache memory 124. Although illustrated discretely, it will be understood that the path controller 122 and the I/F circuits 118, 120 can be unitarily constructed.
The data storage capacity of an array 104, defined by the extent of the data storage devices 110 in a given array 104, is organized into ordered files that can be written to and read from the array 104. System configuration information defines the relationship between user data files, including any associated parity and mirror data, with the respective storage locations. The system configuration furthermore identifies the relationship between blocks of storage capacity allocated to user files and the memory storage locations, such as logical block addresses. The system configuration can furthermore include virtualization by defining virtual block addresses that are mapped to logical block addresses.
System configuration information is changed when storage capacity is allocated, such as when saving new files or enlarging existing files, or after storage capacity is deallocated, such as when deleting files or reducing the size of existing files. System metadata defines file allocation information and other data structures that support allocation processes.
The metadata 130 thus represents a summary of the system configuration with respect to the storage capacity utilization. Updating the metadata 130 involves altering the indicator bit data to reflect the change in allocation from the state before the allocation, for example at time t0, to the state after the allocation at time t1. Preferably, the updating takes place as a one-pass process, as illustrated by the flowchart of
If an error occurs in the system 100 during the updating step 146, such as in the event of a system 100 crash, the system 100 can be faced with attempting a restart with partially updated metadata. Embodiments of the present invention contemplate a solution to that problem by providing the opportunity for a two-pass recovery process of the metadata 130.
In the two-pass process 152, a recovery record 160 is created comprising a stored record of the MEMORY ALLOCATION INFORMATION 162 (“MAI”). The MAI 162 comprises information associated with the changes in the system configuration due to the allocation request 142. The recovery record 160 further comprises a COMPLETION INDICATOR 164 comprising a first value 168 if the MAI 162 is included in the system configuration and comprising a second value 166 if the MAI 162 is not included in the system configuration. That is, in the two-pass process 152, the system configuration is defined by a combination of the metadata 130 as it stood at time t0 (METADATA t0) with the MAI 162, both of which are individually stored in memory. Preferably, both are stored in nonvolatile memory for recovery in the event of a momentary power loss, such as in a structure similar to that set forth in
Recalling
The step of storing the MAI 162 in block 170 preferably includes storing a copy of the MAI 162 for backup purposes. More particularly, in some embodiments it is advantageous to mirror the MAI 162 in write-back cache memory. As discussed above, the MAI 162 stored in block 170 can also advantageously include a variety of information, such as the availability of a particular data storage unit for allocation, the total number of available data storage units for allocation, and the total number of available data storage units per zone in a mapped arrangement of zoned capacity.
It will now be appreciated that the various preferred embodiments of the present invention generally contemplate a data storage system 100 comprising memory allocation information 162 associated with a change in a system 100 configuration of memory allocation, and means for updating the system 100 configuration by saving the memory allocation information 162 before updating the system configuration. This is generally defined by a two-pass process of first saving the memory allocation information 162 and then updating the system configuration.
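The two-pass ordering described above can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the names `RecoveryRecord`, `PENDING`, `APPLIED`, and `two_pass_update` are hypothetical, the metadata is modeled as a plain list, and the MAI is assumed to be a mapping from metadata index to resultant value.

```python
from dataclasses import dataclass, field

# Hypothetical encodings of the completion indicator 164.
PENDING = 0  # MAI is not yet included in the system configuration
APPLIED = 1  # MAI is included in the system configuration


@dataclass
class RecoveryRecord:
    """Assumed in-memory model of the recovery record 160."""
    mai: dict = field(default_factory=dict)  # metadata index -> resultant value
    completion: int = APPLIED


def two_pass_update(metadata: list, record: RecoveryRecord, changes: dict) -> None:
    # Pass 1: save the resultant values and mark them as not yet applied.
    record.mai = dict(changes)
    record.completion = PENDING
    # Pass 2: write the saved values into the metadata, then mark completion.
    for index, value in record.mai.items():
        metadata[index] = value
    record.completion = APPLIED


metadata = [0, 0, 0, 0]
record = RecoveryRecord()
two_pass_update(metadata, record, {1: 1, 2: 1})
```

In a real system the record and the metadata would live in nonvolatile memory; the sketch only shows the ordering of the save and the update.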
The means for updating is characterized by an indicator means, such as without limitation the completion indicator 164, indicating whether the memory allocation information 162 is included in the system 100 configuration. The means for updating is characterized by a means for mapping, such as without limitation the metadata 130, the system 100 configuration allocatability. The means for updating is characterized by a means for indicating the allocatability of the system 100 configuration, such as without limitation the allocatability of a particular data storage unit 132. The means for indicating can alternatively indicate the total number of data storage units that are available for allocation, as well as the number of data storage units per zone that are available for allocation in a zoned capacity arrangement.
Continuing with
Depending on the storage format, such as RAID level, for example, grids 200 can contain different numbers of data storage units 132. Configuring all the data storage units 132 in a particular grid 200 as the same storage format and allocating all data storage units 132 in the grid 200 to the same logical device is advantageous in simplifying the metadata 130 arrangement for processing. Grid-groups 206 can be allocated to a logical device and grids 200 within the grid-groups 206 assigned to the logical device as needed. A number can be assigned to each grid 200 of the plurality of grids 200 and the grids 200 can be assigned to a logical device with respect to the grid numbers, such as but not limited to in a sequential manner. The number of the first grid 200 in a grid-group 206 can serve as a grid-group 206 identifier, or other numbers or identifiers can be assigned to grid-groups.
Preferably, an allocation request 142 (
The storage capacity selected to meet the allocation request 142 can be determined by evaluating the GGAST 212 for the requested capacity, and evaluating the count of free grid-groups in each zone to identify where sufficient free grid-groups exist. The GGAM 210 can then be used to select one or more free grid-groups to provide the requested storage capacity. The GGAM 210 can also be employed to select storage that is contiguous to or in proximity to other storage allocated to an existing volume if the allocation request 142 specifies increasing the size of an existing logical device.
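The selection step above might look like the following sketch. The data layout is an assumption made for illustration: the allocation map is modeled as one list of booleans per zone (`True` meaning allocated), the per-zone free counts as a list of integers, and `select_grid_groups` is a hypothetical helper name.

```python
def select_grid_groups(allocation_map, zone_free_counts, needed):
    """Return (zone, free indices) for the first zone with enough free
    grid-groups, or None when no single zone can satisfy the request."""
    for zone, free in enumerate(zone_free_counts):
        if free >= needed:
            # Consult the map only after the summary count says the zone qualifies.
            free_indices = [
                i for i, allocated in enumerate(allocation_map[zone]) if not allocated
            ]
            return zone, free_indices[:needed]
    return None


allocation_map = [
    [True, True, False],          # zone 0: one free grid-group
    [False, True, False, False],  # zone 1: three free grid-groups
]
zone_free_counts = [1, 3]
choice = select_grid_groups(allocation_map, zone_free_counts, 2)
```

Checking the summary counts first avoids scanning the map for zones that cannot satisfy the request.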
Once the allocation determination has been made, the respective changes must be reflected in the system 100 configuration. This can be accomplished by copying that portion of the GGAM 210 affected by the changes in allocation in defining the MAI 162 (
It will be noted that advantageously the MAI 162 consists only of resultant data, and not formulaic data, such that it is not used to reinitiate any processes during the updating of the system configuration. In this manner, the recovery record 160 requires no intelligence or decision-making for crash recovery. Rather, the crash recovery involves only mechanistic calculations such as address determinations and writes. The recovery code of the recovery record 160 does not need to know what the higher level operation was that got interrupted. In the context of crash recovery, whether the operation being recovered was an allocation or deallocation does not matter; the updated metadata values are just stored to the respective locations.
This arrangement makes the recovery code both simple and reliable. The actual address of where to write a particular updated metadata value can be implied by the value's location in the recovery record 160, implied by reference to other constructs (e.g., a logical device number), or made explicit with an actual address in the recovery record 160. Particularly advantageous is that this permits the recovery record 160 to be used in code for an idempotent updating of the system configuration: because only resultant values are written, replaying the record any number of times yields the same configuration, which keeps recovery fast. Addresses identifying locations within the system 100 configuration can be written to the MAI 162 for each map, table or other data structure to be updated. Alternatively, information within the MAI 162, such as the logical device numbers and grid numbers, can provide information from which the metadata address can be generated. Further, the location of update information within the MAI 162 can be employed to indicate the data structure to be updated. For example, the count of free grid-groups can occupy a predefined location within the MAI 162.
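A minimal sketch of such an idempotent replay is shown below, assuming the explicit-address variant in which the MAI maps each metadata index to its final value. The names `recover`, `PENDING`, and `APPLIED` are hypothetical and chosen only for this illustration.

```python
PENDING, APPLIED = 0, 1  # hypothetical completion indicator values


def recover(metadata, mai, completion):
    """Replay a pending recovery record and return the new completion value."""
    if completion == APPLIED:
        return APPLIED  # nothing to do; the MAI is already in the configuration
    for index, value in mai.items():
        # Absolute writes of resultant values, never increments or re-computation,
        # so repeating the loop cannot change the outcome.
        metadata[index] = value
    return APPLIED


# Metadata left half-updated by a crash: index 1 was written, 2 and 3 were not.
metadata = [0, 1, 0, 5]
mai = {1: 1, 2: 1, 3: 4}
completion = recover(metadata, mai, PENDING)
after_first = list(metadata)
recover(metadata, mai, PENDING)  # a second replay leaves the metadata unchanged
```

The replay does not inspect whether the interrupted operation was an allocation or a deallocation; it only stores values at locations.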
An example format for storing data in a MAI 162 is shown in
The MAI 162 record of
Alternatively, the MAI 162 record can include a GGAM address 256 that specifies a starting location in the GGAM 210 to which the update is to be written, a free grid-group count address 258 that specifies the address of the total free grid group count in the GGAST 212, a GGAM zone free grid-group count address 260 that specifies the address in the GGAST 212 of one free grid-group count for one zone, and an LDAM address 262 that specifies the address of the LDGT 242 for one logical device.
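One way to picture this explicit-address variant is the sketch below. Only the four address fields come from the description above; the value fields paired with them (`ggam_bits`, `free_count`, `zone_free_count`, `ldam_entry`), the class name `MaiRecord`, and the flat integer address space are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaiRecord:
    ggam_address: int             # starting location of the map update
    ggam_bits: tuple              # assumed payload: updated map values
    free_count_address: int       # address of the total free grid-group count
    free_count: int               # assumed payload: updated total count
    zone_free_count_address: int  # address of one zone's free grid-group count
    zone_free_count: int          # assumed payload: updated zone count
    ldam_address: int             # address of one logical device's entry
    ldam_entry: int               # assumed payload: updated entry value

    def writes(self):
        """Flatten the record into (address, value) pairs for recovery code."""
        pairs = [(self.ggam_address + i, bit) for i, bit in enumerate(self.ggam_bits)]
        pairs.append((self.free_count_address, self.free_count))
        pairs.append((self.zone_free_count_address, self.zone_free_count))
        pairs.append((self.ldam_address, self.ldam_entry))
        return pairs


record = MaiRecord(10, (1, 1), 100, 6, 101, 2, 200, 7)
pairs = record.writes()
```

With explicit addresses, the recovery code needs no knowledge of the data structures involved; it simply performs each write in the list.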
While the foregoing description has employed a grid-based storage architecture, embodiments of the present invention are not limited to a particular storage architecture. An allocatable unit represents an amount of storage capacity allocated to a logical device. While foregoing examples have employed a grid-group as an allocatable unit, embodiments of the present invention are not limited as to the organization or size of an allocatable unit and can include a stripe as an allocatable unit, for example.
While the foregoing examples are directed to configuration of data storage systems (and can be applied to storage systems employing any media including but not limited to disc drives, WORM drives, writeable CD ROMs, DVDs, EEPROM, and semiconductor RAM), embodiments of the present invention can be applied to other hardware and software configurable elements of computer systems including component and connection configurations, such as network interfaces and graphics adapters, and to software configurations, such as application environment settings, register settings, passwords, and the like. For example, methods of the present invention can be applied to setting a password such that if a crash occurs while the password is being changed and the password is corrupted, a recovery record can be employed to write a non-corrupted password. As such, a metadata update request represents any request that updates configuration information of a system.
It is to be understood that even though numerous characteristics and advantages of various embodiments of the present invention have been set forth in the foregoing description, together with details of the structure and function of various embodiments of the invention, this detailed description is illustrative only, and changes may be made in detail, especially in matters of structure and arrangements of parts within the principles of the present invention to the full extent indicated by the broad general meaning of the terms in which the appended claims are expressed. For example, the particular elements may vary depending on the particular metadata structure without departing from the spirit and scope of the present invention.