The subject matter of this disclosure is generally related to electronic data storage, and more particularly to verifying the integrity of “forever” snapshots.
A storage array is an example of a high-capacity data storage system that can be used to maintain large active storage objects that are frequently accessed by multiple host servers. A storage array includes a network of specialized, interconnected compute nodes that respond to IO commands from host servers to provide access to data stored on arrays of non-volatile drives. The stored data is used by host applications that run on the host servers. Examples of host applications include, without limitation, email programs, inventory control programs, and accounting programs. Low latency data access may be required to achieve acceptable host application performance.
Cloud storage is a distinct type of storage system that is typically used in a different role than a storage array. Cloud storage exhibits greater data access latency than a storage array and may be unsuitable for servicing IOs to active storage objects. For example, host application performance would suffer if the hosts accessed data from cloud storage rather than a storage array. However, cloud storage is suitable to reduce per-bit storage costs in situations where high-performance capabilities are not required, e.g., data backup and storage of inactive or infrequently accessed data. Cloud storage and storage arrays also differ in the types of protocols used for IOs. For example, and without limitation, the storage array may utilize a transport layer protocol such as Fibre Channel, iSCSI (internet small computer system interface) or NAS (Network-Attached Storage) protocols such as NFS (Network File System), SMB (Server Message Block), CIFS (Common Internet File System) and AFP (Apple Filing Protocol). In contrast, the cloud storage may utilize any of a variety of different non-standard and provider-specific APIs (Application Programming Interfaces) such as AWS (Amazon Web Services), Dropbox, OpenStack, Google Drive/Storage APIs based on, e.g., JSON (JavaScript Object Notation).
A variety of techniques such as snapshots, backups, and replication can be implemented to avoid data loss, maintain data accessibility, and enable recreation of storage object state at a previous point in time in a storage system that includes storage arrays and cloud storage. A typical snapshot is a point-in-time representation of a storage object that includes only the changes made to the storage object relative to an earlier point in time, e.g., the time of creation of the previous snapshot. Either copy-on-write or redirect-on-write can be performed to preserve changed data that would otherwise be overwritten. Metadata indicates the relationship between the changed data and the storage object. At some regular interval, e.g., hourly, or daily, a snapshot is created by writing the changes to a snap volume. A storage array may maintain snapshots for a predetermined period of time and then discard them.
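For illustration only, the copy-on-write behavior described above can be sketched as follows. This is a minimal sketch and not the storage array's implementation; the `Volume` class, its method names, and the in-memory block list are hypothetical and are introduced only for this example.

```python
class Volume:
    """Toy block volume with copy-on-write snapshots (hypothetical, illustrative only)."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        # Each snapshot maps block index -> data preserved before an overwrite.
        self.snapshots = []

    def create_snapshot(self):
        # A new snapshot starts empty and only accumulates blocks that are
        # overwritten after this point in time.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        if self.snapshots:
            # Copy-on-write: preserve the old block the first time it is
            # overwritten after the newest snapshot was taken.
            self.snapshots[-1].setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        # Look for a preserved copy in this snapshot or any later one; if
        # none exists, the block has not changed since the snapshot.
        for snap in self.snapshots[snap_id:]:
            if index in snap:
                return snap[index]
        return self.blocks[index]
```

In this sketch each snapshot holds only the blocks that changed after it was created, so the storage consumed by a snapshot grows with the amount of changed data rather than with the size of the storage object.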
Although snapshots are typically maintained locally by a storage array, backup snapshots may be stored remotely in order to better protect against disaster events such as destruction of a storage array. For example, backup snapshots may be maintained in cloud storage or a purpose-built data backup appliance that is geographically remote from the storage array, e.g., in a different data center. Storing backup snapshots in the cloud offers the advantage of low-cost storage in addition to the protection offered by geographic separation.
Because the data set is typically large, an array-embedded snapshot backup to a remote system transfers only the data blocks changed since the last successful backup on that remote backup storage, and then uses the remote backup storage capabilities to merge the changes with the previous backup to create a new full backup. The metadata required to achieve snapshot backups is generally more extensive and complex than for array-only snapshots.
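The transfer-and-merge step described above can be illustrated with a short sketch. It assumes, purely for the example, that snapshots are equal-length lists of fixed-size blocks; the function names `changed_blocks` and `synthesize_full_backup` are hypothetical.

```python
def changed_blocks(previous, current):
    """Return {block_index: data} for blocks that differ between two snapshots.

    Assumes both snapshots have the same number of blocks.
    """
    if len(previous) != len(current):
        raise ValueError("snapshots must have the same number of blocks")
    return {i: new for i, (old, new) in enumerate(zip(previous, current)) if old != new}


def synthesize_full_backup(base_backup, delta):
    """Merge changed blocks into a copy of the previous full backup."""
    merged = list(base_backup)  # the previous backup itself is left untouched
    for index, data in delta.items():
        merged[index] = data
    return merged
```

Only the dictionary returned by `changed_blocks` would need to cross the network in this sketch; the merge runs entirely on the remote side.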
A problem arises when backups of the snapshots are implemented. Scheduled or ad hoc snapshot backups require transmitting data over the wire, writing the data to the remote system, and merging the changes with the previous base backup on the remote system. Although the storage array creates and maintains extensive metadata to assure the integrity of array-only snapshots, cloud storage does not necessarily implement all of the same metadata. Corruption can occur in the data path when changes are being transferred, written, or synthesized. Consequently, corruption of incremental backup snapshots can remain undetected indefinitely.
In accordance with some implementations, a method for validating integrity and correctness of a backup snapshot of a storage object comprises providing at least one checksum algorithm to a storage array; the storage array calculating a checksum of the snapshot being backed up with the at least one checksum algorithm; calculating or retrieving a checksum of the backup of the snapshot using the same checksum algorithm; performing validation of the backup snapshot by comparing the checksum of the local snapshot being backed up with the checksum of the backup snapshot; and prompting and possibly performing remedial action in response to determining that the checksum of the local snapshot does not match the checksum of the backup snapshot.
In accordance with some implementations, a storage system comprises: remote backup storage configured to provide at least one checksum algorithm to a storage array that is configured to calculate a checksum of a snapshot being backed up with the at least one checksum algorithm, perform validation of the backup snapshot by comparing the checksum of the local snapshot being backed up with the checksum of the backup snapshot; and prompt and possibly perform remedial action in response to determining that the checksum of the local snapshot does not match the checksum of the backup snapshot.
In accordance with some implementations, a non-transitory computer-readable storage medium stores instructions that when executed by a computer cause the computer to perform a method for validating integrity and correctness of a backup snapshot of a storage object, the method comprising: providing at least one checksum algorithm to the storage array; the storage array calculating a checksum of the snapshot being backed up with the at least one checksum algorithm; calculating or retrieving a checksum of the backup of the snapshot using the same checksum algorithm; performing validation of the backup snapshot by comparing the checksum of the local snapshot being backed up with the checksum of the backup snapshot; and prompting and possibly performing remedial action in response to determining that the checksum of the local snapshot does not match the checksum of the backup snapshot.
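By way of a non-limiting sketch, the validation described in the foregoing implementations might look like the following. The function names, the block-list representation of a snapshot, the `on_mismatch` callback, and the default choice of SHA-256 are assumptions made for this example only; the disclosure does not mandate a particular checksum algorithm or remedial action.

```python
import hashlib


def snapshot_checksum(blocks, algorithm="sha256"):
    """Compute a checksum over a snapshot's blocks, in order."""
    digest = hashlib.new(algorithm)
    for block in blocks:
        digest.update(block)
    return digest.hexdigest()


def validate_backup(local_blocks, backup_checksum, on_mismatch, algorithm="sha256"):
    """Compare the local snapshot's checksum with the backup's checksum.

    Returns True on a match. On a mismatch, calls on_mismatch(local, backup)
    to prompt remedial action (hypothetical hook) and returns False.
    """
    local_checksum = snapshot_checksum(local_blocks, algorithm)
    if local_checksum == backup_checksum:
        return True
    on_mismatch(local_checksum, backup_checksum)
    return False
```

A caller might pass a callback that raises an alert or schedules a new full backup; what counts as appropriate remedial action is left to the implementation.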
All examples, aspects and features mentioned in this document can be combined in any technically possible way. Other aspects, features, and implementations may become apparent in view of the detailed description and figures.
The terminology used in this disclosure is intended to be interpreted broadly within the limits of subject matter eligibility. The terms “disk” and “drive” are used interchangeably herein and are not intended to refer to any specific type of non-volatile storage media. The terms “logical” and “virtual” are used to refer to features that are abstractions of other features, e.g., and without limitation abstractions of tangible features. The term “physical” is used to refer to tangible features that possibly include, but are not limited to, electronic hardware. For example, multiple virtual computers could operate simultaneously on one physical computer. The term “logic” is used to refer to special purpose physical circuit elements, firmware, software, computer instructions that are stored on a non-transitory computer-readable medium and implemented by multi-purpose tangible processors, and any combinations thereof. Aspects of the inventive concepts are described as being implemented in a data storage system that includes host servers and a storage array. Such implementations should not be viewed as limiting. Those of ordinary skill in the art will recognize that there are a wide variety of implementations of the inventive concepts in view of the teachings of the present disclosure.
Some aspects, features, and implementations described herein may include machines such as computers, electronic components, optical components, and processes such as computer-implemented procedures and steps. It will be apparent to those of ordinary skill in the art that the computer-implemented procedures and steps may be stored as computer-executable instructions on a non-transitory computer-readable medium. Furthermore, it will be understood by those of ordinary skill in the art that the computer-executable instructions may be executed on a variety of tangible processor devices, i.e., physical hardware. For practical reasons, not every step, device, and component that may be part of a computer or data storage system is described herein. Those of ordinary skill in the art will recognize such steps, devices, and components in view of the teachings of the present disclosure and the knowledge generally available to those of ordinary skill in the art. The corresponding machines and processes are therefore enabled and within the scope of the disclosure.
In order to provide data storage services to the host servers 106, 108, 110, the storage array 100 creates a storage object known as a production volume 102. The production volume 102 contains a full copy of host application data, i.e., an application image. The production volume 102 is accessed by instances of a host application 104 running on each of the host servers 106, 108, 110, of which there may be many. The production volume 102 is a logical storage device that is created by the storage array using the storage resources of a storage group 112. The storage group 112 includes multiple thinly provisioned devices (TDEVs) 114, 116, 118 that are also logical storage devices. In general, logical storage devices may be referred to as logical volumes, devices, or storage objects.
An application image snapshot 170 is produced by generating respective individual snapshots 150, 152, 154 of each of the TDEVs 114, 116, 118 of the storage group associated with the production volume 102. The TDEV snapshots are stored locally by the storage array. In order to create a corresponding backup application image snapshot 158 on cloud storage 120, the storage array 100 sends data difference messages (“diff's”) 156 via network 121 to a data backup appliance 130. The diff's 156 represent changes to the production volume and thus to the snapshots 150, 152, 154 of each of the TDEVs. An individual diff is not necessarily sent for each write to the production volume 102, e.g., a diff may represent multiple updates to the production volume. The data backup appliance 130 performs deduplication and uses the diff's to prompt update of backup snapshots 160, 162, 164 of the TDEVs. In the illustrated example, backup snapshot 160 corresponds to snapshot 150, backup snapshot 162 corresponds to snapshot 152, and backup snapshot 164 corresponds to snapshot 154. The storage array maintains the local application image snapshot 170 in order to be able to recreate storage object state at any prior point in time. In a disaster recovery operation in which the application image and application image snapshot 170 become unavailable, the backup application image snapshot 158 is used to recreate the application image in a new storage group on the storage array 100 or a different storage array. For example, if storage array 100 is destroyed in a natural disaster, then the backup application image snapshot 158 can be used to rebuild the production volume 102 on a different storage array at a different data center.
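The handling of diffs described above can be sketched as follows, under simplifying assumptions. The representation of a diff as a mapping from block index to data, the `coalesce_writes` function, and the `BackupAppliance` class with its content-addressed chunk store are all hypothetical and illustrate only one way that batching and deduplication could be arranged.

```python
import hashlib


def coalesce_writes(writes):
    """Fold a sequence of (block_index, data) writes into one diff.

    A later write to the same block replaces an earlier one, so a single
    diff can represent multiple updates to the production volume.
    """
    diff = {}
    for index, data in writes:
        diff[index] = data
    return diff


class BackupAppliance:
    """Toy appliance that deduplicates block data by content digest."""

    def __init__(self):
        self.chunk_store = {}       # content digest -> block data, stored once
        self.backup_snapshots = {}  # backup snapshot id -> {block_index: digest}

    def apply_diff(self, snapshot_id, diff):
        block_map = self.backup_snapshots.setdefault(snapshot_id, {})
        for index, data in diff.items():
            digest = hashlib.sha256(data).hexdigest()
            # Deduplication: identical data is stored only once.
            self.chunk_store.setdefault(digest, data)
            block_map[index] = digest

    def read_block(self, snapshot_id, index):
        return self.chunk_store[self.backup_snapshots[snapshot_id][index]]
```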
Referring to
A cloud replication system (CRS) 250 running on the storage array 100 automatically prompts transmission of the diff's 156 to the data backup appliance 130. A checksum query application programming interface (API) 252 on the storage array 100 and a corresponding API on the data backup appliance enable sharing of the checksum library and algorithms 254. The APIs also enable coordinated generation of checksums on snapshots 150, 152, 154 and backup snapshots 160, 162, 164 to verify data integrity and correctness. For example, the API 252 may be used to prompt the data backup appliance 130 to obtain or generate a checksum of a designated backup snapshot. The storage array may generate a checksum of the corresponding snapshot and then compare the generated checksum with the checksum shared by the data backup appliance.
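One possible shape for the coordinated checksum exchange is sketched below. The `BackupApplianceAPI` class, its `list_algorithms` and `get_checksum` methods, and the `verify_application_image` function are hypothetical stand-ins for the checksum query API 252 and its counterpart on the data backup appliance 130; the actual interface is not specified here. The pairing of snapshots 150, 152, 154 with backup snapshots 160, 162, 164 follows the illustrated example.

```python
import hashlib


def digest_blocks(blocks, algorithm):
    """Checksum a snapshot's blocks, in order, with the named algorithm."""
    digest = hashlib.new(algorithm)
    for block in blocks:
        digest.update(block)
    return digest.hexdigest()


class BackupApplianceAPI:
    """Hypothetical appliance-side endpoint for checksum queries."""

    SUPPORTED = ("sha256", "sha512")  # assumed algorithm list, for illustration

    def __init__(self, backup_snapshots):
        self._backup_snapshots = backup_snapshots  # backup id -> list of blocks

    def list_algorithms(self):
        return list(self.SUPPORTED)

    def get_checksum(self, backup_id, algorithm):
        if algorithm not in self.SUPPORTED:
            raise ValueError(f"unsupported algorithm: {algorithm}")
        return digest_blocks(self._backup_snapshots[backup_id], algorithm)


def verify_application_image(local_snapshots, pairing, appliance):
    """Return {local_snapshot_id: bool} indicating which backups match."""
    # Pick the first algorithm offered by the appliance that is also
    # available locally, so both sides use the same algorithm.
    algorithm = next(
        (a for a in appliance.list_algorithms() if a in hashlib.algorithms_available),
        None,
    )
    if algorithm is None:
        raise RuntimeError("no checksum algorithm in common")
    results = {}
    for local_id, backup_id in pairing.items():
        local = digest_blocks(local_snapshots[local_id], algorithm)
        results[local_id] = local == appliance.get_checksum(backup_id, algorithm)
    return results
```

Reporting a result per TDEV snapshot, rather than a single pass or fail for the whole application image, would let remedial action be limited to the backup snapshots that actually failed validation.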
Specific examples have been presented to provide context and convey inventive concepts. The specific examples are not to be considered as limiting. A wide variety of modifications may be made without departing from the scope of the inventive concepts described herein. Moreover, the features, aspects, and implementations described herein may be combined in any technically possible way. Accordingly, modifications and combinations are within the scope of the following claims.