None.
The present technology relates generally to deep storage in distributed storage systems.
Information and management computer applications are used extensively to track and manage data relevant to an enterprise, such as marketing and sales data, manufacturing data, inventory data, and the like. Typically, the application data resides in a centralized database within a distributed storage system, in a format such as Oracle, Informix, SQL, and the like. Local applications integrate remote clients and network servers to use and manage the application data, and to make the application data available to remote applications such as via remote function calls (RFCs).
The centralized location of the application data can be problematic in that it places on the enterprise owner the onus of maintaining complex computer systems in order to support the applications. For example, it has traditionally been necessary for the enterprise owner to acquire the knowledge necessary to purchase and maintain the physical storage devices that store the data. The maintenance includes implementing extensive and complex requirements that protect the stored data from file loss, from storage device failure such as corrupted storage media, and even from entire installation failure. Where just file failure has occurred, it is advantageous to provide an end-user initiated recovery rather than requiring the enterprise owner's participation. When a storage failure requires complete recovery of a file system, preferably removable storage devices store the backups in a way suited for high performance streaming. Worst case, when an entire installation failure requires an offsite recovery, preferably the removable storage devices are ordered in a way making it efficient to remove them to the offsite location.
What is needed is a solution that replaces the complex and expensive archive requirements of the previously attempted solutions with a back-end archive controller having top-level control of removable physical storage devices. It is to that need that the embodiments of the present technology are directed.
Some embodiments of the claimed technology contemplate an apparatus having a processor-based storage controller and a nontransient, tangible computer memory configured to store a plurality of data files. Computer instructions are stored in the computer memory defining container-level array storage logic that is configured to be executed by the controller to sequentially containerize the data files to a plurality of virtual storage containers, and to flush the virtual storage containers by migrating each storage container's contents to a respective physical storage device.
Some embodiments of the claimed technology contemplate a method that includes: containerizing data files in a plurality of virtual storage containers; and flushing the storage containers by migrating all of the data files in each storage container to a respective physical data storage device.
Some embodiments of the claimed technology contemplate an apparatus that has a storage controller caching data files and storing parity for the cached data files, and means for striping the cached data files across a plurality of physical storage devices.
Initially, it is to be appreciated that this disclosure is by way of example only, not by limitation. The data management concepts herein are not limited to use or application with any specific system or method. Thus, although the instrumentalities described herein are for the convenience of explanation, shown and described with respect to exemplary embodiments, it will be appreciated that the principles herein may be applied equally in other types of storage systems and methods involving deep storage of archive data.
To illustrate an exemplary environment in which preferred embodiments of the present technology can be advantageously practiced,
A detailed description of the computer applications 104 is unnecessary for the skilled artisan to understand the scope of the claimed technology. Generally, the applications 104 can be any type of computer application such as but not limited to a point of sale application, an inventory application, a supply-chain application, a manufacturing application, and the like. The server 102 may communicate with one or more other servers (not depicted) via one or more networks (not depicted). The server 102 in these illustrative embodiments communicates with a network attached storage (NAS) device 106 via a local network 108. The NAS device 106 presents an independent storage file system to the server 102. The server 102 stores application data to and retrieves application data from the NAS device 106 in the normal course of executing the respective applications 104.
Further in these illustrative embodiments the NAS device 106 cooperates with an archive storage controller (ASC) 110 to store copies of the application data for long-term retention in a deep storage system 114. The long-term storage may be provisioned for backup copies (backups) and other data that is subject to retention policies. The NAS device 106 and the ASC 110 communicate via a network 112 that can be characterized as an Ethernet-based switching network. The protocol utilized by the ASC 110 makes it well suited for placement at a remote site a distance away from the NAS device 106. This protocol is compatible with the Internet and can be run over either private or public ISP networks. The NAS device 106 can execute programmed routines that periodically transfer archive data files to the ASC 110 for the long-term retention. As described in detail herein, deep storage can be managed entirely by applications in the ASC 110, independently of any control by the enterprise server 102.
The ASC 110 can provide a cloud storage compatible interface for copying the file data from the NAS 106 to the ASC 110. For example, a link application in the NAS 106 can send the file data via the network 112 through implementation of representational state transfer (REST) calls from the link module via an object-oriented language. A REST architecture, such as the World Wide Web, can transfer data and commands via hypertext transfer protocol (HTTP) commands such as GET, POST, DELETE, PUT, etc. Particularly, the link application can send and receive file data via connection with the ASC 110 configured as an HTTP device. The NAS 106 connection with the ASC 110 is built into the link module so that both sending file data to and receiving file data from the ASC 110 are self-contained and automatically established by the link application when necessary. Generally, the link application can map requests/responses to REST request/response streams to carry out predetermined transfers of file data via object transfers.
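By way of illustration only, the following minimal sketch shows how a link application might frame one file as an object transfer using an HTTP PUT. The host name, path layout, object name, and the function name build_put_request are hypothetical and are not taken from this disclosure; the sketch only constructs the request and does not send it.

```python
import urllib.request
from urllib.parse import quote


def build_put_request(base_url: str, object_name: str, payload: bytes) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT carrying one file as an object.

    The URL layout "<base_url>/<object_name>" is an assumption made for
    this sketch, not an interface defined by the disclosure.
    """
    url = f"{base_url.rstrip('/')}/{quote(object_name)}"
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


# Hypothetical endpoint and object name, for illustration only.
request = build_put_request("http://asc.example/archive", "backup-0001.dat", b"file bytes")
```

Retrieval could be framed the same way with a GET on the same hypothetical URL; sending either request would use urllib.request.urlopen.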
The ASC 110 has a container-level array storage application 120 that executes computer instructions stored in the computer memory to allocate a number of logical volumes 122 for logically arranging the file data temporarily stored in the computer memory. The logical volumes 122 are sometimes referred to herein as storage containers 122 (C1, C2, . . . Cn). The number of storage containers 122 is flexible, and will be based on the format of physical storage devices selected in the deep storage 114. Each storage container 122 is only a temporary repository for the file data during the time it is migrated to the deep storage 114.
Although these illustrative embodiments depict the use of distributed parity in the file-level striping, the contemplated embodiments are not so limited. In alternative embodiments other useful redundancy (RAID) methods can be used including dual distributed parity and no parity at all. In all such cases the term “stripe” means all the files being flushed and the corresponding parity data, and “stripe unit” means all the data in one container, whether files or parity.
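As a hedged illustration of the parity relationship among stripe units, the following sketch assumes byte-wise XOR parity over equal-length stripe units, as in conventional single-parity RAID. The disclosure does not specify the parity computation, so the XOR choice and the function names xor_parity and rebuild_missing are assumptions.

```python
from functools import reduce


def xor_parity(units: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equal-length stripe units (assumed parity scheme)."""
    if not units:
        raise ValueError("at least one stripe unit is required")
    length = len(units[0])
    if any(len(unit) != length for unit in units):
        raise ValueError("stripe units must have equal length")
    # zip(*units) yields one tuple of bytes per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover one lost stripe unit from the surviving units and the parity unit."""
    return xor_parity(surviving + [parity])


units = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = xor_parity(units)
recovered = rebuild_missing([units[0], units[2]], parity)
```

Because XOR is its own inverse, any single lost stripe unit, whether it holds files or parity, can be rebuilt from the remaining units of the same stripe.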
For purposes of continuing the description of these illustrative embodiments, the deep storage 114 (
The physical storage devices in the tape library 126 are a plurality of tape cartridges 132 grouped in magazines 134. The tape cartridges 132 can be identified by a computer control that continually tracks the position of each tape cartridge 132, both by magazine and position in the magazine. A particular tape cartridge 132 might be moved to a different position during normal operations of the tape library 126. The tape cartridges 132 can also be physically identified, such as by attaching radio frequency identification (RFID) tags or semiconductor memory devices and the like. By continuously identifying the tape cartridges 132, a selected one can be mounted into one of the tape drives 136 to transfer data to and/or retrieve data from the selected tape cartridge 132. A map module 135 logically maps the physical location of each tape cartridge 132. The logical map is used by the ASC 110 to account for the file data it stores to and retrieves from the tape library 126. In alternative embodiments the physical storage devices can be a different form, such as optical disks, optical disk cartridges, magnetic disks, optical-magnetic disks, mobile solid state memory devices, and the like.
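The logical mapping described above can be sketched, under stated assumptions, as a two-way lookup between cartridge identifiers and magazine positions. The class names Location and CartridgeMap, the identifier strings, and the slot numbering are hypothetical; the disclosure does not define the data structure used by the map module 135.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    magazine: str  # hypothetical magazine identifier
    slot: int      # hypothetical position within the magazine


class CartridgeMap:
    """Track where each tape cartridge currently sits (illustrative only)."""

    def __init__(self) -> None:
        self._by_cartridge: dict[str, Location] = {}
        self._by_location: dict[Location, str] = {}

    def place(self, cartridge_id: str, location: Location) -> None:
        """Record that a cartridge now occupies a location, replacing its old entry."""
        occupant = self._by_location.get(location)
        if occupant is not None and occupant != cartridge_id:
            raise ValueError(f"{location} is already occupied by {occupant}")
        old = self._by_cartridge.pop(cartridge_id, None)
        if old is not None:
            del self._by_location[old]
        self._by_cartridge[cartridge_id] = location
        self._by_location[location] = cartridge_id

    def locate(self, cartridge_id: str) -> Location:
        """Return the current magazine and slot of a cartridge."""
        return self._by_cartridge[cartridge_id]


cartridge_map = CartridgeMap()
cartridge_map.place("T001", Location("M1", 0))
cartridge_map.place("T001", Location("M2", 3))  # cartridge moved during operation
```

Keeping both directions of the lookup lets a move be recorded as a single update while still rejecting two cartridges in one slot.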
The tape library 126 can have a shelving system 138 for queuing the magazines 134 not presently at a tape drive 136. A transport unit 140 shuttles magazines 134 between the shelving system 138 and the drives 136, and picks and places a particular tape cartridge 132 from a shuttled magazine 134 to/from a drive 136. Although
Top-level control is provided by a central processor unit (CPU) 142 that has top-level control of all the various components and their functionalities. Data, virtual mappings, executable computer instructions, operating systems, applications, and the like are stored to a memory 144 and accessed by one or more processors in and/or under the control of the CPU 142. The CPU 142 includes macroprocessors, microprocessors, memory, and the like to logically carry out software algorithms and instructions.
As one skilled in the art recognizes, the tape library 126 depicted in
The process begins in block 152 with the ASC receiving a file that was transferred by the NAS. In block 154 the ASC assigns a current storage container Ci for storing the file most recently received in block 152. As described, the ASC begins by selecting one of the storage containers and sequentially storing files to the selected storage container until it is full, then selecting the next storage container and likewise storing files until that storage container is full, and repeating the cycle until all of the storage containers for files are full. In block 156 a determination is made as to whether storing the most recent file would overflow all of the storage containers that are allocated for files. If the determination of block 156 is “yes,” then in block 158 the most recent file is not released from the buffer for storing at this time. In block 160 a determination is made as to whether there is a smaller pending file that could be stored without overflowing the storage containers. If the determination of block 160 is “yes,” then control returns to block 152 and the smaller file is received for processing by the method.
If the determination of block 160 is “no,” then in block 162 a padding file is constructed to fill the last storage container. Parity that includes the last file is stored in block 164. All of the data files can be compressed in block 166, and then the files and parity are flushed to the destination physical storage devices in block 168, under the constraint of rules 170. The rules 170 in these illustrative embodiments implement striping with rotating parity, although the contemplated embodiments are not so limited. With the storage containers flushed empty, the new current storage container is reset to C1 in block 169.
If, contrarily, the determination of block 156 is “no,” then in block 174 a determination is made as to whether the most recent file fills all the storage containers allocated for files. That determination can be made by comparing what capacity remains after storing the most recent file with a predetermined threshold. If the remaining capacity is less than the threshold then control passes to block 162 and that portion of the process repeats as described above.
If the determination of block 174 is “no,” then in block 176 a determination is made as to whether storing the most recent file will overflow the current storage container Ci. If the determination of block 176 is “yes,” then in block 178 a first portion of the most recent file is stored to the current storage container Ci and parity for the full storage container Ci is stored in block 180. In block 182 the new current storage container, which is allocated for files, is incremented to the next storage container Ci+1. In block 184 the second portion of the most recent file is stored to the new current storage container Ci+1 and parity for Ci+1 is stored in block 186.
If, contrarily, the determination of block 176 is “no,” then in block 188 the most recent file is stored in the current storage container Ci and parity is stored in block 190. In all events, control eventually passes from one of blocks 169, 186, or 190 to block 192 for determining whether the last file has been processed. If the determination of block 192 is “no,” then control returns to block 152 for processing more files in accordance with this technology.
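The container-filling portion of the process described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed method: it tracks positive file sizes rather than file contents, omits parity and compression, pads every container that is not full at flush time rather than only the last one, and the function name containerize and its parameters are hypothetical.

```python
def containerize(file_sizes, n_containers, capacity, threshold=1):
    """Pack files into containers sequentially and return the flushed stripes.

    Each stripe is a list of containers; each container is a list of
    (label, size) pieces, where label is the file's index or "pad".
    """
    pending = list(enumerate(file_sizes))  # files waiting in the buffer
    total = n_containers * capacity
    stripes = []
    containers = [[] for _ in range(n_containers)]
    used = [0] * n_containers
    current = 0

    def flush():
        nonlocal containers, used, current
        for i in range(n_containers):
            if used[i] < capacity:  # padding fills unused space (block 162)
                containers[i].append(("pad", capacity - used[i]))
        stripes.append(containers)  # flush the stripe (block 168)
        containers = [[] for _ in range(n_containers)]
        used = [0] * n_containers
        current = 0                 # reset to the first container (block 169)

    while pending:
        free = total - sum(used)
        # Blocks 156-160: take the oldest pending file that fits; a file that
        # would overflow all containers is held while smaller ones are tried.
        idx = next((k for k, (_, size) in enumerate(pending) if size <= free), None)
        if idx is None:
            if free == total:
                raise ValueError("a pending file is larger than an entire stripe")
            flush()
            continue
        label, remaining = pending.pop(idx)
        # Blocks 176-188: store the file, splitting it at container boundaries.
        while remaining:
            room = capacity - used[current]
            if room == 0:
                current += 1        # advance to the next container (block 182)
                continue
            part = min(room, remaining)
            containers[current].append((label, part))
            used[current] += part
            remaining -= part
        # Block 174: flush when the remaining capacity falls below the threshold.
        if total - sum(used) < threshold:
            flush()
    if any(used):
        flush()                     # flush whatever is left after the last file
    return stripes


stripes = containerize([60, 50, 100, 30], n_containers=2, capacity=100)
```

In this example the second file is split across the two containers, the third file is held back because it would overflow the stripe, and the smaller fourth file is stored ahead of it before the stripe is padded and flushed.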
Embodiments of the present invention can be commercially practiced in a Black Pearl archive storage system that possesses a Spectra Logic T-Finity tape cartridge library on the backend manufactured by Spectra Logic of Boulder, Colorado.
It is to be understood that even though numerous characteristics and advantages of various embodiments of the present technology have been set forth in the foregoing description, together with the details of the structure and function of various embodiments of the invention, this disclosure is illustrative only, and changes may be made in detail, especially in matters of structure and arrangement of parts within the principles of the present technology to the full extent indicated by the broad general meaning of the terms in which the appended claims are expressed. For example, the disclosed technology can be employed across multiple library partitions, while still maintaining substantially the same functionality and without departing from the scope of the claimed technology. Further, though communication is described herein as between an ASC and a tape library, communication can be received directly by a tape drive, via the interface device 154, for example, without departing from the scope of the claimed technology. Further, although the preferred embodiments described herein are directed to tape library systems, and related technology, it will be appreciated by those skilled in the art that the claimed technology can be applied to other physical storage systems, such as storage drive arrays, without departing from the scope of the claimed technology.
It will be clear that the claimed technology is well adapted to attain the ends and advantages mentioned as well as those inherent therein. While presently preferred embodiments have been described for purposes of this disclosure, numerous changes may be made which readily suggest themselves to those skilled in the art and which are encompassed in the spirit of the claimed technology disclosed and as defined in the appended claims.
It is to be understood that even though numerous characteristics and advantages of various aspects have been set forth in the foregoing description, together with details of the structure and function, this disclosure is illustrative only, and changes may be made in detail, especially in matters of structure and arrangement to the full extent indicated by the broad general meaning of the terms in which the appended claims are expressed.