In a file system, a file may be retained such that the file is stored for a period of time. While under retention, the file system may perform validation scans on the retained file to ensure that the integrity of the data contained in the file has not been compromised.
Certain exemplary embodiments are described in the following detailed description and in reference to the drawings, in which:
The present disclosure is generally related to file retention in a file system. When a file in a file system is retained, the file system may perform validation scans to check the integrity of the data contained in the file. In some examples, the file system may check each stored file, one by one, to determine whether the file is in retention. If the file is in retention, the file system performs a validation scan; otherwise, the file system can skip the file. Because every file must be visited, this process may be time-consuming and cumbersome, particularly in file systems that store many files.
Described herein is a method to reduce the amount of time and resources used for validation scans. When a file undergoes retention, a retention event can be recorded in a journal. The unique identifier and location information of the retained file can be stored in a database. A hash generator can generate a hash of the retained file. A hash, as described herein, is a datum used to represent the data content of the retained file. The hash may be a checksum, for example. The hash can be recorded in the database and associated with the unique identifier and the location information of the retained file. During a validation scan, the information recorded in the database can be used to determine which files in the file system are under retention, eliminating the need to check the retention state of each file one by one. Thus, the file system can query the database to select the retained files to scan. The described method is faster, and can allow multiple retained files to be validated in parallel, reducing the total time required to perform multiple validation scans.
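To make the difference concrete, the following minimal Python sketch contrasts the two ways of selecting files for a validation scan: walking every file and checking its retention state, versus querying a table that holds rows only for retained files. The in-memory file list, the `retained_files` table, and its columns are hypothetical stand-ins chosen for illustration; SQLite (Python's standard `sqlite3` module) is used only as a convenient queryable store.

```python
import sqlite3

# Hypothetical stand-in for the file system: (file_id, path, is_retained).
FILES = [
    (1, "/fs/a.txt", False),
    (2, "/fs/b.txt", True),
    (3, "/fs/c.txt", False),
    (4, "/fs/d.txt", True),
]


def retained_by_walking(files):
    """Baseline: check the retention state of every file, one by one."""
    return [path for _fid, path, retained in files if retained]


def retained_by_query(conn):
    """Database approach: ask the table which files are retained."""
    rows = conn.execute("SELECT path FROM retained_files ORDER BY file_id")
    return [path for (path,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retained_files (file_id INTEGER PRIMARY KEY, path TEXT)")
# Rows are written only when a file enters retention, so the table never
# contains non-retained files.
for fid, path, retained in FILES:
    if retained:
        conn.execute("INSERT INTO retained_files VALUES (?, ?)", (fid, path))

print(retained_by_walking(FILES))  # ['/fs/b.txt', '/fs/d.txt']
print(retained_by_query(conn))     # ['/fs/b.txt', '/fs/d.txt']
```

Both calls select the same files; the difference is that the query only reads rows for retained files, so its cost tracks the number of retained files rather than the total number of files, assuming the table is kept current at retention time.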
The processor 102 may be connected through a system bus 104 (e.g., AMBA®, PCI®, PCI Express®, HyperTransport®, Serial ATA, among others) to an input/output (I/O) device interface 106 adapted to connect the computing system 100 to one or more I/O devices 108. The I/O devices 108 may include, for example, a keyboard and a pointing device, wherein the pointing device may include a touchpad or a touchscreen, among others. The I/O devices 108 may be built-in components of the computing system 100, or may be devices that are externally connected to the computing system 100.
The processor 102 may also be linked through the system bus 104 to a display device interface 110 adapted to connect the computing system 100 to display devices 112. The display devices 112 may include a display screen that is a built-in component of the computing system 100. The display devices 112 may also include computer monitors, televisions, or projectors, among others that are externally connected to the computing system 100.
The processor 102 may also be linked through the system bus 104 to a memory device 114. In some examples, the memory device 114 can include random access memory (e.g., SRAM, DRAM, eDRAM, EDO RAM, DDR RAM, RRAM®, PRAM, among others), read only memory (e.g., Mask ROM, EPROM, EEPROM, among others), non-volatile memory (e.g., PCM, STT-MRAM, ReRAM, memristor, among others), or any other suitable memory systems.
The processor 102 may also be linked through the system bus 104 to a storage device 116. The storage device 116 may contain one or more files 118 in a file system. The file 118 may be a document, application, media, or any other virtual item that can be stored.
A retention module 120 in the storage device can include instructions to direct the processor 102 to retain a file 118 such that the file 118 can become read-only. The retention module 120 can place the file in a retention state. The retention module 120 can store information pertaining to the retained file 118 in a database. The retention module 120 can generate a hash to represent the contents of the file, and store the hash such that the hash is associated with the information pertaining to the retained file 118.
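The read-only aspect of retention can be sketched with ordinary POSIX-style permission bits. This is an assumption for illustration only: the helper names `make_read_only` and `is_read_only` are hypothetical, and a production retention module would more likely rely on file-system-level WORM enforcement, since a file's owner can restore permission bits.

```python
import os
import stat
import tempfile

# All write-permission bits (owner, group, others).
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(path):
    """Hypothetical helper: clear every write bit so the file becomes read-only.

    Permission bits are only one possible mechanism; a real retention module
    may rely on file-system-level WORM enforcement instead.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)


def is_read_only(path):
    """Return True if no write bit is set on the file."""
    return (stat.S_IMODE(os.stat(path).st_mode) & WRITE_BITS) == 0


# Usage: create a file, retain it, then check its state.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"signed contract")
    retained_path = f.name

writable_before = not is_read_only(retained_path)
make_read_only(retained_path)
read_only_after = is_read_only(retained_path)
print(writable_before, read_only_after)  # True True

# Clean-up for the example: restore owner access, then delete.
os.chmod(retained_path, stat.S_IRUSR | stat.S_IWUSR)
os.remove(retained_path)
```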
A validation module 122 in the storage device can include instructions to direct the processor 102 to perform a validation scan on the retained file 118. The validation module 122 can scan the database to quickly determine which of the files 118 in the file system of the storage device 116 have been retained. The validation module 122 can retrieve the stored hash associated with the retained file 118. The validation module 122 can generate a new hash, referred to herein as a validation hash, of the retained file 118 in its current state. The validation module 122 can compare the validation hash to the stored hash to determine whether the retained file 118 has been altered while in the retention state. The validation module 122 can update the database entry of the retained file 118 with the results of the comparison.
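The core comparison performed by the validation module can be sketched as follows, assuming SHA-256 (via Python's `hashlib`) as the hash and a file small enough to read in one call. The function names `file_sha256` and `validate_retained_file` are hypothetical; `hmac.compare_digest` is used for the comparison, although a plain equality check would give the same result here.

```python
import hashlib
import hmac
import os
import tempfile
from pathlib import Path


def file_sha256(path) -> str:
    """Hash the file's current bytes (assumes the file fits in memory)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def validate_retained_file(path, stored_hash: str) -> bool:
    """Return True if the validation hash matches the hash stored at retention."""
    validation_hash = file_sha256(path)
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(validation_hash, stored_hash)


# Usage: record a hash at retention time, then simulate tampering.
fd, path = tempfile.mkstemp()
os.close(fd)
Path(path).write_bytes(b"quarterly report v1")
stored_hash = file_sha256(path)
ok_before = validate_retained_file(path, stored_hash)
Path(path).write_bytes(b"quarterly report v2")  # simulated alteration
ok_after = validate_retained_file(path, stored_hash)
os.remove(path)
print(ok_before, ok_after)  # True False
```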
At block 204, following the WORM transition 202, the WORM event can be written into a journal 206. The journal 206 may be a collection of files that can be made available to all user-mode processes involving the file system. The journal 206 can also provide a record of updates made to each file in the file system. At block 208, the journal 206 is scanned and the WORM event is picked up. Identification and location information regarding the retained file, such as the file's unique ID, segment ID, and path name, can be determined. At block 210, a hash of the content and metadata of the file can be generated. The file identification and location information, along with the generated hash, can be stored in an entry of a pipelined database 212, such that the hash is associated with the file identification and location information. At block 214, information stored in the pipelined database 212 can be made available for query and retrieval by the computer's reporting systems, as well as by future scans.
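The journal scan at block 208 might look like the following sketch. The journal format shown (one JSON object per line) and its field names `event`, `file_id`, `segment_id`, and `path` are assumptions made for illustration; the description does not specify how journal entries are encoded.

```python
import json

# Hypothetical journal contents: one JSON object per line.
JOURNAL_LINES = [
    '{"event": "WRITE", "file_id": 7, "segment_id": 1, "path": "/fs/tmp.log"}',
    '{"event": "WORM", "file_id": 42, "segment_id": 3, "path": "/fs/contracts/2013.pdf"}',
    '{"event": "WORM", "file_id": 43, "segment_id": 3, "path": "/fs/contracts/2014.pdf"}',
]


def pick_up_worm_events(lines):
    """Yield (file_id, segment_id, path) for each WORM event in the journal."""
    for line in lines:
        record = json.loads(line)
        # Ignore ordinary updates; only retention transitions are of interest.
        if record.get("event") == "WORM":
            yield record["file_id"], record["segment_id"], record["path"]


for entry in pick_up_worm_events(JOURNAL_LINES):
    print(entry)
# (42, 3, '/fs/contracts/2013.pdf')
# (43, 3, '/fs/contracts/2014.pdf')
```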
A validation scan 216 may be initiated either by a user or according to a schedule on the computer. The processor of the computer can run a query on the pipelined database 212 to determine which files are under retention. The validation scan 216 can generate a new hash of each retained file under scan and compare the new hash to the hash associated with that file in the pipelined database 212. The results of the validation scan 216 can be recorded in the journal 206 and the pipelined database 212.
At block 302, the processor accesses a file in the storage device. The file may be part of a file system. At block 304, the processor places the file in a retention state. Retention can allow the file to be stored for a set period of time. In some examples, the file undergoes a write-once-read-many (WORM) transition. In some examples, the retention event is recorded in a journal.
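A retention state with a set period, together with the journal entry recorded for the WORM transition, can be modeled as below. The `RetentionState` class, the `place_in_retention` function, and the JSON journal entry are hypothetical; the sketch only illustrates that a retention period has a start and an end, and that the transition is appended to an append-only log.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RetentionState:
    """Hypothetical record of a file's retention period."""
    file_id: int
    retained_at: datetime
    retain_until: datetime

    def is_active(self, now: datetime) -> bool:
        # Under retention from retained_at up to, but not including, retain_until.
        return self.retained_at <= now < self.retain_until


def place_in_retention(file_id, period, journal, now=None):
    """Retain a file for `period` and append a WORM event to `journal`."""
    if now is None:
        now = datetime.now(timezone.utc)
    state = RetentionState(file_id, now, now + period)
    journal.append(json.dumps({
        "event": "WORM",
        "file_id": file_id,
        "retained_at": now.isoformat(),
        "retain_until": state.retain_until.isoformat(),
    }))
    return state


journal = []
t0 = datetime(2013, 12, 6, tzinfo=timezone.utc)
state = place_in_retention(42, timedelta(days=365), journal, now=t0)
print(state.is_active(t0 + timedelta(days=30)))   # True
print(state.is_active(t0 + timedelta(days=400)))  # False
print(json.loads(journal[0])["event"])           # WORM
```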
At block 306, the processor stores the file's information in a database. The file's information can include a file ID, a segment ID, and a path name. The file ID is a unique identifier for the file. The segment ID indicates which segment of the file system the retained file resides on. In some examples, the file's information can be entered in a queryable table of a pipelined database.
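One possible layout for the queryable table is sketched below using SQLite. The table name `retained_files`, its column names, and the helper `store_file_info` are assumptions; the `hash` and `last_result` columns are left empty at this step and would be filled in when the hash is generated and when validation scans run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per retained file.
conn.execute("""
    CREATE TABLE retained_files (
        file_id     INTEGER PRIMARY KEY,  -- unique identifier of the file
        segment_id  INTEGER NOT NULL,     -- file-system segment holding the file
        path        TEXT    NOT NULL,     -- path name
        hash        TEXT,                 -- filled in when the hash is generated
        last_result TEXT                  -- filled in by validation scans
    )
""")


def store_file_info(conn, file_id, segment_id, path):
    """Insert the retained file's identification and location information."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO retained_files (file_id, segment_id, path) VALUES (?, ?, ?)",
            (file_id, segment_id, path),
        )


store_file_info(conn, 42, 3, "/fs/contracts/2013.pdf")
rows = conn.execute(
    "SELECT file_id, segment_id, path FROM retained_files WHERE segment_id = 3"
).fetchall()
print(rows)  # [(42, 3, '/fs/contracts/2013.pdf')]
```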
At block 308, the processor generates a hash of the file's content. The hash may be a small datum, computed from the retained file's content, that represents that content. For example, the hash may be a checksum of the retained file. In some examples, a hash of the file's metadata can also be generated.
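The hashing at block 308 can be sketched with Python's `hashlib`. SHA-256 is used here as one concrete choice; a simpler checksum such as `zlib.crc32` would also fit the description, at the cost of weaker tamper detection. Reading the file in fixed-size chunks keeps memory use constant for large files. Which metadata fields go into the metadata hash is not specified, so the choice of size and mode below, like the helper names, is an assumption.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # bytes read per iteration


def content_hash(path) -> str:
    """SHA-256 of the file's bytes, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def metadata_hash(path) -> str:
    """SHA-256 over selected metadata (assumed fields: size and mode)."""
    st = os.stat(path)
    return hashlib.sha256(f"{st.st_size}:{st.st_mode}".encode()).hexdigest()


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "report.txt")
    with open(p, "wb") as f:
        f.write(b"hello")
    c_hash = content_hash(p)
    m_hash = metadata_hash(p)

print(c_hash == hashlib.sha256(b"hello").hexdigest())  # True
```

If the metadata hash includes permission bits, it should be taken after the file has been made read-only; otherwise the retention step itself would change the metadata and cause a false mismatch later.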
At block 310, the processor stores the hash into the database with the file's information. The hash can be stored in the same table as the file's information, such that the hash is associated with the file's information. When queried, the database can provide the hash along with the information pertaining to the file. In some examples, the stored hash can be used for several other applications beyond validation scans.
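Keeping the hash in the same row as the file's information means a single query returns both, as in this sketch. The schema, the placeholder digest `d1e8a70b`, and the helper names `store_hash` and `lookup` are hypothetical; raising an error when no row matches is one assumed way to avoid silently losing a hash for a file whose information was never recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE retained_files "
    "(file_id INTEGER PRIMARY KEY, segment_id INTEGER, path TEXT, hash TEXT)"
)
conn.execute(
    "INSERT INTO retained_files (file_id, segment_id, path) "
    "VALUES (42, 3, '/fs/contracts/2013.pdf')"
)


def store_hash(conn, file_id, digest):
    """Attach the hash to the file's existing row, keeping them associated."""
    with conn:
        cur = conn.execute(
            "UPDATE retained_files SET hash = ? WHERE file_id = ?", (digest, file_id)
        )
    if cur.rowcount != 1:
        raise KeyError(f"no retained file with id {file_id}")


def lookup(conn, file_id):
    """Return the file's information together with its stored hash."""
    return conn.execute(
        "SELECT file_id, segment_id, path, hash FROM retained_files WHERE file_id = ?",
        (file_id,),
    ).fetchone()


store_hash(conn, 42, "d1e8a70b")  # placeholder digest for illustration
print(lookup(conn, 42))  # (42, 3, '/fs/contracts/2013.pdf', 'd1e8a70b')
```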
At block 402, the processor scans a database for a stored hash associated with a retained file. The processor can run a query on the database to see which files in a file system are associated with a hash. The processor can retrieve a file path corresponding to a retained file with a stored hash.
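The query at block 402 can be sketched as a single `SELECT` over the same kind of hypothetical `retained_files` table, returning each file's path and stored hash. Excluding rows whose hash has not yet been generated is an assumed policy for files whose information has been stored but not yet hashed; the sample rows and digests are placeholders.

```python
import sqlite3


def retained_files_to_scan(conn):
    """Return (file_id, path, stored_hash) for every file with a stored hash."""
    return conn.execute(
        "SELECT file_id, path, hash FROM retained_files "
        "WHERE hash IS NOT NULL ORDER BY file_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retained_files (file_id INTEGER PRIMARY KEY, path TEXT, hash TEXT)")
conn.executemany("INSERT INTO retained_files VALUES (?, ?, ?)", [
    (42, "/fs/contracts/2013.pdf", "aa11"),
    (43, "/fs/contracts/2014.pdf", None),  # information stored, hash not yet generated
    (44, "/fs/contracts/2015.pdf", "cc33"),
])
print(retained_files_to_scan(conn))
# [(42, '/fs/contracts/2013.pdf', 'aa11'), (44, '/fs/contracts/2015.pdf', 'cc33')]
```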
At block 404, the processor generates a validation hash of the retained file. At block 406, the processor compares the validation hash to the stored hash. For a plurality of retained files, a plurality of validation hashes and subsequent comparisons to stored hashes may be performed simultaneously. In other words, multiple validation scans can be performed in parallel.
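Because each retained file is validated independently, the comparisons can be distributed over a thread pool, as in this minimal sketch using `concurrent.futures.ThreadPoolExecutor`. The `(file_id, path, stored_hash)` tuple format, the helper names, and the default of four workers are assumptions. Threads are a reasonable fit because the work is largely file I/O, and CPython's `hashlib` releases the global interpreter lock while hashing large buffers; a process pool would be an alternative for CPU-bound hashing.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def sha256_file(path):
    """Chunked SHA-256 of a file's current content."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def validate(entry):
    """entry is (file_id, path, stored_hash); returns (file_id, matched)."""
    file_id, path, stored_hash = entry
    return file_id, sha256_file(path) == stored_hash


def validate_all(entries, max_workers=4):
    """Validate many retained files concurrently; returns {file_id: matched}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(validate, entries))


with tempfile.TemporaryDirectory() as d:
    entries = []
    for i in range(1, 4):
        p = os.path.join(d, f"file{i}.bin")
        with open(p, "wb") as f:
            f.write(f"content {i}".encode())
        entries.append((i, p, sha256_file(p)))  # hash stored at retention
    with open(entries[1][1], "ab") as f:
        f.write(b" tampered")  # simulate an alteration to file 2
    results = validate_all(entries)

print(results)  # {1: True, 2: False, 3: True}
```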
At block 408, the processor stores the results of the comparison in the database. The results can be entered into the same table as the stored hash and the information pertaining to the retained file. The information stored in the table can be made available to the computing system's reporting tools. In some examples, the journal is also updated with the results of the comparison. Thus, the journal can provide a log of the file's retention history.
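Recording each result in both places gives two views: the table holds the latest outcome for reporting, while the journal accumulates every outcome as history. The sketch below assumes hypothetical `last_result` and `last_scanned` columns, the labels `intact` and `altered`, a JSON journal entry format, and a helper named `record_result`.

```python
import json
import sqlite3
from datetime import datetime, timezone


def record_result(conn, journal, file_id, matched, now=None):
    """Store the latest result in the table and append it to the journal."""
    if now is None:
        now = datetime.now(timezone.utc)
    result = "intact" if matched else "altered"
    with conn:
        # The table keeps only the most recent outcome per file.
        conn.execute(
            "UPDATE retained_files SET last_result = ?, last_scanned = ? WHERE file_id = ?",
            (result, now.isoformat(), file_id),
        )
    # The journal keeps every outcome, forming a history log.
    journal.append(json.dumps(
        {"event": "VALIDATION", "file_id": file_id, "result": result, "at": now.isoformat()}
    ))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE retained_files (file_id INTEGER PRIMARY KEY, path TEXT, "
    "hash TEXT, last_result TEXT, last_scanned TEXT)"
)
conn.execute(
    "INSERT INTO retained_files (file_id, path, hash) "
    "VALUES (42, '/fs/contracts/2013.pdf', 'aa11')"
)
journal = []
t = datetime(2014, 1, 1, tzinfo=timezone.utc)
record_result(conn, journal, 42, True, now=t)
record_result(conn, journal, 42, False, now=t.replace(month=2))

print(conn.execute("SELECT last_result FROM retained_files WHERE file_id = 42").fetchone())  # ('altered',)
print([json.loads(e)["result"] for e in journal])  # ['intact', 'altered']
```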
As shown in
The block diagram of
As shown in
The block diagram of
While the present techniques may be susceptible to various modifications and alternative forms, the examples discussed above have been shown only by way of example. It is to be understood that the techniques are not intended to be limited to the particular examples disclosed herein. Indeed, the present techniques include all alternatives, modifications, and equivalents falling within the true spirit and scope of the appended claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2013/073616 | 12/6/2013 | WO | 00 |