1. Field
The present disclosure relates generally to a method, apparatus, system, and computer readable media for detecting and correcting errors in append-only datastores, and more particularly for representing append-only error detection and correction both on-disk and in-memory.
2. Background
Traditional datastores and databases are designed with log files and paged data and index files. Traditional designs store operations and data in log files and then move this information to paged database files, e.g., by reprocessing the operations and data. This approach has many weaknesses or drawbacks, such as the need for extensive error detection and correction when paged files are updated in place, the storage and movement of redundant information and the disk seek bound nature of in-place page updates.
In light of the above described problems and unmet needs as well as others, systems and methods are presented for providing transaction consistent error detection and correction both in-memory and on-disk. This is accomplished using the novel properties of append-only files to greatly simplify the error detection and correction process.
For example, aspects of the present invention provide advantages such as instantaneous shut down and start up times, asynchronous index creation and updates, greatly simplified transaction consistent error detection and correction including transaction roll-back and efficient use of storage resources by eliminating traditional logging and page files containing redundant information and replacing them with append-only transaction end state files and associated index files.
Additional advantages and novel features of these aspects of the invention will be set forth in part in the description that follows, and in part will become more apparent to those skilled in the art upon examination of the following or upon learning by practice thereof.
Various aspects of the systems and methods will be described in detail, with reference to the following figures, wherein:
Location requests in accordance with aspects of the present invention.
Record location request in accordance with aspects of the present invention.
These and other features and advantages in accordance with aspects of this invention are described in, or will become apparent from, the following detailed description of various example illustrations and implementations.
The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. In some instances, well-known structures and components are shown in block diagram form in order to avoid obscuring such concepts.
Several aspects of systems capable of providing representations of transactions for both disk and memory, in accordance with aspects of the present invention, will now be presented with reference to various apparatuses and methods. These apparatuses and methods will be described in the following detailed description and illustrated in the accompanying drawings by various blocks, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). These elements may be implemented using electronic hardware, computer software, or any combination thereof. Whether such elements are implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system.
By way of example, an element, or any portion of an element, or any combination of elements may be implemented using a “processing system” that includes one or more processors. Examples of processors include microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure. One or more processors in the processing system may execute software. Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
Accordingly, in one or more example illustrations, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or encoded as one or more instructions or code on a computer-readable medium. Computer-readable media includes computer storage media. Storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise random-access memory (RAM), read-only memory (ROM), Electrically Erasable Programmable ROM (EEPROM), compact disk (CD) ROM (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, includes CD, laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Computer system 100 includes one or more processors, such as processor 104. The processor 104 is connected to a communication infrastructure 106 (e.g., a communications bus, cross-over bar, or network). Various software implementations are described in terms of this example computer system. After reading this description, it will become apparent to a person skilled in the relevant art(s) how to implement aspects of the invention using other computer systems and/or architectures.
Computer system 100 can include a display interface 102 that forwards graphics, text, and other data from the communication infrastructure 106 (or from a frame buffer not shown) for display on a display unit 130. Computer system 100 also includes a main memory 108, preferably RAM, and may also include a secondary memory 110. The secondary memory 110 may include, for example, a hard disk drive 112 and/or a removable storage drive 114, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. The removable storage drive 114 reads from and/or writes to a removable storage unit 118 in a well-known manner. Removable storage unit 118 represents a floppy disk, magnetic tape, optical disk, etc., which is read by and written to by removable storage drive 114. As will be appreciated, the removable storage unit 118 includes a computer usable storage medium having stored therein computer software and/or data.
In alternative implementations, secondary memory 110 may include other similar devices for allowing computer programs or other instructions to be loaded into computer system 100. Such devices may include, for example, a removable storage unit 122 and an interface 120. Examples of such may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or programmable read only memory (PROM)) and associated socket, and other removable storage units 122 and interfaces 120, which allow software and data to be transferred from the removable storage unit 122 to computer system 100.
Computer system 100 may also include a communications interface 124. Communications interface 124 allows software and data to be transferred between computer system 100 and external devices. Examples of communications interface 124 may include a modem, a network interface (such as an Ethernet card), a communications port, a Personal Computer Memory Card International Association (PCMCIA) slot and card, etc. Software and data transferred via communications interface 124 are in the form of signals 128, which may be electronic, electromagnetic, optical or other signals capable of being received by communications interface 124. These signals 128 are provided to communications interface 124 via a communications path (e.g., channel) 126. This path 126 carries signals 128 and may be implemented using wire or cable, fiber optics, a telephone line, a cellular link, a radio frequency (RF) link and/or other communications channels. In this document, the terms “computer program medium” and “computer usable medium” are used to refer generally to media such as a removable storage drive 114, a hard disk installed in hard disk drive 112, and signals 128. These computer program products provide software to the computer system 100. Aspects of the invention are directed to such computer program products.
Computer programs (also referred to as computer control logic) are stored in main memory 108 and/or secondary memory 110. Computer programs may also be received via communications interface 124. Such computer programs, when executed, enable the computer system 100 to perform the features in accordance with aspects of the present invention, as discussed herein. In particular, the computer programs, when executed, enable the processor 104 to perform various features. Accordingly, such computer programs represent controllers of the computer system 100.
In an implementation where aspects of the invention are implemented using software, the software may be stored in a computer program product and loaded into computer system 100 using removable storage drive 114, hard disk drive 112, or communications interface 124. The control logic (software), when executed by the processor 104, causes the processor 104 to perform various functions as described herein. In another implementation, aspects of the invention are implemented primarily in hardware using, for example, hardware components, such as application specific integrated circuits (ASICs). Implementation of the hardware state machine so as to perform the functions described herein will be apparent to persons skilled in the relevant art(s).
In yet another implementation, aspects of the invention are implemented using a combination of both hardware and software.
Transaction log, transaction state log, index files and schema may be written to disk in append-only mode, greatly increasing reliability and durability. However, greatly increased reliability and durability do not eliminate errors such as incomplete or inconsistent writes to disk. They also do not eliminate the possibility of data corruption errors within a file itself (e.g., a file is overwritten or corrupted by an external mechanism).
Thus, potential errors include three major classes of errors:
Aspects presented herein address potential sources of error, including each of these three classes of errors while simultaneously speeding startup and shutdown times.
Files may need to be closed and/or archived and error correction and detection must take these events into consideration and converge quickly when they are detected. In this case, new Real Time Key Logging (LRT) files, Real Time Value Logging (VRT) files, and Real Time Key Tree Indexing (IRT) files can be created, and new entries may be written to these new files. An LRT file may be used to provide key logging and indexing for a VRT file. An IRT file may be used to provide an ordered index of VRT files. LRT, VRT, and IRT files are described in more detail in U.S. Utility application Ser. No. 13/781,339, filed on Feb. 28, 2013, titled “Method and System for Append-Only Storage and Retrieval of Information,” the entire contents of which are incorporated herein by reference. Performing error detection and correction requires an understanding of the type of files being processed and how the files are organized in storage, e.g., how the on-disk transaction log, state log, index and schema files are organized. An example logical illustration of file layout and indexing with an LRT file, VRT file, and IRT file is shown in
At 304, error detection and correction is begun, the error detection and correction involving at least one datastore based on the user or agent input. The datastore involved in the error detection and correction may have its associated transaction log checked for errors, as at 312. In addition, detected transaction log errors may be corrected at 314. Additional aspects of such detection and correction of transaction log errors are described in additional detail in connection with
Affected state logs may be identified at 306. State log error detection may be performed at 316. State log error correction may be performed at 318. Additional aspects of such detection and correction of state log errors are described in additional detail in connection with
Affected index files may be identified at 308. Index file error detection may be performed at 320. Index file error correction may be performed at 322. Additional aspects of such detection and correction of index file errors are described in additional detail in connection with
The error detection and correction process is ended at 310, and the state of the error correction process is written to the transaction log in an append-only manner at 310, wherein the state comprises error detection and correction flags.
All LRT, VRT, and IRT files include a header. This header may contain errors. Thus, it may be beneficial to check a header for errors before additional error detection and correction is performed. Error detection may include checking for a properly formed header of the appropriate length and checking the header's Cyclic Redundancy Check (CRC), if one is present.
When the header CRC is to be checked, a determination of header validity may be made at 416. If the header CRC is not valid, as determined at 416, an invalid CRC error code is returned at 418.
When the header CRC is not checked, as determined at 414, or when the header CRC is determined to be valid at 416, a determination as to the validity of header contents is made at 420. If the header contents are not valid as determined at 420 an invalid header content error code is returned at 422, otherwise the header length is returned at 424.
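The header check described above can be illustrated with a minimal sketch. The header layout (a 4-byte length, a 4-byte content marker, and a CRC-32 over the first eight bytes), the marker value, the numeric error codes, and the function names are assumptions made for illustration only; the description does not specify them.

```python
import struct
import zlib

# Hypothetical error codes; the description names the conditions, not the values.
ERR_LENGTH_DECODE = -1
ERR_HEADER_GT_FILE = -2
ERR_INVALID_CRC = -3
ERR_INVALID_CONTENT = -4

MAGIC = b"LRT1"  # assumed content marker


def build_header() -> bytes:
    """Build a header in the assumed layout: length, marker, CRC-32."""
    body = struct.pack("<I", 12) + MAGIC
    return body + struct.pack("<I", zlib.crc32(body))


def check_header(data: bytes, check_crc: bool = True) -> int:
    """Return the header length, or a negative error code."""
    if len(data) < 4:
        return ERR_LENGTH_DECODE          # length field cannot be decoded
    (header_len,) = struct.unpack_from("<I", data, 0)
    if header_len > len(data):
        return ERR_HEADER_GT_FILE         # header claims more bytes than exist
    if header_len < 8:
        return ERR_LENGTH_DECODE          # too short to hold the assumed fields
    if check_crc:
        (stored,) = struct.unpack_from("<I", data, header_len - 4)
        if zlib.crc32(data[:header_len - 4]) != stored:
            return ERR_INVALID_CRC
    if data[4:8] != MAGIC:
        return ERR_INVALID_CONTENT
    return header_len
```

As in the flow described above, the CRC check is optional and the content check runs whether or not the CRC was examined.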
When a header has been determined to include an error(s) such header errors may be corrected, e.g., by header regeneration.
When there are header errors to correct, as determined at 506, the Error Code is checked at 508 to determine whether the header length is greater than the file length. If so, the header is regenerated at 522 and the regenerated header's length is returned at 524.
If the Error Code, as determined at 508, is not header length greater than file length, the Error Code is checked to determine if there is an invalid CRC in 510. If the error is an invalid CRC, as determined at 510, the header is regenerated in 522. Then, the regenerated header's length is returned in 524.
When the Error Code is not an invalid CRC, as determined at 510, the Error Code is checked in 512 to determine whether there was a header length decode failure. If there was no header length decode failure, an error condition is returned at 518. If there was a header length decode failure, as determined at 512, the file is decoded from its end in 514. If the file could not be decoded, as determined at 516, an error condition is returned at 518. If the file could be decoded, as determined at 516, the header length is captured at 520, the header is regenerated in 522 and the regenerated header's length is returned in 524.
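The error-code dispatch described above can be sketched as follows. The numeric codes and the two callbacks, `regenerate` and `decode_from_end`, are hypothetical placeholders: the description states that the header is regenerated and that the file may be decoded from its end, but not how either step is carried out.

```python
from typing import Callable, Optional

# Hypothetical error codes, chosen only for this sketch.
ERR_LENGTH_DECODE = -1
ERR_HEADER_GT_FILE = -2
ERR_INVALID_CRC = -3
ERR_UNCORRECTABLE = -99


def correct_header(
    error_code: int,
    regenerate: Callable[[Optional[int]], int],
    decode_from_end: Callable[[], Optional[int]],
) -> int:
    """Return the regenerated header length, or ERR_UNCORRECTABLE."""
    if error_code in (ERR_HEADER_GT_FILE, ERR_INVALID_CRC):
        # Header is too long for the file or fails its CRC: regenerate it.
        return regenerate(None)
    if error_code == ERR_LENGTH_DECODE:
        # Recover the header length by decoding the file from its end.
        captured = decode_from_end()
        if captured is None:
            return ERR_UNCORRECTABLE
        return regenerate(captured)
    # Any other error code cannot be corrected by regeneration.
    return ERR_UNCORRECTABLE
```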
If the header errors are corrected in 610, as determined by 612, the header length is returned at 608. Otherwise the errors could not be corrected and an error indication is returned at 614.
Entries in LRT/VRT file pairs are written in lock-step. Each LRT entry maps to one, and only one, VRT entry. If the number of LRT entries is not equal to the number of VRT entries, or if the last entry in either or both files is not complete, there has been an incomplete record error.
Correcting the incomplete record error includes, e.g., its erasure from both the LRT and VRT files. Record erasure includes, e.g., the detection of the last complete record in each file. Once detected, the shortest run length of complete records from both files is chosen and all records after that point (complete or incomplete) are erased from both the LRT and VRT files. This process results in files with equal numbers of complete entries.
Record erasure may be performed by physically removing records from the file or by marking records as erased. Marking records as erased preserves the append-only invariant and provides a persistent indication of error detection and correction.
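By way of illustration only, the lock-step correction can be sketched for the simplest case, in which both files hold fixed size records and erasure is performed by physical removal. The function names and the in-memory `bytearray` representation are assumptions of this sketch; marking records as erased is not shown.

```python
def complete_records(file_len: int, header_len: int, record_size: int) -> int:
    """Number of whole records that follow the header."""
    return max(0, file_len - header_len) // record_size


def truncate_to_common(
    lrt: bytearray, vrt: bytearray,
    lrt_header: int, vrt_header: int,
    lrt_record: int, vrt_record: int,
) -> int:
    """Erase trailing records so both files hold the same number of
    complete entries; return that number."""
    keep = min(
        complete_records(len(lrt), lrt_header, lrt_record),
        complete_records(len(vrt), vrt_header, vrt_record),
    )
    # Everything after the shortest run of complete records is removed,
    # whether the removed records were complete or not.
    del lrt[lrt_header + keep * lrt_record:]
    del vrt[vrt_header + keep * vrt_record:]
    return keep
```

For example, an LRT file with three complete entries and a partial fourth, paired with a VRT file holding two complete entries, is reduced to two entries in each file.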
When group operations are being used (e.g. transactions or defragmentation) detecting and erasing incomplete records is necessary but may not be sufficient. In this case, incomplete groups must also be erased. Incomplete groups are deleted by scanning backwards from the last complete record until either a group end or a group start flag is reached (checking for group end first).
If a group end flag is reached this implies the records that followed were not part of a group operation. In this case the search stops and all records after the group end are erased. It is assumed records falling outside group boundaries are non-transactional and thus may be erased without data loss.
If a group start flag is reached this implies a group operation has lost its end and is incomplete. In this case all records after and including the group start flag are erased. This operation will result in data loss if the group operation represented a transaction. However, if the group operation represents a defragmentation operation, and the data contained within the defragmentation operation is still present in the LRT/VRT files (e.g. it is redundant information), there is no data loss.
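The backward scan for group boundaries can be sketched as follows, assuming each complete record exposes an integer flags field. The flag values are hypothetical, as is the choice to keep all records when no group flag is found, a case the description does not address.

```python
from typing import List

# Hypothetical flag bits for this sketch.
GROUP_START = 0x1
GROUP_END = 0x2


def records_to_keep(flags: List[int]) -> int:
    """Scan backwards from the last complete record and return how many
    leading records survive incomplete-group erasure."""
    for i in range(len(flags) - 1, -1, -1):
        if flags[i] & GROUP_END:      # group end is checked first
            return i + 1              # erase everything after the group end
        if flags[i] & GROUP_START:
            return i                  # erase the group start and all after it
    return len(flags)                 # assumption: no group flags, keep all
```

With flags `[START, 0, END, START, 0]` the function returns 3: the trailing group lost its end, so its start record and everything after it are erased.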
Erasing records at the end of LRT/VRT files reduces the virtual length of those files. This new length provides an upper bound on value position encoded in IRT files. All changes after the last valid value position are rolled back by erasing all IRT segments containing value positions after the last value position indicated by the corrected VRT virtual file length. Once all error correction is performed the virtual file lengths are set to the real file lengths.
When there are LRT records a determination is made as to whether the records are a fixed size at 706 and if the records are determined to be a fixed size, fixed size processing is performed to determine the last valid record location in 708. At 710 a determination is made as to whether the last valid record location that was calculated in 708 is less than the header size. If the last valid record location is determined to be less than the header size, zero (indicating an error) is returned at 716. When no error is present, as determined at 710, the last valid record location is returned at 712.
If LRT entries are not a fixed size, as determined at 706, a determination is then made as to whether the file should be scanned from its end at 714. If it is determined that the file should be scanned from its end, processing is continued at 750 in
When the Flags and Key Size Field can be read, as determined at 734, the Next Cursor is advanced to Cursor plus Flags Length plus Key Size in 738. If the next Cursor is greater than the File Length, the Last Valid Record Location is set to Cursor in 736. Then, processing continues in
When the Cursor is not less than the Header Length as determined at 756 a determination is made as to whether the Max Scan Length has been exceeded at 758. If the Max Scan Length has been exceeded, processing continues in
If a frame was detected at 762, the cursor is rewound to before the key Flags at 766 and the Frame Count is checked in 768 to determine if this is the first valid frame. If this is the first valid frame, the Last Valid Record Location is set to the Cursor in 770 and the Frame Count is incremented at 772.
When the Frame Count is non-zero, as determined at 768, a subsequent frame has been found, and the Frame Count is incremented in 772. Next, the Frame Count is checked at 774 to determine whether it is greater than or equal to the number of Trials required for valid framing detection. If valid framing has been detected at 774, processing continues in
When the Last Valid Record Location is not less than Header Size, as determined at 806, a check is made to determine if the LRT Entries are of a fixed size at 808. When the LRT entries are not of a fixed size, the last valid entry's size is decoded from the last valid LRT entry in 810 and is set to LRT Entry Size. If LRT entries are of fixed size, as determined at 808, the LRT Entry Size is set to the known fixed size. In both cases, the Last Valid Entry End is set in 812 based on the LRT length minus the sum of Last Valid Record Location and LRT Entry Size.
Finally, if the Last Valid Entry End is less than the LRT File Size, as determined at 814, an error has occurred and TRUE is returned at 820. Otherwise there is no error and FALSE is returned at 818.
Similar to a request to get the last valid LRT record location, a request may be received to obtain the last valid VRT record location.
When VRT File Length is greater than VRT Header Length, as determined at 904, a determination is made as to whether the VRT Entries are of a Fixed Size at 906. If the VRT Entries are of a fixed size, as determined at 906, the Last Valid Record Location is set to VRT Length minus any partial record bytes minus the VRT Entry Size in 908. At 910 the Record Location is checked against Header Size. If the Record Location is less than Header Size, an error has occurred and zero is returned at 926. When the Record Location is not erred, as determined at 910, the Record Location is returned at 912.
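For fixed size entries, the calculation above reduces to a few lines of arithmetic. The following sketch uses a hypothetical function name and returns zero to indicate an error, as in the flow described above.

```python
def last_valid_fixed_record(file_len: int, header_len: int, entry_size: int) -> int:
    """Return the offset of the last complete fixed size record,
    or 0 when no complete record follows the header."""
    # Bytes of an incomplete trailing record, if any.
    partial = (file_len - header_len) % entry_size
    location = file_len - partial - entry_size
    # A location inside the header means there is no valid record.
    return 0 if location < header_len else location
```

For a 4-byte header, 8-byte entries, and a 33-byte file (three complete entries plus five stray bytes), the last valid record starts at offset 20.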
If VRT entries are not of a fixed size, as determined at 906, the Last VRT Record Location is retrieved from the LRT in 914. Additional details of such a retrieval are described in connection with
When the Record Location is zero, as determined at 918, the Record Location is set to Header Length at 920 and the VRT is scanned from the Record Location in 922. Additional details of such a scan are described in connection with
When the Record Length is decoded, as determined at 1008, the Next Record Location is set to Record Location plus Record Length at 1016. If the Next Record Location equals File Length, as determined at 1018, the Record Location is returned at 1024. Otherwise, the Next Record Location is checked to determine whether it is less than File Length at 1020. If it is determined that the Next Record Location is less than File Length, Record Location is set to the Next Record Location at 1022. Thereafter, scanning continues at 1006. When the Next Record Location is not less than File Length, as determined at 1020, the end of the file has been reached and processing continues at 1010.
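The forward scan over variable size records can be sketched as below. The record framing (a 4-byte little-endian length that counts itself followed by the payload) is an assumption, as is the behavior at 1010: this sketch returns the last record known to be complete, or zero if there is none.

```python
import struct


def scan_last_record(data: bytes, start: int) -> int:
    """Walk length-prefixed records from `start`; return the offset of the
    last complete record, or 0 if none is found."""
    location = start
    last_complete = 0
    while True:
        if location + 4 > len(data):
            return last_complete              # length field cannot be decoded
        (length,) = struct.unpack_from("<I", data, location)
        if length < 4:
            return last_complete              # malformed length
        next_location = location + length
        if next_location == len(data):
            return location                   # record ends exactly at end of file
        if next_location < len(data):
            last_complete = location
            location = next_location          # continue with the next record
            continue
        return last_complete                  # record runs past end of file
```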
If the VRT File Length is greater than or equal to the VRT Header Length, as determined at 1104, the Last Valid VRT Record Location is calculated at 1106. Additional details of such a calculation are described in connection with
Once an error has been detected, a request may be received to correct the error. FIG. 12 is a flow chart illustrating aspects of an example automated method 1200 of receiving a correct LRT/VRT error request at 1202 and getting the Last Valid LRT Record Location in 1204. Additional details are described in connection with
When the Record Location is equal to zero, as determined in 1206, the LRT File Length is compared to the LRT Header Length in 1220. If the LRT File Length is less than LRT Header Length, the LRT File Header is regenerated in 1222. When the LRT File Length is greater than or equal to Header Length as determined at 1220 or after the LRT File Header is regenerated in 1222, the Next LRT Record Location is set to the LRT Header Length in 1224.
If the VRT File Length is less than the VRT Header Length, as determined at 1226, the VRT file header is regenerated at 1228 and the Next VRT Record Location is set to VRT Header Length in 1230. Thereafter, the process returns at 1218. When the VRT File Length is greater than or equal to the VRT Header Length, as determined at 1226, the Next VRT Record Location is set to VRT Header Length at 1230. Then, the process returns at 1218.
When File Length is not equal to Header Length, as determined at 1306, the Segment End Reference is set to File Length at 1308. Then, the IRT segment errors are detected at 1310. Additional details of such error detection are described in connection with
An IRT file may suffer an incomplete write. This error is detected by scanning for well-formed segments starting at the end of the IRT. A simple heuristic error checker can consider the following:
1) Segment size at the end of the segment and the beginning of the segment must be equal, if not there is corruption
2) If a per-segment CRC is present, it should be recalculated against the segment data and if it does not match there is an error
3) If segment sizes are equal the File UUID Index can be checked to determine if it references a valid file, if not there is an error
4) If the File UUID Index is valid, the Last Indexed Position can be checked to see if it falls within the indexed file, if not there is an error
The above process can be repeated for up to T successful trials, where increasing T decreases the probability of false negatives (undetected errors). In other words, the larger T is, the more accurate error detection becomes.
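The four heuristic checks and the repeated trials can be sketched as follows. The segment layout used here (leading size, File UUID Index, Last Indexed Position, payload, CRC-32, trailing size) and all function names are assumptions for illustration; the actual IRT segment format is not given in this description.

```python
import struct
import zlib
from typing import List


def make_segment(file_index: int, last_pos: int, payload: bytes) -> bytes:
    """Build a segment in the assumed layout."""
    body = struct.pack("<IQ", file_index, last_pos) + payload
    size = 4 + len(body) + 4 + 4
    crc = struct.pack("<I", zlib.crc32(body))
    return struct.pack("<I", size) + body + crc + struct.pack("<I", size)


def segment_has_error(data: bytes, end: int, file_lengths: List[int]) -> bool:
    """Check the segment that ends at offset `end`; True means erred."""
    if end < 4:
        return True
    (tail_size,) = struct.unpack_from("<I", data, end - 4)
    start = end - tail_size
    if tail_size < 24 or start < 0:
        return True
    (head_size,) = struct.unpack_from("<I", data, start)
    if head_size != tail_size:                    # check 1: sizes must match
        return True
    body = data[start + 4:end - 8]
    (crc,) = struct.unpack_from("<I", data, end - 8)
    if zlib.crc32(body) != crc:                   # check 2: per-segment CRC
        return True
    file_index, last_pos = struct.unpack_from("<IQ", body, 0)
    if file_index >= len(file_lengths):           # check 3: valid file reference
        return True
    return last_pos > file_lengths[file_index]    # check 4: position in file


def tail_is_valid(data: bytes, file_lengths: List[int], trials: int) -> bool:
    """Validate up to `trials` segments, walking back from the end."""
    end = len(data)
    for _ in range(trials):
        if end == 0:
            break
        if segment_has_error(data, end, file_lengths):
            return False
        end -= struct.unpack_from("<I", data, end - 4)[0]
    return True
```

The sketch treats the data as a run of segments with no file header; a real scan would stop at the header length rather than at offset zero.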
When an erred IRT file is detected the error is corrected by scanning for complete segments. A complete segment scan can start at the beginning of the IRT file or at the end of the IRT file. Starting at the beginning of the file can assume aligned segments while starting at the end of the file requires a byte-by-byte scan for segment alignment.
The scanning process can use the full segment error detection process described above or can use a subset of error detection checks depending on speed and accuracy goals. For example, step 1 can be used until an error is detected, at which point the test can “back up” N segments and do steps 2-4 to ensure the integrity of the last N segments. Generally, only the last segment will be invalid due to an incomplete write.
Once the first invalid segment is detected all data from that point in the file to its end can be erased. This leaves only complete, valid segments in the IRT file.
Once error correction has been performed, the IRT file may have more or fewer segments than its associated LRT/VRT files based on the Last Indexed Position of its segments. This is the final error that must be detected and corrected. There are three possibilities:
1) The IRT has indexed more data than is now available in LRT/VRT files (because those files were erred and have been corrected)
2) The IRT file has indexed less data than is now available in LRT/VRT files—this is a “normal” case where the remaining IRT file index can simply be generated from the LRT/VRT files
3) The IRT file has indexed all data available in the LRT/VRT files—no further work needs to be done
When the IRT has indexed more data than is now available in the LRT/VRT (case 1 above), those indexes must be erased from the IRT file. Maintaining the append-only invariant requires the erasure of all segments after the earliest missing segment across all LRT/VRT files. This may erase valid indexes from the IRT from unaffected LRT/VRT files but this is OK as those indexes can and will be re-generated (case 2 above).
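As a rough sketch of case 1, suppose each IRT segment is summarized by a pair of file index and Last Indexed Position, listed in append order. The first segment that points past the corrected length of its file marks where erasure must begin. The tuple representation and the function name are assumptions of this sketch.

```python
from typing import Dict, List, Tuple


def segments_to_keep(
    segments: List[Tuple[int, int]],
    file_lengths: Dict[int, int],
) -> int:
    """Return how many leading IRT segments remain after erasing from the
    earliest segment that indexes data no longer present."""
    for i, (file_index, last_indexed_position) in enumerate(segments):
        if last_indexed_position > file_lengths[file_index]:
            # Append-only invariant: this segment and all later ones go,
            # even those that index unaffected files.
            return i
    return len(segments)
```

Segments erased for unaffected files fall under case 2 afterwards, and their indexes are regenerated from the LRT/VRT files.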
When the segment size is decoded, as determined at 1410, the decoded segments sizes are compared in 1414. If they do not match, an error has occurred and TRUE is returned at 1422. If the segment sizes match and a Segment CRC should be checked, as determined at 1416, the Segment CRC is checked in 1418. If the segment does not match the CRC, an error has occurred and TRUE is returned in 1422. Otherwise, the CRC is determined to be valid in 1418 or the Segment CRC did not need to be checked as determined at 1416 and the File UUID Indexes are validated at 1420. If the file indexes are not valid, as determined at 1420, an error has occurred and TRUE is returned at 1422.
When the File UUID Indexes are valid, as determined at 1420, the File References are validated in 1424, and when they are valid, FALSE is returned at 1426. If the file references are not valid, as determined in 1424, an error has occurred and TRUE is returned in 1422.
Once errors in the IRT have been detected, steps may be taken to correct the error.
When the Header Length is greater than or equal to zero, as determined at 1506, the File Length is compared to File Header Length at 1510. If they are equal, the File Length is returned at 1512. When the File Length is not equal to the File Header Length, the Segment End Reference is set to File Length at 1514 and IRT segment errors are detected at 1516. Additional details are described in connection with
At 1518 a determination is made as to whether the last segment is valid. If the last segment is determined to be valid, the File Length is returned at 1512. When the last segment is not valid, as determined at 1518, the last segment is erased at 1520. Additional details are described in connection with
When the IRT has indexed more data than is now available in the LRT/VRT, those indexes must be erased from the IRT file. Maintaining the append-only invariant requires the erasure of all segments after the earliest missing segment across all LRT/VRT files. This may erase valid indexes from the IRT from unaffected LRT/VRT files. However, those indexes can be re-generated.
The transaction log file may suffer an incomplete write and may also contain incomplete or erred transactions due to errors and error correction in LRT and VRT files. This leads to at least two error correction phases: (1) correcting incomplete writes within the transaction log file, and (2) correcting incomplete/erred transactions with respect to LRT and VRT files. As the transaction log file includes fixed size records, incomplete writes may be detected simply by determining whether there are any partial records in the transaction log. This may be accomplished, e.g., using ((file length - header length) % record length) != 0.
When an incomplete record is detected that record is erased.
The transaction log file is processed from its end to determine if there are any incomplete transactions. This is accomplished by matching each transaction's begin, end and commit/abort entries until a No Outstanding Transactions flag is encountered in a commit/abort entry. Once the No Outstanding Transactions flag is encountered, any unmatched transactions are determined to be incomplete. Note, there may also be unknown transactions in the event of the loss of a large number of records.
Incomplete transactions must be aborted if there is no chance for those transactions to complete (e.g., there was an incomplete write and/or process crash). This is accomplished, e.g., by:
1) Writing an Abort to the transaction log for each aborted transaction
2) Setting the No Outstanding Transactions flag in the last aborted transaction written
3) Ensuring the LRT and VRT files match the global transaction log and, if they do not, performing error correction on them and the transaction log (see below)
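The backward scan and steps 1 and 2 above may be sketched as follows. The `LogEntry` layout, the entry kind strings and the function names are assumptions made for illustration; the disclosure does not define the record format. Step 3, which involves the LRT and VRT files, is not covered by this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogEntry:
    txn_id: int
    kind: str  # hypothetical kinds: "begin", "end", "commit", "abort"
    no_outstanding: bool = False  # No Outstanding Transactions flag


def find_incomplete(log: List[LogEntry]) -> List[int]:
    """Scan from the end of the log and return unmatched transaction ids."""
    resolved = set()
    incomplete: List[int] = []
    for entry in reversed(log):
        if entry.kind in ("commit", "abort"):
            resolved.add(entry.txn_id)
            if entry.no_outstanding:
                break  # nothing before this entry can still be outstanding
        elif entry.txn_id not in resolved and entry.txn_id not in incomplete:
            incomplete.append(entry.txn_id)
    return incomplete


def abort_incomplete(log: List[LogEntry]) -> List[int]:
    """Append an abort entry per incomplete transaction; flag the last one."""
    incomplete = find_incomplete(log)
    for i, txn_id in enumerate(incomplete):
        is_last = i == len(incomplete) - 1
        log.append(LogEntry(txn_id, "abort", no_outstanding=is_last))
    return incomplete
```

Because the final abort entry carries the flag, a later scan of the same log stops at that entry and finds no incomplete transactions.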
The final step in transaction log file error detection and correction ensures, e.g., that its transactions match the information in the LRT and VRT files. Once LRT/VRT error correction has been performed, those files may have less information than the transaction log indicates they should have. Conversely, once the transaction log's incomplete write errors are corrected, it may contain fewer transactions than the LRT/VRT files indicate it should have. In both cases, it is determined that the transaction log does not match the LRT/VRT files, and error correction must be performed.
When the transaction log does not match the LRT/VRT files, all of the files may be analyzed to determine which transactions must be erased from the LRT/VRT files (if any) and aborted in the transaction log (if any). The following process may be used to determine which files must be modified:
1) Scan the transaction log file from its end
2) For each Start, End, Commit or Abort load the transaction details and record its state
3) For each ordered transaction marked as abort
4) If any transactions were aborted, start over at step 1
At the end of the above process any incomplete transactions will have been marked as aborted in the transaction log and each LRT/VRT file will contain only successful transactions. Once complete, IRT files are adjusted to only contain valid LRT/VRT entries (see IRT error detection and correction above).
If the Header Length is greater than or equal to zero, as determined at 1706, the transaction log is checked for incomplete records at 1710. If there are incomplete records, an error has occurred and negative two is returned at 1712. Otherwise, the File Length is returned at 1714.
If the Header Length is greater than or equal to zero, as determined by 1806, the transaction log is checked for incomplete records at 1810. If there are no incomplete records the File Length is returned at 1812. When incomplete records are detected at 1810, append only erasure is determined at 1814. If append only erasure is enabled, an erasure code is appended at 1816 and the File Length is returned at 1812. Otherwise, the last complete record and file length is calculated and returned at 1818.
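The branch between append only erasure and length recalculation may be sketched as follows, covering only the path taken when the Header Length is non-negative. The erasure marker bytes, the in-memory `bytearray` standing in for the file, and the choice to return the length after the marker is appended are all assumptions; the disclosure does not define the erasure code format here.

```python
ERASURE_MARKER = b"\xff\xfe"  # hypothetical; the real erasure code is not specified


def correct_log(
    data: bytearray,
    header_length: int,
    record_length: int,
    append_only_erasure: bool,
) -> int:
    """Return the usable log length, correcting a trailing partial record."""
    partial = (len(data) - header_length) % record_length
    if partial == 0:
        return len(data)  # no incomplete records (1812)
    if append_only_erasure:
        data.extend(ERASURE_MARKER)  # erase by appending, never in place (1816)
        return len(data)  # assumed: length after the marker (1812)
    return len(data) - partial  # end of the last complete record (1818)
```

With append only erasure enabled the existing bytes are never overwritten, which keeps the file append-only; otherwise the caller receives the offset at which the last complete record ends.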
As the different file types are dependent on the others, and as errors can occur in any of the files, it is important to ensure error correction in one file propagates to all other affected files. This may include, e.g., detecting and correcting incomplete record errors in all files and consistency checking between files. Detection and correction of incomplete record errors may include scanning the transaction log, LRTs, VRTs, and IRTs and correcting incomplete records. This may be localized to each file. It may be considered a file level consistency check. Once the files are internally consistent, consistency checking between files may be performed. Consistency checking between files may include ensuring that LRT/VRT file record counts match and correcting those that do not match, ensuring LRT/VRT and IRTs match and correcting those that do not match, and ensuring that transactions in the transaction log match all the corrected LRT/VRT/IRT files and correcting those that do not match.
After transaction log errors are corrected in 1906, each datastore contained by the database is iterated over at 1908 with the iteration ending at 1904. During each iteration each datastore's LRT/VRT file pairs are iterated over in 1910 and LRT/VRT errors are corrected in 1912. Additional details are described in connection with
Once each IRT has been iterated over in 1914, processing continues at 1922 where each LRT/VRT/IRT group is iterated over and LRT/VRT/IRT groups are checked for self-consistency in 1924. When groups are not self-consistent, the LRT/VRT/IRT errors are corrected at 1930. Once the LRT/VRT/IRT group is self consistent it is checked for consistency with the transaction log at 1926.
When the LRT/VRT/IRT group is not consistent with the transaction log, as determined at 1926, the LRT/VRT/IRT group and transaction log are made consistent at 1928 and iteration continues at 1922. If the LRT/VRT/IRT group was consistent with the transaction log, as determined at 1926, iteration continues at 1922. When LRT/VRT/IRT group iteration is complete, datastore iteration continues at 1908.
Data corruption is detected, e.g., using heuristics, file framing and checksums. Heuristics may be used mainly to check the internal consistency of known fields, e.g., flags, segment start, segment end, etc. File framing, e.g., group start/end, segment start/end, provides natural boundaries for frame compression and frame checksums.
When frame compression is used, decompression operations may detect corruption, e.g., fail decompression. Additionally, frame checksums detect frame corruption in both compressed and uncompressed cases.
When the Frame is compressed, as determined at 2010, the frame is decompressed at 2012. If a decompression error occurs, as determined at 2018, TRUE is returned at 2008. If the frame is not compressed, as determined at 2010, or the frame was successfully decompressed, as determined at 2018, the consistency of the frame's fields is checked at 2014. If the fields are consistent, as determined at 2014, the frame is not corrupt and FALSE is returned at 2016. If the fields are not consistent, as determined at 2014, there is an error and TRUE is returned at 2008.
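The corruption check may be sketched as follows. The `Frame` layout, the use of CRC-32 as the frame checksum, the use of zlib for frame compression, and the caller-supplied `fields_consistent` heuristic are all assumptions chosen for illustration; the disclosure does not name a particular checksum or compression algorithm.

```python
import zlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class Frame:
    payload: bytes    # frame bytes as stored (possibly compressed)
    checksum: int     # assumed: CRC-32 of the stored payload
    compressed: bool


def is_corrupt(frame: Frame, fields_consistent: Callable[[bytes], bool]) -> bool:
    """Return True when the frame fails any of the corruption checks."""
    if zlib.crc32(frame.payload) != frame.checksum:
        return True  # checksum mismatch
    body = frame.payload
    if frame.compressed:
        try:
            body = zlib.decompress(body)
        except zlib.error:
            return True  # decompression failure indicates corruption
    # Heuristic check of known fields on the uncompressed frame contents.
    return not fields_consistent(body)
```

The checksum is verified on the stored bytes, so it applies in both the compressed and uncompressed cases, while the field heuristics run only on contents that were successfully recovered.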
Among other times, error detection and correction can be performed on system startup. The algorithms presented herein enable fast detection and correction when a system is interrupted unexpectedly. When a system is shut down cleanly, start up convergence can be accelerated by appending an indication of a clean shutdown to modified files at the time that the clean shutdown is performed. Additionally, sealed files can have a sealed indication appended to them. This indication enables rapid convergence when performing error detection and correction. Appending clean shutdown and/or sealed indications to each file enables greatly simplified error detection and, therefore, rapid convergence on system start up. Clean shutdown and sealed indications may also be extended to include additional information used to reduce convergence times for additional operations, e.g., count, cardinality calculation, indexing, etc.
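The effect of these indications on start up may be sketched as follows. The `LastFile` record, its boolean fields and the nested dictionary of databases and datastores are hypothetical stand-ins for however the indications are actually read from the end of each file.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LastFile:
    name: str
    sealed: bool = False          # sealed indication was appended
    clean_shutdown: bool = False  # clean shutdown indication was appended


def files_needing_recovery(
    databases: Dict[str, Dict[str, List[LastFile]]],
) -> List[str]:
    """Return the last files in each chain that require full error checking."""
    needs_recovery: List[str] = []
    for datastores in databases.values():
        for last_files in datastores.values():
            for f in last_files:
                if f.sealed or f.clean_shutdown:
                    continue  # indication present: skip detection/correction
                needs_recovery.append(f.name)
    return needs_recovery
```

Only files lacking both indications fall through to full error detection and correction, so a cleanly shut down system converges after a single pass over the last file in each chain.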
For each database iterated over in 2304, the datastores within that database are iterated over at 2306 and the last file in each file chain is iterated over in 2308. Each last file is checked to determine if it is sealed in 2310 and, if it is, iteration continues at 2308. If the file is not sealed, it is checked for a clean shutdown indication in 2312 and, if a clean shutdown has been performed, iteration continues at 2308.
When a clean shutdown is not indicated at 2312 an error may be present and overall errors are detected and corrected at 2314. Additional details are described in connection with
While aspects of this invention have been described in conjunction with the example aspects of implementations outlined above, various alternatives, modifications, variations, improvements, and/or substantial equivalents, whether known or that are or may be presently unforeseen, may become apparent to those having at least ordinary skill in the art. Accordingly, the example illustrations, as set forth above, are intended to be illustrative, not limiting. Various changes may be made without departing from the spirit and scope hereof. Therefore, aspects of the invention are intended to embrace all known or later-developed alternatives, modifications, variations, improvements, and/or substantial equivalents.
This application claims the benefit of U.S. Provisional Application Ser. No. 61/916,541, entitled “Method and System for Error Detection and Correction in Append-only Datastores” and filed on Dec. 16, 2013, which is expressly incorporated by reference herein in its entirety.
Number | Date | Country
---|---|---
61916541 | Dec 2013 | US