The invention relates to the field of optical recording, more specifically to the maintenance of databases containing metadata under the restrictions imposed by optical recording media.
Metadata is a term known in the art denoting data about data. Because metadata are structured, they can be stored in databases. In future multimedia applications, metadata will likely be large in size and frequently changing; they will likely be stored on rewritable optical data carriers, alongside the data they relate to. Storage of frequently changing, “living” databases on rewritable optical media is hampered by the fact that such media allow only a limited number of rewrite cycles for each data sector. Too many write cycles for a data sector lead to a degradation of the sector. Hence the problem arises of devising a database management system adapted to an environment with a limited number of rewrite cycles.
The mentioned problem, and others, are solved in the invention by a method for modifying a database file containing the steps of: reserving, within the database file, at least one area of predetermined size and position dedicated to writing thereto data records of at least one type, respectively; indicating within the database file, as a last written segment, that segment within the area to which data records were last written; and ensuring distributed write in that, whenever a data record of a specific type is to be written to the database, the writing uses, within the area dedicated to the specific type, the next available segments after the last written segment.
In this, advantageously, the segments are first written in sequential order into a database area. After that, when the last segment of a database area has been written, the next write operation will wrap around to the beginning of the database area again, and will write to any unused or invalidated segments found there. Active segments, i.e. segments containing valid data, will not be changed or overwritten. They can only be invalidated. Changed content can only be written to one of the next segments ready for writing to. So, even in a second or consecutive pass through the database area, sequential writing is maintained as much as possible, hence ensuring distributed write as much as possible.
According to another aspect of the invention, modifying a data record of a specific type in the database file contains the steps of: reading, from the associated database area, the data record; modifying the read data record; obtaining a first write address information indicating a segment within the area to which a data record of the specific type was last written; forwarding, as part of ensuring distributed write, the first write address information so that it indicates a next segment within the area which contains unused space; and writing the modified data record to the segment as indicated by the first write address information.
Databases may contain a control area comprising several control blocks. Typically, only one of these control blocks, each consisting of one or more contiguous segments, will be valid. Such a control block is typically subject to frequent changes and may contain information about the validity of documents and segments in a payload area as well as of indices in an index area in the database. Whenever the database content has changed, the new control block has to be written at the latest before the data carrier is ejected, and it shall be written to the next segments in the control area. This spreads the write or rewrite operations for the control block over all the segments in the control area. When opening an unknown data carrier, the one and only valid control block has to be found by inspecting an attached version number, as there is no possibility to store its permanently changing segment address at a fixed position on the data carrier.
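By way of illustration only, locating the valid control block on an unknown data carrier might be sketched in Python as follows. The function name, the representation of the control area as a list of version numbers, and the rule that the highest version number marks the valid block are assumptions made for this sketch; the description above only states that an attached version number is inspected.

```python
from typing import Optional, Sequence


def find_valid_control_block(versions: Sequence[Optional[int]]) -> Optional[int]:
    """Return the index of the control-area segment holding the newest block.

    versions[i] is the version number read from segment i of the control
    area, or None if nothing has been written there yet (assumed encoding).
    Assumption: version numbers only ever increase and never overflow.
    """
    best_index: Optional[int] = None
    best_version = -1
    for index, version in enumerate(versions):
        if version is not None and version > best_version:
            best_index, best_version = index, version
    return best_index


# Hypothetical control area after one wrap-around: the newest block is the
# one with the highest version number, wherever it happens to be located.
control_area_versions = [7, 8, 9, 4, 5, 6]
newest_segment = find_valid_control_block(control_area_versions)
```

The scan touches every segment of the control area once, which is the price paid for not keeping a pointer at a fixed, frequently rewritten position.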
According to this aspect of the invention, deleting a payload data record from a database file containing a control area, contains the steps of: reading, from the control area, control blocks containing information associated to the payload data record to be deleted; marking, in the read control blocks, the payload data record to be deleted as deleted, thereby obtaining a modified control block; obtaining a write address information indicating the segment within the control area to which a control block was last written; forwarding, as part of ensuring distributed write, the write address information so that it indicates a next segment within the control area which contains unused space; and writing the modified control block to the segment as indicated by the forwarded write address information.
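The deletion steps can be sketched in Python as follows. This is a simplified model, not a definitive implementation: the `ControlBlock` class, its fields, and `delete_record` are hypothetical names, each control block is assumed to fit into a single segment, and superseded control blocks are assumed to count as unused space once the write address wraps around.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ControlBlock:
    version: int                  # assumed to grow with every written block
    deleted_ids: FrozenSet[int]   # payload records marked as deleted


def delete_record(control_area: List[Optional[ControlBlock]],
                  last_written: int, record_id: int) -> int:
    """Mark record_id as deleted and return the new write address."""
    # Read the currently valid control block.
    current = control_area[last_written]
    if current is None:
        raise ValueError("last_written does not point to a control block")
    # Mark the record as deleted in a modified copy; the old block is untouched.
    modified = ControlBlock(current.version + 1,
                            current.deleted_ids | {record_id})
    # Forward the write address, wrapping at the end of the control area.
    target = (last_written + 1) % len(control_area)
    # Write the modified block to the forwarded address.
    control_area[target] = modified
    return target


area: List[Optional[ControlBlock]] = [ControlBlock(1, frozenset()), None, None]
address = delete_record(area, 0, record_id=42)
```

Note that the payload segment holding the deleted record is never written; only the control area receives a write.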
According to another aspect of the invention, ensuring distributed write contains substeps of incrementing the write address information until it indicates a next segment after the last written segment which contains unused space; and resetting the write address information to the start of the area in case the incrementing has caused the write address information to indicate a segment beyond the end of the area. This contributes to maintaining sequential writing as much as possible, even in a second or consecutive pass through the database area, hence ensuring distributed write as much as possible. Also, the bigger an area has been dimensioned at database creation time, the less often will this “wrap-around” happen; hence, under an otherwise unchanged application, there is an inverse relationship between the size of an area and the average number of rewrites of the segments within that area.
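The two substeps, incrementing and resetting, can be sketched in Python as follows. The function name and the representation of an area as a list of booleans (True meaning the segment holds valid data) are assumptions made for illustration.

```python
from typing import List


def next_write_segment(in_use: List[bool], last_written: int) -> int:
    """Forward the write address to the next unused segment of an area.

    in_use[i] is True while segment i holds valid data (assumed model).
    """
    segment_count = len(in_use)
    address = last_written
    for _ in range(segment_count):
        address += 1                  # increment the write address
        if address >= segment_count:  # beyond the end of the area:
            address = 0               # reset to the start of the area
        if not in_use[address]:
            return address
    raise RuntimeError("no unused segment left in this area")


# Segments 0, 2 and 3 hold valid data; the last write went to segment 2.
# The search passes segment 3, wraps around, passes segment 0, and stops
# at segment 1.
address = next_write_segment([True, False, True, True], last_written=2)
```

Because the search always starts just behind the last written segment, a freshly invalidated segment near the start of the area is not reused until the write address has travelled through the rest of the area.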
The invention relates to a general database format as well as to a data carrier write strategy, which advantageously ensure that the number of rewrite operations is leveled across all data sectors as much as possible. In this way degradation of specific sectors of the data carrier is avoided. The system of this invention is distinguished by being adapted to specific characteristics of optical data carriers, such as a limited number of rewrite cycles and a relatively high track seek time in comparison to hard disks. For some media, about 1000 rewrite cycles are realistic to assume; considering the rewrite strategy in use, this is a high number that will not be reached in normal use cases.
Exemplary embodiments of the invention are illustrated in the drawings and are explained in more detail in the following description.
In the figures:
An embodiment of the invention uses a pre-allocated contiguous area of the available storage space for the database. This can be a simple file or a partition depending on the file system of the data carrier. The only requirement is for it to be organized in segments and to have random read and write access.
Except for a first segment containing static database information such as the sizes of all areas, the segment size and the version, all segments of the DBFile are grouped into several distinct database areas such as a control area, an index area and a payload area. The size of each area is application specific and is specified at DBFile creation time. Each area of the DBFile can advantageously be organized in segments of constant size, which should reasonably be a multiple of the ECC-block size. The Error Correction Code or ECC determines the smallest readable block on the data carrier. Hence it is also advantageous to align the segment borders with ECC-block borders. Applications may exist where it is advantageous to use a different, although constant, segment size within each area.
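The alignment rule can be illustrated with a short Python sketch. The ECC-block size of 32 KiB is only an assumed example value, the function names are hypothetical, and the sketch further assumes one common segment size, a DBFile that itself starts on an ECC-block border, and the area order control, index, payload.

```python
from typing import Dict

ECC_BLOCK_SIZE = 32 * 1024  # assumed example value, in bytes


def aligned_segment_size(requested: int,
                         ecc_block_size: int = ECC_BLOCK_SIZE) -> int:
    """Round a requested segment size up to a whole number of ECC blocks."""
    blocks = max(1, -(-requested // ecc_block_size))  # ceiling division
    return blocks * ecc_block_size


def area_offsets(segment_size: int,
                 area_segment_counts: Dict[str, int]) -> Dict[str, int]:
    """Byte offset of each area; segment 0 is the static first segment."""
    offsets: Dict[str, int] = {}
    position = segment_size  # skip the first (header) segment
    for name, count in area_segment_counts.items():
        offsets[name] = position
        position += count * segment_size
    return offsets


segment_size = aligned_segment_size(50_000)
layout = area_offsets(segment_size, {"control": 4, "index": 8, "payload": 100})
```

Since every offset is a multiple of the segment size, and the segment size is a multiple of the ECC-block size, every segment border coincides with an ECC-block border.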
Payload data, also denoted as documents or records, will be stored in the payload area. A segment in this area can store one or more documents. Documents may span over one or more segments due to their size or due to using the last free space in an almost full segment. Only complete documents are added, retrieved, or invalidated/“removed”. “Removed” documents will not really be removed on the data carrier because of the additional write access to the segment this would cause. Rather, they will just be invalidated in the control block. Any unused parts in segments can be left unused until e.g. the number of completely unused segments gets low. Then the remaining documents in partially invalidated segments can advantageously be gathered and put into new segments, as a kind of garbage collection. In this way the old segments will become available for new documents.
It should be noted that the system of this invention, despite being adapted to the specific characteristics of optical data carriers, can nevertheless also be used on hard disk storage without drawback.
In summary, search operations are optimized with respect to the physical order of the payload within the database file. Even the parallel execution of multiple search operations is possible. This enables multi-user applications. The underlying file structure ensures the best performance for optical recording media with respect to a very limited and nearly equal number of rewrite cycles for all sectors.
Error-Correction-Code blocks (ECC blocks) are the smallest segments readable and writable on optical data carriers. The database file is advantageously designed to occupy one area consisting of one continuous extent of these ECC blocks, organized as one file under the pertinent file system. No other file is needed on the data carrier. The internals of this database file are managed by the system of the invention; there is no reliance on specific file system features to support the rewrite strategy of this invention. With this design, the fragmentation of stored data can be determined and controlled. The size of the database file can be set to a default value and may be adjusted if necessary. The database file should be big enough to avoid the need for any sophisticated operations; otherwise a complete database reorganization may be necessary in the worst case. On the other hand, in situations when other applications need more space on the data carrier outside the database file, there also is the option to reduce the size of the database.
Due to the limited number of rewrite cycles of the data carrier, there is the rule that segments cannot be changed. In other words, segments will never be read, modified, and then stored back to the same location on the data carrier. Rather, segments can only be invalidated completely or in parts. Changed content has to be invalidated at its original location and then has to be written to the next segment ready for writing to, which normally is a different location. While adding documents to the database file, segments can be write-cached until their capacity is completely used.
A start code for easy identification of the database file
A version number
Segment size
Control area size
Index area size
Payload area size.
The Header segment 41 is normally written only once in the lifetime of a database file. It has only to be changed if the database is reorganized in such a way as to change one of the Database Header fields, e.g. changing the size of segments, the size of areas or the internal data format. This may happen upon an update of related specifications.
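As a minimal sketch, the listed header fields could be serialized into the header segment as shown below. The byte layout (little-endian, 32-bit unsigned fields), the start code value `DBF0`, the expression of area sizes in segments, and the function names are all assumptions chosen for illustration; the description does not prescribe a binary format.

```python
import struct
from typing import Dict

START_CODE = b"DBF0"         # hypothetical start code
HEADER_FORMAT = "<4sIIIII"   # hypothetical layout: start code + 5 uint32 fields


def pack_header(version: int, segment_size: int, control_area_size: int,
                index_area_size: int, payload_area_size: int) -> bytes:
    """Build the header segment, zero-padded to one full segment."""
    fields = struct.pack(HEADER_FORMAT, START_CODE, version, segment_size,
                         control_area_size, index_area_size, payload_area_size)
    return fields.ljust(segment_size, b"\x00")


def unpack_header(segment: bytes) -> Dict[str, int]:
    """Parse a header segment and check the start code."""
    start, version, seg, ctrl, idx, pay = struct.unpack_from(HEADER_FORMAT,
                                                             segment)
    if start != START_CODE:
        raise ValueError("start code not found: not a database file")
    return {"version": version, "segment_size": seg,
            "control_area_size": ctrl, "index_area_size": idx,
            "payload_area_size": pay}


header = pack_header(1, 32768, 4, 8, 100)
fields = unpack_header(header)
```

Keeping all rarely changing values in this one segment means that opening the database needs a single read at a fixed position before the control area is scanned.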
In certain cases, it is advantageous to employ control blocks in a separate control area 42, as shown in
The control block 53 is the container for the possibly changing information about segments containing payload or indexes. Control block data will typically be loaded into memory when the database is opened and can be kept in memory until the database is closed. The control block needs to be written back to the data carrier only if it has been changed. For safety reasons the changed control block may be stored more than once on the data carrier to improve resilience against data loss in case of system failure. The following information about the database can be stored in the control block:
A control block 53 may span more than one contiguous segment. Advantageously, control blocks are stored segment-aligned, i.e. every control block starts at the beginning of the segment following the last segment of the previous control block. This implies that there may be unused space in the end part of the last segment of the previous control block, and this will be left unused. Segment alignment eases the identification of the last written and therefore currently valid control block. If the remaining segments of the control area 42 are not sufficient to store the complete new control block, then the control block uses these remaining segments and continues at the beginning of the control area, i.e. the control block may wrap around the control area borders.
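The segment-aligned placement with wrap-around can be sketched in Python as follows. The function name and parameters are hypothetical, and the sketch assumes that a control block never needs more segments than the control area provides.

```python
from typing import List


def control_block_segments(area_segments: int, prev_last_segment: int,
                           block_bytes: int, segment_size: int) -> List[int]:
    """Return the control-area segment indices a new control block occupies.

    The block starts at the segment following the last segment of the
    previous block and may wrap around the control area border.
    """
    needed = max(1, -(-block_bytes // segment_size))  # ceiling division
    if needed > area_segments:
        raise ValueError("control block larger than the control area")
    start = (prev_last_segment + 1) % area_segments
    return [(start + i) % area_segments for i in range(needed)]


# A 70000-byte block needs three 32768-byte segments. With a control area
# of six segments and the previous block ending in segment 3, the new
# block occupies segments 4 and 5 and continues in segment 0.
segments = control_block_segments(6, 3, 70_000, 32_768)
```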
There is one exception where the next free segment should not be used: For storing documents that span segment borders, more than one consecutive segment may have to be used. In this case, a next free segment group not large enough for such storage has to be skipped until a usable group is found.
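The skipping of free segment groups that are too small can be sketched in Python as follows. The function name and the boolean model of the payload area are assumptions; the sketch additionally assumes that a document may not wrap around the end of the payload area, which the description does not state.

```python
from typing import List, Optional


def find_free_group(in_use: List[bool], last_written: int,
                    needed: int) -> Optional[int]:
    """Find the first run of `needed` consecutive free segments.

    The search starts behind the last written segment and wraps around once.
    Returns the index of the first segment of the run, or None.
    """
    segment_count = len(in_use)
    for step in range(1, segment_count + 1):
        start = (last_written + step) % segment_count
        end = start + needed
        # Skip groups that are too small or would cross the area border.
        if end <= segment_count and not any(in_use[start:end]):
            return start
    return None


# Segment 1 is free but isolated, so a two-segment document skips it and
# is placed in segments 3 and 4.
start = find_free_group([True, False, True, False, False, True],
                        last_written=0, needed=2)
```

Skipped single segments are not lost; they remain available for later documents that fit into one segment.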
An alternative exists for the special case where documents, despite being smaller than a segment, happen to be stored such that they span segment borders. Such documents can be stored at the beginning of the next single free segment. The remaining free space of the current segment will then be left unused.
Number | Date | Country | Kind |
---|---|---|---|
03016382.8 | Jul 2003 | DE | national |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP04/07985 | 7/16/2004 | WO | | 1/12/2006 |