The present invention relates to a method for mapping a hierarchical data format to a relational database management system. Furthermore, the present invention relates to a database model and an apparatus for reading from and/or writing to recording media using such method.
The future of digital recording will be characterised by the preparation, presentation and archiving of added value data services, i.e. a recorder, like a DVR (Digital Video Recorder) for example, will store and handle additional information delivered by content providers like broadcasters or special services or even assembled by the user himself. Added value (metadata) is generated to give further information to the user. For example, added value may be a movie summary explaining the story, a listing of the actors etc. Also the provision of additional information facilitating navigation inside the movie constitutes added value. For example, a movie can be structured into sections, subsections etc. each having an individual title and possibly comprising further useful information.
For providing structural information and for transporting other metadata for multimedia objects like video or audio streams, a hierarchical data format is generally used. A well-known and widely accepted hierarchical data format is the extensible markup language XML. XML is a system for defining specialized markup languages that are used for transmitting formatted data. It is, therefore, also called a meta language, i.e. a language used for creating other specialized languages. XML data consists of text, which is organised in the form of a plurality of descriptors. The text itself contains elements, attributes and content, i.e. the remaining text. Besides the use for multimedia objects, many other applications for XML are known.
It is to be expected that in the foreseeable future digital recorders will store quite a large amount of data in XML or another hierarchical data format in relational databases, since these databases are widely used and quite sophisticated. However, there is the problem that for storage the hierarchical data format has to be mapped to a relational database management system (RDBMS). A number of database models for XML have already been proposed. See for example Rahayu et al.: Representation of multilevel composite objects in relational databases, OOIS'98, Proceedings of the 1998 International Conference on Object Oriented Information Systems, pp. 221-238, or Zhang et al.: On Supporting Containment Queries in Relational Database Management Systems, ACM SIGMOD Record, vol. 30, no. 2 (2001), pp. 425-36. However, no known database model is capable of handling diverse types of hierarchical descriptors while at the same time allowing fast insertion of descriptors, fast reading of parts of descriptors or of whole descriptors, and fast text queries.
It is, therefore, an object of the invention to provide a method for mapping a hierarchical data format comprising descriptors to a relational database management system. It is another object of the invention to provide a database model and an apparatus for reading from and/or writing to recording media using such method.
According to the invention, the descriptors are separated into portions of a common format, which are stored in a relation in the relational database. The method has the advantage that it is independent of the structure of the stored descriptors. Only a restricted number of common formats is required for storing all types of descriptor formats. The common formats comprise, for example, elements, attributes, text etc. In this way each descriptor is analysed word by word, separated into its different components, and stored in the relation, which preferably is a table.
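The separation step described above can be illustrated by a minimal sketch. It assumes Python with the standard `sqlite3` and `xml.etree.ElementTree` modules; the table name `portion`, the helper names and the sample descriptor are hypothetical and are not taken from the embodiments. For brevity the sketch keeps each text run whole instead of splitting it word by word.

```python
import sqlite3
import xml.etree.ElementTree as ET


def split_descriptor(xml_text):
    """Return the descriptor as a flat list of (kind, value) portions in document order."""
    portions = []

    def walk(node):
        portions.append(("element", node.tag))
        for name, value in node.attrib.items():
            portions.append(("attribute", f"{name}={value}"))
        if node.text and node.text.strip():
            portions.append(("text", node.text.strip()))
        for child in node:
            walk(child)
            # Text following a child element still belongs to the current element.
            if child.tail and child.tail.strip():
                portions.append(("text", child.tail.strip()))

    walk(ET.fromstring(xml_text))
    return portions


def store(conn, portions):
    """Store all portions in one relation (a single table), whatever the descriptor looks like."""
    conn.execute("CREATE TABLE IF NOT EXISTS portion (kind TEXT, value TEXT)")
    conn.executemany("INSERT INTO portion VALUES (?, ?)", portions)


# Hypothetical sample descriptor.
sample = '<movie id="7"><title>Metropolis</title><year>1927</year></movie>'
conn = sqlite3.connect(":memory:")
store(conn, split_descriptor(sample))
rows = conn.execute("SELECT kind, value FROM portion ORDER BY rowid").fetchall()
```

The table layout does not depend on the element names of the descriptor, which is the structure independence referred to above.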
The method can be further improved by providing independent relations for the common formats. Every query uses only these relations. For example, a first relation contains only text, while a second relation contains elements etc. This enables fast and simple queries due to the restricted number of relations. If, for example, a text query has to be performed, only the relation containing text has to be searched. While it is advantageous to provide independent relations for all common formats, it is likewise possible to use a relation for more than one common format. For example, elements and attributes can be stored together in a first relation, while text is stored in a second relation.
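As an illustration of independent relations, the following sketch uses one table per common format, so that a text query touches only the text relation. It assumes Python with `sqlite3`; the table names `element`, `attribute` and `text_part` and both helper functions are hypothetical. Text following a child element is ignored here to keep the sketch short.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE element   (descriptor INTEGER, name TEXT);
    CREATE TABLE attribute (descriptor INTEGER, name TEXT, value TEXT);
    CREATE TABLE text_part (descriptor INTEGER, value TEXT);
""")


def insert_descriptor(conn, descriptor_no, xml_text):
    """Distribute the portions of one descriptor over the three relations."""
    for node in ET.fromstring(xml_text).iter():
        conn.execute("INSERT INTO element VALUES (?, ?)", (descriptor_no, node.tag))
        for name, value in node.attrib.items():
            conn.execute("INSERT INTO attribute VALUES (?, ?, ?)", (descriptor_no, name, value))
        if node.text and node.text.strip():
            conn.execute("INSERT INTO text_part VALUES (?, ?)", (descriptor_no, node.text.strip()))


def find_text(conn, needle):
    """Text query: only the relation containing text is searched."""
    rows = conn.execute(
        "SELECT DISTINCT descriptor FROM text_part WHERE value LIKE ? ORDER BY descriptor",
        (f"%{needle}%",),
    )
    return [row[0] for row in rows]


# Hypothetical sample descriptors.
insert_descriptor(conn, 1, '<movie lang="de"><title>Metropolis</title></movie>')
insert_descriptor(conn, 2, '<movie lang="en"><title>Modern Times</title></movie>')
```

A call such as `find_text(conn, "Modern")` never reads the element or attribute relations, however many element and attribute entries have been stored.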
According to a refinement of the invention, the method further comprises the step of storing information allowing recovery of the descriptor structure in the relations. When a query delivers only a single database entry, the complete structure of the descriptor belonging to the specific database entry can be recovered.
Advantageously, the information allowing recovery of the descriptor structure comprises descriptor numbers and relative and/or absolute positions of the portions of a common format within the descriptors. Using this information it is possible to collect the appropriate values from the database and to sort these values into their original order. Every time a descriptor is stored in the database, it receives a unique descriptor number. In addition, for every portion of a common format of the descriptor the relative position within the descriptor and/or the absolute position within the relation is derived. The descriptor numbers and the relative and/or absolute positions are stored in the relations together with the portions of a common format.
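The role of descriptor numbers and positions may be sketched as follows, assuming Python with `sqlite3`. The column names `abs_pos`, `descriptor` and `rel_pos` and the helper functions are hypothetical. The absolute position is simply the row identifier assigned by SQLite, which is one possible choice and not necessarily that of the embodiments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE portion (
        abs_pos    INTEGER PRIMARY KEY,  -- absolute position within the relation
        descriptor INTEGER NOT NULL,     -- descriptor number
        rel_pos    INTEGER NOT NULL,     -- relative position within the descriptor
        kind       TEXT NOT NULL,
        value      TEXT NOT NULL
    )
""")


def insert_portions(conn, portions):
    """Store one descriptor under a fresh descriptor number and return that number."""
    (highest,) = conn.execute("SELECT COALESCE(MAX(descriptor), 0) FROM portion").fetchone()
    number = highest + 1
    conn.executemany(
        "INSERT INTO portion (descriptor, rel_pos, kind, value) VALUES (?, ?, ?, ?)",
        [(number, rel, kind, value) for rel, (kind, value) in enumerate(portions, start=1)],
    )
    return number


def read_descriptor(conn, number):
    """Collect all portions of one descriptor and sort them into their original order."""
    return conn.execute(
        "SELECT rel_pos, abs_pos, kind, value FROM portion WHERE descriptor = ? ORDER BY rel_pos",
        (number,),
    ).fetchall()


# Hypothetical sample data: two descriptors that are already split into portions.
first = insert_portions(conn, [("element", "movie"), ("element", "title"), ("text", "Metropolis")])
second = insert_portions(conn, [("element", "movie"), ("text", "Modern Times")])
```

With this layout a query hit identified by its absolute position yields the descriptor number, and the descriptor number in turn yields all portions belonging to the same descriptor.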
Favourably, the information allowing recovery of the descriptor structure further comprises an indicator for the next upper hierarchical level of the portions of the common format within the descriptors. This facilitates a fast reconstruction of descriptor parts by working back, level by level, from an arbitrary part of the descriptor to the head of the descriptor. The next upper hierarchical level is helpful information for reconstructing a descriptor part when only the relative or absolute word position of a portion of a common format is known, for example as a query result.
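The level-oriented reconstruction can be sketched as follows, assuming Python with `sqlite3`. The column `parent`, which holds the relative position of the portion on the next upper hierarchical level, and the function `path_to_head` are hypothetical names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE portion (descriptor INTEGER, rel_pos INTEGER, parent INTEGER, kind TEXT, value TEXT)"
)
# Hypothetical sample descriptor; parent is NULL for the head of the descriptor.
conn.executemany("INSERT INTO portion VALUES (?, ?, ?, ?, ?)", [
    (1, 1, None, "element", "movie"),
    (1, 2, 1, "element", "section"),
    (1, 3, 2, "element", "title"),
    (1, 4, 3, "text", "Opening"),
    (1, 5, 1, "element", "year"),
    (1, 6, 5, "text", "1927"),
])


def path_to_head(conn, descriptor, rel_pos):
    """Follow the parent indicator from one portion up to the head of the descriptor."""
    path = []
    while rel_pos is not None:
        row = conn.execute(
            "SELECT parent, value FROM portion WHERE descriptor = ? AND rel_pos = ?",
            (descriptor, rel_pos),
        ).fetchone()
        if row is None:
            raise LookupError(f"no portion at position {rel_pos} in descriptor {descriptor}")
        parent, value = row
        path.append(value)
        rel_pos = parent
    return path[::-1]  # head of the descriptor first
```

Starting from a text hit at relative position 4, only one lookup per hierarchical level is needed; sibling branches such as `year` are never read.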
According to another aspect of the invention, the method further comprises the step of storing a descriptor index in the relational database. Such a descriptor index allows additional information to be stored for every descriptor and makes it easy to find a specific descriptor in the database.
Advantageously, the descriptor index comprises at least descriptor numbers, absolute positions of the descriptors within the relations and/or unique identifiers for the descriptors. Storing this information in the descriptor index allows fast access to a specific descriptor in the relations. The absolute position of a descriptor within the relations is favourably defined as the absolute position of its first portion of a common format. Since the unique identifiers are often needed, a faster access to this kind of data is provided by storing the unique identifier in the descriptor index. In addition to the mentioned information, other types of information can be stored in the descriptor index, like for example the number of levels of a descriptor or other useful data.
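A descriptor index of this kind may be sketched as follows, assuming Python with `sqlite3`. The table `descriptor_index`, its columns and the identifiers of the form `urn:example:...` are hypothetical. The absolute position of a descriptor is taken as the absolute position of its first portion, as stated above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE portion (abs_pos INTEGER PRIMARY KEY, descriptor INTEGER, value TEXT);
    CREATE TABLE descriptor_index (
        descriptor    INTEGER PRIMARY KEY,  -- descriptor number
        first_abs_pos INTEGER NOT NULL,     -- absolute position of the first portion
        uid           TEXT UNIQUE NOT NULL  -- unique identifier of the descriptor
    );
""")


def insert_descriptor(conn, descriptor, uid, values):
    """Store the portions of a descriptor and record its entry in the descriptor index."""
    if not values:
        raise ValueError("a descriptor needs at least one portion")
    first_abs_pos = None
    for value in values:
        cursor = conn.execute(
            "INSERT INTO portion (descriptor, value) VALUES (?, ?)", (descriptor, value)
        )
        if first_abs_pos is None:
            first_abs_pos = cursor.lastrowid
    conn.execute("INSERT INTO descriptor_index VALUES (?, ?, ?)", (descriptor, first_abs_pos, uid))


def locate(conn, uid):
    """Find descriptor number and start position via the index, without scanning the portions."""
    return conn.execute(
        "SELECT descriptor, first_abs_pos FROM descriptor_index WHERE uid = ?", (uid,)
    ).fetchone()


# Hypothetical sample data.
insert_descriptor(conn, 1, "urn:example:a", ["movie", "title", "Metropolis"])
insert_descriptor(conn, 2, "urn:example:b", ["movie", "title", "Modern Times"])
```

Further columns, for example the number of levels of a descriptor, could be added to the index table in the same way.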
Favourably, the hierarchical data format comprising descriptors corresponds to the extensible markup language. Since XML is widely used and well accepted, this allows a wide range of applications of the inventive method.
According to the invention, the common formats comprise at least elements, attributes and text. These types of common formats are sufficient for many applications. While the elements are mainly used for structuring the descriptors, the text contains the information which is in general searched in a query. Attributes are mostly used for characterising elements.
Favourably, the common format text is further divided into string values and integer values. In this way faster searches can be achieved, since the relations which have to be searched for a query become smaller. A query for a string value, for example, is performed in the relation containing only string values, which contains fewer entries than a relation containing both string and integer values.
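The division of text into string values and integer values can be illustrated by the following sketch, assuming Python with `sqlite3`. The table names `text_string` and `text_integer` are hypothetical, and the use of Python's `int()` to decide what counts as an integer value is an assumption of this sketch, not a rule taken from the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE text_string  (descriptor INTEGER, value TEXT);
    CREATE TABLE text_integer (descriptor INTEGER, value INTEGER);
""")


def store_text(conn, descriptor, raw):
    """Route a text portion to the integer relation if it parses as an integer, else to the string relation."""
    raw = raw.strip()
    try:
        number = int(raw)
    except ValueError:
        conn.execute("INSERT INTO text_string VALUES (?, ?)", (descriptor, raw))
    else:
        conn.execute("INSERT INTO text_integer VALUES (?, ?)", (descriptor, number))


# Hypothetical sample data.
store_text(conn, 1, "Metropolis")
store_text(conn, 1, "1927")
store_text(conn, 2, "Modern Times")
store_text(conn, 2, "1936")

# A numeric range query only has to search the integer relation.
twenties = [row[0] for row in conn.execute(
    "SELECT descriptor FROM text_integer WHERE value BETWEEN 1920 AND 1929"
)]
```

Storing integer values in their own relation also permits numeric comparisons such as the range query shown, which would not behave numerically on values stored as strings.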
Advantageously, the common formats further comprise namespace information. This feature is especially interesting for XML and prevents collisions between different documents when markup intended for one document uses the same element types or attribute names as another document for different purposes.
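How namespace information keeps equally named elements apart can be sketched with Python's `xml.etree.ElementTree`, which reports a qualified element name as `{namespace-uri}local-name`. The helper `split_tag` and the namespace URIs below are hypothetical; storing the resulting pairs in a relation is left out for brevity.

```python
import xml.etree.ElementTree as ET


def split_tag(tag):
    """Split an ElementTree tag into (namespace URI or None, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


# Hypothetical sample descriptor using the element name "title" for two different purposes.
doc = (
    '<root xmlns:a="urn:example:film" xmlns:b="urn:example:book">'
    "<a:title>Metropolis</a:title>"
    "<b:title>Metropolis, the novel</b:title>"
    "</root>"
)
names = [split_tag(node.tag) for node in ET.fromstring(doc).iter()]
```

Both `title` elements share the same local name, but the stored namespace URI distinguishes them, so a query for film titles does not return book titles.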
Favourably, a database model for mapping a hierarchical data format comprising descriptors to a relational database management system uses a method according to the invention. Such a database model is capable of realizing simple and fast queries, flexible handling of diverse descriptor formats, simple and fast reconstruction of descriptors, and simple and fast insertion of descriptors. In addition, such a database model can easily be implemented with existing relational database management systems.
Advantageously, an apparatus for reading from and/or writing to recording media uses a method or a database model according to the invention for mapping a hierarchical data format comprising descriptors to a relational database management system. Such an apparatus allows added value information to be stored in an existing relational database. A user of the apparatus can easily use and/or edit the added value information.
For a better understanding of the invention, exemplary embodiments are specified in the following description of advantageous embodiments with reference to the figures, using XML as an example for a hierarchical data format. It is understood that the invention is not limited to these exemplary embodiments and that specified features can also expediently be combined and/or modified without departing from the scope of the present invention. In the figures:
FIGS. 1a, 1b show a simplified XML descriptor and its representation as an XML tree,
FIGS. 4a, 4b show the representations of an XML descriptor as in
The database models shown in the figures have a plurality of advantages, such as:
Number | Date | Country | Kind |
---|---|---|---|
02017045.2 | Jul 2002 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP03/07671 | 7/16/2003 | WO | | 1/25/2005 |