This application is based on and hereby claims priority to German Application No. 10 2004 043 269.4 filed on Sep. 7, 2004, the contents of which are hereby incorporated by reference.
A method for encoding an XML-based document and a corresponding decoding method, as well as corresponding encoding and decoding apparatuses are described below.
XML (Extensible Markup Language) is a language which allows a structured description of the contents of a document. In this situation, name spaces may be used, which are defined by XML Schema language definitions. A more detailed description of XML Schema and of the structures, data types and content models used therein can be found in TR/2001/REC-xmlschema-0-20010502, TR/2001/REC-xmlschema-1-20010502 and TR/2001/REC-xmlschema-2-20010502 from w3.org.
The related art discloses methods for encoding XML-based documents in which the document is converted into an encoded binary representation. By way of example, documents ISO/IEC 15938-1 Multimedia Content Description Interface—Part 1: Systems, Geneva 2002 and ISO/IEC 15938-1:2002/FDAM 1:2004 Multimedia Content Description Interface—Part 1: Systems, Amendment 1: Systems Extensions, which were produced during the development of an MPEG-7 encoding standard, describe methods for encoding and decoding XML-based documents. In this situation, fragments of the XML-based document can be encoded into what are known as Fragment Update Units.
It is frequently necessary to categorize Fragment Update Units on the basis of their content and to store them, for example with such categorization, in tables. This allows fragments in a category to be quickly retrieved when required and to be presented, for example. In this situation, it is advantageous if the categorization requires little computational complexity, since the categorization needs to be performed during reception, without specific retrieval, alongside the other tasks of a receiver. By way of example, besides reception, decoding and indication of a broadcast radio transmission, XML fragments are also received which contain program-accompanying information and need to be categorized quickly. In this situation, it is advantageous if the context information which is used to categorize the fragments is of fixed length, since it can then be read and compared for the categorization with little complexity.
The methods known from the related art for producing a binary representation of XML-based documents have drawbacks with regard to the fast categorization of received fragments. The related art contains methods for signaling context information for the fragments, namely ETSI TS 102 822-3-2: Broadcast and On-line Services: Search, select and rightful use of content on personal storage systems (“TV-Anytime Phase 1”), Part 3: Metadata, Sub-part 2: System Aspects in a Unidirectional Environment and DVB GBS0005r16: Carriage of TVA information in DVB TSs. However, these have the drawback that the context information is either variable in length and inefficient with a small number of different fragments, as described in ETSI TS 102 822-3-2, or is of fixed length but limited to fragments predefined in a standard, as described in DVB GBS0005r16: Carriage of TVA information in DVB TSs.
The problem of categorizing fragments arises with a document which is created using XML and which is represented in a binary format specified on the basis of the MPEG-7 standard, what is known as the MPEG-7 BiM format, for example. With regard to the MPEG-7 BiM format of an XML document, reference is made particularly to documents ISO/IEC 15938-1 Multimedia Content Description Interface—Part 1: Systems and ISO/IEC 15938-1:2002/FDAM 1:2004 Multimedia Content Description Interface—Part 1: Systems, Amendment 1: Systems Extensions.
Such representation involves a data stream being produced which is split into a plurality of units (Access Units), which in turn include a plurality of fragments, the aforementioned Fragment Update Units. The units are encoded and, when needed, are sent as an MPEG-7 BiM stream to one or more receivers. In this case, the fragments contain context information which is represented with a different number of bits, depending on the fragment content.
The possible fragment content is in this case not limited to a subset of the XML elements which are to be transmitted.
Within the context of TV Anytime (TVA)—a concept which, on the basis of a combination of interactive services such as the Internet with the traditional broadcast such as television, allows a television viewer to view his television program at any desired time, and which is described in more detail in DVB GBS0005r16: Carriage of TVA information in DVB TSs, to which reference is made—a limited number of possible fragment contents is stipulated.
In this case, the volume of possible XML elements in an XML document is stipulated by a name space in DVB GBS0005r16: Carriage of TVA information in DVB TSs, to which reference is made. In addition, the contents of fragments are stipulated as a subset of these XML elements. In this case, the signaling of the context information for these fragments is specified by a code of fixed length. This allows efficient categorization of the received fragments, but the fragmentation is limited to the specified fragment contents. If new information elements need to be transmitted then this is not possible without reallocating codes.
An aspect is to provide a method for encoding and a method for decoding XML-based documents and a corresponding encoding and decoding device which allows improved categorization of fragments in the encoded data stream without restricting the volume of possible fragment contents and allows efficient encoding of the context information.
One advantage which is fundamental is that the categorization can take place more quickly than is the case with methods based on the related art. In this case, this is advantageously achieved without restricting the volume of possible fragments. In addition, this also allows efficient encoding of the context information.
Also described is a method for decoding a data structure, where a data structure encoded using the encoding method described above is decoded.
Also described is a method for encoding and decoding a data structure using the encoding method and decoding method described above.
Also described is an encoding apparatus which can be used to carry out the encoding method, and also a decoding apparatus which can be used to carry out the decoding method. In addition, a corresponding encoding and decoding apparatus is described which can be used to carry out the combined encoding and decoding method described above.
In structured documents, particularly XML documents, the type of information in an XML element or XML attribute of a document is declared by the names of all the father elements and their types. In this situation, the XML elements and XML attributes are arranged in a document tree on the basis of a structured definition.
In the described method for encoding the structured document, all the XML elements which are root elements of an encoded fragment are stored in a table according to their name and the names of their father elements, that is to say according to their path. The paths are absolute paths which start at the root node of the document structure tree and lead to an element of the document structure tree which is exclusively contained in a fragment, that is to say a root element of an encoded fragment. This table, called a context path table, is transmitted in advance in order to initialize the decoder. The encoder and decoder associate a context code (ContextCode) of fixed length with every entry in the context path table. Before an encoded fragment is transmitted, the absolute path to the root element of the fragment is signaled as context information by the associated ContextCode. This ContextCode has a fixed length for a transmission. The use of an initialization table nevertheless allows free selection of the split into fragments during initialization of the transmission.
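By way of illustration only, the association of fixed-length context codes with the entries of a context path table, and the categorization of received fragments by those codes, might be sketched as follows. The element names, the function names and the rule used to choose the code width are assumptions made for this sketch and are not taken from the description.

```python
def build_context_path_table(fragment_root_paths):
    """Assign a fixed-length context code to every distinct absolute path."""
    paths = list(dict.fromkeys(fragment_root_paths))  # drop duplicates, keep order
    # Assumed rule: the smallest width that tells all entries apart, at least 1 bit.
    code_bits = max(1, (len(paths) - 1).bit_length())
    codes = {path: format(index, f"0{code_bits}b") for index, path in enumerate(paths)}
    return codes, code_bits


def categorize(fragments, codes):
    """Receiver side: sort (context_code, payload) pairs into one bucket per path."""
    path_by_code = {code: path for path, code in codes.items()}
    buckets = {}
    for code, payload in fragments:
        buckets.setdefault(path_by_code[code], []).append(payload)
    return buckets


# Hypothetical absolute paths to the root elements of the encoded fragments.
ROOT_PATHS = [
    "/Guide/Programs/Program",
    "/Guide/Channels/Channel",
    "/Guide/Schedule/Event",
]
codes, code_bits = build_context_path_table(ROOT_PATHS)

# Each fragment is preceded by the context code of its root element.
stream = [
    (codes["/Guide/Programs/Program"], "p1"),
    (codes["/Guide/Schedule/Event"], "e1"),
    (codes["/Guide/Programs/Program"], "p2"),
]
buckets = categorize(stream, codes)
```

Because every code in this sketch has the same width, the receiver only needs to read a fixed number of bits and perform one table lookup per fragment.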
In a further embodiment, the paths are stored in a table and transmitted relative to the preceding path. This allows a reduction in the storage complexity for the table.
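One conceivable way of storing paths relative to the preceding path is sketched below. The description does not specify the relative format; the sketch assumes that each entry records how many leading steps it shares with the preceding path, followed by the remaining steps. The function names and example paths are hypothetical.

```python
def split_steps(path):
    """Split an absolute path such as '/A/B/C' into its steps."""
    return [step for step in path.split("/") if step]


def encode_relative(paths):
    """Encode each path as (shared leading steps, remaining steps)."""
    encoded, previous = [], []
    for path in paths:
        steps = split_steps(path)
        shared = 0
        while (shared < min(len(steps), len(previous))
               and steps[shared] == previous[shared]):
            shared += 1
        encoded.append((shared, steps[shared:]))
        previous = steps
    return encoded


def decode_relative(encoded):
    """Rebuild the absolute paths from the relative entries."""
    paths, previous = [], []
    for shared, suffix in encoded:
        steps = previous[:shared] + suffix
        paths.append("/" + "/".join(steps))
        previous = steps
    return paths


# Hypothetical context paths, in table order.
PATHS = [
    "/Guide/Programs/Program",
    "/Guide/Programs/Series",
    "/Guide/Channels/Channel",
]
relative = encode_relative(PATHS)
restored = decode_relative(relative)
```

In this sketch the second entry only needs to store the step that differs from the first path, which is where the saving in storage comes from.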
In one particularly preferred embodiment, the paths are stored in the table and transmitted in line with the context path (ContextPath) encoding of the MPEG-7 BiM format as described in ISO/IEC 15938-1 Multimedia Content Description Interface—Part 1: Systems and ISO/IEC 15938-1:2002/FDAM 1:2004 Multimedia Content Description Interface—Part 1: Systems, Amendment 1: Systems Extensions. This allows the use of a standardized, widely used structure and a further increase in the reduction in the storage complexity.
If the length of the ContextCodes to be associated is signaled explicitly with the context path table, and a sufficiently large length is selected, new context paths can be included in the table during the transmission without altering the length or the existing association of the context codes.
In one preferred embodiment, the context path tables are stored and transmitted repeatedly in the data stream. In this case, the length of the context codes is signaled by variable length codes, for example using variable length unsigned integer most significant bit first “vluimsbf”, as defined in ISO/IEC 15938-1 Multimedia Content Description Interface—Part 1: Systems and ISO/IEC 15938-1:2002/FDAM 1:2004 Multimedia Content Description Interface—Part 1: Systems, Amendment 1: Systems Extensions. This allows receivers tuning into a transmission to categorize fragments immediately and to associate context paths as soon as a context path table is received.
In one preferred embodiment, the context path table contains only context paths to root elements of previously transmitted fragments and of fragments which are to be transmitted before the next transmission of the context path table. If there are new paths to root elements of fragments, the context path table is expanded. This method is particularly advantageous for repeated transmission of context path tables, since the context path table only contains the information needed so far. This context path table is therefore smaller than one containing the paths of all the root elements of fragments of the entire transmission. If the context paths which the context path table contains are not associated with successive context codes then the associated context code needs to be encoded in the context path table in addition to the respective context path.
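The distinction between tables with successive and non-successive context codes might be illustrated as follows. This is a sketch only: the record layout, the `explicit_codes` flag and the convention that successive codes start at zero are assumptions made for illustration, not part of the description.

```python
def serialize_table(entries):
    """entries: list of (context_code, context_path) pairs, sorted by code."""
    codes = [code for code, _ in entries]
    if codes == list(range(len(codes))):
        # Successive codes 0, 1, 2, ...: the position of a path implies its code.
        return {"explicit_codes": False, "paths": [path for _, path in entries]}
    # Gaps in the code sequence: each code is stored next to its path.
    return {"explicit_codes": True, "entries": list(entries)}


def deserialize_table(record):
    """Rebuild the mapping from context code to context path."""
    if record["explicit_codes"]:
        return dict(record["entries"])
    return dict(enumerate(record["paths"]))


# Hypothetical complete table for the whole transmission.
FULL = [(0, "/Guide/Programs/Program"),
        (1, "/Guide/Channels/Channel"),
        (2, "/Guide/Schedule/Event")]
# Reduced table: only the paths needed so far, leaving a gap at code 1.
PARTIAL = [(0, "/Guide/Programs/Program"),
           (2, "/Guide/Schedule/Event")]

full_record = serialize_table(FULL)
partial_record = serialize_table(PARTIAL)
```

The reduced table is shorter, but in this sketch it pays for the gap in the code sequence by carrying each code explicitly.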
These and other objects and advantages will become more apparent and more readily appreciated from the following description of the exemplary embodiments, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to the preferred embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout.
In addition, such embedded data or structure elements can also exist in parallel with one another.
The resultant structure in this case is difficult to present in text form from a certain size onward. On the basis of the resultant structure, it is therefore known practice to show a document structured in this way as a tree structure.
Starting from a root node DRE of the document, every node DE1 . . . DE10 can thus be determined or described by an absolute path routed to it. By way of example, the node DE5 is determined by the path resulting from steps A2 and B1.
Taking the tree structure shown as a starting point, the tree representation shown in
For each subtree ST1 . . . ST4, this division produces a root element or node FRE1 . . . FRE4 of the respective fragment (subtree) from a respective one of the elements DE1 . . . DE10 which is exclusively contained in that subtree. The root element in turn opens out either into remaining elements DE5 . . . DE10 or into value forms.
In this case, the subtrees ST1 . . . ST4 can be identified by paths to the root elements FRE1 . . . FRE4 of the subtrees in similar fashion to the method described above.
For transmission, such a document is now normally encoded. This usually produces a (bit) data stream.
In this representation, the data stream is divided into Access Units AU which include a plurality of fragments FUU. In this case, the fragments FUU represent subtrees of an XML document, in line with
By way of example, a context path (ContextPath) CP is represented on the basis of the XPath notation which is known from the related art, as described by www.w3.org/TR/xpath, and which is obtained from a sequence, separated by slashes, of the names of a predecessor node (also father node) for its succeeding node(s) (also successor or child node).
In this case, the context path can identify every XML element or attribute of a name space declared in the instance. Normally, however, it is only appropriate to use particular elements or attributes as a root element of a subtree for representing a fragment FUU for a transmission. In addition, context paths with codes of variable length similar to the length of a context path are represented using the XPATH notation. This has drawbacks as described above, however.
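The slash-separated notation for context paths can be illustrated with a short sketch that walks a small document tree and joins the names of the father elements down to each node. The document content and the function name are hypothetical, and attributes and positional predicates are ignored for simplicity.

```python
import xml.etree.ElementTree as ET


def context_paths(element, prefix=""):
    """Yield the slash-separated path of element and of every descendant."""
    path = f"{prefix}/{element.tag}"
    yield path
    for child in element:
        yield from context_paths(child, path)


# Hypothetical document instance.
DOCUMENT = ET.fromstring(
    "<Guide>"
    "<Programs><Program/><Program/></Programs>"
    "<Channels><Channel/></Channels>"
    "</Guide>"
)

# Elements of the same kind share one path, so duplicates are dropped.
distinct_paths = list(dict.fromkeys(context_paths(DOCUMENT)))
```

Only some of these paths would normally be chosen as root elements of fragments; the sketch merely enumerates the candidates.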
Encoding based on the described method provides a way of allowing efficient encoding with context codes of fixed length in the fragments FUU particularly when there are a plurality of fragments with the same context path.
The bit length of the context codes CC is determined according to the number of entries, so that all the entries can be uniquely identified, and remains constant for the duration of a transmission to a decoder. Usually, the bit length is chosen such that bit length (CC) >= ld (number of entries), where ld is the logarithm to base two. The root nodes of the subtrees are signaled in the respective fragments by the value of the context codes CC, which refers to entries in the context path table CPT, which contains the context path CP1 . . . CP4 to the root node.
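The relationship between the number of table entries and the bit length of the context codes can be made concrete with a small sketch. The function names are hypothetical, the bit stream is modeled as a string of '0' and '1' characters for readability, and a minimum length of one bit is assumed for a table with a single entry.

```python
def min_code_bits(entry_count):
    """Smallest bit length n with 2**n >= entry_count (assumed minimum: 1 bit)."""
    return max(1, (entry_count - 1).bit_length())


def write_code(index, bits):
    """Write a table index as a context code of exactly `bits` bits."""
    if not 0 <= index < (1 << bits):
        raise ValueError("index does not fit into the chosen code length")
    return format(index, f"0{bits}b")


def read_code(bitstring, bits):
    """Read one fixed-length context code; return (index, remaining bits)."""
    return int(bitstring[:bits], 2), bitstring[bits:]


# Four context paths CP1 . . . CP4 need two bits; a fifth entry would need three.
bits_for_four = min_code_bits(4)
bits_for_five = min_code_bits(5)

# A fragment whose root element is the third table entry (index 2).
encoded = write_code(2, bits_for_four) + "1101"  # "1101" stands for payload bits
index, payload = read_code(encoded, bits_for_four)
```

Selecting a bit length larger than this minimum leaves unused code values, which is what would allow further context paths to be added later without changing the code length.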
In the example shown in
A description has been provided with particular reference to preferred embodiments thereof and examples, but it will be understood that variations and modifications can be effected within the spirit and scope of the claims which may include the phrase “at least one of A, B and C” as an alternative expression that means one or more of A, B and C may be used, contrary to the holding in Superguide v. DIRECTV, 358 F3d 870, 69 USPQ2d 1865 (Fed. Cir. 2004).
Number | Date | Country | Kind |
---|---|---|---|
10 2004 043 269.4 | Sep 2004 | DE | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2005/054255 | 8/30/2005 | WO | 00 | 2/19/2008 |