This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2011-231996, filed on Oct. 21, 2011, the entire contents of which are incorporated herein by reference.
An embodiment described herein relates to an EXI (Efficient XML (Extensible Markup Language) Interchange) decoder and a computer readable medium.
EXI is a technique for creating a compact binary representation of XML using grammatical knowledge (a schema) of XML, and is defined by Non-Patent Document 1 (John Schneider and Takuki Kamiya. Efficient XML Interchange (EXI) Format 1.0. W3C Recommendation, March 2011. http://www.w3.org/TR/exi/). In the prior art, there is known a data compression scheme using EXI.
Among the modes of EXI, the schema-informed grammar mode generates, from the schema, a state machine indicating the state transitions that each part of a text can take, and encodes the text using this state machine.
For the purpose of information exchange by EXI, extension schema may be defined in which, with respect to a standard or fundamental schema (i.e. basic schema), a data type is extended by individual vendors. Due to individual definition of the extension schema, state machines (i.e. a set of type grammars) with respect to individual vendor's extension schemas are required and therefore a storage area size such as a ROM size required for implementation increases.
This is a unique problem due to change of a code bit width caused by variation in the number of states of the state machine used for encoding and decoding in EXI.
According to an embodiment, there is provided an EXI decoder which decodes an EXI (Efficient XML (Extensible Markup Language) Interchange) stream.
The EXI decoder includes a grammar store, a stream input unit and a parser unit.
The grammar store stores a first set of type grammars and a second set of type grammars.
The first set of type grammars is generated according to the EXI specification from a basic schema of XML, and its type grammars correspond respectively to the types defined in the basic schema.
The second set of type grammars is obtained from a set of type grammars generated according to the EXI specification from an extension schema of XML, whose type grammars correspond respectively to the types defined in the extension schema, by excluding the type grammars common to the first set of type grammars.
The stream input unit receives an EXI stream.
The parser unit decodes the EXI stream, when the EXI stream is compatible with the basic schema, based on the first set of type grammars, and, when the EXI stream is compatible with the extension schema, based on the second set of type grammars and the common type grammars.
Hereinafter, the present embodiment will be described with the accompanying drawings.
The header analysis unit 12 analyzes the header and the header option of the EXI stream and extracts the option of the EXI stream. The option includes a schema Id (schemaId). This schemaId is output to a grammar selection unit 13 and a string table initialization vector selection unit 15.
The grammar store 14 holds all EXI grammars corresponding to all schemas that can be used in the parser unit 17, and a grammar set table. The grammar set table is information indicating for which schemaId each individual grammar is used, and is formed as a bitmap or the like. Also, for some grammars (e.g. a grammar called from “xsi:type”), the grammar store 14 holds a table indicating the correspondence between a QName (Qualified Name) indicating a type (or Type) and a grammar. Incidentally, xsi:type is defined by the XML-Schema-Instance specification and explicitly specifies the type as which an XML element is interpreted. A configuration of the grammar store 14 is illustrated in
In
To be more specific, each type grammar is a state machine (grammar) corresponding to each type. An available grammar(s) is shown for individual schemaId in the form of a bitmap with the schemaId used as a key. Also, in the table on the right side of the figure, a type grammar is looked up using QName (which is a pair of a name space and a name) as a key.
With reference to the schemaId reported from the header analysis unit 12, the grammar selection unit 13 selects a set of grammars to be used and a corresponding part of the grammar set table (i.e. a part of grammar set table corresponding to the schemaId) from the grammar store 14 and sends them to the parser unit 17.
A string table initialization vector store 16 holds all string table initialization vectors that can be used in the parser unit 17 for each of schemaId's. A specific configuration of the string table initialization vector store 16 is realized by, for example, a ROM area in which all the strings (or all string initialization vectors) are stored and references to the strings corresponding to schemaId's.
The string table initialization vector selection unit 15 determines a used string table initialization vector based on the schemaId reported from the header analysis unit 12 and sends it to the parser unit 17.
The parser unit 17 initializes (or overwrites) a string table with the string table initialization vector transmitted from the string table initialization vector selection unit 15, and processes the stream received from the stream input unit 11 using the initialized string table and the grammars and grammar set table received from the grammar selection unit 13. That is, the stream is converted into an event sequence (e.g. a sequence of SAX events) corresponding to an XML document and the converted event sequence is output to an application (not illustrated). The application interprets content of the XML document according to the event sequence and performs operation based on a result of the interpretation.
A specific explanation with respect to the string table and the initialization vector will be given later.
In the following, a structure of an EXI stream, a structure of an EXI stream header, a structure of an EXI grammar, a string table initialization vector and parse processing will be explained.
The EXI stream is formed with the EXI stream header, the header option and a stream corresponding to a text body. The header option is an EXI document (i.e. EXI stream) itself based on a specific schema.
The stream has a structure in which a pair of an event code (EventCode) and a value (Value) is repeated. Document structuration by tags (or elements) in XML is expressed by recursive occurrence of the repetition of a pair of an event code and a value, which corresponds to a sub-element, in the above value part. A configuration example of the EXI stream body will be schematically illustrated in
A structure of the EXI stream header is defined in Section 5 in Non-Patent Document 1. There is a case where the header structure has the EXI option in addition to a fixed-length header part that is necessarily included. Whether there is the EXI option is decided by Presence Bit of the header part. The EXI option itself is an EXI document described with a schema defined by the EXI specification.
Although various types of description are possible in the EXI option, the important element in the present embodiment is schemaId. This schemaId is a character string by which the EXI encoder on the transmission side reports to the EXI decoder which schema was used to encode the original XML document into the EXI stream.
In the EXI encoder, the XML document is converted into an event sequence, and the event sequence is encoded into the EXI stream according to the EXI grammar, which has been generated from the schema based on the EXI specification. The EXI grammar consists of a set of type grammars. A method of generating the EXI grammar from the schema is described in Non-Patent Document 1.
Here, grammars (elements) included in the EXI grammar will be explained. One grammar defines one state machine and is generated for each of types defined in a schema. To be more specific, individual grammars include the following structure.
label of type and state machine corresponding to the label
a set of states (and definition of the initialization state and terminal state) wherein the states are elements forming the state machine
state transition(s) from each state to own or different state wherein the transition(s) is elements forming the state machine
Also, each grammar defines the following for each state transition.
event type (such as SD (StartDocument), SE (StartElement), AT (Attribute), CH (Character), EE (EndElement) and ED (EndDocument))
auxiliary-element with respect to an event (such as a label of a tag forming the XML element and an attribute key)
type of an event value (Terminal in the EXI specification), which indicates a different “type” or a built-in data type such as an integer or a string
next transition state (NonTerminal in the EXI specification)
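The structure listed above can be pictured with a minimal Python sketch. The class names, field names and the "color" element below are illustrative assumptions and are not taken from the EXI specification or from the schema of the embodiment; only the type label plateType appears in this description.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Transition:
    event_type: str            # "SD", "SE", "AT", "CH", "EE" or "ED"
    name: Optional[str]        # auxiliary element: tag label or attribute key
    value_type: Optional[str]  # built-in data type, or label of another type grammar
    next_state: Optional[int]  # next transition state; None when the grammar ends


@dataclass
class TypeGrammar:
    label: str                 # label of the type this state machine belongs to
    initial_state: int
    terminal_states: frozenset
    transitions: dict = field(default_factory=dict)  # state -> list of Transition


# Hypothetical grammar for a type that holds a single "color" element.
plate = TypeGrammar(
    label="plateType",
    initial_state=0,
    terminal_states=frozenset({2}),
    transitions={
        0: [Transition("SE", "color", "string", 1)],
        1: [Transition("EE", None, None, 2)],
    },
)
```

Keeping each type grammar as an independent object of this kind is what allows it to be stored once and referenced from several schemas.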
The grammar store 14 has a storage area to store a set of type grammars defined in the above format. The type grammars corresponding to individual types are independently stored. Besides, the grammar store 14 has the grammar set table (see
The grammar selection unit 13 reads a corresponding part from the grammar set table based on the schemaId input from the header analysis unit 12 and outputs the corresponding part of the grammar set table and the grammar(s) corresponding to the schemaId to the parser unit 17. The grammar set table includes a reference to each of individual type grammars, and therefore the parser can find a corresponding type grammar according to a pair of a schemaId and QName.
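As a rough illustration of this lookup, the following Python sketch models the grammar set table as one bitmap per schemaId and resolves a type grammar from a pair of a schemaId and a QName. The schemaId values "basic" and "extension", the name space "urn:example:saucers" and the placeholder grammar strings are assumptions made for the example only.

```python
NS = "urn:example:saucers"  # hypothetical name space

# Stored type grammars (placeholders); the list index is the bit position.
GRAMMARS = [
    "orderType (basic)",
    "plateType",
    "orderType (extended)",
    "patternedPlateType",
]

# Grammar set table: schemaId -> bitmap of the grammars used by that schema.
GRAMMAR_SET_TABLE = {
    "basic": 0b0011,
    "extension": 0b1110,
}

# QName (name space, name) -> indices of candidate type grammars.
QNAME_TABLE = {
    (NS, "orderType"): [0, 2],
    (NS, "plateType"): [1],
    (NS, "patternedPlateType"): [3],
}


def find_grammar(schema_id, qname):
    """Return the type grammar for (schemaId, QName), or None if unavailable."""
    bitmap = GRAMMAR_SET_TABLE[schema_id]
    for index in QNAME_TABLE.get(qname, []):
        if (bitmap >> index) & 1:
            return GRAMMARS[index]
    return None
```

In this sketch plateType resolves to the same stored grammar for both schemaId values, while orderType resolves to a different grammar for each.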
Here, the string table and the initialization vector will be described in detail.
In EXI, the string table is used to avoid retransmission of known character strings.
The string table is a table used to reuse prescribed character strings and character strings present in a document, and is defined in Section 7.3 of Non-Patent Document 1. The string table is initialized to the same content in the encoder and the decoder, and, whenever a character string is transmitted from the encoder to the decoder, the same change is made to the table on the encoder side and the decoder side. The string table is used to refer, by number, to a character string appearing in a schema and to a character string appearing two or more times in an XML document. To be more specific, numbers are assigned in order to character strings appearing in a stream, and the character strings can be reused by their numbers. The number is used as the value part corresponding to an event code. Incidentally, for a character string to which no number is assigned, the character string itself is included as the value part corresponding to the event code.
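The mirrored update of the table on both sides can be sketched as follows. This is a simplified model under stated assumptions: a single partition, and a ("hit", number) or ("miss", string) tuple standing in for the actual bit-level encoding of Non-Patent Document 1. The class and method names are hypothetical.

```python
class StringTablePartition:
    """One string table partition, kept identical on encoder and decoder."""

    def __init__(self, initial=()):
        self.strings = list(initial)
        self.index = {s: i for i, s in enumerate(self.strings)}

    def _register(self, s):
        self.index[s] = len(self.strings)
        self.strings.append(s)

    def encode(self, s):
        # A known string is sent by number; an unknown one is sent as-is
        # and registered so that it can be reused later.
        if s in self.index:
            return ("hit", self.index[s])
        self._register(s)
        return ("miss", s)

    def decode(self, item):
        kind, payload = item
        if kind == "hit":
            return self.strings[payload]
        self._register(payload)  # same change as on the encoder side
        return payload


encoder = StringTablePartition(["red"])
decoder = StringTablePartition(["red"])
sent = [encoder.encode(s) for s in ["blue", "red", "blue"]]
received = [decoder.decode(item) for item in sent]
```

Here only the first "blue" travels as a character string; "red" and the second "blue" travel as numbers.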
The URL (URI) of a name space included in an XML schema used for grammar generation is used to initialize the string table. For example, expression (QName) of a tag name included in a schema is designated by a number using the name space in this initialized string table. Therefore, even in the same grammatical structure, the initial value of the URL included in the string table varies depending on a used schema.
To solve this, a string table initialization vector corresponding to each of individual schemaId's is prepared and stored in a memory (the string table initialization vector store 16). Also, in response to the schemaId of an input EXI stream, a string table initialization vector is selected (the string table initialization vector selection unit 15). The selected string table initialization vector is output to the parser unit 17, and the parser unit 17 initializes the string table by the received string table initialization vector.
The string table will be explained in more detail. The string table is used with four items of (1) URI (URL), (2) prefix, (3) URI and local name in QName and (4) value. For efficient encoding, the string table is divided into the following partitions.
URI: including a character string of “URI” and a URI part in QName
prefix: created for every URI to which the prefix belongs (which is used only in a specific mode and therefore is not described herein)
local name: a table of local names is created for each name space to which the local name belongs
value: dynamically described in both a name space to which an element or attribute of the value belongs and a partition storing a global value
In the following, initialization of the URI partition and the local name partition will be described in detail.
First, a basic schema for explanation will be illustrated in
1. one written order can include an order of multiple (unbounded) types of dishes
2. a dish is designated by its color
An example of the written order based on this schema will be illustrated in
An initialization method of URI partitions is specifically described in Appendix D.1 in Non-Patent Document 1.
According to this, an initialization vector in the basic schema illustrated in
The URI's corresponding to Compact ID's 0 to 3 are constants defined by the specification, and the URI corresponding to the Compact ID 4 or subsequent URI's are name spaces derived from a schema.
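A minimal Python sketch of how such a URI partition initialization vector might be assembled is given below. The four fixed entries are the constants listed in Appendix D.1 of Non-Patent Document 1 for schema-informed operation; appending the schema-derived name spaces in sorted order is this sketch's reading of that appendix, and "urn:example:saucers" is a hypothetical name space.

```python
# Compact IDs 0 to 3: constants defined by the EXI specification.
FIXED_URIS = [
    "",
    "http://www.w3.org/XML/1998/namespace",
    "http://www.w3.org/2001/XMLSchema-instance",
    "http://www.w3.org/2001/XMLSchema",
]


def uri_partition_init_vector(schema_namespaces):
    """Return the URI strings indexed by Compact ID."""
    # Compact ID 4 and later: name spaces derived from the schema
    # (assumed here to be appended in sorted order).
    derived = sorted(set(schema_namespaces) - set(FIXED_URIS))
    return FIXED_URIS + derived


vector = uri_partition_init_vector(["urn:example:saucers"])
```

Because the entries from Compact ID 4 onward depend on the schema, two schemas with the same grammatical structure can still require different initialization vectors.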
Similarly, an initialization vector of a string table corresponding to a local name is created for each name space. Regarding an initialization vector derived from the terms of XML, see Appendix D.3 in Non-Patent Document 1. Here, only an initialization vector derived from a schema will be specifically described.
The local names (i.e., an initialization vector) derived from the basic schema illustrated in
Although an explanation has been given using the basic schema as an example, the same applies to the case of an extension schema (described later).
As described above, the string table initialization vector selection unit 15 selects a string table initialization vector (in the above example, each table such as an URI partition and a local name partition) according to the schemaId reported from the header analysis unit 12 and reports it to the parser unit 17. The parser unit 17 initializes the string table by the reported initialization vector.
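The selection described above might look like the following Python sketch, which stores every string once and keeps per-schemaId references to the strings, in the manner described for the string table initialization vector store 16. Only the URI partition is modeled; the schemaId values, the two "urn:example" name spaces, and the class and function names are assumptions.

```python
# Stands in for the ROM area in which every string is stored exactly once.
STRING_POOL = (
    "",
    "http://www.w3.org/XML/1998/namespace",
    "http://www.w3.org/2001/XMLSchema-instance",
    "http://www.w3.org/2001/XMLSchema",
    "urn:example:saucers",      # hypothetical name space of the basic schema
    "urn:example:saucers-ext",  # hypothetical name space of the extension schema
)

# References into STRING_POOL for each (hypothetical) schemaId.
URI_INIT_VECTORS = {
    "basic": (0, 1, 2, 3, 4),
    "extension": (0, 1, 2, 3, 4, 5),
}


def select_uri_init_vector(schema_id):
    """Role of the selection unit: pick the vector for the reported schemaId."""
    return [STRING_POOL[i] for i in URI_INIT_VECTORS[schema_id]]


class Parser:
    def __init__(self):
        self.uri_partition = []

    def initialize_string_table(self, init_vector):
        # Copy, so that later additions to the table leave the store untouched.
        self.uri_partition = list(init_vector)


parser = Parser()
parser.initialize_string_table(select_uri_init_vector("extension"))
```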
Parse processing in the parser unit 17 in EXI corresponds to a pushdown automaton and is performed in the following steps. The parser unit 17 is given the grammar set table corresponding to the schemaId from the grammar selection unit 13. Since the initial grammar is determined in advance by the EXI specification, decoding starts from the initialization state corresponding to the grammar set table, in the following steps.
1. reading data from a stream by a bit width designated by the current grammar and processing this as an event code (which is an event code included in a pair of the above event code (EventCode) and value (Value))
2. reading transition corresponding to the event code from a transition table corresponding to the current state
3. recording an event type and reading a corresponding value (which is a value included in a pair of the above event code (Event Code) and value (Value))
Regarding the “value” in this case, a reading method is defined by a “value type” recorded in the transition.
It should be noted that, in a case where an event type is SE or AT, the corresponding value may indicate a different type grammar itself. At this time, parse processing recursively shifts to the designated type grammar. When the shifted type grammar is terminated, it returns to the current grammar processing. Therefore, a value indicating the transition destination grammar may be referred to as “terminal.”
4. indicating the next state in a case where the current grammar continues (i.e. there is an event that should still be read). Since the current grammar is not terminated, a value at this time may be referred to as “nonterminal.”
Regarding more detailed explanation of the parse processing, see Non-Patent Document 1.
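The four steps above can be sketched as a small pushdown-style loop in Python. This is a simplified model and not the full procedure of Non-Patent Document 1: event codes are single-part, the only built-in value type is a hypothetical 8-bit unsigned integer named "uint8", and the two grammars, their transitions and the "plate" tag layout are assumptions made for the example.

```python
class BitReader:
    def __init__(self, bits):
        self.bits, self.pos = bits, 0

    def read(self, width):
        chunk = self.bits[self.pos:self.pos + width]
        self.pos += width
        return int(chunk, 2) if width else 0


# label -> state -> list of (event type, name, value type, next state)
GRAMMARS = {
    "orderType": {0: [("SE", "plate", "plateType", 0), ("EE", None, None, None)]},
    "plateType": {0: [("CH", None, "uint8", 1)], 1: [("EE", None, None, None)]},
}


def parse(reader, grammars, start):
    events = []
    stack = [(start, 0)]  # (grammar label, current state)
    while stack:
        label, state = stack.pop()
        transitions = grammars[label][state]
        # Step 1: read an event code with the bit width fixed by the current state.
        width = (len(transitions) - 1).bit_length()
        # Step 2: look up the transition for that event code.
        event_type, name, value_type, next_state = transitions[reader.read(width)]
        # Step 4: if the current grammar continues, remember where to resume.
        if next_state is not None:
            stack.append((label, next_state))
        # Step 3: record the event and read its value.
        if value_type in grammars:
            events.append((event_type, name))
            stack.append((value_type, 0))  # shift to the designated type grammar
        elif value_type == "uint8":
            events.append((event_type, reader.read(8)))
        else:
            events.append((event_type, name))
    return events


reader = BitReader("0" + "00000011" + "0" + "00000111" + "1")
events = parse(reader, GRAMMARS, "orderType")
```

The explicit stack plays the role of the pushdown store: when a type grammar terminates, processing returns to the grammar that designated it.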
In the following, a specific example of the present embodiment will be shown based on the above basic schema (
As described above,
1. one written order can include an order of multiple (unbounded) types of dishes
2. a dish is designated by its color
Here, it is assumed that, with business expansion, SaucersCo. begins handling patterned dishes. Since the above requirement definition does not include a dish pattern in the written order based on the basic schema, it is not possible to handle dishes of the same color and different patterns. Therefore, it is necessary to extend the schema in some way.
When ordering a patternless dish, an orderer may use a plate tag (plateType type) to describe an XML document. When ordering a patterned dish, the orderer may use a patternedPlate tag (patternedPlateType type) to describe an XML document. A description example of the XML document in the case of ordering a patterned dish will be illustrated in
There is no problem in this scheme in the case of XML processing. However, when EXI is used to improve the operational efficiency of this ordering, an inefficiency becomes apparent. That is, since a set of type grammars (an EXI grammar) needs to be prepared for each of the basic schema and the extension schema, the amount of code that has to be mounted increases linearly as the number of schemas increases, which is inefficient.
In EXI operating in the schema-informed grammar, a set of grammars is generated according to type information defined by the XML schema. Each grammar is a state machine and shared between an encoder and a decoder. A different small number is assigned to each of state transitions in the state machine in accordance with a certain scheme. The assigned number is transmitted to the decoder side with a minimal number of bits used. Thereby, document information is shared on the encoder side and the decoder side.
Here, in the extension schema, the substance of the order tag (orderType type) changes from that of the basic schema. Accordingly, if the state machine corresponding to the basic schema is used without change, decoding is impossible. That is, sharing of information is impossible between the encoder side and the decoder side. To be more specific, the decode fails for the following two reasons: the number assigned to each state transition changes; and the minimal number of bits needed to express the total number of transitions changes.
For example, when the total number of transitions increases from 4 to 5, the number of bits required to express the transition number changes from 2 bits to 3 bits. This number of bits is determined based on the state machine. For this reason, if the state machine is not shared between the encoder side and the decoder side, the numbers of bits to be read are mismatched. Therefore, it is not possible to perform the subsequent decode.
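The bit width in this example is the smallest number of bits n satisfying 2^n >= (number of transitions). A short Python sketch of the calculation, and of the misreading that results when the two sides disagree, is shown below; the function name is hypothetical.

```python
def event_code_width(transition_count):
    """Minimal number of bits that can distinguish transition_count transitions."""
    if transition_count < 1:
        raise ValueError("a state needs at least one transition")
    return (transition_count - 1).bit_length()


encoder_width = event_code_width(5)  # extension schema: 5 transitions
decoder_width = event_code_width(4)  # basic schema: 4 transitions

# The encoder writes transition number 4 using its own width.
bits = format(4, "0{}b".format(encoder_width))
# A decoder holding the basic state machine reads too few bits.
misread = int(bits[:decoder_width], 2)
```

The decoder obtains a different transition number and is also left one bit out of step, so every later read is misaligned.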
Here, based on
Also, in addition to the above order tag change, the basic schema does not include patternedPlateType.
Therefore, it is necessary to hold a set of grammars (state machines) individually corresponding to each of the basic schema in
In the present embodiment, it is decided whether individual grammars are the same between the schemas. A grammar that is the same does not depend on a feature of the stream of each schema, such as the schemaId. Such a grammar is therefore shared in the memory between the schemas.
Whether two state machines (type grammars) are the same is decided as follows: given state machine X and state machine Y, it is checked that every state in X has a corresponding state in Y and that, for each pair of corresponding states in X and Y, all transitions are equivalent. If this holds for all pairs, the two state machines can be said to be identical to each other.
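One possible realization of this identity decision is sketched below in Python. It pairs states starting from the initial states, so it only examines states reachable from them, and it compares transitions in order because the transition order determines the assigned numbers. The dictionary layout and the example grammars are assumptions.

```python
def same_grammar(x, y):
    """Check that state machines x and y match under a state correspondence."""
    mapping = {x["initial"]: y["initial"]}  # state in X -> state in Y
    pending = [x["initial"]]
    while pending:
        sx = pending.pop()
        sy = mapping[sx]
        tx = x["transitions"].get(sx, [])
        ty = y["transitions"].get(sy, [])
        if len(tx) != len(ty):
            return False
        if (sx in x["terminals"]) != (sy in y["terminals"]):
            return False
        for (event_x, next_x), (event_y, next_y) in zip(tx, ty):
            if event_x != event_y:
                return False
            if next_x in mapping:
                if mapping[next_x] != next_y:
                    return False
            else:
                mapping[next_x] = next_y
                pending.append(next_x)
    # Distinct states in X must correspond to distinct states in Y.
    return len(set(mapping.values())) == len(mapping)


SE_COLOR = ("SE", "color", "string")
SE_PATTERN = ("SE", "pattern", "string")
EE = ("EE", None, None)

a = {"initial": 0, "terminals": {2},
     "transitions": {0: [(SE_COLOR, 1)], 1: [(EE, 2)]}}
b = {"initial": 10, "terminals": {12},
     "transitions": {10: [(SE_COLOR, 11)], 11: [(EE, 12)]}}
c = {"initial": 0, "terminals": {2},
     "transitions": {0: [(SE_COLOR, 1), (SE_PATTERN, 1)], 1: [(EE, 2)]}}
```

Here a and b differ only in how their states are numbered and are judged identical, while c has an additional transition and is not.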
The above identity decision is performed on the sets of type grammars of all schemas which are handled, and when there are plural identical type grammars, only one of them is stored in the memory. In this manner, if type grammars corresponding to all schemas are collectively stored, it may become unclear which grammar is used for which schema. Therefore, information as to in which schema each type grammar is present, is stored in the form of a bitmap or the like. The grammar store 14 (
In the case of the present example, among orderType, plateType and patternedPlateType defined in the extension schema, the type grammar corresponding to plateType is common to the basic schema (as seen from the comparison between
Thus, by installing a mechanism to switch grammars, it is possible to minimize the amount of grammars that have to be prepared for many schemas. That is, in an EXI processor switching and using a plurality of schema-informed grammars, a number of common grammars is maximized and used grammars are switched based on a feature of an EXI stream so that it is possible to reduce the program size and the memory usage.
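The storage side of this mechanism can be sketched in Python as follows: identical type grammars are stored once, and a bitmap per schema records which stored grammars the schema uses. For brevity, grammars are represented as hashable tuples and compared with plain equality; the schemaId values and transition strings are assumptions.

```python
def build_store(schemas):
    """schemas: {schema_id: {type_label: grammar}} with hashable grammars."""
    stored = []   # each distinct type grammar, stored exactly once
    bitmaps = {}  # schema_id -> bitmap over positions in `stored`
    for schema_id, grammars in schemas.items():
        bitmap = 0
        for grammar in grammars.values():
            if grammar not in stored:
                stored.append(grammar)
            bitmap |= 1 << stored.index(grammar)
        bitmaps[schema_id] = bitmap
    return stored, bitmaps


PLATE = ("plateType", ("SE color", "EE"))
schemas = {
    "basic": {
        "orderType": ("orderType", ("SE plate", "EE")),
        "plateType": PLATE,
    },
    "extension": {
        "orderType": ("orderType", ("SE plate", "SE patternedPlate", "EE")),
        "plateType": PLATE,
        "patternedPlateType": ("patternedPlateType", ("SE color", "SE pattern", "EE")),
    },
}
stored, bitmaps = build_store(schemas)
```

Five type grammars across the two schemas reduce to four stored grammars, because plateType is common to both.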
The present embodiment is available for communication of a built-in device. For example, it is assumed that there is a home network protocol that performs communication by performing EXI-coding of a payload defined by the XML schema. In the home network often using a radio, a strict schema-informed grammar that can reduce the payload size is an advantageous scheme. It is necessary to support individual extension schemas for extension. In the related art, the sets of grammars corresponding to all schemas are individually prepared and have to be stored in a memory each time an XML schema is extended. This is inadequate for use in a system with strict resource restriction.
On the other hand, according to the proposed scheme, it is possible to implement an extension schema by only incorporating a grammar(s) corresponding to an extended difference. By this means, it is possible to easily mount various extensions on a built-in device such as lower-price and lower-spec home electronics, sensor and meter.
The EXI grammar size is substantially proportional to the number of state transitions. Accordingly, in a case where there are two grammar sets in which one grammar set shares 95% of its grammars with the other and holds 5% extended grammars, it is sufficient in the proposed scheme to hold only 105% of the grammars (for example, basic schema 100% and extension schema 5%), whereas in the related art it is necessary to hold both grammar sets in their entirety.
In the above explanation, although the present embodiment and its advantages have been described with respect to a decoder, the present embodiment is similarly applicable to an encoder. By utilizing the state machines (a set of type grammars) of the basic schema, only a state machine (a type grammar) corresponding to an extended type needs to be incorporated for the extension schema. Thereby, it is possible to share the state machines between the basic schema and the extension schema. Therefore, it is possible to save the storage area size of the encoder.
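The arithmetic behind this comparison can be checked with a few lines of Python. The sketch assumes that the extension grammar set consists of exactly the shared 95% plus the extended 5%, expressed relative to a basic grammar set of size 100; the function name is hypothetical.

```python
def required_storage(base, shared, extended):
    """Return (related art, proposed scheme) storage in the same units as base."""
    separate = base + (shared + extended)  # both grammar sets held in full
    shared_store = base + extended         # common grammars held only once
    return separate, shared_store


related_art, proposed = required_storage(base=100, shared=95, extended=5)
```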
As described above, according to the present embodiment, it is possible to implement the extension schema without making change to state machines (type grammars) of the basic schema. As a result, a common part between basic-schema-informed state machines and extension-schema-informed state machines is maximized, the program size and the memory usage can be reduced and the number of built-in device types that can be mounted in an EXI processing system can increase.
The EXI decoder of this embodiment may also be realized using a general-purpose computer device as basic hardware. That is, the stream input unit, the header analysis unit, the grammar selection unit, the string table initialization vector selection unit and the parser unit can be realized by causing a processor mounted in the above described computer device to execute a program. In this case, the EXI decoder may be realized by installing the above described program in the computer device beforehand or may be realized by storing the program in a storage medium such as a CD-ROM or distributing the above described program over a network and installing this program in the computer device as appropriate. Furthermore, the grammar store and the string table initialization vector store may also be realized using a memory device or hard disk incorporated in or externally added to the above described computer device or a storage medium such as CD-R, CD-RW, DVD-RAM, DVD-R as appropriate.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2011-231996 | Oct 2011 | JP | national |