This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2013-060587, filed on Mar. 22, 2013; the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to a decoding device and a computer program product.
The amount of data in structured documents such as XML documents keeps growing, which makes such documents poorly suited to high-speed data processing and to processing that handles large numbers of XML documents. Efficient XML Interchange (EXI) has therefore been proposed as a standard for efficient, high-speed data processing. EXI converts an XML document into an EXI stream, which is a binary representation produced according to the XML schema. Because the binary data is far smaller in volume, this contributes to efficient data communication and processing.
Furthermore, when a user wants to check data binarized in this way, the user inputs the EXI stream to a decoding device having the same logic as the state machine used to binarize the XML document, and the original XML document is output. Since the output XML document is human-readable text, the user can check its content.
The EXI stream is encoded or decoded bit by bit. Typically, reading and writing data bit by bit imposes a heavy load and tends to decrease the processing speed. For a decoding device installed in a server that receives all the EXI streams output from numerous devices, or for a decoding device installed in a device with low processing speed, bit-by-bit reading and writing may not be fast enough.
According to an embodiment, a decoding device includes a decoder, a holding unit, and a retention determiner. The decoder decodes binary data into a structured document according to a state machine that has been used to convert the structured document into the binary data. The holding unit includes a first level cache and a second level cache. The first level cache holds the binary data together with a result of decoding the binary data into the structured document by the decoder. The second level cache holds partial data pieces, into which the binary data held by the first level cache is divided in predetermined units of events of the structured document, and the result of decoding that corresponds to the partial data pieces. The retention determiner generates the partial data pieces by dividing the binary data held by the first level cache in the predetermined units of events, and stores the generated partial data pieces and the result of decoding corresponding to the partial data pieces into the second level cache. When the binary data that is input includes a part matching a partial data piece held by the second level cache in the predetermined units of events, the decoder outputs the result of decoding corresponding to the matching partial data piece held by the second level cache.
An embodiment in which a decoding device is embodied as a smart meter will be described below.
The search is performed as follows. First, the second level cache is searched for a partial EXI stream whose state matches the current state of the state machine of the EXI stream decoding unit 301. If such a partial EXI stream is found, the binary data constituting the partial EXI stream is compared with the received binary data, and the partial EXI stream is determined to match only if all bits are identical. A partial EXI stream is binary data extracted from part of an EXI stream. Partial EXI streams are obtained by dividing an EXI stream held by the holding unit 302 in predetermined units of events. The unit of events is the data width of a code corresponding to a transition of the state machine used for binarization and decoding of a structured document, or the data width of a content of the structured document.
If there is a matching partial EXI stream in a second level cache of the holding unit 302 (step S401: Yes), the EXI stream decoding unit 301 skips the decoding process for the partial EXI stream, uses the decoding result held by the holding unit 302, and resumes decoding of the remaining EXI stream from a state corresponding to a decoding end position (step S402). If there is no matching partial EXI stream (step S401: No), the EXI stream decoding unit 301 performs normal decoding (step S404). The EXI stream decoding unit 301 then determines whether or not decoding of the whole EXI stream is completed (step S403), and terminates the process if completed.
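The flow of steps S401 to S404 can be illustrated with a minimal sketch. This is not the implementation of the embodiment: the names CacheEntry, decode_with_cache, and stub_decoder are hypothetical, the second level cache is modeled as a plain list, and the normal decoder is replaced by a stub that knows only one case. The bit strings and events are those of the example streams used in this description.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[str, str]  # (Type, State) position in the state machine


@dataclass
class CacheEntry:
    """One second level cache record (field names are illustrative)."""
    bits: str           # partial EXI stream as a string of '0'/'1'
    events: List[str]   # decoding result for those bits
    start: State        # state machine position before the bits
    end: State          # state machine position after the bits
    references: int = 0


# Hypothetical fallback: (remaining bits, state) -> (events, bits used, new state)
NormalDecoder = Callable[[str, State], Tuple[List[str], int, State]]


def decode_with_cache(stream: str, state: State, cache: List[CacheEntry],
                      decode_normally: NormalDecoder) -> List[str]:
    out: List[str] = []
    pos = 0
    while pos < len(stream):                     # S403: whole stream decoded?
        hit = next((e for e in cache             # S401: same state and same bits
                    if e.bits and e.start == state
                    and stream.startswith(e.bits, pos)), None)
        if hit is not None:                      # S402: reuse the held result
            out.extend(hit.events)
            pos += len(hit.bits)
            state = hit.end
            hit.references += 1
        else:                                    # S404: normal decoding
            events, used, state = decode_normally(stream[pos:], state)
            out.extend(events)
            pos += used
    return out


def stub_decoder(bits: str, state: State) -> Tuple[List[str], int, State]:
    # Stand-in for the real state machine: it only knows the one case below.
    assert (bits, state) == ("1000000", ("B", "Term1"))
    return (["Value(1)", "EndElement", "EndElement", "EndDocument"],
            len(bits), ("Document", "Term2"))


cache = [CacheEntry("10000000100000001",
                    ["StartElement(A)", "StartElement(B)", "Character(Boolean)"],
                    ("Document", "init"), ("B", "Term1"))]
result = decode_with_cache("100000001000000011000000", ("Document", "init"),
                           cache, stub_decoder)
print(" ".join(result))
```

In this sketch the first matching entry is used. Whether a longer match should be preferred when several entries match is a design choice that the sketch leaves open.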
Next, decoding of an EXI stream “1000 0000 1000 0000 1000 0000” will be described, followed by decoding of an EXI stream “1000 0000 1000 0000 1100 0000”. Since there is no data in the holding unit 302 when the first EXI stream is to be decoded, the EXI stream decoding unit 301 performs normal decoding from the start to the end according to the decoding rule.
If an XML document obtained by decoding “1000 0000 1000 0000 1000 0000” is:
<A><B>0</B></A>
the event and the content that are the decoding result will be:
StartElement(A) StartElement(B) Character(Boolean) Value(0) EndElement EndElement EndDocument.
The EXI stream decoding unit 301 provides the EXI stream and the decoding result, together with the information on the state machine for the decoding rule having events as transitions, to the holding unit 302. The holding unit 302 holds the information in the first level cache. The retention determining unit 303 checks at a predetermined timing whether data is present in the holding unit 302. At this point, data is present only in the first level cache.
The retention determining unit 303 divides the data held in the first level cache into two partial EXI streams as follows:
Partial EXI stream: 10000000100000001
Decoding result: StartElement(A) StartElement(B) Character(Boolean)
Start position: Type=Document, State=init
End position: Type=B, State=Term1
Number of references: 0
Partial EXI stream: 0000000
Decoding result: Value(0) EndElement EndElement EndDocument
Start position: Type=B, State=Term1
End position: Type=Document, State=Term2
Number of references: 0.
The retention determining unit 303 then holds the two partial EXI streams obtained by the division in the second level cache of the holding unit 302. In this case, the data obtained by the division may overlap in such a manner as:
StartElement(A) StartElement(B) Character(Boolean) StartElement(B) Character(Boolean).
If the number of data pieces to be held is to be limited, a condition may be imposed that a partial EXI stream having a length equal to or shorter than a threshold is not held. Since the processing load of decoding a short partial EXI stream is not very heavy, this reduces the capacity of the second level cache while maintaining the processing efficiency. It is assumed here that there is no second data to be held. “Type”, “State”, and the values of the start position and the end position are names used to represent positions in the state machine for the decoding rule, and any names may be used as long as the positions can be identified. In this example, the EXI stream is divided immediately before a content.
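The division immediately before a content, combined with the length threshold, can be sketched as follows. This is only an illustration under stated assumptions: Step, Piece, and split_before_content are hypothetical names, the per-event bit widths in the trace are invented so that they add up to the 17-bit and 7-bit pieces of the example, and the intermediate state names other than Type=B, State=Term1 and Type=Document, State=Term2 are placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[str, str]  # (Type, State) position in the state machine


@dataclass
class Step:
    """One decoded event or content with the bits that produced it."""
    event: str
    bits: str
    state_after: State


@dataclass
class Piece:
    """One candidate record for the second level cache."""
    bits: str
    events: List[str]
    start: State
    end: State
    references: int = 0


def split_before_content(steps: List[Step], start: State,
                         min_bits: int = 0) -> List[Piece]:
    # Open a new group immediately before every content (Value(...)).
    groups: List[List[Step]] = [[]]
    for step in steps:
        if step.event.startswith("Value(") and groups[-1]:
            groups.append([])
        groups[-1].append(step)

    pieces: List[Piece] = []
    state = start
    for group in groups:
        if not group:
            continue
        bits = "".join(s.bits for s in group)
        end = group[-1].state_after
        if len(bits) > min_bits:  # pieces at or below the threshold are not held
            pieces.append(Piece(bits, [s.event for s in group], state, end))
        state = end
    return pieces


# Assumed trace: bit widths and intermediate states are invented for illustration.
trace = [
    Step("StartElement(A)", "10000000", ("A", "init")),
    Step("StartElement(B)", "10000000", ("B", "init")),
    Step("Character(Boolean)", "1", ("B", "Term1")),
    Step("Value(0)", "0", ("B", "Term2")),
    Step("EndElement", "00", ("A", "Term1")),
    Step("EndElement", "00", ("Document", "Term1")),
    Step("EndDocument", "00", ("Document", "Term2")),
]
pieces = split_before_content(trace, ("Document", "init"))
for p in pieces:
    print(p.bits, p.events, p.start, p.end)
```

Calling split_before_content(trace, ("Document", "init"), min_bits=7) would keep only the 17-bit piece, since the 7-bit piece is not longer than the threshold.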
Subsequently, the EXI stream decoding unit 301 decodes the second EXI stream. The EXI stream decoding unit 301 searches whether or not data matching the current decoding position (the EXI stream and the state machine for the decoding rule) is present in the second level cache of the holding unit 302. Out of the second EXI stream “1000 0000 1000 0000 1100 0000”, data up to “1000 0000 1000 0000 1” matches.
The EXI stream decoding unit 301 thus refers to data:
Partial EXI stream: 10000000100000001
Decoding result: StartElement(A) StartElement(B) Character(Boolean)
Start position: Type=Document, State=init
End position: Type=B, State=Term1
Number of references: 0
as matching data in the second level cache and obtains the data as the decoding result.
Subsequently, since there is no decoded partial EXI stream corresponding to the remaining EXI stream “1000000” in the second level cache, the EXI stream decoding unit 301 decodes the remaining EXI stream from the end position.
If the events and the content resulting from decoding the remaining EXI stream “1000000” are:
Value(1) EndElement EndElement EndDocument
the entire decoding result including the referenced data will be:
StartElement(A) StartElement(B) Character(Boolean) Value(1) EndElement EndElement EndDocument.
With the decoding device according to the present embodiment as described above, a result of decoding a partial EXI stream is used for decoding an EXI stream if a decoded partial EXI stream having the same bits is present, which allows redundant processing in decoding to be skipped. It is therefore possible to decode an EXI stream more efficiently.
Next, an embodiment in which a decoding device is installed as a home server 102 will be described.
Specifically, it is assumed that the results of decoding three EXI streams:
100000001000000010000000
100000010000000010000000
1000000010000000100000001000000011000000
are:
StartElement(A) StartElement(B) Character(Boolean) Value(0) EndElement EndElement EndDocument
StartElement(A) StartElement(C) Character(Boolean) Value(0) EndElement EndElement EndDocument
StartElement(A) StartElement(B) Character(Boolean) Value(0) EndElement StartElement(B) Character(Boolean) Value(1) EndElement EndElement EndDocument.
In this case, the state corresponding to <B> </B> in an XML document occurs three times. If the threshold is three, partial data containing this state out of the divided EXI stream is held in the second level cache. Alternatively, counting may be performed before dividing the EXI streams and the EXI streams may be divided according to the counting results.
According to the present embodiment, a partial EXI stream that frequently occurs is selectively held, which allows efficient decoding while reducing the cache capacity.
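The counting described above can be sketched as follows. The sketch assumes that occurrences are counted per StartElement event across the decoding results and that a piece is held when it contains an event whose count reaches the threshold. Both the counting key and the helper name should_hold are assumptions made for illustration, not details taken from the embodiment.

```python
from collections import Counter
from typing import Iterable, List, Set

# The three decoding results from the example above.
results: List[List[str]] = [
    "StartElement(A) StartElement(B) Character(Boolean) Value(0) "
    "EndElement EndElement EndDocument".split(),
    "StartElement(A) StartElement(C) Character(Boolean) Value(0) "
    "EndElement EndElement EndDocument".split(),
    "StartElement(A) StartElement(B) Character(Boolean) Value(0) EndElement "
    "StartElement(B) Character(Boolean) Value(1) EndElement "
    "EndElement EndDocument".split(),
]

# Count how often each element is opened across all decoding results.
counts = Counter(event for result in results for event in result
                 if event.startswith("StartElement("))

THRESHOLD = 3
frequent: Set[str] = {event for event, n in counts.items() if n >= THRESHOLD}


def should_hold(piece_events: Iterable[str], frequent_events: Set[str]) -> bool:
    """Hold a piece only if it contains a frequently occurring element."""
    return any(event in frequent_events for event in piece_events)


print(counts["StartElement(B)"], counts["StartElement(C)"])
print(should_hold(["StartElement(B)", "Character(Boolean)"], frequent))
print(should_hold(["StartElement(C)", "Character(Boolean)"], frequent))
```

Under this particular counting key, StartElement(A) also occurs three times and would qualify as well. A different key, such as the state machine position, could give a different selection.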
The decoding device according to the embodiments described above includes a control device such as a CPU, a storage device such as a read only memory (ROM) and a random access memory (RAM), an external storage device such as an HDD and a CD drive, a display device such as a display, and an input device such as a keyboard and a mouse; that is, it has a hardware configuration using a common computer system.
Programs to be executed by the decoding device according to the embodiments described above are recorded on a computer readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, and a digital versatile disk (DVD) in a form of a file that can be installed or executed, and provided therefrom.
Alternatively, the programs in the embodiments described above may be stored on a computer system connected to a network such as the Internet, and provided by being downloaded via the network. Still alternatively, the programs to be executed by the decoding device according to the embodiments described above may be provided or distributed through a network such as the Internet. Still alternatively, the programs in the embodiments described above may be embedded on a ROM or the like in advance and provided therefrom.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2013-060587 | Mar 2013 | JP | national |