IBM® is a registered trademark of International Business Machines Corporation, Armonk, N.Y., U.S.A. Other names used herein may be registered trademarks, trademarks or product names of International Business Machines Corporation or other companies.
1. Field of the Invention
This invention relates to parsing and validation of XML documents, and particularly to a method for representing the XCodes parsing and validation execution plan by using an abstract, platform-independent language.
2. Description of Background
Bytecode-compiled systems use a highly specific stream of instructions to represent an execution plan for parsing and validation. The instructions themselves, in abstract, are not platform or target-language specific, but they are an in-memory representation that is highly tuned for the target platform. This instruction stream, which is produced by a compiler, is passed to the machine in some form.
In typical interpreters (Tcl, Lisp, Java, etc.), instruction streams are created in a binary, byte-code form, which is passed to the interpreter as well as stored on disk for reuse. This method, while straightforward, has the disadvantage of binding the representation of the instructions to the execution environment on which they are used. This close binding between the compiled form and the execution environment imposes rigid constraints on the possible variability of the execution environments. The byte-code interpreters are thus truly virtual machines, with a rigid, virtual environment.
High-level, domain-specific virtual machines, such as the high-level virtual machine for parsing and validation of XML documents, can be hosted on a wide variety of native environments. Indeed, in the referenced system, separate implementations have been designed for both a directly compiled native environment (written in the C language) and a general-purpose, low-level virtual machine (Java language). These two platforms have divergent strengths and weaknesses, and thus the binary, in-memory instruction stream representations for the two platforms are quite dissimilar.
In addition to the practical difficulties of sharing a binary instruction representation between several target platforms, binary instruction streams are difficult to extend. Since all instructions in a given stream must be at least partially understood by every interpreter (at least enough to disregard them, and to distinguish the difference between unsupported extensions, and corruptions in a completely standard instruction stream), the extension instructions must also be partially understood by every interpreter. For real extensibility to work, then, the instructions must be self-describing. This creates an overhead (in both time and size) for extension instructions, thus limiting their usefulness.
Considering the above limitations, it is desired to have a high-level platform-independent representation of the instruction stream that can be shared among disparate virtual machine implementations, allowing for greater reusability of the byte-code compilation engine.
The shortcomings of the prior art are overcome and additional advantages are provided through the provision of a method for constructing and executing an XCodes execution plan stored in a self-contained, platform-independent format, the method comprising: providing a plurality of Extensible Markup Language (XML) documents each having content, structure, and a plurality of instructions; identifying a language in which the content of the plurality of XML documents is written; converting the language to a set of abstract, platform-independent instructions (XCodes) representing the structure of the plurality of the XML documents, via a compilation step; converting the set of abstract, platform-independent instructions (XCodes) to a highly optimized, platform-specific form via a loading process; mandating an instruction-space allocation; allowing one or more extension instructions into the XCodes execution plan; setting symbolic references to the one or more extension instructions; ignoring the one or more extension instructions having the symbolic references; and constructing implementation-specific tables.
The shortcomings of the prior art are overcome and additional advantages are provided through the provision of a computer program product for constructing and executing an XCodes execution plan stored in a self-contained, platform-independent format, the computer program product comprising: a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising: providing a plurality of Extensible Markup Language (XML) documents each having content, structure, and a plurality of instructions; identifying a language in which the content of the plurality of XML documents is written; converting the language to a set of abstract, platform-independent instructions (XCodes) representing the structure of the plurality of the XML documents, via a compilation step; converting the set of abstract, platform-independent instructions (XCodes) to a highly optimized, platform-specific form via a loading process; mandating an instruction-space allocation; allowing one or more extension instructions into the XCodes execution plan; setting symbolic references to the one or more extension instructions; ignoring the one or more extension instructions having the symbolic references; and constructing implementation-specific tables.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the description and the drawings.
As a result of the summarized invention, technically we have achieved a solution for representing an XML parsing and validation execution plan by using XCodes, an abstract, platform-independent language.
The subject matter, which is regarded as the invention, is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
One aspect of the exemplary embodiments is a method for using an XML-based language that represents the XCodes instruction set in an abstract form, without reference to its platform-specific in-memory representation. Another aspect of the exemplary embodiments is a method wherein this representation is designed, explicitly, to make no such references (instruction numbers, stream offsets, etc.) while still retaining the low-level flow-control model of an instruction set. In yet another exemplary embodiment, a simple loading process converts this form to a platform-optimal representation when execution begins. The XML format provides a built-in extensibility mechanism, as well as a simple means of verifying the integrity of the instruction stream.
As an example of the implementation independence, the XCodes execution plan uses only symbolic (i.e., named) references to instructions, rather than integer codes, which would mandate a particular instruction-space allocation. Furthermore, all cross-references in the set are symbolic (i.e., indirected by name, rather than by offset). The language is also designed to allow the introduction of optional extension instructions into the execution plan. Through established conventions, these instructions are easily identified and ignored by interpreter implementations that do not support them, and just as easily integrated into the native instruction stream by interpreters that do.
The embodiment of this language is an XML dialect, which is described below. The XML syntax provides a convenient platform-neutral exchange format, and through the use of namespaces, provides a natural system of conventions to support extensibility. While not required to be materialized on disk (the XML language can easily be realized in memory with any of the usual, standard APIs), the XCodes language also provides a useful physical artifact to cache the results of a compilation that may involve many configuration options, and input schema documents.
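As an illustration only, the namespace convention described above might be sketched as follows. The namespace URIs, element names, and attribute names are hypothetical and are not taken from the XCodes dialect itself; the sketch shows only that a reader can separate core instructions from extension instructions by inspecting each element's namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the actual XCodes namespaces are not given here.
CORE_NS = "urn:example:xcodes:core"
EXT_NS = "urn:example:xcodes:ext"

# A hypothetical three-instruction plan. Instructions are identified by element
# name and refer to one another by label, never by number or offset.
PLAN_XML = f"""
<x:plan xmlns:x="{CORE_NS}" xmlns:e="{EXT_NS}">
  <x:read-element label="start" name="order" next="check"/>
  <e:trace label="check" message="validating order" next="done"/>
  <x:accept label="done"/>
</x:plan>
""".strip()


def split_instructions(xml_text):
    """Return (core, extension) lists of (local_name, attributes) pairs."""
    root = ET.fromstring(xml_text)
    core, extension = [], []
    for elem in root:
        # ElementTree spells qualified names as "{namespace-uri}local-name".
        namespace, _, local = elem.tag[1:].partition("}")
        target = core if namespace == CORE_NS else extension
        target.append((local, dict(elem.attrib)))
    return core, extension


core, extension = split_instructions(PLAN_XML)
```

In this sketch, an interpreter that knows nothing about the extension namespace can still read the whole document and recognize `trace` as an optional instruction rather than as a corrupted core instruction.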
On the interpreter side, the XCodes language is converted to a highly optimized, platform-specific in-memory form through a simple loading process, in which the symbolic references are resolved, extension instructions are resolved or ignored, and various implementation-specific tables are constructed. In this way, the final program representation is carefully optimized for the specific virtual machine, but the compilation technology is not required to be tuned to any specific interpreter.
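A minimal sketch of such a loading process is given below, under stated assumptions: the instruction names, the tuple layout of the symbolic plan, the opcode numbers, and the `[opcode, target]` pair layout are all hypothetical choices made for illustration. Unsupported extension instructions are skipped by redirecting references to their labels to the instruction they would have continued to, which is one possible convention among several.

```python
# Hypothetical core instruction names. The numbering is a private choice of
# this loader and never appears in the platform-independent plan.
CORE_OPCODES = {"read-element": 0, "jump": 1, "accept": 2}

# A symbolic plan: (label, name, is_extension, symbolic target or None).
PLAN = [
    ("start", "read-element", False, "check"),
    ("check", "trace", True, "done"),  # optional extension instruction
    ("done", "accept", False, None),
]


def load(plan, opcodes, extension_opcodes=None):
    """Resolve a symbolic plan into a flat [opcode, target, ...] list."""
    extension_opcodes = extension_opcodes or {}
    kept, alias = [], {}
    for label, name, is_ext, target in plan:
        table = extension_opcodes if is_ext else opcodes
        if name in table:
            kept.append((label, table[name], target))
        elif is_ext:
            # Unsupported extension: ignore it, but remember where it led.
            alias[label] = target
        else:
            # An unknown core instruction indicates a corrupt plan.
            raise ValueError(f"unknown core instruction: {name}")

    # Implementation-specific table: label -> offset in the flat list.
    offsets = {label: 2 * i for i, (label, _, _) in enumerate(kept)}

    def resolve(target):
        seen = set()
        while target in alias:  # step over ignored extensions
            if target in seen:
                raise ValueError("cycle among ignored extension instructions")
            seen.add(target)
            target = alias[target]
        # -1 marks "no successor"; a missing label raises KeyError.
        return -1 if target is None else offsets[target]

    code = []
    for _, opcode, target in kept:
        code += [opcode, resolve(target)]
    return code, offsets


basic_code, basic_offsets = load(PLAN, CORE_OPCODES)
extended_code, _ = load(PLAN, CORE_OPCODES, {"trace": 100})
```

The same symbolic plan yields `[0, 2, 2, -1]` for a loader without the extension and `[0, 2, 100, 4, 2, -1]` for one that supports it, so the producer of the plan needs no knowledge of either loader.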
Referring to
The native form of the instructions in the Java-based interpreter is an array of integer data (instructions and their arguments), which references a variety of more complicated tabular data (BitVector objects, HashMaps, etc.). This structure is an efficient in-memory representation for the given platform, but not necessarily for all platforms. Furthermore, depending on the characteristics of the particular Java virtual machine being used, a different representation might be more efficient, such as a more object-oriented in-memory representation in which each instruction is its own object, and the stream is represented as a series of links between the instruction objects. Neither of these representations is prohibited or favored by the XCodes file layout, which means that they can both share the same producer of XCodes files (the compiler).
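The two in-memory layouts described above can be contrasted with a short sketch. It is written in Python rather than Java for brevity, and the plan contents, instruction names, and helper functions are hypothetical; the point is only that one symbolic plan can be loaded into either an integer array or a chain of linked instruction objects.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical symbolic plan: (label, instruction name, symbolic next label).
PLAN = [
    ("start", "read-element", "body"),
    ("body", "read-text", "done"),
    ("done", "accept", None),
]
NAMES = ["read-element", "read-text", "accept"]  # list index = private opcode


def to_int_array(plan):
    """Flat layout: [opcode, next_offset, opcode, next_offset, ...]."""
    offsets = {label: 2 * i for i, (label, _, _) in enumerate(plan)}
    code = []
    for _, name, nxt in plan:
        code += [NAMES.index(name), -1 if nxt is None else offsets[nxt]]
    return code


@dataclass
class Instruction:
    name: str
    next: Optional["Instruction"] = None


def to_linked_objects(plan):
    """Object layout: one Instruction per entry, joined by direct links."""
    nodes = {label: Instruction(name) for label, name, _ in plan}
    for label, _, nxt in plan:
        nodes[label].next = nodes[nxt] if nxt is not None else None
    return nodes[plan[0][0]]


def trace_array(code):
    """Follow next-offsets through the flat array, collecting names."""
    pc, visited = 0, []
    while pc != -1:
        visited.append(NAMES[code[pc]])
        pc = code[pc + 1]
    return visited


def trace_objects(node):
    """Follow object links, collecting names."""
    visited = []
    while node is not None:
        visited.append(node.name)
        node = node.next
    return visited
```

Both `trace_array(to_int_array(PLAN))` and `trace_objects(to_linked_objects(PLAN))` visit the instructions in the same order, which is the property that lets the two interpreters share one compiler.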
The capabilities of the present invention can be implemented in software, firmware, hardware or some combination thereof.
As one example, one or more aspects of the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer usable media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention. The article of manufacture can be included as a part of a computer system or sold separately.
Additionally, at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform the capabilities of the present invention can be provided.
The flow diagrams depicted herein are just examples. There may be many variations to these diagrams or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order, or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.
While the preferred embodiment of the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.