The instant disclosure relates to computer programs. More specifically, the instant disclosure relates to processing markup language documents.
When working with markup language documents in a computer program, navigating the document's tree structure is inefficient. Many application program interfaces (APIs) such as, for example, the document object model (DOM) or ECMAScript for XML (E4X) support random access to the nodes of an open document. When accessing a markup language document, an application needs to have a count of the number of child nodes of a parent node before iterating through the child nodes. Counting nodes often involves accessing each of the child nodes. Reparsing the relevant portion of the source document as each child node is accessed is costly for the computer program.
In object-oriented programming environments such as, for example, JavaScript and C++, the parsed document may conventionally be represented as a collection of linked programming objects. That is, each node or node member, such as, for example, an attribute or text, is addressed as a language-specific object. These objects are conventionally constructed as the document is parsed and before any reference is made by the application program to individual document nodes. While this approach facilitates efficient random access, it is also very likely that only a portion of the document nodes will actually be referenced by the computer application. The object creation overhead for the unreferenced nodes is wasted, which results in decreased application speed when loading files and increased resource usage.
According to one embodiment, a method includes reading a markup language document. The method also includes parsing the markup language document. The method further includes storing an in-memory document model of the markup language document.
According to another embodiment, a computer program product includes a computer-readable medium having code to read a markup language document. The medium also includes code to parse the markup language document. The medium further includes code to store an in-memory document model of the markup language document.
According to yet another embodiment, an apparatus includes a processor and a memory coupled to the processor, in which the processor is configured to read a markup language document from the memory. The processor is also configured to parse the markup language document from the memory. The processor is further configured to store an in-memory document model of the markup language document.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims. The novel features which are believed to be characteristic of the invention, both as to its organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present invention.
For a more complete understanding of the disclosed system and methods, reference is now made to the following descriptions taken in conjunction with the accompanying drawings.
In one embodiment, the user interface device 110 is referred to broadly and is intended to encompass a suitable processor-based device such as a desktop computer, a laptop computer, a personal digital assistant (PDA) or tablet computer, a smartphone, or other mobile communication or organizer device having access to the network 108. In a further embodiment, the user interface device 110 may access the Internet or other wide area or local area network to access a web application or web service hosted by the server 102 and provide a user interface for enabling a user to enter or receive information such as a count request.
The network 108 may facilitate communications of data between the server 102 and the user interface device 110. The network 108 may include any type of communications network including, but not limited to, a direct PC-to-PC connection, a local area network (LAN), a wide area network (WAN), a modem-to-modem connection, the Internet, a combination of the above, or any other communications network now known or later developed within the networking arts which permits two or more computers to communicate, one with another.
In one embodiment, the user interface device 110 accesses the server 102 through an intermediate server (not shown). For example, in a cloud application the user interface device 110 may access an application server. The application server fulfills requests from the user interface device 110 by accessing a database management system (DBMS). In this embodiment, the user interface device 110 may be a computer executing a Java application making requests to a JBOSS server executing on a Linux server, which fulfills the requests by accessing a relational database management system (RDBMS) on a mainframe server.
In one embodiment, the server 102 is configured to store databases, pages, tables, and/or records. Additionally, scripts on the server 102 may access data stored in the data storage device 106 via a Storage Area Network (SAN) connection, a LAN, a data bus, or the like. The data storage device 106 may include a hard disk, including hard disks arranged in a Redundant Array of Independent Disks (RAID) array, a tape storage drive comprising a physical or virtual magnetic tape data storage device, an optical storage device, or the like. The data may be arranged in a database and accessible through Structured Query Language (SQL) queries, or other database query languages or operations.
In one embodiment, the server 102 may submit a query to select data from the storage devices 204, 206. The server 102 may store consolidated data sets in a consolidated data storage device 210. In such an embodiment, the server 102 may refer back to the consolidated data storage device 210 to obtain nodes of a markup language document. Alternatively, the server 102 may query each of the data storage devices 204, 206, 208 independently or in a distributed query to obtain the set of data elements. In another alternative embodiment, multiple databases may be stored on a single consolidated data storage device 210.
In various embodiments, the server 102 may communicate with the data storage devices 204, 206, 208 over the data-bus 202. The data-bus 202 may comprise a SAN, a LAN, or the like. The communication infrastructure may include Ethernet, Fibre-Channel Arbitrated Loop (FC-AL), Fibre-Channel over Ethernet (FCoE), Small Computer System Interface (SCSI), Internet Small Computer System Interface (iSCSI), Serial Advanced Technology Attachment (SATA), Advanced Technology Attachment (ATA), cloud attached storage, and/or other similar data communication schemes associated with data storage and communication. For example, the server 102 may communicate indirectly with the data storage devices 204, 206, 208, 210, with the server 102 first communicating with a storage server or the storage controller 104.
The server 102 may include modules for interfacing with the data storage devices 204, 206, 208, 210, interfacing with the network 108, interfacing with a user through the user interface device 110, and the like. In a further embodiment, the server 102 may host an engine, application plug-in, or application programming interface (API).
The computer system 300 also may include random access memory (RAM) 308, which may be SRAM, DRAM, SDRAM, or the like. The computer system 300 may utilize the RAM 308 to store the various data structures, such as markup language documents, used by a software application. The computer system 300 may also include read only memory (ROM) 306, which may be PROM, EPROM, EEPROM, optical storage, or the like. The ROM may store configuration information for booting the computer system 300. The RAM 308 and the ROM 306 hold user and system data.
The computer system 300 may also include an input/output (I/O) adapter 310, a communications adapter 314, a user interface adapter 316, and a display adapter 322. The I/O adapter 310 and/or the user interface adapter 316 may, in certain embodiments, enable a user to interact with the computer system 300. In a further embodiment, the display adapter 322 may display a graphical user interface associated with a software or web-based application. For example, the display adapter 322 may display menus allowing an administrator to input data on the server 102 through the user interface adapter 316.
The I/O adapter 310 may connect one or more storage devices 312, such as one or more of a hard drive, a compact disk (CD) drive, a floppy disk drive, and a tape drive, to the computer system 300. The communications adapter 314 may be adapted to couple the computer system 300 to the network 108, which may be one or more of a LAN, WAN, and/or the Internet. The communications adapter 314 may be adapted to couple the computer system 300 to a storage device 312. The user interface adapter 316 couples user input devices, such as a keyboard 320 and a pointing device 318, to the computer system 300. The display adapter 322 may be driven by the CPU 302 to control the display on the display device 324.
The applications of the present disclosure are not limited to the architecture of computer system 300. Rather the computer system 300 is provided as an example of one type of computing device that may be adapted to perform the functions of a server 102 and/or the user interface device 110. For example, any suitable processor-based device may be utilized including, without limitation, personal digital assistants (PDAs), tablet computers, smartphones, computer game consoles, and multi-processor servers. Moreover, the systems and methods of the present disclosure may be implemented on application specific integrated circuits (ASIC), very large scale integrated (VLSI) circuits, or other circuitry. In fact, persons of ordinary skill in the art may utilize any number of suitable structures capable of executing logical operations according to the described embodiments.
According to one embodiment, an in-memory document model may be created from a markup language document while parsing the markup language document to allow efficient random access in a computer program. The model may include small fixed-size memory structures allocated from a single larger memory pool. The model may include the data contained in the markup language document and the hierarchical relationships between the data items in the markup language document. Thus, after initially parsing the markup language document, random access to the data of the markup language document is allowed without additional accesses to the file. Additionally, the overhead of proactive language-specific object construction is avoided. When an object-oriented computer program instance references the document model, a language-specific object may be constructed from the model, including a pointer to an element of the model.
Because markup language node string data lengths may be unknown and variable, an estimate of a string length that will accommodate the majority of items found in document nodes is used when allocating memory space to the data structure. According to one embodiment, if a node string exceeds this length the model node item structure may point to a memory area outside the structure that contains the longer string data item. Otherwise, if the node string is less than the estimated length, the string data may be stored within the node.
At block 606, the markup language document is stored as an in-memory document model. Each XML_ITEM may represent a specific attribute, comment, namespace declaration, processing instruction, tag, or text component within the XML document. The XML_ITEM may record its relationships with the other components in the document. The creation of separate “implicit” XML objects may be deferred until they are needed to populate the XMLList objects returned as a result of explicit references to an XML object by the application. Each referenced XML_ITEM may contain a hidden identifier (oxnum) for its implicit XML object to prevent the creation of unnecessary duplicate implicit objects.
An example markup language document is illustrated in
The document model described above may be destroyed through cleanup processing when the various explicit and implicit XML objects are destroyed. As each explicit or implicit XML object is destroyed, a callback function may be called to handle cleanup. When less than the entire document model is requested for destruction, the callback function may take no action. When an object's XML_INFO bXMLList member is set to true and the pHeadItem has an oxnum equal to zero, no processing may occur by the callback function because the object is a previously processed implicit XML object. When neither of these conditions is satisfied, the callback function may: (1) set to zero the oxnum of the XML object's XML_INFO pHeadItem XML_ITEM; (2) release the pPrologHead and pXItemHead lists of XML_ITEMs by calling a relXMLItems() helper function; and (3) release the XML_INFO structure.
Although the present disclosure and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the disclosure as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure of the present invention, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present disclosure. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.