PARTITION BASED STRUCTURED DOCUMENT TRANSFORMATION

Information

  • Patent Application
  • 20130290829
  • Publication Number
    20130290829
  • Date Filed
    May 04, 2012
  • Date Published
    October 31, 2013
Abstract
Various embodiments of systems and methods for transforming a source structured document to a target document are described herein. A request is received for transforming a source structured document. The source structured document is divided into many portions. A transformation rule is applied one by one on the portions of source structured document to obtain portions of the target document. The target document is generated based on the obtained portions of the target document.
Description

This application claims priority under 35 U.S.C. §119 to Chinese Patent Application 201210126487.9, filed on Apr. 26, 2012, titled “PARTITION BASED STRUCTURED DOCUMENT TRANSFORMATION”, which is incorporated herein by reference in its entirety.


FIELD

Embodiments generally relate to computer systems, and more particularly to methods and systems for transforming structured documents.


BACKGROUND

Several reporting applications, such as SAP® Business One, transform business data, received in the form of a structured document, into another format according to the requirements of the user. In many cases, instructions may be provided for transforming the structured document into another format. For example, an Extensible Stylesheet Language Transformations (XSLT) stylesheet may be defined to translate an Extensible Markup Language (XML) document, which is a structured document, into another structured or unstructured document form (such as plain text, word processor, spreadsheet, database, PDF, HTML, etc.).


Typically, for transforming an XML document, an XSLT processor builds a Document Object Model (DOM) tree, which has a node corresponding to each element of the XML document. The XSLT processor then performs the transformation operation on the created DOM tree. The memory consumed by a DOM tree grows linearly with the size of the XML document. Therefore, if the DOM tree of the XML document requires more memory than the system has available, the transformation process may throw an out-of-memory exception.





BRIEF DESCRIPTION OF THE DRAWINGS

The claims set forth the embodiments of the invention with particularity. The invention is illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. The embodiments of the invention, together with its advantages, may be best understood from the following detailed description taken in conjunction with the accompanying drawings.



FIG. 1 is a block diagram illustrating a method for transforming a source structured document to a target document, according to an embodiment.



FIG. 2 is a detailed flow diagram illustrating a method for transforming a source structured document to a target document, according to an embodiment.



FIG. 3 is an exemplary block diagram illustrating a source structured document index file, according to an embodiment.



FIGS. 4A-B are exemplary block diagrams illustrating a transformation file for transforming the source structured document index file of FIG. 3, according to an embodiment.



FIG. 5 is an exemplary block diagram illustrating an interim result file obtained after transforming the source structured document index file of FIG. 3 using the transformation file of FIGS. 4A-B, according to an embodiment.



FIG. 6 is an exemplary block diagram illustrating a first portion of audit data file referred by the first placeholder in the interim result file of FIG. 5, according to an embodiment.



FIG. 7 is an exemplary block diagram illustrating a second portion of audit data file referred by the second placeholder in the interim result file of FIG. 5, according to an embodiment.



FIGS. 8A-B are exemplary block diagrams illustrating a transformation file for transforming the first portion of audit data file of FIG. 6 and the second portion of audit data file of FIG. 7, according to an embodiment.



FIG. 9 is an exemplary block diagram illustrating a first portion of the target document obtained after transforming the first portion of audit data included in the first file of FIG. 6, according to an embodiment.



FIG. 10 is an exemplary block diagram illustrating a second portion of the target document obtained after transforming the second portion of audit data included in the second portion of the audit data file of FIG. 7, according to an embodiment.



FIG. 11 is an exemplary block diagram illustrating a target document obtained based on the first portion of the target document of FIG. 9, the second portion of the target document of FIG. 10, and the interim result file of FIG. 5, according to an embodiment.



FIG. 12 is a block diagram illustrating a computing environment in which the techniques described for partition based structured document transformation can be implemented, according to an embodiment.





DETAILED DESCRIPTION

Embodiments of techniques for partition based structured document transformation are described herein. In the following description, numerous specific details are set forth to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.


Reference throughout this specification to “one embodiment”, “this embodiment” and similar phrases, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of these phrases in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.



FIG. 1 is a block diagram 100 illustrating a method for transforming a source structured document 102 to a target document 104, according to an embodiment. A structured document is an electronic document structured in accordance with one or more structured definition languages, such as HTML (Hyper Text Markup Language), XML (Extensible Markup Language), or WSDL (Web Services Description Language). The structured document may be in the form of a hierarchical tree of nodes. Each node may have a name, a value, and other associated information. For example, consider a source XML document of a school, shown in Table 1:











TABLE 1

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="class.xsl"?>
<school>
  <class>
    <student>James</student>
    <student>Michael</student>
  </class>
  <address>
    <street> XYZ </street>
  </address>
</school>










The source XML document, shown in Table 1, includes nodes <school>, <class>, <student>, <address>, and <street>. The node <student> has values James and Michael, and the node <street> has value XYZ. The transformation operation may be performed on the source structured document 102 to obtain a target document 104 according to the requirements of the user. For transforming the source structured document 102, initially a divide operation 106 is performed on the source structured document 102 to obtain several portions 108 of the source structured document 102. The source structured document 102 may be divided based on the nodes in the source structured document 102. As shown, the source structured document 102 is divided into a portion 1 110 and a portion 2 112 of the source structured document 102. In the above example, the source structured document, in Table 1, may be divided according to the <class> node and the <address> node to obtain two portions of the source structured document, shown in Table 2 and Table 3, respectively.











TABLE 2

(Portion 1)
<student>James</student>
<student>Michael</student>










and











TABLE 3

(Portion 2)
<street> XYZ </street>
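
The divide operation on the school example can be sketched in Python. This is a minimal illustration rather than the claimed implementation: it assumes the standard-library `xml.etree.ElementTree` parser, and the function name `divide` and its `split_tags` parameter are hypothetical names introduced here. Each selected node contributes one portion made of its child elements, mirroring Tables 2 and 3.

```python
import xml.etree.ElementTree as ET

# The source document of Table 1, without the XML declaration and stylesheet line.
SCHOOL_XML = """<school>
  <class>
    <student>James</student>
    <student>Michael</student>
  </class>
  <address>
    <street> XYZ </street>
  </address>
</school>"""


def divide(source_xml, split_tags):
    """Return one portion per selected node: the serialized children of that node."""
    root = ET.fromstring(source_xml)
    portions = []
    for node in root.iter():  # document order
        if node.tag in split_tags:
            # Serialize each child; strip() drops the indentation kept in the tail.
            children = [ET.tostring(child, encoding="unicode").strip() for child in node]
            portions.append(children)
    return portions


portions = divide(SCHOOL_XML, {"class", "address"})
for number, portion in enumerate(portions, start=1):
    print(f"Portion {number}: {portion}")
```

Note that this sketch still parses the whole source document in order to split it; it only illustrates what each portion contains.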










Next, a transform operation 114 may be performed on the portions 108 of the source structured document 102 to obtain portions 116 of the target document 104. The transform operation 114 may be performed in several steps; at each step, one of the portions 108 of the source structured document 102 may be transformed to obtain one of the portions 116 of the target document 104. For example, a transform operation 114 may be performed on the portion 1 110 of the source structured document 102 to obtain a portion 1 118 of the target document 104. Next, a transform operation 114 may be performed on the portion 2 112 of the source structured document 102 to obtain a portion 2 120 of the target document 104. In one embodiment, transformation rules are defined to transform the portions 108 of the source structured document 102 to obtain the portions 116 of the target document 104. In the above example, transformation rules may be defined to transform each <student> node, included in the portion 1 of the source structured document shown in Table 2, to a “FOUND A LEARNER” output, and the <street> node, included in the portion 2 of the source structured document shown in Table 3, to an output of the value of the <street> node (XYZ). Using the transformation rules, the portion 1 of the source structured document, shown in Table 2, may be first transformed to obtain a portion 1 of the target document, shown in Table 4, which includes:











TABLE 4

(Portion 1)
FOUND A LEARNER
FOUND A LEARNER











Next, the transformation rule may be applied to the portion 2 of the source structured document, shown in Table 3, to obtain a portion 2 of the target document, shown in Table 5, which includes:











TABLE 5

(Portion 2)
XYZ










Finally, a generate operation 122 is performed to generate the target document 104 using the obtained portions 116 of the target document 104. The target document 104 may be generated by combining the obtained portion 1 118 and the obtained portion 2 120 of the target document 104. In the above example, the portion 1 of the target document, shown in Table 4, and the portion 2 of the target document, shown in Table 5, are combined to obtain the target document, shown in Table 6, which includes:











TABLE 6

FOUND A LEARNER
FOUND A LEARNER
XYZ
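
The transform and generate operations of the school example may be illustrated with the following Python sketch. It is an assumption-laden stand-in, not an XSLT processor: the two rules are hard-coded in a hypothetical `transform_portion` function, the hypothetical `generate` function simply concatenates the results in order, and the `<portion>` wrapper element is introduced only so that each fragment parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

# Portions of the source document, as in Tables 2 and 3.
PORTION_1 = "<student>James</student><student>Michael</student>"
PORTION_2 = "<street> XYZ </street>"


def transform_portion(portion_xml):
    """Apply the example rules to one portion and return its output lines."""
    # Hypothetical wrapper element so the fragment has a single root.
    wrapper = ET.fromstring(f"<portion>{portion_xml}</portion>")
    lines = []
    for node in wrapper:
        if node.tag == "student":
            lines.append("FOUND A LEARNER")  # rule for <student>
        elif node.tag == "street":
            lines.append((node.text or "").strip())  # rule for <street>: its value
    return lines


def generate(source_portions):
    """Transform the portions one at a time, then combine the results in order."""
    target_lines = []
    for portion_xml in source_portions:
        target_lines.extend(transform_portion(portion_xml))
    return "\n".join(target_lines)


target_document = generate([PORTION_1, PORTION_2])
print(target_document)
```

Only one portion is parsed at any step of the loop, which is the property the partition based approach relies on.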











FIG. 2 is a detailed flow diagram 200 illustrating a method for transforming a source structured document to a target document, according to an embodiment. Initially at block 202 a request is received for transforming the source structured document. The source structured document may be structured in accordance with any one of the standard structured document languages such as XML, HTML, WSDL, and the like. The source structured document, for example an XML document, may have a tree structure, which may have a root object, branch objects, and leaf objects. Each of the root object, the branch objects, and the leaf objects is a node of the structured document. A node may have a corresponding node value or any other information associated with the node. The source structured document may include source data, such as business data, or any other data. For example, a source XML document, shown in Table 7, representing business data of several companies includes:









TABLE 7

<business data>
  <companies>
    <company 1>
      <company name> HAPPY COMPANY </company name>
      <location> SUNNYVALE </location>
    </company 1>
    <company 2>
      <company name> WINNER INC. </company name>
      <location> NEW JERSEY </location>
    </company 2>
  </companies>
  <customer>
    <customer name> BIG CUSTOMER </customer name>
  </customer>
</business data>









In the above example, <business data> is a root node, <companies>, <company 1>, <company 2>, and <customer> are branch nodes, and <company name>, <location>, and <customer name> are leaf nodes. The node <company name> has values HAPPY COMPANY and WINNER INC., the node <location> has values SUNNYVALE and NEW JERSEY, and the node <customer name> has value BIG CUSTOMER.


Next at block 204 nodes in the source structured document are selected, for partitioning the source structured document. A user may select one or more nodes from the source structured document, based on which the user wants to divide the source structured document. A display tool may be provided which displays the source structured document to the user. The user may select one or more nodes from the displayed source structured document. In one embodiment, the system may allow the user to select only some of the nodes in the structured document for dividing the source structured document. For example, the user may be allowed to select only the root node and the branch nodes but not the leaf nodes. In the above example, the source structured document may be presented to the user. The user may be allowed to select only the root node <business data> and the branch nodes <companies>, <company 1>, <company 2>, and <customer>. Assume that the user selects the branch nodes <companies>, <company 1> and <company 2>.
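
The distinction between root, branch, and leaf nodes, and the restriction of the selection to the root and branch nodes, might be sketched as follows. The function name `classify_nodes` is hypothetical, and the element names of Table 7 are rewritten with underscores as an assumption of this sketch, because XML element names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

# Assumption: spaces in the names of Table 7 are replaced by underscores.
BUSINESS_XML = """<business_data>
  <companies>
    <company_1>
      <company_name> HAPPY COMPANY </company_name>
      <location> SUNNYVALE </location>
    </company_1>
    <company_2>
      <company_name> WINNER INC. </company_name>
      <location> NEW JERSEY </location>
    </company_2>
  </companies>
  <customer>
    <customer_name> BIG CUSTOMER </customer_name>
  </customer>
</business_data>"""


def classify_nodes(source_xml):
    """Map each distinct tag to 'root', 'branch', or 'leaf'."""
    root = ET.fromstring(source_xml)
    kinds = {}
    for node in root.iter():
        if node is root:
            kinds[node.tag] = "root"
        elif len(node) > 0:  # has child elements
            kinds[node.tag] = "branch"
        else:
            kinds[node.tag] = "leaf"
    return kinds


kinds = classify_nodes(BUSINESS_XML)
# Only the root and branch nodes are offered to the user for selection.
selectable = [tag for tag, kind in kinds.items() if kind != "leaf"]
print(selectable)
```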


The selected nodes may be defined as placeholders for the source structured document. A placeholder is a node that refers to a portion of the source structured document. In one embodiment, the placeholder is a node that indirectly refers to a portion of the source structured document. In this case, the placeholder may refer to another placeholder that refers to a portion of the source structured document. In the above example, the selected nodes <companies>, <company 1>, and <company 2> are defined as placeholders of the source structured document. The placeholder <company 1> and the placeholder <company 2> may refer to a portion 1 and a portion 2, respectively, of the source structured document shown in Table 8 and Table 9, respectively.









TABLE 8

(Portion 1)
<company name> HAPPY COMPANY </company name>
<location> SUNNYVALE </location>


















TABLE 9

(Portion 2)
<company name> WINNER INC. </company name>
<location> NEW JERSEY </location>










The placeholder <companies> indirectly refers to the portion 1 and the portion 2 of the source structured document. That is, the placeholder <companies> refers to the placeholder <company 1> and the placeholder <company 2>, which refer to the portion 1 and the portion 2, respectively, of the source structured document.


Next at block 206, the source structured document may be divided based on the portions of the source structured document referred to by the defined placeholders. Dividing the source structured document may include storing each of the portions of the source structured document, referred to by the defined placeholders, as a separate structured document file. A placeholder may then refer to the structured document file storing the corresponding portion of the source structured document. In the above example, the source structured document is divided into two portions, a portion 1 referred to by the placeholder <company 1> and a portion 2 referred to by the placeholder <company 2>. A first structured document file (file 1.xml) and a second structured document file (file 2.xml) storing the portion 1 and the portion 2, respectively, of the source structured document may be created. The placeholder <company 1> and the placeholder <company 2> may refer to the first structured document file (file 1.xml) and the second structured document file (file 2.xml), respectively.


Next at block 208, a source structured document index file is created, which stores the defined placeholders included in the source structured document and a remaining portion of the source structured document that is not referred to by any of the placeholders. In the above example, the remaining portion of the source structured document, shown in Table 10, which is not referred to by the placeholders <companies>, <company 1>, and <company 2>, includes:











TABLE 10

<customer>
  <customer name> BIG CUSTOMER </customer name>
</customer>











The obtained source structured document index file, shown in Table 11, includes:









TABLE 11

<companies> (placeholder companies)
  <company 1> (placeholder company 1)
  <company 2> (placeholder company 2)
(remaining portion of the source structured document)
<customer>
  <customer name> BIG CUSTOMER </customer name>
</customer>
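
Blocks 206 and 208 may be sketched together in Python. Everything specific in this sketch is an assumption made for illustration: the function name `partition`, the `placeholder` and `href` attributes used to mark a placeholder and point at its portion file, the `<portion>` wrapper element, the file names `file1.xml` and `file2.xml`, and the underscore element names. A placeholder that contains another placeholder is treated as an indirect reference and gets no file of its own.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Table 7 with underscores in place of spaces (assumption of this sketch).
BUSINESS_XML = (
    "<business_data><companies>"
    "<company_1><company_name>HAPPY COMPANY</company_name>"
    "<location>SUNNYVALE</location></company_1>"
    "<company_2><company_name>WINNER INC.</company_name>"
    "<location>NEW JERSEY</location></company_2>"
    "</companies><customer><customer_name>BIG CUSTOMER</customer_name>"
    "</customer></business_data>"
)


def partition(source_xml, selected_tags, out_dir):
    """Store each directly referred portion in its own file; return the index tree."""
    root = ET.fromstring(source_xml)
    selected = [node for node in root.iter() if node.tag in selected_tags]
    file_count = 0
    for node in selected:
        node.set("placeholder", "true")
        # A placeholder containing another placeholder refers only indirectly.
        if any(d is not node and d.tag in selected_tags for d in node.iter()):
            continue
        file_count += 1
        portion = ET.Element("portion")  # hypothetical wrapper for the fragment
        portion.extend(list(node))
        path = Path(out_dir) / f"file{file_count}.xml"
        path.write_text(ET.tostring(portion, encoding="unicode"), encoding="utf-8")
        for child in list(node):
            node.remove(child)  # the index keeps the placeholder, not the content
        node.set("href", path.name)
    return root


with tempfile.TemporaryDirectory() as out_dir:
    index = partition(BUSINESS_XML, {"companies", "company_1", "company_2"}, out_dir)
    written = sorted(p.name for p in Path(out_dir).iterdir())
    first_portion = (Path(out_dir) / "file1.xml").read_text(encoding="utf-8")

print(ET.tostring(index, encoding="unicode"))
print(written)
```

The returned tree corresponds to the index file of Table 11: the three placeholders remain, and the <customer> subtree is kept in place as the remaining portion.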









In one embodiment, the source structured document index file may be directly created based on the source data, such as business data. In this case, the user selection may be received to define placeholders that refer to different portions of the source data. The source structured document index file in this case may include the defined placeholders and the remaining portion of the source data, which is not referred to by any of the placeholders.


Next at block 210, the source structured document index file obtained at block 208 is transformed to obtain an interim result. Transformation is a process of transforming an input structured document, on which the transformation is to be applied, to obtain an output target document. In one embodiment, the target document may be in a structured or unstructured document form (such as plain text, word processor, spreadsheet, database, PDF, HTML, etc.). For example, the source structured document may be in XML and the target document may be in XML, HTML, or any other format depending on the requirements of the user.


The transformation operation may be performed using transformation files, which include transformation rules for transforming the source structured document to the target document. For example, an XML source document may be transformed into an XML target document using an Extensible Stylesheet Language Transformations (XSLT) file. The XSLT transformation may be performed by an XSLT processor, which takes as input an XML source document and an XSLT stylesheet and produces the target document. The XSLT stylesheet contains a collection of transformation rules, which are instructions and other directives that guide the processor in the production of the target document. The XSLT stylesheet may include transformation rules corresponding to the different nodes in the source structured document. The XSLT processor may perform the transformation by matching the nodes in the source structured document with the transformation rules in the XSLT stylesheet and applying the corresponding transformation rule to each matched node. The system may store several transformation files for performing different transformations. For example, different transformation files may be stored in the system to transform the source structured document index file and the portions of the source structured document referred to by the placeholders.


For transforming the source structured document index file, the transformation rules may be applied on the source structured document index file to transform the remaining portion of the source structured document, included in the source structured document index file, to obtain a remaining portion of the target document. An interim result, which includes the remaining portion of the target document and the placeholders in the source structured document index file, may be obtained after the transformation operation on the source structured document index file. In the above example, a source structured document index transformation file for transforming the source structured document index file may include:











TABLE 12

<xsl:template match="Customer">
  <xsl:apply-templates select="Customer Name"/>
</xsl:template>
<xsl:template match="Customer Name">
  <xsl:value-of select="."/>
</xsl:template>










As shown in Table 12, the source structured document index transformation file includes the transformation rule <xsl:template match>, which checks whether a node of the source structured document is a “Customer Name” node. In case the node is a “Customer Name” node, the transformation rule <xsl:value-of select="."/>, included in the source structured document index transformation file, extracts the value of the node from the source structured document and places the extracted node value in the output structured document (the interim result, in this case). In the above example, the value of the <customer name> node, BIG CUSTOMER, is placed in the interim result. The interim result obtained after applying the transformation rule to the source structured document index file, shown in Table 13, includes:









TABLE 13

<companies> (placeholder companies)
  <company 1> (placeholder company 1)
  <company 2> (placeholder company 2)
BIG CUSTOMER (transformation result of remaining portion of source structured document)
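
The transformation of the index file into an interim result can be approximated with a small rule table in Python. This is a hedged stand-in for an XSLT processor, not XSLT itself: the `RULES` dictionary, the function `transform_index`, the `placeholder` and `href` attributes, the underscore element names, and the representation of the interim result as a list of tuples are all assumptions of this sketch. Placeholders are copied through untouched, while the remaining portion is transformed.

```python
import xml.etree.ElementTree as ET

# Hypothetical index file: placeholders carry 'placeholder' and 'href' attributes.
INDEX_XML = (
    '<business_data><companies placeholder="true">'
    '<company_1 placeholder="true" href="file1.xml"/>'
    '<company_2 placeholder="true" href="file2.xml"/>'
    "</companies><customer><customer_name>BIG CUSTOMER</customer_name>"
    "</customer></business_data>"
)

# Rule table standing in for the stylesheet of Table 12: tag -> output function.
RULES = {"customer_name": lambda node: (node.text or "").strip()}


def transform_index(node, rules):
    """Return interim-result items: placeholders are kept, other nodes are transformed."""
    items = []
    if node.get("placeholder") == "true":
        # Keep the placeholder itself; its portion is transformed in a later step.
        items.append(("placeholder", node.tag, node.get("href")))
    elif node.tag in rules:
        return [("text", rules[node.tag](node))]
    for child in node:
        items.extend(transform_index(child, rules))
    return items


interim_result = transform_index(ET.fromstring(INDEX_XML), RULES)
for item in interim_result:
    print(item)
```

The resulting list has the same order as Table 13: the three placeholders first, followed by the transformed value BIG CUSTOMER.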









As shown, the interim result in Table 13 includes the transformation result of the remaining portion of the source structured document (the remaining portion of the target document) and the placeholders (<companies>, <company 1>, and <company 2>) defined in the source structured document. Next at block 212 the interim result is traversed to identify the placeholders in the interim result. In the above example, the interim result is traversed to identify three placeholders <companies>, <company 1>, and <company 2>. Next at block 214 the portions of the source structured document referred to by the placeholders included in the interim result are retrieved for transforming these portions of the source structured document (block 216). In one embodiment, the structured document files storing the portions of the source structured document referred to by the placeholders in the interim result file are retrieved one by one for performing the transformation operation. The portions of the source structured document may be transformed using the transformation rules, included in the transformation files, to obtain portions of the target document.


The transformation operation may be performed in several steps; at each step, one of the portions of the source structured document referred to by the placeholders may be loaded in a memory of the system for performing a transformation operation on that portion. The transformation operation may be repeated until all the portions of the source structured document referred to by the placeholders are transformed to obtain the portions of the target document. For example, suppose that the interim result file includes three placeholders referring to three different portions of the source structured document. The first portion of the source structured document, referred to by the first placeholder in the interim result, may be retrieved and loaded in the memory. The transformation operation may be applied on the first portion of the source structured document to obtain a first portion of the target document. After obtaining the first portion of the target document, the second portion of the source structured document, referred to by the second placeholder in the interim result, may be retrieved for transforming the second portion of the source structured document to a second portion of the target document. Finally, after obtaining the second portion of the target document, the third portion of the source structured document, referred to by the third placeholder in the interim result file, may be retrieved for transforming the third portion of the source structured document to a third portion of the target document.
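
Traversing the interim result and transforming the referred portions one at a time might look like the following Python sketch. It assumes, purely for illustration, that the interim result is a list of tuples, that each direct placeholder carries the name of its portion file, and that a hypothetical `leaf_values` function stands in for the transformation rules. The generator `transform_portions` (also a hypothetical name) parses a single portion file per step, so only that portion's tree is held in memory at a time.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical interim result: (kind, name, portion file) or (kind, text).
INTERIM_RESULT = [
    ("placeholder", "companies", None),  # indirect: refers to other placeholders
    ("placeholder", "company_1", "file1.xml"),
    ("placeholder", "company_2", "file2.xml"),
    ("text", "BIG CUSTOMER"),
]


def leaf_values(portion):
    """Stand-in transformation: output the value of every leaf node."""
    return [(node.text or "").strip() for node in portion.iter() if len(node) == 0]


def transform_portions(interim_result, portion_dir, transform):
    """Yield (placeholder name, output lines), loading one portion file per step."""
    for kind, *details in interim_result:
        if kind != "placeholder":
            continue
        name, href = details
        if href is None:
            continue  # indirect placeholder, nothing to load
        text = (Path(portion_dir) / href).read_text(encoding="utf-8")
        portion = ET.fromstring(text)  # only this portion is parsed now
        yield name, transform(portion)


with tempfile.TemporaryDirectory() as portion_dir:
    Path(portion_dir, "file1.xml").write_text(
        "<portion><company_name>HAPPY COMPANY</company_name>"
        "<location>SUNNYVALE</location></portion>", encoding="utf-8")
    Path(portion_dir, "file2.xml").write_text(
        "<portion><company_name>WINNER INC.</company_name>"
        "<location>NEW JERSEY</location></portion>", encoding="utf-8")
    results = list(transform_portions(INTERIM_RESULT, portion_dir, leaf_values))

print(results)
```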


Loading the portions referred to by the placeholders one by one in the memory, for performing the transformation operation, ensures that the memory consumed depends on the complexity of transforming a single portion of the source structured document and not on the size of the entire source structured document. As discussed above, different transformation files may be stored in the system for transforming different portions of the source structured document. In the above example, as the portion 1 and the portion 2 of the source structured document referred to by the placeholders <company 1> and <company 2>, respectively, include similar elements, a single transformation file may be used for transforming the portion 1 and the portion 2 of the source structured document. The single transformation file, shown in Table 14, may include:











TABLE 14

<xsl:template match="Company">
  <xsl:apply-templates select="Company Name"/>
</xsl:template>
<xsl:template match="Company Name">
  <xsl:value-of select="."/>
</xsl:template>










The first portion of the source structured document, shown in Table 8, may be loaded in the memory and the transformation rules in the transformation file may be applied on the first portion of the source structured document to obtain a first portion of the target document, shown in Table 15, which includes:











TABLE 15

HAPPY COMPANY
SUNNYVALE











After obtaining the first portion of the target document, the second portion of the source structured document, shown in Table 9, may be loaded in the memory and the transformation rules in the single transformation file, shown in Table 14, may be applied on the second portion of the source structured document to obtain a second portion of the target document, shown in Table 16, which includes:











TABLE 16

WINNER INC.
NEW JERSEY










Finally at block 218, a target document is generated based on the portions of the target document obtained at block 216 and the interim result obtained at block 210. In one embodiment, the target document may be obtained by combining the portions of the target document obtained at block 216 and the remaining portion of the target document in the interim result obtained at block 210. The target document may be generated by replacing the portion of the source structured document with the corresponding portions of the target document. In the above example, the first portion, the second portion and the remaining portion of the source structured document are replaced by the first portion of the target document shown in Table 15, the second portion of the target document shown in Table 16, and the remaining portion of the target document included in the interim result shown in Table 13, to obtain the target document, shown in Table 17, which includes:











TABLE 17

HAPPY COMPANY
SUNNYVALE
WINNER INC.
NEW JERSEY
BIG CUSTOMER
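
Generating the target document at block 218 may be sketched as a substitution over the interim result. The tuple representation of the interim result, the `TARGET_PORTIONS` dictionary, and the function name `generate_target` are assumptions made for this illustration. Each placeholder is replaced by its transformed portion, and the already transformed remaining portion is copied as is.

```python
# Hypothetical interim result, in the order of Table 13.
INTERIM_RESULT = [
    ("placeholder", "companies", None),
    ("placeholder", "company_1", "file1.xml"),
    ("placeholder", "company_2", "file2.xml"),
    ("text", "BIG CUSTOMER"),
]

# Portions of the target document, as in Tables 15 and 16.
TARGET_PORTIONS = {
    "company_1": ["HAPPY COMPANY", "SUNNYVALE"],
    "company_2": ["WINNER INC.", "NEW JERSEY"],
}


def generate_target(interim_result, target_portions):
    """Replace each placeholder with its transformed portion and join the lines."""
    lines = []
    for kind, *details in interim_result:
        if kind == "text":
            lines.append(details[0])  # remaining portion, already transformed
        else:
            name = details[0]
            # Indirect placeholders have no portion of their own and add nothing.
            lines.extend(target_portions.get(name, []))
    return "\n".join(lines)


target_document = generate_target(INTERIM_RESULT, TARGET_PORTIONS)
print(target_document)
```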











FIG. 3 is an exemplary block diagram illustrating a source structured document index file 300, according to an embodiment. As discussed above, the source structured document index file may be created directly from source data, such as business data. The source structured document index file 300 is created based on audit data of a company. The source structured document index file 300 includes a first placeholder 302 that has a placeholder id of “N12711” and a second placeholder 304 that has a placeholder id of “N1286”. The first placeholder 302 and the second placeholder 304 may refer to different portions of the audit data as selected by the user. The source structured document index file 300 may also include a remaining portion 306 of the audit data of the company, which is not referred to by the first placeholder 302 or the second placeholder 304. The source structured document index file 300 includes nodes <company>, <companyName>, <streetAddress>, <streetname>, <city>, <transactions>, <totalDebit>, <totalCredit>, <journal>, <desc>, <jrnTp>, and <subledgers>.



FIGS. 4A-B are exemplary block diagrams illustrating a transformation file 400 for transforming the source structured document index file 300 of FIG. 3, according to an embodiment. The transformation file 400 may include transformation rules for each of the nodes of the source structured document. For example, the transformation rule 402 for a node “companyName”, included in the source structured document index file 300 of FIG. 3, is defined to retrieve the node “companyName” and the value of the node “companyName” (ABC CORP.) from the source structured document index file 300 of FIG. 3 and place them in a target document (which, in this case, is an interim result file). Similarly, the transformation file 400 includes transformation rules for retrieving other nodes and their corresponding values from the source structured document index file 300 of FIG. 3 and placing them in the target document (interim result file).



FIG. 5 is an exemplary block diagram illustrating an interim result file 500 obtained after transforming the source structured document index file 300 of FIG. 3 using the transformation file 400 of FIGS. 4A-B, according to an embodiment. The interim result file 500 includes the transformation result 502 of the remaining portion of the audit data 306 included in the source structured document index file 300 of FIG. 3. The interim result file 500 also includes the first placeholder 302 and the second placeholder 304 included in the source structured document index file 300 of FIG. 3. Next, the interim result file 500 is traversed to identify the placeholders, namely the first placeholder 302 and the second placeholder 304, in the interim result file 500. The first placeholder 302 and the second placeholder 304 may refer to a first portion and a second portion of the audit data, respectively.



FIG. 6 is an exemplary block diagram illustrating a first portion of audit data file 600 referred to by the first placeholder 302 in the interim result file 500 of FIG. 5, according to an embodiment. The first portion of audit data file 600 may store a first portion of the audit data referred to by the first placeholder 302 of FIG. 5. As shown, the first portion of audit data file 600 stores transaction data of a report, which includes a number node (<nr>), a description node (<desc>), a period number node (<period number>), a transaction date node (<trDt>), and a source ID node (<sourceID>).



FIG. 7 is an exemplary block diagram illustrating a second portion of audit data file 700 referred to by the second placeholder 304 in the interim result file 500 of FIG. 5, according to an embodiment. The second portion of audit data file 700 may store a second portion of the audit data referred to by the second placeholder 304 of FIG. 5. As shown, the second portion of audit data file 700 stores information of a sub-ledger in the audit data, which includes a subledger type node (<nr>), a total debit node (<totalDebit>), and a total credit node (<totalCredit>).



FIGS. 8A-B are exemplary block diagrams illustrating a transformation file 800 for transforming the first portion of audit data file 600 of FIG. 6 and the second portion of audit data file 700 of FIG. 7, according to an embodiment. The transformation file 800 includes transformation rules 802 for transforming the first portion of the audit data stored in the file 600 of FIG. 6. The transformation file 800 also includes a transformation rule 804 for transforming the second portion of the audit data stored in the file 700 of FIG. 7. In one embodiment, separate transformation files may be used for transforming the first portion of audit data file 600 of FIG. 6 and the second portion of audit data file 700 of FIG. 7. Initially, the first portion of audit data file 600 storing the first portion of the audit data may be loaded in the memory of the system. Next, the transformation rules may be applied on the first portion of the audit data to obtain a first portion of a target document. Next, the second portion of audit data file 700 of FIG. 7 storing the second portion of the audit data may be loaded in the memory of the system. Finally, the transformation rule may be applied on the second portion of the audit data to obtain a second portion of the target document. The transformation file 800 includes transformation rules (<xsl:for-each select="text()"><xsl:value-of select="."/>) for each of the nodes in the first portion of the audit data included in the file 600 of FIG. 6 and the second portion of the audit data included in the file 700 of FIG. 7. These transformation rules retrieve the nodes and the corresponding node values from the first portion of audit data file 600 of FIG. 6 and the second portion of audit data file 700 of FIG. 7 and place them in a first portion and a second portion of the target document, respectively.



FIG. 9 is an exemplary block diagram illustrating a first portion of the target document 900 obtained after transforming the first portion of the audit data included in the first portion of audit data file 600 of FIG. 6, according to an embodiment. The first portion of the target document 900 is obtained by applying the transformation rules 802 included in the transformation file 800 of FIGS. 8A-B to the first portion of the audit data included in the file 600 of FIG. 6. The first portion of the target document 900 includes the nodes and the corresponding node values retrieved, by the transformation operation, from the first portion of audit data file 600 of FIG. 6.



FIG. 10 is an exemplary block diagram illustrating a second portion of the target document 1000 obtained after transforming the second portion of the audit data included in the second portion of audit data file 700 of FIG. 7, according to an embodiment. The second portion of the target document 1000 is obtained by applying the transformation rule 804 included in the transformation file 800 of FIGS. 8A-B to the second portion of the audit data included in the file 700 of FIG. 7. The second portion of the target document 1000 includes the nodes and the corresponding node values retrieved, by the transformation operation, from the second portion of audit data file 700 of FIG. 7.



FIG. 11 is an exemplary block diagram illustrating a target document 1100 obtained based on the first portion of the target document 900 of FIG. 9, the second portion of the target document 1000 of FIG. 10, and the interim result file 500 of FIG. 5, according to an embodiment. The target document 1100 may be obtained by combining the first portion of the target document 900 of FIG. 9, the second portion of the target document 1000 of FIG. 10, and the transformation result of the remaining portion of the audit data 502 included in the interim result file 500 of FIG. 5. In one embodiment, the target document 1100 is obtained by replacing the remaining portion of the audit data 306, the first placeholder 302, and the second placeholder 304 included in the source structured document index file 300 of FIG. 3 with the transformation result of the remaining portion of the audit data 502 included in the interim result file 500 of FIG. 5, the first portion of the target document 900 of FIG. 9, and the second portion of the target document 1000 of FIG. 10, respectively.
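The combining step described above can be sketched in Python as a placeholder substitution over the interim result. This is an illustrative sketch under stated assumptions only: the `{{name}}` placeholder syntax, the function name `assemble_target`, and the sample strings are hypothetical, since the description does not specify how the placeholders 302 and 304 are encoded.

```python
import re

# Hypothetical placeholder syntax; the description does not specify how
# placeholders 302 and 304 are encoded in the interim result file.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def assemble_target(interim_result: str, transformed: dict) -> str:
    """Replace each placeholder in the interim result with its transformed portion."""

    def substitute(match):
        key = match.group(1)
        if key not in transformed:
            raise KeyError(f"no transformed portion for placeholder {key!r}")
        # Returning the text from a function avoids backslash processing
        # that a plain replacement string would undergo.
        return transformed[key]

    return PLACEHOLDER.sub(substitute, interim_result)


# Invented sample data: HEADER/FOOTER stand in for the transformation
# result of the remaining portion of the audit data.
interim = "HEADER\n{{first}}\n{{second}}\nFOOTER"
target = assemble_target(interim, {"first": "portion-900", "second": "portion-1000"})
```

For a large document, an implementation would more likely write the interim text and each transformed portion to the output file in sequence instead of building one string in memory; the sketch keeps everything in memory only for brevity.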


Some embodiments of the invention may include the above-described methods being written as one or more software components. These components, and the functionality associated with each, may be used by client, server, distributed, or peer computer systems. These components may be written in a computer language corresponding to one or more programming languages, such as functional, declarative, procedural, object-oriented, lower-level languages, and the like. They may be linked to other components via various application programming interfaces and then compiled into one complete application for a server or a client. Alternatively, the components may be implemented in server and client applications. Further, these components may be linked together via various distributed programming protocols. Some example embodiments of the invention may include remote procedure calls or web services being used to implement one or more of these components across a distributed programming environment. For example, a logic level may reside on a first computer system that is remotely located from a second computer system containing an interface level (e.g., a graphical user interface). These first and second computer systems can be configured in a server-client, peer-to-peer, or some other configuration. The clients can vary in complexity from mobile and handheld devices, to thin clients, and on to thick clients or even other servers.


The above-illustrated software components are tangibly stored on a computer readable storage medium as instructions. The term “computer readable storage medium” should be taken to include a single medium or multiple media that store one or more sets of instructions. The term “computer readable storage medium” should be taken to include any physical article that is capable of undergoing a set of physical changes to physically store, encode, or otherwise carry a set of instructions for execution by a computer system which causes the computer system to perform any of the methods or process steps described, represented, or illustrated herein. Examples of computer readable storage media include, but are not limited to: magnetic media, such as hard disks, floppy disks, and magnetic tape; optical media, such as CD-ROMs, DVDs, and holographic devices; magneto-optical media; and hardware devices that are specially configured to store and execute instructions, such as application-specific integrated circuits (“ASICs”), programmable logic devices (“PLDs”), and ROM and RAM devices. Examples of computer readable instructions include machine code, such as produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter. For example, an embodiment of the invention may be implemented using Java, C++, or other object-oriented programming language and development tools. Another embodiment of the invention may be implemented in hard-wired circuitry in place of, or in combination with, machine readable software instructions.



FIG. 12 is a block diagram of an exemplary computer system 1200. The computer system 1200 includes a processor 1202 that executes software instructions or code stored on a computer readable storage medium 1222 to perform the above-illustrated methods of the invention. The computer system 1200 includes a media reader 1216 to read the instructions from the computer readable storage medium 1222 and store the instructions in storage 1204 or in random access memory (RAM) 1206. The storage 1204 provides a large space for keeping static data where at least some instructions could be stored for later execution. The stored instructions may be further compiled to generate other representations of the instructions and dynamically stored in the RAM 1206. The processor 1202 reads instructions from the RAM 1206 and performs actions as instructed. According to one embodiment of the invention, the computer system 1200 further includes an output device 1210 (e.g., a display) to provide at least some of the results of the execution as output including, but not limited to, visual information to users, and an input device 1212 to provide a user or another device with means for entering data and/or otherwise interacting with the computer system 1200. Each of these output devices 1210 and input devices 1212 could be joined by one or more additional peripherals to further expand the capabilities of the computer system 1200. A network communicator 1214 may be provided to connect the computer system 1200 to a network 1220 and in turn to other devices connected to the network 1220 including other clients, servers, data stores, and interfaces, for instance. The modules of the computer system 1200 are interconnected via a bus 1218. Computer system 1200 includes a data source interface 1208 to access data source 1224. The data source 1224 can be accessed via one or more abstraction layers implemented in hardware or software. For example, the data source 1224 may be accessed via the network 1220. In some embodiments, the data source 1224 may be accessed via an abstraction layer, such as a semantic layer.


A data source is an information resource. Data sources include sources of data that enable data storage and retrieval. Data sources may include databases, such as, relational, transactional, hierarchical, multi-dimensional (e.g., OLAP), object oriented databases, and the like. Further data sources include tabular data (e.g., spreadsheets, delimited text files), data tagged with a markup language (e.g., XML data), transactional data, unstructured data (e.g., text files, screen scrapings), hierarchical data (e.g., data in a file system, XML data), files, a plurality of reports, and any other data source accessible through an established protocol, such as, Open DataBase Connectivity (ODBC), produced by an underlying software system (e.g., ERP system), and the like. Data sources may also include a data source where the data is not tangibly stored or otherwise ephemeral such as data streams, broadcast data, and the like. These data sources can include associated data foundations, semantic layers, management systems, security systems and so on.


In the above description, numerous specific details are set forth to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details or with other methods, components, techniques, etc. In other instances, well-known operations or structures are not shown or described in detail to avoid obscuring aspects of the invention.


Although the processes illustrated and described herein include a series of steps, it will be appreciated that the different embodiments of the present invention are not limited by the illustrated ordering of steps, as some steps may occur in different orders, and some concurrently with other steps, apart from that shown and described herein. In addition, not all illustrated steps may be required to implement a methodology in accordance with the present invention. Moreover, it will be appreciated that the processes may be implemented in association with the apparatus and systems illustrated and described herein as well as in association with other systems not illustrated.


The above descriptions and illustrations of embodiments of the invention, including what is described in the Abstract, are not intended to be exhaustive or to limit the invention to the precise forms disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize. These modifications can be made to the invention in light of the above detailed description. Rather, the scope of the invention is to be determined by the following claims, which are to be interpreted in accordance with established doctrines of claim construction.

Claims
  • 1. A computer implemented method for transforming a source structured document to a target document, the method comprising: receiving a request to transform the source structured document; receiving a selection of a first node and one or more further nodes in the source structured document, the first node defined as a first placeholder, and the one or more further nodes defined as one or more further placeholders; based on the received selection, dividing, by a processor of the computer, the source structured document into a first portion and one or more further portions of the source structured document, the first placeholder referring to the first portion of the source structured document, the one or more further placeholders referring to the one or more further portions of the source structured document; based on a transformation rule, transforming, by the processor of the computer, the first portion of the source structured document to obtain a first portion of the target document; based on the transformation rule, transforming, by the processor of the computer, the one or more further portions of the source structured document to obtain one or more further portions of the target document; and generating, by the processor of the computer, the target document based on the obtained first portion of the target document and the obtained one or more further portions of the target document.
  • 2. The computer implemented method according to claim 1, wherein dividing the source structured document includes: based on the defined first placeholder and the one or more further placeholders, dividing, by the processor of the computer, the source structured document into the first portion and the one or more further portions of the source structured document.
  • 3. The computer implemented method according to claim 2, further comprising: storing, in a memory of the computer, the first portion of the structured document as a first structured document file, the first placeholder referring the first structured document file; storing, in the memory of the computer, the one or more further portions of the source structured document as one or more further structured document files, the one or more further placeholders referring the one or more further structured document files; and storing the first placeholder, the one or more further placeholders, and a remaining portion of the source structured document in a source structured document index file.
  • 4. The computer implemented method according to claim 3, wherein transforming the first portion of the structured document includes: based on the transformation rule, transforming, by the processor of the computer, the source structured document index file to obtain an interim result file, the interim result file including the first placeholder, the one or more further placeholders, and a transformation result of the remaining portion of the source structured document; identifying, by the processor of the computer, the first placeholder included in the interim result file; retrieving, from the memory of the computer, the first structured document file referred by the identified first placeholder; and based on the transformation rule, transforming, by the processor of the computer, the first structured document file to obtain the first portion of the target document.
  • 5. The computer implemented method according to claim 4, wherein transforming the one or more further portions of the structured document includes: identifying, by the processor of the computer, the one or more further placeholders included in the interim result file; retrieving, from the memory of the computer, the one or more further structured document files referred by the one or more further placeholders; and based on the transformation rule, transforming, by the processor of the computer, the one or more further structured document files to obtain the one or more further portions of the target document.
  • 6. The computer implemented method according to claim 1, wherein generating the target document includes: replacing, by the processor of the computer, the first portion of the source structured document with the obtained first portion of the target document; and replacing, by the processor of the computer, the one or more further portions of the source structured document with the obtained one or more further portions of the target document.
  • 7. A computer implemented method for transforming a source structured document to a target document, the method comprising: receiving a selection of a first node in the source structured document, the selected first node, defined as a first placeholder, referring a first portion of the source structured document; receiving a selection of one or more further nodes in the source structured document, the selected one or more further nodes defined as one or more further placeholders referring one or more further portions of the source structured document; based on the defined first placeholder and the defined one or more further placeholders, dividing, by the processor of the computer, the source structured document into the first portion and the one or more further portions; storing, the first placeholder, the one or more further placeholders, and a remaining portion of the source structured document in a source structured document index file; based on a transformation rule, transforming, by the processor of the computer, the source structured document index file to obtain an interim result file including the first placeholder, the one or more further placeholders, and a transformation result of the remaining portion of the source structured document; identifying, by the processor of the computer, the first placeholder included in the interim result file; retrieving, from the memory of the computer, the first portion of the source structured document referred by the first placeholder; based on the transformation rule, transforming, by the processor of the computer, the first portion of the source structured document to obtain a first portion of the target document; identifying, by the processor of the computer, the one or more further placeholders included in the interim result file; retrieving, from the memory of the computer, the one or more further portions of the source structured document referred by the one or more further placeholders; based on the transformation rule, transforming, by the processor of the computer, the one or more further portions of the source structured document to obtain one or more further portions of the target document; and generating, by the processor of the computer, the target document based on the first portion of the target document, the one or more further portions of the target document, and the transformation result of the remaining portion of the source structured document.
  • 8. The computer implemented method according to claim 7, further comprising: storing, in a memory of the computer, the first portion of the source structured document as a first structured document file, the first placeholder referring the first structured document file; and storing, in the memory of the computer, the one or more further portions of the source structured document as one or more further structured document files, the one or more further placeholders referring the one or more further structured document files.
  • 9. The computer implemented method according to claim 8, wherein transforming the first portion of the source structured document includes: retrieving, from the memory of the computer, the first structured document file referred by the identified first placeholder included in the interim result file; and based on the transformation rule, transforming, by the processor of the computer, the first structured document file to obtain the first portion of the target document.
  • 10. The computer implemented method according to claim 9, wherein transforming the one or more further portions of the source structured document includes: retrieving, from the memory of the computer, the one or more further structured document files referred by the one or more further placeholders included in the interim result file; and based on the transformation rule, transforming, by the processor of the computer, the one or more further structured document files to obtain the one or more further portions of the target document.
  • 11. An article of manufacture including a computer readable storage medium to tangibly store instructions, which when executed by a computer, cause the computer to: receive a request to transform a source structured document; receive a selection of a first node and one or more further nodes in the source structured document, the first node defined as a first placeholder, and the one or more further nodes defined as one or more further placeholders; based on the received selection, divide the source structured document into a first portion and one or more further portions of the source structured document, the first placeholder referring to the first portion of the source structured document, the one or more further placeholders referring to the one or more further portions of the source structured document; based on a transformation rule, transform the first portion of the source structured document to obtain a first portion of a target document; based on the transformation rule, transform the one or more further portions of the source structured document to obtain one or more further portions of the target document; and generate the target document based on the obtained first portion of the target document and the obtained one or more further portions of the target document.
  • 12. The article of manufacture according to claim 11, further comprising instructions which when executed by the computer further causes the computer to: based on the defined first placeholder and the one or more further placeholders, divide the source structured document into the first portion and the one or more further portions of the source structured document.
  • 13. The article of manufacture according to claim 12, further comprising instructions which when executed by the computer further causes the computer to: store the first portion of the structured document as a first structured document file, the first placeholder referring the first structured document file; store the one or more further portions of the source structured document as one or more further structured document files, the one or more further placeholders referring the one or more further structured document files; and store the first placeholder, the one or more further placeholders, and a remaining portion of the source structured document in a source structured document index file.
  • 14. The article of manufacture according to claim 13, further comprising instructions which when executed by the computer further causes the computer to: based on the transformation rule, transform the source structured document index file to obtain an interim result file, the interim result file including the first placeholder, the one or more further placeholders, and a transformation result of the remaining portion of the source structured document; identify the first placeholder included in the interim result file; retrieve the first structured document file referred by the identified first placeholder; and based on the transformation rule, transform the first structured document file to obtain the first portion of the target document.
  • 15. The article of manufacture according to claim 14, further comprising instructions which when executed by the computer further causes the computer to: identify the one or more further placeholders included in the interim result file; retrieve the one or more further structured document files referred by the one or more further placeholders; and based on the transformation rule, transform the one or more further structured document files to obtain the one or more further portions of the target document.
  • 16. A computer system for transforming a source structured document to a target document, the computer system comprising: a memory to store a program code; and a processor communicatively coupled to the memory, the processor configured to execute the program code to: receive a request to transform the source structured document; receive a selection of a first node and one or more further nodes in the source structured document, the first node defined as a first placeholder, and the one or more further nodes defined as one or more further placeholders; based on the received selection, divide the source structured document into a first portion and one or more further portions of the source structured document, the first placeholder referring to the first portion of the source structured document, the one or more further placeholders referring to the one or more further portions of the source structured document; based on a transformation rule, transform the first portion of the source structured document to obtain a first portion of the target document; based on the transformation rule, transform the one or more further portions of the source structured document to obtain one or more further portions of the target document; and generate the target document based on the obtained first portion of the target document and the obtained one or more further portions of the target document.
  • 17. The system of claim 16, wherein the processor further executes the program code to: based on the defined first placeholder and the one or more further placeholders, divide the source structured document into the first portion and the one or more further portions of the source structured document.
  • 18. The system of claim 17, wherein the processor further executes the program code to: store the first portion of the structured document as a first structured document file, the first placeholder referring the first structured document file; store the one or more further portions of the source structured document as one or more further structured document files, the one or more further placeholders referring the one or more further structured document files; and store the first placeholder, the one or more further placeholders, and a remaining portion of the source structured document in a source structured document index file.
  • 19. The system of claim 18, wherein the processor further executes the program code to: based on the transformation rule, transform the source structured document index file to obtain an interim result file, the interim result file including the first placeholder, the one or more further placeholders, and a transformation result of the remaining portion of the source structured document; identify the first placeholder included in the interim result file; retrieve the first structured document file referred by the identified first placeholder; and based on the transformation rule, transform the first structured document file to obtain the first portion of the target document.
  • 20. The system of claim 19, wherein the processor further executes the program code to: identify the one or more further placeholders included in the interim result file; retrieve the one or more further structured document files referred by the one or more further placeholders; and based on the transformation rule, transform the one or more further structured document files to obtain the one or more further portions of the target document.
Priority Claims (1)
Number Date Country Kind
201210126487.9 Apr 2012 CN national