This invention relates to a method of and apparatus for processing data containing variants.
A data item or a collection of data items can have a number of purposes. The data may be intended to be instantiated as a document which, for example, may be intended for both paper publishing and publication as a web page on the Internet. Alternatively a document may be published in a number of different countries. The document may have the same content in its various published versions, but its form and appearance may vary. Alternatively parts of the document content may vary to reflect legal or cultural differences in each country.
Such a document requires a number of variants, each of which satisfies the requirements of a particular situation. For example, where a document is published in a number of different countries, a variant for publication in the US may be in English and have “letter” as the paper size, whereas a variant for publication in France would be in French and have A4 as the paper size.
It is inefficient to store electronically a number of separate documents representing the variants of a single document, particularly because the variants will contain a large amount of the same information. Furthermore a change made to one of the variants will not be reflected in the other variants. Therefore management of revisions made to such a document becomes time consuming.
The inventor has realised that existing solutions for dealing with documents where the document may be instantiated differently to different recipients, for example due to language or customs, are deficient.
According to a first aspect of the present invention, there is provided a method of processing data where the data includes or is associated with metadata identifying a plurality of processing options, the processing comprising the steps of:
It is thus possible to arrange for a master data set to be repeatedly processed such that at each processing step the person controlling the processing can cause one or more variants within the data to be permanently bound to or retained within the data whilst allowing the processed version of that data to still contain unbound variants. These remaining unbound variants can be processed at a later stage such that the selected variants become bound to the data during a subsequent processing step. This process can be repeated until no more variants are left and hence a final form of the data has been established.
Alternatively a data set may be processed to expunge (i.e. remove) a variant.
The inventor has realised that handling of such variants could be improved by marking such variants within a document in such a way as to facilitate automated processing of the document.
The master data can be processed to select different options depending on the purpose of the processing. Where, for example, the data expresses a document (whether it is intended to be printed or displayed on a visual display unit) then the options may include a language selection such that the document becomes customised for its intended recipient or group of recipients.
Of course the data does not have to be a document, and can in general be any selection of data (which may be regarded as a data object) such as audio and video files, streaming data content and executable program code.
Preferably the processing options are orthogonal. In this context this means that the order in which the processing options are selected does not affect the final expression of the data after processing.
According to a second aspect of the present invention there is provided a method of processing data wherein the data includes mark-up data identifying mutually exclusive alternatives within the data, the processing comprising the steps of
According to a third aspect of the present invention there is provided a data processor arranged to process data where the data includes or is associated with metadata and where a plurality of processing options are associated with the data and, wherein during processing the data processor is arranged to alter the data such that where a processing option has been selected variants within the data associated with and selected by the selected option are activated within the data whereas variants within the data associated with but not selected by the selected option are rendered inactive such that the processing cannot be reversed, and the data processor is arranged to maintain variants associated with processing options that have not been selected such that the data may be processed again and a variant may be selected by a further processing option.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying Figures, in which:
In the context of machine-readable data which is to be processed and/or interpreted by a computer or data processor, a “document” as used hereinafter is a block of data which is not limited to documents to be published and/or read by humans. A document may include or be associated with metadata which provides information about data within a document or portions thereof. Metadata may be embedded within a document's data or may be separated from the document or its data.
A machine readable document is often represented using a mark-up language. A mark-up language comprises metadata tags (mark-ups) embedded within the data. A tag often provides information about data which immediately follows that tag. Mark-up languages which are well-known as of July 2004 include HTML and XML. These mark-up languages use text to represent both tags and data. Other mark-up languages may not use only text, and may be more difficult to interpret by a user without processing.
Documents can be described and stored (i.e. represented) using XML which is a hierarchical mark-up language. XML is extensively used for representing, storing and exchanging data, especially over the Internet. A version of the XML specification, e.g., “Extensible Markup Language (XML) 1.0 (Third Edition)” W3C Recommendation of Feb. 4, 2004, is available from the World Wide Web Consortium (W3C).
An XML document usually comprises a hierarchical arrangement of start tags, end tags and data content. The tags are machine-readable, and so the machine can determine information about the document and its data content. This allows the machine to process and/or interpret the document. An XML element comprises a start tag, an end tag and optionally data content located between the start and end tags. An XML element may also have one or more “child” elements enclosed between its start and end tags. The child elements are hierarchically less significant than the immediately enclosing element which is called the “parent” element of the child elements. The hierarchically most significant element is known as the “root” element, of which there can be only one in a well-formed XML document.
An XML element start tag comprises a name of that element enclosed within angled brackets, for example <element>. The name can be chosen to indicate the nature of the contents of that element to a user, although this is not a requirement. An end tag is identical to its associated start tag, with a forward slash character preceding its name, for example </element>. The contents of the element (actual data content and/or child elements) are physically located between its start and end tags. If an element is empty, i.e. it contains no data or child elements, it may be represented by a single empty-element tag instead of separate start and end tags. An empty element tag is identical to a start tag, except that a forward slash character follows the element name, for example <element/>.
An XML start tag or empty-element tag can also contain one or more attributes. An attribute has a name and an associated value, and conveys some information about that element and/or any data contained therein. An example of an element start tag having an attribute is <element language="English">. In this example, the name of the attribute is “language” and its value is “English”. The name of an attribute can be chosen to be descriptive of the information it conveys. An example of an empty-element tag containing an attribute is <element language="English"/>.
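The element, attribute and empty-element forms described above can be inspected with a few lines of Python. This is a minimal sketch using the standard xml.etree.ElementTree module; the root element name “report” and the sample content are illustrative assumptions and do not come from the Figures.

```python
import xml.etree.ElementTree as ET

# Illustrative document: one root element with two children, the second of
# which is written as an empty-element tag. Attribute values use straight
# quotes, as XML requires.
xml_text = (
    '<report>'
    '<element language="English">Hello</element>'
    '<element language="French"/>'
    '</report>'
)

root = ET.fromstring(xml_text)

# For each child: its name, the value of its "language" attribute and its
# data content (None for the empty element).
summary = [(child.tag, child.get("language"), child.text) for child in root]
print(root.tag, summary)
```

The parser reports the same information for the empty-element tag as it would for a start tag immediately followed by an end tag, except that there is no data content.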
Many document viewers exist which allow a user to view an XML document in a format that is more user friendly than merely viewing the XML document as a text file. Such viewers include Microsoft Internet Explorer. However because elements which contain data in an XML document can have any name chosen by the document creator, the viewer does not automatically know how the data is intended to be displayed, and so the viewer merely displays a hierarchical tree structure representing the XML document.
In order to allow XML documents to be displayed correctly, an Extensible Stylesheet Language Transformation (XSLT) has been developed, a specification of which is available from the W3C, e.g., “XSL Transformations (XSLT) Version 1.0” W3C Recommendation of Nov. 16, 1999. XSLT is a programming language which transforms an XML document into another XML document, or a document capable of being displayed correctly by an appropriate viewer. For example, XSLT can transform an XML document into HTML for correct display using a web browser. XSLT can also be used to filter data, sort data, add data and/or remove data to/from an XML document.
There are a number of software programs available which will perform transformations of XML documents using XSLT. These include up-to-date browsers such as Microsoft Internet Explorer. An open source XSLT processor called “Saxon” is also available from an open source project titled SAXON available over the Internet from the domain saxon.sourceforge.net.
The author of the document has planned that either one of these alternative messages should be given and has structured the content of the document accordingly. Furthermore the author has contemplated that the document may be printed or may be viewed electronically. For this reason the author has included options for the pictures to be rendered in either low resolution for electronic display or in high resolution for printing or zoomed in electronic display.
The XML document shown in
An example of a “variant” element can be seen in
Each “alt” element has two attributes, “type” and “value”. The value of the “type” attribute must match that of the parent “variant” element.
The “value” attribute can be used to specify (or select) one of the “alt” variant elements when a choice is made as to the form that the variant should take. The alt element contains between its start and end tags XML code which represents the form the “variant” element should take when that “alt” element is specified.
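The rule that the “type” of each “alt” element must match the “type” of its parent “variant” element lends itself to an automated check. The following is a sketch only, written in Python with the standard xml.etree.ElementTree module; the function name mismatched_alts and the sample modules are hypothetical and are not taken from the Figures.

```python
import xml.etree.ElementTree as ET


def mismatched_alts(root):
    """List (variant type, alt type, alt value) for every "alt" element
    whose "type" differs from that of its parent "variant" element."""
    problems = []
    for variant in root.iter("variant"):
        for alt in variant.findall("alt"):
            if alt.get("type") != variant.get("type"):
                problems.append(
                    (variant.get("type"), alt.get("type"), alt.get("value"))
                )
    return problems


# A well-formed module and a module that breaks the matching rule.
good = ET.fromstring(
    '<variant type="papersize">'
    '<alt type="papersize" value="A4"/>'
    '<alt type="papersize" value="letter"/>'
    '</variant>'
)
bad = ET.fromstring(
    '<variant type="papersize">'
    '<alt type="language" value="French"/>'
    '</variant>'
)

print(mismatched_alts(good))
print(mismatched_alts(bad))
```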
The “variant” element in lines 2 to 11 of
When a choice is made, for example, to specify that this “variant” element should take the form given by the “alt” element with a “value” of “A4”, the document is processed such that other variants represented by other “alt” elements are removed. In this example, the “alt” element with a “value” of “letter” in lines 7 to 10 is removed. The start and end tags of the “variant” element and remaining “alt” element are also removed. In other words, the entire “variant” element in lines 2 to 11 of
An XML document can contain more than one module or portion. In
In order to select variants of “variant” elements inside an XML document according to one embodiment of the present invention, a separate XML document is created containing a list of parameters.
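By way of illustration only, a list of parameters of this kind might be read into a simple name-to-value mapping as sketched below in Python. The element names “parameters” and “parameter” are assumptions made for this sketch; the description only establishes that each parameter has a “name” and a “value”.

```python
import xml.etree.ElementTree as ET

# Hypothetical parameter document. Only the "name" and "value" attributes
# are taken from the description; the element names are assumed.
PARAMETERS_XML = """
<parameters>
  <parameter name="papersize" value="A4"/>
  <parameter name="language" value="French"/>
</parameters>
"""


def load_parameters(xml_text):
    """Return a dict mapping each parameter name to its value."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("value") for p in root.iter("parameter")}


params = load_parameters(PARAMETERS_XML)
print(params)
```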
Put simply, the document shown in
In order to process the XML document according to an embodiment of the invention and to set the form of “variant” elements contained therein according to the parameters, a process as shown in the flow charts of
The process starts in
From step 102, control passes to step 104. At step 104, the root element of the XML document is processed using the sub-process shown in
From step 104, control passes to step 106 where the process ends.
The sub-process shown in
On the other hand, if no matching parameter was found in step 114, then control passes from step 114 to step 118. At step 118, the whole current “variant” element, including its start and end tags, is copied, processed using the sub-process and added to the output XML document, i.e. the sub-process of
From either step 116 or step 118, control passes to step 120 where the sub-process ends.
If in step 112 it is determined that the current element is not a “variant” element, then control passes from step 112 to step 122. In step 122 a test is performed to determine whether the current element is an “alt” element. If so then control passes to step 124.
In step 124, a test is performed to determine whether the “type” attribute value of the current “alt” element matches the “name” of any of the parameters in the list of parameters. If so then control passes to step 126, where a test is performed to determine whether the “value” of the named parameter matches the “value” of the current “alt” element. In other words, the test determines whether the current “alt” element is selected by a parameter, and is to be activated. If there is a match of “value” attribute values, then control passes to step 128 where the contents of the current “alt” element are copied, processed and added to the output XML document. If there is no match then control passes from step 126 to step 120 where the sub-process ends. In this way, child “alt” elements of a “variant” element which is associated with a parameter are processed, and selected “alt” elements have their contents copied, processed and added to the output XML document, while unselected “alt” elements and their contents are not copied to the output XML document, and their contents are not processed.
If at step 124 it is determined that the “type” of the current “alt” element does not match the “name” of any parameter, then control passes from step 124 to step 130. At step 130 the current “alt” element is copied, processed and added to the output XML document. Therefore any “alt” elements which are not selected by a parameter, and are associated with a “variant” element which is not associated with a parameter, are copied to the output XML document. Their contents however are processed to ensure that any “variant” elements contained therein are processed accordingly. From step 130, control passes to step 120 where the sub-process ends.
If in step 122 it is determined that the current element is not an “alt” element, then control passes to step 132 where the whole element is copied to the output XML document. Control then passes from step 132 to step 120 where the sub-process ends.
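The decision steps described above can be sketched in Python as follows. This is an illustrative sketch rather than the implementation of the embodiment: the behaviour of step 116 is assumed to be that the contents of the “variant” element are processed and emitted without its start and end tags, step 132 is assumed to process the contents of the copied element recursively so that nested “variant” elements are reached, and the function names are hypothetical. The parameters are held as a mapping from parameter “name” to “value”.

```python
import xml.etree.ElementTree as ET


def _attach(parent, items):
    """Append a mixed list of strings and elements to parent, in order."""
    last = None
    for item in items:
        if isinstance(item, str):
            if last is None:
                parent.text = (parent.text or "") + item
            else:
                last.tail = (last.tail or "") + item
        else:
            parent.append(item)
            last = item


def _contents(elem, params):
    """Process everything between the start and end tags of elem."""
    items = [elem.text] if elem.text else []
    for child in elem:
        items.extend(process(child, params))
        if child.tail:
            items.append(child.tail)
    return items


def _copy(elem, params):
    """Copy elem with its tags and attributes, processing its contents."""
    out = ET.Element(elem.tag, dict(elem.attrib))
    _attach(out, _contents(elem, params))
    return [out]


def process(elem, params):
    """Return the output nodes that stand in for elem."""
    if elem.tag == "variant":                        # step 112
        if elem.get("type") in params:               # step 114
            return _contents(elem, params)           # step 116 (assumed)
        return _copy(elem, params)                   # step 118
    if elem.tag == "alt":                            # step 122
        kind = elem.get("type")
        if kind in params:                           # step 124
            if params[kind] == elem.get("value"):    # step 126
                return _contents(elem, params)       # step 128
            return []                                # unselected: not copied
        return _copy(elem, params)                   # step 130
    return _copy(elem, params)                       # step 132


def apply_parameters(xml_text, params):
    """Process a whole document; the root is assumed not to be a variant."""
    root = ET.fromstring(xml_text)
    return ET.tostring(process(root, params)[0], encoding="unicode")


doc = (
    '<doc><variant type="papersize">'
    '<alt type="papersize" value="A4"><size>A4</size></alt>'
    '<alt type="papersize" value="letter"><size>letter</size></alt>'
    '</variant><p>Body</p></doc>'
)
print(apply_parameters(doc, {"papersize": "A4"}))
```

With the parameter “papersize” set to “A4”, the sketch replaces the “variant” element by the contents of the selected “alt” element. With an empty mapping the document is copied unchanged.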
The sub-process shown in
Thus the output XML document contains a copy of the original XML document with “variant” elements having been replaced by variants selected by the list of parameters.
Thus during processing of a group of data variants where, in general, the data variants are mutually exclusive, the processed group of data variants is replaced by the variant that had been designated for retention.
Where a group of data variants has not been selected for processing, the group of data variants is retained within the document.
When the process of
On the other hand, the “variant” element with the “type” value “language” (see lines 13 to 36 of
Because the output XML document still includes the “variant” element with a “type” of “papersize” (see lines 3 to 12 of
The output XML document in
The overall process for producing the first and second output XML documents according to the example described above is shown schematically in
The first output XML document 156 is then processed again along with a second list of parameters, shown in
The invention is not limited to only two iterations of the process shown in
An example of an implementation of the process of
The list of parameters may contain parameters to select a variant “alt” element within any one or more of the “variant” elements (modules) within an XML document. When selecting a variant within one “variant” element, there is no requirement that a variant within another “variant” element is selected simultaneously, or has been selected and activated previously.
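The independence of the modules can be illustrated by binding them in separate passes. The Python sketch below uses a deliberately compact processor that assumes the variants contain child elements only (text placed directly between elements is ignored); the function names, the sample document and the parameter values are hypothetical.

```python
import xml.etree.ElementTree as ET


def process(elem, params):
    """Return the list of output elements that replace elem."""
    kind = elem.get("type")
    bound = elem.tag in ("variant", "alt") and kind in params
    if bound and elem.tag == "alt" and elem.get("value") != params[kind]:
        return []                      # unselected alternative is discarded
    kids = [out for child in elem for out in process(child, params)]
    if bound:
        return kids                    # tags removed, contents retained
    clone = ET.Element(elem.tag, dict(elem.attrib))
    clone.text = elem.text
    clone.extend(kids)
    return [clone]


def bind(xml_text, params):
    """Run one processing pass over a document held as a string."""
    root = ET.fromstring(xml_text)
    return ET.tostring(process(root, params)[0], encoding="unicode")


master = (
    '<doc>'
    '<variant type="papersize">'
    '<alt type="papersize" value="A4"><size>A4</size></alt>'
    '<alt type="papersize" value="letter"><size>letter</size></alt>'
    '</variant>'
    '<variant type="language">'
    '<alt type="language" value="English"><p>Hello</p></alt>'
    '<alt type="language" value="French"><p>Bonjour</p></alt>'
    '</variant>'
    '</doc>'
)

# First pass binds the paper size only; the language module stays unbound.
first = bind(master, {"papersize": "A4"})
# Second pass binds the language in the output of the first pass.
second = bind(first, {"language": "French"})
# The same two selections applied in the opposite order.
reverse = bind(bind(master, {"language": "French"}), {"papersize": "A4"})
print(first)
print(second)
```

In this sketch the result of the two passes does not depend on their order, and is the same as the result of a single pass supplied with both parameters.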
In another example of the use of the present embodiment of the present invention,
When the process of
The output XML document of
The order in which the two lists of parameters (shown in
If the output XML document of
The communications device 208 is capable of sending and receiving information to/from the internet 214. The computer system 200 is therefore able to exchange information with remote computer systems. Additionally or alternatively the computer system 200 is directly connected to other computer systems and/or a local area network (LAN) of computer systems.
Thus the computer 200 can receive a document, process it in accordance with the method of the present invention and save the output to a file or send it to a further device, such as another computer or a printer.
In a second embodiment of the invention, the “variant” elements are omitted from the XML document containing variants. Instead, the XML document contains variants which are represented only by “alt” elements. The “alt” elements of a single module are still associated with each other as they have identical “type” attribute values.
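Because the “alt” elements of a module are associated only through their shared “type” value in this embodiment, the modules present in a document can be recovered by grouping on that attribute. The following Python sketch shows one way to do so; the function name modules_by_type and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def modules_by_type(root):
    """Group the "value" of every "alt" element under its "type"."""
    modules = defaultdict(list)
    for alt in root.iter("alt"):
        modules[alt.get("type")].append(alt.get("value"))
    return dict(modules)


# Hypothetical document of the second embodiment: no "variant" wrappers.
doc = ET.fromstring(
    '<doc>'
    '<alt type="papersize" value="A4"><size>A4</size></alt>'
    '<alt type="papersize" value="letter"><size>letter</size></alt>'
    '<alt type="language" value="English"><p>Hello</p></alt>'
    '<alt type="language" value="French"><p>Bonjour</p></alt>'
    '</doc>'
)

print(modules_by_type(doc))
```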
An example of an XML document containing such variants is shown in
An XSLT program suitable for processing the XML document of
The process according to the second embodiment can be represented by the flow charts of
When the XML document of
In a third embodiment of the invention, each variant module in an XML document may include a default “otherwise” element. In this embodiment, where a parameter is associated with a module, but its “value” does not match the “value” of any of the associated “alt” elements, then the “otherwise” element will be activated, i.e. the output XML document will contain the contents of the “otherwise” element in place of the “variant” element. However if a variant module has no associated parameter, then the output XML document will still contain that module (“variant” element) unchanged, with its contents processed accordingly.
The process according to the third embodiment is shown in the flow charts of
At step 250 a test is performed to determine whether the current element is an “otherwise” element. If so then control passes to step 252. At step 252 a test is performed to determine whether there are any parameters associated with the “variant” element which is the immediate parent of the current “otherwise” element, and if so then whether any of the “alt” elements within the “variant” element are selected by an associated parameter. If so, then an “alt” element has been selected, and therefore the “otherwise” element is not needed. Control in this case passes to step 120 where the sub-process ends.
If however it is determined in step 252 that a parameter associated with the parent “variant” element does not select an “alt” element within the “variant” element, then control passes from step 252 to step 254. At step 254 the contents of the current “otherwise” element are processed and added to the output XML document. Control then passes to step 120 where the sub-process ends.
If it is determined at step 250 that the current element is not an “otherwise” element, then control passes from step 250 to step 132.
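The choice made for a single module in the third embodiment can be summarised by the Python sketch below. It is a simplified illustration that decides which child of one “variant” element supplies the output and does not perform the recursive copying of the full process; the function name choose and the sample module are hypothetical.

```python
import xml.etree.ElementTree as ET


def choose(variant, params):
    """Return the element whose contents represent the module in the output.

    The variant itself is returned when no parameter is associated with it,
    meaning that the module stays in the output unbound.
    """
    kind = variant.get("type")
    if kind not in params:
        return variant                       # no parameter: module is kept
    for alt in variant.findall("alt"):
        if alt.get("value") == params[kind]:
            return alt                       # an "alt" element is selected
    return variant.find("otherwise")         # default; None if not present


module = ET.fromstring(
    '<variant type="language">'
    '<alt type="language" value="English"><p>Hello</p></alt>'
    '<alt type="language" value="French"><p>Bonjour</p></alt>'
    '<otherwise><p>Hello (default)</p></otherwise>'
    '</variant>'
)

print(choose(module, {"language": "French"}).get("value"))
print(choose(module, {"language": "German"}).tag)
```

A parameter value that matches no “alt” element, such as “German” in this sketch, causes the “otherwise” element to be chosen.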
If the XML document of
In the above description when a variant is activated, the data within that variant preferably becomes the data associated with the enclosing module or portion. However in another embodiment this is not the case. Data within variants of a module may be encoded to compress the module, and/or to include error correction. Data within the variants may also be encrypted to protect it against unauthorised access before it is processed according to the invention whereby data intended for the recipient can be decrypted.
The present invention can be used for example in the field of document publishing, where the general purpose of a particular document is known, but specific details are not known. For example if a document is created then sent to Europe for publishing, then using the present invention the document can be restricted to A4 paper size, metric unit measurements, and other European requirements for form and content of the documents. Information which is not needed, such as US letter-size paper information, languages outside of Europe and imperial units of measurement, can be discarded. However, information which may or may not be required in a final form of the document remains within the document. For example, the document sent to Europe for publishing may contain both English and French content. Also, any images within the document may be represented by both high and low-resolution versions. When it is finally decided, for example, to release the document in England with high resolution images, the English language and high-resolution information can be fixed within the document using the present invention, and information not required such as low-resolution versions of images can be discarded. The document can then be sent for printing.
The present invention also allows data, for example a document, to remain general purpose, but to assume a more compact size as the specific purpose becomes known, by discarding information that is not useful for that purpose. This discarding of information can happen any number of times in respect of a single document, each time because some more specific information concerning the purpose of the document has been determined.
The invention can be used in situations other than document distribution. For example, a user may perform a search in a database stored on a remote computer, for example a database containing cars for sale. The user may search for example for cars made by a certain manufacturer. The remote computer in prior art systems performs the search, produces a document containing the results and sends the document to the user for display. When the user wishes to refine the search, the user sends the original search criteria plus further search criteria to the remote computer. The remote computer then performs the search all over again, creates a new document containing the new results and sends this document to the user for display.
If the document containing the first set of search results contains “variant” element modules suitable for processing in accordance with the present invention, the user may specify the further search criteria at his local machine. The results document is then processed locally with respect to the further search criteria, and a document is produced (from the result document) which contains refined search results. In this way the invention requires less transmission bandwidth, and imposes less of a computational burden as a refined search is performed on the latest search results instead of a whole database, and a document containing results does not have to be created anew for every set of refined search results.
In the above description the processing option explicitly defined which data variant was to be retained. However, it is equally valid to operate the invention in a mode where the processing option explicitly defines the or each data variant that is to be expunged from the input file. Thus, upon identifying the variant which is to be removed, a copy of the input file can be written to a temporary store or an output file with the selected variant omitted. This is useful where, for example, a document is intended to be distributed in several languages, but one version of it is being distributed to customers who may use two of the languages, and it is desirable to make sure they cannot inadvertently have access to variants expressed in, say, a third language.
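The expunging mode can be sketched in Python as follows. The sketch assumes that the variants to be removed are identified by pairs of “type” and “value”, which is one possible reading of the processing option; the function name expunge and the sample document are hypothetical. Any text placed directly after a removed “alt” element is removed with it.

```python
import copy
import xml.etree.ElementTree as ET


def expunge(root, unwanted):
    """Return a copy of root without the "alt" elements listed in unwanted.

    unwanted is a set of (type, value) pairs. The input tree is left
    untouched, mirroring the writing of a copy to an output file.
    """
    out = copy.deepcopy(root)
    # Take a snapshot of the elements first so that removals do not
    # disturb the traversal.
    for parent in list(out.iter()):
        for alt in parent.findall("alt"):
            if (alt.get("type"), alt.get("value")) in unwanted:
                parent.remove(alt)
    return out


doc = ET.fromstring(
    '<doc><variant type="language">'
    '<alt type="language" value="English"><p>Hello</p></alt>'
    '<alt type="language" value="French"><p>Bonjour</p></alt>'
    '<alt type="language" value="German"><p>Hallo</p></alt>'
    '</variant></doc>'
)

trimmed = expunge(doc, {("language", "German")})
print([alt.get("value") for alt in trimmed.iter("alt")])
```

The remaining “alt” elements stay unbound inside their “variant” element, so the trimmed document can still be processed later to select one of the two languages that were kept.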
Number | Date | Country | Kind |
---|---|---|---|
0424142.8 | Oct 2004 | GB | national |
Number | Date | Country |
---|---|---|
20060095837 A1 | May 2006 | US |