Electronic content data often do not adhere to a single standard for format, organization, or use with consistent software. For example, each individual content data creator may choose to save electronic content data in various formats including a variety of text formats, document formats, spreadsheet formats, presentation formats, visual graphic formats (e.g., chart, graph, map, drawing, image formats), audio formats, multimedia formats (e.g., video formats), and database formats. Even when content data is encoded using standards-based formats, such as XML, often many different schemas are used. This heterogeneous nature of the electronic content data can pose challenges when the various content needs to be re-purposed, re-styled, searched, combined, transformed, rendered or otherwise processed. Existing solutions typically require a user to convert heterogeneous content to a specific format required for desired processing. In some cases, it is difficult for a user to determine both the specific content format and the content formatting application best suited for the desired processing. Many standard tools for format conversion operate at inconsistent semantic levels, or encode an inappropriate semantic level, potentially causing information needed to perform desired content management and/or electronic publishing functions, for example, to be lost. Therefore, there exists a need for a better way to process electronic content.
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
The invention can be implemented in numerous ways, including as a process, an apparatus, a system, a composition of matter, a computer readable medium such as a computer readable storage medium or a computer network wherein program instructions are sent over optical or electronic communication links. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. A component such as a processor or a memory described as being configured to perform a task includes both a general component that is temporarily configured to perform the task at a given time and a specific component that is manufactured to perform the task. In general, the order of the steps of disclosed processes may be altered within the scope of the invention.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
Rendering electronic content in a desired manner, regardless of source format, is disclosed. In some embodiments, an indication of a desired interpretation of a starting content having a first encoding is used to process the starting content to a second encoding of the starting content. The second encoding is determined at least in part by the desired interpretation and rendered using a view associated with the desired interpretation. In various embodiments, an encoding of content is associated with a file format of the content, an organization of data in the content, a structure of data in the content, an attribute of data in the content, a semantic relationship of data in the content, and/or stored/in-memory representation of content.
Desired interpretation 104 in some embodiments includes at least a portion of a specification associated with how to interpret the starting content. For example, a starting content document can be specified to be interpreted as a curriculum vitae (CV), a journal article, a magazine article, a newspaper article, a greeting card, a poster, a presentation, a photo album, an expense form, a financial report, a graph, a flowchart, or a map. In some embodiments, the specification of desired interpretation 104 does not require knowledge of an output encoding format and/or file format of the starting content. Using starting content 102 and desired interpretation 104, content processing 106 is performed. In some embodiments, content processing 106 includes parsing and reconstructing starting content 102 in accordance with desired interpretation 104. For example, starting content 102 is parsed and up-converted into a meta-language encoded representation. Up-conversion includes converting the starting content into a higher semantic encoding, such as by determining with respect to content elements comprising the starting content one or more semantic relationships not encoded in the source content data comprising the starting content 102. In some embodiments, semantic data included, expressly or implicitly, in and/or otherwise associated with, the starting content data is used at least in part to convert the starting content to a higher semantic encoding. Up-conversion includes reconstruction of semantic structure, e.g., the association of letters to form words, words to form paragraphs, etc.; the organization of paragraphs into columns and/or sections/regions, such as caption boxes, sidebars, text inserts, etc.; the use of different heading levels to provide titles, subtitles, etc.
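By way of illustration only, the reconstruction of semantic structure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the element names (document, section, title, para), the function names, and the heading heuristic (a short line of capitalized words with no final period) are all hypothetical.

```python
import xml.etree.ElementTree as ET


def looks_like_heading(line: str) -> bool:
    """Hypothetical heuristic: a short line of capitalized words with no final period."""
    words = line.split()
    return (
        0 < len(words) <= 8
        and not line.rstrip().endswith(".")
        and all(w[0].isupper() for w in words if w[0].isalpha())
    )


def up_convert(text: str) -> ET.Element:
    """Group flat text into <section> elements holding <title> and <para> children."""
    root = ET.Element("document")
    section = None
    # Blank lines separate blocks; each block is either a heading or a paragraph.
    for block in text.split("\n\n"):
        block = " ".join(block.split())  # join wrapped lines, normalize whitespace
        if not block:
            continue
        if looks_like_heading(block):
            section = ET.SubElement(root, "section")
            ET.SubElement(section, "title").text = block
        else:
            # Paragraphs before the first heading attach directly to the root.
            parent = section if section is not None else root
            ET.SubElement(parent, "para").text = block
    return root


sample = (
    "Quarterly Results\n\n"
    "Revenue grew in the third\nquarter.\n\n"
    "Outlook\n\n"
    "Costs are expected to fall."
)
tree = up_convert(sample)
print(ET.tostring(tree, encoding="unicode"))
```

The flat text carries no explicit section markup; the sketch recovers a section hierarchy from layout cues alone, which is the sense in which the resulting encoding is semantically higher than the source.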
The reconstruction, in some embodiments, includes reconstructing the semantic structure of starting content 102 with one or more semantic encodings in addition to those available in starting content 102. In some embodiments, the starting content is up-converted using content profile data that is associated with at least one rule for converting encodings. The profile data, in some embodiments, is associated with the desired interpretation. In some embodiments, formatting and/or text and/or outline hierarchy data is used to up-convert, e.g., by identifying and processing differently text that is in title case or formatted in a manner commonly used to distinguish major headings from other text. By up-converting content data, advanced search functions are possible. For example, when searching an invoice, fields such as “items”, “quantity”, “price” and “description” can be automatically extracted from the starting content for inclusion as fields in a search dialog. Using the names and/or data types of such fields, which are normally part of the schema of the up-converted representation, a custom search dialog can be constructed on the fly that permits a user to apply context-aware searches. In particular, functions and relations can be applied to the search terms to form predicates, and logical relations can be applied to predicates to form propositions. For example, this would enable a user to specify “invoice” interpretation, then search the invoice for “items” whose “quantity” is greater than “10” and whose “price” is greater than “US $100”, or whose “description” contains “diamond”. In this example, “greater than” and “contains” are relations applied to the search terms, while “and” and “or” are logical relations applied to predicates.
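As an illustrative sketch only, the invoice search described above can be expressed in Python by building predicates from relations and combining them with logical relations. The helper names (greater_than, contains, all_of, any_of), the sample items, and the choice of case-insensitive matching for “contains” are assumptions made for the example.

```python
from typing import Callable, Dict, List

Item = Dict[str, object]          # one up-converted invoice item, keyed by field name
Predicate = Callable[[Item], bool]


def greater_than(field: str, value: float) -> Predicate:
    """Relation applied to a search term, yielding a predicate."""
    return lambda item: item[field] > value


def contains(field: str, text: str) -> Predicate:
    """Case-insensitive substring relation (an assumption of this sketch)."""
    return lambda item: text.lower() in str(item[field]).lower()


def all_of(*predicates: Predicate) -> Predicate:
    """Logical 'and' applied to predicates."""
    return lambda item: all(p(item) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    """Logical 'or' applied to predicates."""
    return lambda item: any(p(item) for p in predicates)


# Hypothetical items extracted from an invoice; prices are in US dollars.
items: List[Item] = [
    {"description": "Steel bolt", "quantity": 500, "price": 0.25},
    {"description": "Diamond drill bit", "quantity": 2, "price": 80.0},
    {"description": "Laser level", "quantity": 12, "price": 149.0},
]

# (quantity > 10 and price > 100) or description contains "diamond"
query = any_of(
    all_of(greater_than("quantity", 10), greater_than("price", 100)),
    contains("description", "diamond"),
)

matches = [item["description"] for item in items if query(item)]
print(matches)
```

Because the field names and data types come from the schema of the up-converted representation, a search dialog could offer only the relations that make sense for each field, for example “greater than” for numeric fields and “contains” for text fields.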
Representation data 108 includes converted starting content 102 in an encoding associated with the desired interpretation. In some embodiments, the representation data includes XML (Extensible Markup Language) encoding of the starting content in the desired interpretation. For example, if the desired interpretation is a “Financial Summary,” representation data 108 includes at least a portion of starting content 102 parsed into XML encodings of assets, liabilities, shares, revenue, and expenses. In some embodiments, representation data 108 includes an in-memory representation/encoding of data associated with starting content 102. In some embodiments, starting content 102 includes data already in the encoding of representation data 108. In various embodiments, content processing 106 is responsive to the file format of starting content 102. In various embodiments, interpretation 104 includes a two-way mapping between starting content 102 and representation data 108. Each interpretation may support various starting content encodings and/or various representation data encodings.
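For illustration, a “Financial Summary” representation of the kind described above might be produced as in the following Python sketch. The element names, the label/value input rows, and the function name are hypothetical; the sketch assumes that an earlier parsing step has already recovered labeled values from the starting content.

```python
import xml.etree.ElementTree as ET

# Hypothetical label/value pairs recovered from the starting content by a parser.
parsed_rows = [
    ("Prepared by", "J. Doe"),
    ("Assets", "1200"),
    ("Liabilities", "450"),
    ("Shares", "1000"),
    ("Revenue", "800"),
    ("Expenses", "610"),
]

# Fields that the (assumed) Financial Summary interpretation knows how to encode.
FINANCIAL_FIELDS = {"assets", "liabilities", "shares", "revenue", "expenses"}


def to_financial_summary(rows) -> ET.Element:
    """Encode the recognized portion of the parsed rows as XML."""
    root = ET.Element("financialSummary")
    for label, value in rows:
        tag = label.strip().lower()
        if tag in FINANCIAL_FIELDS:  # rows outside the interpretation are skipped
            ET.SubElement(root, tag).text = value
    return root


summary = to_financial_summary(parsed_rows)
print(ET.tostring(summary, encoding="unicode"))
```

Only the portion of the content that the interpretation recognizes is encoded, which mirrors the statement that the representation data includes at least a portion of the starting content.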
Representation data 108 and desired interpretation 104 are used in view generation 110 to produce view output 112. In some embodiments, view output 112 includes a rendering of representation data 108 based on interpretation 104. For example, in various embodiments representation data 108 is converted to output 112 comprising content data in one or more of XHTML+CSS (Extensible HyperText Markup Language+Cascading Style Sheet), SVG (Scalable Vector Graphics), and/or XAML (Extensible Application Markup Language) format using XSLT (Extensible Stylesheet Language Transformation). In some embodiments, the conversion and/or view generated therefrom are determined at least in part by the desired interpretation 104. An example of view output 112 is a “structure view” that graphically shows relationships between data contained in representation data 108 in a manner determined at least in part by desired interpretation 104 (e.g., slides in the case of a presentation interpretation of the content, a matrix of cells in the case of a financial report or other spreadsheet interpretations, columns of text with appropriate headings and subheadings in the case of a magazine article interpretation, etc.). In some embodiments, interpretation 104 includes one or more associated view representations that can be selected by a user. When various views are selected by a user, representation data 108 is used to render the output associated with the selected view. In some embodiments, desired interpretation 104 specifies a default format for representation data 108 and/or view output 112. In some embodiments, data associated with view output 112 can be saved in a format directly used to render a view output. If desired, representation data 108 is serialized 114 to produce output content 116. In some embodiments, serialization 114 includes saving an interpretation-specific conversion of starting content 102. In some embodiments, serialization 114 is associated with file format conversion.
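The selection of a view for a given interpretation can be sketched as follows. Note that this sketch substitutes plain Python functions for the XSLT transformations mentioned above, because the Python standard library does not include an XSLT processor; the registry layout, view names, and element names are likewise assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional

View = Callable[[ET.Element], str]


def table_view(rep: ET.Element) -> str:
    """Render each child element as one row of an XHTML-style table."""
    table = ET.Element("table")
    for child in rep:
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "th").text = child.tag
        ET.SubElement(row, "td").text = child.text
    return ET.tostring(table, encoding="unicode")


def outline_view(rep: ET.Element) -> str:
    """Render a plain-text outline of the representation's structure."""
    return "\n".join([rep.tag] + [f"  {c.tag}: {c.text}" for c in rep])


# Hypothetical registry: each interpretation offers named views and a default.
VIEWS: Dict[str, Dict[str, View]] = {
    "financial-summary": {"table": table_view, "outline": outline_view},
}
DEFAULT_VIEW: Dict[str, str] = {"financial-summary": "table"}


def generate_view(rep: ET.Element, interpretation: str,
                  view_name: Optional[str] = None) -> str:
    """Render the representation with the selected view, or the default one."""
    name = view_name or DEFAULT_VIEW[interpretation]
    return VIEWS[interpretation][name](rep)


rep = ET.fromstring(
    "<financialSummary><assets>1200</assets>"
    "<liabilities>450</liabilities></financialSummary>"
)
print(generate_view(rep, "financial-summary"))
print(generate_view(rep, "financial-summary", "outline"))
```

The same representation is rendered differently depending on the selected view, and no view needs to know the file format of the starting content.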
At 604, a pipeline is configured to perform the desired processing. Configuring the pipeline includes determining, configuring, and/or connecting one or more components to perform the desired processing. In various embodiments, the pipeline is used to manage data flow and/or order of execution associated with the components. In some embodiments under at least certain circumstances the pipeline configuration determines how the output stream(s) of one component feed the input stream(s) of other components. In some embodiments, the pipeline is preconfigured using a configuration file. In some embodiments, the pipeline is configured dynamically. In various embodiments, the components are interconnecting translation components. For example, a generator component parses and maps a binary document to a corresponding XML (Extensible Markup Language) format. A second component sorts the XML format data without changing the schema, and a third component converts the XML document to another XML format associated with a higher level encoding schema.
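The three-component example above (generator, sorter, converter) can be sketched in Python as a list of callables in which the output of each component feeds the input of the next. The toy binary format (NUL-separated name=value records), the element names, and the function names are hypothetical and serve only to make the sketch runnable.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Any, Callable, Sequence

Component = Callable[[Any], Any]


def generate(raw: bytes) -> ET.Element:
    """Generator: map a toy binary format (NUL-separated name=value records) to XML."""
    root = ET.Element("records")
    for chunk in raw.split(b"\x00"):
        if chunk:
            name, _, value = chunk.decode("utf-8").partition("=")
            ET.SubElement(root, "record", name=name).text = value
    return root


def sort_records(doc: ET.Element) -> ET.Element:
    """Sorter: reorder the records by name without changing the schema."""
    doc[:] = sorted(doc, key=lambda element: element.get("name"))
    return doc


def to_contact(doc: ET.Element) -> ET.Element:
    """Converter: map generic records to a higher-level (hypothetical) contact schema."""
    contact = ET.Element("contact")
    for record in doc:
        ET.SubElement(contact, record.get("name")).text = record.text
    return contact


def run_pipeline(source: Any, components: Sequence[Component]) -> Any:
    """Feed the output of each component into the next, in order."""
    return reduce(lambda data, component: component(data), components, source)


PIPELINE = [generate, sort_records, to_contact]
result = run_pipeline(b"phone=555-0100\x00name=Ada\x00email=ada@example.org", PIPELINE)
print(ET.tostring(result, encoding="unicode"))
```

Because the pipeline is just an ordered list, a component can be replaced, removed, or inserted by editing the list rather than the components themselves.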
At 706, for each data flow pattern, a component and/or sets of components are configured to implement the pattern. The component performs at least a portion of processing used to achieve a desired result. At least a portion of the input to one component includes at least a portion of the source content. The component may be reused in other processing. In some embodiments, other content (e.g., component configuration data, user profile, session information, and/or Web service calls) is included as an input to a component. Configuring the component in various embodiments includes determining the input to the component and/or determining data and/or configurations needed to perform a function of the component. In various embodiments, the component is chosen from a group of existing components, and/or an indication that a new component is needed is generated. In some embodiments, the new component is dynamically generated in response to the indication, e.g., by a manual or at least partly automated process. In some embodiments, at least a portion of a component is generated from one or more preexisting components, such as by customizing or configuring an existing component to perform a required translation. In some embodiments, resolving each of the identified phases into the data flow includes connecting one or more components together. In various embodiments, connecting the next component includes associating at least a portion of input data to the next component from an output of one or more other components and/or associating an order to the processing associated with the next component with respect to one or more other components. Connecting the component, in some embodiments, includes determining order, inter-relationships, and/or dependencies associated with the component. If a component is dependent on another component, the component may not be connected until the component on which it depends has been determined, configured, and/or connected.
The data flow, including pattern/component specification, configuration, and/or connection, is specified using a configuration file and/or specified substantially concurrently with performing at least a portion of the desired processing function.
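One way to respect component dependencies when connecting components, offered here only as a sketch, is to derive the connection order from a dependency mapping with a topological sort. The configuration structure and component names below are hypothetical; `graphlib.TopologicalSorter` is part of the Python standard library (version 3.9 and later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical configuration: component name -> components whose output it consumes.
CONFIG: Dict[str, List[str]] = {
    "parse": [],
    "sort": ["parse"],
    "up_convert": ["sort"],
    "style": [],
    "render": ["up_convert", "style"],
}


def connection_order(config: Dict[str, List[str]]) -> List[str]:
    """Return an order in which every component follows the components it depends on.

    TopologicalSorter raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(config).static_order())


order = connection_order(CONFIG)
print(order)
```

Several valid orders may exist for the same configuration; the only guarantee is that a component is never connected before the components that supply its input.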
In some embodiments, using one or more reconfigurable and reusable components, selected in some embodiments from a toolkit group of available components, to implement one or more data flow patterns to achieve desired content processing enables content management solutions to be provided without the use of traditional full-featured content management applications and/or systems. The approaches described herein therefore provide a relatively lightweight solution to problems that in the past may have required more substantial investment in specialized servers and/or full-featured content management software.
By pipelining components, processing the electronic content, in some embodiments, remains highly flexible, adaptable and re-configurable. In some embodiments, additional efficiencies are realized by reusing components and/or pipelines or portions thereof. Unlike monolithic content processing solutions that require redevelopment of the entire system when new formats and desired functions are developed, new components can be added and configured to an already existing pipeline without redevelopment of the entire solution. In various embodiments, this enables easy reuse of components by simply redefining the pipeline to include the new components and, as applicable, new source content (input) and output streams. Change in data source, data format, schema, content management function, component configuration, publishing target, and/or publication styling can be easily adapted through the pipeline redefinition.
Any suitable pipeline technology may be used to implement the pipeline. In some embodiments, the Cocoon framework from Apache Software Foundation of Forest Hill, Md. is used to implement a pipeline for web publishing. The Cocoon pipeline is defined using a “sitemap file” written in an XML grammar. In some embodiments, a batch file is used to implement a pipeline. A batch file, in some embodiments, provides a lightweight way to implement a pipeline. In various embodiments, one or more Ant build files (XML files that encode build instructions for each target) of the Apache Software Foundation are used to implement a pipeline.
In some embodiments, a plug-in and/or add-on to a computer program product for content encoding rendering is used to provide context-aware content conversion and/or interpretation specific views. The plug-in/add-on is associated with one particular interpretation or a set of particular interpretations. In some embodiments, the plug-in/add-on is associated with an interface for providing information to the host program, which may include such information as the ID and/or name and/or description of the plug-in's supported interpretation(s), the supported input encodings, the supported output encodings, the IDs and/or names and/or descriptions of the supported views, and the names and data types of the supported search fields or categories. In some embodiments, the plug-in/add-on is associated with a software interface for interacting with the host program, which may include a function to invoke processing content according to the chosen interpretation, a function to invoke creation of a view or an encoding of a view that is passed back to the host, and functions to invoke the predicates of a custom search. In some embodiments, the plug-in/add-on is associated with processing the starting content to generate representation data comprising a second encoding of the starting content, wherein the second encoding is determined automatically and at least in part by the desired interpretation. In some embodiments, the plug-in/add-on is associated with rendering the representation data using a view associated with the desired interpretation. In some embodiments, the plug-in/add-on is associated with determining the search results arising from evaluating a particular predicate associated with the desired interpretation.
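A plug-in interface of the kind described above might look like the following Python sketch. All class, method, and field names are hypothetical, as are the comma-separated input format and the toy invoice plug-in; the sketch merely shows one informational method and three functional methods that a host program could call.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class PluginInfo:
    """Information a plug-in reports to the host program."""
    interpretation_id: str
    name: str
    input_encodings: List[str]
    output_encodings: List[str]
    views: List[str]
    search_fields: Dict[str, type]  # field name -> data type


class InterpretationPlugin(ABC):
    @abstractmethod
    def info(self) -> PluginInfo: ...

    @abstractmethod
    def process(self, starting_content: bytes) -> Any: ...

    @abstractmethod
    def render(self, representation: Any, view: str) -> str: ...

    @abstractmethod
    def evaluate(self, representation: Any, field_name: str,
                 relation: str, value: Any) -> List[Any]: ...


class InvoicePlugin(InterpretationPlugin):
    """Toy plug-in that interprets 'description, quantity' lines as an invoice."""

    def info(self) -> PluginInfo:
        return PluginInfo("invoice", "Invoice", ["text/csv"], ["python-list"],
                          ["table"], {"description": str, "quantity": int})

    def process(self, starting_content: bytes) -> List[Dict[str, Any]]:
        rows = []
        for line in starting_content.decode("utf-8").splitlines():
            description, quantity = line.split(",")
            rows.append({"description": description.strip(),
                         "quantity": int(quantity)})
        return rows

    def render(self, representation, view: str) -> str:
        if view != "table":
            raise ValueError(f"unsupported view: {view}")
        return "\n".join(f"{r['description']}\t{r['quantity']}" for r in representation)

    def evaluate(self, representation, field_name, relation, value):
        relations = {"greater than": lambda a, b: a > b,
                     "contains": lambda a, b: b in a}
        test = relations[relation]
        return [r for r in representation if test(r[field_name], value)]


plugin = InvoicePlugin()
rep = plugin.process(b"Steel bolt, 500\nDiamond drill bit, 2")
hits = plugin.evaluate(rep, "quantity", "greater than", 10)
print(plugin.render(rep, "table"))
print(hits)
```

A host program could use the reported search fields and their data types to build a search dialog without any built-in knowledge of invoices.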
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.
Number | Date | Country |
---|---|---|
20070250762 A1 | Oct 2007 | US |