Data store for software application documents

Abstract
A data store is provided for storing, relating and for allowing use of data associated with a computer-generated document. Data for structuring information associated with a document, such as document metadata, is maintained in the data store where relationships between different pieces of data are maintained in a location distinct from the surface level view of a document. The data store exposes interfaces to the various pieces of data in the data store for allowing different applications to access and operate on one or more of the data pieces. The pieces of data may be structured according to a markup language such as the Extensible Markup Language (XML), and XML schemas may be associated with each piece of data for allowing the data store to validate the structure applied to the data based on an XML schema associated with a given piece of data.
Description
FIELD OF THE INVENTION

The present invention generally relates to managing data associated with computer-generated documents. More particularly, the present invention relates a data store for storing and relating data associated with computer-generated documents in a separate location from presentation data for a document's typical presentation format.


BACKGROUND OF THE INVENTION

With the advent of the computer age, computer and software users have grown accustomed to user-friendly software applications that help then write, calculate, organize, prepare presentations, send and receive electronic mail, make music, and the like. For example, modem electronic word processing applications allow users to prepare a variety of useful documents. Modem spreadsheet applications allow users to enter, manipulate, and organize data. Modem electronic slide presentation applications allow users to create a variety of slide presentations containing text, pictures, data or other useful objects.


According to prior methods and systems, documents created by such applications (e.g. word processing documents, spreadsheets, slide presentation documents) have limited facility for storing/transporting the contents of arbitrary metadata required by the context of the documents. For example, a solution built on top of a word processing document may require the storage of workflow data that describes various states of the document, for example, previous workflow approval states (dates, times, names), current approval states, future workflow states before completion, name and office address of document author, document changes, and the like. According to such prior methods and systems, the options for storing this information were primarily limited to the use of document variables or existing custom object linking and embedding (OLE) document properties that have several limitations. For example, such prior methods can only store name/value pairs (no hierarchical data). Such methods are limited to 255 characters maximum. Such methods are built to contain only text. All properties for such methods are stored in a single store, for example, an OLE properties store, which means the properties have a possibility of conflicting. Further, such stored properties have no data validation because they are plain text. The result of these limitations is that it is difficult for users of such applications and related documents to store arbitrary data with documents, which is a common need of many users.


Another problem with prior methods and systems is that structured markup language data, for example Extensible Markup Language (XML) data may not be concurrently edited by multiple clients (for example, multiple add-ins each independently running in the context of a word processing document.) However, in the context of many documents, there is a higher likelihood that the scenarios involving this metadata will require concurrent editing by one or more sources.


Accordingly, there is a need for a data store for storing and relating data associated with a computer-generated document and for allowing use and manipulation of such data by one or more software applications. It is with respect to these and other considerations that the present invention has been made.


SUMMARY OF THE INVENTION

Embodiments of the invention solve the above and other problems by providing a data store within the document, yet separate in location (and possibly format) from the primary presentation storage location for storing, relating and for allowing use of data associated with a computer-generated document.


According to one aspect of the invention, data for structuring information associated with a document, such as document metadata, is maintained in a data store where relationships between different pieces of data are maintained. The data store exposes interfaces to the various pieces of data in the data store for allowing different applications to access and operate on one or more of the data pieces.


According to another aspect of the invention, the pieces of data are structured according to a markup language such as the Extensible Markup Language (XML). XML schemas may be associated with each piece of data, and the data store may validate the XML structure applied to the data based on an XML schema associated with a given piece of data. According to this aspect of the invention, documents may contain any number of arbitrary data items, for example metadata, structured according to the Extensible Markup Language (XML). Accordingly, document solution providers may store arbitrary metadata as XML with a given document and have that information automatically processed by a given solution having access to the data when the document is opened/edited/saved by a user.


According to another aspect of the invention programmatic access is provided to the data in its XML form while the document is being edited. Thus, a standard mechanism is provided that is familiar to solution developers via which the data may be accessed and modified programmatically while the document is open. This programmatic access mimics standard XML interfaces. Programmatic access to the data is provided via application programming interfaces to one or more editing client applications (for example, document editing or creating applications and/or third party application add-in solutions, and the like). According to this aspect, multiple client applications may access and edit the same piece of document data, and any conflicting changes to a given piece of data are resolved. “Side effects” to any given change may be made (for example, in response to setting a company name to “Microsoft,” changing a stock symbol to “MSFT”). In addition, changes to data and any associated side effects may be “bundled” by the data store so that undoing one or more changes reverses all related changes. This removes the burden of development from the solution itself to ensure that it has reversed all changes when the user initiates an undo of the original change from the document surface, for example, by pressing an Undo command.


According to another aspect of the invention, standard XML schemas (XSDs) may be used to define the contents of any of the pieces of custom XML data associated with document metadata in order to ensure that XML data applied to the document data are valid. These schemas may be attached to any instance of XML data stored in the document, and the data store will disallow any change to the XML data that would result in the XML structure (that is, the XML tags as opposed to their contents) of that data from becoming invalid. This ensures that the solution developer can attach a specific piece of XML metadata to a document and ensure that the XML data will continue to be structurally “correct” according to the associated schema, regardless of which processes (for example, add-ins) are used to modify that data.


These and other features and advantages, which characterize the present invention, will be apparent from a reading of the following detailed description and a review of the associated drawings. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.




BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates an exemplary computing device that may be used in one exemplary embodiment of the present invention.



FIG. 2 is a block diagram illustrating a relationship between one or more client applications and a data store and the contents of the data store according to embodiments of the present invention.




DETAILED DESCRIPTION

As briefly described above, embodiments of the present invention are directed to methods and systems for storing and relating data associated with a computer-generated document and for efficiently allowing use and manipulation of data associated with a computer-generated document by one or more software applications. These embodiments may be combined, other embodiments may be utilized, and structural changes may be made without departing from the spirit or scope of the present invention. The following detailed description is therefore not to be taken in a limiting sense and the scope of the present invention is defined by the appended claims and their equivalents.


With reference to FIG. 1, one exemplary system for implementing the invention includes a computing device, such as computing device 100. In a very basic configuration, computing device 100 typically includes at least one processing unit 102 and system memory 104. Depending on the exact configuration and type of computing device, system memory 104 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.) or some combination of the two. System memory 104 typically includes an operating system 105, one or more applications 106, and may include program data 107. In one embodiment, application 106 may include a word processor application 120. This basic configuration is illustrated in FIG. 1 by those components within dashed line 108.


Computing device 100 may have additional features or functionality. For example, computing device 100 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 1 by removable storage 109 and non-removable storage 110. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. System memory 104, removable storage 109 and non-removable storage 110 are all examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 100. Any such computer storage media may be part of device 100. Computing device 100 may also have input device(s) 112 such as keyboard, mouse, pen, voice input device, touch input device, etc. Output device(s) 114 such as a display, speakers, printer, etc. may also be included. These devices are well know in the art and need not be discussed at length here.


Computing device 100 may also contain communication connections 116 that allow the device to communicate with other computing devices 118, such as over a network. Communication connection 116 is one example of communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. The term computer readable media as used herein includes both storage media and communication media.


A number of program modules and data files may be stored in the system memory 104 of the computing device 100, including an operating system 105 suitable for controlling the operation of a networked personal computer, such as the WINDOWS operating systems from MICROSOFT Corporation of Redmond, Wash. System memory 104 may also store one or more program modules, such as word processor application 120, and others described below. Word processor application 120 is operative to provide functionality for creating, editing, and processing electronic documents.


According to one embodiment of the invention, the word processor application 120 comprises the WORD program from MICROSOFT Corporation. It should be appreciated, however, that word processor application programs from other manufacturers may be utilized to embody the various aspects of the present invention. It should further be appreciated that illustration of a word processing application is for purposes of example only and is not limiting of other types of applications that may produce and operate on documents according to the present invention. For example, other application programs 106 which are capable of processing various forms of content (e.g. text, images, pictures, etc.), such as spreadsheet application programs, database application programs, slide presentation application programs, drawing or computer-aided application programs, etc. are equally applicable to embodiments of the present invention. An example application program 106 that produces and operates on a variety of different types of documents includes OFFICE from MICROSOFT Corporation.


Embodiments of the invention may be implemented as a computer process, a computing system, or as an article of manufacture such as a computer program product or computer readable media. The computer program product may be a computer storage media readable by a computer system and encoding a computer program of instructions for executing a computer process. The computer program product may also be a propagated signal on a carrier readable by a computing system and encoding a computer program of instructions for executing a computer process.


Throughout the specification and claims, the following terms take the meanings associated herein, unless the context of the term dictates otherwise.


The term “data” may refer to document surface level or presentation level information such as words, sentences, paragraphs and the like, as well as, supplementary information, for example, metadata, which is carried with, referred to, or used by the word processing document. This information is often large and is likely not exposed on the presentation layer of the document.


The terms “markup language” or “ML” refer to a language for special codes within a document that specify how parts of the document are to be interpreted by an application. In a word processor file, the markup language specifies how the text is to be formatted or laid out.


The term “element” refers to the basic unit of an XML document. The element may contain attributes, other elements, text, and other content regions for an XML document.


The term “presentation” refers to the visible portion of the document—the text and layout that would appear if the document were printed.


The term “tag” refers to a character inserted in a document that delineates elements within an XML document. Each element can have no more than two tags: the start tag and the end tag. It is possible to have an empty element (with no content) in which case one tag is allowed.


The XML content between the tags is considered the element's “children” (or descendants). Hence other elements embedded in the element's content are called “child elements” or “child nodes” or the element. Text embedded directly in the content of the element is considered the element's “child text nodes”. Together, the child elements and the text within an element constitute that element's “content”.


The term “attribute” refers to an additional property set to a particular value and associated with the element. Elements may have an arbitrary number of attribute settings associated with them, including none. Attributes are used to associate additional information with an element that will not contain additional elements, or be treated as a text node.


“XPath” is an operator that uses a pattern expression to identify nodes in an XML document. An XPath pattern is a slash-separated list of child element names that describe a path through the XML document. The pattern “selects” elements that match the path.


The term “XML data store” refers to a container within a document, such as a word processor document, a spreadsheet document, a slide presentation document, etc., which provides access for storage and modification of the data ( in XML format, for example) stored in the document while the file is open. Further definition of XML data store is provided below with respect to FIG. 2.



FIG. 2 is a block diagram illustrating a relationship between one or more client applications and a data store and the contents of the data store according to embodiments of the present invention. Referring to FIG. 2, the document data 220 includes XML structure data and associated document data representing the surface or presentation level view of a document. For example the document data 220 may include XML structure (e.g., heading tags, body tags, conclusion tags) and associated surface view data (e.g., words, sentences, paragraphs) of a word processing document, spreadsheet document, slide presentation document, and the like.


The data store 208 is a document data repository for storing one or more pieces of structured data associated with one or more types of data associated with a given document. The metadatal 225 (structured data item) may include XML structure data and associated data for a first piece of metadata associated with the document. For example, the metadatal 225 may include XML structure data (e.g., date tags, name tags, etc.) applied to metadata listing the document author, date of document creation, date of document last change/save, and the like. The metadata2230 (structured data item) may include XML structure data (tags) and associated metadata representing a second piece of metadata associated with the document. As should be understood, the metadatal and metadata2 are for purposes of example and are not limiting of the variety and number of different types of data that may be maintained in the data store 208 in association with a given document. For example, as described herein, arbitrary data may be structured and added to the document by one or more software application as desired by solution providers or users having access to the document data.


Referring still to FIG. 2, a schema file 240, 245 may be attached to each piece of data stored in the data store 208 for dictating the syntax and validation rules associated with Extensible Markup Language (XML) data applied to each piece of data 225, 230. As known to those skilled in the art, XML schema files provide a way to describe and validate data in an XML environment. A schema file states what XML markup data, including elements and attributes, are used to describe content in an XML document, and the schema file defines XML markup syntax, including where each element is allowed, what types of content are allowed within an element and which elements can appear within other elements. The use of schema files ensures that the document (or individual piece of data in this case) is structured in a consistent and predictable manner. Schema files 240, 245 may be created by a user and generally supported by an associated markup language, such as XML.


This schematization of the document allows the data store to provide the ability to “guarantee” the structural validity of the document by rejecting any change that violates a given schema file at the data store level. According to an embodiment, the data store 208 utilizes a schema validation module 260 for validating XML structure added to or changes made to a given piece of data against an associated schema file. For example, if a document creator or editor makes XML structural changes to a given piece of data, for example, the metadatal, wherein the editor adds or removes a given XML tag, the data store 208 will utilize the schema validation module to check the XML structural changes against the associated schema file to ensure the validity of the change. If the change is not valid, an error can be generated to the editor. As is understood, such control of the XML structure applied to a given piece of data allows for structural consistency and predictability which is especially important for allowing client and third party applications to interact with associated data.


According to an embodiment of the invention, the data store 208 provides one or more application programming interfaces (API) 270 which can be accessed by client applications 205 (e.g., word processing applications, spreadsheet applications, slide presentation applications, etc.), as well as, third party applications 210, 215 via the object models (OM) of the respective applications 205, 210, 215. These interfaces allow client applications and third party applications to load any existing XML file into a given document's data store 208, thus ensuring that that data is now part of the document and will travel within that document for its lifetime (e.g., through opening/editing/saving/renaming/etc.) or until the data is deleted from the data store. According to one embodiment, the data in the data store is available in its XML format even when a source application for a given piece of data 225, 230 is closed or is otherwise not available. That is, a given piece of data 225, 230 may be accessed via the APIs 270 by other applications (other than a source application). As described below, the APIs also allow client and third party applications to make changes to the XML markup data applied to the data items 225, 230.


Once XML data 225, 230 is loaded into the data store for association with a document 220, it can be manipulated as standard XML using the data store interfaces designed to provide similar methods to existing XML editing interfaces in order to leverage developers' existing knowledge of the XML programming standard. This allows users to perform standard XML operations on XML data added to the data store for a document, such as adding elements and attributes, removing elements and attributes, changing the value of existing elements/attributes, and reading the values of any existing part of the associated XML tree. Using these XML standard operations, solutions may store structured complex metadata with a document subject to none of the previous restrictions on the length/size of the data or structure of the data, which enables the use of this XML data store for significantly more structured solutions than prior solutions. For example, a third party application 215 may be written for locating and extracting document author names and document creation dates from a number of documents 204 by reading the metadatal 225 added to the data store 208 for each document. The example third party application may be a spreadsheet application programmed for making a list of document author names and document creation dates for all documents created by a given organization. In accordance with embodiments of the present invention, the third party application may utilize the XML structure applied to the metadatal for efficiently locating and extracting the desired data. For example, the third party application may be written to parse the XML structure of the metadatal file to locate XML tags, such as <docauthor> and <doccreationdate> for obtaining and using data associated with those tags. As should be appreciated, the forgoing is just one example of the many ways one or more applications may interact with structured data that is associated with the document via the data store 208.


In addition, the data store 208 provides any number of API interfaces 270 to any individual piece of XML data 220, 225, 230 (also known as a store item) to enable multiple applications 205, 210, 215 to work with the same piece of data. For example, several solutions, such as a client application (e.g., word processing application) and third party application solutions (e.g., the example spreadsheet application described above), may work with the same set of document properties (e.g., properties contained in the metadata2230 file). Using the data store 208, each of these applications receive separate access to the desired XML data 230 through their own data store API interface 270 for allowing each application to communicate with the data via its own OM without having to deal with the complexity of having multiple processes accessing the same piece of data.


In order to allow for these multiple applications 205, 210, 215 to access the same data, the data store 208 notifies each of these applications when any part of the XML data is changed by another application so that a given application may respond to that change (both internally to its own process and externally by other changes to the same data). When one application requests a change to a given data item, that request is automatically sent to all other applications to allow other applications to decide how or if to respond to the requested change. According to one embodiment, this is accomplished by allowing each application to register to “listen” to any part of the XML data to which it has an interface so that a given application solution/program only receives those messages which are pertinent to its own logic. For example, one type of application 210 may wish to register to listen to all changes made to a given XML data in order to provide detailed business logic capabilities to a third party solution, but another type of application 215 may wish to only listen to changes to one or two specific XML elements within the same data because its logic does not care about changes to any other part of the XML data.


According to this embodiment, the multiple applications 205, 210, 215 may access and edit the same piece of document data, and any conflicting changes to a given piece of data are resolved. For example, “side effects” to any given change may be made when one change by on application causes a side effect change by another application. For example, a first application 210 may be tasked with extracting company names from one or more data items 225, 230 associated with a given document for translating those names into corresponding stock symbols, if available, for compiling a list of company stock symbols related to a given document. If a second application 215 causes a given company name in a given piece of metadata to be added or to be changed, for example, changing a company name from “Company ABC” to Company XYZ,” the first application may listen to this change for automatically updating its list of stock symbols to include the stock symbol for “Company XYZ” instead of “Company ABC.” In addition, such changes and any associated side effects may be bundled by the data store 208 so that undoing one or more changes reverses all related changes.


As described herein, embodiments of the invention provide a data store for storing, relating and for allowing use of data associated with a computer-generated document. It will be apparent to those skilled in the art that various modifications or variations may be made in the present invention without departing from the scope or spirit of the invention. Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein.

Claims
  • 1. A method of managing data associated with computer-generated documents, comprising: storing a document with an associated document data store; storing a structured data item associated with the document in the document data store; and exposing one or more application programming interfaces (API) to one or more software applications for allowing programmatic access to the structured data item by the one or more software applications.
  • 2. The method of claim 1, further comprising structuring the structured data item according to the Extensible Markup Language (XML).
  • 3. The method of claim 2, further comprising associating with the structured data item an XML schema file for providing XML markup data and XML markup syntax that may be validly applied to the structured data item.
  • 4. The method of claim 3, further comprising receiving a change to an XML markup data applied to the structured data item via the exposed one or more APIs.
  • 5. The method of claim 4, in response to receiving a change to an XML markup data applied to the structured data item, reading an XML schema file associated with the structured data item to which the change to the XML markup data is directed; and determining whether the change to the XML markup data is valid according to the read XML schema file.
  • 6. The method of claim 5, further comprising if the change to the XML markup data is not valid according to the read XML schema file, disallowing the change to the XML markup data.
  • 7. The method of claim 1, further comprising receiving programmatic access to the structured data item associated with the document by one of the one or more software applications via the one or more application programming interfaces.
  • 8. The method of claim 7, further comprising if a change is received to the structured data item via one of the one or more software applications having access to the document and to the structured data item, notifying any other of the one or more software applications having access to the structured data item of the change received to the structured data item.
  • 9. The method of claim 8, further comprising allowing a notified one of the one or more software applications to make changes to the structured data item in response to the change received to the structured data item.
  • 10. The method of claim 9, whereby if the change received to the structured data item is undone by one of the one or more software applications, correspondingly undoing any changes made to the structured data item by any other of the one or more software applications where the any changes made by the other of the one or more software applications were made in response to the undone change.
  • 11. The method of claim 1, prior to storing a structured data item associated with the document in the document data store, receiving the structured data item from one of the one or more software applications via the one or more application programming interfaces (API).
  • 12. The method of claim 1, whereby the structured data item associated with the document includes metadata associated with the computer-generated document.
  • 13. A document data store for managing data associated with computer-generated documents and being operative: to store a structured data item associated with a document in the document data store; and to expose one or more application programming interfaces (API) to one or more software applications for allowing programmatic access to the structured data item by the one or more software applications.
  • 14. The document data store of claim 13, being further operative to receive a change to an Extensible Markup Language (XML) markup data applied to the structured data item via the exposed one or more APIs; and to read an XML schema file associated with the structured data item to which the change to the XML markup data is directed; to determine whether the change to the XML markup data is valid according to the read XML schema file; and to disallow the change to the XML markup data if the change to the XML markup data is not valid according to the read XML schema file.
  • 15. The document data store of claim 13, being further operative to receive programmatic access to the structured data item associated with the document by one of the one or more software applications via the one or more application programming interfaces; and to notify any other of the one or more software applications having access to the structured data item of the change received to the structured data item if a change is received to the structured data item via one of the one or more software applications having access to the structured data item.
  • 16. A computer-readable medium having stored thereon computer-executable instructions which when executed by a computer perform a method of managing data associated with computer-generated documents, comprising: storing a structured data item associated with a document in the document data store; and exposing one or more application programming interfaces (API) to one or more software applications for allowing programmatic access to the stored document and to the structured data item by the one or more software applications.
  • 17. The computer-readable medium of claim 16, further comprising structuring the structured data item according to the Extensible Markup Language (XML); and associating with the structured data item an XML schema file for providing XML markup data and XML markup syntax that may be validly applied to the structured data item.
  • 18. The computer-readable medium of claim 17, further comprising receiving a change to an XML markup data applied to the structured data item via the exposed one or more APIs; reading an XML schema file associated with the structured data item to which the change to the XML markup data is directed; determining whether the change to the XML markup data is valid according to the read XML schema file; and if the change to the XML markup data is not valid according to the read XML schema file, disallowing the change to the XML markup data.
  • 19. The computer-readable medium of claim 16, further comprising receiving programmatic access to the structured data item associated with the document by one of the one or more software applications via the one or more application programming interfaces; if a change is received to the structured data item via one of the one or more software applications having access to the structured data item, notifying any other of the one or more software applications having access to structured data item of the change received to the structured data item; and allowing a notified one of the one or more software applications to make changes to the structured data item in response to the change received to the structured data item.
  • 20. The computer-readable medium of claim 19, whereby if the change received to the structured data item is undone by one of the one or more software applications, correspondingly undoing any changes made to the structured data item by any other of the one or more software applications where the any changes made by the other of the one or more software applications were made in response to the undone change.