Markup languages have attained wide popularity in recent years. One type of markup language, Extensible Markup Language (XML), is a universal language that provides a way to identify, exchange, and process various kinds of data. For example, XML is used to create documents that can be used by a variety of application programs. Elements of an XML file have an associated namespace and schema.
In XML, a namespace is a unique identifier for a collection of names that are used in XML documents as element types and attribute names. The name of a namespace is commonly used to uniquely identify each class of XML document. The unique namespaces differentiate markup elements that come from different sources and happen to have the same name.
XML schemata provide a way to describe and validate data in an XML environment. A schema states which elements and attributes are used to describe content in an XML document, where each element is allowed, what types of text content are allowed within it, and which elements can appear within which other elements. The use of schemata ensures that documents are structured in a consistent manner. Schemata may be created by a user and are generally supported by an associated markup language, such as XML. By using an XML editor, the user can manipulate the XML file and generate XML documents that adhere to the schema the user has created. XML documents may be created to adhere to one or more schemata.
Many consider the XML standard to be the ASCII format of the future, due to its expected pervasiveness throughout the high-tech industry in the coming years. Recently, some word processors have begun producing documents that are somewhat XML compatible. For example, some documents may be parsed using an application that understands XML.
In XML, a document must be well formed. Generally, this means that tags within the XML document do not overlap. A number of word-processor features, however, are allowed to span arbitrary ranges, such as comments, bookmarks, document protection, and the like. What is needed is a way to represent these features in XML.
The present invention is directed towards representing non-structured features that are common in word processors such that these elements can be recognized and parsed separately from other elements within an XML document.
According to one aspect of the invention, non-structured features are represented in well-formed XML. Features that may span arbitrary ranges include comments, bookmarks, document protection, and the like.
According to another aspect of the invention, empty tags are used to mark the start and end of a feature that may span other features. These elements can be recognized and parsed separately from other elements.
According to yet another aspect of the invention, the word-processing documents may be parsed by applications that understand XML. The XML word-processing documents may be manipulated on a server, or anywhere else, even when the word processor that created the XML document is not present.
Throughout the specification and claims, the following terms take the meanings explicitly associated herein, unless the context clearly dictates otherwise.
The terms “markup language” or “ML” refer to a language of special codes within a document that specify how parts of the document are to be interpreted by an application. In a word-processor file, the markup language specifies how the text is to be formatted or laid out, whereas in a particular customer schema, the ML tends to specify the text's meaning according to that customer's wishes (e.g., customerName, address, etc.). The ML is typically supported by a word processor and may adhere to the rules of other markup languages, such as XML, while creating further rules of its own.
The term “element” refers to the basic unit of an ML document. The element may contain attributes, other elements, text, and other building blocks for an ML document.
The term “tag” refers to a command inserted in a document that delineates elements within an ML document. Each element can have no more than two tags: the start tag and the end tag. An element may also be empty (have no content), in which case a single tag is allowed.
The content between the tags is considered the element's “children” (or descendants). Hence other elements embedded in the element's content are called “child elements” or “child nodes” of the element. Text embedded directly in the content of the element is considered the element's “child text nodes”. Together, the child elements and the text within an element constitute that element's “content”.
The term “attribute” refers to an additional property set to a particular value and associated with the element. Elements may have an arbitrary number of attribute settings associated with them, including none. Attributes are used to associate with an element additional information that will neither contain other elements nor be treated as a text node.
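These terms map onto the node model that most XML parsers expose. As a minimal sketch, assuming Python's standard xml.etree.ElementTree module and an illustrative <memo> fragment (the priority attribute and the <flag/> element are made up for this example), the following reports an element's attributes, its child elements, and its direct child text:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: <memo> has one attribute, two child elements and
# some direct text; <flag/> is an empty element written as a single tag.
SAMPLE = '<memo priority="high"><to>Scott</to>Note text<flag/></memo>'

def describe(xml_text):
    """Return the root element's tag, attributes, child tags and direct text."""
    root = ET.fromstring(xml_text)
    # ElementTree keeps text before the first child in root.text and text
    # after each child in that child's .tail attribute.
    pieces = [root.text or ""] + [child.tail or "" for child in root]
    return {
        "tag": root.tag,
        "attributes": dict(root.attrib),
        "children": [child.tag for child in root],
        "text": "".join(pieces),
    }

print(describe(SAMPLE))
```

Note that in ElementTree, text following a child element is stored on that child's tail rather than on the parent, which is why the sketch gathers the tails to reconstruct the element's child text.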
Illustrative Operating Environment
With reference to
Computing device 100 may have additional features or functionality. For example, computing device 100 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in
Computing device 100 may also contain communication connections 116 that allow the device to communicate with other computing devices 118, such as over a network. Communication connection 116 is one example of communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. The term computer readable media as used herein includes both storage media and communication media.
Representing Non-Structured Features in a Well Formed Document
Generally, the present invention is directed at representing non-structured features common in word processors such that these elements can be recognized and parsed separately from other elements.
In XML, a document must be well formed. Generally, this means that tags within the XML document do not overlap. A number of word-processor features, however, are allowed to span arbitrary ranges, such as comments, bookmarks, document protection, and the like.
In one embodiment, word-processor 120 has its own namespace or namespaces and a schema, or a set of schemas, defined for use with documents associated with word-processor 120. The set of tags and attributes defined by the schema for word-processor 120 defines the format of a document to such an extent that it is referred to as its own native ML. Word-processor 120 internally validates ML file 210. During validation, the ML elements are examined to determine whether they conform to ML schema 215. A schema states which tags and attributes are used to describe content in an ML document, where each tag is allowed, and which tags can appear within other tags, ensuring that documents are structured the same way. Accordingly, ML file 210 is valid when it is structured as set forth in arbitrary ML schema 215.
ML validation engine 225 operates similarly to other available validation engines for ML documents. ML validation engine 225 evaluates ML that is in the format it supports; for example, XML elements are forwarded to an XML validation engine. In one embodiment, additional validation engines may be associated with word-processor 120 to validate additional ML formats.
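ML schema 215 would normally be written in a schema language such as XML Schema and checked by a validation engine such as ML validation engine 225. Purely to illustrate the kind of check described above (which tags may appear within which other tags, and which attributes each tag may carry), the sketch below replaces the schema with a hypothetical rule table, TOY_SCHEMA, and a hypothetical validate function. It assumes Python's xml.etree.ElementTree and is not an XML Schema processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a schema: for each element name, the child
# elements and attributes it may contain. This is not an XSD implementation.
TOY_SCHEMA = {
    "memo": {"children": {"from", "to", "subject", "message"}, "attributes": set()},
    "from": {"children": set(), "attributes": set()},
    "to": {"children": set(), "attributes": set()},
    "subject": {"children": set(), "attributes": set()},
    "message": {"children": set(), "attributes": set()},
}

def validate(element, schema):
    """Return a list of rule violations in element and its descendants."""
    rule = schema.get(element.tag)
    if rule is None:
        return [f"<{element.tag}> is not declared in the schema"]
    errors = []
    for name in element.attrib:
        if name not in rule["attributes"]:
            errors.append(f"attribute {name!r} not allowed on <{element.tag}>")
    for child in element:
        if child.tag not in rule["children"]:
            errors.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        errors.extend(validate(child, schema))  # check descendants too
    return errors

good = ET.fromstring("<memo><from>Brian</from><to>Scott</to></memo>")
bad = ET.fromstring('<memo><to urgent="yes"><from>Brian</from></to></memo>')
print(validate(good, TOY_SCHEMA))  # []
print(validate(bad, TOY_SCHEMA))
```

A real engine would also check text content types, ordering, and occurrence counts; the sketch only covers nesting and attribute names.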
The ML provides enough elements for an application that understands XML to fully recreate the document from a single XML file. Hint tags may also be included to provide information that helps an application understand the content of the file.
There are a number of fundamental rules when using XML. One of these rules is called “well-formedness,” which means that XML markup must not overlap. Here is an example of XML that is not well formed:
<root>
<title>
<subTitle>
</title>
</subTitle>
</root>
Note how the <subTitle> tag starts inside of the <title> tag, but the <subTitle> tag ends outside of the <title> tag. For this document to be a well-formed XML document, it should look like the following:
<root>
<title>
<subTitle>
</subTitle>
</title>
</root>
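Any conforming XML parser enforces this rule. As an illustration, assuming Python's standard xml.etree.ElementTree module, parsing the overlapping <title>/<subTitle> markup raises a ParseError, while the properly nested version parses cleanly (is_well_formed is a helper defined only for this sketch):

```python
import xml.etree.ElementTree as ET

OVERLAPPING = "<root><title><subTitle></title></subTitle></root>"
NESTED = "<root><title><subTitle></subTitle></title></root>"

def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:  # e.g. "mismatched tag" for overlapping markup
        return False
    return True

print(is_well_formed(OVERLAPPING))  # False
print(is_well_formed(NESTED))       # True
```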
There are a number of word-processing features that consist of some type of “structure” applied to a range of text. A bookmark, for instance, can be applied to a selection of text. For purposes of this disclosure, assume that a word-processing bookmark is identified by the <w:bookMark> tag. An example will be presented to illustrate.
If one were to apply a bookmark called “2nd sentence” to the 2nd sentence of the above paragraph, the XML representation of that might look something like what is illustrated in
This style of tags works for representing the paragraph with the <w:p> tag and the bookmark with the <w:bookMark> tag, but the approach does not always work. Consider two paragraphs with a bookmark spanning both of them:
Here is my first paragraph
Here is my second paragraph
Using the approach as illustrated in
<w:p>
Here is my
second paragraph
</w:p>
The above example is not well formed, as the bookmark tag overlaps the paragraph tags. To create a well-formed XML representation, two tags are used for objects like bookmarks. According to one embodiment, a <w:bookmarkStart> tag and a <w:bookmarkEnd> tag represent bookmarks. So the above example would be represented as shown in
Since the <w:bookmarkStart> and <w:bookmarkEnd> tags are both empty tags, they don't have the problem of wrapping the <w:p> tag. Instead, they are two empty tags that are associated with each other by the id attribute contained within both of the tags. With this method, it is possible to show where a bookmark starts and ends, while still maintaining a well formed document.
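To show how a consumer can use the shared id to recover the bookmarked range, here is a minimal sketch assuming Python's xml.etree.ElementTree, a placeholder namespace URI (urn:example:wordml) for the w: prefix, and the unprefixed name and id attributes used in the memo example of this description. It walks the document in order, pairs each <w:bookmarkStart> with the <w:bookmarkEnd> carrying the same id, and collects the text in between, even though the bookmark crosses a paragraph boundary. The bookmark name "span" and the helper bookmark_text are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "urn:example:wordml"  # hypothetical namespace URI for the w: prefix
P, START, END = (f"{{{W}}}{name}" for name in ("p", "bookmarkStart", "bookmarkEnd"))

DOC = (
    f'<w:body xmlns:w="{W}">'
    '<w:p>Here is my <w:bookmarkStart name="span" id="bk1"/>first paragraph</w:p>'
    '<w:p>Here is my second<w:bookmarkEnd id="bk1"/> paragraph</w:p>'
    "</w:body>"
)

def bookmark_text(root):
    """Return {bookmark name: covered text}, pairing start/end markers by id.
    A newline is recorded for each paragraph that ends inside a bookmark."""
    open_ids = {}   # id -> name of bookmarks whose start has been seen
    collected = {}  # name -> list of text pieces

    def add_text(text):
        for name in open_ids.values():
            collected[name].append(text)

    def walk(elem):
        if elem.tag == START:
            open_ids[elem.get("id")] = elem.get("name")
            collected[elem.get("name")] = []
        elif elem.tag == END:
            open_ids.pop(elem.get("id"), None)
        add_text(elem.text or "")
        for child in elem:
            walk(child)
            add_text(child.tail or "")  # text that follows the child
        if elem.tag == P:
            add_text("\n")

    walk(root)
    return {name: "".join(parts) for name, parts in collected.items()}

print(bookmark_text(ET.fromstring(DOC)))
```

Because both markers are empty elements, the document parses without error, and the pairing relies only on the id attribute, not on nesting.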
According to one embodiment, empty tags are used for a number of different features in the word processor, including: Range-level permissions; Bookmarks; Comments; Tracked changes; Spelling errors; and Grammar errors.
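The same start/end pattern applies to each of these features. As an illustrative sketch (not the word processor's actual serializer), the hypothetical serialize function below takes plain paragraphs plus a list of ranges, each given as a feature kind, an id, and start and end positions, and writes an empty <w:{kind}Start/> and <w:{kind}End/> pair for every range. The tag names other than bookmarkStart and bookmarkEnd (such as commentStart) and the namespace URI are assumptions made for the example. Because every marker is an empty tag, ranges can cross paragraph boundaries and overlap one another while the output stays well formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

W = "urn:example:wordml"  # hypothetical namespace URI for the w: prefix

def serialize(paragraphs, ranges):
    """Emit each paragraph as <w:p> and mark every range with a pair of empty
    tags, <w:{kind}Start/> and <w:{kind}End/>, linked by a shared id.

    ranges: (kind, range_id, (para, offset), (para, offset)) tuples; offsets
    are character positions within the given paragraph.
    """
    markers = {i: [] for i in range(len(paragraphs))}
    for kind, range_id, (sp, so), (ep, eo) in ranges:
        # Order key 0 puts a start before an end at the same offset.
        markers[sp].append((so, 0, f"<w:{kind}Start id={quoteattr(range_id)}/>"))
        markers[ep].append((eo, 1, f"<w:{kind}End id={quoteattr(range_id)}/>"))
    out = [f'<w:body xmlns:w="{W}">']
    for i, text in enumerate(paragraphs):
        out.append("<w:p>")
        pos = 0
        for offset, _, tag in sorted(markers[i]):
            out.append(escape(text[pos:offset]) + tag)
            pos = offset
        out.append(escape(text[pos:]) + "</w:p>")
    out.append("</w:body>")
    return "".join(out)

paragraphs = ["Here is my first paragraph", "Here is my second paragraph"]
ranges = [
    ("bookmark", "bk1", (0, 11), (1, 17)),  # spans the paragraph boundary
    ("comment", "c1", (1, 0), (1, 27)),     # hypothetical second feature kind
]
xml_text = serialize(paragraphs, ranges)
ET.fromstring(xml_text)  # parses without error: the output is well formed
print(xml_text)
```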
Bookmarks are used in word-processing documents for a variety of reasons. They allow a user to call attention to part of a document without actually altering the document, and to easily get back to that point in the document.
Bookmarks become even more powerful with XML. Because XML is a text-based format that is easily readable and parsable, bookmarks become a great way to get into a specific portion of a rich document. Bookmarks may also be used to index XML documents; for example, documents stored on a server may be bookmarked and then indexed by their bookmarks.
Bookmarks not only identify key areas in a document; they also allow a user to select a range within a document. This is analogous to taking a book and, instead of just inserting a bookmark at a single page, highlighting a specific section of text within the book.
Since bookmarks can be applied to a range, one could use an XML parser to show the textual values of all bookmarks in a specific document or in a group of documents.
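A minimal sketch of indexing a group of documents by bookmark, assuming Python's xml.etree.ElementTree, the <w:bookmarkStart name="..." id="..."/> form used in the memo example of this description, and a placeholder namespace URI. The document names, contents, and the index_bookmarks helper are made up for illustration. Only the start markers are needed to build the index, because they carry the bookmark name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

W = "urn:example:wordml"  # hypothetical namespace URI for the w: prefix

def index_bookmarks(documents):
    """Build {bookmark name: [document names]} from a mapping of
    document name -> XML text, using the <w:bookmarkStart> markers."""
    index = defaultdict(list)
    for doc_name, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        for start in root.iter(f"{{{W}}}bookmarkStart"):
            index[start.get("name")].append(doc_name)
    return dict(index)

def wrap(body):
    """Wrap paragraph markup in a body element that declares the w: prefix."""
    return f'<w:body xmlns:w="{W}"><w:p>{body}</w:p></w:body>'

docs = {
    "memo1.xml": wrap('<w:bookmarkStart name="Feedback" id="1"/>mail it'
                      '<w:bookmarkEnd id="1"/>'),
    "memo2.xml": wrap('<w:bookmarkStart name="Feedback" id="7"/>done'
                      '<w:bookmarkEnd id="7"/>'
                      '<w:bookmarkStart name="Budget" id="8"/>$5'
                      '<w:bookmarkEnd id="8"/>'),
}
print(index_bookmarks(docs))
```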
This gives bookmarks a capability that most XML schemas do not provide. For example, consider a schema for a memo that defines the following elements: “To:”, “From:”, “Subject:” and “Message:”.
These elements would allow a user to create a structured memo that could easily be routed to the proper recipient. An example memo is as follows:
<memo><from>Brian</from>
<to>Scott</to>
<subject>Hello</subject>
<message>Hey there, how's it going. Did you finish the task of mailing the feedback?</message>
</memo>
Now, what if the actual task of “mailing the feedback” were of interest to certain people? It's obvious that tasks aren't always going to be in a memo, so there probably wouldn't be an actual “task” element in the memo schema. Even if there were, there are probably a number of other types of things that can randomly appear in a memo that some people may want to flag, but that wouldn't appear in the memo schema.
With the bookmark feature, it is possible to flag that bit of text within the “message” element, so that XML parsers can parse not just the memo but also relevant bits of data within the actual message. The following is an exemplary way to bookmark the “mailing the feedback” section.
<memo><from>Brian</from>
<to>Scott</to>
<subject>Hello</subject>
<message>Hey there, how's it going. Did you finish the task of
<w:bookmarkStart name="Feedback" id="bk2"/>
mailing the feedback?
<w:bookmarkEnd id="bk2"/>
</message>
</memo>
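As a sketch of how a parser might pull the flagged task out of the memo even though the memo schema has no task element, the hypothetical bookmarked_text function below scans the children of <message> for the named bookmark and returns the text up to the matching end marker. It assumes Python's xml.etree.ElementTree and adds a placeholder namespace declaration for the w: prefix (the fragment above omits one, and a namespace-aware parser rejects an undeclared prefix). It handles only the simple case where both markers are direct children of the same element, as in this memo; ranges that cross elements need a full document-order walk instead.

```python
import xml.etree.ElementTree as ET

W = "urn:example:wordml"  # hypothetical namespace URI; memo elements stay unqualified

MEMO = f"""<memo xmlns:w="{W}"><from>Brian</from>
<to>Scott</to>
<subject>Hello</subject>
<message>Hey there, how's it going. Did you finish the task of
<w:bookmarkStart name="Feedback" id="bk2"/>
mailing the feedback?
<w:bookmarkEnd id="bk2"/>
</message>
</memo>"""

def bookmarked_text(parent, name):
    """Return the text between the named bookmark's start and end markers,
    assuming both markers are direct children of parent."""
    capturing, bk_id, pieces = False, None, []
    for child in parent:
        if child.tag == f"{{{W}}}bookmarkStart" and child.get("name") == name:
            capturing, bk_id = True, child.get("id")
        elif capturing and child.tag == f"{{{W}}}bookmarkEnd" and child.get("id") == bk_id:
            break
        elif capturing:
            pieces.extend(child.itertext())  # text inside an intervening element
        if capturing:
            pieces.append(child.tail or "")  # text following this child
    # Collapse the line breaks and indentation of the markup.
    return " ".join("".join(pieces).split())

message = ET.fromstring(MEMO).find("message")
print(bookmarked_text(message, "Feedback"))  # mailing the feedback?
```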
Here is an exemplary definition of a paragraph, in accordance with aspects of the invention. Some of its elements include: aml:annotation, proofErr, permStart, and permEnd.
According to one embodiment, the following is a definition of the aml:annotation element:
<xsd:element name="annotation" type="aml:AnnType">
</xsd:element>
<xsd:complexType name="AnnType" mixed="false">
</xsd:complexType>
Word-processor-specific information can go wherever the xsd:anyAttribute attribute type and the xsd:any element type are referenced. Those attributes are described in the attribute group “wordAnnotationGroup”:
The values used in the “type” attribute are described below:
<xsd:simpleType name="annotationValuesType">
</xsd:simpleType>
So, in the case of a Bookmark, the beginning tag would look something like:
<aml:annotation aml:id="0" w:type="Word.Bookmark.Start" w:name="myBookmark"/>
And the end of the bookmark would look something like:
<aml:annotation aml:id="0" w:type="Word.Bookmark.End"/>
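With the aml:annotation form, the start and end of a range are told apart by the w:type value and paired by aml:id. The sketch below assumes Python's xml.etree.ElementTree, placeholder namespace URIs for the aml: and w: prefixes, and that other annotation kinds follow the same "<kind>.Start" / "<kind>.End" naming as Word.Bookmark (an assumption, since the list of annotationValuesType values is not reproduced here). The hypothetical pair_annotations function returns the name of every range whose start and end were both found.

```python
import xml.etree.ElementTree as ET

AML = "urn:example:aml"   # hypothetical URI for the aml: prefix
W = "urn:example:wordml"  # hypothetical URI for the w: prefix

DOC = f"""<w:body xmlns:aml="{AML}" xmlns:w="{W}"><w:p>
<aml:annotation aml:id="0" w:type="Word.Bookmark.Start" w:name="myBookmark"/>
bookmarked text
<aml:annotation aml:id="0" w:type="Word.Bookmark.End"/>
</w:p></w:body>"""

def pair_annotations(root):
    """Return {(kind, id): name} for annotations with both a Start and an End.
    kind is the part of w:type before the final ".Start" or ".End"."""
    open_starts, pairs = {}, {}
    for ann in root.iter(f"{{{AML}}}annotation"):  # document order
        kind, _, edge = ann.get(f"{{{W}}}type", "").rpartition(".")
        key = (kind, ann.get(f"{{{AML}}}id"))
        if edge == "Start":
            open_starts[key] = ann.get(f"{{{W}}}name")
        elif edge == "End" and key in open_starts:
            pairs[key] = open_starts.pop(key)
    return pairs

print(pair_annotations(ET.fromstring(DOC)))
```

Keying on both the kind and the id is a defensive choice in this sketch, so that a bookmark and, say, a comment that happened to share an id would not be paired with each other.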
According to one embodiment, the following is a list of exemplary proofErr types:
According to one embodiment, the following is a definition of the permStart element:
According to one embodiment, the following is a definition for the permEnd element:
<xsd:complexType name="permElt">
</xsd:complexType>
The above specification, examples and data provide a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.
This patent application is a continuation-in-part application under 35 United States Code § 120 of U.S. patent application Ser. No. 10/187,060 filed on Jun. 28, 2002, which is incorporated herein by reference. An exemplary schema in accordance with the present invention is disclosed beginning on page 11 in an application entitled “Mixed Content Flexibility,” Ser. No. 10/726,077, filed Dec. 2, 2003, which is hereby incorporated by reference in its entirety.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 10187060 | Jun 2002 | US |
Child | 10727276 | | US |