Markup Languages have attained wide popularity in recent years. One type of markup language, Extensible Markup Language (XML), is a universal language that provides a way to identify, exchange, and process various kinds of data. For example, XML is used to create documents that can be utilized by a variety of application programs. Elements of an XML file have an associated namespace and schema.
In XML, a namespace is a unique identifier for a collection of names that are used in XML documents as element types and attribute names. The name of a namespace is commonly used to uniquely identify each class of XML document. The unique namespaces differentiate markup elements that come from different sources and happen to have the same name.
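By way of illustration only, the following Python sketch shows how namespaces keep apart two elements that share the name "title" but come from different sources. The namespace URIs and element names are hypothetical and chosen solely for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; any unique strings would serve.
BOOK_NS = "urn:example:books"
PERSON_NS = "urn:example:people"

DOC = f"""
<record xmlns:b="{BOOK_NS}" xmlns:p="{PERSON_NS}">
  <b:title>XML Basics</b:title>
  <p:title>Dr.</p:title>
</record>
"""

root = ET.fromstring(DOC)

# ElementTree expands each prefixed name to "{namespace}localname",
# so the two "title" elements stay distinguishable.
book_title = root.find(f"{{{BOOK_NS}}}title").text
person_title = root.find(f"{{{PERSON_NS}}}title").text
tags = [child.tag for child in root]
```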
XML Schemata provide a way to describe and validate data in an XML environment. A schema states what elements and attributes are used to describe content in an XML document, where each element is allowed, what types of text contents are allowed within it and which elements can appear within which other elements. The use of schemata ensures that the document is structured in a consistent manner. Schemata may be created by a user and are generally supported by an associated markup language, such as XML. By using an XML editor, the user can manipulate the XML file and generate XML documents that adhere to the schema the user has created. XML documents may be created to adhere to one or more schemata.
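The kind of rule a schema expresses may be sketched as follows. This is a toy stand-in rather than an XML Schema processor: the SCHEMA table, the element names, and the validate function are all hypothetical, and the check covers only which attributes and child elements are allowed on each element.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy "schema": element name -> (allowed attributes, allowed children).
SCHEMA = {
    "order": ({"id"}, {"customerName", "address"}),
    "customerName": (set(), set()),
    "address": ({"country"}, set()),
}

def validate(element):
    """Return a list of problems; an empty list means the tree conforms."""
    if element.tag not in SCHEMA:
        return [f"unknown element <{element.tag}>"]
    allowed_attrs, allowed_children = SCHEMA[element.tag]
    problems = [
        f"attribute '{name}' not allowed on <{element.tag}>"
        for name in element.attrib
        if name not in allowed_attrs
    ]
    for child in element:
        if child.tag not in allowed_children:
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        else:
            problems.extend(validate(child))
    return problems

good = ET.fromstring('<order id="1"><customerName>Jane</customerName></order>')
bad = ET.fromstring('<order><address zip="0"><order/></address></order>')
good_problems = validate(good)
bad_problems = validate(bad)
```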
The XML standard is considered by many as the ASCII format of the future, due to its expected pervasiveness throughout the hi-tech industry in the coming years. Recently, some word-processors have begun producing documents that are somewhat XML compatible. For example, some documents may be parsed using an application that understands XML. However, much of the functionality available in word processor documents is not currently available for XML documents.
The present invention is generally directed towards a method for representing an application's native field structures, such as “Creation Date of the Document”, “Formula”, a specially formatted number, a reference to text in another part of the document, or others in a markup language document. Fields are commonly used for document automation, so that the application itself includes certain information among the contents of the document, with possibly no extra user intervention required. The method of the invention provides a way to save this field definition information in a markup language (ML) document without data loss, while allowing the field structures to be parsed by ML-aware applications and to be read by ML programmers.
Throughout the specification and claims, the following terms take the meanings explicitly associated herein, unless the context clearly dictates otherwise.
The terms “markup language” or “ML” refer to a language for special codes within a document that specify how parts of the document are to be interpreted by an application. In a word-processor file, the markup language specifies how the text is to be formatted or laid out, whereas in a particular customer schema, the ML tends to specify the text's meaning according to that customer's wishes (e.g., customerName, address, etc). The ML is typically supported by a word-processor and may adhere to the rules of other markup languages, such as XML, while creating further rules of its own.
The term “element” refers to the basic unit of an ML document. The element may contain attributes, other elements, text, and other building blocks for an ML document.
The term “tag” refers to a command inserted in a document that delineates elements within an ML document. Each element can have no more than two tags: the start tag and the end tag. It is possible to have an empty element (with no content) in which case one tag is allowed.
The content between the tags is considered the element's “children” (or descendants). Hence, other elements embedded in the element's content are called “child elements” or “child nodes” of the element. Text embedded directly in the content of the element is considered the element's “child text nodes”. Together, the child elements and the text within an element constitute that element's “content”.
The term “attribute” refers to an additional property set to a particular value and associated with the element. Elements may have an arbitrary number of attribute settings associated with them, including none. Attributes are used to associate additional information with an element that will not contain additional elements, or be treated as a text node.
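The terms defined above may be illustrated with a short Python sketch that parses one element and inspects its tag name, attributes, child elements, and child text. The customer markup shown is hypothetical and serves only to make the vocabulary concrete.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer markup, used only to illustrate the terms.
DOC = '<customer id="42">Hello <customerName>Jane</customerName><note/></customer>'

customer = ET.fromstring(DOC)

element_name = customer.tag                 # name carried by the start and end tags
attributes = dict(customer.attrib)          # attribute settings on the element
child_elements = [c.tag for c in customer]  # child elements in document order
child_text = customer.text                  # text directly inside the element
empty_child = customer.find("note")         # empty element: a single tag, no content
```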
Illustrative Operating Environment
With reference to
Computing device 100 may have additional features or functionality. For example, computing device 100 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in
Computing device 100 may also contain communication connections 116 that allow the device to communicate with other computing devices 118, such as over a network. Communication connection 116 is one example of communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. The term computer readable media as used herein includes both storage media and communication media.
Generally, the present invention is directed at representing field structures in an ML document. The ML document may be read by applications that do not share the same schema that created the document. The application not sharing the same schema may parse the field structures, regardless of whether or not the fields are understood.
In one embodiment, word-processor 120 has its own namespace or namespaces and a schema, or a set of schemas, that is defined for use with documents associated with word-processor 120. The set of tags and attributes defined by the schema for word-processor 120 define the format of a document to such an extent that it is referred to as its own native ML. Word-processor 120 internally validates ML file 210. When validated, the ML elements are examined as to whether they conform to the ML schema 215. A schema states what tags and attributes are used to describe content in an ML document, where each tag is allowed, and which tags can appear within other tags, ensuring that the document is structured the same way. Accordingly, ML file 210 is valid when structured as set forth in arbitrary ML schema 215.
ML validation engine 225 operates similarly to other available validation engines for ML documents. ML validation engine 225 evaluates ML that is in the format of the ML validation engine 225. For example, XML elements are forwarded to an XML validation engine. In one embodiment, a greater number of validation engines may be associated with word-processor 120 for validating a greater number of ML formats.
Representing Fields in a Markup Language Document
The present invention generally provides a method to represent an application's native field structures in markup language (ML) such as XML. The field structures may be parsed by applications that understand the markup other than the application that generated the ML file. Fields are commonly used for document automation, so that the application itself includes certain information among the contents of the document, with possibly no extra user intervention required. Fields may be a very powerful feature making the document authoring or editing process much more efficient.
Fields are elements of the content of a document, whose purpose is to automatically generate or modify the content, or its appearance, depending on various conditions and/or settings specified by the user. Fields may be very simple or very complex.
A defining characteristic of a field is that it is updatable. For example, a “LastSavedBy” field may insert the name of the last person who saved the document at the location of the field. When the document is saved by a person other than the one who saved it last, the name inserted by the field is automatically replaced with the name of the latest user. The field therefore generates and modifies the content of the document depending on the identity of the person saving the document.
A “Ref” field (reference) is a more complex example. The field's result is text which is a “linked” copy of text from another place of the document, identified by a named bookmark. As soon as the original text changes, the text inserted by the field changes as well. The “ref” field may also affect the formatting of the copied text (e.g., by making the copied text uppercased).
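The updatable behavior of a “Ref” field may be sketched as follows. The RefField class, its attribute names, and the dictionary of bookmarks are hypothetical and do not reflect the application's actual internal structures.

```python
from dataclasses import dataclass

@dataclass
class RefField:
    """Hypothetical model of a "Ref" field; not the application's real structure."""
    bookmark: str            # name of the bookmark whose text is copied
    uppercase: bool = False  # optional formatting applied to the copy
    result: str = ""

    def update(self, bookmarks):
        # Recompute the result from the current bookmark text.
        text = bookmarks[self.bookmark]
        self.result = text.upper() if self.uppercase else text
        return self.result

bookmarks = {"intro": "Getting started"}
ref = RefField(bookmark="intro", uppercase=True)
first = ref.update(bookmarks)

bookmarks["intro"] = "Quick start"  # the original text changes...
second = ref.update(bookmarks)      # ...and the field result follows
```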
An even more complex example is a field which creates a table of contents for the document by: reproducing all the headings used in the document in a single location; organizing the headings according to their level to expose the hierarchy of the document; changing the formatting of the headings; automatically including the correct page number with each heading in the table of contents; and determining the numbering style to use for the table of contents. A table of contents that is the result of such a field is automatically updatable and self-organizing based on the contents of the document. Therefore, the maintenance of a table of contents is automated so that the user is not required to create and maintain the table of contents manually.
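A table of contents field of this kind may be sketched, under simplifying assumptions, as a function that regenerates the entries from the document's headings. The build_toc function, the (level, title, page) tuple layout, and the dotted numbering style are hypothetical choices made for illustration.

```python
def build_toc(headings):
    """Return table-of-contents lines from (level, title, page) tuples.

    Hypothetical sketch: indentation shows the hierarchy, and numbering is
    a simple dotted counter per level.
    """
    counters = []
    lines = []
    for level, title, page in headings:
        # Trim or grow the counter list to the current depth.
        del counters[level:]
        while len(counters) < level:
            counters.append(0)
        counters[level - 1] += 1
        number = ".".join(str(n) for n in counters)
        indent = "  " * (level - 1)
        lines.append(f"{indent}{number} {title} ... {page}")
    return lines

headings = [(1, "Background", 1), (2, "Namespaces", 2), (2, "Schemata", 3), (1, "Fields", 5)]
toc = build_toc(headings)
```

Calling build_toc again after the headings change yields a fresh result, which mirrors how the field keeps the table of contents current without manual maintenance.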
Certain fields may refer to one another. For example, a field whose result is the Index section of a document relies on the existence of fields throughout the document that mark index entries. Also, certain fields may be nested one inside of another and work together in a “recursive” manner to create the desired result.
In order for an application to support the concept of fields, the application represents each field internally by a structure mirroring field properties. A field structure generally consists of the following two major parts:
“Field instructions” comprise the portion of a field containing pieces of information such as:
The “field result” comprises the portion of the field which contains the result of the operation performed by the field. The field result may simply be a number, but also may be as arbitrarily rich and complex as a whole fully formatted document or OLE (Object Linking and Embedding) object. The result is the part that is updated by the field when the value of the arguments of the field changes.
Since a field itself is an editable part of a document, it coexists with the surrounding content. The field may be separated from the surrounding content by field start and field end marks. Also, the instructions are separated from the result. In a first embodiment, the separators are visible to the user. In a second embodiment, the separators are not visible to the user. Correspondingly, in other embodiments, the instructions may or may not be visible to the user. Typically, a user is able to choose between a view where only field instructions are visible and one where only field results are displayed.
Based on how the instructions portion of a field is structured, fields are divided into two major categories:
simple fields—The instructions portion only contains instructions, and not richly formatted content or other embedded fields.
complex fields—The instructions portion contains richly formatted content or other embedded fields.
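The two-part field structure and the simple/complex distinction may be sketched as follows. The Field and RichText classes, the is_complex function, and the instruction strings are hypothetical; the sketch treats a field as complex whenever its instructions hold anything other than plain strings.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class RichText:
    """Hypothetical stand-in for richly formatted content."""
    text: str
    bold: bool = False

@dataclass
class Field:
    """Hypothetical field structure: instructions plus an updatable result."""
    instructions: List[Union[str, RichText, "Field"]]
    result: str = ""

def is_complex(f: Field) -> bool:
    # Complex when the instructions hold rich content or embedded fields.
    return any(not isinstance(part, str) for part in f.instructions)

author = Field(instructions=["AUTHOR"], result="John Doe")
nested = Field(instructions=["IF ", author, ' = "John Doe" "yes" "no"'], result="yes")
```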
The present invention provides a method for saving all the field information described above as ML without losing any data, by mapping the application's internal field structures described above to saved ML markup.
The present invention represents the fields in ML depending on whether the field is a “complex” field or a “simple” one.
In the example shown, the simple field is represented by fldSimple element 310 containing instructions 320 and result 330. Instructions 320 of the field are written out as the string value of the instr attribute. Result 330 of the field is arbitrarily rich ML content written out as the child of fldSimple element 310. In the example given, the ML markup represents an “Author” field, whose function is to insert the name of the document author (John Doe) into the document, in upper case. Other field instructions and results may be used within a simple field, and a simple field may correspond to elements other than the fldSimple element without departing from the scope of the present invention.
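A mapping of a simple field to a fldSimple element may be sketched as follows. Only the fldSimple element and its instr attribute are taken from the description; the simple_field_to_ml function, the "t" text element standing in for rich result content, the omission of namespaces, and the exact spelling of the instruction string are assumptions.

```python
import xml.etree.ElementTree as ET

def simple_field_to_ml(instr, result_text):
    """Map a simple field to a fldSimple element (sketch, no namespaces)."""
    fld = ET.Element("fldSimple", {"instr": instr})
    # The result is written as child content; a single hypothetical
    # "t" text element stands in for arbitrarily rich ML here.
    ET.SubElement(fld, "t").text = result_text
    return fld

# Assumed spelling of the instructions for an upper-case Author field.
element = simple_field_to_ml("AUTHOR \\* Upper", "JOHN DOE")
markup = ET.tostring(element, encoding="unicode")
```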
As shown, instructions 440 of a complex field themselves may contain arbitrarily rich content, including other fields. Accordingly, ML for a complex field includes the definition of two empty elements such as fldChar 410 and instrText 420. Element fldChar 410 marks the beginning of the field, the boundary between the instructions and the result, or the end of the field, depending on the value of its fldCharType attribute 430 (e.g., “begin”, “separate”, “end”, etc.). Element instrText 420 contains the ML markup for the arbitrarily rich instructions of the field.
In one embodiment, the elements appear in the following specific order for the field representation to be valid:
<fldChar fldCharType="begin"/>
. . .
<instrText>
</instrText>
. . .
<fldChar fldCharType="separate"/>
<fldChar fldCharType="end"/>
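An ML-aware application might read such a sequence with a small state machine, sketched below. The container element "p", the text element "t", and the read_complex_field function are hypothetical; only fldChar, fldCharType, and instrText come from the description. The sketch enforces the begin, separate, end order and does not handle nested fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical container element "p" and text element "t".
DOC = """
<p>
  <fldChar fldCharType="begin"/>
  <instrText>AUTHOR</instrText>
  <fldChar fldCharType="separate"/>
  <t>John Doe</t>
  <fldChar fldCharType="end"/>
</p>
"""

# State each fldChar type requires, and the state it leads to.
EXPECTED = {"begin": "outside", "separate": "instructions", "end": "result"}
NEXT = {"begin": "instructions", "separate": "result", "end": "outside"}

def read_complex_field(parent):
    """Split the children of parent into field instructions and result."""
    state = "outside"
    instructions, result = [], []
    for child in parent:
        if child.tag == "fldChar":
            kind = child.get("fldCharType")
            if EXPECTED.get(kind) != state:
                raise ValueError(f"unexpected fldChar '{kind}' while {state}")
            state = NEXT[kind]
        elif state == "instructions" and child.tag == "instrText":
            instructions.append(child.text or "")
        elif state == "result":
            result.append("".join(child.itertext()))
    if state != "outside":
        raise ValueError("field was not closed with an 'end' fldChar")
    return "".join(instructions), "".join(result)

instructions, result = read_complex_field(ET.fromstring(DOC))
```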
The actual contents of the field instructions may vary from application to application, depending on the types of fields the application supports.
The attached appendix is a listing of an exemplary portion of schema for generating the fields, in accordance with aspects of the present invention.
At decision block 530, a determination is made whether each field used is a complex field. When the field being examined is a complex field, processing moves to block 540. However, if the field is not a complex field, the field is a simple field and processing moves to block 550. In another embodiment, the fields may be categorized according to fields other than complex fields and simple fields.
At block 540, the properties of the complex field (when the field is a complex field) are mapped into elements, attributes, and values of the ML file. As an example, the fields may include “Creation Date of the Document”, “Formula”, a specially formatted number, a reference to text in another part of the document, or others that each have their own associated properties. Two elements used in mapping the properties of a complex field are the fldChar element and the instrText element (see
At block 550, the properties of the simple field (when the field is a simple field) are mapped into elements, attributes, and values. An element used in mapping the properties of a simple field is the fldSimple element (see
At decision block 560, a determination is made whether all the fields of the document have had their properties mapped to elements, attributes, and values. If not all of the fields have been processed, processing returns to block 530 where the category of the next field is determined. However, if all the fields have been processed, the process moves to block 570.
At block 570, the properties of the fields are stored in an ML document that may be read by applications that understand the ML. Once the properties are stored, processing moves to end block 580 and returns to processing other actions.
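The process of blocks 530 through 570 may be sketched in Python as follows. The field_to_ml and save_fields functions, the "body" root element, the "t" text element, and the (instructions, result, is-complex) tuple layout are hypothetical; the sketch is one possible arrangement and not the only way the mapping may be carried out.

```python
import xml.etree.ElementTree as ET

def field_to_ml(parent, instructions, result, complex_field):
    """Append ML for one field to parent (sketch of blocks 530-550)."""
    if complex_field:
        # Block 540: fldChar markers delimit instructions and result.
        ET.SubElement(parent, "fldChar", {"fldCharType": "begin"})
        ET.SubElement(parent, "instrText").text = instructions
        ET.SubElement(parent, "fldChar", {"fldCharType": "separate"})
        ET.SubElement(parent, "t").text = result
        ET.SubElement(parent, "fldChar", {"fldCharType": "end"})
    else:
        # Block 550: a single fldSimple element carries both parts.
        fld = ET.SubElement(parent, "fldSimple", {"instr": instructions})
        ET.SubElement(fld, "t").text = result

def save_fields(fields):
    """Blocks 560-570: map every field, then serialize the ML document."""
    body = ET.Element("body")  # hypothetical root element
    for instructions, result, complex_field in fields:
        field_to_ml(body, instructions, result, complex_field)
    return ET.tostring(body, encoding="unicode")

ml = save_fields([("AUTHOR", "John Doe", False), ("REF intro", "Getting started", True)])
```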
In another embodiment, the properties of each field are mapped to elements, attributes, and values without a distinction being made between complex fields and simple fields.
The above specification, examples and data provide a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.
This patent application is a continuation-in-part application under 35 United States Code § 120 of U.S. patent application Ser. No. 10/187,060 filed on Jun. 28, 2002, which is incorporated herein by reference. An exemplary schema in accordance with the present invention is disclosed in a file entitled Appendix.txt in a CDROM attached to an application entitled “Mixed Content Flexibility,” Ser. No. 10/726,077, filed Dec. 2, 2003, which is hereby incorporated by reference in its entirety. A computer listing is included in a Compact Disc appendix in the attached CD ROM (quantity of two) in IBM-PC using MS-Windows operating system, containing file Appendix.txt, created on Dec. 26, 2006, containing 12,288 bytes (Copy 1 and Copy 2) and is hereby incorporated by reference in its entirety.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 10187060 | Jun 2002 | US |
Child | 10731515 | | US |