1. Technical Field
The disclosed technology relates to the field of information interface tools (for example, tools that use, create, and/or manipulate XML, SGML, or other information markup instances and/or schemas).
2. Background Art
Prior to the development of generalized data description languages, programmers would define custom file data formats or simple data languages to share information between programs. This required the programmer to create (at a very low level of abstraction) detailed specifications and code for the custom data parser and writer routines needed to create and use the data that carried the information between the programs.
With a markup language such as the “Standard Generalized Markup Language” (SGML) or the “Extensible Markup Language” (XML), etc., the programmer (at a relatively high level of abstraction) can develop rules for such data so that standardized parsing routines/programs/procedures can be used to create, read and use the data carrying the information.
SGML and XML provide a text-based mechanism that describes a tree-based structure for the information carried by the data in the SGML/XML text. The information can be expressed as self-structured textual data in that the data itself includes elements (for example, markup elements) used to define the structure of the information carried by the data. The structure includes a hierarchy of container-like elements for holding the information as well as attributes about the container-like elements.
In the XML environment, a structured information instance (such as a data file, data-stream or other concrete-information-instance) is “well-formed” if it conforms to a markup language's syntax rules. In addition, the structured information instance is “valid” if the data and the information structure conform to an information-model (such as a concrete-schema-instance defined using a schema language). The information-model constrains the structure and/or content of the data contained in the structured information instance. These constraints can include restrictions on element and attribute names (and their allowable containment hierarchies) as well as restrictions or requirements on an element's data-type. Thus, the information-model is a high-level abstraction of what information can be included in a well-formed and valid structured information instance. Schema languages exist to define schemas or other information-models. A structured information instance is associated with its information-model in accordance with the requirements of the schema language used to define the schema. Examples of such schema languages include the simple Document-Type-Definition (DTD) schema language, and vastly more complex and capable languages such as the W3C® XML schema language (WXS) and the RELAX_NG schema language.
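The following minimal Java sketch illustrates the well-formed/valid distinction with the standard JAXP APIs (javax.xml.parsers and javax.xml.validation). The one-element WXS schema, the class name and the method names are illustrative assumptions and are not part of the disclosed technology. A document that parses is well-formed. It is valid only if it also satisfies the schema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedVsValid {
    // Illustrative information-model (WXS): a <note> must contain exactly one <to> string.
    static final String NOTE_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note'><xs:complexType><xs:sequence>"
        + "<xs:element name='to' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Well-formed: the text obeys XML syntax rules; no schema is consulted. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors without printing
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Valid: the instance is well-formed AND conforms to the information-model. */
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(NOTE_XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("bad schema", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // either not well-formed or violates the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<note><from>Bo</from></note>")); // true
        System.out.println(isValid("<note><from>Bo</from></note>"));      // false
    }
}
```

Here the instance <note><from>Bo</from></note> is well-formed but not valid, because the assumed information-model requires a "to" child.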
As previously discussed, a concrete-information-instance can be validated against an information-model. The information-model is commonly defined using some schema language (such as the W3C XML schema language) and defines both the structure and data types for the information carried by the structured information instance. In some sense, the schema language is similar to a programming language for the information carried by the data in a structured information instance.
XML is one example of a structured information instance validated by an information-model. An XML data instance (an example of a concrete-information-instance) is a fairly simple, human-readable, self-structured, textual document. The XML data instance is self-structured in that it identifies its elements, attributes, comments, and textual data, and thereby provides structure for the information it carries. The XML data instance carries information in accordance with its validating information-model (for example, one defined using the W3C XML schema language). Thus, for example, while an XML data instance can carry information to be presented in tabular form, it is the information-model that defines that a table exists and that the table contains information of a specified type.
One skilled in the art will understand that information within a structured information instance (such as an XML data instance) is much simpler than the definition of the corresponding information-model (such as one written using the W3C XML schema language).
The information-models (schemas) share some characteristics with relational databases in that they can and should be normalized for computational efficiency. However, such normalized information-models (like normalized databases) are often difficult to understand and use because normalization spreads the information-model across multiple files/tables. This makes it difficult and unwieldy for a non-expert to use, edit, or create an information-model. The difficulty of understanding the schema language, and of creating a computationally efficient information-model, is a barrier that inhibits the programmer from using a fully-featured, complex schema language such as the W3C XML schema language.
While prior art programs/tools exist to assist with creating information-models, these programs/tools require the programmer to be familiar with the schema language used to create the information-model. They provide little more than a graphical representation of the schema language. The programmer must still develop and maintain a detailed understanding of the schema language used to define an information-model when creating or modifying the information-model. Furthermore, creating an information-model requires significant programmer effort and time even when the programmer uses the available programs/tools.
It would be advantageous to provide a technology that simplifies the creation of an information-model for programmers who are not experts in any particular schema language.
One aspect of the technology disclosed within is a method of generating a concrete-schema-instance as a result of modifying a representation of an abstract-instance-object-model. The concrete-schema-instance so generated represents an information-model used to process a concrete-information-instance. Other aspects of the technology include apparatus that perform the method and program products that, when executed by a computer, cause the computer to perform the method.
The inventors have determined that programmers who are not experts in a schema language can easily understand a structured information instance. The technology disclosed herein enables a programmer to manipulate an abstract-data-representation that represents the structured information instance in order to create a concrete-schema-instance that will validate a concrete-information-instance of the form indicated by that abstract-data-representation. Thus, a programmer can operate at the level of abstraction of, for example, an XML data instance when creating a validating concrete-schema-instance for that XML data instance. In other words, the disclosed technology allows the programmer to visualize an information-model using an abstract-data-representation as a metaphor for the information-model. The technology gives the programmer the ability to declaratively construct the abstract-data-representation and to automatically create a validating information-model for a concrete-information-instance represented by the abstract-data-representation. For example, a programmer who is not an XML expert, but who uses tools that implement the disclosed technology, can easily create an information-model that will validate a concrete-information-instance having the characteristics of the abstract-data-representation defined by the programmer. The information-model can be instantiated as a concrete-schema-instance defined using a schema language such as the W3C XML schema language. Thus, for example, a programmer who needs to pass XML-formatted information can focus on the information that will be contained in a concrete-information-instance instead of being distracted by the complex and tortuous details of the schema language used to define a validating XML schema for that information. By using tools built on the disclosed technology, both expert and non-expert programmers become more efficient at generating information-models for concrete-information-instances.
The abstract-data-representation is a representation of an abstract-instance-object-model. As a programmer manipulates the abstract-data-representation, the technology disclosed herein changes the abstract-instance-object-model accordingly and thus modifies the information-model. A programmer can manipulate the abstract-data-representation using any editor tool. One embodiment uses a graphical user interface (GUI) designed to represent the elements, attributes, element hierarchy and data in an XML data instance. Some embodiments of the editor tool generate files that can be compiled into an information-model. Other embodiments dynamically and/or interactively generate the information-model. Some of these tools can also read an existing information-model to generate a corresponding abstract-data-representation (such as a text file) for subsequent manipulation. The technology disclosed herein is generally implemented as a computer program for execution by a computer, and some embodiments use the JAVA® programming system. However, the technology can also be wholly or partially implemented in custom electronics.
The disclosed technology teaches how to create a normalized information-model by creating and/or modifying an abstract-data-representation (for example, creating an information-model using an Author-By-Example approach); how to transform an information-model (normalized or not) into an abstract-data-representation for subsequent review or modification; how to transform an abstract-data-representation into an information-model (such as a structured information instance or normalized structured information instance); and how to provide code completion capability when manipulating concrete-information-instances and/or abstract-data-representations.
The technology disclosed within uses an abstract data instance 213 as a tool to allow a programmer (or other user) to develop the abstract-instance-object-model 201 without the need to understand the details of the schema language used to create the schema instance 203. Using the abstract data instance 213 the programmer is able to design an information-model (that is embodied in the abstract-instance-object-model 201) using a data instance metaphor (for example, the abstract data instance 213 can be presented in a form that is very similar to an XML instance document). Thus, both the abstract data instance 213 and the schema instance 203 represent the information-model. The abstract data instance 213 allows the programmer to visualize the possible instance documents from the abstract-data-representation for a given information-model (schema). As the programmer changes the abstract-data-representation, the information-model is changed accordingly. The abstract-instance-object-model 201 can transform the information-model between the abstract data instance 213 and the schema instance 203 and can generate either. Thus, the information-model can be initialized using an existing schema such as the schema instance 203 as well as created or modified by manipulation of the abstract data instance 213.
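As an illustration of generating a schema instance from an abstract data instance, the following sketch walks a hypothetical in-memory abstract data instance (the AbstractElement class is an assumption and is not the disclosed abstract-instance-object-model 201) and emits a single denormalized W3C XML schema. It assumes every leaf carries xs:string text, every attribute is an xs:string, and each child occurs exactly once in the order shown. A practical generator would also handle data types, cardinality, and normalization into named types.

```java
import java.util.ArrayList;
import java.util.List;

class AbstractInstanceToSchema {
    /** Hypothetical abstract-data-instance node: an element name, its attribute names and child elements. */
    static final class AbstractElement {
        final String name;
        final List<String> attributes = new ArrayList<>();
        final List<AbstractElement> children = new ArrayList<>();

        AbstractElement(String name) { this.name = name; }
        AbstractElement attr(String attribute) { attributes.add(attribute); return this; }
        AbstractElement child(AbstractElement element) { children.add(element); return this; }
    }

    /** Emits one denormalized ("Russian doll") W3C XML schema whose structure mirrors the abstract instance. */
    static String toXsd(AbstractElement root) {
        StringBuilder sb = new StringBuilder("<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">");
        emitElement(root, sb);
        return sb.append("</xs:schema>").toString();
    }

    private static void emitElement(AbstractElement e, StringBuilder sb) {
        if (e.children.isEmpty() && e.attributes.isEmpty()) {
            // Leaf with no attributes: assume text content of type xs:string.
            sb.append("<xs:element name=\"").append(e.name).append("\" type=\"xs:string\"/>");
            return;
        }
        sb.append("<xs:element name=\"").append(e.name).append("\"><xs:complexType>");
        if (!e.children.isEmpty()) {
            sb.append("<xs:sequence>"); // assume each child occurs once, in the order shown
            for (AbstractElement c : e.children) emitElement(c, sb);
            sb.append("</xs:sequence>");
        }
        for (String a : e.attributes) { // WXS requires attributes after the content model
            sb.append("<xs:attribute name=\"").append(a).append("\" type=\"xs:string\"/>");
        }
        sb.append("</xs:complexType></xs:element>");
    }

    public static void main(String[] args) {
        AbstractElement shipTo = new AbstractElement("shipTo").attr("country")
            .child(new AbstractElement("name")).child(new AbstractElement("street"));
        System.out.println(toXsd(shipTo));
    }
}
```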
Because the abstract data instance 213 and the abstract-instance-object-model 201 are synchronized, the disclosed technology can assist the programmer when creating or modifying the information-model. For example, when the programmer uses a GUI to manipulate the abstract data instance 213, the GUI can, after the programmer's selection of a portion of the abstract data instance 213 and/or on receiving a partial input from the programmer, closely interact with the information-model to provide options to the programmer based on the programmer's position in the information-model. For example, as a programmer starts to provide input to the GUI, the GUI can communicate with the information-model to offer possible completions for the programmer's partial input (for example, code-completion). As another example, contextual help messages can also be provided instantly to explain or further describe one or more of the possible completions.
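The following minimal sketch shows the code-completion interaction. It assumes the information-model has already been reduced to a hypothetical map from each element to the child elements it allows; the element names loosely follow the purchase-order example used elsewhere in this description, and the help strings are likewise illustrative. Completion here is a simple prefix match against the children allowed at the programmer's position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class CompletionSketch {
    // Hypothetical slice of an information-model: allowed child elements, in schema order.
    static final Map<String, List<String>> ALLOWED_CHILDREN = Map.of(
        "purchaseOrder", List.of("shipTo", "billTo", "comment", "items"),
        "shipTo", List.of("name", "street", "city", "state", "zip"),
        "billTo", List.of("name", "street", "city", "state", "zip"));

    // Hypothetical contextual help text keyed by element name.
    static final Map<String, String> HELP = Map.of(
        "shipTo", "Address the goods are shipped to (USAddress)",
        "billTo", "Address the invoice is sent to (USAddress)");

    /** Offers the children allowed under parentElement whose names start with the partial input. */
    static List<String> complete(String parentElement, String partialInput) {
        List<String> completions = new ArrayList<>();
        for (String candidate : ALLOWED_CHILDREN.getOrDefault(parentElement, List.of())) {
            if (candidate.startsWith(partialInput)) completions.add(candidate);
        }
        return completions;
    }

    public static void main(String[] args) {
        for (String c : complete("purchaseOrder", "b")) {
            System.out.println(c + ": " + HELP.getOrDefault(c, ""));
        }
    }
}
```

Because the candidates come from the information-model rather than from the text typed so far, an element the schema does not allow at that position is never offered.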
Because the complete information-model is maintained by the abstract-instance-object-model 201 the programmer is not exposed to any normalization aspect of the schema instance 203 and need not consider details related to the normalized schema portion 205. The transformation back and forth between the schema instance 203 (normalized or not) and the abstract data instance 213 is automatically performed by the abstract-instance-object-model 201.
Because the abstract-instance-object-model 201 represents the information-model, the abstract-instance-object-model 201 can also be used to transform one schema instance (for example, a concrete-schema-instance defined using a first schema language) into an equivalent concrete-schema-instance defined using a second schema language. In addition, because the non-expert programmer modifies the schemas through the abstract-instance-object-model 201, the programmer need not become an expert in both the first and second schema languages.
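Because such a model is not tied to any one schema language, the same tree can be rendered in more than one language. The following sketch, built on a hypothetical Node class, renders a model as a DTD; a companion renderer could emit WXS from the same tree. It assumes each element name has a single global declaration and that all attributes are optional CDATA, and it necessarily drops data types, which DTDs cannot express.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ModelToDtd {
    /** Hypothetical language-neutral model node: element name, attribute names, ordered children. */
    static final class Node {
        final String name;
        final List<String> attributes = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node attr(String a) { attributes.add(a); return this; }
        Node child(Node c) { children.add(c); return this; }
    }

    /** Renders the model as a DTD; only structure survives (text is #PCDATA, attributes are CDATA). */
    static String toDtd(Node root) {
        StringBuilder sb = new StringBuilder();
        emit(root, sb, new HashSet<>());
        return sb.toString();
    }

    private static void emit(Node n, StringBuilder sb, Set<String> declared) {
        // DTD element names are global, so assume one declaration per name and skip repeats.
        if (!declared.add(n.name)) return;
        String content;
        if (!n.children.isEmpty()) {
            content = n.children.stream().map(c -> c.name).collect(Collectors.joining(", ", "(", ")"));
        } else {
            content = n.attributes.isEmpty() ? "(#PCDATA)" : "EMPTY";
        }
        sb.append("<!ELEMENT ").append(n.name).append(' ').append(content).append(">\n");
        for (String a : n.attributes) {
            sb.append("<!ATTLIST ").append(n.name).append(' ').append(a).append(" CDATA #IMPLIED>\n");
        }
        for (Node c : n.children) emit(c, sb, declared);
    }

    public static void main(String[] args) {
        Node shipTo = new Node("shipTo").attr("country").child(new Node("name")).child(new Node("street"));
        System.out.print(toDtd(new Node("purchaseOrder").child(shipTo)));
    }
}
```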
One skilled in the art will understand that the information formatting library 209 can be included within the information source/sink 211.
Such Author-By-Example tools allow programmers to interactively build an information-model that can be used to generate a concrete-schema-instance. Thus, users with little or no prior knowledge of the W3C XML schema language (or other schema language) can create an XML schema and generate a code/object library that will produce a concrete-information-instance that will be validated by that XML schema.
The disclosed technology seamlessly automates the propagation of structural or type changes throughout the information-model. For example, if the programmer adds an element to the USAddress in the shipTo field, the USAddress in the billTo field will change accordingly.
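One way to obtain this propagation, sketched below under stated assumptions, is for each element declaration to refer to a single shared type object rather than hold its own copy, and to expand the type only when the abstract data instance is displayed. The ComplexType and ElementDecl classes and the addElementThrough method are hypothetical; the element names follow the USAddress type of the purchase-order example.

```java
import java.util.ArrayList;
import java.util.List;

class SharedTypeSketch {
    /** Hypothetical named complex type, defined once in the information-model. */
    static final class ComplexType {
        final String name;
        final List<String> elements = new ArrayList<>();
        ComplexType(String name, String... elements) {
            this.name = name;
            this.elements.addAll(List.of(elements));
        }
    }

    /** Hypothetical element declaration that refers to its type rather than copying it. */
    static final class ElementDecl {
        final String name;
        final ComplexType type;
        ElementDecl(String name, ComplexType type) { this.name = name; this.type = type; }
    }

    /** An edit made at one use of the type is applied to the shared type definition. */
    static void addElementThrough(ElementDecl use, String newElement) {
        use.type.elements.add(newElement);
    }

    /** Expands a declaration into the child paths an abstract data instance would display. */
    static List<String> expand(ElementDecl decl) {
        List<String> paths = new ArrayList<>();
        for (String child : decl.type.elements) paths.add(decl.name + "/" + child);
        return paths;
    }

    public static void main(String[] args) {
        ComplexType usAddress = new ComplexType("USAddress", "name", "street", "city", "state", "zip");
        ElementDecl shipTo = new ElementDecl("shipTo", usAddress);
        ElementDecl billTo = new ElementDecl("billTo", usAddress);
        addElementThrough(shipTo, "country");
        System.out.println(expand(billTo)); // last entry is billTo/country
    }
}
```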
In some embodiments, the functionality of the ‘alter abstract instance object model’ procedure 407 and the ‘generate concrete schema’ procedure 409 is combined such that, as the programmer modifies the abstract-data-representation, corresponding changes are made not only to the abstract-instance-object-model but also to the concrete-schema-instance.
The abstract-data-representation can be created and manipulated using a GUI, a text editor, etc. The abstract-data-representation is similar to a minimal concrete-information-instance including all structure, elements, and attributes but need not contain data that would be included in a concrete-information-instance. An example of a GUI presentation of an abstract-data-representation was shown on
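The following sketch shows what such a minimal, data-free presentation might look like when rendered as text. The Node class is a hypothetical stand-in for the abstract-data-representation. Attribute values are left empty and leaf elements are emitted as empty tags because the representation carries structure rather than data.

```java
import java.util.ArrayList;
import java.util.List;

class SkeletonRenderer {
    /** Hypothetical abstract-data-representation node: structure only, no data values. */
    static final class Node {
        final String name;
        final List<String> attributes = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node attr(String a) { attributes.add(a); return this; }
        Node child(Node c) { children.add(c); return this; }
    }

    /** Renders the node as a minimal instance: empty attribute values and empty leaf elements. */
    static String render(Node root) {
        StringBuilder sb = new StringBuilder();
        render(root, 0, sb);
        return sb.toString();
    }

    private static void render(Node n, int depth, StringBuilder sb) {
        String indent = "  ".repeat(depth);
        sb.append(indent).append('<').append(n.name);
        for (String a : n.attributes) sb.append(' ').append(a).append("=\"\"");
        if (n.children.isEmpty()) {
            sb.append("/>\n");
            return;
        }
        sb.append(">\n");
        for (Node c : n.children) render(c, depth + 1, sb);
        sb.append(indent).append("</").append(n.name).append(">\n");
    }

    public static void main(String[] args) {
        Node shipTo = new Node("shipTo").attr("country").child(new Node("name")).child(new Node("street"));
        System.out.print(render(shipTo));
    }
}
```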
The abstract-instance-object-model 201 maintains a tree-like model structure having a root that is a document component. A recursive procedure (such as shown in
For example, where the schema instance 203 is defined using the W3C XML schema language, the abstract-instance-object-model 201 represents all the defined elements, attributes and (if used) compositors for each element at all times. The denormalization can occur as a single computation using the entire XML schema or can be computed for each element as it is expanded in the abstract data instance 213. Both approaches yield the same abstract-instance-object-model.
That is, if the schema is not manipulated or changed, the abstract-instance-object-model 201 does not change. Because the abstract-instance-object-model 201 maintains a denormalized representation of the information-model, manipulations of the abstract data instance 213 are effectively refactorings and thus, these changes are processed in a manner similar to refactorings related to instance documents conforming to the information-model.
The abstract-instance-object-model creation process 500 initiates at a start populate-children terminal 501 with two parameters: a parent object from the abstract-instance-object-model and a parent schema object. The abstract-instance-object-model creation process 500 continues to a ‘for each schema child’ iterative procedure 503 that processes each child of the provided parent schema object. As each schema child is iterated, a ‘select component type’ procedure 505 determines whether the child's component type is an attribute, element, compositor, or other schema object, and selects the appropriate handler to process the schema child object. These handlers can include an ‘attribute handler’ procedure 507, an ‘element handler’ procedure 509, a ‘compositor handler’ procedure 511, a ‘default handler’ procedure 513, and so on. After the schema child objects of the passed schema parent are processed, the abstract-instance-object-model creation process 500 returns through an ‘end’ terminal 515.
The ‘attribute handler’ procedure 507 instantiates an attribute object in the abstract-instance-object-model with properties equivalent to those of the schema child object and adds that attribute object to the passed abstract-instance-object-model parent object.
The ‘element handler’ procedure 509 instantiates an element object in the abstract-instance-object-model, adds the instantiated object to the passed abstract-instance-object-model parent object, determines the type definition for the element object and recursively invokes the abstract-instance-object-model creation process 500 (by the call to populate children) passing the element object as the abstract-instance-object-model parent object and the type definition as the parent schema object.
The ‘compositor handler’ procedure 511 instantiates a compositor object in the abstract-instance-object-model, adds that object to the passed abstract-instance-object-model parent object and recursively invokes the abstract-instance-object-model creation process 500 (by the call to populate children) passing the instantiated compositor object and forwarding the schema child object.
The ‘default handler’ procedure 513 recursively invokes the abstract-instance-object-model creation process 500 (by the call to populate children) forwarding both the passed abstract-instance-object-model parent object and the schema child object.
After each handler completes, the abstract-instance-object-model creation process 500 continues back to the ‘for each schema child’ iterative procedure 503 to iterate the next child component of the parent object. When all child components have been iterated, the abstract-instance-object-model creation process 500 terminates its respective recursive invocation by completing through an ‘end’ terminal 515.
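The abstract-instance-object-model creation process 500 can be sketched in Java as a single recursive method that dispatches on component type. The SchemaNode and ModelNode classes and the Kind enumeration are hypothetical simplifications of a schema object model, and the comments map each branch to the reference numerals above. Because the element handler expands each element's type definition separately, two elements sharing a type receive independent (denormalized) subtrees.

```java
import java.util.ArrayList;
import java.util.List;

class PopulateChildrenSketch {
    enum Kind { ATTRIBUTE, ELEMENT, COMPOSITOR, OTHER }

    /** Hypothetical schema component; typeDefinition is consulted only for ELEMENT components. */
    static final class SchemaNode {
        final Kind kind;
        final String name;
        final List<SchemaNode> children = new ArrayList<>();
        SchemaNode typeDefinition;
        SchemaNode(Kind kind, String name) { this.kind = kind; this.name = name; }
        SchemaNode add(SchemaNode child) { children.add(child); return this; }
        static SchemaNode element(String name, SchemaNode type) {
            SchemaNode e = new SchemaNode(Kind.ELEMENT, name);
            e.typeDefinition = type;
            return e;
        }
    }

    /** Hypothetical abstract-instance-object-model node. */
    static final class ModelNode {
        final Kind kind;
        final String name;
        final List<ModelNode> children = new ArrayList<>();
        ModelNode(Kind kind, String name) { this.kind = kind; this.name = name; }
    }

    /** Start populate-children (501). */
    static void populateChildren(ModelNode parent, SchemaNode schemaParent) {
        for (SchemaNode child : schemaParent.children) {      // 'for each schema child' (503)
            switch (child.kind) {                             // 'select component type' (505)
                case ATTRIBUTE -> parent.children.add(new ModelNode(Kind.ATTRIBUTE, child.name)); // 507
                case ELEMENT -> {                                                                 // 509
                    ModelNode element = new ModelNode(Kind.ELEMENT, child.name);
                    parent.children.add(element);
                    if (child.typeDefinition != null) populateChildren(element, child.typeDefinition);
                }
                case COMPOSITOR -> {                                                              // 511
                    ModelNode compositor = new ModelNode(Kind.COMPOSITOR, child.name);
                    parent.children.add(compositor);
                    populateChildren(compositor, child);
                }
                default -> populateChildren(parent, child);                                       // 513
            }
        }
        // 'end' (515): this recursive invocation returns to its caller.
    }

    static void print(ModelNode n, String indent) {
        System.out.println(indent + n.kind + " " + n.name);
        for (ModelNode c : n.children) print(c, indent + "  ");
    }

    public static void main(String[] args) {
        SchemaNode usAddress = new SchemaNode(Kind.OTHER, "USAddress") // a named complex type
            .add(new SchemaNode(Kind.COMPOSITOR, "sequence")
                .add(SchemaNode.element("name", null))
                .add(SchemaNode.element("street", null)))
            .add(new SchemaNode(Kind.ATTRIBUTE, "country"));
        SchemaNode schema = new SchemaNode(Kind.OTHER, "schema")
            .add(SchemaNode.element("shipTo", usAddress))
            .add(SchemaNode.element("billTo", usAddress));
        ModelNode document = new ModelNode(Kind.OTHER, "document");
        populateChildren(document, schema);
        print(document, "");
    }
}
```

This eager version would not terminate on a recursively defined type. Computing the expansion for each element only when it is expanded in the abstract data instance 213, as described above, avoids that problem.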
One aspect of the information-model is that most complex schema languages (such as the W3C XML schema language) allow “type definition and reuse” such that a programmer can define a data type and then use that defined data type elsewhere in the same schema and/or across other schemas. This capability requires that objects that use those data types monitor when the data type is changed (for example, as a result of a manipulation of the abstract data instance 213) so that the data type definitions remain synchronized across uses. An example of this was previously discussed with respect to
In some embodiments a local proxy object monitors the definition of a type and detects changes such that the changes are propagated throughout the information-model. Thus, if a change is made to an information-model all applications that use that information-model are informed of the change and adjust accordingly. For imported schema components, a remote proxy monitors the schema referencing model for changes and responds accordingly.
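A local proxy of this kind resembles the observer pattern. In the minimal sketch below, which assumes a hypothetical TypeDefinition class that notifies registered listeners, each use of a type holds a LocalProxy that refreshes its denormalized copy whenever the definition changes. Remote proxies for imported schema components are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class TypeProxySketch {
    /** Hypothetical shared type definition that notifies registered listeners of structural changes. */
    static final class TypeDefinition {
        final String name;
        final List<String> elements = new ArrayList<>();
        private final List<Consumer<TypeDefinition>> listeners = new ArrayList<>();

        TypeDefinition(String name) { this.name = name; }
        void addListener(Consumer<TypeDefinition> listener) { listeners.add(listener); }

        void addElement(String element) {
            elements.add(element);
            for (Consumer<TypeDefinition> l : listeners) l.accept(this); // propagate the change to every use
        }
    }

    /** Hypothetical local proxy: one per use of the type, holding a denormalized copy of its content. */
    static final class LocalProxy {
        final String useName;
        final List<String> expandedChildren = new ArrayList<>();

        LocalProxy(String useName, TypeDefinition type) {
            this.useName = useName;
            refresh(type);
            type.addListener(this::refresh); // monitor the definition from now on
        }

        private void refresh(TypeDefinition type) {
            expandedChildren.clear();
            expandedChildren.addAll(type.elements);
        }
    }

    public static void main(String[] args) {
        TypeDefinition usAddress = new TypeDefinition("USAddress");
        usAddress.addElement("name");
        LocalProxy shipTo = new LocalProxy("shipTo", usAddress);
        LocalProxy billTo = new LocalProxy("billTo", usAddress);
        usAddress.addElement("zip");
        System.out.println(shipTo.expandedChildren + " " + billTo.expandedChildren); // [name, zip] [name, zip]
    }
}
```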
Information-models can be represented in multiple ways (for example, as a tree, a table, as text, or as an abstract-data-representation). Each way can be presented as a different view into the information-model. Changes made to the information-model while in one view need to be synchronized with the other views of the information-model. In addition, changes made by one programmer need to be synchronized with changes made by another.
When all the children of X have been iterated, the synchronization process 600 continues to a ‘Ychild’ iterative procedure 615 that iterates each child (Ychild) in Xa. For each iterated child, a ‘Ychild exists in X’ decision procedure 617 determines whether the iterated Ychild exists in X. If the Ychild does not exist in X, an ‘add Ychild to X’ procedure 619 adds the Ychild to X and the synchronization process 600 continues to the ‘Ychild’ iterative procedure 615 to complete the iteration of the children of Xa. When the ‘Ychild’ iterative procedure 615 completes, the synchronization process 600 continues back to the ‘component X’ iterative procedure 603 to synchronize the next component in X.
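The ‘Ychild’ pass described above can be sketched as follows for a single component X and its counterpart Xa. The Component class is hypothetical, children are assumed to be matched by name, and added children are deep-copied so the two views do not share nodes. The preceding pass over the children of X and the outer ‘component X’ iteration 603 are not shown.

```java
import java.util.ArrayList;
import java.util.List;

class SyncSketch {
    /** Hypothetical view component: a named node with ordered children. */
    static final class Component {
        final String name;
        final List<Component> children = new ArrayList<>();
        Component(String name) { this.name = name; }
        Component add(Component child) { children.add(child); return this; }

        boolean hasChildNamed(String childName) {
            for (Component c : children) if (c.name.equals(childName)) return true;
            return false;
        }

        Component deepCopy() {
            Component copy = new Component(name);
            for (Component c : children) copy.add(c.deepCopy());
            return copy;
        }
    }

    /** 'Ychild' pass (615 to 619): add to x every child of xa that x lacks, matching by name. */
    static void addMissingChildren(Component x, Component xa) {
        for (Component ychild : xa.children) {   // 'Ychild' iteration (615)
            if (!x.hasChildNamed(ychild.name)) {  // 'Ychild exists in X' decision (617)
                x.add(ychild.deepCopy());         // 'add Ychild to X' (619); copy keeps views independent
            }
        }
    }

    static List<String> childNames(Component c) {
        List<String> names = new ArrayList<>();
        for (Component k : c.children) names.add(k.name);
        return names;
    }

    public static void main(String[] args) {
        Component x = new Component("shipTo").add(new Component("name")).add(new Component("street"));
        Component xa = new Component("shipTo").add(new Component("street")).add(new Component("zip"));
        addMissingChildren(x, xa);
        System.out.println(childNames(x)); // [name, street, zip]
    }
}
```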
From the foregoing, it will be appreciated that the technology has (without limitation) the following advantages:
As used herein, a procedure is a self-consistent sequence of steps, performed by logic implemented by a programmed computer, specialized electronics or other circuitry, or a combination thereof, that leads to a desired result. These steps can be defined by one or more computer instructions. These steps can be performed by a computer executing the instructions that define the steps. Further, these steps can be performed by circuitry designed or configured to perform the steps. Thus, the term “procedure” can refer (for example, but without limitation) to a sequence of instructions, a sequence of instructions organized within a programmed-procedure or programmed-function, a sequence of instructions organized within programmed-processes executing in one or more computers, a sequence of steps performed by electronic or other circuitry, or any logic. In particular, these steps can be performed by one or more of a generator logic, an information-model modification logic, a presentation logic, an information-model creation logic, a schema transformation logic, a schema access logic, a user input logic, and an information-model monitor logic, separately or combined.
One skilled in the art will understand that the network transmits information (such as informational data as well as data that defines a computer program). The information can also be embodied within a carrier-wave. The term “carrier-wave” includes electromagnetic signals, visible or invisible light pulses, signals on a data bus, or signals transmitted over any wire, wireless, or optical fiber technology that allows information to be transmitted over a network. Programs and data are commonly read both from tangible physical media (such as a compact, floppy, or magnetic disk) and from a network. Thus, the network, like a tangible physical medium, can be a computer-usable data carrier.
The claims, as originally presented and as they may be amended, encompass variations, alternatives, modifications, improvements, equivalents, and substantial equivalents of the embodiments and teachings disclosed herein, including those that are presently unforeseen or unappreciated, and that, for example, may arise from applicants/patentees and others.
It will be appreciated that various of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Also, various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art, and these are also intended to be encompassed by the following claims. Unless specifically recited in a claim, steps or components of claims should not be implied or imported from the specification or any other claims as to any particular order, number, position, size, shape, angle, color, or material.