The invention relates to the conversion, exchange and structuring of data between a great number of entry points and especially the handling of information from forms and the like used in organizations such as governments, industries and so forth. The invention further relates to the preservation of hierarchical information while using a flat data structure.
Data models are tools that can be used for describing the world. Consider for example a house. This house has a number of attributes, such as the address, the size, the number of rooms and so forth. All these attributes in combination are unique to this house and together they describe the house with a required level of detail. Therefore, such data models—in order to reflect the real world—are hierarchical in nature; this house has this address, this house has this number of rooms, etc.
On the other hand, when data is entered using the data model (the number of rooms is 5; the address is Sunflower street No. 7), this is usually performed without any knowledge of the data model, and in most cases any information regarding the data model would be unwanted, since such an information load would complicate the collection of data. Forms and other interactive interfaces such as buttons, sliders and browsing tools for selecting data files in the form of raw data, pictures or the like collect data in an inherently flat manner, where each individual field in the form collects a single value. The form fields do not contain any information on how they relate to one another and can therefore be regarded as a flat list of information.
In order to gather hierarchical data through forms, a hierarchical naming convention is often used. In this hierarchical naming convention, the hierarchy of the data may be deduced from the structure of the names. Many schemes can be applied, where the hierarchical structure retrieves codes from the form fields, thereby mapping the hierarchy to the codes in the structure.
The advantages of a hierarchical naming convention are the possibilities for a global naming convention, a decentralized naming maintenance, the nature of the data, persistency for groups of data, etc.
There exist several tools for describing hierarchical information. An example is XML (eXtensible Markup Language) with the use of XML schemas, which are excellent for mapping structures and semi-structured information. XML is a well-known language for this purpose and the advantages of using such a language will not be described in any detail, since it will be well-known to a specialist within the technical field.
The problem when collecting data is inter alia to maintain consistency from the hierarchical data model, through the flat form, to the hierarchical instance of data produced by the form tool.
No data model will ever stay the same, so part of the problem will also be the construction and maintenance of the data model, in particular while simultaneously enforcing consistency through a flat form format. The task also grows in complexity with the number of data elements as a function of N! (“N factorial”), since element number N may potentially interact with any of the remaining N-1 elements.
Therefore, it becomes unmanageable to maintain each element in disparate files. If a name is changed during the building of the model, the references to the corresponding element fail. This is often resolved in the XML schema world by enforcing a “No Name Change”-policy. Once a name has been given to an element or a type, this name never changes. This restriction, however, can interfere with the practical building of data models, since multiple parties may need to be heard during the naming process. In the same way, names that at one time appear logical may, due to a change, appear misleading at a later time. The names can therefore be changed in the construction period of the model, and only afterwards can the names be locked. This, however, results in time-consuming revisions, where the names are changed, but where the actual functionality of the model remains the same.
A data model rarely, if ever, covers the entire universe and all systems dealing with the corresponding data elements. Interfaces to other data models representing the same data, parts of it or additional data are therefore also needed. Data in one hierarchical model may therefore be transformed fully or in part to another data model and this data model may or may not be hierarchical. This also produces a need for mapping between the different systems.
Likewise, with regard to maintenance of the data model, a method is required for automatically updating the mapping (like the aforementioned need to keep consistency with the form tool etc.). Systems often require a static naming convention and manual work for updating mapping tables is required. This is also time consuming, impractical and expensive.
When a hierarchical data model is applied, it can be used to construct a naming convention in such a way that any piece of information (any data element) gets its own global name. Identical pieces of data can thereby be recognized across systems applying the same data model, thereby implicitly integrating the systems.
In order to ensure this integration in form tools, the convention of naming the form fields must respect the above-mentioned requirement to uniquely identify the same element of data. This also applies to all other flat interfaces.
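The recognition of identical data across forms can be illustrated by a minimal sketch. The field names, the function names `prefill` and `submit`, and the use of an in-memory dictionary as the store are assumptions made for illustration only and are not taken from the described system.

```python
def prefill(form_fields: list[str], stored: dict[str, str]) -> dict[str, str]:
    """Return the values already collected for the fields of a form."""
    return {name: stored[name] for name in form_fields if name in stored}


def submit(stored: dict[str, str], entered: dict[str, str]) -> None:
    """Add newly entered values to the store, keyed by their global names."""
    stored.update(entered)


# Hypothetical store shared by every form that applies the same data model.
stored: dict[str, str] = {}

# Values entered through a first form.
submit(stored, {
    "ex:Owner.Name": "A. Person",
    "ex:Property.Address.Road": "Sunflower street",
})

# A second form reuses two of the global names and adds a new one.
second_form = ["ex:Owner.Name", "ex:Property.Address.Road", "ex:Permit.Type"]
prefilled = prefill(second_form, stored)
```

Because both forms address the data element by the same global name, the second form can be pre-populated without any mapping table between the two.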
U.S. patent application 2003/0204511 A1 discloses how two different structures for related databases containing partly the same information can be mapped using XML and where XML schemas are used as the common definition for the same information. Therefore, this patent represents the general idea of using XML defined in XML schemas for representing hierarchical data when these are transferred between relational databases. However, it does not resolve the issue of having to represent the data in a flat format such as a form, where the fields inherently are not subordinated to one another.
EP 1089195 A1 discloses a method for storing and retrieving data in a database without converting it to a relational structure, while simultaneously keeping the XML instance as it is and storing it as one database element in full. XML schemas (and their equivalents) are used as means to represent the structure of the data in each field. The patent, however, does not cover the situation, where the data within the XML instance has to be represented in a flat format such as a form, where each piece of data in the XML instance must be stored and represented separately with the name of the field carrying the information on the hierarchical structure.
It is an object of the invention to provide the user with a tool to create consistent data models underlying a system, where forms and other data collecting devices can collect data and convert it to the correct hierarchy while recognizing data that already has been collected.
According to the invention, the object is obtained by a method of entering, structuring, storing and transferring data using a hierarchical data model and flat forms including:
In this case, the term instance covers a set of data structured according to the hierarchical data model.
Since the hierarchical information is contained in the field names for the data fields, it is easy to correct or amend the data field values. By conserving the hierarchical information in the names, it is also unnecessary to maintain a stringent structure, when storing the data field values.
The method according to the invention further includes obtaining an XML schema document from the hierarchical data model, where the XML schema document includes the expected XML schema for the instance, and parsing with the use of a parser for mutual inconsistencies between the XML schema and the instance and outputting an error message if the instance does not have the expected structure.
Ideally, parsing is not needed, since said instance is based on the field name list created by the hierarchical data model. This is one of the advantages of the invention.
Due to the hierarchical structure of the data model, the individual field names of the field name list become unique. Moreover, the hierarchical structure of the data model can be derived from the names alone, and no further information regarding the positions of the individual data elements in the data model is needed. This is especially advantageous when such a data model is used in connection with HTML browsers, since such browsers handle data in a flat manner, which makes it difficult to preserve the hierarchical information. Having a naming convention which preserves the hierarchical information is therefore a great help.
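How the hierarchy may be recovered from the names alone can be sketched as follows. The sketch assumes the separator characters used in the example of this description (“:” as name space separator and “.” as hierarchy separator) and a single name space prefix per name; the field names and the function name `unflatten` are hypothetical.

```python
from typing import Any

NAMESPACE_SEP = ":"  # assumed name space separator
HIERARCHY_SEP = "."  # assumed hierarchy separator


def unflatten(field_values: dict[str, str]) -> dict[str, Any]:
    """Rebuild a nested structure from flat, hierarchically named fields."""
    root: dict[str, Any] = {}
    for name, value in field_values.items():
        # Split off the name space prefix, then the hierarchy levels.
        namespace, _, path = name.rpartition(NAMESPACE_SEP)
        levels = path.split(HIERARCHY_SEP)
        node = root.setdefault(namespace, {})
        for level in levels[:-1]:
            node = node.setdefault(level, {})
        node[levels[-1]] = value
    return root


# Hypothetical flat field values; the storage order does not matter.
flat = {
    "ex:Company.Address.PostalCode": "1234",
    "ex:Company.Address.Road": "Sunflower street",
    "ex:Company.Name": "Example Ltd",
}
nested = unflatten(flat)
```

No position information is stored next to the values; the nesting is derived purely by splitting each name at the separator characters.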
The invention will be described with reference to a simple example, where a user 1 wishes to enter some data regarding his company.
The user 1 operating the system then enters data into the form 37 shown in
While the form tool 4 requests the field name list 3, the hierarchical data model 2 requests an XML schema document or file 8. Alternatively, the XML schema document 8 is requested at an earlier time (“design time”). The XML schema document 8 contains the expected structure of the instance 7, and a parser 9 performs a comparison between the XML schema document 8 and the instance 7. If the parser 9 finds structural errors in the instance 7, it may continue to parse the instance 7 and log all errors, or it may simply stop at the first error. Depending on the error, the method according to the invention may be unable to continue and stop, or it may return to one or more of the previous steps in order to remedy the error.
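The two parser behaviours, logging all errors or stopping at the first, can be sketched as follows. This is not an XML schema validator: as a simplifying assumption, the expected structure is represented by a plain set of dotted element paths standing in for the XML schema document 8, and all names in the sketch are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_paths(root: ET.Element) -> list[str]:
    """Return the dotted path of every leaf element in the instance."""
    paths: list[str] = []

    def walk(element: ET.Element, prefix: str) -> None:
        path = f"{prefix}.{element.tag}" if prefix else element.tag
        children = list(element)
        if not children:
            paths.append(path)
        for child in children:
            walk(child, path)

    walk(root, "")
    return paths


def check_instance(xml_text: str, expected: set[str],
                   stop_at_first: bool = False) -> list[str]:
    """Compare the instance with the expected paths and report errors."""
    errors: list[str] = []
    found = element_paths(ET.fromstring(xml_text))
    for path in found:
        if path not in expected:
            errors.append(f"unexpected element: {path}")
            if stop_at_first:
                return errors
    for path in sorted(expected - set(found)):
        errors.append(f"missing element: {path}")
        if stop_at_first:
            return errors
    return errors


# Stand-in for the expected structure (assumption, not a real XML schema).
expected_paths = {"Company.Name", "Company.Address.Road",
                  "Company.Address.PostalCode"}
instance = """
<Company>
  <Name>Example Ltd</Name>
  <Address><Road>Sunflower street</Road><Town>Nowhere</Town></Address>
</Company>
""".strip()

all_errors = check_instance(instance, expected_paths)
first_error = check_instance(instance, expected_paths, stop_at_first=True)
```

Here `all_errors` reports both the unexpected `Town` element and the missing `PostalCode` element, whereas `first_error` contains only the first problem encountered.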
An outer loop shown in
The field name list 3 is a simple list of the names for the data fields of the form. The names for the data fields could be arbitrarily selected without regard to the hierarchical structure. The object of the invention, however, is to preserve the hierarchical information in the naming convention. The proposed naming convention has the following format:
Many characters could be used for the separator characters, but in this example the character “:” is used as a name space separator and the character “.” is used as a hierarchy separator. A name with three hierarchical levels would for instance look like the following:
Referring to the example shown in
The address of the company usually comprises several different fields of information such as the road name, the postal code, etc. The names for the data fields with the address information are thus:
Finally the form contains some financial information. The names for the data field with the financial information are thus:
As it appears, each of the data fields is thus uniquely named in a way that preserves the hierarchical information. It is also easy to expand the structure by adding new data fields. Each of the data fields is also assigned a specific data type, and only data of the specified type will be accepted as input. Different types of data fields may be reused while being given a separate name. Elements may also be reused, thereby inheriting both the data type and the element name. Elements can be grouped into an unlimited number of grouping hierarchies. Each group can be reused (referenced) in the same way as a data element, both as a type and as a data element. An element (or group) may appear once or several times within its super-group. Each element, type (and hence group) is named in one of several (unlimited) name spaces. This corresponds to the normal use of XML schemas. When progressing up/down through a grouping hierarchy, several name spaces may have been applied to the field names thereof.
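The assignment of a data type to each data field can be sketched as follows. The field names, the table `FIELD_TYPES` and the function `accept` are hypothetical, and Python conversion functions are used here in place of XML schema data types.

```python
from datetime import date
from typing import Any, Callable

# Hypothetical table assigning a data type (a converter) to each field name.
FIELD_TYPES: dict[str, Callable[[str], Any]] = {
    "ex:Company.Name": str,
    "ex:Company.Employees": int,
    "ex:Company.Founded": date.fromisoformat,
}


def accept(name: str, raw: str) -> Any:
    """Convert raw form input to the type assigned to the named field.

    Raises KeyError for an unknown field name and ValueError when the
    input is not of the specified type.
    """
    convert = FIELD_TYPES[name]
    return convert(raw)


employees = accept("ex:Company.Employees", "42")
founded = accept("ex:Company.Founded", "2001-02-03")
```

Input such as `accept("ex:Company.Employees", "many")` is rejected with a `ValueError`, so only data of the specified type reaches the stored data field values.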
As seen from
The above example was shown for a single form 37 only. In many cases, however, there will be a need for using several different forms. One example of this could be where the user wishes to build a new house. He will need to obtain several different permits, e.g. a building permit and other permits, and hence there is a need to fill in several different forms. As the user progresses through the different forms, information is accumulated, as indicated by the bidirectional arrow between 4 and 5 in
In the above description in connection with
As mentioned above, generally there are no sequential requirements in the data structures used. This is particularly true for the data field values 5, since the data is added as a consequence of the graphical layout of the form, the order the user enters the data or as a consequence of corrections or amendments of data in connection with later forms. Due to the hierarchical naming, in many cases it will not matter which order the data is stored as long as the hierarchy is preserved in the names. On the other hand, it is sometimes useful or necessary to have some sequential information (element X has to be before element Y). This can be included in several different ways, e.g. by including the information with the field name list 3 or by using a sequential list 38 as indicated in
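One possible way of applying such sequential information is sketched below. The representation of the sequential list as a plain list of field names, the field names themselves and the function name `order_fields` are assumptions for illustration.

```python
def order_fields(field_values: dict[str, str],
                 sequence: list[str]) -> list[tuple[str, str]]:
    """Order stored field values according to a separate sequential list.

    Names missing from the list are placed after the listed ones and keep
    their stored order, since the sort is stable.
    """
    position = {name: index for index, name in enumerate(sequence)}
    return sorted(field_values.items(),
                  key=lambda item: position.get(item[0], len(sequence)))


# Values stored in the order in which the user happened to enter them.
stored_values = {
    "ex:Company.Address.PostalCode": "1234",
    "ex:Company.Name": "Example Ltd",
    "ex:Company.Address.Road": "Sunflower street",
}
# Hypothetical sequential list: the name must come before the road.
sequence_list = ["ex:Company.Name", "ex:Company.Address.Road"]

ordered = order_fields(stored_values, sequence_list)
```

The stored data field values themselves remain unordered; the sequence is imposed only when it is needed, which keeps the storage free of structural requirements.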
It is apparent from the above description that the invention can be varied in many ways. Such variations are not to be considered a deviation from the scope of the invention, and all such modifications which are obvious to persons skilled in the art are also to be considered comprised by the scope of the succeeding claims.