C. J. Date and E. F. Codd, “The Relational and Network Approaches: Comparison of the Application Programming Interfaces,” in ACM SIGFIDET (now SIGMOD) workshop on Data description, access and control, 1974.
J. D. Ullman, Principles of Database and Knowledge-Base Systems, Vol. 1, Computer Science Press, 1988.
H. F. Korth and A. Silberschatz, Database System Concepts, second edition, McGraw-Hill, 1991.
H. Garcia-Molina, J. D. Ullman, and J. Widom, Database Systems: The Complete Book, second edition, Prentice-Hall, 2009.
University of Illinois, Urbana-Champaign database tutorial found at http://mias.uiuc.edu/files/tutorials/kcchang01.ppt
Introductory information on database management systems at http://www.scribd.com/doc/4355522/dbms
Not applicable.
1. Field of the Invention
This application relates to database systems, specifically to database systems in which there are data entities that have relationships amongst one another.
2. Prior Art
All but the most trivial databases comprise data entities of various sorts and interrelationships among them. For example, databases of social or political networks, technical systems, movie production networks, university information, and virtually all other non-trivial database systems involve entities and their interrelationships. Naive users might store information in text files or spreadsheets. More sophisticated users might turn to Relational database systems. Heretofore the best known ways to store databases involving entities and their interrelationships are (1) Relational Databases, (2) Object-Oriented Databases, including variations like Object-Relational Databases, (3) XML, and (4) Hierarchical and Network Databases, these last types considered mostly obsolete. None of these is really satisfactory for storing data about systems of any complexity in terms of the interrelationships amongst data entities. Relational Databases have a high level of built-in data redundancy that invites errors and inconsistencies. These shortcomings can make the design of a good database schema difficult, and can even make the simple act of data entry very annoying. Object-Oriented databases and the like directly support only simple relationships and can be hard to use, which accounts for their lack of popularity. XML imposes a hierarchy on the entities, and allows only limited departure from the hierarchy with any degree of convenience. Additionally, pointers indicating the non-hierarchical relationships are represented as text rather than true pointers.
Like XML, Hierarchical databases are oriented towards hierarchical relationships amongst data entities rather than more general relationships.
Network databases permit non-hierarchical relationships more natively, but are still hard to use because they are oriented towards binary, many-to-one relationships, so that more general relationships are available only by simulation. Additionally, querying a Network database is an exercise in “manual” (programmed) physical navigation of the network, which means that a change to the structure of the database frequently requires a code update as well. The query language is procedural, rather than declarative like SQL.
We need to pay more attention to Relational and Network databases in our discussion of prior art. Relational databases need to be examined in more detail because they are still the predominant type of database, and hence are so readily available that they are used even in inappropriate circumstances. Network databases, on the other hand, need to be examined in detail because they are the closest to what we are proposing to patent here, though also significantly different, as we will point out.
The Relational database scheme certainly has the advantage of simplicity over all the others except for the flat file database, which is not discussed here. As Codd, its inventor, stated, the only concept the user really has to know to begin using Relational databases is that of a two-dimensional table. Every data entity is represented merely by something that can be written into one or more table entries, such as a character string (a name, perhaps) or a number (I.D.), or a combination of a string and a number, for example. Often, such simplicity works very well in practice. However, real-world databases get complex very quickly, and often such simplicity doesn't work any more, as we will now see through a real example.
Our sample application is to store a database about a network of merchants and their clients in Mediaeval Spain. The source of our data is a set of notarized documents or contracts, each representing some kind of business transaction from the 1500s. In each business transaction, there were one or more merchants acting as servers or service providers and zero or more clients. In case there is exactly one server and one client, the transaction is easily represented as a row in a Relational database. However, even in this case there is a potential for errors due to mistyping. For instance, if each person's name (or an ID number) is used as a key and it is misspelled or mistyped, the misspelling or mistyping amounts to the creation of a new person entity. This is an important flaw in Relational databases; it is caused by the fact that something as important as an entity (a person, no less!) is represented by a mere character string (or an ID number).
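The mistyping problem can be illustrated with a minimal sketch. The rows, person names, and document numbers below are hypothetical and are not taken from the actual notarial data; the point is only that, when a character string serves as the key, a one-letter typo yields a second "person."

```python
# Hypothetical rows of a one-server, one-client transaction table:
# (server_name, client_name, document_id). The names are invented.
transactions = [
    ("Juan Perez", "Diego Lopez", 1),
    ("Juan Perez", "Pedro Ruiz", 2),
    ("Juan Peres", "Diego Lopez", 3),  # typo for "Juan Perez"
]

# When the name string is the key, every distinct string is a distinct person,
# so the typo silently creates a new server entity.
servers = {server for server, _client, _doc in transactions}
```

Here `servers` holds two entries for what was meant to be a single merchant, and nothing in the table itself signals the error.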
Another problem arises when we try to use a Relational database if the numbers of servers and clients can vary, as they do in the actual application under consideration. Relational database tables can't have a variable number of columns. One solution is to have enough columns to accommodate the maximum number of servers and clients we will ever need. This is a poor solution, because it leads to a large number of NULL entries.
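As a rough sketch of the fixed-column workaround, assume (hypothetically) that no document ever names more than three servers or three clients. The limits, the helper names `pad` and `to_fixed_row`, and the sample row are all assumptions made for illustration.

```python
MAX_SERVERS = 3  # assumed upper bound, not taken from the source data
MAX_CLIENTS = 3  # assumed upper bound, not taken from the source data

def pad(names, width):
    """Fill unused columns with None, standing in for SQL NULL."""
    if len(names) > width:
        raise ValueError("more names than available columns")
    return list(names) + [None] * (width - len(names))

def to_fixed_row(doc_id, servers, clients):
    """Lay out one transaction as a fixed-width row."""
    return [doc_id] + pad(servers, MAX_SERVERS) + pad(clients, MAX_CLIENTS)

# A transaction with one server and no clients.
row = to_fixed_row(1, ["Juan Perez"], [])
null_count = row.count(None)
```

In this sketch a transaction with a single server and no client fills five of its six name columns with NULL placeholders, and a document with a fourth server cannot be stored at all without changing the schema.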
Another well-known problem with Relational database tables is data redundancy, which often leads to data incoherence as well as errors caused by mistyping. Some of that can be removed by means of a process called normalization, which splits up a table into two or more tables. However, normalization can be quite complicated and hard to understand, defeating one major advantage, that of simplicity, touted by Relational DBMS' creator and proponents. In fact, in business practice few users even know much about database schema normalization techniques.
There is yet another cause for the complexity of Relational databases, belying the advertised simplicity. The proponents claim that there is nothing but values in tables, but in fact, for the sake of efficiency, pointers are needed, just as they are needed in other kinds of DBMS. For example, indexes are needed for efficient searches, and indexes require large numbers of pointers. Additionally, pointers are often required for the storage of data on media such as disks.
In summary, Relational databases are simple, but only to a casual user who does not intend to use them for a complex project where a great deal of efficiency or reliability is required.
We will now turn to examining the Network DBMS. The Network DBMS uses pointers, also called links or references, to represent binary, many-to-one relationships amongst entities, which in turn are represented by records. General relationships (many-to-many or those with arities greater than 2) can also be represented, but only by simulation.
As shown in
Searches in a Network database are done by means of pointer navigation or traversal. Such navigation is done by procedural, not declarative, code, and must be explicitly programmed by the application programmer. This is difficult because the application programmer has to know the exact structure of the database, and it also means that a change in the database structure frequently requires a code change.
Additionally, even simple queries may require traversing practically the entire network of records [6]. There is no automatic, easy-to-use search facility in Network databases.
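The flavor of such navigation can be suggested with a small sketch. It does not reproduce any actual Network DBMS interface; the `Record` class, the `connect` helper, and the set names `merchant_link` and `doc_link` are hypothetical. A many-to-many relationship between merchants and documents is simulated with intermediate link records, each a member of two many-to-one sets, and the query is a hand-written traversal.

```python
class Record:
    """A record with many-to-one links, grouped by (hypothetical) set name."""
    def __init__(self, name):
        self.name = name
        self.owner = {}    # set name -> the single owning record
        self.members = {}  # set name -> list of member records

def connect(set_name, owner, member):
    """Make `member` belong to `owner` in the named many-to-one set."""
    member.owner[set_name] = owner
    owner.members.setdefault(set_name, []).append(member)

def documents_of(merchant):
    """Procedural navigation: merchant -> link records -> owning document."""
    return [link.owner["doc_link"].name
            for link in merchant.members.get("merchant_link", [])]

merchant = Record("Juan Perez")
doc1, doc2 = Record("doc1"), Record("doc2")
for doc in (doc1, doc2):
    link = Record("link")                     # simulates one many-to-many pair
    connect("merchant_link", merchant, link)
    connect("doc_link", doc, link)

found = documents_of(merchant)
```

Note that `documents_of` hard-codes both set names and the direction of travel, so renaming a set or inserting another level of link records would break the query code.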
The subject of this patent is a new kind of database management system called Intentionally-Linked Entities, or ILE. In ILE, relationships among entities will be represented directly as true links among them. Thus general graphs (as in Graph Theory), and in fact more (to be explained below), can be represented naturally. The data model will be similar to the Entity/Relationship data model, which was never implemented very well in the prior art partly due to the lack of good programming tools such as object-oriented languages and simple-to-use dynamic memory allocation. (The most valiant attempt in the past was the flopped Network Databases discussed earlier in this document.) However, at the present time sufficient tools and programming languages have been developed so that complex linked data structures are now in more widespread use. Complex linked data structures are used in operating system kernels, for example. Interestingly enough, complex linked data structures have not been used in the database field except in index structures. The main idea behind the ILE database system is to use modern linked data structures, dynamically-allocated arrays, hashes, and objects in general in the main arena of database storage to the fullest extent possible.
What was meant above by saying that we can represent more than just general graphs in ILE? In a graph, an edge represents a binary relationship, that is, a relationship between two nodes, where the nodes commonly represent entities. In ILE, relationships with arities greater than two are possible, and in fact are convenient to create and natural to represent. Thus ILE data structures are more powerful than general graphs. In fact, in ILE, we can also store a new kind of attribute that pertains not to an entity in a static way, but to the entity as it enters a specific relationship. These extra capabilities of ILE are important in the application of ILE to complex networks such as the ones to be referred to in the next paragraph.
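One way to picture these two capabilities is the following minimal sketch, which is an illustration under stated assumptions and not the claimed implementation. The class names, the role names, and the `share` attribute are hypothetical; `EntityPlus` borrows the "entity plus" term that this description uses for relationship objects.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str

@dataclass
class EntityPlus:
    """An entity plus attributes that hold only within one relationship."""
    entity: Entity
    attributes: dict = field(default_factory=dict)

@dataclass
class Relationship:
    """One relationship; each role maps to a list of EntityPlus objects."""
    roles: dict = field(default_factory=dict)

    def add(self, role, entity, **attributes):
        self.roles.setdefault(role, []).append(EntityPlus(entity, attributes))

    def arity(self):
        return sum(len(players) for players in self.roles.values())

juan, ana, diego = Entity("Juan Perez"), Entity("Ana Gomez"), Entity("Diego Lopez")
contract = Relationship()
contract.add("server", juan, share=0.6)  # 'share' is a hypothetical attribute
contract.add("server", ana, share=0.4)
contract.add("client", diego)
```

The single `contract` object links three entities at once, which a graph edge cannot do, and the `share` values are stored with each entity's participation in this contract rather than on the `Entity` objects themselves.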
We now turn to a more detailed description of ILE, as shown in
A database includes a data structure or object such as a hash that contains or holds references to all the entity sets (reference 40), which are data structures or objects that represent sets of data entities, such that all the entities in each such set are of the same kind. For example, in a university database all entities representing students could be in a single entity set.
A database also includes a data structure or object that contains or holds references to all the relationship sets (reference 60), which are data structures or objects that represent sets of relationships of like kind. For example, all the relationships between two people of the form “is the father of” form one relationship set.
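The top-level organization just described might be sketched as follows. This is only one possible arrangement: the `Database` class, its method names, the set names, and the sample keys are hypothetical, and the choice of a dictionary keyed by a key attribute value for each entity set is an assumption.

```python
class Database:
    """Holds one hash of entity sets and one hash of relationship sets."""
    def __init__(self):
        self.entity_sets = {}        # set name -> entity set
        self.relationship_sets = {}  # set name -> relationship set

    def entity_set(self, name):
        # Assumed layout: each entity set maps a key value to an entity.
        return self.entity_sets.setdefault(name, {})

    def relationship_set(self, name):
        # Assumed layout: each relationship set is a list of relationships.
        return self.relationship_sets.setdefault(name, [])

db = Database()
db.entity_set("students")["S001"] = {"name": "Ana Gomez"}  # invented data

people = db.entity_set("people")
people["P1"] = {"name": "Pedro Ruiz"}
people["P2"] = {"name": "Luis Ruiz"}

# The relationship holds references to the entities, not copies of their keys.
db.relationship_set("is_the_father_of").append(
    {"father": people["P1"], "child": people["P2"]}
)
```

All students live in one entity set and all "is the father of" relationships in one relationship set, mirroring the examples in the text.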
Now we look into an embodiment of a data structure or object that holds an entity set, which is shown as references 50 and 50′ in
Although ILE uses modern objects for its implementation, and is object-oriented in the sense that it can be embodied to permit objects as data entities, it is not an object-oriented database in the sense used in Ullman [2], as Network databases are; it is instead value-oriented, like Relational databases. That is, ILE does not use storage location as a key, but uses key attribute values instead.
Back to
Describing now samples of individual entities, we once again refer to
Ref. 90, 90′ in
A simpler embodiment of relationship objects is possible, wherein at most one entity plays each role. Instead of having an entire array of “entity plus” objects representing each role, we use only one such object. This simpler embodiment will be represented as a separate set of claims.
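A minimal sketch of this simpler embodiment follows, under the assumption that each role is stored as a single "entity plus" object and that filling a role twice is rejected. The class names, the `set_role` method, and the error behavior are hypothetical, and a plain string stands in for a reference to an entity object.

```python
from dataclasses import dataclass, field

@dataclass
class EntityPlus:
    entity: str  # stand-in for a reference to an entity object
    attributes: dict = field(default_factory=dict)

class SimpleRelationship:
    """Relationship in which at most one entity plays each role."""
    def __init__(self):
        self.roles = {}  # role name -> a single EntityPlus, not an array

    def set_role(self, role, entity, **attributes):
        if role in self.roles:
            raise ValueError(f"role {role!r} is already filled")
        self.roles[role] = EntityPlus(entity, attributes)

father_of = SimpleRelationship()
father_of.set_role("father", "Pedro Ruiz")
father_of.set_role("child", "Luis Ruiz")
```

Storing one object per role removes a level of indirection, at the cost of being unable to record, say, two servers in the same contract.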
Finally, note that the database is value-oriented, as opposed to object-oriented, in the sense that the address of an entity is not part of the key, thus permitting value-comparison-based searches. To understand this last point it is important to note that the phrase “object-oriented” once had a different meaning from the one currently used. See Ullman [2] in the “Other references” section. There, a database is object-oriented if the storage location of an entity can be used as the entity's key. The opposite of object-oriented is “value-oriented.” A database is value-oriented if an entity is identified only by attribute values. Relational databases are value-oriented, and their success relative to Network databases is due in significant part to that fact. Learning from that success, ILE is meant to be value-oriented. It can be said that ILE has the Relational DBMS's advantage of value-orientedness, as well as the Network DBMS's advantage of having links, although ILE's links are more direct than those of a Network DBMS.
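The distinction can be made concrete with a small sketch. The `Person` class and the two dictionaries are hypothetical, and Python's `id()` is used only as a stand-in for a storage location.

```python
class Person:
    def __init__(self, name):
        self.name = name

stored = Person("Juan Perez")

# "Object-oriented" in the older sense: the storage location is the key.
by_location = {id(stored): stored}
# Value-oriented: a key attribute value is the key.
by_value = {stored.name: stored}

# A separately allocated probe object carrying the same key value.
probe = Person("Juan Perez")

found_by_location = by_location.get(id(probe))  # different location: no match
found_by_value = by_value.get(probe.name)       # same value: match
```

Only the value-keyed lookup lets a search start from attribute values supplied by a user, which is what makes value-comparison-based searches possible.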
This application claims the benefit of provisional patent application Ser. No. 61/075,189, filed Jun. 24, 2008 by the present inventor.
U.S. Pat. No. 7,483,920 Jan. 27, 2009 Mori, et al.: Database management system, database management method, and program
U.S. Pat. No. 7,333,986 Feb. 19, 2008 Minamino, et al.: Hierarchical database management system, hierarchical database management method, and hierarchical database management program
U.S. Pat. No. 6,633,886 Oct. 14, 2003 Chong: Method of implementing an acyclic directed graph structure using a relational database
Number | Date | Country | |
---|---|---|---|
61075189 | Jun 2008 | US |