1. Technical Field of the Invention
This invention relates to discussion threaded relationships in a relational database using adjacency and character map tree models.
2. Background Art
An example of a threaded discussion application is a Google news group, a type of application often referred to as a discussion forum. In a typical discussion forum, a topic is posted and people respond. The responses in such a discussion forum create a response hierarchy.
Documents in threaded discussions conceptually form tree relationships. There are a number of ways to represent tree relationships in a relational database, e.g. Adjacency Model, Nested Set Model, and Character Tree Map Model.
The different approaches to representing trees in a relational database each provide different advantages and disadvantages with regard to operational efficiency. Some typical tree operations include: adding a child, finding a topic and all its descendants, finding all roots, and so forth. These correspond to the discussion forum operations: entering a response document, finding a topic and all responses (including their relationships), deleting a topic and all its responses, and finding all topics. Applying the Adjacency Model alone can result in expensive recursive query operations on topics and responses, e.g. delete. Applying the Nested Set Model alone can also result in expensive operations, e.g. adding a response may result in many records updated. Applying the Character Tree Map Model alone may unduly restrict the number of topics.
Character Tree Map Model is described in U.S. patent application Ser. No. 10/326,187, filed 20 Dec. 2002 for “Method, System, and Program Product for Managing Hierarchical Structure Data Items in a Database”. Nested Set Model of Trees is described in Joe Celko, “SQL for Smarties” in DBMS Online, March 1996. He also describes the advantages and disadvantages of the Adjacency Model.
Referring to
Referring to
To reconstruct the hierarchy 24 of
Consequently, a partial solution is to keep ID 20 and parent ID 22 in database from which to reconstruct the thread of topics and responses. This reconstruction takes processing time, which may be intolerably long for large discussion threads. The entire tree must be searched each time a user requests a thread reconstruction. The search time is unbounded: that is, as responses are added to the thread and entered to the table of IDs 20 and parent IDs 22, an ever larger database must be recursively processed with each insertion or inquiry.
A method, system, and program storage device for storing discussion threaded relationships by representing in a character map tree model tree relationships of a topic and its descendent responses in a discussion thread; storing for each node in the tree in accordance with an adjacency model a node key, a next key, a parent key, and root identifier; and with reference to the character map tree model and adjacency model, selectively retrieve a topic and all descendants, including their relationships, creating a response and adding it as a child to a topic or response, deleting a topic or response and all its descendants, and retrieving topics in a folder.
Other features and advantages of this invention will become apparent from the following detailed description of the presently preferred embodiment of the invention, taken in conjunction with the accompanying drawings.
The present invention relates to an efficient database implementation of threaded discussions in a relation database.
Given some typical discussion forum characteristics, the present invention takes advantage of aspects of both the Character Tree Map Model and Adjacency Model to provide an efficient relational database implementation with regard to common discussion forum operations.
Referring to
The Character Map Tree Model is used to represent the tree relationship of a topic and all its descendants. This provides for efficient operations over other approaches for topic deletion and adding a new response to a topic. Move and copy of responses and their descendants are uncommon operations for discussion forms. The Character Map Tree Model is used to bound response level depth, and the maximum number of direct descendants a response may have, but the model can be implemented in a way that this is not prohibitive for discussion forums. The Adjacency Model is used, in part, to distinguish between topic and response trees, which efficiently provides parent information in query results and efficiently identifies topics in a forum. This adjacency information is not used, however, to implement operations such as delete, or retrieving a topic and all its responses since this can result in expensive operations.
Thus, this invention uses the Charater Map Tree Model (CMT) to represent the tree relationship of a topic and all its descendants. The CMT model uses a single fixed-length character field to represent the position of a given node within the tree hierarchy. This character field, designated herein as NDXKEY, uniquely identifies a node within a tree, identifies the node's parents at all preceding levels, provides a range of all the node's children, indicates the level of a given node within a tree, can specify an ordering, and provides efficient query operations.
The CMT model does bound the number of nodes at a given level and the number of nodes at any level, but this is controlled by the size of the character field, the character set used, and the number of characters used per level. For discussion forum applications, these parameters can be set to give satisfactory limits. For example, using a key of length 256, a character set [A-z], and two characters per level, the number of levels will be restricted to 128, and the number of direct response to 3,364. A key length of 256 or less can also be indexed on major relational database implementations, such as DB2, SQL Server, Oracle, and so forth.
This invention uses, in part, the Adjacency Model by storing with a node not only the NDXKEY, but also the parent and root ID's. The relationship columns within a table are defined as (lengths are implementation specific):
The PARENTID is part of the result set for operations such as getAllChildren, but is not used logically to implement the operations.
By this implementation, since the NDXKEY is only unique within a tree (topic and its descendants), not globally for a set of topics, ROOTID is used to uniquely associate a response to a topic. Alternatively, keys could be generated starting with a folder (group of topics), but this would bound the number of topics allowed within a folder, usually an undesirable characteristic for a discussion forum.
Referring to
By way of example, for topic 1.1.1.1, a first response adds one to the second digit, giving 1.2.1.1. The second response adds one to the second digit of the largest previous response 1.2.1.1, yielding 1.3.1.1. This same processing occurs when adding further sub-responses to the thread: one is added to the third digit, and so forth, as is illustrated in
If keys are based on integers 0-9, only 10 responses to a given parent response may be inserted to the tree. However, if the sort order A-Z, a-z, 0-9 is used, then 26+26+10=62 keys are available. If multiple digits are used with separators, even more response are possible.
The above works. However, there is yet another consideration. When creating a next entry to the tree, it is necessary to search for a maximum key 34, and then increment it by one. When creating an object, users are generally more lenient with the time it takes than when viewing. Therefore, in order to minimize read time at the expense of insertion time, a next key field 36 is provided. This makes a parent responsible for farming out a next key to a direct descendent. When a topic or response gives out a key for a next response, it increments its next key 36 by one in anticipation of a next request. Upon request, now the highest key need not be searched from column 34 and incremented to obtain the next key, but is immediately available from next key 36.
If
In the embodiment of
In accordance with the present invention, several exemplary operations are provided and described in the following tables:
The tables for a topic/response meta-data, properties, and relationships data can be implemented in many different and acceptable ways. The operation descriptions hereafter make some assumptions to simplify the description in order to demonstrate the benefits of the method. The main point to demonstrate is how the main discussion forum operations can be made efficient by using the data model previously described.
In accordance with a further aspect of the invention, the requirement of maintaining locks on topics and responses at a global, database wide, level is alleviated by using parent ID 22 as the control for updates. Such a lock is now required on the immediate parent only for the time required to update next key 36 and give it out to the requester. Previously, the entire table needed to be locked throughout the update operation. Thus, referring to
It will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without departing from the spirit and scope of the invention. Referring to
Further, each step of the method may be executed on any general computer, such as IBM Systems designated as zSeries, iSeries, xSeries, and pSeries, or the like and pursuant to one or more, or a part of one or more, program elements, modules or objects generated from any programming language, such as C++, Java, Pl/1, Fortran or the like. And still further, each said step, or a file or object or the like implementing each said step, may be executed by special purpose hardware or a circuit module designed for that purpose.
Accordingly, the scope of protection of this invention is limited only by the following claims and their equivalents.