The disclosures herein relate generally to databases and more particularly to methods and apparatus for storing and accessing information in content management systems.
Conventional content management systems, such as that shown in
One approach employed to store items in a content management system is to model an item in a single table. Unfortunately, such a single table approach results in many fields among the rows and columns of the table being unused. Such an approach is inefficient from the storage viewpoint. In the past, flat data models have been used to store data in a content management system. For example,
Content management systems typically store three types of information, namely primary content (data), user metadata and system metadata. Primary content is stored in the Resource Manager and includes both structured and semi-structured data such as text files, images, web pages and video clips. Descriptions of, and information about, the primary content stored in the Resource Manager, which are normally provided by client users, are referred to as “user metadata”. In contrast to “user metadata”, “system metadata” is the information created by the content management system itself for access control, storage management, and content tracking and reference. Both user metadata and system metadata reside in the Library Server, which acts as a repository for the metadata in database form. As compared to primary content, both user and system metadata are well structured. In general, content management systems provide a set of functions for content (data and metadata) creation, content search and retrieval, and content distribution that enable users to manage data, system metadata and user metadata. As mentioned, one approach depicted in
In most content management systems, both system metadata and user metadata are searchable. However, content-based searching of the primary content is more limited. Traditionally, search on the metadata, termed parametric search, is through either a specific API or via SQL language on many systems. Since content management systems in general provide a much richer data model than their underlying database systems, writing search queries based on a specific API or SQL can be both tedious and inefficient.
What is needed is a methodology and apparatus for providing the content management system user with a simplified user experience when preparing queries to search for specific data stored in the system.
The disclosure herein involves a content management system which employs a hierarchical item type tree-based structure including tables at different levels to store metadata for items. More particularly, a method of organizing information in a content management system is disclosed which includes the steps of storing metadata in a tree hierarchy of tables in a storage repository using a first data format. The method also includes accessing the metadata in the storage repository to provide accessed metadata. The method further includes creating a view of the accessed metadata in a second data format. The method also includes the step of running a query against the view of the accessed metadata in the second data format.
A principal advantage of the embodiment disclosed herein is a significantly simplified user experience when running queries in the content management system.
Library server 25 is coupled to resource manager 35 and contains user and system metadata concerning the primary content (data or objects) that are stored in resource manager 35. Many types of data can be stored in resource manager 35, for example, business information, applications, text, audio, video and streaming data, just to name a few.
Content management system 10 employs a rich data model wherein data including metadata is stored in a tree-based hierarchical data structure including multiple database tables. In this model, an item is the basic unit of resource managed by the system. More specifically, an item is a typed object whose type is defined by an Item Type. Logically, an Item Type is composed of components arranged in a hierarchy. This hierarchy forms a tree structure and has a unique root component. An Item is an instance of an Item Type. It is composed of one instance of the root component and zero or more instances of descendant components, also called child components (repeating groups). Within the Item, these component instances have ancestor-descendant relationships as dictated by the Item Type. In implementation, an Item Type is composed of multiple relational or object-relational database tables, each representing a component of the Item Type. An Item is composed of one row from the database table representing the root component and zero or more rows from each of the database tables representing descendant components (child components).
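By way of illustration only, the table-per-component arrangement described above can be sketched in Python using an in-memory SQLite database as a stand-in for the library server database. The column names ComponentID and ParentCompID, the sample attribute values and the helper names are assumptions made for this sketch and are not taken from the disclosure.

```python
import sqlite3

def build_journal_item_type(conn):
    """Create one table per component of a hypothetical Journal Item Type."""
    conn.executescript("""
        CREATE TABLE Journal (             -- root component
            ItemID TEXT,
            VersionID TEXT,
            ComponentID INTEGER PRIMARY KEY,
            Title TEXT
        );
        CREATE TABLE Journal_Article (     -- child component of Journal
            ComponentID INTEGER PRIMARY KEY,
            ParentCompID INTEGER REFERENCES Journal(ComponentID),
            Title TEXT
        );
        CREATE TABLE Journal_Author (      -- child component of Journal_Article
            ComponentID INTEGER PRIMARY KEY,
            ParentCompID INTEGER REFERENCES Journal_Article(ComponentID),
            Last_Name TEXT
        );
    """)

conn = sqlite3.connect(":memory:")
build_journal_item_type(conn)

# One Item: exactly one root row plus zero or more rows per descendant table.
conn.execute("INSERT INTO Journal VALUES ('J1', '1', 1, 'Data Systems')")
conn.execute("INSERT INTO Journal_Article VALUES (10, 1, 'XML Views')")
conn.executemany(
    "INSERT INTO Journal_Author VALUES (?, ?, ?)",
    [(100, 10, "Hsiao"), (101, 10, "Lee")],
)

def item_row_counts(conn):
    """Return the number of rows held in each component table."""
    tables = ("Journal", "Journal_Article", "Journal_Author")
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            for t in tables}
```

In this sketch the single Journal row is the root component instance, and the two Journal_Author rows form a repeating group beneath one Journal_Article row.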
A query with predicates involving attributes from multiple components of an Item Type, if written in SQL, would require specifying “joins” which reflect the ancestor-descendant relationships between those components. It was discovered that it is possible to hide this complexity from users/applications if the system provides a higher level query language that automatically maps the client application data model of client 15 to the underlying (relational or object relational) database data model employed by library server 25.
It was mentioned earlier that traditionally, search on the metadata is through either a specific API or via SQL language on some systems. Since content management systems in general provide a much richer data model than their underlying database systems, writing search queries based on a specific API or SQL can be both tedious and inefficient in content management systems without query processor 30. This is so because an “item” can be a compound object, which maps into multiple database tables. To search on properties of an item, users would potentially need to either write a very complex SQL query involving many complex join and/or union operations or make many complex API calls. For some queries searching on user metadata, it is first necessary to query the system metadata, thus requiring multiple SQL statements. (For example, to find all publications with a specific Title, it is first necessary to find which Item Types contain the attribute Title.) The disclosed content management system 10 with its query processor 30 advantageously insulates the client user from these high levels of query complexity.
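The two-step search noted above, in which system metadata is consulted before user metadata, can be sketched as follows. This is a minimal illustration only: the catalog table item_type_attrs, the Memo Item Type and all sample rows are hypothetical, and SQLite stands in for the underlying database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical system metadata: which Item Type carries which attribute.
    CREATE TABLE item_type_attrs (item_type TEXT, attr_name TEXT);
    CREATE TABLE Journal (ItemID TEXT, Title TEXT);
    CREATE TABLE Book (ItemID TEXT, Title TEXT);
    CREATE TABLE Memo (ItemID TEXT, Subject TEXT);
    INSERT INTO item_type_attrs VALUES
        ('Journal', 'Title'), ('Book', 'Title'), ('Memo', 'Subject');
    INSERT INTO Journal VALUES ('J1', 'Databases');
    INSERT INTO Book VALUES ('B1', 'Databases'), ('B2', 'Networks');
    INSERT INTO Memo VALUES ('M1', 'Databases');
""")

def find_items_by_title(conn, title):
    """Find items of any Item Type whose Title attribute equals `title`."""
    # Step 1: query the system metadata for Item Types that have a Title.
    item_types = [row[0] for row in conn.execute(
        "SELECT item_type FROM item_type_attrs "
        "WHERE attr_name = 'Title' ORDER BY item_type")]
    # Step 2: issue one user metadata query per matching Item Type table.
    hits = []
    for item_type in item_types:
        # Table names come from the catalog above, never from user input.
        rows = conn.execute(
            f'SELECT ItemID FROM "{item_type}" WHERE Title = ?', (title,))
        hits.extend((item_type, row[0]) for row in rows)
    return hits
```

Each additional Item Type that carries the attribute adds another statement, which is the kind of complexity the query processor is intended to hide from the client user.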
Content management system 10 includes a hierarchical data structure, repeating groups, link relationships and reference attributes. One goal of content management system 10 is to abstract out the application data model and provide a high level query language in which client users can easily write queries and which the system can execute efficiently.
An example of a content management system 10 data model including 4 representative Item Types and demonstrating query conversion is now presented.
To simplify the query process as earlier described, the tree-based content manager hierarchies shown in
To achieve the above described mapping, each content management system Item is represented by nested XML elements, with the top level XML element representing the root component and the nested XML elements representing the descendant components. The nesting of the XML elements thus represents the component hierarchy.
Five mapping rules are now described which assist in the conversion of an XML query to an SQL query, effectively by mapping the SQL data model format to the XML data model format such that XQuery, the query language of XML, will run on content management (CM) system 10. A way has been discovered to view the tree-based SQL hierarchy of the content management (CM) system as an XML document, such that queries written in XQuery format can be run against it. Query processor 30 of library server 25 implements the five following mapping rules.
Mapping Rule 1—CM Root Components
The CM root component of an Item is represented by a top level XML element with XML attributes: String ItemID, String VersionID, ID ItemVersionID plus all the user defined attributes within that component. ItemVersionID can be thought of as a concatenation of ItemID and VersionID, thus making it unique within library server 25.
Mapping Rule 2—CM Child Components
Each CM child component of the Item is represented by a nested XML element whose attributes are the same as the user defined attributes of the CM child component.
Mapping Rule 3—CM User-defined Attributes
Each user-defined CM attribute is represented as an XML attribute within the XML element representing the containing CM component.
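Mapping Rules 1 through 3 can be illustrated by the following minimal Python sketch, which builds the nested XML view of one Item with the standard library ElementTree module. The helper names, the sample attribute values and the "." separator used to form ItemVersionID are assumptions of this sketch; the disclosure states only that ItemVersionID can be thought of as a concatenation of ItemID and VersionID.

```python
import xml.etree.ElementTree as ET

def root_element(item_type, item_id, version_id, user_attrs):
    """Rule 1: a root component becomes a top level XML element."""
    attrs = {
        "ItemID": item_id,
        "VersionID": version_id,
        # The separator is an assumption; any unambiguous concatenation works.
        "ItemVersionID": f"{item_id}.{version_id}",
    }
    attrs.update(user_attrs)  # Rule 3: user-defined attributes as XML attributes
    return ET.Element(item_type, attrs)

def add_child(parent, component_name, user_attrs):
    """Rule 2: a child component becomes a nested XML element (plus Rule 3)."""
    return ET.SubElement(parent, component_name, dict(user_attrs))

journal = root_element("Journal", "J1", "1", {"Title": "Data Systems"})
article = add_child(journal, "Journal_Article", {"Title": "XML Views"})
add_child(article, "Journal_Author", {"Last_Name": "Hsiao"})

# Serialized conceptual view of the Item.
xml_text = ET.tostring(journal, encoding="unicode")
```

The nesting of Journal_Author inside Journal_Article inside Journal mirrors the component hierarchy, so no join needs to be expressed against this view.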
Mapping Rule 4—CM Links
Although the inbound and outbound links are not a part of an Item itself in the CM data model (i.e. they are stored separately in the Links table), for the purpose of querying it is very convenient to conceptually think of them as being a part of the XML element representing the Item. This relieves applications from writing joins explicitly in the queries. The links originating at an Item are represented by <Outbound_Link> XML elements with attributes: IDREF LinkItemRef, IDREF TargetItemRef and INTEGER LinkType. The LinkItemRef is a reference to the Item associated with the link. The TargetItemRef is a reference to the Item pointed to by the link. The LinkType is the type of the link. Similarly, links pointing to an Item are represented by <Inbound_Link> XML elements with attributes: IDREF LinkItemRef, IDREF SourceItemRef and INTEGER LinkType. The SourceItemRef is a reference to the Item where the link originates. In the CM data model, links are independent of versions; that is, a link between two items is essentially a link between all versions of both items. Thus if items I1 and I2 are the source and destination respectively of a link, the conceptual XML representation of I1 will have nested within it an <Outbound_Link> element for each version of I2 that exists in the Library Server (LS). Similarly, the representation of I2 will have nested within it an <Inbound_Link> element for each version of I1 that exists in the LS.
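The version-independent expansion of links described in Mapping Rule 4 can be sketched as follows. The in-memory versions dictionary and links list stand in for the Links table and the version information held by the library server; their layout, the sample identifiers and the ItemVersionID format are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: known versions per ItemID and one version-independent link.
versions = {"SIG1": ["1"], "J1": ["1", "2"]}
links = [{"source": "SIG1", "target": "J1", "link_item": "L1.1", "type": 7}]

def item_version_id(item_id, version_id):
    # The concatenation format is an assumption of this sketch.
    return f"{item_id}.{version_id}"

def attach_links(element, item_id):
    """Nest link elements inside the XML element representing item_id."""
    for link in links:
        if link["source"] == item_id:
            # One Outbound_Link per existing version of the target Item.
            for v in versions[link["target"]]:
                ET.SubElement(element, "Outbound_Link", {
                    "LinkItemRef": link["link_item"],
                    "TargetItemRef": item_version_id(link["target"], v),
                    "LinkType": str(link["type"]),
                })
        if link["target"] == item_id:
            # One Inbound_Link per existing version of the source Item.
            for v in versions[link["source"]]:
                ET.SubElement(element, "Inbound_Link", {
                    "LinkItemRef": link["link_item"],
                    "SourceItemRef": item_version_id(link["source"], v),
                    "LinkType": str(link["type"]),
                })
    return element

sig = attach_links(ET.Element("SIG", {"ItemID": "SIG1", "VersionID": "1"}), "SIG1")
journal = attach_links(ET.Element("Journal", {"ItemID": "J1", "VersionID": "2"}), "J1")
```

Because J1 has two versions in this sample data, the SIG element receives two Outbound_Link elements, while the Journal element receives a single Inbound_Link for the only version of SIG1.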
Mapping Rule 5—CM Reference Attributes
A CM reference attribute is represented by an IDREF XML attribute within the XML element representing the component containing the attribute. In the CM data model, references are version specific; that is, a reference attribute points to a specific version of an Item.
The CM data model set forth in the representations of Item Type Journal in
Applying the above mapping rules results in the following representative XML schema:
The following 5 query examples show representative XML syntax queries followed by a corresponding SQL query defined under the CM data model. These comparisons demonstrate the very significant simplification provided to the user once the mapping is applied to the hierarchical CM data model. After mapping, the client user can use a simple first format query language such as XQuery, resulting in a significantly improved user experience as compared to the client user writing SQL queries directly against the CM hierarchical data model.
Find all journal articles by author “Hsiao” which contain figures with the caption “Architecture”.
Clearly, the XML Query syntax is much simpler for the client user than the SQL query. The document (“LS.xml”) is the implicit context node for all queries and is determined by the context of connection to the Library Server (LS) 25. The predicate Journal_Author/@Last_Name=“Hsiao” evaluates to true if “Hsiao” is equal to any element of the set Journal_Author/@Last_Name. Thus it evaluates to true for all journal articles which contain one or more articles written by “Hsiao”. In the SQL query, the Journal_Section table needs to be joined in as well to specify the ancestor-descendant relationship between articles and figures.
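The set semantics of the author predicate and the descendant relationship between articles and figures can be illustrated with the following Python sketch, which evaluates the first example query against a small hand-written XML view. The sample document, its attribute values and the function name are hypothetical, and ElementTree path expressions are used here merely as a stand-in for the XML query syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical conceptual view of the library server contents.
LS_XML = """
<LS>
  <Journal ItemID="J1" VersionID="1" Title="Data Systems">
    <Journal_Article Title="Content Managers">
      <Journal_Author Last_Name="Hsiao"/>
      <Journal_Author Last_Name="Lee"/>
      <Journal_Section>
        <Journal_Figure Caption="Architecture"/>
      </Journal_Section>
    </Journal_Article>
    <Journal_Article Title="Indexing">
      <Journal_Author Last_Name="Hsiao"/>
      <Journal_Section>
        <Journal_Figure Caption="Results"/>
      </Journal_Section>
    </Journal_Article>
    <Journal_Article Title="Storage">
      <Journal_Author Last_Name="Lee"/>
      <Journal_Section>
        <Journal_Figure Caption="Architecture"/>
      </Journal_Section>
    </Journal_Article>
  </Journal>
</LS>
"""

def articles_by_author_with_figure(root, last_name, caption):
    """Titles of articles with a matching author and a matching figure."""
    matches = []
    for article in root.iter("Journal_Article"):
        # True if ANY Journal_Author child has the requested last name.
        has_author = article.find(
            f"Journal_Author[@Last_Name='{last_name}']") is not None
        # Figures sit below sections, so search all descendants.
        has_figure = article.find(
            f".//Journal_Figure[@Caption='{caption}']") is not None
        if has_author and has_figure:
            matches.append(article.get("Title"))
    return matches

root = ET.fromstring(LS_XML.strip())
```

Note that no explicit join through Journal_Section is written in this sketch; the descendant search takes its place.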
Find all publications by ‘Morgan-Kaufman’ in year 2000. Find only those which are latest versions.
The query subexpression “/*” selects all root components. The predicate involves attributes @Year and @Publisher_Name. These are present only in Journal and Book Item Types. So the result can only contain these Item Types. This example shows how both metadata and data search can be expressed in a single XML Query. Since only latest versions are desired in this query, a function latest-version( ) is used which returns the VersionID of the latest version of XML elements representing Items. (Note: The system table ICMSTITEMSnnnsss contains only the latest versions of all Items of all Item Types. The Item Type specific tables (e.g. Book, Journal) contain all versions of Items of those specific Item Types).
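A minimal Python sketch of the second example follows. It selects every root component regardless of Item Type, applies the publisher and year predicate, and keeps only the latest version of each Item. The sample document is hypothetical, and treating VersionID as an integer for comparison is an assumption of this sketch; the helper latest_versions merely imitates the role of the latest-version( ) function and is not its actual implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical view containing several versions and several Item Types.
LS_XML = """<LS>
  <Book ItemID="B1" VersionID="1" Year="2000" Publisher_Name="Morgan-Kaufman"/>
  <Book ItemID="B1" VersionID="2" Year="2000" Publisher_Name="Morgan-Kaufman"/>
  <Journal ItemID="J1" VersionID="1" Year="2000" Publisher_Name="Morgan-Kaufman"/>
  <Journal ItemID="J2" VersionID="1" Year="1999" Publisher_Name="Morgan-Kaufman"/>
  <SIG ItemID="S1" VersionID="1" Title="SIGMOD"/>
</LS>"""

def latest_versions(root):
    """Map each ItemID to its highest VersionID (assumed to be numeric)."""
    latest = {}
    for item in root:
        item_id = item.get("ItemID")
        version = int(item.get("VersionID"))
        latest[item_id] = max(version, latest.get(item_id, 0))
    return latest

def latest_publications(root, publisher, year):
    """Root components of any Item Type matching publisher and year."""
    latest = latest_versions(root)
    return [
        (item.tag, item.get("ItemID"), item.get("VersionID"))
        for item in root  # iterating the children plays the role of "/*"
        if item.get("Publisher_Name") == publisher
        and item.get("Year") == year
        and int(item.get("VersionID")) == latest[item.get("ItemID")]
    ]

root = ET.fromstring(LS_XML)
```

The SIG element lacks the Year and Publisher_Name attributes, so the predicate excludes it without any Item Type being named in the query.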
Find all journals which have links from Special Interest Groups (SIGs) with title ‘SIGMOD’.
The subexpression /SIG[@Title=“SIGMOD”]/Outbound_Link selects the Outbound_Link elements for SIGs with title ‘SIGMOD’. The subexpression @TargetItemRef=> Journal dereferences the @TargetItemRef attributes of these elements to select the corresponding journals.
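The dereference step can be sketched in Python as a lookup from the TargetItemRef value to the element carrying the matching ItemVersionID. The sample document, the identifier values and the function name are hypothetical; the dictionary lookup only imitates what the dereference operator does and is not the query processor's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical view in which links are already nested per Mapping Rule 4.
LS_XML = """<LS>
  <SIG ItemID="S1" VersionID="1" ItemVersionID="S1.1" Title="SIGMOD">
    <Outbound_Link LinkItemRef="L1.1" TargetItemRef="J1.1" LinkType="1"/>
    <Outbound_Link LinkItemRef="L3.1" TargetItemRef="B1.1" LinkType="1"/>
  </SIG>
  <SIG ItemID="S2" VersionID="1" ItemVersionID="S2.1" Title="SIGIR">
    <Outbound_Link LinkItemRef="L2.1" TargetItemRef="J2.1" LinkType="1"/>
  </SIG>
  <Journal ItemID="J1" VersionID="1" ItemVersionID="J1.1" Title="Data Systems"/>
  <Journal ItemID="J2" VersionID="1" ItemVersionID="J2.1" Title="Retrieval"/>
  <Book ItemID="B1" VersionID="1" ItemVersionID="B1.1" Title="Query Processing"/>
</LS>"""

def journals_linked_from_sig(root, sig_title):
    """Titles of journals reached by outbound links of the named SIGs."""
    # Index every root component by its ID attribute for dereferencing.
    by_id = {item.get("ItemVersionID"): item for item in root}
    titles = []
    for link in root.findall(f"SIG[@Title='{sig_title}']/Outbound_Link"):
        target = by_id.get(link.get("TargetItemRef"))
        # Keep the target only if it is a Journal, as in "=> Journal".
        if target is not None and target.tag == "Journal":
            titles.append(target.get("Title"))
    return titles

root = ET.fromstring(LS_XML)
```

The link from the SIGMOD element to the Book is followed but discarded, since the example asks for journals only.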
Find all journal articles which contain the text ‘The authors are thankful to’.
This example shows how text search is done. Basically, the search functions defined by a text search engine, e.g. SQL UDFs in the DB2 Text Extender, can be directly used in the query.
Find all journal articles whose title starts with the pattern ‘XML’. Order the result by the title of the containing Journal.
The example above shows how ordering can be done using XML query syntax.
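For illustration only, the selection and ordering in this example can be imitated in Python as follows. The sample journals, article titles and function name are hypothetical, and the sort is an ordinary in-memory sort rather than the ordering mechanism of the query processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical view with two journals, listed out of title order on purpose.
LS_XML = """<LS>
  <Journal ItemID="J2" VersionID="1" Title="Web Data">
    <Journal_Article Title="XML Indexing"/>
    <Journal_Article Title="Join Processing"/>
  </Journal>
  <Journal ItemID="J1" VersionID="1" Title="Data Systems">
    <Journal_Article Title="XML Views"/>
  </Journal>
</LS>"""

def articles_with_prefix_by_journal(root, prefix):
    """(journal title, article title) pairs ordered by journal title."""
    pairs = [
        (journal.get("Title"), article.get("Title"))
        for journal in root.iter("Journal")
        for article in journal.findall("Journal_Article")
        if article.get("Title", "").startswith(prefix)
    ]
    # Stable sort keyed on the title of the containing Journal only.
    return sorted(pairs, key=lambda pair: pair[0])

root = ET.fromstring(LS_XML)
```

The article is selected by a predicate on its own title, while the ordering key comes from its parent element, which is the relationship the example query expresses.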
The disclosed content management system has many advantages in terms of simplifying the user's task of writing queries. Viewing the CM system metadata as an XML document abstracts out the physical mapping of the CM data model to relational database tables. It enables the use of the existing XML Query Language (XQuery) to easily write queries against the CM metadata. It also eliminates the need to develop a new query notation. The Path Expressions in XQuery allow a high level expression of parent-child and ancestor-descendant relationships between components of an Item. It is also possible to write a single query for both data and metadata using the wild-card notation (‘*’) and the descendant axis notation (‘//’). The dereference operator in XQuery allows easy expression of queries involving link relationships if a link is viewed as both (i) an “Outbound_Link” element nested within the element representing the source of the link and (ii) an “Inbound_Link” element nested within the element representing the target of the link. In fact, viewing links this way eliminates the need to use XQuery FLWR expressions in queries involving links; that is, such a query can be expressed using just the Path Expression subset of XQuery.
The disclosed content management system can be stored on virtually any computer-readable storage media, such as CD, DVD and other magnetic and optical media in either compressed or non-compressed form. Of course, it can also be stored on a server computer system or other information handling system.
Although illustrative embodiments have been shown and described, a wide range of modification, change and substitution is contemplated in the foregoing disclosure and in some instances, some features of an embodiment may be employed without a corresponding use of other features. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the scope of the embodiments disclosed herein.
Number | Date | Country | |
---|---|---|---|
20030200218 A1 | Oct 2003 | US |