1. Field of the Invention
The present invention relates generally to an improved data processing system. More specifically, the present invention is directed to a computer implemented method, apparatus, and computer usable program code to retrieve product information data using a product information search engine.
2. Description of the Related Art
Currently, a variety of data are stored in databases. These databases, which include many different categories and types of information, make data available for retrieval by users as needed. While some databases are relatively simple in use and application, the nature and amount of data contained within a database may be quite extensive and complex. As a result, querying complex interrelated data may be substantially more difficult and complicated if truly usable information is going to be retrieved from a database containing such complex data.
A user employs a query language to retrieve desired data from a database. The word query means to interrogate a collection of data, such as, for example, records in a database. Query languages define the syntax that a user must utilize to communicate with a database. The query language determines which data is manipulated or retrieved within the database. Two commonly used query languages are structured query language (SQL) and object-oriented query language.
Within the various specialized fields of database and query language programming, relational database systems and hierarchical database systems are widely used. A relational database may be seen as a collection of tables that contain aggregated data about different entities. A hierarchical database may be seen as an organizational chart that links records in a hierarchy from a top root node down to bottom terminal nodes. Usually, the data within a relational database is indexed, whereas data within a hierarchical database is non-indexed. Also, a relational database typically needs an SQL query search engine to retrieve data, whereas a hierarchical database usually requires a specific query search engine to retrieve data. Consequently, querying a complex database that contains both relational and hierarchical data with a standard SQL or specific search engine may not retrieve complete or accurate information.
A product information management system may employ such a complex database, which includes a distinctive data structure of highly interconnected referential and hierarchical product information. An enterprise may use the product information management system to assemble an accurate, consistent central repository of product information, which may otherwise be scattered throughout the enterprise's other systems. The data within the complex database may include, for example, product name, type, description, price, picture, location, trading partner, shipping information, terms of trade information, and the like. Some product information may be localized, that is, the information about the same item may differ from region to region or from one branch to another branch. Searching such a unique and complex graph of product information through a standard query language search engine may be cumbersome and difficult for the user.
Thus, it would be beneficial to have a computer implemented method, apparatus, and computer usable program code to provide a simple, user-friendly product information search engine to retrieve relational and hierarchical product information data within high quality product information management systems.
The present invention provides a computer implemented method, apparatus, and computer usable program code for searching for data in a database. A query search is received in a query language using objects. The query search includes a number of objects having attributes. A set of hybrid query instructions is generated using the number of objects having attributes for searching relational and hierarchical data in the database. In response to generating the set of hybrid query instructions recognized by the database for searching data, the set of hybrid query instructions is executed to obtain a result from the database.
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
With reference now to the figures and in particular with reference to
With reference now to the figures,
In the depicted example, server 104 and server 106 connect to network 102 along with storage unit 108. In addition, clients 110, 112, and 114 also connect to network 102. Clients 110, 112, and 114 may be, for example, personal computers or network computers. In this depicted example, server 104 provides data, such as boot files, operating system images, and applications to clients 110, 112, and 114. Clients 110, 112, and 114 are clients to server 104 in this example. Network data processing system 100 may include additional servers, clients, and other devices not shown.
In the depicted example, network data processing system 100 is the Internet with network 102 representing a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, governmental, educational and other computer systems that route data and messages. Of course, network data processing system 100 also may be implemented as a number of different types of networks, such as for example, an intranet, a local area network (LAN), or a wide area network (WAN).
With reference now to
In the depicted example, data processing system 200 employs a hub architecture including north bridge and memory controller hub (NB/MCH) 202 and south bridge and input/output (I/O) controller hub (SB/ICH) 204. Processing unit 206, main memory 208, and graphics processor 210 are connected to north bridge and memory controller hub 202. Processing unit 206 contains a set of one or more processors. When more than one processor is present, these processors may be separate processors in separate packages. Alternatively, the processors may be multiple cores in a package. Further, the processors may be multiple multi-core units.
In the depicted example, local area network (LAN) adapter 212 connects to SB/ICH 204. Audio adapter 216, keyboard and mouse adapter 220, modem 222, read only memory (ROM) 224, hard disk drive (HDD) 226, CD-ROM drive 230, universal serial bus (USB) ports and other communication ports 232, and PCI/PCIe devices 234 connect to SB/ICH 204 through bus 238 and bus 240. PCI/PCIe devices may include, for example, Ethernet adapters, add-in cards, and PC cards for notebook computers. PCI uses a card bus controller, while PCIe does not. ROM 224 may be, for example, a flash binary input/output system (BIOS).
HDD 226 and CD-ROM drive 230 connect to SB/ICH 204 through bus 240. HDD 226 and CD-ROM drive 230 may use, for example, an integrated drive electronics (IDE) or serial advanced technology attachment (SATA) interface. Super I/O (SIO) device 236 may be connected to SB/ICH 204.
An operating system runs on processing unit 206 and coordinates and provides control of various components within data processing system 200 in
As a server, data processing system 200 may be, for example, an IBM® eServer™ pSeries® computer system, running the Advanced Interactive Executive (AIX®) operating system or the LINUX® operating system (eServer, pSeries and AIX are trademarks of International Business Machines Corporation in the United States, other countries, or both while LINUX is a trademark of Linus Torvalds in the United States, other countries, or both). Data processing system 200 may be a symmetric multiprocessor (SMP) system including a plurality of processors in processing unit 206. Alternatively, a single processor system may be employed.
Instructions for the operating system, the object-oriented programming system, and applications or programs are located on storage devices, such as HDD 226, and may be loaded into main memory 208 for execution by processing unit 206. The processes for embodiments of the present invention are performed by processing unit 206 using computer usable program code, which may be located in a memory such as, for example, main memory 208, ROM 224, or in one or more peripheral devices 226 and 230.
Those of ordinary skill in the art will appreciate that the hardware in
In some illustrative examples, data processing system 200 may be a personal digital assistant (PDA), which is configured with flash memory to provide non-volatile memory for storing operating system files and/or user-generated data.
A bus system may be comprised of one or more buses, such as bus 238 or bus 240 as shown in
Aspects of the present invention provide a computer implemented method, apparatus, and computer usable program code for retrieving product information data in a data processing system. However, it should be noted that although in this illustrative example product information management system data is retrieved, other types of data may be retrieved or queried. The data processing system uses a search engine within a product information management system to retrieve the product information data. The product information management system may reside, for example, in a server within the data processing system. The product information data is information contained in a database and/or memory of the data processing system with regard to products and/or services provided by an enterprise utilizing the product information management system.
In response to receiving a product query search to retrieve product data, the search engine generates a set of hybrid query instructions using a product information management domain specific query language for searching relational and hierarchical product information data. A domain specific language is created specifically to solve problems in a particular domain and is not intended to solve problems outside of the particular domain the language is created for. For example, in this illustration, the domain is a product information management system and the product information management domain specific query language used to generate the set of hybrid query instructions is only utilized within the product information management system. The search engine generates a hybrid query instruction by creating an abstract syntax tree from a user-created query search or statement. The hybrid query instruction is stored within the abstract syntax tree. The hybrid query instruction is able to search indexed and non-indexed data within a data repository, such as a database and/or memory, using a hybrid of structured query language and object-oriented query language to return a combined result of both the indexed and non-indexed data searches. During hybrid query instruction generation, the search engine populates the abstract syntax tree with different indexed and non-indexed data using the visitor pattern. The data populating the abstract syntax tree are specific object attributes, which include indexed and non-indexed attributes, stored in the nodes of the abstract syntax tree.
These indexed and non-indexed attributes stored in the abstract syntax tree determine what actions the search engine takes with regard to the query search. For example, the search engine performs the structured query language search of the indexed attributes first; then, if non-indexed attributes are discovered after that search, the search engine performs an object-oriented query language search for the non-indexed attributes. However, it should be noted that although an abstract syntax tree is used in this illustrative example, other types of data structures or abstractions may be utilized to maintain the relationships between the objects and the objects' attributes. In addition, the set of hybrid query instructions may be zero, one, or more hybrid query instructions.
The product information management domain specific query language is a hybrid query language that includes both structured query language and object-oriented query language. In addition, the product information management domain specific language defines product information management objects. The objects may be, for example, item, category, catalog, hierarchy, location, spec, and step. These objects all hold data. The data, to be more specific, are object attributes. In other words, the attributes are part of the object. The object attributes may be, for example, an object join or member data. Examples of an object join using dot notation may be, for example, item.category and item.catalog, which are illustrated in
Using aspects of the present invention, a user, such as, for example, an application developer, administrator, or end user, may interact with and query indexed, as well as non-indexed, product information data contained within a product information management system without understanding the complexities of the product information management system's data model. Thus, aspects of the present invention allow an end user to quickly and effectively find product information by performing a query search operation against complicated product data through a relational structured query language based search and/or a hierarchical Java™ serialized based search. This query search operation is made possible by augmenting metadata of relational databases with domain specific language of the product information management system.
Referring now to
Server 302 may include product information management system 310, and server 304 may include database 312. Even though product information management system 310 is depicted within server 302, embodiments of the present invention are not restricted to such. For example, product information management system 310 may reside in another client device, such as, for example, client 112 in
Product information management system 310 is a software application designed to manage an enterprise's product information in a central repository. A user utilizing client 306 may access product information management system 310 to manipulate or retrieve desired product information data. Product information management system 310 may use database 312 to store the enterprise's product information. Database 312 may, for example, represent a plurality of databases and/or memory, such as main memory 208 and ROM 224 in
Database 312 is able to store product information in two primary forms. One form is structured or indexed data and the other is semi-structured or non-indexed data. As discussed previously above, querying both indexed and non-indexed data within database 312 using a standard search engine may not produce the desired product information result. Consequently, embodiments of the present invention implement a unique search engine, such as search engine 314, to perform a search of indexed and non-indexed data within database 312. Search engine 314 resides in product information management system 310. Further, search engine 314 may include, for example, syntax parser 316, semantic checker 318, SQL search processor 320, and Java™ search processor 322.
Product information management system 310 utilizes search engine 314 to execute a product query search sent, for example, from client 306, parse the product query search, generate a set of hybrid query instructions, execute the set of hybrid query instructions, and return a result set to client 306. The product information management domain specific query language adopts the SQL syntax and defines a list of data objects in the product information management domain specific query language so that the product information management domain specific query language includes both an SQL based query language and an object-oriented based query language. These data objects are built in the product information management domain specific language and are used to generate hybrid query instructions. These data objects may be, for example, item, category, catalog, hierarchy, location, spec, and step. These object examples are illustrated in
Subsequent to receiving the product query search string from client 306, search engine 314 parses the query string and performs syntax analysis. Parsing means to break down the query string into its functional units. Search engine 314 utilizes syntax parser 316 to parse the query string. Search engine 314 uses the parsed query string to generate an abstract syntax tree. An abstract syntax tree is a finite, labeled, directed tree, where operators label the internal nodes and the leaf nodes represent the operands of the node operators. An operator performs an operation and an operand references data.
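For illustration only, the tree shape described above may be sketched as follows. The class names, the `evaluate` helper, and the example predicate are assumptions introduced here; they are not part of the described search engine.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class Operand:
    """Leaf node: references data, either an attribute name or a literal."""
    value: Any
    is_attribute: bool = False


@dataclass
class Operator:
    """Internal node: labels an operation applied to its two child nodes."""
    symbol: str
    left: "Node"
    right: "Node"


Node = Union[Operand, Operator]


def evaluate(node: Node, record: dict) -> Any:
    """Evaluate the tree against one record (a plain dict of attributes)."""
    if isinstance(node, Operand):
        return record[node.value] if node.is_attribute else node.value
    left = evaluate(node.left, record)
    right = evaluate(node.right, record)
    if node.symbol == "and":
        return left and right
    if node.symbol == "=":
        return left == right
    if node.symbol == ">":
        return left > right
    raise ValueError(f"unknown operator: {node.symbol}")


# Tree for the hypothetical predicate: price > 10 and name = 'pen'
tree = Operator(
    "and",
    Operator(">", Operand("price", is_attribute=True), Operand(10)),
    Operator("=", Operand("name", is_attribute=True), Operand("pen")),
)
```

In this sketch every internal node is a binary operator, which is sufficient for simple comparison predicates; a fuller tree would also need node types for the select and from clauses.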
In addition to parsing the query string, syntax parser 316 also performs the syntax analysis of the query string. Syntax parser 316 analyzes the query string to make sure that the query string adheres to each syntax rule. If syntax parser 316 determines that the query string violates the syntax rules, then an error message is sent to client 306. For example, a query search string, such as new SearchQuery string, may throw a parsing exception with the following error message: ‘Invalid search query:<SearchQuery string>-<error details>’.
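As a minimal sketch of this syntax check, the following assumes a simplified grammar of the form select, from, and an optional where clause. The exception class name, the regular expression, and the error detail text are hypothetical; only the shape of the error message follows the description above.

```python
import re


class SearchParseError(Exception):
    """Hypothetical stand-in for the parsing exception described above."""


# Assumed grammar: select <list> from <list> [where <predicate>]
_QUERY_SHAPE = re.compile(
    r"^\s*select\s+(?P<select>.+?)\s+from\s+(?P<source>.+?)"
    r"(?:\s+where\s+(?P<where>.+?))?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_query(query: str) -> dict:
    """Split a query string into its clauses, or raise SearchParseError."""
    match = _QUERY_SHAPE.match(query)
    if match is None:
        raise SearchParseError(
            f"Invalid search query: {query} - expected 'select ... from ...'"
        )
    # Drop the optional where clause when it is absent.
    return {k: v for k, v in match.groupdict().items() if v is not None}
```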
After generating the abstract syntax tree and analyzing the query for syntax errors, search engine 314 performs a semantic check over the abstract syntax tree. Search engine 314 employs semantic checker 318 to perform the semantic check. If semantic checker 318 determines that the query violates any semantic rule, then an error message is sent to client 306. For example, the query may throw a search unsupported exception with the following error message: ‘Unsupported attribute and/or predicate in query:<SearchQuery string>-<error details>’.
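One possible form of such a semantic check is sketched below, assuming that the check amounts to verifying each referenced attribute against a known set. The exception class name and the attribute sets are assumptions; the actual semantic rules are not enumerated in this description.

```python
from typing import List, Tuple


class SearchUnsupportedError(Exception):
    """Hypothetical stand-in for the search unsupported exception."""


# Assumed sets of named attributes per object; the real sets are not given.
KNOWN_ATTRIBUTES = {
    "item": {"name", "price", "category"},
    "category": {"name", "parent"},
}


def check_semantics(query: str, attribute_refs: List[Tuple[str, str]]) -> None:
    """Raise if any (object, attribute) reference is not a known attribute."""
    for obj, attr in attribute_refs:
        if attr not in KNOWN_ATTRIBUTES.get(obj, set()):
            raise SearchUnsupportedError(
                "Unsupported attribute and/or predicate in query: "
                f"{query} - unknown attribute '{obj}.{attr}'"
            )
```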
After performing the semantic check over the abstract syntax tree, search engine 314 generates the hybrid query instructions. Hybrid query instruction generation may include several steps, such as, for example, data population of the abstract syntax tree, attribute analysis, and hybrid query instruction generation. An attribute represents a single element of a product information data object. During attribute analysis, the abstract syntax tree is traversed to collect all attribute data. The goal of attribute analysis is to create a map between the product query search path and the target data. The query attribute path is a sequence or list of terminal attribute nodes in the abstract syntax tree. Embodiments of the present invention may use, for example, named attributes and spec-driven attributes as terminal attribute nodes. For example, the query attribute path “item.category [‘spec/xyz’]” includes three terminal attribute nodes: the named attribute node ‘item’, the named attribute node ‘category’, and the spec-driven attribute node ‘[spec/xyz]’. Named attributes may include, for example, item named attributes, category named attributes, spec named attributes, step named attributes, and location named attributes. Spec-driven attributes define attributes using a spec and a path in the spec.
Search engine 314 may use, for example, the visitor pattern to generate the set of hybrid query instructions. The visitor pattern represents an operation to be performed on the elements of an object structure. A visitor defines a new operation without changing the classes of the objects on which the visitor operates.
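The decomposition of a query attribute path into terminal attribute nodes may be sketched as follows. The function name and the regular expression are assumptions, and the sketch uses straight single quotes around the spec path rather than typographic ones.

```python
import re
from typing import List, Tuple

# A segment is either a (dotted) name or a bracketed, quoted spec path.
_SEGMENT = re.compile(
    r"\s*(?:\.?(?P<named>[A-Za-z_]\w*)|\[\s*'(?P<spec>[^']+)'\s*\])"
)


def split_attribute_path(path: str) -> List[Tuple[str, str]]:
    """Return the terminal attribute nodes of a path as (kind, text) pairs."""
    nodes = []
    pos = 0
    while pos < len(path):
        match = _SEGMENT.match(path, pos)
        if match is None:
            raise ValueError(
                f"cannot parse attribute path at offset {pos}: {path!r}"
            )
        if match.group("named") is not None:
            nodes.append(("named", match.group("named")))
        else:
            nodes.append(("spec", match.group("spec")))
        pos = match.end()
    return nodes
```

Applied to the path `item.category ['spec/xyz']`, the sketch yields two named nodes followed by one spec-driven node, matching the three terminal attribute nodes described above.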
Search engine 314 uses visitors to populate data to the abstract syntax tree. In addition, search engine 314 uses visitors to extract query attributes and predicates from the abstract syntax tree. For example, search engine 314 may use the visitor PopulateFromDBVisitor to retrieve data from database 312 and store the retrieved data in the abstract syntax tree. Also, search engine 314 may use, for example, the visitor SQLAttributeVisitor to traverse the abstract syntax tree to collect all attribute data.
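A minimal sketch of a visitor that traverses a tree to collect attribute data is given below. The node classes and the `AttributeCollector` name are assumptions made for illustration and do not reproduce the PopulateFromDBVisitor or SQLAttributeVisitor named above.

```python
from dataclasses import dataclass


@dataclass
class AttributeNode:
    path: str

    def accept(self, visitor):
        return visitor.visit_attribute(self)


@dataclass
class LiteralNode:
    value: object

    def accept(self, visitor):
        return visitor.visit_literal(self)


@dataclass
class PredicateNode:
    operator: str
    left: object
    right: object

    def accept(self, visitor):
        return visitor.visit_predicate(self)


class AttributeCollector:
    """Visitor that gathers every attribute path found in the tree."""

    def __init__(self):
        self.paths = []

    def visit_attribute(self, node):
        self.paths.append(node.path)

    def visit_literal(self, node):
        pass  # literals carry no attribute data

    def visit_predicate(self, node):
        node.left.accept(self)
        node.right.accept(self)


tree = PredicateNode(
    "and",
    PredicateNode(">", AttributeNode("item.price"), LiteralNode(10)),
    PredicateNode("=", AttributeNode("item.category.name"), LiteralNode("pens")),
)
collector = AttributeCollector()
tree.accept(collector)
```

Because the traversal logic lives in the visitor, a second operation, such as one that populates nodes with data, could be added as another visitor class without modifying the node classes.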
Each query attribute path indicates the information for creating the target SQL statement, which includes the table and column type in the relational or indexed product information data model. After analysis of the query, all attributes in the query are identified and mapped to product information data in database 312. These attributes are translated into a SQL based query.
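The translation of mapped attributes into an SQL based query may be sketched as follows. The table names, column names, and the `build_select` helper are hypothetical; the actual relational data model is not specified here, and the sketch handles a single table and a single predicate only.

```python
# Hypothetical mapping from query attribute paths to (table, column).
ATTRIBUTE_MAP = {
    "item.name": ("items", "name"),
    "item.price": ("items", "price"),
}

ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">="}


def build_select(select_paths, where_path, operator):
    """Build a parameterized SQL statement from mapped attribute paths."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    table, where_column = ATTRIBUTE_MAP[where_path]
    columns = []
    for path in select_paths:
        path_table, column = ATTRIBUTE_MAP[path]
        if path_table != table:
            raise ValueError("this sketch handles a single table only")
        columns.append(column)
    # The comparison value is left as a placeholder to be bound at execution.
    return (
        f"SELECT {', '.join(columns)} FROM {table} "
        f"WHERE {where_column} {operator} ?"
    )
```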
After generating the set of hybrid query instructions, search engine 314 executes the product query search using the set of hybrid query instructions. Search engine 314 may execute the set of hybrid query instructions by, for example, performing a two phase search. In the first phase, search engine 314 employs an SQL based search of the relational product information data within database 312 by utilizing, for example, SQL search processor 320. Search engine 314 performs the relational product information data search first because the product information management domain specific query language adopts the SQL syntax. As a result, search engine 314 searches for indexed object attributes contained within the abstract syntax tree first. If, after the first phase of the search, serialized or non-indexed object attributes are discovered within the abstract syntax tree, search engine 314 performs the second phase of the search. In the second phase, search engine 314 performs a Java™ based search of the serialized or non-indexed product information data within database 312 by using, for example, Java™ search processor 322.
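The two phase search may be approximated by the following sketch, which uses an in-memory SQLite table for the indexed data and a JSON text column as a stand-in for the serialized, non-indexed data. The schema, the sample rows, and the function name are assumptions, and the in-process filter merely substitutes for the Java™ based search of the second phase.

```python
import json
import sqlite3


def two_phase_search(conn, min_price, spec_path=None, spec_value=None):
    """Search indexed data first, then filter on non-indexed data if needed."""
    # Phase 1: relational search over the indexed price column.
    rows = conn.execute(
        "SELECT name, payload FROM items WHERE price > ? ORDER BY name",
        (min_price,),
    ).fetchall()
    if spec_path is None:
        return [name for name, _ in rows]
    # Phase 2: deserialize each candidate and test the non-indexed attribute.
    matches = []
    for name, payload in rows:
        attributes = json.loads(payload)
        if attributes.get(spec_path) == spec_value:
            matches.append(name)
    return matches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price REAL, payload TEXT)")
conn.execute("CREATE INDEX idx_items_price ON items (price)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("pen", 12.0, json.dumps({"spec/color": "blue"})),
        ("pencil", 3.0, json.dumps({"spec/color": "blue"})),
        ("marker", 15.0, json.dumps({"spec/color": "red"})),
    ],
)
```

Running the relational phase first narrows the candidate set, so that the costlier deserialization step is applied only to rows that already satisfy the indexed predicate.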
Below, exemplary product query search statements are shown, along with a description of the target product query search data, for illustration purposes only. As a general example, the product query search statement:
It should be noted that embodiments of the present invention are not limited to the exemplary product query search statements above. Search engine 314 may generate any hybrid query instructions capable of retrieving any user desired product information query search data located within database 312. Also, it should be noted that as in SQL, each SQL statement returns a table in which the columns are the attributes or objects in the select clause and each row is a match against the where clause.
Embodiments of the present invention expose specs, items, categories, steps, catalogs, and hierarchies as first class objects. First class objects have an identity independent of any other object. This first class identity allows an object to persist when its attributes change. Also, this first class identity allows other objects to claim relationships with the object. A from clause allows direct access to these first class objects and the first class objects may be returned by the select clause.
These first class objects have a fixed number of named attributes and a flexible number of domain specific (spec) driven attributes. A spec-driven attribute is defined in the domain specific language and mappings dictate whether the spec-driven attribute applies to a given item or category. Spec-driven attributes are referred to by using a fully qualified attribute path: ‘<spec name>/<attribute path within spec>’, where <attribute path within spec> is the path to the attribute starting from the root node of the spec. With hierarchical specs, depending on the level of nesting of the attribute, the path may contain multiple slashes.
Basically, items and categories behave as hashtables whose keys are attribute paths. This hashtable notation is used in the select, where, and order by clause. In addition to spec-driven attributes, an item also exposes a set of named attributes. These named attributes are accessed using dot notation, such as, for example, item.<named attribute name>.
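The two access notations may be illustrated with the following sketch, in which named attributes are ordinary object attributes and spec-driven attributes are looked up by attribute path. The class definitions, spec names, and values are hypothetical.

```python
class Category:
    """Category object with one named attribute and spec-driven attributes."""

    def __init__(self, name, spec_values=None):
        self.name = name  # named attribute, reached with dot notation
        self._spec_values = dict(spec_values or {})

    def __getitem__(self, attribute_path):
        # Hashtable notation: the key is '<spec name>/<path within spec>'.
        return self._spec_values[attribute_path]


class Item:
    """Item object whose category attribute is itself a Category object."""

    def __init__(self, name, category, spec_values=None):
        self.name = name
        self.category = category
        self._spec_values = dict(spec_values or {})

    def __getitem__(self, attribute_path):
        return self._spec_values[attribute_path]


pens = Category("pens", {"catspec/shelf": "A3"})
item = Item("pen", pens, {"itemspec/dimensions/length": 14})
```

In this sketch, `item.name` uses dot notation, `item['itemspec/dimensions/length']` uses hashtable notation with a nested spec path, and `item.category['catspec/shelf']` combines both.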
Named attributes also may be used in the select, where, and order by clauses. For example, the product query search statement:
Category named attributes behave the same way as item named attributes. For example, the product query search statement:
Using embodiments of the present invention, dot and hashtable notations may freely be mixed together. For example, since item.category is itself a category object, the following attributes are valid:
Turning now to
Query search window 400 may include, for example, search options 402 and search template 404. A user may utilize search options 402 to, for example, specify the scope of the search, how to sort the search results, and where to display the search results. The scope of the search may be defined by “search within” 406. A user may employ “search within” 406 to run the search within a given selection, such as in this particular example, the entire data hierarchy. “Search within” 406 may present selections, for example, in a drop-down menu. Another given selection for “search within” 406 drop-down menu may be, for example, any saved selection, such as a search within a pre-saved selection of hierarchy nodes.
A user may use “sort by” 408 to determine how the search engine, such as search engine 314 in
A user may utilize show “results in” 412 to indicate where the search engine displays the search results. Once again, selections for show “results in” 412 may be presented in a drop-down menu. Given selections for show “results in” 412 may be, for example, single edit, multiple edit, or based on number of matches. By selecting single edit or multiple edit, the search engine displays the search results on the single edit or multiple edit tabbed page, such as “single edit” tabbed page 414 or “multiple edit” tabbed page 416, respectively, regardless of whether there are one or more entries matching the query search criteria. On the other hand, by selecting the based on number of matches option, the search engine shows the search result on the single edit tabbed page if there is only one matched entry or on the multiple edit tabbed page if there are two or more matched entries. In this particular example, the user selected the multiple edit option in the drop-down menu.
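The selection rule for the results page may be sketched as follows. The function name and option strings are assumptions, and the handling of zero matches is not addressed in the description above; the sketch arbitrarily routes that case to the multiple edit page.

```python
def choose_results_page(option: str, match_count: int) -> str:
    """Return the tabbed page on which the search results are displayed."""
    if option in ("single edit", "multiple edit"):
        # An explicit choice applies regardless of the number of matches.
        return option
    if option == "based on number of matches":
        # Assumption: zero matches is treated like the multiple edit case.
        return "single edit" if match_count == 1 else "multiple edit"
    raise ValueError(f"unknown option: {option}")
```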
A user may employ “search template” 404 to pre-define a product information query search template with a collection of specs and/or attributes. The user uses “select search template” 418 to choose a previously saved template from a list of saved templates presented in, for example, a drop-down menu. Subsequent to selecting a previously saved product information query search template from the drop-down menu, the user mouse clicks on “load” button 420 to load the previously saved product information query search template data.
After selecting a previously saved query search template, the user may desire to edit the selected query search template. Consequently, the user may mouse click on “edit” button 422. Subsequent to mouse clicking “edit” button 422, a new search template window, such as new search template window 500 in
If the user desires to create a new product information query search template, the user may mouse click on “new” button 426. After mouse clicking “new” button 426, the new search template window appears in the client device for the user to create a new product information query search template. Subsequent to inputting all the desired product information query search criteria within query search window 400, the user may mouse click on “search” button 428 to start the product information search query process. Alternatively, the user may mouse click on “clear” button 430 to clear all inputs within query search window 400.
With reference now to
New product information search template window 500 may include, for example, template “name” textbox 502, template “description” textbox 504, and template “attribute picker” 506. A user utilizes “attribute picker” 506 to list specs and/or attributes within “specs in collection” textbox 508 and “attributes in collection for specs” textbox 510, respectively.
After inputting all necessary criteria to create the new product information query search template or to edit a previously saved product information query search template, the user may mouse click on “save” button 512 to save all inputs within new product information search template window 500. Alternatively, the user may mouse click “cancel” button 514 to cancel all inputs within new product information search template window 500.
Referring now to
Object model in product information management domain specific query language 600 shows a type name with an uppercase first letter and an object name with a lowercase first letter. For example, ‘Item’ is a type name and ‘item’ is an object name. Each box within object model in product information management domain specific query language 600 represents an object. Each object box contains an object name and a type name. The object box lists all available non-object attributes.
Spec-driven attributes 602 for item relationship and item link are of the object type Item and behave as item:Item 604. One object box may link to another object box where the linked object box is the object attribute of the linking object box. For example, object box 604 may link to object box 606 where linked object box 606 is the object attribute of the linking object box 604. Location, parent, and child object boxes 608 are of the object type Category and behave as category:Category 610. Spec-driven attributes 612 for type names Item and Category are of non-object types, whereas item link and item relationship spec-driven attributes 602 are object types. Item link, item relationship, and item are types of Item.
Turning now to
The process begins when the server device receives a request for a product information search query from a client device, such as client 306 in
The product information management system receives the product query search string contained within the query search window (step 706). After receiving the product query search string in step 706, the product information management system employs a search engine, such as search engine 314 in
In addition to parsing the product query search string in step 708, the syntax parser makes a determination as to whether the product query search string contains syntax errors (step 710). If the product query search string does contain syntax errors, yes output of step 710, then the product information management system sends an error report to the client device (step 712) and the process terminates thereafter. If the product query search string does not contain syntax errors, no output of step 710, then the search engine generates a set of hybrid query instructions from the parsed product query search string (step 714).
Hybrid query instruction generation includes, for example, data population of the abstract syntax tree, attribute analysis, and generation of the hybrid query instruction. After analysis of the product query search, the search engine generates the hybrid query instruction by identifying all object attributes in the abstract syntax tree and mapping the identified object attributes to the product information data contained in a database, such as, for example, database 312 in
If the hybrid query instruction set contains semantic errors, yes output of step 716, then the product information management system sends an error report to the client device (step 718) and the process terminates thereafter. If the hybrid query instruction set does not contain semantic errors, no output of step 716, then the search engine executes a relational indexed product information data search of data residing in a database, such as, database 312 in
After executing the indexed product information data search in step 720, the search engine makes a determination whether a hierarchical non-indexed product information data search is required (step 722). If the search engine discovered a reference to non-indexed product information data during the indexed product information data search in step 720, then a hierarchical non-indexed product information data search is required. If the search engine did not discover a reference to non-indexed product information data during the indexed product information data search in step 720, then a hierarchical non-indexed product information data search is not required.
If a hierarchical non-indexed product information data search is required, yes output of step 722, then the search engine executes a hierarchical non-indexed product information data search using the generated hybrid query instruction set (step 724). The search engine may utilize, for example, a Java™ search processor, such as Java™ search processor 322 in
With reference now to
The process begins when a user using the client device sends a request for a product information query search to a server device, such as server 302 in
After inputting the desired information for the product query search, the user utilizing the client device sends the product information query search window back to the product information management system for processing (step 806). The product information management system uses a search engine, such as search engine 314 in
Referring now to
Thus, embodiments of the present invention provide a computer implemented method, apparatus, and computer usable program code for retrieving product information query search data. The invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes, but is not limited to, firmware, resident software, microcode, etc.
Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any tangible apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk. Current examples of optical disks include compact disk read-only memory (CD-ROM), compact disk read/write (CD-R/W), and DVD.
A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.