This application claims the benefit of U.S. Provisional Application No. 60/248,466, filed Nov. 14, 2000, which is incorporated herein by reference. Contained herein is material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the patent disclosure by any person as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights to the copyright whatsoever.
This invention relates to the field of computer databases, and more specifically, to a database architecture structured in accordance with an information taxonomy.
A database is a collection of information organized in such a way that a computer program can quickly select desired pieces of data. Traditional databases are organized by fields, records, and files. A field is a single piece of information; a record is one complete set of fields; and a file, also known as a table, is a collection of records. A database may comprise a number of tables that are linked by indices and keys, or may be a collection of objects in an object-oriented database.
For example, an employee database may comprise an address book table and a salary table. Within the address book table, each employee record may comprise information such as the employee name, employee number, birth date, address, and hiring date, and within the salary table, each employee record may comprise information such as the employee number, hiring date, hiring level, job title, and salary. The tables and objects for a given database may exist on one or more database instances.
The amount of information that a typical database holds can be astronomical, particularly with Internet-based transactions where the collection and dissemination of information is so vast. In an effort to impart structure to information collected in a database, data (i.e., information in the database) can be organized and partitioned to make databases more manageable. Typically, data is organized and partitioned by item numbers, or numerical identifiers that identify an entry in a database.
For example, in an employee database keyed (i.e., uniquely identified) by employee numbers, data (i.e., employee records) can be organized and partitioned such that employee records 1-100 reside on database instance A; employee records 101-200 reside on database instance B; and employee records 201-300 reside on database instance C. As another example, in a products database keyed by a product number, data can be organized and partitioned such that item numbers 1000-1999 reside on server A; item numbers 2000-2999 reside on server B; and item numbers 3000-3999 reside on server C.
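By way of illustration only, the numeric partitioning described above can be sketched as a range lookup. The sketch below is in Python; the names `UPPER_BOUNDS`, `INSTANCES`, and `instance_for` are hypothetical and are not part of the described system, and the boundaries are those of the employee example.

```python
from bisect import bisect_left

# Inclusive upper bound of each range, and the instance holding that range:
# records 1-100 -> A, 101-200 -> B, 201-300 -> C (from the employee example).
UPPER_BOUNDS = [100, 200, 300]
INSTANCES = ["A", "B", "C"]


def instance_for(employee_number: int) -> str:
    """Return the database instance that holds the given employee record."""
    if employee_number < 1 or employee_number > UPPER_BOUNDS[-1]:
        raise KeyError(f"no instance holds record {employee_number}")
    # bisect_left returns the index of the first bound >= employee_number.
    return INSTANCES[bisect_left(UPPER_BOUNDS, employee_number)]
```

Note that the lookup depends only on the record number, so it carries no information about the size, price, or activity of the record it routes.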
A disadvantage of this system of organization is that it is difficult to manage. A database in which data is partitioned according to a numerical scheme does not lend itself to certain database management tasks, such as strategically splitting data across machines. The task of splitting fixed-size employee records 1-10,000, for example, across 3 machines can be a simple task. However, the complexity of the task may increase when splitting variable-size product records 1-10,000 across 3 machines, since there is no efficient way of partitioning the variable-size records to facilitate database management decisions.
For example, if a database administrator decided that higher-priced products should be stored on the most expensive platform, or that certain machines should be backed up more frequently because they store high-activity products, there would be no feasible way to determine how to partition the records to accommodate these splits.
In one embodiment of the invention, described herein is a memory that facilitates splitting data by taxonomy. The memory may be accessed by an application program, and includes one or more top-level categories, where each top-level category comprises a subset of items; and also includes a category group corresponding to at least one of the top-level categories and the subset of the items belonging to the top-level categories.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
Described herein is a method, system, and apparatus for partitioning a database by taxonomy. As used herein, a taxonomy refers to a classification of items. In embodiments of the invention, an auction database is illustrated, where the auction database comprises items for sale on an auction website. As used herein, a database refers to all instances of a collection of related information. For example, an auction database can refer to a collection of auction items on machine A for only a single database instance of the auction database, or it can refer to a collection of auction items on storage devices A, B, and C for multiple database instances of the auction database.
In embodiments of the invention, an auction database is partitioned such that there are multiple database instances of the auction database, and items are distributed across multiple storage devices, where each storage device comprises one or more groups of auction items related by a category group.
The present invention includes various operations, which will be described below. The operations of the present invention may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor or logic circuits programmed with the instructions to perform the operations. Alternatively, the operations may be performed by a combination of hardware and software.
The present invention may be provided as a computer program product which may include a machine-readable medium having stored thereon instructions which may be used to program a computer (or other electronic devices) to perform a process according to the present invention. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, flash memory, or other types of media/machine-readable media suitable for storing electronic instructions.
Moreover, the present invention may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection). Accordingly, herein, a carrier wave shall be regarded as comprising a machine-readable medium.
Introduction
Embodiments of the present invention may be implemented in an online registration page for a website auction, such as eBay.com of eBay™ Inc. of San Jose, Calif.
The auction facility 100 includes one or more of a number of types of front-end servers, namely page servers 102 that deliver Web pages (e.g., markup language documents), picture servers 104 that dynamically deliver images to be displayed within Web pages, listing servers 106, CGI (Common Gateway Interface) servers 108 that provide an intelligent interface to the back-end of facility 100, and search servers 110 that handle search requests to the facility 100. E-mail servers 112 provide, inter alia, automated e-mail communications to users of the facility 100. The back-end servers include a database engine server 114, a search index server 116, and a credit card database server 118, each of which maintains and facilitates access to a respective database 120, 122, 124.
The internet-based auction facility 100 may be accessed by a client program 128, such as a browser (e.g., Internet Explorer distributed by Microsoft Corp. of Redmond, Wash.) that executes on a client machine 126 and accesses the facility 100 via a network such as, for example, the Internet 130. Other examples of networks that a client may utilize to access the auction facility 100 include a wide area network (WAN), a local area network (LAN), a wireless network (e.g., a cellular network), or the Plain Old Telephone Service (POTS) network.
Central to the database 120 is a user table 200, which contains a record for each user of the auction facility 100. A user may operate as a seller, buyer, or both, within the auction facility 100. The database 120 also includes item tables 202 that may be linked to the user table 200. Specifically, the item tables 202 comprise an items table 204, a description table 206, and a bids table 208. A user record in the user table 200 may be linked to multiple items that are being, or have been, auctioned via the facility 100. A link indicates whether the user is a seller or a bidder (i.e., buyer) with respect to items for which records exist within the item tables 202.
Under existing architecture, database 120 resides on a single storage device, such that item tables and user tables, for example, reside on the single storage device. Under an architecture of the present invention, database 120 may reside on a plurality of storage devices such that item tables and user tables may be split across multiple storage devices. In preferred embodiments of the invention, a storage device comprises a memory in a computer system, hereinafter generically referred to as a machine. Other storage devices, such as CD-ROMs and tape drives, are also within the scope of the invention.
Category Groups and Top-Level Categories
In preferred embodiments of the invention, database 120 comprises information about items for sale on an auction website. Items may comprise dolls, antiques, computers, and cars, for example, and are categorized by a top-level category for buyer convenience. For example, items may be categorized in any of the following top-level categories:
Antiques & Art
Books, Movies & Music
Coins & Stamps
Collectibles
Computers
Dolls, Figures
Jewelry, Gemstones
Photo & Electronics
Pottery & Glass
Sports
Toys, Bean Bag Plush
Everything Else
Great Collections
It should be understood that this list is for illustrative purposes only, and does not represent an exhaustive or even a necessary list.
As illustrated in
Each top-level category 306 may belong, or correspond, to one or more category groups 400, where a category group is identified by a category group identifier such as an alphanumeric label. For purposes of illustration, category groups are designated by a Roman numeral, such as “Category Group I”, “Category Group II”, etc. A top-level category 306a may correspond to one category group 400a, where every item in the top-level category 306a belongs to the same category group 400a, or a top-level category 306b may be partitioned such that items within a single top-level category 306b are divided into two or more category groups 400a, 400b. This can be implemented via a cross-reference table, or via methods and classes, for example.
Each category group 400a, 400b may exist on, or correspond to, one or more database instances. In other words, since a category group 400 comprises related tables of item information (i.e., items, item descriptions, and item bids), items in the related tables may be located on one or more database instances. Splitting tables within a category group may also be implemented via a cross-reference table, or via methods and classes, for example.
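By way of illustration only, the cross-reference table approach mentioned above can be sketched in Python as two lookup tables. The table contents, the instance names (`db-a`, `db-b`, `db-c`), and the function name are hypothetical. How an individual item within a divided top-level category is assigned to one of its category groups is not specified here, so the sketch only resolves which groups and instances may hold a category's items.

```python
# Hypothetical cross-reference table: top-level category -> category group(s).
CATEGORY_TO_GROUPS = {
    "Antiques & Art": ["I"],      # every item belongs to one category group
    "Collectibles": ["I", "II"],  # items divided across two category groups
    "Computers": ["II"],
}

# Hypothetical cross-reference table: category group -> database instance(s).
GROUP_TO_INSTANCES = {
    "I": ["db-a"],
    "II": ["db-b", "db-c"],
}


def instances_for_category(category: str) -> list[str]:
    """Return every database instance that may hold items of the category."""
    instances: list[str] = []
    for group in CATEGORY_TO_GROUPS[category]:
        for instance in GROUP_TO_INSTANCES[group]:
            if instance not in instances:  # keep first-seen order, no repeats
                instances.append(instance)
    return instances
```

Because both mappings are held as data, a category group could be moved to a different machine by editing a table entry rather than by renumbering items.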
Adding Items to a Database Partitioned by Taxonomy
Item records are processed before they are added to database 120. In embodiments of the invention, an auction user adds an item to database 120 by entering item information through an item registration page. The item registration page solicits information about an item to be sold on an auction website, such as a “Title”, “Category”, “Description”, “Picture URL”, and “Item Location”, to name a few.
An item number is created for the newly added item. In one embodiment, an item number comprises an automatically generated number and a category group 400 appended to the automatically generated number. As an item (and its item number) corresponds to a top-level category, and a top-level category belongs to a category group, the appended category group 400 corresponds to the item. This embodiment entails automatically generating a number, where the number can be arbitrary, or sequential.
The top-level category 306 corresponding to the entered item is used to determine a corresponding category group 400. A category group identifier corresponding to the category group 400 for the given item is then appended to the automatically generated number to generate an item number. The automatically generated number can be globally unique, or it can be locally unique, where the automatically generated number is generated within the category group to which the item belongs.
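By way of illustration only, the appended-identifier embodiment can be sketched in Python as follows. The sketch assumes the locally unique variant (one counter per category group) and assumes a hyphen separates the generated number from the category group identifier; the separator, the class and function names, and the example mapping are all hypothetical.

```python
from itertools import count

# Hypothetical mapping from top-level category to category group identifier.
CATEGORY_TO_GROUP = {"Coins & Stamps": "I", "Computers": "II"}


class ItemNumberGenerator:
    """Generates item numbers of the assumed form '<number>-<group id>'."""

    def __init__(self, category_to_group: dict[str, str]) -> None:
        self._category_to_group = dict(category_to_group)
        self._counters: dict[str, count] = {}  # one sequential counter per group

    def next_item_number(self, top_level_category: str) -> str:
        group_id = self._category_to_group[top_level_category]
        if group_id not in self._counters:
            self._counters[group_id] = count(1)
        generated = next(self._counters[group_id])
        # Append the category group identifier to the generated number.
        return f"{generated}-{group_id}"


def category_group_of(item_number: str) -> str:
    """Recover the category group identifier from an item number."""
    return item_number.rsplit("-", 1)[1]
```

With this form, the category group (and hence the candidate database instances) can be determined from the item number alone, without consulting the item's record.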
In another embodiment, an item number is generated for an item in accordance with a numbering scheme unique to the item's category group. For example, a “Category Group I” category group may comprise item numbers 1-10,000 corresponding to low-volume top-level categories, and a “Category Group II” may comprise item numbers 50,000-1,000,000 corresponding to high-volume top-level categories. Thus, if an item is added, and corresponds to “Category Group II” (i.e., the item belongs to a top-level category corresponding to “Category Group II”), then an item number will be generated in the range of 50,000-1,000,000.
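By way of illustration only, the range-based numbering scheme can be sketched in Python using the two ranges from the example above. Sequential allocation within each range is an assumption, and the names `GROUP_RANGES`, `RangeAllocator`, and `group_for_item_number` are hypothetical.

```python
# Inclusive item-number ranges per category group, from the example above.
GROUP_RANGES = {"I": (1, 10_000), "II": (50_000, 1_000_000)}


class RangeAllocator:
    """Hands out item numbers sequentially within each group's range."""

    def __init__(self, ranges: dict[str, tuple[int, int]]) -> None:
        self._ranges = dict(ranges)
        self._next = {group: low for group, (low, _high) in ranges.items()}

    def allocate(self, group: str) -> int:
        _low, high = self._ranges[group]
        number = self._next[group]
        if number > high:
            raise RuntimeError(f"category group {group} has no numbers left")
        self._next[group] = number + 1
        return number


def group_for_item_number(number: int) -> str:
    """Recover the category group from an item number by range lookup."""
    for group, (low, high) in GROUP_RANGES.items():
        if low <= number <= high:
            return group
    raise KeyError(f"item number {number} is outside every group range")
```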
An item information object comprising a record is created in the items table, and a related item information object comprising a corresponding record is created in the item descriptions table. Records in the items table comprise data fields for item information such as “Title” and “Category”, and records in the item descriptions table comprise data fields for information such as “Description”. Records in the items table also comprise an item number. As buyers place bids on an auction item, related item information objects comprising records corresponding to the auction item are created in the bids table.
Searching for Items in a Database Partitioned by Taxonomy
A website auction user may search for items in auction database 120. The auction user enters a search word or phrase to search for items. In one embodiment, a user may request to search all categories. In this embodiment, each category group comprises its own search database, and well-known methods of text search are executed over each search database. Item numbers corresponding to relevant items are returned.
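By way of illustration only, an all-categories search over per-group search databases can be sketched in Python as follows. A case-insensitive substring match stands in for the well-known methods of text search, and the sample data, item number format, and function names are hypothetical.

```python
# Hypothetical per-category-group search databases: item number -> title.
SEARCH_DATABASES = {
    "I": {"101-I": "Victorian oak chair", "102-I": "Silver dollar 1921"},
    "II": {"7-II": "Laptop computer", "8-II": "Oak computer desk"},
}


def search_group(group: str, query: str) -> list[str]:
    """Search one category group's search database for matching titles."""
    needle = query.lower()
    return [
        item_number
        for item_number, title in SEARCH_DATABASES[group].items()
        if needle in title.lower()
    ]


def search_all(query: str) -> list[str]:
    """Run the search over every category group and combine item numbers."""
    results: list[str] = []
    for group in SEARCH_DATABASES:
        results.extend(search_group(group, query))
    return results
```

Because each group is searched independently, the per-group searches in this sketch could equally be issued to separate machines and their results combined afterward.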
In an alternative embodiment, users are limited to search within top-level categories. A user selects a top-level category to conduct a search, and well-known methods of text search are executed over the search database corresponding to the selected top-level category.
In yet another embodiment, multiple streams of items (one from each category group) are input to the search database. The indexing is then serialized when all updates from all groups are completed.
Listing Items in a Database Partitioned by Taxonomy
A website auction user may list items in the auction database 120. In one embodiment, a ListingsProduce method is executed for each category group to generate an items.map file for each category group. A dynamic link library (DLL) is programmed to read the multiple items.map files and then display the items in auction database 120.
In another embodiment, there is a pool having one or more machines for each category group. In this case, the DLL maintains link consistency between the pages.
Caches
Since item information is potentially split across multiple databases, joins with tables comprising other information are not always possible. For instance, each item record tracks a seller of the item, as well as a high bidder for the item. Under the current architecture, when a given item is displayed, an items table comprising seller I.D.s and high bidder I.D.s, a seller table comprising seller I.D.s and seller text, and a bidder table comprising bidder I.D.s and bidder text, which all exist on the same database instance, are joined such that each item corresponds to a seller I.D. as well as seller text, and to a high bidder I.D. as well as high bidder text. This all occurs under a single join operation.
Since a join operation cannot always be utilized under an architecture of the present invention, other methods must be utilized to obtain information. In the example above, seller and high bidder text for a given item can be obtained by matching the seller I.D. in the item record to the seller I.D. in the seller record, and by matching the high bidder I.D. in the item record to the bidder I.D. in the bidder record. However, since two operations are now performed rather than a single operation, machine performance may become an issue. Consequently, caches are used in the present invention to optimize data retrieval.
A user list cache makes seller and bidder text available to requesting processes. In reference to the example above, when an item is displayed, a user list cache is accessed to determine if the corresponding seller I.D. and the corresponding high bidder I.D. for the item exist. If the I.D.s exist in the cache, then the seller and bidder text are retrieved from the cache and displayed. If not, then the seller I.D. is keyed to the seller I.D. in the seller table, and a record is created for the seller I.D. and corresponding text in the cache; and the bidder I.D. is keyed to the bidder table, and a record is created for the bidder I.D. and corresponding text in the cache.
The next time the seller I.D. or the bidder I.D. is encountered, the corresponding text can be retrieved from the cache on the machine from which the application is being executed, rather than from the database instance, which can be on another machine.
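By way of illustration only, the user list cache behavior described above can be sketched in Python as follows. A plain dictionary stands in for the seller and bidder tables on a database instance, and the class, method, and field names are hypothetical.

```python
class UserListCache:
    """Local cache of user text, filled on first lookup of each user I.D."""

    def __init__(self, user_table: dict[int, str]) -> None:
        self._user_table = user_table  # stand-in for the remote seller/bidder tables
        self._cache: dict[int, str] = {}
        self.table_lookups = 0  # counts trips to the (possibly remote) table

    def text_for(self, user_id: int) -> str:
        if user_id in self._cache:
            return self._cache[user_id]  # served from the local machine
        self.table_lookups += 1
        text = self._user_table[user_id]  # key the I.D. to the table
        self._cache[user_id] = text  # create the cache record
        return text


def display_fields(item: dict[str, int], cache: UserListCache) -> dict[str, str]:
    """Resolve seller and high bidder text for an item without a join."""
    return {
        "seller": cache.text_for(item["seller_id"]),
        "high_bidder": cache.text_for(item["high_bidder_id"]),
    }
```

In this sketch, displaying the same item a second time performs no further table lookups, which is the effect the cache is intended to achieve.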
A category group cache makes item information, such as an item description, available to requesting processes. For instance, when an item is displayed, a category group cache is accessed to determine if the item description corresponding to the item exists in the cache. If the item exists in the cache, then the corresponding item description is retrieved from the cache and displayed. If not, then the item is keyed to the item description table, and a record is created for the item and its corresponding item description in the cache. The next time the item is encountered, the corresponding item description can be retrieved from the cache on the machine from which the application is being executed, rather than from the database instance, which can be on another machine.
Auction users may request to track bidding and selling activities. If a user requests items that the user has bid on, a list of items and corresponding item information is retrieved for the user. Similarly, if a user requests items that the user has listed for sale, a list of those items and corresponding item information is retrieved for the user. However, since item information is not necessarily located on the same database instance, item information objects cannot be joined. Consequently, to find all items would require searching all category groups, which can be located on more than one database instance, and would be a time-consuming process.
Instead, item information is obtained from a seller category group cache or a bidder category group cache. When a user bids on or sells an item in a category group, an entry for the user is created in the corresponding user category group cache if an entry for the user does not already exist. The user entry in the user category group cache is then associated with the category group corresponding to the item in question.
These caches facilitate a request, for example, to find all items that a particular seller is selling, or to find all items that a particular buyer is bidding on. Instead of searching through every category group to find items associated with a particular seller or a particular bidder, the caches can be consulted to find only those category groups in which the seller is selling, or in which the buyer is bidding.
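By way of illustration only, a seller category group cache can be sketched in Python as a mapping from user to the set of category groups in which that user has been active. The class and function names, and the representation of each category group as a list of (item number, seller I.D.) pairs, are hypothetical.

```python
from collections import defaultdict


class UserCategoryGroupCache:
    """Tracks, per user, the category groups in which the user is active."""

    def __init__(self) -> None:
        self._groups_by_user: dict[int, set[str]] = defaultdict(set)

    def record_activity(self, user_id: int, category_group: str) -> None:
        # Creates the user entry if absent, then associates the group with it.
        self._groups_by_user[user_id].add(category_group)

    def groups_for(self, user_id: int) -> set[str]:
        return set(self._groups_by_user.get(user_id, ()))


def items_sold_by(
    user_id: int,
    seller_cache: UserCategoryGroupCache,
    items_by_group: dict[str, list[tuple[str, int]]],
) -> list[str]:
    """Find a seller's items by visiting only the groups named in the cache."""
    found: list[str] = []
    for group in sorted(seller_cache.groups_for(user_id)):
        found.extend(
            item_number
            for item_number, seller_id in items_by_group[group]
            if seller_id == user_id
        )
    return found
```

A bidder category group cache could be sketched the same way, with activity recorded when a bid is placed rather than when an item is listed.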
Conclusion
The invention as described above provides several advantages over an architecture in which the database resides on a single storage device. The failure of any single machine comprising one or more category groups will not affect all items. Splitting items across several database instances allows items to be added without having to worry about running a machine to capacity. Splitting data by taxonomy also simplifies database management tasks if a particular business associated with the items provides some predictability about the size and activity of the data being split off. Data stored in accordance with the taxonomy can also be stored and backed up more efficiently.
In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
For example, embodiments of this invention should not be limited to the area of e-commerce, or online auctions, to the extent that the embodiments do not read upon prior art. It should be understood by one skilled in the art that concepts of this invention have general application in the area of database management. Furthermore, any references to specific top-level categories or category groups should not be construed as being limited to those discussed. It should be understood that such references are for illustrative purposes only.
This application is a continuation of U.S. application Ser. No. 09/992,594 filed Nov. 13, 2001, and claims the benefit of U.S. Provisional Application No. 60/248,466 filed Nov. 4, 2000, which applications are incorporated herein by reference in their entirety.
Number | Date | Country |
---|---|---|
20130262531 A1 | Oct 2013 | US |

Number | Date | Country |
---|---|---|
60248466 | Nov 2000 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 09992594 | Nov 2001 | US |
Child | 13908394 | | US |