Claims
- 1. A system for generating an inverted index from a data repository, the system comprising:
a data retriever for retrieving selected numeric attributes from the data repository; a tokenizer for generating a plurality of tokens from each of the numeric attributes based on a binary value of each numeric attribute; and an indexer for generating an inverted index using each of the plurality of tokens as a key.
- 2. The system as set forth in claim 1 wherein, for a numeric attribute having a binary value represented by n bits, the tokenizer generates n tokens from the binary value.
- 3. The system as set forth in claim 2 wherein each of the n tokens includes a different number of bits from the binary value.
- 4. The system as set forth in claim 2 wherein an i-th token of the n tokens has a length of i bits.
- 5. The system as set forth in claim 1 wherein the system is embodied as computer executable instructions.
- 6. The system as set forth in claim 1 wherein the inverted index is a b-tree structure.
- 7. The system as set forth in claim 1 wherein the inverted index is a hash index.
- 8. The system as set forth in claim 1 wherein the inverted index is an array structure.
- 9. The system as set forth in claim 1 further including logic for generating a binary value for each numeric attribute that represents the numeric attribute.
- 10. A process of generating an inverted index from numeric values contained in a data repository, the process comprising the steps of:
determining a binary value for each of the numeric values; tokenizing the binary value, for each numeric value, into a plurality of bit tokens where each of the plurality of bit tokens includes a different number of bits from the binary value; and generating an inverted index using the plurality of bit tokens from each numeric value as an index key.
- 11. The process as set forth in claim 10 wherein the binary value includes n bits and the tokenizing step creates n tokens from the binary value.
- 12. The process as set forth in claim 10 wherein the generating step includes generating an index entry having a form of (attribute, token, list) where list is a list of entities that contain the attribute and the token.
- 13. The process as set forth in claim 10 wherein the inverted index is generated as a b-tree.
- 14. The process as set forth in claim 10 wherein the inverted index is generated as a hash index.
- 15. The process as set forth in claim 10 wherein the inverted index is generated as an array structure.
- 16. The process as set forth in claim 10 wherein the determining step includes setting a predetermined length for the binary value.
- 17. A method of data retrieval from a data repository in response to a query having a numeric operand, the method comprising the steps of:
providing an inverted index generated from the data repository, the inverted index having an index key formed from an attribute in the data repository having numeric values, the index key being based on a binary value of the numeric values; determining a binary value for the numeric operand; tokenizing the binary value of the numeric operand into a plurality of tokens each having a different number of bits; and retrieving data from the inverted index by searching the inverted index based on a correspondence between the plurality of tokens and the binary value of the indices.
- 18. The method as set forth in claim 17 wherein if the query includes a greater-than operator, the retrieving step includes:
selecting tokens from the plurality of tokens that end in a zero bit; converting the zero bit of the selected token to a one bit; and matching the converted tokens to the binary values of the indices in the inverted index.
- 19. The method as set forth in claim 17 wherein if the query includes a less-than operator, the retrieving step includes:
selecting tokens from the plurality of tokens that end in a one bit; converting the one bit of the selected token to a zero bit; and matching the converted tokens to the binary values of the indices in the inverted index.
- 20. The method as set forth in claim 17 wherein the query includes multiple numeric operands, and the method further includes:
repeating the determining, tokenizing and retrieving steps for each numeric operand; and merging data retrieved from each numeric operand in accordance with the query to obtain a resultant data list.
- 21. The method as set forth in claim 17 wherein the retrieving step retrieves a list of entity identifiers that are associated with an index that matches one of the plurality of tokens.
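By way of illustration only, the tokenizing, indexing, and retrieving steps recited in the claims above may be sketched in Python as follows. The sketch is not part of the claims and makes several assumptions that the claims do not state: numeric values are non-negative integers, the binary value has a fixed length of 8 bits (the name `WIDTH` is hypothetical), and the i-th token is taken to be the i most significant bits of the binary value. The function names `to_bits`, `tokenize`, `build_index`, and `search`, and the dictionary-of-sets index layout, are likewise hypothetical choices made for this example.

```python
from collections import defaultdict

WIDTH = 8  # assumed predetermined bit length of every binary value


def to_bits(value, width=WIDTH):
    """Return the fixed-width binary string of a non-negative integer."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


def tokenize(value, width=WIDTH):
    """Produce n tokens from an n-bit value; the i-th token has i bits.

    Assumption: each token is a most-significant-first prefix.
    """
    bits = to_bits(value, width)
    return [bits[:i] for i in range(1, width + 1)]


def build_index(records, attribute, width=WIDTH):
    """Build an inverted index keyed by (attribute, token).

    Each key maps to the set of entity identifiers whose attribute
    value yields that token.
    """
    index = defaultdict(set)
    for entity_id, attrs in records.items():
        for token in tokenize(attrs[attribute], width):
            index[(attribute, token)].add(entity_id)
    return index


def search(index, attribute, op, operand, width=WIDTH):
    """Retrieve entity identifiers satisfying `attribute op operand`."""
    tokens = tokenize(operand, width)
    if op == "==":
        keys = [tokens[-1]]  # the full-length token identifies the value
    elif op == ">":
        # Tokens ending in a zero bit, with that bit converted to one.
        keys = [t[:-1] + "1" for t in tokens if t.endswith("0")]
    elif op == "<":
        # Tokens ending in a one bit, with that bit converted to zero.
        keys = [t[:-1] + "0" for t in tokens if t.endswith("1")]
    else:
        raise ValueError(f"unsupported operator: {op}")
    result = set()
    for key in keys:
        result |= index.get((attribute, key), set())
    return result


# Example repository (made-up data for illustration).
records = {
    "a": {"price": 5},
    "b": {"price": 12},
    "c": {"price": 200},
    "d": {"price": 12},
}
index = build_index(records, "price")

greater = search(index, "price", ">", 12)
less = search(index, "price", "<", 12)
equal = search(index, "price", "==", 12)

# Multiple operands: merge per-operand results, here by intersection
# for a conjunctive query 4 < price < 13.
between = search(index, "price", ">", 4) & search(index, "price", "<", 13)
```

Under the prefix assumption, a stored value exceeds the operand exactly when it agrees with the operand on some leading bits and then has a one bit where the operand has a zero bit, which is why converting the final zero bit of a token yields a key shared by larger values only. A range comparison therefore needs at most n exact-match lookups for an n-bit operand.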
REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims the benefit of U.S. Provisional Patent Application, Ser. No. not yet assigned, which was filed on May 9, 2002.