Claims
- 1. A method of sorting data sets including a predetermined number of distinct keys, comprising the steps of: bundling the data sets where substantially identical keys having substantially identical key values are bundled together; and ordering the bundles in a predetermined order with respect to the order defined by the substantially identical key values for each bundle, and wherein said method is performed using an external memory.
- 2. A method according to claim 1, wherein said method is performed without using additional working space in the external memory.
- 3. A method according to claim 1, wherein said bundling step further comprises bundling the data sets to a location within a specified range belonging to an associated bundle.
- 4. A system sorting data sets including a predetermined number of distinct keys, comprising the steps of: bundling means for bundling the data sets where substantially identical keys having substantially identical key values are bundled together; and ordering means for ordering the bundles in a predetermined order with respect to the order defined by the substantially identical key values for each bundle, wherein said method is performed using an external memory.
- 5. A method for sorting large data sets that reside on external memory, given that the available memory is of size M and that the transfer block size is B, said method comprising the steps of: defining a function that maps input keys into about M/B bundles (groups); sorting the data set according to the bundles, resulting with about M/B sub-sequences, including the steps of: counting the number of input keys that belong to each bundle; computing the range of disk blocks in which each bundle will reside upon termination of the sorting step; loading the first block from each of the said ranges into main memory; scanning the loaded blocks, while swapping keys so that each block is filled only with keys belonging to its bundle; writing every block that is filled with the appropriate keys back to its location within its range on disk and loading the next block in its range; and re-iterating the above steps for each sub-sequence, until each bundle consists of one key only.
- 6. A method for sorting large data sets that reside on external memory, given that the available memory is of size M, that the transfer block size is B, and that the number of distinct keys is at most about M/B, said method comprising the steps of: counting the number of input keys that belong to each bundle; computing the range of disk blocks in which each bundle will reside upon termination of the sorting step; loading the first block from each of the said ranges into main memory; scanning the loaded blocks, while swapping keys so that each block is filled only with keys belonging to its bundle; writing every block that is filled with the appropriate keys back to its location within its range on disk and loading the next block in its range.
- 7. A method for sorting large data sets that reside on external memory, given that the available memory is of size M and that the transfer block size is B, said method comprising the steps of: defining a function that maps input keys into about M/B bundles (groups); sorting the data set according to the bundles, resulting with about M/B sub-sequences, including the steps of: estimating the number of input keys that belong to each bundle; computing the range of disk blocks in which each bundle will reside upon termination of the sorting step; loading the first block from each of the said ranges into main memory; scanning the loaded blocks, while swapping keys so that each block is filled only with keys belonging to its bundle; writing every block that is filled with the appropriate keys back to its location within its range on disk and loading the next block in its range; and re-iterating the above steps for each sub-sequence, until each bundle consists of one key only.
- 8. A method for sorting large data sets that reside on external memory, given that the available memory is of size M, that the transfer block size is B, and that the number of distinct keys is at most about M/B, said method comprising the steps of: estimating the number of input keys that belong to each bundle; computing the range of disk blocks in which each bundle will reside upon termination of the sorting step; loading the first block from each of the said ranges into main memory; scanning the loaded blocks, while swapping keys so that each block is filled only with keys belonging to its bundle; writing every block that is filled with the appropriate keys back to its location within its range on disk and loading the next block in its range.
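The counting, range-computing, and swapping steps recited in claims 5 and 6 can be illustrated with a minimal in-memory sketch. It is not the claimed implementation: the "disk" is assumed to be an ordinary Python list, block transfers of size B are not modelled (each list slot stands in for a position within a bundle's range), and the names `bundle_in_place`, `bundle_of`, and `num_bundles` are hypothetical. The point is only to show how keys can be swapped into their bundle's range without additional working space.

```python
from typing import Callable, List


def bundle_in_place(
    data: List[int],
    bundle_of: Callable[[int], int],
    num_bundles: int,
) -> List[int]:
    """Permute `data` in place so that keys of the same bundle are contiguous.

    `bundle_of` (hypothetical) maps a key to a bundle index in
    range(num_bundles). Returns the start offset of every bundle, followed
    by the total length.
    """
    # Count the number of input keys that belong to each bundle.
    counts = [0] * num_bundles
    for key in data:
        counts[bundle_of(key)] += 1

    # Compute the range [starts[b], starts[b + 1]) that bundle b will occupy.
    starts = [0] * (num_bundles + 1)
    for b in range(num_bundles):
        starts[b + 1] = starts[b] + counts[b]

    # Scan each range, swapping misplaced keys into their own bundle's range.
    next_free = starts[:num_bundles]  # next unfilled slot of each bundle
    for b in range(num_bundles):
        end = starts[b + 1]
        while next_free[b] < end:
            i = next_free[b]
            target = bundle_of(data[i])
            if target == b:
                # The key already sits in its own range; move on.
                next_free[b] += 1
            else:
                # Send the key to the first unfilled slot of its bundle and
                # pull whatever was there back for inspection.
                j = next_free[target]
                data[i], data[j] = data[j], data[i]
                next_free[target] += 1
    return starts


if __name__ == "__main__":
    keys = [3, 1, 2, 1, 3, 2, 2, 1]
    offsets = bundle_in_place(keys, lambda k: k - 1, 3)
    print(keys, offsets)
```

When the number of distinct keys does not exceed the number of bundles and `bundle_of` preserves key order, as in claim 6, a single pass leaves the data fully sorted. For the recursive variant of claim 5, the same routine would be re-applied inside each returned range with a finer mapping function.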
RELATED APPLICATIONS
This application claims priority to, and incorporates by reference, U.S. Provisional Application No. 60/112,190, filed on Dec. 15, 1998.
US Referenced Citations (1)
Number | Name | Date | Kind
5855016 | Edem et al. | Dec 1998 | A
Non-Patent Literature Citations (1)
Entry
Lars Arge et al., On Sorting Strings in External Memory, Proceedings of the twenty eighth annual ACM symposium on theory of computing, May 1997 (pp. 540-548).
Provisional Applications (1)
Number | Date | Country
60/112190 | Dec 1998 | US