This application is related to U.S. patent application Ser. No. 14/957,049 entitled “Sorting Apparatus,” naming Sheldon K. Meredith et al. as inventors, filed the same day as this application, which application is hereby incorporated herein by reference.
Field of the Disclosure
This application relates to sorting and in particular to sorting using multiple processing modules.
Description of the Related Art
The widely accepted value for the minimum number of comparison operations to sort a large list of N items is N log2(N). Sorting a very large list of one billion items, for example, still requires roughly 30B comparisons. Each of these comparisons can also require many clock cycles of the computing system, so 30B comparisons might actually take 300B clocked operations. In Big Data analytics, weather prediction, nuclear calculations, astrophysics, genetics, public health, and many other disciplines, there is a frequent need to sort very large datasets. That implies computational resources that can literally fill buildings with racks of servers to service these sorting needs. To the extent one can improve on this N log2(N) limitation, or otherwise improve sorting operations, one can improve on the capital infrastructure and associated operational costs for computing systems.
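The figures quoted above can be checked with a short calculation. The following is a minimal sketch; the function names and the factor of ten clock cycles per comparison are assumptions chosen only to match the 30B and 300B figures in the text.

```python
import math


def comparison_estimate(n: int) -> float:
    """Return N * log2(N), the comparison count discussed in the text."""
    return n * math.log2(n)


def clock_estimate(n: int, cycles_per_comparison: int = 10) -> float:
    """Scale the comparison count by an assumed cycles-per-comparison factor."""
    return comparison_estimate(n) * cycles_per_comparison


if __name__ == "__main__":
    n = 10**9  # one billion items
    print(f"comparisons: {comparison_estimate(n):.3e}")
    print(f"clocked operations (assumed 10 per comparison): {clock_estimate(n):.3e}")
```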
Accordingly, in an embodiment, a sorting apparatus to sort a list of elements includes ingest logic to receive the list of elements to be sorted. A communication channel is coupled to receive elements of the list from the ingest logic. A processing module stores each element of the list on the communication channel that has a value within a range of values assigned to the processing module, and notifies the ingest logic that the element has been received from the communication channel.
In another embodiment, a method includes receiving a list of elements to be sorted and supplying an element of the list to a communication channel. A plurality of processing modules compare the element on the communication channel to respective value ranges associated with the processing modules. One of the processing modules, for which a value of the element is within an associated value range, stores the element.
In another embodiment, a method of sorting a list of elements includes receiving a list of elements to be sorted at ingest logic. The ingest logic supplies the elements of the list to a communication channel in a random order. For each element supplied to the communication channel, one of a plurality of processing modules determines the element to be within a range of element values associated with the one of the plurality of processing modules. The one of the processing modules stores the element in sorted order in memory associated with the one of the plurality of processing modules and notifies the ingest logic that the element has been read from the communication channel.
The present disclosure may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings.
The use of the same reference symbols in different drawings indicates similar or identical items.
Referring to
Referring to
A set of processing and storage modules 105 read elements from the communication bus 201. Each processing and storage module 105 has an associated range of alphanumeric values and when an element on the bus has a value within the range, the module accepts the element from bus 201 for sorting. To the extent the range of values in the list to be sorted is known in advance, the ingest logic may inform an initialization function 115 (see
Since each module 105 has its own range, whenever a module sees an alphanumeric list element available on the communications bus, the module 105 determines in compare logic 204 whether the list element value is within the range of the module. That determination may be accomplished by subtracting the list element value from both the minimum and maximum values of the module's range. If the signs of the two subtraction results are the same, then the alphanumeric value is outside of the range. If one of the subtraction results is zero, then the list value is within the range of list values assigned to be processed by the module. If the signs of the subtraction results are different, the list value is within the value range assigned to the module.
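The sign comparison described above can be sketched as follows. This is a minimal illustration that assumes list element values are represented as integers; the function name `in_range` and its parameters are hypothetical and do not appear in the disclosure.

```python
def in_range(value: int, range_min: int, range_max: int) -> bool:
    """Decide range membership from the signs of two subtractions.

    Assumes integer element values (an illustrative simplification of
    the alphanumeric values in the text).
    """
    low_diff = range_min - value   # subtract the element from the range minimum
    high_diff = range_max - value  # subtract the element from the range maximum

    # A zero result means the element sits exactly on a range boundary.
    if low_diff == 0 or high_diff == 0:
        return True

    # Different signs: the element lies strictly between the boundaries.
    # Same signs: the element lies below the minimum or above the maximum.
    return (low_diff < 0) != (high_diff < 0)


if __name__ == "__main__":
    for candidate in (199, 200, 250, 299, 300):
        print(candidate, in_range(candidate, 200, 299))
```

Only the signs of the two results and a zero test are needed, so no magnitude comparison of the differences is required.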
If the value of the element is within the range of values processed by the module, the module stores the element in a FIFO 205 and then strobes the communication bus line 203 to inform the ingestion logic 103 that the element has been accepted. There will always be one processing module to accept any possible element value placed on the communications bus 201. Although the processing modules pull elements from their own FIFOs for further processing, delays in processing can cause a FIFO to fill up, in which case the module with a “match” will not toggle the strobe line until it has an open position in its FIFO. That will cause the entire process to wait for this condition to clear. One of skill in the art will understand that the likelihood of this stall occurring goes down with more modules assigned, with faster removal of values from the FIFOs, with deeper FIFOs, and with incoming data being more randomly distributed across the range of list values (not having similar values very close to each other). In an embodiment, the ingestion logic sends list values to the communications bus randomly (where “randomly” is intended to include pseudorandom implementations of a desired random process) rather than sequentially in order of receipt. Doing so helps ensure good distribution of alphanumeric values, which may lower the likelihood of a sequential series of list elements landing on the same processing module, depending of course on the distribution of the values within the list.
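The accept, strobe, and stall behavior described above can be illustrated with a small software simulation. This is a sketch under stated assumptions, not the disclosed hardware: the `Module` class, the `dispatch` function, the integer element values, and the fixed `drain_every` processing rate are all hypothetical. Sorting within a module is omitted here; drained elements are only collected.

```python
import random
from collections import deque


class Module:
    """Hypothetical stand-in for a processing and storage module."""

    def __init__(self, low: int, high: int, fifo_depth: int):
        self.low, self.high = low, high
        self.fifo_depth = fifo_depth
        self.fifo = deque()
        self.processed = []  # elements pulled from the FIFO (not sorted here)

    def matches(self, value: int) -> bool:
        return self.low <= value <= self.high

    def try_accept(self, value: int) -> bool:
        """Return True (the strobe) if stored, False if the FIFO is full."""
        if len(self.fifo) >= self.fifo_depth:
            return False
        self.fifo.append(value)
        return True

    def drain_one(self) -> None:
        if self.fifo:
            self.processed.append(self.fifo.popleft())


def dispatch(values, modules, drain_every: int = 2, seed: int = 0) -> int:
    """Place values on the bus in pseudorandom order; return the stall count."""
    order = list(values)
    random.Random(seed).shuffle(order)  # pseudorandom order, as in the text
    stalls = 0
    cycle = 0
    for value in order:
        owner = next(m for m in modules if m.matches(value))
        while True:
            cycle += 1
            if cycle % drain_every == 0:  # assumed fixed processing rate
                for m in modules:
                    m.drain_one()
            if owner.try_accept(value):   # strobe: ingest may send the next value
                break
            stalls += 1                   # FIFO full: the bus waits
    return stalls


if __name__ == "__main__":
    mods = [Module(0, 99, fifo_depth=2), Module(100, 199, fifo_depth=2)]
    print("stalls:", dispatch(range(0, 200, 7), mods))
```

Varying `fifo_depth`, `drain_every`, or the number of modules in this sketch gives a rough feel for the stall factors listed in the text.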
Within each module, a value coming out of the FIFO 205 must be sorted and placed within a set of alphanumeric values retained within that module in storage 207. The sort and place value logic 209 may utilize the difference between the value of the element being inserted and the high and low range values to interpolate an approximate place within these rank-ordered values in storage 207 in which to insert the new value in sorted order. A local insertion sort can then be used for accurate placement of the element in sorted order. Because the values already in storage 207 should never be out of order, proper placement is certain. In other embodiments, any insertion sort algorithm, including a linear sort, may be utilized to insert the value in the storage 207 in sorted order. Duplicates may be handled by a field associated with each entry that tracks the number of times the element value has been processed by the module or by having sufficient storage 207 to handle duplicate entries. When the insertion sort process is complete, the FIFO has one more position available to accept another value off of the communications bus.
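The interpolate-then-adjust placement described above might look like the following sketch. It assumes integer element values and a Python list standing in for storage 207; the function name `interpolated_insert` is hypothetical, and duplicates are simply stored as separate entries (the second of the two duplicate-handling options mentioned).

```python
def interpolated_insert(stored: list, value: int, range_min: int, range_max: int) -> int:
    """Insert value into the sorted list `stored` and return its index.

    The starting index is interpolated from where the value falls within
    the module's range; a short local walk then finds the exact position.
    """
    if not stored:
        stored.append(value)
        return 0

    span = range_max - range_min
    fraction = (value - range_min) / span if span > 0 else 0.0
    pos = min(max(int(fraction * len(stored)), 0), len(stored))

    # Local adjustment: step left while the left neighbor is larger ...
    while pos > 0 and stored[pos - 1] > value:
        pos -= 1
    # ... or step right while the element at the guess is smaller.
    while pos < len(stored) and stored[pos] < value:
        pos += 1

    stored.insert(pos, value)
    return pos


if __name__ == "__main__":
    storage = []
    for v in (250, 210, 299, 200, 275, 250, 231):
        interpolated_insert(storage, v, 200, 299)
    print(storage)
```

When values are spread evenly across the module's range, the interpolated guess tends to land near the correct position, so the local walk is short; a skewed distribution lengthens the walk but does not affect correctness.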
One of skill in the art will appreciate that if the number of elements retained in a processing module is large, there is a risk of the element placement process in the sorted list in storage 207 becoming time-consuming, which can, in turn, cause the FIFO to accumulate elements and ultimately slow the overall list ordering process, e.g., if the FIFO repeatedly fills and the module cannot toggle the strobe line 203. Accordingly, in an embodiment, the length of the FIFO is dynamically adjusted as needed to match the needs of the module. That requires that storage be available to expand the FIFO if needed.
It may be advantageous to ensure one of the modules does not get overloaded with respect to any of the other processing modules to help ensure efficiency in sorting. In an embodiment, an equalizer function 211, which may be implemented as software executing on a processor, monitors how many elements are accepted by each module and when any one module accepts incoming list elements highly disproportionately, the equalizer function divides that processing module into two or more new processing modules and divides the parent module's range into suitable ranges for the new modules. Highly disproportionate means that a module is accepting a particular percentage more elements than the average module is accepting. The particular percentage can be determined by the number of processing modules that might be assignable to the overall sorting apparatus. The closer the particular percentage is to 0, the closer the apparatus will be to performing the sort at maximum efficiency. Thus, the particular percentage may be 5%, 10%, 20%, or some other percentage value depending on such factors as available resources, size of the sort to be performed, and/or sensitivity to delay.
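The “highly disproportionate” test described above can be expressed as a simple threshold on per-module accept counts. The sketch below is illustrative only; the function name `overloaded_modules` and the dictionary of counts keyed by module number are assumptions, not part of the disclosure.

```python
def overloaded_modules(accept_counts: dict, threshold_pct: float) -> list:
    """Return the ids of modules accepting more than threshold_pct above average.

    accept_counts maps a (hypothetical) module id to the number of
    elements that module has accepted so far.
    """
    if not accept_counts:
        return []
    average = sum(accept_counts.values()) / len(accept_counts)
    limit = average * (1 + threshold_pct / 100)
    return [module_id for module_id, count in accept_counts.items() if count > limit]


if __name__ == "__main__":
    # Values concentrated in the middle of the range load module 3 most heavily.
    counts = {1: 10, 2: 12, 3: 60, 4: 11, 5: 7}
    print(overloaded_modules(counts, threshold_pct=20))
```

A lower threshold triggers splits sooner and keeps the load more even, at the cost of allocating additional modules more often.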
As an example of how the equalizer function works in an embodiment, assume a list of values where there are some random values scattered throughout the possible range, but there is a concentration in the middle of the possible range. Also assume that the sort starts with five modules (1-5) as shown in
Referring to
Once the child modules are created, the parent module's rank-ordered values are moved into the child modules sequentially in 309. For example, the sorted elements with values between 200-249 go to module 221 and sorted elements with values between 250-299 go to module 223. Additionally, the FIFO contents of the parent module are moved to the FIFOs of the child modules. That requires that FIFO entries with values of 200-249 go to module 221 and FIFO entries with values between 250-299 go to module 223. Once that is complete, the parent module is removed from the set of modules assigned to the communications bus. The parent module no longer exists for processing purposes. However, its alphanumeric range, its rank-ordered values and its FIFO have all been reassigned to child modules. When the parent module is eliminated, the strobe lines of the child modules are enabled in 311, allowing them to now accept list elements from the communications bus and the flow returns to monitoring loading of all the modules. With the parent module being divided into two child modules, there are now six processing and storage modules instead of five. Note that the ability to add modules assumes an environment, such as a data center with sufficient processing resources where additional modules (processing cores and associated storage) can be assigned to a particular sorting task as needed. In that environment, when a module such as module 3 is taken offline in the example above, that module is made available to other processing or sorting tasks being performed in the data center.
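The hand-off from a parent module to two child modules can be sketched in software as follows. This is a minimal illustration that assumes integer element values; the `Module` dataclass, the `split_module` function, and the `strobe_enabled` flag are hypothetical names introduced for the example.

```python
from bisect import bisect_left
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical software model of a processing and storage module."""
    low: int
    high: int
    stored: list = field(default_factory=list)   # rank-ordered values
    fifo: deque = field(default_factory=deque)   # values awaiting placement
    strobe_enabled: bool = False


def split_module(parent: Module, split_at: int):
    """Divide parent into children covering [low, split_at - 1] and [split_at, high]."""
    if not parent.low < split_at <= parent.high:
        raise ValueError("split point must fall inside the parent's range")

    left = Module(parent.low, split_at - 1)
    right = Module(split_at, parent.high)

    # The parent's values are already sorted, so one cut point divides them.
    cut = bisect_left(parent.stored, split_at)
    left.stored = parent.stored[:cut]
    right.stored = parent.stored[cut:]

    # FIFO entries are routed by value and keep their arrival order.
    for value in parent.fifo:
        (left if value < split_at else right).fifo.append(value)

    # Only after the transfer completes do the children start accepting.
    left.strobe_enabled = right.strobe_enabled = True
    return left, right


if __name__ == "__main__":
    parent = Module(200, 299, stored=[201, 249, 250, 298], fifo=deque([260, 205]))
    child_a, child_b = split_module(parent, 250)
    print(child_a)
    print(child_b)
```

Because the parent's stored values are already rank-ordered, each child receives a contiguous, already-sorted slice and no re-sorting is needed.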
In other embodiments, a single module 221 may be added and module 3 kept to process a portion of the range it previously processed. For example, the range of module 3 may be changed to 200-249 and module 221 receives the range 250-299. In that case, the FIFO elements appropriate to the new range (200-249) may be kept and the remaining FIFO entries transferred to the newly allocated module 221. In addition, the sorted entries corresponding to the new range are kept and the other sorted entries are transferred to the newly instantiated module 221.
Monitoring and equalization continue as needed to ensure the number of values in each processing module is sufficiently equal, so as to improve the sorting efficiency of the apparatus. Referring back to
The sender 107 may also be implemented as programmed processing logic together with sufficient memory to store and supply the sorted list to the requesting machine 101. Since the alphanumeric values retained in each processing module are already sorted, the values are simply read out sequentially over bus 227 until the module indicates it is finished. Then the SENDER logic 107 repeats this process for all remaining modules. The SENDER logic 107 may send the rank-ordered outputs sequentially back to the requesting system 101 or create a rank-ordered file and send the file. In the first case, an end-of-transmission notification would be sent to indicate that no more values remain to be sent.
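The read-out performed by the sender can be sketched as a generator that walks the modules and then emits an end-of-transmission marker. The sketch assumes the modules are visited in ascending order of their ranges, which the text implies but does not state outright; `send_sorted`, `END_OF_TRANSMISSION`, and the `(range_min, sorted_values)` tuples are hypothetical names for illustration.

```python
END_OF_TRANSMISSION = object()  # hypothetical sentinel for the final notification


def send_sorted(modules):
    """Yield every stored value in rank order, then the end marker.

    `modules` is an iterable of (range_min, sorted_values) pairs. Each
    module's values are already sorted, so they are read out as-is.
    """
    for _range_min, sorted_values in sorted(modules, key=lambda m: m[0]):
        yield from sorted_values  # read until this module is finished
    yield END_OF_TRANSMISSION


if __name__ == "__main__":
    modules = [(200, [210, 250]), (0, [5, 9]), (100, [])]
    for item in send_sorted(modules):
        print("end of transmission" if item is END_OF_TRANSMISSION else item)
```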
In an embodiment, multiple values are placed onto the communications bus 201 in parallel by, e.g., having a bus width sufficient to accommodate multiple values. Each module may then be able to inspect each of these values in parallel. With a large number of processing modules and a well distributed set of values, that would improve the processing speed of the apparatus. In the limit, it is possible that a single module would need to accept all of the values presented in parallel and put them all into its FIFO, but this is still topologically consistent with the described processing methodology provided above.
The processing modules and other functionality described herein, such as the ingest logic, the initializer function, the sender function, and the equalizer function, may be implemented by one or more processors that execute software instructions stored in memory to perform various functions associated with sorting as described herein. As employed herein, the term “processor” can refer to substantially any computing processing unit or device comprising, but not limited to, single-core processors or multi-core processors.
The processor system 400 also includes one or more external interfaces 415 to couple to, e.g., other processors in the system. Thus, interface 415 may provide an interface to bus 201. In addition, the processor system 400 may include interface 417 to interface to bus 227. In embodiments, the bus 201 (or 227) may be implemented as multiple point-to-point high-speed interconnects such as HyperTransport (HT) links. In such a case, the ingest logic 103 may have multiple HT links. The ingest logic may broadcast list elements to all the processing modules 105 interconnected over the HT links. In addition, the counter values from counter 215 and the strobe line 203 may be implemented as separate physical signal lines or as messages sent over any one or more of the communication links being utilized in the particular embodiment. Other communication channels and technologies are within contemplation of the embodiments described herein and may be utilized as needed in various embodiments. The systems and methods described herein can be implemented utilizing various commercially available operating systems or combinations of operating systems.
Note that the terms “first,” “second,” “third,” and the like, as used in the claims, do not typically indicate or imply a particular temporal order. For example, “a first action,” “a second action,” and “a third action” indicate three actions and not a particular order of the actions.
Thus, by employing the sort approach described above, improved sort efficiencies can be achieved. The description of the sort approach set forth herein is illustrative, and is not intended to limit the scope of the following claims. Variations and modifications of the embodiments disclosed herein may be made based on the description set forth herein, without departing from the scope of the following claims.
Entry |
---|
U.S. Appl. No. 14/869,374, filed Sep. 29, 2015, entitled “Sorting System,” naming Sheldon K. Meredith and William C. Cottrill as inventors. |
U.S. Appl. No. 14/924,005, filed Oct. 27, 2015, entitled “Analog Sorter,” naming Sheldon K. Meredith and William C. Cottrill as inventors. |
U.S. Appl. No. 14/957,049, filed Dec. 2, 2015, entitled “Sorting Apparatus,” naming Sheldon K Meredith, William C. Cottrill, and Jeremy Fix as inventors. |
Choi, Sung-Soon and Moon, Byung-Ro, “Isomorphism, Normalization, and a Genetic Algorithm for Sorting Network Optimization,” Proceedings of the Genetic and Evolutionary Computation, Gecco, 2002, pp. 327-334. |
Dewdney A.K., “Computer Recreations—On the Spaghetti Computer and Other Analog Gadgets for Problem Solving,” Scientific American, pp. 19-26, Jun. 1984. |
Rovetta, S. and Zunino, R., “Minimal-connectivity circuit for analogue sorting,” IEE Proc.-Circuits Devices Syst., vol. 146, No. 3, Jun. 1999, pp. 108-110. |
Xing, Huanlai and Qu, Rong, “A Nondominated Sorting Genetic Algorithm for Bi-Objective Network Coding Based multicast Routing Problems,” Information Sciences, 233 (2013), 23 pages. |
Non-final Office action dated Dec. 7, 2017 in U.S. Appl. No. 14/957,049, filed Dec. 2, 2015, 23 pages. |
Final Office action dated Jun. 28, 2018 in U.S. Appl. No. 14/957,049, filed Dec. 2, 2015, 25 pages. |
Non-final Office action dated Apr. 4, 2018 in U.S. Appl. No. 14/869,374, filed Sep. 29, 2015, 36 pages. |
Number | Date | Country | |
---|---|---|---|
20170161020 A1 | Jun 2017 | US |