A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention pertains to data processing. In particular, the invention pertains to performing processing-intensive functions at high speed. With greater particularity, the invention pertains to methods and apparatus for dividing processing tasks in an efficient manner for rapid processing. With still greater particularity, the invention pertains to methods and apparatus for implementing high-speed data stream splitting, computation, and data reformulation on an array of processors.
Processing devices can be utilized for a wide range of applications, including the processing of large amounts of data. In conventional systems, a stream of serial data is processed one data sample at a time by a single processing device. For example, a first data sample is processed, then a second, then a third, and so on until all samples have been processed by the same processing device. The use of multiple processing devices speeds up the processing of data only so long as there is a common bus between the processing devices that controls the input and output of the stream to and from the processing devices.
A problem arises when such arrays are used for rapid processing of the real time information common in audio, video, and signal processing applications. The incoming data stream must be rapidly processed in order to be useful, which requires dividing the processing tasks and transmitting them to multiple processors. This division process becomes a bottleneck, limiting overall speed to that of the division process. Accordingly, there is a need for a method and apparatus for rapidly splitting, processing, and reformulating a high speed data stream.
The proposed invention uses an array of processors for high speed data stream splitting, processing, and reformulation. An array of processing devices can be used to perform the tasks of separating a data stream, processing the data, and reformulating the processed data, dividing each of these larger tasks into smaller subtasks spread across the array. The smaller tasks are performed simultaneously, thus improving the performance of the larger task. In addition, a single smaller task can itself be divided so that many processing devices perform the same task, further improving the overall speed of the larger task.
One scenario for doing this is to input a data stream into a group of processors connected in serial. As the data stream passes individual processors, substreams are split off at the processors. Each substream is then processed separately in a second group of processors. This second group of processors may have multiple steps and multiple processors for each substream. Finally, a third group of processors reassembles the substreams into a processed data stream. This third group of processors may be connected in serial to form a virtual mirror image of the first group of processors.
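The three-group scenario above can be modeled in software. The following is a minimal illustrative sketch only, not the patented hardware: the function names (`split_stream`, `process_substreams`, `reformulate`) are hypothetical, and each list stands in for one group of processors.

```python
# Hypothetical model of the three processor groups: split a serial stream
# into n substreams, process each independently, and interleave the results
# back into one stream in the original order.

def split_stream(stream, n):
    """First group: substream i receives samples i, i+n, i+2n, ..."""
    return [stream[i::n] for i in range(n)]

def process_substreams(substreams, fn):
    """Second group: each substream is processed independently, so the
    n loops could run on n separate processing devices."""
    return [[fn(x) for x in sub] for sub in substreams]

def reformulate(substreams):
    """Third group: interleave the substreams, mirroring the split."""
    out = []
    for samples in zip(*substreams):
        out.extend(samples)
    return out

stream = list(range(10))
subs = split_stream(stream, 2)       # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
processed = process_substreams(subs, lambda x: x * x)
print(reformulate(processed))        # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because the substreams are independent between the split and the merge, the per-substream work parallelizes without any shared bus contention, which is the speedup the text describes.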
The invention provides an efficient fast method of processing a data stream by means of a processor array.
a is a printout of example machine language and compiler directives to instruct a processing device in
b is a second printout of example machine language and compiler directives to instruct a second processing device in
c is a third printout of example machine language and compiler directives to instruct a third processing device in
Also shown in
In an alternate embodiment, processing device 205(da) sends the nth of the ‘n’ samples to processing device 205(ca), processing device 205(db) sends the (n-1)th of the ‘n’ samples to processing device 205(cb), and so on and so forth until processing device 205(dn) sends the first of the ‘n’ samples to processing device 205(cn).
In a second alternate embodiment, the ‘n’ data values present in each of the processing devices 205(da)-205(dn) are sent to processing devices 205(ca)-205(cn) in such a way that each of the processing devices 205(ca)-205(cn) receives only one of the ‘n’ data values and no single data value is left out, which also implies that no two of the processing devices 205(ca)-205(cn) receive a duplicate data value. The difference between this embodiment and the previous two embodiments is that the row of processing devices 205(ca)-205(cn) does not receive data values in an ascending or descending order with respect to the data stream order.
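The three embodiments differ only in the permutation mapping the ‘n’ buffered values onto the ‘n’ receiving devices. The sketch below is illustrative; the `distribute` function and its arguments are hypothetical names, and the bijection check corresponds to the requirement that no value is left out and none is duplicated.

```python
# Illustrative sketch: each embodiment is a permutation. Receiver k gets
# values[order[k]]; requiring `order` to be a permutation guarantees every
# value is delivered exactly once (no omissions, no duplicates).

def distribute(values, order):
    n = len(values)
    assert sorted(order) == list(range(n)), "order must be a permutation"
    return [values[order[k]] for k in range(n)]

vals = ['s1', 's2', 's3', 's4']
ascending  = distribute(vals, [0, 1, 2, 3])   # first embodiment
descending = distribute(vals, [3, 2, 1, 0])   # alternate embodiment
scrambled  = distribute(vals, [2, 0, 3, 1])   # second alternate embodiment
print(ascending)   # ['s1', 's2', 's3', 's4']
print(descending)  # ['s4', 's3', 's2', 's1']
print(scrambled)   # ['s3', 's1', 's4', 's2']
```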
A third grouping of processing devices 225 performs the function of signal processing. A column of processing devices within grouping 225 is used to process each data sample in parallel. Each of the processing devices 205(ca)-205(cn) receives a single data value from processing devices 205(da)-205(dn). Each row of processing devices, as part of grouping 225, must perform an identical function. Hence, the number of processing devices in each column is arbitrary.
A fourth grouping of processing devices 230 performs the function of reformulating the processed data. The processed data value in processing device 205(ba) is sent to processing device 205(aa), and the processed data value in processing device 205(bb) is sent to processing device 205(ab), and so on and so forth until the processed data value in processing device 205(bn) is sent to processing device 205(an).
Recall that in one embodiment, processing device 205(aa) contains the first of ‘n’ processed data, processing device 205(ab) contains the second of ‘n’ processed data, and so on and so forth so that processing device 205(an) contains the nth of ‘n’ processed data. Hence, to reformulate the data stream in the same order it was received into the processing device involves passing the data values in each of the processing devices 205(aa)-205(an) in the direction of processing device 205(aa).
Recall that in an alternate embodiment, processing device 205(aa) contains the nth of ‘n’ processed data, processing device 205(ab) contains the (n-1)th of ‘n’ processed data, and so on and so forth so that processing device 205(an) contains the first of ‘n’ processed data. Hence, to reformulate the data stream in the same order it was received into the processing device involves passing the data values in each of the processing devices 205(aa)-205(an) in the direction of processing device 205(an).
Recall that in a second alternate embodiment, prior to the processing of the data in grouping 225 and in grouping 220, the data is separated such that processing devices 205(ca)-205(cn) receive only one unique data value of the ‘n’ data values and that the row of processing devices 205(ca)-205(cn) do not receive data values based on an ascending or descending order with respect to the data stream order. Hence, to reformulate the data stream in the same order in which it was received involves more than just a movement of the data in the direction of a processing device.
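For the second alternate embodiment, reassembly must undo the arbitrary scattering applied at the split, which amounts to applying the inverse permutation rather than a simple shift toward one end of the row. The following is a sketch under that assumption; `invert` and the variable names are hypothetical.

```python
# Sketch of reformulation for the second alternate embodiment: the split
# scattered values by an arbitrary permutation, so reassembly applies the
# inverse permutation instead of shifting toward one end of the row.

def invert(order):
    """Build the inverse permutation: if device k holds original value
    order[k], then inv[i] names the device holding original value i."""
    inv = [0] * len(order)
    for k, src in enumerate(order):
        inv[src] = k
    return inv

order = [2, 0, 3, 1]                 # how values were scattered at the split
received = ['v3', 'v1', 'v4', 'v2']  # device k holds original value order[k]
inv = invert(order)
restored = [received[inv[i]] for i in range(len(order))]
print(restored)                      # ['v1', 'v2', 'v3', 'v4']
```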
The second grouping of processing devices 315 includes processing devices 305(ca), 305(cb), 305(cc), 305(cd), and 305(ce). Each processing device, as part of the grouping 315, receives every five data sample substream. Processing device 305(ca) sends the fifth of every five data sample substream to processing device 305(ba). Processing device 305(cb) sends the fourth of every five data sample substream to processing device 305(bb). Processing device 305(cc) sends the third of every five data sample substream to processing device 305(bc). Processing device 305(cd) sends the second of every five data sample substream to processing device 305(bd). Processing device 305(ce) sends the first of every five data sample substream to processing device 305(be). A third grouping of processing devices 320 includes processing devices 305(ba), 305(bb), 305(bc), 305(bd), and 305(be). Each processing device, as part of this grouping, performs the same function.
The result of the processed data sample in processing device 305(ba) is sent to processing device 305(aa). The result of the processed data sample in processing device 305(bb) is sent to processing device 305(ab). The result of the processed data sample in processing device 305(bc) is sent to processing device 305(ac). The result of the processed data sample in processing device 305(bd) is sent to processing device 305(ad). The result of the processed data sample in processing device 305(be) is sent to processing device 305(ae).
A fourth grouping of processing devices 325 includes processing devices 305(aa), 305(ab), 305(ac), 305(ad), and 305(ae). The function of grouping 325 is to reformulate the processed data from grouping 320 in the order in which every five data sample substream entered the array of processing devices via path 305. The processed data leaves the array of processing devices via a path 330. Processing device 305(ae) sends to path 330 the first processed data of every five data sample substream. Processing device 305(ad) sends to path 330, via processing device 305(ae), the second processed data of every five data sample substream. Processing device 305(ac) sends to path 330, via processing devices 305(ad) and 305(ae), the third processed data of every five data sample substream. Processing device 305(ab) sends to path 330, via processing devices 305(ac), 305(ad), and 305(ae), the fourth processed data of every five data sample substream. Processing device 305(aa) sends to path 330, via processing devices 305(ab), 305(ac), 305(ad), and 305(ae), the fifth processed data of every five data sample substream.
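The five-wide example above can be checked numerically: the substream order is reversed once when grouping 315 hands samples to grouping 320 (fifth sample first), and reversed again as grouping 325 drains toward path 330, so the original order is restored. The sketch below is illustrative; the `process` function is a hypothetical stand-in for the identical per-device function.

```python
# Minimal numeric check of the five-wide example: two reversals cancel,
# so the processed substream leaves path 330 in its original order.

def process(x):
    """Placeholder for the identical function performed by grouping 320."""
    return x + 100

block = [1, 2, 3, 4, 5]            # one "every five data sample" substream
to_row_b = list(reversed(block))   # 305(ca)..305(ce) deliver 5th..1st
row_a = [process(x) for x in to_row_b]
out = list(reversed(row_a))        # 305(ae) drains first, then 305(ad), ...
print(out)                         # [101, 102, 103, 104, 105]
```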
In an alternate embodiment, path 305 is the movement of data in a stream from another processing device not a part of the high speed data stream split, processing, and reformulation. In this alternate embodiment, path 330 is the movement of processed data to another processing device not a part of the high speed data stream split, processing, and reformulation.
a is the native machine language and compiler directives written to instruct a processing device on the SEAforth® S40 array of processing devices, a preferred embodiment for executing the function of grouping 215 of
Once processing device 205(ea) receives power, the first instruction word, positioned at the address indicated by the program counter at position $00000 of the RAM, will be fetched and positioned into the instruction decode logic of processing device 205(ea). Each of the four instructions, as part of the instruction word, will be executed in the following manner. The @a (pronounced fetch a) instruction will perform a read from the port which the A-register is addressing. Hence, the execution of the @a instruction will read a data word of the incoming stream of data and place the data word into the T-register of the data stack of processing device 205(ea). The !b (pronounced store b) instruction will perform a write to the port which the B-register is addressing. Hence, the execution of the !b instruction will write the just-received data value in the T-register to the port which the B-register is addressing. The first unext (pronounced micro next) instruction checks the contents of the R-register of the return stack for zero. If the R-register is zero, then the contents of the R-register are dropped. Due to the fact that the return stack is circular, dropping the contents of the R-register effectively moves the contents of each register below the R-register up one register. The bottom register of the return stack will contain the value of the register just below the R-register prior to the execution of the unext instruction. If the R-register is non-zero, the unext instruction will decrement the R-register by one (decimal base) and return to the beginning of the present instruction word for instruction execution. Hence, the execution of the first unext instruction will result in the execution of the @a and !b instructions a total of 2¹⁸−1 times before the second written unext instruction in line 7 of
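The unext micro-loop described above can be modeled in software. The following is a hypothetical illustration only, not the SEAforth hardware: `copy_words` and the port arguments are invented names, and the loop mirrors the described semantics of unext (fall through on R = 0, otherwise decrement R and re-execute the instruction word).

```python
# Hypothetical model of the "@a !b unext" micro-loop: one data word is
# moved from the input port to the output port on each pass, and the word
# re-executes until the R-register reaches zero.

def copy_words(read_port, write_port, r_register):
    passes = 0
    while True:
        write_port.append(read_port.pop(0))   # @a then !b: move one word
        passes += 1
        if r_register == 0:                   # unext: R zero -> fall through
            break
        r_register -= 1                       # unext: decrement and loop
    return passes

inp = list(range(8))
out = []
print(copy_words(inp, out, 7))  # 8 passes: R counts 7 down to 0
print(out)                      # [0, 1, 2, 3, 4, 5, 6, 7]
```

With the R-register preloaded to 2¹⁸−1 as in the text, the same word would move 2¹⁸ data words before falling through.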
b is the native machine language and compiler directives written to instruct a processing device on the SEAforth® S40 array of processing devices, a preferred embodiment for executing function 220 of
Once processing device 205(da) receives power, the first instruction word, positioned at the address indicated by the program counter at position $00000 of the RAM, will be fetched and positioned into the instruction decode logic of processing device 205(da). The @a instruction will read a word from processing device 205(ea) and place the data word into the T-register of the data stack of processing device 205(da). The unext instruction will check the R-register for zero (decimal base). Due to the fact that the R-register is zero, the !b instruction is executed, which sends the data word in the T-register to processing device 205(ca). The value in the R-register is dropped, and the R-register now contains the value of ten (decimal base). The second written unext instruction checks the R-register for zero, and because the value of the R-register is ten (decimal base), the R-register is decremented and execution returns to the beginning of the present instruction word. A total of nine data words are fetched from processing device 205(ea) by the @a instruction in conjunction with the first written unext instruction until the R-register contains zero, in which case the !b instruction will send the tenth data word received into processing device 205(da) to processing device 205(ca). Upon the execution of the second written unext instruction, each register of the return stack contains a value of ten (decimal base), and thus execution returns to the beginning of the present instruction word, where ten more data words are fetched from processing device 205(ea) and only the tenth data word is sent to processing device 205(ca). This sequence of fetching ten data words from processing device 205(ea) and only sending the tenth data word to processing device 205(ca) is repeated indefinitely. There is no memory overload in processing device 205(da) because the fetched data words from processing device 205(ea) are stored in the T-register of the data stack of processing device 205(da).
The data stack is circular, so the data words which are not sent to processing device 205(ca) are eventually overwritten. Also, because the first instruction word loaded into the instruction decode logic is the only instruction word ever loaded into the instruction decode logic, there is no delay in pre-fetching the next instruction words: the pre-fetch circuitry is never enabled, and the only delay is in returning to the beginning of the instruction word.
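The decimation behavior of processing device 205(da) described above can be summarized as forwarding one word in ten. The sketch below is illustrative; `keep_every_tenth` is a hypothetical name, and the 1-in-10 ratio follows the text.

```python
# Illustrative model of the decimation loop in 205(da): every word is
# fetched from the neighbor, but only every tenth word is forwarded on.

def keep_every_tenth(stream):
    forwarded = []
    for i, word in enumerate(stream, start=1):
        # @a fetches every word into T; earlier words are simply
        # overwritten on the circular data stack
        if i % 10 == 0:       # !b fires only on the tenth fetch
            forwarded.append(word)
    return forwarded

print(keep_every_tenth(list(range(1, 31))))  # [10, 20, 30]
```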
c is the native machine language and compiler directives written to instruct a processing device on the SEAforth® S40 array of processing devices, a preferred embodiment for executing function 230 of
Once processing device 205(aa) receives power, the first instruction word, positioned at the address indicated by the program counter at position $00000 of the RAM, will be fetched and positioned into the instruction decode logic of processing device 205(aa). Each of the four instructions, as part of the instruction word, will be executed in the following manner. The @a instruction will perform a read from the port which the A-register is addressing. Hence, the execution of the @a instruction will read a processed data word from processing device 205(ba) and place the processed data word into the T-register of the data stack of processing device 205(aa). The !b instruction will perform a write to the port which the B-register is addressing. Hence, the execution of the !b instruction will write the just-received processed data value in the T-register to the port which the B-register is addressing. The first unext instruction checks the contents of the R-register of the return stack for zero. If the R-register is zero, then the contents of the R-register are dropped. Due to the fact that the return stack is circular, dropping the contents of the R-register effectively moves the contents of each register below the R-register up one register. The bottom register of the return stack will contain the value of the register just below the R-register prior to the execution of the unext instruction. If the R-register is non-zero, the unext instruction will decrement the R-register by one (decimal base) and return to the beginning of the present instruction word for instruction execution. Hence, the execution of the first unext instruction will result in the execution of the @a and !b instructions a total of 2¹⁸−1 times before the second written unext instruction in line 7 of
The inventive computer logic arrays, processors 205, busses 110 and 210, groupings 220, 225 and 235, and signal processing methods are intended to be widely used in a great variety of communication applications, including hearing aid systems. It is expected that they will be particularly useful in wireless applications where significant computing power and speed are required.
As discussed previously herein, the applicability of the present invention is such that the inputting of information and instructions is greatly enhanced, both in speed and versatility. Also, communications between a computer array and other devices are enhanced according to the described method and means. Since the inventive computer logic arrays, processors 205, busses 110 and 210, groupings 220, 225 and 235, and signal processing methods may be readily produced and integrated with existing tasks, input/output devices and the like, and since the advantages described herein are provided, it is expected that they will be readily accepted in the industry. For these and other reasons, it is expected that the utility and industrial applicability of the invention will be both significant in scope and long-lasting in duration.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/094,501 entitled “High Speed Data Stream Splitter”, filed on Sep. 5, 2008; and U.S. Provisional Patent Application Ser. No. 61/074,097 entitled “High Speed Data Stream Splitter”, filed on Jun. 19, 2008, which are incorporated herein by reference in their entirety.
Number | Date | Country
---|---|---
61074097 | Jun 2008 | US
61094501 | Sep 2008 | US