This application is related to U.S. patent application Ser. No. 09/815,122, filed on Mar. 22, 2001, entitled “ADAPTIVE INTEGRATED CIRCUITRY WITH HETEROGENEOUS AND RECONFIGURABLE MATRICES OF DIVERSE AND ADAPTIVE COMPUTATIONAL UNITS HAVING FIXED, APPLICATION SPECIFIC COMPUTATIONAL ELEMENTS.”
This application is also related to the following copending applications:
U.S. patent application Ser. No. 10/443,501, filed on May 21, 2003, entitled “HARDWARE TASK MANAGER FOR ADAPTIVE COMPUTING”; and
U.S. patent application Ser. No. 10/443,554, filed on May 21, 2003, entitled “UNIFORM INTERFACE FOR A FUNCTIONAL NODE IN AN ADAPTIVE COMPUTING ENGINE”.
The design of processing architectures is crucial to improving the speed, power and efficiency of digital processing systems. More complex computing systems generally require more innovative architecture design in order to maximize the utility of the available processing power.
One tradeoff that is often made in processing architecture design is between speed, complexity and reconfigurability. For example, where a unit, such as an execution unit, is highly configurable, there is a greater burden in controlling the unit: a reconfigurable unit must receive control signals to set up its configuration, and that configuration depends on the higher-level tasks that are being performed, or solved, within the overall system.
Thus, it is desirable to provide features for a digital processing architecture that improve upon one or more shortcomings in the prior art.
The present invention includes a reconfigurable arithmetic node (RAN) that allows the performance of the RAN to be optimized depending on a specific task, or algorithm, to be executed within an interval of time. A preferred embodiment of the invention allows a RAN to be configured differently for eight different algorithms as follows: Asymmetric Finite-Impulse Response (FIR) Filter, Symmetric FIR Filter, Complex Multiply/FIR Filter, Sum-Of-Absolute-Differences (SAD), Bi-Linear Interpolation, BiQuad Infinite Impulse Response (IIR) Filter, Radix-2 Fast Fourier Transform (FFT)/Inverse FFT (IFFT), and Radix-2 Discrete Cosine Transform (DCT)/Inverse DCT (IDCT).
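By way of illustration only, the following C sketch shows the inner loop of one of the listed algorithms, the sum-of-absolute-differences (SAD), in scalar form; the 8x8 block size, data types and function name are illustrative assumptions rather than details of the disclosed hardware.

    #include <stdint.h>
    #include <stdlib.h>

    /* Sum of absolute differences over two 8x8 blocks of 8-bit pixels.
       "stride" is the row pitch, in bytes, of the source images. */
    static uint32_t sad_8x8(const uint8_t *a, const uint8_t *b, size_t stride)
    {
        uint32_t sad = 0;
        for (size_t row = 0; row < 8; row++) {
            for (size_t col = 0; col < 8; col++) {
                int diff = (int)a[row * stride + col] - (int)b[row * stride + col];
                sad += (uint32_t)abs(diff);
            }
        }
        return sad;
    }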
The RAN is provided with interconnection ability to various computational elements and memories. The configurations of the RAN and its associated components are optimized so that each algorithm can execute in only a few clock cycles. For example, an IDCT algorithm, which requires 16 multiplications and 26 additions/subtractions, can be performed in 16 clock cycles using an execution unit that has one multiplier and two adder/subtractors.
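The sixteen-cycle figure follows from a simple resource bound: one multiplier needs sixteen cycles for sixteen multiplications, while two adder/subtractors need only thirteen cycles for twenty-six additions/subtractions, so the multiplier is the limiting resource. A minimal C sketch of that bound, with hypothetical names, is:

    /* Lower bound on clock cycles for a kernel, assuming each functional unit
       completes one operation per cycle and no other structural hazards. */
    static unsigned cycle_lower_bound(unsigned muls, unsigned addsubs,
                                      unsigned n_multipliers, unsigned n_addsubs)
    {
        unsigned mul_cycles = (muls + n_multipliers - 1) / n_multipliers;
        unsigned add_cycles = (addsubs + n_addsubs - 1) / n_addsubs;
        return (mul_cycles > add_cycles) ? mul_cycles : add_cycles;
    }

    /* cycle_lower_bound(16, 26, 1, 2) == 16, matching the IDCT example above. */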
In one embodiment the invention provides a computational unit in an adaptable computing system, the computational unit comprising: a plurality of execution units coupled by a configurable interconnection; and a configuration system for configuring the interconnection in response to a control signal.
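Purely as a behavioral illustration of such a computational unit, and not as the disclosed circuitry, the following C sketch models a control word that selects which execution-unit output drives each execution-unit input; the structure, field widths and names are assumptions made for the example.

    #define NUM_EUS 4  /* number of execution units, chosen only for illustration */

    /* One routing field per execution-unit input: which EU output drives it. */
    typedef struct {
        unsigned char input_source[NUM_EUS];
    } interconnect_config_t;

    /* Configuration system: decode a control word into routing fields.
       Here each input's 2-bit source selector is packed into the control word. */
    static interconnect_config_t decode_control_word(unsigned control_word)
    {
        interconnect_config_t cfg;
        for (int i = 0; i < NUM_EUS; i++)
            cfg.input_source[i] = (unsigned char)((control_word >> (2 * i)) & 0x3u);
        return cfg;
    }

    /* Configurable interconnection: route EU outputs to EU inputs per the configuration. */
    static void route(const interconnect_config_t *cfg,
                      const int outputs[NUM_EUS], int inputs[NUM_EUS])
    {
        for (int i = 0; i < NUM_EUS; i++)
            inputs[i] = outputs[cfg->input_source[i]];
    }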
A detailed description of an adaptive computing engine (ACE) architecture used in a preferred embodiment is provided in the patent applications referenced above. The following provides a summary of the architecture described in those applications.
In a preferred embodiment, the ACE 100 does not utilize traditional (and typically separate) data, DMA, random access, configuration and instruction busses for signaling and other transmission between and among the reconfigurable matrices 150, the controller 120, and the memory 140, or for other input/output (“I/O”) functionality. Rather, data, control and configuration information are transmitted between and among these matrix 150 elements, utilizing the matrix interconnection network 110, which may be configured and reconfigured, in real-time, to provide any given connection between and among the reconfigurable matrices 150, including those matrices 150 configured as the controller 120 and the memory 140.
The matrices 150 configured to function as memory 140 may be implemented in any desired or exemplary way, utilizing computational elements (discussed below) of fixed memory elements, and may be included within the ACE 100 or incorporated within another IC or portion of an IC. In the exemplary embodiment, the memory 140 is included within the ACE 100, and preferably is comprised of computational elements which are low power consumption random access memory (RAM), but also may be comprised of computational elements of any other form of memory, such as flash, DRAM, SRAM, MRAM, ROM, EPROM or E2PROM. In the exemplary embodiment, the memory 140 preferably includes direct memory access (DMA) engines, not separately illustrated.
The controller 120 is preferably implemented, using matrices 150A and 150B configured as adaptive finite state machines (FSMs), as a reduced instruction set (“RISC”) processor, controller or other device or IC capable of performing the two types of functionality discussed below. (Alternatively, these functions may be implemented utilizing a conventional RISC or other processor.) The first control functionality, referred to as “kernel” control, is illustrated as kernel controller (“KARC”) of matrix 150A, and the second control functionality, referred to as “matrix” control, is illustrated as matrix controller (“MARC”) of matrix 150B. The kernel and matrix control functions of the controller 120 are explained in greater detail below, with reference to the configurability and reconfigurability of the various matrices 150, and with reference to the exemplary form of combined data, configuration and control information referred to herein as a “silverware” module.
The matrix interconnection network 110, and the various interconnection networks within each matrix and computational unit, may be implemented generally as known in the art.
It should be pointed out, however, that while any given switching or selecting operation of, or within, the various interconnection networks may be implemented as known in the art, the design and layout of the various interconnection networks, in accordance with the present invention, are new and novel, as discussed in greater detail below. For example, varying levels of interconnection are provided to correspond to the varying levels of the matrices, computational units, and elements. At the matrix 150 level, in comparison with the prior art FPGA interconnect, the matrix interconnection network 110 is considerably more limited and less “rich”, with lesser connection capability in a given area, to reduce capacitance and increase speed of operation. Within a particular matrix or computational unit, however, the interconnection network may be considerably more dense and rich, to provide greater adaptation and reconfiguration capability within a narrow or close locality of reference.
The various matrices or nodes 150 are reconfigurable and heterogeneous, namely, in general, and depending upon the desired configuration: reconfigurable matrix 150A is generally different from reconfigurable matrices 150B through 150N; reconfigurable matrix 150B is generally different from reconfigurable matrices 150A and 150C through 150N; reconfigurable matrix 150C is generally different from reconfigurable matrices 150A, 150B and 150D through 150N, and so on. The various reconfigurable matrices 150 each generally contain a different or varied mix of adaptive and reconfigurable nodes, or computational units; the nodes, in turn, generally contain a different or varied mix of fixed, application specific computational components and elements that may be adaptively connected, configured and reconfigured in various ways to perform varied functions, through the various interconnection networks. In addition to varied internal configurations and reconfigurations, the various matrices 150 may be connected, configured and reconfigured at a higher level, with respect to each of the other matrices 150, through the matrix interconnection network 110. Details of the ACE architecture can be found in the related patent applications, referenced above.
Reconfigurable Arithmetic Node (RAN)
The RAN is designed to perform commonly-used digital signal processing (DSP) functions. It is adaptable in accordance with the approaches disclosed in the related applications to perform the functions listed in Table I. Naturally, other approaches can use other designs to achieve other functions. Further, not all of the functions listed in Table I need be achieved in a particular embodiment.
In a preferred embodiment, the ACU, the AGU, and the DPU components of the RAN are configurable. This reconfigurability allows efficient execution of the targeted algorithms while minimizing power consumption.
The CPU controls task setup and teardown, buffer acknowledgements, and intra-task processing. More details of task processing can be found in the discussions of the hardware task manager in the above-referenced patent applications. The reconfigurable ACU is discussed in more detail below.
The RAN architecture uses two data memory reads and one data memory write per clock period. The required memory addresses are generated by the RAN's AGU. The AGU consists of two READ address generators, the Read X_Memory Address Generator Unit (XAGU) and the Read Y_Memory Address Generator Unit (YAGU), and one WRITE address generator, the Write X|Y_Memory Address Generator Unit (WAGU). Each of the three address generators includes a so-called common part plus a reconfigurable, algorithm-specific part. The common part includes registers, adders and multiplexers that are used for all algorithms. The algorithm-specific part includes counter logic that supports a specific algorithm, such as a “perfect shuffle” generator for the FFT, generation of delays equal to the first eight powers of two for Golay correlators, and a row/column counter for the two-dimensional DCT.
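For a radix-2 FFT, “perfect shuffle” addressing is commonly realized as bit-reversed indexing of the sample buffer; the following C sketch is a software analogue of such an address sequence and is not intended to represent the AGU's actual counter logic.

    /* Bit-reverse the low "bits" bits of index; for an N = 2^bits point FFT,
       with bits = 3, index 1 (binary 001) maps to address 4 (binary 100). */
    static unsigned bit_reverse(unsigned index, unsigned bits)
    {
        unsigned rev = 0;
        for (unsigned i = 0; i < bits; i++) {
            rev = (rev << 1) | (index & 1u);
            index >>= 1;
        }
        return rev;
    }

    /* For an 8-point FFT this yields the read-address sequence 0 4 2 6 1 5 3 7:
       for (unsigned n = 0; n < 8; n++) address = bit_reverse(n, 3); */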
The capabilities of the YAGU include the local oscillator function and FFT sine/cosine table address generation. The WAGU also supports FFT “perfect shuffle” addressing and generation of delays equal to the first eight powers of two for Golay correlators.
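One common way to realize a local-oscillator function together with sine/cosine table addressing is a phase accumulator whose upper bits index the table. The sketch below illustrates that general technique only; the table size and accumulator width are assumptions and do not describe the YAGU's specific implementation.

    #include <stdint.h>

    #define TABLE_BITS 10  /* 1024-entry sine/cosine table, chosen for illustration */

    /* Phase-accumulator address generator: each call advances the phase by "step"
       (wrapping modulo 2^32) and returns the table index taken from the top bits. */
    static uint32_t next_table_address(uint32_t *phase_acc, uint32_t step)
    {
        *phase_acc += step;
        return *phase_acc >> (32 - TABLE_BITS);
    }

    /* The generated oscillator frequency is f_clock * step / 2^32. */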
The ability of any hardware arithmetic unit to execute any digital signal processing (DSP) algorithm efficiently is a function of many elements of the design, including the number of computational elements and memories and their interconnectivity. We describe eight execution units that are tailored to execute eight specific, widely used algorithms.
These units are near-optimum in the sense that, with the number of computational elements that have been selected, the algorithm will execute in the fewest possible clock cycles. For example, a radix-2 FFT butterfly requires four multiplications and six additions/subtractions. An execution unit with one multiplier and two adder/subtractors can calculate the butterfly in four clock cycles. Removing one of the adder/subtractors would increase the required time to six clock cycles. The second adder/subtractor thus provides considerable performance gains at a modest incremental cost.
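The operation counts above can be seen directly in a decimation-in-time radix-2 butterfly; the following C sketch is illustrative only and uses floating-point values for clarity rather than the fixed-point arithmetic a hardware unit would likely use.

    /* Radix-2 DIT butterfly: (a, b) -> (a + w*b, a - w*b) on complex values.
       The complex product w*b takes 4 multiplications and 2 additions/subtractions;
       forming the sum and difference takes 4 more, for 6 additions/subtractions total. */
    static void radix2_butterfly(float *ar, float *ai, float *br, float *bi,
                                 float wr, float wi)
    {
        float tr = wr * (*br) - wi * (*bi);  /* 2 multiplications, 1 subtraction */
        float ti = wr * (*bi) + wi * (*br);  /* 2 multiplications, 1 addition */
        *br = *ar - tr;                      /* subtraction */
        *bi = *ai - ti;                      /* subtraction */
        *ar = *ar + tr;                      /* addition */
        *ai = *ai + ti;                      /* addition */
    }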
Similarly, the inner loop of an IDCT algorithm can require sixteen multiplications and twenty-six additions/subtractions (e.g., a Chen IDCT algorithm). Such an algorithm can be performed in sixteen clock cycles on an execution unit that includes one multiplier and two adder/subtractors.
The eight near-optimum execution units for the targeted algorithms are shown in the accompanying drawings.
Although the invention has been described with respect to specific embodiments thereof, these embodiments are merely illustrative, and not restrictive, of the invention. For example, any type of processing units, functional circuitry or collection of one or more units and/or resources such as memories, I/O elements, etc., can be included in a node. A node can be a simple register, or more complex, such as a digital signal processing system. Other types of networks or interconnection schemes than those described herein can be employed. It is possible that features or aspects of the present invention can be achieved in systems other than an adaptable system, such as the one described herein with respect to a preferred embodiment.
Thus, the scope of the invention is to be determined solely by the appended claims.
This application claims priority from U.S. Provisional Patent Application Ser. No. 60/391,874, filed on Jun. 25, 2002, entitled “DIGITAL PROCESSING ARCHITECTURE FOR AN ADAPTIVE COMPUTING MACHINE”, which is hereby incorporated by reference as if set forth in full in this document for all purposes.