1. Field
The embodiments discussed herein are directed to a processing element capable of dynamically changing a circuit configuration and a reconfigurable circuit including the processing element.
2. Description of the Related Art
The counter element 103 includes an adder 103a to add the number of inputs of the input data DI, a register 103b to temporarily hold addition data generated by the adder 103a, and an enable signal generator 103c to generate and output an enable signal ENB. For example, the enable signal generator 103c outputs an enable signal ENB, which is data of “1”, at predetermined intervals based on the addition data. The adder 103a adds “1” to the addition data held in the register 103b every time input data DI is input to the counter element 103. Thus, the input data DI corresponds to the addition data output from the adder 103a.
The RAM element 104 includes a storage unit 104a to store the memory data DI1. Addition data DOa output from the counter element 103 is input to a read-address input terminal RA of the storage unit 104a. The addition data DOa corresponds to the input data DI. Thus, by using the addition data DOa as a read-address signal of the storage unit 104a, the memory data DI1, which is to be multiplied by the input data DI, can be read.
A predetermined time is required from when the input data DI is input to the counter element 103 until the memory data DI1 is output from the RAM element 104, due to processing time in the elements 103 and 104. For example, one clock cycle is required for the counter element 103 to count the number of inputs of the input data DI and output the addition data DOa. Also, one clock cycle is required for the RAM element 104 to read and output the memory data DI1. In this case, output timing of the memory data DI1 read from the RAM element 104 is delayed by two clock cycles with respect to input timing of the input data DI to the counter element 103. Thus, the input data DI is input to the multiplier element 105 via the data delay element 107 so that the input data DI and the memory data DI1 corresponding to the same addition data DOa are input to the multiplier element 105 almost simultaneously. The data delay element 107 includes a register group 107a to delay the input data DI. The register group 107a includes two registers connected to each other in series so as to delay the input data DI by two clock cycles, for example.
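The two-cycle alignment described above can be illustrated with a small cycle-level model. This is only a sketch under stated assumptions: the function name simulate_alignment, the use of None as an empty pipeline slot, and read addresses starting at 0 are illustrative choices and are not taken from the described circuit.

```python
def simulate_alignment(inputs, memory):
    """Cycle-level sketch: counter -> RAM on one path, two delay registers on the other.

    Assumptions (not from the source): addresses start at 0 and None marks
    an empty pipeline slot.
    """
    count = 0              # running count kept by the adder/register pair
    counter_reg = None     # addition data DOa presented to the RAM read address
    ram_reg = None         # memory data DI1 at the RAM output
    delay = [None, None]   # two series registers of the data delay element
    pairs = []
    for di in list(inputs) + [None, None]:   # two extra cycles flush the pipeline
        # Values visible at the multiplier inputs during this cycle.
        if delay[1] is not None:
            pairs.append((delay[1], ram_reg))
        # Clock edge: update registers from the back of each path to the front.
        ram_reg = memory[counter_reg] if counter_reg is not None else None
        delay = [di, delay[0]]
        if di is not None:
            counter_reg = count
            count += 1
        else:
            counter_reg = None
    return pairs


aligned = simulate_alignment([10, 20, 30], [1, 2, 3])
print(aligned)
```

In this model each input word reaches the multiplier together with the memory word read at the matching address, which is the purpose of the two-register delay.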
The multiplier element 105 includes a multiplier 105a to which the input data DI and the memory data DI1 are input and a register 105b to temporarily hold multiplication data generated by the multiplier 105a. The accumulating adder element 106 includes an adder 106a and a register 106b to temporarily store addition data generated by the adder 106a. The adder 106a adds the multiplication data output from the multiplier element 105 and the addition data held in the register 106b. Thus, the accumulating adder element 106 can cumulatively add pieces of multiplication data generated by the multiplier element 105.
If an enable signal ENB of data corresponding to “1” is input to the accumulating adder element 106, for example, the accumulating adder element 106 ends cumulative addition of data and outputs cumulative addition data as output data DO. Output timing of the memory data DI1 is delayed by one clock cycle with respect to input timing of the addition data DOa to the RAM element 104. Thus, the enable signal ENB is input to the accumulating adder element 106 via the enable delay element 108 so that the output data DO is output after the accumulating adder element 106 has accumulated a desired number of pieces of multiplication data. The enable delay element 108 includes a register 108a to delay the enable signal ENB by one clock cycle, for example.
At multiplication of the memory data DI1 by the input data DI, the reconfigurable LSI 101 allows the RAM element 104 to store the memory data DI1 and sequentially reads the memory data DI1 from the RAM element 104 by using addition data based on the number of data inputs counted by the counter element 103. When a product-sum operation is performed, the reconfigurable LSI 101 calculates the number of accumulations of data by using the counter element 103. Furthermore, the reconfigurable LSI 101 generates the enable signal ENB controlling the number of accumulations by using the counter element 103 and inputs the signal to the accumulating adder element 106.
When the reconfigurable LSI 101 performs a cumulative operation, the reconfigurable LSI 101 counts the number of accumulations by using the counter element 103 (an element other than the accumulating adder element 106 that accumulates the data), outputs a control signal, and inputs the control signal to each operation element.
As illustrated in
A filter process based on the spatial filter 111 is performed on the image data segments x00 to x22 and so on in the operation execution units x1, x2, and x3 read from the line buffers LB0 to LB2. If all of the coefficients a00 to a22 of the spatial filter 111 are set to 1/9, the image data segments x11, x12, and x13 of the target pixels of the operation execution units x1, x2, and x3 become new image data segments y11, y12, and y13, as expressed by the following expressions (1) to (3). Accordingly, a blurred image can be generated.
y11=(1/9)×x00+(1/9)×x01+(1/9)×x02+(1/9)×x10+(1/9)×x11+(1/9)×x12+(1/9)×x20+(1/9)×x21+(1/9)×x22 (1)
y12=(1/9)×x01+(1/9)×x02+(1/9)×x03+(1/9)×x11+(1/9)×x12+(1/9)×x13+(1/9)×x21+(1/9)×x22+(1/9)×x23 (2)
y13=(1/9)×x02+(1/9)×x03+(1/9)×x04+(1/9)×x12+(1/9)×x13+(1/9)×x14+(1/9)×x22+(1/9)×x23+(1/9)×x24 (3)
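Expressions (1) to (3) can be checked numerically with a short sketch. The tile values and the function name box_filter_3x3 below are hypothetical and chosen only for illustration; the rows of the tile stand in for the line buffers LB0 to LB2.

```python
from fractions import Fraction


def box_filter_3x3(image, row, col):
    """Mean of the 3x3 neighbourhood centred on (row, col), all coefficients 1/9."""
    coeff = Fraction(1, 9)
    return sum(
        coeff * image[r][c]
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
    )


# Hypothetical 3x5 tile: rows play the role of line buffers LB0, LB1, LB2.
tile = [
    [9, 9, 9, 0, 0],
    [9, 9, 9, 0, 0],
    [9, 9, 9, 0, 0],
]

# Target pixels x11, x12, x13 sit in the middle row at columns 1, 2, 3.
y11, y12, y13 = (box_filter_3x3(tile, 1, c) for c in (1, 2, 3))
print(y11, y12, y13)
```

The sharp edge between columns 2 and 3 of the tile is spread over several output pixels, which corresponds to the blurring effect mentioned above.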
The image processing reconfigurable LSI requires the RAM elements to store the coefficients a00 to a22 of the spatial filter 111 and read the coefficients therefrom, the counter element to generate a read address signal, and the data delay element to adjust delay of the input data DI. Furthermore, a network is disadvantageously occupied to connect these elements.
It is an aspect of the embodiments discussed herein to provide a processing element including a shift register including n stages of registers mutually connected in series, and rotating held data among the n stages of registers in synchronization with a clock signal and a number-of-stages determining circuit determining the number of stages to be used among the n stages of registers, wherein an output terminal of the register in the last stage connects to an input terminal of the register in the first stage.
These together with other aspects and advantages which will be subsequently apparent, reside in the details of construction and operation as more fully hereinafter described and claimed, reference being had to the accompanying drawings forming a part hereof, wherein like numerals refer to like parts throughout.
The above-described embodiments of the present invention are intended as examples, and all embodiments of the present invention are not limited to including the features described above.
Reference may now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout.
The counter element 137 includes a counter 137a to count the number of inputs of input data DI and a register 137b to temporarily hold count data. The RAM element 121 includes a storage unit RAM0 to store the coefficients a00, a01, and a02 of the spatial filter 111. The coefficients a00, a01, and a02 are stored in the storage unit RAM0 while corresponding to addresses 0, 1, and 2, respectively. The RAM element 122 includes a storage unit RAM1 to store the coefficients a10, a11, and a12 of the spatial filter 111. The coefficients a10, a11, and a12 are stored in the storage unit RAM1 while corresponding to addresses 0, 1, and 2, respectively. The RAM element 123 includes a storage unit RAM2 to store the coefficients a20, a21, and a22 of the spatial filter 111. The coefficients a20, a21, and a22 are stored in the storage unit RAM2 while corresponding to addresses 0, 1, and 2, respectively. Each of the RAM elements 121, 122, and 123 outputs the coefficient corresponding to the same address as that of the count data output from the counter element 137, among the coefficients a00 to a22.
The reconfigurable LSI 201 uses the spatial filter 111 of three rows and three columns, and thus each of the RAM elements 121, 122, and 123 stores three coefficients among the coefficients a00 to a22. The reconfigurable LSI 201 sequentially reads the coefficients a00 to a22 of the spatial filter 111 from the RAM elements 121, 122, and 123 by using three pieces of count data “0”, “1”, and “2” generated by the counter element 137.
The timings at which the coefficients a00 to a22 are output from the RAM elements 121, 122, and 123 are delayed, relative to the timing at which the input data DI is input to the counter element 137, by the processing time in the counter element 137 and the RAM elements 121, 122, and 123. Thus, the input data DI is input to the shift/mask elements 124, 125, and 126 via the data delay element 136. The data delay element 136 includes a register group 136a to delay the input data DI by the processing time. The register group 136a includes two registers connected to each other in series, for example.
The shift/mask element 124 includes a shift/mask circuit 124a, a register 124b to temporarily hold the input data DI on which a bit shift process and a bit mask process have been performed by the shift/mask circuit 124a, and a register 124c to temporarily hold the coefficients a00, a01, and a02 output from the RAM element 121. The shift/mask circuit 124a performs a bit shift process on the input data DI having many bits so that the image data segments x00 to x0n (see
The shift/mask elements 125 and 126 have the same configuration as that of the shift/mask element 124, that is, include shift/mask circuits 125a and 126a, registers 125b and 126b to hold the input data DI on which a bit shift process and a bit mask process have been done by the shift/mask circuits 125a and 126a, and registers 125c and 126c to hold the coefficients a10, a11, and a12 and the coefficients a20, a21, and a22 output from the RAM elements 122 and 123, respectively.
The input data DI0, DI1, and DI2 illustrated in
The filter process element 127 includes a multiplier element 130 to multiply the image data segments x00 to x0n read from the line buffer LB0 illustrated in
DO0=a00×x00+a01×x01+a02×x02 (4)
The filter process element 128 has the same configuration as that of the filter process element 127. The filter process element 128 includes a multiplier element 132 to multiply the image data segments x10 to x1n read from the line buffer LB1 illustrated in
DO1=a10×x10+a11×x11+a12×x12 (5)
The filter process element 129 has the same configuration as that of the filter process elements 127 and 128. The filter process element 129 includes a multiplier element 134 to multiply the image data segments x20 to x2n read from the line buffer LB2 illustrated in
DO2=a20×x20+a21×x21+a22×x22 (6)
The image processing reconfigurable LSI 201 can add the output data DO0, DO1, and DO2 output from the filter process elements 127, 128, and 129, respectively, so as to calculate an image data segment Y11 expressed by the following expression (7) as a new image data segment of the target pixel x11 of the operation execution unit x1.
Y11=a00×x00+a01×x01+a02×x02+a10×x10+a11×x11+a12×x12+a20×x20+a21×x21+a22×x22 (7)
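The relationship between the per-row outputs of expressions (4) to (6) and the combined result of expression (7) can be sketched as follows. The coefficient and pixel values, and the helper name row_product_sum, are hypothetical and serve only to show that summing the three row outputs gives the full 3×3 product-sum.

```python
def row_product_sum(coeffs, pixels):
    """Product-sum of one filter row with one row of image data."""
    return sum(a * x for a, x in zip(coeffs, pixels))


# Hypothetical coefficients a00..a22 and image data x00..x22.
a = [[1, 2, 1],
     [2, 4, 2],
     [1, 2, 1]]
x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# DO0, DO1, DO2: one product-sum per row, as in expressions (4) to (6).
do0, do1, do2 = (row_product_sum(a[r], x[r]) for r in range(3))

# Y11: sum of the three row outputs, as in expression (7).
y11 = do0 + do1 + do2
print(do0, do1, do2, y11)
```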
The reconfigurable LSI 101 requires the RAM element 104 to store the memory data DI1 and read it therefrom and the counter element 103. In the reconfigurable LSI 101, the counter is used for both generating a read address signal to read the memory data DI1 and generating an enable signal ENB. If the counter is not used for these two purposes, the reconfigurable LSI 101 includes a counter element for generating a read address and a counter element for generating an enable signal ENB, independently. Also, the reconfigurable LSI 101 requires the data delay element 107 and the enable delay element 108 to adjust delay of the input data DI and the enable signal ENB. Furthermore, a network is disadvantageously occupied to connect these elements 103, 104, 107, and 108.
The image processing reconfigurable LSI 201 requires the RAM elements 121, 122, and 123 to store the coefficients a00 to a22 of the spatial filter 111 and read the coefficients therefrom, the counter element 137 to generate a read address signal, and the data delay element 136 to adjust delay of the input data DI. Furthermore, a network is disadvantageously occupied to connect these elements 121, 122, 123, 136, and 137.
As described above, the conventional reconfigurable LSI requires a processing element to store predetermined data and a processing element to adjust delay of data. Furthermore, the reconfigurable LSI requires a wiring area to connect these processing elements to each other. With this configuration, the conventional reconfigurable LSI has a problem that the chip size increases. Furthermore, the conventional reconfigurable LSI has a problem that a wiring load increases and that high-speed processing is difficult to perform.
A processing element and a reconfigurable circuit including the same according to an embodiment are described with reference to
A processing element and a reconfigurable circuit including the same according to example 1 are described with reference to
The number-of-stages determining circuit 4 includes selectors 4S1 to 4Sn-1, each being placed between adjoining registers of the n stages of registers 3R1 to 3Rn. Each of the selectors 4S1 to 4Sn-1 receives the coefficient as held data output from the register in the anterior stage of the adjoining registers and the coefficient as held data output from the register 3Rn in the last stage. Each selector selects the coefficient as held data in either the register in the anterior stage or the register 3Rn in the last stage and outputs the selected coefficient to the register in the posterior stage of the adjoining registers. Among the selectors placed between the registers,
For example, the selector 4Sn-1 is placed between the adjoining registers 3Rn-1 and 3Rn. The selector 4Sn-1 receives the coefficient a02 output from the register 3Rn-1 in the n-1-th stage (anterior stage) and the coefficient a01 of the register 3Rn in the last stage, selects one of the coefficients a01 and a02, and outputs the selected coefficient to the register 3Rn in the last stage (posterior stage). If the selector 4Si (1≦i≦n-1) among the selectors 4S1 to 4Sn-1 selects the held data in the register 3Rn in the last stage, the number-of-stages determining circuit 4 can determine the number of stages of registers to be used to be n-i stages. The number-of-stages determining circuit 4 is controlled by a control unit (not illustrated) provided in the reconfigurable circuit.
Next, performance of the processing element 7 is described with reference to
When a clock signal (not illustrated) rises after the initial setting, for example, in synchronization with a rising edge of the clock signal, the register 3R1 in the first stage outputs the coefficient a0n to the register in the second stage (not illustrated) via the selector 4S1, the register in the n-2-th stage (not illustrated) outputs the coefficient a03 (not illustrated) as held data to the register 3Rn-1 in the n-1-th stage via the selector 4Sn-2, the register 3Rn-1 in the n-1-th stage outputs the coefficient a02 to the register 3Rn in the last stage via the selector 4Sn-1, and the register 3Rn in the last stage outputs the coefficient a01 to the register 3R1 in the first stage and the selectors 4S1 to 4Sn-1. Accordingly, the coefficient a01 is held in the register 3R1 in the first stage, the coefficient a0n is held in the register in the second stage, the coefficient a03 is held in the register 3Rn-1 in the n-1-th stage, and the coefficient a02 is held in the register 3Rn in the last stage. During the above-described performance, the coefficient a01 output from the register 3Rn in the last stage is output from an output terminal 5 in synchronization with a rising edge of the clock signal.
When the clock signal rises again after the above-described performance, in synchronization with a rising edge of the clock signal, the register 3R1 in the first stage outputs the coefficient a01 to the register in the second stage (not illustrated) via the selector 4S1, the register in the n-2-th stage (not illustrated) outputs the coefficient a04 (not illustrated) to the register 3Rn-1 in the n-1-th stage via the selector 4Sn-2, the register 3Rn-1 in the n-1-th stage outputs the coefficient a03 to the register 3Rn in the last stage via the selector 4Sn-1, and the register 3Rn in the last stage outputs the coefficient a02 to the register 3R1 in the first stage. Accordingly, the coefficient a02 is held in the register 3R1 in the first stage, the coefficient a01 is held in the register in the second stage, the coefficient a04 is held in the register 3Rn-1 in the n-1-th stage, and the coefficient a03 is held in the register 3Rn in the last stage. During the above-described performance, the coefficient a02 output from the register 3Rn in the last stage is output from the output terminal 5 in synchronization with a rising edge of the clock signal.
The shift register 3 repeats the above-described performance in synchronization with rising edges of the clock signal, so that the coefficients a01 to a0n can be rotated among the registers 3R1 to 3Rn. After the clock signal rises n times, the coefficients a01 to a0n held in the registers 3R1 to 3Rn return to the original position at the initial setting.
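The full-length rotation described above can be modeled in a few lines. This is a behavioral sketch only: the list index k stands for the register 3R(k+1), n = 4 is an arbitrary example size, and the function name rotate_once is an illustrative assumption.

```python
def rotate_once(regs):
    """One rising clock edge with all n stages in use.

    The last stage drives both the first stage and the output terminal;
    every other stage takes the value of the stage before it.
    """
    out = regs[-1]
    return [out] + regs[:-1], out


n = 4
# Initial setting as in the text: first stage holds a0n, last stage holds a01.
regs = [f"a0{k}" for k in range(n, 0, -1)]   # ['a04', 'a03', 'a02', 'a01']
initial = list(regs)

outputs = []
for _ in range(n):
    regs, out = rotate_once(regs)
    outputs.append(out)

print(outputs)          # coefficients appear at the output terminal in order
print(regs == initial)  # after n clocks the contents are back where they started
```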
For example, assume that setting is made in the number-of-stages determining circuit 4 so that only the selector 4Sn-2 selects the held data in the register 3Rn in the last stage at the initial setting. In this case, i=n-2, and thus the number of stages to be used among the n stages of registers 3R1 to 3Rn is two (=n-(n-2) stages).
When a clock signal rises after the initial setting, for example, in synchronization with a rising edge of the clock signal, the register 3Rn-1 in the n-1-th stage outputs the coefficient a02 to the register 3Rn in the last stage via the selector 4Sn-1, and the register 3Rn in the last stage outputs the coefficient a01 to the register 3R1 in the first stage and the selectors 4S1 to 4Sn-1. The selector 4Sn-2 is set to select the held data in the register 3Rn in the last stage and output the selected held data to the register 3Rn-1 in the n-1-th stage. Thus, during the above-described performance, the coefficient a01 output from the register 3Rn in the last stage is input to the register 3Rn-1 in the n-1-th stage via the selector 4Sn-2. Accordingly, the coefficient a01 is held in the register 3Rn-1 in the n-1-th stage, and the coefficient a02 is held in the register 3Rn in the last stage. During the above-described performance, the coefficient a01 output from the register 3Rn in the last stage is output from the output terminal 5 in synchronization with a rising edge of the clock signal.
When the clock signal rises again after the above-described performance, in synchronization with a rising edge of the clock signal, the register 3Rn-1 in the n-1-th stage outputs the coefficient a01 to the register 3Rn in the last stage via the selector 4Sn-1, and the register 3Rn in the last stage outputs the coefficient a02 to the register 3R1 in the first stage and the selectors 4S1 to 4Sn-1. The selector 4Sn-2 outputs the coefficient a02 to the register 3Rn-1 in the n-1-th stage. Accordingly, the coefficient a02 is held in the register 3Rn-1 in the n-1-th stage, and the coefficient a01 is held in the register 3Rn in the last stage. During the above-described performance, the coefficient a02 output from the register 3Rn in the last stage is output from the output terminal 5 in synchronization with a rising edge of the clock signal.
By repeating the above-described performance, the shift register 3 can rotate the coefficients a01 and a02 between the register 3Rn-1 in the n-1-th stage and the register 3Rn in the last stage in synchronization with rising edges of the clock signal. Also, the coefficient is input from the register 3Rn in the last stage to the registers 3R1 to 3Rn-2 and the selectors 4S1 to 4Sn-2. However, the coefficient held in the register 3Rn-2 in the n-2-th stage is not input to the register 3Rn-1 in the n-1-th stage, and thus the registers 3R1 to 3Rn-2 do not contribute to a shift operation of the coefficients as held data. In this way, the processing element 7 can determine the number of stages to be used in the shift register 3 by using the number-of-stages determining circuit 4. With this configuration, when the number of pieces of held data to be rotated is small relative to the number of stages of the shift register 3, the number of stages to be used of the shift register 3 is set to the same number as that of the pieces of held data, so that the processing element 7 can continuously output the held data from the output terminal 5 in synchronization with the clock signal.
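The effect of the selectors on the number of stages in use can be sketched with a behavioral model. The following is an illustration under assumptions: list index k stands for the register 3R(k+1), the parameter feedback_at = i means that the selector 4Si picks the data of the last stage, "-" marks registers whose contents do not matter, and n = 4 is an arbitrary example size. None of these names come from the described circuit.

```python
def stages_used(n, i):
    """Number of stages in use when selector 4Si selects the last stage: n - i."""
    return n - i


def clock(regs, feedback_at=None):
    """One rising clock edge of the shift register with selectors.

    regs[k] models register 3R(k+1). Selector 4Sk sits between 3Rk and 3R(k+1);
    when k == feedback_at it forwards the last stage instead of the anterior stage.
    """
    n = len(regs)
    last = regs[-1]
    nxt = [last]                      # 3R1 always receives the last stage
    for k in range(1, n):
        nxt.append(last if feedback_at == k else regs[k - 1])
    return nxt, last                  # 'last' also appears at the output terminal


n = 4
regs = ["-", "-", "a02", "a01"]       # only the last two stages hold coefficients
outputs = []
for _ in range(6):
    regs, out = clock(regs, feedback_at=n - 2)
    outputs.append(out)

print(stages_used(n, n - 2))          # two stages in use
print(outputs)                        # a01 and a02 alternate every clock
```

With feedback_at left at None the same function reduces to the full n-stage rotation.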
As described above, according to this example, the processing element 7 includes the shift register 3 having the registers 3R1 to 3Rn in many stages, so that the plurality of coefficients a01 to a0n can be held in the single processing element. Accordingly, the reconfigurable circuit according to this example including the processing element 7 can rotate the coefficients a01 to a0n by using a pipeline in the processing element 7. Thus, the reconfigurable circuit according to this example does not include the counter element 103 and the data delay element 107, unlike the conventional reconfigurable LSI 101. Accordingly, the space for the processing element in a semiconductor chip can be saved. Also, the reconfigurable circuit according to this example has a smaller number of processing elements, and thus the chip size can be reduced. Furthermore, in the reconfigurable circuit according to this example, a network is not occupied by a counter element unlike in the conventional circuit, and thus a wiring load is reduced and high-speed performance can be realized.
A processing element and a reconfigurable circuit including the same according to example 2 are described with reference to
Referring back to
The processing element 7 includes a selector 15 to select the coefficients a01 to a0n sequentially output from the output terminal 5 and predetermined data Dx input from the outside and output them to the multiplier circuit 13.
Next, performance of the processing element 7 according to this example is described with reference to
Assume that the selector 15 is set to select the data Dx from the outside at the initial setting. In this case, for example, the multiplier circuit 13 multiplies the input data DI and the data Dx input to the multiplier 13a in synchronization with a rising edge of the clock signal and outputs multiplication data as output data DO from the register 13b to the outside of the processing element 7.
As described above, according to this example, the processing element 7 includes the shift register 3 having the registers 3R1 to 3Rn in many stages and the multiplier circuit 13. With this configuration, the reconfigurable circuit according to this example including the processing element 7 can rotate the coefficients a01 to a0n by using a pipeline in the processing element 7 and multiply the coefficients a01 to a0n by the input data DI. Thus, the reconfigurable circuit can perform an operation in the single processing element, so that timing adjustment between the input data DI and the coefficients a01 to a0n is not required. Accordingly, the reconfigurable circuit according to this example does not include the counter element 103 and the data delay element 107, unlike the conventional reconfigurable LSI 101. Thus, the processing element 7 and the reconfigurable circuit including the same according to this example have the same advantages as those in example 1.
A processing element and a reconfigurable circuit including the same according to example 3 are described with reference to
The shift/mask circuit 17 performs a bit shift process on the input data DI so that the multiplier circuit 13 can multiply the input data DI by the coefficients a01 to a0n or the data Dx, and also performs a bit mask process on the part of the input data DI that is not multiplied in the multiplier circuit 13. For example, assume that the multiplier circuit 13 has a function of performing operation on the low 8 bits on the LSB side of the input data DI and the coefficients a01 to a0n or the data Dx. Also, assume that the input data DI is composed of 24 bits and that the high 8 bits on the MSB side are to be operated on. In this case, the shift/mask circuit 17 shifts the high 8 bits of the input data DI to the right to the low 8 bits on the LSB side and performs a mask process on the high 16 bits. Accordingly, the data to be operated on is bit-shifted to the low 8 bits, so that the multiplier circuit 13 can perform operation on the coefficients a01 to a0n or the data Dx and the data to be operated on in the input data DI.
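The 24-bit example above amounts to a right shift followed by a mask. A minimal sketch follows; the function name shift_mask, its parameters, and the sample value 0xAB12CD are hypothetical.

```python
def shift_mask(di, shift, width=8):
    """Shift the field of interest down to the LSB side and mask everything above it."""
    return (di >> shift) & ((1 << width) - 1)


di = 0xAB12CD                 # hypothetical 24-bit input data DI
high = shift_mask(di, 16)     # high 8 bits moved to the low 8 bits
mid = shift_mask(di, 8)       # middle 8 bits
low = shift_mask(di, 0)       # low 8 bits need no shift, only the mask
print(hex(high), hex(mid), hex(low))
```

After the shift by 16 bits only the low 8 bits can be nonzero, so the mask covers the remaining high 16 bits of the 24-bit word.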
The performance of the processing element 7 according to this example is the same as in example 2 except that the shift/mask circuit 17 performs a bit shift process and a bit mask process on the input data DI.
As described above, according to this example, the processing element 7 includes the shift/mask circuit 17 and thus can perform operation on some of many bits of the input data DI and the coefficients a01 to a0n or the data Dx. The reconfigurable circuit including the processing element 7 according to this example does not include the counter element 103 and the data delay element 107, unlike the conventional reconfigurable LSI 101. Thus, the reconfigurable circuit according to this example can have the same advantages as those in examples 1 and 2.
A processing element and a reconfigurable circuit including the same according to example 4 are described with reference to
The accumulating adder circuit 21 includes an adder 21a to which the multiplication data is input and a register 21b to temporarily hold addition data generated by the adder 21a and output the addition data as output data DO to the outside of the processing element 7. The adder 21a adds the multiplication data output from the multiplier circuit 13 and the addition data temporarily held in the register 21b. Accordingly, the accumulating adder circuit 21 can cumulatively add pieces of the multiplication data. The register 21b outputs the output data DO in synchronization with a rising edge of a clock signal.
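The combination of the rotating coefficients, the multiplier circuit, and the accumulating adder circuit can be sketched behaviorally as a product-sum loop. This is an assumption-laden illustration: the function name product_sum is hypothetical, a deque stands in for the shift register, and register-level timing is ignored.

```python
from collections import deque


def product_sum(coeffs, samples):
    """Multiply each input sample by the coefficient currently at the output
    terminal, accumulate the product, then rotate the coefficients by one stage."""
    ring = deque(coeffs)      # ring[0] is the coefficient output on the next clock
    acc = 0
    for di in samples:
        acc += ring[0] * di   # multiplier followed by accumulating adder
        ring.rotate(-1)       # rotation: the used coefficient moves to the back
    return acc


print(product_sum([1, 2, 3], [4, 5, 6]))
print(product_sum([1, 2], [1, 1, 1, 1]))   # coefficients wrap around after two samples
```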
Next, performance of the processing element 7 according to this example is described with reference to
As described above, the processing element 7 according to this example can perform a product-sum operation on the input data DI and the coefficients a01 to a0n. Thus, the reconfigurable circuit including the processing element 7 according to this example does not include the counter element 103 and the data delay element 107, unlike the conventional reconfigurable LSI 101. Thus, the processing element 7 and the reconfigurable circuit including the same according to this example can have the same advantages as those in examples 1 to 3.
A processing element and a reconfigurable circuit including the same according to example 5 are described with reference to
As described above, according to this example, the processing element 7 can cumulatively add some of many bits of the input data DI and the coefficients a01 to a0n. The reconfigurable circuit including the processing element 7 does not include the counter element 103 and the data delay element 107, unlike the conventional reconfigurable LSI 101. Thus, the processing element 7 and the reconfigurable circuit according to this example can have the same advantages as those in examples 1 to 4.
A processing element and a reconfigurable circuit including the same according to example 6 of the embodiment are described with reference to
Pieces of held data c01 to c0n are used as control signal data to control an operation unit (not illustrated). For example, each of the pieces of held data c01 to c0n-1 is data of “0” composed of several bits, and the held data c0n is data of “1” composed of several bits. The shift register 3 can output the held data of “1” from the output terminal 5 at every n clock cycles. In
The performance of the processing element 7 according to this example is the same as that in example 1 except that the output position of the pieces of held data c01 to c0n output from the shift register 3 is different.
As described above, according to this example, the processing element 7 includes the shift register 3 having the registers 3R1 to 3Rn in many stages, and thus can rotate the pieces of held data c01 to c0n in the single processing element. Accordingly, the reconfigurable circuit including the processing element 7 uses a pipeline in the processing element 7 as a delay device so as to allow the pieces of held data c01 to c0n used as control signal data (e.g., enable signal) to propagate. Thus, the reconfigurable circuit according to this example does not include the counter element 103 and the enable delay element 108, unlike the conventional reconfigurable LSI 101. With this configuration, the space for the processing element in a semiconductor chip can be saved. Also, the reconfigurable circuit can be miniaturized because the number of processing elements is reduced. Furthermore, in the reconfigurable circuit according to this example, a network is not occupied by a counter element and so on unlike in the conventional circuit, and thus a wiring load is reduced and high speed performance can be realized.
A processing element and a reconfigurable circuit including the same according to example 7 of the embodiment are described with reference to
Furthermore, the processing element 7 includes the multiplier circuit 13 to perform operation on the input data DI and predetermined data Dx input from the outside. The multiplier circuit 13 includes the multiplier 13a to multiply the input data DI by the predetermined data Dx and the register 13b to temporarily hold multiplication data output from the multiplier 13a and output the multiplication data as output data DO to the outside of the processing element 7.
Next, performance of the processing element 7 according to this example is described with reference to
As described above, according to this example, the processing element 7 includes the shift register 3 and the multiplier circuit 13. Thus, the reconfigurable circuit including the processing element 7 according to this example does not include the counter element 103 and the enable delay element 108, unlike the conventional reconfigurable LSI 101. Thus, the reconfigurable circuit according to this example can have the same advantages as those in example 6.
A processing element and a reconfigurable circuit including the same according to example 8 are described with reference to
The performance of the processing element 7 according to this example is the same as that in example 7 except that the shift/mask circuit 17 performs a bit shift process and a bit mask process on the input data DI.
As described above, according to this example, the processing element 7 includes the shift/mask circuit 17 and thus can multiply some of many bits of the input data DI by data Dx. The reconfigurable circuit including the processing element 7 according to this example does not include the counter element 103 and the enable delay element 108, unlike the conventional reconfigurable LSI 101. Thus, the reconfigurable circuit according to this example can have the same advantages as those in examples 6 and 7.
A processing element and a reconfigurable circuit including the same according to example 9 of the embodiment are described with reference to
The operation unit includes the accumulating adder circuit 21 in which the number of accumulations of the input data DI is controlled based on the pieces of held data c01 to c0n. The accumulating adder circuit 21 includes the adder 21a to cumulatively add pieces of the input data DI, the register 21b to temporarily hold addition data generated by the adder 21a and output the data as output data DO to the outside of the processing element 7, and the register 23 controlled by the pieces of held data c01 to c0n. The processing element 7 includes the accumulating adder circuit 21 and thus has a function as an accumulating adder element.
The register 23 temporarily holds addition data output from the adder 21a and outputs the addition data to the adder 21a in synchronization with a rising edge of a clock signal. The adder 21a adds the addition data output from the register 23 and the input data DI, so that the accumulating adder circuit 21 can cumulatively add pieces of the input data DI. The register 23 can reset the data held therein based on the pieces of held data c01 to c0n. Furthermore, the register 23 can determine whether the output data DO is to be output from the register 21b based on the pieces of held data c01 to c0n. In this way, the accumulating adder circuit 21 can control both the number of accumulations of the input data DI and the output timing of the addition data based on the pieces of held data c01 to c0n. When the held data of “1” is input to the control terminal E, the register 21b outputs the addition data accumulated so far as output data DO.
Next, performance of the processing element 7 according to this example is described with reference to
The shift register 3 performs in the same manner as the shift register 3 according to example 6. For example, the shift register 3 sequentially outputs the pieces of held data c01 to c0n to the register 23 via the output terminal 5 in synchronization with rising edges of the clock signal. The pieces of held data c01 to c0n-1 are data of “0” and the held data c0n is data of “1”. Thus, the shift register 3 outputs the pieces of held data c01 to c0n-1 of “0” to the register 23 during the first n-1 clock cycles from the initial setting, and outputs the held data c0n of “1” to the register 23 at the n-th clock cycle. Thereafter, the shift register 3 outputs the held data c0n of “1” to the register 23 once every n clock cycles.
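The control pattern described above can be illustrated with a minimal software sketch. It assumes that the shift register 3 behaves as a simple n-stage loop whose last stage drives the output terminal 5 and is fed back to the first stage; the function name `control_pulses` and the use of a Python `deque` are hypothetical choices for illustration and are not part of the embodiment.

```python
from collections import deque

def control_pulses(n, cycles):
    """Return the held data seen at the output terminal for each clock cycle."""
    # Assumed initial setting: c01 .. c0(n-1) are "0" and c0n is "1",
    # with c01 in the stage nearest the output terminal.
    stages = deque([0] * (n - 1) + [1])
    out = []
    for _ in range(cycles):
        bit = stages.popleft()  # the last stage drives the output terminal
        stages.append(bit)      # and is fed back so the pattern circulates
        out.append(bit)
    return out

pulses = control_pulses(n=4, cycles=8)
print(pulses)  # [0, 0, 0, 1, 0, 0, 0, 1]
```

With n=4, the data of “1” appears at the 4th clock cycle and again at the 8th, which corresponds to one control pulse every n clock cycles.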
On the other hand, each piece of the input data DI is input to the adder 21a in synchronization with a rising edge of the clock signal. The accumulating adder circuit 21 outputs addition data generated by adding first input data DI and the data of “0” held in the register 23 to the registers 21b and 23 in synchronization with output timing of the held data c01. For example, if the held data output from the shift register 3 and the number-of-stages determining circuit 4 is “0”, the register 23 holds the addition data and outputs it to the adder 21a at a next rising edge of the clock signal. The register 21b outputs the addition data as output data DO to the outside of the processing element 7.
The processing element 7 repeats the above-described performance, and the accumulating adder circuit 21 cumulatively adds first to n-th input data DI. Assuming that the first to n-th input data DI are x01 to x0n, the accumulating adder circuit 21 outputs the output data DO=x01+x02+ . . . +x0n after n clock cycles. At the same time when cumulatively-added data to the n-th input data DI is input to the register 23, the held data c0n of “1” is input from the shift register 3 and the number-of-stages determining circuit 4 to the control terminal E of the register 23. Accordingly, the register 23 resets the held data to “0”, the value at the initial setting. Since the data held in the register 23 is reset to “0” at every n clock cycles, the processing element 7 can cumulatively add n pieces of input data DI. According to the above description, the register 21b is controlled to output data DO every time addition data is input thereto. However, the register 21b may be controlled to output cumulatively-added first to n-th input data DI as output data DO every time held data of “1” is input to the control terminal E of the register 23, that is, at every n clock cycles.
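The accumulation and reset behavior described above can be sketched as a cycle-by-cycle model. This is only an illustrative sketch under simplifying assumptions: the one-clock register latencies are collapsed into a single loop iteration, the control data is modeled as a “1” on every n-th clock, and the name `accumulate` is hypothetical.

```python
def accumulate(inputs, n):
    """Model the running sum in register 23, reset by held data "1" every n clocks."""
    held = 0      # register 23 holds "0" at the initial setting
    outputs = []  # addition data captured by register 21b at each clock
    for clock, x in enumerate(inputs, start=1):
        total = held + x                      # adder 21a
        outputs.append(total)                 # register 21b holds the addition data
        control = 1 if clock % n == 0 else 0  # held data c0n is "1" at every n-th clock
        held = 0 if control else total        # register 23 resets on "1"
    return outputs

sums = accumulate([1, 2, 3, 4, 5, 6], n=3)
print(sums)        # [1, 3, 6, 4, 9, 15]
print(sums[2::3])  # [6, 15], the completed sums of each group of n inputs
```

If the register 21b is instead controlled to output only when the held data is “1”, the output data DO corresponds to every n-th entry of the list, as in the second printed line.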
As described above, in the processing element 7 according to this example, the pieces of held data c01 to c0n as control signals propagate through a pipeline in the processing element 7, and data can be accumulated and output at arbitrary timing and arbitrary times. Thus, the processing element 7 functions as an accumulating adder element. The reconfigurable circuit including the processing element 7 can perform operation in the single processing element, and thus timing adjustment between the input data DI and the coefficients a01 to a0n is not required. Accordingly, the reconfigurable circuit according to this example does not include the counter element 103 and the enable delay element 108, unlike the conventional reconfigurable LSI 101. Thus, the reconfigurable circuit according to this example has the same advantages as those in examples 6 to 8.
A processing element and a reconfigurable circuit including the same according to example 10 of the embodiment are described with reference to
The configuration and performance of the processing element 7 according to this example are the same as those of the processing element 7 according to example 9 except that the pieces of data added in the accumulating adder circuit 21 are pieces of multiplication data output from the multiplier circuit 13, and thus the corresponding description is omitted.
As described above, the processing element 7 according to this example functions as a product-sum operation element. Thus, the reconfigurable circuit including the processing element 7 according to this example does not include the counter element 103, the data delay element 107, and the enable delay element 108, unlike the conventional reconfigurable LSI 101. Thus, the reconfigurable circuit according to this example can have the same advantages as those in examples 6 to 9.
A processing element and a reconfigurable circuit including the same according to example 11 of the embodiment are described with reference to
As described above, according to this example, the processing element 7 can perform a product-sum operation on some of many bits of the input data DI and the data Dx. The reconfigurable circuit including the processing element 7 according to this example does not include the counter element 103, the data delay element 107, and the enable delay element 108, unlike the conventional reconfigurable LSI 101. Thus, the reconfigurable circuit according to this example can have the same advantages as those in examples 6 to 10.
A processing element and a reconfigurable circuit including the same according to example 12 of the embodiment are described with reference to
At the initial setting, the coefficients a02, a01, and a00 of the spatial filter 111 illustrated in
Next, performance of the processing elements 7a, 7b, and 7c and the reconfigurable circuit 1 including the same according to this example is described with reference to
When a clock signal rises after the initial setting, for example, in synchronization with a rising edge of the clock signal, the registers 3Rn in the last stages of the processing elements 7a, 7b, and 7c output the coefficients a00, a10, and a20 to the registers 3Rn-2 in the n-2-th stages via the selectors 4Sn-3, respectively, the registers 3Rn-1 in the n-1-th stages output the coefficients a01, a11, and a21 to the registers 3Rn in the last stages via the selectors 4Sn-1 (not illustrated), respectively, and the registers 3Rn-2 in the n-2-th stages output the coefficients a02, a12, and a22 to the registers 3Rn-1 in the n-1-th stages via the selectors 4Sn-2 (not illustrated), respectively. Also, in synchronization with a rising edge of the clock signal, the registers 3Rn in the last stages output the coefficients a00, a10, and a20 to the selectors 15 via the output terminals 5, respectively. The selectors 15 output the coefficients a00, a10, and a20 to the registers 22, respectively.
At the same time when the coefficients a00, a10, and a20 are output from the registers 3Rn in the last stages, the input data DI0 composed of the image data segments x00, x10, and x20 illustrated in
Assume that each of the multiplier circuits 13 has a function of performing operation on the low 8 bits on the LSB side of the input data DI and the 8-bit coefficients a00 to a0n. The shift/mask circuit 17 of the processing element 7a shifts 16 bits of the input data DI0 to the right, shifts the image data segment x00 to the low 8 bits, and then performs a bit mask process on the high 18 bits. The shift/mask circuit 17 of the processing element 7b shifts 8 bits of the input data DI0 to the right, shifts the image data segment x10 to the low 8 bits, and then performs a bit mask process on the high 18 bits. The shift/mask circuit 17 of the processing element 7c does not perform a bit shift process and performs a bit mask process on the high 18 bits. Accordingly, the image data segments x00, x10, and x20 to be operated are shifted to the low 8 bits. The pieces of input data DI0 on which a bit shift process and a bit mask process have been done in the shift/mask circuits 17 are output to the registers 19, respectively.
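The bit shift and bit mask processes described above can be sketched as follows. The sketch assumes that the image data segments x00, x10, and x20 are packed into the input data DI0 from the high side to the low side in 8-bit fields, which is consistent with the shift amounts of 16, 8, and 0 bits; the sample values and the function name `shift_mask` are hypothetical, and the total word width of the input data is not modeled.

```python
def shift_mask(di, shift, width=8):
    """Shift the input right, then mask the high bits so only the low `width` bits remain."""
    return (di >> shift) & ((1 << width) - 1)

# Hypothetical 8-bit image data segments packed into one input word.
x00, x10, x20 = 0x12, 0x34, 0x56
di0 = (x00 << 16) | (x10 << 8) | x20

seg_a = shift_mask(di0, 16)  # processing element 7a: shift 16 bits
seg_b = shift_mask(di0, 8)   # processing element 7b: shift 8 bits
seg_c = shift_mask(di0, 0)   # processing element 7c: no shift, mask only
print(hex(seg_a), hex(seg_b), hex(seg_c))  # 0x12 0x34 0x56
```

Each processing element thereby receives the same input word but presents only its own segment to the multiplier circuit 13.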
After a clock signal rises again after the above-described performance, in synchronization with a rising edge of the clock signal, the registers 19 of the processing elements 7a, 7b, and 7c output the input data DI0 to the multipliers 13a, and the registers 22 output the coefficients a00, a10, and a20 to the multipliers 13a, respectively. The multipliers 13a of the processing elements 7a, 7b, and 7c multiply the input data DI0 by the coefficients a00, a10, and a20, respectively, and output multiplication data to the registers 13b. The registers 13b of the processing elements 7a, 7b, and 7c hold the multiplication data.
The shift registers 3 perform in synchronization with a rising edge of the clock signal. Accordingly, the coefficients a01, a11, and a21 are held in the registers 3Rn-2 in the n-2-th stages, the coefficients a00, a10, and a20 are held in the registers 3Rn-1 in the n-1-th stages, and the coefficients a02, a12, and a22 are held in the registers 3Rn in the last stages. At that time, the coefficients a01, a11, and a21 output from the registers 3Rn in the last stages are held in the registers 22, respectively. Furthermore, a bit shift process and a bit mask process are performed by the shift/mask circuits 17 on the input data DI1 (see
When the clock signal rises again after the above-described performance, in synchronization with a rising edge of the clock signal, pieces of multiplication data output from the registers 13b of the processing elements 7a, 7b, and 7c are output to the adders 21a, respectively. Since the registers 21b hold data of “0”, the adders 21a add the multiplication data and the data of “0” and output addition data to the registers 21b. The registers 21b of the processing elements 7a, 7b, and 7c hold the addition data output from the adders 21a as held addition data. For example, in the processing element 7a, the held addition data is a00×x00.
In synchronization with a rising edge of the clock signal, the registers 19 of the processing elements 7a, 7b, and 7c output the input data DI1 to the multipliers 13a, and the registers 22 output the coefficients a01, a11, and a21 to the multipliers 13a. The multipliers 13a of the processing elements 7a, 7b, and 7c multiply the input data DI1 by the coefficients a01, a11, and a21, respectively, and output multiplication data to the registers 13b. The registers 13b of the processing elements 7a, 7b, and 7c hold the multiplication data.
The shift registers 3 perform in synchronization with a rising edge of the clock signal. Accordingly, the coefficients a02, a12, and a22 are held in the registers 3Rn-2 in the n-2-th stages, the coefficients a01, a11, and a21 are held in the registers 3Rn-1 in the n-1-th stages, and the coefficients a00, a10, and a20 are held in the registers 3Rn in the last stages. At that time, the coefficients a02, a12, and a22 output from the registers 3Rn in the last stages are held in the registers 22. Furthermore, a bit shift process and a bit mask process are performed by the shift/mask circuits 17 on the input data DI2 (see
When the clock signal rises again after the above-described performance, the registers 21b of the processing elements 7a, 7b, and 7c output the data held therein as output data DOa, DOb, and DOc to the outside of the processing elements 7a, 7b, and 7c, in synchronization with a rising edge of the clock signal.
In synchronization with a rising edge of the clock signal, the registers 13b of the processing elements 7a, 7b, and 7c output multiplication data held therein to the adders 21a, and the registers 21b output addition data held therein to the adders 21a. The adders 21a add the input multiplication data and the held addition data and output the generated data to the registers 21b. The registers 21b of the processing elements 7a, 7b, and 7c hold the addition data output from the adders 21a as held addition data. For example, in the processing element 7a, the held addition data is a00×x00+a01×x01.
In synchronization with a rising edge of the clock signal, the registers 19 of the processing elements 7a, 7b, and 7c output the input data DI2 to the multipliers 13a, and the registers 22 output the coefficients a02, a12, and a22 to the multipliers 13a. The multipliers 13a of the processing elements 7a, 7b, and 7c multiply the input data DI2 by the coefficients a02, a12, and a22, respectively, and output multiplication data to the registers 13b. The registers 13b of the processing elements 7a, 7b, and 7c hold the multiplication data.
The shift registers 3 perform in synchronization with a rising edge of the clock signal. Accordingly, the coefficients a00, a10, and a20 are held in the registers 3Rn-2 in the n-2-th stages, the coefficients a02, a12, and a22 are held in the registers 3Rn-1 in the n-1-th stages, and the coefficients a01, a11, and a21 are held in the registers 3Rn in the last stages. At that time, the coefficients a00, a10, and a20 output from the registers 3Rn in the last stages are held in the registers 22, respectively. Furthermore, a bit shift process and a bit mask process are performed by the shift/mask circuits 17 on input data DI3 (not illustrated) that is input in synchronization with a rising edge of the clock signal, and the input data DI3 is held in the respective registers 19.
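The circulation of the coefficients through the last three stages can be traced with a small sketch for the processing element 7a. It assumes a three-stage loop in which the register 3Rn feeds both the register 22 and the register 3Rn-2, as described above; the tuple ordering (3Rn-2, 3Rn-1, 3Rn) and the function name `rotate` are hypothetical conventions chosen for this illustration.

```python
def rotate(stages):
    """Advance the 3-stage loop by one clock.

    `stages` is (3Rn-2, 3Rn-1, 3Rn). The last stage is sent to register 22
    and fed back to 3Rn-2, while the other stages shift toward the output.
    """
    r_n2, r_n1, r_n = stages
    return (r_n, r_n2, r_n1), r_n

# Initial setting for processing element 7a: a02, a01, a00 from 3Rn-2 to 3Rn.
stages = ("a02", "a01", "a00")
to_register_22 = []
history = []
for _ in range(4):
    stages, out = rotate(stages)
    to_register_22.append(out)
    history.append(stages)

print(to_register_22)  # ['a00', 'a01', 'a02', 'a00']
print(history[1])      # ('a01', 'a00', 'a02'), state after the second clock
```

The coefficients reach the register 22 in the order a00, a01, a02 and then repeat, so each arriving piece of input data is paired with the next coefficient without any external counter.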
The reconfigurable circuit 1 repeats the above-described performance. After 5 clock cycles from the initial setting state, the processing element 7a outputs output data DOa satisfying expression (4), the processing element 7b outputs output data DOb satisfying expression (5), and the processing element 7c outputs output data DOc satisfying expression (6). The reconfigurable circuit 1 adds the output data DOa, DOb, and DOc from the processing elements 7a, 7b, and 7c by using an adder circuit (not illustrated), so as to calculate the image data Y11 expressed by expression (7) as new image data of the image data x11 of the target pixel in the operation execution unit x11 illustrated in
Y11=a00×x0(i−1)+a01×x0i+a02×x0(i+1)+a10×x1(i−1)+a11×x1i+a12×x1(i+1)+a20×x2(i−1)+a21×x2i+a22×x2(i+1) (8)
Herein, note that i is an integer satisfying 1≦i≦n-1.
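Expression (8) can be checked numerically with a short sketch. It assumes that the coefficients are stored as a 3×3 list `a[r][c]` corresponding to a00 to a22 and that the image data is stored as three rows `x[r][k]`; the function names and sample values are hypothetical. The three row sums stand in for the output data of the processing elements 7a, 7b, and 7c, which are then added together.

```python
def row_product_sum(a_row, x_row, i):
    """Product-sum of one coefficient row with x_row[i-1], x_row[i], x_row[i+1]."""
    return sum(a_row[c] * x_row[i - 1 + c] for c in range(3))

def filter_pixel(a, x, i):
    """Evaluate expression (8) by adding the three per-row product sums."""
    return sum(row_product_sum(a[r], x[r], i) for r in range(3))

# Hypothetical coefficients a00..a22 and image data rows x0, x1, x2.
a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
x = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0]]

print(filter_pixel(a, x, 1))  # 15 = a00 + a11 + a22 for this input
print(filter_pixel(a, x, 2))  # 12 = a10 + a21 for this input
```

Moving i by one slides the 3×3 window one pixel along the rows, which is why the same coefficient sequence can be reused for every target pixel.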
In the reconfigurable circuit 1, reading from the line buffers LB0 to LB2 starts before all the image data segments x00 to x22 included in the operation execution unit x11 have been read. Therefore, the reconfigurable circuit 1 can perform a pipeline process.
As described above, according to this example, the reconfigurable circuit includes the processing elements 7a, 7b, and 7c and thus does not include the counter element 137, the data delay element 136, and the RAM elements 121, 122, and 123, unlike the conventional reconfigurable LSI 201. Accordingly, the space occupied by processing elements in the semiconductor chip can be saved. Also, in the reconfigurable circuit 1 according to this example, the chip size can be reduced because the number of processing elements is reduced. Furthermore, in the reconfigurable circuit 1, the network is not occupied by the counter element and the like, unlike in the conventional circuit, so that the wiring load is reduced and high-speed performance can be realized.
Although a few preferred embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
2007-42361 | Feb 2007 | JP | national |