The present invention relates to a data processing circuit comprising at least a first functional unit able to perform an n-taps polyphase filtering and a second functional unit able to perform an m-taps polyphase filtering, m and n being integers greater than or equal to two, as well as a memory device able to store data and coefficients.
The invention finds an application, for example, in an image processing system, in particular in a real-time system.
Some image processing systems use polyphase filters. For example, when video data are broadcast in a high-definition format, it is necessary to convert them into a standard format in order to be able to display them on a television whose screen is not compatible with the high-definition format. A polyphase filter in particular makes it possible to perform such a conversion with good quality.
U.S. Pat. No. 5,383,155, granted on 17 Jan. 1995, describes several embodiments of polyphase filters. In one of the embodiments, the polyphase filter described is a 64-taps polyphase filter consisting of eight 8-taps polyphase filters placed in series.
Data are received in series one after another by the filter. These data correspond for example to pixel values P1 to P8 of an input image. In addition, a clock controls the registers. At each clock cycle, a data item is received at the register 101. When a data item arrives at the register 101, the data item situated in the register 101 shifts towards the register 102, the data item situated in the register 102 shifts towards the register 103 and so on. Thus, after eight clock cycles, the pixel value P8 is situated in the register 101, the pixel value P7 in the register 102 and so on. The multipliers then calculate values c8*P8, c7*P7 and so on. The adder 120 next calculates a result S:
S=c1*P1+c2*P2+c3*P3+c4*P4+c5*P5+c6*P6+c7*P7+c8*P8.
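The shift-and-sum operation described above can be sketched in Python as follows. This is an illustrative model only: the function name `shift_register_fir` and the use of a list to stand for the chain of registers starting at register 101 are assumptions made for the sketch and do not come from the cited patent.

```python
def shift_register_fir(samples, coeffs):
    """Model an n-taps filter built from a chain of n registers.

    registers[0] stands for register 101 (newest data item) and
    registers[-1] for the last register of the chain (oldest data item).
    coeffs is given oldest first: coeffs[0] is c1, applied to P1.
    """
    n = len(coeffs)
    registers = [0] * n
    for sample in samples:
        # One clock cycle: every value moves one register down the chain
        # and the new data item enters the first register.
        registers = [sample] + registers[:-1]
    # After n cycles registers == [Pn, ..., P1]; multiply and add.
    return sum(c * p for c, p in zip(reversed(coeffs), registers))


# Eight pixel values and eight coefficients, as in the sum S above.
S = shift_register_fir([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8)
```

Because the registers are wired in a fixed chain, the number of taps in this model is fixed by the number of registers, which is the limitation discussed in the following paragraphs.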
A drawback of such a filter lies in the fact that it carries out only a polyphase filtering with a fixed number of taps. This is because, once the filters 201 to 208 in
However, current video processing systems require various types of polyphase filtering, in particular because of the large number of image formats used in television. Consequently, if it is wished to use the teachings of the patent cited above, it is necessary to provide in this circuit as many polyphase filters as there are types of polyphase filtering required. Such a solution has many drawbacks, in particular because these circuits occupy a large surface area of silicon in the circuit.
It is an object of the invention to propose a processing circuit occupying a small surface area and making it possible to perform various types of polyphase filtering.
A processing circuit according to the invention as defined in the opening paragraph is characterized in that the functional units are able to receive in parallel data and coefficients coming from the memory device, calculate results from said data and coefficients and supply these results to the memory device.
According to the invention, the data to be processed by a functional unit are directly sent by the memory device. The functional units communicate by means of the memory device. Thus the functional units are not physically connected to each other, which makes it possible to perform various types of polyphase filtering, by suitably programming the processing circuit.
For example, the processing circuit can comprise ten functional units, each being able to perform a 2-taps polyphase filtering. In this case, it is possible, as will be seen in more detail below, to perform a 2-taps polyphase filtering, a 4-taps polyphase filtering and so on up to a 20-taps polyphase filtering. For example, for a 10-taps polyphase filtering, five functional units calculate intermediate results from two data items and these intermediate results, sent to the memory device, are then added in order to obtain a final result.
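The 10-taps example above can be sketched as follows. The names `two_tap_unit` and `filter_with_two_tap_units` are hypothetical, and the list `memory` is only a stand-in for the memory device; the sketch shows the arithmetic split, not the timing of the circuit.

```python
def two_tap_unit(data_pair, coeff_pair):
    """One functional unit: a 2-taps multiply-add on two data items."""
    (p_a, p_b), (c_a, c_b) = data_pair, coeff_pair
    return c_a * p_a + c_b * p_b


def filter_with_two_tap_units(data, coeffs):
    """Compute sum(c_i * P_i) for an even number of taps using 2-taps units.

    Each unit produces an intermediate result that is written to a list
    standing in for the memory device; the intermediate results are then
    added. Returns the final result and the number of units used.
    """
    if len(data) != len(coeffs) or len(data) % 2:
        raise ValueError("expected equally long, even-length inputs")
    memory = [
        two_tap_unit(data[i:i + 2], coeffs[i:i + 2])
        for i in range(0, len(data), 2)
    ]
    return sum(memory), len(memory)


# A 10-taps filtering uses five 2-taps units.
result, units_used = filter_with_two_tap_units(list(range(1, 11)), [1] * 10)
```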
Advantageously, at least one functional unit is able to function according to a direct mode and a transposed mode, the circuit comprising control means for controlling the functioning mode of said functional unit.
This makes it possible, with the same processing circuit, to perform a polyphase filtering in direct or transposed mode, which increases the number of types of polyphase filtering which can be performed by this processing circuit, without considerably increasing the surface area of this circuit.
Preferably, at least one functional unit is also able to perform a multiplication-accumulation using two data items coming from the memory device. This increases still further the processing capabilities of such a circuit.
The processing circuit can simultaneously perform a polyphase filtering and one or more multiplication-accumulation operations. Since the data are sent to the functional units by the memory device, each functional unit is independent of the others; consequently the functional units can perform tasks which are different from each other.
Advantageously, the processing circuit comprises a crossbar able to provide a transfer of data, coefficients and results between the memory device and at least one functional unit.
Such a crossbar ensures rapid communications in parallel of data between the memory device and the functional units, as well as good management of such communications.
The invention will be further described with reference to examples of embodiments shown in the drawings to which, however, the invention is not restricted.
FIGS. 4a and 4b depict input and output image pixels for filtering in direct mode and transposed mode.
The data storage device 301, the coefficient storage device 302 and the result storage device 308 form a memory device. The reading crossbar 303 and the writing crossbar 307 form a crossbar.
The memory device can comprise a single physical entity, for example a register bank able to store data, coefficients and results. The crossbar can also consist of a single physical entity.
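The arrangement of a single register bank with a reading and a writing crossbar can be sketched as follows. Everything in this sketch is an assumption made for illustration: the class `RegisterBank`, the functions `read_crossbar` and `write_crossbar`, the port names and the register indices are hypothetical, and a dictionary of indices stands in for the control signals that select which register feeds which input.

```python
class RegisterBank:
    """Stand-in for a memory device made of a single register bank."""

    def __init__(self, size):
        self.cells = [0] * size


def read_crossbar(bank, select):
    """Deliver register contents to several unit inputs in the same cycle.

    select maps each unit input name to the index of the register to read.
    """
    return {port: bank.cells[index] for port, index in select.items()}


def write_crossbar(bank, results, select):
    """Store each unit result in the register chosen by select."""
    for unit, value in results.items():
        bank.cells[select[unit]] = value


# Example: two 2-taps units fed in parallel, results written back.
bank = RegisterBank(10)
bank.cells[0:8] = [1, 2, 3, 4, 10, 20, 30, 40]  # P1..P4, then c1..c4
inputs = read_crossbar(bank, {
    "u304.p_a": 0, "u304.p_b": 1, "u304.c_a": 4, "u304.c_b": 5,
    "u306.p_a": 2, "u306.p_b": 3, "u306.c_a": 6, "u306.c_b": 7,
})
results = {
    "u304": inputs["u304.c_a"] * inputs["u304.p_a"]
            + inputs["u304.c_b"] * inputs["u304.p_b"],
    "u306": inputs["u306.c_a"] * inputs["u306.p_a"]
            + inputs["u306.c_b"] * inputs["u306.p_b"],
}
write_crossbar(bank, results, {"u304": 8, "u306": 9})
```

In this model the functional units never exchange values directly; every value passes through the register bank, which is what allows the routing to be changed by programming.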
The first functional unit 304 is able to perform a 2-taps polyphase filtering, the second functional unit 305 a 4-taps polyphase filtering and the third functional unit 306 a 2-taps polyphase filtering. A functional unit can receive in one clock cycle a data item coming from the data storage device 301. This data item corresponds for example to a pixel value of an input image, for example a chrominance value.
Assume that it is wished to perform, with the processing circuit of
A first solution comprises using solely the second functional unit 305. The data to be processed and the coefficients are sent to this functional unit, which processes them in the same way as in the prior art and supplies results which correspond for example to pixel values of an output image. The reading crossbar 303 comprises multiplexers controlled by a control system, not shown in
A second solution consists of using the first functional unit 304 and the third functional unit 306. Assume, as indicated in
P′1=c11*P1+c12*P2+c13*P3+c14*P4
P′2=c21*P1+c22*P2+c23*P3+c24*P4
P′3=c31*P1+c32*P2+c33*P3+c34*P4
P′4=c41*P1+c42*P2+c43*P3+c44*P4
P′5=c51*P1+c52*P2+c53*P3+c54*P4
P′6=c61*P1+c62*P2+c63*P3+c64*P4
P′7=c71*P2+c72*P3+c73*P4+c74*P5
During a first clock cycle, the value P1 is sent to the first functional unit 304 able to perform a direct 2-taps polyphase filtering, the value P3 is sent to the third functional unit 306 able to perform a direct 2-taps polyphase filtering, the coefficients c11 and c12 are sent to the first functional unit 304 and the coefficients c13 and c14 are sent to the third functional unit 306. During a second clock cycle, the value P2 is sent to the first functional unit 304 and the value P4 is sent to the third functional unit 306.
The first functional unit 304 then calculates a first intermediate result c11*P1+c12*P2 and the third functional unit 306 calculates a second intermediate result c13*P3+c14*P4. These intermediate results are sent to the result storage device 308 by means of the writing crossbar 307. Once stored in the memory device, these intermediate results can subsequently be added in order to obtain the value P′1, by means of an adder, not shown in
At the next clock cycle, the coefficients c21 and c22 are sent to the first functional unit 304 and the coefficients c23 and c24 are sent to the third functional unit 306. The first functional unit 304 then calculates an intermediate result c21*P1+c22*P2 and the third functional unit 306 calculates another intermediate result c23*P3+c24*P4. These intermediate results are sent to the result storage device 308 by means of the writing crossbar 307.
The same procedure is followed for calculating the values P′3 to P′6.
At the clock cycle following the calculation of c61*P1+c62*P2 and c63*P3+c64*P4, the value P3 is sent to the first functional unit 304, the value P5 is sent to the third functional unit 306, the coefficients c71 and c72 are sent to the first functional unit 304 and the coefficients c73 and c74 are sent to the third functional unit 306. The first functional unit 304 then calculates an intermediate result c71*P2+c72*P3 and the third functional unit 306 calculates another intermediate result c73*P4+c74*P5. These intermediate results are sent to the result storage device 308 by means of the writing crossbar 307.
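The procedure for P′1 to P′7 can be sketched as follows. The function names are hypothetical, and the parameter `shift_after` is an assumption that generalizes the example: it states after how many output values the window of four input values advances by one (six in the example above). The sketch reproduces the arithmetic only, not the cycle-by-cycle timing.

```python
def direct_two_tap(p_old, p_new, c_old, c_new):
    """Direct 2-taps unit: c_old * p_old + c_new * p_new."""
    return c_old * p_old + c_new * p_new


def direct_four_tap_outputs(pixels, phases, shift_after):
    """4-taps direct polyphase filtering with two 2-taps units.

    pixels      -- input values, pixels[0] is P1
    phases      -- one row (ck1, ck2, ck3, ck4) per output value P'k
    shift_after -- outputs computed before the window advances by one pixel
    """
    outputs = []
    for k, (c1, c2, c3, c4) in enumerate(phases):
        start = k // shift_after              # window P1..P4, then P2..P5
        w = pixels[start:start + 4]
        part_304 = direct_two_tap(w[0], w[1], c1, c2)  # first unit 304
        part_306 = direct_two_tap(w[2], w[3], c3, c4)  # third unit 306
        outputs.append(part_304 + part_306)   # added once both are stored
    return outputs


# Seven output values from five input values, all coefficients set to 1.
values = direct_four_tap_outputs([1, 2, 3, 4, 5], [[1, 1, 1, 1]] * 7, 6)
```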
If it is wished to perform a 6-taps polyphase filtering with the processing circuit of
If it is wished to perform an 8-taps polyphase filtering, the three functional units 304 to 306 are used.
The processing circuit of
Consequently the processing circuit according to the invention makes it possible to perform several types of polyphase filtering, requiring a surface area comparable with that required in the prior art. This is because the functional units, as will be seen in more detail in
The example described above applies to a direct polyphase filtering. It is possible, with the processing circuit according to the invention, to perform a transposed polyphase filtering if functional units able to perform a transposed polyphase filtering are available.
Assume, as indicated in
P″1=c11*P1+c12*P2+c13*P3+c14*P4
P″2=c21*P2+c22*P3+c23*P4+c24*P5
During a first clock cycle, the value P1 is sent to the first functional unit 304 able to perform a transposed 2-taps polyphase filtering, the value P3 is sent to the third functional unit 306 able to perform a transposed 2-taps polyphase filtering, the coefficients c11 and 0 are sent to the first functional unit 304 and the coefficients c13 and 0 are sent to the third functional unit 306. The value c11*P1 is then calculated and stored in a register of the first functional unit 304. In the same way, the value c13*P3 is calculated and stored in a register of the third functional unit 306.
During a second clock cycle, the value P2 is sent to the first functional unit 304, the value P4 is sent to the third functional unit 306, the coefficients c21 and c12 are sent to the first functional unit 304 and the coefficients c23 and c14 are sent to the third functional unit 306. The first functional unit 304 then calculates the value c11*P1+c12*P2 and the third functional unit 306 calculates the value c13*P3+c14*P4. These values are sent to the result storage device 308.
During a third clock cycle, the value P3 is sent to the first functional unit 304, the value P5 is sent to the third functional unit 306, the coefficients 0 and c22 are sent to the first functional unit 304 and the coefficients 0 and c24 are sent to the third functional unit 306. The first functional unit 304 then calculates the value c21*P2+c22*P3 and the third functional unit 306 calculates the value c23*P4+c24*P5. These values are sent to the result storage device 308.
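The three clock cycles above can be checked with the following sketch. The function `transposed_two_tap` is a hypothetical model of a unit in transposed mode, the numeric values are arbitrary, and the final addition of the two stored values to form P″1 and P″2 is an assumption inferred from the definitions of P″1 and P″2.

```python
def transposed_two_tap(stream):
    """Transposed 2-taps unit.

    stream gives (P, c_first, c_second) for each clock cycle. The product
    c_first * P is held for one cycle; the unit outputs the held product
    plus c_second * P of the current cycle.
    """
    held = 0
    outputs = []
    for p, c_first, c_second in stream:
        outputs.append(held + c_second * p)
        held = c_first * p
    return outputs


P1, P2, P3, P4, P5 = 10, 20, 30, 40, 50
c11, c12, c13, c14 = 1, 2, 3, 4
c21, c22, c23, c24 = 5, 6, 7, 8

# Coefficient pairs per cycle, as in the three cycles described above.
unit_304 = transposed_two_tap([(P1, c11, 0), (P2, c21, c12), (P3, 0, c22)])
unit_306 = transposed_two_tap([(P3, c13, 0), (P4, c23, c14), (P5, 0, c24)])

out_1 = unit_304[1] + unit_306[1]  # values available after the second cycle
out_2 = unit_304[2] + unit_306[2]  # values available after the third cycle
```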
It can thus be seen that the processing circuit according to the invention makes it possible to reduce the time required by an initialization of the filtering. This is because, in order to perform a transposed polyphase filtering using five pixel values, as indicated in
This functional unit can function according to a direct mode and a transposed mode. When the functional unit functions in direct mode, the multiplexers 511 to 514, controlled by a control circuit, not shown in
Take the example detailed in the description of
At the following clock cycle, the coefficients c11 and c12 are replaced by the coefficients c21 and c22. The value P2 is reinjected into the register 501 by means of the multiplexer 515. Likewise, the value P1 is reinjected into the register 502 by means of the multiplexer 516. The functional unit then calculates the value c22*P2+c21*P1. The same procedure is followed for calculating P′3 to P′6.
When, for calculating P′7, the value P3 is sent into the register 501, the value P2 is sent into the register 502. The multipliers 521 and 522 then calculate the values c72*P3 and c71*P2 and the adder 531 calculates the value c71*P2+c72*P3, which is sent to the result storage device 308.
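The direct-mode behaviour of the registers 501 and 502 can be sketched as follows. The class `DirectTwoTapUnit` is hypothetical and simplified: it computes the adder output in the same call that loads the registers, and it models reinjection through the multiplexers 515 and 516 simply as the registers keeping their values when no new data item is supplied.

```python
class DirectTwoTapUnit:
    """Simplified register-level model of a functional unit in direct mode."""

    def __init__(self):
        self.r501 = 0  # most recent data item
        self.r502 = 0  # previous data item

    def clock(self, coeff_new, coeff_old, new_item=None):
        """Advance one clock cycle and return the adder output.

        new_item=None models reinjection: both registers keep their values.
        Otherwise the content of 501 moves to 502 and new_item enters 501.
        """
        if new_item is not None:
            self.r502 = self.r501
            self.r501 = new_item
        # Multipliers 521 and 522 followed by adder 531.
        return coeff_new * self.r501 + coeff_old * self.r502


unit = DirectTwoTapUnit()
unit.clock(0, 0, new_item=1)              # P1 enters register 501
first = unit.clock(100, 10, new_item=2)   # c11*P1 + c12*P2 with c11=10, c12=100
second = unit.clock(1, 1)                 # new coefficients, same data items
seventh = unit.clock(3, 2, new_item=3)    # c71*P2 + c72*P3 with c71=2, c72=3
```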
Take the example detailed in the description of
At the first clock cycle, the value P1 is sent to the multipliers 521 and 522, the coefficient c11 is sent to the multiplier 521 and a zero coefficient is sent to the multiplier 522. The value c11*P1 is then calculated and stored in the register 503.
At the second clock cycle, the value P2 is sent to the multipliers 521 and 522, the coefficient c21 is sent to the multiplier 521 and the coefficient c12 is sent to the multiplier 522. The value c21*P2 is then calculated and stored in the register 503, whilst the adder 531 calculates the value c11*P1+c12*P2, which is stored in the register 504 and will be sent to the result storage device 308 at the third clock cycle.
At the third clock cycle, the value P3 is sent to the multipliers 521 and 522, the coefficient c22 is sent to the multiplier 522 and a zero coefficient is sent to the multiplier 521. The value c22*P3+c21*P2 is then calculated and stored in the register 504 and will be sent to the result storage device 308 at the following clock cycle.
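The role of the registers 503 and 504 in transposed mode, including the one-cycle delay before a result leaves the unit, can be sketched as follows. The class `TransposedTwoTapUnit` is a hypothetical model and the numeric values are arbitrary.

```python
class TransposedTwoTapUnit:
    """Register-level model of a functional unit in transposed mode."""

    def __init__(self):
        self.r503 = 0  # product of multiplier 521 from the previous cycle
        self.r504 = 0  # adder result waiting to be sent to storage

    def clock(self, p, coeff_521, coeff_522):
        """One clock cycle; returns the value leaving register 504."""
        sent_to_storage = self.r504
        # Adder 531: previous content of 503 plus the product of 522.
        self.r504 = self.r503 + coeff_522 * p
        # Multiplier 521: its product is kept in 503 for the next cycle.
        self.r503 = coeff_521 * p
        return sent_to_storage


P1, P2, P3 = 10, 20, 30
c11, c12, c21, c22 = 1, 2, 5, 6

unit = TransposedTwoTapUnit()
trace = [
    unit.clock(P1, c11, 0),    # first cycle: c11*P1 goes to register 503
    unit.clock(P2, c21, c12),  # second cycle: c11*P1 + c12*P2 goes to 504
    unit.clock(P3, 0, c22),    # third cycle: first result leaves the unit
    unit.clock(0, 0, 0),       # following cycle: second result leaves
]
```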
When this functional unit has to perform a multiplication-accumulation, the multiplexers 511 to 514, controlled by a control circuit, not shown in
Assume for example that it is wished to calculate, from four data items P1 to P4, a value P1*P2+P3*P4. During a first clock cycle, the data item P1 is sent to the input denoted P and the data item P2 to the input denoted c2. The value P1*P2 is then calculated by the multiplier 521 and stored in the register 503. During a second clock cycle, the value P1*P2 is sent to the register 504, the data item P3 is sent to the input denoted P and the data item P4 to the input denoted c2. The value P3*P4 is then calculated by the multiplier 521 and stored in the register 503. During a third clock cycle, the adder 531 performs the addition between the values P1*P2 and P3*P4, the result of this addition then being stored in the register 504 and being able to be sent to the result storage device 308 at the following clock cycle.
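The three cycles of this multiplication-accumulation can be sketched as follows. The class `MacUnit` and the function `multiply_accumulate` are hypothetical; the sketch assumes that both registers are cleared before the first cycle and that the same pattern extends to more than two products, which goes beyond the example above.

```python
class MacUnit:
    """Register-level model of a unit in multiplication-accumulation mode."""

    def __init__(self):
        self.r503 = 0  # latest product from multiplier 521
        self.r504 = 0  # running sum

    def clock(self, a=None, b=None):
        """One clock cycle; returns the content of register 504."""
        # Adder 531: the product held in 503 is added to the sum in 504.
        self.r504 = self.r504 + self.r503
        # Multiplier 521: a new product, or zero when no data are supplied.
        self.r503 = a * b if a is not None else 0
        return self.r504


def multiply_accumulate(pairs):
    """Sum of products a*b over pairs, using one extra cycle to finish."""
    unit = MacUnit()
    for a, b in pairs:
        unit.clock(a, b)
    return unit.clock()


# P1*P2 + P3*P4 with P1..P4 = 2, 3, 4, 5 takes three clock cycles.
total = multiply_accumulate([(2, 3), (4, 5)])
```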
A multiplication-accumulation of this type is used for example for performing a multiplication of matrices or a convolutional filtering.
A functional unit of this type is able to perform various types of filtering. When a functional unit of this type is integrated in a circuit according to the invention, it can therefore perform various processings, independently of the other functional units. For example, assuming that the functional units 304 to 306 of
Naturally, because of the great flexibility of the processing circuit according to the invention, a large number of simultaneous processings can be conceived of, according to the number and type of functional units.
A circuit like the one depicted in
The verb “to comprise” and its conjugations should be interpreted broadly, that is to say as not excluding the presence not only of elements other than those listed after the said verb but also a plurality of elements already listed after said verb and preceded by the article “a” or “one”.
Number | Date | Country | Kind |
---|---|---|---|
02 09745 | Jul 2002 | FR | national |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB03/03061 | 7/9/2003 | WO | 00 | 1/26/2005 |

Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2004/013963 | 2/12/2004 | WO | A |

Number | Name | Date | Kind |
---|---|---|---|
4592027 | Masaki | May 1986 | A |
4785411 | Thompson et al. | Nov 1988 | A |
4864574 | Pritt | Sep 1989 | A |
4928265 | Higuchi et al. | May 1990 | A |
4953130 | Houston | Aug 1990 | A |
4954992 | Kumanoya et al. | Sep 1990 | A |
4975877 | Bell | Dec 1990 | A |
5027325 | Katsura | Jun 1991 | A |
5031150 | Ohsawa | Jul 1991 | A |
5077690 | Smith | Dec 1991 | A |
5383145 | Sakiyama et al. | Jan 1995 | A |
5383155 | Ta | Jan 1995 | A |
6308191 | Dujardin et al. | Oct 2001 | B1 |
6889238 | Johnson | May 2005 | B2 |
6963890 | Dutta et al. | Nov 2005 | B2 |
20010007573 | Kingston et al. | Jul 2001 | A1 |

Number | Date | Country |
---|---|---|
0942530 | Sep 1999 | EP |

Entry |
---|
Ouelette et al.: “BICMOS SRAM with Array-Integrated Sense Device”, IBM Burlington Technical Disclosure, pp. 1-3, May 1991. |

Number | Date | Country |
---|---|---|
20060036665 A1 | Feb 2006 | US |