The present application claims priority from Japanese patent application JP 2009-032687 filed on Feb. 16, 2009, the content of which is hereby incorporated by reference into this application.
The present invention relates to a filter processing technique and, further, to a filter processing module and a semiconductor device to which the technique is applied.
In a filter processing (convolution operation), filter coefficients are sequentially called, each of the read coefficients is subjected to product-sum operation with input data, and results are accumulated, thereby enabling an arithmetic operation of the number of taps exceeding the number of arithmetic logic units to be performed.
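As an illustration only, the accumulation described above can be sketched in Python. The function name `fir_with_limited_units` and its parameters are hypothetical and do not appear in the application; the sketch assumes a "valid" output range and applies the coefficients without reversal (product-sum form).

```python
def fir_with_limited_units(samples, coeffs, num_units):
    """Product-sum filter using at most num_units multipliers per pass.

    Hypothetical sketch: coefficients are read in chunks of num_units and
    the partial sums are accumulated, so the tap count may exceed num_units.
    """
    if num_units < 1:
        raise ValueError("num_units must be at least 1")
    taps = len(coeffs)
    out_len = len(samples) - taps + 1
    if out_len < 1:
        raise ValueError("not enough samples for the given tap count")
    acc = [0] * out_len
    # One pass per chunk of coefficients; each pass uses <= num_units products.
    for start in range(0, taps, num_units):
        chunk = coeffs[start:start + num_units]
        for n in range(out_len):
            for k, c in enumerate(chunk, start):
                acc[n] += c * samples[n + k]
    return acc
```

With three taps and two units, the sketch needs two passes and yields the same result as a single pass with three units.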
For example, patent document 1 discloses a digital filter configured so as not to increase the hardware scale even if the number of taps in a filter to be used increases. According to the technique, a device is controlled on the basis of a written filter coefficient or control data. Therefore, by changing data to be written into a memory, the filter and the sampling rate conversion rate can be changed without increasing the device scale.
Japanese Unexamined Patent Publication No. 2001-24479
However, when the inventors of the present invention examined the conventional filter processing technique, they found out that the efficiency of a two-dimensional filter processing on two-dimensional data such as an image has to be improved. In the following, an image will be used as an example of the two-dimensional data.
In many cases, the two-dimensional filter processing on an image is performed twice, once in the horizontal direction and once in the vertical direction of the image. The flow of processing is as follows. First, the number of pieces of data necessary for the second filter processing is sequentially supplied to a plurality of arithmetic logic units performing a first filter processing and, at the same time, the first filter processing is performed. Results of the first filter processing are sequentially supplied to a plurality of arithmetic logic units corresponding to the second filter processing, and the second filter processing is performed. Consequently, in the case where the number of pieces of data necessary for the second filter processing is larger than the number of arithmetic logic units performing the first filter processing, the first filter processing is performed a plurality of times until the processing on data necessary for the second filter processing is finished. As a result, the start of the second filter processing may be delayed. In the case where the number of pieces of data necessary for the second filter processing is much smaller than the number of arithmetic logic units performing the first filter processing, the number of arithmetic logic units performing the first filter processing is needlessly large.
The technique described in the patent document 1 does not adjust the number of pieces of data which is input per cycle in accordance with the number of taps of the filter processing and size of data generated by the plural arithmetic logic units simultaneously, and cannot solve the problem.
An object of the present invention is to provide a technique for improving efficiency of a two-dimensional filter processing on two-dimensional data such as an image.
The above and other objects and novel features of the present invention will become apparent from the description of the specification and the appended drawings.
A representative one of the inventions disclosed in the application will be briefly described as follows.
A filter processing module includes a filter circuit and a control circuit. The filter circuit includes: a first register capable of storing data; a first arithmetic logic unit capable of executing a first filter processing on the basis of output data of the first register; a second register capable of storing a result of the arithmetic operation of the first arithmetic logic unit; and a second arithmetic logic unit capable of executing a second filter processing on the basis of output data of the second register. The control circuit can adjust the number of pieces of data which is input per cycle in the first register in accordance with the number of taps in the first filter processing, size of an execution result of the first filter processing, and the number of second arithmetic logic units, thereby promptly completing the first filter processing.
An effect obtained by the representative one of the inventions disclosed in the application is briefly described as follows.
That is, according to the present invention, the efficiency of the filter processing on an image can be improved.
First, outline of representative embodiments of the present invention disclosed in the application will be described. Reference numerals of the drawings referred to in parentheses in the description of the outline of the representative embodiments merely illustrate components designated with the reference numerals included in the concept of the components.
(1) A filter processing module (100) according to a representative embodiment of the invention includes a filter circuit (208) that performs a filter processing on input data, and a control circuit that controls operation of the filter circuit. The filter circuit includes a first register (206) capable of storing input data to the filter processing module (100) and a first arithmetic logic unit (207) capable of executing a first filter processing on the basis of output data of the first register. The filter circuit further includes: a second register (206) capable of storing a result of the arithmetic operation of the first arithmetic logic unit, and a second arithmetic logic unit (207) capable of executing a second filter processing on the basis of output data of the second register. The control circuit can adjust the number of pieces of data which is input per cycle in the first register in accordance with the number of taps in the first filter processing, size of an execution result of the first filter processing, and the number of second arithmetic logic units.
With the configuration, the control circuit adjusts the number of pieces of data which is input per cycle in the first register in accordance with the number of taps in the first filter processing, size of an execution result of the first filter processing, and the number of second arithmetic logic units. Consequently, the first filter processing can be completed promptly, the result of the processing can be supplied to the second filter processing, and the timing of starting the second filter processing can be hastened as compared with the conventional technique.
(2) According to another aspect, the filter circuit may include a first register (206), a first arithmetic logic unit (207), a second register (206), a second arithmetic logic unit (207), and a third register (206). In the first register (206), the above-described data is stored. The first arithmetic logic unit (207) executes a first filter processing on the basis of output data of the first register. In the second register (206), a result of the arithmetic operation of the first arithmetic logic unit is stored. The second arithmetic logic unit (207) executes a second filter processing. In the third register (206), a result of the arithmetic operation of the second arithmetic logic unit is stored.
The control circuit adjusts the number of pieces of data which is input per cycle in the first register in accordance with the number of taps in the first filter processing, size of an execution result of the first filter processing, and the number of second arithmetic logic units. The control circuit adjusts the number of pieces of data which is input per cycle in the second register in accordance with the number of taps in the second filter processing, size of an execution result of the second filter processing, and the number of first arithmetic logic units.
With the configuration, the control circuit adjusts the number of pieces of data which is input per cycle in the first register in accordance with the number of taps in the first filter processing, size of an execution result of the first filter processing, and the number of second arithmetic logic units. Consequently, the first filter processing can be completed promptly, the result of the processing can be supplied to the second filter processing, and the timing of starting the second filter processing can be hastened as compared with the conventional technique. The control circuit also adjusts the number of pieces of data which is input per cycle in the second register in accordance with the number of taps in the second filter processing, size of an execution result of the second filter processing, and the number of first arithmetic logic units. Therefore, the case where the number of pieces of data necessary for the second filter processing is much smaller than the number of the arithmetic logic units performing the first filter processing can be avoided.
(3) In the configuration (2), the control circuit may include an arithmetic parameter calculator (204) capable of calculating an arithmetic parameter, and a control unit (202) that controls operation of the filter circuit on the basis of the arithmetic parameter.
The arithmetic parameter calculator may include a first tap-quantity register (301), a second tap-quantity register (311), a first arithmetic-element-quantity register (312), a second arithmetic-element-quantity register (302), a first output size register (303), a second output size register (313), a first filter processing number-of-times calculator (314), a second filter processing number-of-times calculator (304), a first input size calculator (305), and a second input size calculator (315). The first tap-quantity register (301) holds the number of taps in a first filter processing of an image. The second tap-quantity register (311) holds the number of taps in a second filter processing of an image. The first arithmetic-element-quantity register (312) holds the number of arithmetic logic units for the first filter processing. The second arithmetic-element-quantity register (302) holds the number of arithmetic logic units for the second filter processing. The first output size register (303) holds size of an execution result of the first filter processing. The second output size register (313) holds size of an execution result of the second filter processing. The first filter processing number-of-times calculator (314) calculates the number of times of the first filter processing from the number of taps in the second filter processing, the size of the execution result of the second filter processing, and the number of arithmetic logic units for the first filter processing. The second filter processing number-of-times calculator (304) calculates the number of times of the second filter processing from the number of taps in the first filter processing, the size of the execution result of the first filter processing, and the number of arithmetic logic units for the second filter processing.
The first input size calculator (305) calculates the number of pieces of data which is input per cycle to the first register from the number of taps in the first filter processing, the number of times of the second filter processing, and the size of the execution result of the first filter processing. The second input size calculator (315) calculates the number of pieces of data which is input per cycle to the second register from the number of taps in the second filter processing, the number of times of the first filter processing, and the size of the execution result of the second filter processing.
The control unit performs a filter processing in accordance with the number of pieces of data which is input per cycle to the first register, the number of pieces of data which is input per cycle to the second register, the number of times of the first filter processing, and the number of times of the second filter processing.
With the configuration, the first filter processing system and the second filter processing system are provided separately. Consequently, a first input size calculation result and a second input size calculation result can be obtained promptly.
(4) In the configuration (3), the control unit includes a CPU that executes an instruction for instructing update of the first tap-quantity register, the second tap-quantity register, the first output size register, the second output size register, the first arithmetic-element-quantity register, and the second arithmetic-element-quantity register.
(5) In the configuration (2), the filter processing module is coupled to a bus, receives an encoded image via the bus, adjusts the number of pieces of data which is input per cycle to the first register on the basis of a parameter in a stream as the encoded image, and adjusts the number of pieces of data which is input per cycle to the second register.
(6) According to another aspect, a semiconductor device can be configured by including an instruction decoder (1002), an arithmetic parameter calculator (1004), an index generator (1005), an internal register (1006), an arithmetic logic unit (1009), and a data generating circuit (1010). The instruction decoder (1002) decodes an input instruction. The arithmetic parameter calculator (1004) calculates the number of times of the first filter processing, the number of times of the second filter processing, and the number of pieces of data which is input per cycle to an arithmetic logic unit for the first filter processing, and calculates the number of pieces of data which is input per cycle to an arithmetic logic unit for the second filter processing on the basis of a parameter related to a filter processing, given via the instruction decoder. The index generator (1005) generates a corrected source index by correcting a source index fetched via the instruction decoder on the basis of the number of times of the first filter processing or the number of times of the second filter processing calculated by the arithmetic parameter calculator. The internal register (1006) outputs data corresponding to the source index. The arithmetic logic unit (1009) filters data output from the internal register. The data generating circuit (1010) receives an image, converts format of the image on the basis of an arithmetic parameter output from the arithmetic parameter calculator, and supplies the resultant to the internal register.
The arithmetic logic unit includes: a shift register (1007) capable of shifting data output from the internal register; and an SIMD arithmetic logic unit (1008) that computes output data of the shift register.
The arithmetic parameter calculator includes a first tap-quantity register (301), a second tap-quantity register (311), a first arithmetic-element-quantity register (312), a second arithmetic-element-quantity register (302), and a first output size register (303). The arithmetic parameter calculator also includes a second output size register (313), a first number-of-filter-processes calculator (314), a second number-of-filter-processes calculator (304), a first input size calculator (305), and a second input size calculator (315).
The first tap-quantity register (301) holds the number of taps in a first filter processing of an image. The second tap-quantity register (311) holds the number of taps in a second filter processing of an image. The first arithmetic-element-quantity register (312) holds the number of arithmetic logic units for the first filter processing. The second arithmetic-element-quantity register (302) holds the number of arithmetic logic units for the second filter processing. The first output size register (303) holds size of an execution result of the first filter processing. The second output size register (313) holds size of an execution result of the second filter processing. The first number-of-filter-processes calculator (314) calculates the number of times of the first filter processing from the number of taps in the second filter processing, the size of the execution result of the second filter processing, and the number of arithmetic logic units for the first filter processing. The second number-of-filter-processes calculator (304) calculates the number of times of the second filter processing from the number of taps in the first filter processing, the size of the execution result of the first filter processing, and the number of arithmetic logic units for the second filter processing. The first input size calculator (305) calculates the number of pieces of data which is input per cycle to the first register from the number of taps in the first filter processing, the number of times of the second filter processing, and the size of the execution result of the first filter processing. The second input size calculator (315) calculates the number of pieces of data which is input per cycle to the second register from the number of taps in the second filter processing, the number of times of the first filter processing, and the size of the execution result of the second filter processing.
(7) In the configuration (6), the instruction decoder decodes an instruction which updates at least one of the first tap-quantity register, the second tap-quantity register, the first arithmetic-element-quantity register, the second arithmetic-element-quantity register, the first output size register, and the second output size register.
Embodiments will be described in more detail.
In the following, a filter processing in the vertical direction of an image will be described as a vertical filter, and a filter processing in the horizontal direction of an image will be described as a horizontal filter. In the drawings, components assigned with the same reference numeral have the same function.
The image processing apparatus includes a filter processing unit (FIL) 100, a host processor (HST) 101, a memory interface (MIF) 102, an I/O (input/output) circuit 103, and an external memory (EXT-MEM) 104 which are coupled to each other via a bus 105.
The host processor 101 performs a general operation control on the image processing apparatus by executing a predetermined program.
The external memory 104 stores a program to be executed by the host processor 101 and various data, and data is transmitted/received via the bus 105 and the memory interface 102.
The I/O circuit 103 is an interface with a device 106 handling an image, video data, and audio data, and transmits/receives data via the bus 105. Examples of the device coupled to the I/O circuit 103 include a video input device typified by a terrestrial digital tuner, an image input device typified by an image pickup device, and a display device typified by an LCD (Liquid Crystal Display). Video data is input from the video input device, and an image is input from the image input device. On the other hand, an image processed by the image processing apparatus is output to the display device.
The filter processing unit 100 performs a filter processing on an image transmitted via the bus 105. Concretely, the filter processing unit 100 performs an FIR (Finite Impulse Response) filter processing.
The filter processing unit 100 includes a bus interface (BIF) 201, a control unit (CTRL) 202, a memory (MEM) 203, an arithmetic parameter calculator (ACP) 204, and a filter circuit 208 and is formed, for example, on a single semiconductor substrate such as a single-crystal silicon substrate. A control circuit 209 is formed by including the control unit (CTRL) 202 and the arithmetic parameter calculator (ACP) 204.
The bus interface 201 transmits/receives various information to/from the host processor 101 coupled to the bus 105. The various information includes images before/after a filter processing and various control information on the filter processing.
The control unit 202 includes, for example, a CPU (Central Processing Unit) executing an instruction given via the bus interface 201, and generates a control signal 211 used for controlling the arithmetic parameter calculator 204 and a control signal 212 used for controlling the filter circuit 208. The control unit 202 determines the format of an image transferred to the memory 203 via the bus 105, and sends an instruction to transfer data from the external memory 104 to the bus interface 201.
The memory 203 is used for temporarily storing the number of taps in a filter processing performed by the filter processing unit 100, the size of the result of the arithmetic operation, an image to be subjected to the filter processing, an image subjected to the filter processing, and the like.
The filter circuit 208 includes an internal register (INT-REG) 206 and an arithmetic logic unit (EXE) 207 and performs a filter processing under control of the control unit 202. The internal register 206 receives data for use in the arithmetic processing in the arithmetic logic unit 207 from the memory 203 and holds it. A result of the arithmetic operation of the arithmetic logic unit 207 is written in the internal register 206, and a result of the arithmetic operation held in the internal register 206 is written in the memory 203. The arithmetic logic unit 207 performs, although not limited thereto, an FIR (Finite Impulse Response) filter processing.
The arithmetic parameter calculator 204 receives a parameter related to the filter processing from the memory 203, and calculates the number of times of processing the horizontal filter, the number of times of processing the vertical filter, input size for the horizontal filter processing, and input size for the vertical filter processing. In the following, they will be described as the number of horizontal filter processing times, the number of vertical filter processing times, the horizontal input size, and the vertical input size. A filter processing frequency signal 213 made by the number of horizontal filter processing times and the number of vertical filter processing times and an input size signal 214 made by the horizontal input size and the vertical input size are input to the control unit 202.
The arithmetic parameter calculator 204 includes a vertical tap quantity register (TFV-REG) 301, a horizontal arithmetic element quantity register (NHO-REG) 302, a vertical output size register (VOS-REG) 303, a unit 304 of calculating the number of horizontal filter processing times (CNHFO), and a vertical input size calculator (CVSI) 305. The arithmetic parameter calculator 204 also includes a horizontal tap quantity register (TFH-REG) 311, a vertical arithmetic element quantity register (NVO-REG) 312, a horizontal output size register (HOS-REG) 313, a unit 314 of calculating the number of vertical filter processing times (CNVFO), and a horizontal input size calculator (CHSI) 315. In the following, the number of vertical taps is expressed as Tv, the number of horizontal arithmetic elements is expressed as Eh, vertical output size is expressed as Ov, the number of horizontal taps is expressed as Th, the number of vertical arithmetic elements is expressed as Ev, and the horizontal output size is expressed as Oh.
The vertical tap quantity register 301 holds the number of taps in a filter processing in the vertical direction on a two-dimensional image.
The horizontal arithmetic element quantity register 302 holds the number of product-sum operations which can be simultaneously performed in one cycle by the arithmetic logic unit 207 on data in the horizontal direction in a two-dimensional image.
The vertical output size register 303 holds the size of the result of the arithmetic operation of the filter processing in the vertical direction in the two-dimensional image.
The unit 304 for calculating the number of horizontal filter processing times calculates the number Kh of times of the filter processing in the horizontal direction necessary to obtain an image of the output size in the horizontal direction. The number of times of the filter processing in the horizontal direction is calculated on the basis of the number of vertical taps, the number of horizontal arithmetic elements, and the vertical output size. The calculating method is as follows. In the case of processing the filter in the horizontal direction first and processing the vertical filter later, when a maximum positive integer K satisfying K(Tv+Ov−1)≦Eh exists, 1/K is the number of processing times. When no such K exists and a minimum positive integer K satisfying both Tv+Ov/K−1≦Eh and “the remainder of Ov/K=0” exists, K is the number of processing times. On the other hand, in the case of processing the filter in the vertical direction first and processing the filter in the horizontal direction later, when the number of processing times of the vertical filter is expressed as Kv and a maximum positive integer K satisfying K(Ov×Kv)≦Eh exists, 1/K is the number of processing times. When no such K exists and a minimum positive integer K satisfying both (Ov×Kv)/K≦Eh and “the remainder of (Ov×Kv)/K=0” exists, K is the number of processing times.
For example, in the case of performing the filter processing in the horizontal direction first and then performing the filter processing in the vertical direction, with the number of vertical taps Tv=4, the vertical output size Ov=8, and the number of horizontal arithmetic elements Eh=10, a positive integer K satisfying K(4+8−1)≦10 does not exist. The minimum positive integer K satisfying 4+8/K−1≦10 and “the remainder of 8/K=0” is 2, so that the number Kh of times of processing the horizontal filter becomes 2.
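The horizontal-first rule for the unit 304 can be sketched as follows. This is a minimal illustration, not the circuit itself: the function name `first_stage_count` and its parameter names are hypothetical, exact fractions stand in for the 1/K results, and raising an error when neither condition can be met is an assumption, since the text does not say what happens in that case.

```python
from fractions import Fraction


def first_stage_count(taps, out_size, units):
    """Number of processing times for the filter processed first.

    taps     : tap count of the later filter (Tv in the horizontal-first case)
    out_size : output size of the later filter (Ov)
    units    : arithmetic elements of the first filter (Eh)
    """
    span = taps + out_size - 1          # data needed by the later filter
    k = units // span                   # largest K with K * span <= units
    if k >= 1:
        return Fraction(1, k)           # several blocks fit in one pass
    # Otherwise split the output into K equal parts, smallest K first.
    for k in range(1, out_size + 1):
        if out_size % k == 0 and taps + out_size // k - 1 <= units:
            return Fraction(k)
    raise ValueError("no K satisfies the conditions (assumed error case)")


# Values from the example in the text: Tv=4, Ov=8, bound 10.
kh = first_stage_count(taps=4, out_size=8, units=10)
```

With the values from the text (Tv=4, Ov=8, and the bound 10), the sketch returns 2, matching the stated result Kh=2.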
The vertical input size calculator 305 calculates the size of data which is input in one cycle to the arithmetic logic unit 207 at the time of performing the filter processing in the vertical direction on the basis of the number of vertical taps, the number of times of the horizontal filter processing, and the vertical output size. In the calculating method, when the number Kh of times of processing the horizontal filter is equal to or more than 1 (Kh≧1), Tv+Ov/Kh−1 is set as input data size. When 0<Kh<1, (Tv+Ov−1)/Kh is set as input data size. In the example of
The horizontal tap quantity register 311 holds the number of taps in the filter processing in the horizontal direction in a two-dimensional image.
The vertical arithmetic element quantity register 312 holds the number of product-sum operations which can be simultaneously performed in one cycle by the arithmetic logic unit 207 on data in the vertical direction in the two-dimensional image.
The horizontal output size register 313 holds the size of the result of the arithmetic operation of the filter processing in the horizontal direction in the two-dimensional image.
The unit 314 for calculating the number of times of the vertical filter processing calculates the number Kv of times of the filter processing in the vertical direction necessary to obtain an image of the output size in the vertical direction. The number of times of the filter processing in the vertical direction is calculated on the basis of the number of horizontal taps, the number of vertical arithmetic elements, and the horizontal output size. The calculating method is as follows. In the case of processing the filter in the horizontal direction first and processing the vertical filter later, when the number of times of processing the horizontal filter is expressed as Kh and a maximum positive integer K satisfying K(Oh×Kh)≦Ev exists, 1/K is the number of processing times. When no such K exists and a minimum positive integer K satisfying both (Oh×Kh)/K≦Ev and “the remainder of (Oh×Kh)/K=0” exists, K is the number of processing times. On the other hand, in the case of processing the filter in the vertical direction first and processing the filter in the horizontal direction later, when a maximum positive integer K satisfying K(Th+Oh−1)≦Ev exists, 1/K is the number of processing times. When no such K exists and a minimum positive integer K satisfying both Th+Oh/K−1≦Ev and “the remainder of Oh/K=0” exists, K is the number of processing times.
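The horizontal-first branch of the rule for the unit 314 can be sketched in the same spirit. The names below are hypothetical. The sketch reads the second condition as (Oh×Kh)/K≦Ev, in parallel with the rule for the unit 304, and it assumes that Oh×Kh is an integer, which the text does not state. The numeric values in the usage lines are illustrative and are not taken from the application.

```python
from fractions import Fraction


def second_stage_count(first_out_size, first_count, units):
    """Number of processing times for the filter processed second.

    first_out_size : output size of the first filter (Oh, horizontal-first)
    first_count    : number of processing times of the first filter (Kh)
    units          : arithmetic elements of the second filter (Ev)
    """
    width = Fraction(first_out_size) * Fraction(first_count)
    if width <= 0 or width.denominator != 1:
        raise ValueError("sketch assumes Oh*Kh is a positive integer")
    w = int(width)
    k = units // w                      # largest K with K * w <= units
    if k >= 1:
        return Fraction(1, k)
    # Otherwise find the smallest K that divides w with w / K <= units.
    for k in range(1, w + 1):
        if w % k == 0 and w // k <= units:
            return Fraction(k)
    raise ValueError("no K satisfies the conditions (assumed error case)")


# Illustrative values only: Oh=8, Kh=2, Ev=10.
kv = second_stage_count(first_out_size=8, first_count=2, units=10)
```

For the illustrative values, Oh×Kh is 16, which exceeds 10, so the sketch falls through to the divisor search and returns 2.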
The horizontal input size calculator 315 calculates the size of data which is input in one cycle to the arithmetic logic unit 207 at the time of performing the filter processing in the horizontal direction on the basis of the number of horizontal taps, the number of times of the vertical filter processing, and the horizontal output size. In the calculating method, when the number Kv of times of processing the vertical filter is equal to or more than 1 (Kv≧1), Th+Oh/Kv−1 is set as input data size. When 0<Kv<1, (Th+Oh−1)/Kv is set as input data size.
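Both input size calculators 305 and 315 apply the same two-case formula, which can be sketched as one function. The name `input_size` and its parameters are hypothetical. The sketch takes the first case to apply when the number of processing times is 1 or more, and returns an exact fraction so that no rounding policy, which the text does not specify, has to be assumed.

```python
from fractions import Fraction


def input_size(taps, out_size, other_count):
    """Data size input per cycle for one filter direction.

    taps        : tap count in this direction (Tv or Th)
    out_size    : output size in this direction (Ov or Oh)
    other_count : processing times of the other direction (Kh or Kv)
    """
    k = Fraction(other_count)
    if k <= 0:
        raise ValueError("number of processing times must be positive")
    if k >= 1:
        # Output split into K parts: each part needs taps - 1 extra samples.
        return taps + Fraction(out_size) / k - 1
    # 0 < K < 1: 1/K full blocks are supplied together in one cycle.
    return (taps + out_size - 1) / k


# Reusing Tv=4, Ov=8 with Kh=2 from the earlier worked example.
iv = input_size(taps=4, out_size=8, other_count=2)
```

Under this sketch, Tv=4, Ov=8, and Kh=2 give an input size of 7, while a hypothetical Kh=1/2 with the same Tv and Ov gives 22.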
In the example of
The flow of the operation in the configuration of the first embodiment is as follows. To determine the format of an image which is input to the memory 203, various information necessary for the filter processing is input to the memory 203. When a start instruction is given from the host processor 101 to the control unit 202 via the bus 105, the filter processing starts in the filter processing unit 100. The control unit 202 sets the number of taps in the horizontal filter, the number of elements in the horizontal filter processing, the horizontal output size, the number of taps in the vertical filter, the number of elements in the vertical filter processing, and the vertical output size in the arithmetic parameter calculator 204. It is also possible to write these six values directly into the registers in the arithmetic parameter calculator 204 without holding them in the memory. After these six values have been set, the arithmetic parameter calculator 204 calculates the number of times of the horizontal filter processing, the horizontal input size, the number of times of the vertical filter processing, and the vertical input size. The arithmetic parameter calculator 204 inputs the filter processing frequency signal 213 made by the number of times of the horizontal filter processing and the number of times of the vertical filter processing and the input size signal 214 made by the horizontal input size and the vertical input size to the control unit 202.
The control unit 202 determines the format of an image which is input from the external memory 104 into the memory 203 on the basis of the number of times of the horizontal filter processing and the number of times of the vertical filter processing supplied by the filter processing frequency signal 213, the horizontal input size and the vertical input size supplied by the input size signal 214, the number of taps in the horizontal filter, and the number of taps in the vertical filter. The control unit 202 sends the information of the format to the bus interface 201, and the image in that format is transferred from the external memory 104 into the memory 203 via the bus 105. The image input to the memory 203 is sent to the filter circuit 208, and the filter circuit 208 performs the filter processing and writes the resulting data back into the memory 203.
When an image necessary for the filter processing is I(X,Y) (X denotes a coordinate in the horizontal direction and Y denotes a coordinate in the vertical direction), the number of times of the horizontal filter processing is Kh, the horizontal input size is Ih, the number of times of the vertical filter processing is Kv, the vertical input size is Iv, the horizontal output size is Oh, and the vertical output size is Ov, the image is formatted and transferred as follows.
For example, as shown in
The following nine conditions can be mentioned with respect to the format of the image and the transfer method.
(1) In the case where Kv>1 and Kh>1
The format of the image is a set of images Vjm (j=0, 1, . . . , Kv−1, m=0, 1, . . . , Kh−1) obtained by dividing the image I into Kv×Kh pieces. Each image Vjm has a width Ih and a height Iv, starting from the coordinates (X, Y)=(j×Oh/Kv, m×Ov/Kh) on the image I. The images are transferred in order of V00, V01, . . . , V0Kh−1, . . . , and VKv−1Kh−1.
(2) In the case where Kv>1 and Kh=1
The format of the image is an image Vj (j=0, 1, . . . , Kv−1) obtained by dividing the image I into Kv pieces. The image Vj is an image having a width Ih and a height Iv from the coordinates (X, Y)=(j×Oh/Kv, 0) on the image I. Images are transferred in order of V0, V1, . . . , VKv−1.
(3) In the case where Kv>1 and Kh<1
The format of the image is an image Vj (j=0, 1, . . . , Kv−1) obtained by dividing the image I into Kv pieces and coupling 1/Kh pieces of the divided image in the vertical direction. Images are transferred in order of V0, V1, . . . , VKv−1.
(4) In the case where Kv=1 and Kh>1
The format of the image is an image Vm (m=0, 1, . . . , Kh−1) obtained by dividing the image I into Kh pieces. The image Vm is an image having a width Ih and a height Iv from the coordinates (X, Y)=(0, m×Ov/Kh) on the image I. Images are transferred in order of V0, V1, . . . , VKh−1.
(5) In the case where Kv=1 and Kh=1
The format of the image is an image I, and the image I is transferred.
(6) In the case where Kv=1 and Kh<1
The format of the image is an image V obtained by coupling 1/Kh pieces of the image I, and the image V is transferred.
(7) In the case where Kv<1 and Kh>1
The format of the image is an image Vm (m=0, 1, . . . , Kh−1) obtained by dividing the image I into Kh pieces and coupling 1/Kv pieces in the horizontal direction. The image Vm is an image obtained by coupling 1/Kv pieces of an image having a width Ih×Kv and a height Iv from the coordinates (X, Y)=(0, m×Ov/Kh) on the image I. Images are transferred in order of V0, V1, . . . , VKh−1.
(8) In the case where Kv<1 and Kh=1
The format of the image is an image V obtained by coupling 1/Kv pieces of the image I in the horizontal direction, and the image V is transferred.
(9) In the case where Kv<1 and Kh<1
The format of the image is an image V obtained by coupling 1/Kh pieces of the image I in the vertical direction and coupling 1/Kv pieces in the horizontal direction, and the image V is transferred.
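The nine conditions above can be read as a three-by-three classification of Kv and Kh (greater than, equal to, or less than one). The following is a minimal sketch of that classification and of the transfer order for the dividing cases. The function names regime, case_number, and tile_origins are hypothetical and do not appear in the embodiments; the sketch also assumes that Oh is divisible by Kv and Ov by Kh, and it does not model the coupling cases.

```python
from fractions import Fraction


def regime(k):
    """Classify a processing count: 'divide' (k > 1), 'whole' (k == 1), 'couple' (k < 1)."""
    if k > 1:
        return "divide"
    if k == 1:
        return "whole"
    return "couple"


def case_number(kv, kh):
    """Return the condition number (1)-(9) for the pair (Kv, Kh)."""
    order = ["divide", "whole", "couple"]
    return 3 * order.index(regime(kv)) + order.index(regime(kh)) + 1


def tile_origins(kv, kh, oh, ov):
    """Origins (X, Y) of the divided images in transfer order.

    Covers conditions (1), (2), (4), and (5), where Kv and Kh are integers >= 1.
    Assumes Oh is divisible by Kv and Ov by Kh (an assumption of this sketch).
    """
    if kv < 1 or kh < 1:
        raise ValueError("coupling cases (Kv < 1 or Kh < 1) are not modeled here")
    # j is the outer index and m the inner index, matching V00, V01, ..., VKv-1Kh-1.
    return [(j * oh // kv, m * ov // kh) for j in range(kv) for m in range(kh)]


print(case_number(2, 2))               # condition (1)
print(case_number(Fraction(1, 2), 3))  # condition (7)
print(tile_origins(2, 2, 8, 8))
```

With Kv=Kh=2 and Oh=Ov=8, the origins come out in the order V00, V01, V10, V11, that is, (0, 0), (0, 4), (4, 0), (4, 4).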
According to the conventional technique, data of the number of pieces necessary for the second filter processing is sequentially supplied to a plurality of product-sum operation units, and the first filter processing is performed simultaneously on the data. The result of the first filter processing is sequentially supplied to the product-sum operation units, and the second filter processing is performed simultaneously on the data. Consequently, the first filter processing must be repeated when the amount of data necessary for the second filter processing is larger than the number of elements of the operation units performing the first filter processing. For example, when the number of arithmetic elements performing the first filter processing is eight and the data input in relation to the data necessary for the second filter processing is 11 pixels, the 11 pixels have to be divided into eight pixels and three pixels, and the filter processing has to be performed twice. As a result, the cycles of two filter passes are required before the arithmetic operation on the data necessary for the second filter processing is completed, and the start of the second filter processing may be delayed. This delay hinders reduction of the time necessary for the filter processing on a two-dimensional image.
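The arithmetic of this example can be checked with a short sketch. The helper names are hypothetical. The relation "inputs needed = output size + taps − 1" is the usual property of a finite impulse response filter computed without padding and is used here as an assumption; the text itself only states the figure of 11 pixels, which is consistent with an output size of eight and four taps.

```python
import math


def samples_needed(output_size, taps):
    """Input samples needed for `output_size` outputs of a `taps`-tap filter.

    Assumption of this sketch: no padding, so output_size + taps - 1 inputs.
    """
    return output_size + taps - 1


def passes_required(needed, elements):
    """Number of first-filter passes when `elements` units run in parallel."""
    return math.ceil(needed / elements)


def split_into_passes(needed, elements):
    """Sizes of the chunks processed in each pass, in order."""
    chunks = []
    remaining = needed
    while remaining > 0:
        chunk = min(elements, remaining)
        chunks.append(chunk)
        remaining -= chunk
    return chunks


needed = samples_needed(output_size=8, taps=4)
print(needed)                        # 11
print(passes_required(needed, 8))    # 2
print(split_into_passes(needed, 8))  # [8, 3]
```

The second pass uses only three of the eight elements, which is the inefficiency the first embodiment addresses by adjusting the amount of data input per cycle.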
In contrast, in the first embodiment, the number of pieces of data input per cycle into the first register is adjusted according to the number of taps in the filter processing and the size of data generated simultaneously by the plural arithmetic logic units (the number of arithmetic elements), so that the first filter processing is completed promptly and its result is supplied to the second filter processing. This hastens the timing of starting the second filter processing. For example, as shown in
By adjusting the number of pieces of data which is input per cycle to the first register in accordance with the number of taps in the second filter processing, the size of the execution result of the second filter processing, and the number of arithmetic logic units performing the first filter processing, a useless increase in the number of arithmetic logic units performing the first filter processing can be avoided. For example, as shown in
According to the first embodiment, the following effects can be obtained.
By adjusting the number of pieces of data which is input per cycle to the first register in accordance with the number of taps in the filter processing and the size of data simultaneously generated by a plurality of arithmetic logic units, the first filter processing is completed promptly, and the result of the first filter processing can be provided to the second filter processing. The second filter processing can therefore start earlier than in the conventional technique. Since the number of pieces of data which is input per cycle to the first register is adjusted according to the number of taps in the second filter processing, the size of the execution result of the second filter processing, and the number of arithmetic logic units performing the first filter processing, useless arithmetic operations by the arithmetic logic units performing the first filter processing can be reduced.
Thus, the two-dimensional filter processing on a two-dimensional image can be performed efficiently.
The configuration shown in
The data generating circuit 605 receives an image stored in the memory 603 on the basis of arithmetic parameters calculated by the arithmetic parameter calculator 604, converts the format of the image, and transfers the resultant image to the filter circuit 608.
The flow of operations in the configuration of the second embodiment is as follows. First, images transferred via the bus 105 and various information necessary for the filter processing are stored into the memory 603 via a bus interface 601. When a start instruction is given from the host processor 101 to the control unit 602 via the bus 105, the filter processing starts in the filter processing unit 100. The control unit 602 sets six parameters in the arithmetic parameter calculator 604: the number of taps in the horizontal filter, the number of elements in the horizontal filter processing, the horizontal output size, the number of taps in the vertical filter, the number of elements in the vertical filter processing, and the vertical output size. These parameters can also be written directly into the register in the arithmetic parameter calculator 604 without being stored in the memory 603. After the parameters are set, the arithmetic parameter calculator 604 calculates the number of times of the horizontal filter processing, the horizontal input size, the number of times of the vertical filter processing, and the vertical input size, and sends them to the data generating circuit 605.
The data generating circuit 605 determines the format of an image which is input to the filter circuit 608 on the basis of the input number of times of the horizontal filter processing, horizontal input size, number of times of the vertical filter processing, vertical input size, number of taps in the horizontal filter, and number of taps in the vertical filter. The data generating circuit 605 then converts the image according to the format and transfers the resultant image to the filter circuit 608. The format of an image is similar to that of the first embodiment. The filter circuit 608 performs the filter processing and writes the data back to the memory 603.
In the second embodiment, the original image is transferred to the memory 603 in the filter processing unit 100, so that the size of the transferred data becomes smaller than that in the case of transferring divided images.
The processor shown in
The processor shown in
The filter processing unit 900 performs a predetermined arithmetic processing by executing an instruction fetched via the instruction cache 901. In the case of outputting the result of the arithmetic operation by a store instruction or the like, the result is temporarily held in the data cache 907 or is held in the external memory 904 via the bus 905 and the memory interface 902. The result can also be transmitted via the bus 905 to the I/O circuit 903, which serves as an interface to devices handling video and audio data. Examples of the devices coupled to the I/O circuit 903 include a video input device typified by a terrestrial digital tuner, an image input device typified by an image pickup device, and a display device typified by an LCD.
The filter processing unit 900 includes a bus interface (BIF) 1001, an instruction decoder (IDEC) 1002, an arithmetic parameter calculator (ACP) 1004, an index generator (IND-GEN) 1005, an internal register (INT-REG) 1006, a filter processor 1009, and a data generation circuit (DATA-CIR) 1010.
The instruction decoder 1002 decodes an input instruction, thereby generating parameter signals related to the filter processing, a source index, and a filter processing control signal. The parameters related to the filter processing are, concretely, the number of vertical taps, the number of horizontal arithmetic elements, vertical output size, the number of times in horizontal filter processing, vertical input size, the number of horizontal taps, the number of vertical arithmetic elements, horizontal output size, the number of times in vertical filter processing, and horizontal input size.
On the basis of the parameters related to the filter processing input from the instruction decoder 1002, the arithmetic parameter calculator 1004 calculates the number of times of the filter processing in the horizontal direction in a two-dimensional image and the number of times of the filter processing in the vertical direction. On the basis of the same parameters, the arithmetic parameter calculator 1004 also calculates the size in the horizontal direction of the two-dimensional image which is input per cycle to the arithmetic logic unit calculating the filter processing in the horizontal direction and the size in the vertical direction of the two-dimensional image which is input per cycle to the arithmetic logic unit calculating the filter processing in the vertical direction. The arithmetic parameter calculator 1004 sends the number of times of the horizontal filter processing and the number of times of the vertical filter processing to the filter processor 1009 and sends the horizontal input data size and the vertical input data size to the data generation circuit 1010. The arithmetic parameter calculator 1004 has a configuration similar to that of
On start of the filter processing, the index generator 1005 generates a corrected source index by correcting a source index which is input via the instruction decoder 1002 on the basis of the number of times of the horizontal filter processing and the number of times of the vertical filter processing input from the arithmetic parameter calculator 1004, and holds it on the inside. During the filter processing, the index generator 1005 increments the corrected source index.
The internal register 1006 holds data fetched as data to be subject to the filter processing and outputs data corresponding to the corrected source index which is input from the index generator 1005.
The filter processor 1009 includes, although not limited thereto, a shift register (SFT-REG) 1007 capable of shifting data, a shift control circuit (SFT-CTRL) 1003 controlling data shift in the shift register 1007, and an SIMD arithmetic unit 1008 performing an arithmetic processing on output data of the internal register 1006. SIMD stands for Single Instruction Multiple Data. An SIMD arithmetic operation denotes an arithmetic method of processing a plurality of pieces of data by a single instruction. A result of the arithmetic operation in the SIMD arithmetic unit 1008 is written in the internal register 1006. The filter processor 1009 performs the filter processing the number of times input from the arithmetic parameter calculator 1004.
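The cooperation of the shift register 1007 and the SIMD arithmetic unit 1008 in a horizontal filter can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name horizontal_filter_simd and the use of Python lists for the register and the lanes are hypothetical, the coefficients are applied without reversal (a correlation form), and the actual circuit behavior is not specified at this level in the text.

```python
def horizontal_filter_simd(row, coeffs, lanes):
    """Model `lanes` parallel outputs of a horizontal filter.

    For each coefficient, every lane multiplies its current sample by the
    coefficient and accumulates; the register is then shifted by one sample.
    Output i equals sum over k of coeffs[k] * row[i + k].
    """
    taps = len(coeffs)
    if len(row) < lanes + taps - 1:
        raise ValueError("row must hold at least lanes + taps - 1 samples")

    shift_reg = list(row)   # software stand-in for the shift register
    acc = [0] * lanes       # one accumulator per SIMD lane

    for c in coeffs:        # coefficients are read out one after another
        # Conceptually a single SIMD multiply-accumulate over all lanes.
        for lane in range(lanes):
            acc[lane] += c * shift_reg[lane]
        # Shift by one sample so each lane sees its next neighbor.
        shift_reg = shift_reg[1:] + [0]
    return acc


print(horizontal_filter_simd([1, 2, 3, 4, 5], [1, 1], lanes=4))  # [3, 5, 7, 9]
```

Because the register is shifted once per coefficient, the number of taps can exceed the number of lanes without adding arithmetic units.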
The data generation circuit 1010 receives an image stored in the external memory 904 or the data cache 907, converts the image format on the basis of the arithmetic parameters input from the arithmetic parameter calculator 1004, and transfers the resultant image to the internal register 1006. The format of the image is similar to that determined by the control unit 202 in the first embodiment.
In the configuration, in the case where a filter processing is instructed by a command which is entered to the instruction decoder 1002, first, a source index as a base point of data to be read which is stored in the internal register is supplied from the instruction decoder 1002 to the index generator 1005. Various parameters related to the filter processing are supplied from the instruction decoder 1002 to the arithmetic parameter calculator 1004. In a manner similar to the first and second embodiments, the arithmetic parameter calculator 1004 calculates the number of times of the horizontal filter processing, the horizontal input size, the number of times of the vertical filter processing, and the vertical input size, enters all of the parameters to the data generation circuit 1010, and enters the number of times of the horizontal filter processing and the number of times of the vertical filter processing to the index generator 1005. The index generator 1005 calculates the corrected source index on the basis of the number of times of the horizontal filter processing, the number of times of the vertical filter processing, and the source index, and enters it to the internal register 1006. The internal register 1006 inputs data of a register corresponding to the corrected source index to the shift register 1007 in the filter processor 1009. The shift register 1007 either shifts data under control of the shift control circuit 1003 or receives data from the internal register 1006. The case of shifting data of the shift register corresponds to the case of the horizontal filter processing. The data from the shift register 1007 is supplied to the SIMD arithmetic unit 1008. The result of the arithmetic operation is written in the internal register 1006, and the filter processing is completed.
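Where the horizontal case shifts data within the shift register, the other path loads data from the internal register at the corrected source index, which the index generator increments during the filter processing. The sketch below models a vertical filter on that basis. It is an assumption of this sketch that one register entry holds one image row and that the index advances by one per coefficient; the name vertical_filter_by_index and the list-of-lists register model are hypothetical.

```python
def vertical_filter_by_index(register_file, source_index, coeffs, lanes):
    """Model `lanes` parallel outputs of a vertical filter.

    register_file: list of rows, standing in for the internal register.
    source_index:  index of the first row to read (the corrected source index).
    Each coefficient is applied to one row; the index is then incremented.
    """
    if source_index + len(coeffs) > len(register_file):
        raise ValueError("not enough rows for the requested number of taps")

    acc = [0] * lanes
    index = source_index
    for c in coeffs:
        row = register_file[index]   # load, rather than shift
        for lane in range(lanes):    # conceptually one SIMD multiply-accumulate
            acc[lane] += c * row[lane]
        index += 1                   # index generator increments the index
    return acc


rows = [[1, 2], [3, 4], [5, 6]]
print(vertical_filter_by_index(rows, 0, [1, 1], lanes=2))  # [4, 6]
print(vertical_filter_by_index(rows, 1, [1, 1], lanes=2))  # [8, 10]
```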
Also in the semiconductor device with the above-described configuration, in a manner similar to the first and second embodiments, the arithmetic parameter calculator 1004 calculates the number of times of the horizontal filter processing, the horizontal input size, the number of times of the vertical filter processing, and the vertical input size. On the basis of the parameters calculated by the arithmetic parameter calculator 1004, the filter processing is performed in the filter processor 1009. At this time, the data generation circuit 1010 receives the image stored in the external memory 904 or the data cache 907, converts the image format on the basis of the arithmetic parameters entered from the arithmetic parameter calculator 1004, and transfers the resultant image to the internal register 1006. Since the format of the image is similar to that determined by the control unit 202 in the first embodiment, the number of pieces of data which is input per cycle to the internal register 1006 can be adjusted in accordance with the number of taps in the first filter processing, the size of the execution result of the first filter processing, and the number of the second arithmetic logic units. The number of pieces of data which is input per cycle to the internal register 1006 can also be adjusted in accordance with the number of taps in the second filter processing, the size of the execution result of the second filter processing, and the number of the first arithmetic logic units. Consequently, also in the filter processing unit 900, effects similar to those of the first and second embodiments can be obtained.
The arithmetic parameter calculator 204 shown in
For example, in motion predicting processing in a brightness image of MPEG1 and MPEG2, the number of vertical taps is two, the number of horizontal taps is two, the vertical output size is eight, and the horizontal output size is eight. In an encoding method called VC-1 (WMV9), in the case of using the bicubic method for the motion predicting processing, the number of vertical taps is four, the number of horizontal taps is four, the vertical output size is eight, and the horizontal output size is eight.
According to the fourth embodiment, the number of vertical taps, the vertical output size, the number of horizontal taps, and the horizontal output size are not supplied as signals from the outside. Instead, the method 1200 is determined in the filter processing circuit, and the number of vertical taps, the vertical output size, the number of horizontal taps, and the horizontal output size are set accordingly. With only the encoded image and the encoding information, effects similar to those of the first and second embodiments can be obtained.
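Setting the four parameters from the determined method amounts to a table lookup. The following sketch uses only the values stated above for MPEG1, MPEG2, and VC-1 with the bicubic method; the names FilterParams, PARAMS_BY_METHOD, and params_for, as well as the string keys, are hypothetical and chosen for illustration.

```python
from typing import NamedTuple


class FilterParams(NamedTuple):
    vertical_taps: int
    horizontal_taps: int
    vertical_output_size: int
    horizontal_output_size: int


# Values taken from the motion predicting examples in the text.
PARAMS_BY_METHOD = {
    "MPEG1": FilterParams(2, 2, 8, 8),
    "MPEG2": FilterParams(2, 2, 8, 8),
    "VC-1/bicubic": FilterParams(4, 4, 8, 8),
}


def params_for(method):
    """Return the filter parameters for a determined encoding method."""
    try:
        return PARAMS_BY_METHOD[method]
    except KeyError:
        raise ValueError(f"no filter parameters registered for {method!r}")


print(params_for("VC-1/bicubic"))
```

Other methods would be added as further table entries; none are included here because the text does not give their values.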
The present invention achieved by the inventors herein has been concretely described above. Obviously, the invention is not limited to the embodiments but can be variously modified without departing from the gist.
For example, in the foregoing embodiments, each of the first, second, and third registers in the present invention is formed by the internal register 206. However, the first, second, and third registers may be formed by different registers. Although each of the first and second arithmetic logic units in the invention is formed by the arithmetic logic unit 207 in the foregoing embodiments, the first and second arithmetic logic units may be formed by different arithmetic logic units.
As the filter processing unit 900 in
Number | Date | Country | Kind |
---|---|---|---|
2009-032687 | Feb 2009 | JP | national |