1. Field of the Invention
The present invention relates to a parallel operation apparatus of a SIMD type for executing a parallel operation to an image signal such as an image CODEC (Coder Decoder) or the like.
2. Description of the Related Art
With the significant advancement of digital imaging technology in recent years, image processing such as compression/expansion and filtering has become highly complicated. Image processing is executed in a frame format or a field format on images stored in a memory in the frame format or the field format, respectively. The frame format refers to a format in which lines of a top field and lines of a bottom field alternate to constitute the image. The field format refers to a format in which the top field and the bottom field are each stored as a contiguous block at different positions.
In reading image data corresponding to an address, some data need not be read, as an example of which, encoding data for MPEG decoding can be mentioned. Data called CBP (Coded Block Pattern) is used therein. Though the details are omitted here, the CBP is used to judge whether or not blocks in a macro block are respectively encoded. When a CBP value with respect to a block is “0”, the block is not encoded and all of the encoding data is “0”, which makes it unnecessary to read the data.
An issue to be dealt with here is that, when image data in a data memory is not stored in a desired format, it is necessary to rearrange the order of reading the data. For example, when the image is arranged as in
Unexamined Patent Application Publication No. 07-121687 discloses a technology that addresses this issue by executing a one-bit rotation.
Providing a description referring to
However, the foregoing method is premised on the data being arranged in the frame format. Therefore, the foregoing method cannot be applied to the case where it is desired to obtain the image in the frame format from an image arranged in the field format.
Further, the foregoing method, which assumes that a line of the relevant image fits in a line of the memory, cannot handle the case where the line of the relevant image is larger than the line of the memory.
In any case where the foregoing method cannot be applied, such as reading an image stored in the field format in the frame format, it becomes necessary to manipulate the address of the data to be read. For the operation apparatus to execute the address manipulation, a program supporting each read format would be required, which increases the program size. Data writing faces the same problem.
One option is to rewrite the data into a desired format. However, such a solution requires repeated load/store operations in the operation apparatus and would increase its processing load. Further, a solution using DMA (Direct Memory Access) has the problem that DMA instructions are issued more often. As a different option, an address conversion table can be prepared in advance. That method, however, requires a separate conversion table for each type of conversion, which increases the necessary memory size.
The methods according to the conventional technology do not include a mechanism for controlling the read by means of the address, and are therefore incapable of suppressing unnecessary reads from the memory. Thus, power is wasted on reading data that later proves to be unnecessary. It would be convenient if a data-read instruction were not issued when an access is made to an address where unnecessary data is stored. However, if such a judgment is made in the operation apparatus, the program installed in the operation apparatus becomes complicated.
A first parallel operation apparatus of a SIMD type according to the present invention comprises: a processor element group of a SIMD type including a plurality of processor elements, wherein the respective processor elements simultaneously execute an identical operation; a data memory accessible from the respective processor elements; and an address conversion unit for converting an address with respect to the data memory accessed by the processor elements in accordance with a control signal by changing bit positions of the address.
In the first SIMD-type parallel operation apparatus, when it is premised that image data in the data memory is arranged in a frame format, the address conversion unit is controlled in accordance with the setting of the control signal to thereby change over to the state where the access is made in the frame format without changing the address at which the processor elements access the data memory, and to the state where the access is made in a field format by converting the address into a different address. Alternatively, when it is premised that the image data in the data memory is arranged in the field format, the address conversion unit is controlled in accordance with the setting of the control signal to thereby change over to the state where the access is made in the field format without changing the address at which the processor element accesses the data memory, and to the state where the access is made in the frame format by converting the address into a different address. As described, according to the first SIMD-type parallel operation apparatus, the data memory is accessible in either the frame format or the field format.
In the foregoing configuration, the bit positions can be changed in the address conversion unit in the following different manners.
1) The address conversion unit rearranges the first bit, second bit and third bit from the lower order of the address data respectively to the second bit, third bit and first bit from the lower order to thereby change the bit positions.
When eight pixels are a unit per processing and it is premised that the image data in the data memory is arranged in the frame format, the described address conversion enables the access in the field format.
2) The address conversion unit rearranges the first bit, second bit and third bit from the lower order of the address data respectively to the third bit, first bit and second bit from the lower order to thereby change the bit positions.
When eight pixels are a unit per processing and it is premised that the image data in the data memory is arranged in the field format, the described address conversion enables the access in the frame format.
3) The address conversion unit rearranges the first bit, second bit, third bit, fourth bit and fifth bit from the lower order of the address data respectively to the first bit, third bit, fourth bit, fifth bit and second bit from the lower order to thereby change the bit positions.
When 16 pixels are a unit per processing, a line of the image data cannot be disposed in a line of the memory due to a limited memory width so that the surplus part of the line is arranged in the subsequent line, and it is premised that the image data in the data memory is arranged in the frame format, the foregoing address conversion enables the access in the field format. In the foregoing manner, it is unnecessary to provide a program responding to the access formats, thereby reducing the code size. Further, it is unnecessary to rearrange the data, which reduces the processing load.
4) The address conversion unit rearranges the first bit, second bit, third bit, fourth bit and fifth bit from the lower order of the address data respectively to the first bit, fifth bit, second bit, third bit and fourth bit from the lower order to thereby change the bit positions.
When 16 pixels are a unit per processing, a line of the image data cannot be disposed in a line of the memory due to the limited memory width so that the surplus part of the line is arranged in the subsequent line, and it is premised that the image data in the data memory is arranged in the field format, the foregoing address conversion enables the access in the frame format. In the foregoing manner, it is unnecessary to provide a program responding to the access formats, thereby reducing the code size. Further, it is unnecessary to rearrange the data, which reduces the processing load.
5) The address conversion unit implements changeovers, with respect to the first bit, second bit, third bit, fourth bit and fifth bit from the lower order of the address data, to the arrangement state of the fifth bit, first bit, second bit, third bit and fourth bit from the lower order, and to the arrangement state of the fifth bit, second bit, third bit, fourth bit and first bit from the lower order to thereby change the bit positions.
When 16 pixels are a unit per processing, a line of the image data cannot be disposed in a line of the memory due to the limited memory width so that the surplus part of the line is arranged in a position 16 lines below, and it is premised that the image data in the data memory is arranged in the frame format, the foregoing address conversion enables the access in the field format. In the foregoing manner, it is unnecessary to provide a program responding to the access formats, thereby reducing the code size. Further, it is unnecessary to rearrange the data, which reduces the processing load. Further, because it is unnecessary to provide an address conversion table, the required memory size is not increased.
6) The address conversion unit implements changeovers, with respect to the first bit, second bit, third bit, fourth bit and fifth bit from the lower order of the address data, to the arrangement state of the fifth bit, fourth bit, first bit, second bit and third bit from the lower order, and to the arrangement state of the fifth bit, first bit, second bit, third bit and fourth bit from the lower order to thereby change the bit positions.
When 16 pixels are a unit per processing, a line of the image data cannot be disposed in a line of the memory due to the limited memory width so that the surplus part of the line is arranged in a position 16 lines below, and it is premised that the image data in the data memory is arranged in the field format, the foregoing address conversion enables the access in the frame format. In the foregoing manner, it is unnecessary to provide a program responding to the access formats, thereby reducing the code size. Further, it is unnecessary to rearrange the data, which reduces the processing load. Further, because it is unnecessary to provide an address conversion table, the required memory size is not increased.
7) The address conversion unit implements changeovers, with respect to the first bit, second bit, third bit, fourth bit and fifth bit from the lower order of the address data, to the arrangement state of the fourth bit, first bit, second bit, third bit and fifth bit from the lower order, and to the arrangement state of the fourth bit, second bit, third bit, fifth bit and first bit from the lower order to thereby change the bit positions.
When 16 pixels are a unit per processing, a line of the image data cannot be disposed in a line of the memory due to the limited memory width so that the surplus part of the line is arranged in a position eight lines below, and it is premised that the image data in the data memory is arranged in the frame format, the foregoing address conversion enables the access in the field format. In the foregoing manner, it is unnecessary to provide a program responding to the access formats, thereby reducing the code size. Further, it is unnecessary to rearrange the data, which reduces the processing load. Further, because it is unnecessary to provide an address conversion table, the required memory size is not increased.
8) The address conversion unit implements changeovers, with respect to the first bit, second bit, third bit, fourth bit and fifth bit from the lower order of the address data, to the arrangement state of the fourth bit, fifth bit, first bit, second bit and third bit from the lower order, and to the arrangement state of the fourth bit, first bit, second bit, third bit and fifth bit from the lower order to thereby change the bit positions.
When 16 pixels are a unit per processing, a line of the image data cannot be disposed in a line of the memory due to the limited memory width so that the surplus part of the line is arranged in a position eight lines below, and it is premised that the image data in the data memory is arranged in the field format, the foregoing address conversion enables the access in the frame format. In the foregoing manner, it is unnecessary to provide a program responding to the access formats, thereby reducing the code size. Further, it is unnecessary to rearrange the data, which reduces the processing load. Further, because it is unnecessary to provide an address conversion table, the required memory size is not increased.
Both of the address conversion units in 1) and 2) may be provided, each used for a different purpose according to need. Likewise, two or more of the address conversion units in 3) to 8) may be provided, each used for a different purpose according to need.
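As a minimal sketch of conversions 1) to 4), the Python below models the address conversion unit as a pure bit permutation. The name `permute_bits`, the 0-based destination tuples, and the reading of each rule as "the listed source bit moves to the listed destination position" are assumptions made for illustration and are not taken from the specification. Under that reading, serial addresses 0 to 7 produce the field-order sequence for a frame-stored image, and conversions 1)/2) and 3)/4) undo each other.

```python
def permute_bits(addr, dest):
    """Move source bit i of addr (0 = lowest) to position dest[i].

    Bits above the permuted field pass through unchanged.
    """
    n = len(dest)
    out = (addr >> n) << n
    for src, dst in enumerate(dest):
        out |= ((addr >> src) & 1) << dst
    return out


# Destination positions are 0-based here; the text counts bits from 1.
CONV_1 = (1, 2, 0)        # 1): bits 1,2,3 -> 2,3,1 (frame stored, field access)
CONV_2 = (2, 0, 1)        # 2): bits 1,2,3 -> 3,1,2 (field stored, frame access)
CONV_3 = (0, 2, 3, 4, 1)  # 3): bits 1..5 -> 1,3,4,5,2
CONV_4 = (0, 4, 1, 2, 3)  # 4): bits 1..5 -> 1,5,2,3,4


def access_order(dest):
    """Memory addresses visited when the input address counts up serially."""
    return [permute_bits(a, dest) for a in range(1 << len(dest))]


print(access_order(CONV_1))  # [0, 2, 4, 6, 1, 3, 5, 7]
print(access_order(CONV_2))  # [0, 4, 1, 5, 2, 6, 3, 7]
```

In this model, conversion 3) leaves the lowest bit (which half of an image line is addressed) untouched and rotates only the four line-number bits, which is why even-numbered lines are visited before odd-numbered ones.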
A second parallel operation apparatus of the SIMD type according to the present invention comprises: a SIMD-type processor element group including a plurality of processor elements, wherein the respective processor elements simultaneously execute an identical operation; a data memory accessible from the respective processor elements; and a data changeover unit for negating a read request for an address which does not fall under conditions and inputting fixed data to the processor elements.
In the second SIMD-type parallel operation apparatus, CBP is used to judge whether or not blocks in a macro block are respectively encoded in the case of MPEG. When a CBP value is “0” meaning that the relevant block is not encoded, all of encoding data is “0”, which makes it unnecessary to read data. In the case of the read request for the address, which does not fall under the conditions, for example, when the CBP value is “0”, the data changeover unit negates the request and inputs the fixed data to the processor elements. In the foregoing manner, the read of the unnecessary data, which does not fall under the conditions, is halted by means of the address value, so that any unnecessary access to the memory can be eliminated, reducing the power consumption. Further, because the program does not judge whether or not the data is necessary, the program can be prevented from being complicated.
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
Hereinafter, a parallel operation apparatus of a SIMD type according to preferred embodiments of the present invention is described referring to the drawings.
Input and output data of the processor elements 5 is stored in the data memory 4. The data memory 4 is evenly allocated to the processor elements 5. A pre-conversion address 8 to be inputted to an address conversion unit 7 is stored in an address storage register 6, and a value of the pre-conversion address 8 can be controlled by means of the processor element group 1. There may be a plurality of address storage registers 6. An address conversion unit 7 converts the pre-conversion address 8 from the address storage register 6 and creates the post-conversion address 3. The address conversion unit 7 changes over a conversion method in response to an external control signal.
An operation of the SIMD-type parallel operation apparatus in writing with respect to the data memory 4 is described. The processor element group 1 outputs the write request to the memory control signal 2. The data memory 4 receives the write request, and stores the data outputted from the respective processor elements 5 in a position indicated by the post-conversion address 3 resulting from the conversion of the pre-conversion address 8 by the address conversion unit 7.
An operation of the SIMD-type parallel operation apparatus in reading with respect to the data memory 4 is described. The processor element group 1 outputs the read request to the memory control signal 2. The data memory 4 receives the read request, and outputs the data in a position indicated by the post-conversion address 3 resulting from the conversion of the pre-conversion address 8 by the address conversion unit 7.
In the case where serial addresses are inputted to the address conversion unit 7, a value of the address storage register 6 is incremented by one by the processor element group 1 for each read or write.
In
In the address conversion unit 7, a bit order of an address value is changed to thereby convert serial accesses into an effective access order so that the foregoing problem is solved. An operation of the bit order change is changed over by means of an external control signal 9.
In
Further, when the control signal 9 is set to “0”, the image can be obtained in the frame format shown in
Below is provided a more specific description. In
As described, according to the present embodiment, no program or data rearrangement responding to the respective frame and field formats is necessary. The image can be obtained in either frame format or field format by changing over the control signal 9.
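The changeover by the control signal 9 can be sketched as follows for eight memory lines stored in the frame format. This is an illustrative model only: the function names, the line labels, and the choice of control value 1 for the field access are assumptions (the text states only that the value "0" yields the frame format).

```python
def convert_address(addr, control):
    """control 0: pass the address through; control 1: rotate the low 3 bits left by one."""
    if control == 0:
        return addr
    low = addr & 0b111
    rotated = ((low << 1) | (low >> 2)) & 0b111
    return (addr & ~0b111) | rotated


# Eight memory lines in frame format: top (T) and bottom (B) field lines alternate.
memory = ["T0", "B0", "T1", "B1", "T2", "B2", "T3", "B3"]


def read_image(control):
    # The address storage register is modeled as a counter incremented by one per read.
    return [memory[convert_address(a, control)] for a in range(8)]


print(read_image(0))  # ['T0', 'B0', 'T1', 'B1', 'T2', 'B2', 'T3', 'B3']
print(read_image(1))  # ['T0', 'T1', 'T2', 'T3', 'B0', 'B1', 'B2', 'B3']
```

The program that issues the reads is identical in both cases; only the control value differs.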
A configuration of a parallel operation apparatus of the SIMD type according to an embodiment 2 of the present invention is the same as the configuration shown in
In the foregoing case, providing that the serial addresses are supplied to the address storage register 6 and the conversion operation shown in
Further, when the control signal 9 is set to “0”, the image can be obtained in the field format.
Below is provided a more specific description. In
As described, according to the present embodiment, no program or data rearrangement responding to the respective frame and field formats is necessary. The image can be obtained in either frame format or field format by changing over the control signal 9.
A configuration of a parallel operation apparatus of the SIMD type according to an embodiment 3 of the present invention is the same as the configuration shown in
In the foregoing case, providing that the serial addresses are given to the address storage register 6 and the conversion operation shown in
Further, the image can be obtained in the frame format by setting the control signal 9 to “0”.
Below is provided a description in more detail. In
As described, according to the present embodiment, no program or data rearrangement responding to the respective frame and field formats is necessary. The image can be obtained in either frame format or field format by changing over the control signal 9.
A configuration of a parallel operation apparatus of the SIMD type according to an embodiment 4 of the present invention is the same as the configuration shown in
In the foregoing case, providing that the serial addresses are given to the address storage register 6 and the conversion operation shown in
Further, the image can be obtained in the field format by setting the control signal 9 to “0”.
Below is provided a description in more detail. In
As described, according to the present embodiment, no program or data rearrangement responding to the respective frame and field formats is necessary. The image can be obtained in either frame format or field format by changing over the control signal 9.
A configuration of a parallel operation apparatus of the SIMD type according to an embodiment 5 of the present invention is the same as the configuration shown in
In the foregoing case, providing that the serial addresses are given to the address storage register 6 and the conversion operation shown in
Further, the image can be obtained in the field format by setting the control signal 9 to “1”.
Below is provided a description in more detail. In
As described, according to the present embodiment, no program or data rearrangement responding to the respective frame and field formats is necessary. The image can be obtained in either frame format or field format by changing over the control signal 9.
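The two arrangement states described for conversion 5) can be checked with a small model. The layout below is an assumption inferred from the summary: each 16-pixel image line is split into two memory lines, with the second half stored 16 memory lines below the first, so that address = half × 16 + line. The names `FRAME_ACCESS`, `FIELD_ACCESS`, and `decode` are hypothetical.

```python
def permute_bits(addr, dest):
    """Move source bit i of addr (0 = lowest) to position dest[i]."""
    n = len(dest)
    out = (addr >> n) << n
    for src, dst in enumerate(dest):
        out |= ((addr >> src) & 1) << dst
    return out


# 0-based destinations for source bits 1..5 as listed in conversion 5).
FRAME_ACCESS = (4, 0, 1, 2, 3)  # fifth, first, second, third, fourth
FIELD_ACCESS = (4, 1, 2, 3, 0)  # fifth, second, third, fourth, first


def decode(addr):
    """Return (image line, half) for a memory address in the assumed layout."""
    return addr & 0b1111, addr >> 4


def access_order(dest):
    return [decode(permute_bits(s, dest)) for s in range(32)]


frame = access_order(FRAME_ACCESS)
field = access_order(FIELD_ACCESS)
print(frame[:4])  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(field[:4])  # [(0, 0), (0, 1), (2, 0), (2, 1)]
```

In both states the lowest serial bit selects the half of the line, so each image line is read whole before the next one; the field state additionally rotates the line number so that even lines precede odd lines.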
A configuration of a parallel operation apparatus of the SIMD type according to an embodiment 6 of the present invention is the same as the configuration shown in
In the foregoing case, providing that the serial addresses are given to the address storage register 6 and the conversion operation shown in
Further, the image can be obtained in the field format by setting the control signal 9 to “1”.
Below is provided a description in more detail. In
As described, according to the present embodiment, no program or data rearrangement responding to the respective frame and field formats is necessary. The image can be obtained in either frame format or field format by changing over the control signal 9.
A configuration of a parallel operation apparatus of the SIMD type according to an embodiment 7 of the present invention is the same as the configuration shown in
In the foregoing case, providing that the serial addresses are given to the address storage register 6 and the conversion operation shown in
Further, the image can be obtained in the field format by setting the control signal 9 to “1”.
Below is provided a description in more detail. In
As described, according to the present embodiment, no program or data rearrangement responding to the respective frame and field formats is necessary. The image can be obtained in either frame format or field format by changing over the control signal 9.
A configuration of a parallel operation apparatus of the SIMD type according to an embodiment 8 of the present invention is the same as the configuration shown in
In the foregoing case, providing that the serial addresses are given to the address storage register 6 and the conversion operation shown in
Further, the image can be obtained in the field format by setting the control signal 9 to “1”.
Below is provided a description in more detail. In
As described, according to the present embodiment, no program or data rearrangement responding to the respective frame and field formats is necessary. The image can be obtained in either frame format or field format by changing over the control signal 9.
Further, the different configurations of the address conversion unit 7 shown in the embodiments 1 through 8 can be combined, in which case plural kinds of conversion methods are changed over in response to the control signal 9. For example, by combining the embodiments 1 and 2, an image comprised of horizontal eight pixels×vertical eight pixels each having 16 bits can be read in either format regardless of whether it is disposed in the memory in the frame format or the field format.
Further, the embodiments 1 through 8 employ the image comprised of horizontal eight pixels×vertical eight pixels each having 16 bits and the image comprised of horizontal 16 pixels×vertical 16 pixels each having 16 bits in the respective descriptions; however, the configuration of the image is not limited thereto.
In the data changeover unit 13, in the case where a read request is inputted to the memory control signal 2 from the processor element group 1, an address is inputted at the same time from the address storage register 6 to thereby judge whether or not the address satisfies conditions. When the address satisfies the conditions, the read request is outputted to the data memory 4, and data changeover selectors 15 are set by means of a data changeover signal 14 in such manner that memory input/output data 10 is inputted to the processor elements 5.
When the address does not satisfy the conditions, the read request is not outputted to the data memory 4, and the data changeover selectors 15 are set in such manner that “0” is inputted to the processor elements 5.
When a write request is outputted to the memory control signal 2, the data changeover unit 13 always outputs the write request to the data memory 4, and sets the data changeover selectors 15 in such manner that the output data of the processor elements 5 is outputted to the data memory 4.
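The read and write behavior of the data changeover unit 13 can be modeled in a few lines. This is a sketch under stated assumptions: the class name, the `condition` callback, and the `memory_reads` counter are illustrative and do not appear in the specification.

```python
FIXED_DATA = 0  # value fed to the processor elements when a read is negated


class DataChangeover:
    def __init__(self, memory, condition):
        self.memory = memory
        self.condition = condition  # address -> bool, True if the read is needed
        self.memory_reads = 0       # counts reads that actually reach the memory

    def read(self, addr):
        if self.condition(addr):
            self.memory_reads += 1
            return self.memory[addr]
        # Condition not satisfied: the memory is not accessed at all.
        return FIXED_DATA

    def write(self, addr, value):
        # Write requests are always forwarded to the memory.
        self.memory[addr] = value


memory = [11, 22, 33, 44]
unit = DataChangeover(memory, condition=lambda addr: addr >= 2)
unit.write(0, 99)
print([unit.read(a) for a in range(4)], unit.memory_reads)  # [0, 0, 33, 44] 2
```

Four read requests were issued but only two reached the memory, which is the source of the power saving described in the text.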
A read control by means of the CBP (Coded Block Pattern) of MPEG decoding is described.
It is assumed that the encoding data is disposed as shown in
For example, when a highest-order bit of the CBP is “0”, it is unnecessary to read the encoding data in the Y0 block.
The data changeover unit 13 converts the inputted address by means of a conversion table, and negates the read request when the bit value of the CBP indicated by the converted value is “0” and sets the data changeover selectors 15 so that “0” is inputted to the respective processor elements 5 by means of the data changeover signal 14.
When the bit value of the CBP corresponding to the block is “1”, the read request is outputted to the data memory 4, and the data changeover selectors 15 are set in such manner that the memory input/output data 10 is inputted to the processor elements 5.
The conversion table for the inputted address is shown in
According to the foregoing method, the read of any unnecessary data is halted in response to the address value, and power consumption can be thereby reduced eliminating any unnecessary access to the memory.
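The CBP-based judgment can be illustrated as below. The actual conversion table is given in a figure that is not reproduced here, so the mapping used in this sketch is an assumption: six blocks in the order Y0, Y1, Y2, Y3, Cb, Cr, each occupying eight consecutive addresses, with the highest-order CBP bit corresponding to Y0 as stated in the text. All names are hypothetical.

```python
BLOCKS = ("Y0", "Y1", "Y2", "Y3", "Cb", "Cr")
LINES_PER_BLOCK = 8  # assumption: eight memory lines per block


def block_is_coded(addr, cbp):
    """Look up the CBP bit for the block containing addr (highest bit = Y0)."""
    block = addr // LINES_PER_BLOCK
    return (cbp >> (len(BLOCKS) - 1 - block)) & 1 == 1


def read(memory, addr, cbp):
    if block_is_coded(addr, cbp):
        return memory[addr]
    return 0  # read request negated; "0" is supplied instead


memory = list(range(100, 148))  # 48 dummy values, one per address
cbp = 0b011111                  # Y0 not coded, all other blocks coded
print(read(memory, 3, cbp), read(memory, 8, cbp))  # 0 108
```

Address 3 lies in the Y0 block, whose CBP bit is 0, so no memory access occurs; address 8 lies in Y1 and is read normally.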
An operation of the SIMD-type parallel operation apparatus in writing with respect to the data memory 4 is described.
The processor element group 1 outputs the write request to the memory control signal 2. The data changeover unit 13, in response to the receipt of the write request signal, outputs the write request to the data memory 4 and sets the data changeover selectors 15 in such manner that the output data of the processor elements 5 is outputted to the data memory 4. The data memory 4 receives the write request, and correspondingly stores the data outputted from the processor elements 5 in a position indicated by the post-conversion address 3 in which the pre-conversion address 8 is converted by means of the address conversion unit 7.
An operation of the SIMD-type parallel operation apparatus in reading with respect to the data memory 4 is described.
The processor element group 1 outputs the read request to the memory control signal 2. The data changeover unit 13, in response to the receipt of the signal, judges whether or not the post-conversion address 3 from the address conversion unit 7 satisfies the conditions, and outputs the read request to the data memory 4 when the conditions are satisfied, and further sets the data changeover selectors 15 in such manner that the memory input/output data 10 is inputted to the processor elements 5. The data memory 4 receives the read request, and correspondingly outputs the data in the position indicated by the post-conversion address 3 from the address conversion unit 7 to the respective processor elements 5.
Further, when the post-conversion address 3 does not satisfy the conditions, the data changeover unit 13 does not output the read request to the data memory 4, and sets the data changeover selectors 15 in such manner that “0” is inputted to the processor elements 5. As a result, “0” is inputted to the respective processor elements 5.
According to the foregoing method, neither the program nor the rearrangement of the data corresponding to the frame format or field format is necessary, and the image can be obtained in either the frame format or field format by the changeover of the control signal 9. Further, the read of any unnecessary data can be halted by means of the address value, which eliminates any unnecessary access to the memory thereby reducing the power consumption.
While the invention has been described and illustrated in detail, it is to be clearly understood that this is intended by way of illustration and example only and is not to be taken by way of limitation, the spirit and scope of this invention being limited only by the terms of the following claims.
Number | Date | Country | Kind |
---|---|---|---|
P2003-423077 | Dec 2003 | JP | national |