This application claims priority under 35 U.S.C. § 119(a) to a Korean Patent Application filed in the Korean Intellectual Property Office on Feb. 24, 2006 and assigned Serial No. 2006-18478, the disclosure of which is incorporated herein by reference.
1. Field of the Invention
The present invention relates generally to a data processing technique for a portable multimedia apparatus and an apparatus for processing data using the same, and in particular, to a subword parallelism method for efficiently processing multimedia data and an apparatus for processing data using the same.
2. Description of the Related Art
In a multichannel image coding scheme, standard images can be expressed with image signals based on vector values, and each pixel of the images is composed of three components, i.e., Red, Green and Blue (RGB). However, the RGB color space is not suitable for image processing because signal correlation between color components of an RGB image is high and each of the color components has a broad band. In order to solve this problem, the image and video processing field universally uses the YCbCr color space, which suits the visual characteristics of human beings and reduces both the signal correlation between the color components and the total amount of generated data.
The YCbCr color space is a color coordinate space based on human color perception. Because the human eye is less sensitive to high-frequency detail in the chrominance components (for example, Cb and Cr), color distortion is not noticeable to the naked eye even when the chrominance components are undersampled. In addition, a luminance component Y of the image can be processed independently of the chrominance components Cb and Cr.
Meanwhile, a subword parallelism technique that can simultaneously operate on several small data elements, like 8-bit pixels, is used for image processing. For subword parallelism, several small data elements (for example, 8-bit pixels) are packed in one large register (for example, a 32-bit or 64-bit register), and the individual elements are processed in parallel by one instruction.
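The packing described above can be illustrated in software. The following is a minimal sketch, assuming a 32-bit word that holds four 8-bit elements with the first element in the least significant byte; the function names pack4, lane and add4_lanes are hypothetical and are introduced only for illustration. The loop in add4_lanes emulates four independent 8-bit ALUs, whereas actual hardware would process all lanes with one instruction.

```c
#include <stdint.h>

/* Pack four 8-bit elements into one 32-bit word; e0 occupies the
 * least significant byte (an assumed layout for this sketch). */
static uint32_t pack4(uint8_t e0, uint8_t e1, uint8_t e2, uint8_t e3)
{
    return (uint32_t)e0
         | ((uint32_t)e1 << 8)
         | ((uint32_t)e2 << 16)
         | ((uint32_t)e3 << 24);
}

/* Read the 8-bit element in lane `lane_index` (0..3) of a packed word. */
static uint8_t lane(uint32_t word, unsigned lane_index)
{
    return (uint8_t)(word >> (8u * lane_index));
}

/* Emulate four independent 8-bit ALUs: each lane is added separately
 * and wraps modulo 256, so no carry crosses into a neighboring lane. */
static uint32_t add4_lanes(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (unsigned i = 0; i < 4; ++i) {
        uint8_t s = (uint8_t)(lane(a, i) + lane(b, i));
        out |= (uint32_t)s << (8u * i);
    }
    return out;
}
```

For example, adding the packed elements (1, 2, 3, 4) and (10, 20, 30, 40) yields the packed elements (11, 22, 33, 44) in a single call.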
Referring to
The words 11 and 13 each include 3 subwords having Y, Cb and Cr information. In this case, the 8 Most Significant Bits (MSBs) of each word are unused. The subwords undergo computation in their associated ALUs 110, 120, 130 and 140, and are output as another word 15.
However, in the subword parallelism technique, overflow or underflow may occur during arithmetic computation (for example, addition and subtraction) which is most frequently used for image processing, and thus overhead for handling the overflow or underflow may also occur, affecting performance.
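The overflow problem can be illustrated with a single 8-bit addition. The following is a minimal sketch, assuming unsigned 8-bit pixel values; the names add8_result, add8 and add8_saturating are hypothetical. Saturation is shown only as one common form of overflow handling and is an assumption of this sketch, not a technique named in the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of one 8-bit ALU addition: the wrapped sum and a carry-out flag. */
struct add8_result {
    uint8_t sum;
    bool overflow;
};

/* Add two 8-bit values the way an 8-bit ALU would: the sum wraps
 * modulo 256 and the lost carry is reported separately. */
static struct add8_result add8(uint8_t a, uint8_t b)
{
    unsigned wide = (unsigned)a + (unsigned)b;
    struct add8_result r = { (uint8_t)wide, wide > 0xFFu };
    return r;
}

/* Assumed overflow handling for illustration: clamp the result to 255.
 * The extra test and select is the kind of overhead at issue. */
static uint8_t add8_saturating(uint8_t a, uint8_t b)
{
    struct add8_result r = add8(a, b);
    return r.overflow ? (uint8_t)0xFFu : r.sum;
}
```

With these definitions, adding the pixel values 200 and 100 wraps to 44 with the overflow flag set, while the saturating variant returns 255 at the cost of an additional check per element.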
Referring to
Referring to
Referring to
An aspect of the present invention is to address at least the problems and/or disadvantages and to provide at least the advantages described below. Accordingly, an aspect of the present invention is to provide a subword parallelism method capable of preventing overflow or underflow which may occur during multimedia data processing, without an increase in hardware, and an apparatus for processing data using the same.
Another aspect of the present invention is to provide a subword parallelism method capable of reducing the processing delay due to overhead instruction by reducing a bit width of input data, and an apparatus for processing data using the same.
The above and other aspects of the present invention can be achieved by a subword parallelism method in a data processing system that processes in parallel the subwords constituting a word obtained by temporarily loading in word registers the data stored in a memory, using ALUs which are equal in size to the subwords.
According to one aspect of the present invention, there is provided a parallel processing method in a data processing system that temporarily loads data stored in a memory in word registers and parallel-processes subwords constituting the loaded word using Arithmetic Logic Units (ALUs) which are equal in size to the subwords. The method includes generating a shortened subword by removing at least one bit among the bits constituting each subword; and performing parallel computation on the shortened subwords.
According to another aspect of the present invention, there is provided a parallel processing method in a data processing system that temporarily loads data stored in a memory in 32-bit word registers in units of 8-bit subwords and parallel-processes the subwords using four 8-bit Arithmetic Logic Units (ALUs). The method includes right-shifting each subword by a predetermined number of bits and outputting the right-shifted subword as a shortened subword; and delivering the shortened subwords to their associated ALUs and performing parallel computation thereon.
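The right-shifting and parallel computation steps of this aspect can be emulated in software on a packed 32-bit word. The following is a minimal sketch, assuming unsigned 8-bit subwords and a shift count n between 1 and 4; the function names shorten4 and add4_shortened are hypothetical. The sketch uses one 32-bit addition in place of four 8-bit ALUs, which gives the same lane results here because no shortened lane sum can exceed 254.

```c
#include <assert.h>
#include <stdint.h>

/* Right-shift every 8-bit lane of a packed 32-bit word by n bits.
 * The whole word is shifted once, then a mask clears the bits that
 * crossed from each lane into the lane below it. */
static uint32_t shorten4(uint32_t word, unsigned n)
{
    assert(n >= 1 && n <= 4);
    uint32_t lane_mask = (0xFFu >> n) * 0x01010101u; /* e.g. 0x3F3F3F3F for n = 2 */
    return (word >> n) & lane_mask;
}

/* Add two packed words after shortening. Each shortened lane is at most
 * 2^(8-n) - 1, so each lane sum is at most 254 and no carry leaves a lane. */
static uint32_t add4_shortened(uint32_t a, uint32_t b, unsigned n)
{
    return shorten4(a, n) + shorten4(b, n);
}
```

For instance, with n equal to 2, lanes holding 200 and 100 are shortened to 50 and 25, and their sum 75 fits in the 8-bit lane without overflow.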
According to a further aspect of the present invention, there is provided an apparatus for processing data in a data processing system. The apparatus includes a memory for storing data; two registers for temporarily storing the data stored in the memory in units of subwords; and Arithmetic Logic Units (ALUs) for right-shifting the subword stored in each register by at least one bit, and performing computation on the two right-shifted subwords output from the two registers.
According to yet another aspect of the present invention, there is provided an apparatus for processing data in a data processing system. The apparatus includes a memory for storing data; two registers for temporarily storing the data stored in the memory in units of subwords; and Arithmetic Logic Units (ALUs) for right-shifting the subword stored in each register by at least one bit, performing sign bit extension on each right-shifted subword, and performing computation on the two sign bit-extended subwords output from the two registers.
According to still another aspect of the present invention, there is provided an apparatus for processing data in a data processing system. The apparatus includes a memory for storing data; two registers for dividing the data stored in the memory into subwords, right-shifting the divided subwords separately by at least one bit, and temporarily storing the right-shifted subwords; and Arithmetic Logic Units (ALUs) for performing computation on the subwords stored in the registers.
The above and other objects, features and advantages of the present invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings in which:
Exemplary embodiments of the present invention will now be described in detail with reference to the annexed drawings. In the drawings, the same or similar elements are denoted by the same reference numerals even though they are depicted in different drawings. In the following description, a detailed description of known functions and configurations incorporated herein has been omitted for clarity and conciseness.
Referring to
This embodiment will be described with reference to an exemplary process of parallel-computing four 8-bit data signals stored in two 32-bit registers 41 and 42.
In the first register (Ra) 41, 8-bit subwords Y0, Cb0 and Cr0 are arranged in sequence from the Least Significant Bit (LSB) position. In the second register (Rb) 42, 8-bit subwords Y1, Cb1 and Cr1 are arranged in sequence from the LSB position. The surplus positions in the first register (Ra) 41 and the second register (Rb) 42 are unused.
The subwords stored in the first and second registers 41 and 42 are right-shifted by a predetermined number of bits ‘n’, so that each becomes a shortened subword of ‘8−n’ bits, and are then input to their associated ALUs. Herein, it is preferable that n is greater than or equal to 1, and less than or equal to 4 (1≦n≦4).
For example, a 6-bit subword Y′0 obtained by right-shifting a subword Y0 of the first register 41 by 2, and a subword Y′1 obtained by right-shifting a subword Y1 of the second register 42 by 2 are input to an 8-bit ALU 440, and the computation results of the 8-bit ALU 440 are stored in a third register 43 as an 8-bit subword C0. In addition to the right shifting, it is preferable to perform sign bit extension for processing negative numbers.
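The right shift with sign bit extension can be sketched for a single pair of subwords as follows. This is a minimal illustration, assuming the subwords are 8-bit two's-complement values held as raw bit patterns in uint8_t and that the shift count n lies between 1 and 4; the function names shorten_signed, alu8_add and add_shortened_signed are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shorten one 8-bit two's-complement subword: shift right by n bits,
 * then copy the original sign bit into the n vacated high bits. */
static uint8_t shorten_signed(uint8_t subword, unsigned n)
{
    assert(n >= 1 && n <= 4);
    uint8_t shifted = (uint8_t)(subword >> n);
    if (subword & 0x80u) {
        /* Negative input: set the top n bits (sign bit extension). */
        shifted |= (uint8_t)(0xFFu << (8u - n));
    }
    return shifted;
}

/* Model of an 8-bit ALU addition: the result is kept modulo 256. */
static uint8_t alu8_add(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Shorten both operands, then add them in the 8-bit ALU model. */
static uint8_t add_shortened_signed(uint8_t y0, uint8_t y1, unsigned n)
{
    return alu8_add(shorten_signed(y0, n), shorten_signed(y1, n));
}
```

As a worked case with n equal to 2, the bit patterns 0x9C (−100) and 0x88 (−120) are shortened to 0xE7 (−25) and 0xE2 (−30), and the ALU result is 0xC9 (−55). Adding the unshortened values would give −220, which cannot be represented in 8 bits.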
Although this embodiment has been described with reference to the 32-bit datapath architecture by way of example, the present invention is not limited thereto and can also be applied to 64-bit or 128-bit datapath architecture. In addition, although this embodiment has been described with reference to the data processing method for the YCbCr color space by way of example, the present invention is not limited thereto and can also be applied to data processing in other color spaces, like the YUV and YIQ color spaces.
Referring to
The present invention solves the overflow problem in the ALUs by reducing the number of bits of pixel data. This is possible because in the YCbCr color space, the reduction in the number of component bits may not cause noticeable quality degradation. The subword parallelism method of the present invention limits the number ‘n’ of shifting bits to a range of 1≦n≦4 to prevent noticeable quality degradation.
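The overflow guarantee and the cost in precision can both be checked with a short bound. The following is a sketch, assuming unsigned 8-bit subwords and a single addition per ALU; the symbols a, b, a' and b' are introduced here for illustration only.

```latex
% Shortened operands after a right shift by n bits, with 0 <= a, b <= 255
a' = \left\lfloor \frac{a}{2^{n}} \right\rfloor \le 2^{8-n} - 1, \qquad
b' = \left\lfloor \frac{b}{2^{n}} \right\rfloor \le 2^{8-n} - 1

% Their sum always fits in an 8-bit result
a' + b' \le 2^{9-n} - 2 \le 254 < 2^{8} \qquad (1 \le n \le 4)

% Precision lost per component by discarding the n low-order bits
0 \le a - 2^{n} a' \le 2^{n} - 1 \le 15
```

Under these assumptions, a shift of even one bit is enough to rule out overflow for one addition, while the upper limit of four bits caps the discarded precision at 15 of the 255 levels of each component.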
As can be understood from the foregoing description, the new subword parallelism method reduces the number of bits constituting a pixel (or subword) within a given limit for preventing noticeable quality degradation, thereby preventing the overflow or underflow which may occur due to additional computation.
In addition, the new subword parallelism method does not need the packing/unpacking process because it reduces the length of subwords during computation, thereby minimizing processing delay due to processing overhead.
While the invention has been shown and described with reference to a certain preferred embodiment thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
18478/2006 | Feb 2006 | KR | national |