This application claims the right of priority under 35 U.S.C. §119 based on Australian Patent Application No. 2007219336, filed 28 Sep. 2007, which is incorporated by reference herein in its entirety as if fully set forth herein.
The current invention relates to image rescaling and, in particular, to downscaling of video image data by arbitrary ratios, preferably using video processing hardware.
Rescaling of images is typically done using an interpolating filter. When rescaling images to smaller sizes, to achieve good quality results, it is necessary to pre-process the image with a low-pass filter to avoid artifacts caused by aliasing. To achieve high quality image reduction with reasonable computational efficiency, it is desirable to combine the low-pass and interpolating filters into a single filter. A FIR (finite impulse response) filter is typically used. The input samples are convolved with the filter kernel to produce the output samples.
The cubic kernel is a well known filter kernel and is widely used for these purposes. The cubic kernel itself is defined as a continuous function and may be sampled as required dependent upon the task being performed. This process comprises defining an origin at the location of the output sample and evaluating the cubic kernel function at each input sample location to determine a discrete convolution kernel. The output point is then calculated as the inner product of the input data with the discrete convolution kernel.
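The conventional process described above may be sketched as follows. This is a minimal illustration only: the specification does not fix a particular cubic, so the Keys cubic convolution kernel with parameter a = -0.5 is assumed here, and the names `cubic` and `sample_at`, as well as the clamping of indices at the image border, are hypothetical choices made for the example.

```python
import math

def cubic(x, a=-0.5):
    """Continuous cubic kernel (Keys form; a = -0.5 is an assumption)."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0

def sample_at(samples, x_out):
    """Evaluate one output sample at x_out, measured in input-sample units.

    The origin is placed at the output location, the kernel is evaluated at
    each nearby input location, and the inner product is returned.
    """
    first = math.floor(x_out) - 1
    total = 0.0
    for i in range(first, first + 4):
        j = min(max(i, 0), len(samples) - 1)  # clamp at borders (assumption)
        total += samples[j] * cubic(i - x_out)
    return total
```

With unit input spacing only four input samples fall within the support of the kernel, so the discrete convolution kernel has four coefficients in this case.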
Monochrome images may be rescaled as described above. In colour images, each pixel is typically represented by a colour value that is defined by three values that represent different components of the colour, such as red, green and blue components. Many other ways of representing colour are possible using multiple components. Colour images may be re-scaled by re-sampling each of the component images separately. Video data is typically represented as a sequence of frames, each of which is represented as a rectangular array of pixels. In video data, the three components used to represent the colour of a pixel may be sampled at different resolutions so each colour frame may actually be represented as three frames each corresponding to a different component. Video data is typically encoded so that one component represents luminance and the other two components represent colour information. Colour information is often represented at lower resolution than luminance information, but colour videos may still be rescaled by re-sampling each component of each frame separately.
One issue with down-sampling is that the filter kernel size grows as a function of the rescaling ratio because more low-pass filtering is required for larger downscaling ratios, and this requires a wider filter kernel. This leads to several problems. The first is that the filter coefficients needed depend on the downscaling ratio. This means that to support arbitrary downscaling ratios, either a large number of kernel values need to be stored, or kernel values need to be calculated dynamically. This is particularly important for real time image transformations at video display rates, such as 25-30 frames per second for television. These problems compound when extending from standard definition to so-called high definition formats.
Various methods are known to reduce the cost of kernel evaluation. For example, for any given rational scaling rate it is known that only a finite set of coefficients will be required and these can be pre-calculated and stored in a table. Low complexity methods for calculating cubic coefficients at unit intervals have also been proposed and may be less costly to implement than large look-up tables.
Another problem arises because filter sizes vary for down-sampling: a hardware implementation can be difficult, as a large number of memory reads and many multiplications may be required to generate each output sample. When a conventional convolution method is used for down-sampling, one output sample is produced at each step. This is particularly a problem when arbitrary scaling ratios are required, because a variable number of multiplications is required to produce each output sample, making it difficult to design circuits for performing such convolutions. Such circuits either require a large number of multipliers or they require many clock cycles to produce one output, and each input sample may need to be accessed many times.
A known solution to this problem is to invert the order of the convolution summation. In order to reduce the resolution of a one dimensional stream of data such as a stream of audio samples, a transposed FIR filter structure with time-varying coefficients may be used to implement a polyphase filter.
The transposed convolution method may be applied to two dimensional image rescaling by scaling first horizontally and then vertically. This requires buffering a complete intermediate frame of data because the data is accessed in different order for horizontal and for vertical scaling.
A second issue for down-scaling is that for non-integer reduction ratios, the different discrete convolution kernels derived from the cubic function do not exhibit uniform gain. This means that the sum of the coefficients contributing to each output sample is not constant and the output of the resampling process will exhibit a position dependent intensity variation. This issue is trivially overcome by calculating the sum of the coefficients and using this value to normalise the output. Other solutions have also been proposed in the literature. Another solution is known in the art as “Paul Heckbert's zoom code”, which may be found, as at the filing date of this specification, at http://www.cs.cmu.edu/~ph/src/zoom/ which calculates the difference between the ideal and actual coefficient sums for each kernel and adds this difference to the centre-most kernel sample. This approach is particularly suited for implementations that use integer arithmetic and avoids the need for division. A disadvantage is that the kernel continuity is compromised.
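The non-uniform gain, and the two known remedies mentioned above, may be illustrated by the following sketch. It assumes the Keys cubic with a = -0.5, output samples at unit spacing and input samples spaced 1/s apart. The function names are hypothetical, and `add_error_to_centre` is a simplified illustration of the idea of adding the coefficient-sum error to the centre-most coefficient, not a reproduction of any published code.

```python
import math

def cubic(x, a=-0.5):
    """Keys cubic kernel (a = -0.5 is an assumption)."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0

def discrete_kernel(x_out, s):
    """Coefficients k(x_out - x_in) / s for inputs located at i / s."""
    lo = math.ceil((x_out - 2.0) * s)
    hi = math.floor((x_out + 2.0) * s)
    return [cubic(x_out - i / s) / s for i in range(lo, hi + 1)]

def normalise(coeffs):
    """Divide every coefficient by the coefficient sum (unit gain)."""
    total = sum(coeffs)
    return [c / total for c in coeffs]

def add_error_to_centre(coeffs):
    """Add the difference between ideal (1.0) and actual sum to the middle tap."""
    out = list(coeffs)
    out[len(out) // 2] += 1.0 - sum(out)
    return out

# For a ratio of 1.5 the gain depends on the output position.
gain_at_0 = sum(discrete_kernel(0.0, 1.5))
gain_at_1 = sum(discrete_kernel(1.0, 1.5))
```

In this example `gain_at_0` is slightly above 1 and `gain_at_1` slightly below 1, which is the position dependent intensity variation referred to above.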
There are other known techniques for modifying interpolating filters to produce a flat response, such as that described in U.S. Pat. No. 6,816,622, issued Nov. 9, 2004 and assigned to Microsoft Corp. A disadvantage of this approach is that the frequency response of the kernel is modified in a rate dependent manner. In particular, the degree of additional smoothing introduced by the modification of the filter increases as the down-sampling rate approaches 1:1. This level of smoothing for small changes in scale may be unacceptable for some applications such as video re-sampling where a small scale change may be required to change between a “letter-box” view and a full screen view of a movie sequence.
It is an object of the present invention to substantially overcome or at least ameliorate one or more problems with the conventional approaches discussed above.
The present inventors have determined that by extending the transposed time-varying FIR filter processing model to two dimensions and incorporating kernel normalisation with negligible additional buffering, efficient down-sampling of two dimensional image data in raster scan order can be obtained. This is useful where independent and arbitrary scaling is required in both vertical and horizontal directions. This approach avoids the need for a large kernel coefficient store or costly coefficient calculations by dynamically normalizing the filter response. This is desirably achieved by dividing by the filter weight for each output sample using a novel buffering scheme for storing partially calculated filter weights, while avoiding costly division operations by calculating the reciprocal of the filter weight using a novel look-up table based approach.
In accordance with one aspect of the present invention there is disclosed a method for re-sampling an input image comprising input samples to produce an output image comprising output samples, said method comprising the steps of:
(a) determining a set of kernel values based on a position of an input sample, each kernel value in said set corresponding to a distinct output sample position;
(b) multiplying each kernel value in said set by the value of said input sample to form a contribution, each said contribution corresponding to a distinct output sample;
(c) first adding each said contribution to a value in a corresponding storage location in an output accumulator, the result of said first addition replacing the contents of said storage location in the output accumulator;
(d) second adding each kernel value to a storage location in a sliding kernel accumulator, the result of said second addition replacing the contents of said storage location in the sliding kernel accumulator;
(e) reading an accumulated output value from said output accumulator;
(f) reading a kernel weight from said sliding kernel accumulator;
(g) dividing said accumulated output value by said kernel weight to form an output sample at said output sample position; and
(h) advancing said sliding kernel accumulator by one value.
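Steps (a) to (h) may be sketched in software for a single row of samples as follows. This is a minimal sketch under stated assumptions rather than a definitive implementation: the Keys cubic with a = -0.5 is assumed, the accumulators are four entries long, pixel centres of the input and output are aligned by the hypothetical helper `pos`, and the image boundary is handled simply by relying on the kernel weight normalisation rather than by extending the image.

```python
import math

def cubic(x, a=-0.5):
    """Keys cubic kernel (a = -0.5 is an assumption)."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0

def downscale_1d(samples, s):
    """Downscale by ratio s >= 1 (input samples per output sample)."""
    n_out = int(len(samples) / s)
    out_acc = [0.0] * 4            # sliding output accumulator
    ker_acc = [0.0] * 4            # sliding kernel accumulator
    result = []

    def pos(i):
        """Position of input sample i in output-sample units (assumed mapping)."""
        return (i + 0.5) / s - 0.5

    head = math.floor(pos(0)) - 1  # output index held at the head of the queues

    def emit():
        nonlocal head
        if 0 <= head < n_out:                      # discard positions off the image
            result.append(out_acc[0] / ker_acc[0])  # steps (e), (f), (g)
        out_acc.pop(0)
        out_acc.append(0.0)                        # advance the output accumulator
        ker_acc.pop(0)
        ker_acc.append(0.0)                        # step (h)
        head += 1

    for i, value in enumerate(samples):
        x = pos(i)
        for slot in range(4):
            k = cubic((head + slot) - x) / s       # step (a)
            out_acc[slot] += k * value             # steps (b), (c)
            ker_acc[slot] += k                     # step (d)
        while head + 2 <= pos(i + 1):              # next input cannot reach the head
            emit()
    while head < n_out:                            # flush the remaining outputs
        emit()
    return result
```

Each input sample is read exactly once and touches a fixed number of accumulator entries, regardless of the downscaling ratio.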
Generally, the input samples are processed in raster scan order and also the output samples are produced in raster scan order. Desirably, the output accumulator contains a number of values not significantly more than H lines of output, where H is the height in output samples of a vertical interpolation kernel.
In a specific implementation, step (g) comprises the steps of:
(ga) calculating a residual kernel weight representing the difference between the kernel weight and an ideal kernel weight,
(gb) determining a reciprocal of the kernel weight based on said difference, and
(gc) multiplying said accumulated output value by said reciprocal.
Preferably the method is implemented in computer hardware. Alternatively, the method may be implemented in computer software.
In accordance with another aspect of the present invention there is disclosed a method for re-sampling an input image comprising input samples to produce an output image comprising output samples, said method comprising the steps of:
determining a set of kernel values based on a position of an input sample, each kernel value in said set corresponding to a distinct output sample position;
multiplying each kernel value by the value of said input sample to form a contribution, each contribution in said set corresponding to a distinct output sample;
first adding each said contribution to a value in a storage location in an output accumulator, the result of said first addition being stored in said storage location in the output accumulator;
second adding each kernel value to a storage location in a sliding kernel accumulator, including replacing said value in said storage location in the sliding kernel accumulator with low order bits of a result of said second addition;
reading an accumulated output value from said output accumulator;
reading a residual kernel weight from said sliding kernel accumulator, said residual kernel weight representing the difference between an ideal kernel weight and said kernel weight;
determining a reciprocal of said kernel weight based on said residual kernel weight,
multiplying said accumulated output value by said reciprocal to form one of said output samples; and
advancing said sliding kernel accumulator by one value.
The determining of the reciprocal may comprise subtracting said residual kernel weight from the ideal kernel weight to produce the reciprocal. Alternatively, that step may comprise using said residual kernel weight as an index into a table to identify said reciprocal.
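The two ways of determining the reciprocal may be sketched in integer arithmetic as follows. Every numeric choice here is an assumption made for illustration: kernel values are taken to have 8 fractional bits so that the ideal kernel weight is 256, the sliding kernel accumulator is taken to retain 6 low-order bits, the residual is taken to be the kernel weight minus the ideal weight in two's complement, and the table holds reciprocals with 24 fractional bits. All names are hypothetical.

```python
FRAC_BITS = 8                    # assumed precision of kernel values
IDEAL = 1 << FRAC_BITS           # ideal kernel weight (1.0)
RES_BITS = 6                     # assumed number of retained low-order bits
RES_MASK = (1 << RES_BITS) - 1

def signed_residual(low_bits):
    """Interpret the retained low-order bits as a two's-complement residual."""
    if low_bits >> (RES_BITS - 1):
        return low_bits - (1 << RES_BITS)
    return low_bits

# Entry for residual r approximates 2**24 / (IDEAL + r).
RECIP_TABLE = [round((1 << 24) / (IDEAL + signed_residual(b)))
               for b in range(1 << RES_BITS)]

def divide_by_table(acc, weight):
    """Replace acc / weight with a table look-up and one multiplication."""
    low_bits = weight & RES_MASK           # what the kernel accumulator would hold
    return (acc * RECIP_TABLE[low_bits] + (1 << 23)) >> 24

def divide_first_order(acc, weight):
    """Approximate 1 / (1 + r) by 1 - r: subtract the residual from the ideal."""
    r = signed_residual(weight & RES_MASK)
    return (acc * (IDEAL - r) + (1 << 15)) >> 16
```

For example, a sample value of 100 accumulated with a total kernel weight of 259 gives an accumulated output of 25900, and both functions recover 100 from it. The first-order form is only adequate while the residual is small relative to the ideal weight.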
In accordance with another aspect of the present invention there is disclosed apparatus for re-scaling images, said apparatus comprising:
an input configured to receive a stream of input samples representing an input image;
an output configured to output a plurality of output samples representing an output image;
a calculator arranged to calculate a set of kernel values, dependent on a position of at least one of said input samples relative to the position of one of said output samples;
a multiplier for multiplying one of said input samples by one of said kernel values to form a contribution;
an output accumulator including a plurality of storage locations and an adder for adding one of said contributions to a value stored in one of said storage locations to form a contribution total to replace said value stored in said one storage location;
a sliding kernel accumulator including a plurality of kernel accumulator storage locations and an adder for adding said kernel values to each of said kernel registers; and
an output process by which a contribution total, from one of said storage locations in said output accumulator, is divided by a kernel weight, from one of said kernel registers, to form one of said output samples, and the contents of said kernel accumulator storage locations are advanced by one location.
Typically the apparatus is implemented as a system for rescaling an input image, said system comprising:
a first such apparatus and operative in one of a horizontal or vertical direction of the input image; and
a second such apparatus and operative in the other of the vertical and horizontal direction upon an output of the first apparatus to provide a stream of output values representing the rescaled image.
In accordance with another aspect of the present invention there is disclosed a method for re-sampling an input image comprising input samples to produce an output image comprising output samples, said method comprising the steps of:
using kernel values based on a position of an input sample to form a contribution to an output sample, the contribution being retained in a sliding output accumulator;
adding each kernel value to a storage location in a sliding kernel accumulator, forming an output sample value by dividing a value from the sliding output accumulator by a value from the sliding kernel accumulator; and
advancing said sliding kernel accumulator by one value.
Other features and aspects of the present invention are also disclosed.
One or more embodiments of the invention will now be described with reference to the following drawings, in which:
In the arrangement illustrated in
As each input sample is received from the input source 100, it is multiplied by four horizontal kernel values stored in a bank of four horizontal kernel registers 130 to produce four contributions. The multiplication is performed by a bank of four multipliers 105. Each contribution is added to the contents of a different register of a bank of registers 143. The addition is performed by a bank of four adders 145, with the results being written back into the registers 143. The registers 143 and the adder 145 in the illustrated configuration collectively form and function as a horizontal output accumulator 140. The values stored in the four registers 143 of the horizontal output accumulator 140 correspond to four distinct horizontally adjacent output samples.
The values in the horizontal kernel registers 130 are generated by a kernel coefficient generator 110 that is synchronized to the input sample source 100 via an input sample clock 120. The coefficient generator 110 generates or otherwise calculates a new set of four kernel values for each input sample position and stores them in the horizontal kernel registers 130. Each of the four new kernel coefficients is added to the contents of a corresponding register in a bank of registers 153. A separate bank of adders 135 is provided for this purpose. The registers 153 and the adders 135 in the configuration illustrated collectively form a horizontal kernel accumulator 150. In the circuit arrangement 199, the kernel is assumed to be a third order poly-phase cubic interpolation kernel. A different “phase” of the kernel is applied to each input sample depending on its position relative to the positions of the distinct output samples to which it contributes. Suitable kernel coefficient generators and methods for their construction are known in the art.
The horizontal output accumulator 140 is arranged as a FIFO (first in first out) queue of output storage locations. It represents a sliding window containing partially calculated output samples. This data structure will be referred to herein as a “sliding accumulator”, and more particularly for this case, a “sliding output accumulator”. The horizontal kernel accumulator 150 is also a sliding accumulator, with each register 153 in the horizontal kernel accumulator 150 corresponding to a register 143 in the horizontal output accumulator 140. Each of the four values stored or contained in the horizontal kernel accumulator 150 represents the partially calculated kernel weight of its corresponding output sample. The kernel weight for a given output sample is the sum of all kernel values that contribute to the output sample. The head of the queue, being register 141, represents the next output sample to be produced.
When all input samples that contribute to the next output sample have been processed, an output process occurs by which the value read from the head register 141 of the horizontal output accumulator 140 is divided by the value read from the corresponding head register 151 of the horizontal kernel accumulator 150. This function is performed by a divider 160, and the result is written to the output stream 170. Each time an output sample is produced, the retained (remaining) contents of the horizontal output accumulator 140 are advanced so that the next partial result advances to the head of the queue and the contents of a last or end register 142 in the horizontal output accumulator 140 are reset to 0. Similarly, the contents of the horizontal kernel accumulator 150 are advanced so that the next value advances to the head 151 of the queue and the contents of the last register 152 are reset to 0.
In the circuit 299 illustrated in
As each input sample is received from the input source 200, it is multiplied by four vertical kernel values stored in a bank of four vertical kernel registers 230 to produce four contributions. The multiplication is performed by a bank of four multipliers 205. Each contribution is added to the contents of a different vertical output accumulator register 243 in a vertical output accumulator cache 280. The addition is performed by a bank of four adders 245, the results being written back into the vertical output accumulator registers 243 of the cache 280. The values contained in the four vertical output accumulator registers 243 correspond to four different vertically adjacent output samples.
The values in the vertical kernel registers 230 are generated by a kernel coefficient generator 210 that is synchronized to the input sample source via an input line clock 220. The generator 210 generates a new set of four kernel values for each line of input and stores them in the vertical kernel registers 230. Each of the four new kernel coefficients is added to the contents of a corresponding register in a bank of registers 253 at the start of each line of input. A separate bank of adders 235 is provided for this purpose. The registers 253 and the adders 235 configured for this purpose as illustrated collectively form a vertical kernel accumulator 250. In this implementation, the kernel is a third order cubic interpolation kernel, although other FIR (finite impulse response) filter kernels may also be used. A different phase of the kernel is applied to each horizontal line of input samples dependent on its position relative to the positions of the output samples to which it contributes.
In order to produce output in raster scan order, unlike the horizontal re-scaling circuit 199, the vertical rescaling circuit 299 requires an additional buffer of four output lines. That buffer is referred to herein as “the vertical output accumulator buffer” 240. The number of lines of the buffer 240 is at least equal to the number of registers in the vertical output accumulator cache 280. In general, the number of lines of buffering required depends on the filter kernel used. In a preferred implementation, which uses a third order cubic kernel that is four output samples high, four lines of buffer are required. This is a minimum, although a number of lines not significantly more than the minimum may be used. The vertical output accumulator buffer 240 is a sliding window containing partially calculated output samples. Before each input sample is processed, a block 281 of four vertically adjacent samples from the vertical output accumulator buffer 240 is loaded into the vertical output accumulator cache 280. The block 281 essentially represents the ‘sliding window’ at one point in time. These samples correspond to the output samples that are affected by the next input sample. The vertical output accumulator cache 280 is a block of four registers 243 used to temporarily store the values of four partially calculated output samples. These registers 243 are used to accumulate the contributions of the input samples as the input samples are processed. As each input sample is processed, the contents of the vertical output accumulator cache 280 are updated and the values are written back into the block 281 of the vertical output accumulator buffer 240. From this description, it will now be appreciated that the collective function of the cache 280, the adders 245 and the buffer 240 is essentially a vertical equivalent of the horizontal output accumulator 140 of
Like the horizontal kernel accumulator 150 in the horizontal re-sampling circuit 199 in
When all input samples that contribute to the next output sample have been processed, the value from the head register 241 of the vertical output accumulator cache 280 is then output for division by the value read or output from the head register 251 of the vertical kernel accumulator 250. The division is performed by a divider 260 and the result is written to the output stream 270. Each time an output sample is produced, the contents of the vertical output accumulator cache 280 are advanced so that the next partial result advances to the head of the queue and the contents of the last register 242 are reset to 0. This is done before writing the contents of the vertical output accumulator cache 280 back to the vertical output accumulator buffer 240.
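The vertical arrangement may be modelled in software as follows. This is a simplified sketch under stated assumptions and is not a description of the circuit 299 itself: the Keys cubic with a = -0.5 is assumed, the four-line buffer is modelled as a queue of rows, the vertical kernel accumulator is updated once per input line, and a completed output line is emitted as a whole rather than sample by sample. The names are hypothetical.

```python
import math
from collections import deque

def cubic(x, a=-0.5):
    """Keys cubic kernel (a = -0.5 is an assumption)."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0

def downscale_vertical(rows, s):
    """Downscale a list of equal-length rows vertically by ratio s >= 1."""
    width, n_out = len(rows[0]), int(len(rows) / s)
    out_buf = deque([0.0] * width for _ in range(4))  # four-line output buffer
    ker_acc = deque([0.0] * 4)                        # vertical kernel accumulator
    result = []

    def pos(i):
        """Vertical position of input line i in output-line units (assumed)."""
        return (i + 0.5) / s - 0.5

    head = math.floor(pos(0)) - 1

    def emit_line():
        nonlocal head
        line, weight = out_buf.popleft(), ker_acc.popleft()
        if 0 <= head < n_out:
            result.append([v / weight for v in line])  # one weight per line
        out_buf.append([0.0] * width)                  # last line reset to zero
        ker_acc.append(0.0)
        head += 1

    for i, row in enumerate(rows):
        # One new set of four kernel values per input line.
        kernel = [cubic(head + slot - pos(i)) / s for slot in range(4)]
        for slot, k in enumerate(kernel):
            ker_acc[slot] += k
        for x, value in enumerate(row):                # samples in raster order
            for slot, k in enumerate(kernel):
                out_buf[slot][x] += k * value
        while head + 2 <= pos(i + 1):
            emit_line()
    while head < n_out:
        emit_line()
    return result
```

Only four lines of partial results are held at any time, whatever the number of input lines.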
Unlike the horizontal kernel accumulator 150 in the horizontal circuit 199 of
It will therefore be appreciated that the arrangements of
In the arrangements of
Moreover, whilst the example of
When implemented using a computer system 1300, such as that shown in
As seen in
The computer module 1301 typically includes at least one processor unit 1305, and a memory unit 1306, for example formed from semiconductor random access memory (RAM) and read only memory (ROM). The module 1301 also includes a number of input/output (I/O) interfaces including an audio-video interface 1307 that couples to the video display 1314 and loudspeakers 1317, an I/O interface 1313 for the keyboard 1302 and mouse 1303 and optionally a joystick (not illustrated), and an interface 1308 for the external modem 1316 and printer 1315. In some implementations, the modem 1316 may be incorporated within the computer module 1301, for example within the interface 1308. The computer module 1301 also has a local network interface 1311 which, via a connection 1323, permits coupling of the computer system 1300 to a local computer network 1322, known as a Local Area Network (LAN). As also illustrated, the local network 1322 may also couple to the wide network 1320 via a connection 1324, which would typically include a so-called “firewall” device or similar functionality. The interface 1311 may be formed by an Ethernet™ circuit card, a wireless Bluetooth™ or an IEEE 802.11 wireless arrangement.
The interfaces 1308 and 1313 may afford both serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 1309 are provided and typically include a hard disk drive (HDD) 1310. Other devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 1312 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g. CD-ROM, DVD), USB-RAM, and floppy disks, for example, may then be used as appropriate sources of data to the system 1300.
The components 1305 to 1313 of the computer module 1301 typically communicate via an interconnected bus 1304 and in a manner which results in a conventional mode of operation of the computer system 1300 known to those in the relevant art. Examples of computers on which the described arrangements can be practiced include IBM PCs and compatibles, Sun Sparcstations, Apple Mac™ or like computer systems evolved therefrom.
Typically, the application programs discussed above are resident on the hard disk drive 1310 and read and controlled in execution by the processor 1305. Intermediate storage of such programs and any data fetched from the networks 1320 and 1322 may be accomplished using the semiconductor memory 1306, possibly in concert with the hard disk drive 1310. In some instances, the application programs may be supplied to the user encoded on one or more CD-ROM and read via the corresponding drive 1312, or alternatively may be read by the user from the networks 1320 or 1322. Still further, the software can also be loaded into the computer system 1300 from other computer readable media. Computer readable media refers to any storage medium that participates in providing instructions and/or data to the computer system 1300 for execution and/or processing. Examples of such media include floppy disks, magnetic tape, CD-ROM, a hard disk drive, a ROM or integrated circuit, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 1301. Examples of computer readable transmission media that may also participate in the provision of instructions and/or data include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The second part of the application programs and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 1314. Through manipulation of the keyboard 1302 and the mouse 1303, a user of the computer system 1300 and the application may manipulate the interface to provide controlling commands and/or input to the applications associated with the GUI(s).
Implementing the method 300 illustrated in
The method 300 illustrated in
In step 315, an input sample is obtained from the input source. In a software implementation, the input source may be an image derived from the storage devices 1309 or the optical drive 1312. The input source may be a sequence of images, such as video data. Images may further be sourced from the networks 1320 and 1322, perhaps streamed in real time. In step 320 a set of kernel values is determined, based on the position of the input sample relative to the positions of the output samples that depend on the input sample. The kernel value, used to calculate the contribution of a given input sample to a given output sample, is for example $s^{-1}k(x_o - x_i)$, where $s$ is the downscaling ratio, $k$ is the continuous kernel function, $x_o$ is the horizontal coordinate of the output sample and $x_i$ is the horizontal coordinate of the input sample. The downscaling ratio may be set by a user, or established by a default setting, whereas the kernel function is generally predetermined for the particular application. Some implementations may offer a selection of kernel functions. The coordinate system used is such that a distance of 1.0 equals the horizontal spacing between output samples. A preconfigured calculator may be used to determine the kernel values. In step 325, the input sample is multiplied by each of the kernel values determined in step 320 to form a set of contributions. Each contribution represents the contribution of the input sample to a different output sample. In step 330, each contribution is added (arithmetically) to the contents of a corresponding register in the output accumulator, the result of each addition being written back to the corresponding register in the output accumulator. Also, each kernel value is added (arithmetically) to a corresponding register in the kernel accumulator in step 335, the result of each addition being written back to the corresponding register in the kernel accumulator.
In step 340 it is determined whether all of the input that contributes to the next output sample has been processed. If not, the method 300 returns to step 310 to process the next input sample if it is available; otherwise the method proceeds to step 345.
In step 345, the value at the head of the output accumulator is divided by the value at the head of the kernel accumulator to produce an output sample. This is written to the output stream in step 350. Since the values at the head of the output accumulator and at the head of the kernel accumulator have been used and are no longer needed, the contents of the output accumulator and the kernel accumulator are advanced (i.e. shifted along by one value) in steps 355 and 360 respectively. The method 300 then returns to step 305, where the last value in the output accumulator is reset to zero. Similarly, the value stored in the last register of the kernel accumulator is reset to zero in step 307.
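The determination of kernel values in step 320 may be illustrated as follows. The sketch assumes the Keys cubic with a = -0.5 as the continuous kernel function, output samples located at integer coordinates, and hypothetical function names. For an input sample at a given coordinate it returns the four output coordinates to which the sample contributes, together with the kernel value for each.

```python
import math

def cubic(x, a=-0.5):
    """Keys cubic kernel (a = -0.5 is an assumption)."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0

def kernel_values(x_in, s):
    """Return (x_out, k(x_out - x_in) / s) for the four affected outputs."""
    first = math.floor(x_in) - 1
    return [(x_out, cubic(x_out - x_in) / s)
            for x_out in range(first, first + 4)]

# An input sample a quarter of an output spacing to the right of output 0,
# with a downscaling ratio of 2.
values = kernel_values(0.25, 2.0)
```

The four kernel values for a single input sample sum to 1/s for this kernel, whereas the sum of the values received by a single output sample is the kernel weight that is accumulated in step 335.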
Many variations of the method 300 illustrated in
In a practical realization of the method 300 illustrated in
The method 400 illustrated in
The method illustrated in
The method 400 then proceeds to step 410 where a test is performed to determine if there are any input samples to process. If not, the method 400 ends at step 403, otherwise the method 400 proceeds to step 415. In practice, step 410 simply tests if the current input pixel location lies within the bounds of the input image where these bounds are expanded to include any extension of the image at the boundaries. A 2D input position can be maintained for this purpose and its practice is well understood in the prior art. In general many different methods could be used according to implementation constraints. For example a 1D raster position could also be used, in which the input source could provide a signal indicating end of line and end of frame.
In step 415, an input sample is obtained from the input source. As each input sample is only read once, step 415 may also involve incrementing the input position according to a raster scan, or caching values returned by the sample fetching process for subsequent testing at decision 410.
In step 420, the input sample is multiplied by each of the kernel values determined in step 407 to form a set of contributions. Each contribution represents the contribution of the input sample to a different output sample. In step 430, each contribution is added (arithmetically) to the contents of a corresponding location in the output accumulator, the result of each addition being written back to the same address in the output accumulator. Note that the vertical down-sampling method 400 described in
In step 440 it is determined whether all of the input that contributes to the next output sample has been processed. If not, the method 400 returns to step 410 to process the next input sample, otherwise the method proceeds to step 445. A typical interpolating filter kernel, such as a third order cubic filter as may be employed for the method 400, has height equal to four times the spacing between the output samples and is symmetrical about the origin. As a consequence, each output sample depends on input samples that have the same horizontal position as the output sample and vertical distance no larger than twice the output spacing from said output sample. The test of step 440 can be implemented as follows: if $(x_i, y_i)$ are the coordinates of the input sample just processed (i.e. the one obtained in step 415), then after processing the input sample in steps 420 and 430, the next output sample is ready if $y_i + s_i - y_o > 2s_o$, where $y_o$ is the vertical coordinate of the output sample, $s_i$ is the vertical spacing of the input samples and $s_o$ is the vertical spacing of the output samples. In this example, vertical coordinates increase downwards, and samples are assumed to be located at the centres of the output pixels.
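The test of step 440 may be expressed directly in code. The following sketch uses hypothetical function names and assumes that input line n is located at vertical coordinate n times the input spacing. The helper `first_ready_line` is included only to show at which input line a given output sample becomes ready.

```python
def next_output_ready(y_in, s_in, y_out, s_out):
    """True when the following input line lies beyond the kernel support.

    y_in  : vertical coordinate of the input sample just processed
    s_in  : vertical spacing of the input samples
    y_out : vertical coordinate of the next output sample
    s_out : vertical spacing of the output samples
    """
    return y_in + s_in - y_out > 2.0 * s_out

def first_ready_line(y_out, s_in, s_out, n_lines):
    """Index of the first input line after which the output is ready, if any."""
    for line in range(n_lines):
        if next_output_ready(line * s_in, s_in, y_out, s_out):
            return line
    return None
```

For example, with an input spacing of 0.5 and an output spacing of 1.0, the output sample at coordinate 0 becomes ready after input line 4 has been processed, and the output sample at coordinate 1 after input line 6.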
In step 445, the value at the head of the output accumulator is divided by the value at the head of the kernel accumulator to produce an output sample.
The output value calculated at step 445 is written to the output stream in step 450. In practice this step will include incrementing an output position according to a raster scan based on the size of the re-sampled output. Subsequently at step 455 the contents of the output accumulator are advanced (i.e. shifted by one sample) and the last value in the output accumulator is reset to zero at step 460. In the simplest possible implementation, the output accumulator is implemented using a linear memory and the simplest way to perform the advancing is by physically moving all of the samples. Other methods of implementing this are described below.
At decision step 462 a test is performed to determine if all the output samples for the current output line have been written. This could be performed, for example, by considering the current output position. If the current line is complete, then the method proceeds to step 465 where the contents of the kernel accumulator are advanced by one and execution returns to step 405, with the last register being reset to 0.
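The accumulate, test and emit cycle of steps 415 to 460 may be illustrated by the following minimal sketch, which processes a single column of samples only. The function names `cubic` and `downsample_column`, the choice of a Keys cubic with parameter a = −0.5, unit input spacing, and the advancing of the kernel accumulator after every output sample (rather than once per output line) are all assumptions made for illustration and are not taken from the described arrangements. The accumulators are advanced by physically moving the samples, as in the simplest implementation mentioned above.

```python
def cubic(t, a=-0.5):
    """Keys cubic kernel (a = -0.5 is an assumed parameter); zero for |t| >= 2."""
    t = abs(t)
    if t < 1.0:
        return (a + 2.0) * t**3 - (a + 3.0) * t**2 + 1.0
    if t < 2.0:
        return a * (t**3 - 5.0 * t**2 + 8.0 * t - 4.0)
    return 0.0


def downsample_column(samples, ratio, taps=4):
    """Downscale one column by `ratio` (>= 1) using output-side accumulation."""
    s_i, s_o = 1.0, float(ratio)        # input and output sample spacing
    n_out = int(len(samples) / ratio)   # number of output samples to produce
    out_acc = [0.0] * taps              # output accumulator
    ker_acc = [0.0] * taps              # kernel (weight) accumulator
    head = 0                            # index of the output sample at the head
    result = []

    def emit():
        nonlocal head
        # Step 445: divide the accumulated output by the accumulated weight.
        result.append(out_acc[0] / ker_acc[0])
        # Steps 455/460: advance by one sample and reset the last entry.
        out_acc.pop(0)
        out_acc.append(0.0)
        ker_acc.pop(0)
        ker_acc.append(0.0)
        head += 1

    for k, x in enumerate(samples):
        y_i = (k + 0.5) * s_i           # sample centre of input k
        # Steps 420/430: add the contribution to each pending output sample.
        for slot in range(taps):
            y_o = (head + slot + 0.5) * s_o
            w = cubic((y_i - y_o) / s_o)
            out_acc[slot] += w * x
            ker_acc[slot] += w
        # Step 440: the head is ready once the next input lies beyond its support.
        while len(result) < n_out and y_i + s_i - (head + 0.5) * s_o > 2.0 * s_o:
            emit()

    # Flush output samples whose support extends past the end of the input.
    while len(result) < n_out:
        emit()
    return result
```

Because each output is divided by its own accumulated kernel weight, a constant input column yields the same constant at every output position, including those near the ends of the column where part of the kernel falls outside the input.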
The simplest way to perform the advancing of the output accumulator is by physically moving all of the samples. There are many ways of implementing such a buffer so that physically moving the data is not necessary. In the hardware implementation, the cache 280 is employed to reduce memory access bandwidth to the output accumulator as is shown in
The second snapshot 520 of
The third snapshot 520 of
From
There are numerous ways of implementing an output accumulator according to the present disclosure. The vertical output cache 280 as shown in
Both the hardware and software arrangements described above make use of accumulators for their operation. An accumulator operates to combine or add an input value to an existing value. This may occur a number of times to thereby accumulate a number of input values. In some hardware implementations, accumulation can take place within registers configured to perform this function. The register represents a storage location and the result of the addition replaces the previous contents of the storage location. In
An advantage of the arrangements presently described is that a line buffer is not required between the horizontal rescaling circuit 620 and the vertical rescaling circuit 630 and also that the image reduction circuit 600 may be inserted as an independent component of a chain of video processing circuits without any additional buffering being required.
The apparatus described in
The continuous interpolating filter has nominally a total weight (area under the curve) equal to 1.0, and the ideal kernel weight for any output sample will also be equal to 1.0. When the continuous kernel is sampled to produce discrete kernel values, the discrete kernel values may be normalized so that the average weight is 1.0 by dividing by the downscaling ratio. This makes the weight, i.e. the total of the kernel values, for each output sample close to 1.0, but generally not exactly 1.0. To avoid a costly division operation each time an input sample is processed, the reciprocal of the re-scaling ratio may be calculated once before processing (possibly using slower low-cost hardware such as a general purpose processor), and the kernel values may be normalized by multiplying the sampled kernel values by the reciprocal. The normalization is desirably carried out as part of the kernel coefficient generator (110 or 210).
In practice, there are two stages of normalization, and therefore two divisions that would need to be performed. One division is to divide the value sampled from the continuous kernel by the scaling ratio. This makes the kernel weight approximately 1.0, but not exactly 1.0. To make the weights exactly 1.0, it is necessary to divide each accumulated output value by the actual kernel weight. The first division can be avoided by multiplying by the reciprocal of the scaling ratio. The second division can be avoided by either using a table of reciprocals or by approximation of the reciprocal.
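The two stages of normalization may be illustrated by the following sketch. The names `cubic`, `normalized_kernel` and `resample_one`, the Keys cubic with a = −0.5, unit input spacing and sample positions at pixel centres are assumptions made for illustration only. The first stage multiplies each sampled kernel value by a reciprocal of the scaling ratio that the caller computes once; the second stage divides the accumulated output by the actual kernel weight.

```python
def cubic(t, a=-0.5):
    """Keys cubic kernel (a = -0.5 is an assumed parameter); zero for |t| >= 2."""
    t = abs(t)
    if t < 1.0:
        return (a + 2.0) * t**3 - (a + 3.0) * t**2 + 1.0
    if t < 2.0:
        return a * (t**3 - 5.0 * t**2 + 8.0 * t - 4.0)
    return 0.0


def normalized_kernel(ratio, recip, j, n_in):
    """Stage 1: sample the kernel for output j and scale by the reciprocal ratio."""
    y_o = (j + 0.5) * ratio                     # centre of output sample j
    return [cubic(((k + 0.5) - y_o) * recip) * recip for k in range(n_in)]


def resample_one(samples, ratio, j):
    """Return (output value, kernel weight) for output sample j."""
    recip = 1.0 / ratio                         # in practice computed once, up front
    coeffs = normalized_kernel(ratio, recip, j, len(samples))
    weight = sum(coeffs)                        # close to, but not exactly, 1.0
    acc = sum(c * x for c, x in zip(coeffs, samples))
    return acc / weight, weight                 # stage 2: exact normalization


if __name__ == "__main__":
    flat = [5.0] * 40
    for j in range(3, 7):
        value, weight = resample_one(flat, 2.3, j)
        print(j, weight, value)
```

For a ratio such as 2.3 the printed weights differ slightly from 1.0 and vary from one output sample to the next, which is why the second stage is needed to reproduce a flat input exactly.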
Further, if the average weight is normalized to 1.0, then the fractional bits of the average weight when represented as a fixed point binary fraction are all zero, so the low order bits of a kernel weight as calculated in the method described in either
A first optimization is that it is only necessary to accumulate the low order bits of the kernel values to calculate the low order bits of the kernel weights, so only a small number of bits need to be stored in the kernel accumulators 150 and 250 referred to in
A second optimization is to calculate the reciprocal of the kernel weights from the residual kernel weights using the formula: (1 + e)^(−1) ≈ 1 − e. If the kernel weight is 1+e, where e is the residual kernel weight, then (1−e), being the result of subtracting the residual kernel weight from the ideal kernel weight, is a close approximation of the reciprocal of the kernel weight, and thus the need for costly division hardware is avoided by multiplying by the reciprocal of the kernel weight instead of dividing by the kernel weight. The dividers 160 and 260 may then be replaced by multipliers. An alternative optimization is to store the reciprocals of the kernel weights in a look up table indexed by the residual kernel weight, and such a table can be used to identify or look up predetermined reciprocals of the kernel weights. The reciprocal look up table does not need to be very large because deviation from the average weight is typically very small. For third order cubic filters, a table of 64 12-bit values is sufficient to calculate the reciprocal to 10 bits. These optimizations are described below with reference to
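Both ways of avoiding the division may be illustrated by the following sketch. The fixed-point formats used here (a residual quantized in steps of 2^−10 and table entries holding 11 fractional bits), the table indexing and the names `approx_reciprocal`, `RECIP_TABLE` and `table_reciprocal` are assumptions chosen for illustration; only the table size of 64 entries is taken from the text above.

```python
RESIDUAL_FRAC_BITS = 10   # assumed: residual e is quantized in steps of 2**-10
TABLE_SIZE = 64           # table size quoted in the text
RECIP_FRAC_BITS = 11      # assumed: stored reciprocals have 11 fractional bits


def approx_reciprocal(weight):
    """First-order approximation 1/(1 + e) ~ 1 - e, with e = weight - 1."""
    e = weight - 1.0
    return 1.0 - e


# Entry i holds the reciprocal for residual e = (i - 32) * 2**-10, as an integer.
RECIP_TABLE = [
    round((1 << RECIP_FRAC_BITS)
          / (1.0 + (i - TABLE_SIZE // 2) / (1 << RESIDUAL_FRAC_BITS)))
    for i in range(TABLE_SIZE)
]


def table_reciprocal(residual_fixed):
    """Look up 1/(1 + e) for a signed residual in units of 2**-10, in [-32, 31]."""
    return RECIP_TABLE[residual_fixed + TABLE_SIZE // 2] / (1 << RECIP_FRAC_BITS)


def normalize(acc, weight):
    """Multiply by the approximate reciprocal instead of dividing by the weight."""
    return acc * approx_reciprocal(weight)
```

The error of the first-order approximation is e²/(1 + e), so it shrinks quadratically as the kernel weight approaches 1.0. Under the assumed formats every table entry fits in 12 bits, and the rounding error of each entry is at most 2^−12.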
A process 7000 of calculating and using reciprocals of kernel weights is described in
According to these various optimizations, the complexity of the kernel coefficient generators 110 and 210 is increased marginally, but in return a division operation is replaced with a multiplication operation, possibly performed on smaller (residual) values. These optimizations provide for a simpler hardware implementation that, in integrated applications, will avoid excessive chip area consumption. Further, by virtue of the basic processes of
It is apparent from the above that the arrangements described are applicable to the computer and data processing industries and particularly to instances where downsampling of images, such as video images, is desired. An example of this may be where a video is captured using a hand-held video camera at television data rates (e.g. 625 lines per frame at 25 frames per second for the PAL system), and it is desired to transfer that video footage into a reduced format suitable for distribution via a web page on the World Wide Web. Another example is for the real-time display of picture-in-picture images in television or video displays. Another example of where these approaches may be used is in a large digital camera having an integral display operating in a fixed ratio video mode. Accurate downsampling provides for quality image reproduction in the smaller format with minimal artifacts.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
Number | Date | Country | Kind |
---|---|---|---|
2007219336 | Sep 2007 | AU | national |