The invention relates to an apparatus and method for processing video data, and in particular to a single instruction multiple data (SIMD) processor that is adapted for processing de-interlacing algorithms.
Video signals come in different frame rates, which makes video format conversion a core task in almost all video processing apparatus. For example, movie pictures are recorded at 24, 25 or 30 Hz, while TV signals are interlaced at either 50 Hz or 60 Hz. In addition, modern displays often work at higher display rates to reduce flickering (for example, interlaced at 75 Hz, 90 Hz, 100 Hz, etc.). Video frame-rate conversion is therefore an important function in bridging these dissimilar domains, including the display of interlaced TV signals on a computer monitor, which is based on progressive scan.
De-interlacing is the task of calculating the odd lines from an even field and vice versa. On the low-end side of the performance scale are the algorithms that perform line repetition or line averaging (both of which are intra-field interpolation methods). On non-moving sequences the result of these algorithms suffers from the original 25 or 30 Hz line flickering. Another de-interlacing method is line insertion. Here the missing lines are copied from the same vertical position from the previous field (this is an inter-field interpolation method). On non-moving sequences this algorithm performs very well. However, even with just slightly moving sequences annoying artefacts become visible in the displayed image.
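By way of illustration only, the three low-end methods described above can be sketched as follows. The sketch assumes that a field is a list of pixel rows, that the current field carries the even lines of the frame, and that the bottom edge is handled by repeating the last line; the function names are hypothetical and are not taken from the figures.

```python
def line_repetition(field):
    """Intra-field: each missing line repeats the field line above it."""
    frame = []
    for line in field:
        frame.append(list(line))
        frame.append(list(line))
    return frame


def line_averaging(field):
    """Intra-field: each missing line is the mean of the lines above and below."""
    frame = []
    for i, line in enumerate(field):
        # Assumption: at the bottom edge the last line stands in for the line below.
        below = field[i + 1] if i + 1 < len(field) else line
        frame.append(list(line))
        frame.append([(a + b) // 2 for a, b in zip(line, below)])
    return frame


def line_insertion(current_field, previous_field):
    """Inter-field: each missing line is copied from the previous field."""
    frame = []
    for cur, prev in zip(current_field, previous_field):
        frame.append(list(cur))
        frame.append(list(prev))
    return frame
```

Line repetition and line averaging use only the current field, whereas line insertion needs the previous field as well, which is why it behaves well on still images and poorly on moving ones.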
In the past decades, extensive work has been carried out to improve the quality of displayed video material via smart algorithms that have benefited from the growing computational power of integrated circuits. Known methods either provide dedicated ASICs to deal with the computational complexity of high-performance algorithms, or implement part of the algorithm on media processing integrated circuits, such as the applicant's TriMedia processor. Advanced frame-rate conversion techniques apply methods for motion compensation and direction-dependent (edge-dependent) de-interlacing to generate high-quality displayed images. On the high end of the performance scale are the motion compensation methods that use information from the past, shifted according to an appropriate motion vector. Edge-dependent de-interlacing is a method for effectively removing jagged edges from interlaced video. It detects and quantifies edges for optimal image interpolation, with applications in high-end as well as in economy interlacing. An example of advanced de-interlacing is disclosed in “IC for Motion-Compensated De-Interlacing, Noise reduction and Picture Rate Conversion” by G. de Haan, IEEE Transactions on CE, vol. 45, no. 3, August 1999.
FIGS. 3a and 3b show examples of pseudo-code for carrying out majority-select median filtering for de-interlacing and edge-dependent post-processing, respectively. It is noted that a median filter de-interlacing algorithm combines the benefits of line repetition and line insertion, whereby pixels in missing lines are calculated by taking the median of two pixels from the neighbouring lines in the current field, and one pixel from the line on the same vertical position in the previous field. All of these high-end algorithms are computationally intensive and demand high performance figures.
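A minimal sketch of the plain three-tap median described above is given below. It is not the majority-select variant of the figure, and the function names are hypothetical; each missing pixel is taken as the median of the pixel above and the pixel below in the current field and the co-located pixel in the previous field.

```python
def median3(a, b, c):
    """Return the middle value of three pixel values."""
    return sorted((a, b, c))[1]


def median_deinterlace_line(above, below, previous):
    """Reconstruct one missing line from two current-field lines and one previous-field line."""
    return [median3(a, b, p) for a, b, p in zip(above, below, previous)]
```

Where the previous-field pixel lies between its vertical neighbours, it is passed through (as in line insertion); where it is an outlier, for instance because of motion, one of the current-field neighbours is selected instead (as in line repetition).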
Although it is known to implement such algorithms in parallel processing arrays, such systems do not make efficient use of the de-interlacing functions.
It is therefore the aim of the present invention to provide a SIMD processor that is adapted to process de-interlacing algorithms more efficiently.
According to a first aspect of the present invention, there is provided a processor array for de-interlacing a video data signal, the processor array comprising: an array of processing elements for processing the video data signal to produce a de-interlaced video signal; a previous video field memory, the previous video field memory storing a first plurality of pixels from a previous video field; a current video field memory, the current video field memory storing a plurality of pixels from a current video field; and a next video field memory, the next video field memory storing a plurality of pixels from a next video field, wherein the processor array is configured such that the previous video field memory, the current video field memory and the next video field memory can be accessed simultaneously during a de-interlacing operation.
The architecture described above provides high performance, flexibility and low power consumption.
According to another aspect of the present invention, there is provided a method of de-interlacing a video data signal using a processor array having a plurality of processing elements for processing the video data signal to produce a de-interlaced video signal, the method comprising the steps of: storing a first plurality of pixels from a previous video field in a previous video field memory; storing a plurality of pixels from a current video field in a current video field memory; storing a plurality of pixels from a next video field in a next video field memory; and enabling the previous video field memory, the current video field memory and the next video field memory to be accessed simultaneously during a de-interlacing operation.
For a better understanding of the present invention, and to show more clearly how it may be carried into effect, reference will now be made, by way of example only, to the following figures in which:
FIG. 3a shows typical pseudo-code for majority-select median filtering for de-interlacing;
FIG. 3b shows typical pseudo-code for edge-dependent post-processing;
As with a conventional SIMD processor, the architecture comprises a Linear Processor Array (LPA) 41 having a plurality of Processing Elements (PEs) 42. The LPA 41 can have as many PEs 42 as the number of pixels in a line, for example. Each PE 42 operates on its pixel data based on a common instruction which is broadcast to all PEs 42 from a global control processor 44. The result of the LPA 41 is written in parallel to an output line memory 45. A serial processor 46 performs appropriate post processing (for example, format conversion and statistical processing) on the outgoing video data.
Depending on the chosen operating frequency, the LPA 41 can execute a pre-defined number of operations per image line. Due to the pixel-level parallelism, the same number of instructions are available for processing each pixel.
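The relationship between operating frequency and the per-line instruction budget can be illustrated with a short calculation. The sketch assumes one instruction issued per clock cycle and uses illustrative figures only (a 50 MHz clock and a 625-line, 25 frames-per-second source); neither the clock frequency nor the function name is taken from the embodiment.

```python
def instructions_per_line(clock_hz, lines_per_frame, frames_per_second):
    """Instructions available per image line, assuming one instruction per clock cycle."""
    line_rate = lines_per_frame * frames_per_second  # lines per second
    return clock_hz // line_rate


# Illustrative figures only: 625 lines x 25 frames/s gives 15 625 lines per second.
budget = instructions_per_line(50_000_000, 625, 25)
print(budget)  # 3200
```

Because every processing element executes the same broadcast instruction, this budget applies equally to every pixel in the line.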
The global control processor 44 is responsible for the synchronization of the entire SIMD processor architecture. The main task of the global control processor 44 is to update the program counter, to fetch and decode instructions and pass them to the LPA 41. Additionally, the global control processor 44 can receive statistical information from the serial processor 46 and perform dynamic adaptation of filter coefficients, or can even control the flow of the actual program. The global control processor 44 also interfaces to the outside world for program downloading and communicating status information. These features are common in a SIMD processor architecture.
According to the present invention, the SIMD processor architecture described above is adapted to enable the processor to perform de-interlacing tasks more efficiently. The enhancements comprise a field access module (FAM) 47, an input line memory 48 and a shadow memory 49 within the working line memory 43. The input line memory 48 comprises a previous video field memory 481, a current video field memory 482 and a next video field memory 483. The previous video field memory 481 stores a first plurality of pixels from a previous video field, the current video field memory 482 stores a plurality of pixels from a current video field, and the next video field memory 483 stores a plurality of pixels from a next video field.
In a similar manner, the shadow memory 49 comprises a previous-copy video field memory 491, a current-copy video field memory 492, and a next-copy video field memory 493. The previous-copy video field memory 491 stores a first plurality of pixels from a previous copy of the video field, the current-copy video field memory 492 stores a plurality of pixels from a current-copy of the video field, and the next-copy video field memory 493 stores a plurality of pixels from a next copy of the video field.
The de-interlacing algorithm for operating on the received video signal, for example an edge-dependent de-interlacing algorithm, is stored in a program memory 50 together with other video processing codes, and operates on the three video fields, i.e. the previous, current and next video fields. The processing is conducted in a pipelined fashion in which the processor array operates on the shadow memories 491, 492, 493 while the input line memories 481, 482, 483 are being filled with new data. The architecture is easily scalable to match the desired area, speed and power dissipation trade-offs.
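The pipelined operation described above can be modelled, purely for illustration, as a two-stage loop. The sketch is sequential software, so the overlap between filling and processing is only simulated; the copy from input memory to shadow memory at each line boundary, and the names `run_pipeline` and `process`, are assumptions made for the example.

```python
def run_pipeline(line_triples, process):
    """Process (previous, current, next) line triples with a one-line pipeline delay."""
    shadow_mem = None  # stands in for shadow memories 491, 492, 493
    outputs = []
    # One extra step with no new input drains the last triple from the pipeline.
    for triple in list(line_triples) + [None]:
        # Stage 1: the processor array works on the shadow copy ...
        if shadow_mem is not None:
            outputs.append(process(*shadow_mem))
        # Stage 2: ... while the input line memories 481, 482, 483 receive new lines.
        input_mem = triple
        # Assumed line boundary: the filled input memories are copied to the shadow memories.
        shadow_mem = input_mem
    return outputs
```

Each triple is processed one step after it has been loaded, so loading of the following lines never has to wait for the array to finish.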
The field access module 47, input line memory 48 and shadow memory 49 work together to address the data preparation part for enabling the efficient utilization of the SIMD architecture for implementing de-interlacing algorithms. The field access module 47 is configured to provide an interface between a multi-port field memory 51 and the input line-memories 481, 482, 483 through proper addressing and synchronization. The field access module 47 takes care of the change of location of previous, current and next fields in the field memory 51.
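One possible way of tracking the changing location of the previous, current and next fields is a rotating base index over three field slots, sketched below. This is an assumption made for illustration; the embodiment does not specify the addressing scheme, and the class and method names are hypothetical.

```python
class FieldRotation:
    """Maps the roles previous/current/next onto three physical field slots."""

    ROLES = ("previous", "current", "next")

    def __init__(self):
        self.base = 0  # slot currently holding the previous field

    def slot_of(self, role):
        """Return the physical slot index that currently plays the given role."""
        return (self.base + self.ROLES.index(role)) % len(self.ROLES)

    def advance(self):
        """A new field arrives: current becomes previous, next becomes current,
        and the slot of the old previous field is reused for the new next field."""
        self.base = (self.base + 1) % len(self.ROLES)
```

With such a scheme no field data is moved when a new field arrives; only the mapping from role to slot changes.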
The provision of an input line memory 48 in the form of previous, current and next video field memories 481, 482 and 483 facilitates simultaneous three-field access to the previous, current and next video fields by the linear processor array 41. Likewise, the provision of previous-copy, current-copy and next-copy memories 491, 492 and 493 enables simultaneous access to these memories by the linear processor array 41. Further details about how the input line memories 481, 482, 483 and the shadow memories 491, 492, 493 are utilized during a typical de-interlacing process will be provided below.
Thus, according to the processor architecture of the present invention, while the LPA 41 is busy preparing the next output line, the video input port and the serial processor 46 are also busy receiving incoming and sending outgoing video data, respectively.
To facilitate the use of the proposed architectural enhancements, the global control processor is preferably provided with a Shadow and Input Memory Sequencer (SIMS) module 51. The SIMS module 51 is a dedicated task that makes use of the index rotation unit of the global control processor 44 to manage the sequence and updating of the line-memory blocks during de-interlacing.
The field access module 47, input line memory 48 and shadow memory 49 exploit the performance of the SIMD architecture for performing de-interlacing tasks. For example, an implementation of the edge-based de-interlacing algorithm given in
Even though the de-interlacing routine in
One of the features of the architecture is its flexibility originating from the programmability of the architecture. The actual pixel processing can be made adaptive to suit the dynamics of the video signal. Furthermore, the coefficients of the filters used or even the algorithmic flow can be altered on the fly.
The proposed approach achieves high performance at low power, since the parallelism in data processing localizes data access and allows the use of a lower system clock frequency. Consequently, the switching power dissipation is reduced.
Although the preferred embodiment has been described as having three field memories for processing data from current, previous and next fields, it will be appreciated that one or more additional field memories could be provided if data from another field or fields is used in the processing operation. Likewise, fewer field memories could be used if fewer fields are used in the data processing.
Furthermore, although the preferred embodiment discloses the three field memories as being logically separate memories, it will be appreciated that the three field memories could be mapped to one memory with a wide interface to fulfill the bandwidth requirement.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word ‘comprising’ does not exclude the presence of elements or steps other than those listed in a claim.
Number | Date | Country | Kind |
---|---|---|---|
0419870.1 | Sep 2004 | GB | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB05/52901 | 9/6/2005 | WO | 00 | 2/28/2007 |