Turning now to the drawing figures,
Beginning with step 12, a progressive video source is provided, such as a motion picture film. The progressive signal is then converted into a plurality of interlaced video fields in step 14, such as by 3:2 or 2:2 pulldown telecine techniques, as described above. The telecined video fields may comprise a sequence of interlaced top fields, or odd-parity fields, and bottom fields, or even-parity fields. In step 16, each of the interlaced video fields is then partitioned into a plurality of regions. A region can be a horizontal stripe in a field, a vertical stripe in a field, a number of neighboring blocks, or a single block of a certain size. A block may be a group of connected pixels, where two pixels X and Y are said to be connected if X is one of the eight neighbors of Y and vice versa. The region size and/or dimensions can be set to constant values while processing the interlaced video sequence, or, alternatively, the region size and/or dimensions can be dynamically adjusted based upon the content of the interlaced sequence. Ideally, the region is chosen to be small enough to capture film mode variations from region to region in a field, and yet large enough to minimize the storage and computational complexity of the video processing system/device implementing the methodology.
The sequence of partitioned interlaced video fields from step 16 can be defined as f(0), f(1), f(2), . . . , where f(n) is the current field whose film modes are to be determined. The plurality of partitioned regions of f(n) may have different film modes and/or different phases due to possible post-edits as described above. In step 18, statistical measurements are taken on f(n) and its neighboring fields (the fields immediately before and after f(n)), both at field level and region level, in order to detect a temporal periodic pattern in the field/regions. A variety of different types of statistical measurements could be employed in this step, such as the sum of absolute differences (SAD) measurements discussed below.
The plurality of regions in a field f(n) from which the statistical measurements are collected may be overlapping or non-overlapping. In the case of regions defined as a plurality of blocks, if the blocks are non-overlapping, then the blocks are referred to herein as tiles. Thus, tiles are non-overlapping blocks. The plurality of regions in a field from which statistical measurements are collected need not cover the entire field area. This limited-coverage implementation may be desirable to reduce the storage and computational complexity of the device or system implementing the method. Moreover, the regions in a given field may have distinct spatial structures. Thus, for example, the entire top portion of the field could be a single region, whereas the bottom portion of the field could include a plurality of smaller regions, such as blocks.
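By way of illustration only, the partitioning of a field into tiles can be sketched as follows. The function name `tile_bounds`, the default tile size of 8 pixels by 4 lines, and the clipping of edge tiles to the field boundary are assumptions made for this sketch and are not required by the methodology.

```python
def tile_bounds(height, width, tile_h=4, tile_w=8):
    """Yield (top, left, bottom, right) bounds of non-overlapping tiles.

    Assumption for this sketch: tiles at the right and bottom edges are
    clipped to the field boundary rather than padded or dropped.
    """
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            bottom = min(top + tile_h, height)
            right = min(left + tile_w, width)
            yield (top, left, bottom, right)


# Example: a tiny 8-line by 16-pixel field yields four 4x8 tiles.
tiles = list(tile_bounds(8, 16))
```

Because the bounds never overlap and together cover the field, each pixel belongs to exactly one tile in this sketch. A limited-coverage variant would simply iterate over a sub-rectangle of the field.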
Following the statistical measurements in step 18, in step 20 the film mode of each field is set based upon the field level statistical measurements. Then, in step 22, the film mode of each of the partitioned regions in the field is set based upon both the field level statistical measurements and the region level measurements. Typically, if the field level and region level measurements are consistent, then the film mode of the region is set to be the same as the film mode of the entire field. But if the measurements are not consistent, then the film mode of the region is typically set to be either interlaced or that which is indicated by the region level statistics. The determination of the film mode for a region may also take into consideration statistical measurements from other neighboring regions, or from co-located regions in neighboring fields.
Finally, in step 24, the film mode data for the fields and for the plurality of regions within the fields is utilized to process the interlaced video sequence at the region level. An example of this processing step could be a de-interlacing function in which certain regions of a field in the video sequence are de-interlaced using one technique while other regions of the same field are de-interlaced using a different technique.
The methodology described in
In one example of this methodology, a region is defined as a number of neighboring horizontal lines in a field. When a telecine pattern (for example, 3:2 or 2:2 pulldown) is detected at the field level, then each region in the field is examined to determine whether its local statistical measurements are contradictory to the detected field-level film mode. If they are not contradictory, then the film mode of a particular region is set to be the same as the field-level film mode; otherwise, the film modes of the current region and all the remaining regions in the field are set to interlaced mode.
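The stripe-based example above can be sketched as follows. This is a minimal illustration: the function name `stripe_modes`, the string labels for the film modes, and the representation of each stripe's local test as a boolean "contradicts" flag are assumptions of the sketch. How a contradiction is detected from the local statistics is outside its scope.

```python
def stripe_modes(field_mode, stripe_contradicts):
    """Assign a film mode to each horizontal stripe, top to bottom.

    field_mode: the film mode detected at the field level, e.g. "3:2".
    stripe_contradicts: one boolean per stripe, True when the stripe's
        local statistics contradict the field-level film mode.
    """
    modes = []
    fallen_back = False
    for contradicts in stripe_contradicts:
        if contradicts:
            # The first contradictory stripe forces interlaced mode for
            # itself and for every stripe that follows it in the field.
            fallen_back = True
        modes.append("interlaced" if fallen_back else field_mode)
    return modes


example = stripe_modes("3:2", [False, False, True, False])
```

In this example the first two stripes keep the field-level 3:2 mode, and the third and fourth stripes are set to interlaced mode even though the fourth stripe is not itself contradictory.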
For example, consider a block “A” and its eight neighboring blocks “B” to “I”, as shown below.
The film mode of the block “A” may be determined according to the following rules: (i) if the statistical measurements of the block “A” and of at least t1 of its eight neighboring blocks indicate the same film mode as the field-level film mode, then set the film mode of “A” to be the same as the field-level film mode, where t1 is a programmable parameter in the range of 0 to 8 with a default value of 5; (ii) otherwise, if the statistical measurements of the block “A” and of at least t2 of its eight neighboring blocks indicate the same film mode, but one that is different from the field-level film mode, then set the film mode of “A” as indicated by its statistical measurements, where t2 is a programmable parameter in the range of 0 to 8 with a default value of 8; (iii) otherwise, set the film mode of “A” to be interlaced.
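Rules (i) to (iii) can be sketched as follows. The function name `block_film_mode` and the string labels for the film modes are assumptions of this sketch. The parameters `t1` and `t2` and their defaults follow the rules stated above.

```python
INTERLACED = "interlaced"


def block_film_mode(block_mode, neighbor_modes, field_mode, t1=5, t2=8):
    """Decide the film mode of block A from its own and its neighbors' modes.

    block_mode: film mode indicated by block A's own statistics.
    neighbor_modes: film modes indicated by the eight neighboring blocks.
    field_mode: film mode detected at the field level.
    """
    # Number of neighbors whose statistics indicate the same mode as A.
    agreeing = sum(1 for mode in neighbor_modes if mode == block_mode)

    if block_mode == field_mode:
        # Rule (i): A and at least t1 neighbors match the field-level mode.
        if agreeing >= t1:
            return field_mode
    elif agreeing >= t2:
        # Rule (ii): A and at least t2 neighbors agree on a different mode.
        return block_mode
    # Rule (iii): no sufficiently supported film mode.
    return INTERLACED


same_as_field = block_film_mode("3:2", ["3:2"] * 5 + ["2:2"] * 3, "3:2")
local_override = block_film_mode("2:2", ["2:2"] * 8, "3:2")
fallback = block_film_mode("2:2", ["2:2"] * 7 + ["3:2"], "3:2")
```

With the default thresholds, a block needs only five supporting neighbors to follow the field-level mode but all eight to depart from it, which biases the decision toward the field-level result.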
Turning back to
The similarity between two blocks can be, for example, based on the sum-of-absolute-differences (SAD) of all the co-sited pixels in the two blocks. In the case that the two blocks are in two fields having different parities, then SAD can be measured between vertically-neighboring pixels in the two fields. The similarity between two blocks can be measured in a variety of other ways.
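For blocks taken from fields of the same parity, the SAD over co-sited pixels can be sketched as follows. The function name `block_sad` and the representation of a block as a list of rows of integer pixel values are assumptions of this sketch. The opposite-parity case, which compares vertically-neighboring pixels, is not shown.

```python
def block_sad(block_a, block_b):
    """Sum of absolute differences over co-sited pixels of two blocks.

    Each block is a list of rows, and each row is a list of pixel values.
    Both blocks must have the same dimensions.
    """
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same number of pixels")
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
    return total


sad = block_sad([[1, 2], [3, 4]], [[1, 0], [5, 4]])  # 0 + 2 + 2 + 0 = 4
```

A small SAD indicates that the two blocks are similar, as is expected when two fields originate from the same film frame. A large SAD indicates motion or a change of source frame.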
For each block in f(n), its film mode can be determined based on a history of these similarity measurements for a number of past fields. To achieve this, a history of the statistical measurements (s1 to s5) for each block in a field is tracked and stored in a memory. Although a very small block size may lead to better visual performance of the subsequent de-interlacing function, it will likely result in more complex computations and increased storage requirements for the device/system implementing the methodology. Thus, a reasonable trade-off between visual performance and storage/computational complexity can be achieved by using a block size that is reasonably small, but not so small as to unduly increase the storage/computational requirements of the device. The prior art field-based and pixel-based methodologies do not provide for this type of performance/complexity trade-off. Ultimately, the device performing the video processing function can be programmed by a user with different block sizes depending upon whether the user is more interested in maximizing visual performance or in minimizing storage/computational complexity.
If this detection step 46 indicates that there are two relatively small SADs separated by four relatively large SADs, then the block A exhibits the 3:2 pattern and control passes to step 48. Otherwise, the block does not exhibit the 3:2 pattern and thus in step 50 the block is not set to 3:2 mode. At step 48, the neighboring blocks of the block A are examined. If, among the eight immediately neighboring blocks, at least 5 (for example) have the same 3:2 temporal pattern as does block A, then block A is determined to be in 3:2 mode as in step 52; otherwise, block A is not in 3:2 mode as in step 50.
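The detection and neighbor check of steps 46 to 52 can be sketched as follows. Several details here are assumptions made for illustration: the function names, the `ratio` threshold used to decide what counts as "relatively small", and the interpretation of "the same 3:2 temporal pattern" as the same phase modulo five fields. The source does not specify how small and large SADs are distinguished.

```python
def pattern_phase(sads, ratio=0.25):
    """Return the phase (0 to 4) of a 3:2 pattern in a SAD history, or None.

    A window of six consecutive SADs matches when its first and last
    values are both at most `ratio` times the smallest of the four
    values between them (hypothetical threshold).
    """
    for i in range(len(sads) - 5):
        window = sads[i:i + 6]
        floor = min(window[1:5])
        if floor > 0 and window[0] <= ratio * floor and window[5] <= ratio * floor:
            return i % 5
    return None


def block_in_32_mode(block_sads, neighbor_sads, min_neighbors=5, ratio=0.25):
    """Steps 46 to 52: block A must show the pattern, and enough of its
    neighbors must show the same pattern (same phase, by assumption)."""
    phase = pattern_phase(block_sads, ratio)
    if phase is None:
        return False  # step 50: block A does not exhibit the 3:2 pattern
    votes = sum(pattern_phase(s, ratio) == phase for s in neighbor_sads)
    return votes >= min_neighbors  # step 52 if True, step 50 otherwise


history = [100, 100, 5, 100, 100, 100, 100, 4, 100, 100]
flat = [100] * 10
accepted = block_in_32_mode(history, [history] * 5 + [flat] * 3)
rejected = block_in_32_mode(history, [history] * 4 + [flat] * 4)
```

In this example block A is accepted when five of its eight neighbors share its pattern and rejected when only four do, matching the example threshold of 5 given above.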
Operationally, each input field from the input video signal 72 is partitioned into tiles. For example, each tile may be a non-overlapping block 8 pixels wide and 4 lines high. Statistics are gathered for each tile using the blocks 78, 80, including statistics from the tile in the current field and its co-located tile in the previous same-parity field (block 80), and from the tile in the current field and its co-located tile in the previous opposite-parity field (block 78). The field delay blocks 74, 76 are utilized to provide these opposite-parity and same-parity fields to the statistics gathering blocks 78, 80.
The gathered statistics from these blocks 78, 80 are then stored in a statistics memory 82. The statistics memory 82 may include, for example, 10 segments, with each segment storing the statistics gathered for one of the 10 most recent fields. The statistics memory 82 may be utilized in a circular manner at the segment level, i.e., when a new field comes in, the statistics gathered for this new field overwrite the segment corresponding to the oldest field in the memory.
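The circular use of the statistics memory at the segment level can be sketched in software as follows. The class name `StatisticsMemory` and its method names are hypothetical, and an actual device would likely implement this in hardware rather than in the manner shown.

```python
class StatisticsMemory:
    """Circular buffer of per-field statistics, one segment per field."""

    def __init__(self, num_segments=10):
        self.segments = [None] * num_segments
        self.next_slot = 0      # segment that the next field will overwrite
        self.fields_seen = 0    # total number of fields stored so far

    def store_field(self, tile_stats):
        """Store a new field's statistics over the oldest segment."""
        self.segments[self.next_slot] = tile_stats
        self.next_slot = (self.next_slot + 1) % len(self.segments)
        self.fields_seen += 1

    def history(self):
        """Return the stored statistics ordered from oldest to newest."""
        n = len(self.segments)
        count = min(self.fields_seen, n)
        start = (self.next_slot - count) % n
        return [self.segments[(start + k) % n] for k in range(count)]


memory = StatisticsMemory(num_segments=3)
for field_number in range(5):
    memory.store_field(field_number)
recent = memory.history()  # only the three most recent fields remain
```

Once the memory is full, the slot about to be written always holds the oldest field, so no data needs to be moved when a new field arrives.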
Each segment in the memory 82 may be further partitioned into a number of cells, with each cell storing the statistics gathered for a tile in the field. This technique provides a unique one-to-one mapping between the tiles in a field and the cells in the memory segment corresponding to this field. The gathered statistics are written into the statistics memory 82 at the tile clock, which is generated by the tile clock generation logic 92 from the pixel clock 86 and line clock 88 in the input video.
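One possible one-to-one mapping between tiles and cells is a raster-order index, sketched below. The raster ordering, the function name `cell_index`, and the default tile size of 8 pixels by 4 lines are assumptions of this sketch; any other one-to-one mapping would serve equally well.

```python
def cell_index(line, pixel, field_width, tile_w=8, tile_h=4):
    """Map a pixel position in a field to the index of its tile's cell.

    Tiles are numbered left to right, then top to bottom (raster order,
    an assumption of this sketch).
    """
    tiles_per_row = -(-field_width // tile_w)  # ceiling division
    return (line // tile_h) * tiles_per_row + (pixel // tile_w)


first = cell_index(0, 0, 720)        # top-left tile
end_of_row = cell_index(3, 719, 720)  # last tile of the first tile row
next_row = cell_index(4, 8, 720)      # second tile of the second tile row
```

For a 720-pixel-wide field there are 90 tiles per row, so the three example positions map to cells 0, 89, and 91 respectively.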
The data from the statistics memory 82 is provided to the decision making block 84 on the field clock 90. For each tile in an input field, the statistics of the tile and its neighboring tiles in the same field are examined, as are the statistics of the co-located tiles in the previous 9 fields. The statistics of the spatially-neighboring tiles of the co-located tiles may be considered as well in this block 84. If the statistics match the temporal pattern of a certain film mode, then the decision making block 84 determines that the tile is in that film mode with a certain phase. This determination is then provided to the subsequent de-interlacer 94 for the proper processing of the tile into the output video signal.
This written description uses examples to disclose the invention, including the best mode, and also to enable a person skilled in the art to make and use the invention. The patentable scope of the invention may include other examples that occur to those skilled in the art.