The present invention is generally related to video processing and, in particular, to a system and method for enhancing video effects.
Presently, vendors of video services typically produce compressed video sequences from higher quality video sources. The compressed video sequences are then delivered through a communication network to end users for viewing on various devices. The communication network can be a traditional broadcasting network (over the air or cable), a data network (the internet, a mobile network, or a home network), an emerging peer-to-peer network, or a combination of them. The devices that end users use for viewing the produced video sequences have displays of different designs and sizes, such as large screen televisions found in consumers' homes, or small liquid crystal displays (LCDs) used on mobile phones and other portable video/multimedia devices. End users are often people without any knowledge of video processing.
Current video processing methods and systems are usually designed under a one-size-fits-all principle that produces a single main video for different viewing devices, allowing little control by end users over how video signals are processed and displayed. For example, when watching television at home, no matter what kind of television the user has, he or she always receives the same video sequence for display. The user has only some very limited choices as to how the video is displayed, such as whether to add subtitles, or whether to display a smaller picture within a larger picture, commonly referred to as picture in picture. Other than that, few meaningful video adjustments are available to end users. Such a one-size-fits-all model typically must satisfy a minimum quality requirement while minimizing both the bandwidth needed to deliver video sequences over networks and the system complexity of the devices that receive and/or display the video sequences. Although the one-size-fits-all model is convenient for service providers, it may not be able to offer a satisfying viewing experience to all users because of the very significant differences among the users' viewing devices.
There is another challenge associated with current video processing methods: when videos containing small objects are processed and delivered to a small screen for display, the small objects often become hard to discern and sometimes disappear entirely. This can happen when broadcasting a baseball or tennis match to a mobile phone that displays video sequences on a small LCD screen. A typical baseball has a diameter of just under 3 inches, and a typical baseball field has 90 feet between adjacent bases. If a single pixel is used to display the baseball, more than 360 pixels are required to span adjacent bases. For any video sequence with less resolution, the baseball can disappear during the compression, transcoding, or transcaling process. In addition, even if a high resolution format and a high resolution video display device are chosen so that more pixels can be allocated to the baseball, the baseball may still measure less than 0.5% of an inch on a small screen, making it hard to see with the naked eye at a normal viewing distance.
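As a rough illustration of this scale problem, the following minimal sketch in Python works through the arithmetic above; the screen and frame dimensions in the second half are assumed example values, not figures from any particular device.

    # Rough arithmetic behind the small-object problem (illustrative values only).
    BALL_DIAMETER_IN = 2.9          # a regulation baseball is just under 3 inches across
    BASE_DISTANCE_IN = 90 * 12      # 90 feet between adjacent bases, in inches

    # If the ball occupies a single pixel, adjacent bases span this many pixels:
    pixels_between_bases = BASE_DISTANCE_IN / BALL_DIAMETER_IN
    print(f"pixels between bases: {pixels_between_bases:.0f}")    # ~372, i.e. more than 360

    # Assumed example: a 2-inch-wide, 320-pixel-wide phone screen showing a frame
    # downscaled from 1280 pixels wide, where the ball was 4 pixels across.
    screen_width_in, screen_width_px = 2.0, 320
    ball_px_on_screen = 4 * (screen_width_px / 1280)              # the ball shrinks with the frame
    ball_size_in = ball_px_on_screen * (screen_width_in / screen_width_px)
    print(f"ball on screen: {ball_size_in:.4f} inches")           # a tiny fraction of an inch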
Therefore, there is clearly a need for an improved video processing method and system to address these challenges.
Possible embodiments of the invention are discussed in this section.
Delivering quality video services over heterogeneous networks to various display devices is a serious challenge in deploying new video services. Service providers want to reduce the communication bandwidth requirement dramatically while maintaining a minimum quality requirement, for example by adopting new video standards such as MPEG4 and H.264. However, the same video processing and compression method can produce drastically different results depending on the kind of image being transmitted.
According to one embodiment of the invention, a higher quality video file, before it is preprocessed for broadcasting and while its critical image elements are still clearly viewable or traceable, is called the master copy, or the parent video. After the parent video is processed at least once, the resulting video file is called the child video. After the child video is processed at least once, the resulting video file is called the grandchild video. After the grandchild video is processed at least once, the resulting video file is called the great grandchild video.
The parent video usually contains a great deal of detail, including details that are essential to the theme of the video. However, parent videos are often very large in size and therefore difficult to deliver over a bandwidth limited network. Processing the parent video into a child video to reduce the video size as well as the video resolution often involves compression, transcoding, or transcaling. This processing step introduces the possibility that a critical image element may get lost.
According to one embodiment of the invention, a generic method is employed to obtain information about the critical image element. The information may include the horizontal and vertical positions of the critical image element in the various image frames of the parent video, as well as the size, contour, color, brightness, etc. of the critical image element. The information can be obtained using any video object acquiring/tracking system available today, such as those discussed in the articles "A Scheme for Ball Detection and Tracking in Broadcast Soccer Video," by Dawei Liang, Yang Liu, Qingming Huang, and Wen Gao, published at the 6th Pacific-Rim Conference on Multimedia, Jeju Island, Korea, pp. 864-875, Nov. 13-16, 2005, and "Preprocessing of Ball Game Video Sequences for Robust Transmission Over Mobile Network," by Olivia Nemethova, Martin Zahumensky, and Markus Rupp of TU Wien, published at the CDMA International Conference, Seoul, Korea, Oct. 25-28, 2004. The first publication describes a method for detecting the ball in a ball game and tracking the ball across a sequence of video frames of a ballgame video, utilizing multiple video frames to perform these functions. When detecting the ball, this scheme uses color, shape, and size to extract ball candidates in each frame and compares the information in adjacent frames. The Viterbi algorithm is applied to extract the path that is most likely to be the ball's path. After the ball is detected, a Kalman filter and template matching are used to track the ball's location. The ball location information is constantly updated during the tracking step to allow possible ball re-detection. The second publication describes a different method for tracking a ball using trajectory knowledge, position prediction, and the sum of absolute differences. These image element detecting and tracking methods, as well as other detecting and tracking methods currently available, can be used to perform the tracking step of this invention, that is, to find and track the location of the critical image element in the image frames of the parent video.
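For concreteness, the following minimal Python sketch shows one generic way such tracking information might be gathered. It is not the method of either cited publication; it assumes a bright, roughly round ball that can be found by simple color thresholding (OpenCV 4), and the thresholds, candidate radius range, and file name are illustrative assumptions.

    # Minimal ball-tracking sketch: color-threshold candidate detection plus a
    # constant-velocity Kalman filter, using OpenCV.
    import cv2
    import numpy as np

    kf = cv2.KalmanFilter(4, 2)                  # state: x, y, vx, vy; measurement: x, y
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

    def detect_ball(frame_bgr, lo=(200, 200, 200), hi=(255, 255, 255)):
        """Return (x, y, radius) of the largest bright round candidate, or None."""
        mask = cv2.inRange(frame_bgr, lo, hi)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        best = None
        for c in contours:
            (x, y), r = cv2.minEnclosingCircle(c)
            if 1 <= r <= 10 and (best is None or r > best[2]):
                best = (x, y, r)
        return best

    cap = cv2.VideoCapture("parent_video.mp4")   # hypothetical parent video file
    track = []                                   # per-frame (x, y, radius) tracking information
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        pred = kf.predict()
        hit = detect_ball(frame)
        if hit is not None:
            kf.correct(np.array([[hit[0]], [hit[1]]], np.float32))
            track.append(hit)
        else:                                    # no detection: fall back to the prediction
            track.append((float(pred[0, 0]), float(pred[1, 0]), 3.0))
    cap.release()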
According to one embodiment of the invention, once the information about a critical image element is obtained from the parent video, the parent video is processed by a compression method to produce the child video, reducing the video file size for transmission over a network. The compression methods that can be used include standard methods such as H.264, MPEG4, and VC-1. In some situations, a parallel camera can be used in conjunction with the main camera to produce a low resolution video at the same time the high resolution parent video is produced. If such a low resolution video has the same content as the high resolution parent video but a much smaller size, it can be used as the child video as well.
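The compression step itself is performed by a standard encoder. Purely to illustrate the resolution reduction, the following sketch halves the parent video's dimensions with OpenCV; the codec tag, file names, and 2:1 ratio are assumptions, and a production system would use an H.264, MPEG4, or VC-1 encoder instead.

    # Sketch of producing a half-resolution child video from the parent video.
    import cv2

    cap = cv2.VideoCapture("parent_video.mp4")
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) // 2
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) // 2
    fps = cap.get(cv2.CAP_PROP_FPS)
    out = cv2.VideoWriter("child_video.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out.write(cv2.resize(frame, (w, h)))     # half size in each dimension
    cap.release()
    out.release()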
Once the child video is obtained, according to one embodiment of the invention, a grandchild video is produced by reconstructing the critical image element onto the child video using the information about the critical image element obtained from the parent video. To perform this function, certain information needs to be adjusted. Some of the adjustments are made based on a comparative relationship between the parent video and the child video. For example, the horizontal and vertical positions of the critical image element in the various image frames of the parent video need to be adjusted, based on a comparison of the horizontal and vertical sizes of the corresponding image frames of the parent video and the child video, so as to place the critical image element in the same locations in the corresponding image frames of the child video. This may be done by applying a scaling factor to the numbers representing the horizontal and vertical positions, where the factor corresponds to the compression ratio of the child video. For example, if the size of the image frames in the child video is reduced by half both horizontally and vertically, then the numbers representing the horizontal and vertical positions of the critical image element in the image frames of the parent video can both be reduced by half accordingly. Other factors can also be introduced to adjust other tracking information for the critical image element, such as size, contour, color, brightness, etc. Some of these factors can be chosen at the discretion of the producer of the child video.

After the tracking information for the critical image element is adjusted with the factors as explained above, the adjusted tracking information is employed to reconstruct the critical image element onto the child video so as to produce the grandchild video. The critical image element can be reconstructed onto the child video using the adjusted tracking information by various methods. It can simply be redrawn into the various image frames of the child video using the tracking information, or it can be blended into the child video using alpha blending. Alpha blending is a commonly used image processing method for combining multiple layers of image frames with various degrees of opacity. If the tracking information contains only the position information of the critical image element, the tracking information can be multiplexed with the child video for transmission using any standard multiplexing method. More than one critical image element can be processed following the same method. When multiple critical image elements are involved, they can be distinguished by their different characteristics such as shape, size, color, brightness, etc., by their respective trajectory paths, or by a combination of both. Some of these image processing methods can be in compliance with international standards such as H.264, MPEG4, or VC-1.
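A minimal sketch of these two operations, reusing the track list from the earlier tracking sketch and assuming the same 2:1 reduction, might look as follows; the drawn ball pattern, blending weight, and minimum radius are illustrative choices.

    # Sketch of adjusting parent-video tracking information to child-video
    # coordinates and reconstructing the critical image element by alpha blending.
    import cv2

    def adjust(track, sx, sy):
        """Scale (x, y, radius) tracking entries by the child/parent size ratio."""
        return [(x * sx, y * sy, r * min(sx, sy)) for (x, y, r) in track]

    def reconstruct(child_frame, x, y, r, alpha=0.8):
        """Blend a simple ball pattern onto a child-video frame at (x, y)."""
        overlay = child_frame.copy()
        cv2.circle(overlay, (int(x), int(y)), max(int(r), 2), (255, 255, 255), -1)
        # alpha blending: output = alpha * foreground + (1 - alpha) * background
        return cv2.addWeighted(overlay, alpha, child_frame, 1 - alpha, 0)

    child_track = adjust(track, sx=0.5, sy=0.5)  # frames halved both ways, as above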
In an H.264 environment, for example, the reconstruction of the critical image element onto the child video using the adjusted tracking information extracted from the parent video can be conducted through one or more of the following steps.
According to the H.264 standard, alpha blending is performed using an auxiliary coded picture and a primary coded picture. The auxiliary coded picture is an auxiliary component of the coded video, and support for the auxiliary coded picture is optional. The primary coded picture may have a background picture and a foreground picture. Both the foreground picture and the auxiliary coded picture are suitable for carrying tracking information related to the critical image element. Section 7.4.2 of the March 2005 H.264 specification prepublication, which is hereby incorporated by reference, details how to perform alpha blending so as to reconstruct the critical image element onto the child video to produce the grandchild video. For illustrative purposes, a baseball game video is used as an example below. The critical image element in this video is the baseball.
First, the spatial and temporal information of the critical image element, the baseball, is obtained from a high quality baseball game video, the parent video. The parent video is compressed using the H.264 standard to generate the child video. The child video has the primary coded picture, which can be either one sequence of video pictures, or two related sequences of video pictures comprising the background picture and the foreground picture. A separate auxiliary coded picture may also be generated based on the producer's preference.
Then, the tracking information of the critical image element (the baseball), such as the spatial and temporal information of the baseball, is marked in the frames of the foreground picture of the primary coded picture, the auxiliary coded picture, or both. Such marking can be done, for example, by simply drawing the baseball onto the foreground picture or the auxiliary coded picture using the tracking information.
In one situation, the tracking information of the critical image element contains only the center of the baseball. In this case, only the pixel at the center of the ball may be marked in the foreground picture or the auxiliary coded picture using the tracking information.
In another possible situation, the tracking information of the critical image element may include the contour of the baseball in addition to the center. In this case, a larger region corresponding to the baseball may be marked in the foreground picture or the auxiliary coded picture using the tracking information. The tracking information in either case may be the adjusted tracking information discussed earlier.
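The two marking granularities just described can be sketched as follows, representing the foreground picture or auxiliary coded picture simply as a single-channel mask per frame. This is a simplification for illustration only; the actual picture syntax is defined by the H.264 standard.

    # Sketch of marking tracking information into a per-frame mask.
    import cv2
    import numpy as np

    def mark_center(h, w, x, y):
        """Mark only the pixel at the ball's center."""
        aux = np.zeros((h, w), np.uint8)
        aux[int(y), int(x)] = 255
        return aux

    def mark_region(h, w, x, y, r):
        """Mark a filled disc matching the ball's (possibly adjusted) contour."""
        aux = np.zeros((h, w), np.uint8)
        cv2.circle(aux, (int(x), int(y)), int(r), 255, -1)
        return aux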
After the foreground picture or the auxiliary coded picture is marked with the tracking information, the primary coded picture, in this case the core of the child video, is delivered to the end user, along with the auxiliary coded picture if one is generated. The tracking information of the critical image element has been embedded into the foreground picture, the auxiliary coded picture, or both. Because the generation and transmission process complies with the H.264 standard, any H.264 compliant device can display the sequence. Since support for the auxiliary coded picture is optional under H.264, in the situation where the producer generates an auxiliary coded picture to carry the critical image element tracking information, the producer can send an instruction to the end user device alerting it to process the auxiliary coded picture.
Once the end user device receives the primary coded picture and the auxiliary coded picture, it can generate the grandchild video by performing alpha blending as described in section 7.4.2.1.2 of the March 2005 H.264 specification prepublication. If the auxiliary coded picture is not generated and the tracking information is instead carried by drawing the critical image element onto the foreground picture of the primary coded picture, alpha blending can be performed between the foreground picture and the background picture of the primary coded picture.
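Conceptually, the blending performed on the end user device is a per-pixel weighted combination. The following generic sketch illustrates the idea; the exact arithmetic for auxiliary coded pictures is the one defined in the H.264 specification, not this simplified version.

    # Generic per-pixel alpha blend on the receiving device.
    import numpy as np

    def alpha_blend(background, foreground, aux):
        """aux is an 8-bit alpha plane: 255 = fully foreground, 0 = fully background."""
        a = (aux.astype(np.float32) / 255.0)[..., None]   # broadcast over color channels
        out = a * foreground.astype(np.float32) + (1.0 - a) * background.astype(np.float32)
        return out.astype(np.uint8)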
In an MPEG4 environment, a similar process can be followed. MPEG4 also supports alpha blending. A difference between MPEG4 and H.264 is that in MPEG4 there is no primary coded picture and auxiliary coded picture. Instead, video objects are coded into video object planes (VOPs), and grayscale shape information can be an auxiliary component of a VOP. Consequently, multiple VOPs can be used as the background picture, the foreground picture, and the auxiliary coded picture, respectively. The critical image element tracking information can be carried by the VOPs that contain image information similar to that of the foreground pictures or auxiliary coded pictures in an H.264 environment. The tracking information can be preserved by drawing images onto the VOPs based on the tracking information. A grandchild video can be generated by performing alpha blending using these VOPs, similar to performing alpha blending using primary coded pictures and auxiliary coded pictures in an H.264 environment.
Moreover, since a VOP in an MPEG4 environment can carry grayscale shape information, each frame of the child video can be represented by just one VOP. The tracking information of the critical image element can be incorporated into an auxiliary component of the VOP, such as the grayscale shape information. Section 7.5.5 of the International Standard ISO/IEC 14496-2, Second Edition, which is hereby incorporated into this specification by reference, provides a detailed introduction to grayscale shape information and to how image information can be carried with it. A grandchild video can be generated by reconstructing the critical image element onto the child video using the tracking information contained in the grayscale shape information. This is particularly useful for a low profile MPEG4 video and other videos that have similar structures.
It is noted that the above described processes are just examples. The current invention does not have to comply with international standards, and where it does, it can introduce variations. For example, when the tracking information contains only the center position of the critical image element, the service provider can send a pattern along with the child video, or the pattern can be pre-stored on the user end device. The grandchild video can then be generated by combining the pattern with the primary coded picture, the auxiliary coded picture, or the VOPs, placing the pattern at or near the center position of the critical image element. Furthermore, user inputs can be solicited by the user end device to determine the characteristics of the pattern, such as its size, color, brightness, etc. If the tracking information for the critical image element contains information such as the size, contour, color, brightness, etc. of the critical image element, user inputs can also be solicited by the user end device to change such characteristics before generating the grandchild video.
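As a minimal sketch of this pattern-based variation, assuming a hypothetical pre-stored pattern file, the user end device might paste the pattern at the received center position as follows.

    # Sketch of placing a pre-stored pattern at the tracked center position.
    import cv2

    pattern = cv2.imread("ball_pattern.png")     # hypothetical pre-stored pattern
    def place_pattern(frame, x, y):
        ph, pw = pattern.shape[:2]
        x0, y0 = int(x) - pw // 2, int(y) - ph // 2
        if 0 <= x0 and 0 <= y0 and x0 + pw <= frame.shape[1] and y0 + ph <= frame.shape[0]:
            frame[y0:y0 + ph, x0:x0 + pw] = pattern   # simple overwrite; blending also works
        return frame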
The above described processes can be extended to scenarios where there is more than one critical image element, because there is no limit on how many items can be shown on the foreground picture, the auxiliary coded picture, or the VOPs. It is generally possible to code many image elements on the foreground picture, the auxiliary coded picture, or the VOPs. These image elements can be differentiated by such characteristics as color, shape, and location.
The user input may be received through any common input hardware and software devices, such as infrared receivers for remote controls, or input keys on the user end displaying device. User inputs may be used as an additional set of factors for further adjusting the adjusted tracking information, for example the size, color, brightness, etc. of the critical image element(s). User inputs may also be used for retrieving and adjusting pre-stored image patterns to be used as replacements for the critical image element(s). For example, if the tracking information consists of only the position of the center of a critical image element such as a baseball, then a circular image pattern can be pre-stored. The pre-stored image pattern can then be used to reconstruct the critical image element onto the main video by placing it at the center positions contained in the tracking information. User inputs can be used to retrieve such a pre-stored image pattern and change its size, color, brightness, etc. Alpha blending is one of many possible ways of achieving such reconstruction. A great grandchild video can be produced by reconstructing the critical image element(s) onto the grandchild video employing the further adjusted tracking information. The great grandchild video is then displayed for the end user.
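A short sketch of applying such user inputs, with assumed example values for the user's chosen scale and brightness, might be:

    # Sketch of adjusting a pre-stored pattern according to user inputs.
    import cv2

    def apply_user_inputs(pattern, scale=1.5, brightness=30):
        p = cv2.resize(pattern, None, fx=scale, fy=scale)           # user-chosen size
        return cv2.convertScaleAbs(p, alpha=1.0, beta=brightness)   # user-chosen brightness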
Alternatively, following processes similar to those discussed above, the tracking information or adjusted tracking information and the user inputs can be used to reconstruct the critical image element(s) onto an independent set of image frames rather than onto the image frames of the child video. The independent set of image frames and the image frames of the child video are then displayed separately, but in such a sequence and at such a speed that they are visually blended in the eyes of the viewer; some image frames of the independent set would be displayed in between some of the image frames of the child video.
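The interleaved display order can be sketched simply, assuming the two frame sequences have already been produced and aligned frame for frame:

    # Sketch of interleaving overlay frames between child-video frames for display.
    def interleave(child_frames, overlay_frames):
        out = []
        for c, o in zip(child_frames, overlay_frames):
            out.append(c)
            out.append(o)        # overlay frame shown between child-video frames
        return out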
It is obvious that there are numerous variations and combinations of the above described embodiments of the invention. All such variations, combinations, and their equivalents are considered part of the invention. The terms used in this description are illustrative and are not meant to restrict the scope of the invention. The described methods have steps that can be performed in different orders and yet achieve the same results; all variations in the order of the method steps are considered part of this invention as long as they achieve substantially the same results. It is also well understood that video files have multiple image frames, and different image frames of the same video can be at different processing steps at the same time. For example, some early image frames in a video may be at step 18, being displayed in front of a user, while some later image frames in the same video are still at step 15, being processed, as in the case of a live broadcast. Even though it is one possible embodiment that all the image frames in one video are processed before the video is moved to the next step, the invention is certainly not restricted to this process. The terms video file, parent video, child video, grandchild video, great grandchild video, and other similar terms are used to refer to a sequence of image frames having a certain relationship to each other; they do not have to be final electronic files saved on a medium.
The invention is further defined and claimed by the following claims.