1. Field of the Invention
The present invention relates generally to the real-time creation and display of combined video sources by a composite video system, also referred to as a video karaoke system.
2. Description of the Related Art
Audio karaoke has been used by individuals to create music during a live performance, wherein a user reviews hints or cues provided and responds to the hints by singing at the appropriate times. The hints are typically scrolling lyrics, background instrumental and vocal music, or both.
However, the features of audio karaoke have not been applied to a video environment. A technology called picture-in-picture is supported by some expensive televisions, but it only allows an additional window to open in a predetermined section of the television screen where a second channel may be viewed.
Currently, composite video systems do not exist that incorporate information from multiple video streams and combine them realistically in real time. Similarly, tele-presence systems are primitive and do not support combining subsets of information from multiple video sources.
Current technology, for example, relating to interviewing two individuals in two separate places, is based on having a split screen or multiple boxes within a screen to show the two individuals talking, but who are clearly located in separate places. Video editing provides a way to painstakingly and manually combine video sources to create the illusion that multiple video sources are a single video source. However, there does not currently exist a real-time system that enables multiple video sources to be combined in such a way as to create the illusion of a single video source.
Additionally, there is no existing video system, comparable to audio karaoke, that enables a live performance to react to cues in a recorded visual performance so as to insert dynamic video into the recorded video or visual performance.
Thus, a need exists for improvements in the manner in which video sources and video systems are made compatible in environments such as homes or places of entertainment.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of ordinary skill in the art through comparison of such systems with the present invention.
The present invention relates generally to the real-time creation and display of combined video sources by a composite video system. Although the following discusses aspects of the invention in terms of a video karaoke system, it should be clear that the following also applies to other systems such as, for example, live video broadcast, virtual reality systems, etc.
The video karaoke system 105, shown in
Another example might be viewing a prerecorded action scene on a display, wherein the viewer physically enacts portions of the scene in front of a video capture device such as a camera. The feed of the camera is then superimposed on the action scene, to create an illusion that the viewer is part of the action scene. The viewer can therefore take hints or cues from the combined scene viewed on the display.
In one embodiment, the first video source 107 is a pre-recorded video program and the second video source 109 is live video data, or audio-visual data, captured from a camera. In addition, the second video source captures a viewer's actions, which are combined by the mixing unit 113 with the pre-recorded video, the combined output being displayed by the display unit 115 such that the viewer can see the display and react to it.
In another embodiment, the mixing unit 113 mixes the information from different video sources by changing certain parameters of the video sources. For example, the first video source 107 can be static data of a background scene, while the second video source 109 can be an image of a person. The video data from the second video source 109 is view morphed and mixed with the video data from the first video source 107, and the combined output is displayed on the display unit 115. This mixing also comprises mixing of video, graphics, and text by adjusting certain parameters of the video sources 107, 109.
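The parameter-based mixing described above can be sketched as a simple per-pixel blend of two frames. This is a minimal illustration, not the disclosed method itself: the `mix_frames` helper, the alpha parameter, and the use of NumPy arrays as video frames are all assumptions made for the example.

```python
import numpy as np

def mix_frames(background, foreground, alpha=0.5):
    """Blend two equally sized frames; alpha weights the foreground."""
    background = background.astype(np.float32)
    foreground = foreground.astype(np.float32)
    mixed = (1.0 - alpha) * background + alpha * foreground
    return np.clip(mixed, 0, 255).astype(np.uint8)

bg = np.full((4, 4, 3), 200, dtype=np.uint8)   # static background scene
fg = np.full((4, 4, 3), 100, dtype=np.uint8)   # image of a person
out = mix_frames(bg, fg, alpha=0.5)            # combined output frame
```

In a real system the blend would run per frame of the video streams; the alpha value here stands in for the adjustable mixing parameters the specification mentions.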
In a different embodiment, the region selecting unit 111 or the mixing unit 113 might be configured with a resolution-adjusting capability, such that in situations where the first video source 107 and the second video source 109 are in different spectral bands or have different resolutions, the resolutions can be adjusted as necessary. For example, in some implementations it might be desirable to adjust the resolution of the background scene so that an illusion of a 3D image can be created. Various phase-shifting implementations can also be utilized, or conventional 3D video data employing well-known 3D glasses could be used.
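One simple way the resolution-adjusting capability could operate is nearest-neighbour resampling of one source to the other source's resolution. The `resize_nearest` helper and the frame sizes below are hypothetical, chosen only to illustrate the idea.

```python
import numpy as np

def resize_nearest(frame, new_h, new_w):
    """Resample a frame to (new_h, new_w) by nearest-neighbour indexing."""
    h, w = frame.shape[:2]
    rows = np.arange(new_h) * h // new_h   # source row for each output row
    cols = np.arange(new_w) * w // new_w   # source column for each output column
    return frame[rows][:, cols]

hi_res = np.arange(8 * 8).reshape(8, 8).astype(np.uint8)  # 8x8 source
matched = resize_nearest(hi_res, 4, 4)                    # downsampled to 4x4
```

After this step both sources share a common resolution and can be handed to the mixing unit.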
In one embodiment, the invention includes a composite video system having a first video source 107 and a second video source 109, wherein the video karaoke system 105 combines at least a portion of the video data from each video source to create a composite video. The mixing unit 113 receives first and second video data from the first and second video sources 107, 109, and provides a combined output having at least a portion of the first video data from the first video source 107 and at least a portion of the second video data from the second video source 109 in a composite video stream.
In another embodiment, the invention includes a video karaoke system 105 having a plurality of video sources, each providing a different type of video data. For example, one of them provides a still video image, another provides a live video image such as those captured by a digital camera, while a third provides a pre-recorded video clip. A mixing unit 113 receives video data from the plurality of video sources, and provides a combined output having at least a portion of the plurality of video data in a combined output video stream that is stored (such as in a personal video storage) or optionally displayed.
The present invention also provides a method of providing a combined output video image from one or more input video sources. The method comprises providing a first video source 107 and a second video source 109 and selecting a region of interest in the first or second video source. The method also comprises mixing the selected regions of interest from the first 107 and second 109 video sources to provide a combined output video image that may be stored, displayed on a display unit 115, or both.
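The two steps of the method just described, selecting a region of interest and mixing it into a combined output, can be sketched as follows. The helper names `select_region` and `combine`, and the rectangular-slice selection, are assumptions for illustration only.

```python
import numpy as np

def select_region(frame, top, left, height, width):
    """Selecting unit: extract a rectangular region of interest."""
    return frame[top:top + height, left:left + width]

def combine(background, roi, top, left):
    """Mixing unit: place the selected region into the background frame."""
    out = background.copy()
    out[top:top + roi.shape[0], left:left + roi.shape[1]] = roi
    return out

first = np.zeros((6, 6), dtype=np.uint8)     # frame from first video source
second = np.full((6, 6), 9, dtype=np.uint8)  # frame from second video source
roi = select_region(second, 2, 2, 2, 2)      # step 1: select region of interest
combined = combine(first, roi, 2, 2)         # step 2: mix into combined output
```

The combined frame could then be stored, displayed, or both, as the method provides.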
Then, at a next block 215, the mixing unit mixes the required regions of interest from the first and second video sources to create a combined output that can be displayed. At the next block 217, the combined output from the mixing unit is displayed on the display unit. Finally, the operation terminates at an end block 219.
In one embodiment of the present invention, the display unit displays an overlay of two unrelated video streams that are combined together by a mixing unit that superimposes the region of interest from the first video source onto the region of interest of the second video source.
A user can select the regions of interest from the video sources 307, 309 while the associated video data is being fed to the selecting unit 311. In one embodiment, the user can control the selecting unit 311 utilizing conventional input and control devices such as a keyboard, mouse, wireless pointing device, tablet, touch-screen, etc.
The appropriate regions of interest in the input video sources 307, 309 are selected based upon appropriate locating methods, such as coordinates in an area of a screen. In addition, selection of a predefined object is supported, whether the selection is dynamic or static based upon predefined characteristics of the object.
In general, software or hardware can be configured within the selecting unit 311 to track or follow a dynamic region of interest, such as a talking person, or a moving person or object such as a condenser, a racing car, or virtually any other moving object. The mixing unit 313 can be configured to superimpose video information from the first video source 307 onto a background from the second video source 309, or to superimpose information from the second video source 309 onto an image provided by the first video source 307.
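A dynamic region of interest could be followed with very simple tracking logic, for example locating the centroid of bright pixels frame by frame. This toy tracker is purely an assumption for illustration; the tracking software the specification contemplates would be far more sophisticated.

```python
import numpy as np

def track_bright_object(frames, threshold=128):
    """Return the (row, col) centroid of above-threshold pixels in each frame."""
    centers = []
    for frame in frames:
        ys, xs = np.nonzero(frame > threshold)
        centers.append((int(ys.mean()), int(xs.mean())))
    return centers

# A bright object moving one column to the right per frame.
frames = []
for t in range(3):
    f = np.zeros((5, 5), dtype=np.uint8)
    f[2, 1 + t] = 255          # the moving object
    frames.append(f)

path = track_bright_object(frames)   # per-frame object positions
```

The tracked positions would tell the mixing unit where to anchor the superimposed region in each output frame.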
In one embodiment, a separate superimposing unit 317 is used to superimpose an image from one video source onto another. One example of such superimposition is the use of background information, such as a mountain scene or a stage, from the second video source 309, onto which the image of a person is superimposed, the image of the person being accessed from the first video source 307, which could be based upon a video created in a studio. Through the use of image tracking software provided in either the selecting unit 311 or the mixing unit 313, a moving image can be tracked from the first video source 307 and realistically superimposed onto the background scene extracted from the second video source 309. In one embodiment, the software and hardware provided with the video karaoke system 305 are used to adjust shading and contrast between the superimposed images so as to provide a realistic superimposition of the superimposed image onto the background scene. In a related embodiment, the video manager 321 facilitates such adjustments of shading and contrast, utilizing the control 319.
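The shading and contrast adjustment between superimposed images could, for instance, match the foreground's brightness statistics to those of the background. The mean/standard-deviation matching below is one hedged possibility, not the system's actual algorithm; `match_shading` and the patch values are invented for the example.

```python
import numpy as np

def match_shading(fg, bg):
    """Shift and scale the foreground so its mean and spread match the background."""
    fg = fg.astype(np.float32)
    bg = bg.astype(np.float32)
    fg_std = fg.std() or 1.0          # guard against flat (zero-spread) images
    adjusted = (fg - fg.mean()) / fg_std * bg.std() + bg.mean()
    return np.clip(np.round(adjusted), 0, 255).astype(np.uint8)

bg = np.array([[100, 120], [140, 160]], dtype=np.uint8)   # background patch
fg = np.array([[0, 40], [80, 120]], dtype=np.uint8)       # image to superimpose
adjusted = match_shading(fg, bg)
```

After adjustment the superimposed image shares the background's overall brightness and contrast, which is what makes the composite look like a single scene.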
The appropriate regions of interest are selected based upon locating methods such as identifying coordinates in an area of a screen, selection of a predefined object from a list of predefined objects, dynamic determination of objects based upon predefined characteristics of objects, etc. Software or hardware can be configured within the region selecting unit 111 to track or follow a dynamic region of interest, such as a talking person, or a moving person or object such as a condenser, a racing car, or virtually any other moving object.
The video karaoke system 405 also comprises the remote control interface 423, and the video manager 421, which together facilitate the remote control of the region of interest from the video sources. In addition, superimposition of video images from the various sources is also supported.
One example of video superimposition is the superimposition of thermal IR data on visual data for detecting seepage in walls. The first video source 407 could be stored visual data from the video library 425, and the second video source 409 could be thermal IR data of the same scene. The region selecting unit 411, coupled to both the first video source 407 and the second video source 409, is used to select a user-defined region of interest from the video sources 407, 409. The required region of interest from the second video source 409, for example, is superimposed on the video from the first video source 407 so that seepage in the walls can be detected, since it is not possible to detect the seepage using visible-band data alone.
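The thermal-IR-on-visible superimposition could be sketched by marking warm IR pixels onto the visible frame. The threshold, marker value, and `overlay_ir` helper are illustrative assumptions, not parameters from the disclosure.

```python
import numpy as np

def overlay_ir(visible, ir, hot_threshold=200, marker=255):
    """Superimpose hot spots from a thermal-IR frame onto a visible frame."""
    out = visible.copy()
    out[ir > hot_threshold] = marker   # highlight warm (possibly damp) areas
    return out

visible = np.full((4, 4), 50, dtype=np.uint8)   # stored visual data of the wall
ir = np.zeros((4, 4), dtype=np.uint8)
ir[1, 1] = 230                                   # warm spot invisible to the eye
fused = overlay_ir(visible, ir)                  # seepage now visible in output
```

The fused frame shows the visible scene with the IR-detected anomaly marked, which is the complementary-information effect the embodiment describes.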
In certain embodiments of the invention, the display 115 is placed in visual proximity to a viewer who is presumed to be participating in an event wherein the user's image is incorporated into displayed video content or a program. The viewer is therefore performing in front of a camera that serves as the first or second video source 407, 409. Watching the combined output on the display unit 415, which could be a background scene from one video source with a superimposed image of the viewer captured in real time using a camera, the viewer can adapt his or her physical movements so as to synchronize them with the movements of an object in the other video source with which the live video is being combined. Thus, using the mixing unit 413 and video inputs from two sources, wherein one of them is live video captured from a viewer whose physical movements are made in reaction to the other video source being viewed, a realistic video karaoke image is created and displayed on the display unit 415.
In one embodiment, a motion picture scene, a video program, a video game, or other scene from one of the video sources is combined with video data from the video library 425 or video data from the other video source. It should be noted that the elements illustrated in
The output of contrast/border adjusting unit 530 is selectively fed, in certain embodiments, to a feedback control unit 540, which receives feedback from display 515, to enable real time adjustments in any of image tracking, shading, or contrast/border adjusting. The feedback control is not necessary in all embodiments.
The first video source 107 and second video source 109, in addition to the types of images discussed above, might also include one or more of motion picture video, martial arts video, video game images, etc. Various video recordings can be stored in a video library and accessed by users for various applications. The mixing unit 113 is configured to mix various video content based upon parameters that can be preset by the user. The mixing unit 113 is also configured to mix various types of content by changing certain parameters of the video sources. For example, the first video source 107 could be video of a static background, and the second video source 109 could be dynamic activity of a person. The mixing unit 113 is capable of zooming the image of the person in the second video source and superimposing it on the first video source. The mixing unit 113 is configured to mix a plurality of video sources by changing certain parameters of the video sources, such as resolution, contrast, and dynamic range.
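The zoom-then-superimpose operation just described might look like the following nearest-neighbour sketch. `zoom2x` and the placement coordinates are hypothetical; a real mixing unit would support arbitrary zoom factors and positions.

```python
import numpy as np

def zoom2x(frame):
    """Nearest-neighbour 2x zoom: duplicate every row and column."""
    return np.repeat(np.repeat(frame, 2, axis=0), 2, axis=1)

person = np.full((2, 2), 7, dtype=np.uint8)     # person from the second source
background = np.zeros((8, 8), dtype=np.uint8)   # static first-source background
zoomed = zoom2x(person)                         # person enlarged to 4x4
out = background.copy()
out[2:2 + zoomed.shape[0], 2:2 + zoomed.shape[1]] = zoomed  # superimpose
```

The enlarged person now occupies a proportionally larger area of the combined output, as the zoom capability intends.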
It would also be possible to utilize an image tracking unit 210 on both inputs from the first and second video sources 107 and 109, to enable real-time composition from two or more video sources. It is possible to provide video data from third and fourth video sources, and image tracking, shading control, and contrast/border adjustment can be configured as necessary.
In certain embodiments of the present invention, the second video source 109 might be a prerecorded stage or background scene, and first video source 107 can be live video providing video data from a remote location. It is also possible for second video source 109 to be stored video from the video library. Selection of an image from first video source 107 to be superimposed onto video source 109 can be done, for example, with a keyboard, mouse, or wireless remote control unit. Selection of the image can be done within selecting unit 111, either by manually or automatically highlighting a region of interest. Another embodiment is one wherein both first video source 107 and video source 109 are prerecorded and wherein regions of interest are selected within selecting unit 111 to be combined and superimposed appropriately. In another embodiment, first video source 107 and second video source 109 could be live feeds from video cameras, where certain aspects of each live feed are selected by selecting unit 111 and mixed by mixing unit 113, then output from mixing unit 113, and ultimately displayed on a display unit 115.
In one embodiment, a combined video output for a live telecast of a conversation between two users could comprise a first video source 107 containing the image of a first speaker, a second video source 109 containing an image of a second speaker, and a third video source 125 that could be a stage or studio background. The selected region of interest from the first video source 107 would be the first speaker, and the selected region of interest from the second video source 109 would be the second speaker. The region selecting unit 111 would select the images of the first and second speakers and the background from the third video source, and transmit them to the mixing unit 113, which would apply shading control and contrast/border adjustment to the images, place the images in the appropriate locations in the background, and output the signal, which would then be received by users or viewers and shown on a display. The intended net effect, or the impression created for a viewer, would therefore be that of the two speakers being in the same room, studio, or premises, having a face-to-face conversation, even though they are actually in remote locations. A fourth or fifth video source could be provided, as necessary, to provide images of a moderator, or other scenes or persons.
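The three-source composite for the two-speaker telecast can be sketched by placing each selected region onto the studio background. The `composite` helper, frame sizes, and speaker positions are invented for illustration.

```python
import numpy as np

def composite(background, regions):
    """Place each (roi, top, left) region onto a copy of the background."""
    out = background.copy()
    for roi, top, left in regions:
        out[top:top + roi.shape[0], left:left + roi.shape[1]] = roi
    return out

studio = np.zeros((6, 10), dtype=np.uint8)       # third source: studio background
speaker_a = np.full((4, 3), 1, dtype=np.uint8)   # region from first source
speaker_b = np.full((4, 3), 2, dtype=np.uint8)   # region from second source

# Seat the two remote speakers side by side in the shared studio frame.
frame = composite(studio, [(speaker_a, 1, 1), (speaker_b, 1, 6)])
```

The output frame shows both speakers against one background, approximating the face-to-face illusion the embodiment aims for.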
In one embodiment of the invention, the first video source 107 could be the video output from a video camera aimed at a person or a viewer of the display unit, and the second video source 109 could be, for example, a scene from a movie. The video karaoke system makes it possible to superimpose the image of the viewer's face captured by the video camera (and tracked by the video camera) such that the combined output viewed is one where one of the characters in the scene from the movie is that of the viewer or person whose image is being captured via the video camera. Thus, a person at home could, for amusement purposes, superimpose his or her image, captured as one of the video input sources, in place of a character in a movie, such as that of an action hero in a well-known movie.
In one embodiment, a set-top-box at a user's premises is capable not only of receiving cable TV or satellite broadcast signals for display on the television display, but also of capturing a video stream (or signals) from the local second video source. It is also capable of combining video sources under the control of a user, whose input is provided via a remote control or a keyboard. Thus, the user can control which characters in a movie received from a satellite or cable TV broadcast are to be replaced by the real-time image captured from the local (second video source) camera. The set-top-box provides the functionality of the mixing unit in one embodiment. In another embodiment, the television display provides the functionality of the mixing unit.
In one embodiment, multiple video data of the same scene, acquired by different video cameras, each camera considered as one video source 707, 709, 725, provide complementary information about the same or similar live scene. The superimposing unit 713 (or a mixing unit 113) combines information from the different video sources 707, 709, 725 to obtain multispectral information about the same or similar scene. The output of the superimposing unit 713 is provided to a display unit 715, which displays the multispectral information about the scene, giving different and more comprehensive information than would be possible from a single video source (in a single video output).
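Combining per-camera spectral bands into one multispectral frame could be as simple as stacking them along a new channel axis, as in this assumed-NumPy sketch; the band names and values are placeholders.

```python
import numpy as np

# Each camera contributes one spectral band of the same scene (hypothetical values).
visible = np.full((4, 4), 10, dtype=np.uint8)   # video source 707
near_ir = np.full((4, 4), 20, dtype=np.uint8)   # video source 709
thermal = np.full((4, 4), 30, dtype=np.uint8)   # video source 725

# Superimposing unit: stack the bands into a single multispectral frame.
multispectral = np.stack([visible, near_ir, thermal], axis=-1)
```

Each pixel of the stacked frame now carries one value per camera, so downstream display or analysis sees all bands of the scene at once.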
While the present invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present invention without departing from its scope. Therefore, it is intended that the present invention not be limited to the particular embodiment disclosed, but that the present invention will include all embodiments falling within the scope of the appended claims.