Field of the Invention
The present principles of the embodiments generally relate to a method and apparatus for synchronizing playbacks of two electronic devices and more particularly synchronizing playback of a video and a first audio associated with the video at one of the two electronic devices and playback of a second audio, different from the first audio and associated with the video, at the other electronic device.
Background Information
Multiple ways, such as broadband television (TV) and mobile TV, coexist today to bring multimedia streams or broadcast programs to end users. With broadband TV, the receiver is usually a standard TV device, connected to a receiving device, called a Set-Top Box or STB. With mobile TV, the receiver device is a mobile terminal such as a mobile phone, a Personal Digital Assistant (PDA), or a tablet.
In an MPEG-2 stream, several components, e.g., audio and video, are synchronized with each other in order to be rendered at the proper time. This is called inter-component synchronization. A common example is lip synchronization, or lip-sync, which presents the audio at exactly the time the lips of the person move in the corresponding video. Such synchronization is typically achieved using specific time stamps. In MPEG-2 streams, the Presentation Time Stamp, or PTS, ensures such synchronization. The PTS of the audio sample indicates its presentation time, in reference to the internal clock (which is set using the Program Clock Reference, or PCR, also contained in the MPEG-2 stream); in the same way, the PTS of the video sample indicates its presentation time, also in reference to the same internal clock.
However, when two electronic devices respectively receive a video stream and an audio stream, synchronization between the respective playbacks cannot be achieved easily. Accordingly, there is a need for a method and apparatus for synchronizing audio and video respectively received by two different receiving devices. The present invention addresses these and/or other issues.
In accordance with an aspect of the present invention, a method for synchronizing playback of a program including a video and associated first audio at a first electronic device with playback of a second audio associated with the program at a second electronic device that also receives the video is disclosed. The method comprises decoding, by a video decoder in the second electronic device, the video, and outputting the decoded video; decoding, by an audio decoder in the second electronic device, the second audio and outputting the decoded second audio for playing back by the second electronic device; receiving a user command to synchronize the playback of the video at the first electronic device and playback of the second audio at the second electronic device; responsive to the user command, the method further comprising capturing, by a capturing device in the second electronic device, the playback of the video at the first electronic device; determining, by the second electronic device, an offset between the outputted decoded video and the captured video; and adjusting outputting of the decoded second audio according to the offset, so that the playback of the first audio at the first electronic device is synchronized with the playback of the second audio at the second electronic device. The user command may be generated by a user activating an input mechanism.
In one embodiment, the method further comprises a step of playing back the second audio by the second electronic device from a first position, which is a first time interval away from a beginning of the program in a normal playback of the program, wherein when the playback of the second audio is at the first position, the playback of the program by the first electronic device is at a second position, which is a second time interval away from the beginning of the program in a normal playback, and wherein a difference between the first time interval and the second time interval is within a predefined interval. The method may further comprise a step of positioning the playback of the second audio to the first position responsive to a user signal.
In another embodiment, the method further comprises if the step of determining the offset fails, asking a user to input the user command again, and the steps of capturing and determining the offset are repeated.
In another embodiment, the method further comprises adjusting, by the video decoder, an output by outputting the decoded video according to the offset, so that outputs of the video decoder and the audio decoder are synchronized.
In another embodiment, the method further comprises downloading the video and the second audio to the second electronic device before playing back the second audio by the second electronic device. The program received by the first electronic device, and the video and the second audio received by the second electronic device are downloaded from a first source, or respectively from a second source and the first source.
In another embodiment, the method further comprises a step of determining a presentation time stamp associated with a frame in the decoded video, which corresponds to a newly captured video frame according to the offset, and adjusting playback of the second audio comprising outputting a sample in the decoded second audio associated with the determined presentation time stamp.
In another aspect of the invention, a second electronic device is disclosed. The second electronic device comprises a video decoder and an audio decoder for respectively decoding a video and a second audio received by the second electronic device and outputting the decoded video and the decoded second audio, the second audio associated with a program comprising a video and the first audio and being played back by a first electronic device; a video capturing device for capturing the video being played back by the first electronic device; a video correlator receiving the captured playback video and the decoded video from the video decoder; and a processor, wherein when the processor receives a user command to synchronize playback of the second audio at the second electronic device with the playback of the video at the first electronic device, the processor is configured to instruct the video correlator to determine an offset between the received captured video and the received decoded video outputted from the video decoder and instruct the audio decoder to output the decoded second audio according to the offset. The second electronic device may include an input mechanism for a user to input the user command.
In one embodiment, the second electronic device further comprises a video player playing back the second audio by the second electronic device from a first position, which is a first time interval away from a beginning of the program in a normal playback of the program, wherein when the playback of the second audio is at the first position, the playback of the program at the first electronic device is at a second position, which is a second time interval away from the beginning of the program in a normal playback, and wherein a difference between the first time interval and the second time interval is within a predefined interval. The video player may position the playback of the second audio to the first position responsive to a user signal.
In another embodiment, if determining the offset fails, the processor is configured to ask a user to input the user command again, and instruct the video correlator to determine the offset again.
In another embodiment, the processor is configured to instruct the video decoder to adjust an output by outputting the decoded video according to the offset, so that outputs of the video decoder and the audio decoder are synchronized.
In another embodiment, the video and the second audio are downloaded to the second electronic device before the second electronic device plays back the second audio. The program received by the first electronic device, and the video and the second audio received by the second electronic device are downloaded from a first source, or respectively from a second source and the first source.
In another embodiment, the processor is configured to instruct the video correlator to determine a presentation time stamp associated with a frame in the decoded video, which corresponds to a newly captured video frame according to the offset, and instruct the audio decoder to output a sample in the decoded second audio associated with the determined presentation time stamp.
In another aspect of the invention, a second electronic device is disclosed. The second electronic device comprises first means and second means for respectively decoding a video and a second audio received by the second electronic device and outputting the decoded video and the decoded second audio, the second audio associated with a program comprising a video and the first audio and being played back by a first electronic device; means for capturing the video being played back by the first electronic device; correlator means for receiving the captured playback video and the decoded video from the first means; and processing means, wherein when the processing means receives a user command to synchronize playback of the second audio at the second electronic device with the playback of the video at the first electronic device, the processing means is configured to instruct the correlator means to determine an offset between the received captured video and the received decoded video outputted from the first means and instruct the second means to output the decoded second audio according to the offset. The second electronic device may comprise an input mechanism for a user to input the user command.
In one embodiment, the second electronic device further comprises a video player playing back the second audio by the second electronic device from a first position, which is a first time interval away from a beginning of the program in a normal playback of the program, wherein when the playback of the second audio is at the first position, the playback of the program at the first electronic device is at a second position, which is a second time interval away from the beginning of the program in a normal playback, and wherein a difference between the first time interval and the second time interval is within a predefined interval. The video player may position the playback of the second audio to the first position responsive to a user signal.
In another embodiment, if determining the offset fails, the processing means is configured to ask a user to input the user command again, and instruct the correlator means to determine the offset again.
In another embodiment, the processing means is configured to instruct the first means to adjust an output by outputting the decoded video according to the offset, so that outputs of the first means and the second means are synchronized.
In another embodiment, the video and the second audio are downloaded to the second electronic device before the second electronic device plays back the second audio. The program received by the first electronic device, and the video and the second audio received by the second electronic device are downloaded from a first source, or respectively from a second source and the first source.
In another embodiment, the processing means is configured to instruct the correlator means to determine a presentation time stamp associated with a frame in the decoded video, which corresponds to a newly captured video frame according to the offset, and instruct the second means to output a sample in the decoded second audio associated with the determined presentation time stamp.
In all three aspects of the invention, the first electronic device may be one of a television receiver, a theater video reproduction device, and a computer.
The above-mentioned and other features and advantages of this invention, and the manner of attaining them, will become more apparent and the invention will be better understood by reference to the following description of embodiments of the invention taken in conjunction with the accompanying drawings, wherein:
The exemplifications set out herein illustrate preferred embodiments of the invention, and such exemplifications are not to be construed as limiting the scope of the invention in any manner.
Referring now to the drawings, and more particularly to
The term “synchronization” as used herein means that the time difference between the audio and the video does not exceed 20 milliseconds (ms) if the audio is advanced with respect to the video or 40 ms if the audio is delayed with respect to the video.
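By way of illustration only, the tolerance stated above can be expressed as a simple check. The following is a minimal sketch; the function name and the sign convention (negative when the audio is advanced with respect to the video, positive when it is delayed) are assumptions made for this example and are not part of the described embodiments.

```python
# Tolerances taken from the definition of "synchronization" above.
AUDIO_ADVANCED_LIMIT_MS = 20.0  # audio ahead of the video
AUDIO_DELAYED_LIMIT_MS = 40.0   # audio behind the video


def is_synchronized(audio_minus_video_ms: float) -> bool:
    """Return True if the audio/video time difference is within tolerance.

    Assumed convention: the argument is negative when the audio is
    advanced with respect to the video and positive when it is delayed.
    """
    return -AUDIO_ADVANCED_LIMIT_MS <= audio_minus_video_ms <= AUDIO_DELAYED_LIMIT_MS
```

Under this convention, a difference of 30 ms with the audio delayed is within tolerance, while the same 30 ms with the audio advanced is not.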
Although the MPEG-2 encoding format is used as an example, encoding according to Digital Video Broadcasting (DVB), Digital Video Broadcasting-Handheld (DVB-H), Advanced Television Systems Committee-Mobile/Handheld (ATSC-M/H), and ATSC A/53 can be used as well.
Furthermore, the first stream 8 can be a broadcast program broadcast from a broadcast source via satellite, terrestrial, or cable. The first stream 8 can also come from a local drive, a network drive, or other storage devices accessible by the STB 2. Thus, in some embodiments, the first network 5 is not needed. The first stream 8 may represent an analog television signal as well. In one embodiment, the STB 2 may be integrated into the TV 3, so that the TV 3 performs both sets of functions.
A second stream 7 including the video and a second audio is transmitted by a video server 1 through a second network 6 to a mobile terminal 4. The second audio is associated with the video, which is the same video in the first stream 8. The second audio is different from the first audio. For example, the second audio carries a different language from the first audio. According to the principles of an embodiment of the invention, a user can watch the video on the TV 3 and listen to the second audio on the mobile terminal 4 with the two playbacks synchronized.
The second stream 7 is transmitted to the mobile terminal 4 upon demand and the second stream 7 includes the same video and a second audio. The first stream 8 can be broadcast to the STB 2 or transmitted to the STB 2 upon demand.
The second network 6 can also be the Internet, a satellite network, a Wi-Fi network, or other data networks accessible wirelessly or with wire by the mobile terminal 4.
According to the embodiment, the second stream 7 can be distributed through a DVB-H network, an ATSC-M/H network or other networks supporting other encoding standards, as long as the mobile terminal 4 supports the encoding formats. The second stream 7 can also be received from a storage device accessible by the mobile terminal 4, for example, a storage device connected to the mobile terminal 4 wirelessly or with wire, such as USB. Thus, in some embodiments, the second network 6 is not needed. Although illustrated as a mobile terminal, the mobile terminal 4 might be a device such as a cellular terminal, a tablet, a Wi-Fi receiver, a DVB-T terminal, a DVB-H terminal, or an ATSC-M/H terminal.
The STB 2 may be located in a public hot spot, which comprises one or more displays for presenting the video and one or more speakers for outputting the audible signal of the first audio. When in the public hot spot, an end user listens on a mobile terminal to an audio associated with the video displayed in the hot spot. According to principles of an embodiment of the invention, the audio played by the mobile terminal 4 is synchronized, utilizing a camera attached to or included in the mobile terminal 4, with the video being played back by the STB 2. Different users in the hot spot watch the same video, but listen to different audio streams carrying, for example, different languages associated with that video.
The video stream is then decoded by the video decoder 23. The decoded video signal is received by the TV 3 and displayed on the display 31. The audio decoder 25 decodes the first audio stream and outputs the decoded first audio signal to the TV 3. The TV 3 generates an audible output signal, the playback first audio signal, via the speaker 33 in response to the decoded audio signal.
The mobile terminal 4 in this embodiment includes a main processor 40, a video capture 41, a video correlator 42, a video decoder 43, a data demultiplexer 44, an audio decoder 45, a speaker, such as a headset or an earphone 46, a camera 47, a display 48, and a keyboard 49. The main processor 40 is the main controller of the mobile terminal 4. Functions of some elements, such as the video capture 41, the video correlator 42, the video decoder 43, the data demultiplexer 44, and/or the audio decoder 45 may be integrated into the main processor 40.
In operation, the data demultiplexer 44 separates and extracts the video stream and the second audio stream from the second stream 7 received from the second network 6. The data demultiplexer 44 outputs the video stream and the second audio stream respectively to the video decoder 43 and the audio decoder 45. The video decoder 43 and the audio decoder 45 respectively produce decoded video and decoded second audio signals in response to the respective video and second audio streams. The headset 46 renders the decoded second audio signal as an audible signal, the playback second audio signal.
The camera 47 receives the visible output signal from the display 31. The visible signal received by the camera 47 is digitized by the video capture 41, which also serves as a buffer and transmits the digitized video signal to the video correlator 42. It is noted that both the digitized video signal and the decoded video signal represent the video but may not be synchronized with each other.
The video correlator 42 determines an offset between the digitized video signal from the video capture 41 and the decoded video signal from the video decoder 43.
The video correlator 42 may determine the offset by comparing the digitized video signal from the video capture 41 with the decoded video signal from the video decoder 43 to find a video frame in the digitized video signal that corresponds to a video frame in the decoded video signal. Once correspondence is found, the offset can be derived by computing the number of frames between the currently outputted decoded frame from the video decoder 43 and the corresponding frame in the decoded video signal. The offset can be represented by a number of frames or a time interval. For example, for simplicity of illustration, we assume that each frame in the video stream is denoted by a number and the next subsequent frame is denoted by that number plus 1. Now, if the digitized video signal from the video capture 41 is lagging behind the decoded video signal from the video decoder 43, the corresponding frame in the decoded video signal of a received digitized video frame should already exist in a buffer of the video correlator 42. Assuming that the corresponding frame is frame 3 and the currently outputted decoded video frame is frame 7, the offset should be determined as −4 frame intervals. Thus, the output from the audio decoder 45 must be pulled back by four frames in order to be synchronized with the playback of the video at the TV 3. The output of the video decoder 43 and the output of the audio decoder 45 are synchronized by using the embedded synchronization signals in the second stream 7, as known in the art.
Continuing the example above, if the digitized video signal from the video capture 41 is ahead of the decoded video signal from the video decoder 43, the corresponding frame in the decoded video signal of a received digitized video frame is not yet outputted from the video decoder 43. Assuming the currently outputted video frame from the video decoder 43 is frame 3 and the corresponding frame is frame 7, the offset should be determined as +4 frame intervals. Thus, the output from the audio decoder 45 must be advanced by four frames in order to be synchronized with the playback of the video at the TV 3.
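The two numerical examples above can be reproduced with a short sketch. The function names are hypothetical, and the 30 frames/second frame rate and 48 kHz audio sampling rate used as defaults are assumptions chosen only to make the arithmetic concrete; the sign convention follows the text, with the decoded video signal as the reference.

```python
def frame_offset(corresponding_frame: int, current_decoded_frame: int) -> int:
    """Offset in frame intervals, using the decoded video as the reference.

    Negative: the captured video lags the decoded video (pull audio back).
    Positive: the captured video is ahead (advance the audio).
    """
    return corresponding_frame - current_decoded_frame


def offset_in_seconds(offset_frames: int, frame_rate: float = 30.0) -> float:
    """Convert an offset in frame intervals to seconds (frame rate assumed)."""
    return offset_frames / frame_rate


def offset_in_audio_samples(offset_frames: int,
                            frame_rate: float = 30.0,
                            sample_rate: int = 48000) -> int:
    """Number of audio samples to move the audio output (rates assumed)."""
    return round(offset_frames * sample_rate / frame_rate)


lagging = frame_offset(corresponding_frame=3, current_decoded_frame=7)  # -4
ahead = frame_offset(corresponding_frame=7, current_decoded_frame=3)    # +4
```

With the assumed rates, an offset of −4 frame intervals corresponds to moving the audio output back by 6400 samples.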
As known in the art, one way to determine the corresponding frame in the video stream of a received digitized video frame is to calculate the peak signal to noise ratio or PSNR of each outputted decoded video frame from the video decoder 43 with respect to the received digitized video frame. The corresponding frame should be one that has the maximum PSNR.
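The maximum-PSNR search described above can be sketched as follows. This is an illustration under stated assumptions, not the correlator's actual implementation: frames are modeled as flat sequences of 8-bit luminance values of equal length, the conventional definition PSNR = 10·log10(MAX²/MSE) is used, and the names `psnr` and `best_matching_frame` are hypothetical. A real captured frame would first need to be cropped, scaled, and aligned to the decoded frame geometry, which is not shown.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], candidate: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error over all pixel positions.
    mse = sum((a - b) ** 2 for a, b in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def best_matching_frame(captured: Sequence[float],
                        decoded_frames: Sequence[Sequence[float]]) -> int:
    """Index of the decoded frame with the maximum PSNR against the capture."""
    scores = [psnr(frame, captured) for frame in decoded_frames]
    return max(range(len(scores)), key=scores.__getitem__)


# Three tiny 4-pixel "frames"; the capture is a noisy copy of the second one.
decoded = [[0, 0, 0, 0], [100, 100, 100, 100], [200, 200, 200, 200]]
captured = [98, 101, 99, 102]
match_index = best_matching_frame(captured, decoded)
```

In practice a minimum PSNR threshold would likely be needed so that a poor best match is reported as a failure rather than accepted; the threshold value is left open here.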
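The unit of the PSNR is decibels (dB). In its standard form, the PSNR is calculated as PSNR = 10·log10(MAX² / MSE), where MAX is the maximum possible pixel value (for example, 255 for 8-bit samples) and MSE is the mean squared error between the pixel values of the two frames being compared.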
Once the offset has been determined, the video correlator 42 informs the audio decoder 45 to retreat or advance the decoded second audio signal to the speaker 46 according to the offset, so that the playback of second audio at the mobile terminal 4 is synchronized with the playback of the video at the TV 3. Thus, the offset between the playback of the video at the first electronic device and playback of the second audio at the second electronic device is eliminated. Although in this embodiment, the decoded video signal is used as a reference for calculating the offset, the digitized video signal can be used as a reference as well resulting in the sign of the offset being reversed.
Instead of informing the audio decoder 45 of the offset, the video correlator 42 may determine a presentation time stamp (PTS) of the decoded video signal which is synchronized with the digitized video signal most recently received according to the determined offset, and inform the audio decoder 45 of the PTS, so that the audio decoder 45 can output the decoded second audio signal according to the determined PTS.
In order to reduce the time the video correlator 42 needs to determine the offset and/or to reduce the sizes of the buffers (not shown) for storing the digitized video signal from the camera 47 and the decoded video signal, the actual offset between the digitized video signal and the decoded video signal should be less than a predetermined time, for example 10 seconds. This approach may also reduce the size of buffers (not shown) used in the video decoder 43 and the audio decoder 45.
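To illustrate how bounding the offset bounds the buffer size, the following sketch sizes a frame buffer from the predetermined time. The function names are hypothetical, the 30 frames/second default is an assumption, and `collections.deque` with `maxlen` is used only as a convenient stand-in for a fixed-size buffer that discards its oldest entry.

```python
from collections import deque


def window_size_frames(max_offset_seconds: float, frame_rate: float) -> int:
    """Number of frames that must be retained to cover the maximum offset."""
    return int(round(max_offset_seconds * frame_rate))


def make_frame_buffer(max_offset_seconds: float = 10.0,
                      frame_rate: float = 30.0) -> deque:
    """Fixed-size buffer; appending beyond capacity drops the oldest frame."""
    return deque(maxlen=window_size_frames(max_offset_seconds, frame_rate))


frame_buffer = make_frame_buffer()
for frame_number in range(305):
    frame_buffer.append(frame_number)
# The buffer now holds only the most recent 300 frame numbers (5 to 304).
```

A 10-second limit at the assumed frame rate therefore caps the search at 300 candidate frames, regardless of the length of the program.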
According to the principles of an embodiment of the invention, a user of the mobile terminal 4 should determine the elapsed time of the playback of the video at the TV 3. This information may be indicated on the display 31 of the TV 3, as is well known in the art. If the information is not shown on the display 31, the user can find out the starting time of the program from, for example, a program guide and compute the elapsed time using the current time. If the program is played back from a local drive, the user can easily compute the elapsed time by subtracting the playback start time from the current time. Once the user has determined the elapsed time of the video signal, the user should adjust the playback of the second audio at the mobile terminal 4 to a position having an elapsed time that is within the predetermined offset or time interval, preferably 10 seconds, of the determined elapsed time of the playback of the video at the TV 3. The user then instructs the mobile terminal 4 to synchronize the playback of the program at the TV 3 and the playback of the second audio at the mobile terminal 4 by activating an input mechanism, for example, pressing a particular key in the keyboard 49, pressing a particular virtual key displayed on the display 48, or generating a particular gesture in front of the display 48, assuming that the main processor 40, through the display 48 or another camera (not shown) other than the camera 47, is able to detect the particular gesture.
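The elapsed-time arithmetic described above can be sketched as follows. The function names and the example clock times are hypothetical; the 10-second default mirrors the preferred interval mentioned in the text.

```python
from datetime import datetime, timedelta


def elapsed_time(program_start: datetime, now: datetime) -> timedelta:
    """Elapsed playback time: current time minus the program start time."""
    return now - program_start


def within_predefined_interval(audio_position: timedelta,
                               video_elapsed: timedelta,
                               predefined_interval: timedelta = timedelta(seconds=10)) -> bool:
    """True if the chosen audio position is close enough to the video position."""
    return abs(audio_position - video_elapsed) <= predefined_interval


# Example values only: a program that started at 20:00:00, observed at 20:12:30.
start = datetime(2020, 1, 1, 20, 0, 0)
now = datetime(2020, 1, 1, 20, 12, 30)
video_elapsed = elapsed_time(start, now)  # 12 minutes 30 seconds
```

In this example, positioning the second audio at 12 minutes 25 seconds would fall within the interval, whereas 12 minutes 0 seconds would not.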
A user may start the playback of the second audio by selecting the second audio, for example, from a web browser on the mobile terminal 4. After the second audio has been selected, the mobile terminal 4 invokes an audio/video player 300, the user interface of which, for example, is shown in
As is well known in the art, the user inputs can come from the keyboard 49 or the display 48 or both. The main processor 40 then instructs the video decoder 43 and the audio decoder 45 to execute the desired synchronization functions.
Once the user has selected the playback position at the mobile terminal 4, the user can input another signal via the keyboard 49 or the display 48 requesting the main processor 40 to synchronize the playback of the video at the TV 3 and the playback of the second audio at the mobile terminal 4. Once the main processor 40 receives the user signal to synchronize the two playbacks, the main processor 40 activates or instructs the video capture 41 to capture the playback of the video at the TV 3 and the video correlator 42 to determine the offset or the desired PTS. The signal requesting the main processor 40 to synchronize may be generated by activating a special key in the keyboard 49, a special virtual button on the display 48, or a particular hand gesture detectable by the main processor 40 via the touch-sensitive display 48 or another camera (not shown) other than the camera 47.
Referring to
A second electronic device, illustratively the mobile terminal 4, is playing back a second audio associated with the program. The second electronic device also receives and decodes the video. The video and the second audio received by the second electronic device are components of the second stream 7. Although illustrated as a mobile terminal, the second electronic device may be any electronic device that is able to receive the playback of the video at the first electronic device.
If the main processor 40 performs the functions of the video capture 41, the video correlator 42, the video decoder 43, and/or the audio decoder 45, the process 400 is performed by the main processor 40. However, those components still exist albeit inside the main processor 40.
At step 405, the main processor 40 is operative or configured to invoke or instruct the video decoder 43 to decode the video and output the decoded video. The video decoder 43 should have an output buffer, so that the video decoder 43 can select which frame in the output buffer to be outputted to the video correlator 42.
At step 410, the main processor 40 is operative or configured to invoke or instruct the audio decoder 45 to decode the second audio and output the decoded second audio for playing back by the second electronic device. The audio decoder 45 should have an output buffer, so that the audio decoder 45 can select which sample in the output buffer to be outputted to the headset 46 for playback.
At step 415, the main processor 40 is operative or configured to receive a user command to synchronize the playback of the video at the first electronic device and the playback of the second audio at the second electronic device. The user input is generated from activating an input mechanism, which may be a particular icon displayed on the display 48, a particular user gesture in front of the display 48, or a particular key on the keyboard 49.
Responsive to the user command to synchronize, the main processor 40 cooperating with other elements at step 420 is operative or configured to synchronize the two playbacks. An illustrative process flow of step 420 is shown in
At step 505, the main processor 40 is operative or configured to invoke or instruct the video capture 41 to capture, by a capturing device of the second electronic device, such as the camera 47, the playback of the video at the first electronic device. The main processor 40 at step 510 is also operative or configured to invoke or instruct the video correlator 42 to determine an offset between the decoded video from the video decoder 43 in the mobile terminal 4 and the captured video, which is digitized by the video capture 41. The main processor 40 is then operative or configured to invoke or instruct the audio decoder 45 to adjust playback of the second audio by adjusting outputting of the decoded second audio according to the offset, so that playback of the video at the first electronic device is synchronized with playback of the second audio at the second electronic device. Since the playback of the first audio and the video at the TV 3 is synchronized, and the playback of the video at the TV 3 and the playback of the second audio at the mobile terminal 4 are synchronized, the playback of the first audio at the TV 3 and the playback of the second audio at the mobile terminal 4 are also synchronized.
It is noted that the main processor 40 cooperating with other components, such as the audio decoder 45 and a video player (not shown), the user interface of which may be shown as in
The predefined interval can be user adjustable and preferably is 10 seconds (300 frame intervals if the frame rate is 30 frames/second) or less, so that the synchronization can be achieved quickly. As discussed previously with respect to
In the case that the server providing the second audio to the mobile terminal 4 knows the position of the video transmitted to the STB 2, the server providing the second audio can determine a position in the second audio that corresponds to the current position of the video transmitted and transmit the second audio from the corresponding position in response to a user input to the server, for example, activating an icon on the server web site. As such, positioning the first position can be done at the mobile terminal 4 or at the server transmitting the second audio.
In one embodiment, if the difference between the first time interval and the second time interval is more than the predefined interval, the main processor 40 is operative or configured to ask the user to adjust the first position in response to the user command to synchronize the two playbacks.
In another embodiment, if the step of determining the offset fails, the main processor 40 is operative or configured to ask a user to input the user command to synchronize the two playbacks again and the steps of capturing and determining the offset are repeated.
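The capture, determine, adjust, and retry behavior described in the preceding paragraphs can be sketched as a small control loop. Every name here is hypothetical: the four callables stand in for the video capture 41, the video correlator 42, the audio decoder 45, and the user prompt, and the `max_attempts` limit is an assumption added so that the loop terminates; the text itself does not specify a limit.

```python
from typing import Any, Callable, Optional


def synchronize(capture_video: Callable[[], Any],
                determine_offset: Callable[[Any], Optional[int]],
                adjust_audio: Callable[[int], None],
                ask_user_to_retry: Callable[[], bool],
                max_attempts: int = 3) -> bool:
    """Run capture -> determine offset -> adjust audio, retrying on failure.

    determine_offset returns None when no corresponding frame is found.
    Returns True once the audio output has been adjusted, else False.
    """
    for attempt in range(1, max_attempts + 1):
        captured = capture_video()
        offset = determine_offset(captured)
        if offset is not None:
            adjust_audio(offset)
            return True
        # Determination failed: ask the user to issue the command again,
        # unless this was the last permitted attempt.
        if attempt == max_attempts or not ask_user_to_retry():
            break
    return False
```

Passing the components as callables keeps the sketch independent of whether they are separate elements or functions integrated into the main processor 40.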
According to the principles of an embodiment of the invention, when the decoded second audio output to the headset 46 from the audio decoder 45 is adjusted to the first position, the output of the video decoder 43 is automatically adjusted to be synchronized with the output of the audio decoder 45, so that the output frame in the decoded video also corresponds to the first position. As such, outputs of the video decoder 43 and the audio decoder 45 are synchronized. That is, the output samples from the audio decoder 45 correspond to the output frames from the video decoder 43. For example, the PTS associated with the current output frame from the video decoder 43 and the PTS associated with the current output sample from the audio decoder 45 are the same.
As such, the main processor 40 may instruct the video decoder 43 to adjust its output by outputting the decoded video according to the offset, so that outputs of the video decoder 43 and the audio decoder 45 are synchronized.
In another embodiment, the main processor 40 may instruct the video decoder 43 to synchronize with the audio decoder 45 in response to receipt of an occurrence of the user command to synchronize the playback of the video at the first electronic device and the playback of the second audio at the second electronic device.
An advantage of synchronizing the outputs of the video decoder 43 and the audio decoder 45 is that a user may send the user command to synchronize the playback of the video at the first electronic device and the playback of the second audio at the second electronic device at any time and the two decoders would be ready to perform the synchronization according to the present embodiments of the invention.
In another embodiment, the video and the second audio are pre-downloaded to the mobile terminal 4 before the mobile terminal 4 plays back the second audio. In one embodiment, playing back of the second audio may include playing back of the video received by the mobile terminal 4.
In one embodiment, the video and the second audio can be downloaded to the second electronic device from the same source, for example, the same web site of a service provider that transmits the program to the first electronic device. In another embodiment, the second audio may be downloaded from a different source from the source transmitting the program to the first electronic device. For example, the program received by the STB 2 is received from a broadcast source for a service provider and the second audio received by the second electronic device is downloaded from a web site sponsored by the service provider.
In fact, when the bandwidth of receiving the second audio by the second electronic device is too small, a user can switch to another source for receiving the second audio. This may happen when the user selects a streaming source that has a very low bandwidth and the user is unable to adjust the playback of the second audio to the first position.
In another embodiment, the main processor 40 is operative or configured to instruct the video correlator 42 to determine a PTS according to the offset and provide the PTS to the audio decoder 45, and the audio decoder 45 should output from a decoded sample associated with the PTS. In another embodiment, the main processor 40 is operative or configured to instruct the video correlator 42 to provide the same PTS to the video decoder 43, so that the video decoder 43 should output from a decoded frame associated with the PTS. Once the video correlator 42 has determined the offset, it can determine the PTS as follows: determining a decoded video frame from the video decoder 43 that should correspond to the next received captured video frame and determining the PTS of the corresponding decoded video frame as the desired PTS.
Although the camera 47 is used as an example of the capturing device for capturing the playback video from the display 31 of the TV 3, the capturing device may be a wireless receiver, such as a Bluetooth receiver at the mobile terminal 4, in which case the captured video signal is simply the decoded video signal from the video decoder 23 of the STB 2 transmitted wirelessly to the mobile terminal 4.
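One way the PTS determination might be sketched is shown below. It relies on two properties of MPEG-2 presentation time stamps, namely that they count ticks of a 90 kHz clock and are 33 bits wide; the function name, the constant 30 frames/second frame rate, and the assumption that the frame rate divides the clock evenly are illustrative choices rather than requirements of the embodiments.

```python
PTS_CLOCK_HZ = 90_000      # MPEG-2 PTS values count 90 kHz ticks
PTS_MODULUS = 1 << 33      # PTS is a 33-bit field and wraps around


def shift_pts(pts: int, offset_frames: int, frame_rate: int = 30) -> int:
    """PTS of the frame that is offset_frames away from the frame at pts.

    Assumes a constant frame rate that divides 90 000 evenly
    (e.g. 25 or 30 frames/second); wraps modulo 2**33.
    """
    ticks_per_frame = PTS_CLOCK_HZ // frame_rate
    return (pts + offset_frames * ticks_per_frame) % PTS_MODULUS


# Decoder currently outputs the frame with PTS 900000; the offset is -4 frames.
target_pts = shift_pts(900_000, -4)  # 888000 at 30 frames/second
```

To target the frame corresponding to the next received captured frame rather than the current one, the same function could be called with the offset plus one frame interval, on the assumption that both signals advance at the same frame rate.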
While this invention has been described as having a preferred design, the present invention can be further modified within the spirit and scope of this disclosure. This application is therefore intended to cover any variations, uses, or adaptations of the invention using its general principles. Further, this application is intended to cover such departures from the present disclosure as come within known or customary practice in the art to which this invention pertains and which fall within the limits of the appended claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2014/014153 | 1/31/2014 | WO | 00 |