This application claims priority under 35 USC §119 to Finnish Patent Application No. 20045211 filed on Jun. 4, 2004.
The present invention relates to a device comprising at least a first control block running a first operating system, a second control block running a second operating system, a bus between said first control block and said second control block for transmitting information between said first control block and said second control block, an electro-acoustic converter controlled by said first control block for generating an audible signal on the basis of an audio frame, and video presentation means controlled by said second control block for presenting video information on the basis of video frames. The invention also relates to a system comprising at least a first control block running a first operating system, a second control block running a second operating system, a bus between said first control block and said second control block for transmitting information between said first control block and said second control block, an electro-acoustic converter controlled by said first control block for generating an audible signal on the basis of an audio frame, and video presentation means controlled by said second control block for presenting video information on the basis of video frames. The invention also relates to a method for presenting audio and video information in a device which comprises at least: a first control block running a first operating system, a second control block running a second operating system, a bus between said first control block and said second control block for transmitting information between said first control block and said second control block, an electro-acoustic converter controlled by said first control block for generating an audible signal on the basis of an audio frame, and video presentation means controlled by said second control block for presenting video information on the basis of video frames. The invention further relates to a computer program product comprising machine executable steps for presenting audio and video information in a device which comprises at least: a first control block running a first operating system, a second control block running a second operating system, a bus between said first control block and said second control block for transmitting information between said first control block and said second control block, an electro-acoustic converter controlled by said first control block for generating an audible signal on the basis of an audio frame, and video presentation means controlled by said second control block for presenting video information on the basis of video frames.
There are devices in which multimedia information can be presented. Multimedia information often comprises audio and video components (tracks) which may have been encoded and/or compressed before they are delivered to the device. When playing back a multimedia presentation which is composed of a video track and an associated audio track, the two media tracks should be synchronized to achieve a pleasant user experience. This audio/video synchronization is also known as “lip sync”.
Missing synchronization can easily be perceived when the lips of a speaker do not move in sync with the heard speech. There exist studies on the effect of AV-sync accuracy on the subjective quality of a multimedia presentation. For instance, Ralf Steinmetz, "Human Perception of Jitter and Media Synchronization", IEEE Journal on Selected Areas in Communications, Vol. 14, Issue 1, January 1996, pp. 61-72, concludes that the audio and video tracks are perceived to be in sync when the skew between the two media is between −80 ms and +80 ms (positive skew meaning audio ahead of video).
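Purely as an illustration (and not part of the original disclosure), the cited in-sync window can be expressed as a simple predicate; the function name and the millisecond convention are assumptions made for this sketch:

```c
#include <stdbool.h>
#include <stdio.h>

/* Skew convention from the cited study: positive skew means the audio
 * is ahead of the video; the +/-80 ms window is perceived as in-sync. */
#define AV_SKEW_LIMIT_MS 80

static bool av_in_sync(int skew_ms)
{
    return skew_ms >= -AV_SKEW_LIMIT_MS && skew_ms <= AV_SKEW_LIMIT_MS;
}

int main(void)
{
    printf("+60 ms: %s\n", av_in_sync(60) ? "in sync" : "out of sync");
    printf("-120 ms: %s\n", av_in_sync(-120) ? "in sync" : "out of sync");
    return 0;
}
```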
In general, the principal method of adjusting the AV-sync is to let the audio "run free" and to adjust the rendering time instant of each video frame accordingly; that is, the industry standard approach is to synchronize the video to the audio. This approach originates from perceptual psychology: humans perceive jitter in the timing of video frames as less disturbing than gaps in the stream of audio samples.
To offer customers the highest application performance, some wireless communication devices have a discrete cellular modem ASIC, separated from an application engine ASIC. Both the cellular modem ASIC and the application engine ASIC contain processor cores which run their own independent operating system environments. In these kinds of devices the audio hardware (i.e., the A/D and D/A converters, power amplifiers, and galvanic audio routing control) is connected to the cellular modem ASIC for optimised telephony, system modularity and power management reasons. The operating system on the application engine ASIC runs the user interface software of the device, and therefore the display driver software also runs on the application engine ASIC. Moreover, the audio and video codecs used by the applications (such as AMR-NB and H.263 decoders, respectively) are executed on the application engine ASIC.
This setup means that there needs to be an inter-ASIC bus between the cellular modem ASIC and the application engine ASIC to enable the audio data transfer from the audio codec to the audio hardware. The dual-ASIC system is illustrated at a high level in the appended drawings.
A common method for inter-ASIC audio data transfer is to use a serial bus as the physical layer; for example, the I2S bus is widely used. On top of the physical layer, it is common to utilize a link layer protocol (layer 2 in the OSI model) for transferring the audio data in fixed-length frames. Typical frame lengths range from a few milliseconds to some hundreds of milliseconds.
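As a sketch of what such link-layer framing might look like in software (the field names and sizes below are assumptions for illustration, not taken from any particular bus specification):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fixed-length link-layer frame for inter-ASIC audio
 * transfer: the payload is 10 ms of 16-bit mono audio at 48 kHz,
 * i.e. 480 samples. All names and sizes are illustrative only. */
#define AUDIO_SAMPLES_PER_FRAME 480

struct audio_frame {
    uint16_t channel_id;                       /* logical audio channel */
    uint16_t seq_num;                          /* running frame counter */
    int16_t  samples[AUDIO_SAMPLES_PER_FRAME]; /* fixed-length PCM payload */
};

int main(void)
{
    printf("frame size: %zu bytes\n", sizeof(struct audio_frame));
    return 0;
}
```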
In addition to the above-described "audio-only" use case, the baseband engine must be capable of rendering a multimedia presentation which consists of a video track and a synchronized audio track. In this audio and video use case, the display is used for rendering the video track. A software module called the AV-sync Module controls the rendering time instant of each video frame; therefore, the video frames flow through the AV-sync Module. The audio and video data paths are visualized in the appended drawings.
In the dual-ASIC system, the above-mentioned approach in which video is synchronized to audio means that the application engine ASIC needs to have a synchronizing parameter, which can be implemented, for example, as a register or a memory slot, from which the AV-sync Module can read how many audio samples have been played out from the loudspeaker. The AV-sync Module will then adjust the presentation time of the video frames accordingly.
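A minimal sketch of this arrangement, assuming a 48 kHz sample rate and a plain shared counter (both assumptions of this illustration, not details of the original disclosure):

```c
#include <stdint.h>
#include <stdio.h>

/* The synchronizing parameter: a counter of audio samples played out,
 * exposed to the AV-sync Module via a register or shared memory slot. */
#define SAMPLE_RATE_HZ 48000u

static volatile uint32_t samples_played;  /* written by the audio side */

/* The AV-sync Module converts the counter into a presentation clock. */
static uint32_t sync_clock_ms(void)
{
    return (uint32_t)((uint64_t)samples_played * 1000u / SAMPLE_RATE_HZ);
}

int main(void)
{
    samples_played = 24000;  /* as if half a second had been played out */
    printf("sync clock: %u ms\n", (unsigned)sync_clock_ms());
    return 0;
}
```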
There are some problems with the above-described approach. It is not easy to arrange the synchronization between the two ASICs so that the rendering time instant of each video frame is determined by the AV-sync Module on the APE (Application Engine ASIC), while the rendering time instant of each audio sample is determined by the audio driver on the CMT (Cellular Modem ASIC).
In other words, one problem is how to convey the D/A converter clocking information from the cellular modem ASIC to the APE for the updating of the Sync Clock.
A known method for enabling the inter-ASIC audio/video synchronization is to use a common clock signal for both ASICs. This kind of arrangement is disclosed in the U.S. patent application US 2003/0094983 A1, "Method and device for synchronising integrated circuits" (Nokia Corporation; inventors: Janne Takala and Sami Mäkelä). In this method, both ASICs maintain their own hardware clock registers which count the number of pulses in the common clock signal. When necessary, the clock registers are cleared using a common reset signal. In an implementation based on this solution, the D/A converter would get its clock signal from a source which is common with the application engine ASIC, and the Sync Clock on the application engine ASIC would be updated based on this common clock signal. The drawback of this method lies in the fact that in a cellular speech call use case the common clock must be synchronized with the cellular network (e.g. a GSM or WCDMA network). This means added complexity in the actual hardware implementation, since the common clock must be synchronized with the RF parts of the device.
Another known method is to use the I2S bus as the inter-ASIC audio bus, and to configure the cellular modem ASIC as a master and the application engine ASIC as a slave. Physically, I2S is a three-line serial bus, consisting of a line for two time-multiplexed data channels (i.e., the left and right audio channels), a word select line and a clock line. In audio playback, the master provides the slave with the clock signal (derived from the D/A converter clock) and with the word select information. The slave responds by transmitting the interleaved (left-right-left-right-...) audio samples. One drawback of this method is the need for a separate word select line, which increases the pin count of both ASICs and the amount of wiring on the circuit board.
In the present invention there is provided a device, a system, a method and a computer program product for video and audio synchronization. The device according to the present invention is primarily characterised in that said first control block is adapted to transmit a request message to said second control block for requesting an audio frame to be transmitted from said second control block to said first control block, and that said first control block is adapted to play out said audio frame via the electro-acoustic converter within a specified time after said request.
The system according to the present invention is primarily characterised in that said first control block is adapted to transmit a request message to said second control block for requesting an audio frame to be transmitted from said second control block to said first control block, and that said first control block is adapted to play out said audio frame via the electro-acoustic converter within a specified time after said request.
The method according to the present invention is primarily characterised in that the method comprises: transmitting a request message from said first control block to said second control block for requesting an audio frame to be transmitted from said second control block to said first control block, and playing out said audio frame by said first control block via the electro-acoustic converter within a specified time after said request.
The computer program product according to the present invention is primarily characterised in that the computer program product further comprises machine executable steps for: transmitting a request message from said first control block to said second control block for requesting an audio frame to be transmitted from said second control block to said first control block, and playing out said audio frame via the electro-acoustic converter within a specified time after said request.
The present invention uses timing information of audio frame transfers to enable synchronization between audio and video. This means that no additional hardware is needed. Moreover, no additional signalling or messages are needed in the audio protocol; the solution relies on the same audio protocol which is used in the audio-only use case.
The invention enables the audio/video synchronization without the additional hardware required by the prior art solutions described above. There is no need for a common clock signal, a common reset signal, a word select line, or hardware clock registers on both ASICs. This means that the silicon area and the pin count on both the cellular modem ASIC and the application engine ASIC, as well as the amount of wiring on the circuit board, can all be reduced. Instead, the solution is based on a software implementation of the audio protocol.
Moreover, the audio protocol sequence can be exactly the same regardless of whether it is used in an audio-only use case or in an audio+video use case. Thus, there is no additional signalling or protocol overhead incurred from the enabling of the AV-sync in the system.
Furthermore, the audio protocol can be implemented in software on top of any high-speed serial bus. If the bus has a high enough bandwidth, the audio transmissions can actually be time-multiplexed with other data transmissions needed between the ASICs. This reduces the overall pin count even more, since there is no need for separate “audio” and “other data” buses.
A high-speed bus (e.g. several tens of megabits per second) will also provide a lower audio signal latency than e.g. the I2S bus.
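A back-of-envelope comparison (the figures are assumptions chosen for illustration): a 10 ms stereo frame of 16-bit audio at 48 kHz is 480 × 2 × 16 = 15 360 bits; on the I2S bus it is clocked out in real time (10 ms), whereas on a 20 Mbit/s serial bus the same frame can be transferred as a burst in under a millisecond:

```c
#include <stdio.h>

/* Transfer time of one 10 ms stereo audio frame (16-bit, 48 kHz)
 * over an assumed 20 Mbit/s serial bus, versus the 10 ms it takes
 * to clock the same frame out in real time on I2S. */
int main(void)
{
    const double frame_bits = 480.0 * 2 * 16;  /* samples * channels * bits */
    const double bus_bps    = 20e6;            /* assumed bus bit rate */

    printf("burst transfer time: %.2f ms (vs 10 ms in real time)\n",
           frame_bits / bus_bps * 1000.0);
    return 0;
}
```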
Due to the software implementation, the audio protocol is also easily configurable for different platforms and needs.
All the advantages listed above also reduce the power consumption of the device, which is a particularly important property of the invention when it is implemented in portable devices.
The invention can also be implemented in devices in which a digital signal processor and a controller are integrated on the same chip.
In the following the invention will be described in more detail with reference to the appended drawings.
In the following, a device 1 according to an example embodiment of the present invention will be described. The device 1 comprises at least a first control block 2 and a second control block 3, as well as audio and video presentation means such as a speaker element 4.1 and a display 4.
The operation of the first control block 2 is controlled by a first operating system 2.6, and the operation of the second control block 3 is controlled by a second operating system.
The first operating system 2.6 and the second operating system can be similar operating systems, for example Symbian™ operating systems, or they can be different operating systems.
When the configuration has been performed, the first controller 2.1 forms a configuration reply message. The configuration reply message includes the return values from the first audio driver 2.7 of the first control block 2. The configuration reply message is transmitted 303 by the first control transmission channel 2.2 to the second control block 3. The second control receiving channel 3.3 receives the message, and the second controller 3.1 of the second control block 3 examines the message to determine whether the initialisation has been successful. If the first audio driver 2.7 is properly configured for receiving audio information, the first controller 2.1 configures 304 the first data receiving channel 2.5 to receive audio data, and the first audio driver 2.7 starts to generate periodic audio control request messages (305, 308, 311).
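As a sketch only, the message set of such a software audio protocol might be declared as follows; every type and field name here is an assumption for illustration, with the numeric step labels taken from the sequence above:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical message types for the inter-ASIC audio protocol. */
enum msg_type {
    MSG_CONFIG_REQUEST,        /* configuration sent to the first control block */
    MSG_CONFIG_REPLY,          /* 303: return values from the first audio driver */
    MSG_AUDIO_CONTROL_REQUEST, /* 305, 308, 311: periodic audio frame requests */
    MSG_AUDIO_DATA             /* audio frames sent in response to the requests */
};

struct config_request {
    uint32_t sample_rate_hz;   /* e.g. 48000 */
    uint16_t channels;         /* e.g. 2 for stereo */
    uint16_t frame_length_ms;  /* fixed frame length used by the protocol */
};

struct config_reply {
    int32_t status;            /* e.g. 0 when initialisation succeeded */
};

int main(void)
{
    printf("config request: %zu bytes, reply: %zu bytes\n",
           sizeof(struct config_request), sizeof(struct config_reply));
    return 0;
}
```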
At a proper moment the first audio driver 2.7 transmits an audio control request message to the second control block 3, and the second control block 3 responds by transmitting an audio frame to the first control block 2 for playback.
The timing of the transmission of the audio control request messages is based on the timing of the D/A converter 5.4, i.e. on the D/A converter clock signal 5.5.
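A minimal sketch of how the requests could be locked to the D/A converter clock (the function names are assumptions; in a real driver the frame-completion event would be an interrupt from the D/A hardware):

```c
#include <stdint.h>
#include <stdio.h>

#define SAMPLES_PER_FRAME 480  /* e.g. 10 ms at 48 kHz */

/* Stubbed bus transmit, standing in for the real inter-ASIC bus driver. */
static void bus_send_audio_request(uint16_t seq)
{
    printf("audio control request %u\n", (unsigned)seq);
}

static uint16_t next_seq;

/* Called each time the D/A converter has clocked out one frame's worth
 * of samples; requesting the next frame here ties the request period
 * directly to the D/A converter clock. */
static void dac_frame_complete(void)
{
    bus_send_audio_request(next_seq++);
}

int main(void)
{
    for (int i = 0; i < 3; i++)  /* simulate three frame completions */
        dac_frame_complete();
    return 0;
}
```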
The A/D converter 5.6 operates in the opposite transfer direction, i.e. it converts analog audio signals into digital form.
The second audio driver 3.7 can request decoded audio information from the audio decoder 3.10 when necessary, and the audio decoder 3.10 may in turn retrieve encoded audio information from the memory 8.
The second audio driver 3.7 increments 503 the value of the synchronizing parameter 3.12 each time an audio frame is transmitted to the first control block 2.
The synchronization module 3.9 determines 504 the proper timing of the video frames, e.g. by calculating how many audio samples or audio frames are presented for each video frame. For example, if the length of the audio frames is 10 ms and the video rendering rate is 25 frames/s, one video frame will be presented after every 4th audio frame. It should be noted here that one video frame may consist of two half frames (interlaced video), wherein the presentation rate of the half frames is twice that of the full frames. In the example above this means that one half frame should be presented after every second audio frame. However, the invention is not limited to the above-mentioned frame lengths and rendering rates.
The synchronization module 3.9 detects the increment of the synchronizing parameter 3.12, and if the new value of the synchronizing parameter 3.12 indicates that the time to render 505 the next video frame has arrived, the synchronization module 3.9 retrieves the next video frame of the video track and sends it to the display driver 3.8. The video frames may have been decoded in advance by the video decoder 3.11, or the synchronization module 3.9 may instruct the video decoder 3.11 to decode the next video frame when it is determined that the next video frame should be presented on the display 4.
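A minimal sketch of this decision logic, using the 10 ms frames and 25 frames/s of the example above (all function names are illustrative assumptions):

```c
#include <stdint.h>
#include <stdio.h>

#define AUDIO_FRAME_MS 10  /* length of one audio frame */
#define VIDEO_FPS      25  /* video rendering rate */

/* Audio frames per video frame: (1000 / 25) / 10 = 4. */
#define AUDIO_FRAMES_PER_VIDEO ((1000 / VIDEO_FPS) / AUDIO_FRAME_MS)

static volatile uint32_t sync_param;  /* incremented per audio frame sent */
static uint32_t last_rendered;        /* sync_param value at last video frame */

/* Stub standing in for the display driver. */
static void render_next_video_frame(void)
{
    printf("render video frame at audio frame %u\n", (unsigned)sync_param);
}

/* Polled by the synchronization module: render one video frame for
 * every AUDIO_FRAMES_PER_VIDEO audio frames transferred. */
static void av_sync_poll(void)
{
    if (sync_param - last_rendered >= AUDIO_FRAMES_PER_VIDEO) {
        last_rendered += AUDIO_FRAMES_PER_VIDEO;
        render_next_video_frame();
    }
}

int main(void)
{
    for (int i = 0; i < 12; i++) {  /* 12 audio frames => 3 video frames */
        sync_param++;   /* the second audio driver increments per frame */
        av_sync_poll(); /* the synchronization module checks the counter */
    }
    return 0;
}
```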
By the method described above, proper synchronization of video to audio can be achieved without any need for additional wiring or for the transmission of separate timing information.
In the following, some guidelines for implementing the present invention are formulated. First, from the point of view of the second control block 3 (application engine), each audio frame transfer should be started based on a request signal from the first control block 2. The first control block 2 generates the request signal based on the D/A converter clock signal 5.5; in other words, the first control block 2 should ask for a new frame of data at the precise moment when there is a need for it. The request signal should be generated periodically, with a period equal in milliseconds to the length of the audio frame. Second, the audio signal path (from the second audio driver 3.7 to the actual speaker element 4.1) should have an approximately constant group delay. Third, the audio frame length should be selected carefully: it should be short enough to guarantee a sufficient resolution for the synchronizing parameter 3.12. The inter-ASIC audio protocol should use fixed-length frames; otherwise, the length of the frames and/or the number of audio samples in each frame (i.e. the length of the audio signal in the frame) should be indicated to the synchronization module 3.9.
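As a quick sanity check on the third guideline (the 10 ms frame length is an assumption carried over from the earlier example): since the synchronizing parameter advances once per audio frame, its resolution equals the frame length, which should stay comfortably below the ±80 ms in-sync window cited above:

```c
#include <stdio.h>

/* The synchronizing parameter advances once per audio frame, so its
 * resolution equals the frame length; compare it against the +/-80 ms
 * perceptual in-sync window. */
int main(void)
{
    const int frame_ms = 10;       /* assumed inter-ASIC audio frame length */
    const int skew_limit_ms = 80;  /* perceptual in-sync window */

    printf("sync resolution: %d ms (limit %d ms, margin %d ms)\n",
           frame_ms, skew_limit_ms, skew_limit_ms - frame_ms);
    return 0;
}
```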
It is obvious that the present invention is not limited solely to the above described embodiments but it can be varied within the scope of the appended claims.