Audio Video Interleave (AVI) is a file format based on the RIFF (Resource Interchange File Format) document format. AVI files are used for capturing, editing, and playing back audio-video sequences, and generally contain multiple streams of different data types. The data is organized into interleaved audio and video chunks, from which a timestamp can be derived either from the timing of the chunk or from its byte size.
In general, an AVI system may derive time information from any of three sources: a real-time clock (RTC), video sync (v-sync), and a system time clock (STC). The video encoder utilizes v-sync for encoding video frames, and the audio encoder utilizes the STC for encoding audio frames. Both the audio and video encoders utilize the STC to determine a presentation time stamp (PTS) value for the data.
In practice, there often exists a discrepancy between the timing of the three clocks. Please refer to
As can be seen from
It is therefore an objective of the disclosed invention to provide methods for addressing this synchronization problem.
With this in mind, a method for synchronizing audio and video data in an Audio Video Interleave (AVI) file, the AVI file comprising a plurality of audio and video chunks, is disclosed. The method comprises: determining a frame rate error of a group of consecutive main access units (GMAU) according to a video clock and an audio clock; determining a GMAU presentation time stamp (PTS) according to the frame rate error; and updating the AVI file with the GMAU PTS, so that the GMAU will be played utilizing the GMAU PTS.
A second method is also disclosed. The method comprises: determining a frame rate error according to a video clock and an audio clock; and selectively adding or dropping one or a number of video or audio frames according to the frame rate error.
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
A muxer of a recorder multiplexes audio and video chunks encoded by encoders to generate an AVI file. The video and audio may lose synchronization at playback since the audio and video chunks are generated based upon different respective clock sources. The present invention provides several methods to ensure audio and video synchronization during playback. In some embodiments, the muxer compares the audio and video time information to obtain a frame rate error, and then the AVI bitstream is adjusted in accordance with the frame rate error to ensure A/V synchronization. In other embodiments, time stamps are added to the AVI file and can be adjusted according to the frame rate error.
For example, if a system assumes that the video clock (e.g. v-sync) is accurate, the audio data, or the time corresponding to audio playback, will be adjusted according to the video clock. Conversely, if a system assumes that the audio clock (e.g. STC) is accurate, the video data, or the time corresponding to video playback, will be adjusted according to the audio clock. The system may also choose whether to adjust the audio or the video data, and whether to adjust the data itself or the playback time. For example, if the video or audio data is adjusted according to the frame rate error, the system may choose to adjust the stream with the faster clock rate so as to avoid dropping data. The following description illustrates some embodiments of methods for correcting the clock difference between audio and video data in an AVI file.
In a typical AVI system, video and audio encoders generate audio and video chunks; typically, a video chunk is a single video frame and an audio chunk contains one or more audio frames. The audio and video chunks are multiplexed by a multiplexer (muxer) and then sent to an authoring module. The video clock corresponding to a video chunk can be derived from the number of encoded frames and the frame duration, where the number of encoded frames is determined by the number of v-sync patterns detected. The audio clock is derived from the STC. Ideally, the video clock and audio clock should be aligned at each data segment, so that the start time of audio playback equals that of video playback for each segment; in practice, however, the audio and video data may fall out of synchronization, so that audio leads or lags the corresponding video. A data segment may be a single frame or a group of frames.
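As a rough illustration, the two clock derivations described above can be sketched as follows. The function names, and the choice of milliseconds and STC tick rate, are hypothetical and not part of the disclosure:

```python
def video_clock_ms(vsync_count, frame_duration_ms):
    # Video clock: encoded-frame count (one frame per detected
    # v-sync pattern) multiplied by the duration of each frame.
    return vsync_count * frame_duration_ms

def audio_clock_ms(stc_ticks, ticks_per_ms):
    # Audio clock: read directly from the system time clock (STC),
    # converted from ticks to milliseconds.
    return stc_ticks / ticks_per_ms
```

For instance, 30 detected v-sync patterns at 33 ms per frame would place the video clock at 990 ms, while an STC reading would independently place the audio clock; the difference between the two is the synchronization error discussed below.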
In an embodiment, a frame rate error is derived by comparing the audio clock against the video clock. If the frame rate error exceeds one audio frame (for example, audio playback lags the corresponding video playback by one frame length, such that 8 frames of audio data are multiplexed with 9 frames of video data), the muxer will purposely inform the authoring module that 9 frames of audio data have been multiplexed. Initially, the error will not be this large, but it accumulates over time. When the frame rate error is equal to or greater than the duration of one frame, the content of the bitstream is adjusted to ensure A/V synchronization during playback. If the audio clock lags the video clock, the muxer may insert one audio frame or drop one video frame; if the audio clock leads the video clock, the muxer may insert one video frame or drop one audio frame. Frame insertion is usually accomplished by repeating a video or audio frame.
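The insert/drop decision in this embodiment might be sketched as follows. The one-frame threshold follows the text above; the function name, return labels, and millisecond units are assumptions for illustration only:

```python
def sync_action(audio_clock_ms, video_clock_ms, frame_ms):
    # Positive error means the audio clock lags the video clock.
    error = video_clock_ms - audio_clock_ms
    if error >= frame_ms:
        # Audio lags video: repeat one audio frame or drop one video frame.
        return "insert_audio_or_drop_video"
    if error <= -frame_ms:
        # Audio leads video: repeat one video frame or drop one audio frame.
        return "insert_video_or_drop_audio"
    # Accumulated error is still under one frame; keep accumulating.
    return "no_adjustment"
```

With a hypothetical 24 ms audio frame, an audio clock of 1000 ms against a video clock of 1030 ms would trigger the first branch, while a 10 ms discrepancy would trigger no adjustment.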
In some embodiments, the system first defines a Main Access Unit (MAU) consisting of interleaved audio and video chunks; for example, one MAU may carry 0.5 seconds of data. A plurality of consecutive MAUs is known as a Group MAU (GMAU) and may consist of, for example, approximately 5 minutes of data. A GMAU time stamp is defined as the audio and video presentation time stamp of a GMAU, and is inserted into a self-defined chunk of the AVI file. The GMAU time stamp can be used to calibrate the audio and video clock difference. Rather than immediately correcting the synchronization error, the system accumulates it over a complete GMAU. For example, as detailed above, if the total accumulated error corresponds to one audio frame period, the authoring module is notified that one extra frame of audio data has been muxed; the observed number of muxed audio frames therefore equals the actual number of audio frames plus one. Once the observed number of muxed audio frames has been calculated, a new GMAU PTS can be computed and written to the current GMAU, so that when data in the GMAU is displayed, the video and audio are presented according to the new GMAU PTS.
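Under the same illustrative assumptions (hypothetical names, millisecond units), the GMAU PTS update reduces to scaling the observed frame count by the audio frame duration:

```python
def new_gmau_pts_ms(actual_audio_frames, accumulated_error_frames,
                    audio_frame_ms):
    # The observed count folds the accumulated synchronization error
    # into the actual count of muxed audio frames; the new GMAU PTS
    # is the playback time that the observed count represents.
    observed_frames = actual_audio_frames + accumulated_error_frames
    return observed_frames * audio_frame_ms
```

In the running example, 8 actual audio frames plus one accumulated error frame yields an observed count of 9; with a 24 ms frame the new GMAU PTS would be 216 ms.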
For a clearer description of this first embodiment, please refer to
In some other embodiments of the present invention, the video clock is still utilized as a reference, but the difference between the observed number of audio frames and the actual number of audio frames is utilized to insert or drop video frames in order to achieve synchronization.
As in the previous embodiment, audio and video data are muxed, and the video clock is utilized as a reference for determining the frame rate error. When this error is converted into a corresponding number of frames, the AVI system determines whether to add or drop a number of video frames, where the number of added or dropped video frames corresponds directly to the frame rate error. In other words, if it takes 9 frames' time to play 8 frames of audio data, the system adds an extra video frame to the AVI file so that audio-video synchronization is achieved. Similarly, if it takes 7 frames' time to play 8 frames of audio data, the system drops a video frame from the AVI file.
For a clearer description of this embodiment please refer to
When the video clock is utilized as a reference, only the audio data needs to be calibrated.
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention.