This application claims priority under 35 U.S.C. §119(a) to Chinese Patent Application No. 201110157744.0, filed on Jun. 3, 2011, in the State Intellectual Property Office and Korean Patent Application No. 10-2012-0033995, filed on Apr. 2, 2012, in the Korean Intellectual Property Office, the disclosures of which are incorporated herein by reference.
1. Field
The exemplary embodiments relate to audio & video data processing in a multimedia file, and particularly relate to a method and device for demultiplexing audio & video data in a multimedia file.
2. Related Art
With the improvement of display technologies, a multi-thread media player has been developed.
When the demultiplexing thread module receives a multimedia file, of which the format can be AVI, MP4, 3GP, WMV or MKV, multiple data frames such as audio data frames or video data frames are cached or buffered in the multimedia file. The caching sequence of the audio data frames or video data frames is not the same as the decoding order of these data frames. Therefore, demultiplexing is required so that, after demultiplexing, the data frames of the multimedia file have the same caching sequence as the decoding order. Accordingly, the multimedia file is demultiplexed under the control of the shared control module to output an original stream audio and video frame queue, wherein the queue includes audio data frames and video data frames reordered in accordance with the audio and video decoding order. Under the control of the shared control module, the video decoding thread module decodes the video data frames out of the original stream audio and video frame queue to obtain a video frame playing queue having a format such as a YUV or RGB format, and sends it to the video playing thread module. Then, the video playing thread module plays the video in the video frame playing queue under the control of the shared control module. Under the control of the shared control module, the audio decoding thread module decodes the audio data frames out of the original stream audio and video frame queue to obtain an audio frame playing queue having a format such as a PCM format, and sends it to the audio playing thread module. Then, the audio playing thread module plays the audio frame playing queue under the control of the shared control module.
In the above process, the demultiplexing thread module is adapted for reordering audio data frames or video data frames of the multimedia file based on decoding time stamps of the audio data frames or video data frames in the multimedia file to get the original stream audio and video frame queue for subsequent audio and video data decoding and playing, to ensure synchronization of decoding and playing.
The multimedia file received by the demultiplexing thread module can cache audio data frames and video data frames in an interleaving form or non-interleaving form, wherein each of the audio data frames and video data frames cached has a decoding time stamp for identification. An index can be carried or not carried in the multimedia file, wherein the index identifies the byte offset location of each data frame and the size thereof for locating each data frame in the multimedia file.
When a multimedia file does not carry an index, whether the audio data frames or video data frames in the multimedia file are interleaved or not, the multimedia file is demultiplexed from front to back according to the caching sequence of each audio data frame or video data frame to get the original stream audio and video frame queue.
When a multimedia file carries an index, the location of each audio data frame or video data frame in the multimedia file can be determined according to the index. Decoding time stamps are ranked in an ascending order, and an original stream audio and video frame queue is obtained via jumping from one cache location of an audio data frame or video data frame in the multimedia file to another according to the order of the decoding time stamps ranked from the smallest to the largest.
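The cost of this DTS-ordered demultiplexing can be sketched as follows. This is an illustrative model only, not code from the source: the index is assumed to be a list of (DTS, byte offset, size) entries in caching order, and a "jump" is counted whenever the next frame in DTS order is not byte-adjacent to the previous one.

```python
# Hypothetical sketch of index-based demultiplexing in ascending DTS order.
# index: list of (dts, byte_offset, size) tuples in caching order.

def demux_by_dts(index):
    """Return the DTS-ordered frame list and the number of pointer jumps."""
    ordered = sorted(index, key=lambda e: e[0])  # rank by ascending DTS
    jumps = 0
    expected_pos = None  # where the read pointer would be after the last read
    for dts, offset, size in ordered:
        if expected_pos != offset:  # frame is not byte-adjacent: must jump
            jumps += 1
        expected_pos = offset + size
    return ordered, jumps

# A non-interleaved file (all video frames cached before all audio frames)
# forces a jump for nearly every frame when reading in DTS order.
index = [(0, 0, 10), (40, 10, 10), (80, 20, 10),   # video channel
         (0, 30, 4), (40, 34, 4), (80, 38, 4)]     # audio channel
queue, jumps = demux_by_dts(index)
```

In this small example every one of the six frames triggers a jump, which illustrates the inefficiency discussed next.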
During the process of demultiplexing the multimedia file, excessive index locating and jumping operations, especially moving the read pointer of the multimedia file backward and forward frequently for index locating, lower the demultiplexing speed and reduce the efficiency. In particular, when demultiplexing a remote multimedia file, these operations impose network load on the client where the multi-thread media player is located, and may even affect subsequent decoding and normal playing of the original stream audio and video frame queue.
In view of the above, exemplary embodiments provide a method for demultiplexing audio & video data in a multimedia file, which can increase the efficiency for demultiplexing the multimedia file. The exemplary embodiments also provide a device for demultiplexing audio & video data in a multimedia file, which can increase the efficiency for demultiplexing the multimedia file. The technical schemes of the exemplary embodiments may be implemented as follows.
A method for demultiplexing audio & video data in a multimedia file includes: setting and updating a maximum synchronization time point according to a preset maximum synchronization time; selecting an output data frame according to a comparison result between a decoding time stamp of a current data frame for each data frame channel in the multimedia file and the maximum synchronization time point, in combination with a byte offset location value of the current data frame for each data frame channel; and fetching the output data frame by searching a position in the multimedia file according to the byte offset location value of the selected output data frame to obtain an original stream audio and video frame queue.
The maximum synchronization time may be preset according to a principle that the maximum number of data frames cached on the original stream audio and video frame queue after demultiplexing is greater than the number of data frames contained within the preset maximum synchronization time.
The process of setting and updating a maximum synchronization time point includes: a1. setting the decoding time stamp of the first data frame in the multimedia file as a current synchronization time point, and taking the sum of the current synchronization time point and the maximum synchronization time as the maximum synchronization time point; b1. comparing the decoding time stamp of the current data frame in each data frame channel and the maximum synchronization time point in accordance with the caching sequence of data frames in the multimedia file; c1. if the decoding time stamps of the current data frames of all the data frame channels are greater than or equal to the maximum synchronization time point, updating the current synchronization time point with the decoding time stamp of a current data frame having the most forward byte position among the current data frames of all the data frame channels, taking the sum of the current synchronization time point and the maximum synchronization time as the maximum synchronization time point and proceeding to step b1; wherein the current data frame of each data frame channel is initially the first data frame within each data frame channel, and if the current data frame of a data frame channel is subsequently output as an output data frame, the next data frame is updated to be the new current data frame.
The process of selecting an output data frame includes: a2. according to the caching sequence of data frames in the multimedia file, comparing the decoding time stamp of the current data frame of each data frame channel and the maximum synchronization time point, and if it is less than the maximum synchronization time point, identifying the current data frame as a candidate data frame for the data frame channel; if it is greater than or equal to the maximum synchronization time point, the candidate data frame within the data frame channel is null; b2. determining whether the candidate data frames for all the data frame channels in the multimedia file are null, and if not, comparing byte offset location values of the candidate data frames of all the data frame channels in the multimedia file, and outputting the candidate data frame with the minimum value as an output data frame; and if yes, comparing byte offset location values of the current data frames for all the data frame channels in the multimedia file, and outputting the data frame with the minimum value as the output data frame; c2. updating the next data frame to be the current data frame of the data frame channel where the output data frame is located, and proceeding to steps a2-c2 for further processing to obtain the selected output data frame.
The process of fetching the output data frame to obtain an original stream audio and video frame queue includes: a3. obtaining a byte offset location value of the output data frame in the multimedia file, wherein the byte offset location value includes a byte offset location and the number of bytes contained; b3. determining whether the sum of the byte offset location of the last output data frame and the number of bytes contained in the last output data frame is equal to the byte offset location of the current output data frame, and if yes, the read pointer of the multimedia file is not moved; otherwise, searching a position within the multimedia file, and moving the read pointer of the multimedia file to the byte offset location of the current output data frame; c3. reading the current output data frame from a position pointed to by the read pointer of the multimedia file, wherein the size of data being read is the number of bytes of the current output data frame, and outputting the current output data frame; c4. recording the byte offset location and the number of bytes of the current output data frame and taking the current output data frame as the last output data frame, and then taking the next output data frame as the current output data frame and proceeding to steps a3-c4 for further processing to get the original stream audio and video frame queue.
A device for demultiplexing audio & video data in a multimedia file, including: a setting unit, a comparing unit and an output unit; wherein the setting unit is adapted for setting and updating a maximum synchronization time point according to a preset maximum synchronization time; the comparing unit is adapted for choosing an output data frame according to a comparison result between a decoding time stamp of a current data frame for each data frame channel in the multimedia file and the maximum synchronization time point obtained from the setting unit in combination with a byte offset location value of a current data frame for each data frame channel, and sending the output data frame to the output unit; and the output unit is adapted for searching a position within the multimedia file according to the byte offset location value of the output data frame received from the comparing unit, and fetching the output data frame to get an original stream audio and video frame queue.
It can be seen from the above schemes that an aspect of an exemplary embodiment makes use of a technical feature that a demultiplexed multimedia file caches multiple data frames in a first-in-first-out (FIFO) manner in the decoding process. When demultiplexing a multimedia file, the caching sequence of the data frames obtained and their decoding order may be different, and it is ensured that the time stamp of the last data frame among the data frames currently cached is less than or equal to the sum of the time stamp of the first data frame and the preset cache time for caching multiple data frames. In this way, an audio data frame or video data frame to be decoded can be found from the data frames currently cached in a follow-up decoding process, and the synchronization performance of decoding can be guaranteed. Based on this principle, an aspect of an exemplary embodiment sets a maximum synchronization time, wherein the number of data frames cached within the maximum synchronization time is less than or equal to the number of data frames cached in the demultiplexing process. The maximum synchronization time point is updated in real time according to the preset maximum synchronization time and the decoding time stamps of the current data frames of each data frame channel in the multimedia file. Then, a current output data frame is selected according to a comparison result between the decoding time stamps of the current data frames for each data frame channel in the multimedia file and the maximum synchronization time point, in combination with byte offset location values in the index. Finally, the original stream audio and video frame queue is obtained via jumping in the multimedia file according to the offset value of the current output data frame in the index. Thus, the number of index locating and jumping operations is decreased when converting a multimedia file into an original stream audio and video frame queue.
Therefore, the method and device provided may improve the efficiency for demultiplexing a multimedia file.
In order to make the purpose, technical schemes and advantages of the present invention more clear, the present invention is further described in detail hereinafter with reference to drawings and exemplary embodiments.
It can be seen from the existing technology that, for a multimedia file caching data frames in a non-interleaving form, or caching data frames in an interleaving form but not in decoding time stamp order, the reason jumping is required so many times during demultiplexing to get the original stream audio and video frame queue is that the queue is obtained by ranking the data frames of the multimedia file according to their decoding time stamps from the smallest to the largest. Obtaining the original stream audio and video frame queue in this manner may require many index locating and jumping operations compared with reading the multimedia file sequentially, which lowers the demultiplexing speed and reduces the efficiency.
To address the aforementioned problems, a demultiplexed multimedia file caches multiple data frames according to a first in first out (FIFO) method in the decoding process. When demultiplexing a multimedia file, the caching sequence of data frames obtained and their decoding order may be different, and it is ensured that the time stamp of the last data frame among the data frames currently cached is less than or equal to the sum of the time stamp of the first data frame and the preset cache time for caching multiple data frames. In this way, an audio data frame or video data frame to be decoded can be found from the data frames currently cached in a follow-up decoding process, and the synchronization performance of decoding can be guaranteed.
The data frames cached in the multimedia file include audio data frames and video data frames. One method of caching is to cache several audio data frames after several video data frames and then alternately cache video data frames and audio data frames. Another method of caching is to first cache several audio data frames and then cache several video data frames. For the purpose of description, multiple audio data frames cached together or multiple video data frames cached together are called a data frame channel, such as an audio data frame channel or a video data frame channel. There are multiple audio data frame channels and multiple video data frame channels in a multimedia file, which are called multiple data frame channels in general.
Step 301: Set and update a maximum synchronization time point according to a preset maximum synchronization time.
In this step, the preset maximum synchronization time is the longest synchronization time allowable to sequentially read data frames from a multimedia file. The maximum synchronization time can be set based on the configuration of a multi-thread media player, and a setting principle is that the maximum number of data frames cached in the original stream audio and video frame queue after demultiplexing is greater than the number of data frames contained in the preset maximum synchronization time.
The process of updating the maximum synchronization time point in real time includes the following steps.
Step 3011: Set the decoding time stamp of the first data frame in a multimedia file as a current synchronization time point, and take the sum of the current synchronization time point and the maximum synchronization time as the maximum synchronization time point.
Step 3012: Compare the decoding time stamp of a current data frame in each data frame channel and the maximum synchronization time point in accordance with the caching sequence of data frames in the multimedia file.
Step 3013: If the decoding time stamps of the current data frames of all the data frame channels are greater than or equal to the maximum synchronization time point, update the current synchronization time point with the decoding time stamp of a current data frame having the most forward byte position among all the current data frames of the data frame channels, take the sum of the current synchronization time point and the maximum synchronization time as the maximum synchronization time point and proceed to Step 3012.
Initially, the current data frame of each data frame channel is the first data frame within each data frame channel. If the current data frame of a data frame channel is output as an output data frame, the next data frame is updated to be the new current data frame.
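Steps 3011-3013 can be sketched as follows. This is an illustrative model, not code from the source: each channel is assumed to be a list of (DTS, byte offset) pairs, and `cur` holds the per-channel index of the current data frame.

```python
# Illustrative sketch of Steps 3011-3013; identifiers are assumptions.
# channels: list of channels, each a list of (dts, byte_offset) pairs.
# cur: list of current-frame indices, one per channel.

def update_sync_point(channels, cur, t_sync):
    """Return the new maximum synchronization time point when the DTS of
    every channel's current frame has reached or passed the old one."""
    # Step 3013: pick the current frame with the most forward (smallest)
    # byte position among the current data frames of all channels ...
    k = min(range(len(channels)), key=lambda i: channels[i][cur[i]][1])
    # ... use its DTS as the new current synchronization time point,
    t_cur = channels[k][cur[k]][0]
    # ... and take Tmax = Tcur + Tsync as the new maximum time point.
    return t_cur + t_sync

channels = [[(0, 0), (40, 10)],     # video channel: (DTS, byte offset)
            [(0, 30), (40, 34)]]    # audio channel
t_max = update_sync_point(channels, cur=[1, 1], t_sync=100)
```

With `cur=[1, 1]` the video frame at offset 10 is the most forward, so its DTS (40) plus the maximum synchronization time (100) yields the new maximum synchronization time point.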
Step 302: Choose an output data frame according to a comparison result between the decoding time stamp of a current data frame within each data frame channel in the multimedia file and the maximum synchronization time point in combination with the order of byte offset location values of current data frames for each data frame channel.
Step 303: Search a position within the multimedia file according to the byte offset location value of the output data frame chosen, and fetch the output data frame to get an original stream audio and video frame queue.
The process of choosing an output data frame in Step 302 includes the following steps.
Step 3021: According to the caching sequence of data frames in the multimedia file, compare the decoding time stamp of the current data frame of each data frame channel and the maximum synchronization time point. If it is less than the maximum synchronization time point, proceed to step 3022; if it is greater than or equal to the maximum synchronization time point, proceed to step 3023.
Step 3022: Identify the data frame as a candidate data frame for the data frame channel, and proceed to step 3024.
Step 3023: If there is no candidate data frame within the data frame channel, proceed to step 3024.
Step 3024: Determine whether the candidate data frames of all the data frame channels in the multimedia file are null; if not, proceed to step 3025; if yes, proceed to step 3026.
Step 3025: Compare byte offset location values of candidate data frames of all data frame channels in the multimedia file, output a candidate data frame with the minimum value as an output data frame, and proceed to step 3027.
Step 3026: Compare byte offset location values of the current data frames for all the data frame channels in the multimedia file, output the data frame with the minimum value as the output data frame, and proceed to step 3027.
Step 3027: Update the next data frame to be the current data frame of the data frame channel where the output data frame is located, that is, add 1 to the frame number of the current data frame of the data frame channel where the output data frame is located, and reiterate steps 3021 to 3027 for further processing.
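Steps 3021-3027 can be sketched under the same assumed data layout (channels as lists of (DTS, byte offset) pairs, `cur` as per-channel current-frame indices); the identifiers are illustrative, not from the source.

```python
# Sketch of Steps 3021-3026: choose which channel's current frame to output.
# channels: list of channels, each a list of (dts, byte_offset) pairs.
# cur: list of current-frame indices, one per channel.

def select_output_frame(channels, cur, t_max):
    """Return the index of the channel whose current frame is output next."""
    # Steps 3021-3023: a current frame with DTS below Tmax is a candidate;
    # otherwise the candidate for that channel is null.
    candidates = [i for i in range(len(channels))
                  if channels[i][cur[i]][0] < t_max]
    # Step 3024: if every candidate is null, fall back to all current frames.
    pool = candidates if candidates else list(range(len(channels)))
    # Steps 3025-3026: output the frame with the smallest byte offset value.
    return min(pool, key=lambda i: channels[i][cur[i]][1])

channels = [[(0, 0), (40, 10)],     # video channel: (DTS, byte offset)
            [(0, 30), (40, 34)]]    # audio channel
k = select_output_frame(channels, cur=[0, 0], t_max=100)
# Both current frames are candidates; the video frame at offset 0 wins.
```

Step 3027 then simply advances `cur[k]` by one before the next iteration.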
The process of fetching the output data frame in Step 303 includes the following steps.
Step 3031: Obtain a byte offset location value of the current output data frame in the multimedia file.
In this step, the multimedia file carries an index, and the index indicates the byte offset location of each data frame in the multimedia file and the number of bytes contained in each data frame; together, these are called a byte offset location value, which can be acquired from the index.
Step 3032: Determine whether the sum of the byte offset location of the last output data frame and the number of bytes contained in the last output data frame is equal to the byte offset location of the current output data frame; if so, the read pointer of the multimedia file is not moved; otherwise, search for a position within the multimedia file, and move the read pointer of the multimedia file to the byte offset location of the current output data frame.
Step 3033: Read the current output data frame from a position pointed to by the read pointer of the multimedia file, wherein the size of data being read is the number of bytes of the current output data frame, and then output the current output data frame.
Step 3034: Record the byte offset location and the number of bytes of the current output data frame and take the current output data frame as the last output data frame. Then, take the next output data frame as the current output data frame and reiterate steps 3031-3034 for further processing, in order to ultimately get the original stream audio and video frame queue.
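Steps 3031-3034 can be sketched as follows: the read pointer is moved only when the current output frame is not byte-adjacent to the last one. The file-like object and the (byte offset, size) tuples are illustrative assumptions.

```python
import io

# Sketch of Steps 3031-3034: read output frames, seeking only when needed.
# output_frames: (byte_offset, size) tuples in output order.

def fetch_frames(f, output_frames):
    """Read each frame from file-like object f, seeking only when the
    frame is not adjacent to the previous one; return (queue, seek count)."""
    queue, seeks = [], 0
    prev_end = None  # Lprev + Sprev of the last output frame
    for offset, size in output_frames:
        if prev_end != offset:        # Step 3032: Lprev + Sprev != Lcur
            f.seek(offset)            # move the read pointer to Lcur
            seeks += 1
        queue.append(f.read(size))    # Step 3033: read Scur bytes
        prev_end = offset + size      # Step 3034: record Lcur and Scur
    return queue, seeks

f = io.BytesIO(bytes(range(20)))
# Three frames: the first two are byte-adjacent, the third is not.
queue, seeks = fetch_frames(f, [(0, 4), (4, 4), (12, 4)])
```

Here only two seeks occur: the second frame is read without moving the pointer because it starts exactly where the first frame ended.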
The setting unit is adapted for setting and updating a maximum synchronization time point according to a preset maximum synchronization time.
The comparing unit is adapted for choosing an output data frame according to a comparison result between a decoding time stamp of a current data frame for each data frame channel in the multimedia file and the maximum synchronization time point obtained from the setting unit in combination with an order of byte offset location values of current data frames for each data frame channel, and sending the output data frame to the output unit. The output unit is adapted for searching a position within the multimedia file according to the byte offset location value of the output data frame received from the comparing unit, and fetching the output data frame to get an original stream audio and video frame queue.
A specific example is given hereinafter for illustrating a method according to an exemplary embodiment in more detail.
Step 501: A multi-thread media player reads a multimedia file, parses the index of the multimedia file and the header of each data frame channel in the multimedia file, and initializes all variables.
Step 502: The multi-thread media player sets a maximum synchronization time (Tsync).
Step 503: Set the first data frame of each data frame channel as the current data frame of the data frame channel. In this step, i is used for denoting the ith data frame channel among all the data frame channels, wherein i=1, 2, 3 . . . .
Step 504: Set the decoding time stamp of the current data frame of a data frame channel with the smallest byte offset location among all the data frame channels, i.e. DTS, as the current synchronization time point (Tcur).
Step 505: Calculate a current maximum synchronization time point (Tmax) according to the formula: Tmax=Tsync+Tcur.
Step 506: Compare the decoding time stamp of the current frame data of each data frame channel (DTSicur) with the current maximum synchronization time point Tmax, if DTSicur≦Tmax, then the current data frame is identified as a candidate data frame of the ith data frame channel; otherwise, the candidate data frame of the ith data frame channel is null.
Step 507: Compare byte offset location values of candidate data frames of all the data frame channels of which the candidate data frames are not null, and determine the data frame with the smallest byte offset location value as the current output data frame.
Step 508: If the candidate data frame of each data frame channel is null, compare the byte offset location of the current data frames of all the data frame channels, i.e. Licur (i=1, 2, 3 . . . ), and record the smallest as the byte offset location of the kth data frame channel (Lkcur); update the current synchronization time point Tcur with the decoding time stamp of the current data frame of the kth data frame channel, i.e. Tcur=DTSkcur, and re-calculate the maximum synchronization time point according to the sum of the current synchronization time point and the maximum synchronization time, i.e. Tmax=Tsync+Tcur; set the current data frame of the kth data frame channel as the output data frame.
Step 509: Update the next data frame to be the current data frame of the data frame channel (i.e. the kth data frame channel) where the output data frame is located, and add 1 to the frame number of the current data frame of the data frame channel.
Step 510: Compare the byte offset location (Lcur) of the current output data frame and the sum of the byte offset location (Lprev) of the last output data frame and the number of bytes it contains (Sprev), if Lprev+Sprev≠Lcur, search for a position in the multimedia file, that is, move the read pointer of the multimedia file to the byte offset location Lcur of the current output data frame; otherwise, the read pointer of the multimedia file is not moved, i.e., no search is needed between two adjacent data frames.
Step 511: Read the current output data frame from the position pointed to by the read pointer of the multimedia file, wherein the size of data being read equals the number of bytes contained in the current output data frame, i.e. Scur, and output the data being read as the current output data frame.
Step 512: Record the byte offset location and the number of bytes of the current output data frame, and update Lprev=Lcur and Sprev=Scur.
Step 513: Demultiplex each of the remaining data frames by repeatedly performing steps 506 to 512 until the original stream audio and video frame queue is obtained.
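Steps 501-513 can be combined into one end-to-end sketch on an assumed in-memory index: each channel is a list of (DTS, byte offset, size) entries, and the output is the order in which frames would be fetched. All names are illustrative, not from the source.

```python
# End-to-end sketch of Steps 501-513 (illustrative, assumed data layout).
# channels: list of channels, each a list of (dts, byte_offset, size).

def adaptive_demux(channels, t_sync):
    cur = [0] * len(channels)                       # Step 503
    order = []
    # Step 504: the current frame with the smallest byte offset sets Tcur.
    t_cur = min((ch[0] for ch in channels), key=lambda e: e[1])[0]
    t_max = t_cur + t_sync                          # Step 505
    while any(cur[i] < len(channels[i]) for i in range(len(channels))):
        live = [i for i in range(len(channels)) if cur[i] < len(channels[i])]
        # Step 506: current frames with DTS <= Tmax are candidates.
        cands = [i for i in live if channels[i][cur[i]][0] <= t_max]
        if cands:                                   # Step 507
            k = min(cands, key=lambda i: channels[i][cur[i]][1])
        else:                                       # Step 508
            k = min(live, key=lambda i: channels[i][cur[i]][1])
            t_max = channels[k][cur[k]][0] + t_sync  # Tmax = Tsync + Tcur
        order.append(channels[k][cur[k]])
        cur[k] += 1                                 # Step 509
    return order

video = [(0, 0, 10), (40, 10, 10)]   # (DTS, byte offset, size)
audio = [(0, 30, 4), (40, 34, 4)]
order = adaptive_demux([video, audio], t_sync=100)
```

With a maximum synchronization time large enough to cover both channels, the frames are output in ascending byte-offset order, so Steps 510-512 would perform almost no seeks; shrinking `t_sync` makes the output order track the decoding time stamps more closely instead.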
The original stream audio and video frame queue obtained in this manner is shown in the second line of the accompanying drawing.
It should be noted that the adaptive interleaving characteristic of the exemplary embodiments is not to select from the multimedia file a fixed number of continuously read audio data frames or video data frames, but to make the determination adaptively according to the caching sequence of audio data frames or video data frames in the multimedia file and a comparison result between decoding time stamps and the maximum synchronization time point. Therefore, the method provided herein is applicable for demultiplexing a multimedia file in an interleaving form, in a non-interleaving form, or in an improperly interleaved form.
It can be seen from the exemplary embodiments that an effect of the exemplary embodiments on the multi-thread media player is to greatly reduce the number of times the multi-thread media player performs search locating and jumping when using the index to demultiplex a multimedia file, wherein the number of jumps is reduced by 65% or more and up to 100%. Therefore, the exemplary embodiments can improve the speed and efficiency of demultiplexing a multimedia file. In particular, when playing a non-interleaving multimedia file containing an index under a specific network environment (such as DLNA) or in a CD-ROM, the multimedia file cannot be played normally and smoothly with the existing technology due to the slow demultiplexing speed, while it can be played normally and smoothly by use of the exemplary embodiments. In addition, the exemplary embodiments also have high adaptability to a variety of multimedia file formats including an index, such as AVI, MP4, MOV, 3GP, ASF, and MKV.
While not restricted thereto, an exemplary embodiment can be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium is any data storage device that can store data that can be thereafter read by a computer system. Examples of the computer-readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, and optical data storage devices. The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. Also, an exemplary embodiment may be written as a computer program transmitted over a computer-readable transmission medium, such as a carrier wave, and received and implemented in general-use or special-purpose digital computers that execute the programs. Moreover, while not required in all aspects, one or more units of the device for demultiplexing audio and video data of the multimedia file can include a processor or microprocessor executing a computer program stored in a computer-readable medium.
While the inventive concept has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the inventive concept as defined by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
201110157744.0 | Jun 2011 | CN | national |
10-2012-0033995 | Apr 2012 | KR | national |