The present invention relates to a data reproduction device which demultiplexes data such as video and audio multiplexed in a bitstream, and decodes and reproduces such data.
In recent years, with the increase in capacity of storage media and communication networks and advances in data transmission technology, devices and services handling coded multimedia data such as video and audio have come into wide use. For example, in the broadcasting sector, broadcasting of digitally coded media data has replaced conventional analog broadcasting. Although current digital broadcasting is directed only to fixed receivers, broadcasting for mobile devices such as cellular phones is scheduled to commence. In the communication sector, for example, video distribution services for third generation cellular phones have started, and an environment for handling multimedia data has been created not only on fixed terminals but also on mobile terminals. Against this background, multimedia is expected to be used in increasingly varied ways; for example, content data received via broadcasting or the Internet may be recorded on a memory card such as a Secure Digital (SD) card or on an optical disk such as a DVD-RAM (digital versatile disk random access memory) and shared between devices.
Here, the Advanced Audio Coding (AAC) standard developed by the Moving Picture Experts Group (MPEG), which is widely used in digital broadcasting, video distribution services for third generation cellular phones, and the like, is taken as a typical example of an audio coding format.
Generally, in audio coding, the upper limit of the reproduced frequency band is lowered as the compression ratio increases, and sound quality degrades accordingly, because not enough bits can be allocated to coding the high frequency components. To recover the missing high frequency components, a technique called Spectral Band Replication (SBR), which generates high frequency components by artificially extending the bandwidth, has been developed. More specifically, supplementary information for estimating high frequency components from low frequency components is stored in the stream, and bandwidth extension processing using this information makes it possible to reproduce high quality sound even from data compressed at a higher ratio, that is, at a lower bitrate. Here, the AAC coded data included in one frame is called basic data, and frame data is made up of the basic data and SBR data. The SBR tool typically reconstructs double the bandwidth of the basic data; therefore, for example, output data with a sampling frequency of 32 kHz can be obtained from basic data with a sampling frequency of 16 kHz. A coding format in which the SBR function is added to conventional AAC is called AAC-plus. An AAC-plus frame that does not include SBR data is decoded as data in AAC format. Since AAC-plus is compatible with AAC, a decoding unit for AAC-plus can decode coded data in AAC format, and a decoding unit for AAC can decode only the basic data of AAC-plus data by skipping the SBR data. In the following description, AAC-plus refers collectively to both the MPEG-2 and MPEG-4 variants, while MPEG-2 AAC and MPEG-4 AAC denote the respective individual coding formats.
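As a rough illustration of how basic data, SBR data, and output sampling frequency relate, the following Python sketch models a frame with a hypothetical `AacPlusFrame` type (not a real bitstream layout) and assumes the typical SBR ratio of 2 described above. The function name `native_output_fs` is likewise illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AacPlusFrame:
    """Hypothetical model of one AAC-plus frame (not the real bitstream layout)."""
    basic_fs_hz: int    # sampling frequency of the AAC basic data
    has_sbr_data: bool  # True if the frame also carries SBR data


def native_output_fs(frame: AacPlusFrame, sbr_capable: bool = True) -> int:
    """Sampling frequency a decoder would naturally produce for one frame.

    Assumes the typical SBR ratio of 2. An AAC-only decoder (sbr_capable=False)
    skips the SBR data and outputs only the basic data at its own rate.
    """
    if frame.has_sbr_data and sbr_capable:
        return 2 * frame.basic_fs_hz
    return frame.basic_fs_hz


print(native_output_fs(AacPlusFrame(16_000, True)))                     # 32000
print(native_output_fs(AacPlusFrame(16_000, False)))                    # 16000
print(native_output_fs(AacPlusFrame(16_000, True), sbr_capable=False))  # 16000
```

The last call mirrors the compatibility note: an AAC-only decoder reproduces only the basic data, so the same frame yields 16 kHz output.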
As described above, since AAC-plus is particularly effective at lower bitrates, it is expected to spread to services for mobile devices, for example third generation mobile terminals and digital terrestrial broadcasting for mobile devices. Note that MPEG-2 AAC is used in digital terrestrial broadcasting for mobile devices.
When coded data in AAC or AAC-plus format is carried in TS packets, each frame of the coded data is first converted into an MPEG-2 audio data transport stream (ADTS) frame.
Next, recording of digital terrestrial broadcasts for mobile devices received on a mobile terminal is described. With the launch of digital broadcasting for mobile terminals, such broadcasts are expected to be recorded. The MP4 file format (hereinafter referred to as MP4) is expected to be used as the multiplexing format for this recording, from the standpoint of ensuring interoperability with third generation mobile terminals. MP4 is a file format standardized by ISO/IEC JTC1/SC29/WG 11, and is adopted in the Transparent end-to-end packet switched streaming service (TS26.234) specified as a wireless video distribution standard by the Third Generation Partnership Project (3GPP), an international standardization organization for third generation mobile communications systems. In the 3GPP standard, MPEG-4 AAC is used as AAC. Since MPEG-4 AAC is backward compatible with MPEG-2 AAC, a terminal compliant with MPEG-4 AAC can correctly decode and reproduce MPEG-2 AAC coded data. A terminal compliant only with MPEG-2 AAC can also correctly decode and reproduce MPEG-4 AAC coded data, provided the data is coded without using any function specific to MPEG-4 AAC.
A method for multiplexing AU (access unit) data in MP4 is described below. Here, an AU is equivalent to one picture in a video sequence or one frame in an audio sequence. In MP4, media data is handled in units of samples. One sample is equivalent to one AU, and sample numbers, incremented by one in decoding time order, are assigned to the samples. Furthermore, header information and media data for each sample are managed in units of objects called Boxes, each of which has the following fields:
size: total size of a Box including a size field
type: identifier of a Box, typically represented by four alphabetic characters. The field is 4 bytes long, and a Box in an MP4 file is located by judging whether 4 consecutive bytes of data match the identifier stored in the type field.
version: version number of a Box
flags: flag information set for each Box
data: header information and media data are stored therein
Note that since “version” and “flags” are not mandatory fields, some Boxes do not contain these fields. Identifiers of type fields are used in referring to Boxes in the following description. For example, the Box whose type is “moov” is called “moov”. The Box structure in the MP4 file is shown in
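The field layout above can be illustrated with a minimal Python sketch that walks Boxes by their size fields and compares each type field with a wanted identifier. It follows the general ISO base media file format conventions (a size of 1 means a 64-bit size follows the type field; a size of 0 means the Box runs to the end of the enclosing data), but the helper names `iter_boxes`, `find_box`, `version_and_flags`, and `make_box` are hypothetical, and the test file is synthetic.

```python
import struct
from typing import Iterator, NamedTuple, Optional, Tuple


class BoxHeader(NamedTuple):
    type: str        # four-character identifier from the type field
    offset: int      # byte offset of the Box (start of its size field)
    size: int        # total Box size, including the size field itself
    header_len: int  # 8 normally, 16 when a 64-bit size follows the type


def iter_boxes(buf: bytes, start: int = 0, end: Optional[int] = None) -> Iterator[BoxHeader]:
    """Walk consecutive Boxes in buf[start:end] using each Box's size field."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", buf, pos)
        header_len = 8
        if size == 1:    # 64-bit size stored right after the type field
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header_len = 16
        elif size == 0:  # Box extends to the end of the enclosing data
            size = end - pos
        if size < header_len or pos + size > end:
            raise ValueError(f"malformed Box at offset {pos}")
        yield BoxHeader(raw_type.decode("latin-1"), pos, size, header_len)
        pos += size


def find_box(buf: bytes, box_type: str, start: int = 0,
             end: Optional[int] = None) -> Optional[BoxHeader]:
    """Return the first Box whose type field matches box_type, e.g. "moov"."""
    for box in iter_boxes(buf, start, end):
        if box.type == box_type:
            return box
    return None


def version_and_flags(buf: bytes, box: BoxHeader) -> Tuple[int, int]:
    """Read the 1-byte version and 3-byte flags of a Box that has them."""
    pos = box.offset + box.header_len
    return buf[pos], int.from_bytes(buf[pos + 1:pos + 4], "big")


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a Box with a 32-bit size (test helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic file: "ftyp", then "moov" containing an "mvhd", then "mdat".
data = (make_box(b"ftyp", b"isom" + bytes(4) + b"isom")
        + make_box(b"moov", make_box(b"mvhd", bytes(4) + bytes(96)))
        + make_box(b"mdat", b"\x01\x02"))
print([b.type for b in iter_boxes(data)])  # ['ftyp', 'moov', 'mdat']
moov = find_box(data, "moov")
mvhd = find_box(data, "mvhd", moov.offset + moov.header_len, moov.offset + moov.size)
print(version_and_flags(data, mvhd))       # (0, 0)
```

Nested Boxes are searched by calling `find_box` again on the parent's data range, as done for "mvhd" inside "moov".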
In addition, since a conventional brand defined for each operational standard such as SD is used, it is not possible to judge from the brand stored in “ftyp” whether or not digital terrestrial broadcast data is recorded in the MP4 file.
The header separation unit 1001 separates the header from the MP4 file, outputs the header information Hdr, which includes at least information indicating the audio sampling frequency, to the input frequency obtainment unit 1002, and outputs the sample data SplDat separated from “mdat” to the decoding unit 1003. In AAC-plus, the sampling frequency indicated in the header is the frequency of the basic data. The input frequency obtainment unit 1002 analyzes the header information Hdr, obtains the input frequency FSin, which is the frequency of the basic data, and outputs it to the decoding unit 1003. The decoding unit 1003 decodes the sample data SplDat based on the input frequency FSin, and outputs to the output unit 1004 the decoded frame Fdata, which is the decoding result, and the output frequency FSo, which is the sampling frequency of the decoded frame Fdata. The output unit 1004 outputs the decoded frame Fdata in accordance with the output frequency FSo.
However, in the conventional data reproduction device 1000, since the output unit 1004 obtains the output frequency FSo of the decoded frame Fdata after decoding the sample data SplDat, it has the following problem.
In this case, since the sampling frequency of the decoded frame Fdata switches at reproduction times of 10 seconds and 20 seconds, the output unit 1004 needs to switch the output frequency FSo at those times. Switching the output frequency FSo takes a certain amount of time, so reproduction is interrupted at the switching position 1100.
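The cause of the interruption can be sketched as follows: a device that outputs each segment at its native rate must reconfigure its output whenever SBR validity changes. Because the corresponding figure is not reproduced here, which segments carry SBR is an assumption, as is the 2x SBR ratio; `conventional_switch_times` is a hypothetical helper.

```python
from typing import List, Optional, Sequence, Tuple


def conventional_switch_times(segments: Sequence[Tuple[float, bool]],
                              basic_fs_hz: int) -> List[float]:
    """Times at which a conventional device must reconfigure its output.

    segments: (start time in seconds, SBR valid?) in reproduction order.
    Each segment is output at its native rate: 2x the basic frequency when
    SBR is valid (typical ratio, assumed), otherwise the basic frequency.
    """
    switch_times: List[float] = []
    previous_fs: Optional[int] = None
    for start_s, sbr_valid in segments:
        fs = 2 * basic_fs_hz if sbr_valid else basic_fs_hz
        if previous_fs is not None and fs != previous_fs:
            switch_times.append(start_s)  # output unit must switch FSo here
        previous_fs = fs
    return switch_times


# Hypothetical stream: SBR valid for 0-10 s, invalid for 10-20 s, valid again from 20 s.
print(conventional_switch_times([(0.0, True), (10.0, False), (20.0, True)], 24_000))
# [10.0, 20.0]
```

Each returned time is a point where FSo changes and reproduction can break up.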
Therefore, the present invention has been conceived in view of the above-mentioned problem. An object of the present invention is to provide a data reproduction device that can achieve seamless reproduction of a stream at the positions in the stream at which the validity of the bandwidth extension function is switched.
In order to achieve the above object, the data reproduction device according to the present invention is a data reproduction device which reproduces a coded stream including pieces of frame data obtained by coding audio data, and bandwidth extension information used for extending a reproduction frequency band of part of the pieces of frame data, and this data reproduction device includes: an obtainment unit which obtains a basic sampling frequency of the pieces of frame data from the coded stream; a determination unit which determines, based on the basic sampling frequency, an output sampling frequency at which the pieces of frame data should be reproduced to be a sampling frequency to which the reproduction frequency band of the part of the pieces of frame data is extended using the bandwidth extension information; and a decoding unit which decodes the pieces of frame data at the basic sampling frequency, and in the case where the output sampling frequency is different from the basic sampling frequency, extends the reproduction frequency band of the part of the decoded pieces of frame data using the bandwidth extension information, and upsamples the basic sampling frequency of the other part of the decoded pieces of frame data to the output sampling frequency. With this configuration, the data reproduction device of the present invention can keep the output sampling frequency constant even if the validity of the bandwidth extension function is switched in a stream that is made up of plural pieces of frame data, and thus can realize seamless reproduction of the stream at the positions at which the validity of the bandwidth extension function is switched.
The above-mentioned determination unit may determine the output sampling frequency to be the sampling frequency to which the reproduction frequency band of the part of the decoded pieces of frame data is extended using the bandwidth extension information, in the case where the basic sampling frequency is a predetermined value or lower.
The above-mentioned determination unit may determine the output sampling frequency to be the sampling frequency to which the reproduction frequency band of the part of the decoded pieces of frame data is extended using the bandwidth extension information, only in the case where the basic sampling frequency is a specific value.
The obtainment unit may obtain, from the coded stream, identification information indicating whether the coded stream may include both frame data having the bandwidth extension information and frame data not having it, and the determination unit may determine the output sampling frequency based on the basic sampling frequency and the identification information. Accordingly, for example, when there is no possibility that the coded stream includes both frame data with the bandwidth extension information and frame data without it, the output sampling frequency can easily be determined.
Note that the present invention can be implemented not only as the above-described data reproduction device, but also as a data reproduction method including, as steps, the characteristic units of such a data reproduction device, or as a program causing a computer to execute these steps. Also, such a program can be distributed via a recording medium such as a CD-ROM, or a transmission medium such as the Internet.
The data reproduction device of the present invention can keep the output sampling frequency constant even if the validity of the bandwidth extension function is switched in a stream, and thus can realize seamless reproduction of the stream at the positions in the stream at which the validity of the bandwidth extension function is switched.
The embodiments of the present invention will hereinafter be described with reference to the attached drawings.
The difference between the present invention and the conventional data reproduction device 1000 is that the former decodes sample data SplDat so that the sampling frequency of decoded frame Fdata is kept constant even at the switching positions of the validity of the SBR function. The following description mainly focuses on the differences in the processes between the present invention and the conventional data reproduction device.
The input frequency obtainment unit 2001 analyzes the header information Hdr, obtains the input frequency (basic sampling frequency) FSin, which is the frequency of the basic data, and outputs it to the output frequency determination unit 2002. The output frequency determination unit 2002 performs predetermined processing based on the input frequency FSin, determines the output frequency (output sampling frequency) FSout, which is the sampling frequency of the decoded frame Fdata, and outputs it to the decoding unit 2003 and the output unit 2004. The decoding unit 2003 decodes the sample data SplDat and, if necessary, upsamples the decoding result so that the sampling frequency of the decoded frame Fdata matches FSout. If the SBR function is valid in the frame to be decoded, the decoding unit 2003 obtains the SBR data (bandwidth extension information) and performs bandwidth extension through SBR processing on the basic data decoded at the input frequency FSin, so that the sampling frequency matches the output frequency FSout. The output unit 2004 outputs the decoded frame Fdata at the output frequency FSout. Here, the output unit 2004 can obtain the output frequency FSout before the decoded frame Fdata is input.
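The per-frame behavior of the decoding unit 2003 can be sketched in Python as below. This is a minimal sketch under stated assumptions: `decode_basic` and `sbr_extend` are hypothetical callables standing in for a real AAC core decoder and SBR tool, the `Sample` type is illustrative, and the linear-interpolation upsampler is a crude placeholder for a proper resampling filter. The point is only that every frame leaves the decoder at FSout, whether or not SBR is valid in it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Pcm = List[float]


@dataclass
class Sample:
    """Hypothetical stand-in for one AAC-plus sample (SplDat)."""
    payload: bytes
    sbr_data: Optional[bytes]  # None when the SBR function is invalid


def upsample_linear(pcm: Sequence[float], factor: int) -> Pcm:
    """Crude integer-factor upsampling by linear interpolation.

    A real decoder would use a proper interpolation (anti-imaging) filter.
    """
    out: Pcm = []
    for i, x in enumerate(pcm):
        nxt = pcm[i + 1] if i + 1 < len(pcm) else x  # hold the last sample
        out.extend(x + (nxt - x) * k / factor for k in range(factor))
    return out


def decode_at_fixed_rate(sample: Sample, fs_in: int, fs_out: int,
                         decode_basic: Callable[[bytes], Pcm],
                         sbr_extend: Callable[[Pcm, bytes], Pcm]) -> Pcm:
    """Decode one sample so that the result is always at fs_out.

    decode_basic: hypothetical AAC core decoder, returns PCM at fs_in.
    sbr_extend:  hypothetical SBR tool, returns PCM at 2 * fs_in.
    """
    pcm = decode_basic(sample.payload)            # basic data at FSin
    if fs_out == fs_in:
        return pcm                                # no extension, no upsampling
    if sample.sbr_data is not None and fs_out == 2 * fs_in:
        return sbr_extend(pcm, sample.sbr_data)   # SBR valid: bandwidth extension
    if fs_out % fs_in != 0:
        raise ValueError("this sketch only handles integer upsampling factors")
    return upsample_linear(pcm, fs_out // fs_in)  # otherwise: upsample to FSout


# Usage with stub codecs (placeholders, not real decoders):
def stub_decode(payload: bytes) -> Pcm:
    return [float(b) for b in payload]


def stub_sbr(pcm: Pcm, sbr_data: bytes) -> Pcm:
    return [x for x in pcm for _ in range(2)]  # just doubles the sample count


frames = [Sample(b"\x00\x01", b"sbr"), Sample(b"\x00\x01", None)]
outputs = [decode_at_fixed_rate(f, 24_000, 48_000, stub_decode, stub_sbr) for f in frames]
print(outputs)  # [[0.0, 0.0, 1.0, 1.0], [0.0, 0.5, 1.0, 1.0]]
```

Both frames produce the same number of samples at 48 kHz, so the output unit never has to change its rate.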
Here, the processing for determining the output frequency FSout in Step 1003 may be performed only when the reproduction starts.
Furthermore, the processing in Step 1002 and Step 1004 may also be performed only when necessary. For example, in MP4, the input frequency FSin can change per sample entry, but FSin is constant within a track if the track includes only one sample entry. In that case, Step 1002 and Step 1004 need to be performed only when reproduction of the track starts. On the other hand, when an input frequency FSin is attached to each AAC-plus frame, for example when an AAC-plus stream stored in ADTS frames is carried in a TS, Step 1002 and Step 1004 may be performed per frame. In this case, the processing for separating the header and the payload of each ADTS frame corresponds to Step 1001. Also, when TS-packetized AAC or AAC-plus data is reproduced and the unit in which the input frequency FSin can switch is specified by separately obtained information, Step 1002 and Step 1004 may be performed per such unit.
Note that whether the SBR function is valid in a sample may be determined by the input frequency obtainment unit 2001 or the output frequency determination unit 2002 through analysis of the header information Hdr, or by the decoding unit 2003 through analysis of the sample data. When the header information Hdr is used, information in the sample entry of the track storing the AAC-plus coded data can be used. If whether SBR is valid in the AAC-plus coded data is indicated by a brand or the like of the MP4 file, such information may also be used.
In Step 1007, the decoding unit 2003 performs bandwidth extension through SBR processing on the basic data decoded at the input frequency FSin, so that the sampling frequency matches the output frequency FSout, and the process goes to Step 1009. In Step 1008, the decoding unit 2003 decodes the sample data at the input frequency FSin, and the process goes to Step 1009. Finally, in Step 1009, the output unit 2004 reproduces the result output by the decoding unit 2003 in Step 1006, Step 1007, or Step 1008.
Note that if the frequency of the basic data is fixed in accordance with the standard or in the actual operation, the processes in Step 1004 and Step 1008 may be omitted.
Next, an operation for determining the output frequency FSout in Step 1003 is described with reference to
Note that in Step 1101, the processing may instead be switched based on whether or not the input frequency equals a predetermined value. In addition, in Step 1103, the output frequency FSout may be set to a value other than double the input frequency FSin, or to a predetermined value. Furthermore, depending on the service, the predetermined value in Step 1101 may be a value other than 24 kHz.
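A minimal Python sketch of the Step 1003 decision, assuming the default test in Step 1101 is "FSin is the predetermined value (24 kHz) or lower", the alternative test is "FSin equals the predetermined value", Step 1103 sets FSout to double FSin, and the other branch (presumably Step 1102; its content is not described here) keeps FSout equal to FSin. The function name and parameters are hypothetical.

```python
def determine_output_fs(fs_in_hz: int, threshold_hz: int = 24_000,
                        exact_match: bool = False) -> int:
    """Sketch of Step 1003 (Steps 1101 to 1103).

    exact_match=False: extend when FSin is the predetermined value or lower.
    exact_match=True:  extend only when FSin equals the predetermined value.
    Assumed: the "no" branch keeps FSout equal to FSin.
    """
    if exact_match:
        extend = fs_in_hz == threshold_hz
    else:
        extend = fs_in_hz <= threshold_hz
    return 2 * fs_in_hz if extend else fs_in_hz


print(determine_output_fs(24_000))                    # 48000
print(determine_output_fs(16_000))                    # 32000
print(determine_output_fs(48_000))                    # 48000
print(determine_output_fs(16_000, exact_match=True))  # 16000
```

Because the result depends only on FSin, it can be computed once per track or per ADTS frame before any frame is decoded.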
An application of the operations of the above-described data reproduction device 2000 is described hereinafter.
MP4 is adopted in various operational standards, but in some operational standards, it is fixed whether SBR can be validated or not on an AAC-plus track stored in an MP4 file. More specifically, if SBR can be validated, the validity of the SBR function may be switched within a track, but if SBR is invalid, the SBR function is invalid in all the frames within the track.
Note that the processing for determining the output frequency FSout in Step 1003 may be switched based on an identifier indicating attribute information of an MP4 file such as a brand.
First, the information in “moov” indicating the coding format of the audio track indicates that the coding format is MPEG-4 AAC. Furthermore, when an MPEG-4 AAC track is stored in an MP4 file, it can be indicated whether or not a sample with a valid SBR function may exist in the track, and the relevant field indicates that such a sample may exist. More specifically, “sbrPresentFlag”, a flag indicating whether or not SBR data is included in MPEG-4 AAC coded data, is set to “1” or “−1” in the sample entry in “stsd”. If “sbrPresentFlag” is “1”, it is explicitly indicated that SBR data may be included; if it is “−1”, whether SBR data is included is not explicitly indicated outside the coded data. Therefore, in Step 1201, the process may go to Step 1101 if a “1seg” brand exists in “compatible-brand”, or only when the “1seg” brand exists and “sbrPresentFlag” is “1” or “−1”. Alternatively, the process may go to Step 1101 whenever “sbrPresentFlag” is “1” or “−1”. Note that the present invention can also be implemented on the assumption that SBR is always valid when “sbrPresentFlag” is “1”.
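The three Step 1201 variants can be sketched as follows. This is a hypothetical illustration: `enters_step_1101` and `output_fs_for_track` are invented names, the brand list and flag value are assumed to have already been read from “ftyp” and the sample entry, and returning FSin when Step 1101 is not entered is an assumption, since the branch taken in that case is not described here.

```python
from typing import Iterable


def enters_step_1101(compatible_brands: Iterable[str], sbr_present_flag: int,
                     mode: str = "brand_and_flag") -> bool:
    """Sketch of the Step 1201 variants.

    mode "brand":          go to Step 1101 if the "1seg" brand is present.
    mode "brand_and_flag": also require sbrPresentFlag to be 1 or -1.
    mode "flag":           decide from sbrPresentFlag alone.
    """
    has_1seg = "1seg" in set(compatible_brands)
    sbr_possible = sbr_present_flag in (1, -1)
    if mode == "brand":
        return has_1seg
    if mode == "brand_and_flag":
        return has_1seg and sbr_possible
    if mode == "flag":
        return sbr_possible
    raise ValueError(f"unknown mode: {mode}")


def output_fs_for_track(fs_in_hz: int, compatible_brands: Iterable[str],
                        sbr_present_flag: int, mode: str = "brand_and_flag",
                        threshold_hz: int = 24_000) -> int:
    """Step 1201 followed by the Step 1101 threshold test (24 kHz assumed)."""
    if not enters_step_1101(compatible_brands, sbr_present_flag, mode):
        return fs_in_hz  # assumed: SBR cannot appear, keep the basic frequency
    return 2 * fs_in_hz if fs_in_hz <= threshold_hz else fs_in_hz


print(output_fs_for_track(24_000, ["isom", "1seg"], 1))        # 48000
print(output_fs_for_track(24_000, ["isom"], 1))                # 24000
print(output_fs_for_track(24_000, ["isom", "1seg"], 0))        # 24000
print(output_fs_for_track(24_000, ["isom"], -1, mode="flag"))  # 48000
```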
In the following description, it is assumed that the number of channels of decoded data Fdata is kept constant. However, the processing for keeping the output of the decoding unit 2003 constant may be performed for only one of the sampling frequency and the number of channels.
In the input MP4 file, the maximum sampling frequency FSmax and the maximum number of channels CHmax among the samples in an audio track are indicated. It is assumed here that the sampling frequency and the number of channels stored in the sample entry of the audio track indicate FSmax and CHmax, respectively.
First, in Step 1301, the audio sample entry is analyzed to obtain the maximum sampling frequency FSmax and the maximum number of channels CHmax, and these values are input to the decoding unit 2003. In Step 1302, the decoding unit 2003 judges whether FSmax differs from the sampling frequency FSspl of the sample; if they differ, the process goes to Step 1303, and if they are identical, the process goes to Step 1306. Here, when the SBR function is valid in the sample, FSspl denotes the sampling frequency after bandwidth extension. In Step 1303, the decoding unit 2003 judges whether CHmax differs from the number of channels CHspl of the sample; if they differ, the process goes to Step 1304, and if they are identical, the process goes to Step 1305. In Step 1304, the decoding unit first decodes the sample data with sampling frequency FSspl and CHspl channels. It then upsamples the decoding result to FSmax, converts the number of channels to CHmax, and outputs the result. For example, when monaural sound is converted into stereo sound, the single channel is copied into both channels of the stereo data, so that the two channels contain identical data. In Step 1305, the decoding unit first decodes the sample data with sampling frequency FSspl and CHspl channels, then upsamples the decoding result to FSmax without converting the number of channels, and outputs the result.
Furthermore, in Step 1306, the decoding unit 2003 judges, as in Step 1303, whether CHmax differs from CHspl; if they differ, the process goes to Step 1307, and if they are identical, the process goes to Step 1308. In Step 1307, the sample data is first decoded with sampling frequency FSspl and CHspl channels; the decoding unit then converts the number of channels to CHmax without upsampling, and outputs the result. In Step 1308, the decoding unit decodes the sample data with sampling frequency FSspl and CHspl channels and outputs the result as is. In other words, the output frequency FSout equals the sampling frequency FSspl of the sample, and the output number of channels CHout equals the number of channels CHspl of the sample.
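Steps 1302 to 1308 can be condensed into one normalization function applied to each decoded sample. This is a minimal sketch, assuming integer upsampling ratios, a crude linear-interpolation upsampler in place of a real resampling filter, and channel conversion limited to the monaural-to-multichannel copy described above; `normalize_to_max` and `upsample_linear` are hypothetical names.

```python
from typing import List

Channels = List[List[float]]  # one list of PCM samples per channel


def upsample_linear(pcm: List[float], factor: int) -> List[float]:
    """Crude integer-factor linear-interpolation upsampler (placeholder)."""
    out: List[float] = []
    for i, x in enumerate(pcm):
        nxt = pcm[i + 1] if i + 1 < len(pcm) else x  # hold the last sample
        out.extend(x + (nxt - x) * k / factor for k in range(factor))
    return out


def normalize_to_max(decoded: Channels, fs_spl: int, fs_max: int, ch_max: int) -> Channels:
    """Steps 1302 to 1308: bring one decoded sample up to FSmax and CHmax."""
    out = decoded
    if fs_spl != fs_max:  # Steps 1304/1305: upsample to FSmax
        if fs_max % fs_spl != 0:
            raise ValueError("this sketch only handles integer upsampling factors")
        out = [upsample_linear(ch, fs_max // fs_spl) for ch in out]
    if len(out) != ch_max:  # Steps 1304/1307: convert to CHmax channels
        if len(out) != 1:
            raise ValueError("this sketch only up-mixes monaural input")
        out = [list(out[0]) for _ in range(ch_max)]  # identical data in every channel
    return out  # unchanged (Step 1308) when neither condition holds


mono_24k = [[0.0, 1.0]]
print(normalize_to_max(mono_24k, 24_000, 48_000, 2))
# [[0.0, 0.5, 1.0, 1.0], [0.0, 0.5, 1.0, 1.0]]
```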
Note that the maximum sampling frequency value FSmax and the maximum number of channels value CHmax may be stored in a place other than a sample entry, by providing a special Box, for example.
Note that although one-segment broadcast has been described above, the AAC or AAC-plus coded data to be received is not limited to one-segment broadcast, and it may be the data received via the Internet. Furthermore, the above-mentioned method can be applied to the case where packet data received via broadcasting or the Internet is reproduced and then recorded.
In addition, a recording medium is not limited to an SD card, and it may be other nonvolatile memory, a hard disk, and the like.
A method has been described for keeping the output sampling frequency or the output number of channels constant, and thus preventing degradation of reproduction quality such as interrupted reproduction, noise, and the like, which may occur when these parameters are switched. Other methods for preventing degradation of reproduction quality are described hereinafter.
A first method reduces acoustic discomfort by applying a special effect at the parameter-switching position. For example, if the sound volume is gradually decreased before the switching position and gradually increased after it, so that the volume is low at the switching position, interruption of reproduction and noise can be reduced. This method requires the switching position to be identified in advance. When a file is reproduced, the switching position can be identified in advance by analyzing the header information of the file. When the switching position cannot be identified from the header information, or when a file is reproduced while data is being received, a predetermined number of frames can be buffered before reproduction so as to judge whether a switching position exists among the buffered frames. Furthermore, even if the switching position cannot be identified in advance, when the decoding unit detects a parameter-switching position while decoding a frame, the sound volume of that frame may be decreased and the sound volume of the subsequent frames gradually increased.
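The look-ahead and fade described above can be sketched as two small helpers. This is a minimal sketch, assuming per-frame parameters are available as comparable tuples (for example, sampling frequency and number of channels) and that a linear gain ramp is acceptable; the document only says the volume changes "gradually", so the ramp shape and the helper names are assumptions.

```python
from typing import Hashable, List, Sequence


def find_switch_positions(frame_params: Sequence[Hashable]) -> List[int]:
    """Indices of buffered frames whose parameters differ from the previous frame."""
    return [i for i in range(1, len(frame_params)) if frame_params[i] != frame_params[i - 1]]


def fade_gains(num_samples: int, switch_index: int, fade_len: int) -> List[float]:
    """Linear gain that ramps down to 0 at switch_index and back up to 1."""
    return [min(1.0, abs(i - switch_index) / fade_len) for i in range(num_samples)]


def apply_gains(pcm: Sequence[float], gains: Sequence[float]) -> List[float]:
    """Scale each PCM sample by its gain."""
    return [x * g for x, g in zip(pcm, gains)]


# Hypothetical buffered frame parameters: (sampling frequency, channels).
params = [(48_000, 1), (48_000, 1), (48_000, 2), (48_000, 2)]
print(find_switch_positions(params))  # [2]
print(fade_gains(9, 4, 4))
# [1.0, 0.75, 0.5, 0.25, 0.0, 0.25, 0.5, 0.75, 1.0]
```

In the no-look-ahead case, the same gain curve can be applied starting at the detected frame, so that only the fade-in half is used.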
As a second method, when the sampling frequency switches only under a specific condition, for example only at positions where the number of channels also switches, the file may simply be reproduced using the switched parameters even at the switching positions of the sampling frequency and the like. For example, in broadcasting, sometimes only the commercials are 2-channel while the other parts are monaural. Because the content is discontinuous between a program and a commercial, degradation of reproduction quality caused by parameter switching at such positions can in some cases be regarded as acoustically unnoticeable.
Note that the present embodiment has been described taking as an example the case where an MP4 file including an AAC-plus track is input to the data reproduction device 2000, but the present invention is not limited to this case. For example, the present invention can also be applied to the case where a TS of MPEG-2 data of one-segment broadcasting is received and reproduced. In this case, the input frequency obtainment unit 2001 needs only to obtain the sampling frequency, the number of channels, and the like of the audio data stored in the payload from the header of the ADTS frame, as shown in
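Obtaining the sampling frequency and number of channels from an ADTS header, and separating the header from the payload (the Step 1001 counterpart for ADTS), can be sketched as follows. The bit layout follows the standard ADTS fixed and variable headers (12-bit syncword, 4-bit sampling_frequency_index, 3-bit channel_configuration, 13-bit frame length); the Python names are hypothetical, and treating the signaled rate as the basic-data frequency for AAC-plus follows the description above. The example header bytes were constructed by hand for this sketch.

```python
from typing import NamedTuple, Tuple

# sampling_frequency_index -> Hz (indices 13-15 are reserved/escape; not valid in ADTS)
ADTS_SAMPLING_FREQUENCIES = [96000, 88200, 64000, 48000, 44100, 32000,
                             24000, 22050, 16000, 12000, 11025, 8000, 7350]


class AdtsHeader(NamedTuple):
    mpeg_id: int                # 1 = MPEG-2, 0 = MPEG-4
    profile: int                # 2-bit profile field (1 = AAC LC)
    sampling_frequency: int     # Hz; for AAC-plus, the basic-data frequency
    channel_configuration: int  # e.g. 1 = monaural, 2 = stereo
    frame_length: int           # bytes, including the header
    header_length: int          # 7, or 9 when a 2-byte CRC follows


def parse_adts_header(data: bytes) -> AdtsHeader:
    if len(data) < 7:
        raise ValueError("ADTS header needs at least 7 bytes")
    b = data
    if ((b[0] << 4) | (b[1] >> 4)) != 0xFFF:
        raise ValueError("ADTS syncword not found")
    protection_absent = b[1] & 0x01
    sf_index = (b[2] >> 2) & 0x0F
    if sf_index >= len(ADTS_SAMPLING_FREQUENCIES):
        raise ValueError(f"reserved sampling_frequency_index {sf_index}")
    return AdtsHeader(
        mpeg_id=(b[1] >> 3) & 0x01,
        profile=(b[2] >> 6) & 0x03,
        sampling_frequency=ADTS_SAMPLING_FREQUENCIES[sf_index],
        channel_configuration=((b[2] & 0x01) << 2) | (b[3] >> 6),
        frame_length=((b[3] & 0x03) << 11) | (b[4] << 3) | (b[5] >> 5),
        header_length=7 if protection_absent else 9,
    )


def split_adts_frame(data: bytes) -> Tuple[AdtsHeader, bytes]:
    """Separate the ADTS header from the raw payload of one frame."""
    hdr = parse_adts_header(data)
    return hdr, data[hdr.header_length:hdr.frame_length]


# Example header: MPEG-2, AAC LC, 24 kHz basic rate, 2 channels, 100-byte frame.
frame = bytes([0xFF, 0xF9, 0x58, 0x80, 0x0C, 0x9F, 0xFC]) + bytes(93)
hdr, payload = split_adts_frame(frame)
print(hdr.sampling_frequency, hdr.channel_configuration, len(payload))  # 24000 2 93
```

Since every ADTS frame carries these fields, the FSin check can be repeated per frame, matching the per-frame variant of Step 1002 and Step 1004.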
Here, a system using the data reproduction device as shown in the above first embodiment is described.
It is also possible to receive a TS packet stream on the disk recorder ex104, convert it to an MP4 file, and record the file on an SD card, an optical disk such as a DVD, a hard disk, or the like. The recorded MP4 file may be downloaded, or distributed by pseudo-streaming, to a cellular phone or a personal computer (not shown in the diagram).
When a TS packet stream distributed from a content server ex102 via the Internet is received on the cellular phone ex105 or the disk recorder ex104, an MP4 file can also be used as in the case where the above-mentioned broadcast data is received.
The data reproduction device of the present invention can also be applied to the case where data transmitted not only in a TS but also by a protocol such as the Real-time Transport Protocol (RTP), used for streaming distribution on the Internet, is recorded in MP4 file format.
By recording a program for implementing the data reproduction method in the data reproduction device as shown in each of the above-mentioned embodiments, on a recording medium such as a flexible disk and the like, it becomes possible to perform the processing as shown in the above embodiments easily in an independent computer system.
In addition,
Note that the above description assumes that the recording medium is a flexible disk, but the same processing can also be performed using an optical disk. In addition, the recording medium is not limited to these disks; any other medium, such as an IC card or a ROM cassette, can be used in the same manner as long as a program can be recorded on it.
Furthermore, each functional block in the block diagram shown in
The name used here is LSI, but it may also be called IC, system LSI, super LSI, or ultra LSI depending on the degree of integration.
Moreover, integration is not limited to LSI; a dedicated circuit or a general purpose processor can also achieve it. A Field Programmable Gate Array (FPGA) that can be programmed after LSI manufacturing, or a reconfigurable processor that allows the connections or configuration of the LSI to be reconfigured, can be used for the same purpose.
In the future, with advancement in semiconductor technology, a brand-new technology may replace LSI. The integration of the functional blocks can be carried out by that technology. Application of biotechnology is one such possibility.
When reproducing a stream storing audio data on which attribute information, such as presence or absence of a bandwidth extension function, the sampling frequency, the number of channels, and the like, is switched in the middle of reproduction, the data reproduction device according to the present invention achieves seamless reproduction of such a stream even at the switching positions of the attribute information, and therefore is of great value particularly for devices such as a mobile terminal, a car navigation system, and the like which receive digital broadcasts.
Number | Date | Country | Kind |
---|---|---|---|
2005-049052 | Feb 2005 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2006/303473 | 2/24/2006 | WO | 00 | 10/18/2006 |