The present invention relates to an audio encoding apparatus, an audio decoding apparatus, and an audio encoded information transmitting apparatus, and particularly to a technique for efficiently encoding an audio signal into a small amount of information while responding to changes in reproduction speed during listening, and for decoding encoded information.
The objective of audio encoding is to compression-encode a digitized signal as efficiently as possible, transmit it, and reproduce an audio signal of the highest possible quality by decoding it with a decoder.
Various audio encoding methods have been proposed, depending on conditions such as the type of signal to be encoded, the bit rate, and the required sound quality. For example, MPEG-4 Audio, an ISO/IEC standard specification (see Non-patent Reference 1), discloses encoding methods such as Advanced Audio Coding (AAC), Code Excited Linear Prediction (CELP), and Harmonic Vector eXcitation Coding (HVXC). In particular, the AAC method is an excellent method that can encode a general audio signal containing music with high quality (on par with compact disc audio, for example), and is characterized by its use of a time-frequency transformation called the Modified Discrete Cosine Transform (MDCT). These encoding methods are widely used in communication, broadcasting, and accumulation-type audio devices.
On the other hand, when listening to or viewing broadcast or accumulated audio or audio/video composite information, there is increasing demand for a variable reproduction speed. With the increased capacity of information accumulation means and the diversification of information obtainment methods, the amount of information that an individual can view/listen to has increased dramatically. A high-speed reproduction function for viewing/listening to more information within a limited time is therefore important.
Methods for variable-speed reproduction of an audio signal include a first method which cancels and inserts pitch waveforms based on the pitch cycle of the temporal audio signal (see Patent Reference 1), and a second method which transforms the audio signal into parameters and then changes the update cycle of the parameters (see Patent Reference 2). However, for high-quality input signals, the former, pitch cycle-based temporal signal processing, is commonly used, because the second method is applicable only to low-quality speech and is not suitable for high-quality signals.
An example of the configuration of an audio decoding apparatus for realizing variable-speed reproduction of an audio signal encoded using an MDCT-based audio encoding method is shown in
As shown in
An input bitstream 9908 is separated into its respective code elements by the bitstream separation unit 9901. An MDCT code 9908, which is the code element required for decoding the MDCT coefficients, is inputted to the MDCT coefficient decoding unit 9902, which decodes an MDCT coefficient 9910. The inverse MDCT unit 9903 inverse-transforms the MDCT coefficient 9910 to generate a temporal audio signal 9911. The pitch analyzing unit 9904 analyzes the pitch cycle of the temporal audio signal 9911. Upon receiving a reproduction speed change instruction 9913, the reproduction speed control unit 9905 determines a start position 9914 for reproduction speed changing based on the analyzed pitch cycle 9912. The waveform modification unit 9906 modifies the waveform (waveform cancellation and insertion) based on the pitch cycle 9912, starting at the start position 9914, connects the modified waveforms 9915, and generates an output audio signal 9916.
Furthermore, as disclosed in Patent Reference 3, a configuration is also possible which uses pitch cycle information included in the input bitstream instead of the pitch cycle 9912 analyzed by the pitch analyzing unit 9904.
However, conventionally, variable-speed reproduction of an audio signal compressed using an audio encoding method adopts a configuration which performs pitch cycle-based waveform insertion and cancellation in the temporal region on the decoded audio signal.
For this reason, such a conventional configuration has problems that can be broadly divided into the following two.
In order to clarify these problems, the premise of the conventional technique shall be explained.
The system includes an encoder 9100 which performs compression encoding on an inputted audio signal (PCM), a recording medium 9200 for recording the compression-encoded audio signal, a decoder 9300 which decodes the compression-encoded audio signal, and a speed changer 9400 for variable-speed reproduction.
The decoder 9300 includes the bitstream separation unit 9901, the MDCT coefficient decoder 9902, and the inverse MDCT unit 9903 of the decoding apparatus 9000 shown in
For example, in variable-speed reproduction at double speed, the encoded signal is transmitted from the recording medium 9200 to the decoder 9300 either directly or via antennas 9500 and 9600, and this transmission must be twice as fast as in normal reproduction. Furthermore, the processing amount required of the decoder 9300 and the speed changer 9400 also doubles relative to normal reproduction.
Therefore, the conventional technique entails the following problems concerning (1) processing amount and (2) transmission information amount.
(1) Processing Amount
In order to perform the pitch waveform insertion and cancellation processing in the temporal region, the temporal signal waveform of the section to be processed is required. This means that, when the target audio signal is encoded, all the signals in that section need to be decoded.
For example, to implement double-speed reproduction, a temporal waveform twice as long as the actual reproduction time is decoded and then shortened to half its length.
Therefore, the processing amount required for decoding becomes double that of normal reproduction.
In addition, when pitch waveform extraction as well as waveform insertion and cancellation are added, the processing amount further increases.
(2) Transmission Information Amount
When the target audio signal is encoded, in order to obtain the temporal signal waveform for the target section, the bitstream corresponding to that section needs to be received.
For example, to implement double-speed reproduction, twice the amount of bitstream is required in order to decode a temporal waveform twice as long as the actual reproduction time.
Since reproduction proceeds in real time, the bitstream must therefore be received at twice the normal speed.
This means that a wider band is needed for the communication path and, in the case where the communication path has a fixed bit rate, this means that (except for partial variable-speed reproduction through buffering) variable-speed reproduction is not possible.
In view of this, the present invention solves the aforementioned technical problems, and has as an object to provide an audio encoding apparatus, an audio decoding apparatus, and an audio encoded information transmitting apparatus that reduce the amount of transmitted information and the processing amount of a decoding apparatus.
In order to achieve the aforementioned object, the audio encoding apparatus according to the present invention is an audio encoding apparatus including: a time-frequency transformation unit which transforms an input audio signal into a frequency parameter, for every predetermined time-frequency transformation frame length; and an encoding unit which encodes the frequency parameter, the audio encoding apparatus further including: a pitch cycle detection unit which detects a pitch cycle of the audio signal; a framing unit which frames the audio signal based on the detected pitch cycle; a first waveform modification unit which performs waveform modification on the audio signal framed based on the pitch cycle, in conformance with the time-frequency transformation frame length, and outputs the waveform-modified audio signal to the time-frequency transformation unit; and a multiplex unit which multiplexes the frequency parameter encoded by the encoding unit and the pitch cycle, and outputs the multiplexed result as a bitstream.
Accordingly, the information transmission amount to the decoding apparatus during variable speed reproduction can be reduced to the same level as during uniform-speed reproduction, and the processing amount in the decoding apparatus can be reduced to the same level as in the decoding during uniform-speed reproduction.
Furthermore, the audio decoding apparatus according to the present invention is an audio decoding apparatus including: a decoding unit which decodes a frequency parameter of an encoded frame included in an inputted bitstream; and an inverse time-frequency transformation unit which performs inverse time-frequency transformation, for every predetermined time-frequency transformation frame length, so as to inverse-transform the frequency parameter into an audio signal, wherein the bitstream includes pitch cycle information indicating a pitch cycle of the audio signal, the inverse time-frequency-transformed audio signal is an audio signal which has been framed in advance based on the pitch cycle, and which has been waveform-modified in conformance with the time-frequency transformation frame length, and the audio decoding apparatus includes: a bitstream separation unit which separates pitch cycle information included in the inputted bitstream; a second waveform modification unit which modifies the audio signal of the time-frequency transformation frame length into a waveform signal of the pitch cycle length, based on the pitch cycle information; and a waveform connecting unit which connects the audio signals modified to the pitch cycle length.
Accordingly, the information transmission amount received by the decoding apparatus can be reduced to the same level as that of the normal bit rate, and the processing amount in decoding can be reduced to the same level as that in normal decoding.
Specifically, the audio decoding apparatus according to the present invention may further include a first reproduction speed changing unit which changes the reproduction speed of an audio signal by skipping the process of decoding the frequency parameter.
Accordingly, since variable-speed reproduction becomes possible by bitstream manipulation, the processing amount required for decoding is reduced. Furthermore, since the amount of bitstream required for decoding decreases, the transmission band required during variable-speed reproduction is reduced.
Furthermore, the audio encoded information transmitting apparatus according to the present invention is an audio encoded information transmitting apparatus including: a transmitting apparatus for transmitting a bitstream of an encoded audio signal; and a receiving apparatus including a decoding unit and an inverse time-frequency transformation unit, the decoding unit receiving the bitstream of the encoded audio signal and decoding a frequency parameter of an encoded frame included in the inputted bitstream, and the inverse time-frequency transformation unit performing inverse time-frequency transformation, for every predetermined time-frequency transformation frame length, so as to inverse-transform the frequency parameter into an audio signal, wherein the transmitting apparatus includes: an information storage unit which holds the bitstream of the encoded audio signal; a switch unit which turns transmission of the bitstream on and off; and a fourth reproduction speed changing unit which controls the switch unit based on an instruction for reproduction speed changing and a frame identifier included in the bitstream, the bitstream includes pitch cycle information indicating a pitch cycle of the audio signal, the inverse time-frequency-transformed audio signal is an audio signal which has been framed in advance based on the pitch cycle, and which has been waveform-modified in conformance with the time-frequency transformation frame length, and the receiving apparatus includes: a bitstream separation unit which separates pitch cycle information included in an input bitstream; a second waveform modification unit which modifies an audio signal of a time-frequency transformation frame length into a waveform signal of a pitch cycle length, based on the pitch cycle information; and a waveform connecting unit which connects the audio signals modified to the pitch cycle length.
Accordingly, the information transmission amount received by the decoding apparatus can be reduced to the same level as that of the normal bit rate, and the processing amount in decoding in the decoding apparatus can be reduced to the same level as that in normal decoding.
Note that the present invention can be implemented not only as the audio encoding apparatus, audio decoding apparatus, and audio encoded information transmitting apparatus mentioned herein, but also as an audio encoding method, audio decoding method, and so on, which have, as steps, the characteristic units included in the audio encoding apparatus, audio decoding apparatus, and audio encoded information transmitting apparatus, and also as a program which causes a computer to execute such steps. In addition, it goes without saying that such a program can be delivered via a recording medium such as a CD-ROM and a transmission medium such as the Internet.
As is clear from the above-mentioned description, the audio encoding apparatus, audio decoding apparatus, and audio encoded information transmitting apparatus according to the present invention produce the effect of enabling the information transmission amount to be reduced to the same level as that of the normal bit rate, and the processing amount in decoding to be reduced to the same level as that in normal decoding.
Accordingly, the present invention increases compatibility with existing apparatuses. At present, with the increased capacity of information accumulation units and the diversification of information obtainment methods, the amount of information that an individual can view/listen to has increased dramatically and high-speed reproduction of audio is in demand; in this situation, the practical value of the present invention is extremely high.
10, 11, 12, 13 Encoding apparatus
20, 21, 22 Decoding apparatus
30 Audio encoded information transmitting apparatus
101 Framing unit
102 Pitch detection unit
103, 604, 1001, 1301 Waveform modification unit
104 MDCT unit
105 MDCT coefficient encoding unit
106 Bitstream multiplex unit
601, 1602 Bitstream separation unit
602 MDCT coefficient decoding unit
603 Inverse MDCT unit
605 Waveform connecting unit
901 Pitch adjustment unit
1302 Frame identifier generation unit
1601, 1801 Information storage unit
1603 Reproduction speed control unit
1604, 1803 Switch
1701 Buffering unit
1802 Reproduction speed control unit
1804 Transmitting apparatus
1805 Receiving apparatus
Hereinafter, the embodiments of the present invention shall be described with reference to the Drawings.
The encoding apparatus 10 is an apparatus which compression-encodes a digitized audio signal such as PCM while modifying it so that it supports variable-speed reproduction. As shown in
Note that the waveform modification unit 103 includes: a cutting unit 103a which cuts the framed audio signal in accordance with the pitch cycle of the audio signal; a copying unit 103b which generates a waveform signal of the time-frequency transformation frame length by duplicating part of the signal waveform of an adjacent encoded frame into the current encoded frame; and a window unit 103c which performs windowing so that no discontinuity points occur in the waveform signal of the time-frequency transformation frame length generated by the copying unit 103b.
An input audio signal 107 is inputted to the framing unit 101 and the pitch detection unit 102.
The pitch detection unit 102 analyzes the input audio signal 107 and outputs a pitch cycle 108.
Referring to the pitch cycle 108, the framing unit 101 divides the input audio signal 107 into encoded frame signals 109 that are of pitch cycle length.
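The detection and framing steps above can be sketched as follows. The document does not specify how the pitch detection unit 102 works, so this sketch assumes a simple autocorrelation search over a hypothetical lag range (`min_lag`, `max_lag`) and a constant pitch cycle; `detect_pitch` and `frame_by_pitch` are illustrative names, not part of the described apparatus.

```python
import math

def detect_pitch(x, min_lag, max_lag):
    """Assumed stand-in for pitch detection unit 102: return the lag in
    [min_lag, max_lag] with the largest mean autocorrelation."""
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        terms = len(x) - lag
        score = sum(x[t] * x[t + lag] for t in range(terms)) / terms
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def frame_by_pitch(x, pitch):
    """Framing unit 101 for a constant pitch cycle: split x into consecutive
    encoded frames of `pitch` samples; a trailing partial frame is dropped."""
    return [x[k:k + pitch] for k in range(0, len(x) - pitch + 1, pitch)]

# Toy periodic input: fundamental plus third harmonic, period of 40 samples.
period = [math.sin(2 * math.pi * t / 40) + 0.3 * math.sin(6 * math.pi * t / 40)
          for t in range(40)]
x = period * 20                       # 800 samples
L = detect_pitch(x, 20, 70)
frames = frame_by_pitch(x, L)
print(L, len(frames))                 # 40 20
```

The lag range is kept below twice the expected period so that multiples of the true pitch cycle are not picked; a practical detector would also handle unvoiced segments and track a varying pitch.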
The waveform modification unit 103 modifies the encoded frame signals 109 into a form that allows MDCT transformation. Note that details of the operation of the waveform modification unit 103 shall be described later.
A modified MDCT frame signal 110 is transformed into an MDCT coefficient 111 by the MDCT unit 104.
The MDCT coefficient encoding unit 105 encodes the MDCT coefficient 111 and outputs MDCT encoded information 112.
The bitstream multiplex unit 106 multiplexes the MDCT encoded information 112 and the pitch cycle 108 and configures an output bitstream 113.
Here, any commonly known encoding means, such as vector quantization or entropy encoding, can be used for the MDCT coefficient encoding unit 105; a detailed description is omitted since this is not the essence of the present invention.
The content of the MDCT encoded information 112 differs depending on the configuration of the MDCT coefficient encoding unit 105 that is used, and it may include, in addition to the code directly indicating the MDCT coefficient, supplementary information for encoding the MDCT coefficients efficiently. For example, when the MPEG AAC method is used for the MDCT coefficient encoding unit 105, scale factor information, joint stereo information, predicted coefficient information, and so on are included as supplementary information.
As shown in
Note that the waveform modification unit 604 includes a cutting unit 604a, a window unit 604b, and a connection unit 604c, which perform the inverse of the operation of the waveform modification unit 103.
The bitstream separation unit 601 separates an input bitstream 606 into an encoded MDCT coefficient 607 and a pitch cycle 610.
The MDCT coefficient decoding unit 602 decodes the encoded MDCT coefficient 607 to obtain an MDCT coefficient 608. Any commonly known decoding means can be used for the MDCT coefficient decoding unit 602; a detailed description is omitted since this is not the essence of the present invention. The content of the encoded MDCT coefficient 607 inputted to the MDCT coefficient decoding unit 602 differs depending on the configuration of the MDCT coefficient decoding unit 602 that is used, and it may include, in addition to the code directly indicating the MDCT coefficient, supplementary information for decoding the MDCT coefficients efficiently. For example, when the MPEG AAC method is used for the MDCT coefficient decoding unit 602, scale factor information, joint stereo information, predicted coefficient information, and so on are included as supplementary information.
The inverse MDCT unit 603 inverse-transforms the MDCT coefficient 608 to obtain a frame decoded signal 609.
The waveform modification unit 604 modifies the frame decoded signal 609 with reference to the pitch cycle 610, and outputs a modified frame decoded signal 611. Details of the operation of the waveform modification unit 604 shall be described later.
The waveform connecting unit 605 connects the modified frame decoded signal 611, and generates an output audio signal 612.
Next, the operation of the waveform modification unit 103 of the encoding apparatus 10 shall be described in detail. First, however, MDCT transformation (inverse MDCT transformation), which is a prerequisite for processing, and its characteristics shall be explained.
MDCT is based on the technique known as time-domain aliasing cancellation (TDAC): the temporal signals of adjacent encoded frames overlap, and the aliasing contained in each frame's temporal signal is cancelled when the overlapping portions are added.
In
When the coded frame length is N samples, the MDCT frame length is 2N samples. Adjacent MDCT frames have an overlap 203 of N samples, i.e., half the MDCT frame length, and this overlap portion becomes the decoded frame waveform signal. The section of the waveform signal 201 corresponding to the overlap (the last half of its MDCT frame) consists of an actual signal component 204 and an aliasing component 205. Likewise, the section of the waveform signal 202 corresponding to the overlap (the first half of its MDCT frame) consists of an actual signal component 206 and an aliasing component 207. Here, the actual signal components 204 and 206 are in phase with each other, whereas the aliasing components 205 and 207 are in opposite phase. After the actual signal component 204 and the aliasing component 205 are multiplied by a first window coefficient 208, and the actual signal component 206 and the aliasing component 207 by a second window coefficient 209, all the signals are added.
Here, assuming the first window coefficient is f(t) and the second window coefficient is g(t), the first window coefficient 208 and the second window coefficient 209 need to satisfy expression (1).
[Expression 1]
$$f^2(t) + g^2(t) = 1 \qquad (0 \le t < N) \tag{1}$$
As a result of the addition, the aliasing components 205 and 207, being in opposite phase, cancel each other out, and the sum of the actual signal components 204 and 206 becomes a decoded frame waveform signal 211.
As is clear from this description, in inverse MDCT transformation, for an input of the 2N samples of the nth MDCT frame waveform signal, the N samples corresponding to the last half of the input MDCT frame become the output.
Next, the principle of reproduction speed changing using the pitch cycle, and what it has in common with MDCT transformation, are described.
In
By multiplying the waveform signal 302 by a third window coefficient 304 and multiplying the waveform signal 303 by a fourth window coefficient 305, and adding up the respective products, an added frame waveform signal 306 is obtained.
Here, assuming that the third window coefficient is p(t) and the fourth window coefficient is q(t), the relationship of the third window coefficient 304 and the fourth window coefficient 305 is represented by expression (2).
[Expression 2]
$$p(t) + q(t) = 1 \qquad (0 \le t < N) \tag{2}$$
Unlike expression (1), the window coefficients are not squared. This is because in MDCT the window is applied twice in total, once during transformation and once during inverse transformation, whereas in the present example it is applied only once, during the speed changing process.
By taking the waveform 301 as the waveform signal 307 of the k−1th frame at the output side, and the added frame waveform signal 306 as the waveform signal 308 of the kth frame at the output side, the reproduction speed changing process is completed.
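The time-domain operation above can be sketched as follows, assuming pitch-period frames of equal length and a linear crossfade for p(t) and q(t); the document only requires expression (2), so the linear shape and the function names are assumptions. Three input frames become two output frames, i.e. 1.5x speed for this group.

```python
def linear_windows(n):
    """Third/fourth window coefficients p(t), q(t) over n samples with
    p(t) + q(t) = 1 (expression (2)); the linear shape is an assumption."""
    p = [1.0 - (t + 0.5) / n for t in range(n)]
    q = [1.0 - pt for pt in p]
    return p, q

def drop_one_period(prev, cur, nxt):
    """Three input pitch-period frames (301, 302, 303) -> two output frames.
    prev is output unchanged (307); cur is faded out while nxt is faded in,
    and their sum (306) becomes the next output frame (308)."""
    n = len(cur)
    p, q = linear_windows(n)
    return prev + [p[t] * cur[t] + q[t] * nxt[t] for t in range(n)]

period = [0.0, 1.0, 0.0, -1.0]            # one pitch period of a toy waveform
out = drop_one_period(period, period, period)
print(len(out))                            # 8
```

Because the window is applied only once here, the amplitude-complementary condition p + q = 1 keeps a stationary periodic waveform unchanged, whereas the power-complementary condition of expression (1) is needed when the window is applied twice.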
In this manner, both MDCT and pitch waveform-based reproduction speed changing make use of an overlap-add process using window coefficients.
This indicates that reproduction speed changing can be performed using MDCT windows.
In normal inverse MDCT transformation, overlap addition is performed on the last half of an n−1th MDCT frame 401 and the first half of an nth MDCT frame 402. Here, however, overlap addition is performed on the last half of the n−1th MDCT frame 401 and the first half of an n+1th MDCT frame 403. As in the normal MDCT example described earlier, an aliasing component 405 and an aliasing component 407 cancel out as a result of the addition, and the addition of an actual signal component 404 and an actual signal component 406 decodes a frame waveform signal 410. By taking the preceding decoded frame waveform signal as the waveform signal 411 of the k−1th frame at the output side, and the frame waveform signal 410 as the waveform signal 412 of the kth frame at the output side, the reproduction speed changing process is completed.
In this process, since the waveform signal 402 of the nth MDCT frame is not used, the transmission and decoding of the waveform signal 402 of the nth MDCT frame is not required, and the processing amount when reproduction speed changing is performed becomes the same as when reproduction speed changing is not performed. In other words, changing of reproduction speed is possible without increasing the processing amount.
Here, as described using
However, since the pitch cycle L varies depending on the state of the input audio signal, the encoded frame length N would need to be variable in synchronization with the pitch cycle L.
However, the encoded frame length N is normally fixed at a power of 2 (for example, 512 or 1024). This is because an MDCT of a power-of-2 number of samples can easily be computed by a fast algorithm based on the FFT. Although fast algorithms can also be implemented for frame lengths that are not powers of 2, the transformation algorithm would have to change with each frame length, so a frame length that varies in synchronization with the pitch cycle is not practical.
Therefore, the waveform signals of pitch cycle L samples need to be transformed into waveform signals of a predetermined length, preferably a power-of-2 number of samples N.
The waveform modification unit 103 has a function for transforming the waveform signals for pitch cycle L samples into waveform signals of encoded frame length N samples.
Waveform signals 501, 502, and 503 which correspond to the n−1th, nth, and n+1th pitch cycle frames, respectively, have lengths equal to the pitch cycle L.
In this example, L<=N is assumed.
A waveform signal divided into pitch cycle length L samples is rearranged in frames based on the encoded frame N sample length. In
At this time, when L<N, a section 508 in which a waveform signal does not exist arises. Therefore, for such portion, a waveform signal 509 for the same number of samples as the section 508 is copied from the beginning portion of the next frame.
At this time, since a discontinuity point arises at a frame boundary 510, the copied section 508 is multiplied by a reducing window 511 which becomes 0 at the frame boundary 510. At the same time, an increasing window 512 which starts from 0 at the frame boundary 510 is applied to the section 509.
When it is assumed that the reducing window 511 is r(t), the increasing window 512 is s(t), and the start position for either of the windows is t=0, the reducing window 511 and the increasing window 512 satisfy the relationship in expression (3).
[Expression 3]
$$r^2(t) + s^2(t) = 1 \qquad (0 \le t < N-L) \tag{3}$$
By performing the pitch cycle L sample waveform signal cutting, the abovementioned waveform signal duplication, and window multiplication in all the encoded frame boundaries, a modified waveform signal 513 is obtained.
The waveform signal 513 obtained in this manner becomes a temporal waveform whose pitch cycle is the coded frame length N, and therefore satisfies the previously described condition for implementing reproduction speed changing using MDCT windows, namely that the pitch cycle equals the encoded frame length.
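The encoder-side modification can be sketched as follows, under stated assumptions: a constant pitch cycle L with N/2 ≤ L ≤ N (so that the copied section fits inside one pitch frame), and a sine/cosine pair for the reducing window r(t) and increasing window s(t), which is one choice satisfying expression (3). The function names are hypothetical. Each coded frame holds one pitch frame, with its first N−L samples faded in and a faded-out copy of the next pitch frame's first N−L samples appended.

```python
import math

def crossfade_windows(m):
    """Reducing window r(t) and increasing window s(t) over m = N - L samples
    with r^2 + s^2 = 1 (expression (3)); the sine/cosine shape is assumed."""
    r = [math.cos(0.5 * math.pi * (t + 0.5) / m) for t in range(m)]
    s = [math.sin(0.5 * math.pi * (t + 0.5) / m) for t in range(m)]
    return r, s

def pitch_frames_to_coded_frames(x, L, N):
    """Sketch of waveform modification unit 103 for a constant pitch cycle L.
    Coded frame k = [s * first m samples of pitch frame k | rest of pitch frame k |
                     r * copy of the first m samples of pitch frame k+1]."""
    m = N - L
    if not 0 <= m <= L:
        raise ValueError("sketch assumes N/2 <= L <= N")
    r, s = crossfade_windows(m)
    frames = []
    for start in range(0, len(x) - L - m + 1, L):   # need m samples of the next frame
        p = x[start:start + L]
        nxt = x[start + L:start + L + m]
        head = [s[t] * p[t] for t in range(m)]       # increasing window (section 509)
        tail = [r[t] * nxt[t] for t in range(m)]     # copied section 508, reducing window
        frames.append(head + p[m:] + tail)
    return frames

L, N = 24, 32
period = [math.sin(2 * math.pi * t / L) + 0.2 * math.cos(4 * math.pi * t / L)
          for t in range(L)]
frames = pitch_frames_to_coded_frames(period * 6, L, N)
print(len(frames), {len(f) for f in frames}, all(f == frames[0] for f in frames))
# 5 {32} True
```

For a strictly periodic input every coded frame comes out identical, so the concatenated modified waveform repeats every N samples, which is the property the MDCT frame-skipping relies on.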
The modified waveform 513 is outputted as the modified MDCT frame signal 110 in
Next, the operation of the waveform modification unit 604 of the decoding apparatus 20 shall be described.
In
When the frame decoding signal 702 of the nth frame is inputted, its first N−L samples are multiplied by an increasing window 705, and the decoding signal 703 of the previous frame is multiplied by a reducing window 704.
When it is assumed that the reducing window 704 is r(t) and the increasing window 705 is s(t), the reducing window 704 and the increasing window 705 satisfy the relationship in expression (4).
[Expression 4]
$$r^2(t) + s^2(t) = 1 \qquad (0 \le t < N-L) \tag{4}$$
Furthermore, the reducing window 704 and the increasing window 705 are identical to the reducing window 511 and the increasing window 512, respectively, which are used in the encoding process. The two windowed signals are then added to generate the waveform signal of a section 706.
For the waveform signal of a section 707, the inputted frame decoding signal 702 of the nth frame is used as is.
The waveform signal of a section 708 is held since it is used in the decoding of the n+1th frame.
A signal 709 which connects the waveform signals of section 706 and section 707 becomes the modified frame decoding signal 611 which is the output of the waveform modification unit 604.
With this process, the frame decoding signal of N samples is modified into a decoding signal of L samples which are equal to the number of samples of the pitch cycle. The modified decoding signal of L samples becomes the same as the pitch waveform signal of L samples divided in the encoding process.
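The decoder-side modification and connection can be sketched as the inverse of the encoder sketch, again assuming a constant pitch cycle L with N/2 ≤ L ≤ N and the same sine/cosine window pair (the encoder is repeated so the block runs on its own; all names are hypothetical). Each N-sample frame yields section 706 (the held tail of the previous frame times r(t) plus the head of this frame times s(t)) followed by section 707, and its last N−L samples are held as section 708.

```python
import math

def crossfade_windows(m):
    """r(t), s(t) over m samples with r^2 + s^2 = 1 (expressions (3)/(4))."""
    r = [math.cos(0.5 * math.pi * (t + 0.5) / m) for t in range(m)]
    s = [math.sin(0.5 * math.pi * (t + 0.5) / m) for t in range(m)]
    return r, s

def pitch_frames_to_coded_frames(x, L, N):
    """Encoder-side modification (same as the encoder sketch)."""
    m = N - L
    r, s = crossfade_windows(m)
    return [[s[t] * x[b + t] for t in range(m)] + x[b + m:b + L]
            + [r[t] * x[b + L + t] for t in range(m)]
            for b in range(0, len(x) - L - m + 1, L)]

def coded_frames_to_pitch_frames(frames, L, N):
    """Sketch of waveform modification unit 604 plus connecting unit 605."""
    m = N - L
    r, s = crossfade_windows(m)
    held = [0.0] * m                                            # no previous frame yet
    out = []
    for f in frames:
        out.extend(r[t] * held[t] + s[t] * f[t] for t in range(m))  # section 706
        out.extend(f[m:L])                                          # section 707
        held = f[L:]                                                # section 708
    return out

L, N = 24, 32
x = [math.sin(0.21 * t) + 0.3 * math.cos(0.9 * t) for t in range(6 * L)]
y = coded_frames_to_pitch_frames(pitch_frames_to_coded_frames(x, L, N), L, N)
err = max(abs(a - b) for a, b in zip(y[L:], x[L:len(y)]))
print(len(y), err < 1e-12)   # 120 True
```

Since the windows are applied once at the encoder and once at the decoder, the held and current contributions carry r² and s², which sum to 1 by expression (4). The first L output samples are excluded from the comparison because the first frame has no predecessor to overlap with.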
In the aforementioned configuration, the processing in the decoding apparatus is exactly the same during uniform-speed reproduction and variable-speed reproduction.
Furthermore, the information transmission amount from the encoding apparatus 10 to the decoding apparatus 20 can be reduced to the same level as during uniform-speed reproduction, and the processing amount in the decoding apparatus 20 can be reduced to the same level as in the decoding during uniform-speed reproduction.
Note that in the case of variable-speed reproduction, for example when carrying out double-speed reproduction, the decoding process which decodes a frequency parameter may be skipped, and the audio signal reproduction speed may be changed.
Accordingly, since variable-speed reproduction becomes possible by bitstream manipulation, the processing amount required for decoding is reduced. Furthermore, since the amount of bitstream required for decoding decreases, the transmission band required during variable-speed reproduction is reduced.
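The rate arithmetic behind this comparison can be sketched as follows. This is an idealized model that assumes enough skippable frames exist to reach the requested speed; the 128 kbit/s bit rate and 43 frames/s (roughly 1024-sample frames at 44.1 kHz) are hypothetical example values, and `required_rates` is an illustrative name.

```python
def required_rates(speed, bitrate_bps, frames_per_sec, skip_frames):
    """Channel bit rate and decoder frame rate, per second of wall-clock playback,
    needed to reproduce a stream `speed` times faster (idealized model)."""
    if skip_frames:
        # Frame skipping: only 1/speed of the coded frames are sent and decoded.
        sent_fraction = 1.0 / speed
    else:
        # Conventional chain: every frame is sent and decoded, and the decoded
        # time-domain waveform is shortened afterwards.
        sent_fraction = 1.0
    # Playback takes 1/speed of the content duration, so per-second rates scale by speed.
    return (bitrate_bps * sent_fraction * speed, frames_per_sec * sent_fraction * speed)

print(required_rates(2, 128_000, 43, skip_frames=False))  # (256000.0, 86.0)
print(required_rates(2, 128_000, 43, skip_frames=True))   # (128000.0, 43.0)
```

With skipping, the factor 1/speed of transmitted frames exactly offsets the shorter playback time, which is why both the transmission rate and the decoding load stay at their uniform-speed values.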
Meanwhile, although the pitch cycle L is assumed to be a constant fixed value in the description thus far, in actuality, the pitch cycle is different depending on the state of the input audio signal.
Therefore, the condition for correctly performing encoding and decoding with respect to a variable pitch cycle L shall be described next.
In
In the case where reproduction speed changing is not performed, sections 802 and 803, as well as sections 804 and 805 are added up. In contrast, in the case where reproduction speed changing is performed and the nth MDCT frame is skipped, section 802 and section 805 are added up.
In the decoding process, since the pitch cycles of the two sections that are added up must be the same, it is necessary for the pitch cycles that are set for section 802 and section 805 to be the same. This indicates that, at the same time, the pitch cycles that are set for section 803 and section 804 in the nth frame must be identical.
Conversely, if the pitch cycles of section 803 and section 804 differ, the pitch cycles of section 802 and section 805 necessarily differ as well, and the two cannot be added. Therefore, identical pitch cycles are set for section 803 and section 804, and information indicating the identical pitch cycles is multiplexed into the respective bitstreams corresponding to the nth coded frame and the n+1th coded frame.
Note that for an MDCT frame for which frame skipping is not permitted, the pitch cycles of the first-half section and the last-half section may differ. For example, the pitch cycles of section 801 and section 802 (=section 803) may differ, in which case information indicating the respective different pitch cycles is multiplexed into the respective bitstreams corresponding to the n−1th coded frame and the nth coded frame.
In order to implement arbitrary reproduction speed changing by MDCT frame skipping, skippable MDCT frames must exist at a frequency determined by the required conditions. As previously described, to generate a skippable MDCT frame, it suffices to set equal pitch cycles in the first-half section and the last-half section. However, in many instances the pitch cycles detected from an input audio signal differ from section to section.
To solve this problem, the pitch cycles detected from the input audio signal can be adjusted so that the first-half section and the last-half section of one MDCT frame are treated as having equal pitch cycles.
In contrast to the encoding apparatus 10 of the present invention shown in
The pitch adjustment unit 901 sets an identical pitch cycle for two adjacent coded frames, at a predetermined frequency, while referring to the inputted pitch cycle 108, and outputs this as the adjusted pitch cycle 902.
One method of adjusting the pitch cycle, among others, is to take the average of the pitch cycles of two adjacent coded frames and adopt this average as the common pitch cycle of both frames.
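The averaging method can be sketched as follows. This is an illustration only: the `period` parameter is a hypothetical stand-in for the "predetermined frequency" of the pitch adjustment unit 901, and rounding the average down to whole samples is an assumption the text does not state.

```python
def adjust_pitch_cycles(pitch_cycles, period):
    """Give coded frames m and m + 1 a common pitch cycle every `period` frames.

    The common value is the average of the two detected cycles, rounded down
    to whole samples (assumption). `period` must be >= 2 so that adjusted
    pairs never overlap and undo each other.
    """
    if period < 2:
        raise ValueError("period must be at least 2")
    adjusted = list(pitch_cycles)
    for m in range(0, len(adjusted) - 1, period):
        common = (adjusted[m] + adjusted[m + 1]) // 2
        adjusted[m] = adjusted[m + 1] = common
    return adjusted


print(adjust_pitch_cycles([400, 410, 420, 430, 440, 450], 3))
```

With `period = 3`, coded frames 0/1 and 3/4 become equal pairs, so MDCT frames 0 and 3 satisfy the skippability condition while the other detected pitch cycles are left untouched.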
The process after the adjusted pitch cycle 902 is inputted to the framing unit 101 is the same as in the process described using
Note that although the above description uses an example in which a pitch waveform signal of one cycle is arranged in each coded frame, a pitch waveform signal spanning two or more cycles can equally be treated as a single new pitch cycle.
In this configuration, one MDCT frame of 2N samples contains an even number of pitch waveform signals.
In the encoding and decoding apparatuses of the present invention, the relationship between the coded frame length N and the pitch cycle L is important.
For example, when L > N, the technique of the first embodiment cannot be applied. Furthermore, when L is extremely small relative to N, the overlapping sections become relatively large, which lowers encoding efficiency.
To solve these problems, the second embodiment shows a configuration that can be applied even when L > N or when an MDCT frame of 2N samples contains an odd number of pitch waveform signals.
In contrast to the configuration of the encoding apparatus 10 shown in
A pitch waveform signal 1101 of L samples is divided into two waveform signals 1102 and 1103 of L1 and L2 samples, respectively, such that L1 <= N and L2 <= N. The values of L1 and L2 are arbitrary and may be identical or different.
The waveform signal of a section 1105 is duplicated into a section 1104 of N−L1 samples. In the same manner, the waveform signal of a section 1107 is duplicated into a section 1106 of N−L2 samples. At this stage, the coded frame boundaries 1108 and 1109 are discontinuity points.
To eliminate these discontinuities, for example, the copied section 1104 is multiplied by a reducing window 1110 that becomes 0 at the frame boundary, and section 1105, the copy source, is likewise multiplied by an increasing window 1111 that is 0 at the frame boundary. The same processing is applied to sections 1106 and 1107, which precede and follow the discontinuity point 1109, respectively.
With the above modification process, the pitch waveform signal 1101 of L samples is transformed into a waveform signal 1112 corresponding to an MDCT frame of 2N samples. The waveform signal 1112 is output as the modified MDCT frame signal 110 and is encoded after MDCT transformation. Furthermore, L1 and L2 are each output, as the second pitch cycle 1002, as the pitch cycle of their respective coded frames. The encoded MDCT coefficients and the second pitch cycle information are multiplexed by the bitstream multiplex unit 106.
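The duplication-and-window step can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the apparatus itself: the function names are hypothetical, the divided pieces are treated cyclically (a steady, periodic pitch waveform), and the reducing/increasing windows are taken to be complementary linear ramps, a shape the text does not specify.

```python
import numpy as np


def complementary_windows(m):
    """Reducing and increasing windows of m samples that sum to 1 sample-wise.

    Assumption: linear ramps; the text only requires the windows to reach 0
    at the coded frame boundary.
    """
    increasing = (np.arange(m) + 1) / (m + 1)
    return 1.0 - increasing, increasing


def modify_waveform(pieces, n):
    """Arrange divided pitch-waveform pieces into coded frames of n samples.

    pieces[k] (at most n samples) is placed at the start of coded frame k,
    and pieces[k + 1] is assumed to continue pieces[k] in the original
    signal. The list is treated cyclically (steady periodic pitch).
    """
    count = len(pieces)
    for k in range(count):
        gap = n - len(pieces[k])
        if gap < 0 or gap > len(pieces[(k + 1) % count]):
            raise ValueError("each piece must fit in n and the next must fill its gap")
    frames = []
    for k in range(count):
        piece = np.asarray(pieces[k], dtype=float)
        following = np.asarray(pieces[(k + 1) % count], dtype=float)
        head = n - len(pieces[(k - 1) % count])  # samples copied into frame k-1
        tail = n - len(piece)                    # signal-less samples of frame k
        frame = np.zeros(n)
        frame[:len(piece)] = piece
        _, increasing = complementary_windows(head)
        frame[:head] *= increasing               # copy source ramps up from the boundary
        reducing, _ = complementary_windows(tail)
        frame[len(piece):] = following[:tail] * reducing  # copy ramps down to the boundary
        frames.append(frame)
    return np.concatenate(frames)


x = np.arange(1, 13, dtype=float)   # L = 12 samples, N = 8, so L > N
out = modify_waveform([x[:5], x[5:]], 8)   # L1 = 5, L2 = 7
```

In this example the first embodiment could not be applied because L > N. Adding each faded copy to the faded copy source it duplicates restores the original samples, because the two linear ramps sum to 1 sample by sample.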
A waveform signal 1112 encoded after modification in this manner can be decoded by the same process as in the decoding apparatus described in the first embodiment, as long as reproduction speed changing is not performed. In other words, the same decoding apparatus can be used with the encoding apparatuses of both the first and second embodiments. Furthermore, even when reproduction speed changing is performed, only the MDCT frame skipping method differs, so the same decoding apparatus can still be used.
In the first embodiment, the waveform signal within an MDCT frame is periodic with a period equal to the coded frame length of N samples. In contrast, in the second embodiment, the waveform signal within an MDCT frame is periodic with a period of 2N samples, twice the coded frame length. Consequently, when the waveform signal is viewed on a per-coded-frame basis, the same pattern appears in every other frame. In other words, in
Although this configuration cannot handle a pitch cycle with L > 2N, setting a sufficiently large value for N avoids problems in practice. For example, with N = 1024 samples, the shortest pitch cycle that cannot be handled is 2049 samples. For a signal sampled at 48 kHz this corresponds to about 23.4 Hz, and general music or speech signals rarely have such a long pitch cycle.
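The 23.4 Hz figure follows from dividing the sampling rate by the pitch cycle in samples. A small check, with hypothetical helper names:

```python
def longest_supported_pitch_cycle(n):
    """Longest pitch cycle, in samples, handled by the second embodiment (L <= 2N)."""
    return 2 * n


def pitch_frequency_hz(cycle_samples, sample_rate_hz):
    """Fundamental frequency corresponding to a pitch cycle given in samples."""
    return sample_rate_hz / cycle_samples


n = 1024
first_unsupported = longest_supported_pitch_cycle(n) + 1   # 2049 samples
print(f"{pitch_frequency_hz(first_unsupported, 48_000):.1f} Hz")  # 23.4 Hz
```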
As in the first embodiment, the second embodiment may also include the pitch adjustment unit 901 and perform framing and waveform modification using the adjusted pitch cycle.
With such a configuration, MDCT frames that permit skipping can be set at any predetermined frequency and, as a result, arbitrary reproduction speed changing can be implemented.
The encoding apparatuses of the first and second embodiments can also share a common configuration. That is, a third waveform modification unit having the functions of both the waveform modification unit 103 and the second waveform modification unit 1001 can be provided, which switches, according to the number of pitch waveform signals in the MDCT frame, to the function of the waveform modification unit 103 when that number is even and to the function of the second waveform modification unit 1001 when it is odd.
Here, the pitch cycle used by the waveform modification unit 103 and the second pitch cycle 1002 used by the second waveform modification unit 1001 both indicate lengths of 0 to N samples and can be handled as exactly the same kind of encoded information. Therefore, when the function of the waveform modification unit 103 is selected, the input pitch cycle 108 or the adjusted pitch cycle 902 may be output as is as the second pitch cycle 1002. With this configuration, an appropriate encoding process can be performed whatever the pitch cycle of the input audio signal, and encoding efficiency can be increased.
Note that although, in the descriptions of all the aforementioned waveform modification units, the divided pitch waveform signals are aligned with the start of each coded frame, the divided waveform signals may be arranged arbitrarily. In other words, a pitch waveform signal may be placed at any position within each coded frame, and the signal-less sections before or after it may be filled to the coded frame length by duplicating the sections that would normally be continuous with it from the pitch waveform signals arranged in the preceding or subsequent frames. Regardless of the arrangement, the reducing and increasing windows applied at a coded frame boundary are N−L samples long, where N is the coded frame length and L is the pitch cycle. Differences in the arrangement of the divided pitch waveform signals in the encoding apparatus appear only as differences in the phase of the encoded audio signal and have no influence on the configuration or processing of the decoding apparatus.
As shown in
The frame skip information 1304 and the frame identifier 1305, which are added in the present configuration, as well as the operation of the third waveform modification unit 1301 and the frame identifier generation unit 1302, are described below.
The third waveform modification unit 1301 detects, from the input pitch information, the number of pitch waveform signals included in one MDCT frame, and detects the coded frames that can be skipped based on whether the pitch cycles of two or more adjacent frames are identical.
As previously described, when one MDCT frame contains an even number of pitch waveform signals, a single coded frame can be skipped independently. When one MDCT frame contains an odd number of pitch waveform signals, two successive coded frames can be skipped as a set.
Therefore, the frame skip information includes the following two items:
(A) Whether or not the current encoded frame is a frame that can be skipped; and
(B) Whether the number of pitch waveform signals included in the MDCT frame is an even number or an odd number.
The frame identifier generation unit 1302 generates, based on the frame skip information 1304, the frame identifier 1305 that is attached to the current frame.
The frame identifier may take any form, as long as it distinguishes the following three cases:
(1) An unskippable encoded frame.
(2) Skippable, and the number of pitch waveform signals included in the MDCT frame is an even number.
(3) Skippable, and the number of pitch waveform signals included in the MDCT frame is an odd number.
For example, the frame identifiers can be defined as “0” for condition (1), “1” for condition (2), and “2” for condition (3).
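The mapping from the frame skip information, items (A) and (B), to the example identifiers can be sketched as below. The function and parameter names are hypothetical.

```python
def frame_identifier(skippable, even_count):
    """Map frame skip information to the example identifiers 0, 1 and 2.

    skippable  -- item (A): whether the current coded frame can be skipped.
    even_count -- item (B): whether the MDCT frame holds an even number of
                  pitch waveform signals.
    """
    if not skippable:
        return 0                      # condition (1): unskippable
    return 1 if even_count else 2     # conditions (2) and (3)


print([frame_identifier(a, b) for a, b in [(False, True), (True, True), (True, False)]])
```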
A frame identifier field 1401 and an encoded information field 1402 are arranged in the bitstream of the nth encoded frame. The frame identifier 1305 is written in the frame identifier field 1401, and the MDCT encoded information 112 and the pitch cycle 1303 are written in the encoded information field 1402. Since the frame identifier “1” indicates that an encoded frame can be skipped independently, frame identifiers “0” and “1” can alternate, as shown in
Since the frame identifier “2” indicates that two successive encoded frames can be skipped, the identifier “2” is written in the frame identifier fields 1503 and 1504 of the two successive encoded frames.
Note that the identifier corresponding to condition (3) can be further subdivided. That is, for two successive encoded frames, the frame identifier “2” can be assigned to the preceding frame and the frame identifier “3” to the succeeding frame. Such frame identifiers have the advantage that, even when reproduction starts in the middle of a bitstream, whether skipping is possible can be judged immediately.
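With the subdivided identifiers, the frames to skip together with a given frame can be decided from that frame's identifier alone. A minimal sketch follows. The function name and the `first_available` parameter (the first frame actually available, for example where mid-stream reproduction began) are hypothetical.

```python
def skip_unit_at(identifiers, i, first_available=0):
    """Return the coded-frame indices that must be skipped together with frame i.

    Subdivided scheme: 0 = unskippable, 1 = skippable alone, 2 = first and
    3 = second frame of a pair skipped together. An empty tuple means frame i
    cannot be skipped, including a '3' whose partner precedes first_available.
    """
    ident = identifiers[i]
    if ident == 1:
        return (i,)
    if ident == 2:
        return (i, i + 1)
    if ident == 3 and i - 1 >= first_available:
        return (i - 1, i)
    return ()


ids = [0, 2, 3, 1]
print(skip_unit_at(ids, 2), skip_unit_at(ids, 2, first_available=2))
```

With only the identifier “2”, a reader entering a run such as 2 2 2 2 mid-stream could not tell which neighbour is the partner without locating the start of the run. The “2”/“3” split removes that ambiguity.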
Furthermore, the types of frame identifier used can also be limited. For example, if frame skipping is not allowed for frames satisfying condition (3), only the identifiers corresponding to conditions (1) and (2) are required, and the amount of information needed to describe the frame identifiers can be reduced.
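The saving can be quantified if the identifier field is assumed to use a fixed number of bits. The text does not state how the field is coded, so this is only an illustration with a hypothetical helper name.

```python
def identifier_bits(num_identifier_types):
    """Bits needed for a fixed-length field distinguishing the given number of types."""
    return max(1, (num_identifier_types - 1).bit_length())


# Four types (0-3) or three types (0-2) need 2 bits; two types (0-1) need 1 bit.
print(identifier_bits(4), identifier_bits(3), identifier_bits(2))
```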
Note that although in
A bitstream encoded by, for example, the encoding apparatus according to the third embodiment of the present invention is stored in an information storage unit 1601 of the decoding apparatus 21. An optical disc, a magnetic disk, or a semiconductor memory, for example, can be used as the information storage unit 1601. A bitstream 1605 read from the information storage unit 1601 is separated by a bitstream separation unit 1602 into the MDCT code 607, the pitch cycle 610, and a frame identifier 1607.
In accordance with an externally provided reproduction speed change instruction 1606, a reproduction speed control unit 1603 calculates the frame skipping frequency required to achieve the instructed reproduction speed. For example, the frame skipping frequency f required to obtain k-times reproduction speed is given by expression (5).
Thus, for double-speed reproduction, substituting k = 2.0 into the expression gives f = 0.5, so 50 percent of all frames are to be skipped.
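Expression (5) itself does not appear in this section. The sketch below assumes it has the form f = 1 − 1/k: keeping 1/k of equally long frames shortens playback by a factor of k. This form agrees with the worked example (k = 2.0 gives f = 0.5) but is a reconstruction, not a quotation. The function name is hypothetical.

```python
def frame_skipping_frequency(k):
    """Fraction of coded frames to skip for k-times reproduction speed.

    Assumes expression (5) is f = 1 - 1/k (reconstructed from the k = 2.0
    example; frames are assumed to have equal decoded lengths).
    """
    if k < 1.0:
        raise ValueError("frame skipping can only speed reproduction up (k >= 1)")
    return 1.0 - 1.0 / k


print(frame_skipping_frequency(2.0))  # 0.5
```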
The reproduction speed control unit 1603 refers to the frame identifier 1607 and, based on the calculated frame skipping frequency f, skips coded frames for which frame skipping is possible. Specifically, for each coded frame selected for skipping, the reproduction speed control unit 1603 controls a switch 1604 to block the transmission of the MDCT code 607 and the pitch cycle 610.
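The text does not say how skippable frames are chosen to meet f. One possibility is an error accumulator, sketched below for the 0/1/2 identifier scheme. Two consecutive “2” frames are treated as a pair that is skipped together. The scheduling rule and the function name are illustrative assumptions.

```python
def select_frames_to_skip(identifiers, f):
    """Choose coded frames to skip so that roughly a fraction f is skipped.

    identifiers use the 0/1/2 scheme; two consecutive '2' frames form a pair
    that is skipped as a unit. 'credit' accumulates f per frame and is spent
    whenever a skippable unit is dropped (illustrative scheduling rule).
    """
    skipped = []
    credit = 0.0
    i = 0
    while i < len(identifiers):
        paired = (identifiers[i] == 2 and i + 1 < len(identifiers)
                  and identifiers[i + 1] == 2)
        size = 2 if paired else 1
        credit += f * size
        skippable = identifiers[i] == 1 or paired
        if skippable and credit >= size:
            skipped.extend(range(i, i + size))
            credit -= size
        i += size
    return skipped


print(select_frames_to_skip([0, 1, 0, 1, 0, 1], 0.5))  # [1, 3, 5]
```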
The process from the MDCT coefficient decoding unit 602 to the waveform connecting unit 605 is the same process as that in the decoding apparatus of the present invention previously described using
Note that the reproduction speed control unit 1603 may also be given a function for adjusting the frame skipping frequency f with reference to the pitch cycle 610. In the decoding apparatus of the present invention, the temporal length of the frame decoding signal 611, produced per coded frame, depends on the pitch cycle 610 set for that coded frame. Since pitch cycles normally change smoothly, the change in pitch cycle between adjacent coded frames is small, and the relationship of expression (5) holds. However, in a section where the pitch cycle changes greatly, a mismatch arises between the frame skipping frequency f calculated from expression (5) and the actual frame skipping frequency. To correct this mismatch, the reproduction speed control unit 1603 may refer to the pitch cycle 610, calculate the correct temporal length of the decoded signal for each coded frame, and adjust the frame skipping frequency f based on the result.
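A duration-aware variant of the skip selection can be sketched by tracking time instead of frame counts. The sketch assumes that each coded frame decodes to a number of samples equal to its pitch cycle, and it uses a simple rule: a unit is skipped whenever keeping it would put the output ahead of the input duration divided by k. Both the rule and the function name are assumptions.

```python
def select_frames_by_duration(identifiers, durations, k):
    """Skip frames so that the decoded duration tracks input duration / k.

    durations[i] is the decoded length of coded frame i in samples, assumed
    here to equal that frame's pitch cycle. identifiers use the 0/1/2 scheme,
    with two consecutive '2' frames forming one unit.
    """
    skipped = []
    consumed = produced = 0
    i = 0
    while i < len(identifiers):
        paired = (identifiers[i] == 2 and i + 1 < len(identifiers)
                  and identifiers[i + 1] == 2)
        size = 2 if paired else 1
        length = sum(durations[i:i + size])
        consumed += length
        skippable = identifiers[i] == 1 or paired
        if skippable and produced + length > consumed / k:
            skipped.extend(range(i, i + size))   # keeping it would run ahead
        else:
            produced += length
        i += size
    return skipped


print(select_frames_by_duration([0, 1, 1, 1], [100, 100, 300, 100], 2.0))
```

Here the long 300-sample frame is skipped, which a purely count-based rule would not take into account.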
Note that, as shown in
As previously described, in the decoding apparatus of the present invention, the temporal length of the frame decoding signal 611, produced per coded frame, depends on the pitch cycle 610 set for that coded frame. Therefore, the number of samples output per frame as the output audio signal 612 also varies. Consequently, by temporarily accumulating the decoded output signal in the buffering unit 1701 and outputting it as blocks of a fixed number of samples at a predetermined constant interval, an output audio signal 1702 with a fixed frame length can be obtained. A fixed frame length has the advantage of making the output audio signal easy to handle.
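The buffering idea can be sketched as a small queue that accepts variable-length decoded frames and releases fixed-length blocks. The class and method names are hypothetical. Emitting the blocks at a constant interval is left to the caller.

```python
from collections import deque


class FixedLengthBuffer:
    """Collects variable-length decoded frames and emits fixed-length blocks."""

    def __init__(self, block_length):
        self.block_length = block_length
        self._samples = deque()

    def push(self, frame):
        """Append one decoded frame and return every complete output block."""
        self._samples.extend(frame)
        blocks = []
        while len(self._samples) >= self.block_length:
            blocks.append([self._samples.popleft() for _ in range(self.block_length)])
        return blocks

    def pending(self):
        """Number of buffered samples not yet emitted."""
        return len(self._samples)


buf = FixedLengthBuffer(4)
b1 = buf.push([0, 1, 2])            # 3 samples: no full block yet
b2 = buf.push([3, 4, 5, 6, 7])      # now 8 samples: two blocks of 4
b3 = buf.push([8, 9])               # 2 samples left pending
```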
In the present configuration, a transmitting apparatus 1804, which includes an information storage unit 1801, a reproduction speed control unit 1802, and a switch 1803, is connected via a transmission path 1807 to a receiving apparatus 1805, which includes the bitstream separation unit 601, the MDCT coefficient decoding unit 602, the inverse MDCT unit 603, the waveform modification unit 604, and the waveform connecting unit 605.
The configuration and operation of the receiving apparatus 1805 are the same as those of the decoding apparatus shown using
A bitstream encoded by the encoding apparatus according to the third embodiment of the present invention, for example, is stored in the information storage unit 1801.
A reproduction speed change instruction 1808 is sent to the transmitting apparatus 1804 via the transmission path 1807.
In accordance with the reproduction speed change instruction 1808, the reproduction speed control unit 1802 controls the switch 1803 while referring to frame identifier information, or frame identifier information and pitch cycle information, included in a bitstream 1806 read from the information storage unit 1801. Details of the operation of the reproduction speed control unit 1802 are the same as the operation of the reproduction speed control unit 1603 explained in the fourth embodiment of the present invention.
The switch 1803 turns the transmission of the bitstream 1806 ON/OFF on a per encoded frame basis. A bitstream passing the switch 1803 is inputted to the receiving apparatus 1805 via the transmission path 1807, as an input bitstream 1809.
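The per-frame ON/OFF behaviour of the switch 1803 amounts to filtering the sequence of coded-frame bitstream chunks. A minimal sketch follows. It assumes the skip indices have already been chosen by the reproduction speed control unit. The function name is hypothetical.

```python
def gate_bitstream(coded_frames, skip_indices):
    """Yield only the coded frames that the switch lets through.

    coded_frames is any sequence of per-frame bitstream chunks; skip_indices
    holds the frame indices selected for skipping.
    """
    skip = set(skip_indices)
    for index, frame in enumerate(coded_frames):
        if index not in skip:
            yield frame


print(list(gate_bitstream([b"f0", b"f1", b"f2", b"f3"], [1, 3])))
```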
In the present configuration, all the processes related to reproduction speed changing are completed in the transmitting apparatus 1804. The receiving apparatus therefore needs none of these processes, and changing the reproduction speed causes no increase in its processing load.
Furthermore, since the switch 1803 passes only the bitstream of the coded frames that correspond to the speed-changed output audio signal, the amount of bitstream information transmitted per unit time via the transmission path 1807 is almost equal to that when reproduction speed changing is not performed. In other words, reproduction speed changing can be performed without increasing the amount of information transmitted per unit time.
Note that, for the transmission path 1807, any transmission protocol may be used regardless of whether it is wired or wireless, as long as the reproduction speed change instruction 1808 and the bitstream 1809 can be transmitted.
(Variations)
Note that although the present invention is described based on the above-mentioned embodiments, it should be obvious that the present invention is not limited to such above-mentioned embodiments. The present invention also includes such cases as described below.
(1) Each of the above-described apparatuses is a computer system specifically made from a microprocessor, a ROM, a RAM, a hard disk unit, a display unit, a keyboard, and a mouse. A computer program is stored in the RAM or the hard disk unit. Each apparatus accomplishes its function through the operation of the microprocessor in accordance with the computer program. Here, the computer program is configured by combining plural command codes indicating instructions to the computer in order to accomplish predetermined functions.
(2) It is possible that a part or all of the constituent elements making up each of the above-mentioned apparatuses is made from one system LSI (Large Scale Integration circuit). The system LSI is a super multi-function LSI that is manufactured by integrating plural components in one chip, and is specifically a computer system which is configured by including a microprocessor, a ROM, a RAM, and so on. A computer program is stored in the RAM. The system LSI accomplishes its functions through the operation of the microprocessor in accordance with the computer program.
(3) It is possible that a part or all of the constituent elements making up each of the above-mentioned apparatuses is made from an IC card that can be attached to/detached from each apparatus, or a stand-alone module. The IC card or the module is a computer system made from a microprocessor, a ROM, a RAM, and so on. The IC card or the module may include the super multi-function LSI. The IC card or the module accomplishes its functions through the operation of the microprocessor in accordance with the computer program. The IC card or the module may also be tamper-resistant.
(4) The present invention may also be the methods described thus far. The present invention may also be a computer program for executing such methods through a computer, or a digital signal made from the computer program.
Furthermore, the present invention may be a computer-readable recording medium, such as a flexible disk, a hard disk, a CD-ROM, an MO, a DVD, a DVD-ROM, a DVD-RAM, a BD (Blu-ray Disc), or a semiconductor memory, on which the computer program or the digital signal is recorded. In addition, the present invention may also be the digital signal recorded on such recording mediums.
Furthermore, the present invention may also be the computer program or the digital signal transmitted via an electrical communication line, a wireless or wired communication line, a network represented by the Internet, a data broadcast, and so on.
Furthermore, it is also possible that the present invention is a computer system including a microprocessor and a memory, with the aforementioned computer program being stored in the memory and the microprocessor operating in accordance with the computer program.
Furthermore, the present invention may also be implemented in another independent computer system by recording the program or digital signal on the recording medium and transferring the recording medium, or by transferring the program or the digital signal via the network, and the like.
(5) It is also possible to combine the above-described embodiments and the aforementioned variations.
The present invention can be generally applied to apparatuses, for example devices such as a cellular phone and a music player, that retrieve a compression-encoded sound or audio signal from a storage medium or via a transmission path and decode it into the original sound or audio signal while changing the reproduction speed. The present invention is particularly suited to a sound/music player that uses an optical disc, magnetic disk, semiconductor memory, or the like as a storage medium, and to on-demand delivery of voice, music, video, and so on.
Number | Date | Country | Kind |
---|---|---|---|
2005-184086 | Jun 2005 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2006/312390 | 6/21/2006 | WO | 00 | 12/20/2007 |