The present invention relates to a method of encoding and decoding an audio signal at a low bit rate in order to transmit or store the same, and more particularly, to a code conversion method and device, a program, and a recording medium, all of which are used for converting a code obtained by encoding audio by one method into another code that is decodable by another method.
As a method of encoding an audio signal at a middle or a low bit rate in high efficiency, a method has been widely used which encodes the audio signal by separating it into a linear prediction (LP) filter and an excitation signal to drive the filter. As such a representative method, there is known Code Excited Linear Prediction (CELP) (e.g., see Nonpatent Document 1; M. R. Schroeder and B. S. Atal: “Code excited linear prediction: High quality speech at very low bit rates,” Proc. of IEEE Int. Conf. on Acoust., Speech and Signal Processing, pp. 937-940, 1985).
CELP is a method of obtaining a synthesized audio signal by driving an LP filter, provided with LP coefficients indicative of the frequency characteristics of the input audio, with an excitation signal represented by a sum of an adaptive codebook (ACB) component indicative of a pitch cycle of the input audio and a fixed codebook (FCB) component composed of random numbers or pulses. The ACB and FCB components are multiplied by gains (ACB and FCB gains), respectively.
For example, assuming an interconnection between a 3G mobile network and a cable packet network, the standard audio encoding methods used in the two networks differ from each other, which makes a direct connection between the 3G mobile network and the cable packet network difficult. As a solution to this problem, a tandem connection has been developed.
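The synthesis described above can be written compactly as follows. This is only an illustrative formulation: the symbols $v(n)$ (ACB contribution), $c(n)$ (FCB contribution), $g_a$ and $g_c$ (ACB and FCB gains), $a_i$ (LP coefficients), and $P$ (filter order) are notation assumed here, and the sign convention $A(z) = 1 + \sum_{i=1}^{P} a_i z^{-i}$ is likewise an assumption, since conventions vary between codecs.

```latex
% Excitation: gain-scaled sum of the ACB and FCB contributions
e(n) = g_a \, v(n) + g_c \, c(n)

% Synthesized audio: excitation driving the LP synthesis filter 1/A(z)
s(n) = e(n) - \sum_{i=1}^{P} a_i \, s(n-i)
```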
Now, referring to
An audio decoding device 1A shown in
The audio detection device 5 receives the first decoded signal output from the audio decoding device 1A, judges whether the first decoded signal specifies an audio section or a non-audio section, and outputs an audio detection result flag to the audio encoding device 2A on the basis of a result of the judgment. An audio detection method is described in detail in the 3GPP Specification or the like, and is therefore not described in detail here (Nonpatent Document 3: “AMR speech codec; Voice Activity Detector (VAD)” 3GPP TS 26.094 Chapter 3).
The audio encoding device 2A is operable in response to the first decoded signal output from the audio decoding device 1A and the audio detection result flag output from the audio detection device 5. From the audio detection result flag, judgment can be made as to whether the first decoded signal specifies an audio section or a non-audio section. Responsive to the audio detection result flag, the audio encoding device 2A outputs a code string obtained by encoding an audio signal or a non-audio signal by a second encoding method to produce a second code string through an output terminal 4. The description of
Details on header and frame type information input to the audio decoding device 1A have been known (Nonpatent Document 4: “AMR speech codec; frame structure” 3GPP TS 26.101 Chapter 4). Additionally, methods described below for encoding and decoding noise have been known (Nonpatent Document 5: “AMR speech codec; comfort noise aspects” 3GPP TS 26.092 Chapters 5 and 6).
As mentioned above, the aforementioned conventional code conversion device uses the audio detection device to judge whether the signal decoded from the first code string specifies the audio section or the non-audio section. Therefore, such inclusion of the audio detection device causes a problem to occur in that the code conversion device inevitably becomes large in size. In other words, the Nonpatent Documents 1 to 5 have no mention at all of a possibility of improvement of the code conversion device shown in
The present invention has been developed with the foregoing problems in mind, and its primary object is to provide a device and a method for converting codes, wherein a device size can be reduced, and a recording medium recording a program for the above-mentioned device and method. Other objects, features, advantages and the like of the present invention will become apparent to those skilled in the art, by referring to the following description.
In order to achieve the object, according to an aspect of the present invention, a code conversion method for converting a first code string compliant with a first method into a second code string compliant with a second method, includes a first step of generating a first decoded audio from the first code string in accordance with a first decoding method and a second step of judging whether the first decoded audio is an audio signal or a non-audio signal by using information contained in the first code string, and encoding the first decoded audio in accordance with a second encoding method on the basis of the judgment to generate a second code string.
In the code conversion method of the present invention, preferably, in the second step, whether the first decoded signal is the audio signal or the non-audio signal is judged by using one of frame type information contained in the first code string and a size of the code string.
According to another aspect of the present invention, a code conversion device for converting a first code string compliant with a first method into a second code string compliant with a second method includes an audio decoding circuit for generating a first decoded audio from the first code string in accordance with a first decoding method, and an audio encoding circuit for judging whether the first decoded audio is an audio signal or a non-audio signal by using information contained in the first code string, and encoding the first decoded audio by a second encoding method based on the judgment to generate a second code string.
In the code conversion device of the present invention, preferably, whether the first decoded signal is the audio signal or the non-audio signal is judged by using one of frame type information contained in the first code string and a size of the code string.
According to yet another aspect of the present invention, a code conversion program for use in operating a computer constituting a code conversion device so as to execute conversion operation of a first code string compliant with a first method into a second code string compliant with a second method, the code conversion program comprising the steps of:
(a) processing of generating a first decoded audio from the first code string by a first decoding method; and
(b) processing of judging whether the first decoded audio is an audio signal or a non-audio signal by using information contained in the first code string, and encoding the first decoded audio by the second encoding method based on the judgment to generate a second code string.
In the code conversion program of the present invention, preferably, according to claim 9, whether the first decoded audio is the audio signal or the non-audio signal is judged by using one of frame type information contained in the first code string and a size of the code string.
Furthermore, according to yet another aspect of the present invention, a recording medium records and holds the code conversion program.
Hereinafter, the preferred embodiments of the present invention will be described. An outline and a principle of a device and a method of the present invention will be described first, and then the embodiments will be described in detail.
A first code string encoded in compliance with a first method, i.e., according to the first method, is supplied to the audio decoding device 1 via an input terminal 3. The audio decoding device 1 generates a first decoded audio from the first code string by a first decoding method.
The audio encoding device 2 judges whether the first decoded audio is an audio signal or a non-audio signal in response to information contained in the first code string, and encodes the first decoded audio by a second encoding method based on the judgment to generate a second code string.
The method of the present invention includes the following steps.
Step a: a first decoded audio is generated from a first code string by a first decoding method.
Step b: whether the first decoded audio is an audio signal or a non-audio signal is judged by using information contained in the first code string, and the first decoded audio is encoded by a second encoding method based on the judgment to generate a second code string via an output terminal 4.
Next, operation and merit of the present invention will be described. According to the present invention, by using frame type information contained in the first code string, judgment is made as to whether a signal decoded from the code string corresponds to an audio section or a non-audio section. Thus, an audio detection device is made unnecessary, whereby the code conversion device can be reduced in size.
Further, referring to
The audio decoding device 1 receives the first code string via the input terminal 3. Herein, the code string is assumed to be encoded by the first encoding method. The audio decoding device 1 decodes an audio signal or a non-audio signal, such as a noise, by the first decoding method corresponding to the first encoding method, and outputs the decoded signal as a first decoded signal to the audio encoding device 2. Generally, the first code string comprises a header and a payload. The header contains frame type information. It is to be noted that such frame type information makes it possible to judge whether the signal decoded from the code string corresponds to an audio section or a non-audio (no sound or noise) section. The audio decoding device 1 generates an audio signal or a non-audio signal (noise signal) according to this frame type information.
The audio decoding device 1 outputs the frame type information to the audio encoding device 2. For details on the header and the frame type information, the Nonpatent Document 4, for example, can be referred to.
The payload comprises a code corresponding to a parameter indicating an audio signal (audio parameter) when the frame type information corresponds to the audio section.
On the other hand, when the frame type information corresponds to the non-audio section, the payload is often composed of either a code corresponding to a parameter indicating a noise signal (noise parameter) or nothing.
From this fact, it is understood that payload sizes are varied between the audio section and the non-audio section. Thus, using a payload size or a size of the first code string in place of the frame type information also makes it possible to judge whether the signal decoded from the code string corresponds to the audio section or the non-audio section.
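The judgment described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `FrameType` values, the function name `is_audio_section`, and the threshold `NON_AUDIO_MAX_PAYLOAD_BYTES` are all hypothetical, and a real codec defines its own frame types and payload sizes.

```python
from enum import Enum
from typing import Optional


class FrameType(Enum):
    """Hypothetical frame type values; real codecs define their own."""
    SPEECH = 0         # payload carries audio parameter codes
    NOISE_UPDATE = 1   # payload carries noise parameter codes
    NO_DATA = 2        # payload is empty


# Assumed threshold: payloads of at most this many bytes are treated as non-audio.
NON_AUDIO_MAX_PAYLOAD_BYTES = 6


def is_audio_section(frame_type: Optional[FrameType] = None,
                     payload_size: Optional[int] = None) -> bool:
    """Return True for an audio section and False for a non-audio section.

    The frame type information is used when it is available; otherwise the
    payload size is used in its place.
    """
    if frame_type is not None:
        return frame_type is FrameType.SPEECH
    if payload_size is not None:
        return payload_size > NON_AUDIO_MAX_PAYLOAD_BYTES
    raise ValueError("need frame type information or a payload size")
```

For instance, `is_audio_section(frame_type=FrameType.NO_DATA)` and `is_audio_section(payload_size=0)` both report a non-audio section without inspecting the decoded signal itself.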
The audio encoding device 2 receives the first decoded signal and the frame type information output from the audio decoding device 1. Like in the audio detection result flag mentioned in connection with the configuration shown in
In this case, the representation of audio or non-audio in the frame type information and the representation of audio or non-audio in the audio detection result used in the audio detection device 5 may be correlated with each other beforehand. In such a case, based on this correlation, an audio detection result corresponding to the frame type information output from the audio decoding device 1 is input to the audio encoding device 2. This shows that no modification is necessary to the audio decoding device 1A and the audio encoding device 2A mentioned in connection with the conventional code conversion device of
Next, referring to
On the other hand, the audio encoding device 2 comprises a second switch 21, an audio encoding circuit 22, a noise encoding circuit 23, and a header information addition circuit 24.
The header information extraction circuit 11 separates the header and the payload from the first code string input given via the input terminal 3. In this case, the header contains frame type information. When the frame type information corresponds to the audio section, a code corresponding to an audio parameter is output to the audio decoding circuit 12. The audio parameter may include, for example, a linear prediction (LP) coefficient, an adaptive codebook (ACB), a fixed codebook (FCB), an ACB gain, and an FCB gain, all of which may be made to correspond to a first LP coefficient code, a first ACB code, a first FCB code, and a first gain code, respectively.
On the other hand, when the frame type information corresponds to the non-audio section, a code corresponding to a noise parameter is output to the noise decoding circuit 13. The noise parameter may include, for example, an LP coefficient and frame energy, which may be made to correspond to a first LP coefficient code and a first frame energy code, respectively.
The audio decoding circuit 12 receives the first LP coefficient code, the first ACB code, the first FCB code, and the first gain code output from the header information extraction circuit 11, decodes an audio from the codes by the first decoding method, and outputs the decoded audio as a first decoded audio to the first switch 14.
The noise decoding circuit 13 receives the first LP coefficient code and the first frame energy code output from the header information extraction circuit 11, decodes a noise from the codes by the first decoding method, and outputs the decoded noise as a first decoded noise to the first switch 14. As regards details on the noise decoding method, for example, Chapter 6 of the Nonpatent Document 5 can be referred to.
The first switch 14 receives the frame type information output from the header information extraction circuit 11 and outputs the first decoded audio sent from the audio decoding circuit 12 to the second switch 21 when the frame type information corresponds to the audio section, and outputs the first decoded noise sent from the noise decoding circuit 13 to the second switch 21 when the frame type information corresponds to the non-audio section.
The second switch 21 receives the frame type information output from the header information extraction circuit 11, outputs the first decoded audio sent from the first switch 14 to the audio encoding circuit 22 when the frame type information corresponds to the audio section, and outputs the first decoded noise sent from the first switch 14 to the noise encoding circuit 23 when the frame type information corresponds to the non-audio section.
The audio encoding circuit 22 is supplied with the first decoded audio from the second switch 21, and encodes the same in accordance with the second encoding method, into the LP coefficient code, the ACB code, the FCB code and the gain code. Then, these codes are supplied as a second LP coefficient code, a second ACB code, a second FCB code, and a second gain code to the header information addition circuit 24.
The noise encoding circuit 23 is supplied with the first decoded noise from the second switch 21, and encodes the same in accordance with the second encoding method, into an LP coefficient code and a frame energy code. Then, these codes are supplied as a second LP coefficient code and a second frame energy code to the header information addition circuit 24. For details on the noise encoding method, for example, Chapter 5 of the Nonpatent Document 5 can be referred to.
Supplied with the frame type information from the header information extraction circuit 11, the header information addition circuit 24 constitutes the payload from the second LP coefficient code, the second ACB code, the second FCB code, and the second gain code sent from the audio encoding circuit 22 when the frame type information corresponds to the audio section, and outputs a second code string obtained by adding a header to the payload via the output terminal 4. On the other hand, when the frame type information corresponds to the non-audio section, the second LP coefficient code and the second frame energy code output from the noise encoding circuit 23 are constituted as a payload, and a second code string obtained by adding a header to this payload is output via the output terminal 4. For details on the header and the frame type information, for example, the Nonpatent Document 4 or the like can be referred to. The first embodiment has been described.
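The signal routing of the first embodiment can be sketched in software as follows. This is a simplified illustration under stated assumptions: the `Frame` and `Codec` types and the function `convert_frame` are hypothetical names, the per-method decoding and encoding routines are passed in as placeholder callables rather than real codec implementations, and the header is reduced to a single audio/non-audio indication.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Samples = Sequence[float]


@dataclass
class Frame:
    is_audio: bool   # derived from the frame type information in the header
    payload: bytes   # audio parameter codes or noise parameter codes


@dataclass
class Codec:
    """Hypothetical bundle of the routines belonging to one coding method."""
    decode_audio: Callable[[bytes], Samples]
    decode_noise: Callable[[bytes], Samples]
    encode_audio: Callable[[Samples], bytes]
    encode_noise: Callable[[Samples], bytes]


def convert_frame(frame: Frame, first: Codec, second: Codec) -> Frame:
    """Convert one frame from the first method to the second method."""
    # The frame type information plays the role of both switches 14 and 21,
    # so no audio detection is run on the decoded signal.
    if frame.is_audio:
        decoded = first.decode_audio(frame.payload)   # audio decoding circuit 12
        payload = second.encode_audio(decoded)        # audio encoding circuit 22
    else:
        decoded = first.decode_noise(frame.payload)   # noise decoding circuit 13
        payload = second.encode_noise(decoded)        # noise encoding circuit 23
    # Header information addition circuit 24; the audio/non-audio indication
    # is assumed here to carry over unchanged into the new header.
    return Frame(is_audio=frame.is_audio, payload=payload)
```

In this sketch the only input to the routing decision is `frame.is_audio`, which mirrors the point that the frame type information from the header information extraction circuit 11 replaces the audio detection device.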
The audio decoding device 1 receives the first code string from the input terminal 3 (step S1).
The audio decoding device 1 generates the first decoded audio in response to the first input code string in accordance with the first decoding method (step S2).
More specifically, in the audio decoding device 1, a header containing frame type information and a payload are separated from the first code string input from the input terminal 3. When the frame type information corresponds to an audio section, a code corresponding to an audio parameter is decoded into a decoded audio in accordance with the first decoding method corresponding to the first encoding method, and the decoded audio is output as the first decoded audio. On the other hand, when the frame type information corresponds to a non-audio section, a code corresponding to a noise parameter is decoded into the first decoded noise in accordance with the decoding method corresponding to the first encoding method, and the first decoded noise is output from the noise decoding circuit 13. Accordingly, the audio decoding device 1 is switched and controlled by the first switch 14 on the basis of the frame type information to output the first decoded audio when the frame type information corresponds to the audio section, and to output the first decoded noise when the frame type information corresponds to the non-audio section.
On the other hand, the audio encoding device 2 judges whether the first decoded audio is an audio signal or a non-audio signal by using information contained in the code string (step S3).
The shown audio encoding device 2 receives the frame type information from the header information extraction circuit 11 of the audio decoding device 1, and judges whether the first decoded audio corresponds to the audio section or the non-audio section based on the frame type information.
The audio encoding device 2 encodes the first decoded audio by the second encoding method on the basis of a result of the judgment to generate a second code string (step S4).
When the frame type information corresponds to the audio section, the first decoded audio is encoded in accordance with the second encoding method at the audio encoding circuit 22 to be output as a second code string. On the other hand, when the frame type information corresponds to the non-audio section, the first decoded noise is encoded in accordance with the second encoding method at the noise encoding circuit 23 to be output as a second code string via the output terminal 4 (step S5).
More specifically, in the header information addition circuit 24, when the frame type information corresponds to the audio section, the second code obtained by encoding the first decoded audio from the audio decoding device 1 by the second encoding method is set as a payload, and a second code string obtained by adding a header to the payload is output from the output terminal 4. When the frame type information corresponds to the non-audio section, the second code obtained by encoding the first decoded noise from the audio decoding device 1 by the second encoding method is set as a payload, and a second code string obtained by adding a header to the payload is output from the output terminal 4.
The code conversion device of the embodiment of the present invention may be realized by computer control (program control method) such as a digital signal processor.
(a) processing of generating a first decoded audio from the first code string in accordance with the first decoding method; and
(b) processing of judging whether the first decoded signal is an audio signal or a non-audio signal by using information contained in the first code string, and encoding the first decoded audio in accordance with the second encoding method on the basis of the judgment to generate a second code string.
A CPU 32 reads the program out of the recording medium 36 through a recording medium reading device 35 and a recording medium reading device interface 34. The program is stored in a memory 33 to be executed. The program may be stored in a mask ROM or a nonvolatile memory, such as a flash memory. The recording medium includes a nonvolatile memory, a medium such as a CD-ROM, an FD, a digital versatile disk (DVD), a magnetic tape (MT), or a portable HDD, or the like.
The present invention has thus far been described in conjunction with the above-mentioned embodiments. However, the invention is not limited to the configurations of the embodiments. Needless to say, those skilled in the art may make various changes and corrections within a scope of the principle of the present invention. For example, the invention can be applied not only to the case where the first and second encoding methods are different from each other but also to the case where the first and second encoding methods are identical to each other to provide the same effects. Moreover, when distinction is made about whether the first code string is the audio signal or the non-audio signal, distinction may be made by using both of the frame type information and the first code string.
As described above, the present invention provides an effect that the size of the code conversion device can be reduced. The reason is that the frame type information contained in the first code string is used to judge whether the signal decoded from the code string is in the audio section or the non-audio section, which dispenses with the audio detection device.
Number | Date | Country | Kind |
---|---|---|---|
2003-117421 | Apr 2003 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2004/005802 | 4/22/2004 | WO | 00 | 10/17/2005 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2004/009524 | 11/4/2004 | WO | A |
Number | Date | Country | |
---|---|---|---|
20060224389 A1 | Oct 2006 | US |