Method and apparatus to search fixed codebook and method and apparatus to encode/decode a speech signal using the method and apparatus to search fixed codebook

Information

  • Patent Application
  • 20070276655
  • Publication Number
    20070276655
  • Date Filed
    February 22, 2007
  • Date Published
    November 29, 2007
Abstract
A method and an apparatus to encode and decode a speech signal using a code excited linear prediction (CELP) algorithm. In order to reduce a bit rate without degrading performance in an enhancement layer based on CELP, each of a fixed codebook of a core layer and a fixed codebook of the enhancement layer is divided into a plurality of spaces. The spaces of the fixed codebook of the enhancement layer, excluding the space corresponding to the least distorted space determined from among the spaces of the fixed codebook of the core layer, are searched.
Description

BRIEF DESCRIPTION OF THE DRAWINGS

These and/or other aspects of the present general inventive concept will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:



FIG. 1 is a block diagram illustrating an apparatus to encode a speech signal, according to an embodiment of the present general inventive concept;



FIG. 2 is a block diagram illustrating an apparatus to decode a speech signal, according to an embodiment of the present general inventive concept;



FIG. 3 is a flowchart illustrating a method of encoding a speech signal, according to an embodiment of the present general inventive concept;



FIG. 4 is a flowchart illustrating a method of decoding a speech signal, according to an embodiment of the present general inventive concept;



FIG. 5 is a flowchart illustrating a method of searching for a fixed codebook, according to an embodiment of the present general inventive concept;



FIG. 6 is a conceptual diagram illustrating a fixed codebook of each of a core layer and an enhancement layer in which combinations of possible positions of pulses are classified into a first space and a second space;



FIG. 7A is a graph illustrating a probability that a position of each pulse is selected from the fixed codebook of the enhancement layer, when a pulse position value found in the fixed codebook of the core layer is even-numbered;



FIG. 7B is a graph illustrating a probability that a position of each pulse is selected from the fixed codebook of the enhancement layer, when a pulse position value found in the fixed codebook of the core layer is odd-numbered;



FIG. 8A illustrates bits allocated to a fixed codebook of a core layer according to an embodiment of the present general inventive concept;



FIG. 8B illustrates bits allocated to a fixed codebook of an enhancement layer according to an embodiment of the present general inventive concept;



FIG. 8C illustrates bits allocated to a G.729 fixed codebook of a core layer;



FIG. 8D illustrates bits allocated to a G.729 fixed codebook of an enhancement layer;



FIG. 9A illustrates bits allocated to a fixed codebook of a core layer according to another embodiment of the present general inventive concept;



FIG. 9B illustrates bits allocated to a fixed codebook of an enhancement layer according to another embodiment of the present general inventive concept;



FIG. 9C illustrates bits allocated to a fixed codebook of a core layer in 3GPP2 VMR-WB rate set-1;



FIG. 9D illustrates bits allocated to a fixed codebook of an enhancement layer in 3GPP2 VMR-WB rate set-1;



FIG. 10A is a graph illustrating results of a comparison between a PESQ (perceptual evaluation of speech quality) score of an embodiment of the present general inventive concept and that of the prior art; and



FIG. 10B is a graph illustrating results of a comparison between bits for each sub-frame used in a fixed codebook in an embodiment of the present general inventive concept and those in the prior art.





DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Reference will now be made in detail to the embodiments of the present general inventive concept, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present general inventive concept by referring to the figures.



FIG. 1 is a block diagram illustrating an apparatus to encode a speech signal, according to an embodiment of the present general inventive concept. The apparatus of FIG. 1 includes a core layer generation unit 100, an enhancement layer generation unit 150, and a multiplexing unit 190.


The core layer generation unit 100 generates a core layer that includes the encoding information needed to restore the speech signal at a minimal quality. To achieve this, the core layer generation unit 100 filters an input speech signal using a linear prediction coding (LPC) method to produce an excitation signal corresponding to the speech signal.


The core layer generation unit 100 includes a preprocessor 102, an LPC analyzer 104, an LPC coefficient quantizer 106, a first synthesis filter 108, an adder 110, a first subtractor 112, a first perceptual weighting filter 114, a pitch analyzer 116, a pitch contribution remover 118, a fixed codebook 120, a codebook searcher 122, an adaptive codebook 124, a space determiner 130, an identifier generator 132, a gain quantizer 140, a first multiplier 141, and a second multiplier 142.


The preprocessor 102 removes a direct current (DC) component from a speech signal received via an input port IN. More specifically, the preprocessor 102 removes a noise component in a low frequency band by filtering the speech signal using a high pass filter included in the preprocessor 102.
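
As an illustration of this kind of preprocessing, the following is a minimal sketch of a first-order DC-blocking high pass filter. The filter structure, the pole value 0.95, and the name `remove_dc` are assumptions made for this example; the description does not specify the filter used by the preprocessor 102.

```python
def remove_dc(signal, pole=0.95):
    """First-order DC blocker: y[n] = x[n] - x[n-1] + pole * y[n-1].

    The pole value is a hypothetical choice; a value closer to 1.0
    gives a lower cutoff frequency.
    """
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in signal:
        y = x - prev_x + pole * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


# A constant (pure DC) input decays toward zero at the output.
filtered = remove_dc([1.0] * 400)
```

With a pure DC input, the output starts at the input level and then decays geometrically, which is the behavior expected of a DC-removing stage.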


The LPC analyzer 104 extracts an LPC coefficient from the speech signal from which the DC component has been removed by the preprocessor 102.


The LPC coefficient quantizer 106 vector-quantizes the LPC coefficient extracted by the LPC analyzer 104.


The first synthesis filter 108 generates a synthesized signal corresponding to an excited signal output by the adder 110, using the result of the vector quantization by the LPC coefficient quantizer 106.
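
A synthesis filter of this kind is commonly realized as an all-pole filter driven by the excitation signal. The sketch below assumes the convention A(z) = 1 + a1*z^-1 + ... + ap*z^-p and uses the hypothetical name `synthesis_filter`; the description does not fix the sign convention or the filter order.

```python
def synthesis_filter(excitation, lpc):
    """All-pole filter 1/A(z), assuming A(z) = 1 + sum_k lpc[k-1] * z^-k.

    Computes y[n] = x[n] - sum_k lpc[k-1] * y[n-k], with zero initial state.
    """
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                y -= a * out[n - k]
        out.append(y)
    return out


# Impulse response of a first-order example with a1 = -0.5.
response = synthesis_filter([1.0, 0.0, 0.0, 0.0], [-0.5])
```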


The first subtractor 112 subtracts the synthesized signal output by the first synthesis filter 108 from the speech signal output by the preprocessor 102.


The first perceptual weighting filter 114 filters the signal output by the first subtractor 112 so that the quantization noise of the signal becomes less than or equal to a masking threshold in order to utilize the masking effect of a human's hearing structure. The first perceptual weighting filter 114 generates a signal including a weight so as to minimize the quantization noise of the signal output by the first subtractor 112.


The pitch analyzer 116 divides the signal output by the first perceptual weighting filter 114 into a plurality of sub-frames and analyzes the pitch of each of the sub-frames so as to generate an index and a gain of the adaptive codebook 124.


The pitch contribution remover 118 detects a target signal needed to search for a fixed codebook vector corresponding to the signal output by the first perceptual weighting filter 114 from the fixed codebook 120, using the index of the adaptive codebook 124.


The fixed codebook 120 is configured by classifying combinations of possible pulse positions into a plurality of spaces.


As illustrated in FIG. 6, the fixed codebook 120 may be configured by classifying combinations of possible pulse positions into a first space 610 and a second space 620. The first space 610 may include the possible positions of pulses that are highly likely to be searched for in a core layer.


The first and second spaces 610 and 620 may be distinguished from each other according to whether possible pulse positions are even or odd. FIG. 7A is a graph illustrating a probability that the position of each pulse is selected from a fixed codebook of an enhancement layer, when a pulse position value found in the fixed codebook of a core layer is even. Referring to FIG. 7A, when a pulse position value found in the fixed codebook of the core layer is even, the probability that a pulse position value corresponding to an odd number is selected from the fixed codebook of the enhancement layer is significantly high. FIG. 7B is a graph illustrating a probability that the position of each pulse is selected from the fixed codebook of the enhancement layer, when a pulse position value found in the fixed codebook of the core layer is odd. Referring to FIG. 7B, when a pulse position value found in the fixed codebook of the core layer is odd, the probability that a pulse position value corresponding to an even number is selected from the fixed codebook of the enhancement layer is significantly high. Hence, each of the codebooks of the core layer and the enhancement layer may be configured by classifying odd-numbered possible pulse positions into a first space and even-numbered possible pulse positions into a second space. Alternatively, as illustrated in FIG. 6, each of the codebooks of the core layer and the enhancement layer may be configured by classifying the even-numbered possible pulse positions into the first space 610 and the odd-numbered possible pulse positions into the second space 620.
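
The parity-based classification can be illustrated with a short sketch. It is not part of the disclosed embodiment: the 40-sample sub-frame length and the names `split_positions_by_parity`, `first_space`, and `second_space` are assumptions chosen for illustration, and the sketch classifies individual pulse positions rather than full combinations of positions.

```python
def split_positions_by_parity(positions):
    """Split candidate pulse positions into (even_space, odd_space)."""
    even_space = [p for p in positions if p % 2 == 0]
    odd_space = [p for p in positions if p % 2 == 1]
    return even_space, odd_space


# Hypothetical 40-sample sub-frame; the length is assumed for illustration.
SUBFRAME_LEN = 40

# Following the FIG. 6 arrangement: even positions form the first space,
# odd positions form the second space.
first_space, second_space = split_positions_by_parity(range(SUBFRAME_LEN))
```

The two spaces are disjoint and together cover every candidate position, so a single identifier bit is enough to tell them apart.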


Referring back to FIG. 1, the fixed codebook 120 outputs a fixed codebook vector using an index found by the codebook searcher 122.


The codebook searcher 122 searches the fixed codebook 120 for a fixed codebook vector corresponding to the target signal detected by the pitch contribution remover 118 and outputs an index and a gain of the fixed codebook 120. More specifically, the codebook searcher 122 searches for a fixed codebook vector that minimizes a mean square error (MSE) of the target signal.


When the codebook searcher 122 searches for the fixed codebook vector, a plurality of spaces included in the fixed codebook 120 are each searched. If the fixed codebook 120 is divided into the first and second spaces 610 and 620 (See FIG. 6), the first space 610 is searched for a fixed codebook vector that minimizes the MSE of the target signal, and the second space 620 is also searched for a fixed codebook vector that minimizes the MSE of the target signal.


The space determiner 130 detects a least distorted fixed codebook vector from the fixed codebook vectors found in all of the spaces of the fixed codebook 120 by the codebook searcher 122 and outputs the space to which the detected fixed codebook vector belongs.
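
The per-space search followed by the space determination can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: each codevector is a single unit pulse, the target is compared directly with the scaled pulse (no synthesis or weighting filter in the loop), and the names `search_space` and `determine_space` are hypothetical.

```python
def search_space(target, positions):
    """Return (position, gain, squared_error) of the best one-pulse codevector.

    For a unit pulse at position p, the gain minimizing the squared error
    is target[p], and the residual error is the target energy minus
    target[p] squared.
    """
    energy = sum(x * x for x in target)
    best = None
    for p in positions:
        gain = target[p]
        err = energy - gain * gain
        if best is None or err < best[2]:
            best = (p, gain, err)
    return best


def determine_space(target, spaces):
    """Search every space and return (space_id, result) with least distortion."""
    results = [search_space(target, s) for s in spaces]
    space_id = min(range(len(spaces)), key=lambda i: results[i][2])
    return space_id, results[space_id]


# Hypothetical 8-sample target, with even and odd positions as the two spaces.
target = [0.1, -0.9, 0.3, 0.5, 0.0, 0.2, 0.0, 0.0]
spaces = [[0, 2, 4, 6], [1, 3, 5, 7]]
space_id, (position, gain, err) = determine_space(target, spaces)
```

In this example the largest-magnitude target sample sits at an odd position, so the second space (index 1) is selected; `space_id` plays the role of the identifier generated for the determined space.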


The identifier generator 132 generates an identifier indicating the space determined by the space determiner 130. For example, a bit “offset” illustrated in FIGS. 8A and 9A corresponds to the identifier of the space output by the space determiner 130.


The adaptive codebook 124 outputs an adaptive codebook vector corresponding to the index output by the pitch analyzer 116.


The gain quantizer 140 quantizes the gain of the fixed codebook 120 output by the codebook searcher 122 and the gain of the adaptive codebook 124 output by the pitch analyzer 116 and outputs the results of the quantizations. The gain quantizer 140 outputs a quantized gain Gc of the fixed codebook 120 to the first multiplier 141 and a quantized gain Gp of the adaptive codebook 124 to the second multiplier 142.


The first multiplier 141 multiplies the fixed codebook vector output by the fixed codebook 120 by the quantized gain Gc of the fixed codebook 120 received from the gain quantizer 140.


The second multiplier 142 multiplies the adaptive codebook vector output by the adaptive codebook 124 by the quantized gain Gp of the adaptive codebook 124 received from the gain quantizer 140.


The adder 110 adds the product received from the first multiplier 141 to the product received from the second multiplier 142.
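
The operations of the first multiplier 141, the second multiplier 142, and the adder 110 amount to a gain-scaled sum of the two codebook vectors. A minimal sketch, with the hypothetical name `build_excitation` and made-up sample values:

```python
def build_excitation(fixed_vec, adaptive_vec, gc, gp):
    """Excitation sample n = gc * fixed_vec[n] + gp * adaptive_vec[n]."""
    return [gc * c + gp * v for c, v in zip(fixed_vec, adaptive_vec)]


# Illustrative values only; real vectors span a whole sub-frame.
excitation = build_excitation([1, 0, -1], [2, 2, 2], gc=0.5, gp=0.25)
```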


The enhancement layer generation unit 150 generates an enhancement layer that supplies additional bits, beyond the bits provided by the core layer generation unit 100, in order to enhance the quality of the restored sound. For example, when the core layer provides a bit rate of 8 kbps, the enhancement layer may provide an additional bit rate of 4 kbps.


The enhancement layer generation unit 150 includes a second subtractor 152, a second perceptual weighting filter 154, a codebook searcher 156, a gain difference quantizer 158, a fixed codebook 160, a third multiplier 162, and a second synthesis filter 164.


The second subtractor 152 subtracts a result output by the second perceptual weighting filter 154 from a result output by the first subtractor 112.


The second perceptual weighting filter 154 performs a filtering operation so that quantization noise is less than or equal to a masking threshold in order to utilize the masking effect of a human's hearing structure. More specifically, the second perceptual weighting filter 154 produces a signal including a weight in order to minimize the quantization noise of the signal output by the second subtractor 152.


The fixed codebook 160 outputs a fixed codebook vector corresponding to an index obtained by the codebook searcher 156. The fixed codebook 160 of the enhancement layer generation unit 150 is divided into a plurality of spaces corresponding to the spaces (i.e., the first and second spaces 610 and 620 of FIG. 6) of the fixed codebook 120 of the core layer generating unit 100.


The codebook searcher 156 searches the fixed codebook 160 for a fixed codebook vector corresponding to the result of the filtering by the second perceptual weighting filter 154 and outputs an index and a gain of the fixed codebook 160.


When the codebook searcher 156 searches for the fixed codebook vector, spaces of the fixed codebook 160 excluding the space determined by the space determiner 130 of the core layer generation unit 100 are each searched. Accordingly, if each of the fixed codebooks 120 and 160 of the core layer generating unit 100 and the enhancement layer generation unit 150, respectively, is divided into the first and second spaces 610 and 620 (See FIG. 6), and the first space 610 is determined by the space determiner 130, the codebook searcher 156 of the enhancement layer generation unit 150 searches the second space 620 for the fixed codebook vector. If the second space 620 is determined by the space determiner 130 of the core layer generation unit 100, the codebook searcher 156 of the enhancement layer generation unit 150 searches the first space 610 for the fixed codebook vector.
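
The exclusion rule can be sketched as a simple filter over the spaces. The name `enhancement_spaces` and the 8-position example are assumptions for illustration only.

```python
def enhancement_spaces(spaces, core_space_id):
    """Return (space_id, positions) pairs left for the enhancement layer.

    The space already determined in the core layer is skipped.
    """
    return [(i, s) for i, s in enumerate(spaces) if i != core_space_id]


# Two-space example: even positions (index 0) and odd positions (index 1).
spaces = [[0, 2, 4, 6], [1, 3, 5, 7]]
remaining_if_first = enhancement_spaces(spaces, core_space_id=0)
remaining_if_second = enhancement_spaces(spaces, core_space_id=1)
```

With two spaces, exactly one space remains in each case, so the enhancement layer search covers half of the candidate positions.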


The gain difference quantizer 158 obtains a difference between the gain of the fixed codebook 160 output by the codebook searcher 156 of the enhancement layer generation unit 150 and the quantized gain Gc of the fixed codebook 120 output by the gain quantizer 140 of the core layer generation unit 100 and quantizes the difference. The gain difference quantizer 158 outputs the quantized gain difference Gce to the third multiplier 162 and the multiplexing unit 190.
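
The description does not state how the difference is quantized. As one possible illustration, the sketch below uses a uniform scalar quantizer; the step size 0.25 and the name `quantize_gain_difference` are assumptions, not details of the embodiment.

```python
def quantize_gain_difference(enh_gain, core_gain_q, step=0.25):
    """Return (index, gce) for a uniform quantizer with a hypothetical step.

    index is what would be transmitted; gce is the reconstructed
    (quantized) gain difference.
    """
    diff = enh_gain - core_gain_q
    index = round(diff / step)
    return index, index * step


index, gce = quantize_gain_difference(enh_gain=1.3, core_gain_q=1.0)
```

Coding the difference rather than the enhancement layer gain itself keeps the quantized values small when the two gains are close.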


The third multiplier 162 multiplies the fixed codebook vector output by the fixed codebook 160 of the enhancement layer generation unit 150 by the quantized gain difference Gce received from the gain difference quantizer 158.


The second synthesis filter 164 generates a synthesized signal corresponding to the product output by the third multiplier 162, using the result of the vector quantization by the LPC coefficient quantizer 106.


The multiplexing unit 190 generates a bitstream from the outputs of the LPC coefficient quantizer 106, the pitch analyzer 116, the codebook searcher 122, the identifier generator 132, the gain quantizer 140, the codebook searcher 156, and the gain difference quantizer 158. The multiplexing unit 190 then outputs the bitstream via an output port OUT.



FIG. 2 is a block diagram illustrating an apparatus to decode a speech signal, according to an embodiment of the present general inventive concept. The apparatus of FIG. 2 includes a demultiplexing unit 200, an LPC coefficient decoding unit 210, a core layer decoding unit 220, an enhancement layer decoding unit 230, a gain decoding unit 240, a gain difference decoding unit 250, a first adder 260, a first multiplier 262, a second multiplier 264, a second adder 266, a third adder 268, a first switching unit 270, a second switching unit 275, a synthesis filter 280, and a postprocessing unit 290.


The demultiplexing unit 200 receives a bitstream via an input port IN and analyzes the bitstream. The demultiplexing unit 200 outputs LPC coefficient quantization information to the LPC coefficient decoding unit 210, an index and identifier of a fixed codebook 222 to a fixed codebook decoder 224, an index of an adaptive codebook 226 to an adaptive codebook decoder 228, an index and identifier of a fixed codebook 232 to a fixed codebook decoder 234, gain quantization information to the gain decoding unit 240, and gain difference quantization information to the gain difference decoding unit 250.


The LPC coefficient decoding unit 210 decodes an LPC coefficient using the LPC coefficient quantization information received from the demultiplexing unit 200.


The core layer decoding unit 220 decodes a core layer. The core layer decoding unit 220 includes the fixed codebook 222, the fixed codebook decoder 224, the adaptive codebook 226, and the adaptive codebook decoder 228.


The fixed codebook 222 of the core layer decoding unit 220 is configured by classifying combinations of possible pulse positions into a plurality of spaces, as in the fixed codebooks 120 and 160 of the core layer generation unit 100 and the enhancement layer generation unit 150 of FIG. 1.


The fixed codebook 222 may be configured by classifying combinations of possible pulse positions into the first space 610 and the second space 620, as illustrated in FIG. 6. The first space 610 may include the possible positions of pulses that are highly likely to be searched for in the core layer.


The first and second spaces 610 and 620 may be distinguished from each other according to whether the possible pulse positions are even or odd. FIG. 7A is a graph illustrating a probability that the position of each pulse is selected from a fixed codebook of an enhancement layer, when a pulse position value found in the fixed codebook of a core layer is even. Referring to FIG. 7A, when a pulse position value found in the fixed codebook of the core layer is even, the probability that a pulse position value corresponding to an odd number is selected from the fixed codebook of the enhancement layer is significantly high. FIG. 7B is a graph illustrating a probability that the position of each pulse is selected from the fixed codebook of the enhancement layer, when a pulse position value found in the fixed codebook of the core layer is odd. Referring to FIG. 7B, when a pulse position value found in the fixed codebook of the core layer is odd, the probability that a pulse position value corresponding to an even number is selected from the fixed codebook of the enhancement layer is significantly high. Hence, each of the codebooks of the core layer and the enhancement layer may be configured by classifying odd-numbered possible pulse positions into a first space and even-numbered possible pulse positions into a second space. Alternatively, as illustrated in FIG. 6, each of the codebooks of the core layer and the enhancement layer may be configured by classifying the even-numbered possible pulse positions into the first space 610 and the odd-numbered possible pulse positions into the second space 620.


Referring back to FIG. 2, the fixed codebook decoder 224 determines a to-be-searched space of the spaces of the fixed codebook 222 using the identifier output by the demultiplexing unit 200, searches the determined space for a codeword corresponding to the index output by the demultiplexing unit 200, and decodes the codeword. Here, the identifier represents a bit “offset” illustrated in FIGS. 8A and 9A.


The adaptive codebook decoder 228 searches the adaptive codebook 226 for the codeword corresponding to the index output by the demultiplexing unit 200 and decodes the codeword.


The enhancement layer decoding unit 230 decodes an enhancement layer. The enhancement layer decoding unit 230 includes the fixed codebook 232 and the fixed codebook decoder 234.


The fixed codebook 232 is divided into a plurality of spaces corresponding to the spaces of the fixed codebook 222 of the core layer decoding unit 220.


The fixed codebook decoder 234 searches spaces of the fixed codebook 232 excluding the space determined by the fixed codebook decoder 224 of the core layer decoding unit 220 for a codeword corresponding to the index output by the demultiplexing unit 200 and decodes the found codeword. Accordingly, if each of the fixed codebooks 222 and 232 of the core layer decoding unit 220 and the enhancement layer decoding unit 230, respectively, is divided into the first and second spaces 610 and 620, and the first space 610 is determined by the fixed codebook decoder 224, the fixed codebook decoder 234 searches the second space 620 for the codeword. If the second space 620 is determined by the fixed codebook decoder 224, the fixed codebook decoder 234 searches the first space 610 for the codeword.
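
On the decoding side, the same rule can be sketched for the two-space case. The sketch assumes one pulse per layer, an index that counts positions within a space, and the hypothetical name `decode_layers`; the actual codeword layout is not given in this description.

```python
def decode_layers(spaces, identifier, core_index, enh_index):
    """Return (core_position, enh_position) for a two-space codebook.

    identifier selects the core layer space; the enhancement layer uses
    the other space (valid for exactly two spaces only).
    """
    core_position = spaces[identifier][core_index]
    enh_space_id = 1 - identifier
    enh_position = spaces[enh_space_id][enh_index]
    return core_position, enh_position


# Even positions form space 0, odd positions form space 1.
spaces = [[0, 2, 4, 6], [1, 3, 5, 7]]
core_pos, enh_pos = decode_layers(spaces, identifier=1, core_index=0, enh_index=3)
```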


The gain decoding unit 240 decodes the gain quantization information received from the demultiplexing unit 200, the information including a fixed codebook gain Gc and an adaptive codebook gain Gp of the core layer, and outputs the fixed codebook gain Gc and the adaptive codebook gain Gp.


The gain difference decoding unit 250 decodes a difference between the gains of the fixed codebooks of the core layer and the enhancement layer output by the demultiplexing unit 200.


The first adder 260 adds a result output by the fixed codebook decoder 224 of the core layer decoding unit 220 to a result output by the fixed codebook decoder 234 of the enhancement layer decoding unit 230.


The first switching unit 270 selectively switches between the result output by the fixed codebook decoder 224 and a result of the addition by the first adder 260 according to a control signal.


The third adder 268 adds the fixed codebook gain Gc of the core layer output by the gain decoding unit 240 to a result output by the gain difference decoding unit 250.


The second switching unit 275 selectively switches between the fixed codebook gain Gc of the core layer output by the gain decoding unit 240 and the result of the addition by the third adder 268 according to a control signal.


The second multiplier 264 multiplies the result output by the first switching unit 270 by the result output by the second switching unit 275.


The first multiplier 262 multiplies the result of the decoding by the adaptive codebook decoder 228 by the adaptive codebook gain Gp output by the gain decoding unit 240.


The second adder 266 adds the result of the multiplication by the first multiplier 262 to the result of the multiplication by the second multiplier 264.
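
The signal path through the adders, switching units, and multipliers can be summarized in a short sketch. The boolean `use_enhancement` stands in for the control signal and is an assumption, as are the name `decode_excitation` and the sample values.

```python
def decode_excitation(core_fixed, enh_fixed, adaptive, gc, gce, gp, use_enhancement):
    """Rebuild the excitation from decoded vectors and gains.

    use_enhancement models the control signal of both switching units.
    """
    if use_enhancement:
        # First adder 260 and third adder 268.
        fixed = [c + e for c, e in zip(core_fixed, enh_fixed)]
        fixed_gain = gc + gce
    else:
        fixed, fixed_gain = core_fixed, gc
    # First multiplier 262, second multiplier 264, second adder 266.
    return [gp * a + fixed_gain * f for a, f in zip(adaptive, fixed)]


core_fixed = [1, 0, 0, 0]
enh_fixed = [0, -1, 0, 0]
adaptive = [0.5, 0.5, 0.5, 0.5]
core_only = decode_excitation(core_fixed, enh_fixed, adaptive, 2.0, 0.5, 1.0, False)
with_enh = decode_excitation(core_fixed, enh_fixed, adaptive, 2.0, 0.5, 1.0, True)
```

When the control signal selects the core layer only, the enhancement layer vector and the gain difference are simply ignored, so the same decoder can operate at either bit rate.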


The synthesis filter 280 synthesizes the result of the addition by the second adder 266 using the decoded LPC coefficient received from the LPC coefficient decoding unit 210, to thereby restore the speech signal.


The postprocessing unit 290 improves the quality of the speech signal restored by the synthesis filter 280 and outputs the improved speech signal via an output port OUT. More specifically, the postprocessing unit 290 filters the restored speech signal using a high pass filter and the decoded LPC coefficient output by the LPC coefficient decoding unit 210, in order to improve the quality of the speech signal restored by the synthesis filter 280.


A codebook searching apparatus according to embodiments of the present general inventive concept is included in the speech signal encoding apparatus of FIG. 1 and the speech signal decoding apparatus of FIG. 2.



FIG. 3 is a flowchart illustrating a method of encoding a speech signal, according to an embodiment of the present general inventive concept. The method of FIG. 3 may be performed by the encoding apparatus of FIG. 1. First, in operation 302, a DC component is removed from an input speech signal. That is, in the operation 302, the speech signal is filtered using a high pass filter to remove a noise component in a low frequency band from the speech signal.


In operation 304, an LPC coefficient is extracted from the speech signal from which the DC component has been removed in the operation 302.


In operation 306, the LPC coefficient extracted in the operation 304 is vector quantized.


In operation 308, a subtractor subtracts a signal output by a synthesis filter of a core layer from the speech signal from which the DC component has been removed.


In operation 310, in order to use the masking effect of a human's hearing structure, a perceptual weighting filter of the core layer filters the result of the subtraction in the operation 308 so that quantization noise becomes less than or equal to a masking threshold. In the operation 310, a signal including a weight is generated so as to minimize the quantization noise of the signal output in the operation 308.


In operation 312, the signal filtered in the operation 310 is divided into a plurality of sub-frames, and the pitch of each of the sub-frames is analyzed to output an index and gain of an adaptive codebook.


In operation 314, a target signal needed to search a fixed codebook for a fixed codebook vector corresponding to the signal filtered in the operation 310 is detected using the index of the adaptive codebook output in the operation 312.


In operation 316, the fixed codebook is searched for a fixed codebook vector corresponding to the target signal detected in the operation 314. In the operation 316, a fixed codebook vector that minimizes a mean squared error (MSE) of the target signal is searched for.


The fixed codebook of the core layer is configured by classifying combinations of possible pulse positions into a plurality of spaces.


As illustrated in FIG. 6, the fixed codebook of the core layer may be configured by classifying combinations of possible pulse positions into the first space 610 and the second space 620. The first space 610 may include the possible positions of pulses that are highly likely to be searched for in a core layer.


The first and second spaces 610 and 620 may be distinguished from each other according to whether possible pulse positions are even or odd. FIG. 7A is a graph illustrating a probability that the position of each pulse is selected from a fixed codebook of an enhancement layer, when a pulse position value found in the fixed codebook of a core layer is even. Referring to FIG. 7A, when a pulse position value found in the fixed codebook of the core layer is even, the probability that a pulse position value corresponding to an odd number is selected from the fixed codebook of the enhancement layer is significantly high. FIG. 7B is a graph illustrating a probability that the position of each pulse is selected from the fixed codebook of the enhancement layer, when a pulse position value found in the fixed codebook of the core layer is odd. Referring to FIG. 7B, when a pulse position value found in the fixed codebook of the core layer is odd, the probability that a pulse position value corresponding to an even number is selected from the fixed codebook of the enhancement layer is significantly high. Hence, each of the codebooks of the core layer and the enhancement layer may be configured by classifying odd-numbered possible pulse positions into a first space and even-numbered possible pulse positions into a second space. Alternatively, as illustrated in FIG. 6, each of the codebooks of the core layer and the enhancement layer may be configured by classifying the even-numbered possible pulse positions into the first space 610 and the odd-numbered possible pulse positions into the second space 620.


Referring back to FIG. 3, in the fixed codebook search in the operation 316, each of the spaces of the fixed codebook of the core layer is searched. Accordingly, if the fixed codebook is divided into the first and second spaces 610 and 620 (See FIG. 6), the first space 610 is searched for a fixed codebook vector that minimizes the MSE of the target signal, and the second space 620 is also searched for a fixed codebook vector that minimizes the MSE of the target signal.


In operation 318, the least distorted fixed codebook vector is detected from the fixed codebook vectors found in the spaces of the fixed codebook of the core layer, and the space from which the detected fixed codebook vector is found is output. In the operation 318, an index and gain of the fixed codebook belonging to the determined space are output.


In operation 320, an identifier indicating the space determined in the operation 318 is generated. For example, the bit “offset” illustrated in FIGS. 8A and 9A corresponds to the identifier of the space determined in the operation 318.


In operation 322, the gain of the fixed codebook output in the operation 318 and the gain of the adaptive codebook output in operation 312 are quantized to generate a quantized fixed codebook gain Gc and a quantized adaptive codebook gain Gp.


In operation 324, the fixed codebook vector detected in the operation 318 is multiplied by the quantized fixed codebook gain Gc generated in the operation 322.


In operation 326, the adaptive codebook vector detected in the operation 312 is multiplied by the quantized adaptive codebook gain Gp generated in the operation 322.


In operation 328, the result of the multiplication in the operation 324 is added to the result of the multiplication in the operation 326.


In operation 330, a synthesis filter outputs a synthetic signal corresponding to an excitation signal obtained in the operation 328, using the result of the vector quantization in operation 306.


In operation 354, which follows the operation 308, a signal corresponding to the result of the subtraction in the operation 308 is filtered so that quantization noise of the signal becomes less than or equal to a masking threshold, in order to utilize the masking effect of the human's hearing structure. In other words, in the operation 354, a signal including a weight is generated so as to minimize the quantization noise of the signal obtained in the operation 308.


In operation 356, the fixed codebook of the enhancement layer is searched for a fixed codebook vector corresponding to the result of the filtering in the operation 354, and an index and a gain of the fixed codebook vector thus found are output.


The fixed codebook of the enhancement layer is divided into a plurality of spaces corresponding to the spaces of the fixed codebook of the core layer.


During the fixed codebook vector search in the operation 356, spaces of the fixed codebook of the enhancement layer excluding the space determined in the operation 318 are each searched. Accordingly, if each of the fixed codebooks of the core layer and the enhancement layer is divided into the first and second spaces 610 and 620 (See FIG. 6), and the first space 610 is determined in the operation 318, the second space 620 is searched for a fixed codebook vector in the operation 356. If the second space 620 is determined in the operation 318, the first space 610 is searched for a fixed codebook vector in the operation 356.


In operation 358, a difference between the gain of the fixed codebook output in the operation 356 and the quantized gain Gc of the fixed codebook output in the operation 322 is obtained and quantized to generate a quantized gain difference Gce.


In operation 360, the fixed codebook vector output in the operation 356 is multiplied by the quantized gain difference Gce output in the operation 358.


In operation 362, a synthesis filter generates a synthesized signal corresponding to the result of the multiplication in the operation 360, using the result of the vector quantization in the operation 306.


In operation 380, a bitstream is generated from the results output in the operations 306, 312, 318, 320, 322, 356, and 358.



FIG. 4 is a flowchart illustrating a method of decoding a speech signal, according to an embodiment of the present general inventive concept. The method of FIG. 4 may be performed by the decoding apparatus of FIG. 2. First, in operation 400, a bitstream is received from a speech signal encoding apparatus, and the bitstream is analyzed. More specifically, in the operation 400, LPC coefficient quantization information, an index and an identifier of a fixed codebook of a core layer, an index of an adaptive codebook of the core layer, an index and identifier of a fixed codebook of an enhancement layer, gain quantization information, and gain difference quantization information are output.


In operation 405, an LPC coefficient is decoded using the LPC coefficient quantization information output in the operation 400.


In operation 415, a to-be-searched space of the spaces of the fixed codebook of the core layer is determined using the identifier output in the operation 400, the determined space is searched for a codeword corresponding to the index output in the operation 400, and the codeword is decoded. Here, the identifier represents a specific space provided in the fixed codebook of the core layer as a bit “offset” illustrated in FIGS. 8A and 9A.


The fixed codebook of the core layer is configured by classifying combinations of possible pulse positions into a plurality of spaces, as in the fixed codebook of the enhancement layer.


The fixed codebook of the core layer may be configured by classifying combinations of possible pulse positions into the first and second spaces 610 and 620, as illustrated in FIG. 6. The first space 610 may include the possible positions of pulses that are highly likely to be searched for in the core layer.


The first and second spaces 610 and 620 may be distinguished from each other according to whether possible pulse positions are even or odd. FIG. 7A is a graph illustrating a probability that the position of each pulse is selected from a fixed codebook of an enhancement layer, when a pulse position value found in the fixed codebook of a core layer is even. Referring to FIG. 7A, when a pulse position value found in the fixed codebook of the core layer is even, the probability that a pulse position value corresponding to an odd number is selected from the fixed codebook of the enhancement layer is significantly high. FIG. 7B is a graph illustrating a probability that the position of each pulse is selected from the fixed codebook of the enhancement layer, when a pulse position value found in the fixed codebook of the core layer is odd. Referring to FIG. 7B, when a pulse position value found in the fixed codebook of the core layer is odd, the probability that a pulse position value corresponding to an even number is selected from the fixed codebook of the enhancement layer is significantly high. Hence, each of the codebooks of the core layer and the enhancement layer may be configured by classifying odd-numbered possible pulse positions into a first space and even-numbered possible pulse positions into a second space. Alternatively, as illustrated in FIG. 6, each of the codebooks of the core layer and the enhancement layer may be configured by classifying the even-numbered possible pulse positions into the first space 610 and the odd-numbered possible pulse positions into the second space 620.
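The even/odd classification may be sketched as follows. The track of eight candidate positions and the function name `split_by_parity` are assumptions chosen for illustration; the actual position tables of FIGS. 8A through 9D are not reproduced here.

```python
def split_by_parity(positions):
    """Classify candidate pulse positions into an even space and an odd space."""
    even_space = [p for p in positions if p % 2 == 0]
    odd_space = [p for p in positions if p % 2 == 1]
    return even_space, odd_space


if __name__ == "__main__":
    # Hypothetical track: eight candidate positions spaced five samples apart.
    track = list(range(0, 40, 5))
    even_space, odd_space = split_by_parity(track)
    print(even_space)
    print(odd_space)
```

Under the alternative of FIG. 6, the even space would play the role of the first space 610 and the odd space the role of the second space 620.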


Referring back to FIG. 4, in operation 420, the codeword corresponding to the index of the adaptive codebook of the core layer output in the operation 400 is searched for from the adaptive codebook of the core layer and is decoded.


In operation 425, a codeword corresponding to the index of the fixed codebook of the enhancement layer output in the operation 400 is searched for in spaces of the fixed codebook of the enhancement layer excluding the space determined in the operation 415 and is decoded. Accordingly, if each of the fixed codebooks of the core layer and the enhancement layer is divided into the first and second spaces 610 and 620 (See FIG. 6), and the first space 610 is determined in the operation 415, a codeword is searched for in the second space 620. If the second space 620 is determined in the operation 415, a codeword is searched for in the first space 610.


The fixed codebook of the enhancement layer is configured by classifying combinations of possible pulse positions into spaces corresponding to the spaces of the fixed codebook of the core layer.
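The relationship between the operations 415 and 425 may be sketched under strong simplifying assumptions: two spaces split by parity, an identifier bit of 0 for the even space and 1 for the odd space, and an index that maps to the position `2 * index + parity`. This mapping and the function names are hypothetical, since the bit layouts of FIGS. 8A and 9A are not reproduced here.

```python
def core_position(index, offset_bit):
    """Operation 415 (sketch): decode a pulse position in the space
    indicated by the identifier bit (0 = even space, 1 = odd space)."""
    if offset_bit not in (0, 1):
        raise ValueError("offset_bit must be 0 or 1")
    return 2 * index + offset_bit


def enhancement_position(index, offset_bit):
    """Operation 425 (sketch): decode a pulse position in the space that
    is NOT indicated by the core-layer identifier bit."""
    if offset_bit not in (0, 1):
        raise ValueError("offset_bit must be 0 or 1")
    return 2 * index + (1 - offset_bit)


if __name__ == "__main__":
    print(core_position(3, 0))
    print(enhancement_position(3, 0))
```

In this sketch the parity used for the enhancement layer follows directly from the core-layer identifier.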


In operation 430, the fixed codebook gain and the adaptive codebook gain output in the operation 400 are decoded.


In operation 435, a difference between the fixed codebook gains of the core layer and the enhancement layer output in the operation 400 is decoded.


In operation 440, a predetermined operation is executed on the results of the decoding in the operations 415, 420, 430, and 435.


In operation 445, the result of the operation performed in the operation 440 is synthesized in a synthesis filter using the decoded LPC coefficient output in the operation 405, to thereby restore the speech signal.
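The synthesis filtering of the operation 445 may be illustrated with a conventional all-pole LPC synthesis filter. The sketch assumes the convention A(z) = 1 + a1·z^-1 + ... + ap·z^-p, so that each output sample equals the excitation sample minus the weighted sum of past outputs; the function name and the coefficient values are hypothetical, and the "predetermined operation" of the operation 440 that produces the excitation is not modeled.

```python
def lpc_synthesis(excitation, lpc_coeffs):
    """Filter an excitation through 1/A(z), with A(z) = 1 + sum(a_k z^-k).

    lpc_coeffs holds a_1 .. a_p (sign convention assumed for this sketch).
    """
    output = []
    for n, sample in enumerate(excitation):
        acc = sample
        for k, a_k in enumerate(lpc_coeffs, start=1):
            if n - k >= 0:
                acc -= a_k * output[n - k]
        output.append(acc)
    return output


if __name__ == "__main__":
    # A unit impulse through a first-order filter with a_1 = -0.5.
    print(lpc_synthesis([1.0, 0.0, 0.0], [-0.5]))
```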


In operation 450, the quality of the speech signal restored in the operation 445 is improved to thereby output an improved restored speech signal. More specifically, in the operation 450, the quality of the speech signal restored in the operation 445 is improved by filtering the restored speech signal using a high pass filter and the decoded LPC coefficient output in the operation 405.


A codebook searching method according to embodiments of the present general inventive concept is performed during the speech signal encoding method of FIG. 3 and the speech signal decoding method of FIG. 4.



FIG. 5 is a flowchart illustrating a method of searching for a fixed codebook, according to an embodiment of the present general inventive concept. Each of the fixed codebooks of the core layer and the enhancement layer may be configured by classifying combinations of possible pulse positions into the first and second spaces 610 and 620 (See FIG. 6).


The first space 610 may include the possible positions of pulses that are highly likely to be searched for in a core layer.


The first and second spaces 610 and 620 may be distinguished from each other according to whether possible pulse positions are even or odd. FIG. 7A is a graph illustrating a probability that the position of each pulse is selected from a fixed codebook of an enhancement layer, when a pulse position value found in the fixed codebook of a core layer is even. Referring to FIG. 7A, when a pulse position value found in the fixed codebook of the core layer is even, the probability that a pulse position value corresponding to an odd number is selected from the fixed codebook of the enhancement layer is significantly high. FIG. 7B is a graph illustrating a probability that the position of each pulse is selected from the fixed codebook of the enhancement layer, when a pulse position value found in the fixed codebook of the core layer is odd. Referring to FIG. 7B, when a pulse position value found in the fixed codebook of the core layer is odd, the probability that a pulse position value corresponding to an even number is selected from the fixed codebook of the enhancement layer is significantly high. Hence, each of the codebooks of the core layer and the enhancement layer may be configured by classifying odd-numbered possible pulse positions into a first space and even-numbered possible pulse positions into a second space. Alternatively, as illustrated in FIG. 6, each of the codebooks of the core layer and the enhancement layer may be configured by classifying the even-numbered possible pulse positions into the first space 610 and the odd-numbered possible pulse positions into the second space 620.


Referring back to FIG. 5, first, in operation 500, a fixed codebook vector that minimizes a mean squared error (MSE) of a target signal is searched for in each of the first and second spaces 610 and 620 of the fixed codebook of the core layer.


In operation 510, a distorted value D1 of the fixed codebook vector selected from the second space 620 of the fixed codebook of the core layer in the operation 500 is subtracted from a distorted value D0 of the fixed codebook vector selected from the first space 610 of the fixed codebook of the core layer in the operation 500.


In operation 520, it is determined whether a value D0-D1 corresponding to the result of the subtraction in the operation 510 is larger than 0.


In operation 530, if it is determined in the operation 520 that the value D0-D1 is larger than 0, an identifier of the first space 610 of the fixed codebook of the core layer is generated. Here, the identifier represents a specific space provided in the fixed codebook of the core layer as a bit “offset” illustrated in FIGS. 8A and 9A.


After the operation 530, in operation 540, only the second space 620 of the fixed codebook of the enhancement layer is searched for a fixed codebook vector.


In operation 550, if it is determined in the operation 520 that the value D0-D1 is less than or equal to 0, an identifier of the second space 620 of the fixed codebook of the core layer is generated.


In operation 560, only the first space 610 of the fixed codebook of the enhancement layer is searched for a fixed codebook vector.
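The branch of the operations 510 through 560 may be transcribed as the following sketch. The values D0 and D1 are taken as given inputs, since how they are computed is outside the scope of the example; the function name and the use of 610 and 620 as labels are assumptions made for illustration.

```python
FIRST_SPACE = 610   # label borrowed from FIG. 6 (illustrative only)
SECOND_SPACE = 620  # label borrowed from FIG. 6 (illustrative only)


def choose_spaces(d0, d1):
    """Return (identifier_space, enhancement_search_space) as worded in FIG. 5.

    d0: value for the vector selected from the first space (operation 500)
    d1: value for the vector selected from the second space (operation 500)
    """
    difference = d0 - d1                 # operation 510
    if difference > 0:                   # operation 520
        identifier = FIRST_SPACE         # operation 530
        enhancement = SECOND_SPACE       # operation 540
    else:
        identifier = SECOND_SPACE        # operation 550
        enhancement = FIRST_SPACE        # operation 560
    return identifier, enhancement


if __name__ == "__main__":
    print(choose_spaces(2.0, 1.0))
    print(choose_spaces(1.0, 1.0))
```

In both branches the enhancement layer searches only the space that the identifier does not indicate.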



FIG. 8A illustrates bits allocated to a fixed codebook of a core layer according to an embodiment of the present general inventive concept. FIG. 8B illustrates bits allocated to a fixed codebook of an enhancement layer according to an embodiment of the present general inventive concept. FIG. 8C illustrates bits allocated to a G.729 fixed codebook of a core layer. FIG. 8D illustrates bits allocated to a G.729 fixed codebook of an enhancement layer.


FIG. 9A illustrates bits allocated to a fixed codebook of a core layer according to another embodiment of the present general inventive concept. FIG. 9B illustrates bits allocated to a fixed codebook of an enhancement layer according to another embodiment of the present general inventive concept. FIG. 9C illustrates bits allocated to a fixed codebook of a core layer in 3GPP2 VMR-WB rate set-1. FIG. 9D illustrates bits allocated to a fixed codebook of an enhancement layer in 3GPP2 VMR-WB rate set-1.


FIG. 10A is a graph illustrating results of a comparison between a PESQ (perceptual evaluation of speech quality) of an embodiment of the present general inventive concept and a PESQ of the prior art. In FIG. 10A, the PESQ of the present embodiment is represented by a dotted bar graph while the PESQ of the prior art is represented by a bar graph having diagonal lines.


FIG. 10B is a graph illustrating results of a comparison between bits for each sub-frame used in a fixed codebook in an embodiment of the present general inventive concept and bits for each sub-frame used in a fixed codebook in the prior art. In FIG. 10B, the number of bits of the present embodiment is represented by a dotted bar graph while the number of bits of the prior art is represented by a bar graph having diagonal lines.


In a fixed codebook searching method and apparatus according to embodiments of the present general inventive concept and a speech signal encoding/decoding method and apparatus using the fixed codebook searching method and apparatus, in order to reduce a bit rate without degrading a performance in an enhancement layer based on CELP, each of a fixed codebook of a core layer and a fixed codebook of the enhancement layer is divided into a plurality of spaces. Accordingly, spaces of the fixed codebook of the enhancement layer excluding a space corresponding to the least distorted space determined from among the spaces of the fixed codebook of the core layer are searched.


By doing this, bits for the position values represented with underlining do not need to be allocated to the fixed codebooks of FIGS. 8A, 8B, 9A, and 9B according to the present general inventive concept. Hence, the fixed codebooks of FIGS. 8A, 8B, 9A, and 9B can have a smaller number of bits than the number of bits allocated to the G.729 fixed codebooks illustrated in FIGS. 8C and 8D and the number of bits allocated to the fixed codebooks in 3GPP2 VMR-WB rate set-1 illustrated in FIGS. 9C and 9D. The effect of using a smaller number of bits in the fixed codebook according to the present general inventive concept can be seen from the PESQ results illustrated in FIG. 10A and from the comparison, illustrated in FIG. 10B, between the bits for each sub-frame used in a fixed codebook in the present general inventive concept and the bits for each sub-frame used in a fixed codebook in the prior art. Therefore, in a fixed codebook searching method and apparatus according to embodiments of the present general inventive concept and a speech signal encoding/decoding method and apparatus using the fixed codebook searching method and apparatus, a speech signal can be encoded or decoded using a small number of bits without degrading the performance.
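The source of the bit saving may be illustrated with simple arithmetic. The sketch assumes a hypothetical pulse with eight candidate positions, split evenly into two spaces; these numbers are not the allocations of FIGS. 8A through 9D, and the function name `position_bits` is introduced only for the example.

```python
def position_bits(num_positions):
    """Smallest number of bits that can index num_positions candidates."""
    if num_positions < 1:
        raise ValueError("num_positions must be at least 1")
    return (num_positions - 1).bit_length()


if __name__ == "__main__":
    full_track = 8               # hypothetical: all candidate positions
    one_space = full_track // 2  # hypothetical: one of two equal spaces
    print(position_bits(full_track))
    print(position_bits(one_space))
```

Restricting a pulse to one of two equally sized spaces halves the number of candidates, so one fewer position bit is needed for that pulse in this illustration.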


The general inventive concept can be embodied as computer readable code on a computer readable recording medium, where a computer denotes any device having an information processing function. The computer readable recording medium is any data storage device that can store programs or data which can be thereafter read by a computer system. Examples of the computer readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, hard disks, floppy disks, flash memory, optical data storage devices, and so on.


Although a few embodiments of the present general inventive concept have been shown and described, it will be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the general inventive concept, the scope of which is defined in the appended claims and their equivalents.

Claims
  • 1. A fixed codebook searching apparatus, comprising: a core layer codebook including a plurality of spaces into which combinations of possible positions of pulses are classified; a core layer searching unit to search each of the spaces of the core layer codebook and to determine a least distorted space from among the spaces of the core layer codebook; an enhancement layer codebook including a plurality of spaces corresponding to the spaces of the core layer codebook; and an enhancement layer searching unit to search the spaces of the enhancement layer codebook excluding the space in the enhancement layer codebook that corresponds to the determined space in the core layer codebook.
  • 2. The fixed codebook searching apparatus of claim 1, wherein each of the core layer codebook and the enhancement layer codebook is configured by classifying combinations of the possible positions of pulses into a first space and a second space.
  • 3. The fixed codebook searching apparatus of claim 2, wherein the first space comprises possible positions of pulses that are highly likely to be searched from the core layer codebook.
  • 4. The fixed codebook searching apparatus of claim 2, wherein the possible pulse positions are classified into the first and second spaces of each of the core layer codebook and the enhancement layer codebook according to whether the possible pulse positions are even or odd.
  • 5. The fixed codebook searching apparatus of claim 1, wherein the core layer searching unit comprises: a searcher to search each of the spaces of the core layer codebook; a space determiner to determine the least distorted space from among the searched spaces; and an identifier generator to generate an identifier indicating the determined space.
  • 6. An apparatus to encode a speech signal, the apparatus comprising: a core layer codebook including a plurality of spaces into which combinations of possible positions of pulses are classified; a core layer generating unit to search each of the spaces of the core layer codebook and to generate a core layer by determining a least distorted space from among the spaces of the core layer codebook; an enhancement layer codebook including a plurality of spaces corresponding to the spaces of the core layer codebook; an enhancement layer generating unit to generate an enhancement layer by searching spaces of the enhancement layer codebook excluding a space in the enhancement layer codebook that corresponds to the determined space of the core layer codebook; and an encoding unit to encode the speech signal into a core layer and an enhancement layer.
  • 7. The apparatus of claim 6, wherein each of the core layer codebook and the enhancement layer codebook is configured by classifying combinations of the possible positions of pulses into a first space and a second space.
  • 8. The apparatus of claim 7, wherein the first space comprises possible positions of pulses that are highly likely to be searched from the core layer codebook.
  • 9. The apparatus of claim 7, wherein the possible pulse positions are classified into the first and second spaces of each of the core layer codebook and the enhancement layer codebook according to whether the possible pulse positions are even or odd.
  • 10. The apparatus of claim 6, wherein the core layer generating unit comprises: a searcher to search each of the spaces of the core layer codebook; a space determiner to determine a space to which a least distorted result from among results found in the searched spaces belongs; a layer generator to generate the core layer using the least distorted result found in the determined space; and an identifier generator to generate an identifier indicating the determined space.
  • 11. An encoding apparatus to encode a speech signal, the apparatus comprising: a core layer generation unit having a core fixed codebook with spaces that are searchable for codes to encode a core layer of the speech signal; and an enhancement layer generation unit having an enhancement fixed codebook with spaces that are searchable for codes to encode an enhancement layer of the speech signal, the searchable spaces of the enhancement fixed codebook being different from the searchable spaces of the core fixed codebook.
  • 12. The apparatus of claim 11, wherein the core fixed codebook and the enhancement fixed codebook each have different encoding information based on where pulses occurring in a sub-frame of a core layer and an enhancement layer of the speech signal are likely to occur.
  • 13. The apparatus of claim 11, wherein different position bits are allocated to each of the core fixed codebook and the enhancement fixed codebook.
  • 14. The apparatus of claim 11, wherein the core fixed codebook and the enhancement fixed codebook are divided into predetermined groups of pulse position bits such that an enhancement layer is encoded using a first group of pulse position bits and a core layer is encoded using a second group of pulse position bits.
  • 15. The apparatus of claim 11, wherein the core layer generation unit searches the core fixed codebook for a first fixed codebook vector that minimizes distortion with respect to a first signal and includes a space indicator to indicate the space of the core fixed codebook in which the codebook vector is found.
  • 16. The apparatus of claim 15, wherein the enhancement layer generation unit searches the enhancement fixed codebook for a second fixed codebook vector in a space of the enhancement fixed codebook that does not correspond to the space indicated by the space indicator.
  • 17. The apparatus of claim 11, wherein the core layer generation unit further includes an adaptive codebook to output an adaptive codebook vector indicative of pitch information of the speech signal.
  • 18. An encoding apparatus to encode a speech signal, the apparatus comprising: a core layer generation unit having a first fixed codebook with at least a first portion and a second portion, both the first and second portions being searchable to find a first fixed codebook vector that minimizes distortion with respect to a first signal; and an enhancement layer generation unit having a second fixed codebook with at least a first portion and a second portion corresponding to the first and second portions of the first fixed codebook, the first portion of the second fixed codebook being searchable for a second fixed codebook vector when the first fixed codebook vector is found in the second portion of the first fixed codebook, and the second portion of the second fixed codebook being searchable for the second fixed codebook vector when the first fixed codebook vector is found in the first portion of the first fixed codebook.
  • 19. An apparatus to decode a speech signal encoded into a core layer and an enhancement layer, the apparatus comprising: a core layer codebook including a plurality of spaces into which combinations of possible positions of pulses are classified; a core layer decoding unit to decode the core layer by searching a space of the core layer codebook that is indicated by an identifier included in the encoded speech signal; an enhancement layer codebook including a plurality of spaces corresponding to the spaces of the core layer codebook; and an enhancement layer decoding unit to decode the enhancement layer by searching spaces of the enhancement layer codebook excluding a space in the enhancement layer codebook that corresponds to the determined space of the core layer codebook.
  • 20. The apparatus of claim 19, wherein the identifier included in the encoded speech signal indicates a space of the core layer codebook that is used to decode the encoded speech signal.
  • 21. The apparatus of claim 19, wherein each of the core layer codebook and the enhancement layer codebook is configured by classifying the combinations of possible positions of pulses into a first space and a second space.
  • 22. The apparatus of claim 21, wherein the first space comprises possible positions of pulses that are highly likely to be searched from the core layer codebook.
  • 23. The apparatus of claim 21, wherein the possible pulse positions are classified into the first and second spaces of each of the core layer codebook and the enhancement layer codebook according to whether the possible pulse positions are even or odd.
  • 24. A decoding apparatus to decode an encoded speech signal, the apparatus comprising: a core layer decoding unit having a core fixed codebook with spaces that are searchable for codes to decode a core layer of the encoded speech signal; and an enhancement layer decoding unit having an enhancement fixed codebook with spaces that are searchable for codes to decode an enhancement layer of the encoded speech signal, the searchable spaces of the enhancement fixed codebook being different from the searchable spaces of the core fixed codebook.
  • 25. A fixed codebook searching method, comprising: searching spaces of a core layer codebook; determining a least distorted space from among the spaces of the core layer codebook; and searching spaces of an enhancement layer codebook excluding a space corresponding to the determined space of the core layer codebook, wherein the core layer codebook is configured by classifying possible pulse positions into a plurality of spaces, and the enhancement layer codebook is configured by classifying possible pulse positions into a plurality of spaces corresponding to the spaces of the core layer codebook.
  • 26. The fixed codebook searching method of claim 25, wherein each of the core layer codebook and the enhancement layer codebook is configured by classifying combinations of possible positions of pulses into a first space and a second space.
  • 27. The fixed codebook searching method of claim 26, wherein the first space comprises possible positions of pulses that are highly likely to be searched from the core layer codebook.
  • 28. The fixed codebook searching method of claim 26, wherein the possible pulse positions are classified into the first and second spaces of each of the core layer codebook and the enhancement layer codebook according to whether the possible pulse positions are even or odd.
  • 29. The fixed codebook searching method of claim 25, wherein the determining of the least distorted space comprises generating an identifier indicating the determined space.
  • 30. A method of searching a fixed codebook, the method comprising: searching for a fixed codebook vector in first and second spaces of a fixed codebook of a core layer; comparing a distortion value of a first fixed codebook vector selected from the first space with a distortion value of a second fixed codebook vector selected from the second space; generating an identifier to indicate one of the first and second spaces based on the comparison of the distortion values; and searching another one of the first and second spaces not indicated by the identifier for a fixed codebook vector of an enhancement layer.
  • 31. A method of encoding a speech signal, the method comprising: searching spaces of a core layer codebook; generating a core layer by determining a least distorted space from among the spaces of the core layer codebook; generating an enhancement layer by searching spaces of an enhancement layer codebook excluding a space corresponding to the determined space of the core layer codebook; and encoding the speech signal into a core layer and an enhancement layer, wherein the core layer codebook is configured by classifying possible pulse positions into a plurality of spaces, and the enhancement layer codebook is configured by classifying possible pulse positions into a plurality of spaces corresponding to the spaces of the core layer codebook.
  • 32. The method of claim 31, wherein each of the core layer codebook and the enhancement layer codebook is configured by classifying combinations of possible positions of pulses into a first space and a second space.
  • 33. The method of claim 32, wherein the first space comprises possible positions of pulses that are highly likely to be searched from the core layer codebook.
  • 34. The method of claim 32, wherein possible pulse positions are classified into the first and second spaces of each of the core layer codebook and the enhancement layer codebook according to whether the possible pulse positions are even or odd.
  • 35. The method of claim 31, wherein the determining of the least distorted space comprises generating an identifier indicating the determined space.
  • 36. A method of decoding a speech signal encoded into a core layer and an enhancement layer, the method comprising: decoding the core layer by searching a space of a core layer codebook that is indicated by an identifier included in the encoded speech signal; and decoding the enhancement layer by searching spaces of an enhancement layer codebook excluding a space corresponding to the determined space of the core layer codebook, wherein the core layer codebook is configured by classifying possible pulse positions into a plurality of spaces, and the enhancement layer codebook is configured by classifying possible pulse positions into a plurality of spaces corresponding to the spaces of the core layer codebook.
  • 37. The method of claim 36, wherein the identifier included in the encoded speech signal indicates a space of the core layer codebook that is used to decode the encoded speech signal.
  • 38. The method of claim 36, wherein each of the core layer codebook and the enhancement layer codebook is configured by classifying combinations of possible positions of pulses into a first space and a second space.
  • 39. The method of claim 38, wherein the first space comprises possible positions of pulses that are highly likely to be searched from the core layer codebook.
  • 40. The method of claim 38, wherein possible pulse positions are classified into the first and second spaces of each of the core layer codebook and the enhancement layer codebook according to whether the possible pulse positions are even or odd.
  • 41. A computer readable recording medium that records a computer program for executing a fixed codebook searching method, comprising: executable code to search each of spaces of a core layer codebook; executable code to determine a least distorted space from among the spaces of the core layer codebook; and executable code to search spaces of an enhancement layer codebook excluding a space corresponding to the determined space of the core layer codebook, wherein the core layer codebook is configured by classifying possible pulse positions into a plurality of spaces, and the enhancement layer codebook is configured by classifying possible pulse positions into a plurality of spaces corresponding to the spaces of the core layer codebook.
Priority Claims (1)
Number Date Country Kind
2006-47118 May 2006 KR national