METHOD AND DEVICE FOR PROCESSING AUDIO SIGNALS

Abstract
The present invention provides a method for processing audio signals, and the method comprises the steps of: receiving input audio signals corresponding to a plurality of spectral coefficients; obtaining location information that indicates a location of a particular spectral coefficient among said spectral coefficients, on the basis of energy of said input signals; generating a shape vector by using said location information and said spectral coefficients; determining a codebook index by searching a codebook corresponding to said shape vector; and transmitting said codebook index and said location information, wherein said shape vector is generated by using a part selected from said spectral coefficients, and said selected part is selected on the basis of said location information.
Description
TECHNICAL FIELD

The present invention relates to an apparatus for processing an audio signal and method thereof. Although the present invention is suitable for a wide scope of applications, it is particularly suitable for encoding or decoding an audio signal.


BACKGROUND ART

Generally, a frequency transform (e.g., MDCT (modified discrete cosine transform)) may be performed on an audio signal. In doing so, the MDCT coefficients resulting from the MDCT are transmitted to a decoder, and the decoder reconstructs the audio signal by performing an inverse frequency transform (e.g., iMDCT (inverse MDCT)) using the MDCT coefficients.


DISCLOSURE OF THE INVENTION
Technical Problem

However, in the course of transmitting the MDCT coefficients, if all of the data are transmitted, bit rate efficiency may be lowered. On the other hand, if only such data as pulses and the like are transmitted, the reconstruction rate may be lowered.


Technical Solution

Accordingly, the present invention is directed to substantially obviate one or more of the problems due to limitations and disadvantages of the related art. An object of the present invention is to provide an apparatus for processing an audio signal and method thereof, by which a shape vector generated on the basis of energy can be used to transmit a spectral coefficient (e.g., MDCT coefficient).


Another object of the present invention is to provide an apparatus for processing an audio signal and method thereof, by which a shape vector is normalized and then transmitted to reduce a dynamic range in transmitting a shape vector.


A further object of the present invention is to provide an apparatus for processing an audio signal and method thereof, by which in transmitting a plurality of normalized values generated per step, vector quantization is performed on the rest of the values except an average of the values.


Advantageous Effects

Accordingly, the present invention provides the following effects and/or features.


First of all, in transmitting a spectral coefficient, since a shape vector generated on the basis of energy is transmitted, the reconstruction rate can be raised with a relatively small number of bits.


Secondly, since a shape vector is normalized and then transmitted, the present invention reduces a dynamic range, thereby raising bit efficiency.


Thirdly, the present invention transmits a plurality of shape vectors by repeating a shape vector generating step in multi-stages, thereby reconstructing a spectral coefficient more accurately without raising a bitrate considerably.


Fourthly, in transmitting a normalized value, the present invention separately transmits an average of a plurality of normalized values and vector-quantizes a value corresponding to a differential vector only, thereby raising bit efficiency.


Fifthly, the result of vector quantization performed on the normalized value differential vector has almost no correlation with the SNR and the total number of bits assigned to the differential vector, but has high correlation with the total bit number of the shape vector. Hence, even though a relatively small number of bits is assigned to the normalized value differential vector, the reconstruction rate is not considerably affected.





DESCRIPTION OF DRAWINGS


FIG. 1 is a block diagram of an audio signal processing apparatus according to an embodiment of the present invention.



FIG. 2 is a diagram for describing a process for generating a shape vector.



FIG. 3 is a diagram for describing a process for generating a shape vector by a multi-stage (m=0, . . . ) process.



FIG. 4 shows one example of a codebook necessary for vector quantization of a shape vector.



FIG. 5 is a diagram for a relation between the total bit number of a shape vector and a signal to noise ratio (SNR).



FIG. 6 is a diagram for a relation between the total bit number of a normalized value differential code vector and a signal to noise ratio (SNR).



FIG. 7 is a diagram for one example of a syntax for elements included in a bitstream.



FIG. 8 is a diagram for configuration of a decoder in an audio signal processing apparatus according to one embodiment of the present invention.



FIG. 9 is a schematic block diagram of a product in which an audio signal processing apparatus according to one embodiment of the present invention is implemented;



FIG. 10 is a diagram for explaining relations between products in which an audio signal processing apparatus according to one embodiment of the present invention is implemented.



FIG. 11 is a schematic block diagram of a mobile terminal in which an audio signal processing apparatus according to one embodiment of the present invention is implemented.





BEST MODE

To achieve these and other advantages and in accordance with the purpose of the present invention, as embodied and broadly described, a method of processing an audio signal according to one embodiment of the present invention may include the steps of receiving an input audio signal corresponding to a plurality of spectral coefficients, obtaining a location information indicating a location of a specific one of a plurality of the spectral coefficients based on energy of the input signal, generating a shape vector using the location information and the spectral coefficients, determining a codebook index by searching a codebook corresponding to the shape vector, and transmitting the codebook index and the location information, wherein the shape vector is generated using a part selected from the spectral coefficients and wherein the selected part is selected based on the location information.


According to the present invention, the method may further include the steps of generating a sign information on the specific spectral coefficient and transmitting the sign information, wherein the shape vector is generated further based on the sign information.


According to the present invention, the method may further include the step of generating a normalized value for the selected part. The codebook index determining step may include the steps of generating a normalized shape vector by normalizing the shape vector using the normalized value and determining the codebook index by searching the codebook corresponding to the normalized shape vector.


According to the present invention, the method may further include the steps of calculating a mean of 1st to Mth stage normalized values, generating a differential vector using a value resulting from subtracting the mean from the 1st to Mth stage normalized values, determining the normalized value index by searching the codebook corresponding to the differential vector, and transmitting the mean and the normalized index corresponding to the normalized value.


According to the present invention, the input audio signal may include an (m+1)th stage input signal, the shape vector may include an (m+1)th stage shape vector, the normalized value may include an (m+1)th stage normalized value, and the (m+1)th stage input signal may be generated based on an mth stage input signal, an mth stage shape vector and an mth stage normalized value.


According to the present invention, the codebook index determining step may include the steps of searching the codebook using a cost function including a weight factor and the shape vector and determining the codebook index corresponding to the shape vector and the weight factor may vary in accordance with the selected part.


According to the present invention, the method may further include the steps of generating a residual signal using the input audio signal and a shape code vector corresponding to the codebook index and generating an envelope parameter index by performing a frequency envelope coding on the residual signal.


To further achieve these and other advantages and in accordance with the purpose of the present invention, an apparatus for processing an audio signal according to another embodiment of the present invention may include a location detecting unit receiving an input audio signal corresponding to a plurality of spectral coefficients, the location detecting unit obtaining a location information indicating a location of a specific one of a plurality of the spectral coefficients based on energy of the input signal, a shape vector generating unit generating a shape vector using the location information and the spectral coefficients, a vector quantizing unit determining a codebook index by searching a codebook corresponding to the shape vector, and a multiplexing unit transmitting the codebook index and the location information, wherein the shape vector is generated using a part selected from the spectral coefficients and wherein the selected part is selected based on the location information.


According to the present invention, the location detecting unit may generate a sign information on the specific spectral coefficient, the multiplexing unit may transmit the sign information, and the shape vector may be generated further based on the sign information.


According to the present invention, the shape vector generating unit may further generate a normalized value for the selected part and generate a normalized shape vector by normalizing the shape vector using the normalized value. And, the vector quantizing unit may determine the codebook index by searching the codebook corresponding to the normalized shape vector.


According to the present invention, the apparatus may further include a normalized value encoding unit calculating a mean of 1st to Mth stage normalized values, the normalized value encoding unit generating a differential vector using a value resulting from subtracting the mean from the 1st to Mth stage normalized values, the normalized value encoding unit determining the normalized value index by searching the codebook corresponding to the differential vector, the normalized value encoding unit transmitting the mean and the normalized index corresponding to the normalized value.


According to the present invention, the input audio signal may include an (m+1)th stage input signal, the shape vector may include an (m+1)th stage shape vector, the normalized value may include an (m+1)th stage normalized value, and the (m+1)th stage input signal may be generated based on an mth stage input signal, an mth stage shape vector and an mth stage normalized value.


According to the present invention, the vector quantizing unit may search the codebook using a cost function including a weight factor and the shape vector and determine the codebook index corresponding to the shape vector. And, the weight factor may vary in accordance with the selected part.


According to the present invention, the apparatus may further include a residual encoding unit generating a residual signal using the input audio signal and a shape code vector corresponding to the codebook index, the residual encoding unit generating an envelope parameter index by performing a frequency envelope coding on the residual signal.


MODE FOR INVENTION

Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. First of all, terminologies or words used in this specification and claims are not to be construed as limited to their general or dictionary meanings and should be construed as the meanings and concepts matching the technical idea of the present invention, based on the principle that an inventor is able to appropriately define the concepts of terminologies to describe the invention in the best way. The embodiments disclosed in this disclosure and the configurations shown in the accompanying drawings are merely preferred embodiments and do not represent all of the technical ideas of the present invention. Therefore, it is understood that the present invention covers the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents at the time of filing this application.


According to the present invention, the following terminologies may be construed in accordance with the following references, and terminologies not disclosed in this specification can be construed as the meanings and concepts matching the technical idea of the present invention. Specifically, 'coding' can be construed as 'encoding' or 'decoding' selectively, and 'information' in this disclosure is a terminology that generally includes values, parameters, coefficients, elements and the like; its meaning may be construed differently depending on context, by which the present invention is non-limited.


In this disclosure, in a broad sense, an audio signal is conceptually distinguished from a video signal and designates all kinds of signals that can be identified auditorily. In a narrow sense, an audio signal means a signal having no or few speech characteristics. The audio signal of the present invention should be construed in the broad sense, yet it can be understood as an audio signal in the narrow sense when used as distinguished from a speech signal.


Although coding may be specified as encoding only, it can also be construed as including both encoding and decoding.



FIG. 1 is a block diagram of an audio signal processing apparatus according to an embodiment of the present invention. Referring to FIG. 1, an encoder 100 includes a location detecting unit 110 and a shape vector generating unit 120. The encoder 100 may further include at least one of a vector quantizing unit 130, an (m+1)th stage input signal generating unit 140, a normalized value encoding unit 150, a residual generating unit 160, a residual encoding unit 170 and a multiplexing unit 180. The encoder 100 may further include a transform unit (not shown in the drawing) configured to generate a spectral coefficient or may receive a spectral coefficient from an external device.


In the following description, the functions of the above components are schematically explained. First of all, the encoder 100 receives or generates spectral coefficients, detects the location of a high-energy sample from the spectral coefficients, generates a shape vector based on the detected location, normalizes it, and then performs vector quantization. Generation, normalization and vector quantization of a shape vector are repeatedly performed on the input signals of the subsequent stages (m=1, . . . , M−1). Encoding is performed on the plurality of normalized values generated over the multiple stages, a residual is generated using the shape vector coding result, and residual coding is then performed on the generated residual.


In the following description, the functions of the above components shall be explained in detail.


First of all, the location detecting unit 110 receives spectral coefficients as an input signal X0 (of a 1st stage, m=0) and then detects the location of the coefficient having the maximum sample energy from the coefficients. In this case, the spectral coefficients correspond to a result of frequency transform of an audio signal of a single frame (e.g., 20 ms). For instance, if the frequency transform includes MDCT, the corresponding result may include MDCT (modified discrete cosine transform) coefficients. Moreover, it may correspond to MDCT coefficients constructed with frequency components on a low frequency band (4 kHz or lower).


The input signal X0 of the 1st stage (m=0) is a set of total N spectral coefficients and may be represented as follows.






X0 = [x0(0), x0(1), . . . , x0(N−1)]  [Formula 1]


In Formula 1, X0 indicates an input signal of a 1st stage (m=0) and N indicates the total number of spectral coefficients.


The location detecting unit 110 determines a frequency (or a frequency location) km corresponding to a coefficient having a maximum sample energy for the input signal X0 of the 1st stage (m=0) as follows.










km = arg max (xm(n))², 0 ≤ n < N  [Formula 2]







In Formula 2, Xm indicates the (m+1)th stage input signal (spectral coefficient), n indicates an index of a coefficient, N indicates the total number of coefficients of an input signal, and km indicates a frequency (or location) corresponding to a coefficient having a maximum sample energy.
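As an illustration only (not the claimed implementation), the peak search of Formula 2 can be sketched in Python/NumPy as follows; the function name and the synthetic test frame are assumptions introduced for the example.

```python
import numpy as np

def detect_location(x_m):
    """Sketch of the location detecting unit 110: return the index k_m of the
    coefficient with maximum sample energy (Formula 2) and the sign of that
    coefficient. x_m is a 1-D array of N spectral coefficients."""
    k_m = int(np.argmax(x_m ** 2))          # location of the maximum sample energy
    sign = 1.0 if x_m[k_m] >= 0 else -1.0   # Sign(x_m(k_m))
    return k_m, sign

# Example with a synthetic 160-coefficient frame whose peak sits near n = 139,
# mirroring the example of FIG. 2.
x0 = np.random.randn(160) * 10.0
x0[139] = 450.0
print(detect_location(x0))                  # -> (139, 1.0)
```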


Meanwhile, if m is not 0 but is equal to or greater than 1 (i.e., in case of an input signal of an (m+1)th stage), an output of the (m+1)th stage input signal generating unit 140 is inputted to the location detecting unit 110 instead of the input signal X0 of the 1st stage (m=0), which shall be explained in the description of the (m+1)th stage input signal generating unit 140.


In FIG. 2, one example of spectral coefficients Xm(0)˜Xm(N−1), of which the total number N is about 160, is illustrated. Referring to FIG. 2, the value of the coefficient Xm(km) having the highest energy corresponds to about 450, and the frequency or location km corresponding to this coefficient is near n=140 (about 139).


Thus, once the location km is detected, a sign Sign(Xm(km)) of the coefficient Xm(km) corresponding to the location km is generated. This sign is generated so that the shape vectors generated later have positive (+) peak values.


As mentioned in the above description, the location detecting unit 110 generates the location km and the sign Sign(Xm(km)) and then forwards them to the shape vector generating unit 120 and the multiplexing unit 180.


Based on the input signal Xm, the received location km and the sign Sign(Xm(km)), the shape vector generating unit 120 generates a normalized shape vector Sm in 2L dimensions.













Sm = [xm(km−L+1), . . . , xm(km), . . . , xm(km+L)] · sign(xm(km)) / Gm = [sm(0), sm(1), . . . , sm(2L−1)]  [Formula 3]

Sm = [sm(n)]  (n = 0, . . . , 2L−1)














In Formula 3, Sm indicates a normalized shape vector of the (m+1)th stage, n indicates an element index of the shape vector, L indicates the dimension, km indicates the location (km=0˜N−1) of the coefficient having the maximum energy in the (m+1)th stage input signal, Sign(Xm(km)) indicates the sign of the coefficient having the maximum energy, 'Xm(km−L+1), . . . , Xm(km+L)' indicates the part selected from the spectral coefficients based on the location km, and Gm indicates a normalized value.


The normalized value Gm may be defined as follows.










Gm = √( (1/2L) · Σ_{l=−L+1}^{L} xm²(km+l) )  [Formula 4]







In Formula 4, Gm indicates a normalized value, Xm indicates an (m+1)th stage input signal, and L indicates dimension.


In particular, the normalized value can be calculated into an RMS (root mean square) value expressed as Formula 4.


Referring to FIG. 2, since the shape vector Sm corresponds to a set of total 2L coefficients on the right and left sides centering on km, if L=10, 10 coefficients are located on each of the right and left sides centering on the point '139'. Hence, the shape vector Sm may correspond to the set of the coefficients Xm(130), . . . , Xm(149) having 'n=130˜149'.


Meanwhile, as it is multiplied by Sign(Xm(km)) in Formula 3, the sign of the maximum peak component becomes a positive (+) value. By aligning the location and sign of the shape vector in this way and normalizing it by the RMS value, quantization efficiency using a codebook can be further raised.
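A minimal sketch of this shape vector extraction and RMS normalization (Formulas 3 and 4) is given below. It assumes the peak lies far enough from the band edges that the 2L-sample window fits; the function name and boundary handling are assumptions introduced for illustration only.

```python
import numpy as np

def extract_shape_vector(x_m, k_m, L=10):
    """Sketch of the shape vector generating unit 120: take the 2L coefficients
    x_m(k_m-L+1)..x_m(k_m+L), align the peak sign to positive, and normalize by
    the RMS value G_m (Formulas 3 and 4). Assumes L-1 <= k_m < N-L."""
    part = x_m[k_m - L + 1 : k_m + L + 1]   # selected part, 2L coefficients
    sign = 1.0 if x_m[k_m] >= 0 else -1.0   # Sign(x_m(k_m))
    g_m = np.sqrt(np.mean(part ** 2))       # normalized value (RMS), Formula 4
    s_m = sign * part / g_m                 # normalized shape vector, Formula 3
    return s_m, g_m, sign
```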


The shape vector generating unit 120 delivers the normalized shape vector Sm of the (m+1)th stage to the vector quantizing unit 130 and also delivers the normalized value Gm to the normalized value encoding unit 150.


The vector quantizing unit 130 vector-quantizes the normalized shape vector Sm. In particular, the vector quantizing unit 130 selects a code vector Ỹm most similar to the normalized shape vector Sm from the code vectors included in a codebook by searching the codebook, delivers the code vector Ỹm to the (m+1)th stage input signal generating unit 140 and the residual generating unit 160, and also delivers a codebook index Ymi corresponding to the selected code vector Ỹm to the multiplexing unit 180.


One example of the codebook is shown in FIG. 4. Referring to FIG. 4, after 8-dimensional shape vectors corresponding to 'L=4' have been extracted, a 5-bit vector quantization codebook is generated through a training process. In the diagram, it can be observed that the peak locations and signs of the code vectors configuring the codebook are aligned.


Meanwhile, before searching the codebook, the vector quantizing unit 130 defines a cost function as follows.










D(i) = Σ_{n=0}^{2L−1} wm(n) · (sm(n) − c(i, n))²  [Formula 5]







In Formula 5, i indicates a codebook index, D(i) indicates a cost function, n indicates an element index of a shape vector, sm(n) indicates an nth element of the shape vector of the (m+1)th stage, c(i, n) indicates an nth element of the code vector having codebook index i, and wm(n) indicates a weight factor.


The weight factor Wm (n) may be defined as follows.











wm(n) = sm²(n) / Σ_{n=0}^{2L−1} sm²(n)  [Formula 6]







In Formula 6, wm(n) indicates the weight factor, n indicates an element index of a shape vector, and sm(n) indicates an nth element of the shape vector of the (m+1)th stage. In this case, the weight factor varies in accordance with the shape vector Sm(n) or the selected part (Xm(km−L+1), . . . , Xm(km+L)).


The cost function is defined as in Formula 5, and a search is performed for the code vector Ci=[c(i, 0), c(i, 1), . . . , c(i, 2L−1)] that minimizes the cost function. In doing so, the weight factor wm(n) is applied to the error value of each spectral coefficient element. It represents the energy ratio occupied by each spectral coefficient element in the shape vector and may be defined as in Formula 6. In particular, by raising the significance of spectral coefficient elements having relatively high energy in the code vector search, quantization performance for those elements can be further enhanced.
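The weighted codebook search of Formulas 5 and 6 could look roughly like the following sketch; the codebook is assumed to be an array of trained 2L-dimensional code vectors, as suggested by FIG. 4, and the function name is an assumption for the example.

```python
import numpy as np

def search_codebook(s_m, codebook):
    """Sketch of the vector quantizing unit 130: return the codebook index i
    minimizing the weighted cost D(i) of Formula 5, with the energy-ratio
    weight of Formula 6. `codebook` has shape (num_code_vectors, 2L)."""
    w_m = s_m ** 2 / np.sum(s_m ** 2)                    # weight factor, Formula 6
    costs = np.sum(w_m * (s_m - codebook) ** 2, axis=1)  # D(i), Formula 5
    i_best = int(np.argmin(costs))
    return i_best, codebook[i_best]                      # codebook index, code vector
```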



FIG. 5 is a diagram for a relation between the total bit number of a shape vector and a signal to noise ratio (SNR). After vector quantization has been performed on the shape vector using codebooks ranging from 2 bits to 7 bits, the signal to noise ratio is measured through the error from the original signal. Referring to FIG. 5, it can be confirmed that the SNR increases by about 0.8 dB per additional bit.


Consequently, the code vector Ci that minimizes the cost function of Formula 5 is determined as the code vector Ỹm (or the shape code vector) of the shape vector, and its codebook index i is determined as the codebook index Ymi of the shape vector. As mentioned in the foregoing description, the codebook index Ymi is delivered to the multiplexing unit 180 as a result of the vector quantization. The shape code vector Ỹm is delivered to the (m+1)th stage input signal generating unit 140 for generation of an (m+1)th stage input signal and is delivered to the residual generating unit 160 for residual generation.


Meanwhile, for the 1st stage input signal (Xm, m=0), the location detecting unit 110 through the vector quantizing unit 130 generate a shape vector and then perform vector quantization on the generated shape vector. If m<(M−1), the (m+1)th stage input signal generating unit 140 is activated and the shape vector generation and the vector quantization are performed on the (m+1)th stage input signal. On the other hand, if m=M−1, the (m+1)th stage input signal generating unit 140 is not activated, but the normalized value encoding unit 150 and the residual generating unit 160 become active. In particular, if M=4, the (m+1)th stage input signal generating unit 140, the location detecting unit 110 and the vector quantizing unit 130 repeatedly perform the operations on the 2nd to 4th stage input signals for 'm=1, 2 and 3' after 'm=0 (i.e., the 1st stage input signal)'. In other words, for m=0˜3, after completion of the operations of the components 110, 120, 130 and 140, the normalized value encoding unit 150 and the residual generating unit 160 become active.


Before the (m+1)th stage input signal generating unit 140 becomes active, the operation 'm=m+1' is performed. In particular, if m=0, the (m+1)th stage input signal generating unit 140 operates for the case of 'm=1'. The (m+1)th stage input signal generating unit 140 generates an (m+1)th stage input signal by the following formula.






Xm = Xm−1 − Gm−1·Ỹm−1  [Formula 7]


In Formula 7, Xm indicates an (m+1)th stage input signal, Xm−1 indicates an mth stage input signal, Gm−1 indicates an mth stage normalized value, and Ỹm−1 indicates an mth stage shape code vector.


The 2nd stage input signal X1 is generated using the 1st stage input signal X0, the 1st stage normalized value G0 and the 1st stage shape code vector Ỹ0.


Meanwhile, the mth stage shape code vector Ỹm−1 in Formula 7 is a vector having the same dimension as Xm, unlike the aforementioned shape code vector Ỹm: it is configured in such a manner that the right and left parts (N−2L elements) centering on the location km are padded with zeros. The sign (Signm) is applied to this shape code vector as well.


The above-generated (m+1)th stage input signal Xm is inputted to the location detecting unit 110 and the like and repeatedly undergoes the shape vector generation and quantization until m=M.


One example of the case of 'M=4' is shown in FIG. 3. Like FIG. 2, a shape vector S0 is determined centering on the 1st stage peak (k0=139), and the result of subtracting the 1st stage shape code vector Ỹ0 (or the value resulting from applying the normalized value to Ỹ0), which is the result of vector quantization of the determined shape vector S0, from the original signal X0 becomes the 2nd stage input signal X1. Hence, it can be observed in FIG. 3 that the location k1 of the peak having the highest energy value in the 2nd stage input signal X1 is about 133. It can also be observed that the 3rd stage peak k2 is about 96 and the 4th stage peak k3 is about 89. Thus, in case that shape vectors are extracted through the multiple stages (e.g., total 4 stages, M=4), it is able to extract total 4 shape vectors (S0, S1, S2, S3).
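Putting the previous sketches together, the multi-stage loop (Formula 7, M=4) could be arranged as follows. The helper functions are the ones sketched earlier, and the zero-padding of the code vector to N dimensions follows the description of the shape code vector given above; everything else is an illustrative assumption, not the claimed implementation.

```python
import numpy as np

def encode_stages(x0, codebook, M=4, L=10):
    """Sketch of the multi-stage shape vector coding: at every stage the scaled,
    sign-applied, zero-padded shape code vector is subtracted from that stage's
    input (Formula 7). Relies on detect_location, extract_shape_vector and
    search_codebook sketched earlier."""
    x_m = np.array(x0, dtype=float)
    locations, signs, norms, indices = [], [], [], []
    for m in range(M):
        k_m, sign = detect_location(x_m)
        s_m, g_m, _ = extract_shape_vector(x_m, k_m, L)
        i_m, c_m = search_codebook(s_m, codebook)
        y_m = np.zeros_like(x_m)                      # N-dimensional shape code vector
        y_m[k_m - L + 1 : k_m + L + 1] = sign * c_m   # re-apply the sign, pad the rest with zeros
        x_m = x_m - g_m * y_m                         # next-stage input, Formula 7
        locations.append(k_m); signs.append(sign)
        norms.append(g_m);     indices.append(i_m)
    return locations, signs, norms, indices, x_m      # x_m: remaining signal after M stages
```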


Meanwhile, in order to raise compression efficiency of normalized values (G=[G0, G1, . . . , GM-1], Gm, m=0˜M−1) generated per stage (m=0˜M−1), the normalized value encoding unit 150 performs vector quantization on a differential vector Gd resulting from subtracting a mean (Gmean) from each of the normalized values. First of all, the mean for the normalized values can be determined as follows.






Gmean = avg(G0, . . . , GM−1)  [Formula 8]


In Formula 8, Gmean indicates a mean value, avg( ) indicates an average function, and G0, . . . , GM−1 indicate the normalized values per stage (Gm, m=0˜M−1), respectively.


The normalized value encoding unit 150 performs vector quantization on the differential vector Gd resulting from subtracting the mean from each of the normalized values Gm. In particular, by searching a codebook, the code vector most similar to the differential vector is determined as the normalized value differential code vector G̃d, and the codebook index of G̃d is determined as the normalized value index Gi.
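A sketch of this mean-removal and vector quantization step (Formula 8) follows; gd_codebook is an assumed array of trained M-dimensional differential code vectors, and the function name is introduced only for the example.

```python
import numpy as np

def encode_normalized_values(norms, gd_codebook):
    """Sketch of the normalized value encoding unit 150: remove the mean G_mean
    (Formula 8) from the per-stage normalized values and vector-quantize the
    remaining differential vector G_d against gd_codebook."""
    g = np.asarray(norms, dtype=float)            # [G_0, ..., G_{M-1}]
    g_mean = float(np.mean(g))                    # Formula 8
    g_d = g - g_mean                              # differential vector
    costs = np.sum((g_d - gd_codebook) ** 2, axis=1)
    g_i = int(np.argmin(costs))                   # normalized value index
    return g_mean, g_i, gd_codebook[g_i]          # mean, index, differential code vector
```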



FIG. 6 is a diagram for a relation between the total bit number of a normalized value differential code vector and a signal to noise ratio (SNR). In particular, FIG. 6 shows a result of measuring the signal to noise ratio (SNR) while varying the total bit number of the normalized value differential code vector G̃d. In this case, the total bit number of the mean Gmean is fixed to 5 bits. Referring to FIG. 6, even if the total bit number of the normalized value differential code vector is increased, it can be observed that the SNR shows almost no increase. In particular, the number of bits used for the normalized value differential code vector has no considerable influence on the SNR. Yet, when the bit numbers of the shape code vector (i.e., the quantized shape vector) are 3 bits, 4 bits and 5 bits, respectively, if the SNRs of the normalized value differential code vectors are compared to each other, it can be observed that there exist considerable differences. In particular, the SNR of the normalized value differential code vector has considerable correlation with the total bit number of the shape code vector.


Consequently, although the SNR of the normalized value differential code vector is nearly independent from the total bit number of the normalized value differential code vector, it can be observed that the SNR of the normalized value differential code vector is dependent on the total bit number of the shape code vector.


The normalized value differential code vector G̃d generated by the normalized value encoding unit 150 and the mean Gmean are delivered to the residual generating unit 160, and the normalized value mean Gmean and the normalized value index Gi are delivered to the multiplexing unit 180.


The residual generating unit 160 receives the normalized value differential code vector G̃d, the mean Gmean, the input signal X0 and the shape code vectors Ỹm and then generates a normalized value code vector G̃ by adding the mean to the normalized value differential code vector. Subsequently, the residual generating unit 160 generates a residual z, which is the coding error or quantization error of the shape vector coding, as follows.






z = X0 − G̃0·Ỹ0 − . . . − G̃M−1·ỸM−1  [Formula 9]


In Formula 9, z indicates the residual, X0 indicates the input signal (of the 1st stage), Ỹm indicates a shape code vector, and G̃m indicates an (m+1)th element of the normalized value code vector G̃.
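The residual of Formula 9 can be sketched as below; the inputs are the N-dimensional, sign-applied, zero-padded shape code vectors and the reconstructed normalized values, both assumed to be available from the earlier steps.

```python
import numpy as np

def make_residual(x0, shape_code_vectors, g_code_vector):
    """Sketch of the residual generating unit 160 (Formula 9): subtract every
    scaled shape code vector G~_m * Y~_m from the original input X_0."""
    z = np.array(x0, dtype=float)
    for g_m, y_m in zip(g_code_vector, shape_code_vectors):
        z -= g_m * y_m
    return z
```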


The residual encoding unit 170 applies a frequency envelope coding scheme to the residual z. A parameter for the frequency envelope may be defined as follows.












Fe(i) = (1/2) · log2( (1/2W) · Σ_{k=W·i}^{W·(i+2)−1} (wf(k)·z(k))² ),  0 ≤ i < 160/W  [Formula 10]







In Formula 10, Fe(i) indicates a frequency envelope, i indicates an envelope parameter index, wf(k) indicates a 2W-dimensional Hanning window, and z(k) indicates a spectral coefficient of the residual signal.


In particular, by performing 50% overlap windowing, the log energy corresponding to each window is defined as the frequency envelope to be used.


For instance, when W=8, since i=0˜19 according to Formula 10, it is able to transmit total 20 envelope parameters Fe(i) by a split vector quantization scheme. In doing so, vector quantization is performed on the mean-removed part for quantization efficiency. The following formula represents the vectors resulting from subtracting the mean energy value from the split vectors.






F0M = F0 − MF,  F0 = [Fe(0), . . . , Fe(4)],

F1M = F1 − MF,  F1 = [Fe(5), . . . , Fe(9)],

F2M = F2 − MF,  F2 = [Fe(10), . . . , Fe(14)],

F3M = F3 − MF,  F3 = [Fe(15), . . . , Fe(19)].  [Formula 11]


In Formula 11, Fe(i) indicates a frequency envelope parameter (i=0˜19, W=8), Fj (j=0, . . . , 3) indicate the split vectors, MF indicates a mean energy value, and FjM (j=0, . . . , 3) indicate the mean-removed split vectors.
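One possible reading of Formulas 10 and 11 is sketched below. The local indexing of the Hanning window within each 2W-sample segment, the zero-padding of the final segment, the small offset inside the logarithm, and the use of a single global mean MF are assumptions about details the text does not spell out.

```python
import numpy as np

def frequency_envelope(z, W=8, n_coeff=160):
    """Sketch of Formula 10: 50%-overlapped 2W-sample Hanning windows over the
    residual z; the log energy of each windowed segment gives F_e(i)."""
    win = np.hanning(2 * W)                          # 2W-dimensional Hanning window
    zp = np.concatenate([np.asarray(z, dtype=float)[:n_coeff], np.zeros(W)])
    fe = []
    for i in range(n_coeff // W):                    # i = 0 .. 160/W - 1
        seg = zp[W * i : W * (i + 2)]                # 2W samples, 50% overlap
        energy = np.mean((win * seg) ** 2)           # (1/2W) * sum (w_f(k) z(k))^2
        fe.append(0.5 * np.log2(energy + 1e-12))     # small offset avoids log2(0)
    return np.array(fe)

def mean_removed_split_vectors(fe):
    """Sketch of Formula 11: split the 20 envelope parameters into four
    5-dimensional split vectors and remove the mean energy M_F."""
    m_f = float(np.mean(fe))
    f_split = fe.reshape(4, 5)                       # F_0 .. F_3
    return f_split - m_f, m_f                        # F_jM (j = 0..3), M_F
```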


The residual encoding unit 170 performs vector quantization on the mean-removed split vectors FjM (j=0, . . . , 3) through a codebook search, thereby generating an envelope parameter index Fji. And, the residual encoding unit 170 delivers the envelope parameter index Fji and the mean energy MF to the multiplexing unit 180.


The multiplexing unit 180 multiplexes the data delivered from the respective components together, thereby generating at least one bitstream. In doing so, when the bitstream is generated, it may be able to follow the syntax shown in FIG. 7.



FIG. 7 is a diagram for one example of a syntax for elements included in a bitstream. Referring to FIG. 7, it is able to generate location information and sign information based on the location (km) and sign (Signm) received from the location detecting unit 110. If M=4, 7 bits (total 28 bits) may be assigned to the location information per stage (e.g., m=0 to 3) and 1 bit (total 4 bits) may be assigned to the sign information per stage (e.g., m=0 to 3), by which the present invention is non-limited (i.e., the present invention is not limited to specific bit numbers). And, it may be able to assign 3 bits (total 12 bits) to the codebook index Ymi of the shape vector per stage as well. The normalized value mean Gmean and the normalized value index Gi are values generated not for each stage but for the whole of the stages. In particular, 5 bits and 6 bits may be assigned to the normalized value mean Gmean and the normalized value index Gi, respectively.


Meanwhile, when the envelope parameter index Fji indicates total 4 split vectors (i.e., j=0, . . . , 3), if 5 bits are assigned to each split vector, it may be able to assign total 20 bits. Meanwhile, if the mean energy MF is quantized as a whole without being split, it may be able to assign total 5 bits.
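For the example bit numbers quoted above (M=4 stages and four split vectors), the per-frame budget adds up as in the small sketch below; the figures are only the illustrative ones from FIG. 7, and the invention is not limited to them.

```python
# Illustrative bit allocation for one frame, following the example of FIG. 7.
bits = {
    "location k_m (7 bits x 4 stages)":             7 * 4,
    "sign (1 bit x 4 stages)":                      1 * 4,
    "shape codebook index Y_mi (3 bits x 4 stages)": 3 * 4,
    "normalized value mean G_mean":                 5,
    "normalized value index G_i":                   6,
    "envelope parameter indices F_ji (5 bits x 4)": 5 * 4,
    "mean energy M_F":                              5,
}
print(sum(bits.values()))   # 80 bits for this example frame
```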



FIG. 8 is a diagram for configuration of a decoder in an audio signal processing apparatus according to one embodiment of the present invention. Referring to FIG. 8, a decoder 200 includes a shape vector reconstructing unit 220 and may further include a demultiplexing unit 210, a normalized value decoding unit 230, a residual obtaining unit 240, a 1st synthesizing unit 250 and a 2nd synthesizing unit 260.


The demultiplexing unit 210 extracts such elements shown in the drawing as location information km and the like from at least one bitstream received from an encoder and then delivers the extracted elements to the respective components.


The shape vector reconstructing unit 220 receives the location (km), the sign (Signm) and the codebook index (Ymi). The shape vector reconstructing unit 220 obtains the code vector corresponding to the codebook index from a codebook by performing de-quantization. The shape vector reconstructing unit 220 places the obtained code vector at the location km and then applies the sign thereto, thereby reconstructing a shape code vector Ỹm. Having reconstructed the shape code vector, the shape vector reconstructing unit 220 pads the remaining right and left parts (N−2L), which do not fill the dimension of the signal X, with zeros.
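A sketch of this decoder-side reconstruction is given below; the codebook, N and L are assumed to match the encoder's configuration, and the function name is introduced only for the example.

```python
import numpy as np

def reconstruct_shape_code_vector(codebook, index, k_m, sign, N=160, L=10):
    """Sketch of the shape vector reconstructing unit 220: look up the code
    vector for the codebook index, place it at the location k_m, apply the
    sign, and pad the remaining N - 2L positions with zeros."""
    y_m = np.zeros(N)
    y_m[k_m - L + 1 : k_m + L + 1] = sign * codebook[index]
    return y_m
```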


Meanwhile, the normalized value decoding unit 230 reconstructs the normalized value differential code vector G̃d corresponding to the normalized value index Gi using the codebook. Subsequently, the normalized value decoding unit 230 generates the normalized value code vector G̃ by adding the normalized value mean Gmean to the normalized value differential code vector.


The 1st synthesizing unit 250 reconstructs a 1st synthesized signal Xp as follows.






Xp = G̃0·Ỹ0 + G̃1·Ỹ1 + . . . + G̃M−1·ỸM−1  [Formula 12]


The residual obtaining unit 240 reconstructs an envelope parameter Fe(i) in a manner of receiving an envelope parameter index Fji and a mean energy MF, obtaining mean removed split code vectors FjM corresponding to the envelope parameter index (Fji), combining the obtained split code vectors, and then adding the mean energy to the combination.


Subsequently, if a random signal having a unit energy is generated from a random signal generator (not shown in the drawing), a 2nd synthesized signal is generated in a manner of multiplying the random signal by the envelope parameter.


Yet, in order to reduce a noise occurring effect caused by the random signal, the envelope parameter may be adjusted as follows before being applied to the random signal.






F̃e(i) = α·Fe(i)  [Formula 13]


In Formula 13, Fe(i) indicates an envelope parameter, α indicates a constant, and F̃e(i) indicates an adjusted envelope parameter.


In this case, α may be a preset constant value. Alternatively, it may be able to apply an adaptive algorithm that reflects signal properties.


The 2nd synthesized signal Xr, which reflects the decoded envelope parameter, is generated as follows.






Xr = random( ) × F̃e(i)  [Formula 14]


In Formula 14, random( ) indicates a random signal generator and F̃e(i) indicates the adjusted envelope parameter.


Since the above-generated 2nd synthesized signal Xr uses the values calculated for the Hanning-windowed signal in the encoding process, conditions equivalent to those of the encoder can be maintained by applying the same window to the random signal in the decoding step. Likewise, the decoded spectral coefficient elements can be outputted through the 50% overlap-and-add process.


The 2nd synthesizing unit 260 adds the 1st synthesized signal Xp and the 2nd synthesized signal Xr together, thereby outputting a finally reconstructed spectral coefficient.
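The decoder synthesis of Formulas 12 to 14 could be sketched as follows. The value of α, the band-wise application of the adjusted envelope, and the omission of the windowing and 50% overlap-and-add step are simplifying assumptions for this example, not the claimed decoder.

```python
import numpy as np

def synthesize(shape_code_vectors, g_code_vector, fe, alpha=0.5, W=8, N=160):
    """Sketch of the 1st and 2nd synthesizing steps: Xp sums the scaled shape
    code vectors (Formula 12); Xr multiplies a unit-energy random signal by the
    adjusted envelope (Formulas 13 and 14), applied here per W-sample band."""
    xp = np.zeros(N)
    for g_m, y_m in zip(g_code_vector, shape_code_vectors):
        xp += g_m * y_m                              # Formula 12
    fe_adj = alpha * np.asarray(fe)                  # Formula 13
    rng = np.random.default_rng(0)
    xr = np.zeros(N)
    for i, f in enumerate(fe_adj):
        band = rng.standard_normal(W)
        band /= np.sqrt(np.mean(band ** 2))          # unit-energy random signal
        xr[W * i : W * (i + 1)] = band * f           # Formula 14, applied per band
    return xp + xr                                   # output of the 2nd synthesizing unit 260
```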


The audio signal processing apparatus according to the present invention is available for various products. These products can be mainly grouped into a stand-alone group and a portable group. A TV, a monitor, a set-top box and the like can be included in the stand-alone group. And, a PMP, a mobile phone, a navigation system and the like can be included in the portable group.



FIG. 9 is a schematic block diagram of a product in which an audio signal processing apparatus according to one embodiment of the present invention is implemented. Referring to FIG. 9, a wire/wireless communication unit 510 receives a bitstream via a wire/wireless communication system. In particular, the wire/wireless communication unit 510 may include at least one of a wire communication unit 510A, an infrared unit 510B, a Bluetooth unit 510C, a wireless LAN unit 510D and a mobile communication unit 510E.


A user authenticating unit 520 receives an input of user information and then performs user authentication. The user authenticating unit 520 may include at least one of a fingerprint recognizing unit, an iris recognizing unit, a face recognizing unit and a voice recognizing unit. The fingerprint recognizing unit, the iris recognizing unit, the face recognizing unit and the voice recognizing unit receive fingerprint information, iris information, face contour information and voice information, respectively, and then convert them into user information. Whether each item of user information matches pre-registered user data is determined to perform the user authentication.


An input unit 530 is an input device enabling a user to input various kinds of commands and can include at least one of a keypad unit 530A, a touchpad unit 530B, a remote controller unit 530C and a microphone unit 530D, by which the present invention is non-limited. In this case, the microphone unit 530D is an input device configured to receive an input of a speech or audio signal. In particular, each of the keypad unit 530A, the touchpad unit 530B and the remote controller unit 530C is able to receive an input of a command for an outgoing call or an input of a command for activating the microphone unit 530D. In case of receiving a command for an outgoing call via the keypad unit 530A or the like, a control unit 550 is able to control the mobile communication unit 510E to make a request for a call to the corresponding communication network.


A signal coding unit 540 performs encoding or decoding on an audio signal and/or a video signal, which is received via the wire/wireless communication unit 510, and then outputs an audio signal in time domain. The signal coding unit 540 includes an audio signal processing apparatus 545. As mentioned in the foregoing description, the audio signal processing apparatus 545 corresponds to the above-described embodiment (i.e., the encoder 100 and/or the decoder 200) of the present invention. Thus, the audio signal processing apparatus 545 and the signal coding unit including the same can be implemented by at least one or more processors.


The control unit 550 receives input signals from the input devices and controls all processes of the signal coding unit 540 and an output unit 560. In particular, the output unit 560 is a component configured to output an output signal generated by the signal coding unit 540 and the like and may include a speaker unit 560A and a display unit 560B. If the output signal is an audio signal, it is outputted via the speaker. If the output signal is a video signal, it is outputted via the display.



FIG. 10 is a diagram for relations between products provided with an audio signal processing apparatus according to an embodiment of the present invention. FIG. 10 shows the relation between a terminal and a server corresponding to the products shown in FIG. 9. Referring to FIG. 10 (A), it can be observed that a first terminal 500.1 and a second terminal 500.2 can exchange data or bitstreams bi-directionally with each other via the wire/wireless communication units. Referring to FIG. 10 (B), it can be observed that a server 600 and a first terminal 500.1 can also perform wire/wireless communication with each other.



FIG. 11 is a schematic block diagram of a mobile terminal in which an audio signal processing apparatus according to one embodiment of the present invention is implemented. A mobile terminal 700 may include a mobile communication unit 710 configured for incoming and outgoing calls, a data communication unit 720 configured for data communication, an input unit configured to input a command for an outgoing call or a command for an audio input, a microphone unit 740 configured to input a speech or audio signal, a control unit 750 configured to control the respective components, a signal coding unit 760, a speaker 770 configured to output a speech or audio signal, and a display 780 configured to output a screen.


The signal coding unit 760 performs encoding or decoding on an audio signal and/or a video signal received via one of the mobile communication unit 710, the data communication unit 720 and the microphone unit 740 and outputs an audio signal in time domain via one of the mobile communication unit 710, the data communication unit 720 and the speaker 770. The signal coding unit 760 includes an audio signal processing apparatus 765. As with the foregoing embodiment (i.e., the encoder 100 and/or the decoder 200 according to the embodiment) of the present invention, the audio signal processing apparatus 765 and the signal coding unit including the same may be implemented with at least one processor.


An audio signal processing method according to the present invention can be implemented into a computer-executable program and can be stored in a computer-readable recording medium. And, multimedia data having a data structure of the present invention can be stored in the computer-readable recording medium. The computer-readable media include all kinds of recording devices in which data readable by a computer system are stored. The computer-readable media include ROM, RAM, CD-ROM, magnetic tapes, floppy discs, optical data storage devices, and the like for example and also include carrier-wave type implementations (e.g., transmission via Internet). And, a bitstream generated by the above mentioned encoding method can be stored in the computer-readable recording medium or can be transmitted via wire/wireless communication network.


While the present invention has been described and illustrated herein with reference to the preferred embodiments thereof, it will be apparent to those skilled in the art that various modifications and variations can be made therein without departing from the spirit and scope of the invention. Thus, it is intended that the present invention covers the modifications and variations of this invention that come within the scope of the appended claims and their equivalents.


INDUSTRIAL APPLICABILITY

Accordingly, the present invention is applicable to encoding and decoding an audio signal.

Claims
  • 1. A method of processing an audio signal, comprising: receiving an input audio signal corresponding to a plurality of spectral coefficients; obtaining location information indicating a location of a specific one of a plurality of the spectral coefficients based on an energy of the input signal; generating a shape vector using the location information and the spectral coefficients; determining a codebook index by searching a codebook corresponding to the shape vector; and transmitting the codebook index and the location information, wherein the shape vector is generated using a part selected from the spectral coefficients, and wherein the selected part is selected based on the location information.
  • 2. The method of claim 1, further comprising: generating sign information on the specific spectral coefficient; and transmitting the sign information, wherein the shape vector is generated further based on the sign information.
  • 3. The method of claim 1, further comprising: generating a normalized value for the selected part, wherein the determining comprises generating a normalized shape vector by normalizing the shape vector using the normalized value and determining the codebook index by searching the codebook corresponding to the normalized shape vector.
  • 4. The method of claim 3, further comprising: calculating a mean of 1st to Mth stage normalized values; generating a differential vector using a value resulting from subtracting the mean from the 1st to Mth stage normalized values; determining the normalized value index by searching the codebook corresponding to the differential vector; and transmitting the mean and the normalized index corresponding to the normalized value.
  • 5. The method of claim 3, wherein the input audio signal comprises an (m+1)th stage input signal, the shape vector comprises an (m+1)th stage shape vector, and the normalized value comprises an (m+1)th stage normalized value, and wherein the (m+1)th stage input signal is generated based on an mth stage input signal, an mth stage shape vector and an mth stage normalized value.
  • 6. The method of claim 1, the determining comprises: searching the codebook using a cost function including a weight factor and the shape vector; and determining the codebook index corresponding to the shape vector, wherein the weight factor varies in accordance with the selected part.
  • 7. The method of claim 1, further comprising: generating a residual signal using the input audio signal and a shape code vector corresponding to the codebook index; and generating an envelope parameter index by performing a frequency envelope coding on the residual signal.
  • 8. An apparatus for processing an audio signal, comprising: a location detecting unit receiving an input audio signal corresponding to a plurality of spectral coefficients, the location detecting unit obtaining location information indicating a location of a specific one of a plurality of the spectral coefficients based on an energy of the input signal; a shape vector generating unit generating a shape vector using the location information and the spectral coefficients; a vector quantizing unit determining a codebook index by searching a codebook corresponding to the shape vector; and a multiplexing unit transmitting the codebook index and the location information, wherein the shape vector is generated using a part selected from the spectral coefficients, and wherein the selected part is selected based on the location information.
  • 9. The apparatus of claim 8, wherein the location detecting unit generates sign information on the specific spectral coefficient, wherein the multiplexing unit transmits the sign information, and wherein the shape vector is generated further based on the sign information.
  • 10. The apparatus of claim 8, wherein the shape vector generating unit further generates a normalized value for the selected part and generates a normalized shape vector by normalizing the shape vector using the normalized value, and wherein the vector quantizing unit determines the codebook index by searching the codebook corresponding to the normalized shape vector.
  • 11. The apparatus of claim 10, further comprising a normalized value encoding unit calculating a mean of 1st to Mth stage normalized values, generating a differential vector using a value resulting from subtracting the mean from the 1st to Mth stage normalized values, determining the normalized value index by searching the codebook corresponding to the differential vector, and transmitting the mean and the normalized index corresponding to the normalized value.
  • 12. The apparatus of claim 10, wherein the input audio signal comprises an (m+1)th stage input signal, the shape vector comprises an (m+1)th stage shape vector, and the normalized value comprises an (m+1)th stage normalized value, and wherein the (m+1)th stage input signal is generated based on an mth stage input signal, an mth stage shape vector and an mth stage normalized value.
  • 13. The apparatus of claim 8, wherein the vector quantizing unit searches the codebook using a cost function including a weight factor and the shape vector and determines the codebook index corresponding to the shape vector and wherein the weight factor varies in accordance with the selected part.
  • 14. The apparatus of claim 8, further comprising a residual encoding unit generating a residual signal using the input audio signal and a shape code vector corresponding to the codebook index, the residual encoding unit generating an envelope parameter index by performing a frequency envelope coding on the residual signal.
PCT Information
Filing Document Filing Date Country Kind 371c Date
PCT/KR11/06222 8/23/2011 WO 00 2/20/2013
Provisional Applications (1)
Number Date Country
61376667 Aug 2010 US