The present disclosure is generally related to blind bandwidth extension.
Advances in technology have resulted in smaller and more powerful computing devices. For example, there currently exist a variety of portable personal computing devices, including wireless computing devices, such as portable wireless telephones, personal digital assistants (PDAs), and paging devices that are small, lightweight, and easily carried by users. More specifically, portable wireless telephones, such as cellular telephones and Internet Protocol (IP) telephones, can communicate voice and data packets over wireless networks. Further, many such wireless telephones include other types of devices that are incorporated therein. For example, a wireless telephone can also include a digital still camera, a digital video camera, a digital recorder, and an audio file player.
In traditional telephone systems (e.g., public switched telephone networks (PSTNs)), voice and other signals are sampled at about 8 kilohertz (kHz), limiting the signal frequencies of a represented signal to less than 4 kHz. In wideband (WB) applications, such as cellular telephony and voice over internet protocol (VoIP), the voice and other signals may be sampled at about 16 kHz. WB applications enable representation of signals with frequencies of up to 8 kHz. Extending signal bandwidth from narrowband (NB) telephony, limited to 4 kHz, to WB telephony of 8 kHz may improve speech intelligibility and naturalness.
WB coding techniques typically involve encoding and transmitting the lower frequency portion of the signal (e.g., 0 Hz to 4 kHz, also called the “low-band”). For example, the low-band may be represented using filter parameters and/or a low-band excitation signal. To improve coding efficiency, the higher frequency portion of the signal (e.g., 4 kHz to 8 kHz, also called the “high-band”) may be encoded to generate a smaller set of parameters that are transmitted with the low-band information. As the amount of high-band information is reduced, transmission bandwidth is used more efficiently, but the high-band may be reconstructed less reliably at a receiver.
Systems and methods of performing blind bandwidth extension are disclosed. In a particular embodiment, a low-band input signal (representing a low-band portion of an audio signal) is received. High-band parameters (e.g., line spectral frequencies (LSF), gain shape information, gain frame information, and/or other information descriptive of the high-band audio signal) may be predicted using the low-band portion of the audio signal according to states based on soft-vector quantization. For example, a particular state may correspond to particular low-band gain frame parameters (e.g., corresponding to a low-band frame or sub-frame). Using predicted state transition information, gain frame information associated with the high-band portion of the audio signal may be predicted based on low-band gain frame information extracted from the low-band portion of the audio signal. A known or predicted state corresponding to particular gain frame parameters may be used to predict additional gain frame parameters that correspond to additional frames/sub-frames. The predicted high-band parameters may be applied to a high-band model (with a low-band residual signal corresponding to the low-band portion of the audio signal) to generate a high-band portion of the audio signal. The high-band portion of the audio signal may be combined with the low-band portion of the audio signal to produce a wideband output.
In a particular embodiment, a method includes determining, based on a set of low-band parameters of an audio signal, a first set of high-band parameters and a second set of high-band parameters. The method further includes generating a predicted set of high-band parameters based on a weighted combination of the first set of high-band parameters and the second set of high-band parameters.
In another particular embodiment, a method includes receiving a set of low-band parameters corresponding to a frame of an audio signal. The method further includes selecting, based on the set of low-band parameters, a first quantization vector from a plurality of quantization vectors and a second quantization vector from the plurality of quantization vectors. The first quantization vector is associated with a first set of high-band parameters and the second quantization vector is associated with a second set of high-band parameters. The method also includes predicting a set of high-band parameters based on a weighted combination of the first set of high-band parameters and the second set of high-band parameters.
In another particular embodiment, a method includes receiving a set of low-band parameters corresponding to a frame of an audio signal. The method further includes predicting a set of non-linear domain high-band parameters based on the set of low-band parameters. The method also includes converting the set of non-linear domain high-band parameters from a non-linear domain to a linear domain to obtain a set of linear domain high-band parameters.
In another particular embodiment, a method includes selecting a first quantization vector of a plurality of quantization vectors. The first quantization vector corresponds to a first set of low-band parameters corresponding to a first frame of an audio signal. The method further includes receiving a second set of low-band parameters corresponding to a second frame of the audio signal. The method also includes determining, based on entries in a transition probability matrix, bias values associated with transitions from the first quantization vector corresponding to the first frame to candidate quantization vectors corresponding to the second frame. The method includes determining weighted differences between the second set of low-band parameters and the candidate quantization vectors based on the bias values. The method further includes selecting a second quantization vector corresponding to the second frame based on the weighted differences.
In another particular embodiment, a method includes receiving a set of low-band parameters corresponding to a frame of an audio signal. The method further includes classifying the set of low-band parameters as voiced or unvoiced. The method also includes selecting a quantization vector. The quantization vector corresponds to a first plurality of quantization vectors associated with voiced low-band parameters when the set of low-band parameters is classified as voiced low-band parameters. The quantization vector corresponds to a second plurality of quantization vectors associated with unvoiced low-band parameters when the set of low-band parameters is classified as unvoiced low-band parameters. The method includes predicting a set of high-band parameters based on the selected quantization vector.
In another particular embodiment, a method includes receiving a first set of low-band parameters corresponding to a first frame of an audio signal. The method further includes receiving a second set of low-band parameters corresponding to a second frame of the audio signal. The second frame is subsequent to the first frame within the audio signal. The method also includes classifying the first set of low-band parameters as voiced or unvoiced and classifying the second set of low-band parameters as voiced or unvoiced. The method includes selectively adjusting a gain parameter based at least partially on a classification of the first set of low-band parameters, a classification of the second set of low-band parameters, and an energy value corresponding to the second set of low-band parameters.
In another particular embodiment, a method includes receiving, at a decoder of a speech vocoder, a set of low-band parameters as part of a narrowband bitstream. The set of low-band parameters is received from an encoder of the speech vocoder. The method also includes predicting a set of high-band parameters based on the set of low-band parameters.
In another particular embodiment, an apparatus includes a speech vocoder and a memory storing instructions executable by the speech vocoder to perform operations. The operations include receiving, at a decoder of the speech vocoder, a set of low-band parameters as part of a narrowband bitstream. The set of low-band parameters is received from an encoder of the speech vocoder. The operations also include predicting a set of high-band parameters based on the set of low-band parameters.
In another particular embodiment, a non-transitory computer-readable medium includes instructions that, when executed by a speech vocoder, cause the speech vocoder to receive, at a decoder of the speech vocoder, a set of low-band parameters as part of a narrowband bitstream. The set of low-band parameters is received from an encoder of the speech vocoder. The instructions are also executable to cause the speech vocoder to predict a set of high-band parameters based on the set of low-band parameters.
In another particular embodiment, an apparatus includes means for receiving a set of low-band parameters as part of a narrowband bitstream. The set of low-band parameters is received from an encoder of a speech vocoder. The apparatus also includes means for predicting a set of high-band parameters based on the set of low-band parameters.
Particular advantages provided by at least one of the disclosed embodiments include generating high-band signal parameters from low-band signal parameters without the use of high-band side information, thereby reducing the amount of data transmitted. For example, high-band parameters corresponding to a high-band portion of an audio signal may be predicted based on low-band parameters corresponding to a low-band portion of the audio signal. Using soft vector quantization may reduce audible effects due to transitions between states as compared to high-band prediction systems that use hard vector quantization. Using predicted state transition information may increase the accuracy of the predicted high-band parameters as compared to high-band prediction systems that do not use predicted state transition information. Other aspects, advantages, and features of the present disclosure will become apparent after review of the entire application, including the following sections: Brief Description of the Drawings, Detailed Description, and the Claims.
Referring to
In the following description, various functions performed by the system 100 of
Although the disclosed systems and methods of
The narrowband decoder 110 may be configured to receive a narrowband bitstream 102 (e.g., an adaptive multi-rate (AMR) bitstream). The narrowband decoder 110 may be configured to decode the narrowband bitstream 102 to recover a low-band audio signal 134 corresponding to the narrowband bitstream 102. In a particular embodiment, the low-band audio signal 134 may represent speech. As an example, a frequency of the low-band audio signal 134 may range from approximately 0 hertz (Hz) to approximately 4 kilohertz (kHz). The narrowband decoder 110 may further be configured to generate low-band parameters 104 based on the narrowband bitstream 102. The low-band parameters 104 may include linear prediction coefficients (LPC), line spectral frequencies (LSF), gain shape information, gain frame information, and/or other information descriptive of the low-band audio signal 134. In a particular embodiment, the low-band parameters 104 include AMR parameters corresponding to the narrowband bitstream 102. The narrowband decoder 110 may further be configured to generate low-band residual information 108. The low-band residual information 108 may correspond to a filtered portion of the low-band audio signal 134. Although
The high-band parameter prediction module 120 may be configured to receive the low-band parameters 104 from the narrowband decoder 110. Based on the low-band parameters 104, the high-band parameter prediction module 120 may generate predicted high-band parameters 106. The high-band parameter prediction module 120 may use soft vector quantization to generate the predicted high-band parameters 106, such as in accordance with one or more of the embodiments described with reference to
The high-band model module 130 may use the predicted high-band parameters 106 and the low-band residual information 108 to generate a high-band signal 132. As an example, a frequency of the high-band signal 132 may range from approximately 4 kHz to approximately 8 kHz. The synthesis filter bank 140 may be configured to receive the high-band signal 132 and the low-band audio signal 134 and to generate a wideband output 136. The wideband output 136 may include a wideband speech output that includes the decoded low-band audio signal 134 and the predicted high-band signal 132. A frequency of the wideband output 136 may range from approximately 0 Hz to approximately 8 kHz, as an illustrative example. The wideband output 136 may be sampled (e.g., at approximately 16 kHz) to reconstruct the combined low-band and high-band signals. Using soft vector quantization may reduce inaccuracies in the wideband output 136 due to inaccurately predicted high-band parameters, thereby reducing audible artifacts in the wideband output 136.
Although the description of
Referring to
The method 200 may further include decoding the narrowband bitstream to generate a low-band audio signal (e.g., the low-band signal 134 of
The method 200 includes applying the high-band parameters to a high-band model to generate a high-band audio signal, at 208. For example, the high-band parameters 106 may be applied to the high-band model 130 along with the low-band residual 108 received from the narrowband decoder 110. The method 200 further includes combining (e.g., at the synthesis filter bank 140 of
Using the soft vector quantization according to the method 200 may reduce inaccuracies in wideband output due to inaccurately predicted high-band parameters and therefore may reduce audible artifacts in the wideband output.
Referring to
The high-band parameter prediction module 310 may include a soft vector quantization module 312, a probability biased state transition matrix 314, a voiced/unvoiced prediction model switch module 316, and/or a multi-stage high-band error detection module 318.
The soft vector quantization module 312 may be configured to determine a set of matching low-band to high-band quantization vectors for a received set of low-band parameters. For example, the set of low-band parameters corresponding to the frame 304 may be received at the soft vector quantization module 312. The soft vector quantization module may select multiple quantization vectors from a vector quantization table (e.g., a codebook) that best match the set of low-band parameters, such as described in further detail with reference to
In selecting vectors from the vector quantization table that best match the set of low-band parameters, differences between the set of low-band parameters and the quantized low-band parameters of each quantization vector may be calculated. The calculated differences may be scaled, or weighted, based on a determination of a state (e.g., a closest matching quantized set) of the low-band parameters. The probability biased state transition matrix 314 may be used to determine a plurality of weights in order to weight the calculated differences. The plurality of weights may be calculated based on bias values corresponding to probabilities of transition from a current set of quantized low-band parameters to a next set of quantized low-band parameters of the vector quantization table (e.g., corresponding to a next received frame of the audio signal). The multiple quantization vectors selected by the soft vector quantization module 312 may be selected based on the weighted differences. In order to conserve resources, the probability biased state transition matrix 314 may be compressed. Examples of probability biased state transition matrices that may be used in
The voiced/unvoiced prediction model switch module 316 may provide a first codebook for use by the soft vector quantization module 312 when the received set of low-band parameters corresponds to a voiced audio signal and a second codebook when the received set of low-band parameters corresponds to an unvoiced audio signal, such as further described with reference to
The multi-stage high-band error detection module 318 may analyze the non-linear domain high-band parameters generated by the soft vector quantization module 312, the probability biased state transition matrix 314, and the voiced/unvoiced prediction model switch module 316 to determine whether a high-band parameter (e.g., a gain frame parameter) may be unstable (e.g., corresponding to an energy value that is disproportionately higher than an energy value of a prior frame) and/or may lead to noticeable artifacts in the generated wideband audio signal. In response to determining that a high-band prediction error has occurred, the multi-stage high-band error detection module 318 may attenuate or otherwise correct the non-linear domain high-band parameters. Examples of multi-stage high-band error detection are further described with reference to
After the set of non-linear domain high-band parameters 306 are generated by the high-band parameter prediction module 310, the non-linear to linear conversion module 320 may convert the non-linear domain high-band parameters to the linear domain, thereby generating high-band parameters 308. Performing high-band parameter prediction in the non-linear domain, as opposed to the linear domain or the log domain, may enable the high-band parameters to more closely model the human auditory response. Further, the non-linear domain model may be selected to have a concavity, such that the non-linear domain model attenuates a weighted sum output of the soft vector quantization module 312 that does not clearly match a particular state (e.g., quantization vector). An example of concavity may include functions that satisfy the property:
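As an illustrative, non-limiting reconstruction, one standard statement of concavity is given below. It is offered as an assumption about the intended property, where f is the mapping into the non-linear domain, x and y are any two values in its domain, and α is a mixing weight; these symbol names are chosen here for illustration.

```latex
\[
f\bigl(\alpha x + (1-\alpha)\,y\bigr) \;\ge\; \alpha f(x) + (1-\alpha) f(y),
\qquad 0 \le \alpha \le 1 .
\]
```

Under this property, a weighted sum formed in the non-linear domain maps back to a linear domain value that is no larger than the same weighted sum formed directly in the linear domain.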
Examples of concave functions may include logarithmic type functions, n-th root functions, one or more other concave functions, or expressions that include one or more concave components and that may further include a non-concave component. For example, a set of low-band parameters that falls equidistant from two quantization vectors within the soft vector quantization module 312 results in high-band parameters with a lower energy value than if the set of low-band parameters is equal to one or the other of the quantization vectors. The attenuation of less exact matches between low-band parameters and quantized low-band parameters enables high-band parameters that are predicted with less certainty to have less energy, thereby reducing the chance that erroneous high-band parameters are audible within the output wideband audio signal.
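The attenuation effect may be illustrated with a minimal sketch that assumes a cube-root non-linear domain and a two-entry set of gain values. The function name and the numeric values are hypothetical and are not taken from the disclosure. The sketch compares a weighted sum formed in the cube-root domain (and then cubed) against the same weighted sum formed directly in the linear domain.

```python
def predict_gain_cube_root_domain(linear_gains, weights):
    """Mix gains in the cube-root domain, then cube back to the linear domain.

    Assumption: the non-linear domain is a cube-root domain; other concave
    mappings (e.g., logarithmic) would need a different forward/inverse pair.
    """
    roots = [g ** (1.0 / 3.0) for g in linear_gains]        # to non-linear domain
    mixed_root = sum(w * r for w, r in zip(weights, roots))  # weighted sum
    return mixed_root ** 3                                   # back to linear domain


# Hypothetical linear-domain gains associated with two quantization vectors.
gains = [8.0, 1.0]

# Input equidistant from both vectors: equal weights.
ambiguous = predict_gain_cube_root_domain(gains, [0.5, 0.5])

# Same weights applied directly in the linear domain, for comparison.
linear_mix = 0.5 * gains[0] + 0.5 * gains[1]

print(ambiguous, linear_mix)
```

In this sketch the ambiguous match yields a smaller gain than the linear-domain mix, which is consistent with the attenuation of less certain predictions described above.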
Although
Referring to
The method 400 further includes predicting a set of non-linear domain high-band parameters based on the set of low-band parameters, at 404. For example, the high-band parameter prediction module 310 may use soft vector quantization in the non-linear domain to produce non-linear domain high-band parameters.
The method 400 also includes converting the set of non-linear domain high-band parameters from a non-linear domain to a linear domain to obtain a set of linear domain high-band parameters, at 406. For example, the non-linear to linear conversion module 320 may perform a multiplication operation to convert the non-linear high-band parameters into linear domain high-band parameters. To illustrate, a cubing operation applied to a value A may be denoted as $A^3$ and may correspond to A*A*A. In this example, A is the cube-root (i.e., third-root) domain value of $A^3$.
Performing high-band parameter prediction in the non-linear domain may more closely match the human auditory system and may reduce the likelihood that erroneous high-band parameters generate audible artifacts within the output wideband audio signal.
Referring to
To illustrate, the vector quantization table 520 may include a codebook that maps quantized low-band parameters “X” (e.g., an array of sets of low-band parameters X0-Xn) to high-band parameters “Y” (e.g., an array of sets of high-band parameters Y0-Yn). In an embodiment, the low-band parameters may include 10 low-band LSFs corresponding to a frame of an audio signal and the high-band parameters may include 6 high-band LSFs corresponding to the frame of the audio signal.
The vector quantization table 520 may be generated based on training data. For example, a database including wideband speech samples may be processed to extract low-band LSFs and corresponding high-band LSFs. From the wideband speech samples, similar low-band LSFs and corresponding high-band LSFs may be classified into multiple states (e.g., 64 states, 256 states, etc.). A centroid (or mean or other measure) corresponding to a distribution of low-band parameters in each state may correspond to quantized low-band parameters X0-Xn within an array of low-band parameters X and centroids corresponding to a distribution of high-band parameters in each state may correspond to quantized high-band parameters Y0-Yn within an array of high-band parameters Y. Each set of quantized low-band parameters may be mapped to a corresponding set of high-band parameters to form a quantization vector (e.g., a row of the vector quantization table 520).
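As a hedged illustration of how such a table might be trained, the sketch below groups training pairs by their nearest low-band centroid and recomputes the low-band and high-band centroids of each state. The use of a k-means style iteration, the squared-error distance, and all names (`train_codebook`, `pairs`, `initial_low_centroids`) are assumptions made for illustration; the description above only states that similar parameters are classified into states.

```python
def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(pairs, initial_low_centroids, iterations=10):
    """pairs: list of (low_band_vector, high_band_vector) training examples.

    Returns a list of (quantized_low, quantized_high) rows, one per state.
    A state that never receives training data keeps a high-band entry of None.
    """
    low_centroids = [list(c) for c in initial_low_centroids]
    high_centroids = [None] * len(low_centroids)
    for _ in range(iterations):
        groups = [[] for _ in low_centroids]
        for low, high in pairs:
            # Assign each training pair to the state with the nearest low-band centroid.
            distances = [squared_distance(low, c) for c in low_centroids]
            groups[distances.index(min(distances))].append((low, high))
        for state, members in enumerate(groups):
            if members:
                low_centroids[state] = mean_vector([m[0] for m in members])
                high_centroids[state] = mean_vector([m[1] for m in members])
    return list(zip(low_centroids, high_centroids))


# Toy one-dimensional data standing in for low-band and high-band LSF vectors.
pairs = [([0.0], [10.0]), ([0.2], [12.0]), ([1.0], [20.0]), ([1.2], [22.0])]
table = train_codebook(pairs, initial_low_centroids=[[0.0], [1.0]])
```

Each returned row pairs a low-band centroid with the high-band centroid of the same state, mirroring a row of the vector quantization table.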
In soft vector quantization, low-band parameters 502 corresponding to a low-band audio signal may be received by a soft vector quantization module (e.g., the soft vector quantization module 312 of
where $d_i$ is a distance between the set of low-band parameters and an i-th set of quantized low-band parameters, $W_j$ is a weight associated with each low-band parameter of the set of low-band parameters, $x_j$ is a low-band parameter having index j of the set of low-band parameters, and $\hat{x}_{i,j}$ is a quantized low-band parameter having index j of the i-th set of quantized low-band parameters.
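As an illustrative reconstruction consistent with the symbol definitions above, the distance may take a weighted squared-error form. The squared difference is an assumption; a weighted absolute difference would fit the same definitions.

```latex
\[
d_i \;=\; \sum_{j} W_j \,\bigl(x_j - \hat{x}_{i,j}\bigr)^{2}
\]
```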
Multiple quantized low-band parameters 510 may be matched to the set of low-band parameters 504 based on the distance between the set of low-band parameters 504 and the quantized low-band parameters. For example, the closest quantized low-band parameters (e.g., xi resulting in a smallest di) may be selected. In an embodiment, three quantized low-band parameters may be selected. In other embodiments, any number of multiple quantized low-band parameters 510 may be selected. Further, the number of multiple quantized low-band parameters 510 may adaptively change from frame to frame. For example, a first number of quantized low-band parameters 510 may be selected for a first frame of the audio signal and a second number including more or fewer quantized low-band parameters may be selected for a second frame of the audio signal.
Based on the selected multiple quantized low-band parameters 510, multiple corresponding quantized high-band parameters 530 may be determined. A combination, such as a weighted sum, may be performed on the multiple quantized high-band parameters 530 to obtain a set of predicted high-band parameters 508. For example, the set of predicted high-band parameters 508 may include 6 high-band LSFs corresponding to the frame of the low-band audio signal. High-band parameters 506 corresponding to the low-band audio signal may be generated based on multiple sets of predicted high-band parameters and may correspond to multiple sequential frames of the audio signal.
The multiple high-band parameters 530 may be combined as a weighted sum, where each selected quantized high-band parameter may be weighted based on the inverse distance $d_i^{-1}$ between the corresponding quantized low-band parameter and the received low-band parameter. To illustrate, when three quantized high-band parameters are selected, as illustrated in
where $d_i^{-1}$ is the inverse distance between the set of low-band parameters and the first, second, or third selected quantized set of low-band parameters corresponding to the quantized high-band parameters to be weighted and $d_1^{-1} + d_2^{-1} + d_3^{-1}$ corresponds to the sum of each of the inverse distances between the set of low-band parameters and each of the selected quantized sets of low-band parameters corresponding to each of the quantized high-band parameters. Hence, the output set of high-band parameters 508 may be represented by the equation:
where y(i1), y(i2), and y(i3) are the selected multiple quantized high-band parameters. By weighting multiple quantized high-band parameters to determine a predicted set of quantized high-band parameters, a more accurate output set of high-band parameters 508 corresponding to the set of low-band parameters 504 may be predicted. Further, as the low-band parameters 502 change gradually over the course of multiple frames, the predicted high-band parameters 506 may also change gradually, as described with reference to
Referring to
In soft vector quantization, an input low-band parameter X may be modeled based on distances (e.g., d1, d2, and d3) between the input low-band parameter X and the vectors (X1, Y1), (X2, Y2), (X3, Y3), in contrast to hard vector quantization, which models the input low-band parameter based on one vector (e.g., the vector (X1, Y1)) corresponding to the segment that contains the input low-band parameter. To illustrate, in soft vector quantization, the modeled input X may be determined conceptually by the equation:
where X is the input low-band parameter to be modeled, Y1, Y2, and Y3 are the centroids of each state (e.g., corresponding to the array of quantized high-band parameters Y0-Yn of
may be normalized as described with reference to
As a stream of frames associated with an audio signal is received by the high-band prediction module, increased accuracy of low-band parameters and corresponding predicted high-band parameters associated with each frame may result in a smoother transition of the predicted high-band parameters from frame to frame.
Referring to
The method 800 may also include determining a first weight corresponding to the first quantization vector and based on the first difference and determining a second weight corresponding to the second quantization vector and based on the second difference, at 806. The method 800 may include predicting a set of high-band parameters based on a weighted combination of the first set of high-band parameters and the second set of high-band parameters, at 808. For example, the high-band parameters 506 of
A predicted set of high-band parameters based on multiple quantization vectors (e.g., soft-vector quantization) as in the method 800 may be more accurate than a prediction based on hard-vector quantization and may lead to smoother transitions of high-band parameters between different frames of an audio signal.
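The weighted combination may be sketched as follows, as a non-limiting example. The sketch assumes a weighted squared-error distance, selects the three closest table rows, and weights each associated set of high-band parameters by its normalized inverse distance. The function names, the small `eps` guard against division by zero on an exact match, and the toy one-dimensional table are hypothetical.

```python
def weighted_distance(x, x_hat, param_weights):
    # Assumption: weighted squared error between input and quantized parameters.
    return sum(w * (a - b) ** 2 for w, a, b in zip(param_weights, x, x_hat))


def soft_vq_predict(low_band, table, param_weights, num_matches=3, eps=1e-12):
    """table: list of (quantized_low, quantized_high) rows.

    Returns high-band parameters as an inverse-distance weighted sum over the
    num_matches closest rows.
    """
    scored = [
        (weighted_distance(low_band, row_low, param_weights), index)
        for index, (row_low, _) in enumerate(table)
    ]
    nearest = sorted(scored)[:num_matches]
    inverse = [1.0 / (distance + eps) for distance, _ in nearest]
    total = sum(inverse)
    predicted = [0.0] * len(table[0][1])
    for inv_d, (_, index) in zip(inverse, nearest):
        weight = inv_d / total  # normalized inverse-distance weight
        for k, y in enumerate(table[index][1]):
            predicted[k] += weight * y
    return predicted


# Toy table: one low-band value mapped to one high-band value per row.
table = [([0.0], [10.0]), ([1.0], [20.0]), ([2.0], [30.0]), ([10.0], [100.0])]
exact = soft_vq_predict([1.0], table, param_weights=[1.0])
between = soft_vq_predict([0.5], table, param_weights=[1.0])
```

An input that coincides with a table row is dominated by that row, while an input between rows yields an interpolated prediction, so small frame-to-frame changes in the input produce small changes in the output.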
Referring to
The vector quantization table 920 may correspond to the vector quantization table 520 of
The transition probability matrix 930 may include multiple entries organized into multiple rows and multiple columns. Each row (e.g., rows 1-N) of the transition probability matrix 930 may correspond to a vector of the vector quantization table 920 that may be matched to the first set of low-band parameters 904. Each column (e.g., columns 1-N) of the transition probability matrix may correspond to a vector of the vector quantization table 920 that may be matched to the second set of low-band parameters 906. An entry of the transition probability matrix 930 may correspond to a probability that the second set of low-band parameters 906 will be matched to a vector (indicated by the column of the entry) given that the first set of low-band parameters 904 has been matched to a vector (indicated by the row of the entry). In other words, the transition probability matrix may indicate a probability of transitioning from each vector of the vector quantization table 920 to every vector of the table (including the same vector) between frames of the audio signal 902.
To illustrate, distances 916 (represented in
The transition probability matrix 930 may be generated based on training data. For example, a database including wideband speech samples may be processed to extract multiple sets of low-band LSFs corresponding to a series of frames of an audio signal. Based on multiple sets of low-band LSFs corresponding to a particular vector of the vector quantization table 920, a probability that a subsequent frame will correspond to each additional vector may be determined along with a probability that the subsequent frame will correspond to the same vector. Based on the probability associated with each vector, the transition probability matrix 930 may be constructed.
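As a minimal sketch of this construction, the example below counts frame-to-frame transitions between matched vectors in training sequences and normalizes each row into probabilities. The function name, the input format (lists of matched vector indices per training utterance), and the uniform fallback for a vector that is never observed are assumptions made for illustration.

```python
def build_transition_matrix(state_sequences, num_states):
    """state_sequences: lists of matched vector indices, one list per utterance.

    Entry [i][j] estimates the probability that a frame matched to vector i
    is followed by a frame matched to vector j (j == i is a self-transition).
    """
    counts = [[0] * num_states for _ in range(num_states)]
    for sequence in state_sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: fall back to a uniform row when a vector is never observed.
            matrix.append([1.0 / num_states] * num_states)
        else:
            matrix.append([count / total for count in row])
    return matrix


# Two toy training sequences over a two-vector table.
matrix = build_transition_matrix([[0, 0, 0, 1], [1, 1, 0]], num_states=2)
```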
After the transition probabilities 910 corresponding to the matched vector 908 have been determined, the transform module 940 may transform the probabilities into bias values. For example, in a particular embodiment the probabilities may be transformed according to the equation:
where D is a bias value for biasing the distance 916 between the first set of low-band values 904 corresponding to a first frame and each of the vectors V0-Vn of the vector quantization table 920, and $P_{i,j}$ is a probability that the first set of low-band parameters corresponding to a vector Vi during the first frame will transition to the second set of low-band parameters corresponding to a vector Vj during the second frame (e.g., a value at the i-th row, j-th column of the transition probability matrix 930).
A soft vector quantization module, such as the soft vector quantization module 312 of
Using the transition probability matrix 930 to determine probabilities of transitioning from a vector to another vector between audio frames and using the probabilities to bias the selection of matching vectors corresponding to subsequent frames may prevent errors in matching vectors from the vector quantization table 920 to the subsequent frames. Hence, the transition probability matrix 930 enables more accurate vector quantization.
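The biasing may be sketched as follows. The transform from probability to bias value is not reproduced in the text above, so the sketch assumes, purely for illustration, a multiplicative bias of 1 / (p + floor), which enlarges the distance to vectors that are unlikely to follow the previously matched vector. The function name, the `floor` parameter, and the squared-error distance are likewise hypothetical.

```python
def select_next_vector(previous_index, low_band, quantized_low, transition_matrix,
                       floor=0.01):
    """Pick the table index for the current frame using biased distances.

    Assumption: bias = 1 / (p + floor); the actual transform may differ.
    """
    best_index, best_score = None, float("inf")
    for j, x_hat in enumerate(quantized_low):
        distance = sum((a - b) ** 2 for a, b in zip(low_band, x_hat))
        bias = 1.0 / (transition_matrix[previous_index][j] + floor)
        score = bias * distance  # weighted (biased) difference
        if score < best_score:
            best_index, best_score = j, score
    return best_index


quantized_low = [[0.0], [1.0]]
# Row 0 strongly favors staying on vector 0; row 1 is indifferent.
transition_matrix = [[0.9, 0.1], [0.5, 0.5]]

# The input is slightly closer to vector 1, but the bias keeps the match at vector 0.
biased_choice = select_next_vector(0, [0.55], quantized_low, transition_matrix)
# With an indifferent row the nearest vector wins.
unbiased_choice = select_next_vector(1, [0.55], quantized_low, transition_matrix)
```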
Referring to
By compressing the transition probability matrix according to
where N is the number of vectors in the vector quantization table 920 and M is the number of vectors for each row that are not included in the compressed transition probability matrix 1020.
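One way such a compression might be realized is sketched below, as an assumption rather than the disclosed layout: for each row, only the N − M most probable entries are kept, each stored with its column index, and any omitted transition is treated as having a default probability. The function names, the choice to keep the highest-probability entries, and the default value of 0.0 are hypothetical.

```python
def compress_transition_matrix(matrix, num_dropped):
    """Keep the (N - M) most probable entries of each row as (column, probability)."""
    keep = len(matrix) - num_dropped
    compressed = []
    for row in matrix:
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:keep]
        compressed.append([(j, row[j]) for j in sorted(ranked)])
    return compressed


def lookup(compressed, i, j, default=0.0):
    """Return the stored probability of transition i -> j, or default if omitted."""
    for column, probability in compressed[i]:
        if column == j:
            return probability
    return default


full = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
compact = compress_transition_matrix(full, num_dropped=1)  # N = 3, M = 1
```

In this sketch each row stores N − M index/probability pairs instead of N probabilities, so the saving grows with M at the cost of treating the omitted transitions uniformly.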
Referring to
The method 1100 may further include receiving a second set of low-band parameters corresponding to a second frame of the audio signal, at 1104. For example, the second set of low-band parameters 906 of
The method 1100 may further include determining, based on entries in a transition probability matrix, bias values associated with transitions from the first quantization vector corresponding to the first frame to candidate quantization vectors corresponding to the second frame, at 1106. For example, the bias values 912 may be generated by selecting a row of probabilities b from the transition probability matrix 930 of
The method 1100 may also include determining weighted differences between the second set of low-band parameters and the candidate quantization vectors based on the bias values. For example, the distances 916 between the second set of low-band parameters 906 and the vectors V0-Vn of the vector quantization table 920 may be biased according to the bias values 912 of
Using bias values to match the sets of low-band parameters to vectors of the vector quantization table may prevent errors in matching vectors from the vector quantization table to frames and may prevent erroneous high-band parameters from being generated.
Referring to
The voiced/unvoiced prediction model switching module 1200 includes a decoder voiced/unvoiced classifier 1220 and a vector quantization codebook index module 1230. The voiced/unvoiced prediction model switching module 1200 may include a voiced codebook 1240 and an unvoiced codebook 1250. In a particular embodiment, the voiced/unvoiced prediction model switching module 1200 may include fewer or more than the illustrated modules.
During operation, the decoder voiced/unvoiced classifier 1220 may be configured to select or provide the voiced codebook 1240 when a received set of low-band parameters corresponds to a voiced audio signal and the unvoiced codebook 1250 when the received set of low-band parameters corresponds to an unvoiced audio signal. For example, the decoder voiced/unvoiced classifier 1220 and the vector quantization codebook index module 1230 may receive low-band parameters 1202 corresponding to a low-band audio signal. In a particular embodiment, the low-band parameters 1202 may correspond to the low-band parameters 302 of
The decoder voiced/unvoiced classifier 1220 may classify the set of parameters corresponding to the frame 1204 as voiced or unvoiced. For example, voiced speech may exhibit a high degree of periodicity. Unvoiced speech may exhibit little or no periodicity. The decoder voiced/unvoiced classifier 1220 may classify the set of parameters based on one or more measures of periodicity (e.g., zero crossings, normalized autocorrelation functions (NACFs), or pitch gain) indicated by the set of parameters. To illustrate, the decoder voiced/unvoiced classifier 1220 may determine whether a measure (e.g., zero crossings, NACFs, pitch gain, and/or voice activity) satisfies a first threshold.
In response to determining that the measure satisfies the first threshold, the decoder voiced/unvoiced classifier 1220 may classify the set of parameters of the frame 1204 as voiced. For example, in response to determining that NACF indicated by the set of parameters satisfies (e.g., exceeds) a first voiced NACF threshold (e.g., 0.6), the decoder voiced/unvoiced classifier 1220 may classify the set of parameters of the frame 1204 as voiced. As another example, in response to determining that a number of zero crossings indicated by the set of parameters satisfies (e.g., is below) a zero crossing threshold (e.g., 50), the decoder voiced/unvoiced classifier 1220 may classify the set of parameters of the frame 1204 as voiced.
In response to determining that the measure does not satisfy the first threshold, the decoder voiced/unvoiced classifier 1220 may classify the set of parameters of the frame 1204 as unvoiced. For example, in response to determining that the NACF indicated by the set of parameters does not satisfy (e.g., is below) a second unvoiced NACF threshold (e.g., 0.4), the decoder voiced/unvoiced classifier 1220 may classify the set of parameters of the frame 1204 as unvoiced. As another example, in response to determining that a number of zero crossings indicated by the set of parameters does not satisfy (e.g., exceeds) the zero crossing threshold (e.g., 50), the decoder voiced/unvoiced classifier 1220 may classify the set of parameters of the frame 1204 as unvoiced.
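The threshold tests above can be sketched as a small classifier. The example thresholds (0.6, 0.4, and 50) come from the description. The handling of an NACF value that falls between the two NACF thresholds is not specified in the text, so the fallback to the zero crossing count is an assumption, and the function name is hypothetical.

```python
VOICED_NACF_THRESHOLD = 0.6     # example value from the description
UNVOICED_NACF_THRESHOLD = 0.4   # example value from the description
ZERO_CROSSING_THRESHOLD = 50    # example value from the description


def classify_frame(nacf, zero_crossings):
    """Classify a frame's low-band parameters as "voiced" or "unvoiced"."""
    if nacf > VOICED_NACF_THRESHOLD:
        return "voiced"
    if nacf < UNVOICED_NACF_THRESHOLD:
        return "unvoiced"
    # Assumption: an ambiguous NACF falls back to the zero crossing count.
    if zero_crossings < ZERO_CROSSING_THRESHOLD:
        return "voiced"
    return "unvoiced"


labels = [
    classify_frame(0.8, 90),   # strongly periodic
    classify_frame(0.2, 10),   # weakly periodic
    classify_frame(0.5, 30),   # ambiguous NACF, few zero crossings
    classify_frame(0.5, 70),   # ambiguous NACF, many zero crossings
]
```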
The vector quantization codebook index module 1230 may select one or more quantization vector indices corresponding to one or more matched quantized vectors 1206. For example, the vector quantization codebook index module 1230 may select indices of one or more quantization vectors based on a distance, such as described with respect to
In response to the decoder voiced/unvoiced classifier 1220 classifying the set of parameters of the frame 1204 as voiced, the voiced/unvoiced prediction model switching module 1200 may select a particular quantization vector of the matched quantized vectors 1206 corresponding to a particular quantization vector index of the voiced codebook 1240. For example, the voiced/unvoiced prediction model switching module 1200 may select multiple quantization vectors of the matched quantization vectors 1206 corresponding to multiple quantization vector indices of the voiced codebook 1240.
In response to the decoder voiced/unvoiced classifier 1220 classifying the set of parameters of the frame 1204 as unvoiced, the voiced/unvoiced prediction model switching module 1200 may select a particular quantization vector of the matched quantized vectors 1206 corresponding to a particular quantization vector index of the unvoiced codebook 1250. For example, the voiced/unvoiced prediction model switching module 1200 may select multiple quantization vectors of the matched quantization vectors 1206 corresponding to multiple quantization vector indices of the unvoiced codebook 1250.
A set of high-band parameters 1208 may be predicted based on the selected quantization vector(s). For example, if the decoder voiced/unvoiced classifier 1220 classifies the set of low-band parameters of the frame 1204 as voiced, the set of high-band parameters 1208 may be predicted based on the matched quantization vectors of the voiced codebook 1240. As another example, if the decoder voiced/unvoiced classifier 1220 classifies the set of low-band parameters of the frame 1204 as unvoiced, the set of high-band parameters 1208 may be predicted based on the matched quantization vectors of the unvoiced codebook 1250.
The voiced/unvoiced prediction model switching module 1200 may predict the high-band parameters 1208 using a codebook (e.g., the voiced codebook 1240 or the unvoiced codebook 1250) that better corresponds to the frame 1204, resulting in increased accuracy of the predicted high-band parameters 1208 as compared to using a single codebook for voiced and unvoiced frames. For example, if the frame 1204 corresponds to voiced audio, the voiced codebook 1240 may be used to predict the high-band parameters 1208. As another example, if the frame 1204 corresponds to unvoiced audio, the unvoiced codebook 1250 may be used to predict the high-band parameters 1208.
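The codebook switching can be illustrated with a short sketch. It assumes that each codebook entry pairs a low-band quantization vector with the high-band parameters associated with it, and that a single nearest entry is used for the prediction. The data layout, the values, and the function names are hypothetical.

```python
def nearest_entry(low_band, codebook):
    """Return the codebook entry whose low-band vector is closest."""
    return min(
        codebook,
        key=lambda entry: sum((x - y) ** 2 for x, y in zip(low_band, entry[0])),
    )


def predict_high_band(low_band, is_voiced, voiced_codebook, unvoiced_codebook):
    """Predict high-band parameters from the codebook matching the class."""
    codebook = voiced_codebook if is_voiced else unvoiced_codebook
    _, high_band = nearest_entry(low_band, codebook)
    return high_band


# Each entry: (low-band vector, associated high-band parameters).
# All values are made up for illustration.
voiced_codebook = [([0.1, 0.2], [0.9, 0.8]), ([0.6, 0.7], [0.5, 0.4])]
unvoiced_codebook = [([0.1, 0.2], [0.3, 0.3]), ([0.6, 0.7], [0.2, 0.1])]

frame = [0.58, 0.69]
voiced_prediction = predict_high_band(frame, True, voiced_codebook, unvoiced_codebook)
unvoiced_prediction = predict_high_band(frame, False, voiced_codebook, unvoiced_codebook)
```

The same low-band frame yields different high-band predictions depending on the classification, which is the effect of keeping separate codebooks.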
Referring to
The method 1300 includes receiving a set of low-band parameters corresponding to a frame of an audio signal, at 1302. For example, the voiced/unvoiced prediction model switching module 1200 may receive the set of low-band parameters corresponding to the frame 1204, as described with reference to
The method 1300 also includes classifying the set of low-band parameters as voiced or unvoiced, at 1304. For example, the decoder voiced/unvoiced classifier 1220 may classify the set of low-band parameters as voiced or unvoiced, as described with reference to
The method 1300 further includes selecting a quantization vector, where the quantization vector corresponds to a first plurality of quantization vectors associated with voiced low-band parameters when the set of low-band parameters is classified as voiced low-band parameters, and where the quantization vector corresponds to a second plurality of quantization vectors associated with unvoiced low-band parameters when the set of low-band parameters is classified as unvoiced low-band parameters, at 1306. For example, the voiced/unvoiced prediction model switching module 1200 of
The method 1300 further includes predicting a set of high-band parameters based on the selected quantization vector, at 1310. For example, the voiced/unvoiced prediction model switching module 1200 of
In particular embodiments, the method 1300 of
Referring to
The multistage high-band error detection module 1400 includes a buffer 1416 coupled to a voicing classification module 1420. The voicing classification module 1420 is coupled to a gain condition tester 1430 and to a gain frame modification module 1440. In a particular embodiment, the multistage high-band error detection module 1400 may include fewer or more than the illustrated modules.
During operation, the buffer 1416 and the voicing classification module 1420 may receive low-band parameters 1402 corresponding to a low-band audio signal. In a particular embodiment, the low-band parameters 1402 may correspond to the low-band parameters 302 of
The buffer 1416 may receive and store the first set of low-band parameters. Subsequently, the voicing classification module 1420 may receive the second set of low-band parameters and may receive the stored first set of low-band parameters (e.g., from the buffer 1416). The voicing classification module 1420 may classify the first set of low-band parameters as voiced or unvoiced, such as described with reference to
The gain condition tester 1430 may receive a gain frame parameter 1412 (e.g., a predicted high-band gain frame) corresponding to the second frame 1406. In a particular embodiment, the gain condition tester 1430 may receive the gain frame parameter 1412 from the soft vector quantization module 312 and/or the voiced/unvoiced prediction model switch 316 of
The gain condition tester 1430 may determine whether the gain frame parameter 1412 is to be adjusted based at least partially on the classification (e.g., voiced or unvoiced) of the first set of low-band parameters and of the second set of low-band parameters by the voicing classification module 1420 and based on an energy value corresponding to the second set of low-band parameters. For example, the gain condition tester 1430 may compare the energy value corresponding to the second set of low-band parameters to a threshold energy value, an energy value corresponding to the first set of low-band parameters, or both, based on the classification of the first set of low-band parameters and the second set of low-band parameters. The gain condition tester 1430 may determine whether the gain frame parameter 1412 is to be adjusted based on the comparison, based on determining whether the gain frame parameter 1412 satisfies (e.g., is below) a threshold gain, or both, as further described with reference to
The gain frame modification module 1440 may modify the gain frame parameter 1412 in response to the gain condition tester 1430 determining that the gain frame parameter 1412 is to be adjusted. For example, the gain frame modification module 1440 may modify the gain frame parameter 1412 to satisfy the threshold gain.
The multistage high-band error detection module 1400 may detect whether the gain frame parameter 1412 is unstable (e.g., corresponds to an energy value that is disproportionately higher than energies of adjacent frames or sub-frames) and/or may lead to noticeable artifacts in the generated wide band audio signal. In response to the gain condition tester 1430 determining that a high-band prediction error may have occurred, the multistage high-band error detection module 1400 may adjust the gain frame parameter 1412 to generate an adjusted gain frame parameter 1414, as described further with respect to
Referring to
The method 1500 includes determining whether a first set of low-band parameters and a second set of low-band parameters are both classified as voiced, at 1502. For example, the gain condition tester 1430 of
The method 1500 also includes, in response to determining that at least one of the first set of low-band parameters or the second set of low-band parameters is not classified as voiced, at 1502, determining whether the first set of low-band parameters is classified as unvoiced and the second set of low-band parameters is classified as voiced, at 1504. For example, the gain condition tester 1430 of
The method 1500 further includes, in response to determining that the first set of low-band parameters is not classified as unvoiced or that the second set of low-band parameters is not classified as voiced, at 1504, determining whether the first set of low-band parameters is classified as voiced and the second set of low-band parameters is classified as unvoiced, at 1506. For example, the gain condition tester 1430 of
The method 1500 also includes in response to determining that the first set of low-band parameters is not classified as voiced or that the second set of low-band parameters is not classified as unvoiced, at 1506, determining whether the first set of low-band parameters and the second set of low-band parameters are both classified as unvoiced, at 1508. For example, the gain condition tester 1430 of
The method 1500 further includes, in response to determining that the first set of low-band parameters and the second set of low-band parameters are both classified as voiced, at 1502, determining whether a first energy value and a second energy value satisfy (e.g., exceed) a first energy threshold value, at 1522. For example, the gain condition tester 1430 of
The method 1500 also includes, in response to determining that the first set of low-band parameters is classified as unvoiced and the second set of low-band parameters is classified as voiced, at 1504, determining whether the second energy value ELB(n) satisfies the first energy threshold value E0 and whether the second energy value is greater than a first multiple (e.g., 4) of the first energy value ELB(n−1), at 1524. For example, the gain condition tester 1430 of
The method 1500 further includes, in response to determining that the first set of low-band parameters is classified as voiced and the second set of low-band parameters is classified as unvoiced, at 1506, determining whether the second energy value ELB(n) satisfies the first energy threshold value E0 and whether the second energy value is greater than a second multiple (e.g., 2) of the first energy value ELB(n−1), at 1526. For example, the gain condition tester 1430 of
The method 1500 also includes, in response to determining that the first set of low-band parameters and the second set of low-band parameters are both classified as unvoiced, at 1508, determining whether the second energy value ELB(n) is greater than a third multiple (e.g., 100) of the first energy value ELB(n−1), at 1528. For example, the gain condition tester 1430 of
The method 1500 further includes, in response to determining that the second energy value is less than or equal to the third multiple (e.g., 100) of the first energy value, at 1528, determining whether the second energy value ELB(n) satisfies the first energy threshold E0, at 1530. For example, the gain condition tester 1430 of
The method 1500 also includes, in response to determining that the first energy value and the second energy value satisfy the first energy threshold, at 1522, that the second energy value satisfies the first energy threshold and the second energy value is greater than the first multiple of the first energy value, at 1524, that the second energy value satisfies the first energy threshold and the second energy value is greater than the second multiple of the first energy value, at 1526, or that the second energy value satisfies the first energy threshold at 1530, determining whether a gain frame parameter satisfies a threshold gain, at 1540. The method 1500 further includes, in response to determining that the gain frame parameter does not satisfy the threshold gain, at 1540, or that the second energy value is greater than the third multiple of the first energy value, at 1528, adjusting the gain frame parameter, at 1550. For example, the gain frame modification module 1440 may adjust the gain frame parameter 1412 in response to determining that the gain frame parameter 1412 does not satisfy the threshold gain or in response to determining that the second energy value is greater than the third multiple of the first energy value, as further described with reference to
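The decision flow of the method 1500 can be summarized in a short sketch. The multiples (4, 2, and 100) are the example values from the description. The sketch assumes that an energy value "satisfies" the energy threshold when it exceeds it, that the gain frame parameter "satisfies" the threshold gain when it does not exceed it, and that adjusting the gain means clamping it to the threshold gain. The function names and the numeric thresholds in the usage lines are hypothetical.

```python
def should_adjust_gain(prev_voiced, curr_voiced, prev_energy, curr_energy,
                       gain, energy_threshold, gain_threshold):
    """Return True when the gain frame parameter is to be adjusted."""
    if prev_voiced and curr_voiced:            # 1502 -> 1522
        check_gain = (prev_energy > energy_threshold
                      and curr_energy > energy_threshold)
    elif not prev_voiced and curr_voiced:      # 1504 -> 1524
        check_gain = (curr_energy > energy_threshold
                      and curr_energy > 4 * prev_energy)
    elif prev_voiced and not curr_voiced:      # 1506 -> 1526
        check_gain = (curr_energy > energy_threshold
                      and curr_energy > 2 * prev_energy)
    else:                                      # 1508 -> 1528
        if curr_energy > 100 * prev_energy:
            return True                        # adjust directly, at 1550
        check_gain = curr_energy > energy_threshold   # 1530
    # 1540: adjust only when the gain fails the threshold gain test.
    return check_gain and gain > gain_threshold


def adjust_gain(prev_voiced, curr_voiced, prev_energy, curr_energy,
                gain, energy_threshold, gain_threshold):
    """Assumption: adjusting clamps the gain to the threshold gain."""
    if should_adjust_gain(prev_voiced, curr_voiced, prev_energy, curr_energy,
                          gain, energy_threshold, gain_threshold):
        return min(gain, gain_threshold)
    return gain


E0, G_MAX = 10.0, 1.0  # hypothetical thresholds
both_voiced = adjust_gain(True, True, 20.0, 30.0, 1.5, E0, G_MAX)
low_prev_energy = adjust_gain(True, True, 5.0, 30.0, 1.5, E0, G_MAX)
onset = adjust_gain(False, True, 5.0, 30.0, 1.5, E0, G_MAX)
unvoiced_jump = adjust_gain(False, False, 0.01, 2.0, 1.5, E0, G_MAX)
unvoiced_steady = adjust_gain(False, False, 1.0, 50.0, 0.8, E0, G_MAX)
```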
In particular embodiments, the method 1500 of
Referring to
The method 1600 includes receiving a first set of low-band parameters corresponding to a first frame of an audio signal, at 1602. For example, the buffer 1416 of
The method 1600 also includes receiving a second set of low-band parameters corresponding to a second frame of the audio signal, at 1604. The second frame may be subsequent to the first frame within the audio signal. For example, the voicing classification module 1420 of
The method 1600 further includes classifying the first set of low-band parameters as voiced or unvoiced and classifying the second set of low-band parameters as voiced or unvoiced, at 1606. For example, the voicing classification module 1420 of
The method 1600 also includes selectively adjusting a gain parameter based on a classification of the first set of low-band parameters, a classification of the second set of low-band parameters, and an energy value corresponding to the second set of low-band parameters, at 1608. For example, the gain frame modification module 1440 may adjust the gain frame parameter 1412 based on the classification of the first set of low-band parameters, the classification of the second set of low-band parameters, and an energy value (e.g., the second energy value ELB(n)) corresponding to the second set of low-band parameters, as further described with reference to
In particular embodiments, the method 1600 of
Referring to
In the following description, various functions performed by the system 1700 of
The narrowband decoder 1710 may be configured to receive the narrowband bitstream 1702 (e.g., an adaptive multi-rate (AMR) bitstream, an enhanced full rate (EFR) bitstream, or an enhanced variable rate CODEC (EVRC) bitstream associated with an EVRC, such as EVRC-B). The narrowband decoder 1710 may be configured to decode the narrowband bitstream 1702 to recover a low-band audio signal 1734 corresponding to the narrowband bitstream 1702. In a particular embodiment, the low-band audio signal 1734 may represent speech. As an example, a frequency of the low-band audio signal 1734 may range from approximately 0 hertz (Hz) to approximately 4 kilohertz (kHz). The low-band audio signal 1734 may be in the form of pulse-code modulation (PCM) samples. The low-band audio signal 1734 may be provided to the synthesis filterbank 1740.
The high-band parameter prediction module 1720 may be configured to receive low-band parameters 1704 (e.g., AMR parameters, EFR parameters, or EVRC parameters) from the narrowband bitstream 1702. The low-band parameters 1704 may include linear prediction coefficients (LPC), line spectral frequencies (LSF), gain shape information, gain frame information, and/or other information descriptive of the low-band audio signal 1734. In a particular embodiment, the low-band parameters 1704 include AMR parameters, EFR parameters, or EVRC parameters corresponding to the narrowband bitstream 1702.
Because the system 1700 is integrated into the decoding system (e.g., the decoder) of the speech vocoder, the low-band parameters 1704 from an encoder's analysis (e.g., from an encoder of the speech vocoder) may be accessible to the high-band parameter prediction module 1720 without the use of a “tandeming” process that introduces noise and other errors that reduce the quality of the predicted high-band. For example, conventional BBE systems (e.g., post-processing systems) may perform synthesis analysis in a narrowband decoder (e.g., the narrowband decoder 1710) to generate a low-band signal in the form of PCM samples (e.g., the low-band audio signal 1734) and additionally perform signal analysis (e.g., speech analysis) on the low-band signal to generate low-band parameters. This tandeming process (e.g., the synthesis analysis and the subsequent signal analysis) introduces noise and other errors that reduce the quality of the predicted high-band. By accessing the low-band parameters 1704 from the narrowband bitstream 1702, the system 1700 may forgo the tandeming process to predict the high-band with improved accuracy.
For example, based on the low-band parameters 1704, the high-band parameter prediction module 1720 may generate predicted high-band parameters 1706. The high-band parameter prediction module 1720 may use soft vector quantization to generate the predicted high-band parameters 1706, such as in accordance with one or more of the embodiments described with reference to
The high-band model module 1730 may use the predicted high-band parameters 1706 to generate a high-band signal 1732. As an example, a frequency of the high-band signal 1732 may range from approximately 4 kHz to approximately 8 kHz. In a particular embodiment, the high-band model module 1730 may use the predicted high-band parameters 1706 and low-band residual information (not shown) generated from the narrowband decoder 1710 to generate the high-band signal 1732, in a similar manner as described with respect to
The synthesis filterbank 1740 may be configured to receive the high-band signal 1732 and the low-band audio signal 1734 and generate a wideband output 1736. The wideband output 1736 may include a wideband speech output that includes the decoded low-band audio signal 1734 and the predicted high-band signal 1732. A frequency of the wideband output 1736 may range from approximately 0 Hz to approximately 8 kHz, as an illustrative example. The wideband output 1736 may be sampled (e.g., at approximately 16 kHz) to reconstruct the combined low-band and high-band signals.
The system 1700 of
The integration of the system 1700 into the decoder of the speech vocoder may support other integrated functions of the speech vocoder that are supplemental features of the speech vocoder. As non-limiting examples, homing sequences, in-band signaling of network features/controls, and in-band data modems may be supported by the system 1700. For example, by integrating the system 1700 (e.g., the BBE system) with the decoder, a homing sequence output of a wideband vocoder may be synthesized such that the homing sequence may be passed across narrowband junctures (or wideband junctures) in a network (e.g., interoperation scenarios). For in-band signaling or in-band modems, the system 1700 may allow the decoder to remove in-band signals (or data), and the system 1700 may synthesize a wideband bitstream that includes the signals (or data) as opposed to a conventional BBE system in which in-band signals (or data) are lost through tandeming.
Although the system 1700 of
Alternatively, the interworking function may predict the high-band from the narrowband parameters (e.g., without using the narrowband PCM) and encode a wideband vocoder bitstream (e.g., without using the wideband PCM). A similar approach may be used in conference bridges to synthesize a wideband output (e.g., the wideband output 1736) from multiple narrowband inputs.
Referring to
The method 1800 includes receiving, at a decoder of a speech vocoder, a set of low-band parameters as part of a narrowband bitstream, at 1802. For example, referring to
A set of high-band parameters may be predicted based on the set of low-band parameters, at 1804. For example, referring to
The method 1800 of
Referring to
One or more components of the system 1900 may be implemented via dedicated hardware (e.g., circuitry), by a processor executing instructions to perform one or more tasks, or a combination thereof. As an example, the memory 1932 or one or more components of the high-band parameter prediction module 1972 may be a memory device, such as a random access memory (RAM), magnetoresistive random access memory (MRAM), spin-torque transfer MRAM (STT-MRAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, or a compact disc read-only memory (CD-ROM). The memory device may include instructions (e.g., the instructions 1960) that, when executed by a computer (e.g., a processor in the CODEC 1934 and/or the processor 1910), may cause the computer to perform at least a portion of one of the method 200 of
Those of skill would further appreciate that the various illustrative logical blocks, configurations, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software executed by a processing device such as a hardware processor, or combinations of both. Various illustrative components, blocks, configurations, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or executable software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in a memory device, such as random access memory (RAM), magnetoresistive random access memory (MRAM), spin-torque transfer MRAM (STT-MRAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, or a compact disc read-only memory (CD-ROM). An exemplary memory device is coupled to the processor such that the processor can read information from, and write information to, the memory device. In the alternative, the memory device may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a computing device or a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a computing device or a user terminal.
The previous description of the disclosed embodiments is provided to enable a person skilled in the art to make or use the disclosed embodiments. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other embodiments without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope possible consistent with the principles and novel features as defined by the following claims.
The present application claims priority from U.S. Provisional Application No. 61/916,264, filed Dec. 15, 2013, which is entitled “SYSTEMS AND METHODS OF BLIND BANDWIDTH EXTENSION,” and from U.S. Provisional Application No. 61/939,148, filed Feb. 12, 2014, which is entitled “SYSTEMS AND METHODS OF BLIND BANDWIDTH EXTENSION,” the content of which is incorporated by reference in its entirety.
Number | Date | Country |
---|---|---|
20150170654 A1 | Jun 2015 | US |

Number | Date | Country |
---|---|---|
61916264 | Dec 2013 | US |
61939148 | Feb 2014 | US |