One or more exemplary embodiments relate to decoding of a signal, and more particularly, to a method and an apparatus for generating a wideband signal from a narrowband bitstream and a device employing the same.
In most voice communication systems, the bandwidth is limited to a range from 0.3 kHz to 3.4 kHz. A speech bandwidth includes a voiced sound section and an unvoiced sound section, and the sound quality of a reconstructed signal is degraded relative to that of the original signal because of the limited bandwidth. To reduce this degradation, a wideband speech receiving device has been suggested. Wideband speech having a bandwidth from 0.05 kHz to 7 kHz may cover all voice bandwidths, including a voiced sound section and an unvoiced sound section, and its naturalness and clarity may be superior to those of narrowband speech. However, since voice communication applications, such as the public switched telephone network (PSTN), internet phone services such as VoIP and VoWiFi, and voice-related applications installed on mobile devices, are still provided based on narrowband speech codecs, significant time and cost are required to change a current codec to a wideband codec.
Therefore, to obtain a wideband signal from a narrowband signal via a decoder, various bandwidth extension techniques have been suggested. One example is a technique that allocates additional bits for a high-band, that is, guided bandwidth extension. Guided bandwidth extension extends a speech bandwidth by using encoding information transmitted from an encoder, where the additional information is included in a bitstream. An encoder analyzes a speech signal and generates and transmits the additional information for a high-band signal. A decoder generates a high-band signal based on the transmitted additional information and a low-band signal. Another example is a technique that generates a high-band signal from a low-band signal in a decoder without allocating additional bits, that is, blind bandwidth extension. To this end, techniques based on estimation using pattern recognition, such as the hidden Markov model and the Gaussian mixture model, have been suggested. However, pattern recognition requires a training process, and its efficiency may vary according to the language being recognized. Furthermore, since the amount of calculation for prediction or estimation increases significantly, it is difficult to quickly and effectively process a speech signal received in real time. In addition, the sound quality of a high-band signal generated without allocation of additional bits is relatively inferior.
Recently, it has become increasingly necessary to provide a user with a wideband signal or an ultra-wideband signal with improved sound quality from a narrowband signal, without an excessive increase in complexity and without changing the basic structure of an existing communication system, that is, the basic structure of a telephony system or a decoder used at a receiving end, even if a bandwidth extension technique is applied.
One or more exemplary embodiments provide a method and an apparatus for generating a wideband signal from a narrowband bitstream based on blind bandwidth extension and a device employing the same.
According to one or more exemplary embodiments, a method of generating a wideband signal comprises estimating a high-band spectrum parameter from a reconstructed narrowband signal based on a combination of at least two mapping schemes, estimating a high-band excitation signal from the reconstructed narrowband signal, generating a high-band signal based on the estimated high-band spectrum parameter and the estimated high-band excitation signal, and generating a wideband signal by synthesizing the reconstructed narrowband signal with the high-band signal.
According to one or more exemplary embodiments, a method of generating a wideband signal comprises estimating a high-band spectrum parameter from a reconstructed narrowband signal, whitening the reconstructed narrowband signal and estimating a high-band excitation signal based on the whitened narrowband signal, generating a high-band signal based on the estimated high-band spectrum parameter and the estimated high-band excitation signal, and generating a wideband signal by synthesizing the reconstructed narrowband signal with the high-band signal.
According to one or more exemplary embodiments, a wideband signal generating apparatus comprises a high-band signal generator, which estimates a high-band envelope signal from a reconstructed narrowband signal based on a combination of a codebook mapping scheme and a linear mapping scheme, estimates a high-band excitation signal from the reconstructed narrowband signal, and generates a high-band signal, and a synthesizer, which generates a wideband signal by synthesizing the reconstructed narrowband signal with the high-band signal.
According to one or more exemplary embodiments, a wideband signal generating apparatus comprises a high-band signal generator, which estimates a high-band envelope signal based on a reconstructed narrowband signal, estimates a high-band excitation signal based on a signal obtained by whitening the reconstructed narrowband signal, and generates a high-band signal, and a synthesizer, which generates a wideband signal by synthesizing the reconstructed narrowband signal with the high-band signal.
A wideband signal or an ultra-wideband signal with improved sound quality may be provided to a user from a narrowband signal without an excessive increase of complexity and without changing the basic structure of a communication system supporting the narrowband, that is, the basic structure of a telephony system or a decoder used in a receiving end. Furthermore, since it is not necessary to include an additional bit for bandwidth extension into a bitstream provided by an encoder, one or more exemplary embodiments may be more suitable for a low-bitrate network. Furthermore, since bandwidth extension is selectively performed based on a user input or characteristics of a narrowband signal, a narrowband signal or a wideband signal may be selectively provided.
The present invention will now be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. In the description of the present invention, if it is determined that a detailed description of commonly-used technologies or structures related to the invention may unnecessarily obscure the subject matter of the invention, the detailed description will be omitted.
Throughout the specification, it will be understood that when a portion is referred to as being “connected to” another portion, it can be “directly connected to” the other portion or “electrically connected to” the other portion via another element.
While such terms as “first,” “second,” etc., may be used to describe various components, such components must not be limited to the above terms. The above terms are used only to distinguish one component from another.
The term ‘signal’ includes parameters, coefficients, and elements and may be interpreted otherwise or may be used as a combination of definitions thereof.
In addition, the term “unit” as used in the specification means a unit for processing at least one function or operation and can be implemented by software components or hardware components, such as an FPGA or an ASIC. However, the “units” are not limited to software components or hardware components. The “units” may be embodied on a recording medium and may be configured to operate one or more processors. Therefore, for example, the “units” may include components, such as software components, object-oriented software components, class components, and task components, processes, functions, properties, procedures, subroutines, program code segments, drivers, firmware, micro codes, circuits, data, databases, data structures, tables, arrays, and variables. Components and functions provided in the “units” may be combined into a smaller number of components and “units” or may be further divided into a larger number of components and “units.”
The wideband signal generating apparatus shown in
Referring to
The high-band signal generator 130 may estimate extension parameters necessary for generating a high-band signal by using a reconstructed narrowband signal provided by the narrowband decoder 110 and may generate a high-band signal based on the estimated extension parameters. Here, examples of the extension parameters may include a spectrum parameter and an excitation signal. Examples of the spectrum parameter may include at least one of an envelope signal, an energy level, or a gain, whereas the excitation signal may be a residual signal or a residual error signal. The configuration and the operation of the high-band signal generator 130 will be described later.
The synthesizer 150 may generate a wideband signal by synthesizing the reconstructed narrowband signal provided by the narrowband decoder 110 with a high-band signal provided by the high-band signal generator 130.
The wideband signal generating apparatus shown in
Referring to
According to an embodiment, bandwidth extension may be selectively performed with regard to a voiced sound section and an unvoiced sound section. In other words, bandwidth extension may be performed on a voiced sound section, whereas no bandwidth extension may be performed on an unvoiced sound section. According to an embodiment, with regard to an unvoiced sound section, 0s or predetermined noise components may be filled into a high-band. For a voiced sound section, the signal classifier 200 may provide an enable signal for operating the high-band signal generator 230 to the high-band signal generator 230. According to another embodiment, the signal classifier 200 may determine whether to provide a reconstructed narrowband signal from the narrowband decoder 210 to the high-band signal generator 230 with regard to a voiced sound section or an unvoiced sound section.
Regarding the voiced sound section of a narrowband signal, the high-band signal generator 230 may estimate extension parameters for generating a high-band signal by using a reconstructed narrowband signal provided by the narrowband decoder 210 and generate a high-band signal by using the estimated extension parameters.
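The selective behavior described above can be sketched as follows. This is a minimal illustration, not the classifier or generator of the embodiment: the function name `high_band_for_frame`, the boolean `is_voiced` flag, the `generate_high_band` callback, and the optional `preset_noise` frame are all hypothetical names introduced here.

```python
from typing import Callable, List, Optional, Sequence


def high_band_for_frame(
    frame: Sequence[float],
    is_voiced: bool,
    generate_high_band: Callable[[Sequence[float]], Sequence[float]],
    preset_noise: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return a high-band frame for one reconstructed narrowband frame.

    Voiced frames are passed to the (hypothetical) high-band generator.
    Unvoiced frames are filled with 0s, or with a pre-set noise frame
    when one is supplied.
    """
    if is_voiced:
        return list(generate_high_band(frame))
    if preset_noise is not None:
        return list(preset_noise[: len(frame)])
    return [0.0] * len(frame)
```

The voiced/unvoiced decision itself is left to the caller, since the text does not specify how the signal classifier makes it.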
The synthesizer 250 may generate a wideband signal by synthesizing the reconstructed narrowband signal provided by the narrowband decoder 210 with the high-band signal provided by the high-band signal generator 230.
The wideband signal generating apparatus shown in
Referring to
The high-band signal generator 330 may estimate extension parameters for generating a high-band signal by using a reconstructed narrowband signal from the narrowband decoder 310 and the switching unit 320 and generate a high-band signal by using the estimated extension parameters.
The synthesizer 350 may generate a wideband signal by synthesizing the reconstructed narrowband signal provided by the narrowband decoder 310 with the high-band signal provided by the high-band signal generator 330.
According to another embodiment, when the wideband signal generating apparatus is embodied to provide a reconstructed narrowband signal from the narrowband decoder 310 to the high-band signal generator 330, the wideband signal generating apparatus may be designed, such that the high-band signal generator 330 operates when a switching signal is generated based on a user input.
The high-band signal generating module shown in
Referring to
The spectrum parameter estimator 430 may estimate a high-band spectrum parameter, e.g., a high-band envelope signal, by using the narrowband LPC coefficient provided by the first LP analyzer 410. In detail, the spectrum parameter estimator 430 may estimate a high-band envelope signal by mapping a narrowband LPC coefficient to a high-band LPC coefficient by using a combination of at least two mapping schemes. Furthermore, the spectrum parameter estimator 430 may estimate a gain from a narrowband LPC coefficient or a narrowband signal provided by the first LP analyzer 410. A gain may be estimated by using various techniques known in the art. According to an embodiment, the spectrum parameter estimator 430 may combine at least two mapping schemes, e.g., a codebook mapping and a linear mapping. Since it is difficult to process (e.g., quantize) an LPC coefficient efficiently, an LPC coefficient is commonly converted to another format, e.g., a line spectrum pair (LSP) coefficient or a line spectrum frequency (LSF) coefficient. Furthermore, an LPC coefficient may be expressed in another format, e.g., a parcor coefficient, a log-area ratio value, an immittance spectrum pair coefficient, or an immittance spectrum frequency coefficient. Alternatively, a cepstral coefficient may be used instead of an LPC coefficient.
The first LPC filtering unit 450 may generate a narrowband excitation signal by filtering a narrowband LPC coefficient provided by the first LP analyzer 410 from the reconstructed narrowband signal.
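As an illustration of how LP analysis followed by LPC filtering yields an excitation (residual) signal, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion. The choice of that method, the absence of windowing, and the function names are assumptions made for this example; the text does not state how the first LP analyzer 410 computes its coefficients.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], order: int) -> List[float]:
    """Autocorrelation r[0..order] of one frame (no windowing, for brevity)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Return prediction-error filter coefficients [1, a1, ..., ap]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    if err <= 0.0:
        return a  # silent frame: leave the filter as identity
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:
            break
    return a


def lpc_inverse_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Excitation e[n] = sum_k a[k] * x[n - k], with zero initial state."""
    return [
        sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
        for n in range(len(x))
    ]
```

Applying the same two steps a second time to the resulting excitation is one way to read the whitening performed later by the second LP analyzer 610 and the second LPC filtering unit 620.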
The excitation estimator 470 may generate a whitened narrowband excitation signal by performing LP analysis and LPC filtering on a narrowband excitation signal provided by the first LPC filtering unit 450 and estimate a high-band excitation signal by using the whitened narrowband excitation signal. In detail, a whitened high-band excitation signal may be generated by shifting the whitened narrowband excitation signal to a corresponding high-band, a narrowband excitation LPC coefficient may be generated by performing LP analysis on the narrowband excitation signal, and the narrowband excitation LPC coefficient may be linearly mapped to a corresponding high-band excitation LPC coefficient, and thus a high-band excitation LPC coefficient may be generated. A high-band excitation signal may be generated by performing LP synthesis on the whitened high-band excitation signal and the high-band excitation LPC coefficient. Although an LPC coefficient is used instead of an LSP coefficient for convenience of explanation, the LSP coefficient may be preferably used for linear mapping.
The first LP synthesizer 490 may generate a high-band signal by performing LP synthesis on a high-band spectrum parameter estimated by the spectrum parameter estimator 430 and a high-band excitation signal estimated by the excitation estimator 470.
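LP synthesis is the inverse of LPC filtering: the excitation drives an all-pole filter defined by the LPC coefficients. A minimal sketch follows, assuming the coefficient convention [1, a1, ..., ap] for the prediction-error filter; the function name is hypothetical, and any gain scaling is omitted.

```python
from typing import List, Sequence


def lp_synthesis(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole synthesis: y[n] = e[n] - sum_{k>=1} a[k] * y[n - k].

    `a` is assumed to be [1, a1, ..., ap]; the initial filter state is zero.
    """
    y: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y
```

With `a = [1.0, -0.5]`, a unit impulse produces a response that halves at every sample, which is the expected behavior of a one-pole filter.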
The spectrum parameter estimating module shown in
Referring to
The codebook mapper 530 may generate a first high-band LSP coefficient, which is a first extended spectrum parameter (that is, a first high-band codeword), by mapping a narrowband LSP coefficient to a corresponding high-band LSP coefficient by using a high-band codebook corresponding to a narrowband codebook. Each of the narrowband codebook and the high-band codebook may be designed to include N groups of codewords adjacent to one another. Each group may include the same number of codewords, but is not limited thereto. Here, codewords adjacent to one another may refer to codewords corresponding to frequencies or sizes similar to one another.
Based on a mapping result provided by the codebook mapper 530, the first linear mapper 550 may generate a second high-band LSP coefficient, which is a second extended spectrum parameter (that is, a second high-band codeword), by mapping a narrowband LSP coefficient by using a linear matrix. Here, the linear matrix may be obtained based on a relationship between narrowband training data and high-band training data.
The selector 570 may compare the first high-band LSP coefficient and the second high-band LSP coefficient to the narrowband LSP coefficient and select one of the high-band LSP coefficients exhibiting less spectrum distortion.
The first inverse-transform unit 590 may generate a high-band LPC coefficient by inverse-transforming the LSP coefficient selected by the selector 570. At least one high-band spectrum parameter, such as an envelope signal, an energy level, or a gain, may be estimated from the generated high-band LPC coefficient.
The excitation estimating module shown in
Referring to
The second LPC filtering unit 620 may generate a whitened narrowband excitation signal by filtering a narrowband excitation LPC coefficient provided by the second LP analyzer 610 from a narrowband excitation signal.
The shifter 630 may shift a whitened narrowband excitation signal provided by the second LPC filtering unit 620 to a corresponding high-band. In detail, since an excitation signal has a flat spectrum characteristic, a whitened high-band excitation signal may be generated by copying a whitened narrowband excitation signal to a high-band in a frequency domain. According to an embodiment, adaptive spectral shifting, which adjusts the frequency of a narrowband excitation signal shifted to the high-band based on pitch information, may be applied. When the adaptive spectral shifting is applied, a similar harmonic structure may be maintained between the narrowband and the high-band.
In detail, the lower region and the upper region of a high-band excitation signal in a frequency domain may be obtained by copying the upper region of a whitened narrowband excitation signal. Here, for example, the upper region of the whitened narrowband excitation signal may be a range from 1.9 kHz to 3.8 kHz, whereas the lower region and the upper region of the high-band excitation signal may be from ˜3.8 kHz to 5.7 kHz and from ˜5.7 kHz to 7.6 kHz. ˜3.8 kHz and ˜5.7 kHz indicate multiples of a fundamental frequency that is close to 3.8 kHz and 5.7 kHz and do not exceed 3.8 kHz and 5.7 kHz, respectively. For example, the fundamental frequency may be about 1.9 kHz.
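The band-edge arithmetic and the copy operation described above can be sketched as follows. The function names and the representation of the spectrum as a plain list of frequency bins are assumptions for illustration; frequencies are given in Hz to keep the arithmetic exact.

```python
import math
from typing import List, Sequence


def harmonic_edge(target_hz: float, f0_hz: float) -> float:
    """Largest integer multiple of f0_hz that does not exceed target_hz."""
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    return math.floor(target_hz / f0_hz) * f0_hz


def copy_upper_region_to_high_band(nb_bins: Sequence[float]) -> List[float]:
    """Build high-band bins by copying the upper half of the whitened
    narrowband spectrum into both the lower and the upper high-band region."""
    upper = list(nb_bins[len(nb_bins) // 2:])
    return upper + upper


# With the fundamental frequency of about 1.9 kHz mentioned in the text,
# the aligned edges coincide with 3.8 kHz and 5.7 kHz.
lower_edge = harmonic_edge(3800.0, 1900.0)
upper_edge = harmonic_edge(5700.0, 1900.0)
```

For a fundamental frequency that does not divide the nominal edges, for example 150 Hz, the lower edge moves down to 3750 Hz while 5700 Hz stays unchanged, so the copied harmonics remain on multiples of the fundamental.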
Although a spectral shifting technique is employed in the exemplary embodiment, a whitened high-band excitation signal may be generated from a whitened narrowband excitation signal by using one of techniques including a non-linear function transform, oversampling excitation, and Gaussian modulation.
The second transform unit 640 may transform a narrowband excitation LPC coefficient provided by the second LP analyzer 610 and generate a narrowband excitation LSP coefficient.
The second linear mapper 650 may generate a high-band excitation LSP coefficient by mapping a narrowband excitation LSP coefficient provided by the second transform unit 640 by using a linear matrix. According to an embodiment, a narrowband excitation LSP coefficient transformed from a narrowband excitation LPC coefficient with an order of 6 may be mapped to a high-band LSP coefficient with an order of 10 by using a single linear matrix. The linear matrix may be obtained based on a relationship between narrowband training data and high-band training data.
The second inverse-transform unit 660 may generate a high-band excitation LPC coefficient by inverse-transforming a high-band excitation LSP coefficient provided by the second linear mapper 650.
The second LP synthesizer 670 may generate a high-band excitation signal by performing LPC synthesis on a whitened high-band excitation signal provided by the shifter 630 and a high-band excitation LPC coefficient provided by the second inverse-transform unit 660.
Although the linear mapping is applied in the exemplary embodiment, a high-band excitation LSP coefficient may be generated from a narrowband excitation LSP coefficient by using a non-linear function or one of various other transform techniques.
The synthesizing module shown in
Referring to
The low pass filter 730 may set the maximum frequency of the narrowband as a cutoff frequency and perform low pass filtering on an upsampled narrowband signal provided by the upsampler 710.
The high pass filter 750 may set the minimum frequency of the high-band as a cutoff frequency and perform high pass filtering on a high-band signal generated via blind bandwidth extension. The high-band signal may be provided by one of the high-band signal generators 130, 230, and 330 of
The combiner 770 may generate a wideband signal by combining a narrowband signal provided by the low pass filter 730 with a high-band signal provided by the high pass filter 750.
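A minimal sketch of this synthesis path is shown below. It assumes 2x upsampling by zero insertion, Hamming-windowed sinc FIR filters, a shared cutoff expressed as a fraction of the wideband sampling rate, and a high-band input that is already at the wideband sampling rate; none of these choices is specified in the text, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def upsample_by_two(x: Sequence[float]) -> List[float]:
    """Insert a zero after every sample."""
    out: List[float] = []
    for s in x:
        out.extend((s, 0.0))
    return out


def lowpass_taps(cutoff: float, num_taps: int = 31) -> List[float]:
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the sampling rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]  # normalize to unity gain at DC


def highpass_taps(cutoff: float, num_taps: int = 31) -> List[float]:
    """High-pass by spectral inversion of the low-pass (num_taps must be odd)."""
    taps = [-h for h in lowpass_taps(cutoff, num_taps)]
    taps[(num_taps - 1) // 2] += 1.0
    return taps


def fir_filter(x: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Direct-form FIR filtering with zero initial state."""
    return [
        sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(x))
    ]


def synthesize_wideband(
    narrowband: Sequence[float], high_band: Sequence[float], cutoff: float = 0.25
) -> List[float]:
    """Upsample and low-pass the narrowband, high-pass the high-band, then add."""
    up = [2.0 * s for s in upsample_by_two(narrowband)]  # gain 2 offsets zero insertion
    low = fir_filter(up, lowpass_taps(cutoff))
    high = fir_filter(high_band, highpass_taps(cutoff))
    return [l + h for l, h in zip(low, high)]
```

A cutoff of 0.25 corresponds to 4 kHz at a 16 kHz wideband sampling rate; a real design would place the two cutoffs at the maximum narrowband frequency and the minimum high-band frequency, as the text describes.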
A codebook mapper 810 shown in
Referring to
First, training data sampled at a desired sampling rate may be collected with respect to a wide range of wideband content including frequency components corresponding to the narrowband and frequency components corresponding to the high-band. Here, in order to match the bandwidth of the training data to that of an actual signal to be processed, the training data may be downsampled. A narrowband codebook may be generated by applying the LBG algorithm to narrowband components of the training data. While the LBG algorithm is being applied to narrowband training data, a high-band codebook may also be generated by applying the LBG algorithm to high-band training data. Accordingly, a dual-structured codebook may include a set of representative narrowband codewords and a set of representative high-band codewords corresponding thereto. The dual-structured codebook may be generated based on a correlation between a low-band spectrum envelope and a high-band spectrum envelope for a particular speaker or a particular speaker class. Meanwhile, in each codebook, codewords may be grouped with adjacent codewords, where optimal groups may be obtained experimentally or based on a simulation with respect to training data.
The first codebook searching unit 815 may search a narrowband codebook for a narrowband LSP coefficient and may output a narrowband codeword index and a group index corresponding to the optimal codeword in the narrowband codebook. In other words, when a narrowband codeword index corresponding to the optimal codeword is found, a group index may be automatically determined. The narrowband LSP coefficient may be provided by the first transform unit 510 of
The second codebook searching unit 819 may search a high-band codebook by using a narrowband codeword index provided by the first codebook searching unit 815 and obtain a first high-band codeword at a location corresponding to the narrowband codeword index in the high-band codebook. In other words, since locations of codewords of a narrowband codebook are respectively mapped to locations of codewords of a high-band codebook via a training operation, the same codeword index may be applied.
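The two searches can be sketched as a nearest-codeword lookup followed by an index reuse. The squared Euclidean distance, the equal group size, and the function names are assumptions for this example; the text only states that the group index follows from the codeword index and that both codebooks share index positions.

```python
from typing import List, Sequence, Tuple


def nearest_codeword_index(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to `vector` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def codebook_map(
    nb_lsp: Sequence[float],
    nb_codebook: Sequence[Sequence[float]],
    hb_codebook: Sequence[Sequence[float]],
    group_size: int,
) -> Tuple[int, int, List[float]]:
    """Return (codeword index, group index, first high-band codeword).

    Groups are assumed to be runs of `group_size` adjacent codewords, so the
    group index is derived directly from the codeword index.
    """
    index = nearest_codeword_index(nb_lsp, nb_codebook)
    group = index // group_size
    return index, group, list(hb_codebook[index])
```

Because the high-band codeword is fetched by position, no distance computation is needed on the high-band side.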
Meanwhile, in the first linear mapper 830, the third storage unit 833 may store N linear matrices corresponding to N groups constituting a narrowband codebook and a high-band codebook respectively stored in the first and/or second storage units 813 and/or 817. Generation of N linear matrices will be described below in detail in conjunction with codebooks used for codebook mapping.
First, based on a nearest-neighbor search with respect to the overall training data, the set of the dual-structured codebook may be partitioned into N cluster sets, that is, N groups. Next, the overall training data may be passed through the N cluster sets to generate per-cluster training data, i.e., per-group training data. Then, N linear matrices may be constructed by applying an optimal matrix solution to the N sets of per-group training data. Meanwhile, codewords of the narrowband codebook and codewords of the high-band codebook may be rearranged, such that entries in cluster i correspond to entries of group i of each of the narrowband codebook and the high-band codebook. Here, the optimal matrix solution may employ a mapping relationship between narrowband training data and high-band training data.
The mapper 835 may read out a linear matrix corresponding to a group index provided by the first codebook searching unit 815 from the third storage unit 833 and generate a second high-band codeword by multiplying a narrowband LSP coefficient by the read-out linear matrix. A reordering operation may be performed on the generated second high-band codeword in order to sort a sequence of or an interval between LSP coefficients.
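A minimal sketch of the per-group linear mapping and the reordering step follows. The nested-list matrix layout, the `min_gap` parameter, and the function name are hypothetical; the reordering is shown as an ascending sort with an optional minimum spacing, which is one common way to keep LSP coefficients ordered and is not necessarily the procedure of the embodiment.

```python
from typing import List, Sequence


def linear_map_lsp(
    nb_lsp: Sequence[float],
    matrices: Sequence[Sequence[Sequence[float]]],
    group_index: int,
    min_gap: float = 0.0,
) -> List[float]:
    """Map a narrowband LSP vector to a high-band LSP vector.

    `matrices[group_index]` has one row per high-band coefficient and one
    column per narrowband coefficient (e.g. 10 x 6 for orders 10 and 6).
    """
    matrix = matrices[group_index]
    mapped = [sum(w * x for w, x in zip(row, nb_lsp)) for row in matrix]
    ordered = sorted(mapped)  # restore ascending LSP order
    for i in range(1, len(ordered)):
        if ordered[i] - ordered[i - 1] < min_gap:
            ordered[i] = ordered[i - 1] + min_gap  # enforce minimum spacing
    return ordered
```

The group index would come from the codebook search, so only one of the N matrices is read per frame.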
The selector 850 may calculate a spectral distortion based on a narrowband signal with respect to a first high-band codeword provided by the codebook mapper 810 and a second high-band codeword provided by the first linear mapper 830 and select one of the high-band codewords corresponding to a smaller spectral distortion value, as shown in Equation 1 below.
Here, hb
Here, p denotes an order of a narrowband LSP coefficient.
According to Equations 1 and 2, spectral distortions between p parameters of a narrowband LSP coefficient and p parameters of a first or second high-band LSP coefficient are calculated, where a high-band LSP coefficient corresponding to a smaller spectral distortion value may be selected.
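The selection logic can be illustrated with a stand-in distortion measure. Because Equations 1 and 2 are not reproduced here, the mean squared difference over the first p parameters used below is purely an assumption and should not be read as the distortion measure of the embodiment; only the "pick the candidate with the smaller distortion" rule comes from the text.

```python
from typing import List, Sequence


def lsp_distortion(nb_lsp: Sequence[float], hb_lsp: Sequence[float]) -> float:
    """Stand-in distortion: mean squared difference over the first p parameters,
    where p is the order of the narrowband LSP coefficient (assumed measure)."""
    p = len(nb_lsp)
    return sum((n - h) ** 2 for n, h in zip(nb_lsp, hb_lsp[:p])) / p


def select_high_band_codeword(
    nb_lsp: Sequence[float],
    first_hb: Sequence[float],
    second_hb: Sequence[float],
) -> List[float]:
    """Return whichever candidate has the smaller distortion (ties go to the first)."""
    if lsp_distortion(nb_lsp, first_hb) <= lsp_distortion(nb_lsp, second_hb):
        return list(first_hb)
    return list(second_hb)
```

Here `first_hb` stands for the codeword from the codebook mapper and `second_hb` for the one from the linear mapper.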
Generally, the spectrum 910 of a narrowband excitation signal provided by the first LPC filtering unit 450 of
In order to prevent amplification of a synthesized high-band signal, when the second LPC filtering unit 620 of
Referring to
From a perceptual standpoint, when a whitened excitation signal is used for blind bandwidth extension, fewer artifacts may be produced than when blind bandwidth extension is performed by using a conventional excitation signal.
Meanwhile, referring to
Referring to
In operation 1130, extension parameters for generating a high-band signal may be estimated by using the reconstructed narrowband signal, and a high-band signal may be generated by using the estimated extension parameters.
In operation 1150, a wideband signal may be generated by synthesizing the reconstructed narrowband signal with the high-band signal.
According to an embodiment, the method may further include an operation for determining whether an enable signal or a switching signal is generated based on a user input for determining whether to perform bandwidth extension, before the operation 1110. Here, the method may be embodied, such that operations 1110 through 1150 are performed when an enable signal or a switching signal is generated.
According to another embodiment, the method may further include an operation for determining whether to perform bandwidth extension based on characteristics of a narrowband signal, before the operation 1110. Here, the operations 1110 through 1150 may be performed on a voiced sound section, of which the sound quality may be enhanced via bandwidth extension. The high-band region of the remaining section, e.g., an unvoiced sound section, may be filled with 0s or pre-set noise components.
Meanwhile, if the frequency range of the narrowband is from 0.3 kHz to 3.4 kHz and the frequency range of the wideband is from 0.05 kHz to 7 kHz, bandwidth extension based on the generation of a high-band signal as described above may be performed on the range from 3.4 kHz to 7 kHz, whereas bandwidth extension based on sinusoids may be performed on the range from 0.05 kHz to 0.3 kHz.
A multimedia device 1200 shown in
Referring to
The decoding module 1230 may include a common narrowband decoding algorithm and a common bandwidth extension algorithm, where the bandwidth extension algorithm may be performed as the default algorithm or may be selectively performed based on a user input received via the switch 1237 or characteristics of a narrowband signal. The bandwidth extension algorithm included in the decoding module 1230 may be based on the operations of the wideband signal generating apparatus of
The storage unit 1250 may store a narrowband signal or a wideband signal generated by the decoding module 1230. Meanwhile, the storage unit 1250 may store various programs for operating the multimedia device 1200.
The speaker 1270 may output a narrowband signal or a wideband signal generated by the decoding module 1230 to the outside.
Meanwhile, the speaker 1270 may be connected to an external headset 1280 or an external speaker 1290 in a wired or wireless manner, where the bandwidth extension algorithm may be embodied in the headset 1280 or the external speaker 1290 instead of the decoding module 1230. In this case, the headset 1280 or the external speaker 1290 may be configured to execute the bandwidth extension algorithm when the bandwidth extension algorithm is executed as the default algorithm or when it is determined to perform bandwidth extension based on a user input received via the switch 1237 included in the headset 1280 or the external speaker 1290.
A multimedia device 1300 shown in
The multimedia devices 1200 and 1300 shown in
When the multimedia device 1200 or 1300 is, for example, a mobile phone, although not shown, the multimedia device 1200 or 1300 may further include a user input unit, such as a keypad, a display unit for displaying information processed by a user interface or the mobile phone, and a processor for controlling the functions of the mobile phone. In addition, the mobile phone may further include a camera unit having an image pickup function and at least one component for performing a function required for the mobile phone.
When the multimedia device 1200 or 1300 is, for example, a TV, although not shown, the multimedia device 1200 or 1300 may further include a user input unit, such as a keypad, a display unit for displaying received broadcasting information, and a processor for controlling all functions of the TV. In addition, the TV may further include at least one component for performing a function of the TV.
The above-described embodiments of the present invention may be implemented as programmable instructions executable by a variety of computer components and stored in a computer readable recording medium. The computer readable recording medium may include program instructions, a data file, a data structure, or any combination thereof. The program instructions stored in the computer readable recording medium may be designed and configured specifically for the present invention or can be publicly known and available to those skilled in the field of software. Examples of the computer readable recording medium include a hardware device specially configured to store and perform program instructions, for example, a magnetic medium, such as a hard disk, a floppy disk, and a magnetic tape, an optical recording medium, such as a CD-ROM, a DVD, and the like, a magneto-optical medium, such as a floptical disc, a ROM, a RAM, a flash memory, and the like. Examples of the program instructions include machine codes made by, for example, a compiler, as well as high-level language codes executable by a computer using an interpreter. (The above exemplary hardware device can be configured to operate as one or more software modules in order to perform the operation in an exemplary embodiment, and vice versa.)
While this invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. The preferred embodiments should be considered in descriptive sense only and not for purposes of limitation. Therefore, the scope of the invention is defined not by the detailed description of the invention but by the appended claims, and all differences within the scope will be construed as being included in the present invention.
Number | Date | Country | Kind |
---|---|---|---|
10-2013-0132623 | Nov 2013 | KR | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/KR2014/010456 | 11/3/2014 | WO | 00 |