The present disclosure relates to a method and device for time-domain bandwidth expansion of an excitation signal during encoding/decoding of a cross-talk sound signal.
In the present disclosure and the appended claims:
In many conversational applications there are often situations when one person talks over another. As mentioned herein above, such situations are often referred to as “cross-talk”. Cross-talk speech segments may be problematic in modern speech encoding/decoding systems. Since traditional speech encoding technologies have been designed and optimized mainly for single-talk content (only one person talking), the quality of cross-talk speech may be severely impacted by the encoding/decoding operations. As an example, one of the most serious issues in cross-talk speech encoding/decoding in the 3GPP EVS codec (Reference [1], of which the full content is incorporated herein by reference) is the occasional presence of “rattling noise”. “Rattling noise” is a strong, annoying sound produced at frequencies from 8 kHz to 14 kHz, that is within the high-band frequency range as defined herein above.
At low bitrates of the 3GPP EVS codec the high-band frequency content is encoded/decoded using the super wideband bandwidth extension (SWB TBE) tool as described in Reference [1]. Due to the limited number of bits available for the SWB TBE tool, the high-band excitation signal within the high-band frequency range is not encoded directly. Instead, the low-band excitation signal within the low-band frequency range is calculated using an ACELP (Algebraic Code-Excited Linear Prediction) encoder (Reference [2], of which the full content is incorporated herein by reference), then upsampled and extended up to 14 kHz or 16 kHz depending on the high-band frequency range, and used as a replacement for the high-band excitation signal. If there is a mismatch between the low-band excitation signal and the high-band excitation signal, the synthesized sound may sound different compared to the original sound. When the low-band excitation signal is voiced but the high-band excitation signal is unvoiced, the synthesized sound will be perceived as the above defined rattling noise. The problem of rattling noise in the cross-talk content is illustrated in the spectral plot of
The plot in
The present disclosure relates to the following aspects:
The foregoing and other objects, advantages and features of the method and device for time-domain bandwidth expansion of an excitation signal during encoding/decoding of a cross-talk sound signal will become more apparent upon reading of the following non-restrictive description of illustrative embodiments thereof, given by way of example only with reference to the accompanying drawings.
In the appended drawings:
The following description relates to a technique for encoding/decoding cross-talk sound signals. In the present disclosure, the basis for the encoding/decoding technique is the SWB TBE tool of the 3GPP EVS codec as described in Reference [1]. However, it should be kept in mind that this technique may be used in conjunction with other encoding/decoding technologies.
More specifically, the present disclosure proposes a series of modifications to the SWB TBE tool. An objective of this series of modifications is to improve the quality of synthesized cross-talk sound signals, such as cross-talk speech signals, in particular but not exclusively to eliminate the above defined rattling noise. The series of modifications is concerned with time-domain bandwidth expansion of an excitation signal and is distributed in one or more of the following three areas:
Calculation of the high-band voicing factor in accordance with the present disclosure uses a high-band autocorrelation function itself calculated from the temporal envelope of the high-band residual signal for example in the down-sampled domain. The high-band voicing factor is used in the encoder to replace the so-called voice factors derived from the low-band voicing parameter in the SWB TBE tool.
Calculation of the high-band mixing factor in accordance with the present disclosure replaces the corresponding method in the SWB TBE tool. The high-band mixing factor determines a proportion of a low-band excitation signal (for example from an ACELP core) and a random noise (which may also be defined as “white noise”) excitation signal for producing the time-domain bandwidth expanded excitation signal. In the disclosed implementation, the high-band mixing factor is calculated by means of MSE (Mean Squared Error) minimization between the temporal envelope of the random noise excitation signal and the temporal envelope of the low-band excitation signal, for example in the down-sampled domain. Quantization of the high-band mixing factor may be performed by the existing quantizer of the SWB TBE tool. The addition of the quantized high-band mixing factor to the SWB TBE bitstream results in a small increase of the bitrate. The mixing operation is performed both at the encoder and the decoder. Other properties of the mixing operation may comprise a re-scaling of the random noise excitation signal at the beginning of each frame and an interpolation of the high-band mixing factor to ensure smooth transitions between the current frame and the previous frame.
Estimation of the gain/shape parameters in accordance with the present disclosure comprises post-processing of the gain/shape parameters using adaptive smoothing of the unquantized gain/shape parameters (in the encoder) by means of weighting between original gain/shape parameters and interpolated gain/shape parameters. Quantization of the gain/shape parameters may be performed by the existing quantizer of the SWB TBE tool. The adaptive smoothing is applied twice; it is first applied to the unquantized gain/shape parameters (in the encoder), and then to the quantized gain/shape parameters (both in the encoder and decoder). An adaptive attenuation is applied to the unquantized frame gain at the encoder. The adaptive attenuation is based on an MSE excess error which is a by-product of the SHB voicing parameter calculation in the SWB TBE tool.
Referring to
where N32 k is the number of samples in the frame (frame length). In this particular non-limitative example, the input sound signal sinp(n) is sampled at the rate of Fs=32 kHz and the length of a single frame is N32 k=640 samples. This corresponds to a time interval of 20 ms. Frames of given duration, each including a given number of sub-frames and including a given number of successive sound signal samples, are used for processing sound signals in the field of sound signal encoding; further information about such frames can be found, for example, in Reference [1].
The method 200 comprises a downsampling operation 201 and the device 250 comprises a downsampler 251 for conducting operation 201. The downsampler 251 downsamples the input sound signal sinp(n) from 32 kHz to 12.8 kHz or 16 kHz depending on the bitrate of the encoder. For example, the input sound signal in the 3GPP EVS codec is downsampled to 12.8 kHz for all bitrates up to 24.4 kbps and to 16 kHz otherwise. The resulting signal is a low-band signal 202. The low-band signal 202 is encoded in an ACELP encoding operation 203 using an ACELP encoder 253.
The method 200 comprises the ACELP encoding operation 203 while the device 250 comprises the ACELP encoder 253 of the 3GPP EVS codec to perform the ACELP encoding. The ACELP encoder 253 generates two types of excitation signals, an adaptive codebook excitation signal 204 and a fixed codebook excitation signal 205 as described in Reference [1].
In the method 200 and device 250, the SWB TBE tool within the 3GPP EVS codec performs a low-band excitation signal generating operation 207 and comprises a corresponding generator 257 for generating the low-band excitation signal 208. The generator 257 uses the two excitation signals 204 and 205 as an input, mixes them together and applies a non-linear transformation to produce a mixed signal with flipped spectrum which is further processed in the SWB TBE tool to result in the low-band excitation signal 208 of
As a non-limitative example, the low-band excitation signal 208 with flipped spectrum is sampled at 16 kHz and denoted using the following relation (2):
where N=320 is the frame length.
Referring to
Following processing in the QMF filter bank 259, the method 200 comprises an operation 211 of estimating high-band filter coefficients 212 and the device 250 comprises an estimator 261 to perform operation 211. The estimator 261 estimates the high-band LP (Linear Prediction) filter coefficients 212 from the high-band target signal 210 in four consecutive subframes per frame, where each subframe has a length of 80 samples. The estimator 261 calculates the high-band LP filter coefficients 212 using the Levinson-Durbin algorithm as described in Reference [1]. The high-band LP filter coefficients 212 may be denoted using the following relation (4):
where P=10 is the order of the high-band LP filter and j=0, . . . , 3 is the subframe index. The first LP filter coefficient in each subframe is unitary, i.e. ajHB(0)=1.
The method 200 comprises an operation 213 of generating a high-band residual signal 214 and the device 250 comprises a generator 263 of the high-band residual signal to conduct operation 213. The generator 263 produces the high-band residual signal 214 by filtering the high-band target signal 210 from the QMF analysis filter bank 259 with the high-band LP filter (LP filter coefficients 212) from estimator 261. The high-band residual signal 214 may be expressed, for example, using the following relation (5):
The first P samples of the high-band residual signal 214 are calculated using the high-band target signal 210 from the previous frame. This is indicated by the negative index in sHB(−k), k=1, . . . , P in the summation term. The negative indices refer to the samples of the high-band target signal 210 at the end of the previous frame.
Section 3 (High-Band Autocorrelation Function) relates to features of the encoder.
The high-band residual signal 214 calculated by the generator 263 using relation (5) is used to calculate a high-band autocorrelation function and a high-band voicing factor. The high-band autocorrelation function is not calculated directly on the high-band residual signal 214. Direct calculation of the high-band autocorrelation function requires significant computational resources. Furthermore, the dynamics of the high-band residual signal 214 are generally low and the spectral flipping process often leads to smearing the differences between voiced and unvoiced sound signals. To avoid these problems the high-band autocorrelation function is estimated on the temporal envelope of the high-band residual signal 214, for example in the downsampled domain.
The method 200 comprises an operation 215 of calculating the temporal envelope of the high-band residual signal 214 and the device 250 comprises a calculator 265 to perform operation 215. To calculate the temporal envelope RTD(n) 216 of the high-band residual signal 214, the calculator 265 processes the high-band residual signal 214 through a sliding moving-average (MA) filter comprising, in the example implementation, M=20 taps. The temporal envelope calculation can be expressed, for example, by the following relation (6):
where the negative samples rHB(k), k=−M/2, . . . , −1 refer to the values of the high-band residual signal 214 in the previous frame. In mode switching scenarios it may happen that the high-band residual signal 214 in the previous frame is not calculated and the values are unknown. In that case the first M/2 values rHB(k), k=0, . . . , M/2−1 are replicated and used as a replacement for the values rHB(k), k=−M/2, . . . , −1 of the previous frame. The calculator 265 approximates the last M values of the temporal envelope RTD(n) 216 in the current frame by means of IIR (Infinite Impulse Response) filtering. This can be done using the following relation (7):
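As an illustrative, non-normative sketch, the moving-average envelope of relation (6) and the mode-switching fallback described above may be approximated as follows. The use of the absolute value of the residual inside the average is an assumption, and the IIR approximation of the last M values (relation (7)) is omitted here.

```python
# Sketch of the temporal-envelope calculation of relation (6): a centered
# M-tap moving average over the absolute high-band residual.  Using |r|
# is an assumption; the IIR tail of relation (7) is not reproduced.
M = 20

def temporal_envelope(r_cur, r_prev=None):
    """Centered moving average over |r|.  r_prev supplies the M/2
    look-back samples; if unavailable (mode switching), the first M/2
    samples of the current frame are replicated, as in the text."""
    if r_prev is None:
        past = [abs(x) for x in r_cur[:M // 2]]       # mode-switching fallback
    else:
        past = [abs(x) for x in r_prev[-(M // 2):]]
    ext = past + [abs(x) for x in r_cur]
    env = []
    # The last M/2 outputs would need look-ahead; the disclosure
    # approximates them by IIR filtering (relation (7)), omitted here.
    for n in range(len(r_cur) - M // 2):
        env.append(sum(ext[n:n + M]) / M)
    return env
```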
The operation 215 of calculating the temporal envelope RTD(n) 216 of the high-band residual signal 214 is illustrated in
The method 200 comprises a temporal envelope downsampling operation 217 and the device 250 comprises a downsampler 267 for conducting operation 217. The downsampler 267 downsamples the temporal envelope RTD(n) 216 by a factor of 4 using, for example, the following relation (8):
The method 200 comprises a mean value calculating operation 219 and the device 250 comprises a calculator 269 for conducting operation 219. The calculator 269 divides the down-sampled temporal envelope R4 kHz(n) 218 into four consecutive segments and calculates the mean value 220 of the down-sampled temporal envelope R4 kHz(n) 218 in each segment using, for example, the following relation (9):
where k is the index of the segment.
The calculator 269 limits all the mean values to a maximum value of 1.0.
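The decimation of relation (8) and the segmental means of relation (9), including the limiting to 1.0, can be sketched as follows; simple decimation by 4 (rather than, e.g., averaged downsampling) is an assumption.

```python
def downsample4(env):
    """Assumed simple decimation by a factor of 4 (relation (8))."""
    return env[::4]

def segment_means(env4, n_seg=4):
    """Divide the downsampled envelope into four consecutive segments
    and take the mean of each, capped at 1.0 (relation (9) plus the
    limiting step performed by calculator 269)."""
    seg_len = len(env4) // n_seg
    return [min(1.0, sum(env4[k * seg_len:(k + 1) * seg_len]) / seg_len)
            for k in range(n_seg)]
```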
The method 200 comprises a normalization factor calculating operation 221 and the device 250 comprises a calculator 271 for conducting operation 221. The calculator 271 uses the down-sampled temporal envelope mean values 220 to calculate, for the respective segments k, segmental normalization factors using, for example, the following relation (10):
The calculator 271 then linearly interpolates the segmental normalization factors from relation (10) within the entire interval of the current frame to produce interpolated normalization factors 222 using, for example, the following relation (11):
This interpolation process performed by operation 221 and calculator 271 is illustrated in
In relation (11), the term η−1 refers to the last segmental normalization factor in the previous frame. Therefore, η−1 is updated with η3 after the interpolation process in each frame.
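The interpolation of relation (11), including the carry-over of the last segmental factor η−1 from the previous frame, can be sketched as below; the exact sample phase of the linear ramp is an assumption.

```python
def interp_norm_factors(eta, eta_prev, seg_len):
    """Linearly interpolate the segmental normalization factors across
    the frame (relation (11)).  eta_prev is the last segmental factor
    of the previous frame; the updated value to store for the next
    frame is returned alongside the interpolated factors."""
    gamma = []
    last = eta_prev
    for k in range(len(eta)):
        for n in range(seg_len):
            # ramp from the previous segment's factor to eta[k]
            gamma.append(last + (eta[k] - last) * (n + 1) / seg_len)
        last = eta[k]
    return gamma, eta[-1]
```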
The method 200 comprises a downsampled temporal envelope normalizing operation 223 and the device 250 comprises a normalizer 273 for conducting operation 223. The normalizer 273 processes the down-sampled temporal envelope R4 kHz(n) 218 from the downsampler 267 with the interpolated normalization factors γ(n) 222 using, for example, the following relation (12):
The normalizer 273 then subtracts the global mean value
It is useful to estimate the tilt of the temporal envelope of the high-band residual signal. For that purpose, the method 200 comprises a temporal envelope tilt estimation operation 225 and the device 250 comprises an estimator 275 for conducting operation 225. The temporal envelope tilt estimation can be done by fitting a linear curve to the segmental mean values
According to the LLS method, the objective is to minimize the sum of squared differences between
The optimal slope aLLS (tilt 226) can be calculated by the estimator 275 using relation (16):
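In code, the closed-form least-squares slope underlying relation (16) may be sketched as follows, fitting over the four segmental mean values; this is the standard LLS slope formula, assumed to match the relation.

```python
def lls_slope(y):
    """Closed-form linear-least-squares slope of y over the indices
    0..len(y)-1 (cf. relation (16)): the tilt of the temporal envelope
    fitted through the segmental mean values."""
    n = len(y)
    xm = (n - 1) / 2.0                     # mean of the index axis
    ym = sum(y) / n                        # mean of the values
    num = sum((k - xm) * (yk - ym) for k, yk in enumerate(y))
    den = sum((k - xm) ** 2 for k in range(n))
    return num / den
```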
The method 200 comprises a high-band autocorrelation function calculating operation 227 and the device 250 comprises a calculator 277 for conducting operation 227. The calculator 277 calculates the high-band autocorrelation function Xcorr 228 based on the normalized temporal envelope using, for example, relation (17):
where Ef is the energy of the normalized temporal envelope Rnorm(n) 224 in the current frame and Ef[−1] is the energy of the normalized temporal envelope Rnorm(n) 224 in the previous frame. The calculator 277 may use the following relation (18) to calculate the energy:
In case of mode switching the factor in front of the summation term in relation (17) is set to 1/Ef because the energy of the normalized temporal envelope Rnorm(n) 224 in the previous frame is unknown.
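A sketch of the energy-normalized autocorrelation of relations (17) and (18), with the mode-switching fallback to 1/Ef, is given below. The lag range, the windowing, and the replication of the current frame as previous-frame memory in the mode-switching case are assumptions.

```python
import math

def hb_autocorr(r_cur, r_prev=None, max_lag=20):
    """Autocorrelation of the normalized temporal envelope with lags
    reaching into the previous frame, scaled by 1/sqrt(Ef * Ef_prev)
    (relations (17)-(18)).  On mode switching the factor falls back to
    1/Ef and the current frame is replicated as memory (assumption)."""
    Ef = sum(x * x for x in r_cur)
    if r_prev is None:
        norm = 1.0 / Ef                    # mode-switching fallback
        r_prev = r_cur                     # assumed memory replication
    else:
        Ef_prev = sum(x * x for x in r_prev)
        norm = 1.0 / math.sqrt(Ef * Ef_prev)
    ext = list(r_prev) + list(r_cur)
    N, off = len(r_cur), len(r_prev)
    return [norm * sum(ext[off + n] * ext[off + n - k] for n in range(N))
            for k in range(1, max_lag + 1)]
```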
The method 200 comprises a high-band voicing factor calculating operation 229 and the device 250 comprises a calculator 279 for conducting operation 229.
The voicing of the high-band residual signal is closely related to the variance σcorr of the high-band autocorrelation function Xcorr 228. The calculator 279 calculates the variance σcorr using, for example, the following relation (19):
To improve the discriminative potential (VOICED/UNVOICED decision) of the voicing parameter νmult, the calculator 279 multiplies the variance σcorr with the maximum value of the high-band autocorrelation function Xcorr 228 as expressed in the following relation (20):
The calculator 279 then transforms the voicing parameter νmult from relation (20) with the sigmoid function to limit its dynamic range and obtain a high-band voicing factor νHB 230 using, for example, the following relation (21):
where the factor β is estimated experimentally and set, for example, to a constant value of 25.0. The high-band voicing factor νHB 230 as calculated from relation (21) above is then limited to the range [0.0; 1.0] and transmitted to the decoder.
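The variance, the multiplication by the maximum (relation (20)), and the sigmoid limiting (relation (21)) can be sketched as follows; the exact form of the sigmoid used in the relation is not given above, so the logistic function below is an assumption.

```python
import math

def hb_voicing_factor(xcorr, beta=25.0):
    """Variance of the autocorrelation values multiplied by their
    maximum (relation (20)), mapped through an assumed logistic
    sigmoid with beta = 25.0 (relation (21)) and clipped to [0, 1]."""
    n = len(xcorr)
    mean = sum(xcorr) / n
    var = sum((x - mean) ** 2 for x in xcorr) / n   # sigma_corr
    v_mult = var * max(xcorr)                       # relation (20)
    v_hb = 1.0 / (1.0 + math.exp(-beta * v_mult))   # assumed sigmoid
    return min(1.0, max(0.0, v_hb))
```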
Section 4 (Excitation Mixing Factor) relates to features of both the encoder and decoder.
The SWB TBE tool in the 3GPP EVS codec uses the low-band excitation signal 208 (
Referring to
The pseudo-random noise generator 551 produces a random noise excitation signal 502 with uniform distribution. For example, the generator of pseudo-random numbers of the 3GPP EVS codec as described in Reference [1] can be used as pseudo-random noise generator 551. The random noise excitation signal wrand 502 can be expressed using the following relation (22):
The random noise excitation signal wrand 502 has zero mean and a non-zero variance σrand=1.14e+11. It should be noted that the variance is only approximate and represents an average value over 100 frames.
The method 200 comprises an operation 503 of calculating the power of the low-band excitation signal lLB(n) 208 and a power calculator 553 to perform operation 503.
The power calculator 553 calculates the power 504 of the low-band excitation signal lLB(n) 208 transmitted from the encoder using, for example, the following relation (23):
The method 200 comprises an operation 505 of normalizing the power of the random noise excitation signal 502 and a power normalizer 555 to perform operation 505.
The power normalizer 555 normalizes the power of the random noise excitation signal 502 to the power 504 of the low-band excitation signal 208 using, for example, the following relation (24):
Although the true variance of the random noise excitation signal 502 varies from frame to frame, the exact value is not needed for power normalization. Instead, the above defined approximate value of the variance is used in the above relation (24) to save computational resources.
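The power calculation of relation (23) and the normalization of relation (24) with the fixed approximate variance can be sketched as below; the per-sample averaging in the power estimate is an assumption.

```python
import math

SIGMA_RAND = 1.14e11    # approximate variance of the noise generator

def lb_power(l):
    """Power of the low-band excitation (cf. relation (23));
    per-sample averaging is an assumption."""
    return sum(x * x for x in l) / len(l)

def normalize_noise(w_rand, p_lb):
    """Relation (24): scale the random noise so its approximate power
    matches p_lb, using the fixed variance SIGMA_RAND instead of the
    true per-frame variance to save computational resources."""
    g = math.sqrt(p_lb / SIGMA_RAND)
    return [g * w for w in w_rand]
```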
The method 200 comprises an operation 507 of mixing the low-band excitation signal lLB(n) 208 with the power normalized random noise excitation signal wwhite(n) 506 and a mixer 557 to perform operation 507.
The mixer 557 produces the time-domain bandwidth expanded excitation signal 508 by mixing the low-band excitation signal lLB(n) 208 with the power normalized random noise excitation signal wwhite(n) 506 using a high-band mixing factor to be described later in the present disclosure.
Referring to
As illustrated in
The calculator 652 calculates the downsampled temporal envelope W4 kHz(n) 606 of the power-normalized random noise excitation signal wwhite(n) 506 (which is also calculated at the encoder as shown in
Similarly, the calculator 654 calculates the temporal envelope L4 kHz(n) 605 of the low-band excitation signal lLB(n) 208 downsampled at 4 kHz, again using the same algorithm as described in Section 3 (High-Band Autocorrelation Function and Voicing Factor). The downsampled temporal envelope 605 of the low-band excitation signal lLB(n) 208 can be denoted as follows:
The objective of the MSE minimization operation 601 is to find an optimal pair of gains g*l, g*w minimizing the energy of the error between (a) the combined temporal envelope (L4 kHz(n), W4 kHz(n)) and (b) the temporal envelope R4 kHz(n) of the high-band residual signal rHB(n) 214. This can be mathematically expressed using relation (27):
For that purpose, the MSE minimizer 651 solves a system of linear equations. The solution is found in the scientific literature. For example, the optimal pair of gains g*l, g*w can be calculated using relation (28):
where the values c0, . . . , c4, and c5 are given by
The MSE minimizer 651 then calculates the minimum MSE error energy (excess error) using, for example, the following relation (30):
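The 2x2 system of normal equations behind relations (27) to (29), and the excess error of relation (30), can be sketched as follows; the correlation terms below play the role of the coefficients c0, . . . , c5, though their exact labeling in relation (29) is an assumption.

```python
def optimal_gains(L, W, R):
    """Solve the normal equations minimizing sum (R - gl*L - gw*W)^2
    (relations (27)-(29)) via Cramer's rule, and return the optimal
    gain pair plus the residual (excess) error energy of relation (30)."""
    cLL = sum(a * a for a in L)
    cWW = sum(a * a for a in W)
    cLW = sum(a * b for a, b in zip(L, W))
    cLR = sum(a * b for a, b in zip(L, R))
    cWR = sum(a * b for a, b in zip(W, R))
    det = cLL * cWW - cLW * cLW
    gl = (cWW * cLR - cLW * cWR) / det
    gw = (cLL * cWR - cLW * cLR) / det
    # minimum MSE error energy (excess error, relation (30))
    e_err = sum(r * r for r in R) - gl * cLR - gw * cWR
    return gl, gw, e_err
```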
For further processing, the gain quantizer 657 scales the optimal gains g*l, g*w in such a way that a gain gln associated with the temporal envelope L4 kHz(n) 605 of the low-band excitation signal lLB(n) becomes unitary, with a gain gwn associated with the temporal envelope W4 kHz(n) 606 of the power-normalized random noise excitation signal wwhite(n) 506 given using, for example the following relation (31):
The result/advantage of the re-scaling of relation (31) is that only one parameter, the normalized gain gwn, needs to be coded and transmitted in the bitstream from the encoder to the decoder instead of two parameters. Therefore, scaling of the gains using relation (31) reduces bit consumption and simplifies the quantization process. On the other hand, the energy of the combined temporal envelopes (L4 kHz(n) and W4 kHz(n)) will not match the energy of the temporal envelope R4 kHz(n) of the high-band residual signal 214. This is not a problem since the SWB TBE tool uses subframe gains and a global gain containing the information about energy of the high-band residual signal. The calculation of subframe gains and the global gain is described in Section 6 (Gain/Shape Estimation) of the present disclosure.
The gain quantizer 657 limits the normalized gain gwn between a maximum threshold of 1.0 and a minimum threshold of 0.0. The gain quantizer 657 quantizes the normalized gain gwn using, for example, a 3-bit uniform scalar quantizer described by the following relation (32):
and the resulting index idxg 610 is limited to the interval {0; 7} to form/represent the high-band mixing factor and is transmitted in the SWB TBE bitstream together with the existing indices of the SWB TBE encoder at 0.95 kbps or 1.6 kbps.
Referring back to
The mixing factor decoder 559 produces from the received index idxg 610 a decoded gain using, for example, the following relation (33):
The decoded gain from relation (33) forms the high-band mixing factor fmix 510.
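A plain 3-bit uniform scalar quantizer/dequantizer pair consistent with the description above can be sketched as follows; the exact rounding rule of relation (32) and the reconstruction levels of relation (33) are assumptions.

```python
def quantize_gain(gwn, bits=3):
    """Assumed 3-bit uniform scalar quantizer on [0, 1]
    (cf. relation (32)); the index is limited to {0; 7}."""
    levels = (1 << bits) - 1               # 7 for 3 bits
    g = min(1.0, max(0.0, gwn))
    idx = int(round(g * levels))
    return min(levels, max(0, idx))

def decode_gain(idx, bits=3):
    """Assumed inverse mapping producing the decoded gain that forms
    the high-band mixing factor f_mix (cf. relation (33))."""
    return idx / float((1 << bits) - 1)
```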
The low-band excitation signal lLB(n) 208, sampled for example at 16 kHz, and the normalized random noise excitation signal wwhite(n) 506, sampled for example at 16 kHz, are mixed together in the mixer 557. However, both the energy of the low-band excitation signal lLB(n) 208 and the energy of the random noise excitation signal wrand 502 vary from frame to frame. The fluctuation of energy could eventually generate audible artifacts at frame borders if the low-band excitation signal lLB(n) 208 and the random noise excitation signal wrand 502 were mixed directly using the high-band mixing factor fmix 510 obtained from relation (33). To ensure smooth transitions the energy of the random noise excitation signal wrand 502 is linearly interpolated in generator 551 between the previous frame and the current frame. This can be done by scaling the random noise excitation signal wrand 502 in the first half of the current frame with the following interpolation factor:
where ELB is the energy of the low-band excitation signal lLB(n) 208 in the current frame and ELB[−1] is the energy of the low-band excitation signal lLB(n) 208 in the previous frame.
To further smooth the transitions between the previous and the current frame the decoder 559 also linearly interpolates the high-band mixing factor fmix 510. This can be done by introducing the scaling factor βmix(n) calculated, for example, using the following relation:
where fmix[−1] is the value of the high-band mixing factor in the previous frame. Note that the interpolation factor ζw(n) calculated in relation (34) and the scaling factor βmix(n) calculated in relation (35) are defined for n=0, . . . , N/2−1.
The mixing of the low-band excitation signal lLB(n) 208 and the random noise excitation signal wwhite(n) 506 is finally done by the mixer 557 using, for example, relation (36) to obtain a time-domain bandwidth expanded excitation signal u(n) 508.
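A sketch of the mixing step is given below. The additive form u(n) = l(n) + β(n)·w(n) follows from the unit low-band gain of relation (31) but is an assumption about relation (36), and the energy interpolation ζw(n) of relation (34) is omitted; only the interpolation of the mixing factor over the first half frame (relation (35)) is reproduced.

```python
def mix_excitation(l, w, f_mix, f_mix_prev):
    """Assumed mixing of relation (36): u(n) = l(n) + beta(n) * w(n),
    where beta(n) ramps linearly from the previous frame's mixing
    factor to the current one over the first half of the frame
    (cf. relation (35)) and is constant afterwards."""
    N = len(l)
    half = N // 2
    u = []
    for n in range(N):
        if n < half:
            beta = f_mix_prev + (f_mix - f_mix_prev) * (n + 1) / half
        else:
            beta = f_mix
        u.append(l[n] + beta * w[n])
    return u
```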
The high-band LP filter coefficients ajHB(n) 212 calculated by means of the LP analysis on the high-band input signal sHB(n) in relation (4) are converted in the encoder of the SWB TBE tool into LSF parameters and quantized. At the bitrate of 0.95 kbps the SWB TBE encoder uses 8 bits to quantize the LSF indices. At the bitrate of 1.6 kbps the SWB TBE encoder uses 21 bits to quantize the LSF indices.
Referring back to
The decoded high-band LP filter coefficients 512 can be denoted as:
where P=10 is the order of the LP filter. The first decoded LP filter coefficient in each subframe is unitary, i.e. aqjHB(0)=1.0, j=0, . . . , 3.
The method 200 comprises a filtering operation 515 and the device 250 comprises a corresponding synthesis filter 565 using the decoded high-band LP filter coefficients 514 to filter the mixed time-domain bandwidth expanded excitation signal 508 of relation (36) using for example the following relation (38) to obtain a LP-filtered high-band signal yHB 516:
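The all-pole LP synthesis filtering of relation (38) can be sketched as follows, with a(0) = 1 and the filter memory carried over from the previous frame.

```python
def lp_synthesis(u, a, y_prev=None):
    """All-pole LP synthesis filter (cf. relation (38)):
    y(n) = u(n) - sum_{k=1..P} a[k] * y(n-k), with a[0] = 1.
    y_prev supplies the P memory samples, oldest first (zeros if
    omitted), so y_prev[-1] is y(-1)."""
    P = len(a) - 1
    mem = list(y_prev) if y_prev is not None else [0.0] * P
    y = []
    for n, x in enumerate(u):
        acc = x
        for k in range(1, P + 1):
            past = y[n - k] if n - k >= 0 else mem[n - k]
            acc -= a[k] * past
        y.append(acc)
    return y
```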
A gain/shape parameter smoothing is applied both at the encoder and at the decoder. The adaptive attenuation of the frame gain is applied at the encoder only.
The spectral shape of the high-band target signal sHB(n) 210 is encoded with the quantized LSF coefficients. Referring to
The normalized estimated temporal subframe gains 702 from estimator 751 can be denoted using relation (39):
The method 200 comprises a calculating operation 703 and the device 250 comprises a corresponding calculator 753 for determining a temporal tilt 704 of the normalized estimated temporal subframe gains gk 702 by means of linear least squares (LLS) interpolation. As illustrated in
The linear curve 801 built with the LLS interpolation method can be defined using the following relation (40):
where the parameters cLLS and dLLS are found by minimizing the sum of squared differences between the true subframe gains gk 702 and the corresponding points on the linear curve for all k=0, . . . , 3 subframes. This can be expressed using the following relation (41):
By expanding relation (41) it is possible to express the temporal tilt gtilt of the estimated temporal subframe gains gk 702. The temporal tilt gtilt 704 is, in fact, equal to the optimal slope cLLS of the linear curve. The temporal tilt gtilt can be calculated in the calculator 753 using the following relation (42):
The method 200 comprises a smoothing operation 705 and the device 250 comprises a corresponding smoother 755 for smoothing the temporal subframe gains gk 702 with the interpolated (LLS) gains gk[LLS] from relation (40) when, for example, the following condition is true:
The smoothing of the temporal subframe gains gk 702 is then done by the smoother 755 using, for example, the following relation (44):
where the weight κ is proportional to the high-band voicing factor νHB 230 (
and limited to a maximum value of 1.0 and a minimum value of 0.0.
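The adaptive smoothing of relation (44) can be sketched as below; the proportionality constant linking κ to the high-band voicing factor (relation (45)) is not given above, so the `scale` parameter is an assumption.

```python
def smooth_gains(g, g_lls, v_hb, scale=1.0):
    """Adaptive smoothing (cf. relation (44)): weighted average of the
    original subframe gains g and their LLS-interpolated counterparts
    g_lls.  The weight kappa is proportional to the high-band voicing
    factor v_hb (relation (45)) and clipped to [0, 1]; `scale` is an
    assumed proportionality constant."""
    kappa = min(1.0, max(0.0, scale * v_hb))
    return [kappa * gl + (1.0 - kappa) * gk for gk, gl in zip(g, g_lls)]
```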
The method 200 comprises a gain-shape quantizing operation 707 and the device 250 comprises a corresponding gain-shape quantizer 757 for quantizing the smoothed temporal subframe gains
The method 200 comprises an interpolation operation 709 and the device 250 comprises a corresponding interpolator 759 for interpolating, after the quantization operation 707, the quantized temporal subframe gains ĝk 708 again using the same LLS interpolation procedure as described in relations (40) and (41). The interpolated quantized subframe gains 710 in the four consecutive subframes in a frame can be denoted using the following relation (47):
The method 200 comprises a tilt calculation operation 711 and the device 250 comprises a corresponding tilt calculator 761 for calculating the tilt of the interpolated quantized temporal subframe gains ĝk[LLS]710 using, for example, relation (42). The tilt of the interpolated quantized temporal subframe gains ĝk[LLS]710 can be denoted as ĝtilt[LLS].
The quantized temporal subframe gains ĝk 708 are then smoothed when the condition of the following condition (48) is true, where idxg is the index from relation (32):
For that purpose, the method 200 comprises a quantized gains smoothing operation 713 and the device 250 comprises a corresponding smoother 763 for smoothing the quantized temporal subframe gains ĝk 708 by means of averaging using, for example, the interpolated temporal subframe gains ĝk[LLS] 710 from relation (47). For that purpose, the following relation (49) can be used:
The method 200 comprises a frame gain estimating operation 715 and the device 250 comprises a corresponding frame gain estimator 765. The SWB TBE tool uses the frame gain to control the global energy of the synthesized high-band sound signal. The frame gain is estimated by means of energy-matching between (a) the LP-filtered high-band signal yHB 516 of relation (38) multiplied by the smoothed quantized temporal subframe gains {tilde over (g)}k 714 from relation (49) and (b) the high-band target signal sHB(n) 210 of relation (3). The LP-filtered high-band signal yHB 516 of relation (38) is multiplied by the smoothed quantized temporal subframe gains {tilde over (g)}k 714 using, for example, the following relation (50):
The details of the frame gain estimation operation 715 are described in Reference [1]. The estimated frame gain parameter is denoted as gf (see 716).
The method 200 comprises an operation 717 of calculating a synthesis high-band signal 718 and the device 250 comprises a calculator 767 for performing the operation 717. The calculator 767 may modify the estimated frame gain gf 716 under some specific conditions. For example, the frame gain gf can be attenuated according to relation (51) under given values of the high-band voicing factor νHB 230 (
where Eerr is the MSE excess error energy calculated in relation (30) and fatt is an attenuation factor for example calculated as:
Further modifications to the frame gain gf under some specific conditions are described in Reference [1].
The calculator 767 then quantizes the modified frame gain using the frame gain quantizer of the encoder of the SWB TBE tool of Reference [1].
Finally, the calculator 767 determines the synthesized high-band sound signal 718 using, for example, the following relation (53):
The method 200 and device 250 may be implemented as a part of a mobile terminal, as a part of a portable media player, or in any similar device. The device 250 (identified as 900 in
The input 902 is configured to receive the input signal. The output 904 is configured to supply the time-domain bandwidth expanded excitation signal. The input 902 and the output 904 may be implemented in a common module, for example a serial input/output device.
The processor 906 is operatively connected to the input 902, to the output 904, and to the memory 908. The processor 906 is realized as one or more processors for executing code instructions in support of the functions of the various operations and elements of the above described method 200 and device 250 as shown in the accompanying figures and/or as described in the present disclosure.
The memory 908 may comprise a non-transient memory for storing code instructions executable by the processor 906, specifically, a processor-readable memory comprising/storing non-transitory instructions that, when executed, cause a processor to implement the operations and elements of the method 200 and device 250. The memory 908 may also comprise a random access memory or buffer(s) to store intermediate processing data from the various functions performed by the processor 906.
Those of ordinary skill in the art will realize that the description of the method 200 and device 250 are illustrative only and are not intended to be in any way limiting. Other embodiments will readily suggest themselves to such persons with ordinary skill in the art having the benefit of the present disclosure. Furthermore, the disclosed method 200 and device 250 may be customized to offer valuable solutions to existing needs and problems of encoding and decoding sound.
In the interest of clarity, not all of the routine features of the implementations of the method 200 and device 250 are shown and described. It will, of course, be appreciated that in the development of any such actual implementation of the method 200 and device 250, numerous implementation-specific decisions may need to be made in order to achieve the developer's specific goals, such as compliance with application-, system-, network- and business-related constraints, and that these specific goals will vary from one implementation to another and from one developer to another. Moreover, it will be appreciated that a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking of engineering for those of ordinary skill in the field of sound processing having the benefit of the present disclosure.
In accordance with the present disclosure, the elements, processing operations, and/or data structures described herein may be implemented using various types of operating systems, computing platforms, network devices, computer programs, and/or general purpose machines. In addition, those of ordinary skill in the art will recognize that devices of a less general purpose nature, such as hardwired devices, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or the like, may also be used. Where a method comprising a series of operations and sub-operations is implemented by a processor, computer or a machine and those operations and sub-operations may be stored as a series of non-transitory code instructions readable by the processor, computer or machine, they may be stored on a tangible and/or non-transient medium.
Processing operations and elements of the method 200 and device 250 as described herein may comprise software, firmware, hardware, or any combination(s) of software, firmware, or hardware suitable for the purposes described herein.
In the method 200 and device 250, the various processing operations and sub-operations may be performed in various orders and some of the processing operations and sub-operations may be optional.
Although the present disclosure has been described hereinabove by way of non-restrictive, illustrative embodiments thereof, these embodiments may be modified at will within the scope of the appended claims without departing from the spirit and nature of the present disclosure.
The present disclosure mentions the following references, of which the full content is incorporated herein by reference:
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/CA2023/050117 | 1/27/2023 | WO |

Number | Date | Country
---|---|---
63306291 | Feb 2022 | US