The present disclosure relates to a quantization scale factor determination apparatus and a quantization scale factor determination method.
A Modified Discrete Cosine Transform (MDCT) spectral arithmetic coding technique is one coding technique for encoding a speech signal or an audio signal (also referred to as a “speech audio signal”) at a low bit rate. This coding technique, for example, scales (also referred to as quantization scaling), quantizes, and performs arithmetic coding on MDCT spectra (e.g., see Patent Literature (hereinafter referred to as “PTL”) 1).
PTL 1
Japanese Unexamined Patent Application Publication (Translation of PCT Application) No. 2019-514065
However, there is scope for further study on a method for reducing the amount of mathematical operation in coding of speech signals or audio signals.
One non-limiting exemplary embodiment of the present disclosure facilitates providing a quantization scale factor determination apparatus and a quantization scale factor determination method capable of reducing the amount of mathematical operation in coding of speech signals or audio signals.
A quantization scale factor determination apparatus according to an embodiment of the present disclosure includes: correction circuitry, which, in operation, corrects an initial value of a quantization scale factor based on whether or not a spectrum of a speech audio signal has sparsity; and search circuitry, which, in operation, searches for the quantization scale factor based on the initial value.
Note that these generic or specific aspects may be achieved by a system, an apparatus, a method, an integrated circuit, a computer program, or a recording medium, and also by any combination of the system, the apparatus, the method, the integrated circuit, the computer program, and the recording medium.
According to one exemplary embodiment of the present disclosure, it is possible to reduce the amount of mathematical operation in coding of speech signals or audio signals.
Additional benefits and advantages of the disclosed exemplary embodiments will become apparent from the specification and drawings. The benefits and/or advantages may be individually obtained by the various embodiments and features of the specification and drawings, which need not all be provided in order to obtain one or more of such benefits and/or advantages.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
In PTL 1, for example, an inverse of a Root Mean Square (RMS) of values obtained by multiplication between an envelope of MDCT spectra obtained based on a linear predictive analysis (e.g., linear prediction coding (LPC) analysis) and the absolute values of the MDCT spectra is configured as an initial value of a “quantization scale factor” in quantization scaling of the MDCT spectra.
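The initial value described above can be sketched as follows. This is a minimal illustration only: the function name initial_scale_factor and the list-based inputs are assumptions made for the example and do not appear in PTL 1 or in the present disclosure.

```python
import math

def initial_scale_factor(envelope, mdct):
    """Inverse RMS of envelope[i] * |mdct[i]| (hypothetical helper)."""
    if len(envelope) != len(mdct) or not mdct:
        raise ValueError("envelope and mdct must be non-empty and of equal length")
    # Multiply each envelope value by the absolute value of the matching spectrum.
    weighted = [e * abs(x) for e, x in zip(envelope, mdct)]
    rms = math.sqrt(sum(w * w for w in weighted) / len(weighted))
    if rms == 0.0:
        raise ValueError("an all-zero spectrum has no finite scale factor")
    return 1.0 / rms
```

For example, with a flat envelope of 1 and four spectra of magnitude 2, the RMS is 2 and the initial value is 0.5.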
An encoding apparatus performs a search process for a quantization scale factor, for example, based on the initial value of the quantization scale factor. For example, the encoding apparatus estimates, based on the quantization scale factor, the amount of bits consumed by arithmetic coding on the MDCT spectra (e.g., referred to as the “consumption bit amount”) from an approximate expression. Then, the encoding apparatus compares the estimated consumption bit amount with a target bit amount, and searches for, for example, a quantization scale factor satisfying conditions of “not exceeding the target bit amount” and “closest to the target bit amount” in accordance with a binary search method.
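One possible reading of this search process is sketched below. The estimator estimate_bits is a stand-in that merely grows with the scale factor; it is not the approximate expression used in PTL 1. The step-halving scheme, the iteration count, and all function names are likewise assumptions for illustration.

```python
import math

def estimate_bits(mdct, scale):
    # Stand-in estimator (assumption): increases monotonically with scale.
    return sum(math.log2(1.0 + abs(x) * scale) for x in mdct)

def search_scale_factor(mdct, init_scale, target_bits, iterations=16):
    """Step-halving binary search starting from init_scale.

    Returns the largest tested scale whose estimated bit amount does not
    exceed target_bits, or None if no tested scale satisfies the target.
    The search covers the open interval (0, 2 * init_scale).
    """
    scale = init_scale
    step = init_scale / 2.0
    best = None
    for _ in range(iterations):
        if estimate_bits(mdct, scale) > target_bits:
            scale -= step          # too many bits: try a smaller scale
        else:
            best = scale           # within budget: remember and try larger
            scale += step
        step /= 2.0
    return best
```

Because the step is halved at every iteration, an initial value far from the convergence value leaves fewer iterations for fine adjustment, which is the motivation for improving the initial value.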
However, for example, the farther the initial value of the quantization scale factor is from the quantization scale factor after the search (in other words, the convergence value of the binary search), the greater the number of searches performed until the search converges. Accordingly, the amount of mathematical operation in the encoding apparatus may increase. Further, the binary search method is known to converge slowly.
Therefore, one exemplary embodiment of the present disclosure will be described in relation to a method for reducing the amount of mathematical operation in the search for a quantization scale factor.
The transmission system illustrated in
Encoding apparatus 1 encodes an input signal, such as, for example, a speech signal or an audio signal, and transmits encoded data to decoding apparatus 2 via a communication network or a storage medium (not illustrated). For example, encoding apparatus 1 may include various speech audio codecs (e.g., encoders) defined in standards such as Moving Picture Experts Group (MPEG), 3rd Generation Partnership Project (3GPP) or International Telecommunication Union Telecommunication Standardization Sector (ITU-T).
Decoding apparatus 2 decodes the encoded data received from encoding apparatus 1 via, for example, a transmission path or a storage medium, and outputs an output signal (for example, an electric signal). Decoding apparatus 2 may, for example, output the electrical signal as an acoustic wave via a speaker or headphones. Further, decoding apparatus 2 may use, for example, a decoder corresponding to the above-described speech audio codecs.
In addition, the codecs in encoding apparatus 1 may include, for example, transform coded excitation (TCX) encoding, which is one type of frequency-domain encoding. For example, encoding apparatus 1 illustrated in
The TCX encoding may be applied, for example, to encoding in low bit rate transmissions such as transmissions at 13.2 kbps or 16.4 kbps. Note that, the bit rate of transmission to which the TCX encoding is applied is not limited to 13.2 kbps and 16.4 kbps, and may be other bit rates. The TCX encoding that uses MDCT to encode excitation signals may also be referred to, for example, as “MDCT based TCX.”
For example, a frequency-domain signal obtained by MDCT performed on an input signal (hereinafter referred to as “MDCT spectrum”) and LPC coefficients obtained by LPC analysis performed on the input signal are inputted to envelope generator 11. Envelope generator 11 generates an envelope of MDCT spectra based on, for example, the LPC coefficients. Envelope generator 11 outputs envelope information indicating the generated envelope and spectral information indicating the MDCT spectra to harmonics analyzer 12.
Harmonics analyzer 12 analyzes a harmonics structure (in other words, harmonic components) in the MDCT spectra, for example, based on the information inputted from envelope generator 11. Harmonics analyzer 12 outputs, for example, harmonics information indicating the analysis result of the harmonics structure, the envelope information, and the spectral information to envelope scaler 13.
For example, the harmonics information may include information indicating whether or not the MDCT spectra have the harmonics structure (e.g., referred to as a “harmonics flag” or a “harmonics model flag”). The harmonics information may include, for example, an index (e.g., referred to as a “harmonics gain index”) indicating a harmonics gain. The harmonics gain index may be, for example, a value obtained by indexing (in other words, quantizing) the harmonics gain for each certain level. For example, the higher the value of the harmonics gain index, the higher the harmonics gain level may be.
Envelope scaler 13 performs a scaling process on the envelope of MDCT spectra based on, for example, the information inputted from harmonics analyzer 12. Envelope scaler 13 outputs envelope information indicating the scaled envelope, the harmonics information, and the spectral information to rate loop processor 14.
Rate loop processor 14 performs, based on the information inputted from envelope scaler 13, rate loop processing (also referred to as quantization rate loop processing) to calculate a quantization scale factor for quantization of MDCT spectra. Rate loop processor 14 searches for the quantization scale factor, for example, based on comparison between a consumption bit amount and a target bit amount. A search method may be, for example, a binary search method or another search method.
Further, rate loop processor 14 may configure an initial value of the quantization scale factor for the search, for example, based on the sparsity in the MDCT spectra. Note that, an example of a configuration method for configuring the initial value of the quantization scale factor in rate loop processor 14 will be described later.
Rate loop processor 14 outputs information indicating the searched quantization scale factor and spectral information to quantizer/encoder 15.
Quantizer/encoder 15 quantizes and encodes the MDCT spectra based on the information inputted from rate loop processor 14 and outputs the resulting encoded data.
Rate loop processor 14 illustrated in
In rate loop processor 14 illustrated in
Note that the calculation method for calculating the quantization scale factor in quantization scale factor calculator 141 is not limited to the method described above. For example, quantization scale factor calculator 141 may configure, as the initial value of the quantization scale factor, the inverse of the variance of the multiplication values obtained by multiplication between the envelope and the absolute values of the MDCT spectra. Further, for example, quantization scale factor calculator 141 may configure, as the initial value of the quantization scale factor, the inverse of the root mean square of the multiplication values obtained by multiplication between the envelope and the MDCT spectra (this inverse may also be multiplied by a predetermined factor).
Sparsity analyzer 142 analyzes (in other words, judges) the sparsity of MDCT spectra based on, for example, at least one of the harmonics information, spectral information, and envelope information.
The term “sparsity” means a characteristic that, for example, in the distribution of MDCT spectra, a small number of spectra (components) are non-zero and a large number of spectra (components) are zero (or have amplitudes below a threshold). Alternatively, the sparsity is a state in which, for example, a small number of spectra account for a large percentage (e.g., 50% or more) of the sum of the spectral amplitudes.
For example, sparsity analyzer 142 may determine, based on the analysis result on the sparsity, whether or not to correct the quantization scale factor inputted from quantization scale factor calculator 141. When it is determined that the quantization scale factor is to be corrected, sparsity analyzer 142 corrects the quantization scale factor and outputs information indicating the corrected quantization scale factor to quantization scale factor searcher 143. On the other hand, when the quantization scale factor is not to be corrected, sparsity analyzer 142 outputs, to quantization scale factor searcher 143, information indicating the quantization scale factor inputted from quantization scale factor calculator 141.
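The second definition can be made concrete by counting how many of the largest spectra are needed to reach a given share of the amplitude sum. The following is a minimal sketch; the function name spectra_for_ratio and the default ratio of 0.5 are assumptions for illustration.

```python
def spectra_for_ratio(mdct, ratio=0.5):
    """Smallest number of largest-magnitude spectra whose amplitude sum
    reaches `ratio` of the total amplitude sum (hypothetical helper)."""
    mags = sorted((abs(x) for x in mdct), reverse=True)
    total = sum(mags)
    if total == 0:
        return 0                    # all-zero spectrum: nothing to count
    acc = 0.0
    for k, m in enumerate(mags, start=1):
        acc += m
        if acc >= ratio * total:
            return k
    return len(mags)
```

A small return value relative to the number of spectra suggests that the amplitude is concentrated in a few components, which corresponds to the sparsity described above.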
Quantization scale factor searcher 143 searches for the quantization scale factor based on the initial value of the quantization scale factor inputted from sparsity analyzer 142. Then, for example, quantization scale factor searcher 143 performs the binary search based on the comparison result between the consumption bit amount estimated for the arithmetic coding and the target bit amount, and outputs information indicating the quantization scale factor after the search to quantizer/encoder 15 (quantizer 151).
In quantizer/encoder 15 illustrated in
Encoder 152 encodes the quantized MDCT spectra inputted from quantizer 151 and outputs the encoded data. The encoding method in encoder 152 may be, for example, arithmetic encoding or other encoding.
Sparsity analyzer 142 illustrated in
Pre-processor 1421, for example, performs pre-processing on the quantization scale factor (for example, the uncorrected quantization scale factor (initial value)) inputted from quantization scale factor calculator 141. Pre-processor 1421 may, for example, adjust the upper limit value of the quantization scale factor. Further, pre-processor 1421 may multiply the quantization scale factor by a specific value (e.g., a value less than 1.00). Pre-processor 1421 outputs information indicating the quantization scale factor after the pre-processing to sparsity determiner 1422.
Sparsity determiner 1422 determines whether or not the MDCT spectra have the sparsity. For example, sparsity determiner 1422 may judge the sparsity of the MDCT spectra based on the envelope information, harmonics information, and information on the MDCT spectra (e.g., absolute values of the MDCT spectra).
For example, in the MDCT spectra having the harmonics structure, peaks of the MDCT spectra appear intensively at certain spacings, as illustrated, for example, in
In addition, for example, as illustrated in
Therefore, sparsity determiner 1422 may judge the sparsity based on the harmonics information, for example. Sparsity determiner 1422 may judge the sparsity based on, for example, the number of spectra accounting for a percentage equal to or greater than a threshold (e.g., 50%) in the MDCT spectra (in other words, the speech signal or the audio signal). Sparsity determiner 1422 may also judge the sparsity based on, for example, the envelope based on the LPC analysis and the MDCT spectra (e.g., absolute values). Note that, the judgement on the sparsity is not limited to that performed based on at least one parameter (or feature amount) of the harmonics information, envelope information, and MDCT spectra (e.g., absolute values), and may also be performed based on other parameters.
Note that an example of a condition for judging by sparsity determiner 1422 whether or not the MDCT spectra have the sparsity will be described later.
Quantization scale factor corrector 1423 corrects the initial value of the quantization scale factor, for example, based on whether or not the MDCT spectra have the sparsity. For example, quantization scale factor corrector 1423 corrects the quantization scale factor (initial value) when the MDCT spectra have the sparsity. On the other hand, when the MDCT spectra do not have the sparsity, for example, sparsity analyzer 142 does not correct the quantization scale factor. Quantization scale factor corrector 1423 outputs the obtained quantization scale factor to quantizer/encoder 15 (for example,
Here, in
In addition, for example, as illustrated in
For this reason, the energy or mean amplitude of the entire MDCT spectra (for example, corresponding to the above-described standard deviation) can be estimated to be lower in the case where the MDCT spectra have the sparsity than in the case where the MDCT spectra do not have the sparsity. Thus, for example, the quantization scale factor (e.g., the inverse of the standard deviation) determined in quantization scale factor calculator 141 may be larger in the case where the MDCT spectra have the sparsity than the quantization scale factor in the case where the MDCT spectra do not have the sparsity or than the quantization scale factor after search.
In
As illustrated in
The correction method for correcting the quantization scale factor may be configured based on, for example, a statistical relationship (e.g., simulation result) between the quantization scale factor in the presence of sparsity and the quantization scale factor after search, as illustrated in
Note that, the parameter “1.85” is one example, and is not limited to this value. The correction method for correcting the quantization scale factor is not limited to the above method, and other methods may be used.
The operation of sparsity analyzer 142 has been described above. For example, when the MDCT spectra have the sparsity, quantization scale factor searcher 143 is capable of starting the search based on the initial value of the corrected quantization scale factor. For example, in
Next, an example of a condition (judgement method) for sparsity determiner 1422 to judge whether or not the MDCT spectra have the sparsity will be described.
Based on judgement condition 1, sparsity determiner 1422 judges the sparsity based on whether or not the MDCT spectra have the “harmonics structure” as illustrated in
For example, sparsity determiner 1422 may judge the sparsity based on the harmonics flag, the harmonics gain index, and the mean value of the absolute values of MDCT spectra (hereinafter referred to as the “spectral mean value”).
In addition, for example, sparsity determiner 1422 may judge that the MDCT spectra have the sparsity, when the harmonics flag is “ON” (in other words, when the MDCT spectra have the harmonics structure), when the harmonics gain index is equal to or higher than a threshold (in other words, when the harmonics gain is equal to or higher than the threshold), and when the number of spectra (in other words, also referred to as frequency bins or lines) exceeding the spectral mean value is less than a threshold.
For example, there is a possibility that, even when the MDCT spectra have the harmonics structure, the MDCT spectra do not have the sparsity when the number of spectra exceeding the spectral mean value is equal to or larger than the threshold, because a difference between the spectral peak components in the harmonics structure and other components different from the peak components becomes smaller. Therefore, when the number of spectra exceeding the spectral mean value is equal to or larger than the threshold, sparsity determiner 1422 may judge that the MDCT spectra do not have the sparsity.
Note that, in judgement condition 1, a plurality of thresholds for the harmonics gain index may be configured. Further, in judgement condition 1, a plurality of thresholds for the number of spectra exceeding the spectral mean value may be configured.
For example, the example illustrated in
Further, for example, the example illustrated in
Note that the values of thresholds X1, X2, Y1, and Y2 are examples, and are not limited to these values. In addition, here, the description has been given of the case where the sparsity is judged based on one of the two patterns of conditions of the combination of X1 and Y1 and the combination of X2 and Y2, but the present disclosure is not limited thereto. For example, the number of patterns of combinations of threshold X for the harmonics gain index and threshold Y for the number of spectra exceeding the spectral mean value may be one pattern or three or more patterns.
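Judgement condition 1 can be sketched as follows for a single pair of thresholds. The function name and parameter names are hypothetical, and the threshold values passed by the caller are placeholders rather than the values of X1, X2, Y1, or Y2.

```python
def is_sparse_condition1(harmonics_flag, gain_index, mdct,
                         gain_threshold, count_threshold):
    """Sketch of judgement condition 1 (names and thresholds are assumptions)."""
    if not mdct:
        return False
    # The harmonics flag must be ON and the gain index must reach the threshold.
    if not harmonics_flag or gain_index < gain_threshold:
        return False
    mags = [abs(x) for x in mdct]
    spectral_mean = sum(mags) / len(mags)
    # Count spectra (frequency bins) that exceed the spectral mean value.
    above_mean = sum(1 for m in mags if m > spectral_mean)
    return above_mean < count_threshold
```

When a plurality of threshold pairs is configured, the same function may be evaluated once per pair and the results combined with a logical OR.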
Based on judgement condition 2, sparsity determiner 1422 judges the sparsity based on the number of MDCT spectra accounting for a percentage (for example, also referred to as a “composition ratio”) equal to or larger than a threshold in the MDCT spectra, as illustrated in
For example, sparsity determiner 1422 may judge that the MDCT spectra have the sparsity, when the number of spectra accounting for the composition ratio of the MDCT spectra equal to or greater than the threshold (e.g., 50%) is equal to or less than threshold L1.
Alternatively, for example, sparsity determiner 1422 may judge that the MDCT spectra have the sparsity, when the number of spectra of the MDCT spectra accounting for the composition ratio equal to or greater than the threshold (e.g., 50%) is equal to or less than threshold L1, and when the number of spectra exceeding the root mean square (in other words, the power-mean value or the mean amplitude) of the absolute values of the MDCT spectra is less than threshold L2.
For example, when the number of spectra exceeding the root mean square of the absolute values of the MDCT spectra is equal to or greater than threshold L2, sparsity determiner 1422 may judge that the MDCT spectra do not have the sparsity because it is likely that the energy is not concentrated in a part of the spectra (in other words, is dispersed) in the distribution of the MDCT spectra.
For example, the example illustrated in
Note that judgement condition 2 may be applied, for example, to the case where the MDCT spectra do not have the harmonics structure (an example will be described later).
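The two-part form of judgement condition 2 can be sketched as follows. The function names are hypothetical, and the values supplied for l1, l2, and ratio are placeholders chosen by the caller, not values taken from the present disclosure.

```python
import math

def _count_for_ratio(mags, ratio):
    # Number of largest magnitudes needed to reach `ratio` of the total sum.
    total = sum(mags)
    acc = 0.0
    for k, m in enumerate(sorted(mags, reverse=True), start=1):
        acc += m
        if acc >= ratio * total:
            return k
    return len(mags)

def is_sparse_condition2(mdct, l1, l2, ratio=0.5):
    """Sketch of judgement condition 2 (names and thresholds are assumptions)."""
    mags = [abs(x) for x in mdct]
    if not mags or sum(mags) == 0:
        return False
    k = _count_for_ratio(mags, ratio)
    rms = math.sqrt(sum(m * m for m in mags) / len(mags))
    above_rms = sum(1 for m in mags if m > rms)
    # Few spectra carry the composition ratio, and few spectra exceed the RMS.
    return k <= l1 and above_rms < l2
```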
Based on judgement condition 3, as with judgement condition 2, sparsity determiner 1422 judges the sparsity based on the number of MDCT spectra accounting for a percentage (or the “composition ratio”) equal to or larger than a threshold in the MDCT spectra, as illustrated in
In addition, based on judgement condition 3, sparsity determiner 1422 may judge the sparsity not only based on the condition based on the composition ratio accounted for by spectra, but also based on the ratio between the “maximum value of the multiplication values obtained by multiplication between the envelope and the absolute values of the MDCT spectra” and the “root mean square.”
For example, sparsity determiner 1422 may judge that the MDCT spectra have the sparsity, when the number of spectra of the MDCT spectra accounting for the composition ratio equal to or greater than the threshold (for example, 50%) is equal to or less than threshold L1, and when the ratio of the “maximum value of the multiplication values obtained by multiplication between the envelope and the absolute values of the MDCT spectra” to the “root mean square” is equal to or greater than threshold L2.
For example, when the ratio of the “maximum value of the multiplication values obtained by multiplication between the envelope and the absolute values of the MDCT spectra” to the “root mean square” is less than threshold L2, the ratio of the mean power (or amplitude) value to the maximum peak power (or amplitude) may be large in the MDCT spectra. Therefore, since it is highly likely that the maximum peak power (or amplitude) is not concentrated in a part of the spectra (in other words, is dispersed), sparsity determiner 1422 may judge that the MDCT spectra do not have the sparsity.
For example, the example illustrated in
Note that the values of parameter k and thresholds L1 and L2 are examples, and are not limited to these values.
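Judgement condition 3 can be sketched as follows. The sketch assumes that the “root mean square” is taken over the same multiplication values (envelope multiplied by absolute spectrum); the text does not state this explicitly. The function names and the threshold values supplied by the caller are also hypothetical.

```python
import math

def is_sparse_condition3(envelope, mdct, l1, l2, ratio=0.5):
    """Sketch of judgement condition 3 (names and thresholds are assumptions)."""
    mags = [abs(x) for x in mdct]
    if not mags or len(envelope) != len(mags) or sum(mags) == 0:
        return False
    # Number of largest spectra needed to reach the composition ratio.
    total, acc, k = sum(mags), 0.0, len(mags)
    for i, m in enumerate(sorted(mags, reverse=True), start=1):
        acc += m
        if acc >= ratio * total:
            k = i
            break
    # Assumption: the RMS is computed over the envelope-weighted magnitudes.
    weighted = [e * m for e, m in zip(envelope, mags)]
    rms = math.sqrt(sum(w * w for w in weighted) / len(weighted))
    if rms == 0.0:
        return False
    peak_to_rms = max(weighted) / rms
    return k <= l1 and peak_to_rms >= l2
```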
Further, the description has been given of the case where, in judgement conditions 2 and 3, the threshold regarding the composition ratio accounted for by the spectra is 50%, but the present disclosure is not limited to 50%, and other percentages may be used.
In judgement conditions 2 and 3, for example, the condition that the composition ratio accounted for by k spectra exceeds 50% may be replaced with the condition that the percentage (for example, k/L_frame) of number k of spectra accounting for a composition ratio of 50% among the spectra in a frame (for example, number L_frame of spectra) is equal to or less than a threshold. For example, L_frame is 640, and k satisfying k/L_frame≤0.0559 is 4 when the threshold=0.0559.
Judgement conditions 1 to 3 have been described above. Note that judgement conditions 1 to 3 may be combined. In addition, the judgement condition for the sparsity is not limited to judgement conditions 1 to 3, and other judgement conditions may be used.
For example, sparsity determiner 1422 may switch the judgement condition for judging the sparsity of MDCT spectra based on the uncorrected quantization scale factor (initial value before correction) calculated based on the MDCT spectra.
For example, in the example of
Threshold n1 may be determined, for example, based on whether or not it is a quantization scale factor corresponding to MDCT spectra that may have the harmonics structure. For example, the larger the peak amplitude value of the MDCT spectra and the smaller the mean value of the MDCT spectral amplitudes, the more likely the MDCT spectra have the harmonics structure. Therefore, for example, when the uncorrected quantization scale factor is less than threshold n1 (in other words, when the peak amplitude value of the MDCT spectra is large and the mean value of the MDCT spectral amplitudes is small), sparsity determiner 1422 may judge, on the occasion of the sparsity judgement, whether or not the MDCT spectra have the harmonics structure. On the other hand, for example, when the uncorrected quantization scale factor is equal to or greater than threshold n1 (in other words, when the peak amplitude value of only several MDCT spectra is large and the mean value of the MDCT spectral amplitudes is small), sparsity determiner 1422 does not have to judge, on the occasion of the sparsity judgement, whether or not the MDCT spectra have the harmonics structure.
Threshold n2 may also be determined based on, for example, a lower limit value of the amplitude levels of the MDCT spectra scaled by the quantization scale factor.
For example, the smaller the amplitude levels of the MDCT spectra, the greater the quantization scale factor may be configured. However, when the amplitude levels of the MDCT spectra are around 0, the quantization scale factor may be configured so that the MDCT spectra are quantized as 0, rather than being configured to a larger value. In other words, depending on the configuration of the quantization scale factor, the MDCT spectra may be excessively scaled when an MDCT spectral amplitude level near 0 is forcibly quantized to a value greater than 0.
For example, in the example illustrated in
Further, for example, in
As described above, sparsity determiner 1422 switches the judgement conditions for judging the sparsity based on the uncorrected quantization scale factor (in other words, MDCT spectral amplitude levels). By switching the judgement conditions, sparsity determiner 1422 can judge the sparsity according to the features of the MDCT spectra (for example, the amplitude level, the presence or absence of the harmonics structure, or the like), and thus, the judgement accuracy for judging the sparsity can be improved.
Note that, the values of thresholds n1 and n2 are examples, and other values may be used. Further, the number of thresholds may be one or three or more.
As described above, in the present embodiment, in encoding apparatus 1, the initial value of the quantization scale factor is corrected based on whether or not the MDCT spectra of a speech signal or an audio signal have the sparsity, and the search for the quantization scale factor is performed based on the initial value. In other words, in encoding apparatus 1, the initial value of the quantization scale factor is corrected to a value closer to the quantization scale factor obtained in the binary search, for example. By this correction, for example, the number of searches in the binary search can be reduced, and the amount of mathematical operation in the search process for the quantization scale factor can be reduced. Therefore, according to the present embodiment, it is possible to reduce the amount of mathematical operation in the coding of the speech signal or the audio signal.
In variation 1, quantization scale factor searcher 143 (for example,
In
In Expression 1, “tbit” represents the target bit amount, “bfbit” represents the consumption bit amount estimated for the arithmetic encoding on the MDCT spectra in the previous search, and “crbit” represents the consumption bit amount estimated for the arithmetic encoding on the MDCT spectra in the current search. In addition, “bfscl” represents the quantization scale factor in the previous search, and “crscl” represents the quantization scale factor in the current search.
As described above, in Variation 1, quantization scale factor searcher 143 determines quantization scale factor nxscl for the next search based on difference n between consumption bit amount crbit estimated for the arithmetic coding on the MDCT spectra in the current search and target bit amount tbit, and difference m between consumption bit amount bfbit estimated for the arithmetic coding on the MDCT spectra in the previous search and the target bit amount tbit. Note that, “nxscl” satisfies “bfscl≤nxscl≤crscl” or “crscl≤nxscl≤bfscl.”
In other words, quantization scale factor searcher 143 weights the quantization scale factor used for each search based on the differences (e.g., m and n) between the consumption bit amounts estimated for the searches and the target bit amount.
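One way to realize such weighting is linear interpolation between the two scale factors, with each one weighted by the other's distance from the target bit amount. This is an assumption made for illustration, not necessarily the form of Expression 1. The function name is hypothetical; the variable names follow the text.

```python
def weighted_next_scale(bfscl, bfbit, crscl, crbit, tbit):
    """Next scale factor from a difference-weighted average (assumed form).

    m: distance of the previous estimate from the target bit amount.
    n: distance of the current estimate from the target bit amount.
    The search whose estimate is closer to the target receives more weight.
    """
    m = abs(bfbit - tbit)
    n = abs(crbit - tbit)
    if m + n == 0:
        return crscl               # both estimates already meet the target
    return (n * bfscl + m * crscl) / (m + n)
```

Because the result is a convex combination, it always lies between bfscl and crscl, consistent with the constraint on nxscl stated above. For example, with bfscl=2.0, bfbit=120, crscl=1.0, crbit=90, and tbit=100, the differences are m=20 and n=10, and the next scale factor is 4/3.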
For example, in the example illustrated in
In addition, let the quantization scale factor for the next search obtained by weighting be denoted by “wgscl,” and let the quantization scale factor for the next search obtained by the binary search be denoted by “biscl” (in the case of the binary search method, the weighting factor for biscl is 0.5). Quantization scale factor searcher 143 may determine quantization scale factor nxscl for the next search based on the weighted sum of these two quantization scale factors. The weighting factor may vary from search to search. For example, the weighting may start with nxscl=1×wgscl+0×biscl, the weight may then be changed by 0.25 at each search as given by nxscl=0.75×wgscl+0.25×biscl, nxscl=0.5×wgscl+0.5×biscl, and nxscl=0.25×wgscl+0.75×biscl, and finally nxscl=0×wgscl+1×biscl, which is the same as in the binary search method, may be used. When generalized, nxscl is expressed by Expression 2:
nxscl=α×wgscl+(1−α)×biscl, 0≤α≤1 (Expression 2)
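The blend of Expression 2 and the example weight schedule (1, 0.75, 0.5, 0.25, 0) can be sketched as follows. The function names, the tuple ALPHA_SCHEDULE, and the choice to hold the final weight once the schedule is exhausted are assumptions for illustration.

```python
# Example schedule from the text: alpha decreases by 0.25 at each search.
ALPHA_SCHEDULE = (1.0, 0.75, 0.5, 0.25, 0.0)

def blended_next_scale(wgscl, biscl, alpha):
    """nxscl = alpha * wgscl + (1 - alpha) * biscl, with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * wgscl + (1.0 - alpha) * biscl

def next_scale_for_search(wgscl, biscl, search_index):
    """Pick alpha by search index; later searches reuse the last alpha (assumption)."""
    alpha = ALPHA_SCHEDULE[min(search_index, len(ALPHA_SCHEDULE) - 1)]
    return blended_next_scale(wgscl, biscl, alpha)
```

With this schedule, the first search relies entirely on the weighted value, and the fifth and later searches coincide with the binary search method.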
According to Variation 1, for example, the quantization scale factor satisfying the target bit amount can be searched for faster (with a smaller number of searches) as compared with the case where an intermediate value of the quantization scale factors at the time of the previous search and at the time of the current search is configured as the quantization scale factor at the time of the next search. It is thus possible to reduce the number of searches for the quantization scale factor in quantization scale factor searcher 143, so as to reduce the amount of mathematical operation.
Note that the search to be compared with the consumption bit amount in the current search is not limited to the previous search (in other words, the search immediately before the current search), but may be a search before the previous search. Further, the search in which the quantization scale factor is determined based on a plurality of searches is not limited to the next search (in other words, the search immediately after the current search), but may be a search after the next search. Further, the search to be compared with the consumption bit amount in the current search is not limited to one search in the past, and the consumption bit amounts in a plurality of searches in the past may be used.
In sparsity analyzer 142 illustrated in
For example, when adjusting the upper limit value of the quantization scale factor, pre-processor 1421 may configure threshold n2 illustrated in
Note that the upper limit value of the quantization scale factor in pre-processor 1421 may be a value different from threshold n2.
For example, when the MDCT spectra are determined to have the sparsity and the number of spectra accounting for the composition ratio of the threshold (e.g., 50%) is equal to or less than the threshold, encoding apparatus 1 may perform pulse coding, rather than arithmetic coding, on the quantized MDCT spectra. By this processing, coding efficiency can be improved.
Note that, encoder 152 illustrated in
The embodiments of the present disclosure have been described above.
The present disclosure can be realized by software, hardware, or software in cooperation with hardware. Each functional block used in the description of each embodiment described above can be partly or entirely realized by an LSI such as an integrated circuit, and each process described in the each embodiment may be controlled partly or entirely by the same LSI or a combination of LSIs. The LSI may be individually formed as chips, or one chip may be formed so as to include a part or all of the functional blocks. The LSI may include a data input and output coupled thereto. The LSI herein may be referred to as an IC, a system LSI, a super LSI, or an ultra LSI depending on a difference in the degree of integration.
However, the technique of implementing an integrated circuit is not limited to the LSI and may be realized by using a dedicated circuit, a general-purpose processor, or a special-purpose processor. In addition, an FPGA (Field Programmable Gate Array) that can be programmed after the manufacture of the LSI or a reconfigurable processor in which the connections and the settings of circuit cells disposed inside the LSI can be reconfigured may be used. The present disclosure can be realized as digital processing or analogue processing.
If future integrated circuit technology replaces LSIs as a result of the advancement of semiconductor technology or other derivative technology, the functional blocks could be integrated using the future integrated circuit technology. Biotechnology can also be applied.
The present disclosure can be realized by any kind of apparatus, device or system having a function of communication, which is referred to as a communication apparatus. The communication apparatus may comprise a transceiver and processing/control circuitry. The transceiver may comprise and/or function as a receiver and a transmitter. The transceiver, as the transmitter and receiver, may include an RF (radio frequency) module and one or more antennas. The RF module may include an amplifier, an RF modulator/demodulator, or the like. Some non-limiting examples of such a communication apparatus include a phone (e.g., cellular (cell) phone, smart phone), a tablet, a personal computer (PC) (e.g., laptop, desktop, netbook), a camera (e.g., digital still/video camera), a digital player (digital audio/video player), a wearable device (e.g., wearable camera, smart watch, tracking device), a game console, a digital book reader, a telehealth/telemedicine (remote health and medicine) device, and a vehicle providing communication functionality (e.g., automotive, airplane, ship), and various combinations thereof.
The communication apparatus is not limited to being portable or movable, and may also include any kind of apparatus, device or system that is non-portable or stationary, such as a smart home device (e.g., an appliance, lighting, smart meter, control panel), a vending machine, and any other “things” in a network of an “Internet of Things (IoT)”.
The communication may include exchanging data through, for example, a cellular system, a wireless LAN system, a satellite system, etc., and various combinations thereof.
The communication apparatus may comprise a device such as a controller or a sensor which is coupled to a communication device performing a function of communication described in the present disclosure. For example, the communication apparatus may comprise a controller or a sensor that generates control signals or data signals which are used by a communication device performing a communication function of the communication apparatus.
The communication apparatus also may include an infrastructure facility, such as, e.g., a base station, an access point, and any other apparatus, device or system that communicates with or controls apparatuses such as those in the above non-limiting examples.
A quantization scale factor determination apparatus according to an embodiment of the present disclosure includes: correction circuitry, which, in operation, corrects an initial value of a quantization scale factor based on whether or not a spectrum of a speech audio signal has sparsity; and search circuitry, which, in operation, searches for the quantization scale factor based on the initial value.
In an exemplary embodiment of the present disclosure, the quantization scale factor determination apparatus further includes judgement circuitry, which, in operation, judges whether or not the spectrum has the sparsity.
In an exemplary embodiment of the present disclosure, the judgement circuitry judges the sparsity based on a harmonics structure of the spectrum.
In an exemplary embodiment of the present disclosure, the judgement circuitry judges the sparsity based on a number of spectra accounting for a percentage equal to or greater than a threshold in the speech audio signal.
In an exemplary embodiment of the present disclosure, the judgement circuitry judges the sparsity based on an absolute value of the spectrum and an envelope of the spectrum.
In an exemplary embodiment of the present disclosure, the judgement circuitry switches a condition for judging the sparsity, the switching being based on the initial value before correction that is calculated based on the spectrum.
In an exemplary embodiment of the present disclosure, the quantization scale factor determination apparatus further includes pre-processing circuitry, which, in operation, adjusts an upper limit value of the initial value, in which the judgement circuitry judges the sparsity based on an output of the pre-processing circuitry.
In one embodiment of the present disclosure, the search circuitry determines the quantization scale factor for a third search after a first search based on a difference between a target bit amount and a consumption bit amount estimated for encoding on the spectrum in the first search, and a difference between the target bit amount and a consumption bit amount estimated for encoding on the spectrum in a second search before the first search.
In an exemplary embodiment of the present disclosure, the quantization scale factor determination apparatus further includes calculation circuitry, which, in operation, calculates the initial value based on one of a variance and a standard deviation of a spectral amplitude of the speech audio signal.
A quantization scale factor determination method according to an embodiment of the present disclosure includes steps performed by a quantization scale factor determination apparatus of: correcting an initial value of a quantization scale factor based on whether or not a spectrum of a speech audio signal has sparsity; and searching for the quantization scale factor based on the initial value.
The disclosure of Japanese Patent Application No. 2019-189177 dated Oct. 16, 2019 including the specification, drawings and abstract is incorporated herein by reference in its entirety.
An exemplary embodiment of the present disclosure is useful for a transmission system for transmitting a speech signal or an audio signal, or the like.
1 Encoding apparatus
2 Decoding apparatus
10 TCX encoder
11 Envelope generator
12 Harmonics analyzer
13 Envelope scaler
14 Rate loop processor
141 Quantization scale factor calculator
142 Sparsity analyzer
143 Quantization scale factor searcher
1422 Sparsity determiner
1423 Quantization scale factor corrector
Number | Date | Country | Kind |
---|---|---|---|
2019-189177 | Oct 2019 | JP | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/033579 | 9/4/2020 | WO | |