This application relates to the field of audio signal encoding/decoding technologies, and more specifically, to an audio signal encoding method and apparatus, and an audio signal decoding method and apparatus.
As quality of life improves, people have an increasing demand for high-quality audio. To better transmit an audio signal by using limited bandwidth, the audio signal is usually encoded first, and then a bitstream obtained through encoding processing is transmitted to a decoder side. The decoder side performs decoding processing on the received bitstream to obtain a decoded audio signal, where the decoded audio signal is used for playback.
There are many audio signal coding technologies. A frequency-domain encoding/decoding technology is a common audio encoding/decoding technology. In the frequency-domain encoding/decoding technology, compression encoding/decoding is performed by using short-term correlation and long-term correlation of an audio signal.
Therefore, how to improve encoding/decoding efficiency of performing frequency-domain encoding/decoding on an audio signal becomes an urgent technical problem to be resolved.
This application provides an audio signal encoding method and apparatus, and an audio signal decoding method and apparatus, to improve audio signal encoding/decoding efficiency.
According to a first aspect, an audio signal encoding method is provided. The method includes: obtaining a target frequency-domain coefficient of a current frame and a reference target frequency-domain coefficient of the current frame; calculating a cost function based on the target frequency-domain coefficient and the reference target frequency-domain coefficient of the current frame, where the cost function is for determining whether to perform long-term prediction (LTP) processing on the current frame during encoding of the target frequency-domain coefficient of the current frame; and encoding the target frequency-domain coefficient of the current frame based on the cost function.
In this embodiment of this application, the cost function is calculated based on the target frequency-domain coefficient and the reference target frequency-domain coefficient of the current frame, and LTP processing may be performed, based on the cost function, on a signal suitable for LTP processing (no LTP processing is performed on a signal unsuitable for LTP processing). In this way, redundant information in a signal can be reduced by effectively using a long-term correlation of the signal, so that compression performance in audio signal encoding/decoding can be improved. Therefore, audio signal encoding/decoding efficiency can be improved.
In some embodiments, the target frequency-domain coefficient and the reference target frequency-domain coefficient of the current frame may be obtained through processing based on a filtering parameter. The filtering parameter may be obtained by performing filtering processing on a frequency-domain coefficient of the current frame. The frequency-domain coefficient of the current frame may be obtained by performing a time-to-frequency-domain transform on a time-domain signal of the current frame. The time-to-frequency-domain transform may be a modified discrete cosine transform (MDCT), a discrete cosine transform (DCT), a fast Fourier transform (FFT), or the like.
The reference target frequency-domain coefficient may be a target frequency-domain coefficient of a reference signal of the current frame.
In some embodiments, the filtering processing may include temporal noise shaping (TNS) processing and/or frequency-domain noise shaping (FDNS) processing, or the filtering processing may include other processing. This is not limited in this embodiment of this application.
With reference to the first aspect, in some embodiments of the first aspect, the cost function includes at least one of a cost function of a high frequency band of the current frame, a cost function of a low frequency band of the current frame, or a cost function of a full frequency band of the current frame. The high frequency band is a frequency band whose frequency is greater than that of a cutoff frequency bin and that is of the full frequency band of the current frame, the low frequency band is a frequency band whose frequency is less than or equal to that of the cutoff frequency bin and that is of the full frequency band of the current frame, and the cutoff frequency bin is used for division into the low frequency band and the high frequency band.
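The band division described above can be illustrated with a minimal sketch. It assumes that the coefficients of the current frame are held in a list indexed by frequency bin; the function name `split_bands` is hypothetical and is not taken from this application.

```python
def split_bands(coeffs, cutoff_bin):
    """Split full-band coefficients into (low_band, high_band).

    The low band holds bins 0..cutoff_bin inclusive ("less than or equal to"
    the cutoff frequency bin); the high band holds the bins above it.
    """
    if not 0 <= cutoff_bin < len(coeffs):
        raise ValueError("cutoff_bin must index a bin of the current frame")
    low_band = coeffs[:cutoff_bin + 1]
    high_band = coeffs[cutoff_bin + 1:]
    return low_band, high_band
```

For example, with five bins and a cutoff frequency bin of 2, bins 0 to 2 form the low frequency band and bins 3 and 4 form the high frequency band.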
In this embodiment of this application, based on the cost function, LTP processing may be performed on a frequency band (that is, one of the low frequency band, the high frequency band, or the full frequency band) that is suitable for LTP processing and that is of the current frame (no LTP processing is performed on a frequency band unsuitable for LTP processing). In this way, redundant information in a signal can be reduced by more effectively using a long-term correlation of the signal, so that compression performance in audio signal encoding/decoding can be further improved. Therefore, audio signal encoding/decoding efficiency can be improved.
With reference to the first aspect, in some embodiments of the first aspect, the cost function is a predicted gain of a current frequency band of the current frame, or the cost function is a ratio of energy of an estimated residual frequency-domain coefficient of a current frequency band of the current frame to energy of a target frequency-domain coefficient of the current frequency band. The estimated residual frequency-domain coefficient is a difference between the target frequency-domain coefficient of the current frequency band and a predicted frequency-domain coefficient of the current frequency band, the predicted frequency-domain coefficient is obtained based on a reference frequency-domain coefficient and the predicted gain of the current frequency band of the current frame, and the current frequency band is the low frequency band, the high frequency band, or the full frequency band.
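The two candidate cost functions can be sketched as follows for one frequency band. This application does not give a formula for the predicted gain at this point, so the sketch assumes a least-squares gain and assumes that the predicted coefficient is the product of the gain and the reference coefficient; both function names are hypothetical.

```python
def predicted_gain(target, reference):
    """Assumed least-squares gain g minimizing sum((x - g * r) ** 2)."""
    num = sum(x * r for x, r in zip(target, reference))
    den = sum(r * r for r in reference)
    return num / den if den > 0 else 0.0


def residual_energy_ratio(target, reference, gain):
    """Energy of the estimated residual divided by energy of the target."""
    # Estimated residual = target coefficient - predicted coefficient,
    # where the predicted coefficient is assumed to be gain * reference.
    residual = [x - gain * r for x, r in zip(target, reference)]
    e_res = sum(e * e for e in residual)
    e_tgt = sum(x * x for x in target)
    # An all-zero band gains nothing from prediction; report a ratio of 1.
    return e_res / e_tgt if e_tgt > 0 else 1.0
```

Under these assumptions, a band that the reference predicts well yields a large gain and a ratio close to 0, whereas an unrelated reference yields a gain near 0 and a ratio close to 1.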
With reference to the first aspect, in some embodiments of the first aspect, the encoding the target frequency-domain coefficient of the current frame based on the cost function includes: determining a first identifier and/or a second identifier based on the cost function, where the first identifier is used to indicate whether to perform LTP processing on the current frame, and the second identifier is used to indicate a frequency band on which LTP processing is to be performed and that is of the current frame; and encoding the target frequency-domain coefficient of the current frame based on the first identifier and/or the second identifier.
With reference to the first aspect, in some embodiments of the first aspect, the determining a first identifier and/or a second identifier based on the cost function includes: when the cost function of the low frequency band satisfies a first condition and the cost function of the high frequency band does not satisfy a second condition, determining that the first identifier is a first value and the second identifier is a fourth value, where the first value is used to indicate to perform LTP processing on the current frame, and the fourth value is used to indicate to perform LTP processing on the low frequency band; when the cost function of the low frequency band satisfies the first condition and the cost function of the high frequency band satisfies the second condition, determining that the first identifier is a first value and the second identifier is a third value, where the third value is used to indicate to perform LTP processing on the full frequency band, and the first value is used to indicate to perform LTP processing on the current frame; when the cost function of the low frequency band does not satisfy the first condition, determining that the first identifier is a second value, where the second value is used to indicate not to perform LTP processing on the current frame; when the cost function of the low frequency band satisfies the first condition and the cost function of the full frequency band does not satisfy a third condition, determining that the first identifier is a second value, where the second value is used to indicate not to perform LTP processing on the current frame; or when the cost function of the full frequency band satisfies the third condition, determining that the first identifier is a first value and the second identifier is a third value, where the third value is used to indicate to perform LTP processing on the full frequency band.
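The first three branches above (the variant that tests the low and high frequency bands) can be sketched as a small decision function. The numeric codes chosen for the first, second, third, and fourth values are hypothetical, as this application only names the values; the two conditions are passed in as booleans so that the sketch does not fix how they are evaluated.

```python
LTP_ON = 1      # hypothetical code for the "first value"
LTP_OFF = 0     # hypothetical code for the "second value"
BAND_FULL = 0   # hypothetical code for the "third value"
BAND_LOW = 1    # hypothetical code for the "fourth value"


def decide_identifiers(low_satisfies_first, high_satisfies_second):
    """Return (first_identifier, second_identifier) for the current frame.

    second_identifier is None when LTP processing is not performed,
    because no frequency band needs to be indicated in that case.
    """
    if not low_satisfies_first:
        return LTP_OFF, None
    if high_satisfies_second:
        return LTP_ON, BAND_FULL
    return LTP_ON, BAND_LOW
```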
With reference to the first aspect, in some embodiments of the first aspect, the encoding the target frequency-domain coefficient of the current frame based on the first identifier and/or the second identifier includes: when the first identifier is the first value, performing LTP processing on at least one of the high frequency band, the low frequency band, or the full frequency band of the current frame based on the second identifier to obtain a residual frequency-domain coefficient of the current frame; encoding the residual frequency-domain coefficient of the current frame; and writing a value of the first identifier and a value of the second identifier into a bitstream; or when the first identifier is the second value, encoding the target frequency-domain coefficient of the current frame; and writing a value of the first identifier into a bitstream.
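The two encoding branches can be illustrated with the following sketch. It assumes that LTP processing subtracts the gain-scaled reference coefficient from the target coefficient inside the selected band and leaves the other bins unchanged; the function name, the numeric identifier codes, and the representation of the bitstream fields as a list are all hypothetical. Quantization and entropy coding are omitted.

```python
LTP_ON, LTP_OFF = 1, 0        # hypothetical codes for the first / second value
BAND_FULL, BAND_LOW = 0, 1    # hypothetical codes for the third / fourth value


def ltp_encode_frame(target, reference, first_id, second_id, gain, cutoff_bin):
    """Return (coefficients to be encoded, identifier values to be written)."""
    if first_id == LTP_OFF:
        # No LTP processing: the target coefficients are encoded directly
        # and only the first identifier is written.
        return list(target), [first_id]
    # Last bin covered by LTP processing in the band the second identifier selects.
    last = cutoff_bin if second_id == BAND_LOW else len(target) - 1
    residual = [x - gain * r if k <= last else x
                for k, (x, r) in enumerate(zip(target, reference))]
    return residual, [first_id, second_id]
```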
With reference to the first aspect, in some embodiments of the first aspect, the encoding the target frequency-domain coefficient of the current frame based on the cost function includes: determining a first identifier based on the cost function, where the first identifier is used to indicate whether to perform LTP processing on the current frame and/or indicate a frequency band on which LTP processing is to be performed and that is of the current frame; and encoding the target frequency-domain coefficient of the current frame based on the first identifier.
With reference to the first aspect, in some embodiments of the first aspect, the determining a first identifier based on the cost function includes: when the cost function of the low frequency band satisfies a first condition and the cost function of the high frequency band does not satisfy a second condition, determining that the first identifier is a first value, where the first value is used to indicate to perform LTP processing on the low frequency band; when the cost function of the low frequency band satisfies the first condition and the cost function of the high frequency band satisfies the second condition, determining that the first identifier is a third value, where the third value is used to indicate to perform LTP processing on the full frequency band; when the cost function of the low frequency band does not satisfy the first condition, determining that the first identifier is a second value, where the second value is used to indicate not to perform LTP processing on the current frame; when the cost function of the low frequency band satisfies the first condition and the cost function of the full frequency band does not satisfy a third condition, determining that the first identifier is a second value, where the second value is used to indicate not to perform LTP processing on the current frame; or when the cost function of the full frequency band satisfies the third condition, determining that the first identifier is a third value, where the third value is used to indicate to perform LTP processing on the full frequency band.
With reference to the first aspect, in some embodiments of the first aspect, the encoding the target frequency-domain coefficient of the current frame based on the first identifier includes: performing LTP processing on at least one of the high frequency band, the low frequency band, or the full frequency band of the current frame based on the first identifier to obtain a residual frequency-domain coefficient of the current frame; encoding the residual frequency-domain coefficient of the current frame; and writing a value of the first identifier into a bitstream; or when the first identifier is the second value, encoding the target frequency-domain coefficient of the current frame; and writing a value of the first identifier into a bitstream.
With reference to the first aspect, in some embodiments of the first aspect, the first condition is that the cost function of the low frequency band is greater than or equal to a first threshold, the second condition is that the cost function of the high frequency band is greater than or equal to a second threshold, and the third condition is that the cost function of the full frequency band is greater than or equal to a third threshold; or the first condition is that the cost function of the low frequency band is less than a fourth threshold, the second condition is that the cost function of the high frequency band is less than the fourth threshold, and the third condition is that the cost function of the full frequency band is greater than or equal to a fifth threshold.
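As an illustration of the "greater than or equal to a threshold" form of the first and second conditions, the following sketch derives a single first identifier from per-band cost values, assuming that the cost function is a predicted gain. The default thresholds of 0.5 and the numeric identifier codes are placeholders, not values from this application.

```python
NO_LTP = 0    # hypothetical code for the "second value"
LTP_LOW = 1   # hypothetical code for the "first value"
LTP_FULL = 2  # hypothetical code for the "third value"


def first_identifier_from_gains(gain_low, gain_high,
                                first_threshold=0.5, second_threshold=0.5):
    """Map low-band and high-band gains to a single first identifier."""
    if gain_low < first_threshold:       # first condition not satisfied
        return NO_LTP
    if gain_high >= second_threshold:    # second condition satisfied
        return LTP_FULL
    return LTP_LOW
```

If the cost function were instead the residual energy ratio, the comparisons would be inverted ("less than" a threshold), because a smaller ratio indicates a better prediction.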
With reference to the first aspect, in some embodiments of the first aspect, the method further includes: determining the cutoff frequency bin based on a spectral coefficient of the reference signal.
In this embodiment of this application, the cutoff frequency bin is determined based on the spectral coefficient of the reference signal, so that a frequency band suitable for LTP processing can be determined more accurately, LTP processing efficiency can be improved, and compression performance in audio signal encoding/decoding can be further improved. Therefore, audio signal encoding/decoding efficiency can be improved.
With reference to the first aspect, in some embodiments of the first aspect, the determining the cutoff frequency bin based on a spectral coefficient of the reference signal includes: determining, based on the spectral coefficient of the reference signal, a peak factor set corresponding to the reference signal; and determining the cutoff frequency bin based on a peak factor in the peak factor set, where the peak factor satisfies a preset condition.
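This passage does not define the peak factor or the preset condition, so the following is only one possible reading. The sketch assumes that the peak factor of a bin is its magnitude divided by the mean magnitude in a small window around it, and that the cutoff frequency bin is the highest bin whose peak factor reaches a threshold. The window size, threshold, fallback value, and function names are all hypothetical.

```python
def peak_factors(spectrum, half_window=2):
    """Assumed peak factor: |X[k]| over the mean magnitude near bin k."""
    mags = [abs(c) for c in spectrum]
    factors = []
    for k in range(len(mags)):
        lo = max(0, k - half_window)
        hi = min(len(mags), k + half_window + 1)
        mean = sum(mags[lo:hi]) / (hi - lo)
        factors.append(mags[k] / mean if mean > 0 else 0.0)
    return factors


def find_cutoff_bin(spectrum, threshold=2.0, default_bin=0):
    """Highest bin whose peak factor meets the assumed preset condition."""
    candidates = [k for k, f in enumerate(peak_factors(spectrum))
                  if f >= threshold]
    # Fall back to a preset bin when no bin shows a pronounced peak.
    return max(candidates) if candidates else default_bin
```

The intent of this reading is that pronounced spectral peaks in the reference signal mark the harmonic region in which long-term prediction is likely to help, so the low frequency band extends up to the last such peak.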
With reference to the first aspect, in some embodiments of the first aspect, the cutoff frequency bin is a preset value.
In this embodiment of this application, the cutoff frequency bin is preset based on experience or with reference to an actual situation, so that a frequency band suitable for LTP processing can be determined more accurately, LTP processing efficiency can be improved, and compression performance in audio signal encoding/decoding can be further improved. Therefore, audio signal encoding/decoding efficiency can be improved.
According to a second aspect, an audio signal decoding method is provided. The method includes: parsing a bitstream to obtain a decoded frequency-domain coefficient of a current frame; parsing the bitstream to obtain a first identifier, where the first identifier is used to indicate whether to perform LTP processing on the current frame, or the first identifier is used to indicate whether to perform LTP processing on the current frame and/or indicate a frequency band on which LTP processing is to be performed and that is of the current frame; and processing the decoded frequency-domain coefficient of the current frame based on the first identifier to obtain a frequency-domain coefficient of the current frame.
In this embodiment of this application, LTP processing is performed on a signal suitable for LTP processing (no LTP processing is performed on a signal unsuitable for LTP processing). In this way, redundant information in the signal can be effectively reduced, so that compression efficiency in encoding/decoding can be improved. Therefore, audio signal encoding/decoding efficiency can be improved.
In some embodiments, the decoded frequency-domain coefficient of the current frame may be a residual frequency-domain coefficient of the current frame, or the decoded frequency-domain coefficient of the current frame is a target frequency-domain coefficient of the current frame.
In some embodiments, the bitstream may be further parsed to obtain a filtering parameter.
The filtering parameter may be used to perform filtering processing on the frequency-domain coefficient of the current frame. The filtering processing may include temporal noise shaping (TNS) processing and/or frequency-domain noise shaping (FDNS) processing, or the filtering processing may include other processing. This is not limited in this embodiment of this application.
With reference to the second aspect, in some embodiments of the second aspect, the frequency band on which LTP processing is performed and that is of the current frame includes a high frequency band, a low frequency band, or a full frequency band, where the high frequency band is a frequency band whose frequency is greater than that of a cutoff frequency bin and that is of the full frequency band of the current frame, the low frequency band is a frequency band whose frequency is less than or equal to that of the cutoff frequency bin and that is of the full frequency band of the current frame, and the cutoff frequency bin is used for division into the low frequency band and the high frequency band.
In this embodiment of this application, based on the cost function, LTP processing may be performed on a frequency band (that is, one of the low frequency band, the high frequency band, or the full frequency band) that is suitable for LTP processing and that is of the current frame (no LTP processing is performed on a frequency band unsuitable for LTP processing). In this way, redundant information in a signal can be reduced by more effectively using a long-term correlation of the signal, so that compression performance in audio signal encoding/decoding can be further improved. Therefore, audio signal encoding/decoding efficiency can be improved.
With reference to the second aspect, in some embodiments of the second aspect, when the first identifier is a first value, the decoded frequency-domain coefficient of the current frame is a residual frequency-domain coefficient of the current frame; or when the first identifier is a second value, the decoded frequency-domain coefficient of the current frame is a target frequency-domain coefficient of the current frame.
With reference to the second aspect, in some embodiments of the second aspect, the parsing a bitstream to obtain a first identifier includes: parsing the bitstream to obtain the first identifier; and when the first identifier is the first value, parsing the bitstream to obtain a second identifier, where the second identifier is used to indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
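The conditional parsing order can be sketched as follows. The sketch assumes that each identifier occupies one bit and that the first value is coded as 1; neither the field widths nor the codes are specified in this passage, and the function name is hypothetical.

```python
FIRST_VALUE = 1  # hypothetical code meaning "perform LTP processing"


def parse_ltp_identifiers(bits):
    """Read the first identifier, then the second one only if it is present.

    `bits` is any iterable of 0/1 values standing in for the bitstream.
    Returns (first_identifier, second_identifier or None).
    """
    reader = iter(bits)
    first_id = next(reader)
    # The second identifier is only carried when LTP processing is enabled.
    second_id = next(reader) if first_id == FIRST_VALUE else None
    return first_id, second_id
```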
With reference to the second aspect, in some embodiments of the second aspect, the processing the decoded frequency-domain coefficient of the current frame based on the first identifier to obtain a frequency-domain coefficient of the current frame includes: when the first identifier is the first value and the second identifier is a fourth value, obtaining a reference target frequency-domain coefficient of the current frame, where the first value is used to indicate to perform LTP processing on the current frame, and the fourth value is used to indicate to perform LTP processing on the low frequency band; performing LTP synthesis based on a predicted gain of the low frequency band, the reference target frequency-domain coefficient, and the residual frequency-domain coefficient of the current frame to obtain the target frequency-domain coefficient of the current frame; and processing the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame; or when the first identifier is the first value and the second identifier is a third value, obtaining a reference target frequency-domain coefficient of the current frame, where the first value is used to indicate to perform LTP processing on the current frame, and the third value is used to indicate to perform LTP processing on the full frequency band; performing LTP synthesis based on a predicted gain of the full frequency band, the reference target frequency-domain coefficient, and the residual frequency-domain coefficient of the current frame to obtain the target frequency-domain coefficient of the current frame; and processing the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame; or when the first identifier is the second value, processing the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame, where the second value is used to indicate not to perform LTP processing on the current frame.
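The LTP synthesis step can be illustrated with a minimal sketch. It assumes that synthesis adds the gain-scaled reference target coefficient back to the residual coefficient inside the band on which LTP processing was performed, and that bins outside that band already carry target coefficients. The function name and the use of `None` to select the full frequency band are hypothetical.

```python
def ltp_synthesis(residual, reference, gain, cutoff_bin=None):
    """Rebuild target coefficients from residual coefficients.

    cutoff_bin=None selects the full frequency band; otherwise only bins
    0..cutoff_bin (the low frequency band) are synthesized.
    """
    last = len(residual) - 1 if cutoff_bin is None else cutoff_bin
    return [res + gain * ref if k <= last else res
            for k, (res, ref) in enumerate(zip(residual, reference))]
```

When the first identifier indicates that no LTP processing is performed, this step is skipped and the decoded coefficients are used as the target coefficients directly.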
With reference to the second aspect, in some embodiments of the second aspect, the processing the target frequency-domain coefficient of the current frame based on the first identifier to obtain a frequency-domain coefficient of the current frame includes: when the first identifier is the first value, obtaining a reference target frequency-domain coefficient of the current frame, where the first value is used to indicate to perform LTP processing on the low frequency band; performing LTP synthesis based on a predicted gain of the low frequency band, the reference target frequency-domain coefficient, and the residual frequency-domain coefficient of the current frame to obtain the target frequency-domain coefficient of the current frame; and processing the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame; or when the first identifier is a third value, obtaining a reference target frequency-domain coefficient of the current frame, where the third value is used to indicate to perform LTP processing on the full frequency band; performing LTP synthesis based on a predicted gain of the full frequency band, the reference target frequency-domain coefficient, and the residual frequency-domain coefficient of the current frame to obtain the target frequency-domain coefficient of the current frame; and processing the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame; or when the first identifier is the second value, processing the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame, where the second value is used to indicate not to perform LTP processing on the current frame.
With reference to the second aspect, in some embodiments of the second aspect, the obtaining a reference target frequency-domain coefficient of the current frame includes: parsing the bitstream to obtain a pitch period of the current frame; determining a reference frequency-domain coefficient of the current frame based on the pitch period of the current frame; and processing the reference frequency-domain coefficient to obtain the reference target frequency-domain coefficient.
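How the pitch period selects the reference is not detailed in this passage. One common approach, shown here purely as an assumption, is to take a segment of previously decoded time-domain samples that lies one pitch period back and to repeat it periodically when the pitch period is shorter than the frame; that segment would then be transformed and filtered in the same way as the current frame. Only the segment selection is sketched, and the function name and buffer layout are hypothetical.

```python
def reference_segment(history, pitch_period, frame_len):
    """Select frame_len reference samples lagging by pitch_period.

    `history` holds previously decoded time-domain samples, oldest first.
    """
    start = len(history) - pitch_period
    if pitch_period <= 0 or start < 0:
        raise ValueError("pitch_period must be in 1..len(history)")
    # Wrap around within the last pitch period when the frame is longer
    # than one period (periodic extension).
    return [history[start + (n % pitch_period)] for n in range(frame_len)]
```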
With reference to the second aspect, in some embodiments of the second aspect, the method further includes: determining the cutoff frequency bin based on a spectral coefficient of the reference signal.
In this embodiment of this application, the cutoff frequency bin is determined based on the spectral coefficient of the reference signal, so that a frequency band suitable for LTP processing can be determined more accurately, LTP processing efficiency can be improved, and compression performance in audio signal encoding/decoding can be further improved. Therefore, audio signal encoding/decoding efficiency can be improved.
With reference to the second aspect, in some embodiments of the second aspect, the determining the cutoff frequency bin based on a spectral coefficient of the reference signal includes: determining, based on the spectral coefficient of the reference signal, a peak factor set corresponding to the reference signal; and determining the cutoff frequency bin based on a peak factor in the peak factor set, where the peak factor satisfies a preset condition.
With reference to the second aspect, in some embodiments of the second aspect, the cutoff frequency bin is a preset value.
In this embodiment of this application, the cutoff frequency bin is preset based on experience or with reference to an actual situation, so that a frequency band suitable for LTP processing can be determined more accurately, LTP processing efficiency can be improved, and compression performance in audio signal encoding/decoding can be further improved. Therefore, audio signal encoding/decoding efficiency can be improved.
According to a third aspect, an audio signal encoding apparatus is provided, including: an obtaining module, configured to obtain a target frequency-domain coefficient of a current frame and a reference target frequency-domain coefficient of the current frame; a processing module, configured to calculate a cost function based on the target frequency-domain coefficient and the reference target frequency-domain coefficient of the current frame, where the cost function is for determining whether to perform long-term prediction (LTP) processing on the current frame during encoding of the target frequency-domain coefficient of the current frame; and an encoding module, configured to encode the target frequency-domain coefficient of the current frame based on the cost function.
In this embodiment of this application, the cost function is calculated based on the target frequency-domain coefficient and the reference target frequency-domain coefficient of the current frame, and LTP processing may be performed, based on the cost function, on a signal suitable for LTP processing (no LTP processing is performed on a signal unsuitable for LTP processing), so that compression performance in audio signal encoding/decoding can be improved. Therefore, audio signal encoding/decoding efficiency can be improved.
In some embodiments, the target frequency-domain coefficient and the reference target frequency-domain coefficient of the current frame may be obtained through processing based on a filtering parameter. The filtering parameter may be obtained by performing filtering processing on a frequency-domain coefficient of the current frame. The frequency-domain coefficient of the current frame may be obtained by performing a time-to-frequency-domain transform on a time-domain signal of the current frame. The time-to-frequency-domain transform may be an MDCT, a DCT, an FFT, or the like.
The reference target frequency-domain coefficient may be a target frequency-domain coefficient of a reference signal of the current frame.
In some embodiments, the filtering processing may include temporal noise shaping (TNS) processing and/or frequency-domain noise shaping (FDNS) processing, or the filtering processing may include other processing. This is not limited in this embodiment of this application.
With reference to the third aspect, in some embodiments of the third aspect, the cost function includes at least one of a cost function of a high frequency band of the current frame, a cost function of a low frequency band of the current frame, or a cost function of a full frequency band of the current frame. The high frequency band is a frequency band whose frequency is greater than that of a cutoff frequency bin and that is of the full frequency band of the current frame, the low frequency band is a frequency band whose frequency is less than or equal to that of the cutoff frequency bin and that is of the full frequency band of the current frame, and the cutoff frequency bin is used for division into the low frequency band and the high frequency band.
In this embodiment of this application, based on the cost function, LTP processing may be performed on a frequency band (that is, one of the low frequency band, the high frequency band, or the full frequency band) that is suitable for LTP processing and that is of the current frame (no LTP processing is performed on a frequency band unsuitable for LTP processing), so that compression performance in audio signal encoding/decoding can be further improved. Therefore, audio signal encoding/decoding efficiency can be improved.
With reference to the third aspect, in some embodiments of the third aspect, the cost function is a predicted gain of a current frequency band of the current frame, or the cost function is a ratio of energy of an estimated residual frequency-domain coefficient of a current frequency band of the current frame to energy of a target frequency-domain coefficient of the current frequency band. The estimated residual frequency-domain coefficient is a difference between the target frequency-domain coefficient of the current frequency band and a predicted frequency-domain coefficient of the current frequency band, the predicted frequency-domain coefficient is obtained based on a reference frequency-domain coefficient and the predicted gain of the current frequency band of the current frame, and the current frequency band is the low frequency band, the high frequency band, or the full frequency band.
With reference to the third aspect, in some embodiments of the third aspect, the encoding module is configured to: determine a first identifier and/or a second identifier based on the cost function, where the first identifier is used to indicate whether to perform LTP processing on the current frame, and the second identifier is used to indicate a frequency band on which LTP processing is to be performed and that is of the current frame; and encode the target frequency-domain coefficient of the current frame based on the first identifier and/or the second identifier.
With reference to the third aspect, in some embodiments of the third aspect, the encoding module is configured to: when the cost function of the low frequency band satisfies a first condition and the cost function of the high frequency band does not satisfy a second condition, determine that the first identifier is a first value and the second identifier is a fourth value, where the first value is used to indicate to perform LTP processing on the current frame, and the fourth value is used to indicate to perform LTP processing on the low frequency band; when the cost function of the low frequency band satisfies the first condition and the cost function of the high frequency band satisfies the second condition, determine that the first identifier is a first value and the second identifier is a third value, where the third value is used to indicate to perform LTP processing on the full frequency band, and the first value is used to indicate to perform LTP processing on the current frame; when the cost function of the low frequency band does not satisfy the first condition, determine that the first identifier is a second value, where the second value is used to indicate not to perform LTP processing on the current frame; when the cost function of the low frequency band satisfies the first condition and the cost function of the full frequency band does not satisfy a third condition, determine that the first identifier is a second value, where the second value is used to indicate not to perform LTP processing on the current frame; or when the cost function of the full frequency band satisfies the third condition, determine that the first identifier is a first value and the second identifier is a third value, where the third value is used to indicate to perform LTP processing on the full frequency band.
With reference to the third aspect, in some embodiments of the third aspect, the encoding module is configured to: when the first identifier is the first value, perform LTP processing on at least one of the high frequency band, the low frequency band, or the full frequency band of the current frame based on the second identifier to obtain a residual frequency-domain coefficient of the current frame; encode the residual frequency-domain coefficient of the current frame; and write a value of the first identifier and a value of the second identifier into a bitstream; or when the first identifier is the second value, encode the target frequency-domain coefficient of the current frame; and write a value of the first identifier into a bitstream.
With reference to the third aspect, in some embodiments of the third aspect, the encoding module is in some embodiments configured to: determine a first identifier based on the cost function, where the first identifier is used to indicate whether to perform LTP processing on the current frame and/or indicate a frequency band on which LTP processing is to be performed and that is of the current frame; and encode the target frequency-domain coefficient of the current frame based on the first identifier.
With reference to the third aspect, in some embodiments of the third aspect, the encoding module is in some embodiments configured to: when the cost function of the low frequency band satisfies a first condition and the cost function of the high frequency band does not satisfy a second condition, determine that the first identifier is a first value, where the first value is used to indicate to perform LTP processing on the low frequency band; when the cost function of the low frequency band satisfies the first condition and the cost function of the high frequency band satisfies the second condition, determine that the first identifier is a third value, where the third value is used to indicate to perform LTP processing on the full frequency band; when the cost function of the low frequency band does not satisfy the first condition, determine that the first identifier is a second value, where the second value is used to indicate not to perform LTP processing on the current frame; when the cost function of the low frequency band satisfies the first condition and the cost function of the full frequency band does not satisfy a third condition, determine that the first identifier is a second value, where the second value is used to indicate not to perform LTP processing on the current frame; or when the cost function of the full frequency band satisfies the third condition, determine that the first identifier is a third value, where the third value is used to indicate to perform LTP processing on the full frequency band.
With reference to the third aspect, in some embodiments of the third aspect, the encoding module is in some embodiments configured to: perform LTP processing on at least one of the high frequency band, the low frequency band, or the full frequency band of the current frame based on the first identifier to obtain a residual frequency-domain coefficient of the current frame; encode the residual frequency-domain coefficient of the current frame; and write a value of the first identifier into a bitstream; or when the first identifier is the second value, encode the target frequency-domain coefficient of the current frame; and write a value of the first identifier into a bitstream.
With reference to the third aspect, in some embodiments of the third aspect, the first condition is that the cost function of the low frequency band is greater than or equal to a first threshold, the second condition is that the cost function of the high frequency band is greater than or equal to a second threshold, and the third condition is that the cost function of the full frequency band is greater than or equal to a third threshold; or the first condition is that the cost function of the low frequency band is less than a fourth threshold, the second condition is that the cost function of the high frequency band is less than the fourth threshold, and the third condition is that the cost function of the full frequency band is greater than or equal to a fifth threshold.
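The band selection described above can be illustrated with a minimal sketch. It follows only one of the described alternatives: a single identifier, with the first and second conditions expressed as "greater than or equal to a threshold". The names `LtpMode` and `select_ltp_mode` and the default threshold values are hypothetical and are not taken from this application.

```python
from enum import Enum


class LtpMode(Enum):
    """Hypothetical encoding of the identifier values (assumption)."""
    NONE = 0       # no LTP processing on the current frame
    LOW_BAND = 1   # LTP processing on the low frequency band only
    FULL_BAND = 2  # LTP processing on the full frequency band


def select_ltp_mode(cost_low, cost_high, first_threshold=0.5, second_threshold=0.5):
    """Pick an LTP mode from the low-band and high-band cost functions.

    The thresholds are illustrative placeholders, not values from the source.
    """
    first_condition = cost_low >= first_threshold
    second_condition = cost_high >= second_threshold
    if not first_condition:
        # Low band fails the first condition: no LTP on the frame.
        return LtpMode.NONE
    if second_condition:
        # Both bands qualify: LTP on the full frequency band.
        return LtpMode.FULL_BAND
    # Only the low band qualifies.
    return LtpMode.LOW_BAND
```

In this sketch the high-band cost is consulted only after the low band has satisfied the first condition, which mirrors the order in which the conditions are listed above.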
With reference to the third aspect, in some embodiments of the third aspect, the processing module is further configured to determine the cutoff frequency bin based on a spectral coefficient of the reference signal.
In this embodiment of this application, the cutoff frequency bin is determined based on the spectral coefficient of the reference signal, so that a frequency band suitable for LTP processing can be determined more accurately, LTP processing efficiency can be improved, and compression performance in audio signal encoding/decoding can be further improved. Therefore, audio signal encoding/decoding efficiency can be improved.
With reference to the third aspect, in some embodiments of the third aspect, the processing module is in some embodiments configured to: determine, based on the spectral coefficient of the reference signal, a peak factor set corresponding to the reference signal; and determine the cutoff frequency bin based on a peak factor in the peak factor set, where the peak factor satisfies a preset condition.
With reference to the third aspect, in some embodiments of the third aspect, the cutoff frequency bin is a preset value.
In this embodiment of this application, the cutoff frequency bin is preset based on experience or with reference to an actual situation, so that a frequency band suitable for LTP processing can be determined more accurately, LTP processing efficiency can be improved, and compression performance in audio signal encoding/decoding can be further improved. Therefore, audio signal encoding/decoding efficiency can be improved.
According to a fourth aspect, an audio signal decoding apparatus is provided, including: a decoding module, configured to parse a bitstream to obtain a decoded frequency-domain coefficient of a current frame, where the decoding module is further configured to parse the bitstream to obtain a first identifier, where the first identifier is used to indicate whether to perform LTP processing on the current frame, or the first identifier is used to indicate whether to perform LTP processing on the current frame and/or indicate a frequency band on which LTP processing is to be performed and that is of the current frame; and a processing module, configured to process the decoded frequency-domain coefficient of the current frame based on the first identifier to obtain a frequency-domain coefficient of the current frame.
In this embodiment of this application, LTP processing is performed on a signal suitable for LTP processing (no LTP processing is performed on a signal unsuitable for LTP processing). In this way, redundant information in the signal can be effectively reduced, so that compression efficiency in encoding/decoding can be improved. Therefore, audio signal encoding/decoding efficiency can be improved.
In some embodiments, the decoded frequency-domain coefficient of the current frame may be a residual frequency-domain coefficient of the current frame, or the decoded frequency-domain coefficient of the current frame is a target frequency-domain coefficient of the current frame.
In some embodiments, the bitstream may be further parsed to obtain a filtering parameter.
The filtering parameter may be used to perform filtering processing on the frequency-domain coefficient of the current frame. The filtering processing may include temporal noise shaping (TNS) processing and/or frequency-domain noise shaping (FDNS) processing, or the filtering processing may include other processing. This is not limited in this embodiment of this application.
With reference to the fourth aspect, in some embodiments of the fourth aspect, the frequency band on which LTP processing is performed and that is of the current frame includes a high frequency band, a low frequency band, or a full frequency band, where the high frequency band is a frequency band whose frequency is greater than that of a cutoff frequency bin and that is of the full frequency band of the current frame, the low frequency band is a frequency band whose frequency is less than or equal to that of the cutoff frequency bin and that is of the full frequency band of the current frame, and the cutoff frequency bin is used for division into the low frequency band and the high frequency band.
In this embodiment of this application, based on the cost function, LTP processing may be performed on a frequency band (that is, one of the low frequency band, the high frequency band, or the full frequency band) that is suitable for LTP processing and that is of the current frame (no LTP processing is performed on a frequency band unsuitable for LTP processing). In this way, redundant information in a signal can be reduced by more effectively using a long-term correlation of the signal, so that compression performance in audio signal encoding/decoding can be further improved. Therefore, audio signal encoding/decoding efficiency can be improved.
With reference to the fourth aspect, in some embodiments of the fourth aspect, when the first identifier is a first value, the decoded frequency-domain coefficient of the current frame is a residual frequency-domain coefficient of the current frame; or when the first identifier is a second value, the decoded frequency-domain coefficient of the current frame is a target frequency-domain coefficient of the current frame.
With reference to the fourth aspect, in some embodiments of the fourth aspect, the decoding module is in some embodiments configured to: parse the bitstream to obtain the first identifier; and when the first identifier is the first value, parse the bitstream to obtain a second identifier, where the second identifier is used to indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
With reference to the fourth aspect, in some embodiments of the fourth aspect, the processing module is in some embodiments configured to: when the first identifier is the first value and the second identifier is a fourth value, obtain a reference target frequency-domain coefficient of the current frame, where the first value is used to indicate to perform LTP processing on the current frame, and the fourth value is used to indicate to perform LTP processing on the low frequency band; perform LTP synthesis based on a predicted gain of the low frequency band, the reference target frequency-domain coefficient, and the residual frequency-domain coefficient of the current frame to obtain the target frequency-domain coefficient of the current frame; and process the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame; or when the first identifier is the first value and the second identifier is a third value, obtain a reference target frequency-domain coefficient of the current frame, where the first value is used to indicate to perform LTP processing on the current frame, and the third value is used to indicate to perform LTP processing on the full frequency band; perform LTP synthesis based on a predicted gain of the full frequency band, the reference target frequency-domain coefficient, and the residual frequency-domain coefficient of the current frame to obtain the target frequency-domain coefficient of the current frame; and process the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame; or when the first identifier is the second value, process the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame, where the second value is used to indicate not to perform LTP processing on the current frame.
With reference to the fourth aspect, in some embodiments of the fourth aspect, the processing module is in some embodiments configured to: when the first identifier is the first value, obtain a reference target frequency-domain coefficient of the current frame, where the first value is used to indicate to perform LTP processing on the low frequency band; perform LTP synthesis based on a predicted gain of the low frequency band, the reference target frequency-domain coefficient, and the residual frequency-domain coefficient of the current frame to obtain the target frequency-domain coefficient of the current frame; and process the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame; or when the first identifier is a third value, obtain a reference target frequency-domain coefficient of the current frame, where the third value is used to indicate to perform LTP processing on the full frequency band; perform LTP synthesis based on a predicted gain of the full frequency band, the reference target frequency-domain coefficient, and the residual frequency-domain coefficient of the current frame to obtain the target frequency-domain coefficient of the current frame; and process the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame; or when the first identifier is the second value, process the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame, where the second value is used to indicate not to perform LTP processing on the current frame.
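The LTP synthesis on the decoder side can be sketched as follows. The sketch assumes that synthesis adds the predicted coefficient (the product of the predicted gain and the reference target frequency-domain coefficient) back to the residual coefficient, which is the inverse of the residual computation described for the encoder side. The function name `ltp_synthesis` and its parameters are hypothetical.

```python
def ltp_synthesis(residual, reference, gain, cutoff_bin=None):
    """Reconstruct target coefficients from residual coefficients.

    residual   -- residual frequency-domain coefficients of the current frame
    reference  -- reference target frequency-domain coefficients
    gain       -- predicted gain of the band being synthesized
    cutoff_bin -- None for the full frequency band; otherwise only bins
                  0..cutoff_bin (inclusive, the low frequency band) are synthesized

    Assumption: target[k] = residual[k] + gain * reference[k] inside the band.
    Bins outside the band are passed through unchanged.
    """
    last = len(residual) - 1
    if cutoff_bin is not None:
        last = min(cutoff_bin, last)
    target = list(residual)
    for k in range(last + 1):
        target[k] = residual[k] + gain * reference[k]
    return target
```

For a frame on which no LTP processing was performed, the decoded coefficients are already target coefficients and this step would simply be skipped.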
With reference to the fourth aspect, in some embodiments of the fourth aspect, the processing module is in some embodiments configured to: parse the bitstream to obtain a pitch period of the current frame; determine a reference frequency-domain coefficient of the current frame based on the pitch period of the current frame; and process the reference frequency-domain coefficient to obtain the reference target frequency-domain coefficient.
With reference to the fourth aspect, in some embodiments of the fourth aspect, the processing module is further configured to determine the cutoff frequency bin based on a spectral coefficient of the reference signal.
In this embodiment of this application, the cutoff frequency bin is determined based on the spectral coefficient of the reference signal, so that a frequency band suitable for LTP processing can be determined more accurately, LTP processing efficiency can be improved, and compression performance in audio signal encoding/decoding can be further improved. Therefore, audio signal encoding/decoding efficiency can be improved.
With reference to the fourth aspect, in some embodiments of the fourth aspect, the processing module is in some embodiments configured to: determine, based on the spectral coefficient of the reference signal, a peak factor set corresponding to the reference signal; and determine the cutoff frequency bin based on a peak factor in the peak factor set, where the peak factor satisfies a preset condition.
With reference to the fourth aspect, in some embodiments of the fourth aspect, the cutoff frequency bin is a preset value.
In this embodiment of this application, the cutoff frequency bin is preset based on experience or with reference to an actual situation, so that a frequency band suitable for LTP processing can be determined more accurately, LTP processing efficiency can be improved, and compression performance in audio signal encoding/decoding can be further improved. Therefore, audio signal encoding/decoding efficiency can be improved.
According to a fifth aspect, an encoding apparatus is provided. The encoding apparatus includes a storage medium and a central processing unit. The storage medium may be a nonvolatile storage medium and stores a computer executable program, and the central processing unit is connected to the nonvolatile storage medium and executes the computer executable program to implement the method in the first aspect or the embodiments of the first aspect.
According to a sixth aspect, a decoding apparatus is provided. The decoding apparatus includes a storage medium and a central processing unit. The storage medium may be a nonvolatile storage medium and stores a computer executable program, and the central processing unit is connected to the nonvolatile storage medium and executes the computer executable program to implement the method in the second aspect or the embodiments of the second aspect.
According to a seventh aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores program code to be executed by a device, where the program code includes instructions for performing the method in the first aspect or the embodiments of the first aspect.
According to an eighth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores program code to be executed by a device, where the program code includes instructions for performing the method in the second aspect or the embodiments of the second aspect.
According to a ninth aspect, an embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores program code, where the program code includes instructions for performing a part or all of operations in either of the methods in the first aspect or the second aspect.
According to a tenth aspect, an embodiment of this application provides a computer program product. When the computer program product is run on a computer, the computer is enabled to perform a part or all of the operations in either of the methods in the first aspect or the second aspect.
In embodiments of this application, the cost function is calculated based on the target frequency-domain coefficient and the reference target frequency-domain coefficient of the current frame, and LTP processing may be performed, based on the cost function, on a signal suitable for LTP processing (no LTP processing is performed on a signal unsuitable for LTP processing). In this way, redundant information in a signal can be reduced by effectively using a long-term correlation of the signal, so that compression performance in audio signal encoding/decoding can be improved. Therefore, audio signal encoding/decoding efficiency can be improved.
The following describes technical solutions of this application with reference to the accompanying drawings.
An audio signal in embodiments of this application may be a mono audio signal, or may be a stereo signal. The stereo signal may be an original stereo signal, may be a stereo signal including two channels of signals (a left channel signal and a right channel signal) included in a multi-channel signal, or may be a stereo signal including two channels of signals generated by at least three channels of signals included in a multi-channel signal. This is not limited in embodiments of this application.
For ease of description, only a stereo signal (including a left channel signal and a right channel signal) is used as an example for description in embodiments of this application. A person skilled in the art may understand that the following embodiments are merely examples rather than limitations. The solutions in embodiments of this application are also applicable to a mono audio signal and another stereo signal. This is not limited in embodiments of this application.
The encoding component 110 is configured to encode a current frame (an audio signal) in frequency domain. In some embodiments, the encoding component 110 may be implemented by software, may be implemented by hardware, or may be implemented in a form of a combination of software and hardware. This is not limited in this embodiment of this application.
When the encoding component 110 encodes the current frame in frequency domain, in a possible embodiment, operations shown in
S210: Convert the current frame from a time-domain signal to a frequency-domain signal.
S220: Perform filtering processing on the current frame to obtain a frequency-domain coefficient of the current frame.
S230: Perform long-term prediction (LTP) determining on the current frame to obtain an LTP identifier.
When the LTP identifier is a first value (for example, the LTP identifier is 1), S250 may be performed; or when the LTP identifier is a second value (for example, the LTP identifier is 0), S240 may be performed.
S240: Encode the frequency-domain coefficient of the current frame to obtain an encoded parameter of the current frame. Then, S280 may be performed.
S250: Perform stereo encoding on the current frame to obtain a frequency-domain coefficient of the current frame.
S260: Perform LTP processing on the frequency-domain coefficient of the current frame to obtain a residual frequency-domain coefficient of the current frame.
S270: Encode the residual frequency-domain coefficient of the current frame to obtain an encoded parameter of the current frame.
S280: Write the encoded parameter of the current frame and the LTP identifier into a bitstream.
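The branching among operations S210 to S280 can be summarized with a small sketch that returns the order of operations for a given LTP identifier. It assumes that S250, S260, S270, and S280 run in sequence on the LTP branch; the function name `encoder_steps` is hypothetical.

```python
def encoder_steps(ltp_identifier):
    """Return the sequence of operation labels for one frame.

    ltp_identifier -- 1 (first value, LTP branch) or 0 (second value, no LTP)
    """
    steps = ["S210", "S220", "S230"]  # transform, filtering, LTP determining
    if ltp_identifier == 1:
        # LTP branch: stereo encoding, LTP processing, residual encoding.
        steps += ["S250", "S260", "S270"]
    else:
        # No-LTP branch: encode the frequency-domain coefficient directly.
        steps += ["S240"]
    steps.append("S280")  # write encoded parameter and LTP identifier
    return steps
```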
It should be noted that the encoding method shown in
For example, in the encoding method shown in
For another example, the encoding method shown in
The decoding component 120 is configured to decode an encoded bitstream generated by the encoding component 110, to obtain an audio signal of the current frame.
In some embodiments, the encoding component 110 may be connected to the decoding component 120 in a wired or wireless manner, and the decoding component 120 may obtain, through a connection between the decoding component 120 and the encoding component 110, the encoded bitstream generated by the encoding component 110. Alternatively, the encoding component 110 may store the generated encoded bitstream into a memory, and the decoding component 120 reads the encoded bitstream in the memory.
In some embodiments, the decoding component 120 may be implemented by software, may be implemented by hardware, or may be implemented in a form of a combination of software and hardware. This is not limited in this embodiment of this application.
When the decoding component 120 decodes a current frame (an audio signal) in frequency domain, in a possible embodiment, operations shown in
S310: Parse a bitstream to obtain an encoded parameter of the current frame and an LTP identifier.
S320: Determine, based on the LTP identifier, whether to perform LTP synthesis on the encoded parameter of the current frame.
When the LTP identifier is a first value (for example, the LTP identifier is 1), a residual frequency-domain coefficient of the current frame is obtained by parsing the bitstream in S310. In this case, S340 may be performed. When the LTP identifier is a second value (for example, the LTP identifier is 0), a target frequency-domain coefficient of the current frame is obtained by parsing the bitstream in S310. In this case, S330 may be performed.
S330: Perform inverse filtering processing on the target frequency-domain coefficient of the current frame to obtain a frequency-domain coefficient of the current frame. Then, S370 may be performed.
S340: Perform LTP synthesis on the residual frequency-domain coefficient of the current frame to obtain an updated residual frequency-domain coefficient.
S350: Perform stereo decoding on the updated residual frequency-domain coefficient to obtain a target frequency-domain coefficient of the current frame.
S360: Perform inverse filtering processing on the target frequency-domain coefficient of the current frame to obtain a frequency-domain coefficient of the current frame.
S370: Convert the frequency-domain coefficient of the current frame to obtain a synthesized time-domain signal.
It should be noted that the decoding method shown in
For example, in the decoding method shown in
For another example, the decoding method shown in
In some embodiments, the encoding component 110 and the decoding component 120 may be disposed in a same device, or may be disposed in different devices. The device may be a terminal having an audio signal processing function, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a Bluetooth speaker, a recording pen, or a wearable device. Alternatively, the device may be a network element having an audio signal processing capability in a core network or a wireless network. This is not limited in this embodiment.
For example, as shown in
In some embodiments, the mobile terminal 130 may include a collection component 131, the encoding component 110, and a channel encoding component 132. The collection component 131 is connected to the encoding component 110, and the encoding component 110 is connected to the channel encoding component 132.
In some embodiments, the mobile terminal 140 may include an audio playing component 141, the decoding component 120, and a channel decoding component 142. The audio playing component 141 is connected to the decoding component 120, and the decoding component 120 is connected to the channel decoding component 142.
After collecting an audio signal by using the collection component 131, the mobile terminal 130 encodes the audio signal by using the encoding component 110, to obtain an encoded bitstream; and then encodes the encoded bitstream by using the channel encoding component 132, to obtain a to-be-transmitted signal.
The mobile terminal 130 sends the to-be-transmitted signal to the mobile terminal 140 by using the wireless or wired network.
After receiving the to-be-transmitted signal, the mobile terminal 140 decodes the to-be-transmitted signal by using the channel decoding component 142, to obtain the encoded bitstream; decodes the encoded bitstream by using the decoding component 120, to obtain the audio signal; and plays the audio signal by using the audio playing component 141. It may be understood that the mobile terminal 130 may alternatively include the components included in the mobile terminal 140, and the mobile terminal 140 may alternatively include the components included in the mobile terminal 130.
For example, as shown in
In some embodiments, the network element 150 includes a channel decoding component 151, the decoding component 120, the encoding component 110, and a channel encoding component 152. The channel decoding component 151 is connected to the decoding component 120, the decoding component 120 is connected to the encoding component 110, and the encoding component 110 is connected to the channel encoding component 152.
After receiving a to-be-transmitted signal sent by another device, the channel decoding component 151 decodes the to-be-transmitted signal to obtain a first encoded bitstream; the decoding component 120 decodes the first encoded bitstream to obtain an audio signal; the encoding component 110 encodes the audio signal to obtain a second encoded bitstream; and the channel encoding component 152 encodes the second encoded bitstream to obtain the to-be-transmitted signal.
The other device may be a mobile terminal having an audio signal processing capability, or may be another network element having an audio signal processing capability. This is not limited in this embodiment.
In some embodiments, the encoding component 110 and the decoding component 120 in the network element may transcode an encoded bitstream sent by the mobile terminal.
In this embodiment of this application, a device on which the encoding component 110 is installed may be referred to as an audio encoding device. In an actual implementation, the audio encoding device may also have an audio decoding function. This is not limited in this embodiment of this application.
This embodiment of this application is described by using only a stereo signal as an example. In this application, the audio encoding device may further process a mono signal or a multi-channel signal, where the multi-channel signal includes at least two channels of signals.
This application provides an audio signal encoding method and apparatus, and an audio signal decoding method and apparatus. Filtering processing is performed on a frequency-domain coefficient of a current frame to obtain a filtering parameter, and filtering processing is performed on the frequency-domain coefficient of the current frame and the reference frequency-domain coefficient based on the filtering parameter, so that bits written into a bitstream can be reduced, and compression efficiency in encoding/decoding can be improved. Therefore, audio signal encoding/decoding efficiency can be improved.
S610: Obtain a target frequency-domain coefficient of a current frame and a reference target frequency-domain coefficient of the current frame.
In some embodiments, the target frequency-domain coefficient and the reference target frequency-domain coefficient of the current frame may be obtained through processing based on a filtering parameter. The filtering parameter may be obtained by performing filtering processing on a frequency-domain coefficient of the current frame. The frequency-domain coefficient of the current frame may be obtained by performing a time-to-frequency-domain transform on a time-domain signal of the current frame. The transform may be a modified discrete cosine transform (MDCT), a discrete cosine transform (DCT), a fast Fourier transform (FFT), or the like.
The reference target frequency-domain coefficient may be a target frequency-domain coefficient of a reference signal of the current frame.
In some embodiments, the filtering processing may include temporal noise shaping (TNS) processing and/or frequency-domain noise shaping (FDNS) processing, or the filtering processing may include other processing. This is not limited in this embodiment of this application.
S620: Calculate a cost function based on the target frequency-domain coefficient and the reference target frequency-domain coefficient of the current frame.
The cost function may be used to determine whether to perform long-term prediction (LTP) processing on the current frame during encoding of the target frequency-domain coefficient of the current frame.
In some embodiments, the cost function may include at least one of a cost function of a high frequency band, a cost function of a low frequency band, or a cost function of a full frequency band of the current frame.
The high frequency band may be a frequency band whose frequency is greater than that of a cutoff frequency bin and that is of the full frequency band of the current frame, the low frequency band may be a frequency band whose frequency is less than or equal to that of the cutoff frequency bin and that is of the full frequency band of the current frame, and the cutoff frequency bin may be for division into the low frequency band and the high frequency band.
In some embodiments, the cost function may be a predicted gain of a current frequency band of the current frame.
For example, the cost function of the high frequency band may be a predicted gain of the high frequency band, the cost function of the low frequency band may be a predicted gain of the low frequency band, and the cost function of the full frequency band may be a predicted gain of the full frequency band.
Alternatively, the cost function is a ratio of energy of an estimated residual frequency-domain coefficient of a current frequency band of the current frame to energy of a target frequency-domain coefficient of the current frequency band.
The estimated residual frequency-domain coefficient may be a difference between the target frequency-domain coefficient of the current frequency band and a predicted frequency-domain coefficient of the current frequency band, the predicted frequency-domain coefficient may be obtained based on a reference frequency-domain coefficient and the predicted gain of the current frequency band of the current frame, and the current frequency band is the low frequency band, the high frequency band, or the full frequency band.
For example, the predicted frequency-domain coefficient may be a product of the reference frequency-domain coefficient and the predicted gain of the current frequency band of the current frame.
For example, the cost function of the high frequency band may be a ratio of energy of a residual frequency-domain coefficient of the high frequency band to energy of the high frequency band signal, the cost function of the low frequency band may be a ratio of energy of a residual frequency-domain coefficient of the low frequency band to energy of the low frequency band signal, and the cost function of the full frequency band may be a ratio of energy of a residual frequency-domain coefficient of the full frequency band to energy of the full frequency band signal.
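The two forms of the cost function can be illustrated with a minimal sketch. This application does not give a formula for the predicted gain at this point, so the sketch assumes a least-squares gain (the inner product of the target and reference coefficients divided by the energy of the reference coefficients). The function names and the handling of zero-energy bands are likewise assumptions made for illustration.

```python
def predicted_gain(target, reference):
    """Least-squares gain g minimizing sum((target - g * reference)^2).

    Assumption: this particular gain formula is not stated in the source.
    """
    numerator = sum(x * r for x, r in zip(target, reference))
    denominator = sum(r * r for r in reference)
    return numerator / denominator if denominator > 0.0 else 0.0


def residual_energy_ratio(target, reference):
    """Energy of the estimated residual divided by energy of the target."""
    gain = predicted_gain(target, reference)
    # Estimated residual = target - predicted, with predicted = gain * reference.
    residual_energy = sum((x - gain * r) ** 2 for x, r in zip(target, reference))
    target_energy = sum(x * x for x in target)
    # A silent band gains nothing from prediction; report a ratio of 1.0.
    return residual_energy / target_energy if target_energy > 0.0 else 1.0


def band_costs(target, reference, cutoff_bin):
    """Residual energy ratio for the low, high, and full frequency bands.

    The low band covers bins 0..cutoff_bin inclusive; the high band covers
    the remaining bins.
    """
    split = cutoff_bin + 1
    return {
        "low": residual_energy_ratio(target[:split], reference[:split]),
        "high": residual_energy_ratio(target[split:], reference[split:]),
        "full": residual_energy_ratio(target, reference),
    }
```

With the predicted gain as the cost function, a larger value suggests that LTP is more useful; with the residual energy ratio, a smaller value does. This is consistent with the two alternative sets of threshold conditions described for the conditions on the cost function.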
In this embodiment of this application, the cutoff frequency bin may be determined in the following two manners:
Manner 1:
The cutoff frequency bin may be determined based on a spectral coefficient of the reference signal.
Further, a peak factor set corresponding to the reference signal may be determined based on the spectral coefficient of the reference signal; and the cutoff frequency bin may be determined based on a peak factor in the peak factor set, where the peak factor satisfies a preset condition.
The preset condition may be that a peak factor in the peak factor set is greater than a sixth threshold.
For example, the peak factor set corresponding to the reference signal may be determined based on the spectral coefficient of the reference signal; and the greatest frequency bin index among the (one or more) peak factors in the peak factor set that are greater than the sixth threshold may be used as the cutoff frequency bin.
Manner 2:
The cutoff frequency bin may be a preset value. In some embodiments, the cutoff frequency bin may be preset to the preset value based on experience.
For example, it is assumed that a to-be-processed signal of the current frame is sampled at 48 kilohertz (kHz) and undergoes 480-point MDCT transform to obtain 480 MDCT coefficients. In this case, an index of the cutoff frequency bin may be preset to 200, and a cutoff frequency corresponding to the cutoff frequency bin is 10 kHz.
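The arithmetic behind this example can be checked with a short Python sketch. It assumes that the MDCT coefficients evenly span the range from 0 Hz to half the sampling rate and ignores the half-bin offset of MDCT bin centers; the function name is hypothetical.

```python
def cutoff_frequency_hz(sample_rate_hz, num_mdct_coeffs, cutoff_index):
    """Frequency that corresponds to a given MDCT bin index.

    Assumes num_mdct_coeffs bins evenly cover 0 Hz to sample_rate_hz / 2.
    """
    bin_width_hz = (sample_rate_hz / 2) / num_mdct_coeffs
    return cutoff_index * bin_width_hz

freq = cutoff_frequency_hz(48_000, 480, 200)
print(freq)  # 10000.0
```

Each of the 480 bins is 50 Hz wide under this assumption, so index 200 corresponds to 10 kHz.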
S630: Encode the target frequency-domain coefficient of the current frame based on the cost function.
In some embodiments, an identifier may be determined based on the cost function. Then, the target frequency-domain coefficient of the current frame may be encoded based on the determined identifier.
In some embodiments, based on different values of the determined identifier, the target frequency-domain coefficient of the current frame may be encoded in the following two manners:
Manner 1:
In some embodiments, a first identifier and/or a second identifier may be determined based on the cost function, and the target frequency-domain coefficient of the current frame may be encoded based on the first identifier and/or the second identifier.
The first identifier may be used to indicate whether to perform LTP processing on the current frame, and the second identifier may be used to indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
In some embodiments, in Manner 1, the first identifier and the second identifier may have different values, and these different values may represent different meanings.
For example, the first identifier may be a first value or a second value, and the second identifier may be a third value or a fourth value.
The first value may be 1, which indicates to perform LTP processing on the current frame. The second value may be 0, which indicates not to perform LTP processing on the current frame. The third value may be 2, which indicates to perform LTP processing on the full frequency band. The fourth value may be 3, which indicates to perform LTP processing on the low frequency band.
It should be noted that the foregoing values of the first identifier and the second identifier in the foregoing embodiment are merely examples rather than limitations.
Further, based on different determined first identifiers and/or second identifiers, there may be the following several cases:
Case 1:
When the cost function of the low frequency band satisfies a first condition and the cost function of the high frequency band does not satisfy a second condition, it may be determined that the first identifier is the first value and the second identifier is the fourth value.
In this case, LTP processing may be performed on the low frequency band of the current frame based on the second identifier to obtain the residual frequency-domain coefficient of the low frequency band. Then, the residual frequency-domain coefficient of the low frequency band and a target frequency-domain coefficient of the high frequency band may be encoded, and a value of the first identifier and a value of the second identifier are written into a bitstream.
Case 2:
When the cost function of the low frequency band satisfies the first condition and the cost function of the high frequency band satisfies the second condition, it may be determined that the first identifier is the first value and the second identifier is the third value.
In this case, LTP processing may be performed on the full frequency band of the current frame based on the second identifier to obtain the residual frequency-domain coefficient of the full frequency band. Then, the residual frequency-domain coefficient of the full frequency band may be encoded, and a value of the first identifier and a value of the second identifier are written into a bitstream.
Case 3:
When the cost function of the low frequency band does not satisfy the first condition, it may be determined that the first identifier is the second value.
In this case, the target frequency-domain coefficient of the current frame may be encoded (instead of encoding the residual frequency-domain coefficient of the current frame after the residual frequency-domain coefficient of the current frame is obtained by performing LTP processing on the current frame), and a value of the first identifier is written into a bitstream.
Case 4:
When the cost function of the low frequency band satisfies the first condition and the cost function of the full frequency band does not satisfy a third condition, it may be determined that the first identifier is the second value.
In this case, the target frequency-domain coefficient of the current frame may be encoded, and a value of the first identifier is written into a bitstream.
Case 5:
When the cost function of the full frequency band satisfies the third condition, it may be determined that the first identifier is the first value and the second identifier is the third value.
In this case, LTP processing may be performed on the full frequency band of the current frame based on the second identifier to obtain the residual frequency-domain coefficient of the full frequency band. Then, the residual frequency-domain coefficient of the full frequency band may be encoded, and a value of the first identifier and a value of the second identifier are written into a bitstream.
In Manner 1, when the cost function is defined differently, the first condition, the second condition, or the third condition may also be different.
For example, when the cost function is the predicted gain of the current frequency band of the current frame, the first condition may be that the cost function of the low frequency band is greater than or equal to a first threshold, the second condition may be that the cost function of the high frequency band is greater than or equal to a second threshold, and the third condition may be that the cost function of the full frequency band is greater than or equal to a third threshold.
For another example, when the cost function is the ratio of the energy of the estimated residual frequency-domain coefficient of the current frequency band (that is, the difference between the target frequency-domain coefficient of the current frequency band and the predicted frequency-domain coefficient of the current frequency band) to the energy of the target frequency-domain coefficient of the current frequency band, the first condition may be that the cost function of the low frequency band is less than a fourth threshold, the second condition may be that the cost function of the high frequency band is less than the fourth threshold, and the third condition may be that the cost function of the full frequency band is greater than or equal to a fifth threshold.
The first threshold, the second threshold, the third threshold, the fourth threshold, and the fifth threshold may be all preset to 0.5.
Alternatively, the first threshold may be preset to 0.45, the second threshold may be preset to 0.5, the third threshold may be preset to 0.55, the fourth threshold may be preset to 0.6, and the fifth threshold may be preset to 0.65.
Alternatively, the first threshold may be preset to 0.4, the second threshold may be preset to 0.4, the third threshold may be preset to 0.5, the fourth threshold may be preset to 0.6, and the fifth threshold may be preset to 0.7.
It should be understood that the values in the foregoing embodiment are merely examples rather than limitations. The first threshold, the second threshold, the third threshold, the fourth threshold, and the fifth threshold may be all preset based on experience (or with reference to actual situations). This is not limited in this embodiment of this application.
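Cases 1 to 3 of Manner 1 can be summarized in a short Python sketch. It assumes that the cost function is the predicted gain (so a condition is satisfied when the gain is greater than or equal to its threshold) and uses the example identifier values and the example threshold 0.5 given above; the function and constant names are hypothetical.

```python
# Example identifier values taken from the text (examples, not limitations).
LTP_ON, LTP_OFF = 1, 0        # first identifier: first value / second value
FULL_BAND, LOW_BAND = 2, 3    # second identifier: third value / fourth value

def decide_identifiers(cost_low, cost_high, thr1=0.5, thr2=0.5):
    """Return (first_identifier, second_identifier) for Cases 1 to 3.

    second_identifier is None when LTP is off, because only the first
    identifier is written into the bitstream in that case.
    """
    if cost_low < thr1:        # Case 3: first condition not satisfied
        return LTP_OFF, None
    if cost_high >= thr2:      # Case 2: both conditions satisfied
        return LTP_ON, FULL_BAND
    return LTP_ON, LOW_BAND    # Case 1: only the low band qualifies

decision = decide_identifiers(0.7, 0.2)
```

Cases 4 and 5 follow the same pattern, with the cost function of the full frequency band and the third condition in place of the high-band test.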
Manner 2:
In some embodiments, a first identifier may be determined based on the cost function; and the target frequency-domain coefficient of the current frame may be encoded based on the first identifier.
The first identifier may be used to indicate whether to perform LTP processing on the current frame, or the first identifier may be used to indicate whether to perform LTP processing on the current frame and indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
In some embodiments, in Manner 2, the first identifier may alternatively have different values, and these different values may also represent different meanings.
For example, the first identifier may be a first value, a second value, or a third value.
The first value may be 1, which indicates (to perform LTP processing on the current frame and) to perform LTP processing on the low frequency band. The second value may be 0, which indicates not to perform LTP processing on the current frame. The third value may be 2, which indicates (to perform LTP processing on the current frame and) to perform LTP processing on the full frequency band.
It should be noted that the foregoing values of the first identifier in the foregoing embodiment are merely examples rather than limitations.
Further, based on different determined first identifiers, there may be the following several cases:
Case 1:
When the cost function of the low frequency band satisfies a first condition and the cost function of the high frequency band does not satisfy a second condition, it may be determined that the first identifier is the first value.
In this case, LTP processing may be performed on the low frequency band of the current frame based on the first identifier to obtain the residual frequency-domain coefficient of the low frequency band. Then, the residual frequency-domain coefficient of the low frequency band and a target frequency-domain coefficient of the high frequency band may be encoded, and a value of the first identifier is written into a bitstream.
Case 2:
When the cost function of the low frequency band satisfies the first condition and the cost function of the high frequency band satisfies the second condition, it may be determined that the first identifier is the third value.
In this case, LTP processing may be performed on the full frequency band of the current frame based on the first identifier to obtain the residual frequency-domain coefficient of the full frequency band. Then, the residual frequency-domain coefficient of the full frequency band may be encoded, and a value of the first identifier is written into a bitstream.
Case 3:
When the cost function of the low frequency band does not satisfy the first condition, it may be determined that the first identifier is the second value.
In this case, the target frequency-domain coefficient of the current frame may be encoded, and a value of the first identifier is written into a bitstream.
Case 4:
When the cost function of the low frequency band satisfies the first condition and the cost function of the full frequency band does not satisfy a third condition, it may be determined that the first identifier is the second value.
In this case, the target frequency-domain coefficient of the current frame may be encoded (instead of encoding the residual frequency-domain coefficient of the current frame after the residual frequency-domain coefficient of the current frame is obtained by performing LTP processing on the current frame), and a value of the first identifier is written into a bitstream.
Case 5:
When the cost function of the full frequency band satisfies the third condition, it may be determined that the first identifier is the third value.
In this case, LTP processing may be performed on the full frequency band of the current frame based on the first identifier to obtain the residual frequency-domain coefficient of the full frequency band. Then, the residual frequency-domain coefficient of the full frequency band may be encoded, and a value of the first identifier is written into a bitstream.
In Manner 2, when the cost function is defined differently, the first condition, the second condition, or the third condition may also be different.
For example, when the cost function is the predicted gain of the current frequency band of the current frame, the first condition may be that the cost function of the low frequency band is greater than or equal to a first threshold, the second condition may be that the cost function of the high frequency band is greater than or equal to a second threshold, and the third condition may be that the cost function of the full frequency band is greater than or equal to a third threshold.
For another example, when the cost function is the ratio of the energy of the estimated residual frequency-domain coefficient of the current frequency band (that is, the difference between the target frequency-domain coefficient of the current frequency band and the predicted frequency-domain coefficient of the current frequency band) to the energy of the target frequency-domain coefficient of the current frequency band, the first condition may be that the cost function of the low frequency band is less than a fourth threshold, the second condition may be that the cost function of the high frequency band is less than the fourth threshold, and the third condition may be that the cost function of the full frequency band is greater than or equal to a fifth threshold.
The first threshold, the second threshold, the third threshold, the fourth threshold, and the fifth threshold may all be preset to 0.5.
Alternatively, the first threshold may be preset to 0.45, the second threshold may be preset to 0.5, the third threshold may be preset to 0.55, the fourth threshold may be preset to 0.6, and the fifth threshold may be preset to 0.65.
Alternatively, the first threshold may be preset to 0.4, the second threshold may be preset to 0.4, the third threshold may be preset to 0.5, the fourth threshold may be preset to 0.6, and the fifth threshold may be preset to 0.7.
It should be understood that the values in the foregoing embodiment are merely examples rather than limitations. The first threshold, the second threshold, the third threshold, the fourth threshold, and the fifth threshold may be all preset based on experience (or with reference to actual situations). This is not limited in this embodiment of this application.
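How the single identifier of Manner 2 selects the coefficients that are passed to the encoder can be sketched as follows. This is an illustration under stated assumptions: the residual is taken as the target coefficient minus the product of a gain and the reference coefficient, the low frequency band is taken as bins 0 to `stop_line` inclusive, and all names are hypothetical.

```python
import numpy as np

# Example values of the first identifier in Manner 2 (examples, not limitations).
NO_LTP, LTP_LOW, LTP_FULL = 0, 1, 2

def coefficients_to_encode(target, reference, identifier, stop_line, gain):
    """Return the coefficients handed to the encoder for one frame."""
    target = np.asarray(target, dtype=float)
    reference = np.asarray(reference, dtype=float)
    out = target.copy()
    if identifier == LTP_LOW:
        # Residual in the low band; the high band keeps its target coefficients.
        out[:stop_line + 1] -= gain * reference[:stop_line + 1]
    elif identifier == LTP_FULL:
        # Residual over the full band.
        out -= gain * reference
    # identifier == NO_LTP: the target coefficients are encoded unchanged.
    return out

t = [1.0, 2.0, 3.0, 4.0]
r = [1.0, 1.0, 1.0, 1.0]
low_only = coefficients_to_encode(t, r, LTP_LOW, stop_line=1, gain=1.0)
```

In each case the value of the first identifier is written into the bitstream together with the encoded coefficients, so that the decoder side knows which band to reconstruct with LTP synthesis.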
With reference to
It should be understood that the embodiment shown in
S710: Obtain a target frequency-domain coefficient of a current frame.
In some embodiments, a left channel signal and a right channel signal of the current frame may be converted from a time domain to a frequency domain through MDCT transform to obtain an MDCT coefficient of the left channel signal and an MDCT coefficient of the right channel signal, that is, a frequency-domain coefficient of the left channel signal and a frequency-domain coefficient of the right channel signal.
Then, TNS processing may be performed on a frequency-domain coefficient of the current frame to obtain a linear prediction coding (LPC) coefficient (that is, a TNS parameter), so as to achieve an objective of performing noise shaping on the current frame. The TNS processing is to perform LPC analysis on the frequency-domain coefficient of the current frame. For a specific LPC analysis method, refer to a conventional technology. Details are not described herein.
In addition, because TNS processing is not suitable for all frames of signals, a TNS identifier may be further used to indicate whether to perform TNS processing on the current frame. For example, when the TNS identifier is 0, no TNS processing is performed on the current frame. When the TNS identifier is 1, TNS processing is performed on the frequency-domain coefficient of the current frame by using the obtained LPC coefficient, to obtain a processed frequency-domain coefficient of the current frame. The TNS identifier is obtained through calculation based on input signals (that is, the left channel signal and the right channel signal of the current frame) of the current frame. For a specific method, refer to the conventional technology. Details are not described herein.
Then, FDNS processing may be further performed on the processed frequency-domain coefficient of the current frame to obtain a time-domain LPC coefficient. Then, the time-domain LPC coefficient is converted to a frequency domain to obtain a frequency-domain FDNS parameter. The FDNS processing belongs to a frequency-domain noise shaping technology. In an embodiment, an energy spectrum of the processed frequency-domain coefficient of the current frame is calculated, an autocorrelation coefficient is obtained based on the energy spectrum, the time-domain LPC coefficient is obtained based on the autocorrelation coefficient, and the time-domain LPC coefficient is then converted to the frequency domain to obtain the frequency-domain FDNS parameter. For a specific FDNS processing method, refer to the conventional technology. Details are not described herein.
It should be noted that an order of performing TNS processing and FDNS processing is not limited in this embodiment of this application. For example, alternatively, FDNS processing may be performed on the frequency-domain coefficient of the current frame before TNS processing. This is not limited in this embodiment of this application.
In this embodiment of this application, for ease of understanding, the TNS parameter and the FDNS parameter may also be referred to as filtering parameters, and the TNS processing and the FDNS processing may also be referred to as filtering processing.
In this case, the frequency-domain coefficient of the current frame may be processed based on the TNS parameter and the FDNS parameter, to obtain the target frequency-domain coefficient of the current frame.
For ease of description, in this embodiment of this application, the target frequency-domain coefficient of the current frame may be expressed as X[k]. The target frequency-domain coefficient of the current frame may include a target frequency-domain coefficient of the left channel signal and a target frequency-domain coefficient of the right channel signal. The target frequency-domain coefficient of the left channel signal may be expressed as XL[k], and the target frequency-domain coefficient of the right channel signal may be expressed as XR[k], where k=0, 1, ..., W, k is a non-negative integer, W is a positive integer, 0≤k≤W, and W may represent a quantity of points on which MDCT transform needs to be performed (or W may represent a quantity of MDCT coefficients that need to be encoded).
S720: Obtain a reference target frequency-domain coefficient of the current frame.
In some embodiments, an optimal pitch period may be obtained by searching pitch periods, and a reference signal ref[j] of the current frame is obtained from a history buffer based on the optimal pitch period. Any pitch period searching method may be used to search the pitch periods. This is not limited in this embodiment of this application.
ref[j] = syn[L−N−K+j], j = 0, 1, ..., N−1
The history buffer signal syn stores a synthesized time-domain signal obtained through inverse MDCT transform, the length L of the history buffer satisfies L=2N, N represents a frame length, and K represents a pitch period.
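The indexing of the history buffer can be illustrated with the following Python sketch. The function name and the range check are assumptions for the example; with L=2N, the formula only addresses valid samples when the pitch period K lies between 0 and N.

```python
import numpy as np

def reference_from_history(syn, frame_len, pitch_period):
    """Return ref[j] = syn[L - N - K + j] for j = 0 .. N-1, with L = len(syn) = 2N."""
    syn = np.asarray(syn, dtype=float)
    L, N, K = len(syn), frame_len, pitch_period
    if L != 2 * N:
        raise ValueError("history buffer length must equal 2N")
    start = L - N - K
    if not 0 <= start <= L - N:
        # Consequence of the indexing in this sketch, not a limit stated in the text.
        raise ValueError("pitch period must satisfy 0 <= K <= N")
    return syn[start:start + N]

history = np.arange(8)  # hypothetical buffer with N = 4, so L = 8
ref = reference_from_history(history, frame_len=4, pitch_period=2)
```

A larger pitch period moves the N-sample window further back into the previously synthesized signal.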
To obtain the history buffer signal syn, an arithmetic-coded residual signal is decoded, LTP synthesis is performed, inverse TNS processing and inverse FDNS processing are performed based on the TNS parameter and the FDNS parameter that are obtained in S710, and inverse MDCT transform is then performed to obtain a synthesized time-domain signal. The synthesized time-domain signal is stored in the history buffer syn. Inverse TNS processing is an inverse operation of TNS processing (filtering), to obtain a signal that has not undergone TNS processing. Inverse FDNS processing is an inverse operation of FDNS processing (filtering), to obtain a signal that has not undergone FDNS processing. For specific methods for performing inverse TNS processing and inverse FDNS processing, refer to the conventional technology. Details are not described herein.
In some embodiments, MDCT transform is performed on the reference signal ref[j], and filtering processing is performed on a frequency-domain coefficient of the reference signal ref[j] based on the filtering parameter (obtained after the frequency-domain coefficient X[k] of the current frame is analyzed) obtained in S710.
First, TNS processing may be performed on an MDCT coefficient of the reference signal ref[j] based on the TNS identifier and the TNS parameter (obtained after the frequency-domain coefficient X[k] of the current frame is analyzed) obtained in S710, to obtain a TNS-processed reference frequency-domain coefficient.
For example, when the TNS identifier is 1, TNS processing is performed on the MDCT coefficient of the reference signal based on the TNS parameter.
Then, FDNS processing may be performed on the TNS-processed reference frequency-domain coefficient based on the FDNS parameter (obtained after the frequency-domain coefficient X[k] of the current frame is analyzed) obtained in S710, to obtain an FDNS-processed reference frequency-domain coefficient, that is, the reference target frequency-domain coefficient Xref[k].
It should be noted that an order of performing TNS processing and FDNS processing is not limited in this embodiment of this application. For example, alternatively, FDNS processing may be performed on the reference frequency-domain coefficient (that is, the MDCT coefficient of the reference signal) before TNS processing. This is not limited in this embodiment of this application.
S730: Perform frequency-domain LTP determining on the current frame.
In some embodiments, an LTP-predicted gain of the current frame may be calculated based on the target frequency-domain coefficient X[k] and the reference target frequency-domain coefficient Xref[k] of the current frame.
For example, the following formula may be used to calculate an LTP-predicted gain of the left channel signal (or the right channel signal) of the current frame:
gi may represent an LTP-predicted gain of an ith subframe of the left channel signal (or the right channel signal), M represents a quantity of MDCT coefficients participating in LTP processing, k is a non-negative integer, and 0≤k≤M. It should be noted that, in this embodiment of this application, some frames may be divided into several subframes, and other frames have only one subframe. For ease of description, the ith subframe is used for description herein. When there is only one subframe, i is equal to 0.
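The formula itself is not reproduced in this text. One form that is consistent with the surrounding definitions, offered here as an assumption rather than as the formula of this application, is the least-squares gain that minimizes the energy of X[k] − gi·Xref[k] over the coefficients participating in LTP processing. The summation range k = 0 to M−1 is also an assumption.

```latex
g_i = \frac{\sum_{k=0}^{M-1} X_{ref}[k]\, X[k]}{\sum_{k=0}^{M-1} X_{ref}[k]\, X_{ref}[k]}
```

Under this form, gi approaches 1 when the reference target frequency-domain coefficient closely matches the target frequency-domain coefficient, and approaches 0 when the two are uncorrelated.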
In some embodiments, the LTP identifier of the current frame may be determined based on the LTP-predicted gain of the current frame. The LTP identifier may be used to indicate whether to perform LTP processing on the current frame.
It should be noted that when the current frame includes the left channel signal and the right channel signal, the LTP identifier of the current frame may be used for indication in the following two manners.
Manner 1:
The LTP identifier of the current frame may be used to indicate whether to perform LTP processing on the current frame.
The LTP identifier may further include the first identifier and/or the second identifier described in the embodiment of the method 600 in
For example, the LTP identifier may include the first identifier and the second identifier. The first identifier may be used to indicate whether to perform LTP processing on the current frame, and the second identifier may be used to indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
For another example, the LTP identifier may be the first identifier. The first identifier may be used to indicate whether to perform LTP processing on the current frame. In addition, when LTP processing is performed on the current frame, the first identifier may further indicate a frequency band (for example, a high frequency band, a low frequency band, or a full frequency band of the current frame) on which LTP processing is performed and that is of the current frame.
Manner 2:
The LTP identifier of the current frame may include an LTP identifier of a left channel and an LTP identifier of a right channel. The LTP identifier of the left channel may be used to indicate whether to perform LTP processing on the left channel signal, and the LTP identifier of the right channel may be used to indicate whether to perform LTP processing on the right channel signal.
Further, as described in the embodiment of the method 600 in
The following provides description by using the LTP identifier of the left channel as an example. The LTP identifier of the right channel is similar to the LTP identifier of the left channel. Details are not described herein.
For example, the LTP identifier of the left channel may include the first identifier of the left channel and the second identifier of the left channel. The first identifier of the left channel may be used to indicate whether to perform LTP processing on the left channel, and the second identifier of the left channel may be used to indicate a frequency band on which LTP processing is performed and that is of the left channel.
For another example, the LTP identifier of the left channel may be the first identifier of the left channel. The first identifier of the left channel may be used to indicate whether to perform LTP processing on the left channel. In addition, when LTP processing is performed on the left channel, the first identifier of the left channel may further indicate a frequency band (for example, a high frequency band, a low frequency band, or a full frequency band of the left channel) on which LTP processing is performed and that is of the left channel.
For specific description of the first identifier and the second identifier in the foregoing two manners, refer to the embodiment in
In the embodiment of the method 700, the LTP identifier of the current frame may be used for indication in Manner 1. It should be understood that the embodiment of the method 700 is merely an example rather than a limitation. The LTP identifier of the current frame in the method 700 may alternatively be used for indication in Manner 2. This is not limited in this embodiment of this application.
For example, in the method 700, an LTP-predicted gain may be calculated for each of subframes of the left channel and the right channel of the current frame. If a frequency-domain predicted gain gi of any subframe is less than a preset threshold, the LTP identifier of the current frame may be set to 0, that is, an LTP module is disabled for the current frame. In this case, the target frequency-domain coefficient of the current frame may be encoded. Otherwise, if a frequency-domain predicted gain of each subframe of the current frame is greater than the preset threshold, the LTP identifier of the current frame may be set to 1, that is, an LTP module is enabled for the current frame. In this case, the following S740 continues to be performed.
The preset threshold may be set with reference to an actual situation. For example, the preset threshold may be set to 0.5, 0.4, or 0.6.
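The frame-level decision described above can be sketched in Python as follows. The function name is hypothetical, and the text does not state what happens when a gain equals the threshold; this sketch treats that case as LTP disabled, which is an assumption.

```python
def ltp_identifier(subframe_gains, threshold=0.5):
    """Return 1 (LTP enabled) only if every subframe gain exceeds the threshold.

    subframe_gains holds the predicted gains of all subframes of both channels.
    A gain equal to the threshold disables LTP here (assumption).
    """
    if not subframe_gains:
        raise ValueError("at least one subframe gain is required")
    return 1 if all(g > threshold for g in subframe_gains) else 0

flag = ltp_identifier([0.6, 0.7, 0.8])
```

A single weak subframe is enough to disable LTP for the whole frame, in which case the target frequency-domain coefficient is encoded directly.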
In this embodiment of this application, bandwidth of the current frame may be categorized into a high frequency band, a low frequency band, and a full frequency band.
In some embodiments, a cost function of the left channel signal (and/or the right channel signal) may be calculated; whether to perform LTP processing on the current frame is determined based on the cost function; and when LTP processing is performed on the current frame, LTP processing is performed on at least one of the high frequency band, the low frequency band, or the full frequency band of the current frame based on the cost function to obtain a residual frequency-domain coefficient of the current frame.
For example, when LTP processing is performed on the high frequency band, a residual frequency-domain coefficient of the high frequency band may be obtained. When LTP processing is performed on the low frequency band, a residual frequency-domain coefficient of the low frequency band may be obtained. When LTP processing is performed on the full frequency band, a residual frequency-domain coefficient of the full frequency band may be obtained.
The cost function may include a cost function of the high frequency band, a cost function of the low frequency band, and/or a cost function of the full frequency band of the current frame. The high frequency band may be a frequency band whose frequency is greater than that of a cutoff frequency bin and that is of the full frequency band of the current frame, the low frequency band may be a frequency band whose frequency is less than or equal to that of the cutoff frequency bin and that is of the full frequency band of the current frame, and the cutoff frequency bin may be used for division into the low frequency band and the high frequency band.
In this embodiment of this application, the cutoff frequency bin may be determined in the following two manners:
Manner 1:
The cutoff frequency bin may be determined based on a spectral coefficient of the reference signal.
In some embodiments, a peak factor set corresponding to the reference signal may be determined based on the spectral coefficient of the reference signal; and the cutoff frequency bin may be determined based on a peak factor in the peak factor set, where the peak factor satisfies a preset condition.
Further, the peak factor set corresponding to the reference signal may be determined based on the spectral coefficient of the reference signal; and the greatest frequency bin index among the peak factors in the peak factor set that satisfy a preset condition may be used as the cutoff frequency bin.
The preset condition may be that a peak factor in the peak factor set is greater than a sixth threshold.
For example, the peak factor set may be calculated based on the following formula:
CFp represents the peak factor set, P represents a set of values k that satisfy a condition, w represents a size of a sliding window, and p represents an element in the set P.
In this case, an index value stopLine of a cutoff frequency bin coefficient of a low-frequency MDCT coefficient may be determined based on the following formula:
stopLine = max{p | CFp > thr6, p ∈ P}
thr6 represents the sixth threshold.
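The selection of stopLine can be sketched in Python as follows. The peak-factor formula is not reproduced in this text, so the sketch substitutes its own definitions, which are assumptions and not the formula of this application: the set P is taken as the indices of strict local maxima of the magnitude of the reference spectrum, and CFp is taken as the magnitude at p divided by the root-mean-square magnitude in a sliding window of w bins on each side of p. The function name and default values are hypothetical.

```python
import numpy as np

def find_stop_line(x_ref, w=2, thr6=1.5, default=None):
    """Return max{p | CF_p > thr6, p in P}, or default if no peak qualifies."""
    mag = np.abs(np.asarray(x_ref, dtype=float))
    n = len(mag)
    best = default
    for p in range(1, n - 1):
        # Assumed membership test for P: p is a strict local spectral peak.
        if not (mag[p] > mag[p - 1] and mag[p] > mag[p + 1]):
            continue
        lo, hi = max(0, p - w), min(n, p + w + 1)
        rms = np.sqrt(np.mean(mag[lo:hi] ** 2))  # > 0 because mag[p] > 0
        cf = mag[p] / rms                        # assumed peak factor CF_p
        if cf > thr6:
            best = p  # p only increases, so the last hit is the maximum
    return best

spectrum = [0, 0, 5, 0, 0, 0, 0, 4, 0, 0]  # hypothetical reference spectrum
stop = find_stop_line(spectrum)
```

Bins at or below the returned index form the low frequency band, where the reference signal still shows pronounced spectral peaks; a caller would need a fallback, such as the preset value of Manner 2, when no peak qualifies.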
Manner 2:
The cutoff frequency bin may be a preset value. In some embodiments, the cutoff frequency bin may be preset to the preset value based on experience.
For example, it is assumed that a to-be-processed signal of the current frame is sampled at 48 kilohertz (kHz) and undergoes 480-point MDCT transform to obtain 480 MDCT coefficients. In this case, an index of the cutoff frequency bin may be preset to 200, and a cutoff frequency corresponding to the cutoff frequency bin is 10 kHz.
The following provides description by using the left channel signal as an example. However, the following description is not limited to the left channel signal or the right channel signal. In this embodiment of this application, a method for processing the left channel signal is the same as a method for processing the right channel signal.
At least one of the cost function of the high frequency band, the cost function of the low frequency band, and the cost function of the full frequency band of the current frame may be calculated.
In some embodiments, the cost function may be calculated by using the following two methods:
Method 1:
In some embodiments, the cost function may be a predicted gain of a current frequency band of the current frame.
For example, the cost function of the high frequency band may be a predicted gain of the high frequency band, the cost function of the low frequency band may be a predicted gain of the low frequency band, and the cost function of the full frequency band may be a predicted gain of the full frequency band.
For example, the cost function may be calculated based on the following formula:
X[k] represents a target frequency-domain coefficient of the current frame, Xref[k] represents the reference target frequency-domain coefficient, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, gLFi represents a predicted gain of a low frequency band of an ith subframe, gHFi represents a predicted gain of a high frequency band of the ith subframe, gFBi represents a predicted gain of a full frequency band of the ith subframe, M represents a quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
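As an illustration of Method 1, the per-band predicted gains can be sketched as follows. The sketch assumes the common least-squares form of a prediction gain (the cross-correlation of X[k] and Xref[k] divided by the energy of Xref[k]); this form is an assumption and is not taken from this application. It also assumes that the low frequency band covers the indices k≤stopLine and the high frequency band covers k>stopLine. The function names and sample values are hypothetical.

```python
from typing import Sequence, Tuple


def predicted_gain(x: Sequence[float], x_ref: Sequence[float]) -> float:
    """Least-squares gain g minimizing sum((x[k] - g * x_ref[k])**2).

    This normalized-correlation form is an assumption of this sketch.
    Returns 0.0 when the reference has no energy.
    """
    ref_energy = sum(r * r for r in x_ref)
    if ref_energy == 0.0:
        return 0.0
    return sum(a * r for a, r in zip(x, x_ref)) / ref_energy


def band_gains(x: Sequence[float], x_ref: Sequence[float],
               stop_line: int) -> Tuple[float, float, float]:
    """Return (gLF, gHF, gFB) for one subframe.

    Low band: k <= stop_line; high band: k > stop_line (assumed split).
    """
    split = stop_line + 1
    g_lf = predicted_gain(x[:split], x_ref[:split])
    g_hf = predicted_gain(x[split:], x_ref[split:])
    g_fb = predicted_gain(x, x_ref)
    return g_lf, g_hf, g_fb


# Toy subframe with four coefficients and the cutoff at index 1.
x = [2.0, 4.0, 1.0, 0.0]
x_ref = [1.0, 2.0, 1.0, 1.0]
g_lf, g_hf, g_fb = band_gains(x, x_ref, stop_line=1)
```

In this toy case the low frequency band is predicted well by the reference while the high frequency band is not, which is the situation in which band-limited LTP is useful.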
Method 2:
In some embodiments, the cost function is a ratio of energy of an estimated residual frequency-domain coefficient of a current frequency band of the current frame to energy of a target frequency-domain coefficient of the current frequency band.
The estimated residual frequency-domain coefficient may be a difference between the target frequency-domain coefficient of the current frequency band and a predicted frequency-domain coefficient of the current frequency band, the predicted frequency-domain coefficient may be obtained based on a reference frequency-domain coefficient and the predicted gain of the current frequency band of the current frame, and the current frequency band is the low frequency band, the high frequency band, or the full frequency band.
For example, the predicted frequency-domain coefficient may be a product of the reference frequency-domain coefficient and the predicted gain of the current frequency band of the current frame.
For example, the cost function of the high frequency band may be a ratio of energy of a residual frequency-domain coefficient of the high frequency band to energy of the high frequency band signal, the cost function of the low frequency band may be a ratio of energy of a residual frequency-domain coefficient of the low frequency band to energy of the low frequency band signal, and the cost function of the full frequency band may be a ratio of energy of a residual frequency-domain coefficient of the full frequency band to energy of the full frequency band signal.
For example, the cost function may be calculated based on the following formula:
rHFi represents the ratio of the energy of the residual frequency-domain coefficient of the high frequency band to the energy of the high frequency band signal, rLFi represents the ratio of the energy of the residual frequency-domain coefficient of the low frequency band to the energy of the low frequency band signal, rFBi represents the ratio of the energy of the residual frequency-domain coefficient of the full frequency band to the energy of the full frequency band signal, stopLine represents an index value of a cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, gLFi represents a predicted gain of a low frequency band of an ith subframe, gHFi represents a predicted gain of a high frequency band of the ith subframe, gFBi represents a predicted gain of a full frequency band of the ith subframe, M represents a quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
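Method 2 can be sketched directly from the definitions above: the predicted frequency-domain coefficient is the product of the reference frequency-domain coefficient and the predicted gain, the estimated residual is the target minus the prediction, and the cost function is the ratio of the two energies. The function name, the sample values, and the handling of a band with zero target energy are assumptions for illustration.

```python
from typing import Sequence


def residual_energy_ratio(x: Sequence[float], x_ref: Sequence[float],
                          gain: float) -> float:
    """Energy of (x - gain * x_ref) divided by the energy of x for one band.

    Returning 1.0 for a silent band (zero target energy) is an assumption.
    """
    target_energy = sum(a * a for a in x)
    if target_energy == 0.0:
        return 1.0
    residual_energy = sum((a - gain * r) ** 2 for a, r in zip(x, x_ref))
    return residual_energy / target_energy


# Low band of a toy subframe with a hypothetical predicted gain.
x_low = [2.0, 4.0]
x_ref_low = [1.0, 1.0]
r_lf = residual_energy_ratio(x_low, x_ref_low, gain=3.0)
```

A ratio close to 0 means that the prediction removes most of the energy of the band, and a ratio close to or above 1 means that the prediction does not help.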
Further, the first identifier and/or the second identifier may be determined based on the cost function.
In some embodiments, based on different determined identifiers, the target frequency-domain coefficient of the current frame may be encoded in the following two manners:
Manner 1:
In some embodiments, the first identifier and/or the second identifier may be determined based on the cost function, and the target frequency-domain coefficient of the current frame may be encoded based on the first identifier and/or the second identifier.
The first identifier may be used to indicate whether to perform LTP processing on the current frame, and the second identifier may be used to indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
In some embodiments, in Manner 1, the first identifier and the second identifier may have different values, and these different values may represent different meanings.
For example, the first identifier may be a first value or a second value, and the second identifier may be a third value or a fourth value.
The first value may be used to indicate to perform LTP processing on the current frame, the second value may be used to indicate not to perform LTP processing on the current frame, the third value may be used to indicate to perform LTP processing on the full frequency band, and the fourth value may be used to indicate to perform LTP processing on the low frequency band.
For example, the first value may be 1, the second value may be 0, the third value may be 2, and the fourth value may be 3.
It should be noted that the foregoing values of the first identifier and the second identifier in the foregoing embodiment are merely examples rather than limitations.
Further, based on different determined first identifiers and/or second identifiers, there may be the following several cases:
Case 1:
When the cost function of the low frequency band satisfies a first condition and the cost function of the high frequency band does not satisfy a second condition, the first identifier may be the first value, and the second identifier may be the fourth value.
Case 2:
When the cost function of the low frequency band satisfies the first condition and the cost function of the high frequency band satisfies the second condition, the first identifier may be the first value, and the second identifier may be the third value.
Case 3:
When the cost function of the low frequency band does not satisfy the first condition, the first identifier may be the second value.
Case 4:
When the cost function of the low frequency band satisfies the first condition and the cost function of the full frequency band does not satisfy a third condition, the first identifier may be the second value.
Case 5:
When the cost function of the full frequency band satisfies the third condition, the first identifier may be the first value, and the second identifier may be the third value.
In Manner 1, when the cost function is defined differently, the first condition, the second condition, or the third condition may also be different.
For example, when the cost function is the predicted gain of the current frequency band of the current frame, the first condition may be that the cost function of the low frequency band is greater than or equal to a first threshold, the second condition may be that the cost function of the high frequency band is greater than or equal to a second threshold, and the third condition may be that the cost function of the full frequency band is greater than or equal to a third threshold.
For another example, when the cost function is the ratio of the energy of the estimated residual frequency-domain coefficient of the current frequency band of the current frame to the energy of the target frequency-domain coefficient of the current frequency band, the first condition may be that the cost function of the low frequency band is less than a fourth threshold, the second condition may be that the cost function of the high frequency band is less than the fourth threshold, and the third condition may be that the cost function of the full frequency band is greater than or equal to a fifth threshold.
The first threshold, the second threshold, the third threshold, the fourth threshold, and the fifth threshold are all preset to 0.5.
Alternatively, the first threshold may be preset to 0.45, the second threshold may be preset to 0.5, the third threshold may be preset to 0.55, the fourth threshold may be preset to 0.6, and the fifth threshold may be preset to 0.65.
Alternatively, the first threshold may be preset to 0.4, the second threshold may be preset to 0.4, the third threshold may be preset to 0.5, the fourth threshold may be preset to 0.6, and the fifth threshold may be preset to 0.7.
It should be understood that the values in the foregoing embodiment are merely examples rather than limitations. The first threshold, the second threshold, the third threshold, the fourth threshold, and the fifth threshold may be all preset based on experience (or with reference to actual situations). This is not limited in this embodiment of this application.
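Cases 1 to 3 of Manner 1 can be sketched as a small decision function. The sketch assumes that the cost function is the predicted gain, so that a condition is satisfied when the gain is greater than or equal to its threshold, and it uses the example identifier values 1, 0, 2, and 3 and the example threshold 0.5 given above. The function name and the use of None for a second identifier that is not determined are assumptions.

```python
from typing import Optional, Tuple

# Example identifier values from the text (illustrative, not limiting).
FIRST_VALUE, SECOND_VALUE = 1, 0   # first identifier: LTP on / LTP off
THIRD_VALUE, FOURTH_VALUE = 2, 3   # second identifier: full band / low band


def decide_manner1(gain_lf: float, gain_hf: float,
                   thr1: float = 0.5,
                   thr2: float = 0.5) -> Tuple[int, Optional[int]]:
    """Return (first_identifier, second_identifier) for Cases 1 to 3.

    The second identifier is None when LTP is off; that representation
    is an assumption of this sketch.
    """
    if gain_lf < thr1:                    # Case 3: first condition not met
        return SECOND_VALUE, None
    if gain_hf >= thr2:                   # Case 2: both conditions met
        return FIRST_VALUE, THIRD_VALUE
    return FIRST_VALUE, FOURTH_VALUE      # Case 1: only the low band qualifies
```

For example, decide_manner1(0.7, 0.2) selects LTP on the low frequency band only, and decide_manner1(0.7, 0.6) selects LTP on the full frequency band.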
Manner 2:
In some embodiments, the first identifier may be determined based on the cost function; and the target frequency-domain coefficient of the current frame may be encoded based on the first identifier.
The first identifier may be used to indicate whether to perform LTP processing on the current frame, or the first identifier may be used to indicate whether to perform LTP processing on the current frame and indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
In some embodiments, in Manner 2, the first identifier may alternatively have different values, and these different values may also represent different meanings.
For example, the first identifier may be a first value, a second value, or a third value.
The first value may be used to indicate (to perform LTP processing on the current frame and) to perform LTP processing on the low frequency band, the second value may be used to indicate not to perform LTP processing on the current frame, and the third value may be used to indicate (to perform LTP processing on the current frame and) to perform LTP processing on the full frequency band.
For example, the first value may be 1, the second value may be 0, and the third value may be 2.
It should be noted that the foregoing values of the first identifier in the foregoing embodiment are merely examples rather than limitations.
Further, based on different determined first identifiers, there may be the following several cases:
Case 1:
When the cost function of the low frequency band satisfies a first condition and the cost function of the high frequency band does not satisfy a second condition, the first identifier may be the first value.
Case 2:
When the cost function of the low frequency band satisfies the first condition and the cost function of the high frequency band satisfies the second condition, the first identifier may be the third value.
Case 3:
When the cost function of the low frequency band does not satisfy the first condition, the first identifier may be the second value.
Case 4:
When the cost function of the low frequency band satisfies the first condition and the cost function of the full frequency band does not satisfy a third condition, the first identifier may be the second value.
Case 5:
When the cost function of the full frequency band satisfies the third condition, the first identifier may be the third value.
In Manner 2, when the cost function is defined differently, the first condition, the second condition, or the third condition may also be different.
For example, when the cost function is the predicted gain of the current frequency band of the current frame, the first condition may be that the cost function of the low frequency band is greater than or equal to a first threshold, the second condition may be that the cost function of the high frequency band is greater than or equal to a second threshold, and the third condition may be that the cost function of the full frequency band is greater than or equal to a third threshold.
For another example, when the cost function is the ratio of the energy of the estimated residual frequency-domain coefficient of the current frequency band of the current frame to the energy of the target frequency-domain coefficient of the current frequency band, the first condition may be that the cost function of the low frequency band is less than a fourth threshold, the second condition may be that the cost function of the high frequency band is less than the fourth threshold, and the third condition may be that the cost function of the full frequency band is greater than or equal to a fifth threshold.
The first threshold, the second threshold, the third threshold, the fourth threshold, and the fifth threshold are all preset to 0.5.
Alternatively, the first threshold may be preset to 0.45, the second threshold may be preset to 0.5, the third threshold may be preset to 0.55, the fourth threshold may be preset to 0.6, and the fifth threshold may be preset to 0.65.
Alternatively, the first threshold may be preset to 0.4, the second threshold may be preset to 0.4, the third threshold may be preset to 0.5, the fourth threshold may be preset to 0.6, and the fifth threshold may be preset to 0.7.
It should be understood that the values in the foregoing embodiment are merely examples rather than limitations. The first threshold, the second threshold, the third threshold, the fourth threshold, and the fifth threshold may be all preset based on experience (or with reference to actual situations). This is not limited in this embodiment of this application.
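Cases 1 to 3 of Manner 2 can be sketched in the same way with a single identifier. This sketch assumes that the cost function is the ratio of residual energy to target energy, so that the first condition and the second condition are satisfied when the ratio is less than the fourth threshold. It uses the example values 1 (low frequency band), 0 (no LTP), and 2 (full frequency band) and the example threshold 0.5. The function name is hypothetical.

```python
def decide_manner2(ratio_lf: float, ratio_hf: float, thr4: float = 0.5) -> int:
    """Return the first identifier for Cases 1 to 3 of Manner 2.

    1: LTP on the low frequency band, 2: LTP on the full frequency band,
    0: no LTP. A band qualifies when its energy ratio is below thr4.
    """
    if ratio_lf >= thr4:      # Case 3: first condition not met
        return 0
    if ratio_hf < thr4:       # Case 2: both conditions met
        return 2
    return 1                  # Case 1: only the low band qualifies
```

Because one identifier carries both the on/off decision and the band selection, no second identifier needs to be determined in this manner.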
It should be noted that when the first identifier indicates not to perform LTP processing on the current frame, S740 may continue to be performed, and the target frequency-domain coefficient of the current frame is directly encoded after S740 is performed. Otherwise, S750 may be directly performed (that is, S740 is not performed).
S740: Perform stereo processing on the current frame.
In some embodiments, an intensity level difference (ILD) between the left channel of the current frame and the right channel of the current frame may be calculated.
For example, the ILD between the left channel of the current frame and the right channel of the current frame may be calculated based on the following formula:
XL[k] represents the target frequency-domain coefficient of the left channel signal, XR[k] represents the target frequency-domain coefficient of the right channel signal, M represents a quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
In some embodiments, energy of the left channel signal and energy of the right channel signal may be adjusted by using the ILD obtained through calculation based on the foregoing formula. A specific adjustment method is as follows:
A ratio of the energy of the left channel signal to the energy of the right channel signal is calculated based on the ILD.
For example, the ratio of the energy of the left channel signal to the energy of the right channel signal may be calculated based on the following formula, and the ratio may be denoted as nrgRatio:
If the ratio nrgRatio is greater than 1.0, an MDCT coefficient of the right channel is adjusted based on the following formula:
XrefR[k] on the left of the formula represents an adjusted MDCT coefficient of the right channel, and XR[k] on the right of the formula represents the unadjusted MDCT coefficient of the right channel.
If nrgRatio is less than 1.0, an MDCT coefficient of the left channel is adjusted based on the following formula:
XrefL[k] on the left of the formula represents an adjusted MDCT coefficient of the left channel, and XL[k] on the right of the formula represents the unadjusted MDCT coefficient of the left channel.
Mid/side (MS) stereo signals of the current frame are calculated based on the adjusted target frequency-domain coefficient XrefR[k] of the right channel signal and the adjusted target frequency-domain coefficient XrefL[k] of the left channel signal:
XM[k]=(XrefL[k]+XrefR[k])*√2/2
XS[k]=(XrefL[k]−XrefR[k])*√2/2
XM[k] represents an M channel of a mid/side stereo signal, XS[k] represents an S channel of a mid/side stereo signal, XrefL[k] represents the adjusted target frequency-domain coefficient of the left channel signal, XrefR[k] represents the adjusted target frequency-domain coefficient of the right channel signal, M represents the quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
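The mid/side calculation above can be sketched as follows. The function name ms_transform and the sample coefficients are hypothetical; the arithmetic follows the two formulas for XM[k] and XS[k].

```python
import math
from typing import List, Sequence, Tuple

SQRT2_OVER_2 = math.sqrt(2.0) / 2.0


def ms_transform(x_ref_l: Sequence[float],
                 x_ref_r: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (XM, XS) computed from the adjusted left and right coefficients."""
    x_m = [(l + r) * SQRT2_OVER_2 for l, r in zip(x_ref_l, x_ref_r)]
    x_s = [(l - r) * SQRT2_OVER_2 for l, r in zip(x_ref_l, x_ref_r)]
    return x_m, x_s


# Toy coefficients: in-phase at k=0, out-of-phase at k=1.
x_m, x_s = ms_transform([1.0, 0.5], [1.0, -0.5])
```

Because of the √2/2 scaling, the transform preserves the total energy of the two channels, and applying the same function to (XM, XS) returns the left and right coefficients.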
S750: Perform stereo determining on the current frame.
In some embodiments, scalar quantization and arithmetic coding may be performed on the target frequency-domain coefficient XL[k] of the left channel signal to obtain a quantity of bits required for quantizing the left channel signal. The quantity of bits required for quantizing the left channel signal may be denoted as bitL.
In some embodiments, scalar quantization and arithmetic coding may also be performed on the target frequency-domain coefficient XR[k] of the right channel signal to obtain a quantity of bits required for quantizing the right channel signal. The quantity of bits required for quantizing the right channel signal may be denoted as bitR.
In some embodiments, scalar quantization and arithmetic coding may also be performed on the mid/side stereo signal XM[k] to obtain a quantity of bits required for quantizing XM[k]. The quantity of bits required for quantizing XM[k] may be denoted as bitM.
In some embodiments, scalar quantization and arithmetic coding may also be performed on the mid/side stereo signal XS[k] to obtain a quantity of bits required for quantizing XS[k]. The quantity of bits required for quantizing XS[k] may be denoted as bitS.
For details about the foregoing quantization process and bit estimation process, refer to the conventional technology. Details are not described herein.
In this case, if bitL+bitR is greater than bitM+bitS, a stereo coding identifier stereoMode may be set to 1, to indicate that the stereo signals XM[k] and XS[k] need to be encoded during subsequent encoding.
Otherwise, the stereo coding identifier stereoMode may be set to 0, to indicate that XL[k] and XR[k] need to be encoded during subsequent encoding.
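The stereo determining rule can be written as a one-line comparison. The sketch below assumes that the four bit counts have already been estimated through quantization and arithmetic coding; the function name and argument names are hypothetical.

```python
def choose_stereo_mode(bit_l: int, bit_r: int, bit_m: int, bit_s: int) -> int:
    """Return stereoMode: 1 to encode XM[k]/XS[k], 0 to encode XL[k]/XR[k].

    Mid/side coding is chosen only when left/right coding needs strictly
    more bits, as stated in the text.
    """
    return 1 if bit_l + bit_r > bit_m + bit_s else 0
```

For example, choose_stereo_mode(500, 480, 600, 200) returns 1 because 980 bits for left/right coding exceed 800 bits for mid/side coding. A tie results in stereoMode being 0.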
It should be noted that, in this embodiment of this application, LTP processing may alternatively be performed on the target frequency-domain coefficient of the current frame first, and stereo determining may then be performed on an LTP-processed left channel signal and an LTP-processed right channel signal of the current frame, that is, S760 is performed before S750.
S760: Perform LTP processing on the target frequency-domain coefficient of the current frame.
In some embodiments, LTP processing may be performed on the target frequency-domain coefficient of the current frame in the following two cases:
Case 1:
If the LTP identifier enableRALTP of the current frame is 1 and the stereo coding identifier stereoMode is 0, LTP processing is separately performed on XL[k] and XR[k]:
XL[k]=XL[k]−gLi*XrefL[k]
XR[k]=XR[k]−gRi*XrefR[k]
XL[k] on the left of the formula represents an LTP-synthesized residual frequency-domain coefficient of the left channel, XL[k] on the right of the formula represents the target frequency-domain coefficient of the left channel signal, XR[k] on the left of the formula represents an LTP-synthesized residual frequency-domain coefficient of the right channel, XR[k] on the right of the formula represents the target frequency-domain coefficient of the right channel signal, XrefL represents a TNS- and FDNS-processed reference signal of the left channel, XrefR represents a TNS- and FDNS-processed reference signal of the right channel, gLi may represent an LTP-predicted gain of an ith subframe of the left channel, gRi may represent an LTP-predicted gain of an ith subframe of the right channel signal, M represents the quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
Further, in this embodiment of this application, LTP processing may alternatively be performed on at least one of the high frequency band, the low frequency band, or the full frequency band of the current frame based on the first identifier and/or the second identifier determined in the foregoing S730, to obtain the residual frequency-domain coefficient of the current frame.
For example, when LTP processing is performed on the high frequency band, a residual frequency-domain coefficient of the high frequency band may be obtained. When LTP processing is performed on the low frequency band, a residual frequency-domain coefficient of the low frequency band may be obtained. When LTP processing is performed on the full frequency band, a residual frequency-domain coefficient of the full frequency band may be obtained.
The following uses the left channel signal as an example. The description applies equally to the right channel signal, because in this embodiment of this application, a method for processing the left channel signal is the same as a method for processing the right channel signal.
For example, when the first identifier and/or the second identifier satisfy or satisfies Case 1 in Manner 1 of encoding the target frequency-domain coefficient of the current frame based on the determined identifier in S730, LTP processing may be performed on a low frequency band based on the following formula:
XrefL represents a reference target frequency-domain coefficient of the left channel, gLFi represents a predicted gain of a low frequency band of the ith subframe of the left channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents the quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
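Band-limited LTP processing of one channel can be sketched as follows. The sketch assumes that the low frequency band covers the indices k≤stopLine and that coefficients above stopLine are left unchanged when only the low frequency band is processed; both points are assumptions of this illustration, as are the function name and the sample values.

```python
from typing import List, Optional, Sequence


def ltp_residual(x: Sequence[float], x_ref: Sequence[float], gain: float,
                 stop_line: Optional[int] = None) -> List[float]:
    """Subtract gain * x_ref from x on the selected band.

    stop_line=None processes the full band. Otherwise only coefficients
    with k <= stop_line are predicted and the rest pass through
    unchanged (assumed behavior).
    """
    last = len(x) - 1 if stop_line is None else stop_line
    return [a - gain * r if k <= last else a
            for k, (a, r) in enumerate(zip(x, x_ref))]


# Toy left-channel coefficients and reference with a hypothetical gain.
x_l = [2.0, 4.0, 1.0, 3.0]
x_ref_l = [1.0, 2.0, 1.0, 1.0]
low_band = ltp_residual(x_l, x_ref_l, gain=2.0, stop_line=1)
full_band = ltp_residual(x_l, x_ref_l, gain=2.0)
```

In this toy case, low-band processing removes the well-predicted coefficients at k=0 and k=1 and keeps the remaining coefficients as they are, whereas full-band processing also modifies the poorly predicted coefficients.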
When the first identifier and/or the second identifier satisfy or satisfies Case 2 or Case 5 in Manner 1 of encoding the target frequency-domain coefficient of the current frame based on the determined identifier in S730, LTP processing may be performed on a full frequency band based on the following formula:
XL[k]=XL[k]−gFBi*XrefL[k]
XrefL represents a reference target frequency-domain coefficient of the left channel, gFBi represents a predicted gain of a full frequency band of the ith subframe of the left channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents the quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
For another example, when the first identifier satisfies Case 1 in Manner 2 of encoding the target frequency-domain coefficient of the current frame based on the determined identifier in S730, LTP processing may be performed on a low frequency band based on the following formula:
XrefL represents a reference target frequency-domain coefficient of the left channel, gLFi represents a predicted gain of a low frequency band of the ith subframe of the left channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents the quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
When the first identifier satisfies Case 2 or Case 5 in Manner 2 of encoding the target frequency-domain coefficient of the current frame based on the determined identifier in S730, LTP processing may be performed on a full frequency band based on the following formula:
XL[k]=XL[k]−gFBi*XrefL[k]
XrefL represents a reference target frequency-domain coefficient of the left channel, gFBi represents a predicted gain of a full frequency band of the ith subframe of the left channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents the quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
Then, arithmetic coding may be performed on LTP-processed XL[k] and XR[k] (that is, the residual frequency-domain coefficient XL[k] of the left channel signal and the residual frequency-domain coefficient XR[k] of the right channel signal).
Case 2:
If the LTP identifier enableRALTP of the current frame is 1 and the stereo coding identifier stereoMode is 1, LTP processing is separately performed on XM[k] and XS[k]:
XM[k]=XM[k]−gMi*XrefM[k]
XS[k]=XS[k]−gSi*XrefS[k]
XM[k] on the left of the formula represents an LTP-synthesized residual frequency-domain coefficient of the M channel, XM[k] on the right of the formula represents a residual frequency-domain coefficient of the M channel, XS[k] on the left of the formula represents an LTP-synthesized residual frequency-domain coefficient of the S channel, XS[k] on the right of the formula represents a residual frequency-domain coefficient of the S channel, gMi represents an LTP-predicted gain of an ith subframe of the M channel, gSi represents an LTP-predicted gain of an ith subframe of the S channel, M represents the quantity of MDCT coefficients participating in LTP processing, i and k are positive integers, and 0≤k≤M. XrefM and XrefS represent reference signals obtained through mid/side stereo processing. Details are as follows:
XrefM[k]=(XrefL[k]+XrefR[k])*√2/2
XrefS[k]=(XrefL[k]−XrefR[k])*√2/2
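LTP processing in the mid/side domain can be sketched by first deriving XrefM[k] and XrefS[k] from the left and right reference signals and then subtracting the gain-scaled references from XM[k] and XS[k]. The function names and sample values are hypothetical, and the gains are taken as given inputs.

```python
import math
from typing import List, Sequence, Tuple

SQRT2_OVER_2 = math.sqrt(2.0) / 2.0


def ms_reference(x_ref_l: Sequence[float],
                 x_ref_r: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (XrefM, XrefS) from the left and right reference signals."""
    x_ref_m = [(l + r) * SQRT2_OVER_2 for l, r in zip(x_ref_l, x_ref_r)]
    x_ref_s = [(l - r) * SQRT2_OVER_2 for l, r in zip(x_ref_l, x_ref_r)]
    return x_ref_m, x_ref_s


def ltp_ms(x_m: Sequence[float], x_s: Sequence[float],
           x_ref_l: Sequence[float], x_ref_r: Sequence[float],
           g_m: float, g_s: float) -> Tuple[List[float], List[float]]:
    """Full-band LTP on the M and S channels: X - g * Xref per channel."""
    x_ref_m, x_ref_s = ms_reference(x_ref_l, x_ref_r)
    res_m = [m - g_m * rm for m, rm in zip(x_m, x_ref_m)]
    res_s = [s - g_s * rs for s, rs in zip(x_s, x_ref_s)]
    return res_m, res_s


# Toy signals: the references map to XrefM = [√2, 0] and XrefS = [0, √2].
res_m, res_s = ltp_ms([math.sqrt(2.0), 1.0], [0.5, math.sqrt(2.0)],
                      [1.0, 1.0], [1.0, -1.0], g_m=1.0, g_s=1.0)
```

In this sketch the same gain is applied to all coefficients of a channel; restricting the subtraction to the low frequency band would follow the band selection indicated by the identifiers.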
Further, in this embodiment of this application, LTP processing may alternatively be performed on at least one of the high frequency band, the low frequency band, or the full frequency band of the current frame based on the first identifier and/or the second identifier determined in the foregoing S730, to obtain the residual frequency-domain coefficient of the current frame.
For example, when LTP processing is performed on the high frequency band, a residual frequency-domain coefficient of the high frequency band may be obtained. When LTP processing is performed on the low frequency band, a residual frequency-domain coefficient of the low frequency band may be obtained. When LTP processing is performed on the full frequency band, a residual frequency-domain coefficient of the full frequency band may be obtained.
The following uses an M-channel signal as an example. The description applies equally to the S-channel signal, because in this embodiment of this application, a method for processing the M-channel signal is the same as a method for processing the S-channel signal.
For example, when the first identifier and/or the second identifier satisfy or satisfies Case 1 in Manner 1 of encoding the target frequency-domain coefficient of the current frame based on the determined identifier in S730, LTP processing may be performed on a low frequency band based on the following formula:
XrefM represents a reference target frequency-domain coefficient of the M channel, gLFi represents a predicted gain of a low frequency band of the ith subframe of the M channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents the quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
When the first identifier and/or the second identifier satisfy or satisfies Case 2 or Case 5 in Manner 1 of encoding the target frequency-domain coefficient of the current frame based on the determined identifier in S730, LTP processing may be performed on a full frequency band based on the following formula:
XM[k]=XM[k]−gFBi*XrefM[k]
XrefM represents a reference target frequency-domain coefficient of the M channel, gFBi represents a predicted gain of a full frequency band of the ith subframe of the M channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents the quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
For another example, when the first identifier satisfies Case 1 in Manner 2 of encoding the target frequency-domain coefficient of the current frame based on the determined identifier in S730, LTP processing may be performed on a low frequency band based on the following formula:
XrefM represents a reference target frequency-domain coefficient of the M channel, gLFi represents a predicted gain of a low frequency band of the ith subframe of the M channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents the quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
When the first identifier satisfies Case 2 or Case 5 in Manner 2 of encoding the target frequency-domain coefficient of the current frame based on the determined identifier in S730, LTP processing may be performed on a full frequency band based on the following formula:
XM[k]=XM[k]−gFBi*XrefM[k]
XrefM represents a reference target frequency-domain coefficient of the M channel, gFBi represents a predicted gain of a full frequency band of the ith subframe of the M channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents the quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
Then, arithmetic coding may be performed on LTP-processed XM[k] and XS[k] (that is, the residual frequency-domain coefficient of the current frame).
S810: Parse a bitstream to obtain a decoded frequency-domain coefficient of a current frame.
In some embodiments, the bitstream may be further parsed to obtain a filtering parameter.
The filtering parameter may be used to perform filtering processing on a frequency-domain coefficient of the current frame. The filtering processing may include temporal noise shaping (TNS) processing and/or frequency-domain noise shaping (FDNS) processing, or the filtering processing may include other processing. This is not limited in this embodiment of this application.
In some embodiments, in S810, the bitstream may be parsed to obtain a residual frequency-domain coefficient of the current frame.
S820: Parse the bitstream to obtain a first identifier.
The first identifier may be used to indicate whether to perform LTP processing on the current frame, or the first identifier may be used to indicate whether to perform LTP processing on the current frame and/or indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
For example, when the first identifier is a first value, the decoded frequency-domain coefficient of the current frame is the residual frequency-domain coefficient of the current frame. The first value may be used to indicate to perform LTP processing on the current frame.
When the first identifier is a second value, the decoded frequency-domain coefficient of the current frame is a target frequency-domain coefficient of the current frame. The second value may be used to indicate not to perform LTP processing on the current frame.
In some embodiments, the frequency band on which LTP processing is performed and that is of the current frame may include a high frequency band, a low frequency band, or a full frequency band. The high frequency band may be a frequency band whose frequency is greater than that of a cutoff frequency bin and that is of the full frequency band of the current frame, the low frequency band may be a frequency band whose frequency is less than or equal to that of the cutoff frequency bin and that is of the full frequency band of the current frame, and the cutoff frequency bin may be for division into the low frequency band and the high frequency band.
In this embodiment of this application, the cutoff frequency bin may be determined in the following two manners:
Manner 1:
The cutoff frequency bin may be determined based on a spectral coefficient of the reference signal.
Further, a peak factor set corresponding to the reference signal may be determined based on the spectral coefficient of the reference signal; and the cutoff frequency bin may be determined based on a peak factor in the peak factor set, where the peak factor satisfies a preset condition.
The preset condition may be a greatest value of (one or more) peak factors in the peak factor set that are greater than a sixth threshold.
For example, the peak factor set corresponding to the reference signal may be determined based on the spectral coefficient of the reference signal; and the greatest value of the (one or more) peak factors in the peak factor set that are greater than the sixth threshold may be used as the cutoff frequency bin.
Manner 2:
The cutoff frequency bin may be a preset value. In some embodiments, the cutoff frequency bin may be preset to the preset value based on experience.
For example, it is assumed that a to-be-processed signal of the current frame is sampled at 48 kilohertz (kHz) and undergoes a 480-point MDCT transform to obtain 480 MDCT coefficients. In this case, an index of the cutoff frequency bin may be preset to 200, and a cutoff frequency corresponding to the cutoff frequency bin is 10 kHz.
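The arithmetic behind this example can be checked with a minimal sketch. It assumes that the MDCT coefficients evenly cover the range from 0 Hz to half the sampling rate, so that each bin is sample_rate / (2 × number of coefficients) wide. The function name mdct_bin_to_hz is hypothetical and is used only for illustration.

```python
def mdct_bin_to_hz(bin_index: int, sample_rate_hz: float, num_coeffs: int) -> float:
    """Map an MDCT bin index to an approximate frequency in Hz.

    Assumption (not stated in the text): the num_coeffs bins evenly cover
    0 Hz up to the Nyquist frequency, sample_rate_hz / 2.
    """
    if not 0 <= bin_index <= num_coeffs:
        raise ValueError("bin index out of range")
    bin_width_hz = sample_rate_hz / (2 * num_coeffs)  # 48000 / 960 = 50 Hz per bin
    return bin_index * bin_width_hz


# Values from the example: 48 kHz sampling, 480 coefficients, index 200.
cutoff_hz = mdct_bin_to_hz(200, 48_000, 480)
print(cutoff_hz)
```

With the values in the example, each bin is 50 Hz wide, so index 200 corresponds to 10 kHz.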
S830: Process the decoded frequency-domain coefficient of the current frame based on the first identifier to obtain a frequency-domain coefficient of the current frame.
In some embodiments, based on different first identifiers determined in S820, there may be the following two manners:
Manner 1:
In some embodiments, the bitstream may be parsed to obtain the first identifier. When the first identifier is the first value, the bitstream may be parsed to obtain a second identifier.
The second identifier may be used to indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
In some embodiments, in Manner 1, the first identifier and the second identifier may have different values, and these different values may represent different meanings.
For example, the first identifier may be the first value or the second value, and the second identifier may be a third value or a fourth value.
The first value may be 1, which indicates to perform LTP processing on the current frame. The second value may be 0, which indicates not to perform LTP processing on the current frame. The third value may be 2, which indicates to perform LTP processing on the full frequency band. The fourth value may be 3, which indicates to perform LTP processing on the low frequency band.
It should be noted that the foregoing values of the first identifier and the second identifier in the foregoing embodiment are merely examples rather than limitations.
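The two-identifier signalling of Manner 1 may be sketched as follows. The sketch uses the example values given above (1, 0, 2, and 3), which are examples rather than limitations. The callable read_identifier stands in for the bitstream parser and is hypothetical, as are the names LtpBand and parse_ltp_manner1.

```python
from enum import Enum
from typing import Callable


class LtpBand(Enum):
    NONE = "none"   # LTP processing is not performed on the current frame
    LOW = "low"     # LTP processing is performed on the low frequency band
    FULL = "full"   # LTP processing is performed on the full frequency band


# Example values from the text (illustrative, not normative).
FIRST_VALUE, SECOND_VALUE, THIRD_VALUE, FOURTH_VALUE = 1, 0, 2, 3


def parse_ltp_manner1(read_identifier: Callable[[], int]) -> LtpBand:
    """Read the first identifier and, only if LTP is on, the second identifier."""
    first = read_identifier()
    if first == SECOND_VALUE:
        return LtpBand.NONE          # the second identifier is not parsed
    if first != FIRST_VALUE:
        raise ValueError(f"unexpected first identifier: {first}")
    second = read_identifier()
    if second == THIRD_VALUE:
        return LtpBand.FULL
    if second == FOURTH_VALUE:
        return LtpBand.LOW
    raise ValueError(f"unexpected second identifier: {second}")


# Usage: identifiers 1 then 3 select low-band LTP.
band = parse_ltp_manner1(iter([1, 3]).__next__)
```

The second identifier is read only when the first identifier is the first value, which matches the conditional parsing described above.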
Further, based on different determined first identifiers and/or second identifiers, there may be the following several cases:
Case 1:
When the first identifier is the first value and the second identifier is the fourth value, a reference target frequency-domain coefficient of the current frame is obtained.
Then, LTP synthesis may be performed based on a predicted gain of the low frequency band, the reference target frequency-domain coefficient of the current frame, and the residual frequency-domain coefficient of the current frame to obtain the target frequency-domain coefficient of the current frame; and the target frequency-domain coefficient of the current frame is processed to obtain the frequency-domain coefficient of the current frame.
Case 2:
When the first identifier is the first value and the second identifier is the third value, the reference target frequency-domain coefficient of the current frame is obtained.
Then, LTP synthesis may be performed based on a predicted gain of the full frequency band, the reference target frequency-domain coefficient of the current frame, and the residual frequency-domain coefficient of the current frame to obtain the target frequency-domain coefficient of the current frame; and the target frequency-domain coefficient of the current frame is processed to obtain the frequency-domain coefficient of the current frame.
Case 3:
When the first identifier is the second value, the target frequency-domain coefficient of the current frame is processed to obtain the frequency-domain coefficient of the current frame.
The processing (performed on the target frequency-domain coefficient of the current frame) may be inverse filtering processing. The inverse filtering processing may include inverse temporal noise shaping (TNS) processing and/or inverse frequency-domain noise shaping (FDNS) processing, or the inverse filtering processing may include other processing. This is not limited in this embodiment of this application.
Manner 2:
In some embodiments, the bitstream may be parsed to obtain the first identifier.
The first identifier may be used to indicate whether to perform LTP processing on the current frame, or the first identifier may be used to indicate whether to perform LTP processing on the current frame and indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
In some embodiments, in Manner 2, the first identifier may alternatively have different values, and these different values may also represent different meanings.
For example, the first identifier may be the first value, the second value, or a third value.
The first value may be 1, which indicates (to perform LTP processing on the current frame and) to perform LTP processing on the low frequency band. The second value may be 0, which indicates not to perform LTP processing on the current frame. The third value may be 2, which indicates (to perform LTP processing on the current frame and) to perform LTP processing on the full frequency band.
It should be noted that the foregoing values of the first identifier in the foregoing embodiment are merely examples rather than limitations.
Further, based on different determined first identifiers, there may be the following several cases:
Case 1:
When the first identifier is the first value, a reference target frequency-domain coefficient of the current frame is obtained.
Then, LTP synthesis may be performed based on a predicted gain of the low frequency band, the reference target frequency-domain coefficient of the current frame, and the residual frequency-domain coefficient of the current frame to obtain the target frequency-domain coefficient of the current frame; and the target frequency-domain coefficient of the current frame is processed to obtain the frequency-domain coefficient of the current frame.
Case 2:
When the first identifier is the third value, the reference target frequency-domain coefficient of the current frame is obtained.
Then, LTP synthesis may be performed based on a predicted gain of the full frequency band, the reference target frequency-domain coefficient of the current frame, and the residual frequency-domain coefficient of the current frame to obtain the target frequency-domain coefficient of the current frame; and the target frequency-domain coefficient of the current frame is processed to obtain the frequency-domain coefficient of the current frame.
Case 3:
When the first identifier is the second value, the target frequency-domain coefficient of the current frame is processed to obtain the frequency-domain coefficient of the current frame.
The processing (performed on the target frequency-domain coefficient of the current frame) may be inverse filtering processing. The inverse filtering processing may include inverse temporal noise shaping (TNS) processing and/or inverse frequency-domain noise shaping (FDNS) processing, or the inverse filtering processing may include other processing. This is not limited in this embodiment of this application.
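The three cases of Manner 2, in which a single identifier carries both the on/off decision and the frequency band, may be summarized as a lookup. The sketch below assumes the example values 1, 0, and 2. The names LtpDecision and interpret_manner2 are hypothetical and are used only for illustration.

```python
from typing import NamedTuple, Optional


class LtpDecision(NamedTuple):
    ltp_enabled: bool            # whether LTP synthesis is performed
    band: Optional[str]          # "low", "full", or None when LTP is off
    decoded_is_residual: bool    # True: decoded coefficients are residual coefficients


# Example values from the text (illustrative, not normative).
_MANNER2_TABLE = {
    1: LtpDecision(True, "low", True),     # Case 1: first value
    2: LtpDecision(True, "full", True),    # Case 2: third value
    0: LtpDecision(False, None, False),    # Case 3: second value
}


def interpret_manner2(first_identifier: int) -> LtpDecision:
    """Map the single identifier of Manner 2 to the decoder action."""
    try:
        return _MANNER2_TABLE[first_identifier]
    except KeyError:
        raise ValueError(f"unexpected first identifier: {first_identifier}") from None


decision = interpret_manner2(2)
```

In Case 3 the decoded coefficients are already the target frequency-domain coefficients, so only the inverse filtering processing remains.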
In some embodiments, in the foregoing Manner 1 or Manner 2, the reference target frequency-domain coefficient of the current frame may be obtained by using the following method:
With reference to
It should be understood that the embodiment shown in
S910: Parse a bitstream to obtain a target frequency-domain coefficient of a current frame.
In some embodiments, a filtering parameter may be further obtained by parsing the bitstream.
The filtering parameter may be used to perform filtering processing on a frequency-domain coefficient of the current frame. The filtering processing may include temporal noise shaping (TNS) processing and/or frequency-domain noise shaping (FDNS) processing, or the filtering processing may include other processing. This is not limited in this embodiment of this application.
In some embodiments, in S910, the bitstream may be parsed to obtain a residual frequency-domain coefficient of the current frame.
For a specific bitstream parsing method, refer to a conventional technology. Details are not described herein.
S920: Parse the bitstream to obtain an LTP identifier of the current frame.
The LTP identifier may be used to indicate whether to perform LTP processing on the current frame.
For example, when the LTP identifier is a first value, the bitstream is parsed to obtain the residual frequency-domain coefficient of the current frame. The first value may be used to indicate that LTP processing is to be performed on the current frame.
When the LTP identifier is a second value, the bitstream is parsed to obtain the target frequency-domain coefficient of the current frame. The second value may be used to indicate that LTP processing is not to be performed on the current frame.
It should be noted that when the current frame includes a left channel signal and a right channel signal, the LTP identifier of the current frame may be used for indication in the following two manners.
Manner 1:
The LTP identifier of the current frame may be used to indicate whether to perform LTP processing on the current frame.
The LTP identifier may further include the first identifier and/or the second identifier described in the embodiment of the method 600 in
For example, the LTP identifier may include the first identifier and the second identifier. The first identifier may be used to indicate whether to perform LTP processing on the current frame, and the second identifier may be used to indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
For another example, the LTP identifier may be the first identifier. The first identifier may be used to indicate whether to perform LTP processing on the current frame. In addition, when LTP processing is performed on the current frame, the first identifier may further indicate a frequency band (for example, a high frequency band, a low frequency band, or a full frequency band of the current frame) on which LTP processing is performed and that is of the current frame.
Manner 2:
The LTP identifier of the current frame may include an LTP identifier of a left channel and an LTP identifier of a right channel. The LTP identifier of the left channel may be used to indicate whether to perform LTP processing on the left channel signal, and the LTP identifier of the right channel may be used to indicate whether to perform LTP processing on the right channel signal.
Further, as described in the embodiment of the method 600 in
The following provides description by using the LTP identifier of the left channel as an example. The LTP identifier of the right channel is similar to the LTP identifier of the left channel. Details are not described herein.
For example, the LTP identifier of the left channel may include the first identifier of the left channel and the second identifier of the left channel. The first identifier of the left channel may be used to indicate whether to perform LTP processing on the left channel, and the second identifier may be used to indicate a frequency band on which LTP processing is performed and that is of the left channel.
For another example, the LTP identifier of the left channel may be the first identifier of the left channel. The first identifier of the left channel may be used to indicate whether to perform LTP processing on the left channel. In addition, when LTP processing is performed on the left channel, the first identifier of the left channel may further indicate a frequency band (for example, a high frequency band, a low frequency band, or a full frequency band of the left channel) on which LTP processing is performed and that is of the left channel.
For specific description of the first identifier and the second identifier in the foregoing two manners, refer to the embodiment in
In the embodiment of the method 900, the LTP identifier of the current frame may be used for indication in Manner 1. It should be understood that the embodiment of the method 900 is merely an example rather than a limitation. The LTP identifier of the current frame in the method 900 may alternatively be used for indication in Manner 2. This is not limited in this embodiment of this application.
In this embodiment of this application, bandwidth of the current frame may be categorized into a high frequency band, a low frequency band, and a full frequency band.
In this case, the bitstream may be parsed to obtain the first identifier.
The first identifier may be used to indicate whether to perform LTP processing on the current frame, or the first identifier may be used to indicate whether to perform LTP processing on the current frame and/or indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
In some embodiments, the frequency band on which LTP processing is performed and that is of the current frame may include a high frequency band, a low frequency band, or a full frequency band. The high frequency band may be a frequency band whose frequency is greater than that of a cutoff frequency bin and that is of the full frequency band of the current frame, the low frequency band may be a frequency band whose frequency is less than or equal to that of the cutoff frequency bin and that is of the full frequency band of the current frame, and the cutoff frequency bin may be for division into the low frequency band and the high frequency band.
In this embodiment of this application, the cutoff frequency bin may be determined in the following two manners:
Manner 1:
The cutoff frequency bin may be determined based on a spectral coefficient of the reference signal.
In some embodiments, a peak factor set corresponding to the reference signal may be determined based on the spectral coefficient of the reference signal; and the cutoff frequency bin may be determined based on a peak factor in the peak factor set, where the peak factor satisfies a preset condition.
Further, the peak factor set corresponding to the reference signal may be determined based on the spectral coefficient of the reference signal; and a greatest value of peak factors in the peak factor set that satisfy a preset condition may be used as the cutoff frequency bin.
The preset condition may be a greatest value of (one or more) peak factors in the peak factor set that are greater than a sixth threshold.
For example, the peak factor set may be calculated based on the following formula:
CFp represents the peak factor set, P represents a set of values k that satisfy a condition, w represents a size of a sliding window, and p represents an element in the set P.
In this case, an index value stopLine of a cutoff frequency bin coefficient of a low-frequency MDCT coefficient may be determined based on the following formula:
$$\mathrm{stopLine} = \max\{\, p \mid CF_p > \mathrm{thr6},\ p \in P \,\}$$
thr6 represents the sixth threshold.
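The selection of stopLine may be sketched as follows, taking the peak factors as already computed. The sketch assumes that the peak factor set is available as a mapping from each index p in P to its peak factor CFp. Returning None when no peak factor exceeds thr6 is a choice made for this sketch, because that case is not described above. The function name select_stop_line is hypothetical.

```python
from typing import Dict, Optional


def select_stop_line(peak_factors: Dict[int, float], thr6: float) -> Optional[int]:
    """Return the largest index p in P whose peak factor CFp exceeds thr6.

    peak_factors maps each index p in P to CFp. None is returned when no
    peak factor exceeds the threshold (behaviour assumed for this sketch).
    """
    candidates = [p for p, cf in peak_factors.items() if cf > thr6]
    return max(candidates) if candidates else None


# Illustrative values only: indices 40 and 90 exceed the threshold 2.0.
stop_line = select_stop_line({10: 1.2, 40: 3.5, 90: 2.8, 150: 0.9}, thr6=2.0)
```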
Manner 2:
The cutoff frequency bin may be a preset value. In some embodiments, the cutoff frequency bin may be preset to the preset value based on experience.
For example, it is assumed that a to-be-processed signal of the current frame is sampled at 48 kilohertz (kHz) and undergoes a 480-point MDCT transform to obtain 480 MDCT coefficients. In this case, an index of the cutoff frequency bin may be preset to 200, and a cutoff frequency corresponding to the cutoff frequency bin is 10 kHz.
Further, whether to perform LTP processing on the current frame and/or the frequency band on which LTP processing is performed and that is of the current frame may be determined based on the first identifier.
In some embodiments, based on different first identifiers obtained through decoding, there may be the following two manners:
Manner 1:
In some embodiments, the bitstream may be parsed to obtain the first identifier. When the first identifier is the first value, the bitstream may be parsed to obtain a second identifier.
The second identifier may be used to indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
In some embodiments, in Manner 1, the first identifier and the second identifier may have different values, and these different values may represent different meanings.
For example, the first identifier may be the first value or the second value, and the second identifier may be a third value or a fourth value.
The first value may be used to indicate to perform LTP processing on the current frame, the second value may be used to indicate not to perform LTP processing on the current frame, the third value may be used to indicate to perform LTP processing on the full frequency band, and the fourth value may be used to indicate to perform LTP processing on the low frequency band.
For example, the first value may be 1, the second value may be 0, the third value may be 2, and the fourth value may be 3.
It should be noted that the foregoing values of the first identifier and the second identifier in the foregoing embodiment are merely examples rather than limitations.
Further, based on different first identifiers and/or second identifiers obtained by parsing the bitstream, there may be the following several cases:
Case 1:
When the first identifier is the first value and the second identifier is the fourth value, a reference target frequency-domain coefficient of the current frame is obtained.
Case 2:
When the first identifier is the first value and the second identifier is the third value, the reference target frequency-domain coefficient of the current frame is obtained.
Case 3:
When the first identifier is the second value, the target frequency-domain coefficient of the current frame is processed to obtain the frequency-domain coefficient of the current frame.
Manner 2:
In some embodiments, the bitstream may be parsed to obtain the first identifier.
The first identifier may be used to indicate whether to perform LTP processing on the current frame, or the first identifier may be used to indicate whether to perform LTP processing on the current frame and indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
In some embodiments, in Manner 2, the first identifier may alternatively have different values, and these different values may also represent different meanings.
For example, the first identifier may be the first value, the second value, or a third value.
The first value may be used to indicate (to perform LTP processing on the current frame and) to perform LTP processing on the low frequency band, the second value may be used to indicate not to perform LTP processing on the current frame, and the third value may be used to indicate (to perform LTP processing on the current frame and) to perform LTP processing on the full frequency band.
For example, the first value may be 1, the second value may be 0, and the third value may be 2.
It should be noted that the foregoing values of the first identifier in the foregoing embodiment are merely examples rather than limitations.
Further, based on different determined first identifiers, there may be the following several cases:
Case 1:
When the first identifier is the first value, a reference target frequency-domain coefficient of the current frame is obtained.
Case 2:
When the first identifier is the third value, the reference target frequency-domain coefficient of the current frame is obtained.
Case 3:
When the first identifier is the second value, the target frequency-domain coefficient of the current frame is processed to obtain the frequency-domain coefficient of the current frame.
S930: Obtain the reference target frequency-domain coefficient of the current frame.
In some embodiments, the reference target frequency-domain coefficient of the current frame may be obtained by using the following method:
For example, the bitstream may be parsed to obtain the pitch period of the current frame, and a reference signal ref[j] of the current frame may be obtained from a history buffer based on the pitch period. Any pitch period searching method may be used to search the pitch periods. This is not limited in this embodiment of this application.
$$ref[j] = syn[L - N - K + j], \quad j = 0, 1, \ldots, N-1$$
The history buffer signal syn stores a decoded time-domain signal obtained through inverse MDCT transform, where the buffer length satisfies L=2N, N represents a frame length, and K represents a pitch period.
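The indexing into the history buffer may be sketched as follows. The sketch assumes a pitch period in the range 0 < K ≤ N so that every index L−N−K+j stays inside the buffer. That range is an assumption of this sketch, not a constraint stated above. The function name extract_reference is hypothetical.

```python
from typing import List, Sequence


def extract_reference(syn: Sequence[float], frame_len: int, pitch: int) -> List[float]:
    """Return ref[j] = syn[L - N - K + j] for j = 0 .. N-1, with L = 2N."""
    buf_len = len(syn)
    if buf_len != 2 * frame_len:
        raise ValueError("history buffer length must be 2 * frame_len")
    if not 0 < pitch <= frame_len:
        # Assumed range: keeps every index within the history buffer.
        raise ValueError("pitch period must satisfy 0 < K <= N")
    start = buf_len - frame_len - pitch
    return list(syn[start:start + frame_len])


# Toy buffer with N = 4, L = 8, and pitch period K = 2.
ref = extract_reference([0, 1, 2, 3, 4, 5, 6, 7], frame_len=4, pitch=2)
```

With N = 4 and K = 2, the segment starts at index 2, so the reference signal is taken one pitch period earlier than the most recent N samples of the buffer.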
The history buffer signal syn is obtained as follows: an arithmetic-coded residual signal is decoded, LTP synthesis is performed, inverse TNS processing and inverse FDNS processing are performed based on the TNS parameter and the FDNS parameter that are obtained in S710, and inverse MDCT transform is then performed to obtain a synthesized time-domain signal. The synthesized time-domain signal is stored in the history buffer syn. Inverse TNS processing is an inverse operation of TNS processing (e.g., filtering), to obtain a signal that has not undergone TNS processing. Inverse FDNS processing is an inverse operation of FDNS processing (e.g., filtering), to obtain a signal that has not undergone FDNS processing. For specific methods for performing inverse TNS processing and inverse FDNS processing, refer to the conventional technology. Details are not described herein.
In some embodiments, MDCT transform is performed on the reference signal ref[j], and filtering processing is performed on a frequency-domain coefficient of the reference signal ref[j] based on the filtering parameter obtained in S910, to obtain a target frequency-domain coefficient of the reference signal ref[j].
First, TNS processing may be performed on an MDCT coefficient (that is, the reference frequency-domain coefficient) of a reference signal ref[j] by using a TNS identifier and the TNS parameter, to obtain a TNS-processed reference frequency-domain coefficient.
For example, when the TNS identifier is 1, TNS processing is performed on the MDCT coefficient of the reference signal based on the TNS parameter.
Then, FDNS processing may be performed on the TNS-processed reference frequency-domain coefficient by using the FDNS parameter, to obtain an FDNS-processed reference frequency-domain coefficient, that is, the reference target frequency-domain coefficient Xref[k].
It should be noted that an order of performing TNS processing and FDNS processing is not limited in this embodiment of this application. For example, alternatively, FDNS processing may be performed on the reference frequency-domain coefficient (that is, the MDCT coefficient of the reference signal) before TNS processing. This is not limited in this embodiment of this application.
Particularly, when the current frame includes the left channel signal and the right channel signal, the reference target frequency-domain coefficient Xref[k] includes a reference target frequency-domain coefficient XrefL[k] of the left channel and a reference target frequency-domain coefficient XrefR[k] of the right channel.
In
S940: Perform LTP synthesis on the residual frequency-domain coefficient of the current frame.
In some embodiments, the bitstream may be parsed to obtain a stereo coding identifier stereoMode.
Based on different stereo coding identifiers stereoMode, there may be the following two cases:
Case 1:
If the stereo coding identifier stereoMode is 0, the target frequency-domain coefficient of the current frame obtained by parsing the bitstream in S910 is the residual frequency-domain coefficient of the current frame. For example, a residual frequency-domain coefficient of the left channel signal may be expressed as XL[k], and a residual frequency-domain coefficient of the right channel signal may be expressed as XR[k].
In this case, LTP synthesis may be performed on the residual frequency-domain coefficient XL[k] of the left channel signal and the residual frequency-domain coefficient XR[k] of the right channel signal.
For example, LTP synthesis may be performed based on the following formula:
$$X_L[k] = X_L[k] + g_{Li} \cdot X_{refL}[k]$$
$$X_R[k] = X_R[k] + g_{Ri} \cdot X_{refR}[k]$$
XL[k] on the left of the formula represents an LTP-synthesized target frequency-domain coefficient of the left channel, XL[k] on the right of the formula represents a target frequency-domain coefficient of the left channel signal, XR[k] on the left of the formula represents an LTP-synthesized target frequency-domain coefficient of the right channel, XR[k] on the right of the formula represents a target frequency-domain coefficient of the right channel signal, XrefL represents the reference target frequency-domain coefficient of the left channel, XrefR represents the reference target frequency-domain coefficient of the right channel, gLi represents an LTP-predicted gain of an ith subframe of the left channel, gRi represents an LTP-predicted gain of an ith subframe of the right channel, M represents a quantity of MDCT coefficients participating in LTP processing, i and k are positive integers, and 0≤k≤M.
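The per-channel LTP synthesis may be sketched as follows. The sketch processes the coefficients of a single subframe with one gain, because the mapping of coefficients to subframes is not described above. The function name ltp_synthesize and the numeric values are hypothetical.

```python
from typing import List, Sequence


def ltp_synthesize(residual: Sequence[float],
                   reference: Sequence[float],
                   gain: float) -> List[float]:
    """Compute X[k] = X[k] + g * Xref[k] for the coefficients of one subframe."""
    if len(residual) != len(reference):
        raise ValueError("residual and reference must have the same length")
    return [x + gain * r for x, r in zip(residual, reference)]


# Left and right channels are synthesized independently with their own gains.
x_left = ltp_synthesize([1.0, -2.0, 0.5], [2.0, 4.0, -1.0], gain=0.5)
x_right = ltp_synthesize([0.0, 1.0], [4.0, 4.0], gain=0.25)
```

The same function serves both channels, since the two formulas differ only in which residual, reference, and gain are supplied.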
Further, in this embodiment of this application, LTP synthesis may be further performed on at least one of the high frequency band, the low frequency band, or the full frequency band of the current frame based on the first identifier and/or the second identifier obtained by parsing the bitstream in the foregoing S920, to obtain the residual frequency-domain coefficient of the current frame.
The following provides description by using the left channel signal as an example. The description applies equally to the right channel signal, because in this embodiment of this application, a method for processing the left channel signal is the same as a method for processing the right channel signal.
For example, when the first identifier and/or the second identifier obtained by parsing the bitstream satisfy or satisfies Case 1 in Manner 1 in S920, LTP synthesis may be performed on a low frequency band based on the following formula:
XL[k] on the left of the formula represents an LTP-synthesized residual frequency-domain coefficient of the left channel, XL[k] on the right of the formula represents the target frequency-domain coefficient of the left channel signal, XrefL represents a reference target frequency-domain coefficient of the left channel, gLFi represents a predicted gain of a low frequency band of the ith subframe of the left channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents a quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
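A band-limited variant of the synthesis may be sketched as follows. The sketch assumes that the scaled reference is added only for bins with k ≤ stopLine and that the remaining bins pass through unchanged. The inclusive boundary follows the earlier definition of the low frequency band as the band at or below the cutoff frequency bin, but the exact boundary handling is an assumption of this sketch. The function name ltp_synthesize_low_band is hypothetical.

```python
from typing import List, Sequence


def ltp_synthesize_low_band(residual: Sequence[float],
                            reference: Sequence[float],
                            gain_lf: float,
                            stop_line: int) -> List[float]:
    """Add gain_lf * Xref[k] for k <= stop_line; leave higher bins unchanged.

    Assumption: the bin at stop_line itself belongs to the low frequency band.
    """
    if len(residual) != len(reference):
        raise ValueError("residual and reference must have the same length")
    return [
        x + gain_lf * r if k <= stop_line else x
        for k, (x, r) in enumerate(zip(residual, reference))
    ]


# Bins 0 and 1 receive the prediction contribution; bins 2 and 3 do not.
x_low = ltp_synthesize_low_band([1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0],
                                gain_lf=0.5, stop_line=1)
```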
When the first identifier and/or the second identifier obtained by parsing the bitstream satisfy or satisfies Case 2 or Case 5 in Manner 1 in S920, LTP synthesis may be performed on a full frequency band based on the following formula:
$$X_L[k] = X_L[k] + g_{FBi} \cdot X_{refL}[k]$$
XL[k] on the left of the formula represents an LTP-synthesized residual frequency-domain coefficient of the left channel, XL[k] on the right of the formula represents the target frequency-domain coefficient of the left channel signal, XrefL represents a reference target frequency-domain coefficient of the left channel, gFBi represents a predicted gain of a full frequency band of the ith subframe of the left channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents a quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
For another example, when the first identifier and/or the second identifier obtained by parsing the bitstream satisfy or satisfies Case 1 in Manner 2 in S920, LTP processing may be performed on a low frequency band based on the following formula:
XrefL represents a reference target frequency-domain coefficient of the left channel, gLFi represents a predicted gain of a low frequency band of the ith subframe of the left channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents a quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
When the first identifier and/or the second identifier obtained by parsing the bitstream satisfy or satisfies Case 2 or Case 5 in Manner 2 in S920, LTP processing may be performed on a full frequency band based on the following formula:
$$X_L[k] = X_L[k] + g_{FBi} \cdot X_{refL}[k]$$
XrefL represents a reference target frequency-domain coefficient of the left channel, gFBi represents a predicted gain of a full frequency band of the ith subframe of the left channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents a quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
Case 2:
If the stereo coding identifier stereoMode is 1, the target frequency-domain coefficient of the current frame obtained by parsing the bitstream in S910 is residual frequency-domain coefficients of mid/side stereo signals of the current frame. For example, the residual frequency-domain coefficients of the mid/side stereo signals of the current frame may be expressed as XM[k] and XS[k].
In this case, LTP synthesis may be performed on the residual frequency-domain coefficients XM[k] and XS[k] of the mid/side stereo signals of the current frame.
For example, LTP synthesis may be performed based on the following formula:
$$X_M[k] = X_M[k] + g_{Mi} \cdot X_{refM}[k]$$
$$X_S[k] = X_S[k] + g_{Si} \cdot X_{refS}[k]$$
XM[k] on the left of the formula represents an M channel of an LTP-synthesized mid/side stereo signal of the current frame, XM[k] on the right of the formula represents a residual frequency-domain coefficient of the M channel of the current frame, XS[k] on the left of the formula represents an S channel of an LTP-synthesized mid/side stereo signal of the current frame, XS[k] on the right of the formula represents a residual frequency-domain coefficient of the S channel of the current frame, gMi represents an LTP-predicted gain of an ith subframe of the M channel, gSi represents an LTP-predicted gain of an ith subframe of the S channel, M represents a quantity of MDCT coefficients participating in LTP processing, i and k are positive integers, 0≤k≤M, and XrefM and XrefS represent reference signals obtained through mid/side stereo processing. Details are as follows:
$$X_{refM}[k] = \left(X_{refL}[k] + X_{refR}[k]\right) \cdot \frac{\sqrt{2}}{2}$$
$$X_{refS}[k] = \left(X_{refL}[k] - X_{refR}[k]\right) \cdot \frac{\sqrt{2}}{2}$$
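The mid/side conversion of the reference coefficients may be sketched as follows. The function name to_mid_side and the numeric values are hypothetical and are used only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def to_mid_side(ref_left: Sequence[float],
                ref_right: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Compute XrefM = (XrefL + XrefR) * sqrt(2)/2 and XrefS = (XrefL - XrefR) * sqrt(2)/2."""
    if len(ref_left) != len(ref_right):
        raise ValueError("left and right references must have the same length")
    scale = math.sqrt(2) / 2
    mid = [(l + r) * scale for l, r in zip(ref_left, ref_right)]
    side = [(l - r) * scale for l, r in zip(ref_left, ref_right)]
    return mid, side


# Identical left and right inputs give a zero side channel.
ref_mid, ref_side = to_mid_side([1.0, 3.0], [1.0, -1.0])
```

Because of the √2/2 scaling, the sum of the squared mid and side coefficients equals the sum of the squared left and right coefficients at each bin.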
Further, in this embodiment of this application, LTP synthesis may be further performed on at least one of the high frequency band, the low frequency band, or the full frequency band of the current frame based on the first identifier and/or the second identifier obtained by parsing the bitstream in the foregoing S920, to obtain the residual frequency-domain coefficient of the current frame.
The following provides description by using an M-channel signal as an example. The description applies equally to the S-channel signal, because in this embodiment of this application, a method for processing the M-channel signal is the same as a method for processing the S-channel signal.
For example, when the first identifier and/or the second identifier obtained by parsing the bitstream satisfy or satisfies Case 1 in Manner 1 in S920, LTP processing may be performed on a low frequency band based on the following formula:
XrefM represents a reference target frequency-domain coefficient of the M channel, gLFi represents a predicted gain of a low frequency band of the ith subframe of the M channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents a quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
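As a rough illustration of low-band LTP synthesis, the sketch below applies the low-band gain only to coefficients below the cutoff index and passes the remaining coefficients through unchanged. The boundary handling (gain applied for k < stopLine), the function name, and the default stopLine = M/2 taken from the symbol description are assumptions for illustration only.

```python
def ltp_synthesize_low_band(residual, reference, gain_lf, stop_line=None):
    """Low-band LTP synthesis sketch for one subframe of the M channel.

    Assumption: X[k] = X[k] + gLF * Xref[k] for k < stop_line, and
    X[k] is left unchanged for k >= stop_line.
    """
    m = len(residual)
    if stop_line is None:
        stop_line = m // 2  # stopLine = M/2, as in the symbol description
    out = list(residual)
    for k in range(min(stop_line, m)):
        out[k] = residual[k] + gain_lf * reference[k]
    return out
```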
When the first identifier and/or the second identifier obtained by parsing the bitstream satisfy or satisfies Case 2 or Case 5 in Manner 1 in S920, LTP processing may be performed on a full frequency band based on the following formula:
$$X_M[k] = X_M[k] + g_{FBi} \cdot X_{refM}[k]$$
XrefM represents a reference target frequency-domain coefficient of the M channel, gFBi represents a predicted gain of a full frequency band of the ith subframe of the M channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents a quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
For another example, when the first identifier and/or the second identifier obtained by parsing the bitstream satisfy or satisfies Case 1 in Manner 2 in S920, LTP processing may be performed on a low frequency band based on the following formula:
XrefM represents a reference target frequency-domain coefficient of the M channel, gLFi represents a predicted gain of a low frequency band of the ith subframe of the M channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents a quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
When the first identifier and/or the second identifier obtained by parsing the bitstream satisfy or satisfies Case 2 or Case 5 in Manner 2 in S920, LTP processing may be performed on a full frequency band based on the following formula:
$$X_M[k] = X_M[k] + g_{FBi} \cdot X_{refM}[k]$$
XrefM represents a reference target frequency-domain coefficient of the M channel, gFBi represents a predicted gain of a full frequency band of the ith subframe of the M channel, stopLine represents the index value of the cutoff frequency bin coefficient of the low-frequency MDCT coefficient, stopLine=M/2, M represents a quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
It should be noted that, in this embodiment of this application, stereo decoding may be further performed on the residual frequency-domain coefficient of the current frame, and then LTP synthesis may be performed on the residual frequency-domain coefficient of the current frame. That is, S950 is performed before S940.
S950: Perform stereo decoding on the residual frequency-domain coefficient of the current frame.
In some embodiments, if the stereo coding identifier stereoMode is 1, stereo-encoded target frequency-domain coefficients XL[k] and XR[k] of the current frame may be determined based on the following formulas:
$$X_L[k] = (X_M[k] + X_S[k]) \cdot \frac{\sqrt{2}}{2}$$
$$X_R[k] = (X_M[k] - X_S[k]) \cdot \frac{\sqrt{2}}{2}$$
XM[k] represents the M channel of the LTP-synthesized mid/side stereo signal of the current frame, XS[k] represents the S channel of the LTP-synthesized mid/side stereo signal of the current frame, M represents the quantity of MDCT coefficients participating in LTP processing, k is a positive integer, and 0≤k≤M.
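The stereo decoding formulas above can be sketched as follows. The function names are hypothetical; `lr_to_ms` is included only to show that the same √2/2 scaling makes the mid/side transform its own inverse.

```python
import math

SQRT2_OVER_2 = math.sqrt(2.0) / 2.0


def ms_to_lr(xm, xs):
    """Stereo decoding: recover left/right coefficients from mid/side."""
    xl = [(m + s) * SQRT2_OVER_2 for m, s in zip(xm, xs)]
    xr = [(m - s) * SQRT2_OVER_2 for m, s in zip(xm, xs)]
    return xl, xr


def lr_to_ms(xl, xr):
    """Mid/side transform with the same scaling, used for a round-trip check."""
    xm = [(l + r) * SQRT2_OVER_2 for l, r in zip(xl, xr)]
    xs = [(l - r) * SQRT2_OVER_2 for l, r in zip(xl, xr)]
    return xm, xs
```

Because (√2/2)·(√2/2)·2 = 1, applying `lr_to_ms` and then `ms_to_lr` returns the original left and right coefficients up to floating-point rounding.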
Further, if an LTP identifier enableRALTP of the current frame is 0, the bitstream may be parsed to obtain an intensity level difference ILD between the left channel of the current frame and the right channel of the current frame, a ratio nrgRatio of energy of the left channel signal to energy of the right channel signal may be obtained, and an MDCT parameter of the left channel and an MDCT parameter of the right channel (that is, a target frequency-domain coefficient of the left channel and a target frequency-domain coefficient of the right channel) may be updated.
For example, if nrgRatio is less than 1.0, the MDCT coefficient of the left channel is adjusted based on the following formula:
XrefL[k] on the left of the formula represents an adjusted MDCT coefficient of the left channel, and XL[k] on the right of the formula represents the unadjusted MDCT coefficient of the left channel.
If the ratio nrgRatio is greater than 1.0, the MDCT coefficient of the right channel is adjusted based on the following formula:
XrefR[k] on the left of the formula represents an adjusted MDCT coefficient of the right channel, and XR[k] on the right of the formula represents the unadjusted MDCT coefficient of the right channel.
If the LTP identifier enableRALTP of the current frame is 1, the MDCT parameter XL[k] of the left channel and the MDCT parameter XR[k] of the right channel are not adjusted.
S960: Perform inverse filtering processing on the target frequency-domain coefficient of the current frame.
Inverse filtering processing is performed on the foregoing stereo-encoded target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame.
For example, inverse FDNS processing and inverse TNS processing may be performed on the MDCT parameter XL[k] of the left channel and the MDCT parameter XR[k] of the right channel to obtain the frequency-domain coefficient of the current frame.
Then, an inverse MDCT operation is performed on the frequency-domain coefficient of the current frame to obtain a synthesized time-domain signal of the current frame.
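For orientation, a direct-form (O(N²)) inverse MDCT with windowed overlap-add might look like the sketch below. The sine window, the 2/N scaling that goes with it, and all function names are assumptions chosen for illustration; this application does not specify them here, and a practical decoder would use a fast algorithm.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 for length 2N."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def _basis(n, k, half):
    return math.cos(math.pi / half * (n + 0.5 + half / 2.0) * (k + 0.5))


def mdct(block):
    """Forward MDCT: 2N time samples -> N coefficients."""
    half = len(block) // 2
    return [sum(block[n] * _basis(n, k, half) for n in range(2 * half))
            for k in range(half)]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    half = len(coeffs)
    return [(2.0 / half) * sum(coeffs[k] * _basis(n, k, half) for k in range(half))
            for n in range(2 * half)]


def analyze(signal, half):
    """Windowed MDCT of 50%-overlapping frames (hop size N)."""
    window = sine_window(2 * half)
    return [mdct([window[n] * signal[start + n] for n in range(2 * half)])
            for start in range(0, len(signal) - 2 * half + 1, half)]


def overlap_add(frames, half):
    """Windowed IMDCT of each frame followed by overlap-add."""
    window = sine_window(2 * half)
    out = [0.0] * (half * (len(frames) + 1))
    for j, coeffs in enumerate(frames):
        block = imdct(coeffs)
        for n in range(2 * half):
            out[j * half + n] += window[n] * block[n]
    return out
```

Samples covered by two overlapping frames are reconstructed exactly, because the time-domain aliasing of adjacent frames cancels; the first and last half-frames remain aliased unless the signal is padded.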
The foregoing describes in detail the audio signal encoding method and the audio signal decoding method in embodiments of this application with reference to
In some embodiments, the cost function includes at least one of a cost function of a high frequency band of the current frame, a cost function of a low frequency band of the current frame, or a cost function of a full frequency band of the current frame. The high frequency band is a frequency band whose frequency is greater than that of a cutoff frequency bin and that is of the full frequency band of the current frame, the low frequency band is a frequency band whose frequency is less than or equal to that of the cutoff frequency bin and that is of the full frequency band of the current frame, and the cutoff frequency bin is used for division into the low frequency band and the high frequency band.
In some embodiments, the cost function is a predicted gain of a current frequency band of the current frame, or the cost function is a ratio of energy of an estimated residual frequency-domain coefficient of a current frequency band of the current frame to energy of a target frequency-domain coefficient of the current frequency band. The estimated residual frequency-domain coefficient is a difference between the target frequency-domain coefficient of the current frequency band and a predicted frequency-domain coefficient of the current frequency band, the predicted frequency-domain coefficient is obtained based on a reference frequency-domain coefficient and the predicted gain of the current frequency band of the current frame, and the current frequency band is the low frequency band, the high frequency band, or the full frequency band.
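The two cost-function variants described above can be sketched as follows. The energy ratio follows the description directly; the least-squares form used for the predicted gain is an assumption, since the gain formula is not given in this passage, and both function names are hypothetical.

```python
def predicted_gain(target, reference):
    """Assumed least-squares gain g = sum(X * Xref) / sum(Xref^2) over a band."""
    denom = sum(r * r for r in reference)
    if denom == 0.0:
        return 0.0
    return sum(t * r for t, r in zip(target, reference)) / denom


def residual_energy_ratio(target, reference, gain):
    """Energy of the estimated residual divided by energy of the target.

    The estimated residual is target[k] - gain * reference[k], that is,
    the target minus the predicted frequency-domain coefficient.
    """
    target_energy = sum(t * t for t in target)
    if target_energy == 0.0:
        return 0.0
    residual_energy = sum((t - gain * r) ** 2 for t, r in zip(target, reference))
    return residual_energy / target_energy
```

To evaluate a low-band, high-band, or full-band cost, the same functions would be called on the corresponding slice of the coefficient lists. A ratio near 0 means the prediction removes most of the energy; a ratio near 1 means LTP brings little benefit for that band.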
In some embodiments, the encoding module 1030 is configured to determine a first identifier and/or a second identifier based on the cost function, where the first identifier is used to indicate whether to perform LTP processing on the current frame, and the second identifier is used to indicate a frequency band on which LTP processing is to be performed and that is of the current frame; and
In some embodiments, the encoding module 1030 is configured to: when the cost function of the low frequency band satisfies a first condition and the cost function of the high frequency band does not satisfy a second condition, determine that the first identifier is a first value and the second identifier is a fourth value, where the first value is used to indicate to perform LTP processing on the current frame, and the fourth value is used to indicate to perform LTP processing on the low frequency band;
In some embodiments, the encoding module 1030 is configured to:
In some embodiments, the first condition is that the cost function of the low frequency band is greater than or equal to a first threshold, the second condition is that the cost function of the high frequency band is greater than or equal to a second threshold, and the third condition is that the cost function of the full frequency band is greater than or equal to a third threshold; or the first condition is that the cost function of the low frequency band is less than a fourth threshold, the second condition is that the cost function of the high frequency band is less than the fourth threshold, and the third condition is that the cost function of the full frequency band is greater than or equal to a fifth threshold.
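The first and second conditions, and the one decision that is spelled out above (LTP on the low frequency band only), can be sketched as follows. The function names and the `higher_is_better` switch are hypothetical, and any threshold values passed in are placeholders; the other decision branches are not reproduced in this passage and are therefore not modeled.

```python
def satisfies(cost, threshold, higher_is_better=True):
    """Evaluate one condition.

    higher_is_better=True models "cost >= threshold" (first alternative);
    higher_is_better=False models "cost < threshold" (second alternative).
    """
    return cost >= threshold if higher_is_better else cost < threshold


def low_band_only(cost_lf, cost_hf, thr_lf, thr_hf, higher_is_better=True):
    """True when the low band satisfies the first condition and the high band
    does not satisfy the second condition, that is, when the first identifier
    is the first value and the second identifier is the fourth value."""
    first = satisfies(cost_lf, thr_lf, higher_is_better)
    second = satisfies(cost_hf, thr_hf, higher_is_better)
    return first and not second
```

In the second alternative, both band conditions use the fourth threshold, so the same value would be passed as `thr_lf` and `thr_hf` with `higher_is_better=False`.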
In some embodiments, the processing module 1020 is further configured to determine the cutoff frequency bin based on a spectral coefficient of the reference signal.
In some embodiments, the processing module 1020 is configured to:
In some embodiments, the cutoff frequency bin is a preset value.
In some embodiments, the frequency band on which LTP processing is performed and that is of the current frame includes a high frequency band, a low frequency band, or a full frequency band, where the high frequency band is a frequency band whose frequency is greater than that of a cutoff frequency bin and that is of the full frequency band of the current frame, the low frequency band is a frequency band whose frequency is less than or equal to that of the cutoff frequency bin and that is of the full frequency band of the current frame, and the cutoff frequency bin is used for division into the low frequency band and the high frequency band.
In some embodiments, when the first identifier is a first value, the decoded frequency-domain coefficient of the current frame is a residual frequency-domain coefficient of the current frame; or when the first identifier is a second value, the decoded frequency-domain coefficient of the current frame is a target frequency-domain coefficient of the current frame.
In some embodiments, the decoding module 1110 is configured to: parse the bitstream to obtain the first identifier; and when the first identifier is the first value, parse the bitstream to obtain a second identifier, where the second identifier is used to indicate a frequency band on which LTP processing is to be performed and that is of the current frame.
In some embodiments, the processing module 1120 is configured to: when the first identifier is the first value and the second identifier is a fourth value, obtain a reference target frequency-domain coefficient of the current frame, where the first value is used to indicate to perform LTP processing on the current frame, and the fourth value is used to indicate to perform LTP processing on the low frequency band; perform LTP synthesis based on a predicted gain of the low frequency band, the reference target frequency-domain coefficient, and the residual frequency-domain coefficient of the current frame to obtain the target frequency-domain coefficient of the current frame; and process the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame; or when the first identifier is the first value and the second identifier is a third value, obtain a reference target frequency-domain coefficient of the current frame, where the first value is used to indicate to perform LTP processing on the current frame, and the third value is used to indicate to perform LTP processing on the full frequency band; perform LTP synthesis based on a predicted gain of the full frequency band, the reference target frequency-domain coefficient, and the residual frequency-domain coefficient of the current frame to obtain the target frequency-domain coefficient of the current frame; and process the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame; or when the first identifier is the second value, process the target frequency-domain coefficient of the current frame to obtain the frequency-domain coefficient of the current frame, where the second value is used to indicate not to perform LTP processing on the current frame.
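The three branches described above can be sketched as a small dispatch on the two identifiers. The numeric encodings of the first to fourth values, the function name, and the low-band boundary (k < stop_line) are all assumptions made for illustration; `gain` stands for the predicted gain of whichever band the second identifier selects.

```python
# Hypothetical numeric encodings; the actual values are not given here.
FIRST_VALUE = 1   # first identifier: perform LTP processing on the current frame
SECOND_VALUE = 0  # first identifier: do not perform LTP processing
THIRD_VALUE = 0   # second identifier: LTP on the full frequency band
FOURTH_VALUE = 1  # second identifier: LTP on the low frequency band


def ltp_synthesis_by_identifier(decoded, first_id, second_id,
                                reference, gain, stop_line):
    """Return the target frequency-domain coefficients of the current frame."""
    if first_id == SECOND_VALUE:
        # No LTP: the decoded coefficients already are the target coefficients.
        return list(decoded)
    if first_id != FIRST_VALUE:
        raise ValueError("unknown first identifier")
    if second_id == FOURTH_VALUE:
        upper = min(stop_line, len(decoded))  # low band only (assumed k < stop_line)
    elif second_id == THIRD_VALUE:
        upper = len(decoded)                  # full band
    else:
        raise ValueError("unknown second identifier")
    target = list(decoded)
    for k in range(upper):
        # LTP synthesis: residual plus gain-scaled reference coefficient.
        target[k] = decoded[k] + gain * reference[k]
    return target
```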
In some embodiments, the processing module 1120 is configured to: when the first identifier is the first value, obtain a reference target frequency-domain coefficient of the current frame, where the first value is used to indicate to perform LTP processing on the low frequency band;
In some embodiments, the processing module 1120 is configured to: parse the bitstream to obtain a pitch period of the current frame; determine a reference frequency-domain coefficient of the current frame based on the pitch period of the current frame; and process the reference frequency-domain coefficient to obtain the reference target frequency-domain coefficient.
In some embodiments, the processing module 1120 is further configured to determine the cutoff frequency bin based on a spectral coefficient of the reference signal.
In some embodiments, the processing module 1120 is configured to: determine, based on the spectral coefficient of the reference signal, a peak factor set corresponding to the reference signal; and
In some embodiments, the cutoff frequency bin is a preset value.
It should be understood that the audio signal encoding method and the audio signal decoding method in embodiments of this application may be performed by a terminal device or a network device in
The first terminal device or the second terminal device in
During audio communication, a network device may implement transcoding of an encoding/decoding format of an audio signal. As shown in
It should be further understood that the audio signal encoder in
It should be understood that the audio signal encoding method and the audio signal decoding method in embodiments of this application may also be performed by a terminal device or a network device in
The first terminal device or the second terminal device in
In audio communication, a network device may implement transcoding of an encoding/decoding format of an audio signal. As shown in
It should be further understood that the audio signal encoder in
A person of ordinary skill in the art may be aware that, in combination with the examples described in embodiments disclosed in this specification, units and algorithm operations may be implemented by using electronic hardware or a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on particular applications and design constraints of the technical solutions. A person skilled in the art may use different methods to implement the described functions of each particular application, but it should not be considered that the embodiment goes beyond the scope of this application.
It may be clearly understood by a person skilled in the art that, for the purpose of convenient and brief description, for a detailed working process of the foregoing system, apparatus, and unit, refer to a corresponding process in the foregoing method embodiments. Details are not described herein again.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiments are merely examples. For example, division into the units is merely logical function division and may be other division in an actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electrical, mechanical, or another form.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units. To be specific, the components may be located at one position, or may be distributed on a plurality of network units. A part or all of the units may be selected based on actual requirements to achieve the objectives of the solutions in embodiments.
In addition, functional units in embodiments of this application may be integrated into one processing unit, each of the units may exist alone physically, or two or more units are integrated into one unit.
When the functions are implemented in a form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or a part of the technical solutions may be implemented in a form of a software product. The computer software product is stored in a storage medium, and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or a part of the operations of the methods described in embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a universal serial bus (USB) flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific embodiments of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Number | Date | Country | Kind |
---|---|---|---|
201911418539.8 | Dec 2019 | CN | national |
This application is a continuation of International Application No. PCT/CN2020/141249, filed on Dec. 30, 2020, which claims priority to Chinese Patent Application No. 201911418539.8, filed on Dec. 31, 2019. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/CN2020/141249 | Dec 2020 | US |
Child | 17853173 | | US |