The present application relates to the field of communications, and in particular, to a method for recovering lost frames.
With continuous progress of technologies, users have an increasingly high requirement on speech quality. Expanding speech bandwidth is one of the main methods for improving speech quality. However, if information carried in the added bandwidth is coded in a conventional coding manner, coding bit rates would be greatly increased. Because of this, efficient transmission of a bitstream cannot be achieved due to a limitation of current network bandwidth. Therefore, a bandwidth extension technology is often used. The bandwidth extension technology makes use of the correlation between the low frequency band of a signal and the high frequency band of the signal in order to predict the wider band signal from extracted lower-band features.
After coding a high frequency band signal by using the bandwidth extension technology, an encoding side (which comprises an encoder) transmits the coded signal to a decoding side (which comprises a decoder). The decoding side also recovers the high frequency band signal by using the bandwidth extension technology. During signal transmission, frame loss may occur because of network congestion, a network fault, or other reasons. Because the packet loss rate is a key factor affecting signal quality, a lost frame recovering technology has been proposed in order to recover a lost frame as correctly as possible. In this technology, the decoding side uses a synthesized high frequency band signal of a previous frame as a synthesized high frequency band signal of the lost frame, and then adjusts the synthesized high frequency band signal by using a subframe gain and a global gain of the current lost frame, to obtain a final high frequency band signal. However, in this technology, the subframe gain of the current lost frame is a fixed value, and the global gain of the current lost frame is obtained by multiplying a global gain of the previous frame by a fixed gradient. This may cause discontinuous transitions of the re-established high frequency band signal before and after the lost frame, and severe noise in the re-established high frequency band signal.
Embodiments of the present application provide a method for recovering a lost frame, and a decoder configured according to the method. The method can improve quality of decoded high frequency band signals.
According to a first aspect, a method for recovering a lost frame of a media bitstream in a frame loss event is provided, where the method includes: obtaining a synthesized high frequency band signal of a current lost frame; obtaining recovery information related to the current lost frame, where the recovery information includes at least one of the following: a coding mode of a last frame received before the frame loss event, a frame class of the last frame received before the frame loss event, and a quantity of continuously lost frames, where the quantity of continuously lost frames is a quantity of frames that are continuously lost until the current lost frame in the frame loss event; determining a global gain gradient of the current lost frame according to the recovery information; determining a global gain of the current lost frame according to the global gain gradient and a global gain of each frame in previous M frames of the current lost frame, where M is a positive integer; determining a subframe gain of the current lost frame; and obtaining a high frequency band signal of the current lost frame by adjusting the synthesized high frequency band signal of the current lost frame according to the global gain of the current lost frame and the subframe gain of the current lost frame.
With reference to the first aspect, in a first possible implementation manner, determining the global gain gradient of the current lost frame according to the recovery information comprises: determining the global gain gradient of the current lost frame according to the quantity of continuously lost frames and the coding mode or the frame class of the last frame received before the frame loss.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner, the global gain gradient of the current lost frame is determined to be 1 if: the coding mode of the current lost frame is the same as the coding mode of the last frame received before the frame loss, and the quantity of continuously lost frames is less than or equal to 3, or the frame class of the current lost frame is the same as the frame class of the last frame received before the frame loss, and the quantity of continuously lost frames is less than or equal to 3.
With reference to the first possible implementation manner of the first aspect, in a third possible implementation manner, the global gain gradient of the current lost frame is determined to be less than or equal to a preset first threshold and greater than 0 if: it cannot be determined whether the coding mode of the current lost frame is the same as the coding mode of the last frame received before the frame loss or whether the frame class of the current lost frame is the same as the frame class of the last frame received before the frame loss, the last frame received before the frame loss is an unvoiced frame or a voiced frame, and the quantity of continuously lost frames is less than or equal to 3.
With reference to the first aspect, in a fourth possible implementation manner, the global gain gradient of the current lost frame is determined to be greater than a preset first threshold and smaller than 1 if: it is determined that the last frame received before the frame loss is an onset frame of a voiced frame, or the last frame received before the frame loss is an audio frame or a silent frame.
With reference to the first aspect, in a fifth possible implementation manner, the global gain gradient of the current lost frame is determined to be less than or equal to a preset first threshold and greater than 0 if: the last frame received before the frame loss is an onset frame of an unvoiced frame.
With reference to the first aspect or any implementation manner of the first possible implementation manner to the fifth possible implementation manner of the first aspect, in a sixth possible implementation manner, determining the subframe gain of the current lost frame includes: determining a subframe gain gradient of the current lost frame according to the quantity of continuously lost frames and the coding mode or the frame class of the last frame received before the frame loss; and determining the subframe gain of the current lost frame according to the subframe gain gradient and a subframe gain of each frame in previous N frames of the current lost frame, where N is a positive integer.
With reference to the sixth possible implementation manner of the first aspect, in a seventh possible implementation manner, the subframe gain gradient of the current lost frame is determined to be less than or equal to a preset second threshold and greater than 0 if: it cannot be determined whether the coding mode of the current lost frame is the same as the coding mode of the last frame received before the frame loss or whether the frame class of the current lost frame is the same as the frame class of the last frame received before the frame loss, the last frame received before the frame loss is an unvoiced frame, and the quantity of continuously lost frames is less than or equal to 3.
With reference to the sixth possible implementation manner of the first aspect, in an eighth possible implementation manner, the subframe gain gradient of the current lost frame is determined to be greater than a preset second threshold if: the last frame received before the frame loss is an onset frame of a voiced frame.
According to a second aspect, a method for recovering a lost frame of a media bitstream in a frame loss event is provided, where the method includes: obtaining a synthesized high frequency band signal of a current lost frame in a frame loss event; obtaining recovery information related to the current lost frame, where the recovery information includes at least one of the following: a coding mode of a last frame received before the frame loss event, a frame class of a last frame received before the frame loss, and a quantity of continuously lost frames, where the quantity of continuously lost frames is a quantity of frames that are continuously lost until the current lost frame in the frame loss event; determining a subframe gain gradient of the current lost frame according to the recovery information; determining a subframe gain of the current lost frame according to the subframe gain gradient and a subframe gain of each frame in previous N frames of the current lost frame, where N is a positive integer; determining a global gain of the current lost frame, and obtaining a high frequency band signal of the current lost frame by adjusting the synthesized high frequency band signal of the current lost frame according to the subframe gain of the current lost frame and the global gain of the current lost frame.
With reference to the second aspect, in a first possible implementation manner, determining the subframe gain gradient of the current lost frame according to the recovery information comprises: determining the subframe gain gradient of the current lost frame according to the quantity of continuously lost frames and the coding mode or the frame class of the last frame received before the frame loss, where the subframe gain gradient of the current lost frame is determined to be less than or equal to a preset second threshold and greater than 0 if: it cannot be determined whether a coding mode of the current lost frame is the same as a coding mode of the last frame received before the frame loss or whether a frame class of the current lost frame is the same as the frame class of the last frame received before the frame loss, the last frame received before the frame loss is an unvoiced frame, and the quantity of continuously lost frames is less than or equal to 3.
With reference to the second aspect, in a second possible implementation manner, the subframe gain gradient of the current lost frame is determined to be greater than a preset second threshold if: it is determined that the last frame received before the frame loss is an onset frame of a voiced frame.
According to a third aspect, a decoder is provided, where the decoder comprises a processor and a memory storing program codes, wherein the program codes, when executed by the processor, cause the decoder to perform a process to recover a lost frame of a media bitstream in a frame loss event, wherein the process comprises: obtaining a synthesized high frequency band signal of a current lost frame; obtaining recovery information related to the current lost frame, where the recovery information includes at least one of the following: a coding mode of a last frame received before the frame loss event, a frame class of the last frame received before the frame loss event, and a quantity of continuously lost frames, where the quantity of continuously lost frames is a quantity of frames that are continuously lost until the current lost frame in the frame loss event; determining a global gain gradient of the current lost frame according to the recovery information; determining a global gain of the current lost frame according to the global gain gradient and a global gain of each frame in previous M frames of the current lost frame, where M is a positive integer; determining a subframe gain of the current lost frame; and obtaining a high frequency band signal of the current lost frame by adjusting the synthesized high frequency band signal of the current lost frame according to the global gain of the current lost frame and the subframe gain of the current lost frame.
With reference to the third aspect, in a first possible implementation manner, determining the global gain gradient of the current lost frame according to the recovery information comprises: determining the global gain gradient of the current lost frame according to the quantity of continuously lost frames and the coding mode or the frame class of the last frame received before the frame loss.
With reference to the first possible implementation manner of the third aspect, in a second possible implementation manner, the global gain gradient of the current lost frame is determined to be 1 if: the coding mode of the current lost frame is the same as the coding mode of the last frame received before the frame loss, and the quantity of continuously lost frames is less than or equal to 3, or the frame class of the current lost frame is the same as the frame class of the last frame received before the frame loss, and the quantity of continuously lost frames is less than or equal to 3.
With reference to the first possible implementation manner of the third aspect, in a third possible implementation manner, the global gain gradient of the current lost frame is determined to be less than or equal to a preset first threshold and greater than 0 if: it cannot be determined whether the coding mode of the current lost frame is the same as a coding mode of the last frame received before the frame loss or whether a frame class of the current lost frame is the same as the frame class of the last frame received before the frame loss, the last frame received before the frame loss is an unvoiced frame or a voiced frame, and the quantity of continuously lost frames is less than or equal to 3.
With reference to the third aspect, in a fourth possible implementation manner, the global gain gradient of the current lost frame is determined to be greater than a preset first threshold and smaller than 1 if: the last frame received before the frame loss is an onset frame of a voiced frame, or the last frame received before the frame loss is an audio frame or a silent frame.
With reference to the third aspect, in a fifth possible implementation manner, the global gain gradient of the current lost frame is determined to be less than or equal to a preset first threshold and greater than 0 if: the last frame received before the frame loss is an onset frame of an unvoiced frame.
With reference to the third aspect or any implementation manner of the first possible implementation manner to the fifth possible implementation manner of the third aspect, in a sixth possible implementation manner, determining the subframe gain of the current lost frame comprises: determining a subframe gain gradient of the current lost frame according to the quantity of continuously lost frames and the coding mode or the frame class of the last frame received before the frame loss; and determining the subframe gain of the current lost frame according to the subframe gain gradient and a subframe gain of each frame in previous N frames of the current lost frame, where N is a positive integer.
With reference to the sixth possible implementation manner of the third aspect, in a seventh possible implementation manner, the subframe gain gradient of the current lost frame is determined to be less than or equal to a preset second threshold and greater than 0 if: it cannot be determined whether a coding mode of the current lost frame is the same as the coding mode of the last frame received before the frame loss or whether the frame class of the current lost frame is the same as the frame class of the last frame received before the frame loss, the last frame received before the frame loss is an unvoiced frame, and the quantity of continuously lost frames is less than or equal to 3.
With reference to the sixth possible implementation manner of the third aspect, in an eighth possible implementation manner, the subframe gain gradient of the current lost frame is determined to be greater than a preset second threshold if: the last frame received before the frame loss is an onset frame of a voiced frame.
According to a fourth aspect, a decoder is provided, where the decoder includes a processor and a memory storing program codes, wherein the program codes, when executed by the processor, cause the decoder to perform a process to recover a lost frame in a media bitstream, wherein the process comprises: obtaining a synthesized high frequency band signal of a current lost frame in a frame loss event; obtaining recovery information related to the current lost frame, where the recovery information includes at least one of the following: a coding mode of a last frame received before the frame loss event, a frame class of the last frame received before the frame loss event, and a quantity of continuously lost frames, where the quantity of continuously lost frames is a quantity of frames that are continuously lost until the current lost frame in the frame loss event; determining a subframe gain gradient of the current lost frame according to the recovery information; determining a subframe gain of the current lost frame according to the subframe gain gradient and a subframe gain of each frame in previous N frames of the current lost frame, where N is a positive integer; and obtaining a high frequency band signal of the current lost frame by adjusting the synthesized high frequency band signal of the current lost frame according to the subframe gain of the current lost frame and a global gain of the current lost frame.
With reference to the fourth aspect, in a first possible implementation manner, the subframe gain gradient of the current lost frame is determined to be less than or equal to a preset second threshold and greater than 0 if: it cannot be determined whether a coding mode of the current lost frame is the same as a coding mode of the last frame received before the frame loss or whether a frame class of the current lost frame is the same as the frame class of the last frame received before the frame loss, the last frame received before the frame loss is an unvoiced frame, and the quantity of continuously lost frames is less than or equal to 3.
With reference to the fourth aspect, in a second possible implementation manner, the subframe gain gradient of the current lost frame is determined to be greater than a preset second threshold if: the last frame received before the frame loss is an onset frame of a voiced frame.
In the embodiments of the present application, a global gain gradient of a current lost frame is determined according to recovery information, a global gain of the current lost frame is determined according to the global gain gradient and a global gain of each frame in previous M frames of the current lost frame, and a synthesized high frequency band signal of the current lost frame is adjusted according to the global gain of the current lost frame and a subframe gain of the current lost frame, so that transition of a high frequency band signal of the current lost frame can be natural and smooth, and noise in the high frequency band signal can be attenuated, thereby improving quality of the high frequency band signal.
The following briefly introduces the accompanying drawings used for describing the embodiments of the present application.
Coding and decoding technologies are widely used in various electronic devices such as mobile phones, wireless devices, personal data assistant (PDA) devices, handheld or portable computers, global positioning system (GPS) receivers/navigators, digital cameras, audio/video players, video cameras, video recorders, and monitoring devices.
In order to increase voice signal bandwidth, a bandwidth extension technology is often used. Specifically, a signal encoding side (which comprises an encoder) encodes (codes) a low frequency band signal by using a core-layer encoder, and performs a linear predictive coding (LPC) analysis on a high frequency band signal, to obtain a high frequency band LPC coefficient. Then, a high frequency band excitation signal is obtained according to parameters such as pitch period, algebraic codebook, and gains that are obtained by the core-layer encoder. After the high frequency band excitation signal is processed by an LPC synthesis filter that is obtained by using an LPC parameter, a synthesized high frequency band signal is obtained. By comparing the original high frequency band signal with the synthesized high frequency band signal, a subframe gain and a global gain are obtained. The foregoing LPC coefficient is converted into a line spectral frequencies (LSF) parameter, and the LSF parameter, the subframe gain, and the global gain are quantized and coded. Finally, a bitstream obtained by means of coding is sent to a decoding side (which comprises a decoder).
After receiving the coded bitstream, the decoding side first parses information about the bitstream to determine whether any frame is lost. If no frame is lost, the bitstream is decoded normally; if a frame loss has occurred, the decoding side should recover the lost frame or frames. A method for recovering a lost frame by the decoding side is described in detail below.
110: Obtain a synthesized high frequency band signal of a current lost frame.
For example, the decoding side obtains a synthesized high frequency band excitation signal of the current lost frame according to a parameter of a previous frame of the current lost frame. Specifically, the decoding side may use an LPC parameter of the previous frame as an LPC parameter of the current lost frame, and obtain a high frequency band excitation signal by using parameters such as a pitch period, an algebraic codebook, and gains of the previous frame that are obtained by a core-layer decoder. The decoding side may use the high frequency band excitation signal as a high frequency band excitation signal of the current lost frame, and then process the high frequency band excitation signal by using an LPC synthesis filter that is generated by using the LPC parameter, to obtain the synthesized high frequency band signal of the current lost frame.
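Step 110 can be illustrated with a minimal sketch. It assumes an all-pole LPC synthesis filter with the convention A(z) = 1 + a1·z^-1 + ... + ap·z^-p and zero filter memory at the start of the frame; the function names `lpc_synthesis` and `conceal_high_band` are hypothetical and are not taken from any codec.

```python
from typing import List, Sequence


def lpc_synthesis(excitation: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """All-pole synthesis: y[n] = x[n] - sum_k lpc[k-1] * y[n-k], k = 1..len(lpc).

    Assumptions: A(z) = 1 + a1*z^-1 + ... + ap*z^-p, and the filter memory
    is zero at the start of the frame.
    """
    out: List[float] = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


def conceal_high_band(prev_excitation: Sequence[float],
                      prev_lpc: Sequence[float]) -> List[float]:
    """Reuse the previous frame's excitation and LPC parameters for the lost frame."""
    return lpc_synthesis(prev_excitation, prev_lpc)
```

A practical decoder would normally carry the filter state across frames; the zero-memory start here only keeps the sketch short.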
120: Obtain recovery information corresponding to the current lost frame. The recovery information includes at least one of the following: a coding mode before the frame loss, a frame class of the last frame received before the frame loss, and a quantity of continuously lost frames, where the quantity of continuously lost frames is a quantity of frames that are continuously lost until the current lost frame.
The current lost frame is a lost frame that needs to be recovered by the decoding side at a current time.
The coding mode before the frame loss is a coding mode before the occurrence of a current frame loss event. Generally, to achieve better coding performance, an encoding side may classify signals before coding the signals, and select a suitable coding mode for the signal. At present, the coding modes may include: a silent frame coding mode (INACTIVE mode), an unvoiced frame coding mode (UNVOICED mode), a voiced frame coding mode (VOICED mode), a generic frame coding mode (GENERIC mode), a transition frame coding mode (TRANSITION mode), and an audio frame coding mode (AUDIO mode).
The frame class of the last frame received before the frame loss is a frame class of a last frame that is received at the decoding side before the occurrence of the current frame loss event. For example, if the encoding side sends four frames to the decoding side, and the decoding side correctly received the first frame and the second frame while the third frame and the fourth frame are lost, the last frame received before the frame loss is the second frame.
Generally, a frame can be classified as:
(1) an UNVOICED_CLAS frame: a frame that has any one of the following features: unvoiced sound, silence, noise, and end of voiced sound;
(2) an UNVOICED_TRANSITION frame: a frame of transition from unvoiced sound to voiced sound, where the voiced sound is at its onset and is still relatively weak;
(3) a VOICED_TRANSITION frame: a frame of transition after a voiced sound, where the feature of the voiced sound is already very weak;
(4) a VOICED_CLAS frame: a frame that has a feature of a voiced sound, where a previous frame of this frame is a voiced frame or an onset frame of a voiced sound;
(5) an ONSET frame: a frame with an onset of an obvious voiced sound;
(6) a SIN_ONSET frame: a frame with an onset of mixed harmonic and noise; or
(7) an INACTIVE_CLAS frame: a frame with an inactive feature.
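The coding modes and frame classes named above could be represented in a decoder as enumerations. The sketch below is illustrative only: the member names follow the labels in the text, while the class names `CodingMode` and `FrameClass` and the string values are assumptions.

```python
from enum import Enum


class CodingMode(Enum):
    """Coding modes listed in the description (values are descriptive only)."""
    INACTIVE = "silent frame coding mode"
    UNVOICED = "unvoiced frame coding mode"
    VOICED = "voiced frame coding mode"
    GENERIC = "generic frame coding mode"
    TRANSITION = "transition frame coding mode"
    AUDIO = "audio frame coding mode"


class FrameClass(Enum):
    """Frame classes listed in the description (values are descriptive only)."""
    UNVOICED_CLAS = "unvoiced sound, silence, noise, or end of voiced sound"
    UNVOICED_TRANSITION = "transition from unvoiced to weak voiced onset"
    VOICED_TRANSITION = "transition after a voiced sound"
    VOICED_CLAS = "voiced sound following a voiced or onset frame"
    ONSET = "onset of an obvious voiced sound"
    SIN_ONSET = "onset of mixed harmonic and noise"
    INACTIVE_CLAS = "inactive"
```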
The quantity of continuously lost frames is the quantity of frames that are continuously lost until the current lost frame in the current frame loss event. In essence, the quantity of continuously lost frames indicates the position of the current lost frame within the run of continuously lost frames.
For example, the encoding side sends five frames to the decoding side, the decoding side correctly receives the first frame and the second frame, and the third frame to the fifth frame are all lost. If the current lost frame is the fourth frame, the quantity of continuously lost frames is 2; or if the current lost frame is the fifth frame, the quantity of continuously lost frames is 3.
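The counting rule in this example can be sketched as follows. The function name `consecutive_loss_counts` and the representation of reception status as a list of booleans are assumptions made for illustration.

```python
from typing import List


def consecutive_loss_counts(received: List[bool]) -> List[int]:
    """For each frame, return the quantity of continuously lost frames up to
    and including that frame (0 for a frame that was received)."""
    counts: List[int] = []
    run = 0
    for ok in received:
        run = 0 if ok else run + 1  # a received frame ends the loss event
        counts.append(run)
    return counts


# Five frames sent; the first two are received, the last three are lost.
counts = consecutive_loss_counts([True, True, False, False, False])
print(counts)  # [0, 0, 1, 2, 3]
```

The fourth frame maps to 2 and the fifth frame to 3, matching the values stated in the example.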
130: Determine a global gain gradient of the current lost frame according to the recovery information.
140: Determine a global gain of the current lost frame according to the global gain gradient and a global gain of each frame in previous M frames of the current lost frame, where M is a positive integer.
For example, the decoding side may weight the global gains of the previous M frames, and then determine the global gain of the current lost frame according to the weighted global gains of the previous M frames and the global gain gradient of the current lost frame.
Specifically, a global gain (FramGain) of the current lost frame may be represented by equation (1):
FramGain=f(α, FramGain(−m)) (1)
where FramGain(−m) represents a global gain of the mth frame in the previous M frames, and α represents the global gain gradient of the current lost frame.
For example, the decoding side may determine a global gain (FramGain) of the current lost frame according to the following equation (2):
Wm represents a weighting value that corresponds to the mth frame in the previous M frames, FramGain(−m) represents a global gain of the mth frame, and α represents the global gain gradient of the current lost frame.
It should be understood that the example of the foregoing equation (2) is not intended to limit the scope of this embodiment of the present application. A person skilled in the art may make various equivalent modifications or changes based on the equation (1), where these modifications or changes shall also fall within the scope of the present application.
Generally, to simplify the process of step 140, the decoding side may determine the global gain of the current lost frame according to a global gain of the previous frame of the current lost frame and the global gain gradient.
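One reading consistent with the description of step 140 is sketched below. It assumes that the global gain is the product of the gradient α and a weighted sum of the previous global gains; this product form, the function name `estimate_global_gain`, and the default weights are assumptions, not a statement of the exact equation used.

```python
from typing import Optional, Sequence


def estimate_global_gain(alpha: float,
                         prev_gains: Sequence[float],
                         weights: Optional[Sequence[float]] = None) -> float:
    """Estimate the global gain of the lost frame.

    prev_gains[0] is the gain of the frame immediately before the lost frame
    (m = 1), prev_gains[1] the one before that, and so on.
    Assumed form: alpha * sum_m(weights[m] * prev_gains[m]).
    """
    if not prev_gains:
        raise ValueError("at least one previous global gain is required")
    if weights is None:
        # Simplified case: rely only on the immediately preceding frame.
        weights = [1.0] + [0.0] * (len(prev_gains) - 1)
    if len(weights) != len(prev_gains):
        raise ValueError("weights and prev_gains must have the same length")
    weighted = sum(w * g for w, g in zip(weights, prev_gains))
    return alpha * weighted
```

With M = 1 the sketch reduces to multiplying the previous frame's global gain by the gradient, which is the simplified case described above.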
150: Adjust the synthesized high frequency band signal of the current lost frame according to the global gain of the current lost frame and a subframe gain of the current lost frame, to obtain a high frequency band signal of the current lost frame.
For example, the decoding side may set the subframe gain of the current lost frame to a fixed value, or the decoding side may determine the subframe gain of the current lost frame in a manner to be described below. Then, the decoding side may adjust the synthesized high frequency band signal of the current lost frame according to the global gain of the current lost frame and the subframe gain of the current lost frame, thereby obtaining the final high frequency band signal of the current lost frame.
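A minimal sketch of the adjustment in step 150 follows. It assumes that both gains are applied multiplicatively and that the frame is divided into equal-length subframes; the text does not fix either detail, and the function name `apply_gains` is hypothetical.

```python
from typing import List, Sequence


def apply_gains(synth: Sequence[float],
                subframe_gains: Sequence[float],
                global_gain: float) -> List[float]:
    """Scale each sample by its subframe gain and by the frame's global gain.

    Assumptions: multiplicative gains and equal-length subframes.
    """
    n_sub = len(subframe_gains)
    if n_sub == 0 or len(synth) % n_sub != 0:
        raise ValueError("frame length must be a multiple of the number of subframes")
    sub_len = len(synth) // n_sub
    return [global_gain * subframe_gains[i // sub_len] * s
            for i, s in enumerate(synth)]
```

For instance, a four-sample frame with two subframe gains of 0.5 and 1.0 and a global gain of 2.0 scales the first two samples by 1.0 and the last two by 2.0.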
In existing technology, the global gain gradient of the current lost frame is a fixed value, and the decoding side obtains the global gain of the current lost frame according to the global gain of the previous frame and the fixed global gain gradient. Adjusting the synthesized high frequency band signal according to a global gain obtained in this manner may cause discontinuous transitions of the final high frequency band signal before and after the frame loss, and may generate severe noise. In this embodiment of the present application, by contrast, the decoding side may determine the global gain gradient according to the recovery information, instead of simply setting the global gain gradient to a fixed value. The recovery information describes a related feature of the frame loss event, and therefore, the global gain gradient determined according to the recovery information is more accurate, so that the global gain of the current lost frame is also more accurate. The decoding side adjusts the synthesized high frequency band signal according to the global gain, so that transitions of the re-established high frequency band signal can be natural and smooth, and the noise in the re-established high frequency band signal can be attenuated, thereby improving quality of the re-established high frequency band signal.
Optionally, in step 130, the foregoing global gain gradient α may be represented by equation (3):
α=1.0−Delta*Scale (3)
where Delta represents an adjustment gradient of α, and a value of Delta may range from 0.5 to 1. Scale represents a tuning amplitude of α, which determines the degree to which the current lost frame follows the previous frame in the current condition, and may range from 0 to 1. A smaller value of Scale may indicate that the energy of the current lost frame is closer to that of the previous frame, and a larger value may indicate that the energy of the current lost frame is considerably weaker than that of the previous frame.
For example, the global gain gradient α is 1 if a coding mode of the current lost frame is the same as a coding mode of the last frame received before the frame loss, and the quantity of continuously lost frames is less than or equal to 3. Or, the global gain gradient α is 1 if a frame class of the current lost frame is the same as the frame class of the last frame received before the frame loss, and the quantity of continuously lost frames is less than or equal to 3.
For another example, in equation (3), if a value of Delta is 0.6, and a value of Scale is 0, then α is 1.
In a case in which it cannot be determined whether a coding mode of the current lost frame is the same as a coding mode of the last frame received before the frame loss or whether a frame class of the current lost frame is the same as the frame class of the last frame received before the frame loss, if the last frame received before the frame loss is an unvoiced frame or a voiced frame, and the quantity of continuously lost frames is less than or equal to 3, the decoding side may determine the global gain gradient to be less than or equal to a preset first threshold and greater than 0.
For example, the decoding side may determine that α is a relatively small value, that is, α may be less than the preset first threshold, such as 0.5. If, in equation (3), a value of Delta is 0.65, and a value of Scale is 0.8, then α is 0.48.
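Equation (3) and the two worked values given above can be checked with a short sketch. The function name `global_gain_gradient` is hypothetical.

```python
def global_gain_gradient(delta: float, scale: float) -> float:
    """Equation (3): alpha = 1.0 - Delta * Scale."""
    return 1.0 - delta * scale


# Delta = 0.6, Scale = 0: the lost frame fully follows the previous frame.
print(global_gain_gradient(0.6, 0.0))  # 1.0

# Delta = 0.65, Scale = 0.8: alpha falls below an example threshold of 0.5.
print(round(global_gain_gradient(0.65, 0.8), 2))  # 0.48
```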
In the foregoing embodiment, the decoding side may determine whether the coding mode or frame class of the last frame received before the frame loss is the same as the coding mode or frame class of the current lost frame according to the frame class of the last frame received before the frame loss and/or the quantity of continuously lost frames. For example, if the quantity of continuously lost frames is less than or equal to 3, the decoding side may determine that the coding mode or frame class of the last received frame is the same as the coding mode or frame class of the current lost frame. If the quantity of continuously lost frames is greater than 3, the decoding side cannot determine that the coding mode of the last received frame is the same as the coding mode of the current lost frame. For another example, if the last received frame is an onset frame of a voiced frame or an onset frame of an unvoiced frame, and the quantity of continuously lost frames is less than or equal to 3, the decoding side may determine that the frame class of the current lost frame is the same as the frame class of the last received frame. If the quantity of continuously lost frames is greater than 3, the decoding side cannot determine whether the coding mode of the last frame received before the frame loss is the same as the coding mode of the current lost frame, or whether the frame class of the last received frame is the same as the frame class of the current lost frame.
Optionally, in another instance, if it is determined that the last frame received before the frame loss is an onset frame of a voiced frame, or that the last frame received before the frame loss is an audio frame or a silent frame, the decoding side may determine the global gain gradient such that the global gain gradient is greater than a preset first threshold.
Specifically, if the decoding side determines that the last frame received before the frame loss is an onset frame of a voiced frame, it may be determined that the current lost frame is probably a voiced frame, and accordingly, it may be determined that α is a relatively large value, that is, α may be greater than the preset first threshold. For example, in equation (3), a value of Delta may be 0.5, and a value of Scale may be 0.4.
If the decoding side determines that the last frame received before the frame loss is an audio frame or a silent frame, it may be also determined that α is a relatively large value, that is, α may be greater than the preset first threshold. For example, in equation (3), a value of Delta may be 0.5, and a value of Scale may be 0.4.
Optionally, as another embodiment, in a case in which it is determined that the last frame received before the frame loss is an onset frame of an unvoiced frame, the decoding side may determine the global gain gradient, and enable the global gain gradient to be less than or equal to a preset first threshold and greater than 0.
If the last frame received before the frame loss is an onset frame of an unvoiced frame, the current lost frame may be an unvoiced frame, and accordingly, the decoding side may determine that α is a relatively small value, that is, α may be less than the preset first threshold. For example, in equation (3), a value of Delta may be 0.8, and a value of Scale may be 0.65.
In addition to the cases indicated by the foregoing recovery information, in any other case the decoding side may determine that α is a relatively small value, that is, α may be less than the preset first threshold. For example, in equation (3), a value of Delta may be 0.8, and a value of Scale may be 0.75.
Optionally, a value range of the foregoing first threshold may be as follows: 0<the first threshold<1.
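The case analysis above can be sketched as a small selection routine. This is an illustration only: it covers just the four cases whose Delta and Scale example values are given immediately above, the class labels and the `RecoveryInfo` type are hypothetical, the order in which the cases are tested is a choice made for the sketch, and the form α = 1 − Delta × Scale is an assumption inferred from the worked numbers.

```python
from dataclasses import dataclass
from typing import Tuple

FIRST_THRESHOLD = 0.5  # example value from the text; any value in (0, 1) is allowed


@dataclass
class RecoveryInfo:
    # Hypothetical labels: "voiced", "unvoiced", "voiced_onset",
    # "unvoiced_onset", "audio", "silent", "other"
    last_frame_class: str
    lost_count: int      # quantity of continuously lost frames
    same_as_last: bool   # True if the lost frame's mode/class is known to match


def select_delta_scale(info: RecoveryInfo) -> Tuple[float, float]:
    """Return example (Delta, Scale) values for the cases described above."""
    if info.last_frame_class in ("voiced_onset", "audio", "silent"):
        return 0.5, 0.4    # relatively large alpha
    if info.last_frame_class == "unvoiced_onset":
        return 0.8, 0.65   # relatively small alpha
    if (not info.same_as_last
            and info.last_frame_class in ("voiced", "unvoiced")
            and info.lost_count <= 3):
        return 0.65, 0.8   # relatively small alpha
    return 0.8, 0.75       # any other case: relatively small alpha


def alpha_for(info: RecoveryInfo) -> float:
    """Assumed form of equation (3): alpha = 1 - Delta * Scale."""
    delta, scale = select_delta_scale(info)
    return 1.0 - delta * scale
```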
Optionally, as another embodiment, the decoding side may determine a subframe gain gradient of the current lost frame according to the recovery information; and determine the subframe gain of the current lost frame according to the subframe gain gradient and a subframe gain of each frame in previous N frames of the current lost frame, where N is a positive integer.
Besides determining the global gain gradient of the current lost frame according to the foregoing recovery information, the decoding side may also determine the subframe gain gradient of the current lost frame according to that recovery information. For example, the decoding side may weight subframe gains of the previous N frames, and then determine the subframe gain of the current lost frame according to the weighted subframe gains and the subframe gain gradient.
Specifically, a subframe gain (SubGain) of the current lost frame may be represented by an equation (4):
SubGain=f(β, SubGain(−n)) (4)
where SubGain(−n) represents a subframe gain of the nth frame in the previous N frames, and β represents the subframe gain gradient of the current lost frame.
For example, the decoding side may determine a subframe gain (SubGain) of the current lost frame according to an equation (5):
where Wn represents the weight that corresponds to the nth frame in the previous N frames, SubGain(−n) represents the subframe gain of the nth frame, and β represents the subframe gain gradient of the current lost frame; generally, β ranges from 1 to 2.
It should be understood that the example of the foregoing equation (5) is not intended to limit the scope of this embodiment of the present application. The person skilled in the art may make various equivalent modifications or changes based on the equation (4), and these modifications or changes also fall within the scope of the present application.
To simplify a process, the decoding side may determine the subframe gain of the current lost frame according to a subframe gain of the previous frame of the current lost frame, and the subframe gain gradient.
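One way to combine the weights, the previous subframe gains, and the gradient is a weighted sum scaled by β. The sketch below assumes that product form, SubGain = β × Σ Wn × SubGain(−n), which is consistent with the symbol definitions above but is not quoted from equation (5); the function names are hypothetical.

```python
from typing import Sequence


def subframe_gain(beta: float,
                  prev_gains: Sequence[float],
                  weights: Sequence[float]) -> float:
    """Assumed form: beta * sum(W_n * SubGain(-n)) over the previous N frames."""
    if not prev_gains or len(prev_gains) != len(weights):
        raise ValueError("need one weight per previous frame, and N >= 1")
    weighted = sum(w * g for w, g in zip(weights, prev_gains))
    return beta * weighted


def subframe_gain_simplified(beta: float, prev_gain: float) -> float:
    """Simplified case: only the previous frame is used (N = 1, W_1 = 1)."""
    return subframe_gain(beta, [prev_gain], [1.0])
```

For instance, `subframe_gain(1.25, [0.4, 0.2], [0.75, 0.25])` weights the most recent frame more heavily before applying the gradient.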
It can be seen that, in this embodiment, the subframe gain of the current lost frame is not simply set to a fixed value; instead, it is determined after a subframe gain gradient is determined according to the recovery information. The synthesized high frequency band signal is then adjusted according to the subframe gain of the current lost frame and a global gain of the current lost frame, so that transition of the high frequency band signal of the current lost frame can be natural and smooth, and noise in the high frequency band signal can be attenuated, thereby improving quality of the high frequency band signal.
Optionally, as another embodiment, in a case in which it cannot be determined whether the coding mode of the current lost frame is the same as the coding mode of the last frame received before the frame loss or whether the frame class of the current lost frame is the same as the frame class of the last frame received before the frame loss, if it is determined that the last frame received before the frame loss is an unvoiced frame, and the quantity of continuously lost frames is less than or equal to 3, the decoding side may determine the subframe gain gradient, and enable the subframe gain gradient to be less than or equal to a preset second threshold and greater than 0.
For example, the second threshold may be 1.5, and β may be 1.25.
Optionally, as another embodiment, in a case in which it is determined that the last frame received before the frame loss is an onset frame of a voiced frame, the decoding side may determine the subframe gain gradient, and enable the subframe gain gradient to be greater than a preset second threshold.
If the last frame received before the frame loss is an onset frame of a voiced frame, the current lost frame is probably a voiced frame, and the decoding side may determine that β is a relatively large value, for example, β may be 2.0.
In addition to the two cases indicated by the foregoing recovery information, β may be 1 in any other case.
Optionally, as another embodiment, a value range of the foregoing second threshold is as follows: 1<the second threshold<2.
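The three β cases above can be written as a short selection function. This is a sketch using the example values from the text (second threshold 1.5, β of 1.25, 2.0, or 1); the string labels and parameter names are hypothetical.

```python
SECOND_THRESHOLD = 1.5  # example value from the text; any value in (1, 2) is allowed


def select_subframe_gain_gradient(last_frame_class: str,
                                  lost_count: int,
                                  same_as_last: bool) -> float:
    """Return an example beta for the cases described above."""
    if last_frame_class == "voiced_onset":
        return 2.0    # greater than the second threshold
    if (not same_as_last
            and last_frame_class == "unvoiced"
            and lost_count <= 3):
        return 1.25   # in (0, second threshold]
    return 1.0        # any other case
```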
210: Obtain a synthesized high frequency band signal of a current lost frame.
The decoding side may obtain the synthesized high frequency band signal of the current lost frame according to the prior art. For example, the decoding side may obtain a synthesized high frequency band excitation signal of the current lost frame according to a parameter of a previous frame of the current lost frame. Specifically, the decoding side may use an LPC parameter of the previous frame of the current lost frame as an LPC parameter of the current lost frame, and obtain a high frequency band excitation signal by using parameters such as a pitch period, an algebraic codebook, and gains of the previous frame that are obtained by a core-layer decoding. The decoding side may use the high frequency band excitation signal as a high frequency band excitation signal of the current lost frame, and then process the high frequency band excitation signal by using an LPC synthesis filter that is generated by using the LPC parameter, to obtain the synthesized high frequency band signal of the current lost frame.
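The final filtering step can be illustrated with a direct-form all-pole filter. This is a generic sketch, not the decoder's actual routine: it assumes the convention A(z) = 1 + Σ a_k z^(−k), so that y[n] = x[n] − Σ a_k y[n−k], and it starts from zero filter memory, whereas a real decoder would normally carry the filter state over from the previous frame.

```python
from typing import List, Sequence


def lpc_synthesis(excitation: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """Filter an excitation through 1/A(z), with lpc = [a_1, ..., a_p].

    Assumed convention: A(z) = 1 + sum_k a_k z^-k, zero initial memory.
    """
    out: List[float] = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]  # feedback from earlier output samples
        out.append(acc)
    return out
```

Here the `lpc` argument stands in for the LPC parameter reused from the previous frame, and `excitation` for the high frequency band excitation signal derived from the previous frame's parameters.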
220: Obtain recovery information corresponding to the current lost frame. The recovery information includes at least one of the following: a coding mode before the frame loss, a frame class of the last frame received before the frame loss, and a quantity of continuously lost frames, where the quantity of continuously lost frames is a quantity of frames that are continuously lost until the current lost frame.
For description of the recovery information, refer to the description in the embodiment of
230: Determine a subframe gain gradient of the current lost frame according to the recovery information.
240: Determine a subframe gain of the current lost frame according to the subframe gain gradient and a subframe gain of each frame in previous N frames of the current lost frame, where N is a positive integer.
For example, the decoding side may weight the subframe gains of the previous N frames, and then determine the subframe gain of the current lost frame according to the weighted subframe gains of the previous N frames and the subframe gain gradient of the current lost frame.
Specifically, a subframe gain (SubGain) of the current lost frame may be represented by using the equation (4).
For example, the decoding side may determine a subframe gain (SubGain) of the current lost frame according to the equation (5).
It should be understood that the example of the foregoing equation (5) is not intended to limit the scope of this embodiment of the present application. The person skilled in the art may make various equivalent modifications or changes based on the equation (4), where these modifications or changes also fall within the scope of the present application.
To simplify the process, the decoding side may determine the subframe gain of the current lost frame according to a subframe gain of the previous frame of the current lost frame, and the subframe gain gradient.
250: Adjust the synthesized high frequency band signal of the current lost frame according to the subframe gain of the current lost frame and a global gain of the current lost frame, to obtain a high frequency band signal of the current lost frame.
For example, the decoding side may set a fixed global gain gradient according to the prior art, and then determine the global gain of the current lost frame according to the fixed global gain gradient and a global gain of the previous frame.
In existing technology, the decoding side sets the subframe gain of the current lost frame to a fixed value, and adjusts the synthesized high frequency band signal of the current lost frame according to the fixed value and the global gain of the current lost frame, which causes discontinuous transition of the final high frequency band signal before and after the frame loss, and generation of severe noise. However, in this embodiment of the present application, the decoding side may determine the subframe gain gradient according to the recovery information, and then determine the subframe gain of the current lost frame according to the subframe gain gradient, instead of simply setting the subframe gain of the current lost frame to the fixed value. The recovery information describes a related feature of a frame loss event, and therefore, the subframe gain of the current lost frame is more accurate. Therefore, the decoding side adjusts the synthesized high frequency signal according to the subframe gain, so that transition of the re-established high frequency band signal can be natural and smooth, and noise in the re-established high frequency band signal can be attenuated, thereby improving quality of the re-established high frequency band signal.
In this embodiment, a subframe gain gradient of a current lost frame is determined according to recovery information, a subframe gain of the current lost frame is determined according to the subframe gain gradient and a subframe gain of each frame in previous N frames of the current lost frame, and a synthesized high frequency band signal of the current lost frame is adjusted according to the subframe gain of the current lost frame and a global gain of the current lost frame, so that transition of a high frequency band signal of the current lost frame can be natural and smooth, and noise in the high frequency band signal can be attenuated, thereby improving quality of the high frequency band signal.
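The adjustment in step 250 can be pictured as scaling each subframe of the synthesized signal by its own subframe gain and then by the frame-wide global gain. The text says only that the signal is "adjusted", so the purely multiplicative form, the equal-length subframes, and the name `apply_gains` are assumptions made for this sketch.

```python
from typing import List, Sequence


def apply_gains(synth: Sequence[float],
                subframe_gains: Sequence[float],
                global_gain: float) -> List[float]:
    """Scale each equal-length subframe by its gain, then by the global gain."""
    n_sub = len(subframe_gains)
    if n_sub == 0 or len(synth) % n_sub != 0:
        raise ValueError("frame length must be a multiple of the subframe count")
    sub_len = len(synth) // n_sub
    return [s * subframe_gains[i // sub_len] * global_gain
            for i, s in enumerate(synth)]
```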
Optionally, as another embodiment, in a case in which it cannot be determined whether a coding mode of the current lost frame is the same as a coding mode of the last frame received before the frame loss or whether a frame class of the current lost frame is the same as the frame class of the last frame received before the frame loss, if it is determined that the last frame received before the frame loss is an unvoiced frame, and the quantity of continuously lost frames is less than or equal to 3, the decoding side may determine the subframe gain gradient, and enable the subframe gain gradient to be less than or equal to a preset second threshold and greater than 0.
For example, the second threshold may be 1.5, and β may be 1.25.
Optionally, as another embodiment, in a case in which it is determined that the last frame received before the frame loss is an onset frame of a voiced frame, the decoding side may determine the subframe gain gradient, and enable the subframe gain gradient to be greater than a preset second threshold.
If the last frame received before the frame loss is an onset frame of a voiced frame, the current lost frame is probably a voiced frame, and the decoding side may determine that β is a relatively large value, for example, β may be 2.0.
In addition to the two cases indicated by the foregoing recovery information, β may be 1 in any other case.
Optionally, as another embodiment, a value range of the foregoing second threshold may be as follows: 1<the second threshold<2.
It can be seen from the foregoing that, a decoding side may determine a global gain of a current lost frame according to this embodiment of the present application, and determine a subframe gain of the current lost frame according to the prior art; or a decoding side may determine a subframe gain of a current lost frame according to this embodiment of the present application, and determine a global gain of the current lost frame according to the prior art; or a decoding side may determine a subframe gain of a current lost frame and a global gain of the current lost frame according to this embodiment of the present application. All of the foregoing methods enable transition of a high frequency band signal of the current lost frame to be natural and smooth, and can attenuate noise in the high frequency band signal, thereby improving quality of the high frequency band signal.
301: Parse a frame loss flag in a received bitstream.
This process may be executed according to the prior art.
302: Determine whether a current frame is lost according to the frame loss flag.
If the frame loss flag indicates that the current frame is not lost, step 303 is executed.
If the frame loss flag indicates that the current frame is lost, steps 304 to 306 are executed.
303: If the frame loss flag indicates that the current frame is not lost, decode the bitstream to obtain the current frame.
If the frame loss flag indicates that the current frame is lost, steps 304 to 306 may be executed simultaneously, or steps 304 to 306 are executed in a specific sequence, which is not limited in this embodiment of the present application.
304: Determine a synthesized high frequency band signal of the current lost frame.
For example, the decoding side may determine a synthesized high frequency band excitation signal of the current lost frame according to a parameter of a previous frame of the current lost frame. Specifically, the decoding side may use an LPC parameter of the previous frame of the current lost frame as an LPC parameter of the current lost frame, and may obtain a high frequency band excitation signal by using parameters such as a pitch period, an algebraic codebook, and gains that are obtained by a core-layer decoding of the previous frame. The decoding side may use the high frequency band excitation signal as a high frequency band excitation signal of the current lost frame, and then process the high frequency band excitation signal by using an LPC synthesis filter that is generated by using the LPC parameter, to obtain the synthesized high frequency band signal of the current lost frame.
305: Determine a global gain of the current lost frame.
Optionally, the decoding side may determine a global gain gradient of the current lost frame according to recovery information of the current lost frame, where the recovery information may include at least one of the following: a coding mode before frame loss, a frame class of a last frame received before the frame loss, and a quantity of continuously lost frames; and then determine the global gain of the current lost frame according to the global gain gradient of the current lost frame and a global gain of each frame in previous M frames.
Alternatively, the decoding side may determine the global gain of the current lost frame according to the prior art. For example, the global gain of the current lost frame may be obtained by multiplying a global gain of the previous frame by a fixed global gain gradient.
306: Determine a subframe gain of the current lost frame.
Optionally, the decoding side may also determine a subframe gain gradient of the current lost frame according to the recovery information of the current lost frame, and then determine the subframe gain of the current lost frame according to the subframe gain gradient of the current lost frame and a subframe gain of each frame in previous N frames.
Optionally, the decoding side may determine the subframe gain of the current lost frame according to the prior art. For example, set the subframe gain of the current lost frame to a fixed value.
It should be understood that, to improve quality of a re-established high frequency band signal that corresponds to the current lost frame, if the global gain of the current lost frame is determined in step 305 according to the prior art, in step 306, the subframe gain of the current lost frame needs to be determined according to the method in the embodiment of
307: Adjust, according to the global gain of the current lost frame that is obtained in step 305 and the subframe gain of the current lost frame that is obtained in step 306, the synthesized high frequency band signal obtained in step 304, to obtain a high frequency band signal of the current lost frame.
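The flow of steps 302 to 307 can be sketched as follows for one combination: a prior-art global gain (previous global gain times a fixed gradient) together with a gradient-based subframe gain taken from the previous frame only (N = 1). Everything named here is hypothetical, including the `state` dictionary, the `decode_frame` callback, and the default values of `alpha_fixed` and `beta`. Step 304 is replaced by a stand-in that reuses a stored signal, and the multiplicative adjustment in step 307 is an assumption.

```python
from typing import Callable, Dict, List


def handle_frame(frame_lost: bool,
                 state: Dict[str, list],
                 decode_frame: Callable[[], object],
                 alpha_fixed: float = 0.75,
                 beta: float = 1.25):
    # Step 302/303: a frame that was not lost is decoded normally.
    if not frame_lost:
        return decode_frame()

    # Step 304 (stand-in): reuse a stored synthesized signal in place of
    # the excitation + LPC synthesis described in the text.
    synth: List[float] = list(state["prev_synth"])

    # Step 305: prior-art global gain, previous gain times a fixed gradient.
    global_gain = state["prev_global_gain"][0] * alpha_fixed

    # Step 306: gradient-based subframe gains from the previous frame (N = 1).
    sub_gains = [beta * g for g in state["prev_subframe_gains"]]

    # Step 307: scale each subframe, then apply the global gain.
    sub_len = len(synth) // len(sub_gains)
    return [s * sub_gains[i // sub_len] * global_gain
            for i, s in enumerate(synth)]
```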
The first determining unit 410 determines a synthesized high frequency band signal of a current lost frame. The second determining unit 420 determines recovery information that corresponds to the current lost frame, where the recovery information includes at least one of the following: a coding mode before frame loss, a frame class of a last frame received before the frame loss, and a quantity of continuously lost frames, where the quantity of continuously lost frames is a quantity of frames that are continuously lost until the current lost frame. The third determining unit 430 determines a global gain gradient of the current lost frame according to the recovery information. The fourth determining unit 440 determines a global gain of the current lost frame according to the global gain gradient and a global gain of each frame in previous M frames of the current lost frame, where M is a positive integer. A subframe gain of the current lost frame is determined. The adjusting unit 450 adjusts the synthesized high frequency band signal of the current lost frame according to the global gain of the current lost frame and the subframe gain of the current lost frame, to obtain a high frequency band signal of the current lost frame.
A fifth determining unit 460 may further be included. The fifth determining unit 460 may determine a subframe gain gradient of the current lost frame according to the recovery information. The fifth determining unit 460 may determine the subframe gain of the current lost frame according to the subframe gain gradient and a subframe gain of each frame in previous N frames of the current lost frame, where N is a positive integer.
For other functions and operations of the decoder 400, refer to the processes as depicted in
The memory 510 may be a random access memory, a flash memory, a read-only memory, a programmable read-only memory, a non-volatile memory, a register, or the like. The processor 520 may be a central processing unit (CPU).
The memory 510 is configured to store computer executable instructions. The processor 520, by executing the executable instructions stored in the memory 510, performs the following operations: obtain a synthesized high frequency band signal of a current lost frame; obtain recovery information that corresponds to the current lost frame, where the recovery information includes at least one of the following: a coding mode before frame loss, a frame class of a last frame received before the frame loss, and a quantity of continuously lost frames, where the quantity of continuously lost frames is a quantity of frames that are continuously lost until the current lost frame; determine a global gain gradient of the current lost frame according to the recovery information; determine a global gain of the current lost frame according to the global gain gradient and a global gain of each frame in previous M frames of the current lost frame, where M is a positive integer; and adjust the synthesized high frequency band signal of the current lost frame according to the global gain of the current lost frame and a subframe gain of the current lost frame, to obtain a high frequency band signal of the current lost frame.
In one implementation manner, a global gain gradient of a current lost frame is determined according to recovery information, a global gain of the current lost frame is determined according to the global gain gradient and a global gain of each frame in previous M frames of the current lost frame, and a synthesized high frequency band signal of the current lost frame is adjusted according to the global gain of the current lost frame and a subframe gain of the current lost frame.
In an alternative implementation manner, a subframe gain gradient of the current lost frame is determined according to the recovery information, and a subframe gain of the current lost frame is determined according to the subframe gain gradient and a subframe gain of each frame in previous N frames of the current lost frame. The synthesized high frequency band signal of the current lost frame is then adjusted according to the subframe gain of the current lost frame and the global gain of the current lost frame.
By using the above-described process, transition of a high frequency band signal of the current lost frame can be natural and smooth, and noise in the high frequency band signal can be attenuated, thereby improving quality of the high frequency band signal.
For other functions and operations of the decoder 500, refer to the processes in the method embodiments in
Number | Date | Country | Kind |
---|---|---|---|
201310297740.1 | Jul 2013 | CN | national |
This application is a continuation of International Application No. PCT/CN2014/070199, filed on Jan. 7, 2014, which claims priority to Chinese Patent Application No. 201310297740.1, filed on Jul. 16, 2013. Both of the applications are hereby incorporated by reference in their entireties.
 | Number | Date | Country
---|---|---|---
Parent | PCT/CN2014/070199 | Jan 2014 | US
Child | 14981956 | | US