This disclosure relates to the field of audio technology, and particularly relates to a method, apparatus, and device for transient noise detection.
Audio is a common means of human-computer interaction, but noise interference is almost always present in the working environment. Such noise degrades the performance of audio applications, so it is necessary to detect the noise for further processing.
In the prior art, transient noise detection mainly analyzes the energy of the signal over a period of time, based on the characteristic that the short-term energy of transient noise increases sharply. If there is a sharp change in the signal energy, the signal in this period of time is detected as transient noise. However, the beginning of a speech segment in the audio signal, that is, the position where speech starts, also exhibits a similar sudden energy change within a short period of time. As a result, schemes in the prior art may mistake speech onsets for transient noise, and their accuracy is not high enough.
In a first aspect, a method for transient noise detection is provided. The method includes: obtaining a first audio frame signal having a preset duration, the audio frame signal includes a plurality of samples and an audio intensity value of each sample; performing wavelet decomposition on the first audio frame signal to obtain a first wavelet decomposition signal corresponding to the first audio frame signal, the first wavelet decomposition signal includes a plurality of sub-wavelet decomposition signals, and each sub-wavelet decomposition signal includes a plurality of samples and an audio intensity value of each sample; determining a first reference audio intensity value of a first sub-wavelet decomposition signal according to reference audio intensity values of all samples in the first sub-wavelet decomposition signal; determining energy distribution information of the first wavelet decomposition signal according to first reference audio intensity values of all sub-wavelet decomposition signals in the first wavelet decomposition signal; and determining a probability that the first audio frame signal is transient noise according to the energy distribution information of the first wavelet decomposition signal.
In one implementation, obtaining the first audio frame signal having the preset duration includes: obtaining a first audio signal, the first audio signal includes at least one audio frame signal, the at least one audio frame signal includes the first audio frame signal; for each audio frame signal, performing wavelet decomposition to obtain a plurality of wavelet decomposition signals corresponding to each audio frame signal; and obtaining a wavelet signal sequence by splicing the wavelet decomposition signals corresponding to each audio frame signal according to a framing order of the at least one audio frame signal in the first audio signal.
The method further includes: obtaining a first minimum audio intensity value of a first preset number of consecutive samples in the wavelet signal sequence and a second minimum audio intensity value of a second preset number of consecutive samples in the wavelet signal sequence, where the first preset number of consecutive samples includes a target sample and is before the target sample in the wavelet signal sequence, the second preset number of consecutive samples includes the target sample and is after the target sample in the wavelet signal sequence, and determining a second reference audio intensity value according to the first minimum audio intensity value and the second minimum audio intensity value; determining an average reference audio intensity value of the first audio frame signal according to second reference audio intensity values of all samples in the first wavelet decomposition signal; and determining a first probability according to the average reference audio intensity value of the first audio frame signal.
Determining the probability that the first audio frame signal is transient noise according to the energy distribution information of the first wavelet decomposition signal includes: obtaining a second probability according to the energy distribution information of the first wavelet decomposition signal; and determining the probability that the first audio frame signal is transient noise according to the first probability and the second probability.
In one possible implementation, obtaining the first audio frame signal having the preset duration includes: obtaining a first audio signal, the first audio signal includes at least one audio frame signal, and the at least one audio frame signal includes the first audio frame signal.
The method further includes: dividing the first audio signal into a plurality of processing signals, where each processing signal includes a third preset number of consecutive samples, an audio intensity value of each sample, and a frequency value of each sample, where the first audio signal includes a plurality of audio frame signals; determining a first smooth audio intensity value of a target sample according to an audio intensity value of the target sample and an audio intensity value of a sample that is in a previous processing signal of a first processing signal where the target sample is located and that has the same frequency value as the target sample; determining an inhibition coefficient of the target sample according to a probability that an audio frame signal where the target sample is located is transient noise, the first smooth audio intensity value of the target sample, and the audio intensity value of the target sample; and performing suppression on an audio intensity value of each sample in an audio frame signal where the target sample is located to obtain a suppressed audio frame signal, according to inhibition coefficients of all samples in the audio frame signal where the target sample is located.
In one possible implementation, the method further includes: obtaining a probability that the first audio frame signal is transient noise and a probability that the second audio frame signal is transient noise, where the second audio frame signal is a previous audio frame signal of the first audio frame signal; and obtaining a first smoothing probability according to the probability that the first audio frame signal is the transient noise and the probability that the second audio frame signal is transient noise and using the first smoothing probability as the probability that the first audio frame signal is transient noise.
In one possible implementation, determining the average reference audio intensity value of the first audio frame signal according to the second reference audio intensity values of all samples in the wavelet decomposition signal includes: dividing the wavelet signal sequence into a plurality of signals to-be-smoothed, where each signal to-be-smoothed includes a fourth preset number of consecutive samples and an audio intensity value of each sample, each signal to-be-smoothed corresponds to a smoothing function, a time width of a definition domain of the smoothing function is not greater than a time width of the signal to-be-smoothed, and a maximum value of a first smoothing function in the smoothing functions is located at a center of a definition domain of the first smoothing function; determining an average of audio intensity values of all samples in a first signal to-be-smoothed as a first average reference audio intensity value of all samples in the first signal to-be-smoothed; and performing a convolution operation on the first average reference audio intensity value of all samples in each signal to-be-smoothed in the wavelet signal sequence and a corresponding smoothing function value to obtain a convolution result, and using the convolution result as an average reference audio intensity value of the first audio frame signal, where the smoothing function value is obtained according to the smoothing function and a time of a corresponding sample.
Optionally, before obtaining the first minimum audio intensity value of the first preset number of consecutive samples in the wavelet signal sequence, where the first preset number of consecutive samples includes the target sample and is before the target sample in the wavelet signal sequence, the method further includes: obtaining a third reference audio intensity value of the target sample by multiplying an audio intensity value of a previous sample of the target sample in the wavelet signal sequence by a smoothing coefficient; obtaining a fourth reference audio intensity value of the target sample by multiplying a remaining smoothing coefficient by an average of audio intensity values of all consecutive samples in the wavelet signal sequence which include the target sample and are spliced prior to the target sample in the wavelet signal sequence; and obtaining the audio intensity value of the target sample by adding the third reference audio intensity value to the fourth reference audio intensity value.
In one possible implementation, the reference audio intensity value includes an average and a variance of audio intensity values of a fifth preset number of consecutive samples.
In one possible implementation, the probability that the first audio frame signal is transient noise is expressed as
where result(n) represents energy distribution information of a wavelet decomposition signal corresponding to the nth audio frame signal, n represents a frame index indicating the nth audio frame signal, and λ represents a first preset threshold. If a value of result(n) is greater than the first preset threshold, the probability that the first audio frame signal is transient noise is 1.
In one possible implementation, the energy distribution information of the first wavelet decomposition signal corresponding to the first audio frame signal is expressed as
where l represents the number of sub-wavelet decomposition signals included in the first wavelet decomposition signal, N represents the number of samples included in each sub-wavelet decomposition signal, n represents a frame index indicating the nth audio frame signal, xl(i) represents an audio intensity value of the lth sub-wavelet decomposition signal at the ith sample in a wavelet decomposition signal, ml1(i−1) represents an average of audio intensity values till the (i−1)th sample in the lth sub-wavelet decomposition signal, ml2(i−1) represents a variance of audio intensity values till the (i−1)th sample in the lth sub-wavelet decomposition signal.
In one possible implementation, determining the probability that the first audio frame signal is transient noise according to the energy distribution information of the first wavelet decomposition signal includes: obtaining a first average of audio intensity values of all samples in a first sub-wavelet decomposition signal and a second average of audio intensity values of all samples in a second sub-wavelet decomposition signal; and determining the probability that the first audio frame signal is transient noise according to a ratio between the first average and the second average.
In one possible implementation, the second probability is expressed as
where thrg represents a second preset threshold, thrs represents a third preset threshold, n represents a frame index indicating the nth audio frame signal, Sc(n) represents an average reference audio intensity value of the nth audio frame signal.
In one possible implementation, before obtaining the first audio signal, the method further includes: compensating high-frequency components of a first preset threshold in an original audio signal having the preset duration to obtain the first audio signal.
In one possible implementation, performing wavelet decomposition on each audio frame signal includes: performing wavelet packet decomposition on each audio frame signal and using a signal obtained through wavelet packet decomposition as the wavelet decomposition signal.
In a second aspect, an apparatus for transient noise detection is provided. The apparatus includes an obtaining module, a decomposition module, and a determining module.
The obtaining module is configured to obtain a first audio frame signal having a preset duration, the first audio frame signal includes a plurality of samples and an audio intensity value of each sample.
The decomposition module is configured to perform wavelet decomposition on the first audio frame signal to obtain a first wavelet decomposition signal corresponding to the first audio frame signal, the first wavelet decomposition signal includes a plurality of sub-wavelet decomposition signals, and each sub-wavelet decomposition signal includes a plurality of samples and an audio intensity value of each sample.
The determining module is configured to determine a first reference audio intensity value of a first sub-wavelet decomposition signal according to reference audio intensity values of all samples in the first sub-wavelet decomposition signal.
The determining module is further configured to determine energy distribution information of the first wavelet decomposition signal according to first reference audio intensity values of all sub-wavelet decomposition signals in the first wavelet decomposition signal.
The determining module is further configured to determine a probability that the first audio frame signal is transient noise according to the energy distribution information of the first wavelet decomposition signal.
In a third aspect, a device for transient noise detection is provided. The device includes a transceiver, a processor, and a memory. The transceiver is coupled with the processor and the memory. The processor is coupled with the memory. The processor is configured to execute computer programs stored in the memory to carry out the method in any of the foregoing implementations.
In a fourth aspect, a non-transitory computer readable storage medium is provided. The non-transitory computer readable storage medium stores instructions which, when executed by a processor, are operable with the processor to carry out steps of the method in the foregoing aspects.
Technical solutions in embodiments of the disclosure will be clearly and completely described below in combination with the accompanying drawings of the disclosure. Obviously, the described embodiments are only part rather than all of the embodiments of the disclosure. Based on the embodiments provided herein, all other embodiments obtained by those skilled in the art without creative work belong to the protection scope of the application.
Disclosed herein are a method, apparatus, and device for transient noise detection, which count a preset number of continuous samples of a sub-wavelet decomposition signal in a wavelet decomposition signal corresponding to an audio frame signal and determine the probability that the audio frame signal is transient noise in a more refined time dimension, so that the accuracy of transient noise detection is improved.
Implementations of the technical scheme of the present application are further described in detail below in combination with the accompanying drawings.
A method for transient noise detection provided herein will be described with reference to
At 100, an audio frame signal having a preset duration is obtained, where the audio frame signal includes a plurality of samples and an audio intensity value of each sample. The audio frame signal is referred to as the first audio frame signal for ease of explanation. Specifically, an apparatus for transient noise detection obtains the audio frame signal having the preset duration, where the preset duration can be understood as the frame length of the audio frame signal. The apparatus for transient noise detection first obtains an original audio signal. Because oral muscle movement is slow relative to the audio frequency, the audio signal is relatively stable over a short time range, that is, the audio signal has short-term stability. Therefore, according to the short-term stability of the audio signal, framing can be performed on the audio signal to obtain audio frame signals each having the preset duration for detection. Optionally, there is no overlap between the audio frame signals, that is, the frame shift equals the frame length. Frame shift refers to the offset between the start of a previous frame signal and the start of a next frame signal. When the frame length equals the frame shift, there is no overlap between audio frames. In one possible implementation, the apparatus for transient noise detection samples the audio signal at a frequency of 32 kHz, that is, 32,000 samples are collected in one second. Framing is performed on the audio signal with a frame length of 10 ms and a frame shift of 10 ms. Each audio frame signal thus has a preset duration of 10 ms and includes 320 samples and an audio intensity value corresponding to each sample.
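The framing arithmetic above can be illustrated with a minimal sketch. The function name `split_into_frames` and its parameters are hypothetical and are not part of the disclosure; the sketch only assumes the 32 kHz sampling rate and the 10 ms frame length and frame shift given as an example.

```python
def split_into_frames(samples, sample_rate_hz=32000, frame_ms=10, shift_ms=10):
    """Split a sample sequence into frames of frame_ms, advancing by shift_ms."""
    frame_len = sample_rate_hz * frame_ms // 1000    # 320 samples at 32 kHz and 10 ms
    frame_shift = sample_rate_hz * shift_ms // 1000  # equal to frame_len, so no overlap
    frames = []
    start = 0
    # Only complete frames are kept; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += frame_shift
    return frames


# One second of a dummy (all-zero) signal sampled at 32 kHz.
signal = [0.0] * 32000
frames = split_into_frames(signal)
```

With these example values, one second of audio yields 100 non-overlapping frames of 320 samples each.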
At 101, wavelet decomposition is performed on a first audio frame signal to obtain a first wavelet decomposition signal corresponding to the first audio frame signal. The first wavelet decomposition signal includes a plurality of sub-wavelet decomposition signals, and each sub-wavelet decomposition signal includes a plurality of samples and an audio intensity value of each sample. Specifically, the audio frame signal is obtained at 100, then wavelet decomposition is performed on the first audio frame signal. Wavelet decomposition will be described below with reference to the accompanying drawings.
Referring to
The wavelet decomposition process will be described in detail. Exemplarily, wavelet decomposition is performed on one audio frame signal. Referring to
In one possible implementation, performing wavelet decomposition on each audio frame signal includes: performing wavelet packet decomposition on each audio frame signal and using a signal obtained through wavelet packet decomposition as the wavelet decomposition signal.
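As a rough illustration of how a three-level wavelet packet decomposition turns one 320-sample frame into eight sub-band signals of 40 samples each, the following sketch uses the Haar wavelet. The choice of the Haar wavelet and the function names `haar_step` and `wavelet_packet_decompose` are assumptions made for illustration only; the disclosure does not specify the wavelet basis.

```python
import math


def haar_step(signal):
    """One Haar analysis step: returns (approximation, detail), each half as long."""
    scale = 1.0 / math.sqrt(2.0)
    half = len(signal) // 2
    approx = [(signal[2 * k] + signal[2 * k + 1]) * scale for k in range(half)]
    detail = [(signal[2 * k] - signal[2 * k + 1]) * scale for k in range(half)]
    return approx, detail


def wavelet_packet_decompose(frame, levels=3):
    """Split every band at every level, giving 2**levels sub-band signals.

    The bands are returned in natural (filter-bank) order, which is not
    the same as ascending frequency order.
    """
    bands = [list(frame)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            approx, detail = haar_step(band)
            next_bands.extend([approx, detail])
        bands = next_bands
    return bands


# A 10 ms frame of a 1 kHz tone sampled at 32 kHz (320 samples).
frame = [math.sin(2 * math.pi * 1000 * t / 32000) for t in range(320)]
sub_bands = wavelet_packet_decompose(frame, levels=3)
```

Because the Haar transform is orthonormal, the total energy of the eight sub-bands equals the energy of the original frame, which is a convenient sanity check for such a sketch.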
Wavelet decomposition will be detailed below and reference can be made to
Process of wavelet packet decomposition in implementations of the disclosure will be detailed below. Exemplarily, wavelet packet decomposition is performed on one audio frame signal. Specifically, referring to
At 102, a first reference audio intensity value of the first sub-wavelet decomposition signal is determined according to the reference audio intensity values of all samples in the first sub-wavelet decomposition signal. Specifically, the reference audio intensity value includes an average and a variance of audio intensity values of a fifth preset number of consecutive samples.
Exemplarily, the fifth preset number is 3N−1, and the average $\hat{\mu}_l(i)$ and the variance $\hat{\Sigma}_l(i)$ of audio intensity values of the fifth preset number of consecutive samples are expressed as:
l represents the number of sub-wavelet decomposition signals included in the first wavelet decomposition signal. N represents the number of samples in each sub-wavelet decomposition signal. Optionally, the sampling frequency of the first audio frame signal is 32 kHz, the frame length of the audio frame is 10 ms, and the number of samples is 320. After three-level wavelet decomposition or wavelet packet decomposition, the number of samples in each sub-wavelet decomposition signal is N=40. xl(j) represents the audio intensity value of the jth sample after the lth sub-wavelet decomposition signal is spliced into a sub-wavelet signal sequence. j represents an index of a sample in the sub-wavelet signal sequence. The summation runs from j=i−(3N−1) to j=i, meaning that the average and the variance are calculated from the audio intensity values of the ith sample and the 3N−1 samples prior to it, which corresponds to the accumulation of three sub-wavelet decomposition signals. $\hat{\mu}_l(i)$ can be understood as the short-time average of all samples at the position of the ith sample in the lth sub-wavelet decomposition signal. $\hat{\Sigma}_l(i)$ can be understood as the short-time variance of all samples at the position of the ith sample in the lth sub-wavelet decomposition signal. It should be noted that the variance represented by $\hat{\Sigma}_l(i)$ is a variance in a broad sense, not the variance in the strict mathematical sense, in which the average is subtracted. In this implementation, $\hat{\Sigma}_l(i)$ simply squares the audio intensity values of the samples to obtain the degree of dispersion between the samples. ml1(i) represents an average of audio intensity values till the ith sample in the lth sub-wavelet decomposition signal. Mathematically, ml1(i) represents the first order moment of a variable, that is, its expected value, and in this disclosure it can be understood as $\hat{\mu}_l(i)$. ml2(i) represents a variance of audio intensity values till the ith sample in the lth sub-wavelet decomposition signal.
Mathematically, ml2(i) represents the second order moment of a variable, and in this disclosure it can be understood as $\hat{\Sigma}_l(i)$. According to the average ml1(i) and variance ml2(i) of audio intensity values of all samples in the first sub-wavelet decomposition signal, the first reference audio intensity value momentn(l) of the first sub-wavelet decomposition signal can be determined as:
xl(i) represents the audio intensity value of the ith sample of the lth sub-wavelet decomposition signal in the wavelet decomposition signal. i represents an index of a sample in a wavelet signal sequence, whereas j represents an index of a sample in a sub-wavelet signal sequence and is a temporary variable. Optionally, i≥j.
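The short-time average and the broad-sense variance described above can be sketched as sliding-window statistics over the spliced sub-band sequence. This is only an illustration: the helper name `short_time_moments` is hypothetical, and dividing by the number of samples in the window is an assumption, since the normalization is not stated in the text.

```python
def short_time_moments(sequence, i, window):
    """Mean and mean square of the `window` samples ending at index i (inclusive).

    The mean square plays the role of the "variance in a broad sense":
    the average is not subtracted before squaring.
    """
    start = max(0, i - (window - 1))
    segment = sequence[start:i + 1]
    mean = sum(segment) / len(segment)
    mean_square = sum(x * x for x in segment) / len(segment)
    return mean, mean_square


N = 40  # samples per sub-band after three-level decomposition of a 320-sample frame
# Three consecutive frames of one sub-band, spliced into a sub-wavelet signal sequence.
sub_band_sequence = [1.0] * N + [2.0] * N + [3.0] * N
# The window runs from j = i - (3N - 1) to j = i, that is, over 3N samples.
mean, mean_square = short_time_moments(sub_band_sequence, i=3 * N - 1, window=3 * N)
```

For this toy sequence the window covers all 120 samples, so the mean is 2 and the mean square is 14/3.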
At 103, energy distribution information of the first wavelet decomposition signal is determined according to the first reference audio intensity values of all sub-wavelet decomposition signals in the first wavelet decomposition signal. Specifically, the distribution of all samples in each sub-wavelet decomposition signal is calculated to estimate the degree of distribution concentration of the first audio frame signal. The first reference audio intensity values of all sub-wavelet decomposition signals in the first wavelet decomposition signal are obtained at step 102. Optionally, the energy distribution information of the first wavelet decomposition signal is determined according to the average of the first reference audio intensity values of all sub-wavelet decomposition signals in the first wavelet decomposition signal.
In one possible implementation, for example, three-level wavelet decomposition is performed on the first audio frame signal. The first wavelet decomposition signal corresponding to the first audio frame signal includes eight sub-wavelet decomposition signals. According to the first reference audio intensity values momentn(l) of all sub-wavelet decomposition signals in the first wavelet decomposition signal, the energy distribution information result(n) of the first wavelet decomposition signal is determined as:
l represents the number of sub-wavelet decomposition signals contained in the first wavelet decomposition signal. Optionally, l=8. N is the number of samples included in each sub-wavelet decomposition signal. n represents a frame index and indicates the nth audio frame signal. xl(i) represents the audio intensity value of the ith sample in the lth sub-wavelet decomposition signal. ml1(i−1) represents an average of audio intensity values till the (i−1)th sample in the lth sub-wavelet decomposition signal, and ml2(i−1) represents a variance of audio intensity values till the (i−1)th sample in the lth sub-wavelet decomposition signal.
At 104, according to the energy distribution information of the first wavelet decomposition signal, a probability that the first audio frame signal is transient noise is determined. Specifically, the energy distribution information of the first wavelet decomposition signal is obtained at step 103, and the energy distribution information represents a possible degree that a first audio frame signal corresponding to the first wavelet decomposition signal is transient noise. The energy distribution information is a value, which may be greater than 1. The probability that the first audio frame signal is transient noise is defined in a range from 0 to 1 according to the energy distribution information of the first wavelet decomposition signal.
According to the implementation, by counting the preset number of continuous samples in the wavelet decomposition signals corresponding to the audio frame signal and using the local microscopic characteristics of wavelet decomposition or wavelet packet decomposition, the audio frame signal can be detected in a finer time-dimension, and the accuracy of transient noise detection is improved.
In a possible implementation, the probability res(n) that the first audio frame signal is transient noise is determined according to the energy distribution information result(n) of the first wavelet decomposition signal as follows:
n represents the frame index and indicates the nth audio frame signal, λ represents a first preset threshold, result(n) is a specific value and represents the energy distribution information of a wavelet decomposition signal corresponding to the nth audio frame signal. If the value of result(n) is greater than the first preset threshold, then the probability that the first audio frame signal is transient noise is 1.
In another possible implementation, the probability res(n) that the first audio frame signal is transient noise is determined according to the energy distribution information result(n) of the first wavelet decomposition signal as follows:
n represents the frame index and indicates the nth audio frame signal, λ represents a first preset threshold, result(n) is a specific value and represents the energy distribution information of the first wavelet decomposition signal. If the value of result(n) is greater than the first preset threshold, then the probability that the first audio frame signal is transient noise is 1.
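The exact expressions are not reproduced here, so the following is only a hypothetical mapping that satisfies the stated properties: the output lies in the range from 0 to 1, it equals 1 once result(n) exceeds the threshold λ, and one variant applies a square operation, which changes the steepness of the curve. The function name `transient_probability` and the linear ratio are assumptions, not the formulas of the disclosure.

```python
def transient_probability(result, threshold, squared=False):
    """Map a non-negative energy-distribution value to [0, 1].

    Hypothetical mapping: the value is divided by the threshold and clipped,
    so any result above the threshold gives a probability of 1.
    """
    ratio = min(max(result / threshold, 0.0), 1.0)
    return ratio * ratio if squared else ratio


lam = 1.0  # assumed value of the first preset threshold
p_above = transient_probability(2.0, lam)                 # above the threshold
p_linear = transient_probability(0.5, lam)                # linear variant
p_squared = transient_probability(0.5, lam, squared=True) # squared variant
```

Below the threshold the squared variant gives a smaller probability than the linear one for the same input, so it responds less strongly to weak evidence.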
Formula 5 differs from Formula 6 in that a square operation is performed in Formula 5, so the steepness of the resulting curve is different. In the above two implementations, the probability that the first audio frame signal is transient noise can be defined in a range from 0 to 1, and the effect is shown in
In a possible implementation, transient noise can be detected as follows. A first average of audio intensity values of all samples in a first sub-wavelet decomposition signal and a second average of audio intensity values of all samples in a second sub-wavelet decomposition signal are obtained, and the probability that the first audio frame signal is transient noise is determined according to a ratio between the first average and the second average. Specifically, the first sub-wavelet decomposition signal and the second sub-wavelet decomposition signal correspond to different frequency bands of an audio frame signal. The main frequency band of a human voice signal falls into the range of 300 Hz to 3400 Hz, whereas the distribution of transient noise over the whole frequency band is relatively even. Exemplarily, the first sub-wavelet decomposition signal corresponds to a frequency band of 0 to 2 kHz, and the second sub-wavelet decomposition signal corresponds to a frequency band of 2 to 4 kHz. A ratio between the average of audio intensity values of all samples in the first sub-wavelet decomposition signal and the average of audio intensity values of all samples in the second sub-wavelet decomposition signal is determined, and the probability that the first audio frame signal is transient noise is determined according to this ratio. In one possible implementation, the wavelet decomposition signal corresponding to the audio frame signal includes multiple sub-wavelet decomposition signals. Optionally, ratios between any two sub-wavelet decomposition signals among all sub-wavelet decomposition signals in the wavelet decomposition signal are obtained, and the probability that the audio frame signal is transient noise is determined according to an average of the ratios.
In one possible implementation, the probability that the first audio frame signal is transient noise and the probability that the second audio frame signal is transient noise are determined, where the second audio frame signal is a previous audio frame signal of the first audio frame signal. A first smoothing probability is obtained according to the probability that the first audio frame signal is transient noise and the probability that the second audio frame signal is transient noise, and the first smoothing probability is used as the probability that the first audio frame signal is transient noise. Specifically, to reduce spikes in the transient noise probability distribution and ensure that detected transient noise appears relatively stable, the transient noise probability is smoothed. Exemplarily, if the probability that the second audio frame signal is transient noise is greater than the probability that the first audio frame signal is transient noise, then a first smoothing probability is obtained according to the probability that the second audio frame signal is transient noise and the probability that the first audio frame signal is transient noise. The probability that the first audio frame signal is transient noise is expressed as res(n). Ds(n) is a defined variable for recording the probability that the first audio frame signal is transient noise. The probability that the second audio frame signal (which is a previous audio frame signal of the first audio frame signal) is transient noise is Ds(n−1), and the smoothing probability is:
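The band-ratio idea can be sketched as follows. The use of the mean absolute value as the "average of audio intensity values", the small constant `eps`, and the function names are assumptions made for illustration; how the ratio is finally mapped to a probability is not specified here and is therefore left out.

```python
def mean_abs(values):
    """Average magnitude of the samples in one sub-band (assumed intensity measure)."""
    return sum(abs(v) for v in values) / len(values)


def band_ratio(first_band, second_band, eps=1e-12):
    """Ratio between the average intensity of two sub-bands; eps avoids division by zero."""
    return mean_abs(first_band) / (mean_abs(second_band) + eps)


# Speech-like frame: energy concentrated in the lower band.
speech_like = band_ratio([0.8, -0.9, 0.7, -0.6], [0.1, -0.1, 0.05, -0.05])
# Transient-like frame: energy spread evenly over both bands.
noise_like = band_ratio([0.5, -0.4, 0.6, -0.5], [0.5, -0.5, 0.4, -0.6])
```

A ratio close to 1 indicates an even spread across the two bands, which is the pattern the text associates with transient noise, while a ratio far from 1 indicates band-limited content such as speech.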
When n=0, Ds(0)=0. The transient noise probability Ds(n) is used as the first smoothing probability.
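Since the smoothing formula itself is not reproduced here, the following sketch shows one hypothetical recursion that is consistent with the description: when the previous smoothed probability is greater than the current probability, the two are blended so that the probability decays gradually; otherwise the current probability is taken directly. The blending weight `alpha` and the function name are assumptions.

```python
def smooth_probabilities(res, alpha=0.9):
    """Hypothetical recursive smoothing of per-frame transient noise probabilities.

    The state starts at 0, matching the initial condition Ds(0) = 0.
    """
    ds_prev = 0.0
    smoothed = []
    for p in res:
        if ds_prev > p:
            # Previous probability is higher: decay slowly toward the new value.
            ds = alpha * ds_prev + (1.0 - alpha) * p
        else:
            # Probability rises: follow the new value immediately.
            ds = p
        smoothed.append(ds)
        ds_prev = ds
    return smoothed


smoothed = smooth_probabilities([1.0, 0.0, 0.0], alpha=0.5)
```

With this choice a single high-probability frame is followed by a gradual decay rather than an abrupt drop, which reduces isolated spikes in the probability track.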
Optionally, the audio frame signal is a signal obtained after framing of an original audio signal. In one possible implementation, high-frequency components of the original audio signal having a preset length are compensated according to a first preset threshold, to obtain the first audio signal. Specifically, in the process of lip pronunciation or microphone recording, the speech signal loses high-frequency components, and as the signal rate increases, the signal is increasingly damaged during transmission. In order to obtain a better signal waveform at the receiver, it is necessary to compensate the damaged signal. In one possible implementation, pre-enhancement (also known as pre-emphasis) is performed on the original audio signal having the preset length. The audio intensity value of each sample is processed according to y(n)=x(n)−ax(n−1), where x(n) is the audio intensity value of the original audio signal at the nth moment, x(n−1) is the audio intensity value of the original audio signal at the (n−1)th moment, and a is a pre-enhancement coefficient. Exemplarily, 0.9<a<1, and a can be understood as the first preset threshold. y(n) is the signal after pre-enhancement, that is, the first audio signal. The pre-enhancement is equivalent to passing the original audio signal through a high-pass filter to compensate the high-frequency components, so that the high-frequency loss in lip pronunciation or microphone recording can be reduced.
In this implementation, the probability that the audio frame signal is transient noise is determined by counting the preset number of continuous samples of sub-wavelet decomposition signals in the wavelet packet decomposition signal corresponding to the audio frame signal and using the local microscopic characteristics of wavelet decomposition or wavelet packet decomposition, so that the accuracy of transient noise detection is improved.
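The filter y(n)=x(n)−ax(n−1) can be sketched directly. The value a=0.97 is an assumed example within the stated range 0.9<a<1, and treating the sample before the first one as zero is also an assumption.

```python
def pre_emphasize(x, a=0.97):
    """Apply y(n) = x(n) - a * x(n-1), taking x(-1) as 0."""
    y = []
    previous = 0.0
    for sample in x:
        y.append(sample - a * previous)
        previous = sample
    return y


# A constant (lowest-frequency) input is strongly attenuated after the first sample.
constant = pre_emphasize([1.0, 1.0, 1.0, 1.0])
# An alternating (highest-frequency) input is amplified.
alternating = pre_emphasize([1.0, -1.0, 1.0, -1.0])
```

The two toy inputs show the high-pass behavior: the constant signal settles near 1−a, while the alternating signal settles near a magnitude of 1+a.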
After determining the probability that the first audio frame signal is transient noise, the first audio frame signal is suppressed according to the probability that the first audio frame signal is transient noise. In one possible implementation, referring to
At 801, a first audio signal is obtained, where the first audio signal incudes at least one audio frame signal. The at least one audio frame signal includes the first audio frame signal. Specifically, the first audio signal is obtained by an apparatus for transient noise detection. It can be understood that, the transient noise probability determining device frames the first audio signal to obtain the first audio frame signal. Then in combination with the implementations of
At 802, the first audio signal is divided into multiple processing signals, where each processing signal includes a third preset number of continuous samples, and an audio intensity value and a frequency value of each sample. The first audio signal includes multiple audio frame signals. Specifically, to obtain a smooth noise suppression result, short-time Fourier transform is performed on the first audio signal. For example, the first audio signal is framed and a window function is applied. The “framing” here plays the same role as the “framing” described above, which is to divide the first audio signal into segments for processing. In the foregoing, wavelet decomposition is performed on the signal, while here a window function is applied to the signal. Optionally, the frame length for framing of the first audio signal is 16 ms and the frame shift is 10 ms. It can be understood that adjacent frames overlap. Optionally, the window function can be a Hamming window expressed as:
Where i represents a sample index of the first audio signal, and N represents the window length of the Hamming window. Optionally, N=512. The signal after the window function is applied can be expressed as:
yn(i)=y(Ln+i)×w(i) (Formula 9)
Where n represents a frame index, yn(i) represents an audio intensity value of the ith sample of the nth frame and is a time-domain representation, i represents a sample index of the first audio signal, and L represents the number of samples included in the time period of the frame shift. For example, when the sampling frequency of the first audio signal is 32 kHz, L=320.
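Framing and windowing according to Formula 9 can be sketched as follows. The window coefficients use the conventional Hamming definition 0.54−0.46×cos(2πi/(N−1)), which is an assumption here because the window formula of this disclosure is not reproduced in the text; the function names and the choice to drop a trailing partial frame are likewise illustrative. With a 32 kHz sampling frequency, a 16 ms frame gives N=512 and a 10 ms shift gives L=320.

```python
import math


def hamming(N):
    # Conventional Hamming window (assumed form, not taken from the source).
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (N - 1)) for i in range(N)]


def frame_and_window(y, N=512, L=320):
    """Return the windowed frames y_n(i) = y(L*n + i) * w(i).

    Assumption: a trailing segment shorter than N samples is dropped.
    """
    w = hamming(N)
    frames = []
    n = 0
    while L * n + N <= len(y):
        frames.append([y[L * n + i] * w[i] for i in range(N)])
        n += 1
    return frames
```

Because L is smaller than N, consecutive frames share N−L samples, which corresponds to the overlap between frames noted above.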
Fourier transform is performed on the signal yn(i) after windowing, and the result is:
Where n represents a frame index, k represents a frequency, j represents the imaginary unit in the Fourier transform formula, i represents a sample index of the first audio signal, and N represents the window length of the Hamming window and can be understood as the third preset number. The modulus of the complex sequence obtained by the Fourier transform is taken to obtain the amplitude of the sample with frequency k in the nth frame, which is expressed as Ya(n, k)=∥Y(n, k)∥. The amplitude can be understood as the audio intensity value of the sample. An exponential average is performed on the amplitude spectrum Ya(n, k) to obtain Ys(n, k) as the processing signal.
It can be understood that the processing signal contains multiple continuous samples as well as the audio intensity value and frequency value of each sample. Ys(n, k) represents the audio intensity value of the sample with frequency of k in the nth frame.
At 803, a first smooth audio intensity value of a target sample is determined according to (i) an audio intensity value of a sample that is in the processing signal preceding a first processing signal where the target sample is located and that has the same frequency value as the target sample, and (ii) an audio intensity value of the target sample. Specifically, the audio intensity value Ya(n, k) of the target sample, which has frequency k, is obtained in step 802. The first processing signal where the target sample is located is expressed as Ys(n, k), and the audio intensity value of the processing signal prior to the first processing signal is Ys(n−1, k). The first smooth audio intensity value of the target sample is (1−αa)×Ys(n−1, k)+αa×Ya(n, k), and it is used as the audio intensity value of the target sample in the first processing signal, that is, Ys(n, k)=(1−αa)×Ys(n−1, k)+αa×Ya(n, k). The first processing signal is determined according to the first smooth audio intensity values of all samples in the first processing signal. This smoothing is the exponential average mentioned in step 802. Optionally, αa ranges from 0 to 1, for example, αa=0.5.
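The recursion Ys(n, k)=(1−αa)×Ys(n−1, k)+αa×Ya(n, k) can be sketched as below. The input is assumed to be a list of frames, each a list of amplitudes Ya(n, k) indexed by frequency bin k; the function name and the initialization Ys(0, k)=Ya(0, k) are assumptions for illustration, since the initial value is not stated here.

```python
def smooth_amplitudes(Ya, alpha_a=0.5):
    """Exponentially average amplitudes over frames, per frequency bin.

    Ya[n][k] is the amplitude of bin k in frame n.
    Assumption: the first frame is used unchanged as Ys(0, k).
    """
    Ys = []
    prev = None
    for frame in Ya:
        if prev is None:
            cur = list(frame)
        else:
            # Ys(n, k) = (1 - alpha_a) * Ys(n-1, k) + alpha_a * Ya(n, k)
            cur = [(1 - alpha_a) * p + alpha_a * a for p, a in zip(prev, frame)]
        Ys.append(cur)
        prev = cur
    return Ys
```

A larger αa makes Ys(n, k) follow the current frame more closely, while a smaller αa gives more weight to the history of the same frequency bin.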
At 804, an inhibition coefficient of the target sample is determined according to a probability that an audio frame signal where the target sample is located is transient noise, the first smooth audio intensity value of the target sample, and the audio intensity value of the target sample. Specifically, in combination with implementations described with reference to
It should be noted that, res(n) represents the probability that the audio frame is transient noise. The first smoothing intensity value Ys(n, k) and the audio intensity value Ya(n, k) are in one-to-one correspondence with samples in an audio frame signal. One audio frame signal may include multiple samples, and each sample includes the first smoothing intensity value Ys(n, k) and the audio intensity value Ya(n, k). The value of the probability res(n) that the audio frame is transient noise is in one-to-multiple correspondence with the first smoothing intensity value Ys(n, k) and the audio intensity value Ya(n, k).
In one possible implementation, if the device for transient noise detection smooths the probability of transient noise, according to Formula 7, the smoothed probability that the target sample is transient noise is Ds(n). res(n) in Formula 11 is replaced with Ds(n), and the inhibition coefficient of the target sample is expressed as:
At 805, suppression is performed on an audio intensity value of each sample in an audio frame signal where the target sample is located to obtain a suppressed audio frame signal, according to inhibition coefficients of all samples in the audio frame signal where the target sample is located. Specifically, the inhibition coefficient of the target sample is determined in step 804. Formula 11 can be understood as determining the inhibition coefficient according to the degree to which the audio intensity value of a sample deviates from the audio intensity value of the sample of the same frequency in the processing signal prior to the processing signal where the target sample is located. When the target sample has a nonzero signal amplitude, that is, Ya(n, k)>0, and the audio intensity value of the target sample is greater than its audio intensity value in the processing signal, that is, Ya(n, k)>Ys(n, k), suppression is performed on the result Y(n, k) of the Fourier transform in step 802. Otherwise, when Ya(n, k)>Ys(n, k) or Ya(n, k)>0 is not satisfied, no suppression is performed on the result Y(n, k) of the Fourier transform, and the result is multiplied by 1 to maintain the original amplitude value of the target sample. Therefore, the suppressed audio signal is Z(n, k)=Y(n, k)×G(n, k), which is a frequency-domain expression. To obtain audio information in the time domain, an inverse Fourier transform needs to be performed on the suppressed audio signal, to obtain a time-domain signal expressed as:
z(n, i) represents the audio intensity value of the ith sample in the nth frame signal. Because a Hamming window function is applied to the first audio signal in step 802, the suppressed signal can optionally be inversely transformed with respect to the Hamming window, to output the signal z(Ln+i)=z(n, i)×winv(i) as the suppressed audio signal in the time domain. L represents the number of samples included in a time period of the frame shift. For example, when the sampling frequency of the first audio frame signal is 32 kHz, L=320. winv(i) is an inverse transform representation of the Hamming window w(i); the relationship between the two is analogous to that between the Fourier transform and the inverse Fourier transform.
In one possible implementation, high-frequency components of the original audio signal with a preset length are compensated according to a first preset threshold, to obtain the first audio signal. Specifically, during lip pronunciation or microphone recording, the speech signal loses high-frequency components, and as the signal rate increases, the signal is increasingly degraded during transmission. To obtain a better signal waveform at the receiver, the degraded signal needs to be compensated. In one possible implementation, pre-emphasis (pre-enhancement) is performed on the original audio signal with the preset length. The audio intensity value of each sample is processed according to y(n)=x(n)−a×x(n−1), where x(n) is the audio intensity value of the first audio signal at the nth moment, x(n−1) is the audio intensity value of the first audio signal at the (n−1)th moment, a is a pre-emphasis coefficient, and y(n) is the signal after pre-emphasis. For example, 0.9<a<1, and a can be understood as the first preset threshold. The pre-emphasis can be regarded as passing the first audio signal through a high-pass filter to compensate the high-frequency components, so that the high-frequency loss in lip pronunciation or microphone recording is reduced.
In this implementation, the inhibition coefficient of transient noise is determined according to the probability of transient noise. The implementations described above with reference to
At 901, a first audio signal is obtained. The first audio signal includes at least one audio frame signal, and for each audio frame signal, wavelet decomposition is performed to obtain a plurality of wavelet decomposition signals corresponding to that audio frame signal. Specifically, an apparatus for transient noise detection obtains the first audio signal with a preset length, and performs framing on the first audio signal to obtain the audio frame signal.
At 902, a wavelet signal sequence is obtained by splicing the wavelet decomposition signals corresponding to each audio frame signal according to a framing order of the at least one audio frame signal in the first audio signal.
It should be noted that, for details of wavelet decomposition on audio frame signals and obtaining the wavelet signal sequence by splicing the wavelet decomposition signals, reference can be made to the implementations described above with reference to
At 903, a first minimum audio intensity value of a first preset number of consecutive samples in the wavelet signal sequence and a second minimum audio intensity value of a second preset number of consecutive samples in the wavelet signal sequence are obtained, where the first preset number of consecutive samples includes a target sample and samples before the target sample in the wavelet signal sequence, and the second preset number of consecutive samples includes the target sample and samples after the target sample in the wavelet signal sequence; a second reference audio intensity value is then determined according to the first minimum audio intensity value and the second minimum audio intensity value. Specifically, to avoid misjudging the onset of a voice signal as transient noise, in addition to determining the probability that the current frame signal is transient noise in implementations described with reference to
For example, the duration of the signal to be tracked can be set in advance. It can be understood that a forward tracking signal spans the first preset number of consecutive samples, and a backward tracking signal spans the second preset number of consecutive samples. Optionally, the first preset number is the same as the second preset number. In the wavelet signal sequence, the samples before the target sample are divided into tracking signals each with a preset duration. The minimum audio intensity value of all samples in a first duration is recorded and passed to the tracking signal in the next preset duration. The minimum passed from the previous preset duration is compared with the audio intensity value of the first sample in this preset duration, the smaller of the two is recorded and compared with the audio intensity value of the next sample, and so on. Each time, the smaller audio intensity value is recorded and compared with the audio intensity value of the next sample, until the first minimum audio intensity value of the first preset number of consecutive samples is obtained. Similarly, the second preset number of consecutive samples after the target sample in the wavelet signal sequence are divided into tracking signals each with a preset duration, and the same operations are performed on them to obtain the second minimum audio intensity value of the second preset number of consecutive samples. The larger of the first minimum audio intensity value and the second minimum audio intensity value is determined as the second reference audio intensity value of the target sample. Forward tracking and backward tracking of the voice signal will be described below with reference to the accompanying drawings.
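The forward and backward minimum tracking described above can be sketched as follows. This is a simplified illustration: it keeps a single running minimum over each window rather than passing minima between sub-intervals of a preset duration, and the function name and argument names are hypothetical.

```python
def second_reference_value(seq, t, n_before, n_after):
    """Return the larger of the backward-window and forward-window minima.

    seq      : audio intensity values of the wavelet signal sequence
    t        : index of the target sample
    n_before : first preset number (window ends at the target sample)
    n_after  : second preset number (window starts at the target sample)
    """
    before = seq[max(0, t - n_before + 1): t + 1]
    after = seq[t: t + n_after]

    first_min = before[0]
    for v in before[1:]:
        # Keep the smaller value and compare it with the next sample.
        first_min = min(first_min, v)

    second_min = after[0]
    for v in after[1:]:
        second_min = min(second_min, v)

    return max(first_min, second_min)
```

Taking the larger of the two minima means that a sample is assigned a high reference value only when the signal stays high on at least one side of it, which is what distinguishes sustained speech from an isolated burst.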
At 904, an average reference audio intensity value of the first audio frame signal is determined according to second reference audio intensity values of all samples in the first wavelet decomposition signal. Specifically, the second reference audio intensity value of the target sample is determined in step 903, and the average of second reference audio intensity values of all samples in the first wavelet decomposition signal is calculated, to obtain the average reference audio intensity value of the first audio frame signal.
At 905, a first probability is determined according to the average reference audio intensity value of the first audio frame signal. Specifically, the average reference audio intensity value of the first audio frame signal is determined in step 904. Optionally, the first probability is:
thrg represents the second preset threshold, thrs represents the third preset threshold, n represents a frame index and indicates the nth audio frame signal, and Sc(n) represents the average reference audio intensity value of the nth audio frame signal. For example, thrg=2000 and thrs=0.02. It can be understood that the first probability is the probability that the first audio frame signal is a voice signal. The sum of the probability that the first audio frame signal is a voice signal and the probability that the first audio frame signal is transient noise is 1.
At 906, a second probability is obtained according to energy distribution information of the first wavelet decomposition signal. Specifically, the second probability is a probability that the first audio frame signal is transient noise. The second probability is determined to be res(n) through the step 104 described above with reference to
At 907, the probability that the first audio frame signal is transient noise is determined according to the first probability and the second probability. Specifically, the first probability, that is, the probability that the first audio frame signal is a voice signal, is ps(n), and the second probability, obtained from the energy distribution information, is res(n). The probability that the first audio frame signal is transient noise is ydetect(n)=res(n)×(1−ps(n)).
In one possible implementation, to reduce the influence of burrs (spikes) between audio frame signals, the frame signals are smoothed. Optionally, an apparatus for transient noise detection divides the wavelet signal sequence into multiple signals to-be-smoothed, where each signal to-be-smoothed includes a fourth preset number of consecutive samples and an audio intensity value of each sample. Each signal to-be-smoothed corresponds to one smoothing function. The time width of the domain of the smoothing function is not greater than the time width of the signal to-be-smoothed, and the maximum value of a first smoothing function among the smoothing functions is located at the center of the domain of the first smoothing function. Dividing into signals to-be-smoothed can be understood as framing, where the frame is movable and changes as the smoothing function moves. It can be understood that, because the smoothing function has a domain, all samples of the signals to-be-smoothed in the wavelet signal sequence can be smoothed by moving the smoothing function. For example, the smoothing function is:
M=2B+1, so M is an odd number, and the smoothing function sb(m) has its maximum value at the center point m=B. Optionally, B=3, which represents 30 ms. According to Formula 15, the domain of the smoothing function is 0˜M.
An average of audio intensity values of all samples in the first signal to-be-smoothed is used as a first average reference audio intensity value of all samples in the first signal to-be-smoothed. Specifically, Sm(i) represents the second reference audio intensity value of the ith sample in the wavelet signal sequence, and the second reference audio intensity values of all samples in the first signal to-be-smoothed are averaged. The first average reference audio intensity value of all samples in the first signal to-be-smoothed is represented as:
n represents a frame index and indicates the nth audio frame signal, N represents the number of samples in the sub-wavelet decomposition signal.
A convolution operation is performed on the first average reference audio intensity values of the signals to-be-smoothed in the wavelet signal sequence and the corresponding smoothing function values, and the convolution result is used as the average reference audio intensity value of the first audio frame signal. The smoothing function value is obtained according to the smoothing function and the time of the corresponding sample. Specifically, the independent variable of the smoothing function is m, the dependent variable is sb(m), the first average reference audio intensity value of the nth frame is represented as Sfrm(n), and the first average reference audio intensity value paired with sb(m) in the convolution is Sfrm(n−m). For example, the average reference audio intensity value of the first audio frame signal is Sc(n)=Σ_{m=0}^{M−1} sb(m)·Sfrm(n−m).
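The convolution Sc(n)=Σ sb(m)·Sfrm(n−m) over m=0, …, M−1 can be sketched as follows. Because the smoothing function of Formula 15 is not reproduced in the text, `triangular_window` is a hypothetical stand-in that only shares the stated properties (length M=2B+1, maximum at m=B); skipping terms with n−m<0 at the start of the sequence is also an assumption.

```python
def triangular_window(B):
    """Hypothetical smoothing weights: length 2B+1, peak at m=B, sum 1."""
    raw = [B + 1 - abs(m - B) for m in range(2 * B + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def smooth_frames(Sfrm, sb):
    """Compute Sc(n) = sum over m of sb(m) * Sfrm(n - m).

    Assumption: terms that would need frames before n = 0 are skipped.
    """
    M = len(sb)
    Sc = []
    for n in range(len(Sfrm)):
        acc = 0.0
        for m in range(M):
            if n - m >= 0:
                acc += sb[m] * Sfrm[n - m]
        Sc.append(acc)
    return Sc
```

With B=3 the window spans M=7 frames, so each Sc(n) blends the current frame value with the six preceding ones, weighted most heavily three frames back.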
In one possible implementation, time-domain amplitude smoothing is performed on the samples in the wavelet signal sequence, to achieve a smooth transition between adjacent samples of a voice signal and reduce the influence of burrs on the voice signal. In one possible implementation, the apparatus for transient noise detection multiplies an audio intensity value of the sample preceding the target sample in the wavelet signal sequence by a smoothing coefficient to obtain a third reference audio intensity value of the target sample. Specifically, S(i−1) represents the audio intensity value of the sample preceding the target sample, and αs represents the smoothing coefficient. The audio intensity value S(i−1) is multiplied by the smoothing coefficient αs to obtain the third reference audio intensity value of the target sample, which is αs×S(i−1).
The remaining smoothing coefficient is multiplied by an average of audio intensity values of consecutive samples in the wavelet signal sequence, where the consecutive samples include the target sample and samples prior to the target sample, to obtain a fourth reference audio intensity value of the target sample. Specifically, the third reference audio intensity value is one part of the time-domain smoothing result, and the product of the remaining smoothing coefficient and this average is the other part. For example, 3-level wavelet packet decomposition is performed on the first audio signal, so that the wavelet signal sequence includes eight wavelet packet decomposition signals. In this case, the average M(i) is:
M(i)=(1/8)×Σ_{l=1}^{8} xl(i) (Formula 17)
In Formula 17, i represents the ith sample in the wavelet signal sequence, and l represents the lth sub-wavelet decomposition signal. It can be understood that i is less than the total number of samples in the wavelet signal sequence. The remaining smoothing coefficient 1−αs is multiplied by the average M(i) to obtain the fourth reference audio intensity value of the target sample, which is M(i)×(1−αs).
The third reference audio intensity value is added with the fourth reference audio intensity value, the result thus obtained is used as the audio intensity value of the target sample. Specifically, the third reference audio intensity value is αs×S(i−1) and the fourth reference audio intensity value is M(i)×(1−αs), the audio intensity value of the target sample is obtained by adding the third reference audio intensity value and the fourth reference audio intensity value: S(i)=αs×S(i−1)+M(i)×(1−αs).
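The time-domain smoothing S(i)=αs×S(i−1)+M(i)×(1−αs), with M(i) taken as the average over the sub-wavelet decomposition signals at index i, can be sketched as below. The function names are hypothetical; the starting value S(0)=M(0) and the default αs=0.7 follow the initialization and optional value given for the minimum tracking procedure in this disclosure.

```python
def band_average(subbands, i):
    """M(i): mean of x_l(i) over all sub-wavelet decomposition signals."""
    return sum(x[i] for x in subbands) / len(subbands)


def smooth_sequence(subbands, alpha_s=0.7):
    """Return S(i) = alpha_s * S(i-1) + (1 - alpha_s) * M(i) for all i."""
    n = len(subbands[0])
    S = [band_average(subbands, 0)]  # S(0) = M(0)
    for i in range(1, n):
        third = alpha_s * S[i - 1]                         # third reference value
        fourth = (1 - alpha_s) * band_average(subbands, i)  # fourth reference value
        S.append(third + fourth)
    return S
```

For a 3-level wavelet packet decomposition, `subbands` would hold eight equally long lists, one per wavelet packet decomposition signal.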
In one possible implementation, a probability that the first audio frame signal is transient noise and a probability that a second audio frame signal is transient noise are obtained, where the second audio frame signal is the audio frame signal preceding the first audio frame signal. A first smoothing probability is obtained according to these two probabilities, and the first smoothing probability is used as the probability that the first audio frame signal is transient noise. Specifically, to reduce burrs in the transient noise probability distribution and ensure that detected transient noise appears relatively stable, the transient noise probability is smoothed. For example, if the probability that the second audio frame signal is transient noise is greater than the probability that the first audio frame signal is transient noise, the first smoothing probability is obtained according to both probabilities. The probability that the first audio frame signal is transient noise is expressed as ydetect(n). Ds(n) is a variable defined for recording the probability that the first audio frame signal is transient noise. The probability that the second audio frame signal is transient noise is Ds(n−1), and the smoothing probability is:
When n=0, Ds(0)=0. The probability Ds(n) of the transient noise is the first smoothing probability.
In one possible implementation, the transient noise can be detected as follows: a first average of audio intensity values of all samples in a first sub-wavelet decomposition signal and a second average of audio intensity values of all samples in a second sub-wavelet decomposition signal are obtained, and the probability that the first audio frame signal is transient noise is determined according to a ratio between the first average and the second average. Specifically, the first sub-wavelet decomposition signal and the second sub-wavelet decomposition signal correspond to different frequency bands of the audio frame signal, whereas the main frequency band of a human voice signal falls mainly in the range of 300 Hz to 3400 Hz. For example, the first sub-wavelet decomposition signal corresponds to a frequency band of 0˜2 kHz, and the second sub-wavelet decomposition signal corresponds to a frequency band of 2˜4 kHz. The ratio between the average of audio intensity values of all samples in the first sub-wavelet decomposition signal and the average of audio intensity values of all samples in the second sub-wavelet decomposition signal is determined, and the probability that the first audio frame signal is transient noise is determined according to this ratio. In one possible implementation, the wavelet decomposition signal corresponding to the audio frame signal includes multiple sub-wavelet decomposition signals. Optionally, ratios between any two sub-wavelet decomposition signals among all sub-wavelet decomposition signals in the wavelet decomposition signal are obtained, and the probability that the audio frame signal is transient noise is determined according to an average of the ratios.
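The ratio computation can be sketched as follows. The mapping from the ratio to a probability is not specified in this passage and is therefore left out; the function names, the small constant `eps` guarding against division by zero, and the choice to take each unordered pair once (lower-indexed band over higher-indexed band) are assumptions for illustration.

```python
def band_mean_ratio(first_band, second_band, eps=1e-12):
    """Ratio of the average intensity of two sub-wavelet decomposition signals."""
    first_avg = sum(first_band) / len(first_band)
    second_avg = sum(second_band) / len(second_band)
    return first_avg / (second_avg + eps)


def mean_pairwise_ratio(bands, eps=1e-12):
    """Average of the ratios over all pairs of bands (needs at least two bands)."""
    ratios = []
    for a in range(len(bands)):
        for b in range(a + 1, len(bands)):
            ratios.append(band_mean_ratio(bands[a], bands[b], eps))
    return sum(ratios) / len(ratios)
```

Since voice energy is concentrated roughly between 300 Hz and 3400 Hz, a frame whose energy is spread very differently across the bands yields a ratio that departs from the values typical of speech.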
In one possible implementation, high-frequency components of the original audio signal with a preset length are compensated according to a first preset threshold, to obtain the first audio signal. Specifically, during lip pronunciation or microphone recording, the speech signal loses high-frequency components, and as the signal rate increases, the signal is increasingly degraded during transmission. To obtain a better signal waveform at the receiver, the degraded signal needs to be compensated. In one possible implementation, pre-emphasis (pre-enhancement) is performed on the original audio signal with the preset length according to y(n)=x(n)−a×x(n−1), where x(n) is the audio intensity value of the first audio signal at the nth moment, x(n−1) is the audio intensity value of the first audio signal at the (n−1)th moment, a is a pre-emphasis coefficient, and y(n) is the signal after pre-emphasis. For example, 0.9<a<1, and a can be understood as the first preset threshold. The pre-emphasis can be regarded as passing the first audio signal through a high-pass filter to compensate the high-frequency components, so that the high-frequency loss in lip pronunciation or microphone recording is reduced.
In this implementation, the probability of a voice signal is determined by forward tracking and backward tracking of the distribution of audio intensity values of the voice signal with a preset duration, and the probability that the audio frame signal is transient noise is determined according to the probability that the audio frame signal is a voice signal and the probability that the audio frame signal is transient noise. In this way, false detection of the initial position of a voice signal as transient noise can be avoided, and the accuracy of the transient noise probability is further improved.
In one possible implementation, after determining the probability that the first audio frame signal is transient noise, the first audio frame signal is suppressed according to the probability that the first audio frame signal is transient noise. In one possible implementation, the first audio frame signal can be suppressed in combination with the implementation described with reference to
Specifically, the probability ydetect(n) that the audio frame signal where the target sample is located is transient noise is determined through the implementation of
In one possible implementation, if the apparatus for transient noise detection performs smoothing on the transient noise probability, the smoothing probability Ds(n) that the target sample is transient noise is determined according to Formula 18, and the inhibition coefficient G(n, k) of the target sample is determined according to Formula 12.
Suppression is performed on an audio intensity value of each sample in an audio frame signal where the target sample is located to obtain a suppressed audio frame signal, according to inhibition coefficients of all samples in the audio frame signal where the target sample is located.
It can be understood that, suppression of transient noise can be realized with reference to the implementation described with reference to
According to this implementation, tracking and smoothing in spectral domain are performed on audio intensity values of preset number of consecutive samples prior to the target sample and preset number of consecutive samples after the target sample in the wavelet signal sequence, the probability that the audio frame signal is a voice signal is determined according to all samples in the wavelet decomposition signal corresponding to the audio frame signal, and the probability that the audio frame signal is transient noise is affected by the probability that the audio frame signal is a voice signal, which can improve the accuracy of transient noise detection.
Forward tracking and backward tracking of a voice signal will be described below with reference to the accompanying drawings. Reference is made to
1000a, the audio intensity value of each of the first preset number of consecutive samples before the target sample in the wavelet signal sequence is obtained. Specifically, the audio intensity value of each sample before the target sample is obtained according to the location of the target sample in the wavelet signal sequence, and the process proceeds to step 1001a.
1000b, the audio intensity value of each of the second preset number of consecutive samples after the target sample in the wavelet signal sequence is obtained. Specifically, the audio intensity value of each sample after the target sample is obtained according to the location of the target sample in the wavelet signal sequence, and the process proceeds to step 1001b.
1001a, a first minima controlled recursive averaging (MCRA) is performed. The input of the first MCRA is the audio intensity values of the first preset number of samples before the target sample in the wavelet signal sequence, and the first MCRA aims to obtain the minimum of these audio intensity values. MCRA will be introduced with reference to the drawings in the following implementations.
1001b, a second MCRA is performed. The input of the second MCRA is the audio intensity values of the second preset number of samples after the target sample in the wavelet signal sequence, and the second MCRA aims to obtain the minimum of these audio intensity values. The first MCRA and the second MCRA can be considered the same procedure with different inputs and outputs but the same purpose, that is, obtaining the minimum of the audio intensity values of a preset number of samples. MCRA will be introduced with reference to the drawings in the following implementations.
1002a, a first minimum audio intensity value of the first preset number of consecutive samples is determined as Smin. Specifically, the result of the first MCRA in step 1001a is Smin, which is used as the first minimum audio intensity value of the first preset number of consecutive samples.
1002b, a second minimum audio intensity value of the second preset number of consecutive samples is determined as Suc_min. Specifically, the result of the second MCRA in step 1001b is Suc_min, which is used as the second minimum audio intensity value of the second preset number of consecutive samples.
1003, a larger one among the first minimum audio intensity value and the second minimum audio intensity value is obtained as the second reference audio intensity value of the target sample.
1004, a probability that the first audio frame is a voice signal is determined according to second reference audio intensity values of all samples in the first audio frame signal, to determine a probability that the first audio frame is transient noise. Specifically, reference can be made to the implementation of
MCRA will be detailed below. Reference is made to
10011, an apparatus for transient noise detection defines a sample index i=0, initializes the audio intensity value of the sample S(0)=M(0) and a sample accumulating index imod=0. Specifically, i=0, S(0)=M(0), imod=0. It can be considered that, in an initial state of the apparatus for transient noise detection, initial values of samples to be traversed and corresponding audio intensity values are defined, and the sample accumulating index is for controlling a preset duration. As such, when the value of sample accumulating index imod reaches a certain value, data will be updated and tracking of a signal with a preset duration is completed.
10012, i=i+1, the audio intensity value of the ith sample is S(i)=αs×S(i−1)+M(i)×(1−αs). Specifically, audio intensity value of a sample is tracked, in other words, energy distribution is tracked, i=i+1. Amplitude smoothing is performed on each traversed sample, and the smoothed audio intensity value of the ith sample is S(i)=αs×S(i−1)+M(i)×(1−αs). Optionally, αs=0.7.
10013, determine whether i is less than the accumulating number of samples Vwin. Specifically, in this implementation, tracking is performed on a signal with a preset duration, therefore the samples need to be accumulated. The accumulating number Vwin of samples is predefined, for example, Vwin=20. For the 0 to 19th sample, the operation at step 10013a is performed, and when traversing the 20th sample, the operation at step 10013b is performed.
10013a, if i<Vwin, define Emin=S(i), Emact=S(i). Specifically, traversal starts from the first sample in the wavelet signal sequence, and audio intensity smoothing is performed on each sample. If i<Vwin, the value of S(i) is assigned to both Emin and Emact, that is, Emin=S(i), Emact=S(i), and the procedure proceeds to step 10014 for sample accumulation. For example, as i is incremented, the apparatus for transient noise detection keeps tracking the audio intensity values of samples; i<Vwin indicates that the sample is among the first Vwin samples of the first audio signal, for example, Vwin=20. When the 19th sample is traversed, Emin=S(19), Emact=S(19), so that Emin and Emact record the audio intensity value of the 19th sample.
10013b, obtain the minimum audio intensity value from the Vwin-th sample to the ith sample, Emin=min(Emin, S(i)), Emact=min(Emact, S(i)). Specifically, this step applies if i≥Vwin. For example, with Vwin=20, when the 20th sample is traversed in step 10013, the smaller one of the values of the 19th sample and the 20th sample is assigned to Emin, that is, Emin=min(Emin, S(20)), where Emin holds the value S(19) recorded in the previous pass through step 10013.
10014, imod=imod+1. Specifically, while sample i is traversed, the sample accumulating index imod is also incremented, imod=imod+1, and imod controls whether the data in the matrix SW is updated. The wavelet signal sequence is divided into signals each of a preset duration for tracking. It can be understood that i represents the position and order of a sample in the wavelet signal sequence, and imod represents the position and order of the ith sample within the preset duration. When the preset duration is reached, imod is reset so as to record the position of a sample within the next preset duration.
10015, determine whether imod=Vmin. Specifically, imod is compared with Vmin to determine whether tracking has reached the preset duration. For example, if 3-level wavelet packet decomposition with down-sampling is performed on a first audio signal sampled at 32 kHz, the samples in the wavelet signal sequence are spaced 0.25 ms apart; with the sample accumulating number Vwin=20, the tracking duration is Vwin×0.25 ms=5 ms. If imod=Vmin, the preset tracking duration has been reached and the procedure proceeds to step 10017a; if imod≠Vmin, optionally, if imod<Vmin, the procedure proceeds to step 10017b.
10016, imod=0. Specifically, each time imod reaches the sample accumulating number Vwin, imod is released. Reset imod=0 for next sample accumulation.
10017, determine whether i=Vmin. Specifically, when i=Vmin, proceed to step 10017a, initialize matrix data; when i≠Vmin, proceed to step 10017b.
10017a, initialize matrix SW. Specifically, SW is defined as:
When i=Vmin, a matrix SW of Nwin rows and one column is defined; optionally, Nwin=2. It can be understood that this step occurs at the beginning of a voice signal: i keeps accumulating while Vwin is a preset fixed value, and when the Vwin-th sample is traversed, the matrix SW is initialized to store data in this implementation.
10017b, data in the matrix SW is updated and the minimum value Emin=min{SW} in the matrix is recorded, reset Emact=S(i). Specifically, SW is:
When i≠Vmin and imod has accumulated to the preset duration, the values in the matrix SW are updated so that the matrix holds the minimum value of all samples in the current duration and the minimum value in the previous duration, thereby tracking the energy of the samples included in a preset duration before the target sample. The smaller one of these two minimum values is obtained and recorded in Emin, where Emin=min{SW}. It can be understood that Emin records the minimum value of all samples starting from the previous sample of Vmin; Emact is then released and reset as Emact=S(i). For example, with a tracking duration of 5 ms, Emact records the minimum audio intensity value of all samples in the most recent 5 ms, the minimum value of the adjacent 5 ms is placed in the matrix SW with a length of 2, and the smaller one of these two minimum values is obtained and recorded in Emin, Emin=min{SW}. As such, in the first MCRA, Emin represents the first minimum audio intensity value Smin of the first preset number of consecutive samples.
In the second MCRA, the second preset number of consecutive samples starting from the target sample are tracked, and one MCRA procedure is performed for each sample to obtain Emin, where Emin represents the second minimum audio intensity value Suc_min of the second preset number of samples. Specifically, before sample accumulation, the position of the sample in the wavelet signal sequence is determined, and it is determined whether there are still a second preset number of consecutive samples after sample i. For example, the condition for determination is:
i<Ls−Nuc Formula 22
Ls is the number of samples in the wavelet signal sequence. For example, if the sampling frequency of the first audio signal is 32 kHz and 3-level wavelet decomposition is performed, then for one second of audio, Ls=4000. Nuc represents the second preset number of consecutive samples. Optionally, Nuc=160.
If i<Ls−Nuc, the second preset number of consecutive samples starting from the target sample are tracked, and the audio intensity values corresponding to these samples are recorded as an independent short-time sequence, which is represented as:
=[M(i)M(i+1) . . . M(i+Nuc−1)]1×N
Nuc represents the second preset number of consecutive samples. Optionally, Nuc=160. M(i) represents the audio intensity value of the ith sample. It can be understood that the energy distribution of the Nuc samples is tracked to obtain the second minimum audio intensity value Suc_min of the second preset number of samples, which is expressed as:
Suc_min=MCRA() Formula 24
Formula 24 can be understood as follows. The output Emin of MCRA is assigned to Suc_min as the second minimum audio intensity value of the second preset number of consecutive samples. As such, the second MCRA obtains the second minimum audio intensity value of the second preset number of consecutive samples after the target sample.
10018, determine whether i≥the total number of samples. Specifically, before the signal in the next preset time period is tracked again from step 10011, the position of the sample in the wavelet signal sequence needs to be determined, that is, whether the index i of the ith sample is greater than or equal to the total number of samples in the wavelet signal sequence. Since i continues to be incremented by 1 and the traversal moves toward the end of the sequence, signal tracking continues if i is less than the total number of samples in the wavelet signal sequence. If the ith sample is the last of all samples, that is, i is equal to or greater than the total number of samples, the above procedure ends and the signal tracking of the wavelet signal sequence is completed.
10019, determine Emin as the minimum audio intensity value. Specifically, the audio intensity values of the preset number of samples are recorded in a matrix, and the minimum value in the matrix is obtained and assigned to Emin, thereby obtaining the first minimum audio intensity value and the second minimum audio intensity value. As can be seen from step 10017b, during the first MCRA, the first minimum audio intensity value of the first preset number of samples before the target sample in the wavelet signal sequence is obtained according to Formula 21, and the value of Emin is Smin. During the second MCRA, according to Formula 23 and Formula 24, the value of Emin that is output is Suc_min, which represents the second minimum audio intensity value of the second preset number of samples after the target sample in the wavelet signal sequence. As such, tracking of the energy distribution of the samples before the target sample and of the samples after the target sample is completed.
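As an illustration only, steps 10011 to 10019 can be sketched in Python as follows. The function name `track_minimum` and the parameter names are hypothetical. Because the defining formulas for the matrix SW are not reproduced in the text above, the sketch assumes that SW holds the Nwin most recent per-window minima and is initialized by filling it with the current value of Emact. It also assumes that the comparisons written with Vmin use the same value as Vwin.

```python
def track_minimum(m, alpha_s=0.7, v_win=20, n_win=2):
    """Return the tracked minimum Emin after each sample of the sequence m."""
    if not m:
        return []
    s = m[0]                      # step 10011: S(0) = M(0)
    e_min = e_act = s
    sw = []                       # window-minimum buffer (matrix SW)
    i_mod = 0                     # sample accumulating index
    out = [e_min]
    for i in range(1, len(m)):
        # step 10012: recursive amplitude smoothing
        s = alpha_s * s + (1.0 - alpha_s) * m[i]
        if i < v_win:
            e_min = e_act = s     # step 10013a
        else:
            e_min = min(e_min, s) # step 10013b
            e_act = min(e_act, s)
        i_mod += 1                # step 10014
        if i_mod == v_win:        # step 10015: one tracking window completed
            i_mod = 0             # step 10016
            if i == v_win:
                # step 10017a (assumed initialization of SW)
                sw = [e_act] * n_win
            else:
                # step 10017b (assumed update): drop the oldest window
                # minimum, append the newest, then restart Emact
                sw = sw[1:] + [e_act]
                e_min = min(sw)
                e_act = s
        out.append(e_min)
    return out
```

With the optional values αs=0.7, Vwin=20, and Nwin=2, the tracked minimum follows a drop in intensity almost immediately but does not rise in response to a burst shorter than the tracking window, which is the property the later comparison with the voice probability relies on.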
Next, in combination with steps 1003 and 1004 in the implementation of
Sm(i)=max{Suc_min,Smin} Formula 25
If there are fewer than the second preset number of consecutive samples after the ith sample, the first minimum audio intensity value is determined as the second reference audio intensity value of the target sample. Specifically, as the samples are traversed, the number of samples remaining after sample i decreases, and when the condition i<Ls−Nuc in Formula 22 is no longer satisfied, the second reference audio intensity value of the target sample is:
Sm(i)=Smin Formula 26
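The selection expressed by Formulas 22, 25, and 26 can be sketched as follows. This is a minimal illustration; the function names are hypothetical, and the helper `samples_per_second` assumes that each decomposition level halves the sample rate, which matches the example of 32 kHz and 3 levels giving Ls=4000 for one second.

```python
def samples_per_second(sample_rate_hz, levels):
    """Samples per second after down-sampling by 2 at each level (assumption)."""
    return sample_rate_hz // (2 ** levels)


def second_reference_intensity(i, s_min, suc_min, num_samples, n_uc=160):
    """Second reference audio intensity value Sm(i) of the target sample i."""
    if i < num_samples - n_uc:          # Formula 22: N_uc samples remain after i
        return max(suc_min, s_min)      # Formula 25
    return s_min                        # Formula 26


ls = samples_per_second(32000, 3)       # 4000 samples for one second
print(second_reference_intensity(100, 0.2, 0.5, ls))   # uses Formula 25
print(second_reference_intensity(3900, 0.2, 0.5, ls))  # uses Formula 26
```

Near the end of the sequence only the backward-looking minimum Smin is available, so the sketch falls back to it rather than evaluating a shortened forward window.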
The first average reference audio intensity value is determined according to the second reference audio intensity value Sm(i) of the target sample and Formula 16, and then the average reference audio intensity value of the first audio frame signal is determined. Next, the probability that the first audio frame is a voice signal is determined according to Formula 14, and the probability that the first audio frame signal is transient noise is then determined according to the probability of the voice signal and the probability of the transient noise: ydetect=res(n)×(1−ps(n)).
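The combination ydetect=res(n)×(1−ps(n)) can be illustrated with a short sketch. The function name is hypothetical, and the range check assumes that both res(n) and ps(n) are probabilities in [0, 1], which the text implies but does not state explicitly.

```python
def transient_probability(res_n, ps_n):
    """Combine the transient-noise probability res(n) with the voice probability ps(n)."""
    # Assumption: both inputs are probabilities in the closed interval [0, 1].
    if not (0.0 <= res_n <= 1.0 and 0.0 <= ps_n <= 1.0):
        raise ValueError("res_n and ps_n are expected to lie in [0, 1]")
    return res_n * (1.0 - ps_n)


# A frame flagged as transient by the energy distribution (res = 1) but
# judged to be voice with certainty (ps = 1) is not reported as transient.
print(transient_probability(1.0, 1.0))
print(transient_probability(1.0, 0.0))
```

The product form means that a high voice probability suppresses the detection result, which is how a speech onset with a sudden energy change avoids being reported as transient noise.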
In this implementation, the minimum value Smin of the audio intensity values of all samples in the previous tracking duration is transferred to the current tracking duration through a matrix. Smin is compared with the audio intensity value of the first sample in the current tracking duration, the smaller one of the two is further compared with the audio intensity value of the sample following the first sample, and so on. In this way, the first minimum audio intensity value of the first preset number of samples, which include the target sample and are before the target sample in the wavelet signal sequence, is obtained. In addition, the second minimum audio intensity value of the second preset number of consecutive samples after the target sample in the wavelet signal sequence is determined, and an independent short-time sequence is formed by recording the second preset number of consecutive samples. Tracking is then initiated, and a matrix is used to track the audio intensity values of the second preset number of consecutive samples recorded in the short-time sequence; the principle is similar to that of tracking the first preset number of consecutive samples spliced before the target sample in the wavelet signal sequence. The second minimum audio intensity value Suc_min in the current tracking duration is transferred to the next tracking duration, Suc_min is compared with the audio intensity value of the first sample in the next tracking duration, the smaller one of the two is compared with the audio intensity value of the sample following the first sample, and so on. In this way, the second minimum audio intensity value of the second preset number of samples, which include the target sample and are after the target sample in the wavelet signal sequence, is obtained. The larger one of the first minimum audio intensity value and the second minimum audio intensity value is obtained as the second reference audio intensity value Sm(i) of the target sample.
The sample sequence composed of Sm(i) can describe the distribution of audio intensity values of the voice signal, or can be comprehended as the energy distribution tendency of the voice signal. The probability that the audio frame is a voice signal can be determined according to second reference audio intensity values of all samples in the audio frame, so as to determine the probability that the audio frame is transient noise.
In this implementation, the probability that the audio frame signal is a voice signal is detected by tracking the energy distribution of a signal with a stable duration, and the probability that the audio frame is transient noise can be determined according to the probability that the signal frame is a voice signal and the probability that the signal frame is transient noise. This avoids falsely detecting an audio frame of a voice signal as transient noise and can further improve the accuracy of transient noise detection.
Effects of the implementation will be described with reference to
Referring to
An apparatus for transient noise detection provided in implementations of the disclosure will be described below. Referring to
The obtaining module 1401 is configured to obtain an audio frame signal having a preset duration, the audio frame signal includes a plurality of samples and an audio intensity value of each sample.
The decomposition module 1402 is configured to perform wavelet decomposition on a first audio frame signal to obtain a first wavelet decomposition signal corresponding to the first audio frame signal, the first wavelet decomposition signal includes a plurality of sub-wavelet decomposition signals, and each sub-wavelet decomposition signal includes a plurality of samples and an audio intensity value of each sample.
The determining module 1403 is configured to determine a first reference audio intensity value of a first sub-wavelet decomposition signal according to reference audio intensity values of all samples in the first sub-wavelet decomposition signal.
The determining module 1403 is configured to determine energy distribution information of the first wavelet decomposition signal according to first reference audio intensity values of all sub-wavelet decomposition signals in the first wavelet decomposition signal.
The determining module 1403 is configured to determine a probability that the first audio frame signal is transient noise according to the energy distribution information of the first wavelet decomposition signal.
In one possible implementation, the obtaining module 1401 is further configured to obtain a first audio signal. The first audio signal includes at least one audio frame signal, and for each audio frame signal, the obtaining module 1401 is configured to perform wavelet decomposition to obtain a plurality of wavelet decomposition signals corresponding to each audio frame signal.
The apparatus 14 further includes a splicing module 1404. The splicing module 1404 is configured to obtain a wavelet signal sequence by splicing the wavelet decomposition signals corresponding to each audio frame signal according to a framing order of the at least one audio frame signal in the first audio signal.
The obtaining module 1401 is further configured to obtain a first minimum audio intensity value of a first preset number of consecutive samples in the wavelet signal sequence and a second minimum audio intensity value of a second preset number of consecutive samples in the wavelet signal sequence, where the first preset number of consecutive samples includes a target sample and is before the target sample in the wavelet signal sequence, and the second preset number of consecutive samples includes the target sample and is after the target sample in the wavelet signal sequence.
The determining module 1403 is further configured to determine a second reference audio intensity value according to the first minimum audio intensity value and the second minimum audio intensity value in the obtaining module 1401, determine an average reference audio intensity value of the first audio frame signal according to second reference audio intensity values of all samples in the first wavelet decomposition signal, determine first probability according to the average reference audio intensity value of the first audio frame signal, obtain second probability according to the energy distribution information of the first wavelet decomposition signal, and determine the probability that the first audio frame signal is transient noise according to the first probability and the second probability.
In one possible implementation, the obtaining module 1401 is further configured to obtain a first audio signal. The first audio signal includes at least one audio frame signal.
The apparatus 14 further includes a dividing module 1405, which is configured to divide the first audio signal to a plurality of processing signals, where each processing signal includes a third preset number of consecutive samples, an audio intensity value of each sample, and a frequency value of each sample, where the first audio signal includes a plurality of audio frame signals.
The determining module 1403 is further configured to determine a first smooth audio intensity value of a target sample according to an audio intensity value of the target sample and an audio intensity value of a sample that is in a previous processing signal of a first processing signal where the target sample is located and has the same frequency value as the target sample.
The determining module 1403 is further configured to determine an inhibition coefficient of the target sample according to a probability that an audio frame signal where the target sample is located is transient noise, the first smooth audio intensity value of the target sample, and the audio intensity value of the target sample.
The apparatus 14 further includes a suppression module 1406, which is configured to perform suppression on an audio intensity value of each sample in an audio frame signal where the target sample is located to obtain a suppressed audio frame signal, according to inhibition coefficients of all samples in the audio frame signal where the target sample is located.
In one possible implementation, the obtaining module 1401 is further configured to obtain a probability that the first audio frame signal is transient noise and a probability that the second audio frame signal is transient noise, where the second audio frame signal is a previous audio frame signal of the first audio frame signal.
The obtaining module 1401 is further configured to obtain a first smoothing probability according to the probability that the first audio frame signal is the transient noise and the probability that the second audio frame signal is transient noise and use the first smoothing probability as the probability that the first audio frame signal is transient noise.
In one possible implementation, the dividing module 1405 is further configured to divide the wavelet signal sequence to a plurality of signals to-be-smoothed, where each signal to-be-smoothed includes a fourth preset number of consecutive samples and an audio intensity value of each sample, each signal to-be-smoothed corresponds to a smoothing function, a time width of a definition domain of the smoothing function is not greater than a time width of the signal to-be-smoothed, a maximum value of a first smoothing function in the smoothing functions is located at a center of a definition domain of the first smoothing function.
The determining module 1403 is further configured to determine an average of audio intensity values of all samples in the first signal to-be-smoothed as a first average reference audio intensity value of all samples in the first smoothing signal, and perform convolution operation on the first average reference audio intensity value of all samples in each signal to-be-smoothed in the wavelet signal sequence and a corresponding smoothing function value to obtain a convolutional result, and use the convolutional result as an average reference audio intensity value of the first audio frame signal, where the smoothing function value is obtained according to the smoothing function and a time of a corresponding sample.
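One possible reading of this smoothing can be sketched as follows: the wavelet signal sequence is split into blocks, each block is reduced to its average, and the averages are convolved with a window whose maximum lies at its center. The function names, the choice of a Hann-shaped window, and the clamping of indices at the edges are all assumptions made for illustration; the text does not specify the smoothing function.

```python
import math


def block_means(x, block_len):
    """Average of each consecutive block of block_len samples (last block may be shorter)."""
    blocks = [x[k:k + block_len] for k in range(0, len(x), block_len)]
    return [sum(b) / len(b) for b in blocks]


def hann_kernel(width):
    """Symmetric window with its peak at the center, normalized to sum to 1 (assumed shape)."""
    w = [0.5 - 0.5 * math.cos(2.0 * math.pi * (k + 1) / (width + 1)) for k in range(width)]
    total = sum(w)
    return [v / total for v in w]


def smooth(values, kernel):
    """Convolve values with a symmetric kernel, clamping indices at both ends."""
    half = len(kernel) // 2
    out = []
    for n in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = min(max(n + k - half, 0), len(values) - 1)
            acc += w * values[idx]
        out.append(acc)
    return out


means = block_means([1.0, 3.0, 5.0, 7.0, 9.0], 2)
print(smooth(means, hann_kernel(3)))
```

Because the kernel is symmetric, convolution and correlation coincide, so the sketch does not reverse the kernel.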
Optionally, the apparatus 14 further includes a calculating module 1407, which is configured to obtain a third reference audio intensity value of the target sample by multiplying an audio intensity value of a previous sample of the target sample in the wavelet signal sequence by a smoothing coefficient.
The calculating module 1407 is further configured to obtain a fourth reference audio intensity value of the target sample by multiplying a remaining smoothing coefficient by an average of audio intensity values of all consecutive samples in the wavelet signal sequence that include the target sample and are spliced before the target sample in the wavelet signal sequence.
The calculating module 1407 is further configured to obtain the audio intensity value of the target sample by adding the third reference audio intensity value with the fourth reference audio intensity value.
In one possible implementation, the reference audio intensity value includes an average and a variance of audio intensity values of a fifth preset number of consecutive samples.
In one possible implementation, the determining module 1403 is further configured to determine the probability that the first audio frame signal is transient noise as
Where result(n) represents energy distribution information of a wavelet decomposition signal corresponding to the nth audio frame signal, n represents a frame index indicating the nth audio frame signal, and λ represents a first preset threshold. If a value of result(n) is greater than the first preset threshold, the probability that the first audio frame signal is transient noise is 1.
Optionally, the determining module 1403 is further configured to determine the energy distribution information of the first wavelet decomposition signal corresponding to the first audio frame signal as
Where l represents the number of sub-wavelet decomposition signals included in the first wavelet decomposition signal, N represents the number of samples included in each sub-wavelet decomposition signal, n represents a frame index indicating the nth audio frame signal, xl(i) represents an audio intensity value of the lth sub-wavelet decomposition signal at the ith sample in a wavelet decomposition signal, ml1(i−1) represents an average of audio intensity values till the (i−1)th sample in the lth sub-wavelet decomposition signal, ml2(i−1) represents a variance of audio intensity values till the (i−1)th sample in the lth sub-wavelet decomposition signal.
In one possible implementation, the obtaining module 1401 is further configured to: obtain a first average of audio intensity values of all samples in a first sub-wavelet decomposition signal and a second average of audio intensity values of all samples in a second sub-wavelet decomposition signal.
The determining module 1403 is configured to determine the probability that the first audio frame signal is transient noise according to a ratio between the first average and the second average.
In one possible implementation, the determining module 1403 is further configured to determine the second probability as
where thrg represents a second preset threshold, thrs represents a third preset threshold, n represents a frame index indicating the nth audio frame signal, Sc(n) represents an average reference audio intensity value of the nth audio frame signal.
Optionally, the apparatus further includes a compensating module 1408, which is configured to compensate high-frequency components of a first preset threshold in an original audio signal having the preset duration to obtain the first audio signal.
In one possible implementation, the decomposition module 1402 is further configured to perform wavelet packet decomposition on each audio frame signal and use a signal obtained through wavelet packet decomposition as the wavelet decomposition signal.
The effective voice signal detection can be implemented with reference to
In this implementation, by counting the preset number of continuous samples in the wavelet packet decomposition signal corresponding to the audio frame signal and using the local microscopic characteristics of wavelet decomposition or wavelet packet decomposition, the accuracy of transient noise detection is improved.
A device for transient noise detection provided in implementations of the disclosure will be described below. Referring to
The transceiver 1500 is configured to obtain an audio frame signal having a preset duration, the audio frame signal includes a plurality of samples and an audio intensity value of each sample.
The processor 1501 is configured to perform wavelet decomposition on a first audio frame signal to obtain a first wavelet decomposition signal corresponding to the first audio frame signal, the first wavelet decomposition signal includes a plurality of sub-wavelet decomposition signals, and each sub-wavelet decomposition signal includes a plurality of samples and an audio intensity value of each sample.
The processor 1501 is configured to determine a first reference audio intensity value of a first sub-wavelet decomposition signal according to reference audio intensity values of all samples in the first sub-wavelet decomposition signal.
The processor 1501 is configured to determine energy distribution information of the first wavelet decomposition signal according to first reference audio intensity values of all sub-wavelet decomposition signals in the first wavelet decomposition signal.
The processor 1501 is configured to determine a probability that the first audio frame signal is transient noise according to the energy distribution information of the first wavelet decomposition signal.
In one possible implementation, the transceiver 1500 is further configured to obtain a first audio signal. The first audio signal includes at least one audio frame signal, and for each audio frame signal, the transceiver 1500 is configured to perform wavelet decomposition to obtain a plurality of wavelet decomposition signals corresponding to each audio frame signal.
The processor 1501 is configured to obtain a wavelet signal sequence by splicing the wavelet decomposition signals corresponding to each audio frame signal according to a framing order of the at least one audio frame signal in the first audio signal.
The transceiver 1500 is further configured to obtain a first minimum audio intensity value of a first preset number of consecutive samples in the wavelet signal sequence and a second minimum audio intensity value of a second preset number of consecutive samples in the wavelet signal sequence, where the first preset number of consecutive samples includes a target sample and is before the target sample in the wavelet signal sequence, and the second preset number of consecutive samples includes the target sample and is after the target sample in the wavelet signal sequence.
The processor 1501 is further configured to determine a second reference audio intensity value according to the first minimum audio intensity value and the second minimum audio intensity value obtained by the transceiver 1500, determine an average reference audio intensity value of the first audio frame signal according to second reference audio intensity values of all samples in the first wavelet decomposition signal, determine a first probability according to the average reference audio intensity value of the first audio frame signal, obtain a second probability according to the energy distribution information of the first wavelet decomposition signal, and determine the probability that the first audio frame signal is transient noise according to the first probability and the second probability.
In one possible implementation, the transceiver 1500 is further configured to obtain a first audio signal. The first audio signal includes at least one audio frame signal.
The processor 1501 is further configured to: divide the first audio signal to a plurality of processing signals, where each processing signal includes a third preset number of consecutive samples, an audio intensity value of each sample, and a frequency value of each sample, where the first audio signal includes a plurality of audio frame signals.
The processor 1501 is further configured to determine a first smooth audio intensity value of a target sample according to an audio intensity value of the target sample and an audio intensity value of a sample that is in a previous processing signal of a first processing signal where the target sample is located and has the same frequency value as the target sample.
The processor 1501 is further configured to determine an inhibition coefficient of the target sample according to a probability that an audio frame signal where the target sample is located is transient noise, the first smooth audio intensity value of the target sample, and the audio intensity value of the target sample.
The processor 1501 is further configured to perform suppression on an audio intensity value of each sample in an audio frame signal where the target sample is located to obtain a suppressed audio frame signal, according to inhibition coefficients of all samples in the audio frame signal where the target sample is located.
In one possible implementation, the transceiver 1500 is further configured to obtain a probability that the first audio frame signal is transient noise and a probability that the second audio frame signal is transient noise, where the second audio frame signal is a previous audio frame signal of the first audio frame signal.
The processor 1501 is further configured to obtain a first smoothing probability according to the probability that the first audio frame signal is the transient noise and the probability that the second audio frame signal is transient noise and use the first smoothing probability as the probability that the first audio frame signal is transient noise.
In one possible implementation, the processor 1501 is further configured to divide the wavelet signal sequence to a plurality of signals to-be-smoothed, where each signal to-be-smoothed includes a fourth preset number of consecutive samples and an audio intensity value of each sample, each signal to-be-smoothed corresponds to a smoothing function, a time width of a definition domain of the smoothing function is not greater than a time width of the signal to-be-smoothed, a maximum value of a first smoothing function in the smoothing functions is located at a center of a definition domain of the first smoothing function; the processor 1501 is further configured to determine an average of audio intensity values of all samples in the first signal to-be-smoothed as a first average reference audio intensity value of all samples in the first smoothing signal, and perform convolution operation on the first average reference audio intensity value of all samples in each signal to-be-smoothed in the wavelet signal sequence and a corresponding smoothing function value to obtain a convolutional result, and use the convolutional result as an average reference audio intensity value of the first audio frame signal, where the smoothing function value is obtained according to the smoothing function and a time of a corresponding sample.
Optionally, the processor 1501 is further configured to: obtain a third reference audio intensity of the target sample by multiplying an audio intensity value of a previous sample of the target sample in the wavelet signal sequence with a smoothing coefficient, and obtain a fourth reference audio intensity value of the target sample by multiplying a remaining smoothing coefficient with an average of audio intensity values of all consecutive samples in the wavelet signal sequence which includes the target sample and are spliced before the target sample in the wavelet signal sequence, obtain the audio intensity value of the target sample by adding the third reference audio intensity value with the fourth reference audio intensity value.
In one possible implementation, the reference audio intensity value includes an average and a variance of audio intensity values of a fifth preset number of consecutive samples.
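A minimal sketch of how an average and a variance over consecutive samples could be tracked sample by sample is given below. It uses Welford's online update, which is a swapped-in standard technique and not necessarily the computation used in the disclosure. The name `running_stats` is hypothetical, and the variance is the population variance (divided by the sample count), which is also an assumption.

```python
def running_stats(samples):
    """Yield (mean, variance) of samples[0..i] for each index i.

    Uses Welford's online update; the variance is the population
    variance (sum of squared deviations divided by the count).
    """
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for count, x in enumerate(samples, start=1):
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        yield mean, m2 / count
```

Because the statistics are updated incrementally, the values available at sample i − 1 can be compared with the intensity value at sample i without rescanning earlier samples.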
In one possible implementation, the processor 1501 is further configured to determine the probability that the first audio frame signal is transient noise as
where result(n) represents energy distribution information of a wavelet decomposition signal corresponding to the nth audio frame signal, n represents a frame index indicating the nth audio frame signal, and λ represents a first preset threshold. If a value of result(n) is greater than the first preset threshold, the probability that the first audio frame signal is transient noise is 1.
Optionally, the processor 1501 is further configured to determine the energy distribution information of the first wavelet decomposition signal corresponding to the first audio frame signal as
where l represents the number of sub-wavelet decomposition signals included in the first wavelet decomposition signal, N represents the number of samples included in each sub-wavelet decomposition signal, n represents a frame index indicating the nth audio frame signal, xl(i) represents an audio intensity value of the lth sub-wavelet decomposition signal at the ith sample in a wavelet decomposition signal, ml1(i−1) represents an average of audio intensity values up to the (i−1)th sample in the lth sub-wavelet decomposition signal, and ml2(i−1) represents a variance of audio intensity values up to the (i−1)th sample in the lth sub-wavelet decomposition signal.
In one possible implementation, the processor 1501 is further configured to: obtain a first average of audio intensity values of all samples in a first sub-wavelet decomposition signal and a second average of audio intensity values of all samples in a second sub-wavelet decomposition signal, and determine the probability that the first audio frame signal is transient noise according to a ratio between the first average and the second average.
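The ratio between the two averages can be sketched as below. The name `band_average_ratio` and the small constant `eps` (added only to avoid division by zero) are hypothetical, and the sketch assumes the audio intensity values are non-negative. How the ratio is mapped to a probability is not specified in this passage, so that step is deliberately left out.

```python
def band_average_ratio(first_band, second_band, eps=1e-12):
    """Return the ratio of the average intensity of two sub-band signals.

    Assumes non-negative intensity values and non-empty inputs;
    eps is an illustrative guard against division by zero.
    """
    first_avg = sum(first_band) / len(first_band)
    second_avg = sum(second_band) / len(second_band)
    return first_avg / (second_avg + eps)
```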
In one possible implementation, the processor 1501 is further configured to determine the second probability as
where thrg represents a second preset threshold, thrs represents a third preset threshold, n represents a frame index indicating the nth audio frame signal, and Sc(n) represents an average reference audio intensity value of the nth audio frame signal.
Optionally, the processor 1501 is further configured to compensate high-frequency components of a first preset threshold in an original audio signal having the preset duration to obtain the first audio signal.
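One common way to compensate high-frequency components is a first-order pre-emphasis filter, y(i) = x(i) − a·x(i−1). The sketch below assumes that this is the kind of compensation intended; the disclosure does not name the filter, and the function name `pre_emphasis` and the coefficient value 0.97 are illustrative assumptions.

```python
def pre_emphasis(samples, coeff=0.97):
    """Apply a first-order pre-emphasis filter y[i] = x[i] - coeff * x[i-1].

    The filter form and the default coefficient are assumptions for
    illustration; the first sample is passed through unchanged.
    """
    if not samples:
        return []
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(samples[i] - coeff * samples[i - 1])
    return out
```

A constant input is strongly attenuated after the first sample, while rapidly alternating input is amplified, which is the high-frequency boost the filter is meant to provide.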
In one possible implementation, the processor 1501 is further configured to perform wavelet packet decomposition on each audio frame signal and use a signal obtained through wavelet packet decomposition as the wavelet decomposition signal.
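For illustration, a wavelet packet decomposition can be sketched with the Haar wavelet as follows. The choice of the Haar wavelet, the function names `haar_step` and `wavelet_packet`, and the number of levels are assumptions; the disclosure does not specify the wavelet basis here. Unlike a plain wavelet decomposition, every band (not only the approximation band) is split again at each level, so `levels` levels yield 2**levels sub-wavelet decomposition signals.

```python
import math


def haar_step(signal):
    """Split a signal into Haar approximation and detail halves."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def wavelet_packet(signal, levels):
    """Return 2**levels sub-band signals from a full Haar packet tree.

    Assumes len(signal) is divisible by 2**levels. Bands are returned
    in natural tree order, which is not strictly increasing frequency.
    """
    bands = [list(signal)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            approx, detail = haar_step(band)
            next_bands.extend([approx, detail])
        bands = next_bands
    return bands
```

For example, `wavelet_packet(frame, 2)` on an 8-sample frame returns four sub-band signals of two samples each, and the total energy (sum of squares) across the bands equals that of the input frame.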
It can be understood that, the apparatus for transient noise detection 14 can perform the implementation of steps of
In this implementation, by counting the preset number of continuous samples in the wavelet packet decomposition signal corresponding to the audio frame signal and using the local microscopic characteristics of wavelet decomposition or wavelet packet decomposition, the accuracy of the probability that the audio frame signal is transient noise is improved, and the accuracy of transient noise detection is improved.
Implementations of the disclosure further provide a computer readable storage medium storing instructions which, when executed by a processor, are operable with the processor to carry out the method described above.
It should be noted that the above terms “first” and “second” are only used for descriptive purposes and cannot be understood as indicating or implying relative importance.
In this implementation, by counting the preset number of continuous samples in the sub-wavelet decomposition signals in the wavelet packet decomposition signal corresponding to the audio frame signal and using the local microscopic characteristics of wavelet decomposition or wavelet packet decomposition, the accuracy of the probability that the audio frame signal is transient noise is improved, and the accuracy of transient noise detection is improved. In addition, the probability that the signal frame is a voice signal is determined by forward tracking and backward tracking of the distribution of audio intensity values of the voice signal with a preset duration, and the probability that the audio frame signal is transient noise is determined according to the probability that the audio frame signal is a voice signal and the probability that the audio frame signal is transient noise. As such, it is possible to avoid falsely detecting the initial position of a voice signal as transient noise, and to further improve the accuracy of the transient noise probability. Furthermore, the inhibition coefficient of transient noise is determined according to the probability that the signal frame is transient noise. As such, transient noise can be effectively suppressed while maintaining signal characteristics of voice signals in the signal frame as much as possible.
It should be understood that in several implementations provided in the present application, the disclosed methods, devices and systems can be realized in other ways. The implementations described above are only schematic. For example, the division of the units is only a logical function division, and there can be another division mode in actual implementation. For example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed can be achieved through some interfaces, indirect coupling or communication connection of equipment or units, and can be electrical, mechanical or other forms.
The units described above as separate components can be or may not be physically separated, and the components illustrated as units can be or may not be physical units, that is, they can be located in one place or distributed on multiple network units. Some or all of the units can be selected according to the actual needs to achieve the purpose of the implementations.
In addition, in implementations of the disclosure, all functional units can be integrated into one processing unit, each unit can be used as a unit separately, or two or more units can be integrated into one unit. The above integrated units can be realized in the form of hardware or hardware plus software functional units.
Those skilled in the art can understand that all or part of the steps of the above method implementations can be completed by hardware under the instruction of a program. The program can be stored in a computer-readable storage medium. When the program is executed, it performs the steps of the above method. The storage medium includes a mobile storage device, a read only memory (ROM), a random access memory (RAM), a magnetic disc, an optical disc, or other media that can store program codes.
Alternatively, if the above integrated unit of the disclosure is realized in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium. Based on this understanding, the technical schemes of the implementations of the disclosure, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which can be a personal computer, a server, a network device, etc.) to perform all or part of the methods described in various implementations of the present disclosure. The storage medium includes a mobile storage device, a ROM, a RAM, a magnetic disc, an optical disc, or other media that can store program codes.
While the disclosure has been described in connection with certain embodiments, it is to be understood that the disclosure is not to be limited to the disclosed embodiments. Any changes or replacements that a person skilled in the art can easily think of within the technical scope disclosed herein shall be covered by the protection scope of the disclosure. Therefore, the protection scope of the disclosure shall be subject to the protection scope of the claims.
Number | Date | Country | Kind |
---|---|---|---|
201911107575.2 | Nov 2019 | CN | national |
This application is a continuation under 35 U.S.C. § 120 of PCT/CN2020/128372, filed Nov. 12, 2020, which claims priority under 35 U.S.C. § 119(a) and/or PCT Article 8 to Chinese Application Serial No. 201911107575.2, filed Nov. 13, 2019, the entire disclosures of which are hereby incorporated by reference.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/CN2020/128372 | Nov 2020 | US |
Child | 17728405 | | US |