Method for selecting output wave beam of microphone array

Information

  • Patent Grant
  • 12223976
  • Patent Number
    12,223,976
  • Date Filed
    Thursday, November 12, 2020
  • Date Issued
    Tuesday, February 11, 2025
Abstract
A method for estimating a direction of arrival of sound signals from a microphone array, comprising: receiving sound signals from the microphone array, and performing beamforming on the sound signals to obtain wave beams and corresponding wave beam output signals; performing the following operation on each wave beam: converting the wave beam output signal of a current wave beam to frequency domain from time domain to obtain a frequency spectrum vector and a power spectrum vector; calculating comprehensive voice signal energy of the current wave beam, wherein the comprehensive voice signal energy is the product of comprehensive energy indicating the energy level of the wave beam output signal and a comprehensive voice existence probability indicating an existence probability of voice in the wave beam output signal; and selecting the wave beam with a maximal comprehensive voice signal energy value as the output wave beam.
Description
TECHNICAL FIELD

The disclosure relates to selecting an output wave beam of a microphone array, and specifically to a method for selecting an output wave beam of a microphone array based on voice existence probability.


BACKGROUND ART

A microphone array can perform beamforming in multiple directions. However, due to the limitation of output hardware resources or application scenarios, usually only a beam in a certain direction is allowed to be selected as an output signal. The output wave beam selection of the microphone array is essentially an estimate of the direction of the source of voice signal. Correctly judging the direction of the voice signal can maximize the application effect of a beamforming algorithm; on the contrary, selecting a non-optimal wave beam as the output may greatly reduce the noise inhibitory effect of the beamforming algorithm. Therefore, in practice, the output wave beam selection mechanism, as a subsequent process to the beamforming algorithm, is of great significance to the research and development of voice signal processing systems using microphone arrays.


The inventor has noticed that while attempts have been made in the prior art to propose different methods for selecting an output wave beam of a microphone array, these existing methods still have at least the following deficiencies:

    • 1) Relying on pre-stored speaker information or relying on wake word recognition before the direction of arrival (DOA) is recognized;
    • 2) Difficult to simultaneously deal with high volume noise interference and low volume unstable signal interference; and
    • 3) Not fully optimized for resource-constrained devices or application scenarios such as Internet of Things (IoT) microcontroller units (MCUs) to reduce computational complexity.


For example, Chinese Patent with the Publication No. CN103888861B discloses a method for adjusting the directivity of a microphone array, in which the method firstly receives voice information, judges the information of the pre-speaker according to the voice information, and determines the direction of the pre-speaker's location according to the judging result. This method requires the speaker's identity information to be stored in advance, and wave beam directivity adjustment cannot be performed for unstored speakers.


For another example, the Chinese patent application with the Publication No. CN109119092A discloses a method for switching the directivity of a wave beam based on a microphone array. The method only utilizes the phase delay information between the microphones and the energy information of each beam, and cannot distinguish between human voice signals and non-human voice signals. It is therefore susceptible to interference from high volume unstable noises.


For a further example, Chinese patent application with the Publication No. CN109473118A discloses a dual-channel voice enhancement method, in which the target wave beam is enhanced only according to the existence probability of the sound to be enhanced in the target wave beam, and the wave beam selection is performed based on the ratio of the voice existence probability of each wave beam therein. In practice, this method has the disadvantage of being susceptible to interference from low volume unstable signals.


For another further example, Chinese patent application with the Publication No. CN108899044A discloses a voice signal processing method, in which the correlation between the voice signals and the content is determined by utilizing the wake word existence probability. Specifically, the voice signals are first input into the wake word engine to obtain the confidence levels of the voice signals output by the wake word engine, and then the voice existence probability and the direction of arrival of the original input signals are calculated. However, before the direction of arrival can be judged, this method relies on the wake word engine to calculate the existence probability of particular words or sentences, which in turn relies on voice recognition technology. It can therefore only be applied to a voice signal processing system with a wake-up function. In addition, the calculation of the wake word existence probability and the vector operations required by the method increase its computational complexity, which makes it impractical to implement on resource-constrained devices such as IoT microcontroller units (MCUs).


To sum up, there is a need in the prior art for a method for selecting an output wave beam of a microphone array to solve the above problems in the prior art. It should be understood that the technical problems listed above are only examples rather than limitations of the disclosure, and the disclosure is not limited to technical solutions that simultaneously solve all the above technical problems. The technical solutions of the disclosure may be implemented to solve one or more of the above or other technical problems.


SUMMARY OF THE INVENTION

In view of the above problems, the object of the disclosure is to provide a method for selecting an output wave beam of a microphone array, which does not rely on pre-stored speaker information, does not require wake word recognition before recognizing a direction of arrival, and can reduce both the high volume noise interference and low volume unstable signal interference, and has reduced computational complexity.


In one aspect of the disclosure, a method is provided for selecting an output wave beam of a microphone array, the method comprising the following steps: (a) receiving a plurality of sound signals from the microphone array comprising a plurality of microphones, and performing beamforming on the plurality of sound signals to obtain a plurality of wave beams and corresponding wave beam output signals; (b) performing the following operations on each wave beam in the plurality of wave beams: converting the wave beam output signal of a current wave beam from time domain to frequency domain to obtain a frequency spectrum vector and a power spectrum vector of the current wave beam; on the basis of the frequency spectrum vector and the power spectrum vector of the current wave beam, calculating an overall voice signal energy of the current wave beam, wherein the overall voice signal energy is a product of an overall energy and an overall voice existence probability of the current wave beam, wherein the overall energy indicates an energy level of the wave beam output signal of the current wave beam, the overall voice existence probability indicates an existence probability of voice in the wave beam output signal of the current wave beam, and the overall voice existence probability and the overall energy are scalar quantities; and (c) selecting a wave beam with a maximal overall voice signal energy value as an output wave beam.


Optionally, the frequency spectrum vector is obtained by performing Short-Time Fourier Transform (STFT) or Short-Time Discrete Cosine Transform (DCT) on the wave beam output signal of the current wave beam.


Optionally, in step (b), after the frequency spectrum vector and the power spectrum vector of the current wave beam are obtained, the power spectrum vector is updated with the frequency spectrum vector according to the following formula:

Sb(f,t)=α1Sb(f,t−1)+(1−α1)|Yb(f,t)|²,

    • wherein t represents a frame index; f represents a frequency point; Sb(f,t−1) is the power spectrum corresponding to an element of the power spectrum vector of the current wave beam at the frequency point f on frame t−1; Sb(f,t) is the power spectrum corresponding to an element of the power spectrum vector of the current wave beam at the frequency point f on frame t; α1 is a parameter greater than 0 and less than 1; and Yb(f,t) is the frequency spectrum corresponding to an element of the frequency spectrum vector of the current wave beam at the frequency point f on frame t.


Preferably, α1 is greater than or equal to 0.9 and less than or equal to 0.99.
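
To give a feel for this range, the recursion can be unrolled. The expansion below is a standard property of first-order recursive averaging rather than something stated in the disclosure, and N_eff is a label introduced here for illustration only:

```latex
S_b(f,t) = (1-\alpha_1)\sum_{k=0}^{t-1}\alpha_1^{k}\,|Y_b(f,t-k)|^2 + \alpha_1^{t}\,S_b(f,0),
\qquad N_{\mathrm{eff}} \approx \frac{1}{1-\alpha_1}
```

Under this reading, α1 = 0.9 averages over roughly 10 frames and α1 = 0.99 over roughly 100 frames.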


Optionally, in step (b), before the overall voice signal energy of the current wave beam is calculated based on the frequency spectrum vector and the power spectrum vector of the current wave beam, a local energy minimum value corresponding to each element in the power spectrum vector of the current wave beam is determined.


Optionally, determining the local energy minimum value corresponding to each element in the power spectrum vector of the current wave beam comprises: maintaining two vectors Sb,min and Sb,tmp with the same length as the frequency spectrum vector, and with an initial value of zero;


Each element of vectors Sb,min and Sb,tmp is updated according to the following formula:

Sb,min(f,t)=min{Sb,min(f,t−1),Sb(f,t)},
Sb,tmp(f,t)=min{Sb,tmp(f,t−1),Sb(f,t)},

    • wherein t represents a frame index; f represents a frequency point; Sb,min(f,t) represents a local energy minimum value corresponding to the element of the power spectrum vector of the current wave beam at the frequency point f on frame t; Sb,min(f,t−1) represents a local energy minimum value corresponding to the element of the power spectrum vector of the current wave beam at the frequency point f on frame t−1; Sb (f,t) represents a power spectrum corresponding to the element of the power spectrum vector of the current wave beam at the frequency point f on frame t; Sb,tmp(f,t) represents a local energy temporary minimum value corresponding to the element of the power spectrum vector of the current wave beam at the frequency point f on frame t; Sb,tmp(f,t−1) represents a local energy temporary minimum value corresponding to the element of the power spectrum vector of the current wave beam at the frequency point f on frame t−1; and
    • each time L frames have been processed according to the above formula, the vectors Sb,min and Sb,tmp are reset in the following manner:

      Sb,min(f,t)=min{Sb,tmp(f,t−1),Sb(f,t)},
      Sb,tmp(f,t)=Sb(f,t);
    • after each element of the vectors Sb,min and Sb,tmp has been updated, the local energy minimum value corresponding to each element in the power spectrum vector of the current wave beam is obtained.


Preferably, the L is set such that the L frames of signals comprise signals of 200 milliseconds to 500 milliseconds.
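
As a worked example of this preference, the sketch below converts the 200 to 500 millisecond span into a range of frame counts. The function name `l_range` and the hop durations used are hypothetical; the disclosure does not specify a frame length or hop.

```python
import math
from typing import Tuple


def l_range(hop_ms: float, low_ms: float = 200.0, high_ms: float = 500.0) -> Tuple[int, int]:
    """Return the smallest and largest L whose L frames span low_ms..high_ms.

    hop_ms is the time advance between consecutive frames (an assumption;
    the source only says that L frames should cover 200 ms to 500 ms).
    """
    if hop_ms <= 0:
        raise ValueError("hop_ms must be positive")
    smallest = math.ceil(low_ms / hop_ms)   # first L that reaches low_ms
    largest = math.floor(high_ms / hop_ms)  # last L that stays within high_ms
    return smallest, largest


# Hypothetical hop of 16 ms (for instance 256 samples at 16 kHz).
l_low, l_high = l_range(16.0)
```

With a 16 ms hop this gives L between 13 and 31 frames; with a 10 ms hop it gives 20 to 50 frames.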


Optionally, the overall energy is obtained according to the following steps: averaging all elements of the power spectrum vector to obtain the overall energy.


Optionally, averaging all elements of the power spectrum vector to obtain the overall energy comprises:

    • performing weighted averaging on all elements of the power spectrum vector to obtain the overall energy, wherein for each element in the power spectrum vector, if the frequency point corresponding to the element falls in the range of 0-5 kHz, the element is given a weight of 1, otherwise it is given a weight of 0.


Optionally, the overall voice existence probability is obtained according to the following steps: for each element in a signal power spectrum vector of the current wave beam, calculating a voice existence probability corresponding to each element in the signal power spectrum vector according to a voice existence probability model, so as to generate a voice existence probability vector of the current wave beam; and performing the following update on each element of the voice existence probability vector of the current wave beam:

pb(f,t)=α2pb(f,t−1)+(1−α2)I(b,f,t)

    • wherein t represents a frame index; f represents a frequency point; pb is a voice existence probability vector of the current wave beam; pb(f,t−1) is a voice existence probability corresponding to the element of the voice existence probability vector of the current wave beam at the frequency point f on frame t−1; pb(f,t) is a voice existence probability corresponding to the element of the voice existence probability vector of the current wave beam at the frequency point f on frame t; α2 is a parameter greater than 0 and less than 1; and
    • the value of function I(b,f,t) is

\[ I(b,f,t) = \begin{cases} 1, & S_b(f,t)/S_{b,\min}(f,t) \ge \delta_1 \\ 0, & S_b(f,t)/S_{b,\min}(f,t) < \delta_1 \end{cases}; \]

    • Sb(f,t) is a power spectrum corresponding to the elements of the power spectrum vector of the current wave beam; Sb,min(f,t) is a local energy minimum value corresponding to the elements of the power spectrum vector of the current wave beam; δ1 is the threshold used to determine whether the current frame has a voice signal;

    • averaging all elements of the voice existence probability vector to obtain the overall voice existence probability.





Preferably, α2 is greater than or equal to 0.8 and less than or equal to 0.99.


Optionally, averaging all elements of the voice existence probability vector to obtain the overall voice existence probability comprises: performing weighted averaging on all elements of the voice existence probability vector to obtain the overall voice existence probability, wherein for each element in the voice existence probability vector, if the frequency point corresponding to the element falls in the range of 0-5 kHz, the element is given a weight of 1, otherwise it is given a weight of 0.


Preferably, in step (b), after the overall voice signal energy of the current wave beam is calculated, the overall voice signal energy of the current wave beam is updated according to the following operation:

db(t)=α3db(t−1)+(1−α3)J(b,t),

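    • wherein db(t−1) is the overall voice signal energy of the current wave beam on frame t−1; db(t) is the overall voice signal energy of the current wave beam on frame t; α3 is a parameter greater than 0 and less than 1;
    • function J(b,t) represents the voice signal energy of the current frame, the value of which is:

\[ J(b,t) = \begin{cases} e_b(t) \cdot q_b(t), & q_b(t) \ge \delta_2 \\ 0, & q_b(t) < \delta_2 \end{cases}, \]

    • wherein eb(t) is the overall energy of the current wave beam on frame t; qb(t) is the overall voice existence probability of the current wave beam on frame t; and δ2 is a threshold used to decide whether to set the value of function J(b,t) to zero.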





Preferably, α3 is greater than or equal to 0.8 and less than or equal to 0.99.


The solution of the disclosure calculates the overall voice signal energy of each wave beam and selects an output wave beam of the microphone array accordingly. In particular, the overall voice signal energy takes into account both the overall energy of the wave beam and the overall voice existence probability, so the wave beam selection is based on both the wave beam energy and the voice existence probability. This does not require pre-acquisition of speaker information, overcomes the interference of non-human noises, and does not require any voice recognition prior to recognizing the direction of arrival. In addition, the overall voice signal energy is a product of scalar quantities, which helps reduce vector calculations and lowers computational complexity.


It should be understood that the foregoing description of the background and summary of the invention is only intended to be illustrative rather than limiting.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a schematic flow diagram of an exemplary embodiment of the method for selecting an output wave beam of a microphone array of the disclosure;



FIG. 2 is a schematic flow diagram of a detailed exemplary embodiment of the method for selecting an output wave beam of a microphone array of the disclosure; and



FIG. 3 is a schematic flow diagram of updating the local energy minimum value estimate in an embodiment of the method for selecting an output wave beam of a microphone array of the disclosure.





DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The disclosure will be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show exemplary embodiments by way of illustration. It should be understood that the embodiments shown in the accompanying drawings and described hereinafter are only illustrative and not intended to limit the disclosure.



FIG. 1 is a schematic flow diagram of an exemplary embodiment of the method for selecting an output wave beam of a microphone array of the disclosure.


Method 100 shown in FIG. 1 comprises: (a) as shown in step 102, receiving a plurality of sound signals from the microphone array comprising a plurality of microphones, and performing beamforming on the plurality of sound signals to obtain a plurality of wave beams and corresponding wave beam output signals.


The method 100 further comprises: (b) as shown in steps 104 to 108, performing the following operations on each wave beam in the plurality of wave beams: converting the wave beam output signal of a current wave beam from time domain to frequency domain to obtain a frequency spectrum vector and a power spectrum vector of the current wave beam (step 104); on the basis of the frequency spectrum vector and the power spectrum vector of the current wave beam, calculating an overall voice signal energy of the current wave beam (step 106), wherein the overall voice signal energy is a product of an overall energy and an overall voice existence probability of the current wave beam, wherein the overall energy indicates an energy level of the wave beam output signal of the current wave beam, the overall voice existence probability indicates an existence probability of voice in the wave beam output signal of the current wave beam, and the overall voice existence probability and the overall energy are scalar quantities.


The method further comprises: (c) as shown in step 110, selecting a wave beam with a maximal overall voice signal energy value as an output wave beam.



FIG. 2 is a schematic flow diagram of a detailed exemplary embodiment of the method for selecting an output wave beam of a microphone array of the disclosure.


Method 200 begins from step 202, in which the wave beam output by the beamforming algorithm is transformed into the STFT domain, and the power spectrum vector of each wave beam is updated with the frequency spectrum information. Specifically, it is assumed that the beamforming algorithm outputs B wave beams, which are transformed into the Short-Time Fourier Transform (STFT) domain of F points. The output signal of the b-th (b=1, 2, . . . , B) wave beam may then be represented as an F-dimensional frequency spectrum vector Yb in the STFT domain, and the f-th element Yb(f) of the vector Yb represents the frequency information of the signal at the frequency f. The squared modulus is taken for each frequency point of vector Yb and combined by weighted averaging with the power spectrum vector Sb, which is updated according to the following formula:

Sb(f,t)=α1Sb(f,t−1)+(1−α1)|Yb(f,t)|²

    • wherein the independent variable t represents time (i.e., frame index); for example, Sb(f,t−1) and Sb(f,t) represent the value of Sb at the frequency point f on frame t−1 and on frame t, respectively, and vectors such as Sb,min and Sb,tmp hereinafter adopt the same manner of representation. The parameter α1 is between 0 and 1. The larger its value, the smaller the update degree of the power spectrum, which better resists the influence of transient noise but makes Sb more likely to deviate from the true instantaneous energy value; the preferred value is between 0.9 and 0.99. |Yb(f)|², the squared modulus of vector Yb at the frequency f, represents the power spectrum of the current frame (that is, frame t, the same below) of the signal at that frequency. After Sb(f) is updated with |Yb(f)|², it still has the same physical meaning (signal energy) as |Yb(f)|², but because it is updated smoothly, it better resists the influence of transient noises. Preferably, the subsequent steps are calculated using the updated power spectrum vector, so that the system is relatively stable.
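
The smoothing in step 202 can be sketched as follows. This is a minimal illustration in plain Python, not the patented implementation: the function name `update_power_spectrum` and the default α1 of 0.95 (a value inside the preferred range) are assumptions, and the STFT itself is taken as already computed.

```python
from typing import List, Sequence


def update_power_spectrum(
    s_prev: Sequence[float],
    y: Sequence[complex],
    alpha1: float = 0.95,  # assumed default inside the preferred 0.9..0.99 range
) -> List[float]:
    """Apply Sb(f,t) = a1*Sb(f,t-1) + (1-a1)*|Yb(f,t)|^2 to every frequency point."""
    if not 0.0 < alpha1 < 1.0:
        raise ValueError("alpha1 must be strictly between 0 and 1")
    if len(s_prev) != len(y):
        raise ValueError("s_prev and y must have the same length")
    return [alpha1 * s + (1.0 - alpha1) * abs(v) ** 2 for s, v in zip(s_prev, y)]


# One beam, two frequency points: previous power [0, 4], current spectrum [2, 0].
smoothed = update_power_spectrum([0.0, 4.0], [2 + 0j, 0j], alpha1=0.9)
```

With α1 = 0.9 the first point rises only to about 0.4 even though the instantaneous power is 4, which is the resistance to transients described above.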


In step 204, update the estimate of the local energy minimum value Sb,min of the current wave beam. For example, the local energy minimum value estimate may be updated according to the method 300 shown in FIG. 3. It should be understood that although FIG. 3 illustrates a specific method, the implementation of the disclosure is not limited thereto. For example, Martin, R.: Spectral subtraction based on minimum statistics. 1994, Proceedings of 7th EUSIPCO, 1182-1185 or a variant of this method may be used to update the estimate of the local energy minimum value Sb,min of the current wave beam.


In step 302, maintain two vectors Sb,min and Sb,tmp of length F, each initialized to 0 (that is, Sb,min(f,0)=Sb,tmp(f,0)=0 for all f).


In step 304, determine whether a next element exists in the power spectrum vector of the current wave beam Sb. If yes, go to step 306; if no, which means that each element of the power spectrum vector of the current wave beam has been processed, go to step 312, and obtain the local minimum energy value corresponding to each element.


In step 306, update the current element corresponding to each frequency point in the following manner,

Sb,min(f,t)=min{Sb,min(f,t−1),Sb(f,t)},
Sb,tmp(f,t)=min{Sb,tmp(f,t−1),Sb(f,t)},


In step 308, judge whether L frames of signals have been processed, that is, judge whether t is a multiple of L or not. Each time when L frames of signals are processed, in step 310, reset Sb,min and Sb,tmp in the following manner,

Sb,min(f,t)=min{Sb,tmp(f,t−1),Sb(f,t)},
Sb,tmp(f,t)=Sb(f,t);

    • in which the vector Sb,min holds the local minimum value (over L frames of signals). Since at any time the signal is either noise alone or the sum of noise and voice, Sb,min can be considered to approximately represent the intensity of the noise energy. This method is essentially based on the assumption that the voice signal is an unstable signal and the noise is a stable signal. The smaller the value of L, the lower the requirement for the stability of the noise, but the weaker the discrimination between the noise signal and the voice signal; the value of this parameter is also related to the length of each frame of signal. In preferred embodiments of the disclosure, L is chosen so that the L frames of signals span approximately 200 milliseconds to 500 milliseconds.
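
Steps 302 to 312 can be sketched as a small stateful tracker. The class name `LocalMinimumTracker` and its attribute names are hypothetical. The sketch follows the zero initialization given in step 302, which means the tracked minimum stays at zero until the second reset (frame 2L); a practical system might initialize differently, but that would be a departure from the text.

```python
from typing import List, Sequence


class LocalMinimumTracker:
    """Tracks Sb,min and Sb,tmp for one beam, with a reset every L frames."""

    def __init__(self, num_bins: int, window_frames: int) -> None:
        if window_frames < 1:
            raise ValueError("window_frames (L) must be at least 1")
        self.window_frames = window_frames
        self.s_min = [0.0] * num_bins  # Sb,min(f,0) = 0, as in step 302
        self.s_tmp = [0.0] * num_bins  # Sb,tmp(f,0) = 0, as in step 302
        self.t = 0                     # frame index

    def update(self, s: Sequence[float]) -> List[float]:
        """Consume the smoothed power spectrum of one frame; return Sb,min."""
        self.t += 1
        if self.t % self.window_frames == 0:
            # Step 310: every L frames, restart from the temporary minimum.
            self.s_min = [min(tmp, cur) for tmp, cur in zip(self.s_tmp, s)]
            self.s_tmp = list(s)
        else:
            # Step 306: running minimum for both vectors.
            self.s_min = [min(m, cur) for m, cur in zip(self.s_min, s)]
            self.s_tmp = [min(tmp, cur) for tmp, cur in zip(self.s_tmp, s)]
        return list(self.s_min)


tracker = LocalMinimumTracker(num_bins=1, window_frames=2)
history = [tracker.update([x])[0] for x in [5.0, 3.0, 4.0, 6.0, 2.0]]
```

In this run the minimum is 0 for the first three frames, becomes 3 at frame 4 (the second reset) and drops to 2 at frame 5.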


Returning to FIG. 2, in step 206, update the voice existence probability of the current wave beam at each frequency point. Specifically, the probability of the existence of the voice signal at each frequency point may be represented using a vector pb, which is updated in the following manner:

pb(f,t)=α2pb(f,t−1)+(1−α2)I(b,f,t)

    • wherein the parameter α2 is between 0 and 1, and the recommended setting is 0.8 to 0.99;


The value of function I(b,f,t) is

\[ I(b,f,t) = \begin{cases} 1, & S_b(f,t)/S_{b,\min}(f,t) \ge \delta_1 \\ 0, & S_b(f,t)/S_{b,\min}(f,t) < \delta_1 \end{cases}; \]

    • wherein parameter δ1 represents the threshold used to determine whether the current frame has a voice signal.
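
The update of step 206 can be sketched as below. The function name, the default values of α2 and δ1, and the small constant `eps` are assumptions: the disclosure gives no value for δ1, and `eps` is added here only to avoid dividing by a zero minimum (the ratio test is rewritten as a multiplication for the same reason).

```python
from typing import List, Sequence


def update_presence_probability(
    p_prev: Sequence[float],
    s: Sequence[float],
    s_min: Sequence[float],
    alpha2: float = 0.9,   # assumed default inside the recommended 0.8..0.99 range
    delta1: float = 5.0,   # hypothetical threshold; not specified in the source
    eps: float = 1e-12,    # assumption: guards against a zero local minimum
) -> List[float]:
    """Apply pb(f,t) = a2*pb(f,t-1) + (1-a2)*I(b,f,t) to every frequency point."""
    out = []
    for p, cur, floor in zip(p_prev, s, s_min):
        # I(b,f,t) = 1 when Sb/Sb,min >= delta1, written without a division.
        indicator = 1.0 if cur >= delta1 * max(floor, eps) else 0.0
        out.append(alpha2 * p + (1.0 - alpha2) * indicator)
    return out


# Point 0 is ten times its noise floor, point 1 is at its noise floor.
p_new = update_presence_probability([0.0, 0.5], [10.0, 1.0], [1.0, 1.0], alpha2=0.8)
```

Here the first point moves from 0 toward 1 (to about 0.2), while the second decays from 0.5 to about 0.4.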





It should be understood that step 206 may be implemented using the method of Cohen, I. and Berdugo, B.: Noise estimation by minima controlled recursive averaging for robust speech enhancement. 2002, IEEE Signal Processing Letters, 9(1): 12-15 or its variants, and other algorithms for probability estimation of voice signals. Similarly, the input to the algorithm is required to be the signal power spectrum Sb, and the output is the voice probability pb between 0 and 1.


In step 208, perform weighted averaging on the voice existence probability vector to obtain the overall voice probability of the current wave beam. Specifically, weighted averaging is performed on the vector pb: a weight of 1 is given to the frequency points in the range of 0-5 kHz and a weight of 0 to all others, to obtain the overall voice existence probability qb of wave beam b. The scalar quantity qb is used in subsequent steps instead of the vector pb, which simplifies the calculations; at the same time, since the energy of the human voice is concentrated below 5 kHz, discarding the signals above this frequency can be considered not to affect the final result.


In step 210, perform weighted averaging on the power spectrum vector to obtain the overall energy of the current wave beam. Specifically, the same weighted averaging is performed on the vector Sb to obtain the overall energy eb of wave beam b: a weight of 1 is given to frequency points in the range of 0-5 kHz and a weight of 0 to all others.
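
Steps 208 and 210 apply the same 0/1-weighted average to pb and Sb, which can be sketched with one helper. The name `band_average` is hypothetical, and the mapping from element index k to frequency (k times the sample rate divided by the transform size, over a one-sided spectrum) is an assumption about how the F points are laid out; the disclosure does not spell this out.

```python
from typing import Sequence


def band_average(
    values: Sequence[float],
    sample_rate_hz: float,
    fft_size: int,
    f_max_hz: float = 5000.0,
) -> float:
    """Average the elements whose frequency lies in 0..f_max_hz (weight 1);
    all other elements get weight 0 and drop out of the average."""
    selected = [
        v for k, v in enumerate(values)
        if k * sample_rate_hz / fft_size <= f_max_hz  # assumed bin-to-Hz mapping
    ]
    if not selected:
        raise ValueError("no frequency point falls inside the band")
    return sum(selected) / len(selected)


# 16 kHz sampling, 16-point transform: bins are 1 kHz apart, bins 0..5 are kept.
power = [1.0] * 6 + [100.0] * 3
e_b = band_average(power, sample_rate_hz=16000.0, fft_size=16)
```

The three high-frequency elements are ignored, so `e_b` is 1.0 here. Calling the same helper on pb would give qb.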


In step 212, calculate the overall voice signal energy of the current wave beam. db is defined as the voice signal energy of wave beam b, with an initial value of 0 (i.e., db(0)=0), and is updated on each frame in the following manner:

db(t)=α3db(t−1)+(1−α3)J(b,t)


The parameter α3 is between 0 and 1, and the recommended setting is 0.8 to 0.99. The function J(b,t) represents the voice signal energy of the current frame, the value of which is

\[ J(b,t) = \begin{cases} e_b(t) \cdot q_b(t), & q_b(t) \ge \delta_2 \\ 0, & q_b(t) < \delta_2 \end{cases}, \]

    • in which parameter δ2 is a threshold used to decide whether to set the function value to zero.
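
The per-frame update of step 212 can be sketched as a single scalar function. The function name and the default values of α3 and δ2 are assumptions; the disclosure recommends a range for α3 but gives no value for δ2.

```python
def update_voice_energy(
    d_prev: float,
    e: float,
    q: float,
    alpha3: float = 0.9,  # assumed default inside the recommended 0.8..0.99 range
    delta2: float = 0.5,  # hypothetical threshold; not specified in the source
) -> float:
    """Apply db(t) = a3*db(t-1) + (1-a3)*J(b,t) for one beam.

    e is the overall energy eb(t) and q the overall voice existence
    probability qb(t); J is e*q when q reaches delta2 and 0 otherwise.
    """
    j = e * q if q >= delta2 else 0.0
    return alpha3 * d_prev + (1.0 - alpha3) * j


d_voice = update_voice_energy(0.0, e=10.0, q=0.8, alpha3=0.8)  # about 1.6
d_noise = update_voice_energy(1.0, e=10.0, q=0.1, alpha3=0.8)  # decays to about 0.8
```

Because only scalars are involved, the cost per beam and frame is a handful of multiplications, which matches the stated aim of low computational complexity.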





In step 214, determine whether a next wave beam exists. If yes, go back to step 204, and execute steps 204-212 for the next wave beam; if not, go to step 218.


In step 218, a wave beam with a maximal overall voice signal energy is determined and selected as an output wave beam. Specifically, take wave beam b corresponding to the maximum value in overall voice signal energy set {db}(b=1, 2, . . . , B) as an output wave beam.
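
The selection in step 218 is an argmax over the per-beam values. The sketch below uses 0-based beam indices (the text counts from 1), and the function name and sample numbers are hypothetical. For brevity it forms the product of energy and probability directly and leaves out the smoothing and thresholding of step 212.

```python
from typing import Sequence


def select_output_beam(voice_energy: Sequence[float]) -> int:
    """Return the 0-based index of the beam with the largest db.

    Ties resolve to the lowest index, which is a choice made here,
    not something the disclosure specifies.
    """
    if not voice_energy:
        raise ValueError("at least one beam is required")
    return max(range(len(voice_energy)), key=lambda b: voice_energy[b])


# Hypothetical values: beam 0 is louder, but is unlikely to contain voice.
overall_energy = [10.0, 4.0]
voice_probability = [0.05, 0.9]
voice_energy = [e * q for e, q in zip(overall_energy, voice_probability)]
chosen = select_output_beam(voice_energy)
```

Selecting on energy alone would pick beam 0; weighting by the voice existence probability picks beam 1 instead, which is the behaviour the method is designed to achieve.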


The above embodiments provide specific operation processes by way of example, but it should be understood that the protection scope of the disclosure is not limited thereto.


While various embodiments of various aspects of the invention have been described for the purpose of the disclosure, it shall not be understood that the teaching of the disclosure is limited to these embodiments. The features disclosed in a specific embodiment are therefore not limited to that embodiment, but may be combined with the features disclosed in different embodiments. Furthermore, it should be understood that the method steps described above may be performed sequentially, performed in parallel, combined into fewer steps, split into more steps, combined and/or omitted in ways other than those described. Those skilled in the art should appreciate that there are possibly more optional embodiments and modifications and various changes and modifications may be made to the above components and configurations, without departing from the scope defined by the claims of the disclosure.

Claims
  • 1. A method for estimating a direction of arrival of sound signals from a microphone array, comprising the following steps: (a) receiving a plurality of sound signals from the microphone array comprising a plurality of microphones, and performing beamforming on the plurality of sound signals to obtain a plurality of wave beams and corresponding wave beam output signals; (b) performing the following operations on each wave beam in the plurality of wave beams: converting the wave beam output signal of a current wave beam from time domain to frequency domain to obtain a frequency spectrum vector and a power spectrum vector of the current wave beam; on the basis of the frequency spectrum vector and the power spectrum vector of the current wave beam, calculating an overall voice signal energy of the current wave beam, wherein the overall voice signal energy is a product of an overall energy and an overall voice existence probability of the current wave beam, wherein the overall energy indicates an energy level of the wave beam output signal of the current wave beam, the overall voice existence probability indicates an existence probability of voice in the wave beam output signal of the current wave beam, and the overall voice existence probability and the overall energy are scalar quantities; wherein the overall energy is obtained according to the following steps: averaging all elements of the power spectrum vector to obtain the overall energy; and the averaging comprises: performing weighted averaging on all elements of the power spectrum vector to obtain the overall energy, wherein for each element in the power spectrum vector, if the frequency point corresponding to the element falls in the range of 0-5 kHz, the element is given a weight of 1, otherwise it is given a weight of 0; (c) selecting a wave beam with a maximal overall voice signal energy value as an output wave beam; and (d) estimating the direction of arrival of sound signals from the microphone array based on a direction of the output wave beam.
  • 2. The method of claim 1, wherein the frequency spectrum vector is obtained by performing Short-Time Fourier Transform (STFT) or Short-Time Discrete Cosine Transform (DCT) on the wave beam output signal of the current wave beam.
  • 3. The method of claim 1, wherein, in step (b), after obtaining the frequency spectrum vector and the power spectrum vector of the current wave beam, update the power spectrum vector with the frequency spectrum vector according to the following formula: Sb(f,t)=α1Sb(f,t−1)+(1−α1)|Yb(f,t)|², wherein: t represents a frame index; f represents a frequency point; Sb(f,t−1) is a power spectrum corresponding to an element of the power spectrum vector of the current wave beam b at the frequency point f on frame t−1; Sb(f,t) is a power spectrum corresponding to an element of the power spectrum vector of the current wave beam b at the frequency point f on frame t; α1 is a parameter greater than 0 and less than 1; and Yb(f,t) is a frequency spectrum corresponding to an element of the frequency spectrum vector of the current wave beam b at the frequency point f on frame t.
  • 4. The method of claim 3, wherein α1 is greater than or equal to 0.9 and less than or equal to 0.99.
  • 5. The method of claim 1, wherein, in step (b), before calculating the overall voice signal energy of the current wave beam based on the frequency spectrum vector and the power spectrum vector of the current wave beam, determine a local energy minimum value corresponding to each element in the power spectrum vector of the current wave beam.
  • 6. The method of claim 5, wherein determining the local energy minimum value corresponding to each element in the power spectrum vector of the current wave beam comprises: maintaining two vectors Sb,min and Sb,tmp with the same length as the frequency spectrum vector and with an initial value of zero; each element of vectors Sb,min and Sb,tmp is updated according to the following formula: Sb,min(f,t)=min{Sb,min(f,t−1),Sb(f,t)}, Sb,tmp(f,t)=min{Sb,tmp(f,t−1),Sb(f,t)}, wherein: t represents a frame index; f represents a frequency point; Sb,min(f,t) represents a local energy minimum value corresponding to the element of the power spectrum vector of the current wave beam b at the frequency point f on frame t; Sb,min(f,t−1) represents a local energy minimum value corresponding to the element of the power spectrum vector of the current wave beam b at the frequency point f on frame t−1; Sb(f,t) represents a power spectrum corresponding to the element of the power spectrum vector of the current wave beam b at the frequency point f on frame t; Sb,tmp(f,t) represents a local energy temporary minimum value corresponding to the element of the power spectrum vector of the current wave beam b at the frequency point f on frame t; Sb,tmp(f,t−1) represents a local energy temporary minimum value corresponding to the element of the power spectrum vector of the current wave beam b at the frequency point f on frame t−1; and each time when L elements are updated according to the above formula, reset the vectors Sb,min and Sb,tmp in the following manner: Sb,min(f,t)=min{Sb,tmp(f,t−1),Sb(f,t)}, Sb,tmp(f,t)=Sb(f,t); after updating each element of the vectors Sb,min and Sb,tmp, obtain the local energy minimum value corresponding to each element in the power spectrum vector of the current wave beam b.
  • 7. The method of claim 6, wherein the L is set such that the L frames of signals comprise signals of 200 milliseconds to 500 milliseconds.
  • 8. The method of claim 1, wherein the overall voice existence probability is obtained according to the following steps: for each element in a signal power spectrum vector of the current wave beam, calculating a voice existence probability corresponding to each element in the signal power spectrum vector according to a voice existence probability model, so as to generate a voice existence probability vector of the current wave beam; and performing the following step to update each element of the voice existence probability vector of the current wave beam: pb(f,t) = α2·pb(f,t−1) + (1−α2)·I(b,f,t), wherein: t represents a frame index; f represents a frequency point; pb is a voice existence probability vector of the current wave beam b; pb(f,t−1) is a voice existence probability corresponding to the element of the voice existence probability vector of the current wave beam b at the frequency point f on frame t−1; pb(f,t) is a voice existence probability corresponding to the element of the voice existence probability vector of the current wave beam b at the frequency point f on frame t; α2 is a parameter greater than 0 and less than 1; and the value of function I(b,f,t) is
  • 9. The method of claim 8, wherein α2 is greater than or equal to 0.8 and less than or equal to 0.99.
  • 10. The method of claim 8, wherein averaging all elements of the voice existence probability vector to obtain the overall voice existence probability comprises: performing weighted averaging on all elements of the voice existence probability vector to obtain the overall voice existence probability, wherein for each element in the voice existence probability vector, if the frequency point corresponding to the element falls in the range of 0-5 kHz, the element is given a weight of 1, otherwise it is given a weight of 0.
  • 11. The method of claim 1, wherein, in step (b), after the overall voice signal energy of the current wave beam is calculated, the overall voice signal energy of the current wave beam is updated according to the following operation: db(t) = α3·db(t−1) + (1−α3)·J(b,t), wherein: db(t−1) is the overall voice signal energy of the current wave beam on frame t−1; db(t) is the overall voice signal energy of the current wave beam on frame t; α3 is a parameter greater than 0 and less than 1; and function J(b,t) represents the voice signal energy of the current frame, the value of which is:
  • 12. The method of claim 11, wherein α3 is greater than or equal to 0.8 and less than or equal to 0.99.
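The update rules recited in claims 6 to 8, 10 and 11 can be illustrated with a minimal Python sketch. It is not the patented implementation. All function and class names are hypothetical, and the following points are assumptions: the reset rule of claim 6 is applied on every L-th frame in place of the regular update; the weighted average of claim 10 is normalized by the sum of the weights; and the values I(b,f,t) and J(b,t), whose definitions are not reproduced in the claims above, are passed in as precomputed inputs.

```python
import math


def frames_per_window(window_ms, hop_ms):
    """Frame count L covering about window_ms of signal (claim 7: 200 to 500 ms).

    hop_ms (time advance per frame) is a hypothetical parameter.
    """
    return max(1, math.ceil(window_ms / hop_ms))


class MinTracker:
    """Per-frequency local energy minimum tracking for one wave beam (claim 6)."""

    def __init__(self, n_bins, L):
        self.s_min = [0.0] * n_bins  # Sb,min, zero-initialized as recited
        self.s_tmp = [0.0] * n_bins  # Sb,tmp, zero-initialized as recited
        self.L = L
        self.count = 0

    def update(self, power):
        """Consume one power spectrum frame Sb(., t) and return Sb,min(., t)."""
        self.count += 1
        if self.count % self.L == 0:
            # Assumed reset schedule: every L-th frame uses the reset rule.
            self.s_min = [min(tmp, s) for tmp, s in zip(self.s_tmp, power)]
            self.s_tmp = list(power)
        else:
            self.s_min = [min(m, s) for m, s in zip(self.s_min, power)]
            self.s_tmp = [min(tmp, s) for tmp, s in zip(self.s_tmp, power)]
        return self.s_min


def smooth(prev, current, alpha):
    """First-order recursive smoothing: alpha * prev + (1 - alpha) * current."""
    return alpha * prev + (1.0 - alpha) * current


def update_presence(p_prev, indicator, alpha2=0.9):
    """Claim 8: pb(f,t) = alpha2 * pb(f,t-1) + (1 - alpha2) * I(b,f,t).

    indicator holds I(b,f,t) per frequency point, supplied by the caller.
    """
    return [smooth(p, i, alpha2) for p, i in zip(p_prev, indicator)]


def band_average(p, freqs_hz, f_max_hz=5000.0):
    """Claim 10: weight 1 for frequency points in 0 to 5 kHz, weight 0 otherwise."""
    selected = [v for v, f in zip(p, freqs_hz) if 0.0 <= f <= f_max_hz]
    return sum(selected) / len(selected) if selected else 0.0


def update_energy(d_prev, j_value, alpha3=0.9):
    """Claim 11: db(t) = alpha3 * db(t-1) + (1 - alpha3) * J(b,t)."""
    return smooth(d_prev, j_value, alpha3)
```

Under this reading, with both vectors initialized to zero and non-negative power values, the tracked minimum stays at zero until the second reset, so roughly 2L frames pass before it reflects the signal. The default smoothing factor of 0.9 is only an example value inside the 0.8 to 0.99 range recited in claims 9 and 12.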
Priority Claims (1)
Number Date Country Kind
201911097476.0 Nov 2019 CN national
PCT Information
Filing Document Filing Date Country Kind
PCT/CN2020/128274 11/12/2020 WO
Publishing Document Publishing Date Country Kind
WO2021/093798 5/20/2021 WO A
US Referenced Citations (16)
Number Name Date Kind
6370507 Grill Apr 2002 B1
6377920 Yeldener Apr 2002 B2
9613640 Balamurali Apr 2017 B1
10096328 Markovich-Golan et al. Oct 2018 B1
20070260454 Gemello Nov 2007 A1
20120173234 Fujimoto Jul 2012 A1
20130003987 Furuta Jan 2013 A1
20130144614 Myllyla Jun 2013 A1
20140074467 Ziv Mar 2014 A1
20150039304 Wein Feb 2015 A1
20170004848 Bae Jan 2017 A1
20180033447 Ramprashad Feb 2018 A1
20180090158 Jensen Mar 2018 A1
20190259381 Ebenezer Aug 2019 A1
20190385635 Shahen Tov et al. Dec 2019 A1
20220148611 Slapak May 2022 A1
Foreign Referenced Citations (20)
Number Date Country
101510426 Aug 2009 CN
102324237 Jan 2012 CN
102508204 Jun 2012 CN
102739886 Oct 2012 CN
103456310 Dec 2013 CN
103871420 Jun 2014 CN
104751853 Jul 2015 CN
105590631 May 2016 CN
106251877 Dec 2016 CN
106448692 Feb 2017 CN
107976651 May 2018 CN
108922554 Nov 2018 CN
109346062 Feb 2019 CN
110223708 Sep 2019 CN
110390947 Oct 2019 CN
110600051 Dec 2019 CN
6114053 Apr 2017 JP
20110121319 Nov 2011 KR
2013132926 Jan 2013 WO
2018133056 Jan 2017 WO
Non-Patent Literature Citations (2)
Entry
International Search Report for PCT Publication No. WO 2021093798, dated May 20, 2021.
Office Action with Search Report for CN Patent Application No. 201911097476.0, dated Dec. 26, 2019.
Related Publications (1)
Number Date Country
20220399028 A1 Dec 2022 US