The present embodiments relate to detection of voice activity in noisy environments.
Robust voice activity detection in noisy sound environments is a difficult problem when using a small device mounted in the ear. Systems that rely on fixed thresholds often suffer from false positives and false negatives.
A method of adjusting a gain on a voice operated control system can include receiving a first microphone signal, receiving a second microphone signal, updating a slow time weighted ratio of the filtered first and second signals, and updating a fast time weighted ratio of the filtered first and second signals. The method can further include calculating an absolute difference between the fast time weighted ratio and the slow time weighted ratio, comparing the absolute difference with a threshold, and increasing the gain when the absolute difference is greater than the threshold. In some embodiments the threshold can be fixed. In some embodiments the method can further include band limiting or band pass filtering the first microphone signal to provide a filtered first signal, band limiting or band pass filtering the second microphone signal to provide a filtered second signal, calculating a power estimate of the filtered first signal including a fast time weighted average and a slow time weighted average of the filtered first signal, and calculating a power estimate of the filtered second signal including a fast time weighted average and a slow time weighted average of the filtered second signal. In some embodiments the threshold is dependent on the slow time weighted average. In some embodiments, the threshold value is set to a time averaged value of the absolute difference, and in some embodiments the threshold value is set to the time averaged value of the absolute difference using a leaky integrator for time smoothing. The step of band limiting or band pass filtering can use a weighted fast Fourier transform operation. In some embodiments, the method can further include determining a current voice activity status based on the comparison step. In some embodiments, the method can further include determining a current voice activity status using Singular Value Decomposition, a neural net system, or a bounded probability value.
The embodiments can also include an electronic device for adjusting a gain on a voice operated control system, which can include one or more processors and a memory having computer instructions. The instructions, when executed by the one or more processors, cause the one or more processors to perform the operations of receiving a first microphone signal, receiving a second microphone signal, updating a slow time weighted ratio of the filtered first and second signals, and updating a fast time weighted ratio of the filtered first and second signals. The one or more processors can further perform the operations of calculating an absolute difference between the fast time weighted ratio and the slow time weighted ratio, comparing the absolute difference with a threshold, and increasing the gain when the absolute difference is greater than the threshold. In some embodiments, adjusting or increasing the gain involves adjusting a gain of an overall system or of a total output. In some embodiments, adjusting the gain involves adjusting the gain from a first microphone, from a second microphone, or from both. In some embodiments, adjusting the gain involves adjusting the gain at the output of a VAD or comparator or other output. In some embodiments, adjusting the gain can involve any combination of the gain adjustments mentioned above.
In some embodiments, the electronic device can further include the memory having instructions that, when executed by the one or more processors, cause the one or more processors to perform the operations of band limiting or band pass filtering the first microphone signal to provide a filtered first signal, band limiting or band pass filtering the second microphone signal to provide a filtered second signal, calculating a power estimate of the filtered first signal including a fast time weighted average and a slow time weighted average of the filtered first signal, and calculating a power estimate of the filtered second signal including a fast time weighted average and a slow time weighted average of the filtered second signal. In some embodiments the threshold is fixed or the threshold is dependent on the slow time weighted average. In some embodiments, the first microphone signal is received by an ambient signal microphone and the second microphone signal is received by an ear canal microphone. The ambient signal microphone and the ear canal microphone can be part of an earphone device having a sound isolating barrier or a partially sound isolating barrier to isolate the ear canal microphone from an ambient environment. The earphone device can be any number of devices including, but not limited to, a headset, earpiece, headphone, ear bud, or other type of earphone device. In some embodiments, the sound isolating barrier or partially sound isolating barrier is an inflatable balloon or foam plug. In some embodiments, the memory further includes instructions causing the operation of determining a current voice activity status based on the comparison step. In some embodiments, the memory further includes instructions causing the operation of determining a current voice activity status using Singular Value Decomposition, neural net systems, or a bounded probability value.
In some embodiments, the first microphone signal is optionally processed using an analog or a digital band-pass filter and in some embodiments the second microphone signal is optionally processed using an analog or a digital band-pass filter. In some embodiments, at least one characteristic of the first or second microphone signals includes a short-term power estimate.
The invention may be understood from the following detailed description when read in connection with the accompanying drawing. It is emphasized, according to common practice, that various features of the drawings may not be drawn to scale. On the contrary, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. Moreover, in the drawing, common numerical references are used to represent like features. Included in the drawing are the following figures:
A new method and system are presented to robustly determine voice activity using typically two microphones mounted in a small earpiece. The determined voice activity status can be used to control the gain on a voice operated control system to gate the level of a signal directed to a second voice receiving system. This voice receiving system can be a voice communication system (e.g., a radio or telephone system), a voice recording system, a speech to text system, or a voice machine-control system. The gain of the voice operated control system is typically set to zero when no voice activity is detected, and set to unity otherwise. The overall data rate in a voice communication system can therefore be adjusted, and large data rate reductions are possible, thereby increasing the number of voice communication channels and/or increasing the voice quality for each voice communication channel. The voice activity status can also be used to adjust the power used in a wireless voice communication system, thereby extending the battery life of the system.
P_1(t)=W*FFT(M_1(t))
P_2(t)=W*FFT(M_2(t))
Where
P_1(t) is the weighted power estimate of the first microphone signal at time t.
P_2(t) is the weighted power estimate of the second microphone signal at time t.
W is a frequency weighting vector.
FFT( ) is a Fast Fourier Transform operation.
M_1(t) is the signal from the first microphone at time t.
M_2(t) is the signal from the second microphone at time t.
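The weighted power estimate step above can be sketched as follows. This is a minimal Python sketch: the use of the squared FFT magnitude as the power measure and the shape of the weighting vector W are assumptions, since the text says only that W is a frequency weighting vector applied to an FFT of the microphone frame.

```python
import numpy as np

def weighted_power(frame, weights):
    """Sketch of P(t) = W * FFT(M(t)): a frequency-weighted power estimate.

    `frame` is one block of microphone samples M(t); `weights` is the
    frequency weighting vector W, one weight per rfft bin (an assumed
    layout -- the patent does not specify the vector's form).
    """
    spectrum = np.abs(np.fft.rfft(frame)) ** 2  # per-bin power of the frame
    return float(np.dot(weights, spectrum))     # weighted sum -> scalar P(t)
```

For a frame of N samples, `np.fft.rfft` yields N//2 + 1 bins, so `weights` must have that length.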
A fast-time weighted average of the two band pass filtered power estimates is calculated at 25 and 26 respectively, with a fast time constant which in the preferred embodiment is equal to 45 ms.
AV_M1_fast(t)=(1−a)*AV_M1_fast(t−1)+a*P_1(t)
AV_M2_fast(t)=(1−a)*AV_M2_fast(t−1)+a*P_2(t)
Where
AV_M1_fast(t) is the fast time weighted average of the first band pass filtered microphone signal.
AV_M2_fast(t) is the fast time weighted average of the second band pass filtered microphone signal.
a is a fast time weighting coefficient.
A slow-time weighted average of the two band pass filtered power estimates is calculated at 27 and 28 respectively, with a slow time constant which in the preferred embodiment is equal to 500 ms.
AV_M1_slow(t)=(1−b)*AV_M1_slow(t−1)+b*P_1(t)
AV_M2_slow(t)=(1−b)*AV_M2_slow(t−1)+b*P_2(t)
Where
AV_M1_slow(t) is the slow time weighted average of the first band pass filtered microphone signal.
AV_M2_slow(t) is the slow time weighted average of the second band pass filtered microphone signal.
b is a slow time weighting coefficient, where a>b.
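The fast and slow averages are one-pole leaky-integrator updates. The sketch below writes the update as AV(t) = (1−coeff)·AV(t−1) + coeff·P(t), so a larger coefficient weights the newest power estimate more heavily and gives a shorter time constant, consistent with the fast coefficient a exceeding the slow coefficient b. The specific coefficient values in the test are illustrative assumptions; the text gives only the time constants (about 45 ms fast, 500 ms slow).

```python
def leaky_average(prev, x, coeff):
    """One leaky-integrator update: AV(t) = (1-coeff)*AV(t-1) + coeff*x(t).

    `coeff` is the time weighting coefficient (a for the fast average,
    b for the slow average, with a > b). Larger coeff -> faster tracking.
    """
    return (1.0 - coeff) * prev + coeff * x
```

Calling this once per power-estimate frame with `coeff=a` and `coeff=b` maintains the four running averages AV_M1_fast, AV_M2_fast, AV_M1_slow, and AV_M2_slow.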
The ratio of the two fast time weighted power estimates is calculated at 30 (i.e., the fast weighted power of the second microphone divided by the fast weighted power of the first microphone).
ratio_fast(t)=AV_M2_fast(t)/AV_M1_fast(t)
The ratio of the two slow time weighted power estimates is calculated at 29 (i.e., the slow weighted power of the second microphone divided by the slow weighted power of the first microphone).
ratio_slow(t)=AV_M2_slow(t)/AV_M1_slow(t)
The absolute difference of the two above ratio values is then calculated at 31.
diff(t)=abs(ratio_fast(t)−ratio_slow(t))
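The ratio and difference steps above reduce to a few lines; this sketch simply combines the two ratio calculations and the absolute difference into one helper (the function name is mine, not from the text).

```python
def activity_metric(fast_m1, fast_m2, slow_m1, slow_m2):
    """diff(t) = |ratio_fast(t) - ratio_slow(t)|.

    Inputs are the four running averages: the fast and slow weighted
    powers of microphones 1 and 2. A large diff means the fast ratio has
    moved away from its long-term baseline, e.g. when the wearer speaks.
    """
    ratio_fast = fast_m2 / fast_m1  # fast weighted power ratio (mic 2 / mic 1)
    ratio_slow = slow_m2 / slow_m1  # slow weighted power ratio (mic 2 / mic 1)
    return abs(ratio_fast - ratio_slow)
```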
Note that the updating of the slow time weighted ratio in one embodiment is of the first filtered signal and the second filtered signal, where the first filtered signal and the second filtered signal are the slow weighted powers of the first and second microphone signals. Similarly, the updating of the fast time weighted ratio is of the first filtered signal and the second filtered signal, where the first filtered signal and the second filtered signal are the fast weighted powers of the first and second microphone signals. As noted above, the absolute difference between the fast time weighted ratio and the slow time weighted ratio is calculated to provide a value.
This value is then compared with a threshold at 32; if the value diff(t) is greater than this threshold, the current voice activity status is determined to be active at 33, and the VOX gain value is updated at 34, in this example increased (up to a maximum value of unity).
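The comparison and gain update can be sketched as below. The attack step of 0.1 per frame is an illustrative assumption; the text says only that the gain is increased when the threshold is exceeded, up to a maximum of unity.

```python
def update_vox_gain(gain, diff, threshold, step=0.1):
    """Compare diff(t) with the threshold and update the VOX gain.

    Returns the new gain and the current voice activity status. When
    diff(t) exceeds the threshold, voice is judged active and the gain
    is raised by `step` (an assumed attack rate), capped at unity.
    """
    voice_active = diff > threshold
    if voice_active:
        gain = min(1.0, gain + step)
    return gain, voice_active
```

Gating the transmit path with this gain is what lets the system zero the signal (and hence the data rate) during silence.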
In one exemplary embodiment the threshold value is fixed.
In a second embodiment the threshold value is dependent on the slow weighted level AV_M1_slow.
In a third embodiment the threshold value is set to be equal to the time averaged value of the diff(t), for example calculated according to the following:
threshold(t)=(1−c)*threshold(t−1)+c*diff(t)
where c is a time smoothing coefficient such that the time smoothing is a leaky integrator type with a smoothing time of approximately 500 ms.
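The adaptive threshold of the third embodiment is the same leaky-integrator form applied to diff(t). The default coefficient below assumes a 100 Hz update rate to approximate the stated 500 ms smoothing time; the actual update rate is not given in the text, so treat c = 0.02 as a placeholder.

```python
def update_threshold(prev_threshold, diff, c=0.02):
    """Adaptive threshold: threshold(t) = (1-c)*threshold(t-1) + c*diff(t).

    c is the time smoothing coefficient of the leaky integrator; c = 0.02
    gives roughly a 500 ms smoothing time at an assumed 100 Hz update rate.
    """
    return (1.0 - c) * prev_threshold + c * diff
```

Because the threshold tracks the long-term average of diff(t), the detector adapts to slowly changing noise conditions instead of relying on a fixed level.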
Although the invention is illustrated and described herein with reference to specific embodiments, the invention is not intended to be limited to the details shown. Rather, various modifications may be made in the details within the scope and range of equivalents of the claims and without departing from the embodiments claimed.
This application is a continuation of and claims priority to U.S. patent application Ser. No. 16/227,695, filed 20 Dec. 2018, which is a continuation of and claims priority to U.S. patent application Ser. No. 14/922,475, filed on Oct. 26, 2015, which claims priority to U.S. Provisional Patent Application No. 62/068,273, filed on Oct. 24, 2014, all of which are hereby incorporated by reference in their entireties.
Number | Name | Date | Kind |
---|---|---|---|
3876843 | Moen | Apr 1975 | A |
4054749 | Suzuki et al. | Oct 1977 | A |
4088849 | Usami et al. | May 1978 | A |
4947440 | Bateman et al. | Aug 1990 | A |
5208867 | Stites, III | May 1993 | A |
5251263 | Andrea | Oct 1993 | A |
5267321 | Langberg | Nov 1993 | A |
5317273 | Hanson | May 1994 | A |
5479522 | Lindemann | Dec 1995 | A |
5524056 | Killion et al. | Jun 1996 | A |
5550923 | Hotvet | Aug 1996 | A |
5577511 | Killion | Nov 1996 | A |
5903868 | Yuen et al. | May 1999 | A |
5946050 | Wolff | Aug 1999 | A |
6021207 | Puthuff et al. | Feb 2000 | A |
6021325 | Hall | Feb 2000 | A |
6028514 | Lemelson | Feb 2000 | A |
6056698 | Iseberg | May 2000 | A |
6163338 | Johnson et al. | Dec 2000 | A |
6163508 | Kim et al. | Dec 2000 | A |
6226389 | Lemelson et al. | May 2001 | B1 |
6298323 | Kaemmerer | Oct 2001 | B1 |
6359993 | Brimhall | Mar 2002 | B2 |
6400652 | Goldberg et al. | Jun 2002 | B1 |
6408272 | White | Jun 2002 | B1 |
6415034 | Hietanen | Jul 2002 | B1 |
6567524 | Svean et al. | May 2003 | B1 |
6647368 | Nemirovski | Nov 2003 | B2 |
RE38351 | Iseberg et al. | Dec 2003 | E |
6661901 | Svean et al. | Dec 2003 | B1 |
6728385 | Kvaloy et al. | Apr 2004 | B2 |
6748238 | Lau | Jun 2004 | B1 |
6754359 | Svean et al. | Jun 2004 | B1 |
6738482 | Jaber | Sep 2004 | B1 |
6804638 | Fiedler | Oct 2004 | B2 |
6804643 | Kiss | Oct 2004 | B1 |
7003099 | Zhang | Feb 2006 | B1 |
7072482 | Van Doorn et al. | Jul 2006 | B2 |
7107109 | Nathan et al. | Sep 2006 | B1 |
7158933 | Balan | Jan 2007 | B2 |
7177433 | Sibbald | Feb 2007 | B2 |
7209569 | Boesen | Apr 2007 | B2 |
7280849 | Bailey | Oct 2007 | B1 |
7430299 | Armstrong et al. | Sep 2008 | B2 |
7433714 | Howard et al. | Oct 2008 | B2 |
7444353 | Chen | Oct 2008 | B1 |
7450730 | Bertg et al. | Nov 2008 | B2 |
7464029 | Visser et al. | Dec 2008 | B2 |
7477756 | Wickstrom et al. | Jan 2009 | B2 |
7512245 | Rasmussen | Mar 2009 | B2 |
7529379 | Zurek | May 2009 | B2 |
7562020 | Le et al. | Jun 2009 | B2 |
7756285 | Sjursen et al. | Jul 2010 | B2 |
7778434 | Juneau et al. | Aug 2010 | B2 |
7853031 | Hamacher | Dec 2010 | B2 |
7903826 | Boersma | Mar 2011 | B2 |
7920557 | Moote | Apr 2011 | B2 |
7936885 | Frank | May 2011 | B2 |
8014553 | Radivojevic et al. | Sep 2011 | B2 |
8018337 | Jones | Sep 2011 | B2 |
8045840 | Murata | Oct 2011 | B2 |
8150044 | Goldstein | Apr 2012 | B2 |
8162846 | Epley | Apr 2012 | B2 |
8189803 | Bergeron | May 2012 | B2 |
8270629 | Bothra | Sep 2012 | B2 |
8437480 | Zong | May 2013 | B2 |
8477955 | Engle | Jul 2013 | B2 |
8493204 | Wong et al. | Jul 2013 | B2 |
8577062 | Goldstein | Nov 2013 | B2 |
8600086 | Jensen et al. | Dec 2013 | B2 |
8611560 | Goldstein | Dec 2013 | B2 |
8750295 | Liron | Jun 2014 | B2 |
8774433 | Goldstein | Jul 2014 | B2 |
8855343 | Usher | Oct 2014 | B2 |
8935164 | Turnbull | Jan 2015 | B2 |
9037458 | Park et al. | May 2015 | B2 |
9053697 | Park | Jun 2015 | B2 |
9113240 | Ramakrishman | Aug 2015 | B2 |
9123343 | Kurki-Suonio | Sep 2015 | B2 |
9135797 | Couper et al. | Sep 2015 | B2 |
9191740 | McIntosh | Nov 2015 | B2 |
9532139 | Lu et al. | Dec 2016 | B1 |
9628896 | Ichimura | Apr 2017 | B2 |
10171922 | Merks | Jan 2019 | B2 |
10652672 | Merks | May 2020 | B2 |
20010046304 | Rast | Nov 2001 | A1 |
20020106091 | Furst et al. | Aug 2002 | A1 |
20020118798 | Langhart et al. | Aug 2002 | A1 |
20020193130 | Yang | Dec 2002 | A1 |
20030033152 | Cameron | Feb 2003 | A1 |
20030035551 | Light | Feb 2003 | A1 |
20030130016 | Matsuura | Jul 2003 | A1 |
20030161097 | Le et al. | Aug 2003 | A1 |
20030165246 | Kvaloy et al. | Sep 2003 | A1 |
20030198359 | Killion | Oct 2003 | A1 |
20040042103 | Mayer | Mar 2004 | A1 |
20040044520 | Chen | Mar 2004 | A1 |
20040109668 | Stuckman | Jun 2004 | A1 |
20040125965 | Alberth, Jr. et al. | Jul 2004 | A1 |
20040133421 | Burnett | Jul 2004 | A1 |
20040190737 | Kuhnel et al. | Sep 2004 | A1 |
20040196992 | Ryan | Oct 2004 | A1 |
20040202340 | Armstrong | Oct 2004 | A1 |
20040203351 | Shearer et al. | Oct 2004 | A1 |
20050058313 | Victorian | Mar 2005 | A1 |
20050071158 | Byford | Mar 2005 | A1 |
20050078838 | Simon | Apr 2005 | A1 |
20050102142 | Soufflet | May 2005 | A1 |
20050123146 | Voix et al. | Jun 2005 | A1 |
20050207605 | Dehe | Sep 2005 | A1 |
20050227674 | Kopra | Oct 2005 | A1 |
20050276420 | Davis | Dec 2005 | A1 |
20050281422 | Armstrong | Dec 2005 | A1 |
20050281423 | Armstrong | Dec 2005 | A1 |
20050283369 | Clauser et al. | Dec 2005 | A1 |
20050288057 | Lai et al. | Dec 2005 | A1 |
20060020451 | Kushner | Jan 2006 | A1 |
20060064037 | Shalon et al. | Mar 2006 | A1 |
20060067551 | Cartwright et al. | Mar 2006 | A1 |
20060083387 | Emoto | Apr 2006 | A1 |
20060083395 | Allen et al. | Apr 2006 | A1 |
20060092043 | Lagassey | May 2006 | A1 |
20060173563 | Borovitski | Aug 2006 | A1 |
20060182287 | Schulein | Aug 2006 | A1 |
20060195322 | Broussard et al. | Aug 2006 | A1 |
20060204014 | Isenberg et al. | Sep 2006 | A1 |
20060287014 | Matsuura | Dec 2006 | A1 |
20070043563 | Comerford et al. | Feb 2007 | A1 |
20070014423 | Darbut | Apr 2007 | A1 |
20070086600 | Boesen | Apr 2007 | A1 |
20070092087 | Bothra | Apr 2007 | A1 |
20070100637 | McCune | May 2007 | A1 |
20070143820 | Pawlowski | Jun 2007 | A1 |
20070160243 | Dijkstra | Jul 2007 | A1 |
20070189544 | Rosenberg | Aug 2007 | A1 |
20070223717 | Boersma | Sep 2007 | A1 |
20070253569 | Bose | Nov 2007 | A1 |
20070276657 | Gournay | Nov 2007 | A1 |
20070291953 | Ngia et al. | Dec 2007 | A1 |
20080037801 | Alves et al. | Feb 2008 | A1 |
20080063228 | Mejia | Mar 2008 | A1 |
20080130908 | Cohen | Jun 2008 | A1 |
20080159547 | Schuler | Jul 2008 | A1 |
20080165988 | Terlizzi et al. | Jul 2008 | A1 |
20080221880 | Cerra et al. | Sep 2008 | A1 |
20090010456 | Goldstein et al. | Jan 2009 | A1 |
20090024234 | Archibald | Jan 2009 | A1 |
20090076821 | Brenner | Mar 2009 | A1 |
20090286515 | Othmer | May 2009 | A1 |
20100061564 | Clemow et al. | Mar 2010 | A1 |
20100086139 | Nicolino, Jr. | Apr 2010 | A1 |
20100086141 | Nicolino, Jr. | Apr 2010 | A1 |
20100119077 | Platz | May 2010 | A1 |
20100296668 | Lee et al. | Nov 2010 | A1 |
20100328224 | Kerr et al. | Dec 2010 | A1 |
20110055256 | Phillips | Mar 2011 | A1 |
20110096939 | Ichimura | Apr 2011 | A1 |
20110187640 | Jacobsen et al. | Aug 2011 | A1 |
20110264447 | Visser et al. | Oct 2011 | A1 |
20110293103 | Park et al. | Dec 2011 | A1 |
20120123772 | Thyssen | May 2012 | A1 |
20120170412 | Calhoun | Jul 2012 | A1 |
20130170660 | Kristensen | Jul 2013 | A1 |
20130188796 | Kristensen | Jul 2013 | A1 |
20130297305 | Turnbull | Nov 2013 | A1 |
20130329912 | Krishnaswamy | Dec 2013 | A1 |
20140122092 | Goldstein | May 2014 | A1 |
20140229170 | Atti | Aug 2014 | A1 |
20160049915 | Wang | Feb 2016 | A1 |
20160058378 | Wisby et al. | Mar 2016 | A1 |
20160078879 | Lu | Mar 2016 | A1 |
20160104452 | Guan et al. | Apr 2016 | A1 |
20160165361 | Miller et al. | Jun 2016 | A1 |
20180122400 | Rasmussen | May 2018 | A1 |
Number | Date | Country |
---|---|---|
1385324 | Jan 2004 | EP |
1401240 | Mar 2004 | EP |
1519625 | Mar 2005 | EP |
1640972 | Mar 2006 | EP |
2146519 | Jun 2012 | EP |
2884763 | Jun 2015 | EP |
2884763 | May 2019 | EP |
H0877468 | Mar 1996 | JP |
H10162283 | Jun 1998 | JP |
3353701 | Dec 2002 | JP |
9326085 | Dec 1993 | WO |
2004114722 | Dec 2004 | WO |
2006037156 | Apr 2006 | WO |
2006054698 | May 2006 | WO |
2007092660 | Aug 2007 | WO |
2008050583 | May 2008 | WO |
2009023784 | Feb 2009 | WO |
2012097150 | Jul 2012 | WO |
Entry |
---|
Bernard Widrow, John R. Glover Jr., John M. McCool, John Kaunitz, Charles S. Williams, Robert H. Hearn, James R. Zeidler, Eugene Dong Jr, and Robert C. Goodlin, Adaptive Noise Cancelling: Principles and Applications, Proceedings of The IEEE, vol. 63, No. 12, Dec. 1975. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-00282, Dec. 21, 2021. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-00242, Dec. 23, 2021. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-00243, Dec. 23, 2021. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-00234, Dec. 21, 2021. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-00253, Jan. 18, 2022. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-00324, Jan. 13, 2022. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-00281, Jan. 18, 2022. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-00302, Jan. 13, 2022. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-00369, Feb. 18, 2022. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-00388, Feb. 18, 2022. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-00410, Feb. 18, 2022. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-01078, Jun. 9, 2022. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-01099, Jun. 9, 2022. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-01106, Jun. 9, 2022. |
Samsung Electronics Co., Ltd., and Samsung Electronics, America, Inc., v. Staton Techiya, LLC, IPR2022-01098, Jun. 9, 2022. |
U.S. Appl. No. 90/015,146, Samsung Electronics Co., Ltd. and Samsung Electronics, America, Inc., Request for Ex Parte Reexamination of U.S. Pat. No. 10,979,836. |
Olwal, A. and Feiner S. Interaction Techniques Using Prosodic Features of Speech and Audio Localization. Proceedings of IUI 2005 (International Conference on Intelligent User Interfaces), San Diego, CA, Jan. 9-12, 2005, p. 284-286. |
Mauro Dentino, John M. McCool, and Bernard Widrow, Adaptive Filtering in the Frequency Domain, Proceedings of the IEEE, vol. 66, No. 12, Dec. 1978. |
Number | Date | Country | |
---|---|---|---|
20200401368 A1 | Dec 2020 | US |
Number | Date | Country | |
---|---|---|---|
62068273 | Oct 2014 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 16227695 | Dec 2018 | US |
Child | 17013538 | US | |
Parent | 14922475 | Oct 2015 | US |
Child | 16227695 | US |