This application relates to microphones and, more specifically, to approaches for operating these microphones.
Microphones are used to obtain a voice signal from a speaker. Once obtained, the signal can be processed in a number of different ways. Today's microphones provide a wide variety of functions, and they can interface with and utilize a variety of different algorithms.
Voice triggering, for example, as used in mobile systems, is an increasingly popular feature. A user may wish to speak commands into a mobile device and have the device react in response to those commands. In these cases, a voice activity detector may first detect whether there is voice in an audio signal captured by a microphone, and then further analysis may be performed on the signal to recognize the words spoken in the received audio. Various voice activity detection (VAD) approaches have been developed and deployed in various types of devices such as cellular phones and personal computers.
Always-on microphones are often equipped with internal oscillators and operate at very low power. Such low power microphones are used in various applications, and sometimes two or more microphones are activated when the device is brought out of the low power mode. Although the low power aspect allows some of the microphones to remain on at all times in a low power listening mode, these microphones may also use buffers to aid in voice activity detection, and the buffers introduce processing delays. These delays may cause problems at the far end of the system, where the signals frequently need to be processed as quickly as possible.
These problems have resulted in some user dissatisfaction with previous approaches.
For a more complete understanding of the disclosure, reference should be made to the following detailed description and accompanying drawings wherein:
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity. It will further be appreciated that certain actions and/or steps may be described or depicted in a particular order of occurrence while those skilled in the art will understand that such specificity with respect to sequence is not actually required. It will also be understood that the terms and expressions used herein have the ordinary meaning as is accorded to such terms and expressions with respect to their corresponding respective areas of inquiry and study except where specific meanings have otherwise been set forth herein.
The present approaches utilize two (or potentially more) microphones to obtain speech phrases from an utterance of a speaker. The effects of delays caused by buffers in one of the microphones are significantly reduced or eliminated. The approaches described herein are easy to implement and eliminate problems and limitations associated with prior approaches.
Referring now to FIG. 1, one example of a system for obtaining speech from a user is described. The system includes a first microphone 102, a second microphone 104, and a processing device 106.
The first microphone 102 and the second microphone 104 may be microelectromechanical system (MEMS) microphones. In one example, these microphones are assemblies including a sensing element (a diaphragm and a back plate) and an application specific integrated circuit (ASIC), which includes a buffer in the case of the microphone 102 and potentially performs other processing functions. Sound energy received by the microphones moves the diaphragms and produces an electrical signal (which may or may not be buffered).
The processing device 106 may include a codec 108 and an application processor 110. The codec 108 in this example may supply the clock signals to the microphones 102 and 104, and may perform other signal processing functions. The application processor 110 may also perform processing related to the device in which the microphones 102 and 104 are deployed. For example, if the microphones 102 and 104 are deployed in a cellular phone, the application processor 110 may perform processing associated with the cellular phone. Although both a codec 108 and an application processor 110 are shown here, it will be appreciated that these devices can be merged together into a single processing device.
A clock signal 112 is applied to the microphones. Applying the clock signal together with the power signal causes the first microphone 102 to operate in a normal operating mode, in which incoming data at the ASIC is not buffered but is passed directly to the output of the microphone 102. Withholding the clock signal after power has been applied causes the first microphone 102 to operate in a low power operating mode. In this mode, incoming data at the ASIC is buffered rather than passed directly to the output of the microphone 102, thereby introducing a buffering delay of, in one example, 256 milliseconds. The clock signal may be applied at one frequency when the microphone is in the low power mode after acoustic activity has been detected, and may be applied at the same or a different frequency in the normal operating mode.
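By way of illustration only, the following sketch models the clock-gated mode selection just described as software. A real MEMS microphone implements this behavior in its ASIC; the identifiers and program structure are assumptions, with only the 256 millisecond figure taken from the example above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Minimal software model of the clock-gated mode selection described
 * above. A real MEMS microphone implements this behavior in its ASIC;
 * the identifiers and structure here are illustrative assumptions, and
 * only the 256 millisecond figure comes from the example in the text. */

#define BUFFER_DELAY_MS 256  /* example buffering delay in low power mode */

typedef enum {
    MODE_LOW_POWER,  /* powered, no external clock: output is buffered */
    MODE_NORMAL      /* powered, external clock applied: pass-through  */
} mic_mode_t;

/* The operating mode follows the presence of the externally supplied clock. */
static mic_mode_t mic_select_mode(bool clock_present)
{
    return clock_present ? MODE_NORMAL : MODE_LOW_POWER;
}

int main(void)
{
    if (mic_select_mode(false) == MODE_LOW_POWER)
        printf("power, no clock -> low power, buffered (+%d ms delay)\n",
               BUFFER_DELAY_MS);
    if (mic_select_mode(true) == MODE_NORMAL)
        printf("power + clock   -> normal, pass-through (no buffering)\n");
    return 0;
}
```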
In one example of the operation of the system of FIG. 1, the first microphone 102 initially operates in the low power mode, buffering incoming data, while the second microphone 104 is off; the processing device 106 then analyzes the buffered data from the first microphone 102 as described below.
In one embodiment, the processing device 106 determines whether a trigger phrase (e.g., OK GOOGLE NOW) is present within a phrase segment received at the first microphone 102. By "trigger phrase" is meant any phrase signifying that a command immediately follows. The second microphone 104 is turned on by the processing device 106 as a result of the trigger phrase having been detected at the first microphone; after activation, the second microphone 104 captures voice data in real time.
In one embodiment, the first microphone 102 is turned off by the processing device 106 after a time period of (delay + x), where delay is the buffering delay of the first microphone 102 and x is the period of common speech information that has been received at each of the microphones 102 and 104. In one example, x is determined by the algorithm used to calibrate the two microphones. This calibration may include determination of, and compensation for, the acoustic delay and gain difference between the microphones 102 and 104. In embodiments where the first microphone is turned off, it is quickly turned back on (e.g., within approximately 20 milliseconds) and placed in the normal mode of operation by receiving the clock signal 112 from the processing device 106.
As will be more fully apparent from the discussion below, at least one and in some cases both microphones do not detect the entire uttered phrase (e.g., OK GOOGLE NOW, WHAT IS THE WEATHER TODAY?), and thus the one or more microphones do not provide data derived from or corresponding to the entire phrase to the processing device 106 for further processing. At the processing device 106, the entire phrase (e.g., OK GOOGLE NOW, WHAT IS THE WEATHER TODAY?) is stitched together for each of the microphones 102 and 104 based upon information received from both microphones. An output 114 from the processing device 106 includes assembled phrases 116 and 118 (e.g., each being OK GOOGLE NOW, WHAT IS THE WEATHER TODAY?), with the first phrase 116 being associated with the first microphone 102 and the second phrase 118 being associated with the second microphone 104. It will be appreciated that the various processing described above can occur at the codec 108, at the application processor 110, or at other processing devices (not shown in FIG. 1).
Referring now to FIG. 2, one example of an approach for operating the two microphones of FIG. 1 is described.
At step 202, the first microphone is on and uses the buffer. The second microphone is off.
At step 204, the trigger phrase is detected from data received from the first microphone. At step 206, the second microphone is turned on as a result of the trigger phrase having been detected. At step 208, the second microphone captures voice data in real time.
At step 210, the first microphone is turned off after a time period of (delay + x), where the delay is the buffering delay of the first microphone (i.e., how long data takes to move through its buffer) and x is the period of common speech between the two microphones. In one example, x can be determined by the algorithm used to calibrate the two microphones.
At step 212, the first microphone is quickly turned on after being turned off (e.g., the microphone is activated approximately 20 milliseconds after being deactivated) and placed in the normal mode of operation (i.e., the non-buffering mode of operation explained elsewhere herein). At step 214, data derived from the segments of the phrase received by the plural microphones are stitched together using suitable algorithms to form electronic representations of the entire phrase, one associated with each microphone; a control-flow sketch of steps 202 through 214 follows. One example of assembling data from different microphones to form two separate electronic representations of complete phrases (one for the first microphone and the other for the second microphone) is described below with respect to FIG. 4.
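The control-flow sketch below casts steps 202 through 214 as a simple state machine. It is a minimal illustration, not actual microphone firmware: the state and function names are hypothetical, the 40 millisecond overlap is a made-up stand-in for x, and only the 256 and 20 millisecond constants come from the examples in the text.

```c
#include <stdio.h>

/* Illustrative state machine for steps 202 through 214. The 256 and
 * 20 millisecond constants echo the examples in the text; the state
 * names, the function, and the 40 ms overlap value are hypothetical. */

#define BUFFER_DELAY_MS  256  /* buffering delay of the first microphone */
#define RESTART_DELAY_MS  20  /* approximate off-to-on gap for mic 1     */

typedef enum {
    S202_MIC1_BUFFERED,  /* mic 1 on (buffered), mic 2 off          */
    S204_TRIGGER_FOUND,  /* trigger phrase detected in mic 1 data   */
    S206_MIC2_ON,        /* mic 2 turned on, capturing in real time */
    S210_MIC1_OFF,       /* mic 1 turned off after (delay + x)      */
    S212_MIC1_NORMAL,    /* mic 1 back on, clocked, pass-through    */
    S214_STITCH          /* assemble the full phrase per microphone */
} state_t;

static state_t step(state_t s, int overlap_x_ms)
{
    switch (s) {
    case S202_MIC1_BUFFERED: return S204_TRIGGER_FOUND;
    case S204_TRIGGER_FOUND: return S206_MIC2_ON;
    case S206_MIC2_ON:
        printf("turn mic 1 off after %d ms\n", BUFFER_DELAY_MS + overlap_x_ms);
        return S210_MIC1_OFF;
    case S210_MIC1_OFF:
        printf("turn mic 1 back on within ~%d ms, clock applied\n",
               RESTART_DELAY_MS);
        return S212_MIC1_NORMAL;
    default:
        return S214_STITCH;
    }
}

int main(void)
{
    state_t s = S202_MIC1_BUFFERED;
    while (s != S214_STITCH)
        s = step(s, 40 /* hypothetical overlap x in milliseconds */);
    return 0;
}
```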
Referring now to FIG. 3, one example of the timing of the operation of the system of FIG. 1 is described.
At step 322, a user utters a trigger phrase (e.g., OK GOOGLE NOW) seamlessly followed by a command (e.g., WHAT IS THE WEATHER TODAY?). In this example, the trigger phrase and command are collectively labeled as 302. A first microphone 352 and a second microphone 354 (a pulse density modulation (PDM) microphone) detect parts of the complete phrase (e.g., the trigger and command in the example above). The first microphone 352 is in a low power sensing mode, and all signals in this mode are buffered before being output, consequently introducing a buffer delay 324 (e.g., 256 milliseconds). During this delay time, a time period 304 exists at the output of the first microphone 352 where no audio is being supplied. At time 326, the audio output of the first microphone 352 begins. As mentioned, the buffer delay is approximately 256 milliseconds, and the correspondingly delayed output occurs at the output of the first microphone 352 during period 306.
Another delay 328 (in this case an approximately 100 millisecond delay) may be introduced by the trigger phrase recognition algorithm in the processor (e.g., a codec, application processor, or digital signal processor, to mention a few examples). The trigger phrase recognition algorithm compares the received audio to a predefined trigger word or phrase to determine whether the trigger word or phrase has been uttered. The delay 328 may occur after time 330, which is the end of the delayed version of the trigger phrase. At time 332, after a delay of approximately 256 plus 100 milliseconds (approximately 356 milliseconds in total), a beep (or other signal) is emitted or presented to the user signifying that the trigger phrase has been detected. In some examples, no beep is used, and in other examples the "beep" may be inaudible to humans. This signal may be a marker used in later processing and may be removed before the various speech segments are stitched together.
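As a worked example of this timeline, the sketch below computes when the events of FIG. 3 occur relative to the start of the utterance. The 256 and 100 millisecond delays come from the text; the trigger phrase duration is a hypothetical figure, since the disclosure does not give one.

```c
#include <stdio.h>

/* Worked timing for the FIG. 3 example, measured from the start of the
 * utterance. The 256 ms buffer delay (324) and 100 ms recognition delay
 * (328) come from the text; the trigger phrase duration is hypothetical. */
int main(void)
{
    const int buffer_delay_ms = 256;  /* delay 324: mic 1 buffering        */
    const int recog_delay_ms  = 100;  /* delay 328: trigger recognition    */
    const int trigger_len_ms  = 900;  /* assumed length of "OK GOOGLE NOW" */

    const int output_start = buffer_delay_ms;                  /* time 326 */
    const int trigger_end  = trigger_len_ms + buffer_delay_ms; /* time 330 */
    const int beep_time    = trigger_end + recog_delay_ms;     /* time 332 */

    printf("mic 1 output starts at  %4d ms\n", output_start);
    printf("delayed trigger ends at %4d ms\n", trigger_end);
    printf("beep presented at       %4d ms\n", beep_time);
    return 0;
}
```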
At time 334, the second microphone 354 is turned on. Prior to the second microphone 354 being turned on, a time period 304 exists at its output where no audio is being produced.
At time 336, the first microphone 352 is turned off a predetermined time after the second microphone 354 is turned on. The predetermined time may be 256 milliseconds (the buffer delay of the first microphone 352) plus x, where x is a time period 338 of overlapping speech segments that is used to determine the acoustic delay between the microphones 352 and 354. In this example, x corresponds to the phrase segment "E WEA", because "E WEA" is the audio that has been received at both of the microphones 352 and 354. In this regard, the processor can first determine the common audio information received and then use that information to calibrate the microphone signals. This common time period (in this case, the duration of the segment "E WEA") is the value of x.
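One conventional way to locate such a common segment and estimate the delay and gain between two microphones is cross-correlation. The sketch below illustrates that general technique under toy conditions; it is not the calibration algorithm of the disclosure, and all identifiers in it are assumptions.

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* Minimal cross-correlation sketch: estimate the lag (in samples) at
 * which two overlapping speech segments line up, plus a gain ratio.
 * This illustrates the general idea of calibrating two microphones on
 * a common segment; it is not the algorithm of the disclosure. */

/* Returns the lag of y relative to x that maximizes correlation. */
static int best_lag(const float *x, const float *y, size_t n, int max_lag)
{
    int best = 0;
    double best_corr = -INFINITY;
    for (int lag = -max_lag; lag <= max_lag; lag++) {
        double corr = 0.0;
        for (size_t i = 0; i < n; i++) {
            int j = (int)i + lag;
            if (j >= 0 && (size_t)j < n)
                corr += (double)x[i] * y[j];
        }
        if (corr > best_corr) {
            best_corr = corr;
            best = lag;
        }
    }
    return best;
}

/* RMS gain ratio of x over y across the segments. */
static double gain_ratio(const float *x, const float *y, size_t n)
{
    double ex = 0.0, ey = 0.0;
    for (size_t i = 0; i < n; i++) {
        ex += (double)x[i] * x[i];
        ey += (double)y[i] * y[i];
    }
    return (ey > 0.0) ? sqrt(ex / ey) : 0.0;
}

int main(void)
{
    /* Toy signals: y is x delayed by 3 samples and attenuated by half. */
    float x[16] = {0}, y[16] = {0};
    for (int i = 0; i < 10; i++) x[i] = (float)sin(0.7 * i);
    for (int i = 3; i < 13; i++) y[i] = 0.5f * x[i - 3];

    printf("estimated lag: %d samples, gain ratio: %.2f\n",
           best_lag(x, y, 16, 8), gain_ratio(x, y, 16));
    return 0;
}
```

In practice, such a search would run over properly framed audio at matched sample rates, and the gain would be estimated only over the aligned overlap; the toy signals here merely demonstrate the principle.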
At step 340, the first microphone 352 is turned back on after a very small delay (e.g., approximately 20 milliseconds) to operate in the normal processing mode. By normal processing or operating mode, it is meant that the microphone 352 passes data through to its output without buffering it. In one example, the normal operating mode is entered by applying a clock signal together with the power signal, whereas in the low power mode no clock signal is applied until acoustic activity is detected.
Referring now to FIG. 4, one example of assembling complete phrases from the audio segments captured by the microphones 352 and 354 is described.
As shown, the user uttered audio 402, in this example, is OK GOOGLE NOW, WHAT IS THE WEATHER TODAY? There is a large segment of missing audio (at period 410) at the second microphone 354. However, the first microphone 352 has obtained this audio during time period 404. Thus, the audio for time period 404 can be used, after appropriate calibration, to supply the audio for time period 410 for the second microphone.
Time period 406 is missing from the first microphone 352. But the second microphone 354 has obtained this audio, and so it can be included in the audio for the first microphone 352.
The first microphone 352 has obtained audio for time period 408 in real time. Consequently, the complete audio phrase (430) (“OK GOOGLE NOW, WHAT IS THE WEATHER TODAY?”) has been assembled for the first microphone 352 since time periods 404, 406, and 408 have been filled in.
The second microphone 354 obtains real time audio for time period 412. Consequently, the audio phrase 432 (“OK GOOGLE NOW, WHAT IS THE WEATHER TODAY?”) has been assembled for the second microphone 354 since time periods 410 and 412 have been filled in.
In this way, the audio phrases 430 (for the first microphone 352) and 432 (for the second microphone 354) are assembled or stitched together. This processing may occur at a codec, application processor, or digital signal processor, to mention a few examples. The phrases 430 and 432 may be further processed by other processing devices as needed.
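A minimal sketch of this stitching step follows, assuming fixed-length sample buffers and gains already obtained from the calibration over the overlap x. The segment numbering mirrors FIG. 4, but the function and array names, lengths, and gain values are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative stitching per FIG. 4: each microphone's complete phrase
 * is assembled by filling its missing span with (calibrated) audio from
 * the other microphone. Fixed sizes, gains, and names are assumptions. */

#define PHRASE_LEN 12  /* toy phrase length, in samples */

/* Copy src[start..end) into dst[start..end), applying a gain presumed
 * to come from the calibration over the overlap x. */
static void fill_gap(float *dst, const float *src,
                     int start, int end, float gain)
{
    for (int i = start; i < end; i++)
        dst[i] = gain * src[i];
}

int main(void)
{
    /* mic1 captured periods 404 and 408 but is missing period 406;
     * mic2 captured period 412 but is missing period 410.          */
    float mic1[PHRASE_LEN] = {1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1};
    float mic2[PHRASE_LEN] = {0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2};

    float phrase430[PHRASE_LEN], phrase432[PHRASE_LEN];
    memcpy(phrase430, mic1, sizeof mic1);
    memcpy(phrase432, mic2, sizeof mic2);

    /* Phrase 430: fill mic1's gap (period 406) from mic2, gain 0.5. */
    fill_gap(phrase430, mic2, 6, 8, 0.5f);
    /* Phrase 432: fill mic2's gap (period 410) from mic1, gain 2.0. */
    fill_gap(phrase432, mic1, 0, 6, 2.0f);

    for (int i = 0; i < PHRASE_LEN; i++)
        printf("%2d: phrase430=%.1f phrase432=%.1f\n",
               i, phrase430[i], phrase432[i]);
    return 0;
}
```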
Preferred embodiments of this disclosure are described herein, including the best mode known to the inventors. It should be understood that the illustrated embodiments are exemplary only, and should not be taken as limiting the scope of the appended claims.
This patent claims benefit under 35 U.S.C. § 119(e) to U.S. Provisional Application No. 62/115,898 entitled “Audio Buffer Catch-up Apparatus and Method with Two Microphones” filed Feb. 13, 2015, the content of which is incorporated herein by reference in its entirety.