This disclosure relates to microphones and, more specifically, to voice activity detection (VAD) approaches used with these microphones.
Microphones are used to obtain a voice signal from a speaker. Once obtained, the signal can be processed in a number of different ways. A wide variety of functions can be provided by today's microphones and they can interface with and utilize a variety of different algorithms.
Voice triggering, as used in mobile systems, is an increasingly popular feature. For example, a user may wish to speak commands into a mobile device and have the device react to those commands. In these cases, a digital signal processor (DSP) first detects whether there is voice in an audio signal captured by a microphone, and the signal is subsequently analyzed to predict what spoken word it contains. Various voice activity detection (VAD) approaches have been developed and deployed in devices such as cellular phones and personal computers.
In the use of these approaches, power consumption becomes a concern. Lower power consumption gives longer standby time. For today's smartphones in particular, power usage is a key parameter. Unfortunately, present approaches to operating microphones consume, and often waste, a great deal of power. This has resulted in user dissatisfaction with these previous approaches and systems.
For a more complete understanding of the disclosure, reference should be made to the following detailed description and accompanying drawings wherein:
Those of ordinary skill in the art will appreciate that elements in the figures are illustrated for simplicity and clarity. It will be appreciated further that certain actions and/or steps may be described or depicted in a particular order of occurrence while those of ordinary skill in the art will understand that such specificity with respect to sequence is not actually required. It will also be understood that the terms and expressions used herein have the ordinary meaning as is accorded to such terms and expressions with respect to their corresponding respective areas of inquiry and study except where specific meanings have otherwise been set forth herein.
The present approaches change the way that present mobile systems are partitioned, the functionality of the microphone, and the modes in which it can operate. In these regards, a microphone with a voice or event detection block is presented and this enables the microphone to generate an interrupt signal which can wake the system up.
In some aspects, the microphones described herein include five external connections. The first connection may be a power connection and the second connection may be a ground connection. The third, fourth, and fifth connections are connections from the microphone to a host device (e.g., host circuitry in the device in which the microphone resides). More specifically, the third connection may be a data connection, the fourth connection may be an interrupt (sent from the microphone to the host), and the fifth connection may be a clock signal (sent from the host to the microphone).
The microphone may have several modes of operation and these are controlled by a clock signal. The host receives a data signal from the microphone as well as an interrupt signal. The host has multiple power modes controlled by the interrupt signal generated by the microphone. The host generates the clock signal for the microphone and thereby controls the mode of operation of the microphone. In one example, the absence of a clock causes the microphone to enter voice activity detection (VAD) mode.
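The clock-controlled mode selection described above can be illustrated with a minimal Python sketch. The names `MicMode`, `select_mode`, and `FULL_MODE_MIN_HZ` are hypothetical. The disclosure gives no clock frequencies, so the 1 MHz threshold is an assumption chosen only for illustration. A frequency of zero models the absence of a clock.

```python
from enum import Enum


class MicMode(Enum):
    VAD = "vad"              # low-power voice activity detection mode
    FULL = "full_operation"  # standard, higher-power operating mode


# Assumed threshold for illustration only; not taken from the disclosure.
FULL_MODE_MIN_HZ = 1_000_000


def select_mode(external_clock_hz: float) -> MicMode:
    """Choose the microphone mode from the clock supplied by the host.

    An absent clock (0 Hz) or a low-frequency clock keeps the microphone
    in VAD mode; a clock at or above the assumed threshold selects full
    operation.
    """
    if external_clock_hz < 0:
        raise ValueError("clock frequency cannot be negative")
    if external_clock_hz >= FULL_MODE_MIN_HZ:
        return MicMode.FULL
    return MicMode.VAD
```

In this sketch the host never sends an explicit mode command: changing or removing the clock is the only control input, which mirrors the small number of connections described above.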
In one example, the microphone includes a VAD mode of operation. In this mode of operation, the microphone has a very low power consumption, and it runs on a relatively low clock frequency which can be supplied either externally (from the host) or from an on-chip oscillator.
This operation enables very low power consumption because only the most necessary signal processing is active during this mode. In one aspect, the analog signal processing blocks of the microphone (such as the microphone preamplifier, the analog-to-digital converter, the voltage regulators, and the charge pump supplying the bias voltage for the microelectromechanical system (MEMS) microphone) operate at lower power. In this mode, these blocks are operated at a reduced power that is still sufficient to achieve the bandwidth and signal-to-noise ratio (SNR) needed for the VAD or event detector to function. For example, a bandwidth of operation of approximately 8 kHz after decimation and an SNR of approximately 60 dB can be achieved.
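To put the example figures in context, the short Python sketch below converts them into a minimum sample rate and an approximate converter resolution. It assumes the Nyquist criterion and the textbook ideal-quantizer relation SNR = 6.02 N + 1.76 dB for a full-scale sine input. Both function names are hypothetical, and a real design would need margin beyond these idealized numbers.

```python
def min_sample_rate_hz(bandwidth_hz: float) -> float:
    """Nyquist minimum sample rate for a given signal bandwidth."""
    return 2.0 * bandwidth_hz


def effective_bits(snr_db: float) -> float:
    """Resolution of an ideal quantizer that yields the given SNR.

    Assumes SNR = 6.02 * N + 1.76 dB (full-scale sine input).
    """
    return (snr_db - 1.76) / 6.02


if __name__ == "__main__":
    print(min_sample_rate_hz(8_000))     # sample rate after decimation, in Hz
    print(round(effective_bits(60), 2))  # approximate number of bits
```

Under these assumptions, an 8 kHz bandwidth corresponds to a post-decimation rate of at least 16 kHz, and 60 dB of SNR corresponds to a little under 10 effective bits.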
The VAD or event detector can be implemented using well-known techniques. For example, comparisons of short-term energy measures against long-term energy measures, zero-crossing counts, and so forth can be used to detect voice signals.
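As one illustration of these well-known techniques, the following Python sketch compares the energy of the current frame (short term) against a slowly updated running average (long term) and also checks the zero-crossing rate. It is a sketch under stated assumptions, not the detector of the disclosure: the class name `EnergyVad`, the threshold values, and the rule that a low zero-crossing rate accompanies voice are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossings(frame: Sequence[float]) -> int:
    """Number of sign changes between adjacent samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


class EnergyVad:
    """Flags a frame as voice when its energy rises well above the long-term
    average and its zero-crossing rate stays below a limit (assumed rule)."""

    def __init__(self, ratio: float = 4.0, alpha: float = 0.05,
                 max_zcr: float = 0.5) -> None:
        self.ratio = ratio      # short-term / long-term energy threshold
        self.alpha = alpha      # smoothing factor of the long-term average
        self.max_zcr = max_zcr  # highest zero-crossing rate accepted as voice
        self.long_term: Optional[float] = None

    def process(self, frame: Sequence[float]) -> bool:
        """Process one frame of at least two samples; return True for voice."""
        energy = frame_energy(frame)
        if self.long_term is None:
            # The first frame only seeds the long-term estimate.
            self.long_term = energy
            return False
        zcr = zero_crossings(frame) / (len(frame) - 1)
        voiced = energy > self.ratio * self.long_term and zcr <= self.max_zcr
        if not voiced:
            # Track the background level only while no voice is present.
            self.long_term += self.alpha * (energy - self.long_term)
        return voiced
```

A `True` result from `process` corresponds to the condition under which the microphone would raise its interrupt. Only a few multiplications and comparisons are needed per frame, which fits the low-power intent of the VAD mode.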
It should also be noted that the interface (the connections between the host and the microphone) is not limited to the exact signals described herein. In these regards, other signals or other combinations of signals may be used. The physical implementation of the interface may also vary. For example, it may be a single physical bi-directional line, or multiple uni-directional lines.
In other aspects, the microphone further includes a delay buffer. In other examples, upon wake-up, buffered data is transmitted over a first transmission line and real-time data is transmitted simultaneously over a second, separate output line. In still other examples, buffered data is flushed or discarded upon switching modes.
In still other aspects, the microphone is over-clocked so that the buffered data catches up to the real-time data. The microphone can also be used for multi-microphone voice triggered applications. In one example, the microphone wakes up and enables data synchronization of a second microphone in either a buffered or a real-time mode.
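The catch-up behavior can be quantified with a short calculation. If the buffered data lags real time by a delay D and the data is clocked out at r times the real-time rate, the lag shrinks by (r - 1) seconds of audio per second, so the catch-up time is D / (r - 1). The Python sketch below encodes this relation. The function name is hypothetical, and a constant over-clock ratio is an assumption.

```python
def catch_up_time_s(delay_s: float, overclock_ratio: float) -> float:
    """Time needed for buffered output to reach real time.

    delay_s: initial lag of the buffered data behind real time, in seconds.
    overclock_ratio: output rate divided by the real-time rate (must be > 1).
    """
    if overclock_ratio <= 1.0:
        raise ValueError("the output must run faster than real time to catch up")
    return delay_s / (overclock_ratio - 1.0)
```

For instance, a 200 ms lag read out at twice the real-time rate is eliminated after 200 ms, while the same lag read out at 1.5 times the real-time rate takes 400 ms.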
Referring now to
A VDD power signal 112 and a ground signal 114 are coupled to the microphone 102. An interrupt signal 108 and a data signal 110 are sent from the microphone 102 to the host 104. A clock signal 106 is sent from the host 104 to the microphone 102.
In one example of the operation of the system 100 of
In one example, the microphone 102 includes a VAD mode of operation. In this mode, the microphone 102 has a very low power consumption, and it runs on a relatively low clock frequency which can be supplied either externally (from the clock signal 106 supplied by the host 104) or from an internal on-chip oscillator in the microphone 102. Consequently, when an interrupt is made, the low power operation can be changed to a higher powered mode of operation. As will be recognized, the interrupt allows the system to be operated in both a low power mode of operation and a high power mode of operation.
In some aspects, the integrated circuit and the MEMS circuit receive a clock signal from an external host. The clock signal is effective to cause the MEMS circuit and integrated circuit to operate in full system operation mode during a first time period and in a voice activity mode of operation during a second time period. The voice activity mode has a first power consumption or level and the full system operation mode has a second power consumption or level. The first power consumption is less than the second power consumption. The integrated circuit is configured to generate an interrupt upon the detection of voice activity, and send the interrupt to the host. The absence of a clock causes the microphone to enter a voice activity detection mode. The clock circuit may be located on the same chip as the other components or located externally.
In other aspects, the present approaches provide the ability to operate the internal clock at a third power consumption or level and thereafter generate an external data stream and clock to signal the system to operate at a fourth power consumption or level. The third power level is less than the fourth power level, and the fourth power level is less than the first power level.
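Taken together with the statement that the first power consumption is less than the second, the four levels can be ordered as follows, where P_n is shorthand (not used in the original text) for the n-th power consumption or level:

```latex
P_3 < P_4 < P_1 < P_2
```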
In still other aspects, the external clock may be detected, and it may be applied after the detection of voice activity. The internal clock is then synchronized to the external clock. Once the clocks are synchronized, the VAD signal processing is also synchronized to the external clock.
In yet other aspects, the system may fall back to the internal clock for power savings at the first or second power level when the external clock is removed to reduce overall system power.
In another example, an external signal may be generated from the internal combination of the clock and the acoustic activity detection that acts as a signal and clock combination to signal the host to interrupt/wake up and recognize the voice signal. The bandwidth of the input signal after buffering may be in one example approximately 8 kHz. Other examples are possible. Data may be provided in PCM or PDM formats. Other examples of formats are possible.
Referring now to
In the VAD mode 202, no data is transmitted out of the microphone. The host is sleeping in this mode. In one aspect, when the host is sleeping, only the functionality needed to react to an interrupt signal generated by the microphone is enabled. In this mode, the host is clocked at a very low frequency to reduce power, and all unnecessary functionality is powered down. This mode has the lowest power consumption possible because all unnecessary blocks are powered down and no switching of clock or data signals occurs. In other words, the mode 202 is a low power mode, where VAD is enabled and no external clock is being received from the host.
In the wake up host (partially) mode 204, the external clock is received from the host. Data is transmitted out of the microphone. The host becomes partially awake due to the detection of a keyword and/or the detection of voice activity. Subsequently, the external clock for the microphone is enabled with a clock frequency corresponding to a higher performance level that is sufficient for reliable keyword detection.
The full system operation mode 206 is the high power or standard operating mode of the microphone.
In one example of the operation of the state transition diagram of
In the mode 204, the host detects a keyword/speech and decides that a specific key word, phrase, or sentence is recognized. This determination triggers the transition from the mode 204 to the full system wake up 206.
In the mode 206, the host keyword detect/speech recognition algorithm decides that no key word, phrase, or sentence is recognized which triggers the transition back to the VAD mode 202. In this respect, another mode or state (not shown here in
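The three modes and the transitions between them can be sketched as a small table-driven state machine in Python. The enum and function names are hypothetical. The trigger for leaving the mode 202 is taken here to be the detection of voice activity, as suggested by the mode descriptions, and the choice to stay in the current mode for any event not listed is an assumption of this sketch.

```python
from enum import Enum


class Mode(Enum):
    VAD = 202             # low power, no external clock, host asleep
    PARTIAL_WAKE = 204    # external clock on, host runs keyword detection
    FULL_OPERATION = 206  # standard, high-power operation


class Event(Enum):
    VOICE_ACTIVITY = "voice_activity"          # microphone raises interrupt
    KEYWORD_RECOGNIZED = "keyword_recognized"  # host recognizes key phrase
    NO_KEYWORD = "no_keyword"                  # host finds nothing to act on


# Transitions described in the text; other pairs are left undefined here.
TRANSITIONS = {
    (Mode.VAD, Event.VOICE_ACTIVITY): Mode.PARTIAL_WAKE,
    (Mode.PARTIAL_WAKE, Event.KEYWORD_RECOGNIZED): Mode.FULL_OPERATION,
    (Mode.FULL_OPERATION, Event.NO_KEYWORD): Mode.VAD,
}


def next_mode(mode: Mode, event: Event) -> Mode:
    """Return the mode after an event; unlisted pairs keep the current mode."""
    return TRANSITIONS.get((mode, event), mode)
```

Keeping the transitions in a table makes it straightforward to add further states or events, such as an additional mode between full operation and VAD, without changing `next_mode`.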
Referring now to
A VDD power signal 312 and a ground signal 314 are coupled to the ASIC 304. An interrupt signal 308 and a data signal 310 are sent by the ASIC 304 to a host (e.g., the host 104 of
In one example of the operation of the microphone 300 of
In one example, the microphone 300 includes a VAD mode of operation. In this mode, the microphone 300 has a very low power consumption, and it runs on a relatively low clock frequency which can be supplied either externally (from the clock signal 306 supplied by the host) or from an internal on-chip oscillator in the microphone 300. Consequently, when an interrupt is made, the low power operation can be changed to a higher powered operation. The interrupt allows the system to be operated in both a low power mode of operation and a high power mode of operation.
Referring now
The charge pump CHP 402 charges the MEMS element (e.g., the MEMS chip 302 of
The A/D converter 406 converts the analog signal from the amplifier 404 to a digital signal. The VAD 408 processes the digital signal from the A/D converter 406 and generates an interrupt signal 411 if voice is detected. The control block 410 controls the internal states of the ASIC 400 in response to an external clock signal 413 (received from a host) and the interrupt signal 411 from the VAD 408. The switch 414 is controlled by the control block 410 to allow data 415 to be sent to an external host.
A data buffer may be included at the output of the A/D converter 406. The buffer may hold data representing the audio signal, and its length may correspond to or approximate the delay of the VAD 408 (e.g., 10 ms to 360 ms, to mention one example range, with other ranges being possible). A decimation filter stage could be included at the output of the A/D converter in order to reduce buffer size (sampler RAM) and power, although this will limit the bandwidth. In this case, an interpolation stage must also be added at the buffer output, and the delay may be around 200 msec. In another example, the delay may be around 360 msec. Other examples of delay values are possible. The buffer is provided to allow any recognition algorithm the latency required to wake up the host, collect sufficient background noise statistics, and recognize the key phrase within the ambient noise.
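The effect of the delay and sample rate on buffer size can be illustrated with a minimal Python sketch. The 16 kHz sample rate and 16-bit sample width used in the usage note are assumptions for illustration, as are the names `buffer_samples`, `buffer_bytes`, and `DelayBuffer`. The class models the buffer as a fixed-length FIFO that returns each sample after a delay equal to its capacity.

```python
import math
from collections import deque
from typing import Deque, List, Optional


def buffer_samples(delay_s: float, sample_rate_hz: float) -> int:
    """Number of samples needed to cover the given delay."""
    return round(delay_s * sample_rate_hz)


def buffer_bytes(delay_s: float, sample_rate_hz: float,
                 bits_per_sample: int) -> int:
    """Memory needed to hold the delayed samples, in bytes."""
    return math.ceil(buffer_samples(delay_s, sample_rate_hz) * bits_per_sample / 8)


class DelayBuffer:
    """Fixed-length FIFO: each pushed sample re-emerges `capacity` pushes later."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._samples: Deque[float] = deque(maxlen=capacity)

    def push(self, sample: float) -> Optional[float]:
        """Store a sample; return the one it displaces, or None while filling."""
        oldest = None
        if len(self._samples) == self._samples.maxlen:
            oldest = self._samples[0]
        self._samples.append(sample)  # a full deque drops its oldest entry
        return oldest

    def drain(self) -> List[float]:
        """Return all buffered samples, oldest first, and empty the buffer."""
        out = list(self._samples)
        self._samples.clear()
        return out
```

Under the assumed 16 kHz, 16-bit format, `buffer_bytes(0.2, 16_000, 16)` gives 6400 bytes for a 200 msec delay. Halving the sample rate through decimation halves that figure, which is the buffer-size saving the decimation stage is intended to provide.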
The buffered data may be sent to the host via some connection such as the interrupt line 411 or the data line 415. If sending data via the data line 415, it may be sent at an increased clock rate compared to the sampling clock.
Additionally, the parameters or settings of the VAD 408 may be changed or controlled. For example, register and memory settings (both erasable and non-erasable) of the VAD 408 may be read or written in order to account for various levels of background noise.
The functionality of the VAD 408 may be enhanced or changed. For example, voice or phrase detection may be used. Other functions may also be included.
Referring now
The interface block 502 provides interfacing functionality with respect to a microphone (e.g., the microphone 102 in
The control block 510 controls the power states of the microphone (e.g., the microphone 102 of
The memory 512 stores the states of the system, data, and other information. The on-chip oscillator 511 is controllable from the control block 510 and enables at least two clock modes corresponding to at least two power modes.
Referring now
Signal 602 shows an audio signal. Upon detection of an audio signal, the microphone generates an interrupt, as shown by signal 604. Data is also generated by the microphone, as shown by signal 606. As can be seen from signal 608, the host, in response to the interrupt, changes the clock signal (sent to the microphone) from a low frequency signal to a high frequency signal. Alternatively (as shown by signal 610), in low power mode (before the event), the host may not send a clock signal and may only start the high frequency clock signal upon detection of the event.
Preferred embodiments of this disclosure are described herein, including the best mode known to the inventor(s). It should be understood that the illustrated embodiments are exemplary only, and should not be taken as limiting the scope of the appended claims.
This patent claims benefit under 35 U.S.C. §119 (e) to U.S. Provisional Application No. 61/826,587 entitled “VAD detection Microphone and Method of Operating the Same” filed May 23, 2013, the content of which is incorporated herein by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
20140348345 A1 | Nov 2014 | US |
Number | Date | Country | |
---|---|---|---|
61826587 | May 2013 | US |