Embodiments generally relate to mobile devices. More particularly, embodiments relate to the use of low power voice triggers to initiate interaction with mobile devices.
Hands-free operation of mobile devices may be relevant in a variety of contexts such as in-vehicle operation and disability-related usage scenarios. Initiating mobile device interactivity in a hands-free setting, however, may present a number of challenges. For example, conventional solutions may designate a pre-arranged activation phrase (e.g., “hey computer”) that enables a speech-based user interface for further interaction, wherein audio may be sampled continuously for analysis by a phrase recognizer until the activation phrase is detected. Such an approach may increase power consumption and have a negative impact on battery life.
The various advantages of the embodiments will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
Turning now to
More particularly, during the active portion of the periodic detection window, the audio front end 10 may be used to obtain sampled audio from an audio signal captured by the microphone 12. In such a case, the A/D converter 14 may sample the audio signal at a particular sample rate (e.g., x samples per second) to obtain the sampled audio (e.g., N milliseconds of audio data) for each active portion/sampled frame of the periodic detection window.
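By way of illustration only, the relationship between the sample rate and the amount of sampled audio per active portion may be sketched as follows. The function name and the example values (16 kHz, 20 milliseconds) are assumptions chosen for the sketch; they do not appear in the description above, which leaves x and N unspecified.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of samples produced during one active portion of the window."""
    if sample_rate_hz <= 0 or frame_ms <= 0:
        raise ValueError("sample rate and frame length must be positive")
    # x samples per second for N milliseconds gives x * N / 1000 samples.
    return sample_rate_hz * frame_ms // 1000


# Hypothetical values: x = 16,000 samples per second, N = 20 milliseconds.
FRAME_SAMPLES = samples_per_frame(16_000, 20)
```

Under these assumed values, the memory would need to buffer 320 samples for each sampled frame.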
During the inactive portion of the periodic detection window, on the other hand, the audio front end 10 may forgo any sampling of the audio signal and the power management module 22 may reduce the power consumption of one or more components of the audio front end 10. For example, the power management module 22 might power off the microphone 12, A/D converter 14, voice activity detector 18 and/or phrase recognizer 20, place the memory 16 in self-refresh mode, and so forth, during the inactive portion of the periodic detection window. Thus, the front end 10 may sample the audio signal for N milliseconds (e.g., an odd-numbered frame), then “sleep” for the next N milliseconds (e.g., an even-numbered frame) during each periodic detection window. Of particular note is that reducing the power consumption of the components of the audio front end 10 during the inactive portion of the periodic detection window may significantly extend battery life for the mobile device.
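The effect of the inactive portion on power consumption may be estimated, purely as a sketch, by weighting the active and reduced power levels by the fraction of the window spent in each. The function name and the power figures below are hypothetical, and the estimate deliberately ignores the energy spent on power up and power down transitions.

```python
def average_power_mw(active_mw: float, sleep_mw: float, duty_cycle: float) -> float:
    """Time-weighted average power over one periodic detection window.

    duty_cycle is the fraction of the window spent in the active portion.
    Transition overhead is not modeled in this sketch.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_mw + (1.0 - duty_cycle) * sleep_mw


# Hypothetical figures: 10 mW while sampling, 1 mW while powered down.
always_on = average_power_mw(10.0, 1.0, 1.0)
half_duty = average_power_mw(10.0, 1.0, 0.5)
```

With these assumed figures, a fifty percent duty cycle would lower the average from 10 mW to 5.5 mW.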
In one example, overhead associated with power up and power down operations may be taken into consideration when determining the length of the sampled frame (i.e., active portion of the periodic detection window) and dropped frame (i.e., inactive portion of the periodic detection window). For example, the length of the sampled frame (e.g., sampled frame length) may be selected to be substantially greater than any overhead duration associated with power up operations of the audio front end 10 in order to ensure that energy savings are not negated by the duty cycling approach described herein. Similarly, the length of the dropped frame (e.g., dropped frame length) may be selected to be substantially greater than any overhead duration associated with power down operations of the audio front end 10. In this regard, the duty cycle of the periodic detection window may be fifty percent, or some other value, depending upon the circumstances. For example, if the power down overhead is low relative to the power up overhead, the duty cycle might be increased to a value greater than fifty percent in order to increase the sampled frame length and further optimize power savings.
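One possible way to check candidate frame lengths against the overhead durations is sketched below. The class name, the function name, and the margin factor used to stand in for "substantially greater" are all assumptions made for illustration; the description does not quantify that relationship.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionWindow:
    """A periodic detection window described by its length and duty cycle."""
    window_ms: float
    duty_cycle: float  # fraction of the window that is sampled

    @property
    def sampled_frame_ms(self) -> float:
        return self.window_ms * self.duty_cycle

    @property
    def dropped_frame_ms(self) -> float:
        return self.window_ms * (1.0 - self.duty_cycle)


def overhead_ok(window: DetectionWindow, power_up_ms: float,
                power_down_ms: float, margin: float = 10.0) -> bool:
    """True if each frame exceeds its overhead by the assumed margin factor."""
    return (window.sampled_frame_ms >= margin * power_up_ms
            and window.dropped_frame_ms >= margin * power_down_ms)


# Hypothetical window: 40 ms long at a fifty percent duty cycle.
window = DetectionWindow(window_ms=40.0, duty_cycle=0.5)
```

In this sketch, a 1 millisecond power up and power down overhead would pass the check for the assumed window, while a 3 millisecond power up overhead would not.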
The sampled audio may be buffered in the memory 16, wherein the illustrated voice activity detector 18 determines whether voice activity is present in the audio signal based at least in part on the sampled audio. Thus, the illustrated voice activity detector 18 may make the activity decision based on the odd-numbered N millisecond frames obtained during the active portions of the periodic detection windows. If voice activity is detected, the phrase recognizer 20 may analyze the sampled audio to determine whether a pre-arranged activation phrase is present in the audio signal.
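The description does not specify how the voice activity detector 18 reaches its decision. As one simple stand-in, a frame energy threshold may be sketched as follows; the function names, the normalized sample range, and the threshold value are assumptions, and a practical detector would likely be more elaborate.

```python
import math
from typing import Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one buffered frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def voice_activity_present(samples: Sequence[float], threshold: float = 0.1) -> bool:
    """Assumed decision rule: activity if the frame RMS exceeds the threshold.

    Samples are assumed to be normalized to the range [-1.0, 1.0].
    """
    return frame_rms(samples) > threshold


silent_frame = [0.0] * 100
loud_frame = [0.5, -0.5] * 50
```

Only frames for which this check succeeds would need to be passed on to the phrase recognizer 20.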
Turning now to
Illustrated processing block 32 uses an audio front end of the mobile device to obtain sampled audio from an audio signal during a first portion of a periodic detection window. The power consumption of one or more components of the audio front end may be reduced at block 34 during a second portion of the periodic detection window, wherein a determination may be made at block 36 as to whether voice activity is present in the audio signal based at least in part on the sampled audio. If so, illustrated block 38 continually samples the audio signal (e.g., discontinues duty cycle sampling) in order to increase accuracy for phrase detection purposes. Otherwise, the process may repeat until voice activity is detected.
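The flow of blocks 32 through 38 may be sketched, under simplifying assumptions, as a loop over equal-length frames in which even-indexed frames are sampled and odd-indexed frames are dropped until voice activity is found. The function name, the action labels, and the fifty percent duty cycle are assumptions of the sketch, and the detector is supplied by the caller.

```python
from typing import Callable, Iterable, List, Sequence


def run_detection(frames: Iterable[Sequence[float]],
                  has_voice: Callable[[Sequence[float]], bool]) -> List[str]:
    """Return the action taken for each frame: "sample" or "sleep"."""
    actions: List[str] = []
    continuous = False  # set once voice activity has been detected
    for index, frame in enumerate(frames):
        if continuous:
            # Block 38: duty cycling is discontinued; every frame is sampled.
            actions.append("sample")
        elif index % 2 == 1:
            # Block 34: second portion of the window; the frame is dropped.
            actions.append("sleep")
        else:
            # Block 32: first portion of the window; obtain sampled audio.
            actions.append("sample")
            # Block 36: decide whether voice activity is present.
            if has_voice(frame):
                continuous = True
    return actions


def is_loud(frame: Sequence[float]) -> bool:
    """Hypothetical detector: any sample with magnitude above 0.5."""
    return any(abs(s) > 0.5 for s in frame)


actions = run_detection([[0.0], [0.0], [1.0], [1.0], [1.0]], is_loud)
```

In this example the third frame triggers the detector, so the fourth frame, which would otherwise have been dropped, is sampled instead.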
The illustrated device 40 also includes an input output (IO) module 48, sometimes referred to as a Southbridge of a chipset, that functions as a host device and may communicate with, for example, an audio codec 50, a microphone 52, one or more speakers 54, and mass storage 56 (e.g., hard disk drive/HDD, optical disk, flash memory, etc.). The audio codec 50, microphone 52, IO module 48, etc., may be part of an audio front end such as, for example, the audio front end 10 (
Example one may include a mobile device having a battery to power the mobile device, an audio front end and logic to use the audio front end to obtain sampled audio from an audio signal during a first portion of a periodic detection window. The logic may also reduce a power consumption of one or more components of the audio front end during a second portion of the periodic detection window, and determine whether voice activity is present in the audio signal based at least in part on the sampled audio.
Additionally, the mobile device of example one may include a power management module that at least partially includes the logic.
Example two may include an apparatus having logic to use an audio front end of a mobile device to obtain sampled audio from an audio signal during a first portion of a periodic detection window. The logic may also reduce a power consumption of one or more components of the audio front end during a second portion of the periodic detection window, and determine whether voice activity is present in the audio signal based at least in part on the sampled audio.
Additionally, a length of the first portion and a length of the second portion are to be defined by a duty cycle of the window in examples one or two. In addition, the first portion is to be greater than a first overhead duration associated with one or more power up operations of the audio front end and the second portion is to be greater than a second overhead duration associated with one or more power down operations of the audio front end. Additionally, the logic of examples one or two may sample the audio signal at a sample rate to obtain the sampled audio. In addition, the logic of examples one or two may store the sampled audio to a memory of the audio front end. Additionally, the logic of examples one or two may sample the audio signal continually if voice activity is present in the audio signal. In addition, the power consumption in examples one or two of one or more of a microphone, a voice activity detector, an analog to digital converter, a memory and a phrase recognizer may be reduced during the second portion of the window.
Example three may include a non-transitory computer readable storage medium having a set of instructions which, if executed by a processor, cause a mobile device to use an audio front end of the mobile device to obtain sampled audio from an audio signal during a first portion of a periodic detection window. The instructions, if executed, may also cause the mobile device to reduce a power consumption of one or more components of the audio front end during a second portion of the periodic detection window, and determine whether voice activity is present in the audio signal based at least in part on the sampled audio.
Additionally, a length of the first portion and a length of the second portion may be defined by a duty cycle of the window in example three. In addition, the first portion of example three may be greater than a first overhead duration associated with one or more power up operations of the audio front end and the second portion of example three may be greater than a second overhead duration associated with one or more power down operations of the audio front end. Additionally, the instructions of example three, if executed, may cause the mobile device to sample the audio signal at a sample rate to obtain the sampled audio. In addition, the instructions of example three, if executed, may cause the mobile device to store the sampled audio to a memory of the audio front end. Additionally, the instructions of example three, if executed, may cause the mobile device to sample the audio signal continually if voice activity is present in the audio signal. In addition, the power consumption in example three of one or more of a microphone, a voice activity detector, an analog to digital converter, a memory and a phrase recognizer may be reduced during the second portion of the window.
Example four may involve a computer implemented method in which an audio front end of a mobile device is used to obtain sampled audio from an audio signal during a first portion of a periodic detection window. The method may also provide for reducing a power consumption of one or more components of the audio front end during a second portion of the periodic detection window, and determining whether voice activity is present in the audio signal based at least in part on the sampled audio.
Additionally, in the method of example four, a length of the first portion and a length of the second portion may be defined by a duty cycle of the window. In addition, in the method of example four, the first portion may be greater than a first overhead duration associated with one or more power up operations of the audio front end and the second portion may be greater than a second overhead duration associated with one or more power down operations of the audio front end. Additionally, the method of example four may further include sampling the audio signal at a sample rate to obtain the sampled audio. In addition, in the method of example four, the power consumption of one or more of a microphone, a voice activity detector, an analog to digital converter, a memory and a phrase recognizer may be reduced during the second portion of the window.
Thus, techniques described herein may enable longer battery life for mobile devices operating in standby mode for voice trigger detection. As a result, hands-free operation may be significantly enhanced in a variety of contexts such as, for example, in-vehicle operation (e.g., greater safety) and disability-related usage scenarios.
Embodiments are applicable for use with all types of semiconductor integrated circuit (“IC”) chips. Examples of these IC chips include but are not limited to processors, controllers, chipset components, programmable logic arrays (PLAs), memory chips, network chips, systems on chip (SoCs), SSD/NAND controller ASICs, and the like. In addition, in some of the drawings, signal conductor lines are represented with lines. Some may be different, to indicate more constituent signal paths, have a number label, to indicate a number of constituent signal paths, and/or have arrows at one or more ends, to indicate primary information flow direction. This, however, should not be construed in a limiting manner. Rather, such added detail may be used in connection with one or more exemplary embodiments to facilitate easier understanding of a circuit. Any represented signal lines, whether or not having additional information, may actually comprise one or more signals that may travel in multiple directions and may be implemented with any suitable type of signal scheme, e.g., digital or analog lines implemented with differential pairs, optical fiber lines, and/or single-ended lines.
Example sizes/models/values/ranges may have been given, although embodiments are not limited to the same. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured. In addition, well known power/ground connections to IC chips and other components may or may not be shown within the figures, for simplicity of illustration and discussion, and so as not to obscure certain aspects of the embodiments. Further, arrangements may be shown in block diagram form in order to avoid obscuring embodiments, and also in view of the fact that specifics with respect to implementation of such block diagram arrangements are highly dependent upon the platform within which the embodiment is to be implemented, i.e., such specifics should be well within purview of one skilled in the art. Where specific details (e.g., circuits) are set forth in order to describe example embodiments, it should be apparent to one skilled in the art that embodiments can be practiced without, or with variation of, these specific details. The description is thus to be regarded as illustrative instead of limiting.
The term “coupled” may be used herein to refer to any type of relationship, direct or indirect, between the components in question, and may apply to electrical, mechanical, fluid, optical, electromagnetic, electromechanical or other connections. In addition, the terms “first”, “second”, etc. are used herein only to facilitate discussion, and carry no particular temporal or chronological significance unless otherwise indicated.
Those skilled in the art will appreciate from the foregoing description that the broad techniques of the embodiments can be implemented in a variety of forms. Therefore, while the embodiments have been described in connection with particular examples thereof, the true scope of the embodiments should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.