The present invention relates generally to electrical and electronic hardware and speech detection. More specifically, techniques for speech detection using a low power microelectromechanical systems (MEMS) sensor are described.
Conventional devices and techniques for speech detection typically require multiple separate components, such as a voice activity detection device, a microphone array or other acoustic sensor, a signal processor, and other computing devices for processing acoustic signals and performing noise cancellation. Implementing each of these components on separate circuits, and then connecting them as a system for speech detection using conventional techniques, is inefficient and consumes significant power. Although microelectromechanical systems (MEMS) microphones exist that combine microphones with certain limited processing capabilities, they are not well-suited for speech detection and recognition.
Also, conventional techniques for separating speech from background noise using microphone arrays typically do not perform well in noisy environments. Other conventional techniques for separating speech from noise require a sensor touching the face to correlate sensor data with speech. However, such sensors can be uncomfortable, and they can be unreliable if they do not maintain constant contact with the face or if there is a barrier between the sensor and the skin.
Thus, what is needed is a solution for speech detection using a low power MEMS sensor without the limitations of conventional techniques.
Various embodiments or examples (“examples”) are disclosed in the following detailed description and the accompanying drawings:
Although the above-described drawings depict various examples of the invention, the invention is not limited by the depicted examples. It is to be understood that, in the drawings, like reference numerals designate like structural elements. Also, it is understood that the drawings are not necessarily to scale.
Various embodiments or examples may be implemented in numerous ways, including as a system, a process, an apparatus, a user interface, or a series of program instructions on a computer readable medium such as a computer readable storage medium or a computer network where the program instructions are sent over optical, electronic, or wireless communication links. In general, operations of disclosed processes may be performed in an arbitrary order, unless otherwise provided in the claims.
A detailed description of one or more examples is provided below along with accompanying figures. The detailed description is provided in connection with such examples, but is not limited to any particular example. The scope is limited only by the claims and numerous alternatives, modifications, and equivalents are encompassed. Numerous specific details are set forth in the following description in order to provide a thorough understanding. These details are provided for the purpose of example and the described techniques may be practiced according to the claims without some or all of these specific details. For clarity, technical material that is known in the technical fields related to the examples has not been described in detail to avoid unnecessarily obscuring the description.
In some examples, the described techniques may be implemented as a computer program or application (“application”) or as a plug-in, module, or sub-component of another application. The described techniques may be implemented as software, hardware, firmware, circuitry, or a combination thereof. If implemented as software, the described techniques may be implemented using various types of programming, development, scripting, or formatting languages, frameworks, syntax, applications, protocols, objects, or techniques, including ASP, ASP.net, .Net framework, Ruby, Ruby on Rails, C, Objective C, C++, C#, Adobe® Integrated Runtime™ (Adobe® AIR™), ActionScript™, Flex™, Lingo™, Java™, Javascript™, Ajax, Perl, COBOL, Fortran, ADA, XML, MXML, HTML, DHTML, XHTML, HTTP, XMPP, PHP, and others. Design, publishing, and other types of applications such as Dreamweaver®, Shockwave®, Flash®, Drupal and Fireworks® may also be used to implement the described techniques. Database management systems (i.e., “DBMS”), search facilities and platforms, web crawlers (i.e., computer programs that automatically or semi-automatically visit, index, archive or copy content from, various websites (hereafter referred to as “crawlers”)), and other features may be implemented using various types of proprietary or open source technologies, including MySQL, Oracle (from Oracle of Redwood Shores, Calif.), Solr and Nutch from The Apache Software Foundation of Forest Hill, Md., among others and without limitation. The described techniques may be varied and are not limited to the examples or descriptions provided.
In some examples, VAD logic 112 may be configured to detect a trigger (i.e., an event) that indicates a presence of speech to be captured and processed (i.e., using speech recognition module 122). In some examples, the trigger may be a spike (i.e., sudden increase) in acoustic energy (e.g., acoustic vibrations, signals, pressure waves, and the like), a speech characteristic, a predetermined (i.e., pre-programmed) word, a loud noise (e.g., a siren, an automobile crash, a scream, or other noise), or the like. When VAD logic 112 detects such a trigger, VAD logic 112 may provide a signal to host system 116 to switch (i.e., wake) from a low (or off) power mode to a high (or on) power mode. For example, VAD logic 112 may be implemented as a peak energy tracking system configured to detect, using data from MEMS sensor 106, a peak, spike, or other sudden increase in acoustic or vibrational energy, and to send a signal indicating a presence of speech to power manager 124 upon detection of said energy spike. In another example, VAD logic 112 may be configured to sense the presence of speech by detecting speech characteristics (e.g., articulation, pronunciation, pitch, rate, rhythm, and the like), and to send a signal indicating a presence of speech to power manager 124 upon detection of one or more of said speech characteristics. For example, speech patterns associated with said characteristics may be pre-programmed into VAD logic 112. In still another example, VAD logic 112 may be configured to detect a trigger word, which may be pre-programmed into VAD logic 112 such that VAD logic 112 may send a signal indicating a presence of speech to power manager 124 upon detection of said trigger word. 
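The peak energy tracking variant of VAD logic 112 described above can be sketched as follows. This is a minimal illustrative sketch only: the class name, frame size, spike ratio, and smoothing factor are assumptions for illustration and are not taken from the disclosure.

```python
# Illustrative sketch of peak-energy-tracking VAD logic: track a slowly
# adapting noise floor and flag a sudden increase in acoustic energy.
# Names and threshold values are assumptions, not part of the disclosure.

def frame_energy(samples):
    """Mean squared amplitude of one frame of sensor samples."""
    return sum(s * s for s in samples) / len(samples)

class PeakEnergyVAD:
    def __init__(self, spike_ratio=4.0, smoothing=0.95):
        self.spike_ratio = spike_ratio   # how far above the floor counts as a spike
        self.smoothing = smoothing       # exponential smoothing for the noise floor
        self.noise_floor = None

    def process_frame(self, samples):
        """Return True when a sudden increase in acoustic energy is detected."""
        energy = frame_energy(samples)
        if self.noise_floor is None:
            self.noise_floor = energy
            return False
        spike = energy > self.spike_ratio * self.noise_floor
        # Update the tracked floor slowly so a brief spike does not raise it.
        self.noise_floor = (self.smoothing * self.noise_floor
                            + (1.0 - self.smoothing) * energy)
        return spike
```

In this sketch, a detected spike would correspond to the point at which VAD logic 112 sends a presence-of-speech signal to power manager 124.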
In yet another example, VAD logic 112 may be configured to detect (i.e., using an accelerometer (e.g., MEMS sensor 106)) a tap (e.g., physical strike, light hit, brief touch, or the like), for example, on a housing (not shown) in which low power VAD device 102 may be housed, encased, mounted, or otherwise installed. VAD logic 112 may be configured to send a signal indicating a presence of speech to power manager 124 upon detection of said tap. In some examples, triggers may be programmed using an interface (e.g., control interface 228 in
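The tap trigger described above can be sketched as a search for a brief spike in accelerometer magnitude. The threshold, spike width, and function name below are illustrative assumptions; a real implementation would depend on the accelerometer's sample rate and sensitivity.

```python
# Illustrative sketch of tap detection on accelerometer samples (e.g.,
# from a sensor such as MEMS sensor 106). Threshold and window length
# are assumptions for illustration.

def detect_tap(accel_samples, threshold=2.5, max_width=3):
    """Return True if a brief spike in acceleration magnitude appears.

    A tap shows up as a short run (at most max_width samples) whose
    magnitude exceeds the threshold; a sustained excursion is treated
    as ordinary motion rather than a tap.
    """
    run = 0
    for sample in accel_samples:
        if abs(sample) > threshold:
            run += 1
        else:
            if 0 < run <= max_width:
                return True   # brief spike ended: treat it as a tap
            run = 0
    return 0 < run <= max_width
```

On detection, VAD logic 112 would send its presence-of-speech signal to power manager 124, as described above.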
In some examples, power source 114 may be implemented as a battery, battery module, or other power storage. As a battery, power source 114 may be implemented using various types of battery technologies, including Lithium Ion (“LI”), Nickel Metal Hydride (“NiMH”), or others, without limitation. In some examples, power may be gathered from local power sources such as solar panels, thermo-electric generators, and kinetic energy generators, among other power sources. These additional sources can either power the system directly or can charge power source 114, which, in turn, may be used to power the speech detection system. Power source 114 also may include circuitry, hardware, or software that may be used in connection with, or in lieu of, a processor in order to provide power management (e.g., power manager 124), charge/recharging, sleep, or other functions. Power drawn as electrical current may be distributed from power source 114 via bus 104 and/or bus 118, which may be implemented as deposited or formed circuitry or using other forms of circuits. Electrical current distributed from power source 114, for example, using bus 104 and/or bus 118, may be managed by a processor (not shown) and may be used by one or more of the components (shown or not shown) of low power VAD device 102 and host system 116.
In some examples, power manager 124 may be configured to provide control signals to other components of host system 116 to power on (i.e., high power or full capture mode) or off (i.e., low power mode) in response to a signal from low power VAD device 102 indicating whether or not there is speech (i.e., a presence of speech). For example, when low power VAD device 102 detects a presence of speech, low power VAD device 102 may provide a signal (i.e., using VAD logic 112 and a communication interface (not shown)) to power manager 124 to switch host system 116 from a low power mode, wherein host system 116 draws a minimal amount of power (i.e., sufficient power to operate power manager 124 to receive a signal from low power VAD device 102), to a high power mode, wherein host system 116 draws more power from power source 114 (i.e., sufficient power to operate signal processing module 120, speech recognition module 122, sensor 126, and other components of host system 116). In another example, once low power VAD device 102 detects a change from a presence of speech to an absence of speech, low power VAD device 102 may provide another signal indicating an absence of speech to power manager 124 to switch host system 116 from the high power mode back to the low power mode. In still other examples, low power VAD device 102 also may be configured to detect a speech (i.e., verbal) command to manually switch host system 116 to an off or low power mode. For example, VAD logic 112, or another module of low power VAD device 102 or host system 116, may be pre-programmed to detect a verbal command (e.g., "off," "low power," or the like), and to send another signal to power manager 124 causing power manager 124 to switch host system 116 from the high power mode back to the low power mode (i.e., by sending control signals to various components of host system 116).
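The mode switching performed by power manager 124 in response to VAD signals can be sketched as a small state machine. The class, enum values, and component names below are illustrative assumptions, not details taken from the disclosure.

```python
# Illustrative sketch of a power manager switching a host system between
# low and high power modes on presence/absence-of-speech signals.
# Names and structure are assumptions for illustration only.

from enum import Enum

class PowerMode(Enum):
    LOW = "low"    # minimal draw: only the power manager listens for VAD signals
    HIGH = "high"  # full capture: signal processing, speech recognition, sensors

class PowerManager:
    def __init__(self, components):
        # components: host-system modules gated by power mode, e.g.,
        # signal processing, speech recognition, and additional sensors.
        self.components = components
        self.mode = PowerMode.LOW
        self.powered_on = set()

    def on_vad_signal(self, speech_present):
        """Handle a presence/absence-of-speech signal from the VAD device."""
        if speech_present and self.mode is PowerMode.LOW:
            self.mode = PowerMode.HIGH
            self.powered_on = set(self.components)   # wake all gated components
        elif not speech_present and self.mode is PowerMode.HIGH:
            self.mode = PowerMode.LOW
            self.powered_on.clear()                  # return to minimal draw
```

For example, a presence-of-speech signal would wake the signal processing and speech recognition modules, and a subsequent absence-of-speech signal would return the system to the low power mode.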
In some examples, power manager 124 may be configured to send control signals associated with other modes, in addition to high and low power modes, to other components of host system 116 (e.g., signal processing module 120, speech recognition module 122, sensor 126, or the like) or other components (e.g., power source 114, VAD logic 112, or the like). For example, power manager 124 may be configured to send a control signal to an individual component to turn it on (i.e., wake it up).
In some examples, speech recognition module 122 may be configured to process data associated with speech signals, for example, detected by sensor 126 or MEMS sensor 106. For example, speech recognition module 122 may be configured to recognize speech, such as speech commands. In some examples, host system 116 may include signal processing module 120, which may be configured to supplement or off-load (i.e., from digital signal processor 110) signal processing capabilities when host system 116 is operating in a high power or full capture mode. In some examples, signal processing module 120 may be configured to have hardware signal processing capabilities.
In some examples, sensor 126 may operate as an acoustic sensor. In other examples, sensor 126 may operate as a vibration sensor. In some examples, sensor 126 may be implemented using multiple silicon microphones. In another example, sensor 126 may be implemented using multiple accelerometer modules. In still other examples, the above-described elements may be implemented differently in layout, design, function, structure, features, or other aspects and are not limited to the examples shown and described.
As a signal from a MEMS sensor is being monitored, a VAD device (e.g., low power VAD devices 102 and 202 in
As used herein, recognizing speech includes processing speech to identify, categorize, verify, store, or otherwise derive meaning from data associated with speech. Once the speech has been processed, an action associated with the speech may be taken (308). For example, the speech may include one or more commands, and a host system may be configured to take one or more actions in response to each of the one or more commands. For example, a speech recognition module may be configured to identify speech commands and to initiate actions associated with said speech commands (e.g., to turn on in response to an "on" command, to turn off in response to an "off" command, to switch modes in response to an associated command, to send control signals to other modules or devices in response to other associated commands, and the like). In another example, a speech recognition module may be configured to identify and store speech patterns (i.e., for one or more users). In yet another example, a speech recognition module may be configured to match sensor data (e.g., from MEMS sensor 106 and/or sensor 126 in
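The command-to-action mapping described above can be sketched as a simple dispatcher. The command strings and the tracked state below are illustrative assumptions; the disclosure does not specify a particular command vocabulary or implementation.

```python
# Illustrative sketch of a speech recognition module dispatching actions
# for recognized commands ("on", "off", mode switches). Command names
# and state fields are assumptions for illustration only.

def make_dispatcher():
    state = {"power": "off", "mode": None}

    def handle(command):
        """Take the action associated with a recognized speech command."""
        if command == "on":
            state["power"] = "on"
        elif command == "off":
            state["power"] = "off"
        elif command.startswith("mode "):
            state["mode"] = command.split(" ", 1)[1]
        return dict(state)   # snapshot of the resulting state

    return handle
```

In a full system, the handlers would send control signals to other modules or devices rather than updating a local dictionary.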
The structures and/or functions of any of the above-described features can be implemented in software, hardware, firmware, circuitry, or any combination thereof. Note that the structures and constituent elements above, as well as their functionality, may be aggregated or combined with one or more other structures or elements. Alternatively, the elements and their functionality may be subdivided into constituent sub-elements, if any. As software, at least some of the above-described techniques may be implemented using various types of programming or formatting languages, frameworks, syntax, applications, protocols, objects, or techniques. These can be varied and are not limited to the examples or descriptions provided.
As hardware and/or firmware, the above-described structures and techniques can be implemented using various types of programming or integrated circuit design languages, including hardware description languages, such as any register transfer language (“RTL”) configured to design field-programmable gate arrays (“FPGAs”), application-specific integrated circuits (“ASICs”), multi-chip modules, or any other type of integrated circuit.
According to some embodiments, the term “module” can refer, for example, to an algorithm or a portion thereof, and/or logic implemented in either hardware circuitry or software, or a combination thereof (i.e., a module can be implemented as a circuit). In some embodiments, algorithms and/or the memory in which the algorithms are stored are “components” of a circuit. Thus, the term “circuit” can also refer, for example, to a system of components, including algorithms. These can be varied and are not limited to the examples or descriptions provided.
Although the foregoing examples have been described in some detail for purposes of clarity of understanding, the above-described inventive techniques are not limited to the details provided. There are many alternative ways of implementing the above-described inventive techniques. The disclosed examples are illustrative and not restrictive.
This application claims the benefit of U.S. Provisional Patent Application No. 61/780,896 (Attorney Docket No. ALI-143P), filed Mar. 13, 2013, which is incorporated by reference herein in its entirety for all purposes.
Number | Date | Country
---|---|---
61780896 | Mar. 2013 | US