The present invention relates to devices for sensing acoustic vibrations.
Communications devices such as handsets (mobile and wired telephones) and headsets (all types) typically use one of several kinds of device to detect the speech of a user. These include acoustic microphones, physiological microphones, and accelerometers.
One common device for detecting speech is an acoustic pressure sensor, or microphone. One example of an acoustic pressure sensor is the electret condenser microphone, which is currently found in numerous mobile communication devices. Electret condenser microphones have been miniaturized to fit into mobile devices such as cellular telephones and headsets; a typical device might have a diameter of 6 millimeters (mm) and a height of 3 mm. The problem with these microphones is that, because they are designed to detect acoustic vibrations in the air, they generally detect ambient acoustic noise in addition to the speech signal of interest. The received speech signal therefore often includes noise (such as engines, people, and wind), much of which cannot be removed without degrading the speech quality. This noise presents significant qualitative and functional problems for a variety of downstream speech processing applications of the host communication device, including basic voice services and speech recognition.
Another device used for detecting speech is a physiological microphone, also referred to as a “P-Mic”. The P-Mic detects body vibrations generated during speech through the use of a small gel-filled cushion coupled to a piezo-sensor. Because the gel cushion couples well to human flesh and poorly to air, the P-Mic can accurately detect speech vibrations when placed against the skin, even in high noise environments. However, this solution requires firm contact between the gel cushion and the skin to work effectively, a requirement the consumer market is unlikely to accept. Further, at approximately 1.5 inches on a side, the P-Mic is typically too large for deployment in many consumer communication products, and it is too expensive to see widespread use in consumer products such as headsets. Also, the P-Mic does not use a standard microphone electrical interface, so additional circuitry is required to connect the P-Mic to an analog-to-digital converter, increasing both size and implementation cost.
Yet another common device for detecting speech, similar in principle to the P-Mic, is a Bone Conduction Microphone (BCM). The BCM includes an accelerometer that measures skin/flesh vibrations generated by speech; the accelerometer measures its own displacement caused by those vibrations. However, much like the P-Mic, accelerometers require good contact to work effectively and are currently too expensive and electronically cumbersome for use in commercial communications products. Also like the P-Mic, accelerometers cannot use a standard microphone electrical interface, so additional circuitry is required to connect the accelerometer to an analog-to-digital converter, thereby increasing both size and implementation cost.
In the drawings, the same reference numbers identify identical or substantially similar elements or acts. To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the Figure number in which that element is first introduced (e.g., element 100 is first introduced and discussed with respect to
An acoustic vibration sensor, also referred to as a speech sensing device, is described below. The acoustic vibration sensor is similar to a microphone in that it captures speech information from the head area of a human talker, including in noisy environments. This information can then be used to generate a Voice Activity Detection (VAD) signal, which is useful in many speech applications. Previous solutions have been vulnerable to noise, physically too large for certain applications, or cost prohibitive. In contrast, the acoustic vibration sensor described herein accurately detects and captures speech vibrations in the presence of substantial airborne acoustic noise, yet within a smaller and less expensive physical package. The noise-resistant speech information provided by the acoustic vibration sensor can subsequently be used in downstream speech processing applications (speech enhancement and noise suppression, speech encoding, speech recognition, talker verification, etc.) to improve the performance of those applications.
The following description provides specific details for a thorough understanding of, and enabling description for, embodiments of a transducer. However, one skilled in the art will understand that the invention may be practiced without these details. In other instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the invention.
The sensor also includes an electret microphone 120 and associated components and electronics, coupled to receive acoustic signals from the talker via the coupler 110 and the diaphragm 108 and to convert the acoustic signals to electrical signals representative of human speech. Electrical contacts 130 provide the electrical signals as an output. Alternative embodiments can use any type/combination of materials and/or electronics to convert the acoustic signals to electrical signals representative of human speech and output the electrical signals.
The coupler 110 of an embodiment is formed using materials having acoustic impedances matched to the impedance of human skin (the characteristic acoustic impedance of skin is approximately 1.5×10⁶ Pa·s/m). The coupler 110 is therefore formed using a material that includes at least one of silicone gel, dielectric gel, thermoplastic elastomers (TPE), and rubber compounds, but is not so limited. As an example, the coupler 110 of an embodiment is formed using Kraiburg TPE products. As another example, the coupler 110 of an embodiment is formed using Sylgard® silicone products.
The coupler 110 of an embodiment includes a contact device 112 that includes, for example, a nipple or protrusion that protrudes from either or both sides of the coupler 110. In operation, a contact device 112 that protrudes from both sides of the coupler 110 includes one side of the contact device 112 that is in contact with the skin surface of the talker and another side of the contact device 112 that is in contact with the diaphragm, but the embodiment is not so limited. The coupler 110 and the contact device 112 can be formed from the same or different materials.
The coupler 110 transfers acoustic energy efficiently from the skin/flesh of a talker to the diaphragm and seals the diaphragm from ambient airborne acoustic signals. Consequently, the coupler 110 with the contact device 112 efficiently transfers acoustic signals directly from the talker's body (speech vibrations) to the diaphragm while isolating the diaphragm from acoustic signals in the airborne environment of the talker (the characteristic acoustic impedance of air is approximately 415 Pa·s/m). The coupler 110 provides this isolation because it blocks airborne signals from reaching the diaphragm directly, and the large impedance mismatch between air and the coupler material reflects and/or dissipates much of the energy of those signals. Consequently, the sensor 100 responds primarily to acoustic energy transferred from the skin of the talker, not from the air. When placed against the head of the talker, the sensor 100 picks up speech-induced acoustic signals on the surface of the skin while largely rejecting airborne acoustic noise signals, thereby increasing the signal-to-noise ratio and providing a very reliable source of speech information.
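The effect of the two impedance values quoted above can be illustrated with the textbook normal-incidence transmission formula for a plane wave crossing a boundary between two media. The following is a minimal sketch under idealized assumptions (a single flat, lossless interface and a coupler whose impedance exactly equals that of skin); it is not a model of the sensor 100 itself, and the function names are hypothetical.

```python
import math

# Characteristic acoustic impedances quoted in the description (Pa·s/m).
Z_AIR = 415.0
Z_SKIN = 1.5e6


def intensity_transmission(z1: float, z2: float) -> float:
    """Fraction of incident intensity transmitted from medium 1 into medium 2.

    Idealized model (assumption): plane wave at normal incidence on a single
    flat, lossless boundary, T = 4 * z1 * z2 / (z1 + z2) ** 2.
    """
    return 4.0 * z1 * z2 / (z1 + z2) ** 2


def to_db(ratio: float) -> float:
    """Express an intensity ratio in decibels."""
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    z_coupler = Z_SKIN  # hypothetical: coupler perfectly matched to skin
    for label, z_source in (("skin", Z_SKIN), ("air", Z_AIR)):
        t = intensity_transmission(z_source, z_coupler)
        print(f"{label:>4} -> coupler: T = {t:.3e} ({to_db(t):.1f} dB)")
```

Under these assumptions essentially all of the incident intensity passes from skin into a matched coupler, while only about 0.1% (roughly 30 dB less) passes from air into it. A real coupler will deviate from this figure because of its geometry, material losses, and imperfect matching.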
Performance of the sensor 100 is enhanced through the use of the seal provided between the diaphragm and the airborne environment of the talker. The seal is provided by the coupler 110. A modified gradient microphone is used in an embodiment because it has pressure ports on both ends. Thus, when the first port 104 is sealed by the coupler 110, the second port 106 provides a vent for air movement through the sensor 100.
The acoustic vibration sensor provides very accurate Voice Activity Detection (VAD) in high noise environments, where high noise environments include airborne acoustic environments in which the noise amplitude is as large as, if not larger than, the speech amplitude as measured by conventional omnidirectional microphones. Accurate VAD information provides significant performance and efficiency benefits in a number of important speech processing applications including but not limited to: noise suppression algorithms such as the Pathfinder algorithm available from Aliph, Brisbane, Calif. and described in the Related Applications; speech compression algorithms such as the Enhanced Variable Rate Coder (EVRC) deployed in many commercial systems; and speech recognition systems.
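As a rough illustration of how a VAD signal might be derived from the sensor output, the following sketch thresholds short-time frame energy. It is a deliberately simple stand-in, not the Pathfinder algorithm or any method described here: it presumes the sensor signal is already largely free of airborne noise, and the 20 ms frame length and the -40 dB threshold (relative to the loudest frame) are hypothetical defaults.

```python
import math
import random
from typing import List, Sequence


def frame_energy_vad(signal: Sequence[float], fs: int, frame_ms: float = 20.0,
                     threshold_db: float = -40.0) -> List[bool]:
    """Return one decision per non-overlapping frame (True = speech declared).

    A frame is declared voiced when its mean-square energy is within
    `threshold_db` of the loudest frame in the segment. Trailing samples
    that do not fill a whole frame are ignored.
    """
    frame_len = max(1, int(fs * frame_ms / 1000))
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return []
    energies = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        # Small floor avoids log10(0) on digitally silent frames.
        energies.append(sum(x * x for x in frame) / frame_len + 1e-12)
    peak = max(energies)
    return [10.0 * math.log10(e / peak) > threshold_db for e in energies]


if __name__ == "__main__":
    fs = 8000
    rng = random.Random(0)
    # 0.5 s of low-level noise, then 0.5 s of a 200 Hz tone standing in for speech.
    quiet = [rng.gauss(0.0, 1e-3) for _ in range(fs // 2)]
    voiced = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(fs // 2)]
    decisions = frame_energy_vad(quiet + voiced, fs)
    print("".join("S" if d else "." for d in decisions))
```

Because the threshold is relative to the loudest frame, this toy detector needs at least one voiced frame in the analyzed segment; a practical detector would more likely track a noise floor over time.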
In addition to providing signals having an improved signal-to-noise ratio, the acoustic vibration sensor uses only minimal power to operate (on the order of 200 microamperes, for example). In contrast to alternative solutions that require power, filtering, and/or significant amplification, the acoustic vibration sensor uses a standard microphone interface to connect with signal processing devices. The use of the standard microphone interface avoids the additional expense and size of interface circuitry in a host device and supports use of the sensor in highly mobile applications where power usage is an issue.
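The practical meaning of the current figure above can be checked with simple arithmetic. In the following sketch only the 200 microampere value comes from the description; the 2.0 V bias supply and the 100 mAh battery reserved for the sensor are hypothetical values chosen purely for illustration.

```python
# Only SENSOR_CURRENT_A comes from the description; the others are hypothetical.
SENSOR_CURRENT_A = 200e-6   # 200 microamperes
SUPPLY_V = 2.0              # hypothetical bias supply voltage
BATTERY_MAH = 100.0         # hypothetical battery capacity reserved for the sensor


def bias_power_w(current_a: float, supply_v: float) -> float:
    """DC power drawn by the sensor, P = I * V (watts)."""
    return current_a * supply_v


def runtime_hours(battery_mah: float, current_a: float) -> float:
    """Runtime if the battery powered only the sensor, t = Q / I (hours)."""
    return battery_mah / (current_a * 1e3)  # convert A to mA


if __name__ == "__main__":
    p = bias_power_w(SENSOR_CURRENT_A, SUPPLY_V)
    t = runtime_hours(BATTERY_MAH, SENSOR_CURRENT_A)
    print(f"power: {p * 1e3:.2f} mW, runtime: {t:.0f} h")
```

With these assumed values the sensor would draw about 0.4 mW and could run for roughly 500 hours on the battery alone; a different supply voltage or capacity scales these results linearly.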
As described above, the sensor includes additional electronic materials as appropriate, coupled to receive acoustic signals from the talker via the coupler 410, the silicone gel 409, and the diaphragm 408 and to convert the acoustic signals to electrical signals representative of human speech. Alternative embodiments can use any type/combination of materials and/or electronics to convert the acoustic signals to electrical signals representative of human speech.
The coupler 410 and/or gel 409 of an embodiment are formed using materials having acoustic impedances matched to the impedance of human skin. As such, the coupler 410 is formed using a material that includes at least one of silicone gel, dielectric gel, thermoplastic elastomers (TPE), and rubber compounds, but is not so limited. The coupler 410 transfers acoustic energy efficiently from the skin/flesh of a talker to the diaphragm and seals the diaphragm from ambient airborne acoustic signals. Consequently, the coupler 410 efficiently transfers acoustic signals directly from the talker's body (speech vibrations) to the diaphragm while isolating the diaphragm from acoustic signals in the airborne environment of the talker. The silicone gel 409 and coupler 410 provide this isolation by blocking airborne signals from reaching the diaphragm directly, thereby reflecting and/or dissipating much of the energy of those signals. Consequently, the sensor 400 responds primarily to acoustic energy transferred from the skin of the talker, not from the air. When placed against the head of the talker, the sensor 400 picks up speech-induced acoustic signals on the surface of the skin while largely rejecting airborne acoustic noise signals, thereby increasing the signal-to-noise ratio and providing a very reliable source of speech information.
There are many locations outside the ear from which the acoustic vibration sensor can detect skin vibrations associated with the production of speech. The sensor can be mounted in a device, handset, or earpiece in any manner, the only requirement being reliable skin contact, which is needed to detect the skin-borne vibrations associated with the production of speech.
Note that the silicone gel (block 702) is an optional component that depends on the embodiment of the sensor being manufactured, as described above. Consequently, the manufacture of an acoustic vibration sensor 100 that includes a contact device 112 (referring to
An acoustic vibration sensor, also referred to as a speech sensing device or sensor, is provided. The sensor, which generates electrical signals, comprises: at least one diaphragm positioned adjacent a front port; and at least one coupler configured to couple a first set of signals to the diaphragm and reject a second set of signals by isolating the diaphragm from the second set of signals, wherein the coupler includes at least one material having an acoustic impedance matched to an impedance of human skin.
The coupler of an embodiment couples to skin of a human talker, the first set of signals includes speech signals of the talker, and the second set of signals includes noise of an airborne acoustic environment of the talker.
The coupler of an embodiment includes a first protrusion on a first side of the coupler that contacts a surface of the human skin and a second protrusion on a second side of the coupler that contacts the diaphragm.
The sensor of an embodiment includes a coupler having a first side that contacts the human skin and a second side that couples to the diaphragm via at least one layer of gel material.
The coupler of an embodiment comprises at least one material including at least one of silicone gel, dielectric gel, thermoplastic elastomers (TPE), and rubber compounds.
An acoustic sensor is provided that comprises: a first port on a first side of an enclosure; a second port on a second side of the enclosure; at least one diaphragm positioned between the first and second ports; and a contiguous coupler having a first portion that couples a first side of the diaphragm to skin of a human talker and a second portion that isolates the first side of the diaphragm from an acoustic environment of the human talker, wherein the coupler includes at least one material having an acoustic impedance matched to the impedance of skin.
The sensor of an embodiment further comprises an electret microphone coupled to receive acoustic signals from the talker via the coupler and the diaphragm, wherein the electret microphone is used to convert the acoustic signals to electrical signals.
The coupler of an embodiment comprises at least one material including at least one of silicone gel, dielectric gel, thermoplastic elastomers (TPE), and rubber compounds.
The coupler of an embodiment includes a contact device comprising a first side that contacts the skin and a second side that contacts the diaphragm.
In the sensor of an embodiment the second port couples a second side of the diaphragm to the airborne acoustic environment.
A communication system is provided that comprises: at least one signal processor; and at least one acoustic sensor that couples electrical signals representative of human speech to the signal processor, the sensor including at least one diaphragm positioned between a first port and a second port of an enclosure, the sensor further including a contiguous coupler comprising at least one material having an acoustic impedance matched to the impedance of skin, wherein the coupler includes a first portion that couples a first side of the diaphragm to skin of a human talker and a second portion that isolates the first side of the diaphragm from an acoustic environment of the human talker.
The communication system of an embodiment further comprises a portable communication device that includes the acoustic sensor, wherein the portable communication device includes at least one of cellular telephones, satellite telephones, portable telephones, wireline telephones, Internet telephones, wireless transceivers, wireless communication radios, personal digital assistants (PDAs), personal computers (PCs), headset devices, head-worn devices, and earpieces.
A device for sensing speech signals is provided that comprises means for receiving speech signals, along with means for coupling a first set of signals to the means for receiving and rejecting a second set of signals, wherein the means for coupling isolates the means for receiving from the second set of signals, wherein the means for coupling includes at least one material having an impedance matched to an impedance of human skin.
Aspects of the acoustic vibration sensor described herein may be implemented using any of a variety of materials and methods. Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words “herein,” “hereunder,” “above,” “below,” and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word “or” is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list.
The above description of illustrated embodiments of the acoustic vibration sensor is not intended to be exhaustive or to limit the system to the precise form disclosed. While specific embodiments of, and examples for, the acoustic vibration sensor are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the sensor, as those skilled in the relevant art will recognize. The teachings of the acoustic vibration sensor provided herein can be applied to other sensing devices and systems, not only for the sensors described above.
The elements and acts of the various embodiments described above can be combined to provide further embodiments. These and other changes can be made to the acoustic vibration sensor in light of the above detailed description.
All of the above references and United States patents and patent applications are incorporated herein by reference. Aspects of the acoustic vibration sensor can be modified, if necessary, to employ the systems, functions and concepts of the various patents and applications described above to provide yet further embodiments of the acoustic vibration sensor.
In general, in the following claims, the terms used should not be construed to limit the acoustic vibration sensor to the specific embodiments disclosed in the specification and the claims, but should be construed to include all sensors and speech processing systems that operate under the claims to provide sensing capabilities. Accordingly, the acoustic vibration sensor is not limited by the disclosure, but instead the scope of the sensor is to be determined entirely by the claims.
While certain aspects of the acoustic vibration sensor are presented below in certain claim forms, the inventors contemplate the various aspects of the sensor in any number of claim forms. Accordingly, the inventors reserve the right to add additional claims after filing the application to pursue such additional claim forms for other aspects of the acoustic vibration sensor.
This application claims priority to U.S. patent application No. 60/443,818, filed Jan. 30, 2003. This application relates to the following U.S. patent application Ser. Nos. 09/990,847 filed Nov. 21, 2001; 10/159,770, filed May 30, 2002; 10/301,237, filed Nov. 21, 2002; 10/383,162, filed Mar. 5, 2003; 10/400,282, filed Mar. 27, 2003; and 10/667,207, filed Sep. 18, 2003.
Number | Date | Country |
---|---|---|
20040249633 A1 | Dec 2004 | US |

Number | Date | Country |
---|---|---|
60443818 | Jan 2003 | US |