This application, and the innovations and related subject matter disclosed herein (collectively referred to as the “disclosure”), generally concern microphone-based orientation detectors and associated techniques. More particularly but not exclusively, this disclosure pertains to sensors (also sometimes referred to as detectors) configured to determine an orientation of a device relative to a speaker's mouth, with a sensor configured to determine an orientation based in part on a difference in spectral power between two microphone signals being but one particular example of disclosed sensors.
Some commercially available communication handsets have two microphones. A first microphone is positioned in a region expected to be near a user's mouth during use of the handset, and the other microphone is spaced apart from the first microphone. With such an arrangement, the first microphone is intended to be positioned to receive the user's utterances directly, and the other microphone receives a comparatively attenuated version of the user's utterances, allowing a signal from the other microphone to be used as a noise reference.
Two-microphone arrangements as just described can provide a much more accurate noise-spectrum estimate than estimates obtained from a single microphone. With a relatively more accurate estimate of the noise spectrum, a noise suppressor can operate with relatively less distortion of the desired signal (e.g., a voice signal in the context of a mobile communication device).
However, despite such benefits of two-channel noise suppression, if the first microphone is moved away from the user's mouth, as when the handset is repositioned during use, the accuracy of the spectral noise estimate can decrease because the first microphone receives a more attenuated version of the speech signal. Consequently, the reference microphone signal can contain relatively more voice components than before, and the reduced spectral separation between the two microphone signals during speech can lead to voice distortion.
Therefore, a need exists for orientation detectors configured to detect when a microphone has been moved away from a user's mouth. In addition, a need exists for speech enhancers compatible with a wide range of handset use positions. As well, a need exists for improved noise-suppression systems for use in mobile communication handsets.
The innovations disclosed herein overcome many problems in the prior art and address one or more of the aforementioned or other needs. In some respects, the innovations disclosed herein are directed to microphone-based orientation sensors and associated techniques, and more particularly but not exclusively, to sensors configured to determine an orientation of a device relative to a speaker's mouth. Some disclosed sensors are configured to determine an orientation based on a difference in spectral power as between first and second microphone signals relative to a reference microphone signal. Other disclosed sensors are configured to determine an orientation based on differences in spectral power among more than two microphone signals. Mobile communication handsets and other devices having such sensors and detectors also are disclosed.
An orientation detector and sensors are disclosed. A first microphone can have a first position, a second microphone can have a second position, and a reference microphone can be spaced from the first microphone and the second microphone. An orientation processor can be configured to determine an orientation of the first microphone, the second microphone, or both, relative to a position of a source of a targeted acoustic signal (e.g., a user's mouth) based on a comparison of a relative separation of a first signal associated with the first microphone to a relative separation of a second signal associated with the second microphone. Throughout this disclosure, reference is made to a user's mouth position. In the context of a mobile handset, a user's mouth position is likely the most relevant source of a targeted acoustic signal. Other embodiments, however, can have acoustic sources other than a user's mouth. Accordingly, particular references to a user's mouth herein should be understood in a more general context as including other sources of acoustic signals.
The first signal can include or be a signal emitted by the first microphone transducer. In some instances, the first signal combines the signal emitted by the first microphone with a signal emitted by the second microphone. For example, the first signal can be a signal output from a beamformer. In some instances, the signal (or a portion thereof) emitted by the first microphone transducer can be more heavily weighted in the combination relative to the signal (or a portion thereof) emitted by the second microphone transducer. For example, in the context of beamformers, a signal from a first microphone and a signal from a second microphone can be combined after being filtered to establish a suitable phase/delay of one signal relative to another signal, e.g., to achieve a desired beam directionality.
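For illustration only, the following sketch (not taken from this disclosure; the function name, delay, and weights are hypothetical) shows a crude delay-and-sum combination in which the first microphone dominates the beam:

```python
import numpy as np

def form_beam(m1: np.ndarray, m4: np.ndarray,
              delay_samples: int = 2,
              w1: float = 0.8, w4: float = 0.2) -> np.ndarray:
    """Combine two microphone signals into a single beam.

    One signal is delayed by an integer number of samples to steer the
    beam's look direction, and the weights are unequal so that the beam
    favors one microphone (here, m1).
    """
    delayed = np.zeros_like(m4)
    if delay_samples > 0:
        delayed[delay_samples:] = m4[:-delay_samples]
    else:
        delayed = m4.copy()
    return w1 * m1 + w4 * delayed
```

A practical beamformer would use fractional-delay filters rather than whole-sample shifts; the sketch only illustrates the weighting and delay roles described above.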
The second signal can include or be a signal emitted by the second microphone transducer. In some instances, the second signal combines the signal emitted by the second microphone with a signal emitted by the first microphone. The signal (or a portion thereof) emitted by the second microphone can be more heavily weighted in the combination relative to the signal emitted by the first microphone.
A measure of the separation of the first signal can include a difference in spectral power as between the first signal and a signal emitted by the reference microphone. A measure of the separation of the second signal can include a difference in spectral power as between the second signal and the signal emitted by the reference microphone.
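For concreteness, one plausible per-band instantiation of such a measure (an assumption for illustration; the disclosure does not limit the separation function to this form) is the log ratio of spectral powers, expressed in decibels:

$$\operatorname{sep}\bigl(M_1(k),\,M_2(k)\bigr) = 10\,\log_{10}\frac{M_1(k)}{M_2(k)},$$

where $M_1(k)$ and $M_2(k)$ denote per-bin spectral powers of the first signal and the reference signal, respectively.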
Some orientation detectors also include a separation processor configured to determine a spectral power separation, relative to a signal emitted by the reference microphone transducer, of a signal emitted by the first microphone, a signal emitted by the second microphone, a first beam comprising the signal emitted by the first microphone and the signal emitted by the second microphone, and a second beam comprising the signal emitted by the first microphone and the signal emitted by the second microphone. The first beam can more heavily weight the signal emitted by the first microphone as compared to the signal emitted by the second microphone. Similarly, the second beam can more heavily weight the signal emitted by the second microphone as compared to the signal emitted by the first microphone. The first beam can have a directionality (sometimes also referred to in the art as a “look direction”) corresponding to a first direction of rotation relative to a user's mouth. The second beam can have a directionality corresponding to a second direction of rotation relative to the user's mouth. The first and the second directions can differ from each other, and in some cases can be opposite relative to each other.
Although orientation detectors are described herein largely in relation to two microphones and two beams, this disclosure contemplates orientation detectors having more than two microphones, as well as more than two beams, e.g., to provide relatively higher-resolution orientation sensitivity in rotation about a given axis, or to add orientation sensitivity in rotation about one or more additional axes (e.g., pitch, yaw, and roll). Some orientation detectors have a voice-activity detector (VAD) configured to declare voice activity when the spectral power separation of at least one of the signal emitted by the first microphone, the signal emitted by the second microphone, the first beam, and the second beam exceeds a threshold spectral power separation.
The threshold spectral power separation can vary inversely with a level of stationary noise.
An axis can extend from the first microphone to the second microphone, and the orientation processor can be further configured to determine an extent of rotation of the axis relative to a neutral position based on the comparison of the separation of the first signal to the separation of the second signal.
Some orientation detectors include one or more of a gyroscope, an accelerometer, and a proximity detector. A communication connection can link the orientation processor with one or more of the gyroscope, the accelerometer, and the proximity detector. The orientation processor can determine the orientation based at least in part on an output from one or more of the gyroscope, the accelerometer, and the proximity detector. In some instances, the orientation determined based in part on an output from one or more of the gyroscope, the accelerometer, and the proximity detector can be relative to a fixed frame of reference (e.g., the earth) rather than relative to a user's mouth.
An orientation determined by the orientation detector can be one of pitch, yaw, or roll. The orientation detector can also include a fourth microphone spaced apart from the first microphone, the second microphone, and the reference microphone. The orientation processor can be configured to determine an angular rotation in the other two of pitch, yaw, and roll, based at least in part on a comparison of a relative separation of a signal associated with the fourth microphone to the respective separations of the signals associated with the first and the second microphones.
Communication handsets are disclosed. A handset can have a chassis with a front side, a back side, a top edge, and a bottom edge. A first microphone and a second microphone spaced apart from the first microphone can be positioned on or adjacent to the bottom edge of the chassis. A reference microphone can face the back side of the chassis and be positioned closer to the top edge than to the bottom edge. An orientation detector can be configured to detect an orientation of the chassis relative to a user's mouth based at least in part on a strength of a signal from the first microphone relative to a signal from the reference microphone compared to a strength of a signal from the second microphone relative to the signal from the reference microphone.
Some disclosed handsets also have a noise suppressor and a signal selector configured to direct to the noise suppressor a signal which is selected from one of the signal from the first microphone, the signal from the second microphone, an average of the signal from the first microphone and the signal from the second microphone, a first beam comprising a first combination of the signal from the first microphone with the signal from the second microphone, and a second beam comprising a second combination of the signal from the first microphone and the signal from the second microphone. The first combination can weight the signal from the first microphone more heavily as compared to the signal from the second microphone. The second combination can weight the signal from the second microphone more heavily as compared to the signal from the first microphone.
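As an illustrative sketch only (reusing the hypothetical form_beam() helper above; the placeholder signals stand in for real microphone capture), the five candidate signals might be formed as:

```python
import numpy as np

rng = np.random.default_rng(0)
m1 = rng.standard_normal(16000)  # placeholder first-microphone signal
m4 = rng.standard_normal(16000)  # placeholder second-microphone signal

candidates = {
    "M1": m1,
    "M4": m4,
    "AVG": 0.5 * (m1 + m4),                    # (M1 + M4) / 2
    "+X": form_beam(m1, m4, delay_samples=2),  # combination weighting M1 more heavily
    "-X": form_beam(m4, m1, delay_samples=2),  # combination weighting M4 more heavily
}
```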
In some instances, the selector is configured to equalize a signal from the reference microphone to match a far-field response of the first beam signal, the second beam signal, or both, in diffuse noise.
The noise suppressor can be configured, in some instances, to subject the signal from the reference microphone to a minimum spectral profile corresponding to a system spectral noise profile of one or both of the first beam and the second beam.
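One plausible reading of such a minimum spectral profile (a sketch under the assumption that spectra are represented as per-bin power arrays; the function name is hypothetical) is a bin-by-bin maximum against the beam's system noise profile:

```python
import numpy as np

def floored_reference(ref_power: np.ndarray,
                      beam_noise_profile: np.ndarray) -> np.ndarray:
    """Keep the reference spectrum from dropping below the system noise
    profile of the beamformed channel, bin by bin."""
    return np.maximum(ref_power, beam_noise_profile)
```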
Some communication handsets also have one or more of a gyroscope, an accelerometer, and a proximity detector, together with a communication connection between the orientation detector and the one or more of the gyroscope, the accelerometer, and the proximity detector.
Some communication handsets also have a calibration data store containing a correlation between an angle of the chassis relative to a user's mouth and the strength of the signal from the first microphone compared to the strength of the signal from the second microphone. Such calibration data can also contain a correlation between an angle of the chassis relative to a user's mouth and a strength of one or more beams.
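A minimal sketch of consulting such calibration data (the angle grid and separation-difference values below are hypothetical placeholders, not calibration data from this disclosure):

```python
import numpy as np

# Hypothetical calibration: separation difference (dB) measured at known angles.
calib_sep_diff_db = np.array([-12.0, -6.0, 0.0, 6.0, 12.0])
calib_angles_deg = np.array([-45.0, -20.0, 0.0, 20.0, 45.0])

def estimate_angle(sep_m1_db: float, sep_m4_db: float) -> float:
    """Map the difference between the two separations to a chassis angle
    by interpolating over the stored calibration curve."""
    diff = sep_m1_db - sep_m4_db
    return float(np.interp(diff, calib_sep_diff_db, calib_angles_deg))
```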
In some instances, a measure of the orientation of the chassis relative to the user's mouth comprises an extent of rotation from a neutral position. In general, but not always, the user's mouth is substantially centered between the first microphone and the second microphone in the neutral position.
Some communication handsets have a fourth microphone spaced apart from the bottom edge of the chassis. The orientation detector can further be configured to determine an angular rotation in each of pitch, yaw, and roll, based at least in part on a strength of a signal from the fourth microphone relative to a signal from the reference microphone.
Also disclosed are tangible, non-transitory computer-readable media including computer-executable instructions that, when executed, cause a computing environment to implement a disclosed orientation detection method.
The foregoing and other features and advantages will become more apparent from the following detailed description, which proceeds with reference to the accompanying drawings.
Unless specified otherwise, the accompanying drawings illustrate aspects of the innovations described herein. Referring to the drawings, wherein like numerals refer to like parts throughout the several views and this specification, several embodiments of presently disclosed principles are illustrated by way of example, and not by way of limitation.
The following describes various innovative principles related to orientation-detection systems, orientation-detection techniques, and related signal processors, by way of reference to specific orientation-detection system embodiments, which are but several particular examples chosen for illustrative purposes. More particularly but not exclusively, disclosed subject matter pertains, in some respects, to systems for detecting an orientation of a handset relative to a user's mouth.
Nonetheless, one or more of the disclosed principles can be incorporated in various other signal processing systems to achieve any of a variety of corresponding system characteristics. Techniques and systems described in relation to particular configurations, applications, or uses, are merely examples of techniques and systems incorporating one or more of the innovative principles disclosed herein. Such examples are used to illustrate one or more innovative aspects of the disclosed principles.
Thus, orientation-detection techniques (and associated systems) having attributes that are different from those specific examples discussed herein can embody one or more of the innovative principles, and can be used in applications not described herein in detail, for example, in “hands-free” communication systems, in hand-held gaming systems or other console systems, etc. Accordingly, such alternative embodiments also fall within the scope of this disclosure.
I. Overview
With a configuration as shown in the accompanying drawings, a device 1 can have a first microphone 10 and a second microphone 20 positioned along a lower edge 4 of the device, and a reference microphone 30 spaced apart from both. Signals from the microphones 10, 20 can be used individually or combined to form one or more beams.
In some respects, this disclosure describes techniques for deciding which beam to use and under which circumstances. For example, if a user's mouth position is adjacent a center region 15 between the microphones 10, 20, an average of the signals (M1+M4)/2 can be used to collect a user's utterance. Alternatively, it might be preferred to use one of the beams, or one of the microphones M1 or M4, if the user's mouth position is biased toward the left or right of the bottom of the handset.
As used herein, the term “M1” refers to a signal from a first microphone 10, the term “M4” refers to a signal from a second microphone 20, and the term “M2” refers to a signal from the reference microphone 30.
II. Microphone-based Orientation Detection
With two microphones 10, 20, any of M1, M4, or beams formed using M1 and M4 can be used for noise suppression in conjunction with the noise-reference signal M2. To minimize voice distortion while achieving desirable noise suppression, the microphone signal or beam having the highest spectral separation when the near-end voice is active can be selected.
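Expressed as a sketch (assuming scalar per-channel separation values, computed as described below; the numbers in the example call are invented):

```python
def select_channel(separations: dict[str, float]) -> str:
    """Pick the microphone signal or beam with the greatest spectral
    separation from the reference microphone."""
    return max(separations, key=separations.get)

# e.g., select_channel({"M1": 9.5, "M4": 4.2, "+X": 11.0, "-X": 3.8}) returns "+X"
```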
Let M1(k) and M2(k) denote the power spectra of the output signals from the first microphone 10 and the reference microphone 30, respectively. Then the separation is defined, generally, as a separation function sep(M1(k), M2(k)). In one particular embodiment, the separation function measures the per-band difference in spectral power between M1(k) and M2(k).
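A minimal computational sketch, assuming the log-power-ratio form illustrated earlier (the windowing and FFT details are assumptions, not particulars of this disclosure):

```python
import numpy as np

def power_spectrum(frame: np.ndarray) -> np.ndarray:
    """Per-bin power spectrum of one windowed frame."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    return np.abs(spectrum) ** 2

def sep(p1: np.ndarray, p2: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-bin spectral separation, in dB, between two power spectra."""
    return 10.0 * np.log10((p1 + eps) / (p2 + eps))
```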
Separation between output signals from the second microphone 20 and the reference microphone 30 can be defined similarly. For beams that are formed from output signals from the first and second microphones 10, 20, the separation can be computed in a similar fashion, but with the output signal from the reference microphone 30 equalized to have the same far-field response as the beams. Such equalization allows the system to suppress noise introduced by beamforming.
A. Orientation Detection Based on Separation
Raw separation 111 between output signals from the first microphone 10 and the reference microphone 30, and raw separation 112 between output signals from the second microphone 20 and the reference microphone 30, denoted by sep(M1(k), M2(k)) and sep(M4(k), M2(k)), respectively, can be computed. Some time and frequency smoothing can be applied.
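The smoothing could be, for example, a first-order recursive average over time (the smoothing constant is an assumption):

```python
import numpy as np

def smooth(prev: np.ndarray, current: np.ndarray, alpha: float = 0.9) -> np.ndarray:
    """Exponential time smoothing of per-bin separation estimates."""
    return alpha * prev + (1.0 - alpha) * current
```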
Since we are trying to determine the position of a near-end talker's mouth with respect to the bottom microphones 10, 20 of the device 1, separation data will only be considered during near-end speech. In this example, the VAD 120 considers the near-end talker to be active when the following condition is met:
max(sep(M1(k), M2(k)), sep(M4(k), M2(k))) > Threshold.
The threshold can be a function of stationary noise, and typically can be reduced as the stationary noise level increases.
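Putting the condition and the noise-dependent threshold together in a sketch (the threshold bounds and slope are illustrative assumptions, not values from this disclosure):

```python
import numpy as np

def vad_threshold(noise_level_db: float) -> float:
    """Require less separation as the stationary noise level rises."""
    return float(np.clip(12.0 - 0.2 * (noise_level_db - 40.0), 3.0, 12.0))

def near_end_active(sep_m1_db: np.ndarray, sep_m4_db: np.ndarray,
                    noise_level_db: float) -> bool:
    """Declare near-end voice when either band-averaged separation
    exceeds the noise-dependent threshold."""
    return max(sep_m1_db.mean(), sep_m4_db.mean()) > vad_threshold(noise_level_db)
```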
The noise suppressor 150 suppresses noise from the selected signal 141 before emitting the output 160 from the speech enhancer 100.
The selector 240 can output an equalized noise signal 241 and the selected speech signal 242. The noise suppressor 250 can process the speech signal 242 and emit an output signal 260 from the speech enhancer 200.
An output mode selector 245 can set an operating mode for the selector 240. For example, the selector can choose between M1 and M4, between +X and −X, from among M1, M4 and (M1+M4)/2, or from among +X, −X and (M1+M4)/2. Where a beam (e.g., −X or +X) is selected for voice input (e.g., input 242), a signal from the reference microphone 30 (e.g., the equalized noise signal 241 emitted via the selector 240) can be equalized to have the same far-field response as the selected beam, as described above.
With a VAD as indicated in the accompanying drawings, near-end voice activity can be declared when the following condition is met:
max(sep(M1(k),M2(k)),sep(M4(k),M2(k)),sep(+X(k),M2(k)),sep(−X(k),M2(k)))>Threshold,
where sep(+X(k), M2(k)) 313 and sep(−X(k), M2(k)) 314 are respective measures of separation of the beams. Signals 311 and 312 represent separation of the microphone channels 10, 20 relative to the reference microphone signal.
The VAD output 321, 322 can be microphone or beam separation measures gated by voice activity. The orientation comparator 335 can receive and process any of the signal or beam separations. Including the beam separations in this way can enable detection of near-end voice activity over a wider range of angles than in other embodiments, an improvement that can be seen in measured separation data.
Thus, a disclosed orientation detector can estimate an angular displacement from a neutral orientation (e.g., an orientation in which the user's mouth is adjacent a defined region of a handset, for example centered between the microphones 10, 20). In some embodiments, such estimates can be relatively coarse: the detector can reflect that the device 1 is oriented so as to place a user's mouth relatively nearer one microphone than the other. In other embodiments, such estimates can be relatively more refined: the detector can accurately reflect an extent of angular rotation from a neutral orientation up to about 50 degrees. Some embodiments accurately reflect an extent of angular rotation from a neutral orientation up to between about 25 degrees and about 55 degrees, such as between about 30 degrees and about 45 degrees, with about 40 degrees being another exemplary extent of angular rotation that disclosed detectors can discern accurately. Some estimates of angular rotation relative to a user's mouth are accurate to within between about 1 degree and about 15 degrees, for example between about 3 degrees and about 8 degrees, with about 5 degrees being a particular example of the accuracy of disclosed detectors.
An output mode selector 345 can set an operating mode for the selector 340. For example, the selector can choose between M1 and M4, between −X and +X, among M1, M4 and (M1+M4)/2, or among +X, −X and (M1+M4)/2.
B. Combined Orientation Detection Approaches
Some devices 1 are equipped with one or more of a gyroscope (or “gyro”), a proximity sensor, and an accelerometer. The gyro and accelerometer can determine an angular position of a given device with respect to Earth in a quick, reliable, and accurate manner. In addition, such orientation detection is robust to noise and does not rely on or require near-end voice activity. However, a difficulty in using the gyro in the current context of speech enhancement is that it provides orientation with respect to Earth and not with respect to a user's mouth. Nonetheless, the gyro can be used together with any separation-based or other microphone-based orientation technique disclosed herein to provide a rapid response to angular phone movement, as illustrated schematically in the accompanying drawings.
Separation Based Position Detection (SBPD) (also sometimes referred to more generally as microphone-based orientation detection) can be performed as described above at 510. The position reading from the gyro or other orientation sensor can be output at 530 to the SBPD 510 in a continuous manner. The SBPD 510 can make a determination of Left, Center, or Right position whenever there is sufficient near-end voice activity, and the orientation sensor output is recorded at that time. Whenever the SBPD 510 detects a change in orientation, the corresponding orientation sensor output readings can be checked to see if the change in detected position is confirmed by the orientation sensor's angle change in magnitude and/or sign.
If the two orientation approaches reach different conclusions, the output of the SBPD 510 can be declared to be in error and rejected; such errors can occur more often in noisy conditions.
In another aspect of the method, gyro readings recorded during detected Left, Center, and Right positions can be averaged over time.
An SGBPD (separation- and gyro-based position determination) can then be made by comparing the current gyro reading with the average gyro readings Gyro_Left, Gyro_Center, and Gyro_Right 521 corresponding to the Left, Center, and Right orientations. An instantaneous aggregate orientation 540 determination can be made from this comparison, and an output from the aggregate orientation 540 can result in an indication 550 of orientation (e.g., a user-interpretable or a machine-readable indication).
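A schematic sketch of the confirmation-and-aggregation logic described above (the class, method names, and smoothing constant are hypothetical):

```python
class OrientationFusion:
    """Fuse separation-based position detection (SBPD) with gyro readings."""

    def __init__(self) -> None:
        # Running average gyro angle observed for each SBPD position.
        self.gyro_avg = {"Left": None, "Center": None, "Right": None}

    def update(self, sbpd_position: str, gyro_angle: float) -> None:
        """Record the gyro angle seen during a voiced SBPD determination."""
        avg = self.gyro_avg[sbpd_position]
        self.gyro_avg[sbpd_position] = (
            gyro_angle if avg is None else 0.9 * avg + 0.1 * gyro_angle)

    def aggregate(self, gyro_angle: float) -> str:
        """Between voiced intervals, classify by the nearest learned angle."""
        known = {p: a for p, a in self.gyro_avg.items() if a is not None}
        if not known:
            return "Center"  # no voiced calibration yet; assume neutral
        return min(known, key=lambda p: abs(known[p] - gyro_angle))
```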
In some embodiments, information from the gyro (or another orientation-sensitive device, including other microphone-based orientation detectors, e.g., having three or more microphones for orientation detection) can be combined with any of the microphone-based orientation detection systems described herein to detect a finer resolution of orientation relative to a user's mouth than just left/center/right.
If a proximity sensor indicates the device is removed from a user's ear and no longer is being held in a “handset” position with a user's mouth near the microphones 10, 20, the noise estimation can be based only on one microphone, e.g., microphone 30.
IV. Computing Environments
The computing environment 1100 includes at least one central processing unit 1110 and memory 1120.
A computing environment may have additional features. For example, the computing environment 1100 includes storage 1140, one or more input devices 1150, one or more output devices 1160, and one or more communication connections 1170. An interconnection mechanism (not shown) such as a bus, a controller, or a network, interconnects the components of the computing environment 1100. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment 1100, and coordinates activities of the components of the computing environment 1100.
The storage 1140 may be removable or non-removable, and can include selected forms of machine-readable media. In general, machine-readable media include magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, optical data storage devices, or any other machine-readable medium which can be used to store information and which can be accessed within the computing environment 1100. The storage 1140 stores instructions for the software 1180, which can implement technologies described herein.
The storage 1140 can also be distributed over a network so that software instructions are stored and executed in a distributed fashion. In other embodiments, some of these operations might be performed by specific hardware components that contain hardwired logic. Those operations might alternatively be performed by any combination of programmed data-processing components and fixed hardwired circuit components.
The input device(s) 1150 may be a touch input device, such as a keyboard, keypad, mouse, pen, touchscreen or trackball, a voice input device, a scanning device, or another device, that provides input to the computing environment 1100. For audio, the input device(s) 1150 may include a microphone or other transducer (e.g., a sound card or similar device that accepts audio input in analog or digital form), or a CD-ROM reader that provides audio samples to the computing environment 1100. The output device(s) 1160 may be a display, printer, speaker, CD-writer, or another device that provides output from the computing environment 1100.
The communication connection(s) 1170 enable communication over a communication medium (e.g., a connecting network) to another computing entity. The communication medium conveys information such as computer-executable instructions, compressed graphics information, or other data in a modulated data signal.
Tangible machine-readable media are any available, tangible media that can be accessed within the computing environment 1100. By way of example, and not limitation, in the computing environment 1100, computer-readable media include memory 1120, storage 1140, communication media (not shown), and combinations of any of the above. Tangible computer-readable media exclude transitory signals.
V. Other Embodiments
The examples described above generally concern orientation-detection systems and related techniques. Other embodiments than those described above in detail are contemplated based on the principles disclosed herein, together with any attendant changes in configurations of the respective apparatus described herein. Incorporating the principles disclosed herein, it is possible to provide a wide variety of systems adapted to detect an orientation of a device relative to a signal source.
For example, additional microphones can be added as between the microphones 10, 20 to improve the sensitivity and resolution of available beams in resolving changes in orientation relative to a user's mouth. For example, additional beams can be generated and have a finer resolution across a particular range of angular positions relative to a user's mouth. As another example, one or more microphones can be added to the device at other respective positions spaced apart from the lower edge 4. By comparing separation of such additional microphones relative to separation of the microphones 10, 20, additional orientation information can be gathered, permitting resolution of orientations in pitch, yaw, and roll.
Directions and other relative references (e.g., up, down, top, bottom, left, right, rearward, forward, etc.) may be used to facilitate discussion of the drawings and principles herein, but are not intended to be limiting. For example, certain terms may be used such as “up,” “down,” “upper,” “lower,” “horizontal,” “vertical,” “left,” “right,” and the like. Such terms are used, where applicable, to provide some clarity of description when dealing with relative relationships, particularly with respect to the illustrated embodiments. Such terms are not, however, intended to imply absolute relationships, positions, and/or orientations. For example, with respect to an object, an “upper” surface can become a “lower” surface simply by turning the object over. Nevertheless, it is still the same surface and the object remains the same. As used herein, “and/or” means “and” or “or,” as well as “and” and “or.” Moreover, all patent and non-patent literature cited herein is hereby incorporated by reference in its entirety for all purposes.
The principles described above in connection with any particular example can be combined with the principles described in connection with another example described herein. Accordingly, this detailed description shall not be construed in a limiting sense, and following a review of this disclosure, those of ordinary skill in the art will appreciate the wide variety of filtering and computational techniques that can be devised using the various concepts described herein. Moreover, those of ordinary skill in the art will appreciate that the exemplary embodiments disclosed herein can be adapted to various configurations and/or uses without departing from the disclosed principles.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the disclosed innovations. Various modifications to those embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of this disclosure. Thus, the claimed inventions are not intended to be limited to the embodiments shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular, such as by use of the article “a” or “an” is not intended to mean “one and only one” unless specifically so stated, but rather “one or more”. All structural and functional equivalents to the elements of the various embodiments described throughout the disclosure that are known or later come to be known to those of ordinary skill in the art are intended to be encompassed by the features described and claimed herein. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims. No claim element is to be construed under the provisions of 35 USC 112, sixth paragraph, unless the element is expressly recited using the phrase “means for” or “step for”.
Thus, in view of the many possible embodiments to which the disclosed principles can be applied, we reserve the right to claim any and all combinations of features and technologies described herein as understood by a person of ordinary skill in the art, including, for example, all that comes within the scope and spirit of the following claims.