The present invention relates to audio enhancement for automatically increasing the spectral bandwidth of a voice signal to increase a perceived sound quality in a telecommunication conversation.
Sound isolating (SI) earphones and headsets are becoming increasingly popular for music listening and voice communication. SI earphones enable the user to hear an incoming audio content signal (be it speech or music audio) clearly in loud ambient noise environments, by attenuating the level of ambient sound in the user ear-canal.
SI earphones benefit from using an ear canal microphone (ECM) configured to detect user voice in the occluded ear canal for voice communication in high noise environments. In such a configuration, the ECM detects sound in the user's ear canal between the ear drum and the sound isolating component of the SI earphone, where the sound isolating component is, for example, a foam plug or inflatable balloon. The ambient sound impinging on the ECM is attenuated by the sound isolating component (e.g., by approximately 30 dB averaged across frequencies 50 Hz to 10 kHz). The sound pressure in the ear canal in response to user-generated voice can be approximately 70-80 dB. As such, the effective signal to noise ratio measured at the ECM is increased when using an ear canal microphone and sound isolating component. This is clearly beneficial for two-way voice communication in high noise environments. First, the SI earphone wearer with the ECM can clearly hear the incoming voice signal from a remote calling party, reproduced with an ear canal receiver (i.e., loudspeaker). Secondly, the remote party can clearly hear the voice of the SI earphone wearer with the ECM even if the near-end caller is in a noisy environment, due to the increase in signal-to-noise ratio as previously described.
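The signal-to-noise reasoning above can be illustrated with simple level arithmetic. The following is a minimal sketch only: the function name `ecm_snr_db` and the 90 dB ambient level are assumptions introduced for illustration, the 75 dB voice level is the midpoint of the 70-80 dB range stated above, and the model simply subtracts the isolation from the ambient level.

```python
def ecm_snr_db(voice_db: float, ambient_db: float, isolation_db: float = 30.0) -> float:
    """Illustrative SNR at the ECM: voice level minus the ambient level left after isolation."""
    residual_ambient_db = ambient_db - isolation_db  # ambient sound remaining in the ear canal
    return voice_db - residual_ambient_db


# Assumed example: 75 dB user voice in the ear canal, 90 dB ambient noise outside.
snr_db = ecm_snr_db(voice_db=75.0, ambient_db=90.0)
print(f"SNR at the ECM: {snr_db:.0f} dB")
```

Under this simplified model, each additional dB of isolation raises the SNR at the ECM by one dB for a fixed voice level.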
The output signal of the ECM with such an SI earphone in response to user voice activity is such that high-frequency fricatives produced by the earphone wearer, e.g., the phoneme /s/, are substantially attenuated due to the SI component of the earphone absorbing the air-borne energy of the fricative sound generated at the user's lips. As such, very little user voice sound energy is detected at the ECM above about 4.5 kHz and when the ECM signal is auditioned it can sound “muffled”.
A number of related art references discuss spectral expansion. Application US20070150269 describes spectral expansion of a narrowband speech signal. The application uses a “parameter detector” which, for example, can differentiate between a vowel and a consonant in the narrowband input signal, and generates higher frequencies dependent on this analysis.
Application US20040138876 describes a system similar to US20070150269 in that a narrowband signal (300 Hz to 3.4 kHz) is analyzed to distinguish sibilants from non-sibilants, and high frequency sound is generated in the case of the former occurrence to generate a new signal with energy up to 7.7 kHz.
U.S. Pat. No. 8,200,499 describes a system to extend the high-frequency spectrum of a narrow-band signal. The system extends the harmonics of vowels by introducing a non-linearity. Consonants are spectrally expanded using a random noise generator.
U.S. Pat. No. 6,895,375 describes a system for extending the bandwidth of a narrowband signal such as a speech signal. The method comprises computing the narrowband linear predictive coefficients (LPCs) from a received narrowband speech signal, processing these LPCs into wideband LPCs, and then generating the wideband signal from these wideband LPCs.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. Similar reference numerals and letters refer to similar items in the following figures, and thus once an item is defined in one figure, it may not be discussed for following figures.
In some embodiments, a system increases the spectral range of the ECM signal so that detected user-voice containing high frequency energy (e.g., fricatives) is reproduced with higher frequency content (e.g., frequency content up to about 8 kHz) so that the processed ECM signal can be auditioned with a more natural and “less muffled” quality.
“Voice over IP” (VOIP) telecommunications is increasingly being used for two-way voice communications between two parties. The audio bandwidth of such VOIP calls is generally up to 8 kHz. With a conventional ambient microphone as found on a mobile computing device (e.g., smart phone or laptop), the audio output is approximately linear up to about 12 kHz. Therefore, in a VOIP call between two parties using these conventional ambient microphones, made in a quiet environment, both parties will hear the voice of the other party with a full audio bandwidth up to 8 kHz. However, when an ECM is used, even though the signal to noise ratio improves in high noise environments, the audio bandwidth is reduced compared with the conventional ambient microphones, and each user will experience the received voice audio as sounding band-limited or muffled, as the received and reproduced voice audio bandwidth is approximately half of what it would be using the conventional ambient microphones.
Thus, embodiments herein expand (or extend) the bandwidth of the ECM signal before being auditioned by a remote party during high-bandwidth telecommunication calls, such as VOIP calls.
The relevant art described above fails to generate a wideband signal from a narrowband signal based on a first analysis of a reference wideband speech signal to generate a mapping matrix (e.g., least-squares regression fit) that is then applied to a narrowband input signal and noise signal to generate a wideband output signal.
There are two things that are “different” about the approach in some of the embodiments described herein. One difference is that there is an intermediate approach between a very simple model (that the energy in the 3.5-4 kHz range gets extended to 8 kHz, say), and a very complex model (that attempts to classify the phoneme at every frame, and deploy a specific template for each case). Embodiments herein can have a simple, mode-less model that nonetheless has quite a few parameters, which can be learned from training data. The second significant difference is that some of the embodiments herein use a “dB domain” to do the linear prediction.
Referring to
The system 10 can be configured to be part of any suitable media or computing device. For example, the system may be housed in the computing device or may be coupled to the computing device. The computing device may include, without being limited to, wearable and/or body-borne (also referred to herein as bearable) computing devices. Examples of wearable/body-borne computing devices include head-mounted displays, earpieces, smartwatches, smartphones, cochlear implants and artificial eyes. Briefly, wearable computing devices relate to devices that may be worn on the body. Bearable computing devices relate to devices that may be worn on the body or in the body, such as implantable devices. Bearable computing devices may be configured to be temporarily or permanently installed in the body. Wearable devices may be worn, for example, on or in clothing, watches, glasses, shoes, as well as any other suitable accessory.
Although only the first 11 and second 12 microphones are shown together on a right earpiece, the system 10 can also be configured for individual earpieces (left or right) or include an additional pair of microphones on a second earpiece in addition to the first earpiece.
Referring to
In the configuration shown, the first 13 and second 15 microphones are mechanically mounted to one side of eyeglasses. Again, the embodiment 20 can be configured for individual sides (left or right) or include an additional pair of microphones on a second side in addition to the first side.
With respect to the previous figures, the system 10 or 20 may represent a single device or a family of devices configured, for example, in a master-slave or master-master arrangement. Thus, components of the system 10 or 20 may be distributed among one or more devices, such as, but not limited to, the media device 14 illustrated in
The computing devices shown in
In one exemplary embodiment of the present invention, there exists a communication earphone/headset system connected to a voice communication device (e.g. mobile telephone, radio, computer device) and/or audio content delivery device (e.g. portable media player, computer device). Said communication earphone/headset system comprises a sound isolating component for blocking the user's ear meatus (e.g. using foam or an expandable balloon); an Ear Canal Receiver (ECR, i.e. loudspeaker) for receiving an audio signal and generating a sound field in a user ear-canal; at least one ambient sound microphone (ASM) for receiving an ambient sound signal and generating at least one ASM signal; and an optional Ear Canal Microphone (ECM) for receiving a narrowband ear-canal signal measured in the user's occluded ear-canal and generating an ECM signal. A signal processing system receives an Audio Content (AC) signal from said communication device (e.g. mobile phone) or said audio content delivery device (e.g. music player); and further receives the at least one ASM signal and the optional ECM signal. Said signal processing system processes the narrowband ECM signal to generate a modified ECM signal with increased spectral bandwidth.
In a second embodiment, the signal processing for increasing spectral bandwidth receives a narrowband speech signal from a non-microphone source, such as a codec or Bluetooth transceiver. The output signal with the increased spectral bandwidth is directed to an Ear Canal Receiver of an earphone or a loudspeaker on another wearable device.
The reader is now directed to the description of
As illustrated, the system 40 of
The earpiece includes an Ambient Sound Microphone (ASM) 120 to capture ambient sound, an Ear Canal Receiver (ECR) 114 to deliver audio to an ear canal 124, and an Ear Canal Microphone (ECM) 106 to capture and assess a sound exposure level within the ear canal 124. The earpiece can partially or fully occlude the ear canal 124 to provide various degrees of acoustic isolation. In at least one exemplary embodiment, the assembly is designed to be inserted into the user's ear canal 124, and to form an acoustic seal with the walls of the ear canal 124 at a location between the entrance to the ear canal 124 and the tympanic membrane (or ear drum). Such a seal is typically achieved by means of a soft and compliant housing of sealing unit 108.
Sealing unit 108 is an acoustic barrier having a first side corresponding to ear canal 124 and a second side corresponding to the ambient environment. In at least one exemplary embodiment, sealing unit 108 includes an ear canal microphone tube 110 and an ear canal receiver tube 112. Sealing unit 108 creates a closed cavity of approximately 5 cc between the first side of sealing unit 108 and the tympanic membrane in ear canal 124. As a result of this sealing, the ECR (speaker) 114 is able to generate a full range bass response when reproducing sounds for the user. This seal also serves to significantly reduce the sound pressure level at the user's eardrum resulting from the sound field at the entrance to the ear canal 124. This seal is also a basis for a sound isolating performance of the electro-acoustic assembly.
In at least one exemplary embodiment and in broader context, the second side of sealing unit 108 corresponds to the earpiece, electronic housing unit 100, and ambient sound microphone 120 that is exposed to the ambient environment. Ambient sound microphone 120 receives ambient sound from the ambient environment around the user.
Electronic housing unit 100 houses system components such as a microprocessor 116, memory 104, battery 102, ECM 106, ASM 120, ECR 114, and user interface 122. Microprocessor 116 can be a logic circuit, a digital signal processor, controller, or the like for performing calculations and operations for the earpiece. Microprocessor 116 is operatively coupled to memory 104, ECM 106, ASM 120, ECR 114, and user interface 122. A wire 118 provides an external connection to the earpiece. Battery 102 powers the circuits and transducers of the earpiece. Battery 102 can be a rechargeable or replaceable battery.
In at least one exemplary embodiment, electronic housing unit 100 is adjacent to sealing unit 108. Openings in electronic housing unit 100 receive ECM tube 110 and ECR tube 112 to respectively couple to ECM 106 and ECR 114. ECR tube 112 and ECM tube 110 acoustically couple signals to and from ear canal 124. For example, ECR 114 outputs an acoustic signal through ECR tube 112 and into ear canal 124 where it is received by the tympanic membrane of the user of the earpiece. Conversely, ECM 106 receives an acoustic signal present in ear canal 124 through ECM tube 110. All transducers shown can receive or transmit audio signals to a processor 116 that undertakes audio signal processing and provides a transceiver for audio via the wired (wire 118) or a wireless communication path.
Step 1. A first training step generating a “mapping” (or “prediction”) matrix based on the analysis of a reference wideband signal and a reference narrowband signal. The mapping matrix is a transformation matrix to predict high frequency energy from a low frequency energy envelope. In one exemplary configuration, the reference wideband and narrowband signals are made from a simultaneous recording of a phonetically balanced sentence made with an ambient microphone located in an earphone and an ear canal microphone located in an earphone of the same individual (i.e. to generate the wideband and narrowband reference signals, respectively).
Step 2. Generating an energy envelope analysis of an input narrowband audio signal.
Step 3. Generating a resynthesized noise signal by processing a random noise signal with the mapping matrix of step 1 and the envelope analysis of step 2.
Step 4. High-pass filtering the resynthesized noise signal of step 3.
Step 5. Summing the high-pass filtered resynthesized noise signal with the original input narrowband audio signal.
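Steps 1 to 5 above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation of any embodiment: the sample rate, frame length, 4 kHz band split, number of envelope bands, the bias column in the regression, and all function names are hypothetical choices made for the example, and the high-pass filtering of step 4 is approximated by zeroing the gains of the frequency bins below the split. The reference signals are synthetic stand-ins for the simultaneous ambient-microphone and ear-canal-microphone recordings.

```python
import numpy as np

FS, N, HOP = 16000, 256, 128            # sample rate, frame length, hop (all assumed)
CUTOFF_HZ, N_LOW, N_HIGH = 4000, 8, 4   # band split and envelope resolution (assumed)
WIN = np.sqrt(np.hanning(N + 1)[:-1])   # periodic sqrt-Hann: squares overlap-add to 1 at 50% hop
FREQS = np.fft.rfftfreq(N, 1 / FS)

def stft(x):
    """Windowed, 50%-overlapping frames of x, transformed to the frequency domain."""
    n = 1 + (len(x) - N) // HOP
    idx = np.arange(N)[None, :] + HOP * np.arange(n)[:, None]
    return np.fft.rfft(x[idx] * WIN, axis=1)

def envelope_db(spec, lo_hz, hi_hz, n_bands):
    """Energy envelope in dB: mean bin power in n_bands equal-width bands."""
    edges = np.linspace(lo_hz, hi_hz, n_bands + 1)
    power = np.abs(spec) ** 2
    bands = [power[:, (FREQS >= a) & (FREQS < b)].mean(axis=1)
             for a, b in zip(edges[:-1], edges[1:])]
    return 10 * np.log10(np.stack(bands, axis=1) + 1e-12)

def with_bias(env):
    """Append a constant column so the fit can learn an offset per output band."""
    return np.hstack([env, np.ones((len(env), 1))])

def train_mapping(wideband_ref, narrowband_ref):
    """Step 1: least-squares fit from low-band dB envelope to high-band dB envelope."""
    low = envelope_db(stft(narrowband_ref), 0, CUTOFF_HZ, N_LOW)
    high = envelope_db(stft(wideband_ref), CUTOFF_HZ, FS / 2, N_HIGH)
    mapping, *_ = np.linalg.lstsq(with_bias(low), high, rcond=None)
    return mapping

def expand_bandwidth(x, mapping, rng):
    low = envelope_db(stft(x), 0, CUTOFF_HZ, N_LOW)               # step 2: envelope analysis
    high_db = with_bias(low) @ mapping                            # predicted high-band envelope
    edges = np.linspace(CUTOFF_HZ, FS / 2, N_HIGH + 1)
    band = np.clip(np.digitize(FREQS, edges) - 1, 0, N_HIGH - 1)  # bin -> high-band index
    gain = np.sqrt(10 ** (high_db[:, band] / 10) / np.sum(WIN ** 2))
    gain[:, FREQS < CUTOFF_HZ] = 0.0                              # step 4: keep only the high band
    noise_spec = stft(rng.standard_normal(len(x))) * gain         # step 3: shape random noise
    shaped = np.fft.irfft(noise_spec, n=N, axis=1) * WIN
    noise = np.zeros(len(x))
    for i, frame in enumerate(shaped):                            # overlap-add resynthesis
        noise[i * HOP:i * HOP + N] += frame
    return x + noise                                              # step 5: sum with the input

# Synthetic stand-ins for the reference recordings: white noise and a band-limited copy.
rng = np.random.default_rng(0)
wide = rng.standard_normal(FS)
spec = np.fft.rfft(wide)
spec[np.fft.rfftfreq(FS, 1 / FS) >= CUTOFF_HZ] = 0
narrow = np.fft.irfft(spec, n=FS)

mapping = train_mapping(wide, narrow)
out = expand_bandwidth(narrow, mapping, rng)
```

In this sketch the mapping matrix has one row per low-band envelope channel plus one bias row, and one column per predicted high-band channel. Because the noise is shaped frame by frame, the added high-frequency energy follows the envelope of the input over time.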
In the model, there are sufficient input channels for an accurate prediction, but not so many that a huge amount of training data is needed, or that the model ends up being unable to generalize.
The second notable aspect of the method is that the linear prediction is performed in the “dB domain” (this is different from the LPC approach).
The logarithmic dB domain is used since it provides a good fit even for relatively low-level energies. A least-squares fit on the linear energy concentrates its modeling power on the highest-energy bins (for example, roughly the top 5%), and the lower energy levels, to which human listeners are quite sensitive, are not well modeled. (Note that “mapping” matrix and “prediction” matrix are used interchangeably.)
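The effect described above can be illustrated with a small synthetic experiment. This is a sketch under stated assumptions: the data are artificial, the relation between low-band and high-band level is assumed to be linear in dB with 2 dB of scatter, and the "quiet" threshold of -40 dB is arbitrary. The same ordinary least-squares fit is run once on dB values and once on linear energies, and the error is measured in dB on the quiet frames only.

```python
import numpy as np

rng = np.random.default_rng(0)
low_db = rng.uniform(-60.0, 0.0, 2000)                      # synthetic low-band levels
high_db = 0.8 * low_db - 5.0 + rng.normal(0.0, 2.0, 2000)   # assumed dB-linear relation

def fit_predict(x, y):
    """Ordinary least squares y ~ a*x + b; returns the fitted values."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

# Fit in the dB domain.
pred_db = fit_predict(low_db, high_db)

# Fit on linear energy, then convert the prediction back to dB for comparison.
pred_lin = fit_predict(10 ** (low_db / 10), 10 ** (high_db / 10))
pred_lin_db = 10 * np.log10(np.maximum(pred_lin, 1e-12))    # guard against non-positive fits

quiet = low_db < -40.0                                      # low-level frames only
err_db_fit = np.sqrt(np.mean((pred_db[quiet] - high_db[quiet]) ** 2))
err_lin_fit = np.sqrt(np.mean((pred_lin_db[quiet] - high_db[quiet]) ** 2))
print(f"RMS error on quiet frames: dB fit {err_db_fit:.1f} dB, linear fit {err_lin_fit:.1f} dB")
```

On data like these, the linear-energy fit is governed by the loudest frames, so its predictions for quiet frames are far from the true levels once expressed in dB, while the dB-domain fit treats all levels evenly.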
1. A first outgoing signal where the narrowband input signal is from an Ear Canal Microphone signal in an earphone (the “near end” signal), and the output signal from the spectral expansion system is directed to a “far-end” loudspeaker via a voice telecommunications system.
2. A second incoming signal, where a second spectral expansion system processes a received voice signal from a far-end system, e.g. a received voice signal from a cell phone. Here, the output of the spectral expansion system is directed to the loudspeaker in an earphone of the near-end party.
In one embodiment where the media device 50 operates in a landline environment, the transceiver 52 can utilize common wire-line access technology to support POTS or VoIP services. In a wireless communications setting, the transceiver 52 can utilize common technologies to support singly or in combination any number of wireless access technologies including without limitation Bluetooth™, Wireless Fidelity (WiFi), Worldwide Interoperability for Microwave Access (WiMAX), Ultra Wide Band (UWB), software defined radio (SDR), and cellular access technologies such as CDMA-1X, W-CDMA/HSDPA, GSM/GPRS, EDGE, TDMA/EDGE, and EVDO. SDR can be utilized for accessing a public or private communication spectrum according to any number of communication protocols that can be dynamically downloaded over-the-air to the communication device. It should be noted also that next generation wireless access technologies can be applied to the present disclosure.
The power supply 62 can utilize common power management technologies such as power from USB, replaceable batteries, supply regulation technologies, and charging system technologies for supplying energy to the components of the communication device and to facilitate portable applications. In stationary applications, the power supply 62 can be modified so as to extract energy from a common wall outlet and thereby supply DC power to the components of the communication device 50.
The location unit 58 can utilize common technology such as a GPS (Global Positioning System) receiver that can intercept satellite signals and therefrom determine a location fix of the portable device 50.
The controller processor 60 can utilize computing technologies such as a microprocessor and/or digital signal processor (DSP) with associated storage memory such as Flash, ROM, RAM, SRAM, DRAM or other like technologies for controlling operations of the aforementioned components of the communication device.
It should be noted that the methods 200 in
a. Smart watches.
b. Smart “eye wear” glasses.
c. Remote control units for home entertainment systems.
d. Mobile Phones.
e. Hearing Aids.
f. Steering wheels.
Such embodiments of the inventive subject matter may be referred to herein, individually and/or collectively, by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any single invention or inventive concept if more than one is in fact disclosed. Thus, although specific embodiments have been illustrated and described herein, it should be appreciated that any arrangement calculated to achieve the same purpose may be substituted for the specific embodiments shown.
Where applicable, the present embodiments of the invention can be realized in hardware, software or a combination of hardware and software. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suitable. A typical combination of hardware and software can be a mobile communications device or portable device with a computer program that, when loaded and executed, can control the mobile communications device such that it carries out the methods described herein. Portions of the present method and system may also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein and which, when loaded in a computer system, is able to carry out these methods.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all modifications, equivalent structures and functions of the relevant exemplary embodiments. Thus, the description of the invention is merely exemplary in nature and, thus, variations that do not depart from the gist of the invention are intended to be within the scope of the exemplary embodiments of the present invention. Such variations are not to be regarded as a departure from the spirit and scope of the present invention.
For example, the spectral enhancement algorithms described herein can be integrated in one or more components of devices or systems described in the following U.S. Patent Applications, all of which are incorporated by reference in their entirety: U.S. patent application Ser. No. 11/774,965, entitled Personal Audio Assistant, filed Jul. 9, 2007, claiming priority to provisional application 60/806,769 filed on Jul. 8, 2006; U.S. patent application Ser. No. 11/942,370, filed Nov. 19, 2007, entitled Method and Device for Personalized Hearing; U.S. patent application Ser. No. 12/102,555, filed Jul. 8, 2008, entitled Method and Device for Voice Operated Control; U.S. patent application Ser. No. 14/036,198, filed Sep. 25, 2013, entitled Personalized Voice Control; U.S. patent application Ser. No. 12/165,022, filed Jan. 8, 2009, entitled Method and Device for Background Mitigation; U.S. patent application Ser. No. 12/555,570, filed Jun. 13, 2013, entitled Method and System for Sound Monitoring Over a Network; and U.S. patent application Ser. No. 12/560,074, filed Sep. 15, 2009, entitled Sound Library and Method.
This disclosure is intended to cover any and all adaptations or variations of various embodiments. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description.
These are but a few examples of embodiments and modifications that can be applied to the present disclosure without departing from the scope of the claims stated below. Accordingly, the reader is directed to the claims section for a fuller understanding of the breadth and scope of the present disclosure.
This Application claims the priority benefit of Provisional Application No. 61/920,321 filed on Dec. 23, 2013, the entire disclosure of which is incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
5978759 | Tsushima et al. | Nov 1999 | A |
6289311 | Omori et al. | Sep 2001 | B1 |
6681202 | Miet | Jan 2004 | B1 |
6683965 | Sapiejewski | Jan 2004 | B1 |
6829360 | Iwata et al. | Dec 2004 | B1 |
6895375 | Malah et al. | May 2005 | B2 |
7181402 | Jax et al. | Feb 2007 | B2 |
7233969 | Rawlins et al. | Jun 2007 | B2 |
7397867 | Moore et al. | Jul 2008 | B2 |
7433910 | Rawlins et al. | Oct 2008 | B2 |
7454453 | Rawlins et al. | Nov 2008 | B2 |
7546237 | Nongpiur et al. | Jun 2009 | B2 |
7599840 | Mehrotra et al. | Oct 2009 | B2 |
7693709 | Thumpudi et al. | Apr 2010 | B2 |
7727029 | Bolin et al. | Jun 2010 | B2 |
7792680 | Iser et al. | Sep 2010 | B2 |
7831434 | Mehrotra et al. | Nov 2010 | B2 |
7953604 | Mehrotra et al. | May 2011 | B2 |
7991815 | Rawlins et al. | Aug 2011 | B2 |
8090120 | Seefeldt | Jan 2012 | B2 |
8162697 | Menolotto et al. | Apr 2012 | B1 |
8190425 | Mehrotra et al. | May 2012 | B2 |
8199933 | Seefeldt | Jun 2012 | B2 |
8200499 | Nongpiur et al. | Jun 2012 | B2 |
8206181 | Steijner et al. | Jun 2012 | B2 |
8332210 | Nilsson et al. | Dec 2012 | B2 |
8358617 | El-Maleh et al. | Jan 2013 | B2 |
8386243 | Nilsson et al. | Feb 2013 | B2 |
8437482 | Seefeldt et al. | May 2013 | B2 |
8554569 | Chen et al. | Oct 2013 | B2 |
8639502 | Boucheron et al. | Jan 2014 | B1 |
8731923 | Shu | May 2014 | B2 |
8771021 | Edeler | Jul 2014 | B2 |
8831267 | Annacone | Sep 2014 | B2 |
20020116196 | Tran | Aug 2002 | A1 |
20030093279 | Malah | May 2003 | A1 |
20040076305 | Santiago | Apr 2004 | A1 |
20040138876 | Kallio et al. | Jul 2004 | A1 |
20050004803 | Smeets et al. | Jan 2005 | A1 |
20050049863 | Gong | Mar 2005 | A1 |
20060190245 | Iser | Aug 2006 | A1 |
20070055519 | Seltzer et al. | Mar 2007 | A1 |
20070078649 | Hetherington et al. | Apr 2007 | A1 |
20070237342 | Agranat | Oct 2007 | A1 |
20080031475 | Goldstein | Feb 2008 | A1 |
20080037801 | Alves | Feb 2008 | A1 |
20080208575 | Laaksonen | Aug 2008 | A1 |
20080219456 | Goldstein | Sep 2008 | A1 |
20080300866 | Mukhtar | Dec 2008 | A1 |
20090048846 | Smaragdis et al. | Feb 2009 | A1 |
20090129619 | Nordahn | May 2009 | A1 |
20090296952 | Pantfoerder | Dec 2009 | A1 |
20100074451 | Usher et al. | Mar 2010 | A1 |
20100158269 | Zhang | Jun 2010 | A1 |
20100246831 | Mahabub et al. | Sep 2010 | A1 |
20110005828 | Ye et al. | Jan 2011 | A1 |
20110019838 | Kaulberg et al. | Jan 2011 | A1 |
20110112845 | Jasiuk et al. | May 2011 | A1 |
20110188669 | Lu | Aug 2011 | A1 |
20110282655 | Endo | Nov 2011 | A1 |
20120046946 | Shu | Feb 2012 | A1 |
20120121220 | Krummrich | May 2012 | A1 |
20120128165 | Visser et al. | May 2012 | A1 |
20120215519 | Park et al. | Aug 2012 | A1 |
20120321097 | Braho | Dec 2012 | A1 |
20130013300 | Otani | Jan 2013 | A1 |
20130024191 | Krutsch | Jan 2013 | A1 |
20130039512 | Miyata et al. | Feb 2013 | A1 |
20130052873 | Riezebos et al. | Feb 2013 | A1 |
20130108064 | Kocalar et al. | May 2013 | A1 |
20130195283 | Larson et al. | Aug 2013 | A1 |
20130210286 | Golko | Aug 2013 | A1 |
20130244485 | Lam et al. | Sep 2013 | A1 |
20130322653 | Tsai et al. | Dec 2013 | A1 |
20140072156 | Kwon | Mar 2014 | A1 |
20140321673 | Seo et al. | Oct 2014 | A1 |
20150117663 | Hsu et al. | Apr 2015 | A1 |
20150156584 | Chen et al. | Jun 2015 | A1 |
20150358719 | Mackay et al. | Dec 2015 | A1 |
Number | Date | Country | |
---|---|---|---|
20150179178 A1 | Jun 2015 | US |
Number | Date | Country | |
---|---|---|---|
61920321 | Dec 2013 | US |