DVE system with normalized selection

Information

  • Patent Grant
  • 6549629
  • Patent Number
    6,549,629
  • Date Filed
    Wednesday, February 21, 2001
  • Date Issued
    Tuesday, April 15, 2003
Abstract
In a DVE, digital voice enhancement, communication system, the selection decision for choosing which microphone to be active is based on a given function of the speech of a respective talker relative to his/her acoustic environment at the respective microphone. The selection decision is based on a selection technique normalizing at least one of a) different microphone sensitivities and b) different background noise levels at the respective microphones, preferably based on the ratio of how much louder a talker speaks over the background noise at his/her respective microphone.
Description




BACKGROUND AND SUMMARY OF THE INVENTION




The invention relates to digital voice enhancement, DVE, communication systems, and more particularly to enhanced selection techniques between microphones.




The invention may be used in duplex systems, for example as shown in U.S. Pat. No. 5,033,082, and U.S. application Ser. No. 08/927,874, filed Sep. 11, 1997, simplex systems, for example as shown in U.S. application Ser. No. 09/050,511, filed Mar. 30, 1998, all incorporated herein by reference, and in other DVE communication systems.




The invention of the '874 application relates to acoustic echo cancellation systems, including active acoustic attenuation systems and communication systems. The invention of the '874 application arose during continuing development efforts relating to the subject matter of U.S. Pat. No. 5,033,082, incorporated herein by reference.




In one aspect of the invention of the '874 application, a fully coupled active echo cancellation matrix is provided, cancelling echo due to acoustic transmission between zones, in addition to cancellation of echoes due to electrical transmission between zones as in incorporated U.S. Pat. No. 5,033,082. In the latter patent, a communication system is provided including a first acoustic zone, a second acoustic zone, a first microphone at the first zone, a first loudspeaker at the first zone, a second microphone at the second zone and having an output supplied to the first loudspeaker such that a first person at the first zone can hear the speech of a second person at the second zone as transmitted by the second microphone and the first loudspeaker, a second loudspeaker at the second zone and having an input supplied from the first microphone such that the second person at the second zone can hear the speech of the first person at the first zone as transmitted by the first microphone and the second loudspeaker, a first model cancelling the speech of the second person in the output of the first microphone otherwise present due to electrical transmission from the second microphone to the first loudspeaker and broadcast by the first loudspeaker to the first microphone, the cancellation of the speech of the second person in the output of the first microphone preventing rebroadcast thereof by the second loudspeaker, and a second model cancelling the speech of the first person in the output of the second microphone otherwise present due to electrical transmission from the first microphone to the second loudspeaker and broadcast by the second loudspeaker to the second microphone, the cancellation of the speech of the first person in the output of the second microphone preventing rebroadcast thereof by the first loudspeaker. 
In the invention of the '874 application, there is provided a third model cancelling the speech of the first person in the output of the first microphone otherwise present due to acoustic transmission from the second loudspeaker in the second zone to the first microphone in the first zone, and a fourth model cancelling the speech of the second person in the output of the second microphone otherwise due to acoustic transmission from the first loudspeaker in the first zone to the second microphone in the second zone. The invention of the '874 application has desirable application in those implementations where there is acoustic coupling between the first and second zones, for example in a vehicle such as a minivan, where the first zone is the front seat and the second zone is a rear seat, and it is desired to provide an intercom communication system, and cancel echoes not only due to local acoustic transmission in a zone but also global acoustic transmission between zones, including in combination with active acoustic attenuation.




In another aspect of the invention of the '874 application, there is provided a switch having open and closed states, and conducting the output of a microphone therethrough in the closed state, a voice activity detector having an input from the output of the microphone at a node between the microphone and the switch, an occupant sensor sensing the presence of a person at the acoustic zone, and a logical AND function having a first input from the voice activity detector, a second input from the occupant sensor, and an output to the switch to actuate the latter between open and closed states. This feature is desirable in automotive applications when there are no additional passengers for a driver to communicate with.
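The gating logic described above can be sketched as follows. This is only an illustrative sketch: the class name `MicGate`, the threshold value, and the mean-absolute-level voice activity test are assumptions made for the example and are not taken from the '874 application.

```python
from dataclasses import dataclass


@dataclass
class MicGate:
    """Hypothetical gate: the switch closes only when both inputs are true."""

    vad_threshold: float = 0.1  # assumed level above which speech is declared

    def voice_active(self, frame):
        # Crude stand-in for a voice activity detector: mean absolute level
        # of the microphone output, tapped ahead of the switch.
        level = sum(abs(s) for s in frame) / len(frame)
        return level > self.vad_threshold

    def switch_closed(self, frame, occupant_present):
        # Logical AND of the voice activity detector and the occupant sensor.
        return self.voice_active(frame) and occupant_present

    def output(self, frame, occupant_present):
        # A closed switch conducts the microphone output; an open switch passes silence.
        if self.switch_closed(frame, occupant_present):
            return list(frame)
        return [0.0] * len(frame)
```

With no occupant sensed, the microphone output is blocked even when speech-like levels are present, which matches the case of a driver with no passengers to communicate with.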




In another aspect of the invention of the '874 application, an input to a model is supplied through a variable training signal circuit providing increasing training signal levels with increasing speech signal levels or increased interior ambient noise levels associated with higher vehicle speeds. This is desirable for on-line training noise to be imperceptible by the occupant yet have a sufficient signal to noise ratio for accurate model convergence.




In another aspect of the invention of the '874 application, a noise responsive high pass filter is provided between a microphone and a remote yet acoustically coupled loudspeaker, and having a filter cutoff effective at elevated noise levels and reducing bandwidth and making more gain available, to improve intelligibility of speech of a person in the zone of the microphone transmitted to the remote loudspeaker. In vehicle applications, the high pass filter is vehicle speed sensitive, such that at higher vehicle speeds and resulting higher noise levels, lower frequency speech content is blocked and higher frequency speech content is passed, the lower frequency speech content being otherwise masked at higher speeds by broadband vehicle and wind noise, so that the reduced bandwidth and the absence of the lower frequency speech content does not sacrifice the perceived quality of speech, and such that at lower vehicle speeds and resulting lower noise levels, the cutoff frequency of the filter is lowered such that lower frequency speech content is passed, in addition to higher frequency speech content, to provide enriched low frequency performance, and overcome objections to a tinny sounding system.




In another aspect of the invention of the '874 application, there is provided a feedback detector having an input from a microphone, and an output controlling an adjustable notch filter filtering the output of the microphone supplied to a remote yet acoustically coupled loudspeaker. This overcomes prior objections in closed loop communication systems which can become unstable whenever the total loop gain exceeds unity. Careful setting of system gain and acoustic echo cancellation may be used to ensure system stability. For various reasons, such as high gain requirements, acoustic feedback may occur, which is often at the system resonance or where the free response is relatively undamped. These resonances usually have a very high Q factor and can be represented by a narrow band in the frequency domain. Thus, the total system gain ceiling is determined by a small portion of the communication system bandwidth, in essence limiting performance across all frequencies in the band for one or more narrow regions. The present invention overcomes this objection.




In another aspect of the invention of the '874 application, an acoustic feedback tonal canceler is provided, removing tonal noise from the output of the microphone to prevent broadcast thereof by a remote but acoustically coupled loudspeaker.




The invention of the '511 application arose during development efforts directed toward reducing complexities of full duplex voice communication systems, i.e. bidirectional voice transmission where talkers exchange information simultaneously. In a full duplex system, acoustic echo cancellation is needed to overcome feedback generated by closed loop communication channel instabilities. Use of a simplex scheme that alternately selects one or another microphone or channel as active is another way to effectively control feedback into a near end microphone from a near end loudspeaker. In a simplex system, voice transmission is unidirectional, i.e. either one way or the other way at any given time, but not in both directions at the same time.




A simplex digital voice enhancement communication system does not rely on acoustic echo cancellation to ensure stable communication loop gains for closely coupled microphones and loudspeakers. However, there is a potential for feedback into a near end microphone from a far end loudspeaker. This situation exists because it would be self-defeating to have the active microphone switched off. The invention of the '511 application addresses and solves this problem in a particularly simple and effective manner with a combination of readily available known components.




The present invention relates to enhanced selection techniques in a digital voice enhancement communication system for selecting which of a plurality of microphones to connect to a loudspeaker. The switch in the DVE system must decide which microphone from an array of microphones to select as the active one. In the past, this decision was made by comparing the average magnitude of all microphone signals in which speech was detected (voice plus noise signals). The accuracy of this method was dependent on the sensitivity of each microphone and the background (noise) signal levels at each microphone. For example, a first talker might have a more sensitive microphone than a second talker and would therefore have a higher chance of being selected as the active talker. As another example, a third talker might be in a noisier location and therefore have a higher chance of being selected. The noted prior art method was not immune to different microphone sensitivities and different background noise levels. The present invention addresses and solves this problem.
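The difference between the prior magnitude comparison and a normalized comparison can be illustrated with a short sketch. The function names and the numeric levels below are hypothetical; the sketch assumes each microphone supplies an average magnitude measured while speech is detected and an average magnitude of background noise alone, and it uses the ratio of the two as the normalized selection value, in line with the ratio described in the abstract.

```python
def select_by_magnitude(speech_levels):
    """Prior approach: pick the microphone with the largest average magnitude."""
    return max(range(len(speech_levels)), key=lambda i: speech_levels[i])


def select_normalized(speech_levels, noise_levels, eps=1e-12):
    """Normalized approach: pick the largest ratio of speech level to background level."""
    ratios = [s / max(n, eps) for s, n in zip(speech_levels, noise_levels)]
    return max(range(len(ratios)), key=lambda i: ratios[i])


# Hypothetical levels: microphone 0 is more sensitive and sits in a noisier location.
speech = [0.8, 0.5]  # average magnitude while speech is detected (voice plus noise)
noise = [0.4, 0.1]   # average magnitude of background noise alone

print(select_by_magnitude(speech))       # 0: the louder raw signal wins
print(select_normalized(speech, noise))  # 1: this talker is further above the local noise
```

Because a microphone's sensitivity scales its speech and noise levels by the same factor, the ratio is unaffected by sensitivity differences, and dividing by the local noise level removes the advantage of a noisier location.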











BRIEF DESCRIPTION OF THE DRAWINGS





FIGS. 1-8 are taken from the noted '874 application.

FIG. 1 shows an active acoustic attenuation and communication system in accordance with the invention of the '874 application.

FIG. 2 shows an intercom communication system in accordance with the invention of the '874 application.

FIG. 3 shows a portion of a communication system in accordance with the invention of the '874 application.

FIG. 4 shows a communication system in accordance with the invention of the '874 application.

FIG. 5 shows a communication system in accordance with the invention of the '874 application.

FIG. 6 shows a communication system in accordance with the invention of the '874 application.

FIG. 7 shows a communication system in accordance with the invention of the '874 application.

FIG. 8 shows a communication system in accordance with the invention of the '874 application.

FIG. 9 is taken from the noted '511 application.

FIG. 9 schematically illustrates a digital voice enhancement communication system in accordance with the invention of the '511 application.

FIG. 10 shows a DVE, digital voice enhancement, communication system in accordance with the present invention.











DETAILED DESCRIPTION OF THE INVENTION





FIG. 1 is similar to the drawing of incorporated U.S. Pat. No. 5,033,082, and uses like reference numerals where appropriate to facilitate understanding. FIG. 1 shows an active acoustic attenuation system 10 having a first zone 12 subject to noise from a noise source 14, and a second zone 16 spaced from zone 12 and subject to noise from a noise source 18. Microphone 20 senses noise from noise source 14. Microphone 22 senses noise from noise source 18. Zone 12 includes a talking location 24 therein such that a person 26 at location 24 is subject to noise from noise source 14. Zone 16 includes a talking location 28 therein such that a person 30 at location 28 is subject to noise from noise source 18. Loudspeaker 32 introduces sound into zone 12 at location 24. Loudspeaker 34 introduces sound into zone 16 at location 28. An error microphone 36 senses noise and speech at location 24. Error microphone 38 senses noise and speech at location 28.




An adaptive filter model 40 adaptively models the acoustic path from noise microphone 20 to talking location 24. Model 40 is preferably that disclosed in U.S. Pat. No. 4,677,676, incorporated herein by reference. Adaptive filter model 40 has a model input 42 from noise microphone 20, an error input 44 from error microphone 36, and outputs at output 46 a correction signal to loudspeaker 32 to introduce cancelling sound at location 24 to cancel noise from noise source 14 at location 24, all as in incorporated U.S. Pat. No. 4,677,676.




An adaptive filter model 48 adaptively models the acoustic path from noise microphone 22 to talking location 28. Model 48 has a model input 50 from noise microphone 22, an error input 52 from error microphone 38, and outputs at output 54 a correction signal to loudspeaker 34 to introduce cancelling sound at location 28 to cancel noise from noise source 18 at location 28.




An adaptive filter model 56 adaptively cancels noise from noise source 14 in the output 58 of error microphone 36. Model 56 has a model input 60 from noise microphone 20, an output correction signal at output 62 subtractively summed at summer 64 with the output 58 of error microphone 36 to provide a sum 66, and an error input 68 from sum 66.
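The arrangement of model 56, summer 64 and sum 66 can be sketched in a few lines. The sketch below substitutes a finite impulse response filter adapted by normalized least mean squares (NLMS) for the recursive least mean square filter preferred in the text, purely to keep the example short. The function name, tap count, step size and the three-coefficient acoustic path are assumptions made for illustration.

```python
import random


def nlms_cancel(reference, primary, taps=4, mu=0.5, eps=1e-8):
    """Cancel the part of `primary` that is correlated with `reference`.

    reference: samples from the noise microphone (model input)
    primary:   samples from the error microphone (speech plus noise)
    Returns the summer output (error signal) and the final filter weights.
    """
    w = [0.0] * taps
    buf = [0.0] * taps
    out = []
    for x, d in zip(reference, primary):
        buf = [x] + buf[:-1]                         # most recent reference samples
        y = sum(wi * bi for wi, bi in zip(w, buf))   # correction signal (model output)
        e = d - y                                    # subtractive summing (the sum)
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]  # error input drives adaptation
        out.append(e)
    return out, w


# Hypothetical demonstration: noise reaches the error microphone through an
# assumed three-coefficient path, and no speech is present.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
path = [0.6, -0.3, 0.1]
picked_up = [
    sum(path[k] * noise[n - k] for k in range(len(path)) if n - k >= 0)
    for n in range(len(noise))
]
residual, weights = nlms_cancel(noise, picked_up)
```

In this noise-only demonstration the weights settle close to the assumed path, so the summer output falls toward zero. With speech present, the speech is uncorrelated with the reference and remains in the summer output.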




An adaptive filter model 70 adaptively cancels noise from noise source 18 in the output 72 of error microphone 38. Model 70 has a model input 74 from noise microphone 22, an output correction signal at output 76 subtractively summed at summer 78 with the output 72 of error microphone 38 to provide a sum 80, and an error input 82 from sum 80.




An adaptive filter model 84 adaptively cancels speech from person 30 in the output 58 of error microphone 36. Model 84 has a model input 86 from error microphone 38, an output correction signal at output 88 subtractively summed at summer 90 with sum 66 to provide a sum 92, and an error input 94 from sum 92. Sum 92 is additively summed at summer 96 with the output 54 of model 48 to provide a sum 98 which is supplied to loudspeaker 34. Sum 92 is thus supplied to loudspeaker 34 such that person 30 can hear the speech of person 26.




An adaptive filter model 100 adaptively cancels speech from person 26 in the output 72 of error microphone 38. Model 100 has a model input 102 from error microphone 36 at sum 92, an output correction signal at output 104 subtractively summed at summer 106 with sum 80 to provide a sum 108, and an error input 110 from sum 108. Sum 108 is additively summed at summer 112 with the output 46 of model 40 to provide a sum 114 which is supplied to loudspeaker 32. Hence, sum 108 is supplied to loudspeaker 32 such that person 26 can hear the speech of person 30. Model input 86 is provided by sum 108, and model input 102 is provided by sum 92.




Sum 98 supplied to loudspeaker 34 is substantially free of noise from noise source 14 as acoustically and electrically cancelled by adaptive filter models 40 and 56, respectively. Sum 98 is substantially free of speech from person 30 as electrically cancelled by adaptive filter model 84. Hence, sum 98 to loudspeaker 34 is substantially free of noise from noise source 14 and speech from person 30 but does contain speech from person 26, such that loudspeaker 34 cancels noise from noise source 18 at location 28 and introduces substantially no noise from noise source 14 and introduces substantially no speech from person 30 and does introduce speech from person 26, such that person 30 can hear person 26 substantially free of noise from noise sources 14 and 18 and substantially free of his own speech.




Sum 114 supplied to loudspeaker 32 is substantially free of noise from noise source 18 as acoustically and electrically cancelled by adaptive filter models 48 and 70, respectively. Sum 114 is substantially free of speech from person 26 as electrically cancelled by adaptive filter model 100. Sum 114 to loudspeaker 32 is thus substantially free of noise from noise source 18 but does contain speech from person 30, such that loudspeaker 32 cancels noise from noise source 14 at location 24 and introduces substantially no noise from noise source 18 and introduces substantially no speech from person 26 and does introduce speech from person 30, such that person 26 can hear person 30 substantially free of noise from noise sources 14 and 18 and substantially free of his own speech.




Each of the adaptive filter models is preferably that shown in above incorporated U.S. Pat. No. 4,677,676. Each model adaptively models its respective forward path from its respective input to its respective output on-line without dedicated off-line pretraining. Each of models 40 and 48 also adaptively models its respective feedback path from its respective loudspeaker to its respective microphone for both broadband and narrowband noise without dedicated off-line pretraining and without a separate model dedicated solely to the feedback path and pretrained thereto. Each of models 40 and 48, as in above noted incorporated U.S. Pat. No. 4,677,676, adaptively models the feedback path from the respective loudspeaker to the respective microphone as part of the adaptive filter model itself without a separate model dedicated solely to the feedback path and pretrained thereto. Each of models 40 and 48 has a transfer function comprising both zeros and poles to model the forward path and the feedback path, respectively. Each of models 56 and 70 has a transfer function comprising both poles and zeros to adaptively model the pole-zero acoustical transfer function between its respective input microphone and its respective error microphone. Each of models 84 and 100 has a transfer function comprising both poles and zeros to adaptively model the pole-zero acoustical transfer function between its respective output loudspeaker and its respective error microphone. The adaptive filter for all models is preferably accomplished by the use of a recursive least mean square filter, as described in incorporated U.S. Pat. No. 4,677,676. It is also preferred that each of the models 40 and 48 be provided with an auxiliary noise source, such as 140 in incorporated U.S. Pat. No. 4,677,676, introducing auxiliary noise into the respective adaptive filter model which is random and uncorrelated with the noise from the respective noise source to be cancelled.




In one embodiment, noise microphones 20 and 22 are placed at the end of a probe tube in order to avoid placing the microphones directly in a severe environment such as a region of high temperature or high electromagnetic field strength. Alternatively, the signals produced by noise microphones 20 and 22 are obtained from a vibration sensor placed on the respective noise source or obtained from an electrical signal directly associated with the respective noise source, for example a tachometer signal on a machine or a computer generated drive signal on a device such as a magnetic resonance scanner.




In one embodiment, a single noise source 14 and model 40 are provided, with cancellation via loudspeaker 32 and communication from person 26 via microphone 36. In another embodiment, only models 40 and 56 are provided. In another embodiment, only models 40, 56 and 84 are provided.




It is thus seen that communication system 10 includes a first acoustic zone 12, a second acoustic zone 16, a first microphone 36 at the first zone, a first loudspeaker 32 at the first zone, a second microphone 38 at the second zone and having an output supplied to first loudspeaker 32 such that a first person 26 at first zone 12 can hear the speech of a second person 30 at second zone 16 as transmitted by second microphone 38 and first loudspeaker 32, and a second loudspeaker 34 at second zone 16 and having an input supplied from first microphone 36 such that the second person 30 at the second zone 16 can hear the speech of the first person 26 at the first zone 12 as transmitted by first microphone 36 and second loudspeaker 34. Each of the zones is subject to noise. First person 26 at first talking location 24 in first zone 12 and second person 30 at second talking location 28 in second zone 16 are each subject to noise. Loudspeaker 32 introduces sound into first zone 12 at first talking location 24. Loudspeaker 34 introduces sound into second zone 16 at second talking location 28. Error microphone 36 senses noise and speech at location 24. Model 40 has a model input from a reference signal correlated to the noise as provided by input microphone 20 sensing noise from noise source 14. Model 40 has an error input 44 from microphone 36. Model 40 has a model output 46 outputting a correction signal to loudspeaker 32 to introduce cancelling sound at location 24 to attenuate noise thereat. Error microphone 38 senses noise and speech at location 28. Model 48 has a model input 50 from a reference signal correlated with the noise as provided by input microphone 22 sensing the noise from noise source 18. Model 48 has an error input 52 from microphone 38. Model 48 has a model output outputting a correction signal to loudspeaker 34 to introduce cancelling sound at location 28 to attenuate noise thereat. Model 56 has a model input 60 from microphone 20, a model output 62 outputting a correction signal summed at summer 64 with the output 58 of microphone 36 to electrically cancel noise from first zone 12 in the output of microphone 36, and an error input 68 from the output 66 of summer 64. Model 70 has a model input 74 from microphone 22, a model output 76 outputting a correction signal summed at summer 78 with the output 72 of microphone 38 to cancel noise from zone 16 in the output of microphone 38, and an error input 82 from the output 80 of summer 78. Model 84 cancels the speech of second person 30 in the output of microphone 36 otherwise present due to electrical transmission from microphone 38 to loudspeaker 32 and broadcast by loudspeaker 32 to microphone 36, the cancellation of the speech of person 30 in the output of microphone 36 preventing rebroadcast thereof by loudspeaker 34. Model 100 cancels the speech of person 26 in the output of microphone 38 otherwise present due to electrical transmission from microphone 36 to loudspeaker 34 and broadcast by loudspeaker 34 to microphone 38, the cancellation of the speech of person 26 in the output of microphone 38 preventing rebroadcast thereof by loudspeaker 32.




The system above described is shown in incorporated U.S. Pat. No. 5,033,082.




In the system of the '874 application, additional models 120 and 122 are provided. Model 120 cancels the speech of person 26 in the output of microphone 36 otherwise present due to acoustic transmission from loudspeaker 34 in zone 16 to microphone 36 in zone 12. This is desirable in implementations where there is no acoustic isolation or barrier between zones 12 and 16, for example as in a vehicle such as a minivan where zone 12 may be the front seat and zone 16 a back seat, i.e. where there is acoustic coupling of the zones and acoustic transmission therebetween such that sound broadcast by loudspeaker 34 is not only electrically transmitted via microphone 38 and loudspeaker 32 to zone 12, but is also acoustically transmitted from loudspeaker 34 to zone 12. Model 122 cancels the speech of person 30 in the output of microphone 38 otherwise due to acoustic transmission from loudspeaker 32 in zone 12 to microphone 38 in zone 16.




Model 84 models the path from loudspeaker 32 to microphone 36. Model 100 models the path from loudspeaker 34 to microphone 38. Model 120 models the path from loudspeaker 34 to microphone 36. Model 122 models the path from loudspeaker 32 to microphone 38. Model 84 has a model input 86 from the input to loudspeaker 32 supplied from the output of microphone 38, and a model output 88 to the output of microphone 36 supplied to the input of loudspeaker 34. Model 100 has a model input 102 from the input to loudspeaker 34 supplied from the output of microphone 36, and a model output 104 to the output of microphone 38 supplied to the input of loudspeaker 32. Model 120 has a model input 124 from the input to loudspeaker 34 supplied from the output of microphone 36, and a model output 126 to the output of microphone 36 supplied to the input of loudspeaker 34. Model 122 has a model input 128 from the input to loudspeaker 32 supplied from the output of microphone 38, and a model output 130 to the output of microphone 38 supplied to the input of loudspeaker 32. An auxiliary noise source 132, like auxiliary noise source 140 in incorporated U.S. Pat. No. 4,677,676, introduces auxiliary noise through summer 134 into model inputs 102 and 124 of models 100 and 120, respectively, which auxiliary noise is random and uncorrelated with the noise from the respective noise source to be canceled. In one embodiment, the auxiliary noise source 132 is provided by a Galois sequence, M. R. Schroeder, Number Theory In Science And Communications, Berlin: Springer-Verlag, 1984, pages 252-261, though other random uncorrelated noise sources may of course be used. The Galois sequence is a pseudo random sequence that repeats after 2^M - 1 points, where M is the number of stages in a shift register. The Galois sequence is preferred because it is easy to calculate and can easily have a period much longer than the response time of the system. An auxiliary random noise source 136 introduces auxiliary noise through summer 138 into model inputs 86 and 128 of models 84 and 122, respectively, which auxiliary noise is random and uncorrelated with the noise from the respective noise source to be canceled. It is preferred that auxiliary noise source 136 be provided by a Galois sequence, as above described. Each of auxiliary noise sources 132 and 136 is random and uncorrelated relative to each other and relative to noise from noise source 14, speech from person 26, noise from noise source 18, and speech from person 30. Model 120 is trained to converge to and model the path from loudspeaker 34 to microphone 36 by the auxiliary noise from source 132. Model 100 is trained to converge to and model the path from loudspeaker 34 to microphone 38 by the auxiliary noise from source 132. Model 84 is trained to converge to and model the path from loudspeaker 32 to microphone 36 by the auxiliary noise from source 136. Model 122 is trained to converge to and model the path from loudspeaker 32 to microphone 38 by the auxiliary noise from source 136.
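A pseudo random sequence with the period stated above can be generated with a linear feedback shift register. The sketch below uses the Galois configuration of such a register as one common way to do this; it is an assumption that this matches the construction in the cited Schroeder text. The function names and the two feedback masks (a 4-stage and a 16-stage register with primitive feedback polynomials) are chosen for illustration only.

```python
from itertools import islice


def lfsr_step(state, mask):
    """Advance a Galois-configuration shift register by one point."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= mask  # apply the feedback taps when a 1 is shifted out
    return state, lsb


def training_noise(mask, seed=1):
    """Yield an endless +1/-1 training signal from the register output bit."""
    state = seed
    while True:
        state, bit = lfsr_step(state, mask)
        yield 1.0 if bit else -1.0


def period(mask, seed=1):
    """Count the points until the register returns to its starting state."""
    state, count = seed, 0
    while True:
        state, _ = lfsr_step(state, mask)
        count += 1
        if state == seed:
            return count


print(period(0xC))     # 15, i.e. 2^4 - 1 for a 4-stage register
print(period(0xB400))  # 65535, i.e. 2^16 - 1 for a 16-stage register
first_points = list(islice(training_noise(0xB400), 8))
```

Each added stage roughly doubles the period, which is how the sequence can readily be made much longer than the response time of the system. Two sources that must be uncorrelated with each other, such as 132 and 136, would need different sequences; how that is arranged is not specified here.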





FIG. 2 shows a system similar to FIG. 1, and uses like reference numerals where appropriate to facilitate understanding. The system of FIG. 2 is used in a vehicle 140, such as a minivan. Loudspeaker 32 provides enhanced voice from zone 2, i.e. with noise and echo cancellation as above described. Loudspeaker 32 also provides audio for zone 1 and cellular phone for zone 1 at 12 such as the front seat. Also supplied at zone 1 are voice in zone 1 from person 26 such as the driver and/or front seat passenger. Also supplied at zone 1 due to acoustic coupling from zone 2 are the echo of enhanced voice 1 broadcast by speaker 34, with noise and echo cancellation as above described, and audio from zone 2 and cellular phone from zone 2. The signal content in the output of microphone 36 as shown at 59 includes: voice 1; enhanced voice 1 echo; enhanced voice 2; audio 1; audio 2; cell phone 1; cell phone 2. Loudspeaker 34 broadcasts enhanced voice 1, audio for zone 2 and cellular phone for zone 2 at 16 such as a rear seat of the vehicle. Also supplied at zone 2 are voice in zone 2 from person 30, such as one or more rear seat passengers, enhanced voice 2 echo which is the voice from zone 2 as broadcast by speaker 32 in zone 1 due to acoustic coupling therebetween, as well as audio from zone 1 and cell phone from zone 1 as broadcast by speaker 32. The signal content in the output 72 of microphone 38 as shown at 73 includes: voice 2; enhanced voice 2 echo; enhanced voice 1; audio 1; audio 2; cell phone 1; cell phone 2. Summer 90 sums the output 58 of microphone 36, the output 88 of model 84, and the output 126 of model 120, and supplies the resultant sum at 92 to summer 134, error correlator multiplier 142 of model 84, and error correlator multiplier 144 of model 120. Summer 134 sums the output 92 of summer 90, the training signal from auxiliary random noise source 132, and the audio 2 and cell phone 2 signals for zone 2, and supplies the resultant sum to loudspeaker 34, model input 124 of model 120, and model input 102 of model 100. Summer 106 sums the output 72 of microphone 38, model output 104 of model 100, and model output 130 of model 122, and supplies the resultant sum at 108 to summer 138, error correlator multiplier 146 of model 100, and error correlator multiplier 148 of model 122. Summer 138 sums the output 108 of summer 106, the training signal from auxiliary random noise source 136, and the audio 1 and cell phone 1 signals for zone 1, and supplies the resultant sum to loudspeaker 32, model input 86 of model 84, and model input 128 of model 122. The training signal from auxiliary random noise source 132 is supplied to summer 134 and to error correlator multipliers 146 and 144 of models 100 and 120, respectively. The training signal from auxiliary random noise source 136 is supplied to summer 138 and to error correlator multipliers 142 and 148 of models 84 and 122, respectively.




In digital voice enhancement, DVE, systems, acoustic echo cancelers, AEC, are used to minimize acoustic reflection and echo, prevent acoustic feedback, and remove additional unwanted signals. Acoustic echo cancelers are most often only applied between the immediate zone loudspeaker and microphone, e.g. model


84


modeling the path from loudspeaker


32


to microphone


36


. However, in certain applications where the propagation losses or physical damping between communication zones such as


12


and


16


is not sufficient, e.g. a vehicle interior such as a minivan, the acoustic path between these zones may allow significant coupling and cause added system echo, acoustic feedback and signal corruption.




The system applies acoustic echo cancelers between all microphones and loudspeakers in the digital voice enhancement system as shown in FIG.


2


. This allows signal contributions from the following sources to be removed from the microphone signal so that it includes only the voice signal from the near end talker: the far end voice broadcast from the near end loudspeaker; the near end audio broadcast from the near end loudspeaker; the near end voice broadcast from the far end loudspeaker; the far end audio broadcast from the far end loudspeaker; cellular phone broadcast from near end and far end loudspeakers. By removing these components, the closed loop full duplex communication system is more stable with desired system gains that were not previously possible. In addition, the resulting signal has less extraneous noise which allows enhanced precision in speech processing activities.




Acoustic echo cancellation may require on-line estimation of the acoustic echo path. In vehicle implementations, it is desirable to detect when occupant movement occurs, so that the acoustic echo cancellation models can be updated as quickly as possible. In a desirable feature enabled by the present invention, the available supplemental restraint occupant sensor or a seat belt use detector may be monitored. If the sensor indicates a change in occupant location or seat belt use, an occupant movement is assumed, and rapid adaptation occurs to correct the acoustic echo cancellation models and ensure optimal performance of the system.




Further in vehicle implementations, the proper placement of a communication microphone is difficult due to varying sizes of occupants and seat track locations. Less ideal microphone locations result in lower signal to noise ratios, higher required system gain, and lower performance. In a desirable aspect, the system enables utilization of supplemental restraint occupant sensors or seat track location sensors, potentially available in future supplemental restraint occupant position detection systems. From such sensors, certain weight, height, fore/aft location information, etc., may be available. The system enables use of such information to select the most appropriate microphone, e.g. from a bank of microphones, and/or gain selection to ensure system performance. For example, certain weight or height information would signal a short occupant. From this information, the general seat track position may be presumed or obtained from a seat track location sensor, and a best suited microphone selected. Also, from height information, the distance from the occupant to the selected microphone might be estimated, and an appropriate gain applied to account for extra distance from the selected microphone. The system enables utilization of such signals to increase system robustness by selecting appropriate transducers and parameters. This provides microphone selection and/or gain selection by occupant sensor input.




Multidimensional digital voice enhancement systems can be reconfigured during operation to match occupant requirements. Many activities are processor intensive and compromise system robustness when compared with smaller dimensioned systems. In a desirable aspect, the system enables utilization of vehicle occupant sensor or seat belt use detector information to determine if an occupant is present in a particular digital voice enhancement zone. If an occupant is not detected, certain functions associated with that zone may be eliminated from the computational activities. Processor ability may be reassigned to other zones to do more elaborate signal processing. The system can thus reconfigure its dimensionality to perform optimally under the requirements placed on it. This provides digital voice enhancement zone hibernation based on occupant sensors.




In digital voice enhancement systems, acoustic echo cancelers are used to minimize echo, stabilize closed loop communication channels, and prevent acoustic feedback, as above noted. The acoustic echo cancelers model the acoustic path between each loudspeaker and each microphone associated with the system. This full coupling of all the loudspeakers and microphones may be computationally expensive and objectionable in certain applications. In a desirable aspect, the system allows acoustic echo cancelers to be applied to loudspeaker-microphone acoustic paths when limited processor capabilities exist. Transfer functions are taken between each loudspeaker-microphone combination. The gain over the communication system bandwidth is compared between transfer functions. Those transfer functions exhibiting a higher gain trend over the frequency band indicate greater acoustic coupling between the particular loudspeaker and microphone. The system designer may use a gain trend ranking to apply acoustic echo cancelers first to those paths with the greater acoustic coupling. This allows the system designer to prioritize applying acoustic echo cancelers to the loudspeaker-microphone paths which most need assistance to ensure stable communication. Paths that cannot be serviced with acoustic echo cancelers would rely on the physical damping and propagation losses of the acoustic path for echo reduction, or other less intensive electronic means for increased stability. This enables digital voice enhancement optimization using physical characteristics.
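
The gain trend ranking described above can be illustrated with a minimal sketch. It assumes each loudspeaker-microphone transfer function is available as a list of magnitude samples over the communication band, and that the processor can service a fixed number of paths. The names `mean_gain_db`, `rank_paths`, `budget`, and the sample magnitudes are hypothetical and are not taken from the specification.

```python
import math

def mean_gain_db(magnitudes):
    """Average gain in dB over the band for one loudspeaker-microphone path."""
    return sum(20.0 * math.log10(max(m, 1e-12)) for m in magnitudes) / len(magnitudes)

def rank_paths(transfer_functions, budget):
    """Return the `budget` paths with the highest average gain, strongest coupling first."""
    ranked = sorted(
        transfer_functions,
        key=lambda path: mean_gain_db(transfer_functions[path]),
        reverse=True,
    )
    return ranked[:budget]

# Hypothetical magnitude responses for the four paths of a two-zone system.
paths = {
    ("loudspeaker 32", "microphone 36"): [0.50, 0.40, 0.45],
    ("loudspeaker 34", "microphone 36"): [0.10, 0.12, 0.08],
    ("loudspeaker 32", "microphone 38"): [0.09, 0.10, 0.11],
    ("loudspeaker 34", "microphone 38"): [0.55, 0.50, 0.60],
}

# With processor capacity for only two cancelers, the two strongest paths are chosen.
selected = rank_paths(paths, budget=2)
```

Paths left out of `selected` would rely on physical damping and propagation losses, as noted above.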




A voice activity detection algorithm is judged by how accurately it responds to a wide variety of acoustic events. One that provides a 100% hit rate on desired voice signals and a 0% falsing rate on unwanted noises is considered ideal. Use of an occupant sensing device as one of the inputs to the voice activity detection algorithm can provide certainty, within limits of the occupant sensing device, that no falsing will occur when a location is not occupied. This feature would be especially relevant to automotive applications when there are no additional passengers for a driver to communicate with. Smart airbags and other passive safety devices may soon be required to know attributes such as the size, shape, and presence of passengers in vehicles for proper deployment. The minimum desired information to be known at the time of deployment would be to know if there is a passenger to be protected. No passenger, or possibly more important, a small passenger or child seat would require disarming of the passive restraint system. This sensing information would be useful as a compounding condition in digital voice enhancement systems to also deactivate a voice sensing microphone when no occupant is present. This provides voice activity detection with occupant sensing devices.





FIG. 3

shows a switch


150


having open and closed states, and conducting the output of microphone


38


therethrough in the closed state. A voice activity detector


152


has an input from the output of microphone


38


at a node


154


between microphone and switch


150


. An occupant sensor


156


senses the presence of a person at acoustic zone


16


, for example a rear passenger seat. A logic AND function provided by AND gate


158


has a first input


160


from voice activity detector


152


, a second input


162


from occupant sensor


156


, and an output


164


to switch


150


to actuate the latter between the open and closed states, to control whether the latter passes a zone transmit out signal or not.
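
The AND logic of FIG. 3 can be sketched as follows. The voice activity detector is approximated here by a simple average magnitude threshold, which is an assumption made for illustration; the threshold value and the function names are hypothetical.

```python
def average_magnitude(frame):
    """Short-time average magnitude of one frame of microphone samples."""
    return sum(abs(s) for s in frame) / len(frame)

def voice_active(frame, threshold):
    """Stand-in for the voice activity detector: True when the frame is loud enough."""
    return average_magnitude(frame) > threshold

def gate_closed(frame, occupant_present, threshold=0.05):
    """AND of voice activity and occupant presence; True means the switch conducts."""
    return voice_active(frame, threshold) and occupant_present

def zone_transmit(frame, occupant_present, threshold=0.05):
    """Pass the microphone frame when the switch is closed, otherwise send silence."""
    if gate_closed(frame, occupant_present, threshold):
        return list(frame)
    return [0.0] * len(frame)
```

With an empty seat the output is silent regardless of the sound level at the microphone, which is the falsing protection described above.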




It is desirable for on-line training noise to be imperceptible by the occupant, yet have sufficient signal to noise ratio for accurate model convergence. In a desirable aspect, the present system may be used to exploit microphone gate activity to increase the allowable training signal and acoustic echo cancellation convergence. This allows the acoustic echo cancellation models to be more aggressively and accurately adapted. When the microphone gate is opened, some level of speech will be present. When speech is transmitted, a higher level training signal may be added to the speech signal and still be imperceptible to the occupant. This can be accomplished by a gate controlled training signal gain, FIG.


4


. The present invention enables utilization of pre-existing system features to increase overall robustness in an unobtrusive fashion. This provides acoustic echo cancellation training noise level based on microphone gate activity.




In

FIG. 4

, the input to model


84


is supplied through a variable training signal circuit


170


providing increased training signal level with increasing speech signal levels from microphone


38


. Training signal circuit


170


includes a summer


172


having an input


174


from microphone


38


, an input


176


from a training signal, and an output


178


to loudspeaker


32


and to model


84


. A variable gain element


180


supplies the training signal from training signal source


182


to input


176


of summer


172


. A voice activity detector gate


184


senses the speech signal level from microphone


38


at a node


186


between microphone


38


and input


174


of summer


172


, and controls the gain of variable gain element


180


. As noted above, it is desired that the training signal levels be maintained below a level perceptible to a person at zone


12


.




Further in

FIG. 4

, the input to model


100


is supplied through variable training signal circuit


188


providing increasing training signal levels with increasing speech signal levels from microphone


36


. Training signal circuit


188


includes a summer


190


having an input


192


from microphone


36


, an input


194


from a training signal, and an output


196


to loudspeaker


34


and to model


100


. Variable gain element


198


supplies the training signal from training signal source


200


to input


194


of summer


190


. Voice activity detector gate


202


senses the speech signal level from microphone


36


at node


204


between microphone


36


and input


192


of summer


190


, and controls the gain of variable gain element


198


. It is preferred that the training signal level be maintained below a level perceptible to a person at zone


16


.
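
A gate controlled training signal gain of the kind shown in FIG. 4 might be sketched as below. The specification does not give a gain law, so the linear rule, the floor, the ceiling, and the masking ratio are all assumptions chosen for illustration.

```python
import random

def training_gain(speech_level, floor_gain=0.001, masking_ratio=0.05, max_gain=0.02):
    """Training noise gain that rises with speech level between a floor and a ceiling.

    All three constants are hypothetical; in practice they would be tuned so the
    noise stays imperceptible to the occupant.
    """
    return min(max_gain, max(floor_gain, masking_ratio * speech_level))

def add_training_signal(frame, rng):
    """Role of the summer: microphone speech plus gain-scaled random training noise."""
    level = sum(abs(s) for s in frame) / len(frame)
    gain = training_gain(level)
    return [s + gain * rng.uniform(-1.0, 1.0) for s in frame]

rng = random.Random(0)
quiet = add_training_signal([0.0] * 4, rng)            # noise at the floor gain only
loud = add_training_signal([0.3, -0.3, 0.3, -0.3], rng)  # higher noise level under speech
```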




It is desirable to detect when occupant movement or luggage loading changes occur. In one implementation of the system, the vehicle door ajar or courtesy light signal may be monitored. If any door is opened, all on-line modeling is halted. This prohibits the models from adapting to both changes in the acoustic boundary characteristics due to open doors, and also to changes in loudspeaker location when mounted to the moving door. After the doors are determined to be shut, and a system settling time has passed, it can be assumed that an occupant movement or luggage loading change is likely to have occurred. Accordingly, adaptation can occur to correct the acoustic echo cancellation models and ensure optimal performance of the system. Alternatively, an echo return loss enhancement measurement can be made on each model to calculate the echo reduction offered by each acoustic echo cancellation and to determine if they are adequate. If it is determined that they are deficient, an aggressive adaptation could then correct the acoustic echo cancellation models. Again, the system enables the utilization of available signals to ensure system stability and robustness not only by not adapting while the physical system is in a nonfunctional condition but also by modeling when the system is returned to a functional condition to account for possible occupant or luggage movements.
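
The echo return loss enhancement check and the door-based adaptation control can be sketched as follows. ERLE is computed in the usual way as the ratio of microphone power to residual power after cancellation; the 20 dB adequacy threshold and the mode names are hypothetical.

```python
import math

def power(samples):
    """Mean square value of a block of samples."""
    return sum(s * s for s in samples) / len(samples)

def erle_db(mic, residual):
    """Echo return loss enhancement: echo power before versus after cancellation, in dB."""
    return 10.0 * math.log10(power(mic) / max(power(residual), 1e-20))

def adaptation_mode(door_open, settled, erle, min_erle_db=20.0):
    """Choose how the echo cancellation models should adapt.

    - halt while any door is open or the settling time has not yet elapsed
    - aggressive when the measured ERLE is deficient
    - normal otherwise
    """
    if door_open or not settled:
        return "halt"
    if erle < min_erle_db:
        return "aggressive"
    return "normal"
```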




Digital voice enhancement systems may pick up and rebroadcast engine related noise in vehicle applications or other applications involving periodic or tonal noise. This becomes particularly annoying when one of the communication zones has much lower engine related noise than others. In this situation, the rebroadcast noise is not masked by the primary engine related noise. In a desirable aspect of the system, the engine or engine related tach signal may be conditioned with DC blocking and magnitude clipping to meet proper A/D limitations. A rising edge or zero crossing detector monitors the input signal and calculates a scalar frequency value. An average magnitude detector also monitors the input signal to shut down the frequency detection routine if the average magnitude drops below a specified level. This is a noise rejection scheme for signals with varying amplitude depending on engine speed, revolutions per minute, RPM. The calculated frequency is then converted to the engine related frequencies of interest which are summed and input to an electronic noise control, ENC, filter reference, to be described. The output of the filter is then subtracted from the microphone signal to remove the engine related component from the signal.
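
The tach conditioning and frequency detection steps can be sketched as below. DC blocking is approximated by removing the block mean, and the frequency is taken from the average spacing of rising zero crossings; both are simplifying assumptions, and the clip level, magnitude floor, and multiples are hypothetical values.

```python
import math

def condition(samples, clip=1.0):
    """DC blocking (block mean removal, an assumption) followed by magnitude clipping."""
    mean = sum(samples) / len(samples)
    return [max(-clip, min(clip, s - mean)) for s in samples]

def tach_frequency(samples, sample_rate, min_avg_magnitude=0.05):
    """Scalar frequency from rising zero crossings, or None if the signal is too weak."""
    x = condition(samples)
    if sum(abs(s) for s in x) / len(x) < min_avg_magnitude:
        return None  # average magnitude detector shuts down frequency detection
    edges = [i for i in range(1, len(x)) if x[i - 1] < 0.0 <= x[i]]
    if len(edges) < 2:
        return None
    period = (edges[-1] - edges[0]) / (len(edges) - 1)  # average samples per cycle
    return sample_rate / period

def engine_frequencies(tach_hz, multiples=(1, 2, 4)):
    """Convert the tach frequency to the engine related frequencies of interest."""
    return [m * tach_hz for m in multiples]

fs = 8000
tach = [0.5 + math.sin(2.0 * math.pi * 50.0 * n / fs + 0.3) for n in range(fs)]
detected = tach_frequency(tach, fs)
```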




In

FIG. 5

, a tonal noise remover


210


senses periodic noise and removes same from the output of microphone


36


to prevent broadcast thereof by loudspeaker


34


. Tonal noise remover


210


includes a summer


212


having an input


214


from microphone


36


, an input


216


from a tone generator


218


generating one or more tones in response to periodic noise and supplying same through adaptive filter model


220


, and an output


222


to loudspeaker


34


through summer


90


. Tone generator


218


receives a plurality of tach signals


224


,


226


, and outputs a plurality of tone signals to summer


228


for each of the tach signals, for example a tone signal


1


N


1


which is the same frequency as tach signal


1


, a tone signal


2


N


1


which is twice the frequency of tach signal


1


, a tone signal


4


N


1


which is four times the frequency of tach signal


1


, a tone signal


1


N


2


which is the same frequency as tach signal


2


, a tone signal


2


N


2


which is twice the frequency of tach signal


2


, etc. Model


220


has a model input


230


from summer


228


, a model output


232


outputting a correction signal to summer input


216


, and an error input


234


from summer output


222


.




Further in

FIG. 5

, a second tonal noise remover


240


senses periodic noise and removes same from the output of microphone


38


to prevent broadcast thereof by loudspeaker


32


. Tonal noise remover


240


includes summer


242


having an input


254


from microphone


38


, an input


246


from a tone generator


248


generating one or more tones in response to periodic noise and supplying same through adaptive filter model


260


, and an output


262


to loudspeaker


32


through summer


106


. Tone generator


258


receives a plurality of tach signals such as


264


and


266


, and outputs a plurality of tone signals to summer


268


, one for each of the tach signals, as above described for tone generator


218


and tach signals


224


and


226


. Model


260


has a model input


270


from summer


268


, a model output


272


outputting a correction signal to summer input


246


, and an error input


274


from summer output


262


. In the noted vehicle implementation, tach


1


signals


224


and


264


are the same, and tach 2 signals


226


and


266


are the same.
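
The tonal noise remover of FIG. 5 can be illustrated with a minimal least mean squares (LMS) sketch. The specification does not name the adaptation algorithm, so LMS is an assumption, as are the tap count, the step size, and the 800 Hz sample rate, which was chosen so that the example converges quickly. The summed tones play the role of the model input, the filter output is the correction signal, and the summer output serves as both the transmitted signal and the error input.

```python
import math

def tone_reference(tach_hz, sample_rate, n, multiples=(1, 2, 4)):
    """Sum of unit tones at multiples of the tach frequency (model input)."""
    return [
        sum(math.sin(2.0 * math.pi * m * tach_hz * i / sample_rate) for m in multiples)
        for i in range(n)
    ]

def lms_cancel(mic, reference, taps=16, mu=0.01):
    """Subtract an adaptively filtered tone reference from the microphone signal."""
    weights = [0.0] * taps
    delay_line = [0.0] * taps
    residual = []
    for d, x in zip(mic, reference):
        delay_line = [x] + delay_line[:-1]
        correction = sum(w * v for w, v in zip(weights, delay_line))  # model output
        e = d - correction                                            # summer output
        weights = [w + mu * e * v for w, v in zip(weights, delay_line)]
        residual.append(e)
    return residual

fs, n = 800, 2000
reference = tone_reference(50.0, fs, n)
# Hypothetical engine noise at the microphone: first and second multiples, arbitrary phases.
engine = [
    0.3 * math.sin(2.0 * math.pi * 50.0 * i / fs + 1.0)
    + 0.2 * math.sin(2.0 * math.pi * 100.0 * i / fs + 0.5)
    for i in range(n)
]
residual = lms_cancel(engine, reference)
```

In this noise-only example the residual decays toward zero; with speech present, the speech is uncorrelated with the tones and would remain in the output.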




In vehicle implementations, background ambient noise increases with vehicle speed, and as a result more gain is needed in a communication system to sustain adequate speech intelligibility. In a desirable aspect, the system enables application of a noise responsive, including vehicle speed sensitive, high pass filter to the microphone signal. The filter cutoff would increase with elevated noise levels, such as elevated vehicle speeds, and therefore reduce the system bandwidth. By limiting system bandwidth, more gain is available, resulting in improved speech intelligibility. At higher speeds, the lower frequency speech content is masked by broadband vehicle and wind noise, so that the reduced bandwidth does not sacrifice the perceived quality of speech. At low speeds, the high pass filter lowers its cutoff frequency, to provide enriched low frequency performance, thus overcoming objections to a tinny sounding digital voice enhancement system. This provides noise responsive, including speed dependent, band limiting for a communication system.
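
A speed dependent high pass filter might be sketched as follows. The linear mapping from speed to cutoff, the 100 Hz and 500 Hz limits, and the first-order filter structure are assumptions for illustration; the specification does not state the filter order or the cutoff values.

```python
import math

def cutoff_hz(speed_kph, low_hz=100.0, high_hz=500.0, max_speed_kph=120.0):
    """Cutoff frequency that rises linearly with vehicle speed (hypothetical mapping)."""
    fraction = min(max(speed_kph / max_speed_kph, 0.0), 1.0)
    return low_hz + fraction * (high_hz - low_hz)

def high_pass(samples, cutoff, sample_rate):
    """First-order high pass filter (discrete RC form)."""
    rc = 1.0 / (2.0 * math.pi * cutoff)
    alpha = rc / (rc + 1.0 / sample_rate)
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        prev_y = alpha * (prev_y + x - prev_x)
        prev_x = x
        out.append(prev_y)
    return out

def rms(samples):
    """Root mean square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

fs = 8000
low_tone = [math.sin(2.0 * math.pi * 150.0 * n / fs) for n in range(fs)]
rms_parked = rms(high_pass(low_tone, cutoff_hz(0.0), fs))     # low cutoff, tone passes
rms_highway = rms(high_pass(low_tone, cutoff_hz(120.0), fs))  # high cutoff, tone reduced
```

The 150 Hz test tone stands in for low frequency speech content that would be masked by road and wind noise at highway speed.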




The adaptation of the acoustic echo cancellation models with random noise may be accomplished by injecting the training noise before or after the noise responsive or speed sensitive filter, FIG.


6


. Injection before such filter provides a system wherein the training noise is speed varying filtered. This approach is advantageous in obtaining the highest training signal allowed while being imperceptible to the occupant. However, the acoustic echo cancellation filters would have potentially unconstrained frequency components. Injection after the speed sensitive filter provides a system wherein the training noise would always be full bandwidth. This has the potential of being more robust, yet has the limitation of lower training noise levels allowed to be imperceptible to the occupant. In a desirable aspect, the system utilizes the natural trade-offs between bandwidth and gain, and results in a more robust communication system.




In

FIG. 6

, a noise responsive high pass filter


290


between microphone


36


and loudspeaker


34


has a filter cutoff effective at elevated noise levels and reducing bandwidth and making more gain available, to improve intelligibility of speech of person


26


transmitted from microphone


36


to loudspeaker


34


. In the noted vehicle application, high pass filter


290


is vehicle speed sensitive, such that at higher vehicle speeds and resulting higher noise levels, lower frequency speech content is blocked, and higher frequency speech content is passed, the lower frequency speech content being otherwise masked at higher speeds by broadband vehicle and wind noise, so that the reduced bandwidth and the absence of the lower frequency speech content does not sacrifice the perceived quality of speech, and such that at lower vehicle speeds and resulting lower noise levels, the cutoff frequency of the filter is lowered such that lower frequency speech content is passed, in addition to higher frequency speech content, to provide enriched low frequency performance, and overcome objections to a tinny sounding system. In one embodiment, a summer


292


has a first input


294


from microphone


36


, a second input


296


from a training signal supplied by training signal source


298


, and an output


300


to high pass filter


290


, such that the training signal is variably filtered according to noise level, namely vehicle speed in vehicle implementations. In an alternate embodiment, training signal source


298


is deleted, and a summer


302


is provided having an input


304


from high pass filter


290


, an input


306


from a training signal supplied by training signal source


308


, and an output


310


to loudspeaker


34


. In this embodiment, the training signal is full bandwidth and not variably filtered according to noise level or vehicle speed.




Further in

FIG. 6

, a noise responsive high pass filter


312


between microphone


38


and loudspeaker


32


has a filter cutoff effective at elevated noise levels and reducing bandwidth and making more gain available, to improve intelligibility of speech of person


30


transmitted from microphone


38


to loudspeaker


32


. In the noted vehicle application, high pass filter


312


is vehicle speed sensitive, such that at higher vehicle speeds and resulting high noise levels, lower frequency speech content is blocked and higher frequency speech content is passed, the lower frequency speech content being otherwise masked at higher speeds by broadband vehicle and wind noise, so that the reduced bandwidth and the absence of the lower frequency speech content does not sacrifice the perceived quality of speech, and such that at lower vehicle speeds and resulting lower noise levels, the cutoff frequency of the filter is lowered such that lower frequency speech content is passed, in addition to higher frequency speech content, to provide enriched low frequency performance, and overcome objections to a tinny sounding system. In one embodiment, a summer


314


has a first input


316


from microphone


38


, a second input


318


from a training signal supplied by training signal source


320


, and an output


322


to high pass filter


312


, such that the training signal is variably filtered according to noise level, namely vehicle speed in vehicle implementations. In an alternate embodiment, training signal source


320


is deleted, and a summer


324


is provided having an input


326


from high pass filter


312


, an input


328


from a training signal supplied by training signal source


330


, and an output


332


to loudspeaker


32


. In this embodiment, the training signal is full bandwidth and not variably filtered according to noise level or vehicle speed.




Optimal voice pickup in a digital voice enhancement system can be characterized by having the largest talking zone and the highest signal to noise ratio. The larger the talking zone, the less sensitive the digital voice enhancement system will be to the talker's physical size, seating position, and head position/movement. Large talking zones are associated with good system performance and ergonomics. High signal to noise ratios are associated with speech intelligibility and good sound quality. These two design goals are not always complementary. Large talking zones may be accomplished by having multiple microphones to span the talking zone; however, this may have a negative impact on the signal to noise ratio. It is desired that the available set of microphones be scanned to determine the best candidate for maximum speech reception. This may be based on short term averages of power or magnitude. An average magnitude estimation and subsequent comparison from two microphones is one implementation in a digital voice enhancement system.
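
Scanning a set of microphones by short term average magnitude can be sketched as below. The frame length and the microphone labels are hypothetical.

```python
def average_magnitude(frame):
    """Short term average magnitude of one frame of samples."""
    return sum(abs(s) for s in frame) / len(frame)

def select_microphone(frames):
    """Return the label of the microphone whose current frame has the largest average magnitude.

    `frames` maps a microphone label to its most recent frame of samples.
    """
    return max(frames, key=lambda mic: average_magnitude(frames[mic]))

frames = {
    "mic_a": [0.02, -0.03, 0.01, -0.02],
    "mic_b": [0.20, -0.25, 0.22, -0.18],
}
best = select_microphone(frames)
```

This raw comparison does not account for differing microphone sensitivities or background noise levels at each microphone.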




As above noted, closed loop communication systems can become unstable whenever the total loop gain exceeds unity. Careful setting of the system gain, and acoustic echo cancellation may be used to ensure system stability. For various reasons such as high gain requirements, or less than ideal acoustic echo cancellation performance, acoustic feedback can occur. Acoustic feedback often occurs at a system resonance or where the free response is relatively undamped. These resonances usually occur at a very high Q, quality factor, and can be represented by a narrow band in the frequency domain. Therefore, the total system gain ceiling is determined by only a small portion of the communication system bandwidth, in essence limiting performance across all frequencies in the band for one or more narrow regions. In a desirable aspect, the system enables observation, measurement and treatment of persistent high Q system dynamics. These dynamics may relate to acoustic instabilities to be minimized. The observation of acoustic feedback can be performed in the frequency domain. The nature and sound of acoustic feedback is commonly observed in a screeching or howling burst of energy. The sound quality of this type of instability is beyond reverberation, echoes, or ringing, and is observable in the frequency domain by monitoring the power spectrum. Measurement of such a disturbance can be accomplished with a feedback detector, where the exact frequency and magnitude of the feedback can be quantified. Time domain based schemes such as auto correlation could alternatively be applied to obtain similar measurements. Observation and measurement steps could be performed as a background task reducing real time digital signal processing requirements. Treatment follows by converting this feedback frequency information into notch filter coefficients that are implemented by a filter applied to the communication channel. 
The magnitude of the reduction, or depth of the notch filter's null, can be progressively applied or set to maximum attenuation as desired. Once the filter has been applied, the observed acoustic feedback should vanish; however, hysteresis should be applied in the measurement process so as not to encourage cycling of the feedback reduction. Long term statistics of the feedback treatment process can be utilized for determining if the notch filter could be removed from the communication channel. Additionally, multiple notch filters may be connected in series to eliminate more complicated acoustic feedback situations often encountered in three dimensional sound fields.
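
The observation, measurement, and treatment steps can be illustrated with a minimal sketch. Detection here looks for a spectral peak that dominates the average bin power, and treatment uses a standard second-order notch with zeros on the unit circle. The peak-to-mean threshold, the pole radius `r`, and the function names are assumptions. The sketch always applies a full-depth null and omits the hysteresis and progressive depth control described above.

```python
import cmath
import math

def power_spectrum(frame):
    """Power in each DFT bin from 0 up to, but not including, the Nyquist bin."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2)
    ]

def detect_feedback(frame, sample_rate, peak_to_mean=20.0):
    """Return the frequency of a dominant narrowband peak, or None if there is none."""
    spectrum = power_spectrum(frame)
    k = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    mean = sum(spectrum[1:]) / (len(spectrum) - 1)
    if spectrum[k] < peak_to_mean * mean:
        return None
    return k * sample_rate / len(frame)

def notch_coefficients(freq_hz, sample_rate, r=0.95):
    """Second-order notch: zeros on the unit circle, poles at radius r."""
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    b = [1.0, -2.0 * math.cos(w0), 1.0]
    a = [1.0, -2.0 * r * math.cos(w0), r * r]
    return b, a

def biquad(samples, b, a):
    """Direct form I second-order filter."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

fs = 8000
howl = [math.sin(2.0 * math.pi * 1000.0 * t / fs) for t in range(256)]
frequency = detect_feedback(howl, fs)
b, a = notch_coefficients(frequency, fs)
treated = biquad(howl, b, a)
```

Several such notch sections could be cascaded by passing the output of one `biquad` call into the next.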




In

FIG. 7

, feedback detector


350


has an input


352


from microphone


36


, and an output


354


controlling an adjustable notch filter


356


filtering the output of microphone


36


supplied to loudspeaker


34


. Adjustable notch filter


356


has an input


358


from the output of microphone


36


. Feedback detector


350


has an input


352


from microphone


36


at a node


360


between the output of microphone


36


and the input


358


of adjustable notch filter


356


. Summer


90


has an input from the output of model


84


, an input from the output of model


120


, and an input from the output of adjustable notch filter


356


, and an output supplied to loudspeaker


34


. A second feedback detector


370


has an input


372


from microphone


38


, and an output


374


controlling a second adjustable notch filter


376


filtering the output of microphone


38


supplied to loudspeaker


32


. Adjustable notch filter


376


has an input


378


from microphone


38


at a node


380


between the output of microphone


38


and the input


378


of adjustable notch filter


376


. Summer


106


has an input from the output of model


100


, an input from the output of model


122


, and an input from the output of adjustable notch filter


376


. Summer


106


has an output supplied to loudspeaker


32


.




In a further aspect, a sine wave or multiple sine waves can be generated from the detected feedback frequency and serve as the reference to the electronic noise control filter. The ENC filter will form notches at the exact frequencies, and adjust its attenuation until the offending feedback tones are minimized to the level of the noise floor. The ENC filter is similar to a classical adaptive interference canceler application as discussed in


Adaptive Signal Processing


, Widrow and Stearns, Prentice-Hall, Inc., Englewood Cliffs, N.J. 07632, 1985, pages 316-323. The output of the filter is then subtracted from the microphone signal to remove the feedback component from the signal. The feedback suppression is performed before the acoustic echo cancellation.




In

FIG. 8

, an acoustic feedback tonal canceler


390


removes tonal feedback noise from the output of microphone


36


to prevent broadcast thereof by loudspeaker


34


. Feedback tonal canceler


390


includes a summer


392


having an input


394


from microphone


36


, an input


396


from feedback detector


398


and tone generator


400


supplied through adaptive filter model


402


, and an output


404


to loudspeaker


34


through summer


90


. Model


402


has a model input


406


from tone generator


400


, a model output


408


supplying a correction signal to summer input


396


, and an error input


410


from summer output


404


. A second feedback tonal canceler


420


is comparable to feedback tonal canceler


390


. Feedback tonal canceler


420


includes a summer


422


having an input


424


from microphone


38


, an input


426


from feedback detector


428


and tone generator


430


supplied through adaptive filter model


432


, and an output


434


supplied to loudspeaker


32


through summer


106


. Model


432


has a model input


436


from tone generator


430


, a model output


438


supplying a correction signal to summer input


426


, and an error input


440


from summer output


434


.




It is desirable for communication systems to be usable as soon as possible after activation. However, this cannot take place until the acoustic echo cancellation models have converged to an accurate solution so that the system may be used with appropriate gain. In a desirable aspect of the system, the acoustic echo cancellation models may be stored in memory and used immediately upon system start up. These models may need some minor correction to account for changes in occupant position, luggage loading, and temperature. These model corrections may be accomplished with quicker adaptation from the stored models rather than starting from null vectors, for example in accordance with U.S. Pat. No. 5,022,082, incorporated herein by reference.





FIG. 9 shows a simplex digital voice enhancement communication system 502 in accordance with the noted '511 application, including a first acoustic zone 504, a second acoustic zone 506, a first microphone 508 in the first zone, a first loudspeaker 510 in the first zone, a second microphone 512 in the second zone, and a second loudspeaker 514 in the second zone. A voice sensitive gated switch 516 has a first mode with switch element 516a closed and supplying the output of microphone 508 over a first channel 518 to loudspeaker 514. Switch 516 has a second mode with switch element 516b closed and supplying the output of microphone 512 over a second channel 520 to loudspeaker 510. The noted first and second modes are mutually exclusive such that only one of the channels 518 and 520 can be active at a time. In the first mode, switch element 516a is closed and switch element 516b is open such that the switch blocks, or at least substantially reduces, transmission from microphone 512 to loudspeaker 510. In the second mode, switch element 516b is closed and switch element 516a is open to block or substantially reduce transmission from microphone 508 to loudspeaker 514. Voice activity detectors or gates 522 and 524 have respective inputs from microphones 508 and 512, for controlling operation of switch 516. When switch 516 is in its first mode, with switch element 516a closed and switch element 516b open, the speech of person 526 in zone 504 can be heard by person 528 in zone 506 as broadcast by speaker 514 receiving the output of microphone 508. The speech of person 528 and the output of speaker 514 as picked up by microphone 512 are not transmitted to speaker 510 because switch element 516b is open. Thus, there is no echo transmission of the voice of person 526 back through microphone 512 and speaker 510, and hence no need to cancel same. This provides the above noted simplification in circuitry and processing otherwise required for echo cancellation. The same considerations apply in the noted second mode of switch 516, with switch element 516b closed and switch element 516a open, wherein there is no rebroadcast by speaker 514 of the speech of person 528 and hence no echo and hence no need to cancel same. A suitable gate and switch combination 522, 524, 516 uses a short-time, average magnitude estimating function to detect if a voice signal is present in the respective channel. Other suitable estimating functions are disclosed in Digital Processing of Speech Signals, Lawrence R. Rabiner, Ronald W. Schafer, 1978, Bell Laboratories, Inc., Prentice-Hall, pp. 120-126, and also as noted in U.S. Pat. No. 5,706,344, incorporated herein by reference.
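The mutually exclusive gating described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the fixed detection threshold, and the rule of holding the current mode when both or neither gate detects voice are hypothetical and are not specified in the text.

```python
def short_time_avg_magnitude(frame):
    """Average magnitude over one short frame of samples."""
    return sum(abs(x) for x in frame) / len(frame)

def select_mode(frame_508, frame_512, current_mode=1, threshold=0.05):
    """Return 1 (element 516a closed) or 2 (element 516b closed).

    threshold is an assumed voice-detection level; the hold rule for
    ambiguous frames is likewise an assumption of this sketch.
    """
    voice_508 = short_time_avg_magnitude(frame_508) > threshold
    voice_512 = short_time_avg_magnitude(frame_512) > threshold
    if voice_508 and not voice_512:
        return 1
    if voice_512 and not voice_508:
        return 2
    return current_mode  # both or neither active: keep the current mode

def route(frame_508, frame_512, mode):
    """Return (signal to loudspeaker 514, signal to loudspeaker 510).

    Only one channel carries signal; the other is blocked (zeros).
    """
    silence_514 = [0.0] * len(frame_508)
    silence_510 = [0.0] * len(frame_512)
    if mode == 1:
        return list(frame_508), silence_510   # channel 518 active
    return silence_514, list(frame_512)       # channel 520 active
```

Because only one channel is ever routed, the loudspeaker output picked up by the inactive microphone is never rebroadcast, which is why no echo canceller is needed in this arrangement.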




A first noise sensitive bandpass filter 530 and a first equalization filter 532 are provided in first channel 518. A second noise sensitive bandpass filter 534 and a second equalization filter 536 are provided in second channel 520. Noise sensitive bandpass filter 530 is a noise responsive highpass filter having a filter cutoff frequency effective at elevated noise levels and reducing bandwidth and making more gain available, to improve intelligibility of speech of person 526 transmitted from microphone 508 to loudspeaker 514, and as disclosed in the noted '874 application. Noise sensitive bandpass filter 534 is like filter 530 and is a noise responsive highpass filter having a filter cutoff effective at elevated noise levels and reducing bandwidth and making more gain available, to improve intelligibility or quality of speech of person 528 transmitted from microphone 512 to loudspeaker 510. Equalization filter 532 reduces resonance peaks in the acoustic transfer function between loudspeaker 514 and microphone 508 to reduce feedback by damping the resonance peaks. This is desirable because in various applications, including vehicle implementations where zone 506 is the back seat and zone 504 is the front seat, there may be acoustic coupling between speaker 514 and microphone 508. The resonance peaks may or may not be unstable, depending on total system gain. The equalization filter can take several forms including but not limited to graphic, parametric, inverse, adaptive, and as disclosed in U.S. Pat. Nos. 5,172,416, 5,396,561, 5,715,320, all incorporated herein by reference. The equalization filter may also take the form of a notch filter designed to selectively remove transfer function resonance peaks. Such a filter could be adaptive or determined offline based on the acoustic characteristics of a particular system. In one embodiment, equalization filter 532 is a set of one or more frequency selective notch filters determined from the acoustic transfer function between loudspeaker 514 in zone 506 and microphone 508 in zone 504. Equalization filter 536 is like filter 532 and reduces resonance peaks in the acoustic transfer function between loudspeaker 510 and microphone 512 to reduce feedback by damping resonance peaks.




In the above noted vehicle implementation, each of highpass filters 530 and 534 is vehicle speed sensitive, preferably by having an input from the vehicle speedometer 538. At higher vehicle speeds and resulting higher noise levels, lower frequency speech content is blocked and higher frequency speech content is passed, the lower frequency speech content being otherwise masked at higher speeds by broadband vehicle and wind noise, so that the reduced bandwidth and the absence of the lower frequency speech content do not sacrifice the perceived quality of speech. At lower vehicle speeds and resulting lower noise levels, the cutoff frequency of each of highpass filters 530 and 534 is lowered such that lower frequency speech content is passed, in addition to higher frequency speech content, to provide enriched low frequency performance, and overcome objections to a tinny sounding system. In vehicles having an in-cabin audio system, i.e. a radio and/or tape player and/or compact disc player and/or mobile phone, a digital voice enhancement activation switch 540 is provided for actuating and deactuating the voice sensitive gated switch 516, i.e. turning the latter on or off, and providing an audio mute signal muting, or reducing to some specified level, the in-cabin audio system as shown at radio mute 542.
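The speed sensitive behavior of the highpass filters can be sketched as a mapping from speedometer reading to cutoff frequency, followed by a simple highpass section. The text gives no numeric values, so the speed range, the cutoff range, the linear mapping, and the textbook first-order filter below are all assumptions made for illustration; the function names are hypothetical.

```python
import math

def highpass_cutoff_hz(speed_kmh, low_speed=0.0, high_speed=120.0,
                       low_cutoff=100.0, high_cutoff=500.0):
    """Raise the cutoff with vehicle speed (linear, clamped at both ends).

    All four range parameters are assumed values, not from the text.
    """
    if speed_kmh <= low_speed:
        return low_cutoff
    if speed_kmh >= high_speed:
        return high_cutoff
    fraction = (speed_kmh - low_speed) / (high_speed - low_speed)
    return low_cutoff + fraction * (high_cutoff - low_cutoff)

def first_order_highpass(samples, cutoff_hz, sample_rate_hz):
    """Textbook first-order RC highpass, used here only as a stand-in."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_x = samples[0] if samples else 0.0
    prev_y = 0.0
    for x in samples:
        # Output follows changes in the input and decays toward zero.
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

At low speed the lower cutoff passes more low frequency speech content; at high speed the higher cutoff removes content that broadband vehicle and wind noise would mask anyway.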




In one embodiment, equalization filter 532 is a first frequency responsive spectral transfer function, and equalization filter 536 is a second frequency responsive spectral transfer function each for example as disclosed in above noted U.S. Pat. No. 5,715,320. The first frequency responsive spectral transfer function is a function of a model of the acoustic transfer function between loudspeaker 514 and microphone 508. The second frequency responsive spectral transfer function of filter 536 is a function of a model of the acoustic transfer function between loudspeaker 510 and microphone 512. In some embodiments, these first and second acoustic transfer functions are the same, e.g. where zones 504 and 506 are small, and in some implementations these first and second acoustic transfer functions are different. In one preferred form, the first frequency responsive spectral transfer function of filter 532 is the inverse of the noted first acoustic transfer function between loudspeaker 514 and microphone 508, for example as disclosed in above noted U.S. Pat. No. 5,715,320. Likewise, the noted second frequency responsive spectral transfer function of filter 536 is the inverse of the noted second acoustic transfer function between loudspeaker 510 and microphone 512, also as in above noted U.S. Pat. No. 5,715,320.




The disclosed combination is simple and effective, and is particularly desirable because it enables use of available known components. By using a speed variable highpass filter in the communication channel, the digital voice enhancement system does not excite lower order cabin modes in vehicle implementations. The highpass filter also greatly reduces transmitted wind and road noises, which are a function of speed, improving the overall sound quality of the digital voice enhancement system. No losses in speech quality are perceived due to aural masking effects from the in-cabin noise. Secondly, the post-processing equalization filter minimizes resonance peaks in the total acoustic transfer function. This has the benefit of reducing the potential for feedback by damping resonance peaks, and also creating a more natural sounding reproduction of speech. The audio mute signal from activation switch 540 is desirable so that when the user selects the digital voice enhancement system, the in-cabin audio system, if present, is disabled, or its output significantly reduced, i.e. muted, as shown at radio mute 542. This prevents the digital voice enhancement system from detecting false information from the audio system and prevents distortions of the audio system by not allowing the digital voice enhancement system to rebroadcast the audio program.





FIG. 10 shows a DVE, digital voice enhancement, communication system in accordance with the present invention, and uses like reference numerals from above where appropriate to facilitate understanding. The system may be used in a duplex mode as in FIGS. 1-8, a simplex mode as in FIG. 9, and in other modes.





FIG. 10 shows a DVE system 550 having a plurality of microphones 508, 552, 554, 556, etc., and at least one loudspeaker 514, and other loudspeakers if desired such as 558, 560, etc. Each microphone has a respective gate 562, 564, 566, 568, etc., as above, and the microphone signals are supplied in parallel through respective SNNR ratio calculators 570, 572, 574, 576, to be described, and supplied in parallel to switch 578. As above described for gates 522, 524, a short-time average magnitude estimating function is used to detect if a voice signal is present in the respective channel, to provide a measure or function of the respective voice+noise signals 580, 582, 584, 586, etc. Other suitable estimating functions may be used as noted above and disclosed in Digital Processing of Speech Signals, Lawrence R. Rabiner, Ronald W. Schafer, 1978, Bell Laboratories, Inc., Prentice-Hall, pages 120-126, and also as noted in U.S. Pat. No. 5,706,344, incorporated herein by reference. A longer-time average magnitude sensing function is used in the absence of voice activity detection, to create a measure or function of noise signals 588, 590, 592, 594, etc.




Switch 578 selects which microphone to electrically couple to loudspeaker 514, and to any other loudspeaker if desired, so that a listener at loudspeaker 514 can hear the speech of a talker at the selected microphone. The selection decision is based on a given function of the speech of a respective talker relative to his/her acoustic environment at the respective microphone. The selection decision is based on a selection technique normalizing at least one and preferably both of a) different microphone sensitivities and b) different background noise levels at the respective microphones. This is accomplished by calculators 570, 572, 574, 576, etc. Calculator 570 determines the ratio

$$\mathrm{SNNR} = \frac{f(\text{voice} + \text{noise})}{f(\text{noise})}$$

where SNNR is the ratio of speech+noise to noise, and f is a given function thereof, preferably average magnitude, average power (magnitude²), or peak hold with a given decay rate, and outputs an SNNR signal 580. The remaining calculators likewise determine the respective ratio for the respective inputs and output SNNR signals 582, 584, 586, etc. The switching decision by switch 578 is based on the largest of the SNNR signals. Switch 578 electrically couples the loudspeaker to the respective selected microphone. The selection decision is based on the ratio of how much louder a talker speaks over the background noise at his/her respective microphone.
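The ratio and the largest-ratio selection can be sketched in a few lines. The three candidate functions follow the options named for f (average magnitude, average power, peak hold with a decay rate); the function names, the per-sample decay factor, the small floor guarding against division by zero, and the sample values are assumptions made for illustration.

```python
def average_magnitude(samples):
    return sum(abs(x) for x in samples) / len(samples)

def average_power(samples):
    return sum(x * x for x in samples) / len(samples)

def peak_hold(samples, decay=0.99):
    """Track the peak magnitude, letting it decay by a factor per sample."""
    peak = 0.0
    for x in samples:
        peak = max(abs(x), peak * decay)
    return peak

def snnr(voice_plus_noise, noise, f=average_magnitude, floor=1e-12):
    """Ratio f(voice+noise) / f(noise); floor is an assumed guard value."""
    return f(voice_plus_noise) / max(f(noise), floor)

def select_microphone(snnr_by_mic):
    """Return the key of the microphone with the largest SNNR."""
    return max(snnr_by_mic, key=snnr_by_mic.get)

# Hypothetical frames: microphone 508 is in a quiet spot, 552 in a noisy one.
ratios = {
    "508": snnr([0.05, -0.05, 0.05, -0.05], [0.01, -0.01, 0.01, -0.01]),
    "552": snnr([0.4, -0.4, 0.4, -0.4], [0.2, -0.2, 0.2, -0.2]),
}
chosen = select_microphone(ratios)
```

Here microphone 552 has the larger raw level, but microphone 508 has the larger ratio over its own background noise and is the one selected. Because numerator and denominator come from the same microphone, a fixed sensitivity difference between microphones cancels out of the ratio as well.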




As an example, if a first talker and his microphone 508 were in a library, and a second talker and his microphone 552 were in a car on a cell phone, the background noise alone in the car might be louder than the first talker's voice plus the background noise in a library, and hence microphone 552 would always be selected, even if the first talker at microphone 508 was talking. If the second talker is also talking, the addition of his voice to the background noise in the car even further increases the sound level thereat, and further reduces the chances of the first talker ever being selected. In contrast, in the present invention, with the normalizing effect of the SNNR ratio, the selection decision is based on the ratio of how much louder the talker speaks over the background noise at his/her respective microphone. The talker in the library does not have to shout as loud as the talker in the car, nor shout over the background noise in the car, to have his microphone chosen to be active because it is not the overall voice+noise power which is used for the selection decision, but rather the ratio of voice+noise to noise, i.e. SNNR as noted above. The noted time average functions for the microphones are selected such that the addition of the talker's voice to the background noise signal is quickly recognized to provide the voice+noise signal 580 as the numerator to the calculator 570, at which time the most recent noise value from the slower time averaging signal 588 is used for the denominator of the SNNR ratio. When the voice+noise signal 580 falls, the slower longer-time averaging is used to monitor noise signal 588, with the resulting SNNR ratio being approximately unity, awaiting the next voice activated fast averaging rise of signal 580.
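The interplay of the fast and slow time averages can be sketched with two exponential averages of sample magnitude. Exponential averaging, the two smoothing coefficients, and the factor used to decide that voice is present are assumptions of this sketch; the text only requires a short-time average for voice+noise and a longer-time average for noise. The class name `SnnrTracker` is hypothetical.

```python
class SnnrTracker:
    """Per-microphone SNNR from a fast and a slow magnitude average."""

    def __init__(self, fast=0.5, slow=0.01, voice_ratio=2.0,
                 initial_noise=1e-3):
        self.fast = fast                # short-time averaging coefficient
        self.slow = slow                # longer-time averaging coefficient
        self.voice_ratio = voice_ratio  # assumed voice-detection factor
        self.level = initial_noise      # voice+noise estimate (numerator)
        self.noise = initial_noise      # noise estimate (denominator)

    def update(self, sample):
        magnitude = abs(sample)
        # The fast average reacts quickly when a talker starts speaking.
        self.level += self.fast * (magnitude - self.level)
        voice_active = self.level > self.voice_ratio * self.noise
        if not voice_active:
            # The slow average monitors noise only while no voice is
            # detected, so the most recent noise value is held during speech.
            self.noise += self.slow * (magnitude - self.noise)
        return self.level / self.noise

tracker = SnnrTracker(initial_noise=0.1)
noise_only = [tracker.update(0.1) for _ in range(50)]   # ratio near unity
talking = [tracker.update(0.5) for _ in range(10)]      # ratio rises
afterwards = [tracker.update(0.1) for _ in range(20)]   # ratio returns
```

With these hypothetical magnitudes the ratio sits at unity during noise, rises toward 5 while the talker is speaking with the noise estimate held at its last value, and falls back toward unity once the speech stops.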




It is recognized that various equivalents, alternatives and modifications are possible within the scope of the appended claims.



Claims
  • 1. A digital voice enhancement communication system comprising: a plurality of microphones; at least one loudspeaker; a switch for selecting which microphone to electrically couple to said at least one loudspeaker so that a listener at said at least one loudspeaker can hear the speech of a talker at the selected microphone, the selection decision being based on a given function of the speech of a respective talker relative to his/her acoustic environment at the respective microphone, wherein said selection decision is based on the ratio $\mathrm{SNNR} = \frac{f(\text{voice} + \text{noise})}{f(\text{noise})}$ where SNNR is the ratio of speech plus noise to noise, and f is a given function thereof.
  • 2. The invention according to claim 1 wherein f is magnitude.
  • 3. The invention according to claim 2 wherein f is average magnitude.
  • 4. The invention according to claim 3 wherein f is power.
  • 5. The invention according to claim 4 wherein f is average power.
  • 6. The invention according to claim 1 wherein f is peak hold.
  • 7. The invention according to claim 6 wherein f is peak hold with a given decay rate.
  • 8. The invention according to claim 1 wherein said selection decision is based on the ratio of how much louder a talker speaks over the background noise at his/her respective microphone.
  • 9. A selection method for a digital voice enhancement communication system having a plurality of microphones, and at least one loudspeaker, comprising selecting which microphone to electrically couple to said at least one loudspeaker so that a listener at said at least one loudspeaker can hear the speech of a talker at the selected microphone, basing the selection decision on a given function of the speech of a respective talker relative to his/her acoustic environment at the respective microphone, and comprising basing the selection decision on the ratio $\mathrm{SNNR} = \frac{f(\text{voice} + \text{noise})}{f(\text{noise})}$ where SNNR is the ratio of speech plus noise to noise, and f is a given function thereof.
  • 10. The method according to claim 9 wherein f is magnitude.
  • 11. The method according to claim 10 wherein f is average magnitude.
  • 12. The method according to claim 9 wherein f is power.
  • 13. The method according to claim 12 wherein f is average power.
  • 14. The method according to claim 9 wherein f is peak hold.
  • 15. The method according to claim 14 wherein f is peak hold with a given decay rate.
  • 16. The method according to claim 9 comprising basing said selection decision on the ratio of how much louder a talker speaks over the background noise at his/her respective microphone.
US Referenced Citations (31)
Number Name Date Kind
4359602 Ponto et al. Nov 1982 A
4602337 Cox Jul 1986 A
4658425 Julstrom Apr 1987 A
4677676 Eriksson Jun 1987 A
5022082 Eriksson et al. Jun 1991 A
5033082 Eriksson et al. Jul 1991 A
5111508 Gale et al. May 1992 A
5172416 Allie et al. Dec 1992 A
5216722 Popovich Jun 1993 A
5243659 Stafford et al. Sep 1993 A
5355419 Yamamoto et al. Oct 1994 A
5386477 Popovich et al. Jan 1995 A
5396561 Popovich et al. Mar 1995 A
5533120 Staudacher Jul 1996 A
5544242 Robinson Aug 1996 A
5557682 Warner et al. Sep 1996 A
5561598 Nowak et al. Oct 1996 A
5586189 Allie et al. Dec 1996 A
5590205 Popovich Dec 1996 A
5602928 Eriksson et al. Feb 1997 A
5602929 Popovich Feb 1997 A
5621803 Laak Apr 1997 A
5627747 Melton et al. May 1997 A
5633936 Oh May 1997 A
5673327 Julstrom Sep 1997 A
5680337 Pedersen et al. Oct 1997 A
5706344 Finn Jan 1998 A
5710822 Steenhagen et al. Jan 1998 A
5715320 Allie et al. Feb 1998 A
5940486 Schlaff Aug 1999 A
6031918 Chahabadi Feb 2000 A
Foreign Referenced Citations (2)
Number Date Country
0568129 Nov 1993 EP
0721178 Jul 1996 EP
Non-Patent Literature Citations (4)
Entry
Digital Processing of Speech Signals, Lawrence R. Rabiner, Ronald W. Schafer, 1978, Bell Laboratories, Inc. Prentice-Hall, pp. 120-126.
“DFR11EQ Digital Feedback Reducer and Graphic Equalizer With Software Interface for Windows”, Model DFR11EQ User Guide, Shure Brothers Incorporated, 222 Hartrey Ave., Evanston, IL 60202-3696, 1996.
Adaptive Signal Processing, Widrow and Stearns, Prentice-Hall, Inc., Englewood Cliffs, NJ 07623, 1985, p. 316-323.
Number Theory In Science And Communication, M.R. Schroeder, Berlin: Springer-Verlag, 1984, pp. 252-261.