This invention relates to a speech distribution system. In certain situations it may be difficult to hear speech clearly, or at all, due to noise, other sounds or attenuation of the speech sound waves. For example, in a motor vehicle, road and background noise may effectively render the spoken word inaudible. This problem is compounded when the driver of a vehicle is attempting to communicate with people who are relatively far from the driver, for example in rear seats. Quite often, especially in a minibus or similar vehicle which has three or four rows of seats, it may be necessary for the driver to turn his head in order to project his voice towards the rear of the vehicle. This can have dangerous consequences, for the driver's attention is drawn from the road. On the other hand, projecting the sound forward causes undue attenuation thereof, especially in cars with good noise dampening.
Ironically, the better the sound dampening in a vehicle (to reduce engine and road noise), the greater the dampening effect on speech which is projected forward by occupants in the front seats and which is directed to passengers in the rear.
Equally, in the reverse sense, speech originating from the rear of a vehicle may be drowned out by background noise which may include sound emanating from an audio system, such as a radio/tape/CD unit, of the vehicle. Ideally, a situation should be created in which conversation can flow in a natural manner. This will enable the driver to engage pleasantly in conversation with fellow passengers while keeping a proper look out.
The invention provides a method of distributing speech which includes the steps of:
Step (b) is preferably carried out using adaptive filters, echo cancellation and other digital signal processing techniques.
The said signal may be distributed through at least one loudspeaker.
The said signal may be distributed to a plurality of loudspeakers at locations which may exclude the said given location.
The method of the invention may be implemented inside a vehicle and the locations may respectively correspond to seating positions inside the vehicle.
The loudspeaker referred to may be one of a plurality of loudspeakers which form part of an audio system inside a vehicle.
The method may include the step of varying the signal strength of the said signal which is distributed. Thus signals of different strengths, depending on prevailing conditions and requirements, may be distributed to respective locations. The signal strength may be varied per location so that, for example, in a vehicle with three rows of seats the driver can converse with a passenger who is seated in the rearmost row, directly behind the driver, while the signal level to other passengers is turned down. The signal strength of the distributed signal may be increased when background noise is severe; for example, at high vehicle speed the strength of the speech signal can also be high.
If use is made of the loudspeakers of an audio system then the speech signal which is distributed may vary in strength in accordance with the strength or amplitude of an audio signal, music or otherwise, which is being transmitted on the audio system.
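By way of illustration only, the level-tracking behaviour described above might be realised as in the following Python sketch. It assumes block-based processing of floating-point samples; the function names `rms` and `speech_gain`, the seat labels, the per-seat gain values, the reference level and the clamp limits are all hypothetical choices, not values taken from this specification.

```python
import math

def rms(block):
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not block:
        return 0.0
    return math.sqrt(sum(x * x for x in block) / len(block))

def speech_gain(seat_gain, audio_block, reference_rms=0.1,
                min_factor=1.0, max_factor=4.0):
    """Hypothetical rule: scale a per-seat speech gain with the audio level.

    seat_gain     -- user-set speech gain for one listening position (0 mutes it)
    audio_block   -- recent samples of the audio (music/radio) signal
    reference_rms -- audio level at which seat_gain is applied unchanged
    The tracking factor is clamped so that quiet passages never push the
    speech below the seat setting and loud music cannot make it overbearing.
    """
    factor = rms(audio_block) / reference_rms
    factor = max(min_factor, min(max_factor, factor))
    return seat_gain * factor

# Driver talking to the rearmost row: rear row at full level, middle row turned down.
seat_gains = {"middle_row": 0.2, "rear_row": 1.0}
music = [0.3 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
gains = {seat: speech_gain(g, music) for seat, g in seat_gains.items()}
```

Clamping the factor is one simple way of keeping the speech mixed in with, rather than dominating, the audio programme; a practical system might instead use a smoothed level estimate.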
If different audio signals are received at respective locations then signals which correspond to each extracted speech signal may be distributed to the various locations but preferably excluding, in each case, the respective location from which an extracted signal originated to prevent an echo effect or positive feedback. If no additional wiring can be accommodated in the speech distribution system the locally received signals at the various locations may be filtered and may be shifted in frequency so that they can be transmitted to a central unit on the same conductive lines which are used for the transmission of audio signals from a central audio or control unit to the loudspeakers. This allows the distributed signal or signals to be mixed with signals originating from the audio system, for example radio or music signals, without any interference.
Time delays may be imparted to distributed signals to eliminate echo effects, since the signals travelling via wire to the various locations travel much faster than the sound waves (speech) travelling from the person speaking to the same locations.
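The size of such a delay follows from the acoustic path length. The sketch below is illustrative only: it assumes a speed of sound of 343 m/s, treats wiring and processing latency as negligible, and uses a hypothetical 2.5 m talker-to-listener distance and a 16 kHz sample rate. The names `compensation_delay_samples` and `delayed` are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # in air at roughly 20 degrees C

def compensation_delay_samples(distance_m, sample_rate_hz):
    """Whole-sample delay that holds back an electrically distributed signal
    until the direct acoustic sound has covered `distance_m`.

    Wiring and processing latency are treated as negligible here, and the
    short path from the listener's own loudspeaker is ignored.
    """
    acoustic_delay_s = distance_m / SPEED_OF_SOUND_M_S
    return math.ceil(acoustic_delay_s * sample_rate_hz)

def delayed(signal, n_samples):
    """Fixed delay line: prepend n_samples zeros to the signal."""
    return [0.0] * n_samples + list(signal)

# Hypothetical 2.5 m from the driver's mouth to a rear-row listener at 16 kHz.
rear_delay = compensation_delay_samples(2.5, 16000)  # 117 samples, about 7.3 ms
```

Rounding up rather than to the nearest sample keeps the loudspeaker copy from arriving ahead of the direct sound.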
The invention also provides apparatus for distributing speech which includes a receiving device for receiving an acoustic signal (noise, music, speech, etc.) from one of a plurality of locations, a module for extracting from the acoustic signal a signal which represents speech originating at or close to that location, and a unit for distributing an amplified signal, which includes the extracted speech signal, to at least some of the said plurality of locations.
The speech signal may be distributed to each of the said plurality of locations although, preferably, the location from which the said acoustic signal was received is excluded.
The said extracted signal preferably represents the speech in question as accurately as possible.
The invention is further described by way of examples with reference to the accompanying drawings in which:
a illustrates a variation to the apparatus of
a are similar to
The invention is based on the use of techniques of adaptive filters and echo cancellation to extract local speech from a signal carrying music, noise and speech and to distribute a resulting speech signal to one or more locations inside a vehicle. The invention can be effectively implemented making use of an audio system such as a radio/tape/CD system, inside a vehicle, which is connected to a plurality of loudspeakers and some microphones strategically placed inside the vehicle.
The principles of the invention can be described by the following generalised example.
Assume a four-seater vehicle has a stereo radio/CD audio system with four speakers (left front, right front, left back, right back) and that a system according to the invention is integrated with the audio system. Four microphones are present, one at each seat.
A main unit has “a priori” information about the audio signal (ASe) originating from the radio/CD system. In the absence of any other sound (from occupants, road noise, etc.) the signal detected by a microphone is a function (F) of ASe. This function is the complex result of the speaker transfer function, the attenuation through the air and through objects (seats etc.), sound reflections from objects (windows etc.), the microphone transfer function, the multiple paths along which the sound waves travel, and the like.
Since ASe (the reference signal from the audio unit) is known, and the result as measured by the microphone in the absence of other sounds is known, it is possible to model this transfer function using echo cancelling techniques and an error minimisation algorithm such as the least mean squares (LMS) algorithm. Since other signals are also present in the microphone signal the calculations are somewhat more complex, but techniques of this type are described in the art. Because other signals, such as the driver's speech, are not normally correlated with the signals from the audio unit, they do not statistically influence the filter adaptation over a period of time. The modelling results in a signal ASe¹, an estimate of the audio component of the microphone signal. Subtracting ASe¹ from the microphone signal leaves the signals representing the speech and other noise.
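The modelling and subtraction described above can be illustrated with a small simulation. This is a minimal sketch, not the circuitry of the invention: it assumes a hypothetical three-tap loudspeaker-to-microphone path, uses Gaussian white noise as a stand-in for ASe and a 300 Hz tone as a stand-in for the driver's speech, and the sample rate, filter length and step size are illustrative choices. It uses the plain LMS update; practical echo cancellers often use a normalised variant instead.

```python
import math
import random

def lms_echo_cancel(reference, microphone, n_taps=8, step_size=0.01):
    """Adaptive FIR echo canceller using the plain LMS update.

    reference  -- ASe, the electrical audio signal fed to the loudspeaker
    microphone -- microphone samples: echo of the audio plus local speech
    Returns (residual, weights). The residual is the microphone signal minus
    the filter's running estimate of the echo (ASe¹), so it approximates the
    local speech once the weights have converged.
    """
    weights = [0.0] * n_taps
    history = [0.0] * n_taps  # delay line, newest reference sample first
    residual = []
    for x, d in zip(reference, microphone):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        residual.append(error)
        # Step each weight along the negative gradient of error**2.
        weights = [w + step_size * error * h for w, h in zip(weights, history)]
    return residual, weights

# Simulation with a hypothetical 3-tap loudspeaker-to-microphone path.
true_path = [0.6, -0.3, 0.1]
rng = random.Random(1)
n = 20000
audio = [rng.gauss(0.0, 1.0) for _ in range(n)]  # stand-in for ASe
speech = [0.05 * math.sin(2 * math.pi * 300 * k / 8000) for k in range(n)]
echo = [sum(c * audio[k - j] for j, c in enumerate(true_path) if k >= j)
        for k in range(n)]
mic = [e + s for e, s in zip(echo, speech)]
residual, weights = lms_echo_cancel(audio, mic)
```

Because the simulated speech is uncorrelated with the reference, the weights settle close to `true_path` and the residual converges towards the speech, which mirrors the statistical argument in the paragraph above.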
a illustrates a modified version of the form of the invention shown in
Each loudspeaker may include more than one speaker, such as low frequency, midrange and tweeter devices.
It is to be borne in mind that the invention does not emulate the operation of a public address system, in which an audio signal present at an input is amplified indiscriminately. This invention aims to achieve a mix of the voice signal with the prevailing music or other audio entertainment without changing the ambience through overbearing signal amplification.
The signal processing also removes the requirement for the microphone to be very close to, or specifically aimed at, the person speaking.
The construction of the main unit and the construction of each distribution module are described hereinafter.
Note that in the following description the addition of the symbol “e” as a suffix to a sound signal denotes the electrical representation of such sound signal.
The audio unit 10 produces an audio signal AS (electrical counterpart ASe) which is transmitted through the main unit 14 and the distribution modules 16 to the respective loudspeakers 12.1 to 12.4. This aspect is normally substantially conventional and is not further described herein. In fact, this aspect is similar to a situation without the main unit and the distribution modules.
Assume that the loudspeaker 12.1 and the microphone 18.1 are associated with the position of the seat of the driver of the vehicle (in
The combined signal ASxe + S1ae¹ is transmitted to the various loudspeakers 12.2 to 12.4, which are associated with different seats in the vehicle. Persons seated at these seats therefore hear a signal which consists of the audio signal originating from the audio unit 10 in accordance with the volume setting (including left/right balance and back/front balance) and the superimposed speech signal which is derived from the driver. Thus, with the system shown in
If additional wiring or other medium of transfer from the microphone to the main unit can be accommodated a system as shown in
The system shown in
It is to be noted that in the arrangement of
In
The module 16 includes first and second mixers 22 and 24 respectively, and first and second filters 26 and 28 respectively.
The filter 26 is a band-pass filter extending, for example, from 1000 Hz to 20 kHz, and is suitable for speech and music transmission. Its purpose is to filter out the signal of speech and other sounds which are picked up by the local microphone 18, frequency shifted by the mixer 24 and local oscillator 30, and then mixed onto the line by the mixer 22.
The filter 28 is a dynamic adaptive digital filter mechanism. The filter is implemented by dynamically adjusting the coefficients of an FIR-type filter so that all sounds which are detected by the microphone 18 and which are correlated with the sounds output to the loudspeaker 12 are cancelled out as far as possible. This technique can be implemented using a least mean squares (LMS) error principle. The quality of the cancellation is determined by the quality of the digitization, the length of the filter, etc. As usual, a trade-off with cost is required.
The system can be designed so that the adaptive filter can estimate the transfer function as part of the installation procedure. The resultant filter coefficients can then be stored in a non-volatile memory 29 and can be used every time the system is powered up. This approach prevents the adaptation process from starting at a random or an all-zero vector, speeds up the adaptation process, and helps to prevent spurious transients at start up.
The system can also be designed to store new coefficients when it is determined that the transfer function has changed, or has changed by more than a minimum setting. Such a change can result when large objects are placed in a vehicle, when the number of passengers changes, when the balance (L/R, F/B) is changed, and in many other situations.
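A possible software analogue of storing and reloading the coefficients is sketched below. It is illustrative only: a JSON file stands in for the non-volatile memory 29, the change measure (relative Euclidean distance) and the 10 % threshold are hypothetical choices for the "minimum setting", and the names `load_coefficients`, `relative_change` and `maybe_store` are hypothetical.

```python
import json
import math
import os
import tempfile

def load_coefficients(path, n_taps):
    """Return stored filter coefficients, or all zeros if none are usable."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            coeffs = json.load(f)
        if isinstance(coeffs, list) and len(coeffs) == n_taps:
            return [float(c) for c in coeffs]
    except (OSError, ValueError, TypeError):
        pass
    return [0.0] * n_taps

def relative_change(old, new):
    """Distance between two coefficient sets, relative to the old set's norm."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(old, new)))
    norm = math.sqrt(sum(a * a for a in old))
    return diff / norm if norm > 0.0 else float("inf")

def maybe_store(path, stored, current, min_change=0.1):
    """Write `current` only if it differs from `stored` by more than min_change."""
    if relative_change(stored, current) <= min_change:
        return False
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(current, f)
    os.replace(tmp, path)  # swap in the new file in a single step
    return True

# A temporary file stands in for the non-volatile memory 29.
nv_path = os.path.join(tempfile.mkdtemp(), "filter28_coeffs.json")
start = load_coefficients(nv_path, 4)  # nothing stored yet: all zeros
first_write = maybe_store(nv_path, start, [0.6, -0.3, 0.1, 0.0])
small_drift = maybe_store(nv_path, load_coefficients(nv_path, 4),
                          [0.61, -0.3, 0.1, 0.0])  # below the threshold
```

Writing to a temporary file and then replacing the stored one means a power loss during the write leaves the previous coefficient set intact, so the next start-up still avoids an all-zero vector.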
The filter 28 can also include a stage in which the output, typically the speech originating near a microphone 18, is filtered over the speech band, from say 300 Hz to 6 kHz, to keep noise out of the system. Alternatively the speech band filter can be positioned between the microphone and the filter 28. An anti-aliasing filter is required in any event.
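As an illustration only, the digital speech-band stage could be a high-pass section followed by a low-pass section. The sketch below assumes a 16 kHz sample rate and second-order Butterworth-style sections designed with the widely used RBJ "Audio EQ Cookbook" formulas; the class `Biquad` and the functions `speech_band_filter` and `tone_gain` are hypothetical. The anti-aliasing filter mentioned above is an analogue stage ahead of the converter and is not modelled here.

```python
import math

class Biquad:
    """Second-order IIR section (direct form I) with RBJ-cookbook designs."""

    def __init__(self, b, a):
        self.b = [c / a[0] for c in b]
        self.a = [c / a[0] for c in a]
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    @classmethod
    def lowpass(cls, f0, fs, q=1 / math.sqrt(2)):
        w0 = 2 * math.pi * f0 / fs
        alpha, c = math.sin(w0) / (2 * q), math.cos(w0)
        return cls([(1 - c) / 2, 1 - c, (1 - c) / 2], [1 + alpha, -2 * c, 1 - alpha])

    @classmethod
    def highpass(cls, f0, fs, q=1 / math.sqrt(2)):
        w0 = 2 * math.pi * f0 / fs
        alpha, c = math.sin(w0) / (2 * q), math.cos(w0)
        return cls([(1 + c) / 2, -(1 + c), (1 + c) / 2], [1 + alpha, -2 * c, 1 - alpha])

    def process(self, x):
        b0, b1, b2 = self.b
        _, a1, a2 = self.a
        y = b0 * x + b1 * self.x1 + b2 * self.x2 - a1 * self.y1 - a2 * self.y2
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y

def speech_band_filter(samples, fs=16000, low_hz=300.0, high_hz=6000.0):
    """Limit a signal to roughly 300 Hz to 6 kHz (high-pass, then low-pass)."""
    hp = Biquad.highpass(low_hz, fs)
    lp = Biquad.lowpass(high_hz, fs)
    return [lp.process(hp.process(x)) for x in samples]

def tone_gain(freq_hz, fs=16000, seconds=1.0):
    """Steady-state peak gain of the speech-band filter for a unit test tone."""
    n = int(fs * seconds)
    tone = [math.sin(2 * math.pi * freq_hz * k / fs) for k in range(n)]
    out = speech_band_filter(tone, fs)
    return max(abs(v) for v in out[n // 2:])
```

For example, `tone_gain(1000)` is close to 1, whereas a 50 Hz rumble component is attenuated to a few per cent of its input level.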
The mixer 24 multiplies the signal which is transmitted to the main unit 14 with a signal from a local oscillator 30 so that the signal is translated in frequency. The mixer 22 mixes this signal with the signal AS from the main unit and allows both signals, i.e. the audio signal and the speech signal, to be impressed on the speaker wire 20 at different locations in the frequency spectrum.
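The frequency translation performed by a mixer can be checked numerically. In the sketch below, which is illustrative only, a 1 kHz tone stands in for the microphone's speech signal, and the 96 kHz sample rate and 30 kHz local-oscillator frequency are hypothetical values; the functions `mix` and `amplitude_at` are likewise hypothetical. Multiplying by a cosine moves the tone to the two sidebands at 29 kHz and 31 kHz, each with half the original amplitude, and leaves nothing at 1 kHz.

```python
import cmath
import math

def mix(signal, lo_hz, fs):
    """Multiply a signal by a local-oscillator cosine (the role of a mixer)."""
    return [x * math.cos(2 * math.pi * lo_hz * k / fs) for k, x in enumerate(signal)]

def amplitude_at(signal, freq_hz, fs):
    """Amplitude of one sinusoidal component, from a single DFT bin."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs)
              for k, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)

# Hypothetical figures: 96 kHz sampling, 30 kHz local oscillator, and a
# 1 kHz tone standing in for the speech signal (0.1 s long).
fs, lo_hz = 96000, 30000
speech = [math.sin(2 * math.pi * 1000 * k / fs) for k in range(9600)]
shifted = mix(speech, lo_hz, fs)
```

With these assumed figures both sidebands lie above 20 kHz, clear of the audio band carried on the speaker wire. Multiplying again by the same oscillator and low-pass filtering the result would bring the tone back to 1 kHz, which is the task of the receiving mixers.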
It may be advantageous to add a low level of white noise to the signal from the audio system (radio/CD etc.) before this signal is output on the speakers. The adaptive filter 28 needs to build a model of the transfer function from the electrical signal before the speakers to the electrical signal after the microphone. To do so, the filter requires energy over the whole frequency spectrum, and since this cannot be guaranteed for all music and sounds from the audio system, it may be prudent to add white noise from a source 31 for a short time period to help estimate the transfer function at all frequencies.
The noise level should be very low so that it does not irritate a listener. The white noise needs to be added only for about a second and the addition thereof should not prove to be a source of annoyance to the occupants of the vehicle. It may be necessary to repeat this from time to time.
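The short, low-level noise burst might be generated as follows. This is a minimal sketch under stated assumptions: samples are floats with full scale 1.0, and the -50 dBFS level, the one-second duration and the name `add_probe_noise` are hypothetical choices, not figures from this specification.

```python
import math
import random

def add_probe_noise(audio, fs, level_db=-50.0, duration_s=1.0, seed=None):
    """Add low-level Gaussian white noise to the first `duration_s` seconds.

    level_db is the noise RMS relative to full scale (1.0). White noise has
    energy at all frequencies, so the adaptive filter can model the transfer
    function even where the music itself is silent.
    """
    rng = random.Random(seed)
    sigma = 10 ** (level_db / 20)
    n_probe = min(len(audio), int(fs * duration_s))
    out = list(audio)
    for k in range(n_probe):
        out[k] += rng.gauss(0.0, sigma)
    return out

# Two seconds of silence: only the first second receives the probe noise.
fs = 16000
silence = [0.0] * (2 * fs)
probed = add_probe_noise(silence, fs, seed=3)
probe_rms = math.sqrt(sum(v * v for v in probed[:fs]) / fs)
```

Repeating the burst from time to time, as suggested above, would amount to calling the function again on a later block.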
It is also important to ensure that the sound from the microphones is processed in such a way that background noise is eliminated as far as possible. This can also be done using dynamic adaptive filtering techniques. For example, a continuous sine wave can easily be identified as a non-speech signal and then removed with a sharp filter.
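Once a steady tone has been identified, a narrow notch filter can remove it while leaving nearby speech frequencies almost untouched. The sketch below is illustrative only. It assumes the tone frequency is already known (the detection step is not shown), uses the RBJ "Audio EQ Cookbook" notch design with a hypothetical Q of 30, and the names and test signals (a 1200 Hz whine and a 500 Hz tone standing in for voice) are hypothetical.

```python
import math

def notch(samples, f0, fs, q=30.0):
    """Remove a steady tone at f0 with a narrow second-order notch filter."""
    w0 = 2 * math.pi * f0 / fs
    alpha, c = math.sin(w0) / (2 * q), math.cos(w0)
    a0 = 1 + alpha
    b0, b1, b2 = 1 / a0, -2 * c / a0, 1 / a0
    a1, a2 = -2 * c / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

fs = 16000
n = fs
whine = [0.5 * math.sin(2 * math.pi * 1200 * k / fs) for k in range(n)]
voice = [0.2 * math.sin(2 * math.pi * 500 * k / fs) for k in range(n)]
cleaned = notch([w + v for w, v in zip(whine, voice)], 1200, fs)
tail = cleaned[n // 2:]
tail_rms = math.sqrt(sum(v * v for v in tail) / len(tail))  # close to 0.2 / sqrt(2)
```

A higher Q gives a sharper notch but a longer settling time, so the choice trades selectivity against how quickly the filter responds when the tone first appears.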
The system can also be used to adapt sound levels at the different loudspeakers to prevailing conditions.
An important function that can be designed into the system is that of automatic volume control. A radio and music volume setting that may be acceptable at a high speed with an attendant high background noise level will probably be too loud when the vehicle speed is much lower.
The system has access to signals which represent noise and sound levels and which can be analysed to make a decision on automatically adjusting the volume control to a different level. With a digital signal processor available and microphones placed strategically in various places inside the vehicle, it is possible to extract the required parameters (road and engine noise levels) and to make the necessary adjustments to ensure a pleasant audio experience for the vehicle's occupants.
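One simple form of such automatic volume control maps a measured noise level to a volume boost and moves towards it gradually. The sketch below is illustrative only. It assumes the noise level is measured on the microphone signal after echo cancellation, during pauses in speech, and the thresholds, the 12 dB maximum boost, the smoothing constant and all names are hypothetical.

```python
import math

def level_db(block, floor=1e-12):
    """RMS level of a block of samples in dB relative to full scale (1.0)."""
    mean_square = sum(x * x for x in block) / max(len(block), 1)
    return 10 * math.log10(max(mean_square, floor))

def target_boost_db(noise_db, quiet_db=-60.0, loud_db=-30.0, max_boost_db=12.0):
    """Map a background-noise level to a volume boost, linear between two points."""
    if noise_db <= quiet_db:
        return 0.0
    if noise_db >= loud_db:
        return max_boost_db
    return max_boost_db * (noise_db - quiet_db) / (loud_db - quiet_db)

class AutoVolume:
    """Move the applied boost gradually towards its target to avoid abrupt jumps."""

    def __init__(self, smoothing=0.05):
        self.smoothing = smoothing  # fraction of the remaining gap closed per update
        self.boost_db = 0.0

    def update(self, noise_block):
        target = target_boost_db(level_db(noise_block))
        self.boost_db += self.smoothing * (target - self.boost_db)
        return self.boost_db

    def apply(self, audio_block):
        gain = 10 ** (self.boost_db / 20)
        return [gain * x for x in audio_block]

# Sustained road noise at about -20 dBFS (hypothetical): the boost settles near 12 dB.
avc = AutoVolume()
for _ in range(200):
    avc.update([0.1] * 64)
```

When the vehicle slows and the measured noise falls, the same smoothing lowers the boost gradually, which addresses the "too loud at low speed" situation described above.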
The system can also shut down if no voice signal is present and can be integrated with cell phone technology to provide hands-free working.
The filters 32 and 34 extract the frequency translated speech signal input on the speaker wire 20 by removing the baseband signals and the mixers 38 and 40 translate the speech signal to the base band. In the mixer 36 the audio signal is mixed with the speech signals from each of the locations and is then distributed to each loudspeaker except, possibly, for each speech signal, the respective location of origin.
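The distribution performed in the mixer 36 can be pictured as a routing table in which each extracted speech signal is added to every loudspeaker feed except the one at its own location. The sketch below assumes the speech signals have already been returned to baseband; the location labels, the gain table and the function name `distribute` are hypothetical.

```python
def distribute(audio, speech_by_origin, gains):
    """Mix extracted speech into each loudspeaker feed, skipping the origin.

    audio            -- dict: location -> audio samples for that loudspeaker
    speech_by_origin -- dict: location -> extracted speech samples from there
    gains            -- dict: (origin, destination) -> speech gain; missing = 1.0
    """
    feeds = {}
    for dest, samples in audio.items():
        out = list(samples)
        for origin, speech in speech_by_origin.items():
            if origin == dest:
                continue  # never feed a talker's voice back to their own seat
            g = gains.get((origin, dest), 1.0)
            for k, s in enumerate(speech[:len(out)]):
                out[k] += g * s
        feeds[dest] = out
    return feeds

# Driver (front left) speaking: front right muted, rear left at half level.
silence = [0.0, 0.0, 0.0]
feeds = distribute(
    {"FL": silence, "FR": silence, "RL": silence, "RR": silence},
    {"FL": [1.0, 1.0, 1.0]},
    {("FL", "FR"): 0.0, ("FL", "RL"): 0.5},
)
```

Skipping the location of origin is what prevents the echo or positive feedback referred to earlier in the description.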
The speech distribution system includes a mixer 50, a filter 52 and an echo cancellation mechanism 54. Four loudspeakers 12.1, 12.2, 12.3 and 12.4 are included in the audio system. A speaker wire 56 extends from the audio unit 10 and is destined for the speaker 12.1 associated with the driver. A speaker wire 58 which is destined for the speakers 12.2, 12.3 and 12.4 extends from the audio unit to the mixer 50. A microphone 60 is associated with the speaker 12.1 and is positioned to detect speech from the driver of the vehicle.
The filter 52 is an analogue or digital filter which extracts a speech signal originating from the driver. If a digital filter is used, it includes an analogue anti-aliasing filter. The filter 52 would typically be a 300 Hz to 3 kHz (or 6 kHz) band-pass filter.
The echo cancellation mechanism 54 is a dynamically adaptive device (see
The mechanism 54 may also include a fixed filter which limits the working of the adaptive portion of the mechanism to the same band as the filter 52.
The mixer 50 amplifies the desired speech signal to a level which is comparable to the amplitudes of the other signals or even to a predetermined user-settable level. The speech signal is then mixed with the audio signal originating from the unit 10 which is destined for the speakers 12.2 to 12.4. Volume may be controlled by means of a conventional device 62. The device 62 could also, to some extent, be controlled automatically, by means of a processor 63, which is responsive to background noise levels so that, as has been described hereinbefore, the volume of the audio input signal is automatically adjusted in a manner which is dependent on the background noise level. Thus if the audio unit volume level is increased the amplitude of the mixed speech signal is also increased. The volume adjustment may be effective for individual speakers or for groups of speakers.
It is possible to combine a microphone with a loudspeaker in the sense that these devices are integrally formed. In this instance the arrangement shown in
With a different approach it is possible to make use of centralised distribution. For example, if the different microphones can be hardwired, or if it can be assumed that the microphone signal can be transmitted over the loudspeaker wires or that the microphone is part of the loudspeaker, then the system can be simplified to a central distribution unit. This technique is shown in
The arrangement of
According to a further modification of the invention time delays can be built into the system to compensate for the differences in the transmission times of the physical sounds (the true acoustic sounds) and the electronic or electrical signals which represent the sounds and travel much faster. In this way discernible echoes or reverberation effects can be eliminated or minimised.
Another possibility is to incorporate the distribution system, whether in the form of a central distribution unit or a distributed unit, into the audio system of the vehicle. Separate hardware items then need not be installed, because the components necessary to implement the speech distribution system are incorporated in the audio system.
The system of the invention, inter alia because of the presence of processing power 63 (see
Similarly, oral commands can be used to control other vehicle functions (69) such as setting a speed control unit, turning lights on and off, controlling wiper functions, mobile phone functions and the like. This may be done in conjunction with pressing an “audio command” activation button 71, which would typically be located on the steering wheel. It would be desirable for this unit to control, via voice command from the driver, the answering and dialing of a vehicle-based mobile phone. The volume of the audio unit can then automatically be reduced, and either a particular occupant or all occupants equally can be targeted for the phone conversation. Voice commands may be used for entertainment systems (DVD, VHS, TV), radio station selection, electronic guidance (GPS) control and address selection, climate control (A/C, heating), and the like.
In a further embodiment (see
In
In the system of
A digital filter is associated with each microphone although in this case only one microphone is shown. A signal from the radio unit 10 is fed into a shift register delay line 90 of the digital filter. The values from the delay line are then multiplied with the digital filter coefficients 92 and summed in an accumulator 94. The result is an estimate of the part of the microphone signal that represents the signals from the radio unit subjected to the transfer functions of the loudspeakers, the microphones and the media between them. This value is subtracted (step 96) from the signals detected by the microphone 18.1 to give a signal which, as has been discussed elsewhere, represents the error signal driving the filter adaptation process and also the signals of other sounds like speech originating close to the microphone.
In a stage 98 the error signal is multiplied with a coefficient that determines the adaptation rate and also the smoothness of the adaptation. The error signal is then further used to drive the filter coefficients 92. From the same signal, but on the signal side, an average power is determined in a step 100. This is useful to help keep signals adjusted or to set values at the various locations. The signal from the microphone may also be analysed in terms of content and power to prevent a situation in which no speech is present and only noise is being inserted into the system and amplified. This error (speech) signal is then adjusted in a stage 102 to reflect the volume settings of the speech to the various loudspeakers.
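The power measurement of step 100 and the per-loudspeaker adjustment of stage 102 can be sketched as follows. This is a simplified illustration: it gates on average power alone, whereas the description also contemplates analysing the signal content, and the averaging constant, the -45 dBFS threshold, the loudspeaker labels and the class name `SpeechGate` are hypothetical.

```python
import math

class SpeechGate:
    """Track the average power of the echo-cancelled signal and gate it.

    alpha sets the averaging time constant; below threshold_db the signal is
    treated as noise only and is not distributed.
    """

    def __init__(self, alpha=0.01, threshold_db=-45.0):
        self.alpha = alpha
        self.threshold = 10 ** (threshold_db / 10)  # a power, not an amplitude
        self.power = 0.0

    def process(self, error_sample, volumes):
        # Analogue of step 100: exponentially weighted average power.
        self.power += self.alpha * (error_sample ** 2 - self.power)
        if self.power < self.threshold:
            error_sample = 0.0  # avoid inserting and amplifying noise alone
        # Analogue of stage 102: one volume-scaled copy per loudspeaker.
        return {spk: v * error_sample for spk, v in volumes.items()}

gate = SpeechGate()
volumes = {"rear_left": 1.0, "rear_right": 0.5}
quiet = [gate.process(0.001, volumes) for _ in range(1000)]
talk = [gate.process(0.1 * math.sin(2 * math.pi * 200 * k / 8000), volumes)
        for k in range(2000)]
```

A power-only gate cannot tell loud noise from speech, which is why a content analysis of the kind mentioned above would normally complement it.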
In a step 104 the final mix takes place between the signals from the radio unit 10 and the speech signals, which are now volume adjusted. This can be done at a small-signal level; the resulting signal is then amplified (104) and sent to the various loudspeakers.
| Number | Date | Country | Kind |
|---|---|---|---|
| 99/7564 | Sep 1999 | ZA | national |
This is a continuation of co-pending application Ser. No. 10/149,362, having a filing date of Aug. 28, 2002, which is in turn the U.S. nationalization of international patent application PCT/ZA00/00244 having an international filing date of Dec. 7, 2000.
| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | 10149362 | Aug 2002 | US |
| Child | 11834195 | Aug 2007 | US |