1. Technical Field
The present disclosure relates to the field of processing audio signals, and in particular to a system and method for speech reinforcement.
2. Related Art
In-car communication (ICC) systems may be integrated into an automobile cabin to facilitate communication between occupants of the vehicle by relaying signals captured by microphones and reproducing them in audio transducers within the vehicle. For example, a speech signal received by a microphone near a driver is fed to an audio transducer near third row seats to allow third row occupants to hear the driver's voice clearly. Delay and relative level between a direct speech signal and a reproduced sound of a particular talker at a listener's location are important to ensure the naturalness of conversation. Reproducing the driver's voice in audio transducers situated in close proximity to the occupants may cause the occupants to perceive the driver's voice originating from both the driver's spatial location and from the spatial location of the audio transducers. In many cases, the perception of the driver's voice coming from two different spatial locations may be distracting to the occupants.
The system and method may be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the disclosure. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views.
Other systems, methods, features and advantages will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included with this description and be protected by the following claims.
A system and method for speech reinforcement may determine the spatial location of an audio source and the spatial location of a listener. An audio signal generated by the audio source may be captured. The spatial location, relative to the listener, of two or more audio transducers that emit a reinforcing audio signal to reinforce the audio signal may be determined. The captured audio signal may be used to generate, responsive to the spatial location of the audio source, the spatial location of the listener, and the spatial location of the two or more audio transducers, the reinforcing audio signal such that, when emitted by the two or more audio transducers, the listener perceives a source of the reinforcing audio signal to be spatially located in substantially the spatial location of the audio source, thereby reinforcing the audio signal.
An in-car communication (ICC) system may be integrated into the automobile cabin 100 to facilitate communication between occupants of the vehicle by relaying signals captured by one or more of the microphones 102 and reproducing them in the audio transducers 104 within the vehicle. For example, an audio signal captured by a microphone 102 near the driver's mouth may be fed to an audio transducer 104 near the third row to allow third row occupants to hear the driver's voice clearly. The ICC system may improve the audio quality associated with a person located in a first zone communicating with a person located in a second zone. Reproducing the driver's voice may result in a feedback path that may cause ringing; this may be mitigated by, for example, controlling the closed-loop gain. Delay and the relative amplitude level between a direct speech signal and a reproduced sound of a particular talker at a listener's location may also affect the naturalness of conversation. The ICC system may also be referred to as a sound reinforcement system. The sound reinforcement system may be used, for example, in large conference rooms with speakerphones and in audio performances at venues such as concert halls. The sound reinforcement system may also be used in other types of vehicles such as trains, aircraft and watercraft.
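As one illustration of such closed-loop gain control (a simplified sketch rather than the ICC implementation itself), the reinforcement gain can be capped so that its product with the measured transducer-to-microphone coupling stays a margin below unity; the function name, margin and coupling value below are assumptions.

```python
def limit_reinforcement_gain(desired_gain, feedback_path_gain, margin_db=6.0):
    """Cap the ICC gain so gain * feedback_path_gain stays a margin below unity."""
    margin = 10.0 ** (-margin_db / 20.0)            # e.g. 6 dB of headroom below unity
    max_stable_gain = margin / max(feedback_path_gain, 1e-12)
    return min(desired_gain, max_stable_gain)

# Example: a 0.4 (linear) coupling from transducer back into the talker's
# microphone limits how much reinforcement gain can safely be applied.
print(limit_reinforcement_gain(desired_gain=2.0, feedback_path_gain=0.4))  # ~1.25
```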
Audio signal 208A may be reflected by physical surfaces including, for example, the dashboard and the windshield in an automobile. The reflections of audio signal 208A may include reflected audio signals 210A and 210B (collectively or generically reflected audio signals 210). The reflected audio signals 210 may be characterized as reverberations and/or echoes of the audio signal 208. The reflected audio signals 210 may help the listener 204 spatially locate the audio source 202 in a way similar to that described above for audio signals 208B and 208C.
The audio transducers 206 may be used to reinforce the captured audio signal to facilitate communication between the audio source 202 and the listener 204. The listener 204 may receive reinforcement audio signals 212C and 212D from audio transducer 206A. The reinforcement audio signals 212C and 212D may have differences in time and/or frequency as perceived by the listener 204 due to the acoustic environment and propagation delays between the audio transducer 206A and the left and right ears of the listener 204. The listener 204 may receive reinforcement audio signals 212A and 212B from audio transducer 206B. The reinforcement audio signals 212A and 212B may have differences in time and/or frequency as perceived by the listener 204 due to the acoustic environment and propagation delays between the audio transducer 206B and the left and right ears of the listener 204. The listener 204 may perceive the reinforcement signals 212A, 212B, 212C and 212D (collectively or generically reinforcement audio signals 212) to be spatially located behind the listener 204 because the reinforcement audio signals 212 are emitted from the audio transducers 206, which are spatially located behind the listener 204. The listener 204 may therefore perceive the audio signal 208 to be generated by the audio source 202 in front of the listener 204 and the reinforcement signals 212 to be generated from behind the listener 204. This may be distracting and sound unnatural to the listener 204.
The spatial location of a vehicle occupant may be determined in a variety of ways including, for example, sensors placed in each of the seating locations, audio processing of captured microphone signals that may track spatial location of audio signal 208, video cameras that support tracking motion inside the car, facial recognition, capturing heat signatures of occupants and other similar detection mechanisms. The vehicle occupants may include the audio source 202 and the listener 204. The spatial location of the audio transducers 206 may be known a priori or determined dynamically. Audio transducers 206 in an automobile may typically be spatially located in fixed locations. The captured audio signal may be processed in order for the listener 204 to perceive the reinforcement signals 212 to be generated by a virtual source 402 spatially located in substantially the spatial location of the audio source 202.
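As a minimal illustration of one of the detection mechanisms listed above, the sketch below maps seat-occupancy sensor readings to approximate cabin coordinates; the seat positions, coordinate frame and function names are assumptions used only for illustration.

```python
# Approximate cabin coordinates in metres (illustrative values only).
SEAT_POSITIONS_M = {
    "driver":      (0.0, 0.4, 1.2),
    "front_right": (0.0, -0.4, 1.2),
    "rear_left":   (-0.9, 0.4, 1.2),
    "rear_right":  (-0.9, -0.4, 1.2),
}

def occupant_locations(seat_sensor_states):
    """Return approximate positions for seats whose sensor reports occupancy."""
    return {seat: SEAT_POSITIONS_M[seat]
            for seat, occupied in seat_sensor_states.items() if occupied}

# Example: driver (audio source 202) plus one rear-seat listener 204.
print(occupant_locations({"driver": True, "rear_left": True, "front_right": False}))
```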
Processing (e.g. filtering) the captured audio signals reproduced as the reinforcement signals 212 in the two or more audio transducers 206 may be used to modify the spatial location of the virtual source 402 perceived by the listener 204. The processing applied to the captured audio signals emitted by the first audio transducer 206A may combine the desired spatial reinforcement signal 404B of the virtual source 402 and cancel the cross reinforcement signal 212B from the second audio transducer 206B in the left ear of the listener 204. The desired spatial reinforcement signal 404B associated with the virtual source 402 may be represented as a transfer function from the virtual source 402 to the left ear of the listener 204. The processing applied to the captured audio signals emitted by the first audio transducer 206A may be described as the convolution of the transfer function of the desired spatial reinforcement signal 404B and the inverse of the transfer function of the cross reinforcement signal 212B. Correspondingly, the filtering applied to the captured audio signals emitted by the second audio transducer 206B may be described as the convolution of the transfer function of the desired spatial signal 404A and the inverse of the transfer function of the cross reinforcement signal 212C. An example transfer function for the audio transducers 206 is shown in the following equations:
h206A = h404B ∗ h212B⁻¹

h206B = h404A ∗ h212C⁻¹

where ∗ denotes convolution and ⁻¹ denotes the inverse of the transfer function.
Processing the captured audio signal with the transfer function h206A and emitting the resultant signal from the audio transducer 206A may allow the listener 204 to perceive the desired spatial reinforcement signal 404B in the left ear. Filtering the captured audio signal with the transfer function h206B and emitting the resultant signal from the audio transducer 206B may allow the listener 204 to perceive the desired spatial reinforcement signal 404A in the right ear. The combination of the reinforcement signals 404A and 404B may allow the listener 204 to perceive the spatial location of the audio source to be that of the virtual source 402.
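A minimal sketch of this processing is shown below. It assumes the four transfer functions are available as impulse responses (for example, from the delayed-impulse construction described next), and it uses a regularized frequency-domain division in place of an exact inverse filter; the function names, FFT size and regularization constant are illustrative assumptions, not the disclosed implementation.

```python
import numpy as np

def reinforcement_filter(desired_ir, cross_ir, n_fft=4096, eps=1e-3):
    """Driving filter h_desired * h_cross^-1, computed as a regularized
    frequency-domain division and returned as an impulse response."""
    H_desired = np.fft.rfft(desired_ir, n_fft)
    H_cross = np.fft.rfft(cross_ir, n_fft)
    H = H_desired * np.conj(H_cross) / (np.abs(H_cross) ** 2 + eps)
    return np.fft.irfft(H, n_fft)

def drive_signals(captured, h404A, h404B, h212B, h212C):
    """Filter the captured talker signal for transducers 206A and 206B."""
    h206A = reinforcement_filter(h404B, h212B)   # h206A = h404B * h212B^-1
    h206B = reinforcement_filter(h404A, h212C)   # h206B = h404A * h212C^-1
    out_206A = np.convolve(captured, h206A)[: len(captured)]
    out_206B = np.convolve(captured, h206B)[: len(captured)]
    return out_206A, out_206B
```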
Calculating the transfer functions for the desired spatial signals, h404A and h404B, and the cross reinforcement signals, h212B and h212C, may be performed using, for example, any combination of theoretical calculation and acoustic measurement techniques. One example theoretical calculation may create transfer functions that account for the propagation delays between the sources (the virtual source 402 and the audio transducers 206) and the spatial location of the listener 204. For example, the cross reinforcement signal 212B may have a propagation delay, measured in milliseconds (msec), from the location of the audio transducer 206B to the left ear of the listener 204. The cross reinforcement signal 212C may have a propagation delay, measured in msec, from the location of the audio transducer 206A to the right ear of the listener 204. The desired spatial reinforcement signal 404A may have a propagation delay, measured in msec, from the location of the virtual source 402 to the right ear of the listener 204. The desired spatial reinforcement signal 404B may have a propagation delay, measured in msec, from the location of the virtual source 402 to the left ear of the listener 204. Each of the transfer functions may be created as a delayed impulse. The spatial location of the listener 204 may be an approximate spatial location because the listener 204 may move. For example, a sensor in the seat may determine that a listener 204 is in the seating location, but the exact position of the listener's ears may be unknown. Any approximation error associated with creating the transfer functions may result in a different perceived spatial location of the virtual source 402.
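For illustration, the delayed-impulse construction might be sketched as follows; the cabin coordinates, sample rate and function names are assumptions used only to show how a propagation delay becomes a transfer function.

```python
import numpy as np

SPEED_OF_SOUND_M_S = 343.0

def delayed_impulse(src_pos, ear_pos, fs=48000, length=2048):
    """Unit impulse delayed by the straight-line propagation time src -> ear."""
    distance = np.linalg.norm(np.subtract(src_pos, ear_pos))   # metres
    delay_samples = int(round(fs * distance / SPEED_OF_SOUND_M_S))
    ir = np.zeros(length)
    ir[min(delay_samples, length - 1)] = 1.0
    return ir

# Example: virtual source 402 at the driver's head position, listener's left
# ear roughly one metre behind it (coordinates in metres, purely illustrative).
h404B = delayed_impulse(src_pos=(0.0, 0.4, 1.2), ear_pos=(-1.0, 0.3, 1.2))
```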
The transfer functions may include additional processing, or filtering, that may improve the accuracy of the perceived spatial location of the virtual source 402 including, for example, head shadowing effects, the acoustic environment of the car, shadowing effects of other listeners, orientation of the listener and the height of the listener. Microphones 102 located proximate to a listener 204 may be utilized to implement an adaptive filter that may improve the perceived spatial location of the virtual source 402.
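One possible form of such an adaptive filter is a normalized LMS (NLMS) update driven by a microphone 102 near the listener 204, sketched below; the signal names, filter length and step size are assumptions rather than the disclosed implementation.

```python
import numpy as np

def nlms_update(w, x_buf, desired, mu=0.1, eps=1e-6):
    """One NLMS step: w is the correction filter, x_buf holds the most recent
    reference samples (newest first), desired is the target sample measured
    (or modeled) at the listener's microphone."""
    y = np.dot(w, x_buf)                              # current filter output
    e = desired - y                                   # error to be driven to zero
    w = w + mu * e * x_buf / (np.dot(x_buf, x_buf) + eps)
    return w, e

# Illustrative use: adapt a 32-tap correction filter toward a delayed copy
# of the reference signal (random data stands in for the audio signals).
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
w = np.zeros(32)
for n in range(32, len(x)):
    x_buf = x[n - 32:n][::-1]                         # newest sample first
    w, _ = nlms_update(w, x_buf, desired=x[n - 5])
```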
In some situations, multiple listeners 204 may perceive the virtual source 402 from the same audio transducers 206; for example, two listeners 204 in the rear seat may share a single driver as the audio source 202. The calculation of the transfer functions may utilize an average spatial location of the two listeners 204. Using an average spatial location of the two listeners 204 may cause each listener 204 to perceive the spatial location of the virtual source 402 to be in the front seat but not necessarily in the location of the audio source 202, and each listener 204 may perceive the virtual source 402 to be in a different location. Even though the perceived spatial location of the virtual source 402 may not be in substantially the spatial location of the audio source 202, the overall perception of the listeners 204 may still be an improvement over the perception that the spatial reinforcement signals 304 are located behind the listeners 204.
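A minimal sketch of this averaging, with illustrative rear-seat coordinates, is shown below.

```python
import numpy as np

def average_listener_position(positions):
    """Midpoint of the listeners' approximate head positions (metres)."""
    return np.mean(np.asarray(positions, dtype=float), axis=0)

# Example: two rear-seat listeners 204; the virtual source 402 is then
# rendered for this single averaged position rather than for each listener.
print(average_listener_position([(-0.9, 0.4, 1.2), (-0.9, -0.4, 1.2)]))  # [-0.9, 0.0, 1.2]
```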
One or more ICC systems using speech reinforcement may be operated concurrently. The example systems described above show the driver as the audio source 202 communicating with one or more listeners 204 behind the driver. The driver may also be the listener 204 and the passengers behind the driver may become the audio source 202. In another example, a third row of seats in a vehicle cabin may include an ICC system with speech reinforcement to communicate with all the other vehicle occupants.
The processor 802 may comprise a single processor or multiple processors that may be disposed on a single chip, on multiple devices or distributed over more than one system. The processor 802 may be hardware that executes computer executable instructions or computer code embodied in the memory 804 or in other memory to perform one or more features of the system. The processor 802 may include a general purpose processor, a central processing unit (CPU), a graphics processing unit (GPU), an application specific integrated circuit (ASIC), a digital signal processor (DSP), a field programmable gate array (FPGA), a digital circuit, an analog circuit, a microcontroller, any other type of processor, or any combination thereof.
The memory 804 may comprise a device for storing and retrieving data, processor executable instructions, or any combination thereof. The memory 804 may include non-volatile and/or volatile memory, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a flash memory. The memory 804 may comprise a single device or multiple devices that may be disposed on one or more dedicated memory devices or on a processor or other similar device. Alternatively or in addition, the memory 804 may include an optical, magnetic (hard-drive) or any other form of data storage device.
The memory 804 may store computer code, such as the spatial location determiner 602 and the spatial processor 606 as described herein. The computer code may include instructions executable with the processor 802. The computer code may be written in any computer language, such as C, C++, assembly language, channel program code, and/or any combination of computer languages. The memory 804 may store information in data structures including, for example, feedback coefficients.
The I/O interface 806 may be used to connect devices such as, for example, the microphones 102, the audio transducers 206, the external inputs 608 and to other components of the system 800.
All of the disclosure, regardless of the particular implementation described, is exemplary in nature, rather than limiting. The system 800 may include more, fewer, or different components than illustrated in the figures.
The functions, acts or tasks illustrated in the figures or described herein may be executed in response to one or more sets of logic or instructions stored in or on computer readable media. The functions, acts or tasks are independent of the particular type of instruction set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firmware, microcode and the like, operating alone or in combination. Likewise, processing strategies may include multiprocessing, multitasking, parallel processing, distributed processing, and/or any other type of processing. In one embodiment, the instructions are stored on a removable media device for reading by local or remote systems. In other embodiments, the logic or instructions are stored in a remote location for transfer through a computer network or over telephone lines. In yet other embodiments, the logic or instructions may be stored within a given computer such as, for example, a CPU.
While various embodiments of the system and method for speech reinforcement have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the present invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
This application claims the benefit of priority from U.S. Provisional Application No. 62/095,510, filed Dec. 22, 2014, which is incorporated by reference.