The present invention relates to a sound processing system and a sound processing method.
In general, speakers are installed at a plurality of positions in a vehicle interior. For example, a right front speaker in a right door part and a left front speaker in a left door part are installed at symmetrical positions with respect to a center line of a vehicle interior space. However, these speakers are not in symmetrical positions with respect to a listening position of a listener (driver seat, front passenger seat, rear seat, and the like).
For example, if a listener is sitting in the driver seat, the distance between the right front speaker and the listener is not equal to the distance between the left front speaker and the listener. As an example, for a right-hand drive car, the former distance is shorter than the latter distance. Therefore, when sound is output from the speakers of the two door parts at the same time, the listener sitting in the driver seat generally hears the sound output from the right front speaker first, followed by the sound output from the left front speaker. The difference in distance between the listening position of the listener and each of the plurality of speakers (i.e., the difference in the time required for the reproduced sound emitted from each speaker to arrive) causes a bias in sound image localization due to the Haas effect.
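As an illustrative calculation (the distances are assumed for this example and are not taken from the specification): if the right front speaker is 1.0 m from the driver and the left front speaker is 1.4 m away, the arrival-time difference is (1.4 m − 1.0 m) ÷ 343 m/s ≈ 1.2 milliseconds, which is more than enough for the Haas effect to pull the perceived sound image toward the nearer right speaker.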
Various technologies are known for improving such sound image localization bias (for example, see Patent Document 1: Japanese Unexamined Patent Application Publication No. 2008-67087).
However, the conventional technology exemplified in Patent Document 1 may not sufficiently improve sound image localization bias.
Therefore, in view of the foregoing, an object of the present application is to provide a sound processing system and sound processing method suitable for improving sound image localization bias.
A sound processing system according to an embodiment of the present application includes: a function acquisition unit that acquires an interaural cross correlation function when listening to sound output from a plurality of speakers at a predetermined listening position; a position determination unit that determines a target position based on an interaural cross correlation function of a predetermined range of interaural cross correlation functions acquired by the function acquisition unit; a delay amount calculation unit that calculates a delay amount based on the target position determined by the position determination unit; and a delay unit that delays an audio signal, which is a signal of the sound, output to at least one of the plurality of speakers, based on the delay amount calculated by the delay amount calculation unit. The interaural cross correlation function of the predetermined range is an interaural cross correlation function in a range of ±n (where n is a positive value greater than 1) milliseconds.
According to one embodiment of the present application, a sound processing system and sound processing method suitable for improving sound image localization bias are provided.
The following description relates to a sound processing system and sound processing method according to an embodiment of the present application.
The speaker SPFR is a right front speaker embedded in a right door part (driver seat side door part). The speaker SPFL is a left front speaker embedded in a left door part (front passenger seat side door part). The vehicle A may have yet another speaker (e.g., rear speaker) installed (i.e., three or more speakers).
The binaural microphone MIC has, for example, a configuration in which a microphone is incorporated in each ear of a dummy head imitating a human head. Hereinafter, the microphone incorporated in the right ear of the dummy head will be referred to as “microphone MICR.” The microphone incorporated in the left ear of the dummy head will be referred to as “microphone MICL.”
The player 10 is connected to a sound source. The player 10 plays an audio signal input from the sound source, which is then output to the LSI 11.
Examples of the sound source include disc media, such as CDs (Compact Disc) and SACDs (Super Audio CD), that store digital audio data, and storage media such as HDDs (Hard Disk Drive) and USB (Universal Serial Bus) memory devices. A telephone (e.g., feature phone, smartphone) may also serve as the sound source. In this case, the player 10 passes the voice signal of a call input from the telephone through to the LSI 11.
The LSI 11 is an example of a computer provided with a CPU (Central Processing Unit), RAM (Random Access Memory), ROM (Read Only Memory), and the like. The CPU of the LSI 11 includes a single processor or a multiprocessor (in other words, at least one processor) that executes a program written in the ROM of the LSI 11 and comprehensively controls the sound processing device 2.
The LSI 11 acquires an interaural cross correlation function (IACF) when listening to sound output from a plurality of speakers (in the present embodiment, speakers SPFR and SPFL) at a predetermined listening position (e.g., driver seat, front passenger seat, or rear seat), determines a target position based on an interaural cross correlation function of a predetermined range of acquired interaural cross correlation functions, calculates a delay amount based on the determined target position, and delays an audio signal, which is a signal of the sound, output to at least one of the plurality of speakers, based on the calculated delay amount. The interaural cross correlation function of the predetermined range is an interaural cross correlation function in a range of ±n (where n is a positive value greater than 1) milliseconds (msec).
The audio signal after the time alignment processing by LSI 11 is converted to an analog signal by the D/A converter 12. The analog signal is amplified by the amplifier 13 and output to the speakers SPFR and SPFL. As a result, music recorded in the sound source, for example, is reproduced in the vehicle interior from the speakers SPFR and SPFL.
According to the present embodiment, the delay amount is calculated using the interaural cross correlation function over a wide range exceeding the ±1 millisecond range (i.e., ±n millisecond range) and time alignment processing is performed to improve the bias in sound image localization that tends to occur in a listening environment of a vehicle interior.
In the present embodiment, a vehicle-mounted sound processing system 1 is exemplified. However, sound image localization bias can also occur in listening environments such as rooms in a building and the like. Therefore, the sound processing system 1 may be implemented for listening environments other than a vehicle interior.
The display unit 14 is a device that displays various screens, such as a settings screen, and examples include LCDs (Liquid Crystal Display), ELs (Electro Luminescence), and other displays. The display unit 14 may be configured to include a touch panel.
The operation unit 15 includes operators such as switches, buttons, knobs, wheels, and the like of a mechanical system, a capacitance non-contact system, a membrane system, and the like. If the display unit 14 includes a touch panel, the touch panel also forms a portion of the operation unit 15.
As shown in the drawings, the LSI 11 includes a pre-processing unit 100 and a sound processing unit 200.
The pre-processing unit 100 performs pre-processing to improve sound image localization bias. As shown in the drawings, the pre-processing unit 100 includes an impulse response acquisition unit 101 and an impulse response recording unit 102, and the impulse response acquisition unit 101 includes a measuring signal generation unit 101a, a control unit 101b, and a response processing unit 101c.
The measuring signal generation unit 101a generates a predetermined measuring signal. The generated measuring signal is, for example, an M-sequence code (Maximal length sequence). The length of the measuring signal is at least twice the code length. Note that the measuring signal may be another type of signal, such as a TSP signal (Time Stretched Pulse) or the like, for example.
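By way of illustration, a minimal sketch of generating such an M-sequence measuring signal, assuming Python with scipy; the bit length and sample rate are illustrative values, not taken from the specification:

```python
import numpy as np
from scipy.signal import max_len_seq

FS = 48_000      # sample rate (assumption)
NBITS = 16       # MLS register length -> code length of 2**16 - 1 samples

# max_len_seq returns a 0/1 sequence; map it to a bipolar +/-1 signal.
mls, _ = max_len_seq(NBITS)
measuring_signal = 2.0 * mls.astype(np.float64) - 1.0

# Repeat the code back to back, since the text requires the measuring
# signal to be at least twice the code length.
measuring_signal = np.tile(measuring_signal, 2)
```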
The control unit 101b sequentially outputs the measuring signal input from the measuring signal generation unit 101a to each of the speakers SPFR and SPFL. As a result, predetermined measuring sounds are sequentially output from each of the speakers SPFR and SPFL at a predetermined time interval.
In the present embodiment, the measurement position of the impulse response (an example of a predetermined listening position) is the driver seat. Therefore, the binaural microphone MIC is installed in the driver seat. The installation position of the binaural microphone MIC changes based on the listening position.
The microphone MICR and microphone MICL first acquire the measuring sound output from the speaker SPFR. The microphone MICR and microphone MICL then acquire the measuring sound output from the speaker SPFL.
The control unit 101b outputs signals of the measuring sounds (i.e., measurement signals) acquired by each of the microphones MICR and MICL to the response processing unit 101c. Hereinafter, the measurement signal output from the speaker SPFR and acquired by the microphone MICR will be referred to as “measurement signal RR.” The measurement signal output from the speaker SPFL and acquired by the microphone MICR will be referred to as “measurement signal RL.” The measurement signal output from the speaker SPFR and acquired by the microphone MICL will be referred to as “measurement signal LR.” The measurement signal output from the speaker SPFL and acquired by the microphone MICL will be referred to as “measurement signal LL.”
The response processing unit 101c acquires an impulse response.
By way of example, the response processing unit 101c calculates an impulse response by determining a cross correlation function between the measurement signal RR and a reference measurement signal by mathematical operation, calculates an impulse response by determining a cross correlation function between the measurement signal RL and the reference measurement signal by mathematical operation, and synthesizes the two calculated impulse responses. The synthesized impulse response is an impulse response corresponding to the right ear of a listener. Hereinafter, the impulse response corresponding to the right ear of the listener will be referred to as “impulse response R′.”
The response processing unit 101c calculates an impulse response by determining a cross correlation function between the measurement signal LR and a reference measurement signal by mathematical operation, calculates an impulse response by determining a cross correlation function between the measurement signal LL and the reference measurement signal by mathematical operation, and synthesizes the two calculated impulse responses. The synthesized impulse response is an impulse response corresponding to the left ear of the listener. Hereinafter, the impulse response corresponding to the left ear of the listener will be referred to as “impulse response L′.”
Note that the reference measurement signal is the same as the measuring signal generated by the measuring signal generation unit 101a and is time-synchronized with it. The reference measurement signal is stored in the flash memory 16, for example.
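A minimal sketch of this response processing, assuming Python with numpy/scipy; the recordings below are stand-ins, and interpreting "synthesizes" as summing the two per-ear cross-correlation results is an assumption based on the description:

```python
import numpy as np
from scipy.signal import fftconvolve

def xcorr_ir(recorded: np.ndarray, reference: np.ndarray) -> np.ndarray:
    # Cross-correlation with the reference, computed as convolution with
    # the time-reversed reference; for an MLS this recovers the impulse response.
    return fftconvolve(recorded, reference[::-1], mode="full")

# Stand-in reference and recordings (in practice: the measuring signal and the
# four measurement signals RR, RL, LR, LL captured by microphones MICR and MICL).
rng = np.random.default_rng(0)
reference = np.sign(rng.standard_normal(4095))
rr = np.convolve(reference, [1.0, 0.3]); rl = np.convolve(reference, [0.8, 0.4])
lr = np.convolve(reference, [0.7, 0.5]); ll = np.convolve(reference, [1.0, 0.2])

ir_right = xcorr_ir(rr, reference) + xcorr_ir(rl, reference)  # impulse response R'
ir_left  = xcorr_ir(lr, reference) + xcorr_ir(ll, reference)  # impulse response L'
```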
The impulse response recording unit 102 writes the impulse responses R′ and L′ acquired by the impulse response acquisition unit 101 to, for example, the flash memory 16.
As shown in the drawings, the sound processing unit 200 includes a bandwidth division unit 201, a calculation unit 202, an input unit 203, a bandwidth division unit 204, a processing unit 205, a bandwidth synthesis unit 206, and an output unit 207.
The bandwidth division unit 201 includes, for example, a 1/N octave bandwidth filter. The bandwidth division unit 201 divides each of the impulse responses R′ and L′ written to the flash memory 16 into a plurality of bandwidths bw1 to bwN with the 1/N octave bandwidth filter, which are then output to the calculation unit 202.
Hereinafter, the impulse response R′ of each bandwidth after division will be referred to as “split bandwidth response Rd”. Furthermore, the impulse response L′ of each bandwidth after division will be referred to as “split bandwidth response Ld”.
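A minimal sketch of such a 1/N octave bandwidth division, assuming Butterworth band-pass sections in Python/scipy; the octave fraction, filter order, and band range are assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 48_000   # sample rate (assumption)
N = 3         # 1/3 octave bands (assumption)
# Band center frequencies from roughly 31 Hz to 16 kHz around 1 kHz.
centers = 1000.0 * 2.0 ** (np.arange(-5 * N, 4 * N + 1) / N)

def split_bands(x: np.ndarray) -> list[np.ndarray]:
    bands = []
    for fc in centers:
        lo = fc * 2.0 ** (-1.0 / (2 * N))
        hi = fc * 2.0 ** (1.0 / (2 * N))
        if hi >= FS / 2:                   # skip bands above Nyquist
            continue
        sos = butter(4, [lo, hi], btype="bandpass", fs=FS, output="sos")
        bands.append(sosfiltfilt(sos, x))  # zero-phase filtering per band
    return bands
```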
The calculation unit 202 generates various control parameters by performing the following processes for each of the bandwidths bw1 to bwN: calculation of the interaural cross correlation function based on the split bandwidth response Rd and split bandwidth response Ld; determination of the target position based on the calculated interaural cross correlation function; calculation of the delay amount based on the target position; and calculation of the phase correction amount. Details of each process by the calculation unit 202 are described later.
Note that the various control parameters generated by the calculation unit 202 include control parameters CPd and CPp corresponding to each of the bandwidths bw1 to bwN. The control parameter CPd is a control parameter for delaying one of either the audio signal output to the speaker SPFR or audio signal output to the speaker SPFL. The control parameter CPp is a control parameter for determining the phase correction amount of the audio signal by an all-pass filter.
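For illustration, the per-band control parameters could be carried in a structure such as the following; the field names are hypothetical, not taken from the specification:

```python
from dataclasses import dataclass

@dataclass
class BandControlParams:
    band_index: int              # which of the bandwidths bw1..bwN this applies to
    delay_target: str            # "R" or "L": which channel the CPd delay targets
    delay_seconds: float         # CPd: the delay amount
    phase_correction_deg: float  # CPp: phase correction amount (e.g., 180.0)
```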
The input unit 203 includes a selector connected to various sound sources. The input unit 203 outputs an audio signal S1 input from the sound source connected to the selector to the bandwidth division unit 204.
Note that in the present embodiment, the audio signal S1 is a two-channel signal that includes an R-channel audio signal S1R and an L-channel audio signal S1L.
The bandwidth division unit 204 includes, for example, a 1/N octave bandwidth filter. Similar to the bandwidth division unit 201, the bandwidth division unit 204 divides the audio signal S1 input from the input unit 203 into a plurality of bandwidths bw1 to bwN using the 1/N octave bandwidth filter, which are then output to the processing unit 205.
Hereinafter, the audio signal S1R in each bandwidth after division will be referred to as “split bandwidth audio signal S2R.” Furthermore, the audio signal S1L in each bandwidth after division will be referred to as “split bandwidth audio signal S2L.”
The delay processing unit 205a delays audio signals for each of the bandwidths bw1 to bwN. By way of example, for each of the bandwidths bw1 to bwN, the delay processing unit 205a delays one of the split bandwidth audio signal S2R or the split bandwidth audio signal S2L input from the bandwidth division unit 204 based on the control parameter CPd input from the calculation unit 202, and then outputs the signal to the phase correction unit 205b.
The phase correction unit 205b corrects the phase of the audio signal for each of the bandwidths bw1 to bwN. By way of example, the phase correction unit 205b includes an all-pass filter. As described in detail later, if the sign of the correlation value of the interaural cross correlation function is negative, the phase correction unit 205b applies the all-pass filter to the split bandwidth audio signals S2R and S2L to correct the phase based on the control parameter CPp input from the calculation unit 202, and then outputs the signals to the bandwidth synthesis unit 206. Furthermore, if the sign of the correlation value of the interaural cross correlation function is positive, the phase correction unit 205b outputs to the bandwidth synthesis unit 206 without applying the all-pass filter to the split bandwidth audio signals S2R and S2L.
Hereinafter, the split bandwidth audio signal S2R output from the phase correction unit 205b will be referred to as “split bandwidth audio signal S3R.” Furthermore, the split bandwidth audio signal S2L output from the phase correction unit 205b will be referred to as “split bandwidth audio signal S3L.”
The bandwidth synthesis unit 206 synthesizes the split bandwidth audio signal S3R in the bandwidths bw1 to bwN input from the phase correction unit 205b and the split bandwidth audio signal S3L in the bandwidths bw1 to bwN input from the phase correction unit 205b. An R-channel audio signal S4R obtained by synthesizing the split bandwidth audio signal S3R of the bandwidths bw1 to bwN and the L-channel audio signal S4L obtained by synthesizing the split bandwidth audio signal S3L of the bandwidths bw1 to bwN are output to the output unit 207.
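The synthesis amounts to summing the processed band signals for each channel; a minimal sketch in Python, with stand-in band lists (names hypothetical):

```python
import numpy as np

# Stand-ins for the split bandwidth audio signals S3R and S3L per band.
rng = np.random.default_rng(1)
bands_s3r = [rng.standard_normal(1024) for _ in range(24)]
bands_s3l = [rng.standard_normal(1024) for _ in range(24)]

s4r = np.sum(np.stack(bands_s3r), axis=0)  # R-channel audio signal S4R
s4l = np.sum(np.stack(bands_s3l), axis=0)  # L-channel audio signal S4L
```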
The output unit 207 converts the two-channel audio signals S4R and S4L input from the bandwidth synthesis unit 206 into analog signals, respectively, amplifies the converted analog signals, and then outputs the amplified signals from the speakers SPFR and SPFL into the vehicle interior. As a result, the music of the sound source, for example, is reproduced. Time alignment processing is performed based on the control parameter CPd in the delay processing unit 205a, such that sound image localization bias during music playback is improved.
In the pre-processing shown in the drawings, the measuring signal generation unit 101a first generates the measuring signal (step S101). The control unit 101b then sequentially outputs the generated measuring signal to each of the speakers SPFR and SPFL (step S102).
The binaural microphone MIC acquires the measurement sound sequentially output from each of the speakers SPFR and SPFL (step S103).
The control unit 101b outputs the measurement signals (specifically, the measurement signals RR, RL, LR and LL) input from the binaural microphone MIC to the response processing unit 101c.
The response processing unit 101c calculates the impulse response R′ based on the measurement signals RR and RL input from the control unit 101b and the impulse response L′ based on the measurement signals LR and LL input from the control unit 101b (step S104). The impulse response recording unit 102 writes the impulse responses R′ and L′ calculated by the response processing unit 101c to the flash memory 16 (step S105).
In the acoustic processing shown in the drawings, the bandwidth division unit 201 first divides each of the impulse responses R′ and L′ written to the flash memory 16 into the plurality of bandwidths bw1 to bwN (step S201).
The IACF calculation unit 202a calculates the interaural cross correlation function for each of the bandwidths bw1 to bwN (step S202). By way of example, the IACF calculation unit 202a calculates the interaural cross correlation function in accordance with the following equation:

$$\mathrm{IACF}(T) = \frac{\int_{t_1}^{t_2} R_d(t)\,L_d(t+T)\,dt}{\sqrt{\int_{t_1}^{t_2} R_d(t)^2\,dt \int_{t_1}^{t_2} L_d(t)^2\,dt}}$$

Rd(t) represents the amplitude of the split bandwidth response Rd at time t, that is, the sound pressure entering the right ear at time t. Ld(t) represents the amplitude of the split bandwidth response Ld in the same bandwidth as the split bandwidth response Rd at time t, that is, the sound pressure entering the left ear at time t. t1 and t2 represent measurement times; as an example, t1 is 0 milliseconds and t2 is 100 milliseconds. T represents a correlation time. The range of the correlation time T is greater than ±1 millisecond and is, for example, in a range of ±50 milliseconds.
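A minimal numerical sketch of this computation for one band, assuming Python/numpy and the example values in the text (t1 = 0 ms, t2 = 100 ms, correlation range ±50 ms); the sample rate and signal names are assumptions:

```python
import numpy as np

def iacf(rd: np.ndarray, ld: np.ndarray, fs: int = 48_000,
         t1: float = 0.0, t2: float = 0.1, t_max: float = 0.05):
    """Normalized interaural cross correlation of one split bandwidth pair."""
    i1, i2 = int(t1 * fs), int(t2 * fs)
    k_max = int(t_max * fs)
    r = rd[i1:i2]
    lpad = np.pad(ld, (k_max, k_max))        # zero-pad so every lag is defined
    norm = np.sqrt(np.sum(r ** 2) * np.sum(ld[i1:i2] ** 2))
    lags = np.arange(-k_max, k_max + 1)
    vals = np.array([np.sum(r * lpad[i1 + k_max + k : i2 + k_max + k])
                     for k in lags]) / norm
    return lags / fs, vals                   # lag times in seconds, correlations

# Usage (hypothetical band responses from the bandwidth division):
# lags_s, vals = iacf(rd_band, ld_band)
```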
In the interaural cross correlation function exemplified in the drawings, the closer the waveforms of the sounds reaching the right and left ears of the listener are to each other, the closer the absolute value of the correlation value approaches 1. In the present embodiment, the correlation value is calculated based on the right ear. Therefore, if the sound image is present on the right side of the listener, a high correlation peak is more likely to appear at a positive time. Furthermore, if the sound image is present on the left side of the listener, a high correlation peak is more likely to appear at a negative time. In light thereof, it is presumed that the sound image is localized slightly to the right of the listener in the illustrated example.
Thus, the IACF calculation unit 202a operates as a function acquisition unit that acquires the interaural cross correlation function when listening to sound output from a plurality of speakers (speakers SPFR and SPFL) at a predetermined listening position (e.g., driver seat, front passenger seat, or rear seat).
In the present embodiment, the following processing is performed to improve the slightly right-biased sound image localization described above.
By way of example, the target position determination unit 202b determines the target position based on the interaural cross correlation function calculated in step S202 for each of the bandwidths bw1 to bwN (step S203).
The interaural cross correlation function of the predetermined range is an interaural cross correlation function in a range of ±30 milliseconds, for example. The acoustic center C is the center of the entire shape formed by the interaural cross correlation function in the ±30 millisecond range on the coordinate plane, that is, the shape indicated by the hatched region in the drawings.
The target position determination unit 202b determines the calculated acoustic center C as the target position.
In another embodiment, the target position determination unit 202b may determine the peak position of the interaural cross correlation function near the acoustic center C as the target position. By way of example, the target position determination unit 202b may determine the peak position P1 nearest to the acoustic center C as the target position, or the largest peak position P2 within a certain range (e.g., ±10 milliseconds centered on the acoustic center C) as the target position.
Thus, the target position determination unit 202b operates as a position determination unit that determines the target position based on the interaural cross correlation function in a predetermined range (±n millisecond range) of the interaural cross correlation functions acquired by the IACF calculation unit 202a. In other words, the target position determination unit 202b operates as an acoustic center calculation unit that calculates the acoustic center C of the interaural cross correlation function in a predetermined range on a coordinate plane with the correlation value on the vertical axis and time on the horizontal axis, and determines the target position based on the acoustic center.
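A sketch of the acoustic-center determination, reusing the IACF samples from the previous sketch; interpreting the "center of the shape" as the time-axis centroid of the hatched area (weighted by the absolute correlation value) is an assumption:

```python
import numpy as np

def acoustic_center(lags_s: np.ndarray, vals: np.ndarray,
                    window_s: float = 0.03) -> float:
    """Time-axis centroid of the |IACF| area within the +/-30 ms window."""
    m = np.abs(lags_s) <= window_s
    w = np.abs(vals[m])                      # |correlation| as the area weight
    return float(np.sum(lags_s[m] * w) / np.sum(w))

def nearest_peak(lags_s: np.ndarray, vals: np.ndarray, center_s: float) -> float:
    """Alternative target: the IACF peak position nearest the acoustic center C."""
    v = np.abs(vals)
    peaks = np.flatnonzero((v[1:-1] > v[:-2]) & (v[1:-1] > v[2:])) + 1
    return float(lags_s[peaks[np.argmin(np.abs(lags_s[peaks] - center_s))]])
```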
The delay amount calculation unit 202c calculates the delay amount based on the target position determined by the target position determination unit 202b for each of the bandwidths bw1 to bwN (step S204).
By way of example, the delay amount calculation unit 202c calculates the delay amount for the audio signal output to one speaker SP such that the acoustic center C, which is the target position, is positioned at or near 0 seconds on the time axis. In the present embodiment, the acoustic center C appears at a positive time TC seconds on the time axis (in other words, the sound image is biased slightly to the right of the listener). Therefore, the delay amount calculation unit 202c calculates the time TC seconds as the delay amount for the audio signal output to the speaker SPFR.
The delay amount calculation unit 202c generates a control parameter CPd for delaying a delay target audio signal for each of the bandwidths bw1 to bwN (step S205).
The control parameter CPd includes a value indicating the delay target and a delay amount thereof. In the example above, the delay target is the audio signal output to the speaker SPFR, and the delay amount is the time TC seconds.
Note that when the target position is the peak position P1, the delay amount calculation unit 202c calculates the time TP1 seconds as the delay amount for the audio signal output to the speaker SPFR. When the target position is the peak position P2, the delay amount calculation unit 202c calculates the time TP2 seconds as the delay amount for the audio signal output to the speaker SPFR.
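A minimal sketch of applying the calculated delay as an integer-sample shift; the sample rate, the signal names, and the example delay value are assumptions, and fractional-delay filtering would be needed for sub-sample accuracy:

```python
import numpy as np

def apply_delay(x: np.ndarray, delay_s: float, fs: int = 48_000) -> np.ndarray:
    """Delay a band signal by delay_s seconds (rounded to whole samples)."""
    n = int(round(delay_s * fs))
    return np.concatenate([np.zeros(n), x])[:len(x)]

# Example: if the acoustic center C sits at TC = +0.8 ms (a made-up value),
# the signal to the nearer speaker SPFR is delayed by that amount:
# s2r_delayed = apply_delay(s2r_band, 0.8e-3)
```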
The sound processing unit 200 performs time alignment processing based on the control parameter CPd (step S206).
Specifically, the delay processing unit 205a of processing unit 205 performs delay processing based on the control parameter CPd for each of the bandwidths bw1 to bwN. Next, bandwidth synthesis processing by the bandwidth synthesis unit 206 and output processing by the output unit 207 are performed to reproduce an audio signal in which time alignment processing is applied to each of the bandwidths bw1 to bwN.
Thus, the delay processing unit 205a operates as a delay unit that delays the audio signal output to at least one of the plurality of speakers based on the delay amount calculated by the delay amount calculation unit 202c.
In the pre-processing unit 100, the impulse responses R′ and L′ of the sound after time alignment processing output from the output unit 207 are calculated and written to the flash memory 16 (see steps S103 to S106).
The bandwidth division unit 201 divides each of the impulse responses R′ and L′ of the sound after time alignment processing, written to the flash memory 16, into a plurality of bandwidths bw1 to bwN (step S207). The IACF calculation unit 202a calculates the interaural cross correlation function of the impulse responses R′ and L′ of the sound after time alignment processing for each of the bandwidths bw1 to bwN (step S208).
As shown in the drawings, in the interaural cross correlation function after the time alignment processing, the acoustic center C is positioned at or near 0 seconds on the time axis, and the bias in sound image localization is improved.
In the present embodiment, the target position is not determined by a simple method such as taking the highest peak position as the target position, but is determined based on the acoustic center, which also takes into account correlation values other than the peak position (in other words, values that affect the sense of sound image localization). Therefore, even in a listening environment such as a vehicle interior, where the graph of the interaural cross correlation function can take a complicated shape due to asymmetric speaker placement and a large amount of reflected and reverberant sound, the effect of improving the sound image localization bias can be sufficiently achieved.
Herein, if the sign of the correlation value with the largest absolute value among the interaural cross correlation functions in the predetermined range calculated in step S208 is negative, the phase of the sound from the speaker SPFR and the phase of the sound from the speaker SPFL are inverted relative to each other at a position where the sense of sound image localization is strong. This causes the listener to feel auditory discomfort.
Therefore, if the sign of the largest correlation value above is negative (step S209: YES), the phase correction amount calculation unit 202d generates a control parameter CPp to make the sign of the correlation value positive (step S210). If the sign of the largest correlation value above is positive (step S209: NO), the acoustic processing ends without phase correction being performed.
The control parameter CPp includes a value indicating the phase correction amount. The phase correction amount indicates, for example, a value for rotating the phase of a processing target bandwidth, of the bandwidths bw1 to bwN, by 180°.
The sound processing unit 200 performs phase correction processing based on the control parameter CPp (step S211).
Specifically, the phase correction unit 205b of the processing unit 205 performs phase correction processing based on the control parameter CPp by an all-pass filter for each of the bandwidths bw1 to bwN. The all-pass filter applied in the phase correction processing is, for example, a cascade connection of a predetermined number of second-order IIR (Infinite Impulse Response) filters. Note that the number of second-order IIR filters is determined as appropriate, taking into account the accuracy of phase correction and a filter processing load.
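A sketch of one such cascade, assuming Python/scipy; the pole radius and the placement of the pole pair at the band center frequency are assumptions, not a design procedure from the specification:

```python
import numpy as np
from scipy.signal import sosfilt

def allpass_sos(fc: float, r: float = 0.7, fs: int = 48_000) -> np.ndarray:
    """One second-order all-pass section: flat magnitude, phase rotation only."""
    theta = 2.0 * np.pi * fc / fs
    a1, a2 = -2.0 * r * np.cos(theta), r * r
    # Numerator is the reversed denominator: [b0, b1, b2, a0, a1, a2].
    return np.array([[a2, a1, 1.0, 1.0, a1, a2]])

def phase_correct(x: np.ndarray, fc: float, sections: int = 2) -> np.ndarray:
    """Cascade of second-order IIR all-pass sections for one band signal."""
    sos = np.vstack([allpass_sos(fc)] * sections)
    return sosfilt(sos, x)
```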
The phase correction processing by the phase correction unit 205b aligns the phase of the sound from the speaker SPFR and the sound from the speaker SPFL, such that music and the like are reproduced as an audibly natural sound.
The aforementioned is a description of exemplary embodiments. Embodiments of the present invention are not limited to those described above, and various modifications are possible within a scope of the technical concept of the present invention. For example, embodiments and the like that are explicitly indicated by way of example in the specification or combinations of obvious embodiments and the like are also included, as appropriate, in the embodiments of the present application.
For example, in the embodiment above, calculation and recording of the impulse responses R′ and L′ are performed as pre-processing to improve sound image localization bias, but the present invention is not limited thereto. In another embodiment, in addition to the calculation and recording of the impulse responses R′ and L′, bandwidth division by the bandwidth division unit 201 and the various processes by the calculation unit 202 (calculation of the interaural cross correlation function, determination of the target position, calculation of the delay amount, calculation of the phase correction amount, and generation of the control parameters) may be performed as pre-processing.
If a pair of speakers is installed on the rear seat side in addition to the speakers SPFR and SPFL, processing is performed by the following procedure. By way of example, a binaural microphone MIC is installed in a front seat (driver seat or front passenger seat), and the pre-processing and acoustic processing described above are performed for the front speaker pair and the rear speaker pair, respectively.
This application claims priority from Japanese Patent Application No. 2022-100749, filed in June 2022.