This Nonprovisional application claims priority under 35 U.S.C. § 119(a) on Patent Application No. 2020-096756 filed in Japan on Jun. 3, 2020, the entire contents of which are hereby incorporated by reference.
One embodiment of the present disclosure relates to a sound signal processing method for processing an obtained sound signal.
In facilities such as a concert hall, various genres of music are played and speeches such as a lecture are given. Such facilities are required to have various acoustic characteristics (for example, reverberant characteristics). For example, a musical performance requires relatively long reverberation, and a speech requires relatively short reverberation.
However, in order to physically change the reverberant characteristics in a hall, it is necessary to change the size of an acoustic space by, for example, moving the ceiling, which requires a very large-scale facility.
In view of the above, for example, a sound field control device as shown in JP H06-284493 A performs processing that supports a sound field by generating a reverberant sound by processing the sound obtained by a microphone with a finite impulse response (FIR) filter, and outputting the reverberant sound from a speaker installed in a hall.
However, merely adding a reverberant sound blurs the sense of localization. Recently, it has been desired to realize clearer sound image localization and richer space expansion.
In view of the above, an object of one embodiment of the present disclosure is to provide a sound signal processing method for controlling a richer acoustic space.
The sound signal processing method includes receiving a sound signal, generating an early reflection sound control signal that reproduces an early reflection sound and a reverberant sound control signal that reproduces a reverberant sound from the sound signal, controlling volume of the sound signal and distributing the sound signal to generate a direct sound control signal that reproduces a direct sound, and mixing the direct sound control signal, the early reflection sound control signal, and the reverberant sound control signal to generate an output signal.
The sound signal processing method can realize clearer sound image localization and richer space expansion.
The room 62 constitutes a space having a substantially rectangular parallelepiped shape. A sound source 65 exists on a stage 60 at the front of the room 62. The back of the room 62 corresponds to audience seats where listeners sit. The sound source 65 is, for example, a voice, a singing sound, an acoustic musical instrument, an electric musical instrument, an electronic musical instrument, or the like.
The shape of the room 62, the arrangement of sound sources, and the like are not limited to the example shown in
The sound field support system 1 includes, in the room 62, a directional microphone 11A, a directional microphone 11B, a directional microphone 11C, an omnidirectional microphone 12A, an omnidirectional microphone 12B, an omnidirectional microphone 12C, speakers 51A to 51J, and speakers 61A to 61F.
The speakers 51A to 51J are set on a wall surface. The speakers 51A to 51J are speakers with relatively high directivity, and mainly output sound toward audience seats. The speakers 51A to 51J output an early reflection sound control signal that reproduces an early reflection sound. Further, the speakers 51A to 51J output a direct sound control signal that reproduces a direct sound of the sound source.
The speakers 61A to 61F are installed on the ceiling. The speakers 61A to 61F are speakers with relatively low directivity, and output sound to the entire room 62. The speakers 61A to 61F output a reverberant sound control signal that reproduces a reverberant sound. Further, the speakers 61A to 61F output a direct sound control signal that reproduces a direct sound of the sound source. The number of speakers is not limited to the number shown in
The directional microphone 11A, the directional microphone 11B, and the directional microphone 11C mainly collect the sound of the sound source 65 on the stage. However, as shown in
The omnidirectional microphone 12A, the omnidirectional microphone 12B, and the omnidirectional microphone 12C are installed on the ceiling. The omnidirectional microphone 12A, the omnidirectional microphone 12B, and the omnidirectional microphone 12C collect the entire sound in the room 62, including the direct sound of the sound source 65 and the reflection sound in the room 62. The number of the directional microphones and the omnidirectional microphones shown in
In
The CPU constituting the sound signal processor 10 reads an operation program stored in the memory 31 and controls each configuration. The CPU functionally constitutes the position information obtainer 29, the impulse response obtainer 151, and the level balance adjuster 152 by the operation program. The operation program does not need to be stored in the memory 31. The CPU may download the operation program from, for example, a server (not shown) each time.
The gain adjuster 22 adjusts a gain of the sound signal obtained by the sound signal obtainer 21. The gain adjuster 22 sets, for example, a gain of the directional microphone at a position close to the sound source 65 to be high. The gain adjuster 22 is not an essential configuration in the present disclosure.
The mixer 23 mixes the sound signal whose gain is adjusted by the gain adjuster 22. Further, the mixer 23 distributes the mixed sound signal to a plurality of signal processing systems.
The mixer 23 outputs the distributed sound signal to the panning processor 23D, the FIR filter 24A, and the FIR filter 24B.
For example, the mixer 23 distributes the sound signals obtained from the directional microphone 11A, the directional microphone 11B, and the directional microphone 11C to ten signal processing systems according to the speakers 51A to 51J. Alternatively, the mixer 23 may distribute line-inputted sound signals to ten signal processing systems according to the speakers 51A to 51J.
Further, the mixer 23 distributes the sound signals obtained from the omnidirectional microphone 12A, the omnidirectional microphone 12B, and the omnidirectional microphone 12C to six signal processing systems according to the speakers 61A to 61F.
The mixer 23 outputs the sound signals mixed in the ten signal processing systems to the panning processor 23D and the FIR filter 24A. Further, the mixer 23 outputs the sound signals mixed in six signal processing systems to the FIR filter 24B.
Hereinafter, the six signal processing systems that output a sound signal to the FIR filter 24B will be referred to as a first system or a reverberant sound system, and the ten signal processing systems that output a sound signal to the FIR filter 24A will be referred to as a second system or an early reflection sound system. Further, ten signal processing systems that output a sound signal to the panning processor 23D will be referred to as a third system or a direct sound system. The FIR filter 24A corresponds to an early reflection sound control signal generator, and the FIR filter 24B corresponds to a reverberant sound control signal generator. The panning processor 23D corresponds to a direct sound control signal generator.
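The mixing and distribution described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the mix is a plain sample-by-sample sum, and the function names `mix` and `distribute` and the sample values are hypothetical.

```python
def mix(signals):
    """Sum equal-length signals sample by sample (assumed mixing rule)."""
    return [sum(samples) for samples in zip(*signals)]


def distribute(signal, n_systems):
    """Give each signal processing system its own copy of the mixed signal."""
    return [list(signal) for _ in range(n_systems)]


# Hypothetical two-sample frames from the three directional microphones
# (11A to 11C) and the three omnidirectional microphones (12A to 12C).
directional = [[0.1, 0.2], [0.0, 0.1], [0.3, 0.0]]
omni = [[0.05, 0.05], [0.02, 0.01], [0.03, 0.04]]

# Third system (direct sound): ten systems feeding the panning processor 23D.
direct_system = distribute(mix(directional), 10)
# Second system (early reflection sound): ten systems feeding the FIR filter 24A.
early_reflection_system = distribute(mix(directional), 10)
# First system (reverberant sound): six systems feeding the FIR filter 24B.
reverberant_system = distribute(mix(omni), 6)
```

Each system receives an independent copy so that later per-system processing (panning, convolution) does not alter the signal seen by another system.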
The mode of distribution is not limited to that in the above example. For example, sound signals obtained from the omnidirectional microphone 12A, the omnidirectional microphone 12B, and the omnidirectional microphone 12C may be distributed to the direct sound system or the early reflection sound system. Further, a line-inputted sound signal may be distributed to the reverberant sound system. Further, a line-inputted sound signal and sound signals obtained from the omnidirectional microphone 12A, the omnidirectional microphone 12B, and the omnidirectional microphone 12C may be mixed and distributed to the direct sound system or the early reflection sound system.
The mixer 23 may have a function of electronic microphone rotator (EMR). The EMR is a method of flattening the frequency characteristics of a feedback loop by changing a transfer function between a fixed microphone and a speaker over time. The EMR is a function that switches a connection relationship between a microphone and a signal processing system from moment to moment. The mixer 23 outputs a sound signal obtained from the directional microphone 11A, the directional microphone 11B, and the directional microphone 11C to the panning processor 23D and the FIR filter 24A by switching an output destination. Alternatively, the mixer 23 outputs the sound signal obtained from the omnidirectional microphone 12A, the omnidirectional microphone 12B, and the omnidirectional microphone 12C to the FIR filter 24B by switching an output destination. In this manner, the mixer 23 can flatten the frequency characteristics of an acoustic feedback system from a speaker to a microphone in the room 62. Further, the mixer 23 can ensure stability against howling.
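The switching of the connection relationship can be illustrated with a simple rotation. The text does not specify the switching schedule, so the round-robin rule and the function name `emr_routing` in this sketch are assumptions made only for illustration.

```python
def emr_routing(n_mics, n_systems, step):
    """Return, for each signal processing system, the index of the
    microphone connected to it at the given switching step.

    Assumption: the connection rotates by one microphone per step.
    """
    return [(system + step) % n_mics for system in range(n_systems)]


# Three omnidirectional microphones feeding the six reverberant-sound systems.
routing_at_step_0 = emr_routing(3, 6, 0)
routing_at_step_1 = emr_routing(3, 6, 1)
```

Because every system is fed by a different microphone at successive steps, the transfer function of each microphone-to-speaker loop changes over time, which is the property the EMR relies on.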
Next, the panning processor 23D controls the volume of each sound signal of the direct sound system according to the position of the sound source 65 (S12). In this manner, the panning processor 23D generates a direct sound control signal.
The panning processor 23D obtains the position information of the sound source 65 from the position information obtainer 29. The position information is information indicating two-dimensional or three-dimensional coordinates with respect to a certain position of the room 62. The position information of the sound source 65 can be obtained by a beacon and a tag that transmit and receive a radio wave of, for example, Bluetooth (registered trademark). In the room 62, at least three beacons are installed in advance. The sound source 65 includes a tag. That is, a tag is attached to a performer or an instrument. Each beacon transmits and receives radio waves to and from the tag. Each beacon measures the distance to the tag based on the time difference between transmitting and receiving of radio waves. If the position information obtainer 29 obtains the position information of the beacon in advance, the position of the tag can be uniquely obtained by measurement of the distances from at least three beacons to the tag.
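The position calculation from three measured distances can be sketched as a two-dimensional trilateration. This is a simplified illustration that assumes noise-free distances and three beacons that are not on one straight line; the function name `trilaterate` and the coordinates are hypothetical.

```python
import math


def trilaterate(beacons, distances):
    """Solve for the 2D tag position from three beacon positions and the
    measured beacon-to-tag distances.

    Subtracting the circle equation of the first beacon from the other two
    gives two linear equations in x and y, solved here by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = beacons
    d1, d2, d3 = distances
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("beacons must not lie on one straight line")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical beacon layout (metres) and a tag placed at (3, 4).
beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 8.0)]
tag = (3.0, 4.0)
distances = [math.hypot(tag[0] - bx, tag[1] - by) for bx, by in beacons]
estimated = trilaterate(beacons, distances)
```

With real measurements the distances contain errors, so a practical system would likely use more beacons and a least-squares fit; that extension is outside this sketch.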
The position information obtainer 29 obtains the position information of the sound source 65 by obtaining the position information of the tag measured in the above manner. Further, the position information obtainer 29 obtains the position information of each of the speakers 51A to 51J and the speakers 61A to 61F in advance.
The panning processor 23D controls the volume of each sound signal output to the speakers 51A to 51J and the speakers 61A to 61F so that the sound image is localized at the position of the sound source 65 based on the obtained position information and the position information of the speakers 51A to 51J and the speakers 61A to 61F, so as to generate the direct sound control signal.
The panning processor 23D controls the volume according to the distance between the sound source 65 and each of the speakers 51A to 51J and the speakers 61A to 61F. For example, the panning processor 23D increases the volume of the sound signal output to the speaker near the position of the sound source 65, and decreases the volume of the sound signal output to the speaker far from the position of the sound source 65. In this manner, the panning processor 23D can localize the sound image of the sound source 65 at a predetermined position. For example, in the example of
If the sound source 65 moves to the left side of the stage, the panning processor 23D changes the volume of each sound signal output to the speakers 51A to 51J and the speakers 61A to 61F based on the position information of the moved sound source 65. For example, the panning processor 23D increases the volume of the sound signal output to the speakers 51A, 51B, and 51F, and decreases the volume of the other speakers. In this manner, the sound image of the sound source 65 is localized on the left side of the stage when viewed from the audience seats.
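The distance-dependent volume control described above can be sketched as follows. The text states only that nearer speakers receive a higher volume and farther speakers a lower volume; the inverse-distance law, the constant-power normalization, and the names `panning_gains`, `rolloff`, and `min_distance` are assumptions for illustration.

```python
import math


def panning_gains(source, speakers, rolloff=1.0, min_distance=0.5):
    """Return one gain per speaker, larger for speakers nearer the source.

    Assumptions: gain falls off as 1 / distance**rolloff, distances are
    clamped to min_distance to avoid division by zero, and the gains are
    normalized so that their squares sum to 1 (constant total power).
    """
    raw = [
        1.0 / max(math.dist(source, speaker), min_distance) ** rolloff
        for speaker in speakers
    ]
    norm = math.sqrt(sum(g * g for g in raw))
    return [g / norm for g in raw]


# Hypothetical speaker coordinates (metres): one on the left, one on the right.
speakers = [(0.0, 0.0), (10.0, 0.0)]
gains_source_left = panning_gains((1.0, 0.0), speakers)
gains_source_right = panning_gains((9.0, 0.0), speakers)
```

When the source position changes, only the gains are recomputed; no delay is altered, which matches the volume-only control of the direct sound system.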
As described above, the sound signal processor 10 realizes a distribution processor of the present disclosure by the mixer 23 and the panning processor 23D.
Next, the FIR filter 24A and the FIR filter 24B perform indirect sound generation processing (S13). The indirect sound generation processing is processing of individually generating an early reflection sound control signal that reproduces an early reflection sound and a reverberant sound control signal that reproduces a reverberant sound. The FIR filter 24A and the FIR filter 24B correspond to an indirect sound generator of the present disclosure.
First, the impulse response obtainer 151 sets filter coefficients of the FIR filter 24A and the FIR filter 24B. Here, impulse response data set to the filter coefficient will be described.
As shown in
Data of the impulse response is stored in the memory 31. The impulse response obtainer 151 obtains data of an impulse response from the memory 31. However, the data of an impulse response does not need to be stored in the memory 31. The impulse response obtainer 151 may download the data of an impulse response from, for example, a server (not shown) each time.
The impulse response obtainer 151 may obtain the data of an impulse response in which only the early reflection sound is cut out in advance and set the data to the FIR filter 24A. Alternatively, the impulse response obtainer 151 may obtain the data of an impulse response including the direct sound, the early reflection sound, and the reverberant sound, cut out only the early reflection sound, and set the data to the FIR filter 24A. Similarly, when only the reverberant sound is used, the impulse response obtainer 151 may obtain the data of an impulse response obtained by cutting out only the reverberant sound in advance and set the data to the FIR filter 24B. Alternatively, the impulse response obtainer 151 may obtain the data of an impulse response including the direct sound, the early reflection sound, and the reverberant sound, cut out only the reverberant sound, and set the data to the FIR filter 24B.
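Cutting the early reflection sound and the reverberant sound out of one full impulse response can be sketched as a split by time. The boundary times are hypothetical parameters; the text does not state how the boundaries are chosen, and the function name `split_impulse_response` is an assumption.

```python
def split_impulse_response(ir, sample_rate, early_start_s, reverb_start_s):
    """Split a full impulse response into an early reflection part and a
    reverberant part.

    Samples before early_start_s (the direct sound) are discarded. Samples
    outside each part are replaced by zeros rather than removed, so each
    part keeps its original timing.
    """
    early_start = int(round(early_start_s * sample_rate))
    reverb_start = int(round(reverb_start_s * sample_rate))
    early = [h if early_start <= n < reverb_start else 0.0
             for n, h in enumerate(ir)]
    reverb = [h if n >= reverb_start else 0.0 for n, h in enumerate(ir)]
    return early, reverb


# Hypothetical six-tap impulse response sampled at 1 kHz.
full_ir = [1.0, 0.0, 0.5, 0.4, 0.1, 0.05]
early_ir, reverb_ir = split_impulse_response(full_ir, 1000, 0.002, 0.004)
```

The early part would be set to the FIR filter 24A and the reverberant part to the FIR filter 24B.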
The data of an impulse response may be obtained at any position in the space 620. However, it is preferable to measure the data of an impulse response of the early reflection sound using a directional microphone installed near a wall surface. The early reflection sound is a clear reflection sound in the direction of arrival. Therefore, by measuring the data of an impulse response with a directional microphone installed near a wall surface, it is possible to precisely obtain the reflection sound data of the target space. In contrast, the reverberant sound is a reflection sound in which the direction of arrival of the sound is uncertain. Therefore, the data of an impulse response of the reverberant sound may be measured with a directional microphone installed near the wall surface, or may be measured by using an omnidirectional microphone different from one used for the early reflection sound.
The FIR filter 24A convolves data of different impulse responses with ten sound signals of the second system. When there are a plurality of signal processing systems, the FIR filter 24A and the FIR filter 24B may be provided for each of the signal processing systems. For example, ten of the FIR filters 24A may be provided.
As described above, when a directional microphone installed near a wall surface is used, the data of an impulse response is measured with a separate directional microphone for each signal processing system. For example, as shown in
The FIR filter 24A convolves the data of an impulse response with each sound signal of the second system. The FIR filter 24B convolves the data of an impulse response with each sound signal of the first system.
The FIR filter 24A generates an early reflection sound control signal that reproduces an early reflection sound in a predetermined space by convolving the data of an impulse response of a set early reflection sound with an input sound signal. The FIR filter 24B generates a reverberant sound control signal that reproduces a reverberant sound in a predetermined space by convolving the data of an impulse response of a set reverberant sound with an input sound signal.
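The convolution performed by an FIR filter can be written directly from its definition. The following is a minimal, unoptimized sketch; a real-time system would more likely use a block-based or FFT-based method, and the function name `fir_convolve` is hypothetical.

```python
def fir_convolve(signal, impulse_response):
    """Convolve an input sound signal with impulse response data.

    Each input sample adds a scaled, delayed copy of the impulse response
    to the output, which is the direct-form definition of FIR filtering.
    """
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


# A two-sample input and a three-tap impulse response with one reflection
# arriving two samples late at half amplitude.
control_signal = fir_convolve([1.0, 2.0], [1.0, 0.0, 0.5])
```

Setting early reflection impulse response data as the taps yields an early reflection sound control signal, and setting reverberant impulse response data yields a reverberant sound control signal.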
The level setter 25A adjusts the level of the early reflection sound control signal. The level setter 25B adjusts the level of the reverberant sound control signal. The level balance adjuster 152 sets a level adjustment amount of the panning processor 23D, the level setter 25A, and the level setter 25B. The level balance adjuster 152 refers to a level of each of the direct sound control signal, the early reflection sound control signal, and the reverberant sound control signal, and adjusts the level balance of these signals. For example, the level balance adjuster 152 adjusts the balance between the level of the last component in time of the direct sound control signal and the level of the first component in time of the early reflection sound control signal. Further, the level balance adjuster 152 adjusts the balance between the level of the last component in time of the early reflection sound control signal and the level of the first component in time of the reverberant sound control signal. Alternatively, the level balance adjuster 152 may adjust the balance between the power of a plurality of components of the latter half in time of the early reflection sound control signal and the power of a component of the first half in time of the reverberant sound control signal. In this manner, the level balance adjuster 152 can individually control the sounds of the direct sound control signal, the early reflection sound control signal, and the reverberant sound control signal, and control the sounds to be in an appropriate balance according to the space to which the sounds are applied.
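One possible form of the balance between the last component of the early reflection sound and the first component of the reverberant sound is sketched here. It assumes that "balance" means making the two levels equal at the point where the signals join, which the text does not specify; the function name `balance_gain` and the sample values are hypothetical.

```python
def balance_gain(early, reverb, eps=1e-12):
    """Return a gain for `reverb` so that its first nonzero component has
    the same magnitude as the last nonzero component of `early`.

    Assumption: equal levels at the join are the balance target. If either
    sequence has no nonzero component, the gain is left at 1.0.
    """
    last_early = next((abs(v) for v in reversed(early) if abs(v) > eps), None)
    first_reverb = next((abs(v) for v in reverb if abs(v) > eps), None)
    if last_early is None or first_reverb is None:
        return 1.0
    return last_early / first_reverb


# Hypothetical component levels of the two signals.
early_part = [0.0, 0.5, 0.4, 0.0]
reverb_part = [0.0, 0.0, 0.1, 0.05]
gain = balance_gain(early_part, reverb_part)
balanced_reverb = [gain * v for v in reverb_part]
```

The resulting gain would correspond to a level adjustment amount set to the level setter 25B; the same comparison could be made on summed power over several components instead of single components.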
The delay adjuster 28 adjusts delay time according to the distance between a given microphone and a plurality of speakers. For example, for a plurality of speakers, the delay adjuster 28 sets a smaller delay time for a speaker at a shorter distance from the directional microphone 11B. Alternatively, the delay adjuster 28 adjusts the delay time based on the positions of the sound source and the microphone used to measure the impulse response in the space 620 that reproduces the sound field. For example, in the FIR filter 24A, when the impulse response data measured at the directional microphone 510J installed in the space 620 is convolved with the sound signal for the speaker 51J, the delay time corresponding to the distance between the directional microphone 510J and the sound source 65 in the space 620 is set to the delay time of the speaker 51J in the delay adjuster 28. In this manner, the early reflection sound control signal and the reverberant sound control signal reach the listener later than the direct sound control signal, so that clear sound image localization and rich space expansion are realized.
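A delay time that corresponds to a distance can be computed from the speed of sound. This sketch assumes a speed of sound of 343 m/s and whole-sample delays; the names `delay_samples` and `apply_delay` and the numeric values are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # metres per second, assumed (air at about 20 degrees C)


def delay_samples(distance_m, sample_rate):
    """Convert a propagation distance into a delay in whole samples."""
    return int(round(distance_m / SPEED_OF_SOUND * sample_rate))


def apply_delay(signal, n_samples):
    """Delay a signal by prepending n_samples zeros."""
    return [0.0] * n_samples + list(signal)


# A hypothetical distance of 3.43 m at a 48 kHz sample rate gives 10 ms.
n = delay_samples(3.43, 48000)
delayed = apply_delay([1.0, 0.5], 2)
```

Applying such a delay to the early reflection and reverberant sound control signals, and not to the direct sound control signal, keeps the direct sound first in arrival order.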
It is preferable that the sound signal processor 10 not perform delay adjustment on the direct sound control signal. If the position of the sound source 65 changes significantly in a short period of time when the sound image localization is controlled by the delay adjustment, phase interference occurs between the sounds output from a plurality of speakers. By not performing the delay adjustment on the direct sound control signal, the sound signal processor 10 can maintain the timbre of the sound source 65 without causing phase interference even if the position of the sound source 65 changes significantly in a short time.
Next, the output signal generator 26 mixes the direct sound control signal, the early reflection sound control signal, and the reverberant sound control signal to generate an output signal (S14). The output signal generator 26 may perform gain adjustment of each signal, adjustment of the frequency characteristics, and the like at the time of mixing.
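The mixing of the three control signals for one speaker can be sketched as a weighted sum. The per-signal gains stand in for the optional gain adjustment mentioned above; the function name `mix_output`, the zero padding of shorter signals, and the sample values are assumptions for illustration.

```python
def mix_output(direct, early, reverb, gains=(1.0, 1.0, 1.0)):
    """Mix the direct, early reflection, and reverberant sound control
    signals for one speaker into a single output signal.

    Shorter signals are zero-padded to the longest length before summing.
    """
    length = max(len(direct), len(early), len(reverb))

    def pad(signal):
        return list(signal) + [0.0] * (length - len(signal))

    g_direct, g_early, g_reverb = gains
    return [
        g_direct * d + g_early * e + g_reverb * r
        for d, e, r in zip(pad(direct), pad(early), pad(reverb))
    ]


# Hypothetical control signals of different lengths.
output_signal = mix_output([1.0], [0.0, 0.5], [0.0, 0.0, 0.25])
```

For a speaker that outputs only two of the three signals, such as a wall speaker that receives no reverberant sound control signal, the unused input can be passed as an empty list.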
The output unit 27 converts an output signal output from the output signal generator 26 into an analog signal. Further, the output unit 27 amplifies the analog signal. The output unit 27 outputs the amplified analog signal to a corresponding speaker (S15).
With the above configuration, the sound signal processor 10 obtains a sound signal, controls the volume of the sound signal and distributes the sound signal to a plurality of systems, generates a direct sound control signal, an early reflection sound control signal, and a reverberant sound control signal from the sound signal, and mixes the distributed sound signal, the direct sound control signal, the early reflection sound control signal, and the reverberant sound control signal to generate an output signal. In this manner, the sound signal processor 10 realizes clearer sound image localization and richer space expansion than before.
In particular, the sound signal processor 10 realizes the localization of a sound source by controlling the volume of a sound signal distributed to a plurality of speakers based on the position information of the sound source. Accordingly, it is possible to uniformly localize a clear sound image over a wide range in real time without depending on a reproduction environment such as the number and arrangement of speakers.
Further, the sound signal processor 10 outputs an early reflection sound control signal and a reverberant sound control signal from a plurality of speakers in addition to a direct sound control signal. In this manner, the audience listens to the early reflection sound control signal and the reverberant sound control signal in addition to the direct sound control signal. Therefore, the audience does not pay attention only to a specific speaker to which the direct sound control signal is output. Therefore, even when the number of speakers is small and the distance between the speakers is wide, the sound image is not localized only in a specific speaker.
The omnidirectional microphone 12A, the omnidirectional microphone 12B, and the omnidirectional microphone 12C collect the entire sound in the room 62, including the direct sound of the sound source 65 and the reflection sound in the room 62. Therefore, if the sound signal processor 10 generates a reverberant sound control signal using sound signals obtained by the omnidirectional microphone 12A, the omnidirectional microphone 12B, and the omnidirectional microphone 12C, the same reverberation is reproduced in the sound of the stage and in the sound of the audience seat. Therefore, for example, the same reverberation is reproduced in the sound of the performer and in the sound of the applause of the audience, and the audience can obtain a sense of unity.
Further, the early reflection sound has a smaller number of reflections than the reverberant sound that undergoes multiple reflections in the space. For this reason, the energy of the early reflection sound is higher than the energy of the reverberant sound. Therefore, by increasing the level of each speaker that outputs the early reflection sound control signal, the effect of the subjective impression of the early reflection sound can be improved, and the controllability of the early reflection sound can be improved.
Further, by reducing the number of speakers that output the early reflection sound control signal, it is possible to suppress an excessive increase in diffused sound energy. That is, the extension of the reverberation in the room due to the early reflection sound control signal can be suppressed, and the controllability of the early reflection sound can be improved.
As the speaker that outputs the direct sound control signal and the early reflection sound control signal is installed on the side of the room, which is located close to the audience, the speaker can be easily controlled to deliver a direct sound and an early reflection sound to the audience, and the controllability of the early reflection sound can be improved. Further, by installing the speaker that outputs the reverberant sound control signal on the ceiling of the room, it is possible to suppress the difference in the reverberant sounds due to the positions of the audience.
The description of the present embodiment is exemplary in all respects and is not restrictive. The scope of the present disclosure is shown not by the above-described embodiment but by the scope of claims. Furthermore, the scope of the present disclosure is intended to include all modifications within the meaning and scope equivalent to those of the claims.
Number | Date | Country | Kind |
---|---|---|---|
JP2020-096756 | Jun 2020 | JP | national |

Number | Name | Date | Kind |
---|---|---|---|
5642425 | Kawakami | Jun 1997 | A |
20070025560 | Asada | Feb 2007 | A1 |
20090296962 | Shirakihara | Dec 2009 | A1 |
20100135510 | Yoo | Jun 2010 | A1 |
20120063608 | Soulodre | Mar 2012 | A1 |
20130010984 | Hejnicki | Jan 2013 | A1 |
20160125871 | Shirakihara | May 2016 | A1 |
20200388296 | Von Tuerckheim | Dec 2020 | A1 |

Number | Date | Country |
---|---|---|
3026666 | Jun 2016 | EP |
H06284493 | Oct 1994 | JP |
2006245670 | Sep 2006 | JP |

Entry |
---|
Extended European search report issued in European Appln. No. 21177096.1 dated Oct. 26, 2021. |

Number | Date | Country | |
---|---|---|---|
20210385597 A1 | Dec 2021 | US |