The present invention relates to a technique for mixing acoustic signals obtained by performing sound collection using multiple microphones.
In recent years, virtual reality (VR) systems using head-mounted displays have been proposed. In such a VR system, an image corresponding to the field of view of a user wearing the head-mounted display is displayed on a display.
Sound to be output from a speaker of the head-mounted display together with such images is collected by, for example, multiple microphones (hereinafter called “mics”).
Japanese Patent No. 3905364 discloses a configuration in which two acoustic signals, a right (R) channel and a left (L) channel, are generated by processing the acoustic signals collected by two mics based on the expansion/contraction rate of the sound field, and one set of two speakers is driven using the two acoustic signals of the R channel and the L channel, thereby adjusting the range of the sound field.
Although Japanese Patent No. 3905364 discloses driving two speakers while adjusting the range of the sound field of acoustic signals obtained by performing sound collection using multiple mics, it does not disclose driving three or more speakers in this manner.
According to an aspect of the present invention, a mixing apparatus for outputting drive signals for respectively driving N (N being an integer that is 3 or more) speakers based on acoustic signals obtained by performing sound collection using a plurality of microphones includes: a first speaker set processing unit to a P-th (P being N−1 or N) speaker set processing unit corresponding to respective speaker sets of two adjacent speakers among the N speakers, the first to P-th speaker set processing units each being configured to output a first drive signal for driving a first speaker of a corresponding speaker set and a second drive signal for driving a second speaker of the corresponding speaker set; and a compositing unit configured to composite drive signals for driving the same speaker among the 2P drive signals output by the first to P-th speaker set processing units. A K-th speaker set processing unit (K being an integer from 1 to P) includes: a mic set processing unit that is provided corresponding to each microphone set of two microphones among the plurality of microphones determined based on arrangement positions of the plurality of microphones, and is configured to process acoustic signals output by the two microphones of a corresponding microphone set and to output a first acoustic signal and a second acoustic signal; a first addition unit configured to add the first acoustic signals output by the mic set processing units corresponding to the microphone sets and to output the first drive signal for driving the first speaker of the corresponding speaker set; and a second addition unit configured to add the second acoustic signals output by the mic set processing units corresponding to the microphone sets and to output the second drive signal for driving the second speaker of the corresponding speaker set. Each mic set processing unit is configured to process the acoustic signals output by the two microphones of the corresponding microphone set based on an expansion/contraction coefficient for determining an expansion/contraction rate of a sound field, a shift coefficient for determining a shift amount of the sound field, and an attenuation coefficient for determining an attenuation amount of an acoustic signal output by a microphone.
Other features and advantages of the present invention will become clear through the following description given with reference to the accompanying drawings. Note that in the accompanying drawings, configurations that are the same or similar are denoted by the same reference numerals.
Hereinafter an exemplary embodiment of the present invention will be described with reference to the drawings. Note that the following embodiment is exemplary and the present invention is not limited to the content of the embodiment. Also, in the following drawings, constituent elements that are not needed in the description of the embodiment are omitted from the drawings.
Also, the acoustic signal processing unit 11 includes speaker compositing units corresponding to the respective speakers #2 to #N−1, each of which is included in two speaker sets. Note that the speaker compositing unit corresponding to the speaker #X (X being an integer from 2 to N−1) is the X-th speaker compositing unit. The two signals for driving the speaker #X output by the speaker set processing units, more specifically, a higher-number drive signal #X−1 and a lower-number drive signal #X, are input to the X-th speaker compositing unit. The X-th speaker compositing unit composites the higher-number drive signal #X−1 and the lower-number drive signal #X and outputs the resulting signal as a drive signal #X. Note that among the total of 2(N−1) signals output by the N−1 speaker set processing units, the signals for driving the speakers #1 and #N are only the lower-number drive signal #1 and the higher-number drive signal #N−1, and therefore the acoustic signal processing unit 11 outputs the lower-number drive signal #1 and the higher-number drive signal #N−1 as the drive signal #1 and the drive signal #N respectively.
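As a concrete illustration, the compositing rule described above might be sketched as follows. The function name and the representation of drive signals as NumPy sample arrays are illustrative assumptions, not part of the disclosed apparatus.

```python
import numpy as np

def composite_drive_signals(lower, higher):
    """Composite the 2(N-1) per-set signals into N speaker drive signals.

    lower[k-1]  : lower-number drive signal #k  (drives speaker #k),   k = 1..N-1
    higher[k-1] : higher-number drive signal #k (drives speaker #k+1), k = 1..N-1
    Returns the list of drive signals #1 .. #N.
    """
    n_sets = len(lower)           # N - 1 speaker sets
    n_spk = n_sets + 1            # N speakers
    drive = [lower[0]]            # speaker #1: lower-number #1 only
    for x in range(2, n_spk):     # speakers #2 .. #N-1: sum of two signals
        drive.append(higher[x - 2] + lower[x - 1])
    drive.append(higher[-1])      # speaker #N: higher-number #N-1 only
    return drive
```

For N=3 speakers, speaker #2 receives the sum of the higher-number signal #1 and the lower-number signal #2, while speakers #1 and #3 each receive a single per-set signal unchanged.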
As shown in
Hereinafter, the processing performed by the mic set processing unit will be described. First, it is assumed that an acoustic signal collected by a mic A is called an acoustic signal A, an acoustic signal collected by a mic B is called an acoustic signal B, and the acoustic signal A and the acoustic signal B are input to the mic set processing unit. The mic set processing unit performs a discrete Fourier transform on each of the acoustic signal A and the acoustic signal B for each predetermined time segment. Hereinafter, the frequency-domain signals obtained by performing the discrete Fourier transform on the acoustic signal A and the acoustic signal B are called a signal A and a signal B respectively. The mic set processing unit generates a frequency-domain signal R (a right channel, corresponding to the lower number) and a frequency-domain signal L (a left channel, corresponding to the higher number) from the signal A and the signal B using the following formula (1). Note that the processing shown in formula (1) is performed for each frequency component (bin) of the signal A and the signal B. Then, the mic set processing unit performs an inverse discrete Fourier transform on the frequency-domain signal R and signal L and outputs two acoustic signals, namely an acoustic signal R and an acoustic signal L. The lower-number compositing unit adds the acoustic signals R output by the first to M-th mic set processing units and outputs the lower-number drive signal #K. Similarly, the higher-number compositing unit adds the acoustic signals L output by the first to M-th mic set processing units and outputs the higher-number drive signal #K.
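The pipeline described above (segment-wise DFT, per-bin processing by formula (1), inverse DFT) can be sketched as follows. Since formula (1) itself is not reproduced in this excerpt, the per-bin mapping is left as a caller-supplied placeholder; the function name, the use of non-overlapping rectangular frames, and the parameters are illustrative assumptions.

```python
import numpy as np

def mic_set_process(sig_a, sig_b, per_bin_transform, frame_len=1024):
    """Sketch of the mic set processing pipeline.

    per_bin_transform(A, B, f) is a stand-in for formula (1): it maps the
    bin-f components of signal A and signal B to the (R, L) components.
    """
    out_r = np.zeros(len(sig_a))
    out_l = np.zeros(len(sig_b))
    for start in range(0, len(sig_a) - frame_len + 1, frame_len):
        a = np.fft.rfft(sig_a[start:start + frame_len])   # DFT per time segment
        b = np.fft.rfft(sig_b[start:start + frame_len])
        r = np.empty_like(a)
        l = np.empty_like(b)
        for f in range(len(a)):                           # formula (1), per bin
            r[f], l[f] = per_bin_transform(a[f], b[f], f)
        out_r[start:start + frame_len] = np.fft.irfft(r, frame_len)
        out_l[start:start + frame_len] = np.fft.irfft(l, frame_len)
    return out_r, out_l
```

With an identity placeholder (returning (A, B) unchanged), the pipeline reproduces the input signals, which is consistent with the coefficient setting m1=m2=1, τ=0, κ=1 described later.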
In formula (1), f is the frequency (bin) being subjected to processing, and Φ is the principal value of the argument (phase) of the acoustic signal A and the acoustic signal B. Accordingly, in formula (1), f and Φ are values that are determined according to the acoustic signal A and the acoustic signal B being subjected to processing. On the other hand, in formula (1), m1, m2, τ, and κ are variables that are determined by a variable determination unit and notified to the mic set processing units. Hereinafter, the technical meaning of the respective variables will be described.
m1 and m2 are attenuation coefficients, and are values that are 0 or more and 1 or less. Note that m1 determines the attenuation amount of the signal A and m2 determines the attenuation amount of the signal B. Hereinafter, it is assumed that m1 is called the attenuation coefficient of the mic A and m2 is called the attenuation coefficient of the mic B.
κ is a scaling (expansion/contraction) coefficient, and determines the range of the sound field. Note that the scaling coefficient κ is a value that is 0 or more and 2 or less. For example, it is assumed that the mic A and the mic B have been arranged as shown in
On the other hand, when m1 and m2 are set to 1 and τ is set to 0, if κ is made less than 1, the range of the sound field becomes shorter than when κ is 1, as shown in
τ is a shift coefficient, and has a value in a range from −x to +x. When τ=0 as described above, the matrix T has no influence on the signal A and the signal B. On the other hand, when τ≠0, the matrix T applies phase changes with the same absolute value but opposite signs to the signal A and the signal B. Accordingly, the position of the sound field shifts in the direction of the mic A or the mic B. Note that the direction of the shift is determined according to the sign of τ, and the greater the absolute value of τ is, the greater the shift amount is.
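For illustration only, the following assumes the shift matrix T acts diagonally on each bin, multiplying the two channel components by phase factors of equal magnitude and opposite sign; the exact form of T and its frequency dependence are not given in this excerpt.

```python
import numpy as np

def apply_shift(A, B, tau, f):
    """Hypothetical shift step: phase-rotate the bin-f components of the two
    channels by +tau*f and -tau*f radians respectively."""
    return A * np.exp(1j * tau * f), B * np.exp(-1j * tau * f)

# With tau = 0 both factors are 1 and the signals pass through unchanged;
# otherwise the inter-channel phase difference changes by 2*tau*f, which is
# what shifts the apparent position of the sound field.
A2, B2 = apply_shift(1.0 + 0j, 1.0 + 0j, 0.1, 5)
```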
The coefficient determination unit of the K-th speaker set processing unit determines the coefficients m1, m2, τ, and κ for each of the first to M-th mic set processing units, and notifies the first to M-th mic set processing units of the determined coefficients. Hereinafter, the way in which the coefficient determination unit of the K-th speaker set processing unit determines the coefficients of the mic set processing units will be described.
Segment information indicating segments is input by a segment determination unit 12 (
The coefficient determination unit of the K-th speaker set processing unit stores mic information indicating the arrangement positions of the multiple mics, and speaker information indicating the arrangement positions of the speakers. Also, the segment indicated by the segment information is divided into N−1 sub-segments for each of the first speaker set to the N−1-th speaker set, and the sub-segments corresponding to the K-th speaker set are determined.
L1 : L2 : L3 : . . . : L(N−1) = D1 : D2 : D3 : . . . : D(N−1)
L1 + L2 + L3 + . . . + L(N−1) = L
are satisfied. Note that as shown in
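The proportional division described above can be sketched as follows; the helper name is hypothetical.

```python
def sub_segment_lengths(L, spacings):
    """Divide a segment of length L into N-1 sub-segments whose lengths are
    proportional to the speaker spacings D_1 .. D_(N-1)."""
    total = sum(spacings)
    return [L * d / total for d in spacings]

# e.g. a segment of length 12 with speaker spacings in the ratio 1 : 2 : 3
lengths = sub_segment_lengths(12.0, [1.0, 2.0, 3.0])
# lengths -> [2.0, 4.0, 6.0]; they preserve the ratio D1:D2:D3 and sum to L
```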
The coefficient determination unit of the K-th speaker set processing unit categorizes the first to M-th mic sets based on the K-th sub-segment 64 and the arrangement positions of the mics.
Hereinafter, the way in which the coefficients to be used by the corresponding mic set processing units are determined for the first to third sets will be described. Note that hereinafter, a coefficient to be used by the mic set processing unit of a certain set will be expressed simply as “coefficient of mic set”. Also, it is assumed that, as shown in
For example, for the first set, the coefficient determination unit sets τ to 0 and κ to 1, and sets the attenuation coefficients of both mics to 1. That is, expansion/contraction and shifting of the sound field are not performed, and the attenuation coefficients are set to a value at which the acoustic signals collected by the two mics are not attenuated.
On the other hand, the coefficient determination unit determines the scaling coefficient κ and the shift coefficient τ of the third set such that the range of the sound field corresponds to an overlapping segment. That is, the coefficient determination unit determines the scaling coefficient κ of the third set based on the length L1 of the overlapping segment. Specifically, for example, letting L be the distance between the two mics in the third set, the scaling coefficient for the third set is determined so as to reach an expansion/contraction rate of L1/L. Accordingly, the coefficient determination unit determines the scaling coefficient κ of the third set such that the range of the sound field is shorter the shorter the length of the overlapping segment of the third set is. Also, the coefficient determination unit determines the shift coefficient τ of the third set such that the central position of the sound field is located at the central position of the overlapping segment. Accordingly, the coefficient determination unit determines the shift coefficient of the third set according to the distance between the center of the arrangement position of the two mics and the center of the overlapping segment. Also, the coefficient determination unit sets each of the attenuation coefficients of the two mics in the third set to 1. Alternatively, the coefficient determination unit sets the attenuation coefficient of the mic included in the K-th sub-segment 64 in the third set to a value that is the same as the attenuation coefficients of the two mics in the first set, and sets the attenuation coefficient of the mic not included in the K-th sub-segment 64 so as to be an attenuation amount that is greater than the attenuation amount of the mic included in the K-th sub-segment 64. 
Alternatively, the coefficient determination unit can set the attenuation coefficient of the mic not included in the K-th sub-segment 64 of the third set such that the attenuation amount increases the greater the length of the non-overlapping segment, that is, the maximum length L2 from the arrangement position of the mic to the K-th sub-segment 64 is.
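The determination of the third-set coefficients described above might be sketched as follows. The identification of κ with the rate L1/L and the linear mapping from the centre offset to τ (via a hypothetical shift_per_unit factor) are illustrative assumptions, not the disclosed implementation.

```python
def third_set_coefficients(mic_a_pos, mic_b_pos,
                           overlap_start, overlap_end, shift_per_unit=1.0):
    """Hypothetical sketch: contract the sound field to the overlapping
    segment and shift its centre onto the centre of the overlap."""
    L = abs(mic_b_pos - mic_a_pos)            # distance between the two mics
    L1 = overlap_end - overlap_start          # length of the overlapping segment
    kappa = L1 / L                            # expansion/contraction rate L1/L
    mic_centre = (mic_a_pos + mic_b_pos) / 2
    overlap_centre = (overlap_start + overlap_end) / 2
    tau = shift_per_unit * (overlap_centre - mic_centre)  # shift toward overlap
    return kappa, tau

# Overlap of length 1 centred between mics that are 2 apart: the field is
# halved (kappa = 0.5) and no shift is needed (tau = 0).
kappa, tau = third_set_coefficients(0.0, 2.0, 0.5, 1.5)
```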
Furthermore, for example, the coefficient determination unit sets τ to 0 and κ to 1 for the second set, similarly to the first set. However, the attenuation coefficients of the two mics are set to values whose attenuation amounts are greater than those of the attenuation coefficients set for the mics in the first and third sets. For example, the coefficient determination unit sets the attenuation coefficients of the two mics in the second set to the value at which the attenuation amount is greatest, that is, 0, or to a predetermined value near 0.
For example, as shown in
In the present embodiment, the acoustic signal processing unit 11 includes the first to N−1-th speaker set processing units, and the first to N−1-th speaker set processing units output drive signals for reproducing the sound fields of the first to N−1-th sub-segments using the two speakers included in each of the first to N−1-th speaker sets. Then, the acoustic signal processing unit 11 outputs the drive signals for driving the speakers. Note that two signals for driving the same speaker among the 2(N−1) drive signals output by the first to N−1-th speaker set processing units are composited. By reproducing the sound fields of the sub-segments to which the speaker sets arranged as shown in
Finally, the segment determination unit 12 determines the segment based on a user operation. For example, if the user directly designates a segment, the segment determination unit 12 functions as a reception unit for receiving the operation of the user designating the segment. In this case, the segment determination unit 12 outputs the segment designated by the user to the acoustic signal processing unit 11. On the other hand, for example, if applied to viewing of an image on a head-mounted display for VR, or viewing of a 360-degree panorama image on a tablet, the segment determination unit 12 calculates the segment based on the range of the image viewed by the user and outputs the calculated segment to the acoustic signal processing unit 11.
Note that in the present embodiment, the segment is divided into sub-segments according to the proportions of the arrangement intervals of the speakers; however, if it is a prerequisite that the speakers are arranged at equal intervals, it is possible to use a configuration in which the segment is divided into sub-segments of equal length. In this case, the speaker information indicating the arrangement positions of the speakers is not necessary.
Note that in the present embodiment, N speakers are arranged in numerical order along a straight line or a curved line, and N−1 speaker sets are thus formed. However, the N speakers can also be arranged on a closed curve, for example on a circular circumference, in which case the N speakers can form N speaker sets. In this case, in addition to the configuration shown in
The mixing apparatus 10 according to the present invention can be realized using a program that causes a computer including one or more processors and a storage unit to function as the above-described mixing apparatus 10. The program can be stored in a non-transitory computer-readable storage medium or distributed via a network. The program is stored in the storage unit and executed by a processor, and thereby the functions of the units shown in
The present invention is not limited to the above-described embodiments, and various changes and modifications are possible without departing from the spirit and scope of the present invention. Accordingly, the following claims are attached in order to apprise the public of the scope of the present invention.
| Number | Date | Country | Kind |
| --- | --- | --- | --- |
| JP2018-182012 | Sep 2018 | JP | national |
This application is a continuation of International Patent Application No. PCT/JP2019/032668 filed on Aug. 21, 2019, which claims priority to and the benefit of Japanese Patent Application No. 2018-182012 filed on Sep. 27, 2018, the entire disclosures of which are incorporated herein by reference.
| Number | Name | Date | Kind |
| --- | --- | --- | --- |
| 20200162816 | Horiuchi | May 2020 | A1 |
| Number | Date | Country |
| --- | --- | --- |
| 2003-169399 | Jun 2003 | JP |
| 2006-033501 | Feb 2006 | JP |
| 3905364 | Apr 2007 | JP |
| 2019-068210 | Apr 2019 | JP |
| Entry |
| --- |
| Horiuchi, et al., “Sound Field Expansion Based on Spectral Modification Between Multiple Channels”, ITE Annual Convention 2017, 2 pages. |
| Number | Date | Country |
| --- | --- | --- |
| 20210185439 A1 | Jun 2021 | US |
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | PCT/JP2019/032668 | Aug 2019 | US |
| Child | 17186591 | | US |