This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2019-107707, filed on Jun. 10, 2019, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a storage medium storing a speaker direction determination program, a speaker direction determination method, and a speaker direction determination apparatus.
An existing wearable voice translation system achieves voice translation in a hands-free manner by switching between a source language and a target language based on a speaker direction that is a direction in which a speaker is present. In the voice translation system, when the determination accuracy of the speaker direction is low, translation may not properly be performed and thus, it is demanded to further improve the determination accuracy of the speaker direction. Related art is disclosed in, for example, Japanese Laid-open Patent Publication No. 2018-40982.
According to an aspect of the embodiments, a non-transitory computer-readable storage medium storing a program that causes a computer to execute a process, the process includes acquiring inclination information indicating an inclination of a housing including a plurality of microphones with respect to a predetermined direction of a reference position; acquiring noise information on noise contained in at least one of a plurality of sound signals acquired by the plurality of microphones; acquiring a physical quantity indicating at least one of a phase difference and a sound pressure difference based on the plurality of sound signals acquired by the plurality of microphones; generating a correction model corrected such that the physical quantity in a correspondence in a reference model indicating the correspondence between a sound incidence angle onto the plurality of microphones in the case where the housing is located at the reference position and the physical quantity acquired in the case where the housing is located at the reference position corresponds to noise level indicated by the acquired noise information; setting the physical quantity corresponding to the sound incidence angle associated with the inclination indicated by the acquired inclination information in the correction model as a threshold; and comparing the acquired physical quantity with the set threshold to determine a speaker direction that is a direction in which a speaker making a speech corresponding to the plurality of sound signals acquired by the plurality of microphones is present.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
It is desired to properly determine a speaker direction.
An example of a first embodiment will be described below with reference to figures.
For example, the voice translation unit 40 translates a first language into a second language when the speaker direction is a forward direction of a housing of the speaker direction determination apparatus 10, and translates the second language into the first language when the speaker direction is an upward direction of the housing of the speaker direction determination apparatus 10. The first language may be English, for example, and the second language may be Japanese, for example.
Angles of 0 degrees, 90 degrees, and −90 degrees are examples of the sound incidence angle. For example, when the sound incidence angle is 90 degrees or −90 degrees, the incidence direction of sound is parallel to the front surface of the housing, and when the sound incidence angle is 0 degrees, the incidence direction of sound is orthogonal to the front surface of the housing.
The units included in the speaker direction determination unit 20A may be formed as individual hardware circuits configured by wired logic. The units included in the speaker direction determination unit 20A may be implemented as one integrated circuit formed by integrating circuits corresponding to the units. The integrated circuit may be an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or the like. The units in the speaker direction determination unit 20A each may be a functional module implemented by a computer program executed on a processor of the speaker direction determination unit 20A.
The first time-frequency conversion unit 23 converts the time-domain sound signal acquired by the first sound acquisition unit 21 into a frequency-domain sound signal. Conversion of the time-domain sound signal into the frequency-domain sound signal may be fast Fourier transformation (FFT), for example. The second time-frequency conversion unit 24 converts the time-domain sound signal acquired by the second sound acquisition unit 22 into a frequency-domain sound signal.
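As a non-limiting illustration, the time-frequency conversion performed by the first and second time-frequency conversion units may be sketched in Python as follows. The function name, the frame length, and the choice of a Hann window are illustrative assumptions, not part of the specification:

```python
import numpy as np

def to_frequency_domain(frame, window=None):
    """Convert one time-domain frame to a frequency-domain signal via FFT.

    `frame` is a 1-D array of samples; applying a window (here Hann,
    an assumed choice) before the transform reduces spectral leakage.
    """
    if window is None:
        window = np.hanning(len(frame))
    return np.fft.rfft(frame * window)

# A 512-sample frame yields 257 frequency bins (rfft keeps bins 0..N/2),
# on the order of the roughly 256 frequency bands the text mentions.
frame = np.random.randn(512)
spectrum = to_frequency_domain(frame)
```

Each of the two microphone signals would be converted frame by frame in this manner before the phase difference is estimated.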
The phase difference estimation unit 25 that is an example of a physical quantity acquisition unit estimates a phase difference between the frequency-domain sound signal converted by the first time-frequency conversion unit 23 and the frequency-domain sound signal converted by the second time-frequency conversion unit 24. The phase difference, which is an example of the physical quantity, represents in the frequency domain the time difference of sound traveling from a sound source to each microphone, and corresponds to the argument (angle) of a sound signal expressed as a complex number.
A phase difference dp(k) is estimated according to an equation (1), for example.
dp(k)=θ1(k)−θ2(k)   (1)
dp(k) is a phase difference between a frequency-domain sound signal in a kth (k=0, 1, . . . , K−1) frequency band, which is converted by the first time-frequency conversion unit 23, and a frequency-domain sound signal in the kth frequency band, which is converted by the second time-frequency conversion unit 24. The number of frequency bands K may be 256, for example.
θ1(k) is a phase spectrum of the sound signal in the kth frequency band, which is converted by the first time-frequency conversion unit 23, and θ2(k) is a phase spectrum of the sound signal in the kth frequency band, which is converted by the second time-frequency conversion unit 24; they are calculated according to a formula (2).
θ1(k)=arg(z1(k))=atan(Im1(k)/Re1(k))
θ2(k)=arg(z2(k))=atan(Im2(k)/Re2(k))   (2)
As expressed as a formula (3), z1(k) is the frequency-domain sound signal in the kth frequency band, which is converted by the first time-frequency conversion unit 23, expressed as a complex number, Re1(k) is a real part of the complex number, and Im1(k) is an imaginary part of the complex number. z2(k) is the frequency-domain sound signal in the kth frequency band, which is converted by the second time-frequency conversion unit 24, expressed as a complex number, Re2(k) is a real part of the complex number, and Im2(k) is an imaginary part of the complex number.
z1(k)=Re1(k)+i Im1(k)
z2(k)=Re2(k)+i Im2(k)   (3)
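The estimation in formulas (1) through (3) can be sketched in Python as follows. Note that `np.angle` computes arg(z) as atan2(Im, Re), a quadrant-aware form of the atan(Im/Re) in formula (2); this substitution is an implementation choice, not stated in the text:

```python
import numpy as np

def phase_difference(z1, z2):
    """Estimate the per-band phase difference dp(k) = theta1(k) - theta2(k).

    z1, z2 are the complex frequency-domain signals from the two
    microphones (formula (3)).
    """
    theta1 = np.angle(z1)   # theta1(k) = arg(z1(k)), formula (2)
    theta2 = np.angle(z2)   # theta2(k) = arg(z2(k)), formula (2)
    return theta1 - theta2  # dp(k), equation (1)

z1 = np.array([1.0 + 1.0j])   # arg =  pi/4
z2 = np.array([1.0 - 1.0j])   # arg = -pi/4
dp = phase_difference(z1, z2)  # dp = pi/2
```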
The inclination acquisition unit 26 that is an example of an inclination information acquisition unit acquires a value indicating an inclination with respect to a reference position of the housing 11 of the speaker direction determination apparatus 10 from an inclination detection sensor such as an acceleration sensor disposed in the housing 11 of the speaker direction determination apparatus 10. As illustrated in
The acceleration sensor is a two or more-axis sensor without DC components being cut. A gyro sensor or a magnetic field sensor may be used in place of the acceleration sensor. The inclination of the housing 11 of the speaker direction determination apparatus 10 when worn by the user, which varies according to the body shape of the user wearing the speaker direction determination apparatus 10, may be measured and recorded in advance.
The determination boundary correction unit 28 corrects a speaker direction determination boundary, which is an example of a threshold, based on the value indicating the inclination with respect to the reference position of the housing 11 of the speaker direction determination apparatus 10, which is acquired by the inclination acquisition unit 26. The speaker direction determination boundary varies between the case where the housing 11 of the speaker direction determination apparatus 10 is not inclined with respect to the reference position as illustrated in
When the housing 11 is not inclined with respect to the reference position, the determination boundary is, for example, an estimated phase difference DB00 of the reference model in the case where the sound incidence angle is A00. When the estimated phase difference is equal to or smaller than DB00, the speaker direction is determined to be the upward direction. When the estimated phase difference is larger than DB00, the speaker direction is determined to be the forward direction.
When the housing 11 is inclined with respect to the reference position, the determination boundary is corrected to an estimated phase difference DB01 of the reference model at a sound incidence angle A01 corresponding to the inclination with respect to the reference position. When the estimated phase difference is equal to or smaller than DB01, the speaker direction is determined to be the upward direction, and when the estimated phase difference is larger than DB01, the speaker direction is determined to be the forward direction. As the inclination of the housing 11 with respect to the reference position becomes larger, the corrected determination boundary is further deviated from the uncorrected determination boundary.
The noise level estimation unit 27 that is an example of a noise information acquisition unit estimates a noise level, that is, the level of noise contained in the sound acquired by the first sound acquisition unit 21 and the second sound acquisition unit 22. The noise level is an example of the noise information. The noise level may be estimated according to any suitable existing method. The noise level may be an average of sound pressure in a non-speech section. The noise level may be calculated using the time-domain sound signal, and the average may be any of an arithmetic average, a geometric average, a harmonic average, and a moving average.
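One possible realization of this estimate, a moving average of sound pressure over non-speech frames, is sketched below. The function names, the smoothing constant, and the reliance on an external voice activity decision (`is_speech`) are all assumptions for illustration:

```python
import numpy as np

def frame_level_db(frame, eps=1e-12):
    """Sound pressure level of one time-domain frame in dB (relative scale)."""
    power = np.mean(frame ** 2)
    return 10.0 * np.log10(power + eps)

def update_noise_level(noise_level, frame, is_speech, alpha=0.95):
    """Moving-average noise level, updated only in non-speech sections.

    `is_speech` would come from a voice activity detector (not shown);
    alpha controls the smoothing, realizing the moving-average variant
    the text mentions.
    """
    if is_speech:
        return noise_level  # hold the estimate during speech
    level = frame_level_db(frame)
    return alpha * noise_level + (1.0 - alpha) * level
```

Any of the other averages listed (arithmetic, geometric, harmonic) could replace the recursive update.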
The model correction unit 29 that is an example of a model generation unit and a threshold setting unit corrects the reference model based on the estimated noise level to generate a correction model. As the surrounding noise level becomes larger, as illustrated in
When stationary noise exists in surroundings, as expressed as a formula (4), the phase spectra θt1(k) and θt2(k) each includes noise component zN(k).
θt1(k)=arg(z1(k)+zN(k))
θt2(k)=arg(z2(k)+zN(k)) (4)
In the phase difference expressed as a formula (5), as expressed as a formula (6), the phase difference gets closer to 0 as the noise component zN(k) comes closer to ∞.
When the noise level of surrounding stationary noise becomes large, the phase difference of target sound is buried, such that the sound phase difference approaches the phase difference of stationary noise.
The model correction unit 29 adjusts a correction amount of the determination boundary based on the noise level estimated by the noise level estimation unit 27. More specifically, the correction amount is adjusted such that the determination boundary comes closer to the uncorrected determination boundary as the noise level increases.
A formula (7) illustrates the correction model.
φ=f(α(np)*ap+(1−α(np))*pz) (7)
φ is the sound incidence angle, α( ) is a function for calculating a control parameter that depends on the noise level, np is the noise level, ap is the estimated phase difference, and pz is the estimated phase difference at the fixed point FP.
pz=0.0
f(ap)=9.0*ap+40.0
α(np)=0.156*np−7.8 (8)
ap is the estimated phase difference, and in more detail, may be an average value of the estimated phase differences from a highest frequency band to a lowest frequency band. np is the noise level, and the estimated phase difference pz at the fixed point FP may be set in advance. The functions f( ) and α( ) are previously derived by statistical regression. The functions f( ) and α( ) may be derived using a linear function, a trigonometric function, machine learning, or the like. Data on the reference model may be previously stored in a table or the like.
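The correction model of formulas (7) and (8) can be evaluated directly. The sketch below uses the concrete coefficients given in formula (8); only the function names are illustrative:

```python
def f(ap):
    """Reference model: sound incidence angle from estimated phase difference."""
    return 9.0 * ap + 40.0            # f(ap), formula (8)

def alpha(np_level):
    """Noise-level-dependent control parameter."""
    return 0.156 * np_level - 7.8     # alpha(np), formula (8)

def correction_model(ap, np_level, pz=0.0):
    """Sound incidence angle phi under the correction model, formula (7)."""
    a = alpha(np_level)
    return f(a * ap + (1.0 - a) * pz)

# At np = 60 dBA, alpha = 1.56, so the model reduces to 14.04*ap + 40.0:
angle = correction_model(0.5, 60.0)   # 14.04*0.5 + 40.0 = 47.02 degrees
```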
When the noise level np is 60 [dBA], the relation α(60)=0.156*60−7.8=1.56 holds, and a function fd(ap) indicating a correction model AM is expressed as a formula (9).
fd(ap)=f(1.56*ap+(1−1.56)*0.0)=14.04*ap+40.0   (9)
For example, the correction model AM has a larger inclination than the reference model OM (14.04>9.0), and when the estimated phase difference ap is 0, the correction model AM gives the same sound incidence angle as the reference model OM (40.0 [degrees]).
When the inclination of the housing 11 of the speaker direction determination apparatus 10 with respect to the reference position is θ [degree], a determination boundary Th(θ) of the reference model OM is expressed as a formula (10).
Th(θ)=f^−1(f(Th0)−θ)   (10)
Th0 is a determination boundary in the case where the housing 11 of the speaker direction determination apparatus 10 is located at the reference position. In the case of Th0=0.0, Th(θ)=−0.11θ, and when the inclination of the housing 11 of the speaker direction determination apparatus 10 with respect to the reference position is −10 [degree], a relation: Th(−10)=1.1 [rad] holds.
When the inclination of the housing 11 of the speaker direction determination apparatus 10 with respect to the reference position is θ [degree], a determination boundary Thd(θ) of the correction model AM is expressed as a formula (11).
Thd(θ)=fd^−1(fd(Thd0)−θ)   (11)
Thd0 is a determination boundary in the case where the housing 11 of the speaker direction determination apparatus 10 is located at the reference position. In the case of Thd0=0.0, Thd(θ)=−0.07θ, and when the inclination of the housing 11 of the speaker direction determination apparatus 10 with respect to the reference position is −10 [degree], a relation: Thd(−10)=0.71 [rad] holds. Thus, the determination boundary in the correction model AM shifts from the determination boundary of 1.1 [rad] corrected based on the inclination of the housing 11 with respect to the reference position in the reference model OM toward the determination boundary of 0.0 [rad] before correction based on the inclination of the housing 11.
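The boundary computations of formulas (10) and (11), with the numeric values worked through above, can be sketched as follows (the inverse functions are written out explicitly for the linear models of formulas (8) and (9)):

```python
def f(ap):
    """Reference model OM, formula (8)."""
    return 9.0 * ap + 40.0

def f_inv(angle):
    """Inverse of the reference model."""
    return (angle - 40.0) / 9.0

def fd(ap):
    """Correction model AM at np = 60 dBA, formula (9)."""
    return 14.04 * ap + 40.0

def fd_inv(angle):
    return (angle - 40.0) / 14.04

def Th(theta, Th0=0.0):
    """Determination boundary of the reference model, formula (10)."""
    return f_inv(f(Th0) - theta)

def Thd(theta, Thd0=0.0):
    """Determination boundary of the correction model, formula (11)."""
    return fd_inv(fd(Thd0) - theta)

# At -10 degrees of inclination the boundary shifts from about 1.11 rad
# (reference model) toward 0.71 rad (correction model), matching the text.
```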
The direction determination unit 31 that is an example of a determination unit compares the determination boundary set by the model correction unit 29, for example, the estimated phase difference corresponding to the inclination of the housing 11 with respect to the reference position in the correction model, with the phase difference estimated by the phase difference estimation unit 25, thereby determining the speaker direction. The direction of the reference position is not limited to the above-mentioned direction of gravity acceleration, and may be any predetermined direction. The predetermined direction may be a direction along a vertical center line of the housing in the normal position when worn by the user, and previously set by measurement. The predetermined direction may be specified by an angle difference from the direction of gravity acceleration.
The CPU 51, the primary storage unit 52, the secondary storage unit 53, and the external interface 54 are interconnected via a bus 59.
The primary storage unit 52 is a volatile memory such as a random-access memory (RAM).
The secondary storage unit 53 includes a program storage area 53A and a data storage area 53B. As an example, the program storage area 53A stores a program such as a speaker direction determination program that causes the CPU 51 to execute speaker direction determination processing. For example, the data storage area 53B stores a value of the inclination of the housing 11 worn by a particular user with respect to the reference position, data on the reference model, and intermediate data temporarily generated in the speaker direction determination processing.
The CPU 51 reads the speaker direction determination program from the program storage area 53A, and expands the read program in the primary storage unit 52. The CPU 51 loads and executes the speaker direction determination program, thereby functioning as the first sound acquisition unit 21, the second sound acquisition unit 22, the first time-frequency conversion unit 23, the second time-frequency conversion unit 24, the phase difference estimation unit 25, the inclination acquisition unit 26, and the noise level estimation unit 27 in
The program such as the speaker direction determination program may be stored in a non-transitory recording medium such as a digital versatile disc (DVD), read via a recording medium reader, and expanded in the primary storage unit 52.
An external device is coupled to the external interface 54, and the external interface 54 exchanges various types of information between the external device and the CPU 51. For example, the external interface 54 is coupled to the first mic M01 and the second mic M02.
Next, actions of the speaker direction determination apparatus 10 are summarized. The flow of actions of the speaker direction determination apparatus 10 is summarized in
In a step 102, the CPU 51 applies time-frequency conversion to each of the sound signals acquired in the step 101. In a step 103, the CPU 51 estimates a phase difference between the first sound signal and the second sound signal, which are converted into the frequency-domain sound signals. In a step 104, the CPU 51 uses the noise level of at least one of the first sound signal and the second sound signal to correct the reference model, generating the correction model.
In a step 105, the CPU 51 sets, as the determination boundary, a value corrected by applying the inclination of the housing 11 of the speaker direction determination apparatus 10 with respect to the reference position to the correction model generated in the step 104. In a step 106, the CPU 51 determines whether or not the estimated phase difference is equal to or smaller than the determination boundary. When the determination in the step 106 is affirmative, that is, when the estimated phase difference is equal to or smaller than the determination boundary, it is determined that the speaker is present above, and the CPU 51 proceeds to a step 108. In the step 108, the CPU 51 routes the sound signal to processing of translating the second language into the first language, and proceeds to a step 110.
When the determination in the step 106 is negative, that is, when the estimated phase difference is larger than the determination boundary, it is determined that the speaker is present ahead, and the CPU 51 proceeds to a step 109. In the step 109, the CPU 51 routes the sound signal to processing of translating the first language into the second language, and proceeds to the step 110. The routed sound signal is translated from the first language into the second language by an existing voice translation technique and, for example, output as voice from a speaker.
In the step 110, the CPU 51 determines whether or not the user has turned off the speaker direction determination function of the speaker direction determination apparatus 10. When the determination in the step 110 is negative, that is, when the speaker direction determination function remains on, the CPU 51 returns to the step 101, reads sound signals in a next frame, and continues the speaker direction determination processing. When the determination in the step 110 is affirmative, that is, when the speaker direction determination function is turned off, the CPU 51 finishes the speaker direction determination processing.
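The branch at steps 106 through 109 can be condensed into a small routing function. The string labels are illustrative only; "L1" and "L2" stand for the first and second languages of the text:

```python
def determine_direction(estimated_dp, boundary):
    """Steps 106-109: route the signal based on the estimated phase difference.

    Returns the determined speaker direction and which translation
    direction to apply: second-to-first language ("L2->L1") when the
    speaker is above, first-to-second ("L1->L2") when the speaker is ahead.
    """
    if estimated_dp <= boundary:
        return "upward", "L2->L1"   # step 108
    return "forward", "L1->L2"      # step 109
```

With the boundary Thd(−10) ≈ 0.71 rad from the worked example, a phase difference of 0.5 rad would route to second-to-first-language translation.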
An object of the present embodiment is to properly determine the speaker direction. When the speaker direction is determined by comparing the phase difference between the frequency-domain sound signals corresponding to sound acquired by the plurality of microphones with the threshold, to properly determine the speaker direction, the threshold may be adjusted based on the inclination of the housing of the speaker direction determination apparatus with respect to the reference position. The inventors deem that the phase difference is affected by noise and reduced in a high-noise environment, possibly failing to properly determine the speaker direction.
In the present embodiment, inclination information indicating the inclination of the housing including the plurality of microphones with respect to the reference position is acquired, and noise information on noise contained in at least one of the plurality of sound signals acquired by the plurality of microphones is acquired. Based on the plurality of sound signals acquired by the plurality of microphones, the physical quantity indicating at least one of the phase difference and the sound pressure difference is acquired. The reference model indicates correspondence between the sound incidence angle onto the plurality of microphones in the case where the housing is located at the reference position, and the physical quantity acquired in the case where the housing is located at the reference position. The physical quantity in the correspondence in the reference model is corrected to correspond to the noise level indicated by the acquired noise information to generate the correction model. In the correction model, the physical quantity corresponding to the sound incidence angle associated with the inclination indicated by the acquired inclination information is set as a threshold. The speaker direction that is the direction in which the speaker making a speech corresponding to the plurality of sound signals acquired by the plurality of microphones is present is determined by comparing the acquired physical quantity with the set threshold.
In the present embodiment, even when the housing of the speaker direction determination apparatus is inclined with respect to the reference position in a high-noise environment, the speaker direction may be properly determined.
A second embodiment is different from the first embodiment in that the model is corrected using a signal-to-noise ratio (hereinafter referred to as SNR) in place of the noise level. The SNR is an example of the noise information. Description of the same configuration and actions as those of the first embodiment is omitted.
SNR=vp−np (11)
vp is a sound pressure level in a speech section, and np is the noise level.
A formula (12) illustrates the correction model. α2( ) is a function for calculating a control parameter that depends on the SNR, and is previously derived by statistical regression using a linear function, a trigonometric function, machine learning, or the like. α2( ) may be stored in a table or the like in advance.
φ=f(α2(SNR)*ap+(1−α2(SNR))*pz) (12)
In the second embodiment, the correction model is generated such that the determination boundary shifts from the determination boundary corrected based on the inclination of the housing 11 with respect to the reference position toward the uncorrected determination boundary as the SNR becomes smaller. This is because the noise level increases as the SNR decreases.
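Formula (12) can be sketched as follows. The text gives no coefficients for α2( ), so the linear-with-clamp example below is purely hypothetical; it only respects the stated requirement that correction grows as the SNR shrinks:

```python
def f(ap):
    """Reference model from the first embodiment, formula (8)."""
    return 9.0 * ap + 40.0

def alpha2_example(snr):
    """HYPOTHETICAL control function: the specification does not give
    alpha2( ) coefficients, only that the correction must strengthen
    (alpha2 must grow) as the SNR decreases."""
    return max(1.0, 3.0 - snr / 10.0)

def correction_model_snr(ap, snr, pz=0.0, alpha2=alpha2_example):
    """Sound incidence angle under the SNR-based correction, formula (12)."""
    a = alpha2(snr)
    return f(a * ap + (1.0 - a) * pz)

# At high SNR alpha2 reaches 1, and the model collapses to the
# reference model f(ap); at low SNR the correction is strongest.
```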
In the present embodiment, inclination information indicating the inclination of the housing including the plurality of microphones with respect to the reference position is acquired, and noise information on noise contained in at least one of the plurality of sound signals acquired by the plurality of microphones is acquired. Based on the plurality of sound signals acquired by the plurality of microphones, the physical quantity indicating at least one of the phase difference and the sound pressure difference is acquired. The reference model indicates correspondence between the sound incidence angle onto the plurality of microphones in the case where the housing is located at the reference position, and the physical quantity acquired in the case where the housing is located at the reference position. The physical quantity in the correspondence in the reference model is corrected to correspond to the noise level indicated by the acquired noise information to generate the correction model. In the correction model, the physical quantity corresponding to the sound incidence angle associated with the inclination indicated by the acquired inclination information is set as a threshold. The speaker direction that is the direction in which the speaker making a speech corresponding to the plurality of sound signals acquired by the plurality of microphones is present is determined by comparing the acquired physical quantity with the set threshold.
In the present embodiment, even when the housing of the speaker direction determination apparatus is inclined with respect to the reference position in a high-noise environment, the speaker direction may be properly determined.
A third embodiment is different from the first and second embodiments in that the estimated phase difference is corrected instead of generating the correction model to set the corrected determination boundary. The description of the configuration and operation that are substantially the same as those of the first and second embodiments will be omitted.
The phase difference correction unit 30 is an example of a model generation unit, a threshold setting unit, and a physical quantity generation unit, and calculates a correction phase difference apa as expressed as a formula (13).
apa=α(np)*ap+(1−α(np))*pz−Th(θ)+Th0 (13)
In the present embodiment, the speaker direction is determined by comparing the correction phase difference apa with the determination boundary, for example, the estimated phase difference corresponding to the inclination of the housing 11 of the speaker direction determination apparatus 10 with respect to the reference position in the reference model.
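Formula (13) can be sketched by reusing f( ), its inverse, and α( ) from the first embodiment (formulas (8) and (10)); the function names are illustrative:

```python
def f(ap):
    """Reference model OM, formula (8)."""
    return 9.0 * ap + 40.0

def f_inv(angle):
    return (angle - 40.0) / 9.0

def alpha(np_level):
    """Noise-dependent control parameter, formula (8)."""
    return 0.156 * np_level - 7.8

def corrected_phase_difference(ap, np_level, theta, pz=0.0, Th0=0.0):
    """Correction phase difference apa, formula (13)."""
    Th_theta = f_inv(f(Th0) - theta)   # inclination-corrected boundary, formula (10)
    a = alpha(np_level)
    return a * ap + (1.0 - a) * pz - Th_theta + Th0

# With np = 60 dBA, theta = -10 degrees, and Th0 = pz = 0, apa crosses
# the fixed boundary Th0 = 0 exactly when ap equals 10/14.04 (about
# 0.71 rad) -- the same decision as the first embodiment's shifted
# boundary Thd(-10), moved into the phase difference instead.
```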
In the present embodiment, inclination information indicating the inclination of the housing including the plurality of microphones with respect to the reference position is acquired, and noise information on noise contained in at least one of the plurality of sound signals acquired by the plurality of microphones is acquired. Based on the plurality of sound signals acquired by the plurality of microphones, the physical quantity indicating at least one of the phase difference and the sound pressure difference is acquired. The reference model indicates correspondence between the sound incidence angle onto the plurality of microphones in the case where the housing is located at the reference position, and the physical quantity acquired in the case where the housing is located at the reference position. The physical quantity in the correspondence in the reference model is corrected to correspond to the noise level indicated by the acquired noise information to generate the correction model. The physical quantity corresponding to the sound incidence angle associated with the inclination indicated by the inclination information acquired in the correction model is set as a threshold. The acquired physical quantity is corrected such that a relation between the physical quantity corresponding to the sound incidence angle associated with the inclination indicated by the inclination information acquired in the reference model and a reference threshold becomes equal to a relation between the acquired physical quantity and the set threshold, to generate a correction physical quantity. The speaker direction that is the direction in which the speaker making a speech corresponding to the plurality of sound signals acquired by the plurality of microphones is present is determined by comparing the generated correction physical quantity with the reference threshold.
In the present embodiment, even when the housing of the speaker direction determination apparatus is inclined with respect to the reference position in a high-noise environment, the speaker direction may be properly determined.
A fourth embodiment is different from the first embodiment in that the speaker direction is determined using an estimated sound pressure difference instead of the estimated phase difference. Description of the same configuration and actions as those of the first to third embodiments is omitted.
The sound pressure difference estimation unit 25D that is an example of a physical quantity acquisition unit calculates an estimated sound pressure difference dpo(k) in a kth (k=0, 1, . . . , K−1) frequency band as expressed as a formula (14). K may be 256, for example. The estimated sound pressure difference is an example of the physical quantity. The estimated sound pressure difference dpo(k) is, for example, a difference between sound pressure power P1(k) of the frequency-domain sound signal corresponding to sound acquired by the first mic M01 and sound pressure power P2(k) of the frequency-domain sound signal corresponding to sound acquired by the second mic M02.
dpo(k)=P1(k)−P2(k)   (14)
As expressed as a formula (15), z1(k) is the sound signal in the kth frequency band, which is converted by the first time-frequency conversion unit 23, expressed as a complex number, Re1(k) is a real part of the complex number, and Im1(k) is an imaginary part of the complex number. z2(k) is the frequency-domain sound signal in the kth frequency band, which is converted by the second time-frequency conversion unit 24, expressed as a complex number, Re2(k) is a real part of the complex number, and Im2(k) is an imaginary part of the complex number.
z1(k)=Re1(k)+i Im1(k)
z2(k)=Re2(k)+i Im2(k)   (15)
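The per-band sound pressure difference of formula (14) can be sketched as follows, computing each log sound pressure power in the 10·log10(|z|²) form that formula (16) uses for the noisy case. The function name and the epsilon guard are illustrative assumptions:

```python
import numpy as np

def sound_pressure_difference(z1, z2, eps=1e-12):
    """Estimated sound pressure difference dpo(k) = P1(k) - P2(k), formula (14).

    z1, z2 are the complex frequency-domain signals of the two mics;
    eps guards against log of zero.
    """
    P1 = 10.0 * np.log10(np.abs(z1) ** 2 + eps)
    P2 = 10.0 * np.log10(np.abs(z2) ** 2 + eps)
    return P1 - P2

# A signal with twice the amplitude at the first mic gives about a
# 6 dB difference.
dpo = sound_pressure_difference(np.array([2.0 + 0j]), np.array([1.0 + 0j]))
```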
In the fourth embodiment, the estimated phase difference dp(k) in the first to third embodiments is replaced with the estimated sound pressure difference dpo(k). The model indicating the relation between the sound incidence angle and the estimated phase difference in the first to third embodiments is replaced with a model indicating a relation between the sound incidence angle and the estimated sound pressure difference in
When surrounding stationary noise exists, as expressed as a formula (16), power spectra Pt1(k) and Pt2(k) contain a noise component zN(k).
Pt1(k)=10 log10(|z1(k)+zN(k)|^2)
Pt2(k)=10 log10(|z2(k)+zN(k)|^2)   (16)
Thus, as expressed as a formula (17), the estimated sound pressure difference also contains the noise component zN(k).
In the formula (17), as expressed as a formula (18), as the noise component zN(k) comes closer to ∞, the sound pressure difference approaches 0.
For example, when surrounding stationary noise is large, the sound pressure difference of the target sound is buried, so that the estimated sound pressure difference of the sound approaches the sound pressure difference of the stationary noise.
A correction model in the case where the reference model is φD=fD(apo) is expressed as a formula (19).
φD=fD(αD(np)*apo+(1−αD(np))*poz) (19)
apo is the estimated sound pressure difference, and poz is the estimated sound pressure difference at the fixed point. The estimated sound pressure difference apo may be an average value of sound pressure differences from the highest frequency band to the lowest frequency band, and the sound pressure difference poz at the fixed point may be 0, for example. fD( ) and αD( ) are previously derived by statistical regression. fD( ) and αD( ) may be derived using a linear function, a trigonometric function, machine learning, or the like.
In the step 103E, the CPU 51 estimates the sound pressure difference, for example, according to the formula (14), and in the step 106E, determines whether or not the sound pressure difference is equal to or smaller than the determination boundary. When the determination in the step 106E is affirmative, the CPU 51 proceeds to the step 108, and when the determination in the step 106E is negative, the CPU 51 proceeds to the step 109.
A sound pressure difference estimation unit may be added to the phase difference estimation unit in the first and second embodiments, and a sound pressure difference correction unit may be added to the phase difference correction unit in the third embodiment. In this case, the speaker direction may be determined using both the phase difference and the sound pressure difference.
The CPU 51 estimates the sound pressure difference in the step 103E, and estimates the phase difference in the step 103. In the step 106E, the CPU 51 determines whether or not the sound pressure difference estimated in the step 103E is equal to or smaller than the determination boundary of the sound pressure difference found by applying the inclination of the housing 11 of the speaker direction determination apparatus 10 to the correction model generated in the step 104, which indicates the relation between the sound incidence angle and the estimated sound pressure difference.
When the determination in the step 106E is affirmative, the CPU 51 proceeds to the step 106. In the step 106, the CPU 51 determines whether or not the phase difference estimated in the step 103 is equal to or smaller than the determination boundary of the phase difference found by applying the inclination of the housing 11 of the speaker direction determination apparatus 10 to the correction model generated in the step 104, which indicates the relation between the sound incidence angle and the estimated phase difference.
When the determination in the step 106 is affirmative, the speaker direction is determined to be, for example, the upward direction, and the CPU 51 proceeds to the step 110. When the determination in the step 106E or the step 106 is negative, the speaker direction is determined to be, for example, the forward direction, and the CPU 51 proceeds to the step 109.
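The two-stage decision of steps 106E and 106 can be sketched as follows. The function is a sketch only; the threshold values would come from the noise-corrected models described above, and the "upward"/"forward" labels follow the example in the text:

```python
def determine_speaker_direction(pressure_diff, phase_diff,
                                pressure_threshold, phase_threshold):
    """Step 106E: test the sound pressure difference; step 106: test the phase
    difference. 'upward' only when BOTH are at or below their determination
    boundaries; otherwise 'forward'."""
    if pressure_diff <= pressure_threshold and phase_diff <= phase_threshold:
        return "upward"   # both affirmative: proceed to step 110
    return "forward"      # either negative: proceed to step 109

# Hypothetical thresholds read from the correction models.
print(determine_speaker_direction(0.5, 0.1, pressure_threshold=1.0, phase_threshold=0.2))
```

Requiring both quantities to pass their boundaries is what makes the combined test robust: a noise-corrupted estimate of one quantity alone cannot trigger the "upward" determination.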
By combining the estimated phase difference with the estimated sound pressure difference, even when either of them is not properly estimated, the speaker direction may be properly determined. The processing in
In the present embodiment, inclination information indicating the inclination of the housing including the plurality of microphones with respect to the reference position is acquired, and noise information on noise contained in at least one of the plurality of sound signals acquired by the plurality of microphones is acquired. Based on the plurality of sound signals acquired by the plurality of microphones, the physical quantity indicating at least one of the phase difference and the sound pressure difference is acquired. The reference model indicates correspondence between the sound incidence angle onto the plurality of microphones in the case where the housing is located at the reference position, and the physical quantity acquired in the case where the housing is located at the reference position. The physical quantity in the correspondence in the reference model is corrected to correspond to the noise level indicated by the acquired noise information to generate the correction model. In the correction model, the physical quantity corresponding to the sound incidence angle associated with the inclination indicated by the acquired inclination information is set as a threshold. The speaker direction that is the direction in which the speaker making a speech corresponding to the plurality of sound signals acquired by the plurality of microphones is present is determined by comparing the acquired physical quantity with the set threshold.
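Putting the pieces together, the determination described above might be sketched end to end as follows. The reference-model table, the noise weight, and the inclination-to-angle mapping are all hypothetical placeholders for the components the specification describes:

```python
def correct_for_noise(reference_value, noise_level_db, fixed_point=0.0):
    """Blend a reference-model value toward the fixed point as noise rises
    (hypothetical linear weight; the spec derives it by regression)."""
    alpha = max(0.0, min(1.0, 1.0 - (noise_level_db - 30.0) / 40.0))
    return alpha * reference_value + (1.0 - alpha) * fixed_point

def determine_direction(measured_diff, inclination_deg, reference_model, noise_level_db):
    """Correct the reference model for the noise level, take the corrected value
    at the incidence angle associated with the inclination as the threshold,
    and compare the measured quantity against it."""
    # reference_model: {incidence angle (deg) at the reference position: sound pressure difference (dB)}
    angle = min(reference_model, key=lambda a: abs(a - inclination_deg))  # nearest tabulated angle
    threshold = correct_for_noise(reference_model[angle], noise_level_db)
    return "upward" if measured_diff <= threshold else "forward"

# Hypothetical reference model measured with the housing at the reference position.
ref_model = {0: 6.0, 30: 4.0, 60: 2.0, 90: 0.5}
print(determine_direction(1.0, inclination_deg=60, reference_model=ref_model, noise_level_db=35.0))
```

The key point is the order of operations: the reference model is first corrected for the acquired noise level, and only then is the threshold read off at the angle given by the acquired inclination.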
In the present embodiment, even when the housing of the speaker direction determination apparatus is inclined with respect to the reference position in a high-noise environment, the speaker direction may be properly determined.
The number of microphones is two in the above-mentioned embodiments, but the present embodiment is not limited to two microphones; three or more may be used. For example, the speaker direction determination apparatus may be spherical, and the microphones may be disposed on the spherical surface at regular intervals. The determination result of the speaker direction is used for translation in the above-mentioned embodiments, but the present embodiment is not limited to this. For example, when generating minutes of a meeting, the minutes may be generated by identifying the speaker based on the determination result of the speaker direction.
The flowcharts in
In the case where the determination boundary is not changed, as represented at the left end in
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind
---|---|---|---
2019-107707 | Jun 2019 | JP | national