The present disclosure relates to a signal processing apparatus, a signal processing method, a signal processing program, a signal processing model production method, and a sound output device.
The recent spread of portable audio players has promoted the spread of noise reduction systems that provide listeners (users) with good reproduced sound field spaces having reduced external environment noise, for sound output devices (e.g., headphones, earphones, etc.) used with the portable audio players.
In relation to the above technology, a technology for suppressing noise at a user's eardrum position by using a noise canceling (NC) filter has become widespread.
However, the conventional technology has room for promoting further improvement in usability. For example, in the conventional technology, although a signal at the eardrum position is sometimes required to maximize the amount of NC effect at the eardrum position, it is difficult to arrange a microphone at the eardrum position due to the specifications of a product.
Therefore, the present disclosure proposes a new and improved signal processing apparatus, signal processing method, signal processing program, signal processing model production method, and sound output device that are configured to promote further improvement in usability.
According to the present disclosure, a signal processing apparatus is provided that includes: an acquisition unit that acquires an acoustic characteristic in a user's ear, isolated from the outside world; an NC filter unit that generates sound data having a phase opposite to an ambient sound leaking into the user's ear; a correction unit that corrects the sound data by using a correction filter; and a determination unit that determines a filter coefficient of the correction filter based on the acoustic characteristic.
Preferred embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. Note that in the present description and the drawings, component elements having substantially the same functional configurations are denoted by the same reference numerals and symbols, and redundant descriptions thereof will be omitted.
Note that the description will be given in the following order.
1. One embodiment of present disclosure
1.1. Introduction
1.2. Personalized NC optimization
1.3. Configuration of signal processing system
2. Function of signal processing system
2.1. First DNN
2.2. Second DNN
2.3. Third DNN
2.4. Correction filter estimation process
2.5. Fourth DNN
2.6. Procedure of process
2.7. Storage of and reference to correction filter
2.8. Fifth DNN
2.9. Sixth DNN
2.10. Exemplary functional configuration
2.11. Process in signal processing system
2.12. Variations of processing
3. Exemplary hardware configuration
4. Conclusion
<1.1. Introduction>
Physical characteristics, such as a user's head shape or ear size, and external factors, such as the presence or absence of glasses or a hat, can cause differences in the volume or air density inside headphones. Therefore, when sound produced from a signal to which a noise reduction signal has been applied reaches the user's ear, the characteristics of the signal change according to the volume and air density inside the headphones, and thus may differ between users. When the sound produced from the signal to which the noise reduction signal has been applied reaches the user's ear, the characteristics of the signal may also change depending on the wearing condition of the headphones and the like.
An NC filter having standard specifications (hereinafter, appropriately referred to as "a default") that is mounted on a product may be defined according to a standard head shape or wearing condition at design time. For this reason, during use by the user, the head shape and the wearing condition may deviate from the default, and thus an optimal NC effect may not be obtained in some cases. Therefore, there is room for promoting further improvement in usability.
Therefore, the present disclosure proposes a new and improved signal processing apparatus, signal processing method, and signal processing model production method that are configured to promote further improvement in usability.
<1.2. Personalized NC Optimization>
Personalized NC optimization will be described first.
Next, an overview of a function for the personalized NC optimization will be described. In
(In the formula, F1 default represents the acoustic characteristic F1 in design, and H1 default represents the device characteristic H1 in design.)
The device characteristic H1 and the acoustic characteristic F1 can differ between users. Therefore, it is also possible to perform the personal optimization by focusing on the device characteristic H1 and correcting, for each user, H1 default M1 (hereinafter, appropriately referred to as "H1M1 characteristics") included in the above Formula (1). However, this personal optimization requires arrangement of a microphone near the eardrum, making it difficult to measure the device characteristic H1 in the use environments of the users. Therefore, in the present embodiment, for example, the device characteristic H1 is estimated on the basis of the similarity between the device characteristic H1 and the device characteristic H2, focusing on the device characteristic H2.
Next, an overview of a function for the personalized NC optimization during use will be described with reference to
Next, HM characteristics included in the above formula (1) will be described with reference to
Here, as described above, it is difficult to measure H1M characteristic data illustrated in
Next, simulation of the NC effect will be described with reference to
Here, in
Hereinafter, in the embodiment, estimation of the correction filter using machine learning such as a deep neural network (DNN) will be described. Use of machine learning such as a DNN makes it possible to appropriately estimate the correction filter according to the shape of the user's head, the wearing condition, an external ambient sound, and the like, without band limitation. This configuration makes it possible for a signal processing apparatus 10 to achieve the NC optimization with a higher degree of freedom in a wider band. Note that the DNN appearing in the embodiment is an example of artificial intelligence.
Hereinafter, in the embodiment, a description will be given of a DNN (hereinafter, appropriately referred to as "correction filter coefficient estimation DNN" or "first DNN") that receives, as input, the H2M characteristic measured by the FBNC microphone and outputs a coefficient of the correction filter (correction filter coefficient) for optimally correcting a noise-cancelling signal generated on the basis of measurement data measured by the FFNC microphone. Note that the first DNN is not limited to outputting the correction filter coefficient for correcting the noise-cancelling signal, and may instead output a correction filter coefficient for optimally correcting a filter that generates the noise-cancelling signal on the basis of the measurement data measured by the FFNC microphone. Furthermore, a description will be given below of a DNN (hereinafter, appropriately referred to as "correction determination DNN" or "second DNN") that determines the necessity/unnecessity of the correction in a case where an insufficient amount of NC effect is provided upon optimization or in a case where an insufficient amount of NC effect is provided upon correction due to large leak-in.
Hereinafter, the correction filter according to the embodiment may be, for example, a finite impulse response (FIR) filter, that is, a filter whose impulse response is finite.
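As a minimal sketch of the FIR correction filter mentioned above, the corrected sound data can be obtained by convolving the noise-cancelling signal with the filter's impulse response. The coefficient values and signal below are illustrative, not values from the disclosure.

```python
import numpy as np

def apply_fir_correction(nc_signal, fir_coeffs):
    """Convolve the NC signal with the FIR correction filter (same-length output)."""
    return np.convolve(nc_signal, fir_coeffs)[: len(nc_signal)]

nc_signal = np.array([1.0, 0.0, 0.0, 0.0])   # unit impulse standing in for the NC signal
fir_coeffs = np.array([0.5, 0.25, 0.125])    # illustrative finite impulse response
corrected = apply_fir_correction(nc_signal, fir_coeffs)
```

Because the impulse response is finite, the corrected output settles to zero once the coefficients are exhausted, which keeps the correction stable by construction.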
Hereinafter, the corrected filter according to the embodiment may be, for example, the α default to which the correction filter at a target time point such as the point of use is applied.
Hereinafter, in the embodiment, estimation of the amount of NC effect in an environment set according to a JEITA standard will be described; however, the amount of NC effect may also be estimated in an environment set according to another standard, not only in the environment set according to the JEITA standard. The signal processing apparatus 10 is configured to estimate the effect of optimization by estimating the amount of NC effect, and it is therefore possible to determine whether to perform the optimization.
Hereinafter, in the embodiment, headphones 20 will be described as an example of the sound output device.
<1.3. Configuration of Signal Processing System>
A configuration of a signal processing system 1 according to the embodiment will be described.
The signal processing apparatus 10 and the headphones 20 may be separately provided as a plurality of computer hardware devices on so-called on-premises environments, edge servers, or the cloud, or the functions of any plurality of the devices among the signal processing apparatus 10 and the headphones 20 may be provided in the same device. For example, the signal processing apparatus 10 and the headphones 20 may be devices configured to function integrally and communicate with an external information processing device. Furthermore, the signal processing apparatus 10 and the headphones 20 are configured so that the user may perform mutual information/data communication with them via a user interface (including a graphical user interface: GUI) and software (including a computer program (hereinafter, also referred to as a program)) that operate on a terminal device (not illustrated; a personal device, such as a personal computer (PC) or a smartphone, including a display as an information display device and voice and keyboard input).
(1) Signal Processing Apparatus 10
The signal processing apparatus 10 is an information processing apparatus that performs processing of determining the coefficient of the correction filter (filter coefficient) for performing optimal NC for an individual user. Specifically, the signal processing apparatus 10 acquires an acoustic characteristic in a user's ear isolated from the outside world. Then, the signal processing apparatus 10 generates sound data having a phase opposite to the ambient sound leaking into the user's ear, and corrects the sound data by using the correction filter. Furthermore, the signal processing apparatus 10 determines the correction filter coefficient on the basis of the acoustic characteristic. This configuration makes it possible for the signal processing apparatus 10 to estimate the correction filter coefficient for optimization without requiring a signal at the eardrum position. Furthermore, the signal processing apparatus 10 can achieve processing for optimization without relying on the experience or knack of a designer. For this reason, the signal processing apparatus 10 can promote further improvement in usability.
Furthermore, the signal processing apparatus 10 also has a function of controlling the overall operation of the signal processing system 1. For example, the signal processing apparatus 10 controls the overall operation of the signal processing system 1 on the basis of information shared between the apparatus and device. Specifically, the signal processing apparatus 10 determines the correction filter coefficient for optimization on the basis of information received from the headphones 20.
The signal processing apparatus 10 is implemented by a personal computer (PC), a server, or the like. Note that the signal processing apparatus 10 is not limited to a PC, a server, or the like. For example, the signal processing apparatus 10 may be a computer hardware device such as a PC or a server in which the functions of the signal processing apparatus 10 are implemented as an application.
(2) Headphones 20
The headphones 20 are used by the user to listen to sound. The headphones 20 are not limited to headphones and may be any sound output device, as long as the device has a driver and a microphone and isolates a space including the user's eardrum from the outside world. For example, the headphones 20 may be earphones.
For example, the headphones 20 collect measurement sound output from the driver with the microphones.
The configuration of the signal processing system 1 has been described above. Next, the functions of the signal processing system 1 will be described. Note that the functions of the signal processing system 1 include a function of estimating the correction filter coefficient for correcting the α default to perform optimal NC for an individual user, and a function of determining whether to perform the optimal NC for the individual user.
<2.1. First DNN>
In the first DNN, the H2 user M2 characteristic based on the signal collected by the second microphone is input, and the correction filter coefficient is output. In the first DNN, optimization using Adam is performed as an example of an optimization method. In the first DNN, the correction filter coefficient based on H1 user M3 is used as training data. Here, in the first DNN, for example, a gradient method may be used to obtain the correction filter coefficient that minimizes the result of the simulation of NC, and the obtained correction filter coefficient may be used as the training data. In the first DNN, training data is used in which this correction filter coefficient is the output and the H2 user M2 characteristic is the input. The first DNN may use a loss function that transforms both the training data and the estimation data into frequency characteristics by fast Fourier transform (FFT), applies a common low-pass filter, and then calculates an average (average value) from the sum of the absolute values of the differences in the respective bands.
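The loss function described above can be sketched as follows. This is a hedged illustration, not the disclosure's implementation: both filter coefficient sequences are transformed to frequency characteristics by FFT, a common low-pass weighting keeps only the low bands, and the loss is the average of the absolute differences over the retained bands. The filter length and the low-pass cutoff index are assumptions.

```python
import numpy as np

def first_dnn_loss(estimated, target, lowpass_bins=8):
    """FFT both coefficient sets, apply a common low-pass mask, average |diff| per band."""
    est_spec = np.abs(np.fft.rfft(estimated))
    tgt_spec = np.abs(np.fft.rfft(target))
    mask = np.zeros_like(est_spec)
    mask[:lowpass_bins] = 1.0                 # common low-pass: keep the low bands only
    diff = np.abs(est_spec - tgt_spec) * mask
    return diff.sum() / mask.sum()            # average over the retained bands

loss_same = first_dnn_loss(np.ones(32), np.ones(32))   # identical filters
loss_diff = first_dnn_loss(np.zeros(32), np.ones(32))  # differing filters
```

Restricting the comparison to low-pass bands reflects that the NC effect is primarily evaluated in the low-frequency region.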
<2.2. Second DNN>
In the second DNN, the acoustic characteristic (e.g., a time signal of the impulse response and a frequency signal obtained by FFT) based on the signal collected by the second microphone and a corrected filter coefficient are input, and whether to perform correction is output. In the second DNN, optimization using Adam is performed as an example of an optimization method. In the second DNN, a loss function based on cross entropy is used. In the second DNN, the simulation of NC is performed using the H2 user M2 characteristic, the microphone characteristic M1, the microphone characteristic M3, and the corrected filter coefficient. Then, the second DNN uses, as the training data, data labeled to indicate whether to perform correction, on the basis of whether the amount of NC effect that is the correction effect obtained as the result of the simulation is equal to or more than a predetermined threshold. Here, the amount of NC effect represents the suppression amount obtained when the sound pressure at the eardrum position is compared between the exposed state in which the headphones 20 are not worn and an NC effective state, using a predetermined noise sound source and noise environment. For example, the signal processing system 1 may perform ⅓ octave band analysis for each of the exposed state in which the headphones 20 are not worn and the NC effective state, and treat the suppression amount and a noise suppression ratio in each band as the amount of NC effect.
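The per-band suppression amount described above can be sketched as follows. This is an illustrative computation, not the disclosure's implementation: band power in the exposed state and the NC effective state is compared in ⅓ octave bands, and the dB ratio per band is taken as the amount of NC effect. The band edges, the starting frequency, and the flat spectra are assumptions.

```python
import numpy as np

def third_octave_edges(f_low=100.0, n_bands=10):
    """Band-edge frequencies spaced 1/3 octave apart, starting from f_low."""
    return f_low * 2.0 ** (np.arange(n_bands + 1) / 3.0)

def nc_effect_per_band(exposed_psd, nc_on_psd, freqs, edges):
    """Suppression amount (dB) in each 1/3 octave band."""
    amounts = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = (freqs >= lo) & (freqs < hi)
        p_exp = exposed_psd[band].sum()       # band power, exposed state
        p_nc = nc_on_psd[band].sum()          # band power, NC effective state
        amounts.append(10.0 * np.log10(p_exp / p_nc))
    return np.array(amounts)

freqs = np.linspace(100.0, 1000.0, 500)
exposed = np.ones_like(freqs)                 # flat exposed-state power (illustrative)
nc_on = 0.1 * np.ones_like(freqs)             # NC on: power reduced 10x everywhere
edges = third_octave_edges(100.0, 3)
amounts = nc_effect_per_band(exposed, nc_on, freqs, edges)
```

With a uniform tenfold power reduction, each band reports a suppression amount of 10 dB, matching the intuition that the metric localizes the NC effect per frequency band.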
Next, optimization based on the noise suppression ratio will be described. Here, the functions of the signal processing system 1 include a function of estimating whether noise is suppressed by correcting the NC filter. The signal processing system 1 uses DNN (hereinafter, appropriately referred to as “noise suppression ratio estimation DNN” or “third DNN”) that outputs the noise suppression ratio, thereby estimating whether noise is suppressed. The third DNN will be described below.
<2.3. Third DNN>
In the third DNN, the H2 user M2 characteristic, the H2M2 characteristic, and the α default are input, and the noise suppression ratio is output. In the third DNN, optimization using Adam is performed as an example of an optimization method. In the third DNN, a loss function based on a mean square error is used.
<2.4. Correction Filter Estimation Process>
Next, estimation of a correction filter that corrects a difference based on the measured acoustic characteristic of an ambient sound will be described. Here, the correction filter that corrects an error in the wearing condition of the user, as described above, is appropriately referred to as "wearing error correction filter" or "first correction filter". Furthermore, the correction filter that corrects the difference based on the acoustic characteristic of the ambient sound is appropriately referred to as "ambient sound difference correction filter" or "second correction filter". Here, in a case where the first correction filter is estimated, there is a possibility that noise drowns out the measurement sound unless the environment is quiet to some extent. In a case where the second correction filter is estimated, by contrast, a somewhat loud noise may be desirable in some cases to facilitate measurement of the characteristics of the ambient sound. Therefore, the signal processing system 1 determines which one of the first correction filter and the second correction filter is to be estimated according to the noise level of the ambient sound.
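The selection logic described above can be sketched as a simple threshold test on the ambient noise level. The threshold values (in dB SPL) and the function name are illustrative assumptions, not values from the disclosure.

```python
def select_correction_target(noise_level_db, quiet_max=40.0, loud_min=60.0):
    """Choose which correction filter to estimate from the ambient noise level."""
    if noise_level_db <= quiet_max:
        return "first"    # wearing error correction filter: needs a quiet environment
    if noise_level_db >= loud_min:
        return "second"   # ambient sound difference correction filter: needs audible noise
    return None           # intermediate levels: defer estimation
```

A deferral zone between the two thresholds avoids estimating either filter under conditions where neither measurement would be reliable.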
<2.5. Fourth DNN>
In the fourth DNN, a signal collected by the first microphone and the corrected filter at the target time point are input, and the second correction filter coefficient is output. In the fourth DNN, optimization using Adam is performed as an example of an optimization method. In the fourth DNN, H1M3 and an acoustic characteristic F1 user are used to measure the surrounding sound field on the basis of various ambient sounds. In this configuration, the signal processing system 1 estimates an optimal filter coefficient based on the signal collected by the first microphone and a signal collected by the third microphone. Then, the signal processing system 1 uses, for example, a gradient method to estimate a correction filter coefficient for correcting the difference between the α default and the optimal filter coefficient. Then, the signal processing system 1 generates the training data in which the signal collected by the first microphone is the input and the estimated correction filter coefficient is the output. In the fourth DNN, after both the training data and the estimation data are weighted for each frequency band by using the loss function, an average may be calculated from the sum of the amplitude and phase distances of each band. Here, the weighting for each frequency band is, for example, weighting based on exclusion, using a low-pass filter, of a high frequency band in which no NC effect can be expected, or exclusion, using a high-pass filter, of a low frequency band that has low frequency resolution.
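The band-weighted loss described above can be sketched as follows. This is a hedged illustration, not the disclosure's implementation: training and estimation coefficients are transformed by FFT, the amplitude difference and phase distance are computed per bin, and a per-band weight (here simply zeroing the lowest and highest bins, standing in for the high-pass/low-pass exclusions in the text) gates the average. The signal length and weights are assumptions.

```python
import numpy as np

def fourth_dnn_loss(estimated, target, weights):
    """Band-weighted average of amplitude difference plus phase distance."""
    est = np.fft.rfft(estimated)
    tgt = np.fft.rfft(target)
    amp = np.abs(np.abs(est) - np.abs(tgt))       # amplitude difference per bin
    phase = np.abs(np.angle(est) - np.angle(tgt)) # phase distance per bin
    return ((amp + phase) * weights).sum() / weights.sum()

weights = np.ones(9)              # 9 rfft bins for a length-16 signal
weights[0] = weights[-1] = 0.0    # exclude lowest and highest bands (illustrative)
x = np.sin(np.linspace(0.0, 1.0, 16))
zero_loss = fourth_dnn_loss(x, x, weights)
```

Combining amplitude and phase terms matters here because a correction filter with the right magnitude but wrong phase would still degrade the cancellation.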
<2.6. Procedure of Process>
<2.7. Storage of and Reference to Correction Filter>
Next, procedures of processes of storing and referring to the correction filter will be described with reference to
Subsequently,
The signal processing apparatus 10 compares the amount of NC effect between the two values of "O. Unknown" and "P. No" in "C. Bus" to update the first correction filter (S21). Here, the signal processing apparatus 10 compares the amount of NC effect of "0.60" in "O. Unknown" with the amount of NC effect of "0.70" in "P. No", and updates the first correction filter to the correction filter (p), because the amount of NC effect of "P. No" is larger. Subsequently, the signal processing apparatus 10 stores the amount of NC effect obtained when the headphones 20, which are kept worn, are used in "C. Bus" with the updated first correction filter (S22). Here, the amount of NC effect of "0.68" is stored in "C. Bus" during the state of "P. No". Subsequently, the signal processing apparatus 10 measures the amount of NC effect obtained when the headphones 20, which are kept worn, are used in "B. Train", and compares it with the amount of NC effect obtained when the headphones 20 are used in "C. Bus" (S23). The amount of NC effect is larger in "B. Train", and therefore, the signal processing apparatus 10 overwrites the amount of NC effect. The ambient sound condition under which the maximum amount of NC effect is stored has changed from "C. Bus" to "B. Train", and therefore, the signal processing apparatus 10 deletes (erases) the stored entry for "C. Bus".
Thereafter (e.g., at a later date), the signal processing apparatus 10 stores the amounts of NC effect obtained when the user uses the headphones 20 in "B. Train" and "C. Bus" while wearing glasses, without performing the function of the optimization (S24). Here, the amount of NC effect of "0.64" is stored in "B. Train" during the state of "O. Unknown". Subsequently, it is assumed that the user performs the function of the optimization in a quiet environment while wearing the headphones 20. Hereinafter, a state in which the optimization is performed while the glasses are worn is referred to as "Q. Glasses" as appropriate. The signal processing apparatus 10 determines that the characteristic of wearing the headphones 20 when the function of the optimization is performed during the state of "Q. Glasses" differs from "N. Standard" and "P. No", and estimates a correction filter (q) as the first correction filter (S25). In addition, the signal processing apparatus 10 estimates the amount of effect of each of "A. JEITA" and "B. Train" during the state of "Q. Glasses". Here, the amount of NC effect of "0.70" is stored in "A. JEITA" and the amount of NC effect of "0.71" is stored in "B. Train", during the state of "Q. Glasses". Here, the actually measured value is stored in "O. Unknown", and therefore, this actually measured value is used for the amount of NC effect of "B. Train" during the state of "Q. Glasses". Note that when no actually measured value is stored in "O. Unknown", the amount of NC effect of "A. JEITA" during the state of "Q. Glasses" is estimated using the ambient sound of "B. Train" as an additional input. Then, the signal processing apparatus 10 compares the amount of NC effect between the two values of "O. Unknown" and "Q. Glasses" in "B. Train" to update the first correction filter (S26). Here, the signal processing apparatus 10 compares the amount of NC effect of "0.64" in "O. Unknown" with the amount of NC effect of "0.71" in "Q. Glasses", and updates the first correction filter to the correction filter (q), because the amount of NC effect of "Q. Glasses" is larger as a result of the comparison.
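The comparison-and-update steps above (S21, S26) can be sketched as follows. The nested-dictionary layout, the field names, and the filter labels are hypothetical stand-ins for the stored data; the numeric values mirror the "C. Bus" example in the text.

```python
def update_first_correction_filter(store, ambient, current, candidate):
    """Keep the filter whose stored amount of NC effect is larger in the given ambient sound."""
    if store[candidate]["nc_effect"][ambient] > store[current]["nc_effect"][ambient]:
        return store[candidate]["filter"]
    return store[current]["filter"]

store = {
    "O. Unknown": {"filter": "default", "nc_effect": {"C. Bus": 0.60}},
    "P. No":      {"filter": "p",       "nc_effect": {"C. Bus": 0.70}},
}
chosen = update_first_correction_filter(store, "C. Bus", "O. Unknown", "P. No")
```

With the example values, "P. No" (0.70) exceeds "O. Unknown" (0.60), so the first correction filter is updated to the correction filter (p), as in step S21.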
In order to determine the approximation to the H2 user M2 characteristic, the signal processing apparatus 10 may rearrange the correction filters so that the list in the memory is searched in the order of the amount of NC effect or in the order of the frequency of approximation to the H2 user M2 characteristic, instead of the order of storage or the order of address. Accordingly, the signal processing apparatus 10 can select a correction filter with higher reliability. Here, some users do not perform the function of the optimization very frequently. There is a possibility that the headphones 20 may be used multiple times before the function of the optimization is performed. For this reason, the signal processing apparatus 10 may store the amount of NC effect during the state of "O. Unknown" and use the amount of NC effect to search for an approximate characteristic. For example, the signal processing apparatus 10 may store, for each correction filter, (1) "an average value of the amount of NC effect in a target wearing condition", (2) "an average value of the amount of NC effect in the unknown wearing condition", (3) "the frequency of using the headphones 20 when the correction filter is selected in the target wearing condition", (4) "the frequency of using the headphones 20 when the correction filter is selected during the unknown wearing condition", and the like, and may use these values to search for the approximate characteristic.
There is a high possibility that the frequency of use in (3) described above depends on variation in the wearing condition of the user, and therefore, the signal processing apparatus 10 may perform the processing in descending order of that frequency to search for the approximate characteristic. Here, a correction filter with a high frequency of use in (3) described above tends to correspond to the same wearing condition even if the user repeatedly wears and removes the headphones, thus providing high reliability. If correction filters having the same frequency of use are included, the signal processing apparatus 10 may perform the search in the order of the amount of NC effect in (1) described above. Furthermore, when correction filters having the same amount of NC effect in (1) described above are included, the signal processing apparatus 10 may perform the search in the order of the frequency of use in (4) described above. Then, the signal processing apparatus 10 may perform the search in the order of the amount of NC effect in (2) described above. Note that the above description is merely an example, and the order of searching is not limited to this description.
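The tie-breaking search order above, (3) then (1) then (4) then (2), can be sketched as a multi-key descending sort. The field and filter names are hypothetical; only the ordering logic follows the text.

```python
def search_order(filters):
    """Sort stored correction filters into the search order (3), (1), (4), (2)."""
    return sorted(
        filters,
        key=lambda f: (f["freq_target"], f["avg_nc_target"],
                       f["freq_unknown"], f["avg_nc_unknown"]),
        reverse=True,  # descending: most reliable candidates are searched first
    )

filters = [
    {"name": "n", "freq_target": 2, "avg_nc_target": 0.66, "freq_unknown": 1, "avg_nc_unknown": 0.60},
    {"name": "p", "freq_target": 5, "avg_nc_target": 0.70, "freq_unknown": 3, "avg_nc_unknown": 0.68},
    {"name": "q", "freq_target": 5, "avg_nc_target": 0.71, "freq_unknown": 2, "avg_nc_unknown": 0.64},
]
ordered = [f["name"] for f in search_order(filters)]
```

Here "p" and "q" tie on the frequency of use in (3), so the amount of NC effect in (1) breaks the tie and "q" is searched first.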
Next, a process of updating the memory storing the second correction filter will be described with reference to
In
In
<2.8. Fifth DNN>
Next, optimization based on a result of estimation of the correction filter will be described. Here, the functions of the signal processing system 1 include a function of estimating the NC effect by using the NC filter having the predetermined filter coefficient. The signal processing system 1 uses the fifth DNN to estimate the NC effect. The fifth DNN will be described below.
In the fifth DNN, the H2 user M2 characteristic and the corrected filter coefficient are input, and the amount of NC effect is output. Note that, in the fifth DNN, in addition to the above, the H2M2 characteristic may be input. In the fifth DNN, optimization using Adam is performed as an example of an optimization method. In the fifth DNN, a loss function based on the mean square error is used. In the fifth DNN, the simulation of NC is performed using the training data generated by the first DNN, and the amount of NC effect obtained as a result of the simulation is used as training data.
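The fifth DNN's input/output shape can be sketched as below. This is only an interface illustration under stated assumptions: the acoustic characteristic and the corrected filter coefficient are concatenated and mapped to a single scalar (the estimated amount of NC effect) by a tiny two-layer perceptron with untrained random weights; the layer size and weight scale are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def fifth_dnn_forward(characteristic, corrected_coeffs, hidden=16):
    """Map (acoustic characteristic, corrected filter coefficient) to a scalar effect amount."""
    x = np.concatenate([characteristic, corrected_coeffs])
    w1 = rng.standard_normal((hidden, x.size)) * 0.1   # untrained weights, illustrative
    w2 = rng.standard_normal(hidden) * 0.1
    h = np.maximum(w1 @ x, 0.0)                        # ReLU hidden layer
    return float(w2 @ h)                               # estimated amount of NC effect

effect = fifth_dnn_forward(np.ones(8), np.ones(4))
```

In training, the weights would be fit with Adam against a mean-square-error loss on simulated NC effect amounts, as the text describes.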
<2.9. Sixth DNN>
Next, optimization based on the environment set according to the predetermined standard will be described. Here, the functions of the signal processing system 1 include a function of estimating the NC effect in the environment set according to the predetermined standard. The signal processing system 1 uses the sixth DNN to estimate the NC effect. The sixth DNN will be described below.
In the sixth DNN, the amount of NC effect in a noise environment according to a predetermined standard, the corrected filter coefficient, and a characteristic of the ambient sound in the use environment of the user are input, and the amount of NC effect in the use environment of the user is output. In the sixth DNN, the loss function based on the mean square error is used. In the sixth DNN, the amount of NC effect obtained as a result of the simulation of NC is used as the training data. For example, in the sixth DNN, simulation of NC is performed using the NC filter, the correction filter, and data such as sound data (e.g., sound data of ambient sound measured by the first microphone to the third microphone) and characteristics, and the amounts of NC effect obtained as a result of the simulation are used as the training data.
<2.10. Exemplary Functional Configuration>
(1) Signal Processing Apparatus 10
As illustrated in
(1-1) Communication Unit 100
The communication unit 100 has a function of communicating with an external device. For example, in communication with the external device, the communication unit 100 outputs information received from the external device to the control unit 110. Specifically, the communication unit 100 outputs the information received from the headphones 20 to the control unit 110. For example, the communication unit 100 outputs the signals collected by the microphones included in the headphones 20 to the control unit 110.
In communication with the external device, the communication unit 100 transmits information input from the control unit 110 to the external device. Specifically, the communication unit 100 transmits information about acquisition of a collected sound signal, input from the control unit 110, to the headphones 20. The communication unit 100 includes a hardware circuit (e.g., a communication processor), and processing may be performed by a computer program running on the hardware circuit or on another processing device (e.g., a CPU) that controls the hardware circuit.
(1-2) Control Unit 110
The control unit 110 has a function of controlling the operation of the signal processing apparatus 10. For example, the control unit 110 performs processing of determining the correction filter coefficient to perform the optimal NC for the individual user.
In order to implement the above-described functions, the control unit 110 includes an acquisition unit 111, a processing unit 112, and an output unit 113 as illustrated in
Acquisition Unit 111
The acquisition unit 111 has a function of acquiring the acoustic characteristic in the user's ear isolated from the outside world. For example, the acquisition unit 111 acquires the acoustic characteristic based on a collected sound signal obtained by collecting the measurement sound output into the ear. For example, the acquisition unit 111 acquires the acoustic characteristic based on the collected sound signal collected by a microphone of the sound output device.
The acquisition unit 111 acquires data stored in the storage unit 120. For example, the acquisition unit 111 acquires information about the correction filter coefficient.
Processing Unit 112
The processing unit 112 has a function for controlling processing in the signal processing apparatus 10. As illustrated in
Determination Unit 1121
The determination unit 1121 has a function of determining the correction filter coefficient on the basis of the acoustic characteristic acquired by the acquisition unit 111.
The determination unit 1121 determines the correction filter coefficient by using a trained model (e.g., the first DNN) in which the acoustic characteristic is input and the filter coefficient is output. For example, the determination unit 1121 determines the correction filter coefficient by using the trained model that has learned, as the training data, the acoustic characteristic estimated at the user's eardrum position.
The determination unit 1121 determines the correction filter coefficient by using the trained model (e.g., the second DNN) in which the acoustic characteristic and sound data are input and whether to correct the sound data is output. For example, the determination unit 1121 determines the correction filter coefficient by using the trained model that has learned, as the training data, given information labeled to indicate whether to perform correction on the basis of the noise suppression ratio estimated on the basis of the acoustic characteristic and sound data.
The determination unit 1121 determines the correction filter coefficient by using the trained model (e.g., the third DNN) in which the acoustic characteristic, the acoustic characteristic measured in advance, and the sound data are input and the noise suppression ratio is output. For example, the determination unit 1121 determines the correction filter coefficient by using the trained model that has learned, as the training data, the noise suppression ratio obtained on the basis of both of the acoustic characteristic estimated at the user's eardrum position and the sound data.
The determination unit 1121 determines the correction filter coefficient by using the trained model (the fourth DNN) in which the collected sound signal collected by a microphone different from the microphone having measured the acoustic characteristic and the sound data are input and the correction filter coefficient correcting a difference in filter coefficient based on the ambient sound in a user environment is output. For example, the determination unit 1121 determines the correction filter coefficient by using the trained model that has learned, as the training data, the filter coefficient correcting a difference in filter coefficient based on the acoustic characteristic estimated at the user's eardrum position.
The determination unit 1121 determines the correction filter coefficient by using the trained model (e.g., the fifth DNN) in which the acoustic characteristic and the sound data are input and the amount of NC effect is output. For example, the determination unit 1121 determines the correction filter coefficient by using the trained model that has learned, as the training data, the amount of effect based on the acoustic characteristic estimated at the user's eardrum position.
The determination unit 1121 determines the correction filter coefficient by using the trained model (e.g., the sixth DNN) in which the amount of NC effect in the environment set according to the predetermined standard, the sound data, and the acoustic characteristic of the ambient sound in the user environment are input, and the amount of NC effect in the user environment is output. For example, the determination unit 1121 determines the correction filter coefficient by using the trained model that has learned, as the training data, the amount of NC effect based on the sound data, the filter coefficient, and the acoustic characteristic of the ambient sound in the user environment.
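Each of the trained models above can be understood as a learned mapping from input features to an output. A minimal sketch of such a mapping is given below, taking the first DNN as an example; the layer sizes, the feature representation of the acoustic characteristic, and the random weights are illustrative assumptions only, not part of the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions): 64 frequency bins describing the
# in-ear acoustic characteristic, 32 correction filter taps.
N_BINS, N_TAPS, HIDDEN = 64, 32, 128

# Weights of a toy two-layer network standing in for the trained first DNN.
W1 = rng.standard_normal((HIDDEN, N_BINS)) * 0.01
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((N_TAPS, HIDDEN)) * 0.01
b2 = np.zeros(N_TAPS)

def estimate_correction_coeffs(acoustic_characteristic):
    """Map an in-ear acoustic characteristic (one value per frequency bin)
    to correction filter coefficients, as the first DNN would."""
    h = np.maximum(0.0, W1 @ acoustic_characteristic + b1)  # ReLU hidden layer
    return W2 @ h + b2                                      # linear output: filter taps

coeffs = estimate_correction_coeffs(rng.standard_normal(N_BINS))
print(coeffs.shape)  # → (32,) — one coefficient per correction filter tap
```

The other DNNs differ only in what is fed in (e.g., sound data, a pre-measured characteristic) and what is read out (e.g., a noise suppression ratio, an amount of NC effect), not in this basic structure.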
NC Filter Unit 1122
The NC filter unit 1122 has a function of generating the sound data having a phase opposite to the ambient sound leaking into the user's ear. For example, the NC filter unit 1122 generates the sound data having a phase opposite to the acoustic characteristic of the ambient sound acquired by the acquisition unit 111.
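A minimal sketch of the opposite-phase generation performed by the NC filter unit 1122 is shown below; representing the NC filter as a simple FIR filter whose output is sign-inverted is an illustrative assumption:

```python
import numpy as np

def generate_anti_noise(ambient, nc_taps):
    """Filter the ambient-sound signal with the NC filter taps and invert
    the sign, yielding sound data with a phase opposite to the leaking noise."""
    return -np.convolve(ambient, nc_taps, mode="full")[: len(ambient)]

# With a pass-through (unit-impulse) NC filter, the anti-noise is simply the
# sign-inverted ambient signal, so the sum of the two cancels exactly.
ambient = np.sin(2 * np.pi * 100 * np.arange(480) / 48000)
anti = generate_anti_noise(ambient, np.array([1.0]))
print(np.max(np.abs(ambient + anti)))  # → 0.0
```

In practice the NC filter taps would be designed to model the acoustic path of the leaking noise, so that the cancellation holds at the listening position rather than only in this idealized case.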
Correction Unit 1123
The correction unit 1123 has a function of correcting the sound data generated by the NC filter unit 1122 by using the correction filter. Specifically, the correction unit 1123 performs correction by using the correction filter coefficient determined by the determination unit 1121.
Generation Unit 1124
The generation unit 1124 has a function of generating the trained model. For example, the generation unit 1124 generates the trained model that has learned from input data and output data supplied to the loss function. The determination unit 1121 determines the correction filter coefficient estimated using the trained model generated by the generation unit 1124.
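Generating a trained model from paired input and output data supplied to a loss function can be sketched as follows; the linear model, the mean-squared-error loss, and the toy data are illustrative assumptions, not the disclosed training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set (assumption): acoustic-characteristic feature vectors
# paired with target correction filter coefficients, related by a fixed map.
X = rng.standard_normal((256, 8))
true_map = rng.standard_normal((8, 4))
Y = X @ true_map

W = np.zeros((8, 4))                      # model parameters to be learned
for _ in range(500):                      # gradient descent on the loss function
    pred = X @ W
    grad = 2 * X.T @ (pred - Y) / len(X)  # gradient of the mean squared error
    W -= 0.05 * grad

loss = np.mean((X @ W - Y) ** 2)
print(loss < 1e-3)  # → True: the generated model fits the training data
```

The determination unit 1121 would then evaluate such a fitted model at run time, as described above.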
Correction Determination Unit 1125
The correction determination unit 1125 has a function of determining whether to correct the sound data generated by the NC filter unit 1122 by using the correction filter. For example, the correction determination unit 1125 determines whether a sufficient correction effect can be expected from the correction filter, and decides to perform the correction using the correction filter when a sufficient correction effect can be expected.
The correction determination unit 1125 determines the noise level of the ambient sound, and determines which one of the first correction filter and the second correction filter is to be used according to the noise level of the ambient sound.
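The selection between the first and second correction filters according to the noise level can be sketched as a simple threshold decision; the threshold value, the dB SPL level computation, and the pascal-valued signals are illustrative assumptions:

```python
import numpy as np

P_REF = 20e-6                    # reference sound pressure, 20 µPa
NOISE_LEVEL_THRESHOLD_DB = 60.0  # illustrative switching threshold (assumption)

def noise_level_db(signal):
    """Sound pressure level (dB SPL) of an ambient-sound signal in pascals."""
    rms = np.sqrt(np.mean(np.square(signal)))
    return 20.0 * np.log10(rms / P_REF + 1e-12)

def choose_correction_filter(ambient):
    """Use the first correction filter in loud environments and the second
    in quiet ones, according to the measured ambient noise level."""
    return "first" if noise_level_db(ambient) >= NOISE_LEVEL_THRESHOLD_DB else "second"

t = np.arange(480) / 48000.0
print(choose_correction_filter(0.1 * np.sin(2 * np.pi * 100 * t)))   # loud  → first
print(choose_correction_filter(1e-4 * np.sin(2 * np.pi * 100 * t)))  # quiet → second
```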
Output Unit 113
The output unit 113 has a function of outputting the sound data corrected by the correction unit 1123. The output unit 113 provides the corrected sound data to, for example, the headphones 20 via the communication unit 100. When receiving the corrected sound data, the headphones 20 reproduce sound based on the corrected sound data. This configuration makes it possible for the user to listen, on a trial basis, to the sound corrected by the correction filter.
(1-3) Storage Unit 120
The storage unit 120 is implemented by, for example, a semiconductor memory device such as a random access memory (RAM) or flash memory, or a storage device such as a hard disk or optical disk. The storage unit 120 has a function of storing computer programs and data (including a form of a program) related to processing in the signal processing apparatus 10.
“Correction filter coefficient ID” indicates identification information for identifying the correction filter coefficient. “Correction filter coefficient” indicates the correction filter coefficient. “Performing state” indicates a performing state of an optimization function. In
(2) Headphones 20
As illustrated in
(2-1) Communication Unit 200
The communication unit 200 has a function of communicating with an external device. For example, in communication with the external device, the communication unit 200 outputs information received from the external device to the control unit 210. Specifically, the communication unit 200 outputs information received from the signal processing apparatus 10 to the control unit 210. For example, the communication unit 200 outputs information about acquisition of the sound data corrected by the correction filter, to the control unit 210.
(2-2) Control Unit 210
The control unit 210 has a function of controlling the operation of the headphones 20. For example, the control unit 210 transmits the acoustic characteristic based on the collected sound signal collected by a microphone to the signal processing apparatus 10 via the communication unit 200.
(2-3) Output Unit 220
The output unit 220 is implemented by a member that is configured to output sound, such as a speaker. The output unit 220 outputs sound based on the sound data.
<2.11. Process in Signal Processing System>
The functions of the signal processing system 1 according to the embodiment have been described. Next, a process in the signal processing system 1 will be described.
<2.12. Variations of Processing>
(Selection of Correction Filter with UI)
In the above embodiment, the example has been described in which the signal processing apparatus 10 determines whether to perform correction by using machine learning such as a DNN, but the determination of whether to perform correction by the signal processing apparatus 10 is not limited to this example. For example, the signal processing apparatus 10 may determine whether to perform the correction by receiving a selection from the user.
Whether the user feels more comfortable as the amount of NC effect increases depends on the user's subjectivity. For example, considerable suppression of mid-bass noise may unpleasantly expose high-tone noise that had been masked by the mid-bass noise, so that an increased amount of NC effect reduces the user's comfort. The signal processing apparatus 10 may therefore determine whether to perform the correction by presenting the amount of NC effect using the current filter coefficient, the amount of NC effect using the estimated correction filter coefficient, the amount of NC effect of the correction filter coefficient stored in the memory, and the like, and receiving a selection from the user. For example, the signal processing apparatus 10 may cause a mobile terminal such as a smartphone (hereinafter appropriately referred to as the "terminal device 30") to display a list of the correction filters, for receiving a selection from the user. For example, the signal processing apparatus 10 may cause the mobile terminal to display the list of the correction filters according to the wearing condition of the user. Therefore, the signal processing apparatus 10 allows the user to explicitly select the correction filter. Furthermore, the signal processing apparatus 10 can be configured so that the user may confirm the amount of NC effect on the basis of any ambient sound.
When receiving an operation on the trial listening C11, the signal processing apparatus 10 may perform processing for outputting the sound based on the correction filter selected by the user. This configuration makes it possible for the user to listen, on a trial basis, to the sound based on the selected correction filter. Here, upon trial listening, the signal processing apparatus 10 may select and reproduce sound (e.g., music) stored in the terminal device 30 so that the user may readily recognize a difference between the correction filters included in the list. Alternatively, the signal processing apparatus 10 may reproduce any sound selected in advance by the user. Therefore, the signal processing apparatus 10 can readily perform comparison between the correction filters in the use environment of the user. Furthermore, the signal processing apparatus 10 may perform processing for causing the terminal device 30 to display the H2 user M2 characteristic. This configuration makes it possible for the signal processing apparatus 10 to cause the user to visually understand the H2 user M2 characteristic. Furthermore, the signal processing apparatus 10 may perform processing for enabling the user to name each correction filter, on the UI of the terminal device 30. Therefore, the signal processing apparatus 10 is configured to allow the user to give names, thereby facilitating the user in selectively using the correction filters. When only the UI of the headphones 20 is available, the ease of understanding displayed information or the ease of operation may degrade; therefore, the signal processing apparatus 10 may perform processing for enabling the user to make comparisons in the trial listening as well, by using an audio guide or the like. Furthermore, the signal processing apparatus 10 may perform the process of estimating the correction filter coefficient on the terminal device 30 of the user or on a server to which the terminal device 30 is connected.
Next, management and operation of the correction filter for the error in the ambient sound, on the terminal device 30 will be described. Here, the display screen of the terminal device 30 is provided with a tab for correcting wearing error and a tab for correcting a difference in the ambient sounds so that the lists of the correction filters are switched when the user selects any of the tabs.
Note that the terminal device 30 according to the embodiment may be, not limited to a mobile terminal such as a smartphone, any information processing device as long as the information processing device is configured to receive an operation for the correction filter from the user.
(Processing for Ambient Sound Changing at Any Time)
In the above embodiment, updating the correction filter coefficient estimated on the basis of the user's operation by the signal processing apparatus 10 has been described, but update by the signal processing apparatus 10 is not limited to this example. The signal processing apparatus 10 may update at any time the correction filter coefficient estimated for the ambient sound changing at any time. As illustrated in
(Estimation of NC Filter)
In the above embodiment, estimation of the correction filter coefficient for a difference in the ambient sound by the signal processing apparatus 10 has been described, but the filter coefficient of the NC filter may also be estimated. For example, the signal processing apparatus 10 may estimate the filter coefficient that minimizes the collected sound signal collected by the third microphone, on the basis of the collected sound signal collected by the first microphone and the collected sound signal collected by the third microphone. In the above embodiment, use of the correction filter coefficient estimated on the basis of various ambient sounds as the training data by the signal processing apparatus 10 has been described, but the correction filter coefficient may also be estimated by determining a reference filter coefficient.
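Estimating filter coefficients that minimize the signal collected by the third microphone, given the signal collected by the first microphone, can be sketched with a standard least-mean-squares (LMS) adaptation; the tap count, step size, and synthetic signals are illustrative assumptions, and the disclosure does not specify LMS in particular:

```python
import numpy as np

rng = np.random.default_rng(0)

def lms_estimate(x_ref, d_err, n_taps=16, mu=0.01):
    """Adapt FIR taps w so that filtering the first-microphone signal x_ref
    tracks the third-microphone signal d_err, driving the residual (what
    would remain after cancellation) toward a minimum."""
    w = np.zeros(n_taps)
    for n in range(n_taps, len(x_ref)):
        x = x_ref[n - n_taps:n][::-1]   # most recent sample first: tap k sees delay k + 1
        e = d_err[n] - w @ x            # residual at the error (third) microphone
        w += 2 * mu * e * x             # stochastic-gradient (LMS) update
    return w

# Synthetic example: the third microphone hears the first-microphone signal
# delayed by 2 samples and halved, so the true tap at delay 2 is 0.5.
x = rng.standard_normal(4000)
d = 0.5 * np.concatenate([np.zeros(2), x[:-2]])
w = lms_estimate(x, d)
print(round(float(w[1]), 2))  # the tap at delay 2 converges toward 0.5
```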
(Processing for Adjusting Gain)
In the above embodiment, determination of the correction filter coefficient and performance of correction with the determined correction filter coefficient by the signal processing apparatus 10 have been described. Here, the signal processing apparatus 10 may perform correction by adjusting the gain of the filter without determining the correction filter coefficient. In the correction, the signal processing apparatus 10 may add an offset on the basis of an error between the H2M2 characteristic and the H2 user M2 characteristic. Furthermore, the signal processing apparatus 10 may adjust this offset to calculate the offset value that minimizes the sum of squares of the error. When the minimized sum of squared errors for the offset value is smaller than a predetermined threshold, the signal processing apparatus 10 may perform correction with the offset value as an adjustment value for the gain. Furthermore, the signal processing apparatus 10 may receive adjustment from the user on the basis of the offset value. Therefore, the signal processing apparatus 10 is allowed to perform adjustment according to the user's subjective preference or how the user hears sound. Furthermore, when the minimized sum of squared errors for the offset value is larger than the predetermined threshold, the signal processing apparatus 10 may estimate the correction filter coefficient.
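For a constant gain offset, the value minimizing the sum of squares of the error between the H2M2 characteristic and the H2 user M2 characteristic is simply the mean of the error. This can be sketched as follows; the dB-valued toy characteristics and the threshold comparison are illustrative assumptions:

```python
import numpy as np

def best_gain_offset(h2m2_db, h2_user_m2_db):
    """Offset (in dB) minimizing the sum of squared errors between the
    reference H2M2 characteristic and the user's H2 user M2 characteristic.
    For a constant offset, the least-squares optimum is the mean error."""
    error = h2m2_db - h2_user_m2_db
    offset = np.mean(error)
    residual = np.sum((error - offset) ** 2)  # minimized sum of squared errors
    return offset, residual

ref = np.array([0.0, -1.0, -2.0, -3.0])
user = ref - 2.0  # user characteristic uniformly 2 dB below the reference
offset, residual = best_gain_offset(ref, user)
print(offset, residual)  # → 2.0 0.0
```

The residual returned here plays the role of the minimized sum of squared errors compared against the predetermined threshold: a small residual suggests a gain adjustment suffices, while a large residual suggests estimating the correction filter coefficient instead.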
(Correction of Error)
Note that, in the above embodiment, the correction of the error on the basis of the individual differences between users and the wearing condition has been described, but the correction is not limited thereto. The correction according to the embodiment includes, for example, correction of an error based on individual differences of the headphones 20 or the like.
Finally, an exemplary hardware configuration of the signal processing apparatus according to the embodiment will be described with reference to
As illustrated in
The CPU 901 functions as, for example, an arithmetic processing device or a control device, and controls all or part of the operations of the component elements on the basis of various computer programs recorded in the ROM 902, the RAM 903, or the storage device 908. The ROM 902 is a unit that stores a program read by the CPU 901, data used for calculation, and the like. The RAM 903 temporarily or permanently stores, for example, a program read by the CPU 901, and data such as various parameters changing as appropriate upon running the program. These component elements are mutually connected by the host bus 904a including a CPU bus or the like. The CPU 901, the ROM 902, and the RAM 903 can implement the functions of the control unit 110 and the control unit 210 which have been described with reference to
The CPU 901, the ROM 902, and the RAM 903 are mutually connected, for example, via the host bus 904a configured to transmit data at high speed. Meanwhile, the host bus 904a is connected to, for example, the external bus 904b configured to transmit data at relatively low speed, via the bridge 904. In addition, the external bus 904b is connected to various component elements via the interface 905.
The input device 906 is implemented by a device into which information is input by a listener, such as a mouse, keyboard, touch panel, button, microphone, switch, and lever. Furthermore, the input device 906 may be, for example, a remote-control device using an infrared ray or another radio wave, or may be an external connection device that corresponds to the operation of the signal processing device 900, such as a mobile phone or PDA. Furthermore, the input device 906 may include, for example, an input control circuit or the like that generates an input signal on the basis of information input using the input means described above and outputs the input signal to the CPU 901. The administrator of the signal processing device 900 can operate the input device 906 to input various data to the signal processing device 900 or give an instruction for the signal processing device 900 to perform processing operation.
In addition, the input device 906 can include a device that detects the position of the user. For example, the input device 906 can include various sensors such as an image sensor (e.g., camera), depth sensor (e.g., stereo camera), acceleration sensor, gyro sensor, geomagnetic sensor, optical sensor, sound sensor, distance measurement sensor (e.g., time of flight (ToF) sensor), and force sensor. Furthermore, the input device 906 may acquire information about the signal processing device 900 itself, such as the attitude and movement speed of the signal processing device 900, and information about a space around the signal processing device 900, such as brightness and noise around the signal processing device 900. Furthermore, the input device 906 may include a GNSS module that receives a GNSS signal (e.g., GPS signal from a global positioning system (GPS) satellite) from a global navigation satellite system (GNSS) satellite and measures position information including the latitude, longitude, and altitude of the device. Furthermore, for the position information, the input device 906 may detect the position by transmission and reception with Wi-Fi (registered trademark), a mobile phone, PHS, smartphone, or the like, near field communication, or the like. The input device 906 can implement the function of, for example, the acquisition unit 111 which has been described with reference to
The output device 907 includes a device configured to visually or audibly notify the user of information acquired. Examples of such a device include a display device such as a CRT display device, liquid crystal display device, plasma display device, EL display device, laser projector, LED projector, and lamp, a sound output device such as a speaker and headphones, a printer device, and the like. The output device 907 outputs, for example, results obtained from various processing performed by the signal processing device 900. Specifically, the display device visually displays the results obtained from various processing performed by the signal processing device 900, in various formats such as text, image, tables, graph, and the like. Meanwhile, the sound output device converts an audio signal including voice data, sound data, or the like reproduced, into an analog signal, and aurally outputs the analog signal. The output device 907 can implement, for example, the functions of the output unit 113 and the output unit 220 which have been described with reference to
The storage device 908 is a data storage device that is formed as an example of a storage unit of the signal processing device 900. The storage device 908 is implemented by, for example, a magnetic storage device such as HDD, a semiconductor storage device, an optical storage device, a magneto-optical device, or the like. The storage device 908 may include a storage medium, a recording device that records data in the storage medium, a reading device that reads data from the storage medium, a deletion device that deletes data recorded in the storage medium, and the like. The storage device 908 stores a computer program executed by the CPU 901, various data, various data acquired from outside, and the like. The storage device 908 can implement, for example, the function of the storage unit 120 which has been described with reference to
The drive 909 is a storage medium reader/writer, and is built in or externally mounted to the signal processing device 900. The drive 909 reads information recorded in a removable storage medium such as a mounted magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, and outputs the information to the RAM 903. In addition, the drive 909 is configured to write information on the removable storage medium.
The connection port 910 is, for example, a port for connecting an external connection device such as a universal serial bus (USB) port, IEEE1394 port, small computer system interface (SCSI), RS-232C port, or optical audio terminal.
The communication device 911 is a communication interface including, for example, a communication device or the like for connection to a network 920. The communication device 911 is a communication card or the like, such as for a wired or wireless local area network (LAN), long term evolution (LTE), Bluetooth (registered trademark), or wireless USB (WUSB). Furthermore, the communication device 911 may be a router for optical communication, a router for an asymmetric digital subscriber line (ADSL), a modem for various communications, or the like. The communication device 911 is configured to transmit/receive a signal or the like to/from, for example, the Internet or another communication device according to a predetermined protocol such as TCP/IP. The communication device 911 can implement, for example, the functions of the communication unit 100 and the communication unit 200 which have been described with reference to
Note that the network 920 is a wired or wireless transmission path for information transmitted from devices connected to the network 920. For example, the network 920 may include a public network such as the Internet, a telephone network, or a satellite communication network, various local area networks (LANs) including Ethernet (registered trademark), a wide area network (WAN), or the like. Furthermore, the network 920 may include a private network such as an Internet protocol-virtual private network (IP-VPN).
The example of the hardware configuration capable of implementing the functions of the signal processing device 900 according to the embodiment has been described above. Each of the component elements described above may be implemented using a general-purpose member, or may be implemented using hardware dedicated to the function of each component element.
Accordingly, the hardware configuration to be used can be changed as appropriate according to the technical level when the present embodiment is carried out.
As described above, the signal processing apparatus 10 according to the embodiment performs the processing of determining the correction filter coefficient, on the basis of the acoustic characteristic in the user's ear isolated from the outside world. Furthermore, the signal processing apparatus 10 performs processing of correcting the sound data having a phase opposite to the ambient sound leaking into the user's ear, by using the correction filter. This configuration makes it possible for the signal processing apparatus 10 to determine the correction filter coefficient for optimization without requiring, for example, an acoustic signal at the eardrum position, where arranging a microphone is difficult due to the specifications of a product. Furthermore, the signal processing apparatus 10 performs correction using the correction filter, and thus, the improvement in the NC effect can be promoted.
Therefore, it is possible to provide the new and improved signal processing apparatus, signal processing method, signal processing program, signal processing model production method, and sound output device that are configured to promote further improvement in usability.
Preferred embodiments of the present disclosure have been described above in detail with reference to the accompanying drawings, but the technical scope of the present disclosure is not limited to these examples. A person skilled in the art may obviously find various alterations and modifications within the technical concept described in the claims, and it should be understood that the alterations and modifications will naturally come under the technical scope of the present disclosure.
For example, the respective devices described in the present description may be implemented as a single device, or some or all of the devices may be implemented as separate devices. For example, the signal processing apparatus 10 and the headphones 20 illustrated in
Furthermore, a series of processing steps by the respective devices described in the present description may be implemented using any of software, hardware, and a combination of the software and the hardware. The computer programs constituting the software are stored in advance in, for example, a recording medium (non-transitory medium) provided inside or outside the devices. Then, each program is read into, for example, the RAM upon execution by the computer and is executed by the processor such as the CPU.
Furthermore, the processes having been described using the flowcharts in the present specification may not necessarily be executed in the order illustrated. Some processing steps may be performed in parallel. In addition, an additional processing step may be employed, and some processing steps may be omitted.
Furthermore, the effects described herein are merely illustrative or exemplified effects, and are not limitative. That is, together with, or in place of, the above effects, the technology according to the present disclosure may achieve other effects that are clear to those skilled in the art based on the description of this specification.
Additionally, the present technology may also be configured as below.
(1)
A signal processing apparatus including:
an acquisition unit that acquires an acoustic characteristic in a user's ear, isolated from the outside world;
an NC filter unit that generates sound data having a phase opposite to an ambient sound leaking into the user's ear;
a correction unit that corrects the sound data by using a correction filter; and
a determination unit that determines a filter coefficient of the correction filter based on the acoustic characteristic.
(2)
The signal processing apparatus according to (1), wherein
the acquisition unit
acquires the acoustic characteristic based on a collected sound signal obtained by collecting measurement sound output into the ear.
(3)
The signal processing apparatus according to (1) or (2), wherein
the determination unit
determines the filter coefficient by using a trained model in which an acoustic characteristic is input and a filter coefficient is output.
(4)
The signal processing apparatus according to (3), wherein
the determination unit
determines the filter coefficient by using the trained model that has learned, as training data, an acoustic characteristic estimated at a user's eardrum position.
(5)
The signal processing apparatus according to any one of (1) to (4), wherein
the determination unit
determines the filter coefficient by using a trained model in which an acoustic characteristic and sound data are input and whether to correct the sound data is output.
(6)
The signal processing apparatus according to (5), wherein
the determination unit
determines the filter coefficient by using the trained model that has learned, as training data, given information labeled to indicate whether to perform correction based on a noise suppression ratio estimated based on an acoustic characteristic and sound data.
(7)
The signal processing apparatus according to any one of (1) to (6), wherein
the determination unit
determines the filter coefficient by using a trained model in which an acoustic characteristic, an acoustic characteristic measured in advance, and sound data are input and a noise suppression ratio is output.
(8)
The signal processing apparatus according to (7), wherein
the determination unit
determines the filter coefficient by using the trained model that has learned, as training data, a noise suppression ratio obtained based on both of an acoustic characteristic estimated at a user's eardrum position and sound data.
(9)
The signal processing apparatus according to any one of (1) to (8), wherein
the determination unit
determines the filter coefficient by using a trained model in which a collected sound signal collected by a microphone different from a microphone having measured the acoustic characteristic and sound data are input and a correction filter coefficient correcting a difference in filter coefficient based on an ambient sound in a user environment is output.
(10)
The signal processing apparatus according to (9), wherein
the determination unit
determines the filter coefficient by using the trained model that has learned, as training data, a filter coefficient correcting a difference in filter coefficient based on an acoustic characteristic estimated at a user's eardrum position.
(11)
The signal processing apparatus according to any one of (1) to (10), wherein
the determination unit
determines the filter coefficient by using a trained model in which an acoustic characteristic and sound data are input and an amount of NC effect is output.
(12)
The signal processing apparatus according to (11), wherein
the determination unit
determines the filter coefficient by using the trained model that has learned, as training data, an amount of effect based on an acoustic characteristic estimated at a user's eardrum position.
(13)
The signal processing apparatus according to any one of (1) to (12), wherein
the determination unit
determines the filter coefficient by using a trained model in which an amount of NC effect in an environment set according to a predetermined standard, sound data, and an acoustic characteristic of ambient sound in a user environment are input, and an amount of NC effect in the user environment is output.
(14)
The signal processing apparatus according to (13), wherein
the determination unit
determines the filter coefficient by using the trained model that has learned, as training data, an amount of NC effect based on sound data, a filter coefficient, and an acoustic characteristic of ambient sound in a user environment.
(15)
A signal processing method performed by a computer, the signal processing method including:
an acquisition step of acquiring an acoustic characteristic in a user's ear, isolated from the outside world;
an NC filter step of generating sound data having a phase opposite to an ambient sound leaking into the user's ear;
a correction step of correcting the sound data by using a correction filter; and
a determination step of determining a filter coefficient of the correction filter based on the acoustic characteristic.
(16)
A signal processing program for causing a computer to perform:
an acquisition procedure of acquiring an acoustic characteristic in a user's ear, isolated from the outside world;
an NC filter procedure of generating sound data having a phase opposite to an ambient sound leaking into the user's ear;
a correction procedure of correcting the sound data by using a correction filter; and
a determination procedure of determining a filter coefficient of the correction filter based on the acoustic characteristic.
(17)
A signal processing model production method including: determining, based on an acoustic characteristic based on a collected sound signal collected by a microphone, whether to correct a filter coefficient, and determining a filter coefficient for performing optimal noise canceling; and, in order to generate a noise canceling signal based on the determined filter coefficient, performing learning with an acoustic characteristic based on a collected sound signal collected in advance by a microphone and a correction filter coefficient for performing optimal noise canceling as inputs, thereby producing a model for performing optimal noise canceling.
(18)
A sound output device including an output unit that outputs sound from which noise is cancelled based on a signal provided from a signal processing apparatus, the signal processing apparatus determining a filter coefficient for performing optimal noise cancelling based on an acoustic characteristic based on a collected sound signal collected by a microphone of the sound output device, and providing a signal generated based on the determined filter coefficient.
Number | Date | Country | Kind
---|---|---|---
2020-101763 | Jun 2020 | JP | national
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2021/019901 | 5/26/2021 | WO |