This application claims the benefit of Taiwan application Serial No. 108148594, filed Dec. 31, 2019, the subject matter of which is incorporated herein by reference.
The invention relates in general to an automatic adjusting method and an electronic device using the same, and more particularly to a specific sound source automatic adjusting method and an electronic device using the same.
Along with the advance in technology, various audio/video entertainment devices are provided one after another. For the audio/video entertainment devices, the audio signal directly affects the user's sensations. To provide the user with better sensations, one specific sound source in the original sound signal needs to be amplified.
According to the conventional technology, the entire original sound signal is amplified when the specific sound source is detected. Although this increases the user's sense of presence, such processing does the user little good: because the background music and other sound sources are amplified along with the specific sound source, the signal-to-noise ratio (SNR) does not change. Therefore, it has become a prominent task for people in the technology field to provide a method or device for suitably adjusting a specific sound source and increasing the SNR without affecting other sound sources.
The invention is directed to a specific sound source automatic adjusting method and an electronic device using the same. Through the technologies of determining the number of sound sources and separating the sound sources, the specific sound source is automatically adjusted, and the original sound signal is converted to an adjusted audio signal which is outputted to the headphone to provide the user with better sensations.
According to one embodiment of the present invention, a specific sound source automatic adjusting method is provided. The specific sound source automatic adjusting method includes the following steps. A probabilistic identification process of several specific sound sources is performed on an original sound signal. The number of sound sources of the original sound signal is determined according to the result of the probabilistic identification process on the original sound signal. If the number of sound sources of the original sound signal is greater than or equal to two, a directionality analysis procedure is performed on the original sound signal. At least one specific directional sub-signal is separated out from the original sound signal according to the result of the directional analysis procedure. The probabilistic identification process of the specific sound sources is performed on the specific directional sub-signal. The number of sound sources of the specific directional sub-signal is determined according to the result of the probabilistic identification process on the specific directional sub-signal. If the number of sound sources of the specific directional sub-signal is equal to one, a sound source adjustment procedure is performed.
According to another embodiment of the present invention, an electronic device for automatically adjusting a specific sound source is provided. The electronic device includes a first audio recognition unit, a first multi-sound source determination unit, a directivity analysis unit, a directional separation unit, a second audio recognition unit, a second multi-sound source determination unit and an audio adjustment unit. The first audio recognition unit is configured to perform a probabilistic identification process of several specific sound sources on an original sound signal. The first multi-sound source determination unit is configured to determine the number of sound sources of the original sound signal according to the result of the probabilistic identification process on the original sound signal. If the number of sound sources of the original sound signal is greater than or equal to two, the directivity analysis unit performs a directionality analysis procedure on the original sound signal. The directional separation unit is configured to separate out at least one specific directional sub-signal from the original sound signal according to the result of the directional analysis procedure. The second audio recognition unit is configured to perform the probabilistic identification process of the specific sound sources on the specific directional sub-signal. The second multi-sound source determination unit is configured to determine the number of sound sources of the specific directional sub-signal according to the result of the probabilistic identification process on the specific directional sub-signal. If the number of sound sources of the specific directional sub-signal is equal to one, the audio adjustment unit performs a sound source adjustment procedure.
The above and other aspects of the invention will become better understood with regard to the following detailed description of the preferred but non-limiting embodiment(s). The following description is made with reference to the accompanying drawings.
Referring to
Refer to
Referring to
Then, the method proceeds to step S102, in which a probabilistic identification process of several specific sound sources V1, V2 and V3 is performed on the original sound signal S1 by the first audio recognition unit 102. For example, the first audio recognition unit 102 performs recognition by using a recognition model M11 trained with bombardment sound to obtain a sound source probability P11 of the specific sound source V1; the first audio recognition unit 102 performs recognition by using a recognition model M12 trained with tank sound to obtain a sound source probability P12 of the specific sound source V2; and the first audio recognition unit 102 performs recognition by using a recognition model M13 trained with airplane sound to obtain a sound source probability P13 of the specific sound source V3.
Then, the method proceeds to step S103, in which the number of sound sources of the original sound signal S1 is determined by the first multi-sound source determination unit 103 according to the result of the probabilistic identification process on the original sound signal S1.
When the original sound signal S1 has only one specific sound source, the sound source probability of that specific sound source will be extremely high, and so will the maximum sound source probability. When the original sound signal S1 has several specific sound sources (the background sound source is also regarded as a specific sound source), the sound source probability of each specific sound source will decrease, and the maximum sound source probability will not be very high. When the original sound signal S1 does not have any specific sound source, the sound source probability of each specific sound source will be extremely low, and so will the maximum sound source probability.
That is, the first multi-sound source determination unit 103 can obtain the maximum probability Px from the sound source probabilities P11, P12, P13 of the specific sound sources V1, V2 and V3 according to formula (1). Then, the number of specific sound sources is determined according to the maximum probability Px.
$$P_x = \max_{m} P_m \qquad (1)$$
The first multi-sound source determination unit 103 can set a higher threshold Th1H (such as 0.95) and a lower threshold Th1L (such as 0.1). When the original sound signal has only one specific sound source, the maximum probability Px will be higher than the higher threshold Th1H. When the original sound signal has only one specific sound source and contains background music, the maximum probability Px will be between the higher threshold Th1H and the lower threshold Th1L. When the original sound signal has two or more specific sound sources, the maximum probability Px will be between the higher threshold Th1H and the lower threshold Th1L. When the original sound signal does not have any specific sound source, the maximum probability Px will be lower than the lower threshold Th1L.
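The threshold rule above can be sketched in a few lines. This is only an illustration: the function name `count_sources` and the convention of returning 2 for "two or more" are assumptions made here, and a value exactly equal to a threshold is treated as lying between the thresholds, since the text speaks of "higher than" and "lower than".

```python
def count_sources(probabilities, th_high=0.95, th_low=0.1):
    """Classify the number of sound sources from per-source probabilities.

    Returns 0 (no specific sound source), 1 (exactly one), or
    2 (meaning "two or more", which includes one source plus background).
    The function name and return convention are illustrative assumptions.
    """
    p_max = max(probabilities)  # formula (1): Px is the largest Pm
    if p_max > th_high:
        return 1
    if p_max < th_low:
        return 0
    return 2  # Px lies between Th1L and Th1H


print(count_sources([0.97, 0.02, 0.01]))  # 1
print(count_sources([0.60, 0.30, 0.10]))  # 2
print(count_sources([0.05, 0.03, 0.02]))  # 0
```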
If the determination in step S103 is “the number of sound sources is 0”, the method returns to step S101 in which no adjustment is performed; if the determination in step S103 is “the number of sound sources is 1”, the method proceeds to step S104 to adjust the specific sound source; and if the determination in step S103 is “the number of sound sources is 2 or more”, the method proceeds to step S106 to continue with the separation of sound sources.
In step S104, a sound source adjustment procedure is performed by the audio adjustment unit 104. For example, the audio adjustment unit 104 obtains an adjusted specific sound source V1′ by adjusting the volume of the specific sound source V1 or changing its frequency response via an equalizer (EQ).
In step S105, the adjusted specific sound source V1′ is synthesized with the original sound signal S1 by the synthesis unit 105 to obtain an adjusted audio signal S1′.
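The description does not state how the synthesis unit 105 combines the adjusted source with the original signal. One possible reading, shown here purely as an assumption, is a volume adjustment in which the source's original contribution is replaced by its scaled version. The function name `adjust_and_mix` and the `gain` parameter are hypothetical.

```python
def adjust_and_mix(original, source, gain=2.0):
    """Scale one separated source and put it back into the mix.

    original, source: equal-length lists of time-domain samples, where
    `source` is the isolated specific sound source contained in `original`.
    Assumption: synthesis means swapping the source's original contribution
    for the adjusted one, leaving all other content untouched.
    """
    if len(original) != len(source):
        raise ValueError("original and source must have the same length")
    adjusted = [gain * s for s in source]  # V1' (volume adjustment only)
    return [o - s + a for o, s, a in zip(original, source, adjusted)]


mixed = adjust_and_mix([1.0, 0.5], [0.5, 0.25], gain=2.0)
```

An equalizer-based adjustment, which the text also mentions, would replace the single `gain` factor with a frequency-dependent filter.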
If the determination in step S103 is “the number of sound sources is 2 or more”, the method proceeds to step S106 to continue with the separation of sound sources.
In step S106, a directionality analysis procedure is performed on the original sound signal S1 by the directivity analysis unit 106. Referring to
Wherein, the speed of sound c, the frequency f, and the binaural distance d are constants, and the only factor affecting the phase difference ΔØ is angle θf. Each frequency f corresponds to an angle θf. 1024 frequencies f may correspond to several angles θf, and it is possible that several frequencies f may correspond to the same angle θf. The directivity distribution diagram as illustrated in
Then, the method proceeds to step S107, in which at least one specific directional sub-signal is separated out from the original sound signal S1 by the directional separation unit 107 according to the result of the directionality analysis procedure. For example, the directional separation unit 107 can separate out the specific directional sub-signal S11 corresponding to the angle θ1 and the specific directional sub-signal S12 corresponding to the angle θ2 from the original sound signal S1.
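As a rough illustration of how per-frequency phase differences could be turned into a directivity distribution, the sketch below assumes the common far-field two-receiver model Δφ = 2πf·d·sin(θ)/c. That model, the constants, and the function names are assumptions made for this example; the phase-difference formula itself is not reproduced in this text. Phase wrapping at high frequencies is not handled.

```python
import math
from collections import Counter

SPEED_OF_SOUND = 343.0  # m/s, assumed value
EAR_DISTANCE = 0.18     # m, assumed binaural distance


def angle_from_phase(delta_phi, freq_hz, d=EAR_DISTANCE, c=SPEED_OF_SOUND):
    """Invert the assumed model delta_phi = 2*pi*f*d*sin(theta)/c (degrees)."""
    x = c * delta_phi / (2.0 * math.pi * freq_hz * d)
    x = max(-1.0, min(1.0, x))  # clamp so noise cannot push asin out of range
    return math.degrees(math.asin(x))


def directivity_histogram(phase_diffs, freqs_hz, bin_deg=10):
    """Count how many frequency bins point toward each angle bin."""
    hist = Counter()
    for dphi, f in zip(phase_diffs, freqs_hz):
        theta = angle_from_phase(dphi, f)
        hist[int(round(theta / bin_deg)) * bin_deg] += 1
    return hist
```

Peaks of such a histogram would play the role of the angles θ1 and θ2 used for the directional separation.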
In the present step, the directional separation unit 107 applies a nonlinear projection column mask (NPCM) to the original sound signal S1 according to a specific direction of the directivity distribution diagram to obtain the specific directional sub-signals S11 and S12. Each frequency f corresponds to an angle θf. For the n-th signal, the closer the angle θf of a frequency is to the angle θn, the larger (closer to 1) the assigned weight will be. The separation signal Sn(f) towards the angle θn can be obtained by shielding the signals farther away from the angle θn using different weights. That is, each frequency energy S(f) is multiplied by the corresponding weight wnf: Sn(f) = wnf × S(f). Refer to
Although the specific directional sub-signal S11 and the specific directional sub-signal S12 are separated out from the original sound signal S1 in step S107, several specific sound sources may be located in the same direction. Therefore, the specific directional sub-signal S11 does not necessarily contain only a single specific sound source, and neither does the specific directional sub-signal S12. The number of sound sources therefore still needs to be determined.
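The per-frequency weighting of step S107, Sn(f) = wnf × S(f), can be illustrated as follows. The actual NPCM weight function is not given in this text, so a Gaussian-shaped taper is used here as a stand-in; `direction_weight`, `separate_direction`, and `width_deg` are hypothetical names. Frequency bins whose angle is far from the target angle θn are shielded by weights near 0.

```python
import math


def direction_weight(theta_f, theta_n, width_deg=15.0):
    """Stand-in weight (not the NPCM itself): 1.0 at the target angle,
    decaying toward 0 as the bin's angle moves away from it."""
    return math.exp(-0.5 * ((theta_f - theta_n) / width_deg) ** 2)


def separate_direction(spectrum, bin_angles, theta_n, width_deg=15.0):
    """Apply S_n(f) = w_nf * S(f) to every frequency bin.

    spectrum:   per-frequency values S(f)
    bin_angles: estimated angle (degrees) of each frequency bin
    """
    return [direction_weight(a, theta_n, width_deg) * s
            for s, a in zip(spectrum, bin_angles)]


# Bins at 30 degrees pass through; the bin at -60 degrees is suppressed.
s11 = separate_direction([1.0, 2.0, 3.0], [30.0, 30.0, -60.0], theta_n=30.0)
```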
In step S108, the probabilistic identification process of the specific sound sources V1, V2 and V3 is performed on the specific directional sub-signals S11 and S12 respectively by the second audio recognition unit 108. Let the specific directional sub-signal S11 be taken for example. The second audio recognition unit 108 performs recognition to obtain a sound source probability P21 of the specific sound source V1 by using a recognition model M21 trained with bombardment sound; the second audio recognition unit 108 performs recognition to obtain a sound source probability P22 of the specific sound source V2 by using a recognition model M22 trained with tank sound; and the second audio recognition unit 108 performs recognition to obtain a sound source probability P23 of the specific sound source V3 by using a recognition model M23 trained with airplane sound.
The recognition model M21 of step S108 can be the same as the recognition model M11 of step S102, or the recognition model M21 of step S108 can be the re-trained recognition model. The recognition model M22 of step S108 can be the same as the recognition model M12 of step S102, or the recognition model M22 of step S108 can be the re-trained recognition model. The recognition model M23 of step S108 can be the same as the recognition model M13 of step S102, or the recognition model M23 of step S108 can be the re-trained recognition model.
Let the specific directional sub-signal S12 be taken for example again. The second audio recognition unit 108 performs recognition to obtain a sound source probability P31 of the specific sound source V1 by using a recognition model M31 trained with bombardment sound; the second audio recognition unit 108 performs recognition to obtain a sound source probability P32 of the specific sound source V2 by using a recognition model M32 trained with tank sound; and the second audio recognition unit 108 performs recognition to obtain a sound source probability P33 of the specific sound source V3 by using a recognition model M33 trained with airplane sound.
The recognition model M31 of step S108 can be the same as the recognition model M11 of step S102, or the recognition model M31 of step S108 can be the re-trained recognition model. The recognition model M32 of step S108 can be the same as the recognition model M12 of step S102, or the recognition model M32 of step S108 can be the re-trained recognition model. The recognition model M33 of step S108 can be the same as the recognition model M13 of step S102, or the recognition model M33 of step S108 can be the re-trained recognition model.
Then, the method proceeds to step S109, in which the number of sound sources of the specific directional sub-signal S11 and the number of sound sources of the specific directional sub-signal S12 are determined by the second multi-sound source determination unit 109 according to the result of the probabilistic identification process performed on the specific directional sub-signal S11 and the specific directional sub-signal S12 respectively.
The second multi-sound source determination unit 109 can set a new higher threshold Th2H (such as 0.99) and a new lower threshold Th2L (such as 0.05). If the determination in step S109 is “the number of sound sources is 1”, the method proceeds to step S104 to adjust the specific sound source; if the determination in step S109 is “the number of sound sources is 2”, the method proceeds to step S110 to continue with the separation of signal. For example, if the number of sound sources of the specific directional sub-signal S11 is 1, the specific directional sub-signal S11 is adjusted through step S104; if the number of sound sources of the specific directional sub-signal S11 is 2, the specific directional sub-signal S11 is separated through step S110.
In step S110, a sparse characteristic analysis (SCA) program, an independent component analysis (ICA) program, or a non-negative matrix factorization program is performed on the specific directional sub-signal S12 by the characteristic separation unit 110. Through the directional separation of step S107, all sound sources of the specific directional sub-signal S12 are in the same direction, and basically the specific directional sub-signal S12 does not have many sound sources. To avoid unnecessary distortion, the specific directional sub-signal S12 is only separated into two sub-signals. The specific directional sub-signal can be separated by using the sparse characteristic analysis (SCA) method according to the sparsity of the sound band between individual sub-signals, or by using the independent component analysis (ICA) method according to the independence between sound sources, or by using the non-negative matrix factorization method which divides the signal into different bases corresponding to suitable coefficients.
After two sub-signals are separated in step S110, the method proceeds to step S111.
In step S111, whether step S110 has been performed more than K times is determined by the frequency determination unit 111. If step S110 has been performed more than K times, the method proceeds to step S112; otherwise, the method returns to step S108. That is, if it still cannot be accurately confirmed that the sub-signal has only one sound source after the signal separation of step S110 has been performed many times, the method exits the loop and proceeds to step S112.
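The bounded loop over steps S108 to S111 might be organized as in the sketch below. The callables `recognize` and `split_in_two` are placeholders for the recognition models and for the separation of step S110, and discarding sub-signals whose maximum probability falls below the lower threshold is an assumption of this sketch rather than something the text states.

```python
def resolve_sub_signal(sub_signal, recognize, split_in_two, k,
                       th_high=0.99, th_low=0.05):
    """Separate a sub-signal until each part has one source or S110 ran > k times.

    recognize(sig)    -> list of sound source probabilities (S108), placeholder
    split_in_two(sig) -> two sub-signals (S110), placeholder
    Returns (single_source_parts, unresolved_parts); the unresolved parts
    would be handed to step S112.
    """
    pending, single, splits = [sub_signal], [], 0
    while pending:
        mixed = []
        for sig in pending:
            p_max = max(recognize(sig))  # S108
            if p_max > th_high:          # S109: exactly one source
                single.append(sig)
            elif p_max >= th_low:        # S109: still a mixture
                mixed.append(sig)
            # Assumption: below th_low no specific source exists; drop it.
        if not mixed:
            break
        pending = [part for sig in mixed for part in split_in_two(sig)]  # S110
        splits += 1
        if splits > k:                   # S111
            return single, pending
    return single, []
```

Because the split counter grows on every pass, the loop always terminates, which is the purpose of the count check in step S111.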
In step S112, whether each of the specific sound sources V1, V2 and V3 of the specific directional sub-signal S12 exists is directly determined by the specific sound source determination unit 112 according to the result of the probabilistic identification process on the specific directional sub-signal S12. The specific sound source determination unit 112 sets a middle threshold Th3M as 0.5. If the sound source probability P31 of the specific sound source V1 is higher than the middle threshold Th3M, the specific sound source determination unit 112 directly determines that the specific sound source V1 exists, and the method proceeds to step S104 to perform adjustment; if the sound source probability P31 of the specific sound source V1 is not higher than the middle threshold Th3M, the specific sound source determination unit 112 directly determines that the specific sound source V1 does not exist and there is no need to perform adjustment. If the sound source probability P32 of the specific sound source V2 is higher than the middle threshold Th3M, the specific sound source determination unit 112 directly determines that the specific sound source V2 exists, and the method proceeds to step S104 to perform adjustment; if the sound source probability P32 of the specific sound source V2 is not higher than the middle threshold Th3M, the specific sound source determination unit 112 directly determines that the specific sound source V2 does not exist and there is no need to perform adjustment. 
If the sound source probability P33 of the specific sound source V3 is higher than the middle threshold Th3M, the specific sound source determination unit 112 directly determines that the specific sound source V3 exists and the method proceeds to step S104 to perform adjustment; if the sound source probability P33 of the specific sound source V3 is not greater than the middle threshold Th3M, the specific sound source determination unit 112 directly determines that the specific sound source V3 does not exist and there is no need to perform adjustment.
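The fallback decision of step S112 amounts to comparing each sound source probability with the middle threshold Th3M. A minimal sketch, with the function name `present_sources` and the example probabilities chosen here for illustration:

```python
def present_sources(probabilities, th_mid=0.5):
    """Return the specific sound sources judged to exist in a sub-signal.

    probabilities: mapping from source name to sound source probability,
    e.g. {"V1": P31, "V2": P32, "V3": P33}. A source exists only if its
    probability is strictly higher than the middle threshold.
    """
    return [name for name, p in probabilities.items() if p > th_mid]


# Illustrative probabilities: only V1 exceeds 0.5 and would go to step S104.
print(present_sources({"V1": 0.7, "V2": 0.5, "V3": 0.2}))  # ['V1']
```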
Through the above embodiments, the specific sound sources can be separated and adjusted accordingly, such that the specific sound sources can be highlighted, and the user can be provided with better sensations.
While the invention has been described by way of example and in terms of the preferred embodiment(s), it is to be understood that the invention is not limited thereto. On the contrary, it is intended to cover various modifications and similar arrangements and procedures, and the scope of the appended claims therefore should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements and procedures.
Number | Date | Country | Kind |
---|---|---|---|
108148594 | Dec 2019 | TW | national |
Number | Date | Country | |
---|---|---|---|
20210204083 A1 | Jul 2021 | US |