This application claims the priority benefit of Taiwan application serial no. 111148595, filed on Dec. 16, 2022. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The disclosure relates to a signal processing technique, and in particular relates to an audio signal processing method and a mobile apparatus.
Generally, there are noise reduction mechanisms on the transmission path of the microphone for conference applications in notebook computers. Examples include steady-state noise reduction technology for a single microphone, and beamforming technology for a microphone array that adjusts the sound-reception direction of the beam (to accommodate user movement, the beam angle should not be too narrow). Back-end artificial intelligence (AI) noise reduction technology may even be used to preserve the human voice signal.
In practical applications, when other people talk near the user, their voice signals are often not filtered out, and may even accompany the user's voice signal and be transmitted through the microphone path. In addition, when the user moves and is no longer fully aligned with the direction covered by the microphone array, the received audio signal is also affected.
On the other hand, in a conference, most users use an external microphone (e.g., a headset microphone). However, some external microphones are omnidirectional, which causes surrounding audio signals to be recorded and degrades the noise reduction effect.
In view of this, the embodiments of the disclosure provide an audio signal processing method and a mobile apparatus, which use blind signal separation (BSS) technology to enhance the noise reduction effect.
The audio signal processing method of the embodiment of the disclosure is suitable for a mobile apparatus and an external microphone (mic), where the mobile apparatus is communicatively connected to the external microphone and includes an embedded microphone (mic). The audio signal processing method includes (but is not limited to) the following operations. A target direction among multiple sound-reception directions and a target distance corresponding to the target direction are determined according to multiple first audio signals in the sound-reception directions received by the embedded microphone. A primary sound source is located in the target direction and at the target distance from the embedded microphone; the target direction is determined based on a correlation between the first audio signals and a second audio signal received by the external microphone, and the target distance is determined based on the signal power of the first audio signal in the target direction. A target algorithm is selected from multiple blind signal separation (BSS) algorithms according to the target direction and the target distance. The target algorithm is determined based on an included angle between the target direction and an interference source sound direction and on the magnitude of the target distance, where the interference source sound direction corresponds to an interference sound source. The first audio signal received by the embedded microphone in the target direction is set as a secondary signal of the target algorithm, and the second audio signal received by the external microphone is set as a primary signal of the target algorithm. The audio signal of the primary sound source is separated from the primary signal and the secondary signal through the target algorithm.
The mobile apparatus of the embodiment of the disclosure includes (but is not limited to) an embedded microphone, a communication transceiver, and a processor. The embedded microphone is used for sound reception. The communication transceiver is communicatively connected to an external microphone and is used to receive signals from the external microphone. The processor is coupled to the embedded microphone and the communication transceiver. The processor is configured to perform the following operations. A target direction among multiple sound-reception directions and a target distance corresponding to the target direction are determined according to multiple first audio signals in the sound-reception directions received by the embedded microphone. A target algorithm is selected from multiple blind signal separation (BSS) algorithms according to the target direction and the target distance. The first audio signal received by the embedded microphone in the target direction is set as a secondary signal of the target algorithm, and the second audio signal received by the external microphone is set as a primary signal of the target algorithm. The audio signal of the primary sound source is separated from the primary signal and the secondary signal through the target algorithm. The primary sound source is located in the target direction and at the target distance from the embedded microphone; the target direction is determined based on a correlation between the first audio signals and the second audio signal received by the external microphone, and the target distance is determined based on the signal power of the first audio signal in the target direction. The target algorithm is determined based on an included angle between the target direction and an interference source sound direction and on the magnitude of the target distance, where the interference source sound direction corresponds to an interference sound source.
Based on the above, in the audio signal processing method and the mobile apparatus according to the embodiments of the disclosure, the audio signal of the primary sound source can be separated from the mixed signals (e.g., the first audio signal and the second audio signal) by using the corresponding target algorithm according to the location of the primary sound source. In this way, when the user uses the external microphone, only the single human voice signal of the primary user is transmitted through the microphone path.
In order to make the above-mentioned features and advantages of the disclosure comprehensible, embodiments accompanied with drawings are described in detail below.
The embedded microphone 11 may be a dynamic, condenser, or electret condenser microphone, among other types, and may also be a combination of electronic elements, analog-to-digital converters, filters, and audio processors capable of receiving sound waves (e.g., human voice, ambient sound, machine operation sound, etc.) (i.e., sound reception or sound recording) and converting them into audio signals. The embedded microphone 11 is combined with the body of the mobile apparatus 10. In one embodiment, two or more embedded microphones 11 form a microphone array to provide a directional beam. In one embodiment, the embedded microphone 11 is used to receive/record a human speaker to obtain a voice signal. In some embodiments, the voice signal may include the voice of the human speaker, the sound from a speaker apparatus (not shown), and/or other ambient sounds.
The communication transceiver 12 can support Bluetooth, universal serial bus (USB), optical fiber, S/PDIF, a 3.5 mm jack, or other audio transmission interfaces. In one embodiment, the communication transceiver 12 is used to receive (audio) signals from the external microphone 15.
The storage device 13 may be any type of fixed or movable random access memory (RAM), read only memory (ROM), flash memory, conventional hard disk drive (HDD), solid-state drive (SSD) or similar components. In one embodiment, the storage device 13 is used to store program codes, software modules, configuration, data (e.g., audio signals, algorithm parameters, etc.) or files, and the embodiments thereof are described in detail below.
The processor 14 is coupled to the embedded microphone 11, the communication transceiver 12, and the storage device 13. The processor 14 may be a central processing unit (CPU), a graphics processing unit (GPU), or other programmable general-purpose or special-purpose microprocessors, a digital signal processor (DSP), a programmable controller, a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a neural network accelerator, or other similar components, or combinations of components thereof. In one embodiment, the processor 14 is used to execute all or some of the operations of the mobile apparatus 10, and can load and execute various program codes, software modules, files, and data stored in the storage device 13. In some embodiments, the functions of the processor 14 can be realized by software or chips.
The external microphone 15 may be a dynamic, condenser, or electret condenser microphone, among other types, and may also be a combination of electronic elements, analog-to-digital converters, filters, and audio processors capable of receiving sound waves (e.g., human voice, ambient sound, machine operation sound, etc.) (i.e., sound reception or sound recording) and converting them into audio signals. The external microphone 15 can be omnidirectional or directional. In one embodiment, the external microphone 15 is an earphone microphone or a microphone of a wearable device. In one embodiment, the external microphone 15 is used to receive/record a human speaker to obtain a voice signal. In some embodiments, the voice signal may include the voice of the human speaker, the sound from a speaker apparatus (not shown), and/or other ambient sounds.
Hereinafter, the method according to the embodiment of the disclosure is described in conjunction with various components and modules of the mobile apparatus 10 and the external microphone 15. Each process of the method can be adjusted according to the implementation, and is not limited thereto.
There are many ways to determine the sound-reception direction. In one embodiment, the processor 14 can form beams in multiple sound-reception directions (or directional angles) through the embedded microphone 11, such as beams in the sound-reception directions θ1 and θ2.
In one embodiment, the target direction is determined based on the correlation between the first audio signals and the second audio signal received by the external microphone 15. For example, the processor 14 calculates an orthogonal cross-correlation between each of the first audio signals and the second audio signal. If the correlation between a certain first audio signal and the second audio signal is the largest, the processor 14 sets the sound-reception direction corresponding to this first audio signal as the target direction.
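The following Python sketch illustrates this selection step. It is illustrative only: the function name, the dictionary-based input format, and the use of a normalized zero-lag cross-correlation are assumptions, since the disclosure does not spell out the exact correlation computation.

```python
import numpy as np

def find_target_direction(beam_signals, second_audio_signal):
    """Return the sound-reception direction whose first audio signal has the
    greatest correlation with the second audio signal.

    beam_signals: dict mapping direction (degrees) to a 1-D numpy array
    second_audio_signal: 1-D numpy array of the same length
    """
    best_direction, best_corr = None, -np.inf
    for direction, first_audio_signal in beam_signals.items():
        # Normalized zero-lag cross-correlation as the similarity measure
        corr = abs(np.dot(first_audio_signal, second_audio_signal)) / (
            np.linalg.norm(first_audio_signal)
            * np.linalg.norm(second_audio_signal) + 1e-12)
        if corr > best_corr:
            best_direction, best_corr = direction, corr
    return best_direction
```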
For example, the processor 14 may compare a first correlation R1 between a candidate signal (a first audio signal currently regarded as the candidate for the target direction) and the second audio signal with a second correlation R2 between an evaluation signal (another first audio signal) and the second audio signal. In response to the first correlation R1 being greater than the second correlation R2, the processor 14 keeps the candidate signal as the candidate for the target direction. On the other hand, in response to the first correlation R1 being not greater than the second correlation R2, the processor 14 may use the evaluation signal as the (new) candidate signal for the target direction. In this way, the first audio signal with the greatest correlation can be found, and its sound-reception direction is used as the target direction.
It should be noted that, if two or more of the correlations are equally the greatest, the processor 14 may determine the target direction among the sound-reception directions corresponding to these correlations according to a difference method.
In another embodiment, the direction of the primary sound source relative to the mobile apparatus 10 may be estimated based on angle of arrival (AOA; also referred to as direction of arrival, DOA) positioning technology. For example, the processor 14 can determine the direction based on the time difference between the sound waves of the audio signal from the primary sound source arriving at two embedded microphones 11 and the distance between the two embedded microphones 11, and set this direction as the target direction.
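As a worked illustration, the standard far-field AOA relation can be written as follows; this is the usual geometric formula, not necessarily the disclosure's exact computation, and the constants shown are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C

def estimate_aoa(delay_seconds, mic_spacing_m):
    """Estimate the angle of arrival (degrees from broadside) from the
    arrival-time difference between two microphones, using the far-field
    relation: path difference = spacing * sin(angle)."""
    ratio = np.clip(SPEED_OF_SOUND * delay_seconds / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))

# Example: a 0.1 ms delay across two microphones spaced 5 cm apart
# gives an angle of roughly 43 degrees.
```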
On the other hand, the target distance is determined based on the signal power of the first audio signal in the target direction. The stronger the signal power, the closer the target distance; the weaker the signal power, the farther the target distance. For example, the signal power is inversely proportional to the square of the target distance, although it may still be affected by factors such as the environment and receiver sensitivity.
For another example, the corresponding relationship between signal power and distance may be defined in a comparison table or a conversion formula, which can be loaded by the processor 14 to estimate the target distance.
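A minimal sketch of such a power-to-distance estimate is shown below, assuming the inverse-square model mentioned above and a single calibration measurement; the calibration interface and parameter names are illustrative assumptions.

```python
import numpy as np

def estimate_distance(first_audio_signal, ref_power, ref_distance_m):
    """Estimate the target distance from the signal power of the first audio
    signal in the target direction, assuming an inverse-square relation and
    one calibration point (ref_power observed at ref_distance_m)."""
    power = np.mean(np.asarray(first_audio_signal, dtype=np.float64) ** 2)
    # power ~ 1 / distance^2  =>  distance = ref_distance * sqrt(ref_power / power)
    return ref_distance_m * np.sqrt(ref_power / (power + 1e-12))
```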
The blind signal separation algorithms include an independent component analysis (ICA) algorithm and a sparse component analysis (SCA) algorithm.
Independent component analysis assumes that the sound sources are independent of each other and that the nature of their audio signals is not affected by being mixed, so the separated audio signals are obtained by multiplying the mixed signal by an estimated inverse transfer function matrix (i.e., the separation matrix).
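For illustration only, the following sketch performs such a separation using scikit-learn's FastICA as a stand-in for the independent component analysis described above; the disclosure does not prescribe this particular library or implementation.

```python
import numpy as np
from sklearn.decomposition import FastICA  # assumed dependency, scikit-learn >= 1.1

def separate_by_ica(primary_signal, secondary_signal):
    """Separate two mixed signals by estimating a separation (unmixing)
    matrix and applying it to the mixture, as described above."""
    mixed = np.column_stack([primary_signal, secondary_signal])  # shape (samples, 2)
    ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
    separated = ica.fit_transform(mixed)  # columns are the estimated sources
    unmixing_matrix = ica.components_     # the estimated separation matrix
    return separated, unmixing_matrix
```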
Sparse component analysis assumes that the audio signal of each sound source is very sparse in some domain. "Sparse" means that most values of the audio signal are close to 0; that is, each component point of the mixed signal usually contains only one primary sound source. For example, a voicegram (also referred to as a spectrogram) can be viewed as the change of voice frequency components over time, and voice signals from different people have different sound characteristics (e.g., fundamental frequency, harmonics, speech tempo, or pauses), so the intersection of the voicegrams of different sound sources is very small (or disjoint). Therefore, the fact that each time-frequency unit in the voicegram of the mixed signal comes from only one of the sound sources is known as the sparse characteristic.
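This near-disjoint characteristic can be checked numerically. The sketch below, with assumed STFT parameters and an assumed activity threshold, measures how small the overlap of two voicegrams is:

```python
import numpy as np
from scipy.signal import stft

def voicegram_overlap(voice_a, voice_b, fs=16000, threshold_db=-40.0):
    """Return the fraction of active time-frequency bins that are active in
    BOTH voicegrams (a smaller value means a more disjoint, sparser pair)."""
    _, _, A = stft(voice_a, fs=fs, nperseg=512)
    _, _, B = stft(voice_b, fs=fs, nperseg=512)
    # A bin is "active" if its magnitude is within |threshold_db| of the peak
    active_a = 20 * np.log10(np.abs(A) + 1e-12) > 20 * np.log10(np.abs(A).max()) + threshold_db
    active_b = 20 * np.log10(np.abs(B) + 1e-12) > 20 * np.log10(np.abs(B).max()) + threshold_db
    both = np.logical_and(active_a, active_b).sum()
    either = np.logical_or(active_a, active_b).sum()
    return both / max(either, 1)
```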
The target algorithm is determined based on an included angle between the target direction and an interference source sound direction and on the magnitude of the target distance, where the interference source sound direction corresponds to an interference sound source.
According to the Gaussian distribution characteristics of voice (i.e., the first audio signal and the second audio signal approach a Gaussian distribution), the voice signal is initially separated by independent component analysis, and the specified objective function defined in the calculation process (i.e., in the target algorithm) changes according to the target direction and the target distance of the primary sound source relative to the mobile apparatus 10.
Negentropy is a non-Gaussianity measure. In information theory, the entropy of a random variable is related to its information content. Negentropy can be defined as:

J(y) = H(y_gauss) - H(y)  (1)

where y_gauss is a random variable conforming to the Gaussian distribution and having the same variance as y, y is a random variable corresponding to the primary signal and the secondary signal, and H(y) = -∫ p_y(τ) log p_y(τ) dτ, in which p_y(τ) is the probability density function of the random variable y. Function (1) can be approximated as:

J(y) ≈ [E{G(y)} - E{G(y_gauss)}]^2  (2)

where E{·} is the expectation function, and the parameter G can be selected from the parameters G1, G2, and G3:

G1(u) = (1/a1) log cosh(a1·u)  (3)

G2(u) = -exp(-u^2/2)  (4)

G3(u) = u^4/4  (5)

where a1 is a constant.
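For reference, a direct numerical rendering of approximation (2) and the parameters G1 to G3 is sketched below, assuming the standard contrast-function forms reconstructed in (3) to (5); the sample count and seed are illustrative.

```python
import numpy as np

A1 = 1.0  # the constant a1 in G1; values in [1, 2] are common choices

def G1(u):
    # General-purpose contrast function, per (3)
    return np.log(np.cosh(A1 * u)) / A1

def G2(u):
    # More stable choice for highly super-Gaussian sources, per (4)
    return -np.exp(-u ** 2 / 2)

def G3(u):
    # Kurtosis-based contrast function with the smallest computation, per (5)
    return u ** 4 / 4

def negentropy_estimate(y, G, n_gauss=100_000, seed=0):
    """Evaluate approximation (2): J(y) ~ (E{G(y)} - E{G(y_gauss)})^2,
    after standardizing y to zero mean and unit variance."""
    y = (y - y.mean()) / (y.std() + 1e-12)
    y_gauss = np.random.default_rng(seed).standard_normal(n_gauss)
    return (np.mean(G(y)) - np.mean(G(y_gauss))) ** 2
```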
In one embodiment, the processor 14 can compare the target distance with a distance threshold (e.g., 10 cm, 15 cm, or 30 cm). In response to the target distance being not less than the distance threshold, the processor 14 sets the target algorithm as a first independent component analysis algorithm using the parameter G1; that is, the processor 14 selects the first independent component analysis algorithm using the parameter G1 as the target algorithm. Since the user usually does not get too close to the mobile apparatus 10 in general use, the parameter G1 is usually adopted. In response to the target distance being less than the distance threshold, the processor 14 sets the target algorithm as a second independent component analysis algorithm using the parameter G2; that is, the processor 14 selects the second independent component analysis algorithm using the parameter G2 as the target algorithm to obtain better stability.
In one embodiment, the processor 14 can determine the software and hardware resources of the mobile apparatus 10 and the corresponding computational load. In response to a computational limit (e.g., on the access speed or bandwidth of the storage device 13 or the processing speed of the processor 14), the processor 14 sets the target algorithm as a third independent component analysis algorithm using the parameter G3; that is, the processor 14 selects the third independent component analysis algorithm using the parameter G3 as the target algorithm, so as to meet the requirement of small computation.
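The selection rules of the two preceding paragraphs can be summarized as follows; the threshold value and the precedence given to the computational limit are assumptions, since the disclosure states the conditions but not their priority.

```python
DISTANCE_THRESHOLD_M = 0.15  # e.g., 15 cm, one of the example thresholds above

def pick_ica_variant(target_distance_m: float, compute_limited: bool) -> str:
    """Select among the three ICA variants following the rules described
    above; the labels returned here are illustrative."""
    if compute_limited:
        return "ICA-G3"  # kurtosis-based, smallest computation
    if target_distance_m < DISTANCE_THRESHOLD_M:
        return "ICA-G2"  # better stability for a very close source
    return "ICA-G1"      # general-purpose default
```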
In addition, according to individual voice characteristics, sparse component analysis can be used to separate the voice signal more completely. In the embodiment of the disclosure, the target algorithm changes according to the target direction and the target distance of the primary sound source relative to the mobile apparatus 10.
For example, the primary signal and the secondary signal may be regarded as two mixed signals x1 and x2, each being a mixture of the audio signals of the primary sound source and the interference sound source.
In order to project the mixed signals x1 and x2 into a sparse domain, the processor 14 can find their two principal directions (e.g., the target direction and the interference source sound direction).
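One common way to find two dominant mixing directions in a sparse domain is a histogram of bin-wise amplitude-ratio angles, sketched below; this is an illustrative technique under assumed STFT parameters, not necessarily the exact procedure of the disclosure.

```python
import numpy as np
from scipy.signal import stft

def principal_mixing_angles(x1, x2, fs=16000, n_peaks=2):
    """Estimate the two dominant mixing directions of the 2-mixture (x1, x2)
    from a histogram of bin-wise amplitude-ratio angles in the STFT domain."""
    _, _, X1 = stft(x1, fs=fs, nperseg=512)
    _, _, X2 = stft(x2, fs=fs, nperseg=512)
    magnitude = np.abs(X1) + np.abs(X2)
    keep = magnitude > 0.1 * magnitude.max()   # use only clearly active bins
    angles = np.arctan2(np.abs(X2[keep]), np.abs(X1[keep]))  # radians in [0, pi/2]
    hist, edges = np.histogram(angles, bins=90)
    top = np.sort(np.argsort(hist)[-n_peaks:])  # the most populated angle bins
    return (edges[top] + edges[top + 1]) / 2    # bin-center angles
```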
In one embodiment, if the included angle between the target direction and the interference source sound direction is large, the target direction and the interference source sound direction estimated by the nonlinear projection column masking algorithm may deviate from the actual directions. The processor 14 can compare the included angle between the target direction and the interference source sound direction with an angle threshold (e.g., 45 degrees, 60 degrees, or 90 degrees). In response to the included angle between the target direction and the interference source sound direction being greater than the angle threshold, the processor 14 sets the target algorithm as a principal component analysis algorithm; that is, the processor 14 selects the principal component analysis algorithm as the target algorithm. In response to the included angle between the target direction and the interference source sound direction being not greater than the angle threshold, the processor 14 sets the target algorithm as a nonlinear projection column masking algorithm; that is, the processor 14 selects the nonlinear projection column masking algorithm as the target algorithm.
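This threshold-based choice may be sketched as follows; the threshold value and the returned labels are illustrative assumptions.

```python
ANGLE_THRESHOLD_DEG = 60.0  # one of the example thresholds mentioned above

def pick_sca_algorithm(target_dir_deg, interference_dir_deg):
    """Choose the sparse-component approach based on the included angle."""
    angle = abs(target_dir_deg - interference_dir_deg) % 360.0
    included = 360.0 - angle if angle > 180.0 else angle
    if included > ANGLE_THRESHOLD_DEG:
        return "principal component analysis"
    return "nonlinear projection column masking"
```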
To sum up, in the audio signal processing method and the mobile apparatus according to the embodiments of the disclosure, when an external microphone is used, the audio signal received by the external microphone is used as the primary signal. At the same time, the embedded microphone of the mobile apparatus is turned on, and the audio signal of the embedded microphone is used as the secondary signal. According to the direction and distance of the primary sound source relative to the mobile apparatus, a suitable blind signal separation technology is selected so that only the single audio signal of the primary sound source is transmitted on the microphone path, thereby strengthening the audio signal of the primary sound source.
Although the disclosure has been described in detail with reference to the above embodiments, they are not intended to limit the disclosure. Those skilled in the art should understand that it is possible to make changes and modifications without departing from the spirit and scope of the disclosure. Therefore, the protection scope of the disclosure shall be defined by the following claims.