This application claims the priority benefit of Taiwan application serial no. 110114321, filed on Apr. 21, 2021. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The disclosure generally relates to a signal analysis technique, and in particular, to an apparatus and a method for audio signal processing selection.
Conventional audio signal processing operations include various noise reduction techniques. Different audio transmission modes (for example, a built-in loudspeaker, an earphone, or an external loudspeaker) used in an application (e.g., Skype, Teams, etc.) may result in a significant difference in the effect of these techniques.
Accordingly, the embodiment of the disclosure is directed to an apparatus and a method for audio signal processing selection capable of providing an appropriate audio signal processing operation for a specific application and a specific audio output mode.
A method for audio signal processing selection in an embodiment of the disclosure includes (but is not limited to): respectively performing multiple audio signal processing operations on a synthesized audio signal to generate multiple processed audio signals; evaluating the audio signal processing operations according to multiple comparison results of the processed audio signals and a primary signal; and selecting one of the audio signal processing operations corresponding to a designated application and a designated audio output mode according to an evaluation result corresponding to the audio signal processing operations. The synthesized audio signal is generated by adding a secondary signal into the primary signal, and the audio signal processing operations are related to removing the secondary signal from the synthesized audio signal. The processed audio signals are used by an identical designated application at an identical designated audio output mode, and the comparison results are related to a signal similarity. The evaluation result is related to one of the comparison results with the highest signal similarity.
An apparatus for audio signal processing selection in an embodiment of the disclosure includes (but is not limited to) a storage and a processor. The storage is configured to store a code. The processor is coupled to the storage and is configured to load the code to execute: respectively performing multiple audio signal processing operations on a synthesized audio signal to generate multiple processed audio signals; using the processed audio signals at an identical designated audio output mode by an identical designated application; respectively evaluating the audio signal processing operations according to multiple comparison results between the processed audio signals and a primary signal; and selecting one of the audio signal processing operations corresponding to the designated application and the designated audio output mode according to an evaluation result corresponding to the audio signal processing operations. The synthesized audio signal is generated by adding a secondary signal into the primary signal, and the audio signal processing operations are related to removing the secondary signal from the synthesized audio signal. The comparison results are related to signal similarity, and the evaluation result is related to one of the comparison results with the highest signal similarity.
In light of the above, the apparatus and the method for audio signal processing selection in the embodiments of the disclosure seek an audio signal processing operation whose output audio signal is the most similar to the primary signal for the designated application and the designated audio output mode. Accordingly, when the application and the audio output mode change, the apparatus can automatically switch to the most appropriate audio signal processing operation.
To facilitate understanding of the features and advantages of the disclosure, reference will now be made in detail to the present exemplary embodiments of the disclosure, examples of which are illustrated in the accompanying drawings.
The storage 110 may be any type of fixed or mobile random access memory (RAM), read only memory (ROM), flash memory, hard disk drive (HDD), solid-state drive (SSD), or other similar devices. In an embodiment, the storage 110 is used to record programming codes, software modules (for example, a synthesis module 111, an application control module 113, an audio signal processing module 115, an evaluation module 117, and a selection module 119), a configuration setting, data, or a file (for example, an audio signal, a comparison result, and an evaluation result). These are described in detail below.
The processor 150 is coupled to the storage 110, and the processor 150 may be a central processing unit (CPU), a graphics processing unit (GPU), or other programmable general-purpose or special-purpose microprocessor, digital signal processor (DSP), programmable controller, field programmable gate array (FPGA), application-specific integrated circuit (ASIC), neural network accelerator, or similar device, or any combination of the above devices. In an embodiment, the processor 150 is used to execute some or all of the tasks of the apparatus 100 for audio signal processing selection and may load and execute each software module, code, file, and data stored in the storage 110.
In the following, a method according to an embodiment of the disclosure will be described with reference to the respective elements, modules, and signals of the apparatus 100 for audio signal processing selection. Each procedure in the method may be adjusted according to the practice and is not limited to the following description.
In an embodiment, the synthesis module 111, for example, may superimpose the two signals SM and SN on the frequency spectrum or adopt other synthesis techniques. In another embodiment, the apparatus 100 for audio signal processing selection may simultaneously play the primary signal SM and the secondary signal SN through a built-in, an add-on or an external loudspeaker and further record the signals so as to obtain the synthesized audio signal SS.
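As an illustration only, one of the "other synthesis techniques" could be a sample-by-sample addition in the time domain. The following is a minimal sketch under that assumption; the function name `synthesize`, the `snr_db` parameter, and the test tones are hypothetical and do not appear in the disclosure.

```python
import math

def synthesize(primary, secondary, snr_db=10.0):
    """Add a scaled secondary signal SN to the primary signal SM, sample by sample.

    Assumption: both signals are equal-length lists of float samples.
    The secondary signal is scaled so that the power ratio of SM to the
    scaled SN equals snr_db (hypothetical parameter).
    """
    if not primary or len(primary) != len(secondary):
        raise ValueError("signals must be non-empty and of equal length")
    p_power = sum(x * x for x in primary) / len(primary)
    s_power = sum(x * x for x in secondary) / len(secondary)
    if p_power == 0 or s_power == 0:
        gain = 1.0  # nothing to scale against; add the signals as they are
    else:
        gain = math.sqrt(p_power / (s_power * 10 ** (snr_db / 10)))
    return [p + gain * s for p, s in zip(primary, secondary)]

# Hypothetical test signals: a 440 Hz tone as SM and a 50 Hz hum as SN at 8 kHz.
rate = 8000
sm = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
sn = [math.sin(2 * math.pi * 50 * n / rate) for n in range(rate)]
ss = synthesize(sm, sn, snr_db=10.0)
```

Scaling the secondary signal to a known ratio is a design choice of this sketch; it makes results from different secondary signals easier to compare.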
On the other hand, in an embodiment, the audio signal processing operation on the synthesized audio signal SS performed by the audio signal processing module 115 is related to removing the secondary signal SN from the synthesized audio signal SS. For example, one of the purposes of the audio signal processing operation is to restore the primary signal SM or eliminate noise. A noise reduction/cancellation (or sound source separation) technique, for example, generates a signal with a phase opposite to the phase of a noise sound wave or adopts independent components analysis (ICA) to eliminate noise (that is, the secondary signal SN) from the synthesized audio signal SS. The embodiments of the disclosure do not intend to limit the type of the techniques.
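The opposite-phase approach mentioned above can be sketched as follows. This is an idealized illustration that assumes a perfect estimate of the secondary signal SN is available, which is generally not the case in practice; the function name `cancel_with_antiphase` is hypothetical.

```python
def cancel_with_antiphase(synthesized, noise_estimate):
    """Add a phase-inverted copy of the noise estimate to the synthesized signal SS."""
    # Inverting the sign of every sample yields a wave with the opposite phase.
    anti_noise = [-n for n in noise_estimate]
    return [s + a for s, a in zip(synthesized, anti_noise)]

# Toy samples (assumed values): SS = SM + SN, and the estimate equals SN exactly.
sm = [0.0, 0.5, 1.0, 0.5]
sn = [0.25, -0.25, 0.25, -0.25]
ss = [p + n for p, n in zip(sm, sn)]
restored = cancel_with_antiphase(ss, sn)
```

With an exact noise estimate the primary signal SM is recovered; with an imperfect estimate a residual of SN remains, which is what the later evaluation measures.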
The signal outputs through different audio signal processing techniques based on the same input signal may differ regarding the frequency, the waveform, or the amplitude. If multiple audio signal processing techniques are to be evaluated, the audio signal processing module 115 may integrate the audio signal processing techniques and process the synthesized audio signal SS by respectively adopting different audio signal processing techniques. In addition, to understand a removal capability of a specific audio signal processing operation on different secondary signals SN, the synthesis module 111 may also respectively incorporate different types of the secondary signals SN for subsequent evaluation training.
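Applying several techniques to the same synthesized audio signal SS might be organized as in the sketch below. The three operations (`passthrough`, `moving_average`, `noise_gate`) are simple stand-ins chosen for illustration and are not the techniques of the disclosure; all names are hypothetical.

```python
def moving_average(signal, width=3):
    """Smooth the signal with a centered moving average (stand-in operation)."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def noise_gate(signal, threshold=0.1):
    """Zero out samples whose magnitude is below the threshold (stand-in operation)."""
    return [x if abs(x) >= threshold else 0.0 for x in signal]

# Hypothetical registry of the operations to be evaluated.
OPERATIONS = {
    "passthrough": lambda s: list(s),
    "moving_average": moving_average,
    "noise_gate": noise_gate,
}

def process_all(synthesized, operations=OPERATIONS):
    """Run every operation on the same input and return one processed signal per operation."""
    return {name: op(synthesized) for name, op in operations.items()}

ss = [0.05, 1.0, -0.05, -1.0, 0.05]
processed = process_all(ss)
```

Keeping the operations in a single mapping means every operation receives an identical input, so differences in the outputs come from the operations alone.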
On the other hand, the application control module 113 may use the processed audio signals S1ns to SNns all at the same designated audio output mode through the same designated application. The designated audio output mode is one of multiple audio output modes. The audio output mode is, for example, a built-in loudspeaker, an earphone, or an external loudspeaker. Loudspeakers or earphones of different types or different manufacturers may be considered different audio output modes. In addition, the designated application is one of multiple applications. The applications may use an audio signal. The application is, for example, a video communication software, voice call software, music software, or video player software. In the embodiment of the disclosure, the same application condition (that is, the same designated audio output mode and the same designated application) is evaluated and selected for the processed audio signals S1ns to SNns. In a practical operation, the application control module 113 may start up the designated application and set up the designated audio output mode, and use the input audio signal as an audio signal for recording or playing and input the signal into the designated application.
In an embodiment, referring to
In another embodiment, referring to
The evaluation module 117 respectively evaluates the audio signal processing operations according to multiple comparison results between the processed audio signals S1ns to SNns (or the simulating output audio signals S1c to SNc) and the primary signal SM (step S330). Specifically, the evaluation module 117 compares the processed audio signals S1ns to SNns output through the different audio signal processing operations with the primary signal SM so as to generate multiple comparison results. The comparison results are related to signal similarity. Signal similarity is, for example, similarity of voice print characteristics, semantic recognition (for example, correctness of a text content after a speech-to-text conversion), or the residual of the secondary signal SN (for example, the signal intensity in a certain frequency band). Various methods are available to compare signal similarity. For example, if the primary signal SM is a clean human voice signal without noise, the evaluation module 117 may adopt a comparison combining voice print characteristics and semantic recognition. As another example, if the primary signal SM is a blank silence signal, a higher similarity represents a weaker signal. In other words, when comparing the noise suppression capabilities of the audio signal processing operations, a weaker signal among the processed audio signals S1ns to SNns represents a better noise suppression capability.
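A comparison of this kind might be sketched as follows. The disclosure names voice print characteristics, semantic recognition, and residual noise as possible measures; the cosine similarity and the energy-based score used here are simpler stand-ins assumed for illustration, and all function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length sample vectors (range -1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def silence_similarity(processed):
    """Score against a silent primary signal: a weaker signal gives a higher score."""
    if not processed:
        return 1.0
    energy = sum(x * x for x in processed) / len(processed)
    return 1.0 / (1.0 + energy)

def compare(processed, primary):
    """Return one comparison result for a processed signal and the primary signal SM."""
    if all(x == 0 for x in primary):
        # Blank silence as SM: only the remaining signal energy matters.
        return silence_similarity(processed)
    return cosine_similarity(processed, primary)
```

For a silent primary signal, `compare([0.1, 0.1], [0.0, 0.0])` scores higher than `compare([1.0, 1.0], [0.0, 0.0])`, matching the statement that a weaker processed signal indicates better noise suppression.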
The evaluation module 117 may select one or more audio signal processing operations corresponding to the designated application and the designated audio output mode according to the evaluation result corresponding to the audio signal processing operations (step S350). Specifically, the evaluation result is related to the comparison results with the highest signal similarity. In other words, the higher signal similarity represents that the corresponding audio signal processing operation is more appropriate for the designated application and the designated audio output mode. On the other hand, the lower signal similarity represents that the corresponding audio signal processing operation is less appropriate for the designated application and the designated audio output mode. The evaluation module 117 may select one or more audio signal processing operations with the highest similarity, the second highest similarity, or other rankings from the audio signal processing operations and relate the selected audio signal processing operation to the designated application and the designated audio output mode.
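The selection step can be sketched as a ranking of the comparison results. In this sketch, the operation names and similarity values are made-up placeholders, and `select_operation` is a hypothetical helper.

```python
def select_operation(comparison_results, top_k=1):
    """Return the names of the top_k operations, ordered by descending signal similarity."""
    ranked = sorted(comparison_results.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Placeholder comparison results for three hypothetical operations.
results = {"op_a": 0.72, "op_b": 0.91, "op_c": 0.85}
best = select_operation(results)
top_two = select_operation(results, top_k=2)
```

Here `best` holds the single operation with the highest similarity, and `top_two` additionally includes the second-highest one, corresponding to selecting "one or more" operations by ranking.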
For the evaluation on multiple applications and audio output modes, the application control module 113 may select another application and audio output mode as the designated application and the designated audio output mode, and the evaluation module 117 determines an appropriate audio signal processing operation for another application and audio output mode.
In an embodiment, the appropriate audio signal processing operation is already determined. When the designated audio output mode and the designated application are selected (that is, the application control module 113 determines that a currently selected audio output mode is the designated audio output mode and a currently selected application is the designated application), the selection module 119 may use an audio signal processing operation selected according to the evaluation result to process the audio signal of the designated application. That is, the most appropriate audio signal processing operation is selected according to the evaluation result for the designated application and the designated audio output mode. For example, when a user starts up video communication software and sets up a loudspeaker output, the selection module 119 may select the audio signal processing operation corresponding to the video communication software and the loudspeaker output.
On the other hand, when the designated audio output mode and the designated application are not selected (that is, the application control module 113 determines that a currently selected audio output mode is not the designated audio output mode and a currently selected application is not the designated application), the selection module 119 may switch to another audio signal processing operation. In other words, if the currently selected audio output mode is switched to a second designated audio output mode, and the currently selected application is switched to a second designated application, the selection module 119 may switch to an audio signal processing operation corresponding to the second designated application and the second designated audio output mode. For example, when a user starts up voice call software after finishing a video communication and sets up an earphone output, the selection module 119 may switch to an audio signal processing operation corresponding to the voice call software and the earphone output.
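The switching behavior described above could be realized with a lookup table keyed by the application and the audio output mode. The sketch below is one possible arrangement; the class `OperationSelector`, the identifiers, and the fallback to a default operation for unevaluated combinations are assumptions that the disclosure does not specify.

```python
class OperationSelector:
    """Maps an (application, audio output mode) pair to its selected operation."""

    def __init__(self, default_operation="passthrough"):
        self._table = {}
        self._default = default_operation  # assumed fallback, not from the disclosure
        self.current = default_operation

    def register(self, application, output_mode, operation):
        """Record the operation chosen by the evaluation for this pair."""
        self._table[(application, output_mode)] = operation

    def on_change(self, application, output_mode):
        """Switch to the operation registered for the newly selected pair."""
        self.current = self._table.get((application, output_mode), self._default)
        return self.current

# Hypothetical identifiers for applications, output modes, and operations.
selector = OperationSelector()
selector.register("video_call", "loudspeaker", "op_b")
selector.register("voice_call", "earphone", "op_c")
first = selector.on_change("video_call", "loudspeaker")
second = selector.on_change("voice_call", "earphone")
```

In this sketch, moving from the video call on a loudspeaker to the voice call on an earphone changes the active operation from `op_b` to `op_c`.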
In summary, in the apparatus and the method for audio signal processing selection in the embodiments of the disclosure, an appropriate audio signal processing operation for a specific application and audio output mode is obtained through training. When an application and an audio output mode change, the method and the apparatus according to the embodiments of the disclosure may automatically switch to the most appropriate audio signal processing operation.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the disclosure without departing from the scope or spirit of the disclosure. In view of the foregoing, it is intended that the disclosure cover modifications and variations of this disclosure provided they fall within the scope of the following claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
110114321 | Apr 2021 | TW | national |
Number | Date | Country | |
---|---|---|---|
20220343889 A1 | Oct 2022 | US |