This application claims the priority benefit of Taiwan application serial no. 111135798, filed on Sep. 21, 2022. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of the specification.
The disclosure relates to signal processing, and more particularly, to a processing apparatus of a sound signal and a processing method of a sound signal.
Respirators prevent the wearer from inhaling components such as smoke, aerosols, dusts or microorganisms. Therefore, some governments recommend that people wear respirators for infectious diseases transmitted by droplets.
It is worth noting that with the advancement of technology, many electronic products provide sound control functions. A sound control function relies on voice identification technology. However, respirators block the transmission of sound waves, which affects the frequency response of the sound signal, thereby reducing the accuracy of the voice identification system.
Some embodiments of the disclosure provide a processing apparatus of a sound signal and a processing method of a sound signal, which can restore the sound signal, thereby improving the accuracy of voice identification.
The processing method of the sound signal according to an embodiment of the disclosure includes (but is not limited to) the following steps: receiving the sound signal; identifying a respirator type; and modifying the sound signal according to the respirator type. The respirator type is a type of a respirator corresponding to the sound signal.
The processing apparatus of the sound signal according to an embodiment of the disclosure includes (but is not limited to) a memory and a processor. The memory is used to store a code. The processor is coupled to the memory. The processor is configured to load the code. The processor receives the sound signal, identifies a respirator type, and modifies the sound signal according to the respirator type. The respirator type is a type of a respirator corresponding to the sound signal.
Based on the above, in the processing apparatus and the processing method of the sound signal according to the embodiment of the disclosure, the sound signal is modified according to the identification result of the respirator. Accordingly, the interference of the respirator with the sound wave may be reduced, thereby improving the accuracy of voice identification.
In order to make the above-mentioned and other features and advantages of the disclosure easier to understand, the following embodiments are given and described in detail with the accompanying drawings as follows.
The memory 11 may be any type of fixed or removable random access memory (RAM), read only memory (ROM), flash memory, hard disk drive (HDD), solid state drive (SSD), or similar component. In an embodiment, the memory 11 is used to store a code, a software module, a configuration, data, or a file (e.g., a signal, a model, or a feature), which will be described in detail in subsequent embodiments.
The processor 12 is coupled to the memory 11. The processor 12 may be a central processing unit (CPU), a graphics processing unit (GPU), or other programmable general-purpose or special-purpose microprocessors, a digital signal processor (DSP), a programmable controller, a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a neural network accelerator, or other similar components or a combination of the above components. In an embodiment, the processor 12 is used to execute all or part of the operations of the processing apparatus 10, and may load and execute various codes, software modules, files, and data stored in the memory 11. In some embodiments, part of the operations in the method of the embodiment of the disclosure may be implemented by different processors or by the same processor 12.
In an embodiment, the processing apparatus 10 further includes a microphone 13. The processor 12 is coupled to the microphone 13. For example, the microphone 13 is connected to the processing apparatus 10 through USB, Thunderbolt, Wi-Fi, Bluetooth, or other wired or wireless communication technologies. As another example, the processing apparatus 10 has the built-in microphone 13. The microphone 13 may be a type of microphone such as a dynamic microphone, a condenser microphone, or an electret condenser microphone. The microphone 13 may also be a combination of other electronic components, analog-to-digital converters, filters, and audio processors that may receive a sound wave (e.g., a human voice, an ambient sound, or a machine operation sound) and convert it into the sound signal. In an embodiment, the microphone 13 is used to pick up or record the sound of the speaker, so as to obtain the sound signal.
In an embodiment, the processing apparatus 10 further includes an image capturing device 14. The processor 12 is coupled to the image capturing device 14. For example, the image capturing device 14 is connected to the processing apparatus 10 through USB, Thunderbolt, Wi-Fi, Bluetooth, or other wired or wireless communication technologies. As another example, the processing apparatus 10 has the built-in image capturing device 14. The image capturing device 14 may be a camera, a video camera, or a monitor, and captures the image within a specified field of view accordingly. In an embodiment, the image capturing device 14 is used for taking a picture or a video of the speaker.
Hereinafter, the method according to the embodiment of the disclosure will be described in conjunction with the various apparatuses, components, and modules in the processing apparatus 10. Each process of the method may be adjusted according to the situation of implementation, and is not limited thereto.
The processor 12 identifies the respirator type (step S220). Specifically, the respirator type is the type of the respirator corresponding to the sound signal. For example, the speaker wears a respirator of the respirator type and speaks. As another example, the sound wave from another sound source passes through a respirator of the respirator type. There are many types of respirators. For example,
In an embodiment, the processor 12 takes a picture of the speaker or other sound sources through the image capturing device 14 to obtain the image of the speaker or the sound source, or obtains the image captured from an external video apparatus. Next, the processor 12 may identify the respirator type of the respirator in the image.
For example, the processor 12 may pre-process the image through an OpenCV algorithm (e.g., adjusting the contrast, adjusting the brightness, or cropping the image), and identify the respirator type through a classifier. The classifier is trained based on a machine learning algorithm (e.g., supervised or semi-supervised learning). The classifier may be used for object identification/detection. There are many algorithms for object identification, for example, YOLO (You Only Look Once), SSD (Single Shot Detector), or R-CNN. Alternatively, the processor 12 may realize object identification through a feature-based matching algorithm (e.g., histogram of oriented gradients (HOG), Haar, or speeded up robust features (SURF)).
It should be noted that the embodiment of the disclosure does not limit the algorithm used for object identification/detection. In an embodiment, the aforementioned object detection may also be performed by an external apparatus which provides the identification result to the processing apparatus 10.
In another embodiment, the processor 12 may identify the respirator type according to a sound feature of the sound signal. For example, a respirator attenuates high frequency bands (2 to 10 kHz) more noticeably. Different respirator types, for example, differ considerably in attenuation between 2 and 5 kHz. Therefore, the processor 12 may distinguish different respirator types based on the attenuation amplitude of the sound signal at a specific frequency or frequency band in the frequency domain.
It should be noted that there are many kinds of sound features, and a sound feature may be a value obtained by a specific algorithm. As long as different respirator types yield different values for a specific sound feature, that sound feature may be used to identify the respirator type.
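The band-attenuation comparison described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the availability of an unmasked reference spectrum, and the per-type attenuation profiles are all assumptions, and the numeric values are placeholders rather than measured data.

```python
def band_attenuation_db(reference_db, observed_db, f_lo=2000, f_hi=5000):
    """Mean attenuation (dB) of the observed spectrum relative to a
    reference spectrum, over the frequencies inside [f_lo, f_hi] Hz.

    Both spectra are dicts mapping frequency (Hz) to level (dB).
    The reference spectrum is an assumption of this sketch.
    """
    freqs = [f for f in reference_db if f_lo <= f <= f_hi and f in observed_db]
    if not freqs:
        raise ValueError("no common frequency bins inside the band")
    return sum(reference_db[f] - observed_db[f] for f in freqs) / len(freqs)


def nearest_type(attenuation_db, profiles):
    """Return the type whose typical attenuation is closest to the measurement."""
    return min(profiles, key=lambda name: abs(profiles[name] - attenuation_db))


# Placeholder values for illustration only (not measured data).
reference = {1000: 0.0, 2000: 0.0, 3000: 0.0, 4000: 0.0, 5000: 0.0, 8000: 0.0}
observed = {1000: -1.0, 2000: -4.0, 3000: -6.0, 4000: -6.0, 5000: -8.0, 8000: -12.0}
profiles = {"type_1": 3.0, "type_2": 6.5, "type_3": 10.0}  # hypothetical, in dB

attenuation = band_attenuation_db(reference, observed)
print(attenuation, nearest_type(attenuation, profiles))
```

A nearest-profile match is only one possible decision rule; a trained classifier over several sound features could take its place.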
Referring to
It should be noted that the embodiment of the disclosure is not limited to the three respirator types, and the processor 12 may determine the respirator type directly, without comparing the respirator types sequentially. That is, the processor 12 may simultaneously perform steps S520, S540, and S560 to directly determine whether the speaker or the sound source corresponds to the first respirator, the second respirator, the third respirator, or another respirator type, and obtain the corresponding compensation signal.
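The direct, non-sequential determination can be sketched as a single selection over per-type scores. This is a hedged illustration only: the score dictionary, the `min_score` threshold, the type names, and the choice to return no compensation for an unrecognized type are assumptions not stated in the disclosure.

```python
def select_compensation(scores, compensation_table, min_score=0.5):
    """Pick the best-scoring respirator type in one pass and look up
    its stored compensation signal.

    scores: dict mapping a type name to a classifier score (hypothetical).
    compensation_table: dict mapping a type name to its compensation signal.
    Returns (type_name, compensation); ("other", None) if no score is
    high enough, which is an assumption of this sketch.
    """
    best_type = max(scores, key=scores.get)
    if scores[best_type] < min_score:
        return "other", None
    return best_type, compensation_table.get(best_type)


# Hypothetical scores and stored compensation values.
table = {"first": [0.5, 1.0], "second": [1.0, 2.0], "third": [2.0, 4.0]}
print(select_compensation({"first": 0.1, "second": 0.8, "third": 0.1}, table))
```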
$$C_x(f) = H(f) - M_x(f)$$
where x is the number of the respirator type and may be a positive integer. For example, x=1 is the basic type, x=2 is the pattern type, and x=3 is the fit type. H(f) is the original signal, and Mx(f) is the training signal for the type x respirator. The compensation signal may be stored in the memory 11 and used for subsequent modification of the sound signal.
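The relationship above, in which the compensation signal for respirator type x is the original signal minus the training signal, can be sketched bin by bin. The sketch assumes that all signals are represented as per-frequency levels in dB and that the compensation is applied by adding it to the received spectrum; neither the representation nor the way the compensation is applied is specified here, and the function names are hypothetical.

```python
def compensation_signal(original, training):
    """Per-bin compensation for one respirator type:
    original (H) minus training (M_x), both as lists of dB levels."""
    if len(original) != len(training):
        raise ValueError("spectra must have the same number of bins")
    return [h - m for h, m in zip(original, training)]


def apply_compensation(signal, compensation):
    """Add the stored compensation to a received spectrum (assumed in dB)."""
    if len(signal) != len(compensation):
        raise ValueError("spectra must have the same number of bins")
    return [s + c for s, c in zip(signal, compensation)]


# Placeholder levels for three frequency bins (illustrative only).
original = [0.0, 0.0, 0.0]       # recorded without a respirator
training = [-1.0, -4.0, -9.0]    # recorded through a type-x respirator
comp = compensation_signal(original, training)
print(comp)
print(apply_compensation([-2.0, -5.0, -10.0], comp))
```

Working in dB turns the multiplicative attenuation of the respirator into an additive offset, which is why a simple per-bin addition suffices in this sketch.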
As an example,
The compensation value of the compensation signal of different respirator types may be different at different frequencies. For example,
The modified sound signal may be used for voice identification. In an embodiment, the processor 12 may identify whether the sound signal is a registered signal according to the modified sound signal. The registered signal is a signal that is allowed to pass verification, for example, the sound signal of a registrant who has passed identity verification.
There are many methods of voice identification.
On the other hand, the processor 12 may obtain the acoustic feature of the modified sound signal (step S124). Similarly, the processor 12 may derive the acoustic feature using Mel-frequency cepstral coefficients (MFCC), FBank, log FBank, or other algorithms. Next, the processor 12 may generate a tested acoustic model of the speaker or other sound sources according to a second acoustic feature of the modified sound signal (step S125).
The processor 12 may compare the tested acoustic model with the registered acoustic model in the model library (step S126) and determine whether the sound signal is the registered signal according to the comparison result between the registered acoustic model and the tested acoustic model (step S127). If the comparison result shows that the registered acoustic model is the same as the tested acoustic model, the sound signal is the registered signal; that is, the current speaker is the registrant. If the registered acoustic model is different from the tested acoustic model, the sound signal is not the registered signal; that is, the current speaker is not the registrant. Alternatively, the processor 12 may directly identify whether the sound signal is the registered signal by using an identification model based on the machine learning algorithm.
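The comparison in steps S126 and S127 can be sketched as follows. This is one possible reading, not the disclosed method: it assumes each acoustic model is summarized as a fixed-length feature vector and treats two models as "the same" when their cosine similarity reaches a threshold. The vector representation, the `threshold` value, and the function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_registered(tested_model, model_library, threshold=0.9):
    """Return True if the tested model matches any registered model.

    model_library maps a registrant name to that registrant's vector.
    """
    return any(
        cosine_similarity(tested_model, registered) >= threshold
        for registered in model_library.values()
    )


# Hypothetical model library with a single registrant.
library = {"registrant_a": [1.0, 0.0, 0.0]}
print(is_registered([1.0, 0.0, 0.0], library))
print(is_registered([0.0, 1.0, 0.0], library))
```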
In other embodiments, the modified sound signal may also be used by other voice identification applications, for example, voice-to-text, voice dialing, voice command, or voice navigation.
To sum up, in the processing apparatus and the processing method of the sound signal of the embodiment of the disclosure, the corresponding compensation signal is provided for the respirator type identified so as to modify the sound signal. Accordingly, the distortion caused by the respirator may be corrected, thereby improving the accuracy of voice identification.
Although the disclosure has been described with reference to the embodiments above, the embodiments are not intended to limit the disclosure. Any person skilled in the art may make some changes and modifications without departing from the spirit and scope of the disclosure. Therefore, the scope of the disclosure will be defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
111135798 | Sep 2022 | TW | national |