This application claims the priority benefit of Taiwan application serial no. 112116738, filed on May 5, 2023. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
This disclosure relates to sound signal processing, and more particularly, to an audio parameter optimizing method and a computing apparatus related to audio parameters.
After a mobile apparatus is connected to a smart speaker set, audio signals may be transmitted to a smart speaker. After the smart speaker decodes the audio signals and performs a sound effect processing, music may be played.
It is worth noting that audio adjustment of an audio system usually focuses on a part of the sound that a user wants to enhance. However, different audio sources may have different sound characteristics. Therefore, a single audio adjustment is not suitable for all the audio signals.
An embodiment of the disclosure provides an audio parameter optimizing method and a computing apparatus related to audio parameters, which provide proper audio parameters.
An audio parameter optimizing method in the embodiment of the disclosure includes (but is not limited to) the following. A sound signal is divided into multiple sound frames in a time domain. A wide dynamic range compression (WDRC) parameter corresponding to the sound signal is adjusted according to a maximum root mean square (RMS) and an average root mean square of the sound frames. An output power corresponding to an input power between the maximum root mean square and the average root mean square in the wide dynamic range compression parameter is increased.
A computing apparatus related to audio parameters in the embodiment of the disclosure includes (but is not limited to) a storage device and a processor. The storage device is configured to store a program code. The processor is coupled to the storage device. The processor is configured to load the program code to divide a sound signal into multiple sound frames in a time domain, and adjust a wide dynamic range compression parameter corresponding to the sound signal according to a maximum root mean square and an average root mean square of the sound frames. The processor is configured to increase an output power corresponding to an input power between the maximum root mean square and the average root mean square in the wide dynamic range compression parameter.
Based on the above, according to the audio parameter optimizing method and the computing apparatus related to the audio parameters in the embodiment of the disclosure, the wide dynamic range compression parameter suitable for music listening may be defined based on sound features (e.g., the maximum root mean square and the average root mean square).
In order for the aforementioned features and advantages of the disclosure to be more comprehensible, embodiments accompanied with drawings are described in detail below.
The storage device 11 may be any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, conventional hard disk drive (HDD), solid-state drive (SSD), or similar elements. In an embodiment, the storage device 11 is configured to record program codes, software modules, configuration settings, data, or files (such as sound signals, sound features, or parameters), and will be described in detail in subsequent embodiments.
The processor 12 is coupled to the storage device 11. The processor 12 may be a central processing unit (CPU), a graphics processing unit (GPU), or other programmable general-purpose or special-purpose microprocessors, a digital signal processor (DSP), a programmable controller, a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a neural network accelerator, or other similar elements or a combination of the above elements. In an embodiment, the processor 12 is configured to perform all or some of the operations of the computing apparatus 10, and may load and execute the software modules, the files, and/or the data stored in the storage device 11. In some embodiments, functions of the processor 12 may be implemented through software.
Hereinafter, the various elements, modules, and signals in the computing apparatus 10 will be used to describe a method in the embodiment of the disclosure. Each process of the method may be adjusted according to the implementation situation and is not limited thereto.
For example,
According to different design requirements, in some embodiments, the processor 12 may select some of the sound frames for subsequent use. In addition, different sound frames may have different time lengths.
In an embodiment, the processor 12 may divide a left sound channel and a right sound channel of the sound signal respectively into the sound frames. In this case, the sound signal is a binaural stereo signal. In some application scenarios, content of the left sound channel and the right sound channel of a single sound signal may be different. For example, in a symphony, the sound of string instruments occupies a larger proportion of the left sound channel, while the sound of wind instruments occupies a larger proportion of the right sound channel. The sound signals of the left sound channel and the right sound channel may be regarded as two sound signals. The processor 12 may divide the two sound signals into sound frames of the left sound channel and sound frames of the right sound channel respectively. Assuming that each of the sound channels may be divided into M sound frames (M is a positive integer), there are a total of 2M sound frames for the two sound channels.
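The per-channel framing described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `split_into_frames` and `split_stereo`, the fixed frame length, and the non-overlapping layout are hypothetical choices, since the disclosure allows frames of differing lengths and does not prescribe a framing scheme.

```python
def split_into_frames(samples, frame_len):
    """Split samples into consecutive, non-overlapping frames.

    Assumption: a fixed frame length is used and a trailing partial
    frame is kept, so no samples are dropped.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]


def split_stereo(left, right, frame_len):
    """Treat each channel as its own sound signal and frame it separately."""
    return split_into_frames(left, frame_len), split_into_frames(right, frame_len)


# Hypothetical sample values for a short stereo signal.
left = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6]
right = [0.0, 0.1, 0.0, -0.1, 0.0, 0.1]

left_frames, right_frames = split_stereo(left, right, frame_len=2)
# With M frames per channel, the two channels yield 2M frames in total.
total_frames = len(left_frames) + len(right_frames)
```

Here each channel yields M = 3 frames, so `total_frames` is 2M = 6.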
Referring to
The processor 12 may calculate a power of each of the sound frames according to the following Formula (1) (i.e., the root mean square).

$$x_{RMS} = \sqrt{\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n}} \qquad (1)$$
xRMS is the power of a single sound frame (i.e., the root mean square), and n is the total number of sampling points in that sound frame. x1 is the intensity of the sound frame at the first sampling point (e.g., a first sampling value). x2 is the intensity of the sound frame at the second sampling point (e.g., a second sampling value). xn is the intensity of the sound frame at the nth sampling point (e.g., an nth sampling value). The rest may be derived by analogy.
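The root mean square of one sound frame, as defined above, can be computed with a short sketch like the one below. The function names are hypothetical. The conversion to decibels assumes samples normalized to a full scale of 1.0 and a floor of -120 dB for silent frames; neither assumption is stated in the disclosure.

```python
import math


def frame_rms(frame):
    """Root mean square of one frame: sqrt((x1^2 + ... + xn^2) / n)."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def rms_to_db(rms, floor_db=-120.0):
    """Convert an RMS value to decibels relative to full scale 1.0.

    Assumption: silent frames are clamped to floor_db instead of -infinity.
    """
    if rms <= 0.0:
        return floor_db
    return max(20.0 * math.log10(rms), floor_db)
```

For instance, a frame alternating between 1.0 and -1.0 has an RMS of 1.0, which this sketch maps to 0 dB.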
In addition, the maximum root mean square is the maximum power among the sound frames divided from a single sound signal. The average root mean square is the average power of the sound frames divided from a single sound signal. Based on an experimental result, for a sound signal related to music listening, the (input) power interval between the maximum root mean square and the average root mean square may be regarded as an important interval. For the important interval, the output power may be appropriately amplified. The sound signal used for music listening may refer to a sound signal of a music genre or a sound signal expected to be played through home stereos, smart speakers, or headphones.
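The two sound features that bound the important interval can be sketched as below. The function name `max_and_average_rms` is hypothetical, and averaging the per-frame RMS values in the linear domain (rather than in decibels) is an assumption, since the text does not say which domain the average is taken in.

```python
import math


def frame_rms(frame):
    """Root mean square of one non-empty frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def max_and_average_rms(frames):
    """Return (maximum RMS, average RMS) over all non-empty frames.

    Assumption: the average is an arithmetic mean of linear RMS values.
    """
    values = [frame_rms(f) for f in frames if f]
    if not values:
        raise ValueError("at least one non-empty frame is required")
    return max(values), sum(values) / len(values)


# Hypothetical frames with RMS values 1.0, 0.0, and 0.5.
frames = [[1.0, 1.0], [0.0, 0.0], [0.5, -0.5]]
max_rms, avg_rms = max_and_average_rms(frames)
```

The interval from `avg_rms` to `max_rms` is the input power range that the text treats as important for music listening.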
On the other hand, wide dynamic range compression may adjust the output power according to changes in the input power of the sound signal to be adapted accordingly to specific or limited hearing dynamic ranges. However, different sound signals have different crest factors (a peak value of the waveform divided by a root mean square of the waveform), which makes the single wide dynamic range compression parameter not applicable to all the sound signals. In an embodiment, the wide dynamic range compression parameter is a corresponding relationship between the input power and the output power. For example, an input power of −10 decibels (dB) corresponds to an output power of −8 dB. The corresponding relationship may be represented or recorded through a lookup table or function.
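A lookup-table representation of this corresponding relationship might look like the following sketch. Only the pair (-10 dB, -8 dB) comes from the text. The other table entries, the function name `lookup_output_db`, and the use of linear interpolation with clamping at the table ends are illustrative assumptions.

```python
from bisect import bisect_right


def lookup_output_db(table, input_db):
    """Map an input power (dB) to an output power (dB).

    table: (input_db, output_db) pairs sorted by ascending input_db.
    Assumption: values between entries are linearly interpolated and
    values outside the table are clamped to the nearest entry.
    """
    xs = [point[0] for point in table]
    if input_db <= xs[0]:
        return table[0][1]
    if input_db >= xs[-1]:
        return table[-1][1]
    i = bisect_right(xs, input_db)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (input_db - x0) / (x1 - x0)


# Hypothetical table; only the (-10, -8) entry is taken from the text.
table = [(-60.0, -60.0), (-10.0, -8.0), (0.0, 0.0)]
output_db = lookup_output_db(table, -10.0)
```

A table keeps the parameter easy to store alongside other configuration settings, while a function form avoids the interpolation step.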
For example, in determining the wide dynamic range compression parameter, the processor 12 regards the power interval between the maximum root mean square and the average root mean square of the input power as the important interval and amplifies the output power corresponding to the important interval accordingly. It is assumed that the ratio of the input power to the corresponding output power in the original wide dynamic range compression parameter is 1:1. For the above important interval, the processor 12 may adjust the ratio of the input power to the corresponding output power in the wide dynamic range compression parameter to, for example, between 1:1.2 and 1:1.5. The disclosure is not limited to these ratios; in any case, the upper limit of the output power is 0 dB.
In an embodiment, the processor 12 may enable a change amount of the output power corresponding to the input power interval between the maximum root mean square and the average root mean square in the adjusted wide dynamic range compression parameter to be equal to the change amount of the output power corresponding to the input power interval in the unadjusted wide dynamic range compression parameter. Specifically, the wide dynamic range compression parameter is the corresponding relationship between the input power and the output power. When the corresponding relationship is mapped to an X-Y coordinate system (for example, the input power corresponds to an X-axis, and the output power corresponds to a Y-axis), linear, curve, and/or other functions may be used to represent the corresponding relationship.
For example,
On the other hand, the part enhanced by a wide dynamic range compression parameter S1 for music listening is the input power interval from an average root mean square (Avg Rms) to a maximum root mean square (Max Rms). Here, enhancement refers to an output power value being greater than the corresponding input power value. Therefore, compared to the unadjusted wide dynamic range compression parameter NC, the output power of the wide dynamic range compression parameter S1 is greater from an inflection point p1 to an inflection point p3. That is, with the input power on the X-axis and the output power on the Y-axis, the output power value between the inflection point p1 (coordinates (Avg Rms, Avg Rms)) and the inflection point p3 (coordinates (Max Rms, 0)) is greater than the corresponding input power value. An inflection point indicates that the linear functions on either side of it have different slopes.
For other input power intervals (for example, input powers less than the average root mean square Avg Rms corresponding to the inflection point p1), which are regarded as non-important intervals (such as noise), the wide dynamic range compression parameter S1 may be the same as the unadjusted wide dynamic range compression parameter NC. That is, for input powers below the inflection point p1, the output power values are equal to the corresponding input power values.
In order to reduce distortion (e.g., discontinuous amplification), a slope s2 of the input power interval from the inflection point p2 to the inflection point p3 in the wide dynamic range compression parameter S1 may be equal to the slope s0 = 1 of the unadjusted wide dynamic range compression parameter NC. In other words, compared to the unadjusted wide dynamic range compression parameter NC, the change amount of the output power from the inflection point p2 to the inflection point p3 in the wide dynamic range compression parameter S1 is the same.
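The piecewise-linear shape described for S1 can be sketched as follows. The coordinates of p1 (Avg Rms, Avg Rms) and p3 (Max Rms, 0) and the unit slope between p2 and p3 follow the text. The function names and the `knee_offset_db` parameter, which places p2 a chosen number of decibels below Max Rms, are hypothetical, as is the choice to hold the output at the ceiling for inputs above Max Rms.

```python
def build_music_wdrc(avg_rms_db, max_rms_db, knee_offset_db=6.0, ceiling_db=0.0):
    """Return the inflection points [p1, p2, p3] as (input_db, output_db)."""
    if not (avg_rms_db < max_rms_db <= ceiling_db):
        raise ValueError("expected avg_rms_db < max_rms_db <= ceiling_db")
    if not (0.0 <= knee_offset_db < max_rms_db - avg_rms_db):
        raise ValueError("knee_offset_db must keep p2 strictly above p1")
    p1 = (avg_rms_db, avg_rms_db)             # below p1 the curve equals NC
    p3 = (max_rms_db, ceiling_db)             # Max Rms maps to the ceiling
    p2_in = max_rms_db - knee_offset_db       # assumed placement of p2
    p2 = (p2_in, ceiling_db - knee_offset_db)  # gives slope 1 from p2 to p3
    return [p1, p2, p3]


def wdrc_output_db(points, input_db):
    """Evaluate the curve defined by [p1, p2, p3] at input_db."""
    p1, p2, p3 = points
    if input_db <= p1[0]:
        return input_db                       # non-important interval: 1:1
    if input_db >= p3[0]:
        return p3[1]                          # assumption: hold at ceiling
    lo, hi = (p1, p2) if input_db < p2[0] else (p2, p3)
    slope = (hi[1] - lo[1]) / (hi[0] - lo[0])
    return lo[1] + slope * (input_db - lo[0])


# Hypothetical features: Avg Rms = -30 dB, Max Rms = -10 dB.
points = build_music_wdrc(avg_rms_db=-30.0, max_rms_db=-10.0)
```

With these values the points are p1 = (-30, -30), p2 = (-16, -6), and p3 = (-10, 0), so the segment from p1 to p2 is steeper than 1 and the segment from p2 to p3 has slope 1.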
In an embodiment, the processor 12 may define a starting point of the input power interval for the linear function with the same slope or the same change amount of the output power within a unit interval thereof according to the root mean square. A difference between the starting point and the root mean square is 0 to 6 decibels (dB). Taking
In an embodiment, the processor 12 may increase the output power corresponding to the input power between the maximum root mean square and the average root mean square in the wide dynamic range compression parameter to at most 0 dB. In other words, a maximum value of the output power in the wide dynamic range compression parameter is 0 dB. In this way, damage to a rear-end speaker caused by the excessive output power may be avoided.
Taking
In an embodiment, the processor 12 may play the sound signal adjusted by the corresponding wide dynamic range compression parameter through a speaker. For example, a digital signal processor, a digital-to-analog converter, or other elements adjust the sound signal according to the determined wide dynamic range compression parameter, then convert the digital sound signal to an analog sound signal, and output the analog sound signal through the speaker. In addition, as long as the input power of the sound signal is within the above important interval, the corresponding output power may be amplified.
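Applying a determined parameter to the sound signal can be sketched as a per-frame gain, as below. This is a simplification under stated assumptions: the names `apply_wdrc` and `toy_curve` are hypothetical, the level is measured per frame as 20·log10 of the RMS, and no attack or release smoothing is applied between frames. The disclosure does not specify these details.

```python
import math


def apply_wdrc(frames, curve):
    """Scale each frame so that its level follows curve(input_db).

    curve: callable mapping an input power in dB to an output power in dB.
    Assumption: one constant gain per frame, with no smoothing.
    """
    adjusted = []
    for frame in frames:
        rms = math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0
        if rms <= 0.0:
            adjusted.append(list(frame))      # silent frame: nothing to scale
            continue
        input_db = 20.0 * math.log10(rms)
        gain = 10.0 ** ((curve(input_db) - input_db) / 20.0)
        adjusted.append([x * gain for x in frame])
    return adjusted


def toy_curve(input_db):
    """Hypothetical curve: +6 dB above -30 dB, capped at 0 dB, else 1:1."""
    if input_db > -30.0:
        return min(input_db + 6.0, 0.0)
    return input_db


frames = [[0.1, -0.1], [0.001, 0.001]]
adjusted = apply_wdrc(frames, toy_curve)
```

In this example the first frame (about -20 dB) is amplified by 6 dB, while the second frame (about -60 dB) lies outside the boosted range and passes through unchanged.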
Based on the above, in the audio parameter optimizing method and the computing apparatus related to the audio parameters of the disclosure, the appropriate wide dynamic range compression parameter is provided for the music listening, and distortion amplification and damage to the rear-end output apparatus may be avoided.
Although the disclosure has been described with reference to the above embodiments, they are not intended to limit the disclosure. It will be apparent to one of ordinary skill in the art that modifications to the described embodiments may be made without departing from the spirit and the scope of the disclosure. Accordingly, the scope of the disclosure will be defined by the attached claims and their equivalents and not by the above detailed descriptions.
Number | Date | Country | Kind |
---|---|---|---|
112116738 | May 2023 | TW | national |