The present disclosure generally relates to a sound processing method and device that can increase the temporal resolution and the frequency resolution simultaneously by extracting frequencies of an input sound using the DJ transform. The frequencies extracted according to the present invention can be used in various fields such as sound recognition and sound synthesis.
The Short-Time Fourier Transform (STFT) is used to extract frequencies from a given sound in various fields dealing with sound, such as speech recognition and speaker recognition. However, when frequencies are extracted by the STFT, there is a limitation on increasing the temporal resolution and the frequency resolution simultaneously due to the Fourier uncertainty principle. The Fourier uncertainty principle states that if a sound of a short duration is transformed into frequency components, then the resolution of the frequency components is relatively low, and if a sound of a longer duration is used to obtain a more precise frequency, then the temporal resolution for the instant when the frequency component is extracted decreases.
For example, when using the STFT, assume that the window size is 25 milliseconds and a rectangular filter is used. A frequency component extracted under these conditions has a resolution of 40 Hz. In that case, even if a 420 Hz frequency exists in an input sound, only the 400 Hz and 440 Hz frequencies appear in the extraction result, and the 420 Hz frequency does not appear. For that reason, the distinction between a pure tone composed of the 420 Hz frequency only and a complex tone composed of the 400 Hz and 440 Hz frequencies is not clear. Now, assume that a 4 kHz frequency exists in the extraction result. The extraction result does not give any information on the time point at which the 4 kHz frequency occurred within the 25 milliseconds window. For example, it is not possible to distinguish whether the 4 kHz frequency occurred in the range of 0˜10 milliseconds or in the range of 10˜20 milliseconds.
In order to get a frequency resolution of 20 Hz, the window size should be extended to 50 milliseconds. However, since the temporal resolution is inversely proportional to the frequency resolution, the temporal resolution decreases with the 50 milliseconds window. Likewise, if the window size is reduced to 12.5 milliseconds to increase the temporal resolution, the frequency resolution is lowered to 80 Hz. Due to this trade-off, the temporal resolution and the frequency resolution cannot be improved simultaneously when using the STFT.
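The bin spacing of a rectangular-window DFT equals the reciprocal of the window length, so the trade-off above can be checked numerically. A minimal sketch (the 16 kHz sampling rate is an illustrative assumption, not specified in the text):

```python
import numpy as np

# DFT bin spacing for a rectangular window is 1/window_length:
# 12.5 ms -> 80 Hz, 25 ms -> 40 Hz, 50 ms -> 20 Hz.
sr = 16000  # assumed sampling rate

for window_ms in (12.5, 25.0, 50.0):
    n = int(sr * window_ms / 1000)          # samples per window
    bins = np.fft.rfftfreq(n, d=1.0 / sr)   # DFT bin center frequencies
    print(f"{window_ms} ms window -> {bins[1] - bins[0]:.0f} Hz resolution")
```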
According to research findings, it is known that human hearing is not restricted by the Fourier uncertainty principle. The present disclosure therefore proposes a sound processing method and device using the DJ transform, a new frequency extraction method derived from an understanding of human hearing that improves the temporal resolution and the frequency resolution simultaneously based on the operating principle of the hair cells constituting the cochlea.
A sound processing method according to an embodiment of the present disclosure comprises the steps of: sampling, by a computer, natural frequencies of a plurality of springs, the plurality of springs having natural frequencies different from each other and oscillating according to an input sound; determining filtered pure-tone amplitudes of the plurality of springs by the computer by: calculating transient-state-pure-tone amplitudes of the plurality of modeled springs; calculating expected steady-state amplitudes of the plurality of modeled springs; calculating predicted pure-tone amplitudes based on the expected steady-state amplitudes; and calculating filtered pure-tone amplitudes by multiplying the transient-state-pure-tone amplitudes with the predicted pure-tone amplitudes; extracting, by the computer, a natural frequency of at least one spring of the plurality of springs which corresponds to a local maximum value among the filtered pure-tone amplitudes; and using the natural frequency for sound recognition or sound synthesis.
A sound processing device according to an embodiment of the present disclosure comprises: a memory; and a processor configured to: produce displacements and velocities of a plurality of springs by modeling the plurality of springs which have natural frequencies different from each other and oscillate according to an input sound, the displacements and the velocities being recorded in the memory, calculate transient-state-pure-tone amplitudes of the plurality of modeled springs, calculate expected steady-state amplitudes of the plurality of modeled springs, calculate predicted pure-tone amplitudes on the basis of the expected steady-state amplitudes, calculate filtered pure-tone amplitudes by multiplying the transient-state-pure-tone amplitudes with the predicted pure-tone amplitudes, extract the natural frequency of at least one spring of the plurality of springs which corresponds to a local maximum value among the filtered pure-tone amplitudes, and use the natural frequency for sound recognition or sound synthesis.
A sound processing method according to an embodiment of the present disclosure comprises the steps of: sampling, by a computer, natural frequencies of a plurality of springs, the plurality of springs having natural frequencies different from each other and oscillating according to an input sound; estimating an expected steady-state amplitude of the spring of which the amplitude is the highest among the plurality of modeled springs; calculating an energy of the at least one spring of the plurality of springs of which the amplitude is the highest based on the expected steady-state amplitude; calculating an amplitude of the input pure tone based on the energy; and using the amplitude of the input pure tone for sound recognition or sound synthesis.
A sound processing device according to an embodiment of the present disclosure comprises: a memory; and a processor configured to: produce displacements and velocities of a plurality of springs by modeling the plurality of springs which have natural frequencies different from each other and oscillate according to an input sound, the displacements and the velocities being recorded in the memory, estimate an expected steady-state amplitude of a spring of which the amplitude is the highest among the plurality of modeled springs, calculate an energy of the spring of which the amplitude is the highest based on the expected steady-state amplitude, calculate an input pure tone amplitude based on said energy, and use the input pure tone amplitude for sound recognition or sound synthesis.
Said expected steady-state amplitude can be calculated based on the amplitudes at two different time points within a duration of the input sound.
Said expected steady-state amplitude Ai,s can be calculated by means of the equation below:

Ai,s = (Ai(t2) − Ai(t1)·e^(−ζω(t2−t1))) / (1 − e^(−ζω(t2−t1)))

where Ai,s is the expected steady-state amplitude of the i-th spring Si among the plurality of springs, wherein i is a positive integer, t1 and t2 are two different time points within a duration of the input sound, t2 > t1, Ai(t1) is an amplitude of said spring Si at t1, Ai(t2) is an amplitude of said spring Si at t2, ζ is a damping ratio of said spring Si, and ω satisfies the equation ω = ωi√(1 − 2ζ²), where ωi is the natural frequency of said spring Si.
A difference between the two different time points can be a period of the natural frequency of the corresponding spring.
If one of the two time points is t1, a sampling rate of the input sound is SR, and a period of the natural frequency of the corresponding spring is T, then the other t2 of the two time points can be calculated by the equation below.
t2 = [t1 + SR × T + 0.5]
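The bracketed expression can be read as nearest-sample rounding (floor of the value plus 0.5). A small sketch under that assumption:

```python
def second_sample_index(t1, sr, period):
    """t2 = [t1 + SR*T + 0.5], reading [x] as the floor function,
    i.e. rounding t1 plus one period to the nearest sample index."""
    return int(t1 + sr * period + 0.5)  # int() truncates toward zero, here = floor

# e.g. a 440 Hz spring sampled at 16 kHz: one period is ~36.36 samples
print(second_sample_index(100, 16000, 1 / 440))  # 136
```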
The expected steady-state amplitude can be calculated by substituting amplitudes at at least two time points in the duration of the input sound into the following equation and using a linear regression analysis.

A(t) = As + (Ac − As)·e^(−ζω(t−tc))

where A(t) is an amplitude of any spring among said plurality of springs at time t, As is the expected steady-state amplitude of said spring, Ac is an amplitude of said spring at tc, tc is a time point before the at least two time points in the duration of the input sound, ζ is a damping ratio of said spring, and ω satisfies the equation ω = ωi√(1 − 2ζ²), where ωi is the natural frequency of said spring.
Said modeling step can comprise the steps of: measuring displacements and velocities at time points for each of the plurality of springs; calculating energy at each time point for each of the plurality of springs based on the displacements and the velocities; and calculating an amplitude at each time point for each of the plurality of springs based on said energy.
The number of the plurality of springs may be determined based on a range and a resolution of the frequency to be extracted.
The sound recognition can include at least one of: speech recognition; speaker verification; speaker identification; source separation; sound direction detection; sound-based medical diagnostics; sound-based machine fault diagnostics; or sonar for navigating undersea terrain or ranging objects.
Said method may be recorded on a non-transitory computer-readable recording medium according to an embodiment of the present disclosure.
A method for checking an error among pure tone frequencies according to an embodiment comprises: inputting, by a computer, to a plurality of springs an input sound whose frequency maintains a first value up to a certain point of time and turns into a second value at the certain point, wherein a result of the frequency transform up to the certain point indicates the first value; and checking that, immediately after the turning point, a transient error in the transition from the first value to the second value is within 10%.
According to an embodiment of the present disclosure, a method for extracting a sound frequency and a sound amplitude is provided which improves the temporal and frequency resolution simultaneously. Accordingly, sounds having similar frequencies can be further subdivided and classified, and the accuracy of speech recognition can be improved by precisely extracting phoneme-order information from speech. In addition, stable speech recognition can be performed in a noisy environment, and the size of the data required for speech recognition training can be reduced.
The above and other aspects, features, and advantages of certain embodiments of the present disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of various embodiments of the present disclosure.
Hair cells convert mechanical signals generated in the basilar membrane into electrical signals and transfer the signals to the primary auditory cortex. The hair cells consist of about 3,500 inner hair cells and about 12,000 outer hair cells, and each hair cell reacts sensitively to the sound of its own natural frequency. This characteristic of hair cells is similar to the phenomenon occurring in a spring whose amplitude increases because of resonance when the spring receives an external force with a frequency that matches the natural frequency of the spring. Using this similarity, the present disclosure models the behavior of hair cells using a plurality of springs.
The human audible frequency range is known to be 20-20,000 Hz, and the human voice frequency range is known to be 80-8,000 Hz. The frequency range covered in fields such as speech recognition is within 8 kHz. Considering this, when used for voice processing, the natural frequencies of the springs can be spaced at 1 Hz intervals from 50 Hz to 8 kHz, and 7,951 different springs can be used based on those natural frequencies. This means that the frequency resolution is in units of 1 Hz. However, this is only an example, and widening the frequency range or increasing the resolution by using more springs is possible.
The behavior of a hair cell modeled by a spring can be represented as a differential equation of motion for driven harmonic oscillation. A sound corresponds to an external force, made up of a combination of various sine waves, which is applied to a spring. Each spring has its own natural frequency and draws its own motion trajectory in response to a series of sound samples. The motion trajectory of each spring can be obtained by calculating the solution of the differential equation of motion for driven harmonic oscillation using numerical analysis techniques such as the Runge-Kutta method.
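As a concrete sketch, the equation of motion can be integrated with the classical fourth-order Runge-Kutta method. All parameters below (mass, force amplitude, sampling rate) are illustrative assumptions; only the damping ratio 0.001 appears in the text:

```python
import numpy as np

def simulate_spring(f_nat, f_drive, duration=1.0, sr=16000,
                    zeta=0.001, m=1.0, F0=1.0):
    """Integrate m*x'' = -k*x - b*x' + F0*cos(w*t) with classical RK4.

    f_nat and f_drive are in Hz; returns displacement and velocity arrays.
    """
    w_i = 2 * np.pi * f_nat
    k = m * w_i ** 2                 # spring constant
    b = 2 * m * zeta * w_i           # friction coefficient (zeta = b/(2*m*w_i))
    h = 1.0 / sr                     # step size = one sample interval

    def acc(t, x, v):
        return (F0 * np.cos(2 * np.pi * f_drive * t) - k * x - b * v) / m

    xs, vs = [0.0], [0.0]            # the spring starts at rest
    x = v = 0.0
    for n in range(int(duration * sr)):
        t = n * h
        k1x, k1v = v, acc(t, x, v)
        k2x, k2v = v + h / 2 * k1v, acc(t + h / 2, x + h / 2 * k1x, v + h / 2 * k1v)
        k3x, k3v = v + h / 2 * k2v, acc(t + h / 2, x + h / 2 * k2x, v + h / 2 * k2v)
        k4x, k4v = v + h * k3v, acc(t + h, x + h * k3x, v + h * k3v)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        xs.append(x)
        vs.append(v)
    return np.array(xs), np.array(vs)
```

A spring driven at its own natural frequency builds up a much larger amplitude than one driven off resonance, which is exactly the resonance property the model relies on.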
Assume that ωi is the natural frequency of a spring Si (1 ≤ i ≤ N). The spring Si is used to model the response of the hair cell that is most sensitive to the sound of the ωi frequency among the hair cells constituting the human hearing system.
When the sound F0 cos(ωt) is input, the reaction xi(t) of the spring Si to the sound can be represented by the equation of motion of the following equation (1):

m·(d²xi/dt²) = −ki·xi − bi·(dxi/dt) + F0 cos(ωt)    (1)

where xi is the displacement, i.e., the length by which the spring deviates from the equilibrium point, and m is the mass of the object suspended on the spring. ζ is the damping ratio; when the friction coefficient is bi, ζ = bi/(2mωi). ki is the spring constant. ωi is the natural frequency of the spring when both ζ and F0 are zero, and ωi = √(ki/m).
Equation (1) is a differential equation with a known general solution. When ζ < 1, the solution is given by the equation (2) below.

xi(t) = Ai·e^(−ζωi·t)·cos(ωi√(1 − ζ²)·t + βi) + (F0/(mZi))·cos(ωt + φi)    (2)

where Ai and βi are determined by the initial conditions of the spring, and Zi and φi are as below:

Zi = √((ωi² − ω²)² + 4ζ²ωi²ω²)    (3)

φi = arctan(2ζωiω/(ω² − ωi²)) + 180°·n    (4)

The integer n is specified so that φi is between −180° and 0°. In a steady state, the transient term of the equation (2) vanishes and the trajectory follows the equation (5):

xi(t) = (F0/(mZi))·cos(ωt + φi)    (5)

If F0 = 0, the spring is subjected to periodically damped oscillation as shown in
Consider a situation in which a sound having a frequency identical to the natural frequency ωi of a spring Si at rest is applied to the spring as an external force. The behavior of the spring in the process of reaching a steady state is described by the equation (6) below.
xi(t) = (1 − e^(−ζωi·t))·(F0/(mZi))·cos(ωi·t + φi)    (6)

Therefore, the amplitude Ai(t) of the spring gradually increases along the trajectory of Ai(t) = (1 − e^(−ζωi·t))·F0/(mZi) and finally becomes F0/(mZi).
As the external force disappears at the time point t0, the amplitude of the spring gradually decreases to zero. This corresponds to F0 = 0 in the equation (2), and the amplitude change in this process follows the equation (7) below.

Ai(t) = Ai(t0)·e^(−ζω(t−t0))    (7)
According to the embodiments of the present disclosure, two methods for extracting the frequency and amplitude of an input sound are proposed based on the behavior of the springs modeling the hair cells.
1. In a Steady State
(1) Extraction of Frequency
Based on the characteristic that a resonating spring oscillates with a greater amplitude than other springs, a frequency of an input sound can be extracted.
Given a pure tone F0 cos(ωt), the amplitude of a spring Si in a steady state becomes F0/(mZi) by the equation (5). If the masses m of the objects suspended on the springs are equal to each other, the spring with the greatest amplitude is the one having the minimum Zi. The relationship between the natural frequency ωi of that spring and the frequency ω of the pure tone can be obtained by differentiating the equation (3) with respect to ωi, and the result is as follows:
ω = ωi√(1 − 2ζ²)    (8)
where ζ < 1/√2. If ζ is a small value near zero, then ω ≈ ωi. For example, ζ could be 0.001.
In order to find the spring having the greatest amplitude, a numerical analysis method which solves differential equations, such as the Runge-Kutta method, is used. Given a pure tone F0 cos(ωt), the displacement xi(t) and the velocity vi(t) of each spring Si, which correspond to the solution of the equation (1), are calculated using the numerical analysis method. Since the energy of each spring is the sum of its kinetic energy and its potential energy, the energy of the spring Si can be obtained by the equation (9).

Ei(t) = (1/2)·m·vi(t)² + (1/2)·ki·xi(t)²    (9)
The energy of a spring that has reached a steady state maintains a constant value. Thus, the displacement xi at the time when the velocity vi is 0 equals the amplitude of the spring Si. Therefore, the amplitude Ai of the spring Si in a steady state can be calculated by the equation (10) below:

Ai = √(2Ei/ki)    (10)
The spring having the largest amplitude among the extracted amplitudes of the springs is the resonating spring. Therefore, it is possible to obtain the frequency of an input pure tone by using both the natural frequency ωi of the spring having the largest amplitude and the equation (8).
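Putting the steps together, a small bank of springs can be simulated and the winning natural frequency mapped through the equation (8). The sketch below runs a vectorized RK4 step over 41 springs spaced 1 Hz apart around a 420 Hz input; the mass, force amplitude, sampling rate, and bank range are illustrative assumptions:

```python
import numpy as np

sr, dur, zeta, m, F0 = 16000, 1.0, 0.001, 1.0, 1.0
f_drive = 420.0                                # frequency of the input pure tone
f_nat = np.arange(400.0, 441.0)                # 41 candidate springs, 1 Hz apart
w = 2 * np.pi * f_nat
k, b = m * w ** 2, 2 * m * zeta * w            # spring constants, friction coefficients
h = 1.0 / sr

def acc(t, x, v):
    return (F0 * np.cos(2 * np.pi * f_drive * t) - k * x - b * v) / m

x = np.zeros_like(w)                           # all springs start at rest
v = np.zeros_like(w)
for n in range(int(sr * dur)):                 # classical RK4, one step per sample
    t = n * h
    k1x, k1v = v, acc(t, x, v)
    k2x, k2v = v + h / 2 * k1v, acc(t + h / 2, x + h / 2 * k1x, v + h / 2 * k1v)
    k3x, k3v = v + h / 2 * k2v, acc(t + h / 2, x + h / 2 * k2x, v + h / 2 * k2v)
    k4x, k4v = v + h * k3v, acc(t + h, x + h * k3x, v + h * k3v)
    x = x + h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
    v = v + h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)

E = 0.5 * m * v ** 2 + 0.5 * k * x ** 2        # equation (9): kinetic + potential
amp = np.sqrt(2 * E / k)                       # equation (10): amplitude from energy
best = int(np.argmax(amp))                     # the resonating spring
extracted = f_nat[best] * np.sqrt(1 - 2 * zeta ** 2)   # equation (8)
print(f_nat[best], extracted)
```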
(2) Extraction of Amplitude
In a steady state, the trajectory of the spring is given by the equation (5). Therefore, the relationship between the energy of a spring in a steady state, Ei,s, and the amplitude F0 of a given pure tone can be represented by the equation (11).

Ei,s = (1/2)·ki·(F0/(mZi))²    (11)
In addition, the energy in a steady state, Ei,s, can be obtained by putting the displacement xi and the velocity vi in the steady state, which are obtained by solving the equation (1) with the numerical analysis method, into the equation (9). Therefore, the amplitude F0 of a given pure tone becomes as below:

F0 = mZi·√(2Ei,s/ki)    (12)
The natural frequency ωi of the spring that resonates with an external force is almost the same as the frequency of the external force. Therefore, substituting ω ≈ ωi into the equation (3) gives Zi = 2ζωi². Substituting both this result and ωi = √(ki/m) into the equation (12), the amplitude F0 of the input pure tone can be calculated by the equation (13).
F0 = 2ζωi·√(2mEi,s)    (13)
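A quick algebraic sanity check: starting from a known pure-tone amplitude F0, the steady-state energy from the equation (11) substituted into the equation (13) returns F0. The numeric values are arbitrary illustrations:

```python
import math

m, zeta, F0 = 1.0, 0.001, 0.7
w_i = 2 * math.pi * 1000.0                  # natural frequency of the resonating spring
k_i = m * w_i ** 2                          # spring constant
Z_i = 2 * zeta * w_i ** 2                   # equation (3) with w ~ w_i
E_s = 0.5 * k_i * (F0 / (m * Z_i)) ** 2     # equation (11): steady-state energy
F0_rec = 2 * zeta * w_i * math.sqrt(2 * m * E_s)   # equation (13)
print(F0_rec)  # recovers 0.7
```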
2. In a Transient State
(1) Extraction of Frequency
Assume that a pure tone F0 cos(ωt) is given over a time interval [ta, tb]. All springs start to move from an initial state where both displacements and velocities are zero. Using the numerical analysis technique, the energies of the springs are calculated at each time point, and the calculated results are put into the equation (10) to obtain the amplitudes of the springs at each time point. After that, the natural frequency of the spring having the largest amplitude is substituted into the equation (8) to calculate the frequency of the given pure tone.
(2) Extraction of Amplitude
Assume that an energy of a resonating spring Si found by the numerical analysis is Ei(t). The amplitude Ai(t) of a spring Si at time t can be calculated from Ei(t) using the equation (10).
According to the general solution of the equation (1), the amplitude Ai(t) of the spring Si resonating with a given sound wave follows the trajectory of the equation (6), so that the spring Si follows the trajectory of Ai(t) = (1 − e^(−ζω(t−ta)))·Ai,s, where Ai,s = F0/(mZi) is the expected steady-state amplitude.
The energies Ei(t1) and Ei(t2) at two time points t1, t2 within the time interval [ta, tb] can be obtained with the numerical analysis method. Therefore, the amplitudes Ai(t1) and Ai(t2) can be obtained by substituting these results into the equation (10). The expected steady-state amplitude, Ai,s, can then be obtained by putting these amplitudes into Ai(t) = (1 − e^(−ζω(t−ta)))·Ai,s and eliminating ta, which yields the equation (14):

Ai,s = (Ai(t2) − Ai(t1)·e^(−ζω(t2−t1))) / (1 − e^(−ζω(t2−t1)))    (14)
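The two-point formula can be checked on a synthetic rising trajectory with a known steady-state value. The numbers here are illustrative; only the damping ratio 0.001 comes from the text:

```python
import math

def expected_steady_state(A1, A2, t1, t2, zeta, w):
    """Equation (14): steady-state amplitude from two samples of
    A(t) = A_s * (1 - exp(-zeta*w*(t - t_a))); t_a cancels out."""
    r = math.exp(-zeta * w * (t2 - t1))
    return (A2 - A1 * r) / (1 - r)

A_s, zeta, w = 2.0, 0.001, 2 * math.pi * 440   # known target, illustrative spring
A = lambda t: A_s * (1 - math.exp(-zeta * w * t))
t1 = 0.05
t2 = t1 + 1 / 440                              # the two points, one period apart
print(expected_steady_state(A(t1), A(t2), t1, t2, zeta, w))  # 2.0
```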
Next, regarding the case where the frequency is the same but the volume of the sound changes, assume that the amplitude of the sound given at the time point tc has changed from F1 to F2. Let Ac be the amplitude of a spring at the time point tc, and let As be the amplitude that the spring will have once it approaches a steady state after the external force changes to F2. The behavior of the amplitude over time can be described by the following equation.
A(t) = As + (Ac − As)·e^(−ζω(t−tc))
Given the amplitudes A(t1) and A(t2) at two time points t1 and t2 within the time interval in which the amplitude changes from Ac to As, solving for As yields the same result as the equation (14).
For example, consider the case where the external force becomes F2 = 0 at the time point tc. When the external force disappears, the energy of the spring decreases exponentially according to the equation (7). Namely, the measured amplitude of the spring ΔT seconds after the external force disappears will be A(tc + ΔT) = A(tc)·e^(−ζωΔT). Putting this measurement result into the equation (14) makes As = 0, which means the external force has disappeared.
Therefore, the expected steady-state amplitude, As, can be obtained by measuring the energy of the spring more than once. Using the equation (10), which represents the relationship between amplitude and energy, the energy in the steady state, Es, can be calculated, and consequently the amplitude F0 of a given pure tone can be calculated using the equation (13).
Since the force applied to the spring is in the form of a periodic function, the energy does not increase uniformly within a period during the transient state. Considering this characteristic, when selecting the two time points t1 and t2 described above, the time interval between them is made equal to one period.
In this regard, it may not be possible to select two time points whose time difference is exactly one period, due to the relationship between the sampling rate of the sound data and the natural frequency of the spring. In this case, an error may occur, and two methods can be used to correct this error.
The first method is to select the adjacent sample whose distance from the first sample differs least from one period. When the position S1 of the first sample and the period T for the audio data are given, the position S2 of the second sample is calculated as [S1 + sampling rate × T + 0.5]. The expected steady-state amplitude, As, is then calculated by putting the time information of the two points and the amplitudes at the two points into the equation (14).
The second method uses a linear regression analysis. After extracting the amplitudes at several time points and putting the extracted data into the equation (15), the expected steady-state amplitude, As, is calculated by the linear regression analysis.
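One plausible reading of the regression step (the exact form of the equation (15) is not reproduced in the text): A(t) is linear in the regressor e^(−ζω(t−tc)), so an ordinary least-squares fit recovers As as the intercept. A sketch with synthetic, noise-free samples and illustrative values:

```python
import numpy as np

zeta, w, t_c = 0.001, 2 * np.pi * 440, 0.0   # illustrative spring parameters
A_s, A_c = 1.5, 0.2                          # target steady-state and initial amplitudes
t = t_c + np.linspace(0.01, 0.2, 20)         # several measurement time points
u = np.exp(-zeta * w * (t - t_c))            # regressor: A(t) = A_s + (A_c - A_s)*u
A = A_s + (A_c - A_s) * u                    # synthetic amplitude samples
slope, intercept = np.polyfit(u, A, 1)       # least-squares line in u
print(intercept)  # ~1.5, the expected steady-state amplitude
```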
Based on the above theoretical background, a method for extracting a frequency of an input sound can be proposed as below.
Referring
The step (a) may comprise the steps of: measuring displacements xi(t) and velocities vi(t) at time points for each of the plurality of springs (see the equation 1); calculating energy Ei(t) at each time point for each of the plurality of springs based on the displacements and the velocities (see the equation 9); and calculating an amplitude Ai(t) of each of the plurality of springs based on the energies Ei(t) (see the equation 10).
The step (b) can be calculated with the equation (14).
In the step (b), said expected steady-state amplitude, Ai,s(t), can be calculated based on the amplitudes at two different time points within a duration of the input sound.
A difference between the two different time points can be a period of the natural frequency of the corresponding spring.
When one of the two time points is t1, a sampling rate of the input sound is SR, and the period of the natural frequency of the corresponding spring is T, the other t2 of the two time points can be calculated by means of the equation below.
t2 = [t1 + SR × T + 0.5]
The number of the plurality of springs N may be determined based on a range and a resolution of the frequency to be extracted.
According to the method I for extracting the frequency and amplitude of the input sound described above, if the input sound is a pure tone, the frequency and amplitude of the input sound can be effectively extracted.
Now, assume that there are n types of pure tones constituting a complex tone F(t)=ΣjFj cos(ωjt+φj). If n=1, the pure tone of a given sound can be found by selecting the spring having the largest amplitude among the springs. However, if n>1, it is difficult to find out pure tones constituting the complex tone by selecting top n springs in the order of amplitude.
The first reason is that the amplitude of a spring whose natural frequency is adjacent to that of the spring having the largest amplitude could be greater than the amplitude of a spring which resonates with another pure tone constituting the complex tone. The second reason is that, as shown in the trajectory after 0.8 seconds in
Accordingly, in this embodiment, instead of finding the local maximum value among the spring amplitudes at each time point, a method of finding the local maximum value from the results of multiplying an expected steady-state amplitude and a transient-state-pure-tone amplitude is proposed.
1. Expected Steady-State Amplitude and Filtered Pure-Tone Amplitude
First, in order to extract the pure tones constituting a complex tone, the amplitude Ai(t) of each spring Si is calculated by applying the step (a) of the method I to each spring for extracting the frequency of an input sound.
Next, an expected steady-state amplitude, Ai,s(t), is calculated by applying the step (b) of the method I for extracting the frequency of an input sound to the amplitude Ai(t) of each spring Si. However, the equation (14), which calculates the expected steady-state amplitude, is derived from the equation (7), which describes the behavior of a resonating spring. Therefore, high amplitudes can result even at frequencies away from the resonant frequency, as in
Accordingly, the following steps are performed. The third step is to calculate a transient-state-pure-tone amplitude, Fi,t(t), by putting the amplitude Ai(t) of the spring Si into the equation (13). In addition, a predicted pure-tone amplitude, Fi,s(t), is calculated by applying steps (c) and (d) of the method I for extracting the frequency of the input sound to the expected steady-state amplitude, Ai,s(t).
As the final step, a filtered pure-tone amplitude, Fi,p(t), is calculated by multiplying the transient-state-pure-tone amplitude, Fi,t(t), with the predicted pure-tone amplitude, Fi,s(t), as in Fi,p(t) = Fi,t(t) × Fi,s(t). Additionally, the result of the multiplication may be divided by the maximum amplitude of the sound so that it is normalized not to exceed 1. For example, if the sound is expressed as a 16-bit integer, the result is divided by 32,767.
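The combination step reduces to an elementwise product. In the sketch below the per-spring amplitudes are dummy values in 16-bit sample units, and scaling each factor by 32,767 before multiplying is one reading of the normalization remark:

```python
import numpy as np

MAX_16BIT = 32767.0                          # largest magnitude of a 16-bit sample

F_t = np.array([1000.0, 20000.0, 3000.0])    # dummy transient-state amplitudes F_i,t
F_s = np.array([1200.0, 21000.0, 500.0])     # dummy predicted amplitudes F_i,s
F_p = (F_t / MAX_16BIT) * (F_s / MAX_16BIT)  # filtered amplitudes, normalized <= 1
print(F_p)
```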
A filtered pure-tone amplitude has the characteristic that 1) the amplitude becomes 0 when the sound disappears, and 2) the amplitudes of frequencies away from a resonant frequency in the frequency domain are low.
2. Finding a Pure Tone from Local Maximum Values
However, if the frequency interval between two pure tones is narrow, the two corresponding peaks may merge, so that no local minimum exists between the two adjacent local maxima.
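A minimal local-maximum scan illustrates both the extraction and the caveat: two well-separated peaks are both reported, but two peaks on adjacent frequencies with no dip between them collapse into one.

```python
import numpy as np

def local_maxima(a):
    """Indices i of strict interior peaks: a[i-1] < a[i] > a[i+1]."""
    a = np.asarray(a)
    return [i for i in range(1, len(a) - 1) if a[i - 1] < a[i] > a[i + 1]]

print(local_maxima([0, 3, 1, 0, 5, 2]))  # [1, 4]: two separated peaks found
print(local_maxima([0, 3, 5, 2, 0]))     # [2]: adjacent peaks merge into one
```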
Based on the theoretical background described above, the following method for extracting the frequency of the input sound is proposed.
Referring
The step (1) may comprise the steps of: measuring displacements xi(t) and velocities vi(t) at different time points for each of the plurality of springs (see the equation 1); calculating an energy Ei(t) at each time point for each of the plurality of springs based on the displacements xi(t) and the velocities vi(t) (see the equation 9); and calculating an amplitude Ai(t) at each time point for each of the plurality of springs based on the energy Ei(t) (see the equation 10).
The equation 13 can be used in the step (2), the equation 14 can be used in the step (3), and the equation 13 can be used in the step (4).
The number of the plurality of springs, N, may be determined based on a range and a resolution of the frequencies to be extracted.
In the step (3), the expected steady-state amplitudes, Ai,s(t), can be calculated based on the amplitudes at two time points within a duration of the input sound.
In the step (3), the expected steady-state amplitudes, Ai,s(t), can be calculated by means of the equation below:

Ai,s = (Ai(t2) − Ai(t1)·e^(−ζω(t2−t1))) / (1 − e^(−ζω(t2−t1)))

where t1 and t2 are the two different time points within the duration of the input sound, t2 > t1, Ai(t1) is an amplitude of any spring among the plurality of springs at t1, Ai(t2) is an amplitude of said spring at t2, ζ is a damping ratio of said spring, and ω satisfies the equation ω = ωi√(1 − 2ζ²), where ωi is the natural frequency of said spring.
A difference between the two different time points can be a period of the natural frequency of the corresponding spring.
When one of the two time points is t1, a sampling rate of the input sound is SR, and a period of the natural frequency of the corresponding spring is T, the other t2 of the two time points is calculated by the equation below.
t2 = [t1 + SR × T + 0.5]
In step (7), the natural frequency may be used for sound recognition or sound synthesis.
The sound processing method and sound processing apparatus according to the present embodiment can be applied not only to the human voice but also to all types of sounds, such as those of musical instruments and animals. In the present disclosure, sound recognition includes: speech recognition in the sense of converting human speech into text; speaker verification/speaker identification for determining whose voice an input sound corresponds to; source separation, such as discriminating a specific person's voice when the voices of a plurality of speakers are mixed, separating voice from noise, and separating vocals from the instrumental parts of songs; sound direction detection; sound-based medical diagnostics, such as analysis of coughing or breathing; sound-based machine fault diagnostics based on mechanical sounds; and sonar for navigating undersea terrain, ranging objects, and more.
Sound recognition and sound synthesis are examples to which the natural frequency obtained by the present invention can be applied, and the scope of the present invention is not limited thereto. The present invention can be applied to any field in which periodic properties or Fourier transforms are used, such as price prediction for cryptocurrencies and stocks, and image processing such as denoising.
Hereinafter, the experimental results according to the present embodiment will be described. To show the performance of the DJ transform according to the present disclosure, the results of the DJ transform and those of the STFT were compared. In the DJ transform, 7,951 springs whose natural frequencies range from 50 Hz to 8,000 Hz were used. The frequency interval between springs was 1 Hz. A 25 milliseconds window was used for the STFT.
The DJ transform was performed in an NVIDIA M40 GPU environment with 3,072 cores and 12 GB of memory and was implemented using the C language API of the CUDA Toolkit 8.0. It took about 0.6 seconds to perform the DJ transform on 1 second of audio data.
As shown in
Three experiments were conducted to compare the results of the DJ transform with the STFT in terms of temporal resolution.
The first experiment was to check the frequency extracted at the time point where an input frequency changes.
The second experiment is to extract frequencies from the sounds that appear and disappear rapidly. The first rows of
In
The upper drawing in
The third experiment is an extension of the second experiment, which shows the frequency extraction results when 1 kHz and 2 kHz pure tones are alternately generated for 5 milliseconds each from 200 milliseconds to 800 milliseconds (
The first rows of
As can be seen in
Since the complex tone is composed of 400 Hz and 440 Hz, the amplitude fluctuates in a 40 Hz cycle as shown in the bottom of
The sound processing device 100 may be any one of various types of digital computers. For example, the sound processing device may be a laptop computer, a desktop computer, a workstation, a server, a blade server, a mainframe, or any other suitable computers. Alternatively, the sound processing device may be any one of various types of mobile devices. For example, the sound processing device may be a personal digital assistant (PDA), a cellular phone, a smartphone, a wearable device, or any other similar computing devices. Components, connections and relations therebetween, and functions thereof, disclosed in the present disclosure, are merely illustrative and do not limit the scope of the present disclosure.
As shown in
A plurality of components of the sound processing device 100 are connected to the I/O interface 105. The plurality of components include an input unit 106, such as a keyboard, a mouse, or a microphone, an output unit 107, such as a monitor or a speaker, a storage unit 108, such as a magnetic disk or an optical disc, and a communication unit 109, such as a network card, a modem, or a wireless communication transceiver. For example, a sound from which a fundamental frequency is to be extracted may be input through the microphone. The communication unit 109 allows the sound processing device 100 to exchange information/data with other devices through a computer network, such as the Internet, and/or various telecommunication networks.
The computing unit 101 may be a general-purpose or dedicated processing component having processing and calculation functions. Some examples of the computing unit 101 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), a dedicated artificial intelligence calculation chip, a computing unit configured to execute a machine learning model algorithm, a digital signal processor (DSP), and any other suitable processors, controllers, and microcontrollers. The computing unit 101 performs the sound processing method described above. For example, in an embodiment, the sound processing method may be implemented by a computer software program and may be stored in a machine-readable medium, such as the storage unit 108. In an embodiment, some or the entirety of a computer program may be loaded into and/or installed in the sound processing device 100 via the ROM 102 and/or the communication unit 109. When the computer program is loaded into the RAM 103 and executed by the computing unit 101, one step or a plurality of steps of the sound processing method described above may be performed. In another embodiment, the computing unit 101 is configured to perform the sound processing method according to the embodiment of the present disclosure in any other suitable manner (e.g. firmware).
In the present disclosure, the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, and devices, or suitable combinations thereof. More specific examples of the machine-readable storage medium may include an electrical connection based on one line or a plurality of lines, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a CD-ROM, an optical storage device, a magnetic storage device, or any suitable combinations thereof.
A sound may be input to the sound processing device 100 through the microphone. The sound input through the microphone may be stored in an electronic form and may then be used. Alternatively, the input sound may be directly provided as an electronic file through the storage unit 108, or may be received in an electronic form through the communication unit 109 and may then be used.
Although the present disclosure has been described in detail through preferred embodiments, the present disclosure is not limited thereto, and it will be obvious to a person skilled in the art that various changes and applications can be made without departing from the technical spirit of the present disclosure. Therefore, the scope of protection of the present disclosure should be interpreted by the following claims, and all technical ideas within the scope equivalent thereto should be interpreted as being included in the scope of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
10-2019-0003620 | Jan 2019 | KR | national |
This application is a continuation-in-part of U.S. application Ser. No. 17/268,444, filed on Feb. 12, 2021, which claims the benefit of PCT/KR2019/016347 filed on Nov. 26, 2019, which claims the benefit of Korean patent application 10-2019-0003620 filed on Jan. 11, 2019. The entire disclosures of the foregoing applications are incorporated by reference herein.
Number | Date | Country | |
---|---|---|---|
Parent | 17268444 | Feb 2021 | US |
Child | 18210866 | US |