This application claims the benefit under 35 U.S.C. § 119(a) of a Korean Patent Application filed with the Korean Intellectual Property Office on Sep. 12, 2005 and assigned Serial No. 2005-84780, the entire disclosure of which is hereby incorporated by reference.
1. Field of the Invention:
The present invention relates to an apparatus and a method for transmitting audio signals. More particularly, the present invention relates to an apparatus and a method for transmitting audio signals in such a manner that the audio signals transmitted and received in a mobile telephone network are preprocessed before being inputted to a voice encoder.
2. Description of the Related Art:
Variable-rate voice encoders used in mobile telephone networks support the creation of voice packets at a plurality of rates. Typical examples of variable-rate voice encoders include Qualcomm-CELP (QCELP) and the Enhanced Variable Rate Codec (EVRC) used in Code Division Multiple Access (CDMA) systems. A variable-rate voice encoder can select the rate of its voice packets according to the characteristics of the input voice signals or according to the rate required by the communication system, in order to compress or restore voice signals. QCELP and EVRC create voice packets at full rate, ½ rate, ¼ rate, or ⅛ rate. These encoders are based on a model of human voice production and perform well when compressing and decoding voice signals. However, they perform poorly on signals (for example, music) whose production model differs from the voice production model. Consequently, when audio signals are transmitted and received in conventional mobile telephone networks, some measures must be taken to lessen the degradation of sound quality.
The methods shown in
Accordingly, there is a need for an improved apparatus and method for compensating for the degradation of sound quality caused by the characteristics of voice encoders, in order to improve the sound quality of audio signals transmitted and received in mobile telephone networks.
An aspect of exemplary embodiments of the present invention is to address at least the above problems and/or disadvantages and to provide at least the advantages described below. Accordingly, an aspect of exemplary embodiments of the present invention is to provide an apparatus and a method for preprocessing audio signals in an analysis-by-synthesis scheme and transmitting the audio signals.
Another aspect of exemplary embodiments of the present invention is to provide an apparatus and a method for optimally preprocessing audio signals and transmitting the audio signals by using error signals between original audio signals and signals obtained by preprocessing the original signals with a frequency filter, encoding the audio signals with a voice encoder, and decoding the audio signals.
Another aspect of exemplary embodiments of the present invention is to provide an apparatus and a method for preprocessing audio signals by multiplying the frequency component of the audio signals by a specific filter gain value and transmitting the audio signals.
Another aspect of exemplary embodiments of the present invention is to provide an apparatus and a method for transmitting audio signals in a mobile telephone network with less degradation of sound quality caused by the voice encoder.
In order to accomplish an aspect of exemplary embodiments of the present invention, there is provided an apparatus for transmitting audio signals, in which a preprocessing filter converts an audio signal, inputted frame by frame, into a frequency domain, multiplies each frequency component by a filter gain value, converts the audio signal in the frequency domain back into a time domain, and outputs the audio signal; a first voice encoder/synthesizer voice-encodes the audio signal outputted by the preprocessing filter, decodes it, and synthesizes the audio signal; a comparator outputs an error signal based on the error between the audio signal outputted by the first voice encoder/synthesizer and the audio signal inputted to the preprocessing filter; a second voice encoder voice-encodes the audio signal outputted by the preprocessing filter and transmits the audio signal; and a filter gain/switch controller calculates a filter gain from the error signal outputted by the comparator and the audio signal inputted to the preprocessing filter, provides the filter gain to the preprocessing filter, and, when the filter gain is an optimal filter gain, controls the preprocessing filter so that the audio signal is processed according to the optimal filter gain and outputted to the second voice encoder.
In an exemplary implementation, the preprocessing filter includes a frequency converter for converting the audio signal into the frequency domain, the audio signal having been inputted to the preprocessing filter; a frequency filter for multiplying the audio signal by the filter gain value, the audio signal having been converted into the frequency domain by the frequency converter, and outputting the audio signal; and an inverse frequency converter for converting the audio signal into the time domain, the audio signal having been outputted by the frequency filter.
In an exemplary implementation, the filter gain/switch controller includes a frequency band-based Signal-to-Noise Ratio (SNR) calculator for converting the error signal outputted by the comparator and the audio signal inputted to the preprocessing filter into the frequency domain, respectively, and calculating an SNR for each frequency band; a frequency band-based filter gain calculator for calculating the filter gain for each frequency band by using the SNR calculated by the frequency band-based SNR calculator; a postprocessor for adjusting the deviation of the filter gain and providing the filter gain to the preprocessing filter; and a switch controller for controlling the preprocessing filter, when the filter gain is an optimal filter gain, so that the audio signal is processed according to the optimal filter gain and outputted to the second voice encoder.
The above and other objects, features, and advantages of certain exemplary embodiments of the present invention will be more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:
Throughout the drawings, the same drawing reference numerals will be understood to refer to the same elements, features and structures.
The matters defined in the description such as a detailed construction and elements are provided to assist in a comprehensive understanding of exemplary embodiments of the invention. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
The preprocessor 320 receives audio signals frame by frame, uses the preprocessing filter 310 to preprocess them into signals suitable for the voice encoder 332, and outputs them to the signal transmitter 330. Several feedback iterations are performed to create optimally preprocessed signals for each frame. The search for optimally preprocessed signals terminates when the feedback has been repeated a predetermined number of times or when the calculated error signal ẽ[n] satisfies a predetermined criterion, and the finally preprocessed signals are then outputted for transmission. Specifically, the preprocessing process according to an exemplary embodiment of the present invention is divided into a search mode and a transmission mode. In the search mode, the optimal filter gain to be used by the preprocessing filter 310 is searched for. In the transmission mode, the signals preprocessed with the optimal filter gain are passed to the voice encoder 332 of the signal transmitter 330 for transmission.
In the search mode, the input audio signal s[n] of a frame passes through the preprocessing filter 310. The preprocessed signal then passes through the voice encoder/synthesizer 312 and reaches the comparator 314, which creates an error signal ẽ[n] by comparing the synthesized signal with the input audio signal s[n]. The filter gain/switch controller 316 uses the error signal ẽ[n], together with the input audio signal s[n], to obtain the optimal preprocessing filter gain for the current input frame. This process continues until the feedback has been repeated a predetermined number of times or the calculated error signal ẽ[n] satisfies a predetermined criterion. The components of the preprocessor 320 will now be described in detail.
The preprocessing filter 310 is a frequency-domain filter. It converts the time-domain input audio signal s[n] into a frequency-domain signal, multiplies each frequency component by a specific filter gain value, and converts the result back into the time domain, thereby outputting a filtered signal.
The FFT converter 412 converts the time-domain input audio signal s[n] into a frequency-domain signal by FFT. The frequency response of the frequency filter 414 is determined by the filter gain values provided by the filter gain/switch controller 316; that is, the frequency filter 414 multiplies the FFT-converted audio signal by the filter gain values provided by the filter gain/switch controller 316. The IFFT converter 416 converts the result back into the time domain by IFFT and outputs the preprocessed time-domain audio signal s̃[n]. Before the feedback process begins, the filter gain is initialized to 1.
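The FFT, gain multiplication, and IFFT chain of the preprocessing filter 310 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a naive DFT stands in for the FFT converter 412 and the IFFT converter 416, the transform length equals the frame length (the 160-sample frame with a 256-point FFT described below would additionally need zero-padding), and `preprocess_frame` and `half_gains` are hypothetical names. One real gain is supplied per bin 0 to N/2 and reused for the conjugate bins so that the filtered frame stays real-valued.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT standing in for the FFT converter 412."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT standing in for the IFFT converter 416."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def preprocess_frame(frame, half_gains):
    """Hypothetical preprocessing filter: scale each frequency component, return to time domain.

    half_gains holds one real gain per bin 0..N/2; bin k > N/2 reuses the gain of
    its conjugate bin N-k so that the filtered frame stays real-valued.
    """
    n = len(frame)
    if len(half_gains) != n // 2 + 1:
        raise ValueError("need one gain per bin 0..N/2")
    spec = dft(frame)
    filtered = [spec[k] * half_gains[min(k, n - k)] for k in range(n)]
    return [v.real for v in idft(filtered)]


# Usage: with every gain at 1 (the initial value) the frame passes through unchanged.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
out = preprocess_frame(frame, [1.0] * 9)
```

Setting a bin's gain to 0 removes that component entirely; intermediate gains attenuate it, which is how the search loop de-emphasizes frequency regions the codec reproduces poorly.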
The voice encoder/synthesizer 312 consists of a voice encoder having the same construction and function as the voice encoder 332 of the signal transmitter 330, and a synthesizer having the same function as the decoder used on the reception side. It is used to accurately model the encoding and decoding processes of the signal transmission channel. For example, the voice encoder/synthesizer 312 may be made up of a linear prediction analyzer and synthesizer, a pitch analyzer and synthesizer, or a Fixed Code Book (FCB) analyzer and synthesizer.
The comparator 314 calculates the difference (that is, the encoding error signal) ẽ[n] between the input audio signal s[n] and the audio signal s̃[n] outputted by the voice encoder/synthesizer 312. The error signal ẽ[n] calculated by the comparator 314 is provided, together with the input audio signal s[n], as an input to the filter gain/switch controller 316.
The filter gain/switch controller 316 obtains the optimal preprocessing filter gain for the current input frame from the error signal ẽ[n] and the input audio signal s[n]. As used herein, the filter gain refers to the frequency gain values that determine the frequency response of the frequency filter 414. In the extreme case, a filter gain may be calculated for every frequency component of a single frame of the input audio signal. For example, if a frame of the input audio signal consists of 160 samples (20 ms, that is, an 8 kHz sampling rate) and is filtered after a 256-point FFT, 128 frequency components must be filtered. If a different filter gain is used for each frequency component, 128 gain values must be calculated for every frame. Calculating and applying a filter gain for each individual frequency component is inefficient when the characteristics of human auditory perception are taken into account: the human ability to discern frequencies is not uniform along the frequency axis, and there is a frequency masking effect. Accordingly, the frequency components may be grouped into a number of bands, and the same gain may be used within each band. This reduces the amount of calculation without noticeably affecting performance. The method of grouping bands may be selected according to the characteristics of the input audio signals or the target environment.
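The band grouping described above can be sketched as follows. The actual band layout is given in a figure that is not reproduced here, so the band edges below are hypothetical, chosen only to be narrower at low frequencies and wider at high frequencies; the function name is likewise illustrative. Assuming the 8 kHz sampling rate implied by 160 samples per 20 ms, each of the 128 bins of a 256-point FFT spans 8000 / 256 = 31.25 Hz.

```python
SAMPLE_RATE_HZ = 8000                     # implied by 160 samples per 20 ms frame
FFT_SIZE = 256
NUM_BINS = FFT_SIZE // 2                  # 128 components, as in the text
BIN_WIDTH_HZ = SAMPLE_RATE_HZ / FFT_SIZE  # 31.25 Hz per bin

# Hypothetical band edges (bin indices): narrow bands at low frequencies,
# wider bands at high frequencies. Band i covers bins edges[i] .. edges[i+1]-1.
BAND_EDGES = [0, 4, 8, 12, 16, 20, 24, 32, 40, 48, 64, 80, 96, 128]


def expand_band_gains(band_edges, band_gains, num_bins):
    """Return one gain per bin, sharing each band's gain across all of its bins."""
    if len(band_edges) != len(band_gains) + 1:
        raise ValueError("need exactly one gain per band")
    if band_edges[0] != 0 or band_edges[-1] != num_bins:
        raise ValueError("bands must cover bins 0..num_bins-1")
    if any(hi <= lo for lo, hi in zip(band_edges, band_edges[1:])):
        raise ValueError("band edges must be strictly increasing")
    per_bin = []
    for i, gain in enumerate(band_gains):
        per_bin.extend([gain] * (band_edges[i + 1] - band_edges[i]))
    return per_bin


num_bands = len(BAND_EDGES) - 1           # 13 gains per frame instead of 128
per_bin = expand_band_gains(BAND_EDGES, [1.0] * num_bands, NUM_BINS)
```

Only `num_bands` values then need to be estimated per feedback iteration; the per-bin gains that the frequency filter multiplies by are derived from them.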
The frequency band-based SNR calculator 712 FFT-converts the input audio signal s[n] and the error signal ẽ[n] calculated by the comparator 314, respectively, and calculates an SNR in respective frequency bands shown in
Here, i denotes the band index; N_B denotes the total number of bands; n denotes the feedback iteration index; E_s[i] denotes the energy of the input audio signal s[n] in the ith band; and E_e^n[i] denotes the energy, in the ith band, of the error signal ẽ[n] calculated by the comparator 314 at the nth feedback.
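Equation (1) itself is not reproduced in this text. A common way to form a per-band SNR from the quantities just defined is the decibel ratio SNR_n[i] = 10 log10(E_s[i] / E_e^n[i]); the sketch below assumes that form, computes band energies over the non-negative-frequency bins of a naive DFT, and uses hypothetical band edges and function names. A small constant `eps` guards against division by zero. It is an illustration, not the patent's equation (1).

```python
import cmath
import math


def half_spectrum(x):
    """Naive DFT bins 0..N/2 (stand-in for the FFT in the SNR calculator 712)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def band_energies(x, band_edges):
    """Sum of |X[k]|^2 over each band; band i covers bins edges[i] .. edges[i+1]-1."""
    spec = half_spectrum(x)
    return [sum(abs(spec[k]) ** 2 for k in range(band_edges[i], band_edges[i + 1]))
            for i in range(len(band_edges) - 1)]


def band_snr_db(s, e, band_edges, eps=1e-12):
    """Assumed form of equation (1): SNR_n[i] = 10 * log10(E_s[i] / E_e^n[i])."""
    e_s = band_energies(s, band_edges)
    e_e = band_energies(e, band_edges)
    return [10.0 * math.log10((a + eps) / (b + eps)) for a, b in zip(e_s, e_e)]


# Usage: an impulse has a flat spectrum, so every band carries energy; an error
# that is 0.1 times the signal lies about 20 dB below it in every band.
frame = [1.0] + [0.0] * 15
error = [0.1 * v for v in frame]
snr = band_snr_db(frame, error, [0, 3, 6, 9])
```

With the decibel form, a band whose signal energy exceeds its error energy has a positive SNR, which is the case the gain calculator treats as "well encoded".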
The frequency band-based filter gain calculator 714 calculates the filter gain for each band with reference to SNR values for respective frequency bands, which have been calculated by the frequency band-based SNR calculator 712, as defined by equation (2) below.
$$G_n[i] = \alpha\, f\big(\mathrm{SNR}_n[i]\big) + (1-\alpha)\, G_{n-1}[i], \qquad i = 1, \ldots, N_B \tag{2}$$
Here, G_n denotes the filter gain at the nth feedback; G_{n−1} denotes the filter gain at the (n−1)th feedback; α is a regression coefficient, which is preferably 0.55 based on experiments; and f is a sigmoid function taking values in [0, 1], as defined by equation (3) below.
When the frequency band-based filter gain calculator 714 calculates the filter gain for each band in this manner, the following results: if the input audio signal s[n] in a frequency band is larger than the error signal ẽ[n] calculated by the comparator 314, the filter gain of that band approaches 1; otherwise, it approaches 0. Consequently, bands that the encoder reproduces well are preserved, while bands that it reproduces poorly are attenuated. Because this process is repeated in the feedback loop, the filter gain converges to a value optimized for the current input audio signal s[n].
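The update of equation (2) can be sketched as below. Because equation (3) is not reproduced here, the logistic function centred at 0 dB is an assumption, with hypothetical `centre` and `slope` parameters; it matches the behaviour just described, since a band whose signal energy exceeds its error energy has a positive SNR in dB and its gain is pushed toward 1. The value α = 0.55 is the one given in the text.

```python
import math

ALPHA = 0.55  # regression coefficient from the text


def sigmoid(snr_db, centre=0.0, slope=1.0):
    """Assumed logistic stand-in for f in equation (3); output lies in [0, 1]."""
    z = slope * (snr_db - centre)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # numerically stable branch for large negative z
    return ez / (1.0 + ez)


def update_gains(prev_gains, snr_db, alpha=ALPHA):
    """Equation (2): G_n[i] = alpha * f(SNR_n[i]) + (1 - alpha) * G_{n-1}[i]."""
    return [alpha * sigmoid(s) + (1.0 - alpha) * g
            for g, s in zip(prev_gains, snr_db)]


# Two bands with gains initialised to 1: one band is coded well (+20 dB), the
# other poorly (-20 dB). Repeating the update mimics ten feedback iterations.
gains = [1.0, 1.0]
for _ in range(10):
    gains = update_gains(gains, [20.0, -20.0])
```

The well-coded band stays near 1, while the poorly coded band decays geometrically by a factor of about 1 − α = 0.45 per iteration (roughly 3.4e-4 after ten steps). In the real loop the SNR changes as the gains change, so convergence depends on the codec's response rather than on a fixed SNR.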
The postprocessor 716 aims to reduce the aliasing effect caused by inter-frame deviation of the filter gain between the current and previous frames, as well as by intra-frame deviation of the filter gain between bands within the current frame. To address inter-frame deviation, the filter gain of the current frame being searched may be limited to a predetermined range around the optimal filter gain determined for the previous frame. For example, the inter-frame deviation is limited to 0.3 for every frame, as defined by equation (4) below.
Here, G*_prev[i] denotes the optimal filter gain determined for the ith band in the previous frame.
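Equation (4) is not reproduced here. Assuming it bounds each band's gain to within ±0.3 of the previous frame's optimal gain G*_prev[i], the inter-frame limit reduces to a per-band clamp, sketched below with a hypothetical function name.

```python
MAX_INTER_FRAME_DEV = 0.3  # deviation limit given in the text


def limit_inter_frame(gains, prev_optimal, max_dev=MAX_INTER_FRAME_DEV):
    """Clamp G[i] into [G*_prev[i] - max_dev, G*_prev[i] + max_dev] (assumed form of eq. (4))."""
    return [min(max(g, p - max_dev), p + max_dev)
            for g, p in zip(gains, prev_optimal)]


# Band 0 wants to drop from 0.8 to 0.1 and band 1 to rise from 0.5 to 0.9;
# both are held to a 0.3 step. Band 2 moves by less than 0.3 and is unchanged.
limited = limit_inter_frame([0.1, 0.9, 0.5], [0.8, 0.5, 0.6])
# limited is approximately [0.5, 0.8, 0.5]
```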
To address intra-frame deviation, a linear or sinusoidal smoothing function may be used.
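The text does not specify the smoothing function. One possible reading, sketched below as an assumption, is to replace the step change in gain at each band boundary with a ramp between adjacent band centres: a straight-line ramp for the linear case and a raised-cosine (half-period cosine) ramp for the sinusoidal case. The function and parameter names are hypothetical.

```python
import math


def smooth_band_gains(band_edges, band_gains, shape="linear"):
    """Per-bin gains that ramp between neighbouring band centres instead of stepping.

    Assumed interpretation of the 'linear or sinusoidal smoothing function':
    'linear' uses a straight-line ramp, 'sinusoidal' a raised-cosine ramp.
    Bins below the first centre or above the last keep that band's gain.
    """
    if shape not in ("linear", "sinusoidal"):
        raise ValueError("shape must be 'linear' or 'sinusoidal'")
    centres = [(band_edges[i] + band_edges[i + 1] - 1) / 2
               for i in range(len(band_gains))]
    out = []
    for k in range(band_edges[-1]):
        if k <= centres[0]:
            out.append(band_gains[0])
        elif k >= centres[-1]:
            out.append(band_gains[-1])
        else:
            j = max(i for i, c in enumerate(centres) if c <= k)
            t = (k - centres[j]) / (centres[j + 1] - centres[j])
            if shape == "sinusoidal":
                t = 0.5 - 0.5 * math.cos(math.pi * t)  # raised-cosine ramp
            out.append((1 - t) * band_gains[j] + t * band_gains[j + 1])
    return out


# Two 4-bin bands with gains 1 and 0: the hard step at bin 4 becomes a ramp.
linear = smooth_band_gains([0, 4, 8], [1.0, 0.0])
# linear is [1.0, 1.0, 0.875, 0.625, 0.375, 0.125, 0.0, 0.0]
```

The raised-cosine option flattens the ramp near the band centres, which keeps the gain closer to each band's own value while still avoiding an abrupt step.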
The switch controller 718 determines whether to continue the feedback process. When the feedback has been repeated a predetermined number of times or when a convergence condition is satisfied, the switch controller 718 switches the system from the search mode to the transmission mode. Based on experiments, the maximum number of feedback repetitions is preferably 10. The convergence condition is that the rate of change of the energy of the error signal ẽ[n] be within 0.1, as defined by equation (5) below.
$$\text{mode} = \begin{cases} \text{transmission mode}, & \text{if the rate of change of the energy of } \tilde{e}[n] \text{ is within } 0.1, \\ \text{search mode}, & \text{otherwise.} \end{cases} \tag{5}$$
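The switch controller's decision can be sketched as follows. The limit of 10 feedback repetitions and the 0.1 threshold come from the text; measuring the "rate of change" as the relative change in total error energy between consecutive feedbacks is an assumption, as are the function and parameter names.

```python
MAX_FEEDBACK = 10      # preferred maximum number of feedback repetitions
CONVERGENCE_TOL = 0.1  # threshold on the rate of change of error energy


def next_mode(feedback_count, err_energy, prev_err_energy,
              max_feedback=MAX_FEEDBACK, tol=CONVERGENCE_TOL):
    """Return "transmission" once the search should stop, else "search".

    prev_err_energy is None on the first feedback, when no rate of change exists.
    A zero previous energy is not treated as converged here; the repetition
    limit still ends the search in that case.
    """
    if feedback_count >= max_feedback:
        return "transmission"
    if prev_err_energy is not None and prev_err_energy > 0:
        rate = abs(err_energy - prev_err_energy) / prev_err_energy  # assumed definition
        if rate <= tol:
            return "transmission"
    return "search"


# A 5% change in error energy counts as converged; a 50% change does not.
converged = next_mode(3, 0.95, 1.0)
still_searching = next_mode(3, 0.5, 1.0)
```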
As mentioned above, the preprocessor 320 receives audio signals for each frame and preprocesses the audio signals.
When the audio signal s[n] of a frame is inputted to the preprocessor 320 (S902), the preprocessing filter 310 filters s[n] in the frequency filter 414 using the filter gain values provided by the filter gain/switch controller 316, and outputs a preprocessed audio signal s̃[n] (S904).
The preprocessed audio signal s̃[n] is encoded by the voice encoder/synthesizer 312, then decoded, and outputted as s̃[n] (S906).
The outputted s̃[n] is inputted to the comparator 314, which calculates its error relative to the audio signal s[n] inputted in step S902 and outputs an error signal ẽ[n] (S908).
Based on the error signal ẽ[n] and the input audio signal s[n], the filter gain/switch controller 316 calculates an optimal preprocessing filter gain for the current input frame. When the optimal filter gain has been obtained, either because the feedback has been repeated the predetermined number of times or because the convergence condition is satisfied (S910), the filter gain/switch controller 316 provides the optimal filter gain value to the preprocessing filter 310.
The preprocessing filter 310 outputs the audio signal s̃[n], which has been preprocessed based on the optimal filter gain (S912).
If the filter gain/switch controller 316 has not yet obtained the optimal filter gain, the process returns to step S904, and the subsequent steps are repeated until the feedback has been repeated the predetermined number of times or the convergence condition is satisfied.
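Steps S902 to S912 can be tied together in one self-contained sketch. Every component here is an assumption chosen only to make the loop runnable: a naive DFT replaces the FFT, a coarse uniform quantizer (`toy_codec`) stands in for the voice encoder/synthesizer 312 (a real system would use the same codec as the voice encoder 332, for example EVRC), the SNR is taken in dB, f is a logistic function, the intra-frame smoothing step is omitted, and all function names are hypothetical. The values α = 0.55, 10 repetitions, 0.1, and 0.3 are those given in the text.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def apply_gains(x, bin_gains):
    """Preprocessing filter 310: gains for bins 0..N/2, mirrored so the output is real."""
    n = len(x)
    spec = dft(x)
    return idft_real([spec[k] * bin_gains[min(k, n - k)] for k in range(n)])


def toy_codec(x, step=0.25):
    """Hypothetical stand-in for encoder/synthesizer 312: a coarse uniform quantizer."""
    return [step * round(v / step) for v in x]


def band_energy(x, edges):
    spec = dft(x)
    return [sum(abs(spec[k]) ** 2 for k in range(edges[i], edges[i + 1]))
            for i in range(len(edges) - 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def search_and_output(s, edges, prev_optimal, alpha=0.55, max_feedback=10,
                      tol=0.1, max_dev=0.3, eps=1e-12):
    """Search mode (S904-S910) followed by the transmission-mode output (S912)."""
    nb = len(edges) - 1

    def per_bin(g):
        return [g[i] for i in range(nb) for _ in range(edges[i], edges[i + 1])]

    gains = [1.0] * nb                                          # initial filter gain
    e_s = band_energy(s, edges)
    prev_err = None
    for _ in range(max_feedback):
        synth = toy_codec(apply_gains(s, per_bin(gains)))       # S904, S906
        err = [a - b for a, b in zip(s, synth)]                 # S908: comparator
        e_e = band_energy(err, edges)
        snr = [10 * math.log10((a + eps) / (b + eps)) for a, b in zip(e_s, e_e)]
        gains = [alpha * sigmoid(x) + (1 - alpha) * g for x, g in zip(snr, gains)]
        gains = [min(max(g, p - max_dev), p + max_dev)          # inter-frame limit
                 for g, p in zip(gains, prev_optimal)]
        total = sum(e_e)
        if prev_err and abs(total - prev_err) / prev_err <= tol:
            break                                               # S910: converged
        prev_err = total
    return gains, apply_gains(s, per_bin(gains))                # S912


# Usage: a strong tone in band 0 plus a weak tone in band 2 (16-sample frame).
n = 16
frame = [math.sin(2 * math.pi * 2 * t / n) + 0.05 * math.sin(2 * math.pi * 6 * t / n)
         for t in range(n)]
edges = [0, 3, 6, 9]  # hypothetical bands over bins 0..8
gains, out = search_and_output(frame, edges, prev_optimal=[1.0, 1.0, 1.0])
```

In this toy setting the strongly coded band keeps a gain close to 1, while the other bands can only fall as far as the inter-frame limit allows; `out` is the frame that would be handed to the voice encoder 332.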
In summary, the preprocessor 320 preprocesses inputted audio signals for each frame through the steps shown in
For example, the exemplary embodiments of the present invention are applicable to ringback tones used in mobile telephone networks. Various types of music are commonly used as ringback tones. When a ringback tone is transmitted to a user in the conventional manner, the voice encoder degrades its sound quality. If the ringback tone is preprocessed according to the present invention before transmission, the sound quality is hardly degraded. The ringback tone may be preprocessed in advance and stored separately, so that it can be transmitted via the voice encoder at the user's request. Alternatively, the ringback tone may be preprocessed each time the user requests it and then transmitted.
As mentioned above, the exemplary embodiments of the present invention are advantageous in that, when audio signals are transmitted in a mobile telephone network, the sound quality of the audio signals is hardly degraded by the voice encoder, because the audio signals are preprocessed by using an optimal filter gain based on error signals obtained when the audio signals are preprocessed and outputted by the voice encoder and the synthesizer.
The exemplary embodiments of the present invention can also be embodied as computer-readable codes on a computer-readable recording medium. The computer-readable recording medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer-readable recording medium include, but are not limited to, read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet via wired or wireless transmission paths). The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. Also, function programs, codes, and code segments for accomplishing the present invention can be easily construed as within the scope of the invention by programmers skilled in the art to which the present invention pertains.
While the invention has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
2005-84780 | Sep 2005 | KR | national |