The present invention relates to a speech signal encoding method and a speech signal decoding method, and more particularly, to methods of frequency-transforming and processing a speech signal.
In general, audio signals include components of various frequencies. The human audible frequency range extends from 20 Hz to 20 kHz, while human voices are present in a range of about 200 Hz to 3 kHz. In addition to the band in which human voices are present, an input audio signal may include high-frequency components above 7 kHz, where human voices are hardly present. Accordingly, when a coding method suitable for a narrowband (up to about 4 kHz) is applied to wideband or super-wideband signals, sound quality degrades.
With a recent increase in demands for video calls, video conferences, and the like, techniques of encoding/decoding audio signals, that is, speech signals, so as to be close to actual voices have increasingly attracted attention.
Frequency transform, which is one of the methods used to encode/decode a speech signal, is a method in which an encoder frequency-transforms a speech signal and transmits the transform coefficients to a decoder, and the decoder inversely frequency-transforms the transform coefficients to reconstruct the speech signal.
Among the techniques of encoding/decoding a speech signal, encoding predetermined signals in the frequency domain is considered to be superior, but a time delay may occur when a transform for encoding a speech signal in the frequency domain is used.
Therefore, there is a need for a method which can prevent the time delay in encoding/decoding a signal and increase a processing rate.
An object of the invention is to provide a method and a device which can effectively perform MDCT/IMDCT in the course of encoding/decoding a speech signal.
Another object of the invention is to provide a method and a device which can prevent an unnecessary delay from occurring in performing MDCT/IMDCT.
Another object of the invention is to provide a method and a device which can prevent a delay by not using a look-ahead sample to perform MDCT/IMDCT.
Another object of the invention is to provide a method and a device which can reduce a processing delay by reducing an overlap-addition section necessary for perfectly reconstructing a signal in performing MDCT/IMDCT.
(1) According to an aspect of the invention, there is provided a speech signal encoding method including the steps of: specifying an analysis frame in an input signal; generating a modified input based on the analysis frame; applying a window to the modified input; generating a transform coefficient by performing an MDCT (Modified Discrete Cosine Transform) on the modified input to which the window has been applied; and encoding the transform coefficient, wherein the modified input includes the analysis frame and a self-replication of all or a part of the analysis frame.
(2) In the speech signal encoding method according to (1), a current frame may have a length of N and the window may have a length of 2N, the step of applying the window may include generating a first modified input by applying the window to the front end of the modified input and generating a second modified input by applying the window to the rear end of the modified input, the step of generating the transform coefficient may include generating a first transform coefficient by performing an MDCT on the first modified input and generating a second transform coefficient by performing an MDCT on the second modified input, and the step of encoding the transform coefficient may include encoding the first transform coefficient and the second transform coefficient.
(3) In the speech signal encoding method according to (2), the analysis frame may include a current frame and a previous frame of the current frame, and the modified input may be configured by adding a self-replication of the second half of the current frame to the analysis frame.
(4) In the speech signal encoding method according to (2), the analysis frame may include a current frame, the modified input may be generated by adding M self-replications of the first half of the current frame to the front end of the analysis frame and adding M self-replications of the second half of the current frame to the rear end of the analysis frame, and the modified input may have a length of 3N.
(5) In the speech signal encoding method according to (1), the window may have the same length as a current frame, the analysis frame may include the current frame, the modified input may be generated by adding a self-replication of the first half of the current frame to the front end of the analysis frame and adding a self-replication of the second half of the current frame to the rear end of the analysis frame, the step of applying the window may include generating first to third modified inputs by applying the window to the modified input while sequentially shifting the window by a half frame from the front end of the modified input, the step of generating the transform coefficient may include generating first to third transform coefficients by performing an MDCT on the first to third modified inputs, and the step of encoding the transform coefficient may include encoding the first to third transform coefficients.
(6) In the speech signal encoding method according to (1), a current frame may have a length of N, the window may have a length of N/2, and the modified input may have a length of 3N/2, the step of applying the window may include generating first to fifth modified inputs by applying the window to the modified input while sequentially shifting the window by a quarter frame from the front end of the modified input, the step of generating the transform coefficient may include generating first to fifth transform coefficients by performing an MDCT on the first to fifth modified inputs, and the step of encoding the transform coefficient may include encoding the first to fifth transform coefficients.
(7) In the speech signal encoding method according to (6), the analysis frame may include the current frame, and the modified input may be generated by adding a self-replication of the front half of the first half of the current frame to the front end of the analysis frame and adding a self-replication of the rear half of the second half of the current frame to the rear end of the analysis frame.
(8) In the speech signal encoding method according to (6), the analysis frame may include the current frame and a previous frame of the current frame, and the modified input may be generated by adding a self-replication of the second half of the current frame to the analysis frame.
(9) In the speech signal encoding method according to (1), a current frame may have a length of N, the window may have a length of 2N, and the analysis frame may include the current frame, and the modified input may be generated by adding a self-replication of the current frame to the analysis frame.
(10) In the speech signal encoding method according to (1), a current frame may have a length of N and the window may have a length of N+M, the analysis frame may be specified by applying a symmetric first window having a slope part with a length of M to the first half with a length of M of the current frame and a subsequent frame of the current frame, the modified input may be generated by self-replicating the analysis frame, and the step of applying the window may include generating a first modified input by applying the second window to the front end of the modified input and generating a second modified input by applying the second window to the rear end of the modified input.
The step of generating the transform coefficient may include generating a first transform coefficient by performing an MDCT on the first modified input and generating a second transform coefficient by performing an MDCT on the second modified input, and the step of encoding the transform coefficient may include encoding the first transform coefficient and the second transform coefficient.
(11) According to another aspect of the invention, there is provided a speech signal decoding method including the steps of generating a transform coefficient sequence by decoding an input signal; generating a temporal coefficient sequence by performing an IMDCT (Inverse Modified Discrete Cosine Transform) on the transform coefficients; applying a predetermined window to the temporal coefficient sequence; and outputting a sample reconstructed by causing the temporal coefficient sequence having the window applied thereto to overlap, wherein the input signal is encoded transform coefficients which are generated by applying the same window as the window to a modified input generated based on a predetermined analysis frame in a speech signal and performing an MDCT thereto, and the modified input includes the analysis frame and a self-replication of all or a part of the analysis frame.
(12) In the speech signal decoding method according to (11), the step of generating the transform coefficient sequence may include generating a first transform coefficient sequence and a second transform coefficient sequence of a current frame, the step of generating the temporal coefficient sequence may include generating a first temporal coefficient sequence and a second temporal coefficient sequence by performing an IMDCT on the first transform coefficient sequence and the second transform coefficient sequence, the step of applying the window may include applying the window to the first temporal coefficient sequence and the second temporal coefficient sequence, and the step of outputting the sample may include overlap-adding the first temporal coefficient sequence and the second temporal coefficient sequence having the window applied thereto with a gap of one frame.
(13) In the speech signal decoding method according to (11), the step of generating the transform coefficient sequence may include generating first to third transform coefficient sequences of a current frame.
The step of generating the temporal coefficient sequence may include generating first to third temporal coefficient sequences by performing an IMDCT on the first to third transform coefficient sequences, the step of applying the window may include applying the window to the first to third temporal coefficient sequences, and the step of outputting the sample may include overlap-adding the first to third temporal coefficient sequences having the window applied thereto with a gap of a half frame from a previous or subsequent frame.
(14) In the speech signal decoding method according to (11), the step of generating the transform coefficient sequence may include generating first to fifth transform coefficient sequences of a current frame.
The step of generating the temporal coefficient sequence may include generating first to fifth temporal coefficient sequences by performing an IMDCT on the first to fifth transform coefficient sequences, the step of applying the window may include applying the window to the first to fifth temporal coefficient sequences, and the step of outputting the sample may include overlap-adding the first to fifth temporal coefficient sequences having the window applied thereto with a gap of a quarter frame from a previous or subsequent frame.
(15) In the speech signal decoding method according to (11), the analysis frame may include a current frame, the modified input may be generated by adding a self-replication of the analysis frame to the analysis frame, and the step of outputting the sample may include overlap-adding the first half of the temporal coefficient sequence and the second half of the temporal coefficient sequence.
(16) In the speech signal decoding method according to (11), a current frame may have a length of N and the window is a first window having a length of N+M, the analysis frame may be specified by applying a symmetric second window having a slope part with a length of M to the first half with a length of M of the current frame and a subsequent frame of the current frame, the modified input may be generated by self-replicating the analysis frame, and the step of outputting the sample may include overlap-adding the first half of the temporal coefficient sequence and the second half of the temporal coefficient sequence and then overlap-adding the overlap-added first and second halves of the temporal coefficient to the reconstructed sample of a previous frame of the current frame.
According to the aspects of the invention, it is possible to effectively perform MDCT/IMDCT in the course of encoding/decoding a speech signal.
According to the aspects of the invention, it is possible to prevent an unnecessary delay from occurring in the course of performing MDCT/IMDCT.
According to the aspects of the invention, it is possible to prevent a delay by performing MDCT/IMDCT without using a look-ahead sample.
According to the aspects of the invention, it is possible to reduce a processing delay by reducing an overlap-addition section necessary for perfectly reconstructing a signal in the course of performing MDCT/IMDCT.
According to the aspects of the invention, since the delay in a high-performance audio encoder can be reduced, it is possible to use MDCT/IMDCT in bidirectional communications.
According to the aspects of the invention, it is possible to use MDCT/IMDCT techniques in a speech codec that handles high-quality sound without any additional delay.
According to the aspects of the invention, it is possible to reduce a delay associated with the MDCT in the existing encoder and to reduce a processing delay in a codec without modifying/changing other configurations.
Hereinafter, embodiments of the invention will be specifically described with reference to the accompanying drawings. Detailed description of known configurations or functions involved in the invention will be omitted when it is determined that it would obscure the gist of the invention.
If it is mentioned that an element is “connected to” or “coupled to” another element, it should be understood that still another element may be interposed therebetween, as well as that the element may be connected or coupled directly to another element.
Terms such as “first” and “second” can be used to describe various elements, but the elements are not limited to the terms. The terms are used only to distinguish one element from another element.
The constituent units described in the embodiments of the invention are independently shown to represent different distinctive functions. This does not mean that each constituent unit is constructed as an independent hardware or software unit. That is, the constituent units are independently arranged for convenience of explanation, and at least two constituent units may be combined into a single constituent unit or a single constituent unit may be divided into plural constituent units to perform functions.
On the other hand, various codec techniques are used to encode/decode a speech signal. Each codec technique may have characteristics suitable for a predetermined speech signal and may be optimized for the corresponding speech signal.
Examples of codecs using an MDCT (Modified Discrete Cosine Transform) include the AAC series of MPEG, G.722.1, G.729.1, G.718, G.711.1, G.722 SWB, and G.729.1/G.718 SWB (Super Wide Band). These codecs are based on a perceptual coding method of performing an encoding operation by combining a filter bank to which the MDCT is applied and a psychoacoustic model. The MDCT is widely used in speech codecs, because it has the merit that a time-domain signal can be effectively reconstructed using an overlap-addition method.
As described above, various codecs using the MDCT are used and the codecs may have different structures to achieve effects to be realized.
For example, the AAC series of MPEG performs an encoding operation by combining an MDCT (filter bank) and a psychoacoustic model, and AAC-ELD thereof performs an encoding operation using an MDCT (filter bank) with a low delay.
G.722.1 applies the MDCT to the entire band and quantizes the coefficients thereof. G.718 WB (Wide Band) performs an encoding operation into an MDCT-based enhanced layer using a quantization error of a basic core as an input with a layered wideband (WB) codec and a layered super-wideband (SWB) codec.
In addition, EVRC (Enhanced Variable Rate Codec)-WB, G.729.1, G.718, G.711.1, G.718/G.729.1 SWB, and the like perform an encoding operation into an MDCT-based enhanced layer using a band-divided signal as an input with a layered wideband codec and a layered super-wideband codec.
Referring to
Referring to
Side information on a signal length, a window type, bit assignment, and the like can be transmitted to the units 210 to 250 of the MDCT unit 200 via a secondary path 260. It is described herein that the side information necessary for the operations of the units 210 to 250 can be transmitted via the secondary path 260, but this is intended only for convenience for explanation and necessary information along with a signal may be sequentially transmitted to the buffer 210, the modification unit 220, the windowing unit 230, the forward transform unit 240, and the formatter 250 in accordance with the order of operations of the units shown in the drawing without using a particular secondary path.
The buffer 210 receives time-domain samples as an input and generates a signal block on which processes such as the MDCT are performed.
The modification unit 220 modifies the signal block received from the buffer 210 so as to be suitable for the processes such as the MDCT and generates a modified input signal. At this time, the modification unit 220 may receive the side information necessary for modifying the signal block and generating the modified input signal via the secondary path 260.
The windowing unit 230 windows the modified input signal. The windowing unit 230 can window the modified input signal using a trapezoidal window, a sinusoidal window, a Kaiser-Bessel Derived window, and the like. The windowing unit 230 may receive the side information necessary for windowing via the secondary path 260.
The forward transform unit 240 applies the MDCT to the modified input signal. Therefore, the time-domain signal is transformed to a frequency-domain signal and the forward transform unit 240 can extract spectral information from frequency-domain coefficients. The forward transform unit 240 may also receive the side information necessary for transform via the secondary path 260.
The formatter 250 formats information so as to be suitable for transmission and storage. The formatter 250 generates a digital information block including the spectral information extracted by the forward transform unit 240. The formatter 250 can pack quantization bits of a psychoacoustic model in the course of generating the information block. The formatter 250 can generate the information block in a format suitable for transmission and storage and can signal the information block. The formatter 250 may receive the side information necessary for formatting via the secondary path 260.
Referring to
The de-formatter 310 unpacks information transmitted from an encoder. By this unpacking, the side information on an input signal length, an applied window type, bit assignment, and the like can be extracted along with the spectral information. The unpacked side information can be transmitted to the units 310 to 350 of the IMDCT unit 300 via a secondary path 360.
It is described herein that the side information necessary for the operations of the units 310 to 350 can be transmitted via the secondary path 360, but this is intended only for convenience for explanation and the necessary side information may be sequentially transmitted to the de-formatter 310, the inverse transform unit 320, the windowing unit 330, the modified overlap-addition processor 340, and the output processor 350 in accordance with the order of processing the spectral information without using a particular secondary path.
The inverse transform unit 320 generates frequency-domain coefficients from the extracted spectral information and inversely transforms the generated frequency-domain coefficients. The inverse transform may be performed depending on the transform method used in the encoder. When the MDCT is applied in the encoder, the inverse transform unit 320 can apply an IMDCT (Inverse MDCT) to the frequency-domain coefficients. The inverse transform unit 320 can perform an inverse transform operation, that is, can transform the frequency-domain coefficients into time-domain signals (for example, time-domain coefficients), for example, through the IMDCT. The inverse transform unit 320 may receive the side information necessary for the inverse transform via the secondary path 360.
The windowing unit 330 applies the same window as applied in the encoder to the time-domain signal (for example, the time-domain coefficients) generated through the inverse transform. The windowing unit 330 may receive the side information necessary for the windowing via the secondary path 360.
The modified overlap-addition processor 340 overlaps and adds the windowed time-domain coefficients (the time-domain signal) and reconstructs a speech signal. The modified overlap-addition processor 340 may receive the side information necessary for the overlap-addition via the secondary path 360.
The output processor 350 outputs the overlap-added time-domain samples. At this time, the output signal may be a reconstructed speech signal or may be a signal requiring an additional post-process.
On the other hand, in the MDCT/IMDCT performed by the MDCT unit of the encoder and the IMDCT unit of the decoder, the MDCT is defined by Math Figure 1.
ãk=ak·w represents a windowed time-domain input signal and w represents a symmetric window function. αr represents N MDCT coefficients. âk represents a reconstructed time-domain input signal having 2N samples.
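The body of Math Figure 1 does not survive in this text. As a hedged reconstruction, the standard MDCT/IMDCT pair that is consistent with the symbols defined above, and with the shifts of (N+1)/2 and 1/2 used in the SDFT discussion later in this description, can be written as follows. The 2/N scaling of the inverse transform is an assumption (conventions differ), chosen so that a window satisfying Math Figure 2 gives unit gain after overlap-addition.

```latex
\begin{aligned}
\alpha_r &= \sum_{k=0}^{2N-1} \tilde{a}_k \cos\!\left[\frac{\pi}{N}\left(k + \frac{N+1}{2}\right)\left(r + \frac{1}{2}\right)\right], & r &= 0, \ldots, N-1 \\
\hat{a}_k &= \frac{2}{N} \sum_{r=0}^{N-1} \alpha_r \cos\!\left[\frac{\pi}{N}\left(k + \frac{N+1}{2}\right)\left(r + \frac{1}{2}\right)\right], & k &= 0, \ldots, 2N-1
\end{aligned}
```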
In a transform coding method, the MDCT is a process of transforming the time-domain signal into nearly-uncorrelated transform coefficients. In order to achieve a reasonable transmission rate, a long window is applied to a signal of a stationary section and the transform is performed. Accordingly, the volume of the side information can be reduced and a slow-varying signal can be more efficiently encoded. However, in this case, the total delay which occurs in application of the MDCT increases.
In order to reduce the total delay, a short window can be used instead of the long window so that a distortion due to a pre-echo falls within the temporal masking range and is not audible. However, in this case, the volume of the side information increases and the merit in the transmission rate is cancelled.
Therefore, a method (adaptive window switching) of switching a long window and a short window and adaptively modifying the window of a frame section to which the MDCT is applied can be used. Both a slow-varying signal and a fast-varying signal can be effectively processed using the adaptive window switching.
The specific method of the MDCT will be described below with reference to the accompanying drawings.
The MDCT can effectively reconstruct an original signal by cancelling an aliasing, which occurs in the course of transform, using the overlap-addition method.
As described above, the MDCT (Modified Discrete Cosine Transform) is a transform of transforming a time-domain signal into a frequency-domain signal, and the original signal, that is, the signal before the transform, can be perfectly reconstructed using the overlap-addition method.
A look-ahead (future) frame of a current frame with a length of N can be used to perform the MDCT on the current frame with a length of N. At this time, an analysis window with a length of 2N can be used for the windowing process.
Referring to
The length (2N) of the window is set depending on an analysis section. Therefore, in the example shown in
In order to apply the overlap-addition method, a predetermined section of the analysis section is set to overlap with the previous frame or subsequent frame. In the example shown in
In order to perform the MDCT on the (n−1)-th frame (“AB” section) with a length of N, a section with a length of 2N (“ABCD” section) including the n-th frame (“CD” section) with a length of N can be reconstructed. A windowing process of applying the analysis window to the reconstructed section is performed.
As for the n-th frame (“CD” section) with a length of N, an analysis section with a length of 2N (“CDEF” section) including the (n+1)-th frame (“EF” section) with a length of N for the MDCT is reconstructed and the window with a length of 2N is applied to the analysis section.
As described above, by using overlap-addition, the MDCT can perfectly reconstruct a signal before the transform. At this time, the window for windowing a time-domain signal should satisfy the condition of Math Figure 2 so as to perfectly reconstruct a signal before applying the MDCT.
ω1=ω4R, ω2=ω3R, ω1ω1+ω3ω3=ω2ω2+ω4ω4=1.0 <Math Figure 2>
In Math Figure 2 and
An example of the window satisfying the condition of Math Figure 2 is a symmetric window. Examples of the symmetric window include the trapezoidal window, the sinusoidal window, the Kaiser-Bessel Derived window, and the like. A window having the same shape as used in the encoder is used as a synthesis window for synthesis in the decoder.
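The condition of Math Figure 2 can be checked numerically. The following is a minimal sketch, assuming that w1 to w4 denote the four equal-length quarters of the window of length 2N and that the suffix R denotes time reversal of a quarter; the function names `sine_window` and `satisfies_math_figure_2` are hypothetical helpers introduced only for illustration.

```python
import math

def sine_window(two_n):
    """Sinusoidal window of length 2N: w[k] = sin(pi * (k + 0.5) / (2N))."""
    return [math.sin(math.pi * (k + 0.5) / two_n) for k in range(two_n)]

def satisfies_math_figure_2(w, tol=1e-12):
    """Check w1 = w4R, w2 = w3R and w1*w1 + w3*w3 = w2*w2 + w4*w4 = 1.

    Assumes len(w) is a multiple of 4 so the window splits into four quarters.
    """
    q = len(w) // 4                               # quarter length (N/2)
    w1, w2, w3, w4 = (w[i * q:(i + 1) * q] for i in range(4))
    rev = lambda seg: seg[::-1]                   # assumed meaning of "R"
    # Symmetry: w1 against reversed w4, w2 against reversed w3.
    symmetric = all(abs(a - b) < tol
                    for a, b in zip(w1 + w2, rev(w4) + rev(w3)))
    # Power complementarity between quarters that are N samples apart.
    power = all(abs(a * a + c * c - 1.0) < tol and abs(b * b + d * d - 1.0) < tol
                for a, b, c, d in zip(w1, w2, w3, w4))
    return symmetric and power
```

Under these assumptions the sinusoidal window passes the check, while a rectangular window of ones does not, because its squared quarters sum to 2 rather than 1.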
Referring to
An analysis window with a length of 2N is applied to the analysis section (S610). As shown in the figure, the first or second half of the analysis section to which the analysis window is applied overlaps with the previous or subsequent analysis section. Therefore, the signal before the transform can be perfectly reconstructed through the later overlap-addition.
Subsequently, a time-domain sample with a length of 2N is obtained through the windowing (S620).
The MDCT is applied to the time-domain sample to generate N frequency-domain transform coefficients (S630).
Quantized N frequency-domain transform coefficients are created through quantization (S640).
The frequency-domain transform coefficients are transmitted to the decoder along with the information block or the like.
The decoder obtains the frequency-domain transform coefficients from the information block or the like and generates a time-domain signal with a length of 2N including an aliasing by applying the IMDCT to the obtained frequency-domain transform coefficients (S650).
Subsequently, a window with a length of 2N (a synthesis window) is applied to the time-domain signal with a length of 2N (S660).
An overlap-addition process of adding overlapped sections is performed on the time-domain signal to which the window has been applied (S670). As shown in the drawing, the reconstructed signal with a length of 2N obtained in the (f−1)-th frame section and the reconstructed signal with a length of 2N obtained in the f-th frame section are added over the section with a length of N in which they overlap, whereby the aliasing can be cancelled and a signal of the frame section before the transform (with a length of N) can be reconstructed.
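The flow of steps S610 to S670 can be sketched as follows. This is a minimal illustration and not the codec's implementation: it assumes a sinusoidal window, a direct O(N^2) transform, the 2/N inverse scaling, and it omits the quantization step (S640) and the information block. All function names are hypothetical.

```python
import math

def mdct(block):
    """2N windowed samples -> N coefficients (direct form, phase shift (N+1)/2)."""
    n = len(block) // 2
    return [sum(block[k] * math.cos(math.pi * (k + (n + 1) / 2) * (r + 0.5) / n)
                for k in range(2 * n)) for r in range(n)]

def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples (2/N scaling is an assumption)."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[r] * math.cos(math.pi * (k + (n + 1) / 2) * (r + 0.5) / n)
                          for r in range(n)) for k in range(2 * n)]

def sine_window(two_n):
    return [math.sin(math.pi * (k + 0.5) / two_n) for k in range(two_n)]

def encode(signal, n):
    """S610-S630: apply a 2N analysis window every N samples, then the MDCT."""
    w = sine_window(2 * n)
    return [mdct([s * wk for s, wk in zip(signal[i:i + 2 * n], w)])
            for i in range(0, len(signal) - 2 * n + 1, n)]

def decode(frames, n):
    """S650-S670: IMDCT, synthesis window, overlap-add with a hop of N samples."""
    w = sine_window(2 * n)
    out = [0.0] * (n * (len(frames) + 1))
    for f, coeffs in enumerate(frames):
        for k, (y, wk) in enumerate(zip(imdct(coeffs), w)):
            out[f * n + k] += y * wk      # overlapped sections are summed
    return out
```

In this sketch only the samples covered by two overlapping analysis sections are reconstructed exactly; the first and last N samples of the input are covered by a single section and still contain aliasing.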
As described above, the MDCT (Modified Discrete Cosine Transform) is performed by the forward transform unit (analysis filter bank) 240 in the MDCT unit 200 shown in
Specifically, the result as shown in Math Figure 3 can be obtained by performing the MDCT on an input signal ak including 2N samples in a frame with a length of 2N.
In Math Figure 3, ãk represents the windowed input signal, which is obtained by multiplying the input signal ak by a window function hk.
The MDCT coefficients can be calculated by performing an SDFT(N+1)/2, 1/2 on the windowed input signal of which the aliasing component is corrected. The SDFT (Shifted Discrete Fourier Transform) is a kind of time-frequency transform method. The SDFT is defined by Math Figure 4.
Here, u represents a predetermined sample shift value and v represents a predetermined frequency shift value. That is, the SDFT is a DFT in which the samples are additionally shifted along the time axis and the frequency axis. Therefore, the SDFT may be understood as a generalization of the DFT.
It can be seen from the comparison of Math Figures 3 and 4 that the MDCT coefficients can be calculated by performing the SDFT(N+1)/2,1/2 on the windowed input signal of which the aliasing component is corrected as described above. That is, as can be seen from Math Figure 5, a value of a real part after the windowed signal and the aliasing component are subjected to the SDFT(N+1)/2, 1/2 is an MDCT coefficient.
$\alpha_r = \mathrm{real}\{\mathrm{SDFT}_{(N+1)/2,\,1/2}(\tilde{a}_k)\}$ <Math Figure 5>
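The relationship of Math Figure 5 can be checked numerically. The sketch below assumes the SDFT with shifts u and v has the kernel exp(i·2π(k+u)(r+v)/2N) over 2N samples; Math Figure 4 is not reproduced in this text, so that form and the sign of the exponent are assumptions (the real part does not depend on the sign). The function names are hypothetical.

```python
import cmath
import math

def sdft(a, u, v):
    """Assumed SDFT with time shift u and frequency shift v, for r = 0..N-1."""
    two_n = len(a)
    return [sum(a[k] * cmath.exp(2j * math.pi * (k + u) * (r + v) / two_n)
                for k in range(two_n)) for r in range(two_n // 2)]

def mdct(a):
    """Direct cosine-sum MDCT of 2N samples, giving N coefficients."""
    n = len(a) // 2
    return [sum(a[k] * math.cos(math.pi * (k + (n + 1) / 2) * (r + 0.5) / n)
                for k in range(2 * n)) for r in range(n)]

def mdct_via_sdft(a):
    """MDCT coefficients taken as the real part of SDFT with u=(N+1)/2, v=1/2."""
    n = len(a) // 2
    return [c.real for c in sdft(a, (n + 1) / 2, 0.5)]
```

Both routes give the same N coefficients up to rounding error; the imaginary part discarded in `mdct_via_sdft` is the information whose loss produces the time-domain aliasing.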
The SDFT(N+1)/2, 1/2 can be arranged in Math Figure 6 using a general DFT (Discrete Fourier Transform).
In Math Figure 6, the first exponential function can be said to be the modulation of âk. That is, it represents a shift in the frequency domain by half a frequency sampling interval.
In Math Figure 6, the second exponential function is a general DFT. The third exponential function represents a shift in the time domain by (N+1)/2 of a sampling interval. Therefore, the SDFT(N+1)/2, 1/2 can be said to be a DFT of a signal which is shifted by (N+1)/2 of a sampling interval in the time domain and shifted by half a frequency sampling interval in the frequency domain.
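Math Figure 6 itself is not reproduced in this text. One factorization that matches the description of the three exponential functions, assuming the SDFT kernel exp(i·2π(k+u)(r+v)/2N) with u = (N+1)/2 and v = 1/2, is the following; the sign of the exponents is an assumption.

```latex
\mathrm{SDFT}_{\frac{N+1}{2},\,\frac{1}{2}}(\tilde{a}_k)
= \sum_{k=0}^{2N-1}
\underbrace{\tilde{a}_k \, e^{\,i \pi k / (2N)}}_{\text{modulation (frequency shift by } 1/2)}
\;\underbrace{e^{\,i 2\pi k r / (2N)}}_{\text{general DFT}}
\;\underbrace{e^{\,i \pi (N+1)(2r+1) / (4N)}}_{\text{time shift by } (N+1)/2}
```

The three factors follow from expanding (k+u)(r+v) = kv + kr + u(r+v) in the exponent.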
As a result, the MDCT coefficient is the value of the real part after the time-domain signal is subjected to the SDFT. The relational expression of the input signal ak and the MDCT coefficient αr can be arranged in Math Figure 7 using the SDFT.
Here, α̂r represents a signal obtained by correcting the windowed signal and the aliasing component after the MDCT transform using Math Figure 8.
Referring to
On the other hand, the IMDCT (Inverse MDCT) can be performed by the inverse transform unit (synthesis filter bank) 320 of the IMDCT unit 300 shown in
The IMDCT can be defined by Math Figure 9.
Here, αr represents the MDCT coefficient and âk represents the IMDCT output signal having 2N samples.
The backward transform, that is, the IMDCT, has an inverse relationship with respect to the forward transform, that is, the MDCT. Therefore, the backward transform is performed using this relationship.
The time-domain signal can be calculated by performing the ISDFT (Inverse SDFT) on the spectrum coefficients extracted by the de-formatter 310 and then taking the real part thereof as shown in Math Figure 10.
In Math Figure 10, u represents a predetermined sample shift value in the time domain and v represents a predetermined frequency shift value.
Referring to
On the other hand, the IMDCT output signal âk includes an aliasing in the time domain, unlike the original signal. The aliasing included in the IMDCT output signal is the same as expressed by Math Figure 11.
As described above, when the MDCT is applied, the original signal is not perfectly reconstructed by the inverse transform (IMDCT) alone, unlike the DFT or the DCT, because of the aliasing component introduced by the MDCT. The original signal is perfectly reconstructed only after the overlap-addition. This is because information corresponding to the imaginary part is lost by taking the real part of the SDFT(N+1)/2, 1/2.
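The time-domain aliasing can be made concrete with a short numerical sketch. Math Figure 11 is not reproduced in this text; the folding pattern used in `expected_aliasing` below is the standard identity for the MDCT with a phase shift of (N+1)/2 and 2/N inverse scaling, and is stated here as an assumption. The function names are hypothetical.

```python
import math

def mdct(a):
    """2N samples -> N coefficients (direct form)."""
    n = len(a) // 2
    return [sum(a[k] * math.cos(math.pi * (k + (n + 1) / 2) * (r + 0.5) / n)
                for k in range(2 * n)) for r in range(n)]

def imdct(coeffs):
    """N coefficients -> 2N samples (2/N scaling is an assumption)."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[r] * math.cos(math.pi * (k + (n + 1) / 2) * (r + 0.5) / n)
                          for r in range(n)) for k in range(2 * n)]

def expected_aliasing(x):
    """Assumed folding: the first half is the input minus its mirror image,
    the second half is the input plus its mirror image."""
    n = len(x) // 2
    return [x[k] - x[n - 1 - k] if k < n else x[k] + x[3 * n - 1 - k]
            for k in range(2 * n)]
```

Applying `imdct(mdct(x))` to a single block returns the folded signal rather than `x` itself, which is why a second, overlapping block is needed to cancel the mirrored terms.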
In order to reconstruct the “CD” frame section of the original signal, the “AB” frame section which is a previous frame section of the “CD” frame section and the “EF” frame section which is a look-ahead section thereof are necessary. Referring to
By applying the window shown in
The encoder applies the MDCT to “Aw1 to Dw4” and “Cw1 to Fw4”, and the decoder applies the IMDCT to “Aw1 to Dw4” and “Cw1 to Fw4” to which the MDCT has been applied.
Subsequently, the decoder applies a window to create sections “Aw1w2−Bw2Rw1, −Aw1Rw2+Bw2w2, Cw3w3+Dw4Rw3, and −Cw3w4+Dw4Rw4” and sections “Cw1w1−Dw2Rw1, −Cw1Rw2+Dw2w2, Ew3w3+Fw4Rw3, and −Ew3w4+Fw4Rw4”.
Then, by overlap-adding and outputting the sections “Aw1w2−Bw2Rw1, −Aw1Rw2+Bw2w2, Cw3w3+Dw4Rw3, and −Cw3w4+Dw4Rw4” and the sections “Cw1w1−Dw2Rw1, −Cw1Rw2+Dw2w2, Ew3w3+Fw4Rw3, and −Ew3w4+Fw4Rw4”, the “CD” frame section can be reconstructed like the original, as shown in the drawing. In the above-mentioned process, the aliasing component in the time domain and the value of the output signal can be obtained in accordance with the definitions of the MDCT and the IMDCT.
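The cancellation of the aliasing through overlap-addition can be illustrated with a short sketch. It assumes a sine window as one example of a window satisfying the perfect-reconstruction condition, an unnormalized MDCT, and an IMDCT scaled by 2/N; the helper names `sine_window` and `process` are hypothetical and do not come from this document.

```python
import math

def mdct(x):
    # 2N time samples -> N coefficients (unnormalized; assumed convention).
    n = len(x) // 2
    shift = (n + 1) / 2.0
    return [sum(x[k] * math.cos(math.pi * (k + shift) * (r + 0.5) / n)
                for k in range(2 * n))
            for r in range(n)]

def imdct(coeffs):
    # N coefficients -> 2N time samples, scaled by 2/N (assumed convention).
    n = len(coeffs)
    shift = (n + 1) / 2.0
    return [2.0 / n * sum(coeffs[r] * math.cos(math.pi * (k + shift) * (r + 0.5) / n)
                          for r in range(n))
            for k in range(2 * n)]

def sine_window(length):
    # Symmetric window with w[n]^2 + w[n + length/2]^2 = 1 (assumed example).
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def process(block, window):
    # Encoder side: window + MDCT. Decoder side: IMDCT + the same window.
    windowed = [s * w for s, w in zip(block, window)]
    restored = imdct(mdct(windowed))
    return [s * w for s, w in zip(restored, window)]

N = 8                                                     # frame length
signal = [math.sin(0.3 * i) + 0.1 * i for i in range(3 * N)]   # "AB", "CD", "EF"
window = sine_window(2 * N)

out_abcd = process(signal[0:2 * N], window)               # block "ABCD"
out_cdef = process(signal[N:3 * N], window)               # block "CDEF"

# Overlap-add the rear half of "ABCD" with the front half of "CDEF".
reconstructed_cd = [out_abcd[N + i] + out_cdef[i] for i in range(N)]
original_cd = signal[N:2 * N]
```

Note that `out_cdef` cannot be computed until the "EF" samples are available, which is the look-ahead dependency discussed in the text.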
On the other hand, in the course of MDCT/IMDCT transform and overlap-addition, the look-ahead frame is required for perfectly reconstructing the “CD” frame section and thus a delay corresponding to the look-ahead frame occurs. Specifically, in order to perfectly reconstruct the current frame section “CD”, “CD” which is a look-ahead frame in processing the previous frame section “AB” is necessary and “EF” which is a look-ahead frame of the current frame is also necessary. Therefore, in order to perfectly reconstruct the current frame “CD”, the MDCT/IMDCT output of the “ABCD” section and the MDCT/IMDCT output of the “CDEF” section are necessary, and a structure is obtained in which a delay occurs by the “EF” section corresponding to the look-ahead frame of the current frame “CD”.
Therefore, a method can be considered which can prevent the delay occurring due to use of the look-ahead frame and raise the encoding/decoding speed using the MDCT/IMDCT as described above.
Specifically, an analysis frame including the current frame or a part of the analysis frame is self-replicated to create a modified input (hereinafter, referred to as a “modified input” for the purpose of convenience for explanation), a window is applied to the modified input, and then the MDCT/IMDCT can be performed thereon. By applying a window and creating a target section to be subjected to the MDCT/IMDCT through the self-replication of a frame without encoding/decoding the current frame on the basis of the processing result of a previous or subsequent frame, the MDCT/IMDCT can be rapidly performed without a delay to reconstruct a signal.
In the invention, as described above, an input (block) to which a window is applied is created by self-replicating the current frame “CD” or self-replicating a partial section of the current frame “CD”. Therefore, since it is not necessary to process a look-ahead frame so as to reconstruct the signal of the current frame, a delay necessary for processing a look-ahead frame does not occur.
Hereinafter, embodiments of the invention will be described in detail with reference to the accompanying drawings.
In the examples shown in
The encoder applies a window (current frame window) for reconstructing the current frame to the front section “ABCD” and the rear section “CDDD” of the modified input “ABCDDD”.
As shown in the drawing, the current frame window has a length of 2N to correspond to the length of the analysis frame and includes four sections corresponding to the length of the sub-frame.
The current frame window with a length of 2N used to perform the MDCT/IMDCT includes four sections each corresponding to the length of the sub-frame.
Referring to
The encoder transmits the encoded information to the decoder after applying the MDCT to the inputs. The decoder obtains the inputs to which the MDCT has been applied from the received information and applies the IMDCT to the obtained inputs.
The MDCT/IMDCT result shown in the drawing can be obtained by processing the inputs to which the window has been applied on the basis of the above-mentioned definitions of MDCT and IMDCT.
The decoder creates outputs to which the same window as applied in the encoder is applied after applying the IMDCT. As shown in the drawing, the decoder can finally reconstruct the signal of the “CD” section by overlap-adding the created two outputs. At this time, the signal other than the “CD” section is cancelled by applying the condition (Math Figure 2) necessary for perfect reconstruction as described above.
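The following sketch illustrates the “ABCDDD” case under stated assumptions: a sine window stands in for a window satisfying the perfect-reconstruction condition, the MDCT is unnormalized, the IMDCT is scaled by 2/N, and the helper names are hypothetical. The rear section “CDDD” is built from the current frame alone, so no look-ahead samples are read.

```python
import math

def mdct(x):
    # 2N time samples -> N coefficients (unnormalized; assumed convention).
    n = len(x) // 2
    shift = (n + 1) / 2.0
    return [sum(x[k] * math.cos(math.pi * (k + shift) * (r + 0.5) / n)
                for k in range(2 * n))
            for r in range(n)]

def imdct(coeffs):
    # N coefficients -> 2N time samples, scaled by 2/N (assumed convention).
    n = len(coeffs)
    shift = (n + 1) / 2.0
    return [2.0 / n * sum(coeffs[r] * math.cos(math.pi * (k + shift) * (r + 0.5) / n)
                          for r in range(n))
            for k in range(2 * n)]

def sine_window(length):
    # Symmetric window with w[n]^2 + w[n + length/2]^2 = 1 (assumed example).
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def process(block, window):
    # Window + MDCT (encoder), then IMDCT + the same window (decoder).
    windowed = [s * w for s, w in zip(block, window)]
    restored = imdct(mdct(windowed))
    return [s * w for s, w in zip(restored, window)]

N = 8                                   # frame length
half = N // 2                           # sub-frame length
previous_ab = [math.sin(0.4 * i) for i in range(N)]            # "AB"
current_cd = [math.cos(0.25 * i) + 0.05 * i for i in range(N)]  # "CD"
sub_d = current_cd[half:]               # sub-frame "D"

# Modified input "ABCDDD": sub-frame "D" is self-replicated twice.
modified = previous_ab + current_cd + sub_d + sub_d

window = sine_window(2 * N)             # current frame window, length 2N
out_front = process(modified[:2 * N], window)   # front section "ABCD"
out_rear = process(modified[N:], window)        # rear section "CDDD"

# Overlap-add the rear half of the front output with the front half of the rear output.
reconstructed_cd = [out_front[N + i] + out_rear[i] for i in range(N)]
```

In this sketch the replicated “DD” tail only fills out the rear section to the window length; the part of the rear output that is overlap-added depends on “CD” alone.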
In the examples shown in
Referring to
The current frame window with a length of N used to perform the MDCT/IMDCT includes four sections each corresponding to the length of the sub-frame.
The encoder applies the current frame window with a length of N to the front section “CC”, that is, “C1C2C1C2”, of the modified input “CCDD”, applies the current frame window to the intermediate section “CD”, that is, “C1C2D1D2”, and performs the MDCT/IMDCT thereon. The encoder also applies the current frame window with a length of N to the intermediate section “CD”, that is, “C1C2D1D2”, of the modified input “CCDD”, applies the current frame window to the rear section “DD”, that is, “D1D2D1D2”, and performs the MDCT/IMDCT thereon.
The encoder transmits the encoded information to the decoder after applying the MDCT to the inputs, and the decoder obtains the inputs to which the MDCT has been applied from the received information and applies the IMDCT on the obtained inputs.
The MDCT/IMDCT results shown in
The decoder creates outputs to which the same window as applied in the encoder is applied after applying the IMDCT. The decoder can finally reconstruct the signal of the “C” section, that is, “C1C2”, by overlap-adding the two outputs. At this time, the signal other than the “C” section is cancelled by applying the condition (Math Figure 2) necessary for perfect reconstruction as described above.
The encoder transmits the encoded information to the decoder after applying the MDCT to the inputs, and the decoder obtains the inputs to which the MDCT has been applied from the received information and applies the IMDCT on the obtained inputs.
The MDCT/IMDCT results shown in
The decoder creates outputs to which the same window as applied in the encoder is applied after applying the IMDCT. The decoder can finally reconstruct the signal of the “D” section, that is, “D1D2”, by overlap-adding the two outputs. At this time, the signal other than the “D” section is cancelled by applying the condition (Math Figure 2) necessary for perfect reconstruction as described above.
Therefore, the decoder can finally perfectly reconstruct the current frame “CD” as shown in
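The “CCDD” case with a window of length N can be sketched in the same way. The code assumes a sine window, an unnormalized MDCT, and an IMDCT scaled by 2/N; the helper names are hypothetical. Three sections (“CC”, “CD”, and “DD”) are transformed, and two overlap-additions recover the sub-frames “C” and “D”.

```python
import math

def mdct(x):
    # 2n time samples -> n coefficients (unnormalized; assumed convention).
    n = len(x) // 2
    shift = (n + 1) / 2.0
    return [sum(x[k] * math.cos(math.pi * (k + shift) * (r + 0.5) / n)
                for k in range(2 * n))
            for r in range(n)]

def imdct(coeffs):
    # n coefficients -> 2n time samples, scaled by 2/n (assumed convention).
    n = len(coeffs)
    shift = (n + 1) / 2.0
    return [2.0 / n * sum(coeffs[r] * math.cos(math.pi * (k + shift) * (r + 0.5) / n)
                          for r in range(n))
            for k in range(2 * n)]

def sine_window(length):
    # Symmetric window with w[n]^2 + w[n + length/2]^2 = 1 (assumed example).
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def process(block, window):
    # Window + MDCT (encoder), then IMDCT + the same window (decoder).
    windowed = [s * w for s, w in zip(block, window)]
    restored = imdct(mdct(windowed))
    return [s * w for s, w in zip(restored, window)]

N = 8                                   # frame length
half = N // 2                           # sub-frame length
current_cd = [math.cos(0.25 * i) + 0.05 * i for i in range(N)]
sub_c, sub_d = current_cd[:half], current_cd[half:]

modified = sub_c + sub_c + sub_d + sub_d        # modified input "CCDD"
window = sine_window(N)                         # current frame window, length N

out_cc = process(modified[0:N], window)             # front section "CC"
out_cd = process(modified[half:half + N], window)   # intermediate section "CD"
out_dd = process(modified[N:2 * N], window)         # rear section "DD"

# "C" from the rear half of "CC" and the front half of "CD";
# "D" from the rear half of "CD" and the front half of "DD".
rec_c = [out_cc[half + i] + out_cd[i] for i in range(half)]
rec_d = [out_cd[half + i] + out_dd[i] for i in range(half)]
reconstructed_cd = rec_c + rec_d
```

Compared with the 2N-length window, this arrangement uses three shorter transforms per frame instead of two longer ones.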
In the examples shown in
Referring to
Here, the sub-frame section “C” includes sub-sections “C1” and “C2” as shown in the drawing, and a sub-frame section “D” also includes sub-sections “D1” and “D2” as shown in the drawing. Therefore, the modified input is “B2C1C2D1D2D2”.
The current frame window with a length of N/2 used to perform the MDCT/IMDCT includes four sections each corresponding to a half length of the sub-frame. The sub-sections of the modified input “B2C1C2D1D2D2” include smaller sections to correspond to the sections of the current frame window. For example, “B2” includes “B21B22”, “C1” includes “C11C12”, “C2” includes “C21C22”, “D1” includes “D11D12”, and “D2” includes “D21D22”.
The encoder performs the MDCT/IMDCT on the section “B2C1” and the section “C1C2” of the modified input by applying the current frame window with a length of N/2 thereto. The encoder performs the MDCT/IMDCT on the section “C1C2” and the section “C2D1” of the modified input by applying the current frame window with a length of N/2 thereto.
The encoder performs the MDCT/IMDCT on the section “C2D1” and the section “D1D2” of the modified input by applying the current frame window with a length of N/2 thereto, and performs the MDCT/IMDCT on the section “D1D2” and the section “D2D2” of the modified input by applying the current frame window with a length of N/2 thereto.
The encoder transmits the encoded information to the decoder after applying the MDCT to the inputs, and the decoder obtains the inputs to which the MDCT has been applied from the received information and applies the IMDCT on the obtained inputs.
The MDCT/IMDCT results shown in
The decoder creates outputs to which the same window as applied in the encoder is applied after applying the IMDCT. The decoder can finally reconstruct the signal of the section “C1”, that is, “C11C12”, by overlap-adding the two outputs. At this time, the signal other than the section “C1” is cancelled by applying the condition (Math Figure 2) necessary for perfect reconstruction as described above.
As a result, the encoder/decoder can finally perfectly reconstruct the current frame “CD” as shown in
In the examples shown in
Referring to
The current frame window with a length of 2N used to perform the MDCT/IMDCT includes four sections each corresponding to the length of the sub-frame.
The encoder performs the MDCT/IMDCT on the front section “CCCD” of the modified input and the rear section “CDDD” of the modified input by applying the current frame window to the front section and the rear section of the modified input.
The encoder transmits the encoded information to the decoder after applying the MDCT to the inputs, and the decoder obtains the inputs to which the MDCT has been applied from the received information and applies the IMDCT on the obtained inputs.
The MDCT/IMDCT results shown in
The decoder creates outputs to which the same window as applied in the encoder is applied after applying the IMDCT. The decoder can finally reconstruct the current frame “CD” by overlap-adding the created two outputs. At this time, the signal other than the “CD” section is cancelled by applying the condition (Math Figure 2) necessary for perfect reconstruction as described above.
In the examples shown in
Referring to
The current frame window with a length of N used to perform the MDCT/IMDCT includes four sections each corresponding to the length of the sub-frame.
The encoder applies the current frame window with a length of N to the section “CC” and the section “CD” of the modified input to perform the MDCT/IMDCT thereon and applies the current frame window with a length of N to the section “CD” and the section “DD” to perform the MDCT/IMDCT thereon.
The encoder transmits the encoded information to the decoder after applying the MDCT to the inputs, and the decoder obtains the inputs to which the MDCT has been applied from the received information and applies the IMDCT on the obtained inputs.
The MDCT/IMDCT results shown in
The decoder creates outputs to which the same window as applied in the encoder is applied after applying the IMDCT. The decoder can finally reconstruct the signal of the “C” section, that is, “C1C2”, by overlap-adding the two outputs. At this time, the signal other than the “C” section is cancelled by applying the condition (Math Figure 2) necessary for perfect reconstruction as described above.
As a result, the encoder/decoder can finally perfectly reconstruct the current frame “CD” as shown in
In the examples shown in
Referring to
The current frame window with a length of N/2 used to perform the MDCT/IMDCT includes four sections each corresponding to a half length of the sub-frame. The sub-sections of the modified input “C1C1C2D1D2D2” include smaller sections to correspond to the sections of the current frame window. For example, “C1” includes “C11C12”, “C2” includes “C21C22”, “D1” includes “D11D12”, and “D2” includes “D21D22”.
The encoder performs the MDCT/IMDCT on the section “C1C1” and the section “C1C2” of the modified input by applying the current frame window with a length of N/2 thereto. The encoder performs the MDCT/IMDCT on the section “C1C2” and the section “C2D1” of the modified input by applying the current frame window with a length of N/2 thereto.
The encoder performs the MDCT/IMDCT on the section “C2D1” and the section “D1D2” of the modified input by applying the current frame window with a length of N/2 thereto, and performs the MDCT/IMDCT on the section “D1D2” and the section “D2D2” of the modified input by applying the current frame window with a length of N/2 thereto.
The encoder transmits the encoded information to the decoder after applying the MDCT to the inputs, and the decoder obtains the inputs to which the MDCT has been applied from the received information and applies the IMDCT on the obtained inputs.
The MDCT/IMDCT results shown in
The decoder generates outputs to which the same window as applied in the encoder is applied after applying the IMDCT. The decoder can finally reconstruct the signal of the section “C1”, that is, “C11C12”, by overlap-adding the two outputs. At this time, the signal other than the “C1” section is cancelled by applying the condition (Math Figure 2) necessary for perfect reconstruction as described above.
As a result, the encoder/decoder can finally perfectly reconstruct the current frame “CD” as shown in
The process of performing the MDCT/IMDCT will be described below with reference to
When time-domain samples are input as an input signal, the buffer 210 generates a block or frame sequence of the input signal. For example, as shown in
As shown in the drawing, the length of the current frame “CD” is N and the lengths of the sub-frames “C” and “D” of the current frame “CD” are N/2.
In this embodiment, an analysis frame with a length of N is used as shown in the drawing, and thus the current frame can be used as the analysis frame.
The modification unit 220 can generate a modified input with a length of 2N by self-replicating the analysis frame. In this embodiment, the modified input “CDCD” can be generated by self-replicating the analysis frame “CD” and adding the replicated frame to the front end or the rear end of the analysis frame.
The windowing unit 230 applies the current frame window with a length of 2N to the modified input with a length of 2N. The length of the current frame window is 2N as shown in the drawing and includes four sections each corresponding to the length of each section (sub-frame “C” and “D”) of the modified input. Each section of the current frame window satisfies the relationship of Math Figure 2.
The windowing unit 230 outputs a modified input 1700 “Cw1, Dw2, Cw3, Dw4” to which the window has been applied as shown in the drawing.
The forward transform unit 240 transforms the time-domain signal into a frequency-domain signal as described with reference to
The formatter 250 generates digital information including spectral information. The formatter 250 performs a signal compressing operation and an encoding operation and performs a bit packing operation. In general, for the purpose of storage and transmission, the spectral information is binarized along with the side information in the course of compressing the time-domain signal using an encoding block to generate a digital signal. The formatter can perform processes based on a quantization scheme and a psychoacoustic model, can perform a bit packing operation, and can generate side information.
The de-formatter 310 of the IMDCT unit 300 of the decoder performs the functions associated with decoding a signal. Parameters and the side information (block/frame size, window length/shape, and the like) encoded with the binarized bits are decoded.
The side information of the extracted information can be transmitted to the inverse transform unit 320, the windowing unit 330, the modified overlap-adding processor 340, and the output processor 350 via the secondary path 360.
The inverse transform unit 320 generates frequency-domain coefficients from the spectral information extracted by the de-formatter 310 and inversely transforms the coefficients into the time-domain signal. The inverse transform used at this time corresponds to the transform method used in the encoder. In the invention, the encoder uses the MDCT and the decoder uses the IMDCT to correspond thereto.
The windowing unit 330 applies the same window as applied in the encoder to the time-domain coefficients generated through the inverse transform, that is, the IMDCT. In this embodiment, a window with a length of 2N including four sections w1, w2, w3, and w4 can be applied as shown in the drawing.
As shown in the drawing, it can be seen that an aliasing component 1730 is maintained in a result 1725 of application of the window.
The modified overlap-adding processor (or the modification unit) 340 reconstructs a signal by overlap-adding the time-domain coefficients having the window applied thereto.
The output processor 350 outputs the reconstructed signal.
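The “CDCD” flow can be sketched as follows. The details of the modified overlap-addition are not spelled out in this passage, so the sketch assumes one plausible reading: the front half and the rear half of the same windowed IMDCT output are added to each other. It further assumes a symmetric sine window, an unnormalized MDCT, and an IMDCT scaled by 2/N; all helper names are hypothetical.

```python
import math

def mdct(x):
    # 2N time samples -> N coefficients (unnormalized; assumed convention).
    n = len(x) // 2
    shift = (n + 1) / 2.0
    return [sum(x[k] * math.cos(math.pi * (k + shift) * (r + 0.5) / n)
                for k in range(2 * n))
            for r in range(n)]

def imdct(coeffs):
    # N coefficients -> 2N time samples, scaled by 2/N (assumed convention).
    n = len(coeffs)
    shift = (n + 1) / 2.0
    return [2.0 / n * sum(coeffs[r] * math.cos(math.pi * (k + shift) * (r + 0.5) / n)
                          for r in range(n))
            for k in range(2 * n)]

def sine_window(length):
    # Symmetric window with w[n]^2 + w[n + length/2]^2 = 1 (assumed example).
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

N = 8
current_cd = [math.cos(0.25 * i) + 0.05 * i for i in range(N)]   # analysis frame "CD"
window = sine_window(2 * N)                                       # sections w1..w4

# Encoder side: self-replicate, window, and transform.
modified = current_cd + current_cd                                # "CDCD"
windowed = [s * w for s, w in zip(modified, window)]              # "Cw1, Dw2, Cw3, Dw4"
coefficients = mdct(windowed)                                     # N coefficients

# Decoder side: inverse transform and the same window; aliasing is still present.
aliased = [s * w for s, w in zip(imdct(coefficients), window)]

# Assumed modified overlap-add: front half plus rear half of the same output.
reconstructed_cd = [aliased[i] + aliased[i + N] for i in range(N)]
```

With a symmetric window the aliasing terms of the two halves have opposite signs and cancel, so a single transform of the self-replicated frame is sufficient in this sketch.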
The process of performing the MDCT/IMDCT will be described below with reference to
When time-domain samples are input as an input signal, the buffer 210 generates a block or frame sequence of the input signal. For example, as shown in
In this embodiment, a look-ahead frame “Epart” with a length of M is added to the rear end of the current frame with a length of N and the result is used as the analysis frame for the purpose of the forward transform, as shown in the drawing. The look-ahead frame “Epart” is a part of the sub-frame “E” in the look-ahead frame “EF”.
The modification unit 220 can generate a modified input by self-replicating the analysis frame. In this embodiment, the modified input “CDEpartCDEpart” can be generated by self-replicating the analysis frame “CDEpart” and adding the replicated frame to the front end or the rear end of the analysis frame. At this time, a trapezoidal window with a length of N+M may be first applied to the analysis frame with a length of N+M and then the self-replication may be performed.
Specifically, as shown in
The windowing unit 230 applies the current frame window with a length of 2N+2M to the modified input with a length of 2N+2M. The length of the current frame window is 2N+2M as shown in the drawing and includes four sections each satisfying the relationship of Math Figure 2.
Here, instead of applying the current frame window with a length of 2N+2M again to the modified input generated by applying the trapezoidal window with a length of N+M, the current frame window having a trapezoidal shape can be once applied. For example, the modified input with a length of 2N+2M can be generated by applying the trapezoidal window with a length of N+M and then performing the self-replication. The modified input may be generated by self-replicating the frame section “CDEpart” itself not having the window applied thereto and then applying a window with a length 2N+2M having trapezoidal shapes connected.
The forward transform unit 240 transforms the time-domain signal into a frequency-domain signal as described with reference to
The formatter 250 generates digital information including spectral information. The formatter 250 performs a signal compressing operation and an encoding operation and performs a bit packing operation. In general, for the purpose of storage and transmission, the spectral information is binarized along with the side information in the course of compressing the time-domain signal using an encoding block to generate a digital signal. The formatter can perform processes based on a quantization scheme and a psychoacoustic model, can perform a bit packing operation, and can generate side information.
The de-formatter 310 of the IMDCT unit 300 of the decoder performs the functions associated with decoding a signal. Parameters and the side information (block/frame size, window length/shape, and the like) encoded with the binarized bits are decoded.
The side information of the extracted information can be transmitted to the inverse transform unit 320, the windowing unit 330, the modified overlap-adding processor 340, and the output processor 350 via the secondary path 360.
The inverse transform unit 320 generates frequency-domain coefficients from the spectral information extracted by the de-formatter 310 and inversely transforms the coefficients into the time-domain signal. The inverse transform used at this time corresponds to the transform method used in the encoder. In the invention, the encoder uses the MDCT and the decoder uses the IMDCT to correspond thereto.
As shown in the drawing, the inverse transform unit 320 generates a time-domain signal 1825 through the inverse transform. In this embodiment, the length of the section on which the transform is performed is 2N+2M, as described above. An aliasing component 1830 is continuously maintained and generated in the course of performing the MDCT/IMDCT.
The windowing unit 330 applies the same window as applied in the encoder to the time-domain coefficients generated through the inverse transform, that is, the IMDCT. In this embodiment, a window with a length of 2N+2M including four sections w1, w2, w3, and w4 can be applied as shown in the drawing.
As shown in
The modified overlap-adding processor (or the modification unit) 340 reconstructs a signal by overlap-adding the time-domain coefficients having the window applied thereto.
The component “Epart” included in “Cmodi” and “Dmodi” remains. For example, as shown in
On the other hand,
As described above, even when the current frame “CD” is reconstructed, the application of the trapezoidal window is not described with reference to
As shown in
Therefore, by overlap-adding the currently-reconstructed trapezoidal “CDEpart” 1870 to the previously-reconstructed trapezoidal “Cpart” 1875, the current frame “CD” 1880 can be perfectly reconstructed. At this time, “Epart” reconstructed along with the current frame “CD” can be stored in the memory for the purpose of reconstruction of a look-ahead frame “EF”.
The output processor 350 outputs the reconstructed signal.
In the above-mentioned embodiments, the signals that pass through the MDCT in the encoder, are output from the formatter and the de-formatter, and are subjected to the IMDCT can include an error due to the quantization performed by the formatter and the de-formatter. For convenience of explanation, it is assumed that when such an error occurs, it is included in the IMDCT result. However, by applying the trapezoidal window as described in Embodiment 8 and overlap-adding the result, it is possible to reduce the error of the quantization coefficients.
In Embodiments 1 to 8, it is described with reference to
Therefore, in Embodiment 8, other symmetric windows which can perfectly reconstruct the sub-frame “C” by overlap-addition can be used instead of the trapezoidal window. For example, as a window with a length of N+M having the same length as the trapezoidal window applied in
The encoder generates an input signal as a frame sequence and then specifies an analysis frame (S1910). The encoder specifies frames to be used as the analysis frame out of the overall frame sequence. Sub-frames and sub-sub-frames of the sub-frames in addition to the frames may be included in the analysis frame.
The encoder generates a modified input (S1920). As described above in the embodiments, the encoder can generate a modified input for perfectly reconstructing a signal through the MDCT/IMDCT and the overlap-addition by self-replicating the analysis frame or self-replicating a part of the analysis frame and adding the replicated frame to the analysis frame. At this time, in order to generate a modified input having a specific shape, a window having a specific shape may be applied to the analysis frame or the modified input in the course of generating the modified input.
The encoder applies the window to the modified input (S1930). The encoder can generate the process units on which the MDCT/IMDCT is to be performed by applying the window to specific sections of the modified input, for example, to the front section and the rear section, or to the front section, the intermediate section, and the rear section. In this specification, for convenience of explanation, the window applied at this time is referred to as a current frame window to indicate that it is applied for the purpose of processing the current frame.
The encoder applies the MDCT (S1940). The MDCT can be performed by the process units to which the current frame window is applied. The details of the MDCT are the same as described above.
Subsequently, the encoder can perform a process of transmitting the result of application of the MDCT to the decoder (S1950). The shown encoding process can be performed as the process of transmitting information to the decoder. At this time, the side information or the like in addition to the result of application of the MDCT can be transmitted to the decoder.
When the decoder receives the encoded information of a speech signal from the encoder, the decoder de-formats the received information (S2010). The encoded and transmitted signal is decoded through the de-formatting and the side information is extracted.
The decoder performs the IMDCT on the speech signal received from the encoder (S2020). The decoder performs the inverse transform corresponding to the transform method performed in the encoder. In the invention, the encoder performs the MDCT and the decoder performs the IMDCT. Details of the IMDCT are the same as described above.
The decoder applies the window again to the result of application of the IMDCT (S2030). The window applied by the decoder is the same window as applied in the encoder and specifies the process unit of the overlap-addition.
The decoder causes the results of application of the window to overlap (overlap-add) with each other (S2040). The speech signal subjected to the MDCT/IMDCT can be perfectly reconstructed through the overlap-addition. Details of the overlap-addition are the same as described above.
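The encoder and decoder steps can be sketched end to end for a stream of frames. This is a minimal illustration, not the claimed implementation: it uses the “CCCDDD” style of modified input, a sine window, an unnormalized MDCT, and an IMDCT scaled by 2/N, and it omits the formatter and de-formatter (quantization, bit packing, and side information). The names `encode_frame` and `decode_frame` are hypothetical.

```python
import math

def mdct(x):
    # 2N time samples -> N coefficients (unnormalized; assumed convention).
    n = len(x) // 2
    shift = (n + 1) / 2.0
    return [sum(x[k] * math.cos(math.pi * (k + shift) * (r + 0.5) / n)
                for k in range(2 * n))
            for r in range(n)]

def imdct(coeffs):
    # N coefficients -> 2N time samples, scaled by 2/N (assumed convention).
    n = len(coeffs)
    shift = (n + 1) / 2.0
    return [2.0 / n * sum(coeffs[r] * math.cos(math.pi * (k + shift) * (r + 0.5) / n)
                          for r in range(n))
            for k in range(2 * n)]

def sine_window(length):
    # Symmetric window with w[n]^2 + w[n + length/2]^2 = 1 (assumed example).
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def encode_frame(frame, window):
    # S1910: the current frame "CD" is used as the analysis frame.
    n = len(frame)
    sub_c, sub_d = frame[:n // 2], frame[n // 2:]
    # S1920: modified input "CCCDDD" built by self-replication.
    modified = sub_c + sub_c + sub_c + sub_d + sub_d + sub_d
    # S1930: current frame window on the front ("CCCD") and rear ("CDDD") sections.
    sections = (modified[:2 * n], modified[n:])
    # S1940: MDCT of each windowed section (S1950, transmission, is omitted).
    return [mdct([s * w for s, w in zip(sec, window)]) for sec in sections]

def decode_frame(coeff_pair, window):
    n = len(coeff_pair[0])
    # S2020 and S2030: IMDCT, then the same window as in the encoder.
    outputs = [[s * w for s, w in zip(imdct(c), window)] for c in coeff_pair]
    # S2040: overlap-add the rear half of the first output with the front half of the second.
    return [outputs[0][n + i] + outputs[1][i] for i in range(n)]

N = 8
signal = [math.sin(0.2 * i) + 0.3 * math.cos(0.7 * i) for i in range(4 * N)]
window = sine_window(2 * N)

reconstructed = []
for start in range(0, len(signal), N):
    frame = signal[start:start + N]
    # Each frame is encoded and decoded without reading any later samples.
    reconstructed += decode_frame(encode_frame(frame, window), window)
```

Each frame is processed using only its own samples, at the cost of two transforms per frame in this particular sketch.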
For the purpose of convenience for explanation, the sections of a signal are referred to as “frames”, “sub-frames”, “sub-sections”, and the like. However, this is intended only for convenience for explanation, and each section may be considered simply as a “block” of a signal for the purpose of easy understanding.
While the methods in the above-mentioned exemplary system have been described on the basis of flowcharts including a series of steps or blocks, the invention is not limited to the order of steps and a certain step may be performed in a step or an order other than described above or at the same time as described above. The above-mentioned embodiments can include various examples. Therefore, it should be understood that the invention includes all other substitutions, changes, and modifications belonging to the appended claims.
When it is mentioned above that an element is “connected to” or “coupled to” another element, it should be understood that still another element may be interposed therebetween, as well as that the element may be connected or coupled directly to another element. On the contrary, when it is mentioned that an element is “connected directly to” or “coupled directly to” another element, it should be understood that still another element is not interposed therebetween.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/KR2011/008981 | 11/23/2011 | WO | 00 | 5/23/2013 |
Number | Date | Country | |
---|---|---|---|
61417214 | Nov 2010 | US | |
61531582 | Sep 2011 | US |