A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present patent application claims priority to the corresponding provisional patent application Ser. No. 60/589,286, entitled “Method and Apparatus for Coding Audio Signals,” filed on Jul. 19, 2004.
The present invention relates to the field of signal coding; more particularly, the present invention relates to coding of waveforms, such as, but not limited to, audio signals using sinusoidal prediction.
After the introduction of the CD format in the mid eighties, a flurry of applications involving digital audio and multimedia technologies started to emerge. Due to the need for common standards, the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) formed a standardization group responsible for the development of various multimedia standards, including audio coding. The group is known as the Moving Picture Experts Group (MPEG), and has successfully developed various standards for a large array of multimedia applications. For example, see M. Bosi and R. Goldberg, Introduction to Digital Audio Coding and Standards, Kluwer Academic Publishers, 2003.
Audio compression technologies are essential for the transmission of high-quality audio signals over band-limited channels, such as a wireless channel. Furthermore, in the context of two-way communications, compression algorithms with low delay are required.
An audio coder consists of two major blocks: an encoder and a decoder. The encoder takes an input audio signal, which in general is a discrete-time signal with discrete amplitude in the pulse code modulation (PCM) format, and transforms it into an encoded bit-stream. The encoder is designed to generate a bit-stream having a bit-rate that is lower than that of the input audio signal, achieving therefore the goal of compression. The decoder takes the encoded bit-stream to generate the output audio signal, which approximates the input audio signal in some sense.
Existing audio coders may be classified into one of three categories: waveform coders, transform coders, and parametric coders.
Waveform coders attempt to directly preserve the waveform of an audio signal. Examples include the ITU-T G.711 PCM standard, the ITU-T G.726 ADPCM standard, and the ITU-T G.722 standard. See, for example, W. Chu, Speech Coding Algorithms: Foundation and Evolution of Standardized Coders, John Wiley & Sons, 2003. Generally speaking, waveform coders provide good quality only at relatively high bit-rate, due to the large amount of information necessary to preserve the waveform of the signal.
That is, waveform coders require a large amount of bits to preserve the waveform of an audio signal and are thus not suitable for low-to-medium-bitrate applications.
Other audio coders are classified as transform coders, or subband coders. These coders map the signal into alternative domains, normally related to the frequency content of the signal. By mapping the signal into alternative domains, energy compaction can be realized, leading to high coding efficiency. Examples of this class of coders include the various coders of the MPEG-1 and MPEG-2 families: Layer-I, Layer-II, Layer-III (MP3), and advanced audio coding (AAC). M. Bosi and R. Goldberg, Introduction to Digital Audio Coding and Standards, Kluwer Academic Publishers, 2003. These coders provide good quality at medium bit-rate, and are the most popular for music distribution applications.
Also, transform coders provide better quality than waveform coders at low-to-medium bitrates. However, the coding delay introduced by the mapping renders them unsuitable for applications, such as two-way communications, where a low coding delay is required. For more information on transform coders, see T. Painter and A. Spanias, “Perceptual Coding of Digital Audio,” Proceedings of the IEEE, Vol. 88, No. 4, pp. 451-513, April 2000.
More recently, researchers have explored the use of models in audio coding, where the model is controlled by a few parameters. By estimating the parameters of the model from the input signal, very high coding efficiency can be achieved. These kinds of coders are referred to as parametric coders. For more information on parametric coders, see B. Edler and H. Purnhagen, “Concepts for Hybrid Audio Coding Schemes Based on Parametric Techniques,” IEEE ICASSP, pp. II-1817-II-1820, 2002, and H. Purnhagen, “Advances in Parametric Audio Coding,” IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. W99-1 to W99-4, October 1999. An example of a parametric coder is the MPEG-4 harmonic and individual lines plus noise (HILN) coder, where the input audio signal is decomposed into harmonic tones, individual sine waves (lines), and noise, which are separately quantized and transmitted to the decoder. The technique is also known as sinusoidal coding, where the parameters of a set of sinusoids, including amplitude, frequency, and phase, are extracted, quantized, and included as part of the bit-stream. See H. Purnhagen, N. Meine, and B. Edler, “Speeding up HILN—MPEG-4 Parametric Audio Encoding with Reduced Complexity,” 109th AES Convention, Los Angeles, September 2000, and ISO/IEC, Information Technology—Coding of Audio-Visual Objects—Part 3: Audio, Amendment 1: Audio Extensions, Parametric Audio Coding (HILN), 14496-3, 2000. An audio coder based on principles similar to those of HILN can be found in U.S. Pat. No. 6,266,644, entitled “Audio Encoding Apparatus and Methods,” issued Jul. 24, 2001. Other schemes following similar principles can be found in A. Ooment, A. Cornelis, and D. Brinker, “Sinusoidal Coding,” U.S. Patent Application Publication No. US 2002/0007268 A1, published Jan. 17, 2002, and T. Verma, “A Perceptually Based Audio Signal Model with Application to Scalable Audio Compression,” Ph.D. dissertation, Stanford University, October 1999.
The principles of parametric coding have been widely used in speech coding applications, where a source-filter model is used to capture the dynamics of the speech signal, enabling low bit-rate operation. The code excited linear prediction (CELP) algorithm is perhaps the most successful method in speech coding, with numerous international standards based on it. For more information on CELP, see W. Chu, Speech Coding Algorithms: Foundation and Evolution of Standardized Coders, John Wiley & Sons, 2003. The problem with these coders is that the adopted model lacks the flexibility to capture the behavior of general audio signals, leading to poor performance when the input signal is different from speech.
Sinusoidal coders are highly suitable for modeling a wide class of audio signals, since in many instances such signals have a periodic appearance in the time domain. By combining sinusoids with a noise model, sinusoidal coders have the potential to provide good quality at low bit-rates. All sinusoidal coders developed until recently operate in a forward-adaptive manner, meaning that the parameters of the individual sinusoids—including amplitude, frequency, and phase—must be explicitly transmitted as part of the bit-stream. Because this transmission is expensive, only a selected number of sinusoids can be transmitted for low bit-rate applications. See H. Purnhagen, N. Meine, and B. Edler, “Sinusoidal Coding Using Loudness-Based Component Selection,” IEEE ICASSP, pp. II-1817-II-1820, 2002. Due to this constraint, the achievable quality of sinusoidal coders, such as the MPEG-4 HILN standard, is quite modest.
A method and apparatus for coding information are described. In one embodiment, an encoder for encoding a first set of data samples comprises a waveform analyzer to determine a set of waveform parameters from a second set of data samples, a waveform synthesizer to generate a set of predicted samples from the set of waveform parameters; and a first encoder to generate a bit-stream based on a difference between the first set of data samples and the set of predicted samples.
The present invention will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.
A method and apparatus are described herein for coding signals. These signals may be audio signals or other types of signals. In one embodiment, the coding is performed using a waveform analyzer. The waveform analyzer extracts a set of waveform parameters from previously coded samples. A prediction scheme uses the waveform parameters to generate a prediction with respect to which samples are coded. The prediction scheme may include waveform matching. In one embodiment of waveform matching, given the input signal samples, the waveform that best matches the signal is found inside a codebook, or dictionary. The stored codebook, or dictionary, contains a number of signal vectors. Within the codebook, it is also possible to store signal samples representing the prediction associated with each signal vector, or codevector. Therefore, the prediction is read from the codebook based on the matching results.
In one embodiment, the waveform matching technique is sinusoidal prediction. In sinusoidal prediction, the input signal is matched against the sum of a group of sinusoids. More specifically, the signal is analyzed to extract a number of sinusoids and the set of the extracted sinusoids is then used to form the prediction. Depending on the application, the prediction can be one or several samples toward the future. In one embodiment, the sinusoidal analysis procedure includes estimating parameters of the sinusoidal components from the input signal and, based on the estimated parameters, forming a prediction using an oscillator consisting of the sum of a number of sinusoids.
In one embodiment, sinusoidal prediction is incorporated into the framework of a backward adaptive coding system, where redundancies of the signal are removed based on past quantized samples of the signal. Sinusoidal prediction can also be used within the framework of a lossless coding system.
In the following description, numerous details are set forth to provide a more thorough explanation of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
Some portions of the detailed descriptions which follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory (“ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.
System and Coder Overview
More specifically, encoder 200 comprises a buffer 214 containing a number of previously reconstructed samples 205. In one embodiment, the size of buffer 214 is larger than the size of the set of input samples 201. For example, buffer 214 may contain 140 reconstructed samples. Initially, the value of the samples in buffer 214 may be set to a default value. For example, all values may be set to 0. In one embodiment, buffer 214 operates in a first-in, first-out mode. That is, when a sample is inserted into buffer 214, a sample that has been in buffer 214 the longest amount of time is removed from buffer 214 so as to keep constant the number of samples in buffer 214.
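As an illustration, such a first-in, first-out buffer can be maintained with a simple shift. The following is a minimal sketch in C; the fixed buffer length and the function name are illustrative assumptions, not mandated by the embodiment.

#define BUF_SIZE 140   /* example buffer size from the embodiment above */

/* Insert n new reconstructed samples into the FIFO buffer (n <= BUF_SIZE):
   the oldest samples are shifted out so the buffer length stays constant. */
void buffer_insert(double buf[BUF_SIZE], const double *new_samples, int n)
{
    int i;
    for (i = 0; i < BUF_SIZE - n; i++)        /* shift old samples toward the head */
        buf[i] = buf[i + n];
    for (i = 0; i < n; i++)                   /* append the n newest samples */
        buf[BUF_SIZE - n + i] = new_samples[i];
}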
Prediction generator 212 generates a set of predicted samples 206 from a set of analysis samples 208 stored in buffer 214. In one embodiment, prediction generator 212 comprises a waveform analyzer 221 and a waveform synthesizer 220 as further described below. Waveform analyzer 221 receives analysis samples 208 from buffer 214 and generates a number of waveform parameters 207. In one embodiment, analysis samples 208 comprise all the samples stored in buffer 214. In one embodiment, waveform parameters 207 include a set of amplitudes, phases and frequencies describing one or more waveforms. Waveform parameters 207 may be derived such that the sum of waveforms described by waveform parameters 207 approximates analysis samples 208. An exemplary process by which waveform parameters 207 are computed is further described below. In one embodiment, waveform parameters 207 describe one or more sinusoids. Waveform synthesizer 220 receives waveform parameters 207 from waveform analyzer 221 and generates a set of predicted samples 206 based on the received waveform parameters 207.
Subtractor 210 subtracts predicted samples 206 received from prediction generator 212 from input samples 201 and outputs a set of residual samples 202. Residual encoder 211 receives residual samples 202 from subtractor 210 and outputs codeword 203, which is a coded representation of residual samples 202. Residual encoder 211 further generates a set of reconstructed residual samples 204.
In one embodiment, residual encoder 211 uses a vector quantizer. In such a case residual encoder 211 matches residual samples 202 with a dictionary of codevectors and selects the codevector that best approximates residual samples 202. Codeword 203 may represent the index of the selected codevector in the dictionary of codevectors. The set of reconstructed residual samples 204 is given by the selected codevector. In an alternate embodiment, residual encoder 211 uses a lossless entropy encoder to generate codeword 203 from residual samples 202. For example, the lossless entropy encoder may use algorithms such as those described in “Lossless Coding Standards for Space Data Systems” by Robert F. Rice, 30th Asilomar Conference on Signals, Systems and Computers, Vol. 1, pp. 577-585, 1996. In one embodiment, reconstructed residual samples 204 are equal to residual samples 202.
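For the vector-quantizer embodiment, a minimal sketch of the dictionary search is shown below. The flat dictionary layout, the mean-squared-error matching criterion, and the function name are assumptions made for illustration; the embodiment does not prescribe them.

#include <float.h>

/* Find the codevector that best approximates the residual in the
   mean-squared-error sense; the returned index serves as the codeword. */
int vq_encode(const double *residual, int n,
              const double *dictionary, int num_codevectors,
              double *reconstructed)
{
    int best = 0, i, k;
    double best_err = DBL_MAX;

    for (i = 0; i < num_codevectors; i++) {
        const double *cv = dictionary + (long)i * n;
        double err = 0.0;
        for (k = 0; k < n; k++) {
            double d = residual[k] - cv[k];
            err += d * d;
        }
        if (err < best_err) {
            best_err = err;
            best = i;
        }
    }
    for (k = 0; k < n; k++)                   /* reconstructed residual = selected codevector */
        reconstructed[k] = dictionary[(long)best * n + k];
    return best;                              /* codeword: index of the selected codevector */
}

The decoder side of this embodiment is the corresponding table lookup: the codeword indexes the same dictionary to recover the reconstructed residual samples.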
Encoder 200 further comprises adder 213 that adds reconstructed residual samples 204 received from residual encoder 211 and predicted samples 206 received from prediction generator 212 to form a set of reconstructed samples 205. Reconstructed samples 205 are then stored in buffer 214.
Referring to
With the predicted samples, processing logic subtracts the set of predicted samples from the input samples, resulting in a set of residual samples (processing block 304). Processing logic encodes the set of residual samples into a codeword and generates a set of reconstructed residual samples based on the codeword (processing block 305). Afterwards, processing logic adds the set of reconstructed residual samples to the set of predicted samples to form a set of reconstructed samples (processing block 306). Processing logic stores the set of reconstructed samples into the buffer (processing block 307).
Processing logic determines whether more input samples need to be coded (processing block 308). If there are more input samples to be coded, the process transitions to processing block 301 and the process is repeated for the next set of input samples. Otherwise, the encoding process terminates.
Decoder 400 comprises a buffer 412 containing a number of previously decoded samples (e.g., previously generated output samples 403). In one embodiment, the size of buffer 412 is larger than the size of the set of input samples. For example, buffer 412 may contain 160 reconstructed samples. Initially, the value of the samples in buffer 412 may be set to a default value. For example, all values may be set to 0. In one embodiment, buffer 412 may operate in a first-in, first-out mode. That is, when a sample is inserted into buffer 412, a sample that has been in buffer 412 the longest amount of time is removed from buffer 412 in order to keep constant the number of samples in buffer 412.
Residual decoder 410 receives codeword 401 and outputs a set of reconstructed residual samples 402. In one embodiment, residual decoder 410 uses a dictionary of codevectors. Codeword 401 may represent the index of a selected codevector in the dictionary of codevectors. Reconstructed residual samples 402 are given by the selected codevector. In an alternate embodiment, residual decoder 410 may use a lossless entropy decoder to generate reconstructed residual samples 402 from codeword 401. For example, the lossless entropy decoder may use algorithms such as those described in “Lossless Coding Standards for Space Data Systems” by Robert F. Rice, 30th Asilomar Conference on Signals, Systems and Computers, Vol. 1, pp. 577-585, 1996.
Decoder 400 further comprises adder 411 that adds reconstructed residual samples 402 received from residual decoder 410 and predicted samples 405 received from prediction generator 413 to form output samples 403. Output samples 403 are then stored in buffer 412.
Prediction generator 413 generates a set of predicted samples 405 from a set of analysis samples 404 stored in buffer 412. In one embodiment, prediction generator 413 comprises a waveform analyzer 421 and a waveform synthesizer 420. Waveform analyzer 421 receives analysis samples 404 from buffer 412 and generates a number of waveform parameters 406. In one embodiment, analysis samples 404 comprise all the samples stored in buffer 412. Waveform parameters 406 may include a set of amplitudes, phases and frequencies describing one or more waveforms. In one embodiment, waveform parameters 406 are derived such that the sum of waveforms described by waveform parameters 406 approximates analysis samples 404. An example process by which the waveform parameters 406 are computed is further described below. In one embodiment, waveform parameters 406 describe one or more sinusoids. Waveform synthesizer 420 receives waveform parameters 406 from waveform analyzer 421 and generates predicted samples 405 based on received waveform parameters 406.
Referring to
Using the waveform parameters, processing logic generates a set of predicted samples based on the set of waveform parameters (processing block 503). Then, processing logic decodes the codeword and generates a set of reconstructed residual samples based on the codeword (processing block 504) and adds the set of reconstructed residual samples to the set of predicted samples to form a set of reconstructed samples (processing block 505). Processing logic stores the set of reconstructed samples in the buffer (processing block 506) and also outputs the reconstructed samples (processing block 507).
After outputting reconstructed samples, processing logic determines whether more codewords are available for decoding (processing block 508). If more codewords are available, the process transitions to processing block 501 where the process is repeated for the next codeword. Otherwise, the process ends.
In one embodiment, the waveform matching prediction technique is sinusoidal prediction.
Referring to
Referring to
If the stop condition is satisfied, processing transitions to processing block 608 where processing logic outputs predicted samples and the process ends. Otherwise, processing transitions to processing block 604 where processing logic determines parameters of a sinusoid from the set of analysis samples.
The parameters of the sinusoid may include an amplitude, a phase and a frequency. The parameters of the sinusoid may be chosen such as to reduce a difference between the sinusoid and the set of analysis samples. For example, the method described in “Speech Analysis/Synthesis and Modification Using an Analysis-by-Synthesis/Overlap-Add Sinusoidal Model” by E. George and M. Smith, IEEE Transactions on Speech and Audio Processing, Vol. 5, No. 5, pp. 389-406, September 1997, may be used.
Afterwards, processing logic subtracts the determined sinusoid from the set of analysis samples (processing block 605), with the resultant samples used as analysis samples in the next iteration of the loop. Processing logic then determines whether the extracted sinusoid satisfies an inclusion condition (processing block 606). For example, the inclusion condition may be that the energy of the determined sinusoid is larger than a predetermined fraction of the energy in the set of analysis samples. If the inclusion condition is satisfied, processing logic generates a prediction by oscillating using the parameters of the extracted sinusoids and adding the prediction (that was based on the extracted sinusoid) to the predicted samples (processing block 607).
Waveform Matching Prediction Generation
The prediction scheme described herein is based on waveform matching. The signal is analyzed in an analysis interval having Na samples, and the results of the analysis are used for prediction within a synthesis interval of length Ns. This is a forward prediction in which the future is predicted from the past.
Referring to
In one embodiment, the data structure comprises a codebook. In such a case, the codevector within the codebook that best matches the input signal samples is selected. In one embodiment, the prediction is then obtained directly from the codebook, where each codevector is associated with a group of samples dedicated to the purpose of prediction.
One embodiment of the structure of the codebook is shown in
An Embodiment for Sinusoidal Prediction
In the following discussion, it is assumed that for a certain frame (or a block of samples), the analysis interval corresponds to n∈[0, Na−1], and the synthesis interval corresponds to n∈[Na, Na+Ns−1]. The sinusoidal analysis procedure is performed in the analysis interval, where the frequencies (wi), amplitudes (ai), and phases (θi) for i=1 to P are determined. In order to perform sinusoidal analysis, in one embodiment, an analysis-by-synthesis (AbS) procedure is used: an iterative method in which the sinusoids are extracted from the input signal in a sequential manner. After extracting one sinusoid, the sinusoid itself is subtracted from the input signal, forming in this way a residual signal; the residual signal then becomes the input signal for analysis in the next step, where another sinusoid is extracted. This process is performed through a search procedure in which a set of candidate frequencies is evaluated, with the highest-energy sinusoid being extracted at each step. In one embodiment, the candidate frequencies are obtained by sampling the interval [0, π] uniformly, given by
where Nw is the number of candidate frequencies; its value is a tradeoff between quality and complexity. Note that the number of sinusoids P is a function of the signal and is determined based on the energy of the reconstructed signal, denoted by Er(P). That is, during the execution of the AbS procedure, P starts from zero and increases by one after each extracted sinusoid; when the condition
Er(P)/Es > QUIT_RATIO   (1.2)
is reached, the procedure is terminated; otherwise, it continues to extract sinusoids until that condition is met. In equation (1.2), Es is the energy of the original input signal and QUIT_RATIO is a constant, with a typical value of 0.95.
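The following C sketch illustrates one way such an AbS extraction loop could be organized. The uniform candidate grid written here, the correlation-based amplitude and phase estimates, and the approximation of Er(P) by the summed energies of the extracted sinusoids are illustrative simplifications, not the embodiment's exact procedure.

#include <math.h>

#define NW          64      /* number of candidate frequencies (quality/complexity tradeoff) */
#define MAX_SIN     32      /* safety limit on the number of extracted sinusoids */
#define QUIT_RATIO  0.95    /* typical value given in the text */

/* Sequentially extract sinusoids from x[0..na-1] until the accumulated
   energy of the extracted sinusoids (approximating Er(P)) exceeds
   QUIT_RATIO times the energy Es of the original analysis signal. */
int abs_extract(double *x, int na, double a[], double w[], double th[])
{
    double es = 0.0, er = 0.0;
    int p = 0, n, k;

    for (n = 0; n < na; n++)
        es += x[n] * x[n];                      /* Es: energy of the original input */

    while (p < MAX_SIN && er <= QUIT_RATIO * es) {
        double best_pow = 0.0, best_c = 0.0, best_s = 0.0, best_w = 0.0;

        for (k = 0; k < NW; k++) {              /* search a uniform grid on [0, pi] */
            double wk = k * M_PI / (NW - 1), c = 0.0, s = 0.0;
            for (n = 0; n < na; n++) {
                c += x[n] * cos(wk * n);
                s -= x[n] * sin(wk * n);
            }
            if (c * c + s * s > best_pow) {     /* keep the strongest candidate */
                best_pow = c * c + s * s;
                best_c = c; best_s = s; best_w = wk;
            }
        }
        if (best_pow <= 0.0)
            break;                              /* nothing left to extract */

        w[p]  = best_w;
        a[p]  = 2.0 * sqrt(best_pow) / na;      /* approximate least-squares amplitude */
        th[p] = atan2(best_s, best_c);          /* and phase */

        for (n = 0; n < na; n++) {              /* subtract the sinusoid; the residual */
            double sn = a[p] * cos(w[p] * n + th[p]);
            x[n] -= sn;                         /* becomes the next analysis signal */
            er += sn * sn;                      /* accumulate extracted energy */
        }
        p++;
    }
    return p;                                   /* P: number of extracted sinusoids */
}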
The reconstructed signal inside the analysis interval is
each sinusoid has an energy given by
Then the prediction is formed with
with pi, i=1 to P the decision flags associated with the ith sinusoid. The flag is equal to 0 or 1 and its purpose is to select or deselect the ith sinusoid for prediction.
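Expressed as code under the conventions of the sketch above, forming the prediction amounts to running an oscillator for each selected sinusoid over the synthesis interval. The helper below is hypothetical, not the embodiment's own routine.

#include <math.h>

/* Form the prediction over the synthesis interval n = Na .. Na+Ns-1 by
   extending each selected sinusoid forward in time; p[i] is the 0/1
   decision flag associated with the i-th sinusoid. */
void form_prediction(const double a[], const double w[], const double th[],
                     const int p[], int P, int na, int ns, double pred[])
{
    int i, n;

    for (n = 0; n < ns; n++) {
        pred[n] = 0.0;
        for (i = 0; i < P; i++)
            if (p[i])                           /* include only selected sinusoids */
                pred[n] += a[i] * cos(w[i] * (na + n) + th[i]);
    }
}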
Thus, once the analysis procedure is completed, it is necessary to evaluate the extracted sinusoids to decide which ones are included in the actual prediction.
Referring to
Referring to
In one embodiment, in order to determine whether a component of frequency wi has been present in the past M frames, a small neighborhood near the intended frequency is checked. For example, the i−1, i, and i+1 components of the past frame may be examined in order to make a decision to use the sinusoid. In alternative embodiments, the check can be extended further into the past to cover the data of M frames (e.g., 2 to 3 frames).
The following C code implements a recursive algorithm to verify the time/frequency points, with the result used to decide whether a certain sinusoid should be adopted for prediction.
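The original listing is not reproduced in this text. The following is a minimal sketch of one way such a recursive verification could be written; the function name, the fixed ±1 neighborhood, and the buffer dimensions are assumptions chosen to be consistent with the description that follows, not taken from the original code.

#define M   3       /* length of the history buffer (number of past frames) */
#define NW  64      /* number of candidate frequencies */

int f[NW][M + 1];   /* f[k][m] = 1 if candidate frequency k was present m frames ago */

/* Recursively verify that candidate frequency k, or one of its immediate
   neighbors k-1 and k+1, has been present in every one of the past M frames.
   A sinusoid at index k is adopted for prediction when verify(k, 1) returns 1. */
int verify(int k, int m)
{
    int j;

    if (m > M)                              /* all past frames checked successfully */
        return 1;
    for (j = k - 1; j <= k + 1; j++) {      /* small neighborhood around frequency k */
        if (j < 0 || j >= NW)
            continue;
        if (f[j][m] && verify(j, m + 1))
            return 1;                       /* the component continues into frame m */
    }
    return 0;                               /* no continuous time/frequency track found */
}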
In the previous code, M is the length of the history buffer and f[k][m] is the history buffer, where each element is either 0 or 1 and is used to keep track of the sinusoidal components present in the past. The value of f is determined with
where w[k], k=0 to Nw−1 are the Nw candidate frequencies in equation (1.1). The array is shifted in the next frame in the sense that
f[k][m] ← f[k][m−1]; m = M, M−1, . . . , 1   (1.7)
Thus, the results for a total of M past frames are stored in the array, which are used to decide whether a certain frequency component has been present for a long enough period of time. Note that m=0 corresponds to the current frame in equation (1.7).
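Under the same conventions as the sketch above, the per-frame shift of equation (1.7) amounts to the following; column m = 0 is then refilled from the frequencies selected in the new frame.

/* Shift the history buffer by one frame: the current frame (m = 0) becomes
   one frame old, and the oldest entry drops out. */
void shift_history(void)
{
    int k, m;

    for (k = 0; k < NW; k++)
        for (m = M; m >= 1; m--)
            f[k][m] = f[k][m - 1];
}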
Additional Coding Embodiments
A predicted signal 1211 is generated using sinusoidal analysis 1205 and sinusoidal oscillator 1206. Sinusoidal analysis processing 1205 receives previously received samples of input signal 1201 from buffer 1202 and generates parameters of the sinusoids 1212. In one embodiment, sinusoidal analysis processing 1205 extracts the amplitudes, frequencies, and phases of a number of sinusoids to generate sinusoid parameters 1212. Using sinusoid parameters 1212, sinusoidal oscillator 1206 generates a prediction in the form of prediction signal 1211.
The predicted signal xp 1211 is subtracted from input signal 1201 using adder (subtractor) 1203 to generate a residual signal 1210. Entropy encoder 1204 receives and encodes residual signal 1210 to produce bit-stream 1220. Entropy encoder 1204 may comprise any lossless entropy encoder known in the art. Bit-stream 1220 is output from the encoder and may be stored or sent to another location.
Referring to
A predicted signal 1211 is generated using sinusoidal analysis 1205 and sinusoidal oscillator 1206. Sinusoidal analysis processing 1205 receives previously received samples of input signal 1201 from buffer 1202 and generates parameters of the sinusoids 1212. In one embodiment, sinusoidal analysis processing 1205 extracts the amplitudes, frequencies, and phases of a number of sinusoids to generate sinusoid parameters 1212. Using sinusoid parameters 1212, sinusoidal oscillator 1206 generates a prediction in the form of prediction signal 1211.
The predicted signal xp 1211 is subtracted from input signal 1201 using adder (subtractor) 1203 to generate a residual signal 1210. Encoder 1400 receives and encodes residual signal 1210 to produce bit-stream 1401. Encoder 1400 may comprise any lossy coder known in the art. Bit-stream 1401 is output from the encoder and may be stored or sent to another location.
Decoder 1402 also receives and decodes bit-stream 1401 to produce a quantized residual signal 1410. Adder 1403 adds quantized residual signal 1410 to predicted signal 1211 to produce decoded signal 1411. Buffer 1404 buffers decoded signal 1411 to group a number of samples together for processing purposes. Buffer 1404 provides these samples to sinusoidal analysis 1205 for use in generating future predictions.
Prediction signal 1511 is generated using sinusoidal analysis 1505 and sinusoidal oscillator 1506. Sinusoidal analysis processing 1505 receives previously generated samples of decoded signal 1501 from buffer 1502 and generates parameters of the sinusoids 1512. In one embodiment, sinusoidal analysis processing 1505 extracts the amplitudes, frequencies, and phases of a number of sinusoids to generate sinusoid parameters 1512. Using sinusoid parameters 1512, sinusoidal oscillator 1506 generates a prediction in the form of prediction signal 1511. Thus, the decoded signal is used to identify the parameters of the predictor.
The described system is backward adaptive because the parameters of the predictor and the prediction are based on the decoded signal, hence no explicit transmission of the parameters of the predictor is necessary.
Note that the decoder of
Referring to
Embodiments with Switched Quantizers
In one embodiment, the coders described above are extended to include two quantizers that are selected based on the condition of the input signal. An advantage of this extension is that it enables selection of one of the two quantizers depending on the performance of the predictor. If the predictor is performing well, the encoder quantizes the residual; otherwise, the encoder quantizes the input signal directly. The bit-stream of this coder has two components: an index into one of the quantizers and a 1-bit decision flag indicating the selected quantizer.
One mechanism by which the quantizer is selected is based on the prediction gain, defined by
with x the input signal, xp the predicted signal, and e the residual. The summations are performed within the synthesis interval. Thus, if the performance of the predictor is good (for instance, PG>0), then the encoder quantizes the residual signal; otherwise, the encoder quantizes the input signal directly.
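As a sketch, assuming the conventional decibel definition of prediction gain, PG = 10·log10(Σ x²[n] / Σ e²[n]) over the synthesis interval (an assumption, not necessarily the embodiment's exact expression), the selection could be implemented as follows.

#include <math.h>

/* Return nonzero when the predictor is performing well (PG > 0 dB), i.e. the
   residual should be quantized; otherwise the input is quantized directly.
   The mapping of this result to the transmitted 1-bit flag is a convention. */
int select_residual_quantizer(const double *x, const double *e, int ns)
{
    double ex = 0.0, ee = 0.0;
    int n;

    for (n = 0; n < ns; n++) {        /* energies over the synthesis interval */
        ex += x[n] * x[n];
        ee += e[n] * e[n];
    }
    if (ee <= 0.0)
        return 1;                     /* perfect prediction: quantize the residual */
    return 10.0 * log10(ex / ee) > 0.0;
}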
A predicted signal 1711 is generated using sinusoidal analysis 1705 and sinusoidal oscillator 1706. Sinusoidal analysis processing 1705 receives previously received samples of decoded signal 1741 from buffer 1744 and generates parameters of the sinusoids 1712. In one embodiment, sinusoidal analysis processing 1705 extracts the amplitudes, frequencies, and phases of a number of sinusoids to generate sinusoid parameters 1712. Using sinusoid parameters 1712, sinusoidal oscillator 1706 generates a prediction in the form of prediction signal 1711.
The predicted signal xp 1711 is subtracted from input signal 1701 using adder (subtractor) 1703 to generate a residual signal 1710. Residual signal 1710 is sent to decision logic 1730 and encoder 1704B.
Encoder 1704B receives and encodes residual signal 1710 to produce an index 1735 that may be selected for output using switch 1751.
Decoder 1714B also receives and decodes the output of encoder 1704B to produce a quantized residual signal 1720. Adder 1715 adds quantized residual signal 1720 to predicted signal 1711 to produce a decoded signal that is sent to switch 1752 for possible selection as an input into buffer 1744. Buffer 1744 buffers decoded signals to group a number of samples together for processing purposes so that several samples may be processed at once. Buffer 1744 provides these samples to sinusoidal analysis 1705 for use in generating future predictions.
Encoder 1704A also receives samples of the input signal from buffer 1702 and encodes them. The encoded output is sent to an input of switch 1751 for possible selection as the index output from the encoder. The encoded output is also sent to decoder 1714A for decoding. The decoded output of decoder 1714A is sent to switch 1752 for possible selection as an input into buffer 1744.
Decision logic 1730 receives the samples of the input signal from buffer 1702 along with the residual signal 1710 and determines whether to select the output of encoder 1704A or 1704B as the index output of the encoder. This determination is made as described herein and is output from decision logic as decision flag 1732.
Switch 1751 is controlled via decision logic 1730 to output an index from either encoder 1704A or 1704B, while switch 1752 is controlled via decision logic 1730 to enable selection of the output of decoder 1714A or adder 1715 to be input into buffer 1744.
Referring to
Processing logic then determines if the decision flag is set to 1 (processing block 1782). If the decision logic block decides to quantize the input signal, processing logic quantizes the input signal with the index transmitted as part of the bit-stream (processing block 1783); otherwise, processing logic quantizes the residual signal with the index transmitted as part of the bit-stream (processing block 1784). Then processing logic obtains the decoded signal by adding the decoded residual signal to the prediction signal (processing block 1785). The result is stored in a buffer.
Using the decoded signal, processing logic determines the parameters of the predictor (processing block 1786). Using the parameters, processing logic generates the prediction signal using the predictor together with the decoded signal (processing block 1787). The encoding process continues until no additional input samples are available.
Switch 1852 selects either the output of decoder 1804A or the output of adder 1803 as decoded signal 1801, the output of the decoder, based on decision flag 1840.
Buffer 1802 stores decoded signal 1801 as well. Buffer 1802 groups a number of samples together for processing purposes so that several samples may be processed at once.
Prediction signal 1811 is generated using sinusoidal analysis 1805 and sinusoidal oscillator 1806. Sinusoidal analysis processing 1805 receives previously generated samples of decoded signal 1801 from buffer 1802 and generates parameters of the sinusoids 1812. In one embodiment, sinusoidal analysis processing 1805 extracts the amplitudes, frequencies, and phases of a number of sinusoids to generate sinusoid parameters 1812. Using sinusoid parameters 1812, sinusoidal oscillator 1806 generates a prediction in the form of prediction signal 1811. Thus, the decoded signal is used to identify the parameters of the predictor.
The process begins by processing logic recovering an index and a decision flag from the bit-stream (processing block 1881). Depending on the value of the decision flag, processing logic either decodes the index to obtain the decoded signal (processing block 1883), or decodes the residual signal (processing block 1884). In the latter case, processing logic finds the decoded signal by adding the decoded residual signal to the prediction signal.
Using the decoded signal, processing logic then determines the parameters of the sinusoids (processing block 1886). Using the parameters, processing logic generates the prediction signal using the parameters of the sinusoids together with the decoded signal (processing block 1887).
The decoding process continues until no additional data from the bit-stream are available.
An Embodiment with Signal Switching for Lossless Coding
In alternative embodiments, encoding and decoding mechanisms that include a signal switching mechanism are disclosed. In this case, the coding goes through the sinusoidal analysis process, where the amplitudes, frequencies, and phases of a number of sinusoids are extracted and then used by the sinusoidal oscillator to generate the prediction.
A predicted signal 1911 is generated using sinusoidal analysis processing 1905 and sinusoidal oscillator 1906. Sinusoidal analysis processing 1905 receives buffered samples of input signal 1901 from buffer 1902 and generates parameters of the sinusoids 1912. In one embodiment, sinusoidal analysis processing 1905 extracts the amplitudes, frequencies, and phases of a number of sinusoids to generate sinusoid parameters 1912. Using sinusoid parameters 1912, sinusoidal oscillator 1906 generates a prediction in the form of prediction signal 1911.
The predicted signal xp 1911 is subtracted from input signal 1901 using adder (subtractor) 1903 to generate a residual signal 1910. Residual signal 1910 is sent to decision logic 1930 and switch 1920.
Decision logic 1930 receives the samples of the input signal from buffer 1902 along with the residual signal 1910 and determines whether to select the input signal samples stored in buffer 1902 or the residual signal 1910 to be encoded by the entropy encoder 1904. This determination is made as described herein and is output from decision logic as decision flag 1932. Flag 1932 is sent as part of the bit-stream and controls the position of switch 1920.
Encoder 1904 receives and encodes the output of switch 1920 to produce an index 1931.
Referring to
Adder 2003 adds the output of the entropy decoder 2010 to prediction signal 2011. Prediction signal 2011 is generated using sinusoidal analysis 2005 and sinusoidal oscillator 2006. Sinusoidal analysis processing 2005 receives previously generated samples of decoded signal 2001 from buffer 2002 and generates parameters of the sinusoids 2012. In one embodiment, sinusoidal analysis processing 2005 extracts the amplitudes, frequencies, and phases of a number of sinusoids to generate sinusoid parameters 2012. Using sinusoid parameters 2012, sinusoidal oscillator 2006 generates a prediction in the form of prediction signal 2011. Thus, the decoded signal is used to identify the parameters of the predictor. The output of adder 2003 is input to switch 2040.
Switch 2040 selects the output of decoder 2004 or the output of adder 2003 as the decoded signal 2001. The selection is based on the value of decision flag 2040 recovered from the bit-stream.
Buffer 2002 stores decoded signal 2001 as well. Buffer 2002 groups a number of samples together for processing purposes so that several samples may be processed at once. The output of buffer 2002 is sent to an input of sinusoidal analysis 2005.
The process begins by processing logic recovering an index and a decision flag from the bit-stream (processing block 2011). Depending on the value of the decision flag (processing block 2012), processing logic recovers either the decoded signal (processing block 2013) or the residual signal (processing block 2014). In the latter case, processing logic finds the decoded signal by adding the decoded residual signal to the prediction signal (processing block 2015).
Using the decoded signal, processing logic then determines the parameters of the sinusoids (processing block 2016) and, using the parameters, generates the prediction signal using the predictor together with the decoded signal (processing block 2017).
The decoding process continues until no additional data from the bit-stream are available.
Matching Pursuit Prediction
In one embodiment, the prediction performed is matching pursuit prediction.
Prediction memory 2110 contains one or more sets of prediction samples 2101. In one embodiment, the size of each set of prediction samples 2101 is equal to the size of the set of predicted samples 2102. In one embodiment, the number of sets in prediction memory 2110 is equal to the number of sets in waveform memory 2111, and there is a one-to-one correspondence between sets in waveform memory 2111 and sets in prediction memory 2110.
Waveform synthesizer 2112 receives one or more of waveform parameters 2103 from waveform analyzer 2113, and retrieves the sets of prediction samples 2101 from prediction memory 2110 corresponding to the one or more indices comprised in waveform parameters 2103. The sets of prediction samples 2101 are then summed to form predicted samples 2102. Waveform synthesizer 2112 outputs the set of predicted samples.
In an alternate embodiment, waveform parameters 2103 may further comprise a weight for each index. Waveform synthesizer 2112 then generates predicted samples 2102 by a weighted sum of prediction samples 2101.
Referring to
Next, processing logic retrieves a set of analysis samples from a buffer (processing block 2202). Using the analysis samples, processing logic determines whether a stop condition is satisfied (processing block 2203). In one embodiment, the stop condition is that the energy in the set of analysis samples is lower than a predetermined threshold. In an alternative embodiment, the stop condition is that the number of extracted sinusoids is larger than a predetermined threshold. In yet another alternative embodiment, the stop condition is a combination of the above examples.
However, other conditions may be used. If the stop condition is satisfied, processing transitions to processing block 2207. Otherwise, processing proceeds to processing block 2204 where processing logic determines an index of a waveform from the set of analysis samples. The index points to a waveform stored in a waveform memory. In one embodiment, the index is determined by finding a waveform in a waveform memory that matches the set of analysis samples best.
With the index, processing logic subtracts the waveform associated with the determined index from the set of analysis samples (processing block 2205). Then processing logic adds the prediction associated with the determined index to the set of predicted samples (processing block 2206). The prediction is retrieved from a prediction memory. After completing the addition, processing transitions to processing block 2203 to repeat that portion of the process. At processing block 2207, processing logic outputs the predicted samples and the process ends.
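A minimal sketch of this matching-pursuit loop is shown below. The array sizes, the mean-squared-error matching criterion, and the function name are illustrative assumptions; a weighted variant, as in the alternate embodiment above, would scale each retrieved prediction by its weight before accumulation.

#include <float.h>

#define NUM_WAVEFORMS  256    /* illustrative sizes; the embodiment does not fix them */
#define NA             140    /* analysis length */
#define NS             40     /* prediction length */

/* Repeatedly pick the stored waveform that best matches the analysis samples,
   remove it, and accumulate the associated prediction entry, until the
   residual energy falls below a threshold or a maximum count is reached. */
void mp_predict(double analysis[NA],
                const double waveforms[NUM_WAVEFORMS][NA],
                const double predictions[NUM_WAVEFORMS][NS],
                double predicted[NS],
                double energy_threshold, int max_waveforms)
{
    int it, n, i;

    for (n = 0; n < NS; n++)
        predicted[n] = 0.0;

    for (it = 0; it < max_waveforms; it++) {
        double energy = 0.0, best_err = DBL_MAX;
        int best = 0;

        for (n = 0; n < NA; n++)
            energy += analysis[n] * analysis[n];
        if (energy < energy_threshold)              /* stop condition */
            break;

        for (i = 0; i < NUM_WAVEFORMS; i++) {       /* find the best-matching waveform */
            double err = 0.0;
            for (n = 0; n < NA; n++) {
                double d = analysis[n] - waveforms[i][n];
                err += d * d;
            }
            if (err < best_err) { best_err = err; best = i; }
        }

        for (n = 0; n < NA; n++)                    /* subtract the matched waveform */
            analysis[n] -= waveforms[best][n];
        for (n = 0; n < NS; n++)                    /* add its associated prediction */
            predicted[n] += predictions[best][n];
    }
}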
System 2300 further comprises a random access memory (RAM), or other dynamic storage device 2304 (referred to as main memory) coupled to bus 2311 for storing information and instructions to be executed by processor 2312. Main memory 2304 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 2312.
Computer system 2300 also comprises a read only memory (ROM) and/or other static storage device 2306 coupled to bus 2311 for storing static information and instructions for processor 2312, and a data storage device 2307, such as a magnetic disk or optical disk and its corresponding disk drive. Data storage device 2307 is coupled to bus 2311 for storing information and instructions.
Computer system 2300 may further be coupled to a display device 2321, such as a cathode ray tube (CRT) or liquid crystal display (LCD), coupled to bus 2311 for displaying information to a computer user. An alphanumeric input device 2322, including alphanumeric and other keys, may also be coupled to bus 2311 for communicating information and command selections to processor 2312. An additional user input device is cursor control 2323, such as a mouse, trackball, trackpad, stylus, or cursor direction keys, coupled to bus 2311 for communicating direction information and command selections to processor 2312, and for controlling cursor movement on display 2321.
Another device that may be coupled to bus 2311 is hard copy device 2324, which may be used for printing instructions, data, or other information on a medium such as paper, film, or similar types of media. Furthermore, a sound recording and playback device, such as a speaker and/or microphone, may optionally be coupled to bus 2311 for audio interfacing with computer system 2300. Another device that may be coupled to bus 2311 is a wired/wireless communication capability 2325 for communicating with a phone or handheld palm device.
Note that any or all of the components of system 2300 and associated hardware may be used in the present invention. However, it can be appreciated that other configurations of the computer system may include some or all of the devices.
Whereas many alterations and modifications of the present invention will no doubt become apparent to a person of ordinary skill in the art after having read the foregoing description, it is to be understood that any particular embodiment shown and described by way of illustration is in no way intended to be considered limiting. Therefore, references to details of various embodiments are not intended to limit the scope of the claims which in themselves recite only those features regarded as essential to the invention.