I. Field
The present invention generally relates to signal processing, and more particularly, to encoding and decoding of signals for storage and retrieval or for communications.
II. Background
In digital telecommunications, signals need to be coded for transmission and decoded upon reception. Coding of signals concerns converting the original signals into a format suitable for propagation over the transmission medium. The objective is to preserve the quality of the original signals while consuming little of the medium's bandwidth. Decoding of signals involves the reverse of the coding process.
A known coding scheme uses the technique of pulse-code modulation (PCM). Referring to
To conserve bandwidth, the digital values of the PCM pulses 20 can be compressed using a logarithmic companding process prior to transmission. At the receiving end, the receiver merely performs the reverse of the coding process mentioned above to recover an approximate version of the original time-varying signal x(t). Apparatuses employing the aforementioned scheme are commonly called a-law or μ-law codecs.
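As a non-limiting illustration of the logarithmic companding described above, the following sketch applies a continuous μ-law curve to normalized sample values and then inverts it. The function names and the constant μ=255 (the value commonly used in North American μ-law systems) are assumptions for illustration, not the codec of any particular standard:

```python
import numpy as np

MU = 255.0  # companding constant commonly used in North American mu-law systems

def mu_law_compress(x, mu=MU):
    """Logarithmically compand samples normalized to [-1, 1]."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=MU):
    """Reverse the companding to recover an approximation of the samples."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

x = np.linspace(-1.0, 1.0, 101)      # stand-in for normalized PCM pulse values
x_rec = mu_law_expand(mu_law_compress(x))
```

Absent quantization, compressing and then expanding recovers the samples; in a real codec the companded values are quantized to few bits, which is where the bandwidth saving arises.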
As the number of users increases, there is a further practical need for bandwidth conservation. For instance, in a wireless communication system, a multiplicity of users can be sharing a finite frequency spectrum. Each user is normally allocated a limited bandwidth among other users.
In the past decade or so, considerable progress has been made in the development of speech coders. A commonly adopted technique employs the method of code excited linear prediction (CELP). Details of CELP methodology can be found in publications entitled “Digital Processing of Speech Signals,” by Rabiner and Schafer, Prentice Hall, ISBN: 0132136031, September 1978; and “Discrete-Time Processing of Speech Signals,” by Deller, Proakis and Hansen, Wiley-IEEE Press, ISBN: 0780353862, September 1999. The basic principles underlying the CELP method are briefly described below.
Reference is now returned to
For simplicity, take only the three PCM pulse groups 22A-22C for illustration. During encoding prior to transmission, the digital values of the PCM pulse groups 22A-22C are consecutively fed to a linear predictor (LP) module. The resultant output is a set of frequency values, also called an “LP filter” or simply “filter,” which basically represents the spectral content of the pulse groups 22A-22C. The LP filter is then quantized.
The LP module generates an approximation of the spectral representation of the PCM pulse groups 22A-22C. As such, during the predicting process, errors or residual values are introduced. The residual values are mapped to a codebook which carries entries of various combinations available for closely matching the coded digital values of the PCM pulse groups 22A-22C. The best-fitting values in the codebook are selected, and the selected (mapped) values are the values to be transmitted. The overall process is called time-domain linear prediction (TDLP).
Thus, using the CELP method in telecommunications, the encoder (not shown) merely has to generate the LP filters and the mapped codebook values. The transmitter needs only to transmit the LP filters and the mapped codebook values, instead of the individually coded PCM pulse values as in the a- and μ-law encoders mentioned above. Consequently, a substantial amount of communication channel bandwidth can be saved.
The receiver also has a codebook similar to that in the transmitter. The decoder (not shown) in the receiver, relying on the same codebook, merely has to reverse the encoding process as aforementioned. Along with the received LP filters, the time-varying signal x(t) can be recovered.
Heretofore, many of the known speech coding schemes, such as the CELP scheme mentioned above, are based on the assumption that the signals being coded are short-time stationary. That is, the schemes are based on the premise that the frequency contents of the coded frames are stationary and can be approximated by simple (all-pole) filters and some input representation in exciting the filters. The various TDLP algorithms used in arriving at the codebooks as mentioned above are based on such a model. Nevertheless, voice patterns among individuals can be very different. Non-human audio signals, such as sounds emanating from various musical instruments, are also distinguishably different from their human counterparts. Furthermore, in the CELP process as described above, to expedite real-time signal processing, a short time frame is normally chosen. More specifically, as shown in
Accordingly, there is a need to provide a coding and decoding scheme with improved preservation of signal quality, applicable not only to human speeches but also to a variety of other sounds, and further for efficient utilization of channel resources.
Copending U.S. patent application Ser. No. 11/583,537, assigned to the same assignee as the current application, addresses the aforementioned need by using a frequency domain linear prediction (FDLP) scheme which first converts a time-varying signal into a frequency-domain signal. The envelope and carrier portions of the frequency-domain signal are then identified. The frequency-domain signal is then sorted into a plurality of sub-bands. The envelope portion is approximated by the FDLP scheme as an all-pole model. The carrier portion, which also represents the residual of the all-pole model, is approximately estimated. Resulting data of the all-pole model signal envelope and the estimated carrier are packetized as encoded signals suitable for transmission or storage. To reconstruct the time-varying signals, the encoded signals are decoded. The decoding process is basically the reverse of the encoding process.
For improved signal quality, the signal carrier can be more accurately determined prior to packetization and encoding, yet at substantially no additional consumption of bandwidth.
In an apparatus and method, a time-varying signal is partitioned into sub-bands. Each sub-band is processed and encoded via a frequency domain linear prediction (FDLP) scheme to arrive at an all-pole model. The residual signal resulting from the scheme in each sub-band is estimated. The all-pole model and the residual signal represent the Hilbert envelope and the Hilbert carrier, respectively, in each sub-band. Through the process of heterodyning, the time-domain residual signal is frequency shifted toward the baseband level as a downshifted carrier signal. Quantized values of the all-pole model and the downshifted carrier signal are packetized as encoded signals suitable for transmission or storage. To reconstruct the time-varying signals, the encoded signals are decoded. The decoding process is basically the reverse of the encoding process.
The partitioned frames can be chosen to be relatively long in duration, resulting in more efficient use of formant or common spectral information of the signal source. The apparatus and method implemented as described are suitable not only for vocalic voices but also for other sounds, such as sounds emanating from various musical instruments, or combinations thereof.
These and other features and advantages will be apparent to those skilled in the art from the following detailed description, taken together with the accompanying drawings, in which like reference numerals refer to like parts.
The following description is presented to enable any person skilled in the art to make and use the invention. Details are set forth in the following description for purposes of explanation. It should be appreciated that one of ordinary skill in the art would realize that the invention may be practiced without the use of these specific details. In other instances, well-known structures and processes are not elaborated in order not to obscure the description of the invention with unnecessary details. Thus, the present invention is not intended to be limited by the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
In the encoding section 32, there is an encoder 38 connected to a data packetizer 40. A time-varying input signal x(t), after passing through the encoder 38 and the data packetizer 40, is directed to the data handler 36.
In a somewhat similar manner but in the reverse order, in the decoding section 34, there is a decoder 42 tied to a data depacketizer 44. Data from the data handler 36 are fed to the data depacketizer 44 which in turn sends the depacketized data to the decoder 42 for the reconstruction of the original time-varying signal x(t).
In step S1 of
In this specification and the appended claims, unless otherwise specified, the term “signal” is broadly construed. Thus the term signal includes continuous and discrete signals, and further frequency-domain and time-domain signals. Moreover, hereinbelow, lower-case symbols denote time-domain signals and upper-case symbols denote frequency-transformed signals. The rest of the notation will be introduced in subsequent description.
Progressing into step S2, the sampled signal x(n) is partitioned into a plurality of frames. One such frame is signified by the reference numeral 46 as shown in
The time-varying signal within the selected frame 46 is labeled s(t) in
The discrete version of the signal s(t) is represented by s(n), where n is an integer indexing the sample number. Again, for reason of clarity so as not to obscure the drawing figure, only a few samples of s(n) are shown in
s(n)=s(nτ) (1)
where τ is the sampling period as shown in
Progressing into step S3 of
T(f) = c(f) Σ_{n=0}^{N−1} s(n) cos[π(2n+1)f/(2N)] (2)
where s(n) is as defined above, f is the discrete frequency in which 0≦f≦N−1, T is the linear array of the N transformed values of the N pulses of s(n), and the coefficients c are given by c(0)=√(1/N) and c(f)=√(2/N) for 1≦f≦N−1.
After the DCT of the time-domain parameter of s(n), the resultant frequency-domain parameter T(f) is diagrammatically shown in
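For illustration, the transform of Equation (2) may be sketched directly in NumPy. This is a naive O(N²) evaluation for clarity rather than a fast transform, and the helper name `dct_ii` is an assumption:

```python
import numpy as np

def dct_ii(s):
    """Orthonormal DCT: T(f) = c(f) * sum_n s(n) * cos(pi*(2n+1)*f / (2N)),
    with c(0) = sqrt(1/N) and c(f) = sqrt(2/N) for 1 <= f <= N-1."""
    N = len(s)
    n = np.arange(N)
    T = np.empty(N)
    for f in range(N):
        c = np.sqrt(1.0 / N) if f == 0 else np.sqrt(2.0 / N)
        T[f] = c * np.sum(s * np.cos(np.pi * (2 * n + 1) * f / (2 * N)))
    return T
```

Because the scaling is orthonormal, the transform preserves signal energy, which is convenient when the coefficients are later sorted into sub-bands.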
Entering into step S4 of
In selecting the number of sub-bands M, there should be a balance between complexity and signal quality. That is, if a higher quality of the encoded signal is desired, more sub-bands can be chosen, but at the expense of more packetized data bits and more complex handling of the residual signal, both of which will be explained later. On the other hand, fewer sub-bands may be selected for the sake of simplicity, but may result in an encoded signal of relatively lower quality. Furthermore, the number of sub-bands can be chosen depending on the sampling frequency. For instance, when the sampling frequency is 16,000 Hz, M can be selected to be 15. In the exemplary embodiment, the sampling frequency is chosen to be 8,000 Hz, with M set at 13 (i.e., M=13).
After the N DCT coefficients are separated and fitted into the M sub-bands in the form of M overlapping Gaussian windows, as shown in
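The sorting of the N DCT coefficients into M overlapping Gaussian windows can be sketched as follows. The equal spacing of the window centers and the width parameter are assumptions for illustration; the embodiment's exact window placement may differ:

```python
import numpy as np

def gaussian_windows(N, M, width=1.0):
    """M overlapping Gaussian windows spanning N DCT coefficients.
    Centers are equally spaced; `width` scales the overlap (an assumption)."""
    centers = np.linspace(0.0, N - 1.0, M)
    sigma = width * (N - 1.0) / (2.0 * M)
    f = np.arange(N)
    return np.exp(-0.5 * ((f[None, :] - centers[:, None]) / sigma) ** 2)

def split_subbands(T, M):
    """Sort the DCT coefficients T(f) into M windowed sub-bands T_k(f)."""
    W = gaussian_windows(len(T), M)
    return W * T[None, :]            # shape (M, N): row k holds T_k(f)
```

With the exemplary M=13 at an 8,000 Hz sampling frequency, `split_subbands(T, 13)` would yield the thirteen sub-band coefficient arrays Tk(f).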
In the following description of the embodiment, M=13 and 1≦k≦M, in which k is an integer. In addition, the DCT coefficients sorted into the kth sub-band are denoted Tk(f), which is a frequency-domain term. The DCT coefficients in the kth sub-band Tk(f) have a time-domain counterpart, which is expressed as sk(n).
At this juncture, it helps to make a digression to define and distinguish the various frequency-domain and time-domain terms.
The time-domain signal in the kth sub-band sk(n) can be obtained by an inverse discrete cosine transform (IDCT) of its corresponding frequency counterpart Tk(f). Mathematically, it is expressed as follows:
sk(n) = Σ_{f=0}^{N−1} c(f) Tk(f) cos[π(2n+1)f/(2N)] (3)
where sk(n) and Tk(f) are as defined above. Again, f is the discrete frequency in which 0≦f≦N−1, and the coefficients c are given by c(0)=√(1/N) and c(f)=√(2/N) for 1≦f≦N−1.
Switching the discussion from the frequency domain to the time domain, the time-domain signal in the kth sub-band sk(n) is essentially composed of two parts, namely, the time-domain Hilbert envelope s̃k(n) and the Hilbert carrier ck(n). The time-domain Hilbert envelope s̃k(n) is diagrammatically shown in
sk(n)=s̃k(n)ck(n) (4)
Thus, from equation (4), if the time-domain Hilbert envelope s̃k(n) and the Hilbert carrier ck(n) are known, the time-domain signal in the kth sub-band sk(n) can be reconstructed. The reconstructed signal approximates a lossless reconstruction.
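The decomposition of Equation (4) can be sketched with an FFT-based analytic signal. Here the Hilbert envelope is computed exactly from the analytic signal, rather than by the all-pole FDLP approximation introduced later, and the function names are illustrative:

```python
import numpy as np

def analytic_signal(s):
    """Analytic signal z(n) = s(n) + j*s_hat(n): zero the negative
    frequencies of the FFT and double the positive ones."""
    N = len(s)
    S = np.fft.fft(s)
    h = np.zeros(N)
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    return np.fft.ifft(S * h)

def envelope_carrier(s):
    """Split a sub-band signal into Hilbert envelope and Hilbert carrier,
    so that s(n) = envelope(n) * carrier(n), per Equation (4)."""
    z = analytic_signal(s)
    env = np.abs(z)                                # Hilbert envelope
    carrier = np.real(z) / np.maximum(env, 1e-12)  # unit-magnitude carrier
    return env, carrier
```

Multiplying the two parts back together reproduces the sub-band signal, mirroring the reconstruction described above.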
The diagrammatical relationship between the time-domain signal sk(n) and its frequency-domain counterpart Tk(f) can also be seen from
Returning now to
As also mentioned earlier, the time-domain Hilbert envelope s̃k(n) in the kth sub-band can be derived from the corresponding frequency-domain parameter Tk(f). However, in sub-step S5k, instead of using the IDCT process for the exact transformation of the parameter Tk(f), the process of frequency-domain linear prediction (FDLP) of the parameter Tk(f) is employed in the exemplary embodiment. Data resulting from the FDLP process can be more streamlined, and consequently more suitable for transmission or storage.
In the following paragraphs, the FDLP process is briefly described, followed by a more detailed explanation.
Briefly stated, in the FDLP process, the frequency-domain counterpart of the Hilbert envelope s̃k(n) is estimated. The estimated counterpart is algebraically expressed as T̃k(f) and is shown and labeled 56 in
Hereinbelow, further details of the FDLP process and the estimating of the parameter Ck(f) are described.
In the FDLP process, the algorithm of Levinson-Durbin can be employed. Mathematically, the parameters to be estimated by the Levinson-Durbin algorithm can be expressed as follows:
Fundamentals of the Z-transform in the z-domain can be found in a publication entitled “Discrete-Time Signal Processing,” 2nd Edition, by Alan V. Oppenheim, Ronald W. Schafer, John R. Buck, Prentice Hall, ISBN: 0137549202, and are not further elaborated here.
In equation (5), the value of K can be selected based on the length of the frame 46 (
In essence, in the FDLP process as exemplified by Equation (5), the DCT coefficients of the frequency-domain transform in the kth sub-band Tk(f) are processed via the Levinson-Durbin algorithm, resulting in a set of coefficients a(i), where 0≦i≦K−1. The set of coefficients a(i) represents the frequency counterpart T̃k(f) (
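A minimal sketch of this step follows, assuming the standard autocorrelation formulation of the Levinson-Durbin recursion. The model order, the autocorrelation estimate, and the way the fitted all-pole model is evaluated as an envelope are illustrative choices, not the embodiment's exact parameters:

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for prediction coefficients
    a(0..order), a(0)=1, given autocorrelation lags r(0..order)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for i in range(1, m):
            a[i] = a_prev[i] + k * a_prev[m - i]
        a[m] = k
        err *= 1.0 - k * k                  # prediction-error update
    return a, err

def fdlp_envelope(Tk, order=20):
    """FDLP: fit an all-pole model to the sub-band DCT coefficients T_k(f);
    the model's power response estimates the squared Hilbert envelope."""
    N = len(Tk)
    r = np.correlate(Tk, Tk, mode="full")[N - 1:N + order] / N  # r(0..order)
    a, err = levinson_durbin(r, order)
    A = np.fft.rfft(a, 2 * N)               # evaluate A(z) on the unit circle
    return a, err / np.abs(A[:N]) ** 2      # all-pole envelope estimate
```

Because linear prediction is applied to frequency-domain coefficients rather than time samples, the resulting all-pole fit tracks the temporal envelope of the sub-band, which is the essence of FDLP.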
The Levinson-Durbin algorithm is well known in the art and is not further explained here. The fundamentals of the algorithm can be found in a publication entitled “Digital Processing of Speech Signals,” by Rabiner and Schafer, Prentice Hall, ISBN: 0132136031, September 1978.
Advancing into sub-step S6k of
The result of the FDLP process is the parameter T̃k(f), which, as mentioned above, is the Hilbert envelope s̃k(n) expressed in the frequency domain. The parameter T̃k(f) is identified by the reference numeral 56 in
The quantized coefficients a(i), where i=0 to K−1, of the parameter T̃k(f) will be part of the encoded information to be sent to the data handler 36 (
As mentioned above and repeated here, since the parameter T̃k(f) is a lossy approximation of the original parameter Tk(f), the difference between the two parameters can be captured and represented as the residual value, which is algebraically expressed as Ck(f). Put differently, in the fitting process of sub-steps S5k and S6k via the Levinson-Durbin algorithm as aforementioned to arrive at the all-pole model, some information about the original signal cannot be captured. If signal encoding of high quality is intended, that is, if a lossless encoding is desired, the residual value Ck(f) needs to be estimated. The residual value Ck(f) basically corresponds to the frequency components of the Hilbert carrier ck(n) of the signal sk(n) and will be further explained.
Progressing into sub-step S7k of
Estimation of the residual value either in the frequency-domain expressed as Ck(f) or in the time-domain expressed as ck(n) is carried out in sub-step S8k of
ck(n)=sk(n)/s̃k(n) (6)
where all the parameters are as defined above.
It should be noted that Equation (6) shows a straightforward way of estimating the residual value. Other approaches can also be used for the estimation. For instance, the frequency-domain residual value Ck(f) can very well be generated from the difference between the parameters Tk(f) and T̃k(f). Thereafter, the time-domain residual value ck(n) can be obtained by a direct time-domain transform of the value Ck(f).
In
Reference is now returned to sub-step S9k of
To generate an analytic signal zk(n) of the time-domain signal ck(n), a Hilbert transform of the signal ck(n) needs to be carried out, as shown in step S9k of
where all the parameters are as defined above. Equation (7) basically is a commonly known Hilbert transform equation in the time-domain.
After the Hilbert transform, the analytic signal zk(n) is simply the sum of the time-domain signal ck(n) and, as the imaginary part, the Hilbert-transformed signal ĉk(n), as shown in step S10k of
zk(n)=ck(n)+jĉk(n) (8)
where j is the imaginary unit.
After the derivation of the analytic signal, the process of heterodyning is performed, as shown in sub-step S11k in
Mathematically, the demodulated, down-sampled Hilbert carrier dk(n) is derived from the following equation:
dk(n)=zk(Rn)e^(−jω₀Rn) (9)
where zk(n) is as defined above, R is the down-sampling rate, and ω₀ is the offset frequency of the kth sub-band.
By down-shifting the frequency of the parameter ck(n) to arrive at the parameter dk(n), processing of the Hilbert carrier in each sub-band, such as filtering and thresholding to be described below, can be substantially made easier. Specifically, the offset frequency of the Hilbert carrier in each sub-band need not be determined or known in advance. For instance, in the implementation of a filter algorithm, all the sub-bands can assume one offset frequency, i.e., the baseband frequency.
After the process of frequency down-shifting, the down-sampled Hilbert carrier dk(n) is then passed through a low-pass filter, as shown in the sub-step S12k of
It should be noted that the demodulated carrier dk(n) is complex and analytic. As such, the Fourier transform of the parameter dk(n) is not conjugate-symmetric. Phrased differently, the process of heterodyning the analytic signal zk(n) essentially shifts the frequency of the Hilbert carrier ck(n) as dk(n) towards the baseband frequency, but without the conjugate-symmetric terms in the negative frequency. As can be seen from the frequency-domain transform Dk(f) of the down-shifted carrier dk(n) in
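The frequency down-shifting and down-sampling may be sketched as follows. The synthetic single-tone carrier, the offset frequency f0, and the down-sampling rate R are assumptions for illustration, and a practical chain would low-pass filter before decimating, as in sub-step S12k:

```python
import numpy as np

fs, f0, R = 8000.0, 1000.0, 4    # sampling rate, sub-band offset, down-sampling rate
n = np.arange(1024)

# Synthetic analytic Hilbert carrier sitting at the offset frequency f0
# (a stand-in for z_k(n); in the embodiment it comes from the Hilbert transform).
z = np.exp(2j * np.pi * f0 * n / fs)

# Heterodyne toward baseband, then down-sample by R.
d = (z * np.exp(-2j * np.pi * f0 * n / fs))[::R]

# d(n) is complex and analytic: its spectrum D(f) has no conjugate-symmetric
# negative-frequency image, and its energy is concentrated near DC.
D = np.fft.fft(d)
```

For this idealized tone the down-shifted carrier is constant, so all of its spectral energy lands in the DC bin, which is what makes the subsequent filtering and thresholding independent of each sub-band's offset frequency.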
Entering into step S14k of
In this exemplary embodiment, the threshold is dynamically applied. That is, for each sub-band, the threshold 62 is made adjustable based on other parameters, such as the average and maximum magnitudes of the samples of the parameter Dk(f), and/or those of the neighboring sub-bands. In addition, the parameters can also include the average and maximum magnitudes of the samples of the parameter Dk(f) in adjacent time-frames. Furthermore, the threshold can also be dynamically adapted based on the number of coefficients selected. In the exemplary embodiment, only values of the frequency-domain transform Dk(f) above the threshold line 62 are selected.
Thereafter, the selected components of the parameter Dk(f) greater than the threshold are quantized. In this example, each selected component includes a magnitude value bm(i) and a phase value bp(i), where 0≦i≦L−1. The resulting quantized values bm(i) and bp(i) are shown in sub-step S15k in
The quantized values bm(i) and bp(i), where i=0 to L−1, of the threshold-filtered parameter Dk(f) will be another part of the encoded information along with the quantized coefficients a(i), where i=0 to K−1, as described above to be sent to the data handler 36 (
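A sketch of the thresholding and magnitude/phase quantization might look as follows. The threshold rule (a fixed fraction of the frame maximum) and the bit widths are assumptions, since the embodiment adapts the threshold on several sub-band and time-frame statistics:

```python
import numpy as np

def threshold_and_quantize(D, rel_thresh=0.1, mag_bits=6, phase_bits=5):
    """Keep only components of D_k(f) whose magnitude exceeds a threshold
    tied to the frame maximum, then quantize magnitude and phase uniformly."""
    mag = np.abs(D)
    thresh = rel_thresh * mag.max()            # simplistic dynamic threshold
    idx = np.flatnonzero(mag > thresh)         # indices of selected components
    mag_step = mag.max() / (2 ** mag_bits - 1)
    bm = np.round(mag[idx] / mag_step)         # quantized magnitudes b_m(i)
    bp = np.round((np.angle(D[idx]) + np.pi)   # quantized phases b_p(i)
                  / (2.0 * np.pi) * (2 ** phase_bits - 1))
    return idx, bm, bp
```

Only the selected indices and the quantized values bm(i) and bp(i) would then be packetized, alongside the quantized FDLP coefficients a(i).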
Reference is now returned to
Thereafter, all the data from each of the M sub-bands are concatenated and packetized, as shown in step S17 of
Data can be retrieved from the data handler 36 for decoding and reconstruction. Referring to
During transmission, if data in a few of the M frequency sub-bands are corrupted, the quality of the reconstructed signal should not be affected much. This is because the relatively long frame 46 (
An exemplary reconstructed frequency-domain transform Dk(f) of the demodulated Hilbert carrier dk(n) is shown in
Reference is first directed to the encoding section 32 of
The encoding section 32 comprises a central data bus 70 linking several circuits together. The circuits include a central processing unit (CPU) or a controller 72, an input buffer 74, and a memory unit 78. In this embodiment, a transmit circuit 76 is also included.
If the encoding section 32 is part of a wireless device, the transmit circuit 76 can be connected to a radio frequency (RF) circuit, which is not shown in the drawing. The transmit circuit 76 processes and buffers the data from the data bus 70 before sending the data out of the encoding section 32. The CPU/controller 72 performs the function of data management of the data bus 70 and further the function of general data processing, including executing the instructional contents of the memory unit 78.
Instead of being separately disposed as shown in
The input buffer 74 can be tied to other devices (not shown) such as a microphone or an output of a recorder.
The memory unit 78 includes a set of computer-readable instructions generally signified by the reference numeral 77. In this specification and appended claims, the terms “computer-readable instructions” and “computer-readable program code” are used interchangeably. In this embodiment, the instructions include, among other things, portions such as the DCT function 80, the windowing function 84, the FDLP function 86, the heterodyning function 88, the Hilbert transform function 90, the filtering function 92, the down-sampling function 94, the dynamic thresholding function 96, the quantizer function 98, the entropy coding function 100 and the packetizer 102.
The various functions have been described, e.g., in the description of the encoding process shown in
Reference is now directed to the decoding section 34 of
The decoding section 34 also has a central data bus 190 linking various circuits together, such as a CPU/controller 192, an output buffer 196, and a memory unit 197. Furthermore, a receive circuit 194 can also be included. Again, the receive circuit 194 can be connected to an RF circuit (not shown) if the decoding section 34 is part of a wireless device. The receive circuit 194 processes and buffers incoming data before sending the data onto the data bus 190. As an alternative, the receive circuit 194 can be part of the CPU/controller 192, rather than separately disposed as shown. The CPU/controller 192 performs the function of data management of the data bus 190 and further the function of general data processing, including executing the instructional contents of the memory unit 197.
The output buffer 196 can be tied to other devices (not shown) such as a loudspeaker or the input of an amplifier.
The memory unit 197 includes a set of instructions generally signified by the reference numeral 199. In this embodiment, the instructions include, among other things, portions such as the depacketizer function 198, the entropy decoder function 200, the inverse quantizer function 202, the up-sampling function 204, the inverse Hilbert transform function 206, the inverse heterodyning function 208, the DCT function 210, the synthesis function 212, and the IDCT function 214.
The various functions have been described, e.g., in the description of the decoding process shown in
It should be noted that the encoding and decoding sections 32 and 34 are shown separately in
In this embodiment, the memory unit 78 or 197 is a RAM (Random Access Memory) circuit. The exemplary instruction portions 80, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 198, 200, 202, 204, 206, 208, 210, 212 and 214 are software routines or modules. The memory unit 78 or 197 can be tied to another memory circuit (not shown) which can be either of the volatile or nonvolatile type. As an alternative, the memory unit 78 or 197 can be made of other circuit types, such as an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM (Electrical Programmable Read Only Memory), a ROM (Read Only Memory), a magnetic disk, an optical disk, and others well known in the art.
Furthermore, the memory unit 78 or 197 can be an application specific integrated circuit (ASIC). That is, the instructions or codes 77 and 199 for the functions can be hard-wired or implemented by hardware, or a combination thereof. In addition, the instructions 77 and 199 for the functions need not be distinctly classified as hardware or software implemented. The instructions 77 and 199 can certainly be implemented in a device as a combination of both software and hardware.
It should further be noted that the encoding and decoding processes as described and shown in
Finally, other changes are possible within the scope of the invention. In the exemplary embodiment as described, only processing of audio signals is depicted. However, it should be noted that the invention is not so limited. Processing of other types of signals, such as ultrasound signals, is also possible. It should also be noted that the invention can very well be used in a broadcast setting, i.e., signals from one encoder can be sent to a plurality of decoders. Furthermore, the exemplary embodiment as described need not be confined to wireless applications. For instance, a conventional wireline telephone certainly can be installed with the exemplary encoder and decoder as described. In addition, although the Levinson-Durbin algorithm is used in describing the embodiment, other algorithms known in the art for estimating the predictive filter parameters can also be employed. Additionally, any logical blocks, circuits, and algorithm steps described in connection with the embodiment can be implemented in hardware, software, firmware, or combinations thereof. It will be understood by those skilled in the art that these and other changes in form and detail may be made therein without departing from the scope and spirit of the invention.
The present application for patent claims priority to U.S. Provisional Application No. 60/791,042, entitled “Processing of Excitation in Audio Coding Based on Spectral Dynamics in Sub-Bands,” filed on Apr. 10, 2006, and assigned to the assignee hereof and expressly incorporated by reference herein.
Number | Name | Date | Kind |
---|---|---|---|
4184049 | Crochiere et al. | Jan 1980 | A |
4192968 | Hilbert et al. | Mar 1980 | A |
4584534 | Lijphart et al. | Apr 1986 | A |
4849706 | Davis et al. | Jul 1989 | A |
4902979 | Puckette, IV | Feb 1990 | A |
5640698 | Shen et al. | Jun 1997 | A |
5651090 | Moriya et al. | Jul 1997 | A |
5715281 | Bly et al. | Feb 1998 | A |
5764704 | Shenoi | Jun 1998 | A |
5778338 | Jacobs et al. | Jul 1998 | A |
5781888 | Herre | Jul 1998 | A |
5802463 | Zuckerman | Sep 1998 | A |
5825242 | Prodan et al. | Oct 1998 | A |
5838268 | Frenkel | Nov 1998 | A |
5884010 | Chen et al. | Mar 1999 | A |
5943132 | Erskine | Aug 1999 | A |
6014621 | Chen | Jan 2000 | A |
6091773 | Sydorenko | Jul 2000 | A |
6243670 | Bessho et al. | Jun 2001 | B1 |
6680972 | Liljeryd et al. | Jan 2004 | B1 |
6686879 | Shattil | Feb 2004 | B2 |
7155383 | Chen et al. | Dec 2006 | B2 |
7173966 | Miller | Feb 2007 | B2 |
7206359 | Kjeldsen et al. | Apr 2007 | B2 |
7430257 | Shattil | Sep 2008 | B1 |
7532676 | Fonseka et al. | May 2009 | B2 |
7639921 | Seo et al. | Dec 2009 | B2 |
7949125 | Shoval et al. | May 2011 | B2 |
8027242 | Garudadri et al. | Sep 2011 | B2 |
20010044722 | Gustafsson et al. | Nov 2001 | A1 |
20030231714 | Kjeldsen et al. | Dec 2003 | A1 |
20040165680 | Kroeger | Aug 2004 | A1 |
20060122828 | Lee et al. | Jun 2006 | A1 |
20090177478 | Jax et al. | Jul 2009 | A1 |
20090198500 | Garudadri et al. | Aug 2009 | A1 |
20110270616 | Garudadri et al. | Nov 2011 | A1 |
Number | Date | Country |
---|---|---|
0782128 | Jul 1997 | EP |
0867862 | Sep 1998 | EP |
1093113 | Apr 2001 | EP |
1158494 | Nov 2001 | EP |
1852849 | Nov 2007 | EP |
62502572 | Oct 1987 | JP |
3127000 | May 1991 | JP |
6229234 | Aug 1994 | JP |
7077979 | Mar 1995 | JP |
7234697 | Sep 1995 | JP |
08102945 | Apr 1996 | JP |
9258795 | Oct 1997 | JP |
2002032100 | Jan 2002 | JP |
2003108196 | Apr 2003 | JP |
2005173607 | Jun 2005 | JP |
2005530206 | Oct 2005 | JP |
2007506986 | Mar 2007 | JP |
405328 | Sep 2000 | TW |
442776 | Jun 2001 | TW |
454169 | Sep 2001 | TW |
454171 | Sep 2001 | TW |
200507467 | Feb 2005 | TW |
200529040 | Sep 2005 | TW |
I242935 | Nov 2005 | TW |
200707275 | Feb 2007 | TW |
200727729 | Jul 2007 | TW |
WO03107329 | Dec 2003 | WO |
WO2005027094 | Mar 2005 | WO |
2005096274 | Oct 2005 | WO |
WO2007128662 | Nov 2007 | WO |
Entry |
---|
Qin Li; Atlas, L.; , “Properties for modulation spectral filtering,” Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on , vol. 4, No., pp. iv/521-iv/524 vol. 4, Mar. 18-23, 2005 doi: 10.1109/ICASSP.2005.1416060 URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1416060&isnumber=3065. |
N Derakhshan; MH Savoji. Perceptual Speech Enhancement Using a Hilbert Transform Based Time-Frequency Representation of Speech. SPECOM Jun. 25-29, 2006. |
de Buda, R.; , “Coherent Demodulation of Frequency-Shift Keying with Low Deviation Ratio,” Communications, IEEE Transactions on , vol. 20, No. 3, pp. 429-435, Jun. 1972 doi: 10.1109/TCOM.1972.1091177 URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1091177&isnumber=23774. |
Athineos, M.; Ellis, D.P.W.; , “Frequency-domain linear prediction for temporal features,” Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on , vol., No., pp. 261-266, Nov. 30-Dec. 3, 2003 doi: 10.1109/ASRU.2003.1318451. |
Athineos, Marios / Hermansky, Hynek / Ellis, Daniel P.W. (2004): “LP-TRAP: linear predictive temporal patterns”, In Interspeech-2004, 949-952. |
S Schimmel, L Atlas. Coherent Envelope Detection for Modulation Filtering of Speech. Proceedings ICASSP 05 IEEE International Conference on Acoustics Speech and Signal Processing 2005 (2005) Vol. 1, Issue: 7, Publisher: IEEE, pp. 221-224. |
Tyagi, V.; Wellekens, C.; , “Fepstrum representation of speech signal,” Automatic Speech Recognition and Understanding, 2005 IEEE Workshop on , vol., No., pp. 11-16, Nov. 27-27, 2005 doi: 10.1109/ASRU.2005.1566475. |
International Search Report, PCT/US2007/066243, Sep. 6, 2007. |
Motlicek, P. et al., “Audio Coding Based on Long Temporal Contexts,” IDIAP Research Report, [Online] Apr. 2006, Retrieved from the Internet: URL: http://www.idiap.ch/publications/motlicek-idiap-rr-06-30.bib.abs.html>[retrieved on Mar. 2, 2007]. |
Motlicek, Ullal, Hermansky: “Wide-Band Perceptual Audio Coding based on Frequency-Domain Linear Prediction” IDIAP Research Report, [Online] Oct. 2006. XP002423397 Retrieved from the Internet: URL:http://WWW.idiap.ch/publications/motlicek-idiap-rr-06-58.bib.abs.html> [retrieved on Mar. 2, 2007]. |
Kumaresan Ramdas et al: “Model based approach to envelope and positive instantaneous frequency estimation of signals with speech applications” Journal of the Acoustical Society of America, AIP / Acoustical Society of America, Melville, NY, US, vol. 105, No. 3, Mar. 1999, pp. 1812-1924. XP012000860 ISSN: 0001-4966 *section B, III* p. 1913, left-hand column, lines 3-6. |
International Search Report˜PCT/US06/060168—International Search Authority—European Patent Office. |
Herre, J. et al., “Enhancing the Performance of Perceptual Audio Coders by Using Temporal Noise Shaping (TNS),” Preprints of papers presented at the AES Convention, Nov. 8, 1996, pp. 1-24; p. 8, line 6 to p. 11, line 21; figures 10, 14.
Schimmel, S., Atlas, L., “Coherent Envelope Detection for Modulation Filtering of Speech,” in Proc. of ICASSP, vol. 1, pp. 221-224, Philadelphia, USA, May 2005.
Motlicek, P., Hermansky, H., Garudadri, H., “Speech Coding Based on Spectral Dynamics,” technical report IDIAP-RR 06-05, <http://www.idiap.ch>, Jan. 2006.
M. Athineos et al., “Frequency-domain linear prediction for temporal features,” Automatic Speech Recognition and Understanding, 2003 (ASRU '03), 2003 IEEE Workshop on, St. Thomas, V.I., USA, Nov. 30-Dec. 3, 2003, Piscataway, NJ, USA, IEEE, pp. 261-265. XP010713319. ISBN: 0-7803-7980-2. The whole document.
Spanias, A.S., “Speech Coding: A Tutorial Review,” in Proc. of the IEEE, vol. 82, No. 10, Oct. 1994.
Hermansky, H., “Perceptual linear predictive (PLP) analysis for speech,” J. Acoust. Soc. Am., vol. 87:4, pp. 1738-1752, 1990.
Mark S. Vinton and Les E. Atlas, “A Scalable and Progressive Audio Codec,” IEEE ICASSP 2001, May 7-11, 2001, Salt Lake City.
Hermansky, H., Fujisaki, H., Sato, Y., “Analysis and Synthesis of Speech Based on Spectral Transform Linear Predictive Method,” in Proc. of ICASSP, vol. 8, pp. 777-780, Boston, USA, Apr. 1983.
Makhoul, J., “Linear Prediction: A Tutorial Review,” in Proc. of the IEEE, vol. 63, No. 4, Apr. 1975.
Athineos, M., Hermansky, H., Ellis, D.P.W., “LP-TRAP: Linear predictive temporal patterns,” in Proc. of ICSLP, pp. 1154-1157, Jeju, S. Korea, Oct. 2004.
C. Loeffler, A. Ligtenberg, and G.S. Moschytz, “Algorithm-architecture mapping for custom DCT chips,” in Proc. Int. Symp. Circuits Syst. (Helsinki, Finland), Jun. 1988, pp. 1953-1956.
Ephraim Feig, “A fast scaled-DCT algorithm,” SPIE vol. 1244, Image Processing Algorithms and Techniques (1990), pp. 2-13.
ISO/IEC JTC1/SC29/WG11 N7335, “Call for Proposals on Fixed-Point 8x8 IDCT and DCT Standard,” pp. 1-18, Poznan, Poland, Jul. 2005.
ISO/IEC JTC1/SC29/WG11 N7817 [23002-2 WD1], “Information technology—MPEG Video Technologies—Part 2: Fixed-point 8x8 IDCT and DCT transforms,” Jan. 19, 2006, pp. 1-27.
ISO/IEC JTC1/SC29/WG11 N7292 [11172-6 Study on FCD], “Information Technology—Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbit/s—Part 6: Specification of Accuracy Requirements for Implementation of Integer Inverse Discrete Cosine Transform,” IEEE Standard 1180-1990, pp. 1-14, Approved Dec. 6, 1990.
M12984: Gary J. Sullivan, “On the project for a fixed-point IDCT and DCT standard,” Jan. 2006, Bangkok, Thailand.
M13004: Yuriy A. Reznik, Arianne T. Hinds, Honggang Qi, and Siwei Ma, “On Joint Implementation of Inverse Quantization and IDCT scaling,” Jan. 2006, Bangkok, Thailand.
M13005: Yuriy A. Reznik, “Considerations for choosing precision of MPEG fixed point 8x8 IDCT Standard,” Jan. 2006, Bangkok, Thailand.
M13326: Yuriy A. Reznik and Arianne T. Hinds, “Proposed Core Experiment on Convergence of Scaled and Non-Scaled IDCT Architectures,” Apr. 1, 2006, Montreux, Switzerland.
Taiwan Search Report TW096112540, Dec. 13, 2009.
W. Chen, C.H. Smith and S.C. Fralick, “A Fast Computational Algorithm for the Discrete Cosine Transform,” IEEE Transactions on Communications, vol. COM-25, No. 9, pp. 1004-1009, Sep. 1977.
Written Opinion—PCT/US2007/066243, International Search Authority, European Patent Office, Jun. 9, 2007.
Y. Arai, T. Agui, and M. Nakajima, “A Fast DCT-SQ Scheme for Images,” Transactions of the IEICE, vol. E71, No. 11, Nov. 1988, pp. 1095-1097.
Athineos, Marios et al., “Frequency-Domain Linear Prediction for Temporal Features,” Proceedings of ASRU 2003, Nov. 30-Dec. 4, 2003, St. Thomas, USVI.
Athineos, Marios et al., “PLP2: Autoregressive modeling of auditory-like 2-D spectro-temporal patterns,” Proceedings of the Workshop on Statistical and Perceptual Audio Processing (SAPA-2004), paper 129, Oct. 3, 2004, Jeju, Korea.
Christensen, Mads Graesboll et al., “Computationally Efficient Amplitude Modulated Sinusoidal Audio Coding Using Frequency-Domain Linear Prediction,” ICASSP 2006 Proceedings, Toulouse, France, IEEE Signal Processing Society, vol. 5, May 14-19, 2006.
Fousek, Petr, “Doctoral Thesis: Extraction of Features for Automatic Recognition of Speech Based on Spectral Dynamics,” Czech Technical University in Prague, Czech Republic, Mar. 2007.
Herre, Jurgen, “Temporal Noise Shaping, Quantization and Coding Methods in Perceptual Audio Coding: A Tutorial Introduction,” Proceedings of the AES 17th International Conference: High-Quality Audio Coding, Florence, Italy, Sep. 2-5, 1999.
Skoglund, Jan et al., “On Time-Frequency Masking in Voiced Speech,” IEEE Transactions on Speech and Audio Processing, IEEE Service Center, New York, NY, US, vol. 8, No. 4, Jul. 1, 2000. XP011054031.
Jesteadt, Walt et al., “Forward Masking as a Function of Frequency, Masker Level and Signal Delay,” J. Acoust. Soc. Am., 71(4), Apr. 1982, pp. 950-962.
Johnston, J.D., “Transform Coding of Audio Signals Using Perceptual Noise Criteria,” IEEE Journal on Selected Areas in Communications, IEEE Service Center, Piscataway, US, vol. 6, No. 2, Feb. 1, 1988, pp. 314-323. XP002003779.
Athineos, Marios and Ellis, Daniel P.W., “Autoregressive modeling of temporal envelopes,” IEEE Transactions on Signal Processing, IEEE Service Center, New York, NY, US, ISSN 1053-587X, Jun. 2007, pp. 1-9. XP002501759.
Motlicek, Petr et al., “Speech Coding Based on Spectral Dynamics,” Lecture Notes in Computer Science, vol. 4188/2006, Springer, Berlin/Heidelberg, DE, Sep. 2006.
Motlicek, Petr et al., “Wide-Band Perceptual Audio Coding Based on Frequency-Domain Linear Prediction,” Proceedings of ICASSP 2007, IEEE Signal Processing Society, Apr. 2007, pp. I-265 to I-268.
Sinaga, F. et al., “Wavelet packet based audio coding using temporal masking,” Information, Communications and Signal Processing, 2003 and Fourth Pacific Rim Conference on Multimedia: Proceedings of the 2003 Joint Conference of the Fourth International Conference on, Singapore, Dec. 15-18, 2003, Piscataway, NJ, USA, IEEE, vol. 3, Dec. 15, 2003, pp. 1380-1383. XP010702139.
Ganapathy, Sriram et al., “Temporal masking for bit-rate reduction in audio codec based on Frequency Domain Linear Prediction,” Acoustics, Speech and Signal Processing, 2008 (ICASSP 2008), IEEE International Conference on, IEEE, Piscataway, NJ, USA, Mar. 31, 2008, pp. 4781-4784. XP031251668.
Zwicker et al., “Psychoacoustics: Facts and Models,” Second Updated Edition with 289 Figures, pp. 78-110, Jan. 1999.
Number | Date | Country
---|---|---
20070239440 A1 | Oct 2007 | US

Number | Date | Country
---|---|---
60791042 | Apr 2006 | US