Perceptual Transform Coding
With the introduction of portable digital media players, the compact disc for music storage, and audio delivery over the Internet, it is now common to store, buy and distribute music and other audio content in digital audio formats. Digital audio formats empower people to keep hundreds or thousands of songs on their personal computers (PCs) or portable media players.
One benefit of digital audio formats is that an appropriate bit-rate (compression ratio) can be selected according to given constraints, e.g., file size and audio quality. On the other hand, no single bit-rate can cover all audio application scenarios. For instance, higher bit-rates may not be suitable for portable devices due to limited storage capacity. By contrast, higher bit-rates are better suited to the high-quality sound reproduction desired by audiophiles.
When audio content is not at a suitable bit-rate for the application scenario (e.g., when high bit-rate audio is desired to be loaded onto a portable device or transferred via the Internet), a way to change the bit-rate of the audio file is needed. One known solution for this is to use a transcoder, which takes one compressed audio bitstream that is coded at one bit-rate as its input and re-encodes the audio content to a new bit-rate.
The following Detailed Description concerns various transcoding techniques and tools that provide a way to modify the bit-rate of a compressed digital audio bitstream.
More particularly, the novel transcoding approach presented herein encodes additional side information in a compressed bitstream to preserve information used in certain stages of encoding. A transcoder uses this side information to skip certain encoding stages when transcoding the compressed bitstream to a different (e.g., lower) bit-rate. In particular, by encoding certain side information into an initially encoded compressed bitstream, the transcoder can skip certain computationally intensive encoding processes, such as the time-frequency transform, pre-processing and quality-based bit-rate control. Using the preserved side information coded into the initial compressed bitstream, the transcoder avoids having to fully decode the initial compressed bitstream into a reconstructed time-sampled audio signal, and avoids a full re-encoding of that reconstructed signal to the new target bit-rate. With those processing stages omitted, the transcoder instead need only partially decode the initial compressed bitstream and partially re-encode it to the new target bit-rate. In addition, the side information can contain information that can be derived only from the original signal, which can yield a higher-quality transcoding.
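The stage skipping described above can be sketched as follows. This is an illustrative outline, not part of the specification: the stage names and the set of skipped stages are assumptions chosen to mirror the encoder and decoder modules described later in the document.

```python
# Hypothetical sketch contrasting a conventional full decode/re-encode
# transcoder with the partial transcoder that reuses encoder-generated
# side information. Stage names are illustrative only.

FULL_DECODE = ["entropy_decode", "inverse_quantize", "inverse_weight",
               "inverse_multichannel", "inverse_freq_transform"]
FULL_ENCODE = ["freq_transform", "multichannel", "perception_model",
               "weight", "quantize", "entropy_encode"]

# With side information available, the costly frequency transforms and
# the perception model can be skipped: the transcoder stays in the
# frequency domain and reuses the encoder-derived parameters.
SKIPPED = {"inverse_freq_transform", "freq_transform", "perception_model"}

def partial_pipeline():
    """Return the reduced stage list run by the side-info transcoder."""
    decode = [s for s in FULL_DECODE if s not in SKIPPED]
    encode = [s for s in FULL_ENCODE if s not in SKIPPED]
    return decode + encode
```

The sketch makes the key property visible: neither frequency transform appears in the reduced pipeline, so no time-sampled signal is ever reconstructed.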
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Additional features and advantages of the invention will be made apparent from the following detailed description of embodiments that proceeds with reference to the accompanying drawings.
Various techniques and tools for fast and high quality transcoding of digital audio content are described. These techniques and tools facilitate the transcoding of an audio content bitstream encoded at an initial bit-rate into a target bit-rate suitable to another application or usage scenario for further storage, transmission and distribution of the audio content.
The various techniques and tools described herein may be used independently. Some of the techniques and tools may be used in combination (e.g., in different phases of a transcoding process).
Various techniques are described below with reference to flowcharts of processing acts. The various processing acts shown in the flowcharts may be consolidated into fewer acts or separated into more acts. For the sake of simplicity, the relation of acts shown in a particular flowchart to acts described elsewhere is often not shown. In many cases, the acts in a flowchart can be reordered.
Much of the detailed description addresses representing, coding, decoding and transcoding audio information. Many of the techniques and tools described herein for representing, coding, decoding and transcoding audio information can also be applied to video information, still image information, or other media information sent in single or multiple channels.
I. Example Audio Encoders and Decoders
Though the systems shown in
A. Audio Encoder
The encoder 200 receives a time series of input audio samples 205 at some sampling depth and rate. The input audio samples 205 are for multi-channel audio (e.g., stereo) or mono audio. The encoder 200 compresses the audio samples 205 and multiplexes information produced by the various modules of the encoder 200 to output a bitstream 295 in a compression format such as a WMA format, a container format such as Advanced Streaming Format (“ASF”), or other compression or container format.
The frequency transformer 210 receives the audio samples 205 and converts them into data in the frequency (or spectral) domain. For example, the frequency transformer 210 splits the audio samples 205 of frames into sub-frame blocks, which can have variable size to allow variable temporal resolution. Blocks can overlap to reduce perceptible discontinuities between blocks that could otherwise be introduced by later quantization. The frequency transformer 210 applies to blocks a time-varying Modulated Lapped Transform (“MLT”), modulated DCT (“MDCT”), some other variety of MLT or DCT, or some other type of modulated or non-modulated, overlapped or non-overlapped frequency transform, or uses sub-band or wavelet coding. The frequency transformer 210 outputs blocks of spectral coefficient data and outputs side information such as block sizes to the multiplexer (“MUX”) 280.
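The MDCT mentioned above can be written compactly. The following is a minimal reference sketch of the standard MDCT definition (a block of 2N windowed samples mapped to N spectral coefficients, giving the 50% block overlap the text describes); it is a textbook direct-form computation, not the encoder's actual implementation, and omits windowing.

```python
import math

def mdct(block):
    """Direct-form MDCT: a block of 2N (already windowed) samples is
    mapped to N spectral coefficients. Consecutive blocks overlap by
    N samples, which is what lets later quantization discontinuities
    cancel at block boundaries."""
    two_n = len(block)
    n = two_n // 2
    return [
        sum(block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(two_n))
        for k in range(n)
    ]
```

A real encoder computes this via an FFT for speed; the direct form is O(N^2) but shows the transform's structure.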
For multi-channel audio data, the multi-channel transformer 220 can convert the multiple original, independently coded channels into jointly coded channels. Or, the multi-channel transformer 220 can pass the left and right channels through as independently coded channels. The multi-channel transformer 220 produces side information to the MUX 280 indicating the channel mode used. The encoder 200 can apply multi-channel rematrixing to a block of audio data after a multi-channel transform.
The perception modeler 230 models properties of the human auditory system to improve the perceived quality of the reconstructed audio signal for a given bitrate. The perception modeler 230 uses any of various auditory models and passes excitation pattern information or other information to the weighter 240. For example, an auditory model typically considers the range of human hearing and critical bands (e.g., Bark bands). Aside from range and critical bands, interactions between audio signals can dramatically affect perception. In addition, an auditory model can consider a variety of other factors relating to physical or neural aspects of human perception of sound.
The perception modeler 230 outputs information that the weighter 240 uses to shape noise in the audio data to reduce the audibility of the noise. For example, using any of various techniques, the weighter 240 generates weighting factors for quantization matrices (sometimes called masks) based upon the received information. The weighting factors for a quantization matrix include a weight for each of multiple quantization bands in the matrix, where the quantization bands are frequency ranges of frequency coefficients. Thus, the weighting factors indicate proportions at which noise/quantization error is spread across the quantization bands, thereby controlling spectral/temporal distribution of the noise/quantization error, with the goal of minimizing the audibility of the noise by putting more noise in bands where it is less audible, and vice versa.
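The per-band weighting described above can be sketched as a small function. This is an illustrative model, not the encoder's code: the (start, stop, weight) band representation and the convention that coefficients are divided by the band weight before uniform quantization are assumptions for the sketch.

```python
def apply_band_weights(coeffs, bands):
    """Shape quantization noise across frequency bands.

    bands is a list of (start, stop, weight) ranges over the
    coefficient indices. Dividing a coefficient by a larger weight
    shrinks it before the uniform quantizer that follows, so more
    quantization error lands in that band when the signal is
    reconstructed, i.e. noise is steered to less audible bands."""
    out = list(coeffs)
    for start, stop, w in bands:
        for i in range(start, stop):
            out[i] = coeffs[i] / w
    return out
```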
The weighter 240 then applies the weighting factors to the data received from the multi-channel transformer 220.
The quantizer 250 quantizes the output of the weighter 240, producing quantized coefficient data to the entropy encoder 260 and side information including quantization step size to the MUX 280. In
The entropy encoder 260 losslessly compresses quantized coefficient data received from the quantizer 250, for example, performing run-level coding and vector variable length coding. The entropy encoder 260 can compute the number of bits spent encoding audio information and pass this information to the rate/quality controller 270.
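Run-level coding, one of the lossless techniques named above, can be illustrated with a round-trip sketch. This is a generic form of the technique under simplifying assumptions (no end-of-block code, no variable-length coding of the pairs), not the encoder's actual format.

```python
def run_level_encode(quantized):
    """Emit each nonzero coefficient as a (run of preceding zeros,
    level) pair. A real encoder would then variable-length code the
    pairs and signal the trailing zero tail with an end-of-block code."""
    pairs, run = [], 0
    for q in quantized:
        if q == 0:
            run += 1
        else:
            pairs.append((run, q))
            run = 0
    return pairs

def run_level_decode(pairs, length):
    """Inverse of run_level_encode; pads the implicit zero tail."""
    out = []
    for run, level in pairs:
        out.extend([0] * run)
        out.append(level)
    out.extend([0] * (length - len(out)))
    return out
```

Quantized spectra are mostly zeros at low bit-rates, which is why this representation compresses well.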
The controller 270 works with the quantizer 250 to regulate the bitrate and/or quality of the output of the encoder 200. The controller 270 outputs the quantization step size to the quantizer 250 with the goal of satisfying bitrate and quality constraints.
In addition, the encoder 200 can apply noise substitution and/or band truncation to a block of audio data.
The MUX 280 multiplexes the side information received from the other modules of the audio encoder 200 along with the entropy encoded data received from the entropy encoder 260. The MUX 280 can include a virtual buffer that stores the bitstream 295 to be output by the encoder 200.
B. Audio Decoder
The decoder 300 receives a bitstream 305 of compressed audio information including entropy encoded data as well as side information, from which the decoder 300 reconstructs audio samples 395.
The demultiplexer (“DEMUX”) 310 parses information in the bitstream 305 and sends information to the modules of the decoder 300. The DEMUX 310 includes one or more buffers to compensate for short-term variations in bitrate due to fluctuations in complexity of the audio, network jitter, and/or other factors.
The entropy decoder 320 losslessly decompresses entropy codes received from the DEMUX 310, producing quantized spectral coefficient data. The entropy decoder 320 typically applies the inverse of the entropy encoding techniques used in the encoder.
The inverse quantizer 330 receives a quantization step size from the DEMUX 310 and receives quantized spectral coefficient data from the entropy decoder 320. The inverse quantizer 330 applies the quantization step size to the quantized frequency coefficient data to partially reconstruct the frequency coefficient data, or otherwise performs inverse quantization.
From the DEMUX 310, the noise generator 340 receives information indicating which bands in a block of data are noise substituted as well as any parameters for the form of the noise. The noise generator 340 generates the patterns for the indicated bands, and passes the information to the inverse weighter 350.
The inverse weighter 350 receives the weighting factors from the DEMUX 310, patterns for any noise-substituted bands from the noise generator 340, and the partially reconstructed frequency coefficient data from the inverse quantizer 330. As necessary, the inverse weighter 350 decompresses weighting factors. The inverse weighter 350 applies the weighting factors to the partially reconstructed frequency coefficient data for bands that have not been noise substituted. The inverse weighter 350 then adds in the noise patterns received from the noise generator 340 for the noise-substituted bands.
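The inverse weighter's two cases (apply weights, or substitute generated noise) can be sketched as follows. The band representation and the marker convention (None for a band that is not noise-substituted) are assumptions for illustration.

```python
def inverse_weight(partial_coeffs, bands, noise_patterns):
    """Reverse the encoder's weighting stage.

    bands: list of (start, stop, weight) coefficient ranges.
    noise_patterns: one entry per band; a list of noise samples for a
    noise-substituted band, or None for a normally coded band."""
    out = list(partial_coeffs)
    for (start, stop, w), noise in zip(bands, noise_patterns):
        if noise is not None:
            # Noise-substituted band: coded coefficients are replaced
            # by the pattern from the noise generator.
            out[start:stop] = noise
        else:
            # Normally coded band: undo the encoder's division by the
            # band weight.
            for i in range(start, stop):
                out[i] = partial_coeffs[i] * w
    return out
```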
The inverse multi-channel transformer 360 receives the reconstructed spectral coefficient data from the inverse weighter 350 and channel mode information from the DEMUX 310. If multi-channel audio is in independently coded channels, the inverse multi-channel transformer 360 passes the channels through. If multi-channel data is in jointly coded channels, the inverse multi-channel transformer 360 converts the data into independently coded channels.
The inverse frequency transformer 370 receives the spectral coefficient data output by the inverse multi-channel transformer 360 as well as side information such as block sizes from the DEMUX 310. The inverse frequency transformer 370 applies the inverse of the frequency transform used in the encoder and outputs blocks of reconstructed audio samples 395.
II. Transcoding Using Encoder Generated Side Information
The encoder also encodes side information in the bitstream 415 for use by the bit-rate transcoder 420 when transcoding the bitstream. This side information for transcoding generally includes information such as encoding parameters that are generated during the encoding process and typically discarded by an encoder when encoding for single bit-rate applications. These encoding parameters are derived from the original source audio input, which is otherwise unavailable to the bit-rate transcoder 420 at the other location, time or setting.
The encoder 410 can quantize the side information, so as to reduce the increase in bit-rate that the side information otherwise adds to the compressed bitstream 415. At very low bit-rates, the side information is quantized down to 1 kbps, which is generally a negligible bit-rate increase in many applications. In some embodiments of the bit-rate transcoder, such a small bit-rate increase can even permit the encoder to code multiple versions of the side information to support transcoding to different bitstream formats.
The bit-rate transcoder 420 receives the bitstream 415 that is encoded at the initial bit-rate and transcodes the bitstream using the side information to produce another transcoded bitstream (“Bitstream 1”) 425 having a second bit-rate suitable to the other application. Because audio information is lost when encoding the first bitstream to the initial bit-rate, the transcoding process cannot add audio information and therefore generally transcodes to a lower bit-rate. The bit-rate transcoder 420 also may pass the side information into the bitstream 425. However, because audio information would be lost with each transcoding to a lower bit-rate, it generally would not be desirable to cascade transcodings of the audio content to successively lower bit-rates, so the bit-rate transcoder 420 generally omits encoding the side information into the transcoded bitstream 425.
Each of the bitstreams 415 and 425 can then be stored, transmitted or otherwise distributed in their respective application scenarios to be decoded by decoders 430, 440. The decoders 430, 440 can be identical decoders (e.g., such as the decoder 300 described above), each capable of handling multiple bit-rates of encoded bitstreams. The decoders 430, 440 reconstruct the audio content as their output 435, 445 in their respective application scenarios.
The partial decoder performs various processing stages of the full audio decoder 300 (
The bit-rate transcoder 420 also includes a side information decoder 530 that decodes the encoder-generated side information from the input bitstream 415. The side information consists of useful encoding parameters obtained from processing of the original input audio samples 205 (
This side information in the illustrated implementation includes multi-channel (e.g., stereo) processing parameters and rate control parameters. The rate control parameters can be data characterizing a quality-to-quantization-step-size curve that is utilized for rate control by the rate/quality controller 270 with the quantizer 250. In one specific example, the quality-to-quantization-step-size curve can be a noise-to-mask ratio (NMR) versus quantization step size curve. This NMR curve is utilized by the rate/quality controller 270 of the audio encoder 200 to dynamically determine the quantizer step size needed to achieve a desired bit-rate of the output bitstream 295. Techniques for utilizing the NMR curve for audio compression rate control are described in more detail by Chen et al., U.S. Pat. No. 7,027,982, entitled “Quality And Rate Control Strategy For Digital Audio.” The NMR curve can be modeled simply enough that it is fully characterized by encoding a few anchor points along the curve. The side information representing the NMR curve thus can be compactly encoded in the bitstream 415 using a relatively small proportion of the overall bit-rate.
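The anchor-point representation can be sketched as follows: given a few (step size, NMR) anchors, the controller can interpolate the step size that meets a target NMR. This is an illustrative linear interpolation under assumed conventions (NMR grows, i.e. quality worsens, as the step size grows), not the method of the cited patent.

```python
from bisect import bisect_left

def step_for_quality(anchors, target_nmr):
    """Interpolate the quantization step size that achieves a target
    noise-to-mask ratio.

    anchors: short list of (step_size, nmr) points, sorted by step
    size, characterizing the quality-vs-step-size curve; NMR is
    assumed to increase monotonically with step size. Out-of-range
    targets clamp to the nearest anchor."""
    steps = [s for s, _ in anchors]
    nmrs = [q for _, q in anchors]
    if target_nmr <= nmrs[0]:
        return steps[0]
    if target_nmr >= nmrs[-1]:
        return steps[-1]
    j = bisect_left(nmrs, target_nmr)          # first anchor at/above target
    s0, s1 = steps[j - 1], steps[j]
    q0, q1 = nmrs[j - 1], nmrs[j]
    return s0 + (s1 - s0) * (target_nmr - q0) / (q1 - q0)
```

Because only the anchor points need to be transmitted, the curve costs the bitstream only a handful of values per frame.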
In other implementations of the bit-rate transcoder 420, the side information also can include information used for other coding techniques at the encoder. For example, the side information can include encoding parameters used by the encoder for frequency extension coding techniques, such as described by Mehrotra et al., U.S. Patent Application Publication No. 20050165611, entitled, “Efficient Coding Of Digital Media Spectral Data Using Wide-Sense Perceptual Similarity.”
The side information decoder 530 passes the decoded encoding parameters to a parameter adjuster 540. Based on these encoding parameters, the parameter adjuster 540 adjusts processing by the multi-channel transformer 220 and bark weighter 240 in the partial encoder 520. In the case of the multi-channel transformer 220, the adjustments can include parameters used in channel pre-processing or modifying the channel transform being used for the output bitstream 425. In the case of the bark weighter 240, the adjustments can include modifying the bark weights used by the bark weighter based on the encoding parameters.
The side information decoder 530 also passes the decoded NMR curve data to a bit-rate/quality controller 550 that controls quantization by the quantizer 250, so as to adjust encoding to the new target bit-rate. Because the encoding parameters that were passed as side information in the input bitstream are generated by the encoder 410 from the original input audio samples 405, the channel transformer 220, bark weighter 240, and bit-rate/quality controlled quantizer 250 are able to perform their respective encoding at the new target bit-rate while preserving nearly the same quality as a one-time encoding of a bitstream to the new target bit-rate from the original input audio samples 405.
Further, because the bit-rate transcoder 420 is able to adjust the parameters for the multi-channel transformer and bark weighter stages based on the side information generated by the encoder 410 from the original input audio samples 405, the bit-rate transcoder is able to avoid having to fully reconstruct the audio samples before re-encoding to the new target bit-rate. In other words, the decoder portion of the bit-rate transcoder is able to omit the inverse frequency transformer 370 of a full decoder, and the bit-rate transcoder's encoder portion omits the forward frequency transformer 210. The adjustment of the encoding parameters to the new target bit-rate is much less complex and takes much less computation than the inverse and forward frequency transform, which provides faster transcoding by the bit-rate transcoder 420 compared to the full decode-and-encode approach of the prior art transcoder 100 (
III. Computing Environment
The bit-rate transcoder can be implemented in digital audio processing equipment of various forms, from specialized audio processing hardware such as professional studio-grade audio encoding equipment to end-user audio devices (consumer audio equipment, and even portable digital media players). In a common implementation, the bit-rate transcoder can be implemented using a computer, such as a server, personal computer, laptop or the like. These various hardware implementations provide a generalized computing environment in which the transcoding technique described herein is performed.
With reference to
A computing environment may have additional features. For example, the computing environment 700 includes storage 740, one or more input devices 750, one or more output devices 760, and one or more communication connections 770. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment 700. Typically, operating system software (not shown) provides an operating environment for software executing in the computing environment 700 and coordinates activities of the components of the computing environment 700.
The storage 740 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CDs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment 700. The storage 740 stores instructions for the software 780.
The input device(s) 750 may be a keyboard, mouse, pen, touchscreen or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment 700. For audio or video, the input device(s) 750 may be a microphone, sound card, video card, TV tuner card, or similar device that accepts audio or video input in analog or digital form, or a CD or DVD that reads audio or video samples into the computing environment. The output device(s) 760 may be a display, printer, speaker, CD/DVD-writer, network adapter, or another device that provides output from the computing environment 700.
The communication connection(s) 770 enable communication over a communication medium to one or more other computing entities. The communication medium conveys information such as computer-executable instructions, audio or video information, or other data in a data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
Embodiments can be described in the general context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, with the computing environment 700, computer-readable media include memory 720, storage 740, and combinations of any of the above.
Embodiments can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment.
For the sake of presentation, the detailed description uses terms like “determine,” “receive,” and “perform” to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.
In view of the many possible embodiments to which the principles of our invention may be applied, we claim as our invention all such embodiments as may come within the scope and spirit of the following claims and equivalents thereto.
Number | Name | Date | Kind |
---|---|---|---|
4142071 | Croisier et al. | Feb 1979 | A |
4216354 | Esteban et al. | Aug 1980 | A |
4464783 | Beraud et al. | Aug 1984 | A |
5243420 | Hibi | Sep 1993 | A |
5341457 | Hall et al. | Aug 1994 | A |
5381143 | Shimoyoshi et al. | Jan 1995 | A |
5454011 | Shimoyoshi | Sep 1995 | A |
5461378 | Shimoyoshi et al. | Oct 1995 | A |
5463424 | Dressler | Oct 1995 | A |
5537440 | Eyuboglu et al. | Jul 1996 | A |
5541852 | Eyuboglu et al. | Jul 1996 | A |
5544266 | Koppelmans et al. | Aug 1996 | A |
5617142 | Hamilton | Apr 1997 | A |
5623424 | Azadegan et al. | Apr 1997 | A |
5659660 | Plenge et al. | Aug 1997 | A |
5835495 | Ferriere | Nov 1998 | A |
5970173 | Lee et al. | Oct 1999 | A |
6044089 | Ferriere | Mar 2000 | A |
6084909 | Chiang et al. | Jul 2000 | A |
6259741 | Chen et al. | Jul 2001 | B1 |
6370502 | Wu et al. | Apr 2002 | B1 |
6393059 | Sugiyama | May 2002 | B1 |
6404814 | Apostolopoulos et al. | Jun 2002 | B1 |
6426977 | Lee et al. | Jul 2002 | B1 |
6434197 | Wang et al. | Aug 2002 | B1 |
6463414 | Su et al. | Oct 2002 | B1 |
6466623 | Youn et al. | Oct 2002 | B1 |
6496216 | Feder | Dec 2002 | B2 |
6496868 | Krueger et al. | Dec 2002 | B2 |
6522693 | Lu et al. | Feb 2003 | B1 |
6526099 | Christopoulos et al. | Feb 2003 | B1 |
6606600 | Murgia et al. | Aug 2003 | B1 |
6647061 | Panusopone et al. | Nov 2003 | B1 |
6650705 | Vetro et al. | Nov 2003 | B1 |
6678654 | Zinser, Jr. et al. | Jan 2004 | B2 |
6728317 | Demos | Apr 2004 | B1 |
6757648 | Chen et al. | Jun 2004 | B2 |
6925501 | Wang et al. | Aug 2005 | B2 |
6931064 | Mori et al. | Aug 2005 | B2 |
6934334 | Yamaguchi et al. | Aug 2005 | B2 |
6944224 | Zhao | Sep 2005 | B2 |
6961377 | Kingsley | Nov 2005 | B2 |
7009935 | Abrahamsson et al. | Mar 2006 | B2 |
7027982 | Chen et al. | Apr 2006 | B2 |
7039116 | Zhang et al. | May 2006 | B1 |
7058127 | Lu et al. | Jun 2006 | B2 |
7142601 | Kong et al. | Nov 2006 | B2 |
7295612 | Haskell | Nov 2007 | B2 |
7318027 | Lennon et al. | Jan 2008 | B2 |
7408918 | Ramalho | Aug 2008 | B1 |
7953604 | Mehrotra et al. | May 2011 | B2 |
20020080877 | Lu et al. | Jun 2002 | A1 |
20020172154 | Uchida et al. | Nov 2002 | A1 |
20030206597 | Kolarov et al. | Nov 2003 | A1 |
20030227974 | Nakamura et al. | Dec 2003 | A1 |
20040136457 | Funnell et al. | Jul 2004 | A1 |
20040165667 | Lennon et al. | Aug 2004 | A1 |
20050041740 | Sekiguchi | Feb 2005 | A1 |
20050075869 | Gersho et al. | Apr 2005 | A1 |
20050165611 | Mehrotra et al. | Jul 2005 | A1 |
20050232497 | Yogeshwar et al. | Oct 2005 | A1 |
20060069550 | Todd et al. | Mar 2006 | A1 |
20060120610 | Kong et al. | Jun 2006 | A1 |
20060245491 | Jam et al. | Nov 2006 | A1 |
20070053444 | Shibata et al. | Mar 2007 | A1 |
20070058718 | Shen et al. | Mar 2007 | A1 |
20070058729 | Yoshinari | Mar 2007 | A1 |
20070219787 | Manjunath et al. | Sep 2007 | A1 |
20080021704 | Thumpudi et al. | Jan 2008 | A1 |
20080071528 | Ubale et al. | Mar 2008 | A1 |
20080144723 | Chen et al. | Jun 2008 | A1 |
20080187046 | Joch | Aug 2008 | A1 |
20080221908 | Thumpudi et al. | Sep 2008 | A1 |
20080234846 | Malvar | Sep 2008 | A1 |
20080259921 | Nadarajah | Oct 2008 | A1 |
20080260048 | Oomen et al. | Oct 2008 | A1 |
20090106031 | Jax et al. | Apr 2009 | A1 |
20120035941 | Thumpudi et al. | Feb 2012 | A1 |
Number | Date | Country |
---|---|---|
2002-152752 | May 2002 | JP |
2005-252555 | Sep 2005 | JP |
10-2005-0089720 | Sep 2005 | KR |
WO 0195633 | Dec 2001 | WO |
Entry |
---|
WIPO Bibliographic Document, WO/2007/131886, Publication Date: Nov. 22, 2007, Two Pages. |
WIPO Bibliographic Document, WO/2005/078707, Publication Date: Aug. 25, 2005, Two Pages. |
Khan et al., “Architecture Overview of Motion Vector Reuse Mechanism in MPEG-2 Transcoding,” Technical Report TR2001-01-01, Jan. 2001, 7 pp. |
Moshnyaga, “An Implementation of Data Reusable MPEG Video Coding Scheme,” Proceedings of World Academy of Science, Engineering and Technology, vol. 2, pp. 193-196, Jan. 2005. |
Moshnyaga, “Reduction of Memory Accesses in Motion Estimation by Block-Data Reuse,” ICASSP '02 Proceedings, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. III-3128-III-3131, May 2002. |
Nasim et al., “Architectural Optimizations for Software-Based MPEG4 Video Encoder,” 13th European Signal Processing Conference: EUSIPCO'2005, 4 pp., Sep. 2005. |
Senda et al., “A Realtime Software MPEG Transcoder Using a Novel Motion Vector Reuse and SIMD Optimization Techniques,” ICASSP '99 Proceedings, 1999 IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 4, pp. 2359-2362, Mar. 1999. |
Zhou et al., “Motion Vector Reuse Algorithm to Improve Dual-Stream Video Encoder,” ICSP 2008, 9th International Conference on Signal Processing, pp. 1283-1286, Oct. 2008. |
Kamikura K., Watanabe H., Jozawa H., Kotera H., and Ichinose S., “Global brightness-variation compensation for video coding,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 8, no. 8, pp. 988-1000 (Dec. 1998). |
Takahashi K., Satou K., Suzuki T., and Yagasaki Y., “Motion Vector Synthesis Algorithm for MPEG2-to-MPEG4 Transcoder,” Proc. of SPIE, vol. 4310, pp. 872-882 (Jan. 2001). |
U.S. Appl. No. 60/341,674, filed Dec. 17, 2001, Lee et al. |
U.S. Appl. No. 60/488,710, filed Jul. 18, 2003, Srinivasan et al. |
U.S. Appl. No. 60/501,081, filed Sep. 7, 2003, Srinivasan et al. |
U.S. Appl. No. 60/501,133, filed Sep. 7, 2003, Holcomb et al. |
Acharya et al., “Compressed Domain Transcoding of MPEG,” Proc. IEEE Int'l Conf. on Multimedia Computing and Systems, Austin, Texas, 20 pp. (Jun. 1998). |
Amir et al., “An Application Level Video Gateway,” Proc. ACM Multimedia 95, 10 pp. (Nov. 1995). |
Assuncao et al., “A Frequency-Domain Video Transcoder for Dynamic Bit-Rate Reduction of MPEG-2 Bit Streams,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, No. 8, pp. 953-967 (Dec. 1998).
Assuncao et al., “Buffer Analysis and Control in CBR Video Transcoding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, No. 1, pp. 83-92 (Feb. 2000).
Assuncao et al., “Transcoding of Single-Layer MPEG Video Into Lower Rates,” IEE Proc.—Vis. Image Signal Process., vol. 144, No. 6, pp. 377-383 (Dec. 1997).
Braun et al., “Motion-Compensating Real-Time Format Converter for Video on Multimedia Displays,” Proceedings IEEE 4th International Conference on Image Processing (ICIP-97), vol. I, pp. 125-128 (1997).
Brightwell et al., “Flexible Switching and Editing of MPEG-2 Video Bitstreams,” IBC-97, Amsterdam, 11 pp. (1997).
Crooks, “Analysis of MPEG Encoding Techniques on Picture Quality,” Tektronix Application Note, 11 pp. (Jun. 1998).
Dipert, “Image Compression Article Addendum,” EDN Magazine, 8 pp. (Jun. 18, 1998).
Fogg, “Question That Should Be Frequently Asked About MPEG,” Version 3.8, 46 pp. (1996).
Gibson et al., Digital Compression for Multimedia, “Chapter 4: Quantization,” Morgan Kaufman Publishers, Inc., pp. 113-138 (1998).
Gibson et al., Digital Compression for Multimedia, “Chapter 7: Frequency Domain Coding,” Morgan Kaufman Publishers, Inc., pp. 227-262 (1998).
Hamming, Digital Filters, Second Edition, “Chapter 2: The Frequency Approach,” Prentice-Hall, Inc., pp. 19-31 (1977).
Haskell et al., Digital Video: An Introduction to MPEG-2, Table of Contents, International Thomson Publishing, 6 pp. (1997).
ISO/IEC, “ISO/IEC 11172-2, Information Technology—Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s—Part 2: Video,” 112 pp. (1993).
ISO/IEC, “JTC1/SC29/WG11 N2202, Information Technology—Coding of Audio-Visual Objects: Visual, ISO/IEC 14496-2,” 329 pp. (1998).
ITU-T, “ITU-T Recommendation H.261, Video Codec for Audiovisual Services at p x 64 kbits,” 25 pp. (1993).
ITU-T, “ITU-T Recommendation H.262, Information Technology—Generic Coding of Moving Pictures and Associated Audio Information: Video,” 205 pp. (1995).
ITU-T, “ITU-T Recommendation H.263, Video Coding for Low Bit Rate Communication,” 162 pp. (1998).
Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, “Joint Final Committee Draft (JFCD) of Joint Video Specification,” JVT-D157, 207 pp. (Aug. 2002).
Keesman et al., “Transcoding of MPEG Bitstreams,” Signal Processing: Image Communication 8, pp. 481-500 (1996).
Knee et al., “Seamless Concatenation—A 21st Century Dream,” 13 pp. (1997).
Lei et al., “Rate Adaptation Transcoding for Precoded Video Streams,” 13 pp. (2000).
Leventer et al., “Towards Optimal Bit-Rate Control in Video Transcoding,” ICIP, xx pp. (2003).
Microsoft Corporation, “Microsoft Debuts New Windows Media Player 9 Series, Redefining Digital Media on the PC,” 4 pp. (Sep. 4, 2002) [Downloaded from the World Wide Web on May 14, 2004].
Mook, “Next-Gen Windows Media Player Leaks to the Web,” BetaNews, 17 pp. (Jul. 2002) [Downloaded from the World Wide Web on Aug. 8, 2003].
Printouts of FTP directories from http://ftp3.itu.ch, 8 pp. [Downloaded from the World Wide Web on Sep. 20, 2005].
Reader, “History of MPEG Video Compression—Ver. 4.0,” 99 pp. [Document marked Dec. 16, 2003].
Shanableh et al., “Heterogeneous Video Transcoding to Lower Spatio-Temporal Resolutions and Different Encoding Formats,” IEEE Transactions on Multimedia, 31 pp. (Jun. 2000).
Shanableh et al., “Transcoding of Video Into Different Encoding Formats,” ICASSP—2000 Proceedings, vol. IV of VI, pp. 1927-1930 (Jun. 2000).
SMPTE, “SMPTE 327M-2000—MPEG-2 Video Recoding Data Set,” 9 pp. (Jan. 2000).
Sun et al., “Architectures for MPEG Compressed Bitstream Scaling,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, No. 2, pp. 191-199 (Apr. 1996).
Sun et al., “Lossless Coders,” Digital Signal Processing for Multimedia Systems, Chapter 15, pp. 385-416 (1999).
Swann et al., “Transcoding of MPEG-II for Enhanced Resilience to Transmission Errors,” Cambridge University Engineering Department, Cambridge, UK, pp. 1-4 (1996).
Tan et al., “On the Methods and Performances of Rational Downsizing Video Transcoding,” Signal Processing: Image Communication 19, pp. 47-65 (2004).
Tektronix Application Note, “Measuring and Interpreting Picture Quality in MPEG Compressed Video Content,” 8 pp. (2001).
Tudor et al., “Real-Time Transcoding of MPEG-2 Video Bit Streams,” BBC R&D, U.K., 6 pp. (1997).
Vishwanath et al., “A VLSI Architecture for Real-Time Hierarchical Encoding/Decoding of Video Using the Wavelet Transform,” Proc. ICASSP, 5 pp. (1994).
Watkinson, The MPEG Handbook, pp. 275-281 (2004).
Werner, “Generic Quantiser for Transcoding of Hybrid Video,” Proc. 1997 Picture Coding Symposium, Berlin, Germany, 6 pp. (Sep. 1997).
Werner, “Requantization for Transcoding of MPEG-2 Intraframes,” IEEE Transactions on Image Processing, vol. 8, No. 2, pp. 179-191 (Feb. 1999).
Wiegand et al., “Overview of the H.264/AVC Coding Standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, No. 7, pp. 560-576 (Jul. 2003).
Youn et al., “Video Transcoder Architectures for Bit Rate Scaling of H.263 Bit Streams,” ACM Multimedia 1999, Orlando, Florida, pp. 243-250 (1999).
“H.264—A New Technology for Video Compression,” Nuntius Systems, <http://www.nuntius.com/technology3.html>, 4 pp. (Mar. 6, 2004).
ISO/IEC, “MPEG-2 Test Model 5 (with Overview),” 10 pp. (Mar. 1993).
Kari et al., “Intensity Controlled Motion Compensation,” Proceedings of the Data Compression Conference (DCC '98), pp. 249-258 (Mar. 30-Apr. 1, 1998).
Roy et al., “Application Level Hand-off Support for Mobile Media Transcoding Sessions,” Proceedings of the 12th International Workshop on Network and Operating Systems Support for Digital Audio and Video, 22 pp. (2002).
Vetro et al., “Complexity-Quality Analysis of Transcoding Architectures for Reduced Spatial Resolution,” IEEE Transactions on Consumer Electronics, 9 pp. (2002).
Number | Date | Country
---|---|---
20090125315 A1 | May 2009 | US