The disclosed embodiments relate generally to video display technology, and more specifically to transcoding digital video data.
Transcoding is the direct digital-to-digital conversion of data encoded in one format into another format. Transcoding can be found in many areas of content adaptation and is often used to convert incompatible or obsolete data into a more suitable format. It is also used to archive or distribute content on different types of digital media for use in different playback devices, such as converting songs from CD format to MP3 format for playback on computers and MP3 players. Transcoding is also commonly used in the area of mobile phone content adaptation. In this case, transcoding is necessary due to the diversity of mobile devices and their capabilities. This diversity requires an intermediate stage of content adaptation in order to make sure that the source content will adequately play back on the target device.
One popular area in which transcoding is used is the Multimedia Messaging Service (MMS), which is the technology used to send or receive messages with media (image, sound, text and video) between mobile phones. For example, when a camera phone is used to take a digital picture, a high-quality image, usually of at least 640×480 resolution, is created. Sending the image to another phone may require that this high resolution image be transcoded to a lower resolution image with less color in order to better fit the target device's screen size and display limitations. Transcoding is also used by home theatre software, such as to reduce the usage of disk space by video files. The most common operation in this application is the transcoding of MPEG-2 files to the MPEG-4 format. With the huge amount of multimedia content online and the number of different devices available, real-time transcoding from any input format to any output format is becoming necessary to provide true search capability for any multimedia content on any mobile device.
Present transcoding schemes typically utilize only the CPU resources of the processing system. Because of the size of video data, this can present a substantial processing overhead for the system, while additional available resources, such as GPU bandwidth, often remain underutilized in such operations.
What is desired, therefore, is a transcoding process that utilizes both GPU and CPU resources for the tasks performed in the transcode pipeline.
Embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
Embodiments of the invention as described herein provide a solution to the problems of conventional methods as stated above. In the following description, various examples are given for illustration, but none are intended to be limiting. Embodiments are directed to a transcoding system that shares the workload of video transcoding through the use of multiple central processing unit (CPU) cores and/or one or more graphics processing units (GPUs), including the use of two components within the GPU: a dedicated hardcoded or programmable video decoder for the decode step, and compute shaders for scaling and encoding. The system combines the use of the industry-standard Microsoft DXVA method for using the GPU to accelerate video decode with a GPU encoding scheme, along with an intermediate step of scaling the video.
Transcoding generally refers to the process of transforming video data from a first format to a second format. Transcoding involves starting with encoded video and encoding it again after a decode process. For example, source video that is encoded in one format and stored digitally is decoded, and then encoded to another format, or even re-encoded to the same format. Intermediate operations may also be performed on the decoded video, such as scaling, blending with other video, and so on, prior to encoding in the second video format.
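For concreteness, the following is a minimal sketch of this decode-transform-encode flow. The routines are trivial stand-ins rather than the actual decoders and encoders described herein, and frames are modeled as 2D lists of luma samples, so only the pipeline structure itself is meaningful.

```python
# Minimal transcode pipeline sketch: decode to raw pixels, apply an
# optional intermediate operation, then re-encode. The "codecs" here are
# illustrative stand-ins, not the disclosed implementation.

def decode_source(bitstream):
    """Stand-in decoder: treat each element as an already-decoded frame."""
    return list(bitstream)

def halve_resolution(frame):
    """Stand-in intermediate operation: drop every other row and column."""
    return [row[::2] for row in frame[::2]]

def encode_target(frames):
    """Stand-in encoder: pack each frame back into a byte string."""
    return [bytes(sample for row in f for sample in row) for f in frames]

def transcode(bitstream):
    frames = decode_source(bitstream)               # first format -> raw pixels
    frames = [halve_resolution(f) for f in frames]  # optional scaling/blending
    return encode_target(frames)                    # raw pixels -> second format

# Example: two 4x4 frames of constant luma become two 2x2 encoded frames.
source = [[[16] * 4 for _ in range(4)], [[235] * 4 for _ in range(4)]]
print(transcode(source))
```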
In one embodiment, the video transcoding pipeline is implemented in a transcode system 300, as illustrated in FIG. 3.
In system 300, the encoded bitstream 302 is decoded using entropy decoder 304. The decoding process involves a number of additional steps, including inverse DCT (iDCT) 306, dequantization, and reconstruction of the outputs 310. In one embodiment, a motion compensation process 308 is performed on reference frames 312 generated by the reconstruction step 310. The decoded video frames 314 are then scaled in a video scaler process 316. The scaled video frames 318 are then encoded into the second format through an encoding process illustrated as blocks 320 to 324. The video frames are first preprocessed 320 and then input to a motion estimation engine 322. An MB (macroblock) coding process 324 then generates the bitstream 328 of the second format, as well as reference frames 326 that are fed back to the motion estimation engine 322. In one embodiment, one or more of these encoding processes are performed using compute shaders of the GPU.
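As an illustration of the arithmetic in the decode stage, the iDCT step 306 reconstructs an 8×8 block of spatial samples from its frequency-domain coefficients. The sketch below is a direct, unoptimized implementation of the standard 8×8 iDCT formula; production decoders use fast factorizations or the GPU and UVD paths described herein.

```python
import math

def idct_8x8(F):
    """Direct 8x8 inverse DCT per the standard formula:
    f(x,y) = 1/4 * sum_u sum_v C(u)C(v) F(u,v) cos((2x+1)u*pi/16) cos((2y+1)v*pi/16)
    with C(0) = 1/sqrt(2) and C(k) = 1 otherwise."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            s = 0.0
            for u in range(8):
                for v in range(8):
                    s += (c(u) * c(v) * F[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[x][y] = s / 4.0
    return out

# A block with only a DC coefficient reconstructs to a flat block:
F = [[0.0] * 8 for _ in range(8)]
F[0][0] = 8.0
print(idct_8x8(F)[0][0])  # every spatial sample equals 1.0
```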
As shown in FIG. 2, the transcode pipeline is executed on a processing platform that includes both a CPU 204 and a GPU 206, along with associated CPU memory 208.
Under an embodiment, there are three different ways to decode the video bitstream 102 using a processing platform that has both CPU and GPU processors. Depending upon the encoding format of the original bitstream 102, as well as the other transcode processes involved, one of the three decode methods is selected for a particular input bitstream.
In a first method, the CPU 204 alone is used to perform all of the steps related to the decode function 104. This is generally a software-only implementation, in which the GPU 206 is then used to perform the scaling 106 and encoding functions. The CPU decoder method may also be referred to as a software decoder.
In a second method, a portion of the decoding process is performed on the CPU and the remainder is performed on the GPU. This is a software plus graphics chip solution that comprises a GPU programmable decoder system. In this method, the decoding performed on the CPU comprises the steps up to and including the entropy decode step 304; the steps following entropy decoding, such as iDCT 306 and motion compensation 308, along with the optional scaling step 316, are performed on the GPU.
In a third method, dedicated hardware decoder circuitry present in the GPU (GPU hardware decoder) is employed for the decoding. Decoding is performed using a hardware/programmable video processor to decode the bitstream from the entropy decode step 304 onward. The GPU hardware decoder may be implemented in a programmable processing chip that has dedicated hardware with specific instructions, and is designed to implement certain specifications of one or more codecs. In one embodiment, the GPU hardware decoder is implemented as a UVD (Unified Video Decoder) portion of the GPU hardware, and is configured to support the hardware decode of the H.264 and VC-1 video codec standards, or other codecs. In general, the UVD handles decoding of the H.264/AVC and VC-1 video codecs almost entirely in hardware. The UVD offloads nearly all of the video-decode process for VC-1 and H.264, requiring minimal host (CPU) attention. In addition to handling VLC/CAVLC/CABAC, frequency transform, pixel prediction, and in-loop deblocking functions, the UVD also contains an advanced video post-processing block. Post-processing operations provided by the UVD may include denoising, de-interlacing, scaling/resizing, and similar operations. The hardware/programmable video processor (e.g., UVD) may be implemented through any appropriate combination of processing circuitry that performs reverse entropy decoding (variable length decode) and uses programmable GPU shaders to perform the remaining portion of the decoding operations.
For purposes of this description, “H.264” refers to the standard for video compression that is also known as MPEG-4 Part 10, or MPEG-4 AVC (Advanced Video Coding). H.264 is a block-oriented, motion-compensation-based codec developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG).
In one embodiment, the decode stage 304-314 of the UVD is configured to allow copying of the decoded frames out of the GPU 206 and into the CPU 204. This allows the CPU to perform the encoding steps 320-324. For this embodiment, the output from the video scaler 316 is written to CPU memory 208. This allows the system to share the processing load between the GPU and the CPU, since a copy of the data is made available to both. The decode process using the UVD hardware allows this copy to be made at high speed for use by the CPU, so that a copy of the images can be maintained by the GPU and the CPU separately. Any shared processing that is done by both the GPU and CPU thus involves the sharing of certain information rather than the transmission of full images between the two units. This greatly reduces the bandwidth overhead required for shared CPU/GPU operations on the input video bitstream.
The scaler 316 of FIG. 3 resizes the decoded video frames 314 from the source resolution to the resolution required for the encoded output, prior to the encoding stage. In one embodiment, this optional scaling step is performed on the GPU using compute shaders.
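For illustration, the following CPU reference sketch shows the bilinear resampling that such a scaler performs; in the disclosed system, the equivalent sampling would run in GPU compute shaders rather than in a Python loop.

```python
def bilinear_scale(src, new_w, new_h):
    """CPU reference sketch of bilinear resampling of a 2D luma plane."""
    src_h, src_w = len(src), len(src[0])
    out = [[0.0] * new_w for _ in range(new_h)]
    for y in range(new_h):
        for x in range(new_w):
            # Map the destination sample into source coordinates.
            fx = x * (src_w - 1) / max(new_w - 1, 1)
            fy = y * (src_h - 1) / max(new_h - 1, 1)
            x0, y0 = int(fx), int(fy)
            x1, y1 = min(x0 + 1, src_w - 1), min(y0 + 1, src_h - 1)
            wx, wy = fx - x0, fy - y0
            # Interpolate horizontally on the two bracketing rows, then
            # vertically between the results.
            top = src[y0][x0] * (1 - wx) + src[y0][x1] * wx
            bot = src[y1][x0] * (1 - wx) + src[y1][x1] * wx
            out[y][x] = top * (1 - wy) + bot * wy
    return out

# Upscale a 2x2 gradient to 4x4.
print(bilinear_scale([[0, 100], [100, 200]], 4, 4))
```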
For the embodiment in which the second method of decoding is implemented, that is, the software plus graphics chip solution in which decoding is performed on both the CPU and GPU, there are two possible variations that may be implemented through the use of the DirectX Video Acceleration (DXVA) interface defined by Microsoft Corporation. DXVA is an API (application program interface) specification that allows video decoding to be hardware accelerated and specifies how a decoder accesses the acceleration functions that reside on the GPU. It allows the decoder to offload a number of decode pipeline stages (e.g., the last two or three) to the GPU, after which the data is present on the GPU and ready to display. The pipeline allows certain CPU-intensive operations such as iDCT, motion compensation, deinterlacing, and color correction to be offloaded to the GPU.
DXVA works in conjunction with the video rendering model used by the video card of the system. DXVA is used by software video decoders to define a codec-specific pipeline for hardware-accelerated decoding and rendering of the codec. The pipeline starts at the CPU, which is used for parsing the media stream and converting it into DXVA-compatible structures. DXVA specifies a set of operations that can be hardware accelerated, along with device driver interfaces (DDIs) that the graphics driver can implement to accelerate the operations. If the codec needs any of the supported operations, it can use these interfaces to access the hardware-accelerated implementation of those operations. The decoded video is handed over to the hardware video renderer, where further post-processing might be applied to it before it is rendered to the device. DXVA also specifies the motion compensation 308 DDI, which specifies the interfaces for iDCT operations 306, Huffman coding, color correction, motion compensation, alpha blending, inverse quantization, colorspace conversion, and frame-rate conversion operations, among others.
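The following sketch models the CPU/GPU split that a DXVA-style interface enables. The class and function names are hypothetical placeholders for the driver-implemented DDIs, not the actual DXVA API: parsing always stays on the CPU, and each later stage is offloaded to the GPU only when the driver advertises support for it.

```python
# Hypothetical model of DXVA-style stage offload; names are placeholders.

class GpuAccelerator:
    """Placeholder for the driver-side DDI implementation."""
    SUPPORTED = {"iDCT", "motion_compensation", "deblocking"}

    def supports(self, op):
        return op in self.SUPPORTED

    def submit(self, op, data):
        # A real driver would return handles to GPU-resident surfaces.
        print(f"GPU accelerates: {op}")
        return data

def entropy_decode(bitstream):
    return list(bitstream)  # stand-in: stream parsing stays on the CPU

def cpu_idct(coeffs):
    return coeffs           # stand-in software fallback

def decode_frame(bitstream, accel):
    coeffs = entropy_decode(bitstream)
    if accel.supports("iDCT"):
        return accel.submit("iDCT", coeffs)  # offloaded pipeline stage
    return cpu_idct(coeffs)                  # software-only fallback

decode_frame([1, 2, 3], GpuAccelerator())
```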
In general, the DXVA API is used for Microsoft Windows compatible processing platforms. For processing platforms that use other operating systems, a DXVA-like interface can be used. Such an interface can be any API that offloads certain decode pipeline stages to the GPU. For Linux compatible processing platforms, the API may be implemented through the X-Video Motion Compensation (XvMC) API, for example. XvMC is an extension of the X video extension (Xv) for the X Window System, and allows video programs to offload portions of the video decoding process to the GPU.
For the embodiment in which the CPU performs the entropy decoding process 304 and the GPU performs the steps from iDCT 306 and motion compensation 308 onward, the DXVA API dictates the information that is transmitted from the decoder 304 to each of the iDCT 306 and motion compensation 308 processes. Different versions of the DXVA standard may be used, such as DXVA 1.0 and 2.0. For the embodiment in which the UVD performs the steps from the entropy decode process 304 onward, the DXVA 2.0 API specification may be used.
Embodiments of the decoding pipeline can be applied to video transcode and editing applications in which two or more bitstreams are processed. The different choices available for the decode process, that is, CPU only, CPU plus GPU, or UVD, with use of the DXVA 1.0 or 2.0 API, facilitate video editing applications that may use multiple bitstreams, each of which may represent different scenes, for example.
In the video edit pipeline of FIG. 4, two encoded bitstreams are each decoded and scaled using any of the decode methods described above, and the resulting streams of frames are then combined in a video blend and effect process 416 prior to being re-encoded into the output format.
Although two bitstreams are illustrated in FIG. 4, any practical number of input bitstreams may be decoded and blended in this manner, depending upon the processing resources available on the platform.
The blending process 416 may utilize any built-in blending capability available on the GPU. For example, the GPU may include texture processing capabilities that allow for blending of textures using resident processes. The video effects provided within the video blend and effect process 416 may include certain commercially available effects provided by known video editors, such as blending left to right, top to bottom, or other transition effects.
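As a concrete illustration of such transition effects, the following CPU sketch implements a uniform crossfade and a left-to-right wipe on two equally sized frames; on the GPU, these operations would map naturally onto the resident texture blending capabilities noted above.

```python
def crossfade(frame_a, frame_b, alpha):
    """Uniform blend of two equally sized frames; alpha in [0, 1]."""
    return [[(1 - alpha) * a + alpha * b for a, b in zip(ra, rb)]
            for ra, rb in zip(frame_a, frame_b)]

def wipe_left_to_right(frame_a, frame_b, progress):
    """Transition: columns left of the moving edge come from frame_b."""
    edge = int(progress * len(frame_a[0]))
    return [rb[:edge] + ra[edge:] for ra, rb in zip(frame_a, frame_b)]

a = [[0] * 8 for _ in range(2)]    # black frame
b = [[255] * 8 for _ in range(2)]  # white frame
print(crossfade(a, b, 0.5)[0])           # mid-gray row: [127.5, ...]
print(wipe_left_to_right(a, b, 0.25)[0])  # [255, 255, 0, 0, 0, 0, 0, 0]
```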
Embodiments of the video decode method can be applied to standard predictive MPEG schemes. In processing a video stream, the MPEG encoder produces three types of coded frames. The first type of frame is called an “I” frame or intra-coded frame. This is the simplest type of frame and is a coded representation of a still image. In general, no processing is performed on I-frames; their purpose is to provide the decoder a starting point for decoding the next set of frames. The next type of frame is called a “P” frame or predicted frame. Upon decoding, P-frames are created from information contained within the previous P-frames or I-frames. The third type of frame, and the most common type, is the “B” frame or bi-directional frame. B-frames are both forward and backward predicted and are constructed from the last and the next P or I-frame. Both P-frames and B-frames are inter-coded frames. A codec encoder may encode a stream as the following sequence: IPBB . . . and so on. In digital video transmission, B-frames are often not used. In this case, the sequence may just consist of I-frames followed by a number of P-frames. For this embodiment, the initial I-frame is encoded as lossless, and all following P-frames are encoded as some fraction of lossless compression and some fraction as no-change.
In MPEG and similar systems, the decoder produces frames in decode order, which differs from the order in which the frames are to be displayed. In this case, the video editor pipeline reorders the decoded frames into display order before they are passed on for further processing or display.
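The following sketch illustrates this reordering. Because each B-frame references the anchor frames (I or P) on both sides of it, anchors arrive in the bitstream before the B-frames that precede them in display order, so the reordering step simply holds each anchor back until those B-frames have been emitted.

```python
def decode_to_display_order(frames):
    """Reorder (frame_type, display_index) pairs from decode order to
    display order: hold each anchor (I/P) until the B-frames that precede
    it in display order have been emitted."""
    out = []
    held = None  # the most recent anchor, not yet displayed
    for ftype, idx in frames:
        if ftype in ("I", "P"):
            if held is not None:
                out.append(held)  # previous anchor is now displayable
            held = (ftype, idx)
        else:
            out.append((ftype, idx))  # B-frames display as decoded
    if held is not None:
        out.append(held)
    return out

# Decode order I0 P3 B1 B2 P6 B4 B5 -> display order I0 B1 B2 P3 B4 B5 P6
stream = [("I", 0), ("P", 3), ("B", 1), ("B", 2), ("P", 6), ("B", 4), ("B", 5)]
print(decode_to_display_order(stream))
```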
The output stream can then be encoded or sent to an optional display 514 through a frame rate logic process 512. The frame rate logic process 512 adapts the frame processing rate to the display rate capability, i.e., the refresh rate of the display, to optimize the processing and display functions of the system.
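A minimal sketch of such frame rate logic follows, assuming a simple policy in which each display refresh tick shows the most recent source frame, repeating frames when the display is faster than the source and dropping frames when it is slower.

```python
def resample_frame_rate(frames, src_fps, dst_fps):
    """Rate conversion by repetition/dropping: each output tick shows the
    most recent source frame for that point in time."""
    duration = len(frames) / src_fps
    n_out = round(duration * dst_fps)
    return [frames[min(int(i * src_fps / dst_fps), len(frames) - 1)]
            for i in range(n_out)]

# 6 frames at 30 fps shown on a 60 Hz display: each frame repeats twice.
print(resample_frame_rate(list("ABCDEF"), 30, 60))
```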
Embodiments of the transcoding process allow the decoding scheme to be selected from a number of combinations of hardware and software structures.
In one embodiment, the choice of decoding scheme 706a-c may be selected explicitly by the user, or it may be selected automatically by a process executed in the processing system. The automatic process may select the decoding method depending upon the resources available. For example, if the UVD is available, the automatic process may dictate that the UVD be used exclusively for decoding. There may also be a defined default and one or more backup methods, such as decoding using the UVD by default unless it is not available, in which case decoding is performed using the CPU only, and so on. The scaler process may also be selected automatically, depending on the decode scheme. For example, if the UVD is used for decoding, it should also be used for scaling, and if the CPU is used for decoding, it should also be used for scaling.
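A sketch of this selection policy follows, under assumed capability flags for the platform: an explicit user choice wins, otherwise the UVD is preferred, with the CPU-plus-GPU and CPU-only methods as fallbacks, and the scaler is co-located with the decoder as described above.

```python
# Illustrative selection logic; the capability flags and scheme names are
# assumptions for the sketch, not part of the disclosed interfaces.

def select_decode_scheme(has_uvd, has_programmable_gpu, user_choice=None):
    if user_choice is not None:   # explicit selection always wins
        return user_choice
    if has_uvd:
        return "uvd"              # dedicated hardware decoder (default)
    if has_programmable_gpu:
        return "cpu_plus_gpu"     # CPU entropy decode, GPU for the rest
    return "cpu_only"             # software decoder fallback

def select_scaler(decode_scheme):
    # Co-locate scaling with decoding to avoid extra cross-device copies.
    return "gpu" if decode_scheme in ("uvd", "cpu_plus_gpu") else "cpu"

scheme = select_decode_scheme(has_uvd=False, has_programmable_gpu=True)
print(scheme, select_scaler(scheme))  # cpu_plus_gpu gpu
```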
Embodiments of the transcode system and method combine the use of a GPU for encoding with the use of the GPU for decoding and scaling. The system enables the use of a UVD portion of the GPU hardware to decode H.264 or VC-1 encoded video data, along with hardware-based iDCT and motion compensation functions for MPEG-2. It also enables the use of the existing standard Microsoft APIs (DXVA 1.0 and 2.0) for facilitating the decode operation. The intermediate and optional step of scaling the video (such as re-sizing from one resolution to another) also employs the GPU functionality. The transcode pipeline also adds the capability of decoding multiple streams and performing blending or special effects operations, such as for video editing. These operations can also use the GPU resources.
The processing platform of FIG. 2 may be implemented in any computing system that includes one or more CPU cores and one or more GPUs, along with associated memory.
Embodiments are applicable to all transcoding in which the input format is decoded to raw pixels and then re-encoded into a different or the same codec, at a different or the same resolution, and at a different or the same bitrate or quality settings. The transcoding operation may be compressed-domain transcoding, which is a method used by programs that compress DVD video, such as DVD backup programs.
Although embodiments described herein have been directed to transcoding applications, it should be noted that such embodiments are also applicable to other applications, such as transrating. For example, lower-bitrate transrating is a process similar to transcoding in which files are coded to a lower bitrate without changing video formats; this can include sample rate conversion, or it may use the same sampling rate with higher compression. Transrating is used to fit given media into a smaller storage space, such as fitting DVD content onto a video CD, or transmitting content over a lower bandwidth channel.
Although embodiments have been described with reference to graphics systems comprising GPU devices or visual processing units (VPU), which are dedicated or integrated graphics rendering devices for a processing system, it should be noted that such embodiments can also be used for many other types of video production engines that are used in parallel. Such video production engines may be implemented in the form of discrete video generators, such as digital projectors, or they may be electronic circuitry provided in the form of separate IC (integrated circuit) devices or as add-on cards for video-based computer systems.
In one embodiment, the system including the GPU/CPU processing platform comprises a computing device that is selected from the group consisting of: a personal computer, a workstation, a handheld computing device, a digital television, a media playback device, a smart communication device, a game console, or any other similar processing device.
Aspects of the system described herein may be implemented as functionality programmed into any of a variety of circuitry, including programmable logic devices (“PLDs”), such as field programmable gate arrays (“FPGAs”), programmable array logic (“PAL”) devices, electrically programmable logic and memory devices and standard cell-based devices, as well as application specific integrated circuits. Some other possibilities for implementing aspects include: memory devices, microcontrollers with memory (such as EEPROM), embedded microprocessors, firmware, software, etc. Furthermore, aspects of the video transcoding system may be embodied in microprocessors having software-based circuit emulation, discrete logic (sequential and combinatorial), custom devices, fuzzy (neural) logic, quantum devices, and hybrids of any of the above device types. The underlying device technologies may be provided in a variety of component types, e.g., metal-oxide semiconductor field-effect transistor (“MOSFET”) technologies like complementary metal-oxide semiconductor (“CMOS”), bipolar technologies like emitter-coupled logic (“ECL”), polymer technologies (e.g., silicon-conjugated polymer and metal-conjugated polymer-metal structures), mixed analog and digital, and so on.
It should also be noted that the various functions disclosed herein may be described using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in various forms (e.g., optical, magnetic or semiconductor storage media) and carrier waves that may be used to transfer such formatted data and/or instructions through wireless, optical, or wired signaling media or any combination thereof. Examples of transfers of such formatted data and/or instructions by carrier waves include, but are not limited to, transfers (uploads, downloads, e-mail, etc.) over the Internet and/or other computer networks via one or more data transfer protocols (e.g., HTTP, FTP, SMTP, and so on).
Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words “herein,” “hereunder,” “above,” “below,” and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word “or” is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list.
The above description of illustrated embodiments of the video transcoding system is not intended to be exhaustive or to limit the embodiments to the precise form or instructions disclosed. While specific embodiments of, and examples for, processes in graphic processing units or ASICs are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the disclosed methods and structures, as those skilled in the relevant art will recognize.
The elements and acts of the various embodiments described above can be combined to provide further embodiments. These and other changes can be made to the disclosed system in light of the above detailed description.
In general, in the following claims, the terms used should not be construed to limit the disclosed method to the specific embodiments disclosed in the specification and the claims, but should be construed to include all operations or processes that operate under the claims. Accordingly, the disclosed structures and methods are not limited by the disclosure, but instead the scope of the recited method is to be determined entirely by the claims.
While certain aspects of the disclosed embodiments are presented below in certain claim forms, the inventors contemplate the various aspects of the methodology in any number of claim forms. For example, while only one aspect may be recited as embodied in machine-readable medium, other aspects may likewise be embodied in machine-readable medium. Accordingly, the inventors reserve the right to add additional claims after filing the application to pursue such additional claim forms for other aspects.
The present application is a Continuation of U.S. patent application Ser. No. 12/264,892, filed Nov. 4, 2008, which is a Continuation-in-Part of U.S. patent application Ser. No. 12/189,060, filed Aug. 8, 2008, which is a Continuation-in-Part of U.S. patent application Ser. No. 11/960,640, filed Dec. 19, 2007, which claims the benefit of U.S. Provisional Patent Application No. 60/928,799, filed May 11, 2007.