The present invention relates to the field of video processing and video encoding. In particular, but not by way of limitation, the present invention discloses techniques for creating multiple video images locally and then encoding them for efficient transmission to a remote location.
Centralized computer systems with multiple terminal systems for accessing the centralized computer systems were once the dominant computer architecture. These mainframe or mini-computer systems were shared by multiple computer users wherein each computer user had access to a terminal system coupled to the mainframe computer.
In the late 1970s and early 1980s, semiconductor microprocessors and memory devices allowed the creation of inexpensive personal computer systems. Personal computer systems revolutionized the computing industry by allowing each individual computer user to have access to their own full computer system. Each personal computer user could run their own software applications and did not need to share any of the personal computer's resources with any other computer user.
Although personal computer systems have become the dominant form of computing, there has been a resurgence of centralized computing with multiple terminal systems. Terminal systems can have reduced maintenance costs since terminal users cannot easily introduce viruses into the main computer system or load in unauthorized computer programs. Furthermore, modern personal computer systems have become so powerful that the computing resources in these modern personal computer systems generally sit idle for the vast majority of the time.
In the drawings, which are not necessarily drawn to scale, like numerals describe substantially similar components throughout the several views. Like numerals having different letter suffixes represent different instances of substantially similar components. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed in the present document.
The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations in accordance with example embodiments. These embodiments, which are also referred to herein as “examples,” are described in enough detail to enable those skilled in the art to practice the invention. It will be apparent to one skilled in the art that specific details in the example embodiments are not required in order to practice the present invention.
This document will focus on exemplary embodiments that are mainly disclosed with reference to multiple thin-client terminal systems sharing a main server system. However, the teachings of this document can be used in other environments. For example, a video distribution system that distributes multiple different video feeds to multiple different video display systems could use the teachings of this document. The example embodiments may be combined, other embodiments may be utilized, or structural, logical and electrical changes may be made without departing from the scope of what is claimed. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents.
In this document, the terms “a” or “an” are used, as is common in patent documents, to include one or more than one. In this document, the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated. Furthermore, all publications, patents, and patent documents referred to in this document are incorporated by reference herein in their entirety, as though individually incorporated by reference. In the event of inconsistent usages between this document and those documents so incorporated by reference, the usage in the incorporated reference(s) should be considered supplementary to that of this document; for irreconcilable inconsistencies, the usage in this document controls.
Computer Systems
The present disclosure concerns digital video encoding that may be performed with digital computer systems.
Within computer system 100 there is a set of instructions 124 that may be executed for causing the machine to perform any one or more of the methodologies discussed herein. In a networked deployment, the machine may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
The example computer system 100 includes a processor 102 (e.g., a central processing unit (CPU), a graphics processing unit (GPU) or both), a main memory 104 and a static memory 106, which communicate with each other via a bus 108. The computer system 100 also includes an alphanumeric input device 112 (e.g., a keyboard), a cursor control device 114 (e.g., a mouse or trackball), a disk drive unit 116, a signal generation device 118 (e.g., a speaker) and a network interface device 120.
In a computer system, such as the computer system 100 of FIG. 1, a video display adapter 110 generates the video output signals that drive a video display system.
The disk drive unit 116 includes a machine-readable medium 122 on which is stored one or more sets of computer instructions and data structures (e.g., instructions 124 also known as ‘software’) embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 124 may also reside, completely or at least partially, within the main memory 104 and/or within the processor 102 during execution thereof by the computer system 100, the main memory 104 and the processor 102 also constituting machine-readable media.
The computer instructions 124 may further be transmitted or received over a network 126 via the network interface device 120. Such network data transfers may occur utilizing any one of a number of well-known transfer protocols, such as the File Transfer Protocol (FTP).
While the machine-readable medium 122 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies described herein, or that is capable of storing, encoding or carrying data structures utilized by or associated with such a set of instructions. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories (such as Flash memory), optical media, and magnetic media.
For the purposes of this specification, the term “module” includes an identifiable portion of code, computational or executable instructions, data, or computational object to achieve a particular function, operation, processing, or procedure. A module need not be implemented in software; a module may be implemented in software, hardware/circuitry, or a combination of software and hardware.
Modern Graphics Terminal Systems
Before the advent of the inexpensive personal computer system, the computing industry largely used mainframe or mini-computers that were coupled to many terminals such that the users at the various terminals could share the computer system. Such terminals were often referred to as ‘dumb’ terminals since the actual computing ability resided within the mainframe or mini-computer and the ‘dumb’ terminal merely displayed output and accepted alpha-numeric input. No computer applications ran locally on the terminal system. Computer operators shared the mainframe computer among the multiple individual users at the individual terminals coupled to the mainframe computer. Most terminal systems had very limited graphics capabilities and mostly displayed only alpha-numeric characters on the local display screen.
With the introduction of the inexpensive personal computer system, the use of dumb terminals rapidly diminished since personal computer systems were much more cost effective. If the services of a dumb terminal were required to interface with a legacy terminal-based mainframe or mini-computer system, a personal computer system could easily execute a terminal program that would emulate the operations of a dumb terminal at a cost very similar to the cost of a dedicated dumb terminal.
During the personal computer revolution, personal computers introduced high resolution graphics to personal computer users. Such high-resolution graphic display systems allowed for much more intuitive computer user interfaces than the text-only displays of primitive computer terminals. For example, most personal computer systems now provide high-resolution graphical user interfaces that use multiple different windows, icons, and pull-down menus that are manipulated with an on-screen cursor and a cursor-control input device. Furthermore, multi-color high-resolution graphics allowed for sophisticated applications that used photos, videos, and graphical images.
In recent years, a new generation of terminal devices has been introduced into the computer market. This new generation of computer terminals includes the high-resolution graphics capabilities that personal computer users have become accustomed to. These new computer terminal systems allow modern computer users to enjoy the advantages of traditional terminal-based computer systems. For example, computer terminal systems allow for greater security and reduced maintenance costs since users of computer terminals cannot easily introduce computer viruses by downloading or installing new software. Furthermore, most personal computer users do not require the full computing ability provided by modern personal computer systems since interaction with a human user is limited by the human user's relatively slow typing speed.
Modern terminal-based computer systems allow multiple users located at high-resolution terminal systems to share a single personal computer system and all of the software installed on that single personal computer system. In this manner, a modern high-resolution terminal system is capable of delivering the functionality of a personal computer system to multiple users without the cost and the maintenance requirements of having a personal computer system for each user. A category of these modern terminal systems is called “thin client” systems. Although the techniques set forth in this document will mainly be disclosed with reference to thin-client systems, the techniques described herein are applicable in other areas of the IT industry as well.
A Thin-Client System
The goal of each thin-client terminal system 240 is to provide most or all of the standard input and output features of a personal computer system to a user of the thin-client terminal system 240. However, in order to be cost-effective, this goal must be achieved without providing the full computing resources or software of a personal computer system in the thin-client terminal system 240 since those features will be provided by the thin-client server system 220 that will interact with the thin-client terminal system 240. In effect, each thin-client terminal system 240 will appear to its user as a full personal computer system.
From an output perspective, each thin-client terminal system 240 provides both a high-resolution video display system and an audio output system, as shown in the embodiment of FIG. 2.
From an input perspective, the thin-client terminal system 240 of FIG. 2 allows a user to provide input with both an alpha-numeric keyboard and a cursor control device (such as a mouse).
The thin-client terminal system 240 may include other input, output, or combined input/output systems in order to provide additional functionality. For example, the thin-client terminal system 240 of FIG. 2 may include input/output ports for attaching additional peripheral devices.
The thin-client server system 220 is equipped with software for detecting coupled thin-client terminal systems 240 and interacting with the detected thin-client terminal systems 240 in a manner that allows each thin-client terminal system 240 to appear as an individual personal computer system, as illustrated in FIG. 2.
Transporting Video Information to Terminal Systems
The communication channel 230 bandwidth required to deliver a continuous sequence of digital video frames from the thin-client server computer system 220 to a thin-client terminal system 240 can be quite large. In an environment wherein a shared computer network is used to transport video information to several thin-client terminal systems 240 (such as the thin-client terminal system environment illustrated in FIG. 2), the aggregate video bandwidth can easily overwhelm the shared network.
When the computer applications run by the user of the thin-client terminal systems 240 are typical office work applications (word processors, databases, spreadsheets, etc.) that change the information on the display screen on a relatively infrequent basis, then there are simple methods that can be used to greatly decrease the amount of video display information delivered over the network while maintaining a high-quality user experience. For example, the thin-client server system 220 may only send video information across the communication channel 230 to a thin-client terminal system 240 when that video information changes. In this manner, when the video display screen for a particular thin-client terminal system 240 is static, then no video information needs to be transmitted from the thin-client server system 220 to that thin-client terminal system 240.
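For illustration, this change-only approach can be sketched as follows (a minimal Python model, not the disclosed implementation; the tile size and update format are assumptions): the server retains a copy of the last frame it transmitted and sends only the screen regions that differ from it.

```python
# Sketch of change-only screen updates: compare the last transmitted
# frame against the current frame and report only the tiles that differ.
# Tile size and update format are illustrative assumptions.

TILE = 16  # tile edge in pixels (assumed)

def changed_tiles(prev, curr, width, height):
    """Return (x, y, pixels) for each TILE x TILE region that differs.

    `prev` and `curr` are flat row-major lists of pixel values.
    """
    updates = []
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            tile_prev, tile_curr = [], []
            for y in range(ty, min(ty + TILE, height)):
                row = y * width
                tile_prev.extend(prev[row + tx : row + min(tx + TILE, width)])
                tile_curr.extend(curr[row + tx : row + min(tx + TILE, width)])
            if tile_prev != tile_curr:
                updates.append((tx, ty, tile_curr))
    return updates

w = h = 32
a = [0] * (w * h)
b = list(a); b[5 * w + 20] = 1          # one pixel changed
print([(x, y) for x, y, _ in changed_tiles(a, b, w, h)])  # -> [(16, 0)]
```

When the screen is static the function returns an empty list, so no video information needs to be sent at all.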
Three-Dimensional Graphics
Once reserved for very high-end workstations, hardware-based three-dimensional (3D) graphics technology is now available for personal computers, including economical and portable models. This widespread availability has made 3D graphics hardware ubiquitous in personal computer systems, and many applications utilize it. For example, the video display adapter 110 of FIG. 1 may include such 3D graphics rendering hardware.
In an example embodiment, a method of providing improved 3D graphics support for terminal systems is disclosed that may rely on 3D graphics hardware already existing in the physical server machine where the virtual machine or terminal server is running. A terminal server is a server application that interfaces with a multitude of remote terminal systems. The terminal server application shares the resources of a single server, creating a graphic interface dedicated to each terminal session, as illustrated in FIG. 2.
Most modern personal computers have a graphics chip with at least some 3D graphics technology features. These 3D graphics chips generally maintain both three-dimensional and two-dimensional representations of a screen. The three-dimensional representation may be a set of 3D object models along with the coordinates and orientation of those object models within a three-dimensional space. The 2D representation is how the three-dimensional object models would appear to a viewer placed at a defined set of coordinates within that three-dimensional space and looking in a defined viewing direction.
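The mapping from the 3D representation to the 2D representation can be illustrated with a simple perspective projection (a generic textbook formulation, not code from this disclosure; the focal length is an arbitrary assumption):

```python
def project(x, y, z, focal_length=1.0):
    """Project a camera-space 3D point onto the z = focal_length plane."""
    if z <= 0:
        raise ValueError("point is behind the viewer")
    return (focal_length * x / z, focal_length * y / z)

# A point twice as far from the viewer lands half as far from the
# center of the 2D screen:
assert project(1.0, 1.0, 2.0) == (0.5, 0.5)
assert project(1.0, 1.0, 4.0) == (0.25, 0.25)
```

Real 3D hardware performs this projection (plus model and view transforms) for every vertex when rendering the 2D view.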
Example uses of three-dimensional graphics technology include high-end drawing functionality such as computer-aided design (CAD) and consumer products such as high-end video games. In 3D games, a 3D scene is updated in real time based upon a user's actions, and the updated 3D scene is rendered into a 2D memory buffer. The 3D graphics hardware is used to aid the computer system in rendering the 2D representation from the 3D representation. The 2D buffer contains the exact representation of what is displayed on the display screen attached to the computer system with 3D graphics hardware.
Much of the time, the powerful 3D graphics chips within a personal computer system are not being used for CAD or high-end video games. In fact, most personal computer users use only a small portion of the computing potential in the personal computer. In an example embodiment, the 3D graphics system in a computer system is configured to render 3D graphics on multiple different virtual screens, thus sharing the 3D rendering capabilities of one 3D graphics chip among multiple users on the same computer system. This embodiment may be deployed for users on virtual machines as well as users on terminal servers.
Drivers are provided for allowing a single 3D graphics processing hardware device to create multiple different “virtual 3D graphics cards”. In this document, a virtual 3D graphics card is a software entity that acts as a 3D graphics card for a terminal session. Each virtual 3D graphics card may or may not use the features of a real 3D graphics hardware device in a system. In example embodiments, a virtual 3D graphics card instance is created when either a new terminal session or a new virtual machine is launched. The new virtual 3D graphics card instance appears as a physical 3D graphics card to the terminal session or virtual machine while using a share of the physical 3D graphics hardware in the server system.
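A minimal sketch of this arrangement follows (all class and method names are illustrative assumptions, not the actual driver interface): each session receives its own virtual-card object, and every virtual card submits work to the same shared physical device.

```python
# Sketch: one physical 3D device shared by many virtual 3D graphics
# cards, one per terminal session. All names are illustrative.

class Physical3DDevice:
    def render(self, session_id, scene):
        # Stand-in for submitting work to the real 3D hardware.
        return f"2D frame for session {session_id}: {scene}"

class Virtual3DGraphicsCard:
    """Presents itself to one session as a dedicated 3D card."""
    def __init__(self, session_id, device):
        self.session_id = session_id
        self.device = device          # shared physical hardware
        self.frame_buffer = None      # this session's 2D output

    def render(self, scene):
        self.frame_buffer = self.device.render(self.session_id, scene)

device = Physical3DDevice()
cards = {sid: Virtual3DGraphicsCard(sid, device) for sid in (1, 2, 3)}
cards[2].render("spinning cube")
print(cards[2].frame_buffer)
```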
In an example embodiment, a system with many terminal server sessions or virtual machines, each having a virtual 3D graphics card, may be configured. Usually, only a few of the terminal sessions will actually require 3D rendering. However, in an example embodiment, sharing the physical 3D graphics hardware among multiple users running 3D applications may lower the frame rate for each terminal session but still deliver a good user experience. Each terminal session initiated may be associated with one or more threads of a plurality of threads provided in the 3D graphics hardware.
Various different schemes may be used to share a 3D graphics chip among multiple terminal sessions. In one example embodiment, a context switching architecture is implemented. For example, the entire graphics pipeline may be executed for one terminal session and then flushed before context switching to another terminal session occurs.
In another example embodiment, the 3D graphics pipeline may be segmented. In such an embodiment, each pipeline segment may have an independent job such that task switching is performed on the pipeline segment scale.
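These sharing schemes can be modeled abstractly as follows (a sketch under assumed abstractions, not actual GPU driver code). The coarse-grained variant runs one session's entire pipeline for a frame and flushes it before switching; the segmented variant would enqueue individual pipeline segments instead of whole frames.

```python
# Sketch of the coarse-grained scheme: execute a session's entire
# pipeline for one frame, flush, and only then context switch. Work
# items and the "flush" step are illustrative stand-ins.

from collections import deque

def run_time_sliced(sessions):
    """sessions: session_id -> deque of frames; each frame is a list of
    pipeline-stage callables that must execute and flush as a unit."""
    ready = deque(sessions)
    while ready:
        sid = ready.popleft()
        for stage in sessions[sid].popleft():
            stage()                          # run the entire pipeline ...
        print(f"[{sid}] pipeline flushed")   # ... then flush before switching
        if sessions[sid]:
            ready.append(sid)                # session still has frames queued

run_time_sliced({
    "A": deque([[lambda: print("A: geometry"), lambda: print("A: raster")]]),
    "B": deque([[lambda: print("B: geometry"), lambda: print("B: raster")]]),
})
```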
The 3D graphics chip, in accordance with an example embodiment, may have a single or multiple 2D frame buffers. In an example embodiment, a “multi-head” 3D graphic chip is provided that supports multiple 2D frame buffers. The number of independent 2D frame buffers supported by a 3D graphics chip can be limited. In these cases, memory management may be implemented to swap 2D frame buffers when terminal sessions are switched.
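Such frame-buffer memory management might look like the following sketch (the two-slot capacity and least-recently-used eviction policy are assumptions for illustration):

```python
from collections import OrderedDict

class FrameBufferPool:
    """Keeps at most `hardware_slots` 2D frame buffers resident on the
    graphics chip; the least-recently-used buffer is spilled to system
    memory when another session's buffer must be brought in."""

    def __init__(self, hardware_slots=2):
        self.resident = OrderedDict()    # session_id -> buffer contents
        self.capacity = hardware_slots
        self.swapped_out = {}            # buffers spilled to main memory

    def activate(self, session_id):
        if session_id in self.resident:
            self.resident.move_to_end(session_id)     # already on-chip
        else:
            if len(self.resident) >= self.capacity:   # evict LRU buffer
                victim, buf = self.resident.popitem(last=False)
                self.swapped_out[victim] = buf
            self.resident[session_id] = self.swapped_out.pop(
                session_id, bytearray(64))            # 64 bytes stand in
        return self.resident[session_id]

pool = FrameBufferPool(hardware_slots=2)
for sid in (1, 2, 3, 1):    # activating 3 spills 1; activating 1 spills 2
    pool.activate(sid)
print(sorted(pool.resident), sorted(pool.swapped_out))   # [1, 3] [2]
```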
Once the operation phase begins, the operating system on the terminal server system (under direction of the terminal session) may then render a virtual desktop using the virtual 3D graphics card at stage 340. The virtual 3D graphics card will render the virtual desktop in a 2D frame buffer. The virtual desktop content may then be displayed remotely by transmitting information from the 2D buffer at stage 350. For example, the display information in the frame buffer may be transmitted to a networked thin-client terminal system which may or may not include a CPU.
Referring to FIG. 4, a new terminal session may be initiated at stage 410, and a new virtual 3D graphics card 315 instance may be created for that session at stage 420.
Next, at stage 430, the terminal server or hypervisor may connect the virtual 3D graphics card 315 to the multi-thread, multi-tasking capable physical 3D graphics chip on the graphics adapter 110 of the server system. This connection may be done in a time-sharing manner such that each virtual 3D graphics card 315 only gets a time slice of the physical 3D graphics chip on the graphics adapter 110. The terminal server or hypervisor may also connect the application session to the inputs of the physical 3D graphics chip on the graphics adapter 110 through the virtual 3D graphics adapter 315 at stage 440. At this point, the initialization for the new session is complete.
Applications may then be launched within the session using 3D or 2D technology to draw the screen (e.g., a desktop image for a local or remote display device) at stage 450. The applications will use the virtual 3D graphics adapter 315. At stage 460, the virtual 3D graphics adapter 315 will access the physical 3D graphics chip to translate a 3D scene model into a 2D representation and store the translated result in the 2D screen buffer 215 associated with the session and virtual 3D graphics adapter 315. The 2D screen buffer 215 may then be transmitted to the associated thin-client terminal 240, as set forth in stage 470 and illustrated in FIG. 4.
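The overall sequence for one session can be sketched as follows (all names are illustrative; the real stages involve driver and hypervisor plumbing not modeled here):

```python
# Sketch of the FIG. 4 sequence for one session: create a virtual card,
# attach it to a time slice of the physical chip, render the 3D scene
# into the session's 2D buffer, and ship that buffer to the terminal.

class Physical3DChip:
    def translate(self, scene_3d):
        return f"2D raster of <{scene_3d}>"   # stand-in for rendering

class VirtualCard:
    def __init__(self, chip):
        self.chip = chip                      # time-shared physical chip
        self.screen_buffer = None             # per-session 2D buffer

    def draw(self, scene_3d):
        self.screen_buffer = self.chip.translate(scene_3d)

def run_session(chip, scene, send_to_terminal):
    card = VirtualCard(chip)                  # stages 410-440: initialize
    card.draw(scene)                          # stages 450-460: render
    send_to_terminal(card.screen_buffer)      # stage 470: transmit

run_session(Physical3DChip(), "desktop with 3D window",
            lambda buf: print("to terminal:", buf))
```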
In another example embodiment, multiple virtual 3D graphics accelerators are provided using a plurality of threads in a GPU. Each thread may be assigned to a session associated with a networked terminal device. In an example, the networked terminal device may be a thin client, which may or may not include a CPU. In an example embodiment, each session has a fully assigned thread, and its processing is not shared with other threads; thus, the processing for different sessions with different terminal devices is not shared. The 2D image data from the server system may be communicated to the networked terminal system using TCP/IP or any other network protocol.
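As one possible way to move the 2D image data to the terminal (the disclosure requires only TCP/IP "or any other network protocol"; the length-prefixed framing here is an assumption), each frame can be sent as a size header followed by the raw bytes:

```python
import socket
import struct

def send_frame(sock, frame_bytes):
    # 4-byte big-endian length header, then the frame payload.
    sock.sendall(struct.pack("!I", len(frame_bytes)) + frame_bytes)

def recv_frame(sock):
    (length,) = struct.unpack("!I", sock.recv(4))
    data = b""
    while len(data) < length:
        data += sock.recv(length - len(data))
    return data

# An in-process socket pair stands in for the server-to-terminal
# TCP connection.
server_end, terminal_end = socket.socketpair()
send_frame(server_end, b"\x00\x01\x02\x03" * 4)     # 16-byte "frame"
print(len(recv_frame(terminal_end)), "bytes received")
server_end.close()
terminal_end.close()
```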
Difficulty of Transporting Full Motion Video Information to Terminal Systems
Referring back to FIG. 2, the technique of transmitting only the changes to a display screen works well for relatively static screen displays, but it breaks down when an application presents full-motion video, since virtually the entire display screen changes with every video frame.
When full motion video must be transmitted digitally, video compression systems are generally used in order to greatly reduce the amount of bandwidth needed to transport the video information. Thus, a digital video decoder may be implemented in thin-client terminal systems 240 in order to reduce the communication channel bandwidth used when a user executes an application that displays full-motion video.
Video compression systems generally operate by taking advantage of the temporal and spatial redundancy in nearby video frames. For efficient digital video transmission, video information is encoded (compressed) at a video origination site, transmitted in encoded form across a digital communication channel (such as a computer network), decoded (decompressed) at the destination site, and then displayed on a display device at the destination site. Many well-known digital video encoding systems exist such as MPEG-1, MPEG-2, MPEG-4, and H.264. These various digital video encoding systems are used to encode DVDs, digital satellite television, and digital cable television broadcasts.
Implementing digital video encoding and video decoding systems is relatively easy on a modern personal computer system that is dedicated to a single user since there is plenty of processing power and memory capacity available for the task. However, in a multi-user thin-client terminal system environment as illustrated in FIG. 2, the processing and memory resources of the thin-client server system 220 must be shared among many simultaneous users, such that dedicating significant server resources to video processing for every terminal session quickly becomes impractical.
Similarly, one of the primary goals for a multi-user thin-client system is to keep the construction of the thin-client terminal systems 240 as simple and inexpensive as possible. Thus, constructing a thin-client terminal system with a main computer processor having sufficient processing power to handle digital video decoding in the same manner as a personal computer system may not be cost efficient. Specifically, a thin-client terminal system 240 that handled video decoding with a generalized processor would require a large amount of memory to store the incoming data, storage space for the decoder code, the ability to perform dynamic updates, and sufficient processing power to execute sophisticated digital video decoder routines, such that the thin-client terminal system 240 would become expensive to develop and manufacture.
Integrating Full Motion Video Decoders in Terminal Systems
To efficiently implement full-motion video decoding in thin-client terminal systems, the thin-client terminal systems 240 may be implemented with one or more inexpensive dedicated digital video decoder integrated circuits. Such digital video decoder integrated circuits would relieve a main processor in the thin-client terminal system 240 from the difficult task of video decoding.
Dedicated digital video decoder integrated circuits have become relatively inexpensive due to the mass marketplace for digital video devices. For example, DVD players, portable video playback devices, satellite television receivers, cable television receivers, terrestrial high-definition television receivers, and other consumer products must all incorporate some type of digital video decoding circuitry. Thus, a large market of inexpensive digital video decoder circuits has been created. With the addition of one or more inexpensive dedicated video decoder integrated circuits, a thin-client terminal system that is capable of handling digitally encoded video can be implemented at a relatively low cost.
The digital video decoders that are selected for implementation within the thin-client terminal system 240 are selected for ubiquity and low implementation cost in a thin-client system architecture. If a particular digital video decoder is ubiquitous but expensive to implement, it will not be practical due to the high cost of the digital video decoder. However, this particular case is generally self-limiting since any digital video decoder that is expensive to implement does not become ubiquitous. If a particular digital video decoder is very inexpensive but decodes a digital video encoding that is only rarely used within a personal computer environment then that digital video decoder will not be selected since it is not worth the cost of adding a digital video decoder that will rarely be used.
Although dedicated video decoder integrated circuits have been discussed, the video decoders for use in a thin-client terminal system 240 may be implemented with many different methods. For example, the video decoders may be implemented with software that runs on a processor, as discrete off-the-shelf hardware parts, or as decoder cores implemented within an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA). In one embodiment, a licensed video decoder core within an ASIC was selected since other portions of the thin-client terminal system 240 could also be implemented on the same ASIC.
Integrating Full Motion Video Encoders in Thin-Client Server Systems
The integration of digital video decoders into thin-client terminal systems only solves a portion of the full-motion video problem: the digital video decoding portion. To take advantage of the integrated digital video decoders, the thin-client server system must be able to transmit encoded video to the thin-client terminal systems. One system for implementing video encoding within a thin-client server system 220 is illustrated in FIG. 5.
Referring to FIG. 5, each application session 205 on the thin-client server system 220 is supported by a virtual graphics card 531 that handles the display output for that session.
To help handle full-motion video, the present disclosure provides the virtual graphics card 531 with access to digital video decoders 532 and digital video transcoders 533. The digital video decoders 532 and digital video transcoders 533 are used to handle digital video encoding systems that are not directly supported by the digital video decoder(s) in a target thin-client terminal system 240. Specifically, the video decoders 532 and video transcoders 533 help the virtual graphics card 531 handle digital video streams that are not natively supported by the digital video decoder(s) (if any) in the thin-client terminal systems. The decoders 532 are used to decode video streams and place the decoded data in the thin-client screen buffer 215. The transcoders 533 are used to convert from a first digital video encoding format into a second digital video encoding format; in this case, the second digital video encoding format will be a digital video encoding format natively supported by a target thin-client terminal device.
The transcoders 533 may be implemented as a digital video decoder for decoding a first digital video stream into individual decoded video frames, a frame buffer memory space for storing the decoded video frames, and a digital video encoder for re-encoding the decoded video frames into a second digital video format. This enables the transcoders 533 to use existing video decoders on the personal computer system. Furthermore, the transcoders 533 could share the same video decoding software used to implement the video decoders 532. Sharing code would reduce licensing fees.
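The decoder/frame-buffer/encoder composition can be sketched as follows (codec internals are stubbed with identity functions; only the structure is the point, and all names are assumptions):

```python
class Transcoder:
    def __init__(self, decode, encode, buffer_frames=8):
        self.decode = decode              # first-format decoder (could be
        self.encode = encode              # shared with decoders 532)
        self.buffer_frames = buffer_frames
        self.frame_buffer = []            # staging area for decoded frames

    def transcode(self, stream_chunks):
        output = []
        for chunk in stream_chunks:
            self.frame_buffer.extend(self.decode(chunk))
            while len(self.frame_buffer) >= self.buffer_frames:
                batch = self.frame_buffer[: self.buffer_frames]
                del self.frame_buffer[: self.buffer_frames]
                output.append(self.encode(batch))
        if self.frame_buffer:             # flush any remaining frames
            output.append(self.encode(self.frame_buffer))
            self.frame_buffer = []
        return output

# Identity stand-ins for real MPEG-2 decode / H.264 encode routines:
t = Transcoder(decode=list, encode=tuple)
print(t.transcode(["abcdefgh", "ij"]))
```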
To best describe the video transport system of the terminal server system 220, its operation will be described with reference to the flow diagram of FIG. 6.
After the terminal session has been initialized and the virtual graphics card 531 has been created, the virtual graphics card 531 is ready to accept display requests from the associated application session 205 and the operating system 222 at step 630 in FIG. 6. At step 640, the virtual graphics card 531 examines each new display request to determine whether it is an ordinary display update, which is simply rendered into the associated screen buffer 215, or a request to display a digital video stream.
Referring back to step 640, if the new display request presented to the virtual graphics card 531 is for a digital video stream to be displayed, then the virtual graphics card 531 proceeds to step 650. At step 650, the virtual graphics card 531 determines if the associated thin-client terminal system 240 includes the appropriate digital video decoder needed to decode the digital video stream. If the associated thin-client terminal system 240 does have the appropriate video decoder, then the virtual graphics card 531 proceeds to step 655, where it sends the video stream directly to the associated thin-client terminal system 240, as illustrated in FIG. 5.
Handling Unsupported Encoded Video Requests
Referring back to step 650 of FIG. 6, if the associated thin-client terminal system 240 does not include a digital video decoder appropriate for the digital video stream, then the virtual graphics card 531 proceeds to step 660.
At step 660, the virtual graphics card 531 determines if transcoding of the unsupported video stream presented to the virtual graphics card 531 is possible and desirable. Transcoding is the process of converting a digital video stream from a first video encoding format into another video encoding format. If transcoding of the video stream is possible and desirable, then the virtual graphics card 531 proceeds to step 665, where the video stream is provided to the transcoder software 533 to transcode the video stream into an encoded video stream that is supported by the associated thin-client terminal system 240. Note that in some circumstances it may be possible to transcode a video stream but not desirable to do so. For example, transcoding is a processor-intensive task, and if the thin-client server system already has a heavy processing load, then it may not be desirable to transcode the video stream. This may be true even if the transcoding is performed in a lossy manner that reduces quality in order to perform the transcoding quickly.
Referring back to step 660, if transcoding is not possible or not desirable, then the virtual graphics card 531 may proceed to step 670. At step 670, the virtual graphics card 531 sends the video stream to the video decoder software 532 to decode the video stream. The video decoder software 532 will write the frames of video information into the appropriate screen buffer 215 for the associated application session 205. The frame encoder 217 of the thin-client server system 220 will read that bit-mapped screen buffer 215 and transport the display information to the thin-client terminal system 240. Note that the frame encoder 217 has been designed to transport only the changes to the screen buffer 215 to the associated thin-client terminal system 240. With full motion video, the changes may occur so frequently that updates cannot be transmitted as fast as the changes are made, such that the video displayed on the thin-client terminal system 240 may be missing many frames and appear jerky.
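The decision logic of steps 640 through 670 can be summarized in a short sketch (the predicates and handler callbacks are illustrative assumptions, not the actual virtual graphics card interface):

```python
def handle_request(request, terminal_codecs, server_is_busy,
                   send_stream, transcode, decode_to_screen_buffer):
    """Route one display request along the step 640/650/660 decisions."""
    if request["type"] != "video":
        decode_to_screen_buffer(request)      # ordinary display update
        return "screen-buffer"
    if request["codec"] in terminal_codecs:   # steps 650/655: native decode
        send_stream(request)
        return "pass-through"
    if not server_is_busy():                  # steps 660/665: transcode
        transcode(request)
        return "transcoded"
    decode_to_screen_buffer(request)          # step 670: decode fallback
    return "screen-buffer"

outcome = handle_request(
    {"type": "video", "codec": "MPEG-2"},
    terminal_codecs={"H.264"},
    server_is_busy=lambda: False,
    send_stream=print, transcode=print, decode_to_screen_buffer=print,
)
print(outcome)   # -> "transcoded"
```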
The system disclosed in FIG. 5 can thus fall back on two different methods when a thin-client terminal system 240 does not natively support an encoded video stream: decoding the video into the screen buffer 215 for transport by the frame encoder 217 (the original system), or transcoding the video in software. Neither fallback is fully satisfactory.
The original system is clearly inadequate since it was only designed to handle relatively static screen displays such as those created by simple office applications like word processors and spreadsheets. The resultant display at the thin-client terminal systems 240 may appear jerky and out of synchronization. Furthermore, the execution of the software decoder 532 will waste valuable processor cycles that could instead go to the application sessions 205. Finally, the inefficient encoding of the video information done by the frame encoder 217 would likely tax the bandwidth of the communication channel 230. Thus, use of the original system for full motion video is probably the least desirable solution.
The video transcoding option has similar problems. Various software developers have created software applications for transcoding a video stream from a first encoding system to another encoding system. For example, an MPEG-encoded stream may be transcoded into an H.264 video stream. However, video transcoding is a very computationally intensive operation if it is to be performed with minimal quality loss. In fact, even with modern microprocessors, a good quality transcoding operation may require multiple times the duration of the video file. For example, transcoding a one-hour DVD-quality video file encoded in MPEG-2 into an equivalent file encoded in H.264 may take from one to five hours if good quality is maintained, even using a quad-core Intel CPU running at 2.6 GHz. Such high-quality, non-real-time video transcoding is not an option for a real-time terminal system as illustrated in FIG. 2.
Video Transcoding with Specialized Hardware
Video transcoding is a very specialized task that involves decoding an encoded video stream and then re-encoding the video stream with an alternate video encoding system. Since different parts of a video image are generally not dependent upon each other, the task of transcoding lends itself to being divided and performed in parallel. Thus, the general purpose processor in a personal computer system is not the ideal system for transcoding; highly parallelized processor architectures are much better suited to the task.
One type of highly parallelized processing architecture that is commonly available today is the Graphics Processing Unit (commonly referred to as a GPU). GPUs are specialized processors primarily designed for rendering three-dimensional graphical images in real time within personal computer systems and videogame consoles. The GPU industry is currently dominated by nVidia, Inc. and ATI (a subdivision of Advanced Micro Devices). nVidia and ATI GPUs are designed with a large number of elementary processors on a single chip. Currently, state-of-the-art nVidia graphics adapter cards have 240 such elementary processors, also called stream processors. This large number of parallel processors will continue to grow in the future, thus providing even better three-dimensional graphics rendering capabilities.
Due to their highly parallelized architecture, GPUs have proved to be very useful for performing compression of still images, full-motion video, and even audio. Whereas a transcoding operation on an hour of video data may take 5 to 6 hours on a general purpose processor, the same transcoding operation may be performed by parallelized software running on a mid-range nVidia GPU in only 20 to 30 minutes. Since this is less than the running time of the video itself, the transcoding can be performed in real time. Allowing for some image quality degradation, the operation can be performed using even less of the GPU's processing capability. And if this is performed within a system having a general purpose processor, that general purpose processor will be freed to operate on other tasks.
Referring back to step 660 in FIG. 6, when a GPU is available to perform the transcoding in hardware, transcoding the unsupported video stream becomes possible and desirable in far more circumstances.
GPU Video Transcoding of Multiple Video Streams
The use of GPUs for transcoding has proven to be very effective. However, the system illustrated in FIG. 5 must support many simultaneous terminal sessions, such that a single GPU must transcode multiple video streams concurrently by time-division multi-tasking among the streams.
One of the difficult aspects of time-division multi-tasking is the penalty imposed when switching between the different tasks. Specifically, when switching between different tasks, the full state of the processor for the current task must be stored and the full state of the next task must be loaded before the processor can continue. In GPU processors, which have a highly parallelized architecture with deep processor pipelines, such task switching penalties are especially severe: the deep pipelines of the GPU processor must be emptied out, stored, and then reloaded for a task switch to occur. Thus, to improve upon transcoding performance, the present disclosure proposes switching tasks only at natural boundaries within the encoded video streams, as set forth below.
MPEG video encoding and its derivatives use a technique called inter-frame compression. While standards like MJPEG, DV, and DVC compress frame by frame, preserving each entire frame, the MPEG-based standards fully compress only a few independent frames called I-frames. The remaining frames are created by using information from other nearby frames. Specifically, P-frames use information from frames that occurred earlier in the sequence, and B-frames use information from frames that may occur before or after the current frame. Thus, between I-frames, the MPEG standards create compressed frames (P-frames and B-frames) that contain only the changes between frames. An illustration of this frame structure is presented in FIG. 7.
A problem with that technique is the inability to “cut” a video stream in its MPEG format at an arbitrary frame. Video editing applications accomplish such arbitrary cuts by decoding B-frames and P-frames and re-encoding those frames as I-frames. Specifically, all the frames between two I-frames can be fully decoded to recreate the original frames, the stream can be cut at the desired point, and the frames can then be re-encoded. Hardware transcoding applications cannot do that since it would greatly impair efficiency; even if theoretically possible at the cost of reduced efficiency, it would greatly limit real-time operation. The inability to cut a stream at an arbitrary frame complicates multi-stream transcoding in particular, since a fixed time slot assigned to a stream on the hardware encoder may expire in the middle of a group of interdependent frames.
To improve upon the art of multi-stream transcoding, the present disclosure introduces the idea of transcoding multi-tasking based upon “chunks” of video defined by the existing I-frames in a video stream. The hardware encoder (implemented either with a GPU or a separate chip) receives a defined “chunk” of a video clip, where a chunk is defined as two successive I-frames and all the other frames between those two I-frames. Task switching to the next chunk occurs only after the current chunk is fully processed. In applications where the CPU does the actual decoding and raw uncompressed frames are passed to the hardware encoder for final compression, the CPU would pass a number of full frames equal to the number of frames included in a chunk. This enables the hardware encoder to quickly compress that chunk and then switch to the next chunk. Over time, the series of chunks can come from any of the multiple active streams: whichever stream is next in line at the time the hardware encoder is ready for the next chunk.
The video chunks will be compressed at a speed faster than real time and then deposited in a stream buffer, where they will be rejoined with the following video chunks without losing any video frames. The video chunks will then be streamed to the final destination at real-time speed.
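A sketch of this chunk-based scheduling follows (frame records and the encoder are illustrative stand-ins; here a chunk runs from one I-frame up to, but not including, the next, which serves the same purpose as the bounded-by-two-I-frames definition above for scheduling):

```python
# Sketch of chunk-based task switching: each stream is divided into
# chunks at I-frame boundaries, so the hardware encoder only switches
# streams between chunks, never in the middle of dependent frames.

from collections import deque

def split_into_chunks(frames):
    """frames: sequence of (frame_type, payload); types 'I', 'P', 'B'."""
    chunks, current = [], []
    for frame in frames:
        if frame[0] == "I" and current:
            chunks.append(current)       # close chunk at the next I-frame
            current = []
        current.append(frame)
    if current:
        chunks.append(current)
    return chunks

def schedule_chunks(streams, encode_chunk):
    """Round-robin whole chunks from each active stream to the encoder."""
    queues = {sid: deque(split_into_chunks(f)) for sid, f in streams.items()}
    ready = deque(queues)
    while ready:
        sid = ready.popleft()
        encode_chunk(sid, queues[sid].popleft())   # no mid-chunk switching
        if queues[sid]:
            ready.append(sid)

schedule_chunks(
    {"stream-1": [("I", 0), ("P", 1), ("B", 2), ("I", 3), ("P", 4)],
     "stream-2": [("I", 0), ("B", 1)]},
    encode_chunk=lambda sid, chunk: print(sid, [t for t, _ in chunk]),
)
```

Because every chunk begins with an I-frame, each chunk can be encoded independently, and the compressed chunks can be concatenated in the stream buffer without losing frames.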
Combined System with 3D Graphics and GPU Video Encoding
The teachings presented in the earlier sections may be combined to create a server system that uses specialized graphics hardware in the server system for performing both 3D graphics rendering and digital video encoding. In order to create such a system, the software for managing the sharing of the graphics hardware must be able to handle both 3D graphics rendering and digital video encoding tasks in its context switching architecture. Such context switching is well known in the art since most modern computer operating systems perform context switching in order to handle multiple applications running simultaneously on the same computer hardware.
The preceding technical disclosure is intended to be illustrative, and not restrictive. For example, the above-described embodiments (or one or more aspects thereof) may be used in combination with each other. Other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the claims should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.” Also, in the following claims, the terms “including” and “comprising” are open-ended, that is, a system, device, article, or process that includes elements in addition to those listed after such a term in a claim are still deemed to fall within the scope of that claim. Moreover, in the following claims, the terms “first,” “second,” and “third,” etc. are used merely as labels, and are not intended to impose numerical requirements on their objects.
The Abstract is provided to comply with 37 C.F.R. §1.72(b), which requires that it allow the reader to quickly ascertain the nature of the technical disclosure. The abstract is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Also, in the above Detailed Description, various features may be grouped together to streamline the disclosure. This should not be interpreted as intending that an unclaimed disclosed feature is essential to any claim. Rather, inventive subject matter may lie in less than all features of a particular disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/032,045 filed Feb. 27, 2008 (“METHOD FOR VIRTUAL 3D GRAPHICS ACCELERATION”) and U.S. Provisional Patent Application Ser. No. 61/199,826 filed Nov. 19, 2008 (“SYSTEM AND METHOD FOR STREAMING MULTIPLE DIFFERENT VIDEO STREAMS”), both of which are incorporated herein by reference in their entirety.