Cloud Gaming
Computer games have become one of the most dynamic and fastest-changing technological areas. One approach to providing content-rich games on mobile devices is to stream the 3D graphic content as traditional video content (ordered sequences of individual still images). The idea is to define a client-server architecture in which modern video streaming and cloud computing techniques are exploited to allow clients with thin computing and rendering resources to provide their users with interactive visualization of 3D environments and data sets.
There have been proposals for streaming 3D graphics commands and letting the client render the game contents, such as by Tzruya et al., in “Games@Large—a new platform for ubiquitous gaming and multimedia”, Proceedings of BBEurope, Geneva, Switzerland, December 2006, which is incorporated by reference as if set forth in full herein. However, the paradigm may change due to the emergence of cloud computing. The concept of cloud-based multi-player on-line gaming is to shift the graphics rendering operations from the local client to the server in the cloud center and to stream the rendered game contents to end users in the form of video. Such services have been offered by vendors such as Otoy and OnLive. The new service relies heavily on low-latency video streaming technologies. It demands rich interactivity between clients and servers and low-delay video transmission from the server to the client. Many technical issues for such a system were discussed by Tzruya et al., discussed above, and also by A. Jurgelionis et al., in “Platform for Distributed 3D Gaming”, International Journal of Computer Games Technology, 2009, the latter of which is also incorporated by reference as if set forth in full herein. There remains a need, however, to develop highly efficient encoding schemes that generate a more uniform bit-rate output in order to avoid buffer delay and network latency.
Video Compression, Generally
Conventional video compression methods are based on reducing the redundant and perceptually irrelevant information of video sequences (an ordered series of still images).
Redundancies can be removed such that the original video sequence can be recreated exactly (lossless compression). The redundancies can be categorized into three main classifications: spatial, temporal, and spectral redundancies. Spatial redundancy refers to the correlation among neighboring pixels. Temporal redundancy means that the same object or objects appear in two or more different still images within the video sequence. Temporal redundancy is often described in terms of motion-compensation data. Spectral redundancy addresses the correlation among the different color components of the same image.
Usually, however, sufficient compression cannot be achieved simply by reducing or eliminating the redundancy in a video sequence. Thus, video encoders generally must also discard some non-redundant information. When doing this, the encoders take into account the properties of the human visual system and strive to discard information that is least important for the subjective quality of the image (i.e., perceptually irrelevant or less relevant information). As with reducing redundancies, discarding perceptually irrelevant information is also mainly performed with respect to spatial, temporal, and spectral information in the video sequence.
The reduction of redundancies and perceptually irrelevant information typically involves the creation of various compression parameters and coefficients. These often have their own redundancies and thus the size of the encoded bit stream can be reduced further by means of efficient lossless coding of these compression parameters and coefficients. The main technique is the use of variable-length codes.
Video compression methods typically differentiate images that can or cannot use temporal redundancy reduction. Compressed images that do not use temporal redundancy reduction methods are usually called INTRA or I-frames, whereas temporally predicted images are called INTER or P frames. In the INTER frame case, the predicted (motion-compensated) image is rarely sufficiently precise, and therefore a spatially compressed prediction error image is also associated with each INTER frame.
In video coding, there is always a trade-off between bit rate and quality. Some image sequences may be harder to compress than others due to rapid motion or complex texture, for example. In order to meet a constant bit-rate target, the video encoder controls the frame rate as well as the quality of the images. The more difficult an image is to compress, the worse the resulting image quality. If a variable bit rate is allowed, the encoder can maintain a constant video quality, but the bit rate typically fluctuates greatly.
H.264/AVC (Advanced Video Coding) is a standard for video compression. The final drafting work on the first version of the standard was completed in May 2003 (Joint Video Team of ITU-T and ISO/IEC JTC 1, Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264|ISO/IEC 14496-10 AVC), Doc. JVT-G050, March 2003) and is incorporated by reference as if set forth in full herein. H.264/AVC was developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG). It was the product of a partnership effort known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 Part 10 (AVC) standard are jointly maintained so that they have identical technical content. H.264/AVC is used in such applications as players for Blu-ray Discs, videos from YouTube and the iTunes Store, web software such as the Adobe Flash Player and Microsoft Silverlight, broadcast services for DVB and SBTVD, direct-broadcast satellite television services, cable television services, and real-time videoconferencing.
The coding structure of H.264/AVC is depicted in
The H.264/AVC standard is actually more of a decoder standard than an encoder standard. This is because, while H.264/AVC defines many different encoding techniques that may be combined in a vast number of permutations, each with numerous customizations, an H.264/AVC encoder is not required to use any of them or any particular customization. Rather, the H.264/AVC standard specifies that an H.264/AVC decoder must be able to decode any compressed video that was compressed according to any of the H.264/AVC-defined compression techniques.
Along these lines, H.264/AVC defines 17 sets of capabilities, which are referred to as profiles, targeting specific classes of applications. The Extended Profile (XP), intended as the streaming video profile, provides some additional tools to allow robust data transmission and server stream switching. Many of the coding tools available under the different profiles are shown in
We use the H.264/AVC video coding standard as the basis and apply numerous fine-tunings so that it can meet the stringent requirements of real-time on-line gaming.
Characteristics of Game Contents
In the conventional H.264/AVC coding scheme, an intra frame (I frame) consumes a bit rate 5-10 times higher than that of an inter frame, as shown in
To illustrate this phenomenon in the context of cloud gaming and to test embodiments of the methods and systems described herein, several test video sequences were selected. The gaming contents of the test video sequences were classified into four categories according to their usage as follows:
To analyze the gaming contents, the test sequences were compressed using various quantization parameters (QP=12, 24, 36). The experimental results are summarized in Table 1, where “Compression Ratio” represents the ratio between the compressed data size and the uncompressed data size. The uncompressed data is in YUV 4:2:0 format.
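By way of illustration, the uncompressed size of a YUV 4:2:0 sequence, and hence the compression ratio of the kind reported in Table 1, can be computed as in the following sketch. The frame dimensions, frame count, and compressed size used in the example are hypothetical and chosen only for illustration.

```python
def yuv420_frame_bytes(width, height):
    """Uncompressed size of one YUV 4:2:0 frame: a full-resolution luma plane
    plus two chroma planes subsampled by 2 in each dimension (8 bits/sample)."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma  # equivalent to 1.5 bytes per pixel


def compression_ratio(compressed_bytes, width, height, frame_count):
    """Ratio between compressed data size and uncompressed data size."""
    uncompressed = yuv420_frame_bytes(width, height) * frame_count
    return compressed_bytes / uncompressed


# Hypothetical example: a 300-frame 1280x720 test sequence compressed to 4 MB.
ratio = compression_ratio(4_000_000, 1280, 720, 300)
print(f"compression ratio = {ratio:.4f}")
```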
The graphs of bandwidth used over time for several of the video segments in Table 1 are shown in
Comparing the above figures, we can see that the results are very content-sensitive.
Overview
In many embodiments, a new coding scheme is used that scatters the intra-frame coding bits across multiple frames. Here, we propose ways to modify the video encoding algorithm for H.264/AVC so that it can offer a nearly constant-bit-rate output. The approach consists of three sub-tasks, as follows:
In H.264, a picture is partitioned into fixed-size macroblocks that each cover a rectangular picture area of 16×16 samples of the luma component and 8×8 samples of each of the two chroma components. This partitioning into macroblocks has been adopted in all previous video coding standards, such as MPEG-4 Visual and H.263. Macroblocks (MBs) are the basic building blocks of the standard for which the decoding process is specified. Hence, an MB is coded independently, and each MB coding type (MB_type) can be determined while keeping the bit stream compatible with the syntax of the standard H.264/AVC decoder.
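As a simple illustration of this partitioning, the sketch below computes the macroblock grid for a picture whose dimensions are multiples of 16; the 1280×720 frame size is assumed only for the example.

```python
def macroblock_grid(width, height, mb_size=16):
    """Number of macroblocks across and down for a picture whose dimensions
    are multiples of the macroblock size (16x16 luma samples in H.264/AVC)."""
    assert width % mb_size == 0 and height % mb_size == 0
    return width // mb_size, height // mb_size


# Hypothetical 720p frame: 80 x 45 = 3600 macroblocks, each covering
# 16x16 luma samples and 8x8 samples of each of the two chroma components.
cols, rows = macroblock_grid(1280, 720)
print(cols, rows, cols * rows)
```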
A slice is a sequence of macroblocks which are processed in the order of a raster scan, so a picture may be split into one or several slices as shown in
Each slice can be coded using different coding types as follows.
I slice: A slice in which all MBs of the slice are coded using intra prediction.
P slice: In addition to the coding types of the I slice, some MBs of the P slice can also be coded using inter prediction with at most one motion-compensated prediction signal per prediction block.
B slice: In addition to the coding types available in a P slice, some MBs of the B slice can also be coded using inter prediction with two motion-compensated prediction signals per prediction block.
Since each slice of a coded picture should be decodable independently of the other slices of the picture, the H.264/AVC design enables sending and receiving the slices of the picture in any order relative to each other. Consequently, prediction methods such as motion estimation and intra prediction cannot operate normally across slice boundaries, because information from outside the slice is not available. Hence, coding performance is expected to decrease as the number of slices increases. Under many typical circumstances, the coding performance degrades by about 10% for each additional slice. In many embodiments of video encoders designed to achieve a more uniform bit rate, at least four slices are used for a given frame. So, in embodiments adding four slices, a coding performance degradation of about 40% would be expected in order to provide the uniform bit-rate video coding functionality.
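The following sketch illustrates, under assumed values, how the macroblocks of a picture, taken in raster-scan order, could be partitioned into a given number of slices. The picture size and slice count are hypothetical, and the sketch is not tied to any particular encoder implementation.

```python
def split_into_slices(total_mbs, num_slices):
    """Partition macroblock addresses 0..total_mbs-1 (raster-scan order)
    into num_slices contiguous slices of near-equal size."""
    base, extra = divmod(total_mbs, num_slices)
    slices, start = [], 0
    for i in range(num_slices):
        size = base + (1 if i < extra else 0)
        slices.append(list(range(start, start + size)))
        start += size
    return slices


# Hypothetical example: a 3600-MB picture (e.g., 1280x720) split into 4 slices.
for i, s in enumerate(split_into_slices(3600, 4)):
    print(f"slice {i}: MBs {s[0]}..{s[-1]} ({len(s)} MBs)")
```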
Basic Coding Unit (BCU) with the Intra Macroblock Allocation (IMBA) Map
Therefore, we propose a new type of coding unit called the basic coding unit (BCU). The BCU is similar to the concept of Slice as defined in the Extended Profile. Each macroblock can be assigned freely to a BCU based on a predefined IMBA map (Intra Macroblock Allocation map) shown in
With this technique, we can provide a uniform output bit rate without losing any coding performance as depicted in
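One possible way to realize such an allocation is sketched below: each macroblock is assigned a phase within a refresh period of N frames and is intra coded only in the frame matching its phase, so that each frame carries roughly 1/N of the intra-refresh bits. The refresh period, picture size, and the cyclic assignment rule are assumptions made for this example and are not necessarily the IMBA map depicted in the figure.

```python
def build_imba_map(total_mbs, period):
    """Assign each macroblock a phase within a refresh period of `period`
    frames; the MB is intra coded only in frames whose index matches its
    phase, which scatters the intra-refresh bits evenly across the period."""
    return [mb % period for mb in range(total_mbs)]


def mbs_intra_in_frame(imba_map, frame_index, period):
    """Macroblock addresses forced to intra coding in a given frame."""
    phase = frame_index % period
    return [mb for mb, p in enumerate(imba_map) if p == phase]


# Hypothetical example: 3600 MBs refreshed over a 30-frame period,
# so each frame intra-codes only 120 of the 3600 macroblocks.
imba = build_imba_map(3600, 30)
print(len(mbs_intra_in_frame(imba, 0, 30)))  # -> 120
```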
Bit Allocation between Frames
Now, we can allocate appropriate bit budgets over the various frames of a video game based on the bandwidth requirement and the video content characteristics. Since each MB of a BCU can be encoded independently, different quantization parameters can be assigned to different MBs of a BCU, resulting in a bit stream with a more uniform output bit rate at the encoder. For the first intra frame and at scene changes, we can also employ a larger quantization parameter to minimize bit-rate fluctuation. As shown in
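A highly simplified sketch of such a quantization parameter adjustment is given below, written per frame for brevity; in the described embodiments the adjustment can equally be made per MB of a BCU. The QP offset for intra frames and scene changes, the overshoot/undershoot thresholds, and the bit budget are hypothetical values chosen only for illustration.

```python
def choose_frame_qp(base_qp, is_first_intra_or_scene_change,
                    last_frame_bits, target_bits_per_frame,
                    qp_min=12, qp_max=51):
    """Very simplified QP adjustment: start from a base QP, raise it for the
    first intra frame or a scene change to suppress the bit-rate spike, and
    nudge it up or down depending on how far the previous frame deviated
    from the per-frame bit budget.  All constants are assumed values."""
    qp = base_qp + (4 if is_first_intra_or_scene_change else 0)  # assumed offset
    if last_frame_bits > 1.2 * target_bits_per_frame:
        qp += 1   # previous frame overshot the budget: quantize harder
    elif last_frame_bits < 0.8 * target_bits_per_frame:
        qp -= 1   # previous frame undershot: spend more bits on quality
    return max(qp_min, min(qp_max, qp))


# Hypothetical usage: 2 Mbit/s at 30 fps gives roughly a 66.7 kbit/frame budget.
qp = choose_frame_qp(30, False, last_frame_bits=90_000, target_bits_per_frame=66_667)
print(qp)  # -> 31
```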
Reduction of Computational Complexity
The H.264 standard achieves higher compression efficiency than previous video coding standards with its rate-distortion optimized (RDO) mode decision method. The outstanding coding performance of H.264, however, comes at the cost of significantly higher complexity, making it too complex to be applied widely. Therefore, this research has focused on computational complexity reduction for the H.264 coding standard, making it feasible to perform real-time encoding on a personal computer. We propose a fast mode decision algorithm using early SKIP mode decision and combined motion estimation and mode decision.
Since H.264/AVC provides many coding options (or functions) to achieve its high coding efficiency, we cannot use all of the coding options in real-time encoding software. Hence, several efficient options need to be selected. To evaluate the encoding time of each option, the time difference (ΔTime) is defined by
ΔTime = (TFull − Toption)/TFull × 100 (%), where TFull denotes the encoding time when all of the coding options are enabled and Toption denotes the encoding time when the option under evaluation is disabled.
The SKIP mode refers to the 16×16 mode in which neither motion nor residual information is encoded. It has the lowest complexity in the mode decision process since no motion search is required. Hence, if we determine the SKIP mode at an early stage, we can significantly reduce the encoding time by skipping the other inter modes. In order to determine whether the best MB mode is SKIP or not, we calculate the rate-distortion cost for the SKIP mode, Jmode-nonzero(SKIP), which represents the sum of the absolute levels of the nonzero DCT coefficients. The value of Jmode-nonzero(SKIP) is calculated in the following steps:
If the value of Jmode-nonzero(SKIP) is smaller than a threshold THSKIP, the MB mode is determined to be SKIP at an early stage and the remaining inter modes are not evaluated.
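A minimal sketch of this early SKIP test is given below, assuming that Jmode-nonzero(SKIP) is computed as the sum of absolute values of the nonzero transform coefficient levels of the SKIP-mode residual and is compared against a threshold THSKIP. The threshold value and the coefficient input are hypothetical.

```python
def sum_abs_nonzero_levels(residual_coeffs):
    """Jmode-nonzero(SKIP): sum of absolute values of the nonzero transform
    coefficient levels obtained when the MB is coded in SKIP mode (i.e.,
    predicted from the inferred motion vector with no coded residual)."""
    return sum(abs(c) for c in residual_coeffs if c != 0)


def early_skip_decision(residual_coeffs, th_skip=3):
    """Return True if the MB can be declared SKIP without evaluating the other
    inter modes.  th_skip is a hypothetical threshold; in practice it would
    be tuned, possibly as a function of the quantization parameter."""
    return sum_abs_nonzero_levels(residual_coeffs) < th_skip


# Hypothetical example: a nearly flat residual passes the early SKIP test.
print(early_skip_decision([0, 1, 0, 0, -1, 0]))  # -> True
```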
In order to show the efficiency of the developed uniform bit-rate coding method, various gaming contents have been used in the experiments, and the bit distribution of each bitstream has been compared in