Remote gaming applications, in which a server-side game is controlled by a client-side player, have attempted to encode the video output from a three-dimensional (3D) graphics engine in real-time using existing or customized encoders. However, the interactive nature of video games, particularly the player feedback loop between video output and player input, makes game video streaming much more sensitive to latency than traditional video streaming. Existing video coding methods can trade computational power, and little else, for reductions in encoding time. New methods for integrating the encoding process into the video rendering process can provide significant reductions in encoding time while also reducing computational power, improving the quality of the encoded video, and retaining the original bitstream data format to preserve interoperability of existing hardware devices.
Existing video coding standards can draw only on the color and temporal information contained in image sequences to improve video encoding time, size, or quality. Some coding standards, such as those in the MPEG standard series, use a computationally intensive block-based motion estimation method to approximate object movement based on the color data contained in a video. These block-based motion estimation methods have historically provided significant reductions in the size of encoded video, but are a source of significant latency in real-time video streaming environments.
Integrating the encoding process into the video rendering process provides access to additional data sources that can be leveraged for encoding improvements. For instance, some 3D graphics engines, such as those contained in a game engine, may already generate motion vectors that perfectly describe the movement of each pixel on each video frame. By providing both the final rendered frame and injecting properly formatted motion vector data into the encoder, the most computationally-complex and time-consuming step in the video encoder, motion estimation, can be skipped for each inter frame. Additionally, the motion vectors supplied by the graphics engine will be more accurate than those approximated by a block-based motion estimation algorithm, which will improve the quality of the encoded video.
These two domains, video encoding and real-time graphics rendering, have traditionally operated separately and independently. By integrating the graphics engine and encoder to leverage the strengths of each, the encoding time can be reduced enough to support streaming applications that are hyper-sensitive to latency.
These and other attendant advantages of the invention will become apparent in view of the deficiencies in the technologies described below.
For example, U.S. Patent Application Publication No. 2015/0228106 A1 (“the '106 Publication”) discloses technology directed to decoding video data to generate a sequence of decoded blocks of a video image. The technology allows for the use of each decoded block of a video image as a separate texture for corresponding polygons of the geometric surface as the decoded block is generated by the codec engine. The '106 Publication technology describes integration between a codec engine that decodes encoded video data to generate the video image to be mapped and a 3D graphics engine that renders the display picture in part by performing the texture mapping of the video image to the geometric surface. However, this technology is deficient compared to the present invention at least because it neither discloses nor uses a graphics engine that provides both the final rendered frame and properly formatted motion vector data for injection into the video codec engine, such that the video codec engine does not need to perform any motion estimation prior to transmitting encoded video data to the remote client coding engine. By contrast, the present invention's improvement to computer technology provides reductions in encoding time and computational power, improvement in the quality of the encoded video, and retention of the original bitstream data format in order to preserve interoperability.
U.S. Patent Application Publication No. 2011/0261885 A1 (“the '885 Publication”) discloses systems and methods directed to bandwidth reduction through the integration of motion estimation and macroblock encoding. In this system, the motion estimation may be performed using fetched video data to generate motion estimation related information, including motion vectors. These motion vectors may correspond to a current macroblock, using corresponding video data cached in the buffer. Again, the '885 Publication technology is deficient compared to the present invention at least because it neither discloses nor uses a graphics engine that provides both the final rendered frame and properly formatted motion vector data for injection into the video codec engine, such that the video codec engine does not need to perform any motion estimation prior to transmitting encoded video data to the remote client coding engine. As such, the technology of the '885 Publication does not provide the same reductions in encoding time and computational power, or the improvement in the quality of the encoded video, that the present invention offers.
As is apparent from the above discussion of the state of the art in this technology, there is a need in the art for an improvement to the present computer technology related to video encoding in game environments.
It is therefore an object of the exemplary embodiments disclosed herein to address disadvantages in the art and provide systems and methods for graphics generation that use networked server architecture running a graphics engine, a video codec engine and a remote client coding engine to transmit encoded video data, whereby the graphics engine provides both the final rendered frame and properly formatted motion vector data for injecting into the video codec engine.
It is another object of the invention to provide systems and methods for graphics generation in which the video codec engine does not need to perform any motion estimation prior to transmitting encoded video data to the remote client coding engine.
It is yet another object of the invention to provide systems and methods for graphics generation in which the graphics engine converts per-pixel motion vectors into per-block motion vectors.
It is yet another object of the invention to provide systems and methods for graphics generation in which the per-pixel motion vectors are generated by using a compute shader to add the per-pixel motion vectors to camera velocity to obtain a per-pixel result, and in which the per-pixel result is stored in a motion vector buffer.
It is yet another object of the invention to provide systems and methods for graphics generation in which the per-block motion vector data is injected by the graphics engine into the video encoding engine in real-time, concurrently with a chroma subsampled video frame.
A more complete appreciation of the invention and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:
In describing the preferred embodiments of the invention illustrated in the drawings, specific terminology will be resorted to for the sake of clarity. However, the invention is not intended to be limited to the specific terms so selected, and it is to be understood that each specific term includes all technical equivalents that operate in a similar manner to accomplish a similar purpose. Several preferred embodiments of the invention are described for illustrative purposes, it being understood that the invention may be embodied in other forms not specifically shown in the drawings.
In applications where a 3D graphics engine is rendering video to be encoded and transmitted in real-time, the graphics engine and encoder can be more tightly coupled to reduce the total computation time and computational overhead. Per-pixel motion vector data that is already generated by the graphics engine for each video frame can be converted to per-block motion vector data and injected into the codec engine to circumvent the motion estimation step which is the single most complex and computationally-intensive step in the encoding process. In graphics engines that use the reconstruction filter for plausible motion blur method, per-pixel motion vectors may already be calculated for each video frame. The conversion from per-pixel motion vectors to per-block motion vectors can be performed by finding the mean vector for each macroblock of 16×16 pixels. The conversion is performed in the 3D graphics engine so that only a small fraction of the original motion vector data needs to be passed from the 3D graphics engine to the coding engine. In cases where the graphics engine and coding engine do not share memory, this will also help reduce memory bandwidth consumption. The per-block motion vectors are injected into the codec engine, skipping the motion estimation step entirely, without significantly modifying the rest of the encoding process.
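The per-pixel to per-block conversion described above can be sketched as follows. This is a minimal illustration in plain Python rather than the GPU implementation the description contemplates; the function name `per_block_motion_vectors`, the nested-list layout of the vector field, and the requirement that the frame dimensions be multiples of the block size are assumptions made for the example.

```python
from typing import List, Tuple

Vector = Tuple[float, float]  # (x, y) motion of one pixel between two frames


def per_block_motion_vectors(field: List[List[Vector]],
                             block: int = 16) -> List[List[Vector]]:
    """Average a per-pixel vector field into one mean vector per block."""
    height, width = len(field), len(field[0])
    if height % block or width % block:
        # Assumption for this sketch: no partial blocks at the frame edges.
        raise ValueError("frame dimensions must be multiples of the block size")
    blocks = []
    for by in range(0, height, block):
        row = []
        for bx in range(0, width, block):
            sum_x = sum_y = 0.0
            for y in range(by, by + block):
                for x in range(bx, bx + block):
                    vx, vy = field[y][x]
                    sum_x += vx
                    sum_y += vy
            count = block * block  # 256 vectors for a 16x16 macroblock
            row.append((sum_x / count, sum_y / count))
        blocks.append(row)
    return blocks


# A 32x16 field: the left macroblock moves (2, 0), the right one (0, -4).
field = [[(2.0, 0.0)] * 16 + [(0.0, -4.0)] * 16 for _ in range(16)]
blocks = per_block_motion_vectors(field)
```

For a 1280×720 frame, the same reduction yields 80×45 = 3,600 per-block vectors in place of 921,600 per-pixel vectors, a 256-fold decrease in the data passed to the coding engine.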
In a commonly used implementation of motion blur, referred to as the reconstruction filter for plausible motion blur, the per-pixel velocities from the velocity buffer are first down sampled into a smaller number of tiles, where each tile assumes the max velocity from the pixel group. The tiles are then masked using the per-pixel depths in the accumulation buffer and the results applied to the per-pixel colors in the color buffer to generate motion blur. There are several variations on the reconstruction filter method which improve fidelity, performance, or both, but the concepts remain similar and a velocity buffer contains the per-pixel motion between two adjacent frames. Although ‘velocity’ is the term used in graphics engine terminology and ‘motion vector’ is the term used in video encoding terminology, the terms are functionally equivalent and a per-pixel velocity is the same thing as a per-pixel motion vector. The velocity buffer contains the supplemental data, in the form of per-pixel motion vectors, which will be reused in the video encoding process.
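The tile downsampling step of the motion blur filter can be sketched as below, assuming that "max velocity" means the velocity with the largest magnitude in the tile. The function name `tile_max` and the nested-list buffer layout are hypothetical; only the downsampling pass is shown, not the depth masking or the blur itself.

```python
import math
from typing import List, Tuple

Vector = Tuple[float, float]  # per-pixel velocity (x, y)


def tile_max(velocity: List[List[Vector]], tile: int) -> List[List[Vector]]:
    """Return, for each tile, the velocity with the largest magnitude in it."""
    height, width = len(velocity), len(velocity[0])
    tiles = []
    for ty in range(0, height, tile):
        row = []
        for tx in range(0, width, tile):
            # Gather the tile's pixels, clipping at the frame edges.
            pixels = [velocity[y][x]
                      for y in range(ty, min(ty + tile, height))
                      for x in range(tx, min(tx + tile, width))]
            row.append(max(pixels, key=lambda v: math.hypot(v[0], v[1])))
        tiles.append(row)
    return tiles


velocity = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (0.0, 0.0)],
    [(0.0, 3.0), (0.0, 0.0), (0.0, 0.0), (-2.0, 0.0)],
]
tiles = tile_max(velocity, tile=2)
```

Note the contrast with the encoding path: the blur filter keeps the dominant velocity per tile, whereas the conversion for the encoder takes the mean of the same per-pixel data.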
In step 204, the graphics engine 100 located at the server 120 converts the per-pixel motion vectors to per-block motion vectors based on the macroblock size to be used in encoding. The H.264 codec uses 16×16 pixel macroblocks by default and has the option to sub-divide further. The 256 per-pixel motion vectors can be averaged together to provide a single mean vector that will serve as the per-block motion vector. This process is described in further detail in connection with
In step 206, the per-macroblock motion vector information is injected into the coding engine/encoder 102 located at the server 120, bypassing the motion estimation step. In software implementations of the encoder, the motion estimation step can be completely disabled, which provides a significant savings in CPU computation time. The time savings in the CPU should more than offset the additional time required to calculate the average vectors in the GPU (in step 204) and transfer them to the CPU.
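To illustrate what bypassing motion estimation saves, the sketch below contrasts an exhaustive block-matching search with simply accepting supplied vectors. It is not the interface of any real encoder: the names `sad`, `estimate`, `motion_vectors`, and `supplied` are hypothetical, vectors are integer displacements into the reference frame (an assumed sign convention), and sub-pixel search is omitted.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]    # luma samples, row-major
Vector = Tuple[int, int]   # (dx, dy) displacement into the reference frame


def sad(cur: Frame, ref: Frame, bx: int, by: int,
        dx: int, dy: int, block: int) -> int:
    """Sum of absolute differences between a block and a displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(block) for x in range(block))


def estimate(cur: Frame, ref: Frame, bx: int, by: int,
             block: int, radius: int) -> Vector:
    """Exhaustive block matching over every in-bounds displacement in the radius."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, block)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            in_bounds = (0 <= by + dy <= height - block
                         and 0 <= bx + dx <= width - block)
            if not in_bounds:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, block)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def motion_vectors(cur: Frame, ref: Frame, block: int, radius: int,
                   supplied: Optional[List[List[Vector]]] = None
                   ) -> List[List[Vector]]:
    """Use engine-supplied per-block vectors when present; otherwise search."""
    if supplied is not None:
        return supplied  # motion estimation is skipped entirely
    return [[estimate(cur, ref, bx, by, block, radius)
             for bx in range(0, len(cur[0]), block)]
            for by in range(0, len(cur), block)]


# The current frame is the reference shifted right by two pixels.
ref = [[10 * x + y for x in range(8)] for y in range(4)]
cur = [[ref[y][max(x - 2, 0)] for x in range(8)] for y in range(4)]
searched = motion_vectors(cur, ref, block=4, radius=2)
injected = motion_vectors(cur, ref, block=4, radius=2,
                          supplied=[[(0, 0), (-2, 0)]])
```

The search evaluates one sum of absolute differences per candidate displacement per block, which is the work that is avoided when vectors are supplied.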
In step 208, because the per-block motion vectors supplied by the graphics engine 100 are interchangeable with those calculated in a typical motion estimation step, encoding begins from the motion compensation step onward. The rest of the video encoding process, as described in further detail in connection with
The H.264 encoder uses a default macroblock size of 16×16, which can be subdivided into smaller sizes down to 4×4. In the
Optional modifications can be made to the arithmetic mean transformation 312 to improve quality at the cost of additional computational complexity or power. For instance, vector median filtering techniques can be applied to remove discontinuities in the macroblock's vector field before the arithmetic mean computation to ensure that the per-macroblock motion vector 310 is representative of most pixels in the macroblock 306. Because the resultant per-macroblock motion vector is derived from pixel-perfect motion vectors that were originally computed based on known object-movement data, these per-macroblock motion vectors will always be a more accurate representation than those calculated by existing block-based motion estimation algorithms that can only derive movement based on pixel color data.
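One possible form of the optional filtering step is sketched below, assuming the common definition of a vector median as the member of a set whose summed Euclidean distance to all members is smallest. The window radius, the edge clipping, and the names `vector_median`, `median_filter`, and `mean_vector` are illustrative assumptions, not details taken from the description.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]


def vector_median(vectors: Sequence[Vector]) -> Vector:
    """Return the member whose summed distance to all members is smallest."""
    return min(vectors,
               key=lambda v: sum(math.dist(v, other) for other in vectors))


def median_filter(field: List[List[Vector]], radius: int = 1) -> List[List[Vector]]:
    """Replace each vector with the vector median of its clipped neighborhood."""
    height, width = len(field), len(field[0])
    return [[vector_median([field[ny][nx]
                            for ny in range(max(0, y - radius), min(height, y + radius + 1))
                            for nx in range(max(0, x - radius), min(width, x + radius + 1))])
             for x in range(width)]
            for y in range(height)]


def mean_vector(field: List[List[Vector]]) -> Vector:
    """Arithmetic mean of every vector in the field."""
    flat = [v for row in field for v in row]
    return (sum(v[0] for v in flat) / len(flat),
            sum(v[1] for v in flat) / len(flat))


# A 4x4 block moving uniformly by (1, 0), with one discontinuous vector.
field = [[(1.0, 0.0)] * 4 for _ in range(4)]
field[1][1] = (9.0, 9.0)

plain = mean_vector(field)                      # pulled toward the outlier
filtered = mean_vector(median_filter(field))    # outlier removed first
```

In this example the unfiltered mean is (1.5, 0.5625), while filtering first gives (1.0, 0.0), the motion of 15 of the 16 pixels.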
The motion vectors 404, having already been converted for the appropriate macroblock size, can be used immediately without any alteration to the motion compensation 406. The results of the motion compensation 406 are combined with the input chroma subsampled video frame 402 to form the residual image 430, which is processed by the residual transformation & scaling 408, quantization 410, and scanning 412 steps that typically occur within existing hardware or software video encoders.
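The relationship between the motion vector, the motion-compensated prediction, and the residual image can be sketched for a single block as follows. The sketch assumes integer vectors that point into the reference frame and works on a single plane of samples; real H.264 motion compensation also supports sub-pixel interpolation, which is omitted. The names `predict_block` and `residual_block` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]    # one plane of samples, row-major
Vector = Tuple[int, int]   # (dx, dy) displacement into the reference frame


def predict_block(ref: Frame, bx: int, by: int,
                  mv: Vector, block: int) -> List[List[int]]:
    """Copy the displaced block from the reference frame as the prediction."""
    dx, dy = mv
    return [[ref[by + dy + y][bx + dx + x] for x in range(block)]
            for y in range(block)]


def residual_block(cur: Frame, ref: Frame, bx: int, by: int,
                   mv: Vector, block: int) -> List[List[int]]:
    """Residual = current block minus its motion-compensated prediction."""
    pred = predict_block(ref, bx, by, mv, block)
    return [[cur[by + y][bx + x] - pred[y][x] for x in range(block)]
            for y in range(block)]


ref = [[10, 20, 30, 40],
       [50, 60, 70, 80]]
cur = [[0, 0, 20, 31],
       [0, 0, 60, 70]]

# The block at x=2 in the current frame came from x=1 in the reference.
residual = residual_block(cur, ref, bx=2, by=0, mv=(-1, 0), block=2)
```

An accurate vector leaves a residual that is mostly zeros, which is what the later transform, quantization, and entropy coding stages compress well.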
The deblocking steps must be performed if the implementation's chosen decoding standard demands it. The deblocking settings 420 and deblocked image 428 are calculated by applying the coding standard's algorithms for inverse quantization 414, inverse transform & scale 416, then deblocking 418. The scanned coefficients 412 are combined with the deblocking settings 420 and encoded in the entropy coder 422 before being transmitted as a bit stream 108 to the remote client computer system 116 for decoding at the remote client computer system's codec 110. The deblocked image 428 becomes the input for the motion compensation 406 of the next frame. The bit stream (comprising encoded video data) 108 retains the same format as defined by the encoding standard used in the implementation, such as H.264/MPEG-4 AVC. This example is specific to the H.264/MPEG-4 AVC standard, but the approach can generally be used for similar coding standards that use motion estimation 426 and motion compensation 406 techniques.
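The reason the encoder runs inverse quantization itself is that the next frame must be predicted from what the decoder will actually reconstruct, not from the original pixels. The simplified sketch below shows this with a uniform scalar quantizer; the step size, the function names, and the omission of the transform and deblocking stages are assumptions made to keep the example short.

```python
from typing import List

Block = List[List[int]]


def quantize(residual: Block, step: int) -> Block:
    """Map each residual sample to the nearest quantization level (lossy)."""
    return [[round(v / step) for v in row] for row in residual]


def dequantize(levels: Block, step: int) -> Block:
    """Inverse quantization: scale the levels back to sample units."""
    return [[level * step for level in row] for row in levels]


def reconstruct(pred: Block, levels: Block, step: int) -> Block:
    """Prediction plus dequantized residual, clamped to the 8-bit range."""
    deq = dequantize(levels, step)
    return [[max(0, min(255, p + d)) for p, d in zip(prow, drow)]
            for prow, drow in zip(pred, deq)]


pred = [[100, 100], [100, 100]]          # motion-compensated prediction
cur = [[103, 97], [100, 111]]            # original block
residual = [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(cur, pred)]

levels = quantize(residual, step=4)      # what is entropy coded
recon = reconstruct(pred, levels, step=4)  # what the decoder will see
```

Here `recon` differs slightly from `cur`, so using `recon` as the reference for the next frame keeps the encoder and decoder in step.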
The motion estimation step in traditional H.264 compliant encoding is typically the most computationally-complex and time-consuming step. As discussed herein, reusing game-generated motion vectors can produce significant reductions in encoding time.
In the test environment, the graphics engine produced output at a resolution of 1280×720 at 60 frames per second. The encoding times were captured from an x264 encoder running single-threaded. Running the encoder single-threaded will produce encoding times longer than real-world usage but will normalize measurements to one core so they are directly comparable to each other. Encoding times were first measured using unmodified motion estimation within the encoder, then remeasured in the same environment using the game-generated motion estimation feature enabled.
A low motion area was selected comprising a first-person player view of the player's hands, weapon, and a stationary wall. The player's hands and weapon cycle through a slight “bobbing” animation to produce a small amount of pixel motion in a relatively small amount of screen space. The results of this test are reproduced in Table 1 below, which shows latency results with and without the game-generated motion estimation techniques described herein. At a low motion intensity, with the game-generated motion estimation disabled, the unmodified encoding time was 12 ms. When the game-generated motion estimation was enabled, the encoding time was reduced by 3 ms (25%) to 9 ms. Similar latency reductions were shown for average and high motion intensity scenarios, with decreases in latency of 17.6% for average motion intensity scenarios and between 15% and 30% for high motion intensity scenarios. These results demonstrate a significant reduction in latency when the game-generated motion estimation is enabled.
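The low-motion figures can be checked with a short calculation. The 12 ms and 9 ms encoding times and the 60 frames per second rate are taken from the test description; the function name `percent_reduction` and the comparison against the per-frame time budget are added here for illustration.

```python
def percent_reduction(before_ms: float, after_ms: float) -> float:
    """Relative reduction in encoding time, as a percentage."""
    return (before_ms - after_ms) / before_ms * 100.0


# Time available per frame at 60 frames per second (about 16.7 ms).
frame_budget_ms = 1000.0 / 60

low_motion = percent_reduction(12.0, 9.0)

# Share of the frame budget consumed by single-threaded encoding.
share_before = 12.0 / frame_budget_ms   # about 0.72
share_after = 9.0 / frame_budget_ms     # about 0.54
```

The 3 ms saving in the low-motion case is therefore a 25% reduction, and it lowers the encoder's share of a 60 frames per second frame budget from roughly 72% to roughly 54%.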
The test environment also revealed that there is an additional cost when converting the game-generated per-pixel motion vectors into per-macroblock motion vectors for the encoder. However, this cost is significantly less than the encoding time reductions described above. With the graphics engine producing video at a resolution of 1280×720, the motion vector transformation from per-pixel to per-macroblock took 0.02 ms. The measured encoder time savings are more than two orders of magnitude larger than the added cost of using game-generated motion vectors for encoding; in the low motion case, for example, the 3 ms saving is 150 times the 0.02 ms conversion cost.
The foregoing description and drawings should be considered as illustrative only of the principles of the invention. The invention is not intended to be limited by the preferred embodiment and may be implemented in a variety of ways that will be clear to one of ordinary skill in the art. Numerous applications of the invention will readily occur to those skilled in the art. Therefore, it is not desired to limit the invention to the specific examples disclosed or the exact construction and operation shown and described. Rather, all suitable modifications and equivalents may be resorted to, falling within the scope of the invention.
This application claims the benefit of and is a continuation of application Ser. No. 16/290,468, filed Mar. 1, 2019, and a divisional of application Ser. No. 15/958,499, filed Apr. 20, 2018, now U.S. Pat. No. 10,567,788. Application Ser. No. 16/290,468 claims the benefit of the following U.S. Provisional Applications: No. 62/488,526, filed Apr. 21, 2017, and No. 62/596,325, filed Dec. 8, 2017. The contents of each of the foregoing applications are incorporated in their entirety herein.
Number | Name | Date | Kind |
---|---|---|---|
5778190 | Agarwal | Jul 1998 | A |
5886741 | Srivastava | Mar 1999 | A |
6173077 | Trew et al. | Jan 2001 | B1 |
6529613 | Astle | Mar 2003 | B1 |
6687405 | Trew et al. | Feb 2004 | B1 |
6885707 | Tardif | Apr 2005 | B2 |
6903662 | Rix et al. | Jun 2005 | B2 |
6996177 | Beuker | Feb 2006 | B1 |
7844002 | Lu et al. | Nov 2010 | B2 |
8069258 | Howell | Nov 2011 | B1 |
8154553 | Peterfreund | Apr 2012 | B2 |
8678929 | Nishimura et al. | Mar 2014 | B1 |
8854376 | Bhat et al. | Oct 2014 | B1 |
8873636 | Iwasaki | Oct 2014 | B2 |
9313493 | Maaninen | Apr 2016 | B1 |
9358466 | Kruglick | Jun 2016 | B2 |
9609330 | Puri et al. | Mar 2017 | B2 |
9661351 | Laan | May 2017 | B2 |
9665334 | Iwasaki | May 2017 | B2 |
9697280 | Maharajh et al. | Jul 2017 | B2 |
9705526 | Veernapu | Jul 2017 | B1 |
9736454 | Hannuksela et al. | Aug 2017 | B2 |
9749642 | Sullivan et al. | Aug 2017 | B2 |
9762911 | Puri et al. | Sep 2017 | B2 |
9762919 | Cote et al. | Sep 2017 | B2 |
9774848 | Jayant et al. | Sep 2017 | B2 |
20050013363 | Cho et al. | Jan 2005 | A1 |
20050047504 | Sung | Mar 2005 | A1 |
20050053294 | Mukerjee | Mar 2005 | A1 |
20050104889 | Clemie et al. | May 2005 | A1 |
20060230428 | Craig et al. | Oct 2006 | A1 |
20080175439 | Kurata | Jul 2008 | A1 |
20100026886 | Sharlet | Feb 2010 | A1 |
20110032991 | Sekiguchi | Feb 2011 | A1 |
20110206124 | Morphet | Aug 2011 | A1 |
20110261885 | de Rivaz | Oct 2011 | A1 |
20120075317 | Clemie et al. | Mar 2012 | A1 |
20120200583 | Clemie et al. | Aug 2012 | A1 |
20120213278 | Yasugi et al. | Aug 2012 | A1 |
20130101039 | Florencio | Apr 2013 | A1 |
20130294522 | Lim et al. | Nov 2013 | A1 |
20130321423 | Rossato | Dec 2013 | A1 |
20140286423 | Chen | Sep 2014 | A1 |
20140348238 | Morphet et al. | Nov 2014 | A1 |
20150071357 | Pang | Mar 2015 | A1 |
20150117519 | Kim | Apr 2015 | A1 |
20150228106 | Laksono | Aug 2015 | A1 |
20150245049 | Lee | Aug 2015 | A1 |
20150379727 | Golas et al. | Dec 2015 | A1 |
20160133221 | Peana et al. | May 2016 | A1 |
20160150231 | Schulze | May 2016 | A1 |
20160198166 | Kudana et al. | Jul 2016 | A1 |
20160227218 | Trudeau et al. | Aug 2016 | A1 |
20170013279 | Puri et al. | Jan 2017 | A1 |
20170132830 | Ha et al. | May 2017 | A1 |
20170155910 | Owen | Jun 2017 | A1 |
20170200253 | Ling | Jul 2017 | A1 |
20170278296 | Chui et al. | Sep 2017 | A1 |
20180054613 | Lin | Feb 2018 | A1 |
20180300839 | Appu | Oct 2018 | A1 |
Number | Date | Country |
---|---|---|
1820281 | Aug 2017 | EP |
H6121518 | Apr 1994 | JP |
H6129865 | May 1994 | JP |
2487489 | Jul 2013 | RU |
WO 2007008356 | Jan 2007 | WO |
WO-2009-042433 | Apr 2009 | WO |
WO-2009138878 | Nov 2009 | WO |
WO-2016172314 | Oct 2016 | WO |
Entry |
---|
Erturk et al., Two-bit transform for binary block motion estimation, Jul. 2005, IEEE, vol. 15, Issue 7. |
Wang, Z. et al., “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Transactions on Image Processing, 13(4), pp. 600-612, Apr. 2004. |
Moorthy, A.K., “Efficient Motion Weighted Spatio-Temporal Video SSIM Index,” Human Vision and Electronic Imaging XV, vol. 7527, Mar. 2010, (http://live.ece.utexas.edu/publications/2010/moorthy_spie_jan10.pdf). |
BGR Microsoft Article, http://bgr.com/2018/03/16/microsoft-netflix-for-games-subscription-cloud/. |
Verge's Article on Blade Technology, https://www.theverge.com/2018/2/21/17029934/blade-shadow-us-launch-netflix-for-pc-games-cloud-streaming-windows-10. |
Parsec TechCrunch Article, https://techcrunch.com/2017/12/19/is-the-time-finally-right-for-platform-agnostic-cloud-gaming/. |
Giesen, F., “Efficient Compression and Rendering in a Client-Server Setting”, May 2008, pp. 1-68. |
Liang Cheng et al.: “Real-time 3d Graphics Streaming Using Mpeg-4”, Proceedings of the IEEE/ACM Workshop on Broadband Wireless Services and Applications, ICS-UCI, Jul. 18, 2004, pp. 1-16. |
Alparaone L et al.: “Adaptively Weighted Vector-Median Filters for Motion-Fields Smoothing”, 1996 IEEE International Conference On Acoustics, Speech, and Signal Processing—Proceedings. (ICASSP), Atlanta, May 7-10, 1996 [IEEE International Conference On Acoustics, Speech, and Signal Processing—Proceedings (ICASSP)], New York, IEEE, US, vol. CONF, 21, May 7, 1996, pp. 2267-2270. |
“Final Office Action Issued in Korean Patent Application No. 10-2019-7033912”, dated Aug. 17, 2021, 5 Pages. |
“Office Action Issued in Korean Patent Application No. 10-2019-7033912”, dated Oct. 22, 2021, 4 Pages. |
“Office Action Issued in Taiwan Patent Application No. 107113590”, dated Apr. 15, 2019, 12 Pages. (w/o English Translation). |
“Office Action Issued in Taiwan Patent Application No. 109107924”, dated Oct. 30, 2020, 3 Pages. (w/o English Translation). |
“Final Office Action Issued in U.S. Appl. No. 15/958,499”, dated Jan. 30, 2019, 8 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 15/958,499”, dated Sep. 6, 2019, 10 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 15/958,499”, dated Aug. 15, 2018, 7 Pages. |
“Final Office Action Issued in U.S. Appl. No. 16/290,468”, dated Sep. 27, 2019, 9 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 16/290,468”, dated Jul. 8, 2019, 8 pages. |
“Non Final Office Action Issued in U.S. Appl. No. 16/736,490”, dated Dec. 1, 2021, 12 Pages. |
“International Search Report and Written Opinion Issued in PCT Application No. PCT/US18/028544”, dated Jul. 3, 2018, 9 Pages. |
“Search Report Issued in European Patent Application No. 18788077.8”, dated Jan. 21, 2021, 12 Pages. |
“Office Action Issued in United Kingdom Patent Application No. 1916979.6”, dated Aug. 16, 2021, 2 Pages. |
“Office Action Issued in Australian Patent Application No. 2018254550”, dated Jul. 7, 2020, 3 Pages. |
“Office Action Issued in Russian Patent Application No. 2019136490”, dated Apr. 27, 2020, 11 Pages. |
“Search Report Issued in Russian Patent Application No. 2020134091”, dated May 18, 2021, 8 Pages. |
“Office Action Issued in Australian Patent Application No. 2020267252”, dated Nov. 5, 2021, 3 Pages. |
“Office Action Issued in Canadian Patent Application No. 3059740”, dated Aug. 24, 2021, 4 Pages. |
Number | Date | Country |
---|---|---|
20200404317 A1 | Dec 2020 | US |
Number | Date | Country |
---|---|---|
62596325 | Dec 2017 | US |
62488526 | Apr 2017 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 15958499 | Apr 2018 | US |
Child | 16290468 | | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 16290468 | Mar 2019 | US |
Child | 16847493 | | US |