The present disclosure relates generally to extended reality (XR) applications, and more particularly, to techniques for a split-rendering process that mitigates the impact of delay variation in XR applications with remote rendering.
Mobile devices, head-mounted displays (HMDs), set-top boxes, and similar client devices typically lack the graphics capabilities and processing power required by extended reality (XR) applications, such as video games and training simulations. XR applications include both virtual reality (VR) and augmented reality (AR) applications. The limitations on processing power and graphics capabilities can be overcome by implementing remote rendering, in which the heavy lifting of three-dimensional (3D) graphics rendering is performed by a central server and the rendered video is transmitted to the client device over a network. For example, rendering servers can be located in a data center in an edge cloud. This approach requires real-time transport of encoded (compressed) video on the downlink to the client device and of sensor data (as well as camera capture for AR) on the uplink to the rendering server.
Video games and XR applications typically have very strict latency requirements. The interaction latency, i.e., the time from when the user presses a button or moves the controller until the video has been updated on the display (e.g., the HMD), should be on the order of 100 ms. The motion-to-photon latency requirement, i.e., the time from a movement of the user's head until the video has been updated on the display, is even shorter, on the order of about 20 ms. This latency requirement is practically unachievable in the case of remote rendering, so it is mitigated by techniques such as time warp.
Compounding the problem of latency, the AR/VR application may react to changing network conditions by changing its encoding quality and therefore the bitrate of the video streams. Such adaptation methods are supported in current real-time video streaming solutions, such as Web Real-Time Communication (WebRTC) or Self-Clocked Rate Adaptation for Multimedia (SCReAM). Volatile throughput is a known issue in current (fixed and mobile) networks, and throughput is going to be even more volatile in Fifth Generation (5G) networks, where larger bit-pipes are possible with New Radio (NR) but the fading and interference effects of high-frequency channels are also more pronounced.
When the network throughput is lower than the video encoding rate, video packets start queuing up at the bottleneck link and are either delayed (delivered late) to the client or lost. Retransmission of the affected packets can result in additional delays. Because of the stringent latency requirements of XR applications, the result of the queuing delays is that the delayed frames cannot be decoded and presented in time, which is effectively equivalent to their being lost.
To prevent long queuing delays, the AR/VR application may react to changing network conditions by changing the encoding quality and therefore the bitrate of the video stream. With bitrate adaptation, the system attempts to match the encoding rate of the video to the throughput of the network to prevent queuing delays. When the bitrate is adapted, there is, however, a transient period between the time that congestion is detected and the time that the bitrate is adjusted. During this transient period, the video rate may exceed the throughput. The length of the transient period may be decreased by reacting to congestion as fast as possible using, for example, the Low Latency, Low Loss, Scalable Throughput (L4S) mechanism, which enables low transport queue build-ups. However, fast reaction to congestion may result in insufficient utilization of available resources and ultimately in decreased user experience.
Eliminating queuing delays by dropping frames at the transport queue, or even at the server, is not a viable solution, since all frames have to be available at the client for decoding. If a frame is lost, a new I-frame would be needed, which is considerably larger than a P-frame, so creating I-frames should be avoided if possible.
The present disclosure relates generally to a method of split rendering to make video applications more tolerant of queuing delay variation. The rendered image is divided into layers, referred to as graphic layers, to reduce video stream size. The server groups and orders the graphic layers based on Quality of Experience (QoE) importance. When applying bitrate adaptation, the server reduces the video quality of the layers with lower importance first. The client device decodes and presents the graphic layers that have been received in time, i.e., before decoding must start. The client device also keeps the last received instance of each graphic layer and uses it for presentation if the next instance of the graphic layer has not been received in time for decoding. The client device provides feedback to the server indicating which layers it has received in time. Based on the feedback, the server controls its ordering and adaptation mechanisms.
A first aspect of the disclosure comprises methods of split rendering implemented by a server node to reduce the impact of delay variation in remote rendering applications. The method comprises rendering a visual scene to create a plurality of graphic layers. Each graphic layer comprises one or more objects in the visual scene. The method further comprises grouping and sorting the graphic layers according to a quality metric associated with the graphic layers to create multiple graphic layer groups with different quality ranks. The method further comprises encoding the graphic layer groups into a composite video frame so that each graphic layer group in the composite video frame is separately decodable. The method further comprises transmitting, in sorted order based on quality rank, each graphic layer group of the composite video frame to a client device.
A second aspect of the disclosure comprises methods of split rendering implemented by a client device to reduce the impact of delay variation. The method comprises receiving at least a part of a composite video frame from a server node. The composite video frame comprises a plurality of separately decodable graphic layer groups representing a visual scene. Each graphic layer group includes one or more graphic layers representing objects in the visual scene. The method further comprises decoding the received graphic layer groups in the composite video frame that are received prior to a decoding deadline. The method further comprises, if any graphic layer groups have not arrived before the decoding deadline, deriving a substitute graphic layer group for each of one or more late-arriving graphic layer groups in the composite video frame that are not received before the decoding deadline. The method further comprises rendering the composite video frame from the graphic layers in the received graphic layer groups and substitute layer groups to reconstruct the visual scene for display. The method further comprises sending feedback to the server node indicating the graphic layer groups that were received.
A third aspect of the disclosure comprises a server node configured to reduce the impact of delay variation in remote rendering applications. The server node is configured to split render a visual scene to create a plurality of graphic layers. Each graphic layer comprises one or more objects in the visual scene. The server node is further configured to group and sort the graphic layers according to a quality metric associated with the graphic layers to create multiple graphic layer groups with different quality ranks. The server node is further configured to encode the graphic layer groups into a composite video frame so that each graphic layer group in the composite video frame is separately decodable. The server node is further configured to transmit, in sorted order based on quality rank, each graphic layer group of the composite video frame to a client device.
A fourth aspect of the disclosure comprises a client device configured to reduce the impact of delay variation. The client device is configured to receive at least a part of a composite video frame from a server node. The composite video frame comprises a plurality of separately decodable graphic layer groups representing a visual scene. Each graphic layer group includes one or more graphic layers representing objects in the visual scene. The client device is further configured to decode the received graphic layer groups in the composite video frame that are received prior to a decoding deadline. The client device is further configured to, if any graphic layer groups have not arrived before the decoding deadline, derive a substitute graphic layer group for each of one or more late-arriving graphic layer groups in the composite video frame that are not received before the decoding deadline. The client device is further configured to render the composite video frame from the graphic layers in the received graphic layer groups and substitute graphic layer groups to reconstruct the visual scene for display. The client device is further configured to send feedback to the server node indicating the graphic layer groups that were received. The client device optionally displays the visual scene on a display.
A fifth aspect of the disclosure comprises a server node configured to reduce the impact of delay variation in remote rendering applications. The server node comprises communication circuitry for communicating with a client device and processing circuitry. The processing circuitry is configured to split render a visual scene to create a plurality of graphic layers. Each graphic layer comprises one or more objects in the visual scene. The processing circuitry is further configured to group and sort the graphic layers according to a quality metric associated with the graphic layers to create multiple graphic layer groups with different quality ranks. The processing circuitry is further configured to encode the graphic layer groups into a composite video frame so that each graphic layer group in the composite video frame is separately decodable. The processing circuitry is further configured to transmit, in sorted order based on quality rank, each graphic layer group of the composite video frame to a client device.
A sixth aspect of the disclosure comprises a client device configured to reduce the impact of delay variation. The client device comprises communication circuitry for communicating with a server node and processing circuitry. The processing circuitry is configured to receive at least a part of a composite video frame from a server node. The composite video frame comprises a plurality of separately decodable graphic layer groups representing a visual scene. Each graphic layer group includes one or more graphic layers representing objects in the visual scene. The processing circuitry is further configured to decode the received graphic layer groups in the composite video frame that are received prior to a decoding deadline. The processing circuitry is further configured to, if any graphic layer groups have not arrived before the decoding deadline, derive a substitute graphic layer group for each of one or more late-arriving graphic layer groups in the composite video frame that are not received before the decoding deadline. The processing circuitry is further configured to render the composite video frame from the graphic layers in the received graphic layer groups and substitute graphic layer groups to reconstruct the visual scene for display. The processing circuitry is further configured to send feedback to the server node indicating the graphic layer groups that were received. The client device optionally displays the visual scene on a display.
A seventh aspect of the disclosure comprises a computer program for a server node or other network node configured to perform split rendering to mitigate the impact of delay variation in remote rendering applications. The computer program comprises executable instructions that, when executed by processing circuitry in the server node, cause the server node to perform the method according to the first aspect.
An eighth aspect of the disclosure comprises a carrier containing a computer program according to the seventh aspect. The carrier is one of an electronic signal, optical signal, radio signal, or a non-transitory computer readable storage medium.
A ninth aspect of the disclosure comprises a computer program for a client device configured to perform split rendering to mitigate the impact of delay variation in remote rendering applications. The computer program comprises executable instructions that, when executed by processing circuitry in the client device, cause the client device to perform the method according to the second aspect.
A tenth aspect of the disclosure comprises a carrier containing a computer program according to the ninth aspect. The carrier is one of an electronic signal, optical signal, radio signal, or a non-transitory computer readable storage medium.
An improved split rendering process of the present disclosure mitigates the impact of delay variation in XR applications with remote rendering. A graphics application, such as a game engine, executes on a server node disposed in an Edge Data Network (EDN). The game engine generates a visual scene for display to a remote user. The visual scene is split rendered to generate graphic layers from 3D objects in the visual scene. The server node groups and sorts the graphic layers based on QoE importance to create graphic layer groups, encodes each graphic layer group into a composite video frame, and appends metadata to the composite video frame identifying the graphic layer groups. The encoded video frame is then transmitted, in sorted order based on quality rank, to a client device 200 (e.g., an HMD worn by a user), where the video frame is decoded and displayed. When the deadline for decoding the video is reached, the client device reads the metadata, identifies the graphic layer groups, and decodes each graphic layer group that was received prior to the decoding deadline. If a graphic layer group is not received prior to the decoding deadline, the client device 200 derives a substitute graphic layer group for the late or missing graphic layer group using buffered data from a previous frame. The resultant graphic layers are then rendered to reconstruct the visual scene, which is displayed to the user on the client device 200. The client device 200 further sends feedback to the server indicating the graphic layer groups that were timely received.
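For illustration only, the data model implied by this flow can be sketched in Python. All class and field names below (GraphicLayer, GraphicLayerGroup, CompositeFrame, qoe_score, and so on) are hypothetical stand-ins, not elements defined by the disclosure:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class GraphicLayer:
    """One or more objects of the visual scene rendered together."""
    layer_id: int
    texture: bytes                    # rendered pixel data for the layer
    position: Tuple[float, ...]       # spatial location for scene reconstruction
    motion_vector: Tuple[float, ...]  # common motion of the objects in the layer
    qoe_score: float                  # importance of the layer to user experience

@dataclass
class GraphicLayerGroup:
    group_id: int
    rank: int                         # lower rank = higher QoE importance
    layers: List[GraphicLayer] = field(default_factory=list)
    bitstream: bytes = b""            # independently decodable encoded data

@dataclass
class CompositeFrame:
    frame_index: int
    metadata: Dict                    # identifies the groups; transmitted first
    groups: List[GraphicLayerGroup] = field(default_factory=list)
```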
Based on feedback from the client, the split rendering technique as herein described enables the server node to separately adapt the graphic layer groups responsive to changes in network conditions depending on QoE-importance to ensure the best possible QoE for given transport conditions. Further, the need for fast adaptation to changing network conditions is eliminated by trading off a slight degradation of video quality for reduced latency and delay variability.
The techniques as herein described can be combined with other techniques, such as time warp techniques, to mitigate the impact of lost information in the rendered image.
The solution is network agnostic and does not require any specific cooperation with the network in the form of off-line service level agreements (SLAs) or on-line network application programming interface (API) based interactions. However, the proposed method may still benefit from some of the existing technologies that keep the transport queues short, e.g., Explicit Congestion Notification (ECN) or L4S.
The access network 12 may be any type of communications network (e.g., Wireless Fidelity (WiFi), ETHERNET, Wireless Local Area Network (WLAN), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), etc.), and functions to connect subscriber devices, such as client device 200, to one or more service provider nodes, such as server node 100. The cloud network 14 provides such subscriber devices with “on-demand” availability of computer resources (e.g., memory, data storage, processing power, etc.) without requiring the user to directly, actively manage those resources. According to embodiments of the present disclosure, such resources include, but are not limited to, one or more XR applications being executed on server node 100. The XR applications may comprise, for example, gaming applications and/or simulation applications used for training.
Generally, the server node 100 groups and sorts graphic layers in a video frame based on QoE importance and transmits the different graphic layer groups in sorted order to the client device 200 based on QoE rank. In more detail, when the server node 100 receives a visual scene from the XR application, the server node 100 renders the visual scene to generate a plurality of graphic layers, each of which comprises one or more 3D objects in the visual scene (block 310). Each graphic layer is associated with a motion vector and spatial location information.
The server node 100 optionally receives feedback from the client device 200 indicating the graphic layer groups that were received in the previous frame, which can be used for bitrate adaptation of the graphic layer group as hereinafter described (block 320).
The server node 100 groups and sorts the graphic layers based on QoE importance (block 330). That is, graphic layers deemed more important to the user experience are given a higher rank than graphic layers deemed to be less important to user experience. The server node 100 can consider a number of factors in determining the QoE rank of a graphic layer. The following is a non-exhaustive listing of factors that may be considered in grouping and sorting the graphic layers:
The result of the grouping and sorting process is a set of graphic layer groups. Some graphic layer groups may comprise a single graphic layer, while other groups comprise two or more graphic layers. In the extreme case, each group may comprise a single graphic layer.
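One possible grouping-and-sorting policy, reusing the hypothetical GraphicLayerGroup structure sketched above, is to sort the layers by a scalar QoE score and chunk them into contiguous groups of similar importance, so that M layers become at most N groups. The scoring and chunking here are illustrative assumptions, not requirements of the disclosure:

```python
import math

def group_and_sort(layers, num_groups):
    """Sort M layers by descending QoE importance and chunk them into at most
    num_groups contiguous groups of similar importance (illustrative policy)."""
    ranked = sorted(layers, key=lambda layer: layer.qoe_score, reverse=True)
    size = math.ceil(len(ranked) / num_groups)
    # The returned list is already in sorted (transmission) order by rank.
    return [GraphicLayerGroup(group_id=rank, rank=rank,
                              layers=ranked[start:start + size])
            for rank, start in enumerate(range(0, len(ranked), size))]
```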
Returning to
The encoder at the server node 100 adapts to the network condition. Depending on whether network congestion is present, the server node 100 determines the encoding rate (also referred to herein as the target rate) for each graphic layer or graphic layer group. If congestion is detected, the target rate is decreased for one or more graphic layers or graphic layer groups (block 350). If congestion is not detected, the target rate for one or more graphic layers or graphic layer groups may be increased (block 360).
In one embodiment, the encoding rate, also referred to as the target rate, is determined individually for each graphic layer group. The encoding logic takes into account the QoE-importance of the graphic layer or graphic layer group, i.e., the more important graphic layers are encoded with higher quality. This QoE-aware encoding ensures, for a given network condition, the best possible QoE at the client device 200.
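A minimal sketch of the rate logic of blocks 350 and 360, assuming a per-group rate table; the 10% step, the single cut per decision, and the rate floor are illustrative assumptions rather than values taken from the disclosure:

```python
def adapt_target_rates(groups, rates, congested, step=0.10, min_rate=100_000):
    """Adjust per-group target encoding rates in bits/s (illustrative policy).

    `groups` is in sorted order (index 0 = highest QoE rank) and `rates`
    maps group_id to the current target rate. Under congestion the least
    important group with remaining headroom is reduced first; without
    congestion all rates are increased gradually.
    """
    if congested:
        for group in reversed(groups):          # least important group first
            if rates[group.group_id] > min_rate:
                rates[group.group_id] = max(min_rate,
                                            rates[group.group_id] * (1 - step))
                break                           # cut one group per decision
    else:
        for group in groups:
            rates[group.group_id] *= (1 + step)
    return rates
```

With this kind of policy, the most important groups keep their quality for as long as possible, which matches the QoE-aware encoding goal stated above.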
The sorted graphic layers are input to a video encoder along with the target rates determined at blocks 350 and/or 360. The encoder in the server node 100 encodes each graphic layer or graphic layer group separately based on the determined target rates (block 370). The independent encoding eliminates dependencies between graphic layer groups and ensures that each graphic layer group is decodable independently of the other graphic layer groups.
As noted above, a graphic layer group may comprise one or more graphic layers. Referring back to
In some embodiments, the encoder can use the feedback from the client device 200 to select a reference for the encoding process. That is, the encoder may use the last graphic layer or graphic layer group received by the client device 200 as a reference when encoding the current graphic layer or graphic layer group. If a graphic layer or graphic layer group was not received in the previous frame, the encoder may use the last graphic layer group that was received as a reference for the current frame. For example, assume that the encoder is encoding Group C for the current frame and the client feedback indicates that Group C was not received in frame n-1. In this case, the encoder can use Group C from frame n-2 as a reference to encode Group C in the current frame.
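The reference selection in this example might be realized as follows, assuming the feedback history is kept as a mapping from frame index to the set of acknowledged group identifiers; the bounded lookback and the intra-coding fallback are likewise assumptions:

```python
def choose_reference_frame(group_id, current_frame, feedback, max_lookback=8):
    """Return the newest previous frame in which the client acknowledged this
    group, e.g. frame n-2 when the group was missed in frame n-1 (sketch).

    `feedback` maps a frame index to the set of group ids the client
    reported as received for that frame.
    """
    oldest = max(current_frame - 1 - max_lookback, -1)
    for n in range(current_frame - 1, oldest, -1):
        if group_id in feedback.get(n, set()):
            return n       # inter-code against this acknowledged frame
    return None            # no usable reference: encode the group as intra data
```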
Following encoding, the composite video frame is transmitted to the client device 200 (block 380). The metadata is transmitted first, followed by each group in order based on the group rank.
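The transmission order (metadata first, then groups by ascending rank) can be sketched as a serialization routine; the JSON header and length-prefixed layout are illustrative choices, not a format mandated by the disclosure:

```python
import json
import struct

def serialize_composite_frame(frame):
    """Pack the metadata first, then each group's bitstream by ascending rank."""
    ordered = sorted(frame.groups, key=lambda group: group.rank)
    meta = dict(frame.metadata,
                groups=[{"id": g.group_id, "rank": g.rank, "size": len(g.bitstream)}
                        for g in ordered])
    header = json.dumps(meta).encode()
    out = struct.pack("!I", len(header)) + header  # length-prefixed metadata
    for group in ordered:                          # most important groups first
        out += group.bitstream
    return out
```

Sending the metadata first lets the client identify the expected groups even when only part of the frame arrives before the decoding deadline.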
In some embodiments, the server node 100 may also encode motion information for each graphic layer or graphic layer group. The motion information can be used, for example, to perform time-warping at the client device 200.
The graphic layer groups that have timely arrived are stored in a buffer and sent directly for display (blocks 425, 430). For the graphic layer groups that have not timely arrived, the client device 200 checks whether it has a buffered instance of the same graphic layer group from the last received frame (the frame for time Ti-1 in
When all the graphic layer groups have arrived or have been derived, the client device 200 renders the display image and outputs the image to a display (blocks 455, 460). The image is rendered from the graphic layers using the 3D position information and the layer texture.
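The deadline handling described above might look like the following sketch, in which `buffer` holds the last decoded instance of each graphic layer group and `decode_group` is a hypothetical stand-in for the video decoder:

```python
def assemble_at_deadline(expected_ids, received, buffer, decode_group):
    """At the decoding deadline, decode the groups that arrived and substitute
    the buffered instance of each group that did not (illustrative sketch)."""
    frame_groups, acked = {}, set()
    for gid in expected_ids:
        if gid in received:                   # timely: decode, buffer, display
            frame_groups[gid] = decode_group(received[gid])
            buffer[gid] = frame_groups[gid]
            acked.add(gid)
        elif gid in buffer:                   # late or lost: reuse last instance
            frame_groups[gid] = buffer[gid]
        # else: no earlier instance exists; render the scene without this group
    return frame_groups, acked                # `acked` feeds the client feedback
```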
Finally, the client device 200 sends feedback to the server node 100 indicating which graphic layer groups were received before the decoding deadline (block 465). As noted above, this feedback can be used to prioritize the graphic layers, to determine an encoding rate for graphic layer groups, or to select a reference for encoding the graphic layer in the current frame.
In some embodiments, client device 200 may continue receiving the missing parts of the current frame after the decoding deadline is reached. If a substitute graphic layer group is stored in the buffer, it can be overwritten or replaced when the missing part is finally received. In some embodiments, the client could wait until a predetermined time after Ti (Ti + t) to provide the feedback to the server node 100. In this case, the feedback would indicate the graphic layer groups received prior to Ti + t. In another embodiment, the client device 200 waits until the first bytes of the next frame arrive before sending feedback.
The processing unit 120 in this example comprises an optional game engine 130 or other application, a rendering unit 140, a grouping and sorting unit 150, an encoding unit 160, a transmitting unit 170, and an optional feedback unit 180. The various units 130-180 can be implemented by hardware and/or by software code that is executed by a processor or processing circuitry. The game engine 130 receives user input, sensor data, and camera feeds, maintains a game context, and generates visual scenes that are presented on a display of the client device 200. The visual scenes generated by the game engine 130 are input to the rendering unit 140. The rendering unit 140 renders the visual scene provided by the game engine 130 into a set of graphic layers. Each graphic layer comprises one or more objects in the visual scene. The graphic layer includes position information for reconstructing the visual scene from the graphic layers and motion information describing or characterizing the motion of the objects in the graphic layer. Generally, the objects in the same graphic layer will be characterized by the same motion.
The rendering unit 140 passes the graphic layers to the grouping and sorting unit 150. The grouping and sorting unit 150 groups and sorts the graphic layers according to QoE importance as previously described. During the grouping and sorting process, some of the graphical layers may be consolidated into a single graphic layer group that represents a set of graphic layers with similar importance. The encoding unit 160 separately encodes each graphic layer group into a composite video frame and appends metadata to the beginning of the composite video frame identifying the graphic layer groups. In the composite video frame, the groups are ordered according to importance, which can be derived from the importance of the graphic layers in the graphic layer group. The transmitting unit 170 transmits the composite video frame to the client device 200 over the network via the transmitter 290 according to the communication protocols for the network. The processing unit 120 optionally includes a feedback unit 180 to handle client feedback from the client device 200. The client feedback, as previously described, may be provided to the grouping and sorting unit 150 for use in prioritizing the graphical layers, or to the encoding unit 160 for use in encoding the graphic layers.
The processing unit 220 includes a decoding unit 230, a derivation unit 240, a buffer unit 250, a feedback unit 260, a rendering unit 270, and an optional display 280. The various units 220-280 can be implemented by hardware and/or by software code that is executed by a processor or processing circuitry. The decoding unit 230 decodes the rendered video received from the server node 100 to obtain the graphic layers representing the objects in the visual scene. If any graphic layers in the current frame are missing (either because they arrived too late or were lost), the derivation unit 240 derives a substitute graphic layer group for the missing graphic layer group from information stored by the buffer unit 250. The buffer unit 250 stores the graphic layer groups that were successfully decoded and the substitute graphic layer groups generated by the derivation unit 240. The feedback unit 260 sends feedback to the server node 100 indicating the graphic layer groups that were received in time for decoding. The rendering unit 270 renders the composite video frame received from the server node 100 to reconstruct the visual scene output by the game engine 130 or other application running on the server node 100. The visual scene is output to the display 280 for presentation to the user.
In some embodiments of the method 500, the server node further adds metadata to the composite video frame identifying each graphic layer and its position in the video frame.
In some embodiments of the method 500, grouping and sorting the graphic layers comprises grouping M graphic layers into N graphic layer groups based on quality metrics of the graphic layers, where M > N.
Some embodiments of the method 500 further comprise receiving feedback from the client device 200 indicating graphic layers received in a previous frame and determining the quality metrics for the graphical layers based on the feedback from the client device 200.
Some embodiments of the method 500 further comprise receiving feedback from the client device indicating graphic layers received in a previous frame and determining an encoding for a graphic layer based on the feedback from the client device 200.
In some embodiments of the method 500, determining an encoding for a graphic layer based on the feedback from the client device comprises reducing an encoding rate for a graphic layer group when the feedback indicates that one of the graphic layers in the group was not received in the previous frame.
Some embodiments of the method 500 further comprise detecting congestion in the communication link between the rendering device (e.g., server node 100) and the client device 200 and independently varying the encoding rate for each group based on detected congestion.
Some embodiments of the method 500 further comprise reducing an encoding rate for at least one graphic layer group when congestion is detected and increasing an encoding rate for at least one group when congestion is not detected.
In some embodiments of the method 600, deriving a substitute graphic layer comprises, for each late-arriving graphic layer group, retrieving a previous graphic layer group corresponding to the late-arriving graphic layer group from a buffer and deriving the substitute graphic layer group based on the previous graphic layer group.
Some embodiments of the method 600 further comprise storing the timely received graphic layer groups and substitute graphic layer groups in a buffer.
Some embodiments of the method 600 further comprise, after a decoding deadline for the current frame, receiving a late-arriving graphic layer group, decoding the late-arriving graphic layer group, and storing the late-arriving graphic layer group in a buffer by replacing a corresponding substitute graphic layer group with the late-arriving graphic layer group.
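Continuing the buffer sketch above, a group that arrives after the deadline could be handled as follows; whether decoding late data is worth the effort is an implementation choice:

```python
def on_late_group(gid, data, buffer, decode_group):
    """Decode a group that missed its deadline and overwrite the substitute in
    the buffer, so subsequent frames fall back to fresher data (sketch)."""
    buffer[gid] = decode_group(data)
```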
The communication circuitry 710 comprises network interface circuitry for communicating with other nodes in and/or communicatively connected to communication network 10. Such nodes include, but are not limited to, one or more network nodes and/or functions disposed in cloud network 14, access network 12, and one or more client devices 200, such as an HMD.
Processing circuitry 720 controls the overall operation of the server node 700 and is configured to perform the steps of methods 300 and 500 shown in
Memory 730 comprises both volatile and non-volatile memory for storing computer program code and data needed by the processing circuitry 720 for operation. Memory 730 may comprise any tangible, non-transitory computer-readable storage medium for storing data including electronic, magnetic, optical, electromagnetic, or semiconductor data storage. Memory 730 stores computer program 740 comprising executable instructions that configure the processing circuitry 720 to implement methods 300 and 500 shown in
The communication circuitry 810 comprises network interface circuitry for communicating with other nodes in and/or communicatively connected to communication network 10. Such nodes include, but are not limited to, one or more network nodes and/or functions disposed in cloud network 14, access network 12, and one or more server nodes 100.
Processing circuitry 820 controls the overall operation of the client device 800 and is configured to perform the steps of methods 400 and 600 shown in
Memory 830 comprises both volatile and non-volatile memory for storing computer program code and data needed by the processing circuitry 820 for operation. Memory 830 may comprise any tangible, non-transitory computer-readable storage medium for storing data including electronic, magnetic, optical, electromagnetic, or semiconductor data storage. Memory 830 stores computer program 840 comprising executable instructions that configure the processing circuitry 820 to implement methods 400 and 600 shown in
The user interface 850 comprises a head-mounted display and/or user controls, such as buttons, actuators, and software-driven controls, that facilitate a user’s ability to interact with and control the operation of the application running on the server node 100. The head-mounted display is worn by the user and is configured to display rendered video images to the user. The head-mounted display may include sensors for sensing head movement and orientation and cameras providing a video feed. Other types of displays can be used in place of the head-mounted display. Exemplary displays include Cathode Ray Tube (CRT) displays, Liquid Crystal Displays (LCDs), Liquid Crystal on Silicon (LCoS) displays, and Light-Emitting Diode (LED) displays. Other types of displays not explicitly described herein may also be possible.
Those skilled in the art will also appreciate that embodiments herein further include corresponding computer programs. A computer program comprises instructions which, when executed on at least one processor of an apparatus, cause the apparatus to carry out any of the respective processing described above. A computer program in this regard may comprise one or more code modules corresponding to the means or units described above.
Embodiments further include a carrier containing such a computer program. This carrier may comprise one of an electronic signal, optical signal, radio signal, or computer readable storage medium.
In this regard, embodiments herein also include a computer program product stored on a non-transitory computer readable (storage or recording) medium and comprising instructions that, when executed by a processor of an apparatus, cause the apparatus to perform as described above.
Embodiments further include a computer program product comprising program code portions for performing the steps of any of the embodiments herein when the computer program product is executed by a computing device. This computer program product may be stored on a computer readable recording medium.
Filing Document | Filing Date | Country | Kind
--- | --- | --- | ---
PCT/EP2020/074819 | 9/4/2020 | WO |