The specification relates generally to video delivery, and specifically to a method, system and apparatus of generating video data for delivery over a network link to a display device.
Media streaming systems, such as virtual reality systems in which a user wears a headset or other device displaying video, may rely on a distinct host device such as a game console to generate the video and stream the video to the headset. The communication link between the host device and the headset or other client device is increasingly being deployed as a wireless link, which is susceptible to degradation due to physical interference and the like. In the presence of degradation, the link may not provide sufficient bandwidth to reliably carry the video stream. Reduced bandwidth availability may be accommodated by compressing the video stream, but such compression typically results in a loss of visual fidelity, reducing the quality of the video as perceived by the viewer at the client device.
An aspect of the specification provides a method of generating video data for delivery over a network link to a display, the method comprising: obtaining a positional indicator corresponding to the display; generating, based on the positional indicator, (i) a primary region definition and (ii) a secondary region definition distinct from the primary region definition; selecting (i) a primary compression level corresponding to the primary region definition, and (ii) a secondary compression level corresponding to the secondary region definition, the secondary compression level being greater than the primary compression level; generating a compressed video frame from an initial video frame by: applying, to a first portion of the initial video frame identified by the primary region definition, a first compression operation according to the primary compression level; applying, to a second portion of the initial video frame identified by the secondary region definition, a second compression operation according to the secondary compression level; and transmitting, for delivery to the display, the compressed video frame and a frame descriptor indicating (i) the primary and secondary region definitions, and (ii) configuration parameters for the first and second compression operations.
A further aspect of the specification provides a host device for generating video data for delivery over a network link to a display, the host device comprising: a memory; a communications interface; and a processor interconnected with the memory and the communications interface, the processor configured to: obtain a positional indicator corresponding to the display; generate, based on the positional indicator, (i) a primary region definition and (ii) a secondary region definition distinct from the primary region definition; select (i) a primary compression level corresponding to the primary region definition, and (ii) a secondary compression level corresponding to the secondary region definition, the secondary compression level being greater than the primary compression level; generate a compressed video frame from an initial video frame by: applying, to a first portion of the initial video frame identified by the primary region definition, a first compression operation according to the primary compression level; applying, to a second portion of the initial video frame identified by the secondary region definition, a second compression operation according to the secondary compression level; and transmit, for delivery to the display, the compressed video frame and a frame descriptor indicating (i) the primary and secondary region definitions, and (ii) configuration parameters for the first and second compression operations.
Embodiments are described below with reference to the accompanying figures.
To that end, the host device 104 includes a central processing unit (CPU), also referred to as a processor 140. The processor 140 is interconnected with a non-transitory computer readable storage medium, such as a memory 142, having stored thereon various computer readable instructions in the form of an application 144 for execution by the processor 140 to configure the host device 104 to perform various functions (e.g. generating and streaming video data to the client device 108). The memory 142 also stores a repository 146 of multimedia data for use in generating the above-mentioned video data. The memory 142 includes a suitable combination of volatile (e.g. Random Access Memory or RAM) and non-volatile memory (e.g. read only memory or ROM, Electrically Erasable Programmable Read Only Memory or EEPROM, flash memory). The processor 140 and the memory 142 each comprise one or more integrated circuits.
The host device 104 may also include one or more input devices (e.g. a keyboard, mouse, game controller or the like, not shown), and one or more output devices (e.g. a display, speaker and the like, not shown). Such input and output devices serve, respectively, to receive commands for controlling the operation of the host device 104 and to present information, e.g. to a user of the host device 104. The host device 104 further includes a wireless communications interface 148 interconnected with the processor 140. The interface 148 enables the host device 104 to communicate with other computing devices, including the client device 108. The nature of the interface 148 is selected according to the type of the network 112. In the present example, the network is a wireless local-area network (WLAN), such as a network based on the IEEE 802.11ad standard or the 802.11ay enhancement thereto (which specify operation at frequencies of about 60 GHz). The interface 148 therefore includes any suitable combination of antenna elements, transceiver components, controllers and the like, to enable communication over such a network.
The client device 108, in the present example, is implemented as a head-mounted unit supported by a headband or other mount that is wearable by an operator (not shown). The client device 108 includes a central processing unit (CPU), also referred to as a processor 180. The processor 180 is interconnected with a non-transitory computer readable storage medium, such as a memory 182, having stored thereon various computer readable instructions in the form of an application 184 for execution by the processor 180 to configure the client device 108 to perform various functions (e.g. receiving video data from the host device 104 and presenting the video data to a viewer or viewers of the client device 108). The memory 182 includes a suitable combination of volatile (e.g. Random Access Memory or RAM) and non-volatile memory (e.g. read only memory or ROM, Electrically Erasable Programmable Read Only Memory or EEPROM, flash memory). The processor 180 and the memory 182 each comprise one or more integrated circuits.
The client device 108 also includes one or more input devices. The input devices of the client device 108 include a position tracker 185, which in the present example is an eye-tracking device. The position tracker 185 is configured to generate, and provide to the processor 180, indications of a position of the center of the visual field of a viewer of the client device 108. That is, the position tracker 185 is configured to track the gaze of the viewer by any suitable eye-tracking technology (e.g. retinal tracking).
The client device 108 also includes one or more output devices. In the present example, the output devices include at least one display 187.
The client device 108 can also include further input devices (e.g. a microphone, a keypad, keyboard, or the like) and further output devices (e.g. one or more speakers). In addition, the client device 108 includes a wireless communications interface 188 interconnected with the processor 180. The interface 188 enables the client device 108 to communicate with other computing devices, including the host device 104. In the present example, as noted above in connection with the host device 104, the network 112 is based on the IEEE 802.11ad standard or the 802.11ay enhancement thereto. The interface 188 therefore includes any suitable combination of antenna elements, transceiver components, controllers and the like, to enable communication over the network 112 at frequencies of around 60 GHz.
Turning to the functional components of the host device 104 implemented via execution of the application 144 by the processor 140, the host device 104 implements a renderer 204 configured to generate initial video frames from the source data stored in the repository 146, a region generator 208 configured to generate region definitions and to select compression levels from a set of compression parameters 212, and a codec 216 configured to compress the rendered frames according to the region definitions and compression levels.
Referring to the corresponding components of the client device 108, execution of the application 184 by the processor 180 implements a region extractor 258 configured to extract compression descriptors from received frames, a codec 256 configured to decompress the received frames, and a display driver 262 configured to present the decompressed frames on the display 187.
Turning now to the operation of the system, a method 300 of generating video data for delivery over a network link to a display is described below, in conjunction with its performance by the host device 104 and the client device 108.
At block 305, the client device 108 is configured to generate a positional indicator corresponding to at least one of the displays 187. As discussed below, the positional indicator reflects the direction of the viewer's gaze. When the client device 108 includes a display 187 for each eye, two positional indicators may therefore be generated at block 305, each corresponding to one eye and the corresponding display 187. In the discussion below, a single positional indicator will be discussed, with the understanding that when two positional indicators are employed, the process below is repeated for each eye and corresponding display 187.
The positional indicator indicates a position, on the display 187, of a field of view of the viewer of the display 187. More specifically, in the present example, the positional indicator indicates a position of a center, or central region, of the field of view. The positional indicator is expressed in a frame of reference corresponding to the display 187 (e.g. as pixel coordinates). As will be apparent to those skilled in the art, the positional indicator is generated via data received from the position tracker 185, which may be a retinal tracker or the like. The positional indicator defines at least a position of the field of view, and can also define a velocity of the field of view, indicating in which direction and at what speed the viewer's eye is moving. For example, the position tracker 185 can be configured to assess the position of the eye at a predefined frequency, and to determine the velocity based on any suitable number of position samples. Preferably the sampling frequency of the position tracker 185 is greater than the frame rate of the video data. For example, in some embodiments the video data may include a series of video frames displayed at a rate of about 90 frames per second, and the position tracker 185 may generate about 120 samples per second. Greater sampling rates (e.g. 300 Hz) may also be employed, as well as smaller sampling rates (e.g. 60 Hz).
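By way of illustration only, the following Python sketch (all names are hypothetical; the specification does not prescribe any particular implementation) shows one way of deriving a positional indicator, including velocity, from tracker samples collected at a rate greater than the frame rate:

```python
from collections import deque

class GazeSampler:
    """Accumulates eye-tracker samples and estimates the gaze position
    and velocity in the pixel frame of reference of a display 187."""

    def __init__(self, window=4):
        # Each sample: (timestamp_s, x_px, y_px) from the position tracker.
        self.samples = deque(maxlen=window)

    def add_sample(self, timestamp_s, x_px, y_px):
        self.samples.append((timestamp_s, x_px, y_px))

    def positional_indicator(self):
        # Assumes at least one sample has been collected.
        t1, x1, y1 = self.samples[-1]
        if len(self.samples) < 2:
            return {"position": (x1, y1), "velocity": (0.0, 0.0)}
        t0, x0, y0 = self.samples[0]
        dt = t1 - t0
        # Mean velocity over the sample window, in pixels per second.
        return {"position": (x1, y1),
                "velocity": ((x1 - x0) / dt, (y1 - y0) / dt)}
```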
Having obtained the positional indicator, the client device 108 is configured to transmit the positional indicator to the host device 104 via the network 112. At block 310, the host device 104 is configured to receive the positional indicator. At block 315, the host device 104 can be configured to assess a quality of the link between the host device 104 and the client device 108 via the network 112. For example, the host device 104 can be configured to track one or more of a signal-to-noise ratio (SNR), an indication of available bandwidth, a received signal strength indicator (RSSI) and the like. In general, the assessment of link quality at block 315 permits the host device 104 to determine whether the link via the network 112, over which the video data is to be transmitted, provides sufficient bandwidth to carry the video stream.
Thus, at block 320, the host device 104 is configured to determine whether the link quality meets a predefined threshold. The threshold is defined based on the parameter(s) measured at block 315 for link quality assessment. For example, if SNR is measured at block 315, the determination at block 320 can include comparison of a measured SNR to a minimum threshold SNR.
When the determination at block 320 is negative (i.e. when the assessed link quality does not meet the predefined threshold mentioned above), performance of the method 300 proceeds to block 325. At block 325 the host device 104 is configured to select and apply two or more distinct compression operations to regions of a video frame before transmitting the frame. In some embodiments, blocks 315 and 320 are omitted, and performance of the method 300 always proceeds to block 325. When the determination at block 320 is affirmative, the above-mentioned compression may be omitted, and performance of the method 300 proceeds directly to block 330.
At block 325, the host device 104 is configured to generate at least a primary region definition and a secondary region definition distinct from the primary region definition. The host device 104 is also configured to select respective compression levels for the regions generated at block 325. As will be seen in greater detail below, the compression levels and the region definitions are employed by the host device 104 to generate video frames in which different portions are compressed to different degrees.
Turning to the performance of block 325 in greater detail, at block 405 the host device 104 is configured to determine whether auxiliary compression is to be applied to one or more video frames, in addition to the position-based compression discussed herein.
The determination at block 405 can be based on the link quality assessment at block 315. For example, the host device 104 can be configured to compare the assessed link quality both to the above-mentioned threshold applied at block 320, as well as to a secondary threshold lower than the threshold applied at block 320. If the link quality fails to meet the initial threshold of block 320 but does meet the threshold of block 405, the determination at block 405 is negative. If, on the other hand, the assessed link quality fails to meet both thresholds, the determination at block 405 is affirmative. In other words, if the link quality is sufficiently low, additional compression may be required to accommodate the limited bandwidth available over the network 112.
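By way of illustration only, a sketch of the two-threshold determination of blocks 320 and 405, assuming SNR is the measured link-quality parameter (the threshold values are hypothetical):

```python
SNR_THRESHOLD_DB = 20.0      # block 320: below this, regional compression applies
SNR_AUX_THRESHOLD_DB = 12.0  # block 405: below this, auxiliary compression also applies

def compression_mode(measured_snr_db):
    if measured_snr_db >= SNR_THRESHOLD_DB:
        return "none"                # block 320 affirmative: proceed to block 330
    if measured_snr_db >= SNR_AUX_THRESHOLD_DB:
        return "regional"            # block 405 negative: blocks 410-415 only
    return "regional+auxiliary"      # block 405 affirmative: blocks 410-430
```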
Following either a positive or a negative determination at block 405, the host device 104 is configured to perform blocks 410 and 415. At block 410, the host device 104 (e.g. the region generator 208) is configured to generate at least a primary region definition and a secondary region definition based on the positional indicator received at block 310. The primary and secondary region definitions correspond to distinct regions of the display 187. In general, the region definitions are generated to correspond to regions of the display 187 to which different levels of compression may be applied with minimal effects on perceived image quality at the client device 108.
The ability of the human eye to distinguish detail decreases from the center of the visual field toward the periphery. For example, maximum resolution is typically available only over an arc of less than about 10 degrees. Therefore, images displayed at or near the center of the viewer's field of vision may be more susceptible to perceived degradation in quality when compressed, while images displayed away from the center of the viewer's field of vision may tolerate greater levels of compression before reduced image quality is perceived by the viewer. The host device 104 is configured to generate the primary and secondary region definitions according to the above-mentioned radial decrease in visual acuity.
The host device 104, at block 410, is configured to retrieve one or more parameters from the memory 142 for use in generating the primary and secondary regions. For example, the parameters can define arcs corresponding to regions of the human field of vision with different perceived resolutions. The arcs may be specified as angles in some embodiments. In other embodiments, in which the dimensions of the display 187 and the distance between the viewer's eye and the display 187 are known in advance, the parameters can be defined as two-dimensional areas expressed in dimensions corresponding to those of the display 187 (e.g. in pixels).
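By way of illustration only, a sketch of converting an arc, expressed as an angle, into an on-display radius, assuming a flat display 187 at a known eye-to-display distance (all values are hypothetical):

```python
import math

def arc_to_radius_px(arc_deg, eye_distance_mm, px_per_mm):
    """Convert a visual arc (full angle, in degrees) to an on-display
    radius in pixels, assuming a flat display at a known distance."""
    half_angle = math.radians(arc_deg / 2.0)
    radius_mm = eye_distance_mm * math.tan(half_angle)
    return radius_mm * px_per_mm

# e.g. a 10-degree arc viewed from 40 mm on a 15 px/mm display:
# arc_to_radius_px(10, 40, 15) -> about 52 pixels
```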
To generate the region definitions at block 410, the host device 104 is configured to retrieve the above-mentioned parameters from the memory 142 and to position the corresponding regions in the frame of reference of the display 187, centered on the position indicated by the positional indicator received at block 310.
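A sketch of the region generation at block 410, assuming circular regions centered on the positional indicator; the use of the velocity component to lead the gaze by one frame is an assumption for illustration, not a requirement of the specification:

```python
def generate_regions(position, velocity, frame_time_s, radii_px):
    """Center concentric regions on the (optionally predicted) gaze
    position; radii_px maps region names to radii in display pixels."""
    cx = position[0] + velocity[0] * frame_time_s
    cy = position[1] + velocity[1] * frame_time_s
    return {name: {"center": (cx, cy), "radius": radius}
            for name, radius in radii_px.items()}

# e.g. at 90 frames per second, with hypothetical radii:
regions = generate_regions((640, 400), (900.0, 0.0), 1 / 90,
                           {"primary": 52, "secondary": 180})
```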
Returning to the method 300, at block 415 the host device 104 (e.g. the region generator 208) is configured to select respective compression levels corresponding to the region definitions generated at block 410. The compression levels are selected from the compression parameters 212, an example of which is shown below in Table 1.

TABLE 1

Level | Algorithm | Settings
---|---|---
1 | None | No compression
2 | DSC | Compression ratio of about 3:1
3 | H.264 | Data rate A
4 | H.264 | Data rate B
As seen above, each compression level corresponds to a set of compression parameters specifying a compression algorithm and any suitable settings for the specified algorithm. For example, the compression level “1” specifies no compression. The compression level “2”, meanwhile, specifies the use of the display stream compression (DSC) algorithm, which is generally considered visually lossless, with a compression ratio of about 3 to 1. In other words, the second compression level specifies a greater degree of compression than the first. The third and fourth levels specify still greater degrees of compression. For example, the third and fourth levels may both specify use of the H.264 codec (also referred to as MPEG-4 AVC), with different data rates (the data rate “A” being greater than the data rate “B”). As will be apparent to those skilled in the art, H.264 may achieve compression ratios significantly greater than 3 to 1, at the cost of some loss of visual fidelity relative to the original video data.
Table 1 contains four compression levels. In other embodiments, greater numbers of compression levels may be defined in the compression parameters 212. In further embodiments, as few as two levels may be defined. The host device 104 can be configured, when a greater number of levels are available than there are region definitions, to select which compression levels to employ based on the link assessment at block 315.
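By way of illustration only, an encoding of the compression parameters 212 of Table 1, with a hypothetical selection rule that shifts every region toward more aggressive levels as link quality degrades:

```python
COMPRESSION_PARAMS = {
    1: {"algorithm": None},                       # no compression
    2: {"algorithm": "DSC", "ratio": "3:1"},      # visually lossless
    3: {"algorithm": "H.264", "data_rate": "A"},  # data rate A > data rate B
    4: {"algorithm": "H.264", "data_rate": "B"},
}

def select_levels(region_names, link_penalty=0):
    """Assign one compression level per region; a positive link_penalty
    (derived from the block 315 assessment) raises every level, clamped
    to the highest level defined in the parameters."""
    base = {"primary": 1, "secondary": 2, "tertiary": 3}
    top = max(COMPRESSION_PARAMS)
    return {name: min(base[name] + link_penalty, top)
            for name in region_names}
```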
Following the performance of block 415, the host device 104 proceeds to block 330, described further below.
When, at block 405, the host device 104 determines that auxiliary compression is also to be applied to one or more video frames, blocks 410 and 415 are performed as described above. In addition, the host device 104 is configured to perform block 420 (e.g. in parallel with block 410). At block 420, the host device 104 (e.g. the region generator 208) is configured to generate one or more auxiliary region definitions based on the source data stored in the repository 146, from which the video frames are generated (e.g. by the renderer 204). The auxiliary region definitions identify additional portions of the video frame, beyond the above-mentioned primary, secondary and tertiary (where employed) region definitions generated at block 410, for compression. In general, the generation of auxiliary region definitions at block 420 by the host device 104 includes inspection of the source data in the repository 146 for portions of the source data that give certain predefined characteristics to corresponding regions of the resulting video frame.
Examples of such predefined characteristics include regions of a video frame representing content at or beyond a certain depth from the viewer, which may be more tolerant of greater levels of compression than content closer to the viewer than the above-mentioned depth. Further examples include regions of a video frame with polygon density below a configurable threshold (which may tolerate greater compression), as well as regions with polygon density above a further configurable threshold (which may be less tolerant of compression). Still further examples include regions of a video frame with a density of edges or other features greater than a threshold (which may be less tolerant of compression), as well as regions with a density of edges or other features below a threshold (which may be more tolerant of compression).
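A sketch of the inspection at block 420, assuming the renderer exports per-tile scene statistics (all threshold names and values are hypothetical):

```python
# Hypothetical per-tile statistics derived from the source data.
DEPTH_FAR_M = 50.0       # content at or beyond this depth tolerates more compression
POLY_DENSITY_LOW = 10    # polygons per tile below which compression may increase
EDGE_DENSITY_HIGH = 0.4  # edge fraction above which compression should decrease

def classify_tile(depth_m, poly_density, edge_density):
    """Return a compression bias for one frame tile based on the
    source-data characteristics described above."""
    if depth_m >= DEPTH_FAR_M or poly_density < POLY_DENSITY_LOW:
        return +1   # more tolerant: raise the compression level
    if edge_density > EDGE_DENSITY_HIGH:
        return -1   # less tolerant: lower the compression level
    return 0
```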
At block 425, the host device 104 is configured to select auxiliary compression levels for the auxiliary region definitions generated at block 420. The auxiliary compression levels are selected from the compression parameters 212, as discussed above. For example, an auxiliary region definition 600 generated at block 420 may be assigned the same compression level as that selected for the secondary region definition at block 415.
Returning briefly to the method 300, following the performance of blocks 415 and 425 the host device 104 is configured, at block 430, to update the region definitions generated at block 410 based on the auxiliary region definitions generated at block 420 and the auxiliary compression levels selected at block 425. For example, the secondary region definition 524 may be updated to produce an updated secondary region definition 624 incorporating the auxiliary region definition 600.
As will now be apparent, the updating performed at block 430 includes collecting all sub-regions with the same compression level into a single one of the primary, secondary and tertiary regions. Thus, the auxiliary region definition 600 is included in the updated secondary region definition 624 because the auxiliary region definition 600 has the same compression level as the secondary region definition 524. In other embodiments, the updating operation can include further determinations. For example, since each auxiliary region definition overlays at least one of the region definitions generated at block 410, the updated region definitions may be generated by averaging the compression level of an auxiliary region definition (selected at block 425) with the compression level(s) of the underlying region definition(s) generated at block 410. In further examples, the compression levels selected at block 425 may be applied to portions of the underlying region definitions from block 410 as increments or decrements, based on whether the auxiliary compression levels are higher or lower than the compression levels selected at block 415. In further examples, the region generator 208 is configured to apply minimum and/or maximum compression limits to the updated compression levels. For example, the region generator 208 can prevent any portion of the primary region definition 520 from being assigned a compression level greater than the level “1” from the table above, irrespective of compression levels selected for auxiliary region definitions that overlay the primary region definition 520.
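A sketch of the updating alternatives described above; the mode names are hypothetical labels for the collect, average and increment/decrement behaviours, with a per-region limit such as the primary-region cap noted above:

```python
def update_level(base_level, aux_level, region_name,
                 mode="collect", min_level=1, max_level=4):
    """Combine an auxiliary compression level with the level of the
    underlying region definition from block 410."""
    if mode == "collect":        # adopt the auxiliary level outright
        level = aux_level
    elif mode == "average":      # average the overlapping levels
        level = round((base_level + aux_level) / 2)
    else:                        # "increment": apply the auxiliary level as a delta
        level = base_level + (1 if aux_level > base_level else -1)
    if region_name == "primary": # e.g. never compress the primary region beyond level 1
        max_level = 1
    return max(min_level, min(level, max_level))
```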
In other embodiments, the determination at block 405 may be omitted, and both branches of region generation described above (i.e. blocks 410-415 and blocks 420-425) may be performed for every video frame.
Following the completion of block 415 or block 430, the host device 104 is configured, at block 330, to generate and send one or more video frames compressed according to the region definitions and levels discussed above. Specifically, initial video frame(s) are rendered from the source data by the renderer 204, compressed by the codec 216 according to the region definitions and compression levels selected by the region generator 208, and provided to the communications interface 148 for transmission. The resulting transmission includes both the video frames themselves, compressed as noted above, and a compression descriptor that includes the region definitions and compression levels according to which the frames were compressed. The compression descriptor includes coordinates or other definitions of the regions, as well as the compression level assigned to each region definition. The compression descriptor can also include the compression settings corresponding to each compression level, though in other embodiments the client device 108 may instead store a copy of the compression parameters 212 in the memory 182. The compression descriptor is contained in one or more metadata fields associated with a frame or group of frames.
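By way of illustration only, a sketch of assembling the compression descriptor as JSON metadata (the field names are hypothetical; the specification requires only that the region definitions, compression levels and, optionally, compression settings be conveyed):

```python
import json

def build_descriptor(regions, levels, params=None):
    """Assemble the per-frame compression descriptor carried in the
    metadata fields of a frame or group of frames."""
    descriptor = {"regions": [
        {"name": name,
         "center": region["center"],
         "radius": region["radius"],
         "level": levels[name]}
        for name, region in regions.items()]}
    if params is not None:
        # Optional: omitted when the client device 108 stores its own
        # copy of the compression parameters 212.
        descriptor["params"] = params
    return json.dumps(descriptor).encode("utf-8")
```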
The client device 108 is configured, at block 335, to receive the frame(s) from the host device 104 and to decompress the frames according to the descriptor mentioned above. In particular, in the present example the frames are received via the communications interface 188, following which the region extractor 258 is configured to identify and extract the compression descriptor from the frames and transmit the compressed frames and the region definitions and compression levels to the codec 256. The codec 256 is configured to decompress the frames according to the region definitions and the compression levels, and pass the decompressed frames to the display driver 262 for presentation on the display 187. As will now be apparent, the method 300 may be repeated for a subsequent frame or group of frames.
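Finally, a sketch of the client-side handling at block 335, assuming the descriptor encoding sketched above and a hypothetical table of per-level decoder callables:

```python
import json

def decompress_frame(compressed_frame, descriptor_bytes, decoders):
    """Extract the descriptor, then decode each region of the frame
    with the codec matching that region's compression level."""
    descriptor = json.loads(descriptor_bytes)
    decoded_regions = {}
    for region in descriptor["regions"]:
        decode = decoders[region["level"]]  # e.g. identity, DSC or H.264 decoder
        decoded_regions[region["name"]] = decode(compressed_frame, region)
    return decoded_regions
```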
The scope of the claims should not be limited by the embodiments set forth in the above examples, but should be given the broadest interpretation consistent with the description as a whole.
This application claims priority to U.S. provisional patent application Nos. 62/453,614 filed Feb. 2, 2017, and 62/529,839 filed Jul. 7, 2017. The contents of the above-mentioned applications are incorporated herein by reference.