Screen content, or data describing information displayed to a user by a computing system on a display, generally includes a number of different types of content. These can include, for example, text content, video content, static images (e.g., displays of windows or other GUI elements), and slides or other presentation materials. Increasingly, screen content is delivered remotely, for example so that two or more remote computing systems can share a common display, allowing two remotely-located individuals to view the same screen simultaneously, or otherwise in a teleconference such that a screen is shared among multiple individuals.
Generally, screen content captured in a studio, from a camera, or based on other image- or text-based screen images (e.g., a display window) is captured in RGB format. The specific RGB format may vary in terms of the number of bits used for each color channel, such as R8G8B8 (8 bits for each of the R, G, and B color channels), or R16G16B16 (16 bits for each channel). If this data is then converted to a corresponding luminance-chrominance arrangement (i.e., a YUV format), a corresponding format would be YUV444, which requires three bytes of information per pixel at 8 bits per channel, or six bytes per pixel at 16 bits per channel.
Because screen content is delivered remotely, and due to increasing screen resolutions, it is desirable to compress this content to a size below its native bitmap size, to conserve bandwidth and improve transmission efficiency. Accordingly, many devices that are configured to send and/or receive video or screen data accept only formats of lower objective quality that nevertheless show little visible loss in subjective quality. Examples include YUV422 data, which requires four bytes to describe two pixels; YUV411 data, which requires six bytes to describe four pixels; and YUV420 data, which is commonly used because it can provide visually lossless quality for video content, and which also requires six bytes to describe four pixels, but with the data reordered to group the Y, U, and V values. Such devices are only capable of managing such smaller data sizes for processing and compression (e.g., using a Moving Picture Experts Group (MPEG) codec, such as an H.264- or HEVC-based codec). However, conversion from YUV444 to one of these other formats is typically performed by dropping some of the data describing each pixel (i.e., downsampling), which results in a loss of some color information.
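The per-pixel storage figures quoted above can be checked with a short calculation. The following is a minimal sketch, assuming 8 bits per sample unless stated otherwise; the table name `CHROMA_SUBSAMPLING` and the function `bytes_per_pixel` are illustrative names and do not appear in the disclosure.

```python
from fractions import Fraction

# (horizontal, vertical) chroma subsampling factors for each format.
# One U and one V sample are shared by h * v luma samples.
CHROMA_SUBSAMPLING = {
    "YUV444": (1, 1),
    "YUV422": (2, 1),
    "YUV411": (4, 1),
    "YUV420": (2, 2),
}

def bytes_per_pixel(fmt, bits_per_sample=8):
    """Average storage per pixel: one luma sample plus a share of two chroma samples."""
    h, v = CHROMA_SUBSAMPLING[fmt]
    samples_per_pixel = 1 + Fraction(2, h * v)
    return samples_per_pixel * bits_per_sample / 8

for fmt in CHROMA_SUBSAMPLING:
    print(fmt, bytes_per_pixel(fmt))
```

At 8 bits per sample this gives 2 bytes per pixel for YUV422 (four bytes per two pixels) and 3/2 bytes per pixel for both YUV411 and YUV420 (six bytes per four pixels), against 3 bytes per pixel for YUV444.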
Increasingly, because screen content is transmitted for viewing on a remote system, that content must be converted from YUV444 to YUV420 or some other lower-quality format typically used only for video transmission, with an attendant loss of data. Because screen content includes not just video content but static image and text content as well, use of the lower-quality format is undesirable. Moreover, when a device receives such encoded screen content and decodes/decompresses that content, it does so in the lower-quality format in which the screen content was received (e.g., YUV420, etc.). In some cases, the decoded content is then up-converted to YUV444. This results in a number of issues. For example, such up-conversion is typically performed using a nearest pixel method or a bilinear method. Use of these techniques leads to creation of artifacts in the screen content after it is up-converted to the YUV444 format. For example, text may disappear if it is of a particular color, or other artifacts may appear (e.g., lines or shadows in the resulting image). Such artifacts lead to lack of clarity in the screen content, even when up-converted.
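The disappearing-text artifact can be reproduced on a single row of chrominance samples. The sketch below is illustrative only: it assumes plain decimation (keeping every second sample) for down-conversion and nearest-point repetition for up-conversion, and the sample values 128 and 200 are arbitrary.

```python
def drop_downsample(row):
    """Keep every second chroma sample (plain decimation)."""
    return row[::2]

def nearest_upsample(row):
    """Repeat each chroma sample twice (nearest-point up-conversion)."""
    return [v for v in row for _ in range(2)]

background, stroke = 128, 200
u_row = [background] * 8
u_row[3] = stroke  # one-pixel-wide colored stroke at an odd column

restored = nearest_upsample(drop_downsample(u_row))
print(u_row)     # [128, 128, 128, 200, 128, 128, 128, 128]
print(restored)  # [128, 128, 128, 128, 128, 128, 128, 128]
```

Because the stroke falls on a discarded column, its color information never reaches the receiver, and no choice of up-conversion filter can recover it.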
In summary, the present disclosure relates to chrominance down-conversion techniques useable to minimize visual artifacts that would otherwise occur when converting image data in a high quality format (e.g., YUV444) to a lower quality format for encoding and transmission to a remote system. In some aspects, the present disclosure applies up-conversion techniques that complement those down-conversion techniques to ensure that high-quality image data is reconstituted at the remote system.
In a first aspect, a method of processing chrominance of screen content is disclosed. The method includes down-converting chrominance of screen content at a computing device from a first format to a second format, the second format compatible with a video codec. The method also includes compressing the down-converted screen content in the second format using the video codec to generate compressed down-converted screen content. The method further includes transmitting the compressed down-converted screen content to a second computing device.
In a second aspect, a screen content conversion system includes a down-conversion component operable on a computing device to receive screen content and down-convert chrominance of the screen content from a first format to a second format, the second format being compatible with a video codec, wherein the down-conversion component applies at least one of a nine-tap filter, a bilateral filter, or a discrete cosine transform to the screen content to generate down-converted screen content. The screen content conversion system further includes a compression component operable on the computing device to receive the down-converted screen content and generate compressed down-converted screen content by applying the video codec. The screen content conversion system also includes a transmission component operable to transmit the compressed down-converted screen content to a remote computing device.
In a third aspect, a computer-readable medium is disclosed that includes computer-executable instructions which, when executed, cause a computing system to perform a method of processing chrominance of screen content. The method includes down-converting chrominance of screen content at a computing device from a first format to a second format, the second format compatible with a video codec, wherein the down-conversion includes applying at least one of a nine-tap filter, a bilateral filter, or a discrete cosine transform to the screen content to generate down-converted screen content. The method further includes compressing the down-converted screen content in the second format using the video codec to generate compressed down-converted screen content. The method also includes transmitting the compressed down-converted screen content to a second computing device.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
As briefly described above, embodiments of the present invention are directed to chrominance up-conversion and down-conversion processing. In particular, the present disclosure relates generally to methods and systems for processing screen content, such as screen frames, which include a plurality of different types of screen content. Such screen content can include text, video, image, special effects, or other types of content. The chrominance up-conversion and down-conversion processes described herein maintain high quality color and image fidelity, while converting such screen content to a compressed format for transmission using video-encoding and transmission protocols.
To address some limitations in remote screen display systems, the Remote Desktop Protocol (RDP) was developed by MICROSOFT® Corporation of Redmond, Wash. In this protocol, a screen frame is analyzed, with different contents classified differently. When RDP is used, a mixed collection of codecs can be applied, based on the type of screen content that is to be compressed and transmitted to a remote system for subsequent reconstruction and display.
In recent iterations of RDP solutions, video codecs, such as MPEG-based codecs (e.g., HEVC or H.264/MPEG-4 AVC), have been used for compressing and distributing screen content. Such video codecs typically require use of compact representations of images using a small number of bits to describe pixel data (or at least a smaller number than are used for full-fidelity image data). As noted above, simple down-sampling of pixel data can lose critical information, causing visual artifacts in the resulting encoded, transmitted, and decoded screen content.
In some embodiments, and in contrast to existing RDP solutions, the chrominance down-conversion and up-conversion processes discussed herein format screen content for use in connection with a video codec, which typically receives a lower-quality format, such as the YUV420 format. By applying one or more of the techniques discussed herein, such as use of a nine tap filter, use of hue, saturation, and lightness (HSL) formatting and a bilateral filter, or use of a discrete cosine transform (and attendant inverse discrete cosine transform for up-conversion), visual artifacts are avoided that would otherwise occur by simply downsampling the chrominance components of higher-quality formats, such as YUV444.
Generally, the memory 106 includes a remote desktop protocol software 108 and an encoder 110. The remote desktop protocol software 108 generally is configured to replicate screen content presented on a local display 112 of the computing device 102 on a remote computing device, illustrated as remote device 120. In some embodiments, the remote desktop protocol software 108 generates content compatible with a Remote Desktop Protocol (RDP) defined by MICROSOFT® Corporation of Redmond, Wash.
As is discussed in further detail below, the encoder 110 can be configured to apply a universal content codec to content of a number of content types (e.g., text, video, images) such that the content is compressed for transmission to the remote device 120. In example embodiments, the encoder 110 can generate a bitstream that is compliant with a standards-based codec, such as an MPEG-based codec. In particular examples, the encoder 110 can be compliant with one or more codecs such as an MPEG-4 AVC/H.264 or HEVC/H.265 codec. Other types of standards-based encoding schemes or codecs could be used as well.
As illustrated in
In the context of the present disclosure, in some embodiments, a remote device 120 includes a main programmable circuit 124, such as a CPU, and a special-purpose programmable circuit 125. In example embodiments, the special-purpose programmable circuit 125 is a standards-based decoder, such as an MPEG decoder designed to encode or decode content having a particular standard (e.g., MPEG-4 AVC/H.264, or HEVC/H.265). In particular embodiments, the remote device 120 corresponds to a client device either local to or remote from the computing device 102, and which acts as a client device useable to receive screen content. Accordingly, from the perspective of the remote device 120, the computing device 102 corresponds to a remote source of graphical (e.g., display) content.
In addition, the remote device 120 includes a memory 126 and a display 128. The memory 126 includes a remote desktop client 130 and display buffer 132. The remote desktop client 130 can be, for example, a software component configured to receive and decode screen content received from the computing device 102. In some embodiments, the remote desktop client 130 is configured to receive and process screen content for presenting a remote screen on the display 128. The screen content may be, in some embodiments, transmitted according to the Remote Desktop Protocol defined by MICROSOFT® Corporation of Redmond, Wash. The display buffer 132 stores in memory a current copy of screen content to be displayed on the display 128, for example as a bitmap in which regions can be selected and replaced when updates are available.
Referring to
In the embodiment shown, the method 200 includes a frame receipt operation 202, which corresponds to receipt of screen content representing one or more frames at a conversion component. For example, the frame receipt operation 202 can correspond to receiving screen content at a conversion component of a computing system from another software subsystem of that same computing system; in alternative embodiments, the frame receipt operation 202 can correspond to receiving screen content from a separate computing system or device as compared to the system performing one or more of the operations of method 200.
A chrominance down-conversion operation 204 performs a down-conversion process on each of the pixels of screen content, thereby converting the screen content from a first format to a second format that is acceptable to video encoding codecs. In example embodiments, the first format can be a YUV444 format and the second format can be a YUV420 or YUV422 format. In alternative embodiments, other color representations (e.g., coordinate systems) and/or formats could be used as well. Furthermore, a variety of down-conversion methodologies could be used. As explained in further detail below in connection with
A compression operation 206 performs a compression operation on the down-converted screen content from the chrominance down-conversion operation 204. The compression operation can be, for example, application of a video codec to the down-converted screen content, which is in a format that is accepted by that video codec. In example embodiments, the compression operation 206 can correspond to applying an MPEG-4 AVC/H.264, HEVC/H.265, or other MPEG-based encoding scheme.
In the embodiment shown, a channel transmission operation 208 corresponds to transmitting the now encoded, or compressed, down-converted screen content from the computing system on which it is down-converted and/or compressed to a second computing system. This can correspond, for example, to transmission of encoded screen content from a computing device 102 to a remote device 120 as discussed above in connection with
In the embodiment shown, a decompression operation 210 decompresses the transmitted, compressed (and down-converted) screen content, using a complementary codec application as in compression operation 206. As noted above, the decompression operation 210 can utilize an MPEG-4 AVC/H.264, HEVC/H.265, or other MPEG-based encoding/decoding scheme. The decompression operation 210 reconstructs the down-converted screen content, e.g., the screen content in YUV420 format, YUV422, or other codec-compatible color representation.
Following the decompression operation 210, an up-conversion operation 212 generates an up-converted version of the screen content based on the down-converted screen content. The up-conversion process can be performed in many ways. For example, the up-conversion operation 212 can include applying a bilinear filter or bi-cubic filter during up-conversion from YUV420 or YUV422 to YUV444. Such an arrangement may be advantageous, for example when used in connection with the nine tap down-conversion process discussed above. In an alternative embodiment, the up-conversion operation 212 can simply be performed using the nearest point without use of any filter. In still further embodiments, the up-conversion operation 212 can include performance of an inverse discrete cosine transform; an example of such an inverse discrete cosine transform is illustrated in
A storage and display operation 214 receives up-converted screen content and can perform one or more operations using that up-converted screen content. In example embodiments, the storage and display operation 214 can display on a screen of a remote system, such as remote device 120, the screen content that has been up-converted to the original, first format (e.g., YUV444). In alternative embodiments, the received content can be stored at the remote system, such as remote device 120 in memory for later display or transmission.
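The ordering of operations 202 through 214 can be summarized in a short sketch. Everything here is a stand-in chosen for brevity: a 2x2 box average plays the role of chrominance down-conversion, `zlib` (a lossless general-purpose compressor) plays the role of the video codec, and nearest-point repetition plays the role of up-conversion. The function names are hypothetical, and the sketch assumes a single chroma plane with even dimensions, a width below 256, and 8-bit values.

```python
import zlib

def down_convert(plane):
    """Operation 204 (stand-in): average each 2x2 block into one chroma sample."""
    h, w = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def compress(plane):
    """Operation 206 (stand-in): zlib instead of an MPEG-based codec."""
    width = len(plane[0])
    flat = bytes(v for row in plane for v in row)
    return bytes([width]) + zlib.compress(flat)

def decompress(payload):
    """Operation 210 (stand-in): inverse of compress()."""
    width = payload[0]
    flat = zlib.decompress(payload[1:])
    return [list(flat[i:i + width]) for i in range(0, len(flat), width)]

def up_convert(plane):
    """Operation 212 (stand-in): nearest-point up-conversion by a factor of two."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]
        out += [wide, list(wide)]
    return out

frame_u = [[90, 90, 200, 200],
           [90, 90, 200, 200]]
sent = compress(down_convert(frame_u))   # 204, 206; `sent` is what 208 would transmit
shown = up_convert(decompress(sent))     # 210, 212; `shown` is what 214 would display
print(shown == frame_u)
```

The round trip is exact here only because the test frame is constant within each 2x2 block and the stand-in codec is lossless; a real video codec would typically add its own quantization loss on top of the chrominance conversion.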
Referring now to
The down-converted screen content is then passed to an encoding module 304, which encodes the down-converted screen content using a video codec. As noted above, any of a variety of different codecs could be used, such as an MPEG-based codec (e.g., AVC/H.264, HEVC/H.265). The encoded content is then passed to a transmission channel 306, which corresponds to transmission of the encoded screen content (e.g., from a computing device 102 to a remote device 120). At a receiving computing system (such as remote device 120), a video decompression operation 308 decompresses the received, encoded content, thereby reconstructing uncompressed screen content in the second format that is compatible with the codec selected for video compression/decompression (e.g., YUV420 or YUV422). This down-converted screen content is transferred to an up-conversion module 310 that performs an up-conversion of that screen content. In particular, a bilinear filter can be applied to generate the screen content (e.g., a frame) in an up-converted format (e.g., YUV444, or equivalent RGB format). A display or storage module 312 provides the screen content to either memory for storage or to a screen for display.
Referring to
In the embodiment shown, the up-conversion module 324 performs a general up-conversion process without the need for application of one or more filters, although use of such filters may be possible. For example, a filter or nearest point copy could be applied to generate up-converted screen content, e.g., a YUV444 frame.
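The two up-conversion options mentioned so far, nearest point copy and a bilinear filter, can be compared on a single chroma plane. This is a minimal sketch under stated assumptions: the up-conversion factor is two in each direction, chroma samples are treated as centered on their 2x2 block, and edges are handled by clamping. The function names are illustrative.

```python
def upsample_nearest(plane):
    """Copy each chroma sample into the 2x2 block of pixels it covers."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def upsample_bilinear(plane):
    """Interpolate linearly between the four nearest chroma samples."""
    h, w = len(plane), len(plane[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(2 * h):
        # Source coordinate of this output row, clamped to the plane.
        sy = min(max((y + 0.5) / 2 - 0.5, 0.0), h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        for x in range(2 * w):
            sx = min(max((x + 0.5) / 2 - 0.5, 0.0), w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = plane[y0][x0] * (1 - fx) + plane[y0][x1] * fx
            bottom = plane[y1][x0] * (1 - fx) + plane[y1][x1] * fx
            out[y][x] = top * (1 - fy) + bottom * fy
    return out

print(upsample_nearest([[100, 200]]))
print(upsample_bilinear([[100, 200]]))
```

Nearest point copy keeps the edge between the two samples sharp but blocky, while the bilinear filter produces a ramp across it, which is the trade-off between the two choices for text-heavy content.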
Referring to
In contrast to the methods 300, 320, method 340 requires complementary operations at opposed ends (e.g., at the computing device 102 and the remote device 120) of a channel over which the down-converted and compressed data is transmitted. This is because, in general use of the DCT-based down conversion, an inverse DCT process is required to reconstruct the data from the DCT process, since the DCT process concentrates high-energy signals in a single quadrant of the DCT matrix. Details of this process are provided below in connection with
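One way to picture the complementary DCT and inverse DCT steps is the sketch below. It is an assumption-laden illustration, not the disclosed procedure: it uses an orthonormal DCT-II on a square chroma block, keeps only the low-frequency quadrant for down-conversion, and zero-pads that quadrant for up-conversion. The function names and the scaling convention are choices made for this example.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis (row k is the k-th frequency)."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct2(block):
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def dct_down(block):
    """Keep the low-frequency quadrant and invert it at half size."""
    n = len(block)
    m = n // 2
    coeffs = dct2(block)
    low = [[coeffs[y][x] * m / n for x in range(m)] for y in range(m)]
    return idct2(low)

def dct_up(small):
    """Zero-pad the coefficients to double size and invert."""
    m = len(small)
    n = 2 * m
    coeffs = dct2(small)
    padded = [[coeffs[y][x] * n / m if y < m and x < m else 0.0
               for x in range(n)] for y in range(n)]
    return idct2(padded)

small = [[10.0, 50.0], [30.0, 90.0]]
round_trip = dct_down(dct_up(small))
```

In this sketch `dct_down(dct_up(small))` returns `small` up to floating-point error, which illustrates why the receiving side needs the matching inverse step: the two operations are designed as a pair around the retained quadrant.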
Referring now to
In this arrangement, it is noted that the UVdown point is based on itself and the eight surrounding pixels 404a-h.
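A nine tap down-conversion of this kind can be sketched as a weighted sum over a 3x3 neighborhood. The weights below are hypothetical, since the actual coefficients are not given in this text; the sketch also assumes that down-sampling positions fall on even rows and columns and that neighbors outside the frame are clamped to the nearest edge pixel.

```python
# Hypothetical weights: the actual filter coefficients are not given in this text.
WEIGHTS = [[1, 2, 1],
           [2, 4, 2],
           [1, 2, 1]]

def nine_tap_down(plane, weights=WEIGHTS):
    """Down-convert one chroma plane by two in each direction.

    Each output sample is a weighted sum of the pixel at the down-sampling
    position and its eight surrounding pixels.
    """
    h, w = len(plane), len(plane[0])
    total = sum(sum(row) for row in weights)
    out = []
    for y in range(0, h, 2):
        out_row = []
        for x in range(0, w, 2):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp at frame edges
                    xx = min(max(x + dx, 0), w - 1)
                    acc += weights[dy + 1][dx + 1] * plane[yy][xx]
            out_row.append(acc / total)
        out.append(out_row)
    return out

# A one-pixel-wide colored stroke in the last (odd) column.
plane = [[128, 128, 128, 200] for _ in range(4)]
print(nine_tap_down(plane))
```

With these weights the stroke in the odd column still contributes to the neighboring down-sampled value (146.0 rather than 128.0), whereas plain decimation of the even columns would discard it entirely.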
In the example implementation of a nine tap filter 500 shown in
Referring to
The RGB-formatted screen content is passed to a hue, saturation, and lightness (HSL) conversion module 604. In example embodiments, saturation (s) can be calculated for each of the RGB channels using the following equations:
In the above equation, lightness (l) can be determined based on the following equation, which represents an average of the RGB values.
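The lightness equation itself is not reproduced in this text. Taking the description as an average of the RGB values at face value, a form consistent with it would be the following, where R, G, and B denote the channel values of the pixel (an inference from the sentence above, not a quotation of the original equation):

$$l = \frac{R + G + B}{3}$$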
Hue (Hp) can be calculated from the lightness and saturation at a particular point, using the following formula:
$$H_p = l_p \cdot \alpha + s_p \cdot \beta$$
The HSL values, represented in terms of a pixel distance from a nearest neighboring pixel, are then passed to a bilateral filter 606, which receives those values alongside a range distance 608, and provides a downsampled chrominance result 610. In the embodiment shown, the bilateral filter 606 determines the UVdown component corresponding to the current down-sampling position using the following responsive filtering equation:
In the above bilateral filter definition, the N term is defined as follows:
In that equation, Iq is the pixel value of the q position, p is the current position, and G is the Gaussian kernel, box kernel or other kernel used in the filter.
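The filtering equations are not reproduced in this text, so the following sketch falls back on the standard bilateral filter form: each output value is a normalized sum of neighboring pixel values I_q, weighted by a spatial kernel G on the distance between p and q and by a range kernel G on the difference between the hue measures at p and q, with N as the sum of those weights. Gaussian kernels are assumed for both. The parameter values (`alpha`, `beta`, `sigma_space`, `sigma_range`, `radius`) are placeholders, and treating the range distance 608 as `sigma_range` is an interpretation.

```python
import math

def gaussian(d, sigma):
    """Gaussian kernel G evaluated at distance d."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

def hue_measure(l, s, alpha=0.5, beta=0.5):
    """H_p = l_p * alpha + s_p * beta; alpha and beta are placeholder values."""
    return l * alpha + s * beta

def bilateral_down(chroma, guide, sigma_space=1.0, sigma_range=10.0, radius=1):
    """Down-convert one chroma plane by two, guided by per-pixel hue measures."""
    h, w = len(chroma), len(chroma[0])
    out = []
    for py in range(0, h, 2):
        out_row = []
        for px in range(0, w, 2):
            acc = 0.0
            norm = 0.0  # the N term: sum of all weights
            for qy in range(max(py - radius, 0), min(py + radius, h - 1) + 1):
                for qx in range(max(px - radius, 0), min(px + radius, w - 1) + 1):
                    weight = (gaussian(math.hypot(qx - px, qy - py), sigma_space)
                              * gaussian(guide[qy][qx] - guide[py][px], sigma_range))
                    acc += weight * chroma[qy][qx]
                    norm += weight
            out_row.append(acc / norm)
        out.append(out_row)
    return out

# Two flat regions with very different hue measures on either side of an edge.
chroma = [[100, 100, 200, 200] for _ in range(2)]
guide = [[hue_measure(0, 0), hue_measure(0, 0),
          hue_measure(1000, 1000), hue_measure(1000, 1000)] for _ in range(2)]
down = bilateral_down(chroma, guide)
```

Because neighbors whose hue measure differs strongly from that of the current position receive almost no weight, the value sampled just to the right of the edge stays at about 200 instead of being pulled toward 100, which is the edge-preserving behavior that motivates a bilateral filter for text and other sharp screen content.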
Referring to
Referring now to
In the embodiment shown in
In
It is noted that, referring to
In general, and referring to
When comparing the various methods discussed above, it is noted that each of the methods may be used individually or in conjunction based on the desired computing complexity and computing resources available at a transmitting or receiving computing system. For example, the method 300 of
The method 340 of
As stated above, a number of program modules and data files may be stored in the system memory 904. While executing on the processing unit 902, the program modules 906 (e.g., remote desktop protocol software 108 and encoder 110) may perform processes including, but not limited to, the operations of a universal codec encoder or decoder, as described herein. Other program modules that may be used in accordance with embodiments of the present invention, and in particular to generate screen content, may include electronic mail and contacts applications, word processing applications, spreadsheet applications, database applications, slide presentation applications, drawing or computer-aided application programs, etc.
Furthermore, embodiments of the invention may be practiced in an electrical circuit comprising discrete electronic elements, packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors. For example, embodiments of the invention may be practiced via a system-on-a-chip (SOC) where each or many of the components illustrated in
The computing device 900 may also have one or more input device(s) 912 such as a keyboard, a mouse, a pen, a sound or voice input device, a touch or swipe input device, etc. Output device(s) 914 such as a display, speakers, a printer, etc. may also be included. The aforementioned devices are examples and others may be used. The computing device 900 may include one or more communication connections 916 allowing communications with other computing devices 918. Examples of suitable communication connections 916 include, but are not limited to, RF transmitter, receiver, and/or transceiver circuitry; universal serial bus (USB), parallel, and/or serial ports.
The term computer readable media as used herein may include computer storage media. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, or program modules. The system memory 904, the removable storage device 909, and the non-removable storage device 910 are all computer storage media examples (i.e., memory storage). Computer storage media may include RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other article of manufacture which can be used to store information and which can be accessed by the computing device 900. Any such computer storage media may be part of the computing device 900. Computer storage media does not include a carrier wave or other propagated or modulated data signal.
Communication media may be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” may describe a signal that has one or more characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.
One or more application programs 1066 may be loaded into the memory 1062 and run on or in association with the operating system 1064. Examples of the application programs include phone dialer programs, e-mail programs, personal information management (PIM) programs, word processing programs, spreadsheet programs, Internet browser programs, messaging programs, and so forth. The system 1002 also includes a non-volatile storage area 1068 within the memory 1062. The non-volatile storage area 1068 may be used to store persistent information that should not be lost if the system 1002 is powered down. The application programs 1066 may use and store information in the non-volatile storage area 1068, such as e-mail or other messages used by an e-mail application, and the like. A synchronization application (not shown) also resides on the system 1002 and is programmed to interact with a corresponding synchronization application resident on a host computer to keep the information stored in the non-volatile storage area 1068 synchronized with corresponding information stored at the host computer. As should be appreciated, other applications may be loaded into the memory 1062 and run on the mobile computing device 1000, including the remote desktop protocol software 108 (and/or optionally encoder 110, or remote device 120) described herein, as well as associated chrominance down-conversion processes as described above. In some analogous systems, an inverse process can be performed via system 1002, in which the system acts as a remote device 120 for decoding a bitstream generated using a video codec and up-converting chrominance of decompressed screen content.
The system 1002 has a power supply 1070, which may be implemented as one or more batteries. The power supply 1070 might further include an external power source, such as an AC adapter or a powered docking cradle that supplements or recharges the batteries.
The system 1002 may also include a radio 1072 that performs the function of transmitting and receiving radio frequency communications. The radio 1072 facilitates wireless connectivity between the system 1002 and the “outside world,” via a communications carrier or service provider. Transmissions to and from the radio 1072 are conducted under control of the operating system 1064. In other words, communications received by the radio 1072 may be disseminated to the application programs 1066 via the operating system 1064, and vice versa.
The visual indicator 1020 may be used to provide visual notifications, and/or an audio interface 1074 may be used for producing audible notifications via the audio transducer 1025. In the illustrated embodiment, the visual indicator 1020 is a light emitting diode (LED) and the audio transducer 1025 is a speaker. These devices may be directly coupled to the power supply 1070 so that when activated, they remain on for a duration dictated by the notification mechanism even though the processor 1060 and other components might shut down for conserving battery power. The LED may be programmed to remain on indefinitely until the user takes action to indicate the powered-on status of the device. The audio interface 1074 is used to provide audible signals to and receive audible signals from the user. For example, in addition to being coupled to the audio transducer 1025, the audio interface 1074 may also be coupled to a microphone to receive audible input, such as to facilitate a telephone conversation. In accordance with embodiments of the present invention, the microphone may also serve as an audio sensor to facilitate control of notifications, as will be described below. The system 1002 may further include a video interface 1076 that enables an operation of an on-board camera 1030 to record still images, video stream, and the like.
A mobile computing device 1000 implementing the system 1002 may have additional features or functionality. For example, the mobile computing device 1000 may also include additional data storage devices (removable and/or non-removable) such as, magnetic disks, optical disks, or tape. Such additional storage is illustrated in
Data/information generated or captured by the mobile computing device 1000 and stored via the system 1002 may be stored locally on the mobile computing device 1000, as described above, or the data may be stored on any number of storage media that may be accessed by the device via the radio 1072 or via a wired connection between the mobile computing device 1000 and a separate computing device associated with the mobile computing device 1000, for example, a server computer in a distributed computing network, such as the Internet. As should be appreciated such data/information may be accessed via the mobile computing device 1000 via the radio 1072 or via a distributed computing network. Similarly, such data/information may be readily transferred between computing devices for storage and use according to well-known data/information transfer and storage means, including electronic mail and collaborative data/information sharing systems.
Embodiments of the present invention, for example, are described above with reference to block diagrams and/or operational illustrations of methods, systems, and computer program products according to embodiments of the invention. The functions/acts noted in the blocks may occur out of the order as shown in any flowchart. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The description and illustration of one or more embodiments provided in this application are not intended to limit or restrict the scope of the invention as claimed in any way. The embodiments, examples, and details provided in this application are considered sufficient to convey possession and enable others to make and use the best mode of claimed invention. The claimed invention should not be construed as being limited to any embodiment, example, or detail provided in this application. Regardless of whether shown and described in combination or separately, the various features (both structural and methodological) are intended to be selectively included or omitted to produce an embodiment with a particular set of features. Having been provided with the description and illustration of the present application, one skilled in the art may envision variations, modifications, and alternate embodiments falling within the spirit of the broader aspects of the general inventive concept embodied in this application that do not depart from the broader scope of the claimed invention.