Screen content, or data describing information displayed to a user by a computing system on a display, generally includes a number of different types of content. These can include, for example, text content, video content, static images (e.g., displays of windows or other GUI elements), and slides or other presentation materials. Increasingly, screen content is delivered remotely, for example so that two or more remote computing systems can share a common display, allowing remotely-located individuals to view the same screen simultaneously, or so that a screen can be shared among multiple individuals in a teleconference. Because screen content is delivered remotely, and because screen resolutions continue to increase, it is desirable to compress this content to a size below its native bitmap size, to conserve bandwidth and improve transmission efficiency.
Although a number of compression solutions exist for graphical data such as screen content, these compression solutions are inadequate for use with variable screen content. For example, traditional Moving Picture Experts Group (MPEG) codecs provide satisfactory compression for video content, since the compression solutions rely on differences between sequential frames. Furthermore, many devices have integrated MPEG decoders that can efficiently decode such encoded data. However, MPEG encoding does not provide substantial data compression for non-video content that may nevertheless change over time, and therefore is not typically used for screen content, in particular for remote screen display.
To address the above issues, a mix of codecs might be used for remote delivery of graphical data. For example, text data may be encoded using a lossless codec, while screen background data or video data may be encoded using a lossy codec that compresses the data (e.g., MPEG-4 AVC/H.264). Additionally, in some cases, the lossy compression may be performed on a progressive basis. However, this use of mixed codecs raises issues. First, because more than one codec is used to encode graphical data, multiple different codecs must also be supported at a remote computing system that receives the graphical data. In particular, when the remote computing system is a thin client device, it is unlikely that all such codecs are supported by native hardware. Accordingly, software decoding is performed on a general purpose processor, which is computing resource intensive and consumes substantial power. Additionally, because different codecs having different processing techniques and loss levels are used in different regions of a screen image, graphical remnants or artifacts can appear in low bandwidth circumstances.
In summary, the present disclosure relates to a universal codec used for screen content. In particular, the present disclosure relates generally to methods and systems for processing screen content, such as screen frames, which include a plurality of different types of screen content. Such screen content can include text, video, image, special effects, or other types of content. The universal codec can be compliant with a standards-based codec, thereby allowing a computing system receiving encoded screen content to decode that content using a special-purpose processing unit commonly incorporated into such computing systems, and avoiding power-consumptive software decoding processes.
In a first aspect, a method includes receiving screen content comprising a plurality of screen frames, wherein at least one of the screen frames includes a plurality of types of screen content. The method also includes encoding the at least one of the screen frames, including the plurality of types of screen content, using a single codec, to generate an encoded bitstream compliant with a standards-based codec.
In a second aspect, a system includes a computing system which has a programmable circuit and a memory containing computer-executable instructions. When executed, the computer-executable instructions cause the computing system to provide to an encoder a plurality of screen frames, wherein at least one of the screen frames includes a plurality of types of screen content. They also cause the computing system to encode the at least one of the screen frames, including the plurality of types of screen content, using a single codec, to generate an encoded bitstream compliant with a standards-based codec.
In a third aspect, a computer-readable storage medium comprising computer-executable instructions stored thereon is disclosed. When executed by a computing system, the computer-executable instructions cause the computing system to perform a method that includes receiving screen content comprising a plurality of screen frames, wherein at least one of the screen frames includes text content, video content, and image content. The method also includes encoding the at least one of the screen frames, including the text content, video content, and image content, using a single codec, to generate an encoded bitstream compliant with a standards-based codec.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
As briefly described above, embodiments of the present invention are directed to a universal codec used for screen content. In particular, the present disclosure relates generally to methods and systems for processing screen content, such as screen frames, which include a plurality of different types of screen content. Such screen content can include text, video, image, special effects, or other types of content. The universal codec can be compliant with a standards-based codec, thereby allowing a computing system receiving encoded screen content to decode that content using a special-purpose processing unit commonly incorporated into such computing systems, and avoiding power-consumptive software decoding processes.
To address some limitations in remote screen display systems, the Remote Desktop Protocol (RDP) was developed by MICROSOFT® Corporation of Redmond, Wash. In this protocol, a screen frame is analyzed, with different contents classified differently. When RDP is used, a mixed collection of codecs can be applied, based on the type of screen content that is to be compressed and transmitted to a remote system for subsequent reconstruction and display. For example, text portions of a screen can use a lossless codec, while image and background data use a progressive codec for gradually improving screen quality. Video portions of the screen content are encoded using a standards-based video codec, such as MPEG-4 AVC/H.264; such standards-based codecs are traditionally limited to encoding video content or other single types of content. Accordingly, using the collection of multiple codecs allows RDP to treat each content type differently, maintaining quality of content not likely to change rapidly, while allowing for lower quality of more dynamic, changing content (e.g., video). However, this mixed collection of codecs results in computational complexity at both the encoder and decoder, by requiring both an encoding, transmitting computing system and a receiving, decoding computing system to be compatible with all codecs used. Furthermore, the mix of codecs often results in visual artifacts in screen content, in particular during low-bandwidth situations.
In some embodiments, and in contrast to existing RDP solutions, the universal codec of the present disclosure is constructed such that its output bitstream is compliant with a particular standards-based codec, such as an MPEG-based codec. Therefore, rather than using multiple codecs, as would often be the case where multiple content types are transmitted, a single codec can be used, with the encoding tailored to the particular type of content that is to be transmitted. This avoids possible inconsistencies in screen image quality that may occur at the boundaries between regions encoded using different codecs, and also avoids the bit rate control difficulties that arise with a mix of codecs due to the differing properties of lossless and lossy codecs. A computing system receiving the resulting bitstream can utilize a commonly-available hardware decoder to decode it, rather than decoding the bitstream in the general purpose processor of that receiving computer, consequently lowering the power consumption of the receiving computer.
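For purposes of illustration only, the following sketch shows one way such content-tailored encoding within a single codec might be approximated: each macroblock receives a quantization parameter chosen from its content type, while the output remains that of one standards-based encoder. The toy classifier, the quantization parameter values, and the block size are assumptions made for this sketch, not details of any particular implementation.

```python
import numpy as np

# Illustrative sketch: one codec, content-dependent quantization per 16x16 macroblock.
# The trivial classifier and QP values below are assumptions, not a real encoder API.
QP_BY_CONTENT = {"text": 12, "image": 26, "video": 32}

def classify_block(block: np.ndarray) -> str:
    """Toy classifier: few distinct levels suggests text; high variance suggests video."""
    if len(np.unique(block)) < 8:
        return "text"
    return "video" if block.var() > 2000 else "image"

def choose_qps(frame: np.ndarray, mb: int = 16) -> np.ndarray:
    """Pick a quantization parameter for every macroblock of a single luma frame."""
    rows, cols = frame.shape[0] // mb, frame.shape[1] // mb
    qps = np.empty((rows, cols), dtype=np.int32)
    for r in range(rows):
        for c in range(cols):
            block = frame[r * mb:(r + 1) * mb, c * mb:(c + 1) * mb]
            qps[r, c] = QP_BY_CONTENT[classify_block(block)]
    return qps  # e.g., supplied to a single standards-based encoder as per-macroblock QPs

frame = np.random.randint(0, 256, (64, 64)).astype(np.float64)
print(choose_qps(frame))
```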
In some embodiments of the present disclosure, the universal codec is implemented using a frame pre-analysis module that uses motion estimation or heuristic histogram processing to obtain properties of a particular region. A classifier can determine the type of content in each particular region of a frame, and segregate the content types into different macroblocks. Those macroblocks can be encoded using different parameters and qualities based on the type of content, and may be processed differently (e.g., using different motion estimation techniques). However, each type of content is generally encoded such that a resulting output is provided as a bitstream that is compatible with a standards-based codec. One example of such a standards-based codec is MPEG-4 AVC/H.264; however, other codecs, such as HEVC/H.265, could be used as well.
Generally, the memory 106 includes remote desktop protocol software 108 and an encoder 110. The remote desktop protocol software 108 is generally configured to replicate screen content presented on a local display 112 of the computing device 102 on a remote computing device, illustrated as remote device 120. In some embodiments, the remote desktop protocol software 108 generates content compatible with the Remote Desktop Protocol (RDP) defined by MICROSOFT® Corporation of Redmond, Wash.
As is discussed in further detail below, the encoder 110 can be configured to apply a universal content codec to content of a number of content types (e.g., text, video, images) such that the content is compressed for transmission to the remote device 120. In example embodiments, the encoder 110 can generate a bitstream that is compliant with a standards-based codec, such as an MPEG-based codec. In particular examples, the encoder 110 can be compliant with one or more codecs such as an MPEG-4 AVC/H.264 or HEVC/H.265 codec. Other types of standards-based encoding schemes or codecs could be used as well.
As illustrated in
In the context of the present disclosure, in some embodiments, a remote device 120 includes a main programmable circuit 124, such as a CPU, and a special-purpose programmable circuit 125. In example embodiments, the special-purpose programmable circuit 125 is a standards-based decoder, such as an MPEG decoder designed to encode or decode content compliant with a particular standard (e.g., MPEG-4 AVC/H.264). In particular embodiments, the remote device 120 corresponds to a client device either local to or remote from the computing device 102, and which acts as a client device useable to receive screen content. Accordingly, from the perspective of the remote device 120, the computing device 102 corresponds to a remote source of graphical (e.g., display) content.
In addition, the remote device includes a memory 126 and a display 128. The memory 126 includes a remote desktop client 130 and display buffer 132. The remote desktop client 130 can be, for example, a software component configured to receive and decode screen content received from the computing device 102. In some embodiments, the remote desktop client 130 is configured to receive and process screen content for presenting a remote screen on the display 128. The screen content may be, in some embodiments, transmitted according to the Remote Desktop Protocol defined by MICROSOFT® Corporation of Redmond, Wash. The display buffer 132 stores in memory a current copy of screen content to be displayed on the display 128, for example as a bitmap in which regions can be selected and replaced when updates are available.
Referring now to
In the embodiment shown, a classification component 212 classifies the content in each screen frame as either video content 214, screen image or background content 216, or text content 218. For example, a particular screen frame can be segmented into macroblocks, with each macroblock classified according to the content it contains. Video content 214 is passed to a video encoder 220, shown as performing an encoding according to an MPEG-based codec, such as MPEG-4 AVC/H.264. Screen image or background content 216 is passed to a progressive encoder 222, which performs an iteratively improving encoding process in which low quality image data is initially encoded and provided to a remote system, and then improved over time as bandwidth allows. Further, text content 218 is provided to a text encoder 224, which encodes the text using a clear, lossless codec. Encoded content from each of the video encoder 220, progressive encoder 222, and text encoder 224 is passed back to a multiplexor 226 in the RDP pipeline 202, which aggregates the macroblocks and outputs a corresponding bitstream to a remote system.
In contrast,
Referring now to
Once the codec post-processor 414 determines that an overall screen frame is acceptable, it indicates to the multiplexor 416 that the encoded bitstream 410 and metadata 412 are ready to be transmitted to a remote system for display, and the multiplexor 416 combines the video with any other accompanying data (e.g., audio or other data) for transmission. Alternatively, the codec post-processor 414 can opt to indicate to the multiplexor 416 to transmit the encoded bitstream 410, and can also indicate to the RDP scheduler 402 to attempt to progressively improve that image over time. This loop can generally be repeated until a predetermined quality threshold is reached, as determined by the codec post-processor 414, or until there is not sufficient bandwidth for the frame (at which time the codec post-processor 414 signals to the multiplexor 416 to communicate the screen frame, irrespective of whether the quality threshold has been reached).
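The refinement loop described above can be summarized roughly as follows. The sketch uses a uniform quantizer as a stand-in for a real lossy encoding pass, a PSNR test as the quality measure, and a fixed pass budget as a proxy for available bandwidth; all three are assumptions made purely for illustration.

```python
import numpy as np

# Minimal sketch of the progressive-refinement loop, assuming a toy "encoder"
# (uniform quantization) in place of a real lossy codec pass.
QUALITY_THRESHOLD_DB = 40.0          # assumed quality target
BANDWIDTH_BUDGET_PASSES = 4          # assumed proxy for "sufficient bandwidth"

def toy_encode_decode(frame: np.ndarray, step: int) -> np.ndarray:
    """Quantize/dequantize as a placeholder for an encode/decode round trip."""
    return np.round(frame / step) * step

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

def progressive_send(frame: np.ndarray) -> None:
    step = 32                                    # start coarse (low quality pass)
    for _ in range(BANDWIDTH_BUDGET_PASSES):     # stop refining when bandwidth runs out
        recon = toy_encode_decode(frame, step)
        quality = psnr(frame, recon)
        print(f"sent pass: step={step}, psnr={quality:.1f} dB")
        if quality >= QUALITY_THRESHOLD_DB:
            break                                # predetermined quality threshold reached
        step = max(1, step // 2)                 # schedule a higher-quality pass

progressive_send(np.random.randint(0, 256, (32, 32)).astype(np.float64))
```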
Referring now to
In the embodiment shown, a full screen frame is received at an input operation 502, and passed to a frame pre-analysis operation 504. The frame pre-analysis operation 504 computes properties of an input screen frame, such as its size, content types, and other metadata describing the screen frame. The frame pre-analysis operation 504 outputs code units of a particular block size, such as a 16×16 block size. An intra/inter macroblock processing operation 506 performs, on each macroblock, a mode decision, various types of motion prediction (discussed in further detail below), and encoding processes specific to each of the various types of content included in the screen frame. The entropy encoder 508 receives the encoded data and residual coefficients from the various content encoding processes of the intra/inter macroblock processing operation 506, and provides a final, unified encoding of the screen frame in a format generally compatible with a selected standards-based codec useable for screen or graphical content.
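A frame pre-analysis step of this general kind might, purely as an illustration, be sketched as follows; the particular per-block properties collected here (mean, variance, number of distinct levels) are assumptions, not the exact metadata produced by operation 504.

```python
import numpy as np

# Illustrative pre-analysis: split a luma frame into 16x16 code units and collect
# simple per-block properties for later classification. The metadata fields are
# assumptions made for this sketch.
def pre_analyze(frame: np.ndarray, mb: int = 16) -> dict:
    h, w = frame.shape
    blocks = []
    for r in range(0, h - h % mb, mb):
        for c in range(0, w - w % mb, mb):
            block = frame[r:r + mb, c:c + mb]
            blocks.append({
                "pos": (r, c),
                "mean": float(block.mean()),
                "variance": float(block.var()),
                "distinct_levels": int(len(np.unique(block))),
            })
    return {"size": (h, w), "macroblocks": blocks}

meta = pre_analyze(np.random.randint(0, 256, (48, 48)))
print(meta["size"], len(meta["macroblocks"]))  # (48, 48) 9
```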
Based on whether the screen frame is a new frame or a new scene, and on the motion estimation parameters produced by the simple motion estimation process 604, a frame type decision process 606 determines whether a frame corresponds to an I-Frame, a P-Frame, or a B-Frame. Generally, the I-Frame corresponds to a reference frame, and is defined as a fully-specified picture. I-Frames can be, for example, a first frame or a scene change frame. A P-Frame is used to define forward predicted pictures, while a B-Frame is used to define bidirectionally predicted pictures. P-Frames and B-Frames are expressed as motion vectors and transform coefficients.
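As a simple, non-authoritative sketch of such a frame type decision, assuming a scene-change flag and a coarse motion-activity score are already available from the earlier analysis:

```python
# Illustrative frame-type decision; the motion-activity threshold and the
# availability test for a future reference picture are assumptions for this sketch.
def decide_frame_type(is_first_frame: bool, scene_change: bool,
                      motion_activity: float, future_reference_available: bool) -> str:
    if is_first_frame or scene_change:
        return "I"   # fully specified reference picture
    if future_reference_available and motion_activity < 0.5:
        return "B"   # bidirectionally predicted picture
    return "P"       # forward predicted picture

print(decide_frame_type(False, False, motion_activity=0.8,
                        future_reference_available=True))  # -> "P"
```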
If the frame is an I-Frame, the frame is passed to a heuristic histogram process 608, which computes a histogram of the input, full screen content. Based on the computed histogram and a mean absolute difference also calculated at heuristic histogram process 608, an I-Frame analysis process 610 generates data used by a classification process 612, which can be used in the decision tree to detect whether data in a particular region (macroblock) of a frame corresponds to video, image, text, or special effects data.
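The histogram-plus-difference heuristic might, for example, look something like the following; the bin counts and thresholds here are illustrative assumptions rather than values from the encoder described above.

```python
import numpy as np

# Illustrative I-Frame heuristic: a sparse histogram with hard edges suggests text,
# a moderately sparse histogram suggests flat screen/background content, and a
# dense histogram suggests natural video/image content. Thresholds are assumptions.
def analyze_iframe_block(block: np.ndarray) -> str:
    hist, _ = np.histogram(block, bins=256, range=(0, 256))
    occupied_bins = int(np.count_nonzero(hist))
    mad = float(np.mean(np.abs(block - block.mean())))  # mean absolute difference
    if occupied_bins <= 8 and mad > 20:
        return "text"
    if occupied_bins <= 32:
        return "screen/background"
    return "video/image"

print(analyze_iframe_block(np.random.randint(0, 256, (16, 16))))  # dense -> video/image
```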
If the frame is a P-Frame, the frame is passed to a P-Frame clustering process 614, which uses the sum of absolute differences (SAD) and motion vectors to unify classification information. A P-Frame analysis process 616 then analyzes the frame to generate metadata that helps the classification process 612 determine the type of content in each macroblock of the frame. Similarly, if the frame is a B-Frame, the frame is passed to a B-Frame clustering process 618, which likewise uses the sum of absolute differences and motion vectors to unify the classification information. A B-Frame analysis process 620 then analyzes the frame to generate metadata that helps the classification process 612 determine the type of content in each macroblock of the frame. In the case of P-Frames and B-Frames, it is noted that these are unlikely to correspond to text content types, since they represent motion change frames defined as a difference from a prior frame, and are intended for encoding movement between frames (e.g., as in video or image movement).
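A corresponding P-/B-Frame clustering step might be sketched as follows, with the SAD and motion-vector thresholds chosen arbitrarily for illustration:

```python
import numpy as np

# Illustrative P-/B-Frame clustering: an essentially unchanged macroblock keeps its
# prior classification, coherent motion is treated as video-like content, and a
# block that changed in place is re-analyzed. Thresholds are assumptions.
def cluster_block(cur: np.ndarray, prev: np.ndarray,
                  mv: tuple, prev_label: str) -> str:
    sad = int(np.sum(np.abs(cur.astype(int) - prev.astype(int))))
    if sad < 64:
        return prev_label      # static block: inherit previous classification
    if abs(mv[0]) + abs(mv[1]) > 2:
        return "video"         # coherent motion: likely video or moving image
    return "reclassify"        # changed in place: run the full analysis again

cur = np.random.randint(0, 256, (16, 16))
print(cluster_block(cur, cur.copy(), mv=(0, 0), prev_label="text"))  # -> "text"
```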
The classification process 612 uses metadata generated by analysis processes 610, 616, 620, and outputs metadata and macroblock data to various content encoding processes within the intra/inter macroblock processing operation 506. The content encoding processes can be used, for example, to customize the encoding performed on various types of content, to allow the universal codec to selectively vary quality within a single frame based on the type of content present in the frame. In particular, in the embodiment shown, the classification process 612 routes video content 622 to a video macroblock encoding process 624, screen and background content 626 to a screen and background macroblock encoding process 628, special effects content 630 to a special effects macroblock encoding process 632, and text content 634 to a text macroblock encoding process 636. Generally, each of the encoding processes 624, 628, 632, 636 can use different mode decisions and motion estimation algorithms to encode each macroblock differently. Examples of such encoding processes are discussed further below in connection with
Referring now to
From either the high-complexity intra-macroblock prediction operation 706 or hybrid motion estimation operation 708, a transform and quantization operation 710 is performed, as well as an inverse quantization and transform operation 712. A further motion prediction operation 714 is then performed, with the predicted motion passed to adaptive loop filter 716. In some embodiments, the adaptive loop filter 716 is implemented as an adaptive deblocking filter, further improving a resulting encoded image. The resulting image blocks are then passed to a picture reference cache 718, which stores an aggregated screen frame. It is noted that the picture reference cache 718 is also provided for use by the hybrid motion estimation operation 708, for example to allow for inter-macroblock comparisons used in that motion estimation process.
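The transform, quantization, and reconstruction path shared by these encoders can be illustrated roughly as follows. A floating-point DCT is used here as a stand-in for the standard's integer transform, and the quantization step is arbitrary; both are assumptions made for the sketch.

```python
import numpy as np

# Illustrative transform/quantization path feeding the picture reference cache:
# the prediction residual is transformed, quantized, inverse-quantized and
# inverse-transformed, and the reconstruction is cached for later motion search.
def dct_matrix(n: int = 16) -> np.ndarray:
    k = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def encode_residual(pred: np.ndarray, actual: np.ndarray, qstep: float, cache: list):
    d = dct_matrix(pred.shape[0])
    residual = actual - pred
    coeffs = d @ residual @ d.T                  # forward transform
    levels = np.round(coeffs / qstep)            # quantization (what gets entropy-coded)
    recon_res = d.T @ (levels * qstep) @ d       # inverse quantize + inverse transform
    recon = pred + recon_res
    cache.append(recon)                          # picture reference cache for later MBs
    return levels, recon

pred = np.zeros((16, 16))
actual = np.random.randint(0, 256, (16, 16)).astype(float)
cache = []
levels, recon = encode_residual(pred, actual, qstep=8.0, cache=cache)
print(np.abs(actual - recon).max())              # reconstruction error due to quantization
```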
Referring now to
As with the video encoder, from either the high-complexity intra-macroblock prediction operation 806 or global motion estimation operation 810, a transform and quantization operation 812 is performed, as well as an inverse quantization and transform operation 814. A further motion prediction operation 816 is then performed, with the predicted motion passed to adaptive loop filter 818. In some embodiments, the adaptive loop filter 818 is implemented as an adaptive deblocking filter, further improving a resulting encoded image. The resulting image blocks are then passed to a picture reference cache 718, which stores the aggregated screen frame including macroblocks of all types. It is noted that the picture reference cache 718 is also provided for use by the simple motion estimation operation 808, for example to allow for inter-macroblock comparisons used in that motion estimation process.
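For scrolled screen or image content, a global motion estimate of the kind referenced above could be sketched as follows; the exhaustive search over a small radius is an illustrative choice, not the actual search used.

```python
import numpy as np

# Illustrative global motion estimation for scrolled screen/image content: a single
# displacement is searched for the whole region by minimizing the sum of absolute
# differences (SAD). The search radius is an assumption.
def global_motion(prev: np.ndarray, cur: np.ndarray, radius: int = 4):
    best, best_sad = (0, 0), float("inf")
    h, w = cur.shape
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            a = cur[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            b = prev[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
            sad = np.sum(np.abs(a.astype(int) - b.astype(int)))
            if sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best                       # one vector shared by all macroblocks in the region

prev = np.random.randint(0, 256, (32, 32))
cur = np.roll(prev, shift=2, axis=0)  # simulate a 2-pixel vertical scroll
print(global_motion(prev, cur))       # expect (2, 0)
```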
Referring now to
Generally, the special effects content encoder 900 separates intra-macroblock content 902 and inter-macroblock content 904 based on a mode decision received at the special effects content encoder 900, similar to the video encoder 700 and image content encoder 800 discussed above. The special effects content encoder 900 includes a high-complexity intra-macroblock prediction operation 906 analogous to those discussed above. However, in the special effects content encoder 900, rather than a hybrid motion estimation or simple motion estimation, a weighted motion estimation operation 908 is performed, followed by a motion vector smooth filter operation 910. The weighted motion estimation operation 908 utilizes luminance changes and simple motion detection to detect such special effects without requiring use of computing-intensive video encoding to detect changes between frames. The motion vector smooth filter operation is provided to improve coding performance of the motion vector, as well as to improve the visual quality of the special effects screen content. An example of a motion vector smooth filter that can be used to perform the motion vector smooth filter operation 910 is illustrated in
Similar to the video encoder 700 and image content encoder 800, from either the high-complexity intra-macroblock prediction operation 906 or motion vector smooth filter operation 910, a transform and quantization operation 912 is performed, as well as an inverse quantization and transform operation 914. A further motion prediction operation 916 is then performed, with the predicted motion passed to adaptive loop filter 918. In some embodiments, the adaptive loop filter 918 is implemented as an adaptive deblocking filter, further improving a resulting encoded image. The resulting image blocks are then passed to the picture reference cache 718. It is noted that the picture reference cache 718 is also provided for use by the weighted motion estimation operation 908, for example to allow for inter-macroblock comparisons used in that motion estimation process.
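A weighted motion estimation step of this general kind might be sketched as below: a single luminance weight and offset are fitted between the reference and current blocks so that a fade or dissolve leaves only a small residual. The least-squares fit is an illustrative choice, not the exact method of operation 908.

```python
import numpy as np

# Illustrative weighted motion estimation for fade-style special effects: the
# reference is scaled/offset to compensate for a global luminance change before
# the residual is measured, so a fade does not look like large motion or change.
def weighted_sad(cur: np.ndarray, ref: np.ndarray):
    x, y = ref.astype(float).ravel(), cur.astype(float).ravel()
    w, o = np.polyfit(x, y, 1)                # luminance weight and offset (fade model)
    sad = np.sum(np.abs(y - (w * x + o)))     # residual after weighted prediction
    return w, o, sad

ref = np.random.randint(50, 200, (16, 16))
cur = 0.7 * ref + 10                          # simulated fade-to-dark with an offset
w, o, sad = weighted_sad(cur, ref)
print(round(w, 2), round(o, 1), round(sad, 1))  # ~0.7, ~10.0, ~0.0
```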
Referring to
Similar to encoders 700-900, from either the low complexity motion prediction operation 1006 or motion vector smooth filter operation 1010, a transform and quantization operation 1012 is performed, as well as an inverse quantization and transform operation 1014. A further motion prediction operation 1016 is then performed. The resulting text blocks are then passed to the picture reference cache 718, which stores an aggregated screen frame. It is noted that the picture reference cache 718 is also provided for use by the text motion estimation operation 1008, for example to allow for inter-macroblock comparisons used in that motion estimation process.
Referring generally to
Referring to
From either the diamond motion estimation 1114, or if the fast skip decision 1104 determines that downsampling is not required (e.g., the motion estimation is already adequate following square motion estimation 1102), an end operation 1118 indicates completion of motion estimation for that macroblock.
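Purely as an illustration of this two-stage flow, the following sketch performs a small square search, applies a fast-skip test, and only then falls back to a diamond-pattern search on a downsampled copy; the search patterns and skip threshold are assumptions, not the exact patterns used by operations 1102 and 1114.

```python
import numpy as np

# Illustrative two-stage block motion search: square pattern first, fast skip if
# the match is already adequate, otherwise a diamond search at half resolution.
SQUARE = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
DIAMOND = [(0, 0), (-2, 0), (2, 0), (0, -2), (0, 2), (-1, -1), (-1, 1), (1, -1), (1, 1)]

def sad(a, b):
    return int(np.sum(np.abs(a.astype(int) - b.astype(int))))

def search(block, ref, center, pattern):
    best_mv, best = (0, 0), float("inf")
    h, w = ref.shape
    n = block.shape[0]
    for dy, dx in pattern:
        y, x = center[0] + dy, center[1] + dx
        if 0 <= y <= h - n and 0 <= x <= w - n:
            cost = sad(block, ref[y:y + n, x:x + n])
            if cost < best:
                best_mv, best = (dy, dx), cost
    return best_mv, best

def estimate(block, ref, pos, skip_threshold=256):
    mv, cost = search(block, ref, pos, SQUARE)          # stage 1: square search
    if cost <= skip_threshold:                          # fast skip: already adequate
        return mv
    small_block, small_ref = block[::2, ::2], ref[::2, ::2]   # downsample, then
    mv2, _ = search(small_block, small_ref, (pos[0] // 2, pos[1] // 2), DIAMOND)
    return (mv2[0] * 2, mv2[1] * 2)                     # scale vector back up

ref = np.random.randint(0, 256, (64, 64))
block = ref[10:26, 12:28]                               # a block taken from the reference
print(estimate(block, ref, (10, 12)))                   # expect (0, 0)
```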
As stated above, a number of program modules and data files may be stored in the system memory 1804. While executing on the processing unit 1802, the program modules 1806 (e.g., remote desktop protocol software 108 and encoder 110) may perform processes including, but not limited to, the operations of a universal codec encoder or decoder, as described herein. Other program modules that may be used in accordance with embodiments of the present invention, and in particular to generate screen content, may include electronic mail and contacts applications, word processing applications, spreadsheet applications, database applications, slide presentation applications, drawing or computer-aided application programs, etc.
Furthermore, embodiments of the invention may be practiced in an electrical circuit comprising discrete electronic elements, packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors. For example, embodiments of the invention may be practiced via a system-on-a-chip (SOC) where each or many of the components illustrated in
The computing device 1800 may also have one or more input device(s) 1812 such as a keyboard, a mouse, a pen, a sound or voice input device, a touch or swipe input device, etc. The output device(s) 1814 such as a display, speakers, a printer, etc. may also be included. The aforementioned devices are examples and others may be used. The computing device 1800 may include one or more communication connections 1816 allowing communications with other computing devices 1818. Examples of suitable communication connections 1816 include, but are not limited to, RF transmitter, receiver, and/or transceiver circuitry; universal serial bus (USB), parallel, and/or serial ports.
The term computer readable media as used herein may include computer storage media. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, or program modules. The system memory 1804, the removable storage device 1809, and the non-removable storage device 1810 are all computer storage media examples (i.e., memory storage). Computer storage media may include RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other article of manufacture which can be used to store information and which can be accessed by the computing device 1800. Any such computer storage media may be part of the computing device 1800. Computer storage media does not include a carrier wave or other propagated or modulated data signal.
Communication media may be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” may describe a signal that has one or more characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.
One or more application programs 1966 may be loaded into the memory 1962 and run on or in association with the operating system 1964. Examples of the application programs include phone dialer programs, e-mail programs, personal information management (PIM) programs, word processing programs, spreadsheet programs, Internet browser programs, messaging programs, and so forth. The system 1902 also includes a non-volatile storage area 1968 within the memory 1962. The non-volatile storage area 1968 may be used to store persistent information that should not be lost if the system 1902 is powered down. The application programs 1966 may use and store information in the non-volatile storage area 1968, such as e-mail or other messages used by an e-mail application, and the like. A synchronization application (not shown) also resides on the system 1902 and is programmed to interact with a corresponding synchronization application resident on a host computer to keep the information stored in the non-volatile storage area 1968 synchronized with corresponding information stored at the host computer. As should be appreciated, other applications may be loaded into the memory 1962 and run on the mobile computing device 1900, including the remote desktop protocol software 108 (and/or optionally encoder 110, or remote device 120) described herein. In some analogous systems, an inverse process can be performed via system 1902, in which the system acts as a remote device 120 for decoding a bitstream generated using a universal screen content codec.
The system 1902 has a power supply 1970, which may be implemented as one or more batteries. The power supply 1970 might further include an external power source, such as an AC adapter or a powered docking cradle that supplements or recharges the batteries.
The system 1902 may also include a radio 1972 that performs the function of transmitting and receiving radio frequency communications. The radio 1972 facilitates wireless connectivity between the system 1902 and the “outside world,” via a communications carrier or service provider. Transmissions to and from the radio 1972 are conducted under control of the operating system 1964. In other words, communications received by the radio 1972 may be disseminated to the application programs 1966 via the operating system 1964, and vice versa.
The visual indicator 1920 may be used to provide visual notifications, and/or an audio interface 1974 may be used for producing audible notifications via the audio transducer 1925. In the illustrated embodiment, the visual indicator 1920 is a light emitting diode (LED) and the audio transducer 1925 is a speaker. These devices may be directly coupled to the power supply 1970 so that when activated, they remain on for a duration dictated by the notification mechanism even though the processor 1960 and other components might shut down to conserve battery power. The LED may be programmed to remain on indefinitely, indicating the powered-on status of the device, until the user takes action. The audio interface 1974 is used to provide audible signals to and receive audible signals from the user. For example, in addition to being coupled to the audio transducer 1925, the audio interface 1974 may also be coupled to a microphone to receive audible input, such as to facilitate a telephone conversation. In accordance with embodiments of the present invention, the microphone may also serve as an audio sensor to facilitate control of notifications, as will be described below. The system 1902 may further include a video interface 1976 that enables an operation of an on-board camera 1930 to record still images, video stream, and the like.
A mobile computing device 1900 implementing the system 1902 may have additional features or functionality. For example, the mobile computing device 1900 may also include additional data storage devices (removable and/or non-removable) such as, magnetic disks, optical disks, or tape. Such additional storage is illustrated in
Data/information generated or captured by the mobile computing device 1900 and stored via the system 1902 may be stored locally on the mobile computing device 1900, as described above, or the data may be stored on any number of storage media that may be accessed by the device via the radio 1972 or via a wired connection between the mobile computing device 1900 and a separate computing device associated with the mobile computing device 1900, for example, a server computer in a distributed computing network, such as the Internet. As should be appreciated, such data/information may be accessed via the mobile computing device 1900 via the radio 1972 or via a distributed computing network. Similarly, such data/information may be readily transferred between computing devices for storage and use according to well-known data/information transfer and storage means, including electronic mail and collaborative data/information sharing systems.
Embodiments of the present invention, for example, are described above with reference to block diagrams and/or operational illustrations of methods, systems, and computer program products according to embodiments of the invention. The functions/acts noted in the blocks may occur out of the order shown in any flowchart. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The description and illustration of one or more embodiments provided in this application are not intended to limit or restrict the scope of the invention as claimed in any way. The embodiments, examples, and details provided in this application are considered sufficient to convey possession and enable others to make and use the best mode of the claimed invention. The claimed invention should not be construed as being limited to any embodiment, example, or detail provided in this application. Regardless of whether shown and described in combination or separately, the various features (both structural and methodological) are intended to be selectively included or omitted to produce an embodiment with a particular set of features. Having been provided with the description and illustration of the present application, one skilled in the art may envision variations, modifications, and alternate embodiments falling within the spirit of the broader aspects of the general inventive concept embodied in this application that do not depart from the broader scope of the claimed invention.