The present invention relates generally to controlling memory allocation at a client, and more particularly to controlling, by a server, how the memory is allocated at the client based on information determined at the server.
There is a need for systems that remotely manage content displayed on a client. However, obtaining client information for media-delivery management is bandwidth-intensive due to the size of graphical data.
The field of Software Virtualization generally involves the creation of a remotely accessed instance of a software program or service that is rendered to the user by a local proxy of that program. To the user, the service operates with all the features of, and with latency similar to, a local application, such that the user cannot tell that the service is remote. Virtual machines can be executed remotely to provide graphics processing and other computational tasks, as might be needed by a remote client device. Software Virtualization allows complex software systems to be maintained in a central location and accessed in the user's premises on a local computing device, smart TV, set-top box, or the like.
The software systems that are most commonly virtualized utilize the Linux operating system, which has become the de facto standard for computer systems large and small. There is an increasing demand for and use of software applications that operate in a variant of Linux called Android, which powers the majority of mobile devices worldwide. This Linux variant was specifically designed for compact systems that display information controlled by gestures, such as smartphones and tablets, and is finding increasing use in smart TVs, driven by the demand for living-room access to the apps that are most popular on mobile phones, especially social media and video apps such as YouTube. Android and its applications (apps) typically require the operating system to provide symmetrical access to the device's memory for both the central processing unit (CPU) and the graphics processing unit (GPU). Many modern compact devices employ such a unified architecture for numerous reasons, including reduced component count as well as the flexibility to dynamically trade GPU memory for CPU memory as appropriate for the application. Because of this flexibility, there is generally no incentive for Android apps to use GPU memory conservatively.
However, problems arise when serving client devices such as cable TV set-top boxes and smart TVs. Due to cost constraints in manufacturing, these devices have limited internal processing capabilities even when they are used to control and manage the display of large amounts of video programming. This is largely because they typically use either a unified memory architecture with a fixed division, such that the CPU gets one fixed portion and the GPU the remainder, or a totally discrete memory architecture, that is, separate CPU and GPU memory. The result is that such devices cannot offer the same flexibility or capability as a dedicated native system, and the virtualized applications that run on them must deal with this memory constraint. This is the challenge to which the systems and methods of this invention provide a novel solution: optimizing the operation of software designed for a largely unconstrained hardware environment when a virtualized version of the same software must operate in various constrained hardware architectures.
Embodiments described herein are directed to improved systems and methods for allocating, at a server system, memory between a GPU and a CPU memory of a client device, to enable media-providing applications, which require access to media stored in the GPU, to be executed at a server.
In accordance with some embodiments, a method performed at a server computing device for remotely managing memory allocation of a client device is provided. The server system generates a model of a first memory architecture of a client device, the model of the first memory architecture including a GPU memory portion and a CPU memory portion. The server system receives a representation of a first image asset, and stores a first texture image corresponding to the first image asset in the GPU memory portion of the model at the server system. The first texture image is stored in the GPU memory portion of the client device. The server system determines, using the model, that the GPU memory portion at the client device needs to be reallocated. The server system identifies, using the model, one or more texture images that are stored in the GPU memory portion at the client device to evict and transmits an instruction, to the client device, to evict the one or more texture images from the GPU memory portion.
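By way of illustration only, the server-side flow summarized above can be sketched in Python. The names here (ClientMemoryModel, store_texture, send_instruction) are hypothetical, not taken from this disclosure, and the eviction order shown is an arbitrary placeholder rather than a prescribed policy:

```python
from dataclasses import dataclass, field

@dataclass
class ClientMemoryModel:
    gpu_capacity: int                             # bytes in the client's GPU portion
    gpu_used: int = 0
    textures: dict = field(default_factory=dict)  # texture_id -> size in bytes

    def needs_reallocation(self, incoming: int) -> bool:
        return self.gpu_used + incoming > self.gpu_capacity

    def pick_victims(self, incoming: int) -> list:
        """Identify texture images to evict until the new texture fits."""
        victims, freed = [], 0
        for tex_id, size in self.textures.items():
            if self.gpu_used - freed + incoming <= self.gpu_capacity:
                break
            victims.append(tex_id)
            freed += size
        return victims

def store_texture(model, tex_id, size, send_instruction):
    """Server-side flow: update the model, then instruct the client to evict."""
    if model.needs_reallocation(size):
        for victim in model.pick_victims(size):
            send_instruction({"op": "evict", "texture": victim})  # control plane
            model.gpu_used -= model.textures.pop(victim)
    model.textures[tex_id] = size
    model.gpu_used += size
```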
In some embodiments, a computer readable storage medium storing one or more programs for execution by one or more processors of an electronic device is provided. The one or more programs include instructions for performing any of the methods described above.
In some embodiments, an electronic device (e.g., a server system) is provided. The server system comprises one or more processors and memory storing one or more programs for execution by the one or more processors, the one or more programs including instructions for performing any of the methods described above.
It will be recognized that, in various embodiments, operations described with regard to the client may apply to a server and vice versa.
For a better understanding of the aforementioned preferred embodiments of the invention, as well as additional embodiments thereof, reference should be made to the following drawings.
A virtual machine (VM) is a software emulation of a computer system which can be customized to include a predefined amount of random access memory (RAM), storage space, operating system (OS), and graphics hardware support typically in the form of a graphics processing unit (GPU); potentially in addition to other computing resources. Such virtual machines are a close equivalent of a physical computer and provide the functionality thereof.
Computer systems, either in the form of physical hardware or virtualized as a VM, typically use one of the following three memory architectures for their CPU and GPU components: a fully unified memory architecture, in which the CPU and GPU share one physical pool whose division can change dynamically; a unified memory architecture with a fixed division, in which the CPU gets one fixed portion and the GPU the remainder; or a fully discrete architecture, with physically separate CPU and GPU memory.
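For illustration, the three regimes can be captured in a small sketch; the enum and helper names are hypothetical, and the point is only how much additional memory the GPU can claim under each architecture:

```python
from enum import Enum, auto

class MemoryArchitecture(Enum):
    UNIFIED_DYNAMIC = auto()  # one pool; the CPU/GPU division changes on demand
    UNIFIED_FIXED = auto()    # one pool, but the CPU/GPU split is fixed
    DISCRETE = auto()         # physically separate CPU and GPU memory

def additional_gpu_memory(arch, free_unified: int, free_gpu_pool: int) -> int:
    """How much more memory the GPU can still claim under each regime."""
    if arch is MemoryArchitecture.UNIFIED_DYNAMIC:
        return free_unified   # any unused memory may become GPU memory
    return free_gpu_pool      # fixed split / discrete: capped by the GPU pool
```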
As used herein, an image asset is a CPU-domain, two-dimensional picture, compressed in a known image compression format such as, but not limited to, PNG, JPEG, or WebP.
As used herein, a texture image is a GPU-domain single array of texture pixels of certain dimensionality, in either an uncompressed or a compressed texture image format. In some embodiments, a texture image is further enabled to be downloaded to the CPU (e.g., a CPU can interpret texture images, whereas the GPU cannot interpret image assets). In some embodiments, when a texture image is downloaded to the CPU, it is optionally compressed into an image asset (e.g., as an asset in the CPU domain). For example, the texture image can be stored in the CPU as a texture image and/or as an image asset (e.g., by compressing the texture image into an image asset, as described by step 603).
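A minimal sketch of the two domains, using the Pillow imaging library purely as an illustrative choice (the disclosure names PNG, JPEG, and WebP as asset formats, but no particular library): an image asset is the compressed file, while a texture image is the single raw pixel array.

```python
# Converting between an image asset (compressed, CPU domain) and a texture
# image (raw pixel array, GPU domain); Pillow is an illustrative assumption.
import io
from PIL import Image

def decompress_asset(asset_bytes: bytes):
    """Image asset (e.g., PNG/JPEG/WebP) -> texture image (raw RGBA pixels)."""
    img = Image.open(io.BytesIO(asset_bytes)).convert("RGBA")
    return img.size, img.tobytes()   # dimensions + single array of texture pixels

def compress_texture(size, pixels: bytes) -> bytes:
    """Texture image -> image asset, e.g., when a texture downloaded to the
    CPU is recompressed for compact storage (cf. step 603 above)."""
    img = Image.frombytes("RGBA", size, pixels)
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()
```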
Some applications typically execute on (e.g., and are designed/programmed for) systems that have a unified memory architecture. For example, many modern compact devices such as phones and tablets employ a unified memory architecture, enabling reduced component count and the flexibility to trade GPU memory for CPU memory as fits the application then running. In contrast, the client devices served by the approach taught herein typically use either a unified architecture with a fixed division or a completely discrete architecture. The net result is that such devices cannot offer the flexibility that a native system would and that the applications are coded to assume.
Because the unified architecture provides such flexibility, there is generally no incentive for applications to optimize their GPU memory usage. For example, it is currently not considered an issue if all the data for a certain graphic, such as a particular texture image on the GPU, is retained even when it is temporarily not being used, since it occupies the same physical memory as the CPU's memory and the system dynamically grows the amount of logical memory assigned to the GPU. In contrast, for more compact and cost-efficient systems, the novel approach taught herein enables that function to be virtualized to a client, enabling it to manage its GPU memory more efficiently.
In the novel unified architecture taught herein, GPU texture images can be moved to CPU memory on demand in real time. As one example, if a situation arose where an application needed space for four GPU texture images, the traditional architecture, with its fixed division or discrete memory, might only have space for three. To accommodate the additional texture image, one or more texture images would have to be evicted from the GPU memory and (temporarily) stored in the CPU's memory. In contrast, the unified architecture described herein has already assigned four texture images to GPU memory, and further free memory is available to assign to either the CPU or the GPU.
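The four-into-three example can be made concrete with a short sketch; the structures and names are hypothetical, and a real system would track sizes in bytes rather than whole slots:

```python
gpu_memory = {"tex1": b"...", "tex2": b"...", "tex3": b"..."}  # 3 slots in use
cpu_memory = {}
GPU_SLOTS = 3  # fixed-division or discrete: room for only three textures

def upload_texture(tex_id: str, data: bytes):
    if len(gpu_memory) >= GPU_SLOTS:                 # GPU portion is full
        victim, pixels = next(iter(gpu_memory.items()))
        cpu_memory[victim] = pixels                  # evict: stash in CPU memory
        del gpu_memory[victim]
    gpu_memory[tex_id] = data

upload_texture("tex4", b"...")  # tex1 now lives (temporarily) in CPU memory
```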
A high-level system overview, as illustrated in
In teaching the server-side orchestration logic for the eviction, restoration, and compression processes, it should be appreciated that there is a strict separation between the data plane and the control plane. The data plane is the client downloading texture images from the GPU to the CPU, applying compression if necessary, optionally storing the downloaded texture images, decompressing image assets into new texture images, and uploading those texture images to the GPU. In some embodiments, none of the steps listed above is done on the client's own initiative. Everything happens under the orchestration of the server, the control plane. Hence, the flow charts describe the server-side logic to control the data plane.
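A minimal sketch of the data plane as a client-side dispatcher, assuming a hypothetical instruction format ({"op": ..., "texture": ...}) that the source does not specify; the client performs data movement only in response to server instructions:

```python
def client_data_plane(instruction, gpu_memory, cpu_memory):
    """Client-side executor: moves data only when the server instructs it."""
    op, tex = instruction["op"], instruction["texture"]
    if op == "download":      # GPU -> CPU (optionally compressed and stored)
        cpu_memory[tex] = gpu_memory[tex]
    elif op == "evict":       # free the GPU allocation
        del gpu_memory[tex]
    elif op == "upload":      # CPU -> GPU (decompressing an asset if needed)
        gpu_memory[tex] = cpu_memory[tex]
```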
In some embodiments, GPU texture image allocations 105 can be moved to CPU memory 110 on demand in real time. As one example, if a situation arose where an application needed space for four GPU texture image allocations, the traditional architecture, with its fixed division 102 or discrete architecture 103, might only have space for three GPU texture image allocations 109. To accommodate the additional texture image, one or more texture images would have to be evicted 107 from the GPU memory and (temporarily) stored in the CPU's memory 110. In contrast, the unified architecture being taught has already assigned four texture images to GPU memory, and further free memory 104 is available to assign to either the CPU or the GPU.
A first embodiment of the process of downloading image assets 212 from an application backend 206 is given in
A second embodiment of the process of downloading image assets 212 from an application backend 206 is given in
In a third embodiment of the process of downloading image assets from an application backend the server downloads 401 the image assets 303 directly from the application backend 206 and the client subsequently downloads 402 the image assets 211 to its CPU memory 110, as depicted in
The net result of the three embodiments is that the client 203 has copies of image assets 212 from the application backend 206, and the server 201 has either the same image assets 303 or a digest 209 of them. The server's GPU memory model 205 tracks the client's GPU memory 109 usage and is used to decide, according to the decision logic taught in
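The digest alternative can be sketched as follows; SHA-256 via Python's hashlib is an illustrative assumption, since the disclosure does not name a digest algorithm:

```python
import hashlib

def asset_digest(asset_bytes: bytes) -> str:
    """Compact stand-in for a full image asset (algorithm is an assumption)."""
    return hashlib.sha256(asset_bytes).hexdigest()

# The server's GPU memory model can then track what the client holds by
# digest alone, without retaining every full image asset.
gpu_memory_model = {"tex1": asset_digest(b"...png bytes...")}
```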
Texture images may also be stored on the GPU in a compressed format. Such texture image compression is another tool for achieving the same goal: running an application so that it remains unaware of restrictions on available GPU memory on a GPU-memory-bound client.
In teaching the server-side orchestration logic for the eviction, restoration, and compression processes, it should be appreciated that there is a strict separation between the data plane and the control plane. The data plane is the client downloading image assets from the application backend 206 or server 201, downloading texture images from the GPU's memory 109 to the CPU's memory 110, compressing the downloaded texture image to an image asset 605, decompressing image assets to texture images 706, and uploading that data to the GPU 703. None of that is done on the client's own initiative. Everything happens under the orchestration of the server 201. Hence, the flow charts 9-14 describe the server-side logic to control the data plane.
It should further be appreciated that the server performs the same operations on its models of the client's CPU memory 210 and GPU memory so that the models are always in sync with the state of the client. That way, the server can perform its orchestration solely based on its models and never has to query the client for its state.
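One way to sketch this mirrored-model discipline (hypothetical helper names): every instruction is applied to the server's own models before it is sent, so the models track the client without any state queries. Note the operations match those in the client-side dispatcher above.

```python
def issue(instruction, models, send_to_client):
    apply_to_models(models, instruction)  # update the CPU/GPU memory models first
    send_to_client(instruction)           # then have the client do the same

def apply_to_models(models, instruction):
    op, tex = instruction["op"], instruction["texture"]
    if op == "download":                  # GPU -> CPU copy
        models["cpu"][tex] = models["gpu"][tex]
    elif op == "evict":                   # remove from the GPU model
        del models["gpu"][tex]
    elif op == "upload":                  # CPU -> GPU
        models["gpu"][tex] = models["cpu"][tex]
```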
The server system generates (1504) a model of a first memory architecture of a client device, the model of the first memory architecture including a GPU memory portion and a CPU memory portion corresponding to a GPU memory portion and a CPU memory portion, respectively, at the client device.
The server system receives (1506) a representation of a first image asset.
In some embodiments, the representation of the first image asset comprises (1508) the first image asset and is received from an application backend. For example, as illustrated in
In some embodiments, the representation of the first image asset comprises (1510) a digest of an image asset received from the client device. For example, as illustrated in
In some embodiments, the representation of the first image asset comprises (1512) the first image asset and is received from the client device. For example, as illustrated in
In response to receiving the representation of the first image asset, the server system stores (1514) a first texture image corresponding to the first image asset in the GPU memory portion of the model at the server system. The first texture image is stored in the GPU memory portion of the client device. In some embodiments, the model of the first memory architecture emulates memory of the client device, including storing the image assets and/or texture images within the respective GPU memory portion and/or CPU memory portion of the client device.
In some embodiments, the GPU memory portion of the client device is (1516) fixed and the CPU memory portion of the client device is fixed. For example, as illustrated in
The server system determines (1518), using the model, that the GPU memory portion at the client device needs to be reallocated. For example, the server system determines that the GPU memory portion of the client device needs to accommodate new texture images that will be used by the client. In some embodiments, the server system executes an application (e.g., a media-providing application), and the server system, using the application, determines when respective texture images will be displayed. As used herein, a determination that the GPU memory portion of the client needs to be “reallocated” refers to a determination that one or more texture images stored in the GPU memory portion need to be swapped out (e.g., removed from the GPU memory portion) in order to make room for another texture image to be stored in the GPU memory portion in their place (e.g., the GPU memory portion has a limited amount of available memory, and the server determines how to allocate it, such as which texture images to store in the GPU memory at a given point in time).
In some embodiments, the server system executes (1520) a virtual application. In some embodiments, determining that the GPU memory portion needs to be reallocated comprises, using the virtual application, predicting when (e.g., and how) a respective texture image needs to be accessible (e.g., loaded in the GPU) to the client device. For example, the server system uses the model generated at the server system without querying the client for its state.
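A sketch of such prediction, under the assumption of a hypothetical virtual-application hook textures_for_frame() that the source does not define: the server looks a few frames ahead to learn which textures must be resident on the client's GPU.

```python
LOOKAHEAD_FRAMES = 5  # illustrative horizon (cf. "next frame, 5 frames, etc.")

def textures_needed_soon(virtual_app, current_frame: int) -> set:
    """Ask the server-side virtual application which textures upcoming
    frames will draw; textures_for_frame() is an assumed hook."""
    needed = set()
    for frame in range(current_frame, current_frame + LOOKAHEAD_FRAMES):
        needed |= set(virtual_app.textures_for_frame(frame))
    return needed
```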
In response to determining that the GPU memory portion of the client device needs to be reallocated, the server system identifies (1522), using the model, one or more texture images that are stored in the GPU memory portion at the client device to evict.
The server system transmits (1524) an instruction, to the client device, to evict the one or more texture images from the GPU memory portion. In some embodiments, the server system continues identifying texture images to evict from the GPU memory portion until enough GPU memory is freed to accommodate the new texture image allocation.
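A sketch of the evict-until-it-fits loop, assuming least-recently-used ordering purely for illustration (the disclosure does not prescribe an eviction policy):

```python
from collections import OrderedDict

def free_gpu_memory(gpu_model: OrderedDict, needed: int, send_instruction) -> int:
    """Keep identifying victims until enough GPU memory has been freed.
    gpu_model maps texture_id -> size in bytes, oldest entries first."""
    freed = 0
    while freed < needed and gpu_model:
        tex_id, size = gpu_model.popitem(last=False)  # least recently used
        send_instruction({"op": "evict", "texture": tex_id})
        freed += size
    return freed
```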
In some embodiments, the server system receives (1526) a representation of a second image asset. For example, the server system receives a plurality of image assets that are generated by application backend 206 (e.g., to be displayed at the client 203). In some embodiments, the server system updates the model using the representation of the second image asset, including storing a second texture image corresponding to the second image asset in the GPU memory portion of the model. In some embodiments, the model is updated (e.g., in real-time) to reflect a current state of the client device's GPU and CPU memory allocation.
In some embodiments, after transmitting the instruction to the client device to evict a respective texture image of the one or more texture images from the GPU memory portion of the client device, the server system transmits (1528) an instruction to the client device to restore the respective texture image. In some embodiments, the instruction to restore the first texture image is transmitted in accordance with a determination that the first texture image is needed in the near future (e.g., will be used in the next frame, 5 frames, etc.). In this way, texture images are restored when they are needed for use by the GPU, and the GPU memory portion is able to dynamically allocate its memory to store texture images that are needed and evict texture images that are not. If the GPU needs a texture image that is not currently stored in the GPU, then the client device needs to restore the texture image.
In some embodiments, the server system determines (1530) whether the client device can restore the respective texture image (e.g., whether the client device has stored the respective texture image, or a compressed version of it, from which to restore the respective texture image). In some embodiments, determining whether the client device can restore the respective texture image comprises determining whether the texture image has been modified on the GPU, as illustrated in
In some embodiments, the server system determines (1532) whether the client device can restore the respective texture image. In accordance with a determination that the client device cannot restore the respective texture image, the server system transmits an instruction to download the respective texture image from the GPU memory portion of the client device and store it as the respective texture image in the CPU memory portion of the client device.
In some embodiments, the instruction to download the respective texture image from the GPU memory portion of the client device to the CPU memory portion of the client device further includes (1534) an instruction for, after downloading the respective texture image, removing the respective texture image from the GPU memory.
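Taken together, steps 1530-1534 suggest a decision of the following shape; the helper names and instruction format are hypothetical:

```python
def evict_texture(tex_id, client_can_restore, send_instruction):
    """Before evicting, ensure the client will be able to restore later."""
    if not client_can_restore(tex_id):  # e.g., texture was modified on the GPU
        # GPU -> CPU copy first, so a restorable copy survives the eviction
        send_instruction({"op": "download", "texture": tex_id})
    send_instruction({"op": "evict", "texture": tex_id})

def restore_texture(tex_id, send_instruction):
    """Later, when the texture is needed again: CPU -> GPU."""
    send_instruction({"op": "upload", "texture": tex_id})
```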
In some embodiments, the server system transmits an instruction to the client device for, after downloading the respective texture image to the CPU memory portion, compressing the respective texture image into a compressed version of the respective texture image, wherein the CPU memory portion stores the compressed version of the respective texture image (e.g., and the client device, in response to the instruction, compresses the respective texture image).
In some embodiments, the server system transmits an instruction to the client device to re-upload the respective texture image stored in the CPU memory portion of the client device to the GPU memory portion of the client device, wherein the compressed version of the respective texture image is uploaded to the GPU memory portion of the client device.
In some embodiments, the server system transmits (1536) an instruction to the client device to compress the respective texture image into a compressed image asset after the client device downloads the respective texture image to the CPU memory portion, wherein the compressed image asset is stored in the CPU memory portion (e.g., as described with reference to
In some embodiments, the server system transmits (1538) an instruction to the client device to re-upload the respective texture image stored in the CPU memory portion of the client device to the GPU memory portion of the client device, including transmitting an instruction for the CPU memory portion to decompress the compressed image asset into the respective texture image before re-uploading the respective texture image to the GPU memory portion (e.g., as described with reference to
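The compression variant of the round trip (steps 1536-1538) can be sketched as an ordered instruction sequence; again the instruction names are hypothetical:

```python
def evict_with_compression(tex_id, send_instruction):
    send_instruction({"op": "download", "texture": tex_id})    # GPU -> CPU
    send_instruction({"op": "compress", "texture": tex_id})    # texture image -> image asset
    send_instruction({"op": "evict", "texture": tex_id})       # free the GPU allocation

def restore_with_decompression(tex_id, send_instruction):
    send_instruction({"op": "decompress", "texture": tex_id})  # image asset -> texture image
    send_instruction({"op": "upload", "texture": tex_id})      # CPU -> GPU
```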
In some embodiments, as illustrated in
Memory 1606 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid-state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. Memory 1606, optionally, includes one or more storage devices remotely located from one or more CPUs 1602. Memory 1606, or, alternatively, the non-volatile solid-state memory device(s) within memory 1606, includes a non-transitory computer-readable storage medium. In some implementations, memory 1606, or the non-transitory computer-readable storage medium of memory 1606, stores the following programs, modules and data structures, or a subset or superset thereof:
In some implementations, the server computer system 1600 includes web or Hypertext Transfer Protocol (HTTP) servers, File Transfer Protocol (FTP) servers, as well as web pages and applications implemented using Common Gateway Interface (CGI) script, PHP: Hypertext Preprocessor (PHP), Active Server Pages (ASP), Hyper Text Markup Language (HTML), Extensible Markup Language (XML), Java, JavaScript, Asynchronous JavaScript and XML (AJAX), XHP, Javelin, Wireless Universal Resource File (WURFL), and the like.
Although
The client device includes input/output module 1704, including output device(s) 1705, such as video output and audio output, and input device(s) 1707. In some implementations, the input devices 1707 include a keyboard, a remote controller, or a track pad. For example, output device 1705 is used for outputting video and/or audio content (e.g., to be reproduced by one or more displays and/or loudspeakers coupled with client device 1700) and/or input device 1707 is used for receiving user input (e.g., from a component of client device 1700 (e.g., keyboard, mouse, and/or touchscreen) and/or a control coupled to client device 1700 (e.g., a remote control)). Alternatively, or in addition, the client device includes (e.g., is coupled to) a display device (e.g., to display video output).
The client device includes application proxy 1703 for communicating with third-party applications that are executing on the server system. For example, instead of storing and executing the application(s) on the client device, application proxy 1703 receives commands (e.g., from a virtual machine in the server system) and, based on the received commands, instructs the client device to update the display accordingly.
In some implementations, the one or more network interfaces 1710 include wireless and/or wired interfaces for receiving data from and/or transmitting data to other client devices 1700, a server computer system 1600, and/or other devices or systems. In some implementations, data communications are carried out using any of a variety of custom or standard wired protocols (e.g., USB, Firewire, Ethernet, etc.).
Memory 1712 includes high-speed random-access memory, such as DRAM, SRAM, DDR RAM, or other random-access solid-state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. Memory 1712 may optionally include one or more storage devices remotely located from the CPU(s) 1706. Memory 1712, or alternately, the non-volatile solid-state storage devices within memory 1712, includes a non-transitory computer-readable storage medium. In some implementations, memory 1712 or the non-transitory computer-readable storage medium of memory 1712 stores the following programs, modules, and data structures, or a subset or superset thereof:
Features of the present invention can be implemented in, using, or with the assistance of a computer program product, such as a storage medium (media) or computer readable storage medium (media) having instructions stored thereon/in which can be used to program a processing system to perform any of the features presented herein. The storage medium (e.g., the memory 1606 and the memory 1712) can include, but is not limited to, high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices, and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. In some embodiments, the memory 1606 and the memory 1712 include one or more storage devices remotely located from the CPU(s) 1602 and 1706. The memory 1606 and the memory 1712, or alternatively the non-volatile memory device(s) within these memories, comprises a non-transitory computer readable storage medium.
It will be understood that, although the terms “first,” “second,” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the claims. As used in the description of the embodiments and the appended claims, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.
As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in accordance with a determination” or “in response to detecting,” that a stated condition precedent is true, depending on the context. Similarly, the phrase “if it is determined [that a stated condition precedent is true]” or “if [a stated condition precedent is true]” or “when [a stated condition precedent is true]” may be construed to mean “upon determining” or “in response to determining” or “in accordance with a determination” or “upon detecting” or “in response to detecting” that the stated condition precedent is true, depending on the context.
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the claims to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain principles of operation and practical applications, to thereby enable others skilled in the art.
This application is a continuation of International Patent Application PCT/US21/61958, entitled “Systems and Methods for Virtual GPU-CPU Memory Orchestration,” filed Dec. 6, 2021, which claims priority to U.S. Provisional Patent Application No. 63/122,441, entitled “Systems and Methods for Virtual GPU-CPU Memory Orchestration,” filed on Dec. 7, 2020. This application is related to U.S. patent application Ser. No. 16/890,957, entitled “Orchestrated Control for Displaying Media,” filed on Jun. 2, 2020, which claims priority to U.S. Provisional Application No. 62/868,310, filed on Jun. 28, 2019, each of which is hereby incorporated by reference in its entirety. This application is also related to U.S. patent application Ser. No. 16/721,125, entitled “Systems and Methods of Orchestrated Networked Application Services,” filed on Dec. 19, 2019, which is a continuation of International Application No. PCT/US18/40118, filed Jun. 28, 2018, which claims priority to U.S. Provisional Application No. 62/526,954, filed Jun. 29, 2017, each of which is hereby incorporated by reference in its entirety.
| Number | Date | Country |
| --- | --- | --- |
| 63122441 | Dec 2020 | US |

| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | PCT/US2021/061958 | Dec 2021 | US |
| Child | 17544854 | | US |