Embodiments of the invention relate to graphics systems and performance management of graphics systems.
Advances in graphics systems enable rapid development of graphics-intensive applications, such as video games, virtual reality, artificial intelligence, and the like. To execute these applications, a graphics system consumes a significant amount of system resources and power. Analyzing when and how much a graphics application utilizes system resources and power can help in allocating system resources, setting time budgets, and scheduling. One indicator of how much a graphics application utilizes allocated system resources is the frame running time, which, defined at a high level, is the time duration when the application is actively executing one or more tasks for rendering a frame.
To optimize a system's power efficiency, it is a goal of a system designer to allocate a time-constrained task the least amount of resources that the task needs to complete just in time. Identifying the frame running time can help the system designer to achieve this goal.
Typically, user experience (UX) applications stay running until the completion of a frame. Some other applications, such as game applications, may have intermittent wakeup and sleep periods. As game applications typically have many interdependent parallel threads executing and sleeping at different times, identifying the frame running time for such applications can be a difficult task.
Conventional methods for estimating the frame running time generally ignore the duration of sleep of these threads, thus over-estimating the frame running time. One of these threads is a render thread. Using the render thread's running time to represent the frame running time would under-estimate the frame running time.
In one embodiment, a device is provided for dynamically estimating frame running time. The device comprises a processor to execute a plurality of threads of an application, and a graphics processor to receive commands from the processor for rendering frames. For one or more of the frames, the processor is operative to: record a timer period for each thread in a set of threads contributing to operations of a render thread which writes the commands for the graphics processor to render the frames; calculate a frame non-running time for a current frame using recorded one or more timer periods; and calculate the frame running time for the current frame by subtracting the frame non-running time from an end-to-end frame period. Each thread in the set of threads has a corresponding timer that controls a sleep state of the thread.
In another embodiment, a method is provided for dynamically estimating frame running time. The method comprises: recording a timer period for each thread in a set of threads contributing to operations of a render thread which writes commands for a graphics processor to render frames; calculating a frame non-running time for a current frame using recorded one or more timer periods; and calculating the frame running time for the current frame by subtracting the frame non-running time from an end-to-end frame period. Each thread in the set of threads has a corresponding timer that controls a sleep state of the thread.
In yet another embodiment, a processor is operative to dynamically estimate frame running time. The processor comprises memory containing instructions that when executed cause the processor to perform operations of: recording a timer period for each thread in a set of threads contributing to operations of a render thread which writes commands for a graphics processor to render frames; calculating a frame non-running time for a current frame using recorded one or more timer periods; and calculating the frame running time for the current frame by subtracting the frame non-running time from an end-to-end frame period. Each thread in the set of threads has a corresponding timer that controls a sleep state of the thread.
Embodiments of the invention improve accuracy in the estimation of frame running time. A system may use the estimated frame running time to allocate system resources such that graphics execution and rendering tasks can be finished just in time within a time budget of a frame, thereby minimizing system resource waste. Reduction in system resource waste, in turn, reduces system power consumption.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that different references to “an” or “one” embodiment in this disclosure are not necessarily to the same embodiment, and such references mean at least one. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description. It will be appreciated, however, by one skilled in the art, that the invention may be practiced without such specific details. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
Embodiments of the invention provide a device, a method and a processor, which calculate frame running time based on timers associated with a set of threads of an application. Some or all of the threads may be time-constrained; that is, they must complete scheduled tasks before a deadline (or respective deadlines). The knowledge of execution behaviors in a frame improves allocation of system resources. A process may request an amount of system resource for executing the application based, at least in part, on the frame running time. An end-to-end frame period (also referred to as “frame period”) may include frame running time and frame non-running time. The frame non-running time can be calculated from the union of timer periods of a set of threads that contribute to the operations of the render thread in a frame. In the following description according to embodiments of the invention, the set of threads that contribute to the operations of the render thread may also include the render thread. The term “contributing thread” more specifically refers to a thread that wakes up the render thread, or a thread that is not the render thread but contributes to the operations of the render thread either directly or indirectly.
A processor, such as a central processing unit (CPU), may execute contributing threads as well as non-contributing threads and a render thread during a frame period. The render thread is executed by the CPU during each frame period to instruct a graphics processor, such as a graphics processing unit (GPU), to render graphics in a frame to be displayed on a display screen. Although CPU and GPU are described herein, it is understood that the CPU and the GPU may be replaced by other types of processors, where the former commands the latter to render graphics in frames (also referred to as “to render frames”).
A contributing thread may be associated with (i.e., correspond to) zero, one or more timers. Generally, when a system executes graphics applications, one or more timers may be deployed to control the time at which a thread's running time starts. The timer is set when the thread goes to sleep, and the system wakes up the thread when the timer ends. The timer start time and the timer end time are recorded in the system. The timer period, which is the time duration between the timer start time and the timer end time, can be used for calculating a frame non-running time. The end-to-end frame period minus the frame non-running time is the frame running time. In the following description, the terms “time” and “period” may be used interchangeably. For example, the term “running time” is equivalent to “execution period.” Furthermore, in the following description a contributing thread is described as having a corresponding timer. It is understood that the description can be extended to a contributing thread having more than one corresponding timer. A contributing thread having no corresponding timer is not taken into account when estimating the frame running time.
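The relationship between a timer period and the frame running time can be illustrated with a minimal sketch. The names `TimerRecord` and `frame_running_time_single_timer` are hypothetical and do not come from the description. The sketch assumes a single contributing thread whose one timer period lies entirely inside the frame and does not overlap the render thread's running time, so the timer period alone is the frame non-running time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimerRecord:
    """Hypothetical record of one sleep timer of a contributing thread."""
    thread_name: str
    start_ms: float  # timer start time: the thread goes to sleep
    end_ms: float    # timer end time: the system wakes up the thread

    @property
    def period_ms(self) -> float:
        # Timer period: duration between timer start time and timer end time.
        return self.end_ms - self.start_ms


def frame_running_time_single_timer(frame_period_ms: float,
                                    timer: TimerRecord) -> float:
    """Frame running time = end-to-end frame period - frame non-running time.

    Assumption: the single timer period is the whole frame non-running time.
    """
    return frame_period_ms - timer.period_ms


# Illustrative values only: a 16 ms frame with one 5 ms sleep.
timer = TimerRecord("worker", start_ms=2.0, end_ms=7.0)
single_timer_running_ms = frame_running_time_single_timer(16.0, timer)
```

With several contributing threads the timer periods may overlap one another, so they cannot simply be summed; that case needs a union of intervals rather than this single subtraction.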
A thread's running time is the time duration when its corresponding timer does not run. During a contributing thread's running time, the contributing thread may be actively performing tasks in a number of time slices and may be stalled between the time slices. The stalls may be caused by resource sharing, waiting for results from other threads or software entities, waiting for a lower-layer software entity to complete a task, and the like. During the contributing thread's running time, the stalls may be so numerous that tracking the start and end times of each time slice can be impractical. The contributing thread's timer does not track the stalled time between the time slices in the running time, and the timer is observable by the operating system's framework (e.g., the application framework) that supports the execution of the application. Thus, using timer periods to estimate the frame running time incurs low overhead to the system. The estimated frame running time can be used by the system to determine the right amount of system resource to request, such that the allocated system resource can be fully utilized without waste while meeting the performance requirements such as the time budget for the frame period. For example, a system may determine a system resource to allocate to a graphics application or a gaming engine to render a next frame based on the frame running time estimated from the current frame (or the current frame and one or more prior frames). When the right amount of system resource is allocated, the next frame can be rendered just in time within the time budget deadline. Reducing the allocation of system resource can reduce system power consumption. The frame non-running time is an indication of under-utilization of system resources; the system resource is allocated with an aim to minimize the frame non-running time in the next frame or frames.
The allocated system resource may include processing capacity, memory capacity, power allocation, time budget, etc.
According to embodiments of the invention, the frame running time may be calculated based on the knowledge of the threads' dependency relationship with the render thread, and the knowledge of each contributing thread's state change (e.g., when the contributing thread wakes up and sleeps). The application framework keeps track of this dependency relationship and the state change of each thread, and retains the history of frame running time and the history of system resource for determining the right amount of system resource to allocate in the next frames.
In one embodiment, the CPU 150 executes an application 110, which includes instructions for generating graphics content to be displayed on the display 140. In one embodiment, the CPU 150 sends commands to the GPU 120 to render frames in accordance with execution of the application 110. The commands may be queued in a command buffer. After the GPU 120 renders the frames, the content of each frame is sent to a frame buffer. A display controller reads the content from the frame buffer for display on the display 140.
The CPU 150 includes circuitry to perform logical and mathematical calculations. For the purpose of rendering frames for the application 110, the CPU 150 may generate commands for the GPU 120 to execute. The GPU 120 includes circuitry to perform the operations of graphics modeling and processing. For example, the GPU 120 may model a graphical object with primitives, manipulate the primitives by their vertices and pixels, generate surfaces containing rendered graphics, composite the surfaces, and write the composited surface to a frame buffer.
The GPU 120 generates new frames at a frame rate controlled by the CPU 150 according to a time budget. The frame rate may be a fixed frame rate or may vary from one frame to the next. The display 140 refreshes the displayed content at a refresh rate, which may be the same as or different from the frame rate.
In one embodiment, the CPU 150 executes a render thread to generate commands for each frame of the application 110. In every frame, the render thread wakes up to write commands into the command buffer, and after finishing writing the commands for the frame, the render thread goes back to sleep. The time instant when the render thread enters the sleep state is the beginning of the next frame. The render thread may be woken up by another thread (i.e., a contributing thread) when that thread has produced some output that triggers the wakeup of the render thread. In one embodiment, the render thread's running time is included in the frame running time.
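Because each frame begins at the instant the render thread enters the sleep state, end-to-end frame periods can be derived from consecutive sleep-entry timestamps. The following is a minimal sketch under that reading; the function name `frame_periods` and the timestamp values are hypothetical, and the sketch assumes the render thread enters the sleep state exactly once per frame.

```python
from typing import List, Tuple


def frame_periods(sleep_entry_times_ms: List[float]) -> List[Tuple[float, float]]:
    """Pair consecutive render-thread sleep-entry times into frame periods.

    Each returned (start, end) pair is one end-to-end frame period: it starts
    when the render thread goes to sleep and ends when it next goes to sleep.
    """
    ordered = sorted(sleep_entry_times_ms)
    return list(zip(ordered, ordered[1:]))


# Illustrative timestamps only (milliseconds).
periods = frame_periods([0.0, 16.0, 33.0, 50.0])
durations_ms = [end - start for start, end in periods]
```

Here `periods` holds three frame periods and `durations_ms` gives their lengths, which need not be equal when the frame rate varies from one frame to the next.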
In one embodiment, the device 100 executes an application framework 160 which provides a software infrastructure for the application 110 to interface with lower layers of the operating system (e.g., drivers) and graphics execution. The application framework 160 supports the execution of the application 110 and other applications that run on the device 100. In one embodiment, the application framework 160 estimates the frame running time for each frame of the application 110 execution, and requests an amount of system resource to be allocated for the execution of the application 110 based on the frame running time.
According to embodiments of the invention, the lower bound of the frame running time is the render thread's running time, and the upper bound is the end-to-end frame time. The difference between the end-to-end frame time and the frame running time is the frame non-running time, during which at least one contributing thread is in the sleep state. According to embodiments of the invention, the frame non-running time is calculated first using the union of timer periods, and the frame running time is calculated by subtracting the frame non-running time from the end-to-end frame period. An example where there are two contributing threads is provided in
At step 320, the frame non-running time is calculated by removing the overlapping execution period from Union, where the overlapping execution period is the portion of Union that overlaps with the render thread's running time (Tr). Thus, during the frame non-running time, at least one contributing thread is in the sleep state. If any portion of the frame non-running time falls outside the current frame's end-to-end frame period, that portion is not counted as part of the frame non-running time. Finally, the frame running time (Tfr) for the current frame is the end-to-end frame period (Tf) minus the frame non-running time.
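The calculation can be sketched with simple interval arithmetic. This is a minimal illustration, not the claimed implementation: the helper names (`merge`, `subtract`, `clip`, `frame_running_time`) and the example numbers are hypothetical, and intervals are assumed to be `(start, end)` pairs in milliseconds. The sketch forms the union of timer periods (Union), removes the portion overlapping the render thread's running time (Tr), discards anything outside the end-to-end frame period (Tf), and subtracts the remainder from Tf to obtain Tfr.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) with start <= end


def merge(intervals: List[Interval]) -> List[Interval]:
    """Union of intervals as a sorted list of non-overlapping intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(intervals: List[Interval], cuts: List[Interval]) -> List[Interval]:
    """Remove from `intervals` every part that overlaps an interval in `cuts`."""
    merged_cuts = merge(cuts)
    result: List[Interval] = []
    for start, end in merge(intervals):
        cursor = start
        for c_start, c_end in merged_cuts:
            if c_end <= cursor or c_start >= end:
                continue  # this cut does not touch the remaining piece
            if c_start > cursor:
                result.append((cursor, c_start))
            cursor = max(cursor, c_end)
        if cursor < end:
            result.append((cursor, end))
    return result


def clip(intervals: List[Interval], frame: Interval) -> List[Interval]:
    """Keep only the parts of `intervals` inside the frame period."""
    f_start, f_end = frame
    clipped = [(max(s, f_start), min(e, f_end)) for s, e in intervals]
    return [(s, e) for s, e in clipped if e > s]


def frame_running_time(frame: Interval,
                       timer_periods: List[Interval],
                       render_running: List[Interval]) -> float:
    """Tfr = Tf - (Union minus overlap with Tr, limited to the frame)."""
    union = merge(timer_periods)
    non_running = clip(subtract(union, render_running), frame)
    t_non_running = sum(e - s for s, e in non_running)
    return (frame[1] - frame[0]) - t_non_running


# Illustrative numbers only: a 16 ms frame, three timer periods (one of
# which extends past the end of the frame), and a render thread running
# from 8 ms to 12 ms.
example_tfr = frame_running_time(
    frame=(0.0, 16.0),
    timer_periods=[(1.0, 6.0), (4.0, 9.0), (14.0, 18.0)],
    render_running=[(8.0, 12.0)],
)
```

In this example the union is (1, 9) and (14, 18); removing the overlap with the render thread leaves (1, 8) and (14, 18); limiting to the frame leaves (1, 8) and (14, 16), so the frame non-running time is 9 ms and the frame running time is 7 ms.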
In one embodiment, when the dependency relationship between a given thread and the render thread is unknown, the given thread may be assumed to be a contributing thread to the render thread.
The method 400 begins at step 410 when a frame (i.e., a current frame) starts. The frame loading history is estimated at step 420, taking into account the frame loading of the current frame and past frames. The frame loading refers to the workload on the system (e.g., the CPU 150) incurred by a frame. In one embodiment, the frame loading may be calculated by multiplying the frame running time and the system resource utilized by a frame. In one embodiment, the estimation of the frame loading history may include a history of frame running time 421 (i.e., the frame running time of the current and past frames) and a history of utilized system resource 422 (i.e., the system resource utilized by the current and past frames). According to the estimated frame loading history and the time budget for generating a next frame (or next frames), an amount of system resource is requested for the next frame or frames. For example, the requested amount of system resource may be equal to the average frame loading divided by the time budget for the next frame, where the average frame loading is calculated from the frame loading history of the current and N past frames (N is a positive integer). The calculation of requested system resource allows full utilization of allocated system resources while satisfying the time budget for on-time frame generation.
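The resource request described in this example can be sketched as follows. The function names and the units (milliseconds for time, an abstract capacity unit such as MHz for the system resource) are assumptions made for illustration; the description does not fix a unit or a particular averaging window.

```python
from typing import Sequence


def frame_loading(running_time_ms: float, resource_utilized: float) -> float:
    """Frame loading = frame running time x system resource utilized."""
    return running_time_ms * resource_utilized


def requested_resource(running_times_ms: Sequence[float],
                       resources_utilized: Sequence[float],
                       time_budget_ms: float) -> float:
    """Average frame loading over the history, divided by the time budget.

    The two sequences hold the current frame and N past frames, in any
    consistent order (hypothetical layout).
    """
    if not running_times_ms or len(running_times_ms) != len(resources_utilized):
        raise ValueError("histories must be non-empty and of equal length")
    if time_budget_ms <= 0:
        raise ValueError("time budget must be positive")
    loadings = [frame_loading(t, r)
                for t, r in zip(running_times_ms, resources_utilized)]
    average_loading = sum(loadings) / len(loadings)
    return average_loading / time_budget_ms


# Illustrative history only: three frames that each ran with 1000 units of
# resource, and a 16 ms time budget for the next frame.
request = requested_resource([8.0, 10.0, 12.0],
                             [1000.0, 1000.0, 1000.0],
                             time_budget_ms=16.0)
```

In this example the average frame loading is 10000 and the request is 625 units, which is lower than the 1000 units used previously because the frames were running for less than the full time budget.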
The method 500 begins at step 510 with the application framework 160 recording a timer period for each thread in a set of threads contributing to operations of a render thread, where the render thread writes commands for a graphics processor to render frames. Each thread in the set of threads has a corresponding timer that controls a sleep state of the thread. At step 520, a frame non-running time is calculated for a current frame using recorded one or more timer periods. At step 530, the frame running time is calculated for the current frame by subtracting the frame non-running time from an end-to-end frame period.
It is noted that operations performed by the application framework 160 are executed by the CPU 150 according to instructions contained in the application framework 160. The instructions may be stored in a machine-readable medium (such as a non-transitory machine-readable storage medium). The non-transitory machine-readable medium may be any suitable tangible medium including a magnetic, optical, or electrical storage medium, which includes volatile or non-volatile storage mechanisms. The machine-readable medium may contain various sets of instructions, code sequences, configuration information, or other data, which, when executed, cause a processor to perform steps in a method according to an embodiment.
The operations of the flow diagrams of
While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, and can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.
This application claims the benefit of U.S. Provisional Application No. 62/503,999 filed on May 10, 2017.
Number | Date | Country
---|---|---
62503999 | May 2017 | US