Disclosed aspects are directed to resource allocation in a processing system. More specifically, exemplary aspects are directed to a distributed management of bandwidth allocation in a processing system.
Some processing systems may include shared resources, such as a shared memory, shared among various consumers, such as processing elements. With advances in technology, there is an increasing trend in the number of consumers that are integrated in a processing system. However, this trend also increases competition and conflict for the shared resources. It is difficult to allocate memory bandwidth of the shared memory, for example, among the various consumers, while also guaranteeing the expected quality of service (QoS) or other performance metrics for all the consumers.
Conventional bandwidth allocation mechanisms tend to be conservative in the allocation of available memory bandwidth to the various consumers, with a view to avoiding situations wherein desired memory bandwidth is not available for timing-critical or bandwidth-sensitive applications. However, such conservative approaches may lead to underutilization of the available bandwidth. Accordingly, there is a need in the art for improved allocation of available memory bandwidth.
Exemplary aspects of the invention are directed to systems and methods for distributed allocation of bandwidth for accessing a shared memory. A memory controller, which controls access to the shared memory, receives requests for bandwidth for accessing the shared memory from a plurality of requesting agents. The memory controller includes a saturation monitor to determine a saturation level of the bandwidth for accessing the shared memory. A request rate governor at each requesting agent determines a target request rate for the requesting agent based on the saturation level and a proportional bandwidth share allocated to the requesting agent, the proportional share based on a Quality of Service (QoS) class of the requesting agent.
For example, an exemplary aspect is directed to a method for distributed allocation of bandwidth, the method comprising: requesting bandwidth for accessing a shared memory, by a plurality of requesting agents; determining a saturation level of the bandwidth for accessing the shared memory, in a memory controller for controlling access to the shared memory; and determining target request rates at each requesting agent based on the saturation level and a proportional bandwidth share allocated to the requesting agent based on a Quality of Service (QoS) class of the requesting agent.
Another exemplary aspect is directed to an apparatus comprising: a shared memory, a plurality of requesting agents configured to request access to the shared memory, and a memory controller configured to control access to the shared memory, wherein the memory controller comprises a saturation monitor configured to determine a saturation level of bandwidth for access to the shared memory. The apparatus also comprises a request rate governor configured to determine a target request rate at each requesting agent based on the saturation level and a proportional bandwidth share allocated to the requesting agent based on a Quality of Service (QoS) class of the requesting agent.
Another exemplary aspect is directed to an apparatus comprising: means for requesting bandwidth for accessing a shared memory, means for controlling access to the shared memory comprising means for determining a saturation level of the bandwidth for accessing the shared memory, and means for determining a target request rate at each means for requesting based on the saturation level and a proportional bandwidth share allocated to the means for requesting based on a Quality of Service (QoS) class of the means for requesting.
Yet another exemplary aspect is directed to a non-transitory computer readable storage medium comprising code, which, when executed by a processor, causes the processor to perform operations for distributed allocation of bandwidth, the non-transitory computer readable storage medium comprising code for requesting bandwidth for accessing a shared memory, by a plurality of requesting agents, code for determining a saturation level of the bandwidth for accessing the shared memory, at a memory controller for controlling access to the shared memory, and code for determining target request rates at each requesting agent based on the saturation level and a proportional bandwidth share allocated to the requesting agent based on a Quality of Service (QoS) class of the requesting agent.
The accompanying drawings are presented to aid in the description of aspects of the invention and are provided solely for illustration of the aspects and not limitation thereof.
Aspects of the invention are disclosed in the following description and related drawings directed to specific aspects of the invention. Alternate aspects may be devised without departing from the scope of the invention. Additionally, well-known elements of the invention will not be described in detail or will be omitted so as not to obscure the relevant details of the invention.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects. Likewise, the term “aspects of the invention” does not require that all aspects of the invention include the discussed feature, advantage or mode of operation.
The terminology used herein is for the purpose of describing particular aspects only and is not intended to be limiting of aspects of the invention. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising,” “includes,” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Further, many aspects are described in terms of sequences of actions to be performed by, for example, elements of a computing device. It will be recognized that various actions described herein can be performed by specific circuits (e.g., application specific integrated circuits (ASICs)), by program instructions being executed by one or more processors, or by a combination of both. Additionally, these sequences of actions described herein can be considered to be embodied entirely within any form of computer readable storage medium having stored therein a corresponding set of computer instructions that upon execution would cause an associated processor to perform the functionality described herein. Thus, the various aspects of the invention may be embodied in a number of different forms, all of which have been contemplated to be within the scope of the claimed subject matter. In addition, for each of the aspects described herein, the corresponding form of any such aspects may be described herein as, for example, “logic configured to” perform the described action.
Exemplary aspects of this disclosure are directed to processing systems comprising at least one shared resource such as a shared memory, shared among two or more consumers or requesting agents of the shared resource. In one example, the requesting agents can be processors, caches, or other agents which may access the shared memory. The requests may be forwarded to a memory controller which controls access to the shared memory. In some instances, the requesting agents may also be referred to as sources from which requests are generated or forwarded to the memory controller. The requesting agents may be grouped into classes with a Quality of Service (QoS) associated with each class.
According to exemplary aspects, bandwidth for the shared memory may be allocated in units of proportional shares of the total bandwidth to each QoS class, such that the bandwidth for each QoS class is sufficient to at least satisfy the QoS metrics for that QoS class. The parameter βi, where the “i” index identifies a QoS class to which a requesting agent belongs, is referred to as a “proportional share weight” for the QoS class (in other words, the proportional share weight indicates the proportional share of the bandwidth assigned to the agent based on the respective QoS of the class to which the agent belongs). In correspondence to the proportional share weight βi per class, a parameter αi is also defined per class, wherein for a QoS class identified by “i”, αi is referred to as a “proportional share stride” for the QoS class. In exemplary aspects, the proportional share stride αi of a QoS class is the inverse of the proportional share weight βi of the QoS class. The proportional share stride αi of the QoS class is representative of a relative cost of servicing a request from the QoS class.
When excess bandwidth is available, one or more QoS classes may be allotted the excess bandwidth, once again in proportion, based on the respective proportional share parameters αi or βi of the QoS classes. Exemplary aspects of proportional bandwidth distribution are designed to guarantee the QoS for each class, while avoiding problems of underutilization of excess bandwidth.
In an aspect, a saturation monitor can be associated with the memory controller for the shared resource or shared memory. The saturation monitor can be configured to output a saturation signal indicating one or more levels of saturation. The saturation level may provide an indication of the number of outstanding requests to be serviced during a given interval of time, and can be measured in various ways, including, for example, based on a count of the number of requests in an incoming queue waiting to be scheduled by the memory controller for accessing the shared memory, a number of requests which are denied access or are rejected from being scheduled for access to the shared resource due to lack of bandwidth, etc. The given interval may be referred to as an epoch, and can be measured in units of time, e.g., microseconds, or a number of clock cycles, for example. The length of the epoch can be application specific. The saturation monitor can output a saturation signal at one of one or more levels, for example, a level indicating an unsaturated state, and one or more levels indicating low, medium, or high saturation states of the shared resource.
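As a concrete illustration, the epoch-based count-and-threshold behavior described above might be sketched as follows in Python. The class name, the specific thresholds, and the four-level encoding are illustrative assumptions rather than the disclosure's implementation:

```python
from enum import Enum

class SaturationLevel(Enum):
    UNSATURATED = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3

class SaturationMonitor:
    """Counts requests that could not be scheduled during an epoch and,
    at each epoch boundary, maps the count to a saturation level.
    Thresholds here are illustrative, not taken from the disclosure."""
    def __init__(self, low=4, medium=16, high=64):
        self.low, self.medium, self.high = low, medium, high
        self.rejected = 0

    def on_request_rejected(self):
        # called whenever a request is rejected or left unscheduled
        self.rejected += 1

    def end_of_epoch(self):
        # compare the epoch's count to the thresholds, then reset it
        count, self.rejected = self.rejected, 0
        if count >= self.high:
            return SaturationLevel.HIGH
        if count >= self.medium:
            return SaturationLevel.MEDIUM
        if count >= self.low:
            return SaturationLevel.LOW
        return SaturationLevel.UNSATURATED
```

A monitor tracking occupancy of an overflow queue instead of a rejection count would differ only in what increments the counter.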
At each requesting agent, a governor is provided, to adjust the rate at which requests are generated from the agent, based on the saturation signal. The governors implement a governor algorithm which is distributed across the agents, in the sense that at every epoch, each governor recalculates a target request rate of its corresponding requesting agent without having to communicate with other governors of other requesting agents. In exemplary aspects, each governor can calculate the target request rate of its respective requesting agent based on knowledge of the epoch boundaries and the saturation signal, without communication with the other requesting agents.
With reference now to
Each time processors 102a-b request data from private caches 104a-b, respectively, and there is a miss in the respective private caches 104a-b, the private caches 104a-b will forward the requests to memory controller 106 for the requested data to be fetched from memory 112 (e.g., in an example where the request is a read request). The requests from private caches 104a-b are also referred to as incoming memory requests from the perspective of memory controller 106. Since memory 112 may be located off-chip, or even in on-chip implementations may involve long wires/interconnects for transfer of data, the interfaces to memory 112 (e.g., interface 114) may have bandwidth restrictions which may limit the number of incoming memory requests which can be serviced at any given time. Memory controller 106 may implement queuing mechanisms (not shown specifically) for queuing the incoming memory requests before they are serviced. If the queuing mechanisms are full or saturated, some incoming memory requests may be rejected in one or more ways described below.
Memory controller 106 is shown to include saturation monitor 108, wherein saturation monitor 108 is configured to determine a saturation level. The saturation level can be determined in various ways. In one example, saturation can be based on a count of the number of incoming memory requests from private caches 104a-b which are rejected or sent back to a requesting source as not being accepted for servicing. In another example, the saturation level can be based on a count or number of outstanding requests which are not scheduled access to memory 112 due to unavailability of bandwidth for access to memory 112. For example, the saturation level can be based on a level of occupancy of an overflow queue maintained by memory controller 106 (not explicitly shown), wherein the overflow queue can maintain requests which cannot be immediately scheduled access to memory 112 due to unavailability of bandwidth for access to memory 112 (e.g., rather than being rejected and sent back to the requesting source). Regardless of the specific manner in which the saturation level is determined, the count (e.g., of rejections or occupancy of the overflow queue) at the end of every epoch can be compared to a pre-specified threshold. If the count is greater than or equal to the threshold, saturation monitor 108 may generate a saturation signal (shown as “SAT” in
With continuing reference to
In terms of the proportional share weight βi, the proportional bandwidth share for each requesting agent is provided by a bandwidth share weight assigned to the requesting agent divided by a sum of the bandwidth share weights assigned to each of the plurality of requesting agents. For example, the proportional share for each QoS class (or correspondingly, an agent belonging to the respective QoS class, e.g., for private caches 104a-b based on their respective QoS classes) can be expressed in terms of the assigned bandwidth share weight for the QoS class or corresponding agent, divided by the sum of all of the respective assigned bandwidth share weights, which can be represented as shown in Equation (1) below,

ProportionalSharei=βi/Σ∀iβi Equation (1)

wherein the denominator Σ∀iβi represents the sum of the bandwidth share weights for all of the QoS classes.
It is noted that the calculation of the proportional share can be simplified from Equation (1) by using the proportional share strides αi instead of the proportional share weights βi. This is understood by recognizing that since αi is the inverse of βi, αi can be expressed as an integer, which means that division (or multiplication by a fraction) may be avoided during run time or on the fly to determine the cost of servicing a request. Thus, in terms of proportional share strides αi, the proportional bandwidth share for each requesting agent is provided by a bandwidth share stride assigned to the requesting agent multiplied by a sum of the bandwidth share strides assigned to each of the plurality of requesting agents.
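To make the weight/stride relationship concrete, the following Python sketch computes proportional shares from per-class weights and derives integer strides. The fixed-point scale factor is an assumption introduced here so that αi remains an integer; it is not part of the disclosure:

```python
STRIDE_SCALE = 1 << 10  # assumed fixed-point scale keeping strides integral

def proportional_shares(weights):
    # Equation (1): share_i = beta_i / sum of all beta
    total = sum(weights.values())
    return {cls: beta / total for cls, beta in weights.items()}

def strides_from_weights(weights):
    # alpha_i is the inverse of beta_i; the common scale factor avoids
    # fractions, so the cost of each request can be charged with integer
    # addition only, rather than run-time division
    return {cls: STRIDE_SCALE // beta for cls, beta in weights.items()}
```

With weights {A: 4, B: 2, C: 2}, class A receives half the bandwidth, and its stride (per-request cost) is half that of classes B and C.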
Regardless of the specific mechanism used to calculate the respective proportional shares, request rate governors 110a-b may be configured to pace or throttle the rate at which memory requests are generated by private caches 104a-b in accordance with the target request rate. In an example, request rate governors 110a-b can be configured to adjust the target request rate by a process comprising multiple phases, e.g., four phases, in lockstep with one another, wherein the target request rate may vary based on the phase. Transitions between these phases and corresponding adjustments to the respective target request rate can occur at time intervals such as epoch boundaries. Running in lockstep can allow request rate governors 110a-b to quickly reach equilibrium such that request rates for all private caches 104a-b are in proportion to the corresponding bandwidth shares, which can lead to efficient memory bandwidth utilization. In exemplary implementations of rate adjustment based on the saturation signal SAT and request rate governors 110a-b, additional synchronizers are not required.
With reference now to
As shown in
In
(Equivalently, in terms of stride, Equation (2) can be represented as Equation (2′):

Stride=N*αi Equation (2′))
In one aspect, the upper bound and lower bound that each of request rate governors 110a-b obtains for its new target rate can be the last two target rates in the iterative decreasing of the target rate. As an illustration, assuming an nth iteration of the Rapid Throttle phase in Block 204 results in memory controller 106 being unsaturated, the target rate at the previous (n−1)th iteration can be set as the upper bound and the target rate at the nth iteration can be set as the lower bound. Example operations in the Rapid Throttle phase of Block 204 are described in
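In stride terms (Equation (2′)), the Rapid Throttle search might look like the sketch below. Here `saturated_at` stands in for running one epoch at a given stride and sampling SAT, and the doubling of N per iteration is an assumption consistent with the "rapid" characterization; the figures define the exact progression:

```python
def rapid_throttle(alpha_i, saturated_at, n=1):
    """Lower the rate (raise the stride) until SAT deasserts, returning
    the stride of the last saturated epoch (upper-bound rate) and the
    first unsaturated one (lower-bound rate)."""
    prev = stride = n * alpha_i      # Equation (2'): Stride = N * alpha_i
    while saturated_at(stride):
        prev = stride
        n *= 2                       # assumed progression of N
        stride = n * alpha_i
    return prev, stride
```

The returned pair supplies the bounds that the Fast Recovery phase refines.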
Once the upper bound and lower bounds are established in Block 204, process 200 can proceed to Block 206 comprising a second phase, referred to as the “Fast Recovery” phase. In the Fast Recovery phase, the target rates generated by each of request rate governors 110a-b are quickly refined, e.g., using a binary search process, to a target rate which falls within the upper bound and lower bound, and has the highest value at which the saturation signal SAT from saturation monitor 108 does not indicate saturation. The binary search process may, at each iteration, change the target rate in a direction (i.e., up or down) based on whether the previous iteration resulted in (or removed) saturation of memory controller 106. In this regard, the pair of Equations (3) below may be applied if the previous iteration resulted in saturation of memory controller 106, and Equation (4) below may be applied if the previous iteration resulted in an unsaturated state of memory controller 106:
PrevRate=Rate; and Rate=Rate−(PrevRate−Rate) Equations (3)
Rate=0.5*(Rate+PrevRate) Equation (4)
(Equivalently, the counterpart Equations (3′) and (4′) are provided when stride is used instead of rate as shown in algorithm 650 of
In an aspect, operations at Block 206 can be closed ended, i.e., request rate governors 110a-b can exit the Fast Recovery phase after a particular number “S” (e.g., 5) of iterations of the binary search are performed. Examples of operations at 206 in the Fast Recovery phase are described in greater detail with reference to
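In stride terms, the bounded binary search of the Fast Recovery phase might be sketched as below. `saturated_at` again stands in for an epoch of operation at the candidate stride, and the integer-midpoint refinement is a simplification of the rate-form Equations (3)/(4):

```python
def fast_recovery(sat_stride, unsat_stride, saturated_at, s_iters=5):
    """Binary-search between the Rapid Throttle bounds for the smallest
    stride (i.e., highest rate) that leaves the controller unsaturated.
    Closed ended: always exits after s_iters iterations."""
    for _ in range(s_iters):
        mid = (sat_stride + unsat_stride) // 2
        if saturated_at(mid):
            sat_stride = mid      # still saturated: move toward lower rates
        else:
            unsat_stride = mid    # headroom exists: try a higher rate
    return unsat_stride           # highest rate seen without saturation
```

Because the iteration count is fixed, all governors leave this phase at the same epoch boundary, preserving lockstep without explicit synchronization.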
Referring to
Therefore, in an aspect, upon the Fast Recovery operations at 206 refining the target rates for governors 110a-b, process 200 can proceed to Block 208 comprising a third phase which may also be referred to as the “Active Increase” phase. In the Active Increase phase, request rate governors 110a-b may seek to determine whether more memory bandwidth has become available. In this regard, the Active Increase phase can include a step-wise increase in the target rate, at each of request rate governors 110a-b, which may be repeated until the saturation signal SAT from saturation monitor 108 indicates saturation of memory controller 106. Each iteration of the step-wise increase can enlarge the magnitude of the step. For example, the magnitude of the step may be increased as defined by Equation (5) below, wherein N is an iteration number, starting at N=1:
Rate=Rate+(βi*N) Equation (5)

(Equivalently, in terms of stride: Stride=Stride−αi*N Equation (5′))
Examples of operations at Block 208 in the Active Increase phase are described in greater detail in reference to
With reference to
However, in an aspect, to provide increased stability, process 200 can first proceed to Block 210 comprising a fourth phase referred to as a “Reset Confirmation” phase to confirm that the saturation signal SAT which caused the exit from the Active Increase phase in Block 208 was likely due to a material change in conditions, as opposed to a spike or other transient event. Stated differently, operations in the Reset Confirmation phase in Block 210 can provide a qualification of the saturation signal SAT as being non-transient, and if confirmed, i.e., if the qualification of the saturation signal SAT as being non-transient is determined to be true in Block 210, then process 200 follows the “yes” path to Block 212 referred to as a “Reset” phase, and then returns to operations in the Rapid Throttle phase in Block 204. In an aspect the Active Increase phase operations in Block 208 can also be configured to step down the target rate by one increment when exiting to the Reset Confirmation phase operations in Block 210. One example step down may be according to Equation (6) below:
Rate=PrevRate−βi Equation (6)

(Equivalently, in terms of stride: Stride=PrevStride+αi Equation (6′))
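Equations (5) and (6) together suggest the following sketch of the Active Increase probing in rate terms, with `saturated_at` again standing in for an epoch of operation at the candidate rate:

```python
def active_increase(rate, beta_i, saturated_at):
    """Probe for newly available bandwidth with growing steps
    (Equation (5): Rate += beta_i * N, for N = 1, 2, 3, ...); on
    saturation, step the rate back down by one increment per
    Equation (6) before moving to Reset Confirmation."""
    n = 1
    while not saturated_at(rate + beta_i * n):
        rate = rate + beta_i * n   # Equation (5): accept the larger rate
        n += 1
    return rate - beta_i           # Equation (6): PrevRate - beta_i
```

Because each step is a multiple of the class's own βi, higher-weight classes reclaim excess bandwidth proportionally faster, preserving the proportional allocation during the probe.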
In an aspect, if operations in the Reset Confirmation phase at Block 210 indicate that the saturation signal SAT which caused the exit from the Active Increase phase operations in Block 208 was due to a spike or other transient event, process 200 may return to the Active Increase operations in Block 208. Corresponding Reset Confirmation phase at Block 260 is shown in
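The disclosure does not spell out the qualification test itself. One plausible sketch, assuming "non-transient" means SAT persisting across k consecutive epochs (both k and the persistence test are assumptions), is:

```python
def confirm_reset(sat_samples, k=2):
    """Qualify SAT as non-transient before resetting to Rapid Throttle:
    require saturation in the k most recent epoch samples.  A lone spike
    fails the test, returning the governor to Active Increase."""
    return len(sat_samples) >= k and all(sat_samples[-k:])
```

Under this sketch, a single-epoch spike sends the governor back to Active Increase, while sustained saturation triggers the Reset phase and a return to Rapid Throttle.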
Referring to
Referring to
If at 602 SAT indicates that memory controller 106 is not saturated, the Binary Search Step algorithm 600 can proceed to the step up operation at 606 which increases the target rate according to Equation (4). The Binary Search Step algorithm 600 can then proceed to 608 where it can increment N by 1, then at 610 can return to the Fast Recovery phase algorithm 500. Upon detecting at 504 that N has reached S, the Fast Recovery phase algorithm 500 can proceed to 506, to initialize N to integer 1 and set PrevRate to the last iteration value of Rate, and then jump to the Active Increase phase in Block 208 of
Referring to
Referring to
Referring to
Referring to
Referring to
At epoch boundary 1110, SAT is absent. As a result, as shown by 304 and 306 in
Referring to
At the epoch boundary 1124, SAT appears and, in response, the request rate governors 110 transition to the Reset Confirmation operations in Block 210 of
The interval following epoch boundary 1128 over which request rate governors 110a-b again remain in the Active Increase phase operations at Block 208 may be referred to as the “Active Increase phase 1130.” Example operations over the Active Increase phase 1130 will again be described in reference to
At the epoch boundary 1134, SAT appears and, in response, the request rate governors 110 again transition to the
Referring to
It will be understood that within an epoch, controlled rate caches, such as private caches 104a-b, can be given “credit” for brief periods of inactivity, since Cnext can be strictly additive. Accordingly, if a private cache 104a-b goes through a period of inactivity such that Cnow>>Cnext, that private cache 104a-b can be allowed to issue a burst of requests without any throttling while Cnext catches up. Request rate governors 110a-b can be configured such that, at the end of each epoch, Cnext can be set equal to Cnow. In another implementation, request rate governors 110a-b can be configured such that, at each epoch boundary, Cnext can be adjusted by N*(Stride−PrevStride), which makes it appear as if the prior N (e.g., 16) requests were issued at the new stride/rate rather than the old stride/rate. These features can provide a certainty that any built-up credit from the previous epoch does not spill into the new epoch.
In some aspects, pacer 1206 may be provided to allow a slack in the target rate enforced. The slack allows each requesting agent or class to build up a form of credit during idle periods when requests are not sent by the requesting agents. The requesting agents can later, e.g., in a future time window, use the accumulated slack to generate a burst of traffic or requests for access which would still meet the target rate. In this manner, the requesting agents may be allowed to send out bursts, which can lead to performance improvements. Pacer 1206 may enforce the target request rate by determining bandwidth usage over time windows or periods of time which are inversely proportional to the target request rate. Unused accumulated bandwidth from a previous period of time can be used in a current period of time to allow a burst of one or more requests even if the burst causes the request rate in the current period of time to exceed the target request rate.
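A minimal sketch of the pacer's credit scheme, using the Cnow/Cnext bookkeeping described above, might look as follows; Cnow is taken to be a free-running cycle count, and the method names are illustrative:

```python
class Pacer:
    """Each issued request advances Cnext by the current stride; a request
    may issue whenever Cnext <= Cnow.  Idle time (Cnow running ahead of
    Cnext) accumulates slack that later permits a burst while still
    meeting the target rate over the window."""
    def __init__(self, stride):
        self.stride = stride
        self.c_next = 0

    def try_issue(self, c_now):
        if self.c_next <= c_now:
            self.c_next += self.stride   # charge this request's cost
            return True
        return False                     # throttled until Cnow catches up

    def end_of_epoch(self, c_now):
        # per the text, set Cnext equal to Cnow so that built-up credit
        # does not spill into the new epoch
        self.c_next = c_now
```

A larger stride (lower-weight class) spaces requests further apart, so proportionality is enforced per request without any communication between governors.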
In some aspects, pacer 1206 can be configured to provide throttling of private caches 104a-b according to the target request rate as discussed above. In an aspect, algorithm logic 1204 can be configured to receive SAT from saturation monitor 108, and perform each of the four phase processes described in reference to
Referring to
The
Referring to
For example, in one aspect, a scaling feature may be provided, configured to scale the target rate by the ratio between a miss rate of private caches 104a-b and a miss rate of shared cache 1302 for requests generated by processors 102a-b. The ratio can be expressed as follows:
In an aspect, the rate can be expressed as the number of requests issued over a fixed window of time, which can be arbitrarily termed “W.” In an aspect, W can be set to be the latency of a memory request when the bandwidth of memory controller 106 is saturated. Accordingly, at saturation, RateMAX can be equal to the maximum number of requests that can be concurrently outstanding from a private cache 104a-b. This number, as is known in the related art, can be equal to the number of Miss Status Holding Registers (MSHRs) (not separately visible in
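The elided ratio might be written as the private-cache miss rate over the shared-cache miss rate; the sketch below, including the orientation of the ratio and the function names, is an assumption consistent with the surrounding text:

```python
def scaled_target_rate(target_rate, private_miss_rate, shared_miss_rate):
    # Scale the governor's target by the ratio between the private-cache
    # miss rate and the shared-cache miss rate (orientation assumed), so
    # that throttling at the private cache reflects traffic that actually
    # reaches memory
    return target_rate * (private_miss_rate / shared_miss_rate)

def rate_max(num_mshrs):
    # With the window W set to the saturated memory latency, at most one
    # request per Miss Status Holding Register can be outstanding in a
    # window, so RateMAX equals the number of MSHRs
    return num_mshrs
```

For example, if only a fraction of private-cache misses also miss in shared cache 1302, the scaled target permits correspondingly more private-cache requests for the same memory bandwidth.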
Referring to
Accordingly, it will be appreciated that exemplary aspects include various methods for performing the processes, functions and/or algorithms disclosed herein. For example,
Block 1402 comprises requesting, by a plurality of requesting agents (e.g., private caches 104a-b), bandwidth for accessing a shared memory (e.g., memory 112).
Block 1404 comprises determining a saturation level (saturation signal SAT) of bandwidth for accessing the shared memory in a memory controller (e.g., memory controller 106) for controlling access to the shared memory (e.g., based on a count of the number of outstanding requests which are not scheduled access to the shared memory due to unavailability of the bandwidth for access to the shared memory).
Block 1406 comprises determining target request rates at each requesting agent (e.g., at request rate governors 110a-b) based on the saturation level and a proportional bandwidth share allocated to the requesting agent based on a Quality of Service (QoS) class of the requesting agent. For example, the saturation level can indicate one of an unsaturated state, low saturation, medium saturation, or high saturation. In some aspects, the proportional bandwidth share for each requesting agent is provided by a bandwidth share weight assigned to the requesting agent divided by a sum of the bandwidth share weights assigned to each of the plurality of requesting agents, while in some aspects, the proportional bandwidth share for each requesting agent is provided by a bandwidth share stride assigned to the requesting agent multiplied by a sum of the bandwidth share strides assigned to each of the plurality of requesting agents. Further, method 1400 can also comprise throttling issuance of requests from a requesting agent for access to the shared memory, for enforcing the target request rate at the requesting agent, and the saturation level may be determined at epoch boundaries, as discussed above.
In a particular aspect, input device 1530 and power supply 1544 can be coupled to the system-on-chip device 1522. Moreover, in a particular aspect, as illustrated in
It will be understood that the proportional bandwidth allocation according to exemplary aspects, and as shown in
Those of skill in the art will appreciate that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Further, those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the aspects disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The methods, sequences and/or algorithms described in connection with the aspects disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
Accordingly, an aspect of the invention can include a computer readable media embodying a method for bandwidth allocation of shared memory in a processing system. Accordingly, the invention is not limited to illustrated examples and any means for performing the functionality described herein are included in aspects of the invention.
While the foregoing disclosure shows illustrative aspects of the invention, it should be noted that various changes and modifications could be made herein without departing from the scope of the invention as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the aspects of the invention described herein need not be performed in any particular order. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
The present application for patent claims the benefit of U.S. Provisional Application No. 62/258,826, entitled “A METHOD TO ENFORCE PROPORTIONAL BANDWIDTH ALLOCATIONS FOR QUALITY OF SERVICE,” filed Nov. 23, 2015, assigned to the assignee hereof, and expressly incorporated herein by reference in its entirety.