Devices (e.g., publicly available servers such as public clouds) provide server resources (e.g., applications, virtual machines, execution entities, etc.) to users, and may allocate bandwidth for such server resources. For example, a user of a particular resource may have a predefined entitlement to a predefined amount (or portion) of such bandwidth.
According to one general aspect, a system may include micro-schedulers that control bandwidth allocation for clients, each client subscribing to a respective predefined portion of bandwidth of an outgoing communication link. A macro-scheduler controls the micro-schedulers, by allocating the respective subscribed portion of bandwidth associated with each respective client that is active, by a predefined first deadline, with residual bandwidth that is unused by the respective clients being shared proportionately among respective active clients by a predefined second deadline, while minimizing coordination among micro-schedulers by the macro-scheduler periodically adjusting respective bandwidth allocations to each micro-scheduler.
According to another aspect, bandwidth allocation is controlled, by a macro-controller, for users of respective resources that are respectively configured to transmit to a common outgoing communication link having a predefined maximum outgoing transmission bandwidth. Each of the respective users subscribes to a respective predefined portion of the outgoing transmission bandwidth of the outgoing communication link. Controlling the bandwidth allocation includes determining a periodic bit rate of outgoing transmissions at the outgoing communication link, for a first predetermined temporal interval, and determining whether the outgoing communication link is currently congested, by comparing the periodic bit rate of outgoing transmissions at the outgoing communication link with a predefined congestion rate value. If the outgoing communication link is determined as being currently congested, a respective independent max-cap queue is assigned to each of the plurality of respective users of the plurality of respective resources that are currently active, and a respective amount is allocated up to the respective subscribed portion of bandwidth associated with each respective user that is active, by a predefined first deadline, with residual bandwidth that is unused by the respective users being shared proportionately among respective active users by a predefined second deadline. Respective bandwidth allocations to each of the plurality of respective users of the plurality of respective resources are periodically adjusted, to provide an approximation of fair queuing.
According to another aspect, a computer program product includes a computer-readable storage medium storing executable instructions that cause at least one computing device to control bandwidth allocation for execution entities that are respectively configured to transmit to a common outgoing communication link having a predefined maximum outgoing transmission bandwidth. Each of the respective execution entities subscribes to a respective predefined portion of the outgoing transmission bandwidth of the outgoing communication link. Controlling the bandwidth allocation includes controlling a plurality of micro-schedulers, each of the micro-schedulers controlling bandwidth allocation for the respective execution entities. Controlling the plurality of micro-schedulers includes, for each respective execution entity that is active, allocating the respective subscribed portion of bandwidth associated with each respective execution entity that is active, by a predefined first deadline, with residual bandwidth that is unused by the respective execution entities being shared proportionately among respective active execution entities by a predefined second deadline, while minimizing coordination among micro-schedulers by periodically adjusting respective bandwidth allocations to each micro-scheduler.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.
I. Introduction
Devices provide resources to users, and may allocate bandwidth for such resources (e.g., data communication bandwidth). For example, a user of a particular resource may have a predefined entitlement to a predefined amount (or portion) of such bandwidth. For example, publicly available servers (e.g., public clouds) provide server resources to users, and may allocate virtual machines (VMs) to such users, as well as allocating bandwidth for each VM. For example, a user of a VM may have entitlement to a predefined amount (or portion) of such bandwidth. For example, the user may have a subscription that entitles the user to predefined amounts of resources (e.g., the bandwidth).
In this context, “cloud” may refer to a variety of computing concepts that involve one or more devices or computers that may be connected through a communication network such as the Internet. For example, “cloud computing” may refer to distributed computing over a network, and may involve an ability to run a program or application on multiple connected computers. One skilled in the art of computing will appreciate that example techniques discussed herein may use a “cloud” or any other remote device (e.g., a remote server, or simpler types of remote devices), without departing from the spirit of the discussion herein.
In this context, a “virtual machine” (VM) may refer to an emulation of a particular computer system. For example, virtual machines may operate based on the computer architecture and functions of a real or hypothetical computer, and their implementations may involve specialized hardware, software, or a combination of both. In this context, “bandwidth” may refer to a measurement of bit-rate of available or consumed data communication resources.
Public clouds such as AZURE and AMAZON's EC2 allocate not only Virtual Machines (VMs), but also bandwidth for each VM based on user entitlement (e.g., entitlement via user/customer subscription/agreement/payment). Thus, a provider may need to control bandwidth allocation at a server among user VMs according to the respective amounts of bandwidth to which each user is entitled (e.g., via subscription). While each user can be allocated a fixed amount of bandwidth to which they are entitled, this may be wasteful because many users may not use their bandwidth entitlement amount for long periods of time. In accordance with example techniques discussed herein, this unused (residual) bandwidth can be proportionately redistributed among other non-idle (i.e., active) user VMs. Fair bandwidth allocation or fair queuing techniques may be used in many situations to handle similar issues; however, VMs involve several characteristics that may require handling different from that of conventional techniques.
In this context, “fair bandwidth allocation” in Internet gateways refers to a problem of determining how to control allocation of network capacity between users that share an Internet connection. In this context, “fair queuing” refers to scheduling techniques used by process and network schedulers, e.g., to allow multiple packet flows to fairly share the link capacity. For example, in contrast with conventional first in first out (FIFO) queuing, with fair queuing, a high-data-rate flow (e.g., involving large or many data packets) cannot take more than its fair share of the link capacity.
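The notion of a fair share referenced above can be illustrated with a short sketch. The following is provided for illustration only (the function name and list-based interface are assumptions, not part of any particular implementation): it computes a max-min fair allocation, in which no flow receives more than it demands, and capacity left unused by small flows is divided evenly among the still-unsatisfied flows.

```python
def max_min_fair_share(capacity, demands):
    """Allocate `capacity` across `demands` using max-min fairness:
    no flow gets more than it asks for, and capacity left over by
    small flows is split evenly among the still-unsatisfied ones."""
    allocations = [0.0] * len(demands)
    remaining = sorted(range(len(demands)), key=lambda i: demands[i])
    capacity_left = capacity
    while remaining:
        share = capacity_left / len(remaining)
        i = remaining[0]
        if demands[i] <= share:
            # Smallest demand fits within an equal share: satisfy it fully.
            allocations[i] = demands[i]
            capacity_left -= demands[i]
            remaining.pop(0)
        else:
            # Every remaining demand exceeds the equal share: split evenly.
            for j in remaining:
                allocations[j] = share
            break
    return allocations
```

For example, with a link capacity of 10 units and demands of 2, 8, and 8, the small flow receives its full 2 units and the remaining 8 units are split evenly, yielding allocations of 2, 4, and 4; a high-data-rate flow thus cannot take more than its fair share.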
For example, conventional fair queuing techniques may assume there is a tight feedback loop between transmission on the link and the scheduler implementing the allocation (e.g., in routers). However, this assumption is false in virtual switches, where the scheduler may be implemented in software while transmission is performed by a hardware network interface card (NIC). Crossing the boundary between software and hardware may be expensive, particularly at gigabit speeds; optimizations such as Large Send Offload and interrupt batching attempt to reduce the communication between software and hardware to once per large batch of packets, and thus, direct simulation of existing fair queuing algorithms (e.g., Deficit Round Robin) may exhibit inefficient performance.
In accordance with example techniques discussed herein, the bandwidth of each respective VM may be controlled independently based on the amount of bandwidth the VMs have been measured to send in the last (prior) scheduling period (e.g., immediately prior in time to a current scheduling period, or scheduling interval), of a plurality of scheduling periods corresponding to a plurality of temporal intervals associated with such scheduling. For example, such techniques may advantageously alleviate a need for a feedback loop between software and the physical NIC. Thus, such techniques may be simple, efficient, and agnostic to the implementation details of various different physical NICs.
Example techniques discussed herein may converge to fair allocations over a small number of scheduling intervals, and may minimize wasted bandwidth due to VMs that do not fully utilize their allocations.
Example techniques discussed herein are based only on local observations of transmissions (and not on an entire network—thus, there is no requirement of coordination with remote endpoints). Further, example techniques discussed herein may control traffic aggregation at the virtual port level and are agnostic to layer-3 and layer-4 protocols and traffic patterns (e.g., of Transmission Control Protocol (TCP)).
Example techniques discussed herein may advantageously provide approximations of fair bandwidth allocation with independent queues.
Example techniques discussed herein may advantageously provide fair queuing without a feedback loop between schedulers and transmitters (e.g., which may be advantageous when a scheduler is implemented via software and the transmitter is implemented via hardware).
Example techniques discussed herein may advantageously provide a control technique that converges to fair allocation of bandwidth over an advantageously small number of scheduling periods based on the amount actually transmitted by a VM's per flow queue, compared to the bandwidth the VM was allocated.
Example techniques discussed herein may advantageously provide parameters that control tradeoffs between convergence time to the amount of bandwidth to which a VM is entitled (e.g., subscribed) and the link utilization, as well as the burstiness of the bandwidth allocated to a VM.
Further, example techniques discussed herein may advantageously provide a mode in which no controls are applied (hence reducing software overhead) that optimizes for the expected case when there is no congestion on the link.
Example techniques discussed herein may advantageously provide minimal shared state among the different VMs, which may in turn minimize locking overhead.
In accordance with one example implementation, a module implemented using software may track utilization of the NIC by counting packets that pass through a network Quality of Service (QoS) module (e.g., counting the packets via a counter), and may periodically compute the bit rate of the packets based on this counter. For example, the bit rate may be periodically determined in accordance with a plurality of temporal intervals (or time periods). For example, such temporal intervals may be set at a predetermined value (e.g., 100 milliseconds (ms)).
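The counter-based rate determination described above may be sketched as follows. The class and method names are illustrative assumptions (not an actual QoS module API): bytes are counted on the packet path, and a periodic timer converts the count into a bit rate once per temporal interval.

```python
class RateMeter:
    """Illustrative sketch: track utilization by counting bytes that
    pass through the QoS module, then periodically compute a bit rate."""

    def __init__(self, interval_s=0.1):  # e.g., 100 ms temporal intervals
        self.interval_s = interval_s
        self.byte_count = 0
        self.last_bit_rate = 0.0

    def on_packet(self, size_bytes):
        # Packet path: only bump a counter (cheap enough to do per packet).
        self.byte_count += size_bytes

    def tick(self):
        # Timer path, once per interval: convert the bytes accumulated
        # over the interval into bits per second, then reset the counter.
        self.last_bit_rate = (self.byte_count * 8) / self.interval_s
        self.byte_count = 0
        return self.last_bit_rate
```

Because the packet path only increments a counter, the expensive work (division, threshold comparison) happens once per interval rather than once per packet.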
If the NIC is not congested, nothing is done to further control bandwidth allocations (e.g., no queue has a cap). For example, NIC “congestion” may be determined in accordance with a comparison of the determined bit rate, for a particular temporal interval, against a predetermined threshold value of congestion.
If the NIC is congested, a max-cap queue is assigned to each VM, and the max-cap values of these queues are actively adjusted every temporal interval (e.g., every 100 ms) to balance between bandwidth requirements and fair bandwidth allocations (i.e., amount of bandwidth each VM wants/requests vs. amount of bandwidth each VM is entitled/subscribed to receive) by the macro-scheduler. Each max-cap queue is managed by a micro-scheduler, independently of other micro-schedulers. The micro-scheduler associated with each max-cap queue enforces the respective max-cap value by scheduling packets through that queue so as to not exceed the max-cap value set by the macro-scheduler.
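The division of labor just described — independent per-queue enforcement by micro-schedulers, with the macro-scheduler periodically rewriting the caps — may be sketched as follows. This is a simplified illustration; the class names and the byte-budget enforcement shown are assumptions for exposition, not a description of an actual virtual switch.

```python
class MicroScheduler:
    """Enforces one queue's max-cap independently of all other queues:
    within an interval, packets are admitted only while the queue's
    byte budget (derived from its max-cap value) remains."""

    def __init__(self, max_cap_bps, interval_s=0.1):
        self.interval_s = interval_s
        self.set_cap(max_cap_bps)

    def set_cap(self, max_cap_bps):
        self.max_cap_bps = max_cap_bps
        # Bytes this queue may send in one interval at the capped rate.
        self.budget_bytes = max_cap_bps * self.interval_s / 8

    def try_send(self, size_bytes):
        if size_bytes <= self.budget_bytes:
            self.budget_bytes -= size_bytes
            return True   # packet scheduled within the max-cap
        return False      # over the cap: hold until the next interval


class MacroScheduler:
    """Periodically (e.g., every 100 ms) rewrites each micro-scheduler's
    max-cap value; this is the only coordination point between queues."""

    def __init__(self, micro_schedulers):
        self.micros = micro_schedulers

    def adjust(self, new_caps_bps):
        for micro, cap in zip(self.micros, new_caps_bps):
            micro.set_cap(cap)
```

Because each micro-scheduler consults only its own budget, packets can be admitted without cross-queue locking; caps change only at the macro-scheduler's periodic adjustment.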
If a VM has sent below its allocated rate in the last temporal interval (e.g., 100 ms), its max-cap value is reduced.
The residual bandwidth (bandwidth left over from VMs that do not use all of their min guarantees) is distributed to VMs that want to send more, as discussed below.
If a VM has sent substantially close to its allocated rate in the last temporal interval (e.g., 100 ms), its max-cap value is increased in proportion to its fair bandwidth allocation.
For example, by controlling the rate out of each VM and distributing only the residual bandwidth to VMs that need more bandwidth, it may be ensured that the sum of the max-cap values is less than or equal to the NIC's effective capacity. Thus, QoS controls distribution of the NIC's capacity across VMs.
In accordance with example techniques discussed herein, each max-cap queue may be adjusted to increase or reclaim its allocated bandwidth.
For example, any VM that needs bandwidth may be given at least its guaranteed bandwidth within a time that is specified by a configurable parameter (e.g., within 400 ms). Thus, bandwidth may be shared among clients such that every active client is eventually allocated its subscribed bandwidth (e.g., its entitlement amount of bandwidth).
For example, any unused bandwidth may be reclaimed within a time that is specified by a configurable setting (e.g., within 1000 ms) which can be tuned (e.g., depending on the burstiness of the traffic, and whether it is desired to reclaim residual bandwidth between bursts).
As shown in
For example, the clients 104a, . . . , 104n may be hosted on a single machine 114, and a host partition may include an operating system (OS) of the machine 114 to host the clients 104a, . . . , 104n, and may include a virtual switch (software) that acts like a switch with virtual ports to which the clients 104a, . . . , 104n are attached. The virtual switch includes functionality to ensure that the sum of the bandwidth allocated across all the clients 104a, . . . , 104n is less than (or equal to) the capacity of the outgoing link, and that the bandwidth is distributed among the clients proportionately to the respective client subscribed bandwidths.
As used herein, “client” may refer to a user, applications, virtual machines, or any type of execution entities that may use (or subscribe to, or be otherwise entitled to) outgoing bandwidth on a machine or device (e.g., a service provider server).
Features discussed herein are provided as example embodiments that may be implemented in many different ways that may be understood by one of skill in the art of computing, without departing from the spirit of the discussion herein. Such features are to be construed only as example embodiment features, and are not intended to be construed as limiting to only those detailed descriptions.
As further discussed herein,
A macro-scheduler 112 controls the plurality of micro-schedulers 102a, 102b, . . . , 102n. For example, for each respective client 104a, 104b, . . . , 104n that is active, the macro-scheduler 112 allocates the respective subscribed portion of bandwidth associated with each respective client 104a, 104b, . . . , 104n that is active, by a predefined first deadline, with residual bandwidth that is unused by the respective clients being shared proportionately among respective active clients 104a, 104b, . . . , 104n by a predefined second deadline, while minimizing coordination among micro-schedulers 102a, 102b, . . . , 102n by the macro-scheduler 112 periodically adjusting respective bandwidth allocations to each micro-scheduler 102a, 102b, . . . , 102n.
For example, the macro-scheduler 112 periodically adjusts the respective bandwidth allocations to each micro-scheduler 102a, 102b, . . . , 102n by accessing measurements of respective amounts of bandwidth used by each respective active client 104a, 104b, . . . , 104n in a predetermined temporal interval that is prior in time to a current temporal interval associated with a current periodic adjustment.
For example, the macro-scheduler 112 periodically adjusts the respective bandwidth allocations to each micro-scheduler 102a, 102b, . . . , 102n by capping respective bandwidth allocations at respective values of the respective subscribed portions of bandwidth for respective clients 104a, 104b, . . . , 104n that transmitted in respective amounts greater than the respective subscribed portions of bandwidth for the respective clients 104a, 104b, . . . , 104n, in the predetermined temporal interval that is prior in time to the current temporal interval associated with the current periodic adjustment.
For example, the macro-scheduler 112 periodically adjusts the respective bandwidth allocations to each micro-scheduler 102a, 102b, . . . , 102n by capping respective bandwidth allocations at respective values of respective current actual transmission amounts of bandwidth for respective clients 104a, 104b, . . . , 104n that transmitted in respective amounts substantially less than the respective subscribed portions of bandwidth for respective clients, in the predetermined temporal interval that is prior in time to the current temporal interval associated with the current periodic adjustment.
For example, the macro-scheduler 112 periodically adjusts the respective bandwidth allocations to each micro-scheduler 102a, 102b, . . . , 102n by determining the residual bandwidth that is unused by the respective clients 104a, 104b, . . . , 104n, after lowering one or more respective bandwidth allocations by capping the one or more respective bandwidth allocations based on actual transmission amounts, and proportionately allocating residual bandwidth that is unused by the respective clients 104a, 104b, . . . , 104n, among respective active clients 104a, 104b, . . . , 104n who are currently requesting more bandwidth allocation.
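The capping and residual-sharing behavior described in the preceding paragraphs may be illustrated with a sketch of one periodic macro-scheduler pass. The 85% under-use threshold and 110% shrink factor echo the example values discussed elsewhere herein; the function shape and the proportional-sharing rule shown are illustrative assumptions, not a definitive implementation.

```python
def adjust_allocations(capacity, subscribed, sent, allocated):
    """One periodic macro-scheduler pass (illustrative sketch):
    - clients sending at/above their subscription are capped at it,
    - clients well under their allocation are capped near actual use,
    - freed-up (residual) bandwidth is shared among clients that are
      still requesting more, in proportion to their subscriptions."""
    new_alloc = [0.0] * len(subscribed)
    wants_more = []
    for i in range(len(subscribed)):
        if sent[i] >= subscribed[i]:
            new_alloc[i] = subscribed[i]      # cap at subscribed portion
            wants_more.append(i)              # still eager to send more
        elif sent[i] < 0.85 * allocated[i]:
            new_alloc[i] = 1.10 * sent[i]     # reclaim unused bandwidth
        else:
            new_alloc[i] = allocated[i]       # near its cap: hold steady
            wants_more.append(i)
    residual = capacity - sum(new_alloc)
    if residual > 0 and wants_more:
        total_sub = sum(subscribed[i] for i in wants_more)
        for i in wants_more:
            # Proportionate share of the residual, keyed to subscription.
            new_alloc[i] += residual * subscribed[i] / total_sub
    return new_alloc
```

For example, on a 100-unit link with two clients each subscribed to 50 units, one sending at its full 50 and the other sending only 10, the under-using client is capped near 11 units while the active client receives its 50 plus roughly the 39-unit residual, and the sum of the caps never exceeds the link capacity.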
For example, each of the micro-schedulers 102a, 102b, . . . , 102n controls bandwidth allocation for respective virtual machines (VMs) of the respective clients 104a, 104b, . . . , 104n, each of the respective clients 104a, 104b, . . . , 104n subscribing to the respective predefined portion of bandwidth for use by the respective VMs, wherein the respective VMs of the respective clients 104a, 104b, . . . , 104n are hosted on a cloud server, wherein the outgoing communication link 108 includes an outgoing network communication link associated with one or more network interface cards (NICs) located on the cloud server.
For example, each of the micro-schedulers 102a, 102b, . . . , 102n controls bandwidth allocation for respective applications of the respective clients 104a, 104b, . . . , 104n, each of the respective clients 104a, 104b, . . . , 104n subscribing to the respective predefined portion of bandwidth for use by the respective applications, wherein the respective applications of the respective clients 104a, 104b, . . . , 104n are hosted on a cloud server, wherein the outgoing communication link 108 includes an outgoing network communication link associated with one or more network interface cards (NICs) located on the cloud server.
For example, the outgoing communication link 108 includes an outgoing network communication link associated with a network interface card (NIC).
For example, the macro-scheduler 112 controls the plurality of micro-schedulers 102a, 102b, . . . , 102n, providing an approximation of fair queuing, without using a feedback loop between the macro-scheduler 112 and a transmitter.
For example, the macro-scheduler 112 controls the plurality of micro-schedulers 102a, 102b, . . . , 102n, providing no bandwidth allocation controls when it is determined that the outgoing network communication link 108 is not congested.
For example, the macro-scheduler 112 controls the plurality of micro-schedulers 102a, 102b, . . . , 102n, providing an approximation of fair bandwidth allocation to the respective clients 104a, 104b, . . . , 104n, assigning respective independent queues to the respective clients 104a, 104b, . . . , 104n.
For example, a virtual switch may include a plurality of respective virtual ports, wherein a portion of the virtual ports communicate with the clients 104a, 104b, . . . , 104n, and at least one of the virtual ports communicates with the outgoing communication link 108.
For example, the respective virtual ports that respectively communicate with the plurality of respective clients 104a, 104b, . . . , 104n each communicate with a respective token bucket queue that controls respective transmission rates of outgoing packets for each respective client 104a, 104b, . . . , 104n.
For example, a “token bucket” technique may be based on an analogy of a fixed capacity bucket into which tokens (e.g., representing a unit of bytes or a single packet of predetermined size) may be added at a fixed rate. When a packet is to be checked for conformance to predetermined limits, the bucket may be inspected to determine whether it has sufficient tokens at that time. If so, the appropriate number of tokens (e.g., equivalent to the length of the packet in bytes) are removed (“cashed in”), and the packet is passed, e.g., for transmission. The packet does not conform if there are insufficient tokens in the bucket, and the contents of the bucket are not changed. Non-conformant packets can be treated in various ways, for example, they may be dropped, or they may be enqueued for subsequent transmission when sufficient tokens have accumulated in the bucket, or they may be transmitted, but marked as being non-conformant, possibly to be dropped subsequently if the network is overloaded.
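The token bucket conformance check described above may be sketched as follows (an illustrative sketch; the class interface is an assumption). Tokens accrue at the configured rate up to the bucket capacity; a conforming packet cashes in tokens equal to its length, while a non-conformant packet leaves the bucket unchanged.

```python
class TokenBucket:
    """Conformance-check sketch: tokens (in bytes) accrue at
    rate_bps / 8 bytes per second, up to capacity_bytes."""

    def __init__(self, rate_bps, capacity_bytes):
        self.fill_rate = rate_bps / 8.0   # bytes added per second
        self.capacity = capacity_bytes
        self.tokens = capacity_bytes      # start with a full bucket
        self.last_time = 0.0

    def conforms(self, packet_bytes, now):
        # Accrue tokens for the elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_time) * self.fill_rate)
        self.last_time = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes   # cash in tokens; packet passes
            return True
        return False                      # non-conformant: bucket unchanged
```

In this sketch the result is merely reported; as noted above, a caller could drop non-conformant packets, enqueue them until sufficient tokens accumulate, or mark and transmit them.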
As shown in
For example, the VMs 104a, . . . , 104n may be hosted on a single machine 114, and a host partition may include an operating system (OS) of the machine 114 to host the VMs 104a, . . . , 104n, and may include a virtual switch (software) that acts like a switch with virtual ports to which the VMs 104a, . . . , 104n are attached. The virtual switch includes functionality to ensure that the sum of the bandwidth allocated across all the VMs 104a, . . . , 104n is less than (or equal to) the capacity of the outgoing link, and that the bandwidth is distributed among the VMs proportionately to the respective VM subscribed bandwidths.
In an example bandwidth problem space (e.g., using hardware), hardware queues for each port hold the packets for sending, and in software there are layers that make it possible to determine when a packet has been sent. However, multiple threads pull packets out of the queues, so it is difficult to determine whether the packets are in the correct order. For these types of scenarios, it may not be practical to implement hardware schedulers in software.
Many conventional techniques may be used for this example system, for example, deficit round robin (DRR), start-time fair queuing, etc.
However, if the clients 104 of
For example, it may be too expensive to include fine-grained timers in software at microsecond granularity to perform the scheduling. For example, it may be infeasible or too expensive to include a real-time scheduler, as a user may not be able to afford having a VM switch scheduler invoked on every packet. Further, a single core may not transmit as fast as the NIC. As shown in
Example implications of such an example system may include:
Discussed below (and as shown in greater detail in
If the NIC is not in congestion mode, do nothing (no queue has a cap). The NIC is not in congestion mode if it has not crossed 90% of effective capacity, or if it has been below 80% of effective capacity for 120 s.
When the QoS first transitions into congestion mode, assign each queue a max cap of the queue's min guarantee.
When QoS is in congestion mode, all queues have a max-cap, and the sum of the max-cap values is less than or equal to the NIC's effective capacity.
Periodically (e.g., every 100 ms):
Set new allocated rate = min(target allocated rate, min guarantee).
For a queue with new allocated rate = min guarantee, distribute the residual bandwidth according to fair share, up to the target allocated rate.
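The congestion-mode entry/exit rule in the steps above (crossing 90% of effective capacity to enter; staying below 80% for 120 s to exit) amounts to a hysteresis state machine, which may be sketched as follows (the class and method names are illustrative assumptions):

```python
class CongestionDetector:
    """Hysteresis sketch of the congestion-mode rule: enter congestion
    mode on crossing 90% of effective capacity; leave it only after the
    measured rate stays below 80% for a full 120 s quiet period."""

    ENTER_FRAC, EXIT_FRAC, QUIET_S = 0.90, 0.80, 120.0

    def __init__(self, effective_capacity_bps):
        self.capacity = effective_capacity_bps
        self.congested = False
        self.below_since = None   # time the rate first dipped below 80%

    def update(self, bit_rate, now):
        if not self.congested:
            if bit_rate > self.ENTER_FRAC * self.capacity:
                self.congested = True
                self.below_since = None
        else:
            if bit_rate < self.EXIT_FRAC * self.capacity:
                if self.below_since is None:
                    self.below_since = now
                elif now - self.below_since >= self.QUIET_S:
                    self.congested = False
                    self.below_since = None
            else:
                self.below_since = None   # dip reset by renewed load
        return self.congested
```

The gap between the 90% entry and 80% exit thresholds prevents the QoS module from oscillating in and out of congestion mode when utilization hovers near a single threshold.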
As more general explanation, if the NIC is not congested, nothing is done to further control bandwidth allocations. If the NIC is congested, a max-cap queue is assigned to each VM. As discussed above, each max-cap queue is managed by a micro-scheduler, independently of other micro-schedulers. As discussed above, these queues are actively adjusted every temporal interval (e.g., every 100 ms) to balance between bandwidth requirements and fair bandwidth allocations (i.e., amount of bandwidth each VM wants/requests vs. amount of bandwidth each VM is entitled/subscribed to receive).
If a VM has sent below its allocated rate in the last temporal interval (e.g., 100 ms), its max-cap value is reduced.
The residual bandwidth (bandwidth left over from VMs that do not use all of their min guarantees) is distributed to VMs that want to send more: if a VM has sent substantially close to its allocated rate in the last temporal interval (e.g., 100 ms), its max-cap value is increased in proportion to its fair bandwidth allocation.
One skilled in the art of computing will appreciate that there may be many ways to efficiently control the bandwidth allocation discussed herein, without departing from the spirit of the discussion herein.
In this context, a “processor” may include a single processor or multiple processors configured to process instructions associated with a computing system. A processor may thus include one or more processors executing instructions in parallel and/or in a distributed manner. For example, the system 100A may include one or more processors (e.g., hardware processors). For example, the system 100A may include at least one tangible computer-readable storage medium storing instructions executable by the one or more processors, the executable instructions configured to cause at least one computing apparatus to perform operations associated with various example components included in the system 100A, as discussed herein. For example, the one or more processors may be included in the at least one computing apparatus. One skilled in the art of computing will understand that there are many configurations of processors and computing apparatuses that may be configured in accordance with the discussion herein, without departing from the spirit of such discussion.
In this context, a “component” or “module” may refer to instructions or hardware that may be configured to perform certain operations. Such instructions may be included within component groups of instructions, or may be distributed over more than one group. For example, some instructions associated with operations of a first component may be included in a group of instructions associated with operations of a second component (or more components). For example, a “component” herein may refer to a type of functionality that may be implemented by instructions that may be located in a single entity, or may be spread or distributed over multiple entities, and may overlap with instructions and/or hardware associated with other components.
According to an example embodiment, the system 100A may manage network communication between the system 100A and other entities that may communicate with the system 100A via at least one network. For example, the network may include at least one of the Internet, at least one wireless network, or at least one wired network. For example, the network may include a cellular network, a radio network, or any type of network that may support transmission of data for the system 100A.
As shown in
At 416, if the VM's send rate is less than 85% of the VM's current allocated rate (i.e., the VM is not using all of its allocated bandwidth), the VM's allocated rate is reduced to 110% of its current send rate. A ramp-up counter (RUStage) is decremented by 1 (until it reaches 0) each time it is determined that a VM is not using all of its allocated bandwidth.
At 418, if the VM's send rate is greater than 95% of the VM's current allocated rate (interpreted as the VM wanting more bandwidth than its current allocation), at 420, the ramp-up counter (RUStage) is increased (e.g., incremented by 1) each time it is determined that a VM wants more bandwidth (up to a predetermined, configurable number of iterations).
As shown in
At 426, for a second phase of ramp-up, the VM is given 50% more bandwidth, but not exceeding 10% of the link's speed. At 428, for a third phase of ramp-up, the VM is given 100% more bandwidth.
At 430, if the VM is neither using too much nor too little of its allocated bandwidth, the VM's computed target rate is set to the VM's allocated rate, and the ramp-up counter is decremented. At 432, if the VM is not transmitting, the VM is assigned a “small” max-cap value as its target rate (e.g., a max of its current target rate and 10 Mbps). For this example, the 10 Mbps is predetermined and configurable.
At 442, residual bandwidth is harvested from VMs that do not need all of their current target rate. The while loop is repeated until there is no more residual bandwidth to distribute, or until no VM needs more bandwidth. At 444, the final allocation values are set for the VM max-cap queues (e.g., by the macro-scheduler 112).
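The per-VM target-rate computation of this flowchart (reference numerals 416 through 432) may be sketched as follows. The 85%/95% thresholds, the 110% shrink factor, the 50% and 100% ramp-up bonuses, and the 10 Mbps idle floor are the example values given above; the 25% first-phase bonus, the treatment of the 10%-of-link-speed limit as a cap on the 50% bonus, and the idle-case use of the allocated rate are assumptions made for this sketch.

```python
def compute_target_rate(send_rate, allocated, ru_stage, link_speed):
    """One per-VM step of the flowchart (illustrative sketch).
    Returns (target_rate, new_ru_stage)."""
    IDLE_FLOOR = 10e6   # "small" max-cap for idle VMs, e.g., 10 Mbps
    if send_rate == 0:
        # 432: not transmitting: park at a small max-cap so the VM can
        # resume sending (simplified to use its allocated rate here).
        return max(allocated, IDLE_FLOOR), ru_stage
    if send_rate < 0.85 * allocated:
        # 416: not using its allocation: shrink toward actual use and
        # step the ramp-up counter back down (not below zero).
        return 1.10 * send_rate, max(ru_stage - 1, 0)
    if send_rate > 0.95 * allocated:
        # 418/420: wants more bandwidth: advance the ramp-up counter and
        # grant a larger bonus in later ramp-up phases.
        ru_stage += 1
        if ru_stage >= 3:
            bonus = allocated                                # +100%
        elif ru_stage == 2:
            bonus = min(0.5 * allocated, 0.10 * link_speed)  # +50%, capped
        else:
            bonus = 0.25 * allocated   # assumed gentler first phase
        return allocated + bonus, ru_stage
    # 430: neither too much nor too little: hold the allocated rate.
    return allocated, max(ru_stage - 1, 0)
```

The staged bonuses let a VM that persistently wants more bandwidth ramp up quickly over a few intervals, while a single quiet interval immediately shrinks its cap so the residual can be redistributed.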
As shown in
One skilled in the art of computing will appreciate that many different techniques may be used for efficiently controlling bandwidth allocation, without departing from the spirit of the discussion herein.
II. Flowchart Description
Features discussed herein are provided as example embodiments that may be implemented in many different ways that may be understood by one of skill in the art of computing, without departing from the spirit of the discussion herein. Such features are to be construed only as example embodiment features, and are not intended to be construed as limiting to only those detailed descriptions.
The micro-schedulers are controlled, by allocating the respective subscribed portion of bandwidth associated with each respective client that is active, by a predefined first deadline, with residual bandwidth that is unused by the respective clients being shared proportionately among respective active clients by a predefined second deadline, while minimizing coordination among micro-schedulers by the macro-scheduler periodically adjusting respective bandwidth allocations to each micro-scheduler (604).
For example, the outgoing communication link includes an outgoing network communication link associated with a network interface card (NIC) (606).
For example, the macro-scheduler periodically adjusts the respective bandwidth allocations to each micro-scheduler by accessing measurements of respective amounts of bandwidth used by each respective active client in a predetermined temporal interval that is prior in time to a current temporal interval associated with a current periodic adjustment (608), in the example of
For example, the macro-scheduler periodically adjusts the respective bandwidth allocations to each micro-scheduler by capping respective bandwidth allocations at respective values of the respective subscribed portions of bandwidth for respective clients that transmitted in respective amounts greater than the respective subscribed portions of bandwidth for the respective clients, in the predetermined temporal interval that is prior in time to the current temporal interval associated with the current periodic adjustment (610).
For example, the macro-scheduler periodically adjusts the respective bandwidth allocations to each micro-scheduler by capping respective bandwidth allocations at respective values of respective current actual transmission amounts of bandwidth for respective clients that transmitted in respective amounts substantially less than the respective subscribed portions of bandwidth for respective clients, in the predetermined temporal interval that is prior in time to the current temporal interval associated with the current periodic adjustment (612).
For example, the macro-scheduler periodically adjusts the respective bandwidth allocations to each micro-scheduler by determining the residual bandwidth that is unused by the respective clients, after lowering one or more respective bandwidth allocations by capping the one or more respective bandwidth allocations based on actual transmission amounts, and proportionately allocating residual bandwidth that is unused by the respective clients, among respective active clients who are currently requesting more bandwidth allocation (614).
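The periodic adjustment described in the preceding paragraphs (capping over-subscribed clients at their subscription, capping under-using clients near actual usage, then sharing residual bandwidth proportionately) can be sketched as follows. This is an illustrative sketch only: the `Client` class, the `headroom` factor, and all names are assumptions introduced here, not part of the description.

```python
from dataclasses import dataclass

@dataclass
class Client:
    subscribed_bps: float   # subscribed portion of link bandwidth
    measured_bps: float     # bandwidth used in the prior interval
    allocated_bps: float = 0.0

def adjust_allocations(clients, link_capacity_bps, headroom=0.95):
    """One macro-scheduler adjustment round (illustrative sketch)."""
    active = [c for c in clients if c.measured_bps > 0]
    wants_more = []
    for c in active:
        if c.measured_bps >= c.subscribed_bps:
            # Cap at the subscribed portion (first deadline); this
            # client is requesting more than its subscription.
            c.allocated_bps = c.subscribed_bps
            wants_more.append(c)
        else:
            # Cap near actual usage, freeing the unused bandwidth.
            c.allocated_bps = c.measured_bps
    # Residual bandwidth left after the caps above.
    residual = link_capacity_bps * headroom - sum(c.allocated_bps for c in active)
    total_sub = sum(c.subscribed_bps for c in wants_more)
    if residual > 0 and total_sub > 0:
        # Share the residual proportionately among active clients
        # currently requesting more (second deadline).
        for c in wants_more:
            c.allocated_bps += residual * c.subscribed_bps / total_sub
    return clients
```

Because the macro-scheduler only reads prior-interval measurements and writes new caps, each micro-scheduler can enforce its allocation independently between adjustment rounds, which is what keeps coordination minimal.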
For example, each of the micro-schedulers controls bandwidth allocation for respective virtual machines (VMs) of the respective clients, each of the respective clients subscribing to the respective predefined portion of bandwidth for use by the respective VMs, wherein the respective VMs of the respective clients are hosted on a cloud server, wherein the outgoing communication link includes an outgoing network communication link associated with one or more network interface cards (NICs) located on the cloud server (616).
For example, each of the micro-schedulers controls bandwidth allocation for respective applications of the respective clients, each of the respective clients subscribing to the respective predefined portion of bandwidth for use by the respective applications, wherein the respective applications of the respective clients are hosted on a cloud server, wherein the outgoing communication link includes an outgoing network communication link associated with one or more network interface cards (NICs) located on the cloud server (618).
For example, the macro-scheduler controls the plurality of micro-schedulers, providing an approximation of fair queuing, without using a feedback loop between the macro-scheduler and a transmitter (620).
For example, the macro-scheduler controls the plurality of micro-schedulers, providing no bandwidth allocation controls when it is determined that the outgoing network communication link is not congested (622).
For example, the macro-scheduler controls the plurality of micro-schedulers, providing an approximation of fair bandwidth allocation to the respective clients, assigning respective independent queues to the respective clients (624).
For example, a virtual switch includes a plurality of respective virtual ports, wherein a portion of the virtual ports communicate with the plurality of clients, and at least one of the virtual ports communicates with the outgoing communication link (626).
For example, the respective virtual ports that respectively communicate with the plurality of respective clients each communicate with a respective token bucket queue that controls respective transmission rates of outgoing packets for each respective client (628).
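A token bucket queue of the kind referenced above can be sketched as a simple rate limiter that admits packets at a sustained rate with a bounded burst. The class name, parameters, and the `allow` method are illustrative assumptions, not part of the description.

```python
import time

class TokenBucket:
    """Illustrative token bucket controlling the transmission rate
    of outgoing packets for one client's virtual port."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s      # sustained rate
        self.capacity = burst_bytes       # maximum burst size
        self.tokens = burst_bytes         # bucket starts full
        self.last = time.monotonic()

    def allow(self, packet_bytes, now=None):
        """Return True if a packet of this size may be sent now."""
        now = time.monotonic() if now is None else now
        # Refill tokens for the elapsed time, up to the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False
```

In this sketch, the macro-scheduler's periodic adjustment would amount to updating `rate` (and possibly `capacity`) on each client's bucket once per adjustment interval.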
A periodic bit rate of outgoing transmissions at the outgoing communication link is determined, for a first predetermined temporal interval (704).
It is determined whether the outgoing communication link is currently congested, by comparing the periodic bit rate of outgoing transmissions at the outgoing communication link with a predefined rate of congestion value (706).
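The congestion determination above reduces to comparing a measured periodic bit rate against a predefined threshold. A minimal sketch, in which the 90% threshold and the 10 Gb/s link capacity are assumed example values rather than values from the description:

```python
def is_congested(bits_sent, interval_s, congestion_rate_bps):
    """Compare the periodic bit rate of outgoing transmissions over
    the interval with a predefined rate-of-congestion value."""
    periodic_rate_bps = bits_sent / interval_s
    return periodic_rate_bps >= congestion_rate_bps

# Assumed example threshold: 90% of a 10 Gb/s outgoing link.
LINK_BPS = 10e9
CONGESTION_RATE_BPS = 0.9 * LINK_BPS
```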
If the outgoing communication link is determined as being currently congested, a respective independent max-cap queue is assigned to each of the respective users of the respective resources that are currently active, a respective amount is allocated, up to the respective subscribed portion of bandwidth associated with each respective client that is active, by a predefined first deadline, with residual bandwidth that is unused by the respective clients being shared proportionately among respective active clients by a predefined second deadline, and respective bandwidth allocations to each of the respective users of the respective resources are periodically adjusted, to provide an approximation of fair queuing (708).
For example, each of the respective independent max-cap queues is managed by a respective micro-scheduler, independently of other respective micro-schedulers managing other respective independent max-cap queues (710).
For example, the periodically adjusting the respective bandwidth allocations to each user includes accessing measurements of respective amounts of bandwidth used by each respective active user in a predetermined temporal interval that is prior in time to a current temporal interval associated with a current periodic adjustment (712).
For example, the approximation of fair queuing is provided without using a feedback loop between a scheduler and a transmitter (714).
For example, the controlling the bandwidth allocation for the plurality of users includes providing no bandwidth allocation controls when it is determined that the outgoing communication link is not currently congested (716).
A plurality of micro-schedulers are controlled, each of the micro-schedulers controlling bandwidth allocation for the respective execution entities (804).
The respective subscribed portion of bandwidth associated with each respective execution entity that is active, is allocated by a predefined first deadline, with residual bandwidth that is unused by the respective execution entities being shared proportionately among respective active execution entities by a predefined second deadline, while minimizing coordination among micro-schedulers by periodically adjusting respective bandwidth allocations to each micro-scheduler (806).
For example, the periodically adjusting the respective bandwidth allocations to each micro-scheduler includes accessing measurements of respective amounts of bandwidth used by each respective active execution entity in a predetermined temporal interval that is prior in time to a current temporal interval associated with a current periodic adjustment (808).
For example, the controlling the plurality of micro-schedulers provides an approximation of fair bandwidth allocation to the respective execution entities, assigning respective independent queues to the respective execution entities (810).
III. Aspects of Certain Embodiments
For example, a system includes at least one processor, and a computer-readable storage medium that stores executable instructions that are executable by the at least one processor. The executable instructions include a bandwidth allocation controller that includes a plurality of micro-schedulers, each of the micro-schedulers controlling bandwidth allocation for respective clients, each of the respective clients subscribing to a respective predefined portion of bandwidth of an outgoing communication link.
A macro-scheduler controls the plurality of micro-schedulers, by: for each respective client that is active, allocating the respective subscribed portion of bandwidth associated with each respective client that is active, by a predefined first deadline, with residual bandwidth that is unused by the respective clients being shared proportionately among respective active clients by a predefined second deadline, while minimizing coordination among micro-schedulers by the macro-scheduler periodically adjusting respective bandwidth allocations to each micro-scheduler.
The macro-scheduler periodically adjusts the respective bandwidth allocations to each micro-scheduler by accessing measurements of respective amounts of bandwidth used by each respective active client in a predetermined temporal interval that is prior in time to a current temporal interval associated with a current periodic adjustment.
The macro-scheduler periodically adjusts the respective bandwidth allocations to each micro-scheduler by capping respective bandwidth allocations at respective values of the respective subscribed portions of bandwidth for respective clients that transmitted in respective amounts greater than the respective subscribed portions of bandwidth for the respective clients, in the predetermined temporal interval that is prior in time to the current temporal interval associated with the current periodic adjustment.
The macro-scheduler periodically adjusts the respective bandwidth allocations to each micro-scheduler by capping respective bandwidth allocations at respective values of respective current actual transmission amounts of bandwidth for respective clients that transmitted in respective amounts substantially less than the respective subscribed portions of bandwidth for respective clients, in the predetermined temporal interval that is prior in time to the current temporal interval associated with the current periodic adjustment.
The macro-scheduler periodically adjusts the respective bandwidth allocations to each micro-scheduler by determining the residual bandwidth that is unused by the respective clients, after lowering one or more respective bandwidth allocations by capping the one or more respective bandwidth allocations based on actual transmission amounts, and proportionately allocating residual bandwidth that is unused by the respective clients, among respective active clients who are currently requesting more bandwidth allocation.
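The capping and residual-sharing behavior in the four preceding paragraphs can be summarized in one allocation rule. The notation below (subscription \(s_i\), prior-interval usage \(u_i\), link capacity \(C\), and the set \(W\) of active clients currently requesting more bandwidth) is introduced here for illustration and does not appear in the description:

```latex
a_i =
\begin{cases}
  s_i + R\,\dfrac{s_i}{\sum_{j \in W} s_j}, & \text{if } u_i \ge s_i \quad (i \in W),\\[6pt]
  u_i, & \text{if } u_i < s_i,
\end{cases}
\qquad
R = C - \sum_{k\ \text{active}} \min(u_k, s_k).
```

Here \(R\) is the residual bandwidth left after capping, and the proportional term distributes it among the clients in \(W\) in proportion to their subscriptions.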
Each of the micro-schedulers controls bandwidth allocation for respective virtual machines (VMs) of the respective clients, each of the respective clients subscribing to the respective predefined portion of bandwidth for use by the respective VMs, wherein the respective VMs of the respective clients are hosted on a cloud server, wherein the outgoing communication link includes an outgoing network communication link associated with one or more network interface cards (NICs) located on the cloud server.
Each of the micro-schedulers controls bandwidth allocation for respective applications of the respective clients, each of the respective clients subscribing to the respective predefined portion of bandwidth for use by the respective applications, wherein the respective applications of the respective clients are hosted on a cloud server, wherein the outgoing communication link includes an outgoing network communication link associated with one or more network interface cards (NICs) located on the cloud server.
The outgoing communication link includes an outgoing network communication link associated with a network interface card (NIC).
The macro-scheduler controls the plurality of micro-schedulers, providing an approximation of fair queuing, without using a feedback loop between the macro-scheduler and a transmitter.
The macro-scheduler controls the plurality of micro-schedulers, providing no bandwidth allocation controls when it is determined that the outgoing network communication link is not congested.
The macro-scheduler controls the plurality of micro-schedulers, providing an approximation of fair bandwidth allocation to the respective clients, assigning respective independent queues to the respective clients.
A virtual switch includes a plurality of respective virtual ports, wherein a portion of the virtual ports communicate with the plurality of clients, and at least one of the virtual ports communicates with the outgoing communication link.
The respective virtual ports that respectively communicate with the plurality of respective clients each communicate with a respective token bucket queue that controls respective transmission rates of outgoing packets for each respective client.
A method includes controlling, by a macro-controller, bandwidth allocation for a plurality of users of a plurality of respective resources that are respectively configured to transmit to a common outgoing communication link having a predefined maximum outgoing transmission bandwidth, each of the respective users subscribing to a respective predefined portion of the outgoing transmission bandwidth of the outgoing communication link.
Controlling the bandwidth allocation includes determining a periodic bit rate of outgoing transmissions at the outgoing communication link, for a first predetermined temporal interval, and determining whether the outgoing communication link is currently congested, by comparing the periodic bit rate of outgoing transmissions at the outgoing communication link with a predefined rate of congestion value.
If the outgoing communication link is determined as being currently congested, a respective independent max-cap queue is assigned to each of the plurality of respective users of the plurality of respective resources that are currently active. A respective amount is allocated up to the respective subscribed portion of bandwidth associated with each respective client that is active, by a predefined first deadline, with residual bandwidth that is unused by the respective clients being shared proportionately among respective active clients by a predefined second deadline.
Respective bandwidth allocations to each of the plurality of respective users of the plurality of respective resources are periodically adjusted, to provide an approximation of fair queuing.
Each of the respective independent max-cap queues is managed by a respective micro-scheduler, independently of other respective micro-schedulers managing other respective independent max-cap queues.
The periodically adjusting the respective bandwidth allocations to each user includes accessing measurements of respective amounts of bandwidth used by each respective active user in a predetermined temporal interval that is prior in time to a current temporal interval associated with a current periodic adjustment.
The approximation of fair queuing is provided without using a feedback loop between a scheduler and a transmitter.
The controlling the bandwidth allocation for the plurality of users includes providing no bandwidth allocation controls when it is determined that the outgoing communication link is not currently congested.
A computer program product includes a computer-readable storage medium storing executable instructions that cause at least one computing device to control bandwidth allocation for a plurality of execution entities that are respectively configured to transmit to a common outgoing communication link having a predefined maximum outgoing transmission bandwidth, each of the respective execution entities subscribing to a respective predefined portion of the outgoing transmission bandwidth of the outgoing communication link.
Controlling the bandwidth allocation includes controlling a plurality of micro-schedulers, each of the micro-schedulers controlling bandwidth allocation for the respective execution entities.
Controlling the plurality of micro-schedulers includes, for each respective execution entity that is active, allocating the respective subscribed portion of bandwidth associated with each respective execution entity that is active, by a predefined first deadline, with residual bandwidth that is unused by the respective execution entities being shared proportionately among respective active execution entities by a predefined second deadline, while minimizing coordination among micro-schedulers by periodically adjusting respective bandwidth allocations to each micro-scheduler.
The periodically adjusting the respective bandwidth allocations to each micro-scheduler includes accessing measurements of respective amounts of bandwidth used by each respective active execution entity in a predetermined temporal interval that is prior in time to a current temporal interval associated with a current periodic adjustment.
The controlling the plurality of micro-schedulers provides an approximation of fair bandwidth allocation to the respective execution entities, assigning respective independent queues to the respective execution entities.
One skilled in the art of computing will understand that there may be many ways of efficiently controlling bandwidth allocation, without departing from the spirit of the discussion herein.
Customer privacy and confidentiality have been ongoing considerations in computing environments for many years. Thus, example techniques for efficiently controlling bandwidth allocation may use user input and/or data provided by users who have provided permission via one or more subscription agreements (e.g., “Terms of Service” (TOS) agreements) with associated applications or services associated with such techniques. For example, users may provide consent to have their input/data transmitted and stored on devices, though it may be explicitly indicated (e.g., via a user accepted agreement) that each party may control how transmission and/or storage occurs, and what level or duration of storage may be maintained, if any. Further, identifiers that may be used to identify devices used by a user may be obfuscated, e.g., by hashing actual user information. It is to be understood that any user input/data may be obtained in accordance with the privacy laws and regulations of any relevant jurisdiction.
Implementations of the various techniques described herein may be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them (e.g., an apparatus configured to execute instructions to perform various functionality).
Implementations may be implemented as a computer program embodied in signals (e.g., a pure signal such as a pure propagated signal). Such implementations will be referred to herein as implemented via a “computer-readable transmission medium,” which does not qualify herein as a “computer-readable storage medium” or a “computer-readable storage device” as discussed below.
Alternatively, implementations may be implemented via a machine usable or machine readable storage device (e.g., a magnetic or digital medium such as a Universal Serial Bus (USB) storage device, a tape, hard disk drive, compact disk (CD), digital video disk (DVD), etc.), storing executable instructions (e.g., a computer program), for execution by, or to control the operation of, a computing apparatus, e.g., a programmable processor, a special-purpose processor or device, a computer, or multiple computers. Such implementations may be referred to herein as implemented via a “computer-readable storage medium” or a “computer-readable storage device” and are thus different from implementations that are purely signals such as pure propagated signals (and thus do not qualify herein as a “computer-readable transmission medium” as discussed above). Thus, as used herein, a reference to a “computer-readable storage medium” or a “computer-readable storage device” specifically excludes signals (e.g., propagated signals) per se.
A computer program, such as the computer program(s) described above, can be written in any form of programming language, including compiled, interpreted, or machine languages, and can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. The computer program may be tangibly embodied as executable code (e.g., executable instructions) on a machine usable or machine readable storage device (e.g., a computer-readable medium). A computer program that might implement the techniques discussed above may be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
Method steps may be performed by one or more programmable processors executing a computer program to perform functions by operating on input data and generating output. The one or more programmable processors may execute instructions in parallel, and/or may be arranged in a distributed configuration for distributed processing. Example functionality discussed herein may also be performed by, and an apparatus may be implemented, at least in part, as one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that may be used may include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc.
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. Elements of a computer may include at least one processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer also may include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of nonvolatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in special purpose logic circuitry.
To provide for interaction with a user, implementations may be implemented on a computer having a display device, e.g., a cathode ray tube (CRT), liquid crystal display (LCD), or plasma monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback. For example, output may be provided via any form of sensory output, including (but not limited to) visual output (e.g., visual gestures, video output), audio output (e.g., voice, device sounds), tactile output (e.g., touch, device movement), temperature, odor, etc.
Further, input from the user can be received in any form, including acoustic, speech, or tactile input. For example, input may be received from the user via any form of sensory input, including (but not limited to) visual input (e.g., gestures, video input), audio input (e.g., voice, device sounds), tactile input (e.g., touch, device movement), temperature, odor, etc.
Further, a natural user interface (NUI) may be used to interface with a user. In this context, a “NUI” may refer to any interface technology that enables a user to interact with a device in a “natural” manner, free from artificial constraints imposed by input devices such as mice, keyboards, remote controls, and the like.
Examples of NUI techniques may include those relying on speech recognition, touch and stylus recognition, gesture recognition both on a screen and adjacent to the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, and machine intelligence. Example NUI technologies may include, but are not limited to, touch sensitive displays, voice and speech recognition, intention and goal understanding, motion gesture detection using depth cameras (e.g., stereoscopic camera systems, infrared camera systems, RGB (red, green, blue) camera systems and combinations of these), motion gesture detection using accelerometers/gyroscopes, facial recognition, 3D displays, head, eye, and gaze tracking, immersive augmented reality and virtual reality systems, all of which may provide a more natural interface, and technologies for sensing brain activity using electric field sensing electrodes (e.g., electroencephalography (EEG) and related techniques).
Implementations may be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation, or any combination of such back end, middleware, or front end components. Components may be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. While certain features of the described implementations have been illustrated as described herein, many modifications, substitutions, changes and equivalents will now occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the scope of the embodiments.
Number | Name | Date | Kind |
---|---|---|---|
6747976 | Bensaou | Jun 2004 | B1 |
7606154 | Lee | Oct 2009 | B1 |
7761875 | Karamanolis et al. | Jul 2010 | B2 |
8477610 | Zuo et al. | Jul 2013 | B2 |
8630173 | Sundar et al. | Jan 2014 | B2 |
8671407 | Ballani et al. | Mar 2014 | B2 |
8681614 | McCanne | Mar 2014 | B1 |
20030179774 | Saidi | Sep 2003 | A1 |
20110292792 | Zuo | Dec 2011 | A1 |
20120127857 | Sundar | May 2012 | A1 |
20130003543 | Ludwig | Jan 2013 | A1 |
20130254375 | Agiwal et al. | Sep 2013 | A1 |
20150282136 | Wenham | Oct 2015 | A1 |
Number | Date | Country |
---|---|---|
101674242 | Dec 2011 | CN |
102270104 | Jul 2013 | CN |
103346978 | Oct 2013 | CN |
2014021839 | Feb 2014 | WO |
Entry |
---|
Lam, et al., “NetShare and Stochastic NetShare: Predictable Bandwidth Allocation for Data Centers”, In Proceedings of ACM SIGCOMM Computer Communication Review, vol. 42, No. 3, Jul. 2012, pp. 5-11. |
Kramer, et al., “Fair Queueing with Service Envelopes (FQSE): A Cousin-Fair Hierarchical Scheduler for Subscriber Access Networks”, In IEEE Journal on Selected Areas in Communications, vol. 22, No. 8, Oct. 8, 2004, pp. 1497-1513. |
Medhi, et al., “Virtual Machines Cooperation for Impatient Jobs under Cloud Paradigm”, In International Journal of Information & Communication Engineering, vol. 7, Issue 1, Jan. 2011, pp. 1119-1125. |
Chuck, et al., “Bandwidth Recycling In IEEE 802.16 Networks”, In Proceedings of IEEE Transactions on Mobile Computing, vol. 9, No. 10, Jul. 1, 2010, pp. 1451-1464. |
Guo, et al., “A Cooperative Game Based Allocation for Sharing Data Center Networks”, In Proceedings of IEEE INFOCOM, Apr. 14, 2013, 9 Pages. |
Pan, et al., “Approximate Fair Bandwidth Allocation: A Method for Simple and Flexible Traffic Management”, In Proceedings of 46th Annual Allerton Conference on Communication, Control, and Computing, Sep. 23, 2008, pp. 1081-1085. |
Kabbani, et al., “AF-QCN: Approximate Fairness with Quantized Congestion Notification for Multi-tenanted Data Centers”, In Proceedings of IEEE 18th Annual Symposium on High Performance Interconnects, Aug. 18, 2010, pp. 58-65. |
“Class-Based Weighted Fair Queuing on Cisco Nexus 1000V Series Switches: Manage Congestion for Virtualized Data Center and Cloud Environments”, Retrieved on: Dec. 11 Available at: http://www.cisco.com/c/en/us/products/collateral/switches/nexus-1000v-switch-vmware-vsphere/white_paper_c11-704041.html. |
Mishra, et al., “Managing Network Reservation for Tenants in Oversubscribed Clouds”, In Proceedings of IEEE 21st International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems, Aug. 14, 2013, pp. 50-59. |
Iancu, et al., “A Comparison of Feedback Based and Fair Queuing Mechanisms for Handling Unresponsive Traffic”, In Proceedings Sixth IEEE Symposium on Computers and Communications, Jul. 5, 2001, pp. 288-295. |
McCullough, et al., “The Role of End-to-End Congestion Control in Networks with Fairness-Enforcing Routers”, Published on: Apr. 8, 2013 Available at: http://cseweb.ucsd.edu/˜snoeren/papers/decongestion-tr.pdf. |
Popa, et al., “ElasticSwitch: Practical Work-Conserving Bandwidth Guarantees for Cloud Computing”, In Proceedings of ACM SIGCOMM, Aug. 12, 2013, pp. 351-362. |
Pan, et al., “Choke a Stateless Active Queue Management Scheme for Approximating Fair Bandwidth Allocation”, In Proceedings of Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Mar. 26, 2000, pp. 1-10. |
Alasem, Rafe, “Efficient and Fair Bandwidth Allocation AQM Scheme for Wireless Networks”, In International Journal of Computer Networks, vol. 2, Issue 2, Jun. 10, 2010, pp. 132-139. |
Xia, et al., “One More Bit Is Enough”, In Proceedings of ACM SIGCOMM, Aug. 21, 2005, 12 Pages. |
Jeyakumar, et al., “EyeQ: Practical Network Performance Isolation at the Edge”, In Proceedings of 10th USENIX conference on Networked Systems Design and Implementation, Apr. 2, 2013, 15 pages. |
“Second Written Opinion Issued in PCT Application No. PCT/US2016/012009”, dated Dec. 19, 2016, 6 Pages. |
“International Preliminary Report on Patentability Issued in PCT Application No. PCT/US2016/012009”, dated Apr. 10, 2017, 07 Pages. |
“International Search Report and Written Opinion Issued in PCT Application No. PCT/US2016/012009”, dated Apr. 13, 2016, 11 Pages. |
Abendrot, et al., “Solving the Trade-Off Between Fairness and Throughput: Token Bucket and Leaky Bucket-Based Weighted Fair Queueing Schedulers”, In AEU—International Journal of Electronics & Communications, vol. 4, Issue 5, May 2, 2006, pp. 404-407. |
Number | Date | Country | |
---|---|---|---|
20160212065 A1 | Jul 2016 | US |