The present invention relates generally to mobile communication networks and more particularly to an overload controller to prevent excessive loading in network nodes within the network.
In a wireless communication network, excessive processing loads at a network node within the network may lead to system crashes and, consequently, loss of system capacity. To avoid these problems, overload controls are employed to prevent excessive loading at network nodes. In general, overload controls should be rarely used and are intended primarily to avoid system collapse during rare overload events. Frequent activation of overload controls indicates that system capacity is insufficient and should be increased.
Overload controls are difficult to develop and test in a lab setting because extremely high offered loads must be generated and a wide range of operating scenarios must be covered. Also, because overload controls are meant to be activated infrequently in the field, undetected bugs may not show up for several months after deployment. These factors suggest the need to emphasize control robustness over system performance in the design of overload controls. In general, it is less costly to improve control robustness while maintaining adequate performance than it is to extract the last few ounces of system performance while maintaining adequate robustness.
The present invention is related to a method and apparatus for controlling the flow of incoming messages to a processor. A message throttler uses fractional tokens and controls the admission rate for incoming messages such that the admission rate is proportional to the rate of incoming messages. Upon the arrival of an incoming message, the message throttler increments a token count by a fractional amount to compute a new token count, compares the new token count to a threshold, and admits a message from a message queue if the new token count satisfies the threshold. In one embodiment, the fractional amount is dependent on the processing load.
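As an illustration of the fractional-token mechanism summarized above, the following Python sketch (with hypothetical names; a simplified model rather than the claimed implementation) increments a token count by a fractional amount on each arrival, admits a queued message when the count reaches a threshold of one token, and then spends that token:

    import collections

    class FractionalTokenThrottler:
        """Minimal sketch of a fractional-token message throttler."""

        def __init__(self, admission_fraction):
            self.admission_fraction = admission_fraction  # fraction of arrivals to admit (0..1]
            self.token_count = 0.0                        # accumulated fractional tokens
            self.queue = collections.deque()              # pending incoming messages

        def on_arrival(self, message):
            """Handle one incoming message; return an admitted message or None."""
            self.queue.append(message)
            # Increment the token count by a fractional amount so that the
            # admission rate tracks a fraction of the arrival rate.
            self.token_count += self.admission_fraction
            if self.token_count >= 1.0:          # threshold satisfied
                self.token_count -= 1.0          # spend one whole token
                return self.queue.popleft()      # admit the oldest queued message
            return None                          # nothing admitted on this arrival

With admission_fraction set to 0.5, for example, roughly every other arrival results in an admission, so the admitted rate is about half the offered rate; making the fraction depend on the reported processing load gives the load-dependent behavior described above.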
The present invention may be employed to provide overload control in a network node in a communication network. A load detector monitors one or more processors located at the network node and generates a load indication. In one embodiment, the load indication is a filtered load estimate indicative of the load on the busiest processor located at the network node. The load indication is provided to a load controller. The load controller detects an overload condition and, when an overload condition exists, computes message admission criteria based on the load indication. The message admission criteria may comprise, for example, an admission percentage expressed as a fraction indicating a desired percentage of the incoming messages that should be admitted into the network node. An admission controller including one or more message throttlers controls the admission of new messages into the network node based on the admission percentage provided by the load controller, i.e., throttles the incoming message streams.
In one embodiment, the admission percentage is applied across all message streams input into the network node. In other embodiments, the admission percentage may be applied only to those message streams providing input to the overloaded processor. When an overload condition exists, the load controller periodically computes the admission percentage and provides it to the admission controller. When the overload condition dissipates, the load controller signals the admission controller to stop throttling the incoming messages.
The wireless communication network 10 is a packet-switched network that employs a high-speed forward packet data channel (F-PDCH) to transmit data to the mobile stations 12. Wireless communication network 10 comprises a packet-switched network 20, including a Packet Data Serving Node (PDSN) 22 and a Packet Control Function (PCF) 24, and one or more access networks (ANs) 30. The PDSN 22 connects to an external packet data network (PDN) 16, such as the Internet, and supports PPP connections to and from the mobile station 12. The PDSN 22 adds and removes IP streams to and from the ANs 30 and routes packets between the external packet data network 16 and the ANs 30. The PCF 24 establishes, maintains, and terminates connections from the AN 30 to the PDSN 22.
The ANs 30 provide the connection between the mobile stations 12 and the packet-switched network 20. The ANs 30 comprise one or more radio base stations (RBSs) 32 and an access network controller (ANC) 34. The RBSs 32 include the radio equipment for communicating over the air interface with mobile stations 12. Each ANC 34 manages radio resources within its respective coverage area. An ANC 34 can manage more than one RBS 32. In cdma2000 networks, an RBS 32 and an ANC 34 comprise a base station 40. The RBS 32 is the part of the base station 40 that includes the radio equipment and is normally associated with a cell site. The ANC 34 is the control part of the base station 40. In cdma2000 networks, a single ANC 34 may comprise the control part of multiple base stations 40. In other network architectures based on other standards, the network components comprising the base station 40 may be different, but the overall functionality will be the same or similar.
Each network node (e.g. RBS 32, ANC 34, PDSN 22, PCF 24, etc.) within the wireless communication network 10 may be viewed as a black box with M message streams as input. The network node 40 can be any component in the wireless communication network 10 for processing messages. The message streams can be from a mobile station 12 (e.g., registration messages) or the network 10 (e.g., paging messages). A generic network node denoted by reference numeral 40 is shown schematically in
The load detector 46 monitors the load on all processors 42 and reports a maximum load to the load controller 48. One measure of the load is the utilization percentage. Each processor 42 is either doing work or is idle because no work is queued. The kernel for each processor 42 measures the load by sampling the processor 42 and determining the percentage of time it is active. Denoting each processor 42 with the subscript i, a load estimate γi for each processor 42 is filtered by the load detector 46 to produce a filtered load estimate ρ̂i. In the discussion below, the processor 42 with the maximum estimated load is denoted i*. The time constant of the load estimate filter should be roughly equal to the average inter-arrival time of messages from the stream that creates the most work for the particular processor 42. The load reporting period should be chosen based on an appropriate tradeoff between signaling overhead and overload reaction time. The time constant and the load reporting period can be determined in advance based on lab measurements. The load reporting periods for each processor 42 should preferably be uncorrelated in order to avoid bursty processing by the load detector 46.
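By way of illustration only, a load detector along these lines could smooth the raw per-processor utilization samples with a first-order filter and report the busiest processor; in the sketch below the filter coefficient, which plays the role of the filter time constant discussed above, is an assumed tuning parameter:

    class LoadDetector:
        """Illustrative load detector: filters per-processor load samples and reports the maximum."""

        def __init__(self, num_processors, filter_coefficient=0.2):
            self.filtered_load = [0.0] * num_processors   # one filtered estimate per processor
            self.beta = filter_coefficient                # smoothing weight (assumed value)

        def update(self, raw_loads):
            """raw_loads[i] is the sampled utilization (0..1) of processor i for this period."""
            for i, rho in enumerate(raw_loads):
                # First-order (exponential) smoothing of the load estimate
                self.filtered_load[i] = (1 - self.beta) * self.filtered_load[i] + self.beta * rho

        def report_max(self):
            """Return (i*, filtered load of the busiest processor)."""
            i_star = max(range(len(self.filtered_load)), key=lambda i: self.filtered_load[i])
            return i_star, self.filtered_load[i_star]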
At any point in time the network node 40 is in one of two states, normal or overloaded. In the normal state, the estimated load ρ̂i for each processor 42 is less than a predetermined threshold ρmax and the admitted load for each processor 42 equals the offered load. The network node 40 is in the overloaded state when the processing load for one or more processors 42 exceeds the threshold ρmax. The network node 40 remains in the overloaded state until: 1) the maximum load for all processors 42 drops below the threshold ρmax, and 2) the admitted load equals the offered load for all processors 42.
The load detector 46 reports the maximum estimated load ρ̂i* among all processors 42 to the load controller 48. The load controller 48 determines the percentage of incoming messages that should be admitted to the network node 40 to maintain the maximum estimated load ρ̂i* below the threshold ρmax. The percentage of incoming messages that are admitted is referred to herein as the admission percentage and is expressed in the subsequent equations as a fraction (e.g. 0.5=50%). The admission percentage is denoted herein as α(n), where n designates the control period. Note that the control period may be a fixed period or a variable period. The admission controller 50, responsive to the load controller 48, manages the inflow of new messages into the network node 40 to maintain the admission percentage α(n) at the desired level. The admission percentage α(n) is continuously updated by the load controller 48 from one control period to the next while the overload condition persists.
Consider the instant when the network node 40 first enters an overloaded state. Assume that there are M different message streams denoted by the subscript j. The message arrival rate for each message stream may be denoted by λj, and the average processing time on the busiest processor i* of a message from stream j may be denoted si*j. The maximum estimated load ρ̂i*(0) for the busiest processor 42 at the start of the first control period is given by:
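ρ̂i*(0) = ρbkg + Σj λj si*j      (1)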
where ρbkg represents the load generated internally by operating system management processes in the processor 42. It is assumed that ρbkg is a constant value and is the same for all processors 42. The admission percentage α(1) for the first control period in the overload event needed to make the expected processing load equal to ρmax satisfies the equation:
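ρmax = ρbkg + α(1) Σj λj si*j      (2)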
Solving Equations (1) and (2), the admission percentage α(1) for the first control period in the overload event can be computed according to:
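α(1) = (ρmax − ρbkg)/(ρ̂i*(0) − ρbkg)      (3)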
The admission percentage α(1) is reported to the admission controller 50, which throttles incoming messages in each message stream. The admission controller 50 may throttle all incoming message streams, or may throttle only those message streams providing input to the overloaded processor 42.
For the second control period of an overload event, it may be assumed that throttling during the first control period reduced the message arrival rate for each message stream to α(1)λj. Therefore, the admission percentage α(2) for the second control period is given by:
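α(2) = α(1)(ρmax − ρbkg)/(ρ̂i*(1) − ρbkg)      (4)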
In general, the admission percentage for a given control period is given by:
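α(n+1) = α(n)(ρmax − ρbkg)/(ρ̂i*(n) − ρbkg)      (5)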
For the first control period in an overload event, the prior admission percentage α(0) may be taken to be 1, so that Equation (5) reduces to Equation (3). Once the filtered load estimate ρ̂i*(n) for the busiest processor 42 is close to ρmax, the load controller 48 maintains the same admission percentage. If the filtered load estimate ρ̂i*(n) is smaller than ρmax, the admitted load is increased, while if it is larger than ρmax, the admitted load is decreased. The network node 40 is no longer in an overloaded state once the admission percentage α(n) becomes larger than unity.
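The per-period update can be expressed compactly; the following sketch (hypothetical function name) simply evaluates Equation (5), with the caller supplying the filtered load of the busiest processor:

    def next_admission_percentage(alpha_prev, rho_hat_max, rho_max, rho_bkg):
        """Compute alpha(n+1) from alpha(n) per Equation (5)."""
        # Scale the previous admission percentage so that the expected load of the
        # busiest processor is driven toward the threshold rho_max.
        return alpha_prev * (rho_max - rho_bkg) / (rho_hat_max - rho_bkg)

For example, with α(0)=1, ρ̂i*(0)=0.90, ρmax=0.80 and ρbkg=0.10 (the values of the worked example later in this description, expressed as fractions), next_admission_percentage(1.0, 0.90, 0.80, 0.10) returns 0.875, i.e. 7/8.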
Note that an overload event is triggered when the maximum estimated load ρ̂i*(n) exceeds ρmax for the busiest processor 42. However, the overload control algorithm continues to be active even if the maximum load drops below ρmax. The reason is that a drop in load does not necessarily indicate a reduction in the offered load to the network node 40, but may be due to a reduction in the admitted load. Hence, once overload control is triggered, the maximum estimated load ρ̂i*(n) cannot be used to determine overload dissipation.
As ρ̂i*(n) drops below ρmax, α(n) increases. If ρ̂i*(n) remains below ρmax even when α(n) is greater than unity, the network node 40 is no longer in an overload state since the admitted load equals the offered load without any processors 42 exceeding the load threshold ρmax. Hence, dissipation of the overload condition is detected by monitoring α(n).
As noted above, the load controller 48 periodically reports the admission percentage α(n) to the admission controller 50. The admission controller 50 includes a message throttler 52 for each message stream; an exemplary message throttler 52 is shown in
During a control period with a duration T, an average of λT messages arrive, which causes the token count B to increase by α(n)λT. Hence, the number of messages served equals the floor of α(n)λT, so the admitted rate is α(n) times the offered rate λ, as required by the load controller 48. Message throttling is terminated when α(n)>1 for a predetermined number of consecutive control periods. An admission percentage greater than unity implies that there is no throttling. In some embodiments, the message throttler 52 may modify α(n) based on message type; the admission percentage α(n) may be increased for higher priority messages and lowered for lower priority messages.
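The floor-of-α(n)λT behavior can be checked with a few lines of Python using assumed numbers:

    # Assumed numbers: alpha = 0.75 and 100 arrivals in one control period (lambda * T).
    alpha = 0.75
    arrivals_in_period = 100
    tokens = 0.0
    admitted = 0
    for _ in range(arrivals_in_period):
        tokens += alpha            # fractional token added per arrival
        if tokens >= 1.0:
            tokens -= 1.0
            admitted += 1          # one queued message admitted
    print(admitted)                # -> 75, i.e. the floor of alpha * lambda * T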
The admission percentage is also used to detect the dissipation of an overload condition. The load controller 48 compares the admission percentage to 1 (block 132). An admission percentage equal to or greater than 1 implies no message throttling. If the admission percentage is greater than 1, the load controller 48 increments a counter (block 134). The load controller 48 compares the counter value to a predetermined number N (block 136). When the counter value reaches N, the network node 40 is considered to be in a normal, non-overloaded state. In this case, the load controller 48 sets the overload flag to false (block 138), sets α(n) equal to 1 (block 138), and signals the admission controller 50 to stop message throttling (block 140). After checking the counter and performing any required housekeeping functions, the load controller 48 sends the admission percentage to the admission controller 50 (block 144) and determines whether to continue load control (block 146). Normally, load control is performed continuously while the network node 40 is processing messages. In the event that load control is no longer desired or needed, the procedure ends (block 148).
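Putting the pieces together, one per-period pass of the load controller might look like the following sketch; the state fields, the dissipation count n_dissipate (the N above), and the decision to reset the counter when α(n) falls back below 1 are assumptions made for illustration:

    from dataclasses import dataclass

    @dataclass
    class LoadControlState:
        overloaded: bool = False
        alpha: float = 1.0
        counter: int = 0

    def load_control_period(state, rho_hat_max, rho_max, rho_bkg, n_dissipate):
        """One control period; returns the admission percentage to report,
        or None when no throttling is required (illustrative sketch only)."""
        if not state.overloaded:
            if rho_hat_max <= rho_max:
                return None                  # normal state: admitted load equals offered load
            state.overloaded = True          # overload event detected
            state.alpha = 1.0
            state.counter = 0
        # Update the admission percentage per Equation (5).
        state.alpha *= (rho_max - rho_bkg) / (rho_hat_max - rho_bkg)
        if state.alpha > 1.0:
            state.counter += 1               # consecutive periods with alpha > 1
            if state.counter >= n_dissipate:
                state.overloaded = False     # overload has dissipated; stop throttling
                state.alpha = 1.0
                return None
        else:
            state.counter = 0
        return state.alpha                   # reported to the admission controller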
When the processing time per message is small compared to the control interval, the admission control can quickly reduce congestion. However, in some cases (e.g. T&E log collection), a single message (to turn on collection of the logs) can result in significant work on all processors 42. In such a case, it may be desirable to pre-empt such tasks. In other words, if an overloaded condition is detected, non-essential tasks should be terminated, or at least the operator should be warned that user traffic will be affected if the task is not terminated.
If such non-essential tasks are not terminated, the overload control algorithm described above is still effective in protecting against overload, as shown in the following example. Assume ρmax=80% and that the average utilization of the busiest processor is 70%. Also assume that background processing tasks consume 10% of processor cycles. Now suppose that some task is started that uses 20% of the processor cycles. This work is not reduced by throttling the usual messages and hence is uncontrollable. If the above algorithm is used, the admission percentage for the first control period in the overload event is α(1)=(80−10)/(90−10)=7/8. The filtered load estimate at the end of the first control period is ρ̂(1)=30+(7/8)*60=82.5, since only 60% of the load on the busiest processor 42 is actually controllable, not the 80% implied by our estimate of the background work. Repeating these calculations for the second control period yields α(2)=0.84483 and, at the end of that period, ρ̂(2)=80.7. Therefore, within two control periods, the overload control brings the utilization of the busiest processor 42 within 1% of its target value even though the assumption about the background work was incorrect. Note that the actual admitted load is less than that computed here, since only an integer number of messages is accepted (the floor of αλT). Therefore, the processor utilization is reduced even faster in practice.
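The two-period arithmetic above can be reproduced with a short Python snippet (loads expressed in percent, α(0)=1):

    rho_max, rho_bkg = 80.0, 10.0               # threshold and assumed background load (percent)
    uncontrollable, controllable = 30.0, 60.0   # actual uncontrollable and controllable load (percent)
    alpha, rho_hat = 1.0, 90.0                  # alpha(0) and the initial filtered load estimate
    for period in (1, 2):
        alpha = alpha * (rho_max - rho_bkg) / (rho_hat - rho_bkg)   # Equation (5)
        rho_hat = uncontrollable + alpha * controllable             # load after throttling
        print(period, round(alpha, 5), round(rho_hat, 1))
    # prints: 1 0.875 82.5
    #         2 0.84483 80.7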
A similar reasoning can be used to show that the overload control works well even if the background processing load ρbkg is different for different processors 42 and an average value is simply used in the algorithm (as opposed to using the value that corresponds to the busiest processor 42). If the background processing load of the busiest processor 42 is less than the average over all processors 42, the algorithm converges to the target threshold from below.
The present invention may, of course, be carried out in other specific ways than those herein set forth without departing from the scope and essential characteristics of the invention. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, and all changes coming within the meaning and equivalency range of the appended claims are intended to be embraced therein.