The present disclosure relates generally to systems and methods for managing communication systems. More particularly, the present disclosure relates to systems, devices and methods for monitoring operation and performance of one or more communications links within a communication network.
The complexity of modern communication network systems presents a great challenge to managing communication links in an efficient manner. One important aspect of link management is throughput, which is commonly measured by transferring a large file between two communication devices in a network. The resulting traffic tends to degrade the performance of user payload traffic within the network. In addition, in metered access networks, the file transfer is counted toward data usage, which may trigger throughput throttling or data usage charges, thus rendering large file transfers unsuitable for continuous monitoring of link performance.
Packet pairing is a common technique to measure link throughput by consecutively sending two packets, measuring the dispersion between correspondingly received timestamps, and computing throughput by dividing packet size by dispersion. While this approach reduces the impact on user payload traffic performance, the measurements require highly accurate timestamps, which may not be suitable for certain network architectures. For example, many access networks employ traffic shaping to limit the maximum data rate. To measure the throughput that an end-user experiences, the measurement scheme needs to send a sufficient number of packets to trigger traffic shaping so as to avoid over-estimating the actual end-user throughput. Since the packet pairing method sends only two packets, it does not trigger traffic shaping and, thus, oftentimes over-estimates the throughput of the access network in the presence of a traffic shaper. Cross-traffic may cause an increase in packet dispersion due to additional queueing delay at a router where multiple flows intersect, which may cause packet pairing to under-estimate the actual throughput on the link. Packet train dispersion may improve the throughput estimation accuracy by increasing the number of transmitted packets and applying statistical analysis. Packet train dispersion may also be used to detect the presence of traffic shaping. Unfortunately, the injection of a packet train may negatively impact payload traffic performance and typically cannot be used to continuously monitor the performance of an access network.
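For concreteness, the following is a minimal sketch in Python of the basic packet-pair computation described above, assuming the dispersion between the two received packets has already been measured; the function name and example values are illustrative only and not part of the disclosure.

```python
# Illustrative sketch of the packet-pair throughput estimate:
# throughput = packet size / dispersion of two back-to-back packets.

def packet_pair_throughput_bps(packet_size_bytes: int, dispersion_sec: float) -> float:
    """Estimate bottleneck throughput from the inter-arrival spacing
    (dispersion) of two consecutively transmitted packets."""
    return packet_size_bytes * 8 / dispersion_sec

# Example: two 1500-byte packets arriving 1.2 ms apart imply a ~10 Mbps bottleneck.
print(packet_pair_throughput_bps(1500, 0.0012))  # 10000000.0
```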
Communication devices behind a gateway have no public IP address and, thus, cannot be reached from outside of the network. Network Address Translation (NAT) techniques are used to translate an address between a private IP address/port pair and a public IP address/port pair. Oftentimes, NAT uses a translation table that contains entries that map private IP address/port pair(s) to public IP address/port pair(s). An entry may be deleted if a communication session is inactive for a certain timeout duration. The IP address relationship between many home network devices and external networks may be maintained using NAT hole punching, whereby “keep-alive IP packets” are periodically exchanged with an external server to keep entries in the NAT table. However, the packets used for NAT hole punching are not well-suited for monitoring access network performance.
Accordingly, what is needed are systems, devices, and methods that can efficiently and continuously monitor communication link performance while overcoming the shortcomings of existing methods.
Embodiments of the present disclosure describe a method that continuously monitors an access network, determines whether the access network supports a service type of interest, and accurately measures link throughput with little or no impact on payload traffic performance, while enabling NAT hole punching. In embodiments, an agent (e.g., hardware and/or software) located behind a NAT periodically measures the packet dispersion by transmitting/receiving a short burst of communication packets to or from a remote/outside server and determines whether a link can support a particular service type by comparing the minimum required data rate of the service to the lower bound of throughput estimated from the packet dispersion. The frequency of occurrence of this transmission may be adjusted such that NAT hole punching may be maintained. When a more accurate throughput measurement is desired, embodiments of the present disclosure may measure data transfer throughput without degrading user payload traffic by using certain protocols (e.g., Lower-Than-Best-Effort Transport Protocols, such as Low Extra Delay Background Transport (LEDBAT)), such that, in the presence of user payload traffic, the transmission rate is decreased so as to avoid interference with the user payload traffic.
In embodiments, the method for periodically monitoring the communication link performance while enabling NAT traversal comprises: (1) transmitting at least one communication packet, which comprises a timestamp and an identifier, by a first communication device behind a NAT and coupled to a second communication device via a network that comprises a communication link; (2) measuring the time of the arrival of the communication packet at the second communication device; (3) deriving a communication performance from the timestamp in the packet and the measured time of the arrival at the second communication device; (4) acknowledging the received packets, by the second communication device, by sending packets that comprise a (receive) timestamp, a (receive) identifier, and a sequence number; (5) measuring the time of the arrival of the communication packets at the first communication device; (6) deriving the communication performance from the timestamp in the packet and the measured time of the arrival at the first communication device; and (7) triggering the measurement of throughput of the communication link by the first communication device if a trigger condition is met. In certain embodiments, throughput measurement is triggered if the lower bound of a throughput estimate is lower than a predefined threshold, or if a timer expires. In embodiments, throughput is measured by transferring large amounts of data using certain protocols (e.g., Lower-Than-Best-Effort transport protocols), such that the throughput measurement does not degrade user payload traffic performance.
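By way of illustration only, the following Python sketch shows how steps (1), (5), and (6) might look for a UDP-based agent; the packet layout (JSON with id, seq, and ts fields), the agent identifier, and the server address are hypothetical assumptions, not the disclosed wire format.

```python
# Hypothetical sketch of the probe/acknowledgement exchange in steps (1)-(7).
import json
import socket
import time

AGENT_ID = "agent-130"           # hypothetical identifier
SERVER = ("203.0.113.10", 9999)  # hypothetical server address

def send_probe(sock: socket.socket, seq: int) -> None:
    # Step (1): the probe carries a transmit timestamp and an identifier.
    probe = {"id": AGENT_ID, "seq": seq, "ts": time.time()}
    sock.sendto(json.dumps(probe).encode(), SERVER)

def handle_ack(sock: socket.socket) -> float:
    # Steps (5)-(6): measure arrival time and derive a delay metric from
    # the timestamp the server wrote into its acknowledgement.
    data, _ = sock.recvfrom(2048)
    arrival = time.time()
    ack = json.loads(data)
    # Downstream one-way delay estimate (clock offset still included).
    return arrival - ack["ts"]

# Usage: sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
```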
References will be made to embodiments of the present disclosure, examples of which may be illustrated in the accompanying figures. These figures are intended to be illustrative, not limiting. Although the present disclosure is generally described in the context of these embodiments, it should be understood that it is not intended to limit the scope of the present disclosure to these particular embodiments. Items in the figures are not to scale.
In the following description, for purposes of explanation, specific details are set forth in order to provide an understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure can be practiced without these details. Furthermore, one skilled in the art will recognize that embodiments of the present disclosure, described below, may be implemented in a variety of ways, such as a process, an apparatus, a system, a device, or a method on a tangible computer-readable medium.
Components, or modules, shown in diagrams are illustrative of exemplary embodiments of the present disclosure and are meant to avoid obscuring the present disclosure. It shall also be understood that throughout this discussion that components may be described as separate functional units, which may comprise sub-units, but those skilled in the art will recognize that various components, or portions thereof, may be divided into separate components or may be integrated together, including integrated within a single system or component. It should be noted that functions or operations discussed herein may be implemented as components. Components may be implemented in software, hardware, or a combination thereof.
Furthermore, connections between components or systems within the figures are not intended to be limited to direct connections. Rather, data between these components may be modified, re-formatted, or otherwise changed by intermediary components. Also, additional or fewer connections may be used. It shall also be noted that the terms “coupled,” “connected,” or “communicatively coupled” shall be understood to include direct connections, indirect connections through one or more intermediary devices, and wireless connections.
Reference in the specification to “one embodiment,” “preferred embodiment,” “an embodiment,” or “embodiments” means that a particular feature, structure, characteristic, or function described in connection with the embodiment is included in at least one embodiment of the present disclosure and may be in more than one embodiment. Also, the appearances of the above-noted phrases in various places in the specification are not necessarily all referring to the same embodiment or embodiments.
The use of certain terms in various places in the specification is for illustration and should not be construed as limiting. A service, function, or resource is not limited to a single service, function, or resource; usage of these terms may refer to a grouping of related services, functions, or resources, which may be distributed or aggregated.
The terms “include,” “including,” “comprise,” and “comprising” shall be understood to be open terms and any lists that follow are examples and not meant to be limited to the listed items. Any headings used herein are for organizational purposes only and shall not be used to limit the scope of the description or the claims. Each reference mentioned in this patent document is incorporated by reference herein in its entirety.
Furthermore, one skilled in the art shall recognize that: (1) certain steps may optionally be performed; (2) steps may not be limited to the specific order set forth herein; (3) certain steps may be performed in different orders; and (4) certain steps may be done concurrently.
In this document the terms “average speed of payload downstream traffic,” “payload downstream rate,” and “user payload traffic speed” are used interchangeably. Similarly, the terms “Internet downstream speed test” and “speed test downstream rate” are used interchangeably, and “download speed for Internet speed test” and “traffic rate of speed test traffic,” are used interchangeably. Further, a location is considered “behind” a device if that location is further away from the Internet/cloud than the device.
Although the present disclosure is described in the context of “maximum” or “average” values, a person of skill in the art will appreciate that other statistical measures, such as average, median, percentile, standard deviation, variance, variation, maximum, minimum, and n-th order statistics, may be used. Similarly, the systems and methods described with respect to downstream measurements may be equally applied to upstream measurements.
An agent 130 may be located behind a NAT 120 and communicate with the server 100 using NAT traversal operations. LAN devices 140 are coupled to gateway 110 and located behind the NAT 120. One skilled in the art will recognize that the LAN devices 140 use NAT traversal operations in order to communicate with the server 100 via an address translation procedure within the NAT 120.
In operation, the agent 130 may periodically send at least one communication packet to the server 100. The rate at which communication packets are sent may be fixed, variable, configurable, or otherwise controlled, e.g., by the agent 130 itself or by some external source (not shown). The packet may comprise information, such as a timestamp and the identity of the agent, that enables link measurement and may be used to monitor an upstream performance of the broadband connection 150. In certain instances, the period is set shorter than the NAT binding timeout to maintain a NAT hole. The agent 130 may trigger more accurate broadband throughput measurements if appropriate, e.g., by sending a large file. When the server 100 receives the packets from the agent 130, the server 100 measures the time of the arrival of the communication packet and derives from the timestamp in the received packet and the measured time of the arrival one or more communication performance metrics, as will be discussed with reference to
In embodiments, the agent 130 measures the time of the arrival of the communication packets from the server 100. Then, the agent 130 derives one or more communication performance metrics from the timestamp in the received packet and the measured arrival time of the packets. The agent 130 may initiate or request a more accurate throughput measurement of the upstream or downstream broadband connection, for example, if a problem in the broadband connection is detected. In embodiments, an accurate throughput may be measured by transferring large files between the agent 130 and a speedtest server 170. In certain examples, the speedtest server 170 is embedded in the server 100.
In certain embodiments, the agent 130 performs steps enabling the detection of link performance using the steps set forth and/or combinations with supplemental steps thereof. The process may begin with a trigger 200, e.g., an agent that periodically triggers the transmission of packets to a server. In certain examples, a triggering period may be set shorter than or equal to a NAT binding timeout to maintain a NAT hole. In embodiments, when no prior knowledge about the NAT binding timeout exists, the periodic trigger may test different periods, monitor the acknowledgement packets from the server, and determine a periodicity with which the agent 130 receives acknowledgement packets, as illustrated in the sketch below. If triggered, the agent 130 may transmit M packets 210 to the server 100, where M is larger than or equal to one.
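The following is a minimal sketch, assuming a hypothetical helper probe_and_wait_ack(period) that stays idle for the given number of seconds, sends a probe, and reports whether an acknowledgement returned, of how the periodic trigger might search for a keep-alive period no longer than the unknown NAT binding timeout:

```python
# Hypothetical sketch: probe for a workable keep-alive period when the NAT
# binding timeout is unknown, by testing progressively longer idle gaps.
def find_keepalive_period(probe_and_wait_ack, candidates=(10, 30, 60, 120, 300)):
    """probe_and_wait_ack(period_sec) -> True if an acknowledgement returned,
    i.e., the NAT hole survived an idle gap of `period_sec` seconds."""
    best = None
    for period in candidates:
        if probe_and_wait_ack(period):
            best = period   # NAT entry survived; try a longer gap
        else:
            break           # binding expired; stop probing
    return best             # keep-alive period <= NAT binding timeout
```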
In embodiments, the agent 130 derives a communication performance metric based on a timestamp obtained at step 230 and the information in a received packet. The communication performance metric may comprise a queue delay, latency, round-trip time (RTT), probability of error, lower bound of the downstream throughput, and a probability that a downstream throughput is below a threshold, e.g., a threshold defined by the minimum downstream throughput for supporting certain services, such as IPTV, or the minimum speed promised by a broadband provider. One skilled in the art will recognize that other link characteristics may be monitored and/or identified using various embodiments of the present disclosure.
At step 240 in
In embodiments, the throughput is measured by moving a large file between the server 100 and the agent 130. For downstream throughput measurement, the agent 130 may download a large file from a server. For upstream throughput measurements, the agent 130 may (create and) upload a large file to the server. It is noted that the server for the throughput test could be different from the server 100 and may comprise any type of web server that allows upload and download of large files. Since a large file transfer may degrade the performance of payload traffic, in embodiments, throughput measurement triggering may be delayed until the ongoing payload traffic in the gateway drops below a threshold.
According to various embodiments of the present disclosure, the agent 130 may be integrated within a gateway and may function as a proxy server for LAN devices behind the NAT so as to allow other LAN devices behind the NAT to connect to the server without requiring that each LAN device perform NAT traversal operations. In this example, the agent 130 may be positioned behind a NAT and maintain a connection to an external server by periodically exchanging packets. The agent 130 may run a proxy server that receives communication packets from other LAN devices, relays the packets to their destinations outside the home network, receives packets whose destinations are LAN devices, and relays those packets to the corresponding LAN devices. For example, the socket secure protocol (“SOCKS”) may be utilized for the proxy server. When relaying a packet, the agent 130 may use the local address and port pair that was previously used, e.g., for NAT hole punching. As a result, not all LAN devices need to perform NAT traversal operations.
Returning to
The server 100 starts to wait for the packets from the agent 130. In embodiments, the server 100 may provide a web service for large file upload and download that can be used by the agent 130 to measure the upstream and downstream throughput.
As depicted in
The estimate of delays is denoted as Da,b,k, where a denotes either downstream (D) or upstream (U), b denotes the type, and k is either the batch number (if it is an instantaneous estimate) or the statistics type (if it is a statistic obtained using estimates from multiple batches). The following types are used for b: q for queue delay, d for dispersion, b for baseline delay, o for OS delay, and w for one-way delay. Note that D is used to represent an estimate and T is used to denote ground truth. For example, DU,w,k is the estimate of the upstream one-way delay for the k-th batch. In embodiments, the agent 130 counts the number of packet drops based on sequence numbers and measures the packet loss rate by dividing the number of packet drops by the number of received packets.
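The loss-rate bookkeeping described in the last sentence might be sketched as follows; this is an illustration rather than the disclosed implementation, and it assumes in-order delivery (handling of reordered packets is omitted for brevity).

```python
# Sketch: count drops as gaps in the received sequence numbers and divide
# by the number of received packets, as described above.
def packet_loss_rate(received_seqs: list[int]) -> float:
    drops = 0
    for prev, cur in zip(received_seqs, received_seqs[1:]):
        drops += max(0, cur - prev - 1)   # missing numbers between neighbors
    return drops / len(received_seqs) if received_seqs else 0.0

# Example: sequence numbers 1, 2, 5, 6 imply two drops over four receptions.
print(packet_loss_rate([1, 2, 5, 6]))  # 0.5
```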
In embodiments, the agent 130 may transmit a packet 560 with transmit timestamp ts,k,1 as shown in the
Packets 530 and 550 may correspond to the transmitted packets 500 and 510 and they may be received at respective times tr,k,2 and tr,k,3. Similar to the upstream condition, the downstream baseline delay is Tb,k,2. In embodiments, when there is cross-traffic 520 in the path, the packet 530 may be further delayed by queueing delay Tq,k,2. The received packet 530 may be dispersed by Td,k,2 due to finite bandwidth RD, where Td,k,2=8*BD/RD msec. Similarly, when there is cross-traffic 540 in the path, the packet 550 may be further delayed by queueing delay Tq,k,3. The received packet 550 may be dispersed by the same 8*BD/RD msec if the packets 530 and 550 have the same size and if the downstream throughput RD is unchanged.
Using these measurements, various embodiments of the present disclosure may derive the upstream one-way delay as:
DU,w,k=tr,k,1−ts,k,1=Tb,k,1+Tq,k,1+Td,k,1+TΔ
The server may estimate DU,w,k using the timestamp ts,k,1 written in the packet 560. It is noted that the one-way delay estimate DU,w,k may be inaccurate due to the clock offset TΔ. However, in embodiments, queuing delay and delay jitter may be relatively accurately estimated even with a clock offset, e.g., by using statistical analysis methods.
First, the minimum one-way delay may be defined as DU,w,min=mink=1,...,K DU,w,k. Over an extended period of time, the upstream path and upstream throughput may remain unchanged. In this example, the baseline delay and dispersion may be constant over a measurement period, and the batch index k may thus be dropped, i.e., Tb,k,1=Tb,1 and Td,k,1=Td,1 for all k. Then, DU,w,min=DU,w,k for the k at which the queueing delay is zero, i.e., Tq,k,1=0. Therefore, DU,w,min=Tb,1+Td,1+TΔ.
The estimate of the queueing delay at batch k is DU,q,k=DU,w,k−DU,w,min. Since the queueing delay typically increases with queues in the upstream path, queueing delay may be used as a good indicator of congestion in the upstream path. Likewise, one may define the one-way delay jitter as DU,w,jitter=std(DU,w,k)=std(Tq,k,1), where std(X) represents the standard deviation of the random variable X, because Tb,1+Td,1+TΔ is nearly constant. Thus, the one-way delay jitter may be used as a good indicator of poor multi-media communication performance.
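As an illustration only, these statistics might be computed as follows from a list of per-batch one-way delay estimates DU,w,k:

```python
# Sketch of the queue-delay and jitter statistics defined above.
import statistics

def queue_delay_and_jitter(owd: list[float]) -> tuple[list[float], float]:
    d_min = min(owd)                         # D_{U,w,min}
    queue_delay = [d - d_min for d in owd]   # D_{U,q,k} = D_{U,w,k} - D_{U,w,min}
    jitter = statistics.pstdev(owd)          # D_{U,w,jitter} = std(D_{U,w,k})
    return queue_delay, jitter

# Example: per-batch one-way delays in milliseconds.
print(queue_delay_and_jitter([12.0, 15.5, 12.4, 20.1]))
```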
The downstream one-way delay estimate is:
DD,w,k=tr,k,2−ts,k,2=Tb,2+Tq,k,2+Td,2−TΔ;
the downstream minimum delay estimate is DD,w,min=mink=1,...,K DD,w,k;
the downstream queue delay estimate is DD,q,k=DD,w,k−DD,w,min; and
the downstream one-way delay jitter is DD,w,jitter=std(DD,w,k)=std(Tq,k,2).
Note that the agent 130 can measure downstream queue delay and jitter if the transmit timestamp ts,k,2 is present in the transmitted packet 500. Further note that the one-way delay measured using the second downstream packet 510 may be inaccurate if Tq,k,2+Td,2>Δts,k, because tr,k,3−ts,k,3=Tb,2+Tq,k,2+Td,k,2+Td,k,3+Tq,k,3−Δts,k−TΔ, which is affected by both queuing delays and by Δts,k. Therefore, in embodiments, the one-way delay may be analyzed using only the first received packet if the queue delay of the first packet is larger than a threshold, which may be Δts,k−Td,k,2.
One skilled in the art will recognize that the equations and mathematical expression herein are intended to be representative of certain embodiments. Other variations of the present disclosure may be described by other and/or additional equations and variables.
In embodiments, the agent 130 may derive the upstream queue delay and upstream delay jitter from RTT, downstream queue delay, and downstream delay jitter; therefore, the upstream measurement by the server 100 does not need to be written in transmitted packet 500.
First, the agent 130 may measure RTT as:
RTTk=tr,k,2−ts,k,1=Tb,1+Tq,k,1+Td,k,1+To,k,1+Tb,k,2+Tq,k,2+Td,2
which is independent of clock offset TΔ. The minimum RTT may be defined as RTTmin=mink=1,...,K RTTk in certain examples, and the sum of the queue delays in both directions is DDU,q,k=RTTk−RTTmin=Tq,k,1+Tq,k,2 because the routing path, the upstream/downstream rates, and the time a server takes to prepare a packet, To,k,1, are relatively constant over a length of time. In embodiments, the agent 130 may compute the upstream queue delay as DU,q,k=DDU,q,k−DD,q,k, e.g., if DU,q,k is not in packet 500. The RTT jitter may be computed as RTTjitter=std(RTTk)=std(Tq,k,1+Tq,k,2). Since the upstream and downstream queue delays are often uncorrelated, the upstream delay jitter DU,w,jitter may be estimated from the RTT jitter as DU,w,jitter=√(RTTjitter2−DD,w,jitter2) and, thus, the agent 130 does not need to obtain the server's upstream delay jitter estimate in packet 500. Again, the mathematical expressions and representations are intended to be representative of examples of embodiments; other embodiments may be defined mathematically differently.
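Purely as an illustration, these relationships might be computed as follows, given per-batch RTTs, the downstream queue-delay estimates, and the downstream jitter estimate:

```python
# Sketch: recover upstream statistics from RTT and downstream measurements,
# so the server need not write its estimates into packet 500.
import math
import statistics

def upstream_from_rtt(rtt: list[float], down_queue: list[float],
                      down_jitter: float) -> tuple[list[float], float]:
    rtt_min = min(rtt)
    both_dir_queue = [r - rtt_min for r in rtt]                     # DD_{U,q,k}
    up_queue = [b - d for b, d in zip(both_dir_queue, down_queue)]  # D_{U,q,k}
    rtt_jitter = statistics.pstdev(rtt)
    # D_{U,w,jitter} = sqrt(RTT_jitter^2 - D_{D,w,jitter}^2); clamp for safety.
    up_jitter = math.sqrt(max(0.0, rtt_jitter**2 - down_jitter**2))
    return up_queue, up_jitter
```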
In embodiments, the agent 130 may derive downstream throughput by analyzing the dispersion to identify the lower bound of the access network speed. The agent 130 may estimate the downstream dispersion from the difference of two timestamps received in the agent 130, i.e., DD,d,k=tr,k,3−tr,k,2=Tq,k,3+Td,k,3, and may estimate a downstream bottleneck throughput as R̂D,k=BD/DD,d,k. In embodiments, the agent 130 may discard the downstream bottleneck throughput estimate, e.g., if DD,q,2>Threshold. If the bottleneck is located at the end of the path, R̂D,k may represent the lower bound of the actual throughput RD,k. Because the agent 130 is coupled to the access network portion of the broadband connection, such as DSL and Cable, and the access network tends to be the bottleneck link of a broadband connection, R̂D,k may be the lower bound of the downstream throughput of the access network. In the gateway, the agent 130 may have access to a counter that measures the number of bytes that the gateway receives during a certain period of time. In embodiments, the agent 130 may use such a counter in lieu of BD, the number of bytes in the downstream transmit packet, e.g., to improve the accuracy of the throughput estimation.
In embodiments, the agent 130 may be aware of the minimum downstream rate that LAN devices use, denoted as RD,req, which aids in identifying a likelihood that the throughput is below the threshold. For example, if a user watches HDTV streaming at a rate of 6 Mbps on LAN device 140-1, the minimum downstream throughput of the access network RD,req is 6 Mbps. If R̂D,k≥RD,req, the access network has sufficient downstream capacity to support the user service. If R̂D,k<RD,req, it is possible that the access network does not have enough downstream capacity to support such user service since R̂D,k is the lower bound of the access network capacity. In embodiments, e.g., based on historical data, P(RD,k≥RD,req), the probability that the downstream access network provides enough capacity for the user service at the k-th batch, may be computed, where P(RD,k≥RD,req)=1 if R̂D,k≥RD,req, and is a monotonically decreasing function of RD,req−R̂D,k if R̂D,k<RD,req.
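A short sketch of this service-support check follows; the linear fall-off below the threshold is an assumed example of a "monotonically decreasing function of RD,req−R̂D,k", not the disclosed one, and the 2 Mbps scale is arbitrary.

```python
# Sketch: compare the dispersion-based lower bound of downstream throughput
# with the minimum rate a service needs, and map the shortfall to a
# probability (linear fall-off chosen only for illustration).
def service_support_probability(r_hat_bps: float, r_req_bps: float,
                                scale_bps: float = 2e6) -> float:
    if r_hat_bps >= r_req_bps:
        return 1.0                      # lower bound already meets the service rate
    shortfall = r_req_bps - r_hat_bps   # monotonically decreasing in shortfall
    return max(0.0, 1.0 - shortfall / scale_bps)

# Example: HDTV needs 6 Mbps; a 5 Mbps lower bound gives a reduced probability.
print(service_support_probability(5e6, 6e6))  # 0.5
```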
In embodiments, the agent 130 may estimate accurate downstream throughput of a broadband connection if a trigger condition 240 is satisfied. Accurate downstream throughput is an important parameter to monitor in order to ensure that an ISP honors its SLA (Service Level Agreement), e.g., the broadband speed that the ISP promises to deliver to the user. Oftentimes, broadband speed is limited not by the capacity of the access network but rather by a traffic shaper that delays downstream packets if the traffic shaper's queue is full, e.g., when the gateway receives more than a certain number of bytes over a certain period of time. A measurement system should send a sufficient number of bytes/packets to trigger the traffic shaping in order to monitor the downstream broadband speed.
In embodiments, the server 100 may transmit N packets to the agent 130 and then compute the broadband speed as R̂D,max=maxk (N−1)BD/(tr,k,N+1−tr,k,2). In embodiments, the server 100 may start by transmitting 2 packets (N1=2) for the first batch and transmit more packets in subsequent batches (e.g., Nk+1=2*Nk) until (N−1)BD/(tr,k,N+1−tr,k,2) starts to decrease in the absence of queuing delay. In yet another embodiment, each batch of measurements may be repeated to improve the accuracy of the estimate. It is noted that this process reduces disruption to the payload traffic since only the last measurement would trigger traffic shaping. Assume, for example, that L measurements are performed and that each measurement uses twice as many packets as the immediately preceding measurement. Since the number of packets increases until the Internet speed decreases, which means traffic shaping was triggered, only the last measurement would have triggered the traffic shaping. Therefore, for the first L−1 measurements, the payload traffic would not have been affected by the traffic shaping, i.e., disruptions to the payload traffic are significantly reduced.
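The geometric ramp might be sketched as follows; measure_train(n) is a hypothetical helper assumed to send n back-to-back packets and return the measured rate (N−1)BD/(tr,k,N+1−tr,k,2), and the start and cap values are illustrative.

```python
# Hypothetical sketch: double the packet-train length until the measured
# rate starts to decrease, which indicates that traffic shaping kicked in.
def ramp_until_shaped(measure_train, n_start: int = 2, n_max: int = 1024) -> float:
    best = 0.0
    n = n_start
    while n <= n_max:
        rate = measure_train(n)
        if rate < best:     # rate dropped: the shaper was triggered
            break
        best = max(best, rate)
        n *= 2              # N_{k+1} = 2 * N_k
    return best             # estimate of the (unshaped) broadband speed
```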
In embodiments, the agent 130 may estimate accurate throughput of the broadband connection by transferring a large file between the agent 130 and the server 170. For example, if a file of B kBytes is transferred from the speedtest server 170 to the agent in t1 seconds, the agent 130 may estimate the downstream broadband throughput as B*8/t1 kbps. If a user uses the broadband connection during the measurement, such a large file transfer may degrade the performance of user payload traffic. Therefore, the agent 130 may first ascertain the presence of ongoing user payload traffic. In embodiments, the agent 130 may read the number of bytes that the gateway has received from the broadband connection over the last t2 seconds, declare that there was user payload traffic in the downstream direction if the received number of bytes is greater than a threshold, and defer the triggering of an accurate downstream throughput measurement. However, the absence of user payload traffic for those t2 seconds may not ensure the absence of any new user payload traffic during the measurement. In embodiments, to minimize the impact of a large file transfer on new user payload traffic, the agent 130 may use a lower-than-best-effort transport protocol, which automatically yields to TCP flows. In embodiments, the agent 130 and the speedtest server 170 use LEDBAT as the transport protocol.
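The "defer while payload traffic is present" gate might look like the following sketch; the byte-counter reader, the 5-second window, and the 100 kB threshold are assumptions for illustration, not values from the disclosure.

```python
# Sketch: block the large-file throughput test until the downstream byte
# counter shows (nearly) no payload traffic over a short window.
import time

def wait_for_idle_link(read_rx_bytes, window_sec: float = 5.0,
                       threshold_bytes: int = 100_000) -> None:
    """Return once fewer than `threshold_bytes` arrive within a window of
    `window_sec` seconds, i.e., the link looks idle enough to test."""
    while True:
        start = read_rx_bytes()   # gateway downstream byte counter
        time.sleep(window_sec)
        if read_rx_bytes() - start < threshold_bytes:
            return                # defer no longer; start the measurement
```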
As previously mentioned, embodiments of the present disclosure may be used to monitor whether an ISP provides an Internet speed that is set forth by an SLA. For example, the SLA may specify a certain download speed, Rdown, for a given time. To determine whether the specified speed in the SLA is met, Rdown may be compared to a current Internet download speed, x(t), using existing Internet speed test tools. However, such existing methods have three main problems:
First, if Rdown is high, the speed test requires a relatively large amount of data, thus consuming a relatively large amount of Internet bandwidth. For example, if Rdown is 1 Gbps and the duration of a test is 1 second, the speed test may require the transfer of 125 MB of data.
Second, during the speed test, the quality of Internet services may degrade since the user payload traffic has to share bandwidth with the speed test traffic. Especially if both have the same priority (e.g., when both use the TCP protocol), the user payload traffic may suffer packet loss and an unwanted reduction in speed.
Third, Internet service quality may change over time. For example, a greater number of users may use Internet services in the evenings, such that SLA download speed requirements may not be met at certain times of the day. As another example, during certain times, radio interference may be present, again resulting in the specified download speed not being met. As a result, infrequent speed tests may not be able to detect an existing discrepancy between the Rdown specified in the SLA and the actual download speed.
Embodiments of the present disclosure address the above-mentioned problems in several ways:
(1) Instead of measuring Internet speed up to a maximum Rdown, certain embodiments determine whether test packets in addition to the user payload traffic may be successfully transmitted between an agent and a server. If additional test packets may be transmitted without affecting user payload traffic quality, it may be concluded that the ISP does not apply throttling to the user payload and that, thus, the user's Internet experience is not limited with respect to, e.g., the download speed specified in the SLA, Rdown.
To illustrate how certain embodiments test whether additional test packets may be transmitted, the following assumptions may be made with reference to
Ts denotes a sampling interval for a speed measurement (e.g., one sample taken every second). Note that for ease of presentation uniform (equidistant) sampling is assumed. In practice, sampling interval Ts may be adapted according to a payload traffic pattern and/or previously obtained Internet speed test results. It is also noted that presented downstream speed measurements and tests are merely exemplary. Similarly, the presented methods may equally be used for upstream speed tests.
x(n) denotes the average speed of payload downstream traffic within a measurement window, where n represents the sample index in sampling interval Ts, i.e., the sum of Internet download bandwidths used by all downstream payload services at time (n−1)Ts≤t<nTs.
z(n) denotes the downstream rate of the Internet speed test traffic at time (n−1)Ts≤t<nTs.
T1 denotes the duration of the monitoring interval (e.g., 60 sec) during which a characteristic of the payload traffic is monitored.
N1 is the number of payload traffic downstream speed samples, N1=T1/Ts.
N2 is the number of Internet downstream speed test samples, N2=T2/Ts, and t=0 indicates the time when the speed test starts. T2 denotes the speed measurement interval duration.
Rmax(T1) is the maximum downstream user payload traffic speed during −T1≤t<0 in the absence of speed test traffic, which is the same as max(x(n)) over −N1≤n<0.
Rdown is the download speed specified, e.g., in the SLA.
The problem is to detect whether Rmax(T1)=max(x(n)) over −N1≤n<0 was throttled by the ISP.
Note that z(n) is kept less than Rmax(T1), the maximum payload speed during −T1≤t<0, and less than the download speed Rdown specified in the SLA; however, the sum of the payload downstream rate and the speed test downstream rate may be higher than Rmax(T1).
To test this hypothesis, in embodiments, an agent may download packets at the rate of z(n), such that
max(z(n))=Rd over 0≤n<N2, where Rd≤Rmax(T1) and Rd≤Rdown.
Optionally, sum(z(n)+x(n), 0≤n<N2)≥Bs, where Bs is the minimum data size that triggers traffic shaping.
Note that z(n) is smaller than Rmax(T1) and Rdown. In prior art systems, z(n) is greater than Rdown and oftentimes unlimited. Therefore, embodiments of the present disclosure use a lower amount of download traffic to measure the Internet speed.
In embodiments, if z(n)+x(n)≥(Rmax(T1)+Threshold), or any statistic applied to (z(n)+x(n)) is ≥(Rmax(T1)+Threshold), it may be concluded that additional test packets may be downloaded over the Internet, i.e., the Internet service was not throttled.
Conversely, if z(n)+x(n), or any statistic applied to (z(n)+x(n)), is <(Rmax(T1)+Threshold), in embodiments, it may be concluded that the Internet service may have been throttled. When this event is detected, optionally, the Internet download speed may be tested without a rate limit or with a rate limit at Rdown, which may be the download speed specified by an SLA. In embodiments, if this Internet download speed test shows that the measured Internet download speed is less than the specified Rdown, it may be concluded that the download speed in the SLA is not met.
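Expressed as code, the two decision rules above might be sketched as follows (illustrative only; the per-sample maximum stands in for "any statistic applied to (z(n)+x(n))"):

```python
# Sketch of the throttling decision applied to per-sample sums z(n) + x(n)
# over the test window 0 <= n < N2.
def throttling_suspected(z: list[float], x: list[float],
                         r_max: float, threshold: float) -> bool:
    """True if the combined rate never exceeds Rmax(T1) + Threshold,
    i.e., the extra test traffic could not be pushed through."""
    return max(zn + xn for zn, xn in zip(z, x)) < r_max + threshold
```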
In embodiments, Rd (the download speed for the Internet speed test), the number of speed test samples N2, and the Threshold may be configured based on statistics of the speed of payload downstream traffic, x(n), and the number of payload traffic speed samples N1 used to determine those statistics. As an example, assume that Internet speed was measured by uniform sampling within a sampling interval Ts, and further assume a Gaussian distribution of x(n) over −N1≤n<0 having a standard deviation Rs and an average Ra; then the probability that x(n)+Rd≥Rmax(T1)+Threshold at each sample n is 16% if Rd is set to Rmax(T1)+Threshold−Ra−Rs. Assuming that the x(n) are independent and identically distributed random variables, and Rd is set to Rmax(T1)+Threshold−Ra−Rs, the probability that x(n)+Rd≥Rmax(T1)+Threshold at least once for 0≤n<N2 is 1−(1−0.16)^N2. Based on this relationship, N2 and Rd may be selected such that they provide a target detection probability. For example, given Rd, N2 may be set such that 1−(1−0.16)^N2 equals a certain desirable probability p if Rd was set as Rmax(T1)+Threshold−Ra−Rs. If Rd is set differently, N2 may be determined empirically or by using any method known in the art. Likewise, Threshold may be set to adjust a confidence interval. Assuming user traffic is random, as a person skilled in the art will appreciate, the confidence interval of the statistics of the measured traffic speed may be computed given N1 repeated measurements. For example, instead of using the maximum of the payload traffic speed, the confidence interval of the maximum traffic speed may be computed and used for setting Rmax(T1).
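As a worked form of the sample-count relationship above, N2 can be solved from 1−(1−0.16)^N2 ≥ p (the 0.16 per-sample probability follows from the one-sigma Gaussian tail in the example):

```python
# Sketch: smallest N2 such that 1 - (1 - p_single)**N2 >= p_target.
import math

def samples_for_detection(p_target: float, p_single: float = 0.16) -> int:
    return math.ceil(math.log(1 - p_target) / math.log(1 - p_single))

# Example: 18 samples are needed for a 95% detection probability.
print(samples_for_detection(0.95))  # 18
```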
In embodiments, the sampling interval, Ts, or the sampling method in general may be adapted based on the line characteristics. For example, if the RTT between an agent and a speed test server is relatively long, Ts may be increased in order to mitigate the impact of a TCP slow start. In another example, if the user payload traffic is bursty, or the number of Internet users is large, then Ts should be set relatively short to capture the bursty behavior.
(2) To minimize the impact on user payload traffic, in embodiments, the Internet speed test packets may use a lower-than-best-effort transport protocol such as LEDBAT.
(3) Due to the conditions in (1) and (2), Internet speed need not be continuously monitored. Therefore, in embodiments, an Internet speed test is triggered when it is likely that the Internet speed is throttled.
In embodiments, machine learning methods may be employed to learn when and how to trigger an Internet speed test. An exemplary machine learning method may use features that have been extracted from the user payload traffic speed x(n), previous speed test results, non-invasive speed test (e.g., packet pairing, packet dispersion measurement, or RTT measurement) results, and other features that may be collected by an agent to determine a likelihood that Internet speed is throttled. For example, if the maximum user payload speed v[k]=max(x[n]) is measured every minute, where k represents a sample index within K maximum user payload speed measurements used for testing the likelihood of Internet throttling, and if max(v[k])−min(v[k]) is small for K minutes, e.g., K=5 minutes (during which the maximum user payload speed is determined 5 times), then it is more likely that the Internet speed is throttled at a speed equivalent to max(v[k]).
In embodiments, if a non-invasive speed test detects a burst of packet loss, it is determined that it is more likely that the Internet speed has been throttled. In embodiments, by applying machine learning methods that use, for example, logistic regression, the likelihood of Internet speed throttling may be estimated and then a speed test may be triggered in response to the likelihood being greater than a given threshold.
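Purely as an illustration of such a learned trigger, the following sketch hand-codes a logistic regression over two of the features mentioned above (the spread of the per-minute maximum payload speeds, and the presence of burst packet loss); the weights, bias, and trigger threshold are invented for the example and would in practice be learned from data.

```python
# Hypothetical sketch of a logistic-regression throttling trigger.
import math

def throttle_likelihood(v_max: list[float], burst_loss: bool,
                        w_spread: float = -0.5, w_loss: float = 2.0,
                        bias: float = 0.0) -> float:
    # A small spread of max(v[k]) - min(v[k]) suggests a throttled (capped) rate.
    spread_mbps = (max(v_max) - min(v_max)) / 1e6
    z = bias + w_spread * spread_mbps + w_loss * float(burst_loss)
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid -> likelihood of throttling

def should_trigger_speed_test(v_max: list[float], burst_loss: bool,
                              threshold: float = 0.7) -> bool:
    return throttle_likelihood(v_max, burst_loss) > threshold
```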
In embodiments, the triggers for Internet speed tests for different agents may be coordinated such as to enhance the diagnostics of network problems and enable SLA violation detection. Six exemplary use cases of such coordination are discussed next:
(1) In typical access networks, many access lines such as DSL, PON, and Cable Internet are connected to a network aggregation unit such as DSLAM, ONU, and cable head-end, as shown in
Then, traffic from a plurality of lines may be connected to the Internet via a single aggregated line. For example, many lines coupled to the same access network may connect to the Internet via an access aggregation unit, such as a DSLAM. In another example, many wireless lines may be connected to a base station that connects to the Internet. Therefore, when the users connected to the access network aggregation unit consume a large bandwidth, the single aggregated line may represent a bottleneck. Thus, in embodiments, when a trigger condition is satisfied, e.g., in one of the agents, more than one of the agents sharing the same network aggregation unit may initiate an Internet speed test, such that the connection between the network aggregation unit and the Internet can be tested.
(2) Since a speed test uses a significant amount of Internet bandwidth, this may create network congestion if many network nodes run speed tests at the same time. Therefore, various embodiments distribute the speed test load across a network such as to avoid congestion. In embodiments, Internet speed tests may be scheduled such that only a relatively small number of agents that share the same access network simultaneously are permitted to run the speed test.
(3) If a user experiences a network problem, certain embodiments determine the location of the problem by measuring the speed between different nodes in the network. In embodiments, it is determined whether the problem is caused by a Wi-Fi problem or an access network problem. To identify the problem, two or more Internet speed test agents that are coupled to the gateway (or CPE) may simultaneously start an Internet speed test, e.g., if a trigger condition is satisfied. If the access network is the source of the problem, all agents involved in the Internet speed test may measure a lower-than-historically-normal speed. Conversely, if the Wi-Fi is the problem, some agents may measure a normal speed, while the agent that triggered the Internet speed test may measure a lower-than-historically-normal speed. The test server and agent may also be located at the access aggregation unit. To identify the problem, embodiments may measure (1) the speed between the access aggregation node and the Internet and (2) the speed between the access aggregation node and the CPE, and attribute the problem to the access network if measurement (2) indicates a problem.
(4) To test a relatively high maximum speed, e.g., 1 Gbps, it may be difficult for one agent to transmit and receive a high-speed communication flow due to hardware/software limitations, such as CPU, memory, and OS constraints. To solve this issue, in embodiments, two or more Internet speed test agents connected to and/or embedded into a gateway (or CPE) may simultaneously start an Internet speed test if the trigger condition is satisfied. Since multiple agents are transmitting and receiving data, it is easier to reach relatively high data rates, e.g., 1 Gbps. In embodiments, a speed test involving multiple agents may be coordinated by an agent at the gateway/CPE or by a server.
(5) When there is more than one test server, in embodiments, two Internet speed triggers, e.g., each corresponding to a different test server, may be coordinated such as to detect the location of the network problem. For example, when the Internet speed test result measured between an agent and the test server in
(6) In embodiments, when an agent has more than one broadband connection, the triggers for the broadband connections may be coordinated. For example, assuming that the speed tests are triggered for all broadband connections, the difference of the ratio of different speed test results may indicate some Internet speed throttling in one of the broadband connections.
In embodiments, the Internet speed test agents may coordinate with each other or they may be coordinated by a number of test servers. For example, a test server may receive speed test trigger(s) from local or remote agents and send speed test triggers to more than one of the agents that are connected to the same access network aggregation unit. In another example, an agent may send triggers to all agents connected to the same access network aggregation unit or CPE.
It is understood that there may be many possible ways to identify the agents connected to the same access network aggregation unit. For example, in embodiments, ICMP traceroute may be used to discover the host name of an adjacent network node. In another example, one may send LAN broadcast packets to discover agents that are connected to the same LAN.
As illustrated in
A number of controllers and peripheral devices may also be provided, as shown in
In the illustrated system, all major system components may connect to a bus 816, which may represent more than one physical bus. However, various system components may or may not be in physical proximity to one another. For example, input data and/or output data may be remotely transmitted from one physical location to another. In addition, programs that implement various aspects of the invention may be accessed from a remote location (e.g., a server) over a network. Such data and/or programs may be conveyed through any of a variety of machine-readable media.
Aspects of the present invention may be encoded upon one or more non-transitory computer-readable media with instructions for one or more processors or processing units to cause steps to be performed. It shall be noted that the one or more non-transitory computer-readable media shall include volatile and non-volatile memory. It shall be noted that alternative implementations are possible, including a hardware implementation or a software/hardware implementation. Hardware-implemented functions may be realized using application specific integrated circuits (ASICs), programmable arrays, digital signal processing circuitry, or the like. Accordingly, the terms in any claims are intended to cover both software and hardware implementations. Similarly, the term “computer-readable medium or media” as used herein includes software and/or hardware having a program of instructions embodied thereon, or a combination thereof. With these implementation alternatives in mind, it is to be understood that the figures and accompanying description provide the functional information one skilled in the art would require to write program code (i.e., software) and/or to fabricate circuits (i.e., hardware) to perform the processing required.
It shall be noted that embodiments of the present invention may further relate to computer products with a non-transitory, tangible computer-readable medium that have computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind known or available to those having skill in the relevant arts. Examples of tangible computer-readable media include, but are not limited to: magnetic media such as hard disks; optical media such as CD-ROMs and holographic devices; magneto-optical media; and hardware devices that are specially configured to store or to store and execute program code, such as ASICs, programmable logic devices (PLDs), flash memory devices, and ROM and RAM devices. Examples of computer code include machine code, such as produced by a compiler, and files containing higher level code that are executed by a computer using an interpreter. Embodiments of the present invention may be implemented in whole or in part as machine-executable instructions that may be in program modules that are executed by a processing device. Examples of program modules include libraries, programs, routines, objects, components, and data structures. In distributed computing environments, program modules may be physically located in settings that are local, remote, or both.
One skilled in the art will recognize that no computing system or programming language is critical to the practice of the present invention. One skilled in the art will also recognize that a number of the elements described above may be physically and/or functionally separated into sub-modules or combined together.
It will be appreciated to those skilled in the art that the preceding examples and embodiments are exemplary and not limiting to the scope of the present disclosure. It is intended that all permutations, enhancements, equivalents, combinations, and improvements thereto that are apparent to those skilled in the art upon a reading of the specification and a study of the drawings are included within the true spirit and scope of the present disclosure. It shall also be noted that elements of any claims may be arranged differently including having multiple dependencies, configurations, and combinations.
The present disclosure claims priority to U.S. Provisional Patent Application No. 62/624,475, entitled “BROADBAND COMMUNICATION LINK PERFORMANCE MONITORING METHOD FOR COMMUNICATION DEVICES,” naming as inventor Chan-Soo Hwang, and filed Jan. 31, 2018, and claims priority to U.S. Provisional Patent Application No. 62/756,032, entitled “BROADBAND COMMUNICATION LINK PERFORMANCE MONITORING METHOD FOR COMMUNICATION DEVICES,” naming as inventors Chan-Soo Hwang, Philip Bednarz, John Matthew Cioffi, Manikanden Balakrishnan, Carlos Garcia Hernandez, Lan Ke, and Sahand Golnarian, and filed on Nov. 5, 2018, and claims priority to the 371 International Application No. PCT/US2019/015837, entitled “SYSTEMS AND METHODS FOR BROADBAND COMMUNICATION LINK PERFORMANCE MONITORING,” naming as inventors Chan-Soo Hwang, John M. Cioffi, Philip Bednarz, Sahand Golnarian, Lan Ke, Carlos Garcia Hernandez, and Manikanden Balakrishnan, and filed on Jan. 30, 2019, which application is hereby incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2019/015837 | 1/30/2019 | WO | 00
Number | Date | Country
---|---|---
62756032 | Nov 2018 | US
62624475 | Jan 2018 | US