This disclosure relates to network analysis, and in particular to a high speed, low overhead determination of bandwidth between peer network devices.
High speed data networks form part of the backbone of what has become indispensable worldwide data connectivity. Within the data networks, network devices such as switching devices direct data packets from source ports to destination ports, helping to eventually guide the data packets from a source to a destination. Improvements in identifying available bandwidth will further enhance the communication capabilities of data networks.
The network 100 is not limited to any particular implementation or geographic scope. As just a few examples, the network 100 may represent a private company-wide intranet; a wide-area distribution network for cable or satellite television, Internet access, and audio and video streaming; or a global network (e.g., the Internet) of smaller interconnected networks. The data center 110 may represent a highly concentrated server installation 150 with attendant network switch and router connectivity 152. The data center 110 may support extremely high volume e-commerce, search engines, cloud storage and cloud services, streaming video or audio services, or any other types of functionality.
In the example in
At any given location, the gateway may connect to any number and any type of node. In the example of
It may be of particular importance or interest to determine the available bandwidth between any two peer devices communicating in the network 100. Further, it may be beneficial to do so in a manner that directly tests the bandwidth for data transfer between the two devices. That is, the bandwidth tests preferably are free of influence by surrounding concerns, such as transmission control protocol (TCP) processing or hypertext transfer protocol (HTTP) overhead and stack performance, and processor load or speed. The test architecture below achieves gigabit test rates, with little or no host CPU utilization. As a result, the CPU may remain dedicated to, e.g., running customer applications.
The user interface 209 and the input/output interfaces 206 may include a graphical user interface (GUI), touch sensitive display, voice or facial recognition inputs, buttons, switches, speakers and other user interface elements. Additional examples of the input/output interfaces 206 include microphones, video and still image cameras, headset and microphone input/output jacks, Universal Serial Bus (USB) connectors, memory card slots, and other types of inputs. The input/output interfaces 206 may further include magnetic or optical media interfaces (e.g., a CDROM or DVD drive), serial and parallel bus interfaces, and keyboard and mouse interfaces.
The system circuitry 204 may include any combination of hardware, software, firmware, or other logic. The system circuitry 204 may be implemented, for example, with one or more systems on a chip (SoC), application specific integrated circuits (ASIC), discrete analog and digital circuits, and other circuitry. The system circuitry 204 is part of the implementation of any desired functionality in the device 200. In that regard, the system circuitry 204 may include circuitry that facilitates, as just a few examples, connecting to a speed test control server, requesting a speed test, receiving in response the address of a data server, establishing a test connection with the data server, initiating bandwidth tests with the data server, and analyzing packet streams to determine test results.
The control server and data server may be any other devices in communication with the device 200, e.g., the control server 250 and the data server 252. There may be any number of control servers and data servers. Furthermore, a single device may implement the functionality of both a control server and a data server.
As just one example, the system circuitry 204 may include one or more processors 220 and memories 222. The memory 222 and storage devices 214, 216 store, for example, control instructions 224 and an operating system 226. The processor 220 executes the control instructions 224 and the operating system 226 to carry out any desired functionality for the device 200, including bandwidth determination functionality. The control parameters 228 provide and specify configuration and operating options for the control instructions 224, operating system 226, and other functionality of the device 200. As a few examples, the control parameters may include user datagram protocol (UDP) port numbers, internet protocol (IP) addresses, and test algorithm parameters such as packet lengths, numbers of iterations, latency thresholds, packet loss thresholds, and other parameters.
The server side of the architecture 300 includes, either logically or physically, a control server 302 and data servers, e.g., the data server 304. The control server 302 accepts connections from client devices, e.g., the client device 306 that will test bandwidth between the client device and a data server. As just a few examples, the client device 306 may be a cable modem, DSL modem, or other type of gateway device.
The control server 302 coordinates the test process with the client devices. The control server 302 may also manage the data servers which are responsible for sending the data traffic, for a downstream bandwidth test, or receiving data traffic, for an upstream bandwidth test. The control server 302, client device 306, and data server 304 may implement a control protocol, a specific example of which is provided in detail below.
With reference also to the logic flow in
The bandwidth test includes generating and sending test packets in data streams between the client device and the data server (914). The data streams are analyzed to measure the packet loss of the test connection, and to measure round trip delay (916). A test algorithm determines bandwidth based on the measurements across one or more data streams (918).
In some implementations, there is no global clock, and instead the device generating the test traffic generates and inserts latency packets into the test traffic flow. The latency packets may be identified by a predetermined payload bit pattern inserted by the device generating the test traffic. The receiving device identifies the latency packet and returns it to the device generating the test traffic. Because the latency packet has the original time stamp from generation, the device generating the test traffic may determine the round trip delay.
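The latency measurement described above can be sketched in a few lines. This is an illustrative sketch, not the implementation: the marker bytes, payload layout, and function names below are assumptions chosen for the example. The key point it demonstrates is that only the sender's clock is ever consulted, so no clock synchronization between the peers is needed.

```python
import struct
import time

# Hypothetical bit pattern marking a latency packet; any predetermined
# payload pattern the receiver can recognize would serve.
LATENCY_MAGIC = b"\xAB\xCD\xEF\x01"

def make_latency_packet() -> bytes:
    """Build a latency packet: the marker plus the sender's transmit timestamp.

    The timestamp is opaque to the receiver, which simply echoes the
    packet back unchanged.
    """
    return LATENCY_MAGIC + struct.pack("!d", time.monotonic())

def is_latency_packet(payload: bytes) -> bool:
    """Receiver-side check: does this payload carry the latency marker?"""
    return payload.startswith(LATENCY_MAGIC)

def round_trip_delay(echoed: bytes) -> float:
    """Compute round trip delay from an echoed latency packet.

    Because the echoed packet still carries the original timestamp,
    the generating device can compute the delay against its own clock.
    """
    (sent_at,) = struct.unpack("!d", echoed[len(LATENCY_MAGIC):])
    return time.monotonic() - sent_at
```

A receiving device would call `is_latency_packet` on each payload, return matches to the sender, and count and drop everything else.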
Stated in another manner, the architecture 300 partitions the bandwidth test within the same traffic flow, e.g., the UDP test packet stream. In one aspect, the traffic flow includes the test packets that determine the bandwidth of the link. The receiving device may simply count and drop the test packets without any further substantial processing. In another aspect, the traffic flow includes special packets, e.g., the latency packet, on which to perform more specific processing, e.g., receive, recognize, and return to the sender.
Note also that the control server 302 may establish control connections to the data servers, e.g., the control connection 316. The control server 302 may use the control connection 316 to manage the data servers. For instance, the control connection 316 may convey diagnostic information to the control server 302, such as the load and the number of ongoing bandwidth tests handled by the data server.
In the architectures 300 and 400, the bandwidth test operates in connection with packet generator functionality and packet analyzer functionality. Either may be a hardware or software implementation. The packet generator functionality generates test packets that form the test traffic. The packets have a configured payload length and are sent for a configured duration at a specific test rate to test across a range of possible bandwidths. The packet analyzer functionality receives the test packets and updates statistics that reflect bandwidth and latency.
In some implementations, the packet generator and packet analyzer implement packet snooping to facilitate generating test packets, and recognizing and receiving test packets. The packet generator and packet analyzer may use address information including a predetermined source IP address and port number, and a predetermined destination IP address and port number.
The packet generator may be in the datapath, e.g., a network processor on an Ethernet interface in the communication interface 202. When the address information is established, the packet generator may listen on all interfaces for the different communication channels flowing through the communication interface. The packet generator may thereby recognize an initial packet addressed with the address information. After detecting the initial packet, the packet generator may determine packet characteristics (e.g., payload length) of the initial packet, and apply the packet characteristic as a template from which to generate a data flow of bandwidth test packets to the test partner. That is, the initial packet is a sample from which the packet generator generates the test packets in the bandwidth test traffic flow. Thus, the packet generator, e.g., at a network processor, may offload the task of generating test packets for bandwidth testing from packet circuitry that generated the initial packet.
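The snoop-and-template behavior can be sketched as follows. This is a minimal illustration of the control logic only; packets are modeled as plain dictionaries, and all field and function names are assumptions for the example, not a real network processor API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SocketTuple:
    """Address information the packet generator is configured to watch for."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

def matches_initial_packet(configured: SocketTuple, packet: dict) -> bool:
    """Return True when a snooped packet carries the configured address info."""
    return (packet["src_ip"], packet["src_port"],
            packet["dst_ip"], packet["dst_port"]) == (
        configured.src_ip, configured.src_port,
        configured.dst_ip, configured.dst_port)

def generate_test_flow(template: dict, count: int) -> list:
    """Clone the initial packet's characteristics (e.g., payload length)
    into a flow of sequenced bandwidth test packets."""
    return [{**template, "seq": i} for i in range(count)]
```

In this sketch, once `matches_initial_packet` fires on a snooped packet, that packet becomes the `template` from which the test flow is generated, offloading further generation from the packet circuitry that produced it.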
The “packet circuitry” may generate the initial packet used as a template. The packet circuitry may be a host/system CPU, with the offload done from the host CPU to a network processor, e.g., on a network card, to generate the test packet stream.
The packet analyzer may perform a similar analysis. That is, the packet analyzer may not know how the test packets will be received (e.g., the specific IP address), though it may have a preconfigured socket to which the packet analyzer is bound. Accordingly, the packet analyzer may watch all interfaces for the preconfigured socket to receive a matching packet. The packet analyzer may thereby determine the test packet characteristics. Note also that the packet analyzer may examine the packet payload to identify special packets, such as the latency packet. The packet analyzer may pass special packets up the stack for processing, while simply updating statistics for and dropping ordinary test packets.
The client device 506 communicates with a unified server 516 (e.g., a server computer system). The unified server 516 also includes support for communication sockets 518 and network device drivers 520. A bandwidth test application 522 also runs on the unified server 516.
For the upstream bandwidth test 502, the bandwidth test application 508 initiates the bandwidth test, including configuring the packet accelerator 514 with address and socket information bound to the bandwidth test process. The bandwidth test application 508 may send an initial packet that the packet accelerator 514 detects to use as a template, from which it generates the test packets in the bandwidth test traffic flow. The test packets traverse the network, and the unified server 516 receives them and passes them to the packet analyzer in the bandwidth test application 522. The bandwidth test application 522 executes an available test algorithm on the test packets, determines results, and may send the results back to the client device 506.
For the WAN downstream test 504, the client device 506 requests the data server functionality in the unified server 516 to generate a bandwidth test traffic flow. In the unified server 516, the bandwidth test application 522 generates the test packets and sends them through the network to the client device. The packet analyzer in the packet accelerator 514 at the client device 506 receives the packets and executes an available test algorithm to determine bandwidth test results. The packet accelerator 514 communicates the bandwidth test results to the bandwidth test application 508. Note that the packet analyzer in the packet accelerator 514 may listen on the bound bandwidth test port across all interfaces to find the test packets.
The unified server 606 communicates with a client device 616 (e.g., a personal computer). The client device 616 also includes support for communication sockets 618 and network device drivers 620. A bandwidth test application 622 also runs on the client device 616.
In the upstream bandwidth test 602, the packet generator in the client device 616 generates the test packets. The test packets in the bandwidth test traffic flow reach the packet analyzer in the packet accelerator 614. The packet analyzer runs an available test algorithm on the received test packets and communicates test results back to the client device 616.
For the downstream bandwidth test 604, the packet accelerator 614 in the unified server 606 generates the test packets after having captured the characteristics of an initial packet sent by the bandwidth test application 608. The test packets form a bandwidth test traffic flow which reaches the packet analyzer implemented in the bandwidth test application 622. The packet analyzer runs an available test algorithm on the received test packets and determines the test results.
Test algorithms may be configured to perform a wide range of analyses. The test algorithm attempts to find the maximum bandwidth that passes one or more pre-configured bandwidth tests. As one specific example, the goal of the bandwidth test may be to find ‘Good Put’ (GP), where GP=Maximum Bandwidth@Packet Loss<Maximum Packet Loss. Expressed another way, GP is an estimate of the maximum bandwidth that can be achieved with acceptable packet loss (which may be 0%). The test algorithm may perform a series of test steps until the GP is determined. In each test step, the test algorithm may analyze a given traffic flow of test packets. The result of the current test step may determine whether a new test step is executed, and another traffic flow generated for analysis. The test algorithm may report results to the client device, such as the GP in Kbps, the packet loss (e.g., a number or percentage of packets), and the round trip delay (e.g., average round trip delay in μsec).
All of the test algorithm parameters may be dynamically set, e.g., as user-configurable parameters, or may be fixed and pre-determined. In one implementation, the test algorithm is a binary search of an initial bandwidth range for a specific number of search steps. The binary search may begin with a specified maximum bandwidth, and end when: the specified maximum number of test steps is reached, or the traffic flow bandwidth equals the maximum bandwidth.
In the example of
Another example test algorithm uses the measured packet loss of the current test step to compute the next traffic flow bandwidth. For example, the test algorithm may implement:
If current Test Step packet loss >= 15%
    Next Test Stream Bandwidth = Current Test Stream Bandwidth * 85%
The test ends when, for instance:
Maximum Number of Steps is reached, OR
Test Stream Bandwidth equals Maximum Bandwidth
This test algorithm often converges faster than a binary search, e.g., in two test steps.
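The loss-driven update above can be expressed as a small helper. This sketch mirrors the 15% threshold and 85% backoff figures from the example but exposes them as parameters; how the algorithm behaves when a step passes is left to the surrounding search and is not specified here.

```python
def next_stream_bandwidth(current_bw: float, packet_loss: float,
                          loss_threshold: float = 0.15,
                          backoff: float = 0.85) -> float:
    """Loss-driven bandwidth update from the example above.

    When the measured packet loss of the current test step meets or
    exceeds the threshold (15% in the example), scale the next test
    stream's bandwidth down by the backoff factor (85% in the example).
    Otherwise, keep the current bandwidth.
    """
    if packet_loss >= loss_threshold:
        return current_bw * backoff
    return current_bw
```

Because the backoff is proportional to the current bandwidth rather than to the remaining search interval, a heavily lossy first step immediately pulls the next step far down, which is why this rule can converge in as few as two steps.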
The test algorithm may also address network buffering, which can skew bandwidth measurements because it artificially reduces packet loss. As one example, the bandwidth algorithm may run each test step for a longer time to reduce buffering interference, e.g., three or more seconds per test step. As another example, the algorithm may use both packet loss and round-trip delay to qualify successful/failed test steps. In some implementations, the typical round-trip latency is obtained prior to generation of the test streams. Then, at each test step the test algorithm compares the obtained round-trip latency against the typical latency. If the obtained latency is significantly higher (above a predetermined threshold) than the typical latency, the test step is voided and a new test step is issued.
For instance, the test algorithm may implement a hybrid binary search:
Good Put=Maximum Bandwidth@Packet Loss<Maximum Packet Loss AND Round-Trip Delay<Minimum Round-Trip Delay+Margin (%).
The hybrid binary search implements a round trip delay filter to further determine when a particular test step has passed or failed. Adding the round trip delay filter may allow each test step to run for a shorter duration, e.g., 1 second.
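The hybrid pass/fail criterion can be sketched as a single qualifier function. This is an illustration under one reading of the formula above, assuming the margin is a percentage applied to the minimum (baseline) round trip delay; the names are chosen for the example.

```python
def hybrid_step_passes(packet_loss: float, round_trip_delay: float,
                       max_packet_loss: float,
                       min_round_trip_delay: float,
                       margin_pct: float) -> bool:
    """Qualify one test step per the hybrid criterion above.

    The step passes only when packet loss is under the maximum AND
    the round trip delay stays within margin_pct percent of the
    minimum round trip delay, filtering out steps whose low loss is
    an artifact of network buffering.
    """
    delay_limit = min_round_trip_delay * (1.0 + margin_pct / 100.0)
    return (packet_loss < max_packet_loss
            and round_trip_delay < delay_limit)
```

A binary search using this qualifier in place of a loss-only check is what allows each test step to run for the shorter, e.g., 1 second, duration.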
The test algorithm may also be run at various intervals, e.g., periodically, pseudo-randomly, or at another interval, to detect increasing or decreasing delay or packet loss after a first successful test is run and a maximum bandwidth value is selected.
Additionally or alternatively, a comparison between multiple test steps may be used (e.g., a search). For example, the round-trip latency may be tracked along with packet loss. In some cases, as the test step bandwidth is reduced packet loss may hold steady, but round-trip latency may continue to decrease as the bandwidth is decreased between test steps. Similarly, round-trip latency may increase as the bandwidth between test steps increases. Once test step bandwidth reaches a value that results in little or no network buffering, the round-trip delay and packet loss may not necessarily change as the bandwidth between test steps is changed, e.g., a bandwidth level for which the derivative of the round-trip delay AND of the packet loss with respect to test-step bandwidth is zero (or below a predetermined threshold). Thus, in some implementations, the test algorithm may search for the maximum bandwidth value for which there is a pre-determined differential relationship to the packet loss or round-trip delay level, e.g., instead of a particular absolute packet loss or absolute round-trip delay level.
As another example implementation of a system that performs bandwidth testing, the system may include a communication interface configured to communicate with a speed test control server and communicate with a data server. The system includes test control circuitry configured to establish a transmission control protocol (TCP) control connection with the speed test control server, request a speed test from the speed test control server, and responsive to the request for the speed test, receive a data server address from the speed test control server.
The system also establishes a user datagram protocol (UDP) test connection with the data server at the data server address and initiates a bandwidth test between the system and the data server. In the system, analysis circuitry is configured to receive a test data stream from the data server, separate out from the test data stream latency packets and non-latency packets, and determine a round trip latency from the data server to the system with the latency packets. The analysis circuitry may also determine a packet loss with the non-latency packets and determine whether the test data stream provides a maximum bandwidth estimate in view of the round trip latency and the packet loss. In that regard, the analysis circuitry is configured to compare the packet loss against a pre-determined packet loss threshold and compare the round trip latency against a pre-determined latency threshold.
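The analysis circuitry's stream processing can be sketched as a pure function. This is an illustrative model only: packets are dictionaries, latency packets are distinguished by carrying a measured `rtt` field, and the expected packet count is assumed known from the test parameters; none of these names come from the source.

```python
def analyze_test_stream(packets, expected_count: int,
                        loss_threshold: float,
                        latency_threshold: float) -> dict:
    """Separate a received test stream into latency and non-latency
    packets, derive round trip latency and packet loss, and compare
    each against its pre-determined threshold.
    """
    latency_pkts = [p for p in packets if "rtt" in p]
    test_pkts = [p for p in packets if "rtt" not in p]

    # Round trip latency comes only from the echoed latency packets.
    rtt = (sum(p["rtt"] for p in latency_pkts) / len(latency_pkts)
           if latency_pkts else float("inf"))
    # Packet loss comes only from the non-latency test packets.
    loss = 1.0 - len(test_pkts) / expected_count

    return {
        "round_trip_latency": rtt,
        "packet_loss": loss,
        "passed": loss < loss_threshold and rtt < latency_threshold,
    }
```

A test data stream then "provides a maximum bandwidth estimate" when `passed` holds at the highest bandwidth tried by the search algorithm.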
In some implementations, the client device implements multiple modes of operation. For example, the client may have two modes of operation:
First Mode: Send: In the example send mode, a test stream (e.g., a single test stream) with specified parameters may be transmitted and the obtained results are reported. This mode may allow an operator to spot check an interface's bandwidth, or to implement a custom GP search algorithm.
Second Mode: Bandwidth: the client device and data server autonomously exchange one or more test streams based on the specified parameters. The exchange continues until either GP is determined, or the maximum number of test steps is reached. The obtained results are then reported.
Examples of client device parameters include:
The control server application parameters may vary widely depending on the implementation, and as one example may include:
The architectures described above may support streamlined implementation of the packet generator and packet analyzer in any of a wide variety of communication systems such as those supported by Broadcom devices BCM63138, BCM63168, BCM6838, and/or other devices. The architectures may provide gigabit speeds for testing purposes with little or no load on the host CPU by offloading bandwidth testing to hardware accelerators.
In the architectures described above, the underlying network stack, such as a software stack or hardware stack, may be responsible for building and transmitting packets to networking interfaces using the payloads sent to the socket by the test application. The network stack may also map packets received at networking interfaces to the sockets used by the architectures for reception.
A UDP socket may be defined by the following tuple: source IP address, destination IP address, UDP source port, and UDP destination port. However, UDP sockets may not necessarily provide information about other network layers such as Ethernet, virtual local area network (VLAN), point-to-point protocol over Ethernet (PPPoE), IP tunnels, or other network layers. In some cases, the architectures may use information from other network layers to identify test packets. The architectures may abstract this information from the socket into the underlying network stack.
In some cases, packet generators and packet analyzers are not necessarily integrated with the software network stack. To address this lack of integration, the full packet contents may be explicitly configured in order to allow generation of test stream packets and classification of received traffic into test stream packets. However, this information may not necessarily be available to the client device and the data server due to the socket nature of their connections.
To make this information available the architectures may define an application programming interface (API) for the hardware running the packet generator and packet analyzer. The architecture may use this API to configure the UDP socket tuple obtained once the UDP connection is established. The tuple is used by the hardware running the stream generator to detect test packet headers for stream packets transmitted by the test application to network interfaces. The tuple is used by the hardware running the packet analyzer to capture test stream packets received from the network interfaces.
In hardware-based packet generator implementations, the client device or bandwidth test application may set up the tuple for the packet generator and enable the packet generator. The packet generator may examine packets transmitted by the networking stack to search for the tuple. The client device or bandwidth test application may transmit a test stream packet to the socket, with the packet containing the packet generator configuration parameters in the packet payload.
The packet generator detects the test stream packet and intercepts it. The packet generator may extract the configuration parameters from the packet payload and apply them as a template for test packets. The intercepted packet may also include the packet headers applied through the stack for packet generation. Accordingly, the packet generator may use the intercepted packet, including packet headers, as the template for generating the test packets in the traffic flow to the same network interface and queue that were the destination of the intercepted packet.
In hardware-based packet analyzer implementations, the client device or bandwidth test application may set up the tuple for the packet analyzer and activate the packet analyzer. The packet analyzer may examine packets received from the network interfaces. The packet analyzer may drop detected test packets and responsively update test statistics. Once the traffic flow finishes, the client device or bandwidth test application obtains the packet analyzer statistics through an API.
In the example binary search algorithm, the test stream bandwidth at individual steps is adjusted following a binary search progression. An example binary search algorithm pseudo code implementation is shown below:
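A minimal sketch of such a binary search progression, assuming a step passes when its measured packet loss stays under the configured maximum (the hybrid variant would additionally apply the round trip delay filter). The `measure_loss` callback and all names are illustrative.

```python
def binary_search_good_put(measure_loss, max_bandwidth: float,
                           max_packet_loss: float, max_steps: int) -> float:
    """Binary search over [0, max_bandwidth] for the highest test stream
    bandwidth whose measured packet loss stays under max_packet_loss.

    `measure_loss(bw)` runs one test step at bandwidth `bw` and returns
    the measured packet loss fraction. Per the description above, the
    search begins at the specified maximum bandwidth and ends when the
    maximum number of steps is reached or the stream bandwidth equals
    the maximum bandwidth.
    """
    low, high = 0.0, max_bandwidth
    good_put = 0.0
    bandwidth = max_bandwidth          # first step at the specified maximum
    for _ in range(max_steps):
        loss = measure_loss(bandwidth)
        if loss < max_packet_loss:
            good_put = max(good_put, bandwidth)
            low = bandwidth            # passed: search the upper half
        else:
            high = bandwidth           # failed: search the lower half
        if bandwidth == max_bandwidth and loss < max_packet_loss:
            break                      # link sustains the full test range
        bandwidth = (low + high) / 2.0
    return good_put
```

Each loop iteration corresponds to one test step, i.e., one generated traffic flow analyzed by the packet analyzer.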
In some implementations, SSA tests may run in a service-provider managed network. In some cases, service-provider tests may be subject to network level quality of service (QoS). To avoid disruption of managed services, SSA test streams may be prioritized at a higher level than “best effort” traffic such as internet browsing, and at a lower level than traffic related to services that are managed by the service provider, such as IPTV or VoIP. In this case, the bandwidth measurements will indicate the bandwidth available for unmanaged services. Additionally or alternatively, tests may be performed at a priority higher than any other services, and the calculated connection data may reflect the capabilities of the entire system.
In some cases, managed services may be disabled in the system under test. In such cases, QoS enforcement may not necessarily affect the SSA bandwidth measurements. For example, if no managed services are active at the time of measurement, the SSA measurement may report the total bandwidth available.
An example protocol specification is provided below. However, other protocol implementations may be used.
The methods, devices, processing, and logic described above may be implemented in many different ways and in many different combinations of hardware and software. For example, all or parts of the implementations may be circuitry that includes an instruction processor, such as a Central Processing Unit (CPU), microcontroller, or a microprocessor; an Application Specific Integrated Circuit (ASIC), Programmable Logic Device (PLD), or Field Programmable Gate Array (FPGA); or circuitry that includes discrete logic or other circuit components, including analog circuit components, digital circuit components or both; or any combination thereof. The circuitry may include discrete interconnected hardware components and/or may be combined on a single integrated circuit die, distributed among multiple integrated circuit dies, or implemented in a Multiple Chip Module (MCM) of multiple integrated circuit dies in a common package, as examples.
The circuitry may further include or access instructions for execution by the circuitry. The instructions may be stored in a tangible storage medium that is other than a transitory signal, such as a flash memory, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM); or on a magnetic or optical disc, such as a Compact Disc Read Only
Memory (CDROM), Hard Disk Drive (HDD), or other magnetic or optical disk; or in or on another machine-readable medium. A product, such as a computer program product, may include a storage medium and instructions stored in or on the medium, and the instructions when executed by the circuitry in a device may cause the device to implement any of the processing described above or illustrated in the drawings.
The implementations may be distributed as circuitry among multiple system components, such as among multiple processors and memories, optionally including multiple distributed processing systems. Parameters, databases, and other data structures may be separately stored and managed, may be incorporated into a single memory or database, may be logically and physically organized in many different ways, and may be implemented in many different ways, including as data structures such as linked lists, hash tables, arrays, records, objects, or implicit storage mechanisms. Programs may be parts (e.g., subroutines) of a single program, separate programs, distributed across several memories and processors, or implemented in many different ways, such as in a library, such as a shared library (e.g., a Dynamic Link Library (DLL)). The DLL, for example, may store instructions that perform any of the processing described above or illustrated in the drawings, when executed by the circuitry.
Various implementations have been specifically described. However, many other implementations are also possible.
This application claims priority to U.S. Provisional application No. 62/182,675, filed 22 Jun. 2015, and to U.S. Provisional application No. 62/051,356, filed 17 Sep. 2014, each of which is entirely incorporated by reference.