The present invention relates to network security generally, and in particular, to systems and methods for distinguishing legitimate network client connections from proxy connections and for assessing a risk associated with such proxy connections.
Enterprise software, crypto exchanges, traditional and emergent financial technology providers, and streaming content companies face an ever-increasing onslaught of online fraud and cyber-attacks on their critical systems, private data and content. Among the attack vectors are IP address spoofing and geolocation spoofing (geoshifting) enabled by proxied network access.
Proxy technologies provide online privacy and security for network client connections to remote websites, applications, databases, and content. A proxy often provides a secure encrypted connection from the client to a remote proxy server that then acts on behalf of the client to forward packets to the destination endpoint. Example proxy technologies include, but are not limited to, Virtual Private Networks (VPN) and The Onion Router (TOR).
A reputable, non-nefarious client may use a proxy simply for privacy and security. However, nefarious clients may exploit proxies to access geographically restricted data, content, or streaming media from outside the geographically restricted region. They may also use proxies to exploit regional price differences, i.e., to access content available at a lower price in a geographic region other than the user's actual geolocation. In addition to accessing streaming content, proxies can be used to evade geographical restrictions and regulations governing financial systems, cryptocurrency trading, gaming and lotteries, to access confidential data records or personal identifying information, and to subvert or skirt data privacy protection regulations such as the European Union's General Data Protection Regulation (GDPR). Malicious attacks on critical infrastructure, such as the 2021 Colonial Pipeline ransomware attack, have been perpetrated in part by accessing the victim's systems using a VPN with leaked login credentials.
Proxies such as VPN and TOR provide a secure encrypted connection for the client's network traffic to provide online privacy and security. The encrypted connection acts as a secure tunnel from the client to an endpoint. Typical scenarios include a laptop or mobile device client accessing the Internet. A VPN client running on the client device establishes a secure encrypted connection to a remote VPN server. During the connection process, the VPN client is provided an IP address and the IP address of the associated VPN server. Once the VPN connection is established, the client presents its packets to the VPN client, where each packet is encrypted and then encapsulated inside an IPsec packet. The packet is then forwarded to the remote VPN server, typically via the Internet. The VPN connection is also referred to as a VPN tunnel. These tunnels are very useful when a client is accessing the Internet from a public Wi-Fi network or access point. Since the client's packets are encrypted, the user is protected from inadvertently sharing passwords or even the websites or IP locations accessed. Upon arrival at the VPN server, the connection's packets are decrypted and forwarded to the intended recipient, e.g., a website, content distribution services, application, etc. In some cases, for even more privacy and security, multiple tunnels can be chained together, e.g., TOR, “Double-VPN”, with each tunnel encrypted such that there is no one VPN server that has full knowledge of the full path or of the source IP and destination IP addresses.
Note that the VPN client's IP address is provided by the VPN service and is not associated with the client's original (untunneled) IP address. The VPN service may register a pool of IP addresses with organizations such as the Internet Assigned Numbers Authority (IANA) or the Asia Pacific Network Information Center (APNIC). IP address registration requires customer name and city/region served. By registering in multiple countries, or by obtaining access to IP addresses registered in multiple countries, VPN services may appear to be in any continent, region, or city. Thus, the client using a VPN can take on the apparent location of any desired location that is supported by the VPN service. For instance, a client physically located in San Francisco can choose to connect to a VPN service with a server and IP address registered to Germany, France, or Italy, as desired. This IP geolocation spoofing is a major selling point of VPN connectivity and easily defeats today's IP location lookup database services.
Furthermore, VPN providers not only can spoof the IP address geolocation, but they also often physically place VPN servers in the spoofed location. This gives the VPN client the ability to have a secure, encrypted connection that provides both IP geolocation spoofing and the ability to make the connection appear as though it originated physically from within the country where the VPN server resides.
A related feature of VPN tunnels is that even though the tunneled packet may traverse many routers on its journey from the VPN client to the VPN server, the Time-To-Live (TTL) field in the client's tunneled IP packet is not decremented during its transit through the tunnel. Thus, when the packet is decrypted by the remote VPN server, the forwarded packet has a TTL that appears much closer to the target country than to the client's originating country.
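By way of non-limiting illustration, the effect of the undecremented TTL can be sketched as follows. The sketch estimates the number of hops a packet has traversed from the TTL observed at the endpoint. It assumes the sender started from one of a few commonly used initial TTL values (64, 128, or 255); that list, the function name `estimate_hops`, and the example hop counts are assumptions made for illustration and are not taken from the embodiments described herein.

```python
# Commonly used initial TTL values (an assumption; operating systems differ).
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl: int) -> int:
    """Estimate hops traversed, assuming the smallest common initial TTL
    that is greater than or equal to the observed TTL."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be in the range 1..255")
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl


# Untunneled client 14 router hops from the endpoint: TTL 64 arrives as 50.
direct_hops = estimate_hops(50)

# Same client through a tunnel whose egress is 3 hops from the endpoint.
# The inner TTL is only decremented after the tunnel, so 64 arrives as 61.
tunneled_hops = estimate_hops(61)
```

In this hypothetical example the tunneled flow appears to originate only a few hops from the endpoint, which is the behavior described above.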
In addition, the system will often see better throughput and lower latency by having the server placed in or near to the target country, as opposed to routed elsewhere and then to the endpoint in the target country. Even further, some VPN providers implement TCP optimization functions to decrease round-trip latency and improve throughput. In summary, VPN proxy services have many methods that make it exceedingly difficult for the endpoint to determine if the connection is via a proxy and whence it originated.
Incumbent VPN detection methods typically use a database of information that has been associated with a particular IP address or set of IP addresses. The database may be a local copy provided by the database vendor or it may be accessible on the web via an API. In a typical use case, a web-based application may use the IP address of an incoming network connection to look up geolocation and proxy risk information for that IP address.
Incumbent IP database vendors build their databases by accumulating and associating information from a number of sources for a particular IP address or set or block of IP addresses. In building their databases, publicly available IP registration databases can be consulted. Domain Name Service (DNS) servers provide up to date information. In addition, many IP addresses can be scraped from websites. In many cases, VPN vendors provide configuration files that include their proxy IP address(es). In addition, IP database vendors may subscribe to VPN providers, make connections dynamically, and in so doing, learn the IP addresses.
Drive testing, wireless network sniffing, and other more manual ways may also be used to gather additional or more localized information about IP locations. These various sources may then be curated into a database that can then be consulted for geolocation and proxy determination.
VPN providers know, however, that their information can be gleaned over time, and many have adopted a “cat and mouse” game to get around IP database VPN detection. Recent advances in proxy technology include ephemeral IP address lease, Residential VPN, and Proxy over VPN. Proxy providers can use ephemeral IP addresses to simply rotate through pools of IP addresses that they have acquired, and some of the more advanced providers change IP addresses on a per-session basis, or even on a timed basis. Consequently, an IP address database quickly becomes stale and inaccurate. Now that IP addresses can be rotated per session or every few minutes, it is virtually impossible for a database to keep up to date with the proxy providers.
Residential VPN is an increasingly prevalent technique where a residential subscriber wittingly, or unwittingly, operates a VPN server in their home system. Potentially nefarious remote users connect to the residential VPN server and thus share the IP address and the bandwidth of the residential subscriber. Traditional IP address database systems cannot distinguish between the legitimate residential subscriber and the potentially nefarious VPN user and cannot block the access of the one without impacting the other.
Prior art IP database VPN detection cannot distinguish a legitimate subscriber from a Residential VPN subscriber since the IP address is the same for both. They can either mark the full IP address as suspect or not. This is a problem for the content provider as they typically do not want to block the legitimate subscriber, resulting in allowing both legitimate and VPN subscribers.
Thus, there is a need for improved systems and methods that can distinguish individual network flows that are legitimate from those flows that are via VPN.
The present invention addresses, among other things, the above-described problems in the prior art related to detection of proxied network connections and flows.
According to embodiments of the present invention, systems and processes for Detection of Proxied Network Connections (DPNC) provide means to detect the presence of a proxy in the network connection path and to provide an indication or risk score.
According to embodiments of the present invention, proxy detection and generation of a risk score can be used to inform a policy to take action, e.g., to block, challenge or allow the connection. In some cases, the detection and risk score may be used to inform the user that the suspicious activity has been seen while still allowing the connection.
According to embodiments of the invention, rather than relying on the IP address of a connection, DPNC observes packets at various points located between the VPN server and endpoints distal to the client. Packet observation can be done in a number of ways, including, passive or active optical or electrical taps, in-line processing, port mirroring, Switched Port Analyzer (SPAN) or using a terminated network connection. Observed packets are processed by systems acting as “network sensors”. Observation points may be located at one or more intermediate points between the server and the endpoints, co-located with the application server(s) endpoint(s), or the dedicated network sensors.
According to embodiments of the present invention, observed packets may be processed or analyzed in real-time, near real-time or may be post-processed from packet captures stored in memory or in non-volatile storage.
According to embodiments of the present invention, a subset of the packet, called “a packet slice” can be used to process packets. The slice may be further processed into metadata or organized into a data structure with the desired fields of the packet.
According to embodiments of the present invention, a number of packet fields that include IP address, IP protocol and transport layer port fields, may be associated with each other to create a unique session ID that corresponds to a particular network flow, or set of related flows, between a client and one or more endpoints.
According to embodiments of the present invention, a system for detecting proxied network connections (DPNC) may include a plurality of network sensors, a software module and a DPNC server. The plurality of network sensors are coupled with an electronic data network and configured to receive network traffic including packets for a network connection, to extract metadata features from said packets, and to transmit the extracted metadata features. These features may be stored in a database and/or streamed directly to the DPNC server. The software module is adapted to be initiated by a client device, to generate network traffic from the client device to one or more of the plurality of network sensors. The DPNC server is configured to receive the extracted metadata features, to analyze the extracted metadata features and to determine whether the network connection was proxied based on a comparison of at least one of the extracted metadata features with an expected metadata feature, and to generate an indicator indicating a likelihood that said network connection is proxied.
The indicator may be a score and can allow another system to act on the score. For example, the output from the DPNC server could be messaged to an application server which could allow or block access based on the score.
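By way of non-limiting illustration, a receiving system such as an application server might map the score to one of the actions mentioned above (block, challenge, or allow). The sketch below assumes the score is a likelihood in the range 0 to 1; that range, the function name `policy_action`, and both threshold values are hypothetical and are chosen only for illustration.

```python
def policy_action(score: float,
                  challenge_at: float = 0.5,
                  block_at: float = 0.9) -> str:
    """Map a proxy-likelihood score in [0, 1] to a policy action.

    The thresholds are hypothetical; an operator would choose them
    according to their own tolerance for false positives.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in the range 0..1")
    if score >= block_at:
        return "block"
    if score >= challenge_at:
        return "challenge"
    return "allow"
```

For example, under these assumed thresholds a score of 0.95 would be blocked, a score of 0.6 would be challenged, and a score of 0.1 would be allowed.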
According to embodiments of the present invention, the server may be configured to determine whether said network connection was proxied based at least on a calculation of latency for the packets relating to said network connection and a comparison of the calculated latency with an expected latency.
According to embodiments of the present invention, the server may be configured to use a single stage or a multi-stage classification with a neural network or decision tree to determine whether said network connection was proxied.
According to embodiments of the present invention, a method for detecting proxied network connections (DPNC) is provided. The method may include a step of providing a software module configured to be initiated by a client device. When the software module is initiated, network traffic is generated from the client device to a plurality of network sensors coupled with an electronic data network. The network sensors receive network traffic including packets for a network connection, extract metadata features from said packets, and transmit the extracted metadata features. These features may be stored in a database and/or streamed directly to a DPNC server. The DPNC server receives the extracted metadata features, analyzes the extracted metadata features, and determines whether the network connection was proxied based on a comparison of at least one of the extracted metadata features with an expected metadata feature. An indicator indicating a likelihood that said network connection is proxied can be generated.
Further features and advantages of the present invention will become apparent following review of the detailed description set forth below along with the accompanying figures.
The novel features of the embodiments described herein are set forth with particularity in the appended claims. The embodiments, however, both as to organization and methods of operation may be better understood by reference to the following description, taken in conjunction with the accompanying drawings as follows.
The following descriptions are presented to enable any person skilled in the art to create and use apparatuses, systems and methods described herein.
Reference will now be made in detail to several embodiments, including embodiments showing exemplary implementations of improved systems and methods for the detection of proxied network connections. Wherever practicable, similar or like reference numbers may be used in the figures and may indicate similar or like functionality.
In various embodiments, methods, systems, and apparatus for detection of proxied network connections are disclosed. In some embodiments, the systems and methods for detection of proxied network connections comprise packet inspection, metadata generation, and artificial intelligence for classification and detection, with a cloud-based API to indicate the presence of a proxy. In some embodiments, the systems and methods for detection of proxied network connections may provide VPN decisions and scoring in real-time. In some embodiments, the systems and methods for detection of proxied network connections may provide VPN decisions and scoring in forensic detection from packet/stream captures.
A fundamental weakness of prior IP address database solutions is that they rely on the IP address as the key to look up a result in a database. Instead of relying on the IP address of the connection, DPNC observes packets at various vantage points located between the VPN server and endpoints distal to the client. Packet observation can be done in a number of ways, including passive or active optical or electrical taps, in-line processing, port mirroring, Switched Port Analyzer (SPAN) or using a terminated network connection. Observed packets are processed by systems or components acting as “network sensors”. Observation points may be located at one or more intermediate points between the server and the endpoints, co-located with the application server(s) endpoint(s), or the dedicated network sensors.
Observed packets may be processed or analyzed in real-time (sub-second), near real-time (within seconds of transit) or may be post-processed from packet captures (packet copies) that are stored in memory, or in non-volatile storage, e.g., Solid State Drive or magnetic media. Note that the full content of each packet is not necessarily required. The full packet or a subset of the packet, called a “packet slice”, is acceptable as long as the slice retains the required information for processing. The slice need not be contiguous bytes or bits of the packet, but can be selected fields such as IP addresses, IP Time to Live (TTL), and TCP/UDP headers. The slice may be further processed into metadata or organized into a data structure with the desired fields of the packet. Regardless of observation method or location, the DPNC system requires packet slices to be timestamped at the time of observation. Timestamps need not be globally synchronized but are expected to be based on a suitably stable clock counter with sub-millisecond resolution. The timestamp is not part of the packet itself; rather, it is metadata associated with the packet slice. Similarly, the packet length is not part of the packet per se but is metadata that is associated with the packet slice.
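By way of non-limiting illustration, a packet slice of the kind described above could be organized as a small data structure holding selected header fields together with the timestamp and packet length metadata. The sketch below handles only IPv4 packets carrying TCP or UDP; the class name `PacketSlice`, the function name `slice_ipv4_packet`, the choice of fields, and the example addresses are assumptions made for illustration.

```python
import socket
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketSlice:
    timestamp_ns: int  # metadata: observation time from a local stable clock
    length: int        # metadata: observed packet length in bytes
    src_ip: str
    dst_ip: str
    protocol: int      # IP protocol number (6 = TCP, 17 = UDP)
    ttl: int
    src_port: int
    dst_port: int


def slice_ipv4_packet(packet: bytes, timestamp_ns: int) -> PacketSlice:
    """Extract selected IPv4 and TCP/UDP header fields; payload is not kept."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    ttl, protocol = packet[8], packet[9]
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    # TCP and UDP headers both begin with 16-bit source and destination ports.
    if protocol not in (6, 17) or len(packet) < ihl + 4:
        raise ValueError("expected TCP or UDP with port fields present")
    src_port, dst_port = struct.unpack_from("!HH", packet, ihl)
    return PacketSlice(timestamp_ns, len(packet), src_ip, dst_ip,
                       protocol, ttl, src_port, dst_port)


# Hand-built example: 20-byte IPv4 header (TTL 57, TCP), ports, 16 filler bytes.
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 57, 6, 0,
                        socket.inet_aton("203.0.113.7"),
                        socket.inet_aton("198.51.100.10"))
example = slice_ipv4_packet(ip_header + struct.pack("!HH", 51514, 443) + bytes(16),
                            timestamp_ns=1_000_000)
```

In a live sensor the timestamp could be taken from a monotonic nanosecond counter such as Python's `time.monotonic_ns()`, which is locally stable but not globally synchronized, consistent with the requirement stated above.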
Instead of using the IP address as its primary key for VPN detection, DPNC associates a number of packet fields (an “n-tuple” including IP address, IP protocol, and transport layer port fields) with a session ID that corresponds to a particular network flow, or set of related flows, between a client and one or more endpoints, called a “session.” Each flow may be a Transmission Control Protocol (TCP), or Stream Control Transmission Protocol (SCTP) connection between a client and an endpoint. Flows may be initiated by the client device, or the endpoint device, and may transport higher network layer communications, such as HyperText Transfer Protocol Secure (HTTPS).
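By way of non-limiting illustration, a session ID for a single flow could be derived from the n-tuple as sketched below. The sketch covers only one flow identified by a 5-tuple and does not attempt to group related flows into one session. The function name `session_id`, the use of a truncated SHA-256 digest, and the example addresses are assumptions made for illustration.

```python
import hashlib


def session_id(ip_a: str, port_a: int, ip_b: str, port_b: int,
               protocol: int) -> str:
    """Derive a direction-independent identifier from a flow's 5-tuple."""
    # Sort the two endpoints so that packets in the "in" and "out"
    # directions of the same flow map to the same identifier.
    low, high = sorted([(ip_a, port_a), (ip_b, port_b)])
    key = f"{protocol}|{low[0]}|{low[1]}|{high[0]}|{high[1]}"
    return hashlib.sha256(key.encode("ascii")).hexdigest()[:16]


# Both directions of one TCP flow (protocol 6) share an identifier.
client_to_endpoint = session_id("203.0.113.7", 51514, "198.51.100.10", 443, 6)
endpoint_to_client = session_id("198.51.100.10", 443, "203.0.113.7", 51514, 6)
```

Because the identifier includes the transport layer ports, two flows that share the same IP address still receive distinct identifiers.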
DPNC detects the presence of the proxy/VPN tunnel by analyzing the observed session flow(s). Flows that traverse a tunnel experience a number of packet processing and forwarding steps that may leave distinguishable alterations in packet data, values, format, counts, consistency, temporal or protocol sequencing, etc. DPNC's ability to learn and recognize the alterations is a fundamental, technical improvement over the current state of the art.
In addition, DPNC can distinguish between the flows of a legitimate subscriber at a location from those of a VPN subscriber sharing the legitimate subscriber's connection, even if the IP address and/or physical location is shared. This addresses Residential VPN subterfuge and is a further fundamental, technical improvement over the current state of the art.
Today's IP database VPN detection cannot distinguish the legitimate subscriber from the Residential VPN subscriber since the IP address is the same for both. Traditional IP database detection methods provide a risk score for the IP address per se, but not on a per-user or per-flow basis. This is a problem for the content provider as they typically do not want to block the legitimate subscriber, often resulting in allowing both legitimate and VPN subscribers. However, DPNC can make the distinction, thus allowing selective handling by policy on a per-connection or per-flow basis even for a single subscriber IP address.
In computer networking, a proxy server is a server application that acts as an intermediary between a client requesting a resource and the server providing that resource. Instead of connecting directly to a server that can fulfill a requested resource, such as a file or web page, the client directs the request to the proxy server, which evaluates the request and performs the required network transactions. This serves as a method to simplify or control the complexity of the request, or provide additional benefits such as load balancing, privacy, or security. Proxies were devised to add structure and encapsulation to distributed systems. A proxy server thus functions on behalf of the client when requesting service, potentially masking the true origin or geolocation of the client requesting access to the resource server. See e.g., https://en.wikipedia.org/wiki/Proxy_server.
Virtual Private Networks (VPN) are a specific type and manner of providing a proxy connection. VPN proxy terminology is used hereafter. However, the methods, systems and apparatus apply to general proxy connection types and are not limited solely to VPN.
The VPN application looks for an available remote VPN server with an IP address associated with the desired apparent location. Although not strictly required, the VPN provider may choose a VPN server that is physically located in the desired apparent location. The VPN client establishes a secure encrypted connection to the VPN server, called a “tunnel”. Once the tunnel is established, the user's system forwards packets (e.g., from the user's browser) to the VPN client and the VPN client in turn forwards packets through the VPN tunnel to the remote VPN server. The VPN server accepts packets from the VPN client, potentially requiring decryption and reassembly of the packets that were tunneled, and forwards the reconstituted packets to the IP address of the application server endpoint 110, or destination.
The key point is that the user's actual IP address is never exposed by the VPN tunnel and the endpoint application only ever sees the proxied IP address. To the endpoint, the connection appears as though it comes from an IP address in the user's desired apparent location. Thus, a user in Los Angeles can use a VPN tunnel to appear to be in London, log in to their BBC account, and watch the content of their choosing. The issue is that the content is often geographically restricted, and the service provider can lose revenue or be fined for providing access to the content from outside the restricted geography. Similar restrictions apply to financial transactions, gaming, betting, medical records, and access to critical infrastructure.
There are a variety of VPN technologies, including OPENVPN (see, e.g., https://en.wikipedia.org/wiki/OpenVPN), IKEv2 (see, e.g., https://en.wikipedia.org/wiki/Internet_Key_Exchange), and proprietary technologies, e.g., NORDLYNX (NORDVPN) (see, e.g., https://support.nordvpn.com/General-info/1438624372/What-is-NordLynx.htm). In addition, the various VPNs often offer the option of a variety of network transport layer technologies, e.g., User Datagram Protocol (UDP) or Transmission Control Protocol (TCP) (see, e.g., https://en.wikipedia.org/wiki/User_Datagram_Protocol, https://en.wikipedia.org/wiki/Transmission_Control_Protocol).
Note that users are not limited to individual human users logging in to a website, and can be machines, applications, IoT devices, mobile devices, and other automata that desire to access a remote server, server process and/or its application and stored data, files, images, videos.
Heretofore the main methods of detecting VPN connections have been to scrape VPN provider websites for their server IP addresses, to obtain VPN accounts and cycle through connections and locations to learn their server IP addresses, and to apply other forensic methods based on the IP address, in order to develop a database that an application can use to look up the likelihood of a VPN or proxy. The fundamental problem is that these incumbent methods rely on the IP address. The VPN providers know that the VPN detection services use these methods and so they regularly change their server IP addresses, in a game of cat and mouse.
The innovation driving the present invention is that DPNC does not rely on the IP address, but rather on the characteristics of the connection. Although the proxy is able to hide the IP address of the user, it must intercede between the user and the endpoint. In so doing, alterations are made in the way that the packets are handled, and these alterations are detectable by a Machine Learning system, e.g., Neural Network, Decision Tree, etc. According to embodiments of the invention, network sensors are used to analyze packet flows to detect and extract data relating to such alterations and to stream the data to a central server or database for further analysis. Based on such alterations, it can be determined whether a network session is proxied or not.
Physical networks typically limit the size of the largest packets to a “Maximum Transmission Unit” (MTU), often on the order of 1500 Bytes (1500 B). Untunneled packets from a client are limited to no larger than the MTU size. When a tunnel is used, either the MTU presented to the client must be reduced to accommodate the overhead of the tunnel, or the client packets must be fragmented to not exceed the link MTU. In either case, the client packets are transported in a way that is different than that of an un-tunneled connection.
At the egress of the VPN tunnel, the VPN server decrypts the packets and, in some cases, performs reassembly of packet payloads that were fragmented to fit within the MTU.
A packet traversing the VPN tunnel may be modified by the VPN server and client processes and may differ in several packet fields or bits from an untunneled packet. In addition, since the tunneled packets have a reduced or otherwise altered MTU with respect to untunneled packets, there may be detectable alteration of inter-packet temporal variation, packet counts and packet order. Further, since the tunnel itself entails a bandwidth overhead and processing overheads, the throughput, latency, jitter, and packet loss of the tunnel may be different than that of an un-tunneled connection. The present invention can detect or measure such characteristics and compare them to expected characteristics for untunneled packets.
Maximum Segment Size (MSS) and Maximum Transmission Unit (MTU) are two limits on packet size. Packets can only reach a certain size (measured in bytes) before computers, routers, and switches cannot handle them. MSS is the maximum number of payload bytes in a packet of a TCP connection, while MTU measures the entire packet, including headers. Packets that exceed a network's MTU can be fragmented, or broken up into smaller packets, and then reassembled. Packets that exceed the MSS may be dropped by the endpoint or cause errors.
IPsec protocols add several headers and trailers to packets, all of which take up several bytes. For networks that use IPsec, either the MSS and MTU must be adjusted accordingly, or packets will be fragmented and slightly delayed. Usually, the MTU for a network is 1,500 bytes. A normal IP header is 20 bytes long, and a TCP header is also 20 bytes long, meaning each packet can contain 1,460 bytes of payload. However, IPsec adds an Authentication Header, or an ESP header, and other associated trailers. These may add 50-60 bytes to a packet, or more.
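By way of non-limiting illustration, the arithmetic above can be sketched as follows, using the 1,500-byte MTU, the 20-byte IP and TCP headers, and the 50 to 60 byte IPsec overhead stated above. The function name `tcp_mss` is hypothetical, and the sketch assumes IPv4 and TCP headers without options.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def tcp_mss(link_mtu: int, tunnel_overhead: int = 0) -> int:
    """Largest TCP payload that fits in one unfragmented packet."""
    return link_mtu - tunnel_overhead - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


untunneled_mss = tcp_mss(1500)        # 1500 - 20 - 20 = 1460
tunneled_mss_low = tcp_mss(1500, 60)  # 1500 - 60 - 20 - 20 = 1400
tunneled_mss_high = tcp_mss(1500, 50) # 1500 - 50 - 20 - 20 = 1410
```

Under these assumptions, a flow whose observed segment sizes fall in the lower range rather than at 1,460 bytes is consistent with the reduced MTU of a tunnel, although other causes of a reduced MTU exist.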
Network packets comprise a number of data fields including header(s), payload, and trailer(s). The header and trailer may in turn have many subfields. In the case of the IP header, the subfields include IP source and IP destination addresses. Networks are layered, with physical layers, media access control, network, transport, session, presentation, and application layers as per the Open Systems Interconnect (OSI) model, and often the payload of one layer is the full next higher layer that may encapsulate higher layers in turn. Each layer may have several versions such as IPv4 and IPv6 network layers, or several methods such as TCP, UDP and SCTP transport layers. For purposes here, the user payload is considered to be the payload of the transport layer. The various layer headers and their subfields are of special import for detection of proxied network connections.
According to embodiments of the invention, DPNC includes observing the user packets on a connection-by-connection basis. An individual connection to an application is called a “flow” and can be one-way (unidirectional), two-way (bidirectional) and in some cases broadcast or multicast (one to any or many). In the case of TCP or SCTP transport protocols, the connections are bidirectional. For our purposes, by convention, we label packets sent by the client to the endpoint as the “in” direction, and packets sent from the endpoint to the client as the “out” direction. Rather than processing entire packets with large payloads, according to embodiments of the invention, DPNC can process and analyze just the packet headers and further individual subfields. The information resulting from this processing is called “metadata features”.
Metadata features may include packet size, packet count, packet order, packet and protocol header values, temporal values of the times of arrival (TOA) of packets and time differences of arrival (TDOA) of related packets of a stream, and the states of the TCP or other protocols used by the connection. The variations or anomalies of the metadata of tunneled connections with respect to un-tunneled connections are herein called “tells”.
While any single tell may be insufficient to detect the presence of the VPN, a combination of metadata feature tells and temporal tells can be used to detect the presence of the VPN tunnel and, in most cases, may be more accurate.
According to embodiments of the invention, metadata features can be computed or extracted from packets for each connection or flow for all the packets of a stream over its duration, or a subset of the packets, e.g., an initial set of packets such as the initial TCP handshake and TLS handshakes. According to embodiments of the invention, network sensors may compute or extract metadata features from packets they receive and stream the metadata features to a central database or service for further processing. An embodiment may sample packet subsets periodically over long-lasting connections to provide semi-continuous testing.
According to embodiments of the invention, the DPNC process does not require all of the bytes of the packet and typically does not need the payload. The resulting packet slice may contain some or all of the packet information, packet size and timestamp, and may be of fixed, limited, or variable length depending upon the fields selected to be in the slice.
In some embodiments, flows may be observed “mid-flight” between the remote VPN server and the application, rather than at or near the terminating end of the stream, and the mid-flight location(s) alone, or in concert with the network sensor, provide metadata for the same flows.
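By way of non-limiting illustration, temporal metadata features could be computed from the times of arrival of the packets of one flow as sketched below. The function name `interarrival_features`, the returned field names, and the definition of jitter as the population standard deviation of the time differences of arrival are assumptions made for illustration; other jitter definitions are in common use.

```python
from statistics import mean, pstdev


def interarrival_features(toa_us: list[int]) -> dict:
    """Compute TDOA-based features from packet times of arrival (microseconds)."""
    if len(toa_us) < 2:
        raise ValueError("need at least two packets")
    # Time difference of arrival between each pair of consecutive packets.
    tdoa = [later - earlier for earlier, later in zip(toa_us, toa_us[1:])]
    return {
        "packet_count": len(toa_us),
        "mean_tdoa_us": mean(tdoa),
        "max_tdoa_us": max(tdoa),
        "jitter_us": pstdev(tdoa),  # assumed definition, see lead-in
    }


# Four packets observed at 0 ms, 10 ms, 30 ms and 40 ms on the sensor clock.
features = interarrival_features([0, 10_000, 30_000, 40_000])
```

Because only differences between timestamps are used, the sensor clock need only be locally stable and does not have to be synchronized with any other clock.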
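By way of non-limiting illustration, the benefit of combining tells can be sketched with a simple noisy-OR rule, in which each observed tell independently contributes a weight and the combined score grows as more tells are observed. This rule is a stand-in chosen for brevity and is not the classification method of the embodiments, which may use a neural network or decision tree. The function name `combine_tells`, the tell names, and the weights are all hypothetical placeholders and are not measured values.

```python
def combine_tells(tell_weights: dict[str, float], observed: set[str]) -> float:
    """Noisy-OR combination: 1 - product of (1 - weight) over observed tells."""
    remaining = 1.0
    for name in observed:
        weight = tell_weights[name]
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {name!r} must be in the range 0..1")
        remaining *= 1.0 - weight
    return 1.0 - remaining


# Hypothetical tells and weights, for illustration only.
weights = {"reduced_mss": 0.5, "handshake_latency_gap": 0.5}
single_tell_score = combine_tells(weights, {"reduced_mss"})
combined_score = combine_tells(weights, {"reduced_mss", "handshake_latency_gap"})
```

With these placeholder weights, one tell alone yields 0.5 while both together yield 0.75, illustrating how individually weak tells can reinforce one another.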
Transport layer and upper layers such as TCP and Transport Layer Security (TLS) are governed by specific state machines. The state machines require that certain interactions must occur between the user and the endpoint. These interactions are not typically fully masked by the VPN. The presence of the proxy imposes subtle variations not just on the fields, but on the latency, jitter and loss over the VPN connection that may interact with the protocol state machines. The variations in the timing and behavior of the tunnels are detectable by machine learning methods.
More advanced VPN's may optimize tunnel performance by performing optimizations such as TCP Offload (see, e.g., https://en.wikipedia.org/wiki/TCP_offload_engine). TCP optimizations may include the rapid local completion of the TCP handshake without always requiring the full roundtrip. However, there are certain transactions, such as the exchange of encryption keys in Transport Layer Security (TLS) that require a full roundtrip delay that cannot be offloaded. The disparity in latency for offloaded and full roundtrip delays is a strong tell for a Machine learning method to tell that a proxy is present. In essence, advanced VPN's are more detectable by virtue of their ability to improve performance with respect to standard VPNs.
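By way of non-limiting illustration, the latency disparity described above could be measured at a network sensor that terminates the connection. The sketch compares the round trip seen during the TCP handshake (which a proxy may complete locally) with the round trip seen during the TLS handshake (which requires the client itself to respond). The function names `rtt_us` and `latency_gap_tell`, the 30 ms threshold, and the example timestamps are hypothetical; in the embodiments such a disparity would be one input to a machine learning method rather than a fixed threshold test.

```python
def rtt_us(sent_us: int, reply_seen_us: int) -> int:
    """Round-trip time between sending a packet and observing its reply."""
    if reply_seen_us < sent_us:
        raise ValueError("reply cannot precede the packet it answers")
    return reply_seen_us - sent_us


def latency_gap_tell(tcp_rtt_us: int, tls_rtt_us: int,
                     min_gap_us: int = 30_000) -> bool:
    """True when the TLS round trip exceeds the TCP round trip by the gap."""
    return tls_rtt_us - tcp_rtt_us >= min_gap_us


# Sensor-side timestamps (microseconds) for one hypothetical flow.
tcp_rtt = rtt_us(sent_us=1_000, reply_seen_us=13_000)     # SYN-ACK to ACK
tls_rtt = rtt_us(sent_us=20_000, reply_seen_us=115_000)   # ServerHello to client reply
offload_suspected = latency_gap_tell(tcp_rtt, tls_rtt)
```

In this hypothetical flow the TCP handshake completes in 12 ms while the TLS exchange takes 95 ms, a disparity consistent with a nearby proxy answering the TCP handshake on behalf of a more distant client.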
Network sensors 206 may be the end point of the TCP flow 204, an intermediate packet forwarding device inspecting the packets in flight, or a monitoring system where packet copies have been directed (observation points). Network sensors 206 may be geographically distributed. Client flows 204 may be observed by any subset of observation points. VPN proxy detection can be performed using observations from one or more observation points. Geolocation by way of multilateration or AI/ML typically entails observations of client flows from more than one observation point, though one observation point can suffice in certain scenarios. Since the network sensors 206 are distributed and inspect packets, this is termed Distributed Packet Inspection.
Users (client devices) can initiate a connection to a network resource, potentially via a proxy. The accessing process generates one or more network connections to one or more systems distributed over the network where the network connection is passively observed. In one embodiment, the network connection initiates a well-defined network action, such as an “HTTPS Get” to a server deemed a “network sensor” that responds to the request in a normal fashion. The network sensor examines some or all of the packets of the connection, timestamps the packet arrival, and extracts some or all of the packet contents to form metadata. The metadata could be all or some of the packet or its extracted fields, timestamps, size, and other parameters related to the packet and its contents, and possibly also metadata that spans one or more packets, such as an interpacket arrival time, connection state or inter-connection data. The metadata generation process can be split across multiple systems, with the network sensor performing some or all of the functions and the detection server performing additional steps.
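By way of example only, metadata that spans more than one packet, such as the interpacket arrival time, might be derived from per-packet timestamps as sketched below. The `PacketRecord` type, the `interarrival_times` function, and the sample values are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PacketRecord:
    timestamp: float  # arrival time at the sensor, in seconds
    size: int         # packet length in bytes


def interarrival_times(packets: List[PacketRecord]) -> List[float]:
    """Return the gaps between consecutive packet arrivals, in arrival order."""
    ordered = sorted(packets, key=lambda p: p.timestamp)
    return [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]


# Hypothetical three-packet exchange observed by one sensor.
flow = [PacketRecord(0.000, 74), PacketRecord(0.050, 74), PacketRecord(0.051, 66)]
gaps = interarrival_times(flow)
```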
The network sensors 308b are configured to observe the exchange of packets and to deliver the full packet, or a subset of packet bytes called a “slice”, to a network sensor 308b along with metadata that includes packet size and timestamp. The network sensor function may run in a physical server, virtual machine, or container. The network sensor 308b may also run in a physical server, virtual machine or container and may be co-located with the observing system or elsewhere in the network.
The paths 310 between the client 302 and the network sensors 308b are typically multi-hop, traversing a number of packet-forwarding devices 312, e.g., switches, routers, access points, multiplexers, radio links, cable modems, FTTH, xDSL, mobile networks. The paths 310 from the client 302 to the network sensors 308b typically diverge at some point(s) on their way to the different endpoints, especially when the network sensors 308b are geographically dispersed.
The network sensors 308b are configured to process the copied packets, slices, and packet metadata, to form metadata streams that are then processed to identify the presence of a proxy. In the example of
As in the case of intentional traffic, when the opportunistic traffic from the client is untunneled, the paths from the client to the network sensors typically diverge at some point(s) on their way to the different application endpoints, especially when the application servers are geographically distributed. As shown, metadata may be stored in the database 504 and/or streamed to a remote DPNC server (not shown) for processing according to embodiments of the invention.
The traffic may be observed at the several CDN App servers 502, for instance in the kernel, or by virtue of a network tap function in a virtual switch interconnecting virtual machines. The network sensor function 502b may run in a physical server, virtual machine, or container. In addition, connection packets may be copied to network sensor(s) 502b from physical packet forwarding devices 312 anywhere in the connection path between the proxy server (not shown), including the possibility of multiple locations. The packet copies may arrive via a passive or active optical, electrical or RF tap, port mirror or SPAN port. Packet copies may be accompanied by additional metadata such as length, port, location information, or timestamp. The copied information may be the full packet or packet slices. Packets, slices, or metadata streams may be copied to multiple sensors 502b for load balancing, redundancy, and reliability. The packets, packet slices and/or metadata may also be stored in memory 504 or storage device for near real-time or forensic proxy detection or other analytics. Embodiments of DPNC may include both intentional and opportunistic traffic processing.
Other machine learning techniques such as Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Random Forest, or eXtreme Gradient Boosting (XGBoost) decision tree can be applied. Multiple techniques can be used and can be federated. The VPN detection process yields a VPN classification decision 706 (e.g., VPN=yes/no) and a confidence measure (e.g., 0-100%), potentially with additional geolocation information. The decision, confidence measure, ancillary determinations and optionally the metadata associated with the connection are stored in a database 708. This information may also be streamed to other processes such as a policy process used to block, challenge, sandbox, or allow the connection (not shown). The datastore 708 may be queried via an API, RESTful API, SQL, etc.
A single network sensor can provide the metadata required in order to classify a flow as having been transported over a VPN or proxy connection. Multiple network sensors may be used to improve the detection accuracy, and a majority vote or other weighting of individual sensors may be used. Decision trees or neural nets may also be used, acting on the collection of individual sensor metadata, or on fully pooled metadata.
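The combination of individual sensor results can be sketched, purely for illustration, as a majority vote over per-sensor decisions or a weighted average of per-sensor scores. The function names and the example weights below are hypothetical.

```python
from typing import Sequence


def majority_vote(decisions: Sequence[bool]) -> bool:
    """Return True when more than half of the sensors report a proxy."""
    return sum(decisions) * 2 > len(decisions)


def weighted_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-sensor scores, each in the range 0 to 1."""
    if len(scores) != len(weights) or sum(weights) <= 0:
        raise ValueError("scores and weights must align and weights must sum > 0")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Hypothetical results from three sensors, then two sensors of unequal weight.
vote = majority_vote([True, True, False])
combined = weighted_score([1.0, 0.0], [3.0, 1.0])
```

In this sketch a tie is treated as "no proxy"; a deployment could choose a different tie-breaking rule.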
The distributed detection of proxied network connections system includes the VPN detection pipeline, methods, and apparatus for introducing training sessions into the pipeline and methods and apparatus for collecting and processing metadata that is then used to validate the Machine Learning process that detects the presence of VPN or proxy. Training and validation may occur continuously or on-demand and may be ongoing while the proxy detection system is in operation.
Networks are not often static and unchanging. Network nodes (packet-forwarding devices such as routers and switches) are designed to operate in conditions where network links may fail or the routes and paths between and among the network nodes change for any reason including link failures, addition of network nodes and other changes. When these events occur, it is often the case that the network selects or determines a new path from the source to the destination. These path changes may impact latency, jitter, throughput and loss between the source and destination and may also impact the makeup of the packet stream itself and impact the behavior of the transport protocol. Since the network changes with time, it is important that the distributed VPN detection system change to continue to provide accurate proxy detection.
The DPNC system can perform training and validation continuously and/or on-demand and may operate while the proxy detection system is in operation. Continuous and ongoing training on demand enables continuous weights promotion and best model promotion even while the neural network is running. This is important since proxy methods and locations change over time, as does the underlying network configuration, capabilities, application software behavior, types, and application server geographic locations. Further, the set of network sensors and their locations may vary over time.
The left side of
When the training client sessions are performed, they may be identified with a session ID. The core machine learning 804 queries the database 708 for the data associated with known training session IDs and uses the metadata for these sessions in its training. In addition to providing the session metadata, decision and confidence level, the known training sessions are labelled by method (direct or proxy), device type, model or version, proxy type and provider, and optionally, include geographical location of the test client. The core ML process 804 extracts the features from the metadata and applies these to a neural network model structure that is in training, and the proxy detection results (and other decisions such as device or proxy type) are compared against the known labels. The results are scored and fed back to train the neural network or ML process. The ML process under training may be the same structure as what is running live in the network, in which case if the newly trained weights perform better than the weights in the active proxy decision process, the new weights may be promoted (updated) to the active proxy decision process. This promotion can be done during a scheduled maintenance window or, if desired, continuously to the active proxy detection process.
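A minimal sketch of the weights promotion decision follows, assuming that accuracy on labelled training sessions is the comparison metric and that a score of 0.5 or more denotes a proxy. Both assumptions, and the names `accuracy` and `should_promote`, are illustrative only; other scoring methods may be used.

```python
from typing import Sequence


def accuracy(predictions: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of sessions whose thresholded score matches the known label."""
    correct = sum(1 for p, y in zip(predictions, labels) if (p >= 0.5) == bool(y))
    return correct / len(labels)


def should_promote(candidate: Sequence[float], active: Sequence[float],
                   labels: Sequence[int], margin: float = 0.0) -> bool:
    """Promote the candidate weights only if they beat the active weights."""
    return accuracy(candidate, labels) > accuracy(active, labels) + margin


# Hypothetical scores on four labelled sessions (1 = proxy, 0 = direct).
labels = [1, 0, 1, 0]
active_scores = [0.9, 0.6, 0.4, 0.2]     # two of four correct
candidate_scores = [0.9, 0.1, 0.7, 0.2]  # four of four correct
promote = should_promote(candidate_scores, active_scores, labels)
```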
The training process includes review by experts 806 who can choose to modify the ML models, structure (such as number of layers or elements), or new features or “tells” based on their insights and analysis of the scoreboard. These are model centric ideas. The new model(s) can be compared to the active model, and if deemed better are promoted to the active network replacing the old model.
The experts 806 may determine that new or additional packet processing methods or metadata features or statistics are warranted and devise new tests to be run. These are data centric ideas.
Scalability of the deployment is required to support the number of new accesses per second, geographical distribution, multiple applications, and the aforementioned multitenancy.
Network sensors in
The metadata processing can form a stream with timestamped metadata features that can be processed in real-time or stored as a file in a database as shown. The session metadata is streamed or accessed from the database (in near real-time or forensically) to the VPN detection neural network that then computes the VPN detection confidence.
DPNC may be performed using a single network sensor (“sensor”) in a single location or may include initiated connections to a set or subset of sensors distributed around the world. In one embodiment, many sensors 1102 are distributed around the earth, for instance many geographically distributed in the United States and Europe. Sensors may be deployed wherever there are network connected servers, including Africa, South America, Asia and so on. Sensors may be deployed on physical servers, virtual machines, or containers and with one or more sensors in a particular location for load balancing or redundancy.
According to embodiments of the invention, the DPNC is configured to observe packet streams between the client accessing the application and the network sensors. In one embodiment, the client device is caused to generate streams to/from network sensors. These streams are intentional. In one specific embodiment, the intentional streams are TLS/TCP/IP.
In other embodiments, the streams may be TLS/SCTP/IP. Streams may be IPv4 or IPv6. Intentional streams can be generated by the client and the client may generate one or multiple streams to one network sensor, or one or multiple streams to each of multiple sensors. In one embodiment, the network sensor is implemented in the Linux kernel. In other embodiments, the sensor can be a process running on a physical machine, virtual machine, or container on any operating system, e.g., Android, iOS, macOS, Windows.
In another embodiment, DPNC observes packets that are normally generated by the application. These are opportunistic streams. In these cases, the streams may be the same types as intentional streams, e.g., TLS/TCP/IP. Opportunistic streams can be generated by the client and the client may generate one or multiple streams to one network sensor, one or multiple streams to each of multiple sensors.
In another embodiment, the invention uses both intentional and opportunistic streams.
In another embodiment, opportunistic packets are passively observed by a network sensor that is co-located at or very near to the server side of the connection.
In various embodiments, the observed packets can be copied to the network sensor using a passive tap (optical or electrical), actively port mirrored or spanned to the sensor, or from a network monitor device. The network sensor may provide the timestamping of the packet arrival time, or it may use timestamps that are appended to the externally observed packet data.
In these and other embodiments, the observed packets, or packet slices, are initially stored in memory, a file, on a disk or non-volatile storage, along with the observation times. These stored packets or files can be processed in non-real time by the sensor to produce the metadata.
The traffic generation process may be initiated upon initial request of the webpage, and can also be run continuously, periodically, as scheduled, on demand (via a user keyclick or button press) or when the page is refreshed. Similarly, the traffic generation process may be initiated or run continuously, periodically, as scheduled, or on demand (via a user keyclick or button press) by applications running on the embedded or mobile device 1202 without a webpage.
The user's system 1302 (e.g., a web browser, tablet or phone app, computer application, set top video box, smart TV application, streaming media “stick”) provides login credentials to a “control server” 1304 to initiate a session and retrieve the locations of available streaming media. The client also makes an intentional connection to a network sensor 1310a in order that the DPNC system can determine if a tunnel/VPN is in use.
Streaming content may be cached by a Content Distribution Network (CDN) (e.g., AKAMAI, CLOUDFLARE, AWS CLOUDFRONT) to reduce latency and improve user experience. The authenticated user 1302 makes a connection to a CDN content server 1304, 1306 specified in the index in addition to an intentional connection to a network sensor 1310b in order that the DPNC can determine if a tunnel/VPN is in use. During the lifetime of the session the client may make multiple connections to both CDN content servers and network sensors ensuring that tunnel/VPN status is not changed.
Instead of requiring a separate intentional connection to a network sensor, the DPNC system may opportunistically use the connection between user 1302 and content server 1306 itself to determine if a tunnel/VPN is in use. A mechanism such as a port mirror, SPAN port or tap 1314 is used to duplicate the network traffic between user and content server and forward this traffic directly to a network sensor 1310c or potentially with or after storing in a packet capture device 1316. The DPNC server 1318 processes this duplicated traffic to determine if a tunnel/VPN is in use. In this case the traffic duplication and DPNC detection process is transparent to both the user and content server.
The content server may itself be able to use existing opportunistic traffic for DPNC server 1318. DPNC software installed on the content server itself observes network traffic between user and itself by monitoring the OS network stack directly and providing duplicated packets to a network sensor 1310d running on the content server itself. In this case the DPNC detection process is transparent to both the user and content server.
Once these features are available per packet, the DPNC can compute additional features according to the unfolding of the states of a TCP flow from the network layer to the application layer, further compute features over up to all of the flows of a client-sensor connection including inter-flow latency, and further compute features over up to all of the network sensors of a session.
Further aspects of the invention according to embodiments can be described by way of example. A client navigates to a webpage using a browser. The remote webserver serves the webpage in which a JavaScript library is embedded. The library is called, initiating the VPN Detection and Geolocation session. A session consists of an HTTPS “GET” request directed at a number of remote endpoints, e.g., network sensors. The HTTPS “GET”, or “ping,” is a resource request for an image file from a webserver running on the network sensors. The “ping” begins with the establishment of a TCP connection, a TLS connection handshake, and ultimately the data transfer request and response followed by termination of the TCP connection. This process can be automatically repeated a desired number of times by use of an HTTPS redirect until the image is ultimately supplied, or the redirect limit is reached. Further, the client may repeat the ping process several times with its chosen set of network sensors. Thus, from the lowest to the highest construct, there are:
Features may be extracted relating to each of these and computed over all of the above groupings, including inter-flow and inter ping latencies, and for various temporal and statistical relationships among the sets.
One example Neural Network embodiment includes a 4-layer perceptron neural network with an input layer (18 nodes), 2 hidden layers (both 1200 nodes), and one output layer (single node), totaling 2419 nodes. The activation function for the hidden layers may be Rectified Linear Unit (ReLU). Hidden layers can use dropout and batch-normalization, a technique that improves the performance and stability of deep neural networks by standardizing the outputs of each layer. The activation function for the output layer may be sigmoid as it is a classifier that returns a score between 0.0 and 1.0, in this case, for instance, where 0.0 means not a VPN and 1.0 means 100% sure of VPN. Each time-duration input feature may be normalized independently, across the session, using the z-transform/z-score, i.e., for feature t, z=(t−mean(t))/σ(t). The z-score brings related session inputs together in a more uniform, normalized range of values that is consistent for all network sensors in the session.
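The z-score normalization and the node count stated above can be checked with the following sketch. The use of the population standard deviation and the handling of a zero-variance feature are assumptions of this example, since the description does not specify them. The name `z_scores` is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Normalize one time-duration feature across a session: z = (t - mean) / sigma."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature carries no spread (assumed)
    return [(v - mu) / sigma for v in values]


# Layer sizes from the example embodiment: input, two hidden layers, output.
LAYER_SIZES = [18, 1200, 1200, 1]
total_nodes = sum(LAYER_SIZES)

# Hypothetical handshake durations, in milliseconds, from three sensors.
normalized = z_scores([10.0, 20.0, 30.0])
```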
The neural network can independently evaluate the metadata from one or more network sensors to generate a per network sensor score in the range 0 to 1. A final score in the range 0 to 1 is determined by averaging the individual scores from the set of participating network sensors. This value is deemed the confidence score, or conversely a risk score. The application can access scores from the database through a cloud-based API. The VPN confidence measure for a session may be input to a policy engine that uses the measure to inform a policy that may allow/challenge/block access and may provide full or limited access or direct accesses to a “sandbox” when deemed concerning. For example, one possibility is that 0-0.5 is allowed, >0.5 to 0.8 is challenged and >0.8 is blocked. In other embodiments, scores do not have to be constrained or scaled to 0 to 1, and the final combination of scores may involve other machine learning techniques, e.g., majority vote, weighted voting, decision trees, and neural network.
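The averaging of per-sensor scores and the example policy thresholds given above (0 to 0.5 allowed, above 0.5 to 0.8 challenged, above 0.8 blocked) might be expressed as in the following sketch. The function names and the string labels for the policy actions are hypothetical.

```python
from typing import Sequence


def session_score(sensor_scores: Sequence[float]) -> float:
    """Average the per-sensor scores (each in 0..1) into one session score."""
    if not sensor_scores:
        raise ValueError("at least one sensor score is required")
    return sum(sensor_scores) / len(sensor_scores)


def policy_action(score: float) -> str:
    """Map a session score onto the example allow/challenge/block policy."""
    if score <= 0.5:
        return "allow"
    if score <= 0.8:
        return "challenge"
    return "block"


# Hypothetical scores from two sensors for one session.
action = policy_action(session_score([0.2, 0.4]))
```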
According to some embodiments, Binary Cross Entropy (BCE) drives the machine learning back propagation. For labels of 0 or 1, BCE is essentially the negative log of one minus the absolute difference between each label and the corresponding prediction, averaged across each training batch. See e.g., https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html.
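For illustration, the following sketch computes BCE with mean reduction in plain Python and shows that, for labels of exactly 0 or 1, the standard form agrees with the form written in terms of the absolute difference. The clamping constant `eps` and the function names are assumptions of this example and do not describe the internals of any particular library.

```python
import math
from typing import Sequence


def _clamp(p: float, eps: float) -> float:
    """Keep predictions away from 0 and 1 so the logarithm stays finite."""
    return min(max(p, eps), 1.0 - eps)


def bce(labels: Sequence[int], predictions: Sequence[float], eps: float = 1e-12) -> float:
    """Standard form: mean of -[y*log(p) + (1-y)*log(1-p)] over the batch."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = _clamp(p, eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def bce_abs_form(labels: Sequence[int], predictions: Sequence[float],
                 eps: float = 1e-12) -> float:
    """Equivalent form for hard 0/1 labels only: mean of -log(1 - |y - p|)."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = _clamp(p, eps)
        total += -math.log(1.0 - abs(y - p))
    return total / len(labels)


# Hypothetical batch: 1 = proxy, 0 = direct.
batch_labels = [1, 0, 1]
batch_predictions = [0.9, 0.2, 0.6]
loss = bce(batch_labels, batch_predictions)
```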
At a low level, each training process cycle can entail the following steps:
At a higher level (ignoring K-folds validation), the process can include the following steps:
Tuning can include the following steps:
The training features, or tells, can include metadata selected from the session packets and statistics of durations of TCP and TLS state machines. Select metadata may include, but is not limited to, packet header options (TCP Options, Window) and state transition duration statistics (minimum, mean, maximum and standard deviation) for the TCP Three-Way Handshake, TLS “ServerHello To Cipher”, TLS “Cipher To ClientData”, and TLS “ServerData To Ack”. Statistics are preferably aggregated over user sessions for a given network sensor, per network sensor, and all network sensors.
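As a non-limiting sketch, the state transition duration statistics named above (minimum, mean, maximum and standard deviation) could be computed per state transition as follows. The function name `duration_stats`, the dictionary keys, the use of the population standard deviation, and the sample durations are assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def duration_stats(durations: Sequence[float]) -> Dict[str, float]:
    """Summarize the observed durations of one state transition, in seconds."""
    if not durations:
        raise ValueError("at least one duration is required")
    return {
        "min": min(durations),
        "mean": mean(durations),
        "max": max(durations),
        "std": pstdev(durations),  # population standard deviation (assumed)
    }


# Hypothetical TCP three-way handshake durations across three flows.
handshake_stats = duration_stats([0.010, 0.020, 0.030])
```

The same function could be applied separately to each TLS transition and at each aggregation level (per flow, per network sensor, or across all network sensors).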
These and other tells can be input to the Machine Learning system during training, and used by the operational system to classify and score real-time accesses.
The present invention can be applied not only to IP protocols, e.g., IP version 4 (IPv4) or IP version 6 (IPv6), and TCP, but also to other transport protocols that can operate over IP, such as User Datagram Protocol (UDP) and Stream Control Transmission Protocol (SCTP).
Several of the IP and TCP fields, options and parameters are constrained by the underlying network or, in the case of a proxy, the tunnel. For example, the maximum transmission unit (MTU) size on the network path between two Internet Protocol (IP) hosts is limited by the maximum size packet on the layer two network (e.g., Ethernet) or by the IPsec tunnel that is itself limited by the layer two network. Path MTU Discovery (PMTUD) is a standardized technique in computer networking to determine the MTU. See https://en.wikipedia.org/wiki/Path_MTU_Discovery, incorporated herein by reference. Since the Layer 2 network limits the MTU, the tunnel is limited and this results in a different (generally smaller) MTU for the client's IP packets that are tunneled. This also has an impact on the TCP Maximum Segment Size (MSS) wherein the MSS over the tunnel is less than that of a non-tunneled TCP connection.
At the IP layer, one of the most obvious differences is in the IP Time-to-Live (TTL). The IP TTL is initially set to a certain number (e.g., 64 or 128) and decremented by each router that forwards the packet to its ultimate endpoint. A VPN Tunnel shelters the host's TTL from being decremented until it egresses the VPN endpoint. In the case of a long, potentially international, route through a tunnel, the IP TTL is essentially unchanged until the packet egresses the remote VPN server. This makes it appear as though the connection originated at the VPN server, and if the server is physically located in the destination country, it will appear to be in that country by virtue of the non-decremented IP TTL. In other words, a non-tunneled packet originating in San Jose, CA, may see its TTL decremented from 64 to 44 along a journey of 20 routers to an endpoint in London, UK. On the other hand, a VPN tunnel from San Jose to London, UK will leave the VPN server with an IP TTL of 64 and perhaps reach the destination endpoint with an IP TTL of 60 since there are fewer router hops from the London VPN server to the London endpoint. This is combined with the fact that it is impossible for a packet to travel faster than the speed of light: a client in San Jose cannot deliver a packet in less time than the physical delay through the physical network, switches and routers. This physically constrained delay (for one-way and roundtrip communications) may be different from (larger than) that which would be expected for a system that was actually located in London talking to an endpoint in London.
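The TTL and propagation-delay reasoning above can be made concrete with the following sketch. Several values are assumptions of this example and do not come from the description: the inclusion of 255 as a candidate initial TTL, the signal speed of roughly 200,000 km/s in optical fiber, and the approximate 8,600 km distance between San Jose and London. The function names are hypothetical.

```python
# Candidate initial TTL values; 64 and 128 are named in the text, 255 is assumed.
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimated_hops(observed_ttl: int) -> int:
    """Estimate router hops as the gap to the nearest initial TTL at or above."""
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial - observed_ttl
    raise ValueError("TTL exceeds the largest assumed initial value")


def min_rtt_seconds(distance_km: float, fiber_speed_km_s: float = 200_000.0) -> float:
    """Lower bound on round-trip time imposed by propagation delay alone."""
    return 2.0 * distance_km / fiber_speed_km_s


direct_hops = estimated_hops(44)       # non-tunneled example: 64 decremented to 44
tunneled_hops = estimated_hops(60)     # tunneled example: 64 decremented to 60
rtt_floor = min_rtt_seconds(8_600.0)   # assumed San Jose to London distance
```

A session whose TTL suggests only a few hops, but whose measured round-trip time is near or above such a floor for a distant client, exhibits the kind of inconsistency that the tells are intended to capture.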
In addition to these gross packet and temporal differences, there are a number of other subtle differences that can be detected to distinguish a VPN connection from a direct connection. These include packet loss and retransmission rate variations, and relative delays observed among sets of network sensors.
In addition, there are features (tells) that may be used to train a neural network or other machine learning systems (Random Forest decision tree, XGBoost, or heuristic) to classify a session as being conducted over VPN/proxy. The features can be associated with (“extracted from”) the metadata processed from the packets of the stream. The features can be from fields in a particular packet, a pair of packets, the states of the protocols such as TCP and SSL/TLS connections, the full set of packets of a TCP flow, the full set of flows of the client to a particular network sensor, and functional comparisons and measures taken over all the network sensors. The features can be individual fields of the packets, per packet type, per state, flow, connection, session, and the temporal relationship of the packets with respect to one another such as the TCP Round Trip Time (RTT). Features can be alphanumeric values, flags, bits, states, and counts of packets by packet type, state, and flag settings. Further, features can be derived from histograms, minimum values, mean, median, maximum, and variance. Features may include mathematical comparisons (less than, equal to, between), mathematical ratios, and measures on set and subset membership.
The VPN/proxy tunnel impacts the behavior of the client's IP and TCP fields. Accordingly, in embodiments of the invention, the proxy detection process is trained to distinguish tunneled traffic from non-tunneled traffic.
Though the DPNC network sensors do not directly process packets that are in the tunnel, the above impacts are detectable on the server side of the connection. Accordingly, embodiments of the invention are configured and/or trained to consider this information for the detection of a proxied connection.
TCP is a connection-oriented protocol for the reliable transfer of information over a network. In many cases, the information to be transported may be files or video streams. These files or long-lived video streams may be large and exceed the Maximum Segment Size (MSS) imposed on the connection due to IP transport over the network. Though IPv4 can support an MSS of 64 KB, the network typically imposes a much smaller MSS due to the MTU imposed by the network link. In an untunneled situation, the MTU is often 1500 B. In combination with a 20 B IPv4 header and a 20 B TCP header, the MSS is often limited to 1460 B. In the case of a VPN tunnel, however, the tunnel overhead may easily exceed 100 B, resulting in an MSS of 1360 B or less (MSS(clear) versus MSS(tunnel)).
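The MSS arithmetic above can be worked through as follows. The function name `mss` is hypothetical, and the 100 B tunnel overhead is the illustrative figure from the text rather than a property of any specific VPN protocol.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def mss(mtu: int, tunnel_overhead: int = 0) -> int:
    """Maximum Segment Size left after tunnel, IPv4 and TCP header overhead."""
    return mtu - tunnel_overhead - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


mss_clear = mss(1500)                        # untunneled: 1500 - 20 - 20
mss_tunnel = mss(1500, tunnel_overhead=100)  # tunneled: 1500 - 100 - 20 - 20
```

Here `mss_clear` evaluates to 1460 and `mss_tunnel` to 1360, matching the figures given above.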
As shown in
The VPN/proxy tunnel impacts the behavior of the client's connection resulting in different packet counts and packet sizes (and loss, latency, and jitter). Accordingly, in embodiments of the invention, the proxy detection process is trained to distinguish tunneled traffic from non-tunneled traffic.
VPN classifiers 2302(a-n) may be a trained neural network, a software heuristic, a simple or compound decision tree (e.g., Random Forest and/or XGBoost variants), or an ensemble of classifiers of similar or dissimilar type, and may provide an output classification, confidence, and optional other information such as geolocation. The output y also may be computed by a function such as majority vote, weighted or unweighted averaging. Classifiers 2302(a-n) can be concatenated or hierarchical.
One embodiment of the invention includes a DPNC VPN classifier that operates on the features found in the packets and packet metadata and the various functional and statistical metrics of features processed over the flows, sensors, and session. The classifier can determine if a proxy is detected (yes/no or true/false) and the result may include a confidence score or risk score. The score may be scaled to 0 to 1 (0% to 100%) or another numeric threshold, e.g., <=0; >0, etc. The classifier may make the classification based on some, or all, of the metadata of the session, and may make an iterative classification a number of times as the metadata is processed and collected. Thus, it is not necessary for all the data from all the flows or sensors to be received before a classification and confidence level may be determined. In this way, incremental results can be used as soon as available, or when a timer expires, or when required or requested by the system. The classifier process may continue to operate until all the session data is received and processed. Incremental classification results may be presented to subsequent classifier layers, and/or written to the database incrementally, up to and including the final classification of the session. Note that the upstream system may use an incremental update, perhaps due to time constraints, that is not the final classification.
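One way to picture the incremental classification described above is a generator that emits an updated decision each time another sensor score arrives. This is a sketch only: the running mean as the combining function, the 0.5 decision threshold, and the name `incremental_classification` are all assumptions of this example.

```python
from typing import Iterable, Iterator, Tuple


def incremental_classification(
    sensor_scores: Iterable[float], threshold: float = 0.5
) -> Iterator[Tuple[bool, float]]:
    """Yield (proxy_detected, running_score) after each sensor score arrives."""
    total = 0.0
    for count, score in enumerate(sensor_scores, start=1):
        total += score
        running = total / count
        yield running > threshold, running


# Hypothetical scores arriving from three sensors in sequence.
results = list(incremental_classification([0.25, 0.75, 1.0]))
```

An upstream system subject to a time constraint could act on any intermediate element, while the last element corresponds to the final classification for the session.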
The combination of the Neural Networks (or decision trees) into a multi-stage proxy detection process distributes the decision load and incorporates feedback and other ancillary metadata to achieve a higher level of accuracy than a single stage that uses the packet metadata alone in its classification or confidence level scoring.
Exemplary stage results are shown in the figure. In this example, a first stage trained Neural Network may use Session Metadata features to determine:
A second stage Ensemble Classifier may use first stage, historic, and exogenous inputs to refine the data to the following results:
Two cases (a) and (b) are shown for each connection. The direct access cases 2502(a) and 2504(a), where the connections are not over VPN, are shown with solid lines. The cases 2502(b) and 2504(b), where the client establishes a VPN tunnel to a VPN server in Virginia, are shown in dashed lines.
As an example, in the direct access case 2502(a), a user accesses a web application using their laptop, which sends IP packets via the subscriber's home WiFi router, which may then forward them over the internet via routers in Salt Lake City, Denver, Chicago and then to the webserver in New York. Along its journey, the IP packets in this case encounter a total of 5 routers and therefore, the IP packet arrives at the New York endpoint with a Time-To-Live (TTL) that has been decremented by 5. Similarly, in direct access case 2504(a), packets to a Webserver in Los Angeles result in the packet's IP TTL decremented by 1. However, in the case of the VPN tunnel from the client to the VPN server in Virginia 2502(b), the IP packets from the client are tunneled all the way to Virginia where they egress from the VPN server with the originating TTL. The VPN server then forwards them via a router in Virginia to New York with the TTL decremented only by 1. Client packets destined to Los Angeles 2504(b) also egress the VPN tunnel in Virginia and are then forwarded by routers in Atlanta, Dallas, Phoenix and then to the endpoint in Los Angeles. Upon arrival in Los Angeles, these packets have their TTL decremented by 4. The TTL values for the direct cases 2502(a) and 2504(a) are consistent with a client in San Jose. The TTL values in the VPN cases 2502(b) and 2504(b) are consistent with a VPN client in Virginia. In essence, the TTL gives the appearance that the VPN client is in Virginia.
One key difference between a client physically located in Virginia connecting to a server in New York, versus a VPN client in San Jose transiting a tunnel to Virginia and then connecting to New York, is the round-trip time (RTT).
The VPN/proxy tunnel impacts the behavior of the client's connection resulting in different packet counts and packet sizes, loss, latency, jitter, TTL and other fields. According to embodiments of the invention, the proxy detection process is trained to distinguish tunneled traffic from non-tunneled traffic based on these tells.
It has been observed that the VPN/proxy tunnel impacts the behavior of the client's connection resulting in different packet counts and packet sizes, loss, latency, jitter, TTL and other packet fields and protocol behavior. According to embodiments of the present invention, the proxy detection process is trained to distinguish tunneled traffic from non-tunneled traffic using these tells.
While embodiments of the invention have been disclosed above, various modifications to the example embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention.
Moreover, in this description, numerous details are set forth for the purpose of explanation. However, one of ordinary skill in the art will realize that the invention may be practiced without the use of these specific details. In other instances, well-known structures and processes are shown in block diagram form in order not to obscure the description of the invention with unnecessary detail. Thus, the present disclosure is not intended to be limited to the embodiments shown but is to be accorded the widest scope consistent with the principles and features disclosed herein.
In describing exemplary embodiments, specific terminology is used for the sake of clarity. For purposes of description, each specific term is intended to at least include all technical and functional equivalents that operate in a similar manner to accomplish a similar purpose. Additionally, in some instances where a particular exemplary embodiment includes a plurality of system elements, device components or method steps, those elements, components or steps may be replaced with a single element, component or step. Likewise, a single element, component or step may be replaced with a plurality of elements, components or steps that serve the same purpose. Moreover, while exemplary embodiments have been shown and described with references to particular embodiments thereof, those of ordinary skill in the art will understand that various substitutions and alterations in form and detail may be made therein without departing from the scope of the invention. Further still, other embodiments, functions and advantages are also within the scope of the invention.
This application claims priority to U.S. Provisional Application No. 63/480,660, entitled DETECTION OF PROXIED NETWORK CONNECTIONS, filed on Jan. 19, 2023, the entire contents of which are hereby incorporated by reference.
Number | Date | Country
---|---|---
63480660 | Jan 2023 | US