The discussion below is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.
Public availability of Internet access continues to increase along with wireless networking and the proliferation of mobile computer users. Public Internet venues, such as Internet cafés and the like, typically subsidize the cost of providing Internet services through advertising revenues. Although advertising can assist in subsidizing publicly available Internet access, problems with such subsidizing exist. For example, advertisers that pay for displayed advertising in public Internet locations have difficulty validating that public machines have actually displayed their advertising. Techniques for accounting or fiscal analysis for the advertising and other subsidized services on public machines, such as those in Internet cafés, are limited. Current systems rely on operators of the Internet café or other public location to report to the advertising source any details regarding use of the machines. Administrators at public cafés may be required to enter codes that identify the specific Internet café and each public machine in the café in order to install proprietary software, making installing software on public machines problematic. Internet cafés that change machines, including, for example, host computers, networking hardware, hubs, switches, routers and the like, create administrative difficulties when new software or machines must be installed.
Techniques described herein provide systems and methods for network identification and fingerprinting for Internet Protocol (IP) based networks. More specifically, systems and methods herein provide for self-identification of machines in a network to identify a working topology of any current machines on a network and assign a weighting to each current machine as a function of a transience determination. The self-identification and transience determination allow each machine on a network to provide a current topology and transience determination to other host computers on the network and to a remote server. The current topology and transience determination enable a collector of data, either a remote collector or a local administrator, to determine an appropriate weighting scheme for the transience determination. Moreover, the topology and transience data enable logical network location correlation of data from multiple host computers across multiple networks.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. The term “tools,” for instance, may refer to system(s), method(s), computer-readable instructions, and/or technique(s) as permitted by the context above and throughout the document.
The detailed description is described with reference to accompanying FIGs. In the FIGs, the left-most digit(s) of a reference number identifies the FIG. in which the reference number first appears. The use of the same reference numbers in different FIGs indicates similar or identical items.
While the invention may be modified, specific embodiments are shown and explained by way of illustration in the drawings. The drawings and detailed description are not intended to limit the invention to the particular form disclosed, and instead the intent is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention as defined by the claims.
This document describes systems and methods for a dynamic time-weighted network identification and/or fingerprinting system. More specifically, embodiments herein provide a method for identifying remote computer usage.
In an embodiment, one of the networked computers 110, 120 or 140 can operate with or without static routing functions. According to an embodiment, one or more networked computers identify a subnet of machines via identified subnet Internet Protocol (IP) addresses. Once other machines are identified as being members of a current topology, the one or more networked computers can each perform a scan via an address resolution protocol (ARP) on the identified IP address of each machine in the current topology to identify a media access control (MAC) address assigned to each machine. A MAC address is a 48-bit value, intended to be unique, that is assigned to the network interface of each machine connected to a network. More specifically, referring to
Referring now to
The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method, or an alternate method. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof.
As shown, block 210 provides for identifying one or more machines on a network of machines. For example, machine 140 can identify other machines on the network. A network could be a non-switched IP based network. In such a network, machines operating in promiscuous mode can detect traffic destined for other machines on a link in the network. Promiscuous mode refers to computers with a network interface card (NIC) set to “promiscuous mode” so that the machine receives all packets on a network link and not just packets addressed to the MAC Address for the machine.
For those networks operating in promiscuous mode, machines can use the packets detected on a network link to build a list of active IP addresses. Optional block 2102, disposed within block 210, provides for scanning the network of machines for IP addresses associated with the one or more machines. For example, machine 140 can scan for IP addresses on a subnet of IP addresses for a network, such as those IP addresses for machines 110, 120, 130 and switch/hub/router 150, to determine which machines are online at that time. In other networks, such as networks with changing IP addresses due to operations using a dynamic host configuration protocol (DHCP) or the like, IP addresses can still be used to locate machines that are currently online, but non-static IP addresses cannot serve as lasting identifiers for those machines. Rather, non-static IP addresses can be used to perform further operations to locate more permanent identifiers for the machines, such as MAC addresses.
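By way of illustration only, the subnet scan of block 2102 begins with a list of candidate addresses to probe. The following minimal sketch enumerates the usable host addresses of a subnet. The function name `candidate_hosts` and the example subnet are hypothetical and do not appear in the description above; the probing step itself is omitted.

```python
import ipaddress

def candidate_hosts(subnet_cidr):
    """Return the usable host addresses of a subnet as strings.

    strict=False lets a caller pass a host address with a prefix length
    (for example "192.168.1.5/29") instead of the network address.
    """
    network = ipaddress.ip_network(subnet_cidr, strict=False)
    return [str(host) for host in network.hosts()]

# Hypothetical example subnet; a real scan would take this from the
# local interface configuration.
hosts = candidate_hosts("192.168.1.0/29")
```

Each returned address could then be probed, for example by an address resolution request, to determine which machines are online at that time.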
In one embodiment, the identifying of the machines on a network includes querying an external source, such as a remote server. For example, an external source can identify an external IP address for a machine or a plurality of machines on a network. A machine on the network can query a remote server to provide information that is sent to that remote server to collect exposed external IP addresses.
An address resolution protocol (ARP) scan can enable identification of machines via enabling the querying machine to receive MAC addresses of other machines on a network. MAC addresses enable a more permanent identification of machines in a network than IP addresses because MAC addresses are generally permanent and in most cases associated directly with a specific piece of hardware.
Block 220 provides for performing an address resolution procedure, such as an ARP on each of the one or more machines to determine one or more machine specific identifiers associated with each of the one or more machines. For example, machine 140 can perform an ARP to determine a MAC address for one or more of identified machines such as machine 110 and 120.
On a switched IP network, the switch generally restricts traffic such that even a promiscuous host cannot see traffic that is not broadcast or not destined for a specific MAC address. A machine on a switched IP network can identify other machines by issuing an address resolution protocol (ARP) scan across the subnet range of the network. Similarly, if external IP addresses are collected from a remote entity and sent to a local machine, the local machine may request that the remote entity perform an ARP scan on its behalf. Therefore, the network can retrieve one or more MAC addresses by performing the ARP scan using those external IP addresses.
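As a hedged sketch of what a machine might do with the results of an ARP scan, the following code builds an IP-to-MAC mapping from a text table. It assumes the table is laid out like the Linux kernel's `/proc/net/arp` file, which is only one possible source and is not specified in this description. The function name `parse_arp_table` and the sample rows are hypothetical.

```python
def parse_arp_table(text):
    """Map IP address -> normalized MAC address.

    Assumes /proc/net/arp-style columns:
    IP address, HW type, Flags, HW address, Mask, Device.
    """
    entries = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed row
        ip = fields[0]
        mac = fields[3].lower().replace("-", ":")
        if mac == "00:00:00:00:00:00":
            continue  # unresolved entry, no usable identifier
        entries[ip] = mac
    return entries

# Hypothetical sample data for illustration.
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.1      0x1         0x2         00:11:22:33:44:55     *        eth0
192.168.1.20     0x1         0x0         00:00:00:00:00:00     *        eth0
192.168.1.30     0x1         0x2         AA:BB:CC:DD:EE:FF     *        eth0
"""

arp_map = parse_arp_table(SAMPLE)
```

Normalizing the case and separator of each MAC address keeps later comparisons between scans from failing on formatting differences alone.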
Block 230 provides for applying a dynamic weighting to each identified machine on the network as a function of a transience of each identified machine. For example, machine 140 can apply a weighting to each of machines 110, 120, and 130 according to a transience of each identified machine. For example, the transience can include a determination of whether machine 140 had previously identified machine 110, 120, and/or 130. To determine a transience, machine 140 can maintain a list of identified machines to perform a comparison with prior address resolution procedures, such as prior ARP scans.
In one embodiment, the transience is determined after a machine has first composed a list containing metadata related to previous scans. For example, after machine 140 compiles a list of active MAC addresses on the network, machine 140 can later apply a reverse address lookup using, for example, a Reverse Address Resolution Protocol (RARP) to determine machine IP addresses and compare to the prior list to determine if there was any change in the topology of machines.
In one embodiment, the weighting can include assigning less weight to machines that are more transient than to more permanent machines on a network. For example, if a machine has just been added to a network, a host computer such as machine 110, 120, or 130 performing a scan of machines on the network would determine that the new machine's MAC address was not found in any previous scans of the network. Accordingly, a more transient weight would apply to such a machine. Conversely, if a particular machine is found each time a scan is performed, that machine is identified as more permanent and weighted as being less transient. The weighting could be such that a lower weight is applied to machines that are more transient and a higher weight is given to machines that are less transient. For example, in some systems, machines with the higher weighting could be granted network benefits as determined by a policy from an administrator or the like.
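One simple way to express such a weighting, offered only as a sketch, is to weight each MAC address by the fraction of prior scans in which it was seen. The function name `presence_weights` and the short placeholder identifiers standing in for MAC addresses are hypothetical; the description does not prescribe this particular formula.

```python
def presence_weights(scan_history):
    """Weight each MAC by the fraction of scans in which it was seen.

    scan_history is a list of scans, each an iterable of MAC addresses.
    A machine present in every scan gets 1.0; a newcomer gets a low weight.
    """
    total = len(scan_history)
    counts = {}
    for scan in scan_history:
        for mac in set(scan):  # count a MAC at most once per scan
            counts[mac] = counts.get(mac, 0) + 1
    return {mac: count / total for mac, count in counts.items()}

# Hypothetical history of four scans; "cc" was just added to the network.
history = [{"aa", "bb"}, {"aa"}, {"aa", "cc"}, {"aa", "bb"}]
weights = presence_weights(history)
```

In this example the machine found in every scan receives the highest weight, and the machine seen only once receives the lowest.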
In another embodiment, the weighting can be in accordance with system requirements. For example, a weighting of each identified machine can be based on the number of entries, and each entry can be assigned a value. A default gateway, such as switch/hub/router 150 (or router 310) can be identified as a landmark in a network topology and have a MAC address that is given a substantially higher weighting than other machines on the network due to its non-transient nature. Additionally, weighting can be performed by each machine capable of scanning other machines in a local network. For example, referring back to
In one embodiment, switch/hub/router 150 (or router 310 shown in
Weighting can also be calculated by a machine on a network each time a MAC address is active on a subsequent iteration of a network ARP scan. For example, if a subsequent scan performed by machine 140 indicates that machine 110 is connected to the network, the weight accorded to machine 110 can be increased because it has demonstrated more permanence. Thus, the weighting can be dynamic in that each machine on a network can alter an assigned weighting according to transience and other criteria.
Table 1, below illustrates an exemplary assignment of weights for
As shown, a dynamic weighting can change in accordance with different variables and different weighting schemes. In Table 1, printer (machine) 130 could be a network printer that is always online and available. Accordingly, it is assigned a higher dynamic weight because it is more permanent. Conversely, machine 110 appears more transient and has a lower weighting.
The system could determine that weighting calculations should be performed regularly during a day or any appropriate predetermined period. Other methods of weighting dynamically can include performing detections of other machines sporadically, according to a random time period or other period appropriate for a given network.
The dynamic weight associated with the percentage of detections can be calculated on a linear basis so there is a direct correlation between detections and dynamic weight. In other embodiments, however, a dynamic weight can be determined as an exponential function, or other function depending on the network properties or other criteria. An exponential function could be more appropriate in circumstances under which fewer detections are necessary for determining a more permanent weighting.
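The difference between the two approaches can be sketched as follows. The linear function maps the fraction of detections directly to a weight. The exponential variant shown here is one assumed form, a saturating curve that approaches the maximum weight after relatively few detections; the `rate` parameter, the maximum weight of 100, and both function names are illustrative assumptions rather than values taken from this description.

```python
import math

def linear_weight(detection_fraction, max_weight=100.0):
    """Weight is directly proportional to the fraction of detections."""
    return max_weight * detection_fraction

def exponential_weight(detection_fraction, max_weight=100.0, rate=5.0):
    """Saturating exponential: rises quickly, then levels off.

    Normalized so that a detection fraction of 1.0 yields max_weight.
    """
    scale = 1.0 - math.exp(-rate)
    return max_weight * (1.0 - math.exp(-rate * detection_fraction)) / scale

# A machine detected in half of the scans.
half_linear = linear_weight(0.5)
half_exponential = exponential_weight(0.5)
```

With these assumed parameters, a machine detected in half of the scans receives a noticeably higher weight under the exponential form than under the linear form, which matches the case where fewer detections should suffice to treat a machine as permanent.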
In one embodiment, no single MAC address change causes a network to be identified differently from an earlier identification. Rather, a combination of changes can impact the identification. For example, depending on the function used to determine transience, a MAC address change combined with metadata such as a serial number change or manufacturer change of hardware in a network can be taken into account. Also, a MAC address change that recurs a predetermined number of times could cause a network to be identified differently. Thus, the weighting can be both dynamic and time adjusted.
Either a machine in a network or a remote web server can perform an inverse query or reverse lookup using one or more external IP addresses for the machines on the network. A reverse lookup can use records from the Internet Assigned Numbers Authority (IANA). IANA is responsible for allocation of IP addresses. A reverse query of this kind using an external IP address can provide geographic location and ownership data on a given IP address, including service provider and other details. This information can be collected by machines in a network to add information to a list of identifying information of other machines on a local network.
Referring now to
Referring now to
Block 410 provides for receiving network identification data from one or more machines in a network. Disposed within block 410 is block 4102, which provides for cryptographically altering the network identification data. For example, in one embodiment, machine 140 collects network identification data, such as MAC addresses, IP addresses, serial numbers of machines on the network, and other metadata via a scan. Machine 140 can then organize the data into a network identification data listing. Machine 140 can also perform a hash of the data listing. A hash function or other randomizing function can enable machine 140 to send less data across the Internet and also preserve privacy for the information sent. In one embodiment, multiple hashes of the data are computed using various portions of the data based on weighting and sent to the remote server 330. Those machines that share one or more of the same hashes can be considered part of the same network. The hashing function can apply to different components of the network identification data listing to enable further statistics to be determined by a remote server. Exemplary components can include the type of machine (computer, printer, mobile device), a manufacturer identifier, a serial number for a device, a MAC address, and an IP address.
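The idea of computing multiple hashes over weighted portions of the data can be sketched as below. This is an illustration under stated assumptions: SHA-256 is chosen here only as a familiar hash function, the weight thresholds are arbitrary, and the names `hash_macs` and `fingerprint_hashes` and all sample MAC addresses are hypothetical.

```python
import hashlib

def hash_macs(macs):
    """SHA-256 over a sorted, lower-cased MAC list so ordering does not matter."""
    canonical = ",".join(sorted(mac.lower() for mac in macs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def fingerprint_hashes(weights, thresholds=(0.9, 0.5)):
    """Compute one hash per threshold over MACs whose weight meets it.

    weights maps MAC address -> weight in [0, 1]. Thresholds are assumed
    values, not taken from the description.
    """
    hashes = []
    for threshold in thresholds:
        selected = [mac for mac, w in weights.items() if w >= threshold]
        if selected:
            hashes.append(hash_macs(selected))
    return hashes

# Two hypothetical hosts that see the same permanent machines but
# different transient ones.
host_a = {"00:11:22:33:44:55": 1.0, "aa:bb:cc:dd:ee:01": 0.95,
          "aa:bb:cc:dd:ee:02": 0.2}
host_b = {"AA:BB:CC:DD:EE:01": 0.9, "00:11:22:33:44:55": 1.0,
          "aa:bb:cc:dd:ee:03": 0.6}

shared = set(fingerprint_hashes(host_a)) & set(fingerprint_hashes(host_b))
```

Because the hash over the most permanent machines ignores transient entries, the two hosts share at least one hash and could be considered part of the same network even though their full listings differ.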
Block 420 provides for transmitting externally available network data to the one or more machines on the network to enable identification of the one or more machines on the network. For example, remote server 330 can transmit to machine 140 any externally detected IP addresses by performing an inverse query based on the received network identification data.
Block 430 provides for receiving transience data from the one or more machines indicative of a transience associated with the one or more machines. For example, after machine 140 determines MAC addresses of other machines operating within the network, data sent to remote server 330 can include a listing of all the machines detected by machine 140. The listing can include a hashed value of MAC addresses.
Block 440 provides for comparing the received data from the one or more machines to one or more stored transience data. For example, remote server 330 could receive the transience data from machine 140, which might list only a current view of machines on a network. Remote server 330 can include a data store 3302 that holds one or more prior received transience data. Remote server 330 can then compare prior received transience data to the received transience data to obtain a current transience of the one or more machines. The comparing can include determining which hash received from the one or more machines had more hits.
Block 450 provides for transmitting transience statistical data to the one or more machines. For example, if remote server 330 receives multiple hashes from a network, a statistical comparison can determine which hash had the most hits to allow a machine in the network, such as machine 140, to adjust its weighting scheme. The transience statistical data can increase the accuracy of transience data already in a machine regarding the prominence and permanence of other entities in the network.
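A remote server's hit counting might look like the following sketch. The report structure, the machine labels, and the function name `hash_hits` are hypothetical, and the short strings stand in for real hash values.

```python
from collections import Counter

def hash_hits(reports):
    """Count how many reporting machines submitted each hash.

    reports maps a machine label -> list of hashes it sent.
    """
    hits = Counter()
    for hashes in reports.values():
        hits.update(set(hashes))  # count each hash once per machine
    return hits

# Hypothetical reports from three machines on one network.
reports = {
    "machine-140": ["h1", "h2"],
    "machine-110": ["h1"],
    "machine-120": ["h1", "h3"],
}
hits = hash_hits(reports)
best_hash, best_count = hits.most_common(1)[0]
```

The hash with the most hits could then be returned to the reporting machines as transience statistical data, so that each machine can adjust its weighting scheme toward the machines that hash covers.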
Either an administrator of a network or an administrator of a remote server receiving transience data can calculate a dynamic weight. Exemplary criteria for dynamic weighting can include the following:
In one embodiment, a weighting scheme can also be implemented using one or more of the above criteria automatically. For example, rather than an administrator determining weighting criteria, an artificial intelligence or self-learning weighting scheme can be implemented. Such an artificial intelligence weighting scheme can take place out of band (OOB), such as in an application running concurrently with network software but outside of in-band data streams.
In some embodiments, the weighting scheme can be configured to prioritize network data listings received by more permanent machines.
In another embodiment, the weighting can be overridden or supplemented by an aggregated policy coming from any combination of the administrator/operator and/or one or more remote entities. For example, an operator may choose to apply a higher weighting (or more permanence) to machines associated with specific MAC addresses. Alternatively or additionally, an administrator/operator could determine that machines associated with specific MAC addresses should be given a fixed weight. Also, a remote entity could specify that certain MAC addresses or machines associated with certain MAC addresses should not be used for weighting determinations or other policy calculations due to their generic nature. For example, machines with MAC addresses of “00-00-00-00-00-00” or similar informationally deficient addresses may be ignored. Also, in an embodiment, a remote entity or administrator/operator may determine a rescanning frequency and the like.
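Such a policy override can be sketched as a filter applied after the dynamic weights are computed. The function name `apply_policy`, its parameters, and the sample addresses other than the all-zero address are hypothetical.

```python
def apply_policy(weights, fixed_weights=None,
                 ignored=frozenset({"00-00-00-00-00-00"})):
    """Apply operator or remote-entity policy to computed weights.

    fixed_weights: MAC -> weight that replaces the dynamic weight.
    ignored: MACs excluded from weighting because they carry no
    identifying information.
    """
    fixed_weights = fixed_weights or {}
    result = {}
    for mac, weight in weights.items():
        if mac in ignored:
            continue  # informationally deficient address
        result[mac] = fixed_weights.get(mac, weight)
    return result

# Hypothetical dynamic weights and an operator policy pinning one machine.
dynamic = {
    "00-11-22-33-44-55": 40.0,
    "aa-bb-cc-dd-ee-ff": 75.0,
    "00-00-00-00-00-00": 90.0,
}
final = apply_policy(dynamic, fixed_weights={"00-11-22-33-44-55": 100.0})
```

Keeping the policy step separate from the weight calculation means an operator can change fixed or ignored addresses without altering how transience is measured.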
Referring now to
According to an embodiment, verification of a network can include looking at the current data from a current scan and determining the current data and stored data match based on the weighting data.
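A minimal sketch of such a weighted match follows. It scores a current scan against each stored network as the share of stored weight belonging to machines found in the scan, then accepts the best match only if it meets a threshold. The 0.8 threshold, the function names, the network labels, and the placeholder machine identifiers and weights are all illustrative assumptions and do not reproduce any table in this description.

```python
def match_fraction(stored_weights, current_macs):
    """Share of a stored network's total weight found in the current scan."""
    total = sum(stored_weights.values())
    if total == 0:
        return 0.0
    matched = sum(w for mac, w in stored_weights.items() if mac in current_macs)
    return matched / total

def identify_network(stored_networks, current_macs, threshold=0.8):
    """Return the best-matching network name, or None if below threshold."""
    best_name, best_score = None, 0.0
    for name, weights in stored_networks.items():
        score = match_fraction(weights, current_macs)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# Hypothetical stored data: each router carries the largest weight.
stored = {
    "network-1": {"router-1": 50, "pc-a": 30, "pc-b": 20},
    "network-2": {"router-2": 60, "printer": 25, "pc-c": 15},
}
current_scan = {"router-2", "printer", "pc-b"}
result = identify_network(stored, current_scan)
```

Because the match is weighted, the presence of a heavily weighted landmark such as a default gateway contributes far more to the identification than the presence or absence of a transient machine.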
The tables provided below illustrate the method for verifying a network. Each of Tables 2, 3 and 4 represents previously collected data from one of three different networks received by, for example, a remote entity or local entity. Note that a host computer may not be simultaneously connected to three different networks, but could have information identifying three distinctly different networks over a period of time. This could occur, for example, if a topology of computers changes over time, or if the host computer connects to a different network at a different location and stores that information.
Table 5 represents an exemplary detection of machines from a current scan. The current scanned data could include a determination of which machine is currently online:
As shown in
Block 530 provides for identifying the one or more networks according to a statistical function applied to the compared transience data and stored transience data. For example, a network could be identified according to a percentage of the weighting in the transience data, such as 80%. Comparing Table 5 to stored Tables 2, 3 and 4, for example, Table 5 is only a 6% match for Network 1, shown in Table 2, but an 82% match for Network 2, shown in Table 3. Therefore, a function requiring at least an 80% match would lead the verifying entity to conclude that this is Network 2. In one embodiment, the method performed in
The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices. Tasks performed by the programs and modules are described below and with the aid of figures. Those skilled in the art can implement the description and figures as processor executable instructions, which can be written on any form of a computer readable medium.
With reference to
Computer 610 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 610 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital video disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 610. Communication media typically embodies computer readable instructions, data structures, program modules and includes any tangible information delivery media or article of manufacture.
The system memory 630 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 631 and random access memory (RAM) 632. A basic input/output system 633 (BIOS), containing the basic routines that help to transfer information between elements within computer 610, such as during start-up, is typically stored in ROM 631. RAM 632 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 620. By way of example, and not limitation,
The computer 610 may also include other removable/non-removable volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media discussed above and illustrated in
A user may enter commands and information into the computer 610 through input devices such as a keyboard 662, a microphone 663, and a pointing device 661, such as a mouse, trackball or touch pad. Other input devices (not shown) may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 620 through a user input interface 660 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 691 or other type of display device is also connected to the system bus 621 via an interface, such as a video interface 690. In addition to the monitor, computers may also include other peripheral output devices such as speakers 697 and printer 696, which may be connected through an output peripheral interface 695.
The computer 610 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 680. The remote computer 680 may be a personal computer, a hand-held device, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 610. The logical connections depicted in
When used in a LAN networking environment, the computer 610 is connected to the LAN 671 through a network interface or adapter 670. When used in a WAN networking environment, the computer 610 typically includes a modem 672 or other means for establishing communications over the WAN 673, such as the Internet. The modem 672, which may be internal or external, may be connected to the system bus 621 via the user-input interface 660 or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 610, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as illustrative forms of implementing the claims.