Geographic location determination including inspection of network address

Abstract
Geographic location determination for a client is facilitated by performing a trace route between a known network device and the client. The trace route results in a list of intermediary network devices between the client and the known network device. Network addresses may be configured with geographically significant portions. Thus, network addresses for the client and/or one or more of the intermediary devices can be inspected to facilitate a geographic location determination for the client based on the geographically significant portions. An online service provider hosting network sites may prepare client activity reports for the hosted sites using the determined geographic data. The determined geographic locations may also be used to ensure compliance, such as with regulations, distribution agreements, etc., for data distributions to clients.
Description


FIELD OF THE INVENTION

[0001] The invention generally relates to geographic location determination, and more particularly to inspecting a network address to identify a geographic location.



BACKGROUND

[0002] It is advantageous for a server to determine the geographic location of incoming client connections. For example, location determination facilitates media distribution compliance, such as for honoring sporting event black out requirements. Location determination also facilitates providing geographic sensitive advertising, sales offers, discounts, data stream sources, and the like, as well as client tracking and evaluation.


[0003] Typically, a web site identifies an incoming client Transmission Control Protocol/Internet Protocol (TCP/IP) address (hereafter IP address), and performs a reverse Domain Name System (DNS) lookup to obtain a text name for the IP address. This text name can then be inspected to guess a geographic location. For example, a specific IP address may resolve to “cs.sfu.ca”, from which it can be deduced that the client is connecting from Simon Fraser University in Canada. However, a problem with this technique is that many domain names do not reliably indicate a location. For example, popular “.to” and “.tv” domains indicate, from a reverse DNS lookup, that clients are respectively geographically located in Tonga and Tuvalu, notwithstanding their actually being based in the United States or another country.


[0004] Another technique is to inspect the “whois” domain name registry database to obtain registration details for a domain name. However, the information within the database is arbitrary, and therefore it also cannot be relied upon. Thus, what is needed is a more reliable way to perform geographic location determination.







BRIEF DESCRIPTION OF THE DRAWINGS

[0005] The features and advantages of the present invention will become apparent from the following detailed description of the present invention in which:


[0006] FIG. 1 illustrates one embodiment for determining geographic locations for a client that connects to a server.


[0007] FIG. 2 illustrates an exemplary format for encoding a numeric network address with a text based network address.


[0008] FIG. 3 illustrates one embodiment for reporting estimated client geographic locations.


[0009] FIG. 4 illustrates a suitable computing environment in which certain aspects of the invention may be implemented.







DETAILED DESCRIPTION

[0010] As will be discussed below, many network addresses are assigned a text based “human readable” address that is constructed with respect to known geographic locations, e.g., airports, cities, states, corporations, schools, etc. By inspecting geographic references in text based addresses assigned to routers and/or hosts situated between a client and server on a network, a server may improve estimates of a geographic location for a client.


[0011] FIG. 1 illustrates one embodiment for determining a geographic location for a client that connects to a server. In the illustrated embodiment, multiple servers are configured to appear to the client as a single server 300 (FIG. 3).


[0012] A first operation is to receive 100 a client connection. It is assumed that an incoming client connection represents a connection by an individual computer, such as an end user's computer. It will be appreciated, however, that the incoming connection may be from any networked device, e.g., mobile or non-mobile computers, phones, personal digital assistants (PDAs), etc.


[0013] The network address for the connecting client is then determined 102. A network address represents a network identifier at which the incoming client may be reached. It is assumed that the client has a conventional numeric TCP/IP address, e.g., a dot quad address such as 192.168.10.100, or a text based network address. However, it will be appreciated that other network protocols may use a different addressing format.


[0014] A trace route is then performed 104 between the server and the client's network address. Trace routing involves determining a network path between the client and server. Examples of extant trace route programs include the “traceroute” application program provided by many Unix operating systems, and the “tracert.exe” application program provided by some Microsoft Windows operating systems. (Please note that all marks used herein are the property of their respective owners.)


[0015] In a TCP/IP network, trace routing is effected by directing towards the client successive network data packets with incrementally longer time-to-live (TTL) values. The TTL determines how many hops a packet is allowed before a receiving host stops forwarding it and returns a notification; the returned notification identifies the receiving host. Through successive TTL increments, all intermediary hosts (e.g., computers, routers, machines, other network devices, etc.) between the client and the server can be identified. It will be appreciated that other network environments may provide equivalent techniques.
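The TTL mechanism can be illustrated with a minimal simulation. The sketch below does not send real packets; it assumes a hypothetical, already known list of hosts (`path`, ordered from the server toward the client) and shows how incrementing the TTL reveals one host per probe. The function names `probe` and `trace_route` and the sample addresses are illustrative assumptions only.

```python
from typing import List, Optional


def probe(path: List[str], ttl: int) -> Optional[str]:
    """Simulate one probe: return the host at which a packet with this TTL stops.

    `path` is a hypothetical ordered list of hosts from the server toward the
    client, with the client last. A real implementation would send a packet
    and read the reply instead of indexing a list.
    """
    if ttl < 1 or not path:
        return None
    # The packet expires at hop number `ttl`, or reaches the client first
    # if the path is shorter than the TTL.
    return path[min(ttl, len(path)) - 1]


def trace_route(path: List[str], max_hops: int = 30) -> List[str]:
    """Discover hosts one at a time by sending probes with increasing TTL."""
    if not path:
        return []
    destination = path[-1]
    discovered: List[str] = []
    for ttl in range(1, max_hops + 1):
        host = probe(path, ttl)
        if host is None:
            break
        discovered.append(host)
        if host == destination:
            break  # the client itself answered; the route is complete
    return discovered


if __name__ == "__main__":
    # Addresses taken from ranges reserved for documentation.
    print(trace_route(["192.0.2.1", "198.51.100.1", "203.0.113.9"]))
```

In practice an existing trace route program would normally be invoked rather than reimplemented, since sending probes with a chosen TTL generally requires elevated privileges.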


[0016] In one embodiment, the trace routing is performed entirely externally to the client, e.g., the route path is always directed towards the client. In another embodiment, a trace route is performed from the client to the server. In a further embodiment, both the client-side and server-side trace routes are combined to maximize ability to determine a geographic location for the client. In one embodiment, the client-side trace route is performed by a network browser “plug in” or “helper application.” In one embodiment, client side trace routing may be triggered automatically, such as by the client receiving and executing a server-side or client-side script, a web page, or other trigger.


[0017] Assuming trace routing may provide a results list comprising both text based and numeric network addresses, the numeric network addresses are looked up 106 to determine their text based encoding. In one embodiment having a TCP/IP network, looking up network addresses comprises performing a reverse DNS lookup on the numeric network address. In one embodiment, rather than looking up all numeric addresses in the list, fewer than all are looked up. For example, one might only look up a few of the network addresses in the list “nearest” the client.
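A reverse lookup restricted to the addresses nearest the client might be sketched as follows. The sketch assumes the route list is ordered from the server toward the client, so the nearest addresses are at the end; the helper names `reverse_lookup` and `lookup_nearest`, the `count` parameter, and the injectable `resolver` argument are illustrative assumptions. Only the standard library call `socket.gethostbyaddr` is relied upon.

```python
import socket
from typing import Callable, Dict, List, Optional


def reverse_lookup(address: str) -> Optional[str]:
    """Return the text based name for a numeric address, or None if unavailable."""
    try:
        name, _aliases, _addresses = socket.gethostbyaddr(address)
        return name
    except OSError:
        # Covers socket.herror and socket.gaierror: no reverse entry exists.
        return None


def lookup_nearest(
    route: List[str],
    count: int = 3,
    resolver: Callable[[str], Optional[str]] = reverse_lookup,
) -> Dict[str, Optional[str]]:
    """Look up only the `count` addresses nearest the client (end of the route)."""
    nearest = route[-count:] if count > 0 else []
    return {address: resolver(address) for address in nearest}
```

Passing a different `resolver` allows the lookup to be replaced, for example by a cache of earlier results, without changing the selection logic.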


[0018] After text based encodings have been looked up, a format for the text based encodings is identified 108. A format describes the arrangement, or structure, of the text assigned to a numeric network address. Frequently, network backbone companies structure the text based network addresses to facilitate organization, management and security of the network addresses. In particular, the structure often comprises a geographic component so that the network backbone can distinguish addresses assigned to different regions of the country.


[0019] For example, FIG. 2 illustrates an exemplary format utilized by UUNet (a division of WorldCom, Inc. of Georgia) for encoding a numeric network address with a text based network address. As illustrated, the exemplary network address has a first portion 200 comprising port and device data for network equipment utilized to host a particular network address, and a last portion 204 identifying a particular backbone provider (here, alter.net is a part of WorldCom). The middle portion 202 of the illustrated exemplary address comprises a reference to a nearest airport to the device to which the network address is assigned. Thus, one may look up the airport code for a UUNet address and determine the nearest airport for a particular network address.


[0020] Continuing with FIG. 1, after identifying 108 the format, a format description is looked up 110 for the identified format. A format description identifies portions of the format, if any, that contain geographically significant data. For example, as illustrated in FIG. 2, the format description would identify the airport code portion 202 as being a geographically significant portion. Note that a text based network address may have multiple geographic references, e.g., multiple airport, city, state, etc. identifiers. The format description identifies portions relevant to determining the geographic location of the network address.


[0021] In one embodiment, a database stores known formats and indicators of geographically significant portions of the stored formats. In one embodiment, if an address does not match any stored known formats, or if it appears to match multiple formats, then an expert system, rule based system, or other deductive system may be utilized to analyze a text based network address to determine its geographic location. For example, if a trace route indicates network traffic traveled from X, through Y, to Z, and it is determined X and Z are WorldCom addresses, but it is unclear what format Y has, then a rule may conclude that Y is also a WorldCom address by virtue of its being enclosed by WorldCom addresses. In one embodiment, a scoring system is used to select a most likely format for a particular network address.
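The X, Y, Z rule described above can be sketched as a single pass over the route. This is only one possible deductive rule, written under the assumption that each hop has already been labeled with a provider name or with `None` when its format is unclear; the function name `infer_providers` and the provider labels are hypothetical.

```python
from typing import List, Optional


def infer_providers(providers: List[Optional[str]]) -> List[Optional[str]]:
    """Fill in an unknown provider when both neighboring hops agree.

    `providers[i]` is the provider identified for hop i, or None if the
    format of that hop's address could not be determined.
    """
    result = list(providers)
    for i in range(1, len(providers) - 1):
        before, after = providers[i - 1], providers[i + 1]
        # Only the original labels are consulted, so one inferred hop is
        # never used as evidence for inferring another.
        if providers[i] is None and before is not None and before == after:
            result[i] = before
    return result


if __name__ == "__main__":
    print(infer_providers(["worldcom", None, "worldcom"]))
```

A hop enclosed by two different providers is left unknown, which is where a scoring system or further rules would take over.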


[0022] Lexical analysis or pattern matching (e.g., regular expressions) may be used to match a text based network address against known formats. In one embodiment, the search space for a matching format is reduced by identifying the domain name of the network address, e.g., FIG. 2, item 204, and then only searching for matches used with that domain name. For example, if alter.net is identified in the last portion of the text based network address, then only formats used by alter.net are inspected for a match.
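The following sketch illustrates matching by regular expression with the search narrowed by domain name. The format table, the regular expression, the sample host name, and the small airport table are all assumptions made for illustration; they are not the actual naming scheme of any provider. The named group `airport` plays the role of the format description by marking which portion is geographically significant.

```python
import re
from typing import Dict, List, Optional

# Hypothetical format table: domain name -> regular expressions for that domain.
# The named group "airport" marks the geographically significant portion.
FORMATS: Dict[str, List[re.Pattern]] = {
    "alter.net": [
        re.compile(
            r"^(?P<port>[\w.-]+)\.(?P<device>[a-z]+\d+)\."
            r"(?P<airport>[a-z]{3})\d*\.alter\.net$",
            re.IGNORECASE,
        ),
    ],
}

# Small illustrative cross-reference from airport code to location.
AIRPORTS: Dict[str, str] = {
    "sea": "Seattle, WA",
    "jfk": "New York, NY",
    "ord": "Chicago, IL",
}


def extract_location(hostname: str) -> Optional[str]:
    """Return a location for `hostname`, or None if no known format matches."""
    name = hostname.lower().rstrip(".")
    # Reduce the search space: only formats registered for this domain are tried.
    domain = ".".join(name.split(".")[-2:])
    for pattern in FORMATS.get(domain, []):
        match = pattern.match(name)
        if match:
            return AIRPORTS.get(match.group("airport"))
    return None


if __name__ == "__main__":
    print(extract_location("0.so-2-0-0.XR1.SEA1.ALTER.NET"))
```

Taking the last two labels as the domain is a simplification; domains under multi-label suffixes would need a more careful split.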


[0023] After looking up 110 the format description, geographically significant portions of the text based network address are extracted 112 and used to estimate a geographic location of the client. In the illustrated embodiment, rather than identifying the format and geographically significant portions of all network addresses resulting from the trace route, instead only a network address “nearest” the client is processed. After extracting 112 the geographically significant portions, a test 114 is performed to determine whether refinement is desired. Such refinement may be required when no reverse DNS lookup can be performed to obtain a text based readable encoding of the “nearest” address, or if one desires to corroborate estimates through inspection of other addresses.


[0024] If no refinement is desired, then processing ends 116. If refinement is desired, then as discussed above, a text based encoding for another network address is looked up 118, if necessary, its format identified 120, the format description looked up 122, and geographically significant portions extracted 124. This supplementary geographic location data is then used to revise 126 the initial geographic location estimate. Processing continues with another test 114 for further refinement.
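The refinement loop can be sketched as a walk over the route from the hop nearest the client outward. The patent does not specify how supplementary data revises the estimate; the sketch below assumes a simple majority vote, with ties resolved in favor of the hop nearest the client. The names `estimate_location`, `locate`, and `corroborate` are hypothetical, and `locate` stands in for the format identification and extraction steps.

```python
from collections import Counter
from typing import Callable, List, Optional


def estimate_location(
    route_names: List[Optional[str]],
    locate: Callable[[str], Optional[str]],
    corroborate: int = 1,
) -> Optional[str]:
    """Estimate a client location from text based hop names.

    `route_names` is ordered from the server toward the client; an entry is
    None when no text based encoding could be obtained for that hop.
    `corroborate` is how many successful extractions to collect before
    stopping (the refinement test).
    """
    votes: Counter = Counter()
    for name in reversed(route_names):  # nearest the client first
        if name is None:
            continue  # no encoding available, so refine with the next hop
        location = locate(name)
        if location is None:
            continue  # no geographically significant portion found
        votes[location] += 1
        if sum(votes.values()) >= corroborate:
            break  # no further refinement desired
    if not votes:
        return None
    # Equal counts keep first-seen order, so ties favor the nearest hop.
    return votes.most_common(1)[0][0]
```

With `corroborate=1` the function behaves like the basic case in which only the nearest usable address is processed.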


[0025] In one embodiment, client geographic location estimation may also be based at least in part on data known about the client, e.g., from data obtained from client records, mailing lists, marketing research, etc. In one embodiment, a database is used to store text based encodings for trace route results that had to be looked up, as well as estimated geographic locations for client network addresses. This database may then operate as a cache for subsequent processing of repeated network addresses.
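The caching behavior can be sketched with an in-memory mapping standing in for the database. The class name `LocationCache`, its `misses` counter, and the `locate` callable (which represents the full trace route and analysis) are assumptions for illustration.

```python
from typing import Callable, Dict, Optional


class LocationCache:
    """Remember estimated locations so repeated addresses are not reprocessed."""

    def __init__(self, locate: Callable[[str], Optional[str]]) -> None:
        self._locate = locate
        self._known: Dict[str, Optional[str]] = {}
        self.misses = 0  # number of times the full determination had to run

    def get(self, address: str) -> Optional[str]:
        if address not in self._known:
            self.misses += 1
            self._known[address] = self._locate(address)
        return self._known[address]


if __name__ == "__main__":
    cache = LocationCache(lambda address: "Seattle, WA")
    cache.get("203.0.113.9")
    cache.get("203.0.113.9")
    print(cache.misses)
```

A persistent store would also need some policy for expiring entries, since address assignments change over time; that policy is outside this sketch.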


[0026] FIG. 3 illustrates one embodiment for reporting estimated client geographic locations. Estimated locations may be determined as discussed above, or retrieved from a database caching previous determinations.


[0027] As illustrated, a client may contact a single server, or multiple servers in a data center 300. In one embodiment, data center servers may be logically grouped to appear as a single server. In one embodiment, the servers 300 host a customer's Internet web site(s). Contact activity, e.g., by customers, visitors, etc., results in the generation of client activity logs 302 containing network addresses associated with contacting entities. As discussed above, a trace route can be performed between the servers 300 and a client to estimate a geographic location for the client. In one embodiment, each client network address is immediately processed to identify a geographic location for the client upon the client contacting the servers 300. In another embodiment, network addresses are collected for later asynchronous processing when a sufficient number of network addresses have been collected.


[0028] Assuming that addresses are collected for later group processing, in one embodiment, client activity logs 302 are filtered 304 to remove undesirable network addresses to prevent these network addresses from being processed. Undesirable network addresses include addresses that have been previously processed, as well as recognized addresses, such as ones belonging to machines of the servers 300, or other known/undesirable machines. It will be appreciated that various filter characteristics may be used to determine undesirable addresses.
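A filter of this kind might look like the sketch below. It assumes the activity log has already been reduced to a sequence of address strings and that the sets of previously processed addresses and server addresses are available; the function and parameter names are hypothetical. Repeated addresses within the same log are also dropped, which is an added assumption.

```python
from typing import Iterable, List, Set


def filter_addresses(
    log_addresses: Iterable[str],
    already_processed: Set[str],
    server_addresses: Set[str],
) -> List[str]:
    """Return the addresses that still need a location, in first-seen order."""
    seen: Set[str] = set()
    remaining: List[str] = []
    for address in log_addresses:
        if address in already_processed or address in server_addresses:
            continue  # undesirable: already located, or one of our own machines
        if address in seen:
            continue  # repeated within this log
        seen.add(address)
        remaining.append(address)
    return remaining


if __name__ == "__main__":
    log = ["203.0.113.9", "192.0.2.1", "203.0.113.9", "198.51.100.7"]
    print(filter_addresses(log, {"198.51.100.7"}, {"192.0.2.1"}))
```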


[0029] Filtering results in a list 306 of network addresses for which a geographic location is to be determined. A test 308 is performed to determine whether a particular network address in the list has previously been located. If not, then the network address is asynchronously trace routed 310 to identify, as discussed above, intermediaries between the client and the servers 300. As illustrated, it is assumed the trace route operation also analyzes the route results to estimate a geographic location for the network address as discussed above. The trace routing is performed asynchronously to allow collection of client network addresses for location to continue independent of the trace routing operation. It will be appreciated, however, that some embodiments may perform the trace routing synchronously, such as discussed above, when a client network address is processed on contact with the servers 300.
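One way to sketch the asynchronous step is with Python's `asyncio`. The coroutine `locate` is a hypothetical stand-in for the trace route and analysis of one address, `known` stands in for the database consulted by the test 308, and `max_concurrent` is an assumed limit on simultaneous trace routes.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional


async def locate_all(
    addresses: List[str],
    locate: Callable[[str], Awaitable[Optional[str]]],
    known: Dict[str, Optional[str]],
    max_concurrent: int = 4,
) -> Dict[str, Optional[str]]:
    """Locate every address not already in `known`, several at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def locate_one(address: str) -> None:
        if address in known:  # previously located, nothing to do
            return
        async with semaphore:  # bound the number of simultaneous trace routes
            known[address] = await locate(address)

    # dict.fromkeys removes repeated addresses while preserving order.
    await asyncio.gather(*(locate_one(a) for a in dict.fromkeys(addresses)))
    return known
```

Because each address is handled by its own task, other work on the same event loop, such as collecting further addresses, is not blocked while trace routes are outstanding.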


[0030] The determined location for the client is then stored 312 in a database, and given to a data feeder 314 which is used to feed the results to a report generator 316. The report generator generates reports 318 of client activity that can be distributed to businesses being hosted by the servers 300. By storing 312 the determined location, in a subsequent geographic location determination, the test 308 returns that the client network address is already known, and the value stored 312 in the database is provided directly to the report generator 316. It will be appreciated that various reports 318 may be generated, such as reports for a particular site hosted by the servers indicating the geographic location for clients contacting the sites. It will be further appreciated that the mechanisms discussed herein may be applied in real time determinations of appropriate advertising, content, etc. to be sent to a contacting client.
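A per-site report could be assembled roughly as follows. The sketch assumes activity records have been reduced to hypothetical (site, client address) pairs and that stored locations are available as a mapping; the label "unknown" for addresses without a stored location is likewise an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def build_report(
    activity: Iterable[Tuple[str, str]],
    locations: Dict[str, Optional[str]],
) -> Dict[str, Dict[str, int]]:
    """Count client contacts per hosted site, grouped by estimated location."""
    per_site: Dict[str, Counter] = {}
    for site, address in activity:
        location = locations.get(address) or "unknown"
        per_site.setdefault(site, Counter())[location] += 1
    return {site: dict(counts) for site, counts in per_site.items()}


if __name__ == "__main__":
    activity = [
        ("shop.example", "203.0.113.9"),
        ("shop.example", "198.51.100.7"),
        ("news.example", "203.0.113.9"),
    ]
    print(build_report(activity, {"203.0.113.9": "Seattle, WA"}))
```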


[0031] FIG. 4 and the following discussion are intended to provide a brief, general description of a suitable computing environment in which certain aspects of the illustrated invention may be implemented.


[0032] For example, an exemplary system for embodying one of the servers 300 of FIG. 3 includes a machine 400 having a system bus 402 for coupling various machine components. Typically, attached to the bus are processors 404, a memory 406 (e.g., RAM, ROM), storage devices 408, a video interface 410, and input/output interface ports 412. The machine may also include embedded controllers, Programmable Logic Devices (PLD), Programmable Logic Arrays (PLA), Programmable Array Logic (PAL), Generic Array Logic (GAL), Field-Programmable Gate Arrays (FPGA), Application Specific Integrated Circuits (ASIC), computers, smart cards, or another machine, system, etc.


[0033] The machine is expected to operate in a networked environment using logical connections to one or more remote machines 414, 416 through a network interface 418, modem 420, or other communication pathway. Machines may be interconnected by way of a wired or wireless network 422 including an intranet, the Internet, local area networks, wide area networks, cellular, cable, laser, satellite, microwave, Bluetooth, optical, infrared, or other carrier technology.


[0034] The invention may be described by reference to different high-level program modules and/or low-level hardware contexts that may be stored in memory 406 and/or storage devices 408. Program modules include procedures, functions, programs, components, data structures, and the like, for performing particular tasks or implementing particular abstract data types. One skilled in the art will realize that program modules and low-level hardware contexts can be interchanged with low-level hardware instructions, and are collectively referenced hereafter as “directives.” One will further appreciate that directives may be recorded or carried in a compressed, encrypted, or otherwise encoded format without departing from the scope of this patent, even if the instructions must be decrypted, decompressed, compiled, interpreted, or otherwise manipulated prior to their execution or other utilization by the machine.


[0035] Memory 406, storage devices 408, and associated media, can store data and directives for the machine 400. Program modules may be implemented within a single machine, or processed in a distributed network environment, and stored in both local and remote memory. Memory and storage devices include hard-drives, floppy-disks, optical storage, magnetic cassettes, tapes, flash memory cards, memory sticks, digital video disks, biological storage, and the like, as well as wired and wireless transmission environments, such as network 422, over which directives may be delivered in the form of packets, serial data, parallel data, or other suitable transmission format.


[0036] Thus, for example, with respect to the illustrated embodiments, assuming machine 400 operates a server, then remote devices 414, 416 may respectively be clients contacting the server over the network 422. It will be appreciated that remote machines 414, 416 may be configured like machine 400, and therefore include many or all of the elements discussed for machine 400. It should also be appreciated that machines 400, 414, 416 may be embodied within a single device, or separate communicatively-coupled components, and may include or be embodied within routers, bridges, peer devices, web servers, etc.


[0037] Illustrated methods, and corresponding written descriptions thereof, are intended to illustrate machine-accessible media storing directives, or the like, which may be incorporated into single and multi-processor machines, portable computers, such as handheld devices including Personal Digital Assistants (PDAs), cellular telephones, and the like. Directives, when accessed, read, executed, loaded into, or otherwise utilized by a machine, cause the machine to perform the illustrated methods. The figures, written description, and claims may variously be understood as representing instructions taken alone, instructions as organized in a particular form, e.g., packet, serial, parallel, etc., and/or instructions together with their storage or carrier media.


[0038] Having described and illustrated the principles of the invention with reference to illustrated embodiments, it will be recognized that the illustrated embodiments can be modified in arrangement and detail without departing from such principles.


[0039] And, even though the foregoing discussion has focused on particular embodiments, it is understood that other configurations are contemplated. In particular, even though expressions such as “in one embodiment,” “in another embodiment,” or the like are used herein, these phrases are meant to generally reference embodiment possibilities, and are not intended to limit the invention to particular embodiment configurations. As used herein, these terms may reference the same or different embodiments, and unless implicitly or expressly indicated otherwise, embodiments are combinable into other embodiments. Consequently, in view of the wide variety of permutations to the above-described embodiments, the detailed description is intended to be illustrative only, and should not be taken as limiting the scope of the invention.


[0040] What is claimed as the invention, therefore, is all such modifications as may come within the scope and spirit of the following claims and equivalents thereto.


Claims
  • 1. A method for geographic location determination based at least in part on inspection of a network address of a client, the method comprising: performing a trace route between a server and the address of the client, the trace route identifying at least one domain name in a route between the server and the client; identifying a construction format for the domain name; identifying a geographically significant component of the domain name; and determining a geographic location for the domain name based at least in part on the geographically significant component.
  • 2. The method of claim 1, further comprising: analyzing domain names associated with a network access provider so as to identify the construction formats for said domain names; identifying geographically significant components of said construction components; and storing cross-references between said geographically significant components and geographic locations in a storage.
  • 3. The method of claim 1, further comprising: validating said determined geographic location by performing at least one alternate geographic determination for the network address.
  • 4. The method of claim 3, further comprising: determining more than one geographic location for the network address; and ranking said determined geographic locations in accordance with the number of alternate geographic location determinations consistent with said determined geographic locations.
  • 5. The method of claim 1, further comprising: providing a regular expression corresponding to the construction format; matching the regular expression against the domain name; and identifying a geographically significant portion of the regular expression so as to facilitate said identifying the geographically significant component of the domain name.
  • 6. The method of claim 1, wherein said performing the trace route is performed from the server to the client.
  • 7. The method of claim 1, wherein said performing the trace route is performed from the client to the server.
  • 8. A method for determining a geographic location for a network address, comprising: receiving a trace route comprising first and second network host identifiers for hosts disposed between a server and a client on a network; matching the first network host identifier to a first template; first parsing the first network host identifier according to the first template; and identifying an estimated geographic location for the client based at least in part on said first parsing.
  • 9. The method of claim 8, further comprising: matching the second network host identifier to a second template; second parsing the second network host identifier according to the second template; and revising said estimated geographic location based at least in part on said first parsing.
  • 10. The method of claim 8, further comprising: revising said estimated geographic location based at least in part on a client profile associated with the client.
  • 11. The method of claim 10, further comprising: said client contacting the server with the web browser, said browser providing the client profile to the server.
  • 12. The method of claim 10, wherein the client profile is based at least in part on a customer database identifying the client.
  • 13. The method of claim 10, wherein the client profile is based at least in part on a mass-marketing database identifying the client.
  • 14. A method of determining a geographic location, comprising: creating a log comprising network addresses of clients that have communicated with a web server; filtering the log so as to remove undesirable network addresses; asynchronously performing a trace route between a first one of said filtered network addresses and the server; analyzing a result of said asynchronous performed trace route; and determining a geographic location for said first one responsive to said analyzing.
  • 15. The method of claim 14, further comprising: generating a report comprising geographic locations for clients that have communicated with the web server.
  • 16. The method of claim 14, wherein said determining the geographic location comprises: matching the result against a template identifying geographically significant portions of network addresses formatted in compliance with the template.
  • 17. The method of claim 14, wherein undesirable network addresses comprise network addresses already having a known geographic location.
  • 18. An apparatus for geographic location determination based at least in part on inspection of a network address of a client comprising a readable medium having instructions encoded thereon for execution by a processor, said instructions capable of directing the processor to perform: performing a trace route between a server and the address of the client, the trace route identifying at least one domain name in a route between the server and the client; identifying a construction format for the domain name; identifying a geographically significant component of the domain name; and determining a geographic location for the domain name based at least in part on the geographically significant component.
  • 19. The apparatus of claim 18, said instructions including further instructions capable of directing the processor to perform: analyzing domain names associated with a network access provider so as to identify the construction formats for said domain names; identifying geographically significant components of said construction components; and storing cross-references between said geographically significant components and geographic locations in a storage.
  • 20. The apparatus of claim 18, said instructions including further instructions capable of directing the processor to perform: validating said determined geographic location by performing at least one alternate geographic determination for the network address.
  • 21. The apparatus of claim 20, said instructions including further instructions capable of directing the processor to perform: determining more than one geographic location for the network address; and ranking said determined geographic locations in accordance with the number of alternate geographic location determinations consistent with said determined geographic locations.
  • 22. The apparatus of claim 18, said instructions including further instructions capable of directing the processor to perform: providing a regular expression corresponding to the construction format; matching the regular expression against the domain name; and identifying a geographically significant portion of the regular expression so as to facilitate said identifying the geographically significant component of the domain name.
  • 23. The apparatus of claim 18, wherein said performing the trace route is performed from the server to the client.
  • 24. The apparatus of claim 18, wherein said performing the trace route is performed from the client to the server.
  • 25. An apparatus for determining a geographic location for a network address comprising a readable medium having instructions encoded thereon for execution by a processor, said instructions capable of directing the processor to perform: receiving a trace route comprising first and second network host identifiers for hosts disposed between a server and a client on a network; matching the first network host identifier to a first template; first parsing the first network host identifier according to the first template; and identifying an estimated geographic location for the client based at least in part on said first parsing.
  • 26. The apparatus of claim 25, said instructions including further instructions capable of directing the processor to perform: matching the second network host identifier to a second template; second parsing the second network host identifier according to the second template; and revising said estimated geographic location based at least in part on said first parsing.
  • 27. The apparatus of claim 25, said instructions including further instructions capable of directing the processor to perform: revising said estimated geographic location based at least in part on a client profile associated with the client.
  • 28. The apparatus of claim 27, said instructions including further instructions capable of directing the processor to perform: said client contacting the server with the web browser, said browser providing the client profile to the server.
  • 29. The apparatus of claim 27, wherein the client profile is based at least in part on a customer database identifying the client.
  • 30. The apparatus of claim 27, wherein the client profile is based at least in part on a mass-marketing database identifying the client.
  • 31. An apparatus for determining a geographic location comprising a readable medium having instructions encoded thereon for execution by a processor, said instructions capable of directing the processor to perform: creating a log comprising network addresses of clients that have communicated with a web server; filtering the log so as to remove undesirable network addresses; asynchronously performing a trace route between a first one of said filtered network addresses and the server; analyzing a result of said asynchronous performed trace route; and determining a geographic location for said first one responsive to said analyzing.
  • 32. The apparatus of claim 31, said instructions including further instructions capable of directing the processor to perform: generating a report comprising geographic locations for clients that have communicated with the web server.
  • 33. The apparatus of claim 31, wherein said instructions for determining the geographic location comprises instructions for: matching the result against a template identifying geographically significant portions of network addresses formatted in compliance with the template.
  • 34. The apparatus of claim 31, wherein undesirable network addresses comprise network addresses already having a known geographic location.
  • 35. An apparatus for geographic location determination based at least in part on inspection of a network address of a client, the apparatus comprising: performing means for performing a trace route between a server and the address of the client, the trace route identifying at least one domain name in a route between the server and the client; identifying means for identifying a construction format for the domain name; identifying means for identifying a geographically significant component of the domain name; and determining means for determining a geographic location for the domain name based at least in part on the geographically significant component.
  • 36. The apparatus of claim 35, further comprising: analyzing means for analyzing domain names associated with a network access provider so as to identify the construction formats for said domain names; identifying means for identifying geographically significant components of said construction components; and storing means for storing cross-references between said geographically significant components and geographic locations in a storage.
  • 37. The apparatus of claim 36, further comprising: validating means for validating said determined geographic location by performing at least one alternate geographic determination for the network address.
  • 38. An apparatus for determining a geographic location for a network address, comprising: receiving means for receiving a trace route comprising first and second network host identifiers for hosts disposed between a server and a client on a network; matching means for matching the first network host identifier to a first template; first parsing means for parsing the first network host identifier according to the first template; and identifying means for identifying an estimated geographic location for the client based at least in part on said first parsing.
  • 39. The apparatus of claim 38, further comprising: matching means for matching the second network host identifier to a second template; second parsing means for parsing the second network host identifier according to the second template; and revising means for revising said estimated geographic location based at least in part on said first parsing.
  • 40. The apparatus of claim 38, further comprising: revising means for revising said estimated geographic location based at least in part on a client profile associated with the client.
  • 41. An apparatus for determining a geographic location, comprising: creating means for creating a log comprising network addresses of clients that have communicated with a web server; filtering means for filtering the log so as to remove undesirable network addresses; asynchronous tracing means for asynchronously performing a trace route between a first one of said filtered network addresses and the server; analyzing means for analyzing a result of said asynchronous performed trace route; and determining means for determining a geographic location for said first one responsive to said analyzing.
  • 42. The apparatus of claim 41, further comprising: generating means for generating a report comprising geographic locations for clients that have communicated with the web server.
  • 43. The apparatus of claim 41, wherein said determining means for determining the geographic location comprises: matching means for matching the result against a template identifying geographically significant portions of network addresses formatted in compliance with the template.