Technical Field
This application generally relates to client-server communications and the delivery of content over computer networks, and more particularly to the identification and/or characterization of client devices that are requesting content over computer networks.
Brief Description of the Related Art
The client-server model for obtaining content over a computer network is well-known in the art. In a typical system, such as that shown in
It is also known in the art to use distributed computer systems to deliver content to client devices. One such distributed computer system is a “content delivery network” or “CDN” that is operated and managed by a service provider. The service provider typically provides the content delivery service on behalf of third party content providers. A “distributed system” of this type typically refers to a collection of autonomous computers linked by a network or networks, together with the software, systems, protocols and techniques designed to facilitate various services, such as content delivery or the support of outsourced site infrastructure. Typically, “content delivery” refers to the storage, caching, or transmission of content (such as web pages, streaming media and applications) on behalf of content providers, and ancillary technologies used therewith including, without limitation, DNS query handling, provisioning, data monitoring and reporting, content targeting, personalization, and business intelligence.
In a known system such as that shown in
Typically, content providers offload their content delivery by aliasing (e.g., by a DNS CNAME or otherwise) given content provider domains or sub-domains to domains that are managed by the service provider's authoritative domain name service. End user client machines 122 that desire such content may be directed to the distributed computer system to obtain that content more reliably and efficiently. The servers 102 respond to the client requests by obtaining requested content from a local cache, from another content server, from the origin server 106, or other source, for example.
Although not shown in detail in
As illustrated in
The machine shown in
The CDN may include a storage subsystem (sometimes referred to as “NetStorage”) which may be located in a network datacenter accessible to the content servers, such as described in U.S. Pat. No. 7,472,178, the disclosure of which is incorporated herein by reference. The CDN may operate a server cache hierarchy to provide intermediate caching of customer content; one such cache hierarchy subsystem is described in U.S. Pat. No. 7,376,716, the disclosure of which is incorporated herein by reference. For live streaming delivery, the CDN may include a live delivery subsystem, such as described in U.S. Pat. No. 7,296,082, and U.S. Publication No. 2011/0173345, the disclosures of which are incorporated herein by reference.
Whether content is delivered directly as in
The proliferation of client devices means that the display features, form factors, functional capabilities, and other characteristics thereof are becoming much more diverse. Online content providers want to be able to deliver content effectively and efficiently to this increasing array of clients in a way that is situationally-aware. To optimize the end user experience, a given server (in the CDN or otherwise) preferably is able to understand the capabilities, limitations, and other attributes of the client device that is requesting content from it. The server can then act appropriately for the particular device, for example, by sending images appropriately sized for the client device's screen, or by filtering content sent to the client so that incompatible content is not delivered to the client. Hence, there is a need for a server to be able to discern information about a requesting client rapidly, accurately, and at scale, while accommodating a non-uniform and ever-expanding universe of new clients.
The teachings herein address these and other needs and offer other features and advantages that will become apparent in view of this disclosure.
The teachings herein generally relate to client-server communications and the delivery of content over computer networks to client devices, and the teachings provide improved methods, systems, and apparatus for identifying and/or characterizing client devices that are requesting content from a server. For example, based on information sent in a client device's request for content, a server modified in accordance with the teachings hereof can derive and identify the client device and a set of characteristics associated with the client device. Such characteristics might include the model name/manufacturer of the client device, screen dimensions of the client device, information about the particular system or browser version it is running, content formats it supports, and so on. The server may then use this information to modify and customize its response for the given client device.
In one embodiment, as part of an offline configuration, each of a set of known client devices is initially associated with a set of tokens that are expected to be received in a request from a client device, typically tokens that would be present in the client device's user-agent header in an HTTP GET request (although other fields might be used with the teachings hereof). A data structure mapping expected tokens to associated known client devices is established.
Continuing the example, when the system is live (online), a particular client device makes a request for content, and sends the user-agent header field. The server tokenizes this information, breaking it up into its individual constituents, such as “Windows” or “Safari.” Based on a comparison between the tokens generated from the information received from the client device, and the expected tokens that were previously associated with known client devices, the server can determine which of the known client devices is sending that request. In effect, the server can select which of the known client devices has tokens that are most similar to those generated from the request, the matching set of tokens representing a kind of fingerprint for the device.
The process of finding a matching device, given a set of tokens generated from a client device's request, may be accomplished in a variety of ways. For example, the server can use a scoring approach by taking a particular generated token, using it to look up those known client devices that had been associated with that token, and then increasing a score for each of those known client devices. This process is repeated for the other generated tokens, and at the end of the scoring, the known client device with the highest score can be selected as the matching client device. Note that tokens may have different weights, so that the appearance of a particular token may result in a larger increase in the matching known client devices' scores than does the appearance of others.
In an alternate embodiment, the server identifies the requesting client device as one of the known client devices by using the set of generated tokens to create a key. For example, the generated tokens may be aliased to integers or other identifiers, which are then combined to create the key. Or the tokens themselves may be used (e.g., as strings which are concatenated). The server uses the constructed key to look up a device identifier in the data structure, which, for example, has been prepopulated so that the key points to a particular device identifier that corresponds to the matching client device. In alternate embodiments, some but not all of the tokens may be used in constructing the key. For example, certain tokens can be ignored if they are low-value for identifying a client device. To accomplish this, the system may employ a whitelist of valid tokens, created offline during the initial configuration. Only tokens in the whitelist are used in constructing the key. This allows low-value or noise tokens, omitted from the whitelist, to be ignored during the matching process. Using a blacklist of invalid tokens is an alternate embodiment.
Once a requesting client device is identified as a particular known client device, the server can map that client device's identity to a set of client device characteristics. Such characteristics might include (for example) screen dimensions, model name, support for AJAX technologies, and other features that were not known based on the client device's request. The teachings hereof are applicable to (though not limited to) use with mobile devices such as wireless smartphones or WiFi-enabled tablets, and so forth. The characteristics of such devices vary widely, and by knowing the characteristics of the client device, a server can customize a response for the particular client device, apply appropriate optimization techniques, or send the information to an origin server or elsewhere to be used for performing such customizations and optimizations. Ideally, the result is better display of the content on the client device, as well as improved performance, since the nature and size of the content can be adjusted based on an expected bandwidth to the client device and the capabilities that the client device possesses.
It should be understood that while the use of user-agent HTTP headers is one application, the teachings hereof and in particular the tokenization approach described herein are not limited to such. Likewise the server is in many cases an HTTP server, but is not limited to such. In some cases, the server may be an HTTP proxy server in a content delivery network operated by a service provider on behalf of participating content providers, and the identification of the client device and its characteristics may be offered as a service by the CDN for participating content providers. Hence, such information may be communicated from the content delivery platform to the content provider's own servers or other data infrastructure for use in, e.g., content authoring.
The foregoing merely refers to non-limiting embodiments of the subject matter disclosed herein and the appended claims define the subject matter for which protection is sought. The teachings hereof may be realized in a variety of systems, methods, apparatus, and non-transitory computer-readable media. It is also noted that the allocation of functions to particular machines described herein is not limiting, as the functions recited herein may be combined or split amongst machines in a variety of ways.
The teachings herein will be more fully understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
It should be noted that in the Figures, the integers representing internal identifiers (such as 37, 56, 118, 17, and 1,2,3, etc., in
The following description sets forth embodiments of the invention to provide an overall understanding of the principles of the structure, function, manufacture, and use of the subject matter disclosed herein. The systems, methods and apparatus described herein and illustrated in the accompanying drawings are non-limiting examples; the scope of the invention is defined solely by the claims. The features described or illustrated in connection with one embodiment may be combined with the features of other embodiments; such modifications and variations are intended to be included within the scope of the present disclosure. All patents, publications and references cited herein are incorporated herein by reference in their entireties.
Section 1.0—Introduction
According to the teachings hereof, the functionality of a server can be extended by incorporating a component that identifies client devices that are making requests to the server, and potentially supplies a set of characteristics about the identified devices. For convenience of description (only), this component is referred to herein as the device characterization component (DC). Given a client request, the DC identifies the client device that made it, e.g., by mapping it to a particular client device identifier. It should be noted that identifying a client device does not necessarily mean identifying just the hardware (e.g., a particular model of laptop or of smartphone) but also may involve information about the software resident on the device, particularly the OS and browser or other client application. Thus, a particular make/model of laptop running Windows 7 and using Internet Explorer to make requests can qualify and be identified as a different client device than the same make/model of laptop running Windows XP and using Firefox. Likewise, not every existing client device needs to be identified uniquely, because in some cases devices with insignificant differences may be treated as effectively the same client device. From an identified device, the DC can also provide information about the device's characteristics (e.g., screen height/width, JavaScript support, browser version, or other characteristics relating to the client's hardware and/or software, etc.) to other components in the server.
Typically, the server with the DC component is a web (HTTP) server, or in implementations relevant to the CDN system described above, the server may be a server running the HTTP proxy 207 process (HTTP proxy server). For example, the DC may be implemented as an independent library which will be used by the HTTP process or HTTP proxy process 207 to identify client devices and determine client device characteristics. The determined characteristics are preferably exposed to control information (e.g., metadata) and control routines executing in the server, so that this information can be taken into account to construct a response suited for the client device. The identification and/or characterization of the devices also can be logged and reported to a content provider user of a CDN.
While the DC is preferably resident within a given server fielding client requests, this is not a limitation, as the DC function could be implemented, for example, as a remote service.
The DC typically utilizes information received in the client request, typically information in one or more HTTP headers, and (in particular) a user-agent header. The user-agent request header field in HTTP 1.1 is described in RFC 2616. However, the teachings hereof are not limited to user-agent headers; for example, the techniques may be applied to data in other HTTP headers or part of some other, potentially later-defined header or data field adapted to be used for client-identifying purposes, whether those headers/fields are received from a client device or otherwise made known to the server. Examples of other headers include the X-Device-Stock-UA, X-wap-profile, X-OperaMini-Phone-UA header, etc. For convenience of illustration, the examples below use the user-agent header.
In the present embodiment, the DC includes the following components:
A Lexer, which receives user agents from client requests and breaks up the user agents into meaningful chunks, called tokens.
A Matcher, which receives the tokens from the Lexer and is responsible for using them to identify particular devices. The Matcher employs a match index, described in more detail and in different variations below, to match a given token to a set of devices associated with it.
A Characteristics Database, which stores characteristics for each client device. Given a particular device identified by the Matcher, the characteristics database provides a set of characteristics for that device. These characteristics can then be used by the server to generate an appropriate response for the client device.
Preferably, the match index and the characteristics database can each be updated via configuration files without requiring changes to the DC core logic or to the glue code in the server. The configuration of the system will also be described herein, and involves use of some of the same components. For convenience of description, the configuration is referred to herein as an “offline” process, while actual operation when the server is receiving client requests and identifying/characterizing client devices is referred to as “runtime” or “online.”
Before examining each of the components in more detail, presented below is a discussion of the data model for the DC.
Section 2.0—Data Model
Section 2.1—Match Index
In the present embodiment, the DC library maintains a match index to identify devices. The match index maps tokens to known client devices, preferably in memory. An example of such an index for three devices is shown in
As mentioned above, in alternate implementations, the match index and the DC system may utilize information other than, or in addition to, the user-agent header. For example, assume that the information for devices 1, 2, 3 in the lower part of
Section 2.2—Device Characteristics
In the present embodiment, the Characteristics Database stores, and the DC makes available to other server components (e.g., to the HTTP process or the HTTP proxy process), a set of characteristics for client devices. The names and permissible values of these characteristics are preferably configurable. Some examples are provided in later paragraphs.
To facilitate reporting and logging, a device_name characteristic is provided. The device_name characteristic is a unique name (per client device) which can be used in server log lines for later data processing.
Additionally, another characteristic referred to herein as “buckets”, a 32-bit mask, is included. An example is shown in
Client device characteristics might include such things as screen dimensions, JavaScript support, browser name and/or version, or other characteristics relating to the device hardware and/or software running on the device. Other examples of the kinds of characteristics that may be made available about a particular identified client device include: operating system name and/or version, processor name and/or version, the form factor of the device (e.g., smartphone, tablet, laptop), model name or manufacturer, user interface details (e.g., touchscreen availability, trackball, audio features, etc.), release date, connectivity/protocol information (e.g., WiFi enabled, 3G-capable), information about how the device renders/displays markup languages like HTML, WML, XHTML, or others, what support the device offers for AJAX technologies (e.g., JavaScript support, event listening support, CSS manipulation support), further screen information like display resolution and whether the display has dual orientation capability, support for content formats (including multimedia), how the device handles certain transactions such as authentication and HTTP POST, information about the client device's cache, whether the device has a camera or other hardware (processor, memory, etc.) features, whether particular software is installed, and so on. Virtually any characteristic about a client device that might be useful for a content developer designing a website or otherwise might be recognized by the DC system and then reported when the matching client device is seen by the system.
Section 3.0—DC Components
Section 3.1—Lexer
As noted above, the DC treats user-agents as sequences of tokens rather than whole character strings. Tokenizing input may be accomplished using conventional approaches, as modified by the teachings hereof. In the present embodiment, scanning a user-agent to tokenize it occurs online when servicing requests. It also occurs offline when generating and building the match index. Hence, to make sure that user-agents are tokenized uniformly during configuration (offline) and at runtime (online), the same lexing routines ought to be used in each process.
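The lexing step can be illustrated with a minimal Python sketch. The delimiter set, the function name `tokenize`, and the sample user-agent string are assumptions made for illustration; the description above does not specify how token boundaries are chosen.

```python
import re

# Assumed delimiter set (whitespace, parentheses, semicolons, commas, slashes).
# The text does not specify delimiters; this choice is illustrative only.
_DELIMITERS = re.compile(r"[\s();,/]+")


def tokenize(user_agent):
    """Break a user-agent string into its non-empty constituent tokens."""
    return [tok for tok in _DELIMITERS.split(user_agent) if tok]


# The same routine would be called both offline (index build) and online
# (request handling), so that tokens are produced uniformly in each process.
tokens = tokenize("Mozilla/5.0 (Windows NT 6.1) Safari/534.50")
print(tokens)
# ['Mozilla', '5.0', 'Windows', 'NT', '6.1', 'Safari', '534.50']
```

Keeping the lexer in one shared routine is what guarantees that a token generated at runtime can be found in an index that was built offline.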
Section 3.2—Matcher
As described above in connection with
For example, a score for a given client device can be calculated as the total of the weights of the tokens from the user-agent that map to that client device. (If a token appears in the user-agent but does not map to that client device, the token is not applied, i.e., it contributes zero. In other embodiments, the token in such a scenario might be counted as a negative.) The client device identified for a given user-agent is the device having the highest score relative to other client devices.
Note that the weight of each token is not necessarily the same. For example, the Matcher can keep a weighting indicating how significant it considers a given token for identifying a client device with which it is associated. The more common a particular token (across devices), the less significant it might be considered to be for identifying a particular client device.
By way of illustration, consider the sample match index in
As previously noted, a client device can be associated with information from not just the user-agent but other header or client information. In such a case, the user-agent can be scored and then combined with other information to determine the final matching device.
It is noted that the weighting of a token as the reciprocal of the number of user-agents containing it, as used in the example above, is provided for illustration purposes only; the approach described herein is not limited to any particular weighting mechanism.
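The scoring approach can be sketched in Python as follows. This is an illustrative sketch rather than the actual implementation: the function names, the one-sample-user-agent-per-device input, the whitespace tokenizer, the counting of each distinct token once, and the tie-breaking behavior are all assumptions. The weight used is the illustrative reciprocal weighting mentioned above.

```python
from collections import defaultdict


def build_match_index(device_user_agents, tokenize=str.split):
    """Offline step: map each token to (weight, set of device IDs)."""
    token_devices = defaultdict(set)
    for device_id, user_agent in device_user_agents.items():
        for tok in set(tokenize(user_agent)):
            token_devices[tok].add(device_id)
    # Illustrative weight: reciprocal of the number of sample user-agents
    # containing the token, so common tokens count for less.
    return {tok: (1.0 / len(devs), devs) for tok, devs in token_devices.items()}


def match(index, tokens):
    """Online step: return the highest-scoring device ID, or None."""
    scores = defaultdict(float)
    for tok in set(tokens):  # assumption: each distinct token is counted once
        entry = index.get(tok)
        if entry is None:
            continue  # a token unknown to the index contributes zero
        weight, devices = entry
        for device_id in devices:
            scores[device_id] += weight
    if not scores:
        return None
    # Ties resolve to whichever device max() encounters first (unspecified).
    return max(scores, key=scores.get)


# Hypothetical sample data, one user-agent per known device.
samples = {
    "device_1": "Mozilla Windows Safari",
    "device_2": "Mozilla Android Safari",
    "device_3": "Mozilla BlackBerry",
}
index = build_match_index(samples)
print(match(index, "Mozilla Android Safari Mobile".split()))  # device_2
```

In this example "Mozilla" appears in all three samples and so carries a weight of 1/3, while "Android" appears in only one and carries a weight of 1, which is what lets device_2 win.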
Internal Data Flow for Matcher
With reference to
Looking up a given token to obtain its associated “offset integer” can be accomplished using any of a wide variety of techniques known to those skilled in the art, and preferably will depend on the design parameters at hand, as well as the universe of possible tokens. Example implementations for performing such lookups include hash tables, tries (examples including a Patricia tree, nedtries and Judy arrays), cmptrees, policy-based trees, and other associative arrays.
Section 3.3—Characteristics Database
The characteristics database maps the client device IDs returned from the Matcher to the characteristics data that was configured for that client device. Virtually any set of characteristics may be configured. This enables the maintainer of the characteristics database to customize which characteristics are available to the server without changes to either the DC component or the server. Providers of the kinds of data that can be used to populate at least some of the data in the characteristics database include WURFL (wireless universal resource file, a device description repository) and Device Atlas.
Section 4.0—Configuration Files
Configuration input to DC is preferably implemented in the form of a lexicon file, an index file and a database file. The DC library uses these to construct its match index and characteristics database. To support dynamic reconfiguration (described below) as well as to ensure that servers with the same DC configuration give the same answers, the lexicon file is provided to synchronize the tokens and client device IDs that appear in the match index and database files.
The lexicon file consists of a lexicon ID, a list of tokens, and a list of client devices. Match indices and characteristics databases are constructed with a lexicon; they will throw an error/exception if their own configurations do not refer to the ID of the lexicon with which they were made.
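The lexicon check can be sketched as follows. The class and function names, the exception type, and the use of a dictionary in place of a parsed index file are hypothetical; the actual file formats are not specified here.

```python
class LexiconMismatchError(Exception):
    """Raised when a configuration was made with a different lexicon."""


class Lexicon:
    """Holds a lexicon ID, a list of tokens, and a list of client devices."""

    def __init__(self, lexicon_id, tokens, devices):
        self.lexicon_id = lexicon_id
        self.tokens = tuple(tokens)
        self.devices = tuple(devices)


def load_match_index(lexicon, index_config):
    """Build a token -> device-ID list mapping from a parsed index config.

    `index_config` is a hypothetical stand-in for a parsed index file: a dict
    with a "lexicon_id" field and an "entries" mapping.
    """
    if index_config.get("lexicon_id") != lexicon.lexicon_id:
        raise LexiconMismatchError(
            "index refers to lexicon %r, expected %r"
            % (index_config.get("lexicon_id"), lexicon.lexicon_id)
        )
    return {tok: list(devs) for tok, devs in index_config["entries"].items()}


lexicon = Lexicon("lex-1", ["Windows", "Safari"], ["device_1"])
config = {"lexicon_id": "lex-1", "entries": {"Windows": ["device_1"]}}
match_index = load_match_index(lexicon, config)
```

Failing fast on a mismatched lexicon ID is one way to ensure that two servers with the same configuration cannot silently interpret token or device IDs differently.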
Section 5.0—Reconfiguration
It is preferable to have the ability to reconfigure deployed DC functionality in a given server online. With respect to a given lexicon, a match index or characteristics database may be reloaded at any time. When new client devices or tokens are to be added to the system, a new lexicon is constructed.
Section 6.0—Reporting and Logging
A server with the DC component can use the pre-specified buckets characteristic (described above) to group counts of page views, requests, and bytes transferred. This enables the gathering of statistics on any arbitrary set of client devices via changes in configuration.
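A minimal sketch of bucket-based counting is shown below. It assumes that each bit of the 32-bit buckets mask marks membership in one configured reporting group; the bit assignments, function name, and counter layout are hypothetical.

```python
from collections import Counter

NUM_BUCKETS = 32  # the buckets characteristic is a 32-bit mask


def record_request(counters, buckets_mask, bytes_sent):
    """Add one request and its byte count to every bucket set in the mask."""
    for bit in range(NUM_BUCKETS):
        if buckets_mask & (1 << bit):
            counters["requests"][bit] += 1
            counters["bytes"][bit] += bytes_sent


# Hypothetical assignment: bit 0 = "all mobile", bit 2 = "tablets".
counters = {"requests": Counter(), "bytes": Counter()}
record_request(counters, 0b101, 1000)  # device in buckets 0 and 2
record_request(counters, 0b001, 500)   # device in bucket 0 only
print(counters["requests"][0], counters["bytes"][0])  # 2 1500
```

Because the grouping lives entirely in each device's configured mask, changing which devices are counted together requires only a configuration change, not a code change.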
For more sophisticated data mining, the server can insert a unique device name (e.g., the device_name characteristic described above in the Data Model section, or other identifier) into its log lines. Those logs can then be processed with the characteristics database available to provide additional information about the client devices identified on the lines.
Section 7.0—“Direct Match” Embodiment
In an alternative embodiment, a ‘direct matching’ approach may be employed instead of the scoring approach described above, in which client device IDs are scored to identify a “winning” client device.
For direct matching, a user-agent string received in a request at the server is tokenized into one or more tokens, which are each associated with an integer, much as previously described with respect to
In the current embodiment, the key essentially points to an offset in a data structure such as the array of
While all of the tokens in the user-agent can be used to create the key (as shown in
In light of the above, in one embodiment, noise tokens can be identified and removed from the system as part of the configuration process. For example, when the offline tokenizing process is performed on sample user agents it yields a set of tokens (a whitelist) that is used to construct the lexicon file and the match index. The noise tokens are removed or otherwise kept out of the set of tokens, and therefore the lexicon file and the match index. As a result these noise tokens are not used in looking up a client device in the match index—if they appear at runtime in the user-agent from a client device, they are ignored.
It should be noted that, in alternative implementations, the noise tokens could be part of a blacklist that is used to discard tokens at runtime, rather than the whitelist implementation described above.
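Direct matching with a whitelist can be sketched as follows. This sketch assumes that the token-to-integer table doubles as the whitelist and that the key is an ordered tuple of integers looked up in a dictionary; the integer values, device names, and function names are hypothetical, and a real implementation might instead combine the integers into an offset into an array.

```python
def build_key(tokens, token_ids):
    """Combine the integer aliases of whitelisted tokens into a key.

    Tokens absent from `token_ids` (noise tokens) are ignored.
    """
    return tuple(token_ids[tok] for tok in tokens if tok in token_ids)


def direct_match(tokens, token_ids, key_to_device):
    """Return the device ID the key points to, or None if there is no entry."""
    return key_to_device.get(build_key(tokens, token_ids))


# Hypothetical configuration produced offline.
token_ids = {"Windows": 1, "Safari": 2, "iPhone": 3}
key_to_device = {(1, 2): "device_1", (3, 2): "device_2"}

# "Mozilla" and "en-US" are not whitelisted, so they do not affect the key.
result = direct_match(["Mozilla", "Windows", "en-US", "Safari"],
                      token_ids, key_to_device)
print(result)  # device_1
```

A blacklist variant would instead drop any token found in a set of known noise tokens before building the key.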
It is also possible to configure the system to utilize “distinguished” or “high-value” tokens, and to use these to help identify client devices. For example, assume that when tokenized, a given user agent produces tokens A, B, C, D, E. Assume further that a set of distinguished tokens (within the larger set of valid user-agent tokens) has been identified, and that tokens B, C, D are such tokens. Tokens B and C, for example, might designate a particular operating system, while token D represents a particular browser. The system can be configured to construct a key solely out of these tokens at runtime. For example, the key may be constructed as follows: operating_system_token+browser_token. Continuing the example, the resulting key is B (selected as the first OS token to appear)+D. This new key B+D can be used to look up a matching device in the match index. (Other categories of tokens beyond operating system and browser tokens might be defined in practice.)
In one implementation, the lookup on distinguished tokens can be performed if the normal token matching process does not yield a matching device.
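The distinguished-token key and its use as a fallback can be sketched as follows. The function names, the category table, the string form of the key, and the behavior when a category has no token are assumptions for illustration.

```python
def distinguished_key(tokens, categories,
                      order=("operating_system", "browser")):
    """Build a key from the first token seen in each distinguished category.

    Returns None when any category has no token (an assumed behavior).
    """
    parts = []
    for category in order:
        first = next((t for t in tokens if categories.get(t) == category), None)
        if first is None:
            return None
        parts.append(first)
    return "+".join(parts)


def identify(tokens, normal_lookup, categories, distinguished_index):
    """Try normal matching first; fall back to the distinguished-token key."""
    device = normal_lookup(tokens)
    if device is not None:
        return device
    key = distinguished_key(tokens, categories)
    return distinguished_index.get(key) if key is not None else None


# The example from the text: B and C are OS tokens, D is a browser token.
categories = {"B": "operating_system", "C": "operating_system", "D": "browser"}
key = distinguished_key(["A", "B", "C", "D", "E"], categories)
print(key)  # B+D

# Hypothetical fallback index; the normal lookup here never finds a match.
device = identify(["A", "B", "C", "D", "E"], lambda toks: None,
                  categories, {"B+D": "device_7"})
```

The fallback trades precision for coverage: a request from an unrecognized device can still be mapped to a known device that shares its operating system and browser.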
Section 8.0—Computer Based Implementation
The clients, servers, and other devices described herein may be implemented with conventional computer systems, as modified by the teachings hereof, with the functional characteristics described above realized in special-purpose hardware, general-purpose hardware configured by software stored therein for special purposes, or a combination thereof.
Software may include one or several discrete programs. Any given function may comprise part of any given module, process, execution thread, or other such programming construct. Generalizing, each function described above may be implemented as computer code, namely, as a set of computer instructions, executable in one or more processors to provide a special purpose machine. The code may be executed using conventional apparatus—such as a processor in a computer, digital data processing device, or other computing apparatus—as modified by the teachings hereof. In one embodiment, such software may be implemented in a programming language that runs in conjunction with a proxy on a standard hardware platform running an operating system such as Linux. The functionality may be built into the proxy code, or it may be executed as an adjunct to that code.
While in some cases above a particular order of operations performed by certain embodiments is set forth, it should be understood that such order is exemplary and that they may be performed in a different order, combined, or the like. Moreover, some of the functions may be combined or shared in given instructions, program sequences, code portions, and the like. References in the specification to a given embodiment indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic.
Computer system 1100 includes a processor 1104 coupled to bus 1101. In some systems, multiple processors and/or processor cores may be employed. Computer system 1100 further includes a main memory 1110, such as a random access memory (RAM) or other storage device, coupled to the bus 1101 for storing information and instructions to be executed by processor 1104. A read only memory (ROM) 1108 is coupled to the bus 1101 for storing information and instructions for processor 1104. A non-volatile storage device 1106, such as a magnetic disk, solid state memory (e.g., flash memory), or optical disk, is provided and coupled to bus 1101 for storing information and instructions. Other application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or circuitry may be included in the computer system 1100 to perform functions described herein.
Although the computer system 1100 is often managed remotely via a communication interface 1116, for local administration purposes the system 1100 may have a peripheral interface 1112 that communicatively couples computer system 1100 to a user display 1114 that displays the output of software executing on the computer system, and to an input device 1115 (e.g., a keyboard, mouse, trackpad, touchscreen) that communicates user input and instructions to the computer system 1100. The peripheral interface 1112 may include interface circuitry, control and/or level-shifting logic for local buses such as RS-485, Universal Serial Bus (USB), IEEE 1394, or other communication links.
Computer system 1100 is coupled to a communication interface 1116 that provides a link (e.g., at a physical layer, data link layer, or otherwise) between the system bus 1101 and an external communication link. The communication interface 1116 provides a network link 1118. The communication interface 1116 may represent an Ethernet or other network interface card (NIC), a wireless interface, modem, an optical interface, or other kind of input/output interface.
Network link 1118 provides data communication through one or more networks to other devices. Such devices include other computer systems that are part of a local area network (LAN) 1126. Furthermore, the network link 1118 provides a link, via an internet service provider (ISP) 1120, to the Internet 1122. In turn, the Internet 1122 may provide a link to other computing systems such as a remote server 1130 and/or a remote client 1131. Network link 1118 and such networks may transmit data using packet-switched, circuit-switched, or other data-transmission approaches.
In operation, the computer system 1100 may implement the functionality described herein as a result of the processor executing code. Such code may be read from or stored on a non-transitory computer-readable medium, such as memory 1110, ROM 1108, or storage device 1106. Other forms of non-transitory computer-readable media include disks, tapes, magnetic media, CD-ROMs, optical media, RAM, PROM, EPROM, and EEPROM. Any other non-transitory computer-readable medium may be employed. Executing code may also be read from network link 1118 (e.g., following storage in an interface buffer, local memory, or other circuitry).
Any trademarks appearing herein (including Windows, Mozilla, Macintosh, Intel, Safari, iPhone, Blackberry, Android) are the properties of their respective owners and are used for identification and descriptive purposes in explaining the subject matter hereof, and not to imply endorsement or affiliation.
This application is a continuation of U.S. application Ser. No. 13/730,428, filed Dec. 28, 2012, which claims the benefit of priority of U.S. Provisional Application No. 61/581,738, filed Dec. 30, 2011, and of U.S. Provisional Application No. 61/595,982, filed Feb. 7, 2012, the teachings of all of which are hereby incorporated by reference in their entireties.
Number | Date | Country | |
---|---|---|---|
20160323387 A1 | Nov 2016 | US |
Number | Date | Country | |
---|---|---|---|
61595982 | Feb 2012 | US | |
61581738 | Dec 2011 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 13730428 | Dec 2012 | US |
Child | 15210357 | US |