1. Technical Field
This disclosure generally relates to data processing apparatus and to client-server systems for delivering online content, among other things.
2. Brief Description of the Related Art
It is known in the art, in accordance with the HTTP protocol, for a server identified by a given domain name to store one or more cookies on the client machine of an end-user visiting a website hosted by that server. The cookie typically contains data relevant to the client or to the end-user, such as state information for a given web session, a record of visits, purchases, and/or other past activities on the website by the end-user. Further, a cookie might contain a unique identifier for the client, allowing the client to be identified and tracked on subsequent visits (sometimes referred to as an ID cookie). Whatever information the cookie(s) might store, when the client returns to the website, it sends its cookies to the server and thereby enables the server to access the stored data.
According to convention, a server sets a cookie to be accessible only within the host domain (e.g., foo-A.com or shoppingcart.foo-A.com, etc.). The cookie's scope may also be limited to a particular path (e.g., /user) within the domain. Thus, the cookie's domain and path determine the scope of the cookie, and they tell the client that the cookie should only be sent back to a server hosting the stated domain and path, e.g., as part of the client's content request to that server. This generally means that cookies set in one domain are not accessible to hosts in another domain.
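By way of non-limiting illustration, the scoping rule described above can be sketched as follows. This is a minimal sketch, assuming the domain-match and path-match rules of RFC 6265 in simplified form (IP-address hosts and host-only cookies are not handled). The function names are hypothetical and are not part of this disclosure.

```python
def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Return True if request_host falls within cookie_domain's scope."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    # Either an exact match, or the host is a subdomain of the cookie domain.
    return host == domain or host.endswith("." + domain)


def path_matches(request_path: str, cookie_path: str) -> bool:
    """Return True if request_path falls within cookie_path's scope."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # The prefix must end at a path-segment boundary.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


def should_send_cookie(host: str, path: str, cookie_domain: str, cookie_path: str) -> bool:
    """Decide whether a client would return the cookie with this request."""
    return domain_matches(host, cookie_domain) and path_matches(path, cookie_path)
```

Under these assumptions, a cookie scoped to foo-A.com and /user would accompany a request to shoppingcart.foo-A.com/user/profile but not a request to foo-B.com, which is the isolation that cookie synchronization works around.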
In some cases, however, there is a need to synchronize cookies across domains. For example, in the online advertising industry, bidders and ad exchanges often need to synchronize ID cookies so that in online auctions for advertising space managed by the ad exchange, the bidder can identify a particular client internally given the ad exchange's identifier. As another example, a website owner may need to synchronize cookies with an outside analytics service, so that the analytics service can identify a particular client internally given the website owner's identifier. Further, a website owner may operate a multi-domain site and need to synchronize cookies across those disparate domains. As a result, certain cookie synchronization techniques have been developed.
Current cookie synchronization techniques require a complicated series of messages between multiple parties. This is not only slow, due to the round trips involved, but also requires a high degree of coordination amongst the involved parties.
For example, it is known in the art to use a series of HTTP redirects (302 responses) to synchronize cookies between two machines.
The process begins when an end-user client 100 makes an HTTP ‘GET’ request to foo-A.com for an object. The object may be, for example, a match tag or pixel placed on a web page for the purpose of initiating the synchronization process. Server A is able to read the ID cookie from its domain (e.g., ID=123) and issue an HTTP 302 redirect to foo-B.com, placing its cookie in the redirect URL as a parameter, a technique sometimes referred to as ‘piggybacking’ the cookie. Server B receives the subsequent request for the redirect URL from the end-user client and reads the foo-A.com ID cookie, while also receiving its own foo-B.com ID cookie (e.g., ID=456) from the client, since the client will send its foo-B.com cookies as part of the request. Hence, Server B now has both ID cookies and can establish a mapping between the two. Server B can then deliver the pixel (the 1×1 image) to the client 100. Alternatively, as shown by the dotted arrows, Server B could issue another redirect to foo-A.com, placing its cookie in the redirect URL as a parameter. This way, Server A will receive the foo-B.com ID cookie and also can establish the mapping between the two IDs.
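The redirect exchange described above may be illustrated with the following minimal sketch. The query parameter name partner_id, the object path pixel.gif, and the function names are assumptions made solely for illustration; no particular parameter format is prescribed by the prior art technique.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sync_redirect(partner_url: str, own_id: str) -> str:
    """Server A: build the Location value for the 302 response,
    piggybacking A's ID cookie value as a query parameter."""
    return partner_url + "?" + urlencode({"partner_id": own_id})


def record_mapping(request_url: str, request_cookies: dict, mapping: dict) -> None:
    """Server B: pair the piggybacked ID with B's own ID cookie."""
    query = parse_qs(urlsplit(request_url).query)
    partner_id = query["partner_id"][0]
    own_id = request_cookies["ID"]
    mapping[partner_id] = own_id


# Server A reads ID=123 and redirects the client to foo-B.com.
location = build_sync_redirect("http://foo-B.com/pixel.gif", "123")

# The client follows the redirect and sends its foo-B.com cookie (ID=456).
id_map = {}
record_mapping(location, {"ID": "456"}, id_map)
```

Each synchronization in this sketch costs the client an additional round trip to foo-B.com before the pixel is delivered.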
As mentioned above, this and other prior art approaches for cookie synchronization are slow and complex.
There is a need to improve the speed and reduce the complexity of existing cookie synchronization techniques. Moreover, there is also a need to improve content delivery on websites that source content from multiple domains. As will be described below, improved cookie synchronization techniques can facilitate methods and systems for delivering web content sourced from multiple domains.
The teachings hereof address these needs and offer advantages and functionality which will become clear in view of this disclosure.
This disclosure describes, among other things, improved systems and methods for synchronizing cookies across different domains, and for leveraging those systems for content delivery solutions, including solutions for sites that incorporate third-party content.
For example, two parties hosting content under different domain names from one another may desire to synchronize identification or ‘ID’ cookies that hold identifiers for a given client or end-user, so that one or both of the parties can map a given identifier from one domain to the identifier used in the other domain. Some of the techniques described herein leverage one or more proxy servers that may be part of a distributed computing platform known as a content delivery network. Furthermore, improved techniques for cookie synchronization can facilitate new ways of accelerating the delivery of content. In situations where a particular website is built on content from multiple domains (e.g., a web page from one domain with embedded content from another domain), the techniques in some embodiments enable cookies from the different domains to be mapped to one another, and this mapping can be used to apply content acceleration techniques. For example, an ID cookie for a given client received in a request for a web page in a first domain can be used to determine a corresponding ID cookie(s) for that client in a second domain. This information can be used to prefetch embedded content from the second domain (among other acceleration techniques).
The foregoing merely refers to non-limiting embodiments of the subject matter disclosed herein. The appended claims define the scope of the invention and are also considered to be part of the disclosure hereof. The teachings hereof may be realized in a variety of systems, methods, apparatus, and non-transitory computer-readable media. It is also noted that the allocation of functions to different machines is not limiting, as the functions recited herein may be combined or split amongst different machines in a variety of ways.
The teachings hereof will be more fully understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
The following description sets forth embodiments of the invention to provide an overall understanding of the principles of the structure, function, manufacture, and use of the methods and apparatus disclosed herein. The systems, methods and apparatus described herein and illustrated in the accompanying drawings are non-limiting examples; the scope of the invention is defined solely by the claims. The features described or illustrated in connection with one exemplary embodiment may be combined with the features of other embodiments. Such modifications and variations are intended to be included within the scope of the present invention. All patents, publications and references cited herein are expressly incorporated herein by reference in their entirety.
Some embodiments described herein make use of an intermediary between a client and a server. For example, some embodiments make use of an edge-deployed proxy server, as utilized in a distributed computing platform configured as a content delivery network. Hence for illustrative purposes an example of a content delivery network is described below.
As used herein, a domain name, or sometimes simply a ‘domain,’ is used to refer to a name that designates a realm of administrative authority on the Internet. An example of a domain name is “example.com”, which indicates a particular top level domain (“.com”) and a second level domain (“example”). Such a domain name may have subdomains, such as “images.example.com” and “www.example.com”, which are also themselves domain names. If in use, a domain name typically is resolved through the domain name system (DNS system) to identify a particular network host or device, e.g., a particular machine or set of machines.
In this disclosure, the term ‘URL’ is used to refer to a ‘uniform resource locator’. As those skilled in the art will recognize, according to convention a given URL may contain several components or fields, including a protocol (also referred to as a scheme), a hostname, a path (which may include a filename, if the URL is pointing to a particular file/resource rather than a directory), a query (e.g., a query string with query parameters), and a fragment. Thus a representative URL may be written as <protocol>://<hostname>/<path><query><fragment>. However, a URL need not contain all of these components.
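For illustration only, the URL components named above can be extracted with the Python standard library as sketched below. The URL itself is hypothetical. Note that the library returns the query and fragment without their leading ‘?’ and ‘#’ delimiters.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
parts = urlsplit("http://www.example.com/images/logo.gif?size=large#top")

components = {
    "protocol": parts.scheme,     # also referred to as the scheme
    "hostname": parts.hostname,
    "path": parts.path,           # includes the filename, if any
    "query": parts.query,
    "fragment": parts.fragment,
}
```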
CDN
One kind of distributed computer system is a “content delivery network” or “CDN” that is operated and managed by a service provider. The service provider typically provides the content delivery service on behalf of third parties. A “distributed system” of this type typically refers to a collection of autonomous computers linked by a network or networks, together with the software, systems, protocols and techniques designed to facilitate various services, such as content delivery or the support of outsourced site infrastructure. Typically, “content delivery” refers to the storage, caching, or transmission of content—such as web pages, streaming media and applications—on behalf of content providers, and ancillary technologies used therewith including, without limitation, DNS query handling, provisioning, data monitoring and reporting, content targeting, personalization, and business intelligence.
In a known system such as that shown in
Typically, content providers offload their content delivery by aliasing (e.g., by a DNS CNAME or otherwise) given content provider domains to domains that are managed by the service provider's authoritative domain name service. End user client machines 222 that desire such content may be directed to the distributed computer system to obtain that content more reliably and efficiently. The CDN servers respond to the client requests, for example by obtaining requested content from a local cache, from another CDN server, from the origin server 206, or other source.
Although not shown in detail in
As illustrated in
The machine shown in
The CDN may include a network storage subsystem (sometimes referred to herein as “NetStorage”) which may be located in a network datacenter accessible to the content servers, such as described in U.S. Pat. No. 7,472,178, the disclosure of which is incorporated herein by reference.
The CDN may operate a server cache hierarchy to provide intermediate caching of customer content; one such cache hierarchy subsystem is described in U.S. Pat. No. 7,376,716, the disclosure of which is incorporated herein by reference.
Proxy Server Cookie Matching
An enhancement to the redirect technique described with respect to
Referring now to
At 402, the client 100 makes a request to the proxy server for content (e.g., for a pixel, other image, or other web page object). Upon receiving this request, the proxy server invokes a content handling configuration for foo-A.com to determine how to handle this request (e.g., as specified in configuration metadata as taught in U.S. Pat. No. 7,240,100, the teachings of which are hereby incorporated by reference). In this case, assume the configuration indicates that a request for this object should be handled as a cookie-syncing request and provides the necessary parameters to perform the cookie sync (e.g., the domain with which to perform the cookie sync, information necessary to decode the cookie, etc.). Note that in alternate embodiments, information relating to the cookie synchronization process may be placed in the request URL as a parameter, or even in the requested object (which the proxy server can periodically obtain from Server A and cache locally).
The proxy server contains logic to extract the ID cookie from the client's request and insert it into a redirect URL to foo-B.com, as shown in step 404, causing the client to make a request to Server B, as shown in step 406. If no reciprocal cookie sync is necessary, then in this example the proxy server's role is done and Server B would provide the requested content to the client. However, in the embodiment illustrated here, foo-B.com responds with its own redirect providing its ID cookie with “ID=456” in the URL (408). Following the DNS aliasing process, the redirect will arrive back at the proxy server (410), which then serves the requested object (412) and stores the association between the two ID cookies (e.g., foo-A.cookie_id of 123 equals foo-B.cookie_id of 456). At that point or some later time, the association is reported back to Party A, as shown in step 414.
The redirect technique illustrated in
In an alternate embodiment, illustrated in
Note that if the association between the ID cookies is cached at the proxy server or a remote storage accessible to the proxy server, it is possible to accelerate the process further: when the proxy server receives the initial request (502) from the client and receives foo-A.com's cookies, it can perform an internal lookup in a cookie association cache, using the foo-A.com ID cookie as a key, to see if it already has an associated foo-B.com ID cookie. If so, then the proxy server does not need to redirect to foo-B.com and wait for a response (as in step 504 and 506), but instead can serve the requested content and report the mapping between the cookies (508).
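The cache lookup described above may be sketched as follows. This is a minimal sketch under stated assumptions: the cookie association cache is modeled as an in-memory dictionary, and the path /sync, the parameter name partner_id, and the function name are hypothetical.

```python
from urllib.parse import urlencode


def handle_sync_request(foo_a_id: str, association_cache: dict) -> dict:
    """Decide how the proxy answers a cookie-sync request for foo-A.com."""
    foo_b_id = association_cache.get(foo_a_id)
    if foo_b_id is not None:
        # Cache hit: no redirect is needed; serve content and report the mapping.
        return {"status": 200, "report": (foo_a_id, foo_b_id)}
    # Cache miss: redirect the client to foo-B.com with A's ID piggybacked.
    location = "http://foo-B.com/sync?" + urlencode({"partner_id": foo_a_id})
    return {"status": 302, "location": location}
```

In a deployment the cache would more likely be a shared or remote store accessible to multiple proxy servers, so that a hit is possible regardless of which proxy server receives the request.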
The above technique can be used to synchronize across cookie-isolated subdomains, as can any of the other embodiments described herein.
Proxy Server ‘Silent’ Cookie Syncing
In another embodiment, a proxy server performs so-called ‘silent’ cookie syncing, in that the proxy server does not issue redirect responses as described above. Instead, the proxy server records and correlates ID cookies that are exposed during requests for content that the proxy server is handling.
To fulfill the client's request for the object, the proxy server may retrieve the object from a local cache, if the object is stored and valid (e.g., not expired) for delivery, or may make a forward request (shown with a dotted line) to Server A to obtain the object, and then relay it to the client 100 (604).
At a subsequent time, assume that the same client 100 makes a request to the proxy server for an object in the foo-B.com domain (606). The foregoing process repeats, with the proxy server obtaining the foo-B.com ID cookie and the CDN ID cookie, storing them in the database, and servicing the client's request for the object.
As a result of this process, the proxy server can establish a mapping between ID cookies across domains and report those mappings to CDN customers Party A and Party B. In this implementation, the mapping is keyed by the CDN ID cookie in the database. At steps 610 and 612, the proxy server can report the pairing to Parties A and B.
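One possible shape for the database keyed by the CDN ID cookie is sketched below. The class name, method names, and the example CDN ID value are hypothetical, and an in-memory dictionary stands in for whatever storage a given implementation uses.

```python
from collections import defaultdict


class SilentSyncStore:
    """Correlates per-domain ID cookies by a shared CDN ID cookie."""

    def __init__(self):
        # cdn_id -> {domain: id_cookie_value}
        self._by_cdn_id = defaultdict(dict)

    def observe(self, cdn_id: str, domain: str, id_cookie: str) -> None:
        """Record the ID cookie seen on a request for the given domain."""
        self._by_cdn_id[cdn_id][domain] = id_cookie

    def mapping(self, cdn_id: str, domain_a: str, domain_b: str):
        """Return (id_a, id_b) once both domains have been seen, else None."""
        seen = self._by_cdn_id.get(cdn_id, {})
        if domain_a in seen and domain_b in seen:
            return seen[domain_a], seen[domain_b]
        return None


store = SilentSyncStore()
store.observe("cdn-789", "foo-A.com", "123")   # request for foo-A.com content
before = store.mapping("cdn-789", "foo-A.com", "foo-B.com")
store.observe("cdn-789", "foo-B.com", "456")   # later request for foo-B.com content
after = store.mapping("cdn-789", "foo-A.com", "foo-B.com")
```

The pairing becomes reportable only after the same client has been observed under both domains, which is why no redirect is required.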
It should be noted that in practice, a given client 100 may not be guaranteed to return to the same proxy server in a given set of proxy servers. Thus, the database is preferably maintained across proxy servers in the CDN—potentially across servers in a particular region or across some other subset of proxy servers in the CDN—or even across the entire CDN platform. A given proxy server can report an ID cookie mapping, once determined, to a central repository (shown in
Cookie Syncing Via Proxy Server URL Modification
In another embodiment, a proxy server synchronizes cookies by rewriting URLs on-the-fly. This technique applies to the situation, among others, where Party A is a content provider with a website at foo-A.com and Party A has arranged for another party, Party B, to provide certain content on the site from Party B's own domain. Party B in this case might be a social media network, analytics or web monitoring vendor, advertiser, a party that provides site enhancements with embedded news/content feeds (using web API calls, for example), or otherwise. For purposes of illustration, assume Party A has published an html document (or other markup language document using XML or WML, or other content) on its site with an embedded URL(s) pointing to foo-B.com for such content. The content from Party B is typically referred to as “third-party content” on Party A's site, which is typically referred to as the “first-party” site.
<img src="http://foo-B.com/image.gif" height="50" width="50">
(The embedded object might be any type of content, be it images, or code, or videos, or other html, iframes, or otherwise, etc. The example of an image is used solely for illustrative purposes.)
The proxy server parses the html file and upon seeing this URL (in this case, within the image tag), the proxy server sees that the domain foo-B.com is outside of the foo-A.com domain. In some implementations, the proxy server may refer to a content handling routine that instructs the proxy server to look for the foo-B.com domain as a known third-party provider for the foo-A website. In other implementations, the proxy server can examine the domain names in the URLs to determine that the URL pointing to foo-B.com represents embedded third-party content. (Note that the hostname in the URL that triggers this may be the ‘foo-B.com’ name alone, as shown above, or a name containing the ‘foo-B.com’ domain name, such as ‘www.foo-B.com.’) The proxy server determines whether cookies have been synchronized for foo-A.com and foo-B.com and whether there is an existing (e.g., cached) mapping between them. The first time that the process takes place, there will be no such mapping.
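The examination of domain names in embedded URLs may be sketched as follows. This is a simplified illustration that inspects only src attributes and treats any host outside the first-party domain as third-party; the class name is hypothetical, and an implementation could instead consult a configured list of known third-party providers.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ThirdPartyFinder(HTMLParser):
    """Collects embedded URLs whose host lies outside the first-party domain."""

    def __init__(self, first_party: str):
        super().__init__()
        self.first_party = first_party.lower()
        self.third_party_urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name != "src" or not value:
                continue
            host = (urlsplit(value).hostname or "").lower()
            in_first_party = host == self.first_party or host.endswith(
                "." + self.first_party
            )
            # Relative URLs have no host and are first-party by definition.
            if host and not in_first_party:
                self.third_party_urls.append(value)


finder = ThirdPartyFinder("foo-A.com")
finder.feed('<img src="http://foo-B.com/image.gif" height="50" width="50">'
            '<img src="/local.gif">')
```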
If there is no such mapping, in order to synchronize cookies, the proxy server modifies the URL to point to a domain for Party B that has been aliased to the CDN, preferably a subdomain of a Party B domain name. In this embodiment, the aliased domain is one under which Party B has placed its ID cookie on the client 100, i.e., a domain that is within the valid scope of the cookie so that the cookie will be accessible for requests made to the aliased domain. In
While the example above involves modifying the URL to point to an aliased subdomain of the Party B domain, it does not necessarily have to be a subdomain. For example, Party B could also set up an alternate domain (e.g., foo-B-shadow.com) that is aliased to the CDN. Party B would need to arrange for the same ID cookies to be placed in both foo-B.com and foo-B-shadow.com. It could do this as follows: when a client visits foo-B.com, Party B sets its ID cookie and issues a redirect to foo-B-shadow.com with the cookie piggybacked in the URL, and foo-B-shadow.com then sets the same ID cookie under its domain. This requires extra configuration and time because of the redirect, but if the cookie IDs do not change, it only has to be done the first time a client visits foo-B.com.
Returning to
Note that to establish the cookie mapping the proxy server will typically need to know that the cookies received with the request at 702 are to be associated with the cookies received with the request at 706. These two requests may be separated in time and even may be received at different proxy servers in the CDN. The synchronization process is preferably handled asynchronously. Hence, it is preferable that when modifying the URL to point to the subdomain or shadow domain (at 703/704), the proxy server also inserts some information into the URL to keep state and signify that the URL is part of a cookie synchronization process. This information may include the foo-A.com cookie ID (e.g., piggybacking it into the URL), information about Party A or foo-A.com, a special character sequence indicating that the request is part of a cookie sync process, etc., such that at 706 this information can simply be read from the URL by the receiving proxy server and acted upon accordingly to complete the cookie mapping. In short, the proxy server preferably embeds state into the URL at 703/704 and/or inserts a pointer to stored state information on the proxy server.
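Embedding state into the rewritten URL may be sketched as follows. The query parameter names cs, fp, and fpid and the function names are hypothetical choices made for illustration; any encoding that lets the receiving proxy server recover the same information would serve.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode, parse_qsl


def rewrite_for_sync(url: str, aliased_host: str, first_party: str, first_party_id: str) -> str:
    """Point the embedded URL at the aliased host and embed sync state."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # cs marks a cookie-sync request; fp and fpid carry first-party state.
    params += [("cs", "1"), ("fp", first_party), ("fpid", first_party_id)]
    return urlunsplit(
        (parts.scheme, aliased_host, parts.path, urlencode(params), parts.fragment)
    )


def read_sync_state(url: str):
    """Recover the embedded state when the rewritten URL is requested."""
    params = dict(parse_qsl(urlsplit(url).query))
    if params.get("cs") != "1":
        return None
    return params["fp"], params["fpid"]


rewritten = rewrite_for_sync(
    "http://foo-B.com/image.gif", "cdn.foo-B.com", "foo-A.com", "123"
)
```

Because the state travels in the URL, the proxy server that receives the later request need not be the one that rewrote the page. Piggybacking the raw first-party ID exposes it in the URL, so an implementation may prefer an opaque pointer to stored state.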
Because a CDN typically contains multiple proxy servers, once the cookie mapping is established, the mappings are shared across the CDN or at least across a subset of proxy servers in the CDN, at least in some embodiments.
Moving to step 708, the proxy server responds to the client's request by obtaining image.gif from Server B and returning it to the client 100. To reduce integration complexity and as shown in
Note that in some cases, in step 706 the client 100 may not have any cookies to send for Party B's domain, because it may be the first time that the client 100 has requested content from Party B's site, or because the cookies have been deleted from the client machine 100, for example. In such a case, the response from the server of Party B (at 707) may include a directive to set an ID cookie on the client 100. The proxy server may, in some embodiments, capture this ID cookie, map it to the CDN ID cookie and/or Party A's ID cookie, and store it for later use, all before sending the set cookie directive onwards to the client 100 (at 708). In this way, cookie synchronization can be achieved the first time that the client 100 appears on the third-party (Party B) site, as the ID cookie is being set.
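Capturing the ID cookie from the set cookie directive may be sketched as follows, assuming the directive arrives as a Set-Cookie header value and that the ID cookie is named ID as in the earlier examples. The header contents, the first-party ID value, and the function name are hypothetical.

```python
from http.cookies import SimpleCookie


def capture_id_cookie(set_cookie_header: str, cookie_name: str = "ID"):
    """Extract the ID value from an origin's Set-Cookie header, if present."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    morsel = jar.get(cookie_name)
    return morsel.value if morsel is not None else None


# Hypothetical header from Server B and hypothetical first-party ID.
header = "ID=456; Domain=foo-B.com; Path=/"
mappings = {}
third_party_id = capture_id_cookie(header)
if third_party_id is not None:
    mappings["123"] = third_party_id  # foo-A.com ID -> foo-B.com ID
# The header is then relayed to the client unchanged.
```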
Acceleration of Third-Party Content
The synchronization of cookies in
Turning again to
The proxy server sends the file with the modified URL to the client 100 (at 712). Since the CDN is handling aliased foo-A.com, the client's request for the embedded third-party object will come to the proxy server. In anticipation of this request for the third-party content, the proxy server can pre-fetch the embedded object from Server B. The foo-B.com ID cookie must be used to make a complete and proper forward request to Server B for the content, which is potentially personalized content. Because of the previously-established cookie mapping, the proxy server has the foo-B.com ID cookie. Hence, the proxy server uses the foo-A.com ID cookie to determine the appropriate foo-B.com ID cookie, based on the previously established mapping, and pre-fetches the object (713). When the client 100 eventually parses the modified page.html and issues the request for image.gif (714), the proxy server has already obtained the object and can send it to the client immediately.
Note that in step 714 the client's request is for http://foo-A.com/foo-B.com/image.gif. The proxy server recognizes this as a special URL due to the embedded foo-B.com in the path, and recognizes that the object to deliver to the client is at http://foo-B.com/image.gif, which has been pre-fetched and stored in the cache at the proxy server. (Alternatively, a special sequence of characters could be inserted in the path to indicate to the proxy server that the URL is a rewritten third-party URL, e.g., http://foo-A.com/special-prefix/foo-B.com/image.gif)
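The rewriting and recognition of such a URL may be sketched as follows. This is a simplified illustration that ignores query strings and assumes the proxy server is configured with a set of known third-party hostnames; the function names are hypothetical.

```python
from urllib.parse import urlsplit


def consolidate(third_party_url: str, first_party_host: str) -> str:
    """Rewrite a third-party URL so it is requested under the first-party host."""
    parts = urlsplit(third_party_url)
    return f"{parts.scheme}://{first_party_host}/{parts.netloc}{parts.path}"


def resolve_consolidated(url: str, known_third_parties: set):
    """Return the original third-party URL, or None for an ordinary request."""
    parts = urlsplit(url)
    first_segment, _, rest = parts.path.lstrip("/").partition("/")
    if first_segment not in known_third_parties:
        return None
    return f"{parts.scheme}://{first_segment}/{rest}"


special = consolidate("http://foo-B.com/image.gif", "foo-A.com")
original = resolve_consolidated(special, {"foo-B.com"})
```

Checking the first path segment against a configured set avoids misreading an ordinary first-party path as a rewritten URL; the special-prefix variant mentioned above is another way to achieve the same disambiguation.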
Note that the proxy server can modify the URL in a variety of ways and that the above is but one example. For example, in an alternate embodiment, the URL in the page can be modified as follows:
This is then sent (at 712) and subsequently (at 714) the proxy server is configured to recognize this as the special URL and act accordingly.
Beyond pre-fetching, another advantage of the foregoing technique is that the URL for the modified page.html itself and the embedded object URL are now at the same domain, i.e., the host is foo-A.com (see the client requests at 710 and 714, in which the hostnames are the same). This domain consolidation allows a suitably capable client browser to operate more efficiently in terms of multiplexing connections to the proxy server and other enhancements. Both examples of rewritten URLs illustrate this domain consolidation technique.
In step 800, the proxy server receives a client request for first-party html (or other content with embedded URLs), and the client also sends its cookies for the first-party domain. In step 802, the proxy server obtains an html document, e.g., from cache or from the first-party server. The proxy server parses the html to find the URL pointing to an embedded third-party object hosted under a third-party domain, see step 804. In step 806, if the proxy server already has a mapping between the first-party and third-party domains, it branches to 808. If not, it branches to 818 in order to establish that mapping. In step 818, the proxy server modifies the third-party domain URL to point to a third-party domain aliased so as to be handled by the proxy server/CDN, preferably a subdomain. The html with this modified URL is sent to the client. Subsequently the client makes a request for the third-party object at the modified URL and along with this request sends the third-party ID cookie (step 820). In step 822 the proxy server maps the first-party ID cookie to the third-party ID cookie and stores this association. The proxy server then fetches and sends the third-party object to the client, in step 824.
In the branch beginning with step 808, the proxy server modifies the URL to point to the first-party domain for domain consolidation purposes (and also specifies the location of the third-party object in the URL path) and serves the html file with this modified URL to the client. In anticipation of receiving a request for this URL back from the client, the proxy server looks up the third-party ID cookie based on the first-party ID cookie and pre-fetches the third-party object using this information (810, 812). When the client request is subsequently received (814), the proxy server can serve the third-party object without the delay of fetching the object (816).
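The decision made at step 806 may be summarized with the following minimal sketch. The function name, the dictionary-based mapping store, the returned field names, and the "cdn." subdomain prefix are assumptions made for illustration only.

```python
def plan_response(first_party_host: str, first_party_id: str,
                  third_party_host: str, mappings: dict) -> dict:
    """Choose the branch taken at step 806 for one embedded third-party URL."""
    third_party_id = mappings.get((first_party_id, third_party_host))
    if third_party_id is None:
        # Steps 818-824: no mapping yet, so rewrite to the aliased domain
        # and capture the third-party ID cookie on the follow-up request.
        return {"branch": "sync", "rewrite_host": "cdn." + third_party_host}
    # Steps 808-816: mapping known, so consolidate domains and pre-fetch
    # the third-party object using the mapped ID cookie.
    return {
        "branch": "accelerate",
        "rewrite_host": first_party_host,
        "prefetch_cookie": {"ID": third_party_id},
    }


first_visit = plan_response("foo-A.com", "123", "foo-B.com", {})
later_visit = plan_response("foo-A.com", "123", "foo-B.com",
                            {("123", "foo-B.com"): "456"})
```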
It should be understood that while the examples above involve modification of URLs in a markup language page that point to an embedded object, this is not a limitation. In some cases, a page returned from Server A in step 703 may contain or reference code (e.g., Javascript or other script) that sources third-party content on Server B, e.g., by causing the client to construct a URL with a third-party domain like foo-B.com and issue a request for content at such URL. In this scenario, the proxy server can modify the code as it passes through the proxy server such that it no longer calls the third-party domain for content but rather points to the domain aliased to the CDN (e.g., cdn.foo-B.com or the alternate domain, as described above). This modified code can be returned in step 704 for execution by the client 100. (Steps 712 of
Third Party is a Participating Content Provider
The approaches described in connection with
Continuing this example, at 706, the client would make a request using the foo-B.com domain, which would be aliased to the CDN and handled by the proxy server (either the same proxy server or another in the CDN). The proxy server could capture the ID cookie for foo-B.com at that point, and be able to make the association between the ID cookies for foo-A.com and foo-B.com. The resulting synchronization of ID cookies could be used to accelerate delivery of Party B's content embedded on Party A's page, using the prefetching and/or domain consolidation approaches described above with respect to
It should be noted that because the CDN is handling Party B's content delivery, another way for the proxy server to capture the foo-B.com ID cookie is to do so when the proxy server is receiving a request for content at the Party B website (that is, a user seeking to go directly to the Party B website in another flow; and not when seeking Party B's embedded content on the Party A site). Using the CDN ID as described earlier with respect to
Use of Computer Technologies
The clients, servers, and other devices described herein may be implemented with conventional computer systems, as modified by the teachings hereof, with the functional characteristics described above realized in special-purpose hardware, general-purpose hardware configured by software stored therein for special purposes, or a combination thereof.
Software may include one or several discrete programs. Any given function may comprise part of any given module, process, execution thread, or other such programming construct. Generalizing, each function described above may be implemented as computer code, namely, as a set of computer instructions, executable in one or more processors to provide a special purpose machine. The code may be executed using conventional apparatus (such as a processor in a computer, digital data processing device, or other computing apparatus) as modified by the teachings hereof. In one embodiment, such software may be implemented in a programming language that runs in conjunction with a proxy on a standard Intel hardware platform running an operating system such as Linux. The functionality may be built into the proxy code, or it may be executed as an adjunct to that code.
While in some cases above a particular order of operations performed by certain embodiments is set forth, it should be understood that such order is exemplary and that they may be performed in a different order, combined, or the like. Moreover, some of the functions may be combined or shared in given instructions, program sequences, code portions, and the like. References in the specification to a given embodiment indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic.
Computer system 1000 includes a processor 1004 coupled to bus 1001. In some systems, multiple processors and/or processor cores may be employed. Computer system 1000 further includes a main memory 1010, such as a random access memory (RAM) or other storage device, coupled to the bus 1001 for storing information and instructions to be executed by processor 1004. A read only memory (ROM) 1008 is coupled to the bus 1001 for storing information and instructions for processor 1004. A non-volatile storage device 1006, such as a magnetic disk, solid state memory (e.g., flash memory), or optical disk, is provided and coupled to bus 1001 for storing information and instructions. Other application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or circuitry may be included in the computer system 1000 to perform functions described herein.
Although the computer system 1000 is often managed remotely via a communication interface 1016, for local administration purposes the system 1000 may have a peripheral interface 1012 that communicatively couples computer system 1000 to a user display 1014 that displays the output of software executing on the computer system, and an input device 1015 (e.g., a keyboard, mouse, trackpad, touchscreen) that communicates user input and instructions to the computer system 1000. The peripheral interface 1012 may include interface circuitry, control and/or level-shifting logic for local buses such as RS-485, Universal Serial Bus (USB), IEEE 1394, or other communication links.
Computer system 1000 is coupled to a communication interface 1016 that provides a link (e.g., at a physical layer, data link layer, or otherwise) between the system bus 1001 and an external communication link. The communication interface 1016 provides a network link 1018. The communication interface 1016 may represent an Ethernet or other network interface card (NIC), a wireless interface, modem, an optical interface, or other kind of input/output interface.
Network link 1018 provides data communication through one or more networks to other devices. Such devices include other computer systems that are part of a local area network (LAN) 1026. Furthermore, the network link 1018 provides a link, via an internet service provider (ISP) 1020, to the Internet 1022. In turn, the Internet 1022 may provide a link to other computing systems such as a remote server 1030 and/or a remote client 1031. Network link 1018 and such networks may transmit data using packet-switched, circuit-switched, or other data-transmission approaches.
In operation, the computer system 1000 may implement the functionality described herein as a result of the processor executing code. Such code may be read from or stored on a non-transitory computer-readable medium, such as memory 1010, ROM 1008, or storage device 1006. Other forms of non-transitory computer-readable media include disks, tapes, magnetic media, CD-ROMs, optical media, RAM, PROM, EPROM, and EEPROM. Any other non-transitory computer-readable medium may be employed. Executing code may also be read from network link 1018 (e.g., following storage in an interface buffer, local memory, or other circuitry).
Any trademarks appearing herein are for identification and descriptive purposes only. The enumeration and labeling of steps or elements in the Figures and corresponding descriptive text is for reference purposes only and is not intended to be limiting in any way.
This application is based on and claims the benefit of priority of U.S. Provisional Application No. 61/736,166, filed Dec. 12, 2012, the teachings of which are hereby incorporated by reference in their entirety.
Number | Date | Country
---|---|---
61736166 | Dec 2012 | US