The present invention relates generally to the field of data transfer over a computer network, and more particularly to cache validation to reduce the amount of data necessary to be transferred.
The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed information systems and is the foundation of data communication for the World Wide Web. A client computer submits an HTTP request message to a server computer. The server, which stores content, or provides resources, such as HTML files, or performs other functions on behalf of the client, returns a response message to the client. A response contains completion status information about the request and may contain any content requested by the client in its message body. The HTTP protocol is designed to permit intermediate network elements, such as proxy servers, to improve or enable communications between clients and servers.
An intermediate server between the requesting client computer and the origin server may cache responses from the origin server and answer subsequent requests for the same content directly from its cache. A cache hierarchy is a collection of caching proxy servers organized in a logical parent/child arrangement so that caches closest to the origin server act as parents to caches closer to the client computer. For example, a request from a client computer to an origin server computer may go through a series of proxy servers arranged in a hierarchical manner. The first proxy server receiving the request searches its cache for the proper content. If the content is not found (termed a “cache miss”), the first proxy server requests the content from the next proxy server in the hierarchical line, which in turn searches its own cache. If the “parent” locates the content (“cache hit”), it returns the content to the “child” without passing the request further. The child, in turn, returns the content to the client computer.
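The parent/child lookup described above can be sketched as follows. This is a minimal illustration rather than part of the disclosed embodiments: the class name `CachingProxy`, its attributes, and the `origin` callable are hypothetical, and freshness checks are deliberately left out.

```python
from typing import Callable, Dict, Optional


class CachingProxy:
    """One node in a parent/child cache hierarchy (hypothetical illustration)."""

    def __init__(self, name: str,
                 parent: Optional["CachingProxy"] = None,
                 origin: Optional[Callable[[str], bytes]] = None) -> None:
        self.name = name
        self.parent = parent    # next cache closer to the origin server
        self.origin = origin    # used only by the cache nearest the origin
        self.cache: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        # Cache hit: answer locally without passing the request further.
        if url in self.cache:
            return self.cache[url]
        # Cache miss: ask the parent, or the origin if there is no parent.
        if self.parent is not None:
            body = self.parent.get(url)
        elif self.origin is not None:
            body = self.origin(url)
        else:
            raise LookupError("no parent cache or origin configured")
        self.cache[url] = body  # keep a copy for later requests
        return body
```

A child created with `CachingProxy("child", parent=parent)` contacts the origin only on the first request for a given URL; later requests are served from the child's or the parent's cache.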
When a cache has a stale entry that it would like to use as a response to a client's request, it first has to check with the origin server (or possibly an intermediate cache with a fresh response) to see if its cached (stale) entry is still usable. This is known as “validating” the cache entry. An entity tag, or ETag, is part of the HTTP protocol. More specifically, ETags are part of HTTP version 1.1 or later, as earlier versions did not support ETags. ETags are one of the mechanisms that HTTP provides for cache validation; they allow a client to make conditional requests. This allows caches to be more efficient and saves bandwidth, as a server does not need to send a full response if the content has not changed.
An ETag is an opaque identifier typically assigned by an origin server to a specific version of a resource found at a uniform resource locator (URL). “Opaque” is used to denote that the ETag is unique to the computer generating the ETag. Another computer generating an ETag on the same version of the same resource would not produce the same ETag. If the resource content at the URL ever changes, a new and different ETag is assigned. Used in this manner, ETags can be quickly compared to determine if two versions of a resource are the same or are different. The use of ETags in the HTTP header is optional.
In typical usage, when a computer requests a resource, the server assigns an ETag to the resource and returns the resource along with the corresponding ETag value, which is placed in an HTTP “ETag” header field. The computer may then cache the resource along with the corresponding ETag. Later, if the computer requests the same resource, the computer sends the request and the ETag, the ETag being in an “If-None-Match” HTTP header field. On this subsequent request, the server may now compare the client's ETag with the ETag for the current version of the resource. If the ETag values match, meaning that the resource has not changed, then the server may send back a very short response with an HTTP “not modified” status. This status tells the computer that its cached version is current and should be used, saving the bandwidth that would otherwise be used to send the resource.
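The conditional exchange described above can be illustrated with a small server-side sketch. The function name `handle_request` and the dictionary representation of the headers are assumptions made for this example. It compares a single ETag by exact string equality, whereas HTTP also permits lists of ETags, the `*` wildcard, and weak validators, none of which are handled here.

```python
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]


def handle_request(resource_body: bytes, current_etag: str,
                   request_headers: Dict[str, str]) -> Response:
    """Answer a GET, honoring a single-valued If-None-Match header."""
    client_etag = request_headers.get("If-None-Match")
    if client_etag is not None and client_etag == current_etag:
        # The client's cached version is current: short "not modified" reply.
        return 304, {"ETag": current_etag}, b""
    # First request, or the resource has changed: send the full resource.
    return 200, {"ETag": current_etag}, resource_body
```

A first call with empty headers returns status 200 with the body and the ETag. A repeat call carrying that ETag in `If-None-Match` returns status 304 with an empty body.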
Aspects of an embodiment of the present invention disclose a method, computer system, and computer program product for validating a web cache independent of an origin server computer. The method comprises a first computer, connectedly disposed between a second computer and a third computer, receiving a request for a resource stored on the second computer from the third computer, the request having an entity tag (ETag) corresponding to a cached version of the resource stored on the third computer. The method further comprises the first computer forwarding the request for the resource to the second computer and receiving a copy of the resource from the second computer. The method further comprises the first computer generating an ETag for the copy of the resource received from the second computer. The method further comprises the first computer comparing the generated ETag with the ETag corresponding to the cached version of the resource, and in response to determining that the generated ETag and the ETag corresponding to the cached version match, the first computer sending a response to the third computer indicating that the cached version of the resource is the same as the resource on the second computer.
The present invention will now be described in detail with reference to the Figures.
In the illustrated embodiment, distributed data processing system 100 comprises origin server computer 102, containing resource 104, and client computing devices 106, 108, and 110 connected to origin server computer 102 via a cell tower proxy server computer 112 and network 114.
In the depicted embodiment, origin server computer 102 is a web server containing content desired by a client computing device, i.e., resource 104. One example of resource 104 is an HTML file. Origin server computer 102 may be a computer system, a desktop, a notebook, a laptop computer, a tablet computer, a thin client, or any other electronic device or computing system capable of sending and receiving information via a network. In another embodiment, origin server computer 102 may represent a computing system utilizing clustered computers and components to act as a single pool of seamless resources when accessed through a network. This is a common implementation for datacenters and for cloud computing applications.
Respective client computing devices 106, 108, and 110 are, in one embodiment, handheld devices or smart-phones. In other embodiments, client computing devices 106, 108, and 110 may be any computer system capable of communicating with origin server computer 102.
Client computing devices 106, 108, and 110 communicate with origin server computer 102 via network 114. In the depicted embodiment, client computing devices 106, 108, and 110 are smart-phones connected to network 114 via a cell tower, where the cell tower contains the cell tower proxy server 112. In an alternate embodiment, client computing devices 106, 108, and 110 may connect directly to network 114.
Network 114 may include connections, such as wire, wireless communication links, or fiber optic cables, as well as at least one intermediate proxy server, e.g., proxy server computer 116, capable of relaying information between origin server computer 102 and one of client computing devices 106, 108, and 110. In some embodiments, a series of proxy servers may be utilized to make the connection. A person of ordinary skill in the art will understand that cell tower proxy server 112 may also be considered part of network 114.
In an exemplary operation scenario, a client computing device, such as client computing device 106, requests resource 104 from origin server 102. The request is relayed to cell tower proxy server 112. Cell tower proxy server 112 searches for a copy of resource 104 in cache 117, finds a copy of the resource, but cannot verify that the copy is current. The request is forwarded towards origin server 102. Proxy server 116 is a computer in the communication line between cell tower proxy server 112 and origin server 102, and hence, receives the request in turn. Similarly, even if proxy server 116 has a copy of resource 104 in its cache 118, the proxy server is unable to verify that the copy is current at this point, and forwards the request towards origin server 102. If origin server 102 does not support cache validation, then even if the copy of resource 104 in either cache 117 or 118 is current, the origin server returns a new copy of the resource, unnecessarily taking up bandwidth. Ordinarily, the new copy of resource 104 is relayed all the way back to client computing device 106. However, validation program 120 on proxy server computer 116 generates and applies ETags to resources independent of origin server computer 102. When a request is received at proxy server computer 116, validation program 120 keeps track of an ETag associated with the request, and upon subsequently receiving the new copy of the resource from origin server computer 102, generates an ETag on the new copy and compares it to the ETag associated with the request. If the ETags match, validation program 120 sends a “not modified” response back toward cell tower proxy server 112 without having to send the entire new copy of resource 104, thus saving bandwidth between proxy server 116 and at least cell tower proxy server 112.
A person of ordinary skill in the art will recognize that validation program 120 may run on any computer intermediate to an origin server computer and a client computing device. However, in the preferred embodiment, validation program 120 runs on a computer intermediate to an origin server computer and a low capacity network link between the proxy server computer executing the validation program and the client computing device. One such example is the link between a cell tower and a core network. Embodiments of the present invention recognize that the bandwidth of a core network, such as network 114, is typically many gigabits/sec. However, the link from a cell tower to the core network often only has a capacity of a few megabits/sec. With many mobile computing devices connecting through a cellular network, each cell tower may receive many times more bytes per second than can be relayed to the core network, often causing a bottleneck. Validation program 120 prevents redundant data from being transferred over this lower capacity link.
Proxy server computer 116, executing validation program 120, can include internal and external components (depicted in
Validation program 120 executes on a proxy server computer in the line of communication between a client computing device and an origin server. Validation program 120 begins by receiving a request for a resource from the client computing device (step 202). The request may have gone through any number of intermediate computers prior to the proxy server computer that validation program 120 executes on, including, in the preferred embodiment, cell tower proxy server 112.
Validation program 120 determines whether the received request has a corresponding ETag (decision block 204). The ETag may have been added to a header of the request by the client computing device or any other intermediate computer prior to the proxy server computer that contains a copy of the requested resource in its cache, so that the contents of the cache may be validated. For example, the client computing device may have a copy of the desired resource and a corresponding ETag in a cache on the client computing device. Without knowing that the copy of the resource is valid, the client computing device sends a request for the resource with the ETag in a header to conditionally request the resource from the origin server if the server's copy of the resource is different from the client computing device's copy. In another example, the client computing device desires a resource and does not have a copy of the resource in a local cache. The client computing device unconditionally requests the resource. The request is received by an intermediate computer, such as cell tower proxy server 112, which does have a copy of the resource and a corresponding ETag in a local cache. Unable to determine if the cached copy is current, the intermediate computer forwards the request towards the origin server and includes the corresponding ETag in a header, making the request conditional upon the resource being different from the intermediate computer's copy. Now if the ETag is matched at a subsequent computer, a “not modified” response indicates to the intermediate computer that its copy is valid. The intermediate computer returns its now validated copy of the resource to the client computing device.
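The second example above, in which an intermediate computer turns an unconditional client request into a conditional one, might look like the following sketch. The function `forward_with_validation`, the `(etag, body)` cache layout, and the `send_upstream` callable are all hypothetical names chosen for illustration. Header names are matched case-sensitively for brevity.

```python
from typing import Callable, Dict, Tuple

Headers = Dict[str, str]
Response = Tuple[int, Headers, bytes]


def forward_with_validation(url: str, client_headers: Headers,
                            cache: Dict[str, Tuple[str, bytes]],
                            send_upstream: Callable[[str, Headers], Response]
                            ) -> Response:
    """Forward a request upstream, attaching the cached ETag if one exists."""
    cached = cache.get(url)                      # (etag, body) or None
    upstream_headers = dict(client_headers)
    client_was_conditional = "If-None-Match" in client_headers
    if cached is not None and not client_was_conditional:
        # Make the request conditional on the copy held in this cache.
        upstream_headers["If-None-Match"] = cached[0]

    status, headers, body = send_upstream(url, upstream_headers)

    if status == 304 and cached is not None and not client_was_conditional:
        # The cached copy was validated; the client asked unconditionally,
        # so it receives the full, now validated copy.
        return 200, {"ETag": cached[0]}, cached[1]
    if status == 200 and "ETag" in headers:
        cache[url] = (headers["ETag"], body)     # refresh the local copy
    return status, headers, body
```

When the client itself sent the ETag, a “not modified” response is passed through unchanged so that the client uses its own copy.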
If the request received at the proxy server computer executing validation program 120 does not have a corresponding ETag (negative branch of decision 204), then validation program 120 assumes that this is a fresh or first request for the resource and, in response, forwards the request toward the origin server (step 206). There may be any number of intermediate computers between the origin server and the proxy server computer.
As mentioned previously, ETags are unique to the computer that created the ETags. In an alternative embodiment, validation program 120 also determines if a received ETag corresponding to the request was generated by the proxy server computer that validation program 120 resides on. If the ETag was not created by the proxy server computer, validation program 120 may treat the request as not having an ETag, proceeding to step 206 to forward the request (and any existing headers) toward the origin server.
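The description does not state how a proxy server computer would recognize its own ETags. One possible approach, shown here purely as an assumption, is to append a keyed signature to each generated ETag and check it on receipt. The function names, the key `proxy_key`, the `digest.signature` layout, and the use of HMAC-SHA-256 are all choices made for this sketch.

```python
import hashlib
import hmac


def make_signed_etag(body: bytes, proxy_key: bytes) -> str:
    """Build an ETag of the form "<content hash>.<signature>" (assumed layout)."""
    digest = hashlib.sha256(body).hexdigest()
    signature = hmac.new(proxy_key, digest.encode("utf-8"),
                         hashlib.sha256).hexdigest()[:16]
    return '"%s.%s"' % (digest, signature)


def was_generated_here(etag: str, proxy_key: bytes) -> bool:
    """Return True only if the ETag carries a signature made with proxy_key."""
    digest, separator, signature = etag.strip('"').partition(".")
    if not separator:
        return False  # no signature part: treat as a foreign ETag
    expected = hmac.new(proxy_key, digest.encode("utf-8"),
                        hashlib.sha256).hexdigest()[:16]
    # Constant-time comparison on bytes.
    return hmac.compare_digest(signature.encode("utf-8"),
                               expected.encode("utf-8"))
```

Because the ETag is still a deterministic function of the content and the key, two ETags generated by the same proxy for identical content remain equal, so the comparison at decision block 228 is unaffected.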
After forwarding the request to the origin server, validation program 120 subsequently receives the resource from the origin server (step 208). Validation program 120 determines if the received resource has a corresponding ETag (decision block 210). A received resource not having an ETag indicates that the origin server does not validate web caches and that a full response is required from the origin server. Responsive to determining that the received resource does not have a corresponding ETag (negative branch of decision 210), validation program 120 generates an ETag for the resource, generally by using a hash across the content of the resource, and assigns the ETag to the resource (step 212). Validation program 120 sends the received resource, along with the corresponding ETag, towards the client computing device (step 214). If, on the other hand, validation program 120 determines that the received resource did have a corresponding ETag (positive branch of decision 210), validation program 120 skips the generation and assignment of an ETag in step 212, and sends the received resource and the received corresponding ETag towards the client computing device (step 214).
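Decision block 210 and step 212 can be sketched as below. The description says only that the ETag is generally produced by a hash across the content. The choice of SHA-256, the quoted hexadecimal format, and the function names `generate_etag` and `tag_origin_response` are assumptions for illustration.

```python
import hashlib
from typing import Dict


def generate_etag(body: bytes) -> str:
    """Derive an ETag from a hash across the resource content (step 212)."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def tag_origin_response(headers: Dict[str, str], body: bytes) -> Dict[str, str]:
    """Return response headers that are guaranteed to carry an ETag."""
    tagged = dict(headers)
    if "ETag" not in tagged:                   # decision 210, negative branch
        tagged["ETag"] = generate_etag(body)   # step 212
    return tagged                              # sent with the body in step 214
```

Because the ETag depends only on the content, a later copy of the resource with identical bytes yields the same ETag. The comparison at decision block 228 relies on this property.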
Validation program 120 caches a copy of the resource, including the ETag, at the proxy server computer for future use (step 216).
Returning back to decision block 204, if validation program 120 determines that the request received from the client computing device does have a corresponding ETag (positive branch of decision 204), validation program 120 searches the cache in the proxy server computer for the resource (step 218). Validation program 120 determines whether the cache contains a copy of the requested resource (decision block 220), and if the cache does not have a copy of the resource (negative branch of decision 220), may, in one embodiment, assume that the ETag was not assigned at the proxy server computer, and treat the request like a fresh request by proceeding to step 206.
If validation program 120 finds a copy of the requested resource in the cache (positive branch of decision 220), validation program 120 forwards the request to the origin server (step 222), subsequently receives the resource from the origin server (step 224), and generates an ETag based on the resource received from the origin server (step 226). Validation program 120 compares the generated ETag to the ETag corresponding to the request and determines if a match exists (decision block 228).
If validation program 120 determines that the generated ETag matches the ETag corresponding to the request (yes branch of decision 228), validation program 120 sends a response of “not modified” back towards the client computing device (step 230). This allows whichever cache holds the current copy of the resource (an intermediate computer such as cell tower proxy server 112, or the client computing device itself) to return its copy to the client or, if it is the client, to use its own copy. Alternatively, if validation program 120 determines that the generated ETag does not match the ETag corresponding to the request (no branch of decision 228), validation program 120 assigns the generated ETag to the resource received from the origin server (step 232) and sends the resource and the generated ETag towards the client computing device (step 234). Validation program 120 caches a copy of the resource, including the generated ETag, for future use (step 236).
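The flow of steps 202 through 236 can be summarized in one sketch. It is an illustrative reading of the flowchart description, not the implementation of validation program 120. The class name `ValidationProxy`, the `fetch_from_origin` callable, the SHA-256 hash, and the in-memory dictionary cache are all assumptions.

```python
import hashlib
from typing import Callable, Dict, Tuple

Headers = Dict[str, str]
Response = Tuple[int, Headers, bytes]


def generate_etag(body: bytes) -> str:
    # Hash across the resource content; SHA-256 is an assumed choice.
    return '"%s"' % hashlib.sha256(body).hexdigest()


class ValidationProxy:
    def __init__(self,
                 fetch_from_origin: Callable[[str], Tuple[Headers, bytes]]
                 ) -> None:
        self.fetch_from_origin = fetch_from_origin
        self.cache: Dict[str, Tuple[str, bytes]] = {}   # url -> (etag, body)

    def handle(self, url: str, request_headers: Headers) -> Response:
        request_etag = request_headers.get("If-None-Match")   # decision 204
        if request_etag is None or url not in self.cache:     # decision 220
            return self._fresh_request(url)                   # step 206
        _, body = self.fetch_from_origin(url)                 # steps 222-224
        generated = generate_etag(body)                       # step 226
        if generated == request_etag:                         # decision 228
            return 304, {"ETag": generated}, b""              # step 230
        self.cache[url] = (generated, body)                   # step 236
        return 200, {"ETag": generated}, body                 # steps 232-234

    def _fresh_request(self, url: str) -> Response:
        headers, body = self.fetch_from_origin(url)           # steps 206-208
        etag = headers.get("ETag") or generate_etag(body)     # 210, step 212
        self.cache[url] = (etag, body)                        # step 216
        return 200, {"ETag": etag}, body                      # step 214
```

In this sketch the full resource still travels from the origin to the proxy on every request. The saving is on the link between the proxy and the client side, where a match sends only the short “not modified” response.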
Proxy server 116 includes communications fabric 302, which provides communications between processor(s) 304, memory 306, persistent storage 308, communications unit 310, and input/output (I/O) interface(s) 312.
Memory 306 and persistent storage 308 are examples of computer-readable tangible storage devices. A storage device is any piece of hardware that is capable of storing information, such as data, program code in functional form, and/or other suitable information, on a temporary basis and/or permanent basis. Memory 306 may be, for example, one or more random access memories (RAM) 314, cache memory 316, or any other suitable volatile or non-volatile storage device.
Validation program 120 and web cache 118 are stored in persistent storage 308 for execution by one or more of the respective processors 304 via one or more memories of memory 306. In the embodiment illustrated in
The media used by persistent storage 308 may also be removable. For example, a removable hard drive may be used for persistent storage 308. Other examples include an optical or magnetic disk that is inserted into a drive for transfer onto another storage device that is also a part of persistent storage 308, or other removable storage devices such as a thumb drive or smart card.
Communications unit 310, in these examples, provides for communications with other computers and devices. In these examples, communications unit 310 includes one or more network interface cards. Communications unit 310 may provide communications through the use of either or both physical and wireless communications links. In another embodiment still, proxy server 116 may be devoid of communications unit 310. Validation program 120 may be downloaded to persistent storage 308 through communications unit 310.
I/O interface(s) 312 allows for input and output of data with other devices that may be connected to proxy server 116. For example, I/O interface 312 may provide a connection to external devices 318 such as a keyboard, keypad, a touch screen, and/or some other suitable input device. I/O interface(s) may also connect to a display 320.
Display 320 provides a mechanism to display data to a user and may be, for example, a computer monitor.
The aforementioned programs can be written in various programming languages (such as Java or C++) including low-level, high-level, object-oriented or non object-oriented languages. Alternatively, the functions of the aforementioned programs can be implemented in whole or in part by computer circuits and other hardware (not shown).
Based on the foregoing, a method, computer system, and computer program product have been disclosed for validating a web cache independent of an origin server. However, numerous modifications and substitutions can be made without deviating from the scope of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in each block may occur out of the order noted in the figures. Therefore, the present invention has been disclosed by way of example and not limitation.
Number | Date | Country | |
---|---|---|---|
20130198313 A1 | Aug 2013 | US |