Methods and systems for caching data communications over computer networks

Information

  • Patent Grant
  • 8990354
  • Patent Number
    8,990,354
  • Date Filed
    Monday, May 21, 2012
  • Date Issued
    Tuesday, March 24, 2015
Abstract
A computer-implemented method and system for caching multi-session data communications in a computer network.
Description
BACKGROUND

The present application relates generally to the caching of data communications over computer networks such as, e.g., the Internet, a local area network, a wide area network, a wireless network, and others.


Caching of data communications over computer networks is a well-known network optimization technique, affording improvement of application performance and optimal utilization of network resources through storing and delivering popular content close to end users.


Content caching solutions have traditionally focused on caching of client-server communications, e.g., Web browsing or streaming sessions, where the cache intermediates delivery of content objects (e.g., text files and images in case of Web browsing) from server to client.


The content applications supported by the caching solutions are designed to support caching; they do not utilize end-to-end encryption of the data session, and they have optional client-side explicit support for caching and utilize well-known data ports (tcp/80 for HTTP, tcp/1935 for RTMP, etc.).


The data sessions established by Web browsing and streaming applications are atomic. Each such session incorporates all information needed for the cache to identify a content query, content object (or portion of it) requested, and address of the content source where the object may be maintained.


The traditional caching solutions accordingly implement the following caching methodology:

    • (a1) receiving the data session from the client or (a2) identifying and intercepting the data session between the client and content source, using well-known TCP or UDP port or ports or through Layer7 analysis of the data protocol, using a redirecting network element or otherwise;
    • (b) parsing the data protocol used by the client to identify a data query within the session;
    • (c) identifying a unique data object (or portion of it) requested by the client; and
    • (d1) matching the data query with a data response stored in cache and sending the response to the client or (d2) propagating the data query to the server (content source), receiving the response from the server, optionally storing the response in the cache, and sending the response to the requesting client.
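Steps (a) through (d) above can be sketched as a lookup keyed on the data query itself. This is a minimal illustration under stated assumptions, not an implementation taken from the disclosure: the class `TraditionalCache`, its `handle` method, and the `fake_origin` callback standing in for the content source are all hypothetical names.

```python
from typing import Callable, Dict


class TraditionalCache:
    """Illustrative single-session cache keyed on the full data query (e.g., a URL)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._store: Dict[str, bytes] = {}
        self._fetch = fetch_from_origin  # hypothetical stand-in for the content source
        self.hits = 0
        self.misses = 0

    def handle(self, query: str) -> bytes:
        # Steps (b)-(c): in an atomic session the query alone identifies the object.
        if query in self._store:
            self.hits += 1  # step (d1): answer from cache
            return self._store[query]
        self.misses += 1  # step (d2): propagate to the source, store, relay
        response = self._fetch(query)
        self._store[query] = response
        return response


origin_calls = []


def fake_origin(url: str) -> bytes:
    """Hypothetical content source that records which queries reach it."""
    origin_calls.append(url)
    return b"body of " + url.encode()


cache = TraditionalCache(fake_origin)
first = cache.handle("http://example.com/a.png")
second = cache.handle("http://example.com/a.png")
```

The sketch works only because the key (the URL) is stable across sessions; the applications discussed next break that assumption.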


In recent years, Internet applications have evolved in functionality and complexity, using dynamic content object identifiers (e.g., HTTP URLs) that survive in the scope of one download session only, supporting transfer of the same content object over several concurrent sessions, from multiple content sources, involving multiple dynamic ports, involving end-to-end encryption of the data sessions. These new applications include multiple types of peer-to-peer (P2P) applications for file sharing and streaming, adaptive bitrate protocols for delivery of video over HTTP, HTTP download accelerators, and software update services such as Microsoft Windows Update.


The peer-to-peer applications typically implement a mechanism of “peer discovery” where the client application accesses the P2P network, queries the network to discover content sources that may offer the content object, and subsequently establishes data sessions with these content sources, with some of these sessions using end-to-end encryption.


It is a common practice for content sources in P2P networks to use dynamic rather than static “well-known” ports.


P2P applications can employ encryption of the session in such a way as to avoid detection by network elements, using Layer7 criteria for session identification.


As a result, traditional caching solutions cannot identify and intercept these data sessions, nor are they able to parse the data protocol to identify the data query, due to the encryption.


Non-P2P applications (e.g., download accelerators, adaptive bitrate video clients, software update services, and others) commonly establish multiple sessions to arrange retrieval of the same content object, where each separate session does not offer all the information needed for the cache to identify the requested data object and/or match a data request with a data response.


The features exhibited by these new applications obviate traditional caching methodology. It would be desirable to provide alternative approaches to content caching to support such new applications.


BRIEF SUMMARY OF THE DISCLOSURE

In accordance with one or more embodiments, a computer-implemented method of caching multi-session data communications in a computer network is provided, including the steps of: (a) receiving, intercepting, or monitoring one or more data sessions between a client executing a multi-session application for retrieving a desired content object and one or more metadata services, said client communicating with the one or more metadata services to discover metadata for the content object; (b) analyzing queries and responses exchanged between the client and the one or more metadata services to discover metadata for the content object; (c) receiving or intercepting subsequent data sessions between the client and content sources; (d) identifying a data protocol used by the client and identifying data queries within the data sessions; (e) identifying the content object or portions thereof requested by the client in the data queries; and (f) determining if the content object or portions thereof are stored in cache and, if so, sending the content object or portions thereof stored in cache to the client, and, if not, sending the data queries to the content sources, storing data responses from the content sources, and sending the data responses to the client.


In accordance with one or more embodiments, a computer-implemented caching service is provided for caching multi-session data communications in a computer network. The caching service is configured to: (a) receive, intercept, or monitor one or more data sessions between a client executing a multi-session application for retrieving a desired content object and one or more metadata services, said client communicating with the one or more metadata services to discover metadata for the content object; (b) analyze queries and responses exchanged between the client and the one or more metadata services to discover metadata for the content object; (c) receive or intercept subsequent data sessions between the client and content sources; (d) identify a data protocol used by the client and identify data queries within the data sessions; (e) identify the content object or portions thereof requested by the client in the data queries; and (f) determine if the content object or portions thereof are stored in cache and, if so, send the content object or portions thereof stored in cache to the client, and, if not, send the data queries to the content sources, store data responses from the content sources, and send the data responses to the client.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a simplified diagram illustrating deployment of a caching service in accordance with one or more embodiments.



FIG. 2 is a simplified diagram illustrating deployment of a caching service in accordance with one or more alternate embodiments.





DETAILED DESCRIPTION

In accordance with various embodiments, a service is provided for caching of applications that utilize multiple sessions for retrieval of the same content object (e.g., file or stream).


The multi-session applications supported by the caching service can include:

    • (a) applications that utilize one or more sessions to discover information about a content object (hereinafter “content object meta-data”), that identifies the content sources that the application contacts to retrieve the content object, data protocols used to do so, and data queries used to retrieve the object.
    • (b) applications that utilize multiple sessions to retrieve the content object, passing information necessary for object identification only in some of the sessions.


      (a) Multi-Session Applications Utilizing Content Object Meta-Data for Content Object Retrieval



FIG. 1 illustrates an exemplary network architecture illustrating use of a caching service in accordance with one or more embodiments. Client A1 establishes multiple sessions to one or more meta-data services M on a network, sends data queries to retrieve content object meta-data for content object Z1, and receives one or more responses from the meta-data services M.


The content object meta-data includes at least one variable, selected from the following:

    • (i) addresses of content source(s);
    • (ii) protocols supported by an individual content source;
    • (iii) encryption keys, per object or per individual content source; and
    • (iv) content object structure.


The content source address can be identified through an IP address, e.g., using IPv4 address 1.1.1.1 or IPv6 address fe80::200:f8ff:fe21:67cf, or using a domain name, e.g., cachel2.bos.us.cdn.net, that can be resolved to an IP address using the Domain Name System (DNS).


The content source address can either rely on an implicit port number, for applications using well-known protocol ports (e.g., port tcp/80 used by the HTTP protocol), or name the port explicitly.


The content source address can be identified in conjunction with the protocols supported by it, including, but not limited to, using uniform resource locators (URLs), as defined in RFC 1738, which specify the protocol, content source address, port, and remote path to the object.
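As an illustration of how a URL carries the protocol, address, port, and path together, the following sketch splits a URL with Python's standard `urllib.parse.urlsplit` and falls back to a well-known port when none is named. The function name `parse_content_source` and the `DEFAULT_PORTS` table are assumptions made for this example; the entry for HTTPS is added for illustration and is not taken from the text.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumed table of well-known ports. tcp/80 (HTTP) and tcp/1935 (RTMP) are named
# in the text; 443 for HTTPS is included here only for illustration.
DEFAULT_PORTS = {"http": 80, "https": 443, "rtmp": 1935}


def parse_content_source(url: str) -> Tuple[str, Optional[str], Optional[int], str]:
    """Return (protocol, host, port, path) for a content source URL."""
    parts = urlsplit(url)
    # An explicitly named port wins; otherwise use the protocol's well-known port.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path


implicit = parse_content_source("http://cache.example.net/video/z1.mp4")
explicit = parse_content_source("http://1.1.1.1:8080/z1")
ipv6 = parse_content_source("http://[fe80::200:f8ff:fe21:67cf]:6881/z1")
```

IPv6 literals must be bracketed inside a URL so that the port separator stays unambiguous; `urlsplit` strips the brackets when reporting the host.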


The content object structure information includes information allowing client A1 to form data queries for parts of the object and to verify correctness of data responses received in response to such queries.


The content object structure information includes information pertaining to parts comprising the objects, e.g., “pieces” used by Bittorrent protocol, “parts” used by eDonkey P2P protocol or “playback levels” used in adaptive bitrate streaming protocols, such as Microsoft Silverlight Smooth Streaming, Adobe HTTP Dynamic Streaming, Apple HTTP Live Streaming, among others.


The information about content object parts includes at least one of the following: enumeration of parts of the content object, length of each part, data checksum of each part, and availability of parts at a specific content source, where the content source is identified using content source addresses as described above.
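One possible in-memory shape for this structure information is sketched below. The names `Part`, `ContentObjectMetadata`, `total_length`, and `sources_for` are hypothetical, as are the sample values; actual protocols each define their own layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Part:
    index: int      # position in the enumeration of parts
    length: int     # length of the part in bytes
    checksum: str   # hex digest; the algorithm depends on the protocol


@dataclass
class ContentObjectMetadata:
    object_id: str
    parts: List[Part] = field(default_factory=list)
    # content source address -> indices of the parts that source is known to hold
    availability: Dict[str, Set[int]] = field(default_factory=dict)

    def total_length(self) -> int:
        return sum(p.length for p in self.parts)

    def sources_for(self, index: int) -> List[str]:
        """Addresses of content sources known to hold the given part."""
        return sorted(addr for addr, held in self.availability.items() if index in held)


mz = ContentObjectMetadata(
    object_id="Z1",
    parts=[Part(0, 262144, "aa"), Part(1, 262144, "bb"), Part(2, 1000, "cc")],
    availability={"198.51.100.7:6881": {0, 1}, "203.0.113.5:51413": {1, 2}},
)
```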


The meta-data including all or some of the above information can be stored in a separate file with a pre-defined structure, e.g. a torrent file for Bittorrent or a manifest file used by Microsoft Silverlight smooth streaming.


The meta-data services M offering content object meta-data may include dedicated network servers designed to support delivery of a specific application or one or more content objects (e.g., Bittorrent trackers, ED2K servers, etc.), generic search engines (Google, Microsoft Bing, or others), a network of computer nodes that collectively stores the meta-data (e.g. distributed hash table networks used by P2P applications), or other clients that participate in distributed content source discovery networks (e.g., distributed hash table networks), or other clients that are downloading and/or serving the content object Z1 and maintain meta-data related to it.


Client A1 may use multiple meta-data services M to discover content object meta-data, where one service M1 can provide part of the content object meta-data and optionally point to another service M2 to provide another part.


Thus, for example, client A1 may retrieve a torrent file from a Bittorrent search engine that includes the content object data structure information as well as URL of a Bittorrent tracker that provides the information of currently active content source addresses.


Client A1 may continue to send data queries to meta-data services M during download of content object Z1 or portions of it, for purposes of identification of new content sources and/or content object structure information (for example, in case of object Z1 being a live stream, of which new parts become continuously available).


In accordance with one or more embodiments, the caching service C receives and stores data queries and/or responses exchanged between client A1 and one or more meta-data services M.


In accordance with one or more embodiments, the caching service C intercepts the sessions between A1 and M, either by being in the data path between A1 and M, or through use of one or more dedicated redirection devices (e.g., a load balancer, a router, a DPI device, etc.) that sit in the data path and redirect specific data sessions to the caching service C, and relays the data queries and responses between A1 and M.


In accordance with one or more embodiments, the caching service C modifies at least one of the meta-data responses provided by the meta-data service M, e.g., to indicate the caching service C as a content source or as a meta-data service for the content object Z1.
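A minimal sketch of such a modification follows, assuming the meta-data response has already been decoded into a dictionary with a `peers` list of (address, port) pairs. That layout, the function name `advertise_cache`, and the sample addresses are assumptions for illustration and do not reproduce any particular tracker wire format.

```python
from typing import Any, Dict, Tuple

Peer = Tuple[str, int]


def advertise_cache(response: Dict[str, Any], cache_peer: Peer) -> Dict[str, Any]:
    """Return a copy of a decoded meta-data response with the cache listed first."""
    peers = [tuple(p) for p in response.get("peers", [])]
    if cache_peer in peers:
        peers.remove(cache_peer)  # avoid listing the cache twice
    peers.insert(0, cache_peer)
    modified = dict(response)     # leave the original response untouched
    modified["peers"] = peers
    return modified


original = {"interval": 1800, "peers": [("198.51.100.7", 6881)]}
modified = advertise_cache(original, ("192.0.2.10", 6881))
```

Copying rather than mutating keeps the response received from M available for storage alongside the version relayed to the client.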


In accordance with one or more embodiments, the caching service C receives a copy of communications between the client A1 and the meta-data services M, using an optical tap, mirror port or other device replicating network traffic.


In accordance with one or more embodiments, the caching service C receives the data queries related to content object Z1 from client A1 by virtue of offering at least one of the meta-data services M.


In accordance with one or more embodiments, the caching service C subsequently queries the meta-data services M itself for meta-data related to content object Z1, and receives and stores the responses.


In accordance with one or more embodiments, the caching service C continuously analyzes the queries and responses exchanged between at least one client A1 and the meta-data services M, as well as the responses received by the caching service C directly from the meta-data services M, as described above.


As a result, the caching service C maintains content object meta-data Mz for at least one content object Z1 that client A1 is retrieving.


In accordance with one or more embodiments, the caching service C stores meta-data responses as part of meta-data Mz in conjunction with the most recent time the response was received by C.


The caching service C subsequently, at periodic intervals, discards any responses that were received longer ago than a configured time-out.


In accordance with one or more embodiments, the caching service monitors meta-data requests and responses and discards any stored responses that contradict meta-data responses received later.
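The time-stamping, time-out, and replacement behavior described above could be sketched as follows. `MetadataStore` and its methods are hypothetical names, the string keys and values are made up, and treating "a later response for the same key" as the contradicting response is a simplifying assumption.

```python
import time
from typing import Dict, Optional, Tuple


class MetadataStore:
    """Keeps meta-data responses together with the time each was last received."""

    def __init__(self, timeout_seconds: float) -> None:
        self._timeout = timeout_seconds
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (response, last received)

    def record(self, key: str, response: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        # A later response for the same key replaces the earlier one, which
        # discards any stored response it contradicts.
        self._entries[key] = (response, now)

    def expire(self, now: Optional[float] = None) -> int:
        """Drop responses older than the time-out; return how many were dropped."""
        now = time.time() if now is None else now
        stale = [k for k, (_, seen) in self._entries.items() if now - seen > self._timeout]
        for k in stale:
            del self._entries[k]
        return len(stale)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        return entry[0] if entry else None


store = MetadataStore(timeout_seconds=60)
store.record("tracker:Z1", "peers=A,B", now=0)
store.record("manifest:Z1", "levels=500k", now=0)
store.record("tracker:Z1", "peers=C", now=30)   # supersedes the earlier peer list
removed = store.expire(now=70)
```

Passing `now` explicitly is only a convenience for deterministic examples; in normal use the wall clock is read.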


Following retrieval of meta-data pertaining to the content object Z1, the client A1 and at least one of content sources B1 discovered by the client A1 using the meta-data services M, start establishing data sessions with each other, for purpose of retrieving content object Z1 or part of it by A1.


In accordance with one or more embodiments, the caching service C intercepts the data sessions S1 established between the client A1 and the content sources B1.


In accordance with one or more embodiments, the caching service C intercepts the data sessions either by being in a data path between A1 and B1, or through use of one or more dedicated redirection devices (e.g., load balancer, router, DPI device, etc.) that sit in data path and redirect specific data sessions to the caching service C.


In accordance with one or more embodiments, the caching service C intercepts only such sessions that have been established between A1 and such content sources B1′, that match the meta-data Mz stored for the object Z1 by the caching service C.
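This selective interception amounts to a membership test against the content sources recorded in Mz. The sketch below assumes sources are kept as a set of (address, port) pairs; `should_intercept` and the sample endpoints are hypothetical.

```python
from typing import Set, Tuple

Endpoint = Tuple[str, int]


def should_intercept(destination: Endpoint, known_sources: Set[Endpoint]) -> bool:
    """Intercept a new session only if its destination is a known content source."""
    return destination in known_sources


# Content sources learned from meta-data responses for object Z1 (made-up values).
sources_for_z1 = {("198.51.100.7", 6881), ("203.0.113.5", 51413)}

matched = should_intercept(("198.51.100.7", 6881), sources_for_z1)
unmatched = should_intercept(("198.51.100.7", 80), sources_for_z1)
```

Because the decision uses addresses and dynamic ports learned from meta-data, it does not depend on well-known ports or on inspecting the session payload.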


In accordance with one or more embodiments, the client A1 establishes at least one session S2 with the caching service C, which is identified by the client A1 as one of the content sources for the content object Z1.


In accordance with one or more embodiments, the caching service C utilizes at least one of the following protocols to interpret data queries and data responses in the session S1 between the client A1 and content source B1:

    • (i) data protocols associated with the client A1, as part of meta-data Mz, as described above;
    • (ii) data protocols associated with the session S1, as part of meta-data Mz, as described above; and
    • (iii) data protocols identified by the caching service C when analyzing the data queries and responses received in the session S1, using signature-based or other generic protocol identification technique.
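The order of preference in the list above, a protocol known from the meta-data first and signature matching as a fallback, can be sketched as below. The function `identify_protocol` and the small signature set are assumptions for illustration; the BitTorrent signature reflects the plaintext handshake, which begins with the byte 19 followed by the string "BitTorrent protocol".

```python
from typing import Optional

BITTORRENT_HANDSHAKE = b"\x13BitTorrent protocol"
HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ")


def identify_protocol(first_bytes: bytes, hinted: Optional[str] = None) -> Optional[str]:
    """Pick a protocol for a session from a meta-data hint or from payload signatures."""
    if hinted is not None:
        return hinted  # options (i)/(ii): protocol recorded in the meta-data Mz
    # Option (iii): signature-based identification on the first bytes of the session.
    if first_bytes.startswith(BITTORRENT_HANDSHAKE):
        return "bittorrent"
    if first_bytes.startswith(HTTP_METHODS):
        return "http"
    return None  # unidentified, e.g., because the session is encrypted


bt = identify_protocol(b"\x13BitTorrent protocol" + bytes(8))
web = identify_protocol(b"GET /z1 HTTP/1.1\r\n")
unknown = identify_protocol(b"\x8f\x02\xa1random-looking bytes")
hinted = identify_protocol(b"\x8f\x02", hinted="ed2k")
```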


In accordance with one or more embodiments, the caching service C utilizes a similar approach for session S2.


In accordance with one or more embodiments, when failing to identify the data protocol of session S1 and S2 using the method described above, the caching service C may apply at least one of the encryption keys K, stored by C as part of the meta-data Mz, to establish an encrypted session with either client A1, or content source B1, or both.


The encryption keys K may be associated with the content object Z (e.g., in Bittorrent the hash identifier of object Z is used for encryption of sessions between Bittorrent peers), or specific content sources.


In accordance with one or more embodiments, following establishment of data session with client A1 and identification of the protocol used in this session, the caching service C receives data query Q1 for object Z1 or portion of it from the client A1.


In accordance with one or more embodiments, the caching service C identifies a response matching the query, using the meta-data Mz associated with the content object Z1 as described above.


For example, if the client A1 requests a chunk of 500 Kbps playback level of content object Z1, available over Microsoft Silverlight smooth streaming protocol, that starts at offset 0, without identification of the end offset, the caching service C may use the meta-data Mz describing the object Z1, to identify the end offset.
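The offset example can be illustrated with a small lookup over chunk boundaries taken from the meta-data. The chunk table, its values, and the function `resolve_end_offset` are hypothetical; real manifests describe chunks in protocol-specific ways.

```python
from typing import List, Optional, Tuple

# (start_offset, length) of each chunk of one playback level; values are made up.
Chunk = Tuple[int, int]


def resolve_end_offset(chunks: List[Chunk], start: int) -> Optional[int]:
    """Return the inclusive end offset of the chunk that begins at `start`."""
    for chunk_start, length in chunks:
        if chunk_start == start:
            return chunk_start + length - 1
    return None  # the query does not line up with a known chunk boundary


level_500k = [(0, 1000), (1000, 800), (1800, 1200)]
end_of_first = resolve_end_offset(level_500k, 0)
end_of_second = resolve_end_offset(level_500k, 1000)
no_match = resolve_end_offset(level_500k, 500)
```

With the end offset known, the open-ended query maps to a fixed byte range that can be matched against stored responses.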


In accordance with one or more embodiments, if the matching response R1 to the query Q1 is stored by the caching service C, C delivers the response to the end client A1.


In accordance with one or more embodiments, the caching service C may use the stored meta-data Mz associated with the content object Z to verify the validity of the data response R1, before sending it to the client A1.
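One way to perform such verification is to compare a digest of the stored response against the per-part checksum in the meta-data. The sketch assumes SHA-1 hex digests, as used for Bittorrent pieces; other protocols use other checksums, and `is_valid_response` is a hypothetical helper.

```python
import hashlib


def is_valid_response(payload: bytes, expected_sha1_hex: str) -> bool:
    """Check a cached data response against the checksum recorded in the meta-data."""
    return hashlib.sha1(payload).hexdigest() == expected_sha1_hex.lower()


piece = b"example piece payload"
recorded_checksum = hashlib.sha1(piece).hexdigest()  # stands in for a value from Mz

intact = is_valid_response(piece, recorded_checksum)
corrupted = is_valid_response(piece + b"!", recorded_checksum)
```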


In accordance with one or more embodiments, when a matching response to the query Q1 is not available at the caching service C and the query Q1 has been sent as part of session S1 between the client A1 and the content source B1, the caching service C forwards the query to retrieve such response from the content source B1, receives and optionally stores the response and relays the response to the client A1.


In accordance with one or more embodiments, when a matching response to the query Q1 is not found at the caching service C, the caching service C sends, to at least one of the content sources B identified by C as carrying the content object Z based on the meta-data Mz stored by C, a data query Q1′ whose response allows C to respond to the data query Q1.


Subsequently, the caching service C receives the responses R1′ for these queries, stores them and optionally verifies their validity against the meta-data Mz, and delivers response to the query Q1 to the client A1.


In accordance with one or more embodiments, when a matching response to the query Q1 is not found at the caching service C, C may redirect the client A1 to one of content sources B for the content object Z, as stored by the caching service in the meta-data Mz.


(b) Multi-Session Applications Allowing Identification of Content Object Only in Some Sessions


Client A2 establishes multiple sessions S2 to one or more destinations B2 to retrieve content object Z2, in parallel or in series. The client A2 sends data queries for portions of the content object Z2 in each such session.


Depending on the naming convention for the content object Z and/or its parts, used by client A2 and destination(s) B2, the caching service C, intercepting or receiving sessions S2, may not be able to identify the content object and/or portions of it requested by client A2 in each session, or identify data responses matching those queries.


The client A2 and content source(s) B2 may use dynamic URLs (so-called “hashed URLs”), assigned uniquely for each download of the content object Z2, to identify object Z2. In this case, caching service C cannot rely on the data in the data query alone to identify a matching response, but rather analyzes data responses to identify the requested object and match it to the previously stored data responses.


According to one or more embodiments, when receiving such data queries and/or responses in one or more sessions S2 that allow identification of the content object Z, C stores the content object Z2 identification together with the IP address of client A2, the IP address of content source B2, and the dynamic content identification (e.g. URL) used by client A2, in a list L2.


According to one or more embodiments, when caching service C receives a data query and/or data response that does not allow it to identify the content object Z referenced in the query and/or response, caching service C establishes whether the IP address of client A2, dynamic content identification URL, and IP address of content source B2 are stored in list L2.


According to one or more embodiments, in case of applications that utilize multiple content sources, the caching service C may disregard the IP address of content source B2.


According to one or more embodiments, caching service C removes entries from list L2 based on the timeout since last activity seen by client A2, related to content object Z2.
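The behavior of list L2 described in the preceding paragraphs can be sketched as a keyed table with an inactivity time-out. `SessionList`, its methods, and the sample addresses and URL are hypothetical; expiring entries lazily on lookup, rather than in a periodic sweep, is a simplifying assumption.

```python
import time
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, Optional[str]]  # (client IP, dynamic URL, source IP or None)


class SessionList:
    """Maps (client, dynamic URL, source) to the content object identified earlier."""

    def __init__(self, timeout_seconds: float, ignore_source: bool = False) -> None:
        self._timeout = timeout_seconds
        self._ignore_source = ignore_source  # set for multi-source applications
        self._entries: Dict[Key, Tuple[str, float]] = {}

    def _key(self, client_ip: str, url: str, source_ip: str) -> Key:
        return (client_ip, url, None if self._ignore_source else source_ip)

    def remember(self, client_ip: str, url: str, source_ip: str,
                 object_id: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._entries[self._key(client_ip, url, source_ip)] = (object_id, now)

    def lookup(self, client_ip: str, url: str, source_ip: str,
               now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        key = self._key(client_ip, url, source_ip)
        entry = self._entries.get(key)
        if entry is None:
            return None
        object_id, last_seen = entry
        if now - last_seen > self._timeout:
            del self._entries[key]  # timed out since last activity
            return None
        self._entries[key] = (object_id, now)  # refresh the activity time
        return object_id


l2 = SessionList(timeout_seconds=300, ignore_source=True)
l2.remember("10.0.0.5", "/dl/ab12cd", "198.51.100.7", "Z2", now=0)
hit = l2.lookup("10.0.0.5", "/dl/ab12cd", "203.0.113.5", now=100)
other_client = l2.lookup("10.0.0.9", "/dl/ab12cd", "198.51.100.7", now=100)
expired = l2.lookup("10.0.0.5", "/dl/ab12cd", "198.51.100.7", now=500)
```

With `ignore_source=True`, a session from the same client to a different content source still resolves to Z2, which matches the multi-source case described above.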


The processes of the caching service described above may be implemented in software, hardware, firmware, or any combination thereof. The processes are preferably implemented in one or more computer programs executing on a programmable device including a processor, a storage medium readable by the processor (including, e.g., volatile and non-volatile memory and/or storage elements), and input and output devices. Each computer program can be a set of instructions (program code) in a code module resident in the random access memory of the device. Until required by the device, the set of instructions may be stored in another computer memory (e.g., in a hard disk drive, or in a removable memory such as an optical disk, external hard drive, memory card, or flash drive) or stored on another computer system and downloaded via the Internet or other network.


Having thus described several illustrative embodiments, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to form a part of this disclosure, and are intended to be within the spirit and scope of this disclosure. While some examples presented herein involve specific combinations of functions or structural elements, it should be understood that those functions and elements may be combined in other ways according to the present disclosure to accomplish the same or different objectives. In particular, acts, elements, and features discussed in connection with one embodiment are not intended to be excluded from similar or other roles in other embodiments.


Additionally, elements and components described herein may be further divided into additional components or joined together to form fewer components for performing the same functions. For example, the caching service may comprise one or more physical machines, or virtual machines running on one or more physical machines. In addition, the caching service may comprise a cluster of computers or numerous distributed computers that are connected by the Internet or another network.


Accordingly, the foregoing description and attached drawings are by way of example only, and are not intended to be limiting.

Claims
  • 1. A computer-implemented method of caching multi-session data communications in a computer network, comprising the steps of: (a) receiving, intercepting, or monitoring a plurality of data sessions between a client executing a multi-session application for retrieving a desired content object and content sources providing portions of the content object; (b) identifying a data protocol used by the client and identifying data queries within the data sessions; (c) identifying the content object or portions thereof requested by the client from some but not all the data queries and data responses thereto; and (d) determining, using information from step (c), if the content object or portions thereof are stored in cache and, if so, sending the content object or portions thereof stored in cache to the client.
  • 2. The method of claim 1, wherein step (d) further comprises determining, using information from step (c), if the content object or portions thereof are stored in cache and, if not, sending the data queries to the content sources, storing data responses from the content sources, and sending the data responses to the client.
  • 3. The method of claim 1, wherein the content object or portions thereof are identified by a dynamic content identification URL.
  • 4. The method of claim 1, further comprising storing in a list an identification of the content object and an IP address of the client.
  • 5. The method of claim 4, further comprising storing in the list an IP address of one or more content sources and the dynamic content identification URL used by the client.
  • 6. The method of claim 4, wherein when a data query and/or data response received does not enable identification of the content object referenced in the data query and/or data response, determining whether the IP address of the client is stored in the list.
  • 7. The method of claim 4, wherein when a data query and/or data response received does not enable identification of the content object referenced in the data query and/or data response, determining whether the identification of the content object is stored in the list.
  • 8. The method of claim 4, further comprising removing entries from the list based on a given timeout since last activity seen by the client related to the content object.
  • 9. The method of claim 1, further comprising establishing an encrypted session with the client and/or the content sources.
CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 13/341,619, filed on Dec. 30, 2011, issued as U.S. Pat. No. 8,185,612, entitled METHODS AND SYSTEMS FOR CACHING DATA COMMUNICATIONS OVER COMPUTER NETWORKS, which claims priority from U.S. Provisional Patent Application No. 61/428,538, filed on Dec. 30, 2010, entitled METHODS AND SYSTEMS FOR CACHING DATA COMMUNICATIONS OVER COMPUTER NETWORKS, both of which are hereby incorporated by reference.

Related Publications (1)
Number Date Country
20120233284 A1 Sep 2012 US
Provisional Applications (1)
Number Date Country
61428538 Dec 2010 US
Continuations (1)
Number Date Country
Parent 13341619 Dec 2011 US
Child 13476574 US