 
                 Patent Application
                     20090049243
                    Many companies use hub-and-spoke network arrangements in which satellite offices communicate with a central main office. Communications within a single office may traverse a local area network and be relatively fast while communications between a satellite office and the main office or another satellite office may traverse a relatively slower network. Furthermore, the bandwidth between the main office and the satellite offices may be relatively small and relatively expensive.
Briefly, aspects of the subject matter described herein relate to caching dynamic content. In aspects, caching components on a requesting entity and on a content server cache requested content. When a request for content similar to cached content is received, the requesting entity sends a request for the content and an identifier of similar cached content to the content server. The content server obtains the requested content and determines the differences between the requested content and the cached content. The content server then sends the differences to the requesting entity. The requesting entity uses the differences and its cached content to construct the requested content and provides the requested content.
This Summary is provided to briefly identify some aspects of the subject matter that is further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
The phrase “subject matter described herein” refers to subject matter described in the Detailed Description unless the context clearly indicates otherwise. The term “aspects” should be read as “at least one aspect.” Identifying aspects of the subject matter described in the Detailed Description is not intended to identify key or essential features of the claimed subject matter.
The aspects described above and other aspects of the subject matter described herein are illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:
    
    
    
    
    
  
Aspects of the subject matter described herein are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with aspects of the subject matter described herein include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microcontroller-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Aspects of the subject matter described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. Aspects of the subject matter described herein may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to 
Computer 110 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 110 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVDs) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 110. Communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation, 
The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, 
The drives and their associated computer storage media, discussed above and illustrated in 
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in 
When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160 or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, 
As mentioned previously, companies that have nodes in various satellite offices may have network links that are relatively slow and have relatively small bandwidth. 
The server 205, nodes 220-221, and network access devices 215-216 may include cache components 235-239 which cache content received by the entities as described in more detail below.
Each of the server 205 and the nodes 220-227 may be implemented on or as one or more computers (e.g., the computer 110 as described in conjunction with 
Note that the terms “main office” and “branch office” are used for illustrative purposes. There is no intention to limit aspects of the subject matter described herein to companies with main offices and branch offices. Nor is there any intention to limit aspects of the subject matter described herein to hub-and-spoke type arrangements or to low bandwidth or high latency networks. Indeed, it will be recognized by those skilled in the art that aspects of the subject matter may be employed between any two entities connected by any type of network.
The network 230 (or at least the links from the entities to the network 230) may be a relatively slow and bandwidth-limited network, although aspects of the subject matter described herein may also be applied to high-speed and high-bandwidth networks. When the network 230 is relatively slow and/or bandwidth limited, it may be advantageous to minimize the traffic that crosses the network 230. In particular, if a node is able to obtain a cached copy of content without going over the network 230, greatly improved performance may result.
Dynamic web pages have posed a problem for caching because, by definition, this type of content may change from request to request. In the past, dynamic web pages have either not been cached or, if cached, the caching has not been effective in increasing performance.
In accordance with aspects of the subject matter described herein, caching components on various nodes may cache content including dynamic and static web pages and then use the cached content to improve performance.
In particular, a node that is requesting content (hereinafter sometimes referred to as a “requesting node”) may request dynamic content (e.g., dynamic page A). In response, a caching component (sometimes referred to as a requesting cache) associated with the requesting node may send the request to a node that provides the content (hereinafter sometimes referred to as a “server node”). The server node may generate the content, cache it, and send it back to the requesting cache, which may then cache the content and send it back to the requesting node.
Another or the same requesting node may request additional dynamic content (e.g., dynamic web page A′). The requesting cache may check its cache to determine that it has similar dynamic content already cached (e.g., dynamic web page A). The requesting cache may send a request for the additional dynamic content together with an identifier of the already cached dynamic content to a server node. The server node may then create the dynamic content, calculate the difference between the first dynamic content and the second dynamic content, and send a differences data structure back to the requesting cache. The requesting cache may use the cached first dynamic content together with the differences data structure to construct the second dynamic content. The requesting cache can then send the second dynamic content to the requesting node.
A differences data structure may comprise a file, a markup language document (e.g., XML, HTML, and so forth), a list of differences, and the like.
As a further improvement, the server node may attempt to compress the differences data structure using one or more compression algorithms before sending it to the requesting cache. The compression algorithms may be applied during or after the creation of the differences data structure.
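By way of example, and not limitation, a list-of-differences form of such a data structure may be sketched as follows. The function names `compute_differences` and `apply_differences`, the `("copy", start, end)` and `("data", text)` operation format, and the use of Python's `difflib` are assumptions made for illustration; the subject matter described herein does not prescribe any particular differencing mechanism.

```python
from difflib import SequenceMatcher


def compute_differences(cached: str, requested: str) -> list:
    """Return operations that rebuild `requested` from `cached` (hypothetical format)."""
    ops = []
    matcher = SequenceMatcher(a=cached, b=requested, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Span already held by the requester: refer to it by offsets only.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # New text must be shipped literally.
            ops.append(("data", requested[j1:j2]))
        # A "delete" span needs no operation: it is simply never copied.
    return ops


def apply_differences(cached: str, ops: list) -> str:
    """Construct the requested content from cached content and the operations."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(cached[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)
```

In this sketch, the server node would run `compute_differences` against its own copy of the cached content, and the requesting cache would run `apply_differences` against the copy it holds.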
As another optimization, the server node may determine whether it is faster to send the differences or the entire second dynamic content. If it is faster to send the entire second dynamic content, it may be sent instead of the differences. For example, if a list of differences is actually greater in size than the second dynamic content, the second dynamic content may be sent instead of the differences.
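The size comparison described above, combined with optional compression, may be sketched as follows. Payload size is used here as a stand-in for transfer time, which is an assumption; the function name `build_response`, the dictionary layout of the response, and the choice of `zlib` are likewise hypothetical.

```python
import zlib


def build_response(differences: bytes, full_content: bytes, compress: bool = True) -> dict:
    """Choose the smaller of the two payloads, optionally compressing both first."""
    candidates = [("differences", differences), ("full", full_content)]
    if compress:
        candidates = [(kind, zlib.compress(body)) for kind, body in candidates]
    # min() returns the first smallest item, so a tie goes to the differences.
    kind, payload = min(candidates, key=lambda c: len(c[1]))
    return {"kind": kind, "compressed": compress, "payload": payload}
```

Comparing the payloads after compression, rather than before, is a design choice of this sketch: a long list of differences may compress well enough to remain the better option.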
Whether dynamic content is similar to what has been requested before may be determined via a URL, cookie, or some other mechanism. For example, dynamic content indicated by the URL of http://www.spaces.com/weather?Parameter1=ZipCodeY may be determined similar to dynamic content indicated by the URL of http://www.spaces.com/weather?Parameter1=ZipCodeX. In one embodiment, this similarity may be determined by examining the URL up to where the parameter starts (e.g., the “?”).
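A minimal sketch of the URL-based embodiment, in which two URLs are treated as similar when they match up to the “?”, is shown below. The helper names `similarity_key` and `is_similar` are hypothetical, and a cookie-based or other mechanism would need different logic.

```python
def similarity_key(url: str) -> str:
    """Return the portion of the URL that precedes the first "?"."""
    return url.partition("?")[0]


def is_similar(url_a: str, url_b: str) -> bool:
    """Treat two requests as similar when their URLs agree up to the parameters."""
    return similarity_key(url_a) == similarity_key(url_b)
```

Applied to the two weather URLs above, both reduce to http://www.spaces.com/weather and are therefore treated as similar.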
In one embodiment, an identifier for dynamic content may be calculated by performing a hash on the URL, content, portion thereof, or otherwise. In another embodiment, an identifier may be assigned by the requesting cache, the server node, or another component and may be communicated with requests or responses for content.
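The hash-based embodiment may be sketched as follows. The choice of SHA-256, the function names, and the decision to hash only the URL portion before the “?” are assumptions for illustration; any hash over the URL, the content, or a portion thereof would fit the description above.

```python
import hashlib


def identifier_from_url(url: str) -> str:
    """Hash the portion of the URL before the parameters (one possible choice)."""
    prefix = url.partition("?")[0]
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


def identifier_from_content(content: bytes) -> str:
    """Hash the content itself, so identical content yields an identical identifier."""
    return hashlib.sha256(content).hexdigest()
```

An identifier derived from the URL lets a requesting cache name its cached copy before the new content exists, whereas an identifier derived from content lets the server node confirm that both sides hold the same bytes.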
In one embodiment, the application server 210 may reside on a network that is local to the server 205. In another embodiment, the application server 210 may reside on a network external to the local network upon which the server 205 resides.
The network access devices 215 and 216 may comprise computers (e.g., the computer 110 as described in conjunction with 
The caching components 236 and 237 on the nodes 220 and 221, respectively, may be configured such that they intercept or are otherwise able to examine and act on communications with the network 230. For example, the caching components may reside in or be called from a networking stack that receives requests from the node for content external to the node. In this configuration, the caching component may examine each request and forward, receive responses, construct pages, cache responses, and so forth as described previously.
Although the environment described above includes a server, an application server, two network access devices, and nodes in various configurations, it will be recognized that more, fewer, or a different combination of these and other entities may be employed without departing from the spirit or scope of aspects of the subject matter described herein. Furthermore, the entities and communication networks included in the environment may be configured in a variety of ways as will be understood by those skilled in the art without departing from the spirit or scope of aspects of the subject matter described herein.
  
  
The cache 315 comprises any storage media capable of storing content. The term content should be read to include information, program code, program state, program data, other data, and the like. The cache 315 may comprise a file system, database, volatile memory such as RAM, other storage, some combination of the above, and the like and may be distributed across multiple devices. The cache 315 may be external or internal to the content server 305.
The communications mechanism 320 allows the content server 305 to communicate with nodes that seek to access content available from the content server 305 as well as nodes that provide content to the content server 305. The communications mechanism 320 may be a network interface or adapter 170, modem 172, or any other mechanism for establishing communications as described in conjunction with 
The differences component 325 has logic for calculating the differences between two pieces of content. The differences component 325 may calculate these differences and use a variety of mechanisms for representing these differences as will be understood by those skilled in the art without departing from the spirit or scope of aspects of the subject matter described herein. The differences component 325 may output differences between two pieces of content into a differences data structure.
The compression component 335 may take a first data structure or stream as input and provide a second data structure or stream that is compressed as output. For example, the compression component 335 may receive a differences data structure and may produce a compressed differences data structure. As will be understood by those skilled in the art, there are many compression algorithms that may be used by the compression component 335 to compress data. The selection of which compression algorithm to use may depend on the anticipated domain of data to be compressed. After a data structure is compressed, the compressed data structure may then be sent to a requester via the communications mechanism 320.
In one embodiment, the compression component 335 may be included in the differences component 325 such that the differences component 325 attempts to compress the output as it creates the differences. In another embodiment, the compression component 335 may be separate from the differences component 325.
The cache controller 330 may maintain and access the cache 315. Among its functions, the cache controller 330 may determine how to update, replace, and expire cache entries as well as search the cache for entries indicated by an identifier. The cache controller 330 may include a hashing function that hashes the content, an identifier of the content, or a portion of either the content or the identifier to form an index to the cache 315. In an embodiment, only a single copy of similar dynamic content is cached in the cache 315. Thus, if content is requested including A, A′, and A″, where A, A′, and A″ are similar dynamic content, only one of A, A′, and A″ is cached by the cache controller 330.
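A single-copy cache of this kind may be sketched as follows. The class name `SingleCopyCache`, the SHA-256 index over the URL portion before the “?”, and the policy of retaining the most recently stored variant are all assumptions; the description above does not state which of A, A′, and A″ is kept.

```python
import hashlib


class SingleCopyCache:
    """Keeps at most one entry per group of similar dynamic content (illustrative)."""

    def __init__(self):
        self._entries = {}  # index -> (url, content)

    @staticmethod
    def _index(url: str) -> str:
        # Similar URLs share a prefix, so they hash to the same index.
        return hashlib.sha256(url.partition("?")[0].encode("utf-8")).hexdigest()

    def store(self, url: str, content: str) -> None:
        # Assumed policy: a newer similar page replaces the older one.
        self._entries[self._index(url)] = (url, content)

    def lookup_similar(self, url: str):
        """Return (url, content) of the cached similar page, or None."""
        return self._entries.get(self._index(url))

    def __len__(self) -> int:
        return len(self._entries)
```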
In another embodiment, the cache controller 330 may cache multiple variations of dynamic content. If it is anticipated that requesters (perhaps from different branches) may request identical dynamic content, this caching of variations may eliminate the need to request content from an application server, for example. A variation entry may be expired if it has not been accessed in a configurable amount of time.
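Expiring variations that have not been accessed within a configurable time may be sketched as follows. The class name `VariationCache`, the `max_idle_seconds` parameter, and the injectable `clock` (which defaults to `time.monotonic`) are hypothetical and are included only to make the idle-time rule concrete.

```python
import time


class VariationCache:
    """Caches multiple variations and drops those idle longer than a limit."""

    def __init__(self, max_idle_seconds: float, clock=time.monotonic):
        self._max_idle = max_idle_seconds
        self._clock = clock
        self._entries = {}  # url -> [content, time of last access]

    def put(self, url: str, content: str) -> None:
        self._entries[url] = [content, self._clock()]

    def get(self, url: str):
        """Return the cached variation, or None if absent or expired."""
        self.expire_idle()
        entry = self._entries.get(url)
        if entry is None:
            return None
        entry[1] = self._clock()  # an access resets the idle timer
        return entry[0]

    def expire_idle(self) -> None:
        now = self._clock()
        stale = [url for url, (_, last) in self._entries.items()
                 if now - last > self._max_idle]
        for url in stale:
            del self._entries[url]
```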
In one embodiment, a variation may be replicated to caches in various branch offices, such that a branch cache will not have to make requests from a server at the main office if a requested variation was replicated and is already cached locally. To avoid some impact on the bandwidth usage caused by the replication, the server at the main office may replicate variation versions only to branches which requested the specified variations within a certain preceding timeframe.
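Selecting the branches that should receive a replicated variation may be sketched as follows. The request log of `(branch, variation, timestamp)` tuples, the function name `replication_targets`, and the branch names used in the usage example are assumptions made for illustration.

```python
def replication_targets(request_log, variation, now, window_seconds):
    """Return branches that requested `variation` within the preceding window.

    `request_log` is an iterable of (branch, variation, timestamp) tuples.
    """
    recent = {
        branch
        for branch, requested, timestamp in request_log
        if requested == variation and 0 <= now - timestamp <= window_seconds
    }
    return sorted(recent)


log = [
    ("branch-1", "A'", 100),
    ("branch-2", "A'", 10),    # too old to qualify at now=110 with a 30 s window
    ("branch-3", "A''", 95),   # recent, but a different variation
]
targets = replication_targets(log, "A'", now=110, window_seconds=30)
```

In this example, only `branch-1` qualifies, so the server at the main office would not spend bandwidth replicating the variation to the other two branches.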
The content generator 340 may receive a request for content and may send one or more requests to other entities to obtain portions of the content. After obtaining the portions of the content, the content generator 340 may assemble the content into a data structure formatted (e.g., HTML, XML, etc.) according to the requester's request. In an embodiment, the content generator 340 may generate the content or a portion thereof based on data included on the content server 305 with or without obtaining content from other sources.
  
The cache 415 comprises any storage media capable of storing content. The cache 415 may comprise a file system, database, volatile memory such as RAM, other storage, some combination of the above, and the like and may be distributed across multiple devices. The cache 415 may be external or internal to the requester 405.
The communications mechanism 420 allows the requester 405 to request content and to provide identifiers associated with currently cached content. The communications mechanism 420 may be a network interface or adapter 170, modem 172, or any other mechanism for establishing communications as described in conjunction with 
The differences component 425 has logic for creating content from a differences data structure and content. As described previously, this allows the differences component 425 to construct requested content from existing cached content and a differences data structure.
The cache controller 430 may maintain and access the cache 415. Among its functions, the cache controller 430 may determine how to update, replace, and expire cache entries as well as search the cache for entries indicated by an identifier. The cache controller 430 may include a hashing function that hashes the content, an identifier of the content, or a portion of either the content or the identifier to form an index to the cache 415.
The compression component 435 may take a first data structure or stream as input and provide a second data structure or stream that is uncompressed as output. For example, the compression component 435 may receive a differences data structure and may produce an uncompressed differences data structure. After a data structure is uncompressed, the uncompressed data structure may then be sent to the differences component 425 to create content as described previously.
In one embodiment, the compression component 435 may be included in the differences component 425. In another embodiment, the compression component 435 may be separate from the differences component 425.
The proxy interface 440 may operate to receive requests for content from a node, application, or otherwise and to respond to the requests as appropriate. The proxy interface 440 may be structured to be seamless to the requester of the content. In other words, a requesting entity may not need to be modified to work with the proxy interface 440.
In one embodiment, the proxy interface 440 may comprise a component that resides in a network stack. This may be more advantageous for a requester that does not use a proxy network access device to request content. For example, this may be preferred in the nodes 220 and 221. Because it resides in the network stack, the proxy interface 440 may determine when a request is for content that the caching components 410 can expedite.
In another embodiment, the proxy interface 440 may comprise a component that receives and responds to requests from other nodes. For example, referring to 
  
Turning to 
At block 515, the request for the content is sent to a content server. Before the request is sent to the content server, a check may be performed to determine whether the content is cached locally. If so, the content may simply be returned without requesting it from the content server. In addition, a check may be made as to whether the request is for similar content. If so, the actions may continue at block 550. In the example shown in 
At block 520, the content is received from the content server. For example, referring to 
At block 525, the content is cached. For example, referring to 
At block 530, the content is sent to the requesting entity. For example, referring to 
At block 535, a request for similar content (e.g., A′) is received. Referring to 
At block 545, a determination is made that the content is likely to be similar to cached content. This may be done via examining the URL, cookie, or other data associated with the request and comparing this with cached identifiers. For example, referring to 
At block 555, an identifier of the cached content is obtained. For example, referring to 
Turning to 
At block 610, a response including the requested content is received. For example, referring to 
At block 615, the response is uncompressed if needed. If the server sends the response in compressed format (which may happen some or all of the time), the response may be uncompressed at block 615. For example, referring to 
At block 620, a determination is made as to whether differences were received. If so, the actions continue at block 625; otherwise, the actions continue at block 630. Recall that the server may determine that it is faster to send the actual requested content rather than the differences.
At block 625, the requested content is constructed using the differences and the cached content. For example, referring to 
At block 630, the requested content is provided to the requester. For example, referring to 
At block 635, the actions end.
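The requester-side actions of blocks 615 through 630 may be sketched as follows. The response layout (a dictionary with `kind`, `compressed`, and `payload` fields), the JSON encoding of the differences as `["copy", start, end]` and `["data", text]` operations, and the use of `zlib` are assumptions for illustration and are not taken from the figures.

```python
import json
import zlib


def handle_response(response: dict, cached: str) -> str:
    """Return the requested content given a server response and cached content."""
    payload = response["payload"]

    # Block 615: uncompress only if the server compressed the response.
    if response["compressed"]:
        payload = zlib.decompress(payload)
    text = payload.decode("utf-8")

    # Block 620: were differences received, or the entire content?
    if response["kind"] != "differences":
        return text  # block 630: provide the content as received

    # Block 625: construct the content from the differences and the cached copy.
    parts = []
    for op in json.loads(text):
        if op[0] == "copy":
            parts.append(cached[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)  # block 630: provide the constructed content
```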
Turning to 
At block 715, the server obtains the content from one or more application servers if needed. For example, referring to 
Furthermore, although not shown, it is also possible that the server may have cached content that is suitable for the request and may have received the request in conjunction with an identifier of content cached by the requester. If so, the actions may continue at block 745.
At block 720, the content is cached. For example, referring to 
At block 725, a response is provided that includes the content. For example, referring to 
At block 730, a request for content and an identifier of content cached by a requester is received. For example, referring to 
At block 735, the content is obtained similarly to the actions described in conjunction with block 715.
At block 740, the identifier is used to obtain content cached at the server. For example, referring to 
At block 745, differences between the cached content and the requested content are calculated. For example, referring to 
Turning to 
At block 810, the differences are placed in the response. At block 815, the requested content is placed in the response.
At block 820, the response is compressed. For example, referring to 
At block 825, the response is sent to the requester. For example, referring to 
At block 830, the actions end.
As can be seen from the foregoing detailed description, aspects have been described related to caching dynamic content. While aspects of the subject matter described herein are susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit aspects of the claimed subject matter to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of various aspects of the subject matter described herein.