The present invention relates generally to data networks, and more particularly to a method and system for achieving load balancing for information distribution.
The traditional Internet web content delivery model consists of a user sending a request to a web server (i.e., website) for particular content stored on the web server. The user request is sent via web browser software (e.g., Microsoft Internet Explorer) operating on a client computer. The content is then delivered from the web server to the client computer, and displayed on the client computer via the web browser. The communication between the client computer and the website may be via the well-known hypertext transfer protocol (HTTP). This request/delivery model is well known in the art for data communication via the Internet.
Many websites, such as news websites, are constantly updating their content. This presents two problems in the context of the traditional web delivery model described above. First, users do not know when content or information has been updated at the website. Therefore, users do not know when to transmit a request to the website for the updated content. This results in either 1) users sending too many unnecessary requests for information when information has not been updated; or 2) users not sending enough requests and therefore not receiving updated information even though such updated information is available. A second problem with the traditional web delivery model is that users have no way of knowing whether website content is of any interest to them until after the entire content is downloaded. This results in wasted network resources (e.g., bandwidth and server processing) while users download large amounts of content that is of no interest to the user.
These deficiencies are addressed by an emerging web delivery model, which is based on meta-data delivery using “Really Simple Syndication” (RSS). In the RSS model, as illustrated in
Based on the subscribed-to channel, the client aggregator 108 periodically sends an update request 110 to the publisher web server 102 and the publisher returns a new version of the RSS file via RSS update 112. This RSS file is sometimes referred to as an RSS feed. The aggregator 108 then displays to the user the short descriptions of the new content items. The user may then review the short descriptions. If the user desires the full content for any of the new items, the user may then request the full content from the web server via a request 114. The publisher responds with the full content 116.
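To make the aggregator's role concrete, the following Python sketch parses an RSS 2.0 document and reports only the items not seen in an earlier poll. It assumes the standard RSS 2.0 `channel`/`item` layout with `title`, `link`, and `description` children, and it treats the `link` value as an item's identity; the function name `new_items` and that identity choice are illustrative assumptions rather than part of the described model.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def new_items(rss_xml: str, seen_links: set[str]) -> list[dict[str, str]]:
    """Return the items of an RSS 2.0 document whose <link> was not seen before.

    Newly reported links are added to `seen_links`, so the next poll of the
    same feed only reports items that appeared in the meantime.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append({
                "title": item.findtext("title", default=""),
                "description": item.findtext("description", default=""),
                "link": link,
            })
    return fresh
```

Only the titles and short descriptions would be shown to the user; the full content behind each `link` is fetched separately if the user asks for it, as with request 114.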
While solving some of the problems of the traditional web content delivery model, the RSS model also presents certain problems. The main problem is that as the RSS model becomes increasingly popular, the publisher web servers face significant server and bandwidth loads. Millions of clients may be interested in a particular RSS information channel, and each of them may periodically request a new version of the RSS file from the publisher website. There is no scalable way for the publisher website to handle the load of delivering RSS files to millions of clients.
The present invention provides an improved method and apparatus for delivering information of interest from content providers to clients via a data network. A network architecture in accordance with the principles of the invention provides for two types of edge servers, referred to herein as forward proxy servers and reverse proxy servers. The forward proxy servers are assigned to serve particular clients with respect to particular information, and the reverse proxy servers are assigned to serve particular forward proxy servers with respect to particular information. In an advantageous embodiment, the forward proxy servers are located at the client edge of the network, and the reverse proxy servers are located at the content provider edge of the network.
In one embodiment, each of the forward proxy servers stores information identifiers associated with information for which the forward proxy server is assigned to serve to at least one client. Each of the reverse proxy servers stores information identifiers and the associated forward proxy servers that the reverse proxy server is assigned to serve with respect to information associated with the information identifiers.
Upon receipt of updated content, each reverse proxy server sends the updated content to those forward proxy servers that it is assigned to serve with respect to the received updated content. The forward proxy servers then provide the updated content to the clients assigned to them, either by responding to requests from those clients or by pushing the information to those clients.
In an advantageous embodiment, load balancing is provided by a controller network node for controlling the assignments of clients to forward proxy servers and the assignments of forward proxy servers to reverse proxy servers. The controller node stores these assignments in a database in order to implement a load balancing policy of the system.
These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.
The network 226 may be any type of data network, for example the Internet. Network 226 is shown as a single network cloud for ease of illustration, but it should be understood that network 226 may be one or more interconnected networks as well. One skilled in the art will recognize that the various nodes of the network 226 communicate with each other via well-known data networking communication links and techniques. These links are not shown in
Also shown in
The network architecture in accordance with an embodiment of the invention includes two types of edge proxy servers, called forward proxy servers (FPS) and reverse proxy servers (RPS). In the example shown in
First, with reference to the clients, each of the client aggregator applications stores a list of information channels to which the particular client has subscribed. For example, Client-1 202 subscribes to channels A and B as shown in subscription table 206. Client-2 208 subscribes to channels B and C as shown in subscription table 212. Client-3 214 subscribes to channel C as shown in subscription table 218. Client-4 220 subscribes to channel B as shown in subscription table 224. The channels subscribed to by a client indicate the information (e.g., RSS files) that a particular client is interested in receiving from various publishers.
In accordance with one aspect of the invention, each of the clients is assigned to one FPS with respect to a particular information channel. For example, in
Each of the FPSes stores a subscription table containing an identification of the information for which at least one client is assigned to that FPS. For example, FPS-1 228 has clients assigned to it with respect to both information channel A and information channel B, and so FPS-1 228 contains a subscription table 230 containing information identifiers identifying information channel A and information channel B. FPS-2 232 has clients assigned to it with respect to information channel C, and so FPS-2 232 contains a subscription table 234 containing an information identifier identifying information channel C. FPS-3 236 has a client assigned to it with respect to information channel B, and so FPS-3 236 contains a subscription table 238 containing an information identifier identifying information channel B. It is noted that an FPS has only one entry per information channel in its subscription table, even if more than one client is assigned to that FPS with respect to that channel. For example, FPS-1 228 has only one entry for information identifier B in subscription table 230, even though both Client-1 202 and Client-2 208 are assigned to FPS-1 with respect to information channel B. It is noted that the term information channel is used herein in order to describe the invention using terminology consistent with the RSS data delivery model. It is to be understood, however, that while one advantageous embodiment is to utilize the principles of the present invention in an RSS embodiment, the principles of the present invention may be applied to any type of data network information delivery system. As such, rather than using the term information channel, the term information identifier may be used to more generally describe an identifier used to identify some type of information of interest to clients. The term information channel will be used herein for consistency with RSS terminology, but it is to be understood that the invention is not limited to RSS embodiments.
Each channel subscription stored in an FPS is assigned to an RPS at the publisher edge of the network, and this assignment is stored in an RPS subscription table. For example, subscription channel A in FPS-1 228 is assigned to RPS-1 240 as represented by line 264. This assignment is further stored in RPS subscription table 242, where RPS-1 240 stores the assignment of information channel A along with the associated FPS-1. Similarly, subscription channel B in FPS-1 228 is assigned to RPS-2 244 as represented by line 266, and this assignment is further stored in RPS subscription table 246, where RPS-2 244 stores the assignment of information channel B along with the associated FPS-1. With respect to subscription table 246 in RPS-2, it is noted that an identification of FPS-3 236 is also stored in subscription table 246 in association with information channel B, because RPS-2 is also assigned to FPS-3 236 with respect to information channel B, as represented by line 270. Finally, as shown in
The RPSes periodically retrieve updated information from the publisher websites and push that information to the FPSes. The RPSes retrieve this updated information for those information channels that are stored in their subscription tables. In an advantageous RSS model embodiment, the updated information retrieved from the publishers consists of RSS files containing meta-data describing additional content available from the publisher. For example, RPS-1 240 has two information channels, A and C, stored in its subscription table 242. RPS-1 240 will periodically send a request for information to publisher 248 to retrieve updated information regarding channel A. Upon receipt of this updated information, RPS-1 240 will push this updated information to FPS-1 228 as indicated in subscription table 242, where FPS-1 is shown associated with information channel A. Similarly, RPS-1 240 will periodically send a request for information to publisher 250 to retrieve updated information regarding channel C. Upon receipt of this updated information, RPS-1 240 will push this updated information to FPS-2 232 as indicated in subscription table 242, where FPS-2 is shown associated with information channel C. RPS-2 244 has one information channel, B, stored in its subscription table 246. RPS-2 244 will periodically send a request for information to publisher 252 to retrieve updated information regarding channel B. Upon receipt of this updated information, RPS-2 244 will push this updated information to both FPS-1 228 and FPS-3 236, as indicated in subscription table 246, where FPS-1 and FPS-3 are shown associated with information channel B.
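The subscription tables in this example follow mechanically from the two levels of assignment. The Python sketch below encodes the example's client-to-FPS and FPS-to-RPS assignments (as given in the description of Client-1 through Client-4, FPS-1 through FPS-3, RPS-1, and RPS-2) and derives each server's table. The dictionary layout and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict

# Assignments from the example: (client, channel) -> FPS.
CLIENT_TO_FPS = {
    ("Client-1", "A"): "FPS-1",
    ("Client-1", "B"): "FPS-1",
    ("Client-2", "B"): "FPS-1",
    ("Client-2", "C"): "FPS-2",
    ("Client-3", "C"): "FPS-2",
    ("Client-4", "B"): "FPS-3",
}

# Assignments from the example: (FPS, channel) -> RPS.
FPS_TO_RPS = {
    ("FPS-1", "A"): "RPS-1",
    ("FPS-1", "B"): "RPS-2",
    ("FPS-2", "C"): "RPS-1",
    ("FPS-3", "B"): "RPS-2",
}


def fps_tables(client_to_fps):
    """FPS -> set of channels; a set keeps one entry per channel per FPS."""
    tables = defaultdict(set)
    for (_client, channel), fps in client_to_fps.items():
        tables[fps].add(channel)
    return dict(tables)


def rps_tables(fps_to_rps):
    """RPS -> {channel -> set of FPSes that the RPS serves for that channel}."""
    tables = defaultdict(lambda: defaultdict(set))
    for (fps, channel), rps in fps_to_rps.items():
        tables[rps][channel].add(fps)
    return {rps: dict(by_channel) for rps, by_channel in tables.items()}
```

Here `fps_tables(CLIENT_TO_FPS)["FPS-1"]` is `{"A", "B"}` even though two clients are assigned to FPS-1 for channel B, and `rps_tables(FPS_TO_RPS)["RPS-2"]` is `{"B": {"FPS-1", "FPS-3"}}`, matching subscription tables 230 and 246.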
The information pushed to the FPSes from the RPSes remains stored at the FPSes. Periodically, the clients request updated information from the FPSes assigned to them with respect to particular information channels. For example, aggregator 204 of Client-1 202 will periodically send a request for information to FPS-1 228 for updated information relating to both information channels A and B, because Client-1 202 is assigned to FPS-1 for both of these information channels. Aggregator 210 of Client-2 208 will periodically send a request for information to FPS-1 228 for updated information relating to information channel B, and a request for information to FPS-2 232 for updated information relating to information channel C, because Client-2 208 is assigned to FPS-1 228 with respect to information channel B and to FPS-2 232 with respect to information channel C. In a similar manner, Client-3 214 will request updated information relating to information channel C from FPS-2 232, and Client-4 220 will request updated information relating to information channel B from FPS-3 236.
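In the example, Client-1's aggregator covers channels A and B in a request to FPS-1, while Client-2's aggregator sends separate requests to FPS-1 and FPS-2. One way to derive those requests is to group a client's channels by assigned FPS. A minimal Python sketch, assuming a single client's assignments are available as a hypothetical channel-to-FPS mapping; the description does not specify the request format, so one request per FPS per polling cycle is an assumption.

```python
from collections import defaultdict


def poll_plan(assignments):
    """Group one client's channels by assigned FPS.

    `assignments` maps channel -> FPS for a single client (a hypothetical
    representation). The result maps FPS -> sorted channel list, i.e. what a
    single polling request to that FPS would cover.
    """
    plan = defaultdict(list)
    for channel, fps in assignments.items():
        plan[fps].append(channel)
    return {fps: sorted(channels) for fps, channels in plan.items()}


# Client-1 and Client-2 from the example.
print(poll_plan({"A": "FPS-1", "B": "FPS-1"}))  # {'FPS-1': ['A', 'B']}
print(poll_plan({"B": "FPS-1", "C": "FPS-2"}))  # {'FPS-1': ['B'], 'FPS-2': ['C']}
```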
In the RSS model embodiment, upon receipt of the updated information (i.e., RSS file) at the clients, a user at the client may then determine if he/she wants to retrieve the full content identified by the meta-data in the RSS file.
The network of
The map server 254 also handles faults in the system. In accordance with one embodiment, each FPS and RPS may execute a software agent, which periodically sends a keep-alive message to the map server 254. This keep-alive message indicates to the map server that the network node that sent the message is functioning properly. If the map server 254 does not receive a keep-alive message from a particular node within some predetermined time period, then the map server 254 determines that the particular node has failed. In the case of node failure, the map server 254 may intelligently re-allocate client-to-FPS and FPS-to-RPS assignments to ensure continued operation of the content delivery system.
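The keep-alive scheme can be sketched in Python as follows. The timeout value, the class and function names, and the least-loaded reassignment rule are assumptions for illustration; the description only requires that a node whose keep-alive is missing for a predetermined period be treated as failed and that its assignments be reallocated.

```python
import time


class KeepAliveMonitor:
    """Records the last keep-alive time of each FPS/RPS node (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s  # the "predetermined time period"
        self.clock = clock
        self.last_seen = {}

    def heartbeat(self, node):
        # Called whenever a keep-alive message arrives from `node`.
        self.last_seen[node] = self.clock()

    def failed_nodes(self):
        # A node is treated as failed if its last keep-alive is older than the timeout.
        now = self.clock()
        return {node for node, t in self.last_seen.items() if now - t > self.timeout_s}


def reassign(assignments, failed, candidates):
    """Move assignments that point at failed nodes onto surviving candidates.

    `assignments` maps a key such as (client, channel) to a node name. Each
    orphaned key goes to the currently least-loaded survivor (an assumed policy).
    """
    alive = [node for node in candidates if node not in failed]
    if not alive:
        raise RuntimeError("no surviving node available for reassignment")
    load = {node: 0 for node in alive}
    for node in assignments.values():
        if node in load:
            load[node] += 1
    result = {}
    for key, node in assignments.items():
        if node in failed:
            node = min(alive, key=lambda n: load[n])
            load[node] += 1
        result[key] = node
    return result
```

The injectable `clock` lets the timeout logic be exercised without waiting in real time.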
The use of FPSes at the client's edge of the network, and the use of RPSes at the publisher's edge of the network, provides for a scalable network architecture for implementing a content delivery system whereby large numbers of client requests for updated information can be accommodated.
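To make the scalability point concrete for the example network, the sketch below counts, for one polling cycle, how many requests for a channel reach its publisher with and without the proxy layer. It assumes every client and every RPS polls exactly once per cycle, which is a simplification; actual polling intervals may differ. The subscription and table data are taken from the example.

```python
# Channel subscriptions of the example clients.
CLIENT_SUBS = {
    "Client-1": {"A", "B"},
    "Client-2": {"B", "C"},
    "Client-3": {"C"},
    "Client-4": {"B"},
}

# RPS subscription tables of the example: RPS -> {channel -> FPSes served}.
RPS_TABLES = {
    "RPS-1": {"A": {"FPS-1"}, "C": {"FPS-2"}},
    "RPS-2": {"B": {"FPS-1", "FPS-3"}},
}


def publisher_requests_per_cycle(channel, client_subs, rps_tables):
    """Return (direct, via_proxies): publisher-side requests for one channel per cycle.

    direct: every subscribed client polls the publisher itself.
    via_proxies: only the RPSes whose table holds the channel poll the publisher.
    """
    direct = sum(1 for channels in client_subs.values() if channel in channels)
    via_proxies = sum(1 for table in rps_tables.values() if channel in table)
    return direct, via_proxies


print(publisher_requests_per_cycle("B", CLIENT_SUBS, RPS_TABLES))  # (3, 1)
```

With four clients the saving is modest, but under these assumptions the direct count grows with the number of subscribed clients, while the proxied count grows only with the number of RPSes holding the channel.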
In order to further describe the operation of a network configured in accordance with the present invention, and to further describe the subscription and content delivery process, an operational scenario will now be described in conjunction with
The aggregator 304, upon receipt of the FPS assignment, sends a subscribe request 314 to the assigned FPS requesting a subscription to the information channel identified by URL1. It is noted that the above-described steps may be transparent to a user of Client 302, and that the user may merely indicate to aggregator 304 that the user wishes to subscribe to a particular information channel. The aggregator 304 automatically generates and sends the getFPS message, receives the FPS assignment, and generates and sends the subscribe request to the assigned FPS.
FPS1 312 then adds URL1 to its subscription table 313. FPS1 312 then sends a getRPS request 316 to the map server 308 requesting that the map server 308 assign an RPS with respect to the information channel identified in the request. In this case, the getRPS request would be “getRPS(URL1)”. Based on current assignments and the load balancing policy, the map server 308 determines an assigned RPS and transmits an identification of the assigned RPS to FPS1 as message 318. In this example, the map server 308 replies with RPS1 320 as the assigned RPS. The map server 308 also adds a record 322 to its subscription database 324 indicating the assignment of [URL1, FPS1, RPS1].
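The getFPS/getRPS exchange can be sketched as a small Python class. The description leaves the load balancing policy open; this sketch assumes a least-loaded rule and, for getRPS, reuses an RPS that already serves the same URL so that the publisher is polled by a single RPS. The method names mirror the getFPS and getRPS messages but are otherwise hypothetical, and the in-memory list merely stands in for subscription database 324.

```python
from collections import Counter


class MapServer:
    """Sketch of a map server handling getFPS and getRPS (names hypothetical).

    Policy (an assumption): pick the least-loaded node, breaking ties by
    registration order; getRPS reuses an RPS already serving the same URL.
    """

    def __init__(self, fpses, rpses):
        self.fpses = list(fpses)
        self.rpses = list(rpses)
        self.client_fps = {}  # (client, url) -> assigned FPS
        self.records = []     # stands in for the subscription database: (url, fps, rps)

    @staticmethod
    def _least_loaded(nodes, load):
        # Counter returns 0 for nodes that have no assignments yet.
        return min(nodes, key=lambda node: load[node])

    def get_fps(self, client, url):
        key = (client, url)
        if key not in self.client_fps:
            load = Counter(self.client_fps.values())
            self.client_fps[key] = self._least_loaded(self.fpses, load)
        return self.client_fps[key]

    def get_rps(self, fps, url):
        for rec_url, rec_fps, rec_rps in self.records:
            if (rec_url, rec_fps) == (url, fps):
                return rec_rps  # existing assignment
        serving = [rps for rec_url, _, rps in self.records if rec_url == url]
        if serving:
            rps = serving[0]  # keep a single RPS polling this publisher URL
        else:
            load = Counter(rps for _, _, rps in self.records)
            rps = self._least_loaded(self.rpses, load)
        self.records.append((url, fps, rps))
        return rps
```

Reusing the serving RPS mirrors the example network, where channel B is handled by RPS-2 for both FPS-1 and FPS-3; a pure least-loaded rule would spread load further at the cost of duplicate publisher polls.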
FPS1 312, upon receipt of message 318, forwards the subscription to RPS1 320 via message 326. Upon receipt of message 326, RPS1 320 adds [URL1, FPS1] to its subscription table 328, indicating that RPS1 320 is assigned to serve FPS1 312 with respect to the information channel identified by URL1. RPS1 320 will periodically perform a conditional get command with respect to the content identified by URL1. A conditional get is part of the well-known hypertext transfer protocol (HTTP). The request “conditionalGet(URL)” is a request for the recipient to return the information identified by the URL only if the content has changed since a time specified in the request (in HTTP, typically via the If-Modified-Since header); otherwise the recipient indicates that the content has not been modified. Thus, RPS1 320 periodically sends the conditional get request conditionalGet(URL1) 330 to publisher web server 332.
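In HTTP, a conditional GET carries a validator header such as If-Modified-Since, and the server answers 304 Not Modified when nothing has changed. A minimal Python sketch of the RPS-side request using only the standard library follows; the function name `conditional_get` and the injectable `opener` parameter are assumptions made so the logic can be exercised offline.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_get(url, since_epoch, opener=urllib.request.urlopen, timeout=10):
    """Fetch `url` only if it changed after `since_epoch` (seconds since the epoch).

    Returns (body, last_modified) on HTTP 200, or (None, None) on 304 Not Modified.
    `opener` is injectable so the function can be exercised without a network.
    """
    req = urllib.request.Request(
        url, headers={"If-Modified-Since": formatdate(since_epoch, usegmt=True)}
    )
    try:
        with opener(req, timeout=timeout) as resp:
            return resp.read(), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        # urlopen reports 304 as an HTTPError; treat it as "no new content".
        if err.code == 304:
            return None, None
        raise
```

In practice an RPS would derive the next `since_epoch` from the Last-Modified value it last received (for example with `email.utils.parsedate_to_datetime(value).timestamp()`), so that the comparison uses the publisher's own clock.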
When the conditional get request parameters are satisfied (i.e., new content is available), the publisher web server 332 sends the new content associated with URL1 to RPS1 320 as represented by 334. In the RSS embodiment, the new content would be an updated RSS file containing meta-data describing further content available from publisher 332. Upon receipt of the new content 334, RPS1 320 recognizes that the content is identified by URL1, and performs a look-up in its subscription table 328 to determine to which FPSes it is assigned with respect to URL1. As shown in subscription table 328, RPS1 320 is assigned to FPS1 with respect to URL1. RPS1 320 then pushes the new information content to FPS1 312 via pushContent(URL1) message 336.
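The RPS side of this exchange (recording [URL1, FPS1] on subscription, then looking up and pushing on new content) can be sketched as a small Python class. The `send` callable stands in for the TCP/IP push to an FPS, and the tuple message format is a hypothetical placeholder for pushContent(URL1).

```python
class ReverseProxy:
    """Sketch of an RPS subscription table and push fan-out (names hypothetical)."""

    def __init__(self, name, send):
        self.name = name
        self.table = {}   # URL -> set of FPS names this RPS serves for that URL
        self.send = send  # callable(fps, message) standing in for a TCP push

    def subscribe(self, url, fps):
        # Handles a subscription forwarded by an FPS: add [URL, FPS] to the table.
        self.table.setdefault(url, set()).add(fps)

    def on_new_content(self, url, content):
        # Look up the FPSes assigned for this URL and push the content to each.
        targets = sorted(self.table.get(url, ()))
        for fps in targets:
            self.send(fps, ("pushContent", url, content))
        return targets
```

Because the table keeps a set per URL, a repeated subscription from the same FPS does not cause duplicate pushes.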
At the client side, the aggregator 304 of Client 302 periodically polls FPS1 312 via a conditionalGet(URL1) command 338 to determine if updated content is available. If new content is available, the aggregator 304 receives the new content from FPS1 312 via message 340. As an alternative, FPS1 312 could push the new content to Client 302 upon receipt from RPS1 320. In such an alternate embodiment, FPS1 312 would also store an identification of Client 302 in subscription table 313 associated with URL1.
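Both client-side options can be sketched as one Python class. Two simplifications are assumptions, not part of the description: the FPS tags each pushed version with a per-URL counter instead of an HTTP date, and the client's conditional get passes the last version it holds. The `notify` callable is a hypothetical stand-in for whatever push mechanism the alternate embodiment would use.

```python
class ForwardProxy:
    """Sketch of an FPS caching pushed content for clients (names hypothetical).

    Poll mode: clients ask for content newer than the version they already hold.
    Push mode (the alternative embodiment): the FPS also records client IDs per
    URL and forwards each new version to them as soon as it arrives.
    """

    def __init__(self, notify=None):
        self.latest = {}      # URL -> (version, content)
        self.clients = {}     # URL -> set of client IDs (used only in push mode)
        self.notify = notify  # callable(client, url, content), or None for poll mode

    def subscribe(self, url, client):
        self.clients.setdefault(url, set()).add(client)

    def push_content(self, url, content):
        # Called when an RPS pushes new content; bump the per-URL version counter.
        version = self.latest.get(url, (0, None))[0] + 1
        self.latest[url] = (version, content)
        if self.notify is not None:
            for client in sorted(self.clients.get(url, ())):
                self.notify(client, url, content)
        return version

    def conditional_get(self, url, have_version):
        # Return (version, content) only if newer than what the client holds.
        version, content = self.latest.get(url, (0, None))
        return (version, content) if version > have_version else None
```

Polling keeps the FPS stateless with respect to individual clients, while push mode trades that for lower latency and the extra per-client entries in the subscription table.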
The various elements shown
Of course, as would be recognized by one skilled in the art, the configuration of hardware and software of an appropriate device will vary depending upon which of the network components is being implemented. In one embodiment, the FPSes communicate with the clients using web services, and communicate with the RPSes and map server using TCP/IP sockets. Each FPS runs a server, which listens for subscribe messages from clients and content messages from RPSes. Similarly, the RPSes communicate with the FPSes and map server using TCP/IP sockets. Each RPS also runs a server, which waits for subscribe messages from the FPSes. The map server also communicates with the clients using web services and communicates with the FPSes and RPSes using TCP/IP sockets.
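The description specifies TCP/IP sockets between the FPSes, RPSes, and map server but not a wire format. One simple possibility, shown here purely as an assumption, is newline-delimited JSON over a stream socket; the message type names reuse the subscribe and pushContent messages described earlier, while the field names and function names are hypothetical.

```python
import json
import socket


def send_message(sock: socket.socket, kind: str, **fields) -> None:
    """Send one message as a newline-terminated JSON object (hypothetical format)."""
    payload = json.dumps({"type": kind, **fields}) + "\n"
    sock.sendall(payload.encode("utf-8"))


def read_messages(sock: socket.socket):
    """Yield decoded messages from `sock` until the peer closes the connection."""
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        buffer += chunk
        # A single recv may hold several messages or only part of one.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line:
                yield json.loads(line.decode("utf-8"))
```

Explicit framing matters because TCP delivers a byte stream, not discrete messages; `json.dumps` escapes embedded newlines, so the newline delimiter cannot appear inside a message.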
The foregoing Detailed Description is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention. For example, while the present invention has been described in large part in the context of an RSS data delivery model, the invention is not so limited. The invention is applicable to any type of content delivery in a network.