The present invention relates generally to data networking, and more particularly to content based data packet routing.
Conventional routing of data packets in an internet protocol (IP) network is well known. In typical IP routing, a data packet contains a destination address which is the IP address of the ultimate intended destination of the data packet. When a data packet arrives at a network router, the router determines the next router on the path (i.e., the next hop) based on the packet's destination address, and transmits the packet to the next router. One particular type of IP routing uses routing tables to determine a packet's next hop. A routing table contains a list of IP addresses (or more likely IP address ranges) and an associated next hop for each of the IP address ranges. When a data packet is received, the router matches the destination address to an appropriate IP address range in its routing table, and transmits the data packet to the next hop as identified in the routing table. The routing table address ranges are often represented as IP address prefixes, and one technique for matching IP packet destination addresses to these IP address prefixes is called longest prefix matching. Routing data packets to their destination address in general, and longest prefix matching in particular, are both well known in the art of data networking.
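By way of illustration only, longest prefix matching may be sketched as follows. The routing table entries and next hop names are hypothetical and are not taken from any particular network; the sketch simply selects, among all prefixes containing the destination address, the one with the greatest prefix length.

```python
import ipaddress

# Hypothetical routing table: (address prefix, next hop) pairs.
ROUTING_TABLE = [
    (ipaddress.ip_network("10.0.0.0/8"), "router-1"),
    (ipaddress.ip_network("10.1.0.0/16"), "router-2"),
    (ipaddress.ip_network("0.0.0.0/0"), "default-gateway"),
]


def next_hop(destination):
    """Return the next hop of the longest prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    # Collect every prefix that contains the address.
    matches = [(net.prefixlen, hop) for net, hop in ROUTING_TABLE if addr in net]
    if not matches:
        return None
    # The longest (most specific) prefix wins.
    return max(matches, key=lambda m: m[0])[1]
```

In this sketch an address inside 10.1.0.0/16 is also inside 10.0.0.0/8, and the more specific /16 entry determines the next hop.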
Another type of routing is referred to herein as content based routing, wherein a data packet is routed based on its content, rather than a pre-specified particular destination address. This type of routing is useful in publish/subscribe systems in which users may subscribe to certain types of information, while content providers publish the information to the network. This type of system allows users to define the type of information they are interested in by subscribing to particular content. Content providers then publish their content to the network without any particular indication as to which users are to receive the content. By matching user subscriptions with content provider publications, content is disseminated through the network and users receive only the content to which they have subscribed.
Filtering and routing content to appropriate users is a complex task, which, in one known implementation, is performed by application level network routers which are organized into an overlay network. An overlay network is a virtual network fabric that is implemented by application level routers that communicate with each other and end user clients using existing underlying IP network infrastructure. Overlay networks typically use the reliable point-to-point communication protocols (e.g., TCP) of the underlying network in order to implement some additional feature or service. The overlay network service is provided independent of the underlying network. In a content based overlay network, the content based services are provided in the overlay network, while the underlying network is used for standard point-to-point data communication.
In a content based overlay network, the content based services are implemented by content based routers. When a user subscribes to certain content, that subscription is stored in the routing tables of the content based routers. The routing tables also identify next hop content based routers for the various stored subscriptions. As published content arrives at the routers, the content is matched against the routers' stored subscriptions and the content is transmitted to the appropriate next hop content based router(s). The content based router at which the published content first enters the overlay network is referred to as the ingress router.
In order to implement content based routing, some technique for describing content must be used so that subscriptions may be defined and content may be matched against those descriptions. One such technique is the Extensible Markup Language (XML), which is a well known language for describing electronic documents using tags and values associated with the tags. More accurately, XML is actually a metalanguage (a language for describing other languages), which allows for the design of customized markup languages for various different types of documents. XML may be used to store any kind of structured information, and to enclose or encapsulate information in order to pass it between different computing systems which would otherwise be unable to communicate. XML is defined in further detail in Extensible Markup Language (XML) 1.0 (Third Edition), W3C Recommendation 4 Feb. 2004, F. Yergeau, T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, 2004 W3C, which is incorporated herein by reference.
In an XML implementation of a content based overlay network, the routers may be referred to as XML routers. In such an implementation, each of the XML routers stores subscriptions as XML queries. As content arrives at each of the XML routers, the router must compare the XML description (i.e., metadata) of the arriving content to the stored XML queries. This first requires parsing the XML description to determine its different tags and their values, and then matching the tags and values against the stored XML queries (i.e., user subscriptions). Upon a determination that arriving content matches a user subscription, the XML router transmits the content to a next hop XML router in the overlay network based upon a routing table. This process of receiving content, parsing the XML description and matching it against stored XML queries, and forwarding the content to the next hop XML router, is performed at each of the XML routers in the overlay network until the content is eventually delivered to the subscriber by the last hop (i.e., egress) XML router.
A problem with the above described XML implemented content based overlay network is that it does not scale well to a large number of users and a significant amount of traffic. Parsing XML descriptions and the associated matching of content to user XML queries is slow and computationally intensive. As such, as the number of users and content traffic increases, the content based overlay network may become overloaded and suffer significant performance delays.
Therefore, what is needed is an improved technique for content based routing which scales easily and efficiently for a large number of users.
The present invention provides advantages over the prior content based routing systems by utilizing label based routing in combination with content based routing. In one embodiment, upon receipt of a data packet at a router, the router matches the content of the data packet against stored user subscriptions. The router assigns a routing label to the data packet based on the matching, and transmits the data packet to a second network router. Intermediate routers along the packet's path use the assigned label in combination with stored routing tables in order to determine next hop routing, rather than performing additional content matching. Upon receipt at an egress router, the content of the message is matched against user subscriptions for those users serviced by the egress router, and the egress router provides the data packet to those end users whose subscriptions match the content. Since the intermediate routers do not need to perform any content matching in order to route the message, the content based routing in accordance with the present invention is faster and more efficient than prior techniques.
In one embodiment, the data packets include XML data which describes the content of the data packets, and the user subscriptions are defined by XML queries. The matching of the data packet content against the user subscriptions is performed by first parsing the XML data and then matching the XML data against the XML queries.
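The parse-then-match sequence described above may be sketched, under simplifying assumptions, as follows. The XML layout (a flat element with one child per tag), the element names, and the representation of a query as a dictionary of required tag values are all hypothetical and serve only to illustrate the two steps; an actual embodiment may use any suitable XML query language.

```python
import xml.etree.ElementTree as ET


def parse_description(xml_text):
    """Parse a flat XML content description into a {tag: text} dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def matches(description, query):
    """Return True if every (tag, value) pair required by query is present."""
    return all(description.get(tag) == value for tag, value in query.items())


# Hypothetical content description and subscription query.
DOC = (
    "<quote><STOCK>ABC CO.</STOCK>"
    "<EXCHANGE>NYSE</EXCHANGE><PRICE>20</PRICE></quote>"
)
QUERY = {"STOCK": "ABC CO.", "EXCHANGE": "NYSE"}
```

Here `matches(parse_description(DOC), QUERY)` evaluates the query against the parsed description; it is this pair of steps that label based routing avoids repeating at intermediate routers.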
There are various alternatives for utilizing routing labels in accordance with the principles of the invention. A routing label assigned to a message may define a single path from an ingress router to an egress router, possibly including one or more intermediate routers. A routing label may also define a routing path from an ingress router to multiple egress routers, possibly including one or more paths through intermediate routers. In addition, multiple labels may be assigned to a single message, with each of the multiple labels defining either a path or a tree.
Label based routing in combination with content based routing provides improved performance because the time consuming and computationally expensive tasks of XML parsing and query evaluation are not performed in the intermediate routers. Label based routing in combination with content based routing also allows for other benefits. For example, the data packet contents may be compressed at the ingress router and transmitted through the intermediate routers in compressed form. Since the routing is pre-defined by the labels, the content itself is not needed in the intermediate routers, and the content only needs to be decompressed at the egress router so that it may be forwarded to appropriate subscribers.
These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.
Returning to
It is noted that
An exemplary subscription table is illustrated in
STOCK=ABC CO.
EXCHANGE=NYSE
PRICE>50
Similar subscriptions are shown in
As described above, all published content is routed through ingress router A 102 for initial processing. Thus, when a publisher, such as pub-1 122, wishes to publish some content, the content will be inserted into the network by initially sending it to router A 102. Again, the link between publisher pub-1 computer 122 and router A 102 is meant to represent a logical connection and not necessarily a physical connection. Thus, content published from pub-1 122 may be routed through additional network nodes prior to arriving at router A 102.
In exemplary operation, suppose that pub-1 122 publishes content having the following content description:
STOCK=ABC CO.
EXCHANGE=NYSE
PRICE=20
Again, in an XML embodiment, the published content would be described by standard XML tags and attributes, but for ease of description published content is described herein using a more general notation as shown. Upon receipt of the content by router A 102, the content must be parsed and matched against the subscriptions in subscription table 302. While XML parsing and matching techniques are well known in the art, such processing is time consuming and computation intensive. In prior art techniques, after router A 102 parses and matches content to subscriptions, router A 102 transmits the content to one or more additional routers in the overlay network, at which point each of those additional routers performs the same parsing and matching, and such parsing and matching occurs at each content based router until the content is delivered to the appropriate subscribers. Such prior art processing is very time consuming and therefore such prior art content based routers are not able to scale well to large numbers of subscribers and significant traffic load. In accordance with the present invention, and as will be described in further detail herein, such parsing and matching only takes place at certain routers (e.g., ingress and egress) thereby significantly improving the efficiency of content based routing.
In accordance with an embodiment of the present invention, router A 102 parses and matches the incoming content to the subscriptions in its subscription table. Continuing with the example, the content received from pub-1 122 will match only the subscription in record 310 of subscription table 302. As indicated in field 306 of record 310, this content only needs to be forwarded to subscriber sub-2 118. Router A 102 also stores a list of content based egress routers associated with each of the subscribers. A content based egress router refers to the last content based router in the overlay network to which the content is to be transmitted prior to delivery to a subscriber. For subscriber sub-2 118, the content based egress router is router F 112.
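The ingress processing just described, namely matching content against the subscription table and then looking up the egress router for each matching subscriber, may be sketched as follows. This is a minimal illustration under stated assumptions: the assignment of the PRICE>50 subscription to record 308, and the price thresholds used for records 310 and 312, are hypothetical values chosen only to be consistent with the examples given herein; the data structures themselves are likewise illustrative.

```python
import operator

# Comparison operators permitted in a subscription predicate.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

# Hypothetical subscription table keyed by record number. The thresholds
# for records 310 and 312 are assumed, not taken from the description.
SUBSCRIPTIONS = {
    308: ("sub-1", [("STOCK", "=", "ABC CO."), ("EXCHANGE", "=", "NYSE"), ("PRICE", ">", 50)]),
    310: ("sub-2", [("STOCK", "=", "ABC CO."), ("EXCHANGE", "=", "NYSE"), ("PRICE", ">", 10)]),
    312: ("sub-3", [("STOCK", "=", "ABC CO."), ("EXCHANGE", "=", "NYSE"), ("PRICE", ">", 60)]),
}

# Egress router serving each subscriber, as stored at the ingress router.
EGRESS_ROUTER = {"sub-1": "E", "sub-2": "F", "sub-3": "G"}


def matching_subscribers(content):
    """Return subscribers whose every predicate is satisfied by content."""
    found = []
    for record in sorted(SUBSCRIPTIONS):
        subscriber, predicates = SUBSCRIPTIONS[record]
        if all(
            field in content and OPS[op](content[field], value)
            for field, op, value in predicates
        ):
            found.append(subscriber)
    return found


def egress_routers(content):
    """Return the distinct egress routers that must receive content."""
    return sorted({EGRESS_ROUTER[s] for s in matching_subscribers(content)})
```

With these assumed thresholds, content priced at 20 matches only the subscription of sub-2, so the only egress router identified is router F.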
Once the egress router is identified, router A 102 creates a message containing the content, along with a label associated with a predetermined path from router A 102 to router F 112. Thus, the present invention utilizes label based routing within the overlay network in order to improve performance and remove the need for XML parsing and matching at each content based router on the path from the ingress router to the egress router. Label based routing is known in the context of standard IP routing, for example MultiProtocol Label Switching (MPLS), as described in E. Rosen, A. Viswanathan, R. Callon, Multiprotocol Label Switching Architecture, Internet Engineering Task Force (IETF), Request for Comments (RFC) 3031, January 2001, which is incorporated herein by reference. In MPLS, a short fixed-length label is generated that acts as a shorthand representation of an IP packet's header. Subsequent routing decisions (made by label switched routers) are made based on the MPLS label instead of on the original IP address.
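Forwarding by label at each router may be sketched as a simple table lookup, as follows. The intermediate router names B and C and the hop sequences are hypothetical; only the association of label 1 with egress router E and label 2 with egress router F follows the examples herein. No content parsing or subscription matching occurs in this lookup.

```python
# Hypothetical per-router label tables mapping a label to the next hop.
# Routers B and C and the hop sequences are assumed for illustration.
LABEL_TABLES = {
    "A": {1: "B", 2: "C"},
    "B": {1: "E"},
    "C": {2: "F"},
}


def route(label, start="A"):
    """Follow label hop by hop from start; return the routers visited."""
    path = [start]
    current = start
    # A router with no entry for the label is the end of the path.
    while label in LABEL_TABLES.get(current, {}):
        current = LABEL_TABLES[current][label]
        path.append(current)
    return path
```

In this sketch, a message carrying label 2 travels from router A through the assumed intermediate router C to egress router F.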
Labels are predefined and define various paths between routers. Exemplary labels, and associated paths are illustrated in
The paths shown in
Continuing now with the above example, wherein pub-1 122 publishes content with the following content description:
STOCK=ABC CO.
EXCHANGE=NYSE
PRICE=20
As described above, this content needs to be forwarded to router F 112 so that it may be delivered to subscriber sub-2 118. Upon a determination of egress router F 112 as the destination, ingress router A 102 generates a message containing the published content along with label 2 indicating the path shown in
Upon receipt of the message at egress router F 112, router F 112 needs to parse and match the content against stored subscriptions. Egress routers only need to store subscriptions for those subscribers which are serviced by the egress router. Thus, in the example shown in
As another example, now suppose that pub-1 122 publishes content having the following content description:
STOCK=ABC CO.
EXCHANGE=NYSE
PRICE=55
This content will match the subscriptions in both records 308 and 310 of subscription table 302. As indicated in field 306 of records 308 and 310, this content needs to be forwarded to subscribers sub-1 116 and sub-2 118. As shown in
Upon a determination of egress routers E 110 and F 112 as the destinations, ingress router A 102 generates a message containing the published content along with two labels. The message includes label 1 indicating the path to egress router E 110 as shown in
Upon receipt of the message at egress router E 110, router E 110 will parse and match the content against stored subscriptions as described above and determine that the message should be forwarded to subscriber sub-1 116. Upon receipt of the message at egress router F 112, router F 112 will parse and match the content against stored subscriptions as described above and determine that the message should be forwarded to subscriber sub-2 118.
As described, an implementation of the present invention utilizes multiple labels with a single message in order to route the content of the message to multiple egress routers, and thus multiple subscribers. In an alternate embodiment, rather than using multiple labels associated with a single message, a single label may be used, where that single label defines a routing tree rather than a single path. For example, returning to the above example in which pub-1 122 publishes content having the following content description:
STOCK=ABC CO.
EXCHANGE=NYSE
PRICE=55
As described above, this matches the subscriptions in both records 308 and 310 of subscription table 302 and needs to be forwarded to subscribers sub-1 116 and sub-2 118 via egress routers E 110 and F 112 respectively.
Instead of ingress router A 102 generating a message containing the published content along with two labels, ingress router A 102 generates a message containing the published content along with a single label (label 4) defining a routing tree as illustrated in
As another example, now suppose that pub-1 122 publishes content having the following content description:
STOCK=ABC CO.
EXCHANGE=NYSE
PRICE=65
This content will match the subscriptions in all of records 308, 310 and 312 of subscription table 302. As indicated in field 306 of records 308, 310 and 312, this content needs to be forwarded to subscribers sub-1 116, sub-2 118 and sub-3 120. As shown in
Upon a determination of egress routers E 110, F 112 and G 114 as the destinations, there are two alternative techniques for forwarding the message. In the first technique, ingress router A 102 will generate a message containing the published content along with three labels: label 1, label 2 and label 3. Label 1 indicates the path to egress router E 110 as shown in
The present invention provides advantages over the prior art content based routing schemes. By utilizing label based routing in combination with content based routing, the present invention provides for an improved content based routing system. In accordance with an advantage of the invention, parsing the content and matching the content against subscriptions is only performed at ingress and egress routers. Intermediate routers save time by routing based on assigned labels.
A system implemented in accordance with the principles of the present invention also allows for additional advantages. For example, since the intermediate routers route based on a label, and not on the content, the content may be compressed at the ingress router and decompressed at the egress router, thus reducing the bandwidth required to route the message. Implementing compression in the prior techniques which parsed the message content and performed subscription matching at the intermediate content based routers would be very inefficient. In order to implement compression in the prior techniques, each router along a path must decompress the message, parse it, match the message's content against its stored subscriptions, perform a routing table lookup to identify the interested destinations, and compress the message before sending it to the identified destinations.
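The compression advantage may be sketched as follows. This is an illustration under stated assumptions: the message layout (a dictionary holding uncompressed labels and a compressed payload), the function names, and the choice of the zlib compression library are all hypothetical. The point shown is that the intermediate router reads only the labels and never decompresses the payload.

```python
import zlib


def ingress_wrap(content, labels):
    """Build a message with uncompressed labels and a compressed payload."""
    return {
        "labels": list(labels),
        "payload": zlib.compress(content.encode("utf-8")),
    }


def intermediate_forward(message, label_table):
    """Select next hops from the labels alone; the payload is not touched."""
    return sorted(
        {label_table[label] for label in message["labels"] if label in label_table}
    )


def egress_unwrap(message):
    """Decompress the payload so it can be parsed and matched at egress."""
    return zlib.decompress(message["payload"]).decode("utf-8")
```

For example, a message built with `ingress_wrap(text, [1, 2])` can be forwarded by an intermediate router holding a hypothetical table such as `{1: "B", 2: "C"}`, and `egress_unwrap` recovers the original text at the egress router.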
One skilled in the art will recognize that there are various alternative embodiments of the invention described herein. For example, the multiple labels assigned to a single message may each be associated with a single path, a tree, or any combination of single paths and trees.
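A label that defines a tree rather than a single path may be sketched by allowing each router's label table to list more than one next hop. The branching point at an intermediate router B is hypothetical; only the association of label 4 with a tree reaching egress routers E and F follows the example herein.

```python
# Hypothetical tree tables mapping a label to a list of next hops.
# The branch at intermediate router B is assumed for illustration.
TREE_TABLES = {
    "A": {4: ["B"]},
    "B": {4: ["E", "F"]},
}


def deliver(label, router="A"):
    """Return the egress routers reached by following a tree label."""
    next_hops = TREE_TABLES.get(router, {}).get(label, [])
    if not next_hops:
        # No further hops for this label: this router is a leaf (egress).
        return [router]
    reached = []
    for hop in next_hops:
        # A copy of the message is sent down each branch of the tree.
        reached.extend(deliver(label, hop))
    return reached
```

In this sketch a single message carrying label 4 is duplicated only at the assumed branching router, rather than at the ingress router.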
The foregoing Detailed Description is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention. For example, while the above described embodiments were described in connection with overlay networks, it should be recognized that the principles of the present invention may be implemented in other types of networks as well. For example, the principles of the present invention may be implemented in a conventional network in which the content based routers communicate with each other directly rather than being configured as an overlay network.