A portion of the disclosure of this patent document contains material that is subject to copyright protection. The owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyrights whatsoever.
Certain marks referenced herein may be common law or registered trademarks of third parties affiliated or unaffiliated with the applicant or the assignee. Use of these marks is for providing an enabling disclosure by way of example and shall not be construed to limit the scope of this invention to material associated with such marks.
The present invention relates generally to communication networks and, more particularly, to a method and system for managing multicasting in a communication network.
A computer network may be deployed to broadcast (i.e., multicast) data among a plurality of nodes (e.g., computer systems) in the network. Certain nodes may publish the information that is multicast, while others may subscribe to certain topics of interest to receive the published information.
In a topic-based publish/subscribe (pub/sub) system, messages or events are published on abstract event channels associated with various topics of interest. Users interested in receiving messages published on certain topics issue subscribe requests specifying their topics of interest. The pub/sub infrastructure then distributes each newly published event to all the users that have expressed interest in the event's topic. Due to its simple interface and decoupling of publishers and subscribers, pub/sub-based middleware is commonly used to support many-to-many communication in a wide variety of applications, such as enterprise application-integration, stock-market monitoring engines, RSS feeds, on-line gaming, etc.
Typically, a large data center has thousands of nodes in which hundreds of distributed applications are deployed. Each node hosts dozens of applications and each application is deployed over dozens to hundreds of nodes. Each application is allocated a topic for intra-application communication purposes, most nodes being both publishers and subscribers. Furthermore, the deployment of applications on nodes may be dynamic and dependent upon the relative load incurred upon the nodes.
A communication network comprised of multiple nodes can be described by an overlay topology, which depicts the flow of data between the nodes. An application level overlay network topology consists of a collection of nodes built on top of an existing network. For example, many peer-to-peer networks are overlay networks because they run on top of the Internet. Examples of network overlay topologies include centralized networks and distributed networks.
Referring to
To overcome the above problem, a distributed topology may be used to decentralize the processing and communication functions in a network. In a distributed topology, a meshed network may be employed. In a fully meshed network, each node is connected to every other node, so information can be distributed directly from any node to any other node in the network without passing through a centralized node or any other intermediate node. Unfortunately, however, a fully meshed topology is inefficient in a network that includes a large number of nodes, because the overhead of maintaining connections between all pairs of nodes is very expensive.
Referring to
As shown in
Accordingly, in a partially meshed network, a certain amount of delay is associated with the transmission of data between indirectly connected nodes, as a trade-off to not having to maintain the overhead associated with maintaining connections between all nodes. To limit the delay in transmission, the connections between the nodes may be based on the similarity of subscriptions. That is, creation of links in the overlay can be based on similarity in the nodes' subscription interest in one or more topics.
Overlay-per-topic topologies (i.e., constructing a dedicated overlay per topic) scale well with the number of nodes. However, they are not scalable with the number of subscriptions per node. Even a simple logical topology, such as a tree or a ring, requires each node to maintain an average of two connections per subscription. Thus, the number of connections required grows linearly as the number of topics of interest for each node increases. For large-scale settings, such as the large data center described above, or a stock-market broker interested in many dozens or even hundreds of quotes, this approach becomes impractical due to node degree limits.
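By way of example only, the linear growth noted above can be illustrated with a short calculation. The sketch below assumes a ring per topic in which a node keeps two connections for every topic to which it subscribes, and assumes that no connection is shared between rings. The function name ring_per_topic_degree is hypothetical.

```python
def ring_per_topic_degree(num_subscriptions: int, links_per_topic: int = 2) -> int:
    """Node degree when every subscribed topic has its own ring and no
    connection is shared between rings (an illustrative assumption)."""
    if num_subscriptions < 0:
        raise ValueError("num_subscriptions must be non-negative")
    return links_per_topic * num_subscriptions


if __name__ == "__main__":
    # The degree grows linearly with the number of subscriptions.
    for subs in (1, 10, 100):
        print(subs, "subscriptions ->", ring_per_topic_degree(subs), "connections")
```

Under these assumptions, a node subscribed to one hundred topics would maintain two hundred connections, which illustrates why node degree limits make the approach impractical at scale.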
Further, the current practice of the overlay-per-topic approach does not take full advantage of the fact that under typical workloads there is substantial correlation between the interests of different nodes in minimizing/optimizing average node degree. The overlay-per-topic approach only exploits this correlation in a post-processing edge-collapsing stage, rather than building a single overlay that takes into account the correlation between the interests of different nodes in the process of choosing neighbors.
As such, the above-discussed topologies do not scale well to a large data center deployment scenario because they cannot provide the desirable overlay topological characteristics of topic connectivity, scalable average node degree, scalable topic diameter, and churn resistance. Topic connectivity exists when all of the nodes in a network that are interested in a topic t create a connected component, where each interested t-topic node is connected in the same subgraph overlay.
Average node degree is the number of reliable long-lived (e.g., TCP) connections a node maintains as part of the overlay. Topic diameter is the maximum hop-count between a pair of nodes interested in the given topic. And, churn resistance is the ability of the overlay to remain topic-connected and maintain low average node degree, despite nodes leaving, joining and changing their topic of interest.
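The characteristics defined above can be measured on a given overlay. The following is a minimal sketch, assuming the overlay is given as a list of undirected edges and that hop counts for a topic are taken within the subgraph of nodes interested in that topic. All function names are hypothetical.

```python
from collections import deque


def topic_subgraph(edges, interests, topic):
    """Adjacency map restricted to the nodes interested in `topic`."""
    members = {n for n, topics in interests.items() if topic in topics}
    adj = {n: set() for n in members}
    for a, b in edges:
        if a in members and b in members:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def hop_counts(adj, source):
    """Breadth-first hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def is_topic_connected(edges, interests, topic):
    """True when all nodes interested in `topic` form one connected component."""
    adj = topic_subgraph(edges, interests, topic)
    if not adj:
        return True
    start = next(iter(adj))
    return len(hop_counts(adj, start)) == len(adj)


def topic_diameter(edges, interests, topic):
    """Maximum hop count between interested nodes, or None if not connected."""
    adj = topic_subgraph(edges, interests, topic)
    best = 0
    for node in adj:
        dist = hop_counts(adj, node)
        if len(dist) != len(adj):
            return None
        best = max(best, max(dist.values()))
    return best


def average_degree(edges, nodes):
    """Average number of overlay connections per node."""
    return 2 * len(edges) / len(nodes) if nodes else 0.0
```

Churn resistance is not computed here; it could be assessed by re-evaluating these measures after nodes join, leave, or change their interests.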
In order to achieve selective event dissemination, most existing pub/sub systems leverage the properties provided by structured overlay networks and organize peers into global dissemination overlay topologies, such as multicast trees. A smaller number of pub/sub architectures employ a combination of an unstructured overlay and additional ring structures to support content-based pub/sub. Other architectures assume that topics are organized into a hierarchy in the naming space and construct a hierarchy of unstructured overlays that is based on the topic hierarchy.
Although relying on structured elements is instrumental for routing efficiency, maintaining global topologies incurs the cost of reconfiguration in the presence of dynamic changes, thus making these systems less favorable in highly dynamic settings. Other techniques of building an overlay that are typically used to support decentralized topic-based pub/sub communication include a ring-per-topic overlay, an overlay constructed based on a similarity heuristic, and a fully random overlay. However, in a large setting, these typical overlays result in an undesirably high average node degree.
Thus, methods and systems are needed that can overcome the aforementioned shortcomings by more efficiently managing connections between nodes in a network according to the topics of interest associated with each node.
The present disclosure is directed to systems, methods and corresponding products that facilitate multicasting in a communication network.
For purposes of summarizing, certain aspects, advantages, and novel features of the invention have been described herein. It is to be understood that not all such advantages may be achieved in accordance with any one particular embodiment of the invention. Thus, the invention may be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages without achieving all advantages as may be taught or suggested herein.
In accordance with one embodiment, a method for managing multicasting in a communication network is provided. The method comprises determining one or more topics of interest for a first node in a network; selecting a second node in the network that shares at least a first topic of interest with the first node; establishing a connection between the first node and the second node so that the second node covers at least the first topic of interest; and establishing additional connections between the first node and at least a third node in the network that covers at least the first topic of interest, in response to determining that the first node is not covered by a total of K nodes with respect to the first topic of interest.
In accordance with one aspect of the invention, a multicasting system is provided. The system comprises one or more logic units for performing the functions and procedures discussed above. In another embodiment, a computer program product comprising a computer useable medium having a computer readable program is provided. The computer readable program when executed on a computer causes the computer to perform the functions and procedures discussed above to provide a multicasting service.
One or more of the above-disclosed embodiments in addition to certain alternatives are provided in further detail below with reference to the attached figures. The invention is not, however, limited to any particular embodiment disclosed.
Embodiments of the present invention are understood by referring to the figures in the attached drawings, as provided below.
Features, elements, and aspects of the invention that are referenced by the same numerals in different figures represent the same, equivalent, or similar features, elements, or aspects, in accordance with one or more embodiments.
The present disclosure is directed to systems and corresponding methods that facilitate multicasting in a communication network.
In the following, numerous specific details are set forth to provide a thorough description of various embodiments of the invention. Certain embodiments of the invention may be practiced without these specific details or with some variations in detail. In some instances, certain features are described in less detail so as not to obscure other aspects of the invention. The level of detail associated with each of the elements or features should not be construed to qualify the novelty or importance of one feature over the others.
Referring to
Referring to
In accordance with one embodiment, the coverage set for a node includes the set of nodes in the network that are connected to that node and share at least one topic of interest. As such, in the following, when we refer to, for example, a first node being included in the coverage set for a second node, it is meant that the first and second nodes are connected with respect to a topic of interest. Similarly, when we say that a first node covers a second node with respect to a topic of interest, it means that both the first and the second node have a common interest in the same topic.
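By way of example, the definitions above can be expressed as two small helper functions. This is a sketch only; it assumes that connections and interests are held in plain dictionaries, and the names coverage_set and covers are hypothetical.

```python
def coverage_set(node, neighbors, interests):
    """Nodes connected to `node` that share at least one topic of interest with it."""
    own = interests[node]
    return {q for q in neighbors[node] if own & interests[q]}


def covers(first, second, topic, interests):
    """True when both nodes have a common interest in `topic`."""
    return topic in interests[first] and topic in interests[second]
```

A connected node that shares no topic with the selected node is therefore excluded from the coverage set, even though a connection exists.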
A coverage threshold K can be set for each topic of interest according to system requirements and depending on implementation. The coverage threshold K represents a value that indicates the number of times a topic of interest for a selected node is covered by the nodes to which the selected node is connected. For example, in
Referring to
It should be emphasized that the above exemplary network, the topics of interest, number of nodes, and the threshold K value selected in the illustrated embodiment are all by way of example. Therefore, the scope of the invention should not be construed as limited to these exemplary values or relationships. Rather, the number of nodes and their relationship, in addition to the threshold value for each topic of interest can vary depending on system requirements and implementation.
As provided in further detail below, depending on implementation, a selected node's topics of interest may be covered by way of first connecting the node to those nodes in the network that share at least a certain number of topics of interest with the selected node (e.g., N7). For example, the selected node first may try to satisfy the threshold coverage by connecting to those nodes in the network that share at least X topics of interest with the selected node.
In some embodiments, the selected node may first try to connect to those nodes in the network that share with it all the same topics of interest. In the alternative, in one embodiment, the selected node may attempt to connect to those nodes in the network that share with it the greatest number of common topics. Depending on implementation, the above methods may be referred to, by way of example, as the “greedy” method hereafter, represented by Kg (i.e., K greedy).
If the coverage threshold for a selected node is not covered based on the above methods, then depending on implementation the selected node may continue to connect to other nodes that have a smaller number of common topics of interest with the selected node. For example, Kr may represent connecting to a node that has at least one common topic of interest with the selected node. Hereafter, we refer to methods for connecting to a node that has at least one common interest with the selected node as the less greedy method or as the random method by way of example.
In one exemplary embodiment, to obtain K-coverage, Kg may be set to 4 (i.e., each topic to which the selected node is subscribed is covered 4 times), and Kr may be set to 1 (i.e., connecting to nodes that share at least one topic of interest with the selected node). Thus, Kr may be used to achieve K-coverage, if K-coverage is not achieved by the greedy method alone.
Now referring to
In one embodiment, to ensure that a target node in the network is not covered beyond the upper threshold, a node N(i) in the coverage set for the target node is selected such that node N(i) at least covers the topic of interest which is overly covered (S410). It is then determined whether node N(i)'s coverage with respect to at least one of its topics of interest would fall below the coverage threshold if node N(i) were disconnected from the target node (S420). If so, the counter i is incremented (S430); that is, another node N(i+1) connected to the target node is examined for the purpose of disconnection from the target node.
Otherwise, if it is determined that disconnecting node N(i) from the target node does not adversely affect node N(i)'s coverage for its topics of interest, then node N(i) is removed from the target node's coverage set (S440) so that the target node is no longer overly covered with respect to a certain topic of interest. If the target node, after disconnection from node N(i), remains overly covered with respect to the same topic of interest or another topic of interest, the above-noted process continues by removing other nodes from the target node's coverage set, until the needed equilibrium is achieved.
It is noteworthy that to the same extent that a greedy algorithm can be employed for achieving K-coverage when a topic of interest is under-covered, a similar greedy algorithm may be also applied to return to K-coverage when a topic of interest is overly covered.
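The procedure for returning an overly covered node to K-coverage may be sketched as follows. This is an illustrative sketch rather than the claimed implementation: it assumes symmetric connections stored in dictionaries, it examines candidates in sorted order instead of by a counter i, and it adds a check (an assumption not stated above) that the target node itself does not lose coverage. All names are hypothetical.

```python
def coverage_count(node, topic, neighbors, interests, without=None):
    """Number of neighbors of `node` interested in `topic`, optionally ignoring one."""
    return sum(1 for q in neighbors[node] if q != without and topic in interests[q])


def would_lose_coverage(node, removed, neighbors, interests, k):
    """True if dropping `removed` leaves a shared topic of `node` covered fewer than k times."""
    shared = interests[node] & interests[removed]
    return any(
        coverage_count(node, t, neighbors, interests, without=removed) < k
        for t in shared
    )


def prune_over_coverage(target, neighbors, interests, k, upper):
    """Disconnect neighbors of `target` until no topic is covered more than `upper` times,
    skipping any neighbor whose own coverage would suffer."""
    removed = []
    for topic in sorted(interests[target]):
        for cand in sorted(neighbors[target]):
            if coverage_count(target, topic, neighbors, interests) <= upper:
                break  # this topic is no longer overly covered
            if topic not in interests[cand]:
                continue  # candidate does not cover the overly covered topic
            if would_lose_coverage(cand, target, neighbors, interests, k):
                continue  # keep the link and examine the next candidate
            if would_lose_coverage(target, cand, neighbors, interests, k):
                continue  # assumption: never under-cover the target itself
            neighbors[target].discard(cand)
            neighbors[cand].discard(target)
            removed.append(cand)
    return removed
```

If every candidate would lose coverage, the target node simply remains overly covered for that topic, which is consistent with the trade-off described above.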
Certain exemplary embodiments of the invention are disclosed with reference to sample pseudo codes and algorithms that are discussed in detail in the following. It remains important to reemphasize and understand that the following is merely provided for the purpose of example so that the reader can fully appreciate certain exemplary implementations. In no event should the following examples be construed as limiting the scope of the invention.
In an exemplary embodiment, a method for achieving K-coverage can be implemented based on two components: a membership protocol, and an overlay construction and maintenance protocol. Both protocols may be fully distributed. The construction protocol aims to achieve connectivity and low diameter for the entire set of topics, while maintaining as few overlay links as possible.
The interest of a node may be the list of topics to which the node has either subscribed or on which it is going to publish. A selection algorithm may be used to achieve K-coverage for a node by combining the greedy and random methods noted earlier. In both methods, each node may try to cover K times each of the topics in which it is interested. That is, for each topic t in which the node is interested, the node tries to maintain connections to K other nodes that are also interested in topic t.
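The K-coverage condition described above can be checked with a few lines of code. The sketch below assumes that a node holds the interests of its current neighbors in a dictionary; the function names are hypothetical.

```python
def under_covered_topics(self_interest, neighbor_interests, k):
    """Topics of the node that fewer than k of its neighbors are also interested in."""
    return {
        t
        for t in self_interest
        if sum(1 for interest in neighbor_interests.values() if t in interest) < k
    }


def is_k_covered(self_interest, neighbor_interests, k):
    """True when every topic of the node is covered by at least k neighbors."""
    return not under_covered_topics(self_interest, neighbor_interests, k)
```

A node with no topics of interest is trivially K-covered under this definition.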
In some embodiments, the greedy and random methods differ in the way they connect to a neighboring node (i.e., a node that is a candidate for being a neighbor) in that said methods use different coverage parameters, Kg and Kr, respectively. The greedy method preferably selects a neighboring node that minimizes the number of topics which are not yet Kg covered. In contrast, the random coverage randomly selects a node whose addition as a neighbor would reduce the number of topics that are not yet Kr covered.
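The two selection methods may be sketched as follows. Since the code sections referenced in this disclosure are not reproduced here, the sketch is an illustration under stated assumptions: candidates come from a dictionary named view that maps node identifiers to interests, ties in the greedy method are broken in favor of the candidate sharing more under-covered topics, and the random method treats a candidate as useful when it shares at least one topic that is still under-covered. All names are hypothetical.

```python
import random


def coverage_counts(self_interest, neighbor_interests):
    """How many current neighbors are interested in each of the node's topics."""
    return {
        t: sum(1 for interest in neighbor_interests.values() if t in interest)
        for t in self_interest
    }


def select_greedy(self_interest, neighbor_interests, view, kg):
    """Candidate whose addition leaves the fewest topics not yet kg-covered."""
    counts = coverage_counts(self_interest, neighbor_interests)
    missing = {t for t, c in counts.items() if c < kg}
    best, best_key = None, None
    for cand in sorted(view):
        if cand in neighbor_interests:
            continue
        shared = missing & view[cand]
        if not shared:
            continue  # adding this candidate would not help any topic
        still_missing = sum(1 for t in missing if counts[t] + (t in view[cand]) < kg)
        key = (still_missing, -len(shared))  # tie-break is an assumption
        if best_key is None or key < best_key:
            best, best_key = cand, key
    return best


def select_random(self_interest, neighbor_interests, view, kr, rng=random):
    """Random candidate sharing at least one topic that is not yet kr-covered."""
    counts = coverage_counts(self_interest, neighbor_interests)
    missing = {t for t, c in counts.items() if c < kr}
    useful = [
        cand
        for cand in sorted(view)
        if cand not in neighbor_interests and missing & view[cand]
    ]
    return rng.choice(useful) if useful else None
```

Both routines return None once the relevant coverage parameter is satisfied, so the same maintenance loop can call either routine until no candidate is returned.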
According to the theory of k-regular random graphs, for Kr≧3, if each node achieves Kr coverage, then for each topic t, all the nodes interested in topic t form a connected component, with high probability, whose diameter grows logarithmically with the number of subscribers to this topic. Such a coverage heuristic achieves the desired connectivity and low diameter for each topic.
To exploit correlated workloads, which are common in practice (e.g., in pub/sub applications such as RSS and stock-market monitoring engines), the greedy coverage heuristic may be used. In many practical workloads, each link created by the greedy heuristic covers, on average, much more than a single topic, whereas each link created by the random coverage heuristic covers about a single topic. In principle, however, greedy coverage alone may not ensure, with high probability, the desired topic connectivity. Thus, the values chosen for Kg and Kr provide a tradeoff between the average node degree and interest-based connectivity.
In one embodiment a membership scheme is used to allow each node in both the greedy and the random coverage heuristics to maintain an interest view of other nodes in the system. The interest view includes the identities of other nodes along with their interests, and may be partial and randomized. An interest view may be readily implemented by distributed probabilistic membership protocols augmented with the interest information. For example, each node may know the identities and interests of at least five percent of the nodes in order to achieve both low average node degree and topic-based connectivity.
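A partial, randomized interest view may be illustrated as follows. The sketch is a centralized stand-in, offered only for illustration, for a distributed probabilistic membership protocol: it assumes the full membership is available as a dictionary and draws a uniform random sample of it. The function name and the default fraction parameter (taken from the five percent example above) are assumptions.

```python
import math
import random


def sample_interest_view(self_id, membership, fraction=0.05, rng=random):
    """Return the identities and interests of a random fraction of the other nodes."""
    others = sorted(n for n in membership if n != self_id)
    if not others:
        return {}
    size = min(len(others), max(1, math.ceil(fraction * len(others))))
    chosen = rng.sample(others, size)
    return {n: set(membership[n]) for n in chosen}
```

In a deployed system the view would be refreshed over time by the membership protocol, so that each node observes a changing random subset of the other nodes.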
In building and maintaining an overlay of connected nodes, the greedy and random neighbor maintenance processes may execute the same routine with the exception of the neighbor selection routine. In addition, each of these two tasks may independently manipulate its own set of the data structure consisting of the same collection of variables, where possible.
The following exemplary embodiment describes the implementation of one of the neighbor maintenance tasks without an explicit reference to the exact type of neighbors being maintained. K refers to the coverage parameter. Exemplary data structures maintained by each node p are shown in Code Section 1 below. The neighbors set for each current neighbor q of p may include an identifier (id) for node q, in addition to that node's degree, target degree, and current interest, for example. Further, each node may hold its own interest in a variable herein after represented as “self_interest” by way of example.
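By way of example only, the per-node data described above may be represented as follows. The field names other than self_interest are hypothetical, and the sketch is not a reproduction of Code Section 1.

```python
from dataclasses import dataclass, field


@dataclass
class NeighborEntry:
    """What node p records about a current neighbor q."""
    node_id: str
    degree: int
    target_degree: int
    interest: frozenset


@dataclass
class NodeState:
    """State kept by one neighbor maintenance task of node p."""
    node_id: str
    self_interest: set
    k: int  # coverage parameter (Kg or Kr, depending on the task)
    neighbors: dict = field(default_factory=dict)  # node_id -> NeighborEntry

    def degree(self) -> int:
        return len(self.neighbors)

    def add_neighbor(self, entry: NeighborEntry) -> None:
        self.neighbors[entry.node_id] = entry

    def remove_neighbor(self, node_id: str) -> None:
        self.neighbors.pop(node_id, None)
```

Because the greedy and random tasks each manipulate their own copy of this structure, two NodeState instances with different values of k could be kept per node.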
In one embodiment, a failure detection mechanism is used to determine if a neighbor node is alive and capable of communicating with the other nodes in the network. The failure detection mechanism may be based on querying or expecting to receive a signal (i.e., heartbeat) from a neighboring node at particular intervals. If the signal is not received or no response is provided, then it is determined that the node is no longer connected to the network. The failure detection mechanism may also be used to periodically update a node's neighbors with some elements of the node's internal state, such as its degree, and target degree.
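A heartbeat-based failure detection mechanism of the kind described above may be sketched as follows. The sketch assumes that the caller supplies the current time explicitly and that a fixed timeout is used; the class and method names are hypothetical.

```python
class HeartbeatFailureDetector:
    """Suspects a neighbor when no heartbeat has arrived within `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}   # node_id -> time of the last heartbeat
        self.peer_state = {}  # node_id -> state piggybacked on the heartbeat

    def on_heartbeat(self, node_id, now, degree=None, target_degree=None):
        """Record a heartbeat and any internal state carried with it."""
        self.last_seen[node_id] = now
        if degree is not None and target_degree is not None:
            self.peer_state[node_id] = {"degree": degree, "target_degree": target_degree}

    def suspected(self, now):
        """Neighbors whose last heartbeat is older than the timeout."""
        return {n for n, seen in self.last_seen.items() if now - seen > self.timeout}

    def forget(self, node_id):
        """Stop monitoring a node, e.g. after it is removed from the neighbors set."""
        self.last_seen.pop(node_id, None)
        self.peer_state.pop(node_id, None)
```

Passing the time as an argument keeps the sketch deterministic; an implementation could instead read a monotonic clock.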
The neighbor maintenance task may start from an empty neighbors set, and incrementally add neighbors. Neighbors are added, according to a greedy or random heuristic discussed above, until the node reaches K-coverage, such that each topic in the node's interest is represented by the interests of at least K of its neighbors. In some embodiments, however, the node may not add neighbors without a limit.
The number of neighbors in one embodiment may be limited to Lmax+Margin, where Margin may be a small constant (e.g., 5). When the degree exceeds Lmax, the node preferably stops adding new neighbors, and actively tries to disconnect from at least one of its neighbors. In an exemplary embodiment, Lmax is chosen to be equal to K*|self_interest|, such that in the worst case, a node reaches K-coverage with each neighbor covering a single topic. In most cases, however, and especially with the greedy heuristic, most nodes should reach K-coverage with fewer than Lmax neighbors, because each neighbor may cover more than one topic, on average.
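The degree limits described above reduce to simple arithmetic, as the following sketch shows. The function names are hypothetical; the default Margin of 5 is the example value given above.

```python
def l_max(k, self_interest):
    """Worst-case degree: K neighbors per topic, each covering a single topic."""
    return k * len(self_interest)


def should_seek_neighbors(degree, k, self_interest):
    """A node stops adding new neighbors once its degree exceeds Lmax."""
    return degree <= l_max(k, self_interest)


def within_hard_limit(degree, k, self_interest, margin=5):
    """The number of neighbors is limited to Lmax + Margin."""
    return degree <= l_max(k, self_interest) + margin
```

For example, a node interested in ten topics with K set to 3 has an Lmax of 30 and, with a Margin of 5, never holds more than 35 neighbors.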
Nodes may be added into a set (e.g., a neighbors set) by either sending connect requests, or by accepting connect requests. It is therefore possible for a node to become over-covered. If so, some neighboring nodes may need to be removed from the neighbors set without hampering the K-coverage property of the nodes. In one embodiment, when a node becomes over-covered, it may try to disconnect from some existing neighbors whose removal would not affect the desired coverage level of the node defined by a self_interest value, for example.
In one embodiment, a node p may remove a node q from its neighbors set and stay K-covered, for example, whereas q would lose its K-coverage as a result of this disconnection, or q may have been under-covered to begin with. In some embodiments, the neighbors set is augmented with the degree and target degree of each neighbor. This allows each node to deduce the coverage state of its neighbors.
In accordance with one or more embodiments, a neighbor maintenance task may be implemented to monitor K-coverage for a node. The neighbor maintenance task may comprise two parts. A connect routine may be utilized to obtain K-coverage by connecting to at least one new neighbor, until K-coverage is achieved, or until Lmax is exceeded. A disconnect routine may be utilized to keep the node's degree from growing too much, by trying to disconnect from at least one existing neighbor whose removal would not hamper the desired coverage level. The disconnect routine may be executed when a node's degree exceeds Lmax, or when the node is over-covered.
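The decision made in each round of the neighbor maintenance task may be sketched as follows. This is a simplified illustration, not the routines of the code sections: it assumes that a node is treated as over-covered when at least one neighbor can be removed without breaking K-coverage, and it returns only the name of the routine to run. All names are hypothetical.

```python
def under_covered(self_interest, neighbor_interests, k):
    """Topics covered by fewer than k neighbors."""
    return {
        t
        for t in self_interest
        if sum(1 for interest in neighbor_interests.values() if t in interest) < k
    }


def removable_neighbors(self_interest, neighbor_interests, k):
    """Neighbors whose removal would leave the node K-covered."""
    result = []
    for q in sorted(neighbor_interests):
        rest = {n: i for n, i in neighbor_interests.items() if n != q}
        if not under_covered(self_interest, rest, k):
            result.append(q)
    return result


def plan_round(self_interest, neighbor_interests, k):
    """Choose 'connect', 'disconnect', or 'idle' for one maintenance round."""
    lmax = k * len(self_interest)
    if len(neighbor_interests) > lmax:
        return "disconnect"  # degree exceeds Lmax
    if under_covered(self_interest, neighbor_interests, k):
        return "connect"     # K-coverage not yet achieved
    if removable_neighbors(self_interest, neighbor_interests, k):
        return "disconnect"  # over-covered (assumed definition)
    return "idle"
```

Whether a removable neighbor is actually disconnected also depends on that neighbor's own coverage state, as described in the paragraphs that follow.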
Referring to code sections 1 through 4 below, depending on implementation, the neighbor maintenance task may comprise several routines. The main routine called M
In an exemplary embodiment, when M
In accordance with one embodiment, the connect routine may try to establish a connection with a new node. The new node to connect to, may be either chosen by the N
In some embodiments, when a node receives a connect request, the node accepts the request if the node's degree is lower than Lmax+Margin. In this embodiment, the requesting node is added to the neighbors set. Otherwise, the node, by issuing a redirect message, redirects the requesting node to a node m in the neighbors set where (1) m has not reached its target degree, and (2) m shares the maximum amount of interest with the requesting node, for example.
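The handling of a connect request may be sketched as follows. The sketch assumes that each neighbor entry is a dictionary holding the neighbor's degree, target degree, and interest, and it returns a reject result when no redirect candidate exists, a case that the description above does not address. All names are hypothetical.

```python
def handle_connect_request(requester_id, requester_entry, neighbors, l_max, margin=5):
    """Accept the requester if there is room, otherwise redirect it.

    `neighbors` maps node id -> {"degree": int, "target_degree": int, "interest": set}.
    Returns ("accept", None), ("redirect", m), or ("reject", None).
    """
    if len(neighbors) < l_max + margin:
        neighbors[requester_id] = requester_entry
        return ("accept", None)

    best, best_shared = None, -1
    for m in sorted(neighbors):
        entry = neighbors[m]
        if m == requester_id:
            continue
        if entry["degree"] >= entry["target_degree"]:
            continue  # condition (1): m must not have reached its target degree
        shared = len(entry["interest"] & requester_entry["interest"])
        if shared > best_shared:  # condition (2): maximum shared interest
            best, best_shared = m, shared

    if best is None:
        return ("reject", None)  # assumption: no suitable redirect target
    return ("redirect", best)
```

The requesting node would then place the returned node m in its connect_cand_from_redirect set and try it in a later round.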
When a first node receives a redirect message, the first node may add the sending node to the connect_cand_from_redirect set (see for example Code Section 4, lines 10-11). In turn, the first node may try to connect to the sending node at the next iteration of the M
In an exemplary embodiment, the disconnect routine preferably starts with the node setting its adaptive degree target L to the minimum of Lmax and |value of the neighbors set| thus indicating that the node has reached, or exceeded, the minimum degree required to K-cover the node's set of topics (see for example Code Section 2, lines 22-26). Preferably, the related values are included in C
In one embodiment, a node may invoke D
In one embodiment, when a node p receives a disconnect request from another node q, the node will disconnect from q if the node can remove q without causing its interest to become under-covered, or if the node has more than Lmax neighbors. If indeed p decides to disconnect from q, it will send a D
In accordance with one or more embodiments, redirect messages may be used to prevent a case in which the node p tries to repeatedly connect to the same node q and is being rejected. In an exemplary embodiment, the nodes in connect_cand_from_redirect are given priority when choosing the next node to connect to (see for example Code Section 2, lines 15-19). In some embodiments, neighbors that have exceeded their target degree will not be chosen as redirect candidates.
Accordingly, one or more embodiments of the invention are implemented to handle dynamic changes to a network with a failure detection event handler, by removing suspected nodes from the neighbors set. The suspected node's neighbors may try to connect to new neighbors at the next round of the neighbor maintenance task. An orderly leave message may be sent to the neighbors. When a node p changes its interest, the change is propagated through the membership service and via the heartbeat messages. The neighbors of the node p, and/or other nodes in the network, may take this change into account in the next round of the neighbor maintenance task.
Code Section 1: Data Structure and Parameters Used by the Neighbor Maintenance Implementation
Code Section 2: The Neighbor Maintenance Routines
Code Section 3: Auxiliary Routines
Code Section 4: Message and Failure Detection Event Handlers
Code Section 5: The Greedy and Random Neighbor Selection Routines
In different embodiments, the invention can be implemented either entirely in the form of hardware or entirely in the form of software, or a combination of both hardware and software elements. For example, a network comprising computing systems N1 through N7, for example, may comprise a controlled computing system environment that can be presented largely in terms of hardware components and software code executed to perform processes that achieve the results contemplated by the system of the present invention.
Referring to
As provided here, the software elements that are executed on the illustrated hardware elements are described in terms of specific logical/functional relationships. It should be noted, however, that the respective methods implemented in software may be also implemented in hardware by way of configured and programmed processors, ASICs (application specific integrated circuits), FPGAs (Field Programmable Gate Arrays) and DSPs (digital signal processors), for example.
Software environment 600 is divided into two major classes comprising system software 602 and application software 604. System software 602 comprises control programs, such as the operating system (OS) and information management systems that instruct the hardware how to function and process information.
In one embodiment, the algorithms and logic code represented in code sections 1 through 5 or other software implementations including or relating to the greedy and random connecting methods may be implemented as system software 602 and/or application software 604 executed on one or more hardware environments to facilitate multicasting in a network. System software 602 and application software 604 may comprise but are not limited to program code, data structures, firmware, resident software, microcode or any other form of information or routine that may be read, analyzed or executed by a microcontroller.
In an alternative embodiment, the invention may be implemented as a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate or transport the program for use by or in connection with the instruction execution system, apparatus or device.
The computer-readable medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid-state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk read only memory (CD-ROM), compact disk read/write (CD-R/W) and digital videodisk (DVD).
Referring to
A user interface device 512 (e.g., keyboard, pointing device, etc.) and a display screen 515 can be coupled to the computing system either directly or through an intervening I/O controller 516, for example. A communication interface unit 518, such as a network adapter, may be also coupled to the computing system to enable the data processing system to communicate with other data processing systems or remote printers or storage devices through intervening private or public networks. Wired or wireless modems and Ethernet cards are a few of the exemplary types of network adapters.
In one or more embodiments, hardware environment 500 may not include all the above components, or may comprise other components for additional functionality or utility. For example, hardware environment 500 may be a laptop computer or other portable computing device embodied in an embedded system such as a set-top box, a personal data assistant (PDA), a mobile communication unit (e.g., a wireless phone), or other similar hardware platforms that have information processing and/or data storage and communication capabilities.
In certain embodiments of the system, communication interface 518 communicates with other systems by sending and receiving electrical, electromagnetic or optical signals that carry digital data streams representing various types of information including program code. The communication may be established by way of a remote network (e.g., the Internet), or alternatively by way of transmission over a carrier wave.
Referring to
Software environment 600 may also comprise browser software 608 for accessing data available over local or remote computing networks. Further, software environment 600 may comprise a user interface 606 (e.g., a Graphical User Interface (GUI)) for receiving user commands and data. Please note that the hardware and software architectures and environments described above are for purposes of example, and one or more embodiments of the invention may be implemented over any type of system architecture or processing environment.
It should also be understood that the logic code, programs, modules, processes, methods and the order in which the respective steps of each method are performed are purely exemplary. Depending on implementation, the steps may be performed in any order or in parallel, unless indicated otherwise in the present disclosure. Further, the logic code is not related or limited to any particular programming language, and may comprise one or more modules that execute on one or more processors in a distributed, non-distributed or multiprocessing environment.
Therefore, it should be understood that the invention can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is not intended to be exhaustive or to limit the invention to the precise form disclosed. These and various other adaptations and combinations of the embodiments disclosed are within the scope of the invention and are further defined by the claims and their full scope of equivalents.