This invention relates to the delivery of data by way of data communication networks. Embodiments of the invention provide systems and methods for streaming media data such as audio and/or video data.
There is an increasing demand for practical systems capable of delivering digital media to consumers. Delivering digital video and other media by way of a computer network such as the internet involves transmitting large amounts of data over the network.
Some on-demand media delivery systems have a client/server architecture in which consumers, also known as clients, request and receive media files from a single server computer or a small number of server computers. The media is hosted on the server computer. Consumers can request that the server deliver media to their computers over the network. In some cases, consumers download media files to local data stores and can subsequently play the media. In other cases, media files are streamed. Streaming delivers a media file in such a way that part of the file can be played back by the consumer while another part of the file is simultaneously being transmitted.
A problem with server-based methods is that they are not readily scalable to serve large numbers of consumers. As the number of client requests for media increases, a server of any given capacity will eventually fail to respond to requests in a timely manner and will become the bottleneck in the flow of media files to clients.
Numerous methods have been devised, and are known in the art, to alleviate this strain on a server's capacity. These include:
Peer-to-peer (P2P) software permits computers on a network to exchange data. Each computer running P2P software may be called a peer. A number of computers running P2P software can form a network. Each peer may share data that it has in a data store with other peers in the network and may request and receive data from other network peers. Typically, peers can join and leave a P2P network dynamically. For instance, a peer may leave a P2P network when the P2P program, the peer computer, or the peer computer's connection to the internet is disabled, either by the user or due to a failure of the program, computer or connection. The size of a P2P network can fluctuate continuously. Some examples of P2P computer programs are Gnutella™, Napster™, Kazaa™ and BitTorrent™.
There remains a need for practical and cost-effective systems and methods for distributing media (including, for example, video and audio) as well as other data. There is a particular need for practical and cost-effective systems and methods for distributing streaming media.
Any system which permits data to be retrieved from a range of locations must provide some mechanism for identifying the location(s) at which particular data is hosted. P2P programs typically provide the ability to search for and find information on the P2P network. Search schemes used in P2P networks can generally be divided into those of unstructured networks, which use flooding to perform searches, and those of structured systems.
Gnutella (see http://www.gnutella.com) and Kazaa (see http://www.kazaa.com/us/index.htm) are examples of unstructured P2P networks that use flooding as a search technique. In such networks, a peer that wishes to find information broadcasts a request for information to many other peers in the network. Peers that receive the request message retransmit the message to other peers in the network. In this manner a network peer that has the requested information will eventually be found and will respond. Although such flooding is simple and works well in a highly dynamic network environment, where peers are continuously added and removed from the network, it generates large numbers of redundant requests, which makes it very difficult to scale to networks with very large numbers of peers.
Structured P2P systems such as Chord (see I. Stoica, R. Morris, D. Karger, M. Kaashoek, H. Balakrishnan, Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications, Proc. of ACM SIGCOMM '01, 2001) and CAN (see S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, A Scalable Content-Addressable Network, Proc. of ACM SIGCOMM '01, 2001) use search techniques based upon Distributed Hash Tables (DHTs). A DHT-based search technique allows a peer searching for information to locate another peer that hosts the desired information within a bounded (finite) number of message requests. However, DHT-based methods tightly control both the placement of data and the topology of the network, which results in a high maintenance cost. Furthermore, such networks typically only support searches by identifier and lack the flexibility of keyword searching.
There remains a need for effective, practical systems for cataloging information in distributed file-sharing systems.
This invention has several aspects. These aspects can be combined in a data distribution system but also have utility on their own and in combination with other data distribution systems.
The invention provides, without limitation:
Some embodiments provide cost-effective architectures for large-scale video streaming over the Internet. Such an architecture can exploit the often underutilized storage and upload bandwidth available to personal computers, set top boxes in cable systems, or other nodes on a network to support streaming requests. In an embodiment for streaming video, video files can be split into segments. The segments can be hosted at multiple nodes in the network. More popular and/or more important segments may be hosted at more nodes.
In some embodiments, a Media Segment Distributing (MSD) algorithm is applied to distribute the segments to different nodes. The MSD algorithm determines which segments will be hosted by which nodes based on factors such as the nodes' stability, available upload bandwidth, and recent streaming serve load/frequency.
In some embodiments a category overlay search is used to locate segments required for a streaming session. The search may be used during the streaming session; it is not necessary that all segments be located before playback begins. Requests for segments may be receiver-driven. A receiving node may host material to be delivered to other nodes. Segments of a video file may be divided into blocks to facilitate parallel reception of different parts of a segment from different hosting nodes. Thus upload bandwidth from different hosting nodes can be aggregated. A streaming request may be supported by multiple hosting nodes during a streaming session. A Multiple-Source Scheduling (MSS) algorithm may be applied to select the hosting node and order of delivery for blocks in a segment so as to efficiently aggregate upload bandwidth from multiple hosting nodes and to coordinate the downloads from the hosting nodes to serve one streaming request in a timely manner.
Some embodiments of the invention apply a category overlay search that can be run on a cluster-based P2P infrastructure that overlies an unstructured network of nodes. This structure permits separation of search traffic from system maintenance. Searches can be performed while restricting the number of messages required to perform the searches.
Searching tasks can be divided by category and by cluster. Load balancing can be further improved by separating searching and indexing traffic.
One aspect of the invention provides methods for storing a data item for download over a data communication network, advantageously to support streaming of the data item to a receiving node. The methods comprise: dividing the data item into a plurality of segments and identifying a plurality of nodes on the network that are capable of hosting the segments; evaluating suitabilities for hosting the segments of nodes of the plurality of nodes; selecting a subset of the plurality of nodes based upon the corresponding suitabilities; and, forwarding the segments of the data item to the nodes of the selected subset for hosting by the nodes of the selected subset.
Another aspect of the invention provides methods for downloading a data item on a data communication network. The methods may be applied to stream the data item to a receiving node. The data item may comprise a media file, such as a video file. The data item comprises a plurality of segments. The segments are hosted on a plurality of nodes on the network. The methods comprise: downloading all of the segments of the data item and assembling the segments to provide the data item. Downloading all of the segments comprises downloading data from each of a plurality of different ones of the nodes. For at least one of the segments, downloading the segment comprises identifying two or more of the nodes that host the segment and requesting different portions of the segment from each of the two or more of the nodes.
Another aspect of the invention provides systems for storage and delivery of data items. Such systems comprise a plurality of nodes. Each of the plurality of nodes comprises a data processor, a data store accessible to the data processor, and stored computer instructions executable by the data processor. A data communication network interconnects the plurality of nodes. For at least some of the nodes: the computer instructions cause the data processor to retrieve a selected data item by downloading a plurality of segments of the data item and assembling the segments to provide the data item. At least one of the segments comprises a plurality of blocks and downloading the segment comprises identifying two or more other nodes that host the segment and requesting different blocks of the segment from each of the two or more of the nodes.
Another aspect of the invention provides systems for storage and delivery of data items. Such systems comprise a plurality of nodes interconnected by a data communication network. At least some of the nodes comprise: a data store; means for identifying two or more other nodes storing a segment of a desired data item, the segment comprising a plurality of blocks; means for requesting a block of the segment from each of the two or more of the nodes; means for receiving the blocks from the other nodes; and means for assembling the blocks to provide the data item.
In addition to the exemplary aspects and embodiments described above, further aspects and embodiments will become apparent by reference to the drawings and by study of the following detailed descriptions.
Example embodiments are illustrated in the drawings. The embodiments described and shown herein are illustrative rather than restrictive.
A system 10 for delivering streaming data is shown in
Publishing Data
Data to be made available on system 10 is published to system 10. The data may be data of any type. In example embodiments, the data comprises:
In preferred embodiments, each segment 31A comprises several blocks 31B. The size of blocks 31B may be adjusted. To facilitate scheduling it is convenient that the blocks within any particular segment are all of the same size. In some embodiments, all of the blocks that collectively make up the segments of a data item are equal in size.
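By way of a non-limiting illustration, the following sketch shows one way a data item might be divided into segments and equal-size blocks. The segment size, block size, function names, and the zero-padding of the final block are assumptions made for illustration only; the embodiments state only that block size may be adjusted and that equal-size blocks within a segment ease scheduling.

```python
# Minimal sketch: split a data item (e.g. media file 31) into sequential
# segments 31A, and split a segment into equal-size blocks 31B.
SEGMENT_SIZE = 1 << 20   # 1 MiB per segment (assumed value)
BLOCK_SIZE = 16 << 10    # 16 KiB per block (assumed value)

def split_into_segments(data: bytes, segment_size: int = SEGMENT_SIZE) -> list[bytes]:
    """Divide a data item into sequential segments."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]

def split_into_blocks(segment: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Divide a segment into blocks; the last block is zero-padded so that all
    blocks in the segment are the same size (padding is one possible approach,
    assumed here for simplicity)."""
    blocks = [segment[i:i + block_size] for i in range(0, len(segment), block_size)]
    if blocks and len(blocks[-1]) < block_size:
        blocks[-1] = blocks[-1] + b"\x00" * (block_size - len(blocks[-1]))
    return blocks
```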
At block 34, method 30 identifies one or more nodes 12 to host each of segments 31A. At block 36, method 30 distributes the segments to different nodes 12. Different nodes 12 may each receive one or more segments. The result of method 30 is that all of the segments 31A of media file 31 are stored in the data stores 16C of nodes 12 of system 10. Preferably each segment 31A is hosted by multiple nodes 12. Segments 31A may be sequential parts of file 31.
Nodes 12 belonging to system 10 may contribute some of their outbound bandwidth and file storage to system 10. The outbound bandwidth and storage that the i-th node, P_i, contributes are denoted as Bw_i and St_i respectively. The choice of which node(s) to host a segment 31A of a current media file 31 may be based upon a number of factors. For example, the nodes may be selected based upon one or more of:
One possible measure of a node's stability is based upon the lengths of the periods during which the node remains connected to system 10 without interruption. This may be done, for example, by computing a smoothed weighted average as follows:
EstimatedStay_i(new) = α × EstimatedStay_i(prev) + β × CurrentStay_i  (1)
where CurrentStay_i is the length of time that node P_i has participated in system 10 without leaving or failing since it last connected to system 10, EstimatedStay_i(new) is a measure of the stability of node P_i taking into account all of the history of P_i, EstimatedStay_i(prev) is the previous value for EstimatedStay_i, and α and β are weighting parameters with α + β = 1. In one embodiment, α is in the range of 0.8 to 0.9 and β is in the range of 0.1 to 0.2.
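A minimal sketch of the update of equation (1), assuming α = 0.85 (within the stated range of 0.8 to 0.9):

```python
ALPHA = 0.85            # weighting parameter α, assumed within the 0.8-0.9 range
BETA = 1.0 - ALPHA      # β, with α + β = 1 per equation (1)

def update_estimated_stay(prev_estimate: float, current_stay: float) -> float:
    """Smoothed weighted average of node stability per equation (1)."""
    return ALPHA * prev_estimate + BETA * current_stay
```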
A measure of the degree to which system 10 has utilized a node recently may be based upon one or more of the average usage ratio of the node's contributed bandwidth since it began to participate in system 10, Rusage_i, and the frequency, Freqserve_i, with which the node has served streaming requests in a recent period.
In an example embodiment, the desirability of a node 12 as a host for a segment is based upon a goodness measure GSt_i. In an example embodiment, GSt_i has the form:
where α_St, β_St and γ_St are weighting factors and m is the number of nodes participating in system 10. With this formulation, a candidate node that is more stable, has higher available bandwidth, and has historically had a lower serve frequency will have a greater GSt_i.
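Because the precise formula for GSt_i is not reproduced above, the sketch below assumes a plausible form consistent with the stated behaviour: a weighted sum of normalized stability, normalized contributed bandwidth, and inverted normalized serve frequency. The field names, normalization, and default weights are all assumptions.

```python
def goodness(nodes: list[dict], a_st: float = 0.4, b_st: float = 0.4,
             g_st: float = 0.2) -> dict:
    """Hypothetical goodness measure GSt_i. The exact formula in the source is
    not reproduced; this assumed form makes a more stable node, a higher
    available bandwidth, and a lower serve frequency all increase GSt_i."""
    max_stay = max(n["estimated_stay"] for n in nodes) or 1.0
    max_bw = max(n["bw"] for n in nodes) or 1.0
    max_freq = max(n["freq_serve"] for n in nodes) or 1.0
    return {
        n["id"]: a_st * n["estimated_stay"] / max_stay
               + b_st * n["bw"] / max_bw
               + g_st * (1.0 - n["freq_serve"] / max_freq)
        for n in nodes
    }
```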
The node wishing to publish file 31 retrieves values for GSt of other nodes 12. Block 34 may comprise identifying a number N_c of other nodes that have the highest values for GSt and distributing segments 31A among these other nodes 12. In some embodiments, nodes 12 collect the statistics required to compute GSt and forward those statistics to a core node. The core node may maintain a data structure associating nodes 12 with the corresponding values for GSt. The data structure may comprise a list sorted by GSt. Upon request from a publishing node 12, the core node may select the N_c nodes having the highest values of GSt and send information identifying those nodes back to the publishing node.
In some embodiments, nodes 12 of system 10 are grouped in clusters. Each cluster may have a core node responsible for aggregating the values of GSt for nodes 12 belonging to that cluster. In such embodiments, in addition to identifying nodes in its own cluster that have high values for GSt, the core node may pass the request to other core nodes. The publishing node may wait for a timeout period Timeout_p for responses from core nodes and then assign segments 31A to nodes based upon the values for GSt.
In an example embodiment, each cluster of nodes has a tree structure. Apart from the core node, each node in the structure is connected to the core node either directly or via one or more intermediate nodes in the cluster.
The organization of cluster 40 is a logical organization and is only relevant to the routes by which information is passed between the nodes of a cluster and the core node of the cluster. The arrangement of nodes into clusters does not necessarily affect and is not necessarily affected by the architecture of network 14 or the geographical locations of any nodes.
In some embodiments, each node periodically sends “alive” messages to its parent node. The alive messages may include one or more of EstimatedStay, Bw, Rusage, and Freqserve. The parent collects the information contained in the received “alive” messages and periodically sends an aggregate report to its parent along with its own “alive” message. Thus, eventually, the core node will have recent information sufficient to compute GSt for every cluster member. The core node may sort the cluster members in descending order of GSt and store the result in a sorted candidates list. The core node periodically updates the sorted candidates list as more recent information about cluster members becomes available.
In addition to being useful for maintaining a central record of GSt values, the cessation of alive messages can be used to detect when a node has dropped out of system 10 for some reason.
In some embodiments, after a set of suitable host nodes is identified in block 34, segments 31A are distributed among the host nodes in a round robin manner. A segment distribution algorithm assigns the first segment to the candidate node that has the highest GSt, then assigns the second segment to the candidate node that has the next highest GSt, and so on. Once the segment assignment is done, the publishing node sends segments 31A to the assigned nodes 12 on network 14. The nodes 12 store the received segments 31A in their data stores 16C.
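A minimal sketch of this round-robin assignment, assuming candidate nodes carry precomputed GSt values (the dictionary field names are hypothetical):

```python
def assign_segments(segments: list, candidates: list[dict]) -> dict:
    """Round-robin assignment of segments to candidate nodes ranked by GSt.
    Segment 0 goes to the highest-GSt node, segment 1 to the next highest,
    wrapping around when the candidates are exhausted. Returns a mapping
    from segment index to node id."""
    ranked = sorted(candidates, key=lambda n: n["gst"], reverse=True)
    return {i: ranked[i % len(ranked)]["id"] for i in range(len(segments))}
```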
At the conclusion of method 30 one or more data files 31 have been divided into segments 31A and segments 31A have each been stored by a plurality of nodes 12 of system 10.
In some embodiments, the number of copies of the first segment or first few segments 31A of a data file stored in system 10 is greater than the number of copies of subsequent segments. This is desirable where the data stored in system 10 comprises on-line streaming media because, as is known in the art, there tends to be significantly greater demand for the first few seconds or minutes of a media file than for the rest of the media file. Consumers may start to play a media file and then realize after a few seconds that they are not interested in playing the media file to the end.
In some embodiments, as consumers use system 10 to retrieve data files 31 for playback at their nodes 12 (or for some other purpose), P2P software 16D causes some segments 31A of the retrieved data files 31 to be retained in the data store 16C of the consumer's node 12. These retained segments can then be made available to other nodes on system 10. This is desirable since it automatically causes more segments of popular data files 31 to be present on system 10. The selection of which segments 31A are to be retained may be random or quasi-random. The selection may be biased such that the first few segments 31A have greater odds of being retained than other segments.
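One hedged sketch of such biased quasi-random retention follows; the base probability, the head bonus, and the number of favored head segments are invented parameters used only to illustrate the bias toward early segments.

```python
import random

def retain_probability(segment_index: int, base: float = 0.2,
                       head_bonus: float = 0.5, head_len: int = 3) -> float:
    """Assumed retention bias: the first head_len segments are retained with
    higher probability, so the opening of a popular data item replicates to
    more nodes. All parameter values are assumptions."""
    return base + (head_bonus if segment_index < head_len else 0.0)

def select_retained(num_segments: int) -> list[int]:
    """Quasi-randomly choose which downloaded segments 31A a node keeps hosting."""
    return [i for i in range(num_segments)
            if random.random() < retain_probability(i)]
```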
In some embodiments, all segments of a data item are retained in data store 16C at least for a period of time. The availability of segments in the local data store 16C (which can be accessed very rapidly) permits rapid fast-forwarding or rewinding within the portion of the data item for which segments have been stored in the local data store 16C of the receiving node. As long as the segments are retained by the receiving node, the data item (e.g. a video) can be played at the receiving node without the need to download the segments again (subject to the possibility that P2P software 16D may be configured to permit playback of a data item only a certain number of times or only during a certain period).
In some embodiments, complete copies of data items 31 that are available on system 10 may be made available in selected nodes that are very stable (e.g. are on-line continuously). Such nodes may be accessed as a last resort to provide segments that are not otherwise available on system 10.
Requesting and Receiving Data
System 10 may be configured to permit retrieval of data from system 10 in any of a wide range of alternative ways. In some embodiments, when a node (a receiving node) requests a data file from system 10, the receiving node may first identify other nodes on system 10 that host the first segment of the desired data. The first segment may be downloaded from one or more such nodes while the requesting node prepares to download the remaining segments of the desired data. Identifying nodes that host required segments of the desired data may be performed using any suitable searching mechanism. One suitable search mechanism is described below.
The receiving node may determine if the desired data (e.g. a media file) can be streamed to the receiving node by the nodes contained in the search results (hosting nodes). If the receiving node determines that the media file can be streamed, the receiving node may selfishly determine the best hosting nodes to receive the media file from. The receiving node may apply a scheduling method to aggregate bandwidths from the selected hosting nodes and coordinate them to stream segments of the media file to the receiving node beginning with the first segment of the media file.
If the receiving node determines that the media file cannot be streamed to it, the request is rejected. Examples of situations in which the request may be rejected are cases in which:
In some embodiments, playback can commence before all segments are available (or even exist). For example, in a nearly live broadcast, media segments are generated and published to system 10 in real time. After the first few segments have been made available on system 10, users can commence playing the media segments as described herein. As consumers play back the nearly-live media, the P2P software 16D running at the consumers' nodes locates and streams additional segments 31A of the nearly-live media to the consumers' nodes.
As noted above, each segment 31A is preferably hosted by a plurality of nodes and comprises a plurality of separately-transmittable blocks. The receiving node identifies the best set of other nodes from which to source each segment.
The scheduling problem may be articulated as follows. Suppose a segment contains N blocks {B_1, B_2, . . . , B_N} and the receiving node has identified M hosting nodes {P_1, P_2, . . . , P_M} to supply the segment. Given the bandwidths contributed by the hosting nodes {Bw_1, Bw_2, . . . , Bw_M}, where the sum of the contributed bandwidths is at least equal to B_r (the playback bit rate of the video), how should one divide among the hosting nodes the responsibility for transmitting the blocks of the current segment to the receiving node, so as to achieve a minimum initial buffering time and to download each block as early as possible?
Some embodiments employ a ‘round robin’ (RR) methodology. In such embodiments, hosting nodes for a segment are numbered from 1 to M and blocks are assigned to the hosting nodes in order from 1 to M. RR treats every hosting node equally, making each hosting node likely to be assigned the same number of blocks no matter how much bandwidth it contributes to the streaming session. Thus some of the bandwidth contributed by hosting nodes that have more bandwidth may be wasted, while hosting nodes that have relatively little bandwidth available for the purposes of system 10 may be assigned undesirably many blocks.
Other embodiments employ a bandwidth proportional (BP) method. In BP methods, blocks are assigned to hosting nodes in proportion to the bandwidth available at each hosting node. In this approach, hosting node P_i sends Bw_i/B_r of the blocks, starting at the next block after the last block assigned to hosting node P_(i-1). This approach fully utilizes the bandwidth of each hosting node when sending blocks.
Other embodiments apply multi-source scheduling (MSS) which combines advantages of both RR and BP. MSS generates a schedule in which blocks are assigned to hosting nodes in a roughly round robin manner. In each round, the blocks are assigned in proportion to the bandwidth contributed by the hosting nodes.
In MSS, hosting nodes may be sorted by their bandwidth Bw in descending order. For a given hosting node P_i, the time T_i is the earliest time at which the hosting node could commence sending the current block. If the hosting node is not already sending a block then T_i is the current time. If the hosting node is already sending one or more blocks then T_i is the time at which the hosting node will finish sending the blocks that it has already committed to send. T_i may initially be set to zero.
The responsibility for delivering blocks may be assigned to hosting nodes in order of block number, starting with the first block, B_1. Responsibility for delivering each block is assigned to the hosting node that will complete delivery of the block first (taking into account the blocks that have already been assigned to the available hosting nodes). To assign the current block, B_current, the receiving node may compute the estimated finish time for the block for each of the hosting nodes. The estimated finish time may be given by:

T_finish(i) = T_i + Size / Bw_i
where T_finish(i) is the estimated finish time for hosting node P_i and Size is the size of the current block.
Next, the hosting node having the minimum estimated finish time is identified and responsibility for the current block is assigned to that hosting node. T_i for the selected hosting node is then set to a new value equal to T_finish(i). This process may be repeated for each block in order until responsibility for the last block in the current segment has been assigned to a hosting node.
Multiple-Source Scheduling (MSS) thus assigns blocks to hosting nodes based on their estimated finish times for sending the current block. The supplier that has the minimum estimated finish time is assigned responsibility for delivering the current block. This approach ensures that blocks are assigned to hosting nodes in proportion to their contributed bandwidth and that each block is downloaded by the receiver as early as possible after the previous blocks are received.
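A minimal sketch of MSS under these definitions, assigning each block in order to the hosting node with the minimum estimated finish time T_finish(i) = T_i + Size/Bw_i and then advancing that node's T_i. The node identifiers and the units of Size and Bw are assumptions.

```python
def mss_schedule(num_blocks: int, block_size: float,
                 hosts: dict[str, float]) -> dict[str, list[int]]:
    """Multiple-Source Scheduling sketch. `hosts` maps a node id to its
    contributed bandwidth Bw_i; returns the blocks assigned to each node."""
    t = {node: 0.0 for node in hosts}                 # T_i initially zero
    schedule: dict[str, list[int]] = {node: [] for node in hosts}
    for block in range(num_blocks):
        # Estimated finish time T_finish(i) = T_i + Size / Bw_i for each host.
        finish = {node: t[node] + block_size / bw for node, bw in hosts.items()}
        best = min(finish, key=finish.get)            # minimum finish time wins
        schedule[best].append(block)
        t[best] = finish[best]                        # commit the block to it
    return schedule

# Example: 10 blocks of size 1.0 over hosts with bandwidths 4, 2 and 1.
# Blocks are assigned roughly round robin, in proportion to bandwidth.
print(mss_schedule(10, 1.0, {"P1": 4.0, "P2": 2.0, "P3": 1.0}))
```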
The performance of the RR, BP and MSS scheduling methods may be compared by comparing
When the streaming of the first segment is underway, the receiving node may repeat the process to download the second and subsequent segments 31A, until the entire media file 31 has been streamed to the receiving node.
In some embodiments, the media file or other data item being retrieved comprises metadata. The metadata may, for example, identify the segment(s) of a video file corresponding to particular scenes. The metadata may contain more detailed information relating to the information content of individual segments. For example, for a video file, the metadata may identify segments in which a certain actor or actress appears. Such metadata can permit intelligent searches for content. The metadata may be included in one or more of the first few segments of a data item so that it is immediately available for use by a player. A player may be configured to play the data item (for example, by displaying video images and accompanying audio, playing back audio, displaying sequences of images, or the like) and to provide intelligent fast-forward or rewind to a particular logical scene by searching the metadata to identify the segment at which the desired scene commences and then locating, buffering if necessary, and commencing playback at the identified segment.
Once the receiving node generates a schedule for downloading the blocks that make up a segment, it may send the schedule to the selected hosting nodes. When a hosting node receives the schedule, it may send the assigned blocks to the receiver according to the schedule. The blocks may be delivered using any suitable protocol over network 14. For example, the blocks may be delivered using UDP (user datagram protocol). The participating nodes may perform TCP-friendly congestion control over the UDP connection.
As shown in
In order to accommodate the transient effects of streaming packets arriving late or the selection of new hosting nodes when hosting nodes leave system 10 or fail, the receiving node 12 may buffer at least S_initBuff blocks before media file playback starts (initial buffering). After the initial buffering time, the receiving node may continuously read data from ring buffer 50 and play the media file.
During a streaming session, some hosting nodes may leave system 10 or fail, or incoming streaming rates from one or more hosting nodes may decrease due to network congestion. In such cases, the receiving node may select one or more substitute or additional hosting nodes from which the required segments can be obtained. For example, if a hosting node fails or leaves system 10 during a streaming session then the receiving node may select another hosting node to substitute for the leaving/failing hosting node. The receiving node may generate a new schedule for delivery by the new set of hosting nodes of blocks that have not been received. The receiving node sends the revised schedule to the nodes of the new set of hosting nodes. Once the revised schedule has been received by the hosting nodes of the new set, those hosting nodes may send the assigned blocks to the receiving node in the order specified by the schedule. This process may be referred to as “supplier switching”.
While supplier switching is occurring, the aggregate bandwidth may be less than the required playback bit rate, and thus the receiving node may experience buffer underflow. S_initBuff may be chosen to be large enough that playback can continue without interruption even if supplier switching occurs.
In some embodiments, the receiving node 12 monitors the status of ring buffer 50 and tracks the blocks received during a streaming session. Every block should be received at least T_adv seconds before that block is scheduled for playback. If the block has not been received by this time then the block may be identified as “lost”, and the receiving node may send a request to the corresponding hosting node to re-send the lost block.
During the streaming session, the receiver may monitor the rate at which data is being received from each hosting node. If the receiving node detects that the incoming bit rate from a hosting node has been decreasing for a period T_dec, or if it is notified of or detects the departure or failure of a hosting node, the receiving node may perform supplier switching.
It can be appreciated that the methods and apparatus described above can be applied to downloading data items. Advantageously the methods and apparatus can be applied to streaming data items. Streaming differs from other downloading operations in that playback (or other use) of the data item is commenced before all parts of the data item have been received. Further, where the data item is streamed, it is not necessary to keep parts of the data item that have already been played back (or otherwise used).
The foregoing description assumes that receiving nodes have access to some mechanism for identifying nodes of system 10 that host segments of data files 31 that are required by the receiving nodes. Any suitable search mechanism may be used to perform this function. A novel search mechanism that may be used to perform this function is described below. This novel search mechanism may also be applied for other purposes.
Apparatus and methods for searching for content in a distributed network, such as a P2P network, provide separable mechanisms for searching for content in each of a plurality of categories. These mechanisms may be described as category-specific overlays. Such overlays may be provided on an unstructured P2P system. Specific searches may be limited to the applicable overlays. This conserves computing resources and limits the number of messages that must be passed across a data communication network to perform a search.
Nodes may be assigned to maintain content indexes for predefined categories. Such nodes may be referred to as “agent nodes” for their respective categories. The categories are related to the network content and may include but are not limited to categories that may describe media. For example, where a system 10 hosts movies, the categories may include categories such as “action”, “comedy”, “historical”, and so on.
Where it is desired to use the search system for locating hosting nodes that host specific segments of hosted files, categories may be provided for each set of segments. For example, a category may be provided for each of the 1st, 2nd, 3rd, 4th . . . etc. segments of content hosted on a system 10. A receiving node searching for hosting nodes that have a copy of the first segment of a particular file would look up the file in the category for the 1st segments and so on. Categories may also be based upon factors in addition to the ordinal position of a segment. For example, categories may be based upon both a classification of the subject matter of data items and the ordinal position of the desired segment.
The agent node in each category maintains a keyword list table (which may be called a “content index table”) for some or all of the information belonging to the category or categories to which it has been assigned. The content index table may comprise a data structure that stores keyword lists for all of the contents (e.g. stored files) belonging to a given category. For example, each entry in the content index table may contain the following information: Category, Keyword List, Owner Node. “Category” specifies the category to which the content has been assigned. “Keyword List” includes one or more keywords germane to the content. “Owner Node” specifies on which node the information is stored. For example, an entry <CA, KL, NX> means that node NX has content associated with the keyword list KL belonging to the category CA. Content index tables are maintained only at agent nodes.
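A minimal sketch of a content index table and its keyword matching follows. It assumes that a match requires all query keywords to appear in an entry's keyword list; the source describes the <Category, Keyword List, Owner Node> entries but not the matching rule, so that rule and the class names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class IndexEntry:
    """One entry <Category, Keyword List, Owner Node> in a content index table."""
    category: str
    keywords: set[str]
    owner_node: str

@dataclass
class AgentNode:
    """Agent node holding the content index table for its assigned categories."""
    entries: list[IndexEntry] = field(default_factory=list)

    def publish(self, category: str, keywords: set[str], owner: str) -> None:
        """Record that `owner` hosts content in `category` with these keywords."""
        self.entries.append(IndexEntry(category, keywords, owner))

    def search(self, category: str, query: set[str]) -> list[str]:
        """Return owner nodes whose keyword lists contain all query keywords."""
        return [e.owner_node for e in self.entries
                if e.category == category and query <= e.keywords]
```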
In some embodiments, the nodes in a P2P network, such as system 10 are divided into clusters. Each cluster may include one, two or more nodes. Different clusters may include different numbers of nodes. Each cluster maintains agent nodes for each category. A single node may serve as an agent node for multiple categories.
In some embodiments, the clusters may have a tree topology.
The agent nodes for a specific category that belong to different clusters within system 10 may be associated with one another. The association may comprise links between the agent nodes. A link may comprise a record of the address or other contact information of one agent node for a category maintained at another agent node for the category. In some embodiments, each agent node has an overlay link list listing addresses of one or more agent nodes for the category that are in other clusters. This association of agent nodes based on their assignment to a common category may be referred to as a “category overlay”. Multiple category overlays can be constructed. Each category overlay corresponds to a category. Because clusters may contain different numbers of nodes, certain nodes may belong to multiple category overlays.
In some embodiments, each node 12 maintains a category table, which stores mappings between categories and the corresponding agent nodes in the cluster to which the node belongs. Each entry in the category table may contain the following data: <Category, Agent Node, Timestamp>. For example, an entry <CA, NX, Ti> means that at time Ti, node NX became associated with category CA. Every category has a corresponding entry in this category table and every node maintains a copy of the category table.
To retrieve information from the system, a node may issue a query specifying a category, as well as a list of one or more keywords associated with the information to be retrieved from the system. The requesting node directs the query to the agent node associated with the specified category. The requesting node may locate the agent node by looking up the category in its category table. The agent node receives the query and then searches its content index table for information which matches the keyword(s) in the query. The agent node returns its results to the query initiator.
The agent node may additionally propagate the query within the corresponding overlay. Each agent node in the overlay that receives the query may search its content index table for information that matches the keyword(s) in the query and return results of the search to the query initiator. The search results include a list of the nodes storing information that satisfies the query.
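Building on the sketch above, a category overlay search might look as follows. The encodings of the category table and the overlay link list are assumptions; the source describes the tables but not their representation.

```python
def overlay_search(category_table: dict, overlays: dict,
                   category: str, query: set) -> list:
    """Sketch of a category overlay search. `category_table` maps a category
    to the agent node (an AgentNode from the sketch above) in the requester's
    own cluster; `overlays` maps a category to the list of agent nodes for
    that category across all clusters (the category overlay)."""
    results = []
    local_agent = category_table[category]        # look up the local agent node
    results.extend(local_agent.search(category, query))
    for agent in overlays.get(category, []):      # propagate within the overlay
        if agent is not local_agent:              # avoid repeating the local search
            results.extend(agent.search(category, query))
    return results
```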
An example search for information belonging to the category CA is as follows:
In this method, a query needs only to be propagated within the overlay corresponding to the category to which the query relates. This is much more efficient than propagating the query by way of flooding to all nodes in the system. Each category overlay need contain at most N peer nodes, where N is the number of clusters. Therefore, as long as a query is propagated within its category overlay, very few peers (compared with all of the peers in the network) will be contacted with the query.
For example, a user may wish to find a file containing music by a country music singer. The node operated by that user may issue a query in the “country” category. The query may comprise a list of keywords that includes the singer's name or the song title. The agent node(s) may return the addresses of nodes hosting information matching the query.
As another example, a receiving node may use a category overlay search to identify other nodes hosting a particular segment 31A of a data file 31 that is to be streamed to the receiving node. For example, if the required segment is the Nth segment, the receiving node may identify an agent node associated with the category for the Nth segments of files and may then send a query to that agent node in which the category is the “Nth segment” category and the keyword is an identifier for the required file. The search returns one or more lists of nodes that host the Nth segment of the required file.
It may be desirable to impose various limits on the size and structure of clusters. For example, all nodes belonging to a cluster may be required to be within N hops distance from the core node. This N hops distance may be called the cluster range limit. The number of hops from a member node to the core node by way of the tree structure may be called the range of the node. Clustering may be achieved by permitting nodes to request to join existing clusters. For example, a node that comes on-line may be programmed to contact a node in another cluster and request to join that other cluster. If the range of the contacted node is less than the cluster range limit then the requesting node may join the existing cluster. Otherwise, the requesting node may create a new cluster (i.e. become the core node for the newly created cluster). In the alternative, the requesting node may attempt to join another cluster or to find another node that has a range smaller than the cluster range limit.
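A minimal sketch of this joining rule follows, with an assumed cluster range limit of 4 hops; the node structure is hypothetical.

```python
from dataclasses import dataclass, field

CLUSTER_RANGE_LIMIT = 4           # the N-hop cluster range limit (assumed value)

@dataclass
class ClusterNode:
    node_id: str
    range: int = 0                # hops to the core node via the tree structure
    children: list = field(default_factory=list)

def join_cluster(new_id: str, contacted: ClusterNode) -> ClusterNode:
    """If the contacted node's range is below the cluster range limit, the new
    node joins the existing cluster as its child; otherwise the new node
    creates a new cluster and becomes its core node."""
    if contacted.range < CLUSTER_RANGE_LIMIT:
        child = ClusterNode(new_id, range=contacted.range + 1)
        contacted.children.append(child)
        return child
    return ClusterNode(new_id, range=0)   # core node of a newly created cluster
```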
Clusters should ideally be of similar sizes. The sizes may be regulated, for example, as follows:
When a new cluster is created, the core node for the cluster may initially be the agent node for all categories in the new cluster. As other nodes join the new cluster the core node may migrate some categories to the other nodes. These other nodes may migrate categories to still other nodes. For example, any agent node within a cluster may migrate some of its categories to a newly-joined node that joins the cluster as a child of the agent node. An agent node may cause some categories to be migrated to other nodes in the cluster if its load becomes too high.
The category tables maintained by nodes in a cluster may become out of date and inconsistent with one another if an agent node in the cluster migrates one or more categories to some other node(s). Initially, only the agent nodes and the nodes to whom the categories have migrated will have up-to-date information regarding the current agent node for the migrated category.
A periodic aggregation report scheme may be implemented to resolve this inconsistency. Every participating node may periodically send a category update report to one or more randomly-selected neighbor nodes. The category update report may include the latest N updates (or category migration events) known to the reporting node, as well as M random entries from the category table of the reporting node. Upon receiving a category update report, a recipient node may update its own category table based on the timestamps in the report.
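A minimal sketch of merging a received category update report by timestamp; the table layout is an assumption based on the <Category, Agent Node, Timestamp> entries described above.

```python
def merge_category_report(local_table: dict, report: list[tuple]) -> None:
    """Merge a received category update report into the local category table.
    `local_table` maps category -> (agent_node, timestamp); `report` is a list
    of (category, agent_node, timestamp) entries. Entries with newer timestamps
    win, so category migrations propagate through periodic random gossip."""
    for category, agent, ts in report:
        _, local_ts = local_table.get(category, (None, -1))
        if ts > local_ts:
            local_table[category] = (agent, ts)
```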
When a node shares information with the system, the node may identify the applicable agent node by looking up the category with which the information is associated in its category table. The node may then generate a message to the agent node. The message may include the category and a list of one or more keywords associated with the information, and may advise the agent node that the information assigned to the category is available for access by other nodes. Upon receiving this message, the agent node may store the keyword list and the address of the hosting node in its content index table.
In some embodiments, the category and keywords are selected by a user of the hosting node. The user may select the category from a predefined category list and may select a number of keywords to describe that information. The number of keywords may be one or greater than one.
It can be appreciated that the searching methods and apparatus described herein can permit searching to be shared among a wide range of nodes in a system 10. Searching duties can be distributed among a number of nodes that is as large as the number of categories (which can be large) times the number of clusters of nodes in system 10. Infrastructure maintenance duties, such as indexing and adapting system 10 to incorporate new nodes or to deal with the departure of existing nodes, can also be distributed over a large number of nodes.
The methods and apparatus disclosed herein may be combined or implemented alone to provide search or streaming capability for fields of use that include storage area networks; video, image, music, and data file searching and streaming; and, more generally, the sharing, streaming or searching of any data that can be categorized or divided into segments.
In an example embodiment, the nodes include set top boxes in a cable television system. Such set top boxes have desirable attributes for nodes of a system as described herein because such set top boxes:
A system 10 as described herein may be used to provide streaming media (or other data items) to devices that are not themselves nodes that participate in system 10. For example, such a system may be used to deliver data items to personal digital assistants (PDAs), mobile telephones, or the like. This may be accomplished, for example, by providing proxy nodes in system 10. A proxy node receives data from system 10, as described above, but instead of, or in addition to, playing back (or otherwise using) the data item itself, the proxy node forwards data received from system 10 to another device.
It can be appreciated that embodiments of the category search methods and apparatus described herein are readily scalable and can provide good load balancing among nodes (by separating system maintenance, indexing and search responsibilities and distributing them over various nodes). In such embodiments, it is not necessary to provide “super nodes” capable of independently handling all search services for an entire system. It can also be appreciated that in some embodiments, different nodes are associated in groups in two different ways. On one hand, nodes are associated into clusters. Maintenance tasks, such as tracking which nodes are agent nodes for each cluster, may be managed using the topology of the clusters. On the other hand, agent nodes are associated in category overlays. Searching tasks, such as locating nodes that host a desired segment of a data item, may be managed using the topology of the appropriate category overlay.
Alternative embodiments of this invention combine the category overlay search methods disclosed herein with any suitable media file storage and retrieving method including those known in the art. The category overlay method provides large-scale efficient keyword search services based on the P2P network model. Therefore, this method can provide a search infrastructure for any system that requires keyword search services. One such example is a file sharing application. Some additional applications of the distributed data storage and retrieval technology described herein are:
Other alternative embodiments combine the disclosed on-demand media streaming methods with any search infrastructure that provides efficient keyword search services.
Where a component (e.g. a processor, data link, device, circuit, player, etc.) is referred to above, unless otherwise indicated, reference to that component (including a reference to a “means”) should be interpreted as including as equivalents of that component any component which performs the function of the described component (i.e., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated exemplary embodiments of the invention.
While a number of exemplary aspects and embodiments have been discussed above, those of skill in the art will recognize certain modifications, permutations, additions and sub-combinations thereof. It is expressly intended that all methods and apparatus comprising any new useful and inventive features, combinations of features or sub-combinations of features disclosed herein are aspects of the invention, whether or not included in the original claims. It is further expressly intended that the features of the different example embodiments disclosed herein may be combined in workable combinations in addition to those combinations that are expressly disclosed herein. It is therefore intended that the following appended claims and claims hereafter introduced are interpreted to include all such modifications, permutations, additions and sub-combinations as are within their true spirit and scope.
This application claims the benefit under 35 U.S.C. §119 of U.S. patent application No. 60/788,046 filed on 3 Apr. 2006 and entitled PEER-TO-PEER INFRASTRUCTURE AND SYSTEM FOR LARGE-SCALE CONTENT SEARCH AND COST-EFFECTIVE ON-DEMAND MEDIA STREAMING which is hereby incorporated herein by reference as though fully set out herein.