IP multicast is a technique for one-to-many and many-to-many real-time communication over an IP infrastructure in a network. It scales to a larger receiver population by not requiring prior knowledge of who or how many receivers there are. Multicast uses network infrastructure efficiently by requiring the source to send a packet only once, even if it needs to be delivered to a large number of receivers. The nodes in the network (typically network switches and routers) take care of replicating the packet to reach multiple receivers such that messages are sent over each link of the network only once. The most common low-level protocol to use multicast addressing is User Datagram Protocol (UDP). By its nature, UDP is not reliable—messages may be lost or delivered out of order. Reliable multicast protocols such as Pragmatic General Multicast (PGM) have been developed to add loss detection and retransmission on top of IP multicast.
ascp-mc, as described herein, extends point-to-point transfer applications with a new point-to-multipoint transfer protocol based on IP multicast that enables data distribution to thousands of receivers in a scalable and efficient way. It solves typical large-scale distribution problems in the areas of digital cinema, digital signage, or VOD distribution to cable head-ends:
to transport IP multicast packets.
Design Principles of fasp-mc
fasp-mc is the reliable IP multicast transport protocol implemented in ascp-mc. It implements proprietary mechanisms that ensure reliability, scalability, transport efficiency and security. It is not compatible with or based on any of the publicly available reliable multicast protocol definitions such as PGM.
fasp-mc is based on the following principles:
fasp-mc does not implement any congestion avoidance or dynamic rate control mechanisms. It always sends data at a configurable target rate. On the one hand, this is due to the fact that congestion avoidance in multicast data distributions is highly complex. On the other hand, most multicast-enabled network environments offer ad hoc Quality of Service functionality with bandwidth reservations (satellite networks offer this natively).
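As an illustration of what sending at a fixed target rate implies, the following minimal sketch spaces packets evenly in time. The function name and its parameters are hypothetical and are not part of ascp-mc; the sketch only shows the arithmetic, assuming a constant packet size.

```python
def departure_times(num_packets, packet_size_bytes, target_rate_bps):
    """Return the ideal send time (seconds from start) of each packet.

    Hypothetical helper: packets are spaced evenly so that the average
    rate equals target_rate_bps, independent of any loss feedback.
    """
    if target_rate_bps <= 0:
        raise ValueError("target rate must be positive")
    interval = packet_size_bytes * 8 / target_rate_bps  # seconds per packet
    return [i * interval for i in range(num_packets)]


# Example: 1250-byte packets at 10 Mbit/s leave 1 ms apart.
schedule = departure_times(3, 1250, 10_000_000)
```

Because the spacing depends only on the configured rate, the schedule does not change when receivers report loss; that is the sense in which no dynamic rate control takes place.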
High-Level Protocol Description
An overview of the IP multicast transport protocol is as follows. (A detailed description of the multicast algorithm as described above is given in Appendix I entitled “Multicast Transmission Algorithm Specification.”) The transmission of a file or a set of files with fasp-mc occurs in distinct stages, called transmission phases. A transmission always starts with the session initiation phase, in which the sender announces the transmission and receivers join that transmission. During the continuous repair phase, the actual data transmission takes place, including repair of regular data loss. An optional out-of-window repair phase is kicked off if some receivers have not received all data during the continuous repair phase.
In order to transition from one phase to the next, the protocol uses feedback from the receivers and timing information (timeouts). In addition, there is an exclusion strategy that defines how to treat “misbehaving” receivers, and a termination strategy that decides when to terminate a transmission.
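The phase transitions described above can be sketched as a small state machine. This is a simplified illustration, not the specified algorithm: the enum, the function and its inputs are hypothetical names, and the sketch ignores the metadata subsession as well as the exclusion and termination strategies.

```python
from enum import Enum, auto


class Phase(Enum):
    SESSION_INITIATION = auto()
    CONTINUOUS_REPAIR = auto()
    OUT_OF_WINDOW_REPAIR = auto()
    DONE = auto()


def next_phase(phase, timed_out, all_data_sent, receivers_with_losses):
    """Pick the next phase from timing information and receiver feedback."""
    if phase is Phase.SESSION_INITIATION:
        # Announcements run for a fixed period; the timeout starts the transfer.
        return Phase.CONTINUOUS_REPAIR if timed_out else phase
    if phase is Phase.CONTINUOUS_REPAIR:
        if not all_data_sent:
            return phase
        # The out-of-window phase is optional: skip it when nobody reports loss.
        return Phase.OUT_OF_WINDOW_REPAIR if receivers_with_losses else Phase.DONE
    if phase is Phase.OUT_OF_WINDOW_REPAIR:
        # Stop when every loss is repaired or the (assumed) timeout expires.
        return Phase.DONE if timed_out or not receivers_with_losses else phase
    return Phase.DONE
```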
Session Initiation
The session initiation phase is used by the sender to announce a transmission and determine which receivers will join the session. During this phase, the sender sends out session announcement packets periodically for a predetermined period of time. The packets contain basic session information such as feedback server address, packet size, segment size, etc. Upon reception of a session initiation packet, a receiver who wants to join the session responds with a session acknowledgement packet, which allows the sender to know which receivers take part in the transmission. Optionally, the session initiation packets might include transfer metadata that is needed by receivers before file data can be received. See next section for details.
Metadata and Data Subsessions
The transfer of file data can only occur once the receivers know where and how to store that data. The “where and how” is called transfer metadata and contains information like destination file paths and names, file sizes, access rights, etc.
Transfer metadata is usually relatively small. In that case, it is inlined in the session initiation packet. However, if many files or entire directory trees are distributed within a single transmission, the metadata can become much larger. In that case, transfer metadata is transferred just like a regular file within the so-called metadata subsession. This subsession works exactly like the data subsession that transports the actual file data subsequently.
Subsession
A subsession consists of a continuous repair phase and an optional out-of-window repair phase.
Continuous Repair
The continuous repair phase is the main data transfer phase, during which the sender sends file data (also called original data) at its target bandwidth, unless it needs to resend lost packets. Packet loss information is determined from feedback that receivers send back to the sender regularly.
During continuous repair, data is handled such that all disk I/O is sequential (for performance reasons). This applies to both the sender as well as the receivers. The sender keeps data it has read from disk and sent out on the network in a read cache in memory until all receivers have acknowledged its reception. This prevents the sender from having to go back and re-read data from disk when packet loss is signalled. The receiver caches received non-contiguous data chunks in memory and flushes them to disk only when the missing packets are received—again with the goal to optimize disk throughput thanks to sequential disk access.
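The receiver-side caching described above can be illustrated with a minimal sketch. The class and attribute names are hypothetical, a Python list stands in for the sequential file write, and the cache size limit that a real receiver would enforce is omitted.

```python
class ReceiverCache:
    """Holds out-of-order packets in memory and flushes only contiguous data."""

    def __init__(self):
        self.next_seq = 0   # sequence number of the next packet to write
        self.pending = {}   # out-of-order packets: sequence number -> payload
        self.flushed = []   # stand-in for sequential writes to disk

    def receive(self, seq, payload):
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate packet, nothing to do
        self.pending[seq] = payload
        # Flush as long as the next expected packet is available, so that
        # data reaches the disk strictly in order.
        while self.next_seq in self.pending:
            self.flushed.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


cache = ReceiverCache()
cache.receive(0, b"a")
cache.receive(2, b"c")   # packet 1 is missing, so packet 2 stays in memory
held_back = list(cache.flushed)
cache.receive(1, b"b")   # the gap closes and packets 1 and 2 are flushed
```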
The sender and receiver caches work in a similar fashion as the sliding window in protocols like TCP, with two differences:
1. ascp-mc uses the sliding window mainly for performance reasons (sequential disk I/O) rather than for packet numbering and reliability.
2. The ascp-mc sender does not wait for full reception of the entire window by all receivers. If the window (cache) reaches a configurable maximum size, the window is simply moved along and new data is sent, potentially leaving some receivers behind with losses. These are repaired later, during the out-of-window repair phase.
To repair lost packets, the system does not just resend the corresponding originals. Instead, a repair packet is generated that has the potential to repair multiple uncorrelated (different) losses at different receivers. Repair packets are calculated based on forward error correction (FEC) techniques, where all packets in a segment (80 packets by default) are combined in such a way that a single repair packet can repair any single packet loss within the segment (and two repair packets can repair any two losses, etc.). With large receiver populations, this repair technique can reduce the amount of sent-out repair data by orders of magnitude.
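The effect of a repair packet can be illustrated with the simplest possible erasure code, a single XOR parity packet per segment. This is a deliberate simplification and an assumption of this sketch: XOR parity repairs only one loss per segment, whereas a code that repairs several losses (such as Reed-Solomon) needs more machinery. All function names are hypothetical.

```python
def xor_packets(packets):
    """XOR equal-length packets byte by byte."""
    result = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            result[i] ^= byte
    return bytes(result)


def recover(received, repair):
    """Rebuild a segment in which exactly one packet (marked None) is lost."""
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) != 1:
        raise ValueError("XOR parity repairs exactly one loss per segment")
    present = [packet for packet in received if packet is not None]
    restored = list(received)
    # XOR of the repair packet with all surviving packets yields the lost one.
    restored[missing[0]] = xor_packets(present + [repair])
    return restored


segment = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
repair = xor_packets(segment)                     # sent once to the whole group
receiver_1 = [None, b"BBBB", b"CCCC", b"DDDD"]    # lost packet 0
receiver_2 = [b"AAAA", b"BBBB", None, b"DDDD"]    # lost packet 2
```

Both receivers lost different packets, yet the same repair packet restores each segment, which is why repair traffic grows far more slowly than the number of receivers.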
Out-of-Window Repair (OOW Repair)
Due to the fact that the sender and receiver caches are limited in size, receivers might still be missing packets after the continuous repair phase. These missing packets are repaired during the OOW repair phase (named after the fact that they fell outside of the repair window of the continuous phase).
The OOW repair phase might not be able to reach the target send rate due to random disk access. However, the continuous repair phase is usually able to repair most if not all packet loss, so the amount of data to retransmit is very limited.
Exclusion Strategy
The fasp-mc protocol is designed to transfer data efficiently to thousands of receivers simultaneously. But what should it do if the various receivers behave in completely different manners (e.g. because of highly variable network conditions or different hardware performance)? What should happen if 2 out of 100 receivers observe much higher packet loss than the rest?
The answer is provided by the exclusion strategy implemented in the ascp-mc sender. It excludes individual receivers from the transmission in order to guarantee best performance for the remaining (majority of) receivers. The exclusion is based on the losses signalled by an individual receiver in relation to the loss signalled by the majority of receivers.
The exclusion strategy is configured with the following sender command line options:
With the default values, a receiver is excluded from the transmission if it has 1.5 times more losses than 90% of the remaining receivers.
Excluding a receiver does not mean that the receiver cannot participate in the transmission anymore. Instead, the sender simply ignores an excluded receiver's feedback, i.e. it will not attempt to repair that receiver's losses. The receiver continues receiving all packets sent out by the sender and might actually be re-integrated in the transmission as a regular receiver if its average loss rate approaches that of the majority.
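One possible reading of the exclusion rule is sketched below. It assumes that the default values denote a loss factor of 1.5 and a majority of 90%; the function and parameter names are hypothetical and do not correspond to actual sender options.

```python
def excluded_receivers(losses, factor=1.5, majority=0.9):
    """Return the ids of receivers whose feedback the sender would ignore.

    losses maps a receiver id to its reported loss rate. A receiver is
    excluded when its loss exceeds `factor` times the loss of at least
    `majority` of the other receivers (an assumed interpretation).
    """
    excluded = set()
    for receiver_id, loss in losses.items():
        others = [l for other_id, l in losses.items() if other_id != receiver_id]
        if not others:
            continue  # a single receiver has no majority to compare against
        outdone = sum(1 for other_loss in others if loss > factor * other_loss)
        if outdone >= majority * len(others):
            excluded.add(receiver_id)
    return excluded


# Nine receivers report 1% loss, one reports 20% loss.
reported = {f"r{i}": 0.01 for i in range(9)}
reported["r9"] = 0.20
outliers = excluded_receivers(reported)
```

Because the check is re-evaluated from current feedback, a receiver whose loss rate falls back toward that of the majority simply stops appearing in the result, which matches the re-integration behavior described above.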
Termination Strategy
In the optimal case, deciding when to terminate a transmission is simple: as soon as all receivers have received the entire content. With many heterogeneous receivers and external transmission constraints (e.g. a deadline), the problem becomes more subtle.
As with the exclusion strategy, the ascp-mc sender allows the termination behavior to be configured. The termination strategy offers the following criteria and conditions for ending a transfer:
Coverage
Terminate the transmission if the coverage (i.e. the number of receivers that have successfully received all content) reaches a given percentage.
Time
Terminate the transmission if a given absolute time (deadline) is reached or if the transmission has been running for a maximum duration.
Volume
Terminate the transmission if the total amount of bytes sent exceeds a given threshold, e.g. 1.5 times the file data.
Several of these criteria can be used at the same time. The first one that matches terminates the transmission.
The coverage condition provides additional criteria that allow the transmission to continue in order to increase the coverage further, but only if the cost for doing so is reasonable. This is expressed as a coverage increase that needs to be reached without exceeding a given transmission volume increase.
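The combination of the coverage, time and volume criteria can be sketched as follows. The class and field names are hypothetical, the time criterion is reduced to a maximum duration, and the additional coverage-increase condition is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TerminationPolicy:
    min_coverage: Optional[float] = None       # fraction of receivers that are complete
    max_duration_s: Optional[float] = None     # maximum running time in seconds
    max_volume_factor: Optional[float] = None  # bytes sent relative to the file data

    def should_terminate(self, complete, total_receivers, elapsed_s,
                         bytes_sent, file_bytes):
        """Return the name of the first matching criterion, or None."""
        if (self.min_coverage is not None and total_receivers > 0
                and complete / total_receivers >= self.min_coverage):
            return "coverage"
        if self.max_duration_s is not None and elapsed_s >= self.max_duration_s:
            return "time"
        if (self.max_volume_factor is not None
                and bytes_sent > self.max_volume_factor * file_bytes):
            return "volume"
        return None


policy = TerminationPolicy(min_coverage=0.95, max_duration_s=3600,
                           max_volume_factor=1.5)
```

A criterion left at None is simply not evaluated, so any subset of the three can be active at once.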
Feedback Rate
A main challenge for scaling transmissions to thousands of receivers is minimizing the feedback traffic from receivers. This is achieved with multiple techniques including the reduction of the feedback information itself, the frequency of feedback messages and feedback suppression. ascp-mc offers a very simple mechanism to tune the system in that respect by exposing a maximum aggregate feedback bandwidth that the sender is willing to accept. Based on that value, the algorithm will tune all necessary parameters accordingly.
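To illustrate how a single aggregate bandwidth limit can drive one of these parameters, the sketch below derives the interval at which each receiver may send feedback. It covers only feedback frequency, assumes a fixed feedback packet size and uses hypothetical names; the tuning actually performed by the algorithm is not specified here.

```python
def feedback_interval_s(num_receivers, feedback_packet_bytes, max_feedback_bps):
    """Seconds between feedback messages per receiver, so that all
    receivers together stay within max_feedback_bps (bits per second)."""
    if max_feedback_bps <= 0:
        raise ValueError("feedback bandwidth must be positive")
    return num_receivers * feedback_packet_bytes * 8 / max_feedback_bps


# 1000 receivers, 100-byte feedback packets, 100 kbit/s accepted in total.
interval = feedback_interval_s(1000, 100, 100_000)
```

The interval grows linearly with the number of receivers, so the aggregate feedback load at the sender stays constant as the population scales.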
ascp-mc Application
The ascp-mc application is available as two command-line applications for Linux operating systems, sender.sh and receiver.sh, which are bundled in the same tarball.
Installation
Extract the tarball to an installation directory. The tarball includes the necessary Java Runtime Environment (JRE). Extraction creates the following directory structure:
Command Line Usage
In order to execute a transmission, the receivers have to be started first:
receiver.sh [OPTIONAL ARGUMENTS]
Subsequently, the sender is launched with the desired source file or directory to be transmitted.
sender.sh [OPTIONAL ARGUMENTS] SOURCE [DESTINATION]
Receiver Command Line Options
java MulticastReceiverApp [options . . . ]
Sender Command Line Options
java MulticastSenderApp [options . . . ] SOURCE DESTINATION
SOURCE: The source path. May denote a single file or a directory. In the latter case, all files in the directory and any subdirectories are transferred.
DESTINATION: The destination path, relative to the receivers' docroots.
Original data—Unmodified data of the source files to transfer.
FEC repair data—Repair data calculated from original data using a small-block, forward-error-correction code (Reed-Solomon) on a segment of data.
Original repair data—Retransmission of original data. Used when an entire segment needs to be resent.
Segment—A fixed-size chunk of the original data. Original data is (virtually) divided into segments, which serve as the basis for FEC computations.
The invention has been described in conjunction with the foregoing specific embodiments. It should be appreciated that those embodiments may also be combined in any manner considered to be advantageous. Also, many alternatives, variations, and modifications will be apparent to those of ordinary skill in the art. Other such alternatives, variations, and modifications are intended to fall within the scope of the following appended claims.
This application claims priority to U.S. Provisional Patent Application Ser. No. 61/473,270, filed on Apr. 8, 2011, which is hereby incorporated by reference in its entirety. This application incorporates by reference the specifications of the following applications in their entirety: U.S. Provisional Patent Application Ser. No. 60/638,806, filed Dec. 24, 2004, entitled: “BULK DATA TRANSFER PROTOCOL FOR RELIABLE, HIGH-PERFORMANCE DATA TRANSFER WITH INDEPENDENT, FULLY MANAGEABLE RATE CONTROL”; U.S. patent application Ser. No. 11/317,663, filed Dec. 23, 2005, entitled “BULK DATA TRANSFER”; and U.S. patent application Ser. No. 11/849,782, filed Sep. 4, 2007, entitled “METHOD AND SYSTEM FOR AGGREGATE BANDWIDTH CONTROL”.
Number | Date | Country
---|---|---
61473270 | Apr 2011 | US