The present invention relates generally to a method and system for allocating network resources for bit streams, and more particularly to dynamically allocating resources for multimedia bit streams.
Networks are the principal means for communicating multimedia between communication devices. The content of the multimedia can include data, audio, text, images, video, etc. Communication devices include input/output devices, computers, terminals, multimedia workstations, fax machines, printers, servers, telephones, and personal digital assistants.
A multimedia network typically includes network switches connected to each other and to the communication devices by circuits. The circuits can be physical or virtual. In the latter case, the circuit is specified by a source and destination address. The actual physical circuit used will vary over time, depending on network traffic and resource requirements and availability, such as bandwidth.
The multimedia can be formatted in many forms, but increasingly it is formatted into packets. Packets in transit between the communication devices may temporarily be stored in buffers at the switches along the path of the circuit pending sufficient available bandwidth on subsequent circuits along the path.
Important considerations in network operation are admission control and resource allocation. Typically, admission control and resource allocation are ongoing processes that are performed periodically during transmission of bit streams. The admission control and resource allocation determinations may take into account various factors, such as network topology; currently available network resources, such as buffer space in the switches and capacity in the circuits; and any quality-of-service (QoS) commitments, e.g., guaranteed bandwidth, delay, or packet loss probabilities.
The admission control and resource allocation problem is complicated when a variable bit-rate (VBR) multimedia source or communication device seeks access to the network and requests a virtual circuit for streaming data. The complication arises because the features that describe the variations in the content of the multimedia are often imprecise. Thus, it is difficult to predict the future network resource requirements of the VBR source, such as its bandwidth requirements. For example, the bandwidth requirements of VBR sources typically vary with time, and these variations are difficult to characterize. As a result, the admission and allocation decision is made with information that may not accurately reflect the demands that the VBR source will place on the network, which can degrade network performance.
More particularly, if the network resource requirements are overestimated, then the network will run under capacity. Conversely, if the network resource requirements are underestimated, then the network may become congested and packets traversing the network may be lost; see, e.g., Roberts, “Variable-Bit-Rate Traffic-Control in B-ISDN,” IEEE Comm. Mag., pp. 50-56, September 1991; Elwalid et al., “Effective Bandwidth of General Markovian Traffic Sources and Admission Control of High Speed Networks,” IEEE/ACM Trans. on Networking, Vol. 1, No. 3, pp. 329-343, 1993; Guerin et al., “Equivalent Capacity and its Application to Bandwidth Allocation in High-Speed Networks,” IEEE J. Sel. Areas in Comm., Vol. 9, No. 7, pp. 968-981, September 1991.
Transmission of digital multimedia over bandwidth-limited networks will become increasingly important in future Internet and wireless communication. It is a challenging problem to cope with ever-changing network parameters, such as the number of multimedia sources and receivers, the bandwidth required by each stream, and the topology of the network itself. Optimal resource allocation should dynamically consider global strategies, i.e., global network management, as well as local strategies, such as admission control during individual connections.
Bandwidth allocation and management for individual bit streams is generally done at the “edges” of the network in order to conserve computational resources of the network switches. While off-line systems can determine the exact bandwidth characteristics of a stream in advance, in many applications, on-line processing is desired or even required to keep delay and computational requirements low. Furthermore, any information used to make bandwidth decisions should be directly available in the compressed bit stream. It is desirable to have a resource management system that can accurately estimate the required bandwidth in real-time using only compressed domain information.
Resource Renegotiating for VBR Video
Of all multimedia, it is particularly desirable to improve resource allocation for VBR video and audio data, which are becoming increasingly popular because of their consistent visual and acoustic quality. The hallmark of VBR data is that its bandwidth undergoes both short-term and long-term changes in reaction to the complexity, and therefore the compressibility, of the underlying content. The long-term variations are the more difficult to handle, so it is desirable to be able to predict the required bandwidth over longer intervals.
As stated above, allocating a constant amount of bandwidth to a VBR stream usually yields one or both of the following results: inefficient use of network resources, due to over- or under-allocated bandwidth, and a need for large network buffers, with consequent delay. Therefore, the bandwidth requests made by the VBR source should be periodically renegotiated in order to obtain high network utilization and low delay. Determining appropriate renegotiation points is also a problem. If renegotiation is too frequent, overhead increases; if renegotiation is too infrequent, the bandwidth estimates are coarse.
Conventional methods typically renegotiate resources according to changes in bit-stream-level statistics; see Zhang et al., “RED-VBR: A new approach to support delay-sensitive VBR video in packet-switched networks,” Proc. NOSSDAV, pp. 258-272, 1995. The relationship between past and future traffic is parametrically modeled in techniques described by Chong et al., “Predictive dynamic bandwidth allocation for efficient transport of real-time VBR video over ATM,” IEEE J. Sel. Areas in Comm., Vol. 13, No. 1, pp. 12-23, 1995, and by Izquierdo et al., “A survey of statistical source models for variable bit-rate compressed video,” Multimedia Systems, Vol. 7, No. 3, pp. 199-213, 1999, and references therein.
Content-based methods are motivated by the high correlation between long-term traffic characteristics and video content; see Dawood et al., “MPEG video modeling based on scene description,” Proc. IEEE ICIP, Vol. 2, pp. 351-355, 1998, and Bocheck et al., “Content-based VBR traffic modeling and its application to dynamic network resource allocation,” Research Report 48c-98-20, Columbia Univ., 1998. Although multimedia content is a major factor in determining the bandwidth allocation, content alone may not be sufficient for predicting future traffic or for estimating how much resource to request.
Bandwidth Renegotiation Points
In the prior art, on-line determination of bandwidth renegotiation points for VBR content generally falls into three categories: deterministic, traffic-based, and content-based.
Deterministically setting the renegotiation points is the simplest method. Bandwidth requests are made every n frames, where n is an empirically determined balance between request overhead and correlation of bit-rates.
Traffic-based renegotiation occurs when a stream exceeds a previously negotiated bandwidth request, or when utilization drops below some threshold level. Although traffic-based renegotiation tracks the real bandwidth more closely, a single complex frame in a video can cause the requested bandwidth to remain unnecessarily elevated for some time.
A more “natural” renegotiation point is content-based, for example, a scene or “shot” boundary. A shot is defined as all frames acquired in a continuous sequence between when the camera's shutter opens and closes. By examining the bits used per frame in the VBR video, one can learn that the most dramatic change in bit usage occurs at the beginning of a new segment. Within a single segment, the traffic characteristics are usually relatively constant. If a segment has a sudden change in content features, the change can be considered another segment boundary, as far as renegotiation is concerned.
Many methods are known for finding segment boundaries in the compressed domain; see, for example, Yeo et al., “Rapid scene analysis on compressed video,” IEEE Tr. Circuits and Systems for Video Tech., Vol. 5, No. 6, pp. 533-544, 1995. That method uses a windowed relative threshold on the sum of absolute pixel differences and allows fast, on-line determination of renegotiation points.
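A windowed relative threshold can be sketched as follows, in the spirit of the Yeo et al. rule: a frame is declared a boundary when its frame difference is the largest in a sliding window and exceeds the next-largest value in that window by a fixed ratio. This is a minimal sketch that assumes the per-frame sums of absolute pixel differences (e.g. on DC images) are already available; the function name, window size, and ratio are illustrative assumptions, not values from the source.

```python
def shot_boundaries(diffs, half_window=5, ratio=3.0):
    """Return frame indices k flagged as segment boundaries (hypothetical helper).

    diffs[k] is assumed to be the sum of absolute pixel differences between
    frame k-1 and frame k. A boundary is declared when diffs[k] is at least
    `ratio` times the largest other value in a window centered on k.
    """
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1 so a boundary is the window maximum")
    boundaries = []
    for k, d in enumerate(diffs):
        lo = max(0, k - half_window)
        hi = min(len(diffs), k + half_window + 1)
        others = [diffs[i] for i in range(lo, hi) if i != k]
        if not others:
            continue
        # Relative threshold: compare against the second-largest value in the window.
        if d > 0 and d >= ratio * max(others):
            boundaries.append(k)
    return boundaries
```

Because the threshold is relative to the local window rather than absolute, a uniformly busy scene does not trigger spurious boundaries; only isolated peaks do.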
Bandwidth Request Per Interval
The next step is to determine how much resource to request at each renegotiation point, without introducing significant delay. For natural renegotiation points such as segment boundaries, previous traffic cannot generally help to determine how much resource to request when the traffic pattern has changed. With the requirement of on-line processing in mind, one can predict the traffic for the entire segment based on a short observation of the beginning part of a new segment, as illustrated in FIG. 1.
In
The content-based prediction method described by Bocheck et al. includes training and testing stages. In the training stage, content features of a training video are quantized into a small number of levels, e.g., slow, medium, or fast motion. Every possible combination of significant features is labeled as a content class for which a typical traffic pattern is determined. During testing, the content class of each segment in the video is identified by extracting the same features, and the typical traffic pattern of the class is used as the predicted traffic for that segment.
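The class-based prediction summarized above can be illustrated with a minimal sketch. The quantization thresholds, the choice of an element-wise mean as the "typical traffic pattern" of a class, and all function names are assumptions for illustration; they are not Bocheck et al.'s actual procedure.

```python
from collections import defaultdict
from statistics import mean


def quantize(value, thresholds):
    """Map a feature value to a level index (e.g. 0=slow, 1=medium, 2=fast)."""
    return sum(1 for t in thresholds if value >= t)


def train_classes(segments, thresholds):
    """Training stage: segments is a list of (feature_values, traffic_pattern)."""
    buckets = defaultdict(list)
    for features, pattern in segments:
        # Each combination of quantized feature levels is one content class.
        label = tuple(quantize(v, th) for v, th in zip(features, thresholds))
        buckets[label].append(pattern)
    # Assumed "typical" pattern: element-wise mean of the class's training patterns.
    return {label: [mean(col) for col in zip(*patterns)]
            for label, patterns in buckets.items()}


def predict(features, thresholds, table):
    """Testing stage: look up the typical pattern of the segment's class."""
    label = tuple(quantize(v, th) for v, th in zip(features, thresholds))
    return table.get(label)  # None if the class never appeared in training
```

The sketch makes the first weakness noted below concrete: every feature contributes equally to the class label, and the number of classes grows multiplicatively with the number of quantization levels.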
However, the Bocheck method has some potential weaknesses. First, the specific prediction structure, via classification, can only feasibly incorporate a limited number of coarsely quantized features; each feature is weighted equally, rather than by its relevance to traffic. Second, prediction based solely on content may not be applicable for bit streams produced with different encoding algorithms or parameters. Third, not all available information during the observation periods is used at the renegotiation points.
Inaccurate predictions can cause allocation requests not to be granted or insufficient resources to be requested. This may result in denial of service, dropped packets, or transcoding to a lower bit-rate, perhaps with degraded quality.
Therefore, there is a need for an improved method and system for dynamically allocating network resources at renegotiation points while transferring multimedia content over a network.
Dynamic resource allocation is critical in the transmission of multimedia bit streams, especially video and audio data. Although content is one of the major factors that controls the bandwidth requirements for the bit streams, content alone is insufficient for predicting future traffic patterns and for determining how much network resources to request. The present invention provides a method for dynamically predicting resource requirements taking into account both content features and available short-term traffic features.
More specifically, the invention provides a method and system for dynamically allocating network resources while transferring a bit stream in a network. The method extracts first content features from the bit stream to determine renegotiation points and observation periods. Second content features and traffic features are extracted from the bit stream during the observation periods. The second content features and the traffic features are combined in a prediction neural network to determine the network resources to be allocated at the renegotiation points. The bit stream can have a variable or constant bit-rate. The features to be extracted can be selected from a training bit stream using sequential forward selection, a consistency measure, or a combination of both.
a is a block diagram of a selection neural network for selecting features;
b is a block diagram of a process for selecting features according to consistency measures;
c is a block diagram of a hybrid feature selection process;
As shown in
As shown in
Although the problem of predicting long-term or future traffic based on short-term traffic can be handled via parametric modeling, it is difficult to derive a simple and effective parametric model when incorporating content features. For this reason, we describe the use of a prediction neural network to accomplish the prediction task.
As shown in
We use the time between the content boundaries 221 and the renegotiation points 301 as observation periods 401. During each observation period 401, we extract additional content features 201 and traffic features 202.
The observed content and traffic features are classified and analyzed, and the selected content features and traffic features are combined by the prediction neural network 400. Note that the combining in the prediction neural network can be weighted on a range of zero to one. For example, in some applications, the weight of the content features can be zero and the weight of the traffic features can be one, so that the prediction is based entirely on the traffic features. Back-propagation, as described by Kung, “Digital Neural Networks,” Prentice Hall, 1993, can be applied during training to determine the weights. The prediction neural network predicts the network resources 410 required at the renegotiation points 301 from the combined content and traffic features.
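The combination and training step can be sketched with a one-hidden-layer perceptron trained by back-propagation. This is a minimal, dependency-free sketch; the layer sizes, tanh hidden units, learning rate, and the `combine` helper that scales each feature group by a weight in [0, 1] are assumptions, not the network architecture of the source.

```python
import math
import random


class PredictionNet:
    """One-hidden-layer perceptron trained by back-propagation (illustrative)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def train_step(self, x, target, lr=0.05):
        """One stochastic gradient step on the loss 0.5 * ||y - target||^2."""
        h, y = self.forward(x)
        err = [yi - ti for yi, ti in zip(y, target)]
        # Back-propagate to the hidden layer using the weights before the update.
        delta_h = [(1.0 - hj * hj) * sum(err[o] * self.w2[o][j] for o in range(len(err)))
                   for j, hj in enumerate(h)]
        for o, e in enumerate(err):
            for j, hj in enumerate(h):
                self.w2[o][j] -= lr * e * hj
            self.b2[o] -= lr * e
        for j, d in enumerate(delta_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d * xi
            self.b1[j] -= lr * d
        return 0.5 * sum(e * e for e in err)


def combine(content, traffic, w_content=0.5, w_traffic=0.5):
    """Scale each feature group by a weight in [0, 1] and concatenate them."""
    return [w_content * c for c in content] + [w_traffic * t for t in traffic]
```

Setting `w_content=0.0, w_traffic=1.0` reproduces the traffic-only case mentioned above, since the content inputs then contribute nothing to the hidden units.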
Feature Selection
As shown in
Sequential Forward Selection and General Regression Neural Network
The feature selection 602 can be performed according to one of the following three feature evaluation and selection procedures.
In a first procedure, we use a non-linear one-pass selection based on sequential forward selection (SFS) and a general regression neural network (GRNN) to select a subset of relevant features 501-505 for traffic prediction. The principles of SFS and GRNN are described generally by Kittler, in “Feature set search algorithms,” Pattern Recognition and Signal Processing, C. H. Chen, Ed., Sijthoff & Noordhoff, 1978, and by Specht, in “A general regression neural network,” IEEE Trans. Neural Networks, Vol. 2, No. 6, pp. 568-576, 1991, respectively. Neither reference describes combining SFS with a GRNN, or using that combination for feature selection in a network resource allocation context.
The SFS procedure selects the best single feature as the first feature of the subset 501. Next, each of the other candidate features is evaluated with the first feature to find the best two features including the first feature. This is repeated until a desired number of features have been selected. The SFS method is suitable for this purpose because it is capable of incrementally constructing relevant subsets from a single feature. Thus, the construction of subsets of features can be done without requiring the observation of many possible subsets.
As shown in
To evaluate the relevance of the subset features 501-505, we consider the mean square error (MSE) between actual and estimated values of traffic features. In a preferred embodiment, the actual and estimated values are expressed in terms of the principal components (PCA) of D-BIND traffic features, which are described in greater detail below. Consider the full feature set F 500 and the subset of features $F_m$ 501-505. We denote the training data by $(\mathbf{x}_{F,p}, y_p)$, $p = 1, \ldots, P$, where $\mathbf{x}_{F,p}$ is the p-th of P training vectors over the full feature set F 500, and $y_p$ is the ground-truth data that we wish to approximate, i.e., the actual D-BIND PCA values. The mapping from the subset of features to the approximated data is denoted by $g(\mathbf{x}_{F_m,p})$, where $\mathbf{x}_{F_m,p}$ is $\mathbf{x}_{F,p}$ restricted to the features in $F_m$, and the MSE of the subset is
$$E(F_m) = \frac{1}{P}\sum_{p=1}^{P} \left\lVert y_p - g(\mathbf{x}_{F_m,p}) \right\rVert^2.$$
Beginning with the empty subset $F_m$, we individually evaluate the relevance of each remaining feature in the complementary set $F - F_m$. At each iteration, the feature whose addition yields the lowest MSE is added to $F_m$. At the end of this process, the subset $F_m$ contains the minimum number of features that yield the lowest MSE.
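The greedy search can be sketched independently of the error model. In this minimal sketch, `evaluate` is an assumed callable that returns the MSE of a candidate subset (in the procedure above it would train a GRNN on that subset and compare against the D-BIND PCA values); the early stop when no candidate lowers the error is one reading of "the minimum number of features that yield the lowest MSE".

```python
def sequential_forward_selection(features, evaluate, max_features=None):
    """Greedy SFS: grow the subset one feature at a time (illustrative).

    features: candidate feature names.
    evaluate: assumed callable mapping a tuple of names to an error (lower is better).
    Returns (selected_subset, error_after_each_addition).
    """
    remaining = list(features)
    selected, history = [], []
    best_err = float("inf")
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate every remaining feature together with those already chosen.
        scores = [(evaluate(tuple(selected + [f])), f) for f in remaining]
        err, best_f = min(scores, key=lambda s: s[0])
        if err >= best_err:
            break  # no candidate lowers the error any further
        selected.append(best_f)
        remaining.remove(best_f)
        history.append(err)
        best_err = err
    return selected, history
```

Only on the order of n^2/2 subsets are evaluated for n candidates, rather than all 2^n subsets, which is why SFS suits an expensive evaluator such as a GRNN.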
The mapping of the features is defined by the selection neural network 700, a GRNN that includes a first layer 702 and a second layer 703.
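Given the set of training data, we associate each sample point with a single Gaussian kernel of the first network layer 702, with the p-th training vector $\mathbf{x}_p$ as the center of the p-th kernel. For an arbitrary input vector $\mathbf{x}$ 701, the output of the p-th unit is given by
$$u_p(\mathbf{x}) = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{x}_p \rVert^2}{2\sigma^2}\right),$$
where σ is a user-specified smoothing parameter. The GRNN output 704, which represents the estimated function value for $\mathbf{x}$, is given by the following convex combination,
$$g(\mathbf{x}) = \sum_{p=1}^{P} \alpha_p(\mathbf{x})\, y_p,$$
where the coefficients $\alpha_p$ are defined as follows:
$$\alpha_p(\mathbf{x}) = \frac{u_p(\mathbf{x})}{\sum_{q=1}^{P} u_q(\mathbf{x})}.$$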
Intuitively, the GRNN 700 performs interpolation by linearly combining the given training outputs using a set of adaptively determined coefficients.
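The two-layer GRNN estimate (Gaussian kernels, then normalized weighting of the training outputs) fits in a few lines. This is a minimal sketch for scalar targets; the function names are hypothetical, and the leave-one-out MSE helper is one assumed way to score a feature subset during selection, since scoring a GRNN on its own training points would trivially give near-zero error.

```python
import math


def grnn_predict(x, train_x, train_y, sigma=1.0):
    """GRNN estimate at x: kernel-weighted convex combination of train_y."""
    # First layer: one Gaussian kernel per training vector, centered at x_p.
    weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, xp)) / (2.0 * sigma ** 2))
               for xp in train_x]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel outputs underflowed; increase sigma")
    # Second layer: coefficients alpha_p = u_p / sum(u_q) sum to one.
    return sum(w * y for w, y in zip(weights, train_y)) / total


def grnn_loo_mse(train_x, train_y, sigma=1.0):
    """Leave-one-out MSE of the GRNN on the given training data."""
    errs = []
    for p in range(len(train_x)):
        xs = train_x[:p] + train_x[p + 1:]
        ys = train_y[:p] + train_y[p + 1:]
        errs.append((grnn_predict(train_x[p], xs, ys, sigma) - train_y[p]) ** 2)
    return sum(errs) / len(errs)
```

Because the output is a convex combination, the estimate always lies within the range of the training outputs; σ controls how local that interpolation is.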
Consistency Measure-Based Feature Selection
A second evaluation procedure selects features according to a consistency measure.
A consistency measure C is determined 716 for each set of features.
We want the classes to be compact and well separated from other classes. Therefore, a good feature has a small intra-class distance, and large inter-class distance, yielding a large consistency measure C. The distance measure can be Euclidean. The preferred consistency measure considers content features that are related to traffic in a monotonic way.
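Since the exact formula for C is not reproduced here, the following sketch assumes one common form that matches the stated intent: the mean Euclidean distance between class centroids divided by the mean distance of samples to their own class centroid. The form of C and all names are assumptions; the classes would be traffic classes and the vectors the candidate content features.

```python
import math
from itertools import combinations


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def consistency(classes):
    """Assumed consistency measure: mean inter-class / mean intra-class distance.

    classes: dict mapping a traffic-class label to a list of feature vectors
    (at least two classes). Larger values mean compact, well-separated classes.
    """
    cents = {label: centroid(pts) for label, pts in classes.items()}
    intra = [euclid(p, cents[label]) for label, pts in classes.items() for p in pts]
    inter = [euclid(cents[a], cents[b]) for a, b in combinations(cents, 2)]
    mean_intra = sum(intra) / len(intra)
    mean_inter = sum(inter) / len(inter)
    return math.inf if mean_intra == 0 else mean_inter / mean_intra
```

A feature whose values separate the traffic classes cleanly scores higher, so ranking candidate features by this value gives the "largest C" selection described next.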
We select a subset of features 603 that give the largest C values. In decreasing order of importance, these features include an I-frame spatial complexity 501, the mean magnitude of the acceleration vectors 502, the mean magnitude of the motion vectors 503, and the spatial variance of the motion vectors 504. Other features can also be used if they increase the consistency measure C.
The first, I-frame spatial complexity, directly affects peak bandwidth requirements for future I-frames in the segment, and indirectly, peak bandwidth requirements of P and B frames. The spatial complexity can be estimated using a weighted sum of the magnitudes of the AC coefficients for each macroblock of the I-frame.
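A minimal sketch of the spatial-complexity estimate, assuming the quantized DCT coefficients of each 8x8 block are available as a flat list of 64 values with the DC term at index 0. The per-coefficient weighting function is hypothetical (uniform by default), and averaging the per-macroblock sums over the frame is an assumption about how the per-macroblock values are aggregated.

```python
def spatial_complexity(macroblocks, weight=lambda idx: 1.0):
    """Estimate I-frame spatial complexity from DCT coefficients (illustrative).

    macroblocks: list of macroblocks; each is a list of 8x8 blocks, and each
                 block is a flat list of 64 coefficients with DC at index 0.
    weight:      hypothetical per-coefficient weight (uniform by default).
    Returns the mean over macroblocks of the weighted sum of AC magnitudes.
    """
    totals = []
    for mb in macroblocks:
        s = 0.0
        for block in mb:
            # Skip the DC coefficient (index 0); sum weighted AC magnitudes.
            s += sum(weight(i) * abs(c) for i, c in enumerate(block) if i > 0)
        totals.append(s)
    return sum(totals) / len(totals) if totals else 0.0
```

Flat regions contribute almost nothing because their energy sits in the DC term, while texture and edges raise the AC magnitudes and therefore the predicted I-frame size.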
Motion vectors from adjacent P frames are subtracted to form “acceleration” vectors. The mean magnitude of the acceleration vectors forms our second content feature,
$$\bar{a}_k = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} \left\lVert \vec{m}_k(i,j) - \vec{m}_{k-1}(i,j) \right\rVert,$$
where $\vec{m}_k(i,j)$ is the forward motion vector for macroblock (i, j) of frame k, $\vec{m}_{k-1}(i,j)$ is the corresponding motion vector of the adjacent preceding P frame, and M and N are the frame dimensions in macroblocks. A high value of the mean magnitude indicates that the motion in the video is complex and that the residue frames will become increasingly complex, thus requiring more bits.
Similarly, the mean magnitude of the motion vectors is a measure of how much motion compensation is needed, and therefore, an indication of how complex the residue frames are likely to be. Finally, we measure the spatial covariance of the x and y motion vector components.
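The three motion-based features can be computed directly from the macroblock motion vectors of two adjacent P frames. This is a minimal sketch; the function names and the raster-ordered list-of-tuples input format are assumptions, and the covariance is the population covariance of the x and y components over the frame.

```python
import math


def mean_magnitude(vectors):
    """Mean Euclidean length of a non-empty list of (vx, vy) vectors."""
    return sum(math.hypot(vx, vy) for vx, vy in vectors) / len(vectors)


def motion_features(mv_prev, mv_curr):
    """Content features from forward motion vectors of two adjacent P frames.

    mv_prev, mv_curr: lists of (vx, vy), one per macroblock, M*N entries each,
    in the same raster order.
    """
    # Acceleration vectors: current minus previous motion vector per macroblock.
    accel = [(cx - px, cy - py) for (px, py), (cx, cy) in zip(mv_prev, mv_curr)]
    n = len(mv_curr)
    mean_x = sum(v[0] for v in mv_curr) / n
    mean_y = sum(v[1] for v in mv_curr) / n
    cov_xy = sum((vx - mean_x) * (vy - mean_y) for vx, vy in mv_curr) / n
    return {
        "mean_acceleration": mean_magnitude(accel),
        "mean_motion": mean_magnitude(mv_curr),
        "cov_xy": cov_xy,
    }
```

All three quantities come straight from the compressed bit stream, so no decoding to the pixel domain is needed.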
Hybrid SFS/GRNN and Consistency Based Feature Selection
A third technique for feature selection uses a hybrid approach that combines the SFS/GRNN procedure with the consistency-based procedure.
Traffic Descriptors
Many descriptors of traffic are known. Among them, the peak rate, the average rate, and the mean rate are simple ones. However, these descriptors do not capture the traffic patterns over different time scales. To overcome this problem, we use the D-BIND traffic descriptor.
D-BIND is a vector that includes a maximum allowed arrival rate for various time intervals. D-BIND provides a performance guarantee for the worst case. It is defined as follows.
The cumulative number of bits arriving during a time interval that begins at time τ and has length t is A[τ, τ+t]. The tightest bound over all time, called the empirical envelope, is
$$B^*(t) = \sup_{\tau \ge 0} A[\tau, \tau + t].$$
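A piecewise-linear bounding function $B_{W_T}(t)$ is parameterized by
$$W_T = \{(q_k, t_k) \mid k = 1, 2, \ldots, p\},$$
which is a vector of bit arrival and interval pairs. Given a set of $t_k$, the tightest such function is denoted $B^*_{W_T}(t)$; it is obtained by setting $q_k = B^*(t_k)$.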
The D-BIND descriptor is usually expressed in terms of arrival rates:
$$R_T = \{(r_k, t_k) \mid k = 1, 2, \ldots, p\},$$
where $r_k = q_k / t_k$. This descriptor captures both the short-term “burstiness” and the long-term traffic characteristics of a bit stream, while being relatively simple to implement in admission control and policing.
Fixing $[t_1, \ldots, t_p]$, D-BIND can be described by the vector $[r_1, \ldots, r_p]$. We use $r_1$ through $r_4$ 505 as traffic features.
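The D-BIND rate vector can be computed from a trace of bits per frame. This minimal sketch assumes time is measured in whole frames, so $B^*(t_k)$ is the maximum number of bits in any run of $t_k$ consecutive frames, and rates are reported in bits per second using an assumed frame rate (30 frames per second, as in the evaluation video below). Function names are hypothetical.

```python
def empirical_envelope(bits, t):
    """B*(t): maximum number of bits in any window of t consecutive frames."""
    if not 1 <= t <= len(bits):
        raise ValueError("interval must be between 1 frame and the trace length")
    window = sum(bits[:t])
    best = window
    for k in range(t, len(bits)):
        # Slide the window one frame: add the new frame, drop the oldest.
        window += bits[k] - bits[k - t]
        best = max(best, window)
    return best


def dbind_rates(bits, intervals, frame_rate=30.0):
    """D-BIND rates r_k = B*(t_k) / t_k, with t_k given in frames, in bits/s."""
    return [empirical_envelope(bits, t) / (t / frame_rate) for t in intervals]
```

The example below also illustrates the redundancy noted next: as $t_k$ grows to the length of the trace, $r_k$ falls to the mean rate.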
When describing an entire segment, the dimensionality of D-BIND becomes large and the prediction complexity goes up. Such an increase is rather wasteful as there is some redundancy in D-BIND. For example, the value rk approaches the mean bit-rate for large k.
Redundancy Check
In order to reduce prediction complexity, we provide two solutions in the form of a redundancy check 734.
In a first embodiment, we apply principal component analysis (PCA) to the selected subset of features and use the first N principal components as input descriptors to the prediction neural network 400. Thus, the prediction neural network 400 can dynamically predict the N values.
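A dependency-free sketch of the PCA step follows, using power iteration with deflation on the sample covariance matrix. In practice a linear-algebra library would be used; this version, its names, and the iteration count are assumptions meant only to show how N component scores are obtained from, for example, D-BIND rate vectors.

```python
import math
import random


def principal_components(rows, n_components, iters=500, seed=0):
    """Top principal components via power iteration with deflation.

    rows: list of equal-length feature vectors (at least two rows).
    Returns (mean, components); each component is a unit-length list.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining variance is zero
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next-largest component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, comps


def project(row, mean, comps):
    """Scores of one vector on the retained components (network inputs)."""
    return [sum((row[j] - mean[j]) * c[j] for j in range(len(row))) for c in comps]
```

The sign of each component is arbitrary, so only the magnitudes of the scores are comparable across separate runs.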
In a second embodiment, we directly determine cross-correlations between pairs in the selected subset of features. Given that certain pairs of features exhibit high correlation, we can reduce the size of the subset by eliminating redundant features.
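The cross-correlation check can be sketched as a single pass that keeps a feature only if its absolute Pearson correlation with every feature already kept stays below a threshold. The threshold value, the use of insertion order as priority, and the names are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def drop_redundant(columns, threshold=0.95):
    """Return the names of features kept after removing highly correlated ones.

    columns: dict mapping feature name to its values over the training data;
    earlier entries take priority when two features are redundant.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Ordering the columns by the relevance ranking from feature selection ensures that, of each correlated pair, the more useful feature survives.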
Detailed Structure of Dynamic Resource Allocation
The detailed structure of our method is shown in FIG. 8. There are three major blocks: feature extraction 801, feature selection and traffic analysis 802, and traffic prediction 803. The heavy lines 804 indicate data flows used during training and feature selection, as described above.
Compressed domain processing 806 can use windowed relative thresholds on the sum of absolute pixel differences to perform temporal segmentation 810 of the input multimedia 220 to determine the renegotiation points 301 and the following observation periods 401 of FIG. 4. The features extracted during the observation periods are passed forward for feature selection 602 using any of the three procedures described above. The selected subset of features is passed to the prediction neural network 400.
A traffic descriptor 812 is derived from the extracted traffic features 202. The descriptor can be used to classify traffic patterns as described above. The dimensionality of the patterns can be reduced by principal component analysis, and a reduced-dimensionality traffic descriptor is provided to the prediction neural network 400, to be used in conjunction with the final subset of selected features 603 to predict the network resources 410 to be requested at the renegotiation points 301.
Effect of Dynamic Resource Allocation
We compare channel utilization using our method with that of known bit-stream-level approaches. We also evaluate the contribution of content and traffic features from short observation periods to resource prediction. In the comparison we use a 13,175-frame video, about 7 minutes long, digitized from cable television at 30 frames per second. The video is encoded as MPEG-1 VBR with a fixed quantization step size and has an average bit-rate of 2.1 Mbps.
Link Utilization
The RED-VBR scheme, described by Zhang et al. in “RED-VBR: A new approach to support delay-sensitive VBR video in packet-switched networks,” Proc. NOSSDAV, pp. 258-272, 1995, is a heuristic renegotiation method. That method raises the reserved bandwidth, as described by D-BIND, by a factor α when the real bandwidth exceeds the current reservation, and lowers it by a factor β when the real bandwidth remains below the reserved resource for K frames. The average RED-VBR renegotiation frequency depends on α, β, and K.
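The heuristic rule can be illustrated with a scalar sketch. RED-VBR operates on a D-BIND reservation; here a single bandwidth number stands in for it, and "raise by α" and "lower by β" are interpreted as multiplication by α > 1 and 0 < β < 1. These simplifications and the function name are assumptions, not the exact RED-VBR algorithm.

```python
def heuristic_renegotiation(rates, initial, alpha=1.5, beta=0.8, k=30):
    """Scalar sketch of a raise-by-alpha / lower-by-beta renegotiation rule.

    rates: observed bandwidth per frame. Returns (reservation per frame,
    indices of frames at which a renegotiation was issued).
    """
    reserved = initial
    below = 0
    reservations, events = [], []
    for n, rate in enumerate(rates):
        if rate > reserved:
            reserved *= alpha          # demand exceeds the reservation: raise it
            below = 0
            events.append(n)
        elif rate < reserved:
            below += 1
            if below >= k:
                reserved *= beta       # under-used for k consecutive frames: lower it
                below = 0
                events.append(n)
        else:
            below = 0
        reservations.append(reserved)
    return reservations, events
```

In the example below, a single 20-unit frame raises the reservation, which then stays elevated for k frames. This is the behavior criticized earlier for traffic-based renegotiation.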
In contrast, our method places renegotiation points at the segment boundaries obtained from the content-based temporal segmentation 810. We identified 177 segments in the sample video. Bandwidth reservations comprise two D-BIND principal components from our prediction neural network 400. We train the prediction neural network 400 with one hundred sweeps over the data from the first fifty segments.
Link utilization is obtained by trace-driven simulation, similar to that described by Bocheck et al. Multiple video sources, based on the above described sample video but with random starting points, are multiplexed into a T3 line with a bandwidth of 45 Mbps. The results of the comparison are shown in FIG. 9.
With three parameter settings, RED-VBR generated renegotiation requests at average intervals of 0.81, 1.54, and 2.23 seconds. The corresponding utilizations are shown by dashed curves 901-903. The horizontal line 904 shows the utilization when the peak bandwidth is allocated to each segment. The upper solid curve 905 is the utilization according to our method, which renegotiates once every 2.48 seconds on average. Our method outperforms RED-VBR with a similar renegotiation frequency (curve 903) by 18%, and RED-VBR with roughly three times the renegotiation frequency (curve 901) by 9%.
Mean Square Error (MSE) of Traffic Prediction
In
With respect to renegotiation points, we consider:
We consider three different neural network inputs for traffic prediction, all based on features extracted during the observation periods:
Constant Bit-Rate Resource Prediction
Our method can also be used in applications where CBR transcoders and encoders are used. The CBR video stream is segmented as above, although the lengths of the segments can be much longer than for a VBR bit stream. Each segment is then transmitted at an appropriate constant bit rate predicted during an observation period at the beginning of the segment. This leads to a piece-wise estimation of bandwidth over time for the CBR bit stream.
We have described a method for dynamically allocating network resources to multimedia bit streams. A content-based approach for determining optimal renegotiation points improves network utilization over non-content-based methods. In traffic prediction, using short-term traffic features as well as content features as inputs to a prediction neural network is more effective than using either content or traffic features alone.
Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.
Number | Name | Date | Kind |
---|---|---|---|
5675384 | Ramamurthy et al. | Oct 1997 | A |
5838663 | Elwalid et al. | Nov 1998 | A |
6040866 | Chen et al. | Mar 2000 | A |
6067534 | Terho et al. | May 2000 | A |
6263016 | Bellenger et al. | Jul 2001 | B1 |
6269078 | Lakshman et al. | Jul 2001 | B1 |
6320867 | Bellenger et al. | Nov 2001 | B1 |
6665872 | Krishnamurthy et al. | Dec 2003 | B1 |
6721355 | McClennon et al. | Apr 2004 | B1 |
6754241 | Krishnamurthy et al. | Jun 2004 | B1 |
Number | Date | Country
---|---|---
20020150044 A1 | Oct 2002 | US