The invention relates to the coding of streaming media and, in particular, to loss recovery in streaming media applications.
Streaming media is a form of data transfer typically used for multimedia content such as video, audio, or graphics, in which a transmitter sends a stream of data so that a receiver can display or play back the content in real time. When multimedia content is streamed over a communication channel, such as a computer network, playback of the content becomes very sensitive to transmission delay and data loss. If the data does not arrive reliably, or if the bandwidth available to the client falls below an acceptable minimum, playback of the content is either delayed or discontinued. The rate of data transfer (e.g., the bit rate) required to achieve realistic output at the receiver depends on the type and size of the media being transmitted.
In a typical application of streaming media, a server transmits one or more types of media content to a client. Streaming media is becoming more prevalent on the World Wide Web, where server computers deliver streaming media in the form of network data packets over the Internet to client computers. While multimedia data transfer over computer networks is a primary application of streaming media, it is also used in other applications such as telecommunications and broadcasting. In each of these applications, the transmitter sends a stream of data to the receiver (e.g., the client or clients) over a communication channel. The amount of data a channel can transmit over a fixed period of time is referred to as its bandwidth. Regardless of the communication medium, the bandwidth is usually a limited resource, forcing a trade-off between transmission time and the quality of the media playback at the client. The quality of playback for streaming media is dependent on the amount of bandwidth that can be allocated to that media. In typical applications, a media stream must share a communication channel with other consumers of the bandwidth, and as such, the constraints on bandwidth place limits on the quality of the playback of streaming media.
One way to achieve higher quality output for a given bandwidth is to reduce the size of the streaming media through data compression. At a general level, streaming media of a particular media type can be thought of as a sequential stream of data units. Each data unit in the stream may correspond to a discrete time sample or spatial sample. For example, in video applications, each frame in a video sequence corresponds to a data unit. In order to compress the media with maximum efficiency, an encoder conditionally codes each data unit based on a data unit that will be transmitted to the client before the current unit. This form of encoding is typically called prediction because many of the data units are predicted from a previously transmitted data unit.
In a typical prediction scheme, each predicted data unit is predicted from the neighboring data unit in the temporal or spatial domain. Rather than encoding the data unit, the encoder uses the neighboring data unit to predict the signal represented in the current unit and then only encodes the difference between the current data unit and the prediction of it. This approach can improve the coding efficiency tremendously, especially in applications where there is a strong correlation among adjacent data units in the stream. However, this approach also has the drawback that a lost data unit will not only lose its own data, but will also render useless all subsequent data units that depend on it. In addition, where a stream of data is converted into a stream of units each dependent upon an adjacent unit, there is no way to provide a random access point in the middle of the stream. As a result, playback must always start from the beginning of the stream.
In order to solve these problems, conventional prediction schemes typically sacrifice some compression efficiency by breaking the stream into segments, with the beginning of each segment coded independently from the rest of the stream. To illustrate this point, consider the typical dependency graph of data units of a media stream shown in
The dependency graph in
Conventional prediction schemes classify the data units in the stream as either independent data units (shown marked with the letter I, e.g., 100, 102, and 104) or predicted data units (shown marked with the letter P, e.g., 106-128). The I units are independent in the sense that they are encoded using only information from the data unit itself. The predicted units are predicted based on the similarity of the signal or coding parameters between data units. As such, they are dependent on the preceding data unit, as reflected by the arrows indicating the dependency relationship between adjacent data units (e.g., dependency arrows 130, 132, 134, and 136).
Because independent units are encoded much less efficiently than predicted units, they need to be placed as far apart as possible to improve coding efficiency. However, this causes a trade-off between coding efficiency, on the one hand, and data recovery and random access on the other. If a data unit is lost, the predicted units that depend on it are rendered useless. Therefore, independent data units need to be placed closer together to improve data recovery at the sacrifice of coding efficiency. As the independent units are placed closer together, coding efficiency decreases and at some point, the available bandwidth is exceeded. When the bandwidth is exceeded, the quality of the playback of streaming media suffers excessive degradation because the given bandwidth cannot maintain adequate quality with such poor coding efficiency.
Another drawback of the scheme shown in
The invention provides a coding method for streaming media that uses remote prediction to enhance loss recovery. Remote prediction refers to a prediction-based coding method for streaming media in which selected data units are classified as remotely predicted units. The coding method improves loss recovery by using remotely predicted units as loss recovery points and locating them independently of random access points in the data stream. The remotely predicted units improve loss recovery because they depend only on one or a limited number of units located at a remote location in the encoded data stream. As a result they are less sensitive to data loss than a conventional predicted unit, which often depends on multiple data units. The remotely predicted units can be inserted closer together than independent units without substantially decreasing coding efficiency because they have a much higher coding efficiency than independent data units.
One aspect of the invention is a process for classifying the data units in streaming media to enhance loss recovery without significantly decreasing coding efficiency. Typically performed in the encoder, this process classifies data units as independent units (I units), predicted units (P units), or remotely predicted units (R units). Operating on an input stream of data units (e.g., a sequence of video frames), the process groups contiguous sequences of units into independent segments and classifies the data units within each segment so that it includes an I unit, followed by P units, and one or more R units. The R units provide improved loss recovery because they depend only on the I unit in the segment, or alternatively, another R unit. In addition, they are encoded more efficiently than I units because they are predicted from another data unit, as opposed to being coded solely from the unit's own data.
To support remote prediction, an encoder implementation classifies the data units as either I, P, or R type units, and then encodes each differently according to their type. The encoder predicts both the P and R type units from a reference unit, but the reference unit is usually different for these data types. In particular, the R unit is predicted from the I unit in the segment, or alternatively, from another R unit. The P units are predicted from an adjacent unit in the stream (e.g., the immediately preceding data unit). To support both forms of prediction at the same time, the encoder allocates two memory spaces, one each for the reference units used to predict R and P units. In this encoding scheme, the independent segment typically starts with an I unit, which is followed by multiple P units, each dependent on the immediately preceding unit. R units are interspersed in the segment as needed to provide data recovery, while staying within bandwidth constraints.
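As an illustration of this segment structure, the following Python sketch classifies a stream of data units into I, P, and R types using a fixed segment length and a fixed R unit spacing. The function name and parameter values are hypothetical and are chosen only to make the arrangement of unit types concrete.

```python
def classify_units(num_units, segment_len=30, r_spacing=10):
    """Assign each data unit in a stream a type: 'I', 'P', or 'R'.

    Minimal illustrative sketch (names and parameter values are
    assumptions): each independent segment starts with an I unit, R units
    are interspersed at a fixed spacing and predicted from the segment's
    I unit, and every remaining unit is a P unit predicted from the
    immediately preceding unit.
    """
    types = []
    for i in range(num_units):
        pos = i % segment_len            # position within the current segment
        if pos == 0:
            types.append('I')            # random access point
        elif pos % r_spacing == 0:
            types.append('R')            # loss recovery point
        else:
            types.append('P')
    return types


if __name__ == '__main__':
    # Each 30-unit segment: I, nine Ps, R, nine Ps, R, nine Ps.
    print(''.join(classify_units(60)))
```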
In the decoder implementation, the decoder identifies the type of data unit, usually based on an overhead parameter embedded in the bit stream, and then decodes the data unit accordingly. Like the encoder, the decoder allocates two memory spaces for storing reference data units, one used to reconstruct the R type units and another to reconstruct P type data units. When the decoder identifies a data unit as an R unit, it reconstructs the original data unit using the I unit for the current segment, or alternatively, a previous R unit. When the decoder identifies a data unit as a P unit, it reconstructs the original data unit using the immediately preceding data unit, which has been previously reconstructed.
A variety of alternative implementations are possible. In particular, the data units, and specifically the R type units, can be classified dynamically based on some criteria derived or provided at run-time, or can be inserted based on a predetermined spacing of R units relative to the P units in a segment. Also, the I, P, and R units can be prioritized for transfer to improve error recovery and make the transmission more robust with respect to data losses. In particular, the data units are preferably prioritized so that I units are transferred with the most reliability, R units the second most reliability, and P units the least.
Further features and advantages of the invention will become apparent with reference to the following detailed description and accompanying drawings.
Introduction
The following sections provide a detailed description of a multimedia coding scheme that uses a remotely predicted data unit to enhance loss recovery. The first section begins with an overview of an encoder and decoder implementation. The next section then provides further detail regarding an implementation of the encoder and decoder for video coding applications. Subsequent sections describe methods for classifying the type of data units in an encoded bit stream and prioritizing the transfer of encoded units based on the data unit type.
To illustrate the concept of a remotely predicted data unit, it is helpful to consider a dependency graph of an encoded data stream that uses remotely predicted units.
If loss recovery and random access are considered independently, the independent data units can be placed further apart, e.g., ten to fifteen seconds or more, because sufficient random access can still be achieved even when the independent data units are spread further apart. Using remotely predicted data units as loss recovery points, a prediction-based coding scheme can place the loss recovery points nearly independently from the random access points.
The arrows (e.g., 230, 232) interconnecting the data units in the stream illustrate the dependency relationships among each of the data units. The dependency relationship means that the dependent data units are coded based on some signal attribute of the reference data unit or on the coding parameters of the reference data unit. The independent data units are the first units in independent segments of the stream. For example, the independent data unit labeled 200 is the first data unit of an independent segment spanning from data unit 200 to data unit 218. Also, independent data unit 202 is the start of another independent data segment. Each of the independent data units is encoded using only information from a single, corresponding data unit in the input stream. For example, in a video stream, the data units correspond to image frames, and the independent data units correspond to intra-coded frames, which are coded using solely the content of the corresponding image frame.
As reflected by the arrows (e.g., 230-232) in the dependency graph shown in
The remotely predicted data units (224-228) shown in
While the remotely predicted data units shown in the dependency graph of
In the dependency graph shown in
Each R type unit forms the beginning of a sub-segment in which each of the higher level R type units in the sub-segment is dependent on the first unit in the sub-segment.
The coding approach shown in
Each of the R type units provides a loss recovery point that is dependent on some other data unit, either R or I, which is transmitted with higher priority. Thus, if some P type units or even some R type units are lost, the decoder on the receiver only needs to select the next higher level R or I type unit to recover from the loss. This improves loss recovery since the next R type unit is typically much closer than the next I unit would be in a conventional loss recovery scheme.
To implement this scheme, the remotely predicted data units encode the level of hierarchy that they reside in, as well as an identifier of their reference unit. The decoder uses the level of hierarchy to determine which data unit is the proper recovery point in the event of data losses, and it uses the identifier of the reference unit to determine which previously reconstructed data unit it will use to reconstruct the current data unit. The identifier may be implicit in the level of hierarchy of the remotely predicted unit. In other words, the decoder may select the immediately preceding R unit in a lower level of hierarchy as the reference unit for the current R type unit. In this context, the phrase “immediately preceding” refers to the order of the data units in the reconstructed data stream, which corresponds to the original order of the data units at the input to the encoder. The identifier may also explicitly identify a previously decoded data unit (e.g., a frame of video) by the number of the reference data unit or an offset identifying the location of the reference frame in the reconstructed sequence of data units.
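To make the reference resolution concrete, the following sketch shows how a decoder could select a reference unit for a remotely predicted unit from its hierarchy level and an optional explicit identifier. The dictionary-based unit representation and field names are assumptions for illustration, not the actual bit stream syntax.

```python
def resolve_reference(header, decoded_units):
    """Return the index of the reference unit for a remotely predicted unit.

    Illustrative sketch: `header` is a dict with 'level' (hierarchy level
    of this R unit) and an optional 'ref_id' (explicit number of its
    reference unit).  `decoded_units` lists previously reconstructed units
    in display order, each a dict with 'index' and 'level' (I units are
    level 0).  These structures are assumptions, not the bit stream syntax.
    """
    if header.get('ref_id') is not None:
        return header['ref_id']                  # explicit reference identifier
    for unit in reversed(decoded_units):         # implicit: nearest preceding
        if unit['level'] < header['level']:      # unit at a lower level
            return unit['index']
    raise ValueError('no valid reference unit available')


decoded = [{'index': 0, 'level': 0},   # I unit
           {'index': 1, 'level': 2},   # treated here as a deeper-level unit
           {'index': 2, 'level': 1}]   # R1 unit
print(resolve_reference({'level': 1, 'ref_id': None}, decoded))  # -> 0 (the I unit)
print(resolve_reference({'level': 2, 'ref_id': None}, decoded))  # -> 2 (the R1 unit)
```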
Media coders that incorporate the above approach can be designed to insert the remotely predicted units at fixed locations in the data stream, or can place them selectively based on some other criteria. For example, an encoder may allow the user to specify the location of the I and R units as an adjustable input parameter. In addition, the encoder may be programmed to insert R units to improve loss recovery in cases where the media content has higher priority or where data losses are expected. At run-time, a real-time encoder can be programmed to determine when data losses are expected by querying the communication interface for the data rate or for a measure of the data loss incurred for previously transmitted data units.
To mitigate losses, the encoder can be adapted to prioritize the transmission of the different data unit types. Independent data units, under this scheme, have the highest priority, then R units, and finally P units.
Each of these features is elaborated on below.
Encoder Overview
Once classified, a predictor module 312 computes the prediction unit 314 for each of the data units in the input stream that are classified as R or P units. In most applications, the predictor module will use some form of estimation to model the changes between a data unit and the data unit it is dependent on. One concrete example of this form of prediction is motion estimation performed in video based coding to predict the motion of pixels in one video frame relative to a reference video frame. In some applications, the changes between a data unit and the reference data unit may be so small that no estimation is necessary. In these circumstances, the prediction unit 314 is simply the reference data unit. The reference unit is the data unit that the current data unit being encoded is directly dependent on. The arrows connecting data units in
If the predictor 312 uses a form of estimation to improve coding efficiency, it produces a set of prediction parameters 316 in addition to the prediction unit 314. These prediction parameters are used to estimate changes between the data unit being decoded and its reference data unit. As shown in the dependency graph of
If the coding system is implemented to support multiple levels of R units, then the encoder will allocate another memory unit for each level of remotely predicted unit. For example, a first level of remotely predicted units, R1, may be inserted between units classified as I type in the original input stream. Each R1 unit is dependent on the I unit in the segment. A second level of remotely predicted units may be inserted between units classified as R1 such that each of these second level (R2) units is dependent on the R1 or I unit at the beginning of the sub-segment. This multi-level hierarchy may be repeated for each level by interspersing data units Rn in between the data units classified in the immediately preceding level Rn−1 in the hierarchy, where n represents the level of hierarchy. The default spacing of the units Rn at each level is such that the Rn units are approximately evenly distributed. However, the spacing of R type units may vary at each level as explained in more detail below. Since each unit Rn is dependent on at least the level below it (e.g., level n−1), this scheme requires that a memory unit be allocated for each level.
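One way to organize this per-level reference memory is sketched below. The class and its methods are hypothetical; the point is simply that one buffer is kept for each level of remote prediction plus one for the immediately preceding unit used by P units.

```python
class ReferenceBuffers:
    """Per-level reference memory for multi-level remote prediction (sketch).

    Slot 0 holds the segment's I unit, slot 1 the most recent R1 unit,
    slot 2 the most recent R2 unit, and so on; a separate slot holds the
    immediately preceding unit used for ordinary P prediction.  Names and
    structure are illustrative assumptions, not the described implementation.
    """

    def __init__(self, num_r_levels):
        self.remote = [None] * (num_r_levels + 1)   # index 0 = I, 1 = R1, ...
        self.previous = None                         # reference for P units

    def store(self, unit, level=None):
        """Store a reconstructed unit; `level` is None for plain P units."""
        if level is not None:
            self.remote[level] = unit
            for deeper in range(level + 1, len(self.remote)):
                self.remote[deeper] = None           # assumption: reset deeper levels
        self.previous = unit                         # any unit can be the next P reference

    def reference_for(self, unit_type, level=None):
        """Return the prediction reference: Rn uses level n-1, P uses the previous unit."""
        if unit_type == 'P':
            return self.previous
        return self.remote[level - 1]
```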
To reduce coding errors, the predictor module 312 uses a reconstructed version of the reference data unit for prediction rather than a data unit read directly from the input stream. The predictor 312 uses the reconstructed data unit for prediction (either from memory unit 318 or 320, depending on whether or not the prediction is remote). This aspect of the encoder is reflected in
After the prediction stage, the encoder computes the differences between the current data unit and its prediction unit 314. For example, in video applications, the error calculator 330 computes the differences between each pixel in the current image and the corresponding pixel in the predicted image, which is derived from the reference image (either the reconstructed version of the first frame in the sequence or the reconstructed version of the immediately adjacent frame in the sequence). The output of the error calculator 330 is the error signal 332, which is a representation of the differences between the current unit and its predictor.
The data classifier can also use the magnitude of the prediction error to choose between coding a unit as an R or P type unit. In particular, if the prediction error for P units becomes relatively high, it is worthwhile to insert an R unit instead, since it provides better loss recovery without unduly sacrificing fidelity in this case. When the error becomes too large even for an R type unit, the data classifier should start a new independent segment, beginning with a new I type unit. In addition, the encoder may be adapted to compute the prediction error for both an R and a P type unit, and then pick the unit type with the lower prediction error.
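A minimal sketch of such an error-driven classification follows. The thresholds and the use of mean absolute difference as the error measure are assumptions chosen only for illustration; an encoder could use any distortion metric and threshold policy.

```python
import numpy as np

def choose_unit_type(current, prev_ref, remote_ref,
                     p_threshold=8.0, i_threshold=20.0):
    """Pick I, R, or P for the current unit from prediction error magnitudes.

    Illustrative sketch only: the thresholds and the mean absolute
    difference metric are assumptions, not values from the text.
    """
    p_error = np.abs(current - prev_ref).mean()     # error if coded as a P unit
    r_error = np.abs(current - remote_ref).mean()   # error if coded as an R unit
    if r_error > i_threshold:
        return 'I'          # even remote prediction is poor: start a new segment
    if p_error > p_threshold:
        return 'R'          # P prediction degrading: insert a recovery point
    return 'P'


frame = np.full((8, 8), 30.0)
print(choose_unit_type(frame, np.zeros((8, 8)), np.zeros((8, 8))))   # -> 'I'
```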
There are a variety of ways of implementing the inter-process communication between the error calculator/predictor module and the data classifier module. For example, the error calculator 330 or predictor 312 can be designed to notify the data classifier when a threshold is exceeded by signaling an event that the data classifier monitors. Alternatively, the data classifier can query the error calculator periodically (e.g., after processing some number of data units in the stream) to check the error level of the most recent R frame. This interaction is reflected in
After the error is computed for the current data unit, a predicted unit encoder 334 encodes the error signal 332 and prediction parameters 316 for the unit. The output of this portion of the encoder is either an encoded P type unit 336 or an R type unit 338. Since both of these types of data units share substantially the same format, the same encoder can be used to encode both of them. An example of this type of encoder for video applications is a conventional DCT or wavelet encoder. In this example, the encoder might contain a transform coder, such as a DCT or wavelet coder, to transform the signal into the frequency domain. Next, the encoder would quantize the frequency coefficients, and then perform some form of entropy coding such as Huffman or arithmetic coding on the quantized coefficients.
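The following toy sketch stands in for the predicted unit encoder pipeline just described. It omits the transform stage and uses zlib as a stand-in for a real entropy coder such as Huffman or arithmetic coding, so it illustrates only the shape of the encode/decode pair, not an actual DCT or wavelet codec.

```python
import zlib
import numpy as np

def encode_error_signal(error_block, q_step=4):
    """Quantize an error block and 'entropy code' it with zlib (toy sketch).

    A real predicted unit encoder would transform the block (DCT or
    wavelet), quantize the coefficients, and apply Huffman or arithmetic
    coding; zlib and the missing transform stage are stand-ins only.
    """
    quantized = np.round(np.asarray(error_block) / q_step).astype(np.int16)
    return zlib.compress(quantized.tobytes())

def decode_error_signal(payload, shape, q_step=4):
    """Inverse of encode_error_signal; lossy because of the quantization step."""
    quantized = np.frombuffer(zlib.decompress(payload), dtype=np.int16).reshape(shape)
    return quantized.astype(np.float64) * q_step


block = np.random.randn(8, 8) * 10
approx = decode_error_signal(encode_error_signal(block), block.shape)
print(np.abs(block - approx).max() <= 2.0)   # reconstruction error bounded by q_step / 2
```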
In most applications, the predicted unit encoder 334 will use some form of indicator such as a parameter or a flag that indicates the type of data unit. This parameter is used to signal the type of data unit being encoded so that the decoder can determine which previously reconstructed data unit to use as the reference data unit and which unit to use as a loss recovery point.
As noted above, the predicted data units are encoded using a reference unit that has already passed through the encoding and decoding phases. This process of reconstructing the reference unit from encoded data is illustrated, in part, as the predicted unit decoder 340. The predicted unit decoder 340 performs the reverse operation of the predicted unit encoder 334 to reconstruct the error signal 342. Continuing the video coding example from above, the predicted unit decoder could be implemented with an inverse entropy coder to convert a stream of variable length codes into quantization indices, followed by an inverse quantization process and an inverse transform coding method to transform the coefficients back into the spatial domain. In this case, the error signal 342 differs from the error signal 332 from the error calculator 330 because the coding method is a lossy process. After reconstructing this error signal 342, the encoder adds (344) the error signal to the prediction unit 314 to compute the reconstructed unit and stores it in memory unit 318 or 320, depending on whether it is located at a reference point for dependent R or P units. As noted above, the encoder may have multiple different memory units to buffer reference units for R type data units if the encoder is designed to support multi-layered R type coding.
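The encoder's local reconstruction path can be sketched as follows. The buffer names are hypothetical; the essential point is that the encoder stores the same reconstruction the decoder will later compute, in the buffer that matches the unit's role as a reference.

```python
import numpy as np

def reconstruct_and_store(prediction, decoded_error, buffers, unit_type):
    """Add the decoded error signal to the prediction and buffer the result.

    Illustrative sketch: `buffers` is a plain dict standing in for memory
    units 318/320; I and R units become references for later remote
    prediction, and every unit becomes the reference for the next P unit.
    """
    reconstructed = prediction + decoded_error
    if unit_type in ('I', 'R'):
        buffers['remote'] = reconstructed     # reference for later R units
    buffers['previous'] = reconstructed       # reference for the next P unit
    return reconstructed


buffers = {'remote': None, 'previous': None}
reconstruct_and_store(np.zeros((4, 4)), np.ones((4, 4)), buffers, 'R')
print(buffers['remote'][0, 0], buffers['previous'][0, 0])   # 1.0 1.0
```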
The encoder encodes independent data units using only the data from the single, corresponding data unit from the input stream. As shown in
As the encoded I, P, and R units are generated by the respective encoders, a coding format module 354 then integrates these encoded units into the coding format. The coding format module 354 organizes each of the encoded I, P, and R units into a prioritized order for transmission as explained in more detail below.
Then depending on the application, the prioritized data may be further formatted into some form of packet or data stream that is specific to the communication channel being used to transmit the data. In
In some implementations, the channel interface can provide a measure of the data transfer reliability/available bandwidth that is useful to the data unit classifier 310 for determining where to insert R and I type data units in the stream. The arrow back to the data classifier from the channel interface represents the communication of this information back to the data classifier. The data classifier can get this information, for example, by querying the channel interface via a function call, which returns a parameter providing the desired information.
The decoding process begins by reading the unpackaged data from the channel interface and breaking it into three different types of encoded data units: independent data units 406, predicted data units 408, and remotely predicted data units 410. The decode format module 405, shown in
The predictor 418 uses the prediction parameters and the previously decoded reference unit to compute the prediction unit 420 for the current data unit being decoded. The predictor 418 computes the prediction unit differently depending on whether the data unit being decoded is an R or P unit. In the case of an R unit, the predictor 418 selects the reference unit for the R unit stored in memory unit 422. For a first-level R unit, R1, this reference unit is the reconstructed I unit for the segment. The decoder allocates sufficient memory units to buffer the reference units for each level of remotely predicted units. For simplicity, the diagram only shows a single memory unit 422, but the implementation may allocate more memory units, depending on the number of levels in the remote prediction scheme. For a P unit, the predictor 418 selects the immediately preceding data unit, which is stored separately from the reference unit for R type units (shown as the reconstructed unit 424 for P type units in
After computing the prediction unit 420 for the current data unit being decoded, the decoder combines the error signal 414 with the prediction unit 420 to reconstruct the current data unit. It stores the results in memory 424 allocated separately from the memory allocated for the decoded reference unit 422 for R units because both the decoded R reference unit and the immediately preceding data unit are necessary to decode the segment. With the decoding of each P and R unit, the decoder transfers the preceding data unit currently stored in memory 424 to a module for reconstructing the output (e.g., reconstruct output module 426). It then replaces the reference unit stored in memory 424 with the most recently decoded unit.
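A simplified decoder loop for a single-level R scheme might look like the following. The (type, error) tuple format is a stand-in for the actual bit stream, with entropy and transform decoding already applied; memory 422 and memory 424 appear as two local variables.

```python
import numpy as np

def decode_stream(units):
    """Illustrative decoder loop for a single-level R scheme.

    Each element of `units` is a hypothetical (unit_type, error) pair in
    which `error` is the already-decoded error signal (a NumPy array).
    I units carry their full signal, R units are added to the segment's
    I unit (memory 422 in the text), and P units to the immediately
    preceding reconstructed unit (memory 424).
    """
    i_reference = None          # corresponds to memory 422
    previous = None             # corresponds to memory 424
    output = []
    for unit_type, error in units:
        if unit_type == 'I':
            reconstructed = error.copy()
            i_reference = reconstructed
        elif unit_type == 'R':
            reconstructed = i_reference + error
        else:                   # 'P'
            reconstructed = previous + error
        previous = reconstructed
        output.append(reconstructed)
    return output


stream = [('I', np.full((2, 2), 10.0)), ('P', np.ones((2, 2))), ('R', np.ones((2, 2)))]
print([u[0, 0] for u in decode_stream(stream)])   # [10.0, 11.0, 11.0]
```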
The decoder decodes the independent data units using the inverse of the steps performed in the independent unit encoder shown in
The decoder transfers a copy of the independent data unit at the beginning of the stream to the module for reconstructing the output, and replaces the independent data unit in memory 422 with each new independent segment in the stream.
Video Coding Applications
In video coding applications, the stream of data units corresponds to a sequence of temporally spaced apart images, typically called frames. Presently, there are two primary classes of video coders: frame-based coders and object-based coders. In both types of coders, the temporal sequence is comprised of discrete time samples coinciding with each frame, and the data at each time sample is an image. Frame-based and object-based coders differ in that object-based coders operate on arbitrarily shaped objects in the video scene, whereas frame-based coders operate on the entire frame. Object-based coders attempt to exploit the coherence of objects in the frames by coding these objects separately. This may involve simply coding a foreground object, such as a talking head, separately from a mostly static background. It also may involve more complicated scenes with multiple video objects. An object-based coder can be reduced to a frame-based coder by treating each frame as a single object. Examples of video coding standards for frame-based video encoding include MPEG-2, H.262, and H.263. The coding standard MPEG-4 addresses object-based video coding.
While remote prediction may be used in both frame based and object based video coding, the following example illustrates remote prediction in the context of object based video coding.
The shape coding module 532 reads the definition of an object including its bounding rectangle and extends the bounding rectangle to integer multiples of fixed-size pixel blocks called “macroblocks.” The shape information for an object comprises a mask or “alpha plane.” The shape coding module 532 reads this mask and compresses it using, for example, a conventional chain coding method to encode the contour of the object.
Motion estimation module 534 performs motion estimation between an object in the current frame and its reference object. This reference object is the same object in either the immediately preceding frame (for P type data) or the first frame of the current segment (for R type data). The encoder stores the immediately preceding object as the reconstructed image 536, and stores the object from the first frame in the segment as part of the I-frame image memory 550. The motion estimation module 534 reads the appropriate object from either memory space 536 or 550 based on a parameter that classifies the type of the object as either an R or P type. Using the stored image object and its bounding rectangle, the motion estimation module computes motion estimation data used to predict the motion of an object from the current frame to the reference frame.
In the implementation, the motion estimation module 534 searches for the most similar macroblock in the reference image for each macroblock in the current image to compute a motion vector for each macroblock. The specific format of the motion vector from the motion estimation module 534 can vary depending on the motion estimation method used. In the implementation, there is a motion vector for each macroblock, which is consistent with current MPEG and H26X formats. However, it is also possible to use different types of motion models, which may or may not operate on fixed sized blocks. For example, a global motion model could be used to estimate the motion of the entire object, from the current frame to the reference frame. A local motion model could be used to estimate the motion of sub-regions within the object such as fixed blocks or variable sized polygons. The particular form of motion estimation is not critical to the invention.
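For concreteness, a full-search block-matching routine of the kind described here is sketched below. The block size, search range, and SAD metric are illustrative choices; as noted, other motion models could be used instead.

```python
import numpy as np

def motion_vector(current, reference, block_row, block_col, block=16, search=7):
    """Full-search block matching for one macroblock (illustrative sketch).

    Returns the (dy, dx) displacement that minimizes the sum of absolute
    differences (SAD) within +/- `search` pixels of the block's position;
    real encoders typically use faster search strategies.
    """
    h, w = current.shape
    cur_block = current[block_row:block_row + block, block_col:block_col + block]
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r0, c0 = block_row + dy, block_col + dx
            if r0 < 0 or c0 < 0 or r0 + block > h or c0 + block > w:
                continue                     # candidate falls outside the reference
            sad = np.abs(cur_block - reference[r0:r0 + block, c0:c0 + block]).sum()
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv
```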
Returning again to
Texture coding module 540 compresses blocks of error signals for inter-frame coded objects, namely, R and P type objects. In addition, it compresses blocks of image sample values for the object from the input data stream 530 for intra-frame coded objects, namely the I type objects. The feedback path 542 from the texture coding module 540 represents blocks of signals that were texture coded and then texture decoded, including the reconstructed error signal and I-frame object. The encoder uses the error signal blocks along with the predicted image blocks from the motion compensation module to compute the reconstructed image 536. The reconstructed image memory 536 shown in
The texture coding module 540 codes intra-frame and error signal data for an object using any of a variety of still image compression techniques. Example compression techniques include DCT, wavelet, as well as other conventional image compression methods.
The bit stream of the compressed video sequence includes the shape, motion and texture coded information from the shape coding, motion estimation, and texture coding modules. In addition, it includes overhead parameters that identify the type of coding used at the frame, object, and macroblock levels. Depending on the implementation, the parameter designating the data type as I, R, or P can be encoded at any of these levels. In the implementation, the encoder places a flag at the macroblock level to indicate which type of data is encoded in the macroblock. Multiplexer 544 combines and formats the shape, motion, texture, and overhead parameters into the proper syntax and outputs it to the buffer 546. The multiplexer may perform additional coding of the parameters. For example, overhead parameters and the motion vectors may be entropy coded, using a conventional entropy coding technique such as Huffman or arithmetic coding.
While the encoder can be implemented in hardware or software, it is most likely implemented in software. In a software implementation, the modules in the encoder represent software instructions stored in memory of a computer and executed in the processor, and the video data stored in memory. A software encoder can be stored and distributed on a variety of conventional computer readable media. In hardware implementations, the encoder modules are implemented in digital logic, preferably in an integrated circuit. Some of the encoder functions can be optimized in special-purpose digital logic devices in a computer peripheral to off-load the processing burden from a host computer.
Shape decoding module 664 decodes the shape or contour for the current object being processed. To accomplish this, it employs a shape decoder that implements the inverse of the shape encoding method used in the encoder of
The motion decoding module 666 decodes the motion information in the bit stream. The decoded motion information includes the motion vectors for each macroblock that are reconstructed from entropy codes in the incoming bit stream. The motion decoding module 666 provides this motion information to the motion compensation module 668. The motion compensation module 668 uses the motion vectors to find predicted image samples in the previously reconstructed object data 670 for P type macroblocks and higher level R type macroblocks, or the I-frame object memory 680 for first level R type macroblocks.
The texture decoding module 674 decodes error signals for inter-frame coded texture data (both R and P type) and an array of color values for intra-frame texture data (I type data) and passes this information to a module 672 for computing and accumulating the reconstructed image. It specifically stores the I frame data objects in separate memory 680 so that the motion compensation module can compute the predicted object for first level R type objects using the I frame's object rather than the immediately preceding frame's object. For inter-frame coded objects, including both R and P type data, this module 672 applies the error signal data to the predicted image output from the motion compensation module 668 to compute the reconstructed object for the current frame. For intra-frame coded objects, the texture decoding module 674 decodes the image sample values for the object and places the reconstructed object in the reconstructed object module 672 and I-frame object memory 680. Previously reconstructed objects are temporarily stored in object memory 670 and are used to construct the object for other frames.
Like the encoder, the decoder can be implemented in hardware, software or a combination of both. In software implementations, the modules in the decoder are software instructions stored in memory of a computer and executed by the processor, and video data stored in memory. A software decoder can be stored and distributed on a variety of conventional computer readable media. In hardware implementations, the decoder modules are implemented in digital logic, preferably in an integrated circuit. Some of the decoder functions can be optimized in special-purpose digital logic devices in a computer peripheral to off-load the processing burden from a host computer.
Classifying R Type Data Units
In a coding system that supports remotely predicted data units, the encoder preferably classifies each data unit as an I, R, or P type unit for encoding. There are a variety of ways of implementing the classification scheme. One approach is to implement the encoder so that it always uses the same spacing of I, R, and P data units. Another approach is to allow the user to specify the spacing by making the spacing of each type a user adjustable parameter. Yet another approach is to program the encoder to classify the data units in the stream adaptively or dynamically based on some criteria determined during the encoding process. A scheme for adaptively classifying data units refers to a process in which the encoder selects the data type by adapting it to some characteristic of the media content or some other criteria relating to the coding of the media content or its transmission. The term “dynamic” is similar to “adaptive” and is meant to apply to implementations where data unit type is selected at run-time (during operation) of the encoder or during transmission of the stream.
The classification of data unit types may be dynamic or predetermined based on the media content type. The encoder may determine the spacing of I and R units in the stream based on a predetermined relationship between the media content type and the spacing of I and R units. The encoder, for example, may be programmed to use a default spacing of R units for all video sequences of a certain type. In particular, the encoder might use a first default spacing of R units for low motion talking head video and a second, different spacing for high motion video scenes. The encoder may allow the user to specify a different spacing for the content via a user-adjustable input parameter.
One form of dynamically classifying data units is to select the spacing of I, P and R type units in the stream based on a prediction error threshold. For example, the encoder can be programmed to start a new independent segment with an I type unit when the error signal from the prediction process exceeds a predetermined or user-adjustable threshold. In this case, the encoder may be programmed to classify selected data units in between each I unit in the sequential stream as R type units according to a predetermined or user-adjustable spacing. The encoder would classify the remaining data units as P type units.
Another measure of data transfer reliability is the quantity of data being lost for previously transferred data packets (such as network data packets) over the communication channel. The encoder could also obtain this information from the channel interface. In particular, for computer networks, the network interface software would keep track of the dropped data packets. The encoder could then query the interface to request that it return the number of lost packets. The encoder would then calculate or adjust the spacing of R units for each independent segment based on a predetermined relationship between the number of lost data packets over a period of time and the spacing of R units.
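The following sketch shows one possible predetermined relationship between the measured packet loss rate and the R unit spacing. The linear mapping and the spacing bounds are assumptions chosen for illustration; the scheme requires only that some predetermined relationship exist.

```python
def r_spacing_from_loss(lost_packets, sent_packets,
                        max_spacing=30, min_spacing=5):
    """Map a measured packet loss rate to an R unit spacing (sketch).

    Illustrative assumption: spacing shrinks linearly from max_spacing at
    0% loss to min_spacing at 50% loss or worse.
    """
    if sent_packets == 0:
        return max_spacing
    loss_rate = min(lost_packets / sent_packets, 0.5)   # clamp pathological rates
    spacing = max_spacing - (max_spacing - min_spacing) * (loss_rate / 0.5)
    return max(min_spacing, int(round(spacing)))


print(r_spacing_from_loss(0, 100))    # -> 30 (no loss: widest spacing)
print(r_spacing_from_loss(10, 100))   # -> 25 (10% loss: tighter spacing)
```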
Another form of dynamically classifying data units is to select the spacing of R type units based on a measure of the fidelity of the output stream at the client. One approach is to use the magnitude of the prediction error as a measure of the fidelity. The magnitude of the prediction error can be used to select coding between R and P type data units on the server.
As demonstrated above, there are a number of ways to control the classification of the data unit types. An implementation of the invention may use one or more of these methods, or some alternative method. Note that, in general, the R units may be classified independently from the I units. In other words, the spacing of the R units may be selected so as to be totally independent of the spacing of I units. Thus, the encoder can optimize the spacing of R units depending on a variety of criteria independent of the need for random access points.
Prioritizing Data Units
As introduced above, the encoder implementation may also prioritize the transfer of I, P, and R encoded data units with respect to the reliability of the data transfer. Preferably, the encoded stream is prioritized such that the I unit has the highest reliability of being transferred, the R unit has the same or lower reliability than the I unit, and the P unit has the same or lower reliability than the R unit. Additionally, if the encoder implements a multi-layered R encoding scheme, the priority decreases for each level Rn. For example, units of type R1, dependent on the I unit in the segment, would have the highest priority among R units, and units of type R2, dependent on the R1 unit in their sub-segment, would have the next highest priority.
The implementation prioritizes data transfer by transmitting the higher priority data units first so that there is more time to retransmit them if they are not received. Once prioritized, the encoded data units maintain that priority level on retransmission. In particular, all higher priority data units are transmitted, including retransmissions if necessary, before lower priority data units.
The implementation of the prioritizing scheme operates in conjunction with an underlying transport mechanism that supports prioritization of data transfer based on some pre-determined block of data (e.g., a packet). The encoder instructs the transport mechanism to place higher priority data in higher priority data packets for transfer, and ensures that these higher priority packets are successfully transmitted before instructing the transport mechanism to send the next lower priority data unit types. This process of sending all data units of a certain type in prioritized fashion operates on a portion of the input data stream, such that all of the I type units in that portion are sent first, then all of the R1 type units in that portion, then all of the R2 units in that portion, and so on. Preferably, the size of this portion is set so as not to add significant latency on the client side. For example, the portion may be set to 10 seconds' worth of data units in a video stream. If the portion becomes larger, the client incurs additional latency waiting for receipt of the portion before it is able to start reconstructing the stream.
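A sketch of this per-portion ordering follows. The (index, type) pair representation is hypothetical and retransmission handling is omitted; the sketch only shows that, within a portion, I units come first, then each R level in order, then P units.

```python
def transmission_order(units):
    """Order one portion of the stream for prioritized transfer (sketch).

    `units` is a list of (index, type) pairs, where type is 'I', 'P', or
    'Rn' with n the hierarchy level.  All I units in the portion go first,
    then R1 units, then R2 units, and so on, with P units last; within a
    priority level the original order is preserved.
    """
    def priority(unit):
        _, unit_type = unit
        if unit_type == 'I':
            return 0
        if unit_type.startswith('R'):
            return int(unit_type[1:] or 1)       # 'R1' -> 1, 'R2' -> 2, bare 'R' -> 1
        return 10**6                             # P units last

    return sorted(units, key=priority)


portion = [(0, 'I'), (1, 'P'), (2, 'R1'), (3, 'P'), (4, 'R2'), (5, 'P')]
print(transmission_order(portion))
# [(0, 'I'), (2, 'R1'), (4, 'R2'), (1, 'P'), (3, 'P'), (5, 'P')]
```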
This approach for prioritizing the transfer of the encoded stream helps to ensure that any losses will affect the P unit first, before affecting the R or I units. As a result, the client is more likely to receive the most useful information, namely the independent data units, which provide random access and data recovery, and the higher priority R type units, which provide additional data recovery points.
The prioritization scheme works in the case of a transfer between a server and a single client, and in a multi-cast scenario where a server broadcasts the data packets to multiple clients. The extent to which the client uses the encoded units to reconstruct the output depends on its available transmission bandwidth. In the case of a multi-cast application, the client can subscribe to the number of priority levels that it can support based on its available bandwidth. Consider an example such as
Brief Overview of a Computer System
While
A number of program modules may be stored in the drives and RAM 725, including an operating system 735, one or more application programs 736, other program modules 737, and program data 738. A user may enter commands and information into the personal computer 720 through a keyboard 740 and pointing device, such as a mouse 742. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 721 through a serial port interface 746 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or a universal serial bus (USB). A monitor 747 or other type of display device is also connected to the system bus 723 via an interface, such as a display controller or video adapter 748. In addition to the monitor, personal computers typically include other peripheral output devices (not shown), such as speakers and printers.
The personal computer 720 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 749. The remote computer 749 may be a server, a router, a peer device or other common network node, and typically includes many or all of the elements described relative to the personal computer 720, although only a memory storage device 750 has been illustrated in
When used in a LAN networking environment, the personal computer 720 is connected to the local network 751 through a network interface or adapter 753. When used in a WAN networking environment, the personal computer 720 typically includes a modem 754 or other means for establishing communications over the wide area network 752, such as the Internet. The modem 754, which may be internal or external, is connected to the system bus 723 via the serial port interface 746. In a networked environment, program modules depicted relative to the personal computer 720, or portions thereof, may be stored in the remote memory storage device. The network connections shown are merely examples and other means of establishing a communications link between the computers may be used.
While the invention has been illustrated using a specific implementation as an example, the scope of the invention is not limited to the specific implementation described above. In particular, the use of a remote prediction scheme for improved error recovery is not limited to video, but instead applies to other media streams as well. For example, the same techniques described above can be applied to audio, where the data units are frames of audio and there is a dependency relationship among the audio frames. In audio coders available today, an audio frame corresponds to a fixed number of PCM samples in the time domain. When an audio frame is made to be dependent on another audio frame, whether the dependency is between audio signals or audio coding parameters (e.g., the parameters in a parametric coding model for speech), remote prediction can be used to improve error recovery in the manner described above. In particular, one or more layers of R units may be inserted to improve loss recovery.
The data dependency graph of the encoded data stream may vary as well. For example, the independent data unit does not have to be at the beginning of an independent segment of data units; the independent unit may instead be located in the middle of a temporal sequence of data units in an independent segment. The other predicted units in the segment would be predicted directly or indirectly from the I unit. Note that the only restriction is that the I unit be available for decoding the other units in the segment that depend on it. This constraint can be satisfied by prioritizing the transfer of the data units such that the I unit is available for decoding the dependent units.
As a practical matter, the I unit will likely be at the beginning of an ordered, temporal sequence of data units in an independent segment of the stream for most applications because the receiver usually will need to play it first before the other units in the segment. However, with the use of a prioritization scheme that re-arranges the data units for transmission, there is flexibility in the classification of the data units as I, P, and R units.
Also, as noted above, the remotely predicted data units can use another R type unit as their reference, rather than the I unit in the segment. Rather than maintaining a copy of the I data unit, the coder/decoder would keep a copy of the previously reconstructed R unit in addition to the previously reconstructed P unit. The coding scheme described above could be extended to a multi-layered coding scheme where there are multiple levels of R type units. Each level may be dependent directly on the I unit in the segment, or on other R units, which in turn, are dependent on other R units or the I unit in the segment.
In view of the many possible implementations of the invention, it should be recognized that the implementation described above is only an example of the invention and should not be taken as a limitation on the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope and spirit of these claims.
This application is a continuation of U.S. patent application Ser. No. 13/967,069, filed Aug. 14, 2013, which is a continuation of U.S. patent application Ser. No. 12/699,354, filed Feb. 3, 2010, now U.S. Pat. No. 8,548,051, which is a continuation of U.S. patent application Ser. No. 11/170,281, filed Jun. 28, 2005, now U.S. Pat. No. 7,685,305, which is a continuation of U.S. patent application Ser. No. 11/088,696, filed Mar. 22, 2005, now U.S. Pat. No. 7,734,821, which is a continuation of U.S. patent application Ser. No. 10/329,107, filed Dec. 23, 2002, now U.S. Pat. No. 6,912,584, which is a continuation of U.S. patent application Ser. No. 09/267,563, filed Mar. 12, 1999, now U.S. Pat. No. 6,499,060, the disclosures of which are hereby incorporated by reference.
Number | Name | Date | Kind |
---|---|---|---|
4838685 | Martinez et al. | Jun 1989 | A |
4989087 | Pele et al. | Jan 1991 | A |
5049991 | Niihara | Sep 1991 | A |
5093720 | Krause et al. | Mar 1992 | A |
5150209 | Baker et al. | Sep 1992 | A |
5175618 | Ueda et al. | Dec 1992 | A |
5214504 | Toriu et al. | May 1993 | A |
5227878 | Puri et al. | Jul 1993 | A |
5255090 | Israelsen | Oct 1993 | A |
5267334 | Normille et al. | Nov 1993 | A |
5317397 | Odaka et al. | May 1994 | A |
5376968 | Wu et al. | Dec 1994 | A |
5412430 | Nagata | May 1995 | A |
5412435 | Nakajima | May 1995 | A |
5424779 | Odaka et al. | Jun 1995 | A |
RE35093 | Wang et al. | Nov 1995 | E |
5467136 | Odaka et al. | Nov 1995 | A |
5469226 | David et al. | Nov 1995 | A |
5477272 | Zhang et al. | Dec 1995 | A |
5481310 | Hibi | Jan 1996 | A |
5493513 | Keith et al. | Feb 1996 | A |
5539663 | Agarwal | Jul 1996 | A |
5541594 | Huang et al. | Jul 1996 | A |
5543847 | Kato | Aug 1996 | A |
5546129 | Lee | Aug 1996 | A |
5557684 | Wang et al. | Sep 1996 | A |
5579430 | Grill et al. | Nov 1996 | A |
5592226 | Lee et al. | Jan 1997 | A |
5594504 | Ebrahimi | Jan 1997 | A |
5598215 | Watanabe | Jan 1997 | A |
5598216 | Lee | Jan 1997 | A |
5612743 | Lee | Mar 1997 | A |
5612744 | Lee | Mar 1997 | A |
5617144 | Lee | Apr 1997 | A |
5617145 | Huang et al. | Apr 1997 | A |
5619281 | Jung | Apr 1997 | A |
5621660 | Chaddha et al. | Apr 1997 | A |
5627591 | Lee | May 1997 | A |
5642166 | Shin et al. | Jun 1997 | A |
5668608 | Lee | Sep 1997 | A |
5673339 | Lee | Sep 1997 | A |
5692063 | Lee et al. | Nov 1997 | A |
5694171 | Katto | Dec 1997 | A |
5699476 | Van Der Meer | Dec 1997 | A |
5701164 | Kato | Dec 1997 | A |
5714952 | Wada | Feb 1998 | A |
5731850 | Maturi et al. | Mar 1998 | A |
5737022 | Yamaguchi | Apr 1998 | A |
5740310 | De Haan et al. | Apr 1998 | A |
5742344 | Odaka et al. | Apr 1998 | A |
5748121 | Romriell | May 1998 | A |
5751360 | Tanaka | May 1998 | A |
5754233 | Takashima | May 1998 | A |
5784107 | Takahashi | Jul 1998 | A |
5784175 | Lee | Jul 1998 | A |
5784528 | Yamane et al. | Jul 1998 | A |
5798794 | Takahashi | Aug 1998 | A |
5818531 | Yamaguchi | Oct 1998 | A |
5822541 | Ncgmmura et al. | Oct 1998 | A |
5825421 | Tan | Oct 1998 | A |
5835144 | Matsumura et al. | Nov 1998 | A |
5835149 | Astle | Nov 1998 | A |
RE36015 | Iu | Dec 1998 | E |
5852664 | Iverson et al. | Dec 1998 | A |
5861919 | Perkins et al. | Jan 1999 | A |
5867230 | Wang et al. | Feb 1999 | A |
5870148 | Lillevold | Feb 1999 | A |
5880784 | Lillevold | Mar 1999 | A |
5903313 | Tucker et al. | May 1999 | A |
5905542 | Linzer | May 1999 | A |
5933195 | Florencio | Aug 1999 | A |
5946043 | Lee et al. | Aug 1999 | A |
5949489 | Nishikawa et al. | Sep 1999 | A |
5963258 | Nishikawa et al. | Oct 1999 | A |
5970173 | Lee et al. | Oct 1999 | A |
5970175 | Nishikawa et al. | Oct 1999 | A |
5982438 | Lin et al. | Nov 1999 | A |
5986713 | Odaka et al. | Nov 1999 | A |
5990960 | Murakami et al. | Nov 1999 | A |
5991447 | Eifrig et al. | Nov 1999 | A |
5991464 | Hsu et al. | Nov 1999 | A |
6002439 | Murakami et al. | Dec 1999 | A |
6002440 | Dalby et al. | Dec 1999 | A |
RE36507 | Iu | Jan 2000 | E |
6011596 | Burl et al. | Jan 2000 | A |
6026195 | Eifrig et al. | Feb 2000 | A |
6029126 | Malvar | Feb 2000 | A |
6052150 | Kikuchi | Apr 2000 | A |
6052417 | Fujiwara et al. | Apr 2000 | A |
6057832 | Lev et al. | May 2000 | A |
6057884 | Chen et al. | May 2000 | A |
6097759 | Murakami et al. | Aug 2000 | A |
6097842 | Suzuki et al. | Aug 2000 | A |
6104754 | Chujoh et al. | Aug 2000 | A |
6104757 | Rhee | Aug 2000 | A |
6122321 | Sazzad et al. | Sep 2000 | A |
6169821 | Fukunaga et al. | Jan 2001 | B1 |
6188794 | Nishikawa et al. | Feb 2001 | B1 |
6212236 | Nishida et al. | Apr 2001 | B1 |
6226327 | Igarashi | May 2001 | B1 |
6243497 | Chiang et al. | Jun 2001 | B1 |
6249318 | Girod et al. | Jun 2001 | B1 |
6282240 | Fukunaga et al. | Aug 2001 | B1 |
6289054 | Rhee | Sep 2001 | B1 |
6307973 | Nishikawa et al. | Oct 2001 | B2 |
6324216 | Igarashi et al. | Nov 2001 | B1 |
6333948 | Kurobe et al. | Dec 2001 | B1 |
6359929 | Boon | Mar 2002 | B1 |
6370276 | Boon | Apr 2002 | B2 |
6373895 | Saunders et al. | Apr 2002 | B2 |
6400990 | Silvian | Jun 2002 | B1 |
6404813 | Haskell et al. | Jun 2002 | B1 |
6408029 | McVeigh et al. | Jun 2002 | B1 |
6415055 | Kato | Jul 2002 | B1 |
6415326 | Gupta et al. | Jul 2002 | B1 |
6418166 | Lin et al. | Jul 2002 | B1 |
6421387 | Rhee | Jul 2002 | B1 |
6441754 | Wang et al. | Aug 2002 | B1 |
6499060 | Wang et al. | Dec 2002 | B1 |
6535558 | Suzuki et al. | Mar 2003 | B1 |
6560284 | Girod et al. | May 2003 | B1 |
6563953 | Wu et al. | May 2003 | B2 |
6625215 | Faryar et al. | Sep 2003 | B1 |
6629318 | Radha et al. | Sep 2003 | B1 |
6640145 | Hoffberg et al. | Oct 2003 | B2 |
6704360 | Haskell et al. | Mar 2004 | B2 |
6735345 | Lin et al. | May 2004 | B2 |
6765963 | Karczewicz et al. | Jul 2004 | B2 |
6785331 | Jozawa et al. | Aug 2004 | B1 |
6807231 | Wiegand et al. | Oct 2004 | B1 |
6907460 | Loguinov et al. | Jun 2005 | B2 |
6912584 | Wang et al. | Jun 2005 | B2 |
7006881 | Hoffberg et al. | Feb 2006 | B1 |
7012893 | Bahadiroglu | Mar 2006 | B2 |
7124333 | Fukushima et al. | Oct 2006 | B2 |
7203184 | Ido et al. | Apr 2007 | B2 |
7242716 | Koto et al. | Jul 2007 | B2 |
7320099 | Miura et al. | Jan 2008 | B2 |
7385921 | Itakura et al. | Jun 2008 | B2 |
7512698 | Pawson | Mar 2009 | B1 |
7545863 | Haskell et al. | Jun 2009 | B1 |
7577198 | Holcomb | Aug 2009 | B2 |
7609895 | Elton | Oct 2009 | B2 |
7685305 | Wang et al. | Mar 2010 | B2 |
7734821 | Wang et al. | Jun 2010 | B2 |
7827458 | Salsbury et al. | Nov 2010 | B1 |
8548051 | Wang et al. | Oct 2013 | B2 |
8634413 | Lin et al. | Jan 2014 | B2 |
9232219 | Wang et al. | Jan 2016 | B2 |
20010026677 | Chen et al. | Oct 2001 | A1 |
20020034256 | Talluri et al. | Mar 2002 | A1 |
20020097800 | Ramanzin | Jul 2002 | A1 |
20020105909 | Flanagan et al. | Aug 2002 | A1 |
20020113898 | Mitsuhashi | Aug 2002 | A1 |
20020114391 | Yagasaki et al. | Aug 2002 | A1 |
20020114392 | Sekiguchi et al. | Aug 2002 | A1 |
20020126754 | Shen et al. | Sep 2002 | A1 |
20030099292 | Wang et al. | May 2003 | A1 |
20030138150 | Srinivasan | Jul 2003 | A1 |
20030156648 | Holcomb et al. | Aug 2003 | A1 |
20030179745 | Tsutsumi et al. | Sep 2003 | A1 |
20030202586 | Jeon | Oct 2003 | A1 |
20040013308 | Jeon et al. | Jan 2004 | A1 |
20040066848 | Jeon et al. | Apr 2004 | A1 |
20040110499 | Kang et al. | Jun 2004 | A1 |
20040131267 | Adiletta et al. | Jul 2004 | A1 |
20040233992 | Base et al. | Nov 2004 | A1 |
20050123274 | Crinon et al. | Jun 2005 | A1 |
20050135484 | Lee et al. | Jun 2005 | A1 |
20050147167 | Dumitras et al. | Jul 2005 | A1 |
20050193311 | Das et al. | Sep 2005 | A1 |
20050286542 | Shores et al. | Dec 2005 | A1 |
20060120464 | Hannuksela | Jun 2006 | A1 |
20060140281 | Nagai et al. | Jun 2006 | A1 |
20060146830 | Lin et al. | Jul 2006 | A1 |
20060210181 | Wu et al. | Sep 2006 | A1 |
20070009044 | Tourapis et al. | Jan 2007 | A1 |
20070205928 | Chujoh et al. | Sep 2007 | A1 |
20080063359 | Grigorian | Mar 2008 | A1 |
20080151881 | Liu et al. | Jun 2008 | A1 |
20100226430 | Hamilton et al. | Sep 2010 | A1 |
20120329779 | Griffin | Dec 2012 | A1 |
20130010861 | Lin et al. | Jan 2013 | A1 |
Number | Date | Country |
---|---|---|
0579319 | Jan 1994 | EP |
0612156 | Sep 1994 | EP |
0614318 | Sep 1994 | EP |
0707425 | Oct 1994 | EP |
0625853 | Nov 1994 | EP |
5130595 | May 1993 | JP |
6030394 | Feb 1994 | JP |
6078298 | Mar 1994 | JP |
09-149421 | Jun 1997 | JP |
10-079949 | Mar 1998 | JP |
10164600 | Jun 1998 | JP |
11-027645 | Jan 1999 | JP |
11150731 | Jun 1999 | JP |
H11-317946 | Nov 1999 | JP |
2000152247 | May 2000 | JP |
2001-148853 | May 2001 | JP |
2002-010265 | Jan 2002 | JP |
2002-185958 | Jun 2002 | JP |
2003-032689 | Jan 2003 | JP |
2003-152544 | May 2003 | JP |
2003-264837 | Sep 2003 | JP |
2004-215201 | Sep 2003 | JP |
2003-284078 | Oct 2003 | JP |
2004-215201 | Jul 2004 | JP |
2004-254195 | Sep 2004 | JP |
2001-0030721 | Apr 2001 | KR |
2002-0033089 | May 2002 | KR |
2003-0011211 | Feb 2003 | KR |
WO 9111782 | Aug 1991 | WO |
WO 9705746 | Feb 1997 | WO |
WO 0135650 | May 2001 | WO |
WO 0184732 | Nov 2001 | WO |
WO 2004102946 | Nov 2004 | WO |
Entry |
---|
Borgwardt, “Core Experiment on Interlaced Video Coding,” ITU Study Group 16 Question 16, VCEG-N85, 10 pp. (Oct. 2001). |
Chang et al., “Next Generation Content Representation, Creation, and Searching for New-Media Applications in Education,” Proc. IEEE, vol. 86, No. 5, pp. 884-904 (1998). |
Cote et al., “Effects of Standard-compliant Macroblock Intra Refresh on Rate-distortion Performance,” ITU-T, Study Group 16, Question 15, 2 pp. (Jul. 1998). |
Curcio et al., “Application Rate Adaptation for Mobile Streaming,” Sixth IEEE International Symposium, 2005, 6 pages. |
“DivX Multi Standard Video Encoder,” 2 pp. (Downloaded from the World Wide Web on Jan. 24, 2006). |
Farber et al., “Robust H.263 Compatible Video Transmission for Mobile Access to Video Servers,” Proc. Int. Conf. Image Processing, 4 pp. (1997). |
Gibson et al., Digital Compression of Multimedia, Chapter 10, “Multimedia Conferencing Standards,” pp. 309-362 (1998). |
Gibson et al., Digital Compression of Multimedia, Chapter 11, “MPEG Compression,” pp. 363-418 (1998). |
Golomb, “Run-Length Encodings,” 12 IEEE Trans. on Info. Theory, pp. 399-401, Jul. 1966. |
Hotter, “Optimization and Efficiency of an Object-Oriented Analysis-Synthesis Coder,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 2, pp. 181-194 (Apr. 1994). |
Hsu et al., “Software Optimization of Video Codecs on Pentium Processor With MMX Technology,” Received Mar. 14, 2001, Revised May 3, 2001, 10 pages. |
IBM Technical Disclosure Bulletin, “Method to Deliver Scalable Video Across a Distributed Computer Systems,” vol. 37, No. 5, pp. 251-256 (1997). |
Irani et al., “Video Indexing Based on Mosaic Representations,” Proc. IEEE, vol. 86, No. 5, pp. 905-921 (May 1998). |
ISO/IEC, 11172-2, “Information Technology—Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s—Part 2: Video,” pp. i-ix and 1-113 (Aug. 1993). |
ISO/IEC JTC1/SC29/WG11, “Information Technology—Coding of Audio Visual Objects: Visual, ISO/IEC 14496-2,” pp. vii-xiii, 14-23, 30-37, and 192-225 (Mar. 1998). |
ISO/IEC JTC1/SC29/WG11, N2459, “Overview of the MPEG-4 Standard,” (Oct. 1998). |
ISO/IEC, “JTC1/SC29/WG11 N2202, Information Technology—Coding of Audio-Visual Objects: Visual, ISO/IEC 14496-2,” 329 pp. (Mar. 1998). |
ISO, ISO/IEC JTC1/SC29/WG11 MPEG 97/N1642, “MPEG-4 Video Verification Model Version 7.0, 3. Encoder Definition,” pp. 1, 17-122, Bristol (Apr. 1997). |
ITU-T, “ITU-T Recommendation H.261, Video Codec for Audiovisual Services at p×64 kbits,” 25 pp. (Mar. 1993). |
ITU-T, “ITU-T Recommendation H.262, Information Technology—Generic Coding of Moving Pictures and Associated Audio Information: Video,” 205 pp. (Jul. 1995). |
ITU-T, Draft Recommendation H.263, “Video Coding for Low Bitrate Communication,” 51 pp. (Dec. 1995). |
ITU-T, “ITU-T Recommendation H.263 Video Coding for Low Bit Rate Communication,” 162 pp. (Feb. 1998). |
Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, “Joint Final Committee Draft (JFCD) of Joint Video Specification,” JVT-D157, 207 pp. (Aug. 2002). |
Karczewicz et al., “A Proposal for SP-frames,” VCEG-L27, 9 pp. (Jan. 2001). |
Karczewicz et al., “The SP- and SI-Frames Design for H.264/AVC,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, No. 7, pp. 637-644 (2003). |
Kim et al., “Low-Complexity Macroblock Mode Selection for H.264/AVC Encoders,” IEEE Int. Conf. on Image Processing, 4 pp. (Oct. 2004). |
Kim et al., “Network Adaptive Packet Scheduling for Streaming Video over Error-prone Networks,” IEEE 2004, pp. 241-246. |
Kim et al., “TCP-Friendly Internet Video Streaming Employing Variable Frame-Rate Encoding and Interpolation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, No. 7, Oct. 2000, pp. 1164-1177. |
Kurceren et al., “SP-frame demonstrations,” VCEG-N42, 4 pp. (Sep. 2001). |
Kurçeren et al., “Synchronization-Predictive Coding for Video Compression: The SP Frames Design for JVT/H.26L,” Proc. of the Int'l Conf. on Image Processing, pp. 497-500 (2002). |
Lee et al., “A Layered Video Object Coding System Using Sprite and Affine Motion Model,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, No. 1, pp. 130-145 (Feb. 1997). |
Le Gall, “MPEG: A Video Compression Standard for Multimedia Applications,” Communications of the ACM, vol. 34, No. 4, pp. 47-58 (Apr. 1991). |
Melanson, “VP3 Bitstream Format and Decoding Process,” v0.5, 21 pp. (document marked Dec. 8, 2004). |
Microsoft Corporation and RealNetworks, Inc., “Advanced Streaming Format (ASF) Specification,” pp. 1-56 (Feb. 26, 1998). |
Microsoft Corporation, “Microsoft Debuts New Windows Media Player 9 Series, Redefining Digital Media on the PC,” 4 pp. (Sep. 4, 2002) [Downloaded from the World Wide Web on May 14, 2004]. |
Microsoft Corporation, “Windows Media Technologies: Overview—Technical White Paper,” pp. 1-16 (Month unknown, 1998). |
Mook, “Next-Gen Windows Media Player Leaks to the Web,” BetaNews, 17 pp. (Jul. 2002) [Downloaded from the World Wide Web on Aug. 8, 2003]. |
On2 Technologies Inc., “On2 Introduces TrueMotion VP3.2,” 1 pp., press release dated Aug. 16, 2000 (downloaded from the World Wide Web on Dec. 6, 2012). |
Pennebaker et al., “JPEG Image Data Compression Standard,” Chapter 20, pp. 325-329 (1993). |
Regunathan et al., “Multimode Video Coding for Noisy Channels,” Proceedings of the 1997 International Conference on Image Processing, ICIP '97, 4 pages. |
Regunathan et al., “Scalable video coding with robust mode selection,” Signal Processing: Image Communication, vol. 16, No. 8, pp. 725-732 (May 2001). |
Rhee et al., “Error Recovery using FEC and Retransmission for Interactive Video Transmission,” Technical Report, 23 pp. (Jul. 1998). |
Rhee et al., “FEC-based Loss Recovery for Interactive Transmission—Experimental Study,” 23 pp. (Nov. 1998). |
Rose et al., “Towards Optimal Scalability in Predictive Video Coding”, IEEE, 1998, 5 pages. |
Rui et al., “Digital Image/Video Library and MPEG-7: Standardization and Research Issues,” ICASSP (1998). |
Sullivan, “Draft for ‘H.263++’ Annexes U, V, and W to Recommendation H.263,” ITU-T, Study Group 16, Question 15, 46 pp. (Nov. 2000). |
Sullivan et al., “The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions,” 21 pp. (Aug. 2004). |
Sullivan et al., “Meeting Report of the Twelfth Meeting of the ITU-T Video Coding Experts Group,” VCEG-L46, 43 pp. (Jan. 2001). |
Tourapis et al., “Timestamp Independent Motion Vector Prediction for P and B frames with Division Elimination,” Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, JVT-D040, 4th Meeting, Klagenfurt, Austria, 18 pages, Jul. 22-26, 2002. |
Wang et al., “Error Control and Concealment for Video Communication: A Review,” Proc. IEEE, vol. 86, No. 5, pp. 974-997 (May 1998). |
Wang et al., “Error Resilient Video Coding Techniques,” IEEE Signal Processing Magazine, pp. 61-82 (Jul. 2000). |
Wenger et al., “Intra-macroblock Refresh in Packet (Picture) Lossy Scenarios,” ITU-T, Study Group 16, Question 15, 3 pp. (Jul. 1998). |
Wenger et al., “Simulation Results for H.263+ Error Resilience Modes K, R, N on the Internet,” ITU-T, Study Group 16, Question 15, 22 pp. (Apr. 1998). |
Wiegand et al., “Block-Based Hybrid Coding Using Motion Compensated Long-Term Memory Prediction,” Picture Coding Symposium, No. 143, pp. 153-158 (Sep. 1997). |
Wiegand et al., “Fast Search for Long-Term Memory Motion-Compensated Prediction,” Proc. ICIP, vol. 3, pp. 619-622 (Oct. 1998). |
Wiegand, “H.26L Test Model Long-Term No. 9 (TML-9) draft 0,” ITU-Telecommunications Standardization Sector, Study Group 16, VCEG-N83, 74 pp. (Dec. 2001). |
Wiegand et al., “Motion-Compensating Long-Term Memory Prediction,” Proc. ICIP, vol. 2, pp. 53-56 (Oct. 1997). |
Wiegand, “Multi-frame Motion-Compensated Prediction for Video Transmissions,” Shaker Verlag, 141 pp. (Sep. 2001). |
Wien, “Variable Block-Size Transforms for Hybrid Video Coding,” Dissertation, 182 pp. (Feb. 2004). |
Wikipedia, “Theora,” 10 pp. (downloaded from the World Wide Web on Dec. 6, 2012). |
Wikipedia, “VP3,” 4 pp. (downloaded from the World Wide Web on Dec. 6, 2012). |
Wu et al., “On End-to-End Architecture for Transporting MPEG-4 Video Over the Internet,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, No. 6, Sep. 2000, pp. 923-941. |
Xiph.org Foundation, “Theora I Specification,” 206 pp. (Sep. 17, 2004). |
Xiph.org Foundation, “Theora Specification,” 206 pp. (Aug. 5, 2009). |
Yu et al., “Two-Dimensional Motion Vector Coding for Low Bitrate Videophone Applications,” Proc. Int'l Conf. on Image Processing, pp. 414-417 (1995). |
Zhang et al., “Optimal Estimation for Error Concealment in Scalable Video Coding,” IEEE Conf. on Signals, Systems and Computers, vol. 2, pp. 1374-1378 (Oct. 2000). |
Zhang et al., “Robust Video Coding for Packet Networks with Feedback,” Proceedings of the Conference on Data Compression 2000, Mar. 28-30, 2000, 10 pages. |
Number | Date | Country
---|---|---|
20160249048 A1 | Aug 2016 | US |
Relation | Number | Date | Country
---|---|---|---|
Parent | 13967069 | Aug 2013 | US |
Child | 14950889 | US | |
Parent | 12699354 | Feb 2010 | US |
Child | 13967069 | US | |
Parent | 11170281 | Jun 2005 | US |
Child | 12699354 | US | |
Parent | 11088696 | Mar 2005 | US |
Child | 11170281 | US | |
Parent | 11329107 | Dec 2002 | US |
Child | 11088696 | US | |
Parent | 09267563 | Mar 1999 | US |
Child | 11329107 | US |