With recent advances in digital data transmission techniques and digital video compression such as used in the MPEG standards (e.g., MPEG-2, MPEG-4), it is possible to deliver several digitally compressed video programs in the same bandwidth that is otherwise occupied by a single analog television (TV) channel. These capabilities provide opportunities for programming service providers (e.g., broadcasters such as CNN, ABC), network operators (e.g., cable and satellite network owners), and end users.
In a multi-program transmission environment, several programs (e.g., channels) are coded, multiplexed and transmitted over a single communication channel. Since these programs share a limited channel capacity, the aggregate bit rate of the programs must be no greater than the communication channel rate. In order to optimize the quality and efficiency of the program transmission process, the bit rate of the program or video streams can be varied as needed to manage the utilization of network bandwidth.
One video transmission technique that may be used to manage bandwidth is statistical multiplexing, which combines several programs each comprising a compressed video bit stream into a single multiplexed bit stream, e.g., for transmission of a plurality of programs on a single frequency. In this manner a service provider may provide more programs to a subscriber base using the same network infrastructure, e.g., providing four programs over the same network infrastructure that would have previously provided a single program. For a given level of video quality, the bit rate of a given compressed stream generally varies with time based on the complexity of the corresponding video signals. A statistical multiplexer attempts to estimate the complexity of the various video frame sequences of a video signal and allocates channel bits among the corresponding compressed video bit streams. In some cases the bits are allocated so as to provide a desired (e.g., approximately constant) level of video quality across all of the multiplexed streams. For example, a given video frame sequence with a relatively large amount of spatial activity or motion may be more complex than other sequences and therefore allocated more bits than the other sequences. On the subscriber side, a network element, such as a set top box, pulls the desired program out of the multiplexed stream by using a tuner to tune to the multiplex frequency and a decoder to decode the desired program, such as by using its associated program ID (PID).
One problem with statistical multiplexing is that it requires the provision of specialized equipment within the network, which is cost effective only for broadcast programs that serve a large number of clients simultaneously. That is, given a large number of customers, the cost per customer may be extremely small, even for expensive encoders and rate shapers. However, for applications such as video-on-demand (VOD) and network digital video recording (NDVR), which involve a single customer, the bandwidth management costs associated with statistical multiplexing may be unacceptably high.
Another technique that may be used to manage bandwidth involves adaptively streaming data as streaming media. Streaming media differs from ordinary media in the sense that streaming media may be played out as soon as it is received rather than waiting for the entire file to be transferred over the network. One advantage associated with streaming media is that it allows the user to begin viewing the content immediately or on a real-time basis with rather short delay. In contrast, simply downloading a media file to a customer, which is a very effective technique to manage bandwidth because it allows for very wide swings in bit rate, does not allow the user to begin viewing the content on a real-time basis. The downloaded media must also be stored prior to playback, requiring significant and often expensive storage capabilities on the subscriber device. This delay and associated additional expense may be unacceptable to customers, thereby making streaming media more attractive.
In adaptive streaming, the encoded bit rate of a media data stream is dynamically adjusted depending on specific network conditions. To achieve this, the streaming server continuously estimates the current state of the network and adjusts the transmitted bit rate upward or downward depending on the bandwidth available on that particular data link.
One problem with media streaming architectures is the tight coupling that is required between the streaming server and client. The communication between the client and server that streaming media requires creates additional server overhead, because the server tracks the current state of each client. Significantly, this limits the scalability of the server as the number of media data streams being streamed increases. In addition, the client cannot quickly react to changing conditions, such as increased packet loss, reduced bandwidth, user requests for different content or to modify the existing content (e.g., speed up or rewind), and so forth, without first communicating with the server and waiting for the server to adapt and respond. Often, when a client reports a lower available bandwidth, the server does not adapt quickly enough: packets that exceed the available bandwidth are not received, new lower bit rate packets are not sent from the server in time, and the user on the client notices breaks in the media. To avoid these problems, clients often buffer data, but buffering introduces latency, which for live events may be unacceptable.
Accordingly, neither statistical multiplexing nor adaptive streaming is a fully satisfactory method of managing bandwidth in a network.
In accordance with one aspect of the invention, a method is provided for delivering streaming media content to client devices over a network. The method includes receiving, for each of a plurality of services, a plurality of media streams encoded at different bit rates. The plurality of media streams for each service contain common content to be received by one or more of the client devices. Each of the media streams includes a plurality of segments having a prescribed duration. For each service a need parameter is obtained for each segment contained within the media streams for that service. Each need parameter reflects a bit rate needed to transmit over the network the respective segment of the media streams for that service at a given quality level. One of the media streams for each service is selected by allocating bandwidth to the media streams based at least in part on the need parameters for each corresponding segment of the media streams. The selected media streams are multiplexed to thereby form a multiplexed stream. The multiplexed stream is adaptively streamed over the network to the client devices.
In accordance with another aspect of the invention, a statistical multiplexing streaming server is provided. The statistical multiplexing streaming server includes a statistical multiplexer for receiving, for each of a plurality of services, a plurality of media streams encoded at different bit rates. The plurality of media streams for each service contain common content to be received by one or more client devices. Each of the media streams has associated therewith one or more need parameters reflecting a bit rate needed to transmit the respective media stream over a network at a given quality level. The statistical multiplexer includes (i) a rate control processor for selecting one of the plurality of media streams for each service by allocating bandwidth to the media streams based on the need parameters associated therewith, and (ii) a multiplexer for multiplexing each of the selected media streams to form a multiplexed media stream. The statistical multiplexing streaming server also includes an adaptive streaming server for receiving the multiplexed media stream from the statistical multiplexer and adaptively streaming the multiplexed media stream over a network to the client devices.
Both statistical multiplexing and adaptive streaming can be used together to manage bandwidth in a manner that can overcome the aforementioned problems. As detailed below, for each service (e.g., program, channel) to be delivered to client devices a multi-bit rate video encoder creates two or more media streams which have the same content, but which are encoded at different bit rates. The video encoder appends to or otherwise associates with each media stream a need parameter. The need parameter gives an indication of the bit rate required to transmit a short segment (e.g., 1-5 seconds) of the media stream with which it is associated. The required bit rate may be determined, for example, by a desired or otherwise specified video quality level. A statistical multiplexer aggregates the need parameters on a segment by segment basis for all the services and allocates bandwidth to each service, again on a segment by segment basis, based on its need parameter relative to the aggregate need parameter for all the services.
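The basic proportional allocation just described can be illustrated as follows. This is a minimal sketch with hypothetical service names and numbers, not the actual implementation: each service's aggregated need parameter for the current segment determines its share of the channel.

```python
def allocate_bandwidth(need_params, total_bandwidth):
    """Allocate channel bandwidth to services in proportion to each
    service's need parameter for the current segment."""
    aggregate_need = sum(need_params.values())
    return {service: total_bandwidth * need / aggregate_need
            for service, need in need_params.items()}

# Hypothetical need parameters for three services in one segment,
# sharing a 20 Mbps channel:
needs = {"news": 30, "sports": 50, "movie": 20}
alloc = allocate_bandwidth(needs, total_bandwidth=20_000_000)
# The most complex service ("sports") receives half the channel.
```

Refinements that account for per-service minimum and maximum bandwidth limits are described below.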
A headend 210 is in communication with each of the client devices 220, 230 and 240 via IP network 270. Mobile phone 220 communicates with headend 210 over the IP network 270 and a wireless network such as a GSM or a UMTS network, for example. Set top terminal 230 communicates with headend 210 over the IP network 270 and a hybrid fiber/coax (HFC) network 260 and PC 240 communicates with the headend 210 over the IP network 270, typically via an Internet service provider (not shown). Of course, the architecture depicted in
The headend 210 is the facility from which a network operator delivers programming content and provides other services to the client devices. The headend 210 includes a multi-bit rate video encoder 218 and a statistical multiplexing streaming server 215. The multi-bit rate video encoder 218 supplies programming content to the statistical multiplexing streaming server 215 at different bit rates. That is, for any given service (e.g., program) multi-bit rate video encoder 218 provides multiple media streams for that service which require different amounts of bandwidth to be transmitted. For instance, multi-bit rate video encoder 218 may provide multiple media streams of a given service at, e.g., 2, 4, 8 and 15 Mbps, thereby providing two standard definition media streams and two high definition media streams.
It should be noted that in some cases some or all of the functionality of the statistical multiplexing streaming server 215 may be performed in one of the networks themselves instead of the headend. For instance, the functionality of the statistical multiplexing streaming server 215 may be transferred to one or more hubs in the HFC network 260.
Statistical multiplexing streaming server 215 receives the multiple media streams for each service from the multi-bit rate video encoder 218. Statistical multiplexing streaming server 215 selects one media stream for each service, which are then multiplexed and streamed to the client devices over the appropriate network. The term “streaming” is used to indicate that the data representing the media content is provided over a network to a client device and that playback of the content can begin prior to the content being delivered in its entirety (e.g., providing the data on an as-needed basis rather than pre-delivering the data in its entirety before playback).
As is well known to those of ordinary skill in the art, the transport of media streams preferably uses a video encoding standard, such as MPEG-2, and a transport standard such as MPEG-2 Transport Stream, MPEG-4, or the Real-time Transport Protocol (RTP), as well as coding standards for audio and ancillary data. Higher end client devices such as set top terminals typically receive content encapsulated in an MPEG-2 Transport Stream, whereas lower end client devices such as a PDA receive content encapsulated using a transport protocol such as RTP.
Since many higher end devices such as set top terminals are capable of receiving and decoding MPEG-2 Transport Streams, statistical multiplexing streaming server 215 will typically deliver the encoded content in this format. On the other hand, if, for instance, RTP is used as the delivery mechanism, in the example shown in
Of course, the media streams may be encoded by the multi-bit rate video encoder 218 in accordance with a variety of media formats and is not limited MPEG-2. For instance, the media streams may be encoded in accordance with other media formats, including but not limited to Hypertext Markup Language (HTML), Virtual Hypertext Markup Language (VHTML), X markup language (XML), H.261, H.263, H.264 or VC1 formats as well as any of the MPEG standards such as MPEG-2 and MPEG-4. A video stream that conforms to the MPEG-2 standard will be used herein for illustrative purposes only and not as a limitation on the invention.
Each media stream delivered by the multi-bit rate video encoder 218 may be logically divided into a series of segments that have a predetermined duration. Each media stream includes a need parameter for each segment, which may be carried in metadata or other syntax formats. The need parameter is one measure that gives information regarding the difficulty involved in compressing the segment with which it is associated, which in turn gives an indication of the bit rate required to transmit that segment of the media stream at a given quality level. Since the need parameter inherently depends on the complexity of a given segment, each segment has only one need parameter per service. That is, corresponding segments of the various media streams encoded at different bit rates for a given service will all have the same need parameter.
The need parameters for each segment of the media streams may be calculated in any of a variety of different ways. For instance, in the case of an MPEG-2 digital stream the complexity of a video frame is measured by the product of the quantization level (QL) used to encode that frame and the number of bits used for coding the frame (R). This means the complexity of a frame is not known until it has been encoded. If the video encoder 218 receives content from a content provider which has already been compressed using a suitable encoding technique such as MPEG-2, then the need parameter may specify the actual complexity of each media stream interval. In this case the video encoder 218 may serve as a transcoder which converts a pre-compressed video bit stream into another bit stream at a new rate. For simplicity, both encoders and transcoders are referred to herein as encoders.
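The QL × R complexity measure described above can be sketched as follows. The per-frame product is taken directly from the text; aggregating frame complexities into a per-segment need parameter by summation is one possible choice among several, shown here only for illustration:

```python
def frame_complexity(quantization_level, coded_bits):
    """MPEG-2 frame complexity: the product of the quantization level
    (QL) used to encode the frame and the bits spent coding it (R)."""
    return quantization_level * coded_bits

def segment_need_parameter(frames):
    """Derive a segment's need parameter from its frames' complexities,
    here simply their sum (an illustrative aggregation choice).
    'frames' is a sequence of (quantization_level, coded_bits) pairs."""
    return sum(frame_complexity(ql, r) for ql, r in frames)
```

Note that both values are only known after encoding, which is why a transcoder can report actual complexity while a first-pass encoder must estimate it, as discussed next.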
On the other hand, if the content from the content provider is first encoded by the video encoder 218, then the need parameter provided by the video encoder 218 may not specify the actual complexity of a segment because the need parameter may be calculated before the segment has actually been encoded. In this case the need parameter provided by the video encoder 218 may estimate the complexity by using, for example, some pre-encoding statistics about the media stream, such as intra-frame activity, or motion estimation (ME) scores, which can be used as a substitute for the traditional measure of complexity. Examples illustrating how a need parameter can be calculated are shown in U.S. Pat. No. 6,731,685.
As previously mentioned, the need parameter may be included as metadata in the encoded media streams provided by the video encoder 218. The manner in which the metadata is carried by the media stream will in part depend on the encoding scheme that is used. For instance, in the case of MPEG-2, the MPEG protocol supports the carriage of metadata that can be used to provide instructions or other information to a downstream device such as a statistical multiplexer. More particularly, MPEG-2 supports the incorporation of private metadata, which in some implementations may be conveniently used to carry the need parameter metadata. Such private metadata may be incorporated into the MPEG-2 encoded video stream as private metadata at the transport stream level, the program elementary stream (PES) level, or the video sequence level (i.e., the level at which images such as I, B and P pictures are defined). The private metadata may be embodied in any appropriate data structure that may in part depend on the level at which the information is embedded. For instance, in one particular implementation the need parameter may be located in the adaptation field of the transport packets and a descriptor that describes the structure of the metadata may be located in the program map table (PMT).
Alternatively, the need parameters may be incorporated into a manifest that is transmitted by the video encoder 218 to the statistical multiplexing streaming server 215 as a separate file, either before or after the media streams are transmitted. For example, one streaming technique that employs such a manifest is specified in the HTTP Live Streaming (HLS) protocol.
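For illustration, a manifest of this general kind, patterned after an HLS master playlist, might look like the following. The filenames are hypothetical, and note that a standard HLS playlist advertises only per-variant bandwidth; carrying per-segment need parameters would be an extension beyond the base format:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=720x480
service1_2mbps.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=1920x1080
service1_8mbps.m3u8
```

Each entry identifies one of the media streams for a service, encoded at a different bit rate, from which the statistical multiplexer may choose.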
In the example shown in
The statistical multiplexer 520 selects one of the media streams for each service 515, 518 and 522 based on the maximum available bandwidth on the network and the need parameter for the corresponding segments of each service. The particular media stream that is selected for each service may differ from segment to segment. In this way more complex segments of a given media stream can be allocated more bits than less complex segments of the media stream without exceeding the total bandwidth that is available. The statistical multiplexer 520 then multiplexes the selected media stream segments and sends them to the adaptive streaming server. In
As mentioned above, the rate control processor 325 allocates bandwidth (i.e., bit rates) to the services that are to be delivered to client devices by collecting the latest need parameters from a media stream associated with each service. As previously mentioned, the media streams associated with each program or service all have the same need parameters, and thus the need parameters only need to be extracted from one media stream for each of the services. In some cases the rate control processor 325 may also have access to minimum and maximum bandwidth limits established by the user for each individual channel and/or the capabilities of the client device. Prior to selecting the appropriate media stream for a given segment of each service, the rate control processor 325 sums all the need parameters for that segment and assigns a need bit rate to each stream in proportion to the stream's need parameter. A stream whose need parameter indicates that it is more complex during that particular interval than another stream will be allocated more bandwidth and hence assigned a higher bit rate.
If maximum and minimum bandwidth limits are established by each client device (or group of client devices) that is receiving a respective stream, the rate control processor 325 will generally attempt to honor all minimum bandwidth limits first. If the sum total of all minimum bandwidths for a given segment exceeds the total bandwidth available to all the streams, the rate control processor 325 distributes the total bandwidth amongst all the streams in proportion to their minimum bandwidth limits. Otherwise, each service is given its minimum bandwidth for that segment.
The minimum bandwidth assigned to any particular media stream is subtracted from the need bit rate previously calculated for that stream, since this indicates that a portion of the need has been satisfied. After allocating the minimum bandwidths, any remaining bandwidth is distributed to all the services in proportion to their remaining needs, with the constraint that no stream can exceed its maximum bandwidth limit.
If there is still available bandwidth remaining for that segment, (which is possible if one or more media streams reach their maximum bandwidth limits), the remaining bandwidth may be distributed to those streams that have not yet reached their maximum bandwidth limit. This distribution may be made according to the ratio of a given stream's need parameter to the sum of the need parameters belonging to those streams that have not reached their maximum bandwidth limit.
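The allocation scheme described in the preceding paragraphs can be sketched as a two-pass procedure. This is a simplified illustration with hypothetical data structures, not the actual rate control implementation: minimums are honored first (or prorated when they cannot all be met), the remainder is shared in proportion to remaining need subject to the maximum limits, and any bandwidth freed up by capped streams is redistributed among the streams still below their maximums.

```python
def allocate_with_limits(streams, total_bw):
    """Two-pass bandwidth allocation for one segment, honoring per-stream
    minimum and maximum limits. 'streams' maps a service name to a dict
    with 'need' (need bit rate), 'min' and 'max' bandwidth limits."""
    min_sum = sum(s["min"] for s in streams.values())
    if min_sum > total_bw:
        # Minimums cannot all be met: share the channel in proportion
        # to the minimum bandwidth limits.
        return {k: total_bw * s["min"] / min_sum for k, s in streams.items()}

    # Grant every stream its minimum; the granted portion of the need
    # is considered satisfied and is subtracted from the need bit rate.
    alloc = {k: s["min"] for k, s in streams.items()}
    need = {k: max(s["need"] - s["min"], 0) for k, s in streams.items()}
    leftover = total_bw - min_sum

    # First pass: distribute the remainder in proportion to remaining
    # need, without letting any stream exceed its maximum limit.
    need_sum = sum(need.values())
    if need_sum > 0:
        for k, s in streams.items():
            alloc[k] += min(leftover * need[k] / need_sum, s["max"] - alloc[k])

    # Second pass: bandwidth freed up by capped streams goes to the
    # streams still below their maximums, in proportion to their needs.
    leftover = total_bw - sum(alloc.values())
    open_keys = [k for k in streams if alloc[k] < streams[k]["max"]]
    open_need = sum(need[k] for k in open_keys)
    if leftover > 0 and open_need > 0:
        for k in open_keys:
            alloc[k] += min(leftover * need[k] / open_need,
                            streams[k]["max"] - alloc[k])
    return alloc
```

For example, with a 12 Mbps channel, a stream capped at 4 Mbps surrenders its unusable share of the remainder to an uncapped stream with equal need.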
Once the rate controller has allocated bandwidth to each service, it selects the media stream for each service having a bit rate that most closely conforms to the allocated bandwidth. In particular, the rate controller 325 will generally select a media stream having a bit rate that is closest to, but not greater than, the allocated bandwidth. In this way the maximum available bandwidth will not be exceeded when the selected media streams are multiplexed together. The media streams which are selected for each service may of course vary from segment to segment based on the relative amount of bandwidth each requires. As a consequence, the statistical multiplexer 520 may regularly change its selection, switching from one media stream to another for each service.
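The selection rule just described can be sketched as follows. The fallback to the lowest-rate stream when even that stream exceeds the allocation is an assumption added for completeness, not something the description above specifies:

```python
def select_stream(available_bit_rates, allocated_bw):
    """Pick the encoded stream whose bit rate is closest to, but not
    greater than, the bandwidth allocated to the service. If no stream
    fits, fall back to the lowest-rate stream (an assumed policy)."""
    eligible = [rate for rate in available_bit_rates if rate <= allocated_bw]
    return max(eligible) if eligible else min(available_bit_rates)

# With streams encoded at 2, 4, 8 and 15 Mbps and a 9 Mbps allocation,
# the 8 Mbps stream is selected for this segment.
```

Because the allocation changes segment by segment, consecutive calls may select different streams of the same service, which is what drives the stream switching noted above.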
The media streams that are selected by the rate controller 325 are sent by the FIFO buffers 3121, 3122 and 3123 to the multiplexer 320. The multiplexer 320 multiplexes the selected streams and sends them to the transport buffer 330 for transmission to the adaptive streaming server 510 shown in
One example of a streaming server that may employ the methods, techniques and systems described herein is shown in
The streaming server 100 includes a memory array 101, an interconnect device 102, and stream server modules 103a through 103n (103). Memory array 101 is used to store the on-demand content and could be many Gigabytes or Terabytes in size. Such memory arrays may be built from conventional solid state memory including, but not limited to, dynamic random access memory (DRAM) and synchronous DRAM (SDRAM). The stream server modules 103 retrieve the content from the memory array 101 and generate multiple asynchronous streams of data that can be transmitted to the client devices. The interconnect 102 controls the transfer of data between the memory array 101 and the stream server modules 103. The interconnect 102 also establishes priority among the stream server modules 103, determining the order in which the stream server modules receive data from the memory array 101.
The communication process starts with a stream request being sent from a client device (e.g., client devices 220, 230 and 240 in
Control functions, or non-streaming payloads, are handled by the master CPU 107. For instance, stream control in accordance with the RTSP protocol is performed by CPU 107. Program instructions in the master CPU 107 determine the location of the desired content or program material in memory array 101. The memory array 101 is a large scale memory buffer that can store video, audio and other information. In this manner, the server system 100 can provide a variety of content to multiple customer devices simultaneously. Each client device can receive the same content or different content. The content provided to each client device is transmitted as a unique asynchronous media stream of data that may or may not coincide in time with the unique asynchronous media streams sent to other client devices.
If the requested content is not already resident in the memory array 101, a request to load the program is issued over signal line 118, through a backplane interface 105 and over a signal line 119. An external processor or CPU (not shown) responds to the request by loading the requested program content over a backplane line 116, under the control of backplane interface 104. Backplane interface 104 is connected to the memory array 101 through the interconnect 102. This allows the memory array 101 to be shared by the stream server modules 103, as well as the backplane interface 104. The program content is written from the backplane interface 104, sent over signal line 115, through interconnect 102, over signal line 112, and finally to the memory array 101.
When the first block of program material has been loaded into memory array 101, the streaming output can begin. Streaming output can also be delayed until the entire program has been loaded into memory array 101, or at any point in between. Data playback is controlled by a selected one or more stream server modules 103. If the stream server module 103a is selected, for example, the stream server module 103a sends read requests over signal line 113a, through the interconnect 102, over a signal line 111 to the memory array 101. A block of data is read from the memory array 101, sent over signal line 112, through the interconnect 102, and over signal line 113a to the stream server module 103a. Once the block of data has arrived at the stream server module 103a, the transport protocol stack is generated for this block and the resulting primary media stream is sent to transport network 122a over signal line 114a. The transport network then carries the primary media stream to the client device. This process is repeated for each data block contained in the program source material.
If the requested program content already resides in the memory array 101, the CPU 107 informs the stream server module 103a of the actual location in the memory array. With this information, the stream server module can begin requesting the program stream from memory array 101 immediately.
The media access controllers 402 are connected, via signal lines 412a-412n (412), to media interface modules 403a-403n (403), which are responsible for the physical media of the network connection. This could be a twisted-pair transceiver for Ethernet, a fiber-optic interface for Ethernet or SONET, or any other suitable physical interface, whether existing now or created in the future, that is appropriate for the low-level physical interface of the desired network. The media interface modules 403 then send the primary media streams over the signal lines 114a-114n (114) to the appropriate client device or devices.
In practice, the stream server processor 401 divides the input and output packets depending on their function. If the packet is an outgoing payload packet, it can be generated directly in the stream server processor (SSP) 401. The SSP 401 then sends the packet to MAC 402a, for example, over signal line 411a. The MAC 402a then uses the media interface module 403a and signal line 412a to send the packet as part of the primary stream to the network over signal line 114a.
Client control requests are received over network line 114a by the media interface module 403a, signal line 412a and MAC 402a. The MAC 402a then sends the request to the SSP 401. The SSP 401 then separates the control packets and forwards them to the module CPU 404 over the signal line 413. The module CPU 404 then utilizes a stored program in ROM/Flash ROM 406, or the like, to process the control packet. For program execution and storing local variables, it is typical to include some working RAM 407. The ROM 406 and RAM 407 are connected to the CPU over local bus 415, which is usually directly connected to the CPU 404.
The module CPU 404 from each stream server module uses signal line 414, control bus interface 405, and bus signal line 117 to forward requests for program content and related system control functions to the master CPU 107 in
As used in this application, the terms “component,” “module,” “unit,” “system,” “apparatus,” “interface,” or the like are generally intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
Furthermore, the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed subject matter. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. For example, computer readable storage media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick, key drive . . . ). Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.
Although described specifically throughout the entirety of the instant disclosure, representative embodiments of the present invention have utility over a wide range of applications, and the above discussion is not intended and should not be construed to be limiting, but is offered as an illustrative discussion of aspects of the invention. What has been described and illustrated herein are embodiments of the invention along with some of their variations. The terms, descriptions and figures used herein are set forth by way of illustration only and are not meant as limitations. Those skilled in the art will recognize that many variations are possible within the spirit and scope of the invention, wherein the invention is intended to be defined by the following claims—and their equivalents—in which all terms are meant in their broadest reasonable sense unless otherwise indicated.