Adaptive streaming allows a client device to request content segments appropriate for the currently available network throughput. In order to support adaptive streaming of content, such as live programming or video on demand (VOD) content, content delivery systems typically make available segments of content encoded at multiple bitrates. A client device may initially begin consuming high bitrate encoded content segments and then experience a drop in available network throughput. In response, the client device may begin retrieving segments of content with a lower bitrate encoding. This allows the client device to continue consuming the content without interruption. A problem with current systems is that the segment boundaries corresponding to segments of various bitrate encodings may fall out of alignment, causing a segment at one bitrate encoding to have different boundaries from a segment at a second bitrate encoding. When a user device requests segments from among the different bitrate encodings, the user may experience inconsistent video playback, or the scene may skip forward (or backward) at the misaligned segment boundaries.
Some encoders insert encoder boundary points (EBPs) into encoded content streams to aid in keeping boundaries aligned during the segmentation process. However, the boundary points may fall out of alignment for various reasons, or packets of encoded content may be lost, along with any encoder boundary points in those packets, before reaching the segmenters. As a result, segment boundaries may become misaligned among streams having various bitrate encodings.
In light of the foregoing background, the following presents a simplified summary of the present disclosure in order to provide a basic understanding of some aspects described herein. This summary is not an extensive overview, and is not intended to identify key or critical elements or to delineate the scope of the claims. The following summary merely presents various described aspects in a simplified form as a prelude to the more detailed description provided below.
In some aspects of the disclosure, a method is provided to synchronize the segmentation of multiple bitrate encodings of a content stream. In other aspects, a method is provided for determining a synchronization point for use in segmenting multiple encoded streams of a content item. In aspects of the disclosure, a controller may coordinate a segmenting process among a number of segmenters, causing the segmenters to create segments with time-aligned segment boundaries.
Some features herein are illustrated by way of example, and not by way of limitation, in the accompanying drawings. In the drawings, like numerals reference similar elements between the drawings.
In the following description of various illustrative embodiments, reference is made to the accompanying drawings, which form a part hereof, and in which is shown, by way of illustration, various embodiments in which aspects of the disclosure may be practiced. It is to be understood that other embodiments may be utilized and structural and functional modifications may be made, without departing from the scope of the present disclosure.
There may be one link 101 originating from the local office 103, and it may be split a number of times to distribute the signal to various premises 102 in the vicinity (which may be many miles) of the local office 103. The links 101 may include components not illustrated, such as splitters, filters, amplifiers, etc. to help convey the signal clearly, but in general each split introduces a bit of signal degradation. Portions of the links 101 may also be implemented with fiber-optic cable, while other portions may be implemented with coaxial cable, other lines, or wireless communication paths.
The local office 103 may include an interface 104, such as a termination system (TS), for example a cable modem termination system (CMTS) in an example of an HFC-type network, which may be a computing device configured to manage communications between devices on the network of links 101 and backend devices such as servers 105-107 (to be discussed further below). In the example of an HFC-type network, the TS may be as specified in a standard, such as the Data Over Cable Service Interface Specification (DOCSIS) standard, published by Cable Television Laboratories, Inc. (a.k.a. CableLabs), or it may be a similar or modified device instead. The TS may be configured to place data on one or more downstream frequencies to be received by modems at the various premises 102, and to receive upstream communications from those modems on one or more upstream frequencies. The local office 103 may also include one or more network interfaces 108, which can permit the local office 103 to communicate with various other external networks 109. These networks 109 may include, for example, Internet Protocol (IP) networks, Internet devices, telephone networks, cellular telephone networks, fiber optic networks, local wireless networks (e.g., WiMAX), satellite networks, and any other desired network, and the interface 108 may include the corresponding circuitry needed to communicate on the network 109, and to other devices on the network such as a cellular telephone network and its corresponding cell phones.
As noted above, the local office 103 may include a variety of servers 105-107 that may be configured to perform various functions. For example, the local office 103 may include a push notification server 105. The push notification server 105 may generate push notifications to deliver data and/or commands to the various premises 102 in the network (or more specifically, to the devices in the premises 102 that are configured to detect such notifications). The local office 103 may also include a content server 106. The content server 106 may be one or more computing devices that are configured to provide content to users in the homes. This content may be, for example, video on demand movies, television programs, songs, audio, services, information, text listings, etc. In some embodiments, the content server 106 may include software to validate (or initiate the validation of) user identities and entitlements, locate and retrieve (or initiate the locating and retrieval of) requested content, encrypt the content, and initiate delivery (e.g., streaming, transmitting via a series of content fragments) of the content to the requesting user and/or device.
The local office 103 may also include one or more application servers 107. An application server 107 may be a computing device configured to offer any desired service, and may run various languages and operating systems (e.g., servlets and JSP pages running on Tomcat/MySQL, OSX, BSD, Ubuntu, Red Hat Linux, HTML5, JavaScript, AJAX and COMET). For example, an application server may be responsible for collecting television program listings information and generating a data download for electronic program guide listings. Another application server may be responsible for monitoring user media habits and collecting that information for use in selecting advertisements. Another application server may be responsible for formatting and inserting advertisements in a video stream and/or content item being transmitted to the premises 102. It should be understood by those skilled in the art that the same application server may be responsible for one or more of the above listed responsibilities.
An example premises 102a may include an interface 110 (such as a modem, or another receiver and/or transmitter device suitable for a particular network), which may include transmitters and receivers used to communicate on the links 101 and with the local office 103. The interface 110 may be, for example, a coaxial cable modem (for coaxial cable lines 101), a fiber interface node (for fiber optic lines 101), or any other desired modem device. The interface 110 may be connected to, or be a part of, a gateway interface device 111. The gateway interface device 111 may be a computing device that communicates with the interface 110 to allow one or more other devices in the home to communicate with the local office 103 and other devices beyond the local office. The gateway interface device 111 may be a set-top box (STB), digital video recorder (DVR), computer server, or any other desired computing device. The gateway interface device 111 may also include (not shown) local network interfaces to provide communication signals to other devices in the home (e.g., user devices), such as televisions 112, additional STBs 113, personal computers 114, laptop computers 115, wireless devices 116 (wireless laptops, tablets and netbooks, mobile phones, mobile televisions, personal digital assistants (PDA), etc.), telephones 117, window security sensors 118, door home security sensors 119, tablet computers 120, personal activity sensors 121, video cameras 122, motion detectors 123, microphones 124, and/or any other desired computers, sensors, and/or other devices. Examples of the local network interfaces may include Multimedia Over Coax Alliance (MoCA) interfaces, Ethernet interfaces, universal serial bus (USB) interfaces, wireless interfaces (e.g., IEEE 802.11), Bluetooth interfaces, and others.
One or more aspects of the disclosure may be embodied in computer-usable data and/or computer-executable instructions, such as in one or more program modules, executed by one or more computers (such as computing device 200) or other devices to perform any of the functions described herein. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types when executed by a processor in a computer or other data processing device. The computer executable instructions may be stored on one or more computer readable media such as a hard disk, optical disk, removable storage media, solid state memory, RAM, etc. The functionality of the program modules may be combined or distributed as desired in various embodiments. In addition, the functionality may be embodied in whole or in part in firmware or hardware equivalents such as integrated circuits, field programmable gate arrays (FPGA), and the like. Example data structures may be used to illustrate one or more aspects described herein, but these are merely illustrative examples.
The transcoder 304 may comprise any computing device that incorporates the use of at least one processor and at least one memory for storing software or processor executable instructions. The transcoder 304 may comprise random access memory (RAM), non-volatile memory, and an input/output (I/O) module for communicating with other components or elements of the example network configuration 300. A single transcoder 304 is depicted in
In some embodiments, the transcoder 304 may transcode a received content stream into various media rates or bitrates. In some embodiments, a received content stream may be transcoded into multiple representations of the content stream, with each representation at a different bitrate encoding. For example, a received content stream may be transcoded into MPEG representations (streams) at 250 Kbps, 500 Kbps, 1 Mbps, and the like. In some embodiments, a received content stream may be transcoded into various other alternate or additional representations, including HEVC (h.265) representations.
In order to produce transcoded content conforming to one or more of the many adaptive streaming technologies, such as HTTP Live Streaming (HLS), HTTP Dynamic Streaming (HDS), HTTP Smooth Streaming (HSS), or Dynamic Adaptive Streaming over HTTP (DASH), the transcoded content stream should be segmented into smaller chunks. The transcoder 304 may create and associate encoder boundary points (EBP) with each transcoded stream. Each EBP may be placed at a position corresponding to a particular program time in relation to a reference clock for the content, such as a presentation timestamp (PTS) in MPEG content. The encoder boundary points may convey information for use by the segmenters 308 and 312, discussed below, in segmenting the transcoded streams. In some embodiments, an encoder boundary point may comprise a structure as defined by the CableLabs Encoder Boundary Point specification OC-SP-EBP-I01-130118. For example, the EBP may designate a suggested segmentation point in a transcoded stream.
The transcoder 304 may transmit the transcoded streams to the segmenters 308 and 312, for example, via multicast, on the network 306. The segmenters 308 and 312 may comprise any computing device that incorporates the use of at least one processor and at least one memory for storing software or processor executable instructions. The segmenters 308 and 312 may comprise random access memory (RAM), non-volatile memory, and an input/output (I/O) module for communicating with other components or elements of the example network configuration 300. In various embodiments, a segmenter may segment one or more encoded streams. One or more segmenters may be used to segment transcoded streams of a content stream.
In some embodiments, the segmenters 308 and 312 may join one or more multicast groups in order to receive the transcoded streams transmitted by the transcoder 304. A segmenter, such as the segmenter 308, may receive one or more of the transcoded streams. In some embodiments, the segmenter 308 may receive all transcoded streams of a particular content stream. In other embodiments, transcoded streams of a particular content stream may be received by more than one segmenter. In some embodiments, some of the transcoded streams of a particular content stream may be received by the segmenter 308 while other transcoded streams of the particular content stream may be received by the segmenter 312.
The segmenters 308 and 312 may segment the received transcoded streams into chunks, also referred to as segments. In some embodiments, the segmenters 308 and 312 may segment the transcoded streams according to the encoder boundary points, mentioned above. For example, in some adaptive streaming technologies, it is preferable to create segments that are time aligned, so that client devices may select segments among various media rates in response to changing network conditions and play those segments without an interruption or discontinuity in the playback of the content.
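By way of illustration only, the time-alignment property described above can be checked with a short sketch. The function name `misaligned_boundaries`, the dictionary layout, and the sample boundary times are hypothetical and are not part of the disclosure; the sketch simply assumes each representation exposes a list of segment start times in seconds.

```python
def misaligned_boundaries(boundaries_by_rate, tolerance_s=0.0):
    """Return the segment indexes at which representations disagree.

    boundaries_by_rate maps a bitrate label to a list of segment start
    times (seconds). Lists are compared index by index; comparison stops
    at the shortest list (an assumption made for simplicity).
    """
    columns = zip(*boundaries_by_rate.values())
    return [
        index
        for index, starts in enumerate(columns)
        if max(starts) - min(starts) > tolerance_s
    ]


# Hypothetical sample data: the 1 Mbps representation has a boundary at
# 5 s where the other representations have one at 4 s.
sample = {
    "250k": [0.0, 2.0, 4.0, 6.0],
    "500k": [0.0, 2.0, 4.0, 6.0],
    "1M": [0.0, 2.0, 5.0, 6.0],
}
bad_indexes = misaligned_boundaries(sample)
```

A client switching bitrates at one of the returned indexes would either repeat or skip content, which is the discontinuity the aligned segmentation is intended to avoid.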
In some embodiments, the controller 310 may communicate with the segmenters 308 and 312 in order to coordinate the segmenting process. The controller 310 may comprise any computing device that incorporates the use of at least one processor and at least one memory for storing software or processor executable instructions. The controller 310 may comprise random access memory (RAM), non-volatile memory, and an input/output (I/O) module for communicating with other components or elements of the example network configuration 300. The segmenters 308 and 312 may receive and buffer packets of the transcoded stream, as output by the transcoder 304. Each segmenter may assemble a number of packets and prepare to create a segment of the assembled packets. For example, the segmenters 308 and 312 may assemble packets to create segments each representing particular amounts of play time (2 seconds of play time, for example). As will be understood by those skilled in the art, 2 seconds of play time may represent various sized segments, according to the media rate used during transcoding. In various embodiments, the segmenters may assemble packets to create segments representing various other amounts of play time; for example, they may assemble packets to create segments representing 10 seconds of play time or 8 seconds of play time.
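As a worked illustration of how the same play time may correspond to differently sized segments, the following sketch computes an approximate payload size from the media rate and the segment duration. The function name `segment_size_bytes` is hypothetical, the example bitrates are those mentioned earlier for the transcoder, and the calculation assumes 1 Kbps equals 1000 bits per second and ignores container overhead.

```python
def segment_size_bytes(bitrate_bps: int, duration_s: float) -> int:
    """Approximate payload size of one segment, ignoring container overhead."""
    return int(bitrate_bps * duration_s / 8)  # 8 bits per byte


# Example media rates in bits per second (assumes 1 Kbps = 1000 bps).
EXAMPLE_RATES_BPS = [250_000, 500_000, 1_000_000]

# Size of a 2 second segment at each media rate.
sizes = {rate: segment_size_bytes(rate, 2.0) for rate in EXAMPLE_RATES_BPS}
# sizes == {250000: 62500, 500000: 125000, 1000000: 250000}
```

Under these assumptions, a 2 second segment ranges from roughly 62.5 kilobytes at 250 Kbps to 250 kilobytes at 1 Mbps, while covering the same span of program time.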
In some embodiments, the segmenters 308 and 312 may process the assembled packets to locate an encoder boundary point. Upon encountering an EBP in the transcoded stream, the segmenter may block or otherwise halt processing and indicate (e.g., report) the EBP to the controller 310. In some embodiments, the segmenter may indicate a clock position, such as a PTS, corresponding to the EBP.
In practice, the transcoded streams received by the segmenters 308 and 312 may contain various defects or errors. For example, packets delivered via multicast are not guaranteed delivery and may therefore be lost. In addition, the transcoder 304 may experience any number of errors and may associate an EBP incorrectly with a transcoded stream. For example, an EBP in one transcoded stream of a particular content stream may be placed incorrectly and may therefore lead or lag (for example, in relation to the PTS) an EBP in another transcoded stream. In some embodiments, the controller 310 may coordinate the segmenters so that they produce segments of the transcoded stream in sync, even in cases where the transcoder has incorrectly placed an EBP or in cases where a packet has been lost. In other words, the controller may cause the segmenters to produce segments with coordinated boundaries.
Upon receiving an EBP indication from each segmenter, the controller 310 may determine a sync point for the content stream. In some embodiments, the sync point may be equated to the position of the most common EBP. For example, if there are five transcoded streams of a particular content stream, and an EBP at 6 seconds from a prior sync point is indicated for four of the five transcoded streams, while an EBP at 8 seconds from the prior sync point is indicated for the fifth transcoded stream, the controller may determine that the next sync point should correspond to the EBP at 6 seconds. In the example above, the EBP position was referenced to a prior sync point in order to simplify the example, but it is contemplated that the EBP position may be referenced to other reference points, such as the PTS in MPEG content, for example, or a starting position of the content.
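The most-common-EBP selection in the example above may be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `determine_sync_point` and the stream labels are hypothetical, and the tie-breaking rule (choosing the earliest position when two positions are equally common) is an assumption, since the description does not specify one.

```python
from collections import Counter


def determine_sync_point(reported_ebps):
    """Pick the most commonly reported EBP position as the sync point.

    reported_ebps maps a stream label to the EBP position it reported,
    expressed here in seconds from the prior sync point.
    """
    if not reported_ebps:
        raise ValueError("no EBP indications received")
    counts = Counter(reported_ebps.values())
    highest = max(counts.values())
    # Assumption: on a tie, prefer the earliest position.
    return min(pos for pos, n in counts.items() if n == highest)


# Five transcoded streams: four report an EBP at 6 s, one reports 8 s.
reports = {"r1": 6, "r2": 6, "r3": 6, "r4": 6, "r5": 8}
sync_point = determine_sync_point(reports)
```

With the five reports shown, the selected sync point is the position at 6 seconds, matching the example in the text.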
Various other methods are contemplated for determining the sync point among the indicated EBPs, using the methods as described herein. For example, the sync point may be selected among the EBPs using statistical inference, including Bayesian inference, among others.
Once the controller 310 has determined the sync point, the controller 310 may communicate with the segmenters 308 and 312 to coordinate segmentation. In some embodiments, when a lagging EBP is indicated (e.g., reported) by a segmenter, the controller may cause the segmenter to continue processing and to indicate the next EBP without creating a segment.
In some embodiments, when no lagging EBP is indicated from a segmenter, the controller 310 may unblock any segmenters that have indicated an EBP in sync with the sync point and allow them to create a segment. When a segmenter is unblocked, it may continue processing the transcoded content stream to locate the next EBP. In some embodiments, the controller may cause any segmenter that has indicated a leading EBP to remain blocked at the previously indicated EBP.
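The handling of lagging, synced, and leading segmenters described above may be summarized in a short sketch. The function name `plan_actions` and the action labels are hypothetical, chosen only for illustration; positions are assumed to be comparable numbers such as seconds or PTS values.

```python
def plan_actions(reported, sync_point):
    """Map each segmenter to the action the controller may take.

    reported maps a segmenter label to the EBP position it indicated.
    """
    laggers = {s for s, pos in reported.items() if pos < sync_point}
    if laggers:
        # Laggers continue to their next EBP without creating a segment;
        # every other segmenter stays blocked until the laggers catch up.
        return {
            s: "skip_to_next_ebp" if s in laggers else "stay_blocked"
            for s in reported
        }
    # No laggers: synced segmenters create a segment and are unblocked,
    # while leaders remain blocked at the EBP they already indicated.
    return {
        s: "create_segment_and_unblock" if pos == sync_point else "stay_blocked"
        for s, pos in reported.items()
    }


no_laggers = plan_actions({"s1": 6, "s2": 6, "s3": 8}, sync_point=6)
with_lagger = plan_actions({"s1": 6, "s2": 4}, sync_point=6)
```

In the first call, `s3` is a leader and remains blocked; in the second, `s2` is a lagger and is told to continue to its next EBP while `s1` waits.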
In some embodiments, one or more additional controllers may be provided in a redundant arrangement whereby any additional controller may serve as backup in case of failure of another controller.
After a segmenter has generated a segment, the segment may be stored in the storage system 314. In some embodiments, the storage system may be a memory provided by the segmenter. In other embodiments, the storage system may be a network-attached storage (NAS) device. In still other embodiments, the storage system may be provided by a server, such as the origin server 316. The storage system 314 may comprise a data storage repository for storing content such as multimedia programs that may be requested by a client device. The storage system 314 may comprise magnetic hard disk drives, optical discs such as CDs and DVDs, and/or other optical media or optical drives, NAS devices, and/or any combination thereof. The programs stored in the storage system 314 may comprise any type of linear or non-linear program such as a video on demand (VOD) program. The program may comprise video, audio, or any type of multimedia program such as movies, sporting events, or shows, for example.
A Media Presentation Description (MPD) file may be generated to include links to the segments and may include information about the bitrate encoding used for each segment. In some embodiments, the manifest file may represent access information associated with segments for a particular content stream, such as a linear program or a VOD program. The MPD file may be stored at an origin server 316 (such as the content server 106 of
For playback of the content, the CPE 320 may access the origin server 316 (or one or more segmenters, in embodiments where the segmenters store the MPD file) via the network 318 to retrieve the MPD file and then select segments to play on a segment by segment basis, depending on network and other conditions. In some embodiments, the network 318 may include the communication links 101 discussed above, the external network 109, the network 210, an in-home network, a provider's wireless, coaxial, fiber, or hybrid fiber/coaxial distribution system (e.g., a DOCSIS network), or any other desired network. The CPE 320 may comprise any computing device that incorporates the use of at least one processor and at least one memory for storing software or processor executable instructions. The CPE 320 may comprise random access memory (RAM), non-volatile memory, and an input/output (I/O) module for communicating with other components or elements of the example network configuration 300. In some embodiments, the CPE 320 may include a DASH client application or other HTTP streaming client application. In some embodiments, the CPE 320 may correspond to the gateway interface device 111, personal computers 114, laptop computers 115 or other devices as illustrated in
At step 402, indications (e.g., reports) of encoder boundary points may be received from segmenters indicating that those segmenters have each encountered an encoder boundary point and blocked (stopped processing), awaiting instructions. At step 404, the indicated encoder boundary points may be analyzed to determine a sync point. In some embodiments, the sync point may be determined via a simple leadership election by considering the content or program time associated with each indicated encoder boundary point and choosing the most popular encoder boundary point as the sync point. In other words, of the indicated encoder boundary points, a most common value among the indicated values may be selected as the sync point.
At step 406, each encoder boundary point indicated by a segmenter may be compared to the sync point, determined above. Any segmenter indicating an encoder boundary point lagging behind the sync point may be determined to be a lagger. These segmenters may be unblocked to continue processing to the next encoder boundary point, without creating a segment. The process may then continue at 402 where the process may wait for the unblocked segmenter(s) to indicate (e.g., report) their next encoder boundary point.
If it is determined in step 406 that there are no laggers, the process may continue at step 410, where a segment event may take place according to the sync point. In some embodiments, a segment boundary may be generated by each of the segmenters at the sync point. Any ongoing segment may be ended and the next segment started at the sync point. A segment may be generated with a pre-determined time length, such as 2 seconds. In other words, a 2 second segment may be started in each transcoded stream, beginning at the sync point determined above.
At step 412, the encoder boundary points indicated by the segmenters may be compared with the sync point to determine whether any of the segmenters have indicated encoder boundary points ahead of the sync point determined above. If it is determined that one or more segmenters have indicated an encoder boundary point ahead of the sync point, these segmenters may be determined to be leaders, or segmenters that are ahead of the sync point. The process may continue at step 416 where the leading segmenters remain in a blocked state while all synced segmenters may be unblocked to continue processing. In some embodiments, a missing EBP may cause a segmenter to be determined to be a leader.
If at step 412 it is determined that there are no leading segmenters, the process may continue at step 414 where all segmenters may be unblocked to continue processing. The process may continue at step 418 where it may be determined whether there is additional content to be segmented. If there is additional content, the process may continue at step 402, otherwise the process may end.
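The flow of steps 402 through 418 may be simulated with a simplified sketch. This is illustrative only and is not the source code listing referred to elsewhere in this description: the function name `run_controller`, the representation of each stream as a list of EBP positions, the earliest-position tie-break, and the choice to stop once any stream has no further EBPs are all assumptions.

```python
from collections import Counter


def run_controller(streams):
    """Simulate the controller loop and return the segment boundaries.

    streams maps a segmenter label to the increasing list of EBP
    positions (seconds) that segmenter will encounter in its stream.
    """
    cursor = {name: 0 for name in streams}  # EBP each segmenter is blocked at
    boundaries = []
    # Step 418 (simplified): stop when any stream has no further EBPs.
    while all(cursor[n] < len(streams[n]) for n in streams):
        # Step 402: every segmenter has reported an EBP and is blocked.
        reported = {n: streams[n][cursor[n]] for n in streams}
        # Step 404: most common position wins (earliest on a tie, assumed).
        counts = Counter(reported.values())
        highest = max(counts.values())
        sync = min(p for p, c in counts.items() if c == highest)
        # Step 406: laggers move to their next EBP without segmenting.
        laggers = [n for n, p in reported.items() if p < sync]
        if laggers:
            for n in laggers:
                cursor[n] += 1
            continue
        # Step 410: segment event at the sync point.
        boundaries.append(sync)
        # Steps 412-416: synced segmenters are unblocked; leaders stay put.
        for n, p in reported.items():
            if p == sync:
                cursor[n] += 1
    return boundaries


# Hypothetical streams: "c" lost its EBP at 4 s, "d" has a stray EBP at 3 s.
example_streams = {
    "a": [2, 4, 6, 8],
    "b": [2, 4, 6, 8],
    "c": [2, 6, 8],
    "d": [2, 3, 4, 6, 8],
}
result = run_controller(example_streams)
```

In this simulation the stray EBP at 3 seconds is skipped as a lagging point and the segmenter missing the EBP at 4 seconds is treated as a leader, so the boundaries produced are at 2, 4, 6, and 8 seconds despite both defects.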
Upon return from the determineSynch function, the process may detect whether any segmenters are lagging the sync point. If one or more lagging segmenters are detected, these segmenters are caused to be unblocked and to look for the next encoder boundary point, without creating a segment. The process may then return to wait for the unblocked segmenters to indicate the occurrence of the next encoder boundary point and block. If there are no lagging segmenters, a segmentation event may be caused to occur at all segmenters. In some embodiments, each segmenter may be caused to create a segment at the sync point of pre-determined time duration. If any segmenters have indicated an encoder boundary point occurrence that is leading the sync point, those segmenters are determined to be “leading.” Leading segmenters are put into the blocked state at the indicated encoder boundary point. For segmenters that have indicated an encoder boundary point that is the same as the sync point, those segmenters are unblocked to look for the next encoder boundary point. The process may then return to the beginning and wait for all unblocked segmenters to indicate an encoder boundary point and to block.
The above steps may take place simultaneously for any number of encoded streams.
The example implementation depicted in the source code listings of
The descriptions above are merely example embodiments of various concepts. They may be rearranged/divided/combined as desired, and one or more components or steps may be added or removed without departing from the spirit of the present disclosure. The scope of this patent should only be determined by the claims that follow.