Streaming in wireless and wired networks may utilize adaptation due to variable bandwidth in a network. Content providers may publish content encoded at multiple rates and/or resolutions, which may enable clients to adapt to varying channel bandwidth. For example, Moving Picture Experts Group (MPEG) and third generation partnership project (3GPP) Dynamic Adaptive Streaming over Hypertext Transfer Protocol (HTTP) (DASH) standards may define a framework for the design of an end-to-end service that may enable efficient and high-quality delivery of streaming services over wireless and wired networks.
The DASH standard may define types of connections between streams, which may be referred to as stream access points (SAPs). Catenation of streams along SAPs may produce a correctly decodable MPEG stream. However, the DASH standard does not provide means or guidelines for ensuring invisibility of transitions between streams. If no special measures are applied, stream switches in DASH playback may be noticeable and may lead to decreased quality of experience (QoE) for the user. Changes in visual quality may be particularly noticeable when differences between rates are relatively large, and, for example, may be particularly noticeable when changing from a higher-quality stream to a lower-quality stream.
A method and apparatus for providing smooth stream switching in video and/or audio encoding and decoding may be provided. Smooth stream switching may include the generation and/or display of one or more transition frames that may be utilized between streams of media content encoded at different rates. The transition frames may be generated via crossfading and overlapping, crossfading and transcoding, post-processing techniques using filtering, post-processing techniques using re-quantization, etc.
Smooth stream switching may include receiving a first data stream of media content and a second data stream of media content. The media content may include video. The first data stream may be characterized by a first signal-to-noise ratio (SNR). The second data stream may be characterized by a second SNR. The first SNR may be greater than the second SNR, or the first SNR may be less than the second SNR.
Transition frames may be generated using at least one of frames of the first data stream characterized by the first SNR and frames of the second data stream characterized by the second SNR. The transition frames may be characterized by one or more SNR values that are between the first SNR and the second SNR. The transition frames may be characterized by a transition time interval. The transition frames may be part of one segment of the media content. One or more frames of the first data stream may be displayed, the transition frames may be displayed, and one or more frames of the second data stream may be displayed, for example, in that order.
Generating the transition frames may include crossfading the frames characterized by the first SNR with the frames characterized by the second SNR to generate the transition frames. Crossfading may include calculating a weighted average of the frames characterized by the first SNR and the frames characterized by the second SNR to generate the transition frames. The weighted average may change over time. Crossfading may include calculating a weighted average of the frames characterized by the first SNR and the frames characterized by the second SNR by applying a first weight to the frames characterized by the first SNR and a second weight to the frames characterized by the second SNR. At least one of the first weight and the second weight may change over the transition time interval. Crossfading may be performed using a linear transition or a non-linear transition between the first data stream and the second data stream.
The first data stream and second data stream may include overlapping frames of the media content. Crossfading the frames characterized by the first SNR with the frames characterized by the second SNR to generate the transition frames may include crossfading the overlapping frames of the first data stream and the second data stream to generate the transition frames. The overlapping frames may be characterized by corresponding frames of the first data stream and of the second data stream. The overlapping frames may be characterized by an overlap time interval. One or more frames of the first data stream may be displayed before the overlap time interval, the transition frames may be displayed during the overlap time interval, and one or more frames of the second data stream may be displayed after the overlap time interval. The one or more frames of the first data stream may be characterized by times preceding the overlap time interval and the one or more frames of the second data stream may be characterized by times succeeding the overlap time interval.
A subset of frames of the first data stream may be transcoded to generate corresponding frames characterized by the second SNR. Crossfading the frames characterized by the first SNR with the frames characterized by the second SNR to generate the transition frames may include crossfading the subset of frames of the first data stream with the corresponding frames characterized by the second SNR to generate the transition frames.
Generating the transition frames may include filtering the frames characterized by the first SNR using a low-pass filter characterized by a cutoff frequency that changes over the transition time interval to generate the transition frames. Generating the transition frames may include transforming and quantizing the frames characterized by the first SNR using one or more step sizes to generate the transition frames.
A detailed description of illustrative embodiments will now be described with reference to the various Figures. Although this description provides a detailed example of possible implementations, it should be noted that the details are intended to be exemplary and in no way limit the scope of the application.
As shown in
The communications systems 100 may also include a base station 114a and a base station 114b. Each of the base stations 114a, 114b may be any type of device configured to wirelessly interface with at least one of the WTRUs 102a, 102b, 102c, 102d to facilitate access to one or more communication networks, such as the core network 106/107/109, the Internet 110, and/or the networks 112. By way of example, the base stations 114a, 114b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a site controller, an access point (AP), a wireless router, and the like. While the base stations 114a, 114b are each depicted as a single element, it will be appreciated that the base stations 114a, 114b may include any number of interconnected base stations and/or network elements.
The base station 114a may be part of the RAN 103/104/105, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc. The base station 114a and/or the base station 114b may be configured to transmit and/or receive wireless signals within a particular geographic region, which may be referred to as a cell (not shown). The cell may further be divided into cell sectors. For example, the cell associated with the base station 114a may be divided into three sectors. Thus, in one embodiment, the base station 114a may include three transceivers, e.g., one for each sector of the cell. In another embodiment, the base station 114a may employ multiple-input multiple output (MIMO) technology and, therefore, may utilize multiple transceivers for each sector of the cell.
The base stations 114a, 114b may communicate with one or more of the WTRUs 102a, 102b, 102c, 102d over an air interface 115/116/117, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, infrared (IR), ultraviolet (UV), visible light, etc.). The air interface 115/116/117 may be established using any suitable radio access technology (RAT).
More specifically, as noted above, the communications system 100 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like. For example, the base station 114a in the RAN 103/104/105 and the WTRUs 102a, 102b, 102c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 115/116/117 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High-Speed Uplink Packet Access (HSUPA).
In another embodiment, the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 115/116/117 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A).
In other embodiments, the base station 114a and the WTRUs 102a, 102b, 102c may implement radio technologies such as IEEE 802.16 (e.g., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
The base station 114b in
The RAN 103/104/105 may be in communication with the core network 106/107/109, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 102a, 102b, 102c, 102d. For example, the core network 106/107/109 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high-level security functions, such as user authentication. Although not shown in
The core network 106/107/109 may also serve as a gateway for the WTRUs 102a, 102b, 102c, 102d to access the PSTN 108, the Internet 110, and/or other networks 112. The PSTN 108 may include circuit-switched telephone networks that provide plain old telephone service (POTS). The Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and the internet protocol (IP) in the TCP/IP internet protocol suite. The networks 112 may include wired or wireless communications networks owned and/or operated by other service providers. For example, the networks 112 may include another core network connected to one or more RANs, which may employ the same RAT as the RAN 103/104/105 or a different RAT.
Some or all of the WTRUs 102a, 102b, 102c, 102d in the communications system 100 may include multi-mode capabilities, e.g., the WTRUs 102a, 102b, 102c, 102d may include multiple transceivers for communicating with different wireless networks over different wireless links. For example, the WTRU 102c shown in
The processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 102 to operate in a wireless environment. The processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While
The transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 114a) over the air interface 115/116/117. For example, in one embodiment, the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals. In another embodiment, the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example. In yet another embodiment, the transmit/receive element 122 may be configured to transmit and receive both RF and light signals. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wireless signals.
In addition, although the transmit/receive element 122 is depicted in
The transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122. As noted above, the WTRU 102 may have multi-mode capabilities. Thus, the transceiver 120 may include multiple transceivers for enabling the WTRU 102 to communicate via multiple RATs, such as UTRA and IEEE 802.11, for example.
The processor 118 of the WTRU 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128. In addition, the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132. The non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 118 may access information from, and store data in, memory that is not physically located on the WTRU 102, such as on a server or a home computer (not shown).
The processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the WTRU 102. The power source 134 may be any suitable device for powering the WTRU 102. For example, the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
The processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 102. In addition to, or in lieu of, the information from the GPS chipset 136, the WTRU 102 may receive location information over the air interface 115/116/117 from a base station (e.g., base stations 114a, 114b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
The processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.
As shown in
The core network 106 shown in
The RNC 142a in the RAN 103 may be connected to the MSC 146 in the core network 106 via an IuCS interface. The MSC 146 may be connected to the MGW 144. The MSC 146 and the MGW 144 may provide the WTRUs 102a, 102b, 102c with access to circuit-switched networks, such as the PSTN 108, to facilitate communications between the WTRUs 102a, 102b, 102c and traditional land-line communications devices.
The RNC 142a in the RAN 103 may also be connected to the SGSN 148 in the core network 106 via an IuPS interface. The SGSN 148 may be connected to the GGSN 150. The SGSN 148 and the GGSN 150 may provide the WTRUs 102a, 102b, 102c with access to packet-switched networks, such as the Internet 110, to facilitate communications between the WTRUs 102a, 102b, 102c and IP-enabled devices.
As noted above, the core network 106 may also be connected to the networks 112, which may include other wired or wireless networks that are owned and/or operated by other service providers.
The RAN 104 may include eNode-Bs 160a, 160b, 160c, though it will be appreciated that the RAN 104 may include any number of eNode-Bs while remaining consistent with an embodiment. The eNode-Bs 160a, 160b, 160c may each include one or more transceivers for communicating with the WTRUs 102a, 102b, 102c over the air interface 116. In one embodiment, the eNode-Bs 160a, 160b, 160c may implement MIMO technology. Thus, the eNode-B 160a, for example, may use multiple antennas to transmit wireless signals to, and receive wireless signals from, the WTRU 102a.
Each of the eNode-Bs 160a, 160b, 160c may be associated with a particular cell (not shown) and may be configured to handle radio resource management decisions, handover decisions, scheduling of users in the uplink and/or downlink, and the like. As shown in
The core network 107 shown in
The MME 162 may be connected to each of the eNode-Bs 160a, 160b, 160c in the RAN 104 via an S1 interface and may serve as a control node. For example, the MME 162 may be responsible for authenticating users of the WTRUs 102a, 102b, 102c, bearer activation/deactivation, selecting a particular serving gateway during an initial attach of the WTRUs 102a, 102b, 102c, and the like. The MME 162 may also provide a control plane function for switching between the RAN 104 and other RANs (not shown) that employ other radio technologies, such as GSM or WCDMA.
The serving gateway 164 may be connected to each of the eNode-Bs 160a, 160b, 160c in the RAN 104 via the S1 interface. The serving gateway 164 may generally route and forward user data packets to/from the WTRUs 102a, 102b, 102c. The serving gateway 164 may also perform other functions, such as anchoring user planes during inter-eNode B handovers, triggering paging when downlink data is available for the WTRUs 102a, 102b, 102c, managing and storing contexts of the WTRUs 102a, 102b, 102c, and the like.
The serving gateway 164 may also be connected to the PDN gateway 166, which may provide the WTRUs 102a, 102b, 102c with access to packet-switched networks, such as the Internet 110, to facilitate communications between the WTRUs 102a, 102b, 102c and IP-enabled devices.
The core network 107 may facilitate communications with other networks. For example, the core network 107 may provide the WTRUs 102a, 102b, 102c with access to circuit-switched networks, such as the PSTN 108, to facilitate communications between the WTRUs 102a, 102b, 102c and traditional land-line communications devices. For example, the core network 107 may include, or may communicate with, an IP gateway (e.g., an IP multimedia subsystem (IMS) server) that serves as an interface between the core network 107 and the PSTN 108. In addition, the core network 107 may provide the WTRUs 102a, 102b, 102c with access to the networks 112, which may include other wired or wireless networks that are owned and/or operated by other service providers.
As shown in
The air interface 117 between the WTRUs 102a, 102b, 102c and the RAN 105 may be defined as an R1 reference point that implements the IEEE 802.16 specification. In addition, each of the WTRUs 102a, 102b, 102c may establish a logical interface (not shown) with the core network 109. The logical interface between the WTRUs 102a, 102b, 102c and the core network 109 may be defined as an R2 reference point, which may be used for authentication, authorization, IP host configuration management, and/or mobility management.
The communication link between each of the base stations 180a, 180b, 180c may be defined as an R8 reference point that includes protocols for facilitating WTRU handovers and the transfer of data between base stations. The communication link between the base stations 180a, 180b, 180c and the ASN gateway 182 may be defined as an R6 reference point. The R6 reference point may include protocols for facilitating mobility management based on mobility events associated with each of the WTRUs 102a, 102b, 102c.
As shown in
The MIP-HA may be responsible for IP address management, and may enable the WTRUs 102a, 102b, 102c to roam between different ASNs and/or different core networks. The MIP-HA 184 may provide the WTRUs 102a, 102b, 102c with access to packet-switched networks, such as the Internet 110, to facilitate communications between the WTRUs 102a, 102b, 102c and IP-enabled devices. The AAA server 186 may be responsible for user authentication and for supporting user services. The gateway 188 may facilitate interworking with other networks. For example, the gateway 188 may provide the WTRUs 102a, 102b, 102c with access to circuit-switched networks, such as the PSTN 108, to facilitate communications between the WTRUs 102a, 102b, 102c and traditional land-line communications devices. In addition, the gateway 188 may provide the WTRUs 102a, 102b, 102c with access to the networks 112, which may include other wired or wireless networks that are owned and/or operated by other service providers.
Although not shown in
Streaming in wired and wireless networks (e.g., 3G, WiFi, Internet, the networks shown in
In a bandwidth adaptive streaming system, a content provider may offer the same content at one or more different bitrates, for example as shown in
Publishing of content at multiple rates may pose challenges, for example, an increase in production, quality assurance management, storage cost, etc. A number of rates/resolutions (e.g., three, four, five, etc.) may be made available.
Stream switching behavior may be controlled by the server, for example, based on client or network feedback. This model may be used with streaming technologies based on RTP/RTSP protocols, for example.
Bandwidth of an access network may vary, for example, due to the underlying technology used (e.g., as shown in Table 1) and/or due to a number of users, location, signal strength, etc. Table 1 illustrates an example of peak bandwidth of an access network.
Content may be viewed on screens having different sizes, for example on smartphones, tablets, laptops, and larger screens such as HDTVs, for example. Table 2 illustrates an example of sample screen resolutions of various devices that may include multimedia streaming capabilities. Providing a small number of rates may not be enough to provide a good user experience to a variety of clients.
An example of screen resolutions that may be utilized by the implementations described herein is listed in Table 3.
Content providers, such as YouTube®, iTunes®, Hulu®, etc., for example, may use HTTP progressive download to distribute multimedia content. HTTP progressive download may include content being downloaded (e.g., partially or fully) before it can be played back. HTTP may be an internet transport protocol that may not be blocked by firewalls. Other protocols, such as RTP/RTSP or multicasting, for example, may be blocked by firewalls or disabled by internet service providers. Progressive download may not support bandwidth adaptation. Techniques for bandwidth adaptive multimedia streaming over HTTP may be developed for distributing live and on-demand content over packet networks.
A media presentation may be encoded at one or more bitrates, for example, in bandwidth adaptive streaming over HTTP. An encoding of the media presentation may be partitioned into one or more segments of shorter duration, for example as shown in
Bandwidth adaptive HTTP streaming techniques may include HTTP Live Streaming (HLS), Smooth Streaming, HTTP Dynamic Streaming, HTTP Adaptive Streaming (HAS), and Adaptive HTTP Streaming (AHS), for example.
Dynamic Adaptive HTTP Streaming (DASH) may consolidate several approaches for HTTP streaming. DASH may be used to cope with variable bandwidth in wireless and wired networks. DASH may be supported by a large number of content providers and devices.
In DASH, the streaming session may be controlled by the client 601 by requesting segments using HTTP and splicing the segments together as they are received from the content provider and/or CDN 603. A client 601 may monitor (e.g., continually monitor) and adjust media rate, for example, based on network conditions (e.g., packet error rate, delay jitter, etc.) and/or the state of the client 601 (e.g., buffer fullness, user behavior and preferences, etc.), for example, to effectively move intelligence from the network to the client 601.
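For illustration, the following is a minimal Python sketch of buffer- and throughput-based rate selection of the kind a client 601 may perform; the thresholds, bitrate list, and function names are illustrative assumptions and are not defined by DASH:

    # Illustrative, buffer- and throughput-based rate selection (not part of any standard).
    # Bitrates are assumed to be ordered from lowest to highest.
    def select_representation(bitrates_bps, buffer_level_s, throughput_bps,
                              low_buffer_s=5.0, safety_factor=0.8):
        """Pick the highest bitrate that fits the measured throughput, falling
        back to the lowest bitrate when the playback buffer runs low."""
        if buffer_level_s < low_buffer_s:
            return bitrates_bps[0]                       # protect against stalling
        budget = throughput_bps * safety_factor          # leave headroom for variation
        candidates = [b for b in bitrates_bps if b <= budget]
        return max(candidates) if candidates else bitrates_bps[0]

    # Example: 1.5 Mbps measured throughput and 12 s of buffered media select 1 Mbps.
    print(select_representation([500_000, 1_000_000, 2_000_000], 12.0, 1_500_000))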
An adaptation set may refer to a set of interchangeable encoded versions of one or more media content components. For example, there may be an adaptation set for video, for primary audio, for secondary audio, for captions, etc. An adaptation set may be multiplexed. Interchangeable versions of the multiplex may be described as a single adaptation set. For example, an adaptation set may include both video and main audio for a period.
A representation may refer to a deliverable encoded version of one or more media content components. A representation may include one or more media streams (e.g., one for each media content component in the multiplex). A representation within an adaptation set may be sufficient to render the media content components. A client may switch from representation to representation within an adaptation set in order to adapt to network conditions and/or other factors. A client may ignore a representation that uses codecs, profiles, and/or parameters that the client does not support.
Content within a representation may be divided in time into one or more segments of fixed or variable length. A URL may be provided for a segment (e.g., for each segment). A segment may be the largest unit of data that can be retrieved with a single HTTP request.
The Media Presentation Description (MPD) file may be an XML document that includes metadata that may be used by a DASH client to construct appropriate HTTP-URLs to access one or more segments and/or to provide the streaming service to the user. A base URL in the MPD file may be used by the client to generate HTTP GET requests for one or more segments and/or other resources in the Media Presentation. HTTP partial GET requests may be used to access a limited portion of a segment, for example, by using a byte range (e.g., via the ‘Range’ HTTP header). Alternative base URLs may be specified to allow access to the presentation in case a location is unavailable. Alternative base URLs may provide redundancy to the delivery of multimedia streams, for example, which may allow client-side load balancing and/or parallel download.
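For illustration, an HTTP partial GET for a portion of a segment may be issued as sketched below in Python; the third-party requests library, the URL, and the byte range are assumptions for the example:

    import requests  # third-party HTTP client library

    # Hypothetical segment URL (built from a base URL in the MPD file) and a
    # hypothetical byte range of the kind read from a segment index.
    url = "http://example.com/video/rep_high/segment_0005.m4s"
    headers = {"Range": "bytes=0-65535"}            # request only the first 64 KiB
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code == 206:                 # 206 Partial Content
        sub_segment = response.content              # bytes of the requested range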
An MPD file may be of type static or dynamic. A static MPD file type may not change during the Media Presentation. A static MPD file may be used for on demand presentations. A dynamic MPD file type may be updated during the Media Presentation. A dynamic MPD file type may be used for live presentations. An MPD file may be updated, for example to extend the list of segments for a representation, to introduce a new period, to terminate the Media Presentation, and/or to process or adjust a timeline.
In DASH, encoded versions of different media content components (e.g., video, audio) may share a common timeline. The presentation time of access units within the media content may be mapped to a global common presentation timeline, which may be referred to as a media presentation timeline. The media presentation timeline may allow for synchronization of different media components. The media presentation timeline may enable seamless switching of different coded versions (e.g., Representations) of the same media components.
A segment may include the actual segmented media streams. A segment may include additional information relating to how to map a media stream into the media presentation timeline, for example, for switching and synchronous presentation with other representations.
A segment availability timeline may be used to signal to clients the availability time of one or more segments at a specified HTTP URL. The availability time may be provided in wall-clock times. A client may compare the wall-clock time to a segment availability time, for example, before accessing the segments at the specified HTTP URL.
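For illustration, the wall-clock comparison may be performed as sketched below; the function name and datetime handling are assumptions:

    from datetime import datetime, timezone

    def segment_is_available(availability_start_utc):
        """Return True once the current wall-clock time has reached the
        availability time signaled for the segment (both UTC)."""
        return datetime.now(timezone.utc) >= availability_start_utc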
The availability time of one or more segments may be identical, for example, for on-demand content. Segments of the media presentation (e.g., all segments) may be available on the server once one of the segments is available. The MPD file may be a static document.
The availability time of one or more segments may depend on the position of the segment in the media presentation timeline, for example, for live content. A segment may become available with time as the content is produced. The MPD file may be updated (e.g., periodically) to reflect changes in the presentation over time. For example, one or more segment URLs for one or more new segments may be added to the MPD file. Segments that are no longer available may be removed from the MPD file. Updating the MPD file may not be necessary, for example, if segment URLs are described using a template.
The duration of a segment may represent the duration of the media included in the segment, for example, when presented at normal speed. The segments in a representation may have the same or roughly the same duration. Segment duration may differ from representation to representation. A DASH presentation may be constructed with one or more short segments (e.g., 2-8 seconds) and/or one or more longer segments. A DASH presentation may include a single segment for the entire representation.
Short segments may be suitable for live content (e.g., by reducing end-to-end latency) and may allow for high switching granularity at the segment level. Long segments may improve cache performance by reducing the number of files in the presentation. Long segments may enable a client to make flexible request sizes, for example, by using byte range requests. The use of long segments may compel the use of a segment index.
A segment may not be extended over time. A segment may be a complete and discrete unit that may be made available in its entirety. A segment may be referred to as a movie fragment. A segment may be subdivided into sub-segments. A sub-segment may include a whole number of complete access units. An access unit may be a unit of a media stream with an assigned media presentation time. If a segment is divided into one or more sub-segments, then the segment may be described by a segment index. The segment index may provide the presentation time range in the representation and/or corresponding byte range in the segment occupied by each sub-segment. A client may download the segment index in advance. A client may issue requests for individual sub-segments using HTTP partial GET requests. The segment index may be included in a media segment, for example, in the beginning of the file. Segment index information may be provided in one or more index segments (e.g., separate index segments).
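For illustration, a segment index may be represented and used as sketched below; the time and byte ranges are hypothetical values of the kind a client may read from the index before issuing HTTP partial GET requests:

    # Hypothetical segment index: each entry maps a presentation-time range (seconds)
    # within the representation to the byte range its sub-segment occupies in the segment.
    segment_index = [
        {"t_start": 0.0, "t_end": 4.0,  "byte_start": 0,      "byte_end": 131071},
        {"t_start": 4.0, "t_end": 8.0,  "byte_start": 131072, "byte_end": 262143},
        {"t_start": 8.0, "t_end": 12.0, "byte_start": 262144, "byte_end": 401407},
    ]

    def byte_range_for_time(index, t):
        """Return the HTTP 'Range' header value for the sub-segment covering time t."""
        for entry in index:
            if entry["t_start"] <= t < entry["t_end"]:
                return "bytes=%d-%d" % (entry["byte_start"], entry["byte_end"])
        return None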
DASH may utilize a plurality (e.g., four) of segment types. The types of segments may include initialization segments, media segments, index segments, and/or bitstream switching segments. Initialization segments may include initialization information for accessing a representation. Initialization segments may not include media data with an assigned presentation time. An initialization segment may be processed by the client to initialize the media engines for enabling play-out of a media segment of the included representation.
A media segment may include and/or encapsulate one or more media streams that may be described within this media segment and/or described by the initialization segment of the representation. A media segment may include one or more complete access units. A media segment may include at least one Stream Access Point (SAP), for example, for each included media stream.
An index segment may include information that is related to one or more media segments. An index segment may include indexing information for one or more media segments. An index segment may provide information for one or more media segments. An index segment may be media format specific. More details may be defined for a media format that supports an index segment.
A bitstream switching segment may include data for switching to its assigned representation. A bitstream switching segment may be media format specific. More details may be defined for each media format that supports bitstream switching segments. One bitstream switching segment may be defined for each representation.
A client may switch from representation to representation within an adaptation set, for example, at any point in the media. Switching at arbitrary positions may be complicated, for example, because of coding dependencies within representations. The download of overlapping data, for example, media for the same time period from multiple representations, may be performed. Switching may be performed at a random access point in a new stream.
DASH may define a codec-independent concept of a stream access point (SAP) and/or may identify one or more types of SAPs. A stream access point type may be communicated as one of the properties of the adaptation set, for example, assuming that all segments within an adaptation set have same SAP type. A SAP may enable random access into a file container of one or more media streams. A SAP may be a position in a container enabling playback of an identified media stream to be started, for example, using the information included in the container starting from that position onwards. Initialization data from other parts of the container and/or that may be externally available may be used. A SAP may be a connection between streams, for example, within DASH. For example, a SAP may be characterized by a position within a representation where a client may switch into the representation, for example, from another representation. A SAP may ensure that catenation of streams along SAPs may produce a correctly decodable data stream (e.g., MPEG stream).
TSAP may be the earliest presentation time of any access unit of the media stream, for example, such that access units of a media stream with a presentation time greater than or equal to TSAP may be correctly decoded using data in the bitstream starting at ISAP and no data before ISAP. ISAP may be the greatest position in the bitstream, for example, such that access units of the media stream with a presentation time greater than or equal to TSAP may be correctly decoded using bitstream data starting at ISAP and no data before ISAP. ISAU may be the starting position in the bitstream of the latest access unit in decoding order within the media stream, for example, such that access units of the media stream with presentation time greater than or equal to TSAP may be correctly decoded using the latest access unit and access units following in decoding order, and no access units earlier in the decoding order.
TDEC may be the earliest presentation time of an access unit of the media stream that may be correctly decoded using data in the bitstream starting at ISAU and without any data before ISAU. TEPT may be the earliest presentation time of an access unit of the media stream starting at ISAU in the bitstream. TPTF may be the presentation time of the first access unit of the media stream in decoding order in the bitstream starting at ISAU.
A plurality (e.g., six) of SAP types may be defined. The use of different SAP types may be limited based on profile. For example, SAPs of types 1, 2, and 3 may be allowed for some profiles. The type of SAP may depend on which access units may be correctly decodable and/or the arrangement in the presentation order of the access units.
An example of a GDR may be the intra refreshing process, which may be extended over N frames, and where part of a frame may be coded with intra macroblocks (MBs). Non-overlapping parts may be intra coded across N frames. This process may be repeated until the entire frame is refreshed.
A type 5 SAP may be described by the following: TEPT=TDEC<TSAP. The type 5 SAP may correspond to a case in which there may be at least one access unit in the decoding order starting from ISAP that cannot be correctly decoded and/or may have a presentation time that is greater than TDEC, and/or where TDEC may be the earliest presentation time of an access unit starting from ISAU.
A type 6 SAP may be described by the following: TEPT<TDEC<TSAP. The type 6 SAP may correspond to a case in which there may be at least one access unit in the decoding order starting from ISAP that may not be correctly decoded and/or may have a presentation time that is greater than TDEC, and where TDEC may not be the earliest presentation time of an access unit starting from ISAU. The type 4, 5, and/or 6 SAPs may be utilized in a case of handling transitions in audio coding.
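For illustration, the relations above may be expressed as a classification function; the conditions for types 1-4 are not restated in this description and follow the commonly cited DASH definitions, so they should be treated as assumptions here:

    def sap_type(t_ept, t_dec, t_sap, t_ptf):
        """Classify a stream access point from its characteristic times.
        Types 5 and 6 follow the relations described above; types 1-4 follow
        the commonly cited DASH definitions and are included for completeness."""
        if t_ept == t_dec == t_sap == t_ptf:
            return 1
        if t_ept == t_dec == t_sap < t_ptf:
            return 2
        if t_ept < t_dec == t_sap <= t_ptf:
            return 3
        if t_ept <= t_ptf < t_dec == t_sap:
            return 4
        if t_ept == t_dec < t_sap:
            return 5
        if t_ept < t_dec < t_sap:
            return 6
        return None  # does not match a defined type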
Smooth stream switching in video and/or audio encoding and decoding may be provided. Smooth stream switching may include the generation and/or display of one or more transition frames that may be utilized between streams (e.g., portions of a stream) of media content encoded at different rates. The transition frames may be generated via crossfading and overlapping, crossfading and transcoding, post-processing techniques using filtering, post-processing techniques using re-quantization, etc.
Smooth stream switching may include receiving a first data stream of media content and a second data stream of media content. The media content may include video and/or audio. The media content may be in an MPEG container format. The first data stream and/or the second data stream may be identified in an MPD file. The first data stream may be an encoded data stream. The second data stream may be an encoded data stream. The first data stream and the second data stream may be portions of the same data stream. For example, the first data stream may temporally precede (e.g., immediately precede) the second data stream. For example, the first data stream and/or the second data stream may begin and/or end at a SAP of the media content.
The first data stream may be characterized by a first signal-to-noise ratio (SNR). The second data stream may be characterized by a second SNR. For example, the first SNR and the second SNR may relate to the encoding of the first data stream and the second data stream, respectively. The first SNR may be greater than the second SNR, or the first SNR may be less than the second SNR.
Transition frames may be generated using at least one of frames of the first data stream and frames of the second data stream. The transition frames may be characterized by one or more SNR values that are between the first SNR and the second SNR. The transition frames may be characterized by a transition time interval. The transition frames may be part of one segment of the media content. One or more frames of the first data stream may be displayed, the transition frames may be displayed, and one or more frames of the second data stream may be displayed, for example, in that order. The switch from the first data stream to the transition frames and/or from the transition frames to the second data stream may be done at a SAP of the media content.
Generating the transition frames may include crossfading the frames characterized by the first SNR with the frames characterized by the second SNR to generate the transition frames. Crossfading may include calculating a weighted average of the frames characterized by the first SNR and the frames characterized by the second SNR to generate the transition frames. The weighted average may change over time. Crossfading may include calculating a weighted average of the frames characterized by the first SNR and the frames characterized by the second SNR by applying a first weight to the frames characterized by the first SNR and a second weight to the frames characterized by the second SNR. At least one of the first weight and the second weight may change over the transition time interval. Crossfading may be performed using a linear transition or a non-linear transition between the first data stream and the second data stream.
The first data stream and second data stream may include overlapping frames of the media content. Crossfading the frames characterized by the first SNR with the frames characterized by the second SNR to generate the transition frames may include crossfading the overlapping frames of the first data stream and the second data stream to generate the transition frames. The overlapping frames may be characterized by corresponding frames of the first data stream and of the second data stream. The overlapping frames may be characterized by an overlap time interval. One or more frames of the first data stream may be displayed before the overlap time interval, the transition frames may be displayed during the overlap time interval, and one or more frames of the second data stream may be displayed after the overlap time interval. The one or more frames of the first data stream may be characterized by times preceding the overlap time interval and the one or more frames of the second data stream may be characterized by times succeeding the overlap time interval.
A subset of frames of the first data stream may be transcoded to generate corresponding frames characterized by the second SNR. Crossfading the frames characterized by the first SNR with the frames characterized by the second SNR to generate the transition frames may include crossfading the subset of frames of the first data stream with the corresponding frames characterized by the second SNR to generate the transition frames.
Generating the transition frames may include filtering the frames characterized by the first SNR using a low-pass filter characterized by a cutoff frequency that changes over the transition time interval to generate the transition frames. Generating the transition frames may include transforming and quantizing the frames characterized by the first SNR using one or more step sizes to generate the transition frames.
One or more parameters of media content (e.g., a video sequence) may be controlled during encoding to effect changes in the bitrate of the encoded media content. For example, the parameters may include, but are not limited to signal-to-noise ratio (SNR), frame resolution, frame rate, etc. The SNR of media content may be controlled during encoding to generate encoded versions of the media content with varying bitrates. For example, the SNR may be controlled via a quantization parameter (QP) used on transform coefficients during encoding. For example, changing the QP may affect the SNR (e.g., and bitrate) of an encoded video sequence. For example, the change in the QP may result in a video sequence that has a different visual quality and/or SNR. SNR and bitrate may be related. For example, changing the QP during encoding may be a way to control bitrate. For example, if the QP is lower, then the encoded video sequence may have a higher SNR, a higher bitrate, and/or a higher visual quality.
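For illustration, the SNR of an encoded frame is often measured in practice as the peak signal-to-noise ratio (PSNR) relative to the source frame; the following is a minimal sketch using numpy, with the 8-bit peak value as an assumption:

    import numpy as np

    def psnr(original, encoded, peak=255.0):
        """Peak signal-to-noise ratio (dB) between a source frame and its
        encoded/decoded version; both are arrays of equal shape (e.g., 8-bit luma)."""
        err = np.asarray(original, dtype=np.float64) - np.asarray(encoded, dtype=np.float64)
        mse = np.mean(err ** 2)
        if mse == 0:
            return float("inf")                  # identical frames
        return 10.0 * np.log10((peak ** 2) / mse)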
The SNR of media content (e.g., an encoded video stream) may refer to the encoding of the media content. For example, the SNR of media content may be controlled by the QP used during encoding of the media content. For example, media content may be encoded at different rates to generate corresponding versions of the media content that may be characterized by different SNR values, for example, as described with reference to
The frame resolution of one or more frames of media content (e.g., the horizontal and vertical dimensions of a video frame in pixels) may be controlled (e.g., between 240p, 360p, 720p, 1080p, etc.) during encoding to generate encoded versions of the media content with varying bitrates. For example, changing the frame resolution during encoding may change the bitrate of encoded versions of the media content (e.g., an encoded video sequence). Frame resolution and bitrate may be related. For example, if the frame resolution is lower, then a lower bitrate may be used to encode a video sequence at a similar visual quality.
The frame rate (e.g., the number of frames per second (fps)) of media content may be controlled (e.g., between 15 fps, 20 fps, 30 fps, 60 fps, etc.) during encoding to generate encoded versions of the media content with varying bitrates. For example, changing frame rate during encoding may change the bitrate of encoded versions of the media content (e.g., an encoded video sequence). Frame rate and bitrate may be related. For example, if the frame rate is lower, then a lower bitrate may be used to encode a video sequence at a similar subjective visual quality.
One or more of the parameters of media content (e.g., a video sequence) may be controlled (e.g., changed) during encoding to achieve a target bitrate of the media content for bandwidth adaptive streaming. The SNR (e.g., via the QP) of media content may be controlled during encoding to generate the media content encoded at different bitrates. For example, for one or more different bitrates, a video sequence may be encoded at the same frame rate (e.g., 30 frames per second) and the same resolution (e.g., 720p), while the SNR of the encoded video sequence may be changed. Changing the SNR of the encoded video sequences may be useful when the range of target bitrates is relatively small (e.g., between 1 and 2 Mbps), for example, because changing the QP of the video sequence may produce video sequences of good visual quality at the desired target bitrates.
The frame resolution of media content may be controlled to generate the media content encoded at different bitrates. The media content (e.g., a video sequence) may be encoded at the same frame rate (e.g., 30 frames per second) and the same SNR, while the frame resolution of the frames of the media content may be changed. For example, video sequences may be encoded at one or more different resolutions (e.g., 240p, 360p, 720p, 1080p, etc.), while maintaining the same frame rate (e.g., 30 fps) and the same SNR. Changing the frame resolution of the media content may be useful when the range of the target bitrate is large (e.g., between 500 kbps and 10 Mbps).
The frame rate of media content may be controlled during encoding to generate the media content encoded at different bitrates. The media content (e.g., a video sequence) may be encoded at the same frame resolution (e.g., 720p) and the same SNR, while the frame rate (e.g., 15 fps, 20 fps, 30 fps, 60 fps, etc.) of the media content may be changed. For example, video sequences may be encoded with lower frame rates to generate encoded video sequences of lower bitrates. For example, video sequences at higher bitrates may be encoded at full 30 fps, while video sequences at lower bitrates may be encoded at 5-20 fps, while maintaining the same resolution (e.g., 720p) and the same SNR.
The SNR (e.g., via the QP) and frame resolution of media content may be controlled during encoding to generate the media content encoded at different rates. For example, video sequences may be encoded with lower SNR and frame resolution to generate encoded video sequences of lower bitrates, while the same frame rate may be used for the encoded video sequences. For example, video sequences at higher rates may be encoded at 720p, 30 fps, and at a number of SNR points, while sequences at lower rates may be encoded at 360p, 30 fps, and at the same SNR.
The SNR (e.g., via the QP) and frame rate of media content may be controlled during encoding to generate the media content encoded at different rates. For example, video sequences may be encoded with lower SNR and frame rates to generate encoded video sequences of lower bitrates, while the same frame resolution may be maintained for the encoded video sequences. For example, video sequences at higher rates may be encoded at 720p, 30 fps, and at a number of SNR points, while video sequences at lower rates may be encoded at 720p, 10 fps, and at the same SNR.
The frame resolution and frame rate of media content may be controlled during encoding to generate the media content encoded at different rates. For example, video sequences may be encoded with lower frame resolution and frame rate to generate encoded video sequences of lower bitrates, while maintaining the same visual quality (e.g., SNR) for the encoded video sequences. For example, video sequences at higher bitrates may be encoded at 720p, at frame rates of 20 to 30 fps, and with the same SNR, while sequences at lower bitrates may be encoded at 360p, at frame rates of 10 to 20 fps, and with the same SNR.
The SNR (e.g., via the QP), the frame resolution, and the frame rate of media content may be controlled during encoding to generate the media content encoded at different rates. For example, video sequences may be encoded with lower SNR, frame resolution, and frame rate to generate encoded video sequences of lower bitrates. For example, video sequences at higher bitrates may be encoded at 720p, 30 fps, and at a higher SNR point, while video sequences at lower bitrates may be encoded at 360p, 10 fps, and at a lower SNR point.
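For illustration, the combinations described above may be organized as an encoding ladder of the kind a content provider may publish; the specific operating points below are assumptions:

    # Illustrative encoding ladder: each representation trades off SNR (via QP),
    # frame resolution, and frame rate to reach its target bitrate.
    encoding_ladder = [
        {"name": "high",   "resolution": (1280, 720), "fps": 30, "qp": 24, "target_kbps": 3000},
        {"name": "medium", "resolution": (1280, 720), "fps": 30, "qp": 30, "target_kbps": 1500},
        {"name": "low",    "resolution": (640, 360),  "fps": 10, "qp": 34, "target_kbps": 500},
    ]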
Implementations described herein may be used to smooth the transitions between media streams (e.g., video stream, audio stream, etc.) of media content (e.g., video, audio, etc.) that are characterized by different bitrates, SNRs, frame resolutions, and/or frame rates. Although described herein as a transition between media streams encoded at two different bitrates (e.g., high (H) and low (L)), SNRs, frame resolutions, and/or frame rates, the implementations described herein may be applied to transitions between media streams encoded at any number of different bitrates, SNRs, frame resolutions, and/or frame rates.
If smooth stream switching is not utilized, for example as shown in
Smooth stream switching may provide stream switches that may be less noticeable to a user, and which may improve the user experience. Smooth stream switching may allow for different segments of media content to use different codecs, for example, by substantially eliminating differences in artifacts. Smooth stream switching may reduce the number of encodings/rates produced by a content provider for media content.
A streaming client may receive one or more streams of media content (e.g., video, audio, etc.) prepared by a DASH-compliant encoder. For example, the one or more streams of media content may include stream access points of any type, for example, types 1-6.
A client may include processing for concatenating and feeding encoded media segments to a playback engine. A client may include processing for decoding media segments, and/or applying cross-fade and/or post-processing operations. A client may load overlapping parts of media segments, and/or utilize the overlapping segments for smooth stream switching, for example, via the processing described herein.
Smooth stream switching between streams with different SNR (e.g., SNR points) may be performed using one or more of the implementations described herein, for example, using overlapping and crossfading, using transcoding and crossfading, using crossfading with scalable codecs, using progressive transcoding, and/or using post-processing. These implementations may be used for H-to-L and/or L-to-H transitions, for example.
Although described with reference to streams encoded at two different rates (e.g., H and L), the smooth stream switching implementations described herein may be utilized on streams of media content encoded at any number of different rates. The frame rate and/or resolution of the encoded streams of the media content (e.g., H and L) may be the same, while the SNR of the encoded streams of the media content may be different.
A client may request and/or receive overlapping segments or sub-segments of media content and perform crossfade between encoded streams of the media content, for example, using the overlapping segments or sub-segments. Sub-segments of a particular segment may be utilized for smooth stream switching. For example, if a segment is of a longer duration, such as more than 30 seconds, for example, then the client may request and/or receive overlapping sub-segments of that segment, such as 2-5 seconds worth of sub-segments, for example, to perform smooth stream switching. Segment(s) may refer to the entire segment(s) and/or may refer to one or more sub-segments of the segment(s).
After receiving overlapping segments, crossfading may be performed between the frames of the overlapping segments to generate one or more transition frames. For example, crossfading may be performed between the frames encoded at rate H and the temporally corresponding (e.g., overlapping) frames encoded at rate L, as shown in
Crossfading may include calculating a weighted average of the overlapping frames encoded at one rate with the overlapping frames encoded at another rate such that the resulting transition frames have parameters that gradually transition from one rate to another over the transition time interval. For example, the weights applied to the overlapping frames encoded at each rate may change over time (e.g., the transition time interval) such that the generated transition frames may be utilized for a more gradual transition between the media content encoded at the various rates. For example, crossfading may include calculating a weighted average of one or more frames characterized by one rate (e.g., a first SNR) and one or more frames characterized by another rate (e.g., a second SNR), for example, by applying a first weight to the frames characterized by the first rate and a second weight to the frames characterized by the second rate. At least one of the first weight and the second weight may change over time (e.g., the transition time interval). For example, crossfading may refer to a smooth fade-in or alpha-blending.
After generating the transition frames via crossfading, the transition frames may be displayed by the client, for example, instead of the temporally corresponding frames at one or more of the rates (e.g., rate H and/or rate L). For example, the client may display one or more frames of the media content encoded at one rate (e.g., rate H) before the transition and/or overlap time interval, display one or more transition frames during the transition and/or overlap time interval, and display one or more frames of the media content encoded at another rate (e.g., rate L) after the transition and/or overlap time interval, for example, in that order. This may provide a smooth transition between the media content encoded at different rates.
z=α(t)L+[1−α(t)]H, where α(t)=(t−ta)/(tb−ta) for ta<t<tb.
z=α(t)H+[1−α(t)]L, where α(t)=(t−ta)/(tb−ta) for ta<t<tb.
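For illustration, the two equations above may be implemented on decoded frames as sketched below in Python with numpy; the frames are assumed to be temporally corresponding, single-channel arrays of equal size, and the L-to-H case follows by exchanging the roles of H and L:

    import numpy as np

    def crossfade_h_to_l(frames_h, frames_l, frame_times, t_a, t_b):
        """Generate transition frames z = a(t)*L + (1 - a(t))*H for ta < t < tb,
        where a(t) = (t - ta)/(tb - ta) increases linearly over the transition."""
        transition = []
        for h, l, t in zip(frames_h, frames_l, frame_times):
            alpha = np.clip((t - t_a) / (t_b - t_a), 0.0, 1.0)
            z = alpha * l.astype(np.float64) + (1.0 - alpha) * h.astype(np.float64)
            transition.append(np.round(z).astype(np.uint8))
        return transition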
Equations described with reference to the systems of
The overlapping stream at a rate (e.g., rate L) may be partitioned into sub-segments, for example, when utilizing overlapping and crossfading transitions in DASH. For example, if the overlapping stream at rate L is partitioned in sub-segments, then time ta (e.g., for a H-to-L transition) or time tb (e.g., for a L-to-H transition) may be selected such that they match the beginning or end, respectively, of a sub-segment, for example, as shown in
After transcoding, the temporally corresponding media content at rate H (e.g., a high SNR) and rate L (e.g., a low SNR) may be utilized similarly as the overlapping segments described herein. For example, the temporally corresponding media content at rate H (e.g., the high SNR) and at rate L (e.g., the low SNR) may be crossfaded to generate one or more transition segments. The transition frames may be displayed instead of the temporally corresponding frames at rate H (e.g., the SNR H), for example, during the transition time (e.g., the time between ta and tb in
z=α(t)L+[1−α(t)]H, where α(t)=(t−ta)/(tb−ta) for ta<t<tb.
z=α(t)H+[1−α(t)]L, where α(t)=(t−ta)/(tb−ta) for ta<t<tb.
y−y1=m(x−x1), where m=(y2−y1)/(x2−x1).
Other types of crossfading besides a linear transition, for example, non-linear transitions, may be used. For example, α(t) may vary non-linearly.
For example, for a non-linear transition, α(t) may be a non-linear function, a logarithmic function, and/or an exponential function. For example, a non-linear function may be a polynomial of degree two or higher (e.g., α(t) may be a polynomial of degree two, where α(t)=a*t^2+b*t+c). For example, a logarithmic function may be defined as α(t)=log_b(f(t)), where log_b may be a logarithm of base "b" and f(t) may be a function of t. For example, an exponential function may be defined as α(t)=b^f(t), where the base "b" may be, for example, "2," "e," "10," etc., and f(t) may be a function of t. α(t) may be a linear function, a non-linear function, a logarithmic function, or an exponential function of t.
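The sketch below illustrates a few candidate shapes for α(t); each is normalized (an assumption made only for illustration) so that α(ta)=0 and α(tb)=1 over the transition interval.

    import math

    def alpha_linear(t, t_a, t_b):
        return (t - t_a) / (t_b - t_a)

    def alpha_quadratic(t, t_a, t_b):
        # Polynomial of degree two: slow start, fast finish.
        u = alpha_linear(t, t_a, t_b)
        return u * u

    def alpha_logarithmic(t, t_a, t_b, base=10.0):
        # Logarithmic shape, rescaled to run from 0 to 1 over the interval.
        u = alpha_linear(t, t_a, t_b)
        return math.log(1.0 + (base - 1.0) * u, base)

    def alpha_exponential(t, t_a, t_b, base=10.0):
        # Exponential shape, rescaled to run from 0 to 1 over the interval.
        u = alpha_linear(t, t_a, t_b)
        return (base ** u - 1.0) / (base - 1.0)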
The input for post-processing may be media content encoded at a higher rate and/or characterized by higher parameter(s) (e.g., frames encoded with a higher SNR). The output of post-processing may be transition frames that may be utilized during the transition time to more gradually transition from a stream encoded at one rate to a stream encoded at another. Various post-processing techniques, such as filtering and re-quantization, for example, may be used to degrade visual quality of media content to generate transition frames.
Filtering may be utilized as a post-processing technique to generate transition frames for smooth stream switching.
The strength (e.g., the cutoff frequency) of the low-pass filter may vary according to the desired degree of degradation of the frame at rate H, for example, as shown in
p(m,n)=h(m,n)*lp(k,l), where “*” may denote convolution.
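A minimal sketch of this filtering step is shown below; a Gaussian kernel is used as the low-pass filter lp(k,l), and the Gaussian choice, the maximum strength, and the HxWxC frame layout are assumptions made for illustration.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def filtered_transition_frame(frame_h, alpha, max_sigma=3.0):
        """Degrade a rate-H frame (HxWxC uint8) by low-pass filtering.

        alpha in [0, 1] controls the filter strength: alpha=0 leaves the
        frame untouched, alpha=1 applies the strongest blur (lowest cutoff).
        """
        sigma = alpha * max_sigma
        if sigma <= 0.0:
            return frame_h
        # Filter spatially only; sigma=0 along the channel axis.
        blurred = gaussian_filter(frame_h.astype(np.float32),
                                  sigma=(sigma, sigma, 0))
        return np.clip(blurred, 0.0, 255.0).astype(np.uint8)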
Re-quantization may be utilized as a post-processing technique to generate one or more transition frames for smooth stream switching. For example, the pixel values of a frame at rate H may be transformed and quantized at different levels to generate transition frames at rates lower than H. One or more quantizers (e.g., uniform quantizers) may be utilized to generate transition frames. For example, the one or more quantizers may be characterized by step sizes that vary according to the desired degree of degradation of a frame at rate H. A larger step size may result in greater degradation and/or may be utilized to generate a transition frame that more closely resembles a frame at rate L. The number of quantization levels may be sufficient to avoid contouring (e.g., contiguous regions of pixels with constant levels, whose boundaries may be referred to as contours). If h(m,n) is the frame at rate H, and Q(•, s) is a uniform quantizer of step size s, then the post-processed frame p(m,n) (e.g., a transition frame) may be generated using pixel quantization according to the following equation:
p(m,n)=Q(h(m,n),s).
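A minimal sketch of this re-quantization is given below; the mid-tread rounding rule and the idea of increasing the step size over the transition interval are assumptions made for illustration.

    import numpy as np

    def requantize_frame(frame_h, step):
        """Uniform re-quantization Q(h(m,n), s) of pixel values with step
        size `step`; larger steps give stronger degradation, while too few
        quantization levels may cause visible contouring."""
        q = np.round(frame_h.astype(np.float32) / step) * step
        return np.clip(q, 0.0, 255.0).astype(np.uint8)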
Smooth switching may be utilized with streams having different spatial resolutions. A client device (e.g., a smartphone, tablet, etc.) may stretch a video to full screen during streaming playback. Stretching a video to full screen may enable a switch between streams encoded at different spatial resolutions during the streaming session. Up-sampling streams from low resolutions may introduce visual artifacts and may cause the video to appear blurred, for example, because high frequency information may be lost during down-sampling.
z=α(t)L+[1−α(t)]H, where α(t)=(t−ta)/(tb−ta) for ta<t<tb.
z=α(t)H+[1−α(t)]L, where α(t)=(t−ta)/(tb−ta) for ta<t<tb.
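For switches between different spatial resolutions, one possible sketch is to up-sample the low-resolution frame to the display (high) resolution before applying the same crossfade; the bilinear up-sampling and the assumption that the two resolutions are related by exact scale factors are illustrative choices, not requirements.

    import numpy as np
    from scipy.ndimage import zoom

    def crossfade_across_resolutions(frame_h, frame_l, alpha):
        """Up-sample the low-resolution frame to the size of the
        high-resolution frame (bilinear, order=1) and blend the two."""
        sy = frame_h.shape[0] / frame_l.shape[0]
        sx = frame_h.shape[1] / frame_l.shape[1]
        l_up = zoom(frame_l.astype(np.float32), (sy, sx, 1), order=1)
        z = alpha * l_up + (1.0 - alpha) * frame_h.astype(np.float32)
        return np.clip(z, 0.0, 255.0).astype(np.uint8)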
Smooth stream switching may be utilized with streams having different frame rates. Media content (e.g., video streams) with a low frame rate may suffer from poor temporal correlation between frames, for example, because frames may be farther apart in time from each other as compared to media content with higher frame rate. Frame rate upsampling (FRU) techniques may be utilized to convert a stream of media content with a low frame rate to a high frame rate.
z=α(t)L+[1−α(t)]H, where α(t)=(t−ta)/(tb−ta) for ta<t<tb.
z=α(t)H+[1−α(t)]L, where α(t)=(t−ta)/(tb−ta) for ta<t<tb.
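A very simple frame-rate up-sampling sketch is shown below; real FRU techniques may use motion-compensated interpolation, and the plain linear blending of neighboring frames here is an assumption made only for illustration.

    import numpy as np

    def upsample_frame_rate(frames, factor=2):
        """Insert (factor - 1) interpolated frames between each pair of
        consecutive frames by linear blending of pixel values."""
        out = []
        for a, b in zip(frames[:-1], frames[1:]):
            out.append(a)
            for k in range(1, factor):
                w = k / factor
                mid = (1.0 - w) * a.astype(np.float32) + w * b.astype(np.float32)
                out.append(np.clip(mid, 0.0, 255.0).astype(np.uint8))
        out.append(frames[-1])
        return out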
Asymmetry of duration for smoothing H-to-L and/or L-to-H transitions may be utilized. A transition from a low-quality representation to a high-quality representation may be characterized by a less degrading effect than a transition from a high-quality representation to a low-quality representation. The time delays for smoothing transitions from H-to-L and from L-to-H may therefore be different. For example, transitions (e.g., transitions including more transition frames) may be longer for H-to-L switches and shorter for L-to-H switches. For example, a transition of a couple of seconds (e.g., two seconds) may be utilized for H-to-L quality transitions, and/or a slightly shorter transition (e.g., one second) may be utilized for L-to-H transitions.
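The asymmetric durations may be captured by a simple rule, as in the hypothetical sketch below; the two-second and one-second values follow the example above, and the frame rate is an assumed parameter.

    def transition_frame_count(h_to_l, frame_rate=30.0):
        """Number of transition frames: roughly 2 s for an H-to-L switch
        and roughly 1 s for an L-to-H switch."""
        duration = 2.0 if h_to_l else 1.0
        return int(round(duration * frame_rate))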
Smooth stream switching may be utilized for audio transitions, for example, in DASH. The DASH standard may define one or more types of connections between streams, which may be referred to as SAPs. A SAP may be utilized to ensure that catenation of streams along these points may produce a correctly decodable MPEG stream.
An audio codec may decode and discard one block at the beginning. This may be mathematically sufficient for correct decoding of all blocks that follow, for example, due to a perfect-reconstruction property of the MDCT transform, which may employ overlapping windows. A block preceding the block that is being decoded may be retrieved, decoded, and then discarded prior to decoding the requested data, for example, in order to achieve random access. For an audio codec (e.g., HE-AAC, AAC-ELD, MPEG-Surround, etc.), the number of blocks to be discarded at the beginning may be more or less than one (e.g., three blocks), for example, due to the use of an SBR tool.
Audio segments may be unlabeled (e.g., may not include a StartWithSAP attribute), or may be labeled with SAP type=1, for example, if there are no stream switches, and/or if there are switches between streams that use the same codec, operate with audio captured at the same sampling rate and same cut-off frequency, use the same number of channels, and/or use the same tools and modes in the codec (e.g., no addition/removal of an SBR tool, use of the same stereo coding mode, etc.).
For example, a stereo AAC stream at 128 Kbps may be utilized for high-quality reproduction. The stream may be reduced to approximately 64-80 Kbps for lower quality. In order to go to rates of 32-48 Kbps, an SBR tool (e.g., HE-AAC), a switch to parametric stereo, etc., may be utilized.
SAP point declaration may be utilized for switchable audio streams. For example, for MDCT-core AAC, Dolby AC3, and/or MP3 codecs, SAPs may be defined as SAP type 4 points. For example, for HE-AAC, AAC-ELD, MPEG Surround, MPEG SAOC, and/or MPEG USAC codecs, SAPs may be defined as SAP type 6 points. For example, a new SAP type (e.g., SAP type "0") may be defined for use with audio codecs. The new SAP type may be characterized by the following: T_EPT<=T_PFT<T_DEC<=T_SAP. For example, if T_DEC<T_SAP, then an additional parameter may be utilized to define a distance between the points. For example, the use of a new SAP type (e.g., type 0) may not involve a change in profile, for example, since most profiles in DASH support SAPs of types <=3.
Seamless stream switching between audio streams may be implemented. Even if SAP types are defined correctly, a catenation of segments may not produce the best user experience during playback. Changes in codecs or sampling rates may manifest as clicks during playback. In order to avoid such clicks, a client (e.g., a DASH client) may implement a decode and/or cross-fade operation, for example, similar to those described above with reference to video switching.
z=α(t)L+[1−α(t)]H.
z=α(t)H+[1−α(t)]L.
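A minimal sketch of the corresponding audio crossfade, applied sample-wise to decoded PCM from the two streams over the overlap region, is given below; the linear ramp and the (samples x channels) PCM layout are assumptions made for illustration.

    import numpy as np

    def crossfade_audio(pcm_h, pcm_l, h_to_l=True):
        """Sample-wise crossfade of two equally long decoded PCM blocks,
        per z = alpha*L + (1 - alpha)*H or z = alpha*H + (1 - alpha)*L."""
        n = pcm_h.shape[0]
        alpha = np.linspace(0.0, 1.0, n)
        if pcm_h.ndim == 2:               # (samples, channels)
            alpha = alpha[:, np.newaxis]
        if h_to_l:
            return alpha * pcm_l + (1.0 - alpha) * pcm_h
        return alpha * pcm_h + (1.0 - alpha) * pcm_l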
Although some of the implementations are described above with reference to one of encoding or decoding, one of ordinary skill in the art will appreciate that the implementations may be utilized for both encoding and decoding streams of media content.
Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable media include electronic signals (transmitted over wired or wireless connections) and computer-readable storage media. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.
This application claims the benefit of U.S. Provisional Patent Application No. 61/637,777, filed Apr. 24, 2012, the contents of which are hereby incorporated by reference herein.