The present invention generally relates to systems and methods for multimedia streaming over wired and wireless data networks, and the like, more particularly to systems and methods for streaming and real time editing of multimedia content within the context of current systems, and methods for Moving Picture Experts Group (MPEG)-Dynamic Adaptive Streaming over HTTP (DASH) delivery, and the like.
In recent years, video streaming over HTTP (Hyper Text Transfer Protocol) has become the dominant means of video delivery over the Internet, and the like. However, HTTP alone has proven insufficient for video delivery, particularly under challenging network conditions, such as resource constrained networks, devices, and the like. A number of dynamic adaptive streaming methods over HTTP protocols have emerged to improve on the basic static progressive download paradigm of traditional HTTP video delivery. The Moving Picture Experts Group (MPEG) committee standardized MPEG-Dynamic Adaptive Streaming over HTTP (DASH) in 2012 to help increase end user compatibility with various content providers and services. Hence, many video streaming service providers have started adopting MPEG-DASH streaming technology. However, the MPEG-DASH content streamed from such service providers has new limitations, relative to static progressive download, and the like, which have largely prevented traditional man-in-the-middle video optimization services, and the like, from editing or transcoding content in real-time during the content delivery, and the like.
Therefore, there is a need for methods and systems that address the above and other problems with systems and methods for content streaming, and the like. Accordingly, the above and other needs are addressed by the illustrative embodiments of the present invention, which provide a novel method and system that facilitates transcoding and editing of multimedia content in real-time when it is being streamed with an on-demand Moving Picture Experts Group (MPEG)-Dynamic Adaptive Streaming over HTTP (DASH) format. The resultant content advantageously adheres to the MPEG-DASH streaming standard, and does not alter the playability of the stream as determined by various MPEG-DASH syntax elements, and the like. For example, the novel system, method and computer program product can include modifying MPEG-DASH on-demand multimedia content in real time, including a restructuring stage that enables parallel processing and reduces latency; a transcoding stage where media subsegments are transcoded; a padding stage configured for maintaining subsegment offsets; a distributed system for padding to save on bandwidth; and a signaling protocol to pass information related to the padding. The system, method and computer program product provide the flexibility to modify MPEG-DASH, on-demand content, in real time, including arbitrary partial modification of a content stream, advantageously, without altering the functionality of the stream, as determined by various MPEG-DASH syntax elements, and the like.
Accordingly, in an illustrative aspect, there is provided a system, method and computer program product for a real-time post-processing system that transforms MPEG-DASH on-demand media streams, including a DASH media player device; an intercepting media server device; an MPEG-DASH content origin server device; and a proxy media client device coupled to the DASH media player device and the intercepting media server device and configured to intercept MPEG-DASH HTTP requests from the DASH media player device and forward the intercepted requests to the intercepting media server device instead of the MPEG-DASH content origin server device. The intercepting media server device is configured to act as an HTTP proxy device, to forward the intercepted requests to the MPEG-DASH content origin server device, and, for each corresponding MPEG-DASH media subsegment acquired, to perform analysis of the video media content within the subsegment and apply selective transcoding.
The applied selective transcoding is configured to achieve one of higher compression to improve delivery efficiency over a communications network, and addition of watermarking, advertisement and logo insertion, picture-in-picture insertion, and text overlay. The output of the intercepting media server device is directed back to the proxy media client device and includes either a transcoded subsegment that is smaller than the original size of the subsegment, or the original subsegment unmodified, yielding an overall DASH media subsegment stream that arbitrarily interleaves modified subsegments and unmodified subsegments and has an overall size that is less than or equal to the original. The proxy media client device inspects the incoming media subsegment and, if it has been modified, invokes a padding agent device to replace the data bytes removed due to transcoding. The output of the proxy media client device is directed back to the DASH media player device, which is configured to buffer and index the incoming subsegment in an identical manner whether modified or unmodified. The MPEG-DASH subsegment index (sidx) is not modified, while the structure and functionality of the MPEG-DASH on-demand stream in its entirety are maintained through the manipulation of an MPEG-DASH ISO Base media file format “free box” syntax element.
The MPEG-DASH initialization segment (init) and subsegment index (sidx) are stored in the intercepting media server device in persistent and shareable cache memory and added to the beginning of every media subsegment that arrives at the intercepting media server device for independent decoding and parallel processing of media subsegment payloads.
The payload of modified media subsegments is augmented with “pad” data bytes equal in total size to the data bytes removed during the transcoding process at the intercepting media server device, and, using the “free box” syntax element of the ISO Base media file format within the MPEG-DASH on-demand stream, the original size of MPEG-DASH subsegments processed by the intercepting media server device is restored, thus maintaining coherence with the sidx subsegment indexes.
The “free box” syntax element of the ISO Base media file format is used as an in-band signaling mechanism in the transcoded MPEG-DASH media subsegment conveying parameters including position, amount, and format of padding employed by the padding agent device of the proxy media client device.
The overall DASH media stream that arbitrarily interleaves modified media subsegments and unmodified media subsegments is a completely valid MPEG-DASH stream and adheres to the original subsegment index (sidx) offsets.
Still other aspects, features, and advantages of the present invention are readily apparent from the following detailed description, simply by illustrating a number of illustrative embodiments and implementations, including the best mode contemplated for carrying out the present invention. The present invention also is capable of other and different embodiments, and its several details can be modified in various respects, all without departing from the spirit and scope of the present invention. Accordingly, the drawings and descriptions are to be regarded as illustrative in nature, and not as restrictive.
The embodiments of the present invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements, and in which:
The present invention includes recognition that Moving Picture Experts Group (MPEG)-Dynamic Adaptive Streaming over HTTP (DASH) technology is a streaming technology that can be adopted by content providers, and the like, and that there is a need by various parties, such as network optimization service providers, and the like, to modify the streamed multimedia content in real time. The streaming technology can include various components, such as Media Content Hosting Services (CDN), an intercepting media server (IMS), a Proxy Media Client (PMC), and an MPEG-DASH player.
Accordingly, the system, method and computer program product for modifying MPEG-DASH content in real time can include a restructuring stage that converts each subsegment into an independent stream, a transcoding and/or modification stage of the subsegment, and a padding stage that suffixes the modified partial content, such as to maintain original offsets, advantageously maintaining playback options, such as seeking forward and backward at different speeds, and the like.
Advantageously, restructuring of the stream into multiple short streams that can be independently processed facilitates processing of a partially received HTTP response and also allows parallel processing of different parts of the stream, thus potentially reducing processing latency.
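By way of a non-limiting illustration, the restructuring described above can be sketched as follows. The sketch assumes that the initialization segment and the sidx box have already been fetched as raw bytes; the names `header_cache`, `remember_header`, `standalone_streams`, and `process_in_parallel` are hypothetical and do not appear in the disclosure, and the thread pool merely stands in for whatever parallel processing facility an implementation employs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical shareable cache: representation URL -> init segment + sidx bytes.
header_cache: Dict[str, bytes] = {}


def remember_header(representation_url: str, init_segment: bytes, sidx_box: bytes) -> None:
    """Store the init segment and sidx box once per representation."""
    header_cache[representation_url] = init_segment + sidx_box


def standalone_streams(representation_url: str, subsegments: List[bytes]) -> List[bytes]:
    """Prefix every media subsegment with the cached header so that each
    one becomes a short stream that can be processed on its own."""
    header = header_cache[representation_url]
    return [header + subsegment for subsegment in subsegments]


def process_in_parallel(streams: List[bytes], transform: Callable[[bytes], object]) -> list:
    """Apply `transform` to each standalone stream concurrently.
    Results are returned in the original subsegment order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(transform, streams))
```

Because each short stream carries its own copy of the header, any worker can process a subsegment without waiting for earlier subsegments, which is the property relied upon for reduced latency.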
During the server side transcoding stage, a subsegment received from the Media Content Hosting Services (CDN) goes through a multi-stage processing pipeline, for example, including decoding, content-analysis, application of desired modifications and encoding with desired rate-distortion parameters, and the like. For example, the video DASH stream packets that arrive from Media Content Hosting Services (CDN) are first de-multiplexed to parse out video elementary streams, which are then sent to the video decoder for decoding. During the decoding stage, the compressed video stream is decoded into a raw YUV representation. Modifications such as higher compression ratio, addition of watermarking, advertisement and logo insertion, picture-in-picture insertion, and text overlay are then performed on the YUV representation. The content analysis module analyzes the characteristics of the modified YUV video, after which best-fit encoding parameters are chosen for the encoder to realize the desired rate-distortion target. A video encoding system, for example, as described in PCT patent application serial number PCT/US13/32303, entitled “METHOD AND SYSTEM FOR IMPROVED VIDEO CODEC RATE-DISTORTION PERFORMANCE BY PRE AND POST-PROCESSING,” filed on Mar. 15, 2013, incorporated by reference herein, provides an example method and system for realizing rate-distortion optimized transcoding in this context. Finally, the encoded stream is multiplexed back into the original DASH container format before the stream leaves the server. Similar processes and methods can be applied to transcode audio DASH streams, and the like.
The padding stage includes augmenting the modified or transcoded content with data conforming to a given file format (e.g., MP4, WEBM, etc.) specification, such that the relative positions of the media subsegments are maintained and so that the original structure of the content remains intact. Advantageously, this ensures that the DASH client receives the expected amount of data, and that the playback, seeking, and the like, functionality of the player is not affected.
The above system can function on a single hardware instance or as a distributed system of multiple hardware instances, and the like. In the case of distributed processing, the restructuring and padding stages can occur at an intercepting media server (IMS) and padding agent (PA), respectively. The padding employed is communicated from the intercepting media server to the padding agent using an in-band signaling mechanism in the stream itself, which conveys information about the padding, for example, including the position of padding, the amount of padding employed, and the like. An illustrative implementation of the intercepting media server can include a Linux server with a public IP address configured with appropriate software modules to function as a forwarding HTTP proxy, and further configured for local caching and accessing of analysis and transcoding services provided by Web Services or Remote Procedure Call (RPC) hosted internally or externally on additional servers. Such servers and services can be hosted on public clouds, such as Amazon Web Services (AWS), on privately owned and operated server infrastructure, and the like. An example of the padding agent includes a Squid caching proxy installed on a personal computer (PC), acting as the Proxy Media Client (PMC), configured for interoperation with C-ICAP services, which in turn are configured to fulfill padding agent stream filtering, pad-byte insertion functionality, and the like.
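By way of a non-limiting illustration, the pad-byte insertion performed by the padding agent can be sketched as follows. The sketch assumes that the in-band signal is simply a trailing ISO Base Media File Format “free” box whose declared size exceeds the bytes actually received, so that the position of the padding is implied by the location of the box and the amount by its size field; the disclosure states only that position, amount, and the like are conveyed. The function name `pad_subsegment`, the zero-valued pad byte, and the whole-buffer (rather than streaming) operation are likewise assumptions.

```python
import struct

BOX_HEADER_SIZE = 8  # 4-byte big-endian size followed by a 4-byte type


def pad_subsegment(data: bytes, pad_byte: bytes = b"\x00") -> bytes:
    """Walk the top-level boxes of a subsegment and, if the last box is a
    'free' box declared larger than the bytes present, append pad bytes
    until the declared size is reached. Unmodified input is returned as is."""
    pos = 0
    while pos + BOX_HEADER_SIZE <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        if size < BOX_HEADER_SIZE:
            # Sizes 0 (box extends to end of file) and 1 (64-bit size)
            # are not handled in this sketch.
            raise ValueError("unsupported box size")
        end = pos + size
        if end > len(data):
            if box_type != b"free":
                raise ValueError("truncated box that is not a 'free' box")
            return data + pad_byte * (end - len(data))
        pos = end
    return data
```

An unmodified subsegment contains no under-filled “free” box and therefore passes through unchanged, so the same filter can be applied to an arbitrarily interleaved stream.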
Another aspect of the described system and method includes delivering a valid MPEG-DASH on-demand stream, even if the stream is only partially modified, and the like. A partially modified stream can include an arbitrarily interleaved pattern of original subsegments and modified subsegments. Since the subsegment offsets are maintained, the interleaved stream is a completely valid MPEG-DASH on-demand stream.
Referring now to the drawings, wherein like reference numerals designate identical or corresponding parts throughout the several views, and more particularly to
In the case that the initial HTTP content request 108 is not a DASH request it is forwarded directly to the Media Content Hosting Services (CDN) 101 if it is related to the media playback session as request 111 with a subsequent response 112 or directly to the World-Wide-Web host 116 as request 117 with a subsequent response 118 if it is not related to the media playback session.
The embedded MPEG-DASH player 109 initiates an HTTP request 108 for a Media Presentation Description (MPD) file. After receiving a valid MPD file, based on the representations specified in the MPD file and the prevalent network conditions (e.g., bandwidth and latency), the MPEG-DASH player 109 selects an appropriate representation and sends a request for the same at 108. The player can follow different types of request patterns such as, one request per subsegment, one request for multiple subsegments or requests not aligned to the subsegment boundary, and the like.
The Media Content Hosting Services (CDN) 101 serves the MPD file and all the representations that are specified within this file. The URL links to representations, along with other information, such as initialization segment range, indexing segment range, and the like, are specified in the MPD file. Whenever the Media Content Hosting Services (CDN) 101 receives a request for a video representation, it is accompanied by the desired byte range. Accordingly, the Media Content Hosting Services (CDN) 101 streams the corresponding requested bytes to the MPEG-DASH player 109.
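By way of a non-limiting illustration, an intercepting component can relate the byte range of a request to the subsegments it touches, whether the request covers one subsegment, several subsegments, or a range not aligned to a subsegment boundary. The following sketch assumes that the absolute start offsets and sizes of the subsegments have already been derived from the indexing segment; the function name `overlapping_subsegments` and the example offsets are hypothetical.

```python
from bisect import bisect_right
from typing import List


def overlapping_subsegments(starts: List[int], sizes: List[int],
                            req_start: int, req_end: int) -> List[int]:
    """Return the indices of subsegments touched by the inclusive byte
    range [req_start, req_end]. `starts` must be sorted ascending."""
    if not starts or req_end < starts[0]:
        return []  # request lies entirely before the first subsegment
    stream_end = starts[-1] + sizes[-1] - 1
    if req_start > stream_end:
        return []  # request lies entirely after the last subsegment
    first = max(bisect_right(starts, req_start) - 1, 0)
    last = bisect_right(starts, req_end) - 1
    return list(range(first, last + 1))


starts = [800, 1800, 3000]   # hypothetical absolute offsets
sizes = [1000, 1200, 900]    # hypothetical subsegment sizes in bytes
aligned = overlapping_subsegments(starts, sizes, 1800, 2999)
unaligned = overlapping_subsegments(starts, sizes, 1700, 1900)
```

In this example the aligned request touches only the second subsegment, whereas the unaligned request straddles the boundary between the first and second subsegments.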
The intercepting media server 102 acts as an HTTP proxy and intercepts all the relevant MPEG DASH traffic between Media Content Hosting Services (CDN) 101 and Proxy Media Client 103. The intercepting media server 102 can have different capabilities, such as editing the video, transcoding the video, and the like. Advantageously, such functionality can be employed to improve the video quality by removing certain artifacts, to reduce the bandwidth consumption based on certain conditions, to change the resolution according to the display device, to add watermarking, for advertisement or logo insertion, for picture-in-picture insertion, to add text overlays, and the like. The challenge, however, is that the end user experience should not be negatively affected. Advantageously, the novel system and method solve these and other problems, as the edited content is delivered in real time with MPEG-DASH syntax elements that support player features such as seeking and bitrate adaptation, and the like, continuing to function.
When content is streamed on demand using MPEG-DASH, stream access points (SAPs) are used as indexing points for changing representations (e.g., generally to modulate the employed bit-rate, and as specified in ISO/IEC standard 14496-12, section 8.16.3: “Segment Index Box”). However, most indexing schemes employ a priori knowledge of the length of each subsegment and the total number of subsegments, and run contrary to dynamic content generation or modification. For example, if the media stream is edited during delivery and the edited stream differs in size, the media subsegment offsets specified in the indexing segment no longer correctly correspond to the modified stream, potentially breaking the functionality of the stream.
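By way of a non-limiting illustration, the offset mismatch described above can be shown with a short calculation. The sketch assumes that the indexing segment supplies an ordered list of subsegment sizes and that the absolute offset of the first subsegment is known; the function name `subsegment_ranges` and all numeric values are hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple


def subsegment_ranges(first_offset: int, sizes: List[int]) -> List[Tuple[int, int]]:
    """Return the inclusive (start, end) byte range of each subsegment,
    given the offset of the first one and the size of each."""
    starts = accumulate(sizes[:-1], initial=first_offset)
    return [(start, start + size - 1) for start, size in zip(starts, sizes)]


# Ranges the player expects, as implied by the unmodified index.
indexed = subsegment_ranges(800, [1000, 1200, 900])

# Ranges actually occupied if the second subsegment is transcoded
# from 1200 to 700 bytes and nothing compensates for the difference.
actual = subsegment_ranges(800, [1000, 700, 900])
```

Here the index places the third subsegment at bytes 3000 to 3899, whereas in the shortened stream it occupies bytes 2500 to 3399, so a seek based on the index would land in the wrong place.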
As a solution to the above and other problems, the novel method and system, as further illustrated in
In the illustrative examples, employed are MPEG-DASH utilizing the ISO Base Media File Format with MP4 extensions (MP4), and the like, but the method and system can be used with other container formats, and the like, based on the teaching of the present disclosure, as will be appreciated by those skilled in the relevant art(s). In addition, the disclosure illustrates an end-to-end working data flow model, which modifies MPEG DASH on demand video streams in real-time. However, similar dynamic modification can be applied to DASH audio streams, and the like, based on the teaching of the present disclosure, as will be appreciated by those skilled in the relevant art(s).
As shown in
The traffic between the embedded DASH player 109 and the Media Content Hosting Services (CDN) 101 can be intercepted in various ways including, for example, by the proxy media client (PMC) 103, if it is installed on the client or by a global proxy redirecting HTTP traffic to the intercepting media server (IMS) 102, followed by HTTP request filtering at the intercepting media server 102 in the HTTP filtering proxy module 115.
As an example pipeline, consider transcode pipeline 406. Here the subsegment is decoded, followed by re-encoding with alternative encoding parameters and tools, which achieves higher compression, thus reducing the size of the encoded media subsegment. This modified subsegment is packaged 407 as “moof” and “mdat” boxes as required by the ISO Base Media File Format (ISOBMFF) container format. To make sure that the stream continues to conform to the original “sidx” indexing segment, the “mdat” media subsegment box is followed by the insertion of an ISOBMFF “free” box 408. The “free” box is an optional ISOBMFF syntax element that consumes space but has no functional role. The size of the “free” box is specified by a four-byte field such that the combined size of the “moof”, “mdat” and “free” boxes equals the original media subsegment size. In other words, the “free” box size compensates for the modified subsegment's size reduction. Note that, at this functional data point, the “free” box is empty and only possesses a size property, but no actual payload.
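By way of a non-limiting illustration, the sizing of the “free” box can be sketched as follows. The sketch assumes that the transcoded “moof” and “mdat” boxes are available as raw bytes; the function names `free_box_header` and `compact_subsegment` are hypothetical. An ISOBMFF box header occupies eight bytes (a four-byte size and a four-byte type), so a size reduction of one to seven bytes cannot be expressed as a “free” box; the disclosure does not say how such a case is handled, and raising an error here is an assumption.

```python
import struct

BOX_HEADER_SIZE = 8  # 4-byte big-endian size followed by a 4-byte type


def free_box_header(original_size: int, transcoded_size: int) -> bytes:
    """Return the header of a 'free' box whose declared size equals the
    number of bytes removed by transcoding. No payload is emitted."""
    gap = original_size - transcoded_size
    if gap == 0:
        return b""  # same size as the original: nothing to compensate
    if gap < BOX_HEADER_SIZE:
        # Covers a grown subsegment (gap < 0) and gaps of 1 to 7 bytes.
        raise ValueError("size difference cannot be expressed as a 'free' box")
    return struct.pack(">I4s", gap, b"free")


def compact_subsegment(moof: bytes, mdat: bytes, original_size: int) -> bytes:
    """Concatenate the transcoded boxes with a payload-less 'free' header."""
    return moof + mdat + free_box_header(original_size, len(moof) + len(mdat))


header = free_box_header(5000, 3200)
declared_size = struct.unpack(">I", header[:4])[0]
```

In this example the transcoded boxes are 1800 bytes shorter than the original subsegment, so the “free” box declares a size of 1800 while only its eight-byte header is transmitted.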
The new subsegment representation (with “free” box size but no payload) is smaller than the original and thus saves bandwidth over a transmission channel or storage device. The “free” box, however, must ultimately include a payload of pad bytes equal to its declared size before the modified subsegment is passed to the DASH player 109. To achieve this, the Padding Agent module 114 in
A complication can come up in the above mentioned system when the intercepting media server (IMS) 102 does not transcode all the subsegments;
The above described devices and subsystems of the illustrative embodiments can include, for example, any suitable servers, workstations, PCs, laptop computers, PDAs, Internet appliances, handheld devices, cellular telephones, wireless devices, computer architectures including x86, ARM, MIPS with operating system (OS) platforms including Windows, Linux, iOS, Android, other electronic devices, and the like, capable of performing the processes of the illustrative embodiments. The devices and subsystems of the illustrative embodiments can communicate with each other using any suitable protocol and can be implemented using one or more programmed computer systems or devices. One or more interface mechanisms can be used with the illustrative embodiments, including, for example, Internet access, telecommunications in any suitable form (e.g., voice, modem, and the like), wireless communications media, and the like. For example, employed communications networks or links can include one or more wireless communications networks, cellular communications networks, cable communications networks, satellite communications networks, 3G communications networks, Public Switched Telephone Networks (PSTNs), Packet Data Networks (PDNs), the Internet, intranets, WiMAX Networks, “cloud” computer networks, virtual machine and hosting networks, a combination thereof, and the like.
It is to be understood that the devices and subsystems of the illustrative embodiments are for illustrative purposes, as many variations of the specific hardware and/or software used to implement the illustrative embodiments are possible, as will be appreciated by those skilled in the relevant art(s). For example, the functionality of one or more of the devices and subsystems of the illustrative embodiments can be implemented via one or more programmed computer systems or devices.
To implement such variations as well as other variations, a single computer system can be programmed to perform the special purpose functions of one or more of the devices and subsystems of the illustrative embodiments. On the other hand, two or more programmed computer systems or devices can be substituted for any one of the devices and subsystems of the illustrative embodiments. Accordingly, principles and advantages of distributed processing, such as redundancy, replication, and the like, also can be implemented, as desired, to increase the robustness and performance of the devices and subsystems of the illustrative embodiments.
The devices and subsystems of the illustrative embodiments can store information relating to various processes described herein. This information can be stored in one or more memories, such as a hard disk, optical disk, magneto-optical disk, RAM, and the like, of the devices and subsystems of the illustrative embodiments. One or more databases of the devices and subsystems of the illustrative embodiments can store the information used to implement the illustrative embodiments of the present invention. The databases can be organized using data structures (e.g., records, tables, arrays, fields, graphs, trees, lists, and the like) included in one or more memories or storage devices listed herein. The processes described with respect to the illustrative embodiments can include appropriate data structures for storing data collected and/or generated by the processes of the devices and subsystems of the illustrative embodiments in one or more databases thereof. All or a portion of the devices and subsystems of the illustrative embodiments can be conveniently implemented using one or more general purpose computer systems, microprocessors, digital signal processors, micro-controllers, application processors, domain specific processors, application specific signal processors, and the like, programmed according to the teachings of the illustrative embodiments of the present invention, as will be appreciated by those skilled in the computer and software arts. Appropriate software can be readily prepared by programmers of ordinary skill based on the teachings of the illustrative embodiments, as will be appreciated by those skilled in the software art. In addition, the devices and subsystems of the illustrative embodiments can be implemented by the preparation of application-specific integrated circuits or by interconnecting an appropriate network of conventional component circuits, as will be appreciated by those skilled in the electrical art(s). 
Thus, the illustrative embodiments are not limited to any specific combination of hardware circuitry and/or software.
Stored on any one or on a combination of computer readable media, the illustrative embodiments of the present invention can include software for controlling the devices and subsystems of the illustrative embodiments, for driving the devices and subsystems of the illustrative embodiments, for enabling the devices and subsystems of the illustrative embodiments to interact with a human user, and the like. Such software can include, but is not limited to, device drivers, firmware, operating systems, development tools, applications software, and the like. Such computer readable media further can include the computer program product of an embodiment of the present invention for performing all or a portion (e.g., if processing is distributed) of the processing performed in implementing the illustrative embodiments. Computer code devices of the illustrative embodiments of the present invention can include any suitable interpretable or executable code mechanism, including but not limited to scripts, interpretable programs, dynamic link libraries (DLLs), Java classes and applets, complete executable programs, Common Object Request Broker Architecture (CORBA) objects, SW frameworks including .NET/CLR, JVM, scripting frameworks including PHP, Python, Perl, Shell, and the like. Moreover, parts of the processing of the illustrative embodiments of the present invention can be distributed for better performance, reliability, cost, and the like.
As stated above, the devices and subsystems of the illustrative embodiments can include computer readable medium or memories for holding instructions programmed according to the teachings of the present invention and for holding data structures, tables, records, and/or other data described herein. Computer readable medium can include any suitable medium that participates in providing instructions to a processor for execution. Such a medium can take many forms, including but not limited to, non-volatile media, volatile media, transmission media, and the like. Non-volatile media can include, for example, optical or magnetic disks, magneto-optical disks, flash memories, and the like. Volatile media can include dynamic memories, and the like. Transmission media can include coaxial cables, copper wire, fiber optics, and the like. Transmission media also can take the form of acoustic, optical, electromagnetic waves, and the like, such as those generated during radio frequency (RF) communications, infrared (IR) data communications, transmission media including WiFi/802.11, BT, 3G, LTE, and the like. Common forms of computer-readable media can include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other suitable magnetic medium, a CD-ROM, CDRW, DVD, solid-state drive (SSD) storage devices, any other suitable optical medium, punch cards, paper tape, optical mark sheets, any other suitable physical medium with patterns of holes or other optically recognizable indicia, a RAM, a PROM, an EPROM, a FLASH-EPROM, a DRAM, a DDR, a NAND/NOR flash device, any other suitable memory chip or cartridge, a carrier wave, or any other suitable medium from which a computer can read.
While the present invention has been described in connection with a number of illustrative embodiments and implementations, the present invention is not so limited, but rather covers various modifications and equivalent arrangements, which fall within the purview of the appended claims.
The present invention is a continuation of U.S. patent application Ser. No. 17/176,157 of KARTHIK RAJA THANGARAJ et al., entitled “METHOD AND SYSTEM FOR REAL-TIME TRANSCODING OF MPEG-DASH ON-DEMAND MEDIA SEGMENTS WHILE IN TRANSIT FROM CONTENT HOST TO DASH CLIENT,” filed on Feb. 15, 2021, now allowed, which is a continuation of U.S. patent application Ser. No. 14/748,567 of KARTHIK RAJA THANGARAJ et al., entitled “METHOD AND SYSTEM FOR REAL-TIME TRANSCODING OF MPEG-DASH ON-DEMAND MEDIA SEGMENTS WHILE IN TRANSIT FROM CONTENT HOST TO DASH CLIENT,” filed on Jun. 24, 2015, now U.S. Pat. No. 10,924,781, which claims priority to U.S. Provisional patent application Ser. No. 62/018,160 of KARTHIK RAJA THANGARAJ et al., entitled “METHOD AND SYSTEM FOR REAL-TIME TRANSCODING OF MPEG-DASH ON-DEMAND MEDIA SEGMENTS WHILE IN TRANSIT FROM CONTENT HOST TO DASH CLIENT,” filed on Jun. 27, 2014, the entire disclosure of which is hereby incorporated by reference herein.
Number | Date | Country |
---|---|---|
62018160 | Jun 2014 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 17176157 | Feb 2021 | US |
Child | 17971441 | | US |
Parent | 14748567 | Jun 2015 | US |
Child | 17176157 | | US |