The present disclosure generally relates to multimedia content and more particularly, to systems and methods for streaming video based on conversion of a target key frame within the video bitstream.
As smartphones and other mobile devices have become ubiquitous, people can stream video content virtually any time. With an ever-growing amount of content available to consumers through the Internet and other sources, consumers have access to a vast amount of digital content. Furthermore, mobile devices can begin playback of content without having to first download the entire video. However, there are perceived shortcomings within conventional streaming techniques.
Briefly described, one embodiment, among others, is a method implemented in a computing device for streaming a video to a client device. The method comprises transmitting information relating to a video bitstream to a client device, the client device displaying a user interface to a user based on the information relating to the video bitstream. The method further comprises receiving an input signal from the client device indicating a desired start time in the video bitstream, the desired start time specifying a point in the video bitstream to begin streaming. Based on the start time, a target key frame is identified within the video, the target key frame comprising a nearest key frame in the video bitstream preceding the desired start time. The method further comprises identifying a target frame header within the video bitstream, the target frame header comprising a nearest header frame in the video preceding the target key frame, and converting the target key frame to a first frame type. The method further comprises sending only the target frame header, the target key frame, and the video bitstream that follows the converted target key frame to the client device, wherein the original frame referencing encoded in the video bitstream is utilized, and wherein the client device decodes the video bitstream utilizing the target frame header and the target key frame.
Another embodiment is a system that comprises a memory storing instructions and a processor coupled to the memory. The processor is configured by the instructions to transmit information relating to a video bitstream to a client device, the client device displaying a user interface to a user based on the information relating to the video bitstream. The processor is further configured by the instructions to receive an input signal from the client device indicating a desired start time in the video bitstream, the desired start time specifying a point in the video bitstream to begin streaming. Based on the start time, a target key frame is identified within the video, the target key frame comprising a nearest key frame in the video bitstream preceding the desired start time. The processor is further configured by the instructions to identify a target frame header within the video bitstream, the target frame header comprising a nearest header frame in the video preceding the target key frame, and convert the target key frame to a first frame type. The processor is further configured by the instructions to send only the target frame header, the target key frame, and the video bitstream that follows the converted target key frame to the client device, wherein the original frame referencing encoded in the video bitstream is utilized, and wherein the client device decodes the video bitstream utilizing the target frame header and the target key frame.
Various aspects of the disclosure can be better understood with reference to the following drawings. In the drawings, like reference numerals designate corresponding parts throughout the several views.
Conventional systems utilize various techniques for streaming video content to remote devices. A common technique involves streaming an entire source video to the device where the video is being viewed. If the viewer elects to start playback at a point within the video other than at the beginning, the computing device begins streaming at the nearest possible point prior to the desired start point. However, there is generally a delay in viewing the video at the actual desired starting point as the system must first decode video content that precedes the actual desired start point.
Another technique for streaming video involves transcoding the video. One advantage of this technique is that the transcoding is applied to the actual desired start point selected by the user, thereby avoiding the need to stream unwanted portions of the video. The transcoded video is then streamed to the user. However, one perceived shortcoming with this approach is that the transcoding process is computationally intensive, and inadequate computing resources may result in delays during the streaming process. Another perceived shortcoming with transcoding is the progressive loss of video quality. Video standards include H.264/AVC, H.265/HEVC, MPEG-2, VP8, VP9, and so on. For purposes of illustration, the following embodiments are based on the H.264/AVC standard. However, it should be noted that different standards/codecs may be implemented in the system while performing the streaming techniques disclosed herein.
The H.264/AVC codec format allows multiple views to be encoded into a single bitstream and introduces inter-view reference to take advantage of the spatial locality of different views. Inter-view prediction involves decoded samples of inter-view reference pictures or inter-view-only reference components for decoding another view component in the same access unit. The H.264/AVC standard for video coding defines a set of prediction mechanisms that include intra-prediction and inter-prediction, in which frames in a video bitstream are expressed in terms of one or more neighboring frames. The H.264/AVC standard further comprises bi-directional prediction (B-frame) for advanced compression quality. A typical group of pictures (GOP) structure comprises I-frames, B-frames, and P-frames. The I-frame is used to predict the first P-frame, and these two frames are also used to predict the first and second B-frames. The second P-frame is predicted using the first P-frame, and together they are used to predict the third and fourth B-frames.
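The GOP prediction dependencies described above can be sketched as follows. The frame labels and the dependency table are illustrative (an I-B-B-P-B-B-P display order), not taken from the disclosure; the sketch simply derives a decode order in which every reference frame is decoded before the frames that depend on it:

```python
# Illustrative GOP in display order (labels are hypothetical).
GOP = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]

# Dependencies per the description: the I-frame predicts the first
# P-frame; both predict the first two B-frames; the second P-frame is
# predicted from the first, and together they predict the next B-frames.
DEPS = {
    "I0": [],
    "P3": ["I0"],
    "B1": ["I0", "P3"],
    "B2": ["I0", "P3"],
    "P6": ["P3"],
    "B4": ["P3", "P6"],
    "B5": ["P3", "P6"],
}

def decode_order(frames, deps):
    """Topologically order frames so every reference is decoded first."""
    done, order = set(), []
    while len(order) < len(frames):
        for f in frames:
            if f not in done and all(d in done for d in deps[f]):
                done.add(f)
                order.append(f)
    return order

print(decode_order(GOP, DEPS))
# → ['I0', 'P3', 'P6', 'B1', 'B2', 'B4', 'B5']
```

The output illustrates why decode order differs from display order: both P-frames must be decoded before any of the B-frames that reference them.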
In order to arrange decoded pictures for prediction, H.264/AVC specifies rules relating to picture ordering in reference lists as well as rules relating to picture reordering for better compression flexibility. Within the H.264/AVC standard, a reference list comprises a list of reference pictures used for inter prediction of a P, B, or SP slice. As defined in the H.264/AVC standard, a slice relates to an integer number of macroblocks or macroblock pairs ordered consecutively in the raster scan within a particular slice group. The macroblock addresses are derived from the first macroblock address in a slice, as represented in the slice header, and the macroblock to slice group map. A slice header relates to a part of a coded slice containing the data elements relating to the first or all macroblocks represented in the slice.
Various embodiments are disclosed for systems and methods for streaming video whereby the spacing between the nearest appropriate start point and the actual desired start point is reduced, thereby improving the user experience by reducing any latency that may occur during the decoding process. Compared to conventional streaming techniques, the nearest appropriate start point in accordance with various embodiments is generally closer to the actual desired start point. A system for implementing video streaming is now described, followed by a discussion of the operation of the components within the system.
The client devices 103 may be embodied, for example, as a desktop computer, computer workstation, laptop, smartphone, tablet, or other computing system with display capability. For example, the client devices 103 may comprise a display device such as a smart TV or other device with similar video streaming capabilities. In other embodiments, the client devices 103 may be embodied as a video gaming console, which includes a video game controller for receiving user input. For such embodiments, the video gaming console may be connected to a television or other display.
A content handler 104 executes on a processor of the computing device 102 and configures the processor to perform various operations, as described in more detail below. The content handler 104 retrieves media content 116 from a data store 117 and processes the retrieved media content 116, as described in more detail below. As shown, the data store 117 may be implemented on a separate computing device 102. Alternatively, the data store 117 may be implemented within the computing device 102.
The content handler 104 comprises various components for processing the retrieved media content 116, where such components include a key frame identifier 106, a header identifier 108, a frame type converter 110, and a content streamer 112. The content handler 104 further comprises a user interface component 114 for facilitating the rendering of user interfaces on client devices 103. The user interfaces are rendered on the display of client devices 103 for receiving an input signal indicating the desired start points in videos and for displaying playback of streaming video.
A content viewer application 105 executes on the client devices 103 and/or computing device 102 and allows the user to specify desired start time(s) and, responsive to the specified start time(s), view streaming video sent by the computing device 102. In this regard, the user interface can be displayed on a computing device 102 and/or the client devices 103. In some embodiments, the client devices 103 are utilized primarily for displaying streaming video. Thus, a user can specify the desired start time via either device. For example, in one implementation, the computing device 102 can be embodied as a personal computer and provides a user interface that allows the user to specify a desired start time. In another implementation, the client device 103 can be embodied as a television that displays a user interface, where the user can specify a desired start time using a remote controller or other device.
The processing device 202 may include any custom made or commercially available processor, a central processing unit (CPU) or an auxiliary processor among several processors associated with the computing device 102, a semiconductor based microprocessor (in the form of a microchip), a macroprocessor, one or more application specific integrated circuits (ASICs), a plurality of suitably configured digital logic gates, and other well known electrical configurations comprising discrete elements both individually and in various combinations to coordinate the overall operation of the computing system.
The memory 214 can include any one or a combination of volatile memory elements (e.g., random-access memory (RAM), such as DRAM, SRAM, etc.) and nonvolatile memory elements (e.g., ROM, hard drive, tape, CDROM, etc.). The memory 214 typically comprises a native operating system 216, one or more native applications, emulation systems, or emulated applications for any of a variety of operating systems and/or emulated hardware platforms, emulated operating systems, etc. For example, the applications may include application specific software which may comprise some or all of the components of the computing device 102 depicted in
Input/output interfaces 205 provide any number of interfaces for the input and output of data. For example, where the computing device 102 comprises a personal computer, these components may interface with one or more user input/output interfaces 205, which may comprise a keyboard or a mouse, as shown in
In the context of this disclosure, a non-transitory computer-readable medium stores programs for use by or in connection with an instruction execution system, apparatus, or device. More specific examples of a computer-readable medium may include by way of example and without limitation: a portable computer diskette, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM, EEPROM, or Flash memory), and a portable compact disc read-only memory (CDROM) (optical).
Although the flowchart 300 of
To begin, at step 302, the computing device 102 retrieves media content 116 comprising a video from a data store 117 and sends user interface data relating to the video to the client device 103. At step 304, the client device 103 displays the user interface based on the user interface data. At step 306, a user of the client device 103 specifies a desired start time in the video for playback, where the desired start time may comprise any point in the video, and the client device 103 sends the desired start time to the computing device 102.
At step 308, based on the received start time, the key frame identifier 106 in the computing device 102 identifies a closest key frame preceding the desired start time. To illustrate, reference is made to
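The key-frame search in step 308 can be sketched as a binary search over key-frame timestamps. The function and variable names below are illustrative assumptions, not part of the disclosed system:

```python
import bisect

def find_target_key_frame(key_frame_times, desired_start):
    """Return the index of the nearest key frame at or before desired_start.

    key_frame_times is a sorted list of key-frame timestamps (seconds).
    """
    i = bisect.bisect_right(key_frame_times, desired_start) - 1
    if i < 0:
        raise ValueError("desired start precedes the first key frame")
    return i

key_frames = [0.0, 2.0, 4.0, 6.0, 8.0]   # hypothetical key-frame timestamps
print(find_target_key_frame(key_frames, 5.3))  # → 2 (key frame at 4.0 s)
```

Because the search snaps to the key frame *preceding* the desired start, only a short run of frames between that key frame and the desired start must be decoded and discarded, rather than an entire GOP's worth of earlier content.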
Referring back to
Reference is made back to
An incompatible frame refers to a frame type that requires prior decoding of another frame (e.g., an I-frame or P-frame) in order to be decoded. For some embodiments, if the target key frame is an I-frame, the frame type converter 110 in the computing device 102 may be configured to convert the I-frame to an Instantaneous Decoding Refresh (IDR) frame (as defined in the H.264/AVC and HEVC video coding standards), where the IDR frame is a type of I-frame after which no frame refers to any frame preceding the IDR frame. Conversion of the I-frame to an IDR frame facilitates random access within the video so that playback can start on an IDR frame. In other embodiments, the frame type converter 110 may be configured to insert a blank frame comprising an IDR frame in front of the target key frame. In still other embodiments, if the target key frame is an inter coded frame (e.g., a B-frame or a P-frame), the frame type converter 110 directly converts the target key frame to the first frame type (e.g., an IDR frame).
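As one concrete illustration of the conversion idea: in H.264/AVC, the one-byte NAL unit header carries the NAL unit type in its five low-order bits (1 for a coded slice of a non-IDR picture, 5 for a coded slice of an IDR picture). The simplified sketch below rewrites only that header byte; a conforming converter would also need to adjust slice-header fields (e.g., frame_num, idr_pic_id), which is omitted here:

```python
# H.264/AVC NAL unit types for coded slices (per the standard).
NAL_NON_IDR = 1   # coded slice of a non-IDR picture
NAL_IDR = 5       # coded slice of an IDR picture

def convert_nal_to_idr(nal_header: int) -> int:
    """Replace the nal_unit_type bits of a NAL header byte with IDR (5).

    The top three bits (forbidden_zero_bit and nal_ref_idc) are preserved.
    """
    if nal_header & 0x1F != NAL_NON_IDR:
        raise ValueError("not a non-IDR coded slice NAL unit")
    return (nal_header & 0xE0) | NAL_IDR

# A non-IDR slice NAL header byte with nal_ref_idc = 3: 0b0110_0001 = 0x61.
print(hex(convert_nal_to_idr(0x61)))  # → 0x65 (IDR slice, nal_ref_idc kept)
```

Marking the frame as IDR tells the decoder it may begin decoding at that point without any earlier reference pictures, which is precisely what makes the converted key frame a valid random-access entry point.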
In step 314, the original frame referencing of the video bitstream is preserved and utilized during the streaming process. Specifically, rather than modifying the reference list, the content handler 104 reuses the original reference list for frame referencing during the streaming process. In step 316, content that comprises only the header, the key frame, and the video bitstream that follows the key frame is streamed. To illustrate, reference is made to
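A minimal sketch of the payload assembly in step 316, assuming the byte offsets of the target frame header and the target key frame in the bitstream are already known (the offsets and function name are illustrative, not from the disclosure):

```python
def build_stream_payload(bitstream: bytes,
                         header_start: int, header_end: int,
                         key_frame_start: int) -> bytes:
    """Keep only the target frame header plus everything from the key frame on.

    Bytes before the header, and between the header and the key frame,
    are dropped from the streamed payload.
    """
    header = bitstream[header_start:header_end]
    return header + bitstream[key_frame_start:]

data = bytes(range(100))  # stand-in for an encoded bitstream
payload = build_stream_payload(data, header_start=10, header_end=20,
                               key_frame_start=60)
print(len(payload))  # → 50 (10 header bytes + 40 bytes from the key frame on)
```

Because the original reference list is reused unchanged, the sketch performs no re-encoding; the server only splices the existing bitstream, which keeps the computational cost far below that of transcoding.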
Note that as shown in
Reference is made to
Although the flowchart 1000 of
In block 1010, information relating to a video bitstream is transmitted to a client device 103, and the client device 103 displays a user interface to a user based on the information relating to the video bitstream. The information can comprise metadata associated with the video bitstream such as the total playback time of the video bitstream, descriptive information, and so on.
In block 1020, an input signal is received from the client device indicating a desired start time in the video bitstream, where the desired start time specifies a point in the video bitstream to begin streaming. For some implementations, the user interface described above can include a timeline component to facilitate selection of the desired start time by the user. The timeline component can include other components such as thumbnail graphics corresponding to frames within the video bitstream.
In block 1030, a target key frame is identified within the video based on the start time. The target key frame comprises a nearest key frame in the video bitstream preceding the desired start time, as shown, for example, in
In block 1040, a target frame header is identified within the video bitstream, where the target frame header comprises a nearest header frame in the video preceding the target key frame.
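The header-identification step (finding the nearest header frame preceding the target key frame) can be sketched as a backward scan over the parsed units of the bitstream. The frame records and the is_header flag below are illustrative assumptions, not part of the disclosed system:

```python
def find_target_frame_header(frames, key_frame_index):
    """Return the index of the closest header frame preceding the key frame.

    frames is a list of parsed bitstream units in order; each record
    carries an is_header flag (e.g., an SPS/PPS parameter set in H.264/AVC).
    """
    for i in range(key_frame_index - 1, -1, -1):
        if frames[i]["is_header"]:
            return i
    raise ValueError("no frame header precedes the target key frame")

frames = [{"is_header": True}, {"is_header": False},
          {"is_header": True}, {"is_header": False}, {"is_header": False}]
print(find_target_frame_header(frames, key_frame_index=4))  # → 2
```

The header must accompany the key frame in the streamed payload because it carries the parameters the client's decoder needs to interpret the slices that follow.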
In block 1050, the target key frame is converted to a first frame type. In some implementations, this conversion is performed if the target key frame is an inter coded frame (e.g., a B-frame or P-frame). As previously described, an I-frame can also be converted to a first frame type (e.g., an IDR frame). In block 1060, only the target frame header, the target key frame, and the video bitstream that follows the converted target key frame is sent to the client device, where the client device then decodes the video bitstream utilizing the target frame header and the target key frame. In some implementations, the original frame referencing encoded in the video bitstream is utilized. Thereafter, the process ends.
It should be emphasized that the above-described embodiments of the present disclosure are merely possible examples of implementations set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
This application claims priority to, and the benefit of, U.S. Provisional Patent Application entitled, “Systems and Methods for Video Streaming Based on Conversion of a Target Key Frame,” having Ser. No. 62/299,589, filed on Feb. 25, 2016, which is incorporated by reference in its entirety.
Number | Date | Country
---|---|---
20170251284 A1 | Aug 2017 | US |
Number | Date | Country
---|---|---
62299589 | Feb 2016 | US |