The present disclosure generally relates to playing video in reverse and more particularly but not exclusively relates to smoothly playing video in reverse at a faster than normal speed.
Entertainment systems are used to present audio and video information to users. For example, satellite and cable television systems present programming content to users through presentation systems such as televisions and stereos. The programming content may include sporting events, news events, television shows, or any other information. The programming content generally includes audio information, video information, and control information which coordinates the presentation of the audio and video data.
In many cases, the programming content is encoded according to an accepted multimedia encoding standard. For example, the programming content may conform to an ITU-T H.264 standard, an ISO/IEC MPEG-4 standard, or some other standard.
In many cases, the accepted multimedia encoding standard will encode the video data as a sequence of constituent frames. The constituent frames are used independently or in combination to generate presentable video frames which can be sent in sequence to a presentation device such as a display. The video data may, for example, be encoded as a video data stream of I-frames, P-frames, and B-frames according to a multimedia standard protocol.
An I-frame, or intra-frame, is a frame of video data encoded without reference to any other frame. A video data stream will begin with an I-frame. Subsequent I-frames will be included in the video data stream at regular intervals. I-frames typically provide identifiable points for specific access into the video data stream. For example, when a user is seeking to find a particular point in a multimedia file, a decoder may access and decode I-frames in a video data stream in either a fast-forward or reverse playback mode. An advantage of I-frames is that they include enough information to generate a complete frame of presentable data that can be sent to a display device. A disadvantage of I-frames is that they are relatively large compared to other frames.
A P-frame, or predictive inter-frame, is encoded with reference to a previous I-frame or a previous P-frame. Generally speaking, a P-frame does not include enough information to generate static elements of a presentable frame that have not changed from previous frames. Instead, the P-frame merely references a particular previous frame and uses the video information that is found in the previous frame. Stated differently, the areas of a presentable frame that have not changed are propagated from a previous frame, and only the areas of a presentable frame that have changed (i.e., the areas that are in motion) are updated in the current frame. Thus, only the areas of the presentable frame that are in motion are encoded in the P-frame. Accordingly, P-frames are generally much smaller in size than I-frames.
A B-frame, or bi-directionally predictive inter-frame, is encoded with reference to one or more preceding reference frames as well as one or more future reference frames. B-frames improve the quality of multimedia content by smoothly tying video frames of moving video data together. A B-frame is typically very small in size relative to I-frames and P-frames. On the other hand, a B-frame typically requires more memory, time, and processing capability to decode.
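The frame dependencies described above can be sketched in a few lines of Python. This is a simplified illustration, not any standard's actual reference rules. It assumes frames are listed in display order, that a P-frame references only the nearest preceding I-frame or P-frame, and that a B-frame references only the nearest preceding and following I-frame or P-frame. The function name `decode_dependencies` is hypothetical.

```python
def decode_dependencies(frame_types, index):
    """Return the sorted indices of all frames that must be decoded
    in order to present the frame at `index`.

    frame_types: string of 'I', 'P', 'B' in display order (assumed model).
    """
    needed = set()

    def visit(i):
        if i in needed:
            return
        needed.add(i)
        kind = frame_types[i]
        if kind == "I":
            return  # intra-frame: no references to other frames
        # Nearest earlier reference frame (I or P); the stream is
        # assumed to begin with an I-frame, so one always exists.
        prev_ref = next(j for j in range(i - 1, -1, -1)
                        if frame_types[j] in "IP")
        visit(prev_ref)
        if kind == "B":
            # Nearest later reference frame, if the stream has one.
            next_ref = next((j for j in range(i + 1, len(frame_types))
                             if frame_types[j] in "IP"), None)
            if next_ref is not None:
                visit(next_ref)

    visit(index)
    return sorted(needed)


if __name__ == "__main__":
    gop = "IBBPBBPBBP"
    for i in (0, 4, 9):
        print(i, gop[i], decode_dependencies(gop, i))
```

Under this model, an I-frame depends on nothing, while a frame late in the sequence pulls in the whole chain of reference frames back to the preceding I-frame.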
The encoded video data 108, 110 is transmitted via a wired or wireless network 112. An entertainment device 114 is configured to receive the encoded video data. The entertainment device 114 is further configured to decode the encoded video data 108, 110 into a sequence of presentable video frames that are subsequently passed to a presentation device 116.
The entertainment device 114 includes an input circuit 117 to receive a stream of video data 108, 110. The input circuit 117 is configured as a front-end circuit (e.g., as found on a set top box) to receive the video data stream 108, 110. The input circuit 117 may receive many other types of data in addition to video data, and the data may arrive in any one of many formats. For example, the data may include over-the-air broadcast television (TV) programming content, satellite or cable TV programming content, digitally streamed multimedia content from outside the entertainment device, digitally streamed multimedia content from a storage medium located inside the entertainment device 114 or coupled to it, and the like.
The entertainment device 114 of
The CPU 118 is configured to operate a video decoder module 119. The video decoder module 119 may be separate and distinct from the CPU 118, or the video decoder module 119 may be integrated with the CPU 118. Alternatively, the video decoder module may be integrated with the functionality of a GPU 120. Generally speaking, the video decoder module 119 may include hardware and software to parse a video data stream into constituent frames and decode constituent frames to produce presentable video frames.
The entertainment device 114 includes a graphics processing unit (GPU) 120. The GPU 120 typically includes a processing unit, memory, and hardware circuitry particularly suited for presenting image frames to a presentation device. The GPU 120 performs certain video processing tasks independently and other tasks under control of the CPU 118. In some cases the GPU 120 is a separate processing device coupled to the CPU 118, and in other cases the GPU 120 is formed as part of the functionality of CPU 118.
A graphics generator, or on-screen display (OSD) 122, is a module configured to superimpose graphic images on a presentation device 116. The OSD 122 is typically used to display information such as volume, channel, and time. The information generated by the OSD 122 is generally prepared as an overlay to video data generated by the GPU 120. A video multiplexor 124 selects the video information that is passed to the presentation device 116. In some cases, the video multiplexor 124 selects information from either the GPU 120 or the OSD 122. In other cases, the video multiplexor 124 selects information from both the GPU 120 and the OSD 122, and in such cases the information from the OSD 122 is typically superimposed on the information from the GPU 120.
Referring back to
The entertainment device 114 of
A conventional entertainment device 114, such as the one illustrated in
It has been noticed by some users that certain video data streams can be played back smoothly in reverse by the entertainment device 114, while other video data streams, when they are played in reverse, are not played smoothly at all. Instead, when those other video data streams are played in reverse, the user sees a very choppy playback and has an overall poor user experience. Upon further study, it has been found that the video data streams that play smoothly in reverse are encoded with a simpler protocol than the video data streams that play choppily in reverse.
High definition (HD) video data streams, such as MPEG-4, require a substantially more complex decoder than lower resolution video data streams (e.g., MPEG-2). In the circuitry of a complex decoder, there is typically not enough SRAM, frame buffer, and other fast memory to smoothly play the higher definition video in reverse. For example, a higher definition encoding protocol may require 32 (or even more) frames resident in memory in order to construct the 33rd frame.
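As a rough illustration of the memory pressure described above, the following sketch estimates the storage needed to hold 32 uncompressed frames. The resolution (1920x1080) and the 8-bit 4:2:0 pixel layout (1.5 bytes per pixel) are assumptions chosen for the example and are not stated in this disclosure. The function name is hypothetical.

```python
def resident_frame_bytes(width, height, frame_count):
    """Bytes needed to keep `frame_count` uncompressed frames in memory,
    assuming 8-bit 4:2:0 sampling (1.5 bytes per pixel on average)."""
    bytes_per_frame = width * height * 3 // 2
    return bytes_per_frame * frame_count


if __name__ == "__main__":
    total = resident_frame_bytes(1920, 1080, 32)
    print(f"{total} bytes (~{total / 1e6:.1f} MB)")
```

Under these assumptions, 32 resident frames occupy roughly 100 MB, which suggests why fast on-chip memory alone is unlikely to hold them.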
The higher definition video data encoding protocol provides increased efficiency when playing data in the forward direction. Accordingly, when playing data in the forward direction at normal speed, and even at high speed, a decoder can generate the presentable frames of data at a sufficiently high speed. Particular limitations in memory, computational capability, and the like will determine how quickly a video data stream can be decoded and played in the forward direction, but the highest speed of the decoder has been found to be generally sufficient to provide a satisfactory user experience during forward play.
When a high definition video stream of data is played in reverse, however, it has been found that conventional decoders do not provide a good viewing experience for the user. In conventional configurations, the high-definition (HD) video is too complex to decode in reverse at real-time speeds. Decoding each frame of an HD video stream consumes substantial memory resources to temporarily store at least one intra-frame and many inter-frames and substantial computing resources to analyze and process all relationships between frames. Accordingly, conventional configurations play HD video streams in reverse merely by identifying and decoding progressively previous intra-frames and outputting them.
Since the conventional decoder cannot configure its resources to decode each frame of the video data stream on the fly, in reverse, and at high speed, the user will typically see a very choppy playback of the video data stream. This is contrasted with a lower definition video data stream, which has a more predictable, fixed structure of relationships between standalone intra-frames and reliant inter-frames. Typically, the conventional decoder does have resources configured to decode each frame of the lower definition video data stream on the fly, even in reverse. Since each frame of the lower definition video stream can be decoded in reverse, the user will see a smooth presentation of video frames in reverse.
One common technique now employed by conventional decoders to play higher definition video data streams in reverse is to only generate presentable video frames from I-frames of the video data stream. The problem, however, is that a higher definition video data stream may only have one or two (or fewer) I-frames per second of normal speed video data. In this case, when the higher definition video data stream is played in reverse, the user will see a very choppy video of frames that appear to change only once or twice per second or even less.
The streaming data flow of video frames 146 represents a normal speed, forward playback of the video data. The entertainment device 114 can also be directed by a user for particular trick play. That is, a user can direct the entertainment device 114 to play the video data at a higher speed in the forward direction or in the reverse direction. A fast-forward flow of presentable frames 148 is shown in
When the user directs the entertainment device 114 to play the video in a fast-forward mode,
When the user directs the entertainment device 114 to play the video in a reverse-play mode, only the I-frames are decoded. When viewed by a user, the reverse play flow of frames 150 will appear very choppy.
In embodiments of the present invention, the HD video stream may be smoothly played in reverse video playback on low-cost, current generation set-top box hardware. Progressively earlier segments (e.g., one second segments of time N, time N−1, time N−2, etc.) of the HD video stream are identified. The operation begins by decoding a first segment in a forward direction and storing the decoded presentable frames in a buffer. After the decode-and-store act of the first segment, subsequent segments are also decoded-and-stored. Concurrently, as a new, earlier segment is decoded-and-stored, the presentable frames of a previously decoded-and-stored segment are retrieved in reverse order and output to an attached display. The operation can proceed by alternating decode-and-store tasks to one buffer with retrieve-and-output tasks from another buffer. The use of the two buffers alternates in a ping-pong technique.
In one embodiment, a method to play video in reverse includes decoding a first plurality of bits of a video data stream into a first sequence of presentable frames ordered for forward play from frame (Y+1) to frame Z, wherein Y and Z are integers, and Z is larger than Y. The first sequence of presentable frames is stored in a first buffer. Then a second plurality of bits of the video data stream is decoded into a second sequence of presentable frames ordered for forward play from frame (X+1) to frame Y, wherein X is an integer, and Y is larger than X. The second sequence of presentable frames is stored in a second buffer. The first sequence of presentable frames is retrieved from the first buffer, and the first sequence of presentable frames is output as a reverse playing video stream of frames ordered from frame Z to frame (Y+1). The second sequence of presentable frames is retrieved from the second buffer, and the second sequence of presentable frames is output as a reverse playing video stream of frames ordered from frame Y to frame (X+1).
In another embodiment, an entertainment device includes an input circuit to receive a stream of video data; a memory configurable as a plurality of buffers; a video decoder module; an on-screen display controller; a processing unit, the processing unit configured to direct the video decoder module to decode a first segment of the stream of video data into a first series of presentable frames and store the first series of presentable frames in a first buffer; the processing unit configured to direct the video decoder module to decode a second segment of the stream of video data into a second series of presentable frames and store the second series of presentable frames in a second buffer; and concurrent with the decoding of the second segment, the processing unit configured to direct the on-screen display controller to output the first series of presentable frames from the first buffer in a reverse direction.
In yet another embodiment, a non-transitory computer-readable storage medium whose stored contents configure a computing system to perform a method includes directing a video output module to output a decoded sequence of video frames to a display device; storing a decoded and down-sampled first sequence of presentable frames; storing a decoded and down-sampled second sequence of presentable frames; and while storing the decoded and down-sampled second sequence of presentable frames, directing an on-screen display module to output the first sequence of presentable frames in reverse order to the display device.
Non-limiting and non-exhaustive embodiments are described with reference to the following drawings, wherein like labels refer to like parts throughout the various views unless otherwise specified. The sizes and relative positions of elements in the drawings are not necessarily drawn to scale. For example, the shapes of various elements and angles are not necessarily drawn to scale, and some of these elements are enlarged and positioned to improve drawing legibility. One or more embodiments are described hereinafter with reference to the accompanying drawings in which:
A conventional video decoder is not configured to smoothly play a high definition video data stream in reverse. Due to the complexity of the high definition encoding protocol, the decoder, as conventionally configured, cannot delete interim data and re-decode new data fast enough to play the video stream in reverse. Additionally, the decoder cannot avoid all of the repeated decoding by maintaining the interim data frames because of the amount of memory that would be required for smooth reverse playback. The decoder does not have enough memory to keep all of the interim frames used during the decode process of the high definition video data stream such that the video can be played smoothly in reverse.
A solution to the problem of not being able to smoothly play high definition video in reverse is now proposed. The solution is robust enough to smoothly play the high definition video stream in reverse at normal speed and at faster than normal speed.
In one embodiment, a user will direct an entertainment device to play a high definition video stream in reverse. Upon the device being directed to play in reverse, a segment of the video data stream will first be decoded in the forward direction to generate a sequence of presentable frames. The decoded presentable frames of the segment, which may be reduced in quality, are stored in sequence. Subsequently, the presentable frames of the sequence will be output to the presentation device in reverse order. During that time when the presentable frames of this sequence are being output in reverse order, the next earlier segment of the video data stream will be decoded in the forward direction. The presentable frames of the next earlier segment, which may also be reduced in quality, are stored in sequence. After the supply of presentable frames from the first segment is exhausted, the presentable frames from the next earlier segment will be output to the presentation device in reverse order. As the second segment of frames is being output, a third even earlier segment of the video data stream will be decoded. The process of decoding each earlier segment of video data during the time that later frames are being output in reverse can continue until the user directs the entertainment device to stop the reverse playback.
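The segment-by-segment procedure just described can be sketched as follows. This is a minimal simulation under stated assumptions, not the device's implementation. The stream is modeled as a Python list of already presentable frames, `decode_segment` is a hypothetical stand-in for the forward decoder, and the decode and output steps run one after the other here, whereas the described device performs them concurrently.

```python
def decode_segment(stream, begin, end):
    """Hypothetical stand-in for forward decoding: returns the
    presentable frames begin..end-1 in forward-play order."""
    return [stream[i] for i in range(begin, end)]


def reverse_play(stream, start_frame, segment_len):
    """Yield frames start_frame-1 down to 0, decoding one earlier
    segment forward while the previous segment is output in reverse."""
    buffers = [[], []]          # two buffers used in ping-pong fashion
    fill = 0                    # index of the buffer being filled
    end = start_frame
    begin = max(0, end - segment_len)
    buffers[fill] = decode_segment(stream, begin, end)

    while True:
        drain, fill = fill, 1 - fill
        # Decode the next earlier segment into the other buffer.
        # (On real hardware this overlaps with the output loop below.)
        end = begin
        begin = max(0, end - segment_len)
        buffers[fill] = decode_segment(stream, begin, end) if end > 0 else []
        # Output the previously decoded segment in reverse order.
        for frame in reversed(buffers[drain]):
            yield frame
        if not buffers[fill]:
            break               # reached the start of the stream


if __name__ == "__main__":
    frames = list(range(10))    # frame numbers stand in for images
    print(list(reverse_play(frames, start_frame=10, segment_len=4)))
```

In this sketch, playback simply stops at the start of the stream. The described device would also stop when the user ends reverse playback.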
The entertainment device 214 includes an input circuit 117 to receive a stream of video data 108, 110. A CPU 118 is configured to control operations of the entertainment device 214 including a video decoder module 119. That is, the CPU 118 and the video decoder module 119 may work cooperatively to decode a plurality of bits of the video data stream into a sequence of presentable frames for display on a presentation device. In
The entertainment device 214 includes a user input control 126 and a user input circuit 128. The entertainment device 214 also includes a video multiplexor 124 which is configured to receive video data from a GPU 120 or an OSD 122. The video multiplexor 124 is coupled to a presentation device 116.
The entertainment device 214 includes memory 130. The memory 130 may be comprised of one or more memory devices. The memory 130 may be internal, external, or some combination of both. The memory 130 may be formed as a non-transitory computer-readable storage medium whose stored contents configure a computing system to perform particular acts of a method.
As will be described further, memory 130 is configured with various buffers and pointers to the buffers. The buffers may be allocated physically or virtually. That is, the buffers may be instantiated at a physical address in a specific memory area, or the buffers may be continually allocated and released as memory from a common pool or in some other scheme. As used herein, it is understood that the buffers described as being formed in memory 130 are not necessarily formed with such specificity. Instead, each use of a specifically named buffer identified herein may be formed in the same or in a different memory location by any known programming techniques.
In
The memory 130 of
A reverse playback module 144 is also formed within the entertainment device 214. The reverse playback module 144 may include particular dedicated hardware circuitry, or the reverse playback module 144 may include only software instructions that are executed by CPU 118. In one embodiment, the reverse playback module has some components that are independent and controlled by CPU 118 and other components that are carried out by the CPU 118.
A plurality of buffers 132, 134 is configured in the memory 130 of
The processing unit of the reverse playback module 144 directs the video decoder module to decode a first segment of the stream of video data 146 into a first series of presentable frames and store the first series of presentable frames in the first buffer 132. In
Subsequently, the processing unit of the reverse playback module 144 directs the video decoder module to decode a second segment of the stream of video data 146 into a second series of presentable frames and store the second series of presentable frames in the second buffer 134. The second segment also includes about 1 second of video data, and the second segment of video data in the data flow 146 occurs right after the first segment of video data. The presentable frames in the buffer 134 are labeled F(M), F(M+1), F(M+2), . . . F(M+X) to indicate that the frames are stored in sequence. Various embodiments may store the frames in a forward-play sequence, a reverse-play sequence, or in some other configuration.
Additional acts are carried out by the processing unit of the reverse playback module 144 to smoothly present the video frames from the buffers in a reverse-play mode. Particularly, concurrent with the decoding of one segment of data drawn from the stream of video data 146, the processing unit is configured to direct an OSD 122 (
In an embodiment, an entertainment device 214 (
After the segment of data between second 3 and second 4 is decoded and stored, the segment of data between second 2 and second 3 is decoded. During the decoding of this second segment, the forward pointer FPA 136 is used as an index to store the presentable frames in the buffer 132. Concurrent with the decoding of this second segment, the OSD 122 is directed by the processing unit to output the presentable frames from buffer 134 to a presentation device 116. The reverse pointer RPB 142 is used as an index into the buffer 134 to retrieve the presentable frames in reverse order.
Subsequently, the decoding of the segment of data between second 2 and second 3 is completed and the presentable frames are stored in buffer 132. Additionally the presentable frames stored in buffer 134 have been output in reverse order to the presentation device 116 and viewed by the user as a sequence of smoothly playing reverse video.
In a ping-pong operation, alternating between buffers 132 and 134, a third segment of data between second 1 and second 2 is decoded and stored in buffer 134. During the decoding operation of the third segment, the reverse-play pointer RPA 140 is used by the OSD 122 to output the presentable frames of buffer 132 in reverse order to the presentation device 116. Then, in a corresponding fashion, the fourth segment of data between second 0 and second 1 is decoded while the presentable frames from the third segment of data between second 1 and second 2 are output in reverse order.
From the description of the decoding and outputting operations, the user will view the stream of video data 146 smoothly presented in reverse. The stream of video data 146 can be very short or alternatively, it can cover a very long stream of video data. The first and second buffers are alternately filled and emptied using the ping-pong technique. Segments of data are decoded in a forward direction and the presentable frames are stored in one buffer. Concurrently, the presentable frames are retrieved in a reverse direction from the other buffer.
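One way to picture a buffer with its forward and reverse pointers is the small class below. It is an illustrative sketch only. The class name `FrameBuffer` and its methods are hypothetical, with `forward_ptr` playing the role of a forward pointer such as FPA 136 and `reverse_ptr` the role of a reverse pointer such as RPA 140.

```python
class FrameBuffer:
    """Fixed-capacity frame store with a forward (write) pointer and
    a reverse (read) pointer. Names are illustrative assumptions."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.forward_ptr = 0     # next slot to fill during decode-and-store
        self.reverse_ptr = -1    # next slot to read during reverse output

    def store(self, frame):
        """Store one decoded frame in forward-play order."""
        self.slots[self.forward_ptr] = frame
        self.reverse_ptr = self.forward_ptr   # newest frame is read first
        self.forward_ptr += 1

    def retrieve_reverse(self):
        """Return the next frame in reverse order, or None when empty."""
        if self.reverse_ptr < 0:
            return None
        frame = self.slots[self.reverse_ptr]
        self.reverse_ptr -= 1
        return frame

    def reset(self):
        """Prepare the buffer to be refilled with an earlier segment."""
        self.forward_ptr = 0
        self.reverse_ptr = -1


if __name__ == "__main__":
    buf = FrameBuffer(capacity=3)
    for name in ("F0", "F1", "F2"):
        buf.store(name)
    print([buf.retrieve_reverse() for _ in range(3)])
```

Two such objects, one being filled while the other is drained and then swapped, would correspond to the ping-pong use of buffers 132 and 134.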
Various decoding, storage, and retrieval embodiments are implemented. In one embodiment, each segment of data begins with an I-frame. Each frame of each segment is decoded and stored in the buffer. In some embodiments, only some of the decoded frames are stored in the buffers, and in some embodiments the decoded frames are down-sampled before they are stored in the buffers, for example, to a resolution that is below a standard definition (SD) resolution. That is, the decoded frames may have some information removed prior to storage in the buffers. The information that can be removed may include color depth, luminance, resolution, or other information. The down-sampling may reduce the size of the presentable frame such that less memory is used. The down-sampling may also reduce the time used to store, retrieve, or display the presentable frame. In some embodiments, the down-sampling and the determination of which frames will be stored permit video to be played smoothly in reverse even more quickly. That is, instead of smoothly playing the presentable frames in reverse at normal speed, the frames may be presented at two times, four times, eight times, or sixteen times the normal forward playback frame rate, or at some other multiple. In some embodiments the frames are presented at a user-selectable reverse-playback frame rate, and the reverse playback frame rate is limited only by the speed of a forward decoding operation. This is a significant improvement over the prior art.
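The two storage reductions mentioned above, storing only some frames and down-sampling the frames that are stored, might look like the following sketch. It makes several simplifying assumptions that are not part of this disclosure: a frame is a list of pixel rows, down-sampling keeps every `factor`-th pixel in each direction with no filtering, and frame selection keeps every `speed`-th frame. Both function names are hypothetical.

```python
def downsample(frame, factor):
    """Reduce resolution by keeping every `factor`-th row and column.
    (Plain decimation; a real device would likely filter first.)"""
    return [row[::factor] for row in frame[::factor]]


def frames_to_store(decoded_frames, speed):
    """Keep every `speed`-th decoded frame, so that outputting the stored
    frames at the normal display rate appears `speed` times faster."""
    return decoded_frames[::speed]


if __name__ == "__main__":
    # A 4x4 frame whose pixel values are just their positions.
    frame = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(downsample(frame, 2))
    print(frames_to_store(list(range(8)), 4))
```

Combining the two, a buffer holding one segment would need fewer and smaller frames, which is one plausible reading of how the reduced-quality storage lowers memory use and transfer time.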
Within the embodiment illustrated in
A complex device is used to efficiently decode the data flow 146 in a forward play order into presentable frames. In some cases, 8, 16, 32, or more frames are decoded, temporarily stored, and used to decode a subsequent frame. For example, in one case a presentable frame is formed from at least one I-frame and at least 24 inter-frames (i.e., P-frames and B-frames).
In the entertainment device 214, a processing unit is capable of decoding a first plurality of bits of the video data stream 146 into a first sequence of presentable frames ordered for forward play from frame (Y+1) to frame Z. The first sequence of presentable frames is stored in a first buffer. The processing unit is capable of decoding a second plurality of bits of the video data stream 146 into a second sequence of presentable frames ordered for forward play from frame (X+1) to frame Y. The second sequence of presentable frames is stored in the second buffer. Concurrently, the first sequence of presentable frames is retrieved from the first buffer. The first sequence of presentable frames is output as a reverse playing video stream of frames ordered from frame Z to frame (Y+1). Subsequently, the second sequence of presentable frames is retrieved from the second buffer and output as a reverse playing video stream of frames ordered from frame Y to frame (X+1).
Also in the entertainment device 214, the processing unit is capable of decoding a third plurality of bits of the video data stream 146 into a third sequence of presentable frames ordered for forward play from frame (W+1) to frame X. The third sequence of presentable frames is stored in the first buffer. The processing unit is capable of decoding a fourth plurality of bits of the video data stream 146 into a fourth sequence of presentable frames ordered for forward play from frame (V+1) to frame W. The fourth sequence of presentable frames is stored in the second buffer. Concurrently, the third sequence of presentable frames is retrieved from the first buffer. The third sequence of presentable frames is output as a reverse playing video stream of frames ordered from frame X to frame (W+1). Subsequently, the fourth sequence of presentable frames is retrieved from the second buffer and output as a reverse playing video stream of frames ordered from frame W to frame (V+1).
In the embodiment illustrated in
In the smooth reverse playback operations described with reference to
It has been recognized that, in one aspect, smoothly outputting the video in reverse includes a smooth transition between the operations that are concurrently performed on each buffer. That is, it is desirable to begin a decode-and-store operation with reference to the beginning of a retrieve-and-output operation such that both operations end at about the same time. Preferably, presentable frames are retrieved from each buffer according to a predictable rate, and any latency between the time a final presentable frame is retrieved from one buffer and the time a first presentable frame is retrieved from the other buffer is small enough that it is not discernible by a user viewing the reverse-play presentation.
There are many ways to synchronize the decode-and-store operation with the retrieve-and-output operation. In one embodiment, a processing unit will calculate a latency between beginning a decode operation and when a first presentable frame is stored in the buffer. The processing unit can then use the latency as a basis for outputting a reverse-playing series of presentable frames. In another embodiment, the latency is predicted and used to delay the start of a decode operation so that the decode operation will predictably end when the presentable frames are scheduled to be output.
These synchronization operations and latency calculations or predictions may be performed on each alternate buffer operation. Alternatively, the latency calculations or predictions may be performed when a reverse-play mode is commenced, and the latency values are used throughout the operation. Additionally, the latency values may be based on the frame rate of reverse playback. For example, the presentable frames may be output from the buffers in the reverse direction at any rate of N times a normal playback speed. N may be an integer between 2 and 32, or N may be some other value such as a user selectable value. In one embodiment, the rate of outputting a series of presentable frames from a buffer in the reverse direction is based on an upper limit of a rate of decoding a segment of the stream of video data. The particular rate at which presentable frames are output may affect the latency calculations or predictions for starting and stopping operations in the entertainment device 214.
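The timing relationships described above can be illustrated with a little arithmetic. The sketch below assumes that every decoded frame is stored and output, that reverse playback at N times normal speed outputs frames at N times the normal frame rate, and that the decode time for a segment can be predicted in advance. The function names and parameters are hypothetical.

```python
def decode_start_delay(frames_in_buffer, normal_fps, speed, predicted_decode_s):
    """Seconds to wait before starting the next decode so that it
    finishes about when the current buffer has been fully output."""
    drain_time_s = frames_in_buffer / (normal_fps * speed)
    return max(0.0, drain_time_s - predicted_decode_s)


def max_reverse_speed(decode_fps, normal_fps):
    """Upper limit on the reverse-play multiplier N when every frame
    must be decoded: output cannot outrun the forward decoder."""
    return decode_fps / normal_fps


if __name__ == "__main__":
    # 30 buffered frames, 30 frames/s normal rate, 1x reverse play,
    # and a predicted 0.75 s to decode the next earlier segment.
    print(decode_start_delay(30, 30.0, 1, 0.75))
    # A decoder that sustains 120 frames/s against a 30 frames/s stream.
    print(max_reverse_speed(120.0, 30.0))
```

A delay of zero in this model means the decoder cannot finish in time at the requested speed, in which case a device might lower N or store fewer frames per segment.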
Subsequent decode-and-store operations occur as described with respect to
At 902, frames (Y+1) to Z are decoded and stored in a first Buffer A. At 904, frames (X+1) to Y are decoded and stored in a second Buffer B. At 902a, concurrent with the decode-and-store operation at 904, the forward playing frames may optionally be output to the presentation device 116 with the GPU 120. At 902b, also concurrent with the decode-and-store operation at 904, the reverse playing frames Z to (Y+1) are output to the presentation device 116 with the OSD 122.
At 906, frames (W+1) to X are decoded and stored in the first Buffer A.
At 904a, concurrent with the decode-and-store operation at 906, the forward playing frames may optionally be output to the presentation device 116 with the GPU 120. At 904b, also concurrent with the decode-and-store operation at 906, the reverse playing frames Y to (X+1) are output to the presentation device 116 with the OSD 122.
At 908, frames (V+1) to W are decoded and stored in the second Buffer B.
At 906a, concurrent with the decode-and-store operation at 908, the forward playing frames may optionally be output to the presentation device 116 with the GPU 120. At 906b, also concurrent with the decode-and-store operation at 908, the reverse playing frames X to (W+1) are output to the presentation device 116 with the OSD 122.
At 910, additional data segments may be decoded and alternately stored in the first Buffer A and the second Buffer B.
At 908a, concurrent with continuing operations at 910, the forward playing frames may optionally be output to the presentation device 116 with the GPU 120. At 908b, also concurrent with continuing operations at 910, the reverse playing frames W to (V+1) are output to the presentation device 116 with the OSD 122.
In some embodiments, the reverse playing frames are superimposed on the forward playing frames, which are optionally output with the GPU 120. In other embodiments, the GPU 120 does not output the forward-playing frames. In some embodiments, the GPU 120 may even be configured to output the reverse playing frames.
In the foregoing description, certain specific details are set forth in order to provide a thorough understanding of various disclosed embodiments. However, one skilled in the relevant art will recognize that embodiments may be practiced without one or more of these specific details, or with other methods, components, materials, etc. In other instances, well-known structures associated with electronic and computing systems including client and server computing systems, as well as networks have not been shown or described in detail to avoid unnecessarily obscuring descriptions of the embodiments.
Unless the context requires otherwise, throughout the specification and claims which follow, the word “comprise” and variations thereof, such as, “comprises” and “comprising” are to be construed in an open, inclusive sense, e.g., “including, but not limited to.”
Reference throughout this specification to “one embodiment” or “an embodiment” and variations thereof means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. It should also be noted that the term “or” is generally employed in its sense including “and/or” unless the content clearly dictates otherwise.
The headings and Abstract of the Disclosure provided herein are for convenience only and do not interpret the scope or meaning of the embodiments.
The various embodiments described above can be combined to provide further embodiments. Aspects of the embodiments can be modified, if necessary, to employ concepts of the various patents, applications and publications to provide yet further embodiments. These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.
This application is a continuation of U.S. patent application Ser. No. 13/461,564, filed May 1, 2012, which is incorporated by reference in its entirety.
Relation | Number | Date | Country
---|---|---|---
Parent | 13461564 | May 2012 | US
Child | 15849588 | | US