1. Field
The present disclosure relates generally to telecommunication systems, and more particularly, to concepts and techniques for rapid tuning in multimedia applications.
2. Background
Digital video and audio compression technologies have ushered in an era of explosive growth in digital multimedia distribution. Since the early 1990s, international standards groups such as, for example, the Video Coding Experts Group (VCEG) of ITU-T and the Moving Picture Experts Group (MPEG) of ISO/IEC, have developed international video coding standards. The standards developed include, for example, MPEG-1, MPEG-2, MPEG-4 (collectively referred to as MPEG-x), H.261, H.262, H.263, and H.264 (collectively referred to as H.26x).
These international video coding standards follow what is known as a block-based hybrid video coding approach. In the block-based hybrid video coding approach, pixels serve as the basis for the digital representation of a picture or, as it is commonly called and will be referred to in this disclosure, a frame. A group of pixels forms what is known as a block. A common block size for performing digital compression operations is known as the macroblock. Macroblocks are made up of 16×16 pixels. Sub-macroblocks are made up of smaller sets of pixels including, for example, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 pixels. Compression operations can also be performed on sub-macroblocks; therefore, to avoid obscuring the various concepts described herein, the operations will be discussed as operating on portions of a frame, which can include any block size or group of block sizes. A group of macroblocks forms what is known as a slice. Slices can be made up of contiguous macroblocks in the form of, for example, a row, a column, a square, or a rectangle. Slices can also be made up of separated macroblocks or a combination of separated and contiguous macroblocks. Slices are grouped together to form a frame in a video sequence.
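By way of illustration only, and not as part of any claimed embodiment, the partitioning of a frame into 16×16 macroblocks described above may be sketched as follows (the function name is hypothetical, and frame dimensions are assumed to be multiples of 16 for simplicity):

```python
# Hypothetical sketch: partition a frame of pixel rows into 16x16
# macroblocks, returning each block together with its top-left position.

def partition_into_macroblocks(frame, mb_size=16):
    """Return a list of (row, col, block) tuples, one per macroblock."""
    height = len(frame)
    width = len(frame[0])
    macroblocks = []
    for r in range(0, height, mb_size):
        for c in range(0, width, mb_size):
            block = [row[c:c + mb_size] for row in frame[r:r + mb_size]]
            macroblocks.append((r, c, block))
    return macroblocks

# A 32x48 frame yields (32/16) * (48/16) = 6 macroblocks.
frame = [[0] * 48 for _ in range(32)]
mbs = partition_into_macroblocks(frame)
```

The same loop, with a smaller `mb_size`, would enumerate the sub-macroblock partitions listed above.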
The MPEG-x and H.26x standards describe data processing and manipulation techniques that are well suited for the compression and delivery of video, audio and other information using fixed or variable length source coding techniques. In particular, the above-referenced standards, and other hybrid coding standards and techniques will compress video information using Intra-frame coding techniques (such as, for example, run-length coding, Huffman coding and the like) and Inter-frame coding techniques (such as, for example, forward and backward predictive coding, motion compensation and the like). Specifically, in the case of video processing systems, hybrid video processing systems are characterized by prediction-based compression encoding of video frames with Intra-frame and/or Inter-frame motion compensation encoding.
Inter-frame prediction coding exploits the fact that there are very few differences between two adjacent frames in a video sequence. Often the only difference is that some parts of the image have shifted slightly between frames. Inter-frame prediction coding can be used to partition a current frame into macroblocks and search an adjacent frame, or reference frame, to determine whether the macroblock has moved. If the content of the macroblock in the current frame can be located in a reference frame, then it does not need to be reproduced. The content can be represented by a “motion vector” indicating its displacement in the current frame from its position in the reference frame and the difference between the two macroblocks. Prediction coding techniques may be applied to the motion vector and the difference information before being recorded or transmitted.
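The macroblock search described in the preceding paragraph can be illustrated, under simplifying assumptions, by an exhaustive block-matching sketch. The function names are hypothetical, and the cost metric shown (sum of absolute differences) is one common choice, not necessarily the one any particular encoder uses:

```python
# Hypothetical sketch of exhaustive block-matching motion estimation:
# search the reference frame near the macroblock's position for the
# displacement (motion vector) minimizing the sum of absolute differences.

def sad(block_a, block_b):
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def motion_search(ref, cur_block, top, left, search_range=2):
    """Return ((dy, dx), cost) for the best match near (top, left)."""
    size = len(cur_block)
    best, best_cost = (0, 0), float('inf')
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > len(ref) or c + size > len(ref[0]):
                continue
            cand = [row[c:c + size] for row in ref[r:r + size]]
            cost = sad(cur_block, cand)
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best, best_cost

# Example: a 4x4 block of 9s sits at (3, 3) in the reference frame but at
# (2, 2) in the current frame, so the best displacement is (1, 1).
ref = [[0] * 8 for _ in range(8)]
for r in range(3, 7):
    for c in range(3, 7):
        ref[r][c] = 9
cur_block = [[9] * 4 for _ in range(4)]
mv, cost = motion_search(ref, cur_block, 2, 2)
```

A zero cost, as here, means the residual information to be coded alongside the motion vector is empty.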
A frame coded with Intra-frame prediction, i.e., without reference to an adjacent frame, is called an “I-frame,” or an “intra-coded video frame.” A frame with Inter-frame prediction coding that is performed with reference to a single frame is called a “P-frame.” A frame with Inter-frame prediction coding that is performed by referring simultaneously to two frames is called a “B-frame.” Two frames whose display time is either forward or backward of that of a current frame can be selected arbitrarily as references for coding a B-frame. The reference frames can be specified for each macroblock of the frame. Both P-frames and B-frames are referred to as “inter-coded video frames.”
Decoding of an inter-coded frame cannot be accomplished unless the frame that the current frame references has been previously decoded. Hence, a video sequence in the form of downloaded files or streaming media cannot be played back instantaneously. Instead, decoding may start only at a Random Access Point (RAP) within the video sequence. A RAP is a frame that can be decoded without relying on a reference frame, such as an I-frame. In video streaming applications, the inability to decode the video sequence instantaneously may adversely impact the user's experience. For example, when a user is channel surfing, an undesirable delay may be encountered on each channel as the decoder waits for a RAP to join the video sequence.
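The RAP constraint above can be stated very simply in code. In the following illustrative sketch (the function name is hypothetical), a decoder joining a stream mid-sequence discards every frame before the first I-frame, since none of them can be decoded without a prior reference:

```python
# Hypothetical sketch: a decoder joining a stream mid-sequence must wait
# for a random access point (an I-frame) before it can decode anything.

def first_decodable_index(frame_types):
    """Return the index of the first I-frame, or None if none has arrived."""
    for i, frame_type in enumerate(frame_types):
        if frame_type == 'I':
            return i
    return None

# Joining this stream at position 0, the first four frames are unusable.
stream = ['P', 'B', 'B', 'P', 'I', 'P', 'B']
```

The number of unusable frames is what manifests to the user as the tuning delay described above.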
One possible solution is to increase the number of RAPs in the video stream. This solution, however, reduces the compression of the video sequence, causing the data rate to increase significantly. Accordingly, there is a need in the art for an improved method for acquiring a video sequence without compromising video compression.
An aspect of a wireless communications device is disclosed. The wireless communications device includes a receiver configured to receive a plurality of video streams each comprising intra-coded and inter-coded video frames, a video decoder, and a processing unit configured to switch the video streams to the video decoder. The processing unit is further configured to receive a prompt to switch from a first one of the video streams to a second one of the video streams, and in response to the prompt, delay switching to the second one of the video streams until an intra-coded video frame is received in the second one of the video streams.
Another aspect of a wireless communications device is disclosed. The wireless communications device includes receiving means for receiving a plurality of video streams each comprising intra-coded and inter-coded video frames, decoding means for decoding video, and switching means for switching the video streams to the decoding means, the switching means being configured to receive a prompt to switch from a first one of the video streams to a second one of the video streams, and in response to the prompt, delay switching to the second one of the video streams until an intra-coded video frame is received in the second one of the video streams.
An aspect of a method for communicating is disclosed. The method includes receiving a plurality of video streams each comprising intra-coded and inter-coded video frames, decoding a first one of the video streams, receiving a prompt to decode a second one of the video streams, and in response to the prompt, delaying decoding of the second one of the video streams until an intra-coded video frame is received in the second one of the video streams.
An aspect of a computer program product is disclosed. The computer program product includes a computer-readable medium. The computer-readable medium includes switching code to cause a computer to switch a plurality of video streams to a video decoder, each of the video streams comprising intra-coded and inter-coded video frames, the switching code further being configured to receive a prompt to switch from a first one of the video streams to a second one of the video streams, and in response to the prompt, delay switching to the second one of the video streams until an intra-coded video frame is received in the second one of the video streams.
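The delayed-switch behavior recited in the aspects above can be sketched, for illustration only, as a small routing object. All names here are hypothetical and the sketch is not any claimed implementation; it merely shows the decision rule of continuing to feed the old stream until an I-frame arrives on the new one:

```python
# Hypothetical sketch of the delayed-switch behavior: on a prompt to
# change streams, frames from the current stream keep flowing to the
# decoder until the target stream yields an intra-coded (I) frame.

class StreamSwitcher:
    def __init__(self, current):
        self.current = current   # stream id currently routed to the decoder
        self.pending = None      # stream id requested by the user, if any

    def prompt(self, target):
        """Record a user prompt to switch to the target stream."""
        self.pending = target

    def route(self, stream_id, frame_type):
        """Called per arriving frame; True if it goes to the decoder."""
        if self.pending is not None and stream_id == self.pending:
            if frame_type == 'I':
                # The RAP has arrived: complete the switch now.
                self.current, self.pending = self.pending, None
                return True
            return False  # inter-coded frames on the new stream are dropped
        return stream_id == self.current

switcher = StreamSwitcher(current=1)
switcher.prompt(2)
decisions = [switcher.route(sid, ft) for sid, ft in
             [(2, 'P'), (1, 'P'), (2, 'I'), (1, 'P'), (2, 'P')]]
```

Note that the old stream continues to be decoded right up to the switch, so the display never goes blank while waiting for the I-frame.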
Various aspects of a wireless communications system are illustrated by way of example, and not by way of limitation, in the accompanying drawings, wherein:
The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations of the invention and is not intended to represent the only configurations in which the invention may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of the invention. However, it will be apparent to those skilled in the art that the invention may be practiced without these specific details. In some instances, well known structures and components are shown in block diagram form in order to avoid obscuring the concepts of the invention.
Various techniques described throughout this disclosure will be described in the context of a wireless multimedia broadcast system. As used herein, “broadcast” and “broadcasting” refer to transmission of multimedia content to a group of users of any size and includes broadcast, anycast, multicast, unicast, datacast, and/or any other suitable communication session. One example of such a broadcast system is Qualcomm's MediaFLO technology. MediaFLO uses an orthogonal frequency division multiplexing (OFDM)-based air interface designed specifically for multicasting a significant volume of rich multimedia content cost effectively to wireless subscribers. MediaFLO is merely an example of the type of multimedia broadcast system described herein and other, functionally equivalent multimedia broadcast systems are contemplated as well.
The broadcast system 100 is shown with a distribution center 102 which serves as an access point for various content providers 104. A content provider is a company, media center, server, or other entity capable of providing content to a number of wireless communication devices 106 through the broadcast system 100. The content from a content provider 104 is commonly referred to as a service. A service is an aggregation of one or more independent data components. Each independent data component of a service is called a flow and may include a video component, audio component, text component, signaling component, or some other component of a service. Each flow is carried in a stream. The streams for each service are transmitted through the physical layer of the broadcast system 100 on a media logical channel. In this example, the distribution center 102 is responsible for mapping the media streams to each media logical channel for distribution to wireless communication devices through a distribution network 108. A wireless communications device 106 may be a mobile telephone, a personal digital assistant (PDA), a mobile television, a personal computer, a laptop computer, a game console, or other device capable of receiving multimedia content.
The video encoder 200 includes a subtractor 204 which computes the differences between a video frame and a reference frame stored in memory 206. The differences are computed on a macroblock-by-macroblock basis using a motion estimator 208 and a motion compensator 210. The motion estimator 208 receives a macroblock from a current video frame and searches a reference frame in memory 206 for a corresponding macroblock. Once located, the motion estimator 208 generates a motion vector to represent the displacement of the macroblock in the current video frame from its position in the reference frame. The motion vector is used by the motion compensator 210 to retrieve from memory 206 the corresponding macroblock from the reference frame, which is then subtracted from the macroblock from the current video frame to produce residual information (i.e., information representing the difference between the two). The residual information is transformed by a Discrete Cosine Transform (DCT) 212 into discrete spatial frequency coefficients, quantized by a quantizer 214, and provided to a coding unit 216 for further compression.
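Two of the stages just described, the subtractor 204 and the quantizer 214, can be illustrated with the following hypothetical sketch. The DCT 212 between them is omitted for brevity, and the step size shown is an arbitrary assumption, not a value taken from any standard:

```python
# Hypothetical sketch of the subtractor and quantizer stages: subtract
# the reference macroblock from the current one, then uniformly quantize
# the residual. (The DCT stage between the two is omitted for brevity.)

def residual(cur_block, ref_block):
    return [[c - r for c, r in zip(cr, rr)]
            for cr, rr in zip(cur_block, ref_block)]

def quantize(block, step=4):
    # Round each value to the nearest multiple of the step size.
    return [[round(v / step) for v in row] for row in block]

cur = [[10, 12], [14, 16]]
ref = [[8, 8], [8, 8]]
q = quantize(residual(cur, ref))
```

Quantization is where the compression (and the loss) occurs: small residual values collapse to zero and large ones are represented coarsely.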
The current video frame processed by the video encoder 200 should be stored in the memory 206 so that it can be used later as a reference frame. Instead of simply copying the current video frame into memory 206, the quantized transform coefficients are processed by an inverse quantizer 217 and an inverse transform 218 before being summed with the macroblocks of the reference frame by an adder 220. This process ensures that the content of the current video frame stored in memory 206 is identical to the frame reconstructed by the wireless communication devices.
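The reconstruction loop above can be sketched as follows, continuing the earlier hypothetical quantization example (the step size and function names remain assumptions for illustration):

```python
# Hypothetical sketch of the reconstruction loop: the encoder dequantizes
# its own quantized residual and adds it back to the reference block, so
# the frame it stores matches what the decoder will reconstruct.

def dequantize(block, step=4):
    return [[v * step for v in row] for row in block]

def reconstruct(ref_block, quantized_residual, step=4):
    deq = dequantize(quantized_residual, step)
    return [[r + d for r, d in zip(rr, dr)]
            for rr, dr in zip(ref_block, deq)]

ref = [[8, 8], [8, 8]]
qres = [[0, 1], [2, 2]]   # quantized residual from the encoder
recon = reconstruct(ref, qres)
```

Note that `recon` differs from the true current block wherever quantization discarded information, which is exactly why the encoder must predict from the reconstructed frame rather than the original.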
A RAP generator 202 instructs the motion estimator 208 and motion compensator 210 to code each frame of video as an I, P, or B-frame. In the case of an I-frame, the motion estimator 208 does not need to generate a motion vector and the motion compensator 210 does not need to retrieve macroblocks of a reference frame from memory 206. Rather, the macroblocks for the current video frame are passed directly through the subtractor 204 to the DCT 212. The RAP generator 202 also provides a signal to the coding unit 216 indicating whether the frame is an I, P, or B frame. The signal is part of a header for each video frame.
A wireless communication device 106 moving through the broadcast system 100 may be configured to receive a service containing one or more streams from the distribution network 108 using any suitable wireless interface. One non-limiting example of a wireless interface uses multiple subcarriers, such as orthogonal frequency division multiplexing (OFDM). OFDM is a multi-carrier modulation technique that effectively partitions the overall system bandwidth into multiple (N) sub-carriers. These sub-carriers, which are also referred to as tones, bins, frequency channels, etc., are spaced apart at precise frequencies to provide orthogonality. Content may be modulated onto the sub-carriers by adjusting each sub-carrier's phase, amplitude, or both. Typically, quadrature phase shift keying (QPSK) or quadrature amplitude modulation (QAM) is used, but other modulation schemes may also be used.
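As a minimal illustration of the QPSK mapping mentioned above, the following sketch maps each pair of bits to one of four phases, yielding one complex symbol per sub-carrier. The particular bit-to-phase assignment and the unnormalized constellation are assumptions for illustration, not the mapping of any specific air interface:

```python
# Hypothetical sketch of QPSK mapping: each pair of bits selects one of
# four phases, giving one (unnormalized) complex symbol per sub-carrier.

QPSK = {
    (0, 0): 1 + 1j,
    (0, 1): -1 + 1j,
    (1, 1): -1 - 1j,
    (1, 0): 1 - 1j,
}

def qpsk_map(bits):
    """Map an even-length bit sequence to a list of QPSK symbols."""
    assert len(bits) % 2 == 0
    return [QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

symbols = qpsk_map([0, 0, 1, 1, 0, 1])
```

A 16-QAM mapper would differ only in carrying four bits per symbol by varying amplitude as well as phase.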
In OFDM wireless interfaces, content is generally broadcast in super-frames.
The protocol stack for the multimedia broadcast system described thus far includes an application layer, which resides above a stream layer, which resides above a medium access control (MAC) layer, which resides above a physical layer. The application layer controls the broadcast of the multimedia content, access to the content, encoding, and so on. The stream layer provides binding of application layer packets to the media streams on the media logical channels. The MAC layer performs multiplexing of packets for the different media streams associated with each media logical channel. The physical layer provides a mechanism to broadcast the media streams through various communication channels in the multimedia broadcast system.
The MAC layer forms a MAC capsule for the media logical channel for each super-frame. The MAC capsule includes a MAC capsule header and MAC capsule payload. The MAC capsule header carries embedded overhead information for the media logical channel, which includes the location of the media logical channel in the next super-frame. The MAC capsule payload carries the stream layer packets to be broadcast in the super-frame for the media logical channel.
The MAC layer also fragments the MAC capsule into multiple MAC packets. In this example, the MAC capsule header and signaling stream packet are divided into N0 MAC packets, the video stream packet is divided into N1 MAC packets, and the audio stream packet is divided into N2 MAC packets. To facilitate independent reception of the media streams, each stream layer packet is sent in an integer number of MAC packets.
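The fragmentation rule above, in which each stream layer packet occupies an integer number of MAC packets, can be sketched as follows. The MAC packet size used here is an arbitrary assumption for illustration, as is the zero-byte padding of the final packet:

```python
# Hypothetical sketch: fragment a stream-layer packet into an integer
# number of fixed-size MAC packets, zero-padding the last one.

import math

def fragment(payload, mac_size=122):
    """Split payload bytes into ceil(len/mac_size) padded MAC packets."""
    n = math.ceil(len(payload) / mac_size)
    packets = []
    for i in range(n):
        chunk = payload[i * mac_size:(i + 1) * mac_size]
        packets.append(chunk + b'\x00' * (mac_size - len(chunk)))
    return packets

# A 300-byte stream packet occupies ceil(300/122) = 3 MAC packets.
packets = fragment(b'\x01' * 300, mac_size=122)
```

Because no MAC packet straddles two stream layer packets, a receiver can discard the MAC packets of streams it is not interested in without reassembling them.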
The MAC layer also performs block encoding on the MAC packets for the media logical channel and generates NP parity MAC packets. The parity MAC packets are appended to the block of MAC packets to create an encoded MAC capsule. The physical layer receives the encoded MAC capsule and processes (e.g., encodes, interleaves, and symbol maps) each MAC packet to generate a corresponding physical layer packet.
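The block encoding above is typically a stronger code such as a Reed-Solomon code; as a deliberately minimal illustration of the idea (not the actual code used), the following sketch appends a single XOR parity packet, which suffices to recover any one erased packet in the block:

```python
# Minimal illustration (not the actual block code): a single XOR parity
# packet over a block of equal-length MAC packets can recover any one
# erased packet by XOR-ing the parity with the surviving packets.

def xor_parity(packets):
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return bytes(parity)

data = [b'\x01\x02', b'\x04\x08', b'\x10\x20']
parity = xor_parity(data)

# Suppose data[1] is erased in transit; the survivors plus the parity
# packet reproduce it.
recovered = xor_parity([data[0], data[2], parity])
```

A real parity set of NP packets, generated by a Reed-Solomon code, extends the same principle to recover up to NP erasures per block.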
The data processor 608 also receives composite overhead information to be sent at the start of each super-frame from the controller 612. The data processor 608 processes the composite overhead information in accordance with a mode for the composite overhead information to produce a stream of overhead symbols. The mode used for the composite overhead information is typically associated with a lower code rate and/or a lower order modulation scheme than that used for the media streams to ensure robust reception of the composite overhead information.
A channelizer 614 multiplexes the data, overhead, and pilot symbols into time slots within the super-frame. The time slots are assigned by a scheduler 610. An OFDM modulator 616 converts the composite symbol stream into N parallel streams and performs OFDM modulation on each set of N symbols to produce a stream of OFDM symbols to an analog front end (AFE) 606. The AFE 606 conditions (e.g., converts to analog, filters, amplifies, and frequency upconverts) the OFDM symbol stream and generates a modulated signal that is broadcast from an antenna 618.
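The serial-to-parallel conversion and OFDM modulation step performed by the OFDM modulator 616 can be illustrated with a direct inverse DFT over one set of N symbols. Real systems compute the same transform with an IFFT for efficiency; the function name and the tiny N here are assumptions for illustration:

```python
# Hypothetical sketch of OFDM modulation: N frequency-domain symbols
# (one per sub-carrier) are converted to N time-domain samples with an
# inverse DFT. Practical modulators use an IFFT for the same transform.

import cmath

def inverse_dft(symbols):
    n = len(symbols)
    return [sum(s * cmath.exp(2j * cmath.pi * k * t / n)
                for k, s in enumerate(symbols)) / n
            for t in range(n)]

# A single unit symbol on sub-carrier 0 yields a constant time signal.
samples = inverse_dft([1 + 0j, 0j, 0j, 0j])
```

Each output sample is a superposition of all N modulated sub-carriers, which is what gives OFDM its multi-carrier character.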
The processing unit 704 is shown with various blocks to illustrate its functionality. These functional blocks may be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. Each functional block may be implemented separately, integrated with one or more functional blocks, or integrated with one or more other entities not shown.
When implemented in hardware, either in whole or part, the processor may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, micro-controllers, state machines, discrete gate or transistor logic, discrete hardware components, or any combination thereof to perform some or all of the processor functions described herein.
When implemented in software, firmware, middleware or microcode, in whole or part, the processor may be implemented with a special purpose or general purpose computer, and may also include computer-readable media for carrying or having program code or instructions that, when executed, performs some or all of the processor functions described herein. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and blu-ray disc where “disks” usually reproduce data magnetically, while “discs” reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Referring to
The controller 714 receives a selection from the user interface 716 for a service. The controller 714 then determines the time slot assignment for the media logical channel carrying the service based on the composite overhead information broadcast at the start of the current super-frame. The controller 714 then provides a control signal to the dechannelizer 710. The dechannelizer 710 performs demultiplexing of the data and overhead symbol estimates and provides the demultiplexed data and overhead symbol estimates to a data processor 712. The data processor 712 processes (e.g., symbol demaps, deinterleaves, and decodes) the overhead symbol estimates in accordance with the mode used for the composite overhead information and provides the processed overhead information to the controller 714. The data processor 712 also processes the data symbol estimates for the media logical channel carrying the service selected by the user, in accordance with the mode used for each stream, and provides a corresponding processed data stream to a multimedia processor 718.
The multimedia processor 718 enables a decoder for each media stream in the media logical channel selected by the user. By way of example, a typical service may provide a signaling stream, video stream and audio stream. In this example, the multimedia processor 718 may enable a decoder for each. In the case of a video stream, the decoder performs the inverse processing functions of the video encoder described earlier in connection with
The decoding process is used to recover the transform coefficients for each macroblock. The transform coefficients are inverse quantized and inverse transformed to extract the residual information for each macroblock in the video frames following an I-frame. Using the motion vectors to retrieve information from memory for the corresponding macroblocks in a reference frame, the pixel information for the video frame can be recovered. The pixel information for the video frame is presented to the display 720 for viewing by the user.
The user interface 716 may allow a user to surf channels of multimedia content on the wireless communications device 106. When a user is channel surfing, or merely selecting a new channel, the controller 714 uses the composite overhead information broadcast at the start of the next super-frame to locate the media logical channel for the new service selected by the user. The controller 714 then switches the channel by prompting the dechannelizer 710 to select the data and overhead symbol estimates for the media streams contained in the new media logical channel and provide the selected symbol estimates to the data and multimedia processors 712 and 718. The multimedia processor 718 then searches the video stream for an I-frame to begin the decoding process, which may result in an undesirable delay in presenting new content to the display 720.
Various techniques may be employed to reduce or eliminate this delay. By way of example, a broadcast scheme may be implemented wherein the timing of the I-frames is known, a priori, by the wireless communications device 106. The timing of the I-frames may be determined in a number of ways. One way is to synchronize the application layer to the physical layer. Referring to
In an alternative configuration of the multimedia processor 718, the timing of the I-frames for each service can be broadcast in an overhead channel. By way of example, the overhead information 206 (see
The previous description is provided to enable any person skilled in the art to practice the various embodiments described herein. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments. Thus, the claims are not intended to be limited to the embodiments shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” All structural and functional equivalents to the elements of the various embodiments described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims. No claim element is to be construed under the provisions of 35 U.S.C. §112, sixth paragraph, unless the element is expressly recited using the phrase “means for” or, in the case of a method claim, the element is recited using the phrase “step for.”
The present Application for patent claims priority to Provisional Application No. 60/775,441 entitled “RAPID TUNING AND SCAN VIA LAYER SYNCHRONIZATION FOR MULTIMEDIA” filed Feb. 21, 2006, and assigned to the assignee hereof and hereby expressly incorporated by reference herein.