The present invention relates to the field of video encoding and decoding, and more specifically to scalable video data processing on a fine granularity scalability basis.
Conventional video coding standards (e.g. MPEG-1, H.261/263/264) incorporate motion estimation and motion compensation to remove temporal redundancies between video frames. These concepts are familiar to skilled readers with a basic understanding of video coding, and will not be described in detail.
The scalable extension to H.264/AVC, which is incorporated herein by reference together with the H.264/AVC video coding standard, currently enables fine-grained scalability, according to which the quality of a video sequence may be improved by increasing the bit rate in increments of 10% or less. According to the traditional implementation, each FGS (Fine Granularity Scalability) slice must cover the same spatial region as the corresponding slice in its “base layer picture”, i.e. the starting macroblock and the size in number of macroblocks of an FGS slice must be the same as those of the corresponding slice in its “base layer picture”. Consequently, each FGS plane must have the same number of slices as the “base layer picture”.
The constraint, according to the present state of the art, that each FGS slice must cover the same spatial region as the corresponding slice in its “base layer picture” dictates the NAL (Network Abstraction Layer) unit sizes and hence prevents optimal transport according to the known packet loss rate and protocol data unit (PDU) size. Furthermore, the constraint disallows region-of-interest (ROI) FGS enhancement, wherein the regions of interest may have better quality than other regions.
The object of the present invention is to provide a methodology, a device, and a system for efficiently encoding or decoding, respectively, which overcomes the above mentioned problems of the state of the art and provides an effective and qualitatively improved coding.
The main advantages reside in the following. First, an FGS slice can be coded such that the starting macroblock position and the size in number of macroblocks can be decided according to the requirement for optimal transport, for example, such that the size of the slice in number of bytes is close to but never exceeds the protocol data unit (PDU) size in bytes. Second, an FGS slice may be coded such that it covers the region of interest, or part thereof, and is coded in a higher quality than less important regions; alternatively, only FGS slices covering the region of interest are encoded and transmitted.
According to the present invention the constraint that each FGS slice must cover the same spatial region as the corresponding slice in its “base layer picture” is removed. Rather, the region covered by an FGS slice (i.e. the starting macroblock and the size in number of macroblocks) is independent of its base layer picture. Accordingly, an FGS slice may be coded in such a way that the starting macroblock and the number of macroblocks are independent of its base layer picture.
Accordingly, any application that applies scalable video coding, wherein FGS slices are supported, will benefit from the inventive step of the present invention.
The objects of the present invention are solved by the subject matter defined in the accompanying independent claims.
According to a first aspect of the present invention, a method for scalable encoding of video data is provided. Said method comprises the following operations: obtaining said video data, generating a base layer based on said obtained video data, generating at least one corresponding scalable enhancement layer depending on said video data and said base layer, wherein said at least one enhancement layer comprises fine granularity scalability (FGS) information based on one or more enhancement FGS-slices, said FGS-slices describing certain regions within said base layer; and defining at least one of said one or more generated enhancement FGS-slices in such manner that said at least one generated enhancement FGS-slice covers a different region than the region covered by the corresponding slice in the base layer picture; and encoding said base layer and said at least one enhancement layer resulting in encoded video data.
Thus it is now achieved to provide a method for flexible coding of FGS slices in the sense that the region covered by an FGS slice (i.e. the starting macroblock and the size in number of macroblocks) is independent of its base layer picture. And consequently, each FGS plane can have a different number of slices than the “base layer picture”.
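The independence of the FGS slice layout from the base layer slice layout can be illustrated by a minimal Python sketch. It models a slice only by its starting macroblock and its size in number of macroblocks, in raster-scan order. The names Slice and covered_mbs, the picture size of 99 macroblocks (11 x 9, as in a QCIF picture), and the particular slice boundaries are assumptions chosen for illustration and are not taken from the specification of the scalable extension.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Slice:
    first_mb: int  # address of the starting macroblock (raster-scan order)
    num_mbs: int   # size of the slice in number of macroblocks

    def mb_range(self) -> range:
        return range(self.first_mb, self.first_mb + self.num_mbs)


def covered_mbs(slices: List[Slice]) -> Set[int]:
    """Return the set of macroblock addresses covered by the given slices."""
    covered: Set[int] = set()
    for s in slices:
        covered.update(s.mb_range())
    return covered


# Hypothetical picture of 11 x 9 = 99 macroblocks.
PIC_MBS = 99

# Base layer picture coded as three slices of 33 macroblocks each.
base_slices = [Slice(0, 33), Slice(33, 33), Slice(66, 33)]

# Traditional constraint: the FGS plane mirrors the base layer slice layout.
fgs_traditional = list(base_slices)

# Flexible coding: five FGS slices whose boundaries differ from the base layer.
fgs_flexible = [Slice(0, 20), Slice(20, 20), Slice(40, 20),
                Slice(60, 20), Slice(80, 19)]
```

In this sketch both FGS layouts cover the whole picture, but only the flexible layout has a different number of slices than the base layer picture.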
According to an embodiment of the present invention, said at least one FGS enhancement layer comprises progressive refinement slices as specified in the scalable extension to the H.264/AVC video coding standard. Thus, standard-conformant encoding may be implemented.
According to another embodiment of the present invention, said generating of said base layer and said enhancement layers is based on motion information within said video data, said motion information being provided by a motion estimation process.
According to another embodiment of the present invention, said encoded video data does not comprise FGS-slices covering a non-interested region. Therein, conventional coding is enabled.
According to another embodiment of the present invention, said FGS-slices relate to certain regions of interest of individual pictures within said video data.
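One way to picture FGS-slices that relate to a region of interest is to map a rectangular region, given in macroblock units, onto runs of consecutive macroblock addresses. The following Python sketch does this under the assumption that macroblocks are addressed in plain raster-scan order; the function name roi_to_fgs_runs and its parameters are hypothetical and serve only as illustration.

```python
from typing import List, Tuple


def roi_to_fgs_runs(pic_width_in_mbs: int, roi_x: int, roi_y: int,
                    roi_w: int, roi_h: int) -> List[Tuple[int, int]]:
    """Map a rectangular region of interest (in macroblock units) to a list of
    (first_mb, num_mbs) runs, one candidate FGS slice per run.

    Assumes raster-scan macroblock addressing. Runs that happen to be
    contiguous (a region spanning the full picture width) are merged.
    """
    runs: List[Tuple[int, int]] = []
    for row in range(roi_y, roi_y + roi_h):
        first_mb = row * pic_width_in_mbs + roi_x
        if runs and runs[-1][0] + runs[-1][1] == first_mb:
            # The previous run ends exactly where this row starts: extend it.
            prev_first, prev_num = runs[-1]
            runs[-1] = (prev_first, prev_num + roi_w)
        else:
            runs.append((first_mb, roi_w))
    return runs


# Region of 4 x 3 macroblocks at column 3, row 2 of an 11-macroblock-wide picture.
roi_runs = roi_to_fgs_runs(11, roi_x=3, roi_y=2, roi_w=4, roi_h=3)
```

Each run corresponds to a starting macroblock and a size in number of macroblocks, i.e. to the two quantities that define the region covered by an FGS slice.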
According to another embodiment of the present invention, said FGS-slice is encoded such that its size in bytes is close to but less than a pre-determined value.
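A simple greedy procedure illustrates how slice boundaries might be chosen so that each slice stays within a pre-determined size. The Python sketch below assumes that the coded size of every macroblock is already known and that sizes simply add up, which a real entropy coder does not guarantee; the function name pack_fgs_slices, the header_bytes parameter and the example sizes are hypothetical.

```python
from typing import List, Tuple


def pack_fgs_slices(mb_sizes: List[int], pdu_size: int,
                    header_bytes: int = 0) -> List[Tuple[int, int, int]]:
    """Greedily group consecutive macroblocks into slices whose size in bytes
    (assumed header plus macroblock data) does not exceed pdu_size.

    Returns a list of (first_mb, num_mbs, slice_bytes) tuples.
    """
    slices: List[Tuple[int, int, int]] = []
    first_mb, num_mbs, used = 0, 0, header_bytes
    for addr, size in enumerate(mb_sizes):
        if header_bytes + size > pdu_size:
            raise ValueError(f"macroblock {addr} does not fit into one PDU")
        if used + size > pdu_size:
            # Close the current slice and start a new one at this macroblock.
            slices.append((first_mb, num_mbs, used))
            first_mb, num_mbs, used = addr, 0, header_bytes
        num_mbs += 1
        used += size
    if num_mbs:
        slices.append((first_mb, num_mbs, used))
    return slices


# Hypothetical coded sizes (bytes) of six macroblocks, a PDU size of 100 bytes
# and an assumed slice header of 10 bytes.
packed = pack_fgs_slices([40, 50, 30, 60, 20, 70], pdu_size=100, header_bytes=10)
```

Because the boundaries follow the byte budget rather than the base layer slice layout, the resulting starting macroblocks and slice sizes are in general different from those of the base layer picture.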
According to another embodiment of the present invention, said FGS-slice is associated with a variable that indicates the number of macroblocks in the FGS-slice.
According to another embodiment of the present invention, said variable is used to control the encoding of syntax elements in the FGS-slice.
According to another aspect of the present invention, a method for scalable decoding of encoded video data is provided. Said method comprises the following operations: obtaining said encoded video data, identifying a base layer and a plurality of enhancement layers within said encoded video data, determining fine granularity scalability (FGS) information relating to said base layer within said plurality of enhancement layers, wherein said FGS-information comprises at least one FGS-slice describing certain regions within said base layer and at least one of said FGS-slices covers a different region than the region covered by the corresponding slice in the base layer picture, and decoding said encoded video data comprising said base layer, said plurality of enhancement layers and said FGS-information, resulting in decoded video data.
According to another embodiment of the present invention, said FGS-slice is associated with a variable that indicates the number of macroblocks in the FGS-slice.
According to another embodiment of the present invention, said variable is used to control the decoding of syntax elements in the FGS-slice.
According to another aspect of the present invention, a device operative according to the above mentioned methods is provided.
According to another aspect of the present invention, a system for supporting data transmission according to the above mentioned methods is provided.
According to another aspect of the present invention, a data transmission system, including at least one encoding device and at least one decoding device is provided.
According to another aspect of the present invention, a computer program product comprising a computer readable storage structure embodying computer program code thereon for execution by a computer processor hosted by an electronic device is provided, wherein said computer program code comprises instructions for performing a method according to any of the above mentioned methods.
According to another aspect of the present invention, a computer program product comprising a computer readable storage structure embodying computer program code thereon for execution by a computer processor hosted by an electronic device is provided, wherein said computer program code comprises instructions for performing a method according to any one of the above mentioned methods.
According to another aspect of the present invention, an apparatus for scalable encoding of video data is provided, wherein said apparatus comprises: a component for obtaining said video data, a component for generating a base layer based on said obtained video data, a component for generating at least one corresponding scalable enhancement layer depending on said video data and said base layer, wherein said at least one enhancement layer comprises fine granularity scalability (FGS) information based on one or more enhancement FGS-slices, said FGS-slices describing certain regions within said base layer; and a component for defining at least one of said one or more generated enhancement FGS-slices in such manner that said at least one generated enhancement FGS-slice covers a different region than the region covered by the corresponding slice in the base layer picture; and a component for encoding said base layer and said at least one enhancement layer resulting in encoded video data.
According to another aspect of the present invention, an apparatus for scalable decoding of encoded video data is provided, said apparatus comprising: a component for obtaining said encoded video data, a component for identifying a base layer and a plurality of enhancement layers within said encoded video data, a component for determining fine granularity scalability (FGS) information relating to said base layer within said plurality of enhancement layers, wherein said FGS-information comprises at least one FGS-slice describing certain regions within said base layer and at least one of said FGS-slices covers a different region than the region covered by the corresponding slice in the base layer picture, and a component for decoding said encoded video data by combining said base layer, said plurality of enhancement layers and said FGS-information, resulting in decoded video data.
According to another aspect of the present invention, a data transmission system is provided including at least one encoding device for carrying out a method for scalable encoding of video data. The video data is obtained and a base layer based on said obtained video data is generated. At least one corresponding scalable enhancement layer depending on said video data and said base layer is generated. The at least one enhancement layer comprises fine granularity scalability (FGS) information based on one or more generated enhancement FGS-slices. The FGS-slices describe certain regions within said base layer. At least one of said one or more generated enhancement FGS-slices is defined in such manner that said at least one generated enhancement FGS-slice covers a different region than a region covered by a corresponding slice in the base layer. The base layer and said at least one enhancement layer are encoded, resulting in encoded video data.
The data transmission system further comprises a decoding device for carrying out a method for scalable decoding of encoded video data. The encoded video data is obtained and a base layer and a plurality of enhancement layers are identified within said encoded video data. Fine granularity scalability (FGS) information relating to said base layer within said plurality of enhancement layers is determined. The FGS-information comprises at least one FGS-slice describing certain regions within said base layer and at least one of said FGS-slices covers a different region than a region covered by a corresponding slice in the base layer. The encoded video data is decoded by combining said base layer, the plurality of enhancement layers and the FGS-information, resulting in decoded video data.
Advantages of the present invention will become apparent to the reader of the present invention when reading the detailed description referring to embodiments of the present invention, based on which the inventive concept is easily understandable.
Throughout the detailed description and the accompanying drawings same or similar components, units, or devices will be referenced by same reference numerals for clarity purposes.
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the present invention and together with the description serve to explain the principles of the invention. In the drawings,
In the following description of the various embodiments, reference is made to the accompanying drawings which form a part thereof, and in which is shown by way of illustration various embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural and functional modifications may be made without departing from the scope of the invention. Wherever possible same reference numbers are used throughout drawings and description to refer to similar or like parts.
To enable the coding of an FGS slice in accordance with one embodiment of the present invention, a variable indicating the number of macroblocks in the slice (for instance “num_mbs_in_slice”) may be signaled in the slice header, and used in the FGS slice data syntax for enhanced coding or decoding respectively.
According to the present invention said variable is used to control encoding or decoding, respectively of syntax elements within the FGS-slice.
Therefore, it is now possible to encode or decode FGS-slices so that the region, which is described by the FGS-slice in question, is independent of its corresponding base layer picture. Thus, each FGS plane can have a different number of slices than the “base layer picture”. Additionally, there is a direct link between the number of macroblocks in the slice and the slice header used for further implementation purposes.
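The role of such a variable can be sketched with a toy bitstream in Python. The sketch writes a slice header consisting of a starting macroblock address and “num_mbs_in_slice”, followed by one value per macroblock, and the reader uses “num_mbs_in_slice” to decide how many macroblock entries to parse. Coding both header fields as unsigned Exp-Golomb codes, the one-value-per-macroblock payload, and the function names are assumptions made for illustration; this is not the slice syntax of the scalable extension.

```python
from typing import Dict, List, Tuple


def ue_encode(value: int) -> str:
    """Unsigned Exp-Golomb code of value, as a string of '0'/'1' characters."""
    code = bin(value + 1)[2:]
    return "0" * (len(code) - 1) + code


def ue_decode(bits: str, pos: int) -> Tuple[int, int]:
    """Decode one unsigned Exp-Golomb code at pos; return (value, new_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


def write_fgs_slice(first_mb: int, mb_payloads: List[int]) -> str:
    """Toy slice: header (first_mb, num_mbs_in_slice), then one value per MB."""
    bits = ue_encode(first_mb) + ue_encode(len(mb_payloads))
    return bits + "".join(ue_encode(p) for p in mb_payloads)


def read_fgs_slice(bits: str, pos: int) -> Tuple[Dict[int, int], int]:
    """Parse one toy slice; num_mbs_in_slice bounds the macroblock loop."""
    first_mb, pos = ue_decode(bits, pos)
    num_mbs_in_slice, pos = ue_decode(bits, pos)
    decoded: Dict[int, int] = {}
    for offset in range(num_mbs_in_slice):
        decoded[first_mb + offset], pos = ue_decode(bits, pos)
    return decoded, pos


# Two toy FGS slices written back to back, then parsed in sequence.
stream = write_fgs_slice(5, [1, 0, 7]) + write_fgs_slice(40, [2, 2])
slice_a, next_pos = read_fgs_slice(stream, 0)
slice_b, end_pos = read_fgs_slice(stream, next_pos)
```

Since the reader learns the extent of each slice from the slice itself, it does not need to consult the slice layout of the base layer picture to know where the slice data ends.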
The mobile device 10 may communicate over a voice network and/or may likewise communicate over a data network, such as any public land mobile networks (PLMNs) in the form of e.g. digital cellular networks, especially GSM (global system for mobile communication) or UMTS (universal mobile telecommunications system). Typically the voice and/or data communication is operated via an air interface, i.e. a cellular communication interface subsystem in cooperation with further components (see above) to a base station (BS) or Node B (not shown) being part of a radio access network (RAN) of the infrastructure of the cellular network. The cellular communication interface subsystem as depicted illustratively with reference to
After any required network registration or activation procedures have been completed, which may involve the subscriber identification module (SIM) 210 required for registration in cellular networks, the mobile device 10 may then send and receive communication signals, including both voice and data signals, over the wireless network. Signals received by the antenna 129 from the wireless network are routed to the receiver 121, which provides for such operations as signal amplification, frequency down conversion, filtering, channel selection, and analog to digital conversion. Analog to digital conversion of a received signal allows more complex communication functions, such as digital demodulation and decoding, to be performed using the digital signal processor (DSP) 120. In a similar manner, signals to be transmitted to the network are processed, including modulation and encoding, for example, by the digital signal processor (DSP) 120 and are then provided to the transmitter 122 for digital to analog conversion, frequency up conversion, filtering, amplification, and transmission to the wireless network via the antenna 129.
The microprocessor/microcontroller (μC) 100, which may also be designated as a device platform microprocessor, manages the functions of the mobile device 10. Operating system software 149 used by the processor 100 is preferably stored in a persistent store such as the non-volatile memory 140, which may be implemented, for example, as a Flash memory, battery backed-up RAM, any other non-volatile storage technology, or any combination thereof. In addition to the operating system 149, which controls low-level functions as well as (graphical) basic user interface functions of the mobile device 10, the non-volatile memory 140 includes a plurality of high-level software application programs or modules, such as a voice communication software application 142, a data communication software application 141, an organizer module (not shown), or any other type of software module (not shown). These modules are executed by the processor 100 and provide a high-level interface between a user of the mobile device 10 and the mobile device 10. This interface typically includes a graphical component provided through the display 135 controlled by a display controller 130 and input/output components provided through a keypad 175 connected via a keypad controller 170 to the processor 100, an auxiliary input/output (I/O) interface 200, and/or a short-range (SR) communication interface 180. The auxiliary I/O interface 200 comprises especially a USB (universal serial bus) interface, a serial interface, an MMC (multimedia card) interface and related interface technologies/standards, and any other standardized or proprietary data communication bus technology, whereas the short-range communication interface 180 comprises a radio frequency (RF) low-power interface, including especially WLAN (wireless local area network) and Bluetooth communication technology, or an IrDA (Infrared Data Association) interface.
The RF low-power interface technology referred to herein should especially be understood to include any IEEE 802.xx standard technology, a description of which is obtainable from the Institute of Electrical and Electronics Engineers. Moreover, the auxiliary I/O interface 200 as well as the short-range communication interface 180 may each represent one or more interfaces supporting one or more input/output interface technologies and communication interface technologies, respectively. The operating system, specific device software applications or modules, or parts thereof, may be temporarily loaded into a volatile store 150 such as a random access memory (typically implemented on the basis of DRAM (dynamic random access memory) technology) for faster operation. Moreover, received communication signals may also be temporarily stored to volatile memory 150, before permanently writing them to a file system located in the non-volatile memory 140 or any mass storage preferably detachably connected via the auxiliary I/O interface for storing data. It should be understood that the components described above represent typical components of a traditional mobile device 10 embodied herein in the form of a cellular phone. The present invention is not limited to these specific components and their implementation, which are depicted merely by way of illustration and for the sake of completeness.
An exemplary software application module of the mobile device 10 is a personal information manager application providing PDA (Personal Digital Assistant) functionality including typically a contact manager, calendar, a task manager, and the like. Such a personal information manager is executed by the processor 100, may have access to the components of the mobile device 10, and may interact with other software application modules. For instance, interaction with the voice communication software application allows for managing phone calls, voice mails, etc., and interaction with the data communication software application enables managing SMS (short message service), MMS (multimedia messaging service), e-mail communications and other data transmissions. The non-volatile memory 140 preferably provides a file system to facilitate permanent storage of data items on the device including particularly calendar entries, contacts etc. The ability for data communication with networks, e.g. via the cellular interface, the short-range communication interface, or the auxiliary I/O interface enables upload, download, and synchronization via such networks.
The application modules 141 to 149 represent device functions or software applications that are configured to be executed by the processor 100. In most known mobile devices, a single processor manages and controls the overall operation of the mobile device as well as all device functions and software applications. Such a concept is applicable for today's mobile devices. Especially the implementation of enhanced multimedia functionalities, including for example the reproduction of video streaming applications, the manipulation of digital images and video sequences captured by integrated or detachably connected digital camera functionality, but also gaming applications with sophisticated graphics, drives the requirement for computational power. One way to deal with the requirement for computational power, which has been pursued in the past, is to implement powerful and universal processor cores. Another approach for providing computational power is to implement two or more independent processor cores, which is a well known methodology in the art. The advantages of several independent processor cores can be immediately appreciated by those skilled in the art. Whereas a universal processor is designed for carrying out a multiplicity of different tasks without specialization to a pre-selection of distinct tasks, a multi-processor arrangement may include one or more universal processors and one or more specialized processors adapted for processing a predefined set of tasks. Nevertheless, the implementation of several processors within one device, especially a mobile device such as mobile device 10, traditionally requires a complete and sophisticated re-design of the components.
In the following, the present invention will provide a concept which allows simple integration of additional processor cores into an existing processing device implementation, avoiding an expensive, complete and sophisticated redesign. The inventive concept will be described with reference to system-on-a-chip (SoC) design. System-on-a-chip (SoC) is a concept of integrating numerous (or all) components of a processing device into a single highly integrated chip. Such a system-on-a-chip can contain digital, analog, mixed-signal, and often radio-frequency functions, all on one chip. A typical processing device comprises a number of integrated circuits that perform different tasks. These integrated circuits may include especially a microprocessor, memory, universal asynchronous receiver-transmitters (UARTs), serial/parallel ports, direct memory access (DMA) controllers, and the like. A universal asynchronous receiver-transmitter (UART) translates between parallel bits of data and serial bits. Recent improvements in semiconductor technology have allowed very-large-scale integration (VLSI) integrated circuits to grow significantly in complexity, making it possible to integrate numerous components of a system in a single chip. With reference to
Additionally, said device 10 is equipped with a module for scalable encoding 105 and decoding 106 of video data according to the inventive operation of the present invention. By means of the CPU 100 said modules 105, 106 may be used individually. Said device 10 is thus adapted to perform video data encoding or decoding, respectively. Said video data may be received by means of the communication modules of the device or it also may be stored within any imaginable storage means within the device 10.
With reference to
If no further processing is needed the operational sequence may come to an end operation S490, and may be restarted according to a new iteration.
On the basis of the received FGS-slices, base layer and enhancement layers the decoder is adapted to reconstruct the original sequence S530. According to the inventive step of the present invention the received FGS-information may be used for certain regions of interest within the base layer picture.
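The use of FGS-information for certain regions only can be pictured with a heavily simplified Python sketch in which a single number per macroblock stands in for whatever data a macroblock carries. Refinements are applied only to the macroblocks covered by a received FGS-slice, and all other macroblocks keep their base layer value. The function name apply_fgs and the example values are hypothetical and do not reflect the actual reconstruction process of a decoder.

```python
from typing import List, Tuple


def apply_fgs(base_mbs: List[int],
              fgs_slices: List[Tuple[int, List[int]]]) -> List[int]:
    """Add per-macroblock refinements to a copy of the base layer values.

    Each FGS-slice is given as (first_mb, refinements); its size in number of
    macroblocks is the length of the refinement list.
    """
    out = list(base_mbs)
    for first_mb, refinements in fgs_slices:
        for offset, delta in enumerate(refinements):
            out[first_mb + offset] += delta
    return out


# Eight macroblocks at base quality; one FGS-slice refines macroblocks 2 to 4.
base = [10, 10, 10, 10, 10, 10, 10, 10]
refined = apply_fgs(base, [(2, [3, 5, 3])])
```

Macroblocks outside the region of interest remain at base layer quality, which corresponds to the case in which only FGS slices covering the region of interest are transmitted.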
If no further processing is needed the operational sequence may come to an end operation S590, and may be restarted according to a new iteration.
With reference to
Even though the invention is described above with reference to embodiments according to the accompanying drawings, it is clear that the invention is not restricted thereto but it can be modified in several ways within the scope of the appended claims.
This application claims priority from U.S. Provisional Application Ser. No. 60/671,155 filed Apr. 13, 2005 and U.S. Provisional Application Ser. No. 60/676,243 filed Apr. 29, 2005.
Number | Date | Country
---|---|---
60671155 | Apr 2005 | US
60676243 | Apr 2005 | US