The present invention generally relates to multi-view media data, and in particular to generation and processing of such multi-view media data.
Multi-View Video Coding (MVC), which is undergoing standardization by the Moving Picture Experts Group (MPEG) [1] and the Telecommunication Standardization Sector (ITU-T) Study Group 16 (SG16), is a video coding technology which encodes video sequences produced by several cameras or a camera array. MVC exploits redundancy between the multiple video views in an efficient way to provide a compact encoded video stream. MVC is based on the Advanced Video Coding (AVC) standard, also known as ITU-T H.264, and consequently the MVC bit stream syntax and semantics have been kept similar to the AVC bit stream syntax and semantics.
ISO/IEC 14496-15 [2] is an international standard designed to contain AVC bit stream information in a flexible and extensible format that facilitates management of the AVC bit stream. This standard is compatible with the MP4 File Format [3] and the 3GPP File Format [4]. All these standards are derived from the ISO Base Media File Format [5] defined by MPEG. The storage of MVC video streams is referred to as the MVC file format.
In the MVC file format, a multi-view video stream is represented by one or more video tracks in a file. Each track represents one or more views of the stream. The MVC file format comprises, in addition to the encoded multi-view video data itself, metadata to be used when processing the video data. For instance, each view has an associated view identifier implying that the MVC Network Abstraction Layer (NAL) units within one view all have the same view identifier, i.e. the same value of the view_id fields in the MVC NAL unit header extensions. The MVC NAL unit header extension also comprises a priority_id field specifying a priority identifier for the NAL unit. In the proposed standards [6], a lower value of the priority_id specifies a higher priority. The priority_id is used for defining the NAL unit priority and is dependent on the bit stream as it reflects the inter-coding relationship of the video data from different views.
The priority identifiers used today merely specify inter-coding relationships of the video data from the camera views provided in the MVC file. Such encoding-related priorities are, though, of limited use for achieving a content-based processing of the video data from the different camera views.
The present embodiments overcome these and other drawbacks of the prior art arrangements.
It is a general objective to provide multi-view media data that can be more efficiently processed.
This and other objectives are met by the embodiments as defined by the accompanying patent claims.
Briefly, a present embodiment involves generating multi-view media data by providing encoded media data representative of multiple media views of a scene. Each of the media views is associated with a respective structural priority identifier. The structural priority identifier is representative of the encoding inter-relationship of the media data of the associated media view relative media data of at least another media view. Thus, the structural priority identifiers are dependent on the bit stream in so far that they relate to the encoding of the media data and provide instructions of the hierarchical level of inter-view predictions used in the media data encoding.
A content priority identifier is determined for each media view of at least a portion of the multiple media views. In clear contrast to the structural priority identifiers, a content priority identifier is representative of the rendering importance level of the media data of the associated media view. The determined content priority identifier is associated to the relevant media view, for instance by being included in one or more data packets carrying the media data of the media view or being connected to a view identifier indicative of the media view.
The encoded media data may optionally be included as one or more media tracks of a media container file. The structural priority identifiers and the content priority identifiers are then included as metadata applicable to the media track or tracks during processing of the media data.
The content priority identifiers allow a selective and differential content-based processing of the multi-view media data at a data processing device. In such a case, a media data subset of the encoded media data is selected based on the content priority identifiers and preferably also based on the structural priority identifiers. Processing of media data is then solely applied to the selected media data subset or another type of media data processing is used for the selected media data subset as compared to remaining media data.
The embodiments together with further objects and advantages thereof, may best be understood by making reference to the following description taken together with the accompanying drawings, in which:
Throughout the drawings, the same reference characters will be used for corresponding or similar elements.
The present embodiments generally relate to generation and processing of so-called multi-view media data and in particular to provision of priority information and usage of such priority information in connection with the media data processing.
Multi-view media data implies that multiple media views of a media content are available, where each such media view generates media data representative of the media content but from one of multiple available media views. A typical example of such multi-view media is multi-view video. In such a case, multiple cameras or other media recording/creating equipment or an array of multiple such cameras are provided relative a scene to record. As the cameras have different positions relative the content and/or different pointing directions and/or focal lengths, they thereby provide alternative views for the content.
As is well known in the art, video data encoding is typically based on relative pixel predictions, such as in H.261, H.263, MPEG-4 and H.264. In H.264 there are three pixel prediction methods utilized, namely intra, inter and bi-prediction. Intra prediction provides a spatial prediction of a current pixel block from previously decoded pixels of the current frame. Inter prediction gives a temporal prediction of the current pixel block using a corresponding but displaced pixel block in a previously decoded frame. Bi-directional prediction gives a weighted average of two inter predictions. Thus, intra frames do not depend on any previous frame in the video stream, whereas inter frames, including such inter frames with bi-directional prediction, use motion compensation from one or more other reference frames in the video stream.
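By way of a non-limiting illustration, the weighted average underlying bi-directional prediction can be sketched as follows. The sketch assumes equal weights with rounding to the nearest integer; the function name `bi_predict` and the sample values are hypothetical and are not taken from any standard text.

```python
def bi_predict(pred_0, pred_1):
    """Combine two inter predictions of the same block, sample by sample.

    Assumes equal weights; (a + b + 1) >> 1 is the integer average
    rounded half up.
    """
    return [
        [(a + b + 1) >> 1 for a, b in zip(row_0, row_1)]
        for row_0, row_1 in zip(pred_0, pred_1)
    ]


# Two hypothetical 2x2 predictions of the current pixel block, each obtained
# by motion compensation from a different reference frame.
pred_0 = [[100, 102], [104, 106]]
pred_1 = [[110, 108], [100, 107]]
bi_predicted_block = bi_predict(pred_0, pred_1)
```

Unequal weights could be accommodated by replacing the fixed average with a per-reference weighting.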
Multi-view video coding has taken this prediction-based encoding one step further by not only allowing predictions between frames from a single camera view but also inter view prediction. Thus, a reference frame can be a frame of a same relative time instance but belonging to another camera view as compared to a current frame to encode. A combination of inter-view and intra-view prediction is also possible thereby having multiple reference frames from different camera views.
This concept of having multiple media views and inter-encoding of the media data from the media views is not necessarily limited to video data. In clear contrast, the concept of multi-view media can also be applied to other types of media, including for instance graphics, e.g. Scalable Vector Graphics (SVG). Actually, embodiments of the invention can be applied to any media type that can be represented in the form of multiple media views and where media encoding can be performed at least partly between the media views.
In the art and as disclosed in the MVC standard draft [6], priority in the form of so-called priority_id is included in the NAL unit header. In a particular case, all the NAL units belonging to a particular view could have the same priority_id, thus giving a sole prior art priority identifier per view. These prior art priority identifiers can be regarded as so-called structural priority identifiers since the priority identifiers are indicative of the encoding inter-relationship of the media data from the different media views. For instance and with reference to
In Table I as in the proposed standard [6], a lower value of the structural priority identifier specifies a higher priority. Thus, the base view 26 is given the lowest structural priority identifier, and the two camera views 24, 28 that are encoded in dependency on the base view 26 have the next lowest structural priority identifier. The camera view 22 that is encoded in dependency on one of the camera views 24, 28 with the second lowest structural priority identifier therefore has the highest structural priority identifier of the four camera views in this example.
The structural priority identifiers are, thus, dependent on the bit stream as they reflect the inter-coding relationship of the video data from different camera views. The embodiments provide and use an alternative form of priority identifiers that are applicable to multi-view media and are instead content dependent.
This multi-view media data provision of step S1 can be implemented by fetching the media data from an accessible media memory, in which the media data previously has been entered. Alternatively, the media data is received from some other external unit, where the media data has been stored, recorded or generated. A further possibility is to actually create and encode the media data, such as recording a video sequence or synthetically generating the media data.
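As a non-limiting sketch of how structural priority identifiers follow from the encoding hierarchy, each camera view can be given an identifier equal to its depth in the inter-view prediction hierarchy. The dependency map below is an assumption made for illustration (in particular that camera view 22 is predicted from camera view 24), and the resulting identifier values are hypothetical rather than values mandated by the proposed standard.

```python
def structural_priority_ids(references):
    """Map each view to its depth in the inter-view prediction hierarchy.

    `references` maps a view to the views it is predicted from. A base view
    (no references) gets 0; the map is assumed to be free of cycles.
    """
    depths = {}

    def depth(view):
        if view not in depths:
            refs = references[view]
            depths[view] = 0 if not refs else 1 + max(depth(r) for r in refs)
        return depths[view]

    for view in references:
        depth(view)
    return depths


# Hypothetical dependency map for the four camera views 22-28.
references = {26: [], 24: [26], 28: [26], 22: [24]}
structural_ids = structural_priority_ids(references)
```

With this assumed map, the base view 26 receives the lowest identifier, the camera views 24 and 28 the next lowest, and the camera view 22 the highest.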
A next step S2 determines a so-called content priority identifier for a media view of the multiple available media views. In clear contrast to the structural priority identifiers that are dependent on encoding inter-relationships between the media views, the content priority identifier determined in step S2 is indicative of a rendering importance level of the media data of the media view. Thus, the content priority identifiers relate more to the actual media content and provide priorities to the media views relative to how important the media data originating from one of the media views is in relation to the media data from the other media views.
With anew reference to
Thus, from a rendering point of view, the closer the camera view is to the most interesting portion of the football field, i.e. the goal, the higher content priority and the lower the content priority identifier of the camera view.
In an alternative approach, the higher structural/content priority of a media view, the higher structural/content priority identifier value.
The determined content priority identifier from step S2 is then associated to and assigned to the relevant media view of the multiple media views in step S3. This association can be implemented by storing the content priority identifier together with a view identifier of the media view. Alternatively, the content priority identifier is stored together with the media data from the relevant media view.
The content priority identifier is determined for at least a portion of the multiple media views, which is schematically illustrated by the line L1. This means that the loop formed by steps S2 and S3 can be conducted once so that only one of the media views has a content priority identifier. Preferably, the steps S2 and S3 are conducted multiple times and more preferably once for each media view of the multiple media views. Thus, if the multi-view media data has been recorded from M media views, steps S2 and S3 can be conducted N times, where 1≦N≦M and M≧2.
The method then ends.
The content priority identifier is indicative of the rendering or play-out importance level of the media data from the media view to which the content priority identifier is associated. As was discussed above in connection with
The content priority identifiers of the embodiments can be determined by the content provider recording and/or processing, such as encoding, the multi-view media data. For instance, a manual operator can, by inspecting the recorded media data from the different media views, determine and associate content priority identifiers based on his/her opinions of which media view or views is or are regarded as being more important for a viewing user during media rendering as compared to other media views.
The content priority identifiers can also be determined automatically, i.e. without any human operations. In such a case, any of the above mentioned parameters, such as camera position, focal direction, focal length, camera resolution, can be used by a processor or algorithm for classifying the camera views into different content priority levels.
The determined content priority identifiers are, as the structural priority identifiers, typically static, implying that a single content priority identifier is associated with a camera view for the purpose of a recorded content. However, the rendering importance level of media data from different media views may sometimes change over time. In such a case, content priority identifiers can be associated with a so-called time to live value or be designed to apply for a limited period of time or for a limited amount of media frames. For instance, a media view could have a first content priority identifier for the first f media frames or the first m minutes of media content, while a second, different content priority identifier is used for the following media frames or the remaining part of the media data from that media view. This can of course be extended to a situation with more than one change between content priority identifiers for a media view.
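A minimal sketch of such an automatic classification is given below, assuming that camera position is the only parameter used and that a camera view closer to a point of interest is more important for rendering. The coordinates, the point of interest and the function name are hypothetical; a practical classifier could equally weigh in focal direction, focal length and camera resolution.

```python
import math


def content_priority_ids(camera_positions, point_of_interest):
    """Rank views by distance to the point of interest.

    The closest view gets identifier 0, i.e. the highest content priority
    under the convention that a lower value means a higher priority.
    """
    ranked = sorted(
        camera_positions,
        key=lambda view: math.dist(camera_positions[view], point_of_interest),
    )
    return {view: rank for rank, view in enumerate(ranked)}


# Hypothetical positions (in metres) of the cameras behind views 22-28,
# with the goal placed at the origin.
camera_positions = {22: (10.0, 0.0), 24: (30.0, 0.0), 26: (50.0, 0.0), 28: (70.0, 0.0)}
content_ids = content_priority_ids(camera_positions, point_of_interest=(0.0, 0.0))
```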
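The time-limited identifiers described above can, as a non-limiting illustration, be represented as a schedule of frame intervals per media view. The class name, the schedule layout and the frame numbers below are assumptions made for this sketch only.

```python
from bisect import bisect_right


class TimedContentPriority:
    """Content priority identifiers that change at given frame numbers."""

    def __init__(self, schedule):
        # schedule: iterable of (first_frame, content_priority_id) pairs;
        # each identifier applies until the next first_frame is reached.
        ordered = sorted(schedule)
        self._starts = [start for start, _ in ordered]
        self._ids = [priority_id for _, priority_id in ordered]

    def at_frame(self, frame):
        """Return the identifier that applies to the given media frame."""
        index = bisect_right(self._starts, frame) - 1
        if index < 0:
            raise ValueError("no content priority identifier applies to this frame")
        return self._ids[index]


# Hypothetical schedule: identifier 2 for the first 1500 frames, then 0.
view_priority = TimedContentPriority([(0, 2), (1500, 0)])
```

Further changes between identifiers are handled by appending further pairs to the schedule.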
The media container file can be regarded as a complete input package that is used by a media server during a media session for providing media content and forming media data into transmittable data packets. Thus, the container file preferably comprises, in addition to the media content per se, information and instructions required by the media server for performing the processing and allowing transmission of the media content during a media session.
In an embodiment, each media view has a separately assigned media track of the container file, thereby providing a one-to-one relationship between the number of media views and the number of media tracks. Alternatively, the media data of at least two, possibly all, media views can be housed in a single media track of the media container file.
The respective media data of the multiple media views, irrespective of being organized into one or more media tracks, is preferably assigned respective view identifiers associated with the media views.
A next step S11 of
The association can be in the form of a pointer from the storage location of the media data of the media view within the media container file to the storage location of the structural priority identifier, or vice versa. This pointer or metadata therefore enables, given the particular media data or its location within the media container file, identification of the associated structural priority identifier or the storage location of the structural priority identifier within the file. Instead of employing a pointer, the metadata can include a view identifier of the media data/media view. The metadata is then used to identify the media data to which the structural priority identifier applies.
The next step S12 of
A non-limiting example of providing content priority identifiers to a media container file is to include a box “vipr” in Sample Group Description Box of the media container file [6].
Alternatively, the box “vipr” could be provided in the Sample Entry of the media container file.
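Purely for illustration, the sketch below serializes and parses such a box using the generic box header of the ISO Base Media File Format [5], i.e. a 32-bit big-endian size followed by a four-character type. The payload layout (a 16-bit entry count followed by a 16-bit view identifier and a 32-bit content priority identifier per entry) is an assumption made for this sketch and should not be read as the syntax defined for the box.

```python
import struct


def build_vipr_box(priorities):
    """Serialize {view_id: content_priority_id} with a hypothetical payload."""
    payload = struct.pack(">H", len(priorities))
    for view_id, content_priority_id in sorted(priorities.items()):
        payload += struct.pack(">HI", view_id, content_priority_id)
    # Box header: total size (header included) and four-character type.
    return struct.pack(">I4s", 8 + len(payload), b"vipr") + payload


def parse_vipr_box(data):
    """Parse a box produced by build_vipr_box back into a dictionary."""
    size, box_type = struct.unpack_from(">I4s", data, 0)
    if box_type != b"vipr" or size != len(data):
        raise ValueError("not a well-formed 'vipr' box")
    (count,) = struct.unpack_from(">H", data, 8)
    entries = {}
    offset = 10
    for _ in range(count):
        view_id, content_priority_id = struct.unpack_from(">HI", data, offset)
        entries[view_id] = content_priority_id
        offset += 6
    return entries


# Hypothetical content priority identifiers for the camera views 22-28.
box = build_vipr_box({22: 0, 24: 1, 26: 2, 28: 3})
```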
The additional steps S10 to S12 of
The structural and content priority identifiers included in the media container file in addition to the media tracks can be regarded as metadata that can be used during processing of the multi-view media data in the media tracks. Thus, the priority identifiers are applicable to and useful as additional data for facilitating the processing of the formed media container file as is further described herein.
A priority assigner 130 is implemented in the media generating device 100 for assigning content priority identifiers to one or more of the multiple media views. The content priority identifiers are indicative of the rendering importance levels of the media data of the multiple media views. The priority assigner 130 may receive the content priority identifiers from an external source, such as through the receiver 110. Alternatively, the content priority identifiers can be input manually by a content creator, in which case the priority assigner 130 includes or is connected to a user input and fetches the content priority identifiers from the user input.
In a further embodiment, the media generating device 100 comprises a priority determiner 150 connected to the priority assigner 130. The priority determiner 150 is arranged for determining a content priority identifier for at least one media view of the multiple media views. The priority determiner 150 preferably uses input parameters, such as from the media engine, the media provider 120, the receiver 110 or a user input, relating to the cameras 12-18 or equipment used for recording or generating the multi-view media data. These input parameters include at least one of camera position relative recorded scene, focal direction, focal length and camera resolution.
The determined content priority identifiers are forwarded from the priority determiner 150 to the priority assigner 130, which assigns them to the respective media views. Each media view therefore preferably receives an assigned content priority identifier by the priority assigner 130, though other embodiments merely assign the content priority identifiers to a subset of at least one media view of the multiple media views.
An optional track organizer 160 is provided in the media generating device 100 and is operated if the multi-view media data from the media provider 120 is to be organized into a media container file. In such a case, the track organizer 160 organizes the encoded media data from the media provider 120 as at least one media track in the media container file.
A priority organizer 170 is preferably implemented in the media generating device 100 for organizing priority identifiers in the media container file. The priority organizer 170 therefore associatively organizes the structural priority identifiers and the content priority identifiers in the media container file relative the one or more media tracks. In such a case, the priority organizer 170 preferably stores each of the structural and content priority identifiers together with a respective view identifier representing the media view and media data to which the structural or content priority identifier applies.
The media container file generated according to an embodiment of the media generating device 100 can be entered in the media memory 140 for a later transmission to an external unit that is to forward or process the media container file. Alternatively, the media container file can be directly transmitted to this external unit, such as a media server, transcoder or user terminal with media rendering or play-out facilities.
The units 110-130 and 150-170 of the media generating device 100 may be provided in hardware, software or a combination of hardware and software. The media generating device 100 may advantageously be arranged in a network node of a wired or preferably wireless, radio-based communication system. The media generating device 100 can constitute a part of a content provider or server or can be connected thereto.
The content priority identifiers determined and assigned to multi-view media data as discussed above provide improved content-based processing of the multi-view media data as compared to corresponding multi-view media data that merely has assigned structural priority identifiers.
For instance, assume a video recording arrangement as illustrated in
Assume a situation where media data corresponding to one of the media views has to be pruned and discarded, for instance, due to limited storage capability and/or limited bandwidth when transmitting the encoded multi-view media data.
According to the prior art techniques, media data is discarded based solely on the structural priority identifiers. This means that the media data from the media view 22 and camera 12 being positioned closest to one of the goals of the football field will be discarded as it has the highest assigned structural priority identifier and therefore the lowest structural priority. However, in reality this camera view 22 is typically regarded as being the most important one as it is closer to the goal and is the only camera view of the four illustrated camera views 22-28 that will capture any goal made during the football match.
However, by also utilizing the content priority identifiers in the media processing, i.e. media pruning in this example, a more correct media processing from a media rendering point of view is achieved. Thus, using only the content priority identifiers or indeed both the content priority identifiers and the structural priority identifiers in the media pruning, the media data originating from the media view 28 will be discarded as it has the highest content priority identifier and also the highest total priority identifier, i.e. content priority identifier plus structural priority identifier.
Removing media data from the media view 28 instead of the media view 22 closest to the goal is much more preferred from a viewing user's point of view when the scoring of a goal is regarded as the most interesting part to see of a football match.
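The pruning example can be sketched as follows. The identifier values are hypothetical; they are merely chosen to be consistent with the description, i.e. the media view 22 has the highest structural priority identifier and the media view 28 has the highest content priority identifier.

```python
def view_to_discard(*priority_tables):
    """Return the view with the highest summed priority identifier.

    A higher identifier means a lower priority, so this view is pruned first.
    """
    views = priority_tables[0].keys()
    return max(views, key=lambda view: sum(table[view] for table in priority_tables))


# Hypothetical identifiers for the camera views 22-28.
structural = {22: 2, 24: 1, 26: 0, 28: 1}
content = {22: 0, 24: 1, 26: 2, 28: 3}

discarded_structural_only = view_to_discard(structural)      # prior art
discarded_content_only = view_to_discard(content)
discarded_total = view_to_discard(structural, content)       # total priority
```

Under these assumed values, pruning on structural priority alone removes the media view 22 closest to the goal, whereas pruning on content priority, or on the total priority identifier, removes the media view 28.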
The next step S21 selects a media data subset of the received multi-view media data. Thus, this step S21 selects media data corresponding to a subset of the multiple media views. As a consequence, step S21 selects media data from P media views, where 1≦P<M and M represents the total number of media views for the present multi-view media data.
The subset selection is furthermore performed at least partly based on the at least one content priority identifier associated with the media views. Step S21 can be conducted solely based on the content priority identifiers but is preferably also based on the structural priority identifiers. This is in particular advantageous when pruning or discarding media data as otherwise media data from a base view could be discarded when only regarding the content priority identifiers, thereby making the remaining media data undecodable.
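One possible way of combining the two kinds of information when pruning, given as a sketch only, is to restrict the candidates to media views that no remaining media view uses as a reference and then to discard the candidate with the highest content priority identifier. The dependency map and the identifier values are hypothetical; the content priority identifiers are deliberately chosen so that the base view would be discarded if only they were regarded.

```python
def prunable_views(references, remaining):
    """Views in `remaining` that no other remaining view is predicted from."""
    needed = {
        ref for view in remaining for ref in references[view] if ref in remaining
    }
    return [view for view in remaining if view not in needed]


def prune_one(references, content, remaining):
    """Discard the least important view whose removal keeps the rest decodable."""
    candidates = prunable_views(references, remaining)
    victim = max(candidates, key=lambda view: content[view])
    return [view for view in remaining if view != victim]


# Hypothetical inter-view dependencies and content priority identifiers.
references = {26: [], 24: [26], 28: [26], 22: [24]}
content = {22: 0, 24: 1, 28: 2, 26: 3}

kept = prune_one(references, content, remaining=[22, 24, 26, 28])
```

Here the base view 26 has the highest content priority identifier but is retained because the media views 24 and 28 depend on it; the media view 28 is discarded instead.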
The selected media data subset from step S21 is further processed in step S22. Thus, the content priority identifier of the embodiments is used to classify media data from different views to thereby achieve a differential media data processing by processing only a subset of the media data or optionally applying at least one other form of processing to remaining media data of the multi-view media data.
The method then ends.
In
Step S30 of
If a terminal has rendering capability, such as a media player, but cannot or selects not to decode and render all the multi-view media data, the content priority identifiers can be used for selecting the media data subset to decode and render in step S50 of
Data protection is often applied to media data and data packets transmitted over radio-based networks to combat the deleterious effects of fading and interferences. Generally, the higher level of data protection, the more extra or overhead data is needed. It is therefore a balance between protection level and extra overhead. The content priority identifiers can advantageously be used as a basis for identifying the media data in a multi-view arrangement that should have the highest level of data protection. Thus, media data that have low content priority identifiers and are therefore regarded as being of high rendering importance can have a first level of data protection in step S60 of
Examples of such data protection that can be used in connection with this embodiment are Forward Error Correction (FEC), checksum, Hamming code, Cyclic Redundancy Check (CRC), etc., which are suitable for real time transmission as errors can be detected, and in the case of FEC and Hamming codes also corrected, without any retransmission.
For non-real time applications, Automatic Repeat Request (ARQ), such as in TCP/IP (Transmission Control Protocol/Internet Protocol), where retransmissions are required when errors occur, can also be used for providing data protection.
Encryption is another type of high level data protection that could be considered herein. In such a case, the content priority identifiers can be used to determine the strength of encryption protection to apply.
The content priority can also be used for providing a differential charging of provided media content. Thus, media data from media views that are regarded as being of higher rendering relevance and importance for buying viewers can be charged differently, i.e. at a higher cost, than less important media data, which has comparatively higher content priority identifiers. This concept is illustrated in
The data processing device 200 comprises a receiver 210 for receiving encoded media data representative of multiple media views of a media content. The media data, carried in a number of data packets, may be in the form of a media container file comprising, in addition to the encoded media data in at least one media track, metadata applicable during processing of the media data. This metadata comprises, among others, the structural and content priority identifiers described herein. If the multi-view media data is not provided in the form of a media container file, the media data from each media view comprises in at least one of its data packets, such as in the header thereof, the structural and content priority identifier applicable to that media view.
The data processing device 200 also comprises a media selector 220 arranged for selecting a media data subset of the received multi-view media data. The media selector 220 retrieves the content priority identifiers for the different media views associated with the media data and preferably also retrieves the structural priority identifiers. The media selector 220 uses the retrieved content priority identifiers and preferably the structural priority identifiers for identifying and selecting the particular media data subset to further process.
The further processing of the media data of the selected media data subset may be conducted by the data processing device 200 itself or by a further device connected thereto. For instance, the data processing device 200 can comprise a media pruner 250 for pruning and discarding media data corresponding to one or a subset of all media views of the multi-view media data. The media pruner 250 then prunes the media data subset selected by the media selector 220 based at least partly on the content priority identifiers.
The pruning of the media data may be required to reduce the total bit size of the multi-view media data when storing it on a media memory 230 or reducing the bandwidth when transmitting it by a transmitter 210 of the data processing device 200.
The data processing device 200 can be adapted for decoding the received media data and then render it on an included or connected display screen 280. In such a case, a decoder 245 could operate to only decode the media data subset selected by the media selector 220. The decoded media data is rendered by a media player 240 and is therefore displayed on the display screen 280. In an alternative approach, the decoder 245 may decode more media data than the selected media data subset. However, the media player 240 merely renders the media data corresponding to the media data subset selected by the media selector 220. Any non-rendered but decoded media data could be required for decoding at least some of the media data in the selected media data subset due to any inter-view predictive encoding/ decoding.
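The distinction between the media data that is decoded and the media data that is rendered can be sketched as a transitive closure over the inter-view references. The dependency map is hypothetical and the function name is chosen for this illustration only.

```python
def views_to_decode(references, selected):
    """Selected views plus every view they directly or indirectly reference."""
    to_decode = set()
    stack = list(selected)
    while stack:
        view = stack.pop()
        if view not in to_decode:
            to_decode.add(view)
            stack.extend(references[view])
    return to_decode


# Hypothetical inter-view dependencies for the camera views 22-28.
references = {26: [], 24: [26], 28: [26], 22: [24]}

rendered = {22}                                   # subset chosen by the selector
decoded = views_to_decode(references, rendered)   # what the decoder must handle
```

With these assumed dependencies, only the media view 22 is rendered, while the media views 24 and 26 are decoded but not rendered.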
The units 210, 220, 240 and 250 of the data processing device 200 may be provided in hardware, software or a combination of hardware and software.
A protection applier 360 is optionally provided in the data processing device for applying differential levels of data protection to the data packets carrying the multi-view media data. This differential data protection allows the protection applier to apply a first level of data protection to data packets carrying media data of the media data subset selected by the media selector 320. Correspondingly, a second, different or multiple different levels of data protection are then applied to the data packets carrying the remainder of the media data.
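As a non-limiting sketch, the differential protection can be expressed as a number of redundancy packets per media view, with a larger overhead for the selected media data subset. The overhead percentages, the selection threshold and the packet count are hypothetical values chosen for illustration, and the sketch does not implement any particular FEC scheme.

```python
def repair_packets(source_packets, overhead_percent):
    """Number of redundancy packets for a given overhead, rounded up."""
    return (source_packets * overhead_percent + 99) // 100


def protection_plan(content_priority, selected_views, source_packets,
                    first_level_percent=50, second_level_percent=10):
    """Assign the first protection level to the selected views only."""
    plan = {}
    for view in content_priority:
        percent = first_level_percent if view in selected_views else second_level_percent
        plan[view] = repair_packets(source_packets, percent)
    return plan


# Hypothetical content priority identifiers; views with identifier 0 or 1
# form the selected subset that receives the first level of protection.
content_priority = {22: 0, 24: 1, 26: 2, 28: 3}
selected_views = {view for view, pid in content_priority.items() if pid <= 1}
plan = protection_plan(content_priority, selected_views, source_packets=20)
```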
An optional charging applier 370 can be arranged in the data processing device 300 for providing charging information applicable to the multi-view media data. A differentiated cost for media data from different media views is then preferably used by the charging applier 370 using the content priority identifiers. Thus, the charging applier 370 determines a first charging cost for the media data of the media data subset selected by the media selector 320. At least a second, different charging cost is correspondingly determined for the remainder of the media data.
The units 310, 320 and 350-370 of the data processing device 300 may be provided in hardware, software or a combination of hardware and software.
In
It will be understood by a person skilled in the art that various modifications and changes may be made to the present invention without departure from the scope thereof, which is defined by the appended claims.
[1] ISO/IEC JTC1/SC29/WG11—Coding of Moving Pictures and Audio, MPEG-4 Overview, July 2000
[2] ISO/IEC 14496-15:2004—Information Technology, Coding of Audio-Visual Objects, Part 15: Advanced Video Coding (AVC) File Format
[3] ISO/IEC 14496-14:2003—Information Technology, Coding of Audio-Visual Objects, Part 14: MP4 File Format
[4] 3GPP TS 26.244 V7.3.0—3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Transparent end-to-end packet switched streaming service (PSS); 3GPP file format, 2007
[5] ISO/IEC 14496-12:2005—Information Technology, Coding of Audio-Visual Objects, Part 12: ISO Base Media File Format
[6] ISO/IEC 14496-15, Working Draft 2.0 MVC File Format, July 2008, Hannover, Germany, Document No. 10062
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/SE2008/051459 | 12/15/2008 | WO | 00 | 4/5/2011

Number | Date | Country
---|---|---
61103399 | Oct 2008 | US