This application is a U.S. National Phase of International Patent Application No. PCT/JP2015/075318 filed on Sep. 7, 2015, which claims priority benefit of Japanese Patent Application No. JP 2014-187085 filed in the Japan Patent Office on Sep. 12, 2014. Each of the above-referenced applications is hereby incorporated herein by reference in its entirety.
The present technology relates to a transmission device, a transmission method, a reception device, and a reception method, and more particularly relates to a transmission device and the like related to a technology for transmitting audio data of a plurality of types.
Conventionally, as a stereophonic (3D) sound technology, there has been proposed a technology for rendering by mapping encoded sample data to a speaker located at an arbitrary position on the basis of metadata (for example, see Patent Document 1).
When object encoded data composed of encoded sample data and metadata is transmitted together with channel encoded data such as 5.1-channel or 7.1-channel data, sound reproduction with a more realistic surround effect can be provided on the reception side.
An object of the present technology is to reduce the processing load on the reception side when a plurality of types of encoded data is transmitted.
A concept of the present technology lies in a transmission device including:
a transmitting unit configured to transmit a metafile having meta information used to acquire, in a reception device, a predetermined number of audio streams including a plurality of groups of encoded data; and
an information inserting unit configured to insert, into the metafile, attribute information indicating each attribute of the encoded data of the plurality of groups.
In the present technology, a transmitting unit transmits a metafile having meta information used to acquire, in a reception device, a predetermined number of audio streams including a plurality of groups of encoded data. For example, the encoded data of the plurality of groups may include one of or both of channel encoded data and object encoded data.
An information inserting unit inserts, into the metafile, attribute information indicating each attribute of the encoded data of the plurality of groups. For example, the metafile may be a media presentation description (MPD) file. In this case, for example, the information inserting unit may insert the attribute information into the metafile by using “SupplementaryDescriptor.”
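As an illustration only (the element name and scheme identifiers follow the examples given later in this description, and the helper function is hypothetical), such attribute information could be attached to an MPD element as follows:

```python
import xml.etree.ElementTree as ET

# Build an AdaptationSet for one audio stream and attach
# SupplementaryDescriptor elements carrying the attribute information
# of its group. The schemeIdUri values follow the examples in this
# description; the helper function is hypothetical.
adaptation_set = ET.Element(
    "AdaptationSet", {"mimeType": "audio/mp4", "group": "1"})

def add_supplementary_descriptor(parent, scheme_id_uri, value):
    """Insert one SupplementaryDescriptor element into the metafile."""
    ET.SubElement(parent, "SupplementaryDescriptor",
                  {"schemeIdUri": scheme_id_uri, "value": value})

add_supplementary_descriptor(
    adaptation_set, "urn:brdcst:3dAudio:groupId", "group1")
add_supplementary_descriptor(
    adaptation_set, "urn:brdcst:3dAudio:attribute", "channeldata")

print(ET.tostring(adaptation_set, encoding="unicode"))
```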
Further, for example, the transmitting unit may transmit the metafile via an RF transmission path or a communication network transmission path. Further, for example, the transmitting unit may further transmit a container in a predetermined format having the predetermined number of audio streams including the plurality of groups of the encoded data. The container is an MP4, for example. In the present specification, the MP4 indicates an ISO base media file format (ISOBMFF) (ISO/IEC 14496-12:2012).
In this manner, according to the present technology, attribute information indicating each attribute of the encoded data of the plurality of groups is inserted into the metafile having the meta information used to acquire, in the reception device, the predetermined number of audio streams including the plurality of groups of encoded data. Thus, the reception side can easily recognize each attribute of the encoded data of the plurality of groups before the relevant encoded data is decoded, so that the encoded data of a necessary group can be selectively decoded and used, and the processing load can be reduced.
Here, according to the present technology, for example, the information inserting unit may further insert, into the metafile, stream correspondence relation information indicating in which audio stream the encoded data of each of the plurality of groups is included. In this case, for example, the stream correspondence relation information may be information indicating a correspondence relation between group identifiers that identify the respective pieces of encoded data of the plurality of groups and identifiers that identify the respective streams of the predetermined number of audio streams. In this case, the reception side can easily recognize the audio stream including the encoded data of the necessary group, and this can reduce the processing load.
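Purely for illustration (the group and stream identifier values below are hypothetical), the stream correspondence relation information can be modeled as a mapping from group identifiers to stream identifiers, from which the streams needed for a set of groups are recovered:

```python
# Hypothetical stream correspondence relation information: a mapping
# from group identifiers to the identifier of the audio stream in
# which the encoded data of that group is included.
stream_correspondence = {
    "group1": "stream1",  # channel encoded data
    "group2": "stream1",  # immersive audio object encoded data
    "group3": "stream2",  # speech dialog object, first language
    "group4": "stream2",  # speech dialog object, second language
}

def streams_for_groups(correspondence, needed_groups):
    """Return the set of audio streams that must be acquired so that
    the encoded data of every needed group is available."""
    return {correspondence[g] for g in needed_groups}

print(streams_for_groups(stream_correspondence, ["group1", "group3"]))
```

With such a mapping, the reception side can restrict acquisition and decoding to the streams actually containing the necessary groups.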
Further, another concept of the present technology lies in a reception device including:
a receiving unit configured to receive a metafile having meta information used to acquire, in the reception device, a predetermined number of audio streams including a plurality of groups of encoded data,
the metafile including inserted attribute information that indicates each attribute of the encoded data of the plurality of groups; and
a processing unit configured to process the predetermined number of audio streams on the basis of the attribute information.
According to the present technology, a receiving unit receives the metafile. The metafile includes meta information used to acquire, in the reception device, a predetermined number of audio streams including a plurality of groups of encoded data. For example, the encoded data of the plurality of groups may include one of or both of channel encoded data and object encoded data. To the metafile, attribute information indicating each attribute of the encoded data of the plurality of groups is inserted. A processing unit processes the predetermined number of audio streams on the basis of the attribute information.
In this manner, according to the present technology, the process on the predetermined number of audio streams is performed on the basis of the attribute information indicating each attribute of the encoded data of the plurality of groups inserted in the metafile. Thus, only the encoded data of the necessary group can be selectively decoded and used, and this can reduce the processing load.
Here, according to the present technology, for example, the metafile may further include stream correspondence relation information indicating in which audio stream the encoded data of each of the plurality of groups is included, and the processing unit may process the predetermined number of audio streams on the basis of the stream correspondence relation information as well as the attribute information. In this case, the audio stream that includes the encoded data of the necessary group can be easily recognized, and this can reduce the processing load.
Further, according to the present technology, for example, the processing unit may selectively perform a decode process on the audio stream including encoded data of a group having an attribute compatible with a speaker configuration and user selection information on the basis of the attribute information and the stream correspondence relation information.
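A minimal sketch of such a selection, assuming the attribute values used in the examples of this description and a hypothetical flag for speaker-configuration compatibility:

```python
# Attribute information for each group, as it might be recovered from
# the metafile (values follow the examples in this description).
group_attributes = {
    "group1": {"attribute": "channeldata", "switch_group": 0},
    "group2": {"attribute": "objectSound", "switch_group": 0},
    "group3": {"attribute": "objectLang1", "switch_group": 1},
    "group4": {"attribute": "objectLang2", "switch_group": 1},
}

def select_groups(attributes, immersive_capable, selected_language_attr):
    """Select the groups to decode: channel data is always taken,
    immersive object data only when the speaker configuration is
    compatible, and exactly one member of the switch group according
    to the user selection information."""
    selected = []
    for group, info in attributes.items():
        attr = info["attribute"]
        if attr == "channeldata":
            selected.append(group)
        elif attr == "objectSound" and immersive_capable:
            selected.append(group)
        elif info["switch_group"] == 1 and attr == selected_language_attr:
            selected.append(group)
    return selected

print(select_groups(group_attributes, True, "objectLang1"))
```

Only the audio streams containing the selected groups then need to undergo the decode process.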
Further, still another concept of the present technology lies in a reception device including:
a receiving unit configured to receive a metafile having meta information used to acquire, in the reception device, a predetermined number of audio streams including a plurality of groups of encoded data,
the metafile including inserted attribute information indicating each attribute of the encoded data of the plurality of groups;
a processing unit configured to selectively acquire encoded data of a predetermined group from the predetermined number of audio streams on the basis of the attribute information, and reconfigure an audio stream including the encoded data of the predetermined group; and
a stream transmitting unit configured to transmit the reconfigured audio stream to an external device.
According to the present technology, a receiving unit receives the metafile. The metafile includes meta information used to acquire, in the reception device, a predetermined number of audio streams including a plurality of groups of encoded data. To the metafile, the attribute information indicating each attribute of the encoded data of the plurality of groups is inserted.
A processing unit selectively acquires the encoded data of a predetermined group from the predetermined number of audio streams on the basis of the attribute information, and reconfigures an audio stream including the encoded data of the predetermined group. Then, a stream transmitting unit transmits the reconfigured audio stream to an external device.
In this manner, according to the present technology, on the basis of the attribute information indicating each attribute of the encoded data of the plurality of groups, which is inserted in the metafile, the encoded data of the predetermined group is selectively acquired from the predetermined number of audio streams, and an audio stream to be transmitted to the external device is reconfigured. The encoded data of the necessary group can thus be easily acquired, and this can reduce the processing load.
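The following sketch is purely illustrative (the stream contents and identifiers are hypothetical): each received audio stream is modeled as a list of (group, payload) pairs, and the reconfigured stream for the external device keeps only the encoded data of the predetermined groups:

```python
# Hypothetical model: each received audio stream is a list of
# (group identifier, encoded payload) pairs.
received_streams = {
    "stream1": [("group1", b"channel-data"), ("group2", b"immersive")],
    "stream2": [("group3", b"lang1"), ("group4", b"lang2")],
}

def reconfigure(streams, wanted_groups):
    """Selectively acquire the encoded data of the wanted groups and
    reconfigure a single audio stream containing only that data."""
    out = []
    for payload in streams.values():
        out.extend(p for p in payload if p[0] in wanted_groups)
    return out

print(reconfigure(received_streams, {"group1", "group3"}))
```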
Here, according to the present technology, for example, stream correspondence relation information indicating in which audio stream the encoded data of each of the plurality of groups is included is further inserted into the metafile, and the processing unit may selectively acquire the encoded data of the predetermined group from the predetermined number of audio streams on the basis of the stream correspondence relation information as well as the attribute information. In this case, the audio stream including the encoded data of the predetermined group can be easily recognized, and this can reduce the processing load.
According to the present technology, the processing load on the reception side can be reduced when a plurality of types of encoded data is transmitted. Here, the effect described in this specification is only an example and does not set any limitation, and there may be additional effects.
In the following, modes (hereinafter, referred to as “embodiments”) for carrying out the invention will be described. It is noted that the descriptions will be given in the following order.
1. Embodiment
2. Modified Examples
[Overview of MPEG-DASH-Based Stream Delivery System]
First, an overview of an MPEG-DASH-based stream delivery system to which the present technology can be applied will be described.
The DASH stream file server 31 generates a stream segment (hereinafter, referred to appropriately as a “DASH segment”) of a DASH specification on the basis of media data (video data, audio data, subtitle data, or the like) of predetermined content, and transmits the segment according to an HTTP request made from the service receiver. The DASH stream file server 31 may be a server dedicated for streaming and function as a web server as well.
Further, in response to a request of a segment of a predetermined stream transmitted from the service receiver 33 (33-1, 33-2, . . . , and 33-N) via the CDN 34, the DASH stream file server 31 transmits, via the CDN 34, the segment of the stream to the receiver as a source of the request. In this case, the service receiver 33 selects a stream of an optimal rate according to a state of a network environment in which a client is located with reference to a value of a rate described in the media presentation description (MPD) file, and makes a request.
The DASH MPD server 32 is a server that generates the MPD file for acquiring the DASH segment generated in the DASH stream file server 31. The MPD file is generated on the basis of content metadata received from a content management server (not illustrated) and an address (url) of the segment generated in the DASH stream file server 31. Here, the DASH stream file server 31 and the DASH MPD server 32 may physically be the same.
In an MPD format, each attribute is described using an element such as a representation (Representation) for each stream such as a video or an audio. For example, representations are divided for every plurality of video data streams having different rates, and each rate thereof is described in the MPD file. The service receiver 33 can select an optimal stream according to the state of the network environment in which the service receiver 33 is located in view of the value of the rate as described above.
In the case of the stream delivery system 30B, the broadcast transmission system 36 transmits a stream segment (a DASH segment) of a DASH specification generated by the DASH stream file server 31 and an MPD file generated by the DASH MPD server 32 through a broadcast wave.
As illustrated in
As illustrated in
Further, stream switching can freely be performed among a plurality of representations grouped according to the adaptation set. Thus, it is possible to select a stream of an optimal rate according to a state of a network environment in which a service receiver is located and perform seamless delivery.
[Exemplary Configuration of Transceiving System]
Further, in the transceiving system 10, the service receiver 200 corresponds to the service receivers 33 (33-1, 33-2, . . . , 33-N) in the above described stream delivery system 30A of
The service transmission system 100 transmits a DASH/MP4, that is, an MPD file serving as a metafile and an MP4 including a media stream (a media segment) such as a video or an audio via the RF transmission path (see
The header includes information such as a packet type (Packet Type), a packet label (Packet Label), and a packet length (Packet Length). Information defined by the packet type of the header is arranged in the payload. The payload information includes “SYNC” information corresponding to a synchronization start code, “Frame” information serving as actual data of 3D audio transmission data, and “Config” information indicating a configuration of the “Frame” information.
The “Frame” information includes channel encoded data and object encoded data configuring the 3D audio transmission data. Here, the channel encoded data is configured with encoded sample data such as a single channel element (SCE), a channel pair element (CPE), and a low frequency element (LFE). Further, the object encoded data is configured with the encoded sample data of the single channel element (SCE) and the metadata for mapping the encoded sample data with a speaker located at an arbitrary position and rendering the encoded sample data. The metadata is included as an extension element (Ext_element).
The two pieces of object encoded data are encoded data of an immersive audio object (IAO) and a speech dialog object (SDO). The immersive audio object encoded data is object encoded data for an immersive sound and composed of encoded sample data SCE2 and metadata EXE_El (Object metadata) 2 which is used for mapping the encoded sample data SCE2 with a speaker located at an arbitrary position and rendering the encoded sample data SCE2.
The speech dialog object encoded data is object encoded data for a spoken language. In this example, there are pieces of the speech dialog object encoded data respectively corresponding to first and second languages. The speech dialog object encoded data corresponding to the first language is composed of encoded sample data SCE3 and metadata EXE_El (Object metadata) 3 which is used for mapping the encoded sample data SCE3 with a speaker located at an arbitrary position and rendering the encoded sample data SCE3. Further, the speech dialog object encoded data corresponding to the second language is composed of encoded sample data SCE4 and metadata EXE_El (Object metadata) 4 which is used for mapping the encoded sample data SCE4 with a speaker located at an arbitrary position and rendering the encoded sample data SCE4.
The encoded data is distinguished by a concept of a group (Group) according to the data type. In the illustrated example, the encoded channel data of 5.1 channel is defined as Group 1 (Group 1), the immersive audio object encoded data is defined as Group 2 (Group 2), the speech dialog object encoded data related to the first language is defined as Group 3 (Group 3), and the speech dialog object encoded data related to the second language is defined as Group 4 (Group 4).
Further, the groups which can be switched in the receiving side are registered in a switch group (SW Group) and encoded. In the illustrated example, Group 3 and Group 4 are registered in Switch Group 1 (SW Group 1). Further, some groups can be grouped as a preset group (preset Group) and be reproduced according to a use case. In the illustrated example, Group 1, Group 2, and Group 3 are grouped as Preset Group 1, and Group 1, Group 2, and Group 4 are grouped as Preset Group 2.
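The grouping in the illustrated example can be summarized, for illustration only, with the following data structures (the helper function is hypothetical):

```python
# Illustrative model of the grouping in the example.
groups = {
    1: "channel encoded data (5.1 channel)",
    2: "immersive audio object encoded data",
    3: "speech dialog object encoded data (first language)",
    4: "speech dialog object encoded data (second language)",
}

# Groups that can be switched on the receiving side (Switch Group 1).
switch_groups = {1: [3, 4]}

# Preset groups reproduced together according to a use case.
preset_groups = {1: [1, 2, 3], 2: [1, 2, 4]}

def groups_in_preset(preset_id):
    """Return the description of every group in the given preset group."""
    return [groups[g] for g in preset_groups[preset_id]]

print(groups_in_preset(2))
```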
Referring back to
The illustrated correspondence relation indicates that the encoded data of Group 1 is channel encoded data which does not compose a switch group and is included in Audio track 1. Further, the illustrated correspondence relation indicates that the encoded data of Group 2 is object encoded data for an immersive sound (immersive audio object encoded data) which does not compose a switch group and is included in Audio track 2.
Further, the illustrated correspondence relation indicates that the encoded data of Group 3 is object encoded data for spoken language (speech dialog object encoded data) of the first language, which composes switch group 1 and is included in Audio track 3. Further, the illustrated correspondence relation indicates that the encoded data of Group 4 is object encoded data for a spoken language (speech dialog object encoded data) of the second language, which composes switch group 1 and is included in Audio track 4.
Further, the illustrated correspondence relation indicates that Preset Group 1 includes Group 1, Group 2, and Group 3. Further, the illustrated correspondence relation indicates that Preset Group 2 includes Group 1, Group 2, and Group 4.
Further, the illustrated correspondence relation indicates that the encoded data of Group 3 is object encoded data for a spoken language (speech dialog object encoded data) of the first language, which composes switch group 1 and is included in Audio track 2. Further, the illustrated correspondence relation indicates that the encoded data of Group 4 is object encoded data for a spoken language (speech dialog object encoded data) of the second language, which composes switch group 1 and is included in Audio track 2.
Further, the illustrated correspondence relation indicates that Preset Group 1 includes Group 1, Group 2, and Group 3. Further, the illustrated correspondence relation indicates that Preset Group 2 includes Group 1, Group 2, and Group 4.
Referring back to
The service transmission system 100 inserts the attribute information and the stream correspondence relation information into the MPD file. In the present embodiment, “schemeIdUri” of “SupplementaryDescriptor” can be newly defined for a broadcast or any other application, separately from the existing definition in the existing standard, and the service transmission system 100 therefore inserts the attribute information and the stream correspondence relation information into the MPD file by using “SupplementaryDescriptor.”
Firstly, the description example of the MPD file of
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:codecType” value=“mpegh”/>” indicates that a codec of the audio stream is “MPEGH (3D audio).” As illustrated in
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:groupId” value=“group1”/>” indicates that encoded data of Group 1 “group1” is included in the audio stream. As illustrated in
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:attribute” value=“channeldata”/>” indicates that the encoded data of Group 1 “group1” is channel encoded data “channeldata.” As illustrated in
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:switchGroupId” value=“0”/>” indicates that the encoded data of Group 1 “group1” does not belong to any switch group. As illustrated in
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:presetGroupId” value=“preset1”/>” indicates that the encoded data of Group 1 “group1” belongs to Preset Group 1 “preset1.” Further, a description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:presetGroupId” value=“preset2”/>” indicates that the encoded data of Group 1 “group1” belongs to Preset Group 2 “preset2.” As illustrated in
A description of “<Representation id=“1” bandwidth=“128000”>” indicates that there is an audio stream having a bit rate of 128 kbps, which includes the encoded data of Group 1 “group1” in an adaptation set of Group 1, as a representation identified by “Representation id=“1”.” Then, a description of “<baseURL>audio/jp1/128.mp4</BaseURL>” indicates that a location destination of the audio stream is “audio/jp1/128.mp4.”
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:levelId” value=“level1”/>” indicates that the audio stream is transmitted with a track corresponding to Level 1 “level1.” As illustrated in
Further, a description of “<AdaptationSet mimeType=“audio/mp4” group=“2”>” indicates that there is an adaptation set (AdaptationSet) for the audio stream, the audio stream is supplied in an MP4 file structure, and Group 2 is allocated.
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:codecType” value=“mpegh”/>” indicates that a codec of the audio stream is “MPEGH (3D audio).” A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:groupId” value=“group2”/>” indicates that encoded data of Group 2 “group2” is included in the audio stream.
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:attribute” value=“objectSound”/>” indicates that the encoded data of Group 2 “group2” is object encoded data “objectSound” for an immersive sound. A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:switchGroupId” value=“0”/>” indicates that the encoded data of Group 2 “group2” does not belong to any switch group.
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:presetGroupId” value=“preset1”/>” indicates that the encoded data of Group 2 “group2” belongs to Preset Group 1 “preset1.” A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:presetGroupId” value=“preset2”/>” indicates that the encoded data of Group 2 “group2” belongs to Preset Group 2 “preset2.”
A description of “<Representation id=“2” bandwidth=“128000”>” indicates that, there is an audio stream having a bit rate of 128 kbps, which includes the encoded data of Group 2 “group2” in an adaptation set of Group 2, as a representation identified by “Representation id=“2”.” Then, a description of “<baseURL>audio/jp2/128.mp4</BaseURL>” indicates that a location destination of the audio stream is “audio/jp2/128.mp4.” Then, a description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:levelId” value=“level2”/>” indicates that the audio stream is transmitted with a track corresponding to Level 2 “level2.”
Further, a description of “<AdaptationSet mimeType=“audio/mp4” group=“3”>” indicates that there is an adaptation set (AdaptationSet) corresponding to the audio stream, the audio stream is supplied in an MP4 file structure, and Group 3 is allocated.
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:codecType” value=“mpegh”/>” indicates that a codec of the audio stream is “MPEGH (3D audio).” A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:groupId” value=“group3”/>” indicates that encoded data of Group 3 “group3” is included in the audio stream. A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:attribute” value=“objectLang1”/>” indicates that the encoded data of Group 3 “group3” is object encoded data “objectLang1” for a spoken language of the first language.
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:switchGroupId” value=“1”/>” indicates that the encoded data of Group 3 “group3” belongs to switch group 1 (switch group 1). A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:presetGroupId” value=“preset1”/>” indicates that the encoded data of Group 3 “group3” belongs to Preset Group 1 “preset1.”
A description of “<Representation id=“3” bandwidth=“128000”>” indicates that there is an audio stream having a bit rate of 128 kbps, which includes the encoded data of Group 3 “group3” in an adaptation set of Group 3, as a representation identified by “Representation id=“3”.” Then, a description of “<baseURL>audio/jp3/128.mp4</BaseURL>” indicates that a location destination of the audio stream is “audio/jp3/128.mp4.” Then, a description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:levelId” value=“level3”/>” indicates that the audio stream is transmitted with a track corresponding to Level 3 “level3.”
Further, a description of “<AdaptationSet mimeType=“audio/mp4” group=“4”>” indicates that there is an adaptation set (AdaptationSet) corresponding to an audio stream, the audio stream is supplied in an MP4 file structure, and Group 4 is allocated.
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:codecType” value=“mpegh”/>” indicates that a codec of the audio stream is “MPEGH (3D audio).” A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:groupId” value=“group4”/>” indicates that encoded data of Group 4 “group4” is included in the audio stream. A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:attribute” value=“objectLang2”/>” indicates that the encoded data of Group 4 “group4” is object encoded data “objectLang2” for a spoken language of the second language.
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:switchGroupId” value=“1”/>” indicates that the encoded data of Group 4 “group4” belongs to switch group 1 (switch group 1). A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:presetGroupId” value=“preset2”/>” indicates that the encoded data of Group 4 “group4” belongs to Preset Group 2 “preset2.”
A description of “<Representation id=“4” bandwidth=“128000”>” indicates that there is an audio stream having a bit rate of 128 kbps, which includes the encoded data of Group 4 “group4” in an adaptation set of Group 4, as a representation identified by “Representation id=“4”.” Then, a description of “<baseURL>audio/jp4/128.mp4</BaseURL>” indicates that a location destination of the audio stream is “audio/jp4/128.mp4.” Then, a description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:levelId” value=“level4”/>” indicates that the audio stream is transmitted with a track corresponding to Level 4 “level4.”
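A receiver-side sketch, assuming an abbreviated fragment modeled on the MPD description above (only two of the four adaptation sets are reproduced), showing how the attribute information can be recognized before any audio stream is decoded:

```python
import xml.etree.ElementTree as ET

# A fragment modeled on the MPD description above, abbreviated to two
# adaptation sets; the location destinations are those of the example.
mpd_fragment = """<MPD>
  <Period>
    <AdaptationSet mimeType="audio/mp4" group="1">
      <SupplementaryDescriptor schemeIdUri="urn:brdcst:3dAudio:groupId" value="group1"/>
      <SupplementaryDescriptor schemeIdUri="urn:brdcst:3dAudio:attribute" value="channeldata"/>
      <Representation id="1" bandwidth="128000">
        <BaseURL>audio/jp1/128.mp4</BaseURL>
      </Representation>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4" group="3">
      <SupplementaryDescriptor schemeIdUri="urn:brdcst:3dAudio:groupId" value="group3"/>
      <SupplementaryDescriptor schemeIdUri="urn:brdcst:3dAudio:attribute" value="objectLang1"/>
      <Representation id="3" bandwidth="128000">
        <BaseURL>audio/jp3/128.mp4</BaseURL>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>"""

def read_attributes(mpd_text):
    """Recover, for each adaptation set, the group identifier, the
    attribute of its encoded data, and the location destination,
    without decoding any audio stream."""
    result = {}
    for aset in ET.fromstring(mpd_text).iter("AdaptationSet"):
        info = {d.get("schemeIdUri").rsplit(":", 1)[-1]: d.get("value")
                for d in aset.findall("SupplementaryDescriptor")}
        info["baseURL"] = aset.find("Representation/BaseURL").text
        result[info["groupId"]] = info
    return result

print(read_attributes(mpd_fragment))
```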
Next, the description example of the MPD file of
A description of “<Representation id=“1” bandwidth=“128000”>” indicates that there is an audio stream having a bit rate of 128 kbps in an adaptation set of Group 1, as a representation identified by “Representation id=“1”.” Then, a description of “<baseURL>audio/jp1/128.mp4</BaseURL>” indicates that a location destination of the audio stream is “audio/jp1/128.mp4.” Further, a description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:levelId” value=“level1”/>” indicates that the audio stream is transmitted with a track corresponding to Level 1 “level1.”
A description of “<SubRepresentation id=“11” subgroupSet=“1”>” indicates that there is a sub-representation identified by “SubRepresentation id=“11”” in a representation identified by “Representation id=“1”,” and sub-group set 1 is allocated.
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:groupId” value=“group1”/>” indicates that encoded data of Group 1 “group1” is included in the audio stream. A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:attribute” value=“channeldata”/>” indicates that the encoded data of Group 1 “group1” is channel encoded data “channeldata.”
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:switchGroupId” value=“0”/>” indicates that the encoded data of Group 1 “group1” does not belong to any switch group. A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:presetGroupId” value=“preset1”/>” indicates that the encoded data of Group 1 “group1” belongs to Preset Group 1 “preset1.” Further, a description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:presetGroupId” value=“preset2”/>” indicates that the encoded data of Group 1 “group1” belongs to Preset Group 2 “preset2.”
A description of “<SubRepresentation id=“12” subgroupSet=“2”>” indicates that there is a sub-representation identified by “SubRepresentation id=“12”” in a representation identified by “Representation id=“1”,” and sub-group set 2 is allocated.
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:groupId” value=“group2”/>” indicates that encoded data of Group 2 “group2” is included in the audio stream. A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:attribute” value=“objectSound”/>” indicates that the encoded data of Group 2 “group2” is object encoded data “objectSound” for an immersive sound.
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:switchGroupId” value=“0”/>” indicates that the encoded data of Group 2 “group2” does not belong to any switch group. A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:presetGroupId” value=“preset1”/>” indicates that the encoded data of Group 2 “group2” belongs to Preset Group 1 “preset1.” A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:presetGroupId” value=“preset2”/>” indicates that the encoded data of Group 2 “group2” belongs to Preset Group 2 “preset2.”
Further, a description of “<AdaptationSet mimeType=“audio/mp4” group=“2”>” indicates that there is an adaptation set (AdaptationSet) corresponding to an audio stream, the audio stream is supplied in an MP4 file structure, and Group 2 is allocated. Then, a description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:codecType” value=“mpegh”/>” indicates that a codec of the audio stream is “MPEGH (3D audio).”
A description of “<Representation id=“2” bandwidth=“128000”>” indicates that there is an audio stream having a bit rate of 128 kbps in an adaptation set of Group 2, as a representation identified by “Representation id=“2”.” Then, a description of “<baseURL>audio/jp2/128.mp4</BaseURL>” indicates that a location destination of the audio stream is “audio/jp2/128.mp4.” Further, a description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:levelId” value=“level2”/>” indicates that the audio stream is transmitted with a track corresponding to Level 2 “level2.”
A description of “<SubRepresentation id=“21” subgroupSet=“3”>” indicates that there is a sub-representation identified by “SubRepresentation id=“21”” in a representation identified by “Representation id=“2”” and sub-group set 3 is allocated.
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:groupId” value=“group3”/>” indicates that encoded data of Group 3 “group3” is included in the audio stream. A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:attribute” value=“objectLang1”/>” indicates that the encoded data of Group 3 “group3” is object encoded data “objectLang1” for a spoken language of the first language.
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:switchGroupId” value=“1”/>” indicates that the encoded data of Group 3 “group3” belongs to switch group 1 (switch group 1). A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:presetGroupId” value=“preset1”/>” indicates that the encoded data of Group 3 “group3” belongs to Preset Group 1 “preset1.”
A description of “<SubRepresentation id=“22” subgroupSet=“4”>” indicates that there is a sub-representation identified by “SubRepresentation id=“22”” in a representation identified by “Representation id=“2”” and sub-group set 4 is allocated.
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:groupId” value=“group4”/>” indicates that encoded data of Group 4 “group4” is included in the audio stream. A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:attribute” value=“objectLang2”/>” indicates that the encoded data of Group 4 “group4” is object encoded data “objectLang2” for a spoken language of the second language.
A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:switchGroupId” value=“1”/>” indicates that the encoded data of Group 4 “group4” belongs to switch group 1. A description of “<SupplementaryDescriptor schemeIdUri=“urn:brdcst:3dAudio:presetGroupId” value=“preset2”/>” indicates that the encoded data of Group 4 “group4” belongs to Preset Group 2 “preset2.”
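Gathering the descriptions above, the representation and one of its sub-representations can be expressed as a single MPD fragment, and the descriptor values can be read back mechanically. The sketch below is illustrative only; the element names, scheme URIs, and values follow the text, while the parsing code itself is an assumption about how a receiver might collect them.

```python
import xml.etree.ElementTree as ET

# MPD fragment assembled from the descriptions above (sub-representation 21).
FRAGMENT = """
<SubRepresentation id="21" subgroupSet="3">
  <SupplementaryDescriptor schemeIdUri="urn:brdcst:3dAudio:groupId" value="group3"/>
  <SupplementaryDescriptor schemeIdUri="urn:brdcst:3dAudio:attribute" value="objectLang1"/>
  <SupplementaryDescriptor schemeIdUri="urn:brdcst:3dAudio:switchGroupId" value="1"/>
  <SupplementaryDescriptor schemeIdUri="urn:brdcst:3dAudio:presetGroupId" value="preset1"/>
</SubRepresentation>
"""

def read_descriptors(xml_text):
    """Map each scheme-URI suffix (groupId, attribute, ...) to its value."""
    root = ET.fromstring(xml_text)
    info = {"id": root.get("id"), "subgroupSet": root.get("subgroupSet")}
    for desc in root.findall("SupplementaryDescriptor"):
        info[desc.get("schemeIdUri").rsplit(":", 1)[-1]] = desc.get("value")
    return info

info = read_descriptors(FRAGMENT)
print(info["groupId"], info["attribute"])  # group3 objectLang1
```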
In the following, a media file substance of a location destination indicated by “<BaseURL>,” that is, the file contained in each audio track, will be described. In a case of a non-fragmented MP4 (Non-Fragmented MP4), for example, a media file substance is sometimes defined as “url 1” as illustrated in
Further, in a case of a fragmented MP4 (Fragmented MP4), for example, a media file substance is sometimes defined as “url 2” as illustrated in
Further, a combination of the above described “url 1” and “url 2” is also considered. In this case, for example, “url 1” may be set as an initialization segment, and “url 1” and “url 2” may be set as an MP4 of one service. Alternatively, “url 1” and “url 2” may be combined into one and defined as “url 3” as illustrated in
As described above, in the “moov” box, a correspondence between track identifiers (track ID) and level identifiers (level ID) is written. As illustrated in
As illustrated in
As described above, in the “moov” box composing the initialization segment (is), a correspondence between track identifiers (track ID) and level identifiers (level ID) is written. Further, as illustrated in
Referring back to
As described above, in addition to the video stream, the MP4 includes a predetermined number of audio tracks (audio streams) including a plurality of groups of encoded data that compose 3D audio transmission data. Then, in the MPD file, attribute information indicating each attribute of the encoded data of the plurality of groups included in the 3D audio transmission data is inserted, and stream correspondence relation information indicating in which audio track (audio stream) the encoded data of each of the plurality of groups is included is also inserted.
The service receiver 200 selectively performs a decode process on the audio stream including encoded data of a group having an attribute compatible with a speaker configuration and user selection information on the basis of the attribute information and stream correspondence relation information and obtains an audio output of 3D audio.
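As a minimal sketch of that selection, suppose the receiver has recovered two tables from the MPD file: one mapping each group to its attribute and one mapping each group to the audio stream carrying it. The table contents below are hypothetical; only the “objectLang1”/“objectLang2” style attribute values appear in the text, and the other names are assumptions.

```python
# Hypothetical tables recovered from the MPD attribute information and
# stream correspondence relation information (values assumed for illustration).
GROUP_ATTRIBUTES = {
    "group1": "channelData",    # channel encoded data (attribute name assumed)
    "group2": "objectSound",    # immersive audio object (attribute name assumed)
    "group3": "objectLang1",    # speech dialog object, first language
    "group4": "objectLang2",    # speech dialog object, second language
}
GROUP_TO_STREAM = {"group1": 1, "group2": 1, "group3": 2, "group4": 2}

def select_streams(wanted_attributes):
    """Return IDs of streams holding at least one group with a wanted attribute."""
    groups = [g for g, attr in GROUP_ATTRIBUTES.items() if attr in wanted_attributes]
    return sorted({GROUP_TO_STREAM[g] for g in groups})

# The speaker configuration handles channels and immersive objects, and the
# viewer selected the first language: both streams are needed.
print(select_streams({"channelData", "objectSound", "objectLang1"}))  # [1, 2]
```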
[DASH/MP4 Generation Unit of Service Transmission System]
The video encoder 112 inputs video data SV, performs encoding such as MPEG-2, H.264/AVC, or H.265/HEVC on the video data SV, and generates a video stream (video elementary stream). The audio encoder 113 inputs, as audio data SA, object data of an immersive audio and a speech dialog together with channel data.
The audio encoder 113 performs MPEG-H encoding on the audio data SA and obtains 3D audio transmission data. As illustrated in
The DASH/MP4 formatter 114 generates an MP4 including media streams (media segments) of a video and an audio as the content on the basis of the video stream generated in the video encoder 112 and the predetermined number of audio streams generated in the audio encoder 113. Here, each stream of the video and audio is stored in the MP4 as a separate track.
Further, the DASH/MP4 formatter 114 generates an MPD file by using content metadata, segment URL information, and the like. In the present embodiment, the DASH/MP4 formatter 114 inserts, in the MPD file, attribute information indicating each attribute of the encoded data of the plurality of groups included in the 3D audio transmission data and also inserts stream correspondence relation information indicating in which audio track (audio stream) the encoded data of each of the plurality of groups is included (see
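The insertion performed by the formatter can be sketched with a generic XML builder: one “SupplementaryDescriptor” is attached per property to the representation or sub-representation. The scheme URIs and values follow the text; the surrounding element skeleton and the helper function are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def add_descriptor(parent, suffix, value):
    """Append a SupplementaryDescriptor carrying one 3D-audio property."""
    ET.SubElement(parent, "SupplementaryDescriptor",
                  schemeIdUri="urn:brdcst:3dAudio:" + suffix, value=value)

# Representation for the 128 kbps stream; the level ID ties it to an audio track.
rep = ET.Element("Representation", id="2", bandwidth="128000")
add_descriptor(rep, "levelId", "level2")

# Sub-representation for Group 3 with its attribute information.
sub = ET.SubElement(rep, "SubRepresentation", id="21", subgroupSet="3")
add_descriptor(sub, "groupId", "group3")
add_descriptor(sub, "attribute", "objectLang1")

print(ET.tostring(rep, encoding="unicode"))
```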
An operation of the DASH/MP4 generation unit 110 illustrated in
The audio data SA is supplied to the audio encoder 113. The audio data SA includes channel data and object data of an immersive audio and a speech dialog. The audio encoder 113 performs MPEG-H encoding on the audio data SA and obtains 3D audio transmission data.
The 3D audio transmission data includes immersive audio object encoded data (IAO) and speech dialog object encoded data (SDO) in addition to the channel encoded data (CD) (see
The DASH/MP4 formatter 114 generates an MP4 including media streams (media segments) of a video, an audio, and the like of the content on the basis of the video stream generated in the video encoder 112 and the predetermined number of audio streams generated in the audio encoder 113. Here, each stream of the video and audio is stored in the MP4 as a separate track.
Further, the DASH/MP4 formatter 114 generates an MPD file by using content metadata, segment URL information, and the like. In the MPD file, attribute information indicating each attribute of the encoded data of the plurality of groups included in the 3D audio transmission data is inserted, and stream correspondence relation information indicating in which audio track (audio stream) the encoded data of each of the plurality of groups is included is also inserted.
[Exemplary Configuration of Service Receiver]
The CPU 221 controls an operation of each unit in the service receiver 200. The flash ROM 222 stores control software and saves data. The DRAM 223 composes a work area of the CPU 221. The CPU 221 activates the software by loading the software and data read from the flash ROM 222 into the DRAM 223 and controls each unit in the service receiver 200.
The remote control receiving unit 225 receives a remote control signal (remote control code) transmitted from the remote control transmitter 226 and supplies the signal to the CPU 221. The CPU 221 controls each unit in the service receiver 200 on the basis of the remote control code. The CPU 221, flash ROM 222, and DRAM 223 are connected to the internal bus 224.
The receiving unit 201 receives DASH/MP4, that is, an MPD file serving as a metafile and an MP4 including media streams (media segments) of a video, an audio, and the like, transmitted from the service transmission system 100 via the RF transmission path or the communication network transmission path.
In addition to a video stream, the MP4 includes a predetermined number of audio tracks (audio streams) including a plurality of groups of encoded data composing the 3D audio transmission data. Further, in the MPD file, attribute information indicating each attribute of the encoded data of the plurality of groups included in the 3D audio transmission data is inserted, and stream correspondence relation information indicating in which audio track (audio stream) the encoded data of each of the plurality of groups is included is also inserted.
The DASH/MP4 analyzing unit 202 analyzes the MPD file and MP4 received by the receiving unit 201. The DASH/MP4 analyzing unit 202 extracts a video stream from the MP4 and transmits the video stream to the video decoder 203. The video decoder 203 performs a decoding process on the video stream and obtains uncompressed video data.
The image processing circuit 204 performs a scaling process and an image quality adjusting process on the video data obtained by the video decoder 203 and obtains video data for displaying. The panel driving circuit 205 drives the display panel 206 on the basis of the video data to be displayed, which is obtained by the image processing circuit 204. The display panel 206 is configured with, for example, a liquid crystal display (LCD), an organic electroluminescence display (organic EL display), or the like.
Further, the DASH/MP4 analyzing unit 202 extracts MPD information included in the MPD file and transmits the MPD information to the CPU 221. The CPU 221 controls an obtaining process of a stream of a video or an audio on the basis of the MPD information. Further, the DASH/MP4 analyzing unit 202 extracts metadata such as header information of each track, meta description of the content substance, time information, for example, from the MP4 and transmits the metadata to the CPU 221.
The CPU 221 recognizes an audio track (audio stream) including encoded data of a group having an attribute compatible with the speaker configuration and viewer (user) selection information, on the basis of the attribute information indicating an attribute of the encoded data of each group and the stream correspondence relation information indicating in which audio track (audio stream) each group is included, both included in the MPD file.
Further, under the control by the CPU 221, the DASH/MP4 analyzing unit 202 refers to the level ID (that is, the track ID) and selectively extracts one or more audio streams including encoded data of a group having an attribute compatible with the speaker configuration and viewer (user) selection information among the predetermined number of audio streams included in the MP4.
The container buffers 211-1 to 211-N import the audio streams extracted by the DASH/MP4 analyzing unit 202 respectively. Here, the number N of container buffers 211-1 to 211-N is a necessary and sufficient number; in actual operation, it equals the number of audio streams extracted in the DASH/MP4 analyzing unit 202.
The combiner 212 reads the audio stream for each audio frame from those container buffers, among the container buffers 211-1 to 211-N, into which the audio streams extracted by the DASH/MP4 analyzing unit 202 have been imported, and supplies, to the 3D audio decoder 213, encoded data of a group having an attribute compatible with the speaker configuration and viewer (user) selection information.
The 3D audio decoder 213 performs a decode process on the encoded data supplied from the combiner 212 and obtains audio data for driving each speaker of the speaker system 215. Here, there are three possible cases for the encoded data on which the decode process is performed: a case in which only channel encoded data is included, a case in which only object encoded data is included, and a case in which both channel encoded data and object encoded data are included.
When decoding the channel encoded data, the 3D audio decoder 213 obtains audio data for driving each speaker by performing downmixing and upmixing for the speaker configuration of the speaker system 215. Further, when decoding the object encoded data, the 3D audio decoder 213 calculates speaker rendering (a mixing rate for each speaker) on the basis of object information (metadata) and mixes the audio data of the object into the audio data for driving each speaker according to the calculation result.
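The mixing step can be pictured as applying a per-speaker gain vector (the calculated mixing rate) to the object's samples and accumulating the result into the channel-derived speaker feeds. A toy sketch, with the frame layout and gain values assumed:

```python
def mix_object(channel_feeds, object_samples, gains):
    """Add one audio object into per-speaker feeds using precomputed gains.

    channel_feeds: one sample list per speaker (modified in place and returned).
    gains: the per-speaker mixing rate computed from the object metadata.
    """
    assert len(channel_feeds) == len(gains)
    for spk, gain in enumerate(gains):
        feed = channel_feeds[spk]
        for i, sample in enumerate(object_samples):
            feed[i] += gain * sample
    return channel_feeds

# Two speakers, two samples each; gains are illustrative values.
feeds = [[0.0, 0.0], [0.0, 0.0]]
mixed = mix_object(feeds, [1.0, -1.0], [0.5, 0.25])
print(mixed)  # [[0.5, -0.5], [0.25, -0.25]]
```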
The audio output processing circuit 214 performs necessary processes such as D/A conversion and amplification on the audio data for driving each speaker, which is obtained from the 3D audio decoder 213, and supplies the data to the speaker system 215. The speaker system 215 includes multiple speakers in a multi-channel configuration such as, for example, 2 channels, 5.1 channels, 7.1 channels, or 22.2 channels.
An operation of the service receiver 200 illustrated in
In the DASH/MP4 analyzing unit 202, the MPD file and MP4 received in the receiving unit 201 are analyzed. Then, in the DASH/MP4 analyzing unit 202, a video stream is extracted from the MP4 and transmitted to the video decoder 203. In the video decoder 203, a decoding process is performed on the video stream and uncompressed video data is obtained. The video data is supplied to the image processing circuit 204.
In the image processing circuit 204, a scaling process, an image quality adjusting process, or the like are performed on the video data obtained in the video decoder 203 and video data to be displayed is obtained. The video data to be displayed is supplied to the panel driving circuit 205. In the panel driving circuit 205, the display panel 206 is driven on the basis of the video data to be displayed. With this configuration, on the display panel 206, an image corresponding to the video data to be displayed is displayed.
Further, in the DASH/MP4 analyzing unit 202, MPD information included in the MPD file is extracted and transmitted to the CPU 221. Further, in the DASH/MP4 analyzing unit 202, metadata such as header information of each track, a meta description of a content substance, and time information is extracted from the MP4 and transmitted to the CPU 221. In the CPU 221, an audio track (audio stream) including encoded data of a group having an attribute compatible with the speaker configuration and viewer (user) selection information is recognized on the basis of the attribute information, the stream correspondence relation information, and the like included in the MPD file.
Further, under the control by the CPU 221, in the DASH/MP4 analyzing unit 202, one or more audio streams including the encoded data of the group having the attribute compatible with the speaker configuration and viewer (user) selection information is selectively extracted from the predetermined number of audio streams included in the MP4 by referring to the track ID (track ID).
The audio streams extracted in the DASH/MP4 analyzing unit 202 are imported to the corresponding container buffers among the container buffers 211-1 to 211-N. In the combiner 212, the audio stream is read for each audio frame from each container buffer into which an audio stream has been imported and is supplied to the 3D audio decoder 213 as encoded data of the group having the attribute compatible with the speaker configuration and viewer selection information. In the 3D audio decoder 213, a decode process is performed on the encoded data supplied from the combiner 212, and audio data for driving each speaker of the speaker system 215 is obtained.
Here, when the channel encoded data is decoded, processes of downmixing and upmixing for the speaker configuration of the speaker system 215 are performed and audio data for driving each speaker is obtained. Further, when the object encoded data is decoded, speaker rendering (a mixing rate for each speaker) is calculated on the basis of the object information (metadata), and audio data of the object is mixed to the audio data for driving each speaker according to the calculation result.
The audio data for driving each speaker, which is obtained in the 3D audio decoder 213, is supplied to the audio output processing circuit 214. In the audio output processing circuit 214, necessary processes such as D/A conversion, amplification, or the like are performed on the audio data for driving each speaker. Then, the processed audio data is supplied to the speaker system 215. With this configuration, a sound output corresponding to a display image of the display panel 206 is obtained from the speaker system 215.
Next, in step ST4, the CPU 221 reads information related to each audio stream from the MPD information, namely "groupID," "attribute," "switchGroupID," "presetGroupID," and "levelID." Then, in step ST5, the CPU 221 recognizes the track ID (track ID) of the audio track to which an encoded data group having an attribute compatible with the speaker configuration and viewer selection information belongs.
Next, in step ST6, the CPU 221 selects each audio track on the basis of the recognition result and imports the stored audio stream to the container buffer. Then, in step ST7, the CPU 221 reads the audio stream for each audio frame from the container buffer and supplies encoded data of a necessary group to the 3D audio decoder 213.
Next, in step ST8, the CPU 221 determines whether or not to decode the object encoded data. When the object encoded data is decoded, the CPU 221 calculates speaker rendering (a mixing rate for each speaker) by using azimuth (orientation information) and elevation (elevation angle information) on the basis of the object information (metadata) in step ST9. After that, the CPU 221 proceeds to a process in step ST10. Here, in step ST8, when the object encoded data is not decoded, the CPU 221 immediately proceeds to a process in step ST10.
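One simple way to picture the step ST9 calculation is to weight each speaker by its angular proximity to the object direction given by the azimuth and elevation metadata. The actual MPEG-H renderer is far more elaborate; the two-speaker layout and the inverse-distance weighting below are assumptions for illustration only.

```python
import math

# Assumed speaker layout: (azimuth, elevation) in degrees.
SPEAKERS = {"L": (30.0, 0.0), "R": (-30.0, 0.0)}

def angular_distance(a, b):
    """Great-circle angle in degrees between two (azimuth, elevation) pairs."""
    az1, el1, az2, el2 = map(math.radians, (*a, *b))
    cosd = (math.sin(el1) * math.sin(el2)
            + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosd))))

def mixing_rates(azimuth, elevation):
    """Weight each speaker by inverse angular distance, normalized to sum to 1."""
    obj = (azimuth, elevation)
    weights = {name: 1.0 / (1.0 + angular_distance(obj, pos))
               for name, pos in SPEAKERS.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

rates = mixing_rates(30.0, 0.0)  # object exactly at the left speaker
print(rates)
```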
In step ST10, the CPU 221 determines whether or not to decode the channel encoded data. When the channel encoded data is decoded, the CPU 221 performs processes of downmixing and upmixing for the speaker configuration of the speaker system 215 and obtains audio data for driving each speaker in step ST11. After that, the CPU 221 proceeds to a process in step ST12. Here, when the channel encoded data is not decoded in step ST10, the CPU 221 immediately proceeds to a process in step ST12.
In step ST12, when decoding the object encoded data, the CPU 221 mixes the audio data of the object into the audio data for driving each speaker according to the calculation result in step ST9 and then performs a dynamic range control. Then, the CPU 221 ends the process in step ST13. Here, when the object encoded data is not decoded, the CPU 221 skips the process in step ST12.
As described above, in the transceiving system 10 illustrated in
Further, in the transceiving system 10 illustrated in
Here, in the above embodiments, the service receiver 200 is configured to selectively extract the audio stream in which the encoded data of the group having an attribute compatible with the speaker configuration and viewer selection information is included from the plurality of audio streams transmitted from the service transmission system 100 and obtain audio data for driving a predetermined number of speakers by performing a decode process.
Here, as a service receiver, it may be considered to selectively extract one or more audio streams including encoded data of a group having an attribute compatible with the speaker configuration and viewer selection information from the plurality of audio streams transmitted from the service transmission system 100, reconfigure an audio stream having the encoded data of that group, and distribute the reconfigured audio stream to a device (including a DLNA device) connected to an internal network.
Under the control by the CPU 221, the DASH/MP4 analyzing unit 202 refers to the level ID (that is, the track ID) and selectively extracts one or more audio streams including the encoded data of the group having the attribute compatible with the speaker configuration and viewer (user) selection information from the predetermined number of audio streams included in the MP4.
The audio streams extracted in the DASH/MP4 analyzing unit 202 are imported to the corresponding container buffers among the container buffers 211-1 to 211-N. In the combiner 212, an audio stream for each audio frame is read from each container buffer into which an audio stream has been imported and is supplied to a stream reconfiguration unit 231.
In the stream reconfiguration unit 231, the encoded data of the predetermined group having the attribute compatible with the speaker configuration and viewer selection information is selectively acquired and the audio stream having the encoded data of the predetermined group is reconfigured. The reconfigured audio stream is supplied to a delivery interface 232. Then, the reconfigured audio stream is delivered (transmitted) to a device 300 connected to the internal network from the delivery interface 232.
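The reconfiguration step can be sketched as filtering each audio frame down to the packets of the selected groups and re-emitting the result as a new stream. The frame representation below (a list of (group ID, payload) pairs) is an assumption made for illustration:

```python
def reconfigure(frames, keep_groups):
    """Drop packets whose group is not wanted; yield the reconfigured frames."""
    for frame in frames:
        yield [(group, payload) for group, payload in frame if group in keep_groups]

# One frame carrying channel data plus two language objects; keep group1/group3.
frames = [[("group1", b"cd"), ("group3", b"lang1"), ("group4", b"lang2")]]
out = list(reconfigure(frames, {"group1", "group3"}))
print(out)  # [[('group1', b'cd'), ('group3', b'lang1')]]
```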
The internal network connection includes a wired Ethernet connection and wireless connections such as "WiFi" and "Bluetooth." Here, "WiFi" and "Bluetooth" are registered trademarks.
Further, the device 300 includes a surround sound speaker, a second display, and an audio output device attached to a network terminal. The device 300 that receives the delivery of the reconfigured audio stream performs a decode process similarly to the 3D audio decoder 213 in the service receiver 200 of
Further, a service receiver may have a configuration that transmits the above described reconfigured audio stream to a device connected by a digital interface such as “high-definition multimedia interface (HDMI),” “mobile high definition link (MHL),” “DisplayPort” and the like. Here, “HDMI” and “MHL” are registered trademarks.
Further, the above described embodiment describes an example in which a field of "attribute" is provided and attribute information of the encoded data of each group is transmitted (see
Further, the above described embodiments describe an example in which channel encoded data and object encoded data are included in the encoded data of a plurality of groups (see
Here, the present technology may have the following configurations.
(1) A transmission device including:
a transmitting unit configured to transmit a metafile having meta information used to acquire, in a reception device, a predetermined number of audio streams including a plurality of groups of encoded data; and
an information inserting unit configured to insert, to the metafile, attribute information indicating each attribute of the encoded data of the plurality of groups.
(2) The transmission device according to (1), wherein the information inserting unit further inserts, to the metafile, stream correspondence relation information indicating in which audio stream the encoded data of the plurality of groups is included respectively.
(3) The transmission device according to (2), wherein the stream correspondence relation information is information indicating a correspondence relation between group identifiers that respectively identify each piece of the encoded data of the plurality of groups and identifiers that respectively identify each of the predetermined number of audio streams.
(4) The transmission device according to any of (1) to (3), wherein the metafile is an MPD file.
(5) The transmission device according to (4), wherein the information inserting unit inserts the attribute information to the metafile by using “Supplementary Descriptor.”
(6) The transmission device according to any of (1) to (5), wherein the transmitting unit transmits the metafile via an RF transmission path or a communication network transmission path.
(7) The transmission device according to any of (1) to (6), wherein the transmitting unit further transmits a container in a predetermined format having the predetermined number of audio streams including the encoded data of the plurality of groups.
(8) The transmission device according to (7), wherein the container is an MP4.
(9) The transmission device according to any of (1) to (8), wherein the encoded data of the plurality of groups includes one of or both of channel encoded data and object encoded data.
(10) A transmission method including:
a transmission step of transmitting, by a transmitting unit, a metafile having meta information used to acquire, in a reception device, a predetermined number of audio streams including a plurality of groups of encoded data; and
an information insertion step of inserting, to the metafile, attribute information indicating each attribute of the encoded data of the plurality of groups.
(11) A reception device including:
a receiving unit configured to receive a metafile having meta information used to acquire, in the reception device, a predetermined number of audio streams including a plurality of groups of encoded data,
the metafile including inserted attribute information that indicates each attribute of the encoded data of the plurality of groups; and
a processing unit configured to process the predetermined number of audio streams on the basis of the attribute information.
(12) The reception device according to (11),
wherein
to the metafile, stream correspondence relation information indicating in which audio stream the encoded data of the plurality of groups is included respectively is further inserted, and
the processing unit processes the predetermined number of audio streams on the basis of the stream correspondence relation information as well as the attribute information.
(13) The reception device according to (12), wherein the processing unit selectively performs a decode process on an audio stream including encoded data of a group having an attribute compatible with a speaker configuration and user selection information, on the basis of the attribute information and the stream correspondence relation information.
(14) The reception device according to any of (11) to (13), wherein the encoded data of the plurality of groups includes one of or both of channel encoded data and object encoded data.
(15) A reception method including:
a receiving step of receiving, by a receiving unit, a metafile having meta information used to acquire, in a reception device, a predetermined number of audio streams including a plurality of groups of encoded data,
the metafile including inserted attribute information indicating each attribute of the encoded data of the plurality of groups; and
a processing step of processing the predetermined number of audio streams on the basis of the attribute information.
(16) A reception device including:
a receiving unit configured to receive a metafile having meta information used to acquire, in the reception device, a predetermined number of audio streams including a plurality of groups of encoded data,
the metafile including inserted attribute information indicating each attribute of the encoded data of the plurality of groups;
a processing unit configured to selectively acquire encoded data of a predetermined group from the predetermined number of audio streams on the basis of the attribute information, and reconfigure an audio stream including the encoded data of the predetermined group; and
a stream transmitting unit configured to transmit the reconfigured audio stream to an external device.
(17) The reception device according to (16), wherein
to the metafile, stream correspondence relation information indicating in which audio stream the encoded data of the plurality of groups is included respectively is further inserted, and
the processing unit selectively acquires the encoded data of the predetermined group from the predetermined number of audio streams on the basis of the stream correspondence relation information as well as the attribute information.
(18) A reception method including:
a receiving step of receiving, by a receiving unit, a metafile having meta information used to acquire, in a reception device, a predetermined number of audio streams including a plurality of groups of encoded data,
the metafile including inserted attribute information indicating each attribute of the encoded data of the plurality of groups;
a processing step of selectively acquiring encoded data of a predetermined group from the predetermined number of audio streams on the basis of the attribute information and reconfiguring an audio stream including the encoded data of the predetermined group; and
a stream transmission step of transmitting the reconfigured audio stream to an external device.
A major characteristic of the present technology is that the process load in the reception side can be reduced by inserting, into an MPD file, attribute information indicating respective attributes of encoded data of a plurality of groups included in a predetermined number of audio streams and stream correspondence relation information indicating in which audio track (audio stream) the encoded data of each group is included (see
Number | Date | Country | Kind |
---|---|---|---|
2014-187085 | Sep 2014 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2015/075318 | 9/7/2015 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2016/039287 | 3/17/2016 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
9026450 | Dressler | May 2015 | B2 |
9319014 | Oh | Apr 2016 | B2 |
20090262957 | Oh | Oct 2009 | A1 |
20090265023 | Oh | Oct 2009 | A1 |
20100017002 | Oh | Jan 2010 | A1 |
20100017003 | Oh | Jan 2010 | A1 |
20120232910 | Dressler | Sep 2012 | A1 |
20130279879 | Watanabe et al. | Oct 2013 | A1 |
20140105422 | Oh | Apr 2014 | A1 |
20140161285 | Oh | Jun 2014 | A1 |
20150089533 | Giladi et al. | Mar 2015 | A1 |
20150381692 | Giladi et al. | Dec 2015 | A1 |
20170006274 | Watanabe et al. | Jan 2017 | A1 |
20170104803 | Giladi et al. | Apr 2017 | A1 |
20170263259 | Tsukagoshi | Sep 2017 | A1 |
Number | Date | Country |
---|---|---|
2 665 262 | Nov 2013 | EP |
2665262 | Nov 2013 | EP |
2009-278381 | Nov 2009 | JP |
2012-033243 | Feb 2012 | JP |
2014-520491 | Aug 2014 | JP |
2014109321 | Jul 2014 | WO |
Entry |
---|
“Information technology—Dynamic adaptive streaming over HTTP (DASH)—Part 1: Media presentation description and segment formats”, ISO/IEC JTC 1/SC 29, ISO/IEC 23009-1:2012(E), ISO/IEC JTC 1/SC 29/WG 11, XP030018824, Jan. 2012, 133 pages. |
“Information technology—High efficiency coding and media delivery in heterogeneous environments—Part 3: 3D audio”, ISO/IEC JTC 1/SC 29 N, ISO/IEC CD 23008-3, ISO/IEC JTC 1/SC 29/WG 11, XP030021195, Apr. 2014, 337 pages. |
Stephan Schreiner, et al., “On Multiple MPEG-H 3D Audio Streams”, International Organisation for Standardisation Organisation Internationale De Normalisation, ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio, XP030062639, Jul. 2014, 6 pages. |
European Office Communication dated Jun. 6, 2019 including European Search Report in European Patent Application No. 19156452.5. |
Text of ISO/IEC 23008-3/CD, 3D audio, Coding of Moving Pictures and Audio, International Organisation for Standardisation, ISO/IEC JTC1/SC29/WG11 N14459, Apr. 2014, Valencia, Spain. |
Stephan Schreiner et al., on multiple MPEG-H 3D Audio streams, Coding of Moving Pictures and Audio, International Organisation for Standardisation, ISO/IEC JTC1/SC29/WG11 MPEG2014/M34266, Jul. 2014, Sapporo, Japan. |
Information technology—Dynamic adaptive streaming over HTTP (DASH)—Part 1: Media presentation description and segment formats, ISO/IEC 23009-1:2012(E), ISO/IEC JTC 1/SC 29/WG 11, Date: Jan. 5, 2012. |
Office Action dated Oct. 8, 2019 in Japanese Patent Application No. 2016-547428. |
Notification of the First Office Action dated Mar. 3, 2020 in corresponding Chinese Patent Application No. 2015800474693 (with English translation)(18 pages). |
Number | Date | Country | |
---|---|---|---|
20170263259 A1 | Sep 2017 | US |