The present disclosure relates to an information processing apparatus and an information processing method, and especially relates to an information processing apparatus and an information processing method that enable easy reproduction of audio data of a predetermined kind, of audio data of a plurality of kinds.
In recent years, the mainstream of streaming services on the Internet has been over-the-top video (OTT-V). A technology becoming widespread as its underlying distribution technology is moving picture experts group - dynamic adaptive streaming over HTTP (MPEG-DASH) (for example, see Non-Patent Document 1).
In MPEG-DASH, a distribution server prepares moving image data groups with different screen sizes and encoding speeds, for one piece of moving image content, and a reproduction terminal requests the moving image data group with an optimum screen size and an optimum encoding speed according to a state of a transmission path, so that adaptive streaming distribution is realized.
Non-Patent Document 1: Dynamic Adaptive Streaming over HTTP (MPEG-DASH) (URL:http://mpeg.chiariglione.org/standards/mpeg-dash/media-presentation-description-and-segment-formats/text-isoiec-23009-12012-dam-1)
However, easy reproduction of audio data of a predetermined group, of audio data of a plurality of groups, has not been considered.
The present disclosure has been made in view of the foregoing, and enables easy reproduction of audio data of a desired group, of audio data of a plurality of groups.
An information processing apparatus of a first aspect of the present disclosure is an information processing apparatus including a file generation unit that generates a file in which audio data of a plurality of kinds is divided into tracks for each one or more of the kinds and arranged, and information related to the plurality of kinds is arranged.
An information processing method of the first aspect of the present disclosure corresponds to the information processing apparatus of the first aspect of the present disclosure.
In the first aspect of the present disclosure, the file in which audio data of a plurality of kinds is divided into tracks for each one or more of the kinds and arranged, and information related to the plurality of kinds is arranged is generated.
An information processing apparatus of a second aspect of the present disclosure is an information processing apparatus including a reproduction unit that reproduces, from a file in which audio data of a plurality of kinds is divided into tracks for each one or more of the kinds and arranged, and information related to the plurality of kinds is arranged, the audio data in a predetermined track.
An information processing method of the second aspect of the present disclosure corresponds to the information processing apparatus of the second aspect of the present disclosure.
In the second aspect of the present disclosure, the audio data of a predetermined track is reproduced from the file in which audio data of a plurality of kinds is divided into tracks for each one or more of the kinds and arranged, and information related to the plurality of kinds is arranged.
Note that the information processing apparatuses of the first and second aspects can be realized by causing a computer to execute a program.
Further, to realize the information processing apparatuses of the first and second aspects, the program executed by the computer can be transmitted through a transmission medium, or can be recorded on a recording medium and provided.
According to the first aspect of the present disclosure, a file can be generated. Further, according to the first aspect of the present disclosure, a file that enables easy reproduction of audio data of a predetermined kind, of audio data of a plurality of kinds, can be generated.
According to the second aspect of the present disclosure, audio data can be reproduced. Further, according to the second aspect of the present disclosure, audio data of a predetermined kind, of audio data of a plurality of kinds, can be easily reproduced.
Hereinafter, presuppositions of the present disclosure and embodiments for implementing the present disclosure (hereinafter, referred to as embodiments) will be described. Note that the description will be given as follows:
In an analysis (parsing) of an MPD file, an optimum one is selected from among the “Representation” attributes included in the “Periods” of the MPD file (Media Presentation Description).
Then, a file is acquired and processed by reference to a uniform resource locator (URL) and the like of the “Initialization Segment” at the head of the selected “Representation”. Following that, a file is acquired and reproduced by reference to a URL and the like of the subsequent “Media Segment”.
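For illustration only, the selection and acquisition flow described above can be sketched as follows; the names (Representation, pick_optimum, fetch) and the attribute layout are assumptions of this sketch, not definitions taken from MPEG-DASH or the present disclosure.

```python
# A minimal sketch of the MPD-driven acquisition flow: pick an optimum
# "Representation", fetch its "Initialization Segment", then fetch the
# subsequent "Media Segments". All names here are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Representation:
    bandwidth: int          # encoding speed in bits per second
    init_url: str           # URL of the "Initialization Segment"
    media_urls: List[str]   # URLs of the subsequent "Media Segments"


def pick_optimum(reps: List[Representation], available_bps: int) -> Representation:
    """Select the highest-bandwidth Representation the transmission path allows."""
    candidates = [r for r in reps if r.bandwidth <= available_bps]
    return max(candidates or reps, key=lambda r: r.bandwidth)


def acquire(reps: List[Representation], available_bps: int,
            fetch: Callable[[str], bytes]) -> List[bytes]:
    """Fetch the Initialization Segment first, then each Media Segment in turn."""
    rep = pick_optimum(reps, available_bps)
    return [fetch(rep.init_url)] + [fetch(url) for url in rep.media_urls]
```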
Note that relationship among “Period”, “Representation”, and “Segment” in the MPD file is illustrated in
Therefore, the MPD file has a hierarchical structure illustrated in
In an MP4 file, codec information of the moving image content, and position information indicating a position in a file can be managed for each track. In a 3D audio file format of MP4, all of audio streams (elementary streams (ESs)) of 3D audio (Channel audio/Object audio/SAOC Object audio/HOA audio/metadata) are recorded as one track in units of a sample (frame). Further, the codec information (Profile/level/audio configuration) of the 3D audio is stored as a sample entry.
The Channel audio that configures the 3D audio is audio data in units of a channel, and the Object audio is audio data in units of an object. Note that an object is a sound source, and the audio data in units of an object is acquired with a microphone or the like attached to the object. The object may be a substance such as a fixed microphone stand or a moving body such as a person.
Further, the SAOC Object audio is audio data of spatial audio object coding (SAOC), the HOA audio is audio data of higher order ambisonics (HOA), and the metadata is metadata of the Channel audio, the Object audio, the SAOC Object audio, and the HOA audio.
As illustrated in
Incidentally, in broadcasting or local storage reproduction of the MP4 file, typically, the server side sends the audio streams of all of the 3D audio. Then, the client side decodes and outputs only the audio streams of the necessary 3D audio while parsing the audio streams of all of the 3D audio. However, in a case where the bit rate is high or there is a restriction on the reading rate of the local storage, it is desirable to reduce the load of the decoding processing by acquiring only the audio streams of the necessary 3D audio.
Further, in stream reproduction of the MP4 file conformable to MPEG-DASH, the server side prepares the audio streams at a plurality of encoding speeds. Therefore, the client side can select and acquire the audio streams at an encoding speed optimum for a reproduction environment by acquiring only the audio streams of necessary 3D audio.
As described above, in the present disclosure, by dividing the audio streams of the 3D audio into tracks according to kinds, and arranging the audio streams in an audio file, only the audio streams of a predetermined kind of the 3D audio can be efficiently acquired. Accordingly, in the broadcasting or the local storage reproduction, the load of the decoding processing can be reduced. Further, in stream reproduction, the audio streams with the highest quality, of the audio streams of the necessary 3D audio, can be reproduced according to the bandwidth.
As illustrated in
The audio elements of the same audio kind (Channel/Object/SAOC Object/HOA) form a group. Therefore, examples of a group type (GroupType) include Channels, Objects, SAOC Objects, and HOA. Two or more groups can form a switch Group or a group Preset as needed.
The switch Group is a group (exclusive reproduction group) in which an audio stream of the group included therein is exclusively reproduced. That is, as illustrated in
Meanwhile, the group Preset defines a combination of the groups intended by a content creator.
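For illustration only, the exclusive-reproduction rule of the switch Group can be sketched as a simple check; the data layout below (a mapping from group ID to switch Group ID) is an assumption of this sketch, not a structure defined by the standard.

```python
# A sketch of honoring the exclusive-reproduction rule: at most one group
# belonging to the same switch Group may be selected for reproduction.
from typing import Dict, Iterable, Optional


def validate_selection(selected_groups: Iterable[int],
                       switch_group_of: Dict[int, Optional[int]]) -> bool:
    """switch_group_of maps a group ID to its switch Group ID (or None)."""
    seen: Dict[int, int] = {}
    for gid in selected_groups:
        sg = switch_group_of.get(gid)
        if sg is None:
            continue            # the group does not belong to any switch Group
        if sg in seen:
            return False        # two groups of the same switch Group selected
        seen[sg] = gid
    return True


# Hypothetical example: groups 2 (English dialog) and 3 (French dialog)
# share switch Group 1, so they must not be reproduced together.
assert validate_selection([1, 2], {1: None, 2: 1, 3: 1})
assert not validate_selection([2, 3], {1: None, 2: 1, 3: 1})
```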
Further, the metadata of the 3D audio is Extelement (Ext Element) that is different in each metadata. Types of the Extelement include Object Metadata, SAOC 3D Metadata, HOA Metadata, DRC Metadata, SpatialFrame, SaocFrame, and the like. The Extelement of the Object Metadata is metadata of all of the Object audio, and the Extelement of the SAOC 3D Metadata is metadata of all of the SAOC audio. Further, the Extelement of the HOA Metadata is metadata of all of the HOA audio, and Extelement of dynamic range control (DRC) Metadata is metadata of all of the Object audio, the SAOC audio, and the HOA audio.
Further, the metadata of the 3D audio is handled as an Extelement (Ext Element) that differs for each kind of metadata. Types of the Extelement include Object Metadata, SAOC 3D Metadata, HOA Metadata, DRC Metadata, SpatialFrame, SaocFrame, and the like. The Extelement of the Object Metadata is metadata of all of the Object audio, and the Extelement of the SAOC 3D Metadata is metadata of all of the SAOC audio. Further, the Extelement of the HOA Metadata is metadata of all of the HOA audio, and the Extelement of the dynamic range control (DRC) Metadata is metadata of all of the Object audio, the SAOC audio, and the HOA audio.
As described above, division units of the audio data, of the 3D audio, include the audio element, the group type, the group, the switch Group, and the group Preset. Therefore, the audio streams of the audio data, of the 3D audio, can be divided into different tracks for each kind, where the kind is the audio element, the group type, the group, the switch Group, or the group Preset.
Further, division units of the metadata, of the 3D audio, include the type of the Extelement and the audio element corresponding to the metadata. Therefore, the audio streams of the metadata of the 3D audio can be divided into different tracks for each kind, where the kind is the type of the Extelement or the audio element corresponding to the metadata.
In the embodiments below, the audio streams of the audio data are divided into the tracks for each one or more groups, and the audio streams of the metadata are divided into the tracks for each type of the Extelement.
An information processing system 140 of
In the information processing system 140, the web server 142 distributes the audio streams of the tracks in the group to be reproduced to the moving image reproduction terminal 144 by a method conforming to MPEG-DASH.
To be specific, the file generation device 141 encodes the audio data and the metadata of the 3D audio of the moving image content at a plurality of encoding speeds to generate the audio streams. The file generation device 141 converts all of the audio streams into files at each encoding speed and in units of time, called segments, of from about several seconds to ten seconds, to generate the audio file. At this time, the file generation device 141 divides the audio streams for each group and each type of the Extelement, and arranges the audio streams in the audio file as the audio streams of different tracks. The file generation device 141 uploads the generated audio file onto the web server 142.
Further, the file generation device 141 generates the MPD file (management file) that manages the audio file and the like. The file generation device 141 uploads the MPD file onto the web server 142.
The web server 142 stores the audio file of each encoding speed and segment, and the MPD file uploaded by the file generation device 141. The web server 142 transmits the stored audio file, the MPD file, and the like, to the moving image reproduction terminal 144, in response to a request from the moving image reproduction terminal 144.
The moving image reproduction terminal 144 executes control software of streaming data (hereinafter, referred to as control software) 161, moving image reproduction software 162, client software for hypertext transfer protocol (HTTP) access (hereinafter, referred to as access software) 163, and the like.
The control software 161 is software that controls data streamed from the web server 142. To be specific, the control software 161 causes the moving image reproduction terminal 144 to acquire the MPD file from the web server 142.
Further, on the basis of the MPD file, the control software 161 commands the access software 163 to send a transmission request for the audio streams of the tracks of the group to be reproduced, which is specified by the moving image reproduction software 162, and of the type of the Extelement corresponding to the group.
The moving image reproduction software 162 is software that reproduces the audio streams acquired from the web server 142. To be specific, the moving image reproduction software 162 specifies the group to be reproduced and the type of the Extelement corresponding to the group, to the control software 161. Further, the moving image reproduction software 162 decodes the audio streams received by the moving image reproduction terminal 144 when receiving notification of reception start from the access software 163. The moving image reproduction software 162 synthesizes and outputs the audio data obtained as a result of the decoding, as needed.
The access software 163 is software that controls communication between the moving image reproduction terminal 144 and the web server 142 through the Internet 13 using the HTTP. To be specific, the access software 163 causes the moving image reproduction terminal 144 to transmit a transmission request of the audio stream of the track to be reproduced included in the audio file in response to the command of the control software 161. Further, the access software 163 causes the moving image reproduction terminal 144 to start reception of the audio streams transmitted from the web server 142 in response to the transmission request, and supplies notification of the reception start to the moving image reproduction software 162.
Note that, in the present specification, only the audio file of the moving image content will be described. However, in reality, a corresponding image file is generated and reproduced together with the audio file.
Note that, in
As illustrated in
Track Reference is arranged in the track box of each of the tracks. The Track Reference indicates a reference relationship between the corresponding track and another track. To be specific, the Track Reference indicates the ID unique to a track (hereinafter, referred to as a track ID) of another track having the reference relationship.
In the example of
Further, 4cc (character code) of the sample entry of the base track is “mha2”, and in the sample entry of the base track, an mhaC box including config information of all of the groups of the 3D audio or config information necessary for decoding only the base track, and an mhas box including information related to all of the groups and the switch Group of the 3D audio are arranged. The information related to the groups is configured from the IDs of the groups, information indicating content of data of the element classified into the groups, and the like. The information related to the switch Group is configured from an ID of the switch Group, the IDs of the groups that form the switch Group, and the like.
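For illustration only, the information related to the groups and the switch Group carried by the mhas box can be modeled as plain data structures; the field names below are assumptions of this sketch, not the box syntax itself.

```python
# A sketch, with assumed field names, of the information the mhas box of
# the base track carries: for each group its ID and the content of the data
# of its elements, and for each switch Group its ID and its member groups.
from dataclasses import dataclass, field
from typing import List


@dataclass
class GroupInfo:
    group_id: int
    content: str                 # e.g. "Dialog EN", "Effects"


@dataclass
class SwitchGroupInfo:
    switch_group_id: int
    member_group_ids: List[int]  # groups reproduced exclusively of each other


@dataclass
class AudioSceneInfo:
    groups: List[GroupInfo] = field(default_factory=list)
    switch_groups: List[SwitchGroupInfo] = field(default_factory=list)
```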
The 4cc of the sample entry of the track of each of the groups is “mhg1”, and in the sample entry of the track of each of the groups, an mhgC box including information related to the group is arranged. In a case where a group forms the switch Group, an mhsC box including information related to the switch Group is arranged in the sample entry of the track of the group.
In a sample of the base track, reference information to samples of the tracks of the groups, or config information necessary for decoding the reference information, is arranged. By arranging the samples of the groups referenced by the reference information in the order of arrangement of the reference information, the audio streams of the 3D audio before being divided into the tracks can be generated. The reference information is configured from the positions and sizes of the samples of the tracks of the groups, the group types, and the like.
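For illustration only, the role of this reference information can be sketched as follows; the field names (group_type, position, size) are assumptions of this sketch, and the byte offsets are taken to be relative to the buffer passed in.

```python
# A sketch of reassembling the audio streams of the 3D audio in their
# pre-division order, using the per-sample reference information of the
# base track (position, size, and group type of each referenced sample).
from dataclasses import dataclass
from typing import List


@dataclass
class TrackSampleRef:
    group_type: str   # e.g. "Channels", "Objects", "SAOC Objects", "HOA"
    position: int     # byte offset of the referenced sample
    size: int         # byte size of the referenced sample


def reassemble(mdat: bytes, refs: List[TrackSampleRef]) -> bytes:
    """Concatenate the referenced samples in the order the references appear."""
    return b"".join(mdat[r.position:r.position + r.size] for r in refs)
```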
As illustrated in
As illustrated in
In the mhaC box, Config information necessary for decoding the corresponding track is described. Further, in the mhgC box, AudioScene information related to the corresponding group is described as GroupDefinition. In the mhsC box, AudioScene information related to the switch Group is described in SwitchGroupDefinition in a case where the corresponding group forms the switch Group.
In the segment structure of
Further, the media segment is configured from an sidx box, an ssix box, and one or more subsegments. In the sidx box, position information indicating the positions of the subsegments in the audio file is arranged. In the ssix box, position information of the audio streams of the levels arranged in the mdat box is arranged. Note that a level corresponds to a track. Further, the position information of the first track is the position information of the data made of the moof box and the audio stream of the first track.
The subsegment is provided for each arbitrary time length, and the subsegment is provided with a pair of the moof box and the mdat box, which is common to all of the tracks. In the mdat box, the audio streams of all of the tracks are collectively arranged by an arbitrary time length, and in the moof box, management information of the audio streams is arranged. The audio streams of the tracks arranged in the mdat box are successive in each track.
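For illustration only, the way a client can use the per-level position information of the ssix box to request only one track's audio stream can be sketched as follows; the (offset, size) representation of a parsed ssix box is an assumption of this sketch.

```python
# A sketch of turning the ssix box's per-level position information into an
# HTTP Range request covering only the audio stream of one level (track).
from typing import Dict, Tuple


def byte_range_for_level(ssix: Dict[int, Tuple[int, int]], level: int) -> str:
    """Build an HTTP Range header value for the audio stream of one level."""
    offset, size = ssix[level]
    return f"bytes={offset}-{offset + size - 1}"


# Hypothetical example: level 1 (the first track, including its moof box)
# starts at byte 1024 of the subsegment and spans 4096 bytes.
print(byte_range_for_level({1: (1024, 4096)}, 1))  # bytes=1024-5119
```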
In the example of
The segment structure of
That is, the Initial segment of
The subsegment is provided for each arbitrary time length, and the subsegment is provided with a pair of the moof box and the mdat box for each track. That is, in the mdat box of each of the tracks, the audio streams of the tracks are collectively arranged (interleave storage) by an arbitrary time length, and in the moof box, management information of the audio streams is arranged.
As illustrated in
The level assignment box is a box that associates the track ID of each of the tracks with the level used in the ssix box. In the example of
As illustrated in
The “Representation” and the “SubRepresentation” include “codecs” that indicates the kind (profile or level) of codec of the corresponding segment as a whole or the track in a 3D audio file format.
The “SubRepresentation” includes a “level” that is a value set in the level assignment box as a value that indicates the level of the corresponding track. Further, the “SubRepresentation” includes a “dependencyLevel” that is a value indicating the level corresponding to another track (hereinafter, referred to as a reference track) having the reference relationship (having dependency).
Further, the “SubRepresentation” includes <EssentialProperty schemeIdUri=“urn:mpeg:DASH:3daudio:2014” value=“dataType,definition”>.
The “dataType” is a number that indicates the kind of the content (definition) of the AudioScene information described in the sample entry of the corresponding track, and the “definition” is that content itself. For example, in a case where GroupDefinition is included in the sample entry of the track, 1 is described as the “dataType” of the track, and the GroupDefinition is described as the “definition”. Further, in a case where SwitchGroupDefinition is included in the sample entry of the track, 2 is described as the “dataType” of the track, and the SwitchGroupDefinition is described as the “definition”. That is, the “dataType” and the “definition” are information that indicates whether the SwitchGroupDefinition exists in the sample entry of the corresponding track. The “definition” is binary data, and is encoded by the base64 method.
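For illustration only, composing the “dataType,definition” value described above can be sketched as follows; the placeholder bytes stand in for real binary GroupDefinition data.

```python
# A sketch of composing the EssentialProperty value "dataType,definition":
# dataType is 1 for GroupDefinition or 2 for SwitchGroupDefinition, and
# definition is the binary data encoded with base64.
import base64


def essential_property_value(data_type: int, definition: bytes) -> str:
    return f"{data_type},{base64.b64encode(definition).decode('ascii')}"


group_def = b"\x01\x02\x03"  # placeholder bytes, not a real GroupDefinition
print(essential_property_value(1, group_def))  # 1,AQID
```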
Note that, in the example of
The file generation device 141 of
The audio encoding processing unit 171 of the file generation device 141 encodes the audio data and the metadata of the 3D audio of the moving image content at a plurality of encoding speeds to generate the audio streams. The audio encoding processing unit 171 supplies the audio stream of each encoding speed to the audio file generation unit 172.
The audio file generation unit 172 allocates the track to the audio stream supplied from the audio encoding processing unit 171 for each group and each type of the Extelement. The audio file generation unit 172 generates the audio file in the segment structure of
The MPD generation unit 173 determines the URL of the web server 142 in which the audio file supplied from the audio file generation unit 172 is to be stored, and the like. Then, the MPD generation unit 173 generates the MPD file in which the URL of the audio file and the like are arranged in the “Segment” of the “Representation” for the audio file. The MPD generation unit 173 supplies the generated MPD file and the audio file to the server upload processing unit 174.
The server upload processing unit 174 uploads the audio file and the MPD file supplied from the MPD generation unit 173 onto the web server 142.
In step S191 of
In step S192, the audio file generation unit 172 allocates the track to the audio stream supplied from the audio encoding processing unit 171 for each group and each type of the Extelement.
In step S193, the audio file generation unit 172 generates the audio file in the segment structure of
In step S194, the MPD generation unit 173 generates the MPD file including the URL of the audio file and the like. The MPD generation unit 173 supplies the generated MPD file and the audio file to the server upload processing unit 174.
In step S195, the server upload processing unit 174 uploads the audio file and the MPD file supplied from the MPD generation unit 173 onto the web server 142. Then, the processing is terminated.
A streaming reproduction unit 190 of
The MPD acquisition unit 91 of the streaming reproduction unit 190 acquires the MPD file from the web server 142, and supplies the MPD file to the MPD processing unit 191.
The MPD processing unit 191 extracts the information of the URL of the audio file of the segment to be reproduced described in the “Segment” for the audio file, and the like, from the MPD file supplied from the MPD acquisition unit 91, and supplies the information to the audio file acquisition unit 192.
On the basis of the URL supplied from the MPD processing unit 191, the audio file acquisition unit 192 requests the web server 142 for, and acquires, the audio stream of the track to be reproduced in the audio file identified by the URL. The audio file acquisition unit 192 supplies the acquired audio stream to the audio decoding processing unit 194.
The audio decoding processing unit 194 decodes the audio stream supplied from the audio file acquisition unit 192. The audio decoding processing unit 194 supplies the audio data obtained as a result of the decoding to the audio synthesis processing unit 195. The audio synthesis processing unit 195 synthesizes the audio data supplied from the audio decoding processing unit 194, as needed, and outputs the audio data.
As described above, the audio file acquisition unit 192, the audio decoding processing unit 194, and the audio synthesis processing unit 195 function as a reproduction unit, and acquire and reproduce the audio stream of the track to be reproduced from the audio file stored in the web server 142.
In step S211 of
In step S212, the MPD processing unit 191 extracts the information of the URL of the audio file of the segment to be reproduced described in the “Segment” for the audio file, and the like, from the MPD file supplied from the MPD acquisition unit 91, and supplies the information to the audio file acquisition unit 192.
In step S213, on the basis of the URL supplied from the MPD processing unit 191, the audio file acquisition unit 192 requests the web server 142 for, and acquires, the audio stream of the track to be reproduced in the audio file identified by the URL. The audio file acquisition unit 192 supplies the acquired audio stream to the audio decoding processing unit 194.
In step S214, the audio decoding processing unit 194 decodes the audio stream supplied from the audio file acquisition unit 192.
The audio decoding processing unit 194 supplies the audio data obtained as a result of the decoding to the audio synthesis processing unit 195. In step S215, the audio synthesis processing unit 195 synthesizes the audio data supplied from the audio decoding processing unit 194, as needed, and outputs the audio data.
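For illustration only, steps S211 to S215 can be condensed into the following sketch; all helper callables are assumptions of this sketch, standing in for processing that this specification delegates to MPEG-DASH parsing and the audio codec.

```python
# A condensed sketch of steps S211 to S215. fetch(url, byte_range) is
# assumed to return the requested bytes; parse_mpd, byte_range_for_track,
# decode, and synthesize are likewise assumptions of this sketch.
def reproduce_track(fetch, parse_mpd, byte_range_for_track,
                    decode, synthesize, mpd_url: str, track_id: int) -> None:
    mpd = parse_mpd(fetch(mpd_url, None))        # S211, S212: MPD file and URL
    rng = byte_range_for_track(mpd, track_id)    # via the level / ssix boxes
    stream = fetch(mpd.audio_file_url, rng)      # S213: only this track's bytes
    synthesize(decode(stream))                   # S214, S215: decode and output
```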
Note that, in the above description, the GroupDefinition and the SwitchGroupDefinition are arranged in the sample entry. However, as illustrated in
In this case, as illustrated in
Further, the sample entry of the track of each of the groups becomes one illustrated in
The configuration of the track of the audio data of
That is, the sample entry of the base track of
Further, the sample entry of the group track is the sample entry with the 4cc of “mhg1”, which includes the syntax for group track of when the audio streams of the audio data, of the 3D audio, are divided into a plurality of tracks and arranged, similarly to
Further, similarly to
Note that the mhgC box and the mhsC box may be omitted from the sample entry of the group track. Further, in a case where the mhaC box including the config information of all of the groups of the 3D audio is described in the sample entry of the base track, the mhaC box may be omitted from the sample entry of the group track. However, in a case where the mhaC box including the config information that can independently reproduce the base track is described in the sample entry of the base track, the mhaC box including the config information that can independently reproduce the group track is described in the sample entry of the group track. Whether it is in the former state or the latter state can be recognized according to the existence/non-existence of the config information in the sample entry. Alternatively, the recognition can be made by describing a flag in the sample entry or by changing the type of the sample entry. Note that, although illustration is omitted, in a case of making the former state and the latter state recognizable by changing the type of the sample entry, the 4cc of the sample entry of the base track is “mha2” in the former state, and is “mha4” in the latter state.
The MPD file of
In the “SubRepresentation” of the base track, the “codecs”, the “level”, the “dependencyLevel” of the base track, and <EssentialProperty schemeIdUri=“urn:mpeg:DASH:3daudio:2014” value =“dataType,definition”> are described, similarly to the “SubRepresentation” of the group track.
In the example of
Note that, as illustrated in
In the example of
Further, “2” to “7” are set as numbers that respectively indicate, as kinds, “Dialog EN” that indicates the content of the group with the group ID “2”, “Dialog FR” that indicates the content of the group with the group ID “3”, “VoiceOver GE” that indicates the content of the group with the group ID “4”, “Effects” that indicates the content of the group with the group ID “5”, “Effect” that indicates the content of the group with the group ID “6”, and “Effect” that indicates the content of the group with the group ID “7”.
Therefore, in the “SubRepresentation” of the base track of
The configuration of the track of the audio data of
In the case of
Further, because the 4ccs of the sample entries are “mha2”, it can be recognized that the corresponding track is a track of when the audio streams of the audio data, of the 3D audio, are divided and arranged in a plurality of tracks.
Note that, in the mhaC box of the sample entry of the base track, the config information of all of the groups of the 3D audio or the config information that can independently reproduce the base track is described, similarly to the cases of
Meanwhile, in the sample entry of the group track, the mhas box is not arranged. Further, in a case where the mhaC box including the config information of all of the groups of the 3D audio is described in the sample entry of the base track, the mhaC box may be omitted from the sample entry of the group track. However, in a case where the mhaC box including the config information that can independently reproduce the base track is described in the sample entry of the base track, the mhaC box including the config information that can independently reproduce the group track is described in the sample entry of the group track. Whether it is in the former state or the latter state can be recognized according to the existence/non-existence of the config information in the sample entry. Alternatively, the former state and the latter state can be identified by describing a flag in the sample entry or by changing the type of the sample entry. Note that, although illustration is omitted, in a case of making the former state and the latter state recognizable by changing the type of the sample entry, the 4cc of the sample entry of the base track and the 4cc of the sample entry of the group track are, for example, “mha2” in the former state, and “mha4” in the latter state.
The MPD file of
Note that, although illustration is omitted, the AudioScene information may be divided and described in the “SubRepresentation” of the base track, similarly to the case of
The configuration of the tracks of the audio data of
In the case of
Therefore, similarly to the case of
As illustrated in
That is, in the sample entry with the 4cc of “mha3”, the mhaC box (MHAConfiguration Box), the mhas box (MHAAudioSceneInfo Box), the mhgC box (MHAGroupDefinition Box), the mhsC box (MHASwitchGroupDefinition Box), and the like are arranged.
In the mhaC box of the sample entry of the base track, the config information of all of the groups of the 3D audio or the config information that can independently reproduce the base track is described. Further, in the mhas box, the AudioScene information including the information related to all of the groups and the switch Group of the 3D audio is described, and the mhgC box and the mhsC box are not arranged.
In a case where the mhaC box including the config information of all of the groups of the 3D audio is described in the sample entry of the base track, the mhaC box may be omitted from the sample entry of the group track. However, in a case where the mhaC box including the config information that can independently reproduce the base track is described in the sample entry of the base track, the mhaC box including the config information that can independently reproduce the group track is described in the sample entry of the group track. Whether it is in the former state or the latter state can be recognized according to the existence/non-existence of the config information in the sample entry. Alternatively, the former state and the latter state can be recognized by describing a flag in the sample entry, or by changing the type of the sample entry. Note that, although illustration is omitted, in a case of making the former state and the latter state recognizable by changing the type of the sample entry, the 4ccs of the sample entries of the base track and the group track are, for example, “mha3” in the former state, and are “mha5” in the latter state. Further, the mhas box is not arranged in the sample entry of the group track. The mhgC box and the mhsC box may or may not be arranged.
Note that, as illustrated in
The MPD file of
Note that, although illustration is omitted, the AudioScene information may be divided and described in the “SubRepresentation” of the base track, similarly to the case of
Further, in the above description, the Track Reference is arranged in the track box in each of the tracks. However, the Track Reference may not be arranged. For example,
The MPD files of the cases where the configurations of the tracks of the audio file are the configurations of
The configuration of the tracks of the audio data of
To be specific, an mhmt box, which describes into which tracks the groups described in the AudioScene information are divided, is newly arranged in the sample entry with the 4cc of “mha2”, which includes the syntax for the base track of when the audio streams of the audio data, of the 3D audio, are divided into a plurality of tracks.
The configuration of the sample entry with the 4cc of “mha2” of
In the mhmt box, as the reference information, the correspondence between the group ID (group_ID) and the track ID (track_ID) is described. Note that, in the mhmt box, the audio element and the track ID may be described in association with each other.
In a case where the reference information is not changed in each sample, the reference information can be efficiently described by arranging the mhmt box in the sample entry.
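For illustration only, resolving a group to its track through the mhmt box can be sketched as a simple lookup; the dict form of the parsed correspondence table is an assumption of this sketch.

```python
# A sketch of the mhmt box's role: a correspondence table from group_ID to
# track_ID, consulted once (from the sample entry) when the reference
# information does not change for each sample. Values are hypothetical.
from typing import Dict

mhmt: Dict[int, int] = {1: 2, 2: 3, 3: 4, 4: 5}  # group_ID -> track_ID


def track_for_group(group_id: int) -> int:
    return mhmt[group_id]


print(track_for_group(2))  # 3
```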
Note that, although illustration is omitted, in the cases of
In this case, the syntax of the sample entry with the 4cc of “mha3” becomes one illustrated in
Further, in
Further, in
As illustrated in
As illustrated in
The “Representation” includes “codecs”, “id”, “associationId”, and “associationType”. The “id” is an ID of the “Representation” in which it is included. The “associationId” is information indicating a reference relationship between the corresponding track and another track, and is the “id” of the reference track. The “associationType” is a code indicating the meaning of the reference relationship (dependency) with the reference track, and for example, the same value as a value of the track reference of MP4 is used.
Further, the “Representations” of the tracks of the groups include <EssentialProperty schemeIdUri=“urn:mpeg:DASH:3daudio:2014” value=“dataType,definition”>. In the example of FIG. 39, the “Representations” that manage the segments of the audio files are provided under one “AdaptationSet”. However, an “AdaptationSet” may be provided for each of the segments of the audio files, and the “Representation” that manages the segment may be provided thereunder. In this case, in the “AdaptationSet”, the “associationId” and <EssentialProperty schemeIdUri=“urn:mpeg:DASH:3daudioAssociationData:2014” value=“dataType,id”> indicating the meaning of the reference relationship with the reference track, similarly to the “associationType”, may be described. Further, the AudioScene information, the GroupDefinition, and the SwitchGroupDefinition described in the “Representations” of the base track and the group tracks may be divided and described, similarly to the case of FIG. 25. Further, the AudioScene information, the GroupDefinition, and the SwitchGroupDefinition divided and described in the “Representations” may be described in the “AdaptationSets”.
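For illustration only, emitting “Representation” elements of the kind described above can be sketched as follows; the ids, the codecs string, and the associationType value "cbas" are assumptions of this sketch, not values fixed by the disclosure.

```python
# A sketch that emits MPD "Representation" elements carrying id, codecs,
# associationId, and associationType attributes as described above.
from xml.sax.saxutils import quoteattr


def representation_xml(rep_id: str, codecs: str,
                       association_id: str = "",
                       association_type: str = "") -> str:
    attrs = [f"id={quoteattr(rep_id)}", f"codecs={quoteattr(codecs)}"]
    if association_id:
        attrs.append(f"associationId={quoteattr(association_id)}")
        attrs.append(f"associationType={quoteattr(association_type)}")
    return f"<Representation {' '.join(attrs)}/>"


# Hypothetical group-track Representation referring to a base track "bs".
print(representation_xml("g1", "mhg1",
                         association_id="bs", association_type="cbas"))
```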
The same configurations, of configurations illustrated in
An information processing system 210 of
In the information processing system 210, the web server 212 distributes an audio stream of the audio file of the group to be reproduced to the moving image reproduction terminal 214 by a method conforming to MPEG-DASH.
To be specific, the file generation device 211 encodes audio data and metadata of the 3D audio of moving image content at a plurality of encoding speeds to generate the audio streams. The file generation device 211 divides the audio streams for each group and each type of Extelement to have the audio streams in different tracks. The file generation device 211 makes files of the audio streams at each encoding speed, for each segment, and for each track, to generate the audio files. The file generation device 211 uploads the audio files obtained as a result onto the web server 212. Further, the file generation device 211 generates an MPD file and uploads the MPD file onto the web server 212.
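For illustration only, the one-file-per-track arrangement (each combination of encoding speed, segment, and track becomes its own audio file) can be sketched as follows; the file-naming scheme is an assumption of this sketch.

```python
# A sketch of enumerating the audio files the file generation device 211
# produces: one file per (encoding speed, segment, track) combination.
from itertools import product
from typing import Iterable, List


def audio_file_names(bitrates: Iterable[int], n_segments: int,
                     track_ids: Iterable[int]) -> List[str]:
    return [f"audio_{bps}bps_seg{seg}_track{tid}.mp4"
            for bps, seg, tid in product(bitrates, range(n_segments), track_ids)]


# Hypothetical example: one encoding speed, two segments, two tracks.
print(audio_file_names([128_000], 2, [1, 2]))
```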
The web server 212 stores the audio files at each encoding speed, for each segment, and for each track, and the MPD file uploaded from the file generation device 211. The web server 212 transmits the stored audio files, the stored MPD file, and the like to the moving image reproduction terminal 214, in response to a request from the moving image reproduction terminal 214.
The moving image reproduction terminal 214 executes control software 221, moving image reproduction software 162, access software 223, and the like.
The control software 221 is software that controls data streamed from the web server 212. To be specific, the control software 221 causes the moving image reproduction terminal 214 to acquire the MPD file from the web server 212.
Further, on the basis of the MPD file, the control software 221 commands the access software 223 to send a transmission request for the audio stream of the audio file of the group to be reproduced, which is specified by the moving image reproduction software 162, and of the type of the Extelement corresponding to the group.
The access software 223 is software that controls communication between the moving image reproduction terminal 214 and the web server 212 through the Internet 13 using the HTTP. To be specific, the access software 223 causes the moving image reproduction terminal 214 to transmit a transmission request of the audio stream of the audio file to be reproduced in response to the command of the control software 221. Further, the access software 223 causes the moving image reproduction terminal 214 to start reception of the audio stream transmitted from the web server 212, in response to the transmission request, and supplies notification of the reception start to the moving image reproduction software 162.
The same configurations, of configurations illustrated in
The configuration of the file generation device 211 of
To be specific, the audio file generation unit 241 of the file generation device 211 allocates a track to the audio stream supplied from the audio encoding processing unit 171 for each group and each type of the Extelement. The audio file generation unit 241 generates the audio file in which the audio stream is arranged, at each encoding speed, for each segment, and for each track. The audio file generation unit 241 supplies the generated audio files to the MPD generation unit 242.
The MPD generation unit 242 determines the URL of the web server 212 in which the audio files supplied from the audio file generation unit 241 are to be stored, and the like. The MPD generation unit 242 generates the MPD file in which the URLs of the audio files and the like are arranged in the “Segments” of the “Representations” for the audio files. The MPD generation unit 242 supplies the generated MPD file and the generated audio files to the server upload processing unit 174.
Processing of steps S301 and S302 of
In step S303, the audio file generation unit 241 generates the audio file in which the audio stream is arranged at each encoding speed, for each segment, and for each track. The audio file generation unit 241 supplies the generated audio files to the MPD generation unit 242.
Processing of steps S304 and S305 is similar to the processing of steps S194 and S195 of
The same configurations, of configurations illustrated in
The configuration of a streaming reproduction unit 260 of
On the basis of the URL of the audio file of the track to be reproduced, of the URLs supplied from the MPD processing unit 191, the audio file acquisition unit 264 requests the web server 212 for, and acquires, the audio stream of that audio file. The audio file acquisition unit 264 supplies the acquired audio stream to the audio decoding processing unit 194.
That is, the audio file acquisition unit 264, the audio decoding processing unit 194, and the audio synthesis processing unit 195 function as a reproduction unit, and acquire the audio stream of the audio file of the track to be reproduced, from the audio files stored in the web server 212 and reproduce the audio stream.
Processing of steps S321 and S322 of
In step S323, on the basis of the URL of the audio file of the track to be reproduced, of the URLs supplied from the MPD processing unit 191, the audio file acquisition unit 264 requests the web server 212 for, and acquires, the audio stream of that audio file. The audio file acquisition unit 264 supplies the acquired audio stream to the audio decoding processing unit 194.
Processing of steps S324 and S325 is similar to the processing of steps S214 and S215 of
Note that, in the second embodiment, the GroupDefinition and the SwitchGroupDefinition may also be arranged in the sample group entry, similarly to the first embodiment.
Further, in the second embodiment, the configurations of the track of the audio data can also be the configurations illustrated in
The MPD of
Further, the MPD of
Further, the MPD of
Note that, in the MPD of
In the above description, only one base track is provided. However, a plurality of the base tracks may be provided. In this case, the base track is provided for each viewpoint of the 3D audio (details will be given below), for example, and in the base tracks, mhaC boxes including config information of all of the groups of the 3D audio of the viewpoints are arranged. Note that, in the base tracks, mhas boxes including the AudioScene information of the viewpoints may be arranged.
The viewpoint of the 3D audio is a position where the 3D audio can be heard, such as a viewpoint of an image reproduced at the same time with the 3D audio or a predetermined position set in advance.
As described above, in a case where the base track is divided for each viewpoint, audio different for each viewpoint can be reproduced from the audio stream of the same 3D audio on the basis of the position of an object on a screen and the like included in the config information of each of the viewpoints. As a result, a data amount of the audio streams of the 3D audio can be reduced.
That is, in a case where the viewpoints of the 3D audio are a plurality of viewpoints of images of a baseball stadium, which can be reproduced at the same time with the 3D audio, an image having a viewpoint at a center back screen is prepared as a main image that is an image of a basic viewpoint. Further, images having viewpoints at a seat behind the plate, a first-base infield bleacher seat, a third-base infield bleacher seat, a left outfield bleacher seat, a right outfield bleacher seat, and the like are prepared as multi-images that are images of viewpoints other than the basic viewpoint.
In this case, if the 3D audio of all of the viewpoints is prepared, the data amount of the 3D audio becomes large. Therefore, by describing, in the base tracks, the positions of the object on the screen in the respective viewpoints, and the like, the audio streams such as the Object audio and the SAOC Object audio, which change according to the position of the object on the screen, can be shared by the viewpoints. As a result, the data amount of the audio streams of the 3D audio can be reduced.
At the time of reproduction of the 3D audio, for example, different audio is reproduced according to the viewpoint, using the audio streams such as the Object audio and the SAOC Object audio of the basic viewpoint, and the base track corresponding to the viewpoint of the main image or the multi-image reproduced at the same time with the audio stream.
Similarly, for example, in a case where the viewpoints of the 3D audio are the positions of a plurality of seats of a stadium set in advance, the data amount of the 3D audio becomes large if the 3D audio of all of the viewpoints is prepared. Therefore, by describing, in the base tracks, the positions of the object on the screen in the respective viewpoints, the audio streams such as the Object audio and the SAOC Object audio can be shared by the viewpoints. As a result, different audio can be reproduced according to the seat selected by the user using a seating chart, using the Object audio and the SAOC Object audio of one viewpoint, and the data amount of the audio streams of the 3D audio can be reduced.
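For illustration only, per-viewpoint reproduction with shared Object audio can be sketched as follows; the field names and the (x, y) position representation are assumptions of this sketch.

```python
# A sketch of combining one shared set of Object audio streams with the
# config information (object positions on the screen) of whichever base
# track matches the selected viewpoint.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BaseTrackConfig:
    viewpoint: str
    object_positions: Dict[int, Tuple[float, float]]  # object ID -> (x, y)


def render_for_viewpoint(object_audio: Dict[int, bytes],
                         configs: Dict[str, BaseTrackConfig],
                         viewpoint: str) -> List[Tuple[Tuple[float, float], bytes]]:
    cfg = configs[viewpoint]  # the base track of the selected viewpoint
    # Each shared object stream is placed at this viewpoint's position.
    return [(cfg.object_positions[oid], audio)
            for oid, audio in object_audio.items()]
```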
In a case where the base track is provided for each viewpoint of the 3D audio in the track structure of
In this case, three base tracks are provided for each viewpoint of the 3D audio, as illustrated in
The mhaC box including config information of all of the groups of the 3D audio of each of the viewpoints is arranged in the sample entry of each of the base tracks. The config information of all of the groups of the 3D audio of each of the viewpoints is, for example, the position of the object on the screen in the viewpoint. Further, the mhas box including the AudioScene information of each of the viewpoints is arranged in each of the base tracks.
The audio streams of the groups of the Channel audio of the viewpoints are arranged in samples of the base tracks.
Note that, in a case where Object Metadata that describes the position of the object on the screen, in each of the viewpoints, in units of a sample, exists, the Object Metadata is also arranged in the sample of each of the base tracks.
That is, in a case where the object is a moving body (for example, a sport athlete), the position of the object on the screen in each of the viewpoints is temporally changed. Therefore, the position is described as Object Metadata in units of the sample. In this case, the Object Metadata in units of the sample is arranged, for each viewpoint, in the sample of the base track corresponding to the viewpoint.
The configurations of the group tracks of
Note that, in the track structure of
In the example of
Further, the audio stream of the group of the Channel audio of the viewpoint corresponding to the base track with the track ID of “3” is arranged in the group track with the track ID of “6”.
Note that, in the examples of
Further, although illustration is omitted, a case in which the base track is provided for each viewpoint of the 3D audio in all of the above-described track structures other than the track structure of
The series of processing of the web server 142 (212) can be executed by hardware or can be executed by software. In a case of executing the series of processing by software, a program that configures the software is installed on a computer. Here, the computer includes a computer incorporated in dedicated hardware, a general-purpose personal computer that can execute various functions by installing various programs, and the like.
In the computer, a central processing unit (CPU) 601, a read only memory (ROM) 602, and a random access memory (RAM) 603 are mutually connected by a bus 604.
An input/output interface 605 is further connected to the bus 604. An input unit 606, an output unit 607, a storage unit 608, a communication unit 609, and a drive 610 are connected to the input/output interface 605.
The input unit 606 is made of a keyboard, a mouse, a microphone, and the like. The output unit 607 is made of a display, a speaker, and the like. The storage unit 608 is made of a hard disk, a non-volatile memory, and the like. The communication unit 609 is made of a network interface, and the like. The drive 610 drives a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
In the computer configured as described above, the CPU 601 loads the program stored in the storage unit 608 onto the RAM 603 through the input/output interface 605 and the bus 604, and executes the program, so that the series of processing is performed.
The program executed by the computer (CPU 601) can be provided by being recorded in the removable medium 611 as a package medium, for example. Further, the program can be provided through a wired or wireless transmission medium, such as a local area network, the Internet, or digital satellite broadcasting.
In the computer, the program can be installed to the storage unit 608 through the input/output interface 605 by attaching the removable medium 611 to the drive 610. Further, the program can be received by the communication unit 609 through a wired or wireless transmission medium, and installed to the storage unit 608. In addition, the program can be installed to the ROM 602 or the storage unit 608 in advance.
Note that the program executed by the computer may be a program processed in time series according to the order described in the present specification, or may be a program processed in parallel or at necessary timing such as when called.
Further, the hardware configuration of the moving image reproduction terminal 144 (214) can have a similar configuration to the computer of
In the present specification, a system means a collective of a plurality of configuration elements (devices, modules (components), and the like), and all of the configuration elements may or may not be in the same casing. Therefore, both of a plurality of devices accommodated in separate casings and connected via a network, and a single device in which a plurality of modules are accommodated in a single casing are the systems.
Note that embodiments of the present disclosure are not limited to the above-described embodiments, and various changes can be made without departing from the spirit and scope of the present disclosure.
Further, the present disclosure can be applied to an information processing system that performs broadcasting or local storage reproduction, instead of streaming reproduction.
In the embodiments of the MPD, the information is described using EssentialProperty, which has a descriptor definition that can be ignored when the content described by the schema cannot be understood. However, the information may be described using SupplementalProperty, which has a descriptor definition that allows reproduction even if the content described by the schema cannot be understood. Which description method to use is intentionally selected by the side that creates the content.
Further, the present disclosure can also employ configurations as described below.
(1)
An information processing apparatus including:
a file generation unit configured to generate a file in which audio data of a plurality of kinds is divided into tracks for each one or more of the kinds and arranged, and information related to the plurality of kinds is arranged.
(2)
The information processing apparatus according to (1), wherein
the information related to the plurality of kinds is arranged in a sample entry of a predetermined track.
(3)
The information processing apparatus according to (2), wherein
the predetermined track is one of the tracks in which the audio data of a plurality of kinds is divided and arranged.
(4)
The information processing apparatus according to any one of (1) to (3), wherein,
for each of the tracks, information related to the kind corresponding to the track is arranged in the file.
(5)
The information processing apparatus according to (4), wherein,
for each of the tracks, information related to an exclusive reproduction kind made of the kind corresponding to the track, and the kind corresponding to the audio data exclusively reproduced from the audio data of the kind corresponding to the track, is arranged in the file.
(6)
The information processing apparatus according to (5), wherein
the information related to the kind corresponding to the track and the information related to the exclusive reproduction kind are arranged in a sample entry of the corresponding track.
(7)
The information processing apparatus according to (5) or (6), wherein
the file generation unit generates a management file that manages the file, the management file including information indicating whether the information related to the exclusive reproduction kind exists for each of the tracks.
(8)
The information processing apparatus according to any one of (1) to (7), wherein
reference information to the tracks corresponding to the plurality of kinds is arranged in the file.
(9)
The information processing apparatus according to (8), wherein
the reference information is arranged in a sample of a predetermined track.
(10)
The information processing apparatus according to (9), wherein
the predetermined track is one of the tracks in which the audio data of a plurality of kinds is divided and arranged.
(11)
The information processing apparatus according to any one of (1) to (10), wherein
information indicating reference relationship among the tracks is arranged in the file.
(12)
The information processing apparatus according to any one of (1) to (11), wherein
the file generation unit generates a management file that manages the file, the management file including information indicating reference relationship among the tracks.
(13)
The information processing apparatus according to any one of (1) to (12), wherein
the file is one file.
(14)
The information processing apparatus according to any one of (1) to (12), wherein
the file is a file of each of the tracks.
(15)
An information processing method including the step of:
by an information processing apparatus, generating a file in which audio data of a plurality of kinds is divided into tracks for each one or more of the kinds and arranged, and information related to the plurality of kinds is arranged.
(16)
An information processing apparatus including:
a reproduction unit configured to reproduce, from a file in which audio data of a plurality of kinds is divided into tracks for each one or more of the kinds and arranged, and information related to the plurality of kinds is arranged, the audio data of a predetermined track.
(17)
An information processing method including the step of:
by an information processing apparatus, reproducing, from a file in which audio data of a plurality of kinds is divided into tracks for each one or more of the kinds and arranged, and information related to the plurality of kinds is arranged, the audio data of a predetermined track.
Number | Date | Country | Kind |
---|---|---|---
2014-134878 | Jun 2014 | JP | national |
2015-107970 | May 2015 | JP | national |
2015-109838 | May 2015 | JP | national |
2015-119359 | Jun 2015 | JP | national |
2015-121336 | Jun 2015 | JP | national |
2015-124453 | Jun 2015 | JP | national |
This application is a U.S. National Phase of International Patent Application No. PCT/JP2015/068751 filed on Jun. 30, 2015, which claims priority benefit of Japanese Patent Application No. JP 2014-134878 filed in the Japan Patent Office on Jun. 30, 2014, Japanese Patent Application No. JP 2015-107970 filed in the Japan Patent Office on May 27, 2015, Japanese Patent Application No. JP 2015-109838 filed in the Japan Patent Office on May 29, 2015, Japanese Patent Application No. JP 2015-119359 filed in the Japan Patent Office on Jun. 12, 2015, Japanese Patent Application No. JP 2015-121336 filed in the Japan Patent Office on Jun. 16, 2015 and also claims priority benefit of Japanese Patent Application No. JP 2015-124453 filed in the Japan Patent Office on Jun. 22, 2015. Each of the above-referenced applications is hereby incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---
PCT/JP2015/068751 | 6/30/2015 | WO | 00 |