This patent application is a U.S. National Stage application of International Patent Application Number PCT/FI2021/050089 filed Feb. 10, 2021, which is hereby incorporated by reference in its entirety, and claims priority to GB 2002900.5 filed Feb. 28, 2020.
The present application relates to apparatus and methods for sound-field related audio representation and associated rendering, but not exclusively for audio representation for an audio encoder and decoder.
Immersive audio codecs are being implemented supporting a multitude of operating points ranging from a low bit rate operation to transparency. An example of such a codec is the immersive voice and audio services (IVAS) codec which is being designed to be suitable for use over a communications network such as a 3GPP 4G/5G network. Such immersive services include uses for example in immersive voice and audio for applications such as virtual reality (VR), augmented reality (AR) and mixed reality (MR). This audio codec is expected to handle the encoding, decoding and rendering of speech, music and generic audio. It is furthermore expected to support channel-based audio and scene-based audio inputs including spatial information about the sound field and sound sources. The codec is also expected to operate with low latency to enable conversational services as well as support high error robustness under various transmission conditions.
Furthermore, parametric spatial audio processing is a field of audio signal processing where the spatial aspect of the sound is described using a set of parameters. For example, in parametric spatial audio capture from microphone arrays, it is a typical and effective choice to estimate from the microphone array signals a set of parameters such as directions of the sound in frequency bands, and the ratios between the directional and non-directional parts of the captured sound in frequency bands. These parameters are known to describe well the perceptual spatial properties of the captured sound at the position of the microphone array. These parameters can accordingly be utilized in the synthesis of spatial sound, binaurally for headphones, for loudspeakers, or in other formats such as Ambisonics.
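Purely as an illustration (not part of any codec specification), the following Python sketch shows one way such per-band parameters could be represented, and how the directional-to-total ratio could be used to split a band's energy into directional and diffuse parts during synthesis; all class, field and function names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BandParameters:
    """Hypothetical per-frequency-band spatial parameters (illustrative only)."""
    azimuth_deg: float            # estimated direction of arrival in this band
    elevation_deg: float
    direct_to_total_ratio: float  # 0.0 = fully diffuse, 1.0 = fully directional

def split_energy(band: BandParameters, band_energy: float):
    """Split a band's energy into directional and diffuse parts using the ratio.

    This only mirrors the idea that the ratio between the directional and
    non-directional sound is estimated per band; actual synthesis is more involved.
    """
    direct = band_energy * band.direct_to_total_ratio
    diffuse = band_energy - direct
    return direct, diffuse

band = BandParameters(azimuth_deg=30.0, elevation_deg=0.0, direct_to_total_ratio=0.8)
print(split_energy(band, band_energy=1.0))  # roughly (0.8, 0.2)
```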
There is provided according to a first aspect an apparatus comprising means configured to: receive at least a first audio data stream and a second audio data stream, wherein at least one of the first and second audio data streams comprises a spatial audio stream to enable immersive audio during a communication; determine a type of each of the first and second audio data streams to identify which of the received first and second audio data streams comprises the spatial audio stream; process the second audio data stream with at least one parameter dependent on the determined type; and render the first audio data stream and the processed second audio data stream.
The second audio data stream may be configured to comprise at least one further audio data stream, and wherein the at least one further audio data stream may comprise a determined type, and the at least one further audio data stream may be an embedded level audio data stream with respect to the second audio data stream.
The at least one further audio data stream may comprise at least one further embedded level, wherein each embedded level may comprise at least one additional audio data stream with a determined type.
The second audio data stream may be a master level audio data stream.
Each audio data stream may be further associated with at least one of: a stream identifier configured to uniquely identify the audio data stream; and a stream descriptor configured to describe the type of the audio data stream.
The type may be one of: a mono audio signal type; an immersive voice and audio services audio signal type.
The at least one parameter may be configured to define a room characteristic or scene description.
The at least one parameter defining a room characteristic or scene description may comprise at least one of: direction; direction azimuth; direction elevation; distance; gain; spatial extent; energy ratio; and position.
The means may be further configured to: receive an additional audio data stream; embed the additional audio data stream within one or other of the first audio data stream and the second audio data stream.
According to a second aspect there is provided a method for an apparatus, the method comprising: receiving at least a first audio data stream and a second audio data stream, wherein at least one of the first and second audio data streams comprises a spatial audio stream to enable immersive audio during a communication; determining a type of each of the first and second audio data streams to identify which of the received first and second audio data streams comprises the spatial audio stream; processing the second audio data stream with at least one parameter dependent on the determined type; and rendering the first audio data stream and the processed second audio data stream.
The second audio data stream may be configured to comprise at least one further audio data stream, and wherein the at least one further audio data stream may comprise a determined type, and the at least one further audio data stream may be an embedded level audio data stream with respect to the second audio data stream.
The at least one further audio data stream may comprise at least one further embedded level, wherein each embedded level may comprise at least one additional audio data stream with a determined type.
The second audio data stream may be a master level audio data stream.
Each audio data stream may be further associated with at least one of: a stream identifier configured to uniquely identify the audio data stream; and a stream descriptor configured to describe the type of the audio data stream.
The type may be one of: a mono audio signal type; an immersive voice and audio services audio signal type.
The at least one parameter may be configured to define a room characteristic or scene description.
The at least one parameter defining a room characteristic or scene description may comprise at least one of: direction; direction azimuth; direction elevation; distance; gain; spatial extent; energy ratio; and position.
The method may further comprise: receiving an additional audio data stream; embedding the additional audio data stream within one or other of the first audio data stream and the second audio data stream.
According to a third aspect there is provided an apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: receive at least a first audio data stream and a second audio data stream, wherein at least one of the first and second audio data streams comprises a spatial audio stream to enable immersive audio during a communication; determine a type of each of the first and second audio data streams to identify which of the received first and second audio data streams comprises the spatial audio stream; process the second audio data stream with at least one parameter dependent on the determined type; and render the first audio data stream and the processed second audio data stream.
The second audio data stream may be configured to comprise at least one further audio data stream, and wherein the at least one further audio data stream may comprise a determined type, and the at least one further audio data stream may be an embedded level audio data stream with respect to the second audio data stream.
The at least one further audio data stream may comprise at least one further embedded level, wherein each embedded level may comprise at least one additional audio data stream with a determined type.
The second audio data stream may be a master level audio data stream.
Each audio data stream may be further associated with at least one of: a stream identifier configured to uniquely identify the audio data stream; and a stream descriptor configured to describe the type of the audio data stream.
The type may be one of: a mono audio signal type; an immersive voice and audio services audio signal type.
The at least one parameter may be configured to define a room characteristic or scene description.
The at least one parameter defining a room characteristic or scene description may comprise at least one of: direction; direction azimuth; direction elevation; distance; gain; spatial extent; energy ratio; and position.
The apparatus may be further caused to: receive an additional audio data stream; embed the additional audio data stream within one or other of the first audio data stream and the second audio data stream.
According to a fourth aspect there is provided an apparatus comprising: receiving circuitry configured to receive at least a first audio data stream and a second audio data stream, wherein at least one of the first and second audio data streams comprises a spatial audio stream to enable immersive audio during a communication; determining circuitry configured to determine a type of each of the first and second audio data streams to identify which of the received first and second audio data streams comprises the spatial audio stream; processing circuitry configured to process the second audio data stream with at least one parameter dependent on the determined type; and rendering circuitry configured to render the first audio data stream and the processed second audio data stream.
According to a fifth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising program instructions] for causing an apparatus to perform at least the following: receive at least a first audio data stream and a second audio data stream, wherein at least one of the first and second audio data streams comprises a spatial audio stream to enable immersive audio during a communication; determine a type of each of the first and second audio data streams to identify which of the received first and second audio data streams comprises the spatial audio stream; process the second audio data stream with at least one parameter dependent on the determined type; and render the first audio data stream and the processed second audio data stream.
According to a sixth aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus to perform at least the following: receive at least a first audio data stream and a second audio data stream, wherein at least one of the first and second audio data streams comprises a spatial audio stream to enable immersive audio during a communication; determine a type of each of the first and second audio data streams to identify which of the received first and second audio data streams comprises the spatial audio stream; process the second audio data stream with at least one parameter dependent on the determined type; and render the first audio data stream and the processed second audio data stream.
According to a seventh aspect there is provided an apparatus comprising: means for receiving at least a first audio data stream and a second audio data stream, wherein at least one of the first and second audio data streams comprises a spatial audio stream to enable immersive audio during a communication; means for determining a type of each of the first and second audio data streams to identify which of the received first and second audio data streams comprises the spatial audio stream; means for processing the second audio data stream with at least one parameter dependent on the determined type; and means for rendering the first audio data stream and the processed second audio data stream.
According to an eighth aspect there is provided a computer readable medium comprising program instructions for causing an apparatus to perform at least the following: receive at least a first audio data stream and a second audio data stream, wherein at least one of the first and second audio data streams comprises a spatial audio stream to enable immersive audio during a communication; determine a type of each of the first and second audio data streams to identify which of the received first and second audio data streams comprises the spatial audio stream; process the second audio data stream with at least one parameter dependent on the determined type; and render the first audio data stream and the processed second audio data stream.
An apparatus comprising means for performing the actions of the method as described above.
An apparatus configured to perform the actions of the method as described above.
A computer program comprising program instructions for causing a computer to perform the method as described above.
A computer program product stored on a medium may cause an apparatus to perform the method as described herein.
An electronic device may comprise apparatus as described herein.
A chipset may comprise apparatus as described herein.
Embodiments of the present application aim to address problems associated with the state of the art.
For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
The following describes in further detail suitable apparatus and possible mechanisms to embed spatial stream(s) as object stream(s) and send the spatial stream as-is as an object to the receiving participant. The object metadata is updated based on the spatial scene. In other words, the content of the object stream is itself another audio stream with respective object metadata generated by a processing element. This operation can be performed by a suitable device (e.g., mobile, user equipment—UE) that receives more than one input format or, for example, a conference bridge (e.g. multi-point control unit—MCU).
The invention relates to immersive audio codecs capable of supporting many input audio formats, immersive audio scene representations, and services where incoming encoded audio may be, e.g., mixed, re-encoded and/or forwarded to listeners.
The IVAS codec discussed above is an extension of the 3GPP EVS codec and intended for new real-time immersive voice and audio services over 4G/5G. Such immersive services include, e.g., immersive voice and audio for virtual reality (VR) and augmented reality (AR). The multi-purpose audio codec is expected to handle the encoding, decoding and rendering of speech, music and generic audio. It is expected to support channel-based audio and scene-based audio inputs including spatial information about the sound field and sound sources. It is also expected to operate with low latency to enable conversational services as well as support high error robustness under various transmission conditions.
The IVAS encoder is configured to be able to receive an input in a supported format (and in some allowed combination of the formats). Similarly, it is expected that the decoder can output the audio in a number of supported formats. A pass-through mode has been proposed, where the audio could be provided in its original format after transmission (encoding/decoding).
Methods have been proposed in which object-based audio is an accepted input format for an IVAS codec configured to process spatial metadata combined with suitable (mono) audio signal(s), which can then be rendered to the user. The metadata parameters can be, for example, captured from a real environment with the help of any visual or auditory tracking method, or any other modality. In some embodiments radio-based technology can be used to generate metadata; e.g., Bluetooth, Wi-Fi or GPS locator technologies can be used to obtain object coordinates. Orientation data can be received in some embodiments using sensors such as a magnetometer, accelerometer and/or gyrometer. Also other sensors such as proximity sensors can be used to generate scene-relevant metadata from the real environment.
Alternatively, the metadata can be created artificially according to the defined virtual scene, for example, by a teleconferencing bridge or by a user equipment (e.g., smartphone). For example, a user may set or indicate some desired acoustic features via a suitable UI.
In some embodiments the object-based audio spatial metadata can be defined as one or more objects where each object may be defined by parameters such as Azimuth, Elevation, Distance, Gain and Spatial Extent.
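As a non-normative sketch only, such per-object parameters could be grouped as follows; the field names and units are assumptions for illustration, not the IVAS metadata syntax.

```python
from dataclasses import dataclass

@dataclass
class ObjectSpatialMetadata:
    """Illustrative container for the object parameters named above (assumed units)."""
    azimuth_deg: float         # horizontal direction of the object
    elevation_deg: float       # vertical direction of the object
    distance_m: float          # distance from the reference listening position
    gain: float                # linear gain applied to the object
    spatial_extent_deg: float  # apparent width of the object

# Example: a talker slightly to the left, two metres away.
talker = ObjectSpatialMetadata(azimuth_deg=25.0, elevation_deg=0.0,
                               distance_m=2.0, gain=1.0, spatial_extent_deg=10.0)
```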
Furthermore, Metadata-assisted spatial audio (MASA) is a parametric spatial audio format and representation. At a high level, it can be considered a representation consisting of ‘N channels+spatial metadata’. It is a scene-based audio format particularly suited for spatial audio capture on practical devices, such as smartphones, where spherical arrays for FOA/HOA capture are not realistic or convenient. The idea is to describe the sound scene in terms of time- and frequency-varying sound source directions. Where no directional sound source is detected, the audio is described as diffuse. In MASA (as currently proposed for IVAS), there can be one or two directions for each time-frequency (TF) tile. The spatial metadata is described relative to the directions and can include, e.g., spatial metadata for each direction and common spatial metadata that is independent of the directions.
For example, the spatial metadata relative to the directions may comprise parameters such as a direction index, a direct-to-total energy ratio, a spread coherence, and a distance. The spatial metadata that is independent of the directions may comprise parameters such as a diffuse-to-total energy ratio, a surround coherence, and a remainder-to-total energy ratio.
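The split into direction-dependent and direction-independent parameters per time-frequency tile could be pictured as in the following sketch; this illustrates only the grouping described above, and the names, types and the limit of two directions are assumptions rather than the MASA specification.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MasaDirectionParams:
    """Spatial metadata tied to one estimated direction in a TF tile."""
    direction_index: int
    direct_to_total_energy_ratio: float
    spread_coherence: float
    distance: float

@dataclass
class MasaTileMetadata:
    """Spatial metadata for one time-frequency tile (one or two directions)."""
    directions: List[MasaDirectionParams] = field(default_factory=list)
    # Common spatial metadata, independent of the directions:
    diffuse_to_total_energy_ratio: float = 0.0
    surround_coherence: float = 0.0
    remainder_to_total_energy_ratio: float = 0.0

tile = MasaTileMetadata(
    directions=[MasaDirectionParams(direction_index=123,
                                    direct_to_total_energy_ratio=0.7,
                                    spread_coherence=0.1,
                                    distance=2.0)],
    diffuse_to_total_energy_ratio=0.3,
)
```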
An example use case for IVAS is AR/VR teleconferencing. There, each participant may have his/her own object, which can be freely panned in 3D space. In the teleconferencing scenario the conference bridge may, for example, receive several IVAS streams from multiple participants. These streams are then combined into a common stream, for example, using objects for at least each active participant. Alternatively, a pre-rendered spatial scene may be created and, for example, represented in MASA or FOA/HOA audio formats. If objects are used, the incoming object or other mono stream (for example an EVS stream) can be directly copied to be an object stream of the out-going common conference stream by attaching a suitable metadata representation to the waveform. This may or may not include a re-encoding of the audio waveform. However, if the participant is sending a spatial audio stream such as MASA or HOA, the conference bridge then has to decode all incoming IVAS streams and reduce the stream(s) to mono before sending them downstream as (mono) audio objects.
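The bridging limitation described above can be sketched as follows: with mono-only objects, mono/object inputs can be forwarded essentially as-is, whereas spatial inputs force a decode and a mono downmix at the bridge. The function names and the placeholder decode/downmix/encode steps below are hypothetical stand-ins, not an actual bridge implementation.

```python
from dataclasses import dataclass

@dataclass
class IncomingStream:
    participant: str
    is_spatial: bool   # e.g. MASA or HOA rather than a mono/object stream
    payload: bytes     # encoded audio as received

def bridge_conventional(streams):
    """Conventional mono-object bridging: spatial inputs cost a decode + downmix."""
    outgoing_objects = []
    for s in streams:
        if s.is_spatial:
            # Placeholder steps standing in for a full decode, mono downmix, re-encode.
            pcm = decode(s.payload)
            mono = downmix_to_mono(pcm)
            outgoing_objects.append(("object", s.participant, encode_mono(mono)))
        else:
            # Mono/object input: attach object metadata and forward as-is.
            outgoing_objects.append(("object", s.participant, s.payload))
    return outgoing_objects

# Placeholder codec operations so the sketch runs; real processing is codec-specific.
def decode(payload): return payload
def downmix_to_mono(pcm): return pcm
def encode_mono(pcm): return pcm

print(bridge_conventional([IncomingStream("participant-A", False, b"evs"),
                           IncomingStream("participant-B", True, b"masa")]))
```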
A further use case is one where a user is capturing a scene (for example making a live podcast video) with a mobile device on a fixed stand that has spatial audio capture enabled. Additionally, a headset or some other form of close-up microphone can be used to enhance a voice recording. The close-up capture device is also capable of capturing spatial audio, for example with binaural capture from the headset or MASA from a spatial-audio-capable lavalier microphone. The close-up captured voice may then be added to the device-captured IVAS spatial audio stream as an object stream. The object location and distance can be conveniently captured, for example, using a suitable location beacon attached to the close-up capture device. When only a mono object is allowed in IVAS, the device has to down-mix the spatial stream coming from the close-up capture to mono before embedding it into the IVAS stream. The embodiments as described herein attempt to avoid or minimise added latency and complexity and furthermore attempt to increase the maximum achievable quality.
Some embodiments as described herein thus increase the flexibility for various IVAS audio inputs in audio source mixing and forwarding, for example in AR/VR teleconferencing and other immersive use cases.
Additionally, in some embodiments there is substantially less delay and complexity, since generating down-mixed spatial stream(s) at the AR/VR conference bridge or the capturing device is avoided. Furthermore, there is no loss of the original input properties and no quality loss from converting audio formats.
In some embodiments the decoder is configured to have an interface output format, a so-called pass-through mode, so that external renderers with more capability than the normal integrated renderer can act as the output mode.
With respect to
The system as shown in
Thus, in some embodiments the (IVAS) object stream can be configured to comprise another “objectified” (IVAS) data stream. Furthermore, object metadata is configured to contain information on whether the object is a (mono) object-based audio representation (e.g., an EVS stream with spatial metadata) or a full IVAS spatial stream (e.g., MASA or stereo, or even an object-containing IVAS stream) that can be given object-like metadata (e.g., positional metadata). In such embodiments any “objectified” (IVAS) data stream may contain another (IVAS) object. These (IVAS) objects can be moved around to be part of any other (IVAS) object or the “main” (IVAS) data stream. Any object metadata is then updated so that it stays meaningful for the whole newly formed IVAS stream. Furthermore, in some embodiments the rest of the object metadata fields are updated according to the spatial scene description.
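One way to picture such an “objectified” stream is as a small recursive structure in which each object carries object-like metadata plus either a mono payload or a complete embedded stream. This is an illustrative model only; the names and fields are assumptions and do not represent the IVAS bitstream syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ObjectMetadata:
    azimuth_deg: float = 0.0
    elevation_deg: float = 0.0
    distance_m: float = 1.0
    gain: float = 1.0

@dataclass
class AudioObject:
    metadata: ObjectMetadata
    mono_payload: Optional[bytes] = None             # mono object-based audio (e.g. EVS + metadata)
    embedded_stream: Optional["AudioStream"] = None  # or a full embedded spatial stream

@dataclass
class AudioStream:
    """A 'master' or embedded stream: its own spatial payload plus any objects."""
    spatial_payload: Optional[bytes] = None          # e.g. a MASA/stereo representation
    objects: List[AudioObject] = field(default_factory=list)

# A master stream carrying one mono object and one embedded spatial stream as an object.
master = AudioStream(
    spatial_payload=b"masa-scene",
    objects=[
        AudioObject(ObjectMetadata(azimuth_deg=30.0), mono_payload=b"evs"),
        AudioObject(ObjectMetadata(azimuth_deg=-45.0),
                    embedded_stream=AudioStream(spatial_payload=b"embedded-masa")),
    ],
)
```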
In such embodiments, higher quality and lower delay are expected for conference bridge use cases where the incoming audio streams are spatially captured/created. Furthermore, some embodiments may be implemented in use cases where there is a main spatial audio capture by, e.g., a mobile phone (UE) and additional spatial audio object(s) are captured by wireless microphones to, e.g., enhance the voice capture; these benefit similarly. This also allows (IVAS) encoding on a new class of devices (wireless microphones) without the need to decode the audio at the UE to allow further encoding. Instead, the stream can be simply embedded as is.
Before discussing the embodiments further, we initially discuss the systems for obtaining and rendering spatial audio signals which may be used in some embodiments.
With respect to
With respect to
The encoder 103/113/123/133 in some embodiments comprises an audio (IVAS) input 301. The audio input 301 is configured to be able to receive one or more sets of spatial data (IVAS) streams from multiple sources, either local or remote. The source(s) may be local, for example more than one spatial capture device in a known spatial configuration in the location of the encoder, and/or multiple remote participants sending spatial IVAS streams. The audio input 301 is configured to pass the audio data stream to an object header creator 303 and to a (IVAS) decoder 311 as part of an IVAS datastream processor 313.
The encoder 103/113/123/133 in some embodiments comprises a scene controller 305 configured to control the processing of the received audio input 301.
For example, in some embodiments the encoder 103/113/123/133 comprises an object header creator 303. The object header creator 303, controlled by the scene controller 305, is configured to insert each data stream as an object into the “master” data stream. In some embodiments the object header creator 303 may furthermore be configured to add missing object parameters such as distance and direction based on either the true spatial configuration or a virtually defined scene.
In some embodiments the object header creator 303 is configured to determine if any inserted data stream contains objects, and to move those audio objects freely to be either directly part of the “master” IVAS stream, updating their metadata, or under any other IVAS object. Additionally, the object header creator 303 is configured to update the object metadata so that it is correct for the whole spatial configuration.
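A minimal sketch of these two operations, inserting an incoming (already encoded) stream as an object into the master stream and re-parenting any nested objects, might look as follows; the dictionary layout and function names are hypothetical, and the metadata update is reduced to filling in direction and distance.

```python
def add_object_to_master(master, stream_payload, azimuth_deg=0.0, distance_m=1.0):
    """Insert an incoming encoded stream into the master stream as an object.

    Missing scene parameters such as direction and distance are filled in from the
    known or virtual scene configuration, mirroring the object header creator above.
    """
    master.setdefault("objects", []).append({
        "metadata": {"azimuth_deg": azimuth_deg, "distance_m": distance_m},
        "payload": stream_payload,
        "embedded_objects": [],
    })
    return master

def promote_nested_objects(master):
    """Move objects found inside inserted streams up to the master level.

    A real implementation would also recompute the object metadata so that it
    stays correct for the whole spatial configuration; here we only re-parent.
    """
    for obj in list(master.get("objects", [])):
        master["objects"].extend(obj.get("embedded_objects", []))
        obj["embedded_objects"] = []
    return master

master = {"spatial_payload": b"masa-scene", "objects": []}
add_object_to_master(master, b"incoming-ivas-stream", azimuth_deg=60.0, distance_m=2.5)
promote_nested_objects(master)
```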
The encoder 103/113/123/133 in some embodiments comprises an IVAS datastream processor 313. The IVAS datastream processor 313 may comprise a (IVAS) decoder 311. The (IVAS) decoder 311 is configured to receive the one or more sets of spatial audio data streams and decode the spatial audio signals and pass them to an audio scene renderer 231.
The IVAS datastream processor 313 may comprise an audio scene renderer 231 configured to receive the audio signals and generate an audio scene rendering based on the decoded (IVAS) spatial audio signals. The audio scene rendering may constitute, e.g., a downmix of the various inputs from the (IVAS) decoder 311. The rendered audio scene audio signals may then be passed to an encoder 315.
The IVAS datastream processor 313 may comprise an encoder 315 which receives the rendered spatial audio signals and encodes them. In other words the IVAS datastream processor 313 is configured to decode all or at least some incoming datastreams and generate a common spatial scene, e.g., using IVAS MASA, IVAS HOA/FOA or IVAS mono objects.
In some embodiments where there are multiple embedded objects, these can be sent to those receivers which have high-capability rendering available. The rest of the recipients receive only the pre-rendered spatial scene. Alternatively, a combination of at least one “IVAS stream object” and a pre-rendered “spatial scene IVAS stream object” can be used to reduce the bit rate.
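An illustrative (and purely hypothetical) selection policy for this per-receiver choice could look like the following sketch.

```python
def select_downstream_representation(receiver_capability, embedded_objects, prerendered_scene):
    """Pick what to send to a given receiver (illustrative policy only).

    'high'  -> forward the embedded object streams for local rendering flexibility.
    'mixed' -> one pre-rendered scene object plus a selected embedded object,
               trading rendering flexibility against bit rate.
    other   -> send only the pre-rendered spatial scene.
    """
    if receiver_capability == "high":
        return embedded_objects
    if receiver_capability == "mixed":
        return [prerendered_scene] + embedded_objects[:1]
    return [prerendered_scene]

print(select_downstream_representation("low", ["obj-A", "obj-B"], "scene"))
```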
Additionally the encoder comprises an audio object multiplexer 309 configured to combine the objects and output a combined object datastream.
The operations of the encoder are furthermore shown by a flow diagram in
The audio (IVAS) data streams are received in
Additionally the spatial scene configuration and control is determined in
Based on the determined spatial scene configuration and control and the input audio data streams object headers for the audio data streams are created as shown in
Furthermore, optionally, the data stream is decoded based on the determined spatial scene configuration and control and the input audio data streams as shown in
The decoded data streams may then be rendered as shown in
The rendered audio scene is then encoded using a suitable (IVAS) encoder as shown in
The data streams can then be multiplexed and output as shown in
The IVAS object stream metadata may utilize any suitable acoustic/spatial metadata. An example is provided in the following table.
However, in some embodiments other positional information, such as Cartesian (x-y-z) coordinates, may be employed instead of azimuth-elevation-distance. For example, a further configuration may be provided by the table
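The two positional conventions carry equivalent information, so a renderer can convert between them; a minimal conversion is sketched below, assuming azimuth is measured in the horizontal plane from the x-axis and elevation from that plane (the actual IVAS axis conventions may differ).

```python
import math

def spherical_to_cartesian(azimuth_deg, elevation_deg, distance):
    """Convert azimuth/elevation/distance to x-y-z under the assumed axis convention."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance * math.cos(el) * math.cos(az)
    y = distance * math.cos(el) * math.sin(az)
    z = distance * math.sin(el)
    return x, y, z

def cartesian_to_spherical(x, y, z):
    """Inverse conversion back to azimuth/elevation/distance."""
    distance = math.sqrt(x * x + y * y + z * z)
    azimuth_deg = math.degrees(math.atan2(y, x))
    elevation_deg = math.degrees(math.asin(z / distance)) if distance > 0 else 0.0
    return azimuth_deg, elevation_deg, distance

print(spherical_to_cartesian(30.0, 0.0, 2.0))
```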
However, some minimum stream description metadata is additionally required to signal the (IVAS) object data stream configuration information. For example, this information may be signaled using the following format.
In such embodiments a ‘Stream ID’ parameter is used to uniquely identify each IVAS object stream in the current session. Thus, each original and mixed audio component (input stream) can be signaled. For example, the signaling allows identification of the component in a system or on a user interface. A ‘Stream type’ parameter defines the meaning of each “audio object”. In some embodiments, an audio object is thus not only an object-based audio input. Rather, the object data stream can be an object-based audio (input) or it can be any IVAS scene. This for example is shown in
For example in
With respect to
The further object data stream 517 can furthermore comprise an explicit stream description part 523, or the stream contents may be determined by starting to decode the object stream. In this case, it is explicitly described as a MASA-based scene (e.g., ‘Stream description=MASA’).
Additionally, the further object data stream 517 comprises a MASA format bitstream part 525 (the encoded representation of the audio object) and a stream identifier 519 ‘Stream ID=000002’ uniquely identifying the object data stream.
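Collecting the fields mentioned above, a stream descriptor could be modelled as in the sketch below. The field names, the numeric stream-type values and the optional textual description are assumptions for illustration; in particular, ‘Stream type=1’ is used here for an embedded IVAS scene in line with the later examples.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StreamDescriptor:
    """Minimum stream description metadata, as sketched above (values illustrative)."""
    stream_id: str     # uniquely identifies the object stream in the current session
    stream_type: int   # assumed here: 0 = mono object-based audio, 1 = embedded IVAS scene
    stream_description: Optional[str] = None  # optional, e.g. "MASA"; the contents may
                                              # instead be discovered by starting to decode

masa_object = StreamDescriptor(stream_id="000002", stream_type=1, stream_description="MASA")
mono_object = StreamDescriptor(stream_id="000001", stream_type=0)
```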
A first advantage of the approach discussed herein is that IVAS inputs can often be conveniently forwarded without decoding/encoding operations. For example, there are no decoding/encoding operations required where a mixer device, a teleconferencing bridge (e.g., an AR/VR conference server), or other entity used to combine and/or forward audio inputs is present in the IVAS end-to-end service. Thus, by re-allocating a received (encoded) input as an IVAS object stream, the complexity and delay of the operation is reduced. For example, where the playback capability of the receiver is unknown, a server may optimize complexity by simply providing the received scene as is. Any IVAS stream can be decoded and rendered as mono to support even the simplest IVAS device. Also skipping any decoding/encoding operation at an intermediate point (e.g., conferencing server) reduces the end-to-end delay for that audio component. The user experience is thus improved.
Furthermore, the embodiments are configured such that there are only shallow embedded “objectified” IVAS streams. In other words, where there is an object stream which also contains an object (and therefore may comprise multiple levels of object), a deep data structure is avoided and thus the decoder complexity is reduced. The embedding as proposed in some embodiments permits an IVAS object to comprise another IVAS object. In other words, although IVAS objects can be two or more levels deep, any “deep” object can in some embodiments be moved to an “upper level” object closer to the “master” IVAS stream, and its metadata can be updated so that its representation stays meaningful for the newly formed scene. In some embodiments the IVAS object can instead be moved to be part of another IVAS object, so the object is moved “deeper”. This may allow, for example, encoding or decoding of audio objects (e.g., mono objects) together in order to save complexity or bit rate. If formats of the same type are at different levels in the structure, they generally need to be encoded/decoded at different times or using different instances, which may introduce additional complexity.
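The metadata update that accompanies moving a “deep” object one level up can be illustrated, under heavy simplification, by re-expressing the object's position relative to the master scene. The sketch below works in the horizontal plane only and assumes the nested object's position is given relative to its parent object; a real implementation would handle elevation, the remaining metadata fields and the codec-specific representation.

```python
import math

def promote_object(parent_pos, child_pos):
    """Re-express a nested object's position relative to the master scene.

    Both positions are (azimuth_deg, distance_m) pairs in the horizontal plane,
    with the child position assumed to be relative to its parent object.
    """
    def to_xy(azimuth_deg, distance):
        a = math.radians(azimuth_deg)
        return distance * math.cos(a), distance * math.sin(a)

    px, py = to_xy(*parent_pos)
    cx, cy = to_xy(*child_pos)
    x, y = px + cx, py + cy
    return math.degrees(math.atan2(y, x)), math.hypot(x, y)

# A child object 1 m to the left of a parent object that is 2 m straight ahead.
print(promote_object(parent_pos=(0.0, 2.0), child_pos=(90.0, 1.0)))
```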
Furthermore, the embodiments as discussed herein may have a second advantage in that it is possible to conveniently nest IVAS object streams, for example, for content distribution purposes. In such embodiments a more complex scene can be handled as a single (mono) audio object. An example nested packetization is presented in
Thus, for example
Furthermore, as shown in
Additionally, the nested sixth object data stream 614 furthermore comprises further nested object data streams 622 and 624. These may for example be object data streams associated with a sub-sub-section of the sub-section of the whole scene. The seventh object data stream 622 comprises a stream ID 681 (Stream ID=000002) uniquely identifying the object data stream, a stream type identifier 683 (Stream type=1) and a data part 685. The eighth object data stream 624 comprises a stream ID 691 (Stream ID=000003) uniquely identifying the object data stream, a stream type identifier 693 (Stream type=1) and a data part 695.
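A simple recursive walk over such a nested structure, listing each contained object stream, is sketched below. The dictionary layout is hypothetical, and the stream IDs of the outer streams are assumed (only the IDs 000002 and 000003 and the ‘Stream type=1’ values are taken from the description above).

```python
def walk_streams(stream, level=0):
    """Recursively list nested object data streams (illustrative structure only)."""
    print("  " * level + f"Stream ID={stream['id']} type={stream.get('type', '-')}")
    for nested in stream.get("nested", []):
        walk_streams(nested, level + 1)

# Nesting corresponding to the description above: a scene stream carrying an object
# stream which itself carries two further object streams.
scene = {
    "id": "000000",
    "nested": [
        {"id": "000001", "type": 1, "nested": [
            {"id": "000002", "type": 1, "nested": []},
            {"id": "000003", "type": 1, "nested": []},
        ]},
    ],
}
walk_streams(scene)
```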
A further advantage in implementing some embodiments is that any IVAS input or IVAS scene that does not already comprise a spatial parameter, for example positional properties, can be given such properties. For example, this can be implemented by adding acoustic spatial metadata (for example one of the parameters from the earlier tables) to an IVAS object stream (‘Stream type=1’). This enables enhanced experiences, e.g., in AR/VR teleconferencing use cases.
For example,
The conventional approach shown on the top right of
In contrast, by implementing embodiments as described herein, the listener can switch 730 between a first option of audio object rendering of the second spatial capture 723 and the first spatial capture scene 721, or a second option of audio object rendering of the first spatial capture 733 and the second spatial capture scene 731. The IVAS codec can thus import a second spatial audio representation as an IVAS object stream. Thus, when the user captures a spatial audio scene using their UE, a wireless multi-microphone device or indeed a second UE connected to the “master” UE could capture a full spatial representation of the sound scene at the second position. This sound scene could then be encoded by the second device as an IVAS bitstream and provided to the “master” UE, which could “act as a conference bridge”, ingest the IVAS bitstream, and embed it as an IVAS object stream. Two spatial audio scenes would then be delivered to the listener. For example, the user could switch between them such that a mono downmix of one scene is provided as an audio object rendering while the other scene is rendered for the user.
While
In some embodiments a look-up table specifying the packet contents can be employed. The look-up table can be defined as a ‘Payload header’, and it can be, e.g., an RTP payload header. This may include, e.g., the sizes of the various blocks. What follows the header is the payload.
For example, as shown in
In some embodiments as shown in
There is an associated cost of nesting in the generation of the additional ‘Payload header’ information and its parsing.
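For illustration only, the sketch below shows how announced block sizes allow a receiver to locate each embedded block without decoding it; the actual payload header format (e.g., the RTP payload header layout) is not defined here and is an assumption of this example.

```python
def split_payload(packet, block_sizes):
    """Split a packet into blocks using sizes announced in a (hypothetical) payload header.

    This only illustrates that announcing block sizes lets a receiver locate each
    embedded stream without decoding it; the real header syntax is not specified here.
    """
    blocks, offset = [], 0
    for size in block_sizes:
        blocks.append(packet[offset:offset + size])
        offset += size
    if offset != len(packet):
        raise ValueError("payload header sizes do not match the packet length")
    return blocks

packet = b"AAAA" + b"BBBBBB" + b"CC"
print(split_payload(packet, [4, 6, 2]))  # [b'AAAA', b'BBBBBB', b'CC']
```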
With respect to the decoder/renderer 105, 115, 125, 135, the decoder/renderer is configured to receive the various (IVAS) object data streams and to decode and render the data streams in parallel.
In some embodiments the handling of nested audio object data streams can be performed for each sub-scene level individually and then combined at the higher level.
For example, with respect to the example shown in
In such embodiments the rendering may be either carried out on the subscene level with a summation in the rendered domain, or a combined rendering may be carried out at the end of decoding.
In some embodiments the decoder is configured to launch a separate decoder instance for each subscene. Thus, for each ‘Stream type=1’, a separate IVAS decoder instance is initialised.
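A schematic of this dispatch, one decoder instance per ‘Stream type=1’ subscene with summation in the rendered domain, is given below; the class and function names are hypothetical and the decoding/rendering itself is a placeholder.

```python
class DecoderInstance:
    """Stand-in for one IVAS decoder instance handling a single (sub)scene."""
    def __init__(self, stream_id):
        self.stream_id = stream_id

    def decode_and_render(self, payload):
        # Placeholder: a real instance would decode the bitstream and render the
        # subscene to the chosen output format (e.g. binaural or loudspeakers).
        return [0.0] * 4

def render_scene(object_streams):
    """Launch one decoder instance per 'Stream type=1' object and sum in the rendered domain."""
    instances = {}
    mix = [0.0] * 4
    for stream in object_streams:
        if stream["type"] == 1:
            inst = instances.setdefault(stream["id"], DecoderInstance(stream["id"]))
            rendered = inst.decode_and_render(stream["payload"])
            mix = [a + b for a, b in zip(mix, rendered)]
    return mix

print(render_scene([{"id": "000002", "type": 1, "payload": b""},
                    {"id": "000003", "type": 1, "payload": b""}]))
```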
With respect to
In some embodiments the device 1400 comprises at least one processor or central processing unit 1407. The processor 1407 can be configured to execute various program codes such as the methods such as described herein.
In some embodiments the device 1400 comprises a memory 1411. In some embodiments the at least one processor 1407 is coupled to the memory 1411. The memory 1411 can be any suitable storage means. In some embodiments the memory 1411 comprises a program code section for storing program codes implementable upon the processor 1407. Furthermore, in some embodiments the memory 1411 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1407 whenever needed via the memory-processor coupling.
In some embodiments the device 1400 comprises a user interface 1405. The user interface 1405 can be coupled in some embodiments to the processor 1407. In some embodiments the processor 1407 can control the operation of the user interface 1405 and receive inputs from the user interface 1405. In some embodiments the user interface 1405 can enable a user to input commands to the device 1400, for example via a keypad. In some embodiments the user interface 1405 can enable the user to obtain information from the device 1400. For example, the user interface 1405 may comprise a display configured to display information from the device 1400 to the user. The user interface 1405 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1400 and further displaying information to the user of the device 1400. In some embodiments the user interface 1405 may be the user interface for communicating with the position determiner as described herein.
In some embodiments the device 1400 comprises an input/output port 1409. The input/output port 1409 in some embodiments comprises a transceiver. The transceiver in such embodiments can be coupled to the processor 1407 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
The transceiver can communicate with further apparatus by any suitable known communications protocol. For example, in some embodiments the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or an infrared data communication pathway (IrDA).
The transceiver input/output port 1409 may be configured to receive the signals and in some embodiments determine the parameters as described herein by using the processor 1407 executing suitable code. Furthermore, the device may generate a suitable downmix signal and parameter output to be transmitted to the synthesis device.
In some embodiments the device 1400 may be employed as at least part of the synthesis device. As such the input/output port 1409 may be configured to receive the downmix signals and in some embodiments the parameters determined at the capture device or processing device as described herein, and to generate a suitable audio signal format output by using the processor 1407 executing suitable code. The input/output port 1409 may be coupled to any suitable audio output, for example to a multichannel speaker system and/or headphones (which may be head-tracked or non-tracked headphones) or similar.
In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
Embodiments of the invention may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or “fab” for fabrication.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.