The present application relates to apparatus and methods for sound-field related audio representation and rendering, but not exclusively for audio representation for an audio decoder.
Spatial audio playback to present media with multiple viewing directions is known. Examples of viewing the visual content of such media include playback: on head-mounted displays (or phones in head mounts) with (at least) head orientation tracking; on a phone screen without a head mount, where the viewing direction can be tracked by changing the position/orientation of the phone, or by user interface gestures; or on surrounding screens.
A video associated with “media with multiple viewing directions” can be for example 360-degree video, 180-degree video, or other video substantially wider in viewing angle than traditional video. Traditional video refers to video content typically displayed as whole on a screen without an option (or any particular need) to change the viewing direction.
Audio associated with the video with multiple viewing directions can be presented on headphones, where the viewing direction is tracked and is affecting the spatial audio playback, or with surround loudspeaker setups.
Spatial audio that is associated with the video with multiple viewing directions can originate from spatial audio capture from microphone arrays (e.g., an array mounted on OZO-like VR camera, or a hand-held mobile device), or other sources such as studio mixes. The audio content can be also a mixture of several content types, such as microphone-captured sound and an added commentator track.
Spatial audio associated with the video with multiple viewing directions can be in various forms, for example: an Ambisonic signal (of any order) consisting of spherical harmonic audio signal components. The spherical harmonics can be considered as a set of spatially selective beam signals. Ambisonics is currently utilized, e.g., in the YouTube 360 VR video service. The advantage of Ambisonics is that it is a simple and well-defined signal representation; a surround loudspeaker signal, e.g., 5.1. Presently the spatial audio of typical movies is conveyed in this form. The advantage of a surround loudspeaker signal is its simplicity and legacy compatibility. Some audio formats similar to the surround loudspeaker signal format include audio objects, which can be considered as audio channels with a time-variant position. A position may inform both the direction and distance of the audio object, or only the direction; parametric spatial audio, such as a two-channel audio signal and associated spatial metadata in perceptually relevant frequency bands. Some state-of-the-art audio coding methods and spatial audio capture methods apply such a signal representation. The spatial metadata essentially determines how the audio signals should be spatially reproduced at the receiver end (e.g., to which directions at different frequencies). The advantages of parametric spatial audio are its versatility, quality, and ability to use low bit rates for encoding.
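As a small illustration of the Ambisonic form described above: an Ambisonic signal of order N has (N + 1)^2 spherical harmonic components, so the component count grows quadratically with the order.

```python
def ambisonic_channel_count(order: int) -> int:
    """Number of spherical harmonic components in an Ambisonic signal
    of the given order: (order + 1)^2."""
    return (order + 1) ** 2
```

For example, a first-order signal has 4 components (W, X, Y, Z) and a third-order signal has 16.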
There is provided according to a first aspect an apparatus comprising means configured to: obtain at least one focus parameter configured to define a focus shape; process a spatial audio signal that represents an audio scene to generate a processed spatial audio signal that represents a modified audio scene, so as to control relative emphasis in, at least in part, a portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape; and output the processed spatial audio signal, wherein the modified audio scene enables the relative emphasis in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape.
At least one focus parameter may be further configured to define a focus amount, and the means configured to process the spatial audio signal may be configured to process the spatial audio signal so as to control relative emphasis in, at least in part, a portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape further according to the focus amount.
The means configured to process the spatial audio signal may be configured to: increase relative emphasis in or decrease relative emphasis in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape.
The means configured to process the spatial audio signal may be configured to increase or decrease a relative sound level in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape.
The means configured to process the spatial audio signal may be configured to increase or decrease a relative sound level in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape according to the focus amount.
The means may be configured to obtain reproduction control information to control at least one aspect of outputting the processed spatial audio signal, and wherein the means configured to output the processed spatial audio signal may be configured to perform one of: process the processed spatial audio signal that represents the modified audio scene to generate an output spatial audio signal in accordance with the reproduction control information; or process the spatial audio signal in accordance with the reproduction control information prior to the means configured to process the spatial audio signal that represents an audio scene generating the processed spatial audio signal that represents a modified audio scene, and output the processed spatial audio signal as the output spatial audio signal.
The spatial audio signal and the processed spatial audio signal may comprise respective Ambisonic signals and wherein the means configured to process the spatial audio signal to generate the processed spatial audio signal may be configured, for one or more frequency sub-bands, to: convert the Ambisonic signals associated with the spatial audio signal to a set of beam signals in a defined pattern; generate a set of modified beam signals based on the set of beam signals, the focus shape and the focus amount; and convert the modified beam signals to generate the modified Ambisonic signals associated with the processed spatial audio signal.
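A minimal numerical sketch of the beam-bank idea above, restricted to a single first-order horizontal Ambisonic frame with an assumed [W, Y, X] component ordering. The function name, gain law, and beam count are illustrative assumptions, not the claimed method: the Ambisonic frame is decoded to evenly spaced virtual beams, beams outside the focus sector are attenuated by the focus amount, and the modified beams are re-encoded.

```python
import numpy as np

def focus_ambisonic(a, focus_dir, focus_width, focus_amount, n_beams=8):
    """Apply a directional focus to a first-order horizontal Ambisonic
    frame a = [W, Y, X] via a bank of evenly spaced virtual beams.
    Illustrative sketch: ordering/normalisation are assumptions."""
    theta = np.linspace(0, 2 * np.pi, n_beams, endpoint=False)
    # Encoding matrix: each column encodes one beam direction.
    Y_enc = np.vstack([np.ones(n_beams), np.sin(theta), np.cos(theta)])
    D = np.linalg.pinv(Y_enc)          # decode Ambisonics -> beam signals
    b = D @ a
    # Attenuate beams outside the focus sector by the focus amount.
    diff = np.angle(np.exp(1j * (theta - focus_dir)))
    g = np.where(np.abs(diff) <= focus_width / 2, 1.0, 1.0 - focus_amount)
    return Y_enc @ (g * b)             # re-encode modified beams
```

With a focus amount of zero all beam gains are unity, and because the encoding matrix has full row rank the re-encoding reproduces the input frame unchanged.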
The defined pattern may comprise a defined number of beams which are evenly spaced over a plane or over a volume.
The spatial audio signal and the processed spatial audio signal may comprise respective higher order Ambisonic signals.
The spatial audio signal and the processed spatial audio signal may comprise a subset of Ambisonic signal components of any order.
The spatial audio signal and the processed spatial audio signal may comprise respective parametric spatial audio signals, wherein a parametric spatial audio signal may comprise one or more audio channels and spatial metadata, wherein the spatial metadata may comprise a respective direction indication, an energy ratio parameter, and potentially a distance indication for a plurality of frequency sub-bands, wherein the means configured to process the spatial audio signal to generate the processed spatial audio signal may be configured to: compute, for one or more frequency sub-bands, spectral adjustment factors based on the spatial metadata and the focus shape and focus amount; apply the spectral adjustment factors for the one or more frequency sub-bands of the one or more audio channels to generate one or more processed audio channels; compute respective modified energy ratio parameters associated with the one or more frequency sub-bands of the processed spatial audio signal based on the focus shape, focus amount and at least a part of the spatial metadata; and compose the processed spatial audio signal comprising the one or more processed audio channels, the modified energy ratio parameters, and the spatial metadata other than the energy ratio parameters.
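A per-band sketch of this parametric processing under stated assumptions: the direct part of a band keeps unity gain inside the focus sector and is attenuated by the focus amount outside it, the ambient part is assumed to lie outside the focus and is attenuated likewise, and the spectral factor and modified direct-to-total energy ratio follow from the resulting band energy. The specific gain laws are illustrative, not the claimed method.

```python
import numpy as np

def focus_band(ratio, azimuth, focus_dir, half_width, amount, ambient_gain=None):
    """Per-band spectral adjustment factor and modified energy ratio.
    ratio: original direct-to-total energy ratio for this sub-band.
    Illustrative sketch; gain laws are assumptions."""
    if ambient_gain is None:
        ambient_gain = 1.0 - amount  # assume ambience lies outside the focus
    diff = np.angle(np.exp(1j * (azimuth - focus_dir)))
    direct_gain = 1.0 if abs(diff) <= half_width else 1.0 - amount
    # Band energy after focusing, relative to an original energy of 1.
    energy = ratio * direct_gain ** 2 + (1.0 - ratio) * ambient_gain ** 2
    spectral_factor = float(np.sqrt(energy))
    new_ratio = (ratio * direct_gain ** 2 / energy) if energy > 0 else ratio
    return spectral_factor, new_ratio
```

With a focus amount of zero the factor is unity and the ratio is unchanged, i.e. the original scene is kept as such.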
The spatial audio signal and the processed spatial audio signal may comprise multi-channel loudspeaker channels and/or audio object channels, wherein the means configured to process the spatial audio signal into the processed spatial audio signal may be configured to: compute gain adjustment factors based on the respective audio channel direction indication, the focus shape and focus amount; apply the gain adjustment factors to the respective audio channels; and compose the processed spatial audio signal comprising the one or more processed multichannel loudspeaker audio channels and/or the one or more processed audio object channels.
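For the loudspeaker/object case above, a gain adjustment factor can be sketched as follows; the hard-edged sector and distance range are illustrative assumptions (a practical design would roll off smoothly at the boundary), and the function name is hypothetical.

```python
import math

def object_focus_gain(azimuth, distance, focus_dir, half_width,
                      focus_dist, focus_depth, amount):
    """Gain adjustment factor for one loudspeaker/object channel.
    Channels inside both the focus sector and the focus distance range
    keep unity gain; others are attenuated by the focus amount."""
    in_direction = abs(math.remainder(azimuth - focus_dir, math.tau)) <= half_width
    in_distance = abs(distance - focus_dist) <= focus_depth / 2
    return 1.0 if (in_direction and in_distance) else 1.0 - amount
```

When no distance indication is available, a default channel distance (as noted above) can be substituted for the `distance` argument.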
The multi-channel loudspeaker channels and/or audio object channels may further comprise respective audio channel distance indication, and wherein the computing gain adjustment factors may be further based on the audio channel distance indication.
The means may be further configured to determine a default respective audio channel distance, and wherein the computing gain adjustment factors may be further based on the audio channel distance.
The at least one focus parameter configured to define a focus shape may comprise at least one of: a focus direction; a focus width; a focus height; a focus radius; a focus distance; a focus depth; a focus range; a focus diameter; and a focus shape characterizer.
The means may be further configured to obtain a focus input from a sensor arrangement that comprises at least one direction sensor and at least one user input, wherein the focus input may comprise: an indication of a focus direction for the focus shape based on the at least one direction sensor direction; and an indication of a focus width based on the at least one user input.
The focus input may further comprise an indication of the focus amount based on the at least one user input.
According to a second aspect there is provided a method comprising: obtaining at least one focus parameter configured to define a focus shape; processing a spatial audio signal that represents an audio scene to generate a processed spatial audio signal that represents a modified audio scene, so as to control relative emphasis in, at least in part, a portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape; and outputting the processed spatial audio signal, wherein the modified audio scene enables the relative emphasis in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape.
At least one focus parameter may be further configured to define a focus amount, and processing the spatial audio signal may comprise processing the spatial audio signal so as to control relative emphasis in, at least in part, a portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape further according to the focus amount.
Processing the spatial audio signal may comprise: increasing or decreasing relative emphasis in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape.
Processing the spatial audio signal may comprise increasing or decreasing a relative sound level in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape.
Processing the spatial audio signal may comprise increasing or decreasing a relative sound level in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape according to the focus amount.
The method may comprise obtaining reproduction control information to control at least one aspect of outputting the processed spatial audio signal, and wherein outputting the processed spatial audio signal may comprise performing one of: processing the processed spatial audio signal that represents the modified audio scene to generate an output spatial audio signal in accordance with the reproduction control information; or processing the spatial audio signal in accordance with the reproduction control information prior to the processing of the spatial audio signal that represents an audio scene to generate the processed spatial audio signal that represents a modified audio scene, and outputting the processed spatial audio signal as the output spatial audio signal.
The spatial audio signal and the processed spatial audio signal may comprise respective Ambisonic signals and wherein processing the spatial audio signal to generate the processed spatial audio signal may comprise, for one or more frequency sub-bands: converting the Ambisonic signals associated with the spatial audio signal to a set of beam signals in a defined pattern; generating a set of modified beam signals based on the set of beam signals, the focus shape and the focus amount; and converting the modified beam signals to generate the modified Ambisonic signals associated with the processed spatial audio signal.
The defined pattern may comprise a defined number of beams which are evenly spaced over a plane or over a volume.
The spatial audio signal and the processed spatial audio signal may comprise respective higher order Ambisonic signals.
The spatial audio signal and the processed spatial audio signal may comprise a subset of Ambisonic signal components of any order.
The spatial audio signal and the processed spatial audio signal may comprise respective parametric spatial audio signals, wherein a parametric spatial audio signal may comprise one or more audio channels and spatial metadata, wherein the spatial metadata may comprise a respective direction indication, an energy ratio parameter, and potentially a distance indication for a plurality of frequency sub-bands, wherein processing the spatial audio signal to generate the processed spatial audio signal may comprise: computing, for one or more frequency sub-bands, spectral adjustment factors based on the spatial metadata and the focus shape and focus amount; applying the spectral adjustment factors for the one or more frequency sub-bands of the one or more audio channels to generate one or more processed audio channels; computing respective modified energy ratio parameters associated with the one or more frequency sub-bands of the processed spatial audio signal based on the focus shape, focus amount and at least a part of the spatial metadata; and composing the processed spatial audio signal comprising the one or more processed audio channels, the modified energy ratio parameters, and the spatial metadata other than the energy ratio parameters.
The spatial audio signal and the processed spatial audio signal may comprise multi-channel loudspeaker channels and/or audio object channels, wherein processing the spatial audio signal into the processed spatial audio signal may comprise: computing gain adjustment factors based on the respective audio channel direction indication, the focus shape and focus amount; applying the gain adjustment factors to the respective audio channels; and composing the processed spatial audio signal comprising the one or more processed multichannel loudspeaker audio channels and/or the one or more processed audio object channels.
The multi-channel loudspeaker channels and/or audio object channels may further comprise respective audio channel distance indication, and wherein the computing gain adjustment factors may be further based on the audio channel distance indication.
The method may further comprise determining a default respective audio channel distance, and wherein the computing gain adjustment factors may be further based on the audio channel distance.
The at least one focus parameter configured to define a focus shape may comprise at least one of: a focus direction; a focus width; a focus height; a focus radius; a focus distance; a focus depth; a focus range; a focus diameter; and a focus shape characterizer.
The method may further comprise obtaining a focus input from a sensor arrangement that comprises at least one direction sensor and at least one user input, wherein the focus input may comprise: an indication of a focus direction for the focus shape based on the at least one direction sensor direction; and an indication of a focus width based on the at least one user input.
The focus input may further comprise an indication of the focus amount based on the at least one user input.
According to a third aspect there is provided an apparatus comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: obtain at least one focus parameter configured to define a focus shape; process a spatial audio signal that represents an audio scene to generate a processed spatial audio signal that represents a modified audio scene, so as to control relative emphasis in, at least in part, a portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape; and output the processed spatial audio signal, wherein the modified audio scene enables the relative emphasis in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape.
At least one focus parameter may be further configured to define a focus amount, and the apparatus caused to process the spatial audio signal may be caused to process the spatial audio signal so as to control relative emphasis in, at least in part, a portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape further according to the focus amount.
The apparatus caused to process the spatial audio signal may be caused to: increase relative emphasis in or decrease relative emphasis in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape.
The apparatus caused to process the spatial audio signal may be caused to increase or decrease a relative sound level in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape.
The apparatus caused to process the spatial audio signal may be caused to increase or decrease a relative sound level in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape according to the focus amount.
The apparatus may be caused to obtain reproduction control information to control at least one aspect of outputting the processed spatial audio signal, and wherein the apparatus caused to output the processed spatial audio signal may be caused to perform one of: process the processed spatial audio signal that represents the modified audio scene to generate an output spatial audio signal in accordance with the reproduction control information; or process the spatial audio signal in accordance with the reproduction control information prior to the apparatus being caused to process the spatial audio signal that represents an audio scene to generate the processed spatial audio signal that represents a modified audio scene, and output the processed spatial audio signal as the output spatial audio signal.
The spatial audio signal and the processed spatial audio signal may comprise respective Ambisonic signals and wherein the apparatus caused to process the spatial audio signal to generate the processed spatial audio signal may be caused, for one or more frequency sub-bands, to: convert the Ambisonic signals associated with the spatial audio signal to a set of beam signals in a defined pattern; generate a set of modified beam signals based on the set of beam signals, the focus shape and the focus amount; and convert the modified beam signals to generate the modified Ambisonic signals associated with the processed spatial audio signal.
The defined pattern may comprise a defined number of beams which are evenly spaced over a plane or over a volume.
The spatial audio signal and the processed spatial audio signal may comprise respective higher order Ambisonic signals.
The spatial audio signal and the processed spatial audio signal may comprise a subset of Ambisonic signal components of any order.
The spatial audio signal and the processed spatial audio signal may comprise respective parametric spatial audio signals, wherein a parametric spatial audio signal may comprise one or more audio channels and spatial metadata, wherein the spatial metadata may comprise a respective direction indication, an energy ratio parameter, and potentially a distance indication for a plurality of frequency sub-bands, wherein the apparatus caused to process the spatial audio signal to generate the processed spatial audio signal may be caused to: compute, for one or more frequency sub-bands, spectral adjustment factors based on the spatial metadata and the focus shape and focus amount; apply the spectral adjustment factors for the one or more frequency sub-bands of the one or more audio channels to generate one or more processed audio channels; compute respective modified energy ratio parameters associated with the one or more frequency sub-bands of the processed spatial audio signal based on the focus shape, focus amount and at least a part of the spatial metadata; and compose the processed spatial audio signal comprising the one or more processed audio channels, the modified energy ratio parameters, and the spatial metadata other than the energy ratio parameters.
The spatial audio signal and the processed spatial audio signal may comprise multi-channel loudspeaker channels and/or audio object channels, wherein the apparatus caused to process the spatial audio signal into the processed spatial audio signal may be caused to: compute gain adjustment factors based on the respective audio channel direction indication, the focus shape and focus amount; apply the gain adjustment factors to the respective audio channels; and compose the processed spatial audio signal comprising the one or more processed multichannel loudspeaker audio channels and/or the one or more processed audio object channels.
The multi-channel loudspeaker channels and/or audio object channels may further comprise respective audio channel distance indication, and wherein the computing gain adjustment factors may be further based on the audio channel distance indication.
The apparatus may be further caused to determine a default respective audio channel distance, and wherein the computing gain adjustment factors may be further based on the audio channel distance.
The at least one focus parameter configured to define a focus shape may comprise at least one of: a focus direction; a focus width; a focus height; a focus radius; a focus distance; a focus depth; a focus range; a focus diameter; and a focus shape characterizer.
The apparatus may be further caused to obtain a focus input from a sensor arrangement that comprises at least one direction sensor and at least one user input, wherein the focus input may comprise: an indication of a focus direction for the focus shape based on the at least one direction sensor direction; and an indication of a focus width based on the at least one user input. The focus input may further comprise an indication of the focus amount based on the at least one user input.
According to a fourth aspect there is provided an apparatus comprising focus parameter obtaining circuitry configured to obtain at least one focus parameter configured to define a focus shape; spatial audio signal processing circuitry configured to process a spatial audio signal that represents an audio scene to generate a processed spatial audio signal that represents a modified audio scene, so as to control relative emphasis in, at least in part, a portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape; and output control circuitry configured to output the processed spatial audio signal, wherein the modified audio scene enables the relative emphasis in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape.
According to a fifth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising program instructions] for causing an apparatus to perform at least the following: obtain at least one focus parameter configured to define a focus shape; process a spatial audio signal that represents an audio scene to generate a processed spatial audio signal that represents a modified audio scene, so as to control relative emphasis in, at least in part, a portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape; and output the processed spatial audio signal, wherein the modified audio scene enables the relative emphasis in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape.
According to a sixth aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus to perform at least the following: obtain at least one focus parameter configured to define a focus shape; process a spatial audio signal that represents an audio scene to generate a processed spatial audio signal that represents a modified audio scene, so as to control relative emphasis in, at least in part, a portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape; and output the processed spatial audio signal, wherein the modified audio scene enables the relative emphasis in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape.
According to a seventh aspect there is provided an apparatus comprising: means for obtaining at least one focus parameter configured to define a focus shape; means for processing a spatial audio signal that represents an audio scene to generate a processed spatial audio signal that represents a modified audio scene, so as to control relative emphasis in, at least in part, a portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape; and means for outputting the processed spatial audio signal, wherein the modified audio scene enables the relative emphasis in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape.
According to an eighth aspect there is provided a computer readable medium comprising program instructions for causing an apparatus to perform at least the following: obtaining at least one focus parameter configured to define a focus shape; processing a spatial audio signal that represents an audio scene to generate a processed spatial audio signal that represents a modified audio scene, so as to control relative emphasis in, at least in part, a portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape; and outputting the processed spatial audio signal, wherein the modified audio scene enables the relative emphasis in, at least in part, the portion of the spatial audio signal in the focus shape relative to at least in part other portions of the spatial audio signals outside the focus shape.
An apparatus comprising means for performing the actions of the method as described above.
An apparatus configured to perform the actions of the method as described above.
A computer program comprising program instructions for causing a computer to perform the method as described above.
A computer program product stored on a medium may cause an apparatus to perform the method as described herein.
An electronic device may comprise apparatus as described herein.
A chipset may comprise apparatus as described herein.
Embodiments of the present application aim to address problems associated with the state of the art.
For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
The following describes in further detail suitable apparatus and possible mechanisms for the provision of efficient rendering and playback of spatial audio signals.
Previous spatial audio signal playback examples allow the user to control the focus direction and the focus amount. However, in some situations, such control of the focus direction/amount may not be sufficient. In some situations, it may be desirable to provide the user with a control interface to control the shape of the focus. In a sound field, there may be a number of different features such as multiple dominant sound sources in certain viewing directions as well as ambient sounds. Some users may prefer to hear certain features of the sound field whereas others may prefer to hear alternative features of the sound field depending on which viewing direction is desirable. It is understood that such playback audio is dependent on one or more preferences and can be configurable based on user related preferences. The desired performance from the playback apparatus is to configure playback of the spatial sound so that focus on various shapes or areas (e.g., narrow, wide, shallow, deep, near, far) can be controlled.
As an example, there may be audio content of interest within a sector (or a cone or another spatial span or range) rather than simply in one direction. Specifically, it may be useful to control the spatial span of the focus.
For example, at first the focus may be on all sources in the theatre play, keeping the focus sector relatively wide (as shown in
As another example, the desired or interesting audio content may be at a certain distance (with respect to the listener or with respect to another position). For example there may be an undesired or uninteresting audio source at a certain distance in a certain direction and a desired or an interesting audio source at another distance in the same direction (or nearly the same direction). This is shown in
Hence, the embodiments as discussed herein attempt to provide control of the focus shape (in addition to the focus direction and amount). The concept as discussed with respect to the embodiments described herein relates to spatial audio reproduction in media playback with multiple viewing directions by providing control of the audio focus shape where the audio scene over the controlled audio focus shape changes but the signal format can remain the same.
The embodiments provide at least one focus shape parameter corresponding to a selectable direction by adjusting any one (or a combination of two or more) of the following parameters corresponding to the selected direction: focus width; focus height; focus radius; focus distance; and focus depth. This parameter set in some embodiments comprises parameters which define any arbitrary shape. The spatial audio signal processing can in some embodiments be performed by: obtaining spatial audio signals associated with the media with multiple viewing directions; obtaining the focus direction and amount parameters; obtaining at least one focus shape parameter; modifying the spatial audio signals to have the desired focus characteristics; and reproducing the modified spatial audio signals (with headphones or loudspeakers).
The obtained spatial audio signals may, for example, be: Ambisonic signals; loudspeaker signals; parametric spatial audio formats such as a set of audio channels and the associated spatial metadata.
The focus shape may in some embodiments depend on which parameters are available. For example, in the case of having only direction, width, and height, the shape may be an ellipsoid cone-type volume. As another example, in the case of having only distance and depth, the focus shape may be a hollow sphere. In the case of not having width/height and/or depth, they may be considered to have some default value. Moreover, in some embodiments, an arbitrary focus shape may be used.
The focus amount may in some embodiments determine the ‘degree’ or how much to focus. For example the focus may be from 0% to 100%, where 0% means keeping the original sound scene unmodified, and 100% means focusing maximally on the desired spatial shape.
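The focus amount control can be illustrated with a minimal sketch. The linear crossfade below is a hypothetical interpretation of the 0%..100% control and is not taken from the text, which later gives format-specific gain rules instead:

```python
import numpy as np

def apply_focus_amount(original, focused, amount):
    """Blend between the unmodified sound scene (amount = 0.0, i.e. 0%)
    and the maximally focused scene (amount = 1.0, i.e. 100%).
    A linear crossfade is one plausible interpretation only; the
    format-specific processing described later replaces this."""
    a = float(np.clip(amount, 0.0, 1.0))
    return (1.0 - a) * original + a * focused
```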
In some embodiments different users may want to have different focus characteristics and the original spatial audio signals may be individually modified and reproduced for each user, based on their individual preferences.
In the illustration of
Typically, the input audio signal and the audio signal with a focused sound component are provided in the same predefined spatial format, whereas the output audio signal may be provided in the same spatial format as applied for the input audio signal (and the audio signal with a focused sound component) or a different predefined spatial format may be employed for the output audio signal. The spatial audio format of the output audio signal is selected in view of the characteristics of the sound reproduction hardware applied for playback for the output audio signal.
In general, the input audio signal may be provided in a first predetermined spatial audio format and the output audio signal may be provided in a second predetermined spatial audio format. Non-limiting examples of spatial audio formats suitable for use as the first and/or second spatial audio format include Ambisonics, surround loudspeaker signals according to a predefined loudspeaker configuration, a predefined parametric spatial audio format. More detailed non-limiting examples of usage of these spatial audio formats in the framework of the spatial audio processing arrangement 250 as the first and/or second spatial audio format are provided later in this disclosure.
The spatial audio processing arrangement 250 is typically applied to process the input spatial audio signal 200 as a sequence of input frames into a respective sequence of output frames, each input (output) frame including a respective segment of digital audio signal for each channel of the input (output) spatial audio signal, provided as a respective time series of input (output) samples at a predefined sampling frequency. In some embodiments the input signal to the spatial audio processing arrangement 250 can be in an encoded form, for example AAC, or AAC with embedded metadata. In such embodiments the encoded audio input can be initially decoded. Similarly, in some embodiments the output from the spatial audio processing arrangement 250 could be encoded in any suitable manner.
In a typical example, the spatial audio processing arrangement 250 employs a fixed predefined frame length such that each frame comprises respective L samples for each channel of the input spatial audio signal, which at the predefined sampling frequency maps to a corresponding duration in time. As an example in this regard, the fixed frame length may be 20 milliseconds (ms), which at a sampling frequency of 8, 16, 32 or 48 kHz results in a frame of L=160, L=320, L=640 and L=960 samples per channel, respectively. The frames may be non-overlapping or partially overlapping, depending on whether the processors apply filter banks and on how these filter banks are configured. These values, however, serve as non-limiting examples and frame lengths and/or sampling frequencies different from these examples may be employed instead, depending e.g. on the desired audio bandwidth, the desired framing delay and/or the available processing capacity.
In the spatial audio processing arrangement 250, the focus refers to a user-selectable spatial region of interest. The focus may be, for example, a certain direction, distance, radius or arc of the audio scene in general. In another example, the focus may be a region in which a (directional) sound source of interest is currently positioned. In the former scenario, the user-selectable focus typically denotes a region that stays constant or changes infrequently since the focus is predominantly in a specific spatial region, whereas in the latter scenario the user-selected focus may change more frequently since the focus is set to a certain sound source that may (or may not) change its position/shape/size in the audio scene over time. The focus may be defined, for example, as an azimuth angle that defines the spatial direction of interest with respect to a first predefined reference direction, and/or an elevation angle that defines the spatial direction of interest with respect to a second predefined reference direction, and/or a distance, radius or shape parameter.
The functionality described in the foregoing with references to components of the spatial audio processing arrangement 250 may be provided, for example, in accordance with a method 260 illustrated by a flowchart depicted in
The method 260 may be varied in a plurality of ways, for example in accordance with examples pertaining to respective functionality of components of the spatial audio processing arrangement 250 provided in the foregoing and in the following.
In some embodiments the input to the spatial audio processing arrangement 250 is Ambisonic signals. The apparatus can be configured to receive (and the method can be applied to) Ambisonic signals of any order. However, as the first-order Ambisonic (FOA) signal is fairly broad in terms of spatial selectivity (specifically, first-degree directivity), fine control of the focus shape is better exemplified with higher-order Ambisonics (HOA), which have higher spatial selectivity.
In particular, in the following examples the method and apparatus are configured to receive 3rd order Ambisonic audio signals.
3rd order Ambisonic audio signals have 16 beam pattern signals in total (in 3D). However, for simplicity the following examples consider only those 7 Ambisonic components (in other words, audio signals) that are more “horizontal”, as shown in
With respect to
The input to the focus processor 550 in this example, as described above, is a subset of a 3rd order Ambisonic signal, for example the subsets 309 and 311. The 3rd order Ambisonic signal xHOA(t) 500 is also referred to in the following as HOA for simplicity. A signal x(t), where t is the discrete sample index, arriving from horizontal azimuth θ can be represented as a HOA signal by:
where a(θ) is the vector of Ambisonic weights for azimuth θ. As seen in this equation, the selected subset of the Ambisonic patterns can be defined with these very simple mathematical expressions in the horizontal plane.
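A non-limiting sketch of the horizontal weight vector a(θ) and the encoding above; the component ordering and normalization used here are assumed conventions, as the source does not fix them:

```python
import numpy as np

def horizontal_weights(theta):
    """Ambisonic weight vector a(theta) for the 7 'horizontal' components
    of a 3rd-order signal, written as unnormalized circular harmonics
    [1, cos th, sin th, cos 2th, sin 2th, cos 3th, sin 3th]; ordering and
    normalization are assumptions, not taken from the source."""
    return np.array([1.0,
                     np.cos(theta), np.sin(theta),
                     np.cos(2 * theta), np.sin(2 * theta),
                     np.cos(3 * theta), np.sin(3 * theta)])

def encode_plane_wave(x, theta):
    """x_HOA(t) = a(theta) x(t): encode a mono signal x arriving from
    horizontal azimuth theta into the 7-channel subset."""
    return np.outer(horizontal_weights(theta), x)
```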
In some embodiments the focus processor 550 comprises a matrix processor 501. The matrix processor 501 is configured in some embodiments to convert the Ambisonic (HOA) signals 500 (corresponding to Ambisonic or spherical harmonic patterns) to a set of beam signals (corresponding to beam patterns) in 7 evenly spaced horizontal directions. This in some embodiments may be represented by a transformation matrix T(θf), where θf is the focus direction 502 parameter:
Note that the transformation depends on the focus direction parameter θf 502 such that the first pattern is aligned to the focus direction and the other patterns are aligned to symmetrically spaced further directions.
For example, when θf=20 degrees, the beam patterns corresponding to the transformed signals xc(t) 504 and the beam patterns corresponding to the original HOA signals are shown in
The focus processor 550 may further comprise a spatial beams (based on focus parameters) processor 503. The spatial beams processor 503 is configured to receive the transformed Ambisonic signals xc(t) 504 from the matrix processor 501 and furthermore receive the focus amount and width focus parameters 508.
The spatial beams processor 503 is then configured to modify the spatial beam signals xc(t) 504 to generate processed or modified spatial beam signals x′c(t) 506 based on the focus amount and shape parameters 508. The processed or modified spatial beam signals x′c(t) 506 can then be output to a further matrix processor 505. The spatial beams processor 503 is configured to implement various processing methods based on the types of focus shape parameters. In this example embodiment the focus parameters are focus direction, focus width, and focus amount. The focus amount can be determined as a value a ranging between 0 . . . 1, where 1 denotes the maximum focus. The focus width θw (determined as the angle from the focus direction to the edge of the focus arc) is also a variable or controllable parameter. The spatial beam signals can be generated by
x′c(t)=I(θw, a)xc(t),
where I(θw, a) is a diagonal matrix with its diagonal elements determined as i(θw, a), where
It should be noted that the beams xc(t) are in this example formulated in such a manner that the first beam points towards the focus direction, the second beam towards the focus direction + 2π/7, and so on. As a result, when applying the matrix I(θw, a), the beams farther away from the focus direction will be attenuated depending on the focus width parameter.
The focus processor 550 comprises a further matrix processor 505. The further matrix processor 505 is configured to receive the processed or modified spatial beam signals x′c(t) 506 and the focus direction 502 and to inverse transform the signals to generate the focus-processed HOA signals. The transformation matrix T(θf) is invertible, and therefore the inversion processing can be expressed as
x′HOA(t)=T−1(θf)x′c(t),
where x′HOA(t) is the focus processed HOA output 510.
With respect to
In the above examples, HOA processing was shown only for a set of more “horizontal” beam pattern signals. It would be understood that these operations can be extended to 3D, using a set of beam patterns in 3D.
With respect to
The initial operation is receiving the HOA audio signals (and the focus parameters such as direction, width, amount or other control information) as shown in
The next operation is transforming the HOA audio signals into beam signals as shown in
Having transformed the HOA audio signals into beam signals then the next operation is one of spatial beams processing as shown in
The processed beam audio signals are then inverse transformed back into a HOA format as shown in
The processed HOA audio signals are then output as shown in
With respect to
Services) audio stream, which can be decoded and demultiplexed into spatial metadata and audio channels. A typical number of audio channels in such a parametric spatial audio stream is two; however, in some embodiments any number of audio channels can be used.
In these examples the parametric information comprises depth/distance information, which may be implemented in 6-degrees of freedom (6DOF) reproduction. In 6DOF, the distance metadata is used (along with the other metadata) to determine how the sound energy and direction should change as a function of user movement.
Therefore in this example each spatial metadata direction parameter is associated both with a direct-to-total energy ratio and a distance parameter. The estimation of distance parameters in context of parametric spatial audio capture has been detailed in earlier applications such as GB patent applications GB1710093.4 and GB1710085.0 and is not explored further for clarity reasons.
The focus processor 850 configured to receive parametric (in this case 6DOF-enabled) spatial audio 800 is configured to use the focus parameters (which in these examples are focus direction, amount, distance, and radius) to determine how much the direct and ambient components of the parametric spatial audio signal should be attenuated or emphasized to enable the focus effect.
In the following example the method (and the formulas) are expressed without any variation over time; it should be understood, however, that all the parameters may vary over time.
In some embodiments the focus processor comprises a ratio modifier and spectral adjustment factor determiner 801 which is configured to receive the focus parameters 808 and additionally the spatial metadata consisting of directions 802, distances 822, direct-to-total energy ratios 804 in frequency bands.
The ratio modifier and spectral adjustment factor determiner is configured to implement the focus shape as a sphere in 3D space. First, the focus direction and distance are converted to a Cartesian coordinate system (3×1 y-z-x vector f) by
Similarly, at each frequency band k, the spatial metadata directions and distances are converted into the Cartesian coordinate system (3×1 y-z-x vector m(k)) by
The units of the spatial metadata distance and focus distance parameters should be the same (e.g., both in meters, or in any other scale). A mutual distance value d(k) between f and m(k) may be formulated simply as:
d(k)=|f−m(k)|,
which here means the length of the vector (f−m(k)).
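The conversion and the mutual distance can be sketched as follows; the y-z-x component ordering follows the text, but the exact spherical convention is an assumption:

```python
import numpy as np

def to_yzx(azimuth, elevation, distance):
    """Direction + distance to a Cartesian y-z-x vector, assuming
    y = d cos(el) sin(az), z = d sin(el), x = d cos(el) cos(az)
    (a common convention; the source does not spell the mapping out)."""
    return distance * np.array([np.cos(elevation) * np.sin(azimuth),
                                np.sin(elevation),
                                np.cos(elevation) * np.cos(azimuth)])

def mutual_distance(f, m):
    """d(k) = |f - m(k)|: Euclidean length of the difference vector."""
    return float(np.linalg.norm(f - m))
```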
The mutual distance value d(k) is then utilized in a gain-function along with the focus amount parameter a that is between 0 . . . 1 and the focus radius parameter dr (in same units as d(k)). When we perform focus, an example gain formula is
where c is a gain constant for the focus, for example a value of 4.
In practice, it may be desirable to smooth the above functions such that the focus gain function smoothly transitions from a high value at the focus area to a low value at the non-focused area.
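Since the gain formula itself is not reproduced above, the following is a hypothetical smooth gain with the described qualitative behaviour: high near the focus area, low outside, with the gain constant c defaulting to 4 and a sigmoid providing the smooth transition (the sigmoid and its width are assumptions):

```python
import numpy as np

def focus_gain(d, d_r, a, c=4.0, smooth=0.25):
    """Hypothetical smooth focus gain over mutual distance d:
    near the focus point (d << d_r) the gain approaches the boost
    constant c scaled by the focus amount a; far away it approaches
    (1 - a). The actual formula in the source is not reproduced here."""
    inside = 1.0 / (1.0 + np.exp((d - d_r) / (smooth * d_r)))
    g_in = 1.0 + a * (c - 1.0)   # boosted gain inside the focus area
    g_out = 1.0 - a              # attenuated gain outside the focus area
    return g_out + (g_in - g_out) * inside
```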
Then a new direct portion value D(k) of the parametric spatial audio signal can be formulated as
D(k)=r(k)*f(k)
where r(k) is the direct-to-total energy ratio value at band k. A new ambient portion value A(k) can be formulated as
A(k)=(1−r(k))*(1−a).
The spectral correction factor s(k) that is output 812 to a spectral adjustment processor 803 is then formulated based on the overall modification of the sound energy, in other words,
s(k)=√(D(k)+A(k)).
A new modified direct-to-total energy ratio parameter r′(k) is then formulated to replace r(k) in the spatial metadata
In the numerically undetermined case D(k)=A(k)=0, r′(k) can simply be set to zero.
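The per-band computations above can be sketched as follows; the form r′(k) = D/(D+A) is inferred from the ratio semantics and the stated zero-handling rule, as the source does not reproduce the formula itself:

```python
import numpy as np

def modify_band(r, f_gain, a):
    """Per-band modification following the text:
    D(k) = r(k) * f(k), A(k) = (1 - r(k)) * (1 - a),
    s(k) = sqrt(D(k) + A(k)).
    r'(k) = D / (D + A) is inferred, with r'(k) = 0 when D = A = 0."""
    D = r * f_gain
    A = (1.0 - r) * (1.0 - a)
    s = np.sqrt(D + A)
    r_new = D / (D + A) if (D + A) > 0.0 else 0.0
    return s, r_new
```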
The direction and distance parameters of the spatial metadata may in some embodiments not be modified by the ratio modifier and spectral adjustment factor determiner 801, and are passed to the modified (and unmodified) metadata output 810.
The focus processor 850 may comprise a spectral adjustment processor 803. The spectral adjustment processor 803 may be configured to receive the audio signals 806 and the spectral adjustment factors 812. The audio signals can in some embodiments be in a time-frequency representation, or alternatively they are first transformed to the time-frequency domain for the spectral adjustment processing. The output 814 can likewise be in the time-frequency domain, or inverse transformed to the time domain before output. The domain of the input and output depends on the implementation.
The spectral adjustment processor 803 may be configured to multiply, for each band k, the frequency bins (of the time-frequency transform) of all channels within the band k by the spectral adjustment factor s(k), in other words, to perform the spectral adjustment. The multiplication (i.e., spectral correction) may be smoothed over time to avoid processing artefacts.
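The per-band multiplication with temporal smoothing might look like the following; the one-pole smoother and its coefficient are assumptions, not specified in the source:

```python
import numpy as np

def spectrally_adjust(bins, s, band_of_bin, smoothed_s, alpha=0.8):
    """Multiply every channel's frequency bins by the (smoothed)
    adjustment factor of the band each bin belongs to.
    bins: (channels, n_bins) time-frequency frame,
    s: (n_bands,) adjustment factors for this frame,
    band_of_bin: (n_bins,) band index of each bin,
    smoothed_s: (n_bands,) running smoothed factors, updated in place.
    alpha is a hypothetical one-pole smoothing coefficient."""
    smoothed_s[:] = alpha * smoothed_s + (1.0 - alpha) * s
    return bins * smoothed_s[band_of_bin]
```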
In other words, the processor is configured to modify the spectrum of the signal and the spatial metadata such that the procedure results in a parametric spatial audio signal that has been modified according to the focus parameters (in this case: focus direction, amount, distance, radius).
With respect to
The initial operation is receiving the parametric spatial audio signals (and focus parameters or other control information) as shown in
The next operation is the modifying of the parametric metadata and generating the spectral adjustment factors as shown in
The next operation is making a spectral adjustment to the audio signals as shown in
With respect to
For audio objects which have a direction and a distance (i.e., a position), the focus gain determiner 901 can utilize the same processing as described in the context of the parametric audio processing to determine the direct gain f(k) 912 based on the spatial metadata and the focus parameters. In these embodiments there is no filter bank; in other words, there is only one frequency band k.
The focus processor furthermore may comprise a focus gain processor (for each channel) 903. The focus gain processor 903 is configured to receive the focus gains f(k) 912 for each audio channel and the audio signals 906. The focus gains 912 can then be applied to the corresponding audio channel signals 906 (and in some embodiments furthermore be temporally smoothed). The output from the focus gain processor 903 may be a focus-processed audio channel audio signal 914.
In these examples the channel directional/positional information 902 is unaltered and also provided as a channel directional/positional information output 910.
In some embodiments when the input audio channels do not have distance information (e.g., the input is loudspeaker or object sound with only directions but not distance) one option to handle such audio channels is to determine a fixed default distance for such signals and apply the same formula to determine f(k).
In some embodiments determining the focus gain f(k) 912 for such audio channels may be based on the angular difference between the focus direction and the direction of the audio channel. In some embodiments this may first determine a focus width θw. For example as shown in
Then the angle θa is determined between the focus direction and the direction of the audio channel (for each audio channel individually). A similar formula to that discussed above can then be used to determine f(k), where dr is replaced by θw and d(k) is replaced by θa (when determining the focus gain for the audio channels without the distance information). In some embodiments, when the focus radius is larger than the focus distance, the asin function above is not defined, and a large value (e.g., π) can be used for the focus width θw.
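The focus-width determination can be sketched as follows; the asin(dr/df) form is inferred from the remark about the undefined case, and is an assumption rather than a formula reproduced from the text:

```python
import numpy as np

def focus_width_from_radius(focus_distance, focus_radius):
    """th_w inferred as asin(d_r / d_f): the half-angle that a focus
    sphere of radius d_r subtends at distance d_f. When the radius
    is not smaller than the distance, the asin is undefined and a
    large value (pi) is used, as the text suggests."""
    if focus_radius >= focus_distance:
        return np.pi
    return float(np.arcsin(focus_radius / focus_distance))
```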
With respect to
The initial operation is receiving the multichannel/object audio signals (and focus parameters or other control information and channel information such as directions/distances) as shown in
The next operation is generating the focus gain factors as shown in
The next operation is applying a focus gain for each channel audio signals as shown in
The processed audio signals and unmodified channel directions (and distances) can then be output as shown in
With respect to
In these examples the reproduction processor may comprise an Ambisonic rotation matrix processor 1101. The Ambisonic rotation matrix processor 1101 is configured to receive the Ambisonic signal with focus processing 1100 and the view direction 1102. The Ambisonic rotation matrix processor 1101 is configured to generate a rotation matrix based on the view direction parameter 1102. This may in some embodiments use any suitable method, such as those applied in head-tracked Ambisonic binauralization (or more generally, such rotation of spherical harmonics is used in many fields, including fields other than audio). The rotation matrix can then be applied to the Ambisonic audio signals. The result is rotated Ambisonic signals with added focus 1104, which are output to an Ambisonic to binaural filter 1103.
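A minimal illustration of such a rotation matrix for the first-order case only; the channel ordering (ACN: W, Y, Z, X) and the sign convention (scene versus listener rotation) are assumptions, and higher orders require the full spherical-harmonic rotation machinery mentioned above:

```python
import numpy as np

def foa_yaw_rotation(yaw):
    """Rotation matrix for a first-order Ambisonic signal about the
    vertical axis, channels ordered W, Y, Z, X (ACN). First order only,
    as a minimal illustration; the sign convention depends on whether
    the scene or the listener is considered to rotate."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0,   c, 0.0,  -s],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0,   s, 0.0,   c]])
```

For example, a plane wave encoded at azimuth ψ is mapped to the front direction by a yaw rotation of ψ under this convention.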
The Ambisonic to binaural filter 1103 is configured to receive the rotated Ambisonic signals with added focus 1104. The Ambisonic to binaural filter 1103 may comprise a pre-formulated 2×K matrix of finite impulse response (FIR) filters that are applied to the K Ambisonic signals to generate the 2 binaural signals 1106. The FIR filters may have been generated by least-squares optimization methods with respect to a set of head-related impulse responses (HRIRs). An example of such a design procedure is to transform the HRIR data set to frequency bins (for example by FFT) to obtain the HRTF data set, and to determine for each frequency bin a complex-valued processing matrix that approximates, in a least-squares sense, the available HRTF data set at its data points. When the complex-valued matrices have been determined in this way for all frequency bins, the result can be inverse transformed (for example by inverse FFT) to obtain time-domain FIR filters. The FIR filters may also be windowed, for example by using a Hann window.
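A sketch of that least-squares design procedure; the array shapes and the pseudo-inverse formulation of the per-bin least-squares solution are assumptions about one way the described steps could be realized:

```python
import numpy as np

def design_ambi_to_binaural_firs(hrirs, Y, fir_len):
    """Least-squares FIR design following the described procedure.
    hrirs: (2, P, L) head-related impulse responses for P directions,
    Y: (K, P) Ambisonic encoding weights of those P directions,
    fir_len: length of the designed FIR filters.
    Per frequency bin, find the 2xK matrix M minimizing |M Y - H|^2
    (here via the pseudo-inverse of Y), then inverse-FFT the per-bin
    matrices and apply a Hann-type window."""
    H = np.fft.rfft(hrirs, n=fir_len, axis=-1)        # (2, P, bins) HRTFs
    n_bins = H.shape[-1]
    M = np.empty((2, Y.shape[0], n_bins), dtype=complex)
    for b in range(n_bins):
        M[:, :, b] = H[:, :, b] @ np.linalg.pinv(Y)   # least-squares per bin
    firs = np.fft.irfft(M, n=fir_len, axis=-1)        # (2, K, fir_len)
    return firs * np.hanning(fir_len)                 # windowed FIRs
```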
There are many known methods which may be used to render an Ambisonic signal to loudspeaker output. One example may be a linear decoding of the Ambisonic signals to a target loudspeaker configuration. This may be applied when the order of the Ambisonic signals is sufficiently high, for example, at least 3rd order, but preferably 4th order. In a specific example of such linear decoding an Ambisonic decoding matrix may be designed that, when applied to the Ambisonic signals (corresponding to Ambisonic beam patterns), generates loudspeaker signals corresponding to beam patterns that in a least-square sense approximate the vector-base amplitude panning (VBAP) beam patterns suitable for the target loudspeaker configuration. Processing the Ambisonic signals with such a designed Ambisonic decoding matrix may be configured to generate the loudspeaker sound output. In such embodiments the reproduction processor is configured to receive information regarding the loudspeaker configuration.
With respect to
The initial operation is receiving the focus processed Ambisonic audio signals (and the view directions) as shown in
The next operation is one of generating rotation matrix based on the view direction as shown in
The next operation is applying the rotation matrix to the Ambisonic audio signals to generate rotated Ambisonic audio signals with focus processing as shown in
Then the next operation is converting the Ambisonic audio signals to a suitable audio output format, for example a binaural format (or a multichannel audio format) as shown in
The output audio format is then output as shown in
With respect to
In some embodiments the reproduction processor comprises a filter bank 1201 configured to receive the audio channel audio signals 1200 and transform them to frequency bands (unless the input is already in a suitable time-frequency domain). Examples of suitable filter banks include the short-time Fourier transform (STFT) and the complex quadrature mirror filter (QMF) bank. The time-frequency audio signals 1202 can be output to a parametric binaural synthesizer 1203.
In some embodiments the reproduction processor comprises a parametric binaural synthesizer 1203 configured to receive the time-frequency audio signals 1202 and the modified (and unmodified) metadata 1204 and also the view direction 1206 (or suitable reproduction related control or tracking information). In context of 6DOF reproduction, the user position may be provided along with the view direction parameter.
The parametric binaural synthesizer 1203 may be configured to implement any suitable known parametric spatial synthesis method configured to generate a binaural audio signal (in frequency bands) 1208, since the focus modification has already taken place for the signals and the metadata before the parametric binauralization block. The binauralized time-frequency audio signals 1208 can then be passed to an inverse filter bank 1205. The embodiments may further feature the reproduction processor comprising an inverse filter bank 1205 configured to receive the binauralized time-frequency audio signals 1208 and apply the inverse of the forward filter bank, thus generating a time-domain binauralized audio signal 1210 with the focus characteristics suitable for reproduction by headphones (not shown in
In some embodiments the binaural audio signal output is replaced by a loudspeaker channel audio signals output format from the parametric spatial audio signals using suitable loudspeaker synthesis methods. Any suitable approach may be used, for example one where the view direction parameter is replaced with information of the positions of the loudspeakers, and the binaural processor is replaced with a loudspeaker processor, based on suitable known methods.
With respect to
The initial operation is receiving the focus processed parametric spatial audio signals (and the view directions or other reproduction related control or tracking information) as shown in
The next operation is one of time-frequency converting the audio signals as shown in
The next operation is applying a parametric binaural (or loudspeaker channel format) processor based on the time-frequency converted audio signals, the metadata and viewing direction (or other information) as shown in
The output audio format is then output as shown in
Considering a loudspeaker output for the reproduction processor when the audio signal is in the form of multichannel audio and the focus processor 950 in
In some embodiments the conversion from the first loudspeaker configuration to the second loudspeaker configuration may be implemented using any suitable amplitude panning technique. For example, an amplitude panning technique may comprise deriving an N-by-M matrix of amplitude panning gains that defines the conversion from the M channels of the first loudspeaker configuration to the N channels of the second loudspeaker configuration, and then using the matrix to multiply the channels of an intermediate spatial audio signal provided as a multi-channel loudspeaker signal according to the first loudspeaker configuration. The intermediate spatial audio signal may be understood to be similar to the audio signal with a focused sound component 204 as shown in
For binaural output any suitable binauralization of a multi-channel loudspeaker signal format (and/or objects) may be implemented. For example a typical binauralization may comprise processing the audio channels with head-related transfer functions (HRTFs) and adding synthetic room reverberation to generate an auditory impression of a listening room. The distance+directional (i.e., positional) information of the audio object sounds can be utilized for the 6DOF reproduction with user movement, by adopting the principles outlined for example in GB patent application GB1710085.0.
An example apparatus suitable for implementation is shown in
An audio bitstream obtainer 1423 is configured to obtain an audio bitstream 1424, for example being received/retrieved from storage. In some embodiments the mobile device comprises a decoder 1425 configured to receive compressed audio and decode it. An example of the decoder is an AAC decoder in the case of AAC encoded audio. The resulting decoded (for example Ambisonic where the example implements the examples as shown in
The mobile phone 1401 receives controller data 1400 (for example via Bluetooth) from an external controller at a controller data receiver 1411 and passes that data to the focus parameter (from controller data) determiner 1421. The focus parameter (from controller data) determiner 1421 determines the focus parameters, for example based on the orientation of the controller device and/or button events. The focus parameters can comprise any kind of combination of the proposed focus parameters (e.g., focus direction, focus amount, focus height, and focus width). The focus parameters 1422 are forwarded to the focus processor 1427.
Based on the Ambisonic audio signals and focus parameters, a focus processor 1427 is configured to create modified Ambisonic signals 1428 that have the desired focus characteristics. These modified Ambisonic signals 1428 are forwarded to the Ambisonic to binaural processor 1429. The Ambisonic to binaural processor 1429 is also configured to receive head orientation information 1404 from the orientation tracker 1413 of the mobile phone 1401. Based on the modified Ambisonic signals 1428 and the head orientation information 1404, the Ambisonic to binaural processor 1429 is configured to create head-tracked binaural signals 1430 which can be output from the mobile phone and played back using, e.g., headphones.
In some embodiments the focus amount can be controlled using Focus amount buttons (shown in
In some embodiments the focus shape can be determined by drawing the desired shape with a controller (e.g., with the one depicted in
In some embodiments, the focus controller as shown in
In an example scene, there are two sources of interest, for example talkers. The user then points and clicks “select focus direction” to both of these sources, and the visual display then indicates for the user that these sources (which are not only auditory sources but also visual sources at certain directions and distances) have been selected for audio focus. Then the user selects the focus amount and focus radius parameters, where the focus radius indicates how far auditory events from the sources of interest are to be included within the determined focus shape. During control adjustment, the focus radius could be indicated as visual spheres around the visual sources of interest.
The visual field may react to user movement, but the sources may also move within the scene, and the source positions are tracked, typically visually. Therefore, the focus shape, which in this case may be represented by two spheres in the 3D space, changes its overall shape adaptively by moving those spheres.
In other words, a complex focus shape with depth focus is obtained. Then, depending on the spatial audio format, that focus shape can be either accurately reproduced (when the spatial audio has reliable distance information) or otherwise approximated, for example as exemplified above.
In some embodiments, it may be desirable to further specify the focus processing, for example by determining a desired frequency range or spectral property of the focused signal. In particular, it may be useful to emphasize the focused audio spectrum at the speech frequency range to improve the intelligibility, for example by attenuating low frequency content (for example, below 200 Hz), and the high-frequency content (for example, above 8 kHz), thus leaving a particularly useful frequency range related to speech.
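Such a spectral restriction can be sketched as a per-bin gain mask; the cut-off frequencies follow the example values in the text, while the 0.1 attenuation factor is an assumption:

```python
import numpy as np

def speech_emphasis_mask(freqs_hz, low_cut=200.0, high_cut=8000.0, atten=0.1):
    """Per-bin gain mask that attenuates content below ~200 Hz and above
    ~8 kHz, leaving the speech-relevant range unmodified. The attenuation
    factor (0.1) is an example value, not taken from the source."""
    g = np.ones_like(freqs_hz, dtype=float)
    g[(freqs_hz < low_cut) | (freqs_hz > high_cut)] = atten
    return g
```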
It is understood that the focus-processed signal may be further processed with any known audio processing techniques, such as automatic gain control or enhancement techniques (e.g. bandwidth extension, noise suppression).
In some further embodiments, the focus parameters (including the direction, the amount and at least one focus shape parameter) are generated by a content creator, and the parameters are sent alongside the spatial audio signal. For example the scene may be a VR video/audio recording of an unplugged music concert near the stage. The content creator may assume that the typical remote listener wishes to determine a focus arc that spans towards the stage, and also to the sides for room acoustic effect, but removes the direct sounds from the audience (behind the VR camera main direction) at least to some degree. Therefore, a focus parameter track is added to the stream, and it can be set as the default rendering mode. However, the audience sounds are nevertheless present in the stream, and some users may prefer to discard the focus processing and enable the full sound scene including the audience sounds to be reproduced.
In other words, instead of the user needing to select the direction and shape of the focus, a potentially dynamic focus parameter pre-set can be selected. The pre-set may have been fine-tuned by the content creator to follow the show closely, for example such that the focusing is turned off at the end of each song to play back the applause to the listener. The content creator can generate some expected preference profiles as the focus parameters. The approach is beneficial since only one spatial audio signal needs to be conveyed, while different preference profiles can be added. A legacy player not enabled with focus may decode the Ambisonic signal without the focus procedures.
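A focus parameter track of this kind can be sketched as a time-ordered list of creator-authored pre-sets. The field names and the one-preset-per-time-range structure below are illustrative assumptions, not any standardised bitstream syntax.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FocusPreset:
    """One content-creator-authored focus parameter set, taking effect
    at start_s (seconds into the stream)."""
    start_s: float
    azimuth_deg: float   # focus direction
    arc_deg: float       # focus shape: width of the focus arc
    amount: float        # 0.0 = focus off, 1.0 = full focus

def active_preset(track: List[FocusPreset], t_s: float) -> Optional[FocusPreset]:
    """Return the preset in effect at playback time t_s, or None if
    playback has not yet reached the first preset."""
    current = None
    for p in sorted(track, key=lambda q: q.start_s):
        if p.start_s <= t_s:
            current = p
    return current
```

A renderer consults `active_preset` each frame; a legacy player simply never reads the track and reproduces the full sound scene.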
In some further embodiments, the focus shape is controlled along with a visual zoom in the video with multiple viewing directions. The visual zoom can be conceptualized as the user controlling a set of virtual binoculars in the panoramic, 360-degree, or 3D video. In such a use case, when the visual zoom feature is enabled (for example, at least 1.5× zoom is set), then the audio focus of the spatial audio signal can also be enabled. Since the user is then clearly interested in that particular direction, the focus amount can be set to a high value, for example 80%, and the focus width can be set to correspond to the arc of the visual view in the virtual binoculars. In other words, the focus width gets smaller when the visual zoom is increased. As the focus was set to 80%, the user can still hear, to some degree, the remaining spatial sound from the appropriate directions. In that way, the user hears the occurrence of interesting new content and knows to turn off the visual zoom and look toward the new direction of interest. The zoom processing may also be used in the context of audio codecs that allow such processing. An example of such a codec could, e.g., be MPEG-I.
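The zoom-to-focus mapping described above can be sketched as follows. The base field of view, the threshold, and the inverse-proportional width rule are illustrative assumptions consistent with the example values in the text.

```python
def zoom_to_focus(zoom, base_fov_deg=90.0, zoom_threshold=1.5,
                  focus_amount=0.8):
    """Map a visual zoom factor to audio focus parameters: focus is
    enabled at the zoom threshold, the focus amount is set high
    (e.g. 80%), and the focus width follows the narrowed visual arc.

    Returns (enabled, amount, width_deg).
    """
    if zoom < zoom_threshold:
        return (False, 0.0, 360.0)   # no focus: full sound scene
    width = base_fov_deg / zoom      # visual arc shrinks as zoom grows
    return (True, focus_amount, width)
```

Because the amount stays below 100%, sound from outside the narrowed arc remains audible, so the user still notices new content elsewhere in the scene.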
A user in such embodiments as described above may control the focus shape in a versatile way using the present invention.
An example processing output based on the implementation described for higher-order Ambisonics (HOA) signals is shown in
With respect to
In some embodiments the device 1700 comprises at least one processor or central processing unit 1707. The processor 1707 can be configured to execute various program codes such as the methods such as described herein.
In some embodiments the device 1700 comprises a memory 1711. In some embodiments the at least one processor 1707 is coupled to the memory 1711. The memory 1711 can be any suitable storage means. In some embodiments the memory 1711 comprises a program code section for storing program codes implementable upon the processor 1707. Furthermore in some embodiments the memory 1711 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1707 whenever needed via the memory-processor coupling.
In some embodiments the device 1700 comprises a user interface 1705. The user interface 1705 can be coupled in some embodiments to the processor 1707. In some embodiments the processor 1707 can control the operation of the user interface 1705 and receive inputs from the user interface 1705. In some embodiments the user interface 1705 can enable a user to input commands to the device 1700, for example via a keypad. In some embodiments the user interface 1705 can enable the user to obtain information from the device 1700. For example the user interface 1705 may comprise a display configured to display information from the device 1700 to the user. The user interface 1705 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1700 and further displaying information to the user of the device 1700.
In some embodiments the device 1700 comprises an input/output port 1709. The input/output port 1709 in some embodiments comprises a transceiver. The transceiver in such embodiments can be coupled to the processor 1707 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
The transceiver can communicate with further apparatus by any suitable known communications protocol. For example in some embodiments the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or an infrared data communication pathway (IrDA).
The transceiver input/output port 1709 may be configured to receive the signals and in some embodiments obtain the focus parameters as described herein.
In some embodiments the device 1700 may be employed to generate a suitable audio signal using the processor 1707 executing suitable code. The input/output port 1709 may be coupled to any suitable audio output, for example to a multichannel speaker system and/or headphones (which may be head-tracked or non-tracked headphones) or similar.
In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California, automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or “fab” for fabrication.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.
Number | Date | Country | Kind
---|---|---|---
1908346.8 | Jun 2019 | GB | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/FI2020/050387 | 6/3/2020 | WO |