Embodiments of the invention relate to transport channel or downmix signaling for directional audio coding.
The Directional Audio Coding (DirAC) technique [Pulkki07] is an efficient approach to the analysis and reproduction of spatial sound. DirAC uses a perceptually motivated representation of the sound field based on spatial parameters, i.e., the direction of arrival (DOA) and the diffuseness measured per frequency band. It is built upon the assumption that, at one time instant and within one critical band, the spatial resolution of the auditory system is limited to decoding one cue for direction and another for inter-aural coherence. The spatial sound is then represented in the frequency domain by cross-fading two streams: a non-directional diffuse stream and a directional non-diffuse stream.
DirAC was originally intended for recorded B-format sound but can also be extended to microphone signals matching a specific loudspeaker setup like 5.1 [2] or any configuration of microphone arrays [5]. In the latter case, more flexibility can be achieved by recording the signals not for a specific loudspeaker setup, but instead recording the signals of an intermediate format.
Such an intermediate format, which is well-established in practice, is represented by (higher-order) Ambisonics [3]. From an Ambisonics signal, one can generate the signals of every desired loudspeaker setup including binaural signals for headphone reproduction. This requires a specific renderer which is applied to the Ambisonics signal, using either a linear Ambisonics renderer [3] or a parametric renderer such as Directional Audio Coding (DirAC).
An Ambisonics signal can be represented as a multi-channel signal where each channel (referred to as Ambisonics component) is equivalent to the coefficient of a so-called spatial basis function. With a weighted sum of these spatial basis functions (with the weights corresponding to the coefficients) one can recreate the original sound field in the recording location [3]. Therefore, the spatial basis function coefficients (i.e., the Ambisonics components) represent a compact description of the sound field in the recording location. There exist different types of spatial basis functions, for example spherical harmonics (SHs) [3] or cylindrical harmonics (CHs) [3]. CHs can be used when describing the sound field in the 2D space (for example for 2D sound reproduction) whereas SHs can be used to describe the sound field in the 2D and 3D space (for example for 2D and 3D sound reproduction).
As an example, an audio signal f(t) which arrives from a certain direction (φ, θ) results in a spatial audio signal f(φ, θ, t) which can be represented in Ambisonics format by expanding the spherical harmonics up to a truncation order H:
where Ylm(φ, θ) are the spherical harmonics of order l and mode m, and ϕlm(t) are the expansion coefficients. With increasing truncation order H the expansion results in a more precise spatial representation. Spherical harmonics up to order H=4 with Ambisonics Channel Numbering (ACN) index are illustrated in
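The expansion referred to above presumably takes the conventional form of a truncated spherical-harmonics series; a LaTeX sketch of this assumed form, reconstructed from the surrounding definitions rather than copied from the original figure, is:

```latex
f(\varphi, \theta, t) = \sum_{l=0}^{H} \sum_{m=-l}^{l} Y_l^m(\varphi, \theta)\, \phi_l^m(t)
```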
DirAC was already extended for delivering higher-order Ambisonics signals from a first-order Ambisonics signal (FOA, also called B-format) or from different microphone arrays [5]. This document focuses on a more efficient way to synthesize higher-order Ambisonics signals from DirAC parameters and a reference signal. In this document, the reference signal, also referred to as the down-mix signal, is considered a subset of a higher-order Ambisonics signal or a linear combination of a subset of the Ambisonics components.
In the DirAC analysis, the spatial parameters of DirAC are estimated from the audio input signals. Originally, DirAC has been developed for first-order Ambisonics (FOA) input that can, e.g., be obtained from B-format microphones; however, other input signals are possible as well. In the DirAC synthesis, the output signals for the spatial reproduction, e.g., loudspeaker signals, are computed from the DirAC parameters and the associated audio signals. Solutions have been described for using only an omnidirectional audio signal for the synthesis or for using the entire FOA signal [Pulkki07]. Alternatively, only a subset of the four FOA signal components can be used for the synthesis.
Due to its efficient representation of spatial sound, DirAC is also well suited as a basis for spatial audio coding systems. The objective of such a system is to be able to code spatial audio scenes at low bit-rates and to reproduce the original audio scene as faithfully as possible after transmission. In this case the DirAC analysis is followed by a spatial metadata encoder, which quantizes and encodes the DirAC parameters to obtain a low bit-rate parametric representation. Along with the metadata, a down-mix signal derived from the original audio input signals is coded for transmission by a conventional audio core-coder. For example, an EVS-based audio coder can be adopted for coding the down-mix signal. The down-mix signal consists of different channels, called transport channels: the down-mix signal can be, e.g., the four coefficient signals composing a B-format signal (i.e., FOA), a stereo pair, or a monophonic down-mix depending on the targeted bit-rate. The coded spatial parameters and the coded audio bit-stream are multiplexed before transmission.
In the following, an overview of a state-of-the-art spatial audio coding system based on DirAC designed for Immersive Voice and Audio Services (IVAS) is presented. The objective of such a system is to be able to handle different spatial audio formats representing the audio scene and to code them at low bit-rates and to reproduce the original audio scene as faithfully as possible after transmission.
The system can accept as input different representations of audio scenes. The input audio scene can be represented by multi-channel signals aimed to be reproduced at the different loudspeaker positions, auditory objects along with metadata describing the positions of the objects over time, or a first-order or higher-order Ambisonics format representing the sound field at the listener or reference position.
Advantageously the system is based on 3GPP Enhanced Voice Services (EVS) since the solution is expected to operate with low latency to enable conversational services on mobile networks.
The encoder side of the DirAC-based spatial audio coding supporting different audio formats is illustrated in
In addition to the described channel-based, HOA-based, and object-based input formats, the IVAS encoder may receive a parametric representation of spatial sound composed of spatial and/or directional metadata and one or more associated audio input signals. The metadata can for example correspond to the DirAC metadata, i.e., DOA and diffuseness of the sound. The metadata may also include additional spatial parameters such as multiple DOAs with associated energy measures, distance or position values, or measures related to the coherence of the sound field. The associated audio input signals may be composed of a mono signal, an Ambisonics signal of first or higher order, an X/Y-stereo signal, an A/B-stereo signal, or any other combination of signals resulting from recordings with microphones having various directivity patterns and/or mutual spacings.
For parametric spatial audio input, the IVAS encoder determines the DirAC parameters used for transmission based on the input spatial metadata.
Along with the parameters, a down-mix (DMX) signal derived from the different sources or audio input signals is coded for transmission by a conventional audio core-coder. In this case an EVS-based audio coder is adopted for coding the down-mix signal. The down-mix signal consists of different channels, called transport channels: The signal can be e.g. the four coefficient signals composing a B-format or first-order Ambisonics (FOA) signal, a stereo pair, or a monophonic down-mix depending on the targeted bit-rate. The coded spatial parameters and the coded audio bitstream are multiplexed before being transmitted over the communication channel.
In the decoder, shown in
The decoder of the DirAC-based spatial audio coding delivering different audio formats is illustrated in
A conventional HOA synthesis using the DirAC paradigm is depicted in
The down-mix signal can be the original microphone signals or a mixture of the original signals depicting the original audio scene. For example if the audio scene is captured by a sound field microphone, the down-mix signal can be the omnidirectional component of the scene (W), a stereo down-mix (L/R), or the first order Ambisonics signal (FOA).
For each time-frequency tile, a sound direction, also called direction-of-arrival (DOA), and a diffuseness factor are estimated by the direction estimator 2020 and by the diffuseness estimator 2010, respectively, if the down-mix signal contains sufficient information for determining such DirAC parameters. This is the case, for example, if the down-mix signal is a first-order Ambisonics (FOA) signal. Alternatively, or if the down-mix signal is not sufficient to determine such parameters, the parameters can be conveyed directly to the DirAC synthesis via an input bit-stream containing the spatial parameters. The bit-stream could consist for example of quantized and coded parameters received as side-information in the case of audio transmission applications. In this case, the parameters are derived outside the DirAC synthesis module from the original microphone signals or the input audio formats given to the DirAC analysis module at the encoder side, as illustrated by switch 2030 or 2040.
The sound directions are used by a directional gains evaluator 2050 for evaluating, for each time-frequency tile of the plurality of time-frequency tiles, one or more sets of (H+1)² directional gains Glm(k,n), where H is the order of the synthesized Ambisonics signal.
The directional gains can be obtained by evaluating the spatial basis function for each estimated sound direction at the desired order (level) l and mode m of the Ambisonics signal to be synthesized. The sound direction can be expressed, for example, in terms of a unit-norm vector n(k,n) or in terms of an azimuth angle φ(k,n) and/or elevation angle θ(k,n), which are related for example as:
After estimating or obtaining the sound direction, a response of a spatial basis function of the desired order (level) l and mode m can be determined, for example, by considering real-valued spherical harmonics with SN3D normalization as the spatial basis function:
with the ranges 0≤l≤H, and −l≤m≤l. Pl|m| are the Legendre-functions and Nl|m| is a normalization term for both the Legendre functions and the trigonometric functions which takes the following form for SN3D:
where the Kronecker-delta δm is one for m=0 and zero otherwise. The directional gains are then directly deduced for each time-frequency tile of indices (k,n) as:
Glm(k,n) = Ylm(φ(k,n), θ(k,n))
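As an illustration of this directional-gain evaluation for the first-order (H=1) case, the following sketch evaluates the four real spherical harmonics in ACN ordering with SN3D normalization at an estimated sound direction; the ordering/normalization convention, the function name, and the array layout are assumptions chosen for illustration and are not prescribed by the text above.

```python
import numpy as np

def first_order_directional_gains(azimuth, elevation):
    """Evaluate the four first-order real spherical harmonics (ACN order,
    SN3D normalization assumed) at the sound direction given by azimuth and
    elevation angles in radians. Returns [G_0^0, G_1^-1, G_1^0, G_1^1]."""
    az = np.asarray(azimuth)
    el = np.asarray(elevation)
    g_w = np.ones_like(az)            # l = 0, m = 0  (omnidirectional)
    g_y = np.sin(az) * np.cos(el)     # l = 1, m = -1 (dipole along y)
    g_z = np.sin(el)                  # l = 1, m = 0  (dipole along z)
    g_x = np.cos(az) * np.cos(el)     # l = 1, m = +1 (dipole along x)
    return np.stack([g_w, g_y, g_z, g_x], axis=0)

# Example: gains for one time-frequency tile with a DOA at 30 deg azimuth,
# 10 deg elevation.
gains = first_order_directional_gains(np.deg2rad(30.0), np.deg2rad(10.0))
```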
The direct sound Ambisonics components Ps,lm are computed from a reference signal Pref derived from the down-mix signal, multiplied by the directional gains and by a factor that is a function of the diffuseness Ψ(k,n):
Ps,lm(k,n) = Pref(k,n) √(1−Ψ(k,n)) Glm(k,n)
For example, the reference signal Pref can be the omnidirectional component of the down-mix signal or a linear combination of the K channels of the down-mix signal.
The diffuse sound Ambisonics component can be modelled by using a response of a spatial basis function for sounds arriving from all possible directions. One example is to define the average response Dlm by considering the integral of the squared magnitude of the spatial basis function Ylm(φ, θ) over all possible angles φ and θ:
Dlm = ∫₀^(2π) ∫₀^π |Ylm(φ, θ)|² sin θ dθ dφ
The diffuse sound Ambisonics components Pd,lm are computed from a signal Pdiff multiplied by the average response and by a factor that is a function of the diffuseness Ψ(k,n):
Pd,lm(k,n) = Pdiff,lm(k,n) √(Ψ(k,n)) √(Dlm)
The signals Pdiff,lm can be obtained by applying different decorrelators to the reference signal Pref.
Finally, the direct sound Ambisonics component and the diffuse sound Ambisonics component are combined 2060, for example, via the summation operation, to obtain the final Ambisonics component Plm of the desired order (level) l and mode m for the time-frequency tile (k,n), i.e.,
Plm(k,n) = Ps,lm(k,n) + Pd,lm(k,n)
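A minimal per-tile sketch of the synthesis chain described by the preceding equations, assuming NumPy arrays with illustrative shapes (components × frequency bands × time frames); the decorrelated signals Pdiff,lm and the average responses Dlm are taken as given inputs, and the names are assumptions for illustration only.

```python
import numpy as np

def synthesize_ambisonics_tiles(p_ref, p_diff, gains, d_avg, diffuseness):
    """Combine direct and diffuse Ambisonics components per time-frequency tile.

    p_ref       : reference signal, shape (K, N) for K bands and N frames
    p_diff      : decorrelated signals per component, shape (C, K, N)
    gains       : directional gains G_l^m per component, shape (C, K, N)
    d_avg       : average diffuse responses D_l^m per component, shape (C,)
    diffuseness : diffuseness Psi in [0, 1], shape (K, N)
    Returns the synthesized components P_l^m, shape (C, K, N).
    """
    direct = np.sqrt(1.0 - diffuseness) * gains * p_ref                      # P_s,l^m
    diffuse = np.sqrt(diffuseness) * np.sqrt(d_avg)[:, None, None] * p_diff  # P_d,l^m
    return direct + diffuse
```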
The obtained Ambisonics components may be transformed back into the time domain using an inverse filter bank 2080 or an inverse STFT, stored, transmitted, or used, for example, for spatial sound reproduction applications. Alternatively, a linear Ambisonics renderer 2070 can be applied for each frequency band for obtaining signals to be played on a specific loudspeaker layout or over headphones before transforming the loudspeaker signals or the binaural signals to the time domain.
It should be noted that [Thiergart17] also taught the possibility that diffuse sound components Pdiff,lm could only be synthesized up to an order L, where L<H. This reduces the computational complexity while avoiding synthetic artifacts due to the intensive use of decorrelators.
It is the object of the present invention to provide an improved concept for generating a sound field description from an input signal.
The common DirAC synthesis, based on a received DirAC-based spatial audio coding stream, is described in the following. The rendering performed by the DirAC synthesis is based on the decoded down-mix audio signals and the decoded spatial metadata.
The down-mix signal is the input signal of the DirAC synthesis. The signal is transformed into the time-frequency domain by a filter bank. The filter bank can be a complex-valued filter bank like complex-valued QMF or a block transform like STFT.
The DirAC parameters can be conveyed directly to the DirAC synthesis via an input bit-stream containing the spatial parameters. The bit-stream could consist for example of quantized and coded parameters received as side-information in the case of audio transmission applications.
For determining the channel signals for loudspeaker based sound reproduction, each loudspeaker signal is determined based on the down-mix signals and the DirAC parameters. The signal of the j-th loudspeaker Pj(k,n) is obtained as a combination of a direct sound component and a diffuse sound component, i.e.,
Pj(k,n) = Pdir,j(k,n) + Pdiff,j(k,n)
The direct sound component of the j-th loudspeaker channel Pdir,j(k,n) can be obtained by scaling a so-called reference signal Pref,j(k,n) with a factor depending on the diffuseness parameter Ψ(k,n) and a directional gain factor Gj(v(k,n)), where the gain factor depends on the direction-of-arrival (DOA) of sound and potentially also on the position of the j-th loudspeaker channel. The DOA of sound can be expressed for example in terms of a unit-norm vector v(k,n) or in terms of an azimuth angle φ(k,n) and/or elevation angle θ(k,n), which are related for example as
The directional gain factor Gj(v(k,n)) can be computed using well-known methods such as vector-base amplitude panning (VBAP) [Pulkki97].
Considering the above, the direct sound component can be expressed by
Pdir,j(k,n) = Pref,j(k,n) √(1−Ψ(k,n)) Gj(v(k,n))
The spatial parameters describing the DOA of sound and the diffuseness are either estimated at the decoder from the transport channels or obtained from the parametric metadata included in the bitstream.
The diffuse sound component Pdiff,j(k,n) can be determined based on the reference signal and the diffuseness parameter:
Pdiff,j(k,n) = Pref,j(k,n) √(Ψ(k,n)) Gnorm
The normalization factor Gnorm depends on the playback loudspeaker configuration. Usually, the diffuse sound components associated with the different loudspeaker channels Pdiff,j (k,n) are further processed, i.e., they are mutually decorrelated. This can also be achieved by decorrelating the reference signal for each output channel, i.e.,
Pdiff,j(k,n) = P̃ref,j(k,n) √(Ψ(k,n)) Gnorm,
where P̃ref,j(k,n) denotes a decorrelated version of Pref,j(k,n).
The reference signal for the j-th output channel is obtained based on the transmitted down-mix signals. In the simplest case, the down-mix signal consists of a monophonic omnidirectional signal (e.g. the omnidirectional component W(k,n) of an FOA signal) and the reference signal is identical for all output channels:
Pref,j(k,n) = W(k,n)
If the transport channels correspond to the four components of an FOA signal, the reference signals can be obtained by a linear combination of the FOA components. Typically, the FOA signals are combined such that the reference signal of the j-th channel corresponds to a virtual cardioid microphone signal pointing to the direction of the j-th loudspeaker [Pulkki07].
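As a sketch of such a virtual cardioid reference signal, the following combines the FOA components into a first-order signal pointing towards the j-th loudspeaker, assuming ACN/SN3D conventions; the 0.5 shape factor and the names are illustrative assumptions, not a definitive implementation.

```python
import numpy as np

def virtual_cardioid_reference(w, x, y, z, ls_azimuth, ls_elevation, shape=0.5):
    """Reference signal for one loudspeaker as a virtual first-order microphone
    pointing towards (ls_azimuth, ls_elevation), angles in radians.
    shape = 0.5 yields a cardioid, 1.0 an omni, 0.0 a dipole pattern.
    w, x, y, z are the FOA components (ACN/SN3D assumed), arrays of equal shape."""
    ux = np.cos(ls_azimuth) * np.cos(ls_elevation)
    uy = np.sin(ls_azimuth) * np.cos(ls_elevation)
    uz = np.sin(ls_elevation)
    return shape * w + (1.0 - shape) * (ux * x + uy * y + uz * z)
```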
The DirAC synthesis typically provides an improved sound reproduction quality for an increased number of down-mix channels, as the required amount of synthetic decorrelation, the degree of nonlinear processing by the directional gain factors, and the cross-talk between different loudspeaker channels can be reduced, and the associated artifacts can be avoided or mitigated.
Generally, the straightforward approach of introducing many different transport signals into the encoded audio scene is inflexible on the one hand and bitrate-consuming on the other hand. Typically, it may not be necessary in all cases to introduce, for example, all four component signals of a first-order Ambisonics signal into the encoded audio signal, since one or more components do not have a significant energy contribution. On the other hand, the bitrate requirements may be tight, which forbids introducing more than two transport channels into the encoded audio signal representing a spatial audio representation. In case of such tight bitrate requirements, the encoder and the decoder would have to pre-negotiate a certain representation, and, based on this pre-negotiation, a certain number of transport signals is generated in a pre-negotiated way, and the audio decoder can then synthesize the audio scene from the encoded audio signal based on the pre-negotiated knowledge. This, however, although being useful with respect to bitrate requirements, is inflexible and may additionally result in a significantly reduced audio quality, since the pre-negotiated procedure may not be optimum for a certain audio piece, or may not be optimum for all frequency bands or for all time frames of the audio piece.
Thus, the prior art procedure of representing an audio scene is non-optimum with respect to bitrate requirements, is inflexible, and, additionally, has a high potential of resulting in a significantly reduced audio quality.
An embodiment may have an apparatus for encoding a spatial audio representation representing an audio scene to acquire an encoded audio signal, the apparatus comprising: a transport representation generator for generating a transport representation from the spatial audio representation, and for generating transport metadata related to the generation of the transport representation or indicating one or more directional properties of the transport representation; and an output interface for generating the encoded audio signal, the encoded audio signal comprising information on the transport representation, and information on the transport metadata.
Another embodiment may have an apparatus for decoding an encoded audio signal, comprising: an input interface for receiving the encoded audio signal comprising information on a transport representation and information on transport metadata; and a spatial audio synthesizer for synthesizing a spatial audio representation using the information on the transport representation and the information on the transport metadata.
Another embodiment may have a method for encoding a spatial audio representation representing an audio scene to acquire an encoded audio signal, the method comprising: generating a transport representation from the spatial audio representation; generating transport metadata related to the generation of the transport representation or indicating one or more directional properties of the transport representation; and generating the encoded audio signal, the encoded audio signal comprising information on the transport representation, and information on the transport metadata.
Another embodiment may have a method for decoding an encoded audio signal, the method comprising: receiving the encoded audio signal comprising information on a transport representation and information on transport metadata; and synthesizing a spatial audio representation using the information on the transport representation and the information on the transport metadata.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform the method for encoding a spatial audio representation representing an audio scene to acquire an encoded audio signal, the method comprising: generating a transport representation from the spatial audio representation; generating transport metadata related to the generation of the transport representation or indicating one or more directional properties of the transport representation; and generating the encoded audio signal, the encoded audio signal comprising information on the transport representation, and information on the transport metadata, when said computer program is run by a computer.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform the method for decoding an encoded audio signal, the method comprising: receiving the encoded audio signal comprising information on a transport representation and information on transport metadata; and synthesizing a spatial audio representation using the information on the transport representation and the information on the transport metadata, when said computer program is run by a computer.
Another embodiment may have an encoded audio signal comprising: information on a transport representation of a spatial audio representation; and information on transport metadata.
The present invention is based on the finding that a significant improvement with respect to bitrate, flexibility and audio quality is obtained by using, in addition to a transport representation derived from the spatial audio representation, transport metadata that are related to the generation of the transport representation or that indicate one or more directional properties of the transport representation. An apparatus for encoding a spatial audio representation representing an audio scene therefore generates the transport representation from the audio scene, and, additionally, the transport metadata related to the generation of the transport representation or indicating one or more directional properties of the transport representation or being related to the generation of the transport representation and indicating one or more directional properties of the transport representation. Furthermore, an output interface generates the encoded audio signal comprising information on the transport representation and information on the transport metadata.
On the decoder-side, the apparatus for decoding the encoded audio signal comprises an interface for receiving the encoded audio signal comprising information on the transport representation and the information on the transport metadata and a spatial audio synthesizer then synthesizes the spatial audio representation using both, the information on the transport representation and the information on the transport metadata.
The explicit indication of how the transport representation such as a downmix signal has been generated and/or the explicit indication of one or more directional properties of the transport representation by means of additional transport metadata allows the encoder to generate an encoded audio scene in a highly flexible way that, on the one hand, provides a good audio quality, and on the other hand, fulfills small bitrates requirements. Additionally, by means of the transport metadata, it is even possible for the encoder to find a required optimum balance between bitrate requirements on the one hand and audio quality represented by the encoded audio signal on the other hand. Thus, the usage of explicit transport metadata allows the encoder to apply different ways of generating the transport representation and to additionally adapt the transport representation generation not only from audio piece to audio piece, but even from one audio frame to the next audio frame or, within one and the same audio frame from one frequency band to the other frequency band. Naturally, the flexibility is obtained by generating the transport representation for each time/frequency tile individually so that, for example, the same transport representation can be generated for all frequency bins within a time frame or, alternatively, the same transport representation can be generated for one and the same frequency band over many audio time frames, or an individual transport representation can be generated for each frequency bin of each time frame. All this information, i.e., the way of generating the transport representation and whether the transport representation is related to a full frame, or only to a time/frequency bin or a certain frequency band over many time frames is also included in the transport metadata so that a spatial audio synthesizer is aware of what has been done at the encoder-side and can then apply the optimum procedure at the decoder-side.
Advantageously, certain transport metadata alternatives are selection information indicating which components of a certain set of components representing the audio scene have been selected. A further transport metadata alternative relates to a combination information, i.e., whether and/or how certain component signals of the spatial audio representation have been combined to generate the transport representation. Further information useful as transport metadata relates to sector/hemisphere information indicating to which sector or hemisphere a certain transport signal or a transport channel relates to. Further, metadata useful in the context of the present invention relate to look direction information indicating a look direction of an audio signal included as the transport signal of, advantageously, a plurality of different transport signals in the transport representation. Other look direction information relates to microphone look directions, when the transport representation consists of one or more microphone signals that can, for example, be recorded by physical microphones in a (spatially extended) microphone array or by coincident microphones or, alternatively, these microphone signals can be synthetically generated. Other transport metadata relate to shape parameter data indicating whether a microphone signal is an omnidirectional signal, or has a different shape such as a cardioid shape or a dipole shape. Further transport metadata relate to locations of microphones in case of having more than one microphone signal within the transport representation. Other useful transport metadata relate to orientation data of the one or more microphones, to distance data indicating a distance between two microphones or directional patterns of the microphones. Furthermore, additional transport metadata may relate to a description or identification of a microphone array such as a circular microphone array or which microphone signals from such a circular microphone array have been selected as the transport representation.
Further transport metadata may relate to information on beamforming, corresponding beamforming weights or corresponding directions of beams and, in such a situation, the transport representation typically consists of an advantageously synthetically created signal having a certain beam direction. Further transport metadata alternatives may relate to the pure information whether the included transport signals are omnidirectional microphone signals or are non-omnidirectional microphone signals such as cardioid signals or dipole signals.
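To make the transport metadata alternatives listed above more concrete, the following sketch collects them in a single illustrative structure; the field names and types are assumptions for illustration only and do not represent an actual bitstream syntax.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class TransportMetadata:
    """Illustrative container for the down-mix/transport metadata alternatives."""
    selected_components: Optional[List[int]] = None           # e.g., ACN indices of the kept FOA/HOA components
    combination_weights: Optional[List[List[float]]] = None   # weights used to form each transport channel
    look_directions: Optional[List[Tuple[float, float]]] = None   # (azimuth, elevation) per channel, in radians
    shape_parameters: Optional[List[float]] = None             # first-order shape c_m (1 = omni, 0.5 = cardioid, 0 = dipole)
    mic_positions: Optional[List[Tuple[float, float, float]]] = None  # Cartesian positions of the selected microphones
    mic_distances: Optional[List[float]] = None                # inter-microphone distances in meters
    beamformer_directions: Optional[List[Tuple[float, float]]] = None # beam look directions, if beamforming was used
    array_description: Optional[str] = None                    # e.g., "uniform circular array, M = 8"
```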
Thus, it becomes clear that the different transport metadata alternatives are highly flexible and can be represented in a highly compact way, so that the additional transport metadata typically does not result in a significant amount of additional bitrate. Instead, the bitrate required for the additional transport metadata may typically be less than 1%, or even less than 1/1000, of the amount required for the transport representation. On the other hand, this very small amount of additional metadata results in a higher flexibility and, at the same time, a significant increase of audio quality due to the additional flexibility and due to the potential of having changing transport representations over different audio pieces or even, within one and the same audio piece, over different time frames and/or frequency bins.
Advantageously, the encoder additionally comprises a parameter processor for generating spatial parameters from the spatial audio representation so that, in addition to the transport representation and the transport metadata, spatial parameters are included in the encoded audio signal to enhance the audio quality over a quality only obtainable by means of the transport representation and the transport metadata. These spatial parameters are advantageously time and/or frequency-dependent direction of arrival (DoA) data and/or frequency and/or time-dependent diffuseness data as are, for example, known from DirAC coding.
On the audio decoder-side, an input interface receives the encoded audio signal comprising information on a transport representation and information on transport metadata. Furthermore, the spatial audio synthesizer provided in the apparatus for decoding the encoded audio signal synthesizes the spatial audio representation using both, the information on the transport representation and the information on the transport metadata. In embodiments, the decoder additionally uses optionally transmitted spatial parameters to synthesize the spatial audio representation not only using the information on the transport metadata and the information on the transport representation, but also using the spatial parameters.
The apparatus for decoding the encoded audio signal receives the transport metadata, interprets or parses the received transport metadata, and then controls a combiner for combining transport representation signals or for selecting from the transport representation signals or for generating one or several reference signals. The combiner/selector/reference signal generator then forwards the reference signal to a component signal calculator that calculates the required output components from the specifically selected or generated reference signals. In embodiments, not only the combiner/selector/reference signal generator in the spatial audio synthesizer is controlled by the transport metadata, but also the component signal calculator, so that, based on the received transport metadata, not only the reference signal generation/selection is controlled, but also the actual component calculation. However, embodiments in which only the component signal calculation is controlled by the transport metadata, or in which only the reference signal generation or selection is controlled by the transport metadata, are also useful and provide improved flexibility over existing solutions.
Advantageous procedures of different signal selection alternatives are selecting one of a plurality of signals in the transport representation as a reference signal for a first subset of component signals and selecting the other transport signal in the transport representation for the other orthogonal subset of the component signals for multichannel output, first order or higher order Ambisonics output, audio object output, or binaural output. Other procedures rely on calculating the reference signal based on a linear combination of the individual signals included in the transport representation. Depending on the certain transport representation implementation, the transport metadata is used for determining a reference signal for (virtual) channels from the actually transmitted transport signals and determining missing components based on a fallback, such as a transmitted or generated omnidirectional signal component. These procedures rely on calculating missing, advantageously FOA or HOA components using a spatial basis function response related to a certain mode and order of a first order or higher order Ambisonics spatial audio representation.
Other embodiments relate to transport metadata describing microphone signals included in the transport representation, and, based on the transmitted shape parameter and/or look direction, a reference signal determination is adapted to the received transport metadata. Furthermore, the calculation of omnidirectional signals or dipole signals and the additional synthesis of remaining components is also performed based on the transport metadata indicating, for example, that the first transport channel is a left or front cardioid signal, and the second transport signal is a right or back cardioid signal.
Further procedures relate to the determination of reference signals based on a smallest distance of a certain speaker to a certain microphone position, or the selection, as a reference signal, of a microphone signal included in the transport representation with a closest look direction or a closest beamformer or a certain closest array position. A further procedure is the choosing of an arbitrary transport signal as a reference signal for all direct sound components and the usage of all available transport signals, such as transmitted omnidirectional signals from spaced microphones, for the generation of diffuse sound reference signals; the corresponding components are then generated by adding direct and diffuse components to obtain a final channel or Ambisonics component or an object signal or a binaural channel signal. Further procedures that are particularly implemented in the calculation of the actual component signal based on a certain reference signal relate to setting (advantageously restricting) an amount of decorrelation based on a certain microphone distance.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
In the embodiments discussed with respect to
In some applications it is not possible to transmit all four components of an FOA signal as transport channels due to bitrate limitations, but only a down-mix signal with a reduced number of signal components or channels. In order to achieve improved reproduction quality at the decoder, the generation of the transmitted down-mix signals can be done in a time-variant way and can be adapted to the spatial audio input signal. If the spatial audio coding system allows the inclusion of flexible down-mix signals, it is important not only to transmit these transport channels but also to include metadata that specifies important spatial characteristics of the down-mix signals. The DirAC synthesis located at the decoder of a spatial audio coding system is then able to adapt the rendering process in an optimum way considering the spatial characteristics of the down-mix signals. This invention therefore proposes to include down-mix related metadata in the parametric spatial audio coding stream that is used to specify or describe important spatial characteristics of the down-mix transport channels in order to improve the rendering quality at the spatial audio decoder.
In the following, illustrative examples for practical down-mix signal configurations are described.
If the input spatial audio signal mainly includes sound energy in the horizontal plane, only the first three signal components of the FOA signal corresponding to an omnidirectional signal, a dipole signal aligned with the x-axis and a dipole signal aligned with the y-axis of a Cartesian coordinate system are included in the down-mix signal, whereas the dipole signal aligned with the z-axis is excluded.
In another example, only two down-mix signals may be transmitted to further reduce the required bitrate for the transport channels. For example, if there is dominant sound energy originating from the left hemisphere, it is advantageous to generate a down-mix channel that includes sound energy mainly from the left direction and an additional down-mix channel including the sound originating mainly from the opposite direction, i.e. the right hemisphere in this example. This can be achieved by a linear combination of the FOA signal components such that the resulting signals correspond to directional microphone signals with cardioid directivity patterns pointing to the left and right, respectively. Analogously, down-mix signals corresponding to first-order directivity patterns pointing to the front and back direction, respectively, or any other desired directional patterns can be generated by appropriately combining the FOA input signals.
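A sketch of the opposing-cardioid down-mix described above, assuming ACN/SN3D FOA components with Y pointing to the left (+y) and X pointing to the front (+x); the 0.5 scaling and the function name are illustrative assumptions.

```python
def cardioid_downmix_pair(w, x, y, orientation="left_right"):
    """Form two opposing cardioid down-mix channels from FOA components.
    With SN3D scaling, 0.5 * (W + Y) is a cardioid pointing to the left and
    0.5 * (W - Y) a cardioid pointing to the right; X is used for front/back."""
    if orientation == "left_right":
        return 0.5 * (w + y), 0.5 * (w - y)
    elif orientation == "front_back":
        return 0.5 * (w + x), 0.5 * (w - x)
    else:
        raise ValueError("unsupported orientation")
```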
In the DirAC synthesis stage, the computation of the loudspeaker output channels based on the transmitted spatial metadata (e.g. DOA of sound and diffuseness) and the audio transport channels has to be adapted to the actually used down-mix configuration. More specifically, the most suitable choice for the reference signal of the j-th loudspeaker Pref,j(k,n) depends on the directional characteristic of the down-mix signals and the position of the j-th loudspeaker.
For example, if the down-mix signals correspond to two cardioid microphone signals pointing to the left and right, respectively, the reference signal of a loudspeaker located in the left hemisphere should solely use the cardioid signal pointing to the left as reference signal Pref,j(k,n). A loudspeaker located at the center may use a linear combination of both down-mix signals instead.
On the other hand, if the down-mix signals correspond to two cardioid microphone signals pointing to the front and back, respectively, the reference signal of a loudspeaker located in the frontal hemisphere should solely use the cardioid signal pointing to the front as reference signal Pref,j(k,n).
It is important to note that a significant degradation of the spatial audio quality has to be expected if the DirAC synthesis uses a wrong down-mix signal as the reference signal for rendering. For example, if the down-mix signal corresponding to the cardioid microphone pointing to the left is used for generating an output channel signal for a loudspeaker located in the right hemisphere, the signal components originating from the left hemisphere of the input sound field would be directed mainly to the right hemisphere of the reproduction system leading to an incorrect spatial image of the output. It is therefore advantageous to include parametric information in the spatial audio coding stream that specifies spatial characteristics of the down-mix signals such as directivity patterns of corresponding directional microphone signals. The DirAC synthesis located at the decoder of a spatial audio coding system is then able to adapt the rendering process in an optimum way considering the spatial characteristics of the down-mix signals as described in the down-mix related metadata.
In this embodiment, the spatial audio signal, i.e., the audio input signal to the encoder, corresponds to an FOA (first-order Ambisonics) or HOA (higher-order Ambisonics) audio signal. A corresponding block scheme of the encoder is depicted in
In the following, the “down-mix generation” block and down-mix parameters are explained in more detail. If for example the input spatial audio signal mainly includes sound energy in the horizontal plane, only the three signal components of the FOA/HOA signal corresponding to the omnidirectional signal W(k,n), the dipole signal X(k,n) aligned with the x-axis, and the dipole signal Y(k,n) aligned with the y-axis of a Cartesian coordinate system are included in the down-mix signal, whereas the dipole signal Z(k,n) aligned with the z-axis (and all other higher-order components, if existing) are excluded. This means, the down-mix signals are given by
D1(k,n) = W(k,n), D2(k,n) = X(k,n), D3(k,n) = Y(k,n).
Alternatively, if for example the input spatial audio signal mainly includes sound energy in the x-z-plane, the down-mix signals include the dipole signal Z(k,n) instead of Y(k,n).
In this embodiment, the down-mix parameters, depicted in
Note that the selection of the FOA/HOA components for the down-mix signal can be done e.g. based on manual user input or automatically. For example, when the spatial audio input signal was recorded at an airport runway, it can be assumed that most sound energy is contained in a specific vertical Cartesian plane. In this case, e.g. the W(k,n), X(k,n) and Z(k,n) components are selected. In contrast, if the recording was carried out at a street crossing, it can be assumed that most sound energy is contained in the horizontal Cartesian plane. In this case, e.g. the W(k,n), X(k,n) and Y(k,n) components are selected. Alternatively, if for example a video camera is used together with the audio recording, a face recognition algorithm can be used to detect in which Cartesian plane the talker is located and hence, the FOA components corresponding to this plane can be selected for the down-mix. Alternatively, one can determine the plane of the Cartesian coordinate system with highest energy by using a state-of-the-art acoustic source localization algorithm.
Also note that the FOA/HOA component selection and corresponding down-mix metadata can be time and frequency-dependent, e.g., a different set of components and indices, respectively, may be selected automatically for each frequency band and time instance (e.g., by automatically determining the Cartesian plane with highest energy for each time-frequency point). Localizing the direct sound energy can be done for example by exploiting the information contained in the time-frequency dependent spatial parameters [Thiergart09].
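A sketch of such an automatic, time- and frequency-dependent component selection: for each band and frame, the Cartesian plane with the highest dipole energy is chosen and its index is emitted as down-mix metadata. Temporal smoothing and the exact metadata encoding are omitted; all names are illustrative assumptions.

```python
import numpy as np

def select_downmix_plane(x, y, z):
    """x, y, z: dipole components as arrays of shape (K, N) (bands x frames).
    Returns an index per tile: 0 -> keep (W, X, Y), 1 -> keep (W, X, Z),
    2 -> keep (W, Y, Z), chosen as the plane containing the most energy."""
    e = np.stack([np.abs(x)**2 + np.abs(y)**2,   # x-y plane
                  np.abs(x)**2 + np.abs(z)**2,   # x-z plane
                  np.abs(y)**2 + np.abs(z)**2],  # y-z plane
                 axis=0)
    return np.argmax(e, axis=0)
```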
The decoder block scheme corresponding to this embodiment is depicted in
The spatial audio synthesis (DirAC synthesis) described before requires a suited reference signal Pref,j(k,n) for each output channel j. In this invention, it is proposed to compute Pref,j(k,n) from the down-mix signals Dm(k,n) using the additional down-mix metadata. In this embodiment, the down-mix signals Dm(k,n) consist of specifically selected components of an FOA or HOA signal, and the down-mix metadata describes which FOA/HOA components have been transmitted to the decoder.
When rendering to loudspeakers (i.e., MC output of the decoder), a high-quality output can be achieved when computing for each loudspeaker channel a so-called virtual microphone signal, which is directed towards the corresponding loudspeaker, as explained in [Pulkki07]. Normally, computing the virtual microphone signals requires that all FOA/HOA components are available in the DirAC synthesis. In this embodiment, however, only a subset of the original FOA/HOA components is available at the decoder. In this case, the virtual microphone signals can be computed only for the Cartesian plane for which the FOA/HOA components are available, as indicated by the down-mix metadata. For example, if the down-mix metadata indicates that the W(k,n), X(k,n), and Y(k,n) components have been transmitted, we can compute the virtual microphone signals for all loudspeakers in the x-y plane (horizontal plane), where the computation can be performed as described in [Pulkki07]. For elevated loudspeakers outside the horizontal plane, we can use a fallback solution for the reference signal Pref,j(k,n), e.g., we can use the omnidirectional component W(k,n).
Note that a similar concept can be used when rendering to binaural stereo output, e.g., for headphone playback. In this case, the two virtual microphones for the two output channels are directed towards the virtual stereo loudspeakers, where the position of the loudspeakers depends on the head orientation of the listener. If the virtual loudspeakers are located within the Cartesian plane for which the FOA/HOA components have been transmitted, as indicated by the down-mix metadata, we can compute the corresponding virtual microphone signals. Otherwise, a fallback solution is used for the reference signal Pref,j(k,n), e.g., the omnidirectional component W(k,n).
When rendering to FOA/HOA (FOA/HOA output of the decoder in
In this embodiment, the spatial audio signal, i.e., the audio input signal to the encoder, corresponds to an FOA (first-order Ambisonics) or HOA (higher-order Ambisonics) audio signal. A corresponding block scheme of the encoder is depicted in
The down-mix signals are generated in the encoder in the “down-mix generation” block in
Dm(k,n) = am,W W(k,n) + am,X X(k,n) + am,Y Y(k,n) + am,Z Z(k,n).
Note that in case of HOA audio input signals, the linear combination can be performed similarly using the available HOA coefficients. The weights for the linear combination, i.e., the weights am,W, am,X, am,Y, and am,Z in this example, determine the directivity pattern of the resulting directional microphone signal, i.e., of the m-th down-mix signal Dm(k,n). In case of FOA audio input signals, the desired weights for the linear combination can be computed as
Here, cm is the so-called first-order parameter or shape parameter, and Φm and Θm are the desired azimuth angle and elevation angle of the look direction of the generated m-th directional microphone signal. For example, for cm=0.5, a directional microphone with cardioid directivity is achieved, cm=1 corresponds to an omnidirectional characteristic, and cm=0 corresponds to a dipole characteristic. In other words, the parameter cm describes the general shape of the first-order directivity pattern.
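The weights of the linear combination presumably follow the usual first-order pattern cm + (1 − cm)·(unit vector of the look direction), which is consistent with the cardioid, omnidirectional, and dipole special cases given above; the sketch below reconstructs the weights under this assumption (the original equation is not reproduced here).

```python
import numpy as np

def first_order_downmix_weights(c_m, phi_m, theta_m):
    """Weights (a_W, a_X, a_Y, a_Z) for a first-order directional signal with
    shape parameter c_m and look direction (phi_m, theta_m) in radians,
    assuming ACN/SN3D FOA components. c_m = 0.5 gives a cardioid,
    c_m = 1 an omnidirectional, and c_m = 0 a dipole pattern."""
    a_w = c_m
    a_x = (1.0 - c_m) * np.cos(phi_m) * np.cos(theta_m)
    a_y = (1.0 - c_m) * np.sin(phi_m) * np.cos(theta_m)
    a_z = (1.0 - c_m) * np.sin(theta_m)
    return a_w, a_x, a_y, a_z
```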
The weights for the linear combination, e.g., am,W, am,X, am,Y, and am,Z, or the corresponding parameters cm, Φm, and Θm, describe the directivity patterns of the corresponding directional microphone signals. This information is represented by the down-mix parameters in the encoder in
Different encoding strategies can be used to efficiently represent the down-mix parameters in the bitstream including quantization of the directional information or referring to a table entry by an index, where the table includes all relevant parameters.
In some embodiments it is already sufficient or more efficient to use only a limited number of presets for the look directions Φm and Θm as well as for the shape parameter cm. This obviously corresponds to using a limited number of presets for the weights am,W, am,X, am,Y, and am,Z, too. For example, the shape parameters can be limited to represent only three different directivity patterns: omnidirectional, cardioid, and dipole characteristic. The number of possible look directions Φm and Θm can be limited such that they only represent the cases left, right, front, back, up, and down.
In another, even simpler embodiment, the shape parameter is kept fixed and corresponds to a cardioid pattern, or the shape parameter is not defined at all. The down-mix parameters associated with the look direction are used to signal whether a pair of down-mix channels corresponds to a left/right or a front/back channel pair configuration, such that the rendering process at the decoder can use the optimum down-mix channel as reference signal for rendering a certain loudspeaker channel located in the left, right, or frontal hemisphere.
In the practical application, the parameter cm can be defined, e.g., manually (typically cm=0.5). The look directions Φm and Θm can be set automatically (e.g., by localizing the active sound sources using a state-of-the-art sound source localization approach and directing the first down-mix signal towards the localized source and the second down-mix signal towards the opposite direction).
Note that similarly as in the previous embodiment, the down-mix parameters can be time-frequency dependent, i.e., a different down-mix configuration may be used for each time and frequency (e.g., when directing the down-mix signals depending on the active source direction localized separately in each frequency band). The localization can be done for example by exploiting the information contained in the time-frequency dependent spatial parameters [Thiergart09].
In the “spatial audio synthesis” stage in the decoder in
For example, when generating loudspeaker output channels (MC output), the computation of the reference signals Pref,j(k,n) has to be adapted to the actually used down-mix configuration. More specifically, the most suitable choice for the reference signal Pref,j(k,n) of the j-th loudspeaker depends on the directional characteristic of the down-mix signals (e.g., its look direction) and the position of the j-th loudspeaker. For example, if the down-mix metadata indicates that the down-mix signals correspond to two cardioid microphone signals pointing to the left and right, respectively, the reference signal of a loudspeaker located in the left hemisphere should mainly or solely use the cardioid down-mix signal pointing to the left as reference signal Pref,j(k,n). A loudspeaker located at the center may use a linear combination of both down-mix signals instead (e.g., a sum of the two down-mix signals). On the other hand, if the down-mix signals correspond to two cardioid microphone signals pointing to the front and back, respectively, the reference signal of a loudspeaker located in the frontal hemisphere should mainly or solely use the cardioid signal pointing to the front as reference signal Pref,j(k,n).
When generating FOA or HOA output in the decoder in
Pref,1(k,n) = D1(k,n) + D2(k,n).
In fact, it is known that the sum of two cardioid signals with opposite look direction leads to an omnidirectional signal. In this case, Pref,1(k,n) directly results in the first component of the desired FOA or HOA output signal, i.e., no further spatial sound synthesis is required for this component. Similarly, the third FOA component (dipole component in y-direction) can be computed as the difference of the two cardioid down-mix signals, i.e.,
Pref,3(k,n) = D1(k,n) − D2(k,n).
In fact, it is known that the difference of two cardioid signals with opposite look direction leads to a dipole signal. In this case, Pref,3(k,n) directly results in the third component of the desired FOA or HOA output signal, i.e., no further spatial sound synthesis is required for this component. All remaining FOA or HOA components may be synthesized from an omnidirectional reference signal, which contains audio information from all directions. This means, in this example the sum of the two down-mix signals is used for the synthesis of the remaining FOA or HOA components. If the down-mix metadata indicates a different directivity of the two audio down-mix signals, the computation of the reference signals Pref,j(k,n) can be adjusted accordingly. For example, if the two cardioid audio down-mix signals are directed towards the front and back (instead of left and right), the difference of the two down-mix signals can be used to generate the second FOA component (dipole component in x-direction) instead of the third FOA component. In general, as shown by the examples above, the optimal reference signal Pref,j(k,n) can be found by a linear combination of the received down-mix audio signals, i.e.,
Pref,j(k,n) = A1,j D1(k,n) + A2,j D2(k,n)
where the weights A1,j and A2,j of the linear combination depend on the down-mix metadata, i.e., on the transport channel configuration, and on the considered j-th reference signal (e.g., when rendering to the j-th loudspeaker).
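A sketch of how the weights A1,j and A2,j could be selected from the down-mix metadata for the opposing-cardioid example above (sum for the omnidirectional component, difference for the dipole aligned with the cardioid axis); the mapping covers only this particular configuration and is an illustrative assumption.

```python
def cardioid_pair_combination_weights(axis):
    """Return per-FOA-component weights (A_1, A_2) for a pair of opposing
    cardioid down-mix signals aligned with the given axis ('y' for left/right,
    'x' for front/back). Components not listed fall back to the omnidirectional
    sum as reference and are synthesized parametrically."""
    weights = {
        "W": (1.0, 1.0),            # sum of opposing cardioids -> omnidirectional
    }
    if axis == "y":
        weights["Y"] = (1.0, -1.0)  # difference -> dipole along y
    elif axis == "x":
        weights["X"] = (1.0, -1.0)  # difference -> dipole along x
    return weights
```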
Note that the synthesis of FOA or HOA components from an omnidirectional component using spatial metadata is described for example in [Thiergart17].
In general, it is important to note that a significant degradation of the spatial audio quality has to be expected if the spatial audio synthesis uses a wrong down-mix signal as the reference signal for rendering. For example, if the down-mix signal corresponding to the cardioid microphone pointing to the left is used for generating an output channel signal for a loudspeaker located in the right hemisphere, the signal components originating from the left hemisphere of the input sound field would be directed mainly to the right hemisphere of the reproduction system leading to an incorrect spatial image of the output.
In this embodiment, the input to the encoder corresponds to a so-called parametric spatial audio input signal, which comprises the audio signals of an arbitrary array configuration consisting of two or more microphones together with spatial parameters of the spatial sound (e.g., DOA and diffuseness).
The encoder for this embodiment is depicted in
In the following, it is described how the audio down-mix signals and corresponding down-mix metadata can be generated.
In a first example, the audio down-mix signals are generated by selecting a subset of the available input microphone signals. The selection can be done manually (e.g., based on presets) or automatically. For example, if the microphone signals of a uniform circular array with M spaced omnidirectional microphones are used as input to the spatial audio encoder and two audio down-mix transport channels are used for transmission, a manual selection could consist, e.g., of selecting a pair of signals corresponding to the microphones at the front and at the back of the array, or a pair of signals corresponding to the microphones at the left and right side of the array. Selecting the front and back microphones as down-mix signals enables a good discrimination between frontal sounds and sounds from the back when synthesizing the spatial sound at the decoder. Similarly, selecting the left and right microphones would enable a good discrimination of spatial sounds along the y-axis when rendering the spatial sound at the decoder side. For example, if a recorded sound source is located at the left side of the microphone array, there is a difference in the time-of-arrival of the source's signal at the left and right microphone, respectively. In other words, the signal reaches the left microphone first, and then the right microphone. In the rendering process at the decoder, it is therefore also important to use the down-mix signal associated with the left microphone signal for rendering to loudspeakers located in the left hemisphere and, analogously, to use the down-mix signal associated with the right microphone signal for rendering to loudspeakers located in the right hemisphere. Otherwise, the time differences included in the left and right down-mix signals, respectively, would be directed to the loudspeakers in an incorrect way, and the resulting perceptual cues caused by the loudspeaker signals would be incorrect, i.e., the spatial audio image perceived by a listener would be incorrect, too. Analogously, it is important to be able at the decoder to distinguish between down-mix channels corresponding to front and back or up and down in order to achieve optimum rendering quality.
The selection of the appropriate microphone signals can be done by considering the Cartesian plane that contains most of the acoustic energy, or which is expected to contain the most relevant sound energy. To carry out an automatic selection, one can perform, e.g., a state-of-the-art acoustic source localization and then select the two microphones that are closest to the axis corresponding to the source direction. A similar concept can be applied, e.g., if the microphone array consists of M coincident directional microphones (e.g., cardioids) instead of spaced omnidirectional microphones. In this case, one could select the two directional microphones that are oriented in the direction and in the opposite direction of the Cartesian axis that contains (or is expected to contain) most acoustic energy.
In this first example, the down-mix metadata contains the relevant information on the selected microphones. This information can contain for example the microphone positions of the selected microphones (e.g., in terms of absolute or relative coordinates in a Cartesian coordinate system) and/or inter-microphone distances and/or the orientation (e.g., in terms of coordinates in the polar coordinate system, i.e., in terms of an azimuth and elevation angle Φm and Θm). Additionally, the down-mix metadata may comprise information on the directivity pattern of the selected microphones, e.g., by using the first-order parameter cm described before.
On the decoder side (
When generating FOA/HOA output at the decoder, a single down-mix signal may be selected (at will) for generating the direct sound for all FOA/HOA components if the down-mix metadata indicates that spaced omnidirectional microphones have been transmitted. In fact, each omnidirectional microphone contains the same information on the direct sound to be reproduced due to the omnidirectional characteristic. However, for generating the diffuse sound reference signals {tilde over (P)}ref,j, one can consider all transmitted omnidirectional down-mix signals. In fact, if the sound field is diffuse, the spaced omnidirectional down-mix signals will be partially decorrelated such that less decorrelation is required to generate mutually uncorrelated reference signals {tilde over (P)}ref,j. The mutually uncorrelated reference signals can be generated from the transmitted down-mix audio signals by using e.g. the covariance-based rendering approach proposed in [Vilkamo13].
It is well known that the correlation between the signals of two microphones in a diffuse sound field strongly depends on the distance between the microphones: the larger the distance between the microphones, the less correlated the recorded signals in a diffuse sound field are [Laitinen11]. The information related to the microphone distance included in the down-mix parameters can be used at the decoder to determine by how much the down-mix channels have to be synthetically decorrelated to be suitable for rendering diffuse sound components. In case the down-mix signals are already sufficiently decorrelated due to sufficiently large microphone spacings, artificial decorrelation may even be omitted entirely, and any decorrelation-related artifacts can be avoided.
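This dependence can be made explicit with the well-known spatial coherence of an ideal diffuse field between two omnidirectional microphones. The following sketch uses that relation to decide per frequency band whether artificial decorrelation is still needed; the decision threshold is an assumption of this sketch and not a value taken from the text.

```python
import numpy as np

def diffuse_field_coherence(freq_hz, mic_distance_m, c=343.0):
    """Spatial coherence of two omnidirectional microphones in an ideal
    (spherically isotropic) diffuse field: sin(2*pi*f*d/c) / (2*pi*f*d/c)."""
    x = 2.0 * np.pi * freq_hz * mic_distance_m / c
    return np.sinc(x / np.pi)            # np.sinc(t) = sin(pi*t)/(pi*t)

def needs_artificial_decorrelation(freq_hz, mic_distance_m, threshold=0.3):
    """Decide per frequency band whether synthetic decorrelation is still
    required, given the microphone spacing signalled in the down-mix metadata."""
    gamma = np.abs(diffuse_field_coherence(freq_hz, mic_distance_m))
    return gamma > threshold

freqs = np.array([125.0, 500.0, 2000.0, 8000.0])
print(needs_artificial_decorrelation(freqs, mic_distance_m=0.10))
# Widely spaced microphones are already decorrelated at high frequencies,
# so artificial decorrelation (and its artifacts) can be avoided there.
```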
When the down-mix metadata indicates that e.g. coincident directional microphone signals have been transmitted as downmix signals, then the reference signals Pref,j(k,n) for FOA/HOA output can be generated as explained in the second embodiment.
Note that instead of selecting a subset of microphones as down-mix audio signals in the encoder, one could select all available microphone input signals (for example two or more) as down-mix audio signals. In this case, the down-mix metadata describes the entire microphone array configuration, e.g., in terms of Cartesian microphone positions, microphone look directions Φm and Θm in polar coordinates, or microphone directivities in terms of first-order parameters cm.
In a second example, the down-mix audio signals are generated in the encoder in the “down-mix generation” block using a linear combination of the input microphone signals, e.g., using spatial filtering (beamforming). In this case, the down-mix signals Dm(k,n) can be computed as
$D_m(k,n) = \mathbf{w}_m^{H}\,\mathbf{x}(k,n)$
Here, x(k,n) is a vector containing all input microphone signals and w_m^H contains the weights of the linear combination, i.e., the weights of the spatial filter or beamformer, for the m-th audio down-mix signal (with (·)^H denoting the conjugate transpose). There are various ways to compute spatial filters or beamformers in an optimal way [Veen88]. In many cases, a look direction {Φm, Θm} is defined towards which the beamformer is directed. The beamformer weights can then be computed, e.g., as a delay-and-sum beamformer or an MVDR beamformer [Veen88]. In this embodiment, the beamformer look direction {Φm, Θm} is defined for each audio down-mix signal. This can be done manually (e.g., based on presets) or automatically in the same way as described in the second embodiment. The look directions {Φm, Θm} of the beamformer signals, which represent the different audio down-mix signals, can then represent the down-mix metadata that is transmitted to the decoder in
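As an illustration of how such beamformed down-mix signals could be obtained, the following sketch computes far-field delay-and-sum weights for a given look direction and applies them to one time-frequency tile; the array geometry, the chosen look directions and the parameter values are assumptions of this sketch.

```python
import numpy as np

def delay_and_sum_weights(mic_pos, look_azimuth_deg, look_elevation_deg, freq_hz, c=343.0):
    """Far-field delay-and-sum beamformer weights for one look direction.

    mic_pos : (M, 3) Cartesian microphone positions in metres.
    Returns the weight vector w_m (length M) for one frequency bin.
    """
    az = np.deg2rad(look_azimuth_deg)
    el = np.deg2rad(look_elevation_deg)
    # unit vector pointing from the array towards the look direction
    u = np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])
    # phase of a plane wave arriving from direction u at each microphone
    steering = np.exp(1j * 2.0 * np.pi * freq_hz * (mic_pos @ u) / c)
    return steering / mic_pos.shape[0]

# Illustrative use: two beams (left and right) as the two down-mix channels
mic_pos = np.array([[0.05, 0.0, 0.0], [0.0, 0.05, 0.0],
                    [-0.05, 0.0, 0.0], [0.0, -0.05, 0.0]])   # 4-mic circular array
x = np.random.randn(4) + 1j * np.random.randn(4)             # one STFT bin x(k, n)
downmix = [np.conj(delay_and_sum_weights(mic_pos, az, 0.0, freq_hz=1000.0)) @ x
           for az in (90.0, -90.0)]                           # D_m(k, n) = w_m^H x(k, n)
```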
Another example is especially suitable when using loudspeaker output at the decoder (MC output). In this case, the down-mix signal Dm(k,n) whose beamformer look direction is closest to the loudspeaker direction is used as Pref,j(k,n). The required beamformer look direction is described by the down-mix metadata.
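A possible realization of this look-direction matching is sketched below; the angle convention and function names are illustrative assumptions.

```python
import numpy as np

def pick_reference_downmix(downmix_look_dirs_deg, loudspeaker_dir_deg):
    """Return the index of the down-mix channel whose signalled look direction
    (azimuth, elevation) is closest to the given loudspeaker direction."""
    def unit(az_el):
        az, el = np.deg2rad(az_el[0]), np.deg2rad(az_el[1])
        return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])
    ls = unit(loudspeaker_dir_deg)
    angles = [np.arccos(np.clip(unit(d) @ ls, -1.0, 1.0)) for d in downmix_look_dirs_deg]
    return int(np.argmin(angles))

# Two beamformed down-mix channels looking left and right; a loudspeaker at 110 degrees
print(pick_reference_downmix([(90.0, 0.0), (-90.0, 0.0)], (110.0, 0.0)))  # -> 0 (left beam)
```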
Note that in all examples the transport channel configuration, i.e., the down-mix parameters, can be adjusted in a time-frequency-dependent manner, e.g., based on the spatial parameters, similarly as in the previous embodiments.
Subsequently, further embodiments of the present invention or the embodiments already described before are discussed with respect to the same or additional or further aspects.
Advantageously, the transport representation generator 600 of
The transport data generated by one or several of the blocks 602 are input into the transport metadata generator 605 included in the transport representation generator 600 of
Any one of the blocks 602 generates the advantageously non-encoded transport representation 614 that is then further encoded by a core encoder 603 such as the one illustrated in
It is outlined that an actual implementation of the transport representation generator 600 may comprise only a single one of the blocks 602 in
Further embodiments relate to the transport metadata indicating a shape parameter referring to the shape of, for example, a certain physical or virtual microphone directivity generating the corresponding transport representation signal. The shape parameter may indicate an omnidirectional microphone signal shape, a cardioid microphone signal shape, a dipole microphone signal shape or any other related shape. Further transport metadata alternatives relate to microphone locations, microphone orientations, a distance between microphones or a directional pattern of microphones that have, for example, generated or recorded the transport representation signals included in the (encoded) transport representation 614. Further embodiments relate to the look direction or a plurality of look directions of signals included in the transport representation, to information on beamforming weights or beamformer directions or, alternatively or additionally, to whether the included microphone signals are omnidirectional microphone signals, cardioid microphone signals or other signals. Very small transport metadata side information (with respect to bit rate) can be generated by simply including a single flag indicating whether the transport signals are microphone signals from an omnidirectional microphone or from any microphone different from an omnidirectional microphone.
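For illustration only, such a minimal transport metadata variant could be represented as follows; the enumeration values and names are assumptions of this sketch and not a defined bitstream syntax.

```python
from enum import IntEnum

class MicShape(IntEnum):
    """Illustrative shape parameter for the transport metadata (values assumed)."""
    OMNIDIRECTIONAL = 0
    CARDIOID = 1
    DIPOLE = 2
    OTHER = 3

def encode_minimal_transport_metadata(shape: MicShape) -> int:
    """Minimal variant: a single flag signalling whether the transport signals
    stem from omnidirectional microphones or from any other microphone type."""
    return 0 if shape == MicShape.OMNIDIRECTIONAL else 1

print(encode_minimal_transport_metadata(MicShape.CARDIOID))  # -> 1
```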
In
The reference signal for the (virtual) channels is determined based on the transport downmix data and a fallback procedure is used for the missing component, i.e., for the fourth component with respect to the examples in
In an alternative implementation, the selection of a transport component as an FOA component is performed as indicated in block 913, and the missing component is calculated using a spatial basis function response, as illustrated at item 914 in
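The following sketch illustrates how a missing first-order component could be derived from the reference signal via the spatial basis function response evaluated at the estimated direction of arrival; the real-valued spherical harmonics, their normalisation and the square-root weighting of the diffuseness are assumptions of this sketch.

```python
import numpy as np

def missing_foa_component(p_ref, azimuth, elevation, diffuseness, component="Z"):
    """Sketch of deriving a missing first-order component from the reference
    signal and the spatial parameters via the spatial basis function response.

    p_ref is one time-frequency tile of the reference signal, azimuth/elevation
    the DOA in radians, diffuseness in [0, 1]. Real first-order spherical
    harmonics with SN3D-like normalisation are assumed here.
    """
    responses = {
        "W": 1.0,
        "Y": np.sin(azimuth) * np.cos(elevation),
        "Z": np.sin(elevation),
        "X": np.cos(azimuth) * np.cos(elevation),
    }
    # Only the direct (non-diffuse) part is reconstructed this way; the diffuse
    # part would be added from a decorrelated reference signal.
    return np.sqrt(1.0 - diffuseness) * responses[component] * p_ref
```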
Furthermore, the different look directions can comprise left, right, front, back, up, down, or a specific direction of arrival consisting of an azimuth angle φ and an elevation angle θ, or, alternatively, short metadata consisting of an indication that the pair of signals in the transport representation comprises a left/right pair or a front/back pair.
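Such look-direction signaling could, purely as an illustration, be represented either as an explicit direction of arrival per transport channel or as a compact pair indication; the field and enumeration names below are assumptions of this sketch.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple

class PairIndication(IntEnum):
    """Illustrative codes for the compact pair signalling (values assumed)."""
    LEFT_RIGHT = 0
    FRONT_BACK = 1
    UP_DOWN = 2

@dataclass
class LookDirectionMetadata:
    """Either an explicit DOA per transport channel or a compact pair indication."""
    explicit_doa_deg: Optional[Tuple[float, float]] = None  # (azimuth, elevation)
    pair: Optional[PairIndication] = None

# Compact variant: signal only that the two transport channels form a left/right pair
meta = LookDirectionMetadata(pair=PairIndication.LEFT_RIGHT)
```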
In
For the purpose of performing one or several of the alternatives 931 to 935, several associated items of transport metadata are useful, which are indicated to the right of
Furthermore,
The result of the weighter 824 is the diffuse portion, which is added to the direct portion by the adder 825 in order to obtain a certain mid-order sound field component for a certain mode m and a certain order l. It is advantageous to apply the diffuse compensation gain discussed with respect to
A direct portion only generation is illustrated in
However, in generating the sound field components, particularly for an FOA or HOA representation, either the procedure of
Naturally, the component generation illustrated in
While
Furthermore, the reference signal generated by the reference signal calculator Pref is input into the decorrelation filter 823 to obtain a decorrelated reference signal, and this signal is then weighted, advantageously using a diffuseness parameter and, advantageously, also using a microphone distance obtained from the transport metadata 710. The output of the weighter 824 is the diffuse component Pdiff, and the adder 825 adds the direct component and the diffuse component to obtain a certain loudspeaker signal, object signal or binaural channel for the corresponding representation. In particular, when virtual loudspeaker signals are calculated, the procedure performed by the reference signal calculator 821, 760 in response to the transport metadata can be performed as illustrated in
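A minimal sketch of this direct/diffuse synthesis for one output channel and one time-frequency tile is given below; the square-root weighting of the diffuseness parameter and the handling of the compensation gain are assumptions of this sketch rather than the exact weighting used in the embodiment.

```python
import numpy as np

def synthesize_channel(p_ref, pan_gain, diffuseness, decorrelate, diffuse_gain=1.0):
    """Sketch of the direct/diffuse synthesis for one output channel.

    p_ref       : reference signal tile chosen from the down-mix (per transport metadata)
    pan_gain    : panning gain of the direct sound towards this loudspeaker/channel
    diffuseness : diffuseness parameter in [0, 1]
    decorrelate : callable applying the decorrelation filter; its strength may be
                  reduced when the transport metadata signals a large microphone spacing
    diffuse_gain: optional compensation gain for the diffuse part (assumed here)
    """
    p_direct = np.sqrt(1.0 - diffuseness) * pan_gain * p_ref              # direct portion
    p_diffuse = diffuse_gain * np.sqrt(diffuseness) * decorrelate(p_ref)  # diffuse portion
    return p_direct + p_diffuse                                           # sum of both portions
```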
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier or a non-transitory storage medium.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
19152911.4 | Jan 2019 | EP | regional |
This application is a continuation of copending International Application No. PCT/EP2020/051396, filed Jan. 21, 2020, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No. 19152911.4, filed Jan. 21, 2019, which is incorporated herein by reference in its entirety.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/EP2020/051396 | Jan 2020 | US
Child | 17375465 | — | US