The present invention relates to audio processing and, in particular, to an apparatus and method for sound acquisition via the extraction of geometrical information from direction of arrival estimates.
Traditional spatial sound recording aims at capturing a sound field with multiple microphones such that at the reproduction side, a listener perceives the sound image as it was at the recording location. Standard approaches for spatial sound recording usually use spaced, omnidirectional microphones, for example, in AB stereophony, or coincident directional microphones, for example, in intensity stereophony, or more sophisticated microphones, such as a B-format microphone, e.g. in Ambisonics, see, for example,
For the sound reproduction, these non-parametric approaches derive the desired audio playback signals (e.g., the signals to be sent to the loudspeakers) directly from the recorded microphone signals.
Alternatively, methods based on a parametric representation of sound fields can be applied, which are referred to as parametric spatial audio coders. These methods often employ microphone arrays to determine one or more audio downmix signals together with spatial side information describing the spatial sound. Examples are Directional Audio Coding (DirAC) or the so-called spatial audio microphones (SAM) approach. More details on DirAC can be found in
For more details on the spatial audio microphones approach, reference is made to
In DirAC, for instance, the spatial cue information comprises the direction of arrival (DOA) of sound and the diffuseness of the sound field, computed in a time-frequency domain. For the sound reproduction, the audio playback signals can be derived based on the parametric description. In some applications, spatial sound acquisition aims at capturing an entire sound scene. In other applications, spatial sound acquisition only aims at capturing certain desired components. Close-talking microphones are often used for recording individual sound sources with high signal-to-noise ratio (SNR) and low reverberation, while more distant configurations such as XY stereophony represent a way of capturing the spatial image of an entire sound scene. More flexibility in terms of directivity can be achieved with beamforming, where a microphone array can be used to realize steerable pick-up patterns. Even more flexibility is provided by the above-mentioned methods, such as directional audio coding (DirAC) (see [2], [3]), in which it is possible to realize spatial filters with arbitrary pick-up patterns, as described in
All the above-mentioned concepts have in common that the microphones are arranged in a fixed known geometry. The spacing between microphones is as small as possible for coincident microphonics, whereas it is normally a few centimeters for the other methods. In the following, we refer to any apparatus for the recording of spatial sound capable of retrieving direction of arrival of sound (e.g. a combination of directional microphones or a microphone array, etc.) as a spatial microphone.
Moreover, all the above-mentioned methods have in common that they are limited to a representation of the sound field with respect to only one point, namely the measurement location. Thus, the microphones have to be placed at very specific, carefully selected positions, e.g. close to the sources or such that the spatial image can be captured optimally.
In many applications, however, this is not feasible, and it would therefore be beneficial to place several microphones further away from the sound sources and still be able to capture the sound as desired.
Several field reconstruction methods exist for estimating the sound field at a point in space other than where it was measured. One method is acoustic holography, as described in
Acoustic holography allows the sound field to be computed at any point within an arbitrary volume, given that the sound pressure and particle velocity are known on its entire surface. Therefore, when the volume is large, the number of sensors needed is impractically large. Moreover, the method assumes that no sound sources are present inside the volume, making the algorithm infeasible for our needs. The related wave field extrapolation (see also [8]) aims at extrapolating the known sound field on the surface of a volume to outer regions. The extrapolation accuracy, however, degrades rapidly for larger extrapolation distances as well as for extrapolations towards directions orthogonal to the direction of propagation of the sound, see
A major drawback of traditional approaches is that the spatial image recorded is relative to the spatial microphone used. In many applications, it is not possible or feasible to place a spatial microphone in the desired position, e.g., close to the sound sources. In this case, it would be more beneficial to place multiple spatial microphones further away from the sound scene and still be able to capture the sound as desired.
According to an embodiment, an apparatus for generating an audio output signal to simulate a recording of the audio output signal by a virtual microphone at a configurable virtual position in an environment may have: a sound events position estimator for estimating a sound event position indicating a position of a sound event in the environment, wherein the sound event is active at a certain time instant or in a certain time-frequency bin, wherein the sound event is a real sound source or a mirror image source, wherein the sound events position estimator is configured to estimate the sound event position indicating a position of a mirror image source in the environment when the sound event is a mirror image source, and wherein the sound events position estimator is adapted to estimate the sound event position based on a first direction information provided by a first real spatial microphone being located at a first real microphone position in the environment, and based on a second direction information provided by a second real spatial microphone being located at a second real microphone position in the environment, wherein the first real spatial microphone and the second real spatial microphone are spatial microphones which physically exist; and wherein the first real spatial microphone and the second real spatial microphone are apparatuses for acquisition of spatial sound capable of retrieving direction of arrival of sound, and an information computation module for generating the audio output signal based on a first recorded audio input signal, based on the first real microphone position, based on the virtual position of the virtual microphone, and based on the sound event position, wherein the first real spatial microphone is configured to record the first recorded audio input signal, or wherein a third microphone is configured to record the first recorded audio input signal, wherein the sound events position estimator is adapted to estimate the sound event 
position based on a first direction of arrival of the sound wave emitted by the sound event at the first real microphone position as the first direction information and based on a second direction of arrival of the sound wave at the second real microphone position as the second direction information, and wherein the information computation module includes a propagation compensator, wherein the propagation compensator is adapted to generate a first modified audio signal by modifying the first recorded audio input signal, based on a first amplitude decay between the sound event and the first real spatial microphone and based on a second amplitude decay between the sound event and the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal, to acquire the audio output signal; or wherein the propagation compensator is adapted to generate a first modified audio signal by compensating a first time delay between an arrival of a sound wave emitted by the sound event at the first real spatial microphone and an arrival of the sound wave at the virtual microphone by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal, to acquire the audio output signal.
According to another embodiment, a method for generating an audio output signal to simulate a recording of the audio output signal by a virtual microphone at a configurable virtual position in an environment may have the steps of: estimating a sound event position indicating a position of a sound event in the environment, wherein the sound event is active at a certain time instant or in a certain time-frequency bin, wherein the sound event is a real sound source or a mirror image source, wherein estimating the sound event position includes estimating the sound event position indicating a position of a mirror image source in the environment when the sound event is a mirror image source, and wherein estimating the sound event position is based on a first direction information provided by a first real spatial microphone being located at a first real microphone position in the environment, and based on a second direction information provided by a second real spatial microphone being located at a second real microphone position in the environment, wherein the first real spatial microphone and the second real spatial microphone are spatial microphones which physically exist; and wherein the first real spatial microphone and the second real spatial microphone are apparatuses for acquisition of spatial sound capable of retrieving direction of arrival of sound, and generating the audio output signal based on a first recorded audio input signal, based on the first real microphone position, based on the virtual position of the virtual microphone, and based on the sound event position, wherein the first real spatial microphone is configured to record the first recorded audio input signal, or wherein a third microphone is configured to record the first recorded audio input signal, wherein estimating the sound event position is conducted based on a first direction of arrival of the sound wave emitted by the sound event at the first real microphone position as the first direction 
information and based on a second direction of arrival of the sound wave at the second real microphone position as the second direction information, wherein generating the audio output signal includes generating a first modified audio signal by modifying the first recorded audio input signal, based on a first amplitude decay between the sound event and the first real spatial microphone and based on a second amplitude decay between the sound event and the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal, to acquire the audio output signal; or wherein generating the audio output signal includes generating a first modified audio signal by compensating a first time delay between an arrival of a sound wave emitted by the sound event at the first real spatial microphone and an arrival of the sound wave at the virtual microphone by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal, to acquire the audio output signal.
Another embodiment may have a computer program for implementing the method for generating an audio output signal to simulate a recording of the audio output signal by a virtual microphone at a configurable virtual position in an environment, which method may have the steps of: estimating a sound event position indicating a position of a sound event in the environment, wherein the sound event is active at a certain time instant or in a certain time-frequency bin, wherein the sound event is a real sound source or a mirror image source, wherein estimating the sound event position includes estimating the sound event position indicating a position of a mirror image source in the environment when the sound event is a mirror image source, and wherein estimating the sound event position is based on a first direction information provided by a first real spatial microphone being located at a first real microphone position in the environment, and based on a second direction information provided by a second real spatial microphone being located at a second real microphone position in the environment, wherein the first real spatial microphone and the second real spatial microphone are spatial microphones which physically exist; and wherein the first real spatial microphone and the second real spatial microphone are apparatuses for acquisition of spatial sound capable of retrieving direction of arrival of sound, and generating the audio output signal based on a first recorded audio input signal, based on the first real microphone position, based on the virtual position of the virtual microphone, and based on the sound event position, wherein the first real spatial microphone is configured to record the first recorded audio input signal, or wherein a third microphone is configured to record the first recorded audio input signal, wherein estimating the sound event position is conducted based on a first direction of arrival of the sound wave emitted by the sound event at the first 
real microphone position as the first direction information and based on a second direction of arrival of the sound wave at the second real microphone position as the second direction information, wherein generating the audio output signal includes generating a first modified audio signal by modifying the first recorded audio input signal, based on a first amplitude decay between the sound event and the first real spatial microphone and based on a second amplitude decay between the sound event and the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal, to acquire the audio output signal; or wherein generating the audio output signal comprises generating a first modified audio signal by compensating a first time delay between an arrival of a sound wave emitted by the sound event at the first real spatial microphone and an arrival of the sound wave at the virtual microphone by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal, to acquire the audio output signal, when being executed on a computer or a signal processor.
According to an embodiment, an apparatus for generating an audio output signal to simulate a recording of a virtual microphone at a configurable virtual position in an environment is provided. The apparatus comprises a sound events position estimator and an information computation module. The sound events position estimator is adapted to estimate a sound source position indicating a position of a sound source in the environment, wherein the sound events position estimator is adapted to estimate the sound source position based on a first direction information provided by a first real spatial microphone being located at a first real microphone position in the environment, and based on a second direction information provided by a second real spatial microphone being located at a second real microphone position in the environment.
The information computation module is adapted to generate the audio output signal based on a first recorded audio input signal being recorded by the first real spatial microphone, based on the first real microphone position, based on the virtual position of the virtual microphone, and based on the sound source position.
In an embodiment, the information computation module comprises a propagation compensator, wherein the propagation compensator is adapted to generate a first modified audio signal by modifying the first recorded audio input signal, based on a first amplitude decay between the sound source and the first real spatial microphone and based on a second amplitude decay between the sound source and the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal, to obtain the audio output signal. In an embodiment, the first amplitude decay may be an amplitude decay of a sound wave emitted by a sound source and the second amplitude decay may be an amplitude decay of the sound wave emitted by the sound source.
According to another embodiment, the information computation module comprises a propagation compensator being adapted to generate a first modified audio signal by modifying the first recorded audio input signal by compensating a first delay between an arrival of a sound wave emitted by the sound source at the first real spatial microphone and an arrival of the sound wave at the virtual microphone by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal, to obtain the audio output signal.
According to an embodiment, two or more spatial microphones are assumed to be used; these are referred to as real spatial microphones in the following. For each real spatial microphone, the DOA of the sound can be estimated in the time-frequency domain. From the information gathered by the real spatial microphones, together with the knowledge of their relative position, it is possible to construct the output signal of an arbitrary spatial microphone virtually placed at will in the environment. This spatial microphone is referred to as the virtual spatial microphone in the following.
Note that the direction of arrival (DOA) may be expressed as an azimuthal angle in 2D space, or by an azimuth and elevation angle pair in 3D. Equivalently, a unit-norm vector pointing in the direction of the DOA may be used.
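The equivalence between the angular and the vector representation of a DOA may be illustrated by the following minimal sketch. The function names and the angle convention (azimuth measured in the x-y plane from the x-axis, elevation measured upward from that plane) are assumptions made for illustration only and are not prescribed by the embodiments.

```python
import math


def doa_to_unit_vector(azimuth, elevation=0.0):
    """Convert a DOA given in radians to a unit-norm vector (x, y, z).

    Assumed convention: azimuth in the x-y plane from the x-axis,
    elevation upward from the x-y plane. For the 2D case, leave the
    elevation at 0 and ignore the z component.
    """
    cos_el = math.cos(elevation)
    return (cos_el * math.cos(azimuth),
            cos_el * math.sin(azimuth),
            math.sin(elevation))


def unit_vector_to_doa(vec):
    """Convert a (not necessarily normalized) vector back to (azimuth, elevation)."""
    x, y, z = vec
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("a zero vector has no direction")
    azimuth = math.atan2(y, x)
    # Clamp to guard against rounding slightly outside [-1, 1].
    elevation = math.asin(max(-1.0, min(1.0, z / norm)))
    return azimuth, elevation
```

Under the assumed convention, an azimuth of 90 degrees with zero elevation maps to a vector pointing along the positive y-axis.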
In embodiments, means are provided to capture sound in a spatially selective way, e.g., sound originating from a specific target location can be picked up, just as if a close-up “spot microphone” had been installed at this location. Instead of really installing this spot microphone, however, its output signal can be simulated by using two or more spatial microphones placed in other, distant positions.
The term “spatial microphone” refers to any apparatus for the acquisition of spatial sound capable of retrieving direction of arrival of sound (e.g. combination of directional microphones, microphone arrays, etc.).
The term “non-spatial microphone” refers to any apparatus that is not adapted for retrieving direction of arrival of sound, such as a single omnidirectional or directive microphone.
It should be noted that the term “real spatial microphone” refers to a spatial microphone as defined above which physically exists.
Regarding the virtual spatial microphone, it should be noted that it can represent any desired microphone type or microphone combination, for example a single omnidirectional microphone, a directional microphone, a pair of directional microphones as used in common stereo microphones, or a microphone array.
The present invention is based on the finding that when two or more real spatial microphones are used, it is possible to estimate the position of sound events in 2D or 3D space; thus, position localization can be achieved. Using the determined positions of the sound events, the sound signal that would have been recorded by a virtual spatial microphone placed and oriented arbitrarily in space can be computed, as well as the corresponding spatial side information, such as the direction of arrival from the point of view of the virtual spatial microphone.
For this purpose, each sound event may be assumed to represent a point-like sound source, e.g. an isotropic point-like sound source. In the following, “real sound source” refers to an actual sound source physically existing in the recording environment, such as talkers or musical instruments. In contrast, “sound source” or “sound event” refers in the following to an effective sound source which is active at a certain time instant or in a certain time-frequency bin, wherein the sound sources may, for example, represent real sound sources or mirror image sources. According to an embodiment, it is implicitly assumed that the sound scene can be modeled as a multitude of such sound events or point-like sound sources. Furthermore, each source may be assumed to be active only within a specific time and frequency slot in a predefined time-frequency representation. The distance between the real spatial microphones may be such that the resulting difference in propagation times is shorter than the temporal resolution of the time-frequency representation. The latter assumption guarantees that a certain sound event is picked up by all spatial microphones within the same time slot. This implies that the DOAs estimated at different spatial microphones for the same time-frequency slot indeed correspond to the same sound event. This assumption is not difficult to meet with real spatial microphones placed a few meters from each other, even in large rooms (such as living rooms or conference rooms), with a temporal resolution of even a few ms.
Microphone arrays may be employed to localize sound sources. The localized sound sources may have different physical interpretations depending on their nature. When the microphone arrays receive direct sound, they may be able to localize the position of a true sound source (e.g. talkers). When the microphone arrays receive reflections, they may localize the position of a mirror image source. Mirror image sources are also sound sources.
A parametric method capable of estimating the sound signal of a virtual microphone placed at an arbitrary location is provided. In contrast to the methods previously described, the proposed method does not aim directly at reconstructing the sound field, but rather aims at providing sound that is perceptually similar to the one which would be picked up by a microphone physically placed at this location. This may be achieved by employing a parametric model of the sound field based on point-like sound sources, e.g. isotropic point-like sound sources (IPLS). The geometrical information that may be used, namely the instantaneous position of all IPLS, may be obtained by conducting triangulation of the directions of arrival estimated with two or more distributed microphone arrays. This may be achieved by obtaining knowledge of the relative position and orientation of the arrays. Nevertheless, no a priori knowledge of the number and position of the actual sound sources (e.g. talkers) is necessary. Given the parametric nature of the proposed concepts, e.g. the proposed apparatus or method, the virtual microphone can possess an arbitrary directivity pattern as well as arbitrary physical or non-physical behaviors, e.g. with respect to the pressure decay with distance. The presented approach has been verified by studying the parameter estimation accuracy based on measurements in a reverberant environment.
While conventional recording techniques for spatial audio are limited in so far as the spatial image obtained is relative to the position in which the microphones have been physically placed, embodiments of the present invention take into account that in many applications, it is desired to place the microphones outside the sound scene and yet be able to capture the sound from an arbitrary perspective. According to embodiments, concepts are provided which virtually place a virtual microphone at an arbitrary point in space, by computing a signal perceptually similar to the one which would have been picked up, if the microphone had been physically placed in the sound scene. Embodiments may apply concepts, which may employ a parametric model of the sound field based on point-like sound sources, e.g. point-like isotropic sound sources. The geometrical information that may be used may be gathered by two or more distributed microphone arrays.
According to an embodiment, the sound events position estimator may be adapted to estimate the sound source position based on a first direction of arrival of the sound wave emitted by the sound source at the first real microphone position as the first direction information and based on a second direction of arrival of the sound wave at the second real microphone position as the second direction information.
In another embodiment, the information computation module may comprise a spatial side information computation module for computing spatial side information. The information computation module may be adapted to estimate the direction of arrival or an active sound intensity at the virtual microphone as spatial side information, based on a position vector of the virtual microphone and based on a position vector of the sound event.
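How a direction of arrival at the virtual microphone may follow from the two position vectors can be sketched as below for the 2D case. This is a minimal illustration under stated assumptions: the function name, the `look_azimuth` parameter describing the virtual orientation, and the choice to express the result relative to that orientation are hypothetical and not taken from the embodiments.

```python
import math


def doa_at_virtual_mic(p_event, p_vm, look_azimuth=0.0):
    """Angle (radians) under which a sound event is seen from the virtual microphone.

    p_event, p_vm: (x, y) position vectors in a common coordinate system.
    look_azimuth:  assumed orientation of the virtual microphone (radians).
    The result is relative to the look direction, wrapped to [-pi, pi].
    """
    dx = p_event[0] - p_vm[0]
    dy = p_event[1] - p_vm[1]
    if dx == 0.0 and dy == 0.0:
        raise ValueError("sound event coincides with the virtual microphone")
    angle = math.atan2(dy, dx) - look_azimuth
    # Wrap the difference back into [-pi, pi].
    return math.atan2(math.sin(angle), math.cos(angle))
```

For example, an event at (1, 1) seen from a virtual microphone at the origin looking along the x-axis yields an angle of 45 degrees; rotating the look direction to 90 degrees yields minus 45 degrees.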
According to a further embodiment, the propagation compensator may be adapted to generate the first modified audio signal in a time-frequency domain, by compensating the first delay or amplitude decay between the arrival of the sound wave emitted by the sound source at the first real spatial microphone and the arrival of the sound wave at the virtual microphone by adjusting said magnitude value of the first recorded audio input signal being represented in a time-frequency domain.
In an embodiment, the propagation compensator may be adapted to conduct propagation compensation by generating a modified magnitude value of the first modified audio signal by applying the formula:
wherein d1(k, n) is the distance between the position of the first real spatial microphone and the position of the sound event, wherein s(k, n) is the distance between the virtual position of the virtual microphone and the sound source position of the sound event, wherein Pref(k, n) is a magnitude value of the first recorded audio input signal being represented in a time-frequency domain, and wherein Pv(k, n) is the modified magnitude value.
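The formula itself is not reproduced in the present text. One reading that is consistent with the symbol definitions above, assuming that the magnitude of the sound wave decays in proportion to 1/r with distance r from the sound event, is Pv(k, n) = (d1(k, n) / s(k, n)) · Pref(k, n). The following minimal sketch implements this assumed relation for a single time-frequency bin; the function name and the `min_dist` safeguard are hypothetical.

```python
import math


def compensate_magnitude(p_ref_mag, mic_pos, vm_pos, event_pos, min_dist=1e-3):
    """Scale the reference magnitude to the virtual microphone position.

    Assumes a 1/r magnitude decay, i.e. Pv = (d1 / s) * Pref, where
    d1 = distance(real microphone, sound event) and
    s  = distance(virtual microphone, sound event).
    min_dist is an assumed lower bound that avoids division by (near) zero
    when the virtual microphone is placed on top of the sound event.
    """
    d1 = max(math.dist(mic_pos, event_pos), min_dist)
    s = max(math.dist(vm_pos, event_pos), min_dist)
    return p_ref_mag * d1 / s
```

Under this assumption, a virtual microphone placed at half the distance of the real spatial microphone from the sound event receives twice the magnitude, and one placed at twice the distance receives half of it.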
In a further embodiment, the information computation module may moreover comprise a combiner, wherein the propagation compensator may be furthermore adapted to modify a second recorded audio input signal, being recorded by the second real spatial microphone, by compensating a second delay or amplitude decay between an arrival of the sound wave emitted by the sound source at the second real spatial microphone and an arrival of the sound wave at the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the second recorded audio input signal to obtain a second modified audio signal, and wherein the combiner may be adapted to generate a combination signal by combining the first modified audio signal and the second modified audio signal, to obtain the audio output signal.
According to another embodiment, the propagation compensator may furthermore be adapted to modify one or more further recorded audio input signals, being recorded by the one or more further real spatial microphones, by compensating delays between an arrival of the sound wave at the virtual microphone and an arrival of the sound wave emitted by the sound source at each one of the further real spatial microphones. Each of the delays or amplitude decays may be compensated by adjusting an amplitude value, a magnitude value or a phase value of each one of the further recorded audio input signals to obtain a plurality of third modified audio signals. The combiner may be adapted to generate a combination signal by combining the first modified audio signal and the second modified audio signal and the plurality of third modified audio signals, to obtain the audio output signal.
In a further embodiment, the information computation module may comprise a spectral weighting unit for generating a weighted audio signal by modifying the first modified audio signal depending on a direction of arrival of the sound wave at the virtual position of the virtual microphone and depending on a virtual orientation of the virtual microphone to obtain the audio output signal, wherein the first modified audio signal may be modified in a time-frequency domain.
Moreover, the information computation module may comprise a spectral weighting unit for generating a weighted audio signal by modifying the combination signal depending on a direction of arrival of the sound wave at the virtual position of the virtual microphone and a virtual orientation of the virtual microphone to obtain the audio output signal, wherein the combination signal may be modified in a time-frequency domain.
According to another embodiment, the spectral weighting unit may be adapted to apply the weighting factor
α+(1−α)cos(φv(k,n)), or the weighting factor
0.5+0.5 cos(φv(k,n))
on the weighted audio signal,
wherein φv(k, n) indicates a direction of arrival vector of the sound wave emitted by the sound source at the virtual position of the virtual microphone.
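The two weighting factors above may be illustrated by the following minimal sketch, in which φv(k, n) is treated as an angle in radians relative to the virtual orientation of the virtual microphone. The function names are hypothetical; with α = 0.5 the general factor reduces to 0.5 + 0.5 cos(φv(k, n)), which corresponds to a cardioid pick-up pattern.

```python
import math


def first_order_weight(phi_v, alpha=0.5):
    """Weighting factor alpha + (1 - alpha) * cos(phi_v).

    alpha = 1.0 gives an omnidirectional pattern, alpha = 0.5 the
    factor 0.5 + 0.5 * cos(phi_v), and alpha = 0.0 a figure-of-eight.
    """
    return alpha + (1.0 - alpha) * math.cos(phi_v)


def apply_weight(bin_value, phi_v, alpha=0.5):
    """Weight one (possibly complex) time-frequency bin by the pattern gain."""
    return first_order_weight(phi_v, alpha) * bin_value
```

A sound event straight ahead of the virtual microphone (φv = 0) is passed unchanged, whereas with α = 0.5 an event directly behind it (φv = π) is suppressed.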
In an embodiment, the propagation compensator is furthermore adapted to generate a third modified audio signal by modifying a third recorded audio input signal recorded by an omnidirectional microphone by compensating a third delay or amplitude decay between an arrival of the sound wave emitted by the sound source at the omnidirectional microphone and an arrival of the sound wave at the virtual microphone by adjusting an amplitude value, a magnitude value or a phase value of the third recorded audio input signal, to obtain the audio output signal.
In a further embodiment, the sound events position estimator may be adapted to estimate a sound source position in a three-dimensional environment.
Moreover, according to another embodiment, the information computation module may further comprise a diffuseness computation unit being adapted to estimate a diffuse sound energy at the virtual microphone or a direct sound energy at the virtual microphone.
The diffuseness computation unit may, according to a further embodiment, be adapted to estimate the diffuse sound energy Ediff(VM) at the virtual microphone by applying the formula:
wherein N is the number of a plurality of real spatial microphones comprising the first and the second real spatial microphone, and wherein Ediff(SMi) is the diffuse sound energy at the i-th real spatial microphone.
In a further embodiment, the diffuseness computation unit may be adapted to estimate the direct sound energy by applying the formula:
wherein “distance SMi−IPLS” is the distance between a position of the i-th real microphone and the sound source position, wherein “distance VM−IPLS” is the distance between the virtual position and the sound source position, and wherein Edir(SMi) is the direct energy at the i-th real spatial microphone.
Moreover, according to another embodiment, the diffuseness computation unit may furthermore be adapted to estimate the diffuseness at the virtual microphone by estimating the diffuse sound energy at the virtual microphone and the direct sound energy at the virtual microphone and by applying the formula:
wherein ψ(VM) indicates the diffuseness at the virtual microphone being estimated, wherein Ediff(VM) indicates the diffuse sound energy being estimated and wherein Edir(VM) indicates the direct sound energy being estimated.
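The three formulas referred to above are not reproduced in the present text. The following minimal sketch shows one reading that is consistent with the symbol definitions, under the assumptions that the diffuse sound energy at the virtual microphone is the average over the N real spatial microphones, that the direct sound energy scales with the squared ratio of the two distances (inverse-square law), and that the diffuseness is the share of diffuse energy in the total energy. All function names are hypothetical.

```python
def diffuse_energy_vm(e_diff_sm):
    """Assumed: Ediff(VM) = (1/N) * sum over i of Ediff(SMi)."""
    if not e_diff_sm:
        raise ValueError("at least one real spatial microphone is needed")
    return sum(e_diff_sm) / len(e_diff_sm)


def direct_energy_vm(e_dir_sm_i, dist_sm_ipls, dist_vm_ipls):
    """Assumed: Edir(VM) = (distance SMi-IPLS / distance VM-IPLS)^2 * Edir(SMi)."""
    return (dist_sm_ipls / dist_vm_ipls) ** 2 * e_dir_sm_i


def diffuseness_vm(e_diff_vm, e_dir_vm):
    """Assumed: psi(VM) = Ediff(VM) / (Ediff(VM) + Edir(VM))."""
    total = e_diff_vm + e_dir_vm
    if total == 0.0:
        raise ValueError("diffuseness is undefined for zero total energy")
    return e_diff_vm / total
```

With these assumptions, the diffuseness lies between 0 (purely direct sound) and 1 (purely diffuse sound), and it decreases as the virtual microphone is moved closer to the sound event.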
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
In embodiments, the localization of sound events in space, as well as the description of the position of the virtual microphone, may be conducted based on the positions and orientations of the real and virtual spatial microphones in a common coordinate system. This information may be represented by the inputs 121 . . . 12N and input 104 in
The output of the apparatus or a corresponding method may be, when desired, one or more sound signals 105, which may have been picked up by a spatial microphone defined and placed as specified by 104. Moreover, the apparatus (or rather the method) may provide as output corresponding spatial side information 106 which may be estimated by employing the virtual spatial microphone.
In the following, position estimation of a sound events position estimator according to an embodiment is described in more detail.
Depending on the dimensionality of the problem (2D or 3D) and the number of spatial microphones, several solutions for the position estimation are possible.
If two spatial microphones exist in 2D (the simplest possible case), a simple triangulation is possible.
In
The triangulation fails when the two lines 430, 440 are exactly parallel. In real applications, however, this is very unlikely. Nevertheless, not all triangulation results correspond to a physical or feasible position for the sound event in the considered space. For example, the estimated position of the sound event might be too far away or even outside the assumed space, indicating that the DOAs probably do not correspond to any sound event which can be physically interpreted with the used model. Such results may be caused by sensor noise or excessive room reverberation. Therefore, according to an embodiment, such undesired results are flagged such that the information computation module 202 can treat them properly.
Similarly to the 2D case, the triangulation may fail or may yield unfeasible results for certain combinations of directions, which may then also be flagged, e.g. to the information computation module 202 of
If more than two spatial microphones exist, several solutions are possible. For example, the triangulation explained above could be carried out for all pairs of the real spatial microphones (if N=3: 1 with 2, 1 with 3, and 2 with 3). The resulting positions may then be averaged (along x and y, and, if 3D is considered, z).
Alternatively, more complex concepts may be used. For example, probabilistic approaches may be applied as described in
According to an embodiment, the sound field may be analyzed in the time-frequency domain, for example, obtained via a short-time Fourier transform (STFT), in which k and n denote the frequency index k and time index n, respectively. The complex pressure Pv(k, n) at an arbitrary position pv for a certain k and n is modeled as a single spherical wave emitted by a narrow-band isotropic point-like source, e.g. by employing the formula:
Pv(k,n)=PIPLS(k,n)·γ(k,pIPLS(k,n),pv), (1)
where PIPLS(k, n) is the signal emitted by the IPLS at its position pIPLS(k, n). The complex factor γ(k, pIPLS, pv) expresses the propagation from pIPLS(k, n) to pv, e.g., it introduces appropriate phase and magnitude modifications. Here, the assumption may be applied that in each time-frequency bin only one IPLS is active. Nevertheless, multiple narrow-band IPLSs located at different positions may also be active at a single time instance.
Each IPLS models either direct sound or a distinct room reflection. Its position pIPLS(k, n) may ideally correspond to an actual sound source located inside the room, or a mirror image sound source located outside, respectively. Therefore, the position pIPLS(k, n) may also indicate the position of a sound event.
Please note that the term “real sound sources” denotes the actual sound sources physically existing in the recording environment, such as talkers or musical instruments. On the contrary, with “sound sources” or “sound events” or “IPLS” we refer to effective sound sources, which are active at certain time instants or at certain time-frequency bins, wherein the sound sources may, for example, represent real sound sources or mirror image sources.
Both the actual sound source 153 of
This single-wave model is accurate only for mildly reverberant environments, given that the source signals fulfill the W-disjoint orthogonality (WDO) condition, i.e. the time-frequency overlap is sufficiently small. This is normally true for speech signals, see, for example,
However, the model also provides a good estimate for other environments and is therefore also applicable for those environments.
In the following, the estimation of the positions pIPLS(k, n) according to an embodiment is explained. The position pIPLS(k, n) of an active IPLS in a certain time-frequency bin, and thus the position of a sound event in that time-frequency bin, is estimated via triangulation on the basis of the direction of arrival (DOA) of sound measured at two or more different observation points.
Here, φ1(k, n) represents the azimuth of the DOA estimated at the first microphone array, as depicted in
e1(k,n)=R1·e1POV(k,n),
e2(k,n)=R2·e2POV(k,n), (3)
where R are coordinate transformation matrices, e.g.,
when operating in 2D and c1=[c1,x, c1,y]T. For carrying out the triangulation, the direction vectors d1(k, n) and d2(k, n) may be calculated as:
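$\mathbf{d}_1(k,n) = d_1(k,n)\,\mathbf{e}_1(k,n),$
$\mathbf{d}_2(k,n) = d_2(k,n)\,\mathbf{e}_2(k,n),$ (5)
where $d_1(k,n) = \|\mathbf{d}_1(k,n)\|$ and $d_2(k,n) = \|\mathbf{d}_2(k,n)\|$ are the unknown distances between the IPLS and the two microphone arrays. The following equation
$\mathbf{p}_1 + \mathbf{d}_1(k,n) = \mathbf{p}_2 + \mathbf{d}_2(k,n)$ (6)
may be solved for $d_1(k,n)$. Finally, the position pIPLS(k, n) of the IPLS is given by
$\mathbf{p}_{IPLS}(k,n) = d_1(k,n)\,\mathbf{e}_1(k,n) + \mathbf{p}_1.$ (7)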
In another embodiment, equation (6) may be solved for d2(k, n) and pIPLS(k, n) is analogously computed employing d2(k, n).
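By way of illustration only, the two-array triangulation of equations (5) to (7) may be sketched in Python as follows. The function names, the use of NumPy, the form of the rotation matrix built from the orientation vector c, and the convention that a local DOA unit vector is [cos φ, sin φ]T are assumptions made for this sketch and are not taken from the embodiments.

```python
import numpy as np

def rotation_from_orientation(c):
    """2D rotation mapping the array's local x-axis onto the unit vector c.

    Assumed form of the coordinate transformation matrix R.
    """
    cx, cy = c
    return np.array([[cx, -cy], [cy, cx]])

def triangulate_2d(p1, e1, p2, e2, eps=1e-9):
    """Solve p1 + d1*e1 = p2 + d2*e2 (equation (6)) for the distances d1, d2.

    Returns (position, ok). ok is False when the two lines are (nearly)
    parallel, or when a distance is not positive, i.e. the intersection
    lies behind one of the arrays and is flagged as infeasible.
    """
    A = np.column_stack((e1, -e2))          # unknown vector is [d1, d2]
    if abs(np.linalg.det(A)) < eps:
        return None, False                  # parallel lines: no intersection
    d1, d2 = np.linalg.solve(A, p2 - p1)
    position = p1 + d1 * e1                 # equation (7)
    return position, bool(d1 > 0 and d2 > 0)

# Hypothetical setup: two arrays on the x-axis, DOAs given in local coordinates.
p1, c1, phi1 = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.deg2rad(45.0)
p2, c2, phi2 = np.array([2.0, 0.0]), np.array([-1.0, 0.0]), np.deg2rad(-45.0)
e1 = rotation_from_orientation(c1) @ np.array([np.cos(phi1), np.sin(phi1)])
e2 = rotation_from_orientation(c2) @ np.array([np.cos(phi2), np.sin(phi2)])
position, ok = triangulate_2d(p1, e1, p2, e2)
print(position, ok)
```

The returned flag corresponds to the flagging of unfeasible results mentioned above; a range check against the assumed room dimensions could be added in the same place.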
Equation (6) provides a solution when operating in 2D, unless e1(k, n) and e2(k, n) are parallel. However, when using more than two microphone arrays or when operating in 3D, a solution cannot be obtained when the direction vectors d do not intersect. According to an embodiment, in this case, the point which is closest to all direction vectors d is computed and the result can be used as the position of the IPLS.
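One possible reading of "the point which is closest to all direction vectors" is the point minimizing the sum of squared perpendicular distances to the lines through the observation points. The following Python sketch implements that least-squares interpretation; the criterion, the function name and the use of NumPy are assumptions of this example, and other closeness measures are equally conceivable.

```python
import numpy as np

def closest_point_to_lines(points, directions):
    """Least-squares point closest to the lines p_i + t * e_i (2D or 3D).

    Minimizes the sum of squared perpendicular distances to all lines.
    """
    points = np.asarray(points, dtype=float)
    directions = np.asarray(directions, dtype=float)
    dim = points.shape[1]
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    for p, e in zip(points, directions):
        e = e / np.linalg.norm(e)
        M = np.eye(dim) - np.outer(e, e)   # projector onto the complement of e
        A += M
        b += M @ p
    # lstsq also returns a (minimum-norm) answer if all lines are parallel
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Two skew lines in 3D: the x-axis, and a line parallel to the y-axis at z = 1.
estimate = closest_point_to_lines([[0, 0, 0], [0, 0, 1]], [[1, 0, 0], [0, 1, 0]])
print(estimate)
```

For two skew lines the result is the midpoint of their common perpendicular; for lines that do intersect it coincides with the intersection point.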
In an embodiment, all observation points p1, p2, . . . should be located such that the sound emitted by the IPLS falls into the same temporal block n. This requirement may simply be fulfilled when the distance Δ between any two of the observation points is smaller than
where nFFT is the STFT window length, 0≦R<1 specifies the overlap between successive time frames and fs is the sampling frequency. For example, for a 1024-point STFT at 48 kHz with 50% overlap (R=0.5), the maximum spacing between the arrays to fulfill the above requirement is Δ=3.65 m.
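The numerical example can be checked with a short calculation. The sketch below assumes that the bound equals the distance sound travels during one frame hop, i.e. c·nFFT·(1−R)/fs, with a speed of sound c of about 343 m/s. Both the closed form and the value of c are assumptions of this example, chosen because they are consistent with the quoted spacing.

```python
def max_array_spacing(n_fft, overlap, fs, c=343.0):
    """Distance travelled by sound during one STFT hop (assumed bound).

    n_fft: window length in samples, overlap: R with 0 <= R < 1,
    fs: sampling frequency in Hz, c: assumed speed of sound in m/s.
    """
    hop_duration = n_fft * (1.0 - overlap) / fs   # seconds between frames
    return c * hop_duration

spacing = max_array_spacing(n_fft=1024, overlap=0.5, fs=48000.0)
print(f"{spacing:.2f} m")
```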
In the following, an information computation module 202, e.g. a virtual microphone signal and side information computation module, according to an embodiment is described in more detail.
To compute the audio signal of the virtual microphone, the geometrical information, e.g. the position and orientation of the real spatial microphones 121 . . . 12N, the position, orientation and characteristics of the virtual spatial microphone 104, and the position estimates of the sound events 205 are fed into the information computation module 202, in particular, into the propagation parameters computation module 501 of the propagation compensator 500, into the combination factors computation module 502 of the combiner 510 and into the spectral weights computation unit 503 of the spectral weighting unit 520. The propagation parameters computation module 501, the combination factors computation module 502 and the spectral weights computation unit 503 compute the parameters used in the modification of the audio signals 111 . . . 11N in the propagation compensation module 504, the combination module 505 and the spectral weighting application module 506.
In the information computation module 202, the audio signals 111 . . . 11N may at first be modified to compensate for the effects given by the different propagation lengths between the sound event positions and the real spatial microphones. The signals may then be combined to improve for instance the signal-to-noise ratio (SNR). Finally, the resulting signal may then be spectrally weighted to take the directional pick up pattern of the virtual microphone into account, as well as any distance dependent gain function. These three steps are discussed in more detail below.
Propagation compensation is now explained in more detail. In the upper portion of
The lower portion of
The signals at the two real arrays are comparable only if the relative delay Dt12 between them is small. Otherwise, one of the two signals needs to be temporally realigned to compensate for the relative delay Dt12, and possibly, to be scaled to compensate for the different decays.
Compensating the delay between the arrival at the virtual microphone and the arrival at the real microphone arrays (at one of the real spatial microphones) changes the delay independent from the localization of the sound event, making it superfluous for most applications.
Returning to
The propagation compensation module 504 is configured to use this information to modify the audio signals accordingly. If the signals are to be shifted by a small amount of time (compared to the time window of the filter bank), then a simple phase rotation suffices. If the delays are larger, more complicated implementations may be used.
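For small delays, the phase rotation may, for example, take the form of a linear phase term applied to each frequency bin of a frame. The Python sketch below assumes a one-sided spectrum as produced by `np.fft.rfft` and a delay that is short compared to the analysis window; the function name and parameters are hypothetical.

```python
import numpy as np

def phase_rotate(spectrum, delay_seconds, fs, n_fft):
    """Delay one STFT frame by a per-bin linear phase term.

    spectrum: one-sided spectrum of a frame (bins 0 .. n_fft/2).
    The operation is a circular shift within the frame, so it is only
    adequate when the delay is small compared to the window length.
    """
    spectrum = np.asarray(spectrum)
    k = np.arange(spectrum.shape[-1])
    bin_frequencies = k * fs / n_fft              # centre frequency of bin k in Hz
    return spectrum * np.exp(-2j * np.pi * bin_frequencies * delay_seconds)

# Delay a 64-sample frame by 3 samples at 48 kHz.
fs, n_fft = 48000.0, 64
frame = np.random.default_rng(0).standard_normal(n_fft)
delayed = np.fft.irfft(phase_rotate(np.fft.rfft(frame), 3 / fs, fs, n_fft), n=n_fft)
```

Because the shift is circular, larger delays would wrap signal content around the frame boundary, which is one reason why more complicated implementations may then be needed.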
The outputs of the propagation compensation module 504 are the modified audio signals expressed in the original time-frequency domain.
In the following, a particular estimation of propagation compensation for a virtual microphone according to an embodiment will be described with reference to
In the embodiment that is now explained, it is assumed that at least a first recorded audio input signal, e.g. a pressure signal of at least one of the real spatial microphones (e.g. the microphone arrays) is available, for example, the pressure signal of a first real spatial microphone. We will refer to the considered microphone as reference microphone, to its position as reference position pref and to its pressure signal as reference pressure signal Pref(k, n). However, propagation compensation may be conducted not only with respect to one pressure signal, but also with respect to the pressure signals of a plurality or of all of the real spatial microphones.
The relationship between the pressure signal PIPLS(k, n) emitted by the IPLS and a reference pressure signal Pref(k, n) of a reference microphone located in pref can be expressed by formula (9):
Pref(k,n)=PIPLS(k,n)·γ(k,pIPLS,pref), (9)
In general, the complex factor γ(k, pa, pb) expresses the phase rotation and amplitude decay introduced by the propagation of a spherical wave from its origin in pa to pb. However, practical tests indicated that considering only the amplitude decay in γ leads to plausible impressions of the virtual microphone signal with significantly fewer artifacts compared to also considering the phase rotation.
The sound energy which can be measured in a certain point in space depends strongly on the distance r from the sound source, in
Assuming that the first real spatial microphone is the reference microphone, then pref=p1. In
s(k,n)=∥s(k,n)∥=∥p1+d1(k,n)−pv∥. (10)
The sound pressure Pv(k, n) at the position of the virtual microphone is computed by combining formulas (1) and (9), leading to
Pv(k,n)=Pref(k,n)·γ(k,pIPLS,pv)/γ(k,pIPLS,pref). (11)
As mentioned above, in some embodiments, the factors γ may only consider the amplitude decay due to the propagation. Assuming for instance that the sound pressure decreases with 1/r, then
Pv(k,n)=Pref(k,n)·d1(k,n)/s(k,n). (12)
When the model in formula (1) holds, e.g., when only direct sound is present, then formula (12) can accurately reconstruct the magnitude information. However, in case of pure diffuse sound fields, e.g., when the model assumptions are not met, the presented method yields an implicit dereverberation of the signal when moving the virtual microphone away from the positions of the sensor arrays. In fact, as discussed above, in diffuse sound fields, we expect that most IPLS are localized near the two sensor arrays. Thus, when moving the virtual microphone away from these positions, we likely increase the distance s=∥s∥ in
By conducting propagation compensation on the recorded audio input signal (e.g. the pressure signal) of the first real spatial microphone, a first modified audio signal is obtained.
In embodiments, a second modified audio signal may be obtained by conducting propagation compensation on a recorded second audio input signal (second pressure signal) of the second real spatial microphone.
In other embodiments, further audio signals may be obtained by conducting propagation compensation on recorded further audio input signals (further pressure signals) of further real spatial microphones.
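As a minimal sketch of the amplitude-only variant, assuming a 1/r pressure decay and the first real spatial microphone as reference, the compensation reduces to scaling the reference pressure by the ratio of the two distances to the sound event. The function name and the lower bound on the distance, which merely avoids division by zero, are assumptions of this example.

```python
import numpy as np

def compensate_amplitude(p_ref, p1, d1, e1, pv, min_distance=1e-3):
    """Scale the reference pressure to the virtual microphone position.

    p_ref: complex pressure Pref(k, n) at the reference microphone,
    p1: reference position, d1: estimated distance to the sound event,
    e1: unit DOA vector at the reference microphone, pv: virtual position.
    Only the amplitude decay is modelled; the phase is left untouched.
    """
    p_ipls = p1 + d1 * e1                         # estimated sound event position
    s = np.linalg.norm(p_ipls - pv)               # distance of formula (10)
    gain = d1 / max(s, min_distance)              # 1/r decay: ratio of distances
    return gain * p_ref

compensated = compensate_amplitude(
    p_ref=1.0 + 1.0j,
    p1=np.array([0.0, 0.0]), d1=2.0, e1=np.array([1.0, 0.0]),
    pv=np.array([2.0, 4.0]),
)
print(compensated)
```

In this example the virtual microphone is twice as far from the sound event as the reference microphone, so the pressure magnitude is halved.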
Now, combining in blocks 502 and 505 in
Possible solutions for the combination comprise:
The task of module 502 is, if applicable, to compute parameters for the combining, which is carried out in module 505.
Now, spectral weighting according to embodiments is described in more detail. For this, reference is made to blocks 503 and 506 of
For each time-frequency bin the geometrical reconstruction allows us to easily obtain the DOA relative to the virtual microphone, as shown in
The weight for the time-frequency bin is then computed considering the type of virtual microphone desired.
In case of directional microphones, the spectral weights may be computed according to a predefined pick-up pattern. For example, according to an embodiment, a cardioid microphone may have a pick up pattern defined by the function g(theta),
g(theta)=0.5+0.5 cos(theta),
where theta is the angle between the look direction of the virtual spatial microphone and the DOA of the sound from the point of view of the virtual microphone.
Another possibility is artistic (non physical) decay functions. In certain applications, it may be desired to suppress sound events far away from the virtual microphone with a factor greater than the one characterizing free-field propagation. For this purpose, some embodiments introduce an additional weighting function which depends on the distance between the virtual microphone and the sound event. In an embodiment, only sound events within a certain distance (e.g. in meters) from the virtual microphone should be picked up.
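The two weighting ideas above, a cardioid pick-up pattern and a distance-dependent suppression, can be combined into a single weight per time-frequency bin. The following Python sketch uses the function g(theta) given above together with a hard distance threshold; the hard threshold, the handling of a sound event located exactly at the virtual microphone, and all names are assumptions of this example.

```python
import numpy as np

def spectral_weight(source_pos, vm_pos, look_dir, max_distance):
    """Weight for one time-frequency bin of the virtual microphone.

    Combines g(theta) = 0.5 + 0.5*cos(theta) with a hard distance gate:
    sound events farther away than max_distance are fully suppressed.
    """
    s = np.asarray(source_pos, dtype=float) - np.asarray(vm_pos, dtype=float)
    distance = np.linalg.norm(s)
    if distance > max_distance:
        return 0.0                                # outside the pick-up radius
    if distance == 0.0:
        return 1.0                                # DOA undefined; assumed on-axis
    look = np.asarray(look_dir, dtype=float)
    look = look / np.linalg.norm(look)
    cos_theta = float(np.dot(s, look) / distance) # cosine of angle to look direction
    return 0.5 + 0.5 * cos_theta

w_front = spectral_weight([1.0, 0.0], [0.0, 0.0], [1.0, 0.0], max_distance=2.0)
w_side = spectral_weight([0.0, 1.0], [0.0, 0.0], [1.0, 0.0], max_distance=2.0)
w_far = spectral_weight([5.0, 0.0], [0.0, 0.0], [1.0, 0.0], max_distance=2.0)
```

A smooth roll-off instead of the hard gate would avoid abrupt level changes when a position estimate fluctuates around the threshold.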
With respect to virtual microphone directivity, arbitrary directivity patterns can be applied for the virtual microphone. In doing so, one can for instance separate a source from a complex sound scene.
Since the DOA of the sound can be computed in the position pv of the virtual microphone, namely
where cv is a unit vector describing the orientation of the virtual microphone, arbitrary directivities for the virtual microphone can be realized. For example, assuming that Pv(k,n) indicates the combination signal or the propagation-compensated modified audio signal, then the formula:
$\tilde{P}_v(k,n) = P_v(k,n)\,[1 + \cos(\varphi_v(k,n))]$ (14)
calculates the output of a virtual microphone with cardioid directivity. The directional patterns, which can potentially be generated in this way, depend on the accuracy of the position estimation.
In embodiments, one or more real, non-spatial microphones, for example, an omnidirectional microphone or a directional microphone such as a cardioid, are placed in the sound scene in addition to the real spatial microphones to further improve the sound quality of the virtual microphone signals 105 in
In a further embodiment, computation of the spatial side information of the virtual microphone is realized. To compute the spatial side information 106 of the microphone, the information computation module 202 of
The output of the spatial side information computation module 507 is the side information of the virtual microphone 106. This side information can be, for instance, the DOA or the diffuseness of sound for each time-frequency bin (k, n) from the point of view of the virtual microphone. Another possible item of side information could, for instance, be the active sound intensity vector Ia(k, n) which would have been measured in the position of the virtual microphone. How these parameters can be derived will now be described.
According to an embodiment, DOA estimation for the virtual spatial microphone is realized. The information computation module 120 is adapted to estimate the direction of arrival at the virtual microphone as spatial side information, based on a position vector of the virtual microphone and based on a position vector of the sound event as illustrated by
h(k,n)=s(k,n)−r(k,n).
The desired DOA a(k, n) can now be computed for each (k, n) for instance via the definition of the dot product of h(k, n) and v(k,n), namely
a(k,n)=arccos(h(k,n)·v(k,n)/(∥h(k,n)∥∥v(k,n)∥)).
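The dot-product definition translates directly into code. In the Python sketch below, h is formed as s − r as stated above, and v is taken to be a vector describing the orientation of the virtual microphone; the function name and the clipping of the cosine, which guards against rounding errors slightly outside [−1, 1], are assumptions of this example.

```python
import numpy as np

def doa_at_virtual_mic(h, v):
    """Angle between h(k, n) and v(k, n) via the dot product, in radians.

    The result lies in [0, pi]; the sign of the angle is not recovered.
    """
    h = np.asarray(h, dtype=float)
    v = np.asarray(v, dtype=float)
    cos_a = np.dot(h, v) / (np.linalg.norm(h) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos_a, -1.0, 1.0)))

s = np.array([2.0, 2.0])     # hypothetical vector s(k, n)
r = np.array([1.0, 1.0])     # hypothetical vector r(k, n)
v = np.array([1.0, 0.0])     # hypothetical orientation of the virtual microphone
angle = doa_at_virtual_mic(s - r, v)
print(np.degrees(angle))
```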
In another embodiment, the information computation module 120 may be adapted to estimate the active sound intensity at the virtual microphone as spatial side information, based on a position vector of the virtual microphone and based on a position vector of the sound event as illustrated by
From the DOA a(k, n) defined above, we can derive the active sound intensity Ia(k, n) at the position of the virtual microphone. For this, it is assumed that the virtual microphone audio signal 105 in
Ia(k,n)=−(½rho)|Pv(k,n)|²·[cos a(k,n), sin a(k,n)]T,
where [ ]T denotes a transposed vector, rho is the air density, and Pv(k, n) is the sound pressure measured by the virtual spatial microphone, e.g., the output 105 of block 506 in
If the active intensity vector is to be expressed in the general coordinate system, but still computed at the position of the virtual microphone, the following formula may be applied:
Ia(k,n)=(½rho)|Pv(k,n)|²h(k,n)/∥h(k,n)∥.
The diffuseness of sound expresses how diffuse the sound field is in a given time-frequency slot (see, for example, [2]). Diffuseness is expressed by a value ψ, wherein 0≦ψ≦1. A diffuseness of 1 indicates that the total sound field energy of a sound field is completely diffuse. This information is important e.g. in the reproduction of spatial sound. Traditionally, diffuseness is computed at the specific point in space in which a microphone array is placed.
According to an embodiment, the diffuseness may be computed as an additional parameter to the side information generated for the Virtual Microphone (VM), which can be placed at will at an arbitrary position in the sound scene. By this, an apparatus that also calculates the diffuseness besides the audio signal at a virtual position of a virtual microphone can be seen as a virtual DirAC front-end, as it is possible to produce a DirAC stream, namely an audio signal, direction of arrival, and diffuseness, for an arbitrary point in the sound scene. The DirAC stream may be further processed, stored, transmitted, and played back on an arbitrary multi-loudspeaker setup. In this case, the listener experiences the sound scene as if he or she were in the position specified by the virtual microphone and were looking in the direction determined by its orientation.
A diffuseness computation unit 801 of an embodiment is illustrated in
Let Edir(SM1) to Edir(SMN) and Ediff(SM1) to Ediff(SMN) denote the estimates of the energies of direct and diffuse sound for the N spatial microphones computed by energy analysis unit 810. If Pi is the complex pressure signal and ψi is diffuseness for the i-th spatial microphone, then the energies may, for example, be computed according to the formulae:
Edir(SMi)=(1−ψi)·|Pi|²
Ediff(SMi)=ψi·|Pi|²
The energy of diffuse sound should be equal in all positions; therefore, an estimate of the diffuse sound energy Ediff(VM) at the virtual microphone can be computed simply by averaging Ediff(SM1) to Ediff(SMN), e.g. in a diffuseness combination unit 820, for example, according to the formula:
A more effective combination of the estimates Ediff(SM1) to Ediff(SMN) could be carried out by considering the variance of the estimators, for instance, by considering the SNR.
The energy of the direct sound depends on the distance to the source due to the propagation. Therefore, Edir(SM1) to Edir(SMN) may be modified to take this into account. This may be carried out, e.g., by a direct sound propagation adjustment unit 830. For example, if it is assumed that the energy of the direct sound field decays with 1 over the distance squared, then the estimate for the direct sound at the virtual microphone for the i-th spatial microphone may be calculated according to the formula:
Similarly to the diffuseness combination unit 820, the estimates of the direct sound energy obtained at different spatial microphones can be combined, e.g. by a direct sound combination unit 840. The result is Edir(VM), e.g., the estimate for the direct sound energy at the virtual microphone. The diffuseness at the virtual microphone ψ(VM) may be computed, for example, by a diffuseness sub-calculator 850, e.g. according to the formula:
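The chain of units 810 to 850 can be illustrated with a short Python sketch. It assumes a plain average for the diffuse energies, a squared distance ratio for moving each direct energy to the virtual position (following the stated 1/distance² decay), a plain average for combining the direct estimates, and the ratio Ediff(VM)/(Ediff(VM)+Edir(VM)) for the final diffuseness. These closed forms and all names are assumptions of this example; a weighted combination, e.g. by SNR, would be an alternative.

```python
import numpy as np

def vm_diffuseness(pressures, diffuseness, mic_positions, source_pos, vm_pos):
    """Estimate the diffuseness at the virtual microphone for one bin (k, n)."""
    P = np.asarray(pressures)
    psi = np.asarray(diffuseness, dtype=float)
    mics = np.asarray(mic_positions, dtype=float)
    src = np.asarray(source_pos, dtype=float)
    vm = np.asarray(vm_pos, dtype=float)

    # Energy analysis: split each microphone's energy into direct and diffuse parts
    e_dir = (1.0 - psi) * np.abs(P) ** 2
    e_diff = psi * np.abs(P) ** 2

    # Diffuse energy is assumed equal everywhere: average the estimates
    e_diff_vm = e_diff.mean()

    # Direct energy is rescaled by the squared ratio of distances to the source
    d_sm = np.linalg.norm(mics - src, axis=1)     # distance SMi to sound event
    d_vm = np.linalg.norm(vm - src)               # distance VM to sound event
    e_dir_vm = ((d_sm / d_vm) ** 2 * e_dir).mean()

    return e_diff_vm / (e_diff_vm + e_dir_vm)

psi_near = vm_diffuseness([1.0, 1.0], [0.5, 0.5], [[0, 0], [4, 0]], [2, 0], [2, 1])
psi_far = vm_diffuseness([1.0, 1.0], [0.5, 0.5], [[0, 0], [4, 0]], [2, 0], [2, 4])
print(psi_near, psi_far)
```

Moving the virtual microphone away from the sound event lowers the estimated direct energy and therefore raises the diffuseness, which matches the qualitative behaviour described above. The sketch does not guard against a virtual microphone placed exactly at the sound event position or against zero total energy.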
As mentioned above, in some cases, the sound events position estimation carried out by a sound events position estimator fails, e.g., in case of a wrong direction of arrival estimation.
Additionally, the reliability of the DOA estimates at the N spatial microphones may be considered. This may be expressed e.g. in terms of the variance of the DOA estimator or SNR. Such information may be taken into account by the diffuseness sub-calculator 850, so that the VM diffuseness 103 can be artificially increased in case the DOA estimates are unreliable. In fact, as a consequence, the position estimates 205 will also be unreliable.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
The inventive decomposed signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a non-transitory data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
Literature
This application is a continuation of copending International Application No. PCT/EP2011/071629, filed Dec. 2, 2011, which claims priority from U.S. Provisional No. 61/419,623, filed Dec. 3, 2010, and U.S. Provisional No. 61/420,099, filed Dec. 6, 2010, which are each incorporated herein in its entirety by this reference thereto.
Number | Name | Date | Kind |
---|---|---|---|
6072878 | Moorer | Jun 2000 | A |
6600824 | Matsuo | Jul 2003 | B1 |
6618485 | Matsuo | Sep 2003 | B1 |
6904152 | Moorer | Jun 2005 | B1 |
7606373 | Moorer | Oct 2009 | B2 |
8405323 | Finney et al. | Mar 2013 | B2 |
20020001389 | Amiri et al. | Jan 2002 | A1 |
20040138873 | Heo et al. | Jul 2004 | A1 |
20040157661 | Ueda et al. | Aug 2004 | A1 |
20040186734 | Heo et al. | Sep 2004 | A1 |
20040193430 | Heo et al. | Sep 2004 | A1 |
20050141728 | Moorer | Jun 2005 | A1 |
20050281410 | Grosvenor et al. | Dec 2005 | A1 |
20060002566 | Choi et al. | Jan 2006 | A1 |
20060010445 | Peterson et al. | Jan 2006 | A1 |
20060171547 | Lokki et al. | Aug 2006 | A1 |
20070032894 | Uenishi et al. | Feb 2007 | A1 |
20070203598 | Seo et al. | Aug 2007 | A1 |
20070297616 | Plogsties et al. | Dec 2007 | A1 |
20080298610 | Virolainen et al. | Dec 2008 | A1 |
20090043591 | Breebaart et al. | Feb 2009 | A1 |
20090051624 | Finney et al. | Feb 2009 | A1 |
20090129609 | Oh et al. | May 2009 | A1 |
20090147961 | Lee et al. | Jun 2009 | A1 |
20090252356 | Goodwin et al. | Oct 2009 | A1 |
20100169103 | Pulkki | Jul 2010 | A1 |
20100208904 | Nakajima et al. | Aug 2010 | A1 |
20110313763 | Amada | Dec 2011 | A1 |
20120014535 | Oouchi et al. | Jan 2012 | A1 |
20120140947 | Shin | Jun 2012 | A1 |
20130016842 | Schultz-Amling et al. | Jan 2013 | A1 |
Number | Date | Country |
---|---|---|
1452851 | Oct 2003 | CN |
1714600 | Dec 2005 | CN |
101473645 | Jul 2009 | CN |
101485233 | Jul 2009 | CN |
2154910 | Feb 2010 | EP |
2414369 | Nov 2005 | GB |
H01109996 | Apr 1989 | JP |
H04181898 | Jun 1992 | JP |
H1063470 | Mar 1998 | JP |
2001045590 | Feb 2001 | JP |
2002051399 | Feb 2002 | JP |
2004193877 | Jul 2004 | JP |
2004242728 | Sep 2004 | JP |
2006503491 | Jan 2006 | JP |
2008028700 | Feb 2008 | JP |
2008197577 | Aug 2008 | JP |
2008245984 | Oct 2008 | JP |
2009089315 | Apr 2009 | JP |
2009216473 | Sep 2009 | JP |
2009246827 | Oct 2009 | JP |
2009537876 | Oct 2009 | JP |
2010147692 | Jul 2010 | JP |
2010525646 | Jul 2010 | JP |
2010193451 | Sep 2010 | JP |
2010232717 | Oct 2010 | JP |
2315371 | Jan 2008 | RU |
2383939 | Mar 2010 | RU |
2396608 | Aug 2010 | RU |
200701823 | Jan 2007 | TW |
WO-2004077884 | Sep 2004 | WO |
2005098826 | Oct 2005 | WO |
WO-2006006935 | Jan 2006 | WO |
2006072270 | Jul 2006 | WO |
2006105105 | Oct 2006 | WO |
WO-2007025033 | Mar 2007 | WO |
WO-2008128989 | Oct 2008 | WO |
2009046223 | Apr 2009 | WO |
2009089353 | Jul 2009 | WO |
2010028784 | Mar 2010 | WO |
WO-2010028784 | Mar 2010 | WO |
2010122455 | Oct 2010 | WO |
2010128136 | Nov 2010 | WO |
Entry |
---|
Schultz-Amling et al., “Virtual acoustic zoom based on parametric spatial audio representations”, U.S. Appl. No. 61/287,596, filed Dec. 17, 2009, 11 pages. |
Chien, Jen-Tzung et al., “Car Speech Enhancement Using Microphone Array Beamforming and Post Filters”, Proceedings of the 9th Australian International Conference on Speech Science & Technology; Melbourne, Dec. 2-5, 2002, pp. 568-572. |
Del Galdo, G. et al., “Generating Virtual Microphone Signals Using Geometrical Information Gathered by Distributed Arrays”, IEEE, 2011 Joint Workshop on Hands-free Speech Communications and Microphone Arrays., May 30-Jun. 1, 2011, pp. 185-190. |
Del Galdo et al., “Optimized Parameter Estimation in Directional Audio Coding Using Nested Microphone Arrays”, AES Convention Paper 7911; Presented at the 127th Convention; New York, NY, USA, Oct. 9-12, 2009, 9 pages. |
Engdegard, J. et al., “Spatial Audio Object Coding (SAOC)—The Upcoming MPEG Standard on Parametric Object Based Audio Coding”, Audio Engineering Society Convention Paper, Presented at the 124th Convention, Amsterdam, The Netherlands, May 17-20, 2008, 15 pages. |
Fahy, F.J., “Sound energy and sound intensity”, Chapter 4, Essex: Elsevier Science Publishers Ltd., 1989, pp. 38-88. |
Faller, C. , “Microphone Front-Ends for Spatial Audio Coders”, Audio Engineering Society Convention Paper 7508; Presented at the 125th Convention, San Francisco, CA, USA, Oct. 2-5, 2008, 10 pages. |
Faller, C., “Obtaining a Highly Directive Center Channel from Coincident Stereo Microphone Signals”, AES Convention Paper 7380; Presented at the 124th Convention; Amsterdam, The Netherlands, May 17-20, 2008, 7 pages. |
Furness, R. , “Ambisonics—An Overview”, Minim Electronics Limited, Burnham, Slough,U.K.; AES 8th International Conference; Apr. 1990, pp. 181-190. |
Gallo, Emmanuel et al., “Extracting and Re-Rendering Structured Auditory Scenes from Field Recordings”, AES 30th Int'l Conference; Saariselkä, Finland, Mar. 15-17, 2007, 11 pages. |
Gerzon, M., “Ambisonics in Multichannel Broadcasting and Video”, Journal Audio Engineering Society, vol. 33, No. 11, Nov. 1985, pp. 859-871. |
Herre, J. et al., “Interactive Teleconferencing Combining Spatial Audio Object Coding and DirAC Technology”, AES Convention Paper 8098; Presented at the 128th Convention; London, UK, May 22-25, 2010, 12 pages. |
Herre, J. et al., “MPEG Surround—The ISO/MPEG Standard for Efficient and Compatible Multi-Channel Audio Coding”, Audio Engineering Society Convention Paper, Presented at the 122nd Convention, Vienna, Austria, May 5-8, 2007, 23 pages. |
Kallinger, M. et al. “A Spatial Filtering Approach for Directional Audio Coding”, AES Convention Paper 7653; Presented at the 126th Convention; Munich, Germany, May 7-10, 2009, 10 pages. |
Kallinger, M. et al., “Enhanced Direction Estimation using Microphone Arrays for Directional Audio Coding”, in Hands-Free Speech Communication and Microphone Arrays (HSCMA), May 2008, pp. 45-48. |
Kuntz, A. et al., “Limitations in the Extrapolation of Wave Fields from Circular Measurements”, 15th European Signal Processing Conference (EUSIPCO 2007), Poznan, Poland, Sep. 3-7, 2007, pp. 2331-2335. |
Marro, C. et al., “Analysis of Noise Reduction and Dereverberation Techniques Based on Microphone Arrays With Postfiltering”, IEEE Transactions on Speech and Audio Processing, vol. 6, No. 3, May 1998, pp. 240-259. |
Pulkki, V., “Directional audio coding in spatial sound reproduction and stereo upmixing”, AES 28th International Conference, Piteå, Sweden, Jun. 30-Jul. 2, 2006, pp. 1-8. |
Pulkki, V., “Spatial Sound Reproduction with Directional Audio Coding”, J. Audio Eng. Soc., Helsinki Univ. of Technology, Finland; 55(6), Jun. 2007, pp. 503-516. |
Rickard, S. et al., “On the Approximate W-Disjoint Orthogonality of Speech”, In the International Conference on Acoustics, Speech and Signal Processing, Apr. 2002, vol. 1, pp. I-529-I-532. |
Roy, R. et al. , “Direction-of-Arrival Estimation by Subspace Rotation Methods—ESPRIT”, In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Stanford, CA, USA, Apr. 1986, pp. 2495-2498. |
Roy, R. et al., “ESPRIT—Estimation of Signal Parameters Via Rotational Invariance Techniques”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, No. 7, Jul. 1989, pp. 984-995. |
Schmidt, R., “Multiple Emitter Location and Signal Parameter Estimation”, IEEE Transactions on Antennas and Propagation, vol. 34, No. 3, Mar. 1986, pp. 276-280. |
Schultz-Amling, R. et al., “Acoustical Zooming Based on a Parametric Sound Field Representation”, AES Convention Paper 8120; Presented at the 128th Convention; London, UK, May 22-25, 2010, 9 pages. |
Schultz-Amling, R. et al., “Planar Microphone Array Processing for the Analysis and Reproduction of Spatial Audio using Directional Audio Coding”, Audio Engineering Society, Convention Paper 7375, Presented at the 124th Convention, Amsterdam, The Netherlands, May 17-20, 2008, 10 pages. |
Simmer, K. U. et al., “Time Delay Compensation for Adaptive Multichannel Speech Enhancement Systems”, Proceedings of ISSSE-92, Paris, Sep. 1-4, 1992, 4 pages. |
Steele, Michael J., “Optimal Triangulation of Random Samples in the Plane”, The Annals of Probability, vol. 10, No. 3, Aug. 1982, pp. 548-553. |
Vilkamo, J. et al., “Directional Audio Coding: Virtual Microphone-Based Synthesis and Subjective Evaluation”, J. Audio Eng. Soc., vol. 57, No. 9, Sep. 2009, pp. 709-724. |
Walther, A. et al., “Linear Simulation of Spaced Microphone Arrays Using B-Format Recordings”, Audio Engineering Society, Convention Paper 7987, Presented at the 128th Convention, May 22-25, 2010, London, UK, 7 pages. |
Williams, E.G., “Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography; Chapter 3, The Inverse Problem: Planar Nearfield Acoustical Holography”, Academic Press, Jun. 1999, pp. 89-114. |
Karbasi, Amin et al., “A New DOA Estimation Method Using a Circular Microphone Array”, School of Comp. and Commun. Sciences, Ecole Polytechnique Federale de Lausanne, CH-1015 Lausanne, Switzerland, 2007, pp. 778-782. |
Number | Date | Country | |
---|---|---|---|
20130259243 A1 | Oct 2013 | US |
Number | Date | Country | |
---|---|---|---|
61419623 | Dec 2010 | US | |
61420099 | Dec 2010 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/EP2011/071629 | Dec 2011 | US |
Child | 13904870 | | US |