The present embodiments relate generally to signal processing, particularly to systems having a parallel bank of resonant filters.
While acoustic spaces and vibrating systems have long been studied using modal analysis (e.g. A. Benade, Fundamentals of Musical Acoustics, Oxford University Press, 1976, pp. 172 et seq.), P. Morse and K. Ingard, Theoretical Acoustics, Princeton University Press, 1987, pp 576 et seq.), such systems have only recently been synthesized using modal structures (e.g. J. Abel et al., “A modal architecture for artificial reverberation,” The Journal of the Acoustical Society of America, vol. 134, no. 5, p. 4220, 2013, J. Abel et al., “A modal architecture for artificial reverberation with application to room acoustics modeling,” Audio Engineering Society Convention, vol. 137, (Los Angeles, Calif.), Oct. 9-12 2014 (“Modal Architecture”)). A feature is that a rational system can be represented as the linear combination of resonant or “mode” filters, one for each mode of the system. The resulting parallel structure allows accurate modeling of the acoustic space or object, and provides explicit, interactive control over its features.
One barrier to implementing such systems is estimating the parameters which describe each mode filter—its resonant frequency, damping, and amplitude-a task made difficult by the potentially thousands of modes needed to accurately characterize the desired acoustic space or vibrating system. In “Modal Architecture,” room mode frequencies are approximated by the locations of peaks in the power spectrum of the impulse response. Mode dampings are approximated using reverberation times in bands about the estimated mode frequencies. Though this method is computationally efficient and appears to produce acceptable perceptual results, its accuracy can be improved, as spectral peaks and mode frequencies rarely coincide, and mode dampings can vary noticeably from mode to nearby mode.
In E. Maestre et al., “Design of recursive digital filters in parallel form by linearly constrained pole optimization,” IEEE Signal Processing Letters, vol. 23, pp. 1547-1550, November 1997 and E. Maestre et al., “Constrained pole optimization for modal reverberation,” in International Conference on Digital Audio Effects (DAFx-17), vol. 18, (Edinburgh, UK), Sep. 5-9 2017, frequency bands containing relatively few spectral peaks are formed, and mode parameters (in these cases, biquad coefficients) are fit in each band using a nonlinear optimization. The method produces excellent fits to the spectrum, but requires significant computation, and is not easily automated.
Thus, there is a need for a method to estimate modal parameters from a given impulse response that is both accurate and computationally efficient. There is a need for a method to estimate modal parameters which also may be automated.
According to certain aspects, the present embodiments estimate modal parameters from given impulse responses using a direct computation that is accurate, and can handle a large number of modes. In one embodiment, the pseudoinverse of a Hankel matrix of the given impulse response samples is formed. In another embodiment, a Hankel matrix of offset impulse response samples is formed, and the generalized eigenvalues are computed to form frequency and damping parameters. In a further embodiment, mode amplitudes are estimated using linear least squares based on estimates of the mode frequency and damping parameters. In yet another embodiment, the impulse response is decomposed into a set of possibly overlapping frequency bands, and down-sampled in advance of estimating modal parameters for each band.
These and other aspects and features of the present embodiments will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments in conjunction with the accompanying figures, wherein:
The present embodiments will now be described in detail with reference to the drawings, which are provided as illustrative examples of the embodiments so as to enable those skilled in the art to practice the embodiments and alternatives apparent to those skilled in the art. Notably, the figures and examples below are not meant to limit the scope of the present embodiments to a single embodiment, but other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Moreover, where certain elements of the present embodiments can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present embodiments will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the present embodiments. Embodiments described as being implemented in software should not be limited thereto, but can include embodiments implemented in hardware, or combinations of software and hardware, and vice-versa, as will be apparent to those skilled in the art, unless otherwise specified herein. In the present specification, an embodiment showing a singular component should not be considered limiting; rather, the present disclosure is intended to encompass other embodiments including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein. Moreover, applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, the present embodiments encompass present and future known equivalents to the known components referred to herein by way of illustration.
According to certain aspects, the present applicant recognizes aspects of Hankel norm filter design (e.g. J. Smith, Techniques for System Identification and Filter Design with Application to the Violin. Ph.D. dissertation, Stanford University, Stanford, Calif., 1983, § 1.3), in which the impulse response samples are arranged in a Hankel matrix, and a rational system is fit to a reconstruction of the matrix using its largest singular values. The present applicant further recognizes that the outer product of vectors of complex exponential samples is a Hankel matrix. In this way, the Hankel matrix of impulse response samples is seen to be an outer product of mode filter responses. The present applicant still further recognizes that a manipulation of the Hankel matrix, similar to that of ESPRIT (e.g., R. Roy and T. Kailath, “Esprit-estimation of signal parameters via rotational invariance techniques,” IEEE Transactions on acoustics, speech, and signal processing, vol. 37, pp. 984-995, July 1989) can be used to generate the desired mode parameters.
Denote by h(t) an impulse response available at samples t=0, 1, . . . , T, with T being even. Assuming that h(t) is associated with a rational system without repeated poles, then it may be expressed as the sum of M mode responses hm(t),
in which each mode response is a complex exponential,
h
m(t)=γm exp{jωm−αm}, (2)
characterized by a mode frequency ωm, mode decay rate αm and mode amplitude γm. The modal filter is this parallel combination of resonant mode filters. The goal, then, is to estimate the mode count M and mode parameters ωm, αm and γm, m=1, 2, . . . M, to best approximate noisy observations of the impulse response:
{tilde over (h)}(t)=h(t)+n(t), (3)
where n(t) represents additive measurement noise. It should be noted that embodiments will be described herein with reference to using an impulse response, those skilled in the art will appreciate that other types of responses could be used, such as selected portions of impulse responses (e.g. portions not starting at time zero).
Consider νm, the stack of samples of the complex exponential portion of the mth mode response hm(t), and V, the concatenation of the νm,
V=[ν1 . . . νM],νm=[e(jω
where ⋅T represents the non-Hermetian transpose. Note that the elements of the column νm form a geometric series, and as such its outer product (using a non-Hermetian transpose) forms a Hankel matrix, with elements exp{jωm−αm)·t} along the t-th antidiagonal,
Accordingly, the Hankel matrix formed by the impulse response samples H,
may be written as an outer product of V with a diagonal matrix of mode amplitudes, Γ=diag{[γ1 . . . γM]}
H=VΓV
T (7)
In this way, the unknown mode amplitudes are separated from the unknown mode frequencies and dampings.
Following an approach similar to that of ESPRIT, the Hankel matrix formed using the impulse response offset by p samples, Kp,
may also be written as an outer product,
Kp=VΨ
p
ΓV
T (9)
where Ψp is a diagonal matrix of mode exponential kernels,
Ψp=diag{[e(jω
Post-multiplying Kp by the pseudoinverse of the matrix H,
K
p
H
+
=VΨ
p
V
−1 (11)
produces a matrix with eigenvalues that are the desired mode exponential kernels, e(jω*−α*). (Note that the same eigenvalues can be produced by pre-multiplying by H+. Also note that these eigenvalues are the generalized eigenvalues of the pair (Kp, H.).
Based on the foregoing observations and relationships, the following describes an example methodology of estimating the mode count M and mode parameters ωm, αm and γm, m=1, 2, . . . M, from noisy observations of the impulse response h(t), as shown in
In block 102, a measured impulse response of the acoustic space or resonating object (e.g. having T time domain samples) is obtained using many ways known to those skilled the art. From the measured impulse response, offset samples at offsets p are obtained, also using many ways known to those skilled in the art.
In block 104, form the Hankel matrices:
{tilde over (H)} and {tilde over (K)}p
from the measured impulse response and offset samples of the measured impulse response:
{tilde over (h)}(t),t=0,1, . . . ,T and {tilde over (h)}(t)r,t=p,p+1, . . . ,{circumflex over (p)}+T.
respectively, using the equations (6) and (8) as set forth above.
In block 106, compute the pseudoinverse of the measured impulse response Hankel matrix:
{tilde over (H)}
+.
for instance by forming a singular value decomposition, and inverting the “signal” subspace of
{tilde over (H)}.
The signal subspace may be defined to be that associated with the {acute over (M)} largest singular values. It may also be found by dividing the singular values into two groups having larger and smaller singular values, respectively associated with “signal” and “noise” subspaces of:
{tilde over (H)}.
Here, the rank of the signal space is used as an estimate of the number of modes, {acute over (M)}.
In block 108, find the {acute over (M)} largest eigenvalues of
{tilde over (K)}
p
{tilde over (H)}
+ or {tilde over (H)}+{tilde over (K)}p,
denoted by νm, m=1, 2, . . . , {acute over (M)}. The mode frequency and damping estimates are then the real and imaginary parts of the log eigenvalues, scaled by 1/p,
Finally, in block 110 estimate of the Vandermonde matrix of complex exponentials:
{circumflex over (V)}
either from the eigenvalue decomposition above, or directly using the estimated mode frequencies and dampings. The column of unknown dampings, γ can then be estimated by linear least squares fit to the measured impulse response. Denoting by
{tilde over (h)}
the stack of measured impulse response samples, we have
{tilde over (γ)}=({circumflex over (V)}TW{circumflex over (V)})−1{circumflex over (V)}TW{tilde over (h)}, (13)
where W is a weighting matrix, used for instance to achieve a better fit to the impulse response onset relative to the tail, though the identity matrix W=I generally produces good results. The dampings may also be estimated as the diagonal elements of the matrix
{circumflex over (V)}
−1
{tilde over (H)}{circumflex over (V)}
−T. (14)
The final result in block 110 is a modal filter (i.e. a transfer function h(t)) comprising a parallel combination of M resonant mode filters having the estimated mode parameters as described above. This modal filter can then be applied to an input signal x(t) to produce an output y(t) that has the desired reverberation effect associated with the target acoustic space or resonating object.
Aspects of the methods described herein are illustrated with reference to the example impulse response shown in
τ=[3 7 10 12 19 22 34 44], (15)
and a block Hadamard mixing matrix
Q=I⊗H
4/2, (16)
where ⊗ represents the Kronecker product, and H4 a 4×4 Hadamard matrix. The feedback filters are designed to give a constant reverberation time of 200 ms for the four shorter delay lines and a frequency-dependent reverberation time, varying between 400 ms at low frequencies and 100 ms at high frequencies, for the four longer delay lines, thus producing a two-stage decay.
White Gaussian noise was added at a level of −80 dB to the impulse response 202 of
The signal subspace of {tilde over (H)} was inverted, and the generalized eigenvalues between {tilde over (K)}1 and {tilde over (H)} computed to estimate the mode frequencies {circumflex over (ω)}m and decay times {circumflex over (α)}m (cf. (12)). The mode frequencies are represented by the x-axis values of the sample points in
The mode dampings γm were estimated using unweighted least squares, (13) with W=I. The impulse response 402 of the modal filter designed using the estimated mode parameters is overlaid on the noise-free impulse response in
Finally, it should be pointed out that in the absence of noise, the estimated impulse response differs from the given impulse response to within numerical error, in excess of 200 dB below the impulse response onset level. Similarly, mode frequencies and dampings computed from a state-space models of FDN reverberators were seen to match those estimated from the noise-free impulse response to within 1×e−11.
To get a feel for the statistical behavior of the inventive method, 256-trial Monte Carlo simulations were run using a four-state FDN reverberator made using the four shorter delay lines above, a Hadamard mixing matrix, a constant 60 dB decay time of 200 ms, and a sampling rate of 48 kHz. White Gaussian noise was added to the FDN impulse response at levels of (−50, −60, −70, −80) dB relative to the initial impulse response level to form the measured impulse responses for each trial. Modal filter impulse response, and modal frequency, damping, and amplitude estimates were made using the first 2049 measured impulse response samples.
In many scenarios, the mode frequencies and decay times (or dampings) are of primary importance. In these cases, it is possible to work from only a portion of the impulse response, and even the response of the system being modeled to any terminating signal. Such system responses are available in room acoustics measurement scenarios such as when a balloon pop, hand clap, or interrupted noise responses are available. Here, as the input is zero after some point in time, the system is responding with the sum of its mode exponentials, with perhaps unknown mode amplitudes. In these cases, applying the methods described herein to the system response when the system input is zero can produce meaningful mode frequency and damping estimates.
In a similar scenario, the system might have a number of long-lasting modes and many short-duration modes. In this case, it would be possible to estimate the parameters of the long-lasting modes by applying the inventive techniques to the portion of the impulse response when only the long-lasting modes are present, i.e., after the shorter-duration modes have sufficiently decayed. Here, the mode complex amplitudes can be adjusted to start at any desired time. Such a design could be used with a hybrid implementation in which a convolutional or similar efficient method is used to model the quickly decaying portion of impulse response, and a modal approach is used to implement the long-lasting modes.
Other aspects of the inventive method are motivated by the fact that there are a number of modal filter design scenarios in which there are many, long-lasting modes. To accurately estimate the mode frequencies and dampings in these scenarios, it desired to use a sufficiently long portion of the impulse response that a number of cycles of the low-frequency modes and a good amount of decay of the long-lasting modes can be captured. Also to provide the needed statistical leverage, the number of impulse response samples used should be at least several times the number of modes.
For example, an impulse response measured from a small recital hall could have a 60 dB decay time of about a second, and have several thousand modes in the audio band. An impulse response length of τ=16384 could be needed. In addition to being computation-ally expensive, numerical issues can arise with such large matrices. To circumvent these difficulties, the impulse response may be processed separately in a set of frequency bands.
The idea is to bandpass filter and down-sample the impulse response for each of a set of overlapping bands. In this way, each band will contain relatively few modes and last a relatively short number of down-sampled samples. The mode frequencies and dampings in each band may be estimated using the techniques described above, and translated to the original sampling rate and frequency band accordingly. (Note that the modes associated with the bandpass filtering will appear in the band mode parameter estimate, and should be discarded.) The mode amplitudes may be estimated, for instance, by least squares as described above, using the measured impulse response and estimated mode frequencies and dampings.
This processing is illustrated with respect to a synthesized impulse response having 24 modes randomly distributed across a 12 kHz wide band, each having a 60 dB decay time of 5000 ms. A modal model is designed by processing the impulse response in bands using heterodyning and the elliptic low-pass filter shown in
An example system according to embodiments can be included in a sound editing application and implemented by a plug-in in a digital audio workstation (DAW).
Such an example system can includes a filter design module and a modal reverberator module. The filter design module receives a measured impulse response for a room or resonating object to be modeled. For example, the DAW can include a library of measured responses from which a desired response can be selected. As another example, the measured response can be directly obtained using techniques known to those skilled in the art. Using the measured response, the design module then generates the parameters for the modal reverberator, specifically the mode frequencies, dampings and amplitudes for each of the M resonant mode filters using the example methodology described in connection with
The reverberator module effectively implements the modal filter formed by the M resonant mode filters, so as to apply artificial reverberation to a source signal that yields an output signal having reverberation characteristics matching the desired acoustic space or resonating object. It should be noted that the filter design module and reverberator module are not necessarily included in the same system in all embodiments.
In one example implementation, a system according to embodiments is included in a computer configured with DAW software such as Digidesign Pro Tools that supports plugin applications (e.g. AU, VST, RTAS compatible plugins), and possibly having a DSP card (not shown) that accelerates plugin processing. One possible example of a card and corresponding plugin application that can be adapted for use with this invention is UAD-1 DSP card and plugin bundle from Universal Audio of Santa Cruz, Calif. In such an example, the modal reverb techniques of the present invention are included as one plug-in application, or one application of among many plug-in applications provided with the card. In one example embodiment, the computer is a Mac or PC having a processor such as an Intel Pentium or other Intel CPU, AMD Athlon or other AMD CPU, or a Power-compatible CPU.
Audio (either provided within or to the system in real-time or via recorded media) can be processed by the DAW using the plug-in application and the techniques of the present invention. The plug-in application can further allow a user, via a user interface such as a graphical display, mouse, keyboard, etc., to select and adjust the parameters used by the designs and reverberating modules (e.g. selecting desired impulse responses, adjusting number of modes, extension controls, etc.), which can further cause the DAW to process the audio with the desired effect. Those skilled in the art will be able to understand how to implement the embodiments using software written in accordance with the methodologies described herein for use in a DAW after being taught by the present disclosure.
Although the present embodiments have been particularly described with reference to preferred examples thereof, it should be readily apparent to those of ordinary skill in the art that changes and modifications in the form and details may be made without departing from the spirit and scope of the present disclosure. It is intended that the appended claims encompass such changes and modifications.
The present application is a continuation of U.S. patent application Ser. No. 16/558,012 filed Aug. 30, 2019, now U.S. Pat. No. 11,087,733, which application is a continuation-in-part of U.S. patent application Ser. No. 16/384,266 filed Apr. 15, 2019, now U.S. Pat. No. 11,049,482, which application is a continuation of U.S. patent application Ser. No. 15/796,327 filed Oct. 27, 2017, now U.S. Pat. No. 10,262,645, which application is a continuation of U.S. patent application Ser. No. 14/558,531, filed Dec. 2, 2014, now U.S. Pat. No. 9,805,704, which application claims priority to U.S. Provisional Patent Application Nos. 62/061,219 filed Oct. 8, 2014, 61/913,093 filed Dec. 6, 2013 and 61/910,548 filed Dec. 2, 2013. U.S. patent application Ser. No. 16/558,012 filed Aug. 30, 2019, now U.S. Pat. No. 11,087,733, is a continuation-in-part of U.S. patent application Ser. No. 16/432,866, filed Jun. 5, 2019, which application is a continuation-in-part of U.S. patent application Ser. No. 16/030,789 filed Jul. 9, 2018, which application is a continuation of U.S. patent application Ser. No. 15/201,013 filed Jul. 1, 2016, now U.S. Pat. No. 10,019,980, which application claims priority to U.S. Provisional Patent Application No. 62/188,299 filed Jul. 2, 2015. U.S. patent application Ser. No. 16/558,012 also claims priority to U.S. Provisional Patent Application No. 62/726,325 filed Sep. 3, 2018. The contents of all the above applications are incorporated herein by reference in their entirety.
Number | Date | Country | |
---|---|---|---|
62188299 | Jul 2015 | US | |
61910548 | Dec 2013 | US | |
61913093 | Dec 2013 | US | |
62061219 | Oct 2014 | US | |
62726325 | Sep 2018 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 16558012 | Aug 2019 | US |
Child | 17397743 | US | |
Parent | 15796327 | Oct 2017 | US |
Child | 16384266 | US | |
Parent | 14558531 | Dec 2014 | US |
Child | 15796327 | US | |
Parent | 15201013 | Jul 2016 | US |
Child | 16030789 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 16384266 | Apr 2019 | US |
Child | 16558012 | US | |
Parent | 16432866 | Jun 2019 | US |
Child | 16558012 | US | |
Parent | 16030789 | Jul 2018 | US |
Child | 16432866 | US |