This disclosure relates in general to reverberation algorithms and reverberators for using the disclosed reverberation algorithms. More specifically, this disclosure relates to calculating a reverberation initial power (RIP) correction factor and applying it in series with a reverberator. This disclosure also relates to calculating a reverberation energy correction (REC) factor and applying it in series with a reverberator.
Virtual environments are ubiquitous in computing environments, finding use in video games (in which a virtual environment may represent a game world); maps (in which a virtual environment may represent terrain to be navigated); simulations (in which a virtual environment may simulate a real environment); digital storytelling (in which virtual characters may interact with each other in a virtual environment); and many other applications. Modern computer users are generally comfortable perceiving, and interacting with, virtual environments. However, users' experiences with virtual environments can be limited by the technology for presenting virtual environments. For example, conventional displays (e.g., 2D display screens) and audio systems (e.g., fixed speakers) may be unable to realize a virtual environment in ways that create a compelling, realistic, and immersive experience.
Virtual reality (“VR”), augmented reality (“AR”), mixed reality (“MR”), and related technologies (collectively, “XR”) share an ability to present, to a user of an XR system, sensory information corresponding to a virtual environment represented by data in a computer system. Such systems can offer a uniquely heightened sense of immersion and realism by combining virtual visual and audio cues with real sights and sounds. Accordingly, it can be desirable to present digital sounds to a user of an XR system in such a way that the sounds seem to be occurring—naturally, and consistently with the user's expectations of the sound—in the user's real environment. Generally speaking, users expect that virtual sounds will take on the acoustic properties of the real environment in which they are heard. For instance, a user of an XR system in a large concert hall will expect the virtual sounds of the XR system to have large, cavernous sonic qualities; conversely, a user in a small apartment will expect the sounds to be more dampened, close, and immediate.
Digital, or artificial, reverberators may be used in audio and music signal processing to simulate perceived effects of diffuse acoustic reverberation in rooms. It may be desirable to have a system that provides accurate and independent control of reverberation loudness and reverberation decay for each digital reverberator, for example, to give sound designers intuitive control.
Systems and methods for providing accurate and independent control of reverberation properties are disclosed. In some embodiments, a system may include a reverberation processing system, a direct processing system, and a combiner. The reverberation processing system can include a reverb initial power (RIP) control system and a reverberator. The RIP control system can include a reverb initial gain (RIG) and a RIP corrector. The RIG can be configured to apply a RIG value to the input signal, and the RIP corrector can be configured to apply a RIP correction factor to the signal from the RIG. The reverberator can be configured to apply reverberation effects to the signal from the RIP control system.
In some embodiments, the reverberator can include one or more comb filters to filter out one or more frequencies in the system. The one or more frequencies can be filtered out to mimic environmental effects, for example. In some embodiments, the reverberator can include one or more all-pass filters. Each all-pass filter can receive a signal from the comb filters and can be configured to pass its input signal without changing its magnitude, but can change a phase of the signal.
In some embodiments, the RIG can include a reverb gain (RG) configured to apply a RG value to the input signal. In some embodiments, the RIG can include a reverberation energy (RE) corrector configured to apply a RE correction factor to the signal from the RG.
In the following description of examples, reference is made to the accompanying drawings which form a part hereof, and in which it is shown by way of illustration specific examples that can be practiced. It is to be understood that other examples can be used and structural changes can be made without departing from the scope of the disclosed examples.
Example Wearable System
In some examples involving augmented reality or mixed reality applications, it may be desirable to transform coordinates from a local coordinate space (e.g., a coordinate space fixed relative to wearable head device 400A) to an inertial coordinate space, or to an environmental coordinate space. For instance, such transformations may be necessary for a display of wearable head device 400A to present a virtual object at an expected position and orientation relative to the real environment (e.g., a virtual person sitting in a real chair, facing forward, regardless of the position and orientation of wearable head device 400A), rather than at a fixed position and orientation on the display (e.g., at the same position in the display of wearable head device 400A). This can maintain an illusion that the virtual object exists in the real environment (and does not, for example, appear positioned unnaturally in the real environment as the wearable head device 400A shifts and rotates). In some examples, a compensatory transformation between coordinate spaces can be determined by processing imagery from the depth cameras 444 (e.g., using a Simultaneous Localization and Mapping (SLAM) and/or visual odometry procedure) in order to determine the transformation of the wearable head device 400A relative to an inertial or environmental coordinate system. In the example shown in
In some examples, the depth cameras 444 can supply 3D imagery to a hand gesture tracker 411, which may be implemented in a processor of wearable head device 400A. The hand gesture tracker 411 can identify a user's hand gestures, for example, by matching 3D imagery received from the depth cameras 444 to stored patterns representing hand gestures. Other suitable techniques of identifying a user's hand gestures will be apparent.
In some examples, one or more processors 416 may be configured to receive data from headgear subsystem 404B, the IMU 409, the SLAM/visual odometry block 406, depth cameras 444, a microphone (not shown), and/or the hand gesture tracker 411. The processor 416 can also send control signals to, and receive control signals from, the 6DOF totem system 404A. The processor 416 may be coupled to the 6DOF totem system 404A wirelessly, such as in examples where the handheld controller 400B is untethered. Processor 416 may further communicate with additional components, such as an audio-visual content memory 418, a Graphical Processing Unit (GPU) 420, and/or a Digital Signal Processor (DSP) audio spatializer 422. The DSP audio spatializer 422 may be coupled to a Head Related Transfer Function (HRTF) memory 425. The GPU 420 can include a left channel output coupled to the left source of imagewise modulated light 424 and a right channel output coupled to the right source of imagewise modulated light 426. GPU 420 can output stereoscopic image data to the sources of imagewise modulated light 424, 426. The DSP audio spatializer 422 can output audio to a left speaker 412 and/or a right speaker 414. The DSP audio spatializer 422 can receive input from processor 416 indicating a direction vector from a user to a virtual sound source (which may be moved by the user, e.g., via the handheld controller 400B). Based on the direction vector, the DSP audio spatializer 422 can determine a corresponding HRTF (e.g., by accessing a HRTF, or by interpolating multiple HRTFs). The DSP audio spatializer 422 can then apply the determined HRTF to an audio signal, such as an audio signal corresponding to a virtual sound generated by a virtual object.
This can enhance the believability and realism of the virtual sound, by incorporating the relative position and orientation of the user relative to the virtual sound in the mixed reality environment—that is, by presenting a virtual sound that matches a user's expectations of what that virtual sound would sound like if it were a real sound in a real environment.
In some examples, such as shown in
While
Mixed Reality Environment
Like all people, a user of a mixed reality system exists in a real environment—that is, a three-dimensional portion of the “real world,” and all of its contents, that are perceptible by the user. For example, a user perceives a real environment using one's ordinary human senses—sight, sound, touch, taste, smell—and interacts with the real environment by moving one's own body in the real environment. Locations in a real environment can be described as coordinates in a coordinate space; for example, a coordinate can comprise latitude, longitude, and elevation with respect to sea level; distances in three orthogonal dimensions from a reference point; or other suitable values. Likewise, a vector can describe a quantity having a direction and a magnitude in the coordinate space.
A computing device can maintain, for example, in a memory associated with the device, a representation of a virtual environment. As used herein, a virtual environment is a computational representation of a three-dimensional space. A virtual environment can include representations of any object, action, signal, parameter, coordinate, vector, or other characteristic associated with that space. In some examples, circuitry (e.g., a processor) of a computing device can maintain and update a state of a virtual environment; that is, a processor can determine at a first time, based on data associated with the virtual environment and/or input provided by a user, a state of the virtual environment at a second time. For instance, if an object in the virtual environment is located at a first coordinate at the first time, and has certain programmed physical parameters (e.g., mass, coefficient of friction); and an input received from a user indicates that a force should be applied to the object in a direction vector; the processor can apply laws of kinematics to determine a location of the object at the second time. The processor can use any suitable information known about the virtual environment, and/or any suitable input, to determine a state of the virtual environment at a given time.
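As an illustration only, the state update described above can be sketched in Python. The class VirtualObject and the function step are hypothetical names that do not appear in this disclosure, and the sketch assumes a constant force over the interval and ignores friction.

```python
from dataclasses import dataclass


@dataclass
class VirtualObject:
    position: tuple  # (x, y, z) in meters
    velocity: tuple  # (vx, vy, vz) in meters per second
    mass: float      # kilograms


def step(obj, force, dt):
    """Advance the object from a first time to a second time dt seconds later.

    Assumes the force is constant over the interval (no friction).
    """
    accel = tuple(f / obj.mass for f in force)
    new_pos = tuple(p + v * dt + 0.5 * a * dt * dt
                    for p, v, a in zip(obj.position, obj.velocity, accel))
    new_vel = tuple(v + a * dt for v, a in zip(obj.velocity, accel))
    return VirtualObject(new_pos, new_vel, obj.mass)


ball = VirtualObject(position=(0.0, 0.0, 0.0), velocity=(1.0, 0.0, 0.0), mass=2.0)
moved = step(ball, force=(4.0, 0.0, 0.0), dt=1.0)
```

Here a 4 N force on a 2 kg object gives an acceleration of 2 m/s^2, so after one second the object has moved 2 m along x and travels at 3 m/s.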
In maintaining and updating a state of a virtual environment, the processor can execute any suitable software, including software relating to the creation and deletion of virtual objects in the virtual environment; software (e.g., scripts) for defining behavior of virtual objects or characters in the virtual environment; software for defining the behavior of signals (e.g., audio signals) in the virtual environment; software for creating and updating parameters associated with the virtual environment; software for generating audio signals in the virtual environment; software for handling input and output; software for implementing network operations; software for applying asset data (e.g., animation data to move a virtual object over time); or many other possibilities.
Output devices, such as a display or a speaker, can present any or all aspects of a virtual environment to a user. For example, a virtual environment may include virtual objects (which may include representations of inanimate objects; people; animals; lights; etc.) that may be presented to a user. A processor can determine a view of the virtual environment (for example, corresponding to a “camera” with an origin coordinate, a view axis, and a frustum); and render, to a display, a viewable scene of the virtual environment corresponding to that view. Any suitable rendering technology may be used for this purpose. In some examples, the viewable scene may include only some virtual objects in the virtual environment, and exclude certain other virtual objects. Similarly, a virtual environment may include audio aspects that may be presented to a user as one or more audio signals. For instance, a virtual object in the virtual environment may generate a sound originating from a location coordinate of the object (e.g., a virtual character may speak or cause a sound effect); or the virtual environment may be associated with musical cues or ambient sounds that may or may not be associated with a particular location. A processor can determine an audio signal corresponding to a “listener” coordinate—for instance, an audio signal corresponding to a composite of sounds in the virtual environment, and mixed and processed to simulate an audio signal that would be heard by a listener at the listener coordinate—and present the audio signal to a user via one or more speakers.
Because a virtual environment exists only as a computational structure, a user cannot directly perceive a virtual environment using one's ordinary senses. Instead, a user can perceive a virtual environment only indirectly, as presented to the user, for example by a display, speakers, haptic output devices, etc. Similarly, a user cannot directly touch, manipulate, or otherwise interact with a virtual environment; but can provide input data, via input devices or sensors, to a processor that can use the device or sensor data to update the virtual environment. For example, a camera sensor can provide optical data indicating that a user is trying to move an object in a virtual environment, and a processor can use that data to cause the object to respond accordingly in the virtual environment.
Reverberation Algorithms and Reverberators
In some embodiments, digital reverberators may be designed based on delay networks with feedback. In such embodiments, reverberator algorithm design guidelines may be included/available for accurate parametric decay time control and for maintaining reverberation loudness when decay time is varied. Relative adjustment of the reverberation loudness may be realized by providing an adjustable signal amplitude gain in cascade with the digital reverberator. This approach may enable a sound designer or a recording engineer to tune reverberation decay time and reverberation loudness independently, while audibly monitoring a reverberator output signal in order to achieve a desired effect.
In programmatic applications, such as interactive audio engines for video games or VR/AR/MR, that may simulate multiple moving sound sources at various positions and distances around a listener (e.g., a virtual listener) in a room/environment (e.g., a virtual room/environment), relative reverberation loudness control may not be sufficient. In some embodiments, an absolute reverberation loudness is applied that may be experienced from each virtual sound source at rendering time. Many factors may adjust this value, such as, for example, listener and sound source positions, as well as acoustic properties of the room/environment, for example, simulated by a reverberator. In some embodiments, such as in interactive audio applications, it may be desirable to programmatically control the reverberation initial power (RIP), for example, as defined in “Analysis and synthesis of room reverberation based on a statistical time-frequency model” by Jean-Marc Jot, Laurent Cerveau, and Olivier Warusfel. The RIP may be used to characterize a virtual room irrespective of positions of a virtual listener or virtual sound sources.
In some embodiments, a reverberation algorithm (executed by a reverberator) may be configured to perceptually match acoustic reverberation properties of a specific room. Example acoustic reverberation properties can include, but are not limited to, reverberation initial power (RIP) and reverberation decay time (T60). In some embodiments, the acoustic reverberation properties of a room may be measured in a real room, calculated by a computer simulation based on geometric and/or physical description of a real room or virtual room, or the like.
Example Audio Rendering System
Audio rendering system 500 can include a reverberation processing system 510A, a direct processing system 530, and a combiner 540. Both the reverberation processing system 510A and the direct processing system 530 can receive the input signal 501.
The reverberation processing system 510A can include a RIP control system 512 and a reverberator 514. The RIP control system 512 can receive the input signal 501 and can output a signal to the reverberator 514. The RIP control system 512 can include a reverb initial gain (RIG) 516 and a RIP corrector 518. The RIG 516 can receive a first portion of the input signal 501 and can output a signal to the RIP corrector 518. The RIG 516 can be configured to apply a RIG value to the input signal 501 (step 552 of process 550). Setting the RIG value can have an effect of specifying an absolute amount of RIP in the output signal of the reverberation processing system 510A.
The RIP corrector 518 can receive a signal from the RIG 516 and can be configured to calculate and apply a RIP correction factor to its input signal (from the RIG 516) (step 554). The RIP corrector 518 can output a signal to the reverberator 514. The reverberator 514 can receive a signal from the RIP corrector 518 and can be configured to introduce reverberation effects in the signal (step 556). The reverberation effects can be based on the virtual environment, for example. The reverberator 514 is discussed in more detail below.
The direct processing system 530 can include a propagation delay 532 and a direct gain 534. The direct processing system 530 and the propagation delay 532 can receive the second portion of the input signal 501. The propagation delay 532 can be configured to introduce a delay in the input signal 501 (step 558) and can output the delayed signal to the direct gain 534. The direct gain 534 can receive a signal from the propagation delay 532 and can be configured to apply a gain to the signal (step 560).
The combiner 540 can receive the output signals from both the reverberation processing system 510A and the direct processing system 530 and can be configured to combine (e.g., add, aggregate, etc.) the signals (step 562). The output from the combiner 540 can be the output signal 502 of the audio rendering system 500.
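As an illustration only, the signal flow of the audio rendering system described above can be sketched in Python. The functions render and toy_reverberator and all parameter names are hypothetical; in particular, toy_reverberator is a stand-in that produces a single echo and is not the reverberator of this disclosure.

```python
def render(input_signal, rig_value, rip_correction, reverberator,
           delay_samples, direct_gain):
    # Reverberation path: RIG, then RIP corrector, then reverberator.
    scaled = [rig_value * rip_correction * x for x in input_signal]
    reverb_out = list(reverberator(scaled))
    # Direct path: propagation delay, then direct gain.
    delayed = [0.0] * delay_samples + list(input_signal)
    direct_out = [direct_gain * x for x in delayed]
    # Combiner: add the two paths sample by sample, zero-padding the shorter one.
    length = max(len(reverb_out), len(direct_out))
    reverb_out += [0.0] * (length - len(reverb_out))
    direct_out += [0.0] * (length - len(direct_out))
    return [r + d for r, d in zip(reverb_out, direct_out)]


def toy_reverberator(signal):
    """Hypothetical stand-in: one echo, 3 samples later, at half amplitude."""
    out = [0.0] * (len(signal) + 3)
    for n, x in enumerate(signal):
        out[n + 3] += 0.5 * x
    return out


output = render([1.0, 0.0], rig_value=1.0, rip_correction=2.0,
                reverberator=toy_reverberator, delay_samples=1, direct_gain=0.5)
```

The two gains in the reverberation path are multiplied together because they are applied in series; any real reverberator with the same call signature could replace the stand-in.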
Example Reverberation Initial Power (RIP) Normalization
In the reverberation processing system 510A, the RIG 516 and the RIP corrector 518 can apply (and/or calculate) the RIG value and the RIP correction factor, respectively, such that when applied in series the signal output from the RIP corrector 518 can be normalized to a predetermined value (e.g., unity (1.0)). That is, the RIP of an output signal can be controlled by applying the RIG 516 in series with the RIP corrector 518. In some embodiments, the RIP correction factor can be applied directly after the RIG value. The RIP normalization process is discussed in more detail below.
In some embodiments, in order to produce a diffuse reverberation tail, a reverberation algorithm may, for instance, include parallel comb filters, followed by a series of all-pass filters. In some embodiments, a digital reverberator may be constructed as a network including one or more delay units interconnected with feedback and/or feedforward paths that may also include signal gain scaling or filter units. The RIP correction factor of a reverberation processing system such as the reverberation processing system 510A of
In some embodiments, the RIP correction factor of the reverberation processing system may be equal to a root mean square (RMS) power of an impulse response of the reverberation system when a reverberation time is set to infinity. In some embodiments, for example, as illustrated in
The RMS power Prms(t) of a digital signal {x} at time t, expressed in samples, may be equal to an average of a squared signal amplitude. In some embodiments, the RMS power may be expressed as:
$$P_{rms}(t) = \frac{1}{N} \sum_{n=0}^{N-1} x(t+n)^2 \qquad (1)$$
where t is the time, N is the number of consecutive signal samples in the window, and n is the sample index within the window. The average may be evaluated over a signal window starting at time t and containing N consecutive signal samples.
The RMS amplitude may be equal to the square root of the RMS power Prms(t). In some embodiments, the RMS amplitude may be expressed as:
$$A_{rms}(t) = \sqrt{P_{rms}(t)} \qquad (2)$$
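As an illustration only, the RMS power and RMS amplitude defined above can be computed as in the following Python sketch. The function names rms_power and rms_amplitude are hypothetical. The sketch divides by the window length N even if the window runs past the end of the signal, which amounts to treating the missing samples as zeros.

```python
import math


def rms_power(x, t, n_window):
    """Average of the squared amplitude over n_window samples starting at t."""
    window = x[t:t + n_window]
    return sum(s * s for s in window) / n_window


def rms_amplitude(x, t, n_window):
    """Square root of the RMS power over the same window."""
    return math.sqrt(rms_power(x, t, n_window))


signal = [3.0, -4.0, 3.0, -4.0]
power = rms_power(signal, 0, 4)          # (9 + 16 + 9 + 16) / 4
amplitude = rms_amplitude(signal, 0, 4)  # square root of the value above
```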
In some embodiments, in the impulse response of the reverberator (e.g., as illustrated in
In some embodiments, the reverberation time of the reverberation processing system 510A may be set to a finite value. With the finite value, the RMS power may substantially follow an exponential decay (after a reverberation onset time), as shown in
Example Reverberators
In some embodiments, the reverberator 514 (of
In some embodiments, the RIP correction factor for the reverberator may be calculated by setting the reverberation time to infinity. Setting the reverberation time to infinity may be equivalent to assuming that the comb filters do not have any built-in attenuation. If a Dirac impulse is input through the comb filters, the output signal of the reverberator 514 may be a sequence of full scale impulses, for example.
In some embodiments, the reverberator may have a plurality of comb filters, and the RMS amplitude may be expressed as:
$$A_{rms} = \sqrt{N / d_{mean}} \qquad (4)$$
where N is the number of comb filters in the reverberator, and dmean is the mean feedback delay length. The mean feedback delay length dmean may be expressed in samples and averaged across the N comb filters.
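As an illustration only, the following Python sketch measures the RMS amplitude of a bank of parallel comb filters with no attenuation and compares it with the square root of N divided by dmean. The function name and the delay lengths are hypothetical; the delays are chosen mutually prime so that impulses from different comb filters do not coincide within the measured window.

```python
import math


def comb_bank_impulse_response(delays, length):
    """Sum of lossless feedback comb filters driven by a unit impulse.

    With no attenuation, a comb filter with delay d emits a full-scale
    impulse every d samples (at d, 2d, 3d, ...).
    """
    out = [0.0] * length
    for d in delays:
        for n in range(d, length, d):
            out[n] += 1.0
    return out


delays = [1009, 1013, 1019, 1021]   # hypothetical, mutually prime lengths
length = 200_000
response = comb_bank_impulse_response(delays, length)

measured = math.sqrt(sum(s * s for s in response) / length)
d_mean = sum(delays) / len(delays)
predicted = math.sqrt(len(delays) / d_mean)
```

Each comb filter contributes an average power of about one over its delay length, so the measured and predicted values agree closely when the delay lengths are similar to one another.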
Reverberation processing system 510B can include a RIP control system 512 and a reverberator 1114. The RIP control system 512 can include a RIG 516 and a RIP corrector 518. The RIP control system 512 and the RIP corrector 518 can be correspondingly similar to those included in the reverberation processing system 510A (of
The RIG 516 may be configured to apply a RIG value (step 1152 of process 1150), and the RIP corrector 518 can apply a RIP correction factor (step 1154), both in series with the reverberator 1114. The serial configuration of the RIG 516, the RIP corrector 518, and the reverberator 1114 may cause the RIP of the reverberation processing system 510B to be equal to the RIG value.
In some embodiments, the RIP correction factor can be expressed as:
$$RIP_{correction} = \sqrt{d_{mean} / N} \qquad (5)$$
The application of the RIP correction factor to the signal can cause the RIP to be set to a predetermined value, such as unity (1.0), when the RIG value is set to 1.0.
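As an illustration only, the following Python sketch applies a RIP correction factor to a unit impulse ahead of a bank of comb filters with no attenuation and checks that the resulting RMS level is close to 1.0. It assumes that the correction factor is the reciprocal of the RMS amplitude of the uncorrected comb filter bank, taken here as the square root of dmean divided by N. The function name rip_correction and the delay lengths are hypothetical.

```python
import math


def rip_correction(delays):
    """Assumed correction: reciprocal of the RMS amplitude sqrt(N / d_mean)."""
    d_mean = sum(delays) / len(delays)
    return math.sqrt(d_mean / len(delays))


delays = [1009, 1013, 1019, 1021]   # hypothetical comb delay lengths, in samples
correction = rip_correction(delays)

# Scale a unit impulse by the correction (RIG value assumed to be 1.0),
# then pass it through comb filters that have no attenuation.
length = 200_000
corrected = [0.0] * length
for d in delays:
    for n in range(d, length, d):
        corrected[n] += correction

rms_level = math.sqrt(sum(s * s for s in corrected) / length)
```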
The reverberator 1114 can receive a signal from the RIP control system 512 and can be configured to introduce reverberation effects into the first portion of the input signal (step 1156). The reverberator 1114 can include one or more comb filters 1115. The comb filter(s) 1115 can be configured to filter out one or more frequencies in the signal (step 1158). For example, the comb filter(s) 1115 can filter out (e.g., cancel) one or more frequencies to mimic environmental effects (e.g., the walls of the room). The reverberator 1114 can output two or more output signals 502A and 502B (step 1160).
Reverberation processing system 510C can be similar to the reverberation processing system 510B (of
The reverberation processing system 510C can include a RIP control system 512 and a reverberator 1214. The RIP control system 512 can include a RIG 516 and a RIP corrector 518. The RIP control system 512 and the RIP corrector 518 can be correspondingly similar to those included in the reverberation processing system 510A (of
The reverberator 1214 may additionally include all-pass filters 1215 that can receive signals from the comb filters 1115. Each all-pass filter 1215 can receive a signal from the comb filters 1115 and can be configured to pass its input signal without changing its magnitude (step 1262). In some embodiments, the all-pass filter 1215 can change a phase of the signal. In some embodiments, each all-pass filter can receive a unique signal from the comb filters. The outputs of the all-pass filters 1215 can be the output signals 502 of the reverberation processing system 510C and the audio rendering system 500. For example, the all-pass filter 1215A can receive a unique signal from the comb filters 1115 and can output the signal 502A; similarly, the all-pass filter 1215B can receive a unique signal from the comb filters 1115 and can output the signal 502B.
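As an illustration only, the following Python sketch shows one common all-pass structure, the Schroeder all-pass filter, and checks that the energy of its impulse response equals the energy of the input impulse. The choice of this structure, the function name schroeder_allpass, and the parameter values are assumptions; this disclosure does not specify the internal structure of the all-pass filters 1215.

```python
def schroeder_allpass(x, delay, g):
    """y[n] = -g*x[n] + x[n - delay] + g*y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + x_d + g * y_d
    return y


impulse = [1.0] + [0.0] * 9999
response = schroeder_allpass(impulse, delay=7, g=0.5)
energy_in = sum(s * s for s in impulse)
energy_out = sum(s * s for s in response)
```

The filter spreads the impulse over time, which changes phase, while the total energy is preserved, which is why such a stage does not alter the RMS level established upstream.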
Comparing to
When applying the RIP correction factor, if the reverberation time is set to infinity, the RIG value is set to 1.0, and a single unit impulse is input through the reverberation processing system 510C, a noise-like output with a constant RMS level of 1.0 may be obtained.
In some embodiments, the RIP normalization method described in connection with
Example Feedback Delay Networks
In some embodiments, the reverberator may include a feedback delay network (FDN). The FDN may include an identity matrix, which may allow the output of a delay unit to be fed back to its input.
The combiners 1522 can receive the input signal 1501 and can be configured to combine (e.g., add, aggregate, etc.) their inputs (step 1552 of process 1550). The combiners 1522 can also receive a signal from the feedback matrix 1520. The delays 1524 can receive the combined signals from the combiners 1522 and can be configured to introduce a delay into one or more signals (step 1554). The gains 1526 can receive the signals from the delays 1524 and can be configured to introduce a gain into one or more signals (step 1556). The output signals from the gains 1526 can form the output signal 1502 and may also be input into the feedback matrix 1520. In some embodiments, the feedback matrix 1520 may be an N×N unitary (energy-preserving) matrix.
In the general case where the feedback matrix 1520 is a unitary matrix, the expression of the RIP correction factor may also be given by Equation (5) because the overall energy transfer around the feedback loop of the reverberator remains unchanged and delay-free.
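As an illustration only, the following Python sketch runs a small feedback delay network with unit gains (corresponding to an infinite reverberation time) and checks that the energy stored in the delay lines stays constant after a unit impulse is injected. The Householder matrix used here is one assumed example of a unitary matrix, and the function names and delay lengths are hypothetical.

```python
def householder(n):
    """An n-by-n orthogonal (real unitary) matrix: identity minus 2/n everywhere."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)]
            for i in range(n)]


def fdn_stored_energy(delays, steps):
    """Run a lossless FDN on a unit impulse and return the stored energy.

    Each delay line is a circular buffer. Gains are 1.0, so only the
    feedback matrix mixes the signals between delay lines.
    """
    n = len(delays)
    matrix = householder(n)
    lines = [[0.0] * d for d in delays]
    pos = [0] * n
    for step in range(steps):
        outs = [lines[i][pos[i]] for i in range(n)]           # delay outputs
        feedback = [sum(matrix[i][j] * outs[j] for j in range(n))
                    for i in range(n)]
        x = 1.0 if step == 0 else 0.0                         # unit impulse
        for i in range(n):
            lines[i][pos[i]] = x + feedback[i]                # combiner
            pos[i] = (pos[i] + 1) % delays[i]
    return sum(s * s for line in lines for s in line)


energy = fdn_stored_energy([7, 11, 13], steps=5000)
```

The impulse places one unit of energy in each of the three delay lines, and because the matrix preserves the energy of the vector of delay outputs at every step, the stored total remains 3.0 apart from rounding error.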
For a given arbitrary choice of reverberator design and internal parameter settings, a RIP correction factor may be calculated, for example. The calculated RIP correction factor may be such that if the RIG value is set to 1.0, then the RIP of the overall reverberation processing system 510 is also 1.0.
In some embodiments, the reverberator may include a FDN with one or more all-pass filters.
FDN 1615 can include a plurality of all-pass filters 1630, a plurality of delays 1632, and a mixing matrix 1640B. The all-pass filters 1630 can include a plurality of gains 1526, an absorptive delay 1632, and another mixing matrix 1640A. The FDN 1615 may also include a plurality of combiners (not shown).
The all-pass filters 1630 can receive the input signal 1501 and may be configured to pass their input signals without changing their magnitudes. In some embodiments, the all-pass filter 1630 can change a phase of the signal. In some embodiments, each all-pass filter 1630 can be configured such that power input to the all-pass filter 1630 can be equal to power output from the all-pass filter. In other words, each all-pass filter 1630 may have no absorption. Specifically, the absorptive delay 1632 can receive the input signal 1501 and can be configured to introduce a delay in the signal. In some embodiments, the absorptive delay 1632 can delay its input signal by a number of samples. In some embodiments, each absorptive delay 1632 can have a level of absorption such that its output signal is a certain level less than its input signal.
The gains 1526A and 1526B can each be configured to introduce a gain in its respective input signal. The input signal for the gain 1526A can be the input signal to the absorptive delay, and the output signal for the gain 1526B can be the output signal to the mixing matrix 1640A.
The output signals from the all-pass filters 1630 can be input signals to delays 1632. The delays 1632 can receive signals from the all-pass filters 1630 and can be configured to introduce delays into their respective signals. In some embodiments, the output signals from the delays 1632 can be combined to form the output signal 1502; in other embodiments, these signals may be taken separately as multiple output channels. In some embodiments, the output signal 1502 may be taken from other points in the network.
The output signals from the delays 1632 can also be input signals into the mixing matrix 1640B. The mixing matrix 1640B can be configured to receive multiple input signals and can output its signals to be fed back into the all-pass filters 1630. In some embodiments, each mixing matrix can be a full mixing matrix.
In these reverberator topologies, the RIP correction factor may be expressed by Equation (5) because the overall energy transfer in and around the feedback loop of the reverberator can remain unchanged and delay-free. In some embodiments, the FDN 1615 may vary the input and/or output signal placement to achieve the desired output signal 1502.
The FDN 1615 with the all-pass filters 1630 can be a reverberating system that takes the input signal 1501 as its input and creates a multi-channel output that can include the correct decaying reverberation signal. The input signal 1501 can be a mono input signal.
In some embodiments, the RIP correction factor may be expressed as a mathematical function of a set of reverberator parameters {P} that determine the reverberation RMS amplitude Arms({P}) when the reverberation time is set to infinity, as shown in
$$RIP_{correction} = 1 / A_{rms}(\{P\}) \qquad (6)$$
For a given reverberator topology and a given setting of delay unit lengths of the reverberator, the RIP correction factor may be calculated by performing the following steps: (1) setting the reverberation time to infinity; (2) recording the reverberator impulse response (as shown in
In some embodiments, the RIP correction factor may be calculated by performing the following steps: (1) setting the reverberation time to any finite value; (2) recording the reverberator impulse response; (3) deriving the reverberation RMS amplitude decay curve Arms(t) (as shown in
$$RIP_{correction} = 1 / A_{rms}(0) \qquad (7)$$
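As an illustration only, the following Python sketch estimates a RIP correction factor from an impulse response recorded with a finite reverberation time. It assumes that the relevant quantity is the RMS amplitude decay curve extrapolated back to the time of emission (t = 0), and that the decay is exponential so that a straight line can be fitted to the logarithm of the RMS power. The function names are hypothetical, and the impulse response is a synthetic stand-in (decaying noise), not a measurement.

```python
import math
import random


def rms_decay_curve(h, window):
    """Return (window-center time, RMS power) pairs over consecutive windows."""
    curve = []
    for start in range(0, len(h) - window + 1, window):
        power = sum(s * s for s in h[start:start + window]) / window
        curve.append((start + window / 2, power))
    return curve


def rip_correction_from_response(h, window):
    """Fit a line to log-power versus time and extrapolate back to t = 0."""
    points = [(t, math.log(p)) for t, p in rms_decay_curve(h, window) if p > 0.0]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in points)
             / sum((t - mean_t) ** 2 for t, _ in points))
    log_power_at_zero = mean_y - slope * mean_t
    a_rms_zero = math.sqrt(math.exp(log_power_at_zero))
    return 1.0 / a_rms_zero


# Synthetic stand-in for a recorded impulse response: noise with an RMS
# amplitude of 0.25 at t = 0, decaying by 60 dB over 48000 samples.
rng = random.Random(0)
decay = math.log(1000.0) / 48000.0
h = [0.25 * rng.gauss(0.0, 1.0) * math.exp(-decay * n) for n in range(48000)]
correction = rip_correction_from_response(h, window=1000)
```

For this synthetic response the estimate should be close to 4.0, the reciprocal of the initial RMS amplitude of 0.25, with a small error due to the randomness of the noise.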
Example Reverberation Energy Normalization Method
In some embodiments, it may be desirable to provide a perceptually relevant reverberation gain control method, for example, for application developers, sound engineers, and the like. For example, in some reverberator or room simulator embodiments, it may be desirable to provide programmatic control over a measure of a power amplification factor representative of an effect of a reverberation processing system on the power of an input signal. The power of an input signal may be expressed in dB, for example. The programmatic control over the power amplification factor may allow application developers, sound engineers, and the like, for example, to determine a balance between reverberation output signal loudness and input signal loudness, or direct sound output signal loudness.
In some embodiments, the system can apply a reverberation energy (RE) correction factor.
Reverberation processing system 510D can include a RIP control system 512 and a reverberator 514. The RIP control system 512 can include a RIG 516 and a RIP corrector 518. The RIP control system 512, the reverberator 514, and the RIP corrector 518 can be correspondingly similar to those included in the reverberation processing system 510A (of
The reverberation processing system 510D may also include a RIG 516 that comprises a reverb gain (RG) 1716 and a RE corrector 1717. The RG 1716 can receive the input signal 501 and can output a signal to the RE corrector 1717. The RG 1716 can be configured to apply a RG value to the first portion of the input signal 501 (step 1752 of process 1750). In some embodiments, the RIG can be realized by cascading the RG 1716 with the RE corrector 1717, such that the RE correction factor is applied to the first portion of the input signal after the RG value is applied. In some embodiments, the RIG 516 can be cascaded with the RIP corrector 518, forming the RIP control system 512 that is cascaded with the reverberator 514.
The RE corrector 1717 can receive a signal from the RG 1716 and can be configured to calculate and apply a RE correction factor to its input signal (from the RG 1716) (step 1754). In some embodiments, the RE correction factor may be calculated such that it represents the total energy in a reverberator impulse response when: (1) a RIP is set to 1.0, and (2) a reverberation onset time is set equal to the time of emission of a unit impulse by a sound source. The RG 1716 and the RE corrector 1717 can apply (and/or calculate) the RG value and the RE correction factor, respectively, such that when applied in series, the signal output from the RE corrector 1717 can be normalized to a predetermined value (e.g., unity (1.0)). The RIP of an output signal can be controlled by applying a reverberator gain in series with the reverberator, the RE correction factor, and the RIP correction factor, as shown in
The RIP corrector 518 can receive a signal from the RIG 516 and can be configured to calculate and apply a RIP correction factor to its input signal (from the RIG 516) (step 1756). The reverberator 514 can receive a signal from the RIP corrector 518 and can be configured to introduce reverberation effects in the signal (step 1758).
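As one illustrative, non-limiting sketch, the series arrangement described above (RG value, then RE correction factor, then RIP correction factor, then reverberator) can be written as a chain of scalar gains followed by a reverberator. The function names, the list-based signal representation, and the `reverberator` callable are assumptions made only for this example and are not part of the disclosure.

```python
def apply_gain(signal, gain):
    """Scale every sample of the signal by a scalar gain."""
    return [gain * x for x in signal]


def process(signal, rg, rec, rip_correction, reverberator):
    """Apply RG, RE correction, and RIP correction in series, then reverberate.

    `reverberator` is a hypothetical callable that maps a list of samples
    to a list of samples; any reverberation algorithm could stand in for it.
    """
    after_rg = apply_gain(signal, rg)                  # RG stage (step 1752)
    after_rec = apply_gain(after_rg, rec)              # RE corrector stage (step 1754)
    after_rip = apply_gain(after_rec, rip_correction)  # RIP corrector stage (step 1756)
    return reverberator(after_rip)                     # reverberator stage (step 1758)


if __name__ == "__main__":
    # An identity "reverberator" makes the combined gain easy to inspect.
    print(process([1.0, 0.5], rg=0.5, rec=2.0, rip_correction=3.0,
                  reverberator=lambda s: list(s)))
```

Because the three gain stages are scalar multiplications, they could equally be folded into a single multiplication ahead of the reverberator; they are kept separate here only to mirror the stages named in the description.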
In some embodiments, the RIP of a virtual room may be controlled using the reverberation processing system 510A of
The RG 516 of the reverberation processing system 510D (of
In some embodiments, the RE can be calculated and used to represent the amplification of an input signal by a reverberation processing system. The amplification may be expressed in terms of signal power. As shown in
In some embodiments, the RMS power curve may be expressed as a continuous function of time t. In such instance, the RE may be expressed as:
$\mathrm{RE} = \int_{0}^{\infty} P_{rms}(t)\,dt$ (8)
In some embodiments, such as discrete-time embodiments of a reverberation processing system, the RMS power curve can be expressed as a function of the discrete time t=n/Fs. In such instance, the RE may be expressed as:
$\mathrm{RE} = \frac{1}{F_s} \sum_{n=0}^{\infty} P_{rms}(n/F_s)$ (9)
where Fs is the sample rate.
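For illustration only, a discrete-time RE can be approximated by summing a sampled RMS power curve and dividing by the sample rate Fs, which is a Riemann-sum reading of the area under the curve. The function names and the exponential test curve are assumptions of this sketch; a practical system would truncate the sum once the curve has decayed to a negligible level.

```python
import math


def reverberation_energy(power_curve, sample_rate):
    """Approximate RE as (1/Fs) times the sum of Prms(n/Fs) over the samples."""
    return sum(power_curve) / sample_rate


def exponential_power_curve(rip, alpha, sample_rate, duration):
    """Sample a hypothetical curve Prms(t) = RIP * exp(-alpha * t) at t = n/Fs."""
    n_samples = int(duration * sample_rate)
    return [rip * math.exp(-alpha * n / sample_rate) for n in range(n_samples)]


if __name__ == "__main__":
    fs = 48000
    alpha = 3 * math.log(10) / 1.0  # decay parameter for a decay time of 1 second
    curve = exponential_power_curve(1.0, alpha, fs, duration=3.0)
    # For this curve the continuous-time area is 1/alpha, so the two
    # printed values should agree closely.
    print(reverberation_energy(curve, fs), 1.0 / alpha)
```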
In some embodiments, a RE correction factor (REC) may be calculated and applied in series with the RIP correction factor and the reverberator, so that the RE may be normalized to a predetermined value (e.g., unity (1.0)). The REC may be set equal to the reciprocal of the square root of RE, as follows:
$\mathrm{REC} = 1/\sqrt{\mathrm{RE}}$ (10)
In some embodiments, a RIP of an output reverberation signal may be controlled by applying a RG value in series with a RE correction factor, a RIP correction factor, and a reverberator, such as shown in the reverberation processing system 510C of
RIG=RG*REC (11)
Therefore, the RE correction factor (REC) may be used to control the RIP correction factor in terms of the signal-domain RG quantity, instead of the RIG.
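A minimal sketch of these two relationships follows, assuming that the RE has already been computed and is positive. The REC is taken as the reciprocal of the square root of RE, as stated above, and the RIG follows Equation (11). The function names are hypothetical.

```python
import math


def rec_from_re(re):
    """RE correction factor: reciprocal of the square root of RE."""
    if re <= 0.0:
        raise ValueError("RE must be positive")
    return 1.0 / math.sqrt(re)


def rig_from_rg(rg, re):
    """Reverb initial gain per Equation (11): RIG = RG * REC."""
    return rg * rec_from_re(re)


if __name__ == "__main__":
    print(rec_from_re(4.0))       # 0.5
    print(rig_from_rg(2.0, 4.0))  # 1.0
```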
In some embodiments, the RIP may be mapped to a signal power amplification measure derived from the integrated RE in the system impulse response. As shown above in Equations (10)-(11), this mapping allows the RIP to be controlled via the familiar notion of a signal amplification factor, namely, the RG. In some embodiments, the advantage of assuming instant reverberation onset for the RE calculation, as shown in
In some embodiments, the reverb RMS power curve of an impulse response of the reverberator 514 can be expressed as a decaying function of time. The decaying function of time can start at time t=0.
$P_{rms}(t) = \mathrm{RIP} \cdot e^{-\alpha t}$ (12)
In some embodiments, the decay parameter can be expressed as a function of decay time T60, as follows:
$\alpha = 3 \cdot \log(10)/T_{60}$ (13)
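Equations (12) and (13) can be evaluated directly, as in the following sketch. It assumes that log in Equation (13) denotes the natural logarithm, which is the reading consistent with the base-e exponential in Equation (12); the function names are illustrative only.

```python
import math


def alpha_from_t60(t60):
    """Decay parameter per Equation (13), assuming a natural logarithm."""
    if t60 <= 0.0:
        raise ValueError("T60 must be positive")
    return 3.0 * math.log(10.0) / t60


def rms_power(t, rip, t60):
    """Decaying curve per Equation (12): Prms(t) = RIP * exp(-alpha * t)."""
    return rip * math.exp(-alpha_from_t60(t60) * t)


if __name__ == "__main__":
    # At t = T60 the curve has fallen to RIP times 10**-3 under this reading.
    print(rms_power(2.0, rip=1.0, t60=2.0))
```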
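The total RE may be expressed as:
$\mathrm{RE} = \int_{0}^{\infty} P_{rms}(t)\,dt = \mathrm{RIP}/\alpha$ (14)
In some embodiments, the RIP may be normalized to a predetermined value (e.g., unity (1.0)), and the REC may be expressed as follows:
$\mathrm{REC} = \sqrt{\alpha} = \sqrt{3 \cdot \log(10)/T_{60}}$ (15)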
In some embodiments, the REC may be approximated according to the following equation:
Due to the application of the REC factor, adjusting the RG value or the reverberation decay time T60 at runtime may have an effect of automatically correcting the RIP of the reverberation processing system such that the RG can operate as an amplification factor for the RMS amplitude of an output signal (e.g., output signal 502) relative to the RMS amplitude of an input signal (e.g., input signal 501). It should be noted that adjusting the reverberation decay time T60 may not require recalculating the RIP correction factor because, in some embodiments, the RIP may not be affected by a modification of the decay time.
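The runtime behavior described above can be sketched as follows. The sketch assumes the exponential decay of Equation (12) starting at t=0 with the RIP normalized to 1.0; integrating that curve gives RE = 1/α, so the REC reduces to the square root of α. The class and attribute names are hypothetical. Changing T60 changes only the REC, while the RIP correction factor is stored once and left untouched.

```python
import math


class ReverbGainControl:
    """Holds RG, T60, and a precomputed RIP correction factor (hypothetical)."""

    def __init__(self, rg, t60, rip_correction):
        self.rg = rg
        self.t60 = t60
        self.rip_correction = rip_correction  # not recomputed when T60 changes

    def rec(self):
        """REC = sqrt(alpha), with alpha = 3 * ln(10) / T60 per Equation (13)."""
        return math.sqrt(3.0 * math.log(10.0) / self.t60)

    def total_gain(self):
        """Combined scalar gain applied ahead of the reverberator."""
        return self.rg * self.rec() * self.rip_correction


if __name__ == "__main__":
    ctrl = ReverbGainControl(rg=1.0, t60=1.0, rip_correction=2.0)
    before = ctrl.total_gain()
    ctrl.t60 = 4.0  # a longer decay carries more energy, so the REC shrinks
    print(ctrl.total_gain() / before)  # approximately 0.5
```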
In some embodiments, the REC may be defined based on measuring the RE as the energy in the reverberation tail between two points specified in time from a sound source emission, after having set the RIP to 1.0 by applying the RIP correction factor. This may be beneficial, for example, when using convolution with a measured reverberation tail.
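One possible way to measure the RE between two points in time on a measured reverberation tail is sketched below. It assumes that the tail has already been scaled so that its RIP is 1.0, and it uses the squared samples as a stand-in for the power curve; both the function name and that choice of power estimate are assumptions of this example.

```python
def windowed_reverberation_energy(tail, sample_rate, t_start, t_end):
    """Energy of the tail between t_start and t_end (seconds after emission).

    The squared samples are summed over the window and divided by the
    sample rate, mirroring a discrete-time area under the power curve.
    """
    n_start = max(0, int(round(t_start * sample_rate)))
    n_end = min(len(tail), int(round(t_end * sample_rate)))
    return sum(x * x for x in tail[n_start:n_end]) / sample_rate


if __name__ == "__main__":
    tail = [1.0, 0.5, 0.5, 0.25]
    print(windowed_reverberation_energy(tail, sample_rate=4,
                                        t_start=0.25, t_end=0.75))  # 0.125
```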
In some embodiments, the RE correction factor may be defined based on measuring the RE as the energy in the reverberation tail between two points defined using energy thresholds, after having set the RIP to 1.0 by applying the RIP correction factor. In some embodiments, energy thresholds relative to the direct sound, or absolute energy thresholds, may be used.
In some embodiments, the RE correction factor may be defined based on measuring the RE as the energy in the reverberation tail between one point defined in time and one point defined using an energy threshold, after having set the RIP to 1.0 by applying the RIP correction factor.
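The threshold-based variants can be sketched in a similar way. The example below assumes a sampled power curve that decays monotonically, expresses each threshold in dB relative to a caller-supplied reference power (for instance, the power of the direct sound), and starts or stops the measurement at the first sample that falls below the corresponding threshold. All names are hypothetical. A mixed time-and-threshold window could be obtained by replacing either index with one computed from a time value.

```python
def first_index_below(power_curve, threshold_db, reference_power):
    """Index of the first sample whose power is below the threshold."""
    limit = reference_power * 10.0 ** (threshold_db / 10.0)
    for n, power in enumerate(power_curve):
        if power < limit:
            return n
    return len(power_curve)


def thresholded_reverberation_energy(power_curve, sample_rate,
                                     start_db, end_db, reference_power):
    """Energy between the points where the curve crosses two thresholds."""
    n_start = first_index_below(power_curve, start_db, reference_power)
    n_end = first_index_below(power_curve, end_db, reference_power)
    return sum(power_curve[n_start:n_end]) / sample_rate


if __name__ == "__main__":
    curve = [1.0, 0.1, 0.01, 0.001, 0.0001]
    # Measures from the first sample below -5 dB to the first below -25 dB.
    print(thresholded_reverberation_energy(curve, 1, -5.0, -25.0, 1.0))
```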
In some embodiments, such as where an acoustical environment includes two or more coupled spaces, the RE correction factor may be computed by considering a weighted sum of the energy contributed by the different coupled spaces, after having set the RIP of each of the reverberation tails to 1.0 by applying the RIP correction factor to each reverb.
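A weighted-sum computation for coupled spaces might look like the following sketch. It assumes that each per-space RE has been measured with that space's RIP set to 1.0, that the weights are supplied by the caller (how they are chosen is outside the scope of this example), and that the combined REC is the reciprocal of the square root of the weighted total.

```python
import math


def coupled_rec(energies, weights):
    """REC from a weighted sum of the REs of several coupled spaces."""
    if len(energies) != len(weights):
        raise ValueError("energies and weights must have the same length")
    total = sum(w * e for w, e in zip(weights, energies))
    if total <= 0.0:
        raise ValueError("weighted energy must be positive")
    return 1.0 / math.sqrt(total)


if __name__ == "__main__":
    # Two coupled spaces with equal weights: weighted RE = 4.0, REC = 0.5.
    print(coupled_rec([2.0, 6.0], [0.5, 0.5]))
```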
With respect to the systems and methods described above, elements of the systems and methods can be implemented by one or more computer processors (e.g., CPUs or DSPs) as appropriate. The disclosure is not limited to any particular configuration of computer hardware, including computer processors, used to implement these elements. In some cases, multiple computer systems can be employed to implement the systems and methods described above. For example, a first computer processor (e.g., a processor of a wearable device coupled to a microphone) can be utilized to receive input microphone signals, and perform initial processing of those signals (e.g., signal conditioning and/or segmentation, such as described above). A second (and perhaps more computationally powerful) processor can then be utilized to perform more computationally intensive processing, such as determining probability values associated with speech segments of those signals. Another computer device, such as a cloud server, can host a speech recognition engine, to which input signals are ultimately provided. Other suitable configurations will be apparent and are within the scope of the disclosure.
Although the disclosed examples have been fully described with reference to the accompanying drawings, it is to be noted that various changes and modifications will become apparent to those skilled in the art. For example, elements of one or more implementations may be combined, deleted, modified, or supplemented to form further implementations. Such changes and modifications are to be understood as being included within the scope of the disclosed examples as defined by the appended claims.
This application is a continuation of U.S. Non-Provisional patent application Ser. No. 16/442,359, filed on Jun. 14, 2019, which claims benefit of U.S. Provisional Patent Application No. 62/685,235, filed on Jun. 14, 2018, which are hereby incorporated by reference in their entirety.
Number | Date | Country | |
---|---|---|---|
20210065675 A1 | Mar 2021 | US |
Number | Date | Country | |
---|---|---|---|
62685235 | Jun 2018 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 16442359 | Jun 2019 | US |
Child | 17020584 | US |