This invention relates generally to acoustic modeling, and more particularly, to a system and method for adjusting delay of an audio signal.
There is growing interest in improving methods and systems for audio displays that can present audio signals conveying accurate impressions of three-dimensional sound fields. Such audio display systems use techniques that model the transfer of acoustic energy from one point to another in a sound environment. The realism of an acoustic display can be enhanced by including ambient effects, one of the most important of which is reflection. A listener hears a sound not only directly from the source but also as reflections of that sound from nearby objects. In most environments, the sound field at a particular point, such as a listener's ear, comprises sound waves arriving along a direct path from the sound source and along paths that reflect off one or more surfaces such as walls, floor, ceiling, and other objects. Sounds can be heard not only as emanating from a sound source, but also as they reflect off walls, leak through doors from an adjoining room, become occluded as they disappear around a corner, or suddenly appear overhead as a listener steps out of a room into the open.
Once a sound wave has been emitted, it travels through an environment where several things happen. The sound can travel directly to the listener (direct path), bounce off an object once and then reach the listener (first order reflected path), bounce off two surfaces before reaching the listener (second order reflected path), and so on. Second and higher order reflections usually combine to form late field reflections, or reverb. The direction of arrival of a reflection is generally not the same as that of the direct path sound wave. The propagation path of a reflected sound wave is longer than that of the direct path sound wave, so reflections arrive later. In addition, the amplitude and spectral content of a reflection will generally differ because of the energy absorbing qualities of the reflective surfaces. Reflections add to the naturalness and immersiveness of the sound field and provide cues to the size, shape, and composition of the acoustic environment.
In addition to the variable propagation delay of reflections, the times at which a sound is heard by a listener's right and left ears differ depending on the location of the sound source, an effect known as interaural time difference (ITD). Because of ITD, a sound will typically arrive earlier at one ear than at the other. If the sound arrives at the left ear first, for example, the listener perceives the sound as coming from somewhere to the left.
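The size of ITD can be estimated with Woodworth's spherical-head approximation, ITD ≈ (a/c)(θ + sin θ), a standard model that this description does not itself specify. The sketch below is illustrative only; the head radius a = 8.75 cm and speed of sound c = 343 m/s are assumed typical values, not figures from the text.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C
HEAD_RADIUS_M = 0.0875      # assumed: typical adult head radius


def woodworth_itd(azimuth_rad: float,
                  head_radius_m: float = HEAD_RADIUS_M,
                  c: float = SPEED_OF_SOUND_M_S) -> float:
    """Approximate interaural time difference in seconds for a distant source.

    Uses Woodworth's spherical-head formula ITD = (a / c) * (theta + sin(theta)).
    Azimuth 0 is straight ahead and positive azimuth is toward the right ear,
    so a positive result means the sound reaches the right ear first.
    The azimuth is clamped to [-pi/2, pi/2], the range where the formula applies.
    """
    theta = max(-math.pi / 2, min(math.pi / 2, azimuth_rad))
    return (head_radius_m / c) * (theta + math.sin(theta))
```

Under these assumptions a source directly to one side gives about 0.66 ms, or roughly 31 samples at 48 kHz, which shows why delays that model ITD are much shorter than those that model reflections.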
The material from which a reflecting object is made affects how sound reflects off of and transmits through the object. Each time a sound is reflected off of an object, the material of the object determines how much of each frequency component of the sound wave is absorbed and how much is reflected back into the environment. For example, a carpeted room sounds very different from a glass room. An object's material characteristics can be measured empirically by recording known sounds as they bounce off the material, and can then be modeled as a gain value, for example. Wall surface materials and acoustic space geometries are typically stored in a database for use by a sound processor.
Sound processors are designed to simulate the acoustics of an environment relative to a listener. The processor simulates direct path propagation, reflections, and other acoustic effects. For example, the effects of reflection and ITD may be synthesized by appropriately delaying the source signal. Individual reflections are typically modeled as copies of an original signal modified with appropriate spectral, positional, and temporal cues. The output is the summation of the individual reflections, direct paths, and other acoustic effects. An example is the simulation of a person talking inside a rectangular room having carpeted walls. The signals include a direct path signal and six first-order reflections (one for each of the four walls, the floor, and the ceiling). The propagation distance and direction of arrival for the seven signals are determined from information about the acoustic space, including room geometry and source and listener locations. To simulate the different propagation distances, each signal is delayed by an amount proportional to its propagation distance. Amplitude and spectral cues are added to each signal for propagation effects such as distance, attenuation, and atmospheric absorption. Gain, delay, and spectral effects are added to each signal to provide localization cues based on the direction of arrival of the sound. The pitch of the signals may also vary due to Doppler effects when the listener or source is moving. Reflections also have amplitude and spectral cues added to them based on the reflective properties of the walls. All of these added cues may change continuously due to changes in the simulation or environment (e.g., a change in the position of the source or listener). The output is a summation of the direct path and six reflections, each having different delays, gains, pitch, and spectral effects, which together produce the perception of a person talking inside the modeled room.
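The delays for the seven signals in the room example can be computed with the image-source method, a standard technique that this description does not name: each first-order reflection is treated as arriving from the source mirrored across one surface. This is a minimal sketch assuming an axis-aligned rectangular room with one corner at the origin and c = 343 m/s; gains, spectral cues, and direction of arrival are left out.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound


def first_order_paths(room: Vec3, src: Vec3, lis: Vec3) -> List[Tuple[str, float]]:
    """Return (label, delay in seconds) for the direct path and six first-order reflections.

    room holds the room dimensions (x, y, z); src and lis are the source and listener
    positions inside the room. Each delay is proportional to the path length.
    """
    paths = [("direct", math.dist(src, lis) / SPEED_OF_SOUND_M_S)]
    axis_names = ("x", "y", "z")
    for axis in range(3):
        for wall, side in ((0.0, "min"), (room[axis], "max")):
            image = list(src)
            # Mirror the source across the plane where this coordinate equals `wall`.
            image[axis] = 2.0 * wall - src[axis]
            length = math.dist(image, lis)  # equals the reflected path length
            paths.append((f"{axis_names[axis]}-{side} wall", length / SPEED_OF_SOUND_M_S))
    return paths
```

For a 5 m × 4 m × 3 m room with the source at (1, 2, 1.5) and the listener at (4, 2, 1.5), the direct path is 3 m and each of the four side-wall reflections is 5 m, so those reflections arrive about 5.8 ms after the direct sound.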
Conventional audio processors provide the variable delays used to simulate propagation distances by positioning taps (a, b, c) at different locations along a delay line buffer B, located on a host computer, for example (
The computational cost of performing interpolation, as well as acoustic processing for propagation, reflection, and localization effects for a large number of reflections is significant. While this processing may be performed using special purpose hardware, the amount of special purpose memory required to store the delay lines is high. For example, a one-half second delay at a 48 kHz sampling rate requires the storage of 24,000 samples.
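For comparison with the approach described below, here is a minimal sketch of the conventional arrangement: a circular delay line whose taps can sit at fractional delays, read with linear interpolation (an assumed choice; the conventional systems may use other interpolators). The buffer has to cover the longest delay, which for 0.5 s at 48 kHz is the 24,000 samples noted above, and every fractional tap costs an interpolation per output sample.

```python
import math


class InterpolatingDelayLine:
    """Circular delay line with fractional-delay taps (illustrative conventional design)."""

    def __init__(self, max_delay_s: float, sample_rate: int):
        self.sample_rate = sample_rate
        # Storage grows with the longest delay: 0.5 s at 48 kHz -> 24,000 samples,
        # plus two guard samples so an interpolated read never wraps onto new data.
        self.size = int(math.ceil(max_delay_s * sample_rate)) + 2
        self.buf = [0.0] * self.size
        self.write = 0  # index where the next input sample will be stored

    def push(self, x: float) -> None:
        self.buf[self.write] = x
        self.write = (self.write + 1) % self.size

    def tap(self, delay_samples: float) -> float:
        """Read the signal delay_samples behind the most recently written sample."""
        pos = (self.write - 1 - delay_samples) % self.size  # fractional read position
        i0 = int(math.floor(pos))
        frac = pos - i0
        i1 = (i0 + 1) % self.size
        # Linear interpolation between the two stored samples around the read position.
        return (1.0 - frac) * self.buf[i0] + frac * self.buf[i1]
```

Pushing the ramp 0, 1, ..., 99 and reading `tap(2.5)` returns 96.5, halfway between the samples 2 and 3 positions back.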
There is, therefore, a need for a system and method of efficiently rendering sound reflections in special purpose hardware with limited memory requirements.
A method and system for adjusting a time delay between a first audio signal and a second audio signal to provide acoustic rendering is disclosed. The method generally includes generating the first audio signal from a buffer as a first data stream and generating the second audio signal from the buffer as a second data stream after an initial time delay. The method further includes receiving the first data stream at a first sample rate converter at a first consumption rate and generating a first output data stream at an output sample rate, and receiving the second data stream at a second sample rate converter at a second consumption rate and generating a second output data stream at the output sample rate. One of the first and second consumption rates is changed so that the initial time delay between the first and second output data streams is adjusted over time to provide an adjusted time delay.
A system for adjusting a time delay between a first audio signal and a second audio signal generally includes a buffer operable to receive an audio signal as a data stream which includes a plurality of samples, and to transmit first and second audio samples. The system further includes a first sample rate converter operable to receive the first audio samples from the buffer at a first consumption rate and generate a first output data stream at an output sample rate. A second sample rate converter is provided to receive the second audio samples from the buffer at a second consumption rate and generate a second output data stream at the output sample rate. The system further includes a controller operable to change one of the first or second consumption rates to adjust the time delay between the first output data stream and the second output data stream over time.
The above is a brief description of some deficiencies in the prior art and advantages of the present invention. These and other features, advantages, and embodiments of the invention will be apparent to those skilled in the art from the following description, drawings, and claims.
Corresponding reference characters indicate corresponding parts throughout the several views of the drawings.
Referring now to the drawings and first to
The sound signal is input into a delay line (e.g., buffer or queue) 24 on a host computer as a stream of data 22 which includes a plurality of samples at an input rate. The source signal is stored as sampled data which is representative of an input waveform, or other suitable audio data format, on the host computer along with the geometry of the sound environment (e.g., locations of objects, walls, floor, and ceiling) and locations of the source and listener relative to the sound environment. It is to be understood that the sampled data or the geometry of the sound environment may also be stored on a sound card or other special purpose hardware, instead of the host computer. The positioning data is continuously updated to account for movement of the sound source and listener.
The delay line 24 includes a plurality of non-interpolating taps 26, 28, 30 which stream delay line signal samples sequentially to a sound chip. For simplification, the operation of the delay line will be described in terms of a fixed buffer of data with a plurality of taps moving through the buffer, rather than a queue with fixed taps having data moving through the queue as previously described with respect to
The number of taps may increase over time to account for new objects or surfaces that are introduced into the sound environment and result in additional reflections, or decrease when a sound wave no longer reflects off an object, for example because the listener has left a room. To provide a smooth transition as new reflections are added or old reflections are removed, the volume is preferably ramped up from a low volume (e.g., zero) to the required level when a new tap is added, and ramped down to a low volume before a tap is removed. New taps may be added directly between existing taps as they move along the buffer of data, rather than being placed at the beginning of the buffer. Preferably, the number of taps is kept to a minimum by using appropriate resource management tools to select certain reflections to model while eliminating other, less significant reflections. Resource management may be used to selectively add, remove, or swap out reflections as required during modeling of a sound wave, providing high quality modeling while limiting the required resources.
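The volume ramping and resource management described above might look like the following minimal sketch. `TapGain` and `select_reflections` are hypothetical helpers, the linear ramp shape is an assumption, and ranking reflections by energy is only one possible resource-management policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TapGain:
    """Per-tap gain that ramps linearly toward a target to avoid audible clicks."""
    current: float = 0.0
    target: float = 0.0
    step: float = 0.0  # gain change applied per output sample

    def ramp_to(self, target: float, ramp_samples: int) -> None:
        self.target = target
        self.step = (target - self.current) / max(1, ramp_samples)

    def advance(self) -> float:
        """Advance one output sample and return the gain to apply to it."""
        if self.current != self.target:
            self.current += self.step
            # Stop exactly at the target instead of overshooting it.
            if (self.step > 0 and self.current > self.target) or \
               (self.step < 0 and self.current < self.target):
                self.current = self.target
        return self.current


def select_reflections(energies: List[float], max_taps: int) -> List[int]:
    """Indices of the max_taps most energetic reflections, in their original order."""
    ranked = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    return sorted(ranked[:max_taps])
```

A new tap starts at zero gain and calls `ramp_to(level, n)`; a tap being retired calls `ramp_to(0.0, n)` and is removed only once `advance()` returns 0.0.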
It is to be understood that the number of taps and positioning of the taps may be different than shown herein without departing from the scope of the invention. The taps may also be used to provide a delay to model various other audio effects.
The data is streamed from the delay line 24 at the locations of the taps 26, 28, 30 through a bus 32 (e.g., PCI bus) to a plurality of processing blocks 34 located on the sound chip (
It is to be understood that the arrangement of components within the system may be different than shown and described herein without departing from the scope of the invention. For example, the delay line 24 may be located on the sound card, in which case the FIFO queue 36 may be eliminated. In this case the sample rate converter 38 can pull data samples directly off the delay line as required.
The FIFO queue 36 contains the most recent samples of the input signal that were tapped off the delay line 24 and streamed through the bus 32. The queue 36 holds a plurality of samples so that the sample rate converter 38 always has a number of samples available for performing interpolation. The sample rate converter 38 pulls data from the queue 36 as it needs additional samples, and when the queue runs low, additional samples are pulled from the delay line 24 to refill it. The FIFO queue 36 thus holds enough samples to supply the sample rate converter 38 whenever the converter needs more data. The rate at which the sample rate converter 38 loads data from the queue 36 (the consumption rate) depends on the input sample rate as well as on the current rate of delay change, as further described below.
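A minimal sketch of one non-interpolating tap feeding a FIFO queue, assuming a low-water-mark refill policy. `TapStream`, `low_water`, and `chunk` are hypothetical names and values chosen for illustration; the text does not specify them.

```python
from collections import deque
from typing import Deque, List


class TapStream:
    """Non-interpolating tap that copies consecutive buffer samples into a FIFO."""

    def __init__(self, buffer: List[float], start_index: int,
                 low_water: int = 4, chunk: int = 8):
        self.buffer = buffer      # delay line data, treated as circular
        self.tap = start_index    # next buffer index this tap will read
        self.fifo: Deque[float] = deque()
        self.low_water = low_water  # refill when this many or fewer samples remain
        self.chunk = chunk          # samples transferred per refill (e.g. one bus burst)

    def _refill(self) -> None:
        # The tap moves through the buffer one whole sample at a time; no interpolation.
        for _ in range(self.chunk):
            self.fifo.append(self.buffer[self.tap % len(self.buffer)])
            self.tap += 1

    def pull(self) -> float:
        """Called by the sample rate converter whenever it needs one more input sample."""
        if len(self.fifo) <= self.low_water:
            self._refill()
        return self.fifo.popleft()
```

After five pulls starting at index 10, the tap has advanced 16 samples and 11 remain queued, so exactly 5 samples have been consumed. This is the relationship used when the number of consumed samples is measured from the tap location and the FIFO occupancy.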
The sample rate converter 38 converts an input data stream comprising a plurality of input samples at one sample rate into an output data stream comprising a plurality of output samples at a different sample rate. The conversion provides, for example, a constant output frequency at the sample rate converter 38. The sample rate converter 38 includes an interpolation filter that allows the instantaneous value of the signal to be determined at any arbitrary point between samples, as is necessary when the converter's output sample times do not coincide with its input sample times. The interpolation filter preferably allows for arbitrary changes in the sampling rate. The interpolation filter may use linear interpolation, second order interpolation, cubic interpolation, or any other appropriate interpolation method, as is well known to those skilled in the art. The sample rate converter 38 operates in conjunction with the delay line 24 to provide a specified delay to the audio stream to model reflections, as further described below.
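A minimal sketch of a sample rate converter with an arbitrary, changeable step size. Linear interpolation is assumed here only for brevity (the text allows second order, cubic, or other methods), and `LinearSRC` and `pull` are hypothetical names.

```python
from typing import Callable


class LinearSRC:
    """Linear-interpolation sample rate converter with a step size that may change at any time.

    step is the number of input samples consumed per output sample, e.g. 0.5 when a
    24 kHz buffer is converted to a 48 kHz output.
    """

    def __init__(self, pull: Callable[[], float], step: float):
        self.pull = pull    # supplies the next input sample (e.g. from the FIFO queue)
        self.step = step    # changing this changes the consumption rate
        self.frac = 0.0     # fractional read position between x0 and x1, in [0, 1)
        self.x0 = pull()
        self.x1 = pull()

    def next_sample(self) -> float:
        """Produce one output sample, then advance the read position by step."""
        y = self.x0 + self.frac * (self.x1 - self.x0)
        self.frac += self.step
        # Move forward through the input one whole sample at a time, pulling new data.
        while self.frac >= 1.0:
            self.frac -= 1.0
            self.x0 = self.x1
            self.x1 = self.pull()
        return y
```

Fed the ramp 0, 1, 2, ... with `step=0.5`, the converter emits 0.0, 0.5, 1.0, 1.5, ...; raising `step` makes it pull input faster while the output rate stays fixed.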
The following describes a method for controlling the time delay of the audio stream as it passes through the system 20. The time delay is first defined and a method for measuring the delay is described. The method used to vary the delay is then described, followed by specific examples showing different delay times.
Delay, in the context of the reflection processing described herein, is a relative term that allows the current location of a tap in the delay line buffer 24 to be compared with a desired location. In the following description, delay is defined relative to a hypothetical zero-delay tap moving through the buffer at the buffer sample rate. Direct path and reflection signals lag behind this zero-delay tap according to their respective propagation delays. Delay may also be defined relative to another tap moving through the buffer instead of a hypothetical zero-delay tap. The delay at time t may be expressed as:
Delay(t)=Number of Samples Output(t)/Output Sample Rate−Number of Samples Consumed(t)/Buffer Sample Rate.
The first quotient (Number of Samples Output(t)/Output Sample Rate) represents the location of the zero-delay tap in the buffer relative to a starting point and time, such as the beginning of the buffer. The second quotient (Number of Samples Consumed(t)/Buffer Sample Rate) represents the location of the tap for which the delay is being defined.
The Number of Samples Output(t) may be the number of samples output by the sample rate converter from time t0 to time t, in which case:
Number of Samples Output(t)=Output Sample Rate×(t−t0).
The Number of Samples Output may be measured by reading a counter register of the sample rate converter 38, for example.
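The Number of Samples Consumed(t) may be the number of samples input to the sample rate converter from time t0 to time t, in which case it is a function of the sample rate converter step size over the interval from t0 to t:
Number of Samples Consumed(t)=ΣStep Size(k), summed over the output samples k generated from time t0 to time t.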
The Number of Samples Consumed(t) may be determined by measuring the location of the tap in the delay line buffer 24 relative to its location at time t0, together with the number of samples remaining in the FIFO queue 36.
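Combining the two quotients described above (Number of Samples Output(t)/Output Sample Rate minus Number of Samples Consumed(t)/Buffer Sample Rate) with the tap-plus-FIFO measurement gives a small delay-measurement routine. This is a minimal sketch; the function and parameter names are hypothetical, and the counters are assumed to be zeroed at t0.

```python
def measured_delay_s(n_output: int, output_rate: float,
                     tap_advance: int, fifo_now: int, fifo_at_t0: int,
                     buffer_rate: float) -> float:
    """Delay(t) of one tap relative to a hypothetical zero-delay tap, in seconds.

    n_output:    samples produced by the sample rate converter since t0
                 (e.g. read from a counter register)
    tap_advance: samples the delay-line tap has moved since t0
    fifo_now, fifo_at_t0: FIFO occupancy now and at t0; samples fetched from the
                 delay line but still waiting in the queue have not been consumed
    """
    n_consumed = tap_advance - (fifo_now - fifo_at_t0)
    zero_delay_position_s = n_output / output_rate  # first quotient
    tap_position_s = n_consumed / buffer_rate       # second quotient
    return zero_delay_position_s - tap_position_s
```

For instance, after one second of 48 kHz output from a 24 kHz buffer, a tap that has moved 23,768 samples with 8 still queued has consumed 23,760 samples and therefore lags the zero-delay tap by 10 ms.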
The rate of consumption of the input data stream into the sample rate converter 38 (i.e., the rate, in samples per second, at which data is pulled from the FIFO queue 36 and input into the sample rate converter 38) is adjusted to provide the required increase or decrease in delay of the signal. The rate of consumption is controlled by varying the step size used to convert sample rates within the sample rate converter 38. By increasing or decreasing the step size, delay can be subtracted or added over time. The step size at time t may be calculated as follows:
Step Size(t)=Base Step Size+ΔStep Size(t), where Base Step Size=Buffer Sample Rate/Output Sample Rate.
For example, if the buffer is sampled at 24 kHz and the sample rate converter output rate is 48 kHz, the Base Step Size is equal to 0.5. Changing the term ΔStep Size(t) increases or decreases the Step Size, and thus changes the delay as follows:
ΔDelay=−(1/Buffer Sample Rate)×ΣΔStep Size(k), summed over the output samples k generated while ΔStep Size(t) is applied.
If the data is continuously output from the sample rate converter 38 at a constant sampling frequency, a change in step size results in a change in consumption rate of the data. The sample rate converter 38 is thus operable to vary the rate of consumption of data from the FIFO queue 36 into the sample rate converter by adjusting the ΔStep Size of the sample rate converter.
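The arithmetic behind the step size adjustment can be checked numerically. In the sketch below (function names hypothetical), the step size is Base Step Size + ΔStep Size with Base Step Size = Buffer Sample Rate/Output Sample Rate. Per output sample, the zero-delay tap advances 1/Output Sample Rate seconds, while the converter's tap advances Step Size/Buffer Sample Rate seconds. The base step cancels the first term exactly, leaving a delay change of −ΔStep Size/Buffer Sample Rate per output sample.

```python
def step_size(buffer_rate: float, output_rate: float, delta_step: float = 0.0) -> float:
    """Step Size = Base Step Size + delta_step, with Base Step Size = buffer rate / output rate."""
    return buffer_rate / output_rate + delta_step


def delay_change_s(delta_step: float, n_output: int, buffer_rate: float) -> float:
    """Delay change in seconds when delta_step is held for n_output output samples.

    A positive delta_step consumes input faster and removes delay;
    a negative delta_step consumes input more slowly and adds delay.
    """
    return -delta_step * n_output / buffer_rate
```

With a 24 kHz buffer and 48 kHz output, holding ΔStep Size = 0.005 (a 1% faster consumption rate) for one second, or 48,000 output samples, removes 0.005 × 48,000 / 24,000 = 0.01 s of delay. While the adjustment is applied, the output is also pitched up by about 1%, which is the Doppler-like side effect that the control system limits.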
During operation of the system, after the sound wave is started through the delay line 24, locations of the taps are compared to their desired location as determined by a desired delay. The desired delay is calculated in the host, based on information from the geometry engine, for example. If the delay provided by the delay line 24 is different than the desired delay, the step size of the sample rate converter 38 is adjusted to either increase or decrease the overall delay between the time the signal is input to the delay line buffer 24 and output from the processing block 34 (
One embodiment of a feedback control system, generally indicated at 58, is shown schematically in
The following describes the control system shown in
The calculation of step size is preferably performed periodically to reduce the delay error. The control system 58 monitors progress after a predetermined period of time has passed. If error has been reduced to zero, the consumption rate is returned to its original value. If the processor overshoots or undershoots the desired delay, the control system 58 must reprogram the consumption rate to further reduce the delay error. The control system 58 is designed to reduce the error as quickly as possible, without significant overshoot or long-term drift, while limiting the maximum change in consumption rate to prevent objectionable Doppler effects.
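One way to meet these goals is a clamped proportional controller, sketched below under stated assumptions. It is not necessarily the control system 58. Each control period, it picks the ΔStep Size that would cancel the whole delay error within that period, then clamps the result so the consumption rate never departs from nominal by more than `max_rel_change`. The 1% default is an assumed value, and all names are hypothetical.

```python
def next_delta_step(desired_delay_s: float, current_delay_s: float,
                    buffer_rate: float, output_rate: float, period_s: float,
                    max_rel_change: float = 0.01) -> float:
    """ΔStep Size to apply for the next control period (hypothetical policy)."""
    base = buffer_rate / output_rate
    error = current_delay_s - desired_delay_s  # > 0: too much delay, so consume faster
    n_output = output_rate * period_s          # output samples in one control period
    # Delay change over the period is -delta * n_output / buffer_rate, so this delta
    # would remove the whole error in one period (no overshoot when unclamped).
    delta = error * buffer_rate / n_output
    # Bound the consumption-rate change to limit audible Doppler-like pitch shift.
    limit = max_rel_change * base
    return max(-limit, min(limit, delta))
```

When the error is large, the clamp keeps the pitch deviation within about 1% (roughly 17 cents) and the delay slews at a fixed rate. Once the remaining error fits within one period, the unclamped step removes it exactly, and ΔStep Size returns to zero, restoring the original consumption rate.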
Processing steps for a second embodiment of the control system are shown in
It is to be understood that a control system different than those described herein may be used to monitor and adjust the time delay of the signals, without departing from the scope of the invention. Allocation of the steps of
While the system 20 has been described with respect to modeling reflections, the system may also be used for other special purpose applications such as reverberator applications, for example.
The delay line 24 may also be used to create interaural time differences (ITD), which arise because sound takes different amounts of time to reach the left ear and the right ear. At each tap location, a pair of taps may be provided, one for each ear, to account for ITD. Since ITD values are relatively small (e.g., <1 msec), the requirements on the feedback control mechanisms described above are stricter than for reflections. It is to be understood that changes in the signal to account for ITD can also be performed in an audio processor to reduce the number of taps on the delay line 24.
In view of the above, it will be seen that the several objects of the invention are achieved and other advantageous results attained. As various changes could be made in the above constructions and methods without departing from the scope of the invention, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.