The present invention relates to audio dynamic range control methods and apparatus in which an audio processing device analyzes an audio signal and changes the level, gain or dynamic range of the audio as a function of auditory events. The invention also relates to computer programs for practicing such methods or controlling such apparatus.
The techniques of automatic gain control (AGC) and dynamic range control (DRC) are well known and are a common element of many audio signal paths. In an abstract sense, both techniques measure the level of an audio signal in some manner and then gain-modify the signal by an amount that is a function of the measured level. In a linear, 1:1 dynamics processing system, the input audio is not processed and the output audio signal ideally matches the input audio signal. Additionally, consider an audio dynamics processing system that automatically measures characteristics of the input signal and uses that measurement to control the output signal: if the input signal rises in level by 6 dB and the output signal is processed such that it rises by only 3 dB, then the output signal has been compressed by a ratio of 2:1 with respect to the input signal. International Publication Number WO 2006/047600 A1 (“Calculating and Adjusting the Perceived Loudness and/or the Perceived Spectral Balance of an Audio Signal” by Alan Jeffrey Seefeldt) provides a detailed overview of the five basic types of dynamics processing of audio: compression, limiting, automatic gain control (AGC), expansion and gating.
The division of sounds into units or segments perceived as separate and distinct is sometimes referred to as “auditory event analysis” or “auditory scene analysis” (“ASA”) and the segments are sometimes referred to as “auditory events” or “audio events.” An extensive discussion of auditory scene analysis is set forth by Albert S. Bregman in his book Auditory Scene Analysis: The Perceptual Organization of Sound (Massachusetts Institute of Technology, 1991; Fourth printing, 2001; Second MIT Press paperback edition). In addition, U.S. Pat. No. 6,002,776 to Bhadkamkar et al., Dec. 14, 1999, cites publications dating back to 1976 as “prior art work related to sound separation by auditory scene analysis.” However, the Bhadkamkar et al. patent discourages the practical use of auditory scene analysis, concluding that “[t]echniques involving auditory scene analysis, although interesting from a scientific point of view as models of human auditory processing, are currently far too computationally demanding and specialized to be considered practical techniques for sound separation until fundamental progress is made.”
A useful way to identify auditory events is set forth by Crockett and Crockett et al. in various patent applications and papers listed below under the heading “Incorporation by Reference.” According to those documents, an audio signal is divided into auditory events, each of which tends to be perceived as separate and distinct, by detecting changes in spectral composition (amplitude as a function of frequency) with respect to time. This may be done, for example, by calculating the spectral content of successive time blocks of the audio signal, calculating the difference in spectral content between successive time blocks of the audio signal, and identifying an auditory event boundary as the boundary between successive time blocks when the difference in the spectral content between such successive time blocks exceeds a threshold. Alternatively, changes in amplitude with respect to time may be calculated instead of or in addition to changes in spectral composition with respect to time.
In its least computationally demanding implementation, the process divides audio into time segments by analyzing the entire frequency band (full bandwidth audio) or substantially the entire frequency band (in practical implementations, band limiting filtering at the ends of the spectrum is often employed) and giving the greatest weight to the loudest audio signal components. This approach takes advantage of a psychoacoustic phenomenon in which at smaller time scales (20 milliseconds (ms) and less) the ear may tend to focus on a single auditory event at a given time. This implies that while multiple events may be occurring at the same time, one component tends to be perceptually most prominent and may be processed individually as though it were the only event taking place. Taking advantage of this effect also allows the auditory event detection to scale with the complexity of the audio being processed. For example, if the input audio signal being processed is a solo instrument, the audio events that are identified will likely be the individual notes being played. Similarly for an input voice signal, the individual components of speech, the vowels and consonants for example, will likely be identified as individual audio elements. As the complexity of the audio increases, such as music with a drumbeat or multiple instruments and voice, the auditory event detection identifies the “most prominent” (i.e., the loudest) audio element at any given moment.
At the expense of greater computational complexity, the process may also take into consideration changes in spectral composition with respect to time in discrete frequency subbands (fixed or dynamically determined or both fixed and dynamically determined subbands) rather than the full bandwidth. This alternative approach takes into account more than one audio stream in different frequency subbands rather than assuming that only a single stream is perceptible at a particular time.
Auditory event detection may be implemented by dividing a time domain audio waveform into time intervals or blocks and then converting the data in each block to the frequency domain, using either a filter bank or a time-frequency transformation, such as the FFT. The amplitude of the spectral content of each block may be normalized in order to eliminate or reduce the effect of amplitude changes. Each resulting frequency domain representation provides an indication of the spectral content of the audio in the particular block. The spectral content of successive blocks is compared and changes greater than a threshold may be taken to indicate the temporal start or temporal end of an auditory event.
Preferably, the frequency domain data is normalized, as is described below. The degree to which the frequency domain data needs to be normalized gives an indication of amplitude. Hence, if a change in this degree exceeds a predetermined threshold that too may be taken to indicate an event boundary. Event start and end points resulting from spectral changes and from amplitude changes may be ORed together so that event boundaries resulting from either type of change are identified.
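To make the amplitude criterion concrete, here is a minimal Python sketch, assuming each block's spectrum is normalized by its largest FFT magnitude so that this normalization factor, expressed in dB, serves as the indication of amplitude. The function names and the 10 dB amplitude threshold are hypothetical, not values taken from the source.

```python
import numpy as np


def amplitude_boundaries(blocks, amp_threshold_db=10.0):
    """Flag block t as a boundary when its normalization factor jumps.

    blocks: 2-D array (num_blocks, M) of time-domain samples.
    amp_threshold_db: hypothetical threshold on the change, in dB.
    """
    spectra = np.abs(np.fft.fft(blocks, axis=1))
    # The factor needed to normalize each spectrum (its peak) indicates amplitude.
    peaks = np.maximum(spectra.max(axis=1), 1e-12)
    norm_db = 20.0 * np.log10(peaks)
    jumps = np.abs(np.diff(norm_db)) > amp_threshold_db
    return {int(t) + 1 for t in np.flatnonzero(jumps)}


def combine_boundaries(spectral, amplitude):
    # OR the two boundary sets so a change of either type marks an event.
    return sorted(set(spectral) | set(amplitude))
```

A spectrally detected boundary list can then be merged with the amplitude-based set, for example `combine_boundaries(spectral_list, amplitude_boundaries(blocks))`.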
Although techniques described in said Crockett and Crockett et al. applications and papers are particularly useful in connection with aspects of the present invention, other techniques for identifying auditory events and event boundaries may be employed in aspects of the present invention.
According to one embodiment, a method for processing an audio signal is disclosed. The method includes monitoring a characteristic of the audio signal, identifying a change in the characteristic, establishing an auditory event boundary to identify the change in the characteristic, wherein an audio portion between consecutive auditory event boundaries constitutes an auditory event, and applying a modification to the audio signal based in part on an occurrence of an auditory event.
In some embodiments, the method operates on an audio signal that includes two or more channels of audio content. In these embodiments, the auditory event boundary is identified by examining changes in the characteristic between the two or more channels of the audio signal. In other embodiments, the audio processing method generates one or more dynamically-varying parameters in response to the auditory event.
Typically, an auditory event is a segment of audio that tends to be perceived as separate and distinct. One usable measure of signal characteristics includes a measure of the spectral content of the audio, for example, as described in the cited Crockett and Crockett et al. documents. All or some of the one or more audio dynamics processing parameters may be generated at least partly in response to the presence or absence and characteristics of one or more auditory events. An auditory event boundary may be identified as a change in signal characteristics with respect to time that exceeds a threshold. Alternatively, all or some of the one or more parameters may be generated at least partly in response to a continuing measure of the degree of change in signal characteristics associated with said auditory event boundaries. Although, in principle, aspects of the invention may be implemented in analog and/or digital domains, practical embodiments are likely to be implemented in the digital domain, in which each audio signal is represented by individual samples or by samples within blocks of data. In this case, the signal characteristics may be the spectral content of audio within a block, the detection of changes in signal characteristics with respect to time may be the detection of changes in spectral content of audio from block to block, and auditory event temporal start and stop boundaries each coincide with a boundary of a block of data. It should be noted that, for the more traditional case of performing dynamic gain changes on a sample-by-sample basis, the auditory scene analysis described could be performed on a block basis and the resulting auditory event information used to perform dynamic gain changes that are applied sample-by-sample.
By controlling key audio dynamics processing parameters using the results of auditory scene analysis, a dramatic reduction of audible artifacts introduced by dynamics processing may be achieved.
The present invention presents two ways of performing auditory scene analysis. The first performs spectral analysis and identifies the location of perceptible audio events that are used to control the dynamic gain parameters by identifying changes in spectral content. The second way transforms the audio into a perceptual loudness domain (that may provide more psychoacoustically relevant information than the first way) and identifies the location of auditory events that are subsequently used to control the dynamic gain parameters. It should be noted that the second way requires that the audio processing be aware of absolute acoustic reproduction levels, which may not be possible in some implementations. Presenting both methods of auditory scene analysis allows implementations of ASA-controlled dynamic gain modification using processes or devices that may or may not be calibrated to take into account absolute reproduction levels.
In some embodiments, a method for processing an audio signal in an audio processing apparatus is disclosed. The method includes receiving the audio signal and a parameter, the parameter indicating a location of an auditory event boundary. An audio portion between consecutive auditory event boundaries constitutes an auditory event. The method further includes applying a modification to the audio signal based in part on an occurrence of the auditory event. The audio processing apparatus may be implemented at least in part in hardware and the parameter may be generated by monitoring a characteristic of the audio signal and identifying a change in the characteristic.
In some embodiments, a method for processing an audio signal in an audio processing apparatus is disclosed. The method includes receiving the audio signal. The audio signal may comprise at least one channel of audio content. The audio signal may be divided into a plurality of subband signals with an analysis filterbank. Each of the plurality of subband signals may include at least one subband sample. A characteristic of the audio signal may be derived. The characteristic is a power measure of the audio signal. The power measure may be smoothed to generate a smoothed power measure of the audio signal, wherein the smoothing is based on a low-pass filter. A location of an auditory event boundary may be detected by monitoring the smoothed power measure. An audio portion between consecutive auditory event boundaries may constitute an auditory event. A gain vector may be generated based on the location of the auditory event boundary. The gain vector may be applied to a version of the plurality of subband signals to generate modified subband signals. The modified subband signals may be synthesized with a synthesis filterbank to produce a modified audio signal. The audio processing apparatus can be implemented at least in part with hardware.
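The power-measure path of this embodiment can be sketched as follows. This is a minimal illustration, assuming one complex subband sample per band per block, a one-pole low-pass filter for the smoothing, and a boundary wherever the smoothed power changes by more than a fixed number of dB between blocks. The array layout, the coefficient 0.9 and the 3 dB threshold are hypothetical, and the gain-vector, modification and synthesis stages are not shown.

```python
import numpy as np


def smoothed_power_db(subbands, coeff=0.9):
    """Per-block power measure smoothed by a one-pole low-pass filter, in dB.

    subbands: complex array (num_blocks, num_bands), assumed to hold one
    subband sample per band per block (hypothetical layout).
    coeff: hypothetical smoothing coefficient (closer to 1 = smoother).
    """
    power = np.sum(np.abs(subbands) ** 2, axis=1)  # power measure per block
    smoothed = np.empty_like(power)
    state = power[0]
    for t, p in enumerate(power):
        state = coeff * state + (1.0 - coeff) * p  # one-pole low-pass filter
        smoothed[t] = state
    return 10.0 * np.log10(np.maximum(smoothed, 1e-12))


def power_event_boundaries(smoothed_db, change_db=3.0):
    """Blocks where the smoothed power changes by more than change_db dB."""
    steps = np.abs(np.diff(smoothed_db))
    return [int(t) + 1 for t in np.flatnonzero(steps > change_db)]
```

Because the smoothing spreads a sudden level step over several blocks, only the first block after the step produces a large enough change to be flagged.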
In some embodiments, the characteristic further includes loudness. The characteristic may include perceived loudness. The characteristic may further include phase. The characteristic may further include a sudden change in signal power. The audio signal may include two or more channels of audio content. The auditory event boundary may be identified by examining changes in the characteristic between the two or more channels. The characteristic may include interchannel phase difference. The characteristic may include interchannel correlation. The auditory event boundary may coincide with a beginning or end of a block of data in the audio signal. The auditory event boundary may be adjusted to coincide with a boundary of a block of data in the audio signal.
Aspects of the present invention are described herein in an audio dynamics processing environment that includes aspects of other inventions. Such other inventions are described in various pending United States and International Patent Applications of Dolby Laboratories Licensing Corporation, the owner of the present application, which applications are identified herein.
In accordance with an embodiment of one aspect of the present invention, auditory scene analysis may be composed of four general processing steps as shown in a portion of
The first step, illustrated conceptually in
Following the identification of the event boundaries, key characteristics of the auditory event are identified, as shown in step 1-4.
Either overlapping or non-overlapping segments of the audio may be windowed and used to compute spectral profiles of the input audio. Overlap results in finer resolution as to the location of auditory events and, also, makes it less likely to miss an event, such as a short transient. However, overlap also increases computational complexity. Thus, overlap may be omitted.
The following variables may be used to compute the spectral profile of the input block:
In general, any integer numbers may be used for the variables above. However, the implementation will be more efficient if M is set equal to a power of 2 so that standard FFTs may be used for the spectral profile calculations. In a practical embodiment of the auditory scene analysis process, the parameters listed may be set to:
The above-listed values were determined experimentally and were found generally to identify with sufficient accuracy the location and duration of auditory events. However, setting the value of P to 256 samples (50% overlap) rather than zero samples (no overlap) has been found to be useful in identifying some hard-to-find events. While many different types of windows may be used to minimize spectral artifacts due to windowing, the window used in the spectral profile calculations is an M-point Hanning, Kaiser-Bessel or other suitable, preferably non-rectangular, window. The above-indicated values and a Hanning window type were selected after extensive experimental analysis, as they have been shown to provide excellent results across a wide range of audio material. Non-rectangular windowing is preferred for the processing of audio signals with predominantly low frequency content. Rectangular windowing produces spectral artifacts that may cause incorrect detection of events. Unlike certain encoder/decoder (codec) applications where an overall overlap/add process must provide a constant level, such a constraint does not apply here and the window may be chosen for characteristics such as its time/frequency resolution and stop-band rejection.
In step 1-1 (
Step 1-2 calculates a measure of the difference between the spectra of adjacent blocks. For each block, each of the M (log) spectral coefficients from step 1-1 is subtracted from the corresponding coefficient for the preceding block, and the magnitude of the difference calculated (the sign is ignored). These M differences are then summed to one number. This difference measure may also be expressed as an average difference per spectral coefficient by dividing the difference measure by the number of spectral coefficients used in the sum (in this case M coefficients).
Step 1-3 identifies the locations of auditory event boundaries by applying a threshold to the array of difference measures from step 1-2. When a difference measure exceeds the threshold, the change in spectrum is deemed sufficient to signal a new event and the block number of the change is recorded as an event boundary. For the values of M and P given above and for log domain values (in step 1-1) expressed in units of dB, the threshold may be set equal to 2500 if the whole magnitude FFT (including the mirrored part) is compared or 1250 if half the FFT is compared (as noted above, the FFT represents negative as well as positive frequencies; for the magnitude of the FFT, one half is the mirror image of the other). This value was chosen experimentally and provides good auditory event boundary detection. This parameter value may be changed to reduce (increase the threshold) or increase (decrease the threshold) the detection of events.
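Steps 1-1 through 1-3 can be sketched in Python as below. This is an illustration under stated assumptions rather than the patented implementation: it assumes M = 512 (consistent with P = 256 samples being 50% overlap), a Hanning window, spectra normalized to their largest coefficient and expressed in dB, and a floor of −100 dB (a hypothetical choice) so that empty bins do not produce −∞. The threshold of 2500 applies to the whole, mirrored FFT magnitude.

```python
import numpy as np


def spectral_event_boundaries(x, M=512, P=0, threshold=2500.0, floor_db=-100.0):
    """Return block indices t at which an auditory event boundary is detected.

    x: 1-D array of audio samples.
    M: block length; P: overlap in samples, so the hop is M - P.
    threshold: limit on the summed |dB difference| over all M FFT bins.
    floor_db: hypothetical lower limit that keeps log(0) finite.
    """
    hop = M - P
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(M) / M)  # Hanning window
    floor = 10.0 ** (floor_db / 20.0)
    boundaries = []
    prev = None
    for t, start in enumerate(range(0, len(x) - M + 1, hop)):
        mag = np.abs(np.fft.fft(window * x[start:start + M]))
        # Step 1-1: normalize to the largest coefficient, then convert to dB.
        peak = mag.max()
        rel = mag / peak if peak > 0 else np.zeros(M)
        log_spec = 20.0 * np.log10(np.maximum(rel, floor))
        if prev is not None:
            # Step 1-2: sum the magnitudes of the coefficient differences.
            diff = np.sum(np.abs(log_spec - prev))
            # Step 1-3: a difference above the threshold marks a new event.
            if diff > threshold:
                boundaries.append(t)
        prev = log_spec
    return boundaries
```

The floor value bounds the contribution of near-empty bins to the difference measure, so it interacts with the threshold; a different floor may call for retuning the threshold.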
The process of
Alternatives to the arrangement of
The details of this practical embodiment are not critical. Other ways to calculate the spectral content of successive time segments of the audio signal, calculate the differences between successive time segments, and set auditory event boundaries at the respective boundaries between successive time segments when the difference in the spectral profile content between such successive time segments exceeds a threshold may be employed.
International application under the Patent Cooperation Treaty S.N. PCT/US2005/038579, filed Oct. 25, 2005, published as International Publication Number WO 2006/047600 A1, entitled “Calculating and Adjusting the Perceived Loudness and/or the Perceived Spectral Balance of an Audio Signal” by Alan Jeffrey Seefeldt discloses, among other things, an objective measure of perceived loudness based on a psychoacoustic model. Said application is hereby incorporated by reference in its entirety. As described in said application, from an audio signal, x[n], an excitation signal E[b, t] is computed that approximates the distribution of energy along the basilar membrane of the inner ear at critical band b during time block t. This excitation may be computed from the Short-time Discrete Fourier Transform (STDFT) of the audio signal as follows:
$$E[b,t]=\sum_{k}\left|T[k]\right|^{2}\left|C_{b}[k]\right|^{2}\left|X[k,t]\right|^{2} \quad (1)$$
where X[k,t] represents the STDFT of x[n] at time block t and bin k. Note that in equation 1 t represents time in discrete units of transform blocks as opposed to a continuous measure, such as seconds. T[k] represents the frequency response of a filter simulating the transmission of audio through the outer and middle ear, and Cb[k] represents the frequency response of the basilar membrane at a location corresponding to critical band b.
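Equation 1 is a weighted sum over FFT bins. The following is a minimal sketch, assuming the excitation takes the form E[b,t] = Σk |T[k]|²|Cb[k]|²|X[k,t]|² and that T[k] and Cb[k] are supplied as sampled frequency responses. The actual outer/middle-ear and basilar-membrane filter shapes are defined in WO 2006/047600 and are not reproduced here; the function name is hypothetical.

```python
import numpy as np


def excitation(X, T, C):
    """Excitation E[b, t] from an STDFT, a transmission filter and band filters.

    X: STDFT, complex array shaped (num_bins, num_blocks)
    T: outer/middle-ear transmission response, shape (num_bins,)
    C: basilar-membrane band responses, shape (num_bands, num_bins)
    """
    # |T[k]|^2 |X[k,t]|^2 for every bin and block
    weighted = (np.abs(T)[:, None] ** 2) * (np.abs(X) ** 2)
    # Sum over bins k, weighted by |C_b[k]|^2, for every band b
    return (np.abs(C) ** 2) @ weighted
```

Writing the band summation as a matrix product evaluates every band and block in one call.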
Using equal loudness contours, such as those depicted in
where TQ1kHz is the threshold in quiet at 1 kHz and the constants β and α are chosen to match growth-of-loudness data collected from listening experiments. Abstractly, this transformation from excitation to specific loudness may be represented by the function Ψ{ } such that:
N[b,t]=Ψ{E[b,t]}
Finally, the total loudness, L[t], represented in units of sone, is computed by summing the specific loudness across bands:
$$L[t]=\sum_{b}N[b,t] \quad (3)$$
The specific loudness N[b,t] is a spectral representation meant to simulate the manner in which a human perceives audio as a function of frequency and time. It captures variations in sensitivity to different frequencies, variations in sensitivity to level, and variations in frequency resolution. As such, it is a spectral representation well matched to the detection of auditory events. Though more computationally complex, comparing the difference of N[b,t] across bands between successive time blocks may in many cases result in more perceptually accurate detection of auditory events in comparison to the direct use of successive FFT spectra described above.
In said patent application, several applications for modifying the audio based on this psychoacoustic loudness model are disclosed. Among these are several dynamics processing algorithms, such as AGC and DRC. These disclosed algorithms may benefit from the use of auditory events to control various associated parameters. Because specific loudness is already computed, it is readily available for the purpose of detecting said events. Details of a preferred embodiment are discussed below.
Two examples of embodiments of the invention are now presented. The first describes the use of auditory events to control the release time in a digital implementation of a Dynamic Range Controller (DRC) in which the gain control is derived from the Root Mean Square (RMS) power of the signal. The second embodiment describes the use of auditory events to control certain aspects of a more sophisticated combination of AGC and DRC implemented within the context of the psychoacoustic loudness model described above. These two embodiments are meant to serve as examples of the invention only, and it should be understood that the use of auditory events to control parameters of a dynamics processing algorithm is not restricted to the specifics described below.
The described digital implementation of a DRC segments an audio signal x[n] into windowed, half-overlapping blocks, and for each block a modification gain based on a measure of the signal's local power and a selected compression curve is computed. The gain is smoothed across blocks and then multiplied with each block. The modified blocks are finally overlap-added to generate the modified audio signal y[n].
It should be noted that, while the auditory scene analysis and digital implementation of DRC as described here divide the time-domain audio signal into blocks to perform analysis and processing, the DRC processing need not be performed using block segmentation. For example, the auditory scene analysis could be performed using block segmentation and spectral analysis as described above, and the resulting auditory event locations and characteristics could be used to provide control information to a traditional digital DRC implementation that typically operates on a sample-by-sample basis. Here, however, the same blocking structure used for auditory scene analysis is employed for the DRC to simplify the description of their combination.
Proceeding with the description of a block based DRC implementation, the overlapping blocks of the audio signal may be represented as:
$$x[n,t]=w[n]\,x[n+tM/2] \quad \text{for } 0\le n\le M-1 \quad (4)$$
where M is the block length and the hopsize is M/2, w[n] is the window, n is the sample index within the block, and t is the block index (note that here t is used in the same way as with the STDFT in equation 1; it represents time in discrete units of blocks rather than seconds, for example). Ideally, the window w[n] tapers to zero at both ends and sums to unity when half-overlapped with itself; the commonly used Hann window (the square of the sine window) meets these criteria, for example.
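The blocking of equation 4 and the half-overlap condition on the window can be checked numerically. This sketch assumes a periodic Hann window, w[n] = 0.5 − 0.5cos(2πn/M), for which w[n] + w[n+M/2] = 1 holds exactly; the helper names are hypothetical.

```python
import numpy as np


def hann_periodic(M):
    """Periodic Hann window; w[n] + w[n + M/2] = 1 for 0 <= n < M/2."""
    n = np.arange(M)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / M)


def make_blocks(x, M):
    """Half-overlapping windowed blocks x[n,t] = w[n] x[n + tM/2] (eq. 4)."""
    hop = M // 2
    w = hann_periodic(M)
    num_blocks = (len(x) - M) // hop + 1
    return np.stack([w * x[t * hop:t * hop + M] for t in range(num_blocks)])
```

Overlap-adding the unmodified blocks reproduces the input everywhere except the first and last half-blocks, which are covered by only one window.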
For each block, one may then compute the RMS power to generate a power measure P[t] in dB per block:
$$P[t]=10\log_{10}\left(\frac{1}{M}\sum_{n=0}^{M-1}x^{2}[n,t]\right) \quad (5)$$
As mentioned earlier, one could smooth this power measure with a fast attack and slow release prior to processing with a compression curve, but as an alternative the instantaneous power P[t] is processed and the resulting gain is smoothed. This alternate approach has the advantage that a simple compression curve with sharp knee points may be used, but the resulting gains are still smooth as the power travels through the knee-point. Representing a compression curve as shown in
G[t]=F{P[t]} (6)
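Assuming that the compression curve applies greater attenuation as signal level increases, the gain will be decreasing when the signal is in “attack mode” and increasing when in “release mode”. Therefore, a smoothed gain Ḡ[t] may be computed according to:
$$\bar{G}[t]=\alpha[t]\,\bar{G}[t-1]+(1-\alpha[t])\,G[t] \quad (7a)$$
where
$$\alpha[t]=\begin{cases}\alpha_{attack}, & G[t]<\bar{G}[t-1]\\ \alpha_{release}, & G[t]\ge\bar{G}[t-1]\end{cases} \quad (7b)$$
and
$$\alpha_{release}\gg\alpha_{attack} \quad (7c)$$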
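Finally, the smoothed gain Ḡ[t], expressed in dB, is applied to each block of the signal, and the modified blocks are overlap-added to produce the modified audio:
$$y[n+tM/2]=\left(10^{\bar{G}[t]/20}\right)x[n,t]+\left(10^{\bar{G}[t-1]/20}\right)x[n+M/2,t-1] \quad \text{for } 0\le n<M/2 \quad (8)$$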
Note that because the blocks have been multiplied with a tapered window, as shown in equation 4, the overlap-add synthesis shown above effectively smooths the gains across samples of the processed signal y[n]. Thus, the gain control signal receives smoothing in addition to that shown in equation 7a. In a more traditional implementation of DRC operating sample-by-sample rather than block-by-block, gain smoothing more sophisticated than the simple one-pole filter shown in equation 7a might be necessary in order to prevent audible distortion in the processed signal. Also, the use of block-based processing introduces an inherent delay of M/2 samples into the system, and as long as the decay time associated with αattack is close to this delay, the signal x[n] does not need to be delayed further before the application of the gains for the purposes of preventing overshoot.
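Putting the pieces of this block-based DRC together, a minimal Python sketch might look like the following. It assumes a periodic Hann analysis window, an initial smoothed gain of 0 dB, a small constant inside the logarithm to avoid log(0), and a compression curve passed in as a function mapping block power in dB to gain in dB. It illustrates the described structure and is not a reference implementation.

```python
import numpy as np


def block_drc(x, M, curve, a_attack, a_release):
    """Block-based DRC with half-overlapping blocks (a sketch).

    x: 1-D float signal; M: even block length (hop M/2).
    curve: function mapping block power P[t] in dB to gain G[t] in dB.
    a_attack, a_release: one-pole smoothing coefficients.
    """
    hop = M // 2
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(M) / M)  # periodic Hann
    y = np.zeros(len(x))
    g_smooth = 0.0  # smoothed gain in dB, assumed to start at unity gain
    for t in range((len(x) - M) // hop + 1):
        block = w * x[t * hop:t * hop + M]                    # windowed block, eq. 4
        power_db = 10 * np.log10(np.mean(block ** 2) + 1e-12)  # RMS power in dB
        g = curve(power_db)                                    # compression curve, eq. 6
        # Attack coefficient while the gain is falling, release otherwise.
        alpha = a_attack if g < g_smooth else a_release
        g_smooth = alpha * g_smooth + (1 - alpha) * g          # one-pole smoothing, eq. 7a
        # Scale the block by the smoothed gain and overlap-add it.
        y[t * hop:t * hop + M] += 10 ** (g_smooth / 20) * block
    return y
```

With a curve that always returns 0 dB, the output equals the input away from the first and last half-block, which is a quick check that the window and overlap-add are consistent.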
above −20 dB relative to digital full scale, the signal is attenuated with a ratio of 5:1, and below −30 dB the signal is boosted with a ratio of 5:1. The gain is smoothed with an attack coefficient αattack corresponding to a half-decay time of 10 ms and a release coefficient αrelease corresponding to a half-decay time of 500 ms. The original audio signal depicted in
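The example curve and time constants can be expressed in code. This is a sketch under assumptions: unity gain between −30 and −20 dB is assumed (the text does not state it), “half-decay time” is taken to mean the time for a one-pole smoother's remaining error to halve, and the smoother is assumed to update once per hop of M/2 samples. The function names are hypothetical.

```python
def example_curve(power_db, upper=-20.0, lower=-30.0, ratio=5.0):
    """Gain in dB for a block power in dB relative to full scale.

    Above `upper`: output rises 1/ratio dB per input dB (attenuation).
    Below `lower`: output falls 1/ratio dB per input dB (boost).
    In between: unity gain (an assumption; not stated in the text).
    """
    if power_db > upper:
        return (power_db - upper) * (1.0 / ratio - 1.0)
    if power_db < lower:
        return (power_db - lower) * (1.0 / ratio - 1.0)
    return 0.0


def half_decay_coeff(half_decay_s, hop, fs):
    """One-pole coefficient whose response decays by half in half_decay_s.

    Assumes one smoothing update per hop of `hop` samples at rate fs.
    """
    updates = half_decay_s * fs / hop
    return 0.5 ** (1.0 / updates)
```

For instance, with a hypothetical hop of 256 samples at 44.1 kHz, `half_decay_coeff(0.01, 256, 44100)` and `half_decay_coeff(0.5, 256, 44100)` give attack and release coefficients for the 10 ms and 500 ms half-decay times.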
In the first signal that was examined in
A suitable behavior of the release control is now described. In qualitative terms, if an event is detected, the gain is smoothed with the release time constant as specified above in Equation 7a. As time evolves past the detected event, and if no subsequent events are detected, the release time constant continually increases so that eventually the smoothed gain is “frozen” in place. If another event is detected, then the smoothing time constant is reset to the original value and the process repeats. In order to modulate the release time, one may first generate a control signal based on the detected event boundaries.
As discussed earlier, event boundaries may be detected by looking for changes in successive spectra of the audio signal. In this particular implementation, the DFT of each overlapping block x[n, t] may be computed to generate the STDFT of the audio signal x[n]:
$$X[k,t]=\sum_{n=0}^{M-1}x[n,t]\,e^{-j2\pi kn/M} \quad (9)$$
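Next, the difference between the normalized log magnitude spectra of successive blocks may be computed according to:
$$D[t]=\sum_{k}\left|X_{NORM}[k,t]-X_{NORM}[k,t-1]\right| \quad (10a)$$
where
$$X_{NORM}[k,t]=\log\left(\frac{|X[k,t]|}{\max_{k}\{|X[k,t]|\}}\right) \quad (10b)$$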
Here the maximum of |X[k,t]| across bins k is used for normalization, although one might employ other normalization factors; for example, the average of |X[k,t]| across bins. If the difference D[t] exceeds a threshold Dmin, then an event is considered to have occurred. Additionally, one may assign a strength to this event, lying between zero and one, based on the size of D[t] in comparison to a maximum threshold Dmax. The resulting auditory event strength signal A[t] may be computed as:
$$A[t]=\begin{cases}0, & D[t]\le D_{min}\\ \dfrac{D[t]-D_{min}}{D_{max}-D_{min}}, & D_{min}<D[t]<D_{max}\\ 1, & D[t]\ge D_{max}\end{cases} \quad (11)$$
By assigning a strength to the auditory event proportional to the amount of spectral change associated with that event, greater control over the dynamics processing is achieved in comparison to a binary event decision. The inventors have found that larger gain changes are acceptable during stronger events, and the signal in equation 11 allows such variable control.
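Equations 10 and 11 can be sketched as follows, assuming an STDFT array laid out as bins by blocks, a base-10 logarithm (the text does not specify the base), and a tiny hypothetical constant eps added to the magnitudes so the logarithm stays finite. Because the base of the logarithm scales D[t], Dmin and Dmax must be chosen for the same base.

```python
import numpy as np


def event_strength(X, d_min, d_max, eps=1e-12):
    """Auditory event strength A[t] in [0, 1] from an STDFT X[k, t].

    X: complex array shaped (num_bins, num_blocks).
    d_min, d_max: detection and saturation thresholds on D[t].
    eps: hypothetical constant that keeps the logarithm finite.
    """
    mag = np.abs(X) + eps
    # Log magnitude normalized by each block's maximum (eq. 10b).
    x_norm = np.log10(mag / mag.max(axis=0, keepdims=True))
    # Sum of absolute differences between successive blocks (eq. 10a);
    # the first block has no predecessor, so D[0] is set to 0.
    d = np.zeros(X.shape[1])
    d[1:] = np.sum(np.abs(np.diff(x_norm, axis=1)), axis=0)
    # 0 at or below d_min, 1 at or above d_max, linear in between (eq. 11).
    return np.clip((d - d_min) / (d_max - d_min), 0.0, 1.0)
```

Clipping a linear ramp reproduces the three cases of the piecewise definition in a single vectorized expression.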
The signal A[t] is an impulsive signal with an impulse occurring at the location of an event boundary. For the purposes of controlling the release time, one may further smooth the signal A[t] so that it decays smoothly to zero after the detection of an event boundary. The smoothed event control signal Ā[t] may be computed from A[t] according to:
$$\bar{A}[t]=\begin{cases}A[t], & A[t]>\alpha_{event}\bar{A}[t-1]\\ \alpha_{event}\bar{A}[t-1], & \text{otherwise}\end{cases} \quad (12)$$
Here αevent controls the decay time of the event control signal.
One may now use the event control signal Ā[t] to vary the release time constant used for smoothing the gain. When the control signal is equal to one, the smoothing coefficient α[t] from Equation 7a equals αrelease, as before, and when the control signal is equal to zero, the coefficient equals one so that the smoothed gain is prevented from changing. The smoothing coefficient is interpolated between these two extremes using the control signal according to:
$$\alpha[t]=\begin{cases}\alpha_{attack}, & G[t]<\bar{G}[t-1]\\ \bar{A}[t]\,\alpha_{release}+(1-\bar{A}[t]), & G[t]\ge\bar{G}[t-1]\end{cases} \quad (13)$$
By interpolating the smoothing coefficient continuously as a function of the event control signal, the release time is reset to a value proportionate to the event strength at the onset of an event and then increases smoothly to infinity after the occurrence of an event. The rate of this increase is dictated by the coefficient αevent used to generate the smoothed event control signal.
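The release control can be sketched as a single loop over blocks that combines the decaying event control signal with the interpolated smoothing coefficient. This is a minimal sketch, assuming per-block gains in dB, event strengths A[t] in [0, 1], and an initial state of 0 dB gain and no pending event; the names are hypothetical.

```python
import numpy as np


def event_controlled_smoothing(G, A, a_attack, a_release, a_event):
    """Smooth per-block gains G[t] (dB) with an event-controlled release.

    A: event strengths A[t] in [0, 1]; a_event sets the decay of A_bar.
    """
    g_bar = 0.0   # smoothed gain, assumed to start at 0 dB
    a_bar = 0.0   # smoothed event control signal, assumed to start at 0
    out = np.empty(len(G))
    for t, (g, a) in enumerate(zip(G, A)):
        # Follow a new event if it exceeds the decayed control signal,
        # otherwise let the control signal decay (eq. 12).
        a_bar = a if a > a_event * a_bar else a_event * a_bar
        if g < g_bar:
            alpha = a_attack  # attack: the gain is falling
        else:
            # Release: alpha_release when a_bar = 1, 1 (frozen) when a_bar = 0.
            alpha = a_bar * a_release + (1.0 - a_bar)
        g_bar = alpha * g_bar + (1.0 - alpha) * g
        out[t] = g_bar
    return out
```

With Ā[t] held at zero, the release coefficient is 1 and the gain stays frozen, which is the behavior described above; a fresh event restores the normal release.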
Loudness Based AGC and DRC
As an alternative to traditional dynamics processing techniques where signal modifications are a direct function of simple signal measurements such as Peak or RMS power, International Patent Application S.N. PCT/US2005/038579 discloses use of the psychoacoustic based loudness model described earlier as a framework within which to perform dynamics processing. Several advantages are cited. First, measurements and modifications are specified in units of sone, which is a more accurate measure of loudness perception than more basic measures such as Peak or RMS power. Secondly, the audio may be modified such that the perceived spectral balance of the original audio is maintained as the overall loudness is changed. This way, changes to the overall loudness become less perceptually apparent in comparison to a dynamics processor that utilizes a wideband gain, for example, to modify the audio. Lastly, the psychoacoustic model is inherently multi-band, and therefore the system is easily configured to perform multi-band dynamics processing in order to alleviate the well-known cross-spectral pumping problems associated with a wideband dynamics processor.
Although performing dynamics processing in this loudness domain already holds several advantages over more traditional dynamics processing, the technique may be further improved through the use of auditory events to control various parameters. Consider the audio segment containing piano chords as depicted in 27a and the associated DRC shown in
The loudness domain dynamics processing system that is now described consists of AGC followed by DRC. The goal of this combination is to make all processed audio have approximately the same perceived loudness while still maintaining at least some of the original audio's dynamics.
Auditory events may be utilized to control the attack and release of both the AGC and DRC. In the case of AGC, both the attack and release times are large in comparison to the temporal resolution of event perception, and therefore event control may be advantageously employed in both cases. With the DRC, the attack is relatively short, and therefore event control may be needed only for the release as with the traditional DRC described above.
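As discussed earlier, one may use the specific loudness spectrum associated with the employed loudness model for the purposes of event detection. A difference signal D[t], similar to the one in Equations 10a and b, may be computed from the specific loudness N[b,t], defined in Equation 2, as follows:
$$D[t]=\sum_{b}\left|N_{NORM}[b,t]-N_{NORM}[b,t-1]\right| \quad (14a)$$
where
$$N_{NORM}[b,t]=\log\left(\frac{N[b,t]}{\max_{b}\{N[b,t]\}}\right) \quad (14b)$$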
Here the maximum of |N[b,t]| across frequency bands b is used for normalization, although one might employ other normalization factors; for example, the average of |N[b,t]| across frequency bands. If the difference D[t] exceeds a threshold Dmin, then an event is considered to have occurred. The difference signal may then be processed in the same way shown in Equations 11 and 12 to generate a smooth event control signal Ā[t] used to control the attack and release times.
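The AGC curve depicted in … may be represented as a function that takes as its input a measure of loudness and generates a desired output loudness: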
$$L_o = F_{AGC}\{L_i\} \tag{15a}$$
The DRC curve may be similarly represented:
$$L_o = F_{DRC}\{L_i\} \tag{15b}$$
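The text does not specify the shape of F_AGC or F_DRC. Purely as an illustration, the sketch below assumes a single-slope curve that is a straight line on log-log axes (sone in, sone out), pivoting about a reference loudness; make_loudness_curve, ref_sone and ratio are hypothetical names. A ratio of 2 halves every change in log loudness, the loudness-domain analogue of 2:1 compression. A practical AGC or DRC curve would typically add thresholds and flat regions.

```python
from typing import Callable


def make_loudness_curve(ref_sone: float, ratio: float,
                        floor_sone: float = 1e-6) -> Callable[[float], float]:
    """Return a hypothetical curve F{L_i} -> L_o, both in sone.

    On log-log axes the curve is a straight line through (ref_sone, ref_sone)
    with slope 1/ratio: loudness above ref_sone is pulled down toward it and
    loudness below is pushed up. ratio == 1 gives the identity (no change).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")

    def curve(L_i: float) -> float:
        L_i = max(L_i, floor_sone)  # keep silent input away from zero
        return ref_sone * (L_i / ref_sone) ** (1.0 / ratio)

    return curve


F_drc = make_loudness_curve(ref_sone=10.0, ratio=2.0)
print(F_drc(40.0))  # 20.0: a 4x rise in input loudness becomes a 2x rise
```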
For the AGC, the input loudness is a measure of the audio's long-term loudness. One may compute such a measure by smoothing the instantaneous loudness L[t], defined in Equation 3, using relatively long time constants (on the order of several seconds). It has been shown that in judging an audio segment's long-term loudness, humans weight the louder portions more heavily than the softer ones, and one may use a faster attack than release in the smoothing to simulate this effect. With the incorporation of event control for both the attack and release, the long-term loudness used for determining the AGC modification may therefore be computed according to:
$$L_{AGC}[t] = \alpha_{AGC}[t]\,L_{AGC}[t-1] + \left(1-\alpha_{AGC}[t]\right)L[t] \tag{16a}$$
where
In addition, one may compute an associated long-term specific loudness spectrum that will later be used for the multi-band DRC:
$$N_{AGC}[b,t] = \alpha_{AGC}[t]\,N_{AGC}[b,t-1] + \left(1-\alpha_{AGC}[t]\right)N[b,t] \tag{16c}$$
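The recursions in Equations 16a and 16c can be sketched as below. The definition of α_AGC[t] is not given in the text above, so the code assumes a blending rule in which Ā[t] = 1 leaves the attack or release coefficient unchanged and Ā[t] = 0 drives the coefficient to 1, holding the estimate. The attack/release choice compares L[t] with L_AGC[t−1]. Function names and the initialization L_AGC[0] = L[0] are hypothetical.

```python
from typing import List, Sequence, Tuple


def event_controlled_alpha(a_bar: float, alpha_mode: float) -> float:
    """Assumed blending rule: a_bar = 1 keeps the normal coefficient,
    a_bar = 0 gives 1.0, which freezes the smoother (no change)."""
    return a_bar * alpha_mode + (1.0 - a_bar)


def agc_long_term_loudness(
    L: Sequence[float],
    N: Sequence[Sequence[float]],
    A_bar: Sequence[float],
    alpha_attack: float,
    alpha_release: float,
) -> Tuple[List[float], List[List[float]]]:
    """One-pole smoothing of L[t] and N[b,t] with a shared coefficient per frame."""
    L_agc = [L[0]]          # initialized to the first frame (an assumption)
    N_agc = [list(N[0])]
    for t in range(1, len(L)):
        # Attack when the loudness rises above the running estimate, else release.
        alpha_mode = alpha_attack if L[t] > L_agc[-1] else alpha_release
        a = event_controlled_alpha(A_bar[t], alpha_mode)
        L_agc.append(a * L_agc[-1] + (1.0 - a) * L[t])
        N_agc.append([a * p + (1.0 - a) * n for p, n in zip(N_agc[-1], N[t])])
    return L_agc, N_agc


L_agc, N_agc = agc_long_term_loudness(
    L=[1.0, 3.0, 3.0], N=[[1.0], [3.0], [3.0]],
    A_bar=[1.0, 1.0, 0.0], alpha_attack=0.5, alpha_release=0.9)
print(L_agc)  # [1.0, 2.0, 2.0]: the last frame has A_bar = 0, so the estimate holds
```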
In practice, one may choose the smoothing coefficients such that the attack time is approximately half that of the release. Given the long-term loudness measure, one may then compute the loudness modification scaling associated with the AGC as the ratio of the output loudness to the input loudness:
$$S_{AGC}[t] = \frac{F_{AGC}\{L_{AGC}[t]\}}{L_{AGC}[t]} \tag{17}$$
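Choosing an attack time about half the release time requires a mapping from time constant to smoothing coefficient. The sketch below assumes the common one-pole convention α = exp(−1/(τ·f_frame)), where f_frame is the analysis frame rate, and also computes the AGC scaling S_AGC[t] as the ratio of output to input loudness. The frame rate, the time-constant values and the helper names are hypothetical.

```python
import math
from typing import Callable


def smoothing_coefficient(time_constant_s: float, frame_rate_hz: float) -> float:
    """One-pole coefficient alpha = exp(-1 / (tau * frame_rate)).

    A common convention (assumed here): after tau seconds of a step input the
    smoother has covered about 63% of the step.
    """
    return math.exp(-1.0 / (time_constant_s * frame_rate_hz))


def agc_scaling(L_agc: float, F_agc: Callable[[float], float],
                eps: float = 1e-9) -> float:
    """S_AGC[t] = F_AGC{L_AGC[t]} / L_AGC[t]; eps guards against silence."""
    return F_agc(L_agc) / max(L_agc, eps)


frame_rate = 48000 / 512          # hypothetical: 48 kHz audio, 512-sample hop
release_s = 4.0                   # "several seconds" (hypothetical value)
attack_s = release_s / 2          # attack time about half the release time
alpha_attack = smoothing_coefficient(attack_s, frame_rate)
alpha_release = smoothing_coefficient(release_s, frame_rate)
print(alpha_attack < alpha_release)         # True: the attack tracks faster
print(agc_scaling(20.0, lambda L: 10.0))    # 0.5 for a hypothetical flat target
```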
The DRC modification may now be computed from the loudness after the application of the AGC scaling. Rather than smooth a measure of the loudness prior to the application of the DRC curve, one may alternatively apply the DRC curve to the instantaneous loudness and then subsequently smooth the resulting modification. This is similar to the technique described earlier for smoothing the gain of the traditional DRC. In addition, the DRC may be applied in a multi-band fashion, meaning that the DRC modification is a function of the specific loudness N[b, t] in each band b, rather than the overall loudness L[t]. However, in order to maintain the average spectral balance of the original audio, one may apply DRC to each band such that the resulting modifications have the same average effect as would result from applying DRC to the overall loudness. This may be achieved by scaling each band by the ratio of the long-term overall loudness (after the application of the AGC scaling) to the long-term specific loudness, and using this value as the argument to the DRC function. The result is then rescaled by the inverse of said ratio to produce the output specific loudness. Thus, the DRC scaling in each band may be computed according to:
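The per-band DRC computation can be sketched as follows. Because the formula itself does not appear above, the code encodes one reading of the prose, stated here as an assumption: each band's specific loudness N[b,t] is multiplied by ratio = S_AGC[t]·L_AGC[t]/N_AGC[b,t], F_DRC is evaluated at the result x, and rescaling by the inverse ratio and dividing by N[b,t] leaves S_DRC[b,t] = F_DRC{x}/x. A band sitting at its long-term level therefore receives the same modification that the overall post-AGC loudness would. The function name and the curve in the usage lines are hypothetical.

```python
from typing import Callable, List, Sequence


def drc_band_scaling(
    N_t: Sequence[float],        # instantaneous specific loudness N[b,t]
    N_agc_t: Sequence[float],    # long-term specific loudness N_AGC[b,t]
    L_agc_t: float,              # long-term overall loudness L_AGC[t]
    S_agc_t: float,              # AGC scaling S_AGC[t]
    F_drc: Callable[[float], float],
    eps: float = 1e-9,
) -> List[float]:
    """Per-band DRC scaling under one reading of the description.

    ratio = S_AGC * L_AGC / N_AGC[b] maps a band at its long-term level onto the
    post-AGC overall loudness. F_DRC is applied at x = ratio * N[b,t]; rescaling
    by 1/ratio and dividing by N[b,t] simplifies to F_DRC(x) / x.
    """
    scalings = []
    for n, n_lt in zip(N_t, N_agc_t):
        ratio = S_agc_t * L_agc_t / max(n_lt, eps)
        x = max(ratio * n, eps)
        scalings.append(F_drc(x) / x)
    return scalings


def F_drc(x: float) -> float:
    """Hypothetical 2:1 loudness curve pivoting at 10 sone."""
    return 10.0 * (x / 10.0) ** 0.5


print(drc_band_scaling([4.0], [2.0], 5.0, 2.0, F_drc))  # about [0.7071]
```

With an identity curve F_DRC{x} = x, every band's scaling is exactly 1, which serves as a quick sanity check.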
The AGC and DRC modifications may then be combined to form a total loudness scaling per band:
$$S_{TOT}[b,t] = S_{AGC}[t]\,S_{DRC}[b,t] \tag{19}$$
This total scaling may then be smoothed across time independently for each band, with a fast attack, a slow release, and event control applied to the release only. Ideally, smoothing is performed on the logarithm of the scaling, analogous to smoothing the gains of the traditional DRC in their decibel representation, though this is not essential. To ensure that the smoothed total scaling moves in sync with the specific loudness in each band, attack and release modes may be determined through the simultaneous smoothing of the specific loudness itself:
where
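The per-band smoothing of S_TOT[b,t] from Equation 19 can be sketched as below. The smoothing equations are not shown above, so the code assumes a one-pole recursion on log S_TOT (the log-domain option mentioned in the text), an attack/release decision made by comparing N[b,t] with a specific loudness smoothed by the same coefficient, and an event-controlled release coefficient of the form Ā[t]·α_release + (1 − Ā[t]). All names are hypothetical, and every scaling must be positive for the logarithm to exist.

```python
import math
from typing import List, Sequence


def smooth_total_scaling(
    S_agc: Sequence[float],               # S_AGC[t]
    S_drc: Sequence[Sequence[float]],     # S_DRC[b,t] as S_drc[t][b], all > 0
    N: Sequence[Sequence[float]],         # specific loudness N[b,t] as N[t][b]
    A_bar: Sequence[float],               # event control signal in [0, 1]
    alpha_attack: float,
    alpha_release: float,
) -> List[List[float]]:
    """Smooth S_TOT[b,t] = S_AGC[t] * S_DRC[b,t] per band in the log domain.

    The attack/release decision in each band compares N[b,t] with a specific
    loudness smoothed by the same coefficient, so the two move in step.
    Event control (assumed blending rule) acts on the release only.
    """
    log_s = [math.log(S_agc[0] * s) for s in S_drc[0]]
    n_bar = list(N[0])
    out = [[math.exp(v) for v in log_s]]
    for t in range(1, len(N)):
        for b in range(len(log_s)):
            if N[t][b] > n_bar[b]:
                a = alpha_attack                                  # fast attack
            else:
                a = A_bar[t] * alpha_release + (1.0 - A_bar[t])   # event-gated release
            log_s[b] = a * log_s[b] + (1.0 - a) * math.log(S_agc[t] * S_drc[t][b])
            n_bar[b] = a * n_bar[b] + (1.0 - a) * N[t][b]
        out.append([math.exp(v) for v in log_s])
    return out


smoothed = smooth_total_scaling([1.0, 1.0], [[1.0], [4.0]], [[1.0], [2.0]],
                                [1.0, 1.0], alpha_attack=0.5, alpha_release=0.9)
print(smoothed[1])  # close to [2.0]
```

In the usage call the target scaling jumps from 1 to 4 while the band's loudness rises, so the attack branch applies and the smoothed value lands at 2, halfway on a logarithmic scale.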
Finally, one may compute a target specific loudness by applying the smoothed total scaling, written here as \(\bar{S}_{TOT}[b,t]\), to the original specific loudness
$$\hat{N}[b,t] = \bar{S}_{TOT}[b,t]\,N[b,t] \tag{21}$$
and then solve for gains G[b,t] that, when applied to the original excitation E[b,t], result in a specific loudness equal to the target:
$$\hat{N}[b,t] = \Psi\{G^2[b,t]\,E[b,t]\} \tag{22}$$
The gains may be applied to each band of the filterbank used to compute the excitation, and the modified audio may then be generated by inverting the filterbank to produce a modified time domain audio signal.
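Solving Equation 22 for G[b,t] requires inverting Ψ, which is not specified in this section. The sketch below assumes only that Ψ increases with excitation and finds each band's gain by bisection on log G. The power-law psi in the usage lines is a hypothetical stand-in for the model's actual specific loudness function, for which a closed-form inverse may not exist. Applying the gains to the filterbank and inverting it are not shown.

```python
import math
from typing import Callable


def solve_band_gain(N_target: float, E: float, psi: Callable[[float], float],
                    g_lo: float = 1e-4, g_hi: float = 1e4, iters: int = 100) -> float:
    """Find G with psi(G**2 * E) == N_target by bisection on log G.

    Assumes psi is increasing in excitation and the solution lies in [g_lo, g_hi].
    """
    lo, hi = math.log(g_lo), math.log(g_hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi(math.exp(2.0 * mid) * E) < N_target:
            lo = mid   # still too quiet: the gain must be larger
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


def psi(e: float) -> float:
    """Hypothetical stand-in for the specific loudness function: a power law."""
    return 0.05 * e ** 0.25


E = 1.0e4                       # excitation in one band (arbitrary units)
N = psi(E)                      # original specific loudness
N_target = 2.0 * N              # a smoothed total scaling of 2 applied to N
G = solve_band_gain(N_target, E, psi)
print(round(G, 6))              # 4.0, since psi(G**2 * E) grows as G**0.5 here
```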
While the discussion above has focused on the control of AGC and DRC attack and release parameters via auditory scene analysis of the audio being processed, other important parameters may also benefit from being controlled via the ASA results. For example, the event control signal Ā[t] from Equation 12 may be used to vary the value of the DRC ratio parameter that is used to dynamically adjust the gain of the audio. The ratio parameter, like the attack and release time parameters, may contribute significantly to the perceptual artifacts introduced by dynamic gain adjustments.
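The text does not say how Ā[t] should be mapped onto the ratio, or whether events should raise or lower the amount of compression. As a hypothetical illustration only, the sketch assumes Ā[t] lies in [0, 1], interpolates linearly between two user-chosen ratios, ratio_event and ratio_steady, and feeds the result to a single-slope loudness curve that is itself an assumption. All names and values are illustrative.

```python
def event_controlled_ratio(a_bar: float, ratio_event: float,
                           ratio_steady: float) -> float:
    """Blend two hypothetical DRC ratios with the event control signal.

    a_bar = 1 (at an auditory event) selects ratio_event; a_bar = 0 selects
    ratio_steady; values in between interpolate linearly.
    """
    if not 0.0 <= a_bar <= 1.0:
        raise ValueError("a_bar must lie in [0, 1]")
    return a_bar * ratio_event + (1.0 - a_bar) * ratio_steady


def drc_curve(L_i: float, ref_sone: float, ratio: float) -> float:
    """Hypothetical loudness-domain DRC curve with slope 1/ratio on log axes."""
    return ref_sone * (L_i / ref_sone) ** (1.0 / ratio)


for a_bar in (0.0, 0.5, 1.0):
    r = event_controlled_ratio(a_bar, ratio_event=3.0, ratio_steady=1.5)
    print(a_bar, r, round(drc_curve(40.0, 10.0, r), 3))
```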
The invention may be implemented in hardware or software, or a combination of both (e.g., programmable logic arrays). Unless otherwise specified, the algorithms included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs executing on one or more programmable computer systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.
Each such program may be implemented in any desired computer language (including machine, assembly, or high level procedural, logical, or object oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language.
Each such computer program is preferably stored on or downloaded to a storage medium or device (e.g., solid state memory or media, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer system to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.
A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, some of the steps described herein may be order independent, and thus may be performed in an order different from that described.
It should be understood that implementation of other variations and modifications of the invention and its various aspects will be apparent to those skilled in the art, and that the invention is not limited by these specific embodiments described. It is therefore contemplated to cover by the present invention any and all modifications, variations, or equivalents that fall within the true spirit and scope of the basic underlying principles disclosed and claimed herein.
The following patents, patent applications and publications are hereby incorporated by reference, each in their entirety.
This application is a continuation of U.S. patent application Ser. No. 17/839,099, filed Jun. 13, 2022, which is a continuation of U.S. patent application Ser. No. 17/093,178, filed Nov. 9, 2020, now U.S. Pat. No. 11,362,631, which is a continuation of U.S. patent application Ser. No. 16/729,468 filed Dec. 29, 2019, now U.S. Pat. No. 10,833,644, which is a divisional of U.S. patent application Ser. No. 16/365,947 filed on Mar. 27, 2019, now U.S. Pat. No. 10,523,169, which is a continuation of U.S. patent application Ser. No. 16/128,642 filed on Sep. 12, 2018, now U.S. Pat. No. 10,284,159, which is a continuation application of U.S. patent application Ser. No. 15/809,413 filed on Nov. 10, 2017, now U.S. Pat. No. 10,103,700, which is a continuation of U.S. patent application Ser. No. 15/447,564 filed on Mar. 2, 2017, now U.S. Pat. No. 9,866,191, which is a continuation of U.S. patent application Ser. No. 15/238,820 filed on Aug. 17, 2016, now U.S. Pat. No. 9,685,924, which is a continuation of U.S. patent application Ser. No. 13/850,380 filed on Mar. 26, 2013, now U.S. Pat. No. 9,450,551, which is a continuation of U.S. patent application Ser. No. 13/464,102 filed on May 4, 2012, now U.S. Pat. No. 8,428,270, which is a continuation of U.S. patent application Ser. No. 13/406,929 filed on Feb. 28, 2012, now U.S. Pat. No. 9,136,810, which is a continuation of U.S. patent application Ser. No. 12/226,698 filed on Jan. 19, 2009, now U.S. Pat. No. 8,144,881, which is a national application of PCT application No. PCT/US2007/008313 filed Mar. 30, 2007, which claims the benefit of the filing date of U.S. Provisional Patent Application No. 60/795,808 filed on Apr. 27, 2006, all of which are hereby incorporated by reference.
U.S. Patent Documents

Number | Name | Date | Kind |
---|---|---|---|
2808475 | Stryker | Oct 1957 | A |
4281218 | Chuang et al. | Jul 1981 | A |
4543537 | Kuhn et al. | Sep 1985 | A |
4624009 | Glenn et al. | Nov 1986 | A |
4739514 | Short et al. | Apr 1988 | A |
4882762 | Waldhauer | Nov 1989 | A |
4887299 | Cummins et al. | Dec 1989 | A |
5027410 | Williamson et al. | Jun 1991 | A |
5097510 | Graupe | Mar 1992 | A |
5172358 | Kimura | Dec 1992 | A |
5278912 | Waldhauer | Jan 1994 | A |
5363147 | Joseph et al. | Nov 1994 | A |
5369711 | Williamson, III | Nov 1994 | A |
5377277 | Bisping | Dec 1994 | A |
RE34961 | Widin et al. | Jun 1995 | E |
5457769 | Valley | Oct 1995 | A |
5463695 | Werrbach | Oct 1995 | A |
5500902 | Stockham, Jr. et al. | Mar 1996 | A |
5530760 | Paisley | Jun 1996 | A |
5548538 | Grace et al. | Aug 1996 | A |
5583962 | Davis et al. | Dec 1996 | A |
5615270 | Miller et al. | Mar 1997 | A |
5632005 | Davis et al. | May 1997 | A |
5633981 | Davis | May 1997 | A |
5649060 | Ellozy et al. | Jul 1997 | A |
5663727 | Vokac | Sep 1997 | A |
5682463 | Allen et al. | Oct 1997 | A |
5712954 | Dezonno | Jan 1998 | A |
5724433 | Engebretson et al. | Mar 1998 | A |
5727119 | Davidson et al. | Mar 1998 | A |
5819247 | Freund et al. | Oct 1998 | A |
5848171 | Stockham, Jr. et al. | Dec 1998 | A |
5862228 | Davis | Jan 1999 | A |
5878391 | Aarts | Mar 1999 | A |
5907622 | Dougherty | May 1999 | A |
5909664 | Davis et al. | Jun 1999 | A |
6002776 | Bhadkamkar et al. | Dec 1999 | A |
6002966 | Loeb et al. | Dec 1999 | A |
6021386 | Davis et al. | Feb 2000 | A |
6041295 | Hinderks | Mar 2000 | A |
6061647 | Barrett | May 2000 | A |
6088461 | Lin et al. | Jul 2000 | A |
6094489 | Ishige et al. | Jul 2000 | A |
6108431 | Bachler | Aug 2000 | A |
6125343 | Schuster | Sep 2000 | A |
6148085 | Jung | Nov 2000 | A |
6182033 | Accardi et al. | Jan 2001 | B1 |
6185309 | Attias | Feb 2001 | B1 |
6233554 | Heimbigner et al. | May 2001 | B1 |
6263371 | Geagan, III et al. | Jul 2001 | B1 |
6272360 | Yamaguchi et al. | Aug 2001 | B1 |
6275795 | Tzirkel-Hancock | Aug 2001 | B1 |
6298139 | Poulsen et al. | Oct 2001 | B1 |
6301555 | Hinderks | Oct 2001 | B2 |
6311155 | Vaudrey et al. | Oct 2001 | B1 |
6314396 | Monkowski | Nov 2001 | B1 |
6327366 | Uvacek et al. | Dec 2001 | B1 |
6332119 | Hinderks | Dec 2001 | B1 |
6351731 | Anderson et al. | Feb 2002 | B1 |
6351733 | Saunders et al. | Feb 2002 | B1 |
6353671 | Kandel et al. | Mar 2002 | B1 |
6370255 | Schaub et al. | Apr 2002 | B1 |
6411927 | Morin et al. | Jun 2002 | B1 |
6430533 | Kolluru et al. | Aug 2002 | B1 |
6442278 | Vaudrey et al. | Aug 2002 | B1 |
6442281 | Sato et al. | Aug 2002 | B2 |
6473731 | Hinderks | Oct 2002 | B2 |
6498855 | Kokkosoulis et al. | Dec 2002 | B1 |
6529605 | Christoph | Mar 2003 | B1 |
6570991 | Scheirer et al. | May 2003 | B1 |
6625433 | Poirier et al. | Sep 2003 | B1 |
6639989 | Zacharov et al. | Oct 2003 | B1 |
6650755 | Vaudrey et al. | Nov 2003 | B2 |
6651040 | Bakis | Nov 2003 | B1 |
6651041 | Juric | Nov 2003 | B1 |
6700982 | Geurts et al. | Mar 2004 | B1 |
6731767 | Blamey | May 2004 | B1 |
6807525 | Li et al. | Oct 2004 | B1 |
6823303 | Su et al. | Nov 2004 | B1 |
6889186 | Michaelis | May 2005 | B1 |
6958644 | Palaskas | Oct 2005 | B2 |
6985594 | Vaudrey et al. | Jan 2006 | B1 |
7065498 | Thomas et al. | Jun 2006 | B1 |
7068723 | Foote et al. | Jun 2006 | B2 |
7095860 | Kemp | Aug 2006 | B1 |
7155385 | Berestesky et al. | Dec 2006 | B2 |
7171272 | Blamey et al. | Jan 2007 | B2 |
7212640 | Bizjak | May 2007 | B2 |
7283954 | Crockett | Oct 2007 | B2 |
7454331 | Vinton | Nov 2008 | B2 |
7461002 | Crockett | Dec 2008 | B2 |
7508947 | Smithers | Mar 2009 | B2 |
7551745 | Gundry | Jun 2009 | B2 |
7610205 | Crockett | Oct 2009 | B2 |
7617109 | Smithers | Nov 2009 | B2 |
7711123 | Crockett | May 2010 | B2 |
8054948 | Gailloux | Nov 2011 | B1 |
8090120 | Seefeldt | Jan 2012 | B2 |
8144881 | Crockett | Mar 2012 | B2 |
8230243 | Fujiwara | Jul 2012 | B2 |
9780751 | Crockett | Oct 2017 | B2 |
10103700 | Crockett | Oct 2018 | B2 |
10284159 | Crockett | May 2019 | B2 |
10833644 | Crockett | Nov 2020 | B2 |
11362631 | Crockett | Jun 2022 | B2 |
11711060 | Crockett | Jul 2023 | B2 |
20010027393 | Touimi et al. | Oct 2001 | A1 |
20010038643 | McParland | Nov 2001 | A1 |
20020013698 | Vaudrey et al. | Jan 2002 | A1 |
20020040295 | Saunders et al. | Apr 2002 | A1 |
20020051546 | Bizjak | May 2002 | A1 |
20020076072 | Cornelisse | Jun 2002 | A1 |
20020097882 | Greenberg et al. | Jul 2002 | A1 |
20020146137 | Kuhnel et al. | Oct 2002 | A1 |
20020147595 | Baumgarte | Oct 2002 | A1 |
20020173864 | Smith | Nov 2002 | A1 |
20030002683 | Vaudrey et al. | Jan 2003 | A1 |
20030035549 | Bizjak et al. | Feb 2003 | A1 |
20030223597 | Puria et al. | Dec 2003 | A1 |
20040013272 | Reams | Jan 2004 | A1 |
20040024591 | Boillot et al. | Feb 2004 | A1 |
20040037421 | Truman | Feb 2004 | A1 |
20040042617 | Beerends et al. | Mar 2004 | A1 |
20040044525 | Vinton et al. | Mar 2004 | A1 |
20040076302 | Christoph | Apr 2004 | A1 |
20040122662 | Crockett | Jun 2004 | A1 |
20040148159 | Crockett et al. | Jul 2004 | A1 |
20040165730 | Crockett | Aug 2004 | A1 |
20040172240 | Crockett et al. | Sep 2004 | A1 |
20040184537 | Geiger et al. | Sep 2004 | A1 |
20040190740 | Chalupper et al. | Sep 2004 | A1 |
20040213420 | Gundry et al. | Oct 2004 | A1 |
20050013443 | Marumoto | Jan 2005 | A1 |
20050027766 | Ben | Feb 2005 | A1 |
20050267744 | Nettre | Dec 2005 | A1 |
20050276425 | Forrester et al. | Dec 2005 | A1 |
20060002572 | Smithers et al. | Jan 2006 | A1 |
20060029239 | Smithers | Feb 2006 | A1 |
20060126865 | Blamey | Jun 2006 | A1 |
20060182290 | Yano | Aug 2006 | A1 |
20060215852 | Troxel | Sep 2006 | A1 |
20070258603 | Avendano | Nov 2007 | A1 |
20070274538 | Van Reck | Nov 2007 | A1 |
20070291959 | Seefeldt | Dec 2007 | A1 |
20090161883 | Katsianos | Jun 2009 | A1 |
20090220109 | Crockett et al. | Sep 2009 | A1 |
20120163629 | Seefeldt | Jun 2012 | A1 |
20120321096 | Crockett et al. | Dec 2012 | A1 |
20140376729 | Crockett | Dec 2014 | A1 |
Foreign Patent Documents

Number | Date | Country |
---|---|---|
4335739 | May 1994 | DE |
19848491 | Apr 2000 | DE |
0517233 | Dec 1992 | EP |
0525544 | Feb 1993 | EP |
0637011 | Feb 1995 | EP |
0661905 | Oct 1995 | EP |
0746116 | Dec 1996 | EP |
1239269 | Sep 2002 | EP |
1251715 | Oct 2002 | EP |
1387487 | Feb 2004 | EP |
1601171 | Nov 2005 | EP |
1628397 | Feb 2006 | EP |
1736966 | Nov 2007 | EP |
2820573 | Aug 2002 | FR |
10074097 | Jul 1996 | JP |
2004-527000 | Sep 2004 | JP |
2004-528601 | Sep 2004 | JP |
2004-356894 | Dec 2004 | JP |
940003351 | Apr 1994 | KR |
720691 | Mar 1980 | SU |
318926 | Nov 1997 | TW |
563094 | Nov 2003 | TW |
9820482 | May 1998 | WO |
9827543 | Jun 1998 | WO |
WO9824082 | Jun 1998 | WO |
9929114 | Jun 1999 | WO |
0019414 | Apr 2000 | WO |
0078093 | Dec 2000 | WO |
0215587 | Feb 2002 | WO |
0217678 | Feb 2002 | WO |
03090208 | Oct 2003 | WO |
2004019656 | Mar 2004 | WO |
2004073178 | Aug 2004 | WO |
2004111994 | Dec 2004 | WO |
2005086139 | Sep 2005 | WO |
2005104360 | Nov 2005 | WO |
2006006977 | Jan 2006 | WO |
2006003536 | Jan 2006 | WO |
2006019719 | Feb 2006 | WO |
2006047600 | May 2006 | WO |
2006058361 | Jun 2006 | WO |
2006113047 | Oct 2006 | WO |
2006113062 | Oct 2006 | WO |
2007016107 | Feb 2007 | WO |
2007120452 | Oct 2007 | WO |
2007120453 | Oct 2007 | WO |
2007123608 | Nov 2007 | WO |
2007127023 | Nov 2007 | WO |
2008051347 | May 2008 | WO |
2008057173 | May 2008 | WO |
2008085330 | Jul 2008 | WO |
2008115445 | Sep 2008 | WO |
2008156774 | Dec 2008 | WO |
Other References

Reference |
---|
Atkinson, I. A., et al., “Time Envelope LP Vocoder: A New Coding Technology at Very Low Bit Rates,” 4th ed., 1995, ISSN 1018-4074, pp. 241-244. |
ATSC Standard A/52A: Digital Audio Compression Standard (AC-3), Revision A, Advanced Television Systems Committee, Aug. 20, 2001. The A/52A document is available on the World Wide Web at http://www./atsc.org..sub.--standards.html. |
Australian Broadcasting Authority (ABA), “Investigation into Loudness of Advertisements,” Jul. 2002. |
Ballou, Glen M., “Handbook for Sound Engineers, The New Audio Cyclopedia,” 2nd Edition, Dynamics, pp. 850-851, Focal Press, an imprint of Butterworth-Heinemann, 1998. |
Belger, “The Loudness Balance of Audio Broadcast Programs,” J. Audio Eng. Soc., vol. 17, No. 3, Jun. 1969, pp. 282-285. |
Bertsekas, Dimitri P., “Nonlinear Programming,” 1995, Chapter 1.2 “Gradient Methods—Convergence,” pp. 18-46. |
Bertsekas, Dimitri P., “Nonlinear Programming,” 1995, Chapter 1.8 “Nonderivative Methods,” pp. 142-148. |
Blesser, Barry, “An Ultra-Miniature Console Compression System with Maximum User Flexibility,” presented at the 41st Convention, Oct. 5-8, 1971. |
Blesser, Barry, “An Ultraminiature Console Compression System with Maximum User Flexibility” Journal of Audio Engineering Society, vol. 20, No. 4, May 1972, pp. 297-302, New York. |
Bosi, et al., “High Quality, Low-Rate Audio Transform Coding for Transmission and Multimedia Applications,” Audio Engineering Society Preprint 3365, 93rd AES Convention, Oct. 1992. |
Bosi, et al., “ISO/IEC MPEG-2 Advanced Audio Coding,” J. Audio Eng. Soc., vol. 45, No. 10, Oct. 1997, pp. 789-814. |
Brandenburg, et al., “Overview of MPEG Audio: Current and Future Standards for Low-Bit-Rate Audio Coding,” J. Audio Eng. Soc., vol. 45, No. 1/2, Jan./Feb. 1997. |
Bray, et al.; “An ‘Optimized’ Platform for DSP Hearing Aids,” Sonic Innovations, vol. 1 No. 3 1998, pp. 1-4, presented at the Conference on Advanced Signal Processing Hearing Aids, Cleveland, OH, Aug. 1, 1998. |
Bray, et al.; “Digital Signal Processing (DSP) Derived from a Nonlinear Auditory Model,” Sonic Innovations, vol. 1 No. 1 1998, pp. 1-3, presented at American Academy of Audiology, Los Angeles, CA, Apr. 4, 1998. |
Carroll, Tim, “Audio Metadata: You can get there from here,” Oct. 11, 2004, pp. 1-4, XP002392570. http://tvtechnology.com/features/audio_notes/f-TC-metadata-08.21.02-.shtml. |
CEI/IEC Standard 60804 published Oct. 2000. |
Chalupper, Josef; “Aural Exciter and Loudness Maximizer: What's Psychoacoustic about Psychoacoustic Processors?,” Audio Engineering Society (AES) 108th Convention, Sep. 22-25, 2000, Los Angeles, CA, pp. 1-20. |
Cheng-Chieh Lee, “Diversity Control Among Multiple Coders: A Simple Approach to Multiple Descriptions,” IEE, September. |
Claro Digital Perception Processing; “Sound Processing with a Human Perspective,” pp. 1-8. |
Crockett, Brett, “High Quality Multichannel Time Scaling and Pitch-Shifting using Auditory Scene Analysis,” Audio Engineering Society Convention Paper 5948, New York, Oct. 2003. |
Crockett, et al., “A Method for Characterizing and Identifying Audio Based on Auditory Scene Analysis,” Audio Engineering Society Convention Paper 6416, 118th Convention, Barcelona, May 28-31, 2005. |
Davis, Mark, “The AC-3 Multichannel Coder,” Audio Engineering Society, Preprint 3774, 95th AES Convention, Oct. 1993. |
Fielder, et al., “Introduction to Dolby Digital Plus, an Enhancement to the Dolby Digital Coding System,” AES Convention Paper 6196, 117th AES Convention, Oct. 28, 2004. |
Fielder, et al., “Professional Audio Coder Optimized for Use with Video,” AES Preprint 5033, 107th AES Conference, Aug. 1999. |
Ghent, Jr., et al.; “Expansion as a Sound Processing Tool in Hearing Aids,” American Academy of Audiology National Convention, Apr. 29-May 2, 1999, Miami Beach, FL. |
Ghent, Jr., et al.; “Uses of Expansion to Promote Listening Comfort with Hearing Aids,” American Academy of Audiology 12th Annual Convention, Mar. 16-19, 2000, Chicago, IL. |
Ghent, Jr., et al.; “Uses of Expansion to Promote Listening Comfort with Hearing Aids,” Sonic Innovations, vol. 3 No. 2, 2000, pp. 1-4, presented at American Academy of Audiology 12th Annual Convention, Chicago, IL, Mar. 16-19, 2000. |
Glasberg, et al., “A Model of Loudness Applicable to Time-Varying Sounds,” Journal of the Audio Engineering Society, Audio Engineering Society, New York, vol. 50, No. 5, May 2002, pp. 331-342. |
Guide to the Use of the ATSC Digital Television Standard, Dec. 4, 2003. |
H. H. Scott, “The Amplifier and its Place in the High Fidelity System,” J. Audio Eng. Soc., vol. 1, No. 3, Jul. 1953. |
Hauenstein M., “A Computationally Efficient Algorithm for Calculating Loudness Patterns of Narrowband Speech,” Acoustics, Speech and Signal Processing 1997. 1997 IEEE International Conference, Munich Germany, Apr. 21-24, 1997, Los Alamitos, Ca, USA, IEEE Comput. Soc., US, Apr. 21, 1997, pp. 1311-1314. |
Hennesand, et al., “Sound Design—Creating the Sound for Complex Systems and Virtual Objects,” Chapter II, “Anatomy and Psychoacoustics,” 2003-2004. |
Hoeg, W., et al., “Dynamic Range Control (DRC) and Music/Speech Control (MSC) Programme-Associated Data Services for DAB,” EBU Review Technical, European Broadcasting Union, Brussels, BE, No. 261, Sep. 21, 1994, pp. 56-70. |
ISO226 : 1987 (E), “Acoustics—Normal Equal Loudness Level Contours.” |
Johns, et al.; “An Advanced Graphic Equalizer Hearing Aid: Going Beyond Your Home Audio System,” Sonic Innovations Corporation, Mar. 5, 2001, http://www.audiologyonline.com/articles/pf_arc_disp.asp?id=279. |
Laroche, Jean, “Autocorrelation Method for High-Quality Time/Pitch-Scaling”, Final Program and Paper Summaries, 1993 IEEE Workshop on New Paltz, NY. Oct. 17-20, 1993. pp. 131-134. |
Li, M., et al., “Wavelet-based Nonlinear AGC Method for Hearing Aid Loudness Compensation” IEE Proc.-Vis. Image Signal Process., vol. 147, No. 6, Dec. 2000. |
Lin, L., et al., “Auditory Filter Bank Design Using Masking Curves,” 7th European Conference on Speech Communications and Technology, Sep. 2001. |
Mapes, Riordan, et al., “Towards a model of Loudness Recalibration,” 1997 IEEE ASSP workshop on New Paltz, NY, USA, Oct. 19-22, 1997. |
Martinez G., Isaac; “Automatic Gain Control (AGC) Circuits—Theory and Design,” University of Toronto ECE1352 Analog Integrated Circuits 1, Term Paper, Fall 2001, pp. 1-25. |
Masciale, John M.; “The Difficulties in Evaluating A-Weighted Sound Level Measurements,” SV Observer, pp. 2-3. |
Moore, BCJ, “Use of a loudness model for hearing aid fitting, IV. Fitting hearing aids with multi-channel compression so as to restore “normal” loudness for speech at different levels.” British Journal of Audiology, vol. 34, No. 3, pp. 165-177, Jun. 2000, Whurr Publishers, UK. |
Moore, et al., “A Model for the Prediction of Thresholds, Loudness and Partial Loudness,” Journal of the Audio Engineering Society, Audio Engineering Society, New York, vol. 45, No. 4, Apr. 1997, pp. 224-240. |
Moulton, Dave, “Loud, Louder, Loudest!,” Electronic Musician, Aug. 1, 2003. |
Newcomb, et al., “Practical Loudness: an Active Circuit Design Approach,” J. Audio Eng. Soc., vol. 24, No. 1, Jan./Feb. 1976. |
Nigro, et al., “Concert-Hall Realism through the Use of Dynamic Level Control,” J. Audio Eng. Soc., vol. 1, No. 1, Jan. 1953. |
Nilsson, et al.; “The Evolution of Multi-channel Compression Hearing Aids,” Sonic Innovations, Presented at American Academy of Audiology 13th Convention, San Diego, CA, Apr. 19-22, 2001. |
Painter, Ted, et al. “Perceptual Coding of Digital Audio”, Proceedings of the IEEE, vol. 88, No. 4, Apr. 2000. |
Park, et al.; “High Performance Digital Hearing Aid Processor with Psychoacoustic Loudness Correction,” IEEE Fam P3.1 0-7803-3734-4/97, pp. 312-313. |
Riedmiller, Jeff, “Working Toward Consistency in Program Loudness,” Broadcast Engineering, Jan. 1, 2004. |
ISO/IEC 14496-3 “Information Technology—Coding of Audio-Visual Objects” Part 3: Audio, Third Edition Dec. 1, 2005. |
Robinson, et al., “Dynamic Range Control via Metadata,” 107th Convention of the AES, Sep. 14-27, 1999, New York. |
Robinson, et al., “Time-Domain Auditory Model for the Assessment of High-Quality Coded Audio,” 107th AES Convention, Sep. 1999. |
Saunders, “Real-Time Discrimination of Broadcast Speech/Music,” Proc. of Int. Conf. on Acoust. Speech and Sig. Proc., 1996, pp. 993-996. |
Schapire, “A Brief Introduction to Boosting,” Proc. of the 16th Int. Joint Conference on Artificial Intelligence, 1999. |
Scheirer and Slaney, “Construction and Evaluation of a Robust Multifeature Speech/Music Discriminator,” Proc. of Int. Conf. on Acoust. Speech and Sig. Proc., 1997, pp. 1331-1334. |
Seefeldt, et al.; “A New Objective Measure of Perceived Loudness,” Audio Engineering Society (AES) 117th Convention, Paper 6236, Oct. 28-31, 2004, San Francisco, CA, pp. 1-8. |
Seo, et al., “Auditory Model Design for Objective Audio Quality Measurement,” Department of Electronic Engineering, Dongguk University, Seoul Korea. |
Smith, et al., “Tandem-Free VoIP Conferencing: A Bridge to Next-Generation Networks,” IEEE Communications Magazine, IEEE Service Center, New York, NY, vol. 41, No. 5, May 2003, pp. 136-145. |
Soulodre, GA, “Evaluation of Objective Loudness Meters,” Preprints of Papers Presented at the 116th AES Convention, Berlin, Germany, May 8, 2004. |
Stevens, “Calculations of the Loudness of Complex Noise,” Journal of the Acoustical Society of America, 1956. |
Swanson, Mitchell D., et al., “Multiresolution Video Watermarking using Perceptual Models and Scene Segmentation,” Department of Electrical and Computer Engineering, University of Minnesota, IEEE, 1997, pp. 558-561. |
Todd, Craig C., et al., AC-3: Flexible Perceptual Coding for Audio Transmission and Storage, pp. 1-16, Feb. 1994. |
Todd, et al., “Flexible Perceptual Coding for Audio Transmission and Storage,” 96th Convention of the Audio Engineering Society, Feb. 26, 1994, Preprint 3796. |
Trapee, W., et al., “Key distribution for secure multimedia multicasts via data embedding,” 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, May 7-11, 2001. |
Truman, et al., “Efficient Bit Allocation, Quantization, and Coding in an Audio Distribution System,” AES Preprint 5068, 107th AES Conference, Aug. 1999. |
Vernon, Steve, “Design and Implementation of AC-3 Coders,” IEEE Trans. Consumer Electronics, vol. 41, No. 3, Aug. 1995. |
Watson, et al., “Signal Duration and Signal Frequency in Relation to Auditory Sensitivity,” Journal of the Acoustical Society of America, vol. 46, No. 4 (Part 2) 1969, pp. 989-997. |
Wei, S., et al., “Realization of Dynamic Range Controllers on a Digital Signal Processor for Audio Systems” J. Acoust. Soc. Jpn. (E) 16, 6, 1995. |
Zwicker, “Psychological and Methodical Basis of Loudness,” Acoustica, 1958. |
Zwicker, et al., “Psychoacoustics—Facts and Models,” Springer-Verlag, Chapter 8, “Loudness,” pp. 203-238, Berlin Heidelberg, 1990, 1999. |
Prior Publication Data

Number | Date | Country |
---|---|---|
20230318555 A1 | Oct 2023 | US |
Provisional Applications

Number | Date | Country |
---|---|---|
60795808 | Apr 2006 | US |
Divisions

Relation | Number | Date | Country |
---|---|---|---|
Parent | 16365947 | Mar 2019 | US |
Child | 16729468 | | US |
Continuations

Relation | Number | Date | Country |
---|---|---|---|
Parent | 17839099 | Jun 2022 | US |
Child | 18327585 | | US |
Parent | 17093178 | Nov 2020 | US |
Child | 17839099 | | US |
Parent | 16729468 | Dec 2019 | US |
Child | 17093178 | | US |
Parent | 16128642 | Sep 2018 | US |
Child | 16365947 | | US |
Parent | 15809413 | Nov 2017 | US |
Child | 16128642 | | US |
Parent | 15447564 | Mar 2017 | US |
Child | 15809413 | | US |
Parent | 15238820 | Aug 2016 | US |
Child | 15447564 | | US |
Parent | 13850380 | Mar 2013 | US |
Child | 15238820 | | US |
Parent | 13464102 | May 2012 | US |
Child | 13850380 | | US |
Parent | 13406929 | Feb 2012 | US |
Child | 13464102 | | US |
Parent | 12226698 | | US |
Child | 13406929 | | US |