The field of the invention relates to audio streams and more particularly to the use of gain control in audio streams.
The use of automatic gain control (AGC) in audio circuits is well known. Typically, AGC functions through the use of a feedback signal wherein a signal level of the audio signal is measured and used to control a gain of an upstream amplifier.
In general, AGC involves the automatic maintenance of a nearly constant output level of an amplifying circuit by adjusting the amplification in inverse proportion to an input signal strength. AGC is widely used in broadcast receivers to accommodate widely varying incoming signals and to allow for a sound that remains at nearly a constant volume.
The use of AGC in audio circuits inherently involves at least some filtering. Sound in the audible range must be given precedence over changes in volume in the sub-audible and ultrasound ranges. In general, an energy storage device, such as a capacitor, may be used to collect and average a sound energy over a time period.
While prior art AGC systems generally work well, they are typically implemented in hardware. However, some audio applications cannot be implemented in hardware. Accordingly, a need exists for a method of controlling volume that is not dependent upon circuit devices.
A method and apparatus are provided for controlling a gain of an audio stream. The method includes the steps of collecting a plurality of samples of the audio stream, squaring a magnitude of a representation of at least some samples of the collected plurality of samples, summing the squared representations and adjusting a magnitude of the plurality of samples by a value equal to a square root of a ratio between the sum and a predetermined reference value.
As depicted in
In order to set up a conference call, the parties 12, 14, 16 may dial the telephone number of a gateway 200 (
In general, when audio streams are mixed, such as by a conference call system 10, it is useful to first perform automatic gain control (AGC) to bring the audio streams to similar volume levels. When AGC is done in software, it is necessary to perform the gain control very efficiently. This is complicated by the fact that audio streams are often encoded using some compression algorithm, such as the standard G.711 codec.
The G.711 codec uses a representation of the voice sample similar to floating point numbers. For G.711, each 8-bit sample may be encoded using the format shown below.
The segment number is similar to a floating point exponent, and the amplitude number (the level) is similar to a mantissa. G.711 includes two different encoding schemes, A-law and μ-law, which differ in how they assign segments, but they have essentially equivalent functionality. Using the example of A-law, if the level q is taken to be between 0 and 15, inclusive, and the segment s between 0 and 7, inclusive, then the magnitude m of a sample would be given by the equality $m = (16 + q) \cdot 2^{s}$. The total sound energy of a series of samples would be equal to the square root of the sum of the squares of the magnitudes of all of the samples. The goal of AGC would be to adjust the samples such that the input streams of each of the participants 12, 14, 16 have roughly equal total sound energy during periods of speech.
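By way of illustration only, the magnitude relationship above may be sketched in Python as follows. The sketch assumes that bit 1 is the most significant (sign) bit of the 8-bit code, that the segment and level occupy the next three and last four bits respectively, and that any bit inversion applied by the codec on the wire has already been removed. The function names are hypothetical and are not part of G.711.

```python
def segment(code: int) -> int:
    """Return the 3-bit segment s (assumed to sit just below the sign bit)."""
    return (code >> 4) & 0x07


def level(code: int) -> int:
    """Return the 4-bit level q (assumed to be the low four bits)."""
    return code & 0x0F


def magnitude(code: int) -> int:
    """Return m = (16 + q) * 2**s, computed with a left shift."""
    return (16 + level(code)) << segment(code)
```

For example, a code with segment 3 and level 5 yields a magnitude of (16 + 5) * 8 = 168.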
Under illustrated embodiments of the invention, it has been found that it is sufficient to approximate the magnitude of the speech samples by ignoring the level (bits 5-8), and using only the segment information (bits 2-4). Therefore a proxy for the total sound energy can be computed by taking the square root of the sum of the squares of the value $2^{s}$ for each sample. To reduce computation time and avoid the need for floating point arithmetic, the described method does not compute the square root, and instead computes the sum of the value $2^{2s}$ for each sample, thus representing the square of the energy level. Since $2^{s}$ is a power of two, squaring it requires no multiplication: the exponent s is doubled by shifting its bits by one position, and $2^{2s}$ is a single set bit shifted left by 2s positions. The resulting sum will be referred to as T.
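The segment-only approximation may be sketched as follows, purely as an illustration. The bit layout (segment in the three bits below the most significant bit) is an assumption drawn from the bit positions given above, and the function names are hypothetical.

```python
from typing import Iterable


def squared_term(code: int) -> int:
    """Return 2**(2*s) for one 8-bit code, using only the segment s."""
    s = (code >> 4) & 0x07      # segment field; sign and level are ignored
    return 1 << (s << 1)        # s << 1 doubles the exponent; no multiply needed


def energy_proxy(codes: Iterable[int]) -> int:
    """Return T, the sum of 2**(2*s) over the given codes (integer only)."""
    return sum(squared_term(c) for c in codes)
```

Because every term is an integer power of two, T can be accumulated without floating point arithmetic.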
The specific method used for calculating T is to provide a ring buffer 300 (
A squared value $2^{2s}$ may be determined within a shift register 308 for each sample within the ring buffer 300. The values $2^{2s}$ determined from each sample within the ring buffer 300 may be added within an adder 310 to provide a value, T. After initialization, the values $2^{2s}$ for the new samples loaded into the ring buffer 300 may be added to the value T, and the values $2^{2s}$ for the samples being removed from the buffer may be subtracted from T.
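The incremental update of T may be sketched as follows. This is a minimal software analogue under stated assumptions, not the described apparatus: a `collections.deque` stands in for the ring buffer 300, the class name `RunningEnergy` and its `capacity` parameter are hypothetical, and the segment is assumed to occupy the three bits below the most significant bit.

```python
from collections import deque


class RunningEnergy:
    """Keep T, the sum of 2**(2*s), over the most recent `capacity` codes."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.window = deque()   # stores the squared term of each buffered code
        self.total = 0          # the running value T

    def push(self, code: int) -> int:
        """Add one code, evict the oldest if the buffer is full, return T."""
        term = 1 << (2 * ((code >> 4) & 0x07))
        if len(self.window) == self.capacity:
            self.total -= self.window.popleft()   # subtract the departing term
        self.window.append(term)
        self.total += term                        # add the arriving term
        return self.total
```

Each update costs one addition and at most one subtraction, regardless of the buffer length.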
A reference value T1 may be determined which represents the expected value of T for a reference audio level input. When T is approximately equal to T1, it indicates that a gain factor of 1 should be applied, i.e., the input signal should not be modified. When T differs from T1, the square root of the ratio between T1 and T should be applied as the gain factor to each of the samples.
A series of threshold values Tn1-Tn2 and associated gain factors may be determined based upon T1. For example, if a threshold value Tn is chosen (denoted $T_{15/16}$) to simulate a sequence of samples that are each 1/15 larger than the reference-level samples that produce T1, then $T_{15/16} = T_1 \cdot (16/15)^{2}$, indicating that if T approximates $T_{15/16}$, then a gain factor of 15/16 should be applied. This suggests that each sample should be reduced in volume by multiplying the linear equivalent value of each sample by 15/16. Any number of gain level combinations 314, 316 (each with a threshold value Tn and associated gain factor) can be created, and for any gain level x, $T_x = T_1 \cdot (1/x)^{2}$, where the adjustment is squared because T represents the square of the approximate speech energy, since the square root function was not previously applied.
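The relationship between a gain level x and its threshold may be sketched as follows, for illustration only. Exact rational arithmetic from the standard `fractions` module is used here so the worked numbers are easy to check; the function names and the example reference value of 225 are hypothetical.

```python
from fractions import Fraction


def threshold_for_gain(t1, gain):
    """Return the threshold for gain level x, i.e. T1 * (1/x)**2."""
    return t1 * (1 / Fraction(gain)) ** 2


def build_gain_table(t1, gains):
    """Return (threshold, gain) pairs sorted by ascending threshold."""
    return sorted((threshold_for_gain(t1, g), Fraction(g)) for g in gains)


# With a hypothetical reference value T1 = 225, the 15/16 gain level has
# threshold 225 * (16/15)**2 = 256.
table = build_gain_table(225, [Fraction(15, 16), 1, Fraction(16, 15)])
```

Louder input (larger T) maps to a smaller gain factor, so the sorted table runs from the largest gain to the smallest.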
During use, a value T is calculated for the samples within the ring buffer 300 during each time interval (e.g., every 20 ms). The calculated value T is then compared with the reference threshold values Tn 314, 316 within a comparator 312 to identify a closest match. Once the closest match is identified between the value T and the threshold values Tn, an associated gain factor may be retrieved from the matched file 314, 316. The retrieved gain factor may be multiplied by each voice sample within a volume adjuster 306.
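The comparison and adjustment steps may be sketched as follows. This is a minimal sketch assuming the thresholds and gain factors are held as (threshold, gain) pairs and that the samples have already been converted to their linear equivalent values; the function names and the example table are hypothetical.

```python
def pick_gain(t, table):
    """Return the gain whose threshold is closest to the measured value t."""
    _, gain = min(table, key=lambda entry: abs(entry[0] - t))
    return gain


def apply_gain(linear_samples, gain):
    """Scale each linear sample by the gain factor, rounding to an integer."""
    return [round(x * gain) for x in linear_samples]


# Hypothetical table of (threshold, gain) pairs.
table = [(100, 1.5), (225, 1.0), (256, 0.9375)]
adjusted = apply_gain([160, -320], pick_gain(250, table))
```

Here a measured T of 250 lies closest to the threshold 256, so the gain factor 15/16 (0.9375) is applied.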
In addition to calculating the appropriate gain level, for any given audio stream, the system also detects and keeps track of the highest magnitude sample 318 that has been received. Detection may be performed by comparing each sample with the largest sample 318 and storing the larger as the new sample 318. The largest sample 318 may be used by a gain processor 320 to determine a set of values Tn and associated gain factors.
The number 318 is never reset for the life of the audio stream. The gain processor 320 calculates a set of threshold values Tn and associated gain factors so that this sample 318 would never be clipped. In other words, the system will never choose a gain factor that, when applied to the highest magnitude sample, would cause the adjusted sample to exceed the possible sample range. This allows the gain adjustment to be done without explicit testing for overflow or clipping conditions.
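One way the peak tracking and clipping-safe gain selection might look in software is sketched below. The sketch assumes linear sample values and a caller-supplied maximum representable magnitude; the class name, function name, and the limit of 3968 used in the example are hypothetical.

```python
class PeakTracker:
    """Remember the largest sample magnitude seen; never reset."""

    def __init__(self) -> None:
        self.peak = 0

    def update(self, linear_sample: int) -> int:
        """Compare the new sample with the stored peak and keep the larger."""
        self.peak = max(self.peak, abs(linear_sample))
        return self.peak


def safe_gains(candidate_gains, peak, max_magnitude):
    """Keep only gains that cannot push the stored peak past the sample range."""
    return [g for g in candidate_gains if g * peak <= max_magnitude]


tracker = PeakTracker()
for sample in (100, -2000, 500):
    tracker.update(sample)
allowed = safe_gains([0.5, 1.0, 1.5, 2.0, 4.0], tracker.peak, 3968)
```

Since every retained gain is already safe for the largest sample observed, the per-sample multiplication needs no separate overflow test for samples at or below that peak.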
A specific embodiment of a method and apparatus for controlling the gain of an audio stream has been described for the purpose of illustrating the manner in which the invention is made and used. It should be understood that the implementation of other variations and modifications of the invention and its various aspects will be apparent to one skilled in the art, and that the invention is not limited by the specific embodiments described. Therefore, it is contemplated to cover the present invention and any and all modifications, variations, or equivalents that fall within the true spirit and scope of the basic underlying principles disclosed and claimed herein.
| Number | Date | Country |
|---|---|---|
| 60526393 | Dec 2003 | US |