1. Field of the Invention
The invention is generally related to a sound generation approach that generates spatial sounds in a listening room. In particular, the invention relates to modeling with only a few user input parameters the listening room responses for a two-channel audio input based upon adjustable real-time parameters without coloring the original sound.
2. Related Art
The aim of a high-quality audio system is to faithfully reproduce a recorded acoustic event while generating a three-dimensional listening experience without coloring the original sound, in places such as a listening room, home theater or entertainment center, personal computer (PC) environment, or automobile. The audio signal from a two-channel stereo audio system or device is fundamentally limited in its ability to provide a natural three-dimensional listening experience, because only two frontal sound sources or loudspeakers are available. Phantom sound sources may only appear along a line between the loudspeakers at the loudspeakers' distance to the listener.
A true three-dimensional listening experience requires rendering the original acoustic environment with all sound reflections reproduced from their apparent directions. Current multi-channel recording formats add a small number of side and rear loudspeakers to enhance listening experience. But, such an approach requires the original audio media to be recorded or captured from each of the multiple directions. However, two-channel recording as found on traditional compact discs (CDs) is the most popular format for high-quality music today.
The current approaches to creating three-dimensional listening experiences have been focused on creating virtual acoustic environments for hall simulation using delayed sounds and synthetic reverb algorithms with digital filters. The virtual acoustic environment approach has been used with such devices as headphones and computer speakers. The synthetic reverb algorithm approach is widely used in both music production and home audio/audio-visual components such as consumer audio/video receivers (AVRs).
The left audio channel carries the left audio signal and the right audio channel carries the right audio signal. The AVR 104 may also have a left loudspeaker 110 and a right loudspeaker 112. The left loudspeaker 110 and right loudspeaker 112 each receive one of the audio signals carried by the stereo channels that originated at the audio device, such as CD player 106. The left loudspeaker 110 and right loudspeaker 112 enable a person sitting on sofa 114 to hear two-channel stereo sound.
The synthetic reverb algorithm approach may also be used in AVR 104. The synthetic reverb algorithm approach uses tapped delay lines that generate discrete room reflection patterns and recursive delay networks to create dense reverb responses, and attempts to generate the perception of a number of surround channels. However, a very high number of parameters is needed to describe and adjust such an algorithm in the AVR to match a listening room and type of music. Such adjustments are very difficult and time-consuming for an average person or consumer seeking to find an optimum setting for a particular type of music. For this reason, AVRs may have pre-programmed sound fields for different types of music, allowing for some optimization for music type. But the problem with such an approach is that the pre-programmed sound fields lack any optimization for the actual listening room.
Another approach to generate surround channels from two-channel stereo signals employs a matrix of scale factors that are dynamically steered by the signal itself. Audio signal components with a dominant direction may be separated from diffuse audio signals, which are fed to the rear generated channels. But, such an approach to generating sound channels has several drawbacks. Sound sources may move undesirably due to dynamic steering and only one dominant, discrete source is typically detected. This approach also fails to enhance very dryly recorded music, because such source material does not contain enough ambient signal information to be extracted.
Along with the foregoing considerations, the known approaches discussed above for generation of surround channels typically add “coloration” to the audio signals that is perceptible by a person listening to the audio generated by the AVR 104. Therefore, there is a need for an approach to processing stereo audio signals that filters the input channels and generates a number of surround channels while allowing a user to control the filters in a simple and intuitive way in order to optimize their listening experience.
An approach to spatial processing of audio signals receives two or more audio signals (typically a left and right audio signal) and generates a number of additional surround sound audio signals that appear to be generated from around a predetermined location. The generation of the additional audio signals is customized by a user who inputs a limited number of parameters to define a listening room. A spatial processing stereo system then determines a number of coefficients, room impulse responses, and scaling factors from the limited number of parameters entered by the user. The coefficients, room impulse responses and scaling factors are then applied to the input signals that are further processed to generate the additional surround sound audio signals.
Other systems, methods, features and advantages of the invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims.
The invention can be better understood with reference to the following figures. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views.
In the following description of examples of implementations of the present invention, reference is made to the accompanying drawings that form a part hereof, and which show, by way of illustration, specific implementations of the invention that may be utilized. Other implementations may be utilized and structural changes may be made without departing from the scope of the present invention.
The SPSS 204 processes the two-channel stereo signal in such a way as to generate seven audio channels in addition to the original left channel and right channel. In other implementations, two or more channels, in addition to the left and right stereo channels, may be generated. Each audio channel from the AVR 202 may be connected to a loudspeaker, such as a center channel loudspeaker 212, four surround channel loudspeakers (side left 222, side right 224, rear left 226, and rear right 228), and two elevated channel loudspeakers (elevated left 218 and elevated right 220), in addition to the left loudspeaker 214 and right loudspeaker 216. The loudspeakers may be arranged around a central listening location or spot, such as sofa 230 located in listening room 208.
The additional four audio channels may be generated from the original right, left and center audio channels received from the television 308, which are connected to loudspeakers such as the left loudspeaker 310, right loudspeaker 312 and center loudspeaker 314. The additional four audio channels are the rear left, rear right, side left and side right, and are connected to the rear left loudspeaker 320, rear right loudspeaker 322, side left loudspeaker 316, and side right loudspeaker 318. All the loudspeakers may be located in a listening room 306 and placed relative to a central position, such as the sofa 324. The connection to the loudspeakers may be via wires, fiber optics, or electromagnetic waves (radio frequency, infrared, Bluetooth, wireless universal serial bus, or other non-wired connections).
The DSP 406 may be a microprocessor that processes the received digital signal or a controller designed specifically for processing digital audio signals. The DSP 406 may be implemented with different types of memory (e.g., RAM, ROM, EEPROM) located internal to the DSP, external to the DSP, or a combination of internal and external to the DSP. The DSP 406 may receive a clock signal from an oscillator that may be internal or external to the DSP, depending upon implementation design requirements such as cost. Preprogrammed parameters, preprogrammed instructions, variables, and user variables for filters 418, URP 416, and room response generator 420 may be incorporated into or programmed into the DSP 406. In other implementations, the SPSS 304 may be implemented in whole or in part within an audio signal processor separate from the DSP 406.
The SPSS 304 may operate at the audio sample rate of the analog-to-digital converter (44.1 kHz in the current implementation). In other implementations, the audio sample rate may be 48 kHz, 96 kHz or some other rate decided on during the design of the SPSS. In yet other implementations, the audio sample rate may be variable or selectable, with the selection based upon user input or cable detection. The SPSS 304 may generate the additional channels with the use of linear filters 418. The seven channels may then be passed through digital-to-analog (D/A) converters 422-434, resulting in seven analog audio signals that may be amplified by amplifiers 436-448. The seven amplified audio signals are then output to the speakers 310-322 of
The URP 416 receives input or data from the user interface 414. The data is processed by the URP 416 to compute system variables for the SPSS 304 and may process other types of user interface input, such as input for the selector 412. The data for the SPSS 304 from the user interface 414 may be a limited set of input parameters related to spatial attributes, such as the three spatial attributes in the current implementation (stage width, stage distance, and room size).
The room response generator 420 computes a set of synthetic room impulse responses, which are filter coefficients. The room response generator 420 contains a statistical room model that generates modeled room impulse responses (RIRs) at its output. The RIRs may be used as filter coefficients for FIR filters that may be located in the AVR 302. A “room size” spatial attribute may be entered as an input parameter via the user interface 414 and processed by the URP 416 for generation of the RIRs by the room response generator 420. The “room size” spatial attribute input as an input parameter in the current implementation is a number in the range of 1 to 10, for example room_size=10. The room response generator 420 may be implemented in the DSP 406 as a background task or thread. In other implementations, the room response generator 420 may run off-line in a personal computer or other processor external to the DSP 406 or even the AVR 302.
In the current implementation, a coefficient matrix 502 receives the left, right and center audio inputs. The coefficient matrix 502 is created in association with a “stage width” input parameter that is entered via the user interface 414 of
The left and right audio inputs may also be processed by a shelving filter processor 506. The shelving filter processor 506 applies shelving filters along with delay periods to the left and right audio signals inputted on the left and right audio inputs. The shelving filter processor 506 may be configured using a “stage distance” parameter that is input via the user interface 414 of
The left and right audio inputs may also be summed by a signal combiner 508. The combined left and right audio inputs may then be processed by a fast convolution processor 510 that uses the “room size” input parameter. The “room size” input parameter may be entered via the user interface 414 of
The left side, right side, left back and right back audio signals generated by the coefficient matrix 502, shelving filter processor 506, and fast convolution processor 510, along with the left side, right side, left back and right back input audio signals inputted from an audio source, are respectively combined. A sound field such as a five or seven channel stereo signal may also be selected via the user interface 414 and applied to or superimposed on the respectively combined signals to achieve a final audio output for the left side, right side, left back and right back output audio signals.
The center audio signal may be generated by the summation of the received left audio signal with the received right audio signal in a signal combiner 606. The signal combiner 606 may also employ a weight factor p2 that is dependent upon the stage width parameter. The left side output signal and the right side output signal may also be scaled by a variable factor p3. All output signals (left, right, center, left side, and right side) may also be scaled by a common factor p4. The scale factors are determined by the URP 416 of
The stage width input parameter is an angular parameter φ in the range of zero to ninety degrees. The parameter controls the perceived width of the frontal stereo panorama, from minimum zero degrees to a maximum of ninety degrees. The scale factors p1-p4 are derived in the present implementation with the following formulas:
$p_1 = 0.3\,[\cos(2\pi\varphi/180) - 1]$,
$p_2 = 0.01\,[80 + 0.2\,\varphi]$, with center at input,
$p_2 = 0.01\,[50 + 0.2\,\varphi]$, without center at input,
$p_3 = 0.0247\,\varphi$,
$p_4 = 1/\sqrt{1 + p_1^2 + p_2^2 + p_3^2\,(1 + p_5^2)}$,
$\varphi \in [0^\circ, 90^\circ]$.
The mappings are empirically optimized, in terms of perceived loudness, regardless of the input signals and chosen width setting, and in terms of uniformity of the image across the frontal stage. The output scale factor p4 normalizes the output energy for each width setting.
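The mapping from stage width to scale factors can be illustrated with a minimal sketch. The function name and argument names are hypothetical, and because p5 appears in the p4 formula without being defined in this passage, it is treated here as a caller-supplied value that defaults to zero (an assumption, not a value taken from the description).

```python
import math


def stage_width_scale_factors(phi_deg, center_at_input=True, p5=0.0):
    """Return (p1, p2, p3, p4) for a stage width angle phi in degrees.

    p5 is used by the p4 formula but is not defined in this passage, so it is
    exposed as a parameter (assumption: 0.0 by default).
    """
    if not 0.0 <= phi_deg <= 90.0:
        raise ValueError("stage width must lie in [0, 90] degrees")
    p1 = 0.3 * (math.cos(2.0 * math.pi * phi_deg / 180.0) - 1.0)
    # The constant term of p2 depends on whether a center channel is present.
    p2_offset = 80.0 if center_at_input else 50.0
    p2 = 0.01 * (p2_offset + 0.2 * phi_deg)
    p3 = 0.0247 * phi_deg
    # p4 normalizes the output energy for the chosen width setting.
    p4 = 1.0 / math.sqrt(1.0 + p1**2 + p2**2 + p3**2 * (1.0 + p5**2))
    return p1, p2, p3, p4


if __name__ == "__main__":
    for phi in (0.0, 45.0, 90.0):
        print(phi, stage_width_scale_factors(phi))
```

At the minimum width of zero degrees, p1 and p3 vanish, so only the weighted center term contributes to the normalization in p4.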
The shelving filter processor 506 receives the left audio signal at a first-order high-shelving filter 802. Similarly, the shelving filter processor 506 receives the right audio signal at another first-order high-shelving filter 804. The parameters of the shelving filters 802 and 804 may be gain “g” and corner frequency “fcs”, and depend on the intended wall absorption properties of a modeled room. In the current implementation, “g” and “fcs” may be set to fixed values for convenience. Delays T1 806, T2 808, T3 810, and T4 812 are adjusted according to the intended stage distance parameter, entered via the user interface 414 and determined by the URP 416. The resulting signals left side, left back, right side, and right back are attenuated by c11 814, c12 816, c13 818, and c14 820, respectively, resulting in attenuated signals left side, left back, right side, and right back.
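One branch of the shelving filter path described above (shelving filter, then delay, then attenuation) might be sketched as follows. This is only an illustration under stated assumptions: the shelf is built as unity gain plus a scaled first-order high-pass obtained by the bilinear transform, which is one common first-order design and not necessarily the one used in the implementation. All function names and the numeric values of gain, corner frequency, delay and attenuation in the usage lines are hypothetical.

```python
import math


def high_shelf_coeffs(fc_hz, gain, fs_hz):
    """First-order high shelf: gain 1 at DC, `gain` at Nyquist (assumed design)."""
    k = math.tan(math.pi * fc_hz / fs_hz)
    a1 = (k - 1.0) / (k + 1.0)
    hp_b0 = 1.0 / (1.0 + k)  # first-order high-pass numerator is hp_b0 * (1 - z^-1)
    b0 = 1.0 + (gain - 1.0) * hp_b0
    b1 = a1 - (gain - 1.0) * hp_b0
    return b0, b1, a1


def high_shelf(x, fc_hz, gain, fs_hz):
    """Filter the sample list x with y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]."""
    b0, b1, a1 = high_shelf_coeffs(fc_hz, gain, fs_hz)
    y, x_prev, y_prev = [], 0.0, 0.0
    for sample in x:
        out = b0 * sample + b1 * x_prev - a1 * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y


def delay_and_scale(x, delay_samples, scale):
    """Delay x by an integer number of samples and attenuate it (same length)."""
    delayed = [0.0] * delay_samples + [scale * s for s in x]
    return delayed[: len(x)]


def surround_branch(x, fc_hz, gain, fs_hz, delay_samples, scale):
    """Shelving filter followed by delay and attenuation for one output."""
    return delay_and_scale(high_shelf(x, fc_hz, gain, fs_hz), delay_samples, scale)


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 15
    # Hypothetical settings: 10 kHz corner, gain 0.5, 4-sample delay, 0.7 attenuation.
    print(surround_branch(impulse, 10000.0, 0.5, 48000.0, 4, 0.7))
```

In this sketch a longer stage distance would correspond to larger delay values, while the attenuation factors play the role of c11 through c14.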
The pair of shorter decorrelation filters 1006 and 1008, each with a length of between 500 and 2,000 coefficients, generates decorrelated versions of the room response. The impulse response of the decorrelation filters 1006 and 1008 may be constructed by using an exponentially decaying random noise sequence and normalizing its complex spectrum by the magnitude spectrum, with the resulting time domain signal computed with an inverse fast Fourier transform (IFFT). The resulting filter may be classified as an all-pass filter and does not alter the frequency response in the signal path. However, the decorrelation filters 1006 and 1008 do cause time domain smearing and re-distribution, thereby generating decorrelated output signals when applying multiple filters with different random sequences.
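The construction of such a decorrelation filter can be sketched as below. To keep the example dependency-free, a direct O(N²) discrete Fourier transform is used in place of an FFT, so it is only practical for short lengths; an FFT library would be used for the 500 to 2,000 coefficient filters mentioned above. The function names, the decay constant and the seed values are hypothetical, and the all-pass property holds at the DFT bin frequencies of the chosen length.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Direct discrete Fourier transform (stand-in for an FFT, O(N^2))."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def decorrelation_filter(length, decay_samples, seed):
    """Build an all-pass impulse response from decaying random noise."""
    rng = random.Random(seed)
    # Exponentially decaying random noise sequence.
    noise = [rng.gauss(0.0, 1.0) * math.exp(-t / decay_samples) for t in range(length)]
    spectrum = dft(noise)
    # Normalize the complex spectrum by its magnitude: unit gain, random phase.
    unit = [s / abs(s) if abs(s) > 1e-12 else 1.0 + 0j for s in spectrum]
    # Back to the time domain; the imaginary part is only rounding error.
    return [v.real for v in dft(unit, inverse=True)]


if __name__ == "__main__":
    h_left = decorrelation_filter(64, 16.0, seed=1)
    h_right = decorrelation_filter(64, 16.0, seed=2)
    print(max(abs(abs(v) - 1.0) for v in dft(h_left)))  # close to zero
    print(h_left[:4], h_right[:4])
```

Using a different seed for each filter gives different phase responses with the same flat magnitude, which is what produces decorrelated outputs.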
The outputs from the decorrelation filters 1006 and 1008 are each up-sampled by a factor of two by up-samplers 1010 and 1012, respectively. The resulting audio signal from the up-sampler 1010 is the left side audio signal (Ls), which is scaled by a scale factor c21. The resulting audio signal from the up-sampler 1012 is the right side audio signal (Rs), which is scaled by a scale factor c24. The Ls and Rs signals are then used to generate the left back audio signal and right back audio signal.
The left back and right back audio signals are generated by another pair of decorrelated outputs using a simple 2×2-matrix with coefficients “a” 1014 and “b” 1016. Coefficients are chosen such that the center signal in the resulting stereo mix is attenuated, and the lateral signal (stereo width) amplified (for example a=0.3 and b=−0.7). The signals in the 2×2 matrix are combined by mixers 1018 and 1020. The resulting left back audio signal from mixer 1018 is scaled by a scale factor c22 and the resulting right back audio signal from mixer 1020 is scaled by a scale factor of c23.
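The effect of the example coefficients can be checked with a small sketch. The symmetric arrangement of the matrix, [[a, b], [b, a]], and the function name are assumptions made for illustration, since the passage only names the two coefficients.

```python
def back_channel_mix(ls, rs, a=0.3, b=-0.7):
    """Mix one left side / right side sample pair into left back / right back.

    Assumes a symmetric 2x2 matrix [[a, b], [b, a]].
    """
    left_back = a * ls + b * rs
    right_back = b * ls + a * rs
    return left_back, right_back


if __name__ == "__main__":
    # Center content (identical in both channels) is scaled by a + b.
    print(back_channel_mix(1.0, 1.0))
    # Lateral content (opposite polarity) is scaled by a - b.
    print(back_channel_mix(1.0, -1.0))
```

With a=0.3 and b=−0.7, identical content in both channels is scaled by a+b=−0.4, while opposite-polarity content is scaled by a−b=1.0, so the lateral component dominates the back channels.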
fcl(Rsize)=[480, 723, 1090, 1642, 2473, 3726, 5614, 8458, 12744, 19200] Hz.
The first sequence may be element-wise multiplied using the multiplier 1206 by the second, lowpass filtered sequence. The result may be filtered with a first order shelving filter 1208 having a corner frequency fcs=10 kHz and gain “g”=0.5 in the current implementation, in order to simulate wall absorption properties. The two parameters are normally fixed.
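The tabulated fcl values can be treated as a lookup indexed by the room size setting, as in the sketch below. It assumes that fcl is the corner frequency of the lowpass filter applied to the second sequence and that room sizes 1 through 10 index the table in the order listed; both are assumptions, as are the function names. The second function shows that the table values lie within 1 Hz of a geometric progression from 480 Hz to 19200 Hz, which is an observation about the numbers and not a statement of how they were derived.

```python
FCL_HZ = [480, 723, 1090, 1642, 2473, 3726, 5614, 8458, 12744, 19200]


def lowpass_corner_hz(room_size):
    """Look up the lowpass corner frequency for a room size in 1..10."""
    if room_size not in range(1, 11):
        raise ValueError("room size must be an integer from 1 to 10")
    return FCL_HZ[room_size - 1]


def geometric_corner_hz(room_size):
    """Geometric progression from 480 Hz to 19200 Hz (ratio 40 over 9 steps)."""
    return 480.0 * 40.0 ** ((room_size - 1) / 9.0)


if __name__ == "__main__":
    for size in range(1, 11):
        print(size, lowpass_corner_hz(size), round(geometric_corner_hz(size), 1))
```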
T60,i are the reverb times in the i-th band and fs is the sample frequency (typically fs=48 kHz). The sub-band signals may then be summed by a signal combiner 1412 or similar circuit to form the output sequence y(k).
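How per-band reverb times shape the sub-band signals before summation can be sketched as follows. The envelope uses the standard definition of T60 (a 60 dB amplitude decay over T60 seconds), which may differ in form from the expression used in the implementation; the function names and example values are hypothetical.

```python
def decay_envelope(t60_s, fs_hz, num_samples):
    """Exponential envelope that falls by 60 dB after t60_s seconds."""
    return [10.0 ** (-3.0 * k / (t60_s * fs_hz)) for k in range(num_samples)]


def shape_and_sum(subbands, t60_per_band_s, fs_hz):
    """Apply each band's decay envelope and sum the bands into y(k)."""
    n = len(subbands[0])
    y = [0.0] * n
    for band, t60 in zip(subbands, t60_per_band_s):
        envelope = decay_envelope(t60, fs_hz, n)
        for k in range(n):
            y[k] += band[k] * envelope[k]
    return y


if __name__ == "__main__":
    fs = 48000.0
    env = decay_envelope(0.5, fs, 24001)
    print(env[0], env[24000])  # start of the decay and the value after 0.5 s
```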
The frequencies for fc(i) above denote the crossover (−6 dB) points of filter bank 1404. The gain factors ci (i=1 . . . 10) with linear interpolation between the ten frequency points, are displayed in graph 1600 shown in
The parameters above used to model the rooms may be obtained after measuring impulse responses in real halls of different sizes. The measured impulse responses may then be analyzed using the filter bank 1404. The energy in each band may then be measured and apparent peaks smoothed in order to eliminate pronounced resonances that could introduce unwanted colorations of the final audio signals.
Means, such as interpolation techniques, may be provided to assure smooth transitions between the parameter settings when parameters are changed. The number of input parameters may be further reduced by, for example, combining stage distance and room size into one parameter that is controlled with a single input device, such as a knob or keypad.
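One simple interpolation technique for smoothing a parameter change is a linear ramp, sketched below. The function name and the choice of a linear ramp are assumptions; the description does not specify which interpolation is used.

```python
def ramp_parameter(old_value, new_value, num_steps):
    """Return num_steps values moving linearly from old_value to new_value."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    step = new_value - old_value
    return [old_value + step * (i + 1) / num_steps for i in range(num_steps)]


if __name__ == "__main__":
    # Move a scale factor from 0.0 to 1.0 over four update intervals.
    print(ramp_parameter(0.0, 1.0, 4))  # [0.25, 0.5, 0.75, 1.0]
```

Applying the intermediate values over successive processing blocks avoids an abrupt jump in a gain or delay when the user turns a control.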
Persons skilled in the art will understand and appreciate that one or more processes, sub-processes, or process steps may be performed by hardware and/or software. Additionally, the SPSS described above may be implemented completely in software that would be executed within a processor or plurality of processors in a networked environment. Examples of a processor include but are not limited to a microprocessor, general purpose processor, combination of processors, DSP, any logic or decision processing unit regardless of method of operation, instructions execution/system/apparatus/device and/or ASIC. If the process is performed by software, the software may reside in software memory (not shown) in the device used to execute the software. The software in software memory may include an ordered listing of executable instructions for implementing logical functions (i.e., “logic” that may be implemented either in digital form such as digital circuitry or source code or optical circuitry or chemical or biochemical in analog form such as analog circuitry or an analog source such as an analog electrical, sound or video signal), and may selectively be embodied in any signal-bearing (such as a machine-readable and/or computer-readable) medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that may selectively fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a “machine-readable medium,” “computer-readable medium,” and/or “signal-bearing medium” (herein known as a “signal-bearing medium”) is any means that may contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The signal-bearing medium may selectively be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, air, water, or propagation medium. More specific examples, but nonetheless a non-exhaustive list, of computer-readable media would include the following: an electrical connection (electronic) having one or more wires; a portable computer diskette (magnetic); a RAM (electronic); a read-only memory “ROM” (electronic); an erasable programmable read-only memory (EPROM or Flash memory) (electronic); an optical fiber (optical); and a portable compact disc read-only memory “CDROM” (optical). Note that the computer-readable medium may even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory. Additionally, it is appreciated by those skilled in the art that a signal-bearing medium may include carrier wave signals on propagated signals in telecommunication and/or network distributed systems. These propagated signals may be computer (i.e., machine) data signals embodied in the carrier wave signal. The computer/machine data signals may include data or software that is transported or interacts with the carrier wave signal.
While the foregoing descriptions refer to the use of a wide band equalization system in smaller enclosed spaces, such as a home theater or automobile, the subject matter is not limited to such use. Any electronic system or component that measures and processes signals produced in an audio or sound system that could benefit from the functionality provided by the components described above may be implemented as the elements of the invention.
Moreover, it will be understood that the foregoing description of numerous implementations has been presented for purposes of illustration and description. It is not exhaustive and does not limit the claimed inventions to the precise forms disclosed. Modifications and variations are possible in light of the above description or may be acquired from practicing the invention. The claims and their equivalents define the scope of the invention.
Number | Date | Country | |
---|---|---|---|
20090147975 A1 | Jun 2009 | US |