The present invention relates to systems for reproducing sound in general and to systems which can control sound production to localized regions in particular.
A typical sound system performs two functions: amplifying sound and reproducing sound with a given level of clarity or intelligibility within a particular room, auditorium, hall or other space. Sound may be either audio frequency sound, subsonic sound, or ultrasonic sound. The audio frequency sound falls within the range of 15 Hz to 20,000 Hz, the range generally of human hearing, with subsonic frequencies being those below 15 Hz, and ultrasonic frequencies being those above 20,000 Hz.
Recently new capabilities have led to research in sound systems which could have the potential to produce sound which is contained within a beam, or which is aimed at a particular point or listener. Such systems open up the possibility of providing different audio stimulus to different people occupying the same room, museum, or lecture hall. Such a system might also provide more realistic stereo without using headphones by providing a separate audio input to each ear of a listener.
One approach, proposed by Joe Pompei while a student at MIT, involves generation of ultrasonic sound which distorts in a predictable way so that the distortions produce audio frequency sound. Starting with the desired audio frequency sound, it is mathematically possible to predict the ultrasonic beam which will produce the desired audio frequency sound. By such means Pompei is able to generate an audio spotlight of sound.
Another proposed approach is to use an acoustic time-reversal mirror. Such systems have been developed by Mathias Fink at Ecole Superieure de Physique et de Chimie Industrielles de la Ville de Paris. A time-reversal mirror is a concept known from optics where it is known to be possible to construct a mirror which sends light reflected therefrom directly backwards so that the lightwaves appear to be reversed in time. Thus light emitted from a point when reflected in a phase conjugate, or time-reversal mirror, returns to the emitting point. To look into a time-reversal mirror is to see only the light emitted from the pupil which is gazing into the mirror. In a similar way, an acoustic time reversal mirror returns sound to the source that emitted the sound. This returning sound returns identically to that emitted, even if the path between the sound source and the time-reversal mirror involves many reflections, distortions, and dispersions. At least in theory, the time reversal process could be used to focus sound at a particular location so that different sounds would be heard by different people.
A wide variety of audio systems attempt to provide more realistic sound by providing an array of speakers which produce the effect that the sound appears to come from a particular direction or source. Such systems are described in U.S. Pat. No. 5,521,981 to Gehring and U.S. Pat. No. 5,974,152 to Fujinami. More generally, any stereo, quadraphonic, or surround sound system uses multiple speakers to produce sound which is more realistic.
However, none of the foregoing systems has produced a cost-effective system for providing sound which can only be heard in a localized region. What is needed is an apparatus and method for producing audible sounds which are localized so that multiple listeners can be provided with unique audio input without the use of headphones.
The sound reproduction system of this invention employs an array of speakers to produce audio frequency sound. The speakers are fed from a single audio source, but each speaker transmits sound delayed by an amount which is related to the distance between a particular speaker and a selected point or region in space. In addition the amplitude of the sound output by each speaker may also be proportional to the distance between the particular speaker and a selected point or region in space. In one embodiment the audio output of each speaker can be below the audible threshold. An array of speakers is arranged on the ceiling or walls or even randomly distributed within the room. With such an arrangement the output from any given speaker cannot be heard. However, if each speaker has its output delayed such that even though the sound from each speaker in the array travels a different distance, the sound from all the speakers nevertheless reaches a single point or region in space at the same moment in time, the audio volume will be increased in proportion to the square of the number of speakers employed in the array. Thus, with a sufficient number of speakers producing inaudible volume levels of sound, at a particular region the sound will be readily audible. This simple technique allows audio frequency sound to be heard in only selected regions within the room or other auditory space.
This method achieves the result that the wave front from each speaker arrives at the target at the same time and roughly in phase. Due to superposition, the amplitudes of the wave-fronts will add algebraically. Sound intensity or volume is a function of the square of the signal amplitude, therefore very significant sound intensities at the target can be achieved for reasonable sized arrays.
A time varying audio stream of sound is digitized into a multiplicity of discrete digital samples similar to those used in digital recording systems such as those used in producing compact discs or digital audio tape.
The speed of sound in air, although varying with air temperature, is approximately 1,000 feet per second (fps). For a room having a maximum linear dimension of 40 feet, sound can travel from any speaker to any point in the room in 0.04 seconds. If sound is digitized at 44 kilohertz, the standard for most digital sound reproduction, approximately 1,760 sound samples are produced during 0.04 seconds.
If we now consider an array of speakers, we can calculate the distance between each speaker in the array and a selected point in the room. A computer or similar device is used to continuously store 1,760 sequential sound samples in 1,760 storage registers, which are incremented by one sample every 1/44,000 of a second, corresponding to the sound sample interval. We can produce a digital sample stream for generating an audio drive signal for a particular speaker, delayed by any amount within the 0.04 seconds, simply by reading out, every sample cycle, i.e. every 1/44,000 of a second, the value of the particular memory which corresponds to the selected time delay. The time delay is selected to correspond to the distance between a particular speaker and the point or region in space where it is desired to create a clearly audible sound. In particular, the distance between a particular speaker and the point or region in space is subtracted from the distance of the furthest speaker, which has zero time delay, and the result is divided by 1,000 (the speed of sound) and multiplied by 44,000. This gives the number of the sound storage register, out of the 1,760 sound storage registers, which will produce a digital output corresponding to an audio signal which has been delayed by a length of time such that all sound signals reach the same point at the same time.
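The register calculation described above can be illustrated with a minimal sketch. It assumes the approximate figures used in the text (1,000 fps and 44,000 samples per second); the function name `register_index` and the example distances are hypothetical and chosen only for illustration.

```python
SPEED_OF_SOUND_FPS = 1000.0   # approximate figure used in the text
SAMPLE_RATE_HZ = 44_000       # sample rate used in the text


def register_index(distance_ft, farthest_distance_ft,
                   speed_fps=SPEED_OF_SOUND_FPS,
                   sample_rate_hz=SAMPLE_RATE_HZ):
    """Return the storage-register number to read for one speaker.

    Register 0 holds the newest sample, so the register number equals the
    delay expressed in whole samples.
    """
    if distance_ft > farthest_distance_ft:
        raise ValueError("distance exceeds that of the farthest speaker")
    delay_s = (farthest_distance_ft - distance_ft) / speed_fps
    return round(delay_s * sample_rate_hz)


# Hypothetical distances (feet) from four speakers to the target point.
distances_ft = [12.0, 15.5, 20.0, 9.0]
farthest = max(distances_ft)
indices = [register_index(d, farthest) for d in distances_ft]
print(indices)
```

The farthest speaker reads register 0 (no delay), and nearer speakers read progressively older samples. Rounding to the nearest whole sample is a simplification made for this sketch.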
It is an object of the present invention to provide a sound system which can produce one or more localized regions of audible sound.
It is another object of the present invention to produce a sound system where different persons within the same room can hear distinctly different soundtracks and cannot hear any soundtrack but the one directed at them.
It is a further object of the present invention to provide a sound system which does not produce clearly audible sound except within selected regions.
It is a yet further object of the present invention to provide a sound system where an array of speakers operating at a first sound level produces sound at selected points or regions in space at least an order of magnitude louder than the first sound level.
Further objects, features and advantages of the invention will be apparent from the following detailed description when taken in conjunction with the accompanying drawings.
Referring more particularly to
The apparent loudness of a sound is found by forming a ratio between a sound volume which is just perceptible to the human ear, and the sound in question. For convenience, this ratio is expressed as ten times the log of the ratio of sound intensity which is referred to as decibels or dB. A sound which has a power level of ten times a first sound seems to the human ear to be about twice as loud as the first sound. Thus if the sound intensity ratio is 20 dB the sound will sound four times as loud as the 0 dB sound.
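As a small worked illustration of the decibel relationship just described, the following sketch converts an intensity ratio to decibels and applies the rule of thumb that each 10 dB sounds about twice as loud. The function names are hypothetical, and the doubling rule is the approximation stated in the text, not an exact psychoacoustic model.

```python
import math


def intensity_ratio_to_db(ratio):
    """Decibels: ten times the base-10 log of an intensity ratio."""
    return 10.0 * math.log10(ratio)


def perceived_loudness_factor(db):
    """Rule of thumb from the text: each 10 dB sounds about twice as loud."""
    return 2.0 ** (db / 10.0)


for ratio in (1, 10, 100):
    db = intensity_ratio_to_db(ratio)
    print(ratio, db, perceived_loudness_factor(db))
```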
Sound intensity or volume is a function of the square of the signal amplitude. Therefore, in an array containing 100 to 1,000 speakers whose sound is all made to constructively interfere at a region in space, the constructive addition of the sound from each speaker will produce an increase in sound amplitude of 100 to 1,000 times and an increase in sound intensity of ten thousand to one million times, theoretically achieving an increase of intensity at the target of 40 dB to 60 dB over the volume of any one speaker as heard from the same location.
The sound intensity at locations other than the target location is a function of sound addition which is not in phase, but rather randomly distributed in phase and timing, and is proportional to the number of speakers or transducers in the array rather than the square of the number of speakers or transducers. Thus, where the sound is not constructively added, the sound level will be 20 dB to 30 dB over the volume of any one speaker as heard from the same location. If each speaker were operated at a sound level of 0 dB, the threshold of audibility, the resulting difference in sound intensity levels would be that everywhere in the listening space except the target, the out-of-phase and unintelligible signal would have a sound volume of a faint whisper or the sound of rustling leaves, while at the target a fully intelligible, focused and in-phase signal would have a sound intensity level ranging from nearly equal to that of normal conversation to several times normal conversation level.
Thus the perceived sound intensity, depending on the number of speakers, will be approximately four to eight times as loud in the discrete regions 40 where constructive interference is occurring as outside the discrete regions 40. The addition of some white noise which decreases audibility for all sounds will result in distinct noticeable and intelligible sound only in discrete regions 40.
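The figures above can be checked with a short sketch. It assumes ideal in-phase addition at the target (intensity proportional to the square of the number of speakers) and ideal random-phase addition elsewhere (intensity proportional to the number of speakers); the function names are hypothetical.

```python
import math


def coherent_gain_db(n_speakers):
    """In-phase addition: amplitude scales with N, intensity with N squared."""
    return 20.0 * math.log10(n_speakers)


def incoherent_gain_db(n_speakers):
    """Random-phase addition: intensity scales with N."""
    return 10.0 * math.log10(n_speakers)


def loudness_ratio(n_speakers):
    """Perceived loudness inside vs. outside the region, at 2x per 10 dB."""
    margin_db = coherent_gain_db(n_speakers) - incoherent_gain_db(n_speakers)
    return 2.0 ** (margin_db / 10.0)


for n in (100, 1000):
    print(n, coherent_gain_db(n), incoherent_gain_db(n), loudness_ratio(n))
```

For 100 and 1,000 speakers this gives margins of 20 dB and 30 dB, which correspond to the four-fold and eight-fold loudness figures stated above.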
In a natural whispering chamber such as an ellipsoidal room having two foci, sound emitted at one focus at a sound level which cannot be heard a short distance away, will nevertheless be clearly audible at the second focus of the ellipsoidal room. In the whispering chamber, sound is reflected to a region in space where constructive addition forms an audible sound. Just as a localized region of audible sound can be produced by the natural reflective properties of an ellipsoidal room, which cause every first reflection of a sound from the first focus to reach the second focus at the same time, and add together to produce a localized region of audible sound, a similar effect can be achieved artificially with an array of speakers.
The audio speaker array 20 is driven to produce sound so that the sound from each speaker reaches a discrete region 40 in space at the same time. If many audio speakers were arranged facing inwardly on the surface of a sphere and the same sound signal is then broadcast through each speaker, a person standing at the center of the sphere will perceive a signal created by the constructive interference of all the speakers which is louder than the mere algebraic sum of the volume of each speaker.
If a single speaker located on the sphere surface is given a first sound level as heard from the center of the sphere, ten speakers at the same volume without constructive interference would normally sound twice as loud (10 dB). However, if the ten speakers are placed on the sphere's surface to result in constructive interference, a sound intensity of ten squared (20 dB) will be perceived, or a sound four times as loud as the single speaker. Similarly, if 100 speakers are used, a sound intensity of 100 squared (40 dB) will be perceived, a sound 16 times as loud as a single speaker.
A flat array 20 such as shown in
The delay required for each speaker can be calculated by determining the distance Dmax between the common focus 40 and the speaker furthest from it, and setting the delay for that furthest speaker equal to zero or a constant. The distance D1 between any particular speaker and the common focus 40 is then determined. The delay in the sound emitted from a particular speaker is equal to the maximum distance minus the particular distance, divided by the speed of sound Vs:
(Dmax−D1)/Vs=Delay in seconds
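A minimal sketch of this formula applied to speaker coordinates follows. The speaker layout and focus position are hypothetical, coordinates are in feet, and the speed of sound is taken as the approximate 1,000 fps used in the text.

```python
import math

SPEED_OF_SOUND_FPS = 1000.0  # approximate value used in the text


def speaker_delays(speakers_ft, focus_ft, speed_fps=SPEED_OF_SOUND_FPS):
    """Return (Dmax - D) / Vs in seconds for each speaker position."""
    distances = [math.dist(s, focus_ft) for s in speakers_ft]
    d_max = max(distances)
    return [(d_max - d) / speed_fps for d in distances]


# Hypothetical layout: four ceiling speakers, focus 8 ft below the first one.
speakers = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0),
            (0.0, 15.0, 0.0), (6.0, 15.0, 0.0)]
focus = (0.0, 0.0, -8.0)
delays = speaker_delays(speakers, focus)

# Emission delay plus travel time should be the same for every speaker.
arrivals = [d + math.dist(s, focus) / SPEED_OF_SOUND_FPS
            for d, s in zip(delays, speakers)]
print(delays)
print(arrivals)
```

The speaker farthest from the focus receives zero delay, and the computed arrival times agree to within floating-point rounding.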
As shown in
A time varying audio drive voltage with the correct delay for each speaker 50, 54, 56, 58 is created from a single audio source 70, as shown in
The maximum delay needed is proportional to the maximum difference in distance between any two speakers of the array 20 and the target which is the discrete region 40. The speed of sound in air at room temperature is approximately 1,000 fps. For each speaker 22, the target distance is the distance between that speaker and the target 40. The maximum delay needed is the maximum difference between the target distances of any two speakers 22 divided by 1,000. The maximum difference in the target distances of any two speakers must necessarily be less than the maximum dimension of the speaker array 20, which must be less than the maximum dimension of the room in which the speakers are situated. The precise amount of delay necessary will of course be dependent on the position of the target 40 in space relative to the array 20. Because the location of the target 40 may be adjustable either in real-time, or at set intervals, the amount of delay may be simply taken as the maximum room dimension or the maximum array dimension, or by looking at a particular situation to determine the actual needed delay.
A digital sample can be simply thought of as the amplitude of an audio signal at a particular point in time. The amplitude of the audio signal is the sum of the amplitude of all audio frequencies present. As will be understood by those skilled in the art of audio frequency sound reproduction, the sample frequency must be sufficiently higher than the highest frequency which it is desired to reproduce, and various analog and/or digital filtering must be used to reduce the effects of digital sampling on signal quality.
The A/D converter 72 produces a standard digital signal comprised of a series of values corresponding to each timed sample of the audio signal taken each 1/44,000th of a second. The samples 74 are sent to the memory stack 76, which is incremented with the addition of each new sample, moving each previously stored sample to the next memory storage register 78. Each location in the memory stack will thus contain a digital word. The digital word will typically be a 16-bit word for standard sound quality, but could be of higher or lower precision. Because each additional storage register represents a time increment of 1/44,000th of a second, the address of the storage register which contains a signal with a selected delay can be determined by multiplying the desired time delay in seconds by 44,000. A computer or microprocessor 80 stores the sound samples and shifts all the samples 44,000 times a second. The total number of storage registers necessary is the product of the total time delay needed and the number of samples taken per second. The total delay time, as discussed above, is governed by the maximum dimension of the speaker array 20 and is less than that dimension divided by the speed of sound.
To provide an audio signal to each speaker, the computer controls a pointer 79 for each speaker which is directed to a particular memory register which is read out 44,000 times a second in synchronization with the memory registers being incremented. Each speaker 22 has a related pointer 79, and the computer 80 contains in memory all the pointers 79, which are collectively referred to as a pointer array. The end result is a digital signal with a selected delay which is converted to a time varying audio drive voltage, and applied to each individual speaker of the speaker array 20.
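The memory stack and pointer array described above can be sketched as follows. This is an illustrative model only, not the disclosed hardware: the class name `DelayLine`, the stack length, and the pointer values are hypothetical, and a `collections.deque` stands in for the storage registers.

```python
from collections import deque


class DelayLine:
    """Fixed-length store of recent samples; index 0 is the newest sample."""

    def __init__(self, length):
        self._samples = deque([0] * length, maxlen=length)

    def push(self, sample):
        # Adding at the left shifts every stored sample one register along
        # and drops the oldest sample from the far end.
        self._samples.appendleft(sample)

    def read(self, pointers):
        """Return one delayed sample per speaker, using one pointer each."""
        return [self._samples[p] for p in pointers]


line = DelayLine(8)
pointers = [0, 2, 5]          # hypothetical pointer array for three speakers
for sample in range(1, 11):   # stand-in for successive A/D samples
    line.push(sample)
outputs = line.read(pointers)
print(outputs)
```

A real-time implementation would more likely use a circular buffer with a moving write index instead of shifting data, which gives the same readout with less work per sample.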
Referring to
To produce several sound regions 40 from a single array of speakers, a plurality of audio sources 94 are digitized by a plurality of A/D digital samplers 95 and stored in memory stacks 96, which are used to contain a multiplicity of sound samples 97 sequentially stored from each audio source 94 to form an audio source signal. For a particular speaker 98, sound from each audio source signal will be delayed by a different amount so that the sound transmitted by each speaker 22 in the array 20 can contribute to the constructive interference of sound at a plurality of locations to create a plurality of sound regions, each with its own soundtrack. Each particular speaker 98 is associated with a pointer 100 corresponding to each memory location which will produce a signal delay which will cause a particular signal to constructively interfere at a particular discrete region 40. Several such audio signals, each with its characteristic delay, may be increased or decreased in amplitude by a volume control 108, which may be a digital multiplier. The amplitude adjusted audio signals are added with a digital adder 102 and converted to an analog signal with a digital-to-analog converter 104, then amplified by the amplifier 106. The signal associated with each particular set of pointers 100 is then sent to a particular speaker 98.
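The per-speaker mixing step, in which one delayed sample from each audio source is scaled and summed, might be sketched as below. The function name `mix_for_speaker` and the sample data are hypothetical; each memory stack is modeled as a plain list with the newest sample first.

```python
def mix_for_speaker(delay_lines, pointers, gains):
    """Sum the delayed, gain-scaled sample from each source for one speaker.

    delay_lines: one list of stored samples per source, newest first
    pointers:    one register index per source for this speaker
    gains:       one volume multiplier per source for this speaker
    """
    return sum(g * line[p] for line, p, g in zip(delay_lines, pointers, gains))


# Hypothetical data: two sources, each with six stored samples.
source_a = [6, 5, 4, 3, 2, 1]
source_b = [60, 50, 40, 30, 20, 10]
sample = mix_for_speaker([source_a, source_b], pointers=[1, 4], gains=[1.0, 0.5])
print(sample)
```

The multiplication plays the role of the volume control and the sum plays the role of the digital adder; the result would then pass to the digital-to-analog converter for that speaker.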
For the proper functioning of the sound reproduction system 36, each speaker must be connected to the output of the memory location which contains sound data with the proper delay value. If the discrete region in which it is desired to produce sound is at a fixed location, the fixed location with respect to the speaker array 20 can be used to solve for the necessary delay and thus each speaker can be connected to the memory location which produces sound with the proper delay. The set of data which is the proper delay for each speaker is a pointer matrix, and the value of the pointer matrix can be arrived at by calculation or empirically. If the focus of the array is progressively swept through the entire room, a microphone placed at the desired location 40 can readily detect when the progress of sweep has reached the desired location. Values contained in the pointer matrix when a test tone reaches a maximum volume at the test location can be saved and thereafter used to control each speaker time delay.
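The empirical calibration described above amounts to sweeping candidate pointer sets and keeping the one that gives the loudest microphone reading. A minimal sketch follows, in which `measure_level` is a hypothetical stand-in for applying a pointer set to the hardware and reading the test microphone; here it is simulated so the example runs on its own.

```python
def calibrate(candidate_pointer_sets, measure_level):
    """Return the pointer set that gave the loudest test-tone reading.

    candidate_pointer_sets: iterable of per-speaker pointer lists, one per
        focus position visited by the sweep
    measure_level: callable taking a pointer set and returning the level
        sensed at the microphone (hypothetical hardware interface)
    """
    best_pointers, best_level = None, float("-inf")
    for pointers in candidate_pointer_sets:
        level = measure_level(pointers)
        if level > best_level:
            best_pointers, best_level = list(pointers), level
    return best_pointers, best_level


# Simulated microphone: loudest when the pointers match a hidden "true" set.
TRUE_POINTERS = [3, 0, 5]


def simulated_level(pointers):
    return -sum(abs(p - t) for p, t in zip(pointers, TRUE_POINTERS))


sweep = [[0, 0, 0], [3, 0, 5], [4, 1, 6]]
saved_pointers, saved_level = calibrate(sweep, simulated_level)
print(saved_pointers, saved_level)
```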
If it is desired to follow a person moving about a room with the discrete region 40 of increased sound volume, a tracking system may be employed to locate a wireless microphone placed on the person, preferably near the person's ear. The tracking system creates an interrogating target focus. The interrogating target focus is a sub-audible tone pattern that is localized in three-dimensional space and continuously scanned through the listening room. The listener wears the wireless microphone, and the listener's location within the listening room is fixed in relation to the speaker array by sensing the time at which the signal from a given microphone reaches its maximum. Thus the pointer array is constantly updated with the listener's current location.
It should be noted that this method reduces the computational load of the CPU since it eliminates the need to calculate the delays to be programmed into the pointer matrix, as the target locations are identified empirically as the interrogating tone is scanned through the room. The process of scanning a tone target is a simple matter of incrementing, in a predetermined fashion, the data elements of the pointer matrix for the target focus to be scanned through the room.
An experiment was performed to test the sound system of the invention by fabricating a 9×9 array of speakers 10 inches on center. The eighty-one speaker emitter panel was constructed using one-quarter-inch thick pegboard. Low power, three-inch round speakers rated for 2 watts max were each affixed by a mounting screw that extended from the back of the speaker, co-located with the central axis of the cone of that speaker. The speakers were placed on a 10″ grid pattern and affixed to the pegboard. A short length of PVC tubing was placed around the mounting screws so that the tubing would expand to fill the pegboard hole as the screws were tightened. This arrangement served to acoustically isolate each speaker from the others and to center each speaker so that the axis of the cone of each speaker was normal to the plane of the emitter panel.
The speakers were wired to an 81 channel amplifier/phase delay apparatus. The apparatus used for producing individual delays for each of the 81 channels was composed of six digitally addressed multi-channel digital-to-analog converter printed circuit boards (D to A cards) and one digital signal processing printed circuit board (DSP card) capable of addressing each channel on each D to A card. The D to A cards and the DSP card were interconnected and powered via an ISA passive backplane of an IBM PC clone. The D to A cards as well as the DSP card were designed and laid out on an AT-ISA form factor to fit the passive backplane.
Each D to A card contained 16 channels of digital-to-analog converter circuitry. An 8 bit data byte was written to each individual channel. Individual channels were addressed by first enabling the entire board with a board enable signal generated by the DSP card. While a given card was enabled, a 4 bit digital address was driven onto the passive backplane bus. A unique address was generated by the 4 bits for each of the 16 channels on the D to A card. Latching the 8 bit data into an addressed channel was accomplished by driving the memory write signal to a logic low. The memory write signal was generated by the DSP card. Each channel of the D to A card was composed of an 8 bit data latch IC (74HC573); an 8 bit digital to analog converter IC (DAC08) that output a current proportional to a digital input value; and an operational amplifier (LM741) configured to function as an analog current to voltage converter, whose output voltage was sent to an audio power amplifier (LM380). The audio power amplifier drove an individual speaker for that channel.
In addition to the circuits comprising the digital to analog converters, the D to A card also held the data bus receiver circuitry (74HC244), channel decoding circuitry 2 ea (74HC308) and logic NOR gates to combine the “memory write” signal with the channel select signal (output of the 74HC308) into a data latch enable signal used by the 74HC573.
The DSP card was comprised of a Texas Instruments 32040 CODEC and a Texas Instruments 320C50 DSP. The CODEC contained the audio analog-to-digital converter that output 14 bit digital samples to the DSP to process. The operation of the system described to this point was synchronized to the sample rate of the CODEC.
The DSP performed the channel delay process by means of a first in, first out delay line. The amount of delay for any given channel was preprogrammed to produce the focus at the predetermined location in three-dimensional space in front of the emitter panel. Each time the CODEC delivered a new sample to the DSP, the DSP first updated the delay line, then selected a sample for each digital-to-analog converter channel in the 81 channel array. The input latch to each channel was mapped into the memory map of the DSP. All 81 channels were written to, each with its own selected sample, before the next new sample was delivered by the CODEC. The sample selected was determined from a delay pointer matrix, a matrix of memory pointers that was preprogrammed into the DSP code at compile time. Each memory pointer in the matrix pointed to a specific address within the delay line. Each location in the delay line represented a specific amount of time delay equal to the amount of delay between samples multiplied by the number of memory locations the specific address was from the first location in the delay line. Thus the first location held the most recent sample. The length of the delay line used was 128 samples. The apparatus allowed individual control of the delay of the audio signal to each speaker.
As discussed previously, eighty-one speakers should produce a sound intensity of approximately 19 dB above the volume of a single speaker, and when combined with time delays in accordance with the invention so that constructive interference is achieved for a selected region in space, a sound level of 38 dB above that produced by a single speaker should result, so that sound in a selected region should be 19 dB above the ambient sound levels. The data set produced below consists of sound level readings taken with a handheld meter which provided dB readings. The meter scale began at 40 dB, and meter readings were taken at 10 inch intervals on axis with the speakers. The sound delays were selected to produce a maximum volume at a region which was 60 inches in front of the speaker array and centered over the speaker which was in the fourth row from the top and sixth column from the left side. This reading, as indicated in the data set, was 59 dB. The readings immediately surrounding the target point were 43, 42, 43, 43, 43, 42, 42, 43, with 48 immediately in front of the target point and 46 immediately behind. Thus it is seen that the test apparatus produced a sound level which was approximately 16 dB above sound levels immediately adjacent to the target point, and generally at least 10 dB above any other data point, with the exception of a data point taken ten inches above a noisy speaker in the seventh row, sixth from the left, which was the result of a faulty amplifier driving a particular speaker.
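The predicted and measured margins quoted above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures given in the text; averaging the eight surrounding readings is an assumption made here to arrive at the "approximately 16 dB" figure.

```python
import math

N_SPEAKERS = 81

# Predicted gains over a single speaker.
incoherent_db = 10.0 * math.log10(N_SPEAKERS)   # random-phase addition
coherent_db = 20.0 * math.log10(N_SPEAKERS)     # in-phase addition
predicted_margin_db = coherent_db - incoherent_db

# Measured readings from the experiment, in dB.
target_db = 59
surrounding_db = [43, 42, 43, 43, 43, 42, 42, 43]
mean_surrounding_db = sum(surrounding_db) / len(surrounding_db)
measured_margin_db = target_db - mean_surrounding_db

print(round(incoherent_db, 1), round(coherent_db, 1))
print(round(predicted_margin_db, 1), round(measured_margin_db, 1))
```

The predicted margin comes out near 19 dB and the measured margin near 16 dB, consistent with the figures reported for the test apparatus.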
It should be understood that the amplitude of each speaker may be adjusted to improve the sound volume in a particular discrete region 40. Alternatively, speaker volume may be adjusted to reduce volume in a particular region where sound is unintentionally audible above background noise levels. Such regions might arise due to room acoustics or speaker array geometry. Heuristic computer algorithms working with microphones placed between speakers in the array, or with microphones moved about a room, could use fuzzy logic systems, or other systems of systematic or random variations, to achieve a maximum volume of sound at discrete locations while simultaneously minimizing sound present outside of the discrete locations. By adjusting the variables of delay and volume of each speaker, a heuristic approach can be taken to find solutions which are better than those based on assumptions about room and speaker acoustics. Even moderately sized rooms can allow a new sound response to be established many times a second, allowing thousands of iterations to be performed in a few minutes. These iterations may simultaneously be performed at multiple frequencies.
It should be understood that where the time varying audio drive voltages, i.e. the signals used to drive the individual speakers, are described as substantially identical, the signals are so defined even though they vary in amplitude. The term substantially identical means capable of constructive interference when used in the sound system 36 of this invention.
It should be understood that D class amplifiers which utilize pulse width modulation to drive audio speakers directly from line voltage could be used to drive the speakers 22 of the sound reproduction system 36. This approach eliminates the need for audio amplifiers and has greatly increased efficiency in converting electrical power into audio output. D class amplifiers thus require only a power source and the digital input to drive the speakers. As shown in
It should be understood that stereo sound without headphones could be produced by creating discrete regions in space 40 which are closely spaced and contain the left and right channels making up the stereo signal, so that when properly positioned, a person could hear stereo sound. Noise cancellation with the sound system 36 is also possible, particularly where the noise to be canceled can be predicted, either by monitoring the noise at its source, or because the noise is of a periodic nature.
It should be understood that the speakers 22 are preferably mounted on the ceiling, in part because this will minimize interference of objects and persons with the sound transmitted from the speakers to create the discrete regions in space 40. However, the sound system 36 is inherently resistant to being blocked by objects and persons, especially when the array 20 is spread over a wide area, so that sound reaches the discrete regions in space over a wide angle of convergence.
It should be understood that the sound system of this invention can provide distinct and controllable volume levels for different individuals in the same listening room. It should also be understood that the sound system of this invention can be used to create multilingual school rooms or auditoriums where listeners if properly equipped with a locating device or seated in the proper location can hear a presenter in his or her own language without the use of cumbersome headphones.
A sound system of this invention may also make possible having both edited and non-edited versions of motion picture film dialog presented to the same audience at the same time, or even different plot lines could be presented to different portions of the audience. Hands-free phone operation might be achieved in open office environments while still maintaining private conversation. Buildings so equipped could take advantage of listener tracking to automatically route telephone and intercom signals to the desired recipient without the need of a handset, or a public address system which is heard by all.
It should further be understood that the sound produced by the sound system of this invention is a ‘real image’ which actually comes from the location it appears to come from, creating many opportunities for sound reproduction and special effects in video games, multimedia presentations and high fidelity music.
It is understood that the invention is not limited to the particular construction and arrangement of parts herein illustrated and described, but embraces such modified forms thereof as come within the scope of the following claims.
Number | Name | Date | Kind |
---|---|---|---|
4359944 | Stiennon | Nov 1982 | A |
4515997 | Stinger, Jr. | May 1985 | A |
4789801 | Lee | Dec 1988 | A |
5020155 | Griffin et al. | May 1991 | A |
5109416 | Croft | Apr 1992 | A |
5404406 | Fuchigami et al. | Apr 1995 | A |
5521981 | Gehring | May 1996 | A |
5596649 | Liu | Jan 1997 | A |
5668884 | Clair, Jr. et al. | Sep 1997 | A |
5781645 | Beale | Jul 1998 | A |
5889870 | Norris | Mar 1999 | A |
5949894 | Nelson et al. | Sep 1999 | A |
5974152 | Fujinami | Oct 1999 | A |
6016353 | Gunness | Jan 2000 | A |
6163613 | Cowans | Dec 2000 | A |
6263083 | Weinreich | Jul 2001 | B1 |
6373955 | Hooley | Apr 2002 | B1 |
6434239 | DeLuca | Aug 2002 | B1 |
20010007591 | Pompei | Jul 2001 | A1 |
20010043652 | Hooley | Nov 2001 | A1 |
20020012442 | Azima et al. | Jan 2002 | A1 |
20020106093 | Azima et al. | Aug 2002 | A1 |
20020131608 | Lobb et al. | Sep 2002 | A1 |
20030031333 | Cohen et al. | Feb 2003 | A1 |
Number | Date | Country | |
---|---|---|---|
20030185404 A1 | Oct 2003 | US |