An embodiment of the invention relates to generating audio beacons that may then be used, for example, to determine the relative location and orientation of an audio emission device. Other embodiments are also described.
It is often useful to know the location/orientation of an audio capture device (e.g., a microphone array) relative to an audio emission device (e.g., a loudspeaker array). For example, this location/orientation information may be utilized for optimizing audio-visual content rendered by a computing device. Traditionally, location information may be determined using a set of audio beacons produced by the audio emission device and detected by the audio capture device. For example, an audio emission device may emit a set of beacon beams along with a set of intended/primary beams. The primary beams may represent channels for a piece of sound program content (e.g., a musical composition or a soundtrack for a movie) while the beacon beams are purely intended to be detected by the audio capture device for determining the spatial relationship between the audio capture device and the audio emission device.
However, the approach discussed above suffers from inefficiencies as beacon beams are separate and distinct from primary beams. Accordingly, extra processing overhead must be incurred by the audio emission device to produce these beacon beams.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
An audio emission device and an audio capture device that may respectively emit and capture sound within a listening area are described. In particular, the audio emission device may include a loudspeaker array, including a set of transducers, for emitting sound and the audio capture device may include one or more microphones (e.g., a standalone microphone, or a set of microphones in a microphone array) for capturing sound in the listening area.
Orthogonal test signals may be added into a set of modal sound patterns produced by the audio emission device, wherein the modal sound patterns are also weighted to produce a set of primary audio beams. The modal sound patterns may be extracted from sounds detected by the audio capture device based on the injected orthogonal test signals, such that the modal beam patterns operate as audio beacons.
In one embodiment, the audio emission device may produce a set of one or more primary audio beams in the listening area. Each of the primary audio beams may be formed by weighting a set of modal beam patterns. In one embodiment, separate orthogonal test signals may be injected into each modal beam pattern. Based on these separate orthogonal test signals, the individual modal beam patterns may be extracted from a detected sound signal produced by the audio capture device such that the contribution from each of these modal patterns in the detected sound signal may be determined. Utilizing the contributions from each modal beam pattern in the detected sound signal, the spatial relationship (e.g., distance and/or orientation/angle) between the audio emission device and the audio capture device may be determined. Accordingly, the modal beam patterns, which are used to generate the primary beams, may also be used as audio beacons.
As discussed above, by injecting orthogonal test signals into modal beam patterns, which are used to generate primary audio beams, the modal beam patterns may function as audio beacons. Accordingly, audio beacons that are separate from the primary audio beams do not need to be generated; instead, the modal beam patterns that form the primary audio beams may be used as audio beacons for determining the position of the audio emission device relative to the audio capture device.
The above summary does not include an exhaustive list of all aspects of the present invention. It is contemplated that the invention includes all systems and methods that can be practiced from all suitable combinations of the various aspects summarized above, as well as those disclosed in the Detailed Description below and particularly pointed out in the claims filed with the application. Such combinations have particular advantages not specifically recited in the above summary.
The embodiments of the invention are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that references to “an” or “one” embodiment of the invention in this disclosure are not necessarily to the same embodiment, and they mean at least one.
Several embodiments are described with reference to the appended drawings. While numerous details are set forth, it is understood that some embodiments of the invention may be practiced without these details. In other instances, well-known circuits, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
As will be described in greater detail below, the audio emission device 101A may produce a set of primary audio beams in the listening area 103. Each of the primary audio beams may be formed by weighting a set of modal beam patterns. In one embodiment, separate orthogonal test signals may be injected into each modal beam pattern. Based on these separate orthogonal test signals, the individual modal beam patterns may be extracted from a detected sound signal produced by the audio capture device 101B such that the contribution from each of these modal patterns in the detected sound signal may be determined. Utilizing the contributions from each modal beam pattern in the detected sound signal, the spatial relationship (e.g., distance and orientation/angle) between the audio emission device 101A and the audio capture device 101B may be determined. Accordingly, as will be described in greater detail below, the modal beam patterns, which are used to generate the primary beams, may also be used as audio beacons for determining the spatial relationships between the audio emission device 101A and the audio capture device 101B.
As shown in
The audio emission device 101A may include a main system processor 201 and a memory unit 203. The processor 201 and memory unit 203 are generically used here to refer to any suitable combination of programmable data processing components and data storage that conduct the operations needed to implement the various functions and operations of the audio emission device 101A. The processor 201 may be a special purpose processor such as an application-specific integrated circuit (ASIC), a general purpose microprocessor, a field-programmable gate array (FPGA), a digital signal controller, or a set of hardware logic structures (e.g., filters, arithmetic logic units, and dedicated state machines) while the memory unit 203 may refer to microelectronic, non-volatile random access memory. An operating system may be stored in the memory unit 203, along with application programs specific to the various functions of the audio emission device 101A, which are to be run or executed by the processor 201 to perform the various functions of the audio emission device 101A. For example, the memory unit 203 may include a beam emission unit 205, which in conjunction with other hardware and software elements of the audio emission device 101A, emits a set of modal beam patterns into the listening area 103. As will be described in further detail below, these modal beam patterns (1) may be used for constructing one or more primary beam patterns where each primary beam pattern may be assigned, via beam input parameters, to a separate one or more channels of sound program content (e.g., each input channel of the sound program content may be assigned a separate primary beam, and the primary beam is decomposed into contributions from the modal beams) and (2) may be used as audio beacons for determining the spatial relationship between the audio capture device 101B and the audio emission device 101A.
As noted above, in one embodiment, the audio emission device 101A may include a loudspeaker array 105 for outputting sound into the listening area 103. As shown in
The transducers 107 may be any combination of full-range drivers, mid-range drivers, subwoofers, woofers, and tweeters. Each of the transducers 107 may use a lightweight diaphragm, or cone, connected to a rigid basket, or frame, via a flexible suspension that constrains a coil of wire (e.g., a voice coil) to move axially through a cylindrical magnetic gap. When an electrical audio signal is applied to the voice coil, a magnetic field is created by the electric current in the voice coil, making it a variable electromagnet. The coil and the transducers' 107 magnetic system interact, generating a mechanical force that causes the coil (and thus, the attached cone) to move back and forth, thereby reproducing sound under the control of the applied electrical audio signal coming from a source.
Each transducer 107 may be individually and separately driven to produce sound in response to a separate and discrete audio signal. By allowing the transducers 107 in the loudspeaker array 105 to be individually and separately driven according to different parameters and settings (including individual drive signal filters, which control delays, amplitude variations, and phase variations across the audio frequency range), the loudspeaker array 105 may produce numerous directivity patterns to simulate or better represent respective channels of sound program content. For example, the transducers 107 in the loudspeaker array 105 may be individually driven to produce a set of modal beam patterns as will be described in greater detail below.
In one embodiment, the audio emission device 101A may include a communications interface 207 for communicating with other components over one or more connections. For example, the communications interface 207 may be capable of communicating using Bluetooth, the IEEE 802.11x suite of standards, IEEE 802.3, cellular Global System for Mobile Communications (GSM) standards, cellular Code Division Multiple Access (CDMA) standards, and/or Long Term Evolution (LTE) standards. In one embodiment, the communications interface 207 facilitates the transmission/reception of video, audio, and/or other pieces of data.
Turning now to
The audio capture device 101B may include a main system processor 401 and a memory unit 403. Similar to the processor 201 and the memory unit 203, the processor 401 and the memory unit 403 are generically used here to refer to any suitable combination of programmable data processing components and data storage that conduct the operations needed to implement the various functions and operations of the audio capture device 101B. The processor 401 may be a special purpose processor such as an ASIC, a general purpose microprocessor, a FPGA, a digital signal controller, or a set of hardware logic structures (e.g., filters, arithmetic logic units, and dedicated state machines) while the memory unit 403 may refer to microelectronic, non-volatile random access memory. An operating system may be stored in the memory unit 403, along with application programs specific to the various functions of the audio capture device 101B, which are to be run or executed by the processor 401 to perform the various functions of the audio capture device 101B. For example, the memory unit 403 may include a sound detection unit 405 and an orientation determination unit 407. These units 405 and 407, in conjunction with other hardware and software elements of the audio capture device 101B, (1) detect/measure sounds in the listening area 103 (e.g., containing modal beam patterns produced by the audio emission device 101A), (2) extract/separate each of the modal beam patterns represented in a detected sound signal based on detected orthogonal test signals that had been injected into each modal pattern, and (3) determine the orientation of the audio capture device 101B in relation to the audio emission device 101A based on these modal sound patterns.
As noted above, in one embodiment, the audio capture device 101B may include one or more microphones 109. For example, the audio capture device 101B may include multiple microphones 109 arranged in a microphone array 111. Each of the microphones 109 in the audio capture device 101B may sense sounds and convert these sensed sounds into electrical signals. The microphones 109 may be any type of acoustic-to-electric transducer or sensor, including a microelectromechanical systems (MEMS) microphone, a piezoelectric microphone, an electret condenser microphone, or a dynamic microphone. The microphones 109 may be used with various filters that can control gain and phase across a range of frequencies in the audible spectrum (including possible use of delays) to provide a range of polar patterns, such as cardioid, omnidirectional, and figure-eight. The resulting polar sound pickup patterns alter the direction and area of sound captured in the vicinity of the audio capture device 101B. In one embodiment, the polar patterns of the microphones 109 may vary continuously over time.
In one embodiment, the audio capture device 101B may include a communications interface 413 for communicating with other components over one or more connections. For example, similar to the communications interface 207, the communications interface 413 may be capable of communicating using Bluetooth, the IEEE 802.11x suite of standards, IEEE 802.3, cellular GSM standards, cellular CDMA standards, and/or LTE standards. In one embodiment, the communications interface 413 facilitates the transmission/reception of video, audio, and/or other pieces of data over one or more connections.
Turning now to
Each operation of the method 500 may be performed by one or more components of the audio emission device 101A, the audio capture device 101B, and/or another device. For example, one or more of the beam emission unit 205 of the audio emission device 101A and/or the sound detection unit 405 and the orientation determination unit 407 of the audio capture device 101B may be used for performing the various operations of the method 500. Although the units 205, 405, and 407 are described as software or instructions residing in the memory units 203 and 403, respectively, to be executed by the processors 201, 401, in other embodiments, the actions of the processors 201, 401 executing the units 205, 405, and 407 may be implemented by one or more hardwired logic structures, including digital filters, arithmetic logic units, and dedicated state machines.
The method 500 will be described in relation to the components shown in
Although the operations of the method 500 are shown and described in a particular order, in other embodiments the operations of the method 500 may be performed in a different order. For example, one or more of the operations may be performed concurrently or during overlapping time periods. Each operation of the method 500 will now be described below by way of example.
In one embodiment, the method 500 may commence at operation 501 with the receipt of a set of audio signals representing one or more channels for a piece of sound program content. For instance, the audio emission device 101A may receive N channels of audio, as shown in
At operation 503, the one or more audio channels may be processed using one or more filters. For example, as shown in
At operation 505, one or more beam inputs may be received describing desired characteristics for N primary beams that will be used for playing back the N channels, respectively. In other words, each primary beam is assigned to play back a separate one of the N input channels. For example, as shown in
The N audio channels may be represented in a matrix or a similar data structure. For example, samples from the N audio channels that have been processed by the FIR filters 601_1 to 601_N may be represented by the audio sample matrix X:
In the example audio sample matrix X, each component or value xi represents a discrete time division of audio channel i. In one embodiment, at operation 507 the audio matrix X may be processed (based on beam inputs received at operation 505) by a beam pattern matrix mixing unit 603, to produce a modal gain matrix. The modal gain matrix may be viewed as representing a number of weighted modal beam patterns. The beam pattern mixing unit 603 may regulate the shape and direction of beam patterns for each of the N audio channels, in view of the beam inputs received at operation 505 which describe desired characteristics for N primary beams. The primary beams as defined by the beam inputs (or beam input patterns) characterize how sound radiates from the transducers 107 in the loudspeaker array 105 and into the listening area 103 (once the transducers 107 are driven by their respective drive signals that have been generated in accordance with the primary beams). For example, a highly directed cardioid beam pattern (having high directivity index, DI) may emit a high degree of sound directly at a listener or another specified area while emitting relatively lower amounts of sound into other areas of the listening area 103, in general. In contrast, a lower directed beam pattern (having low DI, e.g., an omnidirectional beam pattern) may emit a more uniform amount of sound throughout the listening area 103 without special attention to a listener or any specified area.
For a loudspeaker array 105 with transducers 107 arranged in a circular, cylindrical, spherical, or otherwise curved manner, the radiation of sound may be represented by a set of frequency invariant beam pattern modes or bases. The beam pattern mixing unit 603 may represent or define a desired primary beam pattern in terms of (or as a weighted combination of) a set of two or more predefined, modal beam patterns. For instance, the predefined modal beam patterns may include an omnidirectional pattern (
M=2S+1
The beam pattern mixing unit 603 may define a set of weighting values for each of the N audio channels and each of the M predefined modal beam patterns. The weighting values define the amount of each of the N channels to apply to each of the M modal beam patterns, such that a desired, corresponding primary beam pattern, e.g., a separate primary beam for each of the N channels, may be generated by the loudspeaker array 105. In other words, the primary beam pattern is given as a combination of the so-weighted, M modal beam patterns. For example, through the setting of corresponding weighting values, an omnidirectional modal beam pattern may be mixed with a horizontal dipole modal beam pattern to yield a cardioid beam pattern directed at 90° as shown in
In one embodiment, the resulting combination of the predefined modal beam patterns may be non-proportional such that more of one modal beam pattern may be used in comparison to another modal beam pattern, to produce a desired beam pattern for an audio channel. In some embodiments, the weighting values defined by the beam pattern mixing unit 603 may be represented by any real numbers. For example, weighting values of
may be separately applied to a horizontal dipole modal beam pattern and a vertical dipole modal beam pattern, while a weighting value of one is applied to an omnidirectional modal beam pattern. The mixing of these three variably weighted modal beam patterns may yield a cardioid primary beam pattern directed at 270° as shown in
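The weighting described above can be sketched numerically. The following is a minimal illustration under stated assumptions, not the implementation of the beam pattern mixing unit 603: it assumes first-order modes only (three modal beam patterns modeled as 1, cos θ, and sin θ), and the function names `modal_patterns`, `primary_pattern`, and `steer_cardioid` are hypothetical. The angle convention is likewise an assumption and need not match the figures.

```python
import math

def modal_patterns(theta):
    """Assumed first-order modal basis at angle theta: omni, cos-dipole, sin-dipole."""
    return [1.0, math.cos(theta), math.sin(theta)]

def primary_pattern(weights, theta):
    """Weighted sum of the modal beam patterns at angle theta (radians)."""
    return sum(w * b for w, b in zip(weights, modal_patterns(theta)))

def steer_cardioid(direction_deg):
    """Hypothetical weights for a first-order cardioid pointing at direction_deg."""
    d = math.radians(direction_deg)
    return [1.0, math.cos(d), math.sin(d)]

# Unequal (non-proportional) weights on the three modes steer the lobe to 45 degrees.
weights = steer_cardioid(45.0)
front = primary_pattern(weights, math.radians(45.0))   # response along the main lobe
back = primary_pattern(weights, math.radians(225.0))   # response in the opposite direction
```

With these assumed modes the pattern reduces to 1 + cos(θ − d), so the response peaks in the steering direction d and vanishes directly behind it.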
As described above, different weighting values may be used to apply different levels of each predefined modal beam pattern to generate a desired primary beam pattern, for a corresponding audio channel. In one embodiment, the beam pattern mixing unit 603 may use a beam pattern matrix Z that defines a primary beam pattern for each of the N audio channels in terms of weighting values applied to the predefined M modal beam patterns. For example, each entry α in the beam pattern matrix Z may correspond to a real number weighting value for a predefined modal beam pattern and a corresponding audio channel. For a set of M modal patterns and N audio channels, the beam pattern matrix Z_{M,N} may be represented as:
As previously described, each of the weighting values α represents the level or degree to which a predefined modal beam pattern is to be applied to a corresponding audio channel. In the above example matrix Z_{M,N}, each column represents the level or degree to which a respective one of the M predefined modal beam patterns will be applied, to a corresponding audio channel in the N received/retrieved audio channels. Each of the weighting values α may be based on the primary beam inputs received at operation 505.
The beam pattern mixing unit 603 may apply the beam pattern matrix Z to the N audio channels by multiplying the audio channel matrix X with the beam pattern matrix Z as shown below:
Multiplication of the beam pattern matrix Z and the audio channel matrix X yields a basis or modal gain matrix Y, as shown in the above equation. This multiplication may be repeatedly performed for each sample period of the N audio channels (each sample period having a new matrix XN) to yield a new modal gain matrix Y, for each sample period. Each component or value y in the modal gain matrix Y represents gains corresponding to the N audio channels that will be transmitted to corresponding modal filters 6071-607M, each of which represent a corresponding predefined modal beam pattern—see
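The per-sample-period multiplication can be sketched as follows. This is an illustration only: it assumes Z has one row per modal beam pattern and one column per audio channel, that X holds one sample per channel for the current sample period, and the weights and sample values are made up. The helper name `modal_gains` is hypothetical.

```python
def modal_gains(Z, x):
    """Return y = Z x for one sample period.

    Z: M rows (modal beam patterns) by N columns (audio channels).
    x: list of N channel samples for the current sample period.
    """
    return [sum(z_mn * x_n for z_mn, x_n in zip(row, x)) for row in Z]

# Made-up weights: M = 3 modal beam patterns, N = 2 audio channels.
Z = [
    [1.0, 1.0],   # omnidirectional mode
    [1.0, -1.0],  # first dipole mode
    [0.0, 0.5],   # second dipole mode
]

# Two sample periods of the N = 2 channels; the product is repeated per period.
samples = [[0.25, 0.5], [-0.5, 1.0]]
Y = [modal_gains(Z, x) for x in samples]
```

Each entry of `Y` is one modal gain vector with M values, one per modal filter.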
In one embodiment, prior to feeding the modal gain matrix Y to the modal filters 607_1 to 607_M, operation 509 may mix orthogonal test signals into each modal beam pattern within the modal gain matrix Y, to generate an updated basis or modal gain matrix Y′. In some embodiments, the orthogonal test signals may be pseudorandom noise sequences, satisfying one or more of the standard tests for statistical randomness. For example, the orthogonal test signals may be generated using a linear feedback shift register. In this embodiment, taps of the shift register would be set differently for each of the M modal beam patterns, thus ensuring that the M generated test signals are orthogonal to each other. In other embodiments, the orthogonal test signals may be highly or nearly orthogonal such that the dot product of each pair of test signals is close to zero (i.e., within a threshold or tolerance amount from zero). There may be M orthogonal test signals, which may be binary sequences, where, as noted above, M is the number of modal beam patterns. The orthogonal test signals may be variable in duration or length (e.g., each may be 100 milliseconds to 3 seconds in duration).
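A shift-register generator of this kind might look like the sketch below. It is a toy illustration, not the disclosed generator: the 4-bit register length, the tap sets, the seed, and the names `lfsr_bits`, `to_bipolar`, and `dot` are all assumptions chosen to keep the example short. A practical test signal lasting 100 milliseconds or more would use a much longer register.

```python
def lfsr_bits(taps, nbits, seed, length):
    """Fibonacci shift register: emit `length` bits; feedback is the XOR of the tapped bits."""
    state = seed
    out = []
    for _ in range(length):
        out.append(state & 1)              # output the lowest bit
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1   # XOR of the tapped bit positions
        state = (state >> 1) | (feedback << (nbits - 1))
    return out

def to_bipolar(bits):
    """Map bits {0, 1} to samples {-1, +1}."""
    return [2 * b - 1 for b in bits]

def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

# Toy 4-bit registers with different tap sets; each sequence repeats every 15 bits.
p1 = to_bipolar(lfsr_bits(taps=(0, 1), nbits=4, seed=0b0001, length=15))
p2 = to_bipolar(lfsr_bits(taps=(0, 3), nbits=4, seed=0b0001, length=15))

self_corr = dot(p1, p1) / len(p1)    # normalized energy of one sequence
cross_corr = dot(p1, p2) / len(p1)   # normalized dot product of the pair
```

For these two toy sequences the normalized dot product is small but not exactly zero, which corresponds to the nearly orthogonal case described above. The acceptance threshold is left to the designer.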
Mixing may be performed at operation 509 using a mixer. The mixer 605 may be composed of any set of elements that combine two or more signals. In one embodiment, the mixer 605 may include a resistor network, buffer amplifiers, transistors, diodes, and/or other related components. In one embodiment, the modal/basis gain matrix Y may be combined with a matrix P of orthogonal test signals p1, p2, . . . pm (or PSN1, PSN2, . . . PSNM as depicted in
In the equation above, each of the modal gains yi may be combined with a corresponding orthogonal test signal pi to yield an updated modal gain value yi′, forming a matrix Y′ that is composed of the updated modal gain values.
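The equation itself is not reproduced in this text. Assuming the mixer 605 simply adds each test signal to its modal gain (additive mixing is an assumption, as is the column-vector layout), the relationship may be written as:

```latex
\[
Y' = Y + P
\quad\Longleftrightarrow\quad
\begin{bmatrix} y_1' \\ \vdots \\ y_M' \end{bmatrix}
=
\begin{bmatrix} y_1 \\ \vdots \\ y_M \end{bmatrix}
+
\begin{bmatrix} p_1 \\ \vdots \\ p_M \end{bmatrix},
\qquad
y_i' = y_i + p_i, \quad i = 1, \dots, M .
\]
```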
As noted above, following mixing of an orthogonal test signal with each of the M modal gains at operation 509, the updated modal gain matrix Y′ may be processed by corresponding modal/basis filters 607 at operation 511, to produce a filtered modal/basis gain matrix. In one embodiment, each of the M modal filters 607 may compensate for radiation inefficiencies of sound at low frequencies, for each corresponding modal beam pattern. In particular, higher order modal beam patterns (and/or modal beam patterns with higher DI) may be more difficult to accurately produce at lower frequencies and may require stronger drive signals (e.g., high voltage) to produce. Specifically, lower frequency sounds tend to diffuse into the listening area 103 instead of forming directed patterns. To compensate for these inefficiencies, the M modal filters 607 may be linear digital filters whose frequency responses are set to provide the needed boost at low frequencies. For instance, a modal filter 607_i for a particular predefined modal beam pattern i may boost the output power of its input signal below a roll-off or cut-off frequency for the modal beam pattern i (e.g., the frequency at which the power of the signal for the modal beam pattern has dropped by one-half). Compensating for inefficiencies in modal beam patterns allows the modal beam patterns to be effectively and efficiently used at lower frequencies to produce more complex beam patterns (e.g., higher order patterns and/or beam patterns with higher directivity indices). In some embodiments, these M modal filters 607 may be affected by the diameter of the cabinet of the loudspeaker array 105. In particular, the farthest distance between two of the transducers 107, e.g., two transducers that are on opposing sides of the cabinet, which may be defined by a diameter of a circular cabinet, may affect the efficiencies and shape of sound produced by sets of transducers 107. Thus, the settings for a particular modal filter 607_i may be adjusted according to the dimensions of the cabinet.
Still referring to
The modal decomposition unit 611 may determine how each transducer 107 in the loudspeaker array 105 is to be driven, so that the array 105 as a whole produces each of the primary beams. For example, to produce an omnidirectional modal beam pattern, each of the transducers 107 in the loudspeaker array 105 may be driven using the same driving signal (no relative delays, no relative gain differences). In contrast, a dipole modal beam pattern may require driving different sets of transducers 107 with driving signals that have varied weights (to achieve relative delay and/or relative gain differences.) In one embodiment, the modal decomposition unit 611 may include a modal decomposition matrix T that includes real numbers defining weights for each of the M modal beam patterns, that correspond to each of the D transducers 107 in the loudspeaker array 105. The modal decomposition matrix may be a matrix of real numbers representing assignment levels for each modal beam pattern to each transducer in the loudspeaker array, such that the transducers in the loudspeaker array produce each of the predefined modal patterns based on the weights represented in the beam pattern mixing matrix. The modal decomposition matrix T may be represented as:
In this example matrix T, each column represents a predefined modal beam pattern, while each row represents a transducer 107 in the loudspeaker array 105. Each of the weights βi,j in the modal decomposition matrix T may be applied to the modal amplitudes q in the modal amplitude matrix Q to create drive signals for each transducer 107 in the loudspeaker array 105. For example, the below sample modal decomposition matrix T defines weighting values for four modal beam patterns (four columns in the matrix) and eight transducers 107 (eight rows in the matrix) in a loudspeaker array 105:
The weights β may be chosen to represent the arrangement of the transducers 107 in the loudspeaker array 105. For example, as shown in
To generate a set of driving signals for the transducers 107, respectively, the modal amplitude matrix Q received from the modal filters 607 may be multiplied with the modal decomposition matrix T as shown below:
The resulting driving signal matrix R includes a separate driving signal ri for each of the D transducers 107. By multiplying the modal amplitude matrix Q with the modal decomposition matrix T, each of the driving signals ri includes a weighted component of each predefined modal beam pattern. In this manner, the transducers 107 may be driven to produce the desired N primary beams, for the N audio channels, by using appropriate components from each of the predefined, M modal beam patterns. And since the modal beam patterns also include respective orthogonal test signals, the modal beam patterns here may be used as audio beacons, as will be described further below.
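The product of T and Q can be sketched for a small circular array. This is a minimal illustration under stated assumptions: eight equally spaced transducers, three first-order modes (1, cos φ, sin φ) sampled at each transducer angle φ, and gain-only real weights with no delays. The names `decomposition_matrix` and `drive_signals` are hypothetical, and the weights shown are not those of the disclosed matrix T.

```python
import math

def decomposition_matrix(num_transducers):
    """D x M matrix (M = 3): row d samples each assumed mode at transducer angle phi_d."""
    T = []
    for d in range(num_transducers):
        phi = 2.0 * math.pi * d / num_transducers
        T.append([1.0, math.cos(phi), math.sin(phi)])
    return T

def drive_signals(T, q):
    """Return r = T q: one drive sample per transducer for modal amplitudes q."""
    return [sum(t * q_m for t, q_m in zip(row, q)) for row in T]

T = decomposition_matrix(8)
r_omni = drive_signals(T, [1.0, 0.0, 0.0])    # omni mode only: identical drive on all transducers
r_dipole = drive_signals(T, [0.0, 1.0, 0.0])  # cos-dipole mode only: drive varies around the array
```

In this sketch each row of `T` corresponds to a transducer and each column to a modal beam pattern, matching the row and column roles described for the matrix T.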
At operation 515, the driving signals r produced by the modal decomposition unit 611 may be output to power amplifiers for driving corresponding transducers 107 in the loudspeaker array 105. Accordingly, the loudspeaker array 105 produces in the listening area 103 the primary beam patterns, which have been defined by the beam inputs received at operation 505, and in part as a result of the relative weights that were applied to the modal beam patterns by the decomposition unit 611. Since each of the modal beam patterns effectively included injected orthogonal test signals, these orthogonal test signals are also projected into the listening area 103 (by the audio emission device 101A).
At operation 517, the audio capture device 101B may capture the sound that is being produced by the audio emission device 101A (within the listening area 103), using the sound detection unit 405 and the microphones 109—see
As discussed above, by injecting orthogonal test signals into a process in which modal beam patterns are used to generate primary audio beams, the modal beam patterns may effectively function as audio beacons. In particular, the orthogonal test signals may be detected by the audio capture device 101B and analyzed to determine the position of the audio emission device 101A relative to the audio capture device 101B. Accordingly, audio beacons that are separate from the primary audio beams do not need to be generated; instead, the modal beam patterns that form the primary audio beams may be used as audio beacons for this purpose.
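One way the capture side might recover the per-mode contributions and an angle is sketched below. Every detail here is an assumption rather than the disclosed method: the extraction is done by correlating against known codes, short Walsh codes stand in for the pseudorandom test signals so that the codes are exactly orthogonal, the modes are modeled as 1, cos θ, and sin θ, and the signal is free of noise, room reflections, and program content. The names `CODES`, `received_signal`, `mode_contributions`, and `estimate_angle_deg` are hypothetical. Distance estimation is not shown.

```python
import math

# Exactly orthogonal +/-1 codes (Walsh rows), used here as a stand-in for the test signals.
CODES = [
    [1, 1, -1, -1],   # injected into the omni mode
    [1, -1, 1, -1],   # injected into the cos-dipole mode
    [1, -1, -1, 1],   # injected into the sin-dipole mode
]

def received_signal(angle_deg):
    """Idealized microphone signal: each code scaled by its mode's gain at the microphone angle."""
    a = math.radians(angle_deg)
    gains = [1.0, math.cos(a), math.sin(a)]
    return [sum(g * c[n] for g, c in zip(gains, CODES)) for n in range(len(CODES[0]))]

def mode_contributions(signal):
    """Correlate the captured signal with each known code to estimate per-mode gains."""
    return [sum(s * c for s, c in zip(signal, code)) / len(code) for code in CODES]

def estimate_angle_deg(signal):
    """Angle of the capture device as seen from the array, from the two dipole contributions."""
    _, g_cos, g_sin = mode_contributions(signal)
    return math.degrees(math.atan2(g_sin, g_cos)) % 360.0

estimate = estimate_angle_deg(received_signal(60.0))
```

Because the stand-in codes are exactly orthogonal, each correlation isolates a single mode. With nearly orthogonal pseudorandom sequences and real program content the estimates would carry some residual error.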
As explained above, an embodiment of the invention may be an article of manufacture in which a machine-readable medium (such as microelectronic memory) has stored thereon instructions which program one or more data processing components (generically referred to here as a “processor”) to perform the operations described above including the digital signal processing tasks of the audio emission device recited in operations 507, 509, 511, and 513 of
While certain embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that the invention is not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those of ordinary skill in the art. The description is thus to be regarded as illustrative instead of limiting.
This non-provisional application claims the benefit of the earlier filing date of U.S. Provisional Application No. 62/105,671 filed Jan. 20, 2015.
Number | Date | Country |
---|---|---|
62105671 | Jan 2015 | US |