Aspects disclosed herein generally relate to an apparatus, system, and/or method for noise-robust time-frequency masking-based direction of arrival estimation for loudspeaker audio calibration. These aspects and others will be discussed in more detail herein.
Various loudspeaker manufacturers or providers may bring together various loudspeaker categories to form one ecosystem. In this regard, various loudspeakers communicate or work with one another and/or with a mobile device. Therefore, such loudspeakers can achieve higher audio quality using immersive sound. Information related to the locations of the loudspeakers may be needed for immersive sound generation. Hence, auto-calibration may be needed before the loudspeakers can generate immersive sound.
In at least one embodiment, an audio system is provided that includes a loudspeaker, at least one microphone, and at least one controller. The loudspeaker transmits an audio signal into a listening environment defined by at least one wall in a room. The at least one microphone is positioned on the loudspeaker and is configured to capture a reverberated audio signal including a plurality of reverberations and a plurality of peaks. The reverberated audio signal is indicative of the audio signal being reflected from the at least one wall. The at least one controller is programmed to apply a confidence score to the plurality of peaks to obtain a maximum score which is indicative of a maximum peak of the audio signal that is reflected from the at least one wall and determine a distance between the loudspeaker and the wall based at least on the maximum score.
In at least another embodiment, a method is provided. The method includes transmitting, via a loudspeaker, an audio signal into a listening environment defined by at least one wall in a room and capturing a reverberated audio signal including a plurality of reverberations via at least one microphone, wherein the reverberated audio signal is indicative of the audio signal being reflected from the at least one wall. The method further includes applying a confidence score to the plurality of peaks to obtain a maximum score which is indicative of a maximum peak of the audio signal that is reflected from the at least one wall and determining a distance between the loudspeaker and the wall based at least on the maximum score.
In at least another embodiment, a computer-program product embodied in a non-transitory computer readable medium stored in memory that is programmed and executable by at least one controller in an audio system is provided. The computer-program product includes instructions to transmit an audio signal, via a loudspeaker, into a listening environment defined by at least one vertical surface in a room and to capture a reverberated audio signal including a plurality of reverberations, wherein the reverberated audio signal is indicative of the audio signal being reflected from the at least one vertical surface. The computer-program product further includes instructions to apply a confidence score to the plurality of peaks to obtain a maximum score which is indicative of a maximum peak of the audio signal that is reflected from the at least one vertical surface and determine a distance between the loudspeaker and the at least one vertical surface based at least on the maximum score.
The embodiments of the present disclosure are pointed out with particularity in the appended claims. However, other features of the various embodiments will become more apparent and will be best understood by referring to the following detailed description in conjunction with the accompanying drawings in which:
As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention that may be embodied in various and alternative forms. The figures are not necessarily to scale; some features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention.
Loudspeakers are used to generate immersive sound effects. One aspect of immersive sound is the need for auto-calibration to be performed to localize a position of the loudspeakers. One method for performing loudspeaker localization includes estimating an azimuth of the loudspeakers, which is also known as direction of arrival (DOA) estimation. The performance of DOA methods may be problematic for a low signal to noise ratio (SNR), i.e., below 0 dB, since noise is the dominating signal under low SNR conditions. Also, noise may not be avoidable during the auto-calibration stage in realistic scenarios. Therefore, a noise-robust DOA estimation method is needed for the auto-calibration stage. The disclosed system and/or method utilize time-frequency (TF) masking, which may be used for source separation, as a preprocessing step for the DOA estimation method to achieve high performance under low SNR. TF masking may extract a desired signal from a noisy signal that is captured by microphones. Aspects provided herein also provide a signature signal which maximizes performance under low SNR conditions. The embodiment disclosed herein provides a TF masking-based DOA estimation using at least two microphones and a signature signal design that may be played back by the loudspeaker.
As noted above, auto-calibration is generally required for immersive sound generation for loudspeakers. A failure in the auto-calibration phase can cause negative feedback from listeners. Also, background noise is not avoidable as the environment cannot be controlled in the auto-calibration stage. Hence, noise-robust auto-calibration is desirable for immersive sound generation using multiple loudspeakers. The disclosed embodiments provide noise-robust auto-calibration to provide immersive sound generation. In addition, the disclosed system generally provides an accurate DOA estimation under low signal to noise ratio and reverberation for loudspeaker auto-calibration. These aspects enable immersive sound generation and microphone array calibration. In addition, the disclosed system may accurately estimate the DOA for corner cases such as when two loudspeakers are on, for example, a same line but not aiming at one another.
One manner of performing auto-calibration of loudspeakers may involve estimating an angle of the loudspeakers, which is also known as the DOA. There are many techniques that estimate the DOA of a talker/loudspeaker, such as time difference of arrival (TDOA), multiple signal classification (MUSIC), and steered response power (SRP). While the TDOA method has not provided satisfactory performance for low signal to noise ratio (SNR), MUSIC and SRP require a high number of microphones for high performance under low SNR. Even MUSIC and SRP methods perform below the requirement for low SNR conditions (i.e., −10 dB babble noise). The disclosed system provides a signature tone in the form of an inverse exponential sine sweep (ESS) signal which has been discovered to, among other things, provide an indication to a controller to initiate loudspeaker auto-calibration in noisy environments such as −10 dB. Other types of signature tones that do not utilize an ESS-based signal may not be perceivable to the controller in these types of noise environments.
At least one of the loudspeakers 102 transmits an audio signal including a signature tone 104 into a listening environment 151 to the other loudspeakers 102 in the system 100. It is recognized that the loudspeaker 102 generally includes at least two of the microphones 106a-106b. The loudspeaker 102 may transmit an audio signal including the signature tone 104 into the listening environment 151. The microphones 106a-106b positioned on a different loudspeaker 102 capture the audio signal including the signature tone 104. Each loudspeaker 102a and 102b includes memory 130. The memory 130 of the loudspeaker 102b stores the audio signal and the corresponding signature tone (or signature frame) 104 for processing.
As noted above, the TF masking block 108, the signature frame detection block 110, the GCC PHAT block 112, and the controller 122 are implemented in all of the loudspeakers 102 that are present in the system 100. Assuming for example that the system 100 includes four loudspeakers 102, a first loudspeaker 102 receives the audio signal and corresponding signature tone 104 from the other loudspeakers 102. Thus, in this regard, each loudspeaker 102 estimates the direction of arrival (DOA) of the audio signals received from the three other loudspeakers 102. The mobile device 150 includes one or more transceivers 155 to wirelessly receive the DOA estimations from each of the loudspeakers 102 in the system 100. It is also recognized that each of the loudspeakers 102 in the system 100 may also include one or more transceivers 152 to wirelessly transmit the estimated DOA information to the mobile device 150.
In general, the TF masking block 108 in the loudspeaker 102 reduces a noise effect associated with the captured audio signal as received from the other loudspeakers 102 in the system 100. For example, the controller 122 applies the TF masking block 108 to each microphone input to reduce the noise effect. The signature frame detection block 110 estimates the signature tone 104 after the TF masking block 108 reduces the noise effect. In one example, the length of the signature tone 104 may be 200 msec. However, the loudspeaker 102 records the received audio, for example, for more than 200 msec since the loudspeaker 102 does not have knowledge of when the signature tone 104 is being played by the other loudspeaker 102. It may be assumed that the loudspeaker 102 may be in a recording mode while the other loudspeaker 102 transmits the signature tone 104. It is generally desirable to detect the signature tone 104 for a long enough duration to correctly estimate the DOA. Receipt of the signature tone 104 on the audio signal may be indicative to the receiving loudspeaker 102 that the system 100 may be in autocalibration mode. In the autocalibration mode, the loudspeakers 102 may transmit information corresponding to the location of these loudspeakers 102 relative to the mobile device 150 (or other audio source).
The controller 122 applies cross-correlation between the signature tone 104, which is played by the transmitting loudspeaker 102, and the acquired audio. The cross-correlation, performed by the GCC PHAT block 112, provides the location of the signature tone 104 in a long recording. In this regard, the controller 122 utilizes this location to extract the signature tone 104. At this point, the extracted signature tone 104 is provided to the GCC PHAT block 112. The controller 122 may then utilize the estimated DOA to perform auto-calibration of the loudspeaker 102b. These aspects will be discussed in more detail below. In reference back to the TF masking block 108, the controller 122 applies the TF masking operation as a pre-processing step for the DOA estimation. The TF masking block 108 may eliminate the most noise-dominated T-F bins in the audio signal to minimize the effects of noise and reverberation. A noisy input audio signal including the signature tone 104 is generally shown at 200 in connection with the
Referring back to
Reference to equation 1 may be found, for example, in “The Optimal Ratio Time-Frequency Mask for Speech Separation in Terms of Signal-to-Noise Ratio”, The Journal of the Acoustical Society of America 134, no. 5 (2013): EL452-EL458. S(t, f) is the frequency response of the signature signal (or the signature tone 104), N(t, f) represents a noise spectrum, and β is the smoothing factor. Since the signature tone 104 is known in advance, S(t, f) can be calculated. The denominator in equation (1) may be the captured signal at the microphones 106a-106b. After the controller 122 calculates the mask, the enhanced signal can be calculated by multiplying the captured signal with the mask as in equation (2).
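The masking and enhancement steps described above can be sketched as follows. This is a minimal illustration only, assuming a ratio-mask form in which the power of the known signature spectrum is divided by the power of the captured spectrum and raised to the power β. The function names `tf_mask` and `enhance`, the default `beta=0.5`, the clipping to [0, 1], and the small constant `eps` are assumptions made for this sketch and are not taken from the disclosure.

```python
import numpy as np

def tf_mask(S, Y, beta=0.5, eps=1e-12):
    """Ratio-style TF mask (assumed form, not the disclosure's exact equation 1).

    S: STFT of the known signature tone, shape (frames, bins).
    Y: STFT of the captured microphone signal, same shape; its power stands in
       for the signal-plus-noise power in the denominator.
    beta: smoothing exponent (0.5 is a hypothetical default).
    """
    s_pow = np.abs(S) ** 2
    y_pow = np.abs(Y) ** 2
    # Clip so that bins where the capture is weaker than the reference
    # are passed through unchanged instead of being amplified.
    return np.clip(s_pow / (y_pow + eps), 0.0, 1.0) ** beta

def enhance(Y, mask):
    """Enhanced spectrum: element-wise product of the capture and the mask."""
    return mask * Y
```

In a two-microphone arrangement, the same mask computation would be applied to each channel separately so that the enhanced signal remains a two-channel signal.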
E(t, f) represents the enhanced signal which is a two-channel signal given that the two microphones 106a and 106b of the receiving loudspeaker 102 each receive the incoming audio signal including the signature tone 104. Y(t, f) corresponds to the captured signal at the microphones 106a-106b. The enhanced signal may correspond to the signal as generally shown at 202 in
Referring to
The GCC PHAT block 112 may utilize a single-path wave propagation of sound waves from a single sound source signal s(n) that is provided by a sound source (or any one of the loudspeakers 102). The microphones 106a and 106b receive the signal s(n) as received signals x1(n) and x2(n) that are delayed and attenuated versions of the original sound signal s(n). In general, the controller 122 may determine a time delay between the received signals x1(n) and x2(n) by finding a maximum of the cross-correlation of x1(n) and x2(n). The controller 122 performs cross-correlation by executing the following equations:
The sample delay η̂ is estimated using equations 3-7 in the GCC PHAT block 112. Equation 3 represents the cross-correlation between x1(n) and x2(n). Equation 4 is the cross-power density, which is obtained by taking the product of the frequency responses of x1(n) and x2(n). Equation 5 illustrates the PHAT processor (of the GCC PHAT block 112). The inverse Fourier transform is applied to obtain the cross-correlation between x1(n) and x2(n) as shown in equation 6. Finally, the sample delay η̂ is calculated by finding a maximum of the cross-correlation of x1(n) and x2(n) in equation 7.
At that point, the controller 122 may determine the DOA of the received audio signal or the angle of the sound source 102a (or first loudspeaker 102a). For example, the controller 122 may determine the DOA (or angle information, “angle”) for the audio signal as received at the receiving loudspeaker 102 by the following:
where η̂ is the estimate of the sample delay as noted above, c is the speed of sound, and d is a distance between the microphones 106a and 106b, which is a known value. The GCC PHAT block 112 estimates a phase difference between the audio captured by the microphones 106a and 106b. Thus, the phase difference generally corresponds to θ̂ (or angle information) as set forth in equation 8. The controller 122 utilizes, among other things, an inverse cosine to convert the phase difference to an angle as set forth in equation 8. The manner in which the controller 122 determines the sample delay η̂ is shown in
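The sequence described above (cross-power spectrum, PHAT weighting, inverse Fourier transform, and peak search) can be sketched with NumPy as follows. This is a generic GCC-PHAT sketch under stated assumptions, not the exact equations of the disclosure: the function name `gcc_phat_delay`, the zero-padding length, the `max_lag` parameter, and the sign convention (a positive result means x2 lags x1) are all choices made for illustration.

```python
import numpy as np

def gcc_phat_delay(x1, x2, max_lag=None):
    """Estimate the delay of x2 relative to x1, in samples, using GCC-PHAT.

    A positive return value means x2 is a delayed copy of x1.
    """
    n = len(x1) + len(x2)                 # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)         # generalized cross-correlation
    if max_lag is None:
        max_lag = min(len(x1), len(x2)) - 1
    # Reorder so that index 0 corresponds to lag -max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag
```

In practice `max_lag` could be limited to the largest physically possible delay for the microphone spacing, which narrows the search and rejects spurious peaks.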
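A minimal sketch of the delay-to-angle conversion is given here, assuming the common far-field relation in which the cosine of the angle equals the time delay multiplied by the speed of sound and divided by the microphone spacing. The sampling rate `fs` is an assumption introduced to convert a sample delay to seconds, and the function name `doa_angle_deg`, the default `c=343.0` m/s, and the clipping step are likewise illustrative rather than taken from the disclosure.

```python
import numpy as np

def doa_angle_deg(sample_delay, fs, mic_distance, c=343.0):
    """Convert a sample delay to an angle in degrees within [0, 180].

    sample_delay: estimated delay between the two microphones, in samples.
    fs: sampling rate in Hz (assumed known).
    mic_distance: spacing d between the two microphones, in meters.
    c: speed of sound in m/s (343 m/s is an assumed room-temperature value).
    """
    cos_theta = (sample_delay / fs) * c / mic_distance
    # Noise can push the ratio slightly outside [-1, 1]; clip before arccos.
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))
```

A zero delay maps to 90 degrees (broadside), while delays at or beyond the physical maximum map to 0 or 180 degrees (along the line joining the microphones).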
As noted above, the loudspeakers 102 in the system 100 are configured to communicate with one another. Each of the first and the second loudspeakers 102a, 102b may provide high audio quality while utilizing immersive sound. The immersive sound technology depends on the locations of the first and the second loudspeakers 102a, 102b. Thus, in this regard, the immersive sound technology requires an auto loudspeaker calibration process.
There are various ways to perform auto-calibration. One way to perform auto-calibration entails providing an estimate of an azimuth of the loudspeaker, also known as the DOA. The DOA for an audio signal transmitted from each loudspeaker can be detected by playing the signature tone from one speaker at a time. The angles (or DOAs) from the different speakers are then used to create the speaker configuration in the room. In some cases, the estimate of the azimuth may be erroneous due to environmental conditions and locations of the loudspeakers. Such errors may occur primarily when the loudspeakers are not aimed at one another (e.g., due to loudspeaker directivity), and the background noise has more energy than the signature tone. Since these aspects may occur in real-world scenarios, auto-calibration technology implemented in the loudspeakers should address these scenarios. The system 100 as disclosed herein provides multiple DOA estimations for optimizing loudspeaker location and estimating the loudspeaker layout configuration for two or more loudspeakers. The system 100 also provides an accurate representation of the loudspeaker configuration which is required for a true immersive experience. The disclosed embodiments may increase robustness and overcome the above noted environmental conditions. In addition, the disclosed embodiments may provide (i) an accurate loudspeaker configuration estimation, (ii) loudspeaker orientation estimation, (iii) detection of DOA estimation outliers while taking into account background noise, reverberation, and obstruction, and (iv) optimizing the loudspeaker configuration estimation based on previous DOA estimations and outlier detection.
Referring back to
In operation 602, the microphone orientation estimation block 116 estimates an orientation for the microphones 106a and 106b. This operation will be discussed in more detail in connection with
In operation 604, the outlier detection block 118 detects outliers that may be present in the matrix formed by the matrix block 114 with respect to the DOAs. This operation will be discussed in more detail in connection with
In operation 606, the optimization block 120 performs a reference microphone selection. This operation will be discussed in more detail in connection with
In operation 608, the optimization block 120 performs an initial layout estimation using DOA estimations. This operation will be discussed in more detail in connection with
In operation 610, the optimization block 120 calculates candidate coordinate estimations. This operation will be discussed in more detail in connection with
In operation 612, the optimization block 120 selects best coordinates. This operation will be discussed in more detail in connection with
The first, second, third, and fourth loudspeakers 102a, 102b, 102c, and 102d wirelessly communicate with one another via the transceivers 152 and/or with the mobile device 150 to provide the loudspeaker layout in a listening environment 151. In particular, the mobile device 150 may provide a layout of the various loudspeakers 102a, 102b, 102c, and 102d as arranged in the listening environment 151. Generally, the particular layout of the loudspeakers 102a-102d may not be known relative to one another and aspects set forth herein may determine the particular layout of the loudspeakers 102a-102d in the listening environment 151. Once the layout of the loudspeakers 102a-102d is known, the mobile device 150 may assign channels to the loudspeakers 102a-102d in a deterministic way based on the prestored or predetermined system configurations.
The mobile device 150 may display the layout of the first, second, third, and fourth loudspeakers 102a, 102b, 102c, and 102d based on information received from such devices. In one example, the first, second, third, and fourth loudspeakers 102a, 102b, 102c, and 102d may wirelessly transmit DOA estimations, microphone orientation estimation information, outlier information, reference loudspeaker selection information, initial loudspeaker layout estimation, candidate coordinate estimation information, and best coordinate selection information as set forth in the method 600 to one another via the transceivers 152 and/or with the mobile device 150.
A legend 702 is provided that illustrates various angles of positions of the microphones 106a-106b on one loudspeaker 102 relative to microphones 106a-106b on the other loudspeakers 102a, 102b, 102c, and 102d. Reference will be made to the legend 702 in describing the various operations of the method 600 below. The first, third, and fourth loudspeakers 102a, 102c, and 102d illustrate that their respective microphones 106a-106b are arranged horizontally on such loudspeakers 102a, 102c, and 102d. The second loudspeaker 102b illustrates that the microphones 106a-106b are arranged vertically on the second loudspeaker 102b. It is recognized that prior to the loudspeaker layout being determined, the arrangement of the microphones 106a-106b is not known and that the microphones 106a-106b may be arranged in any number of configurations on the loudspeakers 102a-102d in the listening environment 151. The disclosed system 100 and method 600 are configured to determine the loudspeaker configuration layout while taking into account the different configurations of microphones 106a-106b.
Referring to the first loudspeaker 102a and further in reference to the legend 702, the first loudspeaker 102a is capturing audio (or detecting audio) from the second loudspeaker 102b at 0 degrees. The first loudspeaker 102a is capturing audio (or detecting audio) from the third loudspeaker 102c at 45 degrees. The first loudspeaker 102a is capturing audio from the fourth loudspeaker 102d at an angle of 90 degrees. The angle (or angle information) at which the remaining loudspeakers 102b-102d are receiving audio relative to the other loudspeakers 102a-102d is illustrated in
The mobile device 150 generally stores information corresponding to the angle information depicted in the first matrix 800. The first column, as shown by the dashed box illustrated in the first matrix 800, corresponds to the particular loudspeaker that is receiving audio from the loudspeakers S1-S4 as illustrated in columns 2-5, respectively. For example, in reference to the first column and second row, the second loudspeaker 102b (or S2) receives audio from the first loudspeaker 102a (or S1) (as shown in the second column) at an angle of 90 degrees, the second loudspeaker 102b receives audio from the third loudspeaker 102c (or S3) at an angle of 0 degrees, and the second loudspeaker 102b receives audio from the fourth loudspeaker 102d (or S4) at an angle of 45 degrees. In reference to the first column and the third row, the third loudspeaker 102c (or S3) receives audio from the first loudspeaker 102a (or S1) at an angle of 45 degrees, and the third loudspeaker 102c (or S3) receives audio from the fourth loudspeaker 102d (or S4) at an angle of 0 degrees. In reference to the first column and the fourth row, the fourth loudspeaker 102d (or S4) receives audio from the first loudspeaker 102a (or S1) at an angle of 90 degrees, the fourth loudspeaker 102d receives audio from the second loudspeaker 102b (or S2) at an angle of 135 degrees, and the fourth loudspeaker 102d receives audio from the third loudspeaker 102c (or S3) at an angle of 180 degrees.
Referring to
For example, the mobile device 150 may determine whether the difference in angles between the first, second, third, and fourth loudspeakers 102a, 102b, 102c, and 102d as illustrated in the first matrix 800 corresponds to one or more predetermined values (e.g., 0 or 180). In the event the difference between the angles for the first, second, third, and fourth loudspeakers 102a, 102b, 102c, and 102d corresponds to the one or more predetermined values, then the mobile device 150 may determine that the microphones 106a-106b for the two or more loudspeakers 102a, 102b, 102c, 102d are in the same orientation. In the event the difference between the angles for the first, second, third, and fourth loudspeakers 102a, 102b, 102c, and 102d does not correspond to the one or more predetermined values, then the mobile device 150 may determine that the microphones 106a-106b are not in the same orientation for the two or more loudspeakers 102a, 102b, 102c, 102d.
In reference to the first matrix 800 as illustrated in
In general, the mobile device 150 subtracts the angles of the first column from the angles of the first row to perform the microphone orientation estimation. When the subtraction operation is performed, the result is [0, 90, 0, 0] for the first loudspeaker 102a (or S1), the second loudspeaker 102b (or S2), the third loudspeaker 102c (or S3), and the fourth loudspeaker 102d (or S4). Therefore, the microphone orientation estimation for the third loudspeaker 102c (S3) and the fourth loudspeaker 102d (S4) is 0, which is the same orientation as the first loudspeaker 102a. The mobile device 150 may also perform the microphone orientation estimation with a modulo operation after the subtraction operation is performed since the angle range should be [0, 180] as identified in the legend 702 of
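The subtraction and modulo steps described above can be checked with a short NumPy sketch. The angle values are the ones stated in the description for the first row and first column of the first matrix 800. The zero placeholder for the S1-to-S1 entry and the use of a modulo of 180 are assumptions made for this illustration.

```python
import numpy as np

# Angles at which S1 receives audio from S1..S4 (first row of the matrix).
# The S1-to-S1 entry is a placeholder value of 0 (assumption).
first_row = np.array([0, 0, 45, 90])

# Angles at which S1..S4 receive audio from S1 (first column of the matrix).
first_col = np.array([0, 90, 45, 90])

# Subtract, then wrap into the legend's angle range (modulo 180 is an assumption).
orientation = (first_row - first_col) % 180

# Loudspeakers whose offset is 0 share the orientation of the reference S1.
same_as_reference = orientation == 0
```

With these inputs the offsets are 0, 90, 0, and 0 for S1 through S4, which singles out the second loudspeaker as the one whose microphones are rotated by 90 degrees relative to the others.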
Referring to
Referring to
Referring to
Referring now to
For example, the mobile device 150 may calculate the distance coordinates for the second loudspeaker 102b, the third loudspeaker 102c, and the fourth loudspeaker 102d relative to the first loudspeaker 102a based on equations 10, 11, and 12, respectively:
Equation 10 as shown above corresponds to the distance coordinates of the second loudspeaker 102b relative to the first loudspeaker 102a, where the angle of 0 is inserted into equation 10 and taken from the first row (i.e., S1) and second column (i.e., S2) from the first matrix 800. Equation 11 as shown above corresponds to the distance coordinates of the third loudspeaker 102c relative to the first loudspeaker 102a, where the angle of 45 is inserted into equation 11 and taken from the first row (i.e., S1) and third column (i.e., S3) from the first matrix 800. Equation 12 as shown above corresponds to the distance coordinates of the fourth loudspeaker 102d relative to the first loudspeaker 102a, where the angle of 90 is inserted into equation 12 and taken from the first row (i.e., S1) and fourth column (i.e., S4) from the first matrix 800.
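As a rough illustration of how an angle taken from the first row of the matrix could be turned into coordinates relative to the first loudspeaker, the sketch below applies a plain polar-to-Cartesian conversion. This is an assumption rather than a reproduction of equations 10 through 12: the function name `initial_coordinates`, the unit `distance`, and the choice of the first loudspeaker as the origin with the 0 degree direction along the x axis are all hypothetical.

```python
import numpy as np

def initial_coordinates(doa_deg, distance=1.0):
    """Place loudspeakers around a reference loudspeaker at the origin.

    doa_deg: angles (degrees) at which the reference receives each loudspeaker.
    distance: assumed common distance from the reference (hypothetical).
    Returns an array of shape (len(doa_deg), 2) holding (x, y) pairs.
    """
    theta = np.radians(np.asarray(doa_deg, dtype=float))
    return np.column_stack((distance * np.cos(theta), distance * np.sin(theta)))

# Angles for S2, S3, and S4 as seen from S1, per the first row of the matrix.
coords = initial_coordinates([0.0, 45.0, 90.0])
```

Because a two-microphone DOA only resolves angles within [0, 180], such a conversion leaves a mirror ambiguity that later candidate selection would need to resolve.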
Referring to
Referring to
where {circumflex over (θ)}iC is the angle calculated by using candidate x and y coordinates for an ith loudspeaker and C corresponds to an index for candidates. The mobile device 150 selects candidate points that minimize an error. The calibrated DOA matrix 800 is set forth above is used as DOAij in the above equation.
The mobile device 150 determines the error for the third loudspeaker and the first loudspeaker 102c and 102a, respectively based on equation 13:
Similar, the mobile device 150 determines the error for the third loudspeaker and the second loudspeaker 102c, and 102b, respectively also based on equation 10:
As exhibited above, the first, second, third and fourth loudspeakers 102a-102d generally from a series of products all of which are equipped with microphones 106a-106b mounted thereon. The microphones 106a-106b for each loudspeaker 102 provide an ability to detect the location of an audio source (e.g., the mobile device 150) with respect to any nearby wall. However, since the microphones 106a-106b may be in a linear arrangement when packaged on a corresponding loudspeaker 102, the microphones 106a-106b may lack the ability to discriminate the audio source that is in a front or rear of the loudspeaker based on using a line between the microphones 106a-106b as the line of symmetry. Detecting a wall or barrier in one of the directions may eliminate the symmetry limitation.
Also, if a loudspeaker is placed too closed to a wall or to a corner, it may not be possible to detect the loudspeaker. The disclosed system may detect if a loudspeaker is placed too close to the wall and to automatically correct for the loudspeaker being positioned to close to the wall to ensure the desired sound field is transmitted in the room (or the listening environment 151). In general, loudspeaker close to the wall can have effects of +/−3 dB at low frequencies. Also, the disclosed system and method may be used for an improved audio upmix. Aspects disclosed herein may provide, for example, a circular microphone array having six microphones capable of detecting all surrounding walls using the disclosed method. At that point, the disclosed method may determine whether a left or right wall is the surrounding wall to the microphone by comparing the proximity to the walls to each microphone. At that point, the system may perform channel assignment that may be used for upmixing that can be performed automatically. In addition, the disclosed system and method may obtain the room characteristics and estimate the distance to the wall or a reflector.
Room impulse response (RIR) generally provides an audio fingerprint of a location in an acoustic environment. There may be a variety of applications of RIR, such as wall boundary estimation, digitally reconstructing the acoustic environment for pro-audio applications, room correction, and frequency response correction for the playback system. The measurement of RIR includes exciting the room (or listening environment) may be performed by, but not limited to, clapping hands. The measurement of RIR may also include deconvolving an audio signal to obtain room characteristics. RIR may involves the reflections after exciting the room. Reverberation may refer to the audio reflections that reflect back to the audio source. The reverberations are generally not direct sound, so the reverberations arrive later to the microphone. The reverberation amplitude and the time to come back depending on the material of the surfaces and the number of the reflected area. The sound continues to reflect until the sound loses its energy due to absorption.
The first loudspeaker 102a and the second loudspeaker 102b are located a distance away from a wall 2404. In general, it is desirable to understand the distance of the first and/or the second loudspeakers 102a-102b from the wall 2404 in the listening environment 151. If one or more of the first and the second loudspeakers 102a-102b are placed too close to the wall 2404, such a condition may be difficult for the audio source 2402 to automatically correct for the location of the wall 204 relative to the loudspeakers 102a-102b to ensure the desired sound field is transmitted into the room (or the listening environment 151). In general, the first and/or the second loudspeaker 102a, respectively, if positioned too close to the wall 2404, may cause effects of +/−3 dB at low frequencies. The audio source 2402 (i.e., within the loudspeaker 102a and/or the loudspeaker 102b) may determine the location of the first and/or second loudspeakers 102a-102b relative to the wall 2404 and employ a corrective mechanism to account for the distance of the first and/or second loudspeakers 102a-102b being positioned to close to the wall 2404. The system 2400 may improve channel assignment using more than two microphones 106a by employing the corrective mechanism to account for the close proximity of the loudspeakers 102a-102b to the wall 2404. The ability to perform channel assignment (e.g., which loudspeaker is front left/front right/rear, etc.) properly enables audio upmixing. It is recognized that the audio source 2402 may include any number of controllers 2410 (hereafter “the controller 2410”) to perform the operations noted herein. 
While the audio source 2402 may determine the distance of the first and/or the second loudspeakers 102a-102b relative to the wall 2404, it is recognized the any one or more of the first loudspeaker 102a or the second loudspeaker 102b may also include at least one controller 2412 to determine the distance of the loudspeakers 102a, 102b relative to the wall 2404.
The controller 2410 may employ, for example, a predetermined measurement scheme such as RIR to provide and transmit an audio fingerprint in the listening environment 151. For example, the controller 2410 may include a driver (not shown) to transmit the audio fingerprint into the listening environment 151. The controller 2410 may also include memory to store the audio fingerprint. The system 2400 may employ a variety of applications of RIR, such as wall boundary estimation, digitally reconstructing the acoustic environment for pro-audio applications, room correction, and frequency response correction for the playback system. In one example, the audio source 2402 may excite the room (or the listening environment 151) by transmitting an audio signal and perform and the measurement of RIR may also include deconvolving an audio signal to obtain room characteristics. As noted above, RIR may involve performing measurements of a captured audio fingerprint (i.e., reflections) after exciting the listening room 151 has been excited. Reverberation may refer to the audio reflections that reflect back to the audio source 2402. The audio source 2402 maybe coupled to the microphone 106a and 106b to receive the captured reflections (or reverberations) from the listening environment 151. The reverberations as received back by the audio source 2402 are generally not direct sound, so the reverberations arrive at a time later to the microphone 106. The amplitude of the reverberation and the time for the reverberation to arrive at audio source 2402 depends on the material of the surfaces within the listening environment 151 and the number of the reflected area. The sound continues to reflect until the sound loses its energy due to absorption within the listening environment 151.
The audio source 2402 may excite the listening environment 151 by transmitting an audio signal that includes an exponential sine sweep (ESS) (or ESS signal). The ESS signal may be advantageous over an impulse response measurement method since (i) the ESS signal has better noise rejection than a maximum length sequence (MLS) method for a signal of the same length as that of the MLS, and (ii) the ESS signal may be more robust against non-linear effects given that the driver directly transmits the ESS signal.
The equation below may be provided for ESS signal:
T denotes the time duration of the sweep. Variables ω1 and ω2 correspond to a start and end frequency, respectively. Since the frequency of the ESS signal varies over time, its energy may depend on the rate of change of the instantaneous frequency, which is given below:
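The sweep and instantaneous-frequency equations are not reproduced in this text. By way of a non-limiting illustration, the sketch below assumes the commonly used exponential sweep form x(t) = sin((ω1·T / ln(ω2/ω1)) · (e^((t/T)·ln(ω2/ω1)) − 1)), whose instantaneous frequency is ω(t) = ω1·e^((t/T)·ln(ω2/ω1)). The function name, the sweep band (100 Hz to 8 kHz), the duration, and the sample rate are assumptions chosen only for the example.

```python
import numpy as np

def exponential_sine_sweep(f1, f2, duration, fs):
    """Return (t, x, w_inst) for an exponential sine sweep from f1 to f2 Hz.

    Assumed form (not taken from the text):
        x(t) = sin((w1*T/ln(w2/w1)) * (exp((t/T)*ln(w2/w1)) - 1))
        w(t) = w1 * exp((t/T)*ln(w2/w1))
    """
    w1, w2 = 2.0 * np.pi * f1, 2.0 * np.pi * f2
    n = int(round(duration * fs))
    t = np.arange(n) / fs
    rate = np.log(w2 / w1)                      # ln(w2/w1)
    phase = (w1 * duration / rate) * (np.exp(t * rate / duration) - 1.0)
    x = np.sin(phase)                           # the ESS signal
    w_inst = w1 * np.exp(t * rate / duration)   # instantaneous frequency (rad/s)
    return t, x, w_inst

# Example values are assumptions: 100 Hz to 8 kHz, 1 s, 48 kHz sample rate.
t, x, w_inst = exponential_sine_sweep(f1=100.0, f2=8000.0, duration=1.0, fs=48000)
```

Because ω(t) grows exponentially, the sweep spends the same amount of time in every octave, which is why its energy falls with frequency.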
The audio source 2402 may employ inverse filtering or deconvolution to measure the RIR after the first and/or the second loudspeakers 102a, 102b play the ESS signal 2500 in the listening environment 151. The controller 2410 then employs inverse filtering and extracts the RIR. As noted above, the audio source 2402 includes any number of microphones 2420 to record the ESS signal 2500. The audio source 2402 may then extract or measure the RIR from the recorded ESS signal 2500. Because the time-reversed energy of the ESS signal 2500 decreases at, for example, 3 dB/octave, the inverse filter has, for example, a 3 dB/octave increase in its energy spectrum to achieve a flat spectrogram. Assume h(t) is a room impulse response, r(t) is the excited room impulse response, and f(t) is the inverse filter.
f(t) can be created using post-modulation, which applies an amplitude modulation envelope of +6 dB/octave to the spectrum of the time-reversed signal. The general form of the post-modulation function is as follows:
A denotes the constant for the modulation function. For time t=0, ω(t)=ω1, and for obtaining a unity gain at time t=0:
Then, the modulation function becomes:
f(t) now has a 3 dB/octave increase with frequency after modulating the time-reversed signal with m(t).
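The post-modulation equations are not reproduced in this text. The following is a minimal sketch, assuming the modulation function has the form m(t) = A/ω(t) with ω(t) = ω1·e^((t/T)·ln(ω2/ω1)), so that unity gain at t=0 gives A = ω1 and m(t) = ω1/ω(t). The function name and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def post_modulated_inverse(f1, f2, duration, fs):
    """Build an inverse filter f(t) by time-reversing the sweep and applying m(t).

    Assumption (not confirmed by the text): m(t) = w1 / w(t), i.e. A = w1.
    """
    w1, w2 = 2.0 * np.pi * f1, 2.0 * np.pi * f2
    n = int(round(duration * fs))
    t = np.arange(n) / fs
    rate = np.log(w2 / w1)
    sweep = np.sin((w1 * duration / rate) * (np.exp(t * rate / duration) - 1.0))
    w_t = w1 * np.exp(t * rate / duration)   # w(t), equal to w1 at t = 0
    m = w1 / w_t                             # modulation envelope, m(0) = 1
    inverse = sweep[::-1] * m                # time-reversed sweep times m(t)
    return t, m, inverse

t, m, inverse = post_modulated_inverse(100.0, 8000.0, 1.0, 48000)

# Envelope gain at the sample where w(t) has doubled (one octave above w1).
octave_index = int(round(np.log(2.0) / np.log(8000.0 / 100.0) * 48000))
gain_db = 20.0 * np.log10(m[octave_index])   # about -6 dB per octave of w(t)
```

In the time-reversed sweep the high frequencies come first, so an envelope that falls by about 6 dB for every octave of ω(t) boosts high frequencies relative to low ones by about 6 dB/octave.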
In general, the audio source 2402 obtains the measured RIR by utilizing equation 13. Thus, the aspects related to equation 13 correspond to a convolution of the ESS signal and the inverse filter. The audio source 2402 may utilize the measured RIR to estimate the distance of the first and/or second loudspeakers 102a, 102b to the wall 2404. It is recognized that the audio source 2402 of a given loudspeaker 102a or 102b determines the distance for the loudspeaker 102a or 102b in which that audio source 2402 is positioned. In general, since the measured RIR comprises reverberations from multiple objects in the listening environment 151 (or room), the wall proximity estimation performed by the audio source 2402 may be complex.
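Equation 13 is not reproduced in this text. As a hedged illustration of the convolution it is described as performing, the sketch below assumes h(t) = r(t) * f(t), where r(t) is the recorded sweep and f(t) is the post-modulated inverse filter. The toy room (a direct path plus one weaker reflection 251 samples later), the sweep band, the sample rate, and the helper name `fft_convolve` are all assumptions made for the example, not details from the disclosure.

```python
import numpy as np

FS = 48000                 # assumed sample rate (Hz)
T = 0.5                    # assumed sweep duration (s)
F1, F2 = 100.0, 8000.0     # assumed sweep band (Hz)

def fft_convolve(a, b):
    """Linear convolution of two 1-D arrays via zero-padded FFTs."""
    n = len(a) + len(b) - 1
    nfft = 1 << (n - 1).bit_length()
    return np.fft.irfft(np.fft.rfft(a, nfft) * np.fft.rfft(b, nfft), nfft)[:n]

n = int(round(T * FS))
t = np.arange(n) / FS
rate = np.log(F2 / F1)
sweep = np.sin((2.0 * np.pi * F1 * T / rate) * (np.exp(t * rate / T) - 1.0))
inverse = sweep[::-1] * np.exp(-t * rate / T)   # assumed f(t): reversed sweep times w1/w(t)

# Toy room: unit direct path and a half-amplitude reflection 251 samples later.
delay = 251
room = np.zeros(delay + 1)
room[0] = 1.0
room[delay] = 0.5

recorded = fft_convolve(sweep, room)      # r(t): what the microphone would capture
rir = fft_convolve(recorded, inverse)     # h(t) = r(t) * f(t)

direct_index = int(np.argmax(np.abs(rir)))
guard = 50                                # skip ringing right after the direct peak
tail = np.abs(rir[direct_index + guard:])
reflection_index = direct_index + guard + int(np.argmax(tail))
```

In this toy case the spacing between `direct_index` and `reflection_index` recovers the simulated delay. A real room adds many more reflections, which is why peak selection needs more care than a single `argmax`.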
The audio source 2402 may overcome the issues noted above and perform wall distance estimation by (i) sampling or extracting peaks in the RIR measurement to avoid spurious peaks (or ringing), which are strong and close to the peaks 2702 to be detected and may cause erroneous estimations, and/or (ii) scoring each peak to determine a correct peak from the wall 2404. It is recognized that there are undesired peaks around the peaks 2702 due to nonlinearity and it is desirable to avoid such peaks in the RIR measurement. In general, the peaks 2702 in the RIR measurement may correspond to a direct path from the audio source 2402 to the microphone 2420 and to a path from the reflector to the microphone 2420 on the audio source 2402. A closer look at the RIR measurement shows ringing around the peaks. The audio source 2402 may extract peaks to detect impulse events. In this regard, the audio source 2402 may utilize a sliding window to extract the peak in each window. The audio source 2402 may find each peak in the window after the maximum peak in the RIR measurement is obtained and ignore the other peaks in the RIR measurement.
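The sliding-window extraction described above can be sketched as follows. This is one possible reading only: the windows are assumed to be non-overlapping, to start at the maximum (direct-path) peak, and to keep a single largest-magnitude sample each. The function name, the window length, and the sample values are hypothetical.

```python
import numpy as np

def extract_window_peaks(rir, window):
    """Keep one peak per window, starting at the maximum peak of the RIR.

    Returns (indices, values): the sample index and absolute amplitude of the
    largest sample in each consecutive window. Other samples in the window,
    such as ringing next to a strong peak, are ignored.
    """
    rir = np.asarray(rir, dtype=float)
    start = int(np.argmax(np.abs(rir)))          # maximum (direct-path) peak
    indices, values = [], []
    for lo in range(start, len(rir), window):
        segment = np.abs(rir[lo:lo + window])
        k = int(np.argmax(segment))
        indices.append(lo + k)
        values.append(float(segment[k]))
    return np.array(indices), np.array(values)

# Toy RIR: a direct peak with ringing, then three later events.
rir = np.zeros(32)
rir[2], rir[3], rir[4] = 1.0, -0.7, 0.4          # direct path plus ringing
rir[13] = 0.2
rir[20], rir[21] = 0.5, -0.3
rir[29] = 0.1

indices, values = extract_window_peaks(rir, window=8)
```

With a window of 8 samples, the ringing at samples 3 and 4 falls in the same window as the direct peak and is dropped, leaving one candidate per window for the scoring step.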
For example, the “index of estimated peak” as set forth above in equation 17 generally corresponds to the estimated peak in the RIR measurement 2800. Thus, in this regard, the detected peak 2802a as shown in
The audio source 2402 tracks an overall trend in the peaks 2802 of the RIR measurement 2800 to estimate the peaks of the reverberation of the RIR measurement 2800. For example, if the ESS signal as transmitted by the audio source 2402 does not encounter the wall 2404 or an object in the listening environment 151, then the anticipated trend of the peaks 2802 of the RIR measurement would illustrate or correspond to an overall decrease in peaks (i.e., a decreasing trend). If the ESS signal as transmitted by the audio source 2402 does encounter the wall 2404 or an object in the listening environment 151, then the anticipated trend of the peaks 2802 of the RIR measurement would illustrate a decreasing trend of peaks 2802, followed by an increasing trend in peaks, which is then followed by a decreasing trend in peaks 2802. In general, the audio source 2402 stores information corresponding to the peaks 2802 as received for the RIR measurement to determine if there is only a decreasing trend of peaks 2802 that continually decrease over time or if there is a decreasing trend of peaks 2802 followed by an increasing peak 2802a. The audio source 2402 may then establish a confidence score that is calculated by using, for example, a percentage increase that is multiplied by, for example, a value of 1.01 to the number of negative peaks 2802. The audio source 2402 may then select a predetermined number of peaks that have the highest confidence score (i.e., maximum score) or level (e.g., 20) and then locate a maximum peak among the selected peaks 2802. Such a maximum peak may correspond to the peak that exhibits the largest amplitude on the RIR measurement and may be positive after a long series of decreasing peaks. In this case, the maximum peak may be selected as the sample number (e.g., 251) which is then utilized by the audio source 2402 for insertion into equation 17 as provided above to find the distance of the loudspeaker 102a or 102b from the wall 2404.
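The wording of the confidence score admits more than one reading. The sketch below adopts one of them as an assumption: a peak that rises above its predecessor is scored as its percentage increase multiplied by 1.01 raised to the number of consecutive decreasing peaks that preceded it, so a rise after a long decay scores higher. Treating "e.g., 20" as the number of candidates kept is likewise an assumption, and the function names and peak values are hypothetical.

```python
import numpy as np

def confidence_scores(peak_values, base=1.01):
    """Score each peak that increases over the previous one.

    Assumed rule: score = percent_increase * base ** (length of the run of
    decreasing peaks immediately before the increase). Other peaks score 0.
    """
    peaks = np.asarray(peak_values, dtype=float)
    scores = np.zeros(len(peaks))
    decreasing_run = 0
    for i in range(1, len(peaks)):
        previous, current = peaks[i - 1], peaks[i]
        if current > previous:
            if previous > 0:
                percent_increase = 100.0 * (current - previous) / previous
                scores[i] = percent_increase * base ** decreasing_run
            decreasing_run = 0
        else:
            decreasing_run += 1
    return scores

def select_wall_peak(peak_values, top_k=20, base=1.01):
    """Return the position of the largest peak among the top_k scored peaks, or None."""
    peaks = np.asarray(peak_values, dtype=float)
    scores = confidence_scores(peaks, base)
    candidates = np.flatnonzero(scores > 0)
    if candidates.size == 0:
        return None                                  # only a decreasing trend
    order = np.argsort(scores[candidates])[::-1][:top_k]
    best = candidates[order]
    return int(best[np.argmax(peaks[best])])

# Toy sequence of window peaks: decay, a rise at position 4, decay, a small rise at 7.
peaks = [1.0, 0.6, 0.4, 0.3, 0.45, 0.2, 0.1, 0.12, 0.05]
wall_peak = select_wall_peak(peaks)
```

The returned value is a position in the list of window peaks. It would still need to be mapped back to a sample number in the RIR measurement before being used in a distance calculation.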
In operation 2904, the audio source 2402 receives reverberations from the listening environment 151 in response to transmitting the ESS signal. In this case, the audio source 2402 detects the peaks 2802 of the reverberations in the RIR measurement 2800 and stores information corresponding to the peaks 2802 in memory thereof. In operation 2906, the audio source 2402 performs trend tracking of the peaks 2802.
In operation 2908, the audio source 2402 assesses the stored peaks 2802 of the reverberations to determine if there is only a decreasing trend of peaks 2802 that continually decrease over time in the RIR measurement or if there is a decreasing trend of peaks 2802 followed by an increasing peak 2802a in the RIR measurement. If the audio source 2402 determines that the peaks 2802 do not increase over time, then the method 2900 moves to operation 2912 and determines that the wall distance of the first or the second loudspeaker 102a or 102b cannot be determined. In this case, the method 2900 may move back to operation 2902. If the audio source 2402 determines that there is an increasing peak 2802a in the RIR measurement, then the method 2900 moves to operation 2910.
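The branch taken in operation 2908 can be illustrated with a short sketch. It assumes the stored peaks are available as a simple list of amplitudes in time order; the function names are hypothetical, and the returned numbers merely echo the operation labels used in the text.

```python
def has_increasing_peak(peak_values):
    """True if any stored peak is larger than the peak immediately before it."""
    return any(later > earlier
               for earlier, later in zip(peak_values, peak_values[1:]))

def next_operation(peak_values):
    """Hypothetical dispatcher: 2910 if a rising peak exists, otherwise 2912."""
    return 2910 if has_increasing_peak(peak_values) else 2912

only_decay = next_operation([1.0, 0.6, 0.3, 0.1])          # no wall detected
with_wall = next_operation([1.0, 0.6, 0.3, 0.45, 0.1])     # rising peak present
```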
In operation 2910, the audio source 2402 establishes a confidence score that is calculated by using, for example, a percentage increase that is multiplied by, for example, a value of 1.01 to the number of negative peaks 2802. The audio source 2402 may then select a predetermined number of peaks that have the highest confidence score or level (e.g., 20) and then locate a maximum peak among the selected peaks 2802. Such a maximum peak may correspond to the peak 2802a that exhibits the largest amplitude on the RIR measurement and may be positive after a long series of decreasing peaks 2802. In operation 2912, the audio source 2402 applies the maximum peak to the distance equation (e.g., equation 17) and also applies the other variables as noted above in connection with equation 17 to determine the distance of the first loudspeaker 102a or the second loudspeaker 102b relative to the wall 2404.
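Equation 17 is not reproduced in this text. As a stand-in, the sketch below uses the generic round-trip relation d = c·Δn / (2·fs), where Δn is the number of samples between the direct peak and the wall peak. The speed of sound (343 m/s), the 48 kHz sample rate, the direct peak at sample 0, and the function name are assumptions; only the example sample number 251 comes from the text.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed value for air at roughly 20 degrees C

def wall_distance_m(wall_peak_index, direct_peak_index, sample_rate_hz,
                    speed_m_s=SPEED_OF_SOUND_M_S):
    """Convert the sample gap between direct and reflected peaks to a distance.

    The reflected sound travels to the wall and back, so the one-way distance
    is half of the extra path length. This is a generic relation and is not
    claimed to be the equation 17 referenced in the text.
    """
    if wall_peak_index <= direct_peak_index:
        raise ValueError("wall peak must come after the direct peak")
    round_trip_s = (wall_peak_index - direct_peak_index) / sample_rate_hz
    return speed_m_s * round_trip_s / 2.0

# Sample number 251 from the text; the other values are assumptions.
distance = wall_distance_m(wall_peak_index=251, direct_peak_index=0,
                           sample_rate_hz=48000)
```

Under these assumptions a reflection 251 samples after the direct sound corresponds to a wall a little under one metre from the loudspeaker.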
It is recognized that the controllers as disclosed herein may include various microprocessors, integrated circuits, memory devices (e.g., FLASH, random access memory (RAM), read only memory (ROM), electrically programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), or other suitable variants thereof), and software which co-act with one another to perform operation(s) disclosed herein. In addition, such controllers as disclosed utilize one or more microprocessors to execute a computer-program that is embodied in a non-transitory computer readable medium that is programmed to perform any number of the functions as disclosed. Further, the controller(s) as provided herein include a housing and the various microprocessors, integrated circuits, and memory devices (e.g., FLASH, random access memory (RAM), read only memory (ROM), electrically programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM)) positioned within the housing. The controller(s) as disclosed also include hardware-based inputs and outputs for receiving and transmitting data, respectively, from and to other hardware-based devices as discussed herein.
While exemplary embodiments are described above, it is not intended that these embodiments describe all possible forms of the invention. Rather, the words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the invention. Additionally, the features of various implementing embodiments may be combined to form further embodiments of the invention.
This application generally relates to the Attorney Docket No. P220102US (HARM0866PUS), U.S. application Ser. No. ______, filed May 31, 2023, entitled “SYSTEM AND/OR METHOD FOR LOUDSPEAKER AUTO CALIBRATION AND LOUDSPEAKER CONFIGURATION LAYOUT ESTIMATION” the disclosure of which is hereby incorporated in its entirety by reference herein. This application generally relates to the Attorney Docket No. P220104US (HARM0867PUS), U.S. application Ser. No. ______, filed May 31, 2023, entitled “APPARATUS, SYSTEM AND/OR METHOD FOR NOISE TIME-FREQUENCY MASKING BASED DIRECTION OF ARRIVAL ESTIMATION FOR LOUDSPEAKER AUDIO CALIBRATION” the disclosure of which is hereby incorporated in its entirety by reference herein.