Embodiments of the present disclosure relate generally to audio processing systems and, more specifically, to techniques for determining spatial impulse response using acoustic scrambling.
Audio systems often employ various techniques to improve audio quality and realism experienced by listeners of these audio systems. One such technique involves measuring how sound waves are affected by a particular acoustic space such as a room, concert hall, vehicle passenger compartment, or the like. Such techniques involve computing a room impulse response (RIR) that characterizes how sound waves from a source location are distorted as a result of reflection of the sound waves from surfaces in the acoustic space. The RIR is the time-domain acoustic relationship between a sound source and a receiver in a given acoustic space and indicates the intensity of sound waves received by a microphone over time. Audio systems use the RIR to improve audio quality by determining the appropriate locations for speakers, cancelling echoes or other sounds that reduce audio quality, and so on.
Audio systems can measure the RIR of an acoustic space during a system calibration phase. The RIR of an acoustic space is measured by, for example, using a speaker to generate a stimulus sound, such as a sine sweep or other frequency sweep, and using a microphone to capture resulting sound waves transmitted and reflected through the acoustic space. The sine sweep can be an exponential sine sweep (ESS), in which the generated sound wave amplitude varies according to a sine wave with progressively increasing frequency over a period of time. The frequencies generated in the sine sweep can vary from a low frequency, such as 20 Hz, to a high frequency, such as 20 kHz. This example range corresponds to the range of frequencies that can be heard by humans. The sound waves travel in numerous directions, and each sound wave strikes one or more surfaces, such as walls, furniture, people and other objects within the acoustic space. More specifically, when a sound wave traveling in a particular direction strikes an object, some portion of the sound wave is absorbed while some portion of the sound wave is reflected. The reflected portion of the sound wave travels through the acoustic space in a different direction with respect to the direction of the original sound wave. The reflected portion can strike another object, where, again, some portion of the sound wave is absorbed while some portion of the sound wave is reflected. This process continues until the acoustic energy of the sound wave strikes an object and is fully absorbed, and little or no portion of the sound wave is reflected. The RIR represents the total effect of absorption and reflection of all sound waves emanating from the speaker. A microphone can capture the reflected sound waves at a particular location in the acoustic space, and the captured sound waves can be used to determine the RIR for the particular location.
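As a non-limiting illustration of the stimulus described above, an exponential sine sweep can be synthesized as in the following minimal sketch. The function name exponential_sine_sweep, the 48 kHz sample rate, and the use of the commonly used closed-form phase for an exponential sweep are assumptions made for this illustration and are not taken from the disclosure. The 20 Hz to 20 kHz range is the example range given above.

```python
import math


def exponential_sine_sweep(f_start, f_end, duration_s, sample_rate):
    """Return samples of a sine sweep whose frequency rises exponentially.

    The instantaneous frequency is f_start * (f_end / f_start) ** (t / duration_s),
    so it equals f_start at t = 0 and f_end at t = duration_s.
    """
    n_samples = int(round(duration_s * sample_rate))
    log_ratio = math.log(f_end / f_start)
    # Phase obtained by integrating the instantaneous frequency over time.
    scale = 2.0 * math.pi * f_start * duration_s / log_ratio
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        phase = scale * (math.exp(t * log_ratio / duration_s) - 1.0)
        samples.append(math.sin(phase))
    return samples


# Hypothetical parameters: 200 ms sweep at an assumed 48 kHz sample rate.
sweep = exponential_sine_sweep(20.0, 20000.0, 0.2, 48000)
```

The sketch uses only the Python standard library, and the resulting list of samples has unit peak amplitude.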
One drawback with the above approach to generating an RIR is that the sounds transmitted by the speaker when performing the frequency sweep are audibly perceptible to humans and can be disruptive or irritating to human listeners who hear the frequency sweep. The audible frequency sweep is often perceived as a shrill sound, similar to a siren of an ambulance or other emergency vehicle, that produces discomfort in the human auditory system. The audible frequency sweep can also be disruptive or distracting to any human listeners who are near the speaker that generates the frequency sweep sound. The transmitting speaker generally must use a volume level that is sufficiently loud to enable the microphone to detect the frequency sweep sound, so reducing the volume level is not a feasible way to eliminate the audible frequency sweep.
As the foregoing illustrates, improved techniques for determining a room impulse response using an audible frequency sweep would be useful.
Various embodiments of the present disclosure set forth a computer-implemented method for generating a frequency sweep signal. The method includes generating a frequency sweep signal having a monotonically increasing frequency. The method further includes partitioning the frequency sweep signal into N input segments, each of the N input segments representing a different frequency range. The method further includes generating an encoding key having a sequence of N non-consecutive numbers, wherein each number in the sequence appears once. The method further includes generating an output signal by selecting each of the N input segments in an order based on the sequence of N non-consecutive numbers in the encoding key. The method further includes causing a speaker to produce audio tones in an audio space based on the output signal.
Other embodiments include, without limitation, a system that implements one or more aspects of the disclosed techniques, and one or more computer readable media including instructions for performing one or more aspects of the disclosed techniques.
At least one technical advantage of the disclosed techniques relative to the prior art is that, with the disclosed techniques, a room impulse response can be determined using a test audio signal that is less disturbing to human listeners than the test audio signals of prior art techniques. The test audio signal of the disclosed techniques is also more pleasant to human listeners than the test audio signals of prior art techniques. The test audio signal of the disclosed techniques can also be mixed with other sounds such as music to further reduce the disruptiveness of the test audio signal. Further, the disclosed techniques improve the distance range for which accurate wall distance estimates are obtained compared to calculating the wall distance estimates without preserving the reverberation tails. These technical advantages represent one or more technological improvements over prior art approaches.
So that the manner in which the recited features of the one or more embodiments set forth above can be understood in detail, a more particular description of the one or more embodiments, briefly summarized above, can be had by reference to certain specific embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments and are therefore not to be considered limiting of the scope of the disclosure in any manner, for the scope of the disclosure subsumes other embodiments as well.
In the following description, numerous specific details are set forth to provide a more thorough understanding of certain specific embodiments. However, it will be apparent to one of skill in the art that other embodiments can be practiced without one or more of these specific details or with additional specific details.
The processor 102 retrieves and executes programming instructions stored in the system memory 112. Similarly, the processor 102 stores and retrieves application data residing in the system memory 112. The interconnect 110 facilitates transmission, such as of programming instructions and application data, between the processor 102, I/O devices interface 106, storage 104, network interface 108, and system memory 112. The I/O devices interface 106 is configured to receive input data from user I/O devices 122. Examples of user I/O devices 122 can include one or more buttons, a keyboard, a mouse or other pointing device, and/or the like. The I/O devices interface 106 can also include an audio output unit configured to generate an electrical audio output signal, and user I/O devices 122 can further include a speaker configured to generate an acoustic output in response to the electrical audio output signal. Another example of a user I/O device 122 is a display device that generally represents any technically feasible means for generating an image for display. For example, the display device could be a liquid crystal display (LCD) display, organic light-emitting diode (OLED) display, or digital light processing (DLP) display. The display device can be a TV that includes a broadcast or cable tuner for receiving digital or analog television signals. The display device can be included in a head-mounted display (HMD) assembly such as a VR/AR headset or a heads-up display (HUD) assembly. Further, the display device can project an image onto one or more surfaces, such as walls, projection screens or a windshield of a vehicle. Additionally or alternatively, the display device can project an image directly onto the eyes of a user (e.g., via retinal projection).
The processor 102 is included to be representative of a single central processing unit (CPU), multiple CPUs, a single CPU having multiple processing cores, digital signal processors (DSPs), field-programmable gate arrays (FPGAs), graphics processing units (GPUs), tensor processing units, and/or the like. The system memory 112 is generally included to be representative of a random-access memory. The storage 104 can be a disk drive storage device. Although shown as a single unit, the storage 104 can be a combination of fixed and/or removable storage devices, such as fixed disk drives, floppy disk drives, tape drives, removable memory cards, optical storage, network attached storage (NAS), or a storage area network (SAN). The processor 102 communicates with other computing devices and systems via the network interface 108, where the network interface 108 is configured to transmit and receive data via a communications network.
The system memory 112 includes, without limitation, a sweep signal encoder module 132 and a sweep signal decoder module 134. The sweep signal encoder module 132 and the sweep signal decoder module 134, when executed by the processor 102, perform one or more operations associated with the techniques described herein. The sweep signal encoder module 132 converts a monotonically increasing frequency sweep signal, such as an ESS signal, to an output signal 152 by partitioning the frequency sweep signal into segments and rearranging the segments into a sequence of rearranged input segments 150 such that there is a discontinuity in frequency between each pair of adjacent rearranged input segments 150. In the rearranged input segments 150, each segment represents a frequency sweep that is a fraction of the duration of the frequency sweep signal 212, and there is an abrupt change in frequency between each pair of segments because of the discontinuity in frequency at the boundaries between segments. In references to the sequence of input segments 142 and other sequences of segments herein, the word “sequence” is omitted for brevity.
The sweep signal encoder module 132 generates an output signal 152 based on the rearranged input segments 150 or, if optional effects such as fade-in, fade-out, and/or inter-segment silence are to be included in the output signal 152, based on the input segments with effects 160. As an example, the output signal 152 can have the same frequencies as the rearranged input segments 150 in the same order as in the rearranged input segments 150. Alternatively, the output signal 152 can have the same frequencies as the input segments with effects 160 in the same order as in the input segments with effects 160. The sweep signal encoder module 132 provides the output signal 152 to a speaker in an acoustic space and causes the speaker to produce audio based on the output signal 152.
The audio produced by the speaker based on the output signal 152 is propagated through and reflected within the acoustic space. A microphone captures sound data based on the sound waves that occur in the audio space as a result of the audio. The sweep signal decoder module 134 generates an input signal based on the captured sound data, identifies a portion of the input signal that corresponds to a portion of the output signal 152, and partitions the input signal into a sequence of received segments 148. The sweep signal decoder module 134 can generate a decoded signal 156 based on a sequence of decoded segments that are in an order that corresponds to the sequence of input segments 142 by performing an inverse mapping using the encoding key 144. The inverse mapping can involve selecting each received segment of the received segments 148 in an order based on the encoding key 144.
The sweep signal decoder module 134 can use one or more band pass filters to remove copies of reverberation tails of segments that are not in the same order as the original signal before re-ordering the received segments 148 to form the decoded signal 156. In some embodiments, the reverberation tails of segments are reordered with the segments after removing frequencies outside the expected frequency ranges of the segments, so the reordered segments include the reverberation tails. The band pass filters convert the received segments 148 to filtered segments 154, and the sweep signal decoder module 134 can generate the decoded signal 156 based on the filtered segments 154, which are in an order that corresponds to the sequence of input segments 142, by performing an inverse mapping using the encoding key 144.
When performing operations associated with the sweep signal encoder module 132, the processor 102 stores data in and retrieves data from portions of the data store 140, such as the input segments 142, the encoding key(s) 144, the encoding parameters 146, the rearranged input segments 150, and the output signal 152. When performing operations associated with the sweep signal decoder module 134, the processor 102 stores data in and retrieves data from portions of the data store 140, such as the encoding key(s) 144, the received segments 148, the filtered segments 154, the decoded signal 156, and the spatial impulse response 158.
The order of the segments in the rearranged input segments 150 can be determined based on an encoding key 144. The encoding key 144 is a sequence of N numbers that identify segments, where N is the number of segments in the input segments 142. The encoding key 144 specifies a modified order of the input segments 142 as a sequence of rearranged segment indexes that is a permutation of an initial order, such as an order in which the segment indexes are monotonically increasing (which corresponds to the order of the segments in the input segments 142). The sweep signal encoder module 132 rearranges the input segments 142 into the modified order specified by the encoding key 144 to form the rearranged input segments 150. The encoding key 144 can include a sequence of N non-consecutive random numbers in which no number is repeated. Alternatively, the encoding key 144 can begin with the number 1 and end with the number N, in which case the sequence elements having indexes 2 through N−1 form a sequence of N−2 non-consecutive random numbers in which no number is repeated. The sequence that begins with 1 and ends with N can be less disruptive and/or more pleasant to human listeners than a sequence that begins and ends with other numbered segments.
The sweep signal encoder module 132 on the computing device 100 generates an output signal 152 based on the rearranged input segments 150 and causes a speaker to produce audio tones in an audio space based on the output signal 152. The same computing device 100 that generates the output signal 152 and causes the speaker to produce the audio tones can use a microphone 206 to capture sound data 216 based on sound waves that occur in the audio space as a result of the speaker producing the audio tones. A sweep signal decoder module 134 on the computing device 100 can then convert the sound data 216 to a decoded signal 156 using the encoding key 144, and a spatial impulse response generator 218 can convert the decoded signal 156 to a spatial impulse response 158.
The sweep signal encoder module 132 can provide the encoding key 144 to an encoding key sender module 220, which can send the encoding key 144 to one or more other computing device(s) 100, e.g., via a communications network. At the other computing device 100, an encoding key receiver module 222 can receive the encoding key 144 via the communication network and provide the encoding key 144 to a sweep signal decoder module 134 so that the other computing device 100 can decode sound data captured by a microphone 206 on the other computing device 100. The sound data captured by the microphone 206 on the other computing device can be based on sound waves that occur in an audio space as a result of audio tones produced by the speaker 204 based on the output signal 152.
The sweep signal encoder module 132 converts the frequency sweep signal 212 to an output signal 152 having a sequence of N segments and rearranges the segments to form a sequence of rearranged input segments 150 having a discontinuity in frequency between each pair of adjacent segments. The resulting output signal 152 sounds less disruptive and/or more pleasant to human listeners than the frequency sweep signal 212 because of the shorter durations of the sweeps and the relatively large changes in frequency between the sweeps in the output signal 152.
The input segments generator 310 partitions the frequency sweep signal 212 into N input segments 142. The number N can be specified by or derived from encoding parameters 146. In one example, the number N can be directly specified by the encoding parameters 146. In another example, the number N can be determined by dividing a length of the frequency sweep signal 212 by a segment length 308 that specifies a length of each segment in time units such as milliseconds (ms). The segment length 308 can be, e.g., 40 ms, and the length of the frequency sweep signal 212 can be, e.g., 200 ms.
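As a non-limiting illustration, the derivation of N from the segment length and the partitioning into N input segments can be sketched as follows. The function name partition_sweep, the representation of the signal as a list of samples, and the 48 kHz sample rate are assumptions made for this illustration. Any samples left over after the last whole segment are dropped in this sketch, which is a simplification.

```python
def partition_sweep(samples, sample_rate, segment_length_ms):
    """Split a sampled sweep into equal-length, contiguous segments.

    N is derived by dividing the signal length by the segment length.
    """
    samples_per_segment = int(round(sample_rate * segment_length_ms / 1000.0))
    if samples_per_segment <= 0:
        raise ValueError("segment length is too short for this sample rate")
    n_segments = len(samples) // samples_per_segment
    return [
        samples[k * samples_per_segment:(k + 1) * samples_per_segment]
        for k in range(n_segments)
    ]


# Placeholder samples standing in for a 200 ms sweep at an assumed 48 kHz.
placeholder_sweep = list(range(9600))
input_segments = partition_sweep(placeholder_sweep, 48000, 40)
n = len(input_segments)  # 200 ms / 40 ms gives N = 5
```

Because the sweep frequency increases monotonically, each resulting segment covers a different frequency range, with lower-indexed segments covering lower frequencies.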
The rearranged segments generator 312 permutes the N input segments 142 into a sequence of N rearranged input segments 150 having a discontinuity in frequency between each pair of adjacent rearranged input segments 150. In some embodiments, at least one pair of adjacent rearranged input segments 150 are continuous in frequency (e.g., not separated by a discontinuity), and there is a discontinuity in frequency between at least one other pair of adjacent rearranged input segments 150. In some embodiments, the encoding key 144 is a sequence of N non-consecutive numbers having values selected from the range 1 through N in which no number is repeated. In some embodiments, the encoding key 144 is a sequence of N random non-consecutive numbers having values selected from the range 1 through N in which no number is repeated. In some embodiments, the encoding key 144 is a sequence of numbers in which the first and last numbers are 1 and N, respectively, and the numbers at indexes 2 through N−1 form a sequence of non-consecutive numbers having values selected from the range 2 through N−1 in which no number is repeated. In some embodiments, the numbers at indexes 2 through N−1 are random non-consecutive numbers having values selected from the range 2 through N−1 in which no number is repeated. The sequence that begins with 1 and ends with N can be less disruptive and/or more pleasant to human listeners than a sequence that begins and ends with other numbers.
Examples of the encoding key 144 include the sequence [1 4 3 2 5], in which the portion of the sequence between the first and last elements is [4 3 2], which is a sequence of non-consecutive numbers. As used in these examples, two adjacent numbers are consecutive when the second number is one greater than the first. The numbers 4 and 3 are non-consecutive, and the numbers 3 and 2 are non-consecutive, so the sequence [4 3 2] is a sequence of non-consecutive numbers. A sequence of non-consecutive numbers cannot contain two adjacent consecutive numbers, such as [1, 2] or [2, 3]. For an encoding key 144 that starts with 1 and ends with N, this criterion applies to the portion of the sequence between the first and last elements. Other valid encoding keys 144 of length five that start with 1 and end with 5 include [1 2 4 3 5] and [1 3 2 4 5]. Because the encoding key 144 can be a sequence of random numbers that are non-consecutive, a particular encoding key 144 of length five that starts with 1 and ends with 5 can be any of [1 4 3 2 5], [1 2 4 3 5], or [1 3 2 4 5], where the particular sequence is selected randomly (e.g., each of the three possible valid sequences could have an equal probability of being selected when generating a particular encoding key 144). Sequences that are of length five, start with 1, and end with 5 that are not valid encoding keys include [1 2 3 4 5], [1 4 2 3 5], and [1 3 4 2 5]. As another example, [1 5 7 3 8 4 9 6 2 10] is a valid encoding key 144, but [1 5 7 3 4 8 9 6 2 10] is not a valid encoding key because it contains the consecutive pairs 3, 4 and 8, 9.
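As a non-limiting illustration, the validity criterion for an encoding key that starts with 1 and ends with N can be sketched as a short check. The function names has_consecutive_pair and is_valid_key are hypothetical. The sketch assumes the reading suggested by the examples above, in which two adjacent numbers count as consecutive when the second is one greater than the first, and in which the criterion is applied to the portion of the key between its first and last elements.

```python
def has_consecutive_pair(sequence):
    """Return True if any number is immediately followed by its successor."""
    return any(b == a + 1 for a, b in zip(sequence, sequence[1:]))


def is_valid_key(key):
    """Check a key that must start with 1, end with N, and use 1..N once each."""
    n = len(key)
    if sorted(key) != list(range(1, n + 1)):
        return False  # a number is missing, repeated, or out of range
    if key[0] != 1 or key[-1] != n:
        return False
    # Assumption: only the middle portion is tested for consecutive pairs.
    return not has_consecutive_pair(key[1:-1])


valid_example = is_valid_key([1, 4, 3, 2, 5])
invalid_example = is_valid_key([1, 3, 4, 2, 5])  # middle contains 3, 4
```

Under these assumptions, the check accepts [1 4 3 2 5] and rejects [1 3 4 2 5], matching the examples above.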
The rearranged segments generator 312 can convert the sequence of input segments 142 to the sequence of rearranged input segments 150 using a mapping operation that determines a rearranged order of the input segments 142. The mapping operation can map the input segments 142 to the rearranged input segments 150 by selecting each of the input segments 142 in an order based on a mapping algorithm and/or based on a mapping data structure referred to herein as an encoding key 144. The encoding key 144 can be generated by the encoding key generator 306 based on one or more random number(s) 304 using a suitable algorithm. The random number(s) 304 can be generated by a random number generator 302.
The encoding key 144 specifies the order in which the input segments 142 are selected. The selection order is specified as a sequence of segment numbers that identify segments in the input segments 142. The selection order can be a random order that conforms to ordering criteria such as being a sequence of non-consecutive segment numbers in which the first and last segments of the input segments 142 (e.g., at indexes 1 and N) are also the first and last segments of the rearranged input segments 150. For example, the encoding key 144 can be a sequence having 1 in the first element and N in the Nth element.
As an example, to generate the encoding key 144, the encoding key generator 306 initializes the encoding key 144 to an empty sequence and generates a sequence of available numbers that initially includes the numbers 2 through N−1. The encoding key generator 306 randomly selects an available number from the sequence of available numbers, adds (e.g., appends) the randomly selected available number to the encoding key 144, and removes the randomly selected available number from the sequence of available numbers. The encoding key generator 306 then identifies the available numbers, if any, that are in the sequence of available numbers and are non-consecutive with the number at the end of the encoding key 144. If there are no such available numbers, then the encoding key generator 306 randomly selects a different available number from the sequence of available numbers. Otherwise, the encoding key generator 306 adds one of the identified available numbers to the encoding key 144 and removes that available number from the sequence of available numbers. The encoding key generator 306 repeatedly performs the above operations until the encoding key 144 includes each number in the range 2 through N−1.
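As a non-limiting illustration, one way to produce a key that satisfies the same criteria is sketched below. This sketch does not implement the incremental procedure described above. Instead, it substitutes simple rejection sampling: the numbers 2 through N−1 are shuffled repeatedly until no number is immediately followed by its successor, and the result is then placed between 1 and N. The function name generate_encoding_key and the optional rng parameter are hypothetical.

```python
import random


def generate_encoding_key(n, rng=None):
    """Return a key [1, ..., n] whose middle portion has no consecutive pair.

    Rejection sampling: reshuffle the middle numbers until no number is
    immediately followed by the number one greater than it.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    rng = rng if rng is not None else random.Random()
    middle = list(range(2, n))  # the numbers 2 through n - 1
    while True:
        rng.shuffle(middle)
        if all(b != a + 1 for a, b in zip(middle, middle[1:])):
            return [1] + middle + [n]


# A seeded generator makes the example repeatable.
key = generate_encoding_key(5, random.Random(0))
```

Because every shuffle is equally likely and invalid shuffles are discarded, each valid key is selected with equal probability in this sketch. For N equal to five, the result is one of the three valid keys listed above.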
The rearranged segments generator 312 can add (e.g., append) each successive selected input segment 142 to the rearranged input segments 150 in the order in which the input segments 142 are selected. The encoding key 144 can be a sequence of non-consecutive numbers, for example. The encoding key 144 can be a random key, e.g., a random sequence of the indexes of the input segments 142 such that the indexes are non-consecutive. The encoding key 144 can conform to ordering criteria as described above, e.g., elements 1 and N of the sequence can have the values 1 and N, while elements 2 through N−1 can be in a random sequence of non-consecutive numbers having values selected from the range 2 through N−1.
The non-consecutive numbers in the encoding key 144 are referred to as “elements” of the sequence. Each element in the sequence of non-consecutive numbers has an associated index that ranges from 1 to N, where N is the number of elements in the sequence. Each of the numbers in the encoding key 144 is associated with a source index that represents the position of the number in the encoding key 144. Further, each of the numbers in the encoding key 144 identifies a destination index (e.g., position) in the sequence of rearranged input segments 150 to which a segment identified by the source index in the input segments 142 is to be mapped.
The rearranged segments generator 312 can provide the rearranged input segments 150 to the output signal generator 314, which generates an output signal 152 based on the rearranged input segments 150 and causes a speaker 204 to produce audio tones in an audio space based on the output signal 152. Alternatively, the rearranged segments generator 312 can provide the rearranged input segments 150 to a fade and silence effects generator 316, which applies fade-in, fade-out, and/or silence period effects to the rearranged input segments 150. The fade and silence effects generator 316 generates a sequence of rearranged input segments with effects 160 that includes the fade-in, fade-out, and/or silence period effects. Prior to applying the fade-in and/or fade-out effects, each input segment in the rearranged input segments 150 has a predetermined amplitude A. Further, as described above, each input segment has a segment length 308 specified in time units such as milliseconds (ms).
The fade-in effect applied by the fade and silence effects generator 316 modifies the amplitude of an initial portion of each input segment to gradually increase from an initial value, such as 0 dB, to the amplitude A over a period of time referred to herein as a “fade-in length.” The fade and silence effects generator 316 can use gain scaling to apply the fade-in effect over a time period that is specified by the fade-in length and starts at the beginning of each input segment. The fade-in length can be, for example, 25% of the segment length 308. If the segment length 308 is 40 ms, then the fade-in length is 10 ms, for example. The fade-out effect modifies the amplitude of a trailing portion of each input segment to gradually decrease from the amplitude A to the initial value (e.g., 0 dB) over a period of time referred to herein as a “fade-out length.” The fade-out length is thus the length of the trailing portion. The trailing portion ends at the end of the input segment. The fade-out length can be, for example, 25% of the segment length 308, in which case the fade-out length is 10 ms, for example.
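As a non-limiting illustration, the fade-in and fade-out effects can be sketched as gain scaling applied to the leading and trailing portions of a segment. The function name apply_fades is hypothetical, and the linear gain ramp between zero and full gain is an assumption made for this illustration, since the description above requires only a gradual change. The 25% fractions are the example values given above.

```python
def apply_fades(segment, fade_in_fraction=0.25, fade_out_fraction=0.25):
    """Scale the start and end of a segment with linear gain ramps.

    The gain rises from 0 to 1 over the fade-in length and falls from
    1 to 0 over the fade-out length; samples in between are unchanged.
    """
    n = len(segment)
    fade_in_samples = int(n * fade_in_fraction)
    fade_out_samples = int(n * fade_out_fraction)
    faded = list(segment)
    for i in range(fade_in_samples):
        faded[i] *= i / fade_in_samples
    for i in range(fade_out_samples):
        # Mirror of the fade-in: the last sample receives a gain of 0.
        faded[n - 1 - i] *= i / fade_out_samples
    return faded


# A 40 ms segment of constant amplitude at an assumed 48 kHz sample rate.
constant_segment = [1.0] * 1920
faded_segment = apply_fades(constant_segment)
```

With a 40 ms segment at the assumed sample rate, each fade spans 480 samples, which corresponds to the 10 ms fade length in the example above.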
The silence period effect applied by the fade and silence effects generator 316 inserts a period of silence of a predefined silence length between each pair of adjacent input segments in the sequence of rearranged input segments 150 to form a sequence of rearranged input segments with effects 160 in which the input segments are spaced apart by the silence length. The predefined silence length can be the same as the segment length 308, e.g., 40 ms. As an example, with reference to
The sweep signal encoder module 132 can perform additional processing to modify the rearranged input segments 150 and/or the output signal 152 prior to providing the output signal 152 to a speaker. The additional processing can include adding a period of silence between each pair of segments in the rearranged input segments 150 and/or adding fade-in and/or fade-out effects at the beginning and/or end of each of the rearranged input segments 150. The time durations of the periods of silence and/or the fade-in and fade-out effects are specified by encoding parameters 146.
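As a non-limiting illustration, adding a period of silence between each pair of adjacent segments can be sketched as follows. The function name insert_silence and the representation of silence as zero-valued samples are assumptions made for this illustration.

```python
def insert_silence(segments, silence_samples):
    """Concatenate segments, placing zero-valued samples between neighbors.

    No silence is added before the first segment or after the last one.
    """
    output = []
    for index, segment in enumerate(segments):
        if index > 0:
            output.extend([0.0] * silence_samples)
        output.extend(segment)
    return output


# Three short placeholder segments separated by two samples of silence.
spaced = insert_silence([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], 2)
```

In practice, the number of silence samples would be derived from the silence duration in the encoding parameters and the sample rate, for example 1920 samples for 40 ms at an assumed 48 kHz.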
As an example, the sweep signal encoder module 132 can map 5 input segments 142 having indexes “1, 2, 3, 4, 5” to five rearranged input segments 150 using the encoding key 144 [1, 4, 3, 2, 5], which specifies that the input segment having index 1 (“input segment 1”) is to be mapped to a rearranged input segment having index 1 (“rearranged segment 1”), input segment 2 is to be mapped to rearranged segment 4, input segment 3 is to be mapped to rearranged segment 3, input segment 4 is to be mapped to rearranged segment 2, and input segment 5 is to be mapped to rearranged segment 5. The resulting rearranged input segments 150 are thus “1, 4, 3, 2, 5”.
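As a non-limiting illustration, the mapping in this example can be sketched as follows, where the number at each position of the encoding key is treated as the destination index of the input segment at that position. The function name rearrange_segments and the string labels standing in for audio segments are hypothetical.

```python
def rearrange_segments(input_segments, key):
    """Place input segment i at the 1-based destination index key[i]."""
    rearranged = [None] * len(input_segments)
    for source_index, destination_index in enumerate(key, start=1):
        rearranged[destination_index - 1] = input_segments[source_index - 1]
    return rearranged


# Labels stand in for the audio samples of input segments 1 through 5.
labels = ["s1", "s2", "s3", "s4", "s5"]
rearranged = rearrange_segments(labels, [1, 4, 3, 2, 5])
```

With the key [1, 4, 3, 2, 5], the rearranged order in this sketch is segments 1, 4, 3, 2, 5, which agrees with the example above.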
The sweep signal encoder module 132 generates an output signal 152 based on the rearranged input segments 150. For example, the output signal 152 can have the same frequencies as the rearranged input segments 150 in the same order as in the rearranged input segments 150. The sweep signal encoder module 132 provides the output signal 152 to a speaker 204 in an acoustic space and causes the speaker 204 to produce audio tones based on the output signal 152.
Although examples are described herein with reference to signals having increasing frequencies and signal segments having successively higher frequency ranges, the techniques discussed herein can also be applied to signals having decreasing frequencies and signal segments having successively lower frequency ranges with appropriate changes.
The received segments generator 402 generates an input signal (not shown) based on the captured sound data 216. To generate the input signal, the received segments generator 402 identifies a portion of a captured signal in the sound data 216 that corresponds to a portion of the output signal 152 that was provided to the speaker 204 to cause the speaker 204 to produce audio tones. As an example, with reference to
If effects such as fade-in, fade-out, and/or silence periods between the segments are present in the captured signal, the received segments generator 402 removes the effects from the captured signal. Fade-in and fade-out effects are removed by performing a reverse of the fade effect transformation performed by the fade and silence effects generator 316 that modified the rearranged input segments 150 to include the fade-in and/or fade-out effects. For example, the fade-in and fade-out effects can be removed by undoing the gain scaling that was applied by the fade and silence effects generator 316. The reverse fade effect transformation can increase the amplitudes of the fade-in and/or fade-out portions of the input signal to the original values that the rearranged input segments 150 had prior to application of the effects by the fade and silence effects generator 316. Silence effects, which can be periods of silence, are removed by identifying the periods of silence between segments 142 in the input signal and moving the segments 142 together so that the segments 142 are adjacent to one another. A period of silence can be, e.g., a portion of a signal having a frequency that is inaudible to humans, e.g., 0 Hz or other inaudible frequency. As an example, with reference to
The sweep signal decoder module 134 converts the sequence of received segments 148 to a sequence of decoded segments (not shown) that are in the same order as the sequence of input segments 142 and generates a decoded signal 156 based on the sequence of decoded segments. The sweep signal decoder module 134 can convert the sequence of received segments 148 to the sequence of decoded segments using an inverse mapping operation, for example. The inverse mapping operation can map the received segments 148 to the sequence of decoded segments by selecting each of the received segments 148 in an order based on the encoding key 144. The sweep signal decoder module 134 can add (e.g., append) each selected received segment 148 to the sequence of decoded segments in the order in which the received segments 148 are selected. Each of the numbers in the encoding key 144 identifies a destination index (e.g., position) in the sequence of received segments 148 to which a segment identified by a source index in the input segments 142 is mapped. To perform the inverse mapping from the order of the received segments 148 to the order of the input segments 142, the sweep signal decoder module 134 can iterate through the segment numbers in the encoding key 144 and, for each segment number in the key, select the received segment 148 identified by the segment number and add (e.g., append) the selected received segment 148 to the sequence of decoded segments.
As an example, the received segments 148 can be “1, 4, 3, 2, 5” and the encoding key 144 can be [1 4 3 2 5]. As described above, the encoding key 144 [1 4 3 2 5] specifies that the input segment having index 1 (“input segment 1”) is mapped to a rearranged input segment having index 1 (“rearranged segment 1”), input segment 2 is mapped to rearranged segment 4, input segment 3 is mapped to rearranged segment 3, input segment 4 is mapped to rearranged segment 2, and input segment 5 is mapped to rearranged segment 5. The sweep signal decoder module 134 performs the inverse mapping by iterating through the segment numbers in the encoding key 144. The first segment number in the encoding key 144 (at index=1) is 1, so the sweep signal decoder module 134 selects the received segment 148 having index=1, which is the first received segment 148 in the sequence of received segments 148 (having segment number=1). The segment number “1” is added to the sequence of decoded segments.
Moving to the next segment number in the encoding key 144, the second segment number in the encoding key 144 (at index=2) is 4, so the sweep signal decoder module 134 selects the received segment 148 having index=4, which is the fourth received segment 148 in the sequence of received segments 148 (having segment number=2). The segment number “2” is added to the sequence of decoded segments. The sweep signal decoder module 134 continues by iterating through the third, fourth, and fifth segment numbers in the encoding key 144, and selecting the respective received segments 148 having indexes 3, 2, and 5, which have segment numbers 3, 4, and 5, respectively. The resulting sequence of decoded segments is “1, 2, 3, 4, 5”, which is the same order the input segments 142 had prior to being rearranged. The sweep signal decoder module 134 generates a decoded signal 156 based on the sequence of decoded segments and determines a spatial impulse response 158 using the decoded signal 156.
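The inverse mapping walked through above can be sketched in a few lines of Python. This is a minimal illustration rather than the disclosed implementation: the function name decode_segments and the string labels that stand in for audio segments are assumptions made for the example, and the key uses the 1-based segment numbers from the text.

```python
def decode_segments(received_segments, encoding_key):
    """Restore the original segment order from the received order.

    For each segment number in the key, select the received segment at
    that (1-based) index and append it to the decoded sequence.
    """
    decoded = []
    for segment_number in encoding_key:
        decoded.append(received_segments[segment_number - 1])
    return decoded


# Labels stand in for audio segments; the order matches the example above.
received = ["seg 1", "seg 4", "seg 3", "seg 2", "seg 5"]
decoded = decode_segments(received, [1, 4, 3, 2, 5])
print(decoded)
```

For the example key, the second key entry (4) selects the fourth received segment, which holds segment number 2, so the decoded sequence returns to the order 1 through 5.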
The sweep signal decoder module 134 can be located on the same computing device 100 as the sweep signal encoder module 132, in which case the sweep signal decoder module 134 can access the encoding key 144 and the value of N via shared memory or otherwise receive the encoding key 144 and/or the value of N from the sweep signal encoder module 132. Alternatively or additionally, the sweep signal decoder module 134 can be located on a different computing device than the sweep signal encoder module 132, in which case the sweep signal encoder module 132 can send the encoding key 144 and/or the value of N to the sweep signal decoder module 134 on the different computing device via network communication. As another alternative, the encoding key 144 and/or the value of N can be provided to the different computing device in the encoding parameters 146 when the audio system is configured, for example.
The sound waves that occur in the audio space as a result of the audio produced by the speaker include reverberations of the audio tones, and the reverberations continue for some time after the segments of the audio tones are produced. When the sequence of received segments 148 is decoded to form the sequence of decoded segments for the decoded signal 156, portions of the reverberation tails from other segments that are included in the received segments 148 are moved as part of the segments that are moved during the re-ordering of the received segments 148 to form the decoded signal 156. The generated reordered signal accordingly has segments that contain portions of tails from previous segments that are out of order.
An optional filtered segments generator 404 can receive the sequence of received segments 148 and use a band pass filter to remove copies of the reverberation tails of segments that are not in the same order as the original signal before re-ordering the received segments 148 to form the decoded signal 156. The filtered segments generator 404 generates a sequence of filtered segments 154. In some embodiments, the reverberation tails of segments are reordered with the segments after removing frequencies outside the expected frequency ranges of the segments, so the reordered segments include the reverberation tails. With reference to
The band pass filter(s) used by the filtered segments generator 404 convert the received segments 148 to filtered segments 154, and the sweep signal decoder module 134 can generate the decoded signal 156 based on the filtered segments 154, which are in an order that corresponds to the sequence of input segments 142, by performing an inverse mapping using the encoding key 144. A decoded signal generator 406 can receive the filtered segments 154 and generate a decoded signal 156 based on the filtered segments 154. The decoded signal generator 406 can use an encoding key 144 received from an encoding key receiver module 222 by selecting each of the N filtered segments in an order based on the sequence of the non-consecutive numbers in the encoding key. The decoded signal 156 is provided as input to a spatial impulse response generator 218, which generates a spatial impulse response 158 based on the decoded signal 156.
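The disclosure does not specify a filter design, so the following is only a sketch of the idea of keeping each received segment's expected frequency range. It assumes an ideal brick-wall filter built from a direct discrete Fourier transform, which is practical only for short segments. The function name brick_wall_band_pass, the sample rate, and the test tones are all hypothetical.

```python
import cmath
import math


def brick_wall_band_pass(segment, sample_rate, f_low, f_high):
    """Zero every DFT bin whose frequency lies outside [f_low, f_high]."""
    n = len(segment)
    # Forward DFT (O(n^2); a stand-in for an unspecified band pass filter).
    spectrum = [
        sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins k and n - k carry the same physical frequency.
        frequency = min(k, n - k) * sample_rate / n
        if not (f_low <= frequency <= f_high):
            spectrum[k] = 0
    # Inverse DFT; the result is real because the zeroing is symmetric.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


sample_rate = 64
in_band = [math.sin(2 * math.pi * 4 * t / sample_rate) for t in range(64)]
out_of_band = [math.sin(2 * math.pi * 20 * t / sample_rate) for t in range(64)]
mixed = [a + b for a, b in zip(in_band, out_of_band)]
filtered = brick_wall_band_pass(mixed, sample_rate, 2.0, 8.0)
```

In this toy case the 20 Hz tone plays the role of a reverberation tail from a segment in a different frequency range, and only the 4 Hz content survives. A practical system would more likely use a conventional filter design with a gentler roll-off.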
Segment 142D (a fourth segment) has moved from a fourth time range (between T4 and T5) to a second time range (between T2 and T3), as specified by the number “4” in the encoding key 144. Input segment 142C (a third segment) has not moved and is in the third time range (between T3 and T4 in
The fade-out effect modifies the amplitude of a trailing portion of each input segment to gradually decrease from the amplitude A to the initial value (e.g., 0 dB) over a period of time specified by a fade-out length. A fade-out effect 522 is a trailing portion of the modified input segment 542A. The fade-out effect 522 begins at time T2-f and ends at time T2, where f is the fade-out length, which is equal to the fade-in length in this example. The fade-out length can be, for example, 25% of the segment length 308. The silence period effect applied by the fade and silence effects generator 316 inserts a period of silence of a predefined silence length s between each pair of adjacent input segments 542 in the sequence of rearranged input segments 150 to form a sequence of rearranged input segments with effects 160 in which the input segments are spaced apart by a period of silence 524, and each segment has a fade-in effect 520 and a fade-out effect 522.
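As a rough sketch of the silence period effect, the Python below inserts s zero-valued samples between adjacent segments and later strips them out again. It assumes that silence is represented by zero-valued samples and that the receiver knows the segment length and silence length in advance, so removal is done by position rather than by detecting silence. The function names are hypothetical.

```python
def insert_silence(segments, silence_length):
    """Concatenate segments with silence_length zero samples between neighbors."""
    signal = []
    for index, segment in enumerate(segments):
        if index > 0:
            signal.extend([0.0] * silence_length)
        signal.extend(segment)
    return signal


def remove_silence(signal, segment_length, silence_length, count):
    """Recover `count` segments by skipping the known silence gaps."""
    stride = segment_length + silence_length
    return [
        signal[i * stride : i * stride + segment_length]
        for i in range(count)
    ]


segments = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
spaced = insert_silence(segments, 1)
restored = remove_silence(spaced, 2, 1, 3)
```

Here spaced contains one zero sample between each pair of segments, and restored equals the original list of segments.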
As can be seen in
Further, the frequencies above and below segment 502C starting at time T3 have been removed by a band pass filter shown as rectangular regions 514A and 514B. The frequencies above and below segment 502B starting at time T4 have been removed by a band pass filter shown as rectangular regions 516A and 516B. The frequencies below segment 502E starting at time T5 have been removed by a band pass filter shown as a rectangular region 518.
As shown, a method 600 begins at step 602, where a computing device 100 generates a frequency sweep signal having a monotonically increasing frequency. The frequency sweep signal can be, for example, an exponential sine sweep signal in which the frequency of the signal is monotonically increasing over time.
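For illustration, an exponential sine sweep of the kind described at step 602 can be generated with the commonly used closed-form expression in which the instantaneous frequency grows as f_start·(f_end/f_start)^(t/T). The sketch below is one possible realization, not necessarily the one used by the computing device 100. The function names, the 20 Hz to 20 kHz range, the one-second duration, and the 48 kHz sample rate are assumptions.

```python
import math


def exponential_sine_sweep(f_start, f_end, duration, sample_rate):
    """Return samples of a sine whose frequency rises exponentially."""
    rate = math.log(f_end / f_start)
    scale = 2.0 * math.pi * f_start * duration / rate
    count = int(duration * sample_rate)
    return [
        math.sin(scale * (math.exp(rate * (i / sample_rate) / duration) - 1.0))
        for i in range(count)
    ]


def instantaneous_frequency(f_start, f_end, duration, t):
    """Frequency of the sweep at time t (monotonically increasing in t)."""
    return f_start * (f_end / f_start) ** (t / duration)


sweep = exponential_sine_sweep(20.0, 20000.0, 1.0, 48000)
```

The instantaneous frequency starts at f_start, ends at f_end, and never decreases, which is the monotonic property that the later partitioning into frequency ranges relies on.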
At step 604, computing device 100 partitions the frequency sweep signal into N input segments, each of which represents a different frequency range. Each input segment has a segment length 308. For example, if the frequency sweep signal increases in frequency over time from 1 kHz to 6 kHz, and there are 5 pulses, the first pulse is 1-2 kHz, the second is 2-3 kHz, the third is 3-4 kHz, the fourth is 4-5 kHz, and the fifth is 5-6 kHz. An example frequency sweep signal partitioned into input segments 142A-142E is shown in
At step 606, computing device 100 optionally includes, in the N input segments, a period of silence having a given silence time period length subsequent to each input segment. Step 606 can apply a silence effect to the sequence of input segments 142 as described herein with reference to
At step 608, computing device 100 adds fade-in and fade-out effects having respective fade-in and fade-out time period lengths to each input segment. The fade-in and fade-out effects are applied to the input segments to prevent implosive sounds at the speaker 204 when step 616 causes the speaker to produce audio based on an output signal 152 that is based on the input segments 142. The computing device 100 can use gain scaling to apply the fade-in effect to each input segment of the frequency sweep signal over a time period that is specified by the fade-in length and starts at the beginning of each input segment. The fade-in length can be a parameter, e.g., a predetermined value such as 25% of the segment length 308. Applying the fade-in effect modifies the amplitude of an initial portion of each input segment to gradually increase from an initial value, such as 0 dB, to an amplitude A of the frequency sweep signal that is generated at step 602. The computing device 100 can also use gain scaling to apply the fade-out effect to a trailing portion of each input segment of the frequency sweep signal over a time period that is specified by the fade-out length and ends at the end of each input segment. Applying the fade-out effect gradually decreases the amplitude of the trailing portion from the amplitude A to the initial value (e.g., 0 dB) over a period of time specified by a fade-out length parameter.
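The gain scaling at step 608 might look like the following sketch. It assumes a linear gain ramp from 0 to 1 and equal fade-in and fade-out lengths expressed as a fraction of the segment length. The disclosure does not fix the ramp shape, and the function name apply_fades is hypothetical.

```python
def apply_fades(segment, fade_fraction=0.25):
    """Scale the first and last fade_fraction of a segment by a linear ramp."""
    if not 0.0 <= fade_fraction <= 0.5:
        raise ValueError("fade_fraction must be between 0 and 0.5")
    n = len(segment)
    fade_length = int(n * fade_fraction)
    faded = list(segment)
    for i in range(fade_length):
        gain = i / fade_length          # 0 at the segment edge, rising toward 1
        faded[i] *= gain                # fade-in at the start
        faded[n - 1 - i] *= gain        # mirrored fade-out at the end
    return faded


faded = apply_fades([1.0] * 8)
```

With an eight-sample segment and the default 25% fade, the first and last two samples are attenuated and the middle four are untouched. Because the outermost samples are scaled by zero in this sketch, they cannot be recovered exactly by inverting the gain, so a receiver that undoes the fade would need a ramp that stays above zero or would have to accept a small loss at the segment edges.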
At step 610, computing device 100 generates an encoding key having a sequence of N non-consecutive numbers, wherein each number in the sequence appears once. The encoding key can be generated based on random numbers, so that different encoding keys are used at different times. As an example, to generate the encoding key 144, at step 610 the computing device 100 initializes the encoding key 144 to an empty sequence and generates a sequence of available numbers that initially includes the numbers 2 through N−1. The computing device 100 randomly selects an available number from the sequence of available numbers, adds (e.g., appends) the randomly selected available number to the encoding key 144, and removes the randomly selected available number from the sequence of available numbers. The computing device 100 then identifies the available numbers, if any, that are in the sequence of available numbers and are non-consecutive with the number at the end of the encoding key. If there are no available numbers, then the computing device 100 randomly selects a different available number from the sequence of available numbers. Otherwise, the computing device 100 adds the available number to the encoding key 144 and removes the available number from the sequence of available numbers. The computing device 100 repeatedly performs the above operations until the encoding key 144 includes each number in the range 2 through N−1.
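One way to obtain a key with the stated properties is sketched below. It deliberately swaps the incremental selection described above for simple rejection sampling, and it rests on two assumptions that are not confirmed by the text: the key is a permutation of 1 through N, and "non-consecutive" means that no number is immediately followed by the next higher number, which is the reading consistent with the example key [1 4 3 2 5]. The function names are hypothetical.

```python
import random


def is_non_consecutive(key):
    """True if no number in the key is immediately followed by its successor."""
    return all(b != a + 1 for a, b in zip(key, key[1:]))


def generate_encoding_key(n, rng=None):
    """Shuffle 1..n until the result satisfies is_non_consecutive."""
    if rng is None:
        rng = random.Random()
    key = list(range(1, n + 1))
    while True:
        rng.shuffle(key)
        if is_non_consecutive(key):
            return key


key = generate_encoding_key(8, random.Random(0))
```

Passing a seeded random.Random makes the key reproducible for testing, while the default draws a different key on each call, in line with using different encoding keys at different times.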
At step 612, computing device 100 sends the key to one or more receiver devices. The key can be sent to the receiver devices via a communications network, for example. At step 614, computing device 100 generates an output signal 152 by selecting each of the N input segments in an order based on the sequence of the non-consecutive numbers in the encoding key. To generate the output signal 152, the computing device 100 can generate a rearranged sequence of the N input segments, in which each input segment has a respective second position in the rearranged sequence, and the respective second position is based on a respective number in the sequence of N non-consecutive numbers. Further, the respective number has a position in the sequence of N non-consecutive numbers that corresponds to the first position of the input segment in the N input segments, and the output signal 152 is based on the rearranged sequence. The position of the respective number in the sequence of N non-consecutive numbers is determined based on the first position of the input segment in the N input segments. At step 616, computing device 100 causes a speaker to produce audio tones in an audio space based on the output signal 152.
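The forward mapping at step 614 can be sketched as follows: the number at position i of the key gives the second position of input segment i in the rearranged sequence. The function name build_output_signal and the two-sample placeholder segments are assumptions made for the example, and fades and silence periods are left out.

```python
def build_output_signal(input_segments, encoding_key):
    """Rearrange segments per the key, then concatenate them into one signal."""
    count = len(input_segments)
    if sorted(encoding_key) != list(range(1, count + 1)):
        raise ValueError("key must contain each of 1..N exactly once")
    rearranged = [None] * count
    for first_position, second_position in enumerate(encoding_key, start=1):
        # Input segment at first_position moves to second_position (1-based).
        rearranged[second_position - 1] = input_segments[first_position - 1]
    output_signal = [sample for segment in rearranged for sample in segment]
    return rearranged, output_signal


segments = [[1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1], [5.0, 5.1]]
rearranged, output_signal = build_output_signal(segments, [1, 4, 3, 2, 5])
```

With the key [1 4 3 2 5], input segment 2 lands in position 4 and input segment 4 lands in position 2, so the rearranged order is segments 1, 4, 3, 2, 5.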
As shown, a method 700 begins at step 702, where a computing device 100 captures, using a microphone, sound data based on sound waves that occur in an audio space. The sound waves occur as a result of audio tones produced by a speaker in the audio space based on an output signal 152 such as that generated at step 616 of
At step 704, computing device 100 generates an input signal based on the sound data. For example, with reference to
At step 706, computing device 100 partitions the input signal into N received segments, each of the N received segments representing a different frequency range. N can be specified by the encoding parameters 146 and/or determined based on a segment length 308 specified by the encoding parameters 146.
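A minimal sketch of the partitioning at step 706 follows, with N derived from the signal length and a segment length measured in samples. It assumes equal-length segments and drops any trailing partial segment, which the disclosure does not specify. The function name partition is hypothetical.

```python
def partition(signal, segment_length):
    """Split a signal into N = len(signal) // segment_length equal segments."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    count = len(signal) // segment_length  # N; leftover samples are dropped
    return [
        signal[i * segment_length : (i + 1) * segment_length]
        for i in range(count)
    ]


received_segments = partition(list(range(10)), 2)
```

For a ten-sample signal and a segment length of two samples, N is 5 and each received segment holds two consecutive samples.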
At step 708, computing device 100 removes periods of silence, fade-in effects, and fade-out effects from the N received segments. If effects such as fade-in, fade-out, and/or silence periods between the segments are present in the captured signal, the received segments generator 402 removes the effects from the captured signal as described herein with respect to
At step 710, computing device 100 determines whether reverberation tail filtering is to be performed. Reverberation tail filtering is optional and is performed if, for example, a corresponding configuration option for reverberation tail filtering is enabled.
If step 710 determines that reverberation tail filtering is not to be performed, then at step 712, computing device 100 generates a decoded signal by selecting each received segment of the N received segments in an order based on an encoding key. The encoding key can be received from a sender device, for example.
If step 710 determines that reverberation tail filtering is to be performed, then at step 714, computing device 100 generates N filtered segments by filtering each respective received segment of the N received segments using a band pass filter, and at step 716, computing device 100 generates a decoded signal by selecting each filtered segment of the N filtered segments in an order based on an encoding key. The encoding key 144 specifies the order in which the input segments 142 are selected. The selection order is specified as a sequence of segment numbers that identify segments in the input segments 142.
To generate the decoded signal by selecting each received segment, computing device 100 can generate N filtered segments, where each filtered segment is generated by filtering each respective received segment of the received segments using a band pass filter. The decoded signal is generated by selecting each of the N filtered segments in an order based on the sequence of the non-consecutive numbers in the encoding key. At step 718, computing device 100 determines a spatial impulse response based on the decoded signal using a suitable technique.
In sum, a computer-based audio system generates a room impulse response by converting a given signal (e.g., a frequency sweep signal or exponential sine sweep) into a modified signal in which the segments from the given signal are in a different order than in the given signal. The modified signal is then used instead of the given signal to determine an RIR. To generate an RIR, the audio system causes a speaker in an acoustic space to produce audio tones based on the modified signal. A microphone in the acoustic space captures sound data based on sound waves that occur in a room in response to the audio tones produced by the speaker. The audio system then identifies the segments in the captured sound data and generates a reordered signal in which the segments are in the same order as in the original given signal. The RIR is then determined from the given signal and the reordered signal. The sound waves include reverberations of the audio tones, and the reverberations continue for some time after the segments of the modified signal that cause the reverberations. To remove copies of the reverberation tails of received segments that are not in the same order as the original signal, each segment is band pass filtered before being reordered into the reordered signal. In some embodiments, frequencies outside the expected frequency ranges of the received segments are removed by band pass filters and the reverberation tails are preserved during the reordering of the received segments so that the decoded signal contains the reverberation tails.
At least one technical advantage of the disclosed techniques relative to the prior art is that, with the disclosed techniques, a room impulse response can be determined using a test audio signal that is less disturbing to human listeners than the test audio signals of prior art techniques. The test audio signal of the disclosed techniques is also more pleasant to human listeners than the test audio signals of prior art techniques. The test audio signal of the disclosed techniques can also be mixed with other sounds such as music to further reduce the disruptiveness of the test audio signal. Further, removing frequencies outside the expected frequency ranges of the received segments with band pass filters, while preserving the reverberation tails during the reordering of the received segments so that the decoded signal contains the reverberation tails, substantially improves the distance range for which accurate wall distance estimates are obtained compared to calculating the wall distance estimates without preserving the reverberation tails. These technical advantages represent one or more technological improvements over prior art approaches.
1. In some embodiments, a computer-implemented method for generating a signal for measuring a spatial impulse response comprises: generating a frequency sweep signal having a monotonically increasing frequency; partitioning the frequency sweep signal into N input segments, each of the N input segments representing a different frequency range; generating an encoding key having a sequence of N non-consecutive numbers, wherein each number in the sequence appears once; generating an output signal by selecting each of the N input segments in an order based on the sequence of N non-consecutive numbers in the encoding key; and causing a speaker to produce audio tones in an audio space based on the output signal.
2. The method of clause 1, wherein each input segment in the N input segments is associated with a respective first position of the input segment in the N input segments, and wherein generating the output signal comprises: generating a rearranged sequence of the N input segments, wherein each input segment has a respective second position in the rearranged sequence, and the respective second position is based on a respective position of a number corresponding to the respective first position in the sequence of N non-consecutive numbers, wherein the output signal is based on the rearranged sequence.
3. The method of clause 1 or clause 2, wherein the output signal has a discontinuity in frequency at a boundary between a first output signal segment that corresponds to a first one of the N input segments and a second output signal segment that corresponds to a second one of the N input segments that is adjacent to the first output signal segment.
4. The method of any of clauses 1-3, wherein the output signal includes at least one segment having a lower frequency range than a frequency range of a previous segment of the output signal.
5. The method of any of clauses 1-4, wherein N is based on a length of the frequency sweep signal and a predetermined length of each input segment.
6. The method of any of clauses 1-5, wherein generating the output signal comprises: including, in the output signal, a period of silence of a given length between each pair of adjacent input segments.
7. The method of any of clauses 1-6, wherein generating the output signal comprises one or more of: converting, in each segment of the N input segments, a beginning portion of the segment to a fade-in portion having an amplitude that increases over a period of time, or converting, in each segment of the N input segments, a portion of the segment that ends at an end of the segment to a fade-out portion having an amplitude that decreases over a period of time.
8. The method of any of clauses 1-7, further comprising: capturing, using a microphone, sound data based on sound waves that occur in the audio space; generating an input signal based on the sound data; partitioning the input signal into N received segments, each of the N received segments representing a different frequency range; generating a decoded signal by selecting each received segment of the N received segments in an order based on the sequence of N non-consecutive numbers in the encoding key, the decoded signal having a monotonically increasing frequency; and determining a spatial impulse response based on the decoded signal.
9. The method of any of clauses 1-8, further comprising filtering each received segment with a band pass filter having a frequency range based on the frequency range of the received segment.
10. The method of any of clauses 1-9, further comprising removing a fade-in portion and a fade-out portion of each received segment in the N received segments.
11. One or more non-transitory computer-readable media storing program instructions that, when executed by one or more processors, cause the one or more processors to perform steps of: generating a frequency sweep signal having a monotonically increasing frequency; partitioning the frequency sweep signal into N input segments, each of the N input segments representing a different frequency range; generating an encoding key having a sequence of N non-consecutive numbers, wherein each number in the sequence appears once; generating an output signal by selecting each of the N input segments in an order based on the sequence of N non-consecutive numbers in the encoding key; and causing a speaker to produce audio tones in an audio space based on the output signal.
12. The one or more non-transitory computer-readable media of clause 11, wherein each input segment in the N input segments is associated with a respective first position of the input segment in the N input segments, and wherein generating the output signal comprises: generating a rearranged sequence of the N input segments, wherein each input segment has a respective second position in the rearranged sequence, and the respective second position is based on a respective position of a number corresponding to the respective first position in the sequence of N non-consecutive numbers, wherein the output signal is based on the rearranged sequence.
13. The one or more non-transitory computer-readable media of clause 11 or clause 12, wherein the output signal has a discontinuity in frequency at a boundary between a first output signal segment that corresponds to a first one of the N input segments and a second output signal segment that corresponds to a second one of the N input segments that is adjacent to the first output signal segment.
14. The one or more non-transitory computer-readable media of any of clauses 11-13, wherein the encoding key is further based on at least one random value.
15. The one or more non-transitory computer-readable media of any of clauses 11-14, the steps further comprising sending the encoding key and one or more input segment lengths to one or more receiver devices, wherein each input segment length indicates a length of an input segment in the N input segments.
16. A system, comprising: one or more memories storing instructions; and one or more processors coupled to the one or more memories and, when executing the instructions: generate a frequency sweep signal having a monotonically increasing frequency; partition the frequency sweep signal into N input segments, each of the N input segments representing a different frequency range; generate an encoding key having a sequence of N non-consecutive numbers, wherein each number in the sequence appears once; generate an output signal by selecting each of the N input segments in an order based on the sequence of N non-consecutive numbers in the encoding key; and cause a speaker to produce audio tones in an audio space based on the output signal.
17. The system of clause 16, wherein each input segment in the N input segments is associated with a respective first position of the input segment in the N input segments, and wherein generating the output signal comprises: generating a rearranged sequence of the N input segments, wherein each input segment has a respective second position in the rearranged sequence, and the respective second position is based on a respective position of a number corresponding to the respective first position in the sequence of N non-consecutive numbers, wherein the output signal is based on the rearranged sequence.
18. The system of clause 16 or clause 17, wherein the output signal has a discontinuity in frequency at a boundary between a first output signal segment that corresponds to a first one of the N input segments and a second output signal segment that corresponds to a second one of the N input segments that is adjacent to the first output signal segment.
19. The system of any of clauses 16-18, wherein the output signal includes at least one segment having a lower frequency range than a frequency range of a previous segment of the output signal.
20. The system of any of clauses 16-19, wherein N is based on a length of the frequency sweep signal and a predetermined length of each input segment.
Any and all combinations of any of the claim elements recited in any of the claims and/or any elements described in this application, in any fashion, fall within the contemplated scope of the present disclosure and protection.
The descriptions of the various embodiments have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.
Aspects of the present embodiments can be embodied as a system, method, or computer program product. Accordingly, aspects of the present disclosure can take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that can all generally be referred to herein as a “module” or “system.” Furthermore, aspects of the present disclosure can take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) can be utilized. The computer readable medium can be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium can be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such processors can be, without limitation, general purpose processors, special-purpose processors, application-specific processors, or field-programmable gate arrays.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams can represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block can occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks can sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
While the preceding is directed to embodiments of the present disclosure, other and further embodiments of the disclosure can be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.