The present disclosure is generally related to sound event classification and more particularly to detecting wind noise in spatial audio.
Advances in technology have resulted in smaller and more powerful computing devices. For example, there currently exist a variety of portable personal computing devices, including wireless telephones such as mobile and smart phones, tablets, and laptop computers that are small, lightweight, and easily carried by users. These devices can communicate voice and data packets over wireless networks. Further, many such devices incorporate additional functionality such as a digital still camera, a digital video camera, a digital recorder, audio recording, audio and/or video conferencing, and an audio file player. Also, such devices can process executable instructions, including software applications, such as a web browser application, that can be used to access the Internet. As such, these devices can include significant computing capabilities, including, for example, audio signal processing. For such devices, wind noise can be problematic for audio captured outdoors.
In a particular aspect, a device includes one or more processors configured to obtain audio signals representing sound captured by at least three microphones and determine spatial audio data based on the audio signals. The one or more processors are further configured to determine a metric indicative of wind noise in the audio signals. The metric is based on a comparison of a first value and a second value, where the first value corresponds to an aggregate signal based on the spatial audio data and the second value corresponds to a differential signal based on the spatial audio data.
In a particular aspect, a method includes obtaining audio signals representing sound captured by at least three microphones and determining spatial audio data based on the audio signals. The method also includes determining a metric indicative of wind noise in the audio signals. The metric is based on a comparison of a first value and a second value, where the first value corresponds to an aggregate signal based on the spatial audio data and the second value corresponds to a differential signal based on the spatial audio data.
In a particular aspect, a device includes means for determining spatial audio data based on audio signals representing sound captured by at least three microphones. The device further includes means for determining a metric indicative of wind noise in the audio signals. The metric is based on a comparison of a first value and a second value, where the first value corresponds to an aggregate signal based on the spatial audio data and the second value corresponds to a differential signal based on the spatial audio data.
In a particular aspect, a non-transitory computer-readable storage medium stores instructions that are executable by one or more processors to cause the one or more processors to determine spatial audio data based on audio signals representing sound captured by at least three microphones. The instructions further cause the one or more processors to determine a metric indicative of wind noise in the audio signals. The metric is based on a comparison of a first value and a second value, where the first value corresponds to an aggregate signal based on the spatial audio data and the second value corresponds to a differential signal based on the spatial audio data.
Other aspects, advantages, and features of the present disclosure will become apparent after review of the entire application, including the following sections: Brief Description of the Drawings, Detailed Description, and the Claims.
Wind noise can be problematic for audio captured outdoors. Aspects disclosed herein enable detection of wind noise and reduction of wind noise in audio data, such as spatial audio data. In some aspects, wind noise is detected based on analysis of the spatial audio data. In some aspects, detected wind noise is mitigated or reduced by processing the spatial audio data. For example, particular channels of the spatial audio data may be de-emphasized. As another example, low-frequency components of the spatial audio data may be filtered out without degrading the audio and spatial quality of the capture.
In a particular aspect, a wind noise metric is determined based on a comparison of two values including a first value corresponding to an aggregate signal based on the spatial audio data and a second value corresponding to a differential signal based on the spatial audio data. In some implementations, the spatial audio data includes ambisonics data. For example, when the ambisonics data includes first order ambisonics, the ambisonics data may be encoded in a W-channel (including omnidirectional sound information), an X-channel (including differential sound information representing a front/back sound), a Y-channel (including differential sound information representing a left/right sound), and a Z-channel (including differential sound information representing an up/down sound). In this example, the aggregate signal corresponds to the omnidirectional sound information (e.g., the W-channel), and the differential signal corresponds to one of the directional channels (e.g., the X-channel, the Y-channel, or the Z-channel).
In some implementations, the spatial audio data includes two or more beamformed audio channels corresponding to beams offset by at least a threshold angle (e.g., 90 to 180 degrees). In such implementations, the aggregate signal corresponds to a sum based on two beams, and the differential signal corresponds to a difference based on the two beams.
A value of the metric indicates presence of wind noise and, when present, the extent of the wind noise. In some implementations, values of the metric in particular frequencies or frequency bands can be used to determine response actions used to reduce the wind noise. For example, band-specific values of the metric may be used to determine band-specific filter parameters used to reduce the wind noise. As another example, when a frequency-specific value of the metric exceeds a threshold, gain applied to one or more channels of audio data may be reduced to limit the wind noise.
Particular aspects of the present disclosure are described below with reference to the drawings. In the description, common features are designated by common reference numbers. As used herein, various terminology is used for the purpose of describing particular implementations only and is not intended to be limiting of implementations. For example, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Further, some features described herein are singular in some implementations and plural in other implementations. To illustrate,
The terms “comprise,” “comprises,” and “comprising” are used herein interchangeably with “include,” “includes,” or “including.” Additionally, the term “wherein” is used interchangeably with “where.” As used herein, “exemplary” indicates an example, an implementation, and/or an aspect, and should not be construed as limiting or as indicating a preference or a preferred implementation. As used herein, an ordinal term (e.g., “first,” “second,” “third,” etc.) used to modify an element, such as a structure, a component, an operation, etc., does not by itself indicate any priority or order of the element with respect to another element, but rather merely distinguishes the element from another element having a same name (but for use of the ordinal term). As used herein, the term “set” refers to one or more of a particular element, and the term “plurality” refers to multiple (e.g., two or more) of a particular element.
As used herein, “coupled” may include “communicatively coupled,” “electrically coupled,” or “physically coupled,” and may also (or alternatively) include any combinations thereof. Two devices (or components) may be coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) directly or indirectly via one or more other devices, components, wires, buses, networks (e.g., a wired network, a wireless network, or a combination thereof), etc. Two devices (or components) that are electrically coupled may be included in the same device or in different devices and may be connected via electronics, one or more connectors, or inductive coupling, as illustrative, non-limiting examples. In some implementations, two devices (or components) that are communicatively coupled, such as in electrical communication, may send and receive electrical signals (digital signals or analog signals) directly or indirectly, such as via one or more wires, buses, networks, etc. As used herein, “directly coupled” refers to two devices that are coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) without intervening components.
In the present disclosure, terms such as “determining,” “calculating,” “estimating,” “shifting,” “adjusting,” etc. may be used to describe how one or more operations are performed. It should be noted that such terms are not to be construed as limiting and other techniques may be utilized to perform similar operations. Additionally, as referred to herein, “generating,” “calculating,” “estimating,” “using,” “selecting,” “accessing,” and “determining” may be used interchangeably. For example, “generating,” “calculating,” “estimating,” or “determining” a parameter (or a signal) may refer to actively generating, estimating, calculating, or determining the parameter (or the signal) or may refer to using, selecting, or accessing the parameter (or signal) that is already generated, such as by another component or device.
In the example illustrated in
In
It is noted that while ambisonics coefficients of the first example 150 and the axes of the second example 160 each use X-, Y-, and Z-labels, the labels are the same due to labeling conventions and do not necessarily mean the same thing in the first example 150 and the second example 160. For example, as noted above, in B-format notation for first order ambisonics, the X-coefficient represents a difference between sound in front of the observer and sound behind the observer; whereas, in Cartesian coordinate notation, the X-axis merely indicates a direction and is observer independent. Accordingly, the X-, Y-, and Z-labels of the first and second examples 150, 160 are distinct and should not be confused.
In
When the spatial audio data 112 includes the two or more audio beams 164, 166, the aggregate signal may be determined as a sum of two audio beams, and the differential signal may be determined as a difference of the two audio beams. The two audio beams used to generate the aggregate signal and the differential signal are angularly offset from one another, such as by 90 degrees to 180 degrees. As a specific example of the second aspect, when spatial audio data 112 includes the two audio beams 164, 166, a value of the metric may be determined as a ratio of a sum of values of the two audio beams 164, 166 to a difference of the values of the two audio beams 164, 166.
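As an illustrative, non-limiting sketch, the beam-based metric may be computed per frequency bin as shown below. The sketch assumes that the aggregate and differential signals are formed first (as a sum and a difference of the complex beam values) and that their powers are then compared; the function name `beam_metric` and the small guard value `eps` are hypothetical and are not part of the disclosure.

```python
def beam_metric(beam_a, beam_b, eps=1e-12):
    """Per-bin ratio of aggregate (sum) power to differential (difference) power.

    beam_a, beam_b: sequences of complex frequency-bin values for two beams
    that are angularly offset from one another (one time frame).
    eps: hypothetical guard against division by zero (an assumption).
    """
    metrics = []
    for a, b in zip(beam_a, beam_b):
        aggregate = a + b      # aggregate signal: sum of the two beams
        differential = a - b   # differential signal: difference of the two beams
        p_sum = abs(aggregate) ** 2
        p_diff = abs(differential) ** 2
        metrics.append(p_sum / (p_diff + eps))
    return metrics
```

In this sketch, a bin in which the two beams differ strongly yields a small ratio, and a bin in which the two beams are similar yields a large ratio.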
In a particular aspect, the spatial-audio wind noise reduction processor 114 uses one or more values of the metric to configure filter parameters to remove at least a portion of the wind noise to generate reduced-wind-noise audio data 116. Additionally, or in the alternative, in some implementations, the spatial-audio wind noise reduction processor 114 detects wind noise by comparing values of the metric to one or more wind detection thresholds. In some such implementations, gain applied to one or more channels of the spatial audio data 112 is reduced when significant wind noise, represented by particular values of the metric, is detected.
In the example of
In some implementations, one or more of the components or operations illustrated in
In the example illustrated in
In
At metric calculation block 206, at least two channels of the frequency-domain spatial audio data 204 are used to calculate frequency-specific values of the metric (“frequency specific metric values” 210 in
$$P_t(f) = \alpha \cdot S(f) \cdot \operatorname{conj}(S(f)) + (1 - \alpha) \cdot P_{t-1}(f) \qquad \text{(Equation 1)}$$
where Pt(f) is signal power at time t and frequency f, α is a smoothing factor, S(f) is the complex power at frequency f and Pt−1(f) is signal power of the frequency at the prior time t−1. For a particular frequency and time sample, a frequency-specific metric value 210 is determined as a ratio of a power of the W-channel at the particular frequency and time sample to a power of one of the differential channels (e.g., the Y-channel, the X-channel, or the Z-channel) at the particular frequency and time sample. For example, when ambisonics coefficients are used to represent the spatial audio data 112, each frequency-specific value of the metric may represent an omnidirectional (e.g., W-channel) signal power at a particular frequency divided by differential (e.g., Y-channel) signal power at the particular frequency. In a particular aspect, the frequency-specific metric values 210 are determined for each frequency that is less than a threshold frequency 208. In this example, the metric indicates power for wind noise reduction, which corresponds to a gain that would be applied at the frequency to remove wind noise. Thus, in this example, higher values of the metric indicate that less of the signal is due to wind noise, and a lower value of the metric indicates that more of the signal is due to wind noise.
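As an illustrative, non-limiting sketch, the recursive power estimate of Equation 1 and the resulting frequency-specific ratio may be computed as follows. The function names, the example smoothing factor of 0.5, and the guard value `eps` are hypothetical choices made for illustration only.

```python
def smooth_power(spectrum, prev_power, alpha=0.5):
    """Equation 1 per bin: alpha * S(f) * conj(S(f)) + (1 - alpha) * P_{t-1}(f).

    spectrum: complex frequency-bin values S(f) for the current time sample.
    prev_power: smoothed power values from the prior time sample.
    alpha: smoothing factor (0.5 is a hypothetical example value).
    """
    return [
        alpha * (s * s.conjugate()).real + (1.0 - alpha) * p
        for s, p in zip(spectrum, prev_power)
    ]


def frequency_metric(p_omni, p_diff, eps=1e-12):
    """Ratio of omnidirectional (e.g., W-channel) power to differential
    (e.g., Y-channel) power for each bin. eps is a hypothetical guard."""
    return [pw / (pd + eps) for pw, pd in zip(p_omni, p_diff)]
```

For example, a bin whose smoothed W-channel power is 1.0 and whose smoothed Y-channel power is 4.0 yields a metric value of approximately 0.25, which, under the convention described above, indicates that more of the signal in that bin is due to wind noise.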
In a particular aspect, the frequency-specific metric values 210 are compared to one or more wind detection thresholds 214 at a conditional gain reduction block 212. In this aspect, a gain 216 applied to one or more channels of the audio data may be adjusted to reduce wind noise responsive to any of the frequency-specific metric values 210 satisfying (e.g., being less than or equal to) the wind detection threshold(s) 214. The wind detection threshold(s) 214 is a static or tunable value between 0 and 1.
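As an illustrative, non-limiting sketch, the conditional gain reduction may be expressed as follows. The threshold of 0.5 and the reduction factor of 0.5 are hypothetical values chosen only for illustration; the disclosure states only that the threshold is between 0 and 1.

```python
def conditional_gain(metric_values, gain, threshold=0.5, reduction_factor=0.5):
    """Reduce the gain when any frequency-specific metric value satisfies
    (is less than or equal to) the wind detection threshold.

    threshold and reduction_factor are hypothetical example values.
    """
    if any(m <= threshold for m in metric_values):
        return gain * reduction_factor
    return gain
```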
In the example illustrated in
In a particular aspect, the frequency-specific metric values 210 are used to calculate band-specific metric values 238 at a band-specific metric calculation block 230. For example, the frequency-specific metric values 210 are grouped by frequency bands 232 and a weighted sum is used to calculate a band-specific metric value for each frequency band 232. In a particular implementation, the frequency bands 232 have a bandwidth of 500 Hertz (Hz). In other implementations, the frequency bands 232 are larger (e.g., 1000 Hz) or smaller (e.g., 250 Hz). In still other implementations, different frequency bands 232 may have different bandwidths.
In a particular implementation, a band-specific metric value 238 for a particular frequency band may be calculated using Equation 2:
$$\mathrm{Metric}_{band} = \sum_{f = f_{lower}}^{f_{upper}} \mathrm{Metric}(f)^{\mathrm{wr\_parameter}} \qquad \text{(Equation 2)}$$
where Metricband is the band-specific metric value 238 for the frequency band between an upper frequency value (f_upper) and a lower frequency value (f_lower), Metric(f) is a frequency-specific value of the metric within the frequency band, and wr_parameter is a value of a wind-reduction parameter 234. The wind-reduction parameter 234 is a preconfigured or tunable value that affects how aggressively the device 200 reduces the wind noise, especially in lower frequency bands. For example, larger values of the wind-reduction parameter 234 result in more reduction in low frequency wind noise and smaller values of the wind-reduction parameter 234 result in less reduction in low frequency wind noise. As one example, a default value of 0.5 may be used for the wind-reduction parameter 234; however, the value of the wind-reduction parameter 234 may be tunable over a range of values, such as from 0.1 to 4 in a particular non-limiting example.
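As an illustrative, non-limiting sketch, a band-specific value may be accumulated from the frequency-specific values as follows. The sketch assumes that the wind-reduction parameter is applied as an exponent to each frequency-specific value, which is one reading of Equation 2 that is consistent with larger parameter values producing more reduction when the metric values lie between 0 and 1. The function name and argument layout are hypothetical.

```python
def band_metric(freq_metrics, freqs, f_lower, f_upper, wr_parameter=0.5):
    """Sum Metric(f) ** wr_parameter over bins with f_lower <= f <= f_upper.

    freq_metrics: frequency-specific metric values.
    freqs: center frequency (Hz) of each bin, same length as freq_metrics.
    wr_parameter: wind-reduction parameter, assumed here to be an exponent.
    """
    return sum(
        m ** wr_parameter
        for m, f in zip(freq_metrics, freqs)
        if f_lower <= f <= f_upper
    )
```

Under this assumption, raising the wind-reduction parameter from 0.5 to 2 lowers the contribution of a bin with a metric value of 0.25 from 0.5 to 0.0625, while a bin with a metric value of 1 is unaffected.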
In a particular aspect, the band-specific metric calculation block 230 may modify one or more of the frequency-specific metric values 210 before determining the band-specific metric values 238. For example, the band-specific metric calculation block 230 may compare each of the frequency-specific metric values 210 to an acceptance criterion 236. In this example, if a particular frequency-specific metric value 210 satisfies the acceptance criterion 236, the particular frequency-specific metric value 210 is determined to not represent wind noise. In this situation, the particular frequency-specific metric value 210 may be assigned a value of 1 to indicate that no wind noise is present. The acceptance criterion 236 is a pre-set or tunable value between 0 and 1. In a particular non-limiting example, the acceptance criterion 236 is between 0.6 and 0.9, and the acceptance criterion 236 is satisfied when a particular frequency-specific metric value 210 is greater than or equal to the acceptance criterion 236. To illustrate, if the acceptance criterion 236 has a value of 0.8, and the value of a particular frequency-specific metric value 210 is 0.82, that frequency-specific metric value 210 is assigned a value of 1 for purposes of determining the band-specific metric values 238.
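As an illustrative, non-limiting sketch, the acceptance test may be applied to the frequency-specific values before band aggregation as follows. The function name is hypothetical, and the default of 0.8 is taken from the illustration above.

```python
def apply_acceptance(freq_metrics, acceptance=0.8):
    """Replace each value that is greater than or equal to the acceptance
    criterion with 1.0 (no wind noise); leave other values unchanged."""
    return [1.0 if m >= acceptance else m for m in freq_metrics]
```

With an acceptance criterion of 0.8, the values 0.82 and 0.8 are each replaced with 1, and a value of 0.5 is left unchanged.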
The band-specific metric values 238 are shaped at the power shaping block 240. The shaping prevents a gain-adjusted power of a higher frequency band of the set of frequency bands from exceeding a gain-adjusted energy of a lower frequency band of the set of frequency bands. For example, the power shaping block 240 may use logic such as:
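If $\mathrm{Metric}_{band}(\mathrm{Band}_k) \cdot E(\mathrm{Band}_k, W) < \mathrm{Metric}_{band}(\mathrm{Band}_{k+1}) \cdot E(\mathrm{Band}_{k+1}, W)$,
then $\mathrm{Metric}_{band}(\mathrm{Band}_k) = \mathrm{Metric}_{band}(\mathrm{Band}_{k+1}) \cdot E(\mathrm{Band}_{k+1}, W) / E(\mathrm{Band}_k, W)$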
where Band_k indicates a particular frequency band, Band_{k+1} indicates the next higher frequency band, E(Band_k, W) is the energy of the kth frequency band in the W-channel, and E(Band_{k+1}, W) is the energy of the (k+1)th frequency band in the W-channel, where the energy of each band in the W-channel is determined based on the frequency-domain spatial audio data 204.
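As an illustrative, non-limiting sketch, the shaping logic may be applied across a set of bands as follows. The order of traversal (from the highest band down to the lowest, so that an adjusted value propagates to lower bands) and the skipping of bands with zero energy are assumptions made for illustration; the function name is hypothetical.

```python
def shape_band_metrics(band_metrics, band_energy):
    """Raise a lower band's metric when its gain-adjusted energy would fall
    below that of the next higher band.

    band_metrics: band-specific metric values, ordered low to high frequency.
    band_energy: aggregate-channel (e.g., W-channel) energy of each band.
    """
    shaped = list(band_metrics)
    # Assumed traversal order: highest band pair first, moving downward.
    for k in range(len(shaped) - 2, -1, -1):
        lower = shaped[k] * band_energy[k]
        upper = shaped[k + 1] * band_energy[k + 1]
        if lower < upper and band_energy[k] > 0:
            shaped[k] = upper / band_energy[k]
    return shaped
```

For example, with band metric values of 0.1, 0.5, and 1.0 and band energies of 10, 4, and 1, the lowest band has a gain-adjusted energy of 1.0, which is less than the 2.0 of the next band, so its metric value is raised to 0.2.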
The power shaped band-specific metric values 238 are used as filter parameters 242 for a filter bank 244. The filter bank 244 modifies the frequency-domain spatial audio data 204 to generate filtered frequency-domain spatial audio data 246. For example, the filter bank 244 may determine the frequency-domain spatial audio data 246 for each frequency and channel using Equation 3:
$$\mathrm{Output}(f) = S(f) \cdot \sum_{n=1}^{N} \mathrm{Metric}(\mathrm{Band}_n) \cdot H_n(f) \qquad \text{(Equation 3)}$$
where Output(f) is the frequency-domain spatial audio data 246 for a particular frequency (f) and channel, S(f) is the frequency-domain spatial audio data 204 for the particular frequency (f) and channel, Bandn is the particular band of the frequency bands 232 in which the particular frequency (f) falls, Metric(Bandn) is the power shaped band specific metric for Bandn of the particular channel, and H_n(f) is a transfer function for the particular frequency (f) and channel.
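As an illustrative, non-limiting sketch, Equation 3 may be evaluated for one channel as follows. The representation of each transfer function H_n(f) as a list of per-bin responses, and the use of simple 0/1 (rectangular) responses in the usage example, are assumptions made for illustration; the function name is hypothetical.

```python
def apply_filter_bank(spectrum, band_metrics, transfer_functions):
    """Equation 3 per bin: Output(f) = S(f) * sum_n Metric(Band_n) * H_n(f).

    spectrum: complex frequency-bin values S(f) for one channel.
    band_metrics: power shaped band-specific metric values, one per band.
    transfer_functions: transfer_functions[n][i] is H_n at bin i (assumed layout).
    """
    output = []
    for i, s in enumerate(spectrum):
        gain = sum(m * h[i] for m, h in zip(band_metrics, transfer_functions))
        output.append(s * gain)
    return output


# Usage: two bands with rectangular responses over three bins.
filtered = apply_filter_bank(
    [1 + 1j, 2 + 0j, 0 + 3j],
    [0.5, 1.0],
    [[1, 1, 0], [0, 0, 1]],
)
```

In this usage example, the two bins of the lower band are scaled by 0.5 and the bin of the upper band passes unchanged.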
In
In some implementations, the reduced-wind-noise audio data 116 is provided to other components, such as the spatial audio converter 118 of
In
At metric calculation block 306, at least two channels of the frequency-domain spatial audio data 304 are used to calculate frequency-specific values of the metric (“frequency specific metric values” 310 in
where Pt is the signal power of time sample t for a particular beam, B(θ,f) represents the components of beam 164 corresponding to frequency f, and B(π,f) represents the components of beam 166 corresponding to frequency f.
In a particular aspect, the frequency-specific metric values 310 are determined for each frequency that is less than a threshold frequency 308. As in
In a particular aspect, the frequency-specific metric values 310 are compared to one or more wind detection thresholds 314 at a conditional gain reduction block 312. In this aspect, a gain 316 applied to one or more channels of the audio data may be adjusted to reduce wind noise responsive to any of the frequency-specific metric values 310 satisfying (e.g., being less than or equal to) the wind detection threshold(s) 314. The wind detection threshold(s) 314 is a static or tunable value between 0 and 1.
In the example illustrated in
In a particular aspect, the frequency-specific metric values 310 are used to calculate band-specific metric values 338 at a band-specific metric calculation block 330. For example, the frequency-specific metric values 310 are grouped by frequency bands 332 and a weighted sum is used to calculate a band-specific metric value for each frequency band 332. In a particular implementation, the frequency bands 332 have a bandwidth of 500 Hz. In other implementations, the frequency bands 332 are larger (e.g., 1000 Hz) or smaller (e.g., 250 Hz). In still other implementations, different frequency bands 332 may have different bandwidths.
In a particular implementation, a band-specific metric value 338 for a particular frequency band may be calculated using Equation 2, above. The wind-reduction parameter 334 is a preconfigured or tunable value that affects how aggressively the device 300 reduces the wind noise, especially in lower frequency bands. For example, larger values of the wind-reduction parameter 334 result in more reduction in low frequency wind noise and smaller values of the wind-reduction parameter 334 result in less reduction in low frequency wind noise. As one example, a default value of 0.5 may be used for the wind-reduction parameter 334; however, the value of the wind-reduction parameter 334 may be tunable over a range of values, such as from 0.1 to 4 in a particular non-limiting example.
In a particular aspect, the band-specific metric calculation block 330 may modify one or more of the frequency-specific metric values 310 before determining the band-specific metric values 338. For example, the band-specific metric calculation block 330 may compare each of the frequency-specific metric values 310 to an acceptance criterion 336. In this example, if a particular frequency-specific metric value 310 satisfies the acceptance criterion 336, the particular frequency-specific metric value 310 is determined to not represent wind noise. In this situation, the particular frequency-specific metric value 310 may be assigned a value of 1 to indicate that no wind noise is present. The acceptance criterion 336 is a pre-set or tunable value between 0 and 1. In a particular non-limiting example, the acceptance criterion 336 is between 0.6 and 0.9, and the acceptance criterion 336 is satisfied when a particular frequency-specific metric value 310 is greater than or equal to the acceptance criterion 336. To illustrate, if the acceptance criterion 336 has a value of 0.8, and the value of a particular frequency-specific metric value 310 is 0.82, that frequency-specific metric value 310 is assigned a value of 1 for purposes of determining the band-specific metric values 338.
The band-specific metric values 338 are shaped at the power shaping block 340. The shaping ensures that the power in lower frequency bands is greater than or equal to the power in higher frequency bands after modification of each frequency band based on the band-specific metric value 338 associated with the frequency band. For example, the power shaping block 340 may use logic such as:
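If $\mathrm{Metric}_{band}(\mathrm{Band}_k) \cdot E(\mathrm{Band}_k, (B(\theta)+B(\pi))) < \mathrm{Metric}_{band}(\mathrm{Band}_{k+1}) \cdot E(\mathrm{Band}_{k+1}, (B(\theta)+B(\pi)))$,
then $\mathrm{Metric}_{band}(\mathrm{Band}_k) = \mathrm{Metric}_{band}(\mathrm{Band}_{k+1}) \cdot E(\mathrm{Band}_{k+1}, (B(\theta)+B(\pi))) / E(\mathrm{Band}_k, (B(\theta)+B(\pi)))$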
where Band_k indicates a particular frequency band, Band_{k+1} indicates the next higher frequency band, E(Band_k, (B(θ)+B(π))) is the sum of the energy of the kth frequency band of the θ and π beams, and E(Band_{k+1}, (B(θ)+B(π))) is the sum of the energy of the (k+1)th frequency band of the θ and π beams, where the energy of each beam is determined based on the frequency-domain spatial audio data 304.
The power shaped band-specific metric values 338 are used as filter parameters 342 for a filter bank 344. The filter bank 344 modifies the frequency-domain spatial audio data 304 to generate filtered frequency-domain spatial audio data 346. For example, the filter bank 344 may determine the frequency-domain spatial audio data 346 for each frequency and channel using Equation 3, above.
In
In some implementations, the reduced-wind-noise audio data 116 is provided to other components, such as the spatial audio converter 118 of
In the graph 400, the ambisonics channels include a W-channel 402, a Y-channel 404, a Z-channel 406, and an X-channel 408, and the wind conditions include no wind, a 3 mile per hour (mph) wind, a 6 mph wind, and a 12 mph wind. The graph 400 shows detectable sound levels in all of the channels with a 6 mph wind and a significant increase in sound levels with a 12 mph wind. As illustrated in the graph 400, the sound levels in the Z-channel 406 and the X-channel 408 increase between the 6 mph wind and the 12 mph wind more than the sound levels for the W-channel 402 and the Y-channel 404 do.
The graph 450 shows ambisonics channels including a W-channel 452, a Y-channel 454, a Z-channel 456, and an X-channel 458 for the same wind conditions as illustrated in graph 400, but with wind-noise reduction applied. For the graph 450, the wind reduction includes both filtering (e.g., using the filter bank 244 of
In the graph 500, a first channel 502 corresponds to a first beam and a second channel 504 corresponds to a second beam. To generate the graph 500, the two beams were set 180 degrees apart from one another. To illustrate, the angle 168 of
The graph 550 shows a first channel 552 corresponding to the first channel 502 with wind noise reduction applied, and a second channel 554 corresponding to the second channel 504 with wind noise reduction applied. For the graph 550, the wind reduction includes filtering (e.g., using the filter bank 344 of
During operation, the mobile device 800 may perform particular actions in response to detecting wind noise. For example, the actions can include filtering one or more channels of spatial audio data to reduce wind noise in captured audio. As another example, the actions can include adjusting a gain applied to one or more channels of spatial audio data to reduce wind noise in captured audio.
The earbuds 900 include the microphones 102A, 102B, and 102N, at least one of which is positioned to primarily capture speech of a user. The earbuds 900 may also include one or more additional microphones positioned to primarily capture environmental sounds (e.g., for noise canceling operations).
In a particular aspect, during operation, the earbuds 900 may perform particular actions in response to detecting wind noise. For example, the actions can include filtering one or more channels of spatial audio data to reduce wind noise in captured audio. As another example, the actions can include adjusting a gain applied to one or more channels of spatial audio data to reduce wind noise in captured audio.
In a particular aspect, during operation, the headset 1000 may perform particular actions in response to detecting wind noise. For example, the actions can include filtering one or more channels of spatial audio data to reduce wind noise in captured audio. As another example, the actions can include adjusting a gain applied to one or more channels of spatial audio data to reduce wind noise in captured audio.
In a particular aspect, during operation, the wearable electronic device 1100 may perform particular actions in response to detecting wind noise. For example, the actions can include filtering one or more channels of spatial audio data to reduce wind noise in captured audio. As another example, the actions can include adjusting a gain applied to one or more channels of spatial audio data to reduce wind noise in captured audio.
In a particular aspect, during operation, the voice-controlled speaker system 1200 may perform particular actions in response to detecting wind noise. For example, the actions can include filtering one or more channels of spatial audio data to reduce wind noise in captured audio. As another example, the actions can include adjusting a gain applied to one or more channels of spatial audio data to reduce wind noise in captured audio.
In a particular aspect, during operation, the camera 1300 may perform particular actions in response to detecting wind noise. For example, the actions can include filtering one or more channels of spatial audio data to reduce wind noise in captured audio. As another example, the actions can include adjusting a gain applied to one or more channels of spatial audio data to reduce wind noise in captured audio.
In a particular aspect, during operation, the headset 1400 may perform particular actions in response to detecting wind noise. For example, the actions can include filtering one or more channels of spatial audio data to reduce wind noise in captured audio. As another example, the actions can include adjusting a gain applied to one or more channels of spatial audio data to reduce wind noise in captured audio.
The control system 1502 controls various operations of the aerial device 1500, such as cargo release, sensor activation, take-off, navigation, landing, or combinations thereof. For example, the control system 1502 may control flight of the aerial device 1500 between specified points and deployment of cargo at a particular location. In a particular aspect, the control system 1502 performs one or more actions responsive to detecting wind noise. For example, the actions can include filtering one or more channels of spatial audio data to reduce wind noise in captured audio. As another example, the actions can include adjusting a gain applied to one or more channels of spatial audio data to reduce wind noise in captured audio.
In a particular implementation, the sensor(s) also include vehicle occupancy sensors, eye tracking sensors, or external environment sensors (e.g., lidar sensors or cameras). In a particular aspect, sensor data from one or more sensors indicates a location of the user. For example, the sensors are associated with various locations within the vehicle 1600.
In a particular aspect, the vehicle 1600 performs one or more actions responsive to detecting wind noise. For example, the actions can include filtering one or more channels of spatial audio data to reduce wind noise in captured audio. As another example, the actions can include adjusting a gain applied to one or more channels of spatial audio data to reduce wind noise in captured audio.
The method 1700 includes, at block 1702, obtaining audio signals representing sound captured by at least three microphones. For example, the device 100 of
The method 1700 includes, at block 1704, determining spatial audio data based on the audio signals. For example, the spatial audio converter 110 may generate the spatial audio data 112 based on the audio data 104 using ambisonics processing or beamforming.
The method 1700 includes, at block 1706, determining a metric indicative of wind noise in the audio signals. The metric is based on a comparison of a first value and a second value, where the first value corresponds to an aggregate signal based on the spatial audio data and the second value corresponds to a differential signal based on the spatial audio data. For example, when the spatial audio data 112 includes ambisonics coefficients, the metric may be determined as a ratio of a signal power of the W-channel for a particular frequency and time frame to a signal power of one of the differential channels (e.g., the X-, Y-, or Z-channel) for the particular frequency and time frame. As another example, when the spatial audio data includes two or more beams, the metric may be determined as a ratio of a sum of the signal power of two beams for a particular frequency and time frame to a difference of the signal power of the two beams for the particular frequency and time frame.
The method 1800 includes, at block 1802, obtaining audio signals representing sound captured by at least three microphones. For example, the device 100 of
The method 1800 includes, at block 1804, determining spatial audio data based on the audio signals. For example, the spatial audio converter 110 may generate the spatial audio data 112 based on the audio data 104 using ambisonics processing or beamforming.
The method 1800 includes, at block 1806, determining a metric indicative of wind noise in the audio signals. The metric is based on a comparison of a first value and a second value, where the first value corresponds to an aggregate signal based on the spatial audio data and the second value corresponds to a differential signal based on the spatial audio data. For example, when the spatial audio data 112 includes ambisonics coefficients, the metric may be determined as a ratio of signal power of the W-channel for a particular frequency and time frame to a signal power of one of the differential channels (e.g., the X-, Y-, or Z-channel) for the particular frequency and time frame. As another example, when the spatial audio data includes two or more beams, the metric may be determined as a ratio of a sum of the signal power of two beams for a particular frequency and time frame and a difference of the signal power of the two beams for the particular frequency and time frame.
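The beam-based example can be sketched as follows, under stated assumptions. The sketch assumes two beamformed channels are already available as complex time-frequency arrays (frames by bins) and follows the reading in which the aggregate signal is the signal power of the sum of the two beams and the differential signal is the signal power of their difference. The function name and the constant `eps` are hypothetical.

```python
import numpy as np

def beam_wind_metric(beam_a, beam_b, eps=1e-12):
    """Ratio of power of the beam sum to power of the beam difference.

    beam_a, beam_b: complex arrays of identical shape (frames x bins).
    eps is a hypothetical guard against division by zero.
    """
    aggregate_power = np.abs(beam_a + beam_b) ** 2
    differential_power = np.abs(beam_a - beam_b) ** 2
    return aggregate_power / (differential_power + eps)

# Usage with one frame and two frequency bins.
beam_a = np.array([[3 + 0j, 1 + 1j]])
beam_b = np.array([[1 + 0j, 1 - 1j]])
ratio = beam_wind_metric(beam_a, beam_b)
```

Each element of `ratio` is the metric value for one particular frequency and time frame.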
The method 1800 includes, at block 1808, modifying the spatial audio data based on the metric to generate reduced-wind-noise audio data. For example, filter parameters (such as the filter parameters 242 of
The method 1900 includes, at block 1902, obtaining audio signals representing sound captured by at least three microphones. For example, the device 100 of
The method 1900 includes, at block 1904, determining spatial audio data based on the audio signals. For example, the spatial audio converter 110 may generate the spatial audio data 112 based on the audio data 104 using ambisonics processing or beamforming.
The method 1900 includes, at block 1906, determining a metric indicative of wind noise in the audio signals. The metric is based on a comparison of a first value and a second value, where the first value corresponds to an aggregate signal based on the spatial audio data and the second value corresponds to a differential signal based on the spatial audio data. For example, when the spatial audio data 112 includes ambisonics coefficients, the metric may be determined as a ratio of signal power of the W-channel for a particular frequency and time frame to a signal power of one of the differential channels (e.g., the X-, Y-, or Z-channel) for the particular frequency and time frame. As another example, when the spatial audio data includes two or more beams, the metric may be determined as a ratio of a sum of the signal power of two beams for a particular frequency and time frame and a difference of the signal power of the two beams for the particular frequency and time frame.
The method 1900 includes, at block 1908, reducing a gain applied to one or more spatial audio channels based on a determination that at least one of the frequency-specific values satisfies a wind detection criterion. For example, the conditional gain reduction block 212 of
The method 2000 includes, at block 2002, obtaining audio signals representing sound captured by at least three microphones. For example, the device 100 of
The method 2000 includes, at block 2004, processing the audio signals to remove high frequency wind noise. For example, the wind turbulence noise reduction engine 106 of
The method 2000 includes, at block 2006, determining spatial audio data based on the audio signals. For example, the spatial audio converter 110 of
The method 2000 includes, at block 2008, determining, for a set of frequencies, frequency-specific values of a metric indicative of wind noise in the audio signals. For example, the frequency-specific metric values 210 may be calculated by the metric calculation block 206 of
The method 2000 includes, at block 2010, for each frequency band of a set of frequency bands, determining a band-specific value of the metric. For example, the band-specific metric values 238 may be calculated by the band-specific metric calculation block 230 of
The method 2000 includes, at block 2012, modifying band-specific values of the metric that satisfy an acceptance criterion. For example, the band-specific metric calculation block 230 of
The method 2000 includes, at block 2014, applying power shaping to the band-specific values of the metric. For example, the power shaping block 240 of
The method 2000 includes, at block 2016, determining filter parameters based on the band-specific values of the metric. For example, the filter parameters 242 of
The method 2000 includes, at block 2018, filtering the spatial audio data using the filter parameters to generate reduced-wind-noise audio data. For example, the filter bank 244 of
The method 2000 includes, at block 2020, determining whether any of the frequency-specific values of the metric satisfies a wind detection criterion. For example, the conditional gain reduction block 212 may compare each of the frequency-specific metric values 210 to the wind detection threshold 214, or the conditional gain reduction block 312 may compare each of the frequency-specific metric values 310 to the wind detection threshold 314.
The method 2000 includes, at block 2022, based on a determination that at least one of the frequency-specific values of the metric satisfies a wind detection criterion, reducing a gain applied to one or more spatial audio channels. For example, the amplifiers 220, 226 may apply the gain(s) 216 to one or more channels of the spatial audio data to reduce wind noise. As another example, the amplifiers 320, 326 may apply the gain(s) 316 to one or more channels of the spatial audio data to reduce wind noise.
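The detection and gain reduction of blocks 2020 and 2022 can be sketched as follows. This is a minimal illustration, not the disclosed implementation. It assumes that the wind detection criterion is satisfied when a frequency-specific metric value falls below the wind detection threshold, and that the gain moves toward its target by a fixed step per frame so that the reduction is gradual. The function name, the comparison direction, and the `min_gain` and `step` values are all assumptions.

```python
import numpy as np

def update_gain(current_gain, metric_values, threshold, min_gain=0.25, step=0.05):
    """Return the gain to apply for the next frame.

    Assumption: wind is detected when at least one frequency-specific
    metric value is below the threshold. The gain then ramps toward
    min_gain; otherwise it ramps back toward 1.0.
    """
    wind_detected = bool(np.any(np.asarray(metric_values) < threshold))
    target = min_gain if wind_detected else 1.0
    # Limit the per-frame change so the gain varies gradually.
    delta = np.clip(target - current_gain, -step, step)
    return float(current_gain + delta)

# Usage: three windy frames followed by one frame without wind.
gain = 1.0
for _ in range(3):
    gain = update_gain(gain, [5.0, 0.5], threshold=1.0)
gain_after_wind = gain
gain_recovering = update_gain(gain, [5.0, 2.0], threshold=1.0)
```

The returned gain would then be multiplied into the selected spatial audio channels for each frame. Which channels are attenuated and the actual ramp rate depend on the particular implementation.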
The method 2000 includes, at block 2024, generating binaural audio output based on the reduced-wind-noise audio data and performing ambient noise suppression of the binaural audio output. In the implementation illustrated in
Referring to
In a particular aspect, the device 2100 includes a processor 2104 (e.g., a central processing unit (CPU)). The device 2100 may include one or more additional processors 2106 (e.g., one or more digital signal processors (DSPs)). The processor 2104 or the processors 2106 may include or execute instructions 2116 from a memory 2114 to initiate, control or perform operations of the wind turbulence noise reduction engine 106, the spatial audio converter 110, the spatial-audio wind noise reduction processor 114, the spatial audio converter 118, the ambient noise suppressor 122, or a combination thereof.
The device 2100 may include a modem 2130 coupled to a transceiver 2132 and an antenna 2122. The transceiver 2132 may include a receiver, a transmitter, or both. The processor 2104, the processors 2106, or both, are coupled via the modem 2130 to the transceiver 2132.
The device 2100 may include a display 2140 coupled to a display controller 2118. The speaker(s) 126 and the microphones 102 may be coupled, via one or more interfaces, to a CODEC 2108. The CODEC 2108 may include a digital-to-analog converter (DAC) 2110 and an analog-to-digital converter (ADC) 2112.
The memory 2114 may store the instructions 2116, which are executable by the processor 2104, the processors 2106, another processing unit of the device 2100, or a combination thereof, to perform one or more operations described with reference to
One or more components of the device 2100 may be implemented via dedicated hardware (e.g., circuitry), by a processor (e.g., the processor 2104 or the processors 2106) executing the instructions 2116 to perform one or more tasks, or a combination thereof. As an example, the memory 2114 may include or correspond to a memory device (e.g., a computer-readable storage device), such as a random access memory (RAM), magnetoresistive random access memory (MRAM), spin-torque transfer MRAM (STT-MRAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, or a compact disc read-only memory (CD-ROM). The memory device may include (e.g., store) instructions (e.g., the instructions 2116) that, when executed by a computer (e.g., one or more processors, such as the processor 2104 and/or the processors 2106), may cause the computer to perform one or more operations described with reference to
In a particular aspect, the device 2100 may be included in a system-in-package or system-on-chip device 2102. In a particular aspect, the processor 2104, the processors 2106, the display controller 2118, the memory 2114, the CODEC 2108, the modem 2130, and the transceiver 2132 are included in the system-in-package or system-on-chip device 2102. In a particular aspect, an input device 2124, such as a touchscreen and/or keypad, and a power supply 2120 are coupled to the system-in-package or system-on-chip device 2102. Moreover, in a particular aspect, as illustrated in
The device 2100 may include a wireless telephone, a mobile communication device, a mobile device, a mobile phone, a smart phone, a cellular phone, a virtual reality headset, an augmented reality headset, a mixed reality headset, a vehicle (e.g., a car), a laptop computer, a desktop computer, a computer, a tablet computer, a set top box, a personal digital assistant (PDA), a display device, a television, a gaming console, a music player, a radio, a video player, an entertainment unit, a communication device, a fixed location data unit, a personal media player, a digital video player, a digital video disc (DVD) player, a tuner, a camera, a navigation device, earbuds, an audio headset (e.g., headphones), or any combination thereof.
It should be noted that various functions performed by the one or more components of the systems described with reference to
In conjunction with the described implementations, an apparatus includes means for determining spatial audio data based on audio signals representing sound captured by at least three microphones. For example, the means for determining spatial audio data includes the device 100, the spatial audio converter 110, the integrated circuit 602, the processor(s) 608, the device 2100, the processor 2104, the processor(s) 2106, one or more other circuits or components configured to determine spatial audio data, or any combination thereof.
The apparatus also includes means for determining a metric indicative of wind noise in the audio signals, where the metric is based on a comparison of a first value and a second value, where the first value corresponds to an aggregate signal based on the spatial audio data and the second value corresponds to a differential signal based on the spatial audio data. For example, the means for determining the metric includes the device 100, the spatial-audio wind noise reduction processor 114, the device 200, the device 300, the integrated circuit 602, the processor(s) 608, the integrated circuit 702, the processor(s) 708, the device 2100, the processor 2104, the processor(s) 2106, one or more other circuits or components configured to determine the metric, or any combination thereof.
In some implementations, the apparatus also includes means for modifying the spatial audio data based on the metric to generate reduced-wind-noise audio data. For example, the means for modifying the spatial audio data includes the device 100, the spatial-audio wind noise reduction processor 114, the device 200, the device 300, the integrated circuit 602, the processor(s) 608, the integrated circuit 702, the processor(s) 708, the device 2100, the processor 2104, the processor(s) 2106, one or more other circuits or components configured to modify the spatial audio data, or any combination thereof.
Those of skill would further appreciate that the various illustrative logical blocks, configurations, modules, circuits, and algorithm steps described in connection with the implementations disclosed herein may be implemented as electronic hardware, computer software executed by a processor, or combinations of both. Various illustrative components, blocks, configurations, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or processor executable instructions depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application; such implementation decisions are not to be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the implementations disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of non-transient storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor may read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a computing device or a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a computing device or user terminal.
Particular aspects of the disclosure are described below in a first set of interrelated clauses:
According to Clause 1 a device includes one or more processors configured to: obtain audio signals representing sound captured by at least three microphones; determine spatial audio data based on the audio signals; and determine a metric indicative of wind noise in the audio signals, the metric based on a comparison of a first value and a second value, where the first value corresponds to an aggregate signal based on the spatial audio data and the second value corresponds to a differential signal based on the spatial audio data.
Clause 2 includes the device of Clause 1 where the one or more processors are further configured to modify the spatial audio data based on the metric to generate reduced-wind-noise audio data.
Clause 3 includes the device of Clause 2 where the one or more processors are further configured to generate binaural audio output based on the reduced-wind-noise audio data and to perform ambient noise suppression of the binaural audio output.
Clause 4 includes the device of Clause 2 where modifying the spatial audio data based on the metric to generate the reduced-wind-noise audio data comprises filtering the spatial audio data using filter parameters based on the metric to reduce low frequency noise associated with wind.
Clause 5 includes the device of Clause 2 where modifying the spatial audio data based on the metric to generate the reduced-wind-noise audio data comprises reducing a gain applied to one or more spatial audio channels of the spatial audio data.
Clause 6 includes the device of any of Clauses 1 to 5 where determining the spatial audio data based on the audio signals comprises spatially filtering the audio signals to generate multiple beamformed audio channels.
Clause 7 includes the device of Clause 6 where the aggregate signal is based on signal power of a sum of multiple angularly offset beamformed audio channels of the multiple beamformed audio channels and the differential signal is based on signal power of a difference of the multiple angularly offset beamformed audio channels.
Clause 8 includes the device of Clause 7 where the multiple angularly offset beamformed audio channels are angularly offset by at least 90 degrees.
Clause 9 includes the device of any of Clauses 1 to 8 where determining the spatial audio data based on the audio signals comprises determining ambisonics coefficients based on the audio signals to generate multiple ambisonics channels.
Clause 10 includes the device of Clause 9 where the aggregate signal is based on signal power of an omnidirectional ambisonics channel of the multiple ambisonics channels and the differential signal is based on signal power of a directional ambisonics channel of the multiple ambisonics channels.
Clause 11 includes the device of any of Clauses 1 to 10 where the metric indicative of wind noise in the audio signals is determined for one or more frequency bands that are less than a threshold frequency.
Clause 12 includes the device of any of Clauses 1 to 11 where determining the metric indicative of wind noise in the audio signals comprises determining frequency-specific values of the metric for a set of frequencies, and where the one or more processors are further configured to cause a gain applied to one or more spatial audio channels to be reduced based on a determination that at least one of the frequency-specific values satisfies a wind detection criterion.
Clause 13 includes the device of Clause 12 where the one or more processors are configured to cause the gain to be reduced gradually over multiple frames of the spatial audio data associated with the one or more spatial audio channels.
Clause 14 includes the device of Clause 12 where the one or more spatial audio channels to which the gain is applied correspond to a front-to-back direction and an up-and-down direction, and where applying the gain reduces low-band audio corresponding to the front-to-back direction and the up-and-down direction during playback.
Clause 15 includes the device of any of Clauses 1 to 14 where determining the metric indicative of wind noise in the audio signals comprises, for each frequency band of a set of frequency bands, determining a band-specific value of the metric.
Clause 16 includes the device of Clause 15 where the one or more processors are further configured to modify a particular band-specific value of the metric for a particular frequency band based on determining that the particular band-specific value of the metric satisfies an acceptance criterion.
Clause 17 includes the device of Clause 15 where the one or more processors are further configured to apply a wind-reduction parameter to multiple frequency-specific values of the metric to determine the band-specific value of the metric.
Clause 18 includes the device of Clause 15 where the one or more processors are further configured to adjust one or more of the band-specific values of the metric to prevent a gain-adjusted power of a higher frequency band of the set of frequency bands from exceeding a gain-adjusted energy of a lower frequency band of the set of frequency bands.
Clause 19 includes the device of Clause 15 where the one or more processors are further configured to filter the spatial audio data using filter parameters based on the metric to generate reduced-wind-noise audio data.
Clause 20 includes the device of any of Clauses 1 to 19 where the one or more processors are further configured to, before determining the spatial audio data, process the audio signals to remove high frequency wind noise.
Clause 21 includes the device of any of Clauses 1 to 20 and further includes the at least three microphones, where at least two microphones of the at least three microphones are spaced at least 0.5 centimeters apart.
Clause 22 includes the device of any of Clauses 1 to 21 and further includes the at least three microphones, where at least two microphones of the at least three microphones are spaced at least 2 centimeters apart.
Clause 23 includes the device of any of Clauses 1 to 22 where the one or more processors are integrated within a mobile communication device.
Clause 24 includes the device of any of Clauses 1 to 23 where the one or more processors are integrated within a vehicle.
Clause 25 includes the device of any of Clauses 1 to 24 where the one or more processors are integrated within one or more of an augmented reality headset, a mixed reality headset, a virtual reality headset, or a wearable device.
Clause 26 includes the device of any of Clauses 1 to 25 where the one or more processors are included in an integrated circuit.
According to Clause 27 a method includes obtaining audio signals representing sound captured by at least three microphones; determining spatial audio data based on the audio signals; and determining a metric indicative of wind noise in the audio signals, the metric based on a comparison of a first value and a second value, where the first value corresponds to an aggregate signal based on the spatial audio data and the second value corresponds to a differential signal based on the spatial audio data.
Clause 28 includes the method of Clause 27 and further includes modifying the spatial audio data based on the metric to generate reduced-wind-noise audio data.
Clause 29 includes the method of Clause 28 and further includes generating binaural audio output based on the reduced-wind-noise audio data and performing ambient noise suppression of the binaural audio output.
Clause 30 includes the method of Clause 28 where modifying the spatial audio data based on the metric to generate the reduced-wind-noise audio data comprises filtering the spatial audio data using filter parameters based on the metric to reduce low frequency noise associated with wind.
Clause 31 includes the method of Clause 28 where modifying the spatial audio data based on the metric to generate the reduced-wind-noise audio data comprises reducing a gain applied to one or more spatial audio channels of the spatial audio data.
Clause 32 includes the method of any of Clauses 27 to 31 where determining the spatial audio data based on the audio signals comprises spatially filtering the audio signals to generate multiple beamformed audio channels.
Clause 33 includes the method of Clause 32 where the aggregate signal is based on signal power of a sum of multiple angularly offset beamformed audio channels of the multiple beamformed audio channels and the differential signal is based on signal power of a difference of the multiple angularly offset beamformed audio channels.
Clause 34 includes the method of Clause 33 where the multiple angularly offset beamformed audio channels are angularly offset by at least 90 degrees.
Clause 35 includes the method of any of Clauses 27 to 34 where determining the spatial audio data based on the audio signals comprises determining ambisonics coefficients based on the audio signals to generate multiple ambisonics channels.
Clause 36 includes the method of Clause 35 where the aggregate signal is based on signal power of an omnidirectional ambisonics channel of the multiple ambisonics channels and the differential signal is based on signal power of a directional ambisonics channel of the multiple ambisonics channels.
Clause 37 includes the method of any of Clauses 27 to 36 where the metric indicative of wind noise in the audio signals is determined for one or more frequency bands that are less than a threshold frequency.
Clause 38 includes the method of any of Clauses 27 to 37 where determining the metric indicative of wind noise in the audio signals comprises determining frequency-specific values of the metric for a set of frequencies, and further comprising reducing a gain applied to one or more spatial audio channels based on a determination that at least one of the frequency-specific values satisfies a wind detection criterion.
Clause 39 includes the method of Clause 38 where the gain is reduced gradually over multiple frames of the spatial audio data associated with the one or more spatial audio channels.
Clause 40 includes the method of Clause 38 where the one or more spatial audio channels to which the gain is applied correspond to a front-to-back direction and an up-and-down direction, and where applying the gain reduces low-band audio corresponding to the front-to-back direction and the up-and-down direction during playback.
Clause 41 includes the method of any of Clauses 27 to 40 where determining the metric indicative of wind noise in the audio signals comprises, for each frequency band of a set of frequency bands, determining a band-specific value of the metric.
Clause 42 includes the method of Clause 41 and further includes modifying a particular band-specific value of the metric for a particular frequency band based on determining that the particular band-specific value of the metric satisfies an acceptance criterion.
Clause 43 includes the method of Clause 41 and further includes applying a wind-reduction parameter to multiple frequency-specific values of the metric to determine the band-specific value of the metric.
Clause 44 includes the method of Clause 41 and further includes adjusting one or more of the band-specific values of the metric to prevent a gain-adjusted power of a higher frequency band of the set of frequency bands from exceeding a gain-adjusted energy of a lower frequency band of the set of frequency bands.
Clause 45 includes the method of Clause 41 and further includes filtering the spatial audio data using filter parameters based on the metric to generate reduced-wind-noise audio data.
Clause 46 includes the method of any of Clauses 27 to 45 and further includes, before determining the spatial audio data, processing the audio signals to remove high frequency wind noise.
Clause 47 includes the method of any of Clauses 27 to 46 where at least two microphones of the at least three microphones are spaced at least 0.5 centimeters apart.
Clause 48 includes the method of any of Clauses 27 to 47 where at least two microphones of the at least three microphones are spaced at least 2 centimeters apart.
According to Clause 49 a device includes means for determining spatial audio data based on audio signals representing sound captured by at least three microphones and means for determining a metric indicative of wind noise in the audio signals, the metric based on a comparison of a first value and a second value, where the first value corresponds to an aggregate signal based on the spatial audio data and the second value corresponds to a differential signal based on the spatial audio data.
Clause 50 includes the device of Clause 49 and further includes means for modifying the spatial audio data based on the metric to generate reduced-wind-noise audio data.
Clause 51 includes the device of Clause 50 and further includes means for generating binaural audio output based on the reduced-wind-noise audio data and means for performing ambient noise suppression of the binaural audio output.
Clause 52 includes the device of Clause 50 where modifying the spatial audio data based on the metric to generate the reduced-wind-noise audio data comprises filtering the spatial audio data using filter parameters based on the metric to reduce low frequency noise associated with wind.
Clause 53 includes the device of Clause 50 where modifying the spatial audio data based on the metric to generate the reduced-wind-noise audio data comprises reducing a gain applied to one or more spatial audio channels of the spatial audio data.
Clause 54 includes the device of any of Clauses 49 to 53 where determining the spatial audio data based on the audio signals comprises spatially filtering the audio signals to generate multiple beamformed audio channels.
Clause 55 includes the device of Clause 54 where the aggregate signal is based on signal power of a sum of multiple angularly offset beamformed audio channels of the multiple beamformed audio channels and the differential signal is based on signal power of a difference of the multiple angularly offset beamformed audio channels.
Clause 56 includes the device of Clause 55 where the multiple angularly offset beamformed audio channels are angularly offset by at least 90 degrees.
Clause 57 includes the device of any of Clauses 49 to 56 where determining the spatial audio data based on the audio signals comprises determining ambisonics coefficients based on the audio signals to generate multiple ambisonics channels.
Clause 58 includes the device of Clause 57 where the aggregate signal is based on signal power of an omnidirectional ambisonics channel of the multiple ambisonics channels and the differential signal is based on signal power of a directional ambisonics channel of the multiple ambisonics channels.
Clause 59 includes the device of any of Clauses 49 to 58 where the metric indicative of wind noise in the audio signals is determined for one or more frequency bands that are less than a threshold frequency.
Clause 60 includes the device of any of Clauses 49 to 59 where determining the metric indicative of wind noise in the audio signals comprises determining frequency-specific values of the metric for a set of frequencies, and further includes means for reducing a gain applied to one or more spatial audio channels based on a determination that at least one of the frequency-specific values satisfies a wind detection criterion.
Clause 61 includes the device of Clause 60 where the means for reducing the gain is configured to reduce the gain gradually over multiple frames of the spatial audio data associated with the one or more spatial audio channels.
Clause 62 includes the device of Clause 60 where the one or more spatial audio channels to which the gain is applied correspond to a front-to-back direction and an up-and-down direction, and where applying the gain reduces low-band audio corresponding to the front-to-back direction and the up-and-down direction during playback.
Clause 63 includes the device of any of Clauses 49 to 62 where determining the metric indicative of wind noise in the audio signals comprises, for each frequency band of a set of frequency bands, determining a band-specific value of the metric.
Clause 64 includes the device of Clause 63 and further includes means for modifying a particular band-specific value of the metric for a particular frequency band based on determining that the particular band-specific value of the metric satisfies an acceptance criterion.
Clause 65 includes the device of Clause 63 and further includes means for applying a wind-reduction parameter to multiple frequency-specific values of the metric to determine the band-specific value of the metric.
Clause 66 includes the device of Clause 63 and further includes means for adjusting one or more of the band-specific values of the metric to prevent a gain-adjusted power of a higher frequency band of the set of frequency bands from exceeding a gain-adjusted energy of a lower frequency band of the set of frequency bands.
Clause 67 includes the device of Clause 63 and further includes means for filtering the spatial audio data using filter parameters based on the metric to generate reduced-wind-noise audio data.
Clause 68 includes the device of any of Clauses 49 to 67 and further includes means for processing the audio signals to remove high frequency wind noise before determining the spatial audio data.
Clause 69 includes the device of any of Clauses 49 to 68 and further includes the at least three microphones, where at least two microphones of the at least three microphones are spaced at least 0.5 centimeters apart.
Clause 70 includes the device of any of Clauses 49 to 69 and further includes the at least three microphones, where at least two microphones of the at least three microphones are spaced at least 2 centimeters apart.
Clause 71 includes the device of any of Clauses 49 to 70 where the means for determining the spatial audio data and the means for determining the metric are integrated within a mobile computing device.
Clause 72 includes the device of any of Clauses 49 to 71 where the means for determining the spatial audio data and the means for determining the metric are integrated within a vehicle.
Clause 73 includes the device of any of Clauses 49 to 72 where the means for determining the spatial audio data and the means for determining the metric are integrated within one or more of an augmented reality headset, a mixed reality headset, a virtual reality headset, or a wearable device.
Clause 74 includes the device of any of Clauses 49 to 73 where the means for determining the spatial audio data and the means for determining the metric are included in an integrated circuit.
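As an illustration of the gain handling described in Clauses 60 and 61, the following minimal sketch reduces a channel gain gradually over successive frames while a wind detection criterion is satisfied. The function names, the threshold form of the criterion, the fixed per-frame step, and the gradual recovery of the gain once wind is no longer detected are all assumptions made for this example and are not specified by the clauses.

```python
from typing import Iterable, List


def wind_detected(freq_metric_values: Iterable[float], threshold: float) -> bool:
    """Assumed criterion: wind is detected when at least one
    frequency-specific metric value exceeds the threshold."""
    return any(value > threshold for value in freq_metric_values)


def next_gain(current_gain: float, wind: bool, step: float = 0.25,
              floor: float = 0.0, ceiling: float = 1.0) -> float:
    """Move the gain by at most `step` per frame toward `floor` when wind
    is detected, or toward `ceiling` otherwise (recovery is an assumption)."""
    target = floor if wind else ceiling
    if current_gain > target:
        return max(target, current_gain - step)
    return min(target, current_gain + step)


def gains_over_frames(per_frame_metrics: Iterable[Iterable[float]],
                      threshold: float, start_gain: float = 1.0,
                      step: float = 0.25) -> List[float]:
    """Return the gain applied to each frame, in frame order."""
    gain = start_gain
    gains = []
    for values in per_frame_metrics:
        gain = next_gain(gain, wind_detected(values, threshold), step)
        gains.append(gain)
    return gains


# Hypothetical frequency-specific metric values for five frames.
frames = [[0.1, 0.2], [0.9, 0.1], [0.8, 0.7], [0.95, 0.2], [0.1, 0.1]]
print(gains_over_frames(frames, threshold=0.5))
```

Limiting the change per frame avoids an abrupt drop in level on the affected spatial audio channels, such as those corresponding to the front-to-back and up-and-down directions of Clause 62.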
According to Clause 75, a computer-readable storage device stores instructions that are executable by one or more processors to cause the one or more processors to determine spatial audio data based on audio signals representing sound captured by at least three microphones and to determine a metric indicative of wind noise in the audio signals, the metric based on a comparison of a first value and a second value, where the first value corresponds to an aggregate signal based on the spatial audio data and the second value corresponds to a differential signal based on the spatial audio data.
Clause 76 includes the computer-readable storage device of Clause 75 where the instructions are further executable to modify the spatial audio data based on the metric to generate reduced-wind-noise audio data.
Clause 77 includes the computer-readable storage device of Clause 76 where the instructions are further executable to generate binaural audio output based on the reduced-wind-noise audio data and to perform ambient noise suppression of the binaural audio output.
Clause 78 includes the computer-readable storage device of Clause 76 where modifying the spatial audio data based on the metric to generate the reduced-wind-noise audio data comprises filtering the spatial audio data using filter parameters based on the metric to reduce low frequency noise associated with wind.
Clause 79 includes the computer-readable storage device of Clause 76 where modifying the spatial audio data based on the metric to generate the reduced-wind-noise audio data comprises reducing a gain applied to one or more spatial audio channels of the spatial audio data.
Clause 80 includes the computer-readable storage device of any of Clauses 75 to 79 where determining the spatial audio data based on the audio signals comprises spatially filtering the audio signals to generate multiple beamformed audio channels.
Clause 81 includes the computer-readable storage device of Clause 80 where the aggregate signal is based on signal power of a sum of multiple angularly offset beamformed audio channels of the multiple beamformed audio channels and the differential signal is based on signal power of a difference of the multiple angularly offset beamformed audio channels.
Clause 82 includes the computer-readable storage device of Clause 81 where the multiple angularly offset beamformed audio channels are angularly offset by at least 90 degrees.
Clause 83 includes the computer-readable storage device of any of Clauses 75 to 82 where determining the spatial audio data based on the audio signals comprises determining ambisonics coefficients based on the audio signals to generate multiple ambisonics channels.
Clause 84 includes the computer-readable storage device of Clause 83 where the aggregate signal is based on signal power of an omnidirectional ambisonics channel of the multiple ambisonics channels and the differential signal is based on signal power of a directional ambisonics channel of the multiple ambisonics channels.
Clause 85 includes the computer-readable storage device of any of Clauses 75 to 84 where the metric indicative of wind noise in the audio signals is determined for one or more frequency bands that are less than a threshold frequency.
Clause 86 includes the computer-readable storage device of any of Clauses 75 to 85 where determining the metric indicative of wind noise in the audio signals comprises determining frequency-specific values of the metric for a set of frequencies, and where the instructions are further executable to reduce a gain applied to one or more spatial audio channels based on a determination that at least one of the frequency-specific values satisfies a wind detection criterion.
Clause 87 includes the computer-readable storage device of Clause 86 where the gain is reduced gradually over multiple frames of the spatial audio data associated with the one or more spatial audio channels.
Clause 88 includes the computer-readable storage device of Clause 86 where the one or more spatial audio channels to which the gain is applied correspond to a front-to-back direction and an up-and-down direction, and where applying the gain reduces low-band audio corresponding to the front-to-back direction and the up-and-down direction during playback.
Clause 89 includes the computer-readable storage device of any of Clauses 75 to 88 where determining the metric indicative of wind noise in the audio signals comprises, for each frequency band of a set of frequency bands, determining a band-specific value of the metric.
Clause 90 includes the computer-readable storage device of Clause 89 where the instructions are further executable to modify a particular band-specific value of the metric for a particular frequency band based on determining that the particular band-specific value of the metric satisfies an acceptance criterion.
Clause 91 includes the computer-readable storage device of Clause 89 where the instructions are further executable to apply a wind-reduction parameter to multiple frequency-specific values of the metric to determine the band-specific value of the metric.
Clause 92 includes the computer-readable storage device of Clause 89 where the instructions are further executable to adjust one or more of the band-specific values of the metric to prevent a gain-adjusted power of a higher frequency band of the set of frequency bands from exceeding a gain-adjusted power of a lower frequency band of the set of frequency bands.
Clause 93 includes the computer-readable storage device of Clause 89 where the instructions are further executable to filter the spatial audio data using filter parameters based on the metric to generate reduced-wind-noise audio data.
Clause 94 includes the computer-readable storage device of any of Clauses 75 to 93 where the instructions are further executable to, before determining the spatial audio data, process the audio signals to remove high frequency wind noise.
Clause 95 includes the computer-readable storage device of any of Clauses 75 to 94 where at least two microphones of the at least three microphones are spaced at least 0.5 centimeters apart.
Clause 96 includes the computer-readable storage device of any of Clauses 75 to 95 where at least two microphones of the at least three microphones are spaced at least 2 centimeters apart.
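As an illustration of the comparison described in Clauses 75 and 81, the following minimal sketch forms an aggregate signal as the sum of two beamformed channels and a differential signal as their difference, then compares the signal powers. Expressing the comparison as a ratio of differential power to aggregate power is an assumption of this example; the clauses require only that the metric be based on a comparison of the two values. The function names and the sample data are hypothetical, and the restriction to frequency bands below a threshold frequency (Clause 85) is omitted for brevity.

```python
from typing import Sequence


def mean_power(signal: Sequence[float]) -> float:
    """Average squared amplitude of one frame."""
    return sum(sample * sample for sample in signal) / len(signal)


def wind_metric(beam_a: Sequence[float], beam_b: Sequence[float],
                eps: float = 1e-12) -> float:
    """Ratio of differential-signal power to aggregate-signal power.

    Sound that is coherent across the two beams largely cancels in the
    difference, giving a value near 0. Uncorrelated content, which is
    typical of wind noise, gives a value near 1.
    """
    if len(beam_a) != len(beam_b) or len(beam_a) == 0:
        raise ValueError("beams must be non-empty and of equal length")
    aggregate = [a + b for a, b in zip(beam_a, beam_b)]
    differential = [a - b for a, b in zip(beam_a, beam_b)]
    # eps guards against division by zero for a silent frame.
    return mean_power(differential) / (mean_power(aggregate) + eps)


# Identical content in both beams (fully coherent).
coherent = wind_metric([1.0, -1.0, 1.0, -1.0], [1.0, -1.0, 1.0, -1.0])
# Mutually orthogonal content in the two beams (uncorrelated).
uncorrelated = wind_metric([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0])
print(coherent, uncorrelated)
```

The same structure would apply to the ambisonics case of Clause 84 by using the power of an omnidirectional channel as the first value and the power of a directional channel as the second value.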
The previous description of the disclosed aspects is provided to enable a person skilled in the art to make or use the disclosed aspects. Various modifications to these aspects will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope possible consistent with the principles and novel features as defined by the following claims.
Number | Date | Country |
---|---|---|
20220199100 A1 | Jun 2022 | US |