The present disclosure is generally related to microphones.
Advances in technology have resulted in smaller and more powerful computing devices. For example, there currently exist a variety of portable personal computing devices, including wireless telephones such as mobile and smart phones, tablets and laptop computers that are small, lightweight, and easily carried by users. These devices can communicate voice and data packets over wireless networks. Further, many such devices incorporate additional functionality such as a digital still camera, a digital video camera, a digital recorder, and an audio file player. Also, such devices can process executable instructions, including software applications, such as a web browser application, that can be used to access the Internet. As such, these devices can include significant computing capabilities.
A higher-order ambisonics (HOA) signal (often represented by a plurality of spherical harmonic coefficients (SHC) or other hierarchical elements) is a three-dimensional representation of a soundfield. The HOA signal, or SHC representation of the HOA signal, may represent the soundfield in a manner that is independent of local speaker geometry used to playback a multi-channel audio signal rendered from the HOA signal. The HOA signal may also facilitate backwards compatibility as the HOA signal may be rendered to multi-channel formats, such as a 5.1 audio channel format or a 7.1 audio channel format.
Microphones used to capture audio for direct ambisonic conversion introduce “hiss” noise that may be audible during playback. Applying noise reduction, such as Wiener filtering and spectral subtraction, at the microphones can impair audio quality and introduce errors in direction information of audio signals. Applying Wiener filtering and spectral subtraction independently at loudspeakers during playback also introduces audio quality artefacts when loudspeaker contributions are added at the listener's position.
According to a particular implementation of the techniques disclosed herein, a device is configured to apply noise reduction to ambisonic signals. The device includes a memory configured to store noise data corresponding to microphones in a microphone array. The device also includes a processor configured to perform signal processing operations on signals captured by microphones in the microphone array to generate multiple sets of ambisonic signals. The multiple sets of ambisonic signals include a first set corresponding to a first particular ambisonic order and a second set corresponding to a second particular ambisonic order. The processor is also configured to perform a first noise reduction operation that includes applying a first gain factor to each ambisonic signal in the first set. The first gain factor is based on the noise data. The processor is also configured to perform a second noise reduction operation that includes applying a second gain factor to each ambisonic signal in the second set. The second gain factor is based on the noise data and is distinct from the first gain factor.
According to another particular implementation of the techniques disclosed herein, a method of reducing noise in ambisonic signals includes performing signal processing operations on signals captured by microphones in a microphone array to generate ambisonic signals. The ambisonic signals include multiple sets of ambisonic signals including a first set corresponding to a first particular ambisonic order and a second set corresponding to a second particular ambisonic order. The method includes performing a first noise reduction operation that includes applying a first gain factor to each ambisonic signal in the first set. The first gain factor is based on noise data corresponding to the microphones. The method includes performing a second noise reduction operation that includes applying a second gain factor to each ambisonic signal in the second set. The second gain factor is based on the noise data and is distinct from the first gain factor.
According to another particular implementation of the techniques disclosed herein, a non-transitory computer-readable medium includes instructions that, when executed by a processor, cause the processor to perform operations to apply noise reduction to ambisonic signals. The operations include performing signal processing operations on signals captured by microphones in a microphone array to generate ambisonic signals. The ambisonic signals include multiple sets of ambisonic signals including a first set corresponding to a first particular ambisonic order and a second set corresponding to a second particular ambisonic order. The operations include performing a first noise reduction operation that includes applying a first gain factor to each ambisonic signal in the first set. The first gain factor is based on noise data corresponding to the microphones. The operations also include performing a second noise reduction operation that includes applying a second gain factor to each ambisonic signal in the second set. The second gain factor is based on the noise data and is distinct from the first gain factor.
According to another particular implementation of the techniques disclosed herein, an apparatus to apply noise reduction to ambisonic signals includes means for storing noise data corresponding to microphones in a microphone array. The apparatus includes means for performing signal processing operations on signals captured by microphones in the microphone array to generate multiple sets of ambisonic signals. The multiple sets of ambisonic signals include a first set corresponding to a first particular ambisonic order and a second set corresponding to a second particular ambisonic order. The apparatus includes means for performing a first noise reduction operation that includes applying a first gain factor to each ambisonic signal in the first set. The first gain factor is based on the noise data. The apparatus also includes means for performing a second noise reduction operation that includes applying a second gain factor to each ambisonic signal in the second set. The second gain factor is based on the noise data and is distinct from the first gain factor.
Other implementations, advantages, and features of the present disclosure will become apparent after review of the entire application, including the following sections: Brief Description of the Drawings, Detailed Description, and the Claims.
Conventional techniques for reducing noise introduced by microphone arrays that capture audio for direct ambisonic conversion can generate undesirable effects. For example, applying conventional noise reduction at the microphones can impair audio quality and introduce errors in direction information of audio signals, while applying conventional noise reduction independently at loudspeakers during playback also introduces audio quality artefacts when loudspeaker contributions are added at the listener's position.
The present disclosure describes noise reduction devices and techniques that reduce or eliminate the impaired audio quality and errors in direction information associated with conventional techniques. As described herein, improved noise reduction can be performed that includes, for each ambisonic order, determining microphone noise for the microphones contributing to signals of that ambisonic order, and using the microphone noise to generate a gain factor that is applied to the signals of that ambisonic order. By scaling signals corresponding to each of the ambisonic orders independently of the other ambisonic orders, noise is reduced and the direction information loss associated with conventional noise reduction techniques is also reduced or eliminated.
Particular aspects of the present disclosure are described below with reference to the drawings. In the description, common features are designated by common reference numbers. As used herein, various terminology is used for the purpose of describing particular implementations only and is not intended to be limiting of implementations. For example, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It may be further understood that the terms “comprise,” “comprises,” and “comprising” may be used interchangeably with “include,” “includes,” or “including.” Additionally, it will be understood that the term “wherein” may be used interchangeably with “where.” As used herein, “exemplary” may indicate an example, an implementation, and/or an aspect, and should not be construed as limiting or as indicating a preference or a preferred implementation. As used herein, an ordinal term (e.g., “first,” “second,” “third,” etc.) used to modify an element, such as a structure, a component, an operation, etc., does not by itself indicate any priority or order of the element with respect to another element, but rather merely distinguishes the element from another element having a same name (but for use of the ordinal term). As used herein, the term “set” refers to one or more of a particular element, and the term “plurality” refers to multiple (e.g., two or more) of a particular element.
As used herein, “coupled” may include communicatively coupled, electrically coupled, magnetically coupled, physically coupled, optically coupled, and combinations thereof. Two devices (or components) may be coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) directly or indirectly via one or more other devices, components, wires, buses, networks (e.g., a wired network, a wireless network, or a combination thereof), etc. Two devices (or components) that are electrically coupled may be included in the same device or in different devices and may be connected via electronics, one or more connectors, or inductive coupling, as illustrative, non-limiting examples. In some implementations, two devices (or components) that are communicatively coupled, such as in electrical communication, may send and receive electrical signals (digital signals or analog signals) directly or indirectly, such as via one or more wires, buses, networks, etc.
In the present disclosure, terms such as “determining”, “calculating”, “estimating”, “shifting”, “adjusting”, etc. may be used to describe how one or more operations are performed. It should be noted that such terms are not to be construed as limiting and other techniques may be utilized to perform similar operations. Additionally, as referred to herein, “generating”, “calculating”, “estimating”, “using”, “selecting”, “accessing”, and “determining” may be used interchangeably. For example, “generating”, “calculating”, “estimating”, or “determining” a parameter (or a signal) may refer to actively generating, estimating, calculating, or determining the parameter (or the signal) or may refer to using, selecting, or accessing the parameter (or signal) that is already generated, such as by another component or device.
In general, techniques are described for coding of higher-order ambisonics audio data. Higher-order ambisonics audio data may include at least one higher-order ambisonic (HOA) coefficient corresponding to a spherical harmonic basis function having an order greater than one.
The evolution of surround sound has made available many audio output formats for entertainment. Examples of such consumer surround sound formats are mostly ‘channel’ based in that they implicitly specify feeds to loudspeakers in certain geometrical coordinates. The consumer surround sound formats include the popular 5.1 format (which includes the following six channels: front left (FL), front right (FR), center or front center, back left or surround left, back right or surround right, and low frequency effects (LFE)), the growing 7.1 format, and various formats that include height speakers, such as the 7.1.4 format and the 22.2 format (e.g., for use with the Ultra High Definition Television standard). Non-consumer formats can span any number of speakers (in symmetric and non-symmetric geometries) often termed ‘surround arrays’. One example of such a sound array includes 32 loudspeakers positioned at coordinates on the corners of a truncated icosahedron.
The input to a future Moving Picture Experts Group (MPEG) encoder is optionally one of three possible formats: (i) traditional channel-based audio (as discussed above), which is meant to be played through loudspeakers at pre-specified positions; (ii) object-based audio, which involves discrete pulse-code-modulation (PCM) data for single audio objects with associated metadata containing their location coordinates (amongst other information); or (iii) scene-based audio, which involves representing the soundfield using coefficients of spherical harmonic basis functions (also called “spherical harmonic coefficients” or SHC, “Higher-order Ambisonics” or HOA, and “HOA coefficients”). The future MPEG encoder may be described in more detail in a document entitled “Call for Proposals for 3D Audio,” by the International Organization for Standardization/International Electrotechnical Commission (ISO)/(IEC) JTC1/SC29/WG11/N13411, released January 2013 in Geneva, Switzerland, and available at http://mpeg.chiariglione.org/sites/default/files/files/standards/parts/docs/w13411.zip.
There are various ‘surround-sound’ channel-based formats currently available. The formats range, for example, from the 5.1 home theatre system (which has been the most successful in terms of making inroads into living rooms beyond stereo) to the 22.2 system developed by NHK (Nippon Hoso Kyokai or Japan Broadcasting Corporation). Content creators (e.g., Hollywood studios) would like to produce a soundtrack for a movie once, and not spend effort to remix it for each speaker configuration. Recently, Standards Developing Organizations have been considering ways in which to provide an encoding into a standardized bitstream and a subsequent decoding that is adaptable and agnostic to the speaker geometry (and number) and acoustic conditions at the location of the playback (involving a renderer).
To provide such flexibility for content creators, a hierarchical set of elements may be used to represent a soundfield. The hierarchical set of elements may refer to a set of elements in which the elements are ordered such that a basic set of lower-ordered elements provides a full representation of the modeled soundfield. As the set is extended to include higher-order elements, the representation becomes more detailed, increasing resolution.
One example of a hierarchical set of elements is a set of spherical harmonic coefficients (SHC). The following expression demonstrates a description or representation of a soundfield using SHC:

$$p_i(t, r_r, \theta_r, \varphi_r) = \sum_{\omega=0}^{\infty} \left[ 4\pi \sum_{n=0}^{\infty} j_n(kr_r) \sum_{m=-n}^{n} A_n^m(k)\, Y_n^m(\theta_r, \varphi_r) \right] e^{j\omega t}.$$

The expression shows that the pressure $p_i$ at any point $\{r_r, \theta_r, \varphi_r\}$ of the soundfield, at time $t$, can be represented uniquely by the SHC, $A_n^m(k)$. Here, $k = \omega/c$, $c$ is the speed of sound (~343 m/s), $\{r_r, \theta_r, \varphi_r\}$ is a point of reference (or observation point), $j_n(\cdot)$ is the spherical Bessel function of order $n$, and $Y_n^m(\theta_r, \varphi_r)$ are the spherical harmonic basis functions of order $n$ and suborder $m$. It can be recognized that the term in square brackets is a frequency-domain representation of the signal (i.e., $S(\omega, r_r, \theta_r, \varphi_r)$), which can be approximated by various time-frequency transformations, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), or a wavelet transform. Other examples of hierarchical sets include sets of wavelet transform coefficients and other sets of coefficients of multiresolution basis functions.
The SHC $A_n^m(k)$ can either be physically acquired (e.g., recorded) by various microphone array configurations or, alternatively, they can be derived from channel-based or object-based descriptions of the soundfield. The SHC represent scene-based audio, where the SHC may be input to an audio encoder to obtain encoded SHC that may promote more efficient transmission or storage. For example, a fourth-order representation involving $(1+4)^2$ (25, and hence fourth order) coefficients may be used.
As noted above, the SHC may be derived from a microphone recording using a microphone array. Various examples of how SHC may be derived from microphone arrays are described in Poletti, M., “Three-Dimensional Surround Sound Systems Based on Spherical Harmonics,” J. Audio Eng. Soc., Vol. 53, No. 11, 2005 November, pp. 1004-1025.
To illustrate how the SHCs may be derived from an object-based description, consider the following equation. The coefficients $A_n^m(k)$ for the soundfield corresponding to an individual audio object may be expressed as:

$$A_n^m(k) = g(\omega)\,(-4\pi i k)\, h_n^{(2)}(kr_s)\, Y_n^{m*}(\theta_s, \varphi_s),$$

where $i$ is $\sqrt{-1}$, $h_n^{(2)}(\cdot)$ is the spherical Hankel function (of the second kind) of order $n$, and $\{r_s, \theta_s, \varphi_s\}$ is the location of the object. Knowing the object source energy $g(\omega)$ as a function of frequency (e.g., using time-frequency analysis techniques, such as performing a fast Fourier transform on the PCM stream) enables conversion of each PCM object and the corresponding location into the SHC $A_n^m(k)$. Further, it can be shown (since the above is a linear and orthogonal decomposition) that the $A_n^m(k)$ coefficients for each object are additive. In this manner, a multitude of PCM objects can be represented by the $A_n^m(k)$ coefficients (e.g., as a sum of the coefficient vectors for the individual objects). Essentially, the coefficients contain information about the soundfield (the pressure as a function of 3D coordinates), and the above represents the transformation from individual objects to a representation of the overall soundfield, in the vicinity of the observation point $\{r_r, \theta_r, \varphi_r\}$.
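As an illustrative aside (not part of the disclosed implementations), the per-object coefficients may be evaluated numerically with standard special-function routines. The following Python sketch assumes SciPy's spherical-harmonic conventions (azimuthal angle passed before polar angle) and SciPy's normalization; a real system would need to confirm that these conventions match its own:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, sph_harm

def source_shc(n: int, m: int, g_omega: complex, k: float,
               r_s: float, theta_s: float, phi_s: float) -> complex:
    """Evaluate A_n^m(k) for a point source with energy g(omega)
    at {r_s, theta_s, phi_s} (theta_s polar, phi_s azimuthal)."""
    # Spherical Hankel function of the second kind: h_n^(2) = j_n - i*y_n.
    h2 = spherical_jn(n, k * r_s) - 1j * spherical_yn(n, k * r_s)
    # Complex-conjugated spherical harmonic at the source direction;
    # SciPy's sph_harm takes the azimuthal angle first.
    y_conj = np.conj(sph_harm(m, n, phi_s, theta_s))
    return g_omega * (-4j * np.pi * k) * h2 * y_conj
```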
Referring to FIG. 1, a particular illustrative implementation of a system configured to apply noise reduction to ambisonic signals is depicted and generally designated 100.
The system 100 includes a microphone array 102 configured to provide audio data 104 to a processor 101 that includes an ambisonics conversion circuit 110. Ambisonic signals 112 corresponding to the audio data 104 are output by the ambisonics conversion circuit 110 and provided to a noise reduction block 120 in the processor 101. Noise reduced ambisonics signals 130 are output by the noise reduction block 120 and correspond to noise-reduced versions of the ambisonic signals 112.
The microphone array 102 includes multiple microphones configured to capture the audio data 104. For example, the microphone array 102 may have a spherical microphone array configuration, such as an Eigenmike or Zylia spherical array. In other examples, the microphone array 102 has another configuration, such as a linear array configuration, a tetrahedral configuration, or any other regular or non-regular configuration. The microphone array 102 may include any number of microphones, such as four microphones, eight microphones, or 32 microphones, as illustrative, non-limiting examples.
The ambisonic signals 112 include a first set 114 corresponding to a zero-order ambisonic signal (e.g., a W signal), a second set 115 corresponding to first order ambisonic signals (e.g., X, Y, and Z signals), a third set 116 corresponding to second order ambisonic signals, and one or more additional sets including a set 117 corresponding to N-th order ambisonic signals (where N is an integer greater than 2). The noise reduced ambisonics signals 130 include a first set 134 corresponding to a noise-reduced version of the first set 114 (e.g., a W signal), a second set 135 corresponding to a noise-reduced version of the second set 115, a third set 136 corresponding to a noise-reduced version of the third set 116, and one or more additional sets including a set 137 corresponding to a noise-reduced version of the set 117.
The noise reduction block 120 includes a frequency-domain vector-type noise subtraction circuit 124 configured to process the first set 114 to generate the noise-reduced first set 134, a frequency-domain vector-type noise subtraction circuit 125 configured to process the second set 115 to generate the noise-reduced second set 135, a frequency-domain vector-type noise subtraction circuit 126 configured to process the third set 116 to generate the noise-reduced third set 136, and one or more frequency-domain vector-type noise subtraction circuits including a frequency-domain vector-type noise subtraction circuit 127 configured to process the set 117 to generate the noise-reduced set 137.
The noise reduction block 120 is configured to process each order of the ambisonic signals 112 independently of the other orders of the ambisonic signals 112. Noise data 142 is stored in a memory 140 that is coupled to the processor 101. As explained in further detail with reference to FIG. 2, the noise data 142 includes a noise power value for each ambisonic order and each frequency.
To illustrate, in an implementation in which N=4, the frequency-domain vector-type noise subtraction circuit 124 performs noise reduction on ambisonic signals of order 0, the frequency-domain vector-type noise subtraction circuit 125 performs noise reduction on ambisonic signals of order 1, the frequency-domain vector-type noise subtraction circuit 126 performs noise reduction on ambisonic signals of order 2, another frequency-domain vector-type noise subtraction circuit (not shown) performs noise reduction on ambisonic signals of order 3, and the frequency-domain vector-type noise subtraction circuit 127 performs noise reduction on ambisonic signals of order 4. In such an implementation with N=4, each of the five active frequency-domain vector-type noise subtraction circuits operates independently of, and in parallel with, each other to generate the noise-reduced ambisonics signals 130. In another implementation in which N=8, the noise reduction block 120 includes nine active frequency-domain vector-type noise subtraction circuits, and each of the nine active frequency-domain vector-type noise subtraction circuits operates independently of, and in parallel with, each other to generate the noise-reduced ambisonics signals 130. Although examples are described with N=4 and N=8, it should be understood that N may be any integer greater than 1.
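Per-order grouping of ambisonic channels can be sketched in a few lines of Python. This hypothetical helper (illustrative only, not part of the disclosed system) assumes ambisonic channel numbering (ACN), in which order n occupies channel indices n² through (n+1)²−1; the disclosure does not mandate a particular channel ordering:

```python
import numpy as np

def split_by_order(hoa: np.ndarray, max_order: int) -> list:
    """Split an HOA signal of shape (num_channels, ...) into per-order
    groups, assuming ACN ordering: order n occupies channel indices
    n**2 through (n + 1)**2 - 1."""
    assert hoa.shape[0] == (max_order + 1) ** 2
    return [hoa[n**2:(n + 1)**2] for n in range(max_order + 1)]

# A fourth-order (N=4) signal has (1+4)**2 = 25 channels; the per-order
# groups have 1, 3, 5, 7, and 9 channels for orders 0 through 4.
sets = split_by_order(np.zeros((25, 1024)), max_order=4)
print([s.shape[0] for s in sets])  # [1, 3, 5, 7, 9]
```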
As described in further detail with regard to FIG. 2, the noise data 142 may be generated by measuring per-order noise power during silence. Referring to FIG. 2, an implementation 200 of components configured to generate noise data for a particular ambisonic order is shown.
The implementation 200 includes multiple power computation circuits 210-214. Each of the power computation circuits 210-214 is configured to receive frequency-domain noise samples 202 from a respective signal (or “channel”) of the multiple signals or channels that correspond to a particular ambisonic order. For example, for first-order (n=1) noise determination, noise power is determined for each of the three first-order ambisonic channels; for second-order (n=2) noise determination, noise power is determined for each of the five second-order ambisonic channels; for third-order (n=3) noise determination, noise power is determined for each of the seven third-order ambisonic channels; for fourth-order (n=4) noise determination, noise power is determined for each of the nine fourth-order ambisonic channels; etc. Each power computation circuit 210-214 is configured to generate a channel noise power value based on a square average of the received samples 202 for that channel. Channel noise power values from each of the power computation circuits 210-214 are summed at an adder 220. A square root circuit 222 is configured to perform a square root operation on the output of the adder 220 to generate a noise power value 224 for the particular ambisonic order.
For example, a noise-only higher-order ambisonic (HOA) signal for the microphone array 102 of FIG. 1 may be captured during silence, and the noise power at each order (e.g., rms noise power $N$ at order $n$ and frequency $f$) may be determined as:

$$N_n(f) = \sqrt{\sum_{m=-n}^{n} \beta_{nm}^2(f)},$$

where $\beta_{nm}^2(f)$ represents a noise power of the $m$-th sub-order of the $n$-th order ambisonic signal (e.g., an output from one of the power computation circuits 210-214). The values of $N_n(f)$ for all values of $n$ and $f$ may be stored as noise data for use during noise reduction operations, as described further with reference to FIG. 3.
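A minimal sketch of this noise measurement, assuming the noise-only recording is available as STFT frames and that the square average is taken over time frames (an assumption; the averaging window is not specified above):

```python
import numpy as np

def order_noise_power(noise_stft: np.ndarray) -> np.ndarray:
    """Compute N_n(f) for one ambisonic order from a noise-only recording.

    noise_stft: complex STFT of the order's channels, with shape
    (2n + 1, num_frames, num_bins). Returns N_n(f), shape (num_bins,).
    """
    # beta^2_{nm}(f): square average of each channel's samples over time
    # (the per-channel output of a power computation circuit).
    beta_sq = np.mean(np.abs(noise_stft) ** 2, axis=1)
    # Sum over the sub-orders m = -n..n (the adder), then take the
    # square root (the square root circuit).
    return np.sqrt(beta_sq.sum(axis=0))
```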
The components 300 include multiple power computation circuits 310-314. Each of the power computation circuits 310-314 is configured to receive frequency-domain samples from a respective signal (channel) of the multiple signals 302 that correspond to a particular ambisonic order. For example, for first-order (n=1) power determination, signal power is determined for each of the three first-order ambisonic channels; for second-order (n=2) power determination, signal power is determined for each of the five second-order ambisonic channels. Each power computation circuit 310-314 is configured to generate a channel signal power value based on a square average of the received samples for that channel of the signals 302. Channel signal power values from each of the power computation circuits 310-314 are summed at an adder 320. A square root circuit 322 is configured to perform a square root operation on the output of the adder 320 to generate a total power value (total_order_power) 324 for the particular ambisonic order. In a particular implementation, the power computation circuits 310-314, the adder 320, and the square root circuit 322 correspond to the power computation circuits 210-214, the adder 220, and the square root circuit 222, respectively, of FIG. 2.
In a particular implementation, the signal power at each order (e.g., rms power $P$ at order $n$ and frequency $f$) is determined as:

$$P_n(f) = \sqrt{\sum_{m=-n}^{n} \alpha_{nm}^2(f)},$$

where $\alpha_{nm}^2(f)$ represents a signal power of the $m$-th channel of the $n$-th order ambisonic signal (e.g., an output from one of the power computation circuits 310-314).
A gain computation circuit 334 is configured to receive the noise power value (noise_power) 224 (e.g., Nn(f)) and the total power value (total_order_power) 324 (e.g., Pn(f)) and to compute a gain factor 336 based on the noise power and the total power. In a particular example, the gain factor 336 is determined based on a difference of the first total power and the first noise power, as compared to the first total power, such as gain=(total_order_power−noise_power)/(total_order_power).
In some implementations, the gain computation circuit 334 is configured to apply a smoothing parameter to a previous gain factor. The previous gain factor is based on previous frequency samples of each ambisonic signal in the set of ambisonic signals for a particular ambisonic order and at the particular frequency, such as:

$$g_n^t(f) = \gamma\, g_n^{t-1}(f) + (1 - \gamma)\, \frac{P_n(f) - \delta\, N_n(f)}{P_n(f)},$$

where $g_n^t(f)$ represents the gain factor 336 at frequency $f$ for a set of samples corresponding to a time frame $t$ and for ambisonic order $n$, $\delta$ represents an aggressiveness parameter (how much of the noise power to subtract) that has a value between 0 and 1 and can vary with frequency, $\gamma$ represents a smoothing parameter that affects how quickly the gain changes over time, and $g_n^{t-1}(f)$ represents the previous gain factor at frequency $f$ for a set of samples corresponding to a time frame $t-1$ that precedes time frame $t$ and for ambisonic order $n$.
A scaling circuit 330 is configured to scale the samples of each of the ambisonic signals 302 of the order based on the gain factor 336. In an example, each of the signals 302 may be multiplied by the gain factor 336 to generate noise-subtracted signals 332. In a particular example, scaling the samples of each ambisonic signal 302 in each of the sets 114, 115, 116, and 117, based on the particular gain factor 336 that is computed for that set 114, 115, 116, or 117, reduces noise without distorting directional information corresponding to that set 114, 115, 116, or 117.
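The per-order gain computation and scaling may be sketched as follows. This is a minimal Python sketch, not the disclosed circuitry: it assumes STFT-domain processing and floors the subtraction at zero, which the description above does not specify:

```python
import numpy as np

def reduce_order_noise(frame: np.ndarray, noise_power: np.ndarray,
                       prev_gain: np.ndarray, delta: float = 1.0,
                       gamma: float = 0.9):
    """Apply one order's noise reduction to one STFT frame.

    frame: complex frequency samples of the order's channels,
           shape (2n + 1, num_bins).
    noise_power: stored N_n(f), shape (num_bins,).
    prev_gain: g_n^{t-1}(f), shape (num_bins,).
    Returns (scaled frame, updated gain g_n^t(f)).
    """
    # Total order power P_n(f): square root of the summed channel powers.
    total = np.sqrt(np.sum(np.abs(frame) ** 2, axis=0))
    total = np.maximum(total, 1e-12)             # guard against silence
    # Spectral-subtraction-style gain, floored at zero (the floor is an
    # assumption; negative-value handling is not stated above).
    raw_gain = np.maximum(total - delta * noise_power, 0.0) / total
    # Temporal smoothing so the gain changes gradually across frames.
    gain = gamma * prev_gain + (1.0 - gamma) * raw_gain
    # One gain per frequency scales every channel of the order alike,
    # preserving the inter-channel ratios that carry direction information.
    return frame * gain[np.newaxis, :], gain
```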
In conjunction with the implementations depicted in FIGS. 1-3, the processor 101 may perform order-specific noise reduction as described in the following illustrative example, in which the first particular ambisonic order is order n=1 (corresponding to the set 115) and the second particular ambisonic order is order n=2 (corresponding to the set 116).
The processor 101 is configured to perform a first noise reduction operation that includes applying a first gain factor to each ambisonic signal in the first set (set 115). The first gain factor is based on the noise data 142. To illustrate, the frequency-domain vector-type noise subtraction circuit 125 includes a copy of the scaling circuit 330 of FIG. 3 that scales each ambisonic signal in the set 115 based on the first gain factor (e.g., the gain factor 336 computed for order n=1 based on the noise data 142).
The processor 101 is also configured to perform a second noise reduction operation that includes applying a second gain factor to each ambisonic signal in the second set (set 116). The second gain factor for order n=2 is based on the noise data 142 and is distinct from the first gain factor for order n=1. To illustrate, the frequency-domain vector-type noise subtraction circuit 126 includes a copy of the scaling circuit 330 of FIG. 3 that scales each ambisonic signal in the set 116 based on the second gain factor (e.g., the gain factor 336 computed for order n=2 based on the noise data 142).
In some implementations, the processor 101 is configured to calculate the first frequency-based noise data by determining a power of each channel of the first particular ambisonic order (order n=1) during silence, using the power computation circuits 210-214 of FIG. 2, and by determining a first noise power (e.g., the noise power value 224 for order n=1) based on a sum of the determined powers.
In some implementations, the processor 101 is further configured to perform the first noise reduction operation using a copy of the components 300 in the frequency-domain vector-type noise subtraction circuit 125 by determining a first total power (total power value 324 at order n=1) based on a sum of powers of the ambisonic signals 302 in the first set (set 115) at the particular frequency, determining the first gain factor 336 based on the first noise power 224 (at order n=1) and the total power value 324, and scaling samples of each ambisonic signal in the first set 115 at the particular frequency based on the first gain factor 336.
Although in the example described above, the first particular ambisonic order is order n=1 and the second particular ambisonic order is order n=2, in other examples the first particular ambisonic order and the second particular ambisonic order correspond to different ambisonic orders. In an illustrative example, the first particular ambisonic order corresponds to a third order (n=3), and the channels of the first particular ambisonic order correspond to third order ambisonic channels.
In some implementations, the processor 101 is configured to receive, via a user interface, one or more user inputs corresponding to parameters of at least one of the first noise reduction operation or the second noise reduction operation. For example, the processor 101 may be incorporated in a device that includes a display screen and may be configured to generate the user interface for display at the display screen, such as described in further detail with reference to FIG. 5.
Referring to FIG. 4A, a system 400 operable to perform noise reduction of ambisonic signals is shown.
The microphone array 102 includes a microphone 412, a microphone 414, a microphone 416, and a microphone 418. According to one implementation, at least one microphone 412, 414, 416, 418 is an omnidirectional microphone. For example, at least one microphone 412, 414, 416, 418 is configured to capture sound with approximately equal gain for all sides and directions. According to one implementation, at least one of the microphones 412, 414, 416, 418 is a microelectromechanical system (MEMS) microphone.
In some implementations, the microphones 412, 414, 416, 418 are positioned in a tetrahedral configuration. However, it should be understood that the microphones 412, 414, 416, 418 may be arranged in different configurations (e.g., a spherical configuration, such as an Eigenmike or Zylia spherical array, a triangular configuration, a random configuration, etc.). Although the microphone array 102 is shown to include four microphones, in other implementations, the microphone array 102 may include fewer than four microphones or more than four microphones. For example, the microphone array 102 may include three microphones, eight microphones, or any other number of microphones.
The system 400 also includes signal processing circuitry that is coupled to the microphone array 102. The signal processing circuitry includes a signal processor 420, a signal processor 422, a signal processor 424, and a signal processor 426. The signal processing circuitry is configured to perform signal processing operations on analog signals captured by each microphone 412, 414, 416, 418 to generate digital signals.
To illustrate, the microphone 412 is configured to capture an analog signal 413, the microphone 414 is configured to capture an analog signal 415, the microphone 416 is configured to capture an analog signal 417, and the microphone 418 is configured to capture an analog signal 419. The signal processor 420 is configured to perform first signal processing operations (e.g., filtering operations, gain adjustment operations, analog-to-digital conversion operations) on the analog signal 413 to generate a digital signal 433. In a similar manner, the signal processor 422 is configured to perform second signal processing operations on the analog signal 415 to generate a digital signal 435, the signal processor 424 is configured to perform third signal processing operations on the analog signal 417 to generate a digital signal 437, and the signal processor 426 is configured to perform fourth signal processing operations on the analog signal 419 to generate a digital signal 439. Each signal processor 420, 422, 424, 426 includes an analog-to-digital converter (ADC) 421, 423, 425, 427, respectively, to perform the analog-to-digital conversion operations. According to one implementation, the ADCs 421, 423, 425, 427 are integrated into a coder/decoder (CODEC). According to another implementation, the ADCs 421, 423, 425, 427 are stand-alone ADCs. According to yet another implementation, the ADCs 421, 423, 425, 427 are included in the microphone array 102. Thus, in some scenarios, the microphone array 102 may generate the digital signals 433, 435, 437, 439.
Each digital signal 433, 435, 437, 439 is provided to one or more directivity adjusters 450 of the processor 101. In FIG. 4A, the one or more directivity adjusters 450 include a directivity adjuster 452 and a directivity adjuster 454.
The microphone analyzer 440 is coupled to the microphone array 102 via a control bus 446, and the microphone analyzer 440 is coupled to the directivity adjusters 450 and the filters 470 via a control bus 447. In some implementations, the microphone analyzer 440 is configured to determine position information 441 for each microphone of the microphone array 102, orientation information 442 for each microphone of the microphone array 102, and power level information 443 for each microphone of the microphone array 102. Based on the position information 441, the orientation information 442, and the power level information 443, the processor 101 selects a number of directivity adjusters 450 to activate, sets of multiplicative factors 453 and 455 to be used at the active directivity adjusters 450, one or more sets of the filters 471-478 to activate, and filter coefficients 459 for each of the activated filters 471-478.
The microphone analyzer 440 enables the processor 101 to compensate for flexible positioning of the microphones (e.g., a “non-ideal” tetrahedral microphone arrangement) by adjusting the number of active directivity adjusters 450, filters 470, multiplicative factors 453, 455, and filter coefficients 457, 459 based on the position of the microphones, the orientation of the microphones, etc. The directivity adjusters 450 and the filters 470 apply different transfer functions to the digital signals 433, 435, 437, 439 based on the placement and directivity of the microphones 412, 414, 416, 418.
The microphone analyzer 440 also includes a noise measurement circuit 408 configured to generate the noise data 142. For example, the noise measurement circuit 408 may include the components illustrated in FIG. 2.
The directivity adjuster 452 may be configured to apply the first set of multiplicative factors 453 to the digital signals 433, 435, 437, 439 to generate a first set of ambisonic signals 461-464. For example, the directivity adjuster 452 may apply the first set of multiplicative factors 453 to the digital signals 433, 435, 437, 439 using a first matrix multiplication. The first set of ambisonic signals includes a W signal 461, an X signal 462, a Y signal 463, and a Z signal 464.
The directivity adjuster 454 may be configured to apply the second set of multiplicative factors 455 to the digital signals 433, 435, 437, 439 to generate a second set of ambisonic signals 465-468. For example, the directivity adjuster 454 may apply the second set of multiplicative factors 455 to the digital signals 433, 435, 437, 439 using a second matrix multiplication. The second set of ambisonic signals includes a W signal 465, an X signal 466, a Y signal 467, and a Z signal 468. In other implementations, the first and second sets of ambisonic signals include higher-order ambisonic signals (e.g., n=2, n=3, etc.).
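For an idealized first-order tetrahedral capture, one such matrix multiplication can be sketched as follows. The numeric factors below are the classic conversion for ideal capsule placement and are illustrative assumptions; in the disclosed system, the multiplicative factors 453 and 455 are instead selected based on the measured position information 441 and orientation information 442:

```python
import numpy as np

# Hypothetical first-order multiplicative factors for an idealized
# tetrahedral array with capsules at left-front-up (LFU),
# right-front-down (RFD), left-back-down (LBD), and right-back-up (RBU).
FACTORS = 0.5 * np.array([
    [1,  1,  1,  1],   # W
    [1,  1, -1, -1],   # X
    [1, -1,  1, -1],   # Y
    [1, -1, -1,  1],   # Z
])

def directivity_adjust(digital_signals: np.ndarray) -> np.ndarray:
    """Apply the factors as a single matrix multiplication.

    digital_signals: shape (4, num_samples), one row per microphone.
    Returns the W, X, Y, Z ambisonic signals, shape (4, num_samples).
    """
    return FACTORS @ digital_signals
```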
The first set of filters 471-474 are configured to filter the first set of ambisonic signals 461-464 to generate a filtered first set of ambisonic signals 481-484. The second set of filters 475-478 are configured to filter the second set of ambisonic signals 465-468 to generate a filtered second set of ambisonic signals 485-488.
The system 400 also includes combination circuitry 495-498 coupled to the first set of filters 471-474 and to the second set of filters 475-478. The combination circuitry 495-498 is configured to combine the filtered first set of ambisonic signals 481-484 and the filtered second set of ambisonic signals 485-488 to generate a processed set of ambisonic signals 491-494. In an example, the ambisonic signal 491 corresponds to the set 114 of FIG. 1, and the ambisonic signals 492-494 correspond to the set 115 of FIG. 1.
The ambisonic signals and the noise data 142 are provided to the noise reduction block 120 to generate the noise reduced ambisonics signals 130, such as described with reference to FIG. 1.
Referring to FIG. 4B, a system 401 operable to perform noise reduction of ambisonic signals is shown. The digital signals 433-439 are processed at a matrix multiplier 402 that is configured to perform multiplication operations using a set of multiplicative factors 403 to generate the ambisonic signals 461-464. In some implementations, the matrix multiplier 402 corresponds to the ambisonics conversion circuit 110 of FIG. 1.
As compared to the system 400 of FIG. 4A, the system 401 generates the ambisonic signals 461-464 using a single matrix multiplication operation.
The system 400 and the system 401 provide illustrative, non-limiting examples of systems that include the noise reduction block 120 of FIG. 1.
Thus, FIG. 4A and FIG. 4B illustrate examples of systems in which per-order noise reduction is performed on ambisonic signals generated from microphone array signals.
Referring to FIG. 5, an example of a device that includes a screen 510 is shown.
A user interface 520 may be displayed on the screen 510, such as a touch screen, to enable a user to provide user input corresponding to a noise reduction operation. For example, in some implementations the user interface 520 enables the user to select or adjust values of the aggressiveness parameter δ, the smoothing parameter γ, or a combination thereof, that are described with reference to FIG. 3.
In another example, the user input indicates a playback system. For example, in some implementations the user interface 520 enables the user to indicate a loudspeaker configuration that is to be used for playback. Based on the loudspeaker configuration, one or more aspects of noise reduction may be adjusted. For example, if the user input indicates a loudspeaker configuration that uses a relatively low number of channels, such as 5.1, binaural, or 7.1.4, a parameter for noise reduction is selected based on the playback system. For binaural playback, a less aggressive noise reduction may be used since playback is over two channels. As more channels are added, more aggressive noise reduction can be applied. Thus, the user interface 520 enables user selection of aggressiveness of noise reduction. In some implementations, aggressiveness of noise reduction (e.g., a value of the aggressiveness parameter δ) is automatically adjusted based on playback history instead of via direct user input via the user interface 520. For example, if a history of playback is primarily binaural, then noise reduction can be automatically set to be less aggressive as compared to a history of playback primarily on 5.1 systems. In some implementations where noise reduction aggressiveness is automatically selected based on a history of playback speaker configurations, the user interface 520 is omitted, while in other such implementations the user interface 520 is included and enables a user to override an automatically-selected aggressiveness setting.
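A hypothetical sketch of such a selection follows; the specific δ values are illustrative assumptions, not values given by this disclosure:

```python
# Hypothetical mapping from a reported playback configuration to the
# aggressiveness parameter delta; the values are illustrative only.
PLAYBACK_DELTA = {
    "binaural": 0.4,  # two channels: less aggressive noise reduction
    "5.1":      0.6,
    "7.1.4":    0.8,  # more channels: more aggressive noise reduction
}

def select_delta(playback_config: str, default: float = 0.5) -> float:
    return PLAYBACK_DELTA.get(playback_config, default)
```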
In some implementations, power savings may be obtained using a field of view (FOV) or region-based system based on a camera, such as a camera of an optical wearable device.
Referring to FIG. 7, a method 700 of reducing noise in ambisonic signals is shown.
The method 700 includes performing signal processing operations on signals captured by microphones in a microphone array to generate ambisonic signals, at 702. The ambisonic signals include multiple sets of ambisonic signals including a first set corresponding to a first particular ambisonic order and a second set corresponding to a second particular ambisonic order. In a particular example, the first particular ambisonic order is order=1 and the second particular ambisonic order=2. In another example, the first particular ambisonic order corresponds to order=4 and the second particular ambisonic order corresponds to order=6. In general, the method 700 is implemented with the first particular ambisonic order corresponding to order=‘A’ and the second particular ambisonic order corresponding to order=‘B’, where A and B are any positive integers, A not equal to B.
The method 700 includes, at 704, performing a first noise reduction operation that includes applying a first gain factor to each ambisonic signal in the first set. The first gain factor is based on noise data corresponding to the microphones. The method 700 also includes, at 706, performing a second noise reduction operation that includes applying a second gain factor to each ambisonic signal in the second set. The second gain factor is based on the noise data and distinct from the first gain factor.
Although the method 700 is described as including two noise reduction operations, it should be understood that any number of noise reduction operations may be performed. To illustrate, in an implementation in which the generated ambisonic signals include signals of order 0, 1, 2, 3, and 4, such as illustrated in FIG. 1, five noise reduction operations may be performed, each applying a gain factor that is based on the noise data for a respective ambisonic order.
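Tying the earlier sketches together, one frame of such a method might be processed as follows; this illustrative Python sketch reuses the hypothetical split_by_order and reduce_order_noise helpers sketched above and again assumes ACN channel ordering:

```python
import numpy as np

def method_700_frame(hoa_frame, noise_data, prev_gains, max_order=4):
    """Process one ACN-ordered STFT frame of shape ((max_order+1)**2, num_bins).

    noise_data[n] holds N_n(f) and prev_gains[n] holds g_n^{t-1}(f)
    for ambisonic order n.
    """
    outputs, gains = [], []
    for n, channels in enumerate(split_by_order(hoa_frame, max_order)):
        # One noise reduction operation per order, each applying a gain
        # factor derived from the noise data for that order alone.
        scaled, gain = reduce_order_noise(channels, noise_data[n], prev_gains[n])
        outputs.append(scaled)
        gains.append(gain)
    return np.concatenate(outputs, axis=0), gains
```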
Performing noise reduction by applying separate gain values to different orders of ambisonic signal reduces distortion of direction information and reduces sound quality artefacts as compared to performing noise reduction at the microphones, at the loudspeakers, or both.
Referring to FIG. 8, a block diagram of a particular illustrative implementation of a device 800 is shown.
In a particular implementation, the device 800 includes a processor 806, such as a central processing unit (CPU) or a digital signal processor (DSP), coupled to a memory 853. The memory 853 includes instructions 860 (e.g., executable instructions) such as computer-readable instructions or processor-readable instructions. The instructions 860 may include one or more instructions that are executable by a computer, such as the processor 806 or a processor 810, to perform operations in accordance with the method 700 of FIG. 7.
A transceiver 811 may be coupled to the processor 810 and to an antenna 842, such that wireless data received via the antenna 842 and the transceiver 811 may be provided to the processor 810. In some implementations, the processor 810, the display controller 826, the memory 853, the CODEC 834, and the transceiver 811 are included in a system-in-package or system-on-chip device 822. In some implementations, an input device 830 and a power supply 844 are coupled to the system-on-chip device 822. Moreover, in a particular implementation, as illustrated in FIG. 8, the input device 830, the power supply 844, and the antenna 842 are external to the system-on-chip device 822 and are each coupled to a component of the system-on-chip device 822, such as an interface or a controller.
The device 800 may include a headset, a mobile communication device, a smart phone, a cellular phone, a laptop computer, a computer, a tablet, a personal digital assistant, a display device, a television, a gaming console, a music player, a radio, a digital video player, a digital video disc (DVD) player, a tuner, a camera, a navigation device, a vehicle, a component of a vehicle, or any combination thereof, as illustrative, non-limiting examples.
In an illustrative implementation, the memory 853 may include or correspond to a non-transitory computer readable medium storing the instructions 860. The instructions 860 may include one or more instructions that are executable by a computer, such as the processors 810, 806 or the CODEC 834. The instructions 860 may cause the processor 810 to perform one or more operations described herein, including but not limited to one or more portions of the method 700 of FIG. 7.
In a particular implementation, one or more components of the systems and devices disclosed herein may be integrated into a decoding system or apparatus (e.g., an electronic device, a CODEC, or a processor therein), into an encoding system or apparatus, or both. In other implementations, one or more components of the systems and devices disclosed herein may be integrated into a wireless telephone, a tablet computer, a desktop computer, a laptop computer, a set top box, a music player, a video player, an entertainment unit, a television, a game console, a navigation device, a communication device, a personal digital assistant (PDA), a fixed location data unit, a personal media player, or another type of device.
In conjunction with the described techniques, an apparatus includes means for storing noise data corresponding to microphones in a microphone array, such as the memory 140. The apparatus also includes means for performing signal processing operations on signals captured by microphones in the microphone array to generate multiple sets of ambisonic signals, such as the ambisonics conversion circuit 110. The multiple sets of ambisonic signals include a first set corresponding to a first particular ambisonic order and a second set corresponding to a second particular ambisonic order.
The apparatus also includes means for performing a first noise reduction operation that includes applying a first gain factor to each ambisonic signal in the first set, the first gain factor based on the noise data, such as the frequency-domain vector-type noise subtraction circuit 124, 125, 126, or 127, one or more of the components 300 of FIG. 3, or a combination thereof. The apparatus also includes means for performing a second noise reduction operation that includes applying a second gain factor to each ambisonic signal in the second set, the second gain factor based on the noise data and distinct from the first gain factor, such as the frequency-domain vector-type noise subtraction circuit 124, 125, 126, or 127, one or more of the components 300 of FIG. 3, or a combination thereof.
The foregoing techniques may be performed with respect to any number of different contexts and audio ecosystems. A number of example contexts are described below, although the techniques should not be limited to the example contexts. One example audio ecosystem may include audio content, movie studios, music studios, gaming audio studios, channel based audio content, coding engines, game audio stems, game audio coding/rendering engines, and delivery systems.
The movie studios, the music studios, and the gaming audio studios may receive audio content. In some examples, the audio content may represent the output of an acquisition. The movie studios may output channel based audio content (e.g., in 2.0, 5.1, and 7.1) such as by using a digital audio workstation (DAW). The music studios may output channel based audio content (e.g., in 2.0, and 5.1) such as by using a DAW. In either case, the coding engines may receive and encode the channel based audio content based on one or more codecs (e.g., AAC, AC3, Dolby True HD, Dolby Digital Plus, and DTS Master Audio) for output by the delivery systems. The gaming audio studios may output one or more game audio stems, such as by using a DAW. The game audio coding/rendering engines may code and/or render the audio stems into channel based audio content for output by the delivery systems. Another example context in which the techniques may be performed includes an audio ecosystem that may include broadcast recording audio objects, professional audio systems, consumer on-device capture, HOA audio format, on-device rendering, consumer audio, TV, and accessories, and car audio systems.
The broadcast recording audio objects, the professional audio systems, and the consumer on-device capture may all code their output using HOA audio format. In this way, the audio content may be coded using the HOA audio format into a single representation that may be played back using the on-device rendering, the consumer audio, TV, and accessories, and the car audio systems. In other words, the single representation of the audio content may be played back at a generic audio playback system (i.e., as opposed to requiring a particular configuration such as 5.1, 7.1, etc.).
Other examples of contexts in which the techniques may be performed include an audio ecosystem that may include acquisition elements and playback elements. The acquisition elements may include wired and/or wireless acquisition devices (e.g., Eigen microphones), on-device surround sound capture, and mobile devices (e.g., smartphones and tablets). In some examples, wired and/or wireless acquisition devices may be coupled to the mobile device via wired and/or wireless communication channel(s).
In accordance with one or more techniques of this disclosure, the mobile device may be used to acquire a soundfield. For instance, the mobile device may acquire a soundfield via the wired and/or wireless acquisition devices and/or the on-device surround sound capture (e.g., a plurality of microphones integrated into the mobile device). The mobile device may then code the acquired soundfield into the HOA coefficients for playback by one or more of the playback elements. For instance, a user of the mobile device may record (acquire a soundfield of) a live event (e.g., a meeting, a conference, a play, a concert, etc.), and code the recording into HOA coefficients.
The mobile device may also utilize one or more of the playback elements to playback the HOA coded soundfield. For instance, the mobile device may decode the HOA coded soundfield and output a signal to one or more of the playback elements that causes the one or more of the playback elements to recreate the soundfield. As one example, the mobile device may utilize the wired and/or wireless communication channels to output the signal to one or more speakers (e.g., speaker arrays, sound bars, etc.). As another example, the mobile device may utilize docking solutions to output the signal to one or more docking stations and/or one or more docked speakers (e.g., sound systems in smart cars and/or homes). As another example, the mobile device may utilize headphone rendering to output the signal to a set of headphones, e.g., to create realistic binaural sound.
In some examples, a particular mobile device may both acquire a 3D soundfield and playback the same 3D soundfield at a later time. In some examples, the mobile device may acquire a 3D soundfield, encode the 3D soundfield into HOA, and transmit the encoded 3D soundfield to one or more other devices (e.g., other mobile devices and/or other non-mobile devices) for playback.
Yet another context in which the techniques may be performed includes an audio ecosystem that may include audio content, game studios, coded audio content, rendering engines, and delivery systems. In some examples, the game studios may include one or more DAWs which may support editing of HOA signals. For instance, the one or more DAWs may include HOA plugins and/or tools which may be configured to operate with (e.g., work with) one or more game audio systems. In some examples, the game studios may output new stem formats that support HOA. In any case, the game studios may output coded audio content to the rendering engines which may render a soundfield for playback by the delivery systems.
The techniques may also be performed with respect to exemplary audio acquisition devices. For example, the techniques may be performed with respect to an Eigen microphone which may include a plurality of microphones that are collectively configured to record a 3D soundfield. In some examples, the plurality of microphones of the Eigen microphone may be located on the surface of a substantially spherical ball with a radius of approximately 4 cm.
Another exemplary audio acquisition context may include a production truck which may be configured to receive a signal from one or more microphones, such as one or more Eigen microphones. The production truck may also include an audio encoder.
The mobile device may also, in some instances, include a plurality of microphones that are collectively configured to record a 3D soundfield. In other words, the plurality of microphones may have X, Y, Z diversity. In some examples, the mobile device may include a microphone which may be rotated to provide X, Y, Z diversity with respect to one or more other microphones of the mobile device. The mobile device may also include an audio encoder.
Example audio playback devices that may perform various aspects of the techniques described in this disclosure are further discussed below. In accordance with one or more techniques of this disclosure, speakers and/or sound bars may be arranged in any arbitrary configuration while still playing back a 3D soundfield. Moreover, in some examples, headphone playback devices may be coupled to a decoder via either a wired or a wireless connection. In accordance with one or more techniques of this disclosure, a single generic representation of a soundfield may be utilized to render the soundfield on any combination of the speakers, the sound bars, and the headphone playback devices.
A number of different example audio playback environments may also be suitable for performing various aspects of the techniques described in this disclosure. For instance, a 5.1 speaker playback environment, a 2.0 (e.g., stereo) speaker playback environment, a 9.1 speaker playback environment with full height front loudspeakers, a 22.2 speaker playback environment, a 16.0 speaker playback environment, an automotive speaker playback environment, and a mobile device with ear bud playback environment may be suitable environments for performing various aspects of the techniques described in this disclosure.
In accordance with one or more techniques of this disclosure, a single generic representation of a soundfield may be utilized to render the soundfield on any of the foregoing playback environments. Additionally, the techniques of this disclosure enable a renderer to render a soundfield from a generic representation for playback on playback environments other than those described above. For instance, if design considerations prohibit proper placement of speakers according to a 7.1 speaker playback environment (e.g., if it is not possible to place a right surround speaker), the techniques of this disclosure enable a renderer to compensate with the other 6 speakers such that playback may be achieved on a 6.1 speaker playback environment.
Moreover, a user may watch a sports game while wearing headphones. In accordance with one or more techniques of this disclosure, the 3D soundfield of the sports game may be acquired (e.g., one or more Eigen microphones may be placed in and/or around the baseball stadium), HOA coefficients corresponding to the 3D soundfield may be obtained and transmitted to a decoder, the decoder may reconstruct the 3D soundfield based on the HOA coefficients and output the reconstructed 3D soundfield to a renderer, the renderer may obtain an indication as to the type of playback environment (e.g., headphones), and render the reconstructed 3D soundfield into signals that cause the headphones to output a representation of the 3D soundfield of the sports game.
It should be noted that various functions performed by the one or more components of the systems and devices disclosed herein are described as being performed by certain components. This division of components is for illustration only. In an alternate implementation, a function performed by a particular component may be divided amongst multiple components. Moreover, in an alternate implementation, two or more components may be integrated into a single component or module. Each component may be implemented using hardware (e.g., a field-programmable gate array (FPGA) device, an application-specific integrated circuit (ASIC), a DSP, a controller, etc.), software (e.g., instructions executable by a processor), or any combination thereof.
Those of skill would further appreciate that the various illustrative logical blocks, configurations, circuits, and algorithm steps described in connection with the implementations disclosed herein may be implemented as electronic hardware, computer software executed by a processing device such as a hardware processor, or combinations of both. Various illustrative components, blocks, configurations, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or executable software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the implementations disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in a memory device, such as random access memory (RAM), magnetoresistive random access memory (MRAM), spin-torque transfer MRAM (STT-MRAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, or a compact disc read-only memory (CD-ROM). An exemplary memory device is coupled to the processor such that the processor can read information from, and write information to, the memory device. In the alternative, the memory device may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a computing device or a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a computing device or a user terminal.
The previous description of the disclosed implementations is provided to enable a person skilled in the art to make or use the disclosed implementations. Various modifications to these implementations will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other implementations without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the implementations shown herein but is to be accorded the widest scope possible consistent with the principles and novel features as defined by the following claims.
The present application claims priority from U.S. Provisional Patent Application No. 62/737,711, filed Sep. 27, 2018, entitled “AMBISONIC SIGNAL NOISE REDUCTION FOR MICROPHONE ARRAYS,” which is incorporated by reference in its entirety.