The present disclosure is generally related to a microphone.
Advances in technology have resulted in smaller and more powerful computing devices. For example, there currently exist a variety of portable personal computing devices, including wireless telephones such as mobile and smart phones, tablets and laptop computers that are small, lightweight, and easily carried by users. These devices can communicate voice and data packets over wireless networks. Further, many such devices incorporate additional functionality such as a digital still camera, a digital video camera, a digital recorder, and an audio file player. Also, such devices can process executable instructions, including software applications, such as a web browser application, that can be used to access the Internet. As such, these devices can include significant computing capabilities.
Wireless devices may include microphone arrays. Each microphone array may include multiple microphones that capture surrounding audio in three-dimensional environments. However, activating each microphone in a microphone array may consume a relatively high amount of energy.
A higher-order ambisonics (HOA) signal (often represented by a plurality of spherical harmonic coefficients (SHC) or other hierarchical elements) is a three-dimensional representation of a sound field. The HOA signal, or SHC representation of the HOA signal, may represent the sound field in a manner that is independent of local speaker geometry used to play back a multi-channel audio signal rendered from the HOA signal. The HOA signal may also facilitate backwards compatibility because the HOA signal may be rendered to multi-channel formats, such as a 5.1 audio channel format or a 7.1 audio channel format.
In a particular implementation, a microphone device includes a microphone array configured to capture one or more audio objects associated with a three-dimensional sound field. The microphone array includes a first cluster and a second cluster. The first cluster includes a first set of two or more microphone elements and the second cluster includes a second set of two or more microphone elements. The microphone device also includes a processor coupled to the microphone array. The processor is configured to receive directionality information associated with a sound source. The processor is also configured to select a first microphone element configuration for the first cluster based on a condition, the directionality information, or both. Each microphone element of the first set of two or more microphone elements is deactivated in response to selection of the first microphone element configuration.
In another particular implementation, a method includes capturing, at a microphone array, one or more audio objects associated with a three-dimensional sound field. The microphone array includes a first cluster and a second cluster. The first cluster includes a first set of two or more microphone elements and the second cluster includes a second set of two or more microphone elements. The method also includes determining, at a processor, directionality information associated with a sound source. The method further includes selecting a first microphone element configuration for the first cluster based on a condition, the directionality information, or both. Each microphone element of the first set of two or more microphone elements is deactivated in response to selection of the first microphone element configuration.
In another particular implementation, a non-transitory computer-readable medium includes instructions that, when executed by a processor, cause the processor to perform operations including initiating capture, at a microphone array, of one or more audio objects associated with a three-dimensional sound field. The microphone array includes a first cluster and a second cluster. The first cluster includes a first set of two or more microphone elements and the second cluster includes a second set of two or more microphone elements. The operations also include determining directionality information associated with a sound source. The operations further include selecting a first microphone element configuration for the first cluster based on a condition, the directionality information, or both. Each microphone element of the first set of two or more microphone elements is deactivated in response to selection of the first microphone element configuration.
In another particular implementation, an apparatus includes means for capturing one or more audio objects associated with a three-dimensional sound field. The means for capturing includes a first cluster and a second cluster. The first cluster includes a first set of two or more microphone elements and the second cluster includes a second set of two or more microphone elements. The apparatus also includes means for determining directionality information associated with a sound source. The apparatus further includes means for selecting a first microphone element configuration for the first cluster based on a condition, the directionality information, or both. Each microphone element of the first set of two or more microphone elements is deactivated in response to selection of the first microphone element configuration.
In another particular implementation, a microphone device includes a microphone array configured to capture one or more audio objects associated with a three-dimensional sound field. The microphone array includes clusters of two or more microphone elements. Each cluster includes one or more acoustic port openings and two or more microphone elements coupled to the one or more acoustic port openings via corresponding acoustic ports. The microphone device also includes a processor coupled to the microphone array.
In another particular implementation, a method includes capturing, at a microphone array, one or more audio objects associated with a three-dimensional sound field. The microphone array includes clusters of two or more microphone elements. Each cluster includes one or more acoustic port openings and two or more microphone elements coupled to the one or more acoustic port openings via corresponding acoustic ports. The method also includes processing the one or more captured audio objects.
In another particular implementation, an apparatus includes means for capturing one or more audio objects associated with a three-dimensional sound field. The means for capturing includes clusters of two or more microphone elements. Each cluster includes one or more acoustic port openings and two or more microphone elements coupled to the one or more acoustic port openings via corresponding acoustic ports. The apparatus also includes means for processing the one or more captured audio objects.
In another particular implementation, a microphone device includes a microphone array configured to capture one or more audio objects associated with a three-dimensional sound field. The microphone array includes a first cluster of two or more microphone elements and a second cluster of two or more microphone elements. The microphone array also includes an acoustic port opening that is shared by the first cluster and the second cluster. The microphone device also includes a processor coupled to the microphone array.
Other implementations, advantages, and features of the present disclosure will become apparent after review of the entire application, including the following sections: Brief Description of the Drawings, Detailed Description, and the Claims.
Particular aspects of the present disclosure are described below with reference to the drawings. In the description, common features are designated by common reference numbers. As used herein, various terminology is used for the purpose of describing particular implementations only and is not intended to be limiting of implementations. For example, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It may be further understood that the terms “comprise,” “comprises,” and “comprising” may be used interchangeably with “include,” “includes,” or “including.” Additionally, it will be understood that the term “wherein” may be used interchangeably with “where.” As used herein, “exemplary” may indicate an example, an implementation, and/or an aspect, and should not be construed as limiting or as indicating a preference or a preferred implementation. As used herein, an ordinal term (e.g., “first,” “second,” “third,” etc.) used to modify an element, such as a structure, a component, an operation, etc., does not by itself indicate any priority or order of the element with respect to another element, but rather merely distinguishes the element from another element having a same name (but for use of the ordinal term). As used herein, the term “set” refers to one or more of a particular element, and the term “plurality” refers to multiple (e.g., two or more) of a particular element.
In the present disclosure, terms such as “determining,” “calculating,” “estimating,” “shifting,” “adjusting,” etc. may be used to describe how one or more operations are performed. It should be noted that such terms are not to be construed as limiting and other techniques may be utilized to perform similar operations. Additionally, as referred to herein, “generating,” “calculating,” “estimating,” “using,” “selecting,” “accessing,” and “determining” may be used interchangeably. For example, “generating,” “calculating,” “estimating,” or “determining” a parameter (or a signal) may refer to actively generating, estimating, calculating, or determining the parameter (or the signal) or may refer to using, selecting, or accessing the parameter (or signal) that is already generated, such as by another component or device. As used herein, “capturing an audio object” may correspond to capturing a sound signal or generating data representative of a sound signal.
In general, techniques are described for coding of higher-order ambisonics audio data. Higher-order ambisonics audio data may include at least one higher-order ambisonic (HOA) coefficient corresponding to a spherical harmonic basis function having an order greater than one.
The evolution of surround sound has made available many audio output formats for entertainment. Examples of such consumer surround sound formats are mostly ‘channel’ based in that they implicitly specify feeds to loudspeakers in certain geometrical coordinates. The consumer surround sound formats include the popular 5.1 format (which includes the following six channels: front left (FL), front right (FR), center or front center, back left or surround left, back right or surround right, and low frequency effects (LFE)), the growing 7.1 format, and various formats that include height speakers such as the 7.1.4 format and the 22.2 format (e.g., for use with the Ultra High Definition Television standard). Non-consumer formats can span any number of speakers (in symmetric and non-symmetric geometries), often termed ‘surround arrays.’ One example of such a surround array includes 32 loudspeakers positioned at coordinates on the corners of a truncated icosahedron.
The input to a future Moving Picture Experts Group (MPEG) encoder is optionally one of three possible formats: (i) traditional channel-based audio (as discussed above), which is meant to be played through loudspeakers at pre-specified positions; (ii) object-based audio, which involves discrete pulse-code-modulation (PCM) data for single audio objects with associated metadata containing their location coordinates (amongst other information); or (iii) scene-based audio, which involves representing the sound field using coefficients of spherical harmonic basis functions (also called “spherical harmonic coefficients” or SHC, “Higher-order Ambisonics” or HOA, and “HOA coefficients”).
There are various ‘surround-sound’ channel-based formats currently available. The formats range, for example, from the 5.1 home theatre system (which has been the most successful in terms of making inroads into living rooms beyond stereo) to the 22.2 system developed by NHK (Nippon Hoso Kyokai or Japan Broadcasting Corporation). Content creators (e.g., Hollywood studios) would like to produce a soundtrack for a movie once, and not spend effort to remix it for each speaker configuration. Recently, Standards Developing Organizations have been considering ways in which to provide an encoding into a standardized bitstream and a subsequent decoding that is adaptable and agnostic to the speaker geometry (and number) and acoustic conditions at the location of the playback (involving a renderer).
To provide such flexibility for content creators, a hierarchical set of elements may be used to represent a sound field. The hierarchical set of elements may refer to a set of elements in which the elements are ordered such that a basic set of lower-ordered elements provides a full representation of the modeled sound field. As the set is extended to include higher-order elements, the representation becomes more detailed, increasing resolution.
One example of a hierarchical set of elements is a set of spherical harmonic coefficients (SHC). The following expression demonstrates a description or representation of a sound field using SHC:
$$p_i(t, r_r, \theta_r, \varphi_r) = \sum_{\omega=0}^{\infty} \left[ 4\pi \sum_{n=0}^{\infty} j_n(k r_r) \sum_{m=-n}^{n} A_n^m(k)\, Y_n^m(\theta_r, \varphi_r) \right] e^{i\omega t}$$
The expression shows that the pressure $p_i$ at any point $\{r_r, \theta_r, \varphi_r\}$ of the sound field, at time $t$, can be represented uniquely by the SHC, $A_n^m(k)$. Here, $k = \omega / c$, $c$ is the speed of sound (approximately 343 m/s), $\{r_r, \theta_r, \varphi_r\}$ is a point of reference (or observation point), $j_n(\cdot)$ is the spherical Bessel function of order $n$, and $Y_n^m(\theta_r, \varphi_r)$ are the spherical harmonic basis functions of order $n$ and suborder $m$. It can be recognized that the term in square brackets is a frequency-domain representation of the signal (i.e., $S(\omega, r_r, \theta_r, \varphi_r)$), which can be approximated by various time-frequency transformations, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), or a wavelet transform. Other examples of hierarchical sets include sets of wavelet transform coefficients and other sets of coefficients of multiresolution basis functions.
The number of spherical harmonic basis functions for all orders up to and including a particular order $n$ may be determined as $(n+1)^2$. For example, a tenth-order representation ($n = 10$) would correspond to $(10+1)^2 = 121$ spherical harmonic basis functions. The SHC $A_n^m(k)$ can either be physically acquired (e.g., recorded) by various microphone array configurations or, alternatively, derived from channel-based or object-based descriptions of the sound field. The SHC represent scene-based audio, where the SHC may be input to an audio encoder to obtain encoded SHC that may promote more efficient transmission or storage. For example, a fourth-order representation involving $(1+4)^2 = 25$ coefficients may be used.
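Each order $n$ contributes $2n+1$ basis functions (one per suborder $m = -n, \dots, n$). Summing these contributions over orders 0 through $n$ gives the closed-form count. The short Python sketch below cross-checks that count against an explicit enumeration of the $(n, m)$ index pairs. The function names are illustrative only and are not part of the disclosure.

```python
def num_basis_functions(order: int) -> int:
    """Number of spherical harmonic basis functions (and SHC) for all
    orders 0..order: each order n contributes 2n + 1 functions, so the
    total is (order + 1) ** 2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def basis_indices(order: int) -> list:
    """Enumerate the (n, m) index pairs with 0 <= n <= order and -n <= m <= n."""
    return [(n, m) for n in range(order + 1) for m in range(-n, n + 1)]


for order in (1, 4, 10):
    # Closed form and explicit enumeration should agree.
    print(order, num_basis_functions(order), len(basis_indices(order)))
```

This prints `1 4 4`, `4 25 25`, and `10 121 121`, which matches the first-order (four-channel), fourth-order, and tenth-order counts.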
To illustrate how the SHC may be derived from an object-based description, consider the following equation. The coefficients $A_n^m(k)$ for the sound field corresponding to an individual audio object may be expressed as:
$$A_n^m(k) = g(\omega)\,(-4\pi i k)\, h_n^{(2)}(k r_s)\, Y_n^{m*}(\theta_s, \varphi_s),$$
where $i = \sqrt{-1}$, $h_n^{(2)}(\cdot)$ is the spherical Hankel function (of the second kind) of order $n$, and $\{r_s, \theta_s, \varphi_s\}$ is the location of the object. Knowing the object source energy $g(\omega)$ as a function of frequency (e.g., using time-frequency analysis techniques, such as performing a fast Fourier transform on the PCM stream) enables conversion of each PCM object and the corresponding location into the SHC $A_n^m(k)$. Further, it can be shown (since the above is a linear and orthogonal decomposition) that the $A_n^m(k)$ coefficients for each object are additive. In this manner, a multitude of PCM objects can be represented by the $A_n^m(k)$ coefficients (e.g., as a sum of the coefficient vectors for the individual objects). Essentially, the coefficients contain information about the sound field (the pressure as a function of 3D coordinates), and the above represents the transformation from individual objects to a representation of the overall sound field, in the vicinity of the observation point $\{r_r, \theta_r, \varphi_r\}$. The remaining figures are described below in the context of object-based and SHC-based audio coding.
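The object-to-SHC conversion and the additivity property can be illustrated numerically. The sketch below rests on several assumptions, none of which the text fixes:

- It covers only orders $n \le 1$, using closed forms so that the standard library suffices.
- $\theta$ is taken as the polar angle and $\varphi$ as the azimuth.
- The complex spherical harmonics carry the Condon-Shortley phase.
- The source parameters (1 kHz, 2 m and 3 m distances) are hypothetical.

A full implementation would need spherical Bessel, Hankel, and harmonic functions of arbitrary order.

```python
import cmath
import math


def sph_hankel2(n: int, x: float) -> complex:
    """Spherical Hankel function of the second kind, h_n^(2)(x) = j_n(x) - i*y_n(x).
    Closed forms are used for n = 0 and n = 1 only."""
    if n == 0:
        return 1j * cmath.exp(-1j * x) / x
    if n == 1:
        return -cmath.exp(-1j * x) * (x - 1j) / x**2
    raise NotImplementedError("this sketch covers n <= 1 only")


def sph_harm(n: int, m: int, theta: float, phi: float) -> complex:
    """Complex spherical harmonic Y_n^m with the Condon-Shortley phase.
    Assumed convention: theta is the polar angle, phi is the azimuth."""
    if (n, m) == (0, 0):
        return complex(0.5 / math.sqrt(math.pi))
    if (n, m) == (1, 0):
        return complex(math.sqrt(3 / (4 * math.pi)) * math.cos(theta))
    if n == 1 and m in (-1, 1):
        return -m * math.sqrt(3 / (8 * math.pi)) * math.sin(theta) * cmath.exp(1j * m * phi)
    raise NotImplementedError("this sketch covers n <= 1 only")


def object_to_shc(g: complex, k: float, r_s: float, theta_s: float, phi_s: float,
                  order: int = 1) -> dict:
    """A_n^m(k) = g(w) * (-4*pi*i*k) * h_n^(2)(k*r_s) * conj(Y_n^m(theta_s, phi_s))."""
    coeffs = {}
    for n in range(order + 1):
        radial = sph_hankel2(n, k * r_s)
        for m in range(-n, n + 1):
            coeffs[(n, m)] = (g * (-4j * math.pi * k) * radial
                              * sph_harm(n, m, theta_s, phi_s).conjugate())
    return coeffs


def add_objects(*objects: dict) -> dict:
    """The mapping is linear, so a scene's SHC are the sum of per-object SHC."""
    return {key: sum(obj[key] for obj in objects) for key in objects[0]}


k = 2 * math.pi * 1000.0 / 343.0  # wavenumber k = w / c at 1 kHz
a = object_to_shc(1.0, k, 2.0, math.pi / 2, 0.0)
b = object_to_shc(0.5, k, 3.0, math.pi / 3, math.pi / 4)
scene = add_objects(a, b)  # four coefficients (n, m) for a first-order scene
```

Scaling $g(\omega)$ by a constant scales every coefficient by the same constant, which is the linearity that makes the per-object coefficient vectors additive.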
Referring to
The microphone array 102 includes a microphone cluster 104, a microphone cluster 106, and a microphone cluster 108. Although three microphone clusters 104, 106, 108 are shown, in other implementations, the microphone array 102 may include additional (or fewer) microphone clusters. As a non-limiting example, the microphone array 102 may include twelve microphone clusters. Each microphone cluster 104, 106, 108 includes a plurality of microphone elements (e.g., two or more microphones). The microphone array 102 may have different geometries (e.g., shapes). For example, the microphone array 102 may be a spherical microphone array (e.g., have a spherical geometry), a linear microphone array (e.g., have a linear geometry), a circular microphone array (e.g., have a circular geometry), etc.
As depicted in
Additionally, as depicted in
Each microphone cluster 104, 106 includes a single acoustic port opening. For example, the microphone cluster 104 includes an acoustic port opening 150 that is coupled to each microphone element 172-178 via corresponding acoustic ports, and the microphone cluster 106 includes an acoustic port opening 160 that is coupled to each microphone element 182-188 via corresponding acoustic ports. Thus, a “microphone cluster” may include a physical arrangement of microphone elements that are coupled to the same acoustic port opening. An example implementation of the microphone cluster 104 is shown in
Referring to
Referring back to
After the audio signals 151, 161 are received using the corresponding acoustic port openings 150, 160, each respective microphone element 172-178, 182-188 may capture soundwaves associated with the audio signals 151, 161. To illustrate, the audio signal 151 may be composed of multiple soundwaves having substantially similar properties (e.g., phases and amplitudes). With reference to
Thus, the microphone element 172 captures audio 312 based on the first soundwaves 302 of the audio signal 151, the microphone element 174 captures audio 314 based on the second soundwaves 304 of the audio signal 151, the microphone element 176 captures audio 316 based on the third soundwaves 306 of the audio signal 151, and the microphone element 178 captures audio 318 based on the fourth soundwaves 308 of the audio signal 151. The microphone elements 172-178 may be configured to capture the audio 312-318 at the same time because the lengths of the acoustic ports 202-208 are similar. As a result, the microphone cluster 104A may operate as a “natural amplifier” and amplify the audio signal 151 in response to each microphone element 172-178 capturing the audio 312-318 at the same time. For example, because a typical microphone configuration has a one-to-one ratio of microphone elements and acoustic port openings (e.g., each microphone element has a separate acoustic port opening), a single microphone element in a typical configuration would capture the audio signal 151. However, in
The ADC 152 converts the captured audio 312 from an analog signal into a digital signal 153, the ADC 154 converts the captured audio 314 from an analog signal into a digital signal 155, the ADC 156 converts the captured audio 316 from an analog signal into a digital signal 157, and the ADC 158 converts the captured audio 318 from an analog signal into a digital signal 159. The digital signals 153, 155, 157, 159 are provided to the processor 110.
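The "natural amplifier" behavior of the microphone cluster 104A can be illustrated with a small simulation. The sketch rests on the following assumptions, which are not taken from the text:

- The digital signals 153, 155, 157, 159 are summed.
- The captures are ideally time-aligned.
- Each element adds independent self-noise of equal variance.

Under those assumptions, the common signal adds in amplitude (power grows by $N^2$) while the noise adds in power (power grows by $N$). The SNR therefore improves by about $10\log_{10} N$ dB, roughly 6 dB for four elements. The function name and parameters are hypothetical.

```python
import math
import random


def power(x):
    """Mean-square power of a sequence."""
    return sum(v * v for v in x) / len(x)


def coherent_sum_snr_gain_db(num_elements, num_samples=20000, noise_std=0.1, seed=0):
    """SNR gain (dB) of summing num_elements time-aligned captures of the same
    sound, each with independent element self-noise, relative to one element."""
    rng = random.Random(seed)
    # The same soundwave reaches every element at the same time (similar port lengths).
    signal = [math.sin(2 * math.pi * 0.01 * t) for t in range(num_samples)]
    # Each element adds its own independent Gaussian self-noise (an assumption).
    noises = [[rng.gauss(0.0, noise_std) for _ in range(num_samples)]
              for _ in range(num_elements)]

    single_snr = power(signal) / power(noises[0])
    summed_signal = [num_elements * s for s in signal]  # amplitudes add coherently
    summed_noise = [sum(col) for col in zip(*noises)]    # noise powers add
    summed_snr = power(summed_signal) / power(summed_noise)
    return 10 * math.log10(summed_snr / single_snr)


# Expected to land near 10 * log10(4), i.e. about 6 dB.
print(coherent_sum_snr_gain_db(4))
```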
Referring to
Referring back to
Although each microphone cluster 104, 106 is shown to have a single acoustic port opening, in other implementations, one or more microphone clusters in the microphone array 102 may have different configurations. For example, referring to
The microphone cluster 108A includes a microphone element 220, a microphone element 221, a microphone element 222, and a microphone element 223. Two or more of the microphone elements 220-223 may be included in a MEMS package, a package made of metal, a package made of ceramic, a package made of fiber glass, a package made of a silicon material, a package made from a printed circuit board material, a package made of another material, etc. The housing 200 is positioned over the microphone elements 220-223. An acoustic port 224 is coupled to the microphone element 220, an acoustic port 225 is coupled to the microphone element 221, an acoustic port 226 is coupled to the microphone element 222, and an acoustic port 227 is coupled to the microphone element 223. The housing 200 includes an acoustic port opening 228 associated with the acoustic port 224, an acoustic port opening 229 associated with the acoustic port 225, an acoustic port opening 230 associated with the acoustic port 226, and an acoustic port opening 231 associated with the acoustic port 227. According to
Referring to
An acoustic port 242 is coupled to the microphone element 240, and an acoustic port 243 is coupled to the microphone element 241. The housing 200 includes an acoustic port opening 244 associated with the acoustic port 242, and the housing 239 includes an acoustic port opening 245 associated with the acoustic port 243. Thus, the microphone cluster 108B includes two non-coplanar acoustic port openings 244, 245.
Referring to
An acoustic port 252 is coupled to the microphone element 250, and an acoustic port 253 is coupled to the microphone element 251. The housing 200 includes an acoustic port opening 254 associated with the acoustic port 252, and the housing 249 includes an acoustic port opening 255 associated with the acoustic port 253. Thus, the microphone cluster 108C includes two orthogonal acoustic port openings 254, 255.
Although the microphone elements shown in
Referring to
The housing 200 is positioned over the microphone elements 172-178, 262-265. The housing 239 is positioned below (e.g., beneath) the microphone elements 172-178, 262-265. The acoustic port 202 is coupled to the microphone element 172, the acoustic port 204 is coupled to the microphone element 174, the acoustic port 206 is coupled to the microphone element 176, and the acoustic port 208 is coupled to the microphone element 178. The housing 200 includes the acoustic port opening 150 that is coupled to the acoustic ports 202-208. Thus, all four acoustic ports 202-208 are coupled to the single acoustic port opening 150 of the microphone cluster 104A.
Additionally, the microphone clusters 104B, 108D are coupled to an acoustic port opening 275 (e.g., a shared acoustic port opening) in the housing 200 and to an acoustic port opening 276 (e.g., another shared acoustic port opening) in the housing 200. For example, an acoustic port 271 is coupled to the microphone element 174, an acoustic port 272 is coupled to the microphone element 262, and the acoustic port opening 275 in the housing 200 is coupled to the acoustic ports 271, 272. Additionally, an acoustic port 273 is coupled to the microphone element 178, an acoustic port 274 is coupled to the microphone element 264, and the acoustic port opening 276 in the housing 200 is coupled to the acoustic ports 273, 274. Thus, the acoustic port openings 275, 276 are shared between the two microphone clusters 104B, 108D.
Although the acoustic port openings 275, 276, 277 are located in the housing 200, in other implementations, one or more of the acoustic port openings 275, 276, 277 may be located in the housing 239. For example, one or more of the acoustic port openings 275, 276, 277 may be located beneath the microphone elements 172-178, 262-265 to capture sound from a substantially different location than the sound captured using the acoustic port opening 150.
Referring back to
The directionality determination unit 111 may be configured to determine directionality information 120 associated with the sound source 140 based on the output of the microphone array 102. For example, the directionality determination unit 111 may process the digital signals 153, 155, 157, 159, 163, 165, 167, 169 to determine which microphone cluster 104, 106 is more proximate to the sound source 140. According to one implementation, the directionality determination unit 111 may compare amplitudes of the sound encoded in the digital signals to determine which microphone cluster 104, 106 is more proximate to the sound source 140. To illustrate, if the sound encoded in the digital signals 163, 165, 167, 169 has a larger amplitude than the sound encoded in the digital signals 153, 155, 157, 159, the directionality information 120 may indicate that the sound source 140 is more proximate to the microphone cluster 106.
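The amplitude comparison can be sketched as follows. The sketch assumes that "amplitude" is measured as the RMS level of each digital signal, averaged over a cluster's elements; the text does not specify the metric. The test data and function names are hypothetical, and cluster identifiers reuse the reference numerals only as labels.

```python
import math


def rms(samples):
    """Root-mean-square level of one digital signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def cluster_level(signals):
    """Average RMS level across the element signals of one cluster."""
    return sum(rms(sig) for sig in signals) / len(signals)


def nearest_cluster(cluster_signals):
    """Return the id of the cluster with the largest average level, taken as the
    cluster more proximate to the sound source."""
    return max(cluster_signals, key=lambda cid: cluster_level(cluster_signals[cid]))


# Hypothetical data: four element signals per cluster, keyed by reference numeral.
tone = [math.sin(2 * math.pi * t / 32) for t in range(256)]
signals = {
    104: [[0.2 * s for s in tone] for _ in range(4)],
    106: [[0.8 * s for s in tone] for _ in range(4)],
}
print(nearest_cluster(signals))  # 106
```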
Based on a determination that the sound source 140 is positioned closer to the microphone cluster 106, the cluster configuration unit selector 112 may select a first microphone element configuration 121 for the microphone cluster 104 and may select a second microphone element configuration 122 for the microphone cluster 106. The cluster configuration unit selector 112 may send, via a control bus 130, a first signal (e.g., a deactivation signal) to transition the microphone cluster 104 into the first microphone element configuration 121. In response to receiving the first signal, each microphone element 172-178 of the microphone cluster 104 is deactivated. Energy consumption at the microphone array 102 is reduced in response to selection of the first microphone element configuration 121 for the microphone cluster 104. The cluster configuration unit selector 112 may send, via the control bus 130, a second signal (e.g., an activation signal) to the microphone cluster 106. In response to receiving the second signal, each microphone element 182-188 of the microphone cluster 106 is (or remains) activated.
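The selection and control-bus signaling can be sketched as a mapping from cluster to configuration, followed by one activation or deactivation message per cluster. The `send` callback is a hypothetical stand-in for the control bus 130, and the enum names are illustrative.

```python
from enum import Enum


class ElementConfig(Enum):
    FIRST = "all elements deactivated"   # cf. first microphone element configuration 121
    SECOND = "all elements activated"    # cf. second microphone element configuration 122


def select_configs(cluster_ids, nearest):
    """Keep the cluster nearest the source active; deactivate the others."""
    return {cid: ElementConfig.SECOND if cid == nearest else ElementConfig.FIRST
            for cid in cluster_ids}


def apply_configs(configs, send):
    """Issue one activation/deactivation message per cluster.
    send(cluster_id, activate) is a hypothetical stand-in for the control bus."""
    for cid, cfg in configs.items():
        send(cid, cfg is ElementConfig.SECOND)


bus_log = []
configs = select_configs([104, 106], nearest=106)
apply_configs(configs, lambda cid, on: bus_log.append((cid, on)))
print(bus_log)  # [(104, False), (106, True)]
```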
In other implementations, the cluster configuration unit selector 112 may also select from microphone element configurations that differ from the first and second microphone element configurations 121, 122. For example, the cluster configuration unit selector 112 may select a third microphone element configuration (not shown) in which some (but not all) of the microphone elements of a cluster are deactivated. To illustrate, the microphone elements 172, 178 may be deactivated and the microphone elements 174, 176 may be activated if the third microphone element configuration is applied to the microphone cluster 104.
According to one implementation, the cluster configuration unit selector 112 may select the second microphone configuration 122 for six microphone clusters. To illustrate, the cluster configuration unit selector 112 may select the second microphone configuration 122 for a cluster facing a first cardinal direction (e.g., north), a cluster facing a second cardinal direction (e.g., south), a cluster facing a third cardinal direction (e.g., east), and a cluster facing a fourth cardinal direction (e.g., west). The cluster configuration unit selector 112 may also select the second microphone configuration 122 for a cluster facing an upwards direction and a cluster facing a downwards direction. After the six microphone clusters are operating according to the second microphone configuration 122, the directionality determination unit 111 determines the location of the sound source 140. Based on the location, the cluster configuration unit selector 112 activates additional microphone clusters pointing towards the sound source 140 (e.g., selects the second microphone configuration 122 for microphone clusters pointing towards the sound source 140). In some circumstances, the cluster configuration unit selector 112 deactivates the microphone clusters that are not facing the sound source 140 (e.g., selects the first microphone configuration 121 for the microphone clusters not facing the sound source 140).
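The coarse-to-fine procedure can be sketched in Python. The sketch makes several assumptions, all hypothetical and none taken from the text:

- The cluster facing directions are unit vectors (x = east, y = north, z = up).
- The source direction is estimated from level differences between opposing coarse clusters.
- A cluster counts as "facing" the source when its direction is within 60 degrees of the source direction.

```python
import math

# Hypothetical facing directions (unit vectors: x = east, y = north, z = up).
COARSE = {
    "north": (0.0, 1.0, 0.0), "south": (0.0, -1.0, 0.0),
    "east": (1.0, 0.0, 0.0), "west": (-1.0, 0.0, 0.0),
    "up": (0.0, 0.0, 1.0), "down": (0.0, 0.0, -1.0),
}
h = math.sqrt(0.5)
ALL_CLUSTERS = {**COARSE, "northeast": (h, h, 0.0), "southeast": (h, -h, 0.0),
                "northwest": (-h, h, 0.0), "southwest": (-h, -h, 0.0)}


def estimate_direction(levels):
    """Crude estimate from the six coarse clusters: level differences between
    opposing clusters form a vector pointing toward the source."""
    v = (levels["east"] - levels["west"],
         levels["north"] - levels["south"],
         levels["up"] - levels["down"])
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("levels carry no directional information")
    return tuple(c / norm for c in v)


def clusters_facing(normals, direction, max_angle_deg=60.0):
    """Clusters whose facing direction is within max_angle_deg of direction."""
    cos_limit = math.cos(math.radians(max_angle_deg))
    return {name for name, n in normals.items()
            if sum(a * b for a, b in zip(n, direction)) >= cos_limit}


# Step 1: only the six coarse clusters are active; measure their levels.
levels = {"north": 0.5, "south": 0.5, "east": 0.9, "west": 0.1, "up": 0.5, "down": 0.5}
# Step 2: locate the source, then keep only clusters facing it active.
direction = estimate_direction(levels)
active = clusters_facing(ALL_CLUSTERS, direction)
print(sorted(active))  # ['east', 'northeast', 'southeast']
```

All clusters outside `active` would receive the first (all-deactivated) configuration.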
The sound source tracking unit 113 may be configured to track movements of the sound source 140 as the sound source moves from a first position 123 to a second position 124. The sound source 140 is closer to the microphone cluster 104 when the sound source 140 is in the first position 123, and the sound source 140 is closer to the microphone cluster 106 when the sound source 140 is in the second position 124. Based on the tracked movements, the cluster configuration unit selector 112 may select the first microphone element configuration 121 for the microphone cluster 106 when the sound source 140 is proximate to the first position 123. Additionally, the cluster configuration unit selector 112 may select the second microphone element configuration 122 for the microphone cluster 104 when the sound source 140 is proximate to the first position 123. If the sound source 140 is proximate to the second position 124, the cluster configuration unit selector 112 may select the first microphone element configuration 121 for the microphone cluster 104 and may select the second microphone element configuration 122 for the microphone cluster 106.
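Tracking-driven reselection can be sketched as a small stateful helper. On each tracked position, it recomputes the nearest cluster and reports only the clusters whose configuration changes, so that repeated positions generate no new control messages. The class name, the cluster coordinates, and the "report only changes" behavior are assumptions for illustration.

```python
import math


class ClusterTracker:
    """Re-select cluster configurations as a tracked source moves.
    Cluster positions are hypothetical coordinates; "first" = all elements
    deactivated, "second" = all elements activated."""

    def __init__(self, cluster_positions):
        self.cluster_positions = cluster_positions
        self.current = {}

    def update(self, source_pos):
        """Return only the clusters whose configuration must change."""
        nearest = min(self.cluster_positions,
                      key=lambda cid: math.dist(source_pos, self.cluster_positions[cid]))
        wanted = {cid: "second" if cid == nearest else "first"
                  for cid in self.cluster_positions}
        changes = {cid: cfg for cid, cfg in wanted.items() if self.current.get(cid) != cfg}
        self.current = wanted
        return changes


tracker = ClusterTracker({104: (-1.0, 0.0, 0.0), 106: (1.0, 0.0, 0.0)})
print(tracker.update((-2.0, 0.5, 0.0)))  # near position 123: {104: 'second', 106: 'first'}
print(tracker.update((-1.8, 0.4, 0.0)))  # still nearer cluster 104: {}
print(tracker.update((2.0, 0.5, 0.0)))   # near position 124: {104: 'first', 106: 'second'}
```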
The signal-to-noise comparison unit 114 may be configured to compare a first signal-to-noise ratio (SNR) 125 associated with the microphone cluster 104 to a second SNR 126 associated with the microphone cluster 106. The first SNR 125 is determined based on the digital signals 153, 155, 157, 159, and the second SNR 126 is determined based on the digital signals 163, 165, 167, 169. For example, the first SNR 125 may be indicative of an average SNR of the digital signals 153, 155, 157, 159, and the second SNR 126 may be indicative of an average SNR of the digital signals 163, 165, 167, 169. The cluster configuration unit selector 112 may select the first microphone element configuration 121 for the cluster 104 if the second SNR 126 is greater than the first SNR 125. An SNR for the microphone array 102 is increased in response to selection of the first microphone element configuration 121 for the cluster 104 because the microphone elements 172-178, which capture a relatively large amount of noise, are deactivated. Additionally, the cluster configuration unit selector 112 may select the second microphone element configuration 122 for the cluster 106 if the second SNR 126 is greater than the first SNR 125.
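The SNR comparison can be sketched as follows, assuming per-element SNR estimates (in dB) are already available for each digital signal; the text does not say how they are estimated. The sketch averages in dB. Averaging in linear power instead is an equally plausible reading of "average SNR", and the choice is an assumption here. All SNR values are hypothetical.

```python
from statistics import mean


def select_by_snr(element_snrs_db):
    """element_snrs_db maps a cluster id to per-element SNR estimates (dB).
    The cluster with the highest average SNR keeps the second (all-active)
    configuration; every other cluster gets the first (all-deactivated) one."""
    cluster_snr = {cid: mean(snrs) for cid, snrs in element_snrs_db.items()}
    best = max(cluster_snr, key=cluster_snr.get)
    configs = {cid: "second" if cid == best else "first" for cid in cluster_snr}
    return cluster_snr, configs


# Hypothetical per-element SNRs for digital signals 153-159 and 163-169.
snrs = {104: [3.0, 4.0, 5.0, 4.0], 106: [12.0, 10.0, 11.0, 11.0]}
cluster_snr, configs = select_by_snr(snrs)
print(cluster_snr, configs)  # {104: 4.0, 106: 11.0} {104: 'first', 106: 'second'}
```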
According to some implementations, the cluster configuration unit selector 112 may determine the microphone element configurations for each cluster 104, 106 based on the SNRs 125, 126 and the directionality information 120. As a non-limiting example, the cluster configuration unit selector 112 may select the first microphone element configuration 121 for microphone clusters having SNRs that fall below a threshold and for microphone clusters not facing the sound source 140. This may result in further power savings.
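The combined rule can be expressed in a few lines. The threshold, the SNR values, and the "facing" flags below are hypothetical, and the flags are assumed to come from the directionality information.

```python
def select_combined(cluster_snr_db, facing_source, snr_threshold_db):
    """First (all-deactivated) configuration for clusters whose SNR falls below
    the threshold or that do not face the source; second configuration otherwise."""
    return {
        cid: "first" if snr < snr_threshold_db or not facing_source[cid] else "second"
        for cid, snr in cluster_snr_db.items()
    }


# Hypothetical values: cluster 108 faces the source but is too noisy.
print(select_combined({104: 4.0, 106: 11.0, 108: 6.0},
                      {104: False, 106: True, 108: True},
                      snr_threshold_db=8.0))
# {104: 'first', 106: 'second', 108: 'first'}
```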
The ambisonics generation unit 115 may generate ambisonics signals 190 based on the digital signals provided by the microphone array 102. As a non-limiting example, based on the received digital signals, the ambisonics generation unit 115 may generate first-order ambisonics signals 190 (e.g., a W signal, an X signal, a Y signal, and a Z signal) that represent the three-dimensional sound field captured by the microphone array 102. According to other implementations, the ambisonics generation unit 115 may generate second-order ambisonics signals, third-order ambisonics signals, etc. The audio encoder 116 may be configured to encode the ambisonic signals 190 to generate an encoded bitstream 192. The encoded bitstream 192 may be transmitted to a decoder device to reconstruct the three-dimensional sound field that is represented by the ambisonic signals 190.
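To show what the W, X, Y, and Z signals carry, the sketch below encodes a single mono source arriving from a known direction into first-order ambisonics. It uses the common SN3D normalization; traditional FuMa B-format instead scales W by $1/\sqrt{2}$. It does not show how the ambisonics generation unit 115 derives these signals from the element signals of the microphone array 102, which depends on array geometry and is not described here. The function name and angle convention (azimuth counterclockwise from the front X axis, Y pointing left, Z up) are assumptions.

```python
import math


def encode_foa(sample, azimuth, elevation):
    """Encode one mono sample arriving from (azimuth, elevation), in radians,
    as first-order ambisonics (W, X, Y, Z) with SN3D normalization.
    Azimuth is measured counterclockwise from the front (X) axis."""
    cos_el = math.cos(elevation)
    w = sample                                   # omnidirectional component
    x = sample * math.cos(azimuth) * cos_el      # front-back figure-of-eight
    y = sample * math.sin(azimuth) * cos_el      # left-right figure-of-eight
    z = sample * math.sin(elevation)             # up-down figure-of-eight
    return w, x, y, z


# A source 90 degrees to the left at ear height shows up mainly in W and Y.
print([round(c, 6) for c in encode_foa(1.0, math.pi / 2, 0.0)])  # [1.0, 0.0, 1.0, 0.0]
```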
The techniques described with respect to
Additionally, the techniques described with respect to
Referring to
The method 500 includes capturing, at a microphone array, one or more audio objects associated with a three-dimensional sound field, at 502. The microphone array includes a plurality of microphone elements grouped into clusters of two or more microphone elements. For example, referring to
The method 500 also includes determining, at a processor, directionality information associated with a sound source, at 504. For example, referring to
The method 500 also includes selecting a microphone element configuration for each cluster based on the directionality information, at 506. For example, referring to
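The disclosure does not specify how the directionality information at 504 is obtained. As one common technique, offered here as an assumption rather than the disclosed method, a time difference of arrival between two microphone elements can be estimated by cross-correlation and converted to an angle. The sketch below uses brute-force cross-correlation; the speed of sound, microphone spacing, sample rate, and function names are all illustrative.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at about 20 degrees C


def estimate_delay_samples(ref, other, max_lag):
    """Lag (in samples) maximizing the cross-correlation; positive if `other` lags `ref`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            ref[i] * other[i + lag]
            for i in range(len(ref))
            if 0 <= i + lag < len(other)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def estimate_azimuth_deg(ref, other, mic_spacing_m, sample_rate_hz):
    """Angle from broadside of a two-microphone pair; positive if the wavefront reaches `ref` first."""
    max_lag = math.ceil(mic_spacing_m / SPEED_OF_SOUND_M_S * sample_rate_hz)
    lag = estimate_delay_samples(ref, other, max_lag)
    ratio = SPEED_OF_SOUND_M_S * (lag / sample_rate_hz) / mic_spacing_m
    # Clamp so that quantized delays cannot push asin outside its domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


ref = [0.0] * 64
other = [0.0] * 64
ref[10] = 1.0    # impulse reaches the reference microphone first
other[13] = 1.0  # and the second microphone 3 samples later
print(estimate_azimuth_deg(ref, other, mic_spacing_m=0.1, sample_rate_hz=48000))  # about 12.4
```

Practical systems often use a weighted variant such as GCC-PHAT for robustness to reverberation; the brute-force form is kept here for readability.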
The method 500 of
Additionally, the method 500 may balance data throughput with sound quality based on the techniques described with respect to
Referring to
The method 550 includes capturing, at a microphone array, one or more audio objects associated with a three-dimensional sound field, at 552. The microphone array includes a first cluster and a second cluster. The first cluster includes a first set of two or more microphone elements, and the second cluster includes a second set of two or more microphone elements. For example, referring to
The method 550 also includes determining, at a processor, directionality information associated with a sound source, at 554. For example, referring to
The method 550 also includes selecting a first microphone element configuration for the first cluster based on a condition, the directionality information, or both, at 556. Each microphone element of the first set of two or more microphone elements is deactivated in response to selection of the first microphone element configuration. For example, referring to
According to one implementation, the condition indicates that a signal-to-noise ratio associated with the cluster 104 fails to satisfy a signal-to-noise ratio threshold. According to another implementation, the condition indicates that data throughput associated with the microphone array 102 fails to satisfy a data throughput threshold. According to another implementation, the condition indicates that an amount of power consumed by the microphone array 102 exceeds a power limit.
In some implementations, the condition corresponds to reduction of the amount of power provided to the microphone array 102. In other implementations, the condition corresponds to a tradeoff between power consumption and a signal-to-noise ratio. For example, the condition may indicate that selection of the first microphone element configuration 121 for the microphone cluster 104 will result in an amount of power consumed by the microphone array 102 satisfying a power limit and a signal-to-noise ratio associated with the microphone array 102 satisfying a signal-to-noise ratio threshold.
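The example conditions above can be expressed as simple predicates. This sketch is illustrative only: the `Limits` class, the units, and the threshold values are hypothetical, "fails to satisfy" is interpreted as "falls below" for the SNR and throughput cases, and the predicted power and SNR passed to the tradeoff check are assumed to come from some model the disclosure does not describe.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    snr_threshold_db: float
    throughput_threshold_bps: float
    power_limit_mw: float


def condition_met(cluster_snr_db, available_throughput_bps, array_power_mw, limits):
    """True if any example condition from the text calls for the first configuration."""
    return (
        cluster_snr_db < limits.snr_threshold_db                       # SNR fails its threshold
        or available_throughput_bps < limits.throughput_threshold_bps  # throughput fails its threshold
        or array_power_mw > limits.power_limit_mw                      # power exceeds the limit
    )


def tradeoff_allows_deactivation(predicted_power_mw, predicted_array_snr_db, limits):
    """Power/SNR tradeoff: deactivate only if both predicted values stay within limits."""
    return (
        predicted_power_mw <= limits.power_limit_mw
        and predicted_array_snr_db >= limits.snr_threshold_db
    )


limits = Limits(snr_threshold_db=12.0, throughput_threshold_bps=1.5e6, power_limit_mw=40.0)
print(condition_met(15.0, 2.0e6, 55.0, limits))          # True: over the power limit
print(tradeoff_allows_deactivation(35.0, 10.0, limits))  # False: SNR would drop too far
```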
According to some implementations, the method 550 includes, after a fixed interval of time, selecting a second microphone element configuration for the first cluster. Each microphone element of the first set of two or more microphone elements is activated in response to selection of the second microphone element configuration. According to other implementations, the method 550 includes detecting that at least one signal associated with the second cluster fails to satisfy a signal threshold and, in response to the detection, selecting the second microphone element configuration for the first cluster.
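The two reactivation triggers can be sketched as a small controller. The text presents them as alternative implementations; this sketch combines both in one hypothetical `ReactivationController` class for compactness. The interval, the signal-level measure, and the threshold value are assumptions.

```python
class ReactivationController:
    """Hypothetical controller for the first cluster's microphone element configuration."""

    def __init__(self, interval_s, signal_threshold):
        self.interval_s = interval_s              # fixed reactivation interval (assumed seconds)
        self.signal_threshold = signal_threshold  # minimum acceptable second-cluster level
        self.config = "second"                    # start with every element activated
        self._deactivated_at = None

    def deactivate(self, now_s):
        """Select the first configuration (all elements of the first cluster deactivated)."""
        self.config = "first"
        self._deactivated_at = now_s

    def update(self, now_s, second_cluster_level):
        """Reactivate when the interval expires or the second cluster's signal is too weak."""
        if self.config == "first":
            interval_expired = now_s - self._deactivated_at >= self.interval_s
            second_too_weak = second_cluster_level < self.signal_threshold
            if interval_expired or second_too_weak:
                self.config = "second"
                self._deactivated_at = None
        return self.config


ctrl = ReactivationController(interval_s=2.0, signal_threshold=0.1)
ctrl.deactivate(now_s=0.0)
print(ctrl.update(1.0, 0.5))    # first: interval not expired, second cluster is fine
print(ctrl.update(2.0, 0.5))    # second: interval expired
ctrl.deactivate(now_s=10.0)
print(ctrl.update(10.5, 0.05))  # second: second cluster's signal below threshold
```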
According to some implementations, the method 550 may include determining whether a laptop is open or closed, as further described with respect to
The method 550 of
Additionally, the method 550 may balance data throughput with sound quality based on the techniques described with respect to
Referring to
The method 600 includes capturing, at a microphone array, one or more audio objects associated with a three-dimensional sound field, at 602. The microphone array includes clusters of two or more microphone elements. For the purposes of the method 600, each cluster includes an acoustic port opening and two or more microphone elements coupled to the acoustic port opening via corresponding acoustic ports. Thus, for the purposes of the method 600, each cluster is defined by a single acoustic port opening. For example, referring to
The method 600 also includes processing the one or more captured audio objects, at 604. For example, the processor 110 may process the audio 142 captured by the microphone array 102.
The method 600 may enable the microphone cluster 104 to operate as a “natural amplifier” and amplify the audio signal 151 in response to each microphone element 172-178 capturing the audio 312-318 at the same time. For example, because a typical microphone configuration has a one-to-one ratio of microphone elements and acoustic port openings (e.g., each microphone element has a separate acoustic port opening), a single microphone element in a typical configuration would capture the audio signal 151. However, in
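The "natural amplifier" effect of several elements sharing one acoustic port can be illustrated numerically. The sketch below assumes that every element receives the same acoustic signal through the shared port and that each element adds its own independent, Gaussian self-noise; under those assumptions, summing N elements raises signal power by N squared and noise power by N, for an SNR gain of 10*log10(N) dB. Acoustic noise entering through the port would be correlated across elements and would not be reduced this way. The function name and parameter values are illustrative.

```python
import math
import random


def power(xs):
    """Mean-square power of a sequence."""
    return sum(x * x for x in xs) / len(xs)


def shared_port_snr_gain_db(num_elements, num_samples=20000, noise_std=0.5, seed=0):
    """SNR of the summed cluster output relative to a single element, in dB."""
    rng = random.Random(seed)
    # The same acoustic signal reaches every element through the shared port.
    signal = [math.sin(2 * math.pi * 0.01 * n) for n in range(num_samples)]
    # Each element adds its own independent self-noise (assumed Gaussian, uncorrelated).
    noises = [[rng.gauss(0.0, noise_std) for _ in range(num_samples)] for _ in range(num_elements)]

    single_snr = power(signal) / power(noises[0])
    summed_signal = [num_elements * s for s in signal]
    summed_noise = [sum(col) for col in zip(*noises)]
    summed_snr = power(summed_signal) / power(summed_noise)
    return 10 * math.log10(summed_snr / single_snr)


print(round(shared_port_snr_gain_db(4), 2))  # theory for uncorrelated noise: 10*log10(4) = 6.02 dB
```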
Referring to
The method 650 includes capturing, at a microphone array, one or more audio objects associated with a three-dimensional sound field, at 652. The microphone array includes clusters of two or more microphone elements. Each cluster includes one or more acoustic port openings and two or more microphone elements coupled to the one or more acoustic port openings via corresponding acoustic ports. For example, referring to
The method 650 also includes processing the one or more captured audio objects, at 654. For example, the processor 110 may process the audio 142 captured by the microphone array 102.
Referring to
The memory 732 includes instructions 768 (e.g., executable instructions) such as computer-readable instructions or processor-readable instructions. The instructions 768 may include one or more instructions that are executable by a computer, such as the processor 110.
The device 700 may include a headset, a mobile communication device, a smart phone, a cellular phone, a laptop computer, a computer, a tablet, a personal digital assistant, a display device, a television, a gaming console, a music player, a radio, a digital video player, a digital video disc (DVD) player, a tuner, a camera, a navigation device, a vehicle, a component of a vehicle, or any combination thereof, as illustrative, non-limiting examples.
In an illustrative implementation, the memory 732 may include or correspond to a non-transitory computer readable medium storing the instructions 768. The instructions 768 may include one or more instructions that are executable by a computer, such as the processor 110. The instructions 768 may cause the processor 110 to perform one or more operations described herein, including but not limited to one or more portions of the methods 500, 550, 600, 650 of
One or more components of the device 700 may be implemented via dedicated hardware (e.g., circuitry), by a processor executing instructions to perform one or more tasks, or a combination thereof. As an example, the memory 732 or one or more components of the processor 110, and/or the CODEC 734 may be a memory device, such as a random access memory (RAM), magnetoresistive random access memory (MRAM), spin-torque transfer MRAM (STT-MRAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, or a compact disc read-only memory (CD-ROM). The memory device may include instructions (e.g., the instructions 768) that, when executed by a computer (e.g., a processor in the CODEC 734 or the processor 110), may cause the computer to perform one or more operations described with reference to
In a particular implementation, one or more components of the systems and devices disclosed herein may be integrated into a decoding system or apparatus (e.g., an electronic device, a CODEC, or a processor therein), into an encoding system or apparatus, or both. In other implementations, one or more components of the systems and devices disclosed herein may be integrated into a wireless telephone, a tablet computer, a desktop computer, a laptop computer, a set top box, a music player, a video player, an entertainment unit, a television, a game console, a navigation device, a communication device, a personal digital assistant (PDA), a fixed location data unit, a personal media player, or another type of device.
In conjunction with the described techniques, a first apparatus includes means for capturing one or more audio objects associated with a three-dimensional sound field. The means for capturing includes a first cluster and a second cluster. The first cluster includes a first set of two or more microphone elements, and the second cluster includes a second set of two or more microphone elements. For example, the means for capturing may include the microphone array 102 of
The first apparatus also includes means for determining directionality information associated with a sound source. For example, the means for determining may include the processor 110 of
The first apparatus also includes means for selecting a first microphone element configuration for the first cluster based on a condition, the directionality information, or both. Each microphone element of the first set of two or more microphone elements is deactivated in response to selection of the first microphone element configuration. For example, the means for selecting may include the processor 110 of
In conjunction with the described techniques, a second apparatus includes means for capturing one or more audio objects associated with a three-dimensional sound field. The means for capturing includes clusters of two or more microphone elements. Each cluster includes one or more acoustic port openings and two or more microphone elements coupled to the one or more acoustic port openings via corresponding acoustic ports. For example, the means for capturing may include the microphone array 102 of
Referring to
A microphone array 810 is located along an upper portion of the laptop 800. As illustrated in
The microphone array 810 includes a microphone cluster 811, a microphone cluster 812, a microphone cluster 813, a microphone cluster 814, a microphone cluster 815, a microphone cluster 816, and a microphone cluster 817. According to one implementation, the microphone array 810 may operate in a substantially similar manner as the microphone array 102 of
According to one implementation, in response to a determination that the laptop 800 is closed, the microphone clusters 811-817 may transition into the first microphone element configuration 121 to conserve energy. For example, microphone elements (not shown) within the microphone clusters 811-817 may transition into a low-power state (e.g., an “off” state) in response to a determination that the laptop 800 is closed. According to some implementations, one or more of the microphone clusters 811-817 may have a similar configuration as the microphone cluster 108B of
According to another implementation, in response to a determination that the laptop 800 is open, selected microphone clusters 811, 812, 816, 817 may transition into the first microphone element configuration 121 and the other microphone clusters 813-815 may transition into the second microphone element configuration 122. Thus, the microphone clusters 813-815 positioned near the center of the laptop 800 (e.g., the microphone clusters more likely to capture the user's voice) are activated, and the microphone clusters 811, 812, 816, 817 positioned toward the periphery of the laptop 800 (e.g., the microphone clusters more likely to capture noise) are deactivated. As a result, the SNR of the captured audio may be relatively high because noise that would otherwise be captured by microphone elements in the microphone clusters 811, 812, 816, 817 is not captured.
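The lid-state behavior described for the laptop 800 reduces to a simple mapping, sketched below. How the open or closed state is detected is not described here, so the `lid_open` flag is assumed to come from some hypothetical sensor; the cluster numbers follow the example above.

```python
LAPTOP_CLUSTERS = [811, 812, 813, 814, 815, 816, 817]
CENTER_CLUSTERS = {813, 814, 815}  # positioned near the center of the laptop in the example


def select_laptop_configs(lid_open: bool) -> dict[int, str]:
    """Map each cluster to the first (deactivated) or second (activated) configuration."""
    configs = {}
    for cluster in LAPTOP_CLUSTERS:
        if lid_open and cluster in CENTER_CLUSTERS:
            configs[cluster] = "second"  # activated: most likely to capture the user's voice
        else:
            configs[cluster] = "first"   # deactivated: lid closed, or peripheral cluster
    return configs


print(select_laptop_configs(lid_open=True))
# {811: 'first', 812: 'first', 813: 'second', 814: 'second', 815: 'second', 816: 'first', 817: 'first'}
```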
Referring to
The band 902 includes a microphone cluster 911, a microphone cluster 912, a microphone cluster 913, a microphone cluster 914, a microphone cluster 915, and a microphone cluster 916. The microphone clusters 911-916 may have the same configuration (and operate in a substantially similar manner) as the microphone clusters 104, 106, 108 of
One or more of the microphone clusters 911-916 may be operable to detect a pulse of the user. For example, microphone elements within the microphone clusters 911-916 may capture ultrasound (or another acoustical frequency) associated with the pulse of the user. The pulse may be displayed on the screen of the timepiece 904. As illustrated in
According to some implementations, one or more of the microphone clusters 911-916 may have a similar configuration as the microphone cluster 108B of
The foregoing techniques may be performed with respect to any number of different contexts and audio ecosystems. A number of example contexts are described below, although the techniques should not be limited to the example contexts. One example audio ecosystem may include audio content, movie studios, music studios, gaming audio studios, channel-based audio content, coding engines, game audio stems, game audio coding/rendering engines, and delivery systems.
The movie studios, the music studios, and the gaming audio studios may receive audio content. In some examples, the audio content may represent the output of an acquisition. The movie studios may output channel-based audio content (e.g., in 2.0, 5.1, and 7.1) such as by using a digital audio workstation (DAW). The music studios may output channel-based audio content (e.g., in 2.0 and 5.1) such as by using a DAW. In either case, the coding engines may receive and encode the channel-based audio content based on one or more codecs (e.g., AAC, AC3, Dolby True HD, Dolby Digital Plus, and DTS Master Audio) for output by the delivery systems. The gaming audio studios may output one or more game audio stems, such as by using a DAW. The game audio coding/rendering engines may code and/or render the audio stems into channel-based audio content for output by the delivery systems. Another example context in which the techniques may be performed includes an audio ecosystem that may include broadcast recording audio objects, professional audio systems, consumer on-device capture, HOA audio format, on-device rendering, consumer audio, TV, and accessories, and car audio systems.
The broadcast recording audio objects, the professional audio systems, and the consumer on-device capture may all code their output using HOA audio format. In this way, the audio content may be coded using the HOA audio format into a single representation that may be played back using the on-device rendering, the consumer audio, TV, and accessories, and the car audio systems. In other words, the single representation of the audio content may be played back at a generic audio playback system (i.e., as opposed to requiring a particular configuration such as 5.1, 7.1, etc.), such as audio playback system 16.
Another example context in which the techniques may be performed includes an audio ecosystem that may include acquisition elements and playback elements. The acquisition elements may include wired and/or wireless acquisition devices (e.g., Eigen microphones), on-device surround sound capture, and mobile devices (e.g., smartphones and tablets). In some examples, the wired and/or wireless acquisition devices may be coupled to the mobile device via wired and/or wireless communication channel(s).
In accordance with one or more techniques of this disclosure, the mobile device may be used to acquire a sound field. For instance, the mobile device may acquire a sound field via the wired and/or wireless acquisition devices and/or the on-device surround sound capture (e.g., a plurality of microphones integrated into the mobile device). The mobile device may then code the acquired sound field into the HOA coefficients for playback by one or more of the playback elements. For instance, a user of the mobile device may record (acquire a sound field of) a live event (e.g., a meeting, a conference, a play, a concert, etc.), and code the recording into HOA coefficients.
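As a minimal illustration of coding a sound into first-order HOA coefficients, the sketch below encodes a mono source as a plane wave from a known direction. It assumes the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization), which this document does not specify, and it assumes the source direction is already known; a real capture path would instead derive the coefficients from the microphone signals. The function name is hypothetical.

```python
import math


def encode_foa_ambix(samples, azimuth_deg, elevation_deg):
    """Encode a mono source as a plane wave into first-order ambisonics (ACN order, SN3D)."""
    az = math.radians(azimuth_deg)    # counterclockwise from straight ahead
    el = math.radians(elevation_deg)  # upward from the horizontal plane
    gains = (
        1.0,                          # ACN 0: W
        math.sin(az) * math.cos(el),  # ACN 1: Y
        math.sin(el),                 # ACN 2: Z
        math.cos(az) * math.cos(el),  # ACN 3: X
    )
    return [[g * s for s in samples] for g in gains]


w, y, z, x = encode_foa_ambix([1.0, 2.0], azimuth_deg=90.0, elevation_deg=0.0)
print(w, y, z)  # [1.0, 2.0] [1.0, 2.0] [0.0, 0.0]
```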
The mobile device may also utilize one or more of the playback elements to playback the HOA coded sound field. For instance, the mobile device may decode the HOA coded sound field and output a signal to one or more of the playback elements that causes the one or more of the playback elements to recreate the sound field. As one example, the mobile device may utilize the wired and/or wireless communication channels to output the signal to one or more speakers (e.g., speaker arrays, sound bars, etc.). As another example, the mobile device may utilize docking solutions to output the signal to one or more docking stations and/or one or more docked speakers (e.g., sound systems in smart cars and/or homes). As another example, the mobile device may utilize headphone rendering to output the signal to a set of headphones, e.g., to create realistic binaural sound.
In some examples, a particular mobile device may both acquire a 3D sound field and playback the same 3D sound field at a later time. In some examples, the mobile device may acquire a 3D sound field, encode the 3D sound field into HOA, and transmit the encoded 3D sound field to one or more other devices (e.g., other mobile devices and/or other non-mobile devices) for playback.
Yet another context in which the techniques may be performed includes an audio ecosystem that may include audio content, game studios, coded audio content, rendering engines, and delivery systems. In some examples, the game studios may include one or more DAWs which may support editing of HOA signals. For instance, the one or more DAWs may include HOA plugins and/or tools which may be configured to operate with (e.g., work with) one or more game audio systems. In some examples, the game studios may output new stem formats that support HOA. In any case, the game studios may output coded audio content to the rendering engines which may render a sound field for playback by the delivery systems.
The techniques may also be performed with respect to exemplary audio acquisition devices. For example, the techniques may be performed with respect to an Eigen microphone which may include a plurality of microphones that are collectively configured to record a 3D sound field. In some examples, the plurality of microphones of the Eigen microphone may be located on the surface of a substantially spherical ball with a radius of approximately 4 cm. In some examples, the audio encoding device 20 may be integrated into the Eigen microphone so as to output a bitstream 21 directly from the microphone.
Another exemplary audio acquisition context may include a production truck which may be configured to receive a signal from one or more microphones, such as one or more Eigen microphones. The production truck may also include an audio encoder, such as audio encoder 20.
The mobile device may also, in some instances, include a plurality of microphones that are collectively configured to record a 3D sound field. In other words, the plurality of microphones may have X, Y, Z diversity. In some examples, the mobile device may include a microphone which may be rotated to provide X, Y, Z diversity with respect to one or more other microphones of the mobile device. The mobile device may also include an audio encoder, such as audio encoder 20.
Example audio playback devices that may perform various aspects of the techniques described in this disclosure are further discussed below. In accordance with one or more techniques of this disclosure, speakers and/or sound bars may be arranged in any arbitrary configuration while still playing back a 3D sound field. Moreover, in some examples, headphone playback devices may be coupled to a decoder 24 via either a wired or a wireless connection. In accordance with one or more techniques of this disclosure, a single generic representation of a sound field may be utilized to render the sound field on any combination of the speakers, the sound bars, and the headphone playback devices.
A number of different example audio playback environments may also be suitable for performing various aspects of the techniques described in this disclosure. For instance, a 5.1 speaker playback environment, a 2.0 (e.g., stereo) speaker playback environment, a 9.1 speaker playback environment with full height front loudspeakers, a 22.2 speaker playback environment, a 16.0 speaker playback environment, an automotive speaker playback environment, and a mobile device with ear bud playback environment may be suitable environments for performing various aspects of the techniques described in this disclosure.
In accordance with one or more techniques of this disclosure, a single generic representation of a sound field may be utilized to render the sound field on any of the foregoing playback environments. Additionally, the techniques of this disclosure enable a renderer to render a sound field from a generic representation for playback on playback environments other than those described above. For instance, if design considerations prohibit proper placement of speakers according to a 7.1 speaker playback environment (e.g., if it is not possible to place a right surround speaker), the techniques of this disclosure enable a renderer to compensate with the other 6 speakers such that playback may be achieved on a 6.1 speaker playback environment.
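To illustrate rendering one generic representation to an arbitrary loudspeaker layout, the sketch below decodes horizontal first-order ambisonics (W, X, Y with SN3D normalization, Z ignored) by pointing a virtual cardioid at each loudspeaker. This is one simple decoder chosen for illustration, not the renderer referred to in this disclosure; it accepts any list of azimuths, but it only sums back to the source amplitude for evenly spaced layouts, and production renderers typically use mode-matching or AllRAD designs instead. The function name and layout are hypothetical.

```python
import math


def decode_virtual_cardioids(w, x, y, speaker_azimuths_deg):
    """Render horizontal first-order ambisonics (SN3D W, X, Y) to a ring of loudspeakers.

    Each feed is a virtual cardioid 0.5 * (W + cos(az) * X + sin(az) * Y) aimed at its
    loudspeaker, scaled by 2 / L so that an evenly spaced ring of L loudspeakers sums
    back to the W (omnidirectional) signal.
    """
    num = len(speaker_azimuths_deg)
    feeds = []
    for az_deg in speaker_azimuths_deg:
        az = math.radians(az_deg)
        c, s = math.cos(az), math.sin(az)
        feeds.append([(wi + c * xi + s * yi) / num for wi, xi, yi in zip(w, x, y)])
    return feeds


# A plane wave from straight ahead (SN3D: W = 1, X = 1, Y = 0) on six evenly spaced speakers.
layout = [0.0, 60.0, 120.0, 180.0, 240.0, 300.0]
feeds = decode_virtual_cardioids([1.0], [1.0], [0.0], layout)
print([round(f[0], 3) for f in feeds])  # [0.333, 0.25, 0.083, 0.0, 0.083, 0.25]
```

Passing a layout with one loudspeaker removed still produces feeds for the remaining loudspeakers, which mirrors the 6.1 compensation example above, although this simple decoder does not correct the resulting level and image imbalance.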
Moreover, a user may watch a sports game while wearing headphones. In accordance with one or more techniques of this disclosure, the 3D sound field of the sports game may be acquired (e.g., one or more Eigen microphones may be placed in and/or around the stadium), and HOA coefficients corresponding to the 3D sound field may be obtained and transmitted to a decoder. The decoder may reconstruct the 3D sound field based on the HOA coefficients and output the reconstructed 3D sound field to a renderer. The renderer may obtain an indication as to the type of playback environment (e.g., headphones) and render the reconstructed 3D sound field into signals that cause the headphones to output a representation of the 3D sound field of the sports game.
It should be noted that various functions performed by the one or more components of the systems and devices disclosed herein are described as being performed by certain components or modules. This division of components and modules is for illustration only. In an alternate implementation, a function performed by a particular component or module may be divided amongst multiple components or modules. Moreover, in an alternate implementation, two or more components or modules may be integrated into a single component or module. Each component or module may be implemented using hardware (e.g., a field-programmable gate array (FPGA) device, an application-specific integrated circuit (ASIC), a DSP, a controller, etc.), software (e.g., instructions executable by a processor), or any combination thereof.
Those of skill would further appreciate that the various illustrative logical blocks, configurations, modules, circuits, and algorithm steps described in connection with the implementations disclosed herein may be implemented as electronic hardware, computer software executed by a processing device such as a hardware processor, or combinations of both. Various illustrative components, blocks, configurations, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or executable software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the implementations disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in a memory device, such as random access memory (RAM), magnetoresistive random access memory (MRAM), spin-torque transfer MRAM (STT-MRAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, or a compact disc read-only memory (CD-ROM). An exemplary memory device is coupled to the processor such that the processor can read information from, and write information to, the memory device. In the alternative, the memory device may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a computing device or a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a computing device or a user terminal.
The previous description of the disclosed implementations is provided to enable a person skilled in the art to make or use the disclosed implementations. Various modifications to these implementations will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other implementations without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the implementations shown herein but is to be accorded the widest scope possible consistent with the principles and novel features as defined by the following claims.
The present application claims priority from U.S. Provisional Patent Application No. 62/492,106 filed Apr. 28, 2017, entitled “MULTI-ORDER MICROPHONE CONFIGURATIONS,” which is incorporated by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
20180317002 A1 | Nov 2018 | US |
Number | Date | Country | |
---|---|---|---|
62492106 | Apr 2017 | US |