Public safety and other two-way radio communications users often use remote speaker microphones. In noisy environments, a remote speaker microphone should be more sensitive to the voice of the user than it is to ambient noise. To accomplish this, some remote speaker microphones employ differential microphone arrays to form a directional response (that is, a beam pattern), which results in improved signal strength for audio received from a particular direction. Adaptive beamforming algorithms may be used to steer the beam pattern toward the desired sounds (for example, speech), while attenuating unwanted sounds (for example, ambient noise). Some remote speaker microphones employ multiple differential microphone arrays to produce a single voice output.
The accompanying figures, where like reference numerals refer to identical or functionally similar elements throughout the separate views, together with the detailed description below, are incorporated in and form part of the specification, and serve to further illustrate embodiments of concepts that include the claimed invention, and explain various principles and advantages of those embodiments.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present invention.
The apparatus and method components have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments of the present invention so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
Some communications devices (for example, remote speaker microphones) use microphone arrays (for example, differential microphone arrays) and adaptive beamforming to selectively receive sound coming from a particular direction, for example, from a user of the communications device. Differential microphone arrays (for example, broadside and endfire arrays) included in such devices are made up of two or more microphones (for example, micro-electrical-mechanical system (MEMS) microphones) spaced apart from one another along an axis. Such devices employ beam selection techniques to select beam patterns for the differential microphone arrays to focus audio reception on a desired sound source. Using such techniques, a communications device can enhance the ability to obtain desired speech from the user, and reduce interfering ambient noise to improve reception and the intelligibility of the received speech.
However, differential microphone arrays have a high-pass frequency dependence that increases (becomes more high-pass) as microphone spacing decreases and the target beam pattern becomes more directional. The high-pass frequency response of the raw differential microphone array requires equalization, which boosts any uncorrelated internal device noise and thereby raises the noise floor. In some cases, this can lead to objectionable voice audio quality when external ambient noise is not sufficient to mask the internal device noise. Voice audio quality problems can be exacerbated when signals from multiple differential microphone arrays with different microphone spacing are mixed to form a single output.
To address this concern, some devices use microphone arrays with increased spacing. However, while this lowers the noise floor, it also reduces the usable bandwidth of the array. Additionally, as user demand and other factors lead to smaller product sizes, microphone array spacing is decreasing, not increasing. Some devices address this concern by using less directional beam patterns. This lowers the noise floor in quiet environments, but reduces the array's rejection of interfering ambient noise. Other approaches add microphones to the arrays, which leads to increased product cost. Finally, some approaches apply filters (for example, a Wiener filter) to the output to suppress noise. However, this is ineffective in applications where noise suppression is not a viable option (for example, automatic speech recognition). Filters are also ineffective when there are multiple microphone arrays on a device with different noise floors, because the noise floor can vary as the beam switches from one array to the other, which makes it difficult for the Wiener filter to track the noise floor. Accordingly, systems and methods are provided herein for, among other things, adaptive white noise gain control and equalization for differential microphone arrays. Using embodiments described herein, differential microphone array beam patterns are adaptively ranged in directivity based on ambient noise.
One example embodiment provides an electronic device. The electronic device includes a microphone array and an electronic processor communicatively coupled to the microphone array. The electronic processor is configured to estimate an ambient noise level. The electronic processor is configured to compare the ambient noise level to a first threshold and a second threshold, the second threshold being lower than the first threshold. The electronic processor is configured to determine a beam pattern for the microphone array based on the comparison of the ambient noise level to the first threshold and the second threshold. The electronic processor is configured to apply the beam pattern to an audio signal received by the microphone array.
Another example embodiment provides a method for beamforming audio signals received from a microphone array. The method includes estimating an ambient noise level for an electronic device. The method includes comparing, with an electronic processor, the ambient noise level to a first threshold and a second threshold, the second threshold being lower than the first threshold. The method includes determining, with the electronic processor, a first beam pattern for a first microphone array positioned in the electronic device at a first orientation, the first beam pattern based on the comparison of the ambient noise level to the first threshold and the second threshold. The method includes comparing, with the electronic processor, the ambient noise level to a third threshold and a fourth threshold, the fourth threshold being lower than the third threshold. The method includes determining, with the electronic processor, a second beam pattern for a second microphone array positioned in the electronic device at a second orientation different from the first orientation, the second beam pattern based on the comparison of the ambient noise level to the third threshold and the fourth threshold. The method includes applying the first beam pattern to a first audio signal received by the first microphone array. The method includes applying the second beam pattern to a second audio signal received by the second microphone array.
For ease of description, some or all of the example systems presented herein are illustrated with a single exemplar of each of its component parts. Some examples may not describe or illustrate all components of the systems. Other example embodiments may include more or fewer of each of the illustrated components, may combine some components, or may include additional or alternative components.
It should be noted that, as used herein, the terms “beamforming” and “adaptive beamforming” refer to microphone beamforming using a differential microphone array, and one or more known or future-developed beamforming algorithms, or combinations thereof.
In the example illustrated, the remote speaker microphone 102 is communicatively coupled (for example, using a wired or wireless connection) to a portable radio 120 to provide input to (for example, an audio signal) and receive output from the portable radio 120. The portable radio 120 may be a portable two-way radio, for example, one of the Motorola® APX™ family of radios. In some embodiments, the components of the remote speaker microphone 102 may be integrated into a body-worn camera, a portable radio, a converged device, or another similar electronic communications device.
The electronic processor 104 obtains and provides information (for example, from the memory 106 and/or the input/output interface 108), and processes the information by executing one or more software instructions or modules, capable of being stored, for example, in a random access memory (“RAM”) area or a read only memory (“ROM”) of the memory 106 or in another non-transitory computer readable medium (not shown). The software can include firmware, one or more applications, program data, filters, rules, one or more program modules, and other executable instructions. The electronic processor 104 is configured to retrieve from the memory 106 and execute, among other things, software related to the control processes and methods described herein.
The memory 106 can include one or more non-transitory computer-readable media, and includes a program storage area and a data storage area. The program storage area and the data storage area can include combinations of different types of memory, as described herein. In the embodiment illustrated, the memory 106 stores, among other things, an adaptive beamformer 122 (described in detail below).
The input/output interface 108 is configured to receive input and to provide system output. The input/output interface 108 obtains information and signals from, and provides information and signals to, (for example, over one or more wired and/or wireless connections) devices both internal and external to the remote speaker microphone 102.
The human machine interface (HMI) 110 receives input from, and provides output to, users of the remote speaker microphone 102. The HMI 110 may include a keypad, switches, buttons, soft keys, indicator lights, haptic vibrators, a display (for example, a touchscreen), or the like. In some embodiments, the remote speaker microphone 102 is configurable by a user via the human machine interface 110.
The first microphone array 112 includes two or more microphones that sense sound, for example, the speech sound waves 150 generated by a speech source 152 (for example, a human speaking). In some embodiments, the first microphone array 112 is an endfire array. The first microphone array 112 converts the speech sound waves 150 to electrical signals, and transmits the electrical signals to the electronic processor 104. The second microphone array 114 and the third microphone array 116 contain similar components and operate similarly to the first microphone array 112. In the illustrated example, each of the first microphone array 112, the second microphone array 114, and the third microphone array 116 is positioned in a housing (not shown) of the remote speaker microphone 102 at a different orientation to pick up audio signals originating in different directions relative to the remote speaker microphone 102. For example, the microphone arrays may be oriented along the x, y, and z axes (relative to the housing) of the remote speaker microphone 102. The electronic processor 104 processes the electrical signals received from the first microphone array 112, the second microphone array 114, and the third microphone array 116, for example, using the adaptive beamformer 122 according to the methods described herein, to produce an output audio signal. The electronic processor 104 provides the output audio signal to the portable radio 120 for voice encoding and transmission.
In some embodiments, the first microphone array 112, the second microphone array 114, and the third microphone array 116 have different microphone spacing. In one example embodiment, the first microphone array 112 is spaced at fifteen millimeters, the second microphone array 114 is spaced at sixty-five millimeters, and the third microphone array 116 is spaced at fifteen millimeters. In some embodiments, the remote speaker microphone 102 has fewer than three microphone arrays.
Oftentimes, the speech source 152 is not the only source of sound waves near the remote speaker microphone 102. For example, a user of the remote speaker microphone 102 may be in an environment with competing sound sources 160 (for example, equipment operating, traffic sounds, other people speaking, and the like), which produce ambient noise sound waves 164. In order to assure timely and accurate communications, the microphones of the first microphone array 112, the second microphone array 114, and the third microphone array 116 are configured to produce responses defined by beam patterns to pick up desirable sound waves (for example, from the speech source 152), while attenuating undesirable sound waves (for example, from the competing sound sources 160).
Current beamformers use directional beam patterns to pick up the user's voice (that is, the desired sound). However, such directional patterns may result in output signals that include too much internal device noise (self-noise) in situations where the competing sound sources 160 are not present. Accordingly, embodiments provide, among other things, methods for beamforming audio signals received from the microphone arrays based on the ambient noise levels.
By way of example, the methods presented are described in terms of the remote speaker microphone 102.
The method 400 is described as being performed by the remote speaker microphone 102 and, in particular, the electronic processor 104. However, it should be understood that in some embodiments, portions of the method 400 may be performed external to the remote speaker microphone 102 by other devices, including for example, the portable radio 120. For example, the remote speaker microphone 102 may be configured to send input audio signals from the first microphone array 112 to the portable radio 120, which, in turn, processes the input audio signals as described below.
The method 400 begins at block 402, with the electronic processor 104 receiving audio signals from the microphone array. The audio signals are electrical signals based on acoustic input from the speech sound waves 150 and the ambient noise sound waves 164, detected by the microphone array.
At block 404, the electronic processor 104 estimates the ambient noise level. In some embodiments, the electronic processor 104 estimates the ambient noise level using a moving average of an audio signal power for the microphone array. For example, a moving average of the audio signal power with a time constant significantly longer than the dynamic scale of speech measures a relatively stable noise level rather than rapidly varying speech. In some embodiments, the electronic processor 104 estimates the ambient noise level using a voice activity detection system. For example, the voice activity detection system can be used to identify when speech sounds are absent. An average ambient noise level measured during noise-only segments can be used to continuously estimate an ambient noise level.
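The moving-average estimate described above can be sketched as follows. This is a minimal illustration, not the implementation of the described embodiments: it assumes an exponential moving average over per-frame mean-square power, and the names `frame_power`, `update_noise_estimate`, and `to_db`, the 20 ms frame length, and the 5 s time constant are all hypothetical choices that do not come from the description.

```python
import math


def frame_power(samples):
    """Mean-square power of one audio frame (a non-empty list of floats)."""
    return sum(s * s for s in samples) / len(samples)


def update_noise_estimate(previous, power, frame_s=0.02, time_constant_s=5.0):
    """One step of an exponential moving average of frame power.

    frame_s and time_constant_s are hypothetical values. The time constant
    is chosen to be much longer than syllable-scale speech dynamics so that
    the estimate follows the slowly varying noise level, not the speech.
    """
    alpha = 1.0 - math.exp(-frame_s / time_constant_s)
    return previous + alpha * (power - previous)


def to_db(power, floor=1e-12):
    """Convert a power value to decibels, clamping to avoid log10(0)."""
    return 10.0 * math.log10(max(power, floor))
```

In use, each new frame would update a running estimate, for example `estimate = update_noise_estimate(estimate, frame_power(frame))`, and `to_db(estimate)` would then be compared against the thresholds. A voice-activity-gated variant would simply skip the update on frames flagged as speech.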
The electronic processor 104, as described in detail below, compares the ambient noise level to a first threshold (e.g., an upper threshold) and a second threshold (e.g., a lower threshold) to determine a beam pattern. The first and second thresholds are audio power levels. In some embodiments, the first threshold's level is set such that the ambient noise masks an amplified self-noise associated with a fully directional configuration of the microphone array in a transmitted audio signal from the electronic device. In some embodiments, the second threshold's level, which is lower than the first threshold's level, is set such that the ambient noise fails to mask an amplified self-noise associated with a fully directional configuration of the microphone array in a transmitted audio signal from the electronic device.
At block 406, the electronic processor 104 determines whether the ambient noise level exceeds the first threshold. At block 408, when the ambient noise level exceeds the first threshold, the electronic processor 104 determines a beam pattern by selecting a fully directional beam pattern. For example, the electronic processor 104 may select a cardioid beam pattern.
When the ambient noise level does not exceed the first threshold, the electronic processor 104 determines whether the ambient noise level is below the second threshold, at block 410. At block 412, when the ambient noise level is below the second threshold, the electronic processor 104 determines a beam pattern by selecting an omnidirectional beam pattern.
When the ambient noise level is not below the second threshold, and does not exceed the first threshold, the ambient noise level is between the first threshold and the second threshold. At block 414, in response to this condition, the electronic processor 104 determines a beam pattern by selecting an intermediate beam pattern, between fully directional and omnidirectional, with a degree of directionality based on the ambient noise level. In some embodiments, the intermediate beam pattern is selected by varying a beamforming coefficient as a continuous monotonic function of the ambient noise level. For example, for a cardioid beam pattern, the beamformer output at frequency f can be described by the following equation:
Y(f)=X1(f)−H(f)·exp(−jfT)·X2(f)
where X1(f) is the raw omnidirectional signal from a first microphone of a two-microphone differential array;
X2(f) is the raw omnidirectional signal from a second microphone of a two-microphone differential array; and
H(f) is the relative signal strength factor between the first and second microphones.
This example beamformer has a high pass filter frequency response (that is, it attenuates low frequencies). As a consequence, in this example, a beamformer equalization factor Heq(f) is applied at each frequency to flatten the response.
In order to adjust the beamformer response based on the ambient noise, a correction factor (K) is added:
Y(f)=X1(f)−K·H(f)·exp(−jfT)·X2(f)
The correction factor ranges from K=0 when the ambient noise level is at or below the lower threshold to K=1 when the ambient noise level is high (for example, at or above the upper threshold). When K=0, the beamformer output is equal to the raw omnidirectional signal from the first microphone. When K=1, the beamformer output is equal to the classic cardioid pattern. In between the thresholds, K could be varied log-linearly with the ambient noise level to smoothly transition from an omnidirectional to a cardioid beam pattern.
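The mapping from ambient noise level to the correction factor K can be sketched as below. This is one possible reading of "log-linearly", assuming the noise level and both thresholds are expressed in decibels so that a straight line in dB is log-linear in power. The function name `correction_factor` and the threshold values used in the usage note are hypothetical.

```python
def correction_factor(noise_db, lower_db, upper_db):
    """Map an ambient noise level (dB) to a correction factor K in [0, 1].

    K = 0 at or below the lower threshold (omnidirectional),
    K = 1 at or above the upper threshold (fully directional),
    and K varies linearly in dB in between (an assumed interpolation).
    """
    if upper_db <= lower_db:
        raise ValueError("upper threshold must exceed lower threshold")
    if noise_db <= lower_db:
        return 0.0
    if noise_db >= upper_db:
        return 1.0
    return (noise_db - lower_db) / (upper_db - lower_db)
```

With hypothetical thresholds of 50 dB and 70 dB, a 60 dB noise level gives `correction_factor(60.0, 50.0, 70.0)`, which is halfway between the omnidirectional and cardioid settings. Any other continuous monotonic function could be substituted for the linear segment.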
The beamformer equalization factor is also modified proportionally with K, from Heq(f) for the cardioid pattern when K=1 to a flat equalization (unity gain) when K=0, for example:
Heq,mod(f)=1+K·[Heq(f)−1].
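The two relations above, the K-weighted beamformer output and the K-weighted equalization factor, can be sketched for a single frequency bin as follows. This is an illustrative sketch only. The description does not define T or the units of f, so the code assumes T is an inter-microphone delay in seconds and that the exponent is evaluated as an angular frequency (2π times the frequency in hertz) times that delay. The names `modified_eq`, `beamform_bin`, `freq_hz`, and `delay_s` are hypothetical.

```python
import cmath
import math


def modified_eq(h_eq, k):
    """Equalization factor blended by K: 1 + K * (Heq - 1).

    Returns unity (flat) at K = 0 and the full cardioid Heq at K = 1.
    """
    return 1.0 + k * (h_eq - 1.0)


def beamform_bin(x1, x2, h, k, freq_hz, delay_s):
    """Beamformer output for one bin: X1 - K * H * exp(-j*w*T) * X2.

    Assumption: w = 2*pi*freq_hz and delay_s plays the role of T; the
    source writes the exponent as exp(-jfT) without defining T.
    """
    phase = cmath.exp(-1j * 2.0 * math.pi * freq_hz * delay_s)
    return x1 - k * h * phase * x2


def equalized_output(x1, x2, h, h_eq, k, freq_hz, delay_s):
    """Apply the blended equalization to the blended beamformer output."""
    return modified_eq(h_eq, k) * beamform_bin(x1, x2, h, k, freq_hz, delay_s)
```

At K=0 the equalized output reduces to the first microphone's spectrum value unchanged, and at K=1 it is the equalized cardioid output, which matches the two limiting cases stated in the description.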
Regardless of which beam pattern is determined, at block 416, the electronic processor 104 applies the beam pattern to an audio signal received by the microphone array to produce an audio output.
As noted above, in some embodiments, the remote speaker microphone 102 may include multiple microphone arrays (for example, the first microphone array 112 and the second microphone array 114). In such embodiments, the first microphone array 112 is positioned in the remote speaker microphone 102 at a first orientation (for example, along the z axis) and the second microphone array 114 is positioned at a second orientation different from the first orientation (for example, along the x axis). In such embodiments, the electronic processor 104, using methods described above, determines beam patterns separately for the first microphone array 112 and the second microphone array 114. In such embodiments, the second microphone array 114 has different characteristics from the first microphone array 112.
To determine the beam pattern for the second microphone array 114, a third threshold and a fourth threshold are used. The third threshold is analogous to the first threshold described above, and the fourth threshold is analogous to the second threshold. The values for the third and fourth thresholds are determined similarly to the values for the first and second thresholds, but using the characteristics of the second microphone array 114. The electronic processor 104 determines a second beam pattern for the second microphone array 114 based on the comparison of the ambient noise level to the third threshold and the fourth threshold, as described above with respect to the first microphone array 112. The electronic processor 104 applies the second beam pattern to a second audio signal received by the second microphone array. The beamformed audio signals from the first microphone array 112 and the second microphone array 114 are mixed to produce a combined audio output signal.
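The per-array threshold comparison and the final mix can be sketched as below. This is a hedged illustration: the threshold values in `ARRAY_THRESHOLDS_DB` are invented placeholders, the linear-in-dB interpolation is an assumed choice of monotonic function, and the equal-weight average in `mix` is an assumption, since the description states only that the beamformed signals are mixed and does not say how.

```python
# Hypothetical (lower, upper) thresholds in dB for each array. The second
# array uses its own pair, standing in for the fourth and third thresholds.
ARRAY_THRESHOLDS_DB = {
    "first": (45.0, 65.0),
    "second": (35.0, 55.0),
}


def k_for_array(noise_db, lower_db, upper_db):
    """Correction factor for one array, clamped to the range [0, 1]."""
    if noise_db <= lower_db:
        return 0.0
    if noise_db >= upper_db:
        return 1.0
    return (noise_db - lower_db) / (upper_db - lower_db)


def per_array_k(noise_db, thresholds=ARRAY_THRESHOLDS_DB):
    """Evaluate the same ambient noise level against each array's thresholds."""
    return {
        name: k_for_array(noise_db, lower, upper)
        for name, (lower, upper) in thresholds.items()
    }


def mix(frames):
    """Equal-weight mix of equally long beamformed frames (an assumption)."""
    count = len(frames)
    return [sum(values) / count for values in zip(*frames)]
```

For a single ambient noise estimate of 50 dB, `per_array_k(50.0)` yields a different degree of directionality for each array because each array is compared against its own thresholds, which is the point of keeping the threshold pairs separate.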
The method 400 may be applied similarly to embodiments of the remote speaker microphone 102 that include three microphone arrays.
Regardless of how it is generated, the audio output signal may then be further processed or transmitted to the portable radio 120 for voice encoding and transmission.
In the foregoing specification, specific embodiments have been described. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the invention as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of present teachings.
The benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as critical, required, or essential features or elements of any or all the claims. The invention is defined solely by the appended claims including any amendments made during the pendency of this application and all equivalents of those claims as issued.
Moreover, in this document, relational terms such as first and second, top and bottom, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” “has,” “having,” “includes,” “including,” “contains,” “containing” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises, has, includes, contains a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “comprises . . . a,” “has . . . a,” “includes . . . a,” or “contains . . . a” does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises, has, includes, contains the element. The terms “a” and “an” are defined as one or more unless explicitly stated otherwise herein. The terms “substantially,” “essentially,” “approximately,” “about” or any other version thereof, are defined as being close to as understood by one of ordinary skill in the art, and in one non-limiting embodiment the term is defined to be within 10%, in another embodiment within 5%, in another embodiment within 1% and in another embodiment within 0.5%. The term “coupled” as used herein is defined as connected, although not necessarily directly and not necessarily mechanically. A device or structure that is “configured” in a certain way is configured in at least that way, but may also be configured in ways that are not listed.
It will be appreciated that some embodiments may be comprised of one or more generic or specialized processors (or “processing devices”) such as microprocessors, digital signal processors, customized processors and field programmable gate arrays (FPGAs) and unique stored program instructions (including both software and firmware) that control the one or more processors to implement, in conjunction with certain non-processor circuits, some, most, or all of the functions of the method and/or apparatus described herein. Alternatively, some or all functions could be implemented by a state machine that has no stored program instructions, or in one or more application specific integrated circuits (ASICs), in which each function or some combinations of certain of the functions are implemented as custom logic. Of course, a combination of the two approaches could be used.
Moreover, an embodiment can be implemented as a computer-readable storage medium having computer readable code stored thereon for programming a computer (e.g., comprising a processor) to perform a method as described and claimed herein. Examples of such computer-readable storage mediums include, but are not limited to, a hard disk, a CD-ROM, an optical storage device, a magnetic storage device, a ROM (Read Only Memory), a PROM (Programmable Read Only Memory), an EPROM (Erasable Programmable Read Only Memory), an EEPROM (Electrically Erasable Programmable Read Only Memory) and a Flash memory. Further, it is expected that one of ordinary skill, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, when guided by the concepts and principles disclosed herein will be readily capable of generating such software instructions and programs and ICs with minimal experimentation.
The Abstract of the Disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in various embodiments for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separately claimed subject matter.