Embodiments of the present disclosure relate to a method and apparatus for processing an audible signal to form a processed audible signal that has an improved signal-to-noise ratio.
The popularity of, and reliance on, electronic devices such as smart phones, touch pads, PDAs, portable computers, and portable music players has increased dramatically in the past decade. Videotelephony and video conferencing devices have also become more popular in recent years, thanks in large part to the proliferation of high speed Internet and price reductions in the supporting equipment. As the number of electronic devices and the reliance on these electronic devices have increased, there has been a desire for these devices to receive and process an audible input signal received from a user so that the audible input can be used to enable some desired task to be performed.
For years there has been a desire to construct machines that can recognize, process and/or transmit various types of audible inputs received from a human being. Although in recent years this goal has begun to be realized, currently available systems have not been able to produce results that are able to accurately detect these received audible inputs in environments where external noise is common or not well controlled. In most conventional microphone containing devices that are configured to recognize and/or process various types of audible inputs, it is often hard for the audible input processing electronics (e.g., voice recognition hardware) to clearly separate the desired human speech from the unwanted noise. This inability to separate audible inputs from the surrounding noise within the environment is primarily due to difficulties that are involved in extracting and identifying the individual sounds that make up the human speech. These difficulties are exacerbated in noisy environments. Simplistically, speech may be considered as a sequence of sounds taken from basic sounds called “phonemes,” produced by a human. One or more phonemes represent a word or a phrase. Thus, extraction of the particular phonemes contained within the received speech is necessary to achieve voice recognition, which is often extremely difficult in noisy environments.
Moreover, conventional voice or speech recognition hardware is typically limited to detecting speech within the lower end of the speech frequency range, such as between about 100 hertz (Hz) and about 3,000 Hz, due to limitations in the device's sampling frequency and the geometry of the microphone assemblies. Thus, a large amount of useful data is lost by these conventional designs, since they cannot detect speech throughout the full speech range, which extends between about 100 Hz and about 8,000 Hz, and they therefore lose the information found in the higher end of the speech range between about 3,000 Hz and about 8,000 Hz.
As the popularity of voice recognition systems increases, many users utilize them in a variety of environments. Use of these various devices is common in a myriad of moderately noisy to excessively noisy environments such as an office, conference room, airport, or restaurant. Several conventional methods for performing noise reduction already exist; however, many conventional methods can be categorized as types of filtering. In the related art, speech and noise are acquired in the same input channel, where they reside in the same frequency band and may have similar correlation properties. Consequently, filtering will inevitably have an effect on both the speech signal and the background noise signal. Distinguishing between voice and background noise signals is a challenging task. Speech components, which are received by conventional electronic devices, may be perceived as noise components and may be suppressed or filtered along with the noise components. While voice recognition technology is increasingly sophisticated, a clear separation of the voice component of an audio signal from noise components, or in other words a high signal-to-noise ratio (SNR), is required for acceptable levels of accuracy in the voice recognition or even, in some cases, the delivery and reproduction of the received audio signal at a distant location.
Additionally, as the number of electronic devices and the reliance on these electronic devices has increased, there has been a desire for electronic devices that are untethered to conventional wall outlet types of power sources, thus allowing these untethered electronic devices to be portable. However, the power supply in portable electronic devices is commonly limited by a finite energy storage capacity provided by a battery. The rate of energy consumption by the device determines the time of operation of the device until the battery needs to be recharged or replaced. Therefore, it is desirable to find ways to reduce the power consumption used by the portable device's electronic components, such as voice recognition elements, to improve the battery lifetime of the portable electronic devices.
Therefore, there is a need for an electronic device that solves the problems described above. Moreover, there is a need for a portable electronic device that is able to efficiently filter out unwanted noise from an audible input that is received from an audible source.
Embodiments of the disclosure generally include a method and apparatus for receiving and separating unwanted external noise from an audible input received from an audible source. Embodiments of the disclosure may include an audible signal processing system that contains a plurality of audible signal sensing devices (e.g., microphones) that are arranged and configured to detect an audible signal that is generated and provided to the audible signal processing system from any position or angle within three dimensional (3-D) space.
Embodiments of the disclosure may include a direction detection device configured to determine a direction from which an audible signal is received, wherein the direction detection device includes a delay determination algorithm and a direction determination algorithm that are stored in a memory of an electronic device. The delay determination algorithm includes a number of instructions which, when executed by a processor, causes the electronic device to perform operations including determining a first time difference between when a first microphone received a first portion of an audible signal and when a second microphone received the first portion of the audible signal, and determining a second time difference between when the first microphone received the first portion of the audible signal and when a third microphone received the first portion of the audible signal. The direction determination algorithm includes a number of instructions which, when executed by the processor, causes the electronic device to perform operations including determine a direction from which the audible signal was received based on a comparison of the first time difference and the second time difference with a plurality of time delay values that are stored in memory. The stored plurality of time delay values may include time delay values that are each associated with a direction that is aligned relative to an orientation of the first, the second and the third microphones.
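By way of illustration only, the comparison of the first and second time differences with the stored time delay values may be sketched as a nearest-match lookup. The table contents, the direction labels in degrees, the name `nearest_direction`, and the use of a Euclidean distance as the closeness measure are all assumptions made for this sketch and are not taken from the disclosure.

```python
import math

# Hypothetical stored table: direction (degrees) -> (first, second) time
# difference in seconds. The values are illustrative only.
STORED_DELAYS = {
    0: (0.0, 1.0e-4),
    90: (1.0e-4, 0.0),
    180: (0.0, -1.0e-4),
    270: (-1.0e-4, 0.0),
}


def nearest_direction(first_diff, second_diff, table=STORED_DELAYS):
    """Return the stored direction whose delay pair is closest to the measured pair.

    Closeness is measured here as Euclidean distance between delay pairs,
    which is an assumption of this sketch.
    """
    return min(
        table,
        key=lambda d: math.hypot(first_diff - table[d][0],
                                 second_diff - table[d][1]),
    )


# A measured pair close to the 90 degree entry of the hypothetical table.
direction = nearest_direction(0.9e-4, 0.1e-4)
```

A denser table would give finer angular resolution at the cost of more comparisons per audible signal portion.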
Embodiments of the disclosure may further include a direction detection device configured to determine a direction from which an audible signal is received, wherein the direction detection device includes a delay determination algorithm and a direction determination algorithm that are stored in a memory of an electronic device. The delay determination algorithm may include a number of instructions which, when executed by a processor, causes the electronic device to perform operations including analyzing an audible signal that comprises a plurality of audible signal portions that are sequentially received in time, wherein analyzing the audible signal comprises analyzing each of the audible signal portions to determine a first time difference between when a first microphone received an audible signal portion and when a second microphone received the audible signal portion, and determine a second time difference between when the first microphone received the audible signal portion and when a third microphone received the audible signal portion. The direction determination algorithm comprises a number of instructions which, when executed by the processor, causes the electronic device to perform operations comprising assigning a direction to each of the audible signal portions by comparing the first time difference and the second time difference determined for each of the audible signal portions with a plurality of time delay values that are stored in memory, and determining the direction from which the audible signal was received by determining which of the assigned directions for each of the audible signal portions occurred the most number of times over a period of time.
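As a non-limiting sketch, selecting the assigned direction that occurred most often over a period of time may be expressed as a simple tally. The function name `dominant_direction` and the handling of an empty input are assumptions of this example; in the event of a tie, this sketch returns the tied direction that was encountered first, which is likewise an assumption.

```python
from collections import Counter


def dominant_direction(assigned_directions):
    """Return the direction assigned to the greatest number of signal portions."""
    if not assigned_directions:
        raise ValueError("no audible signal portions were analyzed")
    counts = Counter(assigned_directions)
    # most_common(1) yields a single (direction, count) pair.
    direction, _count = counts.most_common(1)[0]
    return direction


# Directions (in degrees) assigned to five sequential signal portions.
result = dominant_direction([90, 90, 0, 90, 180])
```

Tallying over several portions makes the result less sensitive to a single portion that was assigned an incorrect direction, for example because of a noise burst.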
Embodiments of the disclosure may further include a method of determining a direction from which an audible signal is received that includes determining, by use of an electronic device, a direction from which an audible signal was received, wherein the audible signal comprises a plurality of audible signal portions that are sequentially received in time, and wherein determining the direction from which the audible signal was received includes determining a direction from which each of the plurality of audible signal portions was received by determining a first relative time delay, wherein the first relative time delay is determined by calculating a difference between a time when a first microphone received an audible signal portion and a time when a second microphone received the same audible signal portion, determining a second relative time delay, wherein the second relative time delay is determined by calculating the difference between when the first microphone received the same audible signal portion and when a third microphone received the same audible signal portion, comparing the first relative time delay and the second relative time delay with a plurality of stored time delays, and determining that the audible signal was received from a direction based on the comparison of the first relative time delay and the second relative time delay with the plurality of stored time delays.
Embodiments of the disclosure may further include a method of determining a direction from which an audible signal is received that includes determining, by use of an electronic device, when a first portion of an audible signal was received by each microphone disposed within an array of microphones, wherein the array of microphones comprises a first microphone, a second microphone and a third microphone, and determining, by use of the electronic device, a direction from which the first portion of the received audible signal was received. The process of determining the direction will include determining a first relative time delay, wherein the first relative time delay is calculated by determining the difference between a time when the second microphone received the first portion of the received audible signal and a time when the first microphone received the first portion of the received audible signal, determining a second relative time delay, wherein the second relative time delay is calculated by determining the difference between a time when the third microphone received the first portion of the received audible signal and the time when the first microphone received the first portion of the received audible signal, calculating a first time delay ratio by dividing the first relative time delay by the second relative time delay, comparing the first time delay ratio with a plurality of stored time delay ratios, and determining that the first time delay ratio is closer to a first stored time delay ratio that is associated with a first direction than a second stored time delay ratio that is associated with a second direction.
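The time delay ratio comparison described above may be sketched as follows, purely as an example. The function name `closest_ratio_direction`, the dictionary of stored ratios, and the decision to raise an error when the second relative time delay is zero are assumptions of this sketch; the disclosure does not state how that case is handled.

```python
def closest_ratio_direction(t_first, t_second, t_third, stored_ratios):
    """Pick the stored direction whose time delay ratio is closest to the measured one.

    t_first, t_second, t_third: arrival times (seconds) of the same signal
    portion at the first, second and third microphones.
    stored_ratios: hypothetical mapping of direction label -> stored ratio.
    """
    first_delay = t_second - t_first   # second microphone relative to first
    second_delay = t_third - t_first   # third microphone relative to first
    if second_delay == 0:
        # Assumption: the ratio is undefined here, so the caller must decide.
        raise ValueError("second relative time delay is zero")
    ratio = first_delay / second_delay
    return min(stored_ratios, key=lambda d: abs(ratio - stored_ratios[d]))


stored = {"first direction": 0.5, "second direction": 2.0}
# Measured ratio is 0.6, which is closer to 0.5 than to 2.0.
chosen = closest_ratio_direction(0.0, 0.6e-4, 1.0e-4, stored)
```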
Embodiments of the disclosure may further include a direction detection device configured to determine a direction from which an audible signal is received, comprising a delay determination algorithm and a direction determination algorithm that are stored in a memory of an electronic device. The delay determination algorithm may include a number of instructions which, when executed by a processor, causes the electronic device to perform operations comprising analyzing an audible signal that comprises a plurality of audible signal portions that are sequentially received in time, wherein analyzing the audible signal comprises analyzing each of the audible signal portions to determine a first time difference between when a first microphone received an audible signal portion and when a second microphone received the audible signal portion. The direction determination algorithm may include a number of instructions which, when executed by the processor, causes the electronic device to perform operations including comparing each of the determined first time differences of each of the audible signal portions with a plurality of stored time delays, determining a direction for each of the plurality of audible signal portions based on the comparison, and determining the direction from which the audible signal was received by determining which of the determined directions for each of the audible signal portions occurred the most number of times over a period of time.
The plurality of stored time delays will include a first stored time delay that is associated with the external audible source being positioned a distance from the first and second microphones along a first direction, and a second stored time delay that is associated with the external audible source being positioned a distance from the first and second microphones along a second direction, wherein the first direction and the second direction each extend from a vertex point, and a region formed between the first direction and the second direction comprises a first angular distance.
Embodiments of the disclosure may further include a method of determining a direction from which an audible signal is received that includes defining an audible signal detection region by dividing a first angular distance created between a first microphone and a second microphone that are disposed on an electronic device into at least two regions, wherein one of the at least two regions comprises a first angular distance that is formed between a first direction and a second direction that each extend from a vertex point, determining, by use of an electronic device, a first relative time delay created by the delivery of a first portion of an audible signal to the first microphone and the second microphone from the external audible source, wherein the first relative time delay is calculated by determining a difference between a time when the second microphone received the first portion of the audible signal and a time when the first microphone received the first portion of the audible signal, comparing, by use of the electronic device, the first relative time delay with a plurality of stored time delays, and determining, by use of the electronic device, that the external audible source is positioned in a direction that is closest to a third direction by determining that the first portion of the audible signal was received from a direction that is closer to the third direction, which is positioned between the first and second directions, than to a fourth direction that is positioned outside of the first angular distance formed between the first and second directions, based on the comparison of the first relative time delay with the first and second stored time delays.
The plurality of stored time delays include a first stored time delay that is associated with the external audible source being positioned a distance from the first and second microphones along the first direction, and a second stored time delay that is associated with the external audible source being positioned a distance from the first and second microphones along the second direction.
Embodiments of the disclosure may further include a method of determining a direction from which an audible signal is received, comprising defining an audible signal detection region by dividing a first angular distance created between a first microphone and a second microphone that are disposed on an electronic device into at least two regions, wherein one of the at least two regions comprises a first angular distance that is formed between a first direction and a second direction that each extend from a vertex point, determining, by use of an electronic device, a first relative time delay created by the delivery of a first portion of an audible signal to the first microphone and the second microphone from the external audible source, wherein the first relative time delay is calculated by determining the difference between the time when the second microphone received the first portion of the audible signal and the time when the first microphone received the first portion of the audible signal, comparing, by use of the electronic device, the first relative time delay with a plurality of stored time delays, and determining, by use of the electronic device, that the first portion of the audible signal was received from a direction that is closer to the second direction than the first direction based on the comparison of the first relative time delay with the first and second stored time delays. The plurality of stored time delays may include a first stored time delay that is associated with the external audible source being positioned a distance from the first and second microphones along the first direction, and a second stored time delay that is associated with the external audible source being positioned a distance from the first and second microphones along the second direction.
So that the manner in which the above recited features of the invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements disclosed in one embodiment may be beneficially utilized on other embodiments without specific recitation. The drawings referred to here should not be understood as being drawn to scale unless specifically noted. Also, the drawings are often simplified and details or components omitted for clarity of presentation and explanation. The drawings and discussion serve to explain principles discussed below, where like designations denote like elements.
In the following description, numerous specific details are set forth to provide a more thorough understanding of the embodiments of the present disclosure. However, it will be apparent to one of skill in the art that one or more of the embodiments of the present disclosure may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring one or more of the embodiments of the present disclosure.
Embodiments of the disclosure generally include a method and apparatus for receiving and separating unwanted external noise from an audible input received from an audible source. Embodiments of the disclosure may include an audible signal processing system that contains a plurality of audible signal sensing devices (e.g., microphones) that are arranged and configured to detect an audible signal that is generated and provided to the audible signal processing system from any position or angle within three dimensional (3-D) space. The audible signal processing system is configured to analyze the audible signals received by each of the plurality of audible signal sensing devices using a first signal processing technique that is able to separate unwanted low frequency range noise from the detected audible signals and a second signal processing technique that is able to separate unwanted higher frequency range noise from the detected audible signals. The audible signal processing system is then configured to combine the signals processed by the first and second signal processing techniques to form a desired audible signal that has a high signal-to-noise ratio throughout a desired frequency range, such as the full speech range.
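The combining of the two processed signals may be pictured, in one hedged example, as a complementary crossover. The disclosure does not specify how the combination is performed, so the one-pole low-pass filter, the smoothing factor `alpha`, and the function names below are assumptions chosen only to keep the sketch short: the low band is taken from the output of the first technique, the high band from the output of the second technique, and the two bands are summed.

```python
def one_pole_lowpass(samples, alpha):
    """Simple one-pole low-pass filter: state moves toward each input by alpha."""
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def combine_bands(low_processed, high_processed, alpha=0.2):
    """Sum the low band of one signal with the complementary high band of another.

    low_processed: output of the (assumed) low frequency technique.
    high_processed: output of the (assumed) higher frequency technique.
    """
    low = one_pole_lowpass(low_processed, alpha)
    high_lowpassed = one_pole_lowpass(high_processed, alpha)
    # Complementary high band: the signal minus its own low-passed version.
    high = [x - lp for x, lp in zip(high_processed, high_lowpassed)]
    return [lo + hi for lo, hi in zip(low, high)]


# When both inputs are the same signal, the two bands sum back to that signal.
x = [0.0, 1.0, 0.5, -0.25, 0.75]
y = combine_bands(x, x)
```

Because the high band is formed as the complement of the low band, the split itself adds no gain or loss when both inputs are identical; a practical design would likely use a steeper crossover than this one-pole example.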
In some configurations, the audible signal processing system is designed to be portable and thus run on a power source that has a finite amount of energy stored therein (e.g., battery). Therefore, in some cases the audible signal processing system may be further configured to receive and separate the unwanted external noise from a received audible input in an efficient manner to extend the operation time of the portable audible signal processing system. The audible signal processing system may also be configured to receive an audible signal from an external source, efficiently remove or separate unwanted noise from the received audible signal, and then deliver the processed audible signal to a software application that is configured to further process and/or perform some desired activity based on the processed audible signal. The audible signal processing system may also be configured to deliver the processed audible signal to another electronic device that is configured to receive and process the received information so that the second device can perform some desired activity.
The electronic device 102 will include a plurality of audible signal detection devices that are positioned in a geometrical array across one or more surfaces of the electronic device 102. In some embodiments, the geometrical array of audible signal detection devices, or hereafter referred to as microphones 101, can be positioned in a two dimensional (2-D) array of microphones 101 or a three dimensional (3-D) array of microphones, which may include microphones 101 and one or more microphones 121. The electronic device 102 may be any desirable shape, such as the cylindrical shape shown in
In some embodiments, the microphones 101 are positioned in a two-dimensional (2-D) geometrical array across the top surface 106 and/or side surface 108 of the electronic device 102. In one example, as shown in
The audible signal processing device 400 generally includes electrical components that can efficiently separate a desired portion of an audible signal from other received noise using a low frequency signal processing technique and a higher frequency signal processing technique. It is believed that the processes performed by the audible signal processing device 400 will reduce the error rate encountered when using the processed audible signal in a subsequent voice detection, voice communication, voice activated electronic device control and/or voice recognition process versus processed audible signals generated by conventional noise cancelling or noise reduction techniques that are common today. The processes described herein are also adapted to extend the operation time of the audible signal processing system 100 before a recharge or replacement of the power source 130 is required. While the power source 130 described herein may include a battery, the electronic device 102 may at one time or another receive power from a wired connection to a wall outlet, wireless charger or other similar devices without deviating from the basic scope of the disclosure provided herein.
The electronic assembly 135 may include the processor 118 that is coupled to input/output (I/O) devices 116, the power source 130, and the non-volatile memory unit 122. Memory unit 122 may include one or more software applications 124, such as the controlling software program which is described further below. The memory unit 122 may also include stored media data 126 that is used by the processor 118 to perform various parts of the methods described herein. The processor 118 may be a hardware unit or combination of hardware units capable of executing software applications and processing data. In some configurations, the processor 118 includes a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), and/or a combination of such units. The processor 118 is generally configured to execute the one or more software applications 124 and process the stored media data 126, which may be each included within the memory unit 122.
The I/O devices 116 are coupled to memory unit 122 and processor 118, and may include devices capable of receiving input and/or devices capable of providing output. The I/O devices 116 include the audio processing device 117 which receives the battery power and an input signal 104, and produces the output signal 106 which may be received and then broadcast by the speaker system 111. The I/O devices 116 also include one or more wireless transceivers 120 that are configured to establish one or more different types of wired or wireless communication links with other transceivers residing within other computing devices. A given transceiver within the I/O devices 116 could establish, for example, a Wi-Fi communication link, near field communication (NFC) link or a Bluetooth® communication link (e.g., BTLE, Bluetooth classic), among other types of communication links with similar components in the second electronic device 195. In some embodiments, electronic components within the I/O device 116 are adapted to transmit signals processed by the audible signal processing device 400 to other internal electronic components found within the audible signal processing system 100 and/or to electronic devices that are external to the audible signal processing system 100, as is discussed further below.
The memory unit 122 may be any technically feasible type of hardware unit configured to store data. For example, the memory unit 122 could be a hard disk, a random access memory (RAM) module, a flash memory unit, or a combination of different hardware units configured to store data. The software application 124, which is stored within the memory unit 122, includes program code that may be executed by processor 118 in order to perform various functionalities associated with the electronic device 102. The stored media data 126 may include any type of information that relates to a desired control parameter, quasi-direction information, calculated time delay information, noise signal RMS information, user data, electronic device configuration data, device control rules or other useful information, which are discussed further below. The stored media data 126 may include information that is delivered to and/or received from the source 150 or another electronic device, such as the second electronic device 195. The stored media data 126 may reflect various data files, settings and/or parameters associated with the environment, audible signal processing device control and/or desired behavior of the electronic device 102.
As discussed above, during operation the electronic device 102 is configured to detect an audible signal “A” (e.g., voice command, acoustic signal) by use of a plurality of microphones 101 and then process received audible signals using the audible signal processing device 400 so that the processed audible signals can be used to perform some desired task, or audible signal processing activity, by the audible signal processing system 100 or other electronic device, such as voice recognition, voice communication, voice activated electronic device control and/or other useful audible signal enabled task or activity. However, depending on the position of the audible source 150 relative to the microphones 101 within the electronic device 102 there will be a delay in the time when each microphone receives the same audible signal. In general, voice communication techniques will include any type of two-way communication process such as an audio chat, video chat, voice call or other similar communication technique.
One will note that the delay one microphone will experience versus another microphone is equal to the difference between the distances of the two microphones from the source divided by the speed of sound (e.g., 340.3 m/s at sea level). As illustrated in
One will note that the time at which each of the components of the composite audible signal 301 reaches each microphone will differ in at least one characteristic depending on the distance of the various sources relative to each of the microphones within the array of microphones found in the electronic device 102. In other words, for example, the time when the second audible noise signal 313 and desired audible signal 312 overlap in time as detected by each microphone will differ, and thus the phase relationship and delay between each type of received audible signal component will vary relative to each other from microphone to microphone.
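As a worked illustration of the relationship between source-to-microphone distance and arrival delay, the short sketch below divides the path-length difference by the speed of sound given in the text. The coordinates, the 5 cm spacing, and the function name `arrival_delay` are hypothetical values chosen for the example.

```python
import math

SPEED_OF_SOUND_M_S = 340.3  # value given in the text for sea level


def arrival_delay(source, mic_a, mic_b, c=SPEED_OF_SOUND_M_S):
    """Seconds by which mic_b receives the signal later than mic_a.

    A negative result means mic_b receives the signal first.
    Positions are (x, y, z) coordinates in meters.
    """
    path_difference = math.dist(source, mic_b) - math.dist(source, mic_a)
    return path_difference / c


# Hypothetical geometry: the source is 1 m from mic_a and 1.05 m from mic_b.
delay = arrival_delay((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (-0.05, 0.0, 0.0))
```

For this geometry the 0.05 m path difference corresponds to a delay of roughly 0.147 ms, which indicates the timing resolution needed to distinguish directions with closely spaced microphones.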
Therefore, one goal of the audible signal processing device 400 within the electronic device 102 is to remove as much of the audible signal received from the first and second types of noise sources so that the desired audible signal 312 can be separated therefrom. Once separated, the desired audible signal 312 can then be delivered to a software application that is configured to further process the desired audible signal so that some desired activity can be performed based on the receipt of the desired audible signal 312. In some embodiments, the desired audible signal 312 includes a user's speech that includes information across the full speech range, which typically extends between about 100 Hz and about 8,000 Hz.
The design and configuration of the microphones within a geometric array within the electronic device 102 can be made based on a balance of the need to have a microphone array configuration that has a desired spacing to assure that the direction of the received audible signal can be accurately determined, as will be discussed further below, versus the need to assure that the signal processing technique (e.g., cardioid and/or beam forming) can preferentially reject unwanted noise across the full speech range without the signal processing technique falling apart at either the higher end or the lower end of the frequency range. It is believed that most conventional spatial noise reduction techniques used today are unable to work at or are ineffective at the high end frequencies due to microphone spacing limitations or constraints, and thus most voice recognition or other similar programs are unable to effectively utilize the information found in the higher end of the speech range, such as between 4,000 Hz and 8,000 Hz.
The optional microphone gain element 420 typically includes microphone signal gain adjusting elements that are adapted to adjust the signal level of input received from each of the microphones within the geometrical array of microphones. As illustrated in
Referring to
In some embodiments, the first signal processor 405 is configured to separate unwanted low frequency noise from the detected audible signals received from two or more of the microphones within the geometrical microphone array, while the second signal processor 407 is generally configured to separate unwanted higher frequency noise from the detected audible signals received from all of the microphones within the geometrical array.
In general, the first signal processor 405 is adapted to remove or separate noise found within the lower end of the audible signal frequency range from a received composite audible signal using a cardioid noise rejection technique. In order to perform the first signal processing technique, the first signal processor 405 will include or use portions of the controlling software program and various analog and digital hardware components to perform the desired processes described herein. In some embodiments, the first signal processor 405 includes elements that are formed within a digital signal processor (DSP) module. The cardioid signal processing technique performed by the first signal processor 405 is generally adapted to reject noise received from an off axis direction relative to the direction of a desired audible signal source using a pattern that is similar to an endfire cardioid.
In some embodiments, a first-order cardioid is formed by the first signal processor 405 by use of two audible signal inputs that are positioned along a direction that is in-line with the direction that the audible signal is received from the audible signal source. In one embodiment, the first-order cardioid is formed using two audible signal inputs that are received from two of the microphones found within the geometrical array of microphones. Additionally or alternately, as will be discussed further below, the cardioid pattern is formed by averaging the inputs from two microphones to form a virtual microphone audible signal and then using the formed virtual microphone's audible signal and an audible signal from one of the other microphones in the geometrical array to from the first-order cardioid. While other higher-order cardioid forming signal processing techniques could be used by the first signal processor 405, it is believed that the use of a first-order cardioid for the low frequency signal processing has advantages over these higher-order cardioid signal processing techniques. In general, from a power conservation stand point, it is desirable to use a fewer number of microphones to perform the signal processing techniques disclosed herein. Thus, a first-order cardioid generation technique has advantages over other signal processing techniques that need an increased number of microphones to form the higher order cardioid patterns. One will note that a large portion of the power consumption is created by the process of reading, comparing and then writing to a buffer the audible signals, and performing other related pointer math, for each of the microphones within the electronic device at the high sampling rates required to perform these types of signal processing techniques. 
In another example, it is believed that higher-order cardioid signal processing techniques will require additional computing power and time to process the audible signals received from three or more microphones. The additional amount of computing power can thus create a significant drain on a battery powered type of power source 130.
In some embodiments, the first order cardioid noise rejection technique utilized by the first signal processor 405 can be achieved by summing audible signals received by two microphone elements within the geometrical array of microphones, where one of the audible signals is inverted and delayed by a period of time relative to the other received audible signal before the two audible signals are summed together. The amount of time delay generated by the first signal processor 405 is related to the speed of sound and the effective distance between the microphones in the direction that the desired audible signal is received. The first signal processing technique is able to form a desired cardioid pattern for rejecting unwanted noise received from off-axis orientations as long as the wavelength is much longer than the distance between the two microphones used to form the cardioid pattern. However, the ability to reject unwanted noise in a cardioid pattern degrades once the wavelength becomes comparable to the distance between the microphones. Therefore, the closer the microphones are to each other, the higher the frequency up to which the cardioid pattern can be maintained.
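The delay, invert and sum operation described above can be illustrated with a minimal Python sketch. It is an idealized model rather than the disclosed implementation: the 343 m/s speed of sound, the 50 mm spacing, the 48 kHz sampling rate, the whole-sample delay, and the function names delay_in_samples, first_order_cardioid and shift are all assumptions introduced for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def delay_in_samples(spacing_m, sample_rate_hz):
    """Acoustic travel time between the two microphones, in whole samples."""
    return round(spacing_m / SPEED_OF_SOUND_M_S * sample_rate_hz)


def first_order_cardioid(near, far, delay):
    """Sum the near signal with the inverted, delayed far signal."""
    out = []
    for n in range(len(near)):
        delayed_far = far[n - delay] if n >= delay else 0.0
        out.append(near[n] - delayed_far)
    return out


def shift(signal, delay):
    """Model a later arrival by prepending `delay` zero samples."""
    return [0.0] * delay + signal[:len(signal) - delay]


# Hypothetical setup: 50 mm spacing, 48 kHz sampling, 500 Hz test tone.
delay = delay_in_samples(0.05, 48_000)
tone = [math.sin(2 * math.pi * 500 * n / 48_000) for n in range(480)]

# On-axis source: the near microphone hears the tone first.
on_axis = first_order_cardioid(tone, shift(tone, delay), delay)
# Source directly behind: the far microphone hears the tone first.
behind = first_order_cardioid(shift(tone, delay), tone, delay)

print(max(abs(x) for x in on_axis), max(abs(x) for x in behind))
```

In this idealized model the tone arriving from directly behind cancels, while the on-axis tone passes with a frequency-dependent gain. A practical system would likely need fractional-sample delays and the filtering discussed in connection with step 477.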
While the discussion surrounding
Referring back to
Next at step 473, the microphone input selection element 440 selects two microphones within the geometrical array of microphones based on the detected direction information received from the direction detection element 430. In this example, the first signal processor 405 performs an analysis of the audible signals received by the microphones 101B and 101C, since they are aligned along a line 505B that extends in the 90° and −90° directions.
After the desired pair of microphones has been selected, steps 474 and 475 are completed, which includes the delivery of the audible signal received from the microphone furthest from the audible signal source (e.g., microphone 101B) to the signal delaying element 441 and a signal inverting element 442, by the delivery of the audible signal along path 440B of the first signal processor 405, to form a delayed and inverted audible signal. The time delay used by the signal delaying element 441 is set by the known distance between the two microphones based on the known speed of sound. The time delay value used may be stored and retrieved from the memory unit 122.
Next, at step 476, the “undelayed” audible signal received from the microphone closest to the audible signal source (e.g., audible signal received from microphone 101C along path 440A) and the delayed and inverted audible signal are then combined together by use of the signal combining element 443.
Next, at step 477, the combined signals are then optionally filtered by use of the signal filtering element 411 within the post processing element 452. The signal filtering element may include a low pass filter and/or a high pass filter that are able to remove frequencies that are higher and/or lower than the useable signal processing range of the first signal processor 405, such as frequencies where the cardioid pattern becomes significantly distorted. For example, a low pass filter frequency may coincide with a frequency between about 1,000 Hz and about 4,000 Hz, and a high pass filter frequency may coincide with a frequency of about 100 Hz, or, for example, with a frequency of about 700 Hz, or about 1,000 Hz, or even about 2,000 Hz.
After the processes performed in method 470 have been completed the first signal processor 405 will then provide the processed audible signal, or hereafter first processed audible signal, to the signal combining element 414 where, in step 498, the first processed audible signal and a second processed audible signal received from the second signal processing technique are combined. As will be discussed further below, at step 499, the combined first and second processed audible signals are then transferred to other downstream electronic devices or elements within the audible signal processing system 100.
Therefore, based on the geometrical array of microphones illustrated in
In an effort to reduce the off-angle error produced when the selected microphones are not aligned in the exact direction of the audible source, a virtual microphone can be used to minimize or effectively eliminate any error created by the misalignment of the formed cardioid pattern relative to the audible source. A virtual microphone can be created by combining the audible signals received by two microphones within the geometric array to form a microphone that is effectively positioned at a point along a line that extends between the two microphones. The combined audible signal will generally approximate a portion of an audible signal that would have been received by a virtual microphone that is positioned between the first and second microphones. The process of combining the audible signals may include averaging, or even weighting, the two audible signals received by the two selected microphones within the geometric array to form a virtual microphone audible signal that can then be used by the first signal processing technique in a similar way that an audible signal from an “actual” microphone is used. For example, referring to
Therefore, by generating virtual microphones that are positioned along a line that extends between two microphones, the maximum off angle error that is produced when the selected microphones are not aligned in the exact direction of the audible source can be reduced by at least half of the angle formed between the microphones used to form the virtual microphone. For example, the virtual microphone in the above example would thus effectively have a microphone positioned every 60° versus the actual 120° distance between each of the three actual microphones 101A-101C. Thus, by use of a microphone signal averaging technique to form the virtual microphone the first signal processing technique can be used to detect audible signals found in at least 12 different directions while using only 3 microphones. Alternately, for virtual microphone generation techniques that use a weighted sum of the audible signals received by two or more microphones the first signal processing technique could find an infinite number of possible directions to position the virtual microphone along the line extending between the microphones while using only 3 microphones. The weighting values used to create the weighted sum of the audible signals may be based on a comparison of the determined direction received from the direction detection element 430 and its relationship (e.g., relative angle) to one of the six directions that are parallel to lines 505A, 505B or 505C. Therefore, as noted above, by use of the virtual microphone technique a fewer number of microphones are needed to perform the first signal processing technique and thus less power will be consumed by the electronic device in performance of this signal processing technique.
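The averaging and weighting operations used to form a virtual microphone audible signal can be sketched as follows. The function name virtual_microphone and the weight_a parameter are hypothetical, and the sketch assumes that the two signals are already sampled on a common clock.

```python
def virtual_microphone(sig_a, sig_b, weight_a=0.5):
    """Weighted sum of two microphone signals.

    weight_a=0.5 gives a plain average (a virtual microphone at the midpoint);
    other weights move the virtual position along the line between the two.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    return [weight_a * a + weight_b * b for a, b in zip(sig_a, sig_b)]


# Midpoint virtual microphone from two short example signals.
midpoint = virtual_microphone([1.0, 2.0], [3.0, 4.0])
# Virtual microphone weighted toward the second microphone.
weighted = virtual_microphone([1.0, 2.0], [3.0, 4.0], weight_a=0.25)
print(midpoint, weighted)
```

The resulting list can then be passed to the first signal processing technique in place of a signal from an actual microphone.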
In some virtual microphone generating embodiments, it is desirable to select two microphones that are positioned along a line that is substantially perpendicular to the direction that an audible signal is received to form the virtual microphone. Also, by increasing the number of actual microphones found within the geometrical array of microphones the need for the generation of virtual microphones can be reduced, since the error will be reduced.
Alternately, in electronic device configurations that are not limited by electrical power constraints (e.g., electrical power is received from a wall plug or large battery) and/or by the processor's speed and other processor related resources, it may be desirable to remove or separate noise found within the audible signal frequency range by use of a cardioid noise rejection technique that uses a higher order cardioid than a first order based cardioid signal processing technique. In some embodiments, the first signal processor 405 is adapted to remove noise using a second-order or greater cardioid based signal processing technique. In one configuration, the cardioid signal processing technique performed by the first signal processor 405 is adapted to reject noise received from an off-axis direction relative to the audible signal's direction using three or more microphones within the geometrical array. The controlling software used within the audible signal processing device 400 may thus utilize three or more microphones and/or generated virtual microphones that are aligned in a desired direction to separate unwanted noise from the detected audible signals using a cardioid pattern.
In some embodiments, the first signal processing technique is adapted to form higher order cardioid patterns by use of three or more audible signal inputs that are positioned along a direction that is in-line with the direction that the audible signal is received from the audible signal source. In one embodiment, the higher order cardioid is formed using the three or more audible signal inputs that are received from the microphones found within the geometrical array of microphones. Additionally or alternately, the higher order cardioid patterns are formed by combining (e.g., averaging or weighting) the inputs from two or more microphones to form a virtual microphone audible signal and then using the formed virtual microphone's audible signal and an audible signal from one of the other microphones or other virtually formed microphones to form the higher order cardioid.
During operation, the first signal processor 405 receives audible signal direction information from the direction detection element 430 noting that audible signals are being received from a direction 605 (i.e., 0° direction). The first signal processor 405 will then determine by use of the controlling software and microphone input selection element 440 that microphone 601G, a virtual microphone 620A, and a virtual microphone 620B are needed to perform the first signal processing technique, since these microphones are aligned along the direction 605. In this example, once the controlling software has formed the virtual microphone 620A by averaging the audible signals received by microphones 601A and 601F, and virtual microphone 620B has been formed by averaging the audible signals received by microphones 601C and 601D, the process of forming the desired cardioid pattern can be completed. In this example, the virtual microphone 620A is positioned at the midpoint of line 621A and the virtual microphone 620B is positioned at the midpoint of line 621B.
Next, the first signal processor 405 then uses a cascaded cardioid generation process to perform the first signal processing technique. The cascaded cardioid generation process begins by the delivery of the audible signal received from the microphone 601G to the signal delaying element 441 (e.g., step 474) and the signal inverting element 442 (e.g., step 475) to form a first delayed and inverted audible signal. The time delay value that is used may be stored and retrieved from the memory unit 122 for this known microphone configuration. The “undelayed” audible signal received from the virtual microphone 620A and the first delayed and inverted audible signal received from the second microphone 601G are then combined together (e.g., step 476) by use of the signal combining element 443 to form a first combined cascaded audible signal that is stored within memory.
The cascaded cardioid generation process then continues on to form a second combined cascaded audible signal by delivering the audible signal received from the second virtual microphone 620B to the signal delaying element 441 (e.g., step 474) and the signal inverting element 442 (e.g., step 475) to form a second delayed and inverted audible signal. The “undelayed” audible signal received from microphone 601G and the second delayed and inverted audible signal are then combined together (e.g., step 476) by use of the signal combining element 443 to form a second combined cascaded audible signal that is stored within memory.
The cascaded cardioid generation process then delivers the second combined cascaded audible signal to the signal delaying element 441 (e.g., step 474) and the signal inverting element 442 (e.g., step 475) to form a first combined delayed and inverted audible signal. The time delay value that is used may be stored and retrieved from the memory unit 122. Then, the first combined cascaded audible signal and the first combined delayed and inverted audible signal are then combined together (e.g., step 476) by use of the signal combining element 443 and then filtered by use of the signal filtering element 411 to form the complete combined cascaded audible signal.
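The cascaded cardioid generation process described above can be sketched in Python as three applications of the same delay, invert and sum step. The sketch assumes that the three inputs are equally spaced along the source direction so that a single whole-sample delay applies to every stage, and it omits the final filtering step; the names delay_invert_sum, cascaded_cardioid, front, middle and rear are hypothetical, with front standing in for virtual microphone 620A, middle for microphone 601G, and rear for virtual microphone 620B.

```python
def delay_invert_sum(undelayed, to_delay, delay):
    """One cardioid stage: undelayed signal plus the inverted, delayed signal."""
    out = []
    for n in range(len(undelayed)):
        delayed = to_delay[n - delay] if n >= delay else 0.0
        out.append(undelayed[n] - delayed)
    return out


def cascaded_cardioid(front, middle, rear, delay):
    """Second-order pattern built from two first-order stages."""
    first = delay_invert_sum(front, middle, delay)   # first combined cascaded signal
    second = delay_invert_sum(middle, rear, delay)   # second combined cascaded signal
    return delay_invert_sum(first, second, delay)    # complete signal, before filtering


def shift(signal, delay):
    """Model a later arrival by prepending `delay` zero samples."""
    return [0.0] * delay + signal[:len(signal) - delay]


impulse = [1.0] + [0.0] * 19
d = 2  # assumed inter-microphone delay in samples

# Source in front: front hears it first, then middle, then rear.
from_front = cascaded_cardioid(impulse, shift(impulse, d), shift(impulse, 2 * d), d)
# Source behind: rear hears it first, then middle, then front.
from_behind = cascaded_cardioid(shift(impulse, 2 * d), shift(impulse, d), impulse, d)
print(from_front[:10], max(abs(x) for x in from_behind))
```

In this idealized model the impulse arriving from behind is rejected by both first-order stages, so nothing from it reaches the final stage.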
After a complete combined cascaded audible signal has been formed, the first signal processor 405 will then provide the complete combined cascaded audible signal to the signal combining element 414 where, in step 498, the complete combined cascaded audible signal and a second processed audible signal received from the second signal processing technique are combined. As will be discussed further below, at step 499, the combined signals are then transferred to other downstream electronic devices or elements within the audible signal processing system 100.
In cases where the audible signal source is aligned with a direction that includes three “actual” microphones, such as direction 606 in
In electronic device configurations that include an array of microphones which are primarily disposed along the outer surface 108 of the electronic device 102, such as configurations where no centrally positioned microphone exists, a virtual microphone can be used in place of a centrally positioned microphone by averaging the audible signals received by two or more microphones within the geometric array. Referring to
Referring to
During operation, at step 482, after receiving the microphone inputs from the optional microphone gain element 420 during step 481, each of the audible signals provided from each of the microphones 101A-101C is preprocessed. The preprocessing steps may include processing the audible signal using an equalizing element 460 so that a certain desired frequency range, which may be associated with the speech range, can be extracted or isolated from other unwanted frequency regions before it is processed by the subsequent direction detection elements. In some configurations, the equalizing element 460 may include parametric equalizers 461A-461C that are each configured to process the audible signal received from each of the microphones 101A-101C, respectively. In some embodiments, the equalizing element 460 is configured to preferentially allow frequencies within the full speech range to pass therethrough and be delivered to one or more downstream components.
Next, at step 483, at least one of the inputs from a microphone within the geometrical array of microphones is then delivered to the root-mean-square (RMS) processing element 467 that is used to detect and in some cases remove constant audible noise signals 311 found within the composite audible signals 301 that are received by each of the microphones. The RMS processing element 467 may utilize an RMS threshold analysis element 468 that contains and/or monitors and stores information regarding the level of the constant audible noise signal 311 over time. The RMS processing element 467 may be configured to detect the current level of the received constant audible noise signal 311, and further receive input regarding historic constant audible noise signal levels from the RMS threshold analysis element 468, so that unwanted background noise does not get utilized in the direction detection algorithm and thus affect the direction detection element's results. In some embodiments, the RMS threshold analysis element 468 uses the measured level of the constant audible noise and compares it to a received audible signal to determine if the incoming audible data should be used towards the determination of its arrival direction, and thus the received audible data is not part of the noise within the environment 110 (
Next, at step 484, the delay analysis element 462 receives the audible signals processed by the equalizing element 460 and analyzes the audible signals to determine the relative delays in the receipt of the audible signal experienced by each of the microphones. In some embodiments, the delay analysis element 462 analyzes each of the received audible signals as a function of time to determine which attributes in each of the received audible signals are common in each of the received audible signals to determine the relative delay experienced by each microphone. In one configuration, as illustrated in
At step 485, after determining the relative delay of each of the microphones, the direction determination element 463 then determines the direction that the audible signal is being received from by use of one or more portions of the controlling software program. While it is possible to perform various complicated mathematical analysis techniques to determine the exact position of the audible source relative to the electronic device 102, it has been found that these highly analytical direction detection processes require a significant amount of computing power and time, and thus can create a significant drain on the power source 130. The incorporation of these types of highly analytical direction detection processes would greatly increase the cost and complexity of a consumer electronic device that is able to perform these types of direction detection processes.
Therefore, in some embodiments, the direction determination element 463 utilizes a less analytically intensive and power intensive statistical binning approach to determine the direction of the audible source. While the direction determination element 463 in some cases will be able to detect the exact direction of the audible source from the electronic device 102, in most cases the direction that is determined by the direction determination element 463 will have an error, which at its maximum is related to the sample rate of the analysis program, spacing of the microphones and also the size of the direction bins selected and used by the controlling software program(s) to determine the nearest direction to the audible source direction. Thus, an audible signal direction determined by the statistical analysis that is performed by the direction determination element 463 is described herein as a “quasi-direction.”
During the direction determination process the direction determination element 463 uses the relative delay times received from the delay analysis element 462 to determine the direction of the received audible signal. The controlling software program and the direction determination element 463 may break-up the pattern of the geometrical array of microphones into binned regions. The number of binned regions will typically relate to the number of microphones that are contained within the geometrical array of microphones as well as the minimum width of the beam. In one example, the electronic device 102 illustrated in
Having determined the region of the geometrical array of microphones that the audible source is positioned nearest, the direction determination element 463 will then determine which directional bin (or binned region) within the determined region the audible source's direction is closest to so that a nearest quasi-direction can be determined. The directional bins are formed by dividing the angular region or sector into a desired number of sub-regions that meet desired accuracy and computing power goals. For example, each of the sectors 531A, 531B and 531C may be divided up into four binned regions that are each separated by a 30° interval. In one example, the first sector 531A's four bin configuration can be divided so that the edges of each of the bins have a known quasi-direction, such as directions 0°, 30°, 60°, 90° and 120° being the edges between the four formed bins. Thus, the angular distance formed for each defined bin (e.g., 30° bin) is disposed between a first known direction and a second known direction. In one example, the first direction is equal to the 0° direction and the second direction is equal to the 30° direction, wherein the first direction extends from the vertex point 536 through a portion of a first microphone 101A (e.g., geometric center of the microphone) and the second direction extends from the vertex point 536 in the 30° direction.
In one embodiment, to determine which of the quasi-directions the audible source's direction is closest to, the direction determination element 463 first calculates ratios of various time delays (e.g., first non-zero delay/second non-zero delay) measured by the delay analysis element 462 and then compares these calculated ratios with angular time delay ratio data that is stored within the memory unit 122. The stored angular time delay ratio data will include previously calculated data that is formed by calculating a ratio of the expected delay times that the microphones would see if the audible source was positioned at the edges of the bins that surround each of the quasi-directions within a determined region of the electronic device 102. Therefore, using the example above, if the audible source is positioned at an angle of about 35° relative to the electronic device 102, the direction determination element 463 will determine, based on a calculated ratio of the delay time experienced by the microphone 101C to the delay time experienced by the microphone 101B, that the calculated ratio is closer to a stored angular time delay ratio associated with the 30° quasi-direction than any of the other stored angular time delay ratios associated with the other quasi-directions 0°, 60°, 90° or 120°. Alternately, the direction determination element 463 may determine that the calculated ratio is closer to a stored angular time delay ratio associated with the 30° quasi-direction by determining that the calculated ratio falls within a range that is half the bin size on either side of the quasi-direction. In this example, the direction determination element 463 may compare the calculated ratio with the stored angular time delay ratios associated with the directions 15°, 45°, 75° and 105°, and then determine that the calculated ratio falls between the stored angular time delay ratios that coincide with 15° and 45°.
Therefore, the audible source is most likely positioned at the 30° quasi-direction.
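A ratio-based lookup of the nearest quasi-direction can be sketched as shown below. The sketch rests on several assumptions that are not taken from the disclosure: three hypothetical microphones A, B and C placed at 0°, 120° and 240° on a 30 mm radius, a far-field (plane wave) source, a speed of sound of 343 m/s, and a signed ratio (delay at B minus delay at A, divided by delay at C) chosen because it varies monotonically across the 0° to 120° region in which C is the last microphone to receive the signal.

```python
import math

MIC_ANGLES_DEG = {"A": 0.0, "B": 120.0, "C": 240.0}  # hypothetical layout
QUASI_DIRECTIONS_DEG = [0, 30, 60, 90, 120]


def relative_delays(source_deg, radius_m=0.03, speed_m_s=343.0):
    """Plane-wave arrival delays (s) at each mic, relative to the first arrival."""
    arrival = {
        name: -radius_m * math.cos(math.radians(source_deg - angle)) / speed_m_s
        for name, angle in MIC_ANGLES_DEG.items()
    }
    first = min(arrival.values())
    return {name: t - first for name, t in arrival.items()}


def delay_ratio(delays):
    """Signed ratio (B - A) / C; assumes C is the last mic to hear the source."""
    return (delays["B"] - delays["A"]) / delays["C"]


# Ratios precomputed once for each quasi-direction (the "stored" data).
STORED_RATIOS = {q: delay_ratio(relative_delays(q)) for q in QUASI_DIRECTIONS_DEG}


def nearest_quasi_direction(measured_delays):
    """Pick the quasi-direction whose stored ratio is closest to the measured one."""
    ratio = delay_ratio(measured_delays)
    return min(STORED_RATIOS, key=lambda q: abs(STORED_RATIOS[q] - ratio))


print(nearest_quasi_direction(relative_delays(35.0)))
```

Because the stored ratios are computed ahead of time, the run-time work in this sketch reduces to one division and a handful of comparisons.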
In the somewhat rare case that the direction determination element 463 finds that the calculated ratio exactly matches a stored angular time delay ratio the controlling software program need not continue on with the process of determining that the calculated ratio falls between the stored angular time delay ratios.
One will appreciate that the process of determining the direction of the audible source is thus greatly simplified versus the mathematically intensive iterative process of determining the exact position of the audible source using a more conventional analytically intensive and power intensive approach. The greatly simplified statistical approach of determining the source direction will also reduce the performance requirements that the processor 118 needs to possess to perform these tasks.
In some embodiments, the direction determination element 463 will determine a direction of a received audible signal by first determining the relative time delays experienced by each microphone, and then comparing each of the relative time delays with a plurality of stored angular time delays. Each of the plurality of stored angular time delays, which are stored within memory, can be associated with a direction that is oriented relative to the non-linear array of microphones. Thus, for example, a stored angular time delay for each microphone can be associated for each quasi-direction, such as quasi-directions 0°, 30°, 60°, 90° or 120°. However, it is believed that the use of the ratio of the expected delay times in certain geometrical array configurations can be advantageous. For example, use of the ratio when the audible source may be positioned in 3-D space at an angle relative to a plane that contains a planar array of microphones can be useful due to the inherent comparison of the relative delays between microphones provided by the ratio versus other techniques that only compare delay times with the stored angular delay times.
However, since the accuracy of the time delay measurements determined by the delay analysis element 462 is also limited by the number of samples that can be collected by the processor 118 within the actual time delay experienced between microphones, some uncertainty in the determined time delay values will exist. The accuracy of the time delay is thus limited by the sampling frequency and spacing between microphones. The samples are taken sequentially in time at a desired sampling frequency, and thus each sample includes a portion of a received audible signal. The spacing of the microphones and the sampling frequency thus need to be large enough to allow at least two samples to be taken within the time that the receipt of the audible signal is delayed without upsampling. One will note that the process of upsampling can be a significant drain on the processor resources and also the electrical power required to perform this task. For example, if the processor is sampling at a frequency of 48 kHz (e.g., 21 μs per sample), a microphone spacing of 70 mm will allow about 10 samples to be taken by the processor within the delay time, while a microphone spacing of 14 mm would only allow the processor to take about 2 samples within the delay time. The uncertainty in the determined time delay values due to the often small number of samples and noise contained within the received audible signals can cause jitter between the determined source position states, which will affect the ability of the direction determination element 463 to determine and settle on one probable quasi-direction. Oscillating between the determined source position states at a high rate may affect the signal processing technique's ability to perform its desired function.
In some embodiments of step 485, the controlling software program analyzes the frequencies at which the various determined directions are selected by the direction determination element 463 to determine the most probable determined direction of the audible source. In one example, the controlling software program will compare the number of times various determined directions are selected by the direction determination element 463 over a period of time and then select the direction that has the highest frequency over that period of time as the determined direction. Determining the most probable determined direction can be performed in a rolling average type of process where each determined direction within the rolling period is taken as a “vote,” and the votes are summed to determine which direction gets the most votes over the current rolling period. The determination of how frequently each particular direction is selected may include the analysis of two or more audio data samples that are sampled by the processor at the data sampling frequency (e.g., 48 kHz sampling frequency). This process can diminish the amount of jitter experienced from the output of the direction determination element 463.
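The sample counts in the example above follow from dividing the microphone spacing by the speed of sound and multiplying by the sampling frequency. A minimal check, assuming a speed of sound of 343 m/s (a value not stated in the text) and a hypothetical helper named samples_within_delay:

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def samples_within_delay(spacing_m, sample_rate_hz):
    """Number of sample periods that fit in the end-fire acoustic delay."""
    return spacing_m / SPEED_OF_SOUND_M_S * sample_rate_hz


sample_period_us = 1e6 / 48_000                      # microseconds per sample
wide = samples_within_delay(0.070, 48_000)           # 70 mm spacing
narrow = samples_within_delay(0.014, 48_000)         # 14 mm spacing
print(round(sample_period_us, 1), round(wide, 1), round(narrow, 1))
```

The results round to about 10 and about 2 samples, respectively, and apply to a source aligned with the line between the microphones; for other arrival directions the available delay, and thus the sample count, is smaller.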
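The rolling vote described above can be sketched with a fixed-length window of recent direction estimates. The class name DirectionVoter and the window length are hypothetical; the disclosure does not state how long the rolling period is.

```python
from collections import Counter, deque


class DirectionVoter:
    """Keeps the most recent direction estimates and reports the most common one."""

    def __init__(self, window=8):
        # Old votes fall out automatically once the window is full.
        self.votes = deque(maxlen=window)

    def update(self, direction_deg):
        """Record one vote and return the direction with the most votes."""
        self.votes.append(direction_deg)
        return Counter(self.votes).most_common(1)[0][0]


voter = DirectionVoter(window=8)
# A jittery stream of estimates that mostly points at 30 degrees.
smoothed = [voter.update(d) for d in [30, 30, 60, 30, 30, 0, 30]]
print(smoothed)
```

Isolated outliers such as the single 60° and 0° estimates do not change the reported direction, which is the damping effect the voting step is intended to provide.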
Referring to
Next, at step 487, the direction selection element 465 then uses the determined coefficient values to determine the probable quasi-direction. The determined coefficient values are used to weight and thus damp the jitter experienced from the output of the direction determination element 463 as it determines and then refines the determined quasi-directions every received audible signal data sample or couple of audible signal data samples. In some embodiments, the determined coefficient values for each of the probable directions are summed over the sampling period or delay period, and the quasi-direction that has the highest sum total over the period is selected as the probable quasi-direction. In some configurations it may also be useful to give all of the coefficients associated with non-likely directions a zero or negative coefficient value to decrease the likelihood that these directions will be selected in this step.
At step 488, the direction delivery element 466 then delivers the detected direction information, which contains the determined direction or quasi-direction, to the first signal processor 405 and the second signal processor 407. The detected direction information is then received by the first signal processor 405 and the second signal processor 407, during steps 472 and 491, respectively, for further use or processing.
Referring to
In some embodiments, the second signal processor 407 uses the received direction or quasi-direction information for a first period of time that is longer than the time it takes the direction detection element 430 to update the direction or quasi-direction information (e.g., a second period of time). In this case, the rate at which the time delays are updated during the beamforming process is less than the rate at which the direction detection element 430 is able to update the direction or quasi-direction information, which will significantly reduce the amount of computing power, battery power and time expended by the electronic device 102. Use of this process can be helpful to smooth the final processed audible signal, which is generally not achieved if the direction is updated too rapidly.
Referring back to
At steps 492-493, in some embodiments, the controlling software program performs an analysis of the received detected direction information received from the direction detection element 430 and then determines based on the detected direction or quasi-direction what the desired delay needs to be for each of the audible signals received by each of the microphones based on specific directional time delay information stored within the memory unit 122. The directional time delay information stored within the memory unit 122 may include a table of all of the possible directions or quasi-directions that the direction detection element 430 will provide to the second signal processing technique and all of the time delay values that are associated with each of the possible directions or quasi-directions for each of the microphones. Thus, the table will contain a time delay value for each microphone for each of the possible directions or quasi-directions. In one example, as shown in
Next, at step 494, the delayed audible signals are combined by use of the signal combining element 432. Then, at step 495, the delayed and combined audible signals are optionally filtered by use of the signal filtering element 408 within the post processing element 451. The signal filtering element may include a high pass filter that is able to remove frequencies that are lower than the useable range of the second signal processor 407. For example, the high pass filter may be configured to allow frequencies higher than about 1,000 Hz to pass. Because the audible signals are appropriately delayed before they are combined, the audible input from the audible source adds constructively, which improves its signal-to-noise ratio relative to the off-axis noise delivered from unwanted noise sources.
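The table lookup, delay, and combining steps described above can be sketched as a simple delay-and-sum operation. This is only an illustration under stated assumptions: the delays are whole numbers of samples, the table `DELAY_TABLE` and its three directions and three microphones are hypothetical values, and the optional high pass filtering step is omitted.

```python
# Hypothetical directional time delay table:
# direction (degrees) -> delay, in samples, for each of three microphones.
DELAY_TABLE = {
    0: [0, 0, 0],
    45: [0, 1, 2],
    90: [0, 2, 4],
}


def delay_signal(samples, delay):
    """Delay a signal by whole samples, zero-padding the start and keeping the length."""
    delay = min(delay, len(samples))
    return [0.0] * delay + list(samples[:len(samples) - delay])


def delay_and_sum(mic_signals, direction):
    """Look up the per-microphone delays for `direction`, apply them, and average."""
    delays = DELAY_TABLE[direction]
    delayed = [delay_signal(s, d) for s, d in zip(mic_signals, delays)]
    count = len(mic_signals)
    return [sum(column) / count for column in zip(*delayed)]


# A pulse from the assumed 45 degree direction reaches microphone 2 first,
# then microphone 1, then microphone 0.
mics = [
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
]
aligned = delay_and_sum(mics, 45)   # delays match the arrival pattern
misaligned = delay_and_sum(mics, 0) # no delays applied
print(max(aligned), max(misaligned))
```

When the looked-up delays match the arrival pattern, the pulses line up and add constructively; with mismatched delays the same energy is spread over several samples and the peak is lower.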
As noted above, after the first and second signal processing processes have been completed, the processed audible signal output from the first signal processor 405 may be further processed by use of a post processing element 452 (step 477), and the processed audible signal output from the second signal processor 407 may be further processed by use of a post processing element 451 (step 495), before the two signals are combined by the signal combining element 414 at step 498. The post processing elements 451, 452 may each include one or more amplifiers that are able to adjust the signal levels of the processed audible signals before they are combined.
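As a rough illustration of adjusting signal levels before combining, the sketch below applies a gain to each processed signal and then sums them sample by sample. The function names, the use of decibel gains, and the sample values are assumptions made for the example and are not specified in the disclosure.

```python
def apply_gain(samples, gain_db):
    """Scale a signal by a gain given in decibels (amplitude ratio 10**(dB/20))."""
    gain = 10 ** (gain_db / 20)
    return [gain * s for s in samples]


def combine(first, second):
    """Sum two equal-length signals sample by sample."""
    return [a + b for a, b in zip(first, second)]


# Hypothetical outputs of the first and second signal processors.
first_out = [0.5, -0.5, 0.25]
second_out = [1.0, -1.0, 0.5]

# Leave the first signal unchanged and attenuate the second by 20 dB.
combined = combine(apply_gain(first_out, 0.0), apply_gain(second_out, -20.0))
print(combined)
```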
At step 499, the signal combining element 414 then provides the processed audible signal (e.g., desired audible signal) to a downstream element 415. As noted above, the downstream element 415 may include a software application or other electronic device that uses the processed audible signal to perform some desired activity. The downstream element 415 can be an electronic component that is in direct communication or wireless communication with the signal combining element 414, which is disposed within the I/O device 116. In one configuration, the downstream element 415 can be an electronic component disposed within the audible signal processing system 100. In another configuration, the downstream element 415 can be an electronic component disposed within an electronic device that is external to the audible signal processing system 100. Examples of an external electronic device include a wireless speaker, a video camera device, a keyboard, a smart phone, a speaker phone, a home automation device, or other useful electronic device that is positioned to allow communication with one or more electronic components found within the audible signal processing system 100.
One or more of the embodiments of the disclosure provided herein may be implemented as a program product for use with a computer system. The program(s) of the program product define functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive, flash memory, ROM chips or any type of solid-state non-volatile semiconductor memory) on which information is permanently stored; and (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive or any type of solid-state random-access semiconductor memory) on which alterable information is stored.
The invention has been described above with reference to specific embodiments. Persons skilled in the art, however, will understand that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The foregoing description and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
This application is a continuation-in-part of U.S. patent application Ser. No. 15/787,699, filed Oct. 18, 2017, which is a continuation-in-part of U.S. patent application Ser. No. 15/650,614, filed Jul. 14, 2017, which claims the benefit of U.S. provisional patent application Ser. No. 62/456,632, filed Feb. 8, 2017, which are all herein incorporated by reference.
Number | Date | Country
---|---|---
62456632 | Feb 2017 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 15787699 | Oct 2017 | US
Child | 15825100 | | US
Parent | 15650614 | Jul 2017 | US
Child | 15787699 | | US