Embodiments described herein generally relate to automatic speech recognition (ASR) and more specifically to a microphone board for far field ASR.
ASR involves a machine-based collection of techniques to understand human languages. ASR is interdisciplinary, often involving microphone, analog-to-digital conversion, frequency processing, database, and artificial intelligence technologies to convert the spoken word into textual or machine readable representations of not only what was said (e.g., a transcript) but also what was meant (e.g., semantic understanding) by a human speaker. Far field ASR involves techniques to decrease a word error rate (WER) in utterances made at a greater distance from a microphone, or microphone array, than traditionally accounted for in ASR processing pipelines. Such distance often decreases the signal-to-noise ratio (SNR) and thus increases WER in traditional ASR systems. As used herein, far field ASR involves distances of more than half a meter from the microphone.
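For orientation, WER is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into a reference transcript, divided by the number of reference words. The following is a minimal sketch of that conventional definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration and are not part of the described embodiments.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

Under this definition, a recognizer that drops one word from a four-word utterance has a WER of 0.25.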
In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. Like numerals having different letter suffixes may represent different instances of similar components. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed in the present document.
Embodiments and examples herein generally describe a number of systems, devices, and techniques for enhancing far field ASR via a microphone board. It is understood, however, that the systems, devices, and techniques are examples illustrating the underlying concepts.
Factors that may contribute to far field ASR performance include the number of microphones, their characteristics and configuration, and microphone mounting techniques. Here we describe a microphone array that was demonstrated to improve far field ASR performance. In an example, both circular and linear microphone arrays are combined. Such a combination of arrays allows the simultaneous use of beam-forming techniques designed for linearly arranged microphones and beam-forming techniques designed for circularly arranged microphones. For the linear arrangement, one may use a phase-based beam-forming (PBF) technique, such as that discussed in US20150078571. For the circularly arranged microphones, one may use, for example, the Minimum Variance Distortionless Response (MVDR) beam-forming technique. In an example, the microphones may be mounted (e.g., to the housing) to reduce (e.g., minimize) leakage between microphones as well as microphone-loudspeaker or microphone-enclosure coupling. By combining microphone arrays and addressing acoustical leakage, the described microphone apparatus improves audio capabilities to enhance far field ASR.
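For reference, the MVDR beamformer is commonly written as shown below for a single frequency bin. The symbols are assumptions introduced here for illustration and do not appear in the original text: R is the spatial covariance matrix of the noise (or of the noisy input), d is the steering vector toward the desired talker, x is the vector of microphone signals, and y is the beamformer output.

```latex
\mathbf{w}_{\mathrm{MVDR}}
  = \frac{\mathbf{R}^{-1}\mathbf{d}}{\mathbf{d}^{H}\mathbf{R}^{-1}\mathbf{d}},
\qquad
y = \mathbf{w}_{\mathrm{MVDR}}^{H}\,\mathbf{x}
```

These weights minimize the output power \(\mathbf{w}^{H}\mathbf{R}\mathbf{w}\) subject to the distortionless constraint \(\mathbf{w}^{H}\mathbf{d} = 1\), so a signal arriving from the steered direction passes with unit gain while other contributions are attenuated.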
The smart home gateway 105 includes a first plurality of microphones along a circumference of a circle 120 on a surface. This microphone arrangement may be referred to as a circular array or circular microphone array. As illustrated here, the surface is the top of the smart home gateway 105; however, another surface, such as the side of a wall mounted device, may also be used. In an example, the number of microphones in the first plurality of microphones is an even number. In an example, the first plurality of microphones includes six microphones.
In an example, the microphones used may be bottom ported silicon digital transducers. For example, a Knowles digital MEMS with PDM output may be used. However, microphones from other manufacturers (e.g., Akustica, Cirrus, STMicro) with other interfaces (e.g., I2S) or porting (e.g., top) may be used as well.
The smart home gateway 105 may also include a second plurality of microphones along a line 115 on the surface. This arrangement may be referred to as a linear array or linear microphone array. In an example, the number of microphones in the second plurality of microphones is an even number. In an example, the second plurality of microphones includes four microphones. In an example, the total of the first plurality of microphones and the second plurality of microphones is eight.
In an example, the line 115 intersects the circle 120. In an example, the line 115 terminates on the circle 120. In an example, the line 115 has a length equal to a diameter of the circle 120, as illustrated here. The relationship between the line 115 and the circle 120 provides a known relationship between these microphone array geometries that may be exploited for different ASR processing purposes. By intersecting the line 115 with the circle 120, the microphones may be compactly placed on the surface while still gaining the benefit of the multiple microphone array geometries. In an example, at least one microphone is in the first plurality of microphones and is in the second plurality of microphones. In an example, the line is wholly within the circle. The inclusion of one or more microphones in both (or more) microphone arrays efficiently uses the available space while still permitting multiple array geometries for far field ASR processing, such as beamforming, noise mitigation, etc.
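One way to picture the geometry described above is to compute microphone coordinates directly. The sketch below assumes, purely for illustration, that the six circular microphones are evenly spaced, that the four linear microphones are evenly spaced along a diameter, and that the two microphones at the ends of the line are shared by both arrays; none of these spacing choices, nor the function names, are stated in the original text.

```python
import math


def circular_array(radius, count=6):
    """Positions of `count` microphones evenly spaced on a circle centered at the origin."""
    return [
        (radius * math.cos(2 * math.pi * k / count),
         radius * math.sin(2 * math.pi * k / count))
        for k in range(count)
    ]


def linear_array(radius, count=4):
    """Positions of `count` microphones evenly spaced along the diameter on the x-axis."""
    step = 2 * radius / (count - 1)
    return [(-radius + i * step, 0.0) for i in range(count)]


def unique_positions(*arrays, digits=9):
    """Merge arrays, treating positions that agree after rounding as one microphone."""
    seen = {}
    for array in arrays:
        for x, y in array:
            seen.setdefault((round(x, digits), round(y, digits)), (x, y))
    return list(seen.values())
```

Under these assumptions, `unique_positions(circular_array(r), linear_array(r))` yields eight distinct positions for any positive radius `r`: six on the circle plus the two interior microphones on the line, with the endpoints of the line coinciding with two of the circular microphones.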
The smart home gateway 105 also includes a first set of connections to the first plurality of microphones and a second set of connections to the second plurality of microphones. These sets of connections are at least a logical link between the microphones and other processing elements. A non-logical link, such as an individual wire leading to each microphone in the first plurality of microphones, may also be used. However, the microphones may be connected via a bus, or other interlink, and encode individual signals with a microphone identification (ID) or the like to form a logical set of connections. These connections permit downstream processors to identify the source microphone of a signal in order to perform processing, such as phase-based beamforming. In an example, the connection is labeled, or otherwise identifiable, by its microphone's position in an array (e.g., whether the array is linear, an index in the array, or other positioning within the array).
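The logical grouping described above may be sketched as follows. This is an illustrative data structure only: the `MicFrame` type, the `ARRAY_MEMBERSHIP` table, the ID numbering, and the choice of which microphones are shared are all hypothetical and are not taken from the original text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class MicFrame:
    """One frame of audio tagged with the (hypothetical) ID of its source microphone."""
    mic_id: int
    samples: Tuple[int, ...]


# Hypothetical table: microphone ID -> (array name, index within that array).
# IDs 0 and 3 sit in both arrays, giving six circular and four linear positions
# from eight physical microphones.
ARRAY_MEMBERSHIP: Dict[int, List[Tuple[str, int]]] = {
    0: [("circular", 0), ("linear", 3)],
    1: [("circular", 1)],
    2: [("circular", 2)],
    3: [("circular", 3), ("linear", 0)],
    4: [("circular", 4)],
    5: [("circular", 5)],
    6: [("linear", 1)],
    7: [("linear", 2)],
}


def group_by_array(frames: Iterable[MicFrame]) -> Dict[str, List[Tuple[int, ...]]]:
    """Split frames from a shared bus into per-array lists ordered by array index."""
    by_array: Dict[str, Dict[int, Tuple[int, ...]]] = defaultdict(dict)
    for frame in frames:
        for array_name, index in ARRAY_MEMBERSHIP[frame.mic_id]:
            by_array[array_name][index] = frame.samples
    return {
        name: [channels[i] for i in sorted(channels)]
        for name, channels in by_array.items()
    }
```

A downstream processor could then hand the `"linear"` group to a beamformer designed for linear arrays and the `"circular"` group to one designed for circular arrays, without either needing to know how the signals were physically wired.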
The smart home gateway 105 also includes a connector to provide the first set of connections and the second set of connections to an external entity. As used here, an external entity is any component that uses data derived from the microphone signals, including unaltered audio signals directly from the microphones, via the connector. Thus, the microphone arrays may be installed on a number of devices and interoperate with devices produced, for example, by other manufacturers without modification.
In an example, the smart home gateway 105 may include a cover affixed to the surface with a first fastener and a first vibration dampener and the surface is itself affixed to the housing of the device via a second fastener and a second vibration dampener. As illustrated, the smart gateway 105 in
In an example, the first fastener and the second fastener are the same. In this example, a single element, such as a bolt, secures the cover to the surface and to the housing. This is not to be confused with the first and second fastener being of the same type (which they may be even if they are separate fasteners). In an example, at least one of the first fastener or the second fastener is a screw, clip, bolt, pin, or tie. In the example arrangement of a single fastener, a gap, or space, running from one microphone to another may exist between the cover and the housing. That is, sound entering a lumen 110 for a first microphone may also pass through this space to a second microphone. Such cross lumen 110 sound transmission may impair a variety of far field ASR processing operations. Thus, in an example, the first vibration dampener seals space between the surface and a microphone when secured by the first fastener. This arrangement reduces acoustical leakage between microphones and lumens 110.
The combination of linear and circular microphone arrays and the mounting of the microphones to inhibit leakage improve far field ASR performance over current microphone arrays. For example, devices that use only a circular microphone array do not allow use of beam-forming techniques for linearly arranged microphones. Such beamforming helps to reduce noise, improve voice quality, and thus improve ASR performance. Further, reducing acoustical leakage and coupling (e.g., between the housing and different microphones) improves signal quality for beam-forming techniques and acoustical echo cancellation.
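To illustrate why a known linear geometry is useful to a beamformer, the sketch below computes the per-microphone delays used by a basic delay-and-sum beamformer. Delay-and-sum is substituted here as a simpler stand-in and is not the PBF or MVDR technique named above. The far-field plane-wave model, the speed-of-sound constant, and the function names are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature


def arrival_offsets(mic_x_m, angle_deg):
    """Relative arrival time, in seconds, of a far-field plane wave at each microphone.

    mic_x_m: microphone positions along the array axis, in meters.
    angle_deg: source direction measured from the array axis (90 is broadside).
    A microphone closer to the source gets a smaller (earlier) offset.
    """
    cos_theta = math.cos(math.radians(angle_deg))
    return [-x * cos_theta / SPEED_OF_SOUND_M_S for x in mic_x_m]


def alignment_delays(mic_x_m, angle_deg):
    """Non-negative delays that time-align all channels before they are summed."""
    offsets = arrival_offsets(mic_x_m, angle_deg)
    latest = max(offsets)
    return [latest - t for t in offsets]
```

For a source at broadside, all delays are essentially zero; for a source along the array axis, the delay difference between the two end microphones equals the array length divided by the speed of sound.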
Leakage from an internal loudspeaker (assuming the device uses one) to the microphones may be reduced via additional isolation.
Isolating a mounting fastener, such as a screw, may further reduce leakage. An example of such a mounting with screws is depicted in the example of
Using either the combination of linear and circular microphone arrays or the isolating mounting improves microphone and microphone array performance, thus leading to good quality microphone signals. These signals may be further processed, as described above, with the far field pre-processing pipeline. When both the microphone board and the pre-processing pipeline are combined, large improvements to far field ASR performance may be achieved. From the evidence, it is clear that these systems and techniques lower WERs, leading to improved ASR performance.
At operation 705, a first plurality of microphones is disposed along a circumference of a circle on a surface. In an example, the first plurality of microphones includes six microphones.
At operation 710, a second plurality of microphones is disposed along a line on the surface. In an example, the second plurality of microphones includes four microphones. In an example, the total of the first plurality of microphones and the second plurality of microphones is eight. In an example, the first plurality of microphones has an even number of microphones. In an example, the first plurality of microphones has more than two microphones. In an example, the second plurality of microphones has an even number of microphones.
In an example, the line intersects the circle. In an example, the line terminates on the circle. In an example, the line has a length equal to a diameter of the circle. In an example, at least one microphone is in the first plurality of microphones and is in the second plurality of microphones. In an example, the line is wholly within the circle.
At operation 715, first connections to the first plurality of microphones are grouped together.
At operation 720, second connections to the second plurality of microphones are grouped together.
At operation 725, the first connections and the second connections are provided to an external entity via a connector.
The operations of the method 700 may be optionally extended to include affixing a cover to the surface with a first fastener and a first vibration dampener; and affixing the surface to a housing with a second fastener and a second vibration dampener. In an example, the vibration dampener is an elastomer. In an example, the first fastener and the second fastener are the same. In an example, the first fastener and the second fastener are a screw. In an example, the first vibration dampener seals space between the surface and a microphone when secured by the first fastener.
Examples, as described herein, may include, or may operate by, logic or a number of components, or mechanisms. Circuitry is a collection of circuits implemented in tangible entities that include hardware (e.g., simple circuits, gates, logic, etc.). Circuitry membership may be flexible over time and underlying hardware variability. Circuitries include members that may, alone or in combination, perform specified operations when operating. In an example, hardware of the circuitry may be immutably designed to carry out a specific operation (e.g., hardwired). In an example, the hardware of the circuitry may include variably connected physical components (e.g., execution units, transistors, simple circuits, etc.) including a computer readable medium physically modified (e.g., magnetically, electrically, moveable placement of invariant massed particles, etc.) to encode instructions of the specific operation. In connecting the physical components, the underlying electrical properties of a hardware constituent are changed, for example, from an insulator to a conductor or vice versa. The instructions enable embedded hardware (e.g., the execution units or a loading mechanism) to create members of the circuitry in hardware via the variable connections to carry out portions of the specific operation when in operation. Accordingly, the computer readable medium is communicatively coupled to the other components of the circuitry when the device is operating. In an example, any of the physical components may be used in more than one member of more than one circuitry. For example, under operation, execution units may be used in a first circuit of a first circuitry at one point in time and reused by a second circuit in the first circuitry, or by a third circuit in a second circuitry at a different time.
Machine (e.g., computer system) 800 may include a hardware processor 802 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a hardware processor core, or any combination thereof), a main memory 804 and a static memory 806, some or all of which may communicate with each other via an interlink (e.g., bus) 808. The machine 800 may further include a display unit 810, an alphanumeric input device 812 (e.g., a keyboard), and a user interface (UI) navigation device 814 (e.g., a mouse). In an example, the display unit 810, input device 812 and UI navigation device 814 may be a touch screen display. The machine 800 may additionally include a storage device (e.g., drive unit) 816, a signal generation device 818 (e.g., a speaker), a network interface device 820, and one or more sensors 821, such as a global positioning system (GPS) sensor, compass, accelerometer, or other sensor. The machine 800 may include an output controller 828, such as a serial (e.g., universal serial bus (USB)), parallel, or other wired or wireless (e.g., infrared (IR), near field communication (NFC), etc.) connection to communicate or control one or more peripheral devices (e.g., a printer, card reader, etc.).
The storage device 816 may include a machine readable medium 822 on which is stored one or more sets of data structures or instructions 824 (e.g., software) embodying or utilized by any one or more of the techniques or functions described herein. The instructions 824 may also reside, completely or at least partially, within the main memory 804, within static memory 806, or within the hardware processor 802 during execution thereof by the machine 800. In an example, one or any combination of the hardware processor 802, the main memory 804, the static memory 806, or the storage device 816 may constitute machine readable media.
While the machine readable medium 822 is illustrated as a single medium, the term “machine readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) configured to store the one or more instructions 824.
The term “machine readable medium” may include any medium that is capable of storing, encoding, or carrying instructions for execution by the machine 800 and that cause the machine 800 to perform any one or more of the techniques of the present disclosure, or that is capable of storing, encoding or carrying data structures used by or associated with such instructions. Non-limiting machine readable medium examples may include solid-state memories, and optical and magnetic media. In an example, a massed machine readable medium comprises a machine readable medium with a plurality of particles having invariant (e.g., rest) mass. Accordingly, massed machine-readable media are not transitory propagating signals. Specific examples of massed machine readable media may include: non-volatile memory, such as semiconductor memory devices (e.g., Electrically Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM)) and flash memory devices, magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
The instructions 824 may further be transmitted or received over a communications network 826 using a transmission medium via the network interface device 820 utilizing any one of a number of transfer protocols (e.g., frame relay, internet protocol (IP), transmission control protocol (TCP), user datagram protocol (UDP), hypertext transfer protocol (HTTP), etc.). Example communication networks may include a local area network (LAN), a wide area network (WAN), a packet data network (e.g., the Internet), mobile telephone networks (e.g., cellular networks), Plain Old Telephone Service (POTS) networks, and wireless data networks (e.g., Institute of Electrical and Electronics Engineers (IEEE) 802.11 family of standards known as Wi-Fi®, IEEE 802.16 family of standards known as WiMax®), IEEE 802.15.4 family of standards, peer-to-peer (P2P) networks, among others. In an example, the network interface device 820 may include one or more physical jacks (e.g., Ethernet, coaxial, or phone jacks) or one or more antennas to connect to the communications network 826. In an example, the network interface device 820 may include a plurality of antennas to wirelessly communicate using at least one of single-input multiple-output (SIMO), multiple-input multiple-output (MIMO), or multiple-input single-output (MISO) techniques. The term “transmission medium” shall be taken to include any intangible medium that is capable of storing, encoding or carrying instructions for execution by the machine 800, and includes digital or analog communications signals or other intangible medium to facilitate communication of such software.
Example 1 is a device for creating a far-field microphone array, the device comprising: a first plurality of microphones along a circumference of a circle on a surface of the device; a second plurality of microphones along a line on the surface; a first set of connections to the first plurality of microphones; a second set of connections to the second plurality of microphones; and a connector to provide the first connections and the second connections to an external entity of the surface.
In Example 2, the subject matter of Example 1 optionally includes wherein the line intersects the circle.
In Example 3, the subject matter of Example 2 optionally includes wherein the line terminates on the circle.
In Example 4, the subject matter of Example 3 optionally includes wherein the line has a length equal to a diameter of the circle.
In Example 5, the subject matter of any one or more of Examples 3-4 optionally include wherein at least one microphone is in the first plurality of microphones and is in the second plurality of microphones.
In Example 6, the subject matter of any one or more of Examples 1-5 optionally include wherein the line is wholly within the circle.
In Example 7, the subject matter of any one or more of Examples 1-6 optionally include wherein the first plurality of microphones includes six microphones.
In Example 8, the subject matter of any one or more of Examples 1-7 optionally include wherein the second plurality of microphones includes four microphones.
In Example 9, the subject matter of any one or more of Examples 1-8 optionally include wherein the total of the first plurality of microphones and the second plurality of microphones is eight, wherein the first plurality of microphones has an even number of microphones, wherein the first plurality of microphones has more than two microphones, and wherein the second plurality of microphones has an even number of microphones.
In Example 10, the subject matter of any one or more of Examples 1-9 optionally include a cover affixed to the surface with a first fastener and a first vibration dampener; and a housing affixed to the surface with a second fastener and a second vibration dampener.
In Example 11, the subject matter of Example 10 optionally includes wherein the vibration dampener has rubber-like properties.
In Example 12, the subject matter of any one or more of Examples 10-11 optionally include wherein the first fastener and the second fastener are the same and are a screw.
In Example 13, the subject matter of any one or more of Examples 10-12 optionally include wherein the first vibration dampener seals space between the surface and a microphone when secured by the first fastener.
Example 14 is at least one machine readable medium including instructions for creating a far-field microphone array, the instructions, when executed by a machine, causing the machine to perform operations comprising: disposing a first plurality of microphones along a circumference of a circle on a surface; disposing a second plurality of microphones along a line on the surface; grouping first connections to the first plurality of microphones together; grouping second connections to the second plurality of microphones together; and providing the first connections and the second connections to an external entity of the surface via a connector.
In Example 15, the subject matter of Example 14 optionally includes wherein the line intersects the circle.
In Example 16, the subject matter of Example 15 optionally includes wherein the line terminates on the circle.
In Example 17, the subject matter of Example 16 optionally includes wherein the line has a length equal to a diameter of the circle.
In Example 18, the subject matter of any one or more of Examples 16-17 optionally include wherein at least one microphone is in the first plurality of microphones and is in the second plurality of microphones.
In Example 19, the subject matter of any one or more of Examples 14-18 optionally include wherein the line is wholly within the circle.
In Example 20, the subject matter of any one or more of Examples 14-19 optionally include wherein the first plurality of microphones includes six microphones.
In Example 21, the subject matter of any one or more of Examples 14-20 optionally include wherein the second plurality of microphones includes four microphones.
In Example 22, the subject matter of any one or more of Examples 14-21 optionally include wherein the total of the first plurality of microphones and the second plurality of microphones is eight, wherein the first plurality of microphones has an even number of microphones, wherein the first plurality of microphones has more than two microphones, and wherein the second plurality of microphones has an even number of microphones.
In Example 23, the subject matter of any one or more of Examples 14-22 optionally include wherein the operations comprise: affixing a cover to the surface with a first fastener and a first vibration dampener; and affixing the surface to a housing with a second fastener and a second vibration dampener.
In Example 24, the subject matter of Example 23 optionally includes wherein the vibration dampener has rubber-like properties.
In Example 25, the subject matter of any one or more of Examples 23-24 optionally include wherein the first fastener and the second fastener are the same and are a screw.
In Example 26, the subject matter of any one or more of Examples 23-25 optionally include wherein the first vibration dampener seals space between the surface and a microphone when secured by the first fastener.
Example 27 is a method for creating a far-field microphone array, the method comprising: disposing a first plurality of microphones along a circumference of a circle on a surface; disposing a second plurality of microphones along a line on the surface; grouping first connections to the first plurality of microphones together; grouping second connections to the second plurality of microphones together; and providing the first connections and the second connections to an external entity of the surface via a connector.
In Example 28, the subject matter of Example 27 optionally includes wherein the line intersects the circle.
In Example 29, the subject matter of Example 28 optionally includes wherein the line terminates on the circle.
In Example 30, the subject matter of Example 29 optionally includes wherein the line has a length equal to a diameter of the circle.
In Example 31, the subject matter of any one or more of Examples 29-30 optionally include wherein at least one microphone is in the first plurality of microphones and is in the second plurality of microphones.
In Example 32, the subject matter of any one or more of Examples 27-31 optionally include wherein the line is wholly within the circle.
In Example 33, the subject matter of any one or more of Examples 27-32 optionally include wherein the first plurality of microphones includes six microphones.
In Example 34, the subject matter of any one or more of Examples 27-33 optionally include wherein the second plurality of microphones includes four microphones.
In Example 35, the subject matter of any one or more of Examples 27-34 optionally include wherein the total of the first plurality of microphones and the second plurality of microphones is eight, wherein the first plurality of microphones has an even number of microphones, wherein the first plurality of microphones has more than two microphones, and wherein the second plurality of microphones has an even number of microphones.
In Example 36, the subject matter of any one or more of Examples 27-35 optionally include affixing a cover to the surface with a first fastener and a first vibration dampener; and affixing the surface to a housing with a second fastener and a second vibration dampener.
In Example 37, the subject matter of Example 36 optionally includes wherein the vibration dampener has rubber-like properties.
In Example 38, the subject matter of any one or more of Examples 36-37 optionally include wherein the first fastener and the second fastener are the same and are a screw.
In Example 39, the subject matter of any one or more of Examples 36-38 optionally include wherein the first vibration dampener seals space between the surface and a microphone when secured by the first fastener.
Example 40 is a system including means to perform any of methods 27-39.
Example 41 is at least one machine readable medium including instructions that, when executed by a machine, cause the machine to perform any of methods 27-39.
Example 42 is a system for creating a far-field microphone array, the system comprising: means for disposing a first plurality of microphones along a circumference of a circle on a surface; means for disposing a second plurality of microphones along a line on the surface; means for grouping first connections to the first plurality of microphones together; means for grouping second connections to the second plurality of microphones together; and means for providing the first connections and the second connections to an external entity of the surface via a connector.
In Example 43, the subject matter of Example 42 optionally includes wherein the line intersects the circle.
In Example 44, the subject matter of Example 43 optionally includes wherein the line terminates on the circle.
In Example 45, the subject matter of Example 44 optionally includes wherein the line has a length equal to a diameter of the circle.
In Example 46, the subject matter of any one or more of Examples 44-45 optionally include wherein at least one microphone is in the first plurality of microphones and is in the second plurality of microphones.
In Example 47, the subject matter of any one or more of Examples 42-46 optionally include wherein the line is wholly within the circle.
In Example 48, the subject matter of any one or more of Examples 42-47 optionally include wherein the first plurality of microphones includes six microphones.
In Example 49, the subject matter of any one or more of Examples 42-48 optionally include wherein the second plurality of microphones includes four microphones.
In Example 50, the subject matter of any one or more of Examples 42-49 optionally include wherein the total of the first plurality of microphones and the second plurality of microphones is eight, wherein the first plurality of microphones has an even number of microphones, wherein the first plurality of microphones has more than two microphones, and wherein the second plurality of microphones has an even number of microphones.
In Example 51, the subject matter of any one or more of Examples 42-50 optionally include means for affixing a cover to the surface with a first fastener and a first vibration dampener; and means for affixing the surface to a housing with a second fastener and a second vibration dampener.
In Example 52, the subject matter of Example 51 optionally includes wherein the vibration dampener has rubber-like properties.
In Example 53, the subject matter of any one or more of Examples 51-52 optionally include wherein the first fastener and the second fastener are the same and are a screw.
In Example 54, the subject matter of any one or more of Examples 51-53 optionally include wherein the first vibration dampener seals space between the surface and a microphone when secured by the first fastener.
The above detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show, by way of illustration, specific embodiments that may be practiced. These embodiments are also referred to herein as “examples.” Such examples may include elements in addition to those shown or described. However, the present inventors also contemplate examples in which only those elements shown or described are provided. Moreover, the present inventors also contemplate examples using any combination or permutation of those elements shown or described (or one or more aspects thereof), either with respect to a particular example (or one or more aspects thereof), or with respect to other examples (or one or more aspects thereof) shown or described herein.
All publications, patents, and patent documents referred to in this document are incorporated by reference herein in their entirety, as though individually incorporated by reference. In the event of inconsistent usages between this document and those documents so incorporated by reference, the usage in the incorporated reference(s) should be considered supplementary to that of this document; for irreconcilable inconsistencies, the usage in this document controls.
In this document, the terms “a” or “an” are used, as is common in patent documents, to include one or more than one, independent of any other instances or usages of “at least one” or “one or more.” In this document, the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.” Also, in the following claims, the terms “including” and “comprising” are open-ended, that is, a system, device, article, or process that includes elements in addition to those listed after such a term in a claim is still deemed to fall within the scope of that claim. Moreover, in the following claims, the terms “first,” “second,” and “third,” etc. are used merely as labels, and are not intended to impose numerical requirements on their objects.
The above description is intended to be illustrative, and not restrictive. For example, the above-described examples (or one or more aspects thereof) may be used in combination with each other. Other embodiments may be used, such as by one of ordinary skill in the art upon reviewing the above description. The Abstract is to allow the reader to quickly ascertain the nature of the technical disclosure and is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Also, in the above Detailed Description, various features may be grouped together to streamline the disclosure. This should not be interpreted as intending that an unclaimed disclosed feature is essential to any claim. Rather, inventive subject matter may lie in less than all features of a particular disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment. The scope of the embodiments should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
This patent application claims the benefit of priority, under 35 U.S.C. §119, to U.S. Provisional Application Ser. No. 62/350,507, titled “FAR FIELD AUTOMATIC SPEECH RECOGNITION” and filed on Jun. 15, 2016, the entirety of which is hereby incorporated by reference herein.
Number | Date | Country
---|---|---
62350507 | Jun 2016 | US