The present technology relates to electronic devices that are equipped with microphones. In particular, the present technology relates to electronic devices equipped with microphones, receivers (including those of transceivers), and other electronic components such as transmitters, speakers, batteries, etc., so that sound captured by the microphones can be converted to digital audio signals and transmitted. The present technology also relates to noise canceling and distortion-preventing techniques used to enhance the fidelity of digital audio signals, and to related techniques such as voice canceling, speech-to-text and noisy-to-clear speech conversion. Thus, the present technology may be applied to speech recognition, beamforming, and conferencing including virtual conferencing, etc.
Speech recognition at its core is a technology aimed at capturing speech and reproducing an accurate version thereof. Speech recognition and similar applications present many challenges especially under certain conditions and in certain environments. Single microphone approaches typically suffer from drawbacks such as difficulty in distinguishing between multiple speakers and removing background noise, and can only output low-fidelity audio signals. Existing attempts to solve these issues use multiple microphones in a fixed configuration, but these systems have limited feasibility because the fixed positions and orientations of the microphones are not optimal for many situations requiring their use.
An object of the present technology is to leverage microphone-equipped electronic devices to produce enhanced audio.
Similarly, another object of the present technology is to provide a method of producing enhanced audio by leveraging a plurality of microphone-equipped electronic devices.
According to one aspect of the present technology, there is provided audio apparatus for capturing ambient sound and producing an enhanced audio signal from the captured sound, comprising a plurality of discrete self-contained electronic devices each or collectively having an audio functionality apart from the apparatus, and a stand to which the electronic devices are held in respective static positions as spaced apart from each other and at which the microphone or microphone array of each of the electronic devices is acoustically exposed to the ambient environment. The discrete self-contained electronic devices each include a receiver, a microphone or microphone array, and a casing housing the receiver and the microphone or microphone array. The stand includes means for temporarily holding the electronic devices in their respective static positions at which the microphone or microphone array of each of the electronic devices is acoustically exposed to the ambient environment. The audio apparatus also includes a processing unit operatively connected to the microphone or microphone array of each of the electronic devices so as to receive as input streams of audio signals from the microphones or microphone arrays of the electronic devices. The processing unit stores information indicative of the distance(s) between and orientations of the microphones or microphone arrays of the electronic devices in their respective static positions, and is configured to process the input streams of audio based on that information. The audio apparatus also includes a transmitter or at least one speaker operatively connected to the processing unit to transmit enhanced audio resulting from the processing of the signals using the information indicative of the distance between and orientations of the microphones or microphone arrays of the electronic devices.
According to another aspect of the present technology, there is provided audio apparatus for capturing ambient sound and producing an enhanced audio signal from the captured sound, comprising: a plurality of discrete microphone-equipped devices each having a microphone or microphone array housed in a casing wherein the devices each or collectively have an audio functionality apart from the apparatus, a stand discrete from the microphone-equipped devices, and a processing unit comprising a processor operatively connected to the microphone-equipped devices and non-transitory computer-readable media (CRM). The stand has means for temporarily holding the microphone-equipped devices in respective static positions at which the microphone or microphone array of each of the microphone-equipped devices is acoustically exposed to the ambient environment outside the stand. The non-transitory CRM stores operating instructions and information indicative of distance between and orientations of the microphone-equipped devices in their static positions. The processor is operatively connected to the microphone or microphone array of each of the microphone-equipped devices, and the processor is configured to: execute the operating instructions to control the microphone-equipped devices to capture raw audio and convert the raw audio to streams of audio signals, process the streams of audio signals based on the information indicative of the distance between and orientations of the microphone-equipped devices, and transmit a result of the processed streams of audio signals.
According to still another aspect of the present technology, there is provided a method of producing enhanced audio by leveraging electronic devices each including a receiver, a microphone or microphone array, and a casing housing the receiver and the microphone or microphone array, the method comprising: taking the electronic devices and temporarily fixing them to a stand in static positions as spaced apart from each other, capturing sound from the ambient environment outside the stand using the microphones or microphone arrays of the electronic devices while in said static positions, and converting the sounds to respective streams of audio signals, digitally processing the streams of audio signals based on information including distance between and orientations of the electronic devices in said static positions, and transmitting the digitally processed audio signals.
These and other objects, features and advantages of the present technology will be better understood from the detailed description of preferred embodiments and examples thereof that follows with reference to the accompanying drawings, in which:
Embodiments of the present technology and examples thereof will now be described more fully in detail hereinafter with reference to the accompanying drawings. In the drawings, elements may be shown schematically for ease of understanding. Also, like numerals are used to designate like elements throughout the drawings.
Certain examples may be described and illustrated in terms of blocks which carry out a described function or functions. These blocks, which may be referred to herein as modules or the like, are physically implemented by analog and/or digital circuits such as logic gates, integrated circuits, microprocessors, microcontrollers, memory circuits, passive electronic components, active electronic components, optical components, hardwired circuits and the like, and may be driven by firmware and/or software of non-transitory computer readable media (CRM). In the present disclosure, the term non-transitory computer readable medium or media (CRM) refers to any tangible medium that stores data in a machine-readable format, as distinguished from a transitory propagating signal, and includes media that store data only for short periods or in the presence of power, such as a memory device or Random Access Memory (RAM). The circuits may, for example, be embodied in one or more semiconductor chips, or on substrate supports such as printed circuit boards and the like. The circuits constituting a block may be implemented by dedicated hardware or by a specialized computer (e.g., one or more programmed microprocessors and associated circuitry, a CPU and/or a GPU, and associated memory programmed with and/or storing algorithms, operating instructions, audio signals/information, text, etc.), or by a combination of dedicated hardware to perform some functions of the block and a specialized computer to perform other functions of the block. Each block of the examples may be physically separated into two or more interacting and discrete blocks and conversely, the blocks of the examples may be physically combined into more complex blocks while still providing the essential functions of the present technology.
In addition, the terminology used herein for the purpose of describing embodiments of the present technology is to be taken in context. For example, the term “comprises” or “comprising” when used in this disclosure indicates the presence of stated features in a system or steps in a process but does not preclude the presence of additional features or steps. The term “sound” will be used in a broad sense to mean vibrations which can travel through air or another medium and which can be heard naturally or when amplified. Thus, the term “raw audio” may refer to sound from the ambient environment in which audio apparatus according to the present technology is situated (or through which the audio apparatus is traveling). The term “high-fidelity audio” will be used conventionally to refer to sound whose noise and distortion are inaudible to the human ear, as is well understood in the art. The term “contour” will be understood as referring to the shape of an outer portion, in whole or in part, of an element. The term “acoustically exposed” will refer to an element's exposure to an ambient environment to such a degree that the element can transmit or receive sound without that sound being distorted or muffled to an appreciable extent. Thus, acoustically exposed may be understood as encompassing not only open exposure but also the presence of acoustically transparent material, such as speaker grill cloth, plastic, wire screen and mesh, and open cell foam. Moreover, when the distance between microphones of respective electronic devices is referred to, such a description may refer to the distance between the centers of individual microphones of the respective electronic devices or the distance between the centers of microphone arrays of respective electronic devices.
Referring now to
Examples of such microphone-equipped devices include ear-worn audio devices and mobile phones. Examples of ear-worn audio devices include True Wireless Stereo (TWS) earbuds, open-ear (OWS) earbuds, hearing aids, and wired earbuds, i.e., basically any type of ear-worn audio device which can be used for audio recording and/or playback. In any case, the microphone-equipped devices will have an audio functionality (usefulness) apart from the stand, either individually as in the case of mobile phones or collectively as in the case of a pair of ear-worn audio devices.
The stand includes means for temporarily holding the microphone-equipped devices in respective and predetermined static positions at which the distance(s) between and orientations of the devices are specifications of the audio apparatus. The specifications can be predetermined or read from the microphone-equipped devices based on their model once they assume their static positions. For any microphone-equipped device, the structure of its microphone or microphone array is fixed relative to its casing and so, the distance(s) between and orientations of the devices correspond to the distance(s) between and orientations of their respective microphones or microphone arrays. The stand and, in particular, its holding means may be designed to optimize these specifications, e.g., maximize the distance between the microphones or microphone arrays in an array thereof when the microphone-equipped devices are in their static positions, or set a linear, circular, or planar geometry of the array depending on the desired functionality. In any case, these specifications of distance between and orientations of the microphone-equipped devices are optimized for leveraging the devices to capture sound in the ambient environment in which the audio apparatus is used and transmit enhanced (e.g., high-fidelity) audio of the captured sound.
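By way of illustration only, the following Python sketch shows one way such a geometry specification could be represented in software. The names (MicSpec, ArraySpec) and the 12 cm spacing are hypothetical and are not part of the disclosed apparatus.

```python
# Hypothetical representation of the stand's array geometry specification.
from dataclasses import dataclass
from itertools import combinations
import math

@dataclass
class MicSpec:
    x: float  # position of the microphone (or array center), in meters
    y: float
    z: float
    azimuth_deg: float    # orientation of the microphone's principal axis
    elevation_deg: float

@dataclass
class ArraySpec:
    mics: list  # one MicSpec per microphone-equipped device

    def pairwise_distances(self):
        """Distances between microphone centers, as referenced in the text."""
        dists = {}
        for (i, a), (j, b) in combinations(enumerate(self.mics), 2):
            dists[(i, j)] = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
        return dists

# Example: two earbuds held 12 cm apart in a linear geometry (assumed values).
spec = ArraySpec(mics=[MicSpec(0.0, 0.0, 0.0, 0.0, 0.0),
                       MicSpec(0.12, 0.0, 0.0, 0.0, 0.0)])
print(spec.pairwise_distances())  # {(0, 1): 0.12}
```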
Examples of the means for holding the microphone-equipped devices in their static positions include portions of the stand defining cavities whose shapes are complementary to contours of the casings of the microphone-equipped devices, respectively. The microphone-equipped devices are received in the cavities in static positions under a high tolerance press-fit. Alternatively, the means for holding the microphone-equipped devices in their static positions may comprise ferromagnetic elements which attract ferromagnetic elements of the microphone-equipped devices to the stand.
A determination may be made as to whether the microphone-equipped devices are in the predetermined static positions relative to the stand (S20). For example, the stand may include proximity sensors integrated with certain other parts of the stand for sensing the presence of the microphone-equipped devices at their static positions, or for sensing whether part of the stand has been manipulated into a state in which the microphone-equipped devices must be in their static positions. Alternatively, the microphone-equipped devices may be equipped with proximity sensors triggered by the stand when they are in their static positions relative to the stand. The proximity sensors of the microphone-equipped devices may be similar to sensors of ear-worn audio devices, known per se, for detecting whether the ear-worn audio devices are situated in the ear canals of users. Examples of proximity sensors that may be employed by the present technology are Hall effect sensors.
In other examples, the sensors comprise electrical contacts and have a secondary function of connecting microcircuits of the microphone-equipped devices with circuitry in the stand. For example, the sensors are implemented as conductive spring-loaded pin contacts, and signals such as serial peripheral interface (SPI) signals are passed between the microphone-equipped devices and a processing unit in the stand. As another example, amplified microphone outputs from the microphone-equipped devices are connected directly to circuitry of the stand by an analog multiplexer. In addition, two or more microphone-equipped devices can be time-synchronized via a shared clock signal which is connected via electrical contact when the devices are in their static positions.
In any case, if the microphone-equipped devices are not in their static positions, the user can adjust them manually until they are (S30). In some examples, the means for holding the microphone-equipped devices ensures that the devices are in their static positions and accordingly, steps (S20) and (S30) are omitted.
Once the microphone-equipped devices are in their respective static positions, they are powered on, charged, if necessary, and controlled to begin capturing sound in the ambient environment (S40). At this time, the audio signals produced by the microphones are processed based on the information of their relative positions and orientations (S50). Such information is representative of the specification of a microphone array constituted by the microphone-equipped devices, collectively. For example, the processing includes time-synchronizing the signals from the microphone(s) of the respective devices. Next, the processed signals are transmitted (S60), as enhanced by the signal processing (S50). Because the microphone-equipped devices are held in static positions at which their relative positions and orientations are known, the audio output by the microphones of the devices can be spatially and directionally controlled, i.e., so that the microphones can produce high-fidelity audio.
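As a purely illustrative sketch of the time-synchronization performed as part of the processing (S50), the following Python code estimates the capture offset between two microphone streams by cross-correlation and aligns them. The 16 kHz sample rate and broadband test signal are assumptions, and an actual implementation may instead rely on a shared clock signal as described above.

```python
import numpy as np

def align(reference: np.ndarray, other: np.ndarray) -> np.ndarray:
    """Shift `other` so it is time-aligned with `reference`."""
    corr = np.correlate(reference, other, mode="full")
    # Offset of `other` relative to `reference`, in samples.
    lag = int(corr.argmax()) - (len(other) - 1)
    return np.roll(other, lag)

rng = np.random.default_rng(0)
ref = rng.standard_normal(16_000)   # one second of broadband audio at 16 kHz
delayed = np.roll(ref, 37)          # simulate a 37-sample capture offset
assert np.allclose(ref, align(ref, delayed))
```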
In addition, the present technology is well-suited to audio beamforming. Here, the microphone array may be controlled to selectively focus on sound coming from a specific direction, effectively “steering” a beam of sound towards a desired source of sound (e.g., a particular speaker) while minimizing noise from other directions by manipulating the timing and amplitude of signals received by the respective microphone(s) of the devices. The audio apparatus may thus act like a virtual microphone which captures sound from a targeted area. In other words, an algorithm tailored to the specifications and static positions of the microphone-equipped devices, and executed as part of the signal processing (S50), permits the creation of a virtual microphone(s) which can home in on or reject particular sources of sound. Accordingly, the audio apparatus of the present technology is useful for live or remote conferencing.
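The following is a minimal Python sketch of delay-and-sum beamforming, the simplest form of the steering described above. It assumes a known array geometry, a far-field source, and integer-sample delays; a practical implementation would typically use fractional-delay or frequency-domain filtering, and the actual algorithm of the apparatus is not limited to this form.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def delay_and_sum(signals, mic_positions, look_direction, fs):
    """signals: (n_mics, n_samples) synchronized streams;
    mic_positions: (n_mics, 3) coordinates in meters;
    look_direction: unit 3-vector pointing from the array toward the source."""
    signals = np.asarray(signals, dtype=float)
    mic_positions = np.asarray(mic_positions, dtype=float)
    d = np.asarray(look_direction, dtype=float)
    # Mics closer to the source (larger projection onto d) hear the wavefront
    # earlier; delay each stream by its relative advance to align them.
    advances = mic_positions @ d / SPEED_OF_SOUND
    advances -= advances.min()
    out = np.zeros(signals.shape[1])
    for sig, tau in zip(signals, advances):
        out += np.roll(sig, int(round(tau * fs)))  # integer-sample delay
    return out / len(signals)
```

Summing the aligned streams reinforces sound arriving from the look direction while sound from other directions adds incoherently and is attenuated.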
In some examples such as those shown in and described later with respect to
Referring now to
Furthermore, in this example, the case 220 has a base 220a and a cover 220b attached to the base 220a by a hinge 223 or the like so that the cover 220b is openable and closeable relative to the base 220a. Respective portions of each of the cavities 221, 222 are provided in the base 220a and cover 220b, respectively. However, the cavities 221, 222 may be provided entirely in the base 220a and the cover 220b may simply cap the ear-worn audio devices 210 when closed. In some examples, the cover 220b is provided with an elastic element that presses against the ear-worn audio devices 210 and constitutes the means for holding the ear-worn audio devices 210 in their static positions.
The case 220 in the present example is of a hand-held type but may have various other shapes, configurations and compositions so long as the cavities 221, 222 hold the ear-worn audio devices 210 in static positions under a high tolerance press-fit with the microphones acoustically exposed to the ambient environment outside the case 220. To this end, the case 220 has acoustical channels 224 extending from an exterior surface thereof to the cavities 221, 222, respectively, and hence to the microphones of the ear-worn audio devices 210. The acoustical channels 224 acoustically expose or “couple” the microphones of the ear-worn audio devices 210 to the ambient environment outside the case 220. The acoustical channels 224 may be one or more openings (vents) in the case, as shown, and may additionally or alternatively be formed of acoustically transparent material that allows sound to pass easily therethrough in both directions. Examples of such acoustically transparent material include speaker grill cloth, some plastics, wire screen and mesh, open cell foam, and acoustically transparent fabrics, known per se. The case 220 may also have one or more elastic gaskets, e.g., an O-ring, extending around the acoustical channels 224, as interposed between the cavities 221, 222 and the exterior of the case to acoustically isolate a microphone(s)/microphone array with respect to the acoustical channel 224 associated therewith. Alternatively, the entirety of the case 220, or the part of the case 220 that delimits the cavities 221, 222, may be made of acoustically transparent material such as the above-mentioned open cell foam. In such examples, the material of the case 220 itself serves as the acoustical channels 224.
The case 220 also includes sensors 225 operative to detect when the ear-worn audio devices 210 are in their respective static positions. In the illustrated example, the sensors are proximity sensors comprising metallic plates located within the cavities 221, 222 to confirm that the ear-worn audio devices 210 have been fully and correctly inserted into the cavities 221, 222. Alternatively or additionally, a proximity sensor may be disposed near the hinge 223 of the case 220 to check whether the cover 220b has been properly closed, because proper closure of the cover 220b can confirm that the ear-worn audio devices 210 are in their static positions owing to the small tolerances of the cavities 221, 222 with respect to the ear-worn audio devices 210. An example of a proximity sensor well-suited for checking whether the cover 220b has been properly closed is a magnetic proximity sensor. The case 220 may also have a closure 226 by which the cover 220b can be latched to the base 220a when closed. In some examples, the closure 226 is provided with a sensor that senses when it is in the closed position, and thereby confirms that the microphone-equipped devices have been fully and correctly inserted into the cavities 221, 222.
In this example, the stand 320 has acoustical channels 324 extending from an exterior surface thereof to the cavities 321a, 321b, 322a, 322b, respectively, and hence to the microphones of the mobile phones 310 received in the cavities 321a, 321b, 322a, 322b. The acoustical channels 324 acoustically expose the microphones of the mobile phones 310 to the ambient environment outside the stand 320. The acoustical channels 324 may be realized similarly to the acoustical channels 224 described above with respect to the case 220. Likewise, the stand 320 also includes sensors 325 operative to detect when the mobile phones 310 are in their respective static positions. In the illustrated example, the sensors are proximity sensors comprising metallic plates facing the cavities 321a, 321b, 322a, 322b to confirm that the mobile phones 310 have been fully and correctly inserted into the cavities 321a, 321b, 322a, 322b.
Still referring to
The state machine protocol 420, as set by operating instructions stored in non-transitory CRM of the software module, is responsible for facilitating the triggering and toggling between different operating states and displaying the current state of the audio apparatus to the user on a user interface. The user interface allows the user to trigger changes between the different operating states and input various commands to the audio apparatus through conventional means, known per se, such as buttons, icons on a touch screen, etc. The current state of the audio apparatus may include information about whether the apparatus is recording sound, information on the state of a battery/batteries of the stand/microphone-equipped devices and the charging status of the same, and information on system errors.
The different operating states may include those found in True Wireless Stereo (TWS) earbuds, but with the addition of an “active listening” state for when the microphone-equipped devices are used as a beamformer to capture sound and produce high-fidelity audio. In particular, the states include those related to the external microphone-equipped devices such as whether the microphone(s) are being used to capture audio and analyze its spatial characteristics, states related to the audio apparatus itself such as whether the apparatus is transmitting the captured audio externally, and finally states related to power control. The state machine protocol 420 is also responsible for handling communication with other components of the audio apparatus such as a power control module for determining charging states, the user interface, and the sensor of the stand responsible for detecting when the microphone-equipped devices are in their predetermined static positions. The state machine protocol 420 also includes operating instructions for updating the current state of the audio apparatus in real time and may display that state on the user interface at all times.
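Purely as a hypothetical illustration of such a protocol, the following Python sketch models a handful of operating states and user- or sensor-triggered transitions. The state names, events, and transition table are assumptions based on the description above, not the actual state machine protocol 420.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    CHARGING = auto()
    ACTIVE_LISTENING = auto()  # beamforming capture state described above
    TRANSMITTING = auto()
    ERROR = auto()

# Allowed transitions, triggered by user commands or sensor events (assumed).
TRANSITIONS = {
    (State.IDLE, "devices_docked"): State.CHARGING,
    (State.CHARGING, "start_listening"): State.ACTIVE_LISTENING,
    (State.ACTIVE_LISTENING, "start_transmit"): State.TRANSMITTING,
    (State.TRANSMITTING, "stop_recording"): State.IDLE,
}

def step(state: State, event: str) -> State:
    """Advance the state machine; unknown events leave the state unchanged."""
    new_state = TRANSITIONS.get((state, event), state)
    print(f"UI: {new_state.name}")  # report the current state to the user
    return new_state

s = step(State.IDLE, "devices_docked")       # UI: CHARGING
s = step(s, "start_listening")               # UI: ACTIVE_LISTENING
```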
More specifically, the user interface is preferably first used for signifying the current status of the audio apparatus and relevant system notifications, e.g., whether the apparatus is performing audio recording, its battery status, its charging status, and any system errors. The apparatus status relates to the charging state of the stand or the microphone-equipped devices (i.e., whether the microphone-equipped devices are being charged, whether the stand of the apparatus is charging or plugged into an external power source, etc.), or the recording state of the apparatus (e.g., whether the apparatus is currently being used as a beamformer to record spatial/directional audio, whether the apparatus is currently transmitting recorded audio to other external devices, etc.). System errors can be battery-related (e.g., exceedingly low power level), hardware-related (e.g., disconnected circuitry, degraded circuitry), or software-related (e.g., an error in capturing, processing, or transmitting audio).
The user interface can also take in user commands for executing actions such as changing the operating state from charging to recording spatial and directional audio or transmitting captured audio, or toggling between charging states of the microphone-equipped devices, e.g., setting the power consumption and usage of the microphone-equipped devices when recording to be more efficient or higher-performing.
To any or all of these ends, the user interface, as provided by a mobile phone application, for example, may include a display such as a touch screen or monitor that displays the current state of the apparatus to the user. The user can interact with the interface by directly swiping and tapping on the screen to retrieve additional information on the status of the audio apparatus, toggle between operating states, and send commands.
The user interface may also or alternatively comprise tactile buttons and LED lights located at different positions on the stand. In some examples, a single button is dedicated to handling multiple commands when pressed in different patterns such as being held down for an extended period or being subjected to a quick press or double and multiple consecutive presses. In other examples, the user interface comprises multiple buttons with each button being dedicated to a subset of the potential user commands. For example, one button can be dedicated to all commands related to charging and another to toggling between audio recording states. The LED lights are used to signify the current state of the audio apparatus through their color or blinking patterns.
Another example of the user interface uses the audio modality of the apparatus for receiving auditory commands and notifying the user of the current operating state. In this example, the audio apparatus includes internal speakers located directly on the stand to voice out changes in the operating state and general system notifications, and a dedicated microphone(s) and processor for detecting and executing the audio commands spoken by the user. In some examples, the audio apparatus includes transmitters for forwarding the user's audio commands to external processors of a smart phone, laptop, desktop or the like or to (a server on) the cloud responsible for analyzing and processing the audio signals. After analysis, apparatus-specific commands can be sent back to the audio apparatus to be executed accordingly. Some examples of this user interface include a tactile button or a set of tactile buttons that trigger the internal speaker to voice out the current state or most recent system notification of the apparatus when pressed by the user. Depending on the frequency and pattern at which the button(s) is/are pressed by the user (e.g., quick press, double press, holding down), the internal speakers can voice out different information related to the status of the audio apparatus.
The audio capture and transmission protocol 430 executable by a processing unit of the software module includes audio (sound) capture 431, synchronization 432, and enhanced audio transmission 433 protocols and a command for the audio apparatus to stop recording 434. In this example, the software module is configured with a beamforming algorithm based on specifications of the microphone array (characteristics of the microphone-equipped devices and the relative positions and orientations of their microphones in the audio apparatus), and transmission functionalities responsible for controlling the transmitting of outputs from the microphone-equipped devices. The beamforming algorithm, known per se, may apply digital filters to the input audio signals to, in effect, focus on sound in the ambient environment coming from one direction while attenuating sound coming from other directions.
Under the capture audio and synchronize protocols 431 and 432, streams of signals of raw audio captured by each of the microphone-equipped devices (210 or 310, for example) are input to the beamforming algorithm. The algorithm synchronizes the input signals with respect to time to prevent mismatch across the different microphones or sets of microphones, and then executes a beamforming computation on the synchronized signals to produce directional and spatial audio according to a methodology (filtering) known per se. Under the transmit protocol 433, the audio enhanced as a result of the beamforming computation is transmitted to the cloud or to external devices such as a smart phone, laptop, or desktop computer. The transmitting of the audio can also occur before the synchronization and beamforming; that is, the raw captured audio can first be transmitted to an external device or to the cloud and later be synchronized and processed to provide the enhanced audio. After transmission, the user can then trigger the stop recording command 434. At this time, the microphone-equipped devices could be removed from the stand and returned to their original use(s) as ear-worn audio devices 210 or mobile phones 310, for example.
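Tying the above together, the following hedged Python sketch shows one possible ordering of protocols 431 through 433, reusing the align and delay_and_sum functions from the earlier sketches. The transmit callable is a placeholder, and the ordering shown (capture, synchronize, beamform, transmit) is only one of the variants described above.

```python
import numpy as np

def run_capture_session(streams, mic_positions, look_direction, fs, transmit):
    """streams: list of 1-D raw-audio arrays, one per device (protocol 431)."""
    ref = streams[0]
    synced = [ref] + [align(ref, s) for s in streams[1:]]   # 432: synchronize
    enhanced = delay_and_sum(np.array(synced), mic_positions,
                             look_direction, fs)             # beamform (S50)
    transmit(enhanced)                                       # 433: transmit
    # 434: the user then triggers stop recording and may remove the devices.
```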
The power control protocol 440, as established by the power control module of the audio apparatus, will now be described in more detail. In examples of the audio apparatus according to the present technology such as in the audio apparatus 200 or 300 shown in and described with reference to
In some examples, this component utilizes removable, commercially available batteries for charging the apparatus. In some examples, the audio apparatus includes rechargeable or non-rechargeable battery slots in its stand. In examples like that shown in and described with reference to
In one embodiment, e.g., in an example of audio apparatus 200 shown in
In another embodiment of the audio apparatus, e.g., in an example of audio apparatus 300 shown in
In still other examples, processing units of the microphone-equipped devices, e.g., the digital signal processor(s) of the ear-worn audio devices 210 or the processing unit(s) of the mobile phones 310, are configured with the beamforming algorithm and components for synchronization of microphone signals. One of the processing units of the microphone-equipped devices may be configured as the processing unit according to the present technology so as to serve as a master unit of the audio apparatus.
Audio apparatus according to the present technology may be provided with one or more of the following: (1) positioning and motion sensors, (2) audio-out devices such as internal speakers and audio-out ports, (3) on-device microprocessors for further processing the audio input after applying the beamforming algorithm, and (4) additional custom circuitry used for audio amplification or denoising. Examples of the positioning and motion sensors include an accelerometer(s), a gyroscope, a Global Positioning System (GPS) and an Inertial Measurement Unit (IMU) serving to facilitate the calculation and processing of the captured audio. The positioning and motion sensors can be integrated into the processing unit, user interface, and relevant circuitry to further enhance the accuracy and fidelity of the spatial and directional audio recording from the external microphone arrays. To this end, the positioning and motion sensors impart a capability of using the additional real-time information they provide on global and relative positioning, acceleration, and angular and rotational velocity around different axes (i.e., roll, pitch, yaw). For example, if the position or orientation of the apparatus, or erratic movement of the apparatus or of a speaker, is deemed based on the sensors to be less than ideal or unsuitable for capturing spatial and directional audio, the apparatus can leverage the position and movement information to send a system notification or status update to the user through the user interface and prompt the user to make appropriate adjustments if necessary (e.g., reorienting the apparatus, stopping movement of the apparatus, or asking the speakers to move less erratically). In this respect, the position and motion sensors are especially useful when the audio apparatus is being carried or transported or implemented as a wearable device.
In some examples, multiple Inertial Measurement Units are provided for detecting the current orientation and position of the audio apparatus, such as whether it is standing upright or lying flat on one of its sides. Using this information, the audio apparatus can subsequently modify and adapt the employed beamforming algorithm to ensure the quality of the captured directional and spatial audio. In situations where the inferred position and orientation of the apparatus is judged to be unfit for optimal spatial and directional audio capture, e.g., if the microphone array is positioned relatively far from, or collinear with, the different speakers in an audio recording setting, the apparatus can send out a notification to the user and a request to move and adjust the orientation or positioning of the apparatus before proceeding with audio capturing.
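As an illustrative sketch of such an orientation check, the following Python code infers whether the apparatus is upright from an IMU accelerometer reading of gravity; the tilt threshold and axis convention are assumptions for the sketch.

```python
import numpy as np

def is_upright(gravity_vec, max_tilt_deg=20.0):
    """gravity_vec: IMU accelerometer reading at rest, in m/s^2.
    Assumes the stand's 'up' axis is z; threshold is illustrative."""
    g = np.asarray(gravity_vec, dtype=float)
    cos_tilt = np.clip(abs(g[2]) / np.linalg.norm(g), -1.0, 1.0)
    tilt = np.degrees(np.arccos(cos_tilt))
    return tilt <= max_tilt_deg

print(is_upright([0.0, 0.0, 9.81]))   # True: standing upright
print(is_upright([9.81, 0.0, 0.0]))   # False: lying flat on its side
```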
In some examples, the audio apparatus includes an accelerometer or a series of accelerometers so that if the movement (i.e., acceleration and velocity) of the audio apparatus is deemed to be too fast, i.e., exceeding a given threshold, the processing unit utilizes this information to prompt the user to slow the movement of the apparatus. Using an accelerometer, the processing unit of the audio apparatus can also leverage information on the displacement and movement of the apparatus, such as velocity and acceleration in a Cartesian coordinate system, to provide a more accurate mapping of the captured spatial and directional audio to the global space. More specifically, as opposed to locating the speaker or audio source only relative to the apparatus itself (i.e., in a local coordinate system), an apparatus that uses an accelerometer or a series of accelerometers can ideally also determine where in the larger global space and coordinate system the audio source is located, even when the apparatus is moved or constantly moving while performing audio capture.
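The following Python sketch illustrates the kind of movement check described: acceleration samples are integrated to velocity, and movement whose speed exceeds a threshold is flagged so the user can be prompted. The threshold value and the simple integration scheme are assumptions for illustration only.

```python
import numpy as np

def movement_too_fast(accel_samples, dt, threshold_mps=0.5):
    """accel_samples: (n, 3) accelerations in m/s^2, sampled every `dt` seconds.
    Integrates acceleration to velocity and checks the peak speed."""
    velocity = np.cumsum(np.asarray(accel_samples, dtype=float) * dt, axis=0)
    speed = np.linalg.norm(velocity, axis=1)
    return bool(speed.max() > threshold_mps)

# Example: a 2 m/s^2 push along x for 0.5 s reaches 1 m/s -> prompt the user.
samples = np.tile([2.0, 0.0, 0.0], (50, 1))  # 50 samples at dt = 0.01 s
print(movement_too_fast(samples, dt=0.01))   # True
```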
As previously mentioned with reference to the user interface, some examples of the audio apparatus include audio-out devices such as internal speakers or an audio-out port embedded in the device. In addition to serving the functionality of announcing the state and system notifications of the device, internal speakers and audio-out ports can also be used to broadcast previously recorded audio as well as other pre-loaded, downloaded, or streamed music. Audio apparatus according to the present technology may include multiple speakers in the stand of the device (see
In examples of the audio apparatus according to the present technology having on-device microprocessors, the microprocessors are used for post-processing the audio after the beamforming algorithm has been executed to enhance the audio. In contrast to microprocessors of a processing unit for processing the raw audio input into directional and spatial audio, these on-device microprocessors can be used to provide additional auxiliary functionalities such as speech recognition, speech-to-text transcription, and speaker labeling. The on-device microprocessors may also be configured with more advanced forms of Natural Language Processing algorithms such as Large Language Models, may be configured to perform sentiment analysis, and may constitute a virtual assistant. In some examples, the processing unit of the audio apparatus has microprocessors configured to perform speech-to-text transcription and speaker identification, which allow the apparatus to transmit both the processed directional and spatial audio and the text transcriptions from each of the distinct speakers in real time. In some examples in which the on-device microprocessors are configured for real-time speaker sentiment analysis, the categorization of the sentiments is reflected synchronously through the user interface by such means as changes in LED colors or dynamic and adaptive visualizations on a touch display. Additionally, results of the sentiment analysis can be transmitted alongside the directional and spatial audio information in real time to external outlets.
In some examples of audio apparatus according to the present technology, the processing unit may include custom circuitry comprising an amplifier or denoising circuitry for amplifying or denoising the signals from the microphone(s) of the microphone-equipped devices. These circuits can be located either in the stand or in an external case linked and connected to the stand. Although the beamforming algorithm could be used to denoise the raw captured audio input, additional denoising circuitry can further improve this effect, and while not necessary, additional circuitry used to amplify or denoise the received audio input can help enhance the quality and clarity of the captured directional and spatial audio. Some examples of audio apparatus according to the present technology include a customized amplification circuit that serves to augment the raw audio inputs captured directly from the microphones of the microphone-equipped devices. The amplification circuit preferably accounts for the specific design and spatial configuration of the acoustical channels (e.g., 224 or 324) when pre-processing the raw audio inputs, especially when the channels are constituted by an acoustically transparent material such as an open cell foam. In other examples, on-device denoising circuitry is used to filter out ambient sounds from the captured audio inputs. By, for example, thresholding the raw audio before inputting it to the beamforming algorithm for further processing, the denoising circuitry can further enhance the clarity and quality of the processed directional and spatial audio.
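As a software analogue of the thresholding just mentioned, the following Python sketch implements a simple noise gate that zeroes low-amplitude samples before beamforming. Actual denoising circuitry would operate at the analog or mixed-signal level, so this sketch is illustrative only, and the threshold value is an assumption.

```python
import numpy as np

def noise_gate(raw: np.ndarray, threshold: float) -> np.ndarray:
    """Zero out low-amplitude samples presumed to be ambient noise."""
    gated = raw.copy()
    gated[np.abs(gated) < threshold] = 0.0
    return gated

noisy = np.array([0.02, -0.01, 0.8, -0.75, 0.03])
print(noise_gate(noisy, threshold=0.1))  # [ 0.    0.    0.8  -0.75  0.  ]
```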
Finally, although the present technology has been described above in detail with respect to various embodiments and examples thereof, the present technology may be embodied in many other different forms. Thus, the present invention should not be construed as being limited to the embodiments and their examples described above. Rather, these embodiments and examples were described so that this disclosure is thorough, complete, and fully conveys the present invention to those skilled in the art. Thus, various alternatives or changes to and modifications of the disclosed embodiments are seen to be within the true spirit and scope of the present invention as defined by the appended claims.
The present utility patent application is related to and claims the benefit of priority of U.S. Provisional Patent Application No. 63/616,620 filed Dec. 31, 2023, entitled “Multi-microphone Device For Enhanced Audio Capture and Transmission”, the entire contents of which are hereby incorporated by reference.