This application claims the benefit of U.S. Provisional Patent Application No. 62/821,800, filed Mar. 21, 2019, U.S. Provisional Patent Application No. 62/855,187, filed May 31, 2019, and U.S. Provisional Patent Application No. 62/971,648, filed Feb. 7, 2020. The contents of each application are fully incorporated by reference in their entirety herein.
This application generally relates to an array microphone having automatic focus and placement of beamformed microphone lobes. In particular, this application relates to an array microphone that adjusts the focus and placement of beamformed microphone lobes based on the detection of sound activity after the lobes have been initially placed, and allows inhibition of the adjustment of the focus and placement of the beamformed microphone lobes based on a remote far end audio signal.
Conferencing environments, such as conference rooms, boardrooms, video conferencing applications, and the like, can involve the use of microphones for capturing sound from various audio sources active in such environments. Such audio sources may include humans speaking, for example. The captured sound may be disseminated to a local audience in the environment through amplified speakers (for sound reinforcement), and/or to others remote from the environment (such as via a telecast and/or a webcast). The types of microphones and their placement in a particular environment may depend on the locations of the audio sources, physical space requirements, aesthetics, room layout, and/or other considerations. For example, in some environments, the microphones may be placed on a table or lectern near the audio sources. In other environments, the microphones may be mounted overhead to capture the sound from the entire room, for example. Accordingly, microphones are available in a variety of sizes, form factors, mounting options, and wiring options to suit the needs of particular environments.
Traditional microphones typically have fixed polar patterns and few manually selectable settings. To capture sound in a conferencing environment, many traditional microphones can be used at once to capture the audio sources within the environment. However, traditional microphones tend to capture unwanted audio as well, such as room noise, echoes, and other undesirable audio elements. The capturing of these unwanted noises is exacerbated by the use of many microphones.
Array microphones having multiple microphone elements can provide benefits such as steerable coverage or pickup patterns (having one or more lobes), which allow the microphones to focus on the desired audio sources and reject unwanted sounds such as room noise. The ability to steer audio pickup patterns allows microphone placement to be less precise, and in this way, array microphones are more forgiving. Moreover, array microphones provide the ability to pick up multiple audio sources with one array microphone or unit, again due to the ability to steer the pickup patterns.
However, the position of lobes of a pickup pattern of an array microphone may not be optimal in certain environments and situations. For example, an audio source that is initially detected by a lobe may move and change locations. In this situation, the lobe may not optimally pick up the audio source at its new location.
Accordingly, there is an opportunity for an array microphone that addresses these concerns. More particularly, there is an opportunity for an array microphone that automatically focuses and/or places beamformed microphone lobes based on the detection of sound activity after the lobes have been initially placed, while also being able to inhibit the focus and/or placement of the beamformed microphone lobes based on a remote far end audio signal, which can result in higher quality sound capture and more optimal coverage of environments.
The invention is intended to solve the above-noted problems by providing array microphone systems and methods that are designed to, among other things: (1) enable automatic focusing of beamformed lobes of an array microphone in response to the detection of sound activity, after the lobes have been initially placed; (2) enable automatic placement of beamformed lobes of an array microphone in response to the detection of sound activity; (3) enable automatic focusing of beamformed lobes of an array microphone within lobe regions in response to the detection of sound activity, after the lobes have been initially placed; and (4) inhibit or restrict the automatic focusing or automatic placement of beamformed lobes of an array microphone, based on activity of a remote far end audio signal.
In an embodiment, beamformed lobes that have been positioned at initial coordinates may be focused by moving the lobes to new coordinates in the general vicinity of the initial coordinates, when new sound activity is detected at the new coordinates.
In another embodiment, beamformed lobes may be placed or moved to new coordinates, when new sound activity is detected at the new coordinates.
In a further embodiment, beamformed lobes that have been positioned at initial coordinates may be focused by moving the lobes, but confined within lobe regions, when new sound activity is detected at the new coordinates.
In another embodiment, the movement or placement of beamformed lobes may be inhibited or restricted, when the activity of a remote far end audio signal exceeds a predetermined threshold.
These and other embodiments, and various permutations and aspects, will become apparent and be more fully understood from the following detailed description and accompanying drawings, which set forth illustrative embodiments that are indicative of the various ways in which the principles of the invention may be employed.
The description that follows describes, illustrates and exemplifies one or more particular embodiments of the invention in accordance with its principles. This description is not provided to limit the invention to the embodiments described herein, but rather to explain and teach the principles of the invention in such a way to enable one of ordinary skill in the art to understand these principles and, with that understanding, be able to apply them to practice not only the embodiments described herein, but also other embodiments that may come to mind in accordance with these principles. The scope of the invention is intended to cover all such embodiments that may fall within the scope of the appended claims, either literally or under the doctrine of equivalents.
It should be noted that in the description and drawings, like or substantially similar elements may be labeled with the same reference numerals. However, sometimes these elements may be labeled with differing numbers, such as, for example, in cases where such labeling facilitates a clearer description. Additionally, the drawings set forth herein are not necessarily drawn to scale, and in some instances proportions may have been exaggerated to more clearly depict certain features. Such labeling and drawing practices do not necessarily implicate an underlying substantive purpose. As stated above, the specification is intended to be taken as a whole and interpreted in accordance with the principles of the invention as taught herein and understood by one of ordinary skill in the art.
The array microphone systems and methods described herein can enable the automatic focusing and placement of beamformed lobes in response to the detection of sound activity, as well as allow the focus and placement of the beamformed lobes to be inhibited based on a remote far end audio signal. In embodiments, the array microphone may include a plurality of microphone elements, an audio activity localizer, a lobe auto-focuser, a database, and a beamformer. The audio activity localizer may detect the coordinates and confidence score of new sound activity, and the lobe auto-focuser may determine whether there is a previously placed lobe nearby the new sound activity. If there is such a lobe and the confidence score of the new sound activity is greater than a confidence score of the lobe, then the lobe auto-focuser may transmit the new coordinates to the beamformer so that the lobe is moved to the new coordinates. In these embodiments, the location of a lobe may be improved and automatically focused on the latest location of audio sources inside and near the lobe, while also preventing the lobe from overlapping, pointing in an undesirable direction (e.g., towards unwanted noise), and/or moving too suddenly.
In other embodiments, the array microphone may include a plurality of microphone elements, an audio activity localizer, a lobe auto-placer, a database, and a beamformer. The audio activity localizer may detect the coordinates of new sound activity, and the lobe auto-placer may determine whether there is a lobe nearby the new sound activity. If there is not such a lobe, then the lobe auto-placer may transmit the new coordinates to the beamformer so that an inactive lobe is placed at the new coordinates or so that an existing lobe is moved to the new coordinates. In these embodiments, the set of active lobes of the array microphone may point to the most recent sound activity in the coverage area of the array microphone.
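The placement decision described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `Lobe` class, the `near_distance` value, the use of Euclidean distance for "nearby", and the choice to move the least recently active lobe when no inactive lobe remains are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Coord = Tuple[float, float, float]

@dataclass
class Lobe:
    coords: Optional[Coord] = None  # None means the lobe is inactive (hypothetical convention)
    last_active: float = 0.0        # hypothetical timestamp of the most recent activity

def auto_place(lobes, new_coords, now, near_distance=1.0):
    """Place or move a lobe to new_coords unless an active lobe is already nearby.

    Returns the lobe that was placed or moved, or None if nothing changed.
    """
    # If an active lobe already covers the new sound activity, do nothing.
    for lobe in lobes:
        if lobe.coords is not None and math.dist(lobe.coords, new_coords) <= near_distance:
            lobe.last_active = now  # assumption: nearby activity refreshes the lobe
            return None
    # Prefer an inactive lobe; otherwise move an existing lobe.
    inactive = [lobe for lobe in lobes if lobe.coords is None]
    if inactive:
        target = inactive[0]
    else:
        # Assumption: the least recently active lobe is the one moved.
        target = min(lobes, key=lambda lobe: lobe.last_active)
    target.coords = new_coords
    target.last_active = now
    return target
```

With two lobes, the first two distinct talkers each receive a lobe, and a third talker causes the lobe with the oldest activity to be moved.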
In other embodiments, the audio activity localizer may detect the coordinates and confidence score of new sound activity, and if the confidence score of the new sound activity is greater than a threshold, the lobe auto-focuser may identify a lobe region that the new sound activity belongs to. In the identified lobe region, a previously placed lobe may be moved if the coordinates are within a look radius of the current coordinates of the lobe, i.e., a three-dimensional region of space around the current coordinates of the lobe where new sound activity can be considered. The movement of the lobe in the lobe region may be limited to within a move radius of the current coordinates of the lobe, i.e., a maximum distance in three-dimensional space that the lobe is allowed to move, and/or limited to outside a boundary cushion between lobe regions, i.e., how close a lobe can move to the boundaries between lobe regions. In these embodiments, the location of a lobe may be improved and automatically focused on the latest location of audio sources inside the lobe region associated with the lobe, while also preventing the lobes from overlapping, pointing in an undesirable direction (e.g., towards unwanted noise), and/or moving too suddenly.
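The look radius and move radius constraints can be illustrated with a short sketch. It assumes Euclidean distances in Cartesian coordinates and assumes that a candidate beyond the move radius results in a partial move toward the candidate; the text above only states that movement is limited, so that behavior and the function name `clamp_move` are hypothetical. The boundary cushion between lobe regions is not modeled.

```python
import math

def clamp_move(current, candidate, look_radius, move_radius):
    """Return the new lobe coordinates after applying look and move radius limits."""
    distance = math.dist(current, candidate)
    if distance > look_radius:
        # New sound activity outside the look radius is not considered.
        return tuple(current)
    if distance <= move_radius:
        # The candidate is reachable in a single move.
        return tuple(candidate)
    # Assumption: move toward the candidate, but only as far as the move radius allows.
    scale = move_radius / distance
    return tuple(c + (n - c) * scale for c, n in zip(current, candidate))
```

For example, with a look radius of 5 and a move radius of 1, a candidate 3 units away moves the lobe 1 unit toward it, and a candidate 6 units away is ignored.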
In further embodiments, an activity detector may receive a remote audio signal, such as from a far end. The sound of the remote audio signal may be played in the local environment, such as on a loudspeaker within a conference room. If the activity of the remote audio signal exceeds a predetermined threshold, then the automatic adjustment (i.e., focus and/or placement) of beamformed lobes may be inhibited from occurring. For example, the activity of the remote audio signal could be measured by the energy level of the remote audio signal. In this example, the energy level of the remote audio signal may exceed the predetermined threshold when there is a certain level of speech or voice contained in the remote audio signal. In this situation, it may be desirable to prevent automatic adjustment of the beamformed lobes so that lobes are not directed to pick up the sound from the remote audio signal, e.g., that is being played in the local environment. However, if the energy level of the remote audio signal does not exceed the predetermined threshold, then the automatic adjustment of beamformed lobes may be performed. The automatic adjustment of the beamformed lobes may include, for example, the automatic focus and/or placement of the lobes as described herein. In these embodiments, the location of a lobe may be improved and automatically focused and/or placed when the activity of the remote audio signal does not exceed a predetermined threshold, and inhibited or restricted from being automatically focused and/or placed when the activity of the remote audio signal exceeds the predetermined threshold.
Through the use of the systems and methods herein, the quality of the coverage of audio sources in an environment may be improved by, for example, ensuring that beamformed lobes are optimally picking up the audio sources even if the audio sources have moved and changed locations from an initial position. The quality of the coverage of audio sources in an environment may also be improved by, for example, reducing the likelihood that beamformed lobes are deployed (e.g., focused or placed) to pick up unwanted sounds like voice, speech, or other noise from the far end.
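As one illustration of the energy-based example above, the sketch below measures the level of a frame of the remote audio signal and allows lobe adjustment only when that level does not exceed a threshold. The function names, the frame representation (samples normalized to the range -1 to 1), and the -40 dB threshold are assumptions chosen for the example.

```python
import math

def rms_level_db(frame):
    """Return the mean-square level of a frame in dB relative to full scale."""
    if not frame:
        return float("-inf")
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)

def adjustment_allowed(far_end_frame, threshold_db=-40.0):
    """Allow automatic focus/placement only when far-end activity is at or below the threshold."""
    # threshold_db is a hypothetical value; a deployed system would tune it.
    return rms_level_db(far_end_frame) <= threshold_db
```

A silent or very quiet far-end frame permits adjustment, while a frame carrying speech-level energy inhibits it.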
The array microphone 100, 400 may be placed on or in a table, lectern, desktop, wall, ceiling, etc. so that the sound from the audio sources can be detected and captured, such as speech spoken by human speakers. The array microphone 100, 400 may include any number of microphone elements 102a,b, . . . ,zz, 402a,b, . . . ,zz, for example, and be able to form multiple pickup patterns with lobes so that the sound from the audio sources can be detected and captured. Any appropriate number of microphone elements 102, 402 are possible and contemplated.
Each of the microphone elements 102, 402 in the array microphone 100, 400 may detect sound and convert the sound to an analog audio signal. Components in the array microphone 100, 400, such as analog to digital converters, processors, and/or other components, may process the analog audio signals and ultimately generate one or more digital audio output signals. The digital audio output signals may conform to the Dante standard for transmitting audio over Ethernet, in some embodiments, or may conform to another standard and/or transmission protocol. In embodiments, each of the microphone elements 102, 402 in the array microphone 100, 400 may detect sound and convert the sound to a digital audio signal.
One or more pickup patterns may be formed by a beamformer 170, 470 in the array microphone 100, 400 from the audio signals of the microphone elements 102, 402. The beamformer 170, 470 may generate digital output signals 190a,b,c, . . . z, 490a,b,c, . . . ,z corresponding to each of the pickup patterns. The pickup patterns may be composed of one or more lobes, e.g., main, side, and back lobes. In other embodiments, the microphone elements 102, 402 in the array microphone 100, 400 may output analog audio signals so that other components and devices (e.g., processors, mixers, recorders, amplifiers, etc.) external to the array microphone 100, 400 may process the analog audio signals.
The array microphone 100 of
The array microphone 400 of
In embodiments, the array microphone 100, 400 may include other components, such as an acoustic echo canceller or an automixer, that work with the audio activity localizer 150, 450 and/or the beamformer 170, 470. For example, when a lobe is moved to new coordinates in response to detecting new sound activity, as described herein, information from the movement of the lobe may be utilized by an acoustic echo canceller to minimize echo during the movement and/or by an automixer to improve its decision making capability. As another example, the movement of a lobe may be influenced by the decision of an automixer, such as allowing a lobe to be moved that the automixer has identified as having pertinent voice activity. The beamformer 170, 470 may be any suitable beamformer, such as a delay and sum beamformer or a minimum variance distortionless response (MVDR) beamformer.
The various components included in the array microphone 100, 400 may be implemented using software executable by one or more servers or computers, such as a computing device with a processor and memory, graphics processing units (GPUs), and/or by hardware (e.g., discrete logic circuits, application specific integrated circuits (ASIC), programmable gate arrays (PGA), field programmable gate arrays (FPGA), etc.).
In some embodiments, the microphone elements 102, 402 may be arranged in concentric rings and/or harmonically nested. The microphone elements 102, 402 may be arranged to be generally symmetric, in some embodiments. In other embodiments, the microphone elements 102, 402 may be arranged asymmetrically or in another arrangement. In further embodiments, the microphone elements 102, 402 may be arranged on a substrate, placed in a frame, or individually suspended, for example. An embodiment of an array microphone is described in commonly assigned U.S. Pat. No. 9,565,493, which is hereby incorporated by reference in its entirety herein. In embodiments, the microphone elements 102, 402 may be unidirectional microphones that are primarily sensitive in one direction. In other embodiments, the microphone elements 102, 402 may have other directionalities or polar patterns, such as cardioid, subcardioid, or omnidirectional, as desired. The microphone elements 102, 402 may be any suitable type of transducer that can detect the sound from an audio source and convert the sound to an electrical audio signal. In an embodiment, the microphone elements 102, 402 may be micro-electrical mechanical system (MEMS) microphones. In other embodiments, the microphone elements 102, 402 may be condenser microphones, balanced armature microphones, electret microphones, dynamic microphones, and/or other types of microphones. In embodiments, the microphone elements 102, 402 may be arrayed in one dimension or two dimensions. The array microphone 100, 400 may be placed or mounted on a table, a wall, a ceiling, etc., and may be next to, under, or above a video monitor, for example.
An embodiment of a process 200 for automatic focusing of previously placed beamformed lobes of the array microphone 100 is shown in
At step 202, the coordinates and a confidence score corresponding to new sound activity may be received at the lobe auto-focuser 160 from the audio activity localizer 150. The audio activity localizer 150 may continuously scan the environment of the array microphone 100 to find new sound activity. The new sound activity found by the audio activity localizer 150 may include suitable audio sources, e.g., human speakers, that are not stationary. The coordinates of the new sound activity may be a particular three dimensional coordinate relative to the location of the array microphone 100, such as in Cartesian coordinates (i.e., x, y, z), or in spherical coordinates (i.e., radial distance/magnitude r, elevation angle θ (theta), azimuthal angle φ (phi)). The confidence score of the new sound activity may denote the certainty of the coordinates and/or the quality of the sound activity, for example. In embodiments, other suitable metrics related to the new sound activity may be received and utilized at step 202. It should be noted that Cartesian coordinates may be readily converted to spherical coordinates, and vice versa, as needed.
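The conversion between Cartesian and spherical coordinates noted above can be sketched as follows. The sketch assumes that the elevation angle is measured upward from the x-y plane and that the azimuthal angle is measured in the x-y plane from the x axis; other conventions are possible and the text does not fix one.

```python
import math

def cartesian_to_spherical(x, y, z):
    """Return (r, elevation, azimuth) with angles in radians."""
    r = math.sqrt(x * x + y * y + z * z)
    # atan2 avoids a division by r and handles the origin gracefully.
    elevation = math.atan2(z, math.hypot(x, y))
    azimuth = math.atan2(y, x)
    return r, elevation, azimuth

def spherical_to_cartesian(r, elevation, azimuth):
    """Inverse of cartesian_to_spherical under the same angle conventions."""
    x = r * math.cos(elevation) * math.cos(azimuth)
    y = r * math.cos(elevation) * math.sin(azimuth)
    z = r * math.sin(elevation)
    return x, y, z
```

Converting a point to spherical coordinates and back returns the original point up to floating-point rounding.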
The lobe auto-focuser 160 may determine whether the coordinates of the new sound activity are nearby (i.e., in the vicinity of) an existing lobe, at step 204. Whether the new sound activity is nearby an existing lobe may be based on the difference in azimuth and/or elevation angles of (1) the coordinates of the new sound activity and (2) the coordinates of the existing lobe, relative to a predetermined threshold. The distance of the new sound activity away from the microphone 100 may also influence the determination of whether the coordinates of the new sound activity are nearby an existing lobe. The lobe auto-focuser 160 may retrieve the coordinates of the existing lobe from the database 180 for use in step 204, in some embodiments. An embodiment of the determination of whether the coordinates of the new sound activity are nearby an existing lobe is described in more detail below with respect to
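A minimal sketch of an angle-based proximity test follows. It compares azimuth and elevation differences against thresholds, handling the wraparound of azimuth at a full circle. The 15 degree thresholds and the function names are assumptions; the influence of distance from the microphone is not modeled here.

```python
import math

def angle_difference(a, b):
    """Smallest absolute difference between two angles in radians, with wraparound."""
    d = (a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def is_nearby(activity, lobe,
              max_azimuth=math.radians(15.0), max_elevation=math.radians(15.0)):
    """activity and lobe are (azimuth, elevation) pairs in radians."""
    # Thresholds are hypothetical values chosen only for illustration.
    return (angle_difference(activity[0], lobe[0]) <= max_azimuth
            and abs(activity[1] - lobe[1]) <= max_elevation)
```

Under these thresholds, sound activity at an azimuth of 355 degrees is treated as nearby a lobe at 5 degrees, since the two directions differ by only 10 degrees.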
If the lobe auto-focuser 160 determines that the coordinates of the new sound activity are not nearby an existing lobe at step 204, then the process 200 may end at step 210 and the locations of the lobes of the array microphone 100 are not updated. In this scenario, the coordinates of the new sound activity may be considered to be outside the coverage area of the array microphone 100 and the new sound activity may therefore be ignored. However, if at step 204 the lobe auto-focuser 160 determines that the coordinates of the new sound activity are nearby an existing lobe, then the process 200 continues to step 206. In this scenario, the coordinates of the new sound activity may be considered to be an improved (i.e., more focused) location of the existing lobe.
At step 206, the lobe auto-focuser 160 may compare the confidence score of the new sound activity to the confidence score of the existing lobe. The lobe auto-focuser 160 may retrieve the confidence score of the existing lobe from the database 180, in some embodiments. If the lobe auto-focuser 160 determines at step 206 that the confidence score of the new sound activity is less than (i.e., worse than) the confidence score of the existing lobe, then the process 200 may end at step 210 and the locations of the lobes of the array microphone 100 are not updated. However, if the lobe auto-focuser 160 determines at step 206 that the confidence score of the new sound activity is greater than or equal to (i.e., better than or more favorable than) the confidence score of the existing lobe, then the process 200 may continue to step 208. At step 208, the lobe auto-focuser 160 may transmit the coordinates of the new sound activity to the beamformer 170 so that the beamformer 170 can update the location of the existing lobe to the new coordinates. In addition, the lobe auto-focuser 160 may store the new coordinates of the lobe in the database 180.
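The decision flow of steps 204 through 210 can be summarized in a short sketch. The `Lobe` class and the `nearby` predicate passed in by the caller are hypothetical; updating the stored confidence score together with the coordinates is also an assumption, since the text only states that the new coordinates are stored.

```python
from dataclasses import dataclass

@dataclass
class Lobe:
    coords: tuple
    confidence: float

def auto_focus(lobe, new_coords, new_confidence, nearby):
    """Return True if the lobe was moved to new_coords, False otherwise."""
    # Step 204: ignore sound activity that is not nearby the existing lobe.
    if not nearby(new_coords, lobe.coords):
        return False  # step 210: lobe locations are not updated
    # Step 206: keep the lobe where it is if the new activity is less certain.
    if new_confidence < lobe.confidence:
        return False  # step 210
    # Step 208: update the lobe location (and, by assumption, its confidence).
    lobe.coords = new_coords
    lobe.confidence = new_confidence
    return True
```

In a full system the update at step 208 would be sent to the beamformer and recorded in the database; here the `Lobe` object stands in for both.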
In some embodiments, at step 208, the lobe auto-focuser 160 may limit the movement of an existing lobe to prevent and/or minimize sudden changes in the location of the lobe. For example, the lobe auto-focuser 160 may not move a particular lobe to new coordinates if that lobe has been recently moved within a certain recent time period. As another example, the lobe auto-focuser 160 may not move a particular lobe to new coordinates if those new coordinates are too close to the lobe's current coordinates, too close to another lobe, overlapping another lobe, and/or considered too far from the existing position of the lobe.
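The movement limits described above might be checked as in the following sketch. All parameter names and default values (hold time, minimum and maximum move distance, minimum separation between lobes) are assumptions for illustration, and lobe overlap is approximated by a simple distance test.

```python
import math

def move_permitted(current, candidate, other_lobes, last_move_time, now,
                   hold_time=1.0, min_move=0.05, max_move=1.0, min_separation=0.5):
    """Return True if a lobe at `current` may be moved to `candidate`."""
    # The lobe was moved too recently.
    if now - last_move_time < hold_time:
        return False
    distance = math.dist(current, candidate)
    # The candidate is too close to, or too far from, the current position.
    if distance < min_move or distance > max_move:
        return False
    # The candidate is too close to (or overlapping) another lobe.
    return all(math.dist(candidate, other) >= min_separation for other in other_lobes)
```

A caller would evaluate this check before transmitting new coordinates to the beamformer at step 208.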
The process 200 may be continuously performed by the array microphone 100 as the audio activity localizer 150 finds new sound activity and provides the coordinates and confidence score of the new sound activity to the lobe auto-focuser 160. For example, the process 200 may be performed as audio sources, e.g., human speakers, are moving around a conference room so that one or more lobes can be focused on the audio sources to optimally pick up their sound.
An embodiment of a process 300 for automatic focusing of previously placed beamformed lobes of the array microphone 100 using a cost functional is shown in
Steps 302, 304, and 306 of the process 300 for the lobe auto-focuser 160 may be substantially the same as steps 202, 204, and 206 of the process 200 of
A cost functional for a lobe may take into account spatial aspects of the lobe and the audio quality of the new sound activity. As used herein, a cost functional and a cost function have the same meaning. In particular, the cost functional for a lobe i may be defined in some embodiments as a function of the coordinates of the new sound activity (LCi), a signal-to-noise ratio for the lobe (SNRi), a gain value for the lobe (Gaini), voice activity detection information related to the new sound activity (VADi), and distances from the coordinates of the existing lobe (distance (LOi)). In other embodiments, the cost functional for a lobe may be a function of other information. The cost functional for a lobe i can be written as Ji(x, y, z) with Cartesian coordinates or Ji(azimuth, elevation, magnitude) with spherical coordinates, for example. Using the cost functional with Cartesian coordinates as exemplary, the cost functional Ji(x, y, z)=f (LCi, distance(LOi), Gaini, SNRi, VADi). Accordingly, the lobe may be moved by evaluating and maximizing the cost functional Ji over a spatial grid of coordinates, such that the movement of the lobe is in the direction of the gradient (i.e., steepest ascent) of the cost functional. The maximum of the cost functional may be the same as the coordinates of the new sound activity received by the lobe auto-focuser 160 at step 302 (i.e., the candidate location), in some situations. In other situations, the maximum of the cost functional may move the lobe to a different position than the coordinates of the new sound activity, when taking into account the other parameters described above.
At step 308, the cost functional for the lobe may be evaluated by the lobe auto-focuser 160 at the coordinates of the new sound activity. The evaluated cost functional may be stored by the lobe auto-focuser 160 in the database 180, in some embodiments. At step 310, the lobe auto-focuser 160 may move the lobe by each of an amount Δx, Δy, Δz in the x, y, and z directions, respectively, from the coordinates of the new sound activity. After each movement, the cost functional may be evaluated by the lobe auto-focuser 160 at each of these locations. For example, the lobe may be moved to a location (x+Δx, y, z) and the cost functional may be evaluated at that location; then moved to a location (x, y+Δy, z) and the cost functional may be evaluated at that location; and then moved to a location (x, y, z+Δz) and the cost functional may be evaluated at that location. The lobe may be moved by the amounts Δx, Δy, Δz in any order at step 310. Each of the evaluated cost functionals at these locations may be stored by the lobe auto-focuser 160 in the database 180, in some embodiments. The evaluations of the cost functional are performed by the lobe auto-focuser 160 at step 310 in order to compute an estimate of partial derivatives and the gradient of the cost functional, as described below. It should be noted that while the description above is with relation to Cartesian coordinates, a similar operation may be performed with spherical coordinates (e.g., Δazimuth, Δelevation, Δmagnitude).
At step 312, the gradient of the cost functional may be calculated by the lobe auto-focuser 160 based on the set of estimates of the partial derivatives. The gradient ∇J may be calculated as follows:
At step 314, the lobe auto-focuser 160 may move the lobe by a predetermined step size μ in the direction of the gradient ∇J calculated at step 312. In particular, the lobe may be moved to a new location: (xi+μgxi, yi+μgyi, zi+μgzi). The cost functional of the lobe at this new location may also be evaluated by the lobe auto-focuser 160 at step 314. This cost functional may be stored by the lobe auto-focuser 160 in the database 180, in some embodiments.
At step 316, the lobe auto-focuser 160 may compare the cost functional of the lobe at the new location (evaluated at step 314) with the cost functional of the lobe at the coordinates of the new sound activity (evaluated at step 308). If the cost functional of the lobe at the new location is less than the cost functional of the lobe at the coordinates of the new sound activity at step 316, then the step size μ at step 314 may be considered as too large, and the process 300 may continue to step 322. At step 322, the step size may be adjusted and the process may return to step 314.
However, if the cost functional of the lobe at the new location is not less than the cost functional of the lobe at the coordinates of the new sound activity at step 316, then the process 300 may continue to step 318. At step 318, the lobe auto-focuser 160 may determine whether the difference between (1) the cost functional of the lobe at the new location (evaluated at step 314) and (2) the cost functional of the lobe at the coordinates of the new sound activity (evaluated at step 308) is close, i.e., whether the absolute value of the difference is within a small quantity ε. If the condition is not satisfied at step 318, then it may be considered that a local maximum of the cost functional has not been reached. The process 300 may proceed to step 324 and the locations of the lobes of the array microphone 100 are not updated.
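The equation itself is not reproduced in this text. One form consistent with the evaluations performed at step 310 and with the gradient components (gxi, gyi, gzi) used at step 314 is a forward-difference estimate, shown here as an assumption rather than as the original expression:

```latex
\nabla J_i = \left( g_{x_i},\; g_{y_i},\; g_{z_i} \right), \qquad
\begin{aligned}
g_{x_i} &\approx \frac{J_i(x + \Delta x,\, y,\, z) - J_i(x,\, y,\, z)}{\Delta x}, \\
g_{y_i} &\approx \frac{J_i(x,\, y + \Delta y,\, z) - J_i(x,\, y,\, z)}{\Delta y}, \\
g_{z_i} &\approx \frac{J_i(x,\, y,\, z + \Delta z) - J_i(x,\, y,\, z)}{\Delta z}.
\end{aligned}
```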
However, if the condition is satisfied at step 318, then it may be considered that a local maximum of the cost functional has been reached and that the lobe has been auto focused, and the process 300 proceeds to step 320. At step 320, the lobe auto-focuser 160 may transmit the coordinates of the new sound activity to the beamformer 170 so that the beamformer 170 can update the location of the lobe to the new coordinates. In addition, the lobe auto-focuser 160 may store the new coordinates of the lobe in the database 180.
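Steps 308 through 322 can be sketched as a single gradient-ascent iteration with step-size adjustment. The sketch assumes a forward-difference gradient estimate, assumes that the step size is halved at step 322 (the text only says it is adjusted), and bounds the number of adjustments with a hypothetical `max_tries`. The cost functional `J` is supplied by the caller, since its exact form is not specified.

```python
def estimate_gradient(J, p, delta=1e-3):
    """Forward-difference estimate of the gradient of J at point p (steps 310-312)."""
    j0 = J(p)
    gradient = []
    for k in range(len(p)):
        q = list(p)
        q[k] += delta  # move by delta along one axis only
        gradient.append((J(tuple(q)) - j0) / delta)
    return tuple(gradient)

def focus_step(J, p, mu=0.5, eps=1e-3, shrink=0.5, max_tries=20):
    """Return (location, converged) after one ascent step from p."""
    j0 = J(p)                                   # step 308
    g = estimate_gradient(J, p)                 # steps 310 and 312
    for _ in range(max_tries):
        q = tuple(pi + mu * gi for pi, gi in zip(p, g))  # step 314
        j1 = J(q)
        if j1 < j0:                             # step 316: step size too large
            mu *= shrink                        # step 322 (assumed halving)
            continue
        converged = abs(j1 - j0) < eps          # step 318
        return q, converged
    # No acceptable step was found within max_tries; leave the lobe in place.
    return tuple(p), False
```

When `converged` is true, the caller would proceed as in step 320; otherwise the lobe locations would be left unchanged as in step 324.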
In some embodiments, annealing/dithering movements of the lobe may be applied by the lobe auto-focuser 160 at step 320. The annealing/dithering movements may be applied to nudge the lobe out of a local maximum of the cost functional to attempt to find a better local maximum (and therefore a better location for the lobe). The annealing/dithering locations may be defined by (xi+rxi,yi+ryi,zi+rzi), where (rxi, ryi, rzi) are small random values.
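A dithering location of the form given above can be generated as in the following sketch. Drawing the random offsets uniformly from a small symmetric interval is an assumption, as is the `scale` value; the text only requires that the offsets be small random values.

```python
import random

def dither(coords, scale=0.05, rng=random):
    """Return coords perturbed by small uniform random offsets in [-scale, scale]."""
    # rng may be the random module or a random.Random instance for reproducibility.
    return tuple(c + rng.uniform(-scale, scale) for c in coords)
```

The cost functional would then be evaluated at the dithered location, and the lobe kept there only if the value improves.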
The process 300 may be continuously performed by the array microphone 100 as the audio activity localizer 150 finds new sound activity and provides the coordinates and confidence score of the new sound activity to the lobe auto-focuser 160. For example, the process 300 may be performed as audio sources, e.g., human speakers, are moving around a conference room so that one or more lobes can be focused on the audio sources to optimally pick up their sound.
In embodiments, the cost functional may be re-evaluated and updated, e.g., steps 308-318 and 322, and the coordinates of the lobe may be adjusted without needing to receive a set of coordinates of new sound activity, e.g., at step 302. For example, an algorithm may detect which lobe of the array microphone 100 has the most sound activity without providing a set of coordinates of new sound activity. Based on the sound activity information from such an algorithm, the cost functional may be re-evaluated and updated.
An embodiment of a process 500 for automatic placement or deployment of beamformed lobes of the array microphone 400 is shown in
At step 502, the coordinates corresponding to new sound activity may be received at the lobe auto-placer 460 from the audio activity localizer 450. The audio activity localizer 450 may continuously scan the environment of the array microphone 400 to find new sound activity. The new sound activity found by the audio activity localizer 450 may include suitable audio sources, e.g., human speakers, that are not stationary. The coordinates of the new sound activity may be a particular three dimensional coordinate relative to the location of the array microphone 400, such as in Cartesian coordinates (i.e., x, y, z), or in spherical coordinates (i.e., radial distance/magnitude r, elevation angle θ (theta), azimuthal angle φ (phi)).
In embodiments, the placement of beamformed lobes may occur based on whether an amount of activity of the new sound activity exceeds a predetermined threshold.
The activity detector 1904 may detect an amount of activity in the new sound activity. In some embodiments, the amount of activity may be measured as the energy level of the new sound activity. In other embodiments, the amount of activity may be measured using methods in the time domain and/or frequency domain, such as by applying machine learning (e.g., using cepstrum coefficients), measuring signal non-stationarity in one or more frequency bands, and/or searching for features of desirable sound or speech.
In embodiments, the activity detector 1904 may be a voice activity detector (VAD) which can determine whether there is voice and/or noise present in the remote audio signal. A VAD may be implemented, for example, by analyzing the spectral variance of the remote audio signal, using linear predictive coding, applying machine learning or deep learning techniques to detect voice and/or noise, and/or using well-known techniques such as the ITU G.729 VAD, ETSI standards for VAD calculation included in the GSM specification, or long term pitch prediction.
Based on the detected amount of activity, automatic lobe placement may be performed or not performed. The automatic lobe placement may be performed when the detected activity of the new sound activity satisfies predetermined criteria. Conversely, the automatic lobe placement may not be performed when the detected activity of the new sound activity does not satisfy the predetermined criteria. For example, satisfying the predetermined criteria may indicate that the new sound activity includes voice, speech, or other sound that should preferably be picked up by a lobe. As another example, not satisfying the predetermined criteria may indicate that the new sound activity does not include voice, speech, or other sound that should preferably be picked up by a lobe. By inhibiting automatic lobe placement in this latter scenario, no lobe is placed, which avoids picking up sound from the new sound activity.
As seen in the process 2000 of
If the amount of activity does not satisfy the predetermined criteria at step 2003, then the process 2000 may end at step 522 and the locations of the lobes of the array microphone 1900 are not updated. The detected amount of activity of the new sound activity may not satisfy the predetermined criteria when there is a relatively low amount of speech or voice in the new sound activity, and/or the voice-to-noise ratio is relatively low. Similarly, the detected amount of activity of the new sound activity may not satisfy the predetermined criteria when there is a relatively high amount of noise in the new sound activity. Accordingly, not automatically placing a lobe to detect the new sound activity may help to ensure that undesirable sound is not picked up.
If the amount of activity satisfies the predetermined criteria at step 2003, then the process 2000 may continue to step 504 as described below. The detected amount of activity of the new sound activity may satisfy the predetermined criteria when there is a relatively high amount of speech or voice in the new sound activity, and/or the voice-to-noise ratio is relatively high. Similarly, the detected amount of activity of the new sound activity may satisfy the predetermined criteria when there is a relatively low amount of noise in the new sound activity. Accordingly, automatically placing a lobe to detect the new sound activity may be desirable in this scenario.
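As a minimal sketch of the energy-based variant of this check, the following Python measures the level of a block of samples and compares it with a threshold. The function names, the use of RMS level in dB relative to a full-scale value of 1.0, and the default threshold of -40 dB are illustrative assumptions and are not taken from this disclosure.

```python
import math


def rms_level_db(samples):
    """Return the RMS level of a block of samples in dB relative to 1.0.

    Assumption: samples are floats normalized to the range [-1.0, 1.0].
    An empty or all-zero block is reported as negative infinity.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def satisfies_activity_criteria(samples, threshold_db=-40.0):
    """Hypothetical stand-in for the check at step 2003.

    Returns True when the measured energy exceeds the (assumed) threshold,
    in which case lobe placement would continue to step 504.
    """
    return rms_level_db(samples) > threshold_db
```

A voice activity detector or a voice-to-noise estimate could replace the energy measure without changing the structure of the decision.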
Returning to the process 500, at step 504, the lobe auto-placer 460 may update a timestamp, such as to the current value of a clock. The timestamp may be stored in the database 480, in some embodiments. In embodiments, the timestamp and/or the clock may be real time values, e.g., hour, minute, second, etc. In other embodiments, the timestamp and/or the clock may be based on increasing integer values that may enable tracking of the time ordering of events.
The lobe auto-placer 460 may determine at step 506 whether the coordinates of the new sound activity are nearby (i.e., in the vicinity of) an existing active lobe. Whether the new sound activity is nearby an existing lobe may be based on the difference in azimuth and/or elevation angles of (1) the coordinates of the new sound activity and (2) the coordinates of the existing lobe, relative to a predetermined threshold. The distance of the new sound activity away from the microphone 400 may also influence the determination of whether the coordinates of the new sound activity are nearby an existing lobe. The lobe auto-placer 460 may retrieve the coordinates of the existing lobe from the database 480 for use in step 506, in some embodiments. An embodiment of the determination of whether the coordinates of the new sound activity are nearby an existing lobe is described in more detail below with respect to
If at step 506 the lobe auto-placer 460 determines that the coordinates of the new sound activity are nearby an existing lobe, then the process 500 continues to step 520. At step 520, the timestamp of the existing lobe is updated to the current timestamp from step 504. In this scenario, the existing lobe is considered able to cover (i.e., pick up) the new sound activity. The process 500 may end at step 522 and the locations of the lobes of the array microphone 400 are not updated.
However, if at step 506 the lobe auto-placer 460 determines that the coordinates of the new sound activity are not nearby an existing lobe, then the process 500 continues to step 508. In this scenario, the coordinates of the new sound activity may be considered to be outside the current coverage area of the array microphone 400, and therefore the new sound activity needs to be covered. At step 508, the lobe auto-placer 460 may determine whether an inactive lobe of the array microphone 400 is available. In some embodiments, a lobe may be considered inactive if the lobe is not pointed to a particular set of coordinates, or if the lobe is not deployed (i.e., does not exist). In other embodiments, a deployed lobe may be considered inactive based on whether a metric of the deployed lobe (e.g., time, age, etc.) satisfies certain criteria. If the lobe auto-placer 460 determines that there is an inactive lobe available at step 508, then the inactive lobe is selected at step 510 and the timestamp of the newly selected lobe is updated to the current timestamp (from step 504) at step 514.
However, if the lobe auto-placer 460 determines that there is not an inactive lobe available at step 508, then the process 500 may continue to step 512. At step 512, the lobe auto-placer 460 may select a currently active lobe to recycle to be pointed at the coordinates of the new sound activity. In some embodiments, the lobe selected for recycling may be an active lobe with the lowest confidence score and/or the oldest timestamp. The confidence score for a lobe may denote the certainty of the coordinates and/or the quality of the sound activity, for example. In embodiments, other suitable metrics related to the lobe may be utilized. The oldest timestamp for an active lobe may indicate that the lobe has not recently detected sound activity, and possibly that the audio source is no longer present in the lobe. The lobe selected for recycling at step 512 may have its timestamp updated to the current timestamp (from step 504) at step 514.
At step 516, a new confidence score may be assigned to the lobe, whether the lobe is a selected inactive lobe from step 510 or a selected recycled lobe from step 512. At step 518, the lobe auto-placer 460 may transmit the coordinates of the new sound activity to the beamformer 470 so that the beamformer 470 can update the location of the lobe to the new coordinates. In addition, the lobe auto-placer 460 may store the new coordinates of the lobe in the database 480.
The process 500 may be continuously performed by the array microphone 400 as the audio activity localizer 450 finds new sound activity and provides the coordinates of the new sound activity to the lobe auto-placer 460. For example, the process 500 may be performed as audio sources, e.g., human speakers, are moving around a conference room so that one or more lobes can be placed to optimally pick up the sound of the audio sources.
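The decision flow of steps 504 through 518 can be sketched as follows. This is only an illustration under stated assumptions: the `Lobe` class, the `place_lobe` function, the integer clock, the caller-supplied `is_nearby` predicate, and the choice to recycle the lobe with the oldest timestamp (ties broken by lowest confidence score) are hypothetical and are not the only selection rule the description permits. The list of lobes is assumed to be non-empty.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Coord = Tuple[float, float, float]


@dataclass
class Lobe:
    coords: Optional[Coord] = None  # None marks an inactive (undeployed) lobe
    timestamp: int = 0
    confidence: float = 0.0


def place_lobe(lobes, new_coords, confidence, clock, is_nearby):
    """Return (lobe, moved) for one new-sound-activity event."""
    # Steps 506 and 520: an existing active lobe already covers the activity.
    for lobe in lobes:
        if lobe.coords is not None and is_nearby(lobe.coords, new_coords):
            lobe.timestamp = clock
            return lobe, False
    # Steps 508 and 510: prefer an inactive lobe when one is available.
    inactive = [lobe for lobe in lobes if lobe.coords is None]
    if inactive:
        chosen = inactive[0]
    else:
        # Step 512: recycle the oldest lobe (assumed tie-break: lowest confidence).
        chosen = min(lobes, key=lambda l: (l.timestamp, l.confidence))
    # Steps 514 to 518: refresh timestamp and confidence, then repoint the lobe.
    chosen.timestamp = clock
    chosen.confidence = confidence
    chosen.coords = new_coords
    return chosen, True
```

In a fuller implementation the returned coordinates would be handed to the beamformer and persisted, which this sketch leaves to the caller.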
An embodiment of a process 600 for finding previously placed lobes near sound activity is shown in
At step 602, the coordinates corresponding to new sound activity may be received at the lobe auto-focuser 160 or the lobe auto-placer 460 from the audio activity localizer 150, 450, respectively. The coordinates of the new sound activity may be a particular three dimensional coordinate relative to the location of the array microphone 100, 400, such as in Cartesian coordinates (i.e., x, y, z), or in spherical coordinates (i.e., radial distance/magnitude r, elevation angle θ (theta), azimuthal angle φ (phi)). It should be noted that Cartesian coordinates may be readily converted to spherical coordinates, and vice versa, as needed.
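The conversion between the two coordinate systems can be sketched as below. The angle convention is an assumption made for illustration: elevation θ is measured up from the x-y plane and azimuth φ is measured in the x-y plane from the positive x axis, both in radians. The disclosure does not specify which convention is used, and the function names are hypothetical.

```python
import math


def cartesian_to_spherical(x, y, z):
    """Return (r, elevation, azimuth) in radians for a Cartesian point."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0
    # Clamp guards against tiny floating-point overshoot outside [-1, 1].
    elevation = math.asin(max(-1.0, min(1.0, z / r)))
    azimuth = math.atan2(y, x)
    return r, elevation, azimuth


def spherical_to_cartesian(r, elevation, azimuth):
    """Inverse of cartesian_to_spherical under the same assumed convention."""
    x = r * math.cos(elevation) * math.cos(azimuth)
    y = r * math.cos(elevation) * math.sin(azimuth)
    z = r * math.sin(elevation)
    return x, y, z
```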
At step 604, the lobe auto-focuser 160 or the lobe auto-placer 460 may determine whether the new sound activity is relatively far away from the array microphone 100, 400 by evaluating whether the distance of the new sound activity is greater than a determined threshold. The distance of the new sound activity may be determined by the magnitude of the vector representing the coordinates of the new sound activity. If the new sound activity is determined to be relatively far away from the array microphone 100, 400 at step 604 (i.e., greater than the threshold), then at step 606 a lower azimuth threshold may be set for later usage in the process 600. If the new sound activity is determined to not be relatively far away from the array microphone 100, 400 at step 604 (i.e., less than or equal to the threshold), then at step 608 a higher azimuth threshold may be set for later usage in the process 600.
Following the setting of the azimuth threshold at step 606 or step 608, the process 600 may continue to step 610. At step 610, the lobe auto-focuser 160 or the lobe auto-placer 460 may determine whether there are any lobes to check for their vicinity to the new sound activity. If there are no lobes of the array microphone 100, 400 to check at step 610, then the process 600 may end at step 616 and denote that there are no lobes in the vicinity of the new sound activity.
However, if there are lobes of the array microphone 100, 400 to check at step 610, then the process 600 may continue to step 612 and examine one of the existing lobes. At step 612, the lobe auto-focuser 160 or the lobe auto-placer 460 may determine whether the absolute value of the difference between (1) the azimuth of the existing lobe and (2) the azimuth of the new sound activity is greater than the azimuth threshold (that was set at step 606 or step 608). If the condition is satisfied at step 612, then it may be considered that the lobe under examination is not within the vicinity of the new sound activity. The process 600 may return to step 610 to determine whether there are further lobes to examine.
However, if the condition is not satisfied at step 612, then the process 600 may proceed to step 614. At step 614, the lobe auto-focuser 160 or the lobe auto-placer 460 may determine whether the absolute value of the difference between (1) the elevation of the existing lobe and (2) the elevation of the new sound activity is greater than a predetermined elevation threshold. If the condition is satisfied at step 614, then it may be considered that the lobe under examination is not within the vicinity of the new sound activity. The process 600 may return to step 610 to determine whether there are further lobes to examine. However, if the condition is not satisfied at step 614, then the process 600 may end at step 618 and denote that the lobe under examination is in the vicinity of the new sound activity.
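The vicinity test of steps 604 through 618 can be sketched as follows. The function names and all numeric defaults are hypothetical. Coordinates are assumed to be (distance, elevation, azimuth) tuples with angles in degrees, and the angular difference is wrapped so that 350° and 0° are treated as 10° apart, which is an added assumption beyond the plain absolute difference described above.

```python
def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles, in [0, 180] degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def find_nearby_lobe(activity, lobes, distance_threshold=3.0,
                     near_azimuth_threshold=30.0, far_azimuth_threshold=15.0,
                     elevation_threshold=20.0):
    """Return the index of the first lobe near the activity, or None."""
    r, elevation, azimuth = activity
    # Steps 604 to 608: distant activity gets the lower azimuth threshold.
    if r > distance_threshold:
        azimuth_threshold = far_azimuth_threshold
    else:
        azimuth_threshold = near_azimuth_threshold
    # Steps 610 to 614: examine each existing lobe in turn.
    for index, (_, lobe_elevation, lobe_azimuth) in enumerate(lobes):
        if angle_diff_deg(lobe_azimuth, azimuth) > azimuth_threshold:
            continue  # step 612: too far apart in azimuth
        if angle_diff_deg(lobe_elevation, elevation) > elevation_threshold:
            continue  # step 614: too far apart in elevation
        return index  # step 618: this lobe is in the vicinity
    return None  # step 616: no lobe is in the vicinity
```

Using a tighter azimuth threshold for distant activity reflects that a given angular error spans a larger physical distance farther from the array.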
At least two sets of coordinates may be associated with each lobe of the array microphone 700: (1) original or initial coordinates $\vec{LO_i}$ (e.g., that are configured automatically or manually at the time of set up of the array microphone 700), and (2) current coordinates $\vec{LC_i}$ where a lobe is currently pointing at a given time. The sets of coordinates may indicate the position of the center of a lobe, in some embodiments. The sets of coordinates may be stored in the database 180, in some embodiments.
In addition, each lobe of the array microphone 700 may be associated with a lobe region of three-dimensional space around it. In embodiments, a lobe region may be defined as the set of points in space that are closer to the initial coordinates $\vec{LO_i}$ of a lobe than to the coordinates of any other lobe of the array microphone. In other words, if p is defined as a point in space, then the point p may belong to a particular lobe region $LR_i$ if the distance D between the point p and the center of lobe i ($\vec{LO_i}$) is smaller than for any other lobe, as in the following: $p \in LR_i \iff i = \arg\min_{1 \le k \le N} D(p, \vec{LO_k})$, where N is the number of lobes.
Regions that are defined in this fashion are known as Voronoi regions or Voronoi cells. For example, it can be seen in
In embodiments, the lobe regions may be calculated and/or updated based on sensing the environment (e.g., objects, walls, persons, etc.) that the array microphone 700 is situated in using infrared sensors, visual sensors, and/or other suitable sensors. For example, information from a sensor may be used by the array microphone 700 to set the approximate boundaries for lobe regions, which in turn can be used to place the associated lobes. In further embodiments, the lobe regions may be calculated and/or updated based on a user defining the lobe regions, such as through a graphical user interface of the array microphone 700.
As further shown in
Another parameter is a move radius of a lobe that is a maximum distance in space that the lobe is allowed to move. The move radius of a lobe is generally less than the look radius of the lobe, and may be set to prevent the lobe from moving too far away from the array microphone or too far away from the initial coordinates $\vec{LO_i}$ of the lobe. For example, in
A further parameter is a boundary cushion of a lobe that is a maximum distance in space that the lobe is allowed to move towards a neighboring lobe region and toward the boundary between the lobe regions. For example, in
An embodiment of a process 800 for automatic focusing of previously placed beamformed lobes of the array microphone 700 within associated lobe regions is shown in
Step 802 of the process 800 for the lobe auto-focuser 160 may be substantially the same as step 202 of the process 200 of
At step 806, the lobe auto-focuser 160 may identify the lobe region that the new sound activity is within, i.e., the lobe region to which the new sound activity belongs. In embodiments, the lobe auto-focuser 160 may find the lobe closest to the coordinates of the new sound activity in order to identify the lobe region at step 806. For example, the lobe region may be identified by finding the initial coordinates $\vec{LO_i}$ of a lobe that are closest to the new sound activity, such as by finding an index i of a lobe such that the distance between the coordinates of the new sound activity and the initial coordinates $\vec{LO_i}$ of a lobe is minimized: $i = \arg\min_{1 \le k \le N} D(\vec{s}, \vec{LO_k})$, where $\vec{s}$ denotes the coordinates of the new sound activity and N is the number of lobes.
The lobe and its associated lobe region that contain the new sound activity may be determined as the lobe and lobe region identified at step 806.
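The nearest-lobe lookup of step 806 can be sketched in a few lines. The function name is hypothetical, Euclidean distance is assumed for the distance measure, and the list of initial coordinates is assumed to be non-empty.

```python
import math


def identify_lobe_region(activity, initial_coords):
    """Return the index of the lobe whose initial coordinates are closest.

    Because lobe regions are Voronoi cells around the initial coordinates,
    the closest lobe is also the lobe whose region contains the activity.
    """
    return min(range(len(initial_coords)),
               key=lambda i: math.dist(activity, initial_coords[i]))
```

For example, with lobes initially at (0, 0, 0), (4, 0, 0), and (0, 4, 0), sound activity at (3, 1, 0) falls in the region of the second lobe (index 1).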
After the lobe region has been identified at step 806, the lobe auto-focuser 160 may determine whether the coordinates of the new sound activity are outside a look radius of the lobe at step 808. If the lobe auto-focuser 160 determines that the coordinates of the new sound activity are outside the look radius of the lobe at step 808, then the process 800 may end at step 820 and the locations of the lobes of the array microphone 700 are not updated. In other words, if the new sound activity is outside the look radius of the lobe, then the new sound activity can be ignored and it may be considered that the new sound activity is outside the coverage of the lobe. As an example, point A in
However, if at step 808 the lobe auto-focuser 160 determines that the coordinates of the new sound activity are not outside (i.e., are inside) the look radius of the lobe, then the process 800 may continue to step 810. In this scenario, the lobe may be moved towards the new sound activity contingent on assessing the coordinates of the new sound activity with respect to other parameters such as a move radius and a boundary cushion, as described below. At step 810, the lobe auto-focuser 160 may determine whether the coordinates of the new sound activity are outside a move radius of the lobe. If the lobe auto-focuser 160 determines that the coordinates of the new sound activity are outside the move radius of the lobe at step 810, then the process 800 may continue to step 816 where the movement of the lobe may be limited or restricted. In particular, at step 816, the new coordinates where the lobe may be provisionally moved to can be set to no more than the move radius. The new coordinates may be provisional because the movement of the lobe may still be assessed with respect to the boundary cushion parameter, as described below. In embodiments, the movement of the lobe at step 816 may be restricted based on a scaling factor α (where 0<α≤1), in order to prevent the lobe from moving too far from its initial coordinates $\vec{LO_i}$. As an example, point C in
The process 800 may also continue to step 812 if at step 810 the lobe auto-focuser 160 determines that the coordinates of the new sound activity are not outside (i.e., are inside) the move radius of the lobe. As an example, point B in
The process 800 may also continue to step 814 if at step 812 the lobe auto-focuser 160 determines that the coordinates of the new sound activity are not close to a boundary cushion. At step 814, the lobe auto-focuser 160 may transmit the new coordinates of the lobe to the beamformer 170 so that the beamformer 170 can update the location of the existing lobe to the new coordinates. In embodiments, the new coordinates $\vec{LC_i}$ of the lobe may be defined as $\vec{LC_i} = \vec{LO_i} + \min(\alpha, \beta)\,\vec{M} = \vec{LO_i} + \vec{M_r}$, where $\vec{M}$ is a motion vector and $\vec{M_r}$ is a restricted motion vector, as described in more detail below. In embodiments, the lobe auto-focuser 160 may store the new coordinates of the lobe in the database 180.
Depending on the steps of the process 800 described above, when a lobe is moved due to the detection of new sound activity, the new coordinates of the lobe may be: (1) the coordinates of the new sound activity, if the coordinates of the new sound activity are within the look radius of the lobe, within the move radius of the lobe, and not close to the boundary cushion of the associated lobe region; (2) a point in the direction of the motion vector towards the new sound activity and limited to the range of the move radius, if the coordinates of the new sound activity are within the look radius of the lobe, outside the move radius of the lobe, and not close to the boundary cushion of the associated lobe region; or (3) just outside the boundary cushion, if the coordinates of the new sound activity are within the look radius of the lobe and close to the boundary cushion.
The process 800 may be continuously performed by the array microphone 700 as the audio activity localizer 150 finds new sound activity and provides the coordinates and confidence score of the new sound activity to the lobe auto-focuser 160. For example, the process 800 may be performed as audio sources, e.g., human speakers, are moving around a conference room so that one or more lobes can be focused on the audio sources to optimally pick up their sound.
An embodiment of a process 900 for determining whether the coordinates of new sound activity are outside the look radius of a lobe is shown in
After computing the motion vector $\vec{M}$ at step 902, the process 900 may continue to step 904. At step 904, the lobe auto-focuser 160 may determine whether the magnitude of the motion vector is greater than the look radius for the lobe, as in the following: $|\vec{M}| = \sqrt{(m_x)^2 + (m_y)^2 + (m_z)^2} > (LookRadius)_i$. If the magnitude of the motion vector $\vec{M}$ is greater than the look radius for the lobe at step 904, then at step 906, the coordinates of the new sound activity may be denoted as outside the look radius for the lobe. For example, as shown in
An embodiment of a process 1100 for limiting the movement of a lobe to within its move radius is shown in
After computing the motion vector $\vec{M}$ at step 1102, the process 1100 may continue to step 1104. At step 1104, the lobe auto-focuser 160 may determine whether the magnitude of the motion vector $\vec{M}$ is less than or equal to the move radius for the lobe, as in the following: $|\vec{M}| \le (MoveRadius)_i$. If the magnitude of the motion vector $\vec{M}$ is less than or equal to the move radius at step 1104, then at step 1106, the new coordinates of the lobe may be provisionally moved to the coordinates of the new sound activity. For example, as shown in
However, if the magnitude of the motion vector $\vec{M}$ is greater than the move radius at step 1104, then at step 1108, the magnitude of the motion vector $\vec{M}$ may be scaled by a scaling factor α to the maximum value of the move radius while keeping the same direction, as in the following: $\vec{M} \leftarrow \alpha\,\vec{M}$,
where the scaling factor α may be defined as: $\alpha = \begin{cases} (MoveRadius)_i / |\vec{M}|, & |\vec{M}| > (MoveRadius)_i \\ 1, & \text{otherwise.} \end{cases}$
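The look radius test of the process 900 and the move radius limit of the process 1100 can be sketched together as follows. The function name and the convention of returning `None` for activity outside the look radius are assumptions made for illustration. The motion vector is taken to be the difference between the coordinates of the new sound activity and the initial coordinates of the lobe.

```python
import math


def provisional_coords(initial, activity, look_radius, move_radius):
    """Return provisional lobe coordinates, or None if the activity is ignored."""
    # Motion vector M from the initial lobe coordinates to the new activity.
    m = tuple(a - o for a, o in zip(activity, initial))
    magnitude = math.sqrt(sum(c * c for c in m))
    if magnitude > look_radius:
        return None  # outside the look radius: the lobe is not moved
    if magnitude <= move_radius:
        return tuple(activity)  # inside the move radius: move all the way
    # Outside the move radius: scale M down to the move radius (factor alpha).
    alpha = move_radius / magnitude
    return tuple(o + alpha * c for o, c in zip(initial, m))
```

The result is provisional because a boundary cushion check may shorten the movement further.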
Based on the above, moving from the original coordinates $\vec{LO_i}$ of lobe i in the direction of the vector $\vec{D_{ij}}$ but restricting the amount of movement based on a value A (where 0<A<1), i.e., moving to the point $\vec{LO_i} + A\,\vec{D_{ij}}/2$,
will be within (100*A)% of the boundary between the lobe regions. For example, if A is 0.8 (i.e., 80%), then the new coordinates of a moved lobe would be within 80% of the distance to the boundary between lobe regions. Therefore, the value A can be utilized to create the boundary cushion between two adjacent lobe regions. In general, a larger boundary cushion can prevent a lobe from moving into another lobe region, while a smaller boundary cushion can allow a lobe to move closer to another lobe region.
In addition, it should be noted that if a lobe i is moved in a direction towards a lobe j due to the detection of new sound activity (e.g., in the direction of a motion vector $\vec{M}$ as described above), there is a component of movement in the direction of the lobe j, i.e., in the direction of the vector $\vec{D_{ij}}$. In order to find the component of movement in the direction of the vector $\vec{D_{ij}}$, the motion vector $\vec{M}$ can be projected onto the unit vector $\vec{Du_{ij}} = \vec{D_{ij}} / |\vec{D_{ij}}|$ (which has the same direction as the vector $\vec{D_{ij}}$ with unity magnitude) to compute a projected vector $\vec{PM_{ij}}$. As an example,
An embodiment of a process 1400 for creating a boundary cushion of a lobe region using vector projections is shown in
Prior to performing the process 1400, a vector $\vec{D_{ij}}$ and unit vectors $\vec{Du_{ij}} = \vec{D_{ij}} / |\vec{D_{ij}}|$ can be computed for all pairs of active lobes. As described previously, the vectors $\vec{D_{ij}}$ may connect the original coordinates of lobes i and j. The parameter $A_i$ (where $0 < A_i < 1$) may be determined for all active lobes, which characterizes the size of the boundary cushion for each lobe region. As described previously, prior to the process 1400 being performed (i.e., prior to step 818 of the process 800), the lobe region of new sound activity may be identified (i.e., at step 806) and a motion vector may be computed (i.e., using the process 1100/step 810).
At step 1402 of the process 1400, the projected vector $\vec{PM_{ij}}$ may be computed for all lobes that are not associated with the lobe region identified for the new sound activity. The magnitude of a projected vector $\vec{PM_{ij}}$ (as described above with respect to
When $PM_{ij} < 0$, the motion vector $\vec{M}$ has a component in the opposite direction of the vector $\vec{D_{ij}}$. This means that movement of a lobe i would be in the direction opposite of the boundary with a lobe j. In this scenario, the boundary cushion between lobes i and j is not a concern because the movement of the lobe i would be away from the boundary with lobe j. However, when $PM_{ij} > 0$, the motion vector $\vec{M}$ has a component in the same direction as the direction of the vector $\vec{D_{ij}}$. This means that movement of a lobe i would be in the same direction as the boundary with lobe j. In this scenario, movement of the lobe i can be limited to outside the boundary cushion so that $PM_{ij} \le A_i\,|\vec{D_{ij}}|/2$,
where $A_i$ (with $0 < A_i < 1$) is a parameter that characterizes the boundary cushion for a lobe region associated with lobe i.
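A scaling factor β may be utilized to ensure that $\beta_j\,PM_{ij} \le A_i\,|\vec{D_{ij}}|/2$.
The scaling factor β may be used to scale the motion vector $\vec{M}$ and be defined as $\beta_j = \begin{cases} \dfrac{A_i\,|\vec{D_{ij}}|/2}{PM_{ij}}, & PM_{ij} > A_i\,|\vec{D_{ij}}|/2 \\ 1, & \text{otherwise.} \end{cases}$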
Accordingly, if new sound activity is detected that is outside the boundary cushion of a lobe region, then the scaling factor β may be equal to 1, which indicates that there is no scaling of the motion vector $\vec{M}$. At step 1404, the scaling factor β may be computed for all the lobes that are not associated with the lobe region identified for the new sound activity.
At step 1406, the minimum scaling factor β can be determined that corresponds to the boundary cushion of the nearest lobe regions, as in the following: $\beta = \min_{j \ne i} \beta_j$.
After the minimum scaling factor β has been determined at step 1406, then at step 1408, the minimum scaling factor β may be applied to the motion vector $\vec{M}$ to determine a restricted motion vector $\vec{M_r} = \min(\alpha, \beta)\,\vec{M}$.
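The boundary cushion computation of the process 1400 can be sketched as below. This illustration assumes that the boundary between two lobe regions lies halfway along the vector connecting the initial coordinates of the two lobes (a property of Voronoi regions), so that the cushion limit for lobe j is the cushion parameter times half the distance between the lobes. The function name, the argument layout, and passing α in as a precomputed value are hypothetical choices.

```python
import math


def restricted_motion(initial_coords, i, motion, alpha, cushion):
    """Return the restricted motion vector min(alpha, beta) * M for lobe i.

    initial_coords: initial coordinates of all active lobes
    motion:         motion vector M from lobe i toward the new sound activity
    alpha:          move-radius scaling factor, with 0 < alpha <= 1
    cushion:        boundary cushion parameter A_i, with 0 < A_i < 1
    """
    beta = 1.0
    for j, other in enumerate(initial_coords):
        if j == i:
            continue
        # D_ij connects the initial coordinates of lobe i to those of lobe j.
        d = tuple(b - a for a, b in zip(initial_coords[i], other))
        d_len = math.sqrt(sum(c * c for c in d))
        if d_len == 0.0:
            continue  # coincident lobes define no boundary direction
        # Signed length of the projection of M onto the unit vector of D_ij.
        pm = sum(m * c for m, c in zip(motion, d)) / d_len
        limit = cushion * d_len / 2.0
        if pm > limit:
            beta = min(beta, limit / pm)  # keep the most restrictive factor
    scale = min(alpha, beta)
    return tuple(scale * m for m in motion)
```

Adding the returned vector to the initial coordinates of lobe i gives the new lobe coordinates. A negative or small projection leaves β at 1, so movement away from a neighboring region is never shortened by that neighbor.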
For example,
The projected vector $\vec{PM_{34}}$ depicted in
while the scaling factor β2 (for lobe 2) is less than 1 because the new sound activity S is inside the boundary cushion between lobe region 2 and lobe region 3
Accordingly, the minimum scaling factor β2 may be utilized to ensure that lobe 3 moves to the coordinate Sr.
The array microphone 1700 of
The transducer 1602, 1702 may be utilized to play the sound of the remote audio signal in the local environment where the array microphone 1600, 1700 is located. The activity detector 1604, 1704 may detect an amount of activity in the remote audio signal. In some embodiments, the amount of activity may be measured as the energy level of the remote audio signal. In other embodiments, the amount of activity may be measured using methods in the time domain and/or frequency domain, such as by applying machine learning (e.g., using cepstrum coefficients), measuring signal non-stationarity in one or more frequency bands, and/or searching for features of desirable sound or speech.
In embodiments, the activity detector 1604, 1704 may be a voice activity detector (VAD) which can determine whether there is voice present in the remote audio signal. A VAD may be implemented, for example, by analyzing the spectral variance of the remote audio signal, using linear predictive coding, applying machine learning or deep learning techniques to detect voice, and/or using well-known techniques such as the ITU G.729 VAD, ETSI standards for VAD calculation included in the GSM specification, or long term pitch prediction.
Based on the detected amount of activity, automatic lobe adjustment may be performed or inhibited. Automatic lobe adjustment may include, for example, auto focusing of lobes, auto focusing of lobes within regions, and/or auto placement of lobes, as described herein. The automatic lobe adjustment may be performed when the detected activity of the remote audio signal does not exceed a predetermined threshold. Conversely, the automatic lobe adjustment may be inhibited (i.e., not be performed) when the detected activity of the remote audio signal exceeds the predetermined threshold. For example, exceeding the predetermined threshold may indicate that the remote audio signal includes voice, speech, or other sound that preferably should not be picked up by a lobe. By inhibiting automatic lobe adjustment in this scenario, no lobe is focused or placed, which avoids picking up sound from the remote audio signal.
In some embodiments, the activity detector 1604, 1704 may determine whether the detected amount of activity of the remote audio signal exceeds the predetermined threshold. When the detected amount of activity does not exceed the predetermined threshold, the activity detector 1604, 1704 may transmit an enable signal to the lobe auto-focuser 160 or the lobe auto-placer 460, respectively, to allow lobes to be adjusted. Additionally or alternatively, when the detected amount of activity of the remote audio signal exceeds the predetermined threshold, the activity detector 1604, 1704 may transmit a pause signal to the lobe auto-focuser 160 or the lobe auto-placer 460, respectively, to stop lobes from being adjusted.
In other embodiments, the activity detector 1604, 1704 may transmit the detected amount of activity of the remote audio signal to the lobe auto-focuser 160 or to the lobe auto-placer 460, respectively. The lobe auto-focuser 160 or the lobe auto-placer 460 may determine whether the detected amount of activity exceeds the predetermined threshold. Based on whether the detected amount of activity exceeds the predetermined threshold, the lobe auto-focuser 160 or lobe auto-placer 460 may execute or pause the adjustment of lobes.
The various components included in the array microphone 1600, 1700 may be implemented using software executable by one or more servers or computers, such as a computing device with a processor and memory, graphics processing units (GPUs), and/or by hardware (e.g., discrete logic circuits, application specific integrated circuits (ASIC), programmable gate arrays (PGA), field programmable gate arrays (FPGA), etc.).
An embodiment of a process 1800 for inhibiting automatic adjustment of beamformed lobes of an array microphone based on a remote far end audio signal is shown in
At step 1802, a remote audio signal may be received at the array microphone 1600, 1700. The remote audio signal may be from a far end (e.g., a remote location), and may include sound from the far end (e.g., speech, voice, noise, etc.). The remote audio signal may be output on a transducer 1602, 1702 at step 1804, such as a loudspeaker in the local environment. Accordingly, the sound from the far end may be played in the local environment, such as during a conference call so that the local participants can hear the remote participants.
The remote audio signal may be received by an activity detector 1604, 1704, which may detect an amount of activity of the remote audio signal at step 1806. The detected amount of activity may correspond to the amount of speech, voice, noise, etc. in the remote audio signal. In embodiments, the amount of activity may be measured as the energy level of the remote audio signal. At step 1808, if the detected amount of activity of the remote audio signal does not exceed a predetermined threshold, then the process 1800 may continue to step 1810. The detected amount of activity of the remote audio signal not exceeding the predetermined threshold may indicate that there is a relatively low amount of speech, voice, noise, etc. in the remote audio signal. In embodiments, the detected amount of activity may specifically indicate the amount of voice or speech in the remote audio signal. At step 1810, lobe adjustments may be performed. Step 1810 may include, for example, the processes 200 and 300 for automatic focusing of beamformed lobes, the process 500 for automatic placement of beamformed lobes, and/or the process 800 for automatic focusing of beamformed lobes within lobe regions, as described herein. Lobe adjustments may be performed in this scenario because, even though lobes may be focused or placed, there is a lower likelihood that such a lobe will pick up undesirable sound from the remote audio signal that is being output in the local environment. After step 1810, the process 1800 may return to step 1802.
However, if at step 1808 the detected amount of activity of the remote audio signal exceeds the predetermined threshold, then the process 1800 may continue to step 1812. At step 1812, no lobe adjustment may be performed, i.e., lobe adjustment may be inhibited. The detected amount of activity of the remote audio signal exceeding the predetermined threshold may indicate that there is a relatively high amount of speech, voice, noise, etc. in the remote audio signal. Inhibiting lobe adjustments from occurring in this scenario may help to ensure that a lobe is not focused or placed to pick up sound from the remote audio signal that is being output in the local environment. In some embodiments, the process 1800 may return to step 1802 after step 1812. In other embodiments, the process 1800 may wait for a certain time duration at step 1812 before returning to step 1802. Waiting for a certain time duration may allow reverberations in the local environment (e.g., caused by playing the sound of the remote audio signal) to dissipate.
The process 1800 may be continuously performed by the array microphones 1600, 1700 as the remote audio signal from the far end is received. For example, the remote audio signal may include a low amount of activity (e.g., no speech or voice) that does not exceed the predetermined threshold. In this situation, lobe adjustments may be performed. As another example, the remote audio signal may include a high amount of activity (e.g., speech or voice) that exceeds the predetermined threshold. In this situation, the performance of lobe adjustments may be inhibited. Whether lobe adjustments are performed or inhibited may therefore change as the amount of activity of the remote audio signal changes. The process 1800 may result in improved pickup of sound in the local environment by reducing the likelihood that sound from the far end is undesirably picked up.
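The continuous behavior described above, including the optional wait at step 1812, can be sketched as a simple per-frame gate. This is a minimal sketch under stated assumptions: the activity levels are taken as already computed for each frame, the wait is modeled as a count of frames (`hold_frames`) during which adjustment remains inhibited after high activity, and all names and values are hypothetical.

```python
from typing import Iterable, List


def gate_lobe_adjustments(
    activity_levels: Iterable[float],
    threshold: float,
    hold_frames: int = 0,
) -> List[str]:
    """Return "adjust" or "inhibit" for each frame of remote-signal activity.

    hold_frames is an assumed way to model the wait at step 1812: after a
    frame exceeds the threshold, adjustment stays inhibited for this many
    further frames so that reverberation in the room can dissipate.
    """
    decisions: List[str] = []
    hold_remaining = 0
    for level in activity_levels:
        if level > threshold:
            # Step 1812: far-end activity is high, so inhibit and restart the wait.
            hold_remaining = hold_frames
            decisions.append("inhibit")
        elif hold_remaining > 0:
            # Still waiting after recent far-end activity.
            hold_remaining -= 1
            decisions.append("inhibit")
        else:
            # Step 1810: far-end activity is low, so lobe adjustments may run.
            decisions.append("adjust")
    return decisions


# Hypothetical activity trace: silence, one burst of far-end speech, silence.
levels = [0.0, 0.9, 0.0, 0.0, 0.0]
print(gate_lobe_adjustments(levels, threshold=0.5, hold_frames=2))
```

With `hold_frames=0`, the gate returns to allowing adjustments as soon as the activity falls to or below the threshold, which corresponds to the embodiments in which the process 1800 returns directly to step 1802 after step 1812.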
Any process descriptions or blocks in figures should be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process, and alternate implementations are included within the scope of the embodiments of the invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those having ordinary skill in the art.
This disclosure is intended to explain how to fashion and use various embodiments in accordance with the technology rather than to limit the true, intended, and fair scope and spirit thereof. The foregoing description is not intended to be exhaustive or to be limited to the precise forms disclosed. Modifications or variations are possible in light of the above teachings. The embodiment(s) were chosen and described to provide the best illustration of the principle of the described technology and its practical application, and to enable one of ordinary skill in the art to utilize the technology in various embodiments and with various modifications as are suited to the particular use contemplated. All such modifications and variations are within the scope of the embodiments as determined by the appended claims, as may be amended during the pendency of this application for patent, and all equivalents thereof, when interpreted in accordance with the breadth to which they are fairly, legally and equitably entitled.
Number | Date | Country
---|---|---
62/971,648 | Feb 2020 | US
62/855,187 | May 2019 | US
62/821,800 | Mar 2019 | US