This disclosure generally relates to an acoustic echo cancellation (AEC) system. In particular, the disclosure relates to systems and methods for optimizing AEC convergence.
Conferencing environments, such as conference rooms, boardrooms, video conferencing settings, and the like, typically involve the use of microphones (including microphone arrays) for capturing sound from various audio sources in the environment (also known as a “near end”) and loudspeakers for presenting audio from a remote location (also known as a “far end”). For example, persons in a conference room may be conducting a conference call with persons at a remote location. Typically, speech and sound from the conference room may be captured by microphones and transmitted to the remote location, while speech and sound from the remote location may be received and played on loudspeakers in the conference room. Multiple microphones may be used in order to optimally capture the speech and sound in the conference room.
In some cases, the microphones may pick up the speech and sound from the remote location that is played on the loudspeakers. In such situations, the audio transmitted to the remote location may include not only the speech and sound from the conference room (“local microphone signal”), but also the speech and sound from the remote location (“remote audio signal”), thus creating an undesirable echo for the persons at the remote location, who may hear their own speech and sound. If there is no correction, the audio transmitted to the remote location may be low quality or unacceptable because of this echo. Typical acoustic echo cancellation systems utilize an adaptive filter, e.g., a finite impulse response filter, on the remote audio signal to generate a filtered signal that can be subtracted from the local microphone signal to help remove any echo.
The techniques of this disclosure provide systems and methods designed to, among other things: (1) initialize an acoustic echo canceller (AEC) for a first microphone lobe using converged AEC parameters from a second microphone lobe that was previously deployed to the same or nearby location; (2) generate a database configured to store converged AEC parameters in association with corresponding location information; and (3) generate a map of a room or other environment to represent the locations at which various microphone lobes have been deployed in the room, and use the map to assign location information to the AEC parameter(s) corresponding to each microphone lobe.
One exemplary embodiment includes a method of reducing echo in an audio system comprising a microphone, an acoustic echo canceller (AEC), and at least one processor, the method comprising: receiving, by the at least one processor, an audio signal detected by the microphone; deploying, by the at least one processor, a microphone lobe towards a first location associated with the detected audio signal; obtaining, by the at least one processor, one or more AEC parameters for the first location, the one or more AEC parameters being stored in a memory in communication with the at least one processor; initializing, by the at least one processor, the AEC using the one or more AEC parameters; and generating, by the at least one processor, an echo-cancelled output signal using the initialized AEC and based on the detected audio signal and a reference signal provided to the AEC.
Another exemplary embodiment includes an audio system, comprising: a loudspeaker configured to play a far end audio signal received from a remote computing device; a microphone configured to detect a near end audio signal; an acoustic echo canceller (AEC) configured to receive the far end audio signal as a reference signal for estimating an echo in the near end audio signal; a memory configured to store a plurality of AEC parameters for configuring the AEC; and at least one processor in communication with the remote computing device and configured to: receive the near end audio signal from the microphone; deploy a microphone lobe towards a first location associated with the near end audio signal; obtain one or more AEC parameters for the first location; initialize the AEC using the one or more AEC parameters; and generate an echo-cancelled output signal using the initialized AEC and based on the near end audio signal and the reference signal.
Another exemplary embodiment includes a method of generating a database of acoustic echo cancellation (“AEC”) parameters for an environment, the method comprising: generating, by at least one processor, a map of the environment, the map comprising a plurality of location points; receiving, by the at least one processor, an AEC parameter associated with convergence of an acoustic echo canceller for a microphone lobe deployed to a first location; assigning, by the at least one processor, the AEC parameter to a select one of the plurality of location points based on the first location; and storing, in a memory, the AEC parameter in association with the assigned location point.
Another exemplary embodiment includes a method of generating a database of acoustic echo cancellation (“AEC”) parameters for an environment, the method comprising: receiving, by at least one processor, an AEC parameter associated with convergence of an acoustic echo canceller for a microphone lobe deployed to a first location; receiving, by the at least one processor, location information indicating the first location; and storing, in a memory, the AEC parameter in association with the location information. According to one aspect, the method further comprises storing, in the memory, a convergence timestamp in association with the AEC parameter. According to some aspects, the AEC parameter includes a filter coefficient or a non-linear processing level. According to one aspect, the method further comprises selecting a plurality of location points based on one or more audio coverage areas associated with the environment.
Another exemplary embodiment includes a non-transitory computer-readable storage medium comprising instructions that, when executed by at least one processor, cause the at least one processor to perform: receiving an audio signal detected by a microphone; deploying a microphone lobe towards a first location associated with the detected audio signal; obtaining one or more AEC parameters for the first location, the one or more AEC parameters being stored in a memory in communication with the at least one processor; initializing an acoustic echo canceller (AEC) using the one or more AEC parameters; and generating an echo-cancelled output signal using the initialized AEC and based on the detected audio signal and a reference signal provided to the AEC.
Another exemplary embodiment includes a device comprising at least one processor configured to perform: receiving an audio signal detected by a microphone; deploying a microphone lobe towards a first location associated with the detected audio signal; obtaining one or more AEC parameters for the first location, the one or more AEC parameters being stored in a memory in communication with the at least one processor; initializing an acoustic echo canceller (“AEC”) using the one or more AEC parameters; and generating an echo-cancelled output signal using the initialized AEC and based on the detected audio signal and a reference signal provided to the AEC.
Another exemplary embodiment includes a method of reducing echo in an audio system comprising a microphone, an acoustic echo canceller (AEC), and at least one processor, the method comprising: receiving, by the at least one processor, an audio signal detected by the microphone; identifying a first location associated with the detected audio signal; obtaining, by the at least one processor, one or more AEC parameters for the first location, the one or more AEC parameters being stored in a memory in communication with the at least one processor; initializing, by the at least one processor, the AEC using the one or more AEC parameters; and generating, by the at least one processor, an echo-cancelled output signal using the initialized AEC and based on the detected audio signal and a reference signal provided to the AEC. According to aspects, obtaining one or more AEC parameters for the first location comprises: identifying a group of location points within an audio coverage area associated with the environment; determining a first location point within the group of location points that is closest to the first location; obtaining at least one AEC parameter associated with the first location point; and providing the at least one AEC parameter as the one or more AEC parameters for the first location.
These and other embodiments, and various permutations and aspects, will become apparent and be more fully understood from the following detailed description and accompanying drawings, which set forth illustrative embodiments that are indicative of the various ways in which the principles of the invention may be employed.
A typical acoustic echo canceller (AEC) includes an adaptive filter and a subtractor (or other summing component). The adaptive filter models an impulse response of a given environment, or the effects of certain components, like a loudspeaker, on the environment, and based thereon, estimates an echo in a local microphone signal captured by a microphone in the environment. For example, the local microphone signal may include near-end audio, or speech and other sounds produced by person(s) located in the environment, as well as far-end audio, or speech and other sounds produced by person(s) at a remote location (“remote audio signal”), which is broadcast over a loudspeaker in the environment. If the local microphone signal is transmitted to the remote location as is, the person(s) at the remote location will hear an echo (or linear echo). Using the remote audio signal as a reference signal, the AEC reduces or removes the echo in the local microphone signal before the signal is transmitted to the remote location. In particular, the subtractor subtracts an estimated echo signal, calculated by the adaptive filter based on the reference signal, from the local microphone signal to produce an echo-cancelled output signal. The echo-cancelled output signal is then provided to the remote location. The echo-cancelled output signal is also fed back into the adaptive filter and compared to the reference signal (or remote audio signal) to obtain an error signal. The echo in the local microphone signal may be considered reduced and/or removed (or cancelled) once the error signal calculated by the AEC is below a predetermined threshold (or nears zero).
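For illustration only, the following Python sketch shows the filter-and-subtract structure described above: the adaptive filter's coefficients are applied to the reference (far-end) signal to estimate the echo, and the subtractor removes that estimate from the local microphone signal. The sample-by-sample loop, the NumPy usage, and the toy three-tap echo path are assumptions made for clarity and are not details taken from this disclosure.

```python
import numpy as np

def cancel_echo(mic: np.ndarray, ref: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Subtract an FIR estimate of the loudspeaker echo from the microphone signal.

    mic    : local microphone samples (near-end speech plus loudspeaker echo)
    ref    : far-end reference samples, time-aligned with `mic`
    coeffs : adaptive filter coefficients (coeffs[0] weights the newest ref sample)
    """
    n_taps = len(coeffs)
    padded = np.concatenate([np.zeros(n_taps - 1), ref])     # zero history before t = 0
    out = np.empty(len(mic))
    for n in range(len(mic)):
        ref_hist = padded[n:n + n_taps][::-1]                 # newest reference sample first
        echo_est = float(np.dot(coeffs, ref_hist))            # adaptive-filter output
        out[n] = mic[n] - echo_est                            # subtractor: echo-cancelled sample
    return out

# Toy check: with coefficients equal to the true echo path, the residual is ~zero.
rng = np.random.default_rng(0)
far_end = rng.standard_normal(1000)
echo_path = np.array([0.6, 0.3, 0.1])                         # hypothetical 3-tap echo path
mic = np.convolve(far_end, echo_path)[:len(far_end)]          # echo-only microphone signal
print(np.max(np.abs(cancel_echo(mic, far_end, echo_path))))   # -> ~0.0
```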
When the AEC is first initiated for a given microphone lobe (also known as “initialization”), the AEC configures the adaptive filter using a preliminary set of parameters (e.g., filter coefficients, a non-linear processing (NLP) level, etc.) that are randomly selected, or otherwise pre-determined irrespective of the particular environment. As the reference signal changes, the AEC adapts the parameters of the adaptive filter accordingly, until the error signal is below the threshold. This state, in which the impulse response modeled by the adaptive filter closely approximates the actual impulse response of the environment, is known as “convergence” of the AEC. Until convergence is achieved, however, the person at the remote location may still detect an echo in the output signal and/or other audio artifacts. Accordingly, there is a need to improve AEC performance by increasing the speed of convergence of an AEC.
Systems and methods are provided herein for improving acoustic echo cancellation performance and more specifically, speeding up convergence of an acoustic echo canceller (AEC) for a microphone lobe deployed to a given location, by initializing the AEC using previously-converged parameters associated with the same or a nearby location. Existing systems typically discard previously-converged AEC parameters once they are no longer being used, for example, due to a change in the lobe position and/or echo path, or any other change that renders the existing AEC parameters invalid. The techniques described herein recognize that a microphone lobe that is newly deployed to a given location is likely to have acoustic properties similar to those of a prior microphone lobe deployed to the same location, or to another nearby location. As a result, the AEC may be more quickly converged for the new microphone lobe if the AEC is initialized, or pre-populated, with the converged AEC parameters from the prior microphone lobe, rather than starting with a blank slate. Accordingly, the systems and methods described herein include storing the previously-converged AEC parameters in association with corresponding location information to enable automatic retrieval of appropriate parameters (e.g., based on proximity) when initializing the AEC for a new microphone lobe, thus reducing AEC convergence time and improving overall AEC performance.
In embodiments, the previously-converged AEC parameters may be stored in association with information about the corresponding microphone lobe, including the location of the lobe and, in some cases, a directionality of the lobe, a width of the lobe, and/or others. For example, the converged AEC parameters and corresponding lobe information may be stored in a database to facilitate faster retrieval of location-appropriate parameters when deploying a new microphone lobe, or moving an existing lobe to a new location. During initialization of the AEC for the new microphone lobe, one or more previously-converged AEC parameters may be retrieved from the database based on the intended location of the new microphone lobe. If the exact location is not included in the database, a nearest location within the database may be determined, and the previously-converged AEC parameters corresponding to the nearest location may be retrieved. The retrieved parameters may be applied to the adaptive filter and/or other component of the AEC to complete initialization.
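A minimal sketch of the retrieval step just described is shown below, assuming the parameter database is a simple dictionary keyed by (x, y, z) lobe locations and that "nearest" means smallest Euclidean distance; the function and field names are illustrative assumptions rather than elements of this disclosure.

```python
import math

def get_init_params(param_db: dict, location: tuple):
    """Return previously-converged AEC parameters to initialize the AEC for `location`.

    param_db : maps (x, y, z) lobe locations to their converged AEC parameters
    location : intended (x, y, z) position of the newly deployed microphone lobe
    Falls back to the nearest stored location when there is no exact match,
    and to None (i.e., default initialization) when the database is empty.
    """
    if location in param_db:                                   # exact location previously stored
        return param_db[location]
    if not param_db:
        return None                                            # nothing stored yet
    nearest = min(param_db, key=lambda p: math.dist(p, location))
    return param_db[nearest]                                   # reuse the nearest prior lobe's parameters

# Hypothetical entries: lobe location -> converged parameters
param_db = {
    (1.0, 0.5, 1.2): {"coeffs": [0.61, 0.29, 0.11], "nlp_level": "low"},
    (3.0, 2.0, 1.2): {"coeffs": [0.20, 0.55, 0.05], "nlp_level": "medium"},
}
print(get_init_params(param_db, (1.1, 0.6, 1.2)))              # nearest match is (1.0, 0.5, 1.2)
```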
In some embodiments, the audio system includes a room mapping tool (or “room mapper”) configured to generate a map, or grid, of the room or other environment for representing the locations at which various microphone lobes have been previously deployed in the room. For example, the grid may be used to assign a point on the grid to each previously-converged AEC parameter, and each assigned grid point may be stored in the AEC parameter database as the location information for the corresponding AEC parameter. In some cases, each AEC parameter is also stored in association with a timestamp that indicates the time of convergence for that parameter, for example, so that only the most recent AEC parameters are used for initialization of the AEC. In some embodiments, the database may be continuously updated each time a new microphone lobe is deployed, for example, by storing any newly converged AEC parameters in the database in association with the corresponding location information (e.g., grid point) and timestamp.
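One way such a grid assignment might be implemented is sketched below; the uniform grid spacing, the dictionary-backed database, and the use of a wall-clock timestamp are assumptions for illustration only. Overwriting the entry at a grid point is one simple way to keep only the most recently converged parameters for that location.

```python
import time

GRID_SPACING_M = 0.5   # assumed distance between neighboring grid points, in meters

def snap_to_grid(location: tuple, spacing: float = GRID_SPACING_M) -> tuple:
    """Map an (x, y, z) lobe location onto the nearest point of a uniform room grid."""
    return tuple(round(c / spacing) * spacing for c in location)

def store_converged_params(param_db: dict, lobe_location: tuple, params: dict) -> None:
    """Store converged AEC parameters keyed by grid point, with a convergence timestamp.

    Re-storing at the same grid point replaces the older entry, so the database
    retains the most recently converged parameters for each location.
    """
    grid_point = snap_to_grid(lobe_location)
    param_db[grid_point] = {"params": params, "converged_at": time.time()}

param_db = {}
store_converged_params(param_db, (1.07, 0.48, 1.18), {"coeffs": [0.6, 0.3, 0.1], "nlp_level": "low"})
print(param_db)   # entry keyed by the grid point (1.0, 0.5, 1.0)
```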
As used herein, the terms “lobe” and “microphone lobe” refer to an audio beam generated by a given microphone array (or array microphone) to pick up audio signals at a select location, such as the location towards which the lobe is directed. While the techniques disclosed herein are described with reference to microphone lobes generated by array microphones, in some cases, the same or similar techniques may be utilized with other forms or types of microphone coverage (e.g., a cardioid pattern, etc.) and/or with microphones that are not array microphones (e.g., a handheld microphone, boundary microphone, lavalier microphones, etc.). Thus, the term “lobe” is intended to cover any type of audio beam or coverage.
Referring now to
As will be appreciated, various components included in the audio system 100 may be implemented using software executable by one or more servers or computers, such as the computing device 106 and/or other computing device with a processor and memory (e.g., device 500 shown in
Environments such as conference rooms may utilize the audio system 100 (also referred to as a “communication system”) to facilitate communication with persons at the remote location, for example. The type of microphone 102 and its placement in a particular environment may depend on the locations of audio sources, physical space requirements, aesthetics, room layout, and/or other considerations. For example, in some environments, the microphone 102 may be placed on a table or lectern near the audio source. In other environments, the microphone 102 may be mounted overhead to capture the sound from the entire room, for example. The audio system 100 may work in conjunction with any type and any number of microphones 102, including one or more microphone transducers (or elements), a microphone array, or other audio input device capable of capturing speech and other sounds. As an example, the microphone 102 may include, but is not limited to, SHURE MXA310, MX690, MXA910, and the like.
Loudspeaker 104 may be any type of audio speaker, speaker system, or other audio output device for audibly playing audio signals received from the remote location, such as remote audio signal 109 (also referred to herein as “far end audio signal”), or other sounds associated with the communication event. As an example, the loudspeaker 104 may include, but is not limited to, SHURE MXN5W-C and the like.
Computing device 106 may be configured to enable a conferencing call or otherwise implement one or more aspects of the communication between the audio system 100 and the remote location. The computing device 106 can be any generic computing device comprising at least one processor and a memory device (such as, e.g., computing device 500 as shown in
Microphone 102 can be configured to detect sound in the environment and convert the sound to an audio signal 105 (also referred to herein as “local microphone signal”). In some embodiments, the audio signal 105 detected by the microphone 102 may be processed by a beamformer 110 to generate one or more beamformed audio signals, or otherwise direct an audio pick-up beam, or microphone lobe, towards a particular location in the environment (e.g., as shown in
While
The remote audio signal 109 received from the remote location may be provided not only to the loudspeaker 104, but also to the acoustic echo canceller (AEC) 108 as a reference signal. As shown in
In embodiments, the AEC 108 can be configured to continuously improve the echo-cancelled output signal 107 until convergence is achieved for the corresponding microphone lobe, or said differently, until the echo path modeled by the adaptive filter 112 (e.g., filtered remote audio signal 111) closely approximates the actual echo in the environment. The AEC 108 may improve the output signal 107 by dynamically adjusting one or more parameters, such as, for example, a least mean squares (“LMS”) coefficient, normalized LMS (“NLMS”) coefficient, recursive least squares (“RLS”) coefficient, and/or other filter coefficient for the adaptive filter 112 or other component, parameters of another gradient-descent based algorithm, delay values applied to the adaptive filter 112, or any combination thereof (e.g., coefficient-delay pairings or “taps”), a non-linear processing (“NLP”) level (e.g., none, low, medium, high, etc.) or other attenuation level applied to the output of the adaptive filter 112 to suppress the echo, and/or other AEC parameters.
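As one concrete instance of such an update, the sketch below applies a normalized LMS (NLMS) coefficient update, one of the adaptation rules named above, and reports convergence once an exponentially smoothed error power falls below a threshold (a stand-in for the "error level below a predetermined threshold" described next). The step size, smoothing factor, threshold, filter length, and toy echo path are all illustrative assumptions.

```python
import numpy as np

def nlms_step(coeffs, ref_hist, mic_sample, mu=0.5, eps=1e-8):
    """One normalized-LMS update: estimate the echo, form the error (the
    echo-cancelled sample), and nudge the coefficients toward convergence."""
    error = mic_sample - np.dot(coeffs, ref_hist)
    coeffs += (mu * error / (np.dot(ref_hist, ref_hist) + eps)) * ref_hist
    return coeffs, error

def adapt_until_converged(mic, ref, n_taps=64, threshold=1e-4, alpha=0.01):
    """Adapt sample-by-sample; report convergence once the exponentially
    smoothed error power drops below `threshold`."""
    coeffs = np.zeros(n_taps)                                 # blank-slate initialization
    padded = np.concatenate([np.zeros(n_taps - 1), ref])
    err_power = 1.0
    for n in range(len(mic)):
        ref_hist = padded[n:n + n_taps][::-1]                 # newest reference sample first
        coeffs, e = nlms_step(coeffs, ref_hist, mic[n])
        err_power = (1 - alpha) * err_power + alpha * e * e
        if err_power < threshold:
            return coeffs, n                                  # converged at sample index n
    return coeffs, None                                       # did not converge within this signal

rng = np.random.default_rng(1)
far_end = rng.standard_normal(20000)
mic = np.convolve(far_end, [0.6, 0.3, 0.1])[:len(far_end)]    # toy echo-only microphone signal
_, converged_at = adapt_until_converged(mic, far_end)
print("converged at sample:", converged_at)
```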
More specifically, the AEC 108 may be configured to identify an error level in the echo-cancelled output signal 107 and adapt one or more parameters of the AEC 108 until the error level is below a predetermined threshold (or zero). For example, as shown in
As shown in
In various embodiments, the control unit 116 may be a controller or other suitable control device and may include a processor and memory configured to carry out instructions and/or commands in accordance with the techniques described herein. The parameter database 118 may be stored in a memory of the computing device 106 or other memory of the audio system 100. In some embodiments, the information stored in the database 118 may be compressed, for example, using known techniques, in order to reduce the amount of memory occupied by the database 118. While
In embodiments, the control unit 116 may be configured to monitor a convergence status of the AEC 108. Once the adaptive filter 112 provides or reports a “converged” status, or the control unit 116 otherwise determines that the AEC 108 has achieved convergence for a given microphone lobe, the control unit 116 may be configured to request or obtain the AEC parameters that were used to achieve convergence for the given microphone lobe. The converged AEC parameters received at the control unit 116 may include, for example, the filter coefficients and/or taps applied to the adaptive filter 112, the NLP level applied to the output of the filter 112, and/or any other characteristics of the AEC 108 that are relevant to achieving convergence. In some cases, the control unit 116 may also receive, from the AEC 108, other information about the corresponding microphone lobe, such as, for example, a directionality of the microphone lobe, lobe width information, identifying information associated with the lobe (e.g., a lobe identifier, numerical label, or other code), and more.
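For illustration, the converged parameters and lobe information that the control unit 116 might capture at this point could be grouped into a record such as the following; the field names, types, and example values are assumptions, not definitions from this disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple
import time

@dataclass
class ConvergedLobeRecord:
    """Illustrative grouping of what might be captured when the AEC reports a
    'converged' status for a microphone lobe. Field names are assumptions."""
    lobe_id: str                                   # lobe identifier, numerical label, or other code
    location: Tuple[float, float, float]           # lobe location in the microphone's coordinate system
    coeffs: List[float]                            # converged adaptive-filter coefficients / taps
    nlp_level: str                                 # non-linear processing level (e.g., none/low/medium/high)
    azimuth_deg: Optional[float] = None            # lobe directionality, if reported
    width_deg: Optional[float] = None              # lobe width, if reported
    converged_at: float = field(default_factory=time.time)   # convergence timestamp

record = ConvergedLobeRecord(
    lobe_id="lobe-1",
    location=(1.0, 0.5, 1.2),
    coeffs=[0.6, 0.3, 0.1],
    nlp_level="low",
    azimuth_deg=42.0,
    width_deg=30.0,
)
print(record)
```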
In some embodiments, the control unit 116 may be configured to receive a location input 117 that comprises location information for the microphone lobe that corresponds to the converged AEC parameters. In some cases, the location input 117 may be provided or transmitted to the control unit 116 by the microphone 102 or other component of the audio system 100 that is outside the computing device 106, for example, as shown in
In some embodiments, the location information included in the location input 117, or otherwise, may include location coordinates for a center of the corresponding microphone lobe, the location to which the microphone lobe was deployed by the microphone 102, or other location associated with the microphone lobe. The location coordinates may be relative to a center of the microphone 102 or otherwise included in a coordinate system of the microphone 102. In some cases, the location coordinates may be Cartesian or rectangular coordinates that represent a location point in three dimensions, or x, y, and z values. In other cases, the location coordinates may be polar or spherical coordinates, i.e. azimuth (phi), elevation (theta), and radius (r), which may be obtained from the Cartesian coordinates using a transformation formula, as is known in the art. In various embodiments, the location coordinates may be generated by a localization software or other algorithm included in the microphone 102 or other component of the audio system 100. For example, the localization software in the microphone 102 may be configured to generate a localization of a detected sound, or other audio source, and determine coordinates that represent a location or position of the detected audio source relative to the microphone 102 (or microphone array). These location coordinates may be provided to the control unit 116 as the location input 117. Various methods for generating sound localizations are known in the art, including, for example, generalized cross-correlation (“GCC”) and others.
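A sketch of one common Cartesian-to-spherical transformation (and its inverse) is shown below; the disclosure does not mandate a particular convention, so the axis and angle definitions here are assumptions chosen for illustration.

```python
import math

def cartesian_to_spherical(x: float, y: float, z: float):
    """Convert microphone-relative Cartesian coordinates to (azimuth phi,
    elevation theta, radius r), with azimuth measured in the x-y plane and
    elevation measured up from that plane."""
    r = math.sqrt(x * x + y * y + z * z)
    phi = math.atan2(y, x)                        # azimuth
    theta = math.atan2(z, math.hypot(x, y))       # elevation
    return phi, theta, r

def spherical_to_cartesian(phi: float, theta: float, r: float):
    """Inverse transform back to Cartesian (x, y, z)."""
    return (r * math.cos(theta) * math.cos(phi),
            r * math.cos(theta) * math.sin(phi),
            r * math.sin(theta))

phi, theta, r = cartesian_to_spherical(1.0, 0.5, 1.2)
print(spherical_to_cartesian(phi, theta, r))      # ~ (1.0, 0.5, 1.2)
```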
Referring additionally to
In some embodiments, the number of lobes 204 may be fixed at eight, or other number (e.g., six, four, etc.). In other embodiments, the number of lobes 204 may be selectable by a user and/or automatically determined based on the locations of the various audio sources detected by the microphone array 202. Similarly, in some embodiments, a directionality and/or location of each lobe 204 may be fixed, such that the lobes 204 always form a specific configuration (e.g., the “flower pattern” shown in
More specifically, in various embodiments, the microphone array 202, the beamformer 110, and/or other component of the audio system 100 (e.g., the computing device 106) may be configured to automatically place or deploy a select one of the microphone lobes 204 based on a directionality of the audio signal detected by the microphone array (e.g., audio signal 105 in
For example,
Referring back to
In embodiments where movement of a microphone lobe involves a two-step process as shown in
Referring back to
In various embodiments, the control unit 116 and/or the computing device 106 may be configured to populate the parameter database 118 over time, or dynamically during normal use of the audio system 100. For example, the control unit 116 may be configured to store the corresponding AEC parameters in the database 118 each time the AEC 108 is converged for a new microphone lobe during a communication event. In such cases, the database 118 may be generated based on historical information, and the location information may be received at the control unit 116 in the form of the location input 117, or other input, from one or more components of the audio system 100, as described herein.
Alternatively, or additionally, the control unit 116 and/or the computing device 106 may be configured to generate or build the parameter database 118 during an initial setup phase of the audio system 100, or otherwise prior to normal use of the audio system 100. For example, the control unit 116 and/or the computing device 106 may be configured to play a test signal (e.g., pink noise, white noise, etc.) over the loudspeaker 104 or other audio output device in the environment. The microphone 102 may detect the test signal as coming from a given location (e.g., the location of the loudspeaker 104) and deploy a microphone lobe towards that location. The control unit 116 may initialize the AEC 108 for the microphone lobe using known (e.g., default or generic) parameters, and once convergence is achieved, store the converged AEC parameters in the database 118 in association with the location of that microphone lobe. The test signal may be played repeatedly at various locations around the room or other environment, for example, either at random or at predetermined locations in the room, until the parameter database 118 includes, or is populated with, a sufficiently diverse collection of microphone lobes or otherwise meets a minimum requirement for setting up the database 118 (e.g., a minimum number of database entries, database entries corresponding to each corner of the room and/or anticipated lobe locations, etc.).
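The setup-phase population just described might be orchestrated roughly as follows; this is a sketch under stated assumptions, where the playback and convergence calls are stand-in stubs (a real implementation would drive the loudspeaker, lobe deployment, and AEC), and the stopping criterion shown (a minimum number of entries) is only one of the possibilities mentioned above.

```python
import time

# Stand-in stubs for the hardware and AEC interactions; names are hypothetical.
def play_test_signal_near(location):
    print(f"playing test signal (e.g., pink or white noise) near {location}")

def deploy_lobe_and_converge(location):
    # Placeholder: deploy a lobe toward `location`, adapt the AEC, return converged parameters.
    return {"coeffs": [0.6, 0.3, 0.1], "nlp_level": "low"}

def populate_parameter_db(test_locations, min_entries=4):
    """Setup-phase sketch: for each test location, play the test signal, let the AEC
    converge for the deployed lobe, and store the result until a minimum number of
    database entries has been collected."""
    param_db = {}
    for loc in test_locations:
        play_test_signal_near(loc)
        param_db[loc] = {"params": deploy_lobe_and_converge(loc), "converged_at": time.time()}
        if len(param_db) >= min_entries:
            break
    return param_db

param_db = populate_parameter_db(
    [(0.5, 0.5, 1.2), (2.5, 0.5, 1.2), (0.5, 2.5, 1.2), (2.5, 2.5, 1.2)])
print(len(param_db), "entries stored")
```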
In some embodiments, the audio system 100 and/or the computing device 106 may include a mapping tool (such as, e.g., mapping tool 512 shown in
In
In embodiments, when a new microphone lobe 204 is deployed, the microphone lobe 204 may be plotted on the grid 300 (e.g., Lobe 1 in
While
Various embodiments describe using prior microphone lobe locations to identify appropriate converged AEC parameters for a new talker location, for example, as shown in
More specifically, audio coverage areas may be defined as regions designated within the environment for capturing audio signals, such as, e.g., speech produced by human speakers. In some cases, the audio coverage areas (or “audio pick-up regions”) may designate or delineate the spaces within which lobes can be deployed by the microphones, other beamforming techniques can be focused for audio pick-up, or audio can otherwise be captured by the microphones. Conversely, the areas outside the audio coverage area(s) may correspond to the spaces where audio capture will be rejected or avoided by the microphones. The exact number, size, and shape of the audio coverage area(s) may vary depending on the size, shape, and type of environment, for example.
For example,
As another example,
In various embodiments, the mapping tool and/or the control unit 116 can be configured to select, or limit, the plurality of location points that are available for matching to a newly-detected talker location based on the one or more audio coverage areas associated with the environment (e.g., room). For example, in
As another example, in
In some embodiments, the mapping tool can be further configured to ensure that only the location points within the audio coverage area(s) are available for matching to incoming talker locations and that audio from outside the audio coverage area(s) is rejected. For example, the mapping tool may be configured to select the plurality of location points 608 or 710 based on the audio coverage areas, so that the map only displays those prior audio pick-up locations that are within the audio coverage area(s) associated with the given environment, instead of displaying all known location points as illustrated. In some embodiments, the parameter database 118 may also be configured to limit the pool of available location points, for example, by storing only those location points that are within the regions of the environment covered by audio coverage areas and thus, speed up data processing and retrieval times.
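A minimal sketch of limiting the candidate location points to the audio coverage area(s) is shown below, assuming rectangular coverage areas described by corner coordinates; the shapes, names, and example values are illustrative assumptions only.

```python
def in_coverage_area(point, area):
    """True if an (x, y) location point lies inside a rectangular audio coverage area
    given as (x_min, y_min, x_max, y_max)."""
    x, y = point[0], point[1]
    x_min, y_min, x_max, y_max = area
    return x_min <= x <= x_max and y_min <= y <= y_max

def candidate_points(all_points, coverage_areas):
    """Limit the location points available for matching a new talker location to those
    inside at least one audio coverage area; points outside the areas are excluded."""
    return [p for p in all_points if any(in_coverage_area(p, a) for a in coverage_areas)]

known_points = [(0.5, 0.5), (1.5, 2.0), (4.0, 4.0)]
coverage_areas = [(0.0, 0.0, 2.0, 2.5)]                  # one coverage area over a seating region
print(candidate_points(known_points, coverage_areas))    # -> [(0.5, 0.5), (1.5, 2.0)]
```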
Referring now to
As shown in
Process 400 may further include, at step 404, generating, by the at least one processor, a database (e.g., parameter database 118 of
In various embodiments, generating the database at step 404 may include receiving a plurality of AEC parameters (or converged AEC parameters), each AEC parameter associated with convergence of the AEC (e.g., AEC 108 of
Step 404 may also include storing each of the plurality of AEC parameters in association with the corresponding location information (e.g., the location of the corresponding microphone lobe, the corresponding talker location, etc.). Each AEC parameter may represent an acoustic property of a location towards which the corresponding microphone lobe is deployed, or audio beamforming is otherwise focused. For example, the AEC parameters may include a least mean squares (“LMS”) coefficient, normalized LMS (“NLMS”) coefficient, recursive least squares (“RLS”) coefficient, and/or any other suitable filter coefficient, parameters of another gradient-descent based algorithm, a non-linear processing (“NLP”) level, and/or any other parameter that may be used to configure the AEC.
In some embodiments, generating the database at step 404 also includes storing a convergence timestamp in association with each AEC parameter. The convergence timestamp may indicate the time at which the converged AEC parameter was received at the database, the time at which AEC convergence was achieved for the particular microphone lobe, or other timing information associated with the corresponding AEC parameter.
In some embodiments, step 404 may be carried out during a setup phase of the audio system in order to populate the database prior to normal use of the audio system. For example, test signals may be played at various locations in the environment in order to trigger lobe deployment to those locations, and the converged AEC parameters for each of those lobes may be stored in the database in association with the lobe information and/or location information for the lobe, as described herein.
In some embodiments, the corresponding location information is obtained or determined using the map generated at step 402. For example, step 404 may include receiving, by the at least one processor, an AEC parameter associated with convergence of the AEC for a microphone lobe deployed to a first location; assigning, by the at least one processor, the received AEC parameter to a select one of the location points on the map based on the first location; and storing, in the database, the AEC parameter in association with the point assigned to the AEC parameter, for example, as the corresponding location information for that AEC parameter. In other cases, the AEC parameters received at step 404 may be associated with other beamformed audio pick-up locations on the map generated at step 402 that fall within an audio coverage area associated with the environment, instead of, or in addition to, microphone lobes, as described herein. For the sake of brevity and clarity, the remaining steps of process 400 will be described with reference to microphone lobe locations, but it should be appreciated that the process 400 may also be applied to other beamformed audio pick-up locations within an audio coverage area using at least somewhat similar techniques.
In some embodiments, steps 402 and 404 may be completed before proceeding to step 406, as shown in
As shown in
From step 408, the process 400 may proceed to step 410, which includes obtaining, by the at least one processor, one or more AEC parameters for the first location. In some embodiments, obtaining the one or more AEC parameters at step 410 may include retrieving the one or more AEC parameters from the database generated at step 404 based on the first location. For example, said retrieving may include determining whether the database includes the first location, or a database entry comprising converged AEC parameters that were previously determined for the first location. If the first location is found, the AEC parameters for the first location may be retrieved from the database at step 410. If, on the other hand, the at least one processor determines that the database does not include AEC parameters for the first location, retrieving the one or more AEC parameters from the database based on the first location may include determining a second location that is closest to the first location and is associated with at least one of the plurality of AEC parameters stored in the database; retrieving, from the database, the at least one of the plurality of AEC parameters associated with the second location; and providing the at least one of the plurality of AEC parameters as the one or more AEC parameters associated with the first location.
In some embodiments, the one or more AEC parameters obtained at step 410 may be retrieved from the memory that is in communication with the at least one processor (e.g., instead of the database generated at step 404). In such cases, the AEC parameters and corresponding location information may be stored in the memory, and obtaining the one or more AEC parameters for the first location at step 410 comprises identifying another, or third, location that is closest to the first location; obtaining, from the memory, at least one AEC parameter associated with the third location; and providing, to the AEC, the at least one AEC parameter as the one or more AEC parameters for the first location. For example, the at least one AEC parameter may correspond to a second microphone lobe previously deployed towards the third location. As another example, the at least one AEC parameter may correspond to another beamformed audio pick-up location that is closest to the third location and within a corresponding audio coverage area.
In embodiments where audio coverage areas are used instead of, or in addition to microphone lobes, obtaining the one or more AEC parameters at step 410 may include identifying a group of location points within an audio coverage area associated with the environment; determining a first location point within the group of location points that is closest to the first location; obtaining at least one AEC parameter associated with the first location point; and providing the at least one AEC parameter as the one or more AEC parameters for the first location.
From step 410, the process 400 may continue to step 412, which includes initializing the AEC using the one or more AEC parameters obtained at step 410 by applying the one or more AEC parameters to the AEC. For example, an adaptive filter (e.g., adaptive filter 112 of
As shown in
From step 414, the process 400 may continue to step 416, where the at least one processor determines a convergence status of the AEC. For example, once the AEC is initialized at step 412 using the previously-converged AEC parameters obtained at step 410, the at least one processor (e.g., control unit 116) may continuously or periodically monitor the convergence status of the AEC to see if convergence has been achieved (e.g., the error signal is minimized), as described herein. If the answer at step 416 is “No,” i.e. the AEC is not yet converged, the process 400 may continue to step 417, where the AEC parameters are updated so as to further minimize the error signal or otherwise move towards convergence (i.e. until the cost function falls below a given threshold), as described herein. In some cases, step 417 includes adapting the AEC parameters based on incoming audio data, i.e. in real-time, using the techniques described herein. From step 417, the process 400 may loop back to the start of step 414, where a new echo-cancelled output signal is generated using the updated AEC parameters and based on the incoming audio signal and the reference signal provided to the AEC. At step 416, the at least one processor checks the AEC convergence status again, based on the updated echo-cancelled output signal. This loop may continue until the answer at step 416 is “Yes,” i.e. the AEC is converged. Once the AEC is converged, the process 400 continues to step 418, which includes storing, in the memory, a set of AEC parameters corresponding to the convergence status of the AEC. For example, the corresponding AEC parameters (or converged AEC parameters) may be stored in association with the first location of the corresponding microphone lobe in the parameter database, or other location of the memory.
The process 400 may end at step 418 once the converged AEC parameters are stored, or may loop back to step 414 to generate a new echo-cancelled output signal and re-check the convergence status based on any newly received audio data, or otherwise ensure that the echo-cancelled output signal generated at step 414 remains relatively error-free. In some embodiments, the converged AEC parameters (e.g., coefficients) may be stored periodically, or at regular intervals, after convergence is achieved, so that the parameter database is kept up to date with the most recently converged AEC parameters. In other embodiments, the AEC parameters may be stored just before moving the relevant lobe to a new location, so that the stored AEC parameters represent the most recent or latest conditions for that lobe.
Processor 502 executes instructions retrieved from the memory 504. In embodiments, the memory 504 stores one or more software programs, or sets of instructions, that embody the techniques described herein. When executed by the processor 502, the instructions may cause the computing device 500 to implement or operate all or parts of the techniques described herein, one or more components of the audio system 100, and/or methods, processes, or operations associated therewith, such as, e.g., process 400 shown in
In general, the computing device 500 may be configured to control and communicate or interface with the other hardware devices included in the audio system 100, such as the microphone 102, the loudspeaker 104, and any other devices in the same network. The computing device 500 may also control or interface with certain software components of the audio system 100. For example, the computing device 500 may interface with a localization module (not shown) installed or included in the microphone 102, the beamformer 110, and/or other component of the audio system, in order to receive sound localization coordinates or other location data for an audio source detected by the microphone 102. In addition, the computing device 500 may be configured to communicate or interface with external components coupled to the audio system 100 (e.g., remote servers, databases, and other devices). For example, the computing device 500 may interface with a component graphical user interface (GUI or CUI) associated with the audio system 100, any existing or proprietary conferencing software, and/or a remote computing device located at the remote location (or far end). In addition, the computing device 500 may support one or more third-party controllers and in-room control panels (e.g., volume control, mute, etc.) for controlling one or more of the audio devices in the audio system 100.
Communication interface 508 may be configured to allow the computing device 500 to communicate with one or more devices (or systems) according to one or more protocols. In some embodiments, the communication interface 508 includes one or more wired communication interfaces, such as, for example, an Ethernet port, a high-definition serial-digital-interface (HD-SDI), an audio network interface with universal serial bus (ANI-USB), a high definition media interface (HDMI) port, a USB port, or an audio port (e.g., a 3.5 mm jack, lightning port, etc.). In some embodiments, the communication interface 508 includes one or more wireless communication interfaces, such as, for example, a broadband cellular communication module (e.g., to support 4G technology, 5G technology, or the like), a short-range wireless communication module (e.g., to support Bluetooth technology, Radio Frequency Identification (RFID) technology, Near Field Communication (NFC) technology, or the like), a long-range wireless communication module (e.g., to support Wi-Fi technology or other Internet connection), or any other type of wireless communication module. In some embodiments, communication interface 508 may enable the computing device 500 to transmit information to, and receive information from, one or more of the loudspeaker 104, the microphone 102, or other component(s) of the audio system 100. Such information may include, for example, location data (e.g., sound localization coordinates), audio coverage area assignments and parameters (or boundaries), lobe information (e.g., directionality, lobe width, and/or other pick-up pattern information), and more.
User interface 510 may facilitate interaction with a user of the computing device 500 and/or audio system 100. As such, the user interface 510 may include input components such as a keyboard, a keypad, a mouse, a touch-sensitive panel, a microphone, and a camera, and output components such as a display screen (which, for example, may be combined with a touch-sensitive panel), a sound speaker, and a haptic feedback system. The user interface 510 may also comprise devices that communicate with inputs or outputs, such as a short-range transceiver (RFID, Bluetooth, etc.), a telephonic interface, a cellular communication port, a router, or other types of network communication equipment. The user interface 510 may be internal to the computing device 500, or may be external and connected wirelessly or via connection cable, such as through a universal serial bus port. In some embodiments, the user interface 510 may include a button, touchscreen, or other input device for receiving a user input associated with movement of a microphone lobe, placement of a new microphone lobe, and the like, and/or a user input associated with indicating the start or end of a set-up mode or phase of the audio system 100 and/or the start or end of a normal use mode or phase of the audio system 100, as described herein.
Any of the processors described herein, such as, e.g., processor 502, may include a general purpose processor (e.g., a microprocessor) and/or a special purpose processor (e.g., an audio processor, a digital signal processor, etc.). In some examples, processor 502, and/or any other processor described herein, may be any suitable processing device or set of processing devices such as, but not limited to, a microprocessor, a microcontroller-based platform, an integrated circuit, one or more field programmable gate arrays (FPGAs), and/or one or more application-specific integrated circuits (ASICs).
Any of the memories or memory devices described herein, such as, e.g., memory 504, may be volatile memory (e.g., RAM including non-volatile RAM, magnetic RAM, ferroelectric RAM, etc.), non-volatile memory (e.g., disk memory, FLASH memory, EPROMs, EEPROMs, memristor-based non-volatile solid-state memory, etc.), unalterable memory (e.g., EPROMs), read-only memory, and/or high-capacity storage devices (e.g., hard drives, solid state drives, etc.). In some examples, memory 504, and/or any other memory described herein, includes multiple kinds of memory, particularly volatile memory and non-volatile memory.
Moreover, any of the memories described herein (e.g., memory 504) may be computer readable media on which one or more sets of instructions, such as the software for operating the techniques described herein, can be embedded. The instructions may reside completely, or at least partially, within any one or more of the memory, the computer readable medium, and/or within one or more processors (e.g., processor 502) during execution of the instructions. In some embodiments, memory 504, and/or any other memory described herein, may include one or more data storage devices configured for implementation of a persistent storage for data that needs to be stored and recalled by the end user, such as, e.g., location data received from one or more audio devices, prestored location data or coordinates indicating a known location of one or more audio devices, and more. In such cases, the data storage device(s) may save data in flash memory or other memory devices. In some embodiments, the data storage device(s) can be implemented using, for example, SQLite data base, UnQLite, Berkeley DB, BangDB, or the like.
In some embodiments, any of the computing devices described herein, such as, e.g., the computing device 500, may include one or more components configured to facilitate a conference call, meeting, classroom, or other event and/or process audio signals associated therewith to improve an audio quality of the event. For example, in various embodiments, the computing device 500, and/or any other computing device described herein, may comprise a digital signal processor (“DSP”) configured to process the audio signals received from the various audio sources using, for example, automatic mixing, matrix mixing, delay, compressor, parametric equalizer (“PEQ”) functionalities, acoustic echo cancellation, and more. In other embodiments, the DSP may be a standalone device operatively coupled or connected to the computing device using a wired or wireless connection. One exemplary embodiment of the DSP, when implemented in hardware, is the P300 IntelliMix Audio Conferencing Processor from SHURE, the user manual for which is incorporated by reference in its entirety herein. As further explained in the P300 manual, this audio conferencing processor includes algorithms optimized for audio/video conferencing applications and for providing a high quality audio experience, including eight channels of acoustic echo cancellation, noise reduction and automatic gain control. Another exemplary embodiment of the DSP, when implemented in software, is the IntelliMix Room from SHURE, the user guide for which is incorporated by reference in its entirety herein. As further explained in the IntelliMix Room user guide, this DSP software is configured to optimize the performance of networked microphones with audio and video conferencing software and is designed to run on the same computer as the conferencing software. In other embodiments, other types of audio processors, digital signal processors, and/or DSP software components may be used to carry out one or more of the audio processing techniques described herein, as will be appreciated.
Moreover, the computing device 500, and/or any of the other computing devices described herein, may also comprise various other software modules or applications (not shown) configured to facilitate and/or control the conferencing event, such as, for example, internal or proprietary conferencing software and/or third-party conferencing software (e.g., Microsoft Skype, Microsoft Teams, Bluejeans, Cisco WebEx, GoToMeeting, Zoom, Join.me, etc.). Such software applications may be stored in the memory (e.g., memory 504) of the computing device and/or may be stored on a remote server (e.g., on premises or as part of a cloud computing network) and accessed by the computing device via a network connection. Some software applications may be configured as a distributed cloud-based software with one or more portions of the application residing in the computing device (e.g., computing device 500) and one or more other portions residing in a cloud computing network. One or more of the software applications may reside in an external network, such as a cloud computing network. In some embodiments, access to one or more of the software applications may be via a web-portal architecture, or otherwise provided as Software as a Service (SaaS).
It should be understood that examples disclosed herein may refer to computing devices and/or systems having components that may or may not be physically located in proximity to each other. Certain embodiments may take the form of cloud based systems or devices, and the term “computing device” should be understood to include distributed systems and devices (such as those based on the cloud), as well as software, firmware, and other components configured to carry out one or more of the functions described herein. Further, as noted above, one or more features of the computing device may be physically remote (e.g., a standalone microphone) and may be communicatively coupled to the computing device.
In general, a computer program product in accordance with embodiments described herein includes a computer usable storage medium (e.g., standard random access memory (RAM), an optical disc, a universal serial bus (USB) drive, or the like) having computer-readable program code embodied therein, wherein the computer-readable program code is adapted to be executed by a processor (e.g., working in connection with an operating system) to implement the methods described herein. In this regard, the program code may be implemented in any desired language, and may be implemented as machine code, assembly code, byte code, interpretable source code or the like (e.g., via C, C++, Java, ActionScript, Python, Objective-C, JavaScript, CSS, XML, and/or others). In some embodiments, the program code may be a computer program stored on a non-transitory computer readable medium that is executable by a processor of the relevant device.
The terms “non-transitory computer-readable medium” and “computer-readable medium” include a single medium or multiple media, such as a centralized or distributed database, and/or associated caches and servers that store one or more sets of instructions. Further, the terms “non-transitory computer-readable medium” and “computer-readable medium” include any tangible medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor or that cause a system to perform any one or more of the methods or operations disclosed herein. As used herein, the term “computer readable medium” is expressly defined to include any type of computer readable storage device and/or storage disk and to exclude propagating signals.
Any process descriptions or blocks in the figures, such as, e.g.,
Further, it should be noted that in the description and drawings, like or substantially similar elements may be labeled with the same reference numerals. However, sometimes these elements may be labeled with differing numbers, such as, for example, in cases where such labeling facilitates a more clear description. In addition, system components can be variously arranged, as is known in the art. Also, the drawings set forth herein are not necessarily drawn to scale, and in some instances, proportions may be exaggerated to more clearly depict certain features and/or related elements may be omitted to emphasize and clearly illustrate the novel features described herein. Such labeling and drawing practices do not necessarily implicate an underlying substantive purpose. The above description is intended to be taken as a whole and interpreted in accordance with the principles taught herein and understood to one of ordinary skill in the art.
In this disclosure, the use of the disjunctive is intended to include the conjunctive. The use of definite or indefinite articles is not intended to indicate cardinality. In particular, a reference to “the” object or “a” and “an” object is intended to also denote one of a possible plurality of such objects.
Moreover, this disclosure is intended to explain how to fashion and use various embodiments in accordance with the technology rather than to limit the true, intended, and fair scope and spirit thereof. The foregoing description is not intended to be exhaustive or to be limited to the precise forms disclosed. Modifications or variations are possible in light of the above teachings. The embodiment(s) were chosen and described to provide the best illustration of the principle of the described technology and its practical application, and to enable one of ordinary skill in the art to utilize the technology in various embodiments and with various modifications as are suited to the particular use contemplated. All such modifications and variations are within the scope of the embodiments as determined by the appended claims, which may be amended during the pendency of the application for patent, and all equivalents thereof, when interpreted in accordance with the breadth to which they are fairly, legally and equitably entitled.
This application claims the benefit of U.S. Provisional Patent Application No. 63/304,286, filed on Jan. 28, 2022, the entirety of which is incorporated by reference herein.