This disclosure generally relates to an audio system located in a conference room or other conferencing environment. More specifically, this disclosure relates to automatically configuring audio coverage areas of the audio system within the conferencing environment.
Conferencing environments, such as conference rooms, boardrooms, video conferencing settings, and the like, typically involve the use of microphones for capturing sound from various audio sources active in such environments. Such audio sources may include human participants of a conference call, for example, that are producing speech, music, and other sounds. The captured sound may be disseminated to a local audience in the environment through amplified speakers (for sound reinforcement), and/or to others remote from the environment (such as, e.g., via a telecast and/or webcast) using communication hardware. The conferencing environment may also include one or more loudspeakers or audio reproduction devices for playing out loud audio signals received, via the communication hardware, from the remote participants, or human speakers that are not located in the same room. These and other components of a given conferencing environment may be included in one or more conferencing devices and/or operate as part of an audio system.
In general, conferencing devices are available in a variety of sizes, form factors, mounting options, and wiring options to suit the needs of particular environments. The types of conferencing devices, their operational characteristics (e.g., lobe direction, gain, etc.), and their placement in a particular conferencing environment may depend on a number of factors, including, for example, the locations of the audio sources, locations of listeners, physical space requirements, aesthetics, room layout, and/or other considerations. For example, in some environments, a conferencing device may be placed on a table or lectern to be near the audio sources and/or listeners. In other environments, a conferencing device may be mounted overhead or on a wall to capture the sound from, or project sound towards, the entire room, for example.
Typically, a system designer or other professional installer installs an audio system in a given environment or room by manually connecting, testing, and configuring each piece of equipment to ensure optimal performance of the overall system. As an example, when installing microphones, the installer ensures optimal audio coverage of the environment by delineating “audio coverage areas,” which represent the regions in the environment that are designated for capturing audio signals, such as, e.g., speech produced by human speakers. These audio coverage areas then define the spaces where lobes can be deployed by the microphones. A given environment or room can include one or more audio coverage areas, depending on the size, shape, and type of environment. For example, the audio coverage area for a typical conference room may include the seating areas around a conference table, while the audio coverage area for a typical classroom may include the space around a blackboard and/or podium at the front of the room.
Accordingly, there is still a need for an audio system that can be optimally configured and maintained with minimal setup time, cost, and manual effort.
The invention is intended to solve the above-noted and other problems by providing systems and methods that are designed to, among other things: (1) automatically configure audio coverage areas (or “audio pick-up regions”) for an environment using location data obtained over time from one or more audio devices positioned within the environment, (2) dynamically adapt the audio coverage areas as new location data is received, and (3) automatically determine a position of a given audio device relative to another audio device using time-synchronized location data obtained from both audio devices.
One exemplary embodiment includes an audio system comprising: a plurality of microphones disposed in an environment, the plurality of microphones comprising a first subset of microphones and a second subset of microphones, wherein the first subset of microphones is configured to detect one or more audio sources, and generate first location data indicating a location of each of the one or more audio sources relative to the first subset of microphones, and the second subset of microphones is configured to detect the one or more audio sources, and generate second location data indicating the location of each of the one or more audio sources relative to the second subset of microphones; and at least one processor communicatively coupled to the plurality of microphones, wherein the at least one processor is configured to: receive the first location data and the second location data from the plurality of microphones; define a plurality of audio pick-up regions in the environment based on the first location data and the second location data, the plurality of audio pick-up regions comprising a first audio pick-up region and a second audio pick-up region; assign the first audio pick-up region to the first subset of microphones based on a proximity of the first subset of microphones to the first audio pick-up region, the first subset of microphones being configured to deploy a first lobe within the first audio pick-up region; and assign the second audio pick-up region to the second subset of microphones based on a proximity of the second subset of microphones to the second audio pick-up region, the second subset of microphones being configured to deploy a second lobe within the second audio pick-up region.
According to certain aspects, the first subset of microphones is disposed in a first microphone array and the second subset of microphones is disposed in a second microphone array. According to further aspects, the at least one processor is further configured to: receive, from each of the first microphone array and the second microphone array, a timestamp with each set of coordinates included in the first location data and the second location data; based on the timestamp received for each set of coordinates included in the first location data and the second location data, identify a first set of coordinates received from the first microphone array and corresponding to a first point in time, and a second set of coordinates received from the second microphone array and corresponding to the first point in time, wherein the first set of coordinates is located in a first coordinate system associated with the first microphone array, and the second set of coordinates is located in a second coordinate system associated with the second microphone array; apply a transform function to the second set of coordinates, the transform function configured to transform the second set of coordinates into a transformed second set of coordinates located in the first coordinate system; and determine a location of the second microphone array relative to the first microphone array based on the transformed second set of coordinates. According to some aspects, the at least one processor is further configured to determine, based on the relative location of the second microphone array, the proximity of the second microphone array to the second audio pick-up region. According to some aspects, the at least one processor is further configured to calculate the transform function based on the first set of coordinates and the second set of coordinates. According to some aspects, the at least one processor is further configured to determine a location of a first one of the one or more audio sources relative to the first microphone array based on the first set of coordinates and the transformed second set of coordinates.
Another exemplary embodiment includes a method of automatically configuring audio coverage for an environment having a plurality of microphones communicatively coupled to at least one processor, the plurality of microphones including a first subset of microphones and a second subset of microphones, the method comprising: receiving, with at least one processor, first location data from the first subset of microphones, the first location data indicating a location of each of one or more audio sources relative to the first subset of microphones; receiving, with at least one processor, second location data from the second subset of microphones, the second location data indicating the location of each of the one or more audio sources relative to the second subset of microphones; defining, with the at least one processor, a plurality of audio pick-up regions in the environment based on the first location data and the second location data, the plurality of audio pick-up regions comprising a first audio pick-up region and a second audio pick-up region; assigning, with the at least one processor, the first audio pick-up region to the first subset of microphones based on a proximity of the first subset of microphones to the first audio pick-up region, the first subset of microphones being configured to deploy a first lobe within the first audio pick-up region; and assigning, with the at least one processor, the second audio pick-up region to the second subset of microphones based on a proximity of the second subset of microphones to the second audio pick-up region, the second subset of microphones being configured to deploy a second lobe within the second audio pick-up region.
According to certain aspects, the first subset of microphones is disposed in a first microphone array and the second subset of microphones is disposed in a second microphone array. According to further aspects, the method further comprises receiving, with the at least one processor, a timestamp with each set of coordinates included in the first location data and the second location data; based on the timestamp received for each set of coordinates in the first location data and the second location data, identifying, with the at least one processor, a first set of coordinates received from the first microphone array and corresponding to a first point in time, and a second set of coordinates received from the second microphone array and corresponding to the first point in time, wherein the first set of coordinates are located in a first coordinate system associated with the first microphone array, and the second set of coordinates are located in a second coordinate system associated with the second microphone array; applying, with the at least one processor, a transform function to the second set of coordinates, the transform function configured to transform the second set of coordinates into a transformed second set of coordinates located in the first coordinate system; and determining, with the at least one processor, a location of the second microphone array relative to the first microphone array based on the transformed second set of coordinates. According to some aspects, the method further comprises determining, with the at least one processor, the proximity of the second microphone array to the second audio pick-up region based on the relative location of the second microphone array. According to some aspects the method further comprises calculating the transform function based on the first set of coordinates and the second set of coordinates. According to some aspects, the method further comprises determining a location of a first one of the one or more audio sources relative to the first microphone array based on the first set of coordinates and the transformed second set of coordinates.
Another exemplary embodiment includes an audio system comprising a plurality of microphones disposed in an environment, wherein the plurality of microphones is configured to detect one or more audio sources, and generate location data indicating a location of each of the one or more audio sources relative to the plurality of microphones; and at least one processor communicatively coupled to the plurality of microphones, wherein the at least one processor is configured to receive the location data from the plurality of microphones, and define a plurality of audio pick-up regions in the environment based on the location data, the plurality of audio pick-up regions comprising a first audio pick-up region and a second audio pick-up region, wherein the plurality of microphones are configured to deploy a first lobe within the first audio pick-up region and a second lobe within the second audio pick-up region.
These and other embodiments, and various permutations and aspects, will become apparent and be more fully understood from the following detailed description and accompanying drawings, which set forth illustrative embodiments that are indicative of the various ways in which the principles of the invention may be employed.
Existing techniques for setting up audio coverage areas involve complex, manual tasks. For example, the installer must first determine the exact geometry of the environment and the precise locations of all audio sources therein, including each microphone and loudspeaker in the environment and the anticipated positions of all talkers or human speakers. Typically, the installer obtains this information manually, for example, by taking measurements throughout the room. Next, the installer manually positions or points microphone lobes towards locations where talkers are expected to be in a room (e.g., the seats around a conference table), adjusts a beam width of each lobe depending on how many talkers are expected to be in the corresponding area (e.g., narrow for single talkers, or medium or wide to cover multiple talkers by a single lobe), tests each lobe for sufficient clarity and presence and a smooth sound level across the entire lobe (e.g., by sitting in the area and talking while listening to the mixed output via headphones), and confirms that only the expected lobe gates on when talkers are seated in correct positions. These steps may need to be repeated after the initial configurations are complete, for example, in order to adapt to changes in room layout, seated locations, audio connections, and other factors, as these changing circumstances may cause the audio system to become sub-optimal over time.
Systems and methods are provided herein for automatically defining and configuring one or more audio coverage areas for an environment to optimally capture audio sources in the environment using a plurality of microphones. The plurality of microphones may be microphone elements or transducers included in a single microphone array, in a plurality of microphone arrays, and/or in one or more other audio devices. Each audio coverage area defines a region in which a given microphone array, or other audio input device, is able to deploy lobes for picking up sound from the audio sources. In some embodiments that include multiple audio coverage areas, the audio coverage areas can be adjacent regions configured to cover the audio sources without overlapping with each other. In embodiments that include multiple microphone arrays, each audio coverage area can be assigned to a specific microphone array, for example, depending on proximity to the audio source. In some embodiments, the audio coverage areas can be used to establish sound zones for voice-lift or other sound reinforcement applications. The plurality of microphones may be part of a larger audio system that is used to facilitate a conferencing operation (such as, e.g., a conference call, telecast, webcast, etc.) or other audio/visual event. The audio system may be configured as an ecosystem comprised of a plurality of audio devices and a computing device that is in communication with each of the audio devices, for example, using a common communication protocol. The audio devices in the audio system may include the plurality of microphones, at least one speaker, and/or one or more conferencing devices. In various embodiments, the computing device comprises at least one processor configured to automatically define the one or more audio coverage areas for the environment using location data (e.g., sound localization data) obtained over time from two or more of the microphones in the audio system. In some embodiments, the at least one processor is also configured to dynamically adapt or re-configure the audio coverage areas as new location data is received from the audio devices. In some embodiments, the at least one processor is further configured to automatically determine a position of a given audio device in the environment using time-synchronized location data received from the same audio device and at least one other audio device in the environment.
Thus, the above techniques, and others described herein, enable an installer to set up and configure audio coverage areas for a given environment, or room, with minimal effort and increased efficiency. For example, as mentioned above, typical room installation methods require manually setting up the audio coverage areas of a room by measuring the precise location of each microphone in the room, the distance from the microphone to a conference table or chair, and other specifications of the room. Moreover, every time the room layout changes, for example, due to changes in seating and/or table arrangement, the installer must repeat these manual tasks to create new audio coverage areas for the new layout. In contrast, the techniques described herein provide improved audio systems and methods for automatically defining and configuring audio coverage areas for the room, so as to require little to no manual measurements or inputs by the installer. For example, once the audio devices are mounted in the room and connected to the system, the installer need only provide sounds in the intended audio pick-up regions over a period of time and the audio system handles the rest, within a fraction of the time. Specifically, the audio system can detect the provided sounds using its microphones, create a “heat map” of the locations of those sounds over the period of time using localization data obtained from the microphones, and define audio coverage areas for the room based on the sound locations in the heat map, all within a matter of minutes. Furthermore, the techniques described herein can be used to identify and remove any spurious and/or erroneous localization data, or other outliers that may be the result of reverb or other undesirable audio effects in the room, thus improving an accuracy of the audio coverage areas. In addition, the systems and methods described herein can automatically configure the audio coverage areas to avoid noise sources in the room and/or loudspeakers used to play far-end audio or other audio signals within the room, thus improving audio performance and acoustic echo cancellation operation of the audio system. Moreover, since little to no manual measurements are required, the techniques described herein can be used to automatically reconfigure or adjust the audio coverage areas as the locations of the audio sources, and/or noise sources, change over time, for example, due to movement of the microphones or other audio devices, changes in room configuration (e.g., re-arrangement of seating, tables, podiums, and other furniture), and the like.
Referring now to
Starting with
The conferencing environment 100 further includes a plurality of microphones 106 for detecting and capturing sound from the audio sources, such as, for example, speech spoken by the human speakers situated in the conferencing environment 100 (e.g., near-end conference participants seated around the table 104), music or other sounds generated by the human speakers, and other near-end sounds associated with the conferencing event. In some embodiments, all or some of the microphones 106 may be disposed in a single microphone array or other audio device, for example, as shown in
Other sounds may also be present in the environment 100 which may be undesirable, such as noise from ventilation, other persons, audio/visual equipment, electronic devices, etc. For example,
The conferencing environment 100 can also include a presentation unit 112 for displaying video, images, or other content associated with the conferencing event, such as, for example, a live video feed of the remote conference participants, a document being presented or shared by one of the participants, a video or film being played as part of the event, etc. In some embodiments, the presentation unit 112 may be a smart board or other interactive display unit. In other embodiments, the presentation unit 112 may be a television, computer monitor, or any other suitable display screen. In still other embodiments, the presentation unit 112 may be a chalkboard, whiteboard, or the like. The presentation unit 112 may be attached to one of the walls, as shown in
As illustrated in
Though not shown, in various embodiments, one or more components of the environment 100 may be combined into one device. For example, in some embodiments, at least one of the microphones 106 and at least one of the speakers 108 may be included in a single device, such as, e.g., a conferencing device or other audio hardware. As another example, in some embodiments, at least one of the speakers 108 and/or at least one of the microphones 106 may be included in the presentation unit 112. In some embodiments, at least one of the microphones 106 and at least one of the speakers 108 may be included in the computing device 114, for example, as native microphone(s) and/or speaker(s) of the computing device 114. It should be appreciated that the conferencing environment 100 may include other devices not shown in
In embodiments, the computing device 114, the plurality of microphones 106, and the one or more speakers 108 form an audio system (such as, e.g., audio system 400 shown in
The audio system may reach the above determination using the plurality of microphones 106 and the computing device 114. For example, the plurality of microphones 106 can be configured to detect one or more of the audio sources and generate location data (also referred to as “sound localization data”) that indicates a position of each audio source relative to the microphones 106. In embodiments, the microphones 106 may include localization software (e.g., localization module 422 shown in
According to various embodiments, the localization coordinates may be Cartesian or rectangular coordinates that represent a location point in three dimensions, or x, y, and z values. For example, the location data may include a first set of coordinates (x1, y1, z1) that represents a location of a first audio source relative to a first subset of the microphones 106 (e.g., two or more microphones included within a given microphone array or other audio device) and a second set of coordinates (x2, y2, z2) that represents a location of the first audio source relative to a second subset of the microphones 106 (e.g., two or more other microphones included within the same microphone array or in a second microphone array or other audio input device). In some cases, the localization coordinates may be converted to polar or spherical coordinates, i.e. azimuth (phi), elevation (theta), and radius (r), for example, using a transformation formula, as is known in the art. The spherical coordinates may be used in various embodiments to determine additional information about the audio system, such as, for example, a distance between the audio source and a given microphone array and/or a distance between two microphone arrays (e.g., as described herein with respect to
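As a non-limiting illustration, the following Python sketch shows one way the Cartesian-to-spherical conversion described above might be implemented. The axis convention (azimuth measured in the x-y plane from the +x axis, elevation measured up from that plane) and the function name are assumptions made for illustration only; any consistent convention could be used.

```python
import math

def cartesian_to_spherical(x, y, z):
    """Convert a localization point (x, y, z) into (azimuth, elevation, radius).

    Assumed convention: azimuth (phi) is measured in the x-y plane from the +x
    axis, elevation (theta) is measured up from the x-y plane, and radius (r) is
    the straight-line distance from the origin (e.g., the center of a microphone
    array) to the localized audio source.
    """
    r = math.sqrt(x * x + y * y + z * z)
    phi = math.atan2(y, x)                    # azimuth, in radians
    theta = math.atan2(z, math.hypot(x, y))   # elevation, in radians
    return phi, theta, r

# Example: direction and distance from an array at the origin to a detected talker.
phi, theta, r = cartesian_to_spherical(1.2, -0.8, -1.5)
print(f"azimuth={math.degrees(phi):.1f} deg, "
      f"elevation={math.degrees(theta):.1f} deg, distance={r:.2f} m")
```

In such a sketch, the radius value is what provides the source-to-array distance noted above.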
In some embodiments, the location data also includes a timestamp or other timing information that indicates the time at which each set of coordinates was generated by the microphones 106, an order in which the coordinates were generated, and/or any other information that helps identify coordinates that were generated simultaneously, or nearly simultaneously, for the same audio source. In some embodiments, the microphones 106 may have synchronized clocks (e.g., using Network Time Protocol or the like). In other embodiments, the timing, or simultaneous output, of the coordinates may be determined using other techniques, such as, for example, setting up a time-synchronized data channel for transmitting the localization coordinates from the microphones 106 to the computing device 114 and more.
The computing device 114 can be configured to aggregate or receive the location data from the plurality of microphones 106 over a period of time, and define the audio coverage area 116 based on the received location data. In particular, the computing device 114 can be configured to perform various techniques to identify localization coordinates corresponding to the detected audio sources within the location data, identify one or more clusters, or groupings of closely-adjacent localization coordinates, for example, using a heat map of the localization coordinates (e.g., as shown in
Upon applying these techniques to the environment 100, for example, the computing device 114 may define the audio coverage area 116 shown in
Once the audio coverage area 116 is defined and refined, the audio system may transition from an adaptation (or set-up) phase to a usage phase. In the usage phase, the audio system may set or implement the audio coverage area 116 by deploying microphone lobes in the region defined by the audio coverage area 116. For example, in some embodiments, the computing device 114 may be configured to instruct or cause the plurality of microphones 106 to deploy appropriate lobes in the audio coverage area 116. In other embodiments, the computing device 114 may send information about the audio coverage area 116 (e.g., information describing or defining the boundaries of the area 116) to the audio device(s) that include the microphones 106, and the audio device(s) can be configured to deploy the appropriate microphone lobes within the audio coverage area 116 accordingly. In either case, the microphone lobes may be deployed by providing a set of coordinates that are associated with the desired audio coverage area to a beamformer configured to direct a microphone lobe toward the specified coordinates. In various embodiments, the beamformer may be included in the audio system as part of the computing device 114, as part of one or more of the audio devices that include the microphones 106, as a standalone device that is in communication with the computing device 114 and the microphones 106, or any combination thereof. The beamformer may include any type of beamforming algorithm or other beamforming technology configured to deploy microphone lobes, including, for example, a delay and sum beamforming algorithm, a minimum variance distortionless response (“MVDR”) beamforming algorithm, and more.
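For illustration, the following Python sketch steers a minimal delay-and-sum beamformer toward a set of target coordinates, in the spirit of the lobe deployment described above. The microphone geometry, sample rate, speed of sound, and whole-sample delay rounding are simplifying assumptions; a deployed beamformer (e.g., an MVDR design) would be considerably more sophisticated.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature (assumed)

def delay_and_sum(signals, mic_positions, target_xyz, fs):
    """Steer a lobe toward target_xyz with a basic delay-and-sum beamformer.

    signals       -- array of shape (num_mics, num_samples), time-aligned capture
    mic_positions -- array of shape (num_mics, 3), element coordinates in meters
    target_xyz    -- (x, y, z) coordinates of the desired pick-up point
    fs            -- sample rate in Hz

    Delays are rounded to whole samples and applied with a circular shift for
    brevity; a practical implementation would use fractional-delay filters or
    frequency-domain weights.
    """
    mic_positions = np.asarray(mic_positions, dtype=float)
    target = np.asarray(target_xyz, dtype=float)
    distances = np.linalg.norm(mic_positions - target, axis=1)   # meters
    delays = (distances - distances.min()) / SPEED_OF_SOUND      # seconds
    shifts = np.round(delays * fs).astype(int)                   # samples

    num_mics, num_samples = signals.shape
    out = np.zeros(num_samples)
    for m in range(num_mics):
        # Advance each channel so arrivals from the target align, then average.
        out += np.roll(signals[m], -shifts[m])
    return out / num_mics

# Example with a small square array and a synthetic target point.
mics = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.1, 0.1, 0.0]]
capture = np.random.randn(4, 4800)  # placeholder for real microphone samples
lobe_output = delay_and_sum(capture, mics, target_xyz=(1.5, 0.5, -1.0), fs=48_000)
```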
In some embodiments, implementation of the audio coverage area 116, and corresponding deployment of the appropriate microphone lobes, may occur automatically, for example, once a threshold number of localization points have been collected and analyzed, or other criteria has been met. In other embodiments, the audio system may include a button, switch, touchscreen, or other user input device for enabling a user (or installer) to enter an input for implementing the audio coverage area 116, or otherwise indicate the end of a set-up or adaptation mode and/or the start of a normal use mode of the audio system. As an example, the user input device may be included on the microphone array that includes the microphones 106, in the computing device 114 (e.g., as part of the user interface), or as a standalone device disposed within the environment 100 and communicatively coupled to the audio system.
The environment 200 also includes multiple components that may be substantially similar to corresponding components of the conferencing environment 100 shown in
As shown in
Moreover, like the computing device 114, the computing device 214 can be configured to select or define an overall size and shape of each of the audio coverage areas 216 and 218 according to a size and shape of the corresponding cluster, as well as general shape requirements for audio coverage areas (e.g., a requirement that each area be shaped as a square, rectangle, circle, oval, triangle, hexagon or other polygon, or any other shape), thus ensuring optimal coverage of the audio sources and allowing for better audio control and audio performance. For example, in
In addition, the computing device 214 can be further configured to optimize the audio coverage areas 216 and 218 in order to improve acoustic echo cancellation (AEC) operation and overall audio performance. For example, the computing device 214 may be configured to adjust or configure the size and shape of one or more of the audio coverage areas 216 and 218 based on the locations of nearby loudspeakers 208 (which may be used for playing far-end audio), noise source 210 (which may emit undesirable noise), and/or any other sounds in the environment 200 that should not be picked up by the microphones 206. In
Thus, the audio system of the environment 200 can be configured to automatically provide optimal audio coverage of the audio sources disposed around the table 204a. Once the above-described set-up or adaptation mode is complete, the audio system may implement the audio coverage areas 216 and 218 and begin operating in a normal use mode, similar to the audio system of the conferencing environment 100.
The environment 300 also includes multiple components that may be substantially similar to corresponding components of the conferencing environment 100 shown in
As shown in
Like the computing devices 114 and 214, the computing device 314 can be configured to define an overall size and shape of each of the audio coverage areas 316, 318, and 320, according to a size and shape of the corresponding cluster, as well as general shape requirements for audio coverage areas (e.g., a requirement for each area to be shaped as a square, rectangle, circle, oval, triangle, hexagon or other polygon, or any other shape), to ensure optimal coverage of the audio sources and allow for better audio control and optimal audio performance. In addition, like the computing device 214, the computing device 314 can be further configured to optimize the audio coverage areas 316, 318, and 320 by adjusting the size and shape of the areas to avoid overlap with the locations of any nearby loudspeakers 308, noise source 310, and/or any other sounds that might degrade acoustic echo cancellation (AEC) operation and other audio performance metrics if picked up by the microphone lobes. In some embodiments, the computing device 314 is also configured to refine an accuracy of the audio coverage areas 316, 318, and 320 by identifying any outlier or isolated location points within the clusters and removing those outlier(s) from the corresponding cluster (e.g., outliers 710 shown in
In embodiments that include multiple microphone arrays, for example, as shown in
Once the positions, or relative positions, of the arrays 306 are determined, the audio system can be further configured to assign each audio coverage area to a given one of the microphone arrays based on a proximity of the array to the area. For example, in
Thus, the two microphone arrays 306 can be advantageously employed to provide optimal audio coverage of the audio sources disposed at or around the tables 304. Once the above-described set-up or adaptation mode is complete, the audio system of the environment 300 may implement the audio coverage areas 316, 318, and 320 and begin operating in a normal use mode, like the audio system of the conferencing environment 100.
In various embodiments, the audio system included in each of the environments 100, 200, and 300 (i.e., as shown in
In some embodiments, the computing device 402 can be physically located in and/or dedicated to the given environment or room, for example, as shown in
As shown in
Processor 410 executes instructions retrieved from the memory 412. In embodiments, the memory 412 stores one or more software programs, or sets of instructions, that embody the techniques described herein. When executed by the processor 410, the instructions may cause the computing device 402 to implement or operate all or parts of the techniques described herein, one or more components of the audio system 400, and/or methods, processes, or operations associated therewith, such as, e.g., process 500 shown in
In general, the computing device 402 may be configured to control and communicate or interface with the other hardware devices included in the audio system 400, such as the conferencing device 404, the loudspeaker 406, the microphone 408, and any other devices in the same network. The computing device 402 may also control or interface with certain software components of the audio system 400, such as, for example, a localization module 422 installed or included in one or more of the conferencing device 404 and the microphone 408, in order to receive sound localization coordinates or other location data collected by the audio devices. For example, in some embodiments, the computing device 402 may operate as an aggregator configured to aggregate or collect location data from the appropriate audio devices. In addition, the computing device 402 may be configured to communicate or interface with external components coupled to the audio system 400 (e.g., remote servers, databases, and other devices). For example, the computing device 402 may interface with a component graphical user interface (GUI or CUI) associated with the audio system 400 and any existing or proprietary conferencing software. In addition, the computing device 402 may support one or more third-party controllers and in-room control panels (e.g., volume control, mute, etc.) for controlling one or more of the audio devices in the audio system 400.
Communication interface 414 may be configured to allow the computing device 402 to communicate with one or more devices (or systems) according to one or more protocols, including the above-described communications and protocols. In some embodiments, the communication interface 414 includes one or more wired communication interfaces, such as, for example, an Ethernet port, a high-definition serial-digital-interface (HD-SDI), an audio network interface with universal serial bus (ANI-USB), a high definition media interface (HDMI) port, a USB port, or an audio port (e.g., a 3.5 mm jack, lightning port, etc.). In some embodiments, the communication interface 414 includes one or more wireless communication interfaces, such as, for example, a broadband cellular communication module (e.g., to support 4G technology, 5G technology, or the like), a short-range wireless communication module (e.g., to support Bluetooth technology, Radio Frequency Identification (RFID) technology, Near Field Communication (NFC) technology, or the like), a long-range wireless communication module (e.g., to support Wi-Fi technology or other Internet connection), or any other type of wireless communication module. In some embodiments, communication interface 414 may enable the computing device 402 to transmit information to and receive information from one or more of the conferencing device 404, the loudspeaker 406, and the microphone 408, or other component(s) of the audio system 400. Such information may include, for example, location data (e.g., sound localization coordinates), audio coverage area assignments and parameters (or boundaries), lobe or pick-up pattern information, and more.
In various embodiments, the components or devices of the audio system 400 can use a common communication protocol (or “language”) in order to communicate and convey location data and other information. For example, each component of the audio system 400 may include a communication interface that is similar to, or compatible with, the communication interface 414. In addition, one or more of the audio devices (e.g., conferencing device 404 and/or microphone 408) may include the localization module 422, which is configured to generate localization coordinates for detected audio sources and transmit the coordinates and/or other location data to the computing device 402 via the communication interface 414. In this manner, the components of the audio system 400 can be configured to form a network in which the common communication protocol is used for intra-network communication, including, for example, sending, receiving, and interpreting messages. The common communication protocol can be configured to support direct one-to-one communications between the computing device 402 and each of the other components or devices of the audio system 400 (e.g., conferencing device 404, loudspeaker 406, and/or microphone 408) by providing a specific application programming interface (“API”) to each device. The API may be specific to the device and/or to the function or type of information being gathered from the device via the API. In the illustrated embodiment, for example, an API may be included in the localization module 422 that is installed in each of the conferencing device 404 and the microphone 408.
User interface 416 may facilitate interaction with a user of the computing device 402 and/or audio system 400. As such, the user interface 416 may include input components such as a keyboard, a keypad, a mouse, a touch-sensitive panel, a microphone, and a camera, and output components such as a display screen (which, for example, may be combined with a touch-sensitive panel), a sound speaker, and a haptic feedback system. The user interface 416 may also comprise devices that communicate with inputs or outputs, such as a short-range transceiver (RFID, Bluetooth, etc.), a telephonic interface, a cellular communication port, a router, or other types of network communication equipment. The user interface 416 may be internal to the computing device 402, or may be external and connected wirelessly or via connection cable, such as through a universal serial bus port. In some embodiments, the user interface 416 may include a button, touchscreen, or other input device for receiving a user input for implementing the audio coverage areas defined by the computing device 402, or to otherwise indicate the end of an adaptation or set-up mode of the audio system 400 and/or the beginning of a normal use mode of the audio system 400, as described herein.
Conferencing device 404 may be any type of audio hardware that comprises microphones and/or speakers for facilitating a conference call, webcast, telecast, or other meeting or event. For example, the conferencing device 404 may include, but is not limited to, SHURE MXA310, MX690, MXA910, MXA710, Microflex Wireless, and Microflex Complete Wireless, and the like. In embodiments, the conferencing device 404 may include one or more microphones for capturing near-end audio signals produced by conference participants situated in the conferencing environment (e.g., seated around a conference table). For example, the conferencing device 404 may include a plurality of microphones arranged as an array (i.e. a microphone array), or the like. The conferencing device 404 may also include one or more speakers for broadcasting far-end audio signals received from conference participants situated remotely but connected to the conference through third-party conferencing software or other far-end audio source. In various embodiments, the conferencing device 404 may be a network audio device that is coupled to the computing device 402 via a network cable (e.g., Ethernet) and configured to handle digital audio signals. In other embodiments, the conferencing device 404 may be an analog audio device or another type of digital audio device. While the illustrated embodiment shows one conferencing device 404, it should be appreciated that the audio system 400 may include multiple conferencing devices 404 in other embodiments, for example, as shown in
Loudspeaker 406 may be any type of audio speaker, speaker system, or other audio output device for audibly playing audio signals associated with the conference call, webcast, telecast, or other meeting or event. For example, the loudspeaker 406 may include, but is not limited to, SHURE MXN5W-C and the like. In embodiments, the loudspeaker 406 may be configured to play far-end audio signals associated with the conference call or other event, or sounds produced by the far-end participants of the event (i.e., those not physically present in the conferencing room). In some embodiments, the loudspeaker 406 may be a standalone network audio device that includes a speaker, or a native speaker built into a computer, laptop, tablet, mobile device, or other computing device in the audio system 400. In other embodiments, the loudspeaker 406 may be a loudspeaker coupled to the computing device 402 using a wireless or wired connection (e.g., via a Universal Serial Bus (“USB”) port, an HDMI port, a 3.5 mm jack, a lightning port, or other audio port). In some cases, the loudspeaker 406 includes a plurality of audio drivers arranged in an array (i.e., a speaker array). While the illustrated embodiment shows one loudspeaker 406, it should be appreciated that the audio system 400 may include multiple loudspeakers 406 in other embodiments, for example, as shown in
Microphone 408 may be any type of microphone, including one or more microphone transducers (or elements), a microphone array, or other audio input device capable of capturing speech and other sounds associated with the conference call, webcast, telecast, or other meeting or event. For example, the microphone 408 may include, but is not limited to, SHURE MXA310, MX690, MXA910, and the like. In embodiments, the microphone 408 may be configured to capture near-end audio associated with the conference call or other event, or sounds produced by the near-end participants of the event (i.e., those located in the conferencing room). In some embodiments, the microphone 408 may be a standalone network audio device or a native microphone built into a computer, laptop, tablet, mobile device, or other computing device in the audio system 400. In other embodiments, the microphone 408 may be a microphone coupled to the computing device 402 using a wireless or wired connection (e.g., via a Universal Serial Bus (“USB”) port, an HDMI port, a 3.5 mm jack, a lightning port, or other audio port). In some cases, the microphone 408 includes a plurality of microphone transducers arranged in an array (i.e., a microphone array), for example, like the microphone arrays 306a and 306b in
All or portions of the process 500 may be performed by one or more processors and/or other processing devices (e.g., analog to digital converters, encryption chips, etc.) that are within or external to the audio system, including the processor in communication with the plurality of microphones. In addition, one or more other types of components (e.g., memory, input and/or output devices, transmitters, receivers, buffers, drivers, discrete components, logic circuits, etc.) may also be used in conjunction with the processors and/or other processing components to perform any, some, or all of the steps of the process 500. For example, the process 500 may be carried out by a computing device (e.g., computing device 402 of
As shown in
In embodiments, the location data may include successive sound localization coordinates generated over time by localization software (e.g., localization module 422 of
The localization coordinates can also include a timestamp (“T”) or other timing component to provide a time reference for each localization, or otherwise indicate the time or order in which the coordinates were obtained or determined by the corresponding microphones. As an example, the timestamp may be attached to the set of coordinates by creating a quad coordinate (x, y, z, T) (e.g., a 4-tuple). In some embodiments, the location data may be collected during a specific time period, such as while the audio system is operating in a setup mode or other finite period of time. In other embodiments, the location data may be continuously collected or received from the plurality of microphones in order to support an ongoing or constant adaptation mode of the audio system, as described herein. In some embodiments, the timestamp associated with each set of coordinates may be used to determine which localization coordinates are relatively newer or more recently received, for example, where the processor is configured to use localization coordinates that are less than T seconds (or minutes) old to define an audio coverage area. In such cases, the processor may discard any localization coordinates that are older than and/or equal to T seconds, for example.
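A minimal sketch of how such timestamped localizations (the quad coordinate noted above) could be represented and filtered by age is shown below; the field names, data structure, and five-minute cutoff are illustrative assumptions rather than values prescribed by this disclosure.

```python
import time
from typing import List, NamedTuple

class LocalizationPoint(NamedTuple):
    """One sound localization result: coordinates plus a capture timestamp."""
    x: float
    y: float
    z: float
    t: float  # seconds since the epoch (assumes synchronized device clocks)

def recent_points(points: List[LocalizationPoint], max_age_s: float) -> List[LocalizationPoint]:
    """Keep only localizations newer than max_age_s; stale points are discarded."""
    cutoff = time.time() - max_age_s
    return [p for p in points if p.t > cutoff]

# Example: ignore localizations older than five minutes when (re)defining coverage areas.
history = [
    LocalizationPoint(1.2, 0.4, -1.5, time.time() - 30),    # 30 seconds old -> kept
    LocalizationPoint(2.1, -0.3, -1.4, time.time() - 900),  # 15 minutes old -> dropped
]
usable = recent_points(history, max_age_s=300)
```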
In some embodiments, the data received from the plurality of microphones also includes information about the type of audio source detected, such as, for example, whether the detected audio is far-end audio or near-end audio, or whether the detected sounds are voice sounds or noise sounds. Once this determination is made, the plurality of microphones may use a pre-established code (e.g., Voice, Noise, Combo, Far-end, Near-end, etc.) to indicate the type of audio in the location data. Such audio type codes may be determined and/or applied using various voice activity detection techniques, classification techniques, etc. In various embodiments, the plurality of microphones can be configured to identify the type of audio by using a voice activity detector (“VAD”) included in the localization module or otherwise accessible to a processor associated with the microphones, or other suitable technique. The voice activity detector may be configured to analyze the structure of a detected audio signal and use energy estimation techniques, zero-crossing count techniques, cepstrum techniques, machine-learning or artificial intelligence methods, cues from video associated with the event, or any other suitable technique to identify or differentiate voice sounds and noise sounds, and/or to identify or differentiate far-end audio and near-end audio. As an example, with respect to near-end audio, the voice activity detector may differentiate between stationary noise and near-end speech or voice based on the energy of the signal, or may differentiate between non-stationary noise and near-end speech using cepstrum techniques, machine-learning or artificial intelligence methods, or the like. In some cases, the localization module, or other processor, may differentiate between far-end audio and near-end audio by comparing the detected audio signal to a far-end reference signal received from the computing device, the processor included therein, the loudspeaker, or other component of the audio system.
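The snippet below sketches only the simplest of the techniques listed above, an energy-based voice activity check that compares per-frame energy against an estimated noise floor. The frame length, percentile, and threshold ratio are illustrative tuning values, and a real detector would add zero-crossing, cepstral, or model-based cues as described.

```python
import numpy as np

def frame_energies(samples, frame_len=480):
    """Split a mono signal into frames and return per-frame RMS energy."""
    usable = len(samples) - (len(samples) % frame_len)
    frames = samples[:usable].reshape(-1, frame_len)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def simple_vad(samples, frame_len=480, threshold_ratio=3.0):
    """Flag frames as voice when their energy exceeds a multiple of the noise floor.

    The noise floor is estimated as the 10th-percentile frame energy; both the
    percentile and threshold_ratio are illustrative tuning parameters.
    """
    energies = frame_energies(samples, frame_len)
    noise_floor = np.percentile(energies, 10)
    return energies > threshold_ratio * noise_floor  # boolean mask, one entry per frame

# Example: tag each 10 ms frame (480 samples at 48 kHz) as voice or noise.
audio = np.random.randn(48_000) * 0.01
audio[10_000:20_000] += np.sin(2 * np.pi * 200 * np.arange(10_000) / 48_000)
voice_frames = simple_vad(audio)
```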
In embodiments where the plurality of microphones are included in more than one microphone array and the location data is received from two or more microphone arrays, the process 500 may further include, for example, transforming the localization coordinates received from the second microphone array into a coordinate system of the first microphone array, or otherwise converting the received coordinates to a common coordinate system, so that the position of each detected audio source can be represented in the same coordinate system. As an example, a coordinate-transform-matrix may be used to transform localization coordinates into the common coordinate system. Such transformation may be carried out by the computing device and/or processor once the position of each microphone array within the environment is known, for example, using the process 800 shown in
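One way such a coordinate transform could be obtained and applied is sketched below, assuming a rigid rotation-plus-translation relationship between the two arrays and at least three non-collinear, time-synchronized localizations of the same sources. The Kabsch (Procrustes) solution used here is one of several suitable estimation methods, and the function names are illustrative.

```python
import numpy as np

def estimate_rigid_transform(points_b, points_a):
    """Estimate rotation R and translation t mapping array-B coordinates into
    array-A coordinates, from time-synchronized localizations of the same sources.

    points_b, points_a -- arrays of shape (N, 3), N >= 3, matched row-for-row by timestamp.
    Uses the Kabsch/Procrustes solution (SVD of the cross-covariance matrix).
    """
    points_a = np.asarray(points_a, dtype=float)
    points_b = np.asarray(points_b, dtype=float)
    centroid_a = points_a.mean(axis=0)
    centroid_b = points_b.mean(axis=0)
    H = (points_b - centroid_b).T @ (points_a - centroid_a)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = centroid_a - R @ centroid_b
    return R, t

def to_common_frame(points_b, R, t):
    """Transform array-B localizations into array A's coordinate system."""
    return (np.asarray(points_b, dtype=float) @ R.T) + t

# Example: transforming array B's own origin yields B's position expressed in A's frame,
# i.e., the location of the second array relative to the first.
R, t = estimate_rigid_transform(
    points_b=[[0, 1, -1], [1, 0, -1], [1, 1, -1.2], [0.5, 0.5, -0.8]],
    points_a=[[2, 1, -1], [3, 0, -1], [3, 1, -1.2], [2.5, 0.5, -0.8]],
)
array_b_position_in_a = to_common_frame([[0.0, 0.0, 0.0]], R, t)[0]
```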
Step 504 comprises defining, using the processor, a plurality of audio pick-up regions (or audio coverage areas) in the environment based on the received location data. Each audio pick-up region may define an area in which at least one of the one or more audio sources is located. In some embodiments, the plurality of audio pick-up regions comprises a first audio pick-up region and a second audio pick-up region that does not overlap with the first audio pick-up region. For example, the first audio pick-up region may be located adjacent to the second audio pick-up region without overlapping each other (i.e. non-overlapping). In some cases, the two audio pick-up regions may be adjoining, or share a boundary, for example, like the audio coverage areas 216 and 218 shown in
In embodiments, defining the audio pick-up regions at step 504 comprises identifying clusters of adjacent localization coordinates (or location points) within the received location data, and forming a respective audio pick-up region around each cluster. In some embodiments, a clustering algorithm may be used to identify a group of adjacent localization coordinates based on the location data. For example, the clustering algorithm may include a k-means clustering algorithm, a centroid-based clustering algorithm, a density-based clustering algorithm, a grid-based clustering algorithm, any other suitable clustering technique, or any combination thereof. Each group of coordinates may be divided into one or more clusters depending on a size of the group (e.g., a distance from the center of the group to an outer edge of the group), a location of the group relative to the plurality of microphones, a proximity of the group to the audio source, and/or other factors. An audio pick-up region can then be formed, or identified, around each cluster.
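As one concrete example of the density-based option mentioned above, the sketch below groups localization coordinates with DBSCAN from scikit-learn; the eps and min_samples values are illustrative tuning parameters, not values prescribed by this disclosure.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_localizations(points_xyz, eps_m=0.75, min_samples=10):
    """Group localization coordinates into clusters of nearby points.

    points_xyz  -- array of shape (N, 3) of localization coordinates in meters
    eps_m       -- neighborhood radius: points within this distance are "adjacent"
    min_samples -- minimum number of neighbors for a point to anchor a cluster

    Returns a dict mapping cluster id -> (N_i, 3) array of member points. DBSCAN
    labels sparse, isolated points as -1 (noise); those are dropped here, which
    also serves as a first pass of outlier removal.
    """
    points = np.asarray(points_xyz, dtype=float)
    labels = DBSCAN(eps=eps_m, min_samples=min_samples).fit(points).labels_
    return {cid: points[labels == cid] for cid in set(labels) if cid != -1}

# Example: two seating areas produce two clusters; stray reverb points are ignored.
rng = np.random.default_rng(0)
seating_a = rng.normal([1.0, 2.0, -1.5], 0.3, size=(60, 3))
seating_b = rng.normal([4.0, 2.0, -1.5], 0.3, size=(60, 3))
stray = rng.uniform(-1, 6, size=(5, 3))
clusters = cluster_localizations(np.vstack([seating_a, seating_b, stray]))
```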
In some embodiments, step 504 further includes determining that a given localization coordinate corresponds to a desired audio type before using that coordinate to define audio pick-up regions. For example, the processor may define the plurality of pick-up regions using only the localization coordinates that are identified as voice audio, near-end audio, or other desired audio (e.g., by the audio type codes received with the location data), and may ignore or disregard any localization coordinates that are identified as noise, far-end audio, a combination of voice and noise audio, a combination of near-end and far-end audio, or other undesired audio.
In some embodiments, defining the audio pick-up regions further comprises identifying one or more isolated or outlier location points within the received location data, and removing each outlier location point from the corresponding cluster, prior to creating the audio pick-up regions. As used herein, the term “outlier” refers to location points that may be grouped with a cluster initially but are significantly distant, or isolated, from the other location points, or localization coordinates, in the cluster. For example, a location point may be considered an outlier if it is more than a predetermined distance (e.g., 2 meters (m), 2.5 m, 3 m, etc.) away from the center of any other cluster. In some embodiments, a given location point or set of points may be initially identified as the start of a cluster, but if that cluster does not grow in density over time, the corresponding location point(s) may be re-classified as outlier(s). In some cases, the outliers may be the result of localization error, such as, for example, a consequence of reverb. In other cases, the outliers may represent spurious audio signals detected by the microphones due to other error. Such outliers may skew a shape and/or size of the corresponding cluster if not removed, which may result in less-than-optimal coverage of the audio sources. Thus, the processor can be configured to optimize the one or more clusters by identifying and removing any isolated or spurious location points from the clusters.
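The following sketch illustrates this pruning step using a fixed distance-from-centroid threshold (one of the example distances mentioned above); the threshold value and function name are illustrative.

```python
import numpy as np

def prune_outliers(cluster_points, max_distance_m=2.5):
    """Drop points that sit too far from the centroid of their cluster.

    cluster_points  -- array of shape (N, 3) of localization coordinates in one cluster
    max_distance_m  -- illustrative threshold; example values above are roughly 2-3 m

    Returns the retained points and the removed outliers. Such outliers are often
    spurious localizations caused by reverberation or other error and, if kept,
    would skew the size and shape of the coverage area formed around the cluster.
    """
    points = np.asarray(cluster_points, dtype=float)
    centroid = points.mean(axis=0)
    distances = np.linalg.norm(points - centroid, axis=1)
    keep = distances <= max_distance_m
    return points[keep], points[~keep]

# Example: one stray reverb localization is removed from an otherwise tight cluster.
cluster = np.array([[1.0, 2.0, -1.5], [1.2, 2.1, -1.5], [0.9, 1.8, -1.4], [5.5, 2.0, -1.5]])
kept, outliers = prune_outliers(cluster)
```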
To help illustrate the above techniques,
As shown in
In embodiments, the clusters 606 and 608 may be optimized by identifying and removing any isolated or spurious coordinates within the location points 602. For example, as shown in
Referring back to
The overall size and shape of each audio pick-up region may also be determined based on other criteria, e.g., in addition to the size and shape of the corresponding cluster. For example, the audio system may have a preset shape requirement for all audio pick-up regions that may be stored in memory or otherwise accessible to the processor, such as, e.g., a requirement that each area be shaped as a square, rectangle, circle, oval, triangle, hexagon or other polygon, or any other shape. As another example, the size and shape of each audio pick-up region may be determined based on the presence of other regions and/or a known shape of the room or environment. The size and shape of each audio pick-up region may also be selected in order to minimize the total number of audio pick-up regions used for the environment and maximize coverage of the audio sources detected in the environment. For example, the audio pick-up regions may be placed adjacent to each other, or with adjoining boundaries that do not overlap, in order to make sure each audio source is covered by only one of the audio pick-up regions.
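For illustration, the sketch below forms a rectangular (plan-view) pick-up region around a pruned cluster by padding its bounding box and enforcing a minimum side length. The rectangular shape, margin, and minimum size are assumptions made for this example; the disclosure equally allows circles, hexagons, and other preset shapes and constraints.

```python
import numpy as np

def region_from_cluster(cluster_points, margin_m=0.5, min_side_m=1.0):
    """Form a rectangular (plan-view) audio pick-up region around one cluster.

    The region is the axis-aligned bounding box of the cluster's x-y footprint,
    expanded by a margin and padded to a minimum side length. Returns the region
    boundaries as a simple dictionary.
    """
    xy = np.asarray(cluster_points, dtype=float)[:, :2]
    lo = xy.min(axis=0) - margin_m
    hi = xy.max(axis=0) + margin_m
    for axis in range(2):  # pad any side that is narrower than the minimum
        shortfall = min_side_m - (hi[axis] - lo[axis])
        if shortfall > 0:
            lo[axis] -= shortfall / 2
            hi[axis] += shortfall / 2
    return {"x_min": lo[0], "y_min": lo[1], "x_max": hi[0], "y_max": hi[1]}

# Example: a tight cluster of talker localizations becomes a padded rectangle.
region = region_from_cluster([[1.0, 2.0, -1.5], [1.4, 2.2, -1.5], [1.1, 1.8, -1.4]])
```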
In some embodiments, the process 500 further comprises, at step 506, adjusting, using the processor, a boundary of one or more of the audio pick-up regions based on a location of at least one speaker (e.g., loudspeaker 406 of
In some embodiments, the process 500 further comprises, at step 508, adjusting, using the processor, a boundary of one or more of the audio pick-up regions based on a location of at least one noise source (e.g., noise source 210 of
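A minimal sketch of one such boundary adjustment is shown below: a rectangular region (in the format produced by the region sketch above) is trimmed along whichever edge excludes the loudspeaker or noise-source location at the least loss of coverage. The clearance buffer is an illustrative assumption, and other adjustment strategies are equally consistent with steps 506 and 508.

```python
def exclude_point(region, point_xy, buffer_m=0.3):
    """Shrink a rectangular pick-up region so a loudspeaker/noise location falls outside it.

    region    -- dict with x_min/x_max/y_min/y_max, as produced by region_from_cluster above
    point_xy  -- (x, y) location of the loudspeaker or persistent noise source
    buffer_m  -- extra clearance kept between the trimmed boundary and the point (assumed)

    The edge closest to the point is moved just past it, which removes the least
    coverage area. Returns an adjusted copy; if the point is already outside, the
    region is returned unchanged.
    """
    x, y = point_xy
    r = dict(region)
    if not (r["x_min"] < x < r["x_max"] and r["y_min"] < y < r["y_max"]):
        return r  # point is already outside; nothing to adjust
    # Each candidate trim moves one edge just past the point; pick the cheapest.
    trim_costs = {
        "x_max": r["x_max"] - (x - buffer_m),   # pull the right edge in
        "x_min": (x + buffer_m) - r["x_min"],   # push the left edge in
        "y_max": r["y_max"] - (y - buffer_m),   # pull the top edge in
        "y_min": (y + buffer_m) - r["y_min"],   # push the bottom edge in
    }
    new_values = {"x_max": x - buffer_m, "x_min": x + buffer_m,
                  "y_max": y - buffer_m, "y_min": y + buffer_m}
    edge = min(trim_costs, key=trim_costs.get)
    r[edge] = new_values[edge]
    return r

# Example: trim the region so it no longer overlaps a wall-mounted loudspeaker.
adjusted = exclude_point({"x_min": 0.0, "x_max": 4.0, "y_min": 0.0, "y_max": 3.0},
                         point_xy=(3.8, 1.5))
```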
In some embodiments, the processor may use an appropriate cost function or other suitable formula to select a more optimal size and shape for each audio pick-up region. For example, the cost function may weigh or consider a number of parameters, such as an overall size of each cluster, a total number of clusters identified for the location data, the known or determined positions of loudspeakers and/or persistent noise sources within the room, a general shape requirement for audio coverage areas, a requirement to avoid overlap between adjacent audio coverage areas, and/or other constraints. By minimizing the cost function based on these constraints while clustering the location points received in the localization data, the processor can obtain more optimal audio coverage areas for the audio sources detected in the environment.
In some cases, the processor may select the size and shape for each audio coverage area after determining that the received location points meet certain threshold criteria, such as, for example, a minimum number of location points, a maximum number of location points, a preset range for the number of location points, and others. For example, if the number of received location points falls below a minimum threshold, the processor may wait for more location points before beginning the clustering process, so that there is a high enough “heat” to generate the heat map shown in
In some cases, two or more adjacent audio pick-up regions may be merged together if the combined region is more optimal or better satisfies certain threshold criteria (e.g., size and/or shape criteria, minimum or preset number of location points, maximum number of audio pick-up regions per room, etc.). For example, the processor may decide to merge two or more audio pick-up regions that are adjacent to each other based on an optimization of certain thresholds or parameters, such as, for example, a total number of audio pick-up regions (e.g., to avoid tracking too many small, fragmented regions), a total area covered by the audio pick-up regions before and after merging (e.g., to avoid creating a merged region that is too large in overall size), and a distance between the location points in the audio pick-up region and the centroid of that region before and after merging (e.g., to avoid creating enormous audio pick-up regions with location points that are too far away from the center of the merged region to be part of the same “cluster”).
In some cases, the processor may use these parameters to define or determine a merge cost for merging two or more audio pick-up areas and may decide to merge the regions if the merge cost is minimal or can be minimized. As an example, two adjacent regions that are relatively small, rectangular in shape, and approximately the same or similar in size can be merged together with a relatively small merge cost penalty, as the point clusters would be generally centered about the center of the merged region. On the other hand, if two adjacent regions are relatively large in size and are shaped as long, narrow rectangles that extend so as to form an L-shape, merging the regions would incur a relatively large merge-cost penalty, as the merged region would be a very large rectangle with a center that is not centered relative to the point clusters of the individual regions. Similar techniques may be used when deciding whether an existing audio pick-up region should be divided or split into two or more regions, for example, because the single region exceeds certain threshold criteria (e.g., too large in size, too unwieldy in shape, includes too many location points, location points are too spread out, etc.) and/or optimization of the above thresholds or parameters warrants the division.
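The sketch below illustrates one possible merge-cost formulation along the lines described above, trading growth in covered area and in point-to-centroid spread against the benefit of tracking one fewer region. The bounding-box and mean-distance proxies and the weights are illustrative assumptions, not a prescribed formula.

```python
import numpy as np

def merge_cost(points_a, points_b, w_area=1.0, w_spread=1.0, region_bonus=1.0):
    """Score merging two adjacent pick-up regions; lower (or negative) favors merging.

    points_a, points_b -- (N, 2) plan-view localization points belonging to each region.
    The cost trades off growth in covered area and in point-to-centroid spread
    against the benefit of tracking one fewer region. Weights are illustrative.
    """
    def bbox_area(pts):
        lo, hi = pts.min(axis=0), pts.max(axis=0)
        return float(np.prod(hi - lo))

    def spread(pts):
        return float(np.mean(np.linalg.norm(pts - pts.mean(axis=0), axis=1)))

    a, b = np.asarray(points_a, dtype=float), np.asarray(points_b, dtype=float)
    merged = np.vstack([a, b])
    area_growth = bbox_area(merged) - (bbox_area(a) + bbox_area(b))
    spread_growth = spread(merged) - max(spread(a), spread(b))
    return w_area * area_growth + w_spread * spread_growth - region_bonus

def should_merge(points_a, points_b):
    return merge_cost(points_a, points_b) < 0.0

# Example: two small, similar neighboring clusters merge cheaply; two long, narrow
# regions forming an L-shape would incur a large penalty and stay separate.
left = np.random.default_rng(1).normal([1.0, 1.0], 0.2, size=(40, 2))
right = np.random.default_rng(2).normal([1.8, 1.0], 0.2, size=(40, 2))
print(should_merge(left, right))
```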
Referring back to
In embodiments where the plurality of microphones are included in a plurality of microphone arrays, the process 500 can further include assigning each audio pick-up region defined at step 504, and refined at steps 506 and 508, to one of the plurality of microphone arrays. For example, in
To help illustrate the above techniques,
Unlike the plot 600, however, the plot 700 shows location points 702 received from the microphones of two separate microphone arrays 704 and 705 and shows the locations of the microphone arrays 704 and 705 relative to the location data. In some embodiments, the locations of the microphone arrays 704 and 705 may be previously known and stored in a memory. In other embodiments, the locations of the arrays 704 and 705 may be estimated or determined using one or more techniques described herein, such as, for example, process 800 of
As shown in
Once the clusters 706, 708, and 709 are refined, the processor may define or form an audio pick-up region around each, in accordance with step 504 of process 500. In particular, a first audio pick-up region 712 may be formed around the first cluster 706, a second audio pick-up region 714 may be formed around the second cluster 708, and a third audio pick-up region 716 may be formed around the third cluster 709. As shown in
Once the audio pick-up regions 712, 714, and 716 are defined and refined, the processor may assign each of the regions to one of the microphone arrays 704 and 705, in accordance with steps 510 and 512 of process 500. For example, upon determining that the first audio pick-up region 712 is closest to the first microphone array 704, the processor may assign the first audio pick-up region 712 to the first microphone array 704. Upon determining that the second audio pick-up region 714 is closest to the second microphone array 705, the processor may assign the second audio pick-up region 714 to the second microphone array 705. And upon determining that the third audio pick-up region 716 is closest to the first microphone array 704, the processor may assign the third audio pick-up region 716 to the first microphone array 704. Thus, the room may be divided into two sound zones, the “left” zone for placement of lobes from the first microphone array 704 and the “right” zone for placement of lobes from the second microphone array 705. In other embodiments, the processor may be configured to assign a given audio pick-up region to multiple microphone arrays, such that, for example, microphone elements from two different microphone arrays can be used to deploy microphone lobes in the same region.
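A minimal, hypothetical sketch of this assignment step, assuming the array positions and region centroids are already expressed in a common coordinate system:

```python
import numpy as np

def assign_regions_to_arrays(region_centroids, array_positions):
    """Assign each audio pick-up region to the closest microphone array."""
    positions = np.asarray(array_positions, dtype=float)
    assignments = {}
    for idx, centroid in enumerate(region_centroids):
        d = np.linalg.norm(positions - np.asarray(centroid, dtype=float), axis=1)
        assignments[idx] = int(np.argmin(d))   # index of the nearest array
    return assignments

# Hypothetical example: two arrays on opposite walls, three region centroids
arrays = [(0.0, 2.0), (6.0, 2.0)]
regions = [(1.0, 2.0), (5.5, 2.5), (2.0, 0.5)]
print(assign_regions_to_arrays(regions, arrays))   # -> {0: 0, 1: 1, 2: 0}
```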
Referring back to
In some embodiments, one or more of the audio coverage areas may be removed based on the new location points. For example, the new localization data and/or further processing may indicate that a certain cluster of location points includes outliers and/or undesirable noise sources or loudspeaker locations and thus, should not be included in an audio coverage area. As another example, the processor may remove a given audio coverage area upon determining that a loudspeaker of the audio system has been moved to a new location that falls within or overlaps with the given audio coverage area. The movement of the loudspeaker to the new location may be determined by the processor based on new location data received from one or more of the microphones, and using a triangulation technique for identifying the position of the loudspeaker relative to the one or more microphones within the environment, for example, as described with respect to
In some embodiments, the process 500 may be performed during a setup or configuration mode of the audio system during which an installer purposefully applies stimulus, or creates sounds, in the locations where human talkers or other audio sources are expected to be present, or the areas where the installer wants to set up audio pick-up regions, for example, as shown in
In other embodiments, the process 500 may be performed during an offline mode of the audio system during which historical localization data collected over a long period of time may be used to automatically define the audio pick-up regions. For example, the historical data may include localization coordinates generated by the plurality of microphones over time while the room was used for various purposes (e.g., for conference calls or other meeting events). The historical data may also indicate whether the audio sources detected by the microphones represent far-end audio, near-end audio, voice sounds, noise sounds, etc. Once a threshold amount of data is collected, or after a threshold amount of time has passed, the processor may automatically set up the best coverage areas for the collected localization data using the process 500. For example, in embodiments that include two or more microphone arrays, the processor may be configured to calculate the transform function used to estimate the relative positions of the arrays once the location data includes a threshold number (e.g., 50, etc.) of time-synchronized pairs of location points, or localization coordinates that were generated by two different microphone arrays at the same time. As another example, the processor may be configured to calculate the transform function (or complete the calculation of the transform function) once the mean squared error, or other measure of estimation error, is below a threshold value (e.g., 10 centimeters (cm), etc.).
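As a simple illustration of such a data-sufficiency gate, using the example thresholds mentioned above (50 time-synchronized pairs, 10 cm error), the processor might apply a check along these lines; the function and parameter names are assumptions:

```python
def ready_to_estimate_transform(synced_pairs, residual_error_m=None,
                                min_pairs=50, max_error_m=0.10):
    """Decide whether the transform between two arrays may be (re)estimated."""
    # synced_pairs     : time-synchronized coordinate pairs collected so far
    # residual_error_m : current estimation error in meters, or None if unknown
    if len(synced_pairs) >= min_pairs:
        return True
    # alternatively, treat the estimate as complete once its error is small enough
    return residual_error_m is not None and residual_error_m < max_error_m
```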
In terms of clustering the location points to determine coverage areas, in some embodiments, the processor, or clustering algorithm, may be configured to divide the location points into clusters once, for example, the number of available location points reaches a threshold number (e.g., 500, etc.). In other embodiments, the processor, or clustering algorithm, may be configured to use all historical data that is available, regardless of the exact amount. In still other embodiments, the processor may be configured to stop the clustering algorithm upon determining that a new cluster has not been formed and/or no substantial changes have been made to a geometry, or shape and size, of the existing clusters over a given period of time (e.g., at least one minute, etc.) or for a threshold number (e.g., 50, etc.) of consecutive localization points.
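One hypothetical way to express such a stopping rule, assuming a snapshot of the current clusters is recorded for each new localization point (the bookkeeping and tolerance values are assumptions):

```python
import numpy as np

def clustering_converged(snapshots, window=50, geom_tol=0.05):
    """Stop clustering when no new cluster forms and cluster geometry is stable."""
    # snapshots : one entry per new localization point; each entry is a list of
    #             (centroid, size) tuples describing the clusters at that moment
    if len(snapshots) < window:
        return False
    recent = snapshots[-window:]
    if len({len(s) for s in recent}) > 1:           # a cluster appeared or vanished
        return False
    for (c0, s0), (c1, s1) in zip(recent[0], recent[-1]):
        if np.linalg.norm(np.asarray(c1) - np.asarray(c0)) > geom_tol:
            return False                            # a cluster drifted too far
        if abs(s1 - s0) > geom_tol:
            return False                            # a cluster changed size too much
    return True
```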
In some embodiments, the process 500 may further include receiving manual adjustments to one or more of the audio pick-up regions or other aspects, for example, via a user input device (e.g., user interface 416 of
In some embodiments, the process 500 further includes a preliminary step of deactivating or removing any pre-existing audio pick-up regions before proceeding with step 502, so that a completely new set of audio pick-up regions can be formed based on the most recent location data.
In some embodiments, the process 500 further includes adjusting one or more of the audio pick-up regions to accommodate a pre-existing audio pick-up region stored in a memory (e.g., for automatic adjustment) or provided by a user (e.g., for manual adjustment). For example, a given environment may include a dedicated audio pick-up region centered on a podium, platform, whiteboard/chalkboard, or other designated presentation space in the environment. One or more of the audio pick-up regions automatically determined at steps 504 to 514 may be adjusted by resizing or otherwise changing a boundary, size, and/or shape of the one or more regions, by merging or separating the regions, or any adjustment needed to make room for or accommodate the pre-existing audio pick-up region.
For ease of explanation, the process 800 will be described with reference to
All or portions of the process 800 may be performed by one or more processors and/or other processing devices (e.g., analog to digital converters, encryption chips, etc.) that are within or external to the audio system, including the processor in communication with the plurality of microphones. In addition, one or more other types of components (e.g., memory, input and/or output devices, transmitters, receivers, buffers, drivers, discrete components, logic circuits, etc.) may also be used in conjunction with the processors and/or other processing components to perform any, some, or all of the steps of the process 800. For example, the process 800 may be carried out by a computing device (e.g., computing device 402 of
As shown in
Step 804 comprises, based on the timestamps included in the location data, identifying, using the processor, a first set of coordinates received from a first microphone array and corresponding to a first point in time. Further, step 806 comprises, based on said timestamps, identifying a second set of coordinates received from a second microphone array and corresponding to the first point in time. Thus, at steps 804 and 806, the location data received at step 802 may be sorted so that time synchronized (or simultaneous) coordinates can be grouped or paired together. In some embodiments, coordinates that belong to the same audio source but have different timestamps (e.g., T1 and T2) may still be identified as time-synchronized pairs if the timestamps are sufficiently close (e.g., the difference between T1 and T2 is less than a preset threshold).
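A rough sketch of this pairing step, assuming each array reports timestamped coordinates sorted by time (the tolerance and record format are assumptions):

```python
def pair_synchronized_points(points_a, points_b, tol_s=0.05):
    """Pair coordinates from two arrays whose timestamps differ by at most tol_s."""
    # points_a, points_b : lists of (timestamp, (x, y, z)) tuples, sorted by time
    pairs, j = [], 0
    for t_a, coord_a in points_a:
        # advance through points_b to the candidate closest in time to t_a
        while j + 1 < len(points_b) and abs(points_b[j + 1][0] - t_a) <= abs(points_b[j][0] - t_a):
            j += 1
        if points_b and abs(points_b[j][0] - t_a) <= tol_s:
            pairs.append((coord_a, points_b[j][1]))
    return pairs
```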
As an example, in
Step 808 comprises determining or estimating a transform function using the processor, to transform or convert the second set of coordinates identified at step 806 into a coordinate system associated with the first set of coordinates identified at step 804. Step 808 also includes applying the transform function to the second set of coordinates to obtain a transformed second set of coordinates that are in the same coordinate system as the first set of coordinates. In this manner, the localization coordinates received at step 802 from various microphones can be transformed or converted to a common coordinate system, as needed.
For example, the first microphone array 902 may be associated with a first coordinate system whose origin is the center of the first microphone array 902, while the second microphone array 904 may be associated with a second coordinate system whose origin is the center of the second microphone array 904. Thus, the first set of coordinates (x1, y1, z1) and the second set of coordinates (x2, y2, z2) both represent the same location, i.e. the location of the audio source 906, using two different coordinate systems. At step 808, the second set of coordinates (x2, y2, z2) may be transformed into a new set of coordinates within the first coordinate system using the transform function. While
In embodiments, the transform function used to transform the second set of coordinates into the first coordinate system may be a coordinate-change transform matrix, which enables coordinates obtained across different coordinate systems for the same location point (e.g., the audio source 906) to be compared. The transform matrix may use linear translation (“T”) and rotation (“R”) values to transform coordinates from a second coordinate system into coordinates from a first coordinate system. In some embodiments, the transform matrix may be better estimated (e.g., with smaller error) once there is a large enough number of coordinate pairs, or distinct, simultaneous localization points for the same audio source from multiple microphones. In such cases, step 808 includes estimating the transformation matrix once a threshold number of coordinate pairs are collected (e.g., at least four pairs or other minimum), and then applying the estimated matrix to the second set of coordinates. As an example, the coordinate-change transform matrix may be estimated by performing a constraint-based least-squares, or least-mean-squares, adaptive estimation method, or other suitable method. Other techniques for converting or transforming localization coordinates into a common coordinate system may also be used.
Step 810 comprises determining, using the processor, a location of the second microphone array relative to the first microphone array (e.g., as represented by the dotted line arrow in
In embodiments, steps 802 to 810 may be iteratively repeated as new time-synchronized coordinates, or pairs of location points, are received at the processor, or until a convergence is reached, wherein the estimated triangulation error is less than a predefined threshold.
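Purely as an illustrative sketch of steps 808 and 810: the disclosure only requires a suitable constraint-based least-squares or least-mean-squares estimate, and the SVD-based (Kabsch-style) fit shown below is one common way to solve that least-squares problem for a rotation-plus-translation transform. The translation component of the fitted transform is then the second array's origin, i.e., its estimated position, expressed in the first array's coordinate system.

```python
import numpy as np

def estimate_rigid_transform(second_pts, first_pts):
    """Least-squares fit of R, t such that R @ p2 + t ~= p1 for paired points."""
    # second_pts, first_pts : (N, 3) arrays of time-synchronized coordinates of
    #                         the same audio sources, in each array's own frame
    p2 = np.asarray(second_pts, dtype=float)
    p1 = np.asarray(first_pts, dtype=float)
    c2, c1 = p2.mean(axis=0), p1.mean(axis=0)
    H = (p2 - c2).T @ (p1 - c1)                # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c1 - R @ c2
    return R, t

# The second array's own origin (0, 0, 0) maps to R @ 0 + t = t, so the
# translation t is the second array's estimated position relative to the first.
```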
In some embodiments, process 800 further includes, at step 812, determining a distance between the second microphone array and a given audio pick-up region based on the relative location of the second microphone array, and based on said distance, determining a proximity of the second microphone array to the given audio pick-up region. For example, step 812 may be used to determine the proximity of the second microphone array to the second audio pick-up region in step 512 of process 500. In various embodiments, once the position of the second microphone array within the common coordinate system is determined, the distance between the second microphone array and the given audio pick-up region can be compared to the distance between the first microphone array and the same audio pick-up region. This comparison can then be used to determine which microphone array is in closer proximity to the audio pick-up region for microphone assignment purposes, as in steps 510 and 512 of process 500.
Thus, process 800 can provide automatic triangulation of microphone array positions in a room, which may be used to perform automatic setup of audio coverage areas, for example, as shown in
In some cases, the process 800 can be used to improve the localization performance and accuracy of the audio system, for example, as further described below with reference to
In some embodiments, the process 800 may be used to improve an accuracy of the location of a given audio source. For example, in cases where automatic triangulation techniques are used to estimate the relative positions of the microphone arrays, there may be a margin of error in the azimuth, elevation, and/or radius information obtained for an audio source by a given microphone array due to the aperture size of the microphone array. This margin of error may cause, for example, the estimated distance between the audio source and the microphone array to be inaccurate. Accordingly, in various embodiments, the one or more processors may be configured to triangulate, or determine a more precise location for, the audio source by combining multiple localization coordinates obtained for the same audio source by different microphone arrays, after the coordinates have been transformed to the same coordinate system. In such embodiments, the process 800 may further include determining, or refining, a location of a given audio source relative to the first microphone array (e.g., array 902 in
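As a minimal sketch of this refinement, assuming a simple average of the two transformed localizations (a weighted or error-aware combination could equally be used):

```python
import numpy as np

def refine_source_location(coords_first, coords_second, R, t):
    """Combine two localizations of the same source into one refined estimate."""
    # coords_first  : (x, y, z) from the first array, in the first array's frame
    # coords_second : (x, y, z) from the second array, in the second array's frame
    # R, t          : transform mapping second-array coordinates into the first frame
    p1 = np.asarray(coords_first, dtype=float)
    p2 = R @ np.asarray(coords_second, dtype=float) + t   # bring into common frame
    return (p1 + p2) / 2.0                                 # simple two-view average
```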
In some embodiments, the automatic triangulation techniques described herein may be used to determine loudspeaker positions within an environment. For example, the position of a loudspeaker within a room may be determined based on sound localization data that is generated by a microphone array, or other audio input device comprising two or more microphones, within the same room, while far-end audio is played from the loudspeaker, for example, as shown in
Referring now to
Referring now to
At a first point in time (T1), the first microphone array 1102 can be configured to localize an active audio source in the room (e.g., the loudspeaker 1104), while simultaneously detecting far-end signal activity in the form of a reference input. This produces a first set of coordinates (x1, y1, z1) associated with the timestamp T1 that represents the position of the loudspeaker 1104 at time T1 relative to the first microphone array 1102. Simultaneously and independently, the second microphone array 1103 can be configured to localize the same active audio source (e.g., the loudspeaker 1104), while simultaneously detecting far-end signal activity in the form of a reference input. This produces a second set of coordinates (x2, y2, z2) associated with the timestamp T2 that represents the position of the loudspeaker 1104 at time T1 relative to the second microphone array 1103. The first and second microphone arrays 1102 and 1103 may use localization software to generate the coordinates, as described herein.
The position of the second microphone array 1103 relative to the first microphone array 1102 (e.g., as represented by the dotted line arrow in
Referring now to
In embodiments, during a setup mode, far-end signals may be played by the first and second loudspeakers 1204 and 1205, one at a time, while the same far-end signal is provided, as a reference signal or input, to each of the microphone arrays 1202 and 1203 and to the loudspeaker 1204 or 1205 that is not playing audio at that time. While the first loudspeaker 1204 is playing the far-end signal, the two microphone arrays 1202 and 1203 may be used to localize the first loudspeaker 1204 or determine sound localization coordinates for the first loudspeaker 1204, and the resulting coordinates may be used to estimate or triangulate the position of the first loudspeaker 1204 relative to the first array 1202, for example, using the techniques shown in
In some cases, such as, for example, after the setup mode (i.e. during a long term adaptation mode or during a normal use mode of the audio system), each of the microphone arrays 1202 and 1203 may localize a different one of the loudspeakers 1204 and 1205, such as, for example, the loudspeaker that is closer to the location of the particular microphone array. For example, at a first point in time (T1), the first microphone array 1202 may localize an active audio source in the room (i.e. the first loudspeaker 1204), while simultaneously detecting far-end signal activity in the form of a reference input, for example, using the same techniques as in
The position of the second microphone array 1203 relative to the first microphone array 1202 may be previously known or may have been automatically determined or triangulated during set-up mode and/or using, for example, process 800 of
According to embodiments, the GUI 1300 is further configured to graphically and animatedly represent one or more audio pick-up regions (or audio coverage areas) using coverage icon(s) 1306 that correspondingly change in appearance as the audio pick-up regions are dynamically formed, in real time or near real time, using the process 500 shown in
In embodiments, the GUI 1300 may be used by an installer (or user) during the set-up mode of the audio system to automatically create one or more audio pick-up regions at expected talker locations, or other selected locations in the environment 1400, as described herein. For example,
As an example,
In some embodiments, the GUI 1300 may be interactive or otherwise configured to allow a user to manually refine or adjust a selected audio pick-up region, for example, by resizing, reshaping, moving, or otherwise changing a look and/or position of the corresponding coverage icon 1306, or by entering new values for one or more parameters of the selected audio pick-up region via a user interface of the computing device (e.g., user interface 416 of
Referring back to
Any of the memories or memory devices described herein, such as, e.g., memory 412, may be volatile memory (e.g., RAM including non-volatile RAM, magnetic RAM, ferroelectric RAM, etc.), non-volatile memory (e.g., disk memory, FLASH memory, EPROMs, EEPROMs, memristor-based non-volatile solid-state memory, etc.), unalterable memory (e.g., EPROMs), read-only memory, and/or high-capacity storage devices (e.g., hard drives, solid state drives, etc.). In some examples, memory 412, and/or any other memory described herein, includes multiple kinds of memory, particularly volatile memory and non-volatile memory.
Moreover, any of the memories described herein (e.g., memory 412) may be computer readable media on which one or more sets of instructions, such as the software for operating the techniques described herein, can be embedded. The instructions may reside completely, or at least partially, within any one or more of the memory, the computer readable medium, and/or within one or more processors (e.g., processor 410) during execution of the instructions. In some embodiments, memory 412, and/or any other memory described herein, may include one or more data storage devices configured for implementation of a persistent storage for data that needs to be stored and recalled by the end user, such as, e.g., location data received from one or more audio devices, prestored location data or coordinates indicating a known location of one or more audio devices, and more. In such cases, the data storage device(s) may save data in flash memory or other memory devices. In some embodiments, the data storage device(s) can be implemented using, for example, an SQLite database, UnQLite, Berkeley DB, BangDB, or the like.
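Purely as an illustrative sketch of such a persistent store, using SQLite via Python's standard sqlite3 module (the table schema and field names are assumptions):

```python
import sqlite3

def open_location_store(path="locations.db"):
    """Create (if needed) and open a small SQLite store for localization data."""
    con = sqlite3.connect(path)
    con.execute("""CREATE TABLE IF NOT EXISTS location_points (
                       device_id TEXT, timestamp REAL,
                       x REAL, y REAL, z REAL)""")
    return con

def save_location_point(con, device_id, timestamp, x, y, z):
    con.execute("INSERT INTO location_points VALUES (?, ?, ?, ?, ?)",
                (device_id, timestamp, x, y, z))
    con.commit()
```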
In some embodiments, any of the computing devices described herein, such as, e.g., the computing device 402, may include one or more components configured to facilitate a conference call, meeting, classroom, or other event and/or process audio signals associated therewith to improve an audio quality of the event. For example, in various embodiments, the computing device 402, and/or any other computing device described herein, may comprise a digital signal processor (“DSP”) configured to process the audio signals received from the various audio sources using, for example, automatic mixing, matrix mixing, delay, compressor, parametric equalizer (“PEQ”) functionalities, acoustic echo cancellation, and more. In other embodiments, the DSP may be a standalone device operatively coupled or connected to the computing device using a wired or wireless connection. One exemplary embodiment of the DSP, when implemented in hardware, is the P300 IntelliMix Audio Conferencing Processor from SHURE, the user manual for which is incorporated by reference in its entirety herein. As further explained in the P300 manual, this audio conferencing processor includes algorithms optimized for audio/video conferencing applications and for providing a high quality audio experience, including eight channels of acoustic echo cancellation, noise reduction and automatic gain control. Another exemplary embodiment of the DSP, when implemented in software, is the IntelliMix Room from SHURE, the user guide for which is incorporated by reference in its entirety herein. As further explained in the IntelliMix Room user guide, this DSP software is configured to optimize the performance of networked microphones with audio and video conferencing software and is designed to run on the same computer as the conferencing software. In other embodiments, other types of audio processors, digital signal processors, and/or DSP software components may be used to carry out one or more of audio processing techniques described herein, as will be appreciated.
Various components of the computing device 402, and/or any other computing device described herein, may be implemented in hardware (e.g., discrete logic circuits, application specific integrated circuits (ASIC), programmable gate arrays (PGA), field programmable gate arrays (FPGA), etc.), using software (e.g., program modules comprising software instructions executable by a processor), or through a combination of both. For example, some or all components of the computing device 402, and/or any other computing device described herein, may use discrete circuitry devices and/or use a processor (e.g., audio processor, digital signal processor, or other processor) executing program code stored in a memory, the program code being configured to carry out one or more processes or operations described herein. In embodiments, all or portions of the processes may be performed by one or more processors and/or other processing devices (e.g., analog to digital converters, encryption chips, etc.) within or external to the computing device 402. In addition, one or more other types of components (e.g., memory, input and/or output devices, transmitters, receivers, buffers, drivers, discrete components, logic circuits, etc.) may also be utilized in conjunction with the processors and/or other processing components to perform any, some, or all of the operations described herein. For example, in
Moreover, the computing device 402, and/or any of the other computing devices described herein, may also comprise various other software modules or applications (not shown) configured to facilitate and/or control the conferencing event, such as, for example, internal or proprietary conferencing software and/or third-party conferencing software (e.g., Microsoft Skype, Microsoft Teams, Bluejeans, Cisco WebEx, GoToMeeting, Zoom, Join.me, etc.). Such software applications may be stored in the memory (e.g., memory 412) of the computing device and/or may be stored on a remote server (e.g., on premises or as part of a cloud computing network) and accessed by the computing device via a network connection. Some software applications may be configured as a distributed cloud-based software with one or more portions of the application residing in the computing device (e.g., computing device 402) and one or more other portions residing in a cloud computing network. One or more of the software applications may reside in an external network, such as a cloud computing network. In some embodiments, access to one or more of the software applications may be via a web-portal architecture, or otherwise provided as Software as a Service (SaaS).
It should be understood that examples disclosed herein may refer to computing devices and/or systems having components that may or may not be physically located in proximity to each other. Certain embodiments may take the form of cloud based systems or devices, and the term “computing device” should be understood to include distributed systems and devices (such as those based on the cloud), as well as software, firmware, and other components configured to carry out one or more of the functions described herein. Further, as noted above, one or more features of the computing device may be physically remote (e.g., a standalone microphone) and may be communicatively coupled to the computing device.
The terms “non-transitory computer-readable medium” and “computer-readable medium” include a single medium or multiple media, such as a centralized or distributed database, and/or associated caches and servers that store one or more sets of instructions. Further, the terms “non-transitory computer-readable medium” and “computer-readable medium” include any tangible medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor or that cause a system to perform any one or more of the methods or operations disclosed herein. As used herein, the term “computer readable medium” is expressly defined to include any type of computer readable storage device and/or storage disk and to exclude propagating signals.
Any process descriptions or blocks in the figures, such as, e.g.,
Further, it should be noted that in the description and drawings, like or substantially similar elements may be labeled with the same reference numerals. However, sometimes these elements may be labeled with differing numbers, such as, for example, in cases where such labeling facilitates a clearer description. In addition, system components can be variously arranged, as is known in the art. Also, the drawings set forth herein are not necessarily drawn to scale, and in some instances, proportions may be exaggerated to more clearly depict certain features and/or related elements may be omitted to emphasize and clearly illustrate the novel features described herein. Such labeling and drawing practices do not necessarily implicate an underlying substantive purpose. The above description is intended to be taken as a whole and interpreted in accordance with the principles taught herein and understood to one of ordinary skill in the art.
In this disclosure, the use of the disjunctive is intended to include the conjunctive. The use of definite or indefinite articles is not intended to indicate cardinality. In particular, a reference to “the” object or “a” and “an” object is intended to also denote one of a possible plurality of such objects.
Moreover, this disclosure is intended to explain how to fashion and use various embodiments in accordance with the technology rather than to limit the true, intended, and fair scope and spirit thereof. The foregoing description is not intended to be exhaustive or to be limited to the precise forms disclosed. Modifications or variations are possible in light of the above teachings. The embodiment(s) were chosen and described to provide the best illustration of the principle of the described technology and its practical application, and to enable one of ordinary skill in the art to utilize the technology in various embodiments and with various modifications as are suited to the particular use contemplated. All such modifications and variations are within the scope of the embodiments as determined by the appended claims, which may be amended during the pendency of the application for patent, and all equivalents thereof, when interpreted in accordance with the breadth to which they are fairly, legally and equitably entitled.
This application claims priority to U.S. Provisional Patent Application No. 63/266,553, filed on Jan. 7, 2022, the entirety of which is incorporated by reference herein.