1. Field of the Embodiments of the Invention
Embodiments of the present invention relate generally to human-device interfaces and, more specifically, to techniques for generating multiple listening environments via auditory devices.
2. Description of the Related Art
In various situations, people often find a need or desire to engage in a private conversation while in the presence of one or more other people. For example, and without limitation, a person participating in a conference meeting could receive an important phone call during the meeting. In order to prevent disruption of the meeting, the person could choose to physically leave the room or not take the call. In another example, and without limitation, a person riding in a vehicle could desire to initiate a telephone call while maintaining privacy with respect to other passengers or to avoid disrupting conversation among the other passengers. In such a case, the person could initiate the call and speak in a hushed voice or defer the call until a later time when the call could be made in private. In yet another example, and without limitation, the main conversation in a group meeting could give rise to a need for a sidebar meeting among a subset of the group meeting participants. In such a case, the subset of participants could adjourn to another meeting room, if another meeting room is available, or could defer the sidebar meeting until later.
One potential problem with these approaches is that an important or necessary conversation may be detrimentally deferred until a later time, or the main conversation may be disrupted by the second conversation. Another potential problem with these approaches is that the second conversation may not enjoy the desired level of privacy or may be conducted in whispers, making the conversation difficult to understand by the participants.
As the foregoing illustrates, a new technique to accommodate multiple conversations simultaneously would be useful.
One or more embodiments set forth include a computing device that includes a wireless network interface and a processor. The processor is configured to receive, via a microphone, a first auditory signal that includes a first plurality of voice components. The processor is further configured to receive a request to at least partially suppress a first voice component included in the first plurality of voice components. The processor is further configured to generate a second auditory signal that includes the first plurality of voice components with the first voice component at least partially suppressed. The processor is further configured to transmit the second auditory signal to a speaker for output.
Other embodiments include, without limitation, a computer-readable medium that includes instructions that enable a processing unit to implement one or more aspects of the disclosed methods. Other embodiments include, without limitation, a method to implement one or more aspects of the disclosed methods as well as a computing system configured to implement one or more aspects of the disclosed methods.
At least one advantage of the approach described herein is that participants in a group may engage in multiple conversations while maintaining appropriate privacy for each conversation and reducing or eliminating disruption to other conversations. As a result, important conversations are not deferred and multiple conversations are accommodated without the need to find separate physical space to accommodate each separate conversation.
So that the manner in which the recited features of the one or more embodiments set forth above can be understood in detail, a more particular description of the one or more embodiments, briefly summarized above, may be had by reference to certain specific embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments and are therefore not to be considered limiting of the scope of the invention in any manner, for the scope of the invention subsumes other embodiments as well.
In the following description, numerous specific details are set forth to provide a more thorough understanding of certain specific embodiments. However, it will be apparent to one of skill in the art that other embodiments may be practiced without one or more of these specific details or with additional specific details.
Microphone module 110 may be any technically feasible type of device configured to receive audio signals via a microphone and transduce the audio signals into machine-readable form. Microphone module 110 is configured to receive audio signals from the physical environment and transduce those audio signals for further processing by processing unit 120, as described in greater detail below. The audio signals may include spoken voices from various participants in a meeting or other physical space as well as environmental audio sources such as background noise, music, street sounds, etc.
Processing unit 120 may be any technically feasible unit configured to process data and execute software applications, including, for example, and without limitation, a central processing unit (CPU), digital signal processor (DSP), or an application-specific integrated circuit (ASIC). Input devices 125 may include, for example, and without limitation, devices configured to receive input (such as, one or more buttons, without limitation). Certain functions or features related to an application executed by processing unit 120 may be accessed by actuating an input device 125, such as by pressing a button. As further described herein, processing unit 120 is operable to generate one or more audio groups or conversation “bubbles” to fully or partially isolate various users from each other.
Speaker module 140 may be any technically feasible type of device configured to receive an audio signal and generate a corresponding signal capable of driving one or more loudspeakers or speaker devices. The audio signal may be the audio input signal received by microphone module 110, or may be an audio signal generated by processing unit 120. The audio signal received from processing unit 120 may be an alternative version of the audio input signal received by microphone module 110, but with one or more voices suppressed.
Wireless transceiver 130 may be any technically feasible device configured to establish wireless communication links with other wireless devices, including, without limitation, a WiFi™ transceiver, a Bluetooth transceiver, an RF transceiver, and so forth. Wireless transceiver 130 is configured to establish wireless links with other auditory scene controllers and a central communications controller, as further described herein.
Memory unit 150 may be any technically feasible unit configured to store data and program code, including, for example, and without limitation, a random access memory (RAM) module or a hard disk. Auditory scene application 152 within memory unit 150 may be executed by processing unit 120 in order to generate one or more listening environments, also referred to herein as auditory scenes. An auditory scene represents a listening environment within which at least one voice component corresponding to a particular person is suppressed from being heard either by individuals inside the auditory scene or by people outside of the auditory scene. In one example, and without limitation, an auditory scene that includes one person could be generated such that no one else hears the person's voice. In another example, and without limitation, an auditory scene that includes one person could be generated such that the person does not hear anyone else's voice. In another example, and without limitation, an auditory scene that includes one person could be generated such that no one else hears the person's voice, and, simultaneously, the person does not hear anyone else's voice. In yet another example, and without limitation, any number of auditory scenes may be generated, where each auditory scene includes any number of people, and each auditory scene prevents various voices from leaving or entering that auditory scene. In this manner, auditory scenes are highly customizable and configurable. Accordingly, the auditory scenes described herein are merely exemplary and do not limit the scope of possible auditory scenes that may be generated, within the scope of this disclosure.
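By way of illustration only, and without limitation, the auditory scenes described above can be modeled as a set of members plus two flags indicating whether voices are prevented from leaving or entering the scene. The following Python sketch is not part of the disclosed embodiments; the names `AuditoryScene`, `block_outgoing`, `block_incoming`, and `can_hear` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditoryScene:
    """Hypothetical model of one auditory scene (illustrative names only)."""
    members: frozenset            # identifiers of the people inside the scene
    block_outgoing: bool = True   # people outside do not hear members
    block_incoming: bool = True   # members do not hear people outside


def can_hear(listener, speaker, scenes):
    """Return True if `listener` hears `speaker` given all active scenes."""
    if listener == speaker:
        return True
    for scene in scenes:
        speaker_in = speaker in scene.members
        listener_in = listener in scene.members
        # A member's voice is kept from leaving the scene.
        if speaker_in and not listener_in and scene.block_outgoing:
            return False
        # An outside voice is kept from entering the scene.
        if listener_in and not speaker_in and scene.block_incoming:
            return False
    return True


# One person whose voice is not heard by others, but who still hears everyone.
solo = [AuditoryScene(frozenset({"A"}), block_incoming=False)]
# A two-person scene isolated in both directions.
bubble = [AuditoryScene(frozenset({"A", "B"}))]
```

Under this assumed model, the three single-person examples above correspond to the three combinations of the two flags in which at least one flag is set, and a multi-person scene with both flags set isolates its members in both directions.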
When generating auditory scenes, auditory scene application 152 may implement a wide variety of different audio processing algorithms to analyze and parse frequency and amplitude data associated with an audio input signal. Such algorithms are operable to suppress one or more voices from the audio input signal by one or more techniques.
In one example, and without limitation, processing unit 120 executing auditory scene application 152 could determine a portion of the audio input signal corresponding to the one or more voices to be suppressed, generate an inversion audio signal representing the inverse signal corresponding to the one or more voices, and mix the inversion signal with the original audio input signal. In another example, and without limitation, processing unit 120 executing auditory scene application 152 could digitally receive a signal from the auditory scene controller of another user, where the received signal represents the original or inverted voice of the associated user as captured, for example, and without limitation, by the corresponding microphone module. Processing unit 120 would then invert the received signal, as appropriate, and mix the received signal with the audio input signal from microphone module 110. In yet another example, and without limitation, processing unit 120 executing auditory scene application 152 could receive timing information from the auditory scene controller of another user, identifying when the associated user is speaking or is silent. Processing unit 120 processes the received timing information to determine time intervals during which processing unit 120 suppresses the audio input signal from microphone module 110. Auditory scene application 152 is configured to then transmit the processed audio signal to speaker module 140.
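By way of illustration only, and without limitation, two of the suppression techniques described above can be sketched in Python as follows. The sketch assumes that the voice to be suppressed is already available as a sample sequence that is time-aligned with the audio input signal, which a practical system would have to arrange separately. The function names, the `gain` parameter, and the representation of timing information as `(start, end)` pairs in seconds are hypothetical.

```python
def suppress_by_inversion(audio_input, voice, gain=1.0):
    """Mix an inverted copy of `voice` into `audio_input`.

    Assumes both sequences are sampled at the same rate and are aligned.
    A gain below 1.0 gives partial rather than full suppression.
    """
    if len(audio_input) != len(voice):
        raise ValueError("signals must have the same length")
    inverted = [-gain * sample for sample in voice]
    return [a + i for a, i in zip(audio_input, inverted)]


def suppress_by_timing(audio_input, sample_rate, speaking_intervals):
    """Silence `audio_input` while the other user is reported as speaking.

    `speaking_intervals` is a list of (start_seconds, end_seconds) pairs,
    standing in for the timing information received from another controller.
    """
    output = list(audio_input)
    for start_s, end_s in speaking_intervals:
        first = max(0, int(start_s * sample_rate))
        last = min(len(output), int(end_s * sample_rate))
        for index in range(first, last):
            output[index] = 0.0
    return output


# Toy signals: the input is the sum of one voice and other sound.
voice = [0.5, -0.25, 0.0, 0.75]
other = [0.25, 0.25, -0.5, 0.0]
audio_input = [v + o for v, o in zip(voice, other)]
residual = suppress_by_inversion(audio_input, voice)
```

In this toy case `residual` equals `other`, because the inverted voice cancels the voice exactly. The timing-based function silences the entire input during the reported intervals, so it also removes any other sound present in those intervals.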
Persons skilled in the art will understand that the specific implementation of auditory scene controller 100 shown in
In this configuration, auditory scene controllers 220, 222, and 224 communicate directly with each other in a peer-to-peer fashion without a central communications controller. Consequently, in response to an action of user 210, such as a button press, auditory scene controller 220 transmits a request to auditory scene controllers 222 and 224 to suppress the voice of user 210. In response, auditory scene controllers 222 and 224 suppress the voice of user 210 so that users 212 and 214 cannot hear user 210. In response to a second action of user 210, such as another button press, auditory scene controller 220 transmits a request to auditory scene controllers 222 and 224 to discontinue suppressing the voice of user 210. In response, auditory scene controllers 222 and 224 discontinue suppressing the voice of user 210 so that users 212 and 214 can again hear user 210.
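By way of illustration only, and without limitation, the peer-to-peer exchange described above can be sketched in Python as follows. The class `PeerController`, its methods, and the request names `"suppress"` and `"discontinue"` are hypothetical, and direct method calls stand in for wireless transmission.

```python
class PeerController:
    """Hypothetical auditory scene controller in a peer-to-peer arrangement."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.peers = []           # other controllers reachable directly
        self.suppressed = set()   # users whose voices this controller suppresses
        self._requesting = False  # True while peers are asked to suppress our user

    def connect(self, *others):
        """Record direct links to the other controllers."""
        self.peers.extend(others)

    def on_request(self, kind, user_id):
        """Handle a request received from a peer."""
        if kind == "suppress":
            self.suppressed.add(user_id)
        elif kind == "discontinue":
            self.suppressed.discard(user_id)

    def button_press(self):
        """Toggle the request, as with the first and second user actions."""
        self._requesting = not self._requesting
        kind = "suppress" if self._requesting else "discontinue"
        for peer in self.peers:
            peer.on_request(kind, self.user_id)


c220, c222, c224 = PeerController(210), PeerController(212), PeerController(214)
c220.connect(c222, c224)
c220.button_press()   # controllers 222 and 224 now suppress the voice of user 210
```

A second call to `c220.button_press()` sends the discontinue request, after which neither peer suppresses the voice of user 210.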
In this configuration, auditory scene controllers 220, 222, and 224 communicate with each other via central communications controller 240. Central communications controller 240 may be embodied within any technically feasible computing device. Each of auditory scene controllers 220, 222, and 224 communicates with central communications controller 240. As appropriate, central communications controller 240 forwards communications received from auditory scene controllers 220, 222, and 224 to other auditory scene controllers 220, 222, and 224. In addition, central communications controller 240 may initiate communications directed to auditory scene controllers 220, 222, and 224.
Consequently, in response to an action of user 210, such as a button press, auditory scene controller 220 transmits a request to central communications controller 240 to suppress the voice of user 210. In response, central communications controller 240 forwards the request to auditory scene controllers 222 and 224. Auditory scene controllers 222 and 224 suppress the voice of user 210 so that users 212 and 214 cannot hear user 210. In response to a second action of user 210, such as another button press, auditory scene controller 220 transmits a request to central communications controller 240 to discontinue suppressing the voice of user 210. In response, central communications controller 240 forwards the request to auditory scene controllers 222 and 224. Auditory scene controllers 222 and 224 discontinue suppressing the voice of user 210 so that users 212 and 214 can again hear user 210.
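By way of illustration only, and without limitation, the forwarding behavior of a central communications controller can be sketched in Python as follows. The names `SceneRequest`, `HubAttachedController`, `CentralController`, `register`, and `forward` are hypothetical, and direct method calls stand in for wireless transmission.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneRequest:
    """Hypothetical request message."""
    kind: str      # "suppress" or "discontinue"
    user_id: int   # user whose voice the request concerns


class HubAttachedController:
    """Hypothetical auditory scene controller that communicates only via a hub."""

    def __init__(self):
        self.suppressed = set()

    def receive(self, request):
        if request.kind == "suppress":
            self.suppressed.add(request.user_id)
        elif request.kind == "discontinue":
            self.suppressed.discard(request.user_id)


class CentralController:
    """Hypothetical central communications controller."""

    def __init__(self):
        self.controllers = {}   # controller id -> controller object

    def register(self, controller_id, controller):
        self.controllers[controller_id] = controller

    def forward(self, sender_id, request):
        """Forward a request to every registered controller except the sender."""
        recipients = [cid for cid in self.controllers if cid != sender_id]
        for cid in recipients:
            self.controllers[cid].receive(request)
        return recipients


hub = CentralController()
nodes = {cid: HubAttachedController() for cid in (220, 222, 224)}
for cid, node in nodes.items():
    hub.register(cid, node)
delivered = hub.forward(220, SceneRequest("suppress", 210))
```

Compared with a peer-to-peer arrangement, each controller in this sketch needs only a single link to the hub, at the cost of one extra hop per request.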
As shown, user interface device 250 is a smartphone associated with user 210, user interface device 252 is a laptop computer associated with user 212, and user interface device 254 is a tablet computer associated with user 214. Alternatively, various users may be associated with any technically feasible user interface devices, in any combination, including, without limitation, devices attached to the wearer's glasses, attached to the wearer's necklace or “amulet device,” worn on a wristwatch or a wrist bracelet, embedded into a head band or head ring, attached to an article of clothing or belt buckle, attached to or worn anywhere on a user's body, provided as an accessory attached to the user's smartphone or tablet computer, or attached to a vehicle associated with the user, such as a bicycle or motorcycle.
In the configuration of
Consequently, in response to an action of user 210, such as selecting a function on an application executing on user interface device 250, user interface device 250 transmits a request to central communications controller 240 to suppress the voice of user 210. In response, central communications controller 240 forwards the request to auditory scene controllers 222 and 224. Auditory scene controllers 222 and 224 suppress the voice of user 210 so that users 212 and 214 cannot hear user 210. In response to a second action of user 210, such as selecting a function on an application executing on user interface device 250, user interface device 250 transmits a request to central communications controller 240 to discontinue suppressing the voice of user 210. In response, central communications controller 240 forwards the request to auditory scene controllers 222 and 224. Auditory scene controllers 222 and 224 discontinue suppressing the voice of user 210 so that users 212 and 214 can again hear user 210.
HDLs 260 are loudspeakers that generate sound wave patterns with a relatively high degree of directivity (narrowness), rather than the more typical omnidirectional sound wave pattern generated by conventional loudspeakers. Consequently, a given HDL 260 may direct sound at a particular listener, such that the listener hears the sound generated by the HDL 260, but another person sitting just to the left or just to the right of the listener does not hear the sound generated by the HDL 260. For example, and without limitation, HDL 260(1) and HDL 260(2) could be configured to direct sound at the right ear and left ear, respectively, of user 210. HDL 260(5) and HDL 260(6) could be configured to direct sound at the right ear and left ear, respectively, of user 212. HDL 260(10) and HDL 260(11) could be configured to direct sound at the right ear and left ear, respectively, of user 214. Although fourteen HDLs 260(0)-260(13) are shown, any technically feasible quantity of HDLs 260 may be employed, to accommodate any technically feasible quantity of users 210, 212, and 214, within the scope of this disclosure.
The various components of
As shown, the functionality of auditory scene controller 100 may be incorporated into a wearable device that may be worn or carried by a user. In one embodiment, auditory scene controller 100 may be incorporated into an in-ear device worn by the user. In alternative embodiments, the functionality of auditory scene controller 100 may be incorporated into a head-mounted auditory device that includes at least one of a microphone and a speaker, including, for example and without limitation, a Bluetooth headset, shoulder worn speakers, headphones, ear buds, hearing aids, in-ear monitors, speakers embedded into a headrest, or any other device having the same effect or functionality. Auditory scene controller 100 may be coupled to a device that includes a user interface for configuring auditory scenes, including, without limitation, a smartphone, a computer, and a tablet computer. Auditory scene controller 100 may be coupled to such a device via any technically feasible approach, including, without limitation, a wireless link, a hardwired connection, and a network connection. Wireless links may be made via any technically feasible wireless communication link, including, without limitation, a WiFi™ link, a Bluetooth connection, or a generic radio frequency (RF) connection. In practice, auditory scene controller 100 may establish a communication link with a wide range of different wireless devices beyond those illustrated. The specific devices 250, 252, and 254 illustrated in
In the configuration of
In the configuration of
In the configuration of
In the configuration of
In the configuration of
Persons skilled in the art will understand that the exemplary use-case scenarios described above in conjunction with
Having described various use cases and systems for generating various configurations of auditory scenes, exemplary algorithms that may be implemented by auditory scene controller 100 are now described. By implementing the functionality described thus far, auditory scene controller 100 may improve the ability of individuals to simultaneously conduct various conversations in the same space without interfering with each other.
As shown, a method 800 begins at step 802, where auditory scene controller 100 discovers nearby wireless devices, including, without limitation, other auditory scene controllers and a central communications controller. Auditory scene controller 100 may perform any technically feasible form of device discovery, including, and without limitation, locating a WiFi™ access point and then identifying other devices coupled thereto, interacting directly with nearby Bluetooth devices, or performing generic handshaking with wireless devices using RF signals.
At step 804, auditory scene controller 100 obtains device information from each discovered device that reflects, among other things, device capabilities. The capabilities could include, for example, and without limitation, a preferred wireless connection protocol (e.g., WiFi™, Bluetooth, without limitation), a maximum quantity of auditory scenes supported by the device, and so forth. Other device information could include, for example, and without limitation, a device position, a device battery level, etc.
At step 806, auditory scene controller 100 pairs with one or more of the discovered devices. In doing so, auditory scene controller 100 may rely on any relevant protocol. In addition, auditory scene controller 100 may pair with different devices that rely on different protocols.
At step 808, auditory scene controller 100 configures command routing preferences for paired devices, as needed. In doing so, auditory scene controller 100 may communicate directly with other auditory scene controllers in a peer-to-peer network. Alternatively, auditory scene controller 100, along with other auditory scene controllers, communicates directly with only central communications controller 240, and central communications controller 240 communicates with each of the auditory scene controllers separately.
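By way of illustration only, and without limitation, steps 802 through 808 can be sketched in Python as follows. Device discovery itself is stubbed out: the sketch takes an already-discovered list of devices as input. The `DeviceInfo` fields, the function `configure_controller`, and the rule of routing through a central communications controller whenever one has been paired are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical device information obtained at step 804."""
    device_id: str
    kind: str         # "scene_controller" or "central_controller"
    protocol: str     # preferred wireless protocol, e.g. "wifi" or "bluetooth"
    max_scenes: int   # maximum quantity of auditory scenes supported


def configure_controller(discovered, supported_protocols):
    """Pair with usable devices and choose a next hop for each peer.

    `discovered` stands in for the results of steps 802 and 804.
    Returns (paired devices, routes), where routes maps the id of each
    paired auditory scene controller to the id of the device that
    commands for it are sent to.
    """
    # Step 806: pair only with devices whose preferred protocol we support.
    paired = [d for d in discovered if d.protocol in supported_protocols]

    # Step 808: route via a central controller if one was paired (assumed
    # policy); otherwise address each peer directly.
    hubs = [d for d in paired if d.kind == "central_controller"]
    peers = [d for d in paired if d.kind == "scene_controller"]
    if hubs:
        routes = {d.device_id: hubs[0].device_id for d in peers}
    else:
        routes = {d.device_id: d.device_id for d in peers}
    return paired, routes


devices = [
    DeviceInfo("ctl-222", "scene_controller", "bluetooth", 4),
    DeviceInfo("ctl-224", "scene_controller", "zigbee", 2),
    DeviceInfo("hub-240", "central_controller", "wifi", 8),
]
paired, routes = configure_controller(devices, {"wifi", "bluetooth"})
```

In this example the second controller is not paired because its preferred protocol is unsupported, and commands for the first controller are routed through the hub.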
As shown, a method 900 begins at step 902, where auditory scene controller 100 initializes to a state where the audio input signal received from microphone module 110 is transmitted to speaker module 140 without alteration. At step 904, auditory scene controller 100 receives an audio scene request, for example, and without limitation, by receiving a request from another auditory scene controller or from central communications controller 240.
At step 906, auditory scene controller 100 determines whether the audio scene request was a request to suppress an audio voice component, such as a voice associated with another auditory scene controller. If the audio scene request is a voice suppress request, then the method 900 proceeds to step 908, where auditory scene controller 100 generates an audio signal that includes the received audio input signal with the requested voice component suppressed. At step 910, auditory scene controller 100 transmits the generated audio signal to speaker module 140. The method 900 then proceeds to step 904, described above.
If, at step 906, the audio scene request is not a voice suppress request, then the method 900 proceeds to step 912, where auditory scene controller 100 determines whether the audio scene request was a request to discontinue suppressing an audio voice component, such as a voice associated with another auditory scene controller. If the audio scene request is a stop voice suppress request, then the method 900 proceeds to step 914, where auditory scene controller 100 generates an audio signal that includes the received audio input signal with the requested voice component mixed back into the signal. At step 916, auditory scene controller 100 transmits the generated audio signal to speaker module 140. The method 900 then proceeds to step 904, described above.
If, at step 912, the audio scene request is not a stop voice suppress request, then the method 900 proceeds to step 904, described above.
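By way of illustration only, and without limitation, the request-handling flow of method 900 can be sketched in Python as follows. The sketch assumes that each voice component of the audio input signal is already available separately, keyed by an identifier, which sidesteps the separation problem entirely. The class `SceneRequestHandler`, its methods, and the request names are hypothetical.

```python
class SceneRequestHandler:
    """Hypothetical request handler following the flow of method 900."""

    def __init__(self):
        # Step 902: start in pass-through, with no voice suppressed.
        self.suppressed = set()

    def handle_request(self, kind, voice_id):
        """Process one request (step 904); return True if the output changes."""
        if kind == "suppress":                # step 906
            self.suppressed.add(voice_id)     # step 908
            return True
        if kind == "discontinue":             # step 912
            self.suppressed.discard(voice_id) # step 914
            return True
        return False                          # unrecognized: wait for the next request

    def render(self, components):
        """Steps 910 and 916: produce the output sample sent to the speaker.

        `components` maps a voice identifier to that voice's sample value
        (an assumed, pre-separated representation of the input signal).
        """
        return sum(
            sample for voice_id, sample in components.items()
            if voice_id not in self.suppressed
        )


handler = SceneRequestHandler()
frame = {210: 0.5, 212: 0.25}
before = handler.render(frame)          # both voices pass through
handler.handle_request("suppress", 210)
during = handler.render(frame)          # voice 210 is left out
handler.handle_request("discontinue", 210)
after = handler.render(frame)           # voice 210 is mixed back in
```

A request of any other kind leaves the set of suppressed voices unchanged, which corresponds to returning to step 904 without generating a new signal.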
In sum, an auditory scene controller is configured to generate multiple auditory scenes in a physical environment. The auditory scene controller can bidirectionally isolate a user of the auditory scene controller by suppressing all voices in the incoming audio signal and sending a request to other auditory scene controllers to suppress the user's voice so that the user's voice is not heard by other users. Alternatively, the auditory scene controller can unidirectionally isolate the user by suppressing all voices in the incoming audio signal, but allowing the user's voice to be heard by other users. Alternatively, the auditory scene controller can unidirectionally isolate the user by allowing all voices in the incoming audio signal to be heard by the user, but sending a request to other auditory scene controllers to suppress the user's voice so that the user's voice is not heard by other users. Conversational bubbles may be generated to allow a subgroup of several people to converse with each other in the subgroup, but be isolated from the conversation of other users in the main group.
At least one advantage of the approach described herein is that participants in a group may engage in multiple conversations while maintaining appropriate privacy for each conversation and reducing or eliminating disruption to other conversations. As a result, important conversations are not deferred and multiple conversations are accommodated without the need to find separate physical space to accommodate each separate conversation.
One embodiment of the invention may be implemented as a program product for use with a computer system. The program(s) of the program product define functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as compact disc read only memory (CD-ROM) disks readable by a CD-ROM drive, flash memory, read only memory (ROM) chips or any type of solid-state non-volatile semiconductor memory) on which information is permanently stored; and (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive or any type of solid-state random-access semiconductor memory) on which alterable information is stored.
The invention has been described above with reference to specific embodiments. Persons of ordinary skill in the art, however, will understand that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The foregoing description and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Therefore, the scope of embodiments of the present invention is set forth in the claims that follow.