Noise reduction is a process of removing or reducing background noise from a sound signal such that a desired sound can be more noticeable. For example, a desired sound may be a conversation with another person or music played via a speaker or headphone. The desired sound, however, can sometimes be obscured or even rendered inaudible due to background noises. Examples of background noises can include sounds from traffic, alarms, power tools, air conditioning, or other sound sources. By reducing or removing background noises, a desired sound can be more readily detected, especially by people who are hearing impaired.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Various techniques have been developed to reduce or remove background noises from a sound signal. For example, certain hearing aids can detect and remove background noises at certain frequencies via spectral extraction, non-linear processing, finite impulse response filtering, or other suitable techniques. By applying such techniques, background noises can be suppressed or attenuated to emphasize human speech. In another example, a noise canceling headphone can detect ambient noises (e.g., sounds from refrigerators, fans, etc.) from outside the headphone using one or more microphones. The detected ambient noises can then be removed or suppressed by applying corresponding sound waves with opposite amplitudes. As such, music, conversations, or other suitable sound played through the headphone can be heard without interference from the ambient noises.
The foregoing techniques for attenuating background noises, however, have certain drawbacks. For instance, removing background noises from a detected sound signal may also remove important information contained in the background noises. In one example, background noises from a detected sound signal may contain sounds of an alarm, a door knock, an emergency siren, an approaching vehicle, etc. In another example, a person wearing a noise canceling headphone may not notice someone is calling his/her name or is shouting out a warning about on-coming traffic or other dangers. As such, removing background noises can render a person less aware of his/her environment, and thus negatively impact his/her safety, interactions with other people, or other aspects of the person's daily life.
Several embodiments of the disclosed technology can address at least certain aspects of the foregoing drawbacks by implementing intelligent information capturing in a sound device. In one embodiment, a sound device can be a hearing aid suitable for improving hearing ability of a person with hearing impairment. In other embodiments, the sound device can also include a noise canceling headphone, a noise isolating headphone, or other suitable types of listening device. In some embodiments, the sound device can include one or more microphones, one or more speakers, a processor, and a memory containing data representing a set of sound models. The processor of the sound device can be configured to execute instructions to perform intelligent information capturing based on the sound models, as described in more detail below.
In certain embodiments, the microphones of the sound device can be configured to capture a sound signal from an environment in which the sound device is located. The captured sound signal is referred to herein as an original sound and can have a frequency range, such as from about 100 Hz to about 8000 Hz, from about 600 Hz to about 1600 Hz, or other suitable values. In certain implementations, the original sound can be divided into a number of frequency bands, for instance, ten to fifteen frequency bands from about 100 Hz to about 8000 Hz. The original sound can then be digitized, for instance, by converting an analog signal from the microphones at each frequency band (or in other suitable manners) into a digital signal (referred to herein as a “digitized signal”) using an analog-to-digital converter (ADC). The digitized signal can then be compared with one or more sound models stored at the memory of the sound device or otherwise accessible by the sound device via, for instance, a computer network such as the Internet.
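For illustration only, the band-division step described above can be sketched in Python as follows. The 16 kHz sample rate, the logarithmically spaced band edges, and the use of the numpy library are assumptions made for this sketch rather than requirements of the disclosed technology.

    # A minimal sketch of dividing a digitized signal into frequency bands.
    import numpy as np

    SAMPLE_RATE = 16_000  # Hz; an assumed capture rate covering ~100-8000 Hz

    def band_energies(signal, num_bands=12, f_lo=100.0, f_hi=8000.0):
        """Return the energy of the signal in each of num_bands bands."""
        spectrum = np.abs(np.fft.rfft(signal)) ** 2
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / SAMPLE_RATE)
        edges = np.geomspace(f_lo, f_hi, num_bands + 1)  # log-spaced edges
        energies = np.empty(num_bands)
        for i in range(num_bands):
            mask = (freqs >= edges[i]) & (freqs < edges[i + 1])
            energies[i] = spectrum[mask].sum()
        return energies

    # Example: a 1 kHz tone concentrates its energy in the band around 1 kHz.
    t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
    print(band_energies(np.sin(2 * np.pi * 1000 * t)).round(1))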
The sound models can individually include an identification of a sound, one or more corresponding sound signature(s) of the sound, and one or more corresponding actions. For instance, one example sound model can identify a known sound of an approaching vehicle. Another example sound model can identify a sound of an emergency siren or an alarm. A further example sound model can identify human speech. Example sound signatures can include values, value ranges, or patterns of frequency, frequency distribution, sound amplitude at frequency bands, frequency/amplitude variations (e.g., repetitions, attenuations, etc.), and/or other suitable parameters of the corresponding sound.
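One possible in-memory layout for such a sound model is sketched below. The field names, the residual-threshold convention, and the use of Python dataclasses are illustrative assumptions, not elements of the disclosure.

    # A sound model: an identification, one or more signatures, and actions.
    from dataclasses import dataclass, field
    from typing import Callable, List

    import numpy as np

    @dataclass
    class SoundSignature:
        mean_vector: np.ndarray   # mean frequency vector of the training set
        eigenvectors: np.ndarray  # principal components, one per row
        threshold: float          # residual magnitude below which we match

    @dataclass
    class SoundModel:
        sound_id: str             # e.g., "approaching_vehicle" or "siren"
        signatures: List[SoundSignature]
        actions: List[Callable[[], None]] = field(default_factory=list)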
The sound signatures can be developed according to various suitable techniques. In certain implementations, a model developer can be configured to develop the sound signatures from a training dataset. For instance, a sample sound (e.g., a sound from an approaching vehicle) can be captured using one or more microphones and then digitized using an ADC into a training dataset. According to one example technique, the model developer can then treat frequency spectra of the training dataset as vectors in a high-dimensional frequency feature domain. In such a domain, a mean frequency vector of the training dataset can be calculated and then subtracted from each vector in the training dataset. To capture variation of the frequency vectors within the training dataset, eigenvectors of the covariance matrix of the zero-mean-adjusted training dataset can be calculated. The eigenvectors represent principal components of the vector distribution. For each eigenvector, a corresponding eigenvalue indicates an importance level of the eigenvector in capturing the vector distribution. Thus, for each training dataset, the mean vector and the most important eigenvectors together can represent a sound signature of, in this example, the sound of the approaching vehicle.
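The signature construction just described can be sketched as follows, assuming each row of training_spectra holds one frequency-spectrum vector (for example, the per-band energies of one training sample); the number of retained components is an arbitrary illustrative choice.

    # Build a sound signature: mean vector plus top principal components.
    import numpy as np

    def build_signature(training_spectra, num_components=3):
        mean_vector = training_spectra.mean(axis=0)
        centered = training_spectra - mean_vector      # zero-mean adjustment
        cov = np.cov(centered, rowvar=False)           # covariance matrix
        eigenvalues, eigenvectors = np.linalg.eigh(cov)
        order = np.argsort(eigenvalues)[::-1]          # most important first
        principal = eigenvectors[:, order[:num_components]].T
        return mean_vector, principal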
During operation, when a new sound (not in the training dataset) is detected, the processor of the sound device can be configured to compare a spectrum vector of the captured new sound against the mean vector of the sound model. The difference vector can then be projected onto the principal component directions to find a residual vector. The coefficients of the residual vector can then be used to identify whether the new sound is a sound from a vehicle as represented in the training dataset. For example, a magnitude of the residual vector can measure the extent to which the captured new sound deviates from that in the sound model. In certain embodiments, if the magnitude of the residual vector is below a preset threshold, the sound device can indicate that the captured new sound matches that in the training dataset. In other embodiments, the captured new sound can be deemed to match the sound in the training dataset based on other suitable criteria.
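A corresponding matching step might look like the sketch below, in which the residual vector is the part of the difference vector not explained by the principal components; the default threshold is an assumption that would be tuned per sound model.

    # Match a newly captured spectrum against a stored sound signature.
    import numpy as np

    def matches_signature(new_spectrum, mean_vector, principal, threshold=0.5):
        diff = new_spectrum - mean_vector
        coefficients = principal @ diff             # projection coefficients
        explained = principal.T @ coefficients      # part the model captures
        residual = diff - explained                 # part the model misses
        return np.linalg.norm(residual) < threshold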
In other implementations, the model developer can be configured to identify sound signatures based on training datasets using a “neural network” or “artificial neural network” configured to “learn” or progressively improve performance of tasks by studying known examples. In certain implementations, a neural network can include multiple layers of objects generally referred to as “neurons” or “artificial neurons.” Each neuron can be configured to perform a function, such as a non-linear activation function, based on one or more inputs via corresponding connections. Artificial neurons and connections typically have a contribution value that adjusts as learning proceeds. The contribution value increases or decreases a strength of an input at a connection. Typically, artificial neurons are organized in layers. Different layers may perform different kinds of transformations on respective inputs. Signals typically travel from an input layer to an output layer, possibly after traversing one or more intermediate layers. Thus, by using a neural network, the model developer can provide a set of sound models that can be used by the sound device to recognize certain sounds (e.g., approaching vehicles, human speech, etc.) in the captured sound signal. In additional implementations, the model developer can be configured to perform sound signature identification based on user provided rules or via other suitable techniques.
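The layered structure described above can be illustrated with the minimal forward pass below. The weight matrices play the role of the contribution values that learning would adjust; here they are random stand-ins rather than trained values, and the layer sizes are arbitrary.

    # A two-layer artificial neural network classifying band energies.
    import numpy as np

    rng = np.random.default_rng(0)

    NUM_BANDS = 12    # input layer: one value per frequency band
    HIDDEN = 16       # intermediate layer size (an arbitrary choice)
    NUM_SOUNDS = 3    # output layer: e.g., vehicle, siren, speech

    w1 = rng.normal(size=(HIDDEN, NUM_BANDS))   # contribution values, layer 1
    w2 = rng.normal(size=(NUM_SOUNDS, HIDDEN))  # contribution values, layer 2

    def classify(band_energy_vector):
        hidden = np.tanh(w1 @ band_energy_vector)  # non-linear activation
        scores = w2 @ hidden                       # output layer
        return int(np.argmax(scores))              # best-matching sound index

    print(classify(rng.random(NUM_BANDS)))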
In any of the foregoing implementations, upon identifying that the digitized signal of the captured sound signal matches at least one sound model, the sound device can be configured to perform one or more corresponding actions included in the sound model. For instance, the sound device can be configured to determine whether the captured sound signal represents and/or includes human speech. In certain embodiments, upon determining that the detected sound includes human speech, the sound device can be configured to playback the human speech directly to a user of the sound device via the one or more speakers. In other embodiments, upon determining that the captured sound signal includes human speech, the sound device can be configured to extract the human speech (e.g., via spectral extraction and/or signal-to-noise enhancement) and perform speech-to-text conversion to derive a speech text via, for instance, feature extraction or other suitable techniques.
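The disclosure does not fix a particular method for deciding whether a captured sound includes human speech. The short-time energy and zero-crossing heuristic below is offered only as a common stand-in, and its thresholds are assumptions that would need tuning.

    # Crude voice-activity check on one audio frame with samples in [-1, 1].
    import numpy as np

    def looks_like_speech(frame, energy_thresh=0.01, zcr_lo=0.02, zcr_hi=0.35):
        energy = np.mean(frame ** 2)
        # zero-crossing rate: fraction of adjacent samples changing sign
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
        return energy > energy_thresh and zcr_lo < zcr < zcr_hi

    rng = np.random.default_rng(1)
    print(looks_like_speech(rng.normal(scale=0.3, size=320)))  # noise: False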
Based on the derived speech text, the sound device can be configured to perform various additional actions indicated in the corresponding sound model. For example, the sound device can be configured to determine whether the speech text represents a command from the user of the sound device. For instance, the speech text can include a command such as “up volume” or “lower volume.” In response, the sound device can be configured to incrementally or in other suitable manners increase a volume setting on the speakers of the sound device. In another example, the sound device may be operatively coupled to a computing device (e.g., a smartphone), and the speech text can include a command for interacting with the computing device, such as “call home.” In further examples, the sound device and/or the computing device can be communicatively coupled to a digital assistant, such as Alexa provided by Amazon.com of Seattle, Washington. The command can include a command that interacts with the digital assistant. For instance, the command can cause the digital assistant to perform certain operations, such as creating a calendar item, sending an email, turning on a light, etc.
In yet further examples, the sound device can be configured to determine whether the speech text includes one or more keywords preidentified by the user and perform a corresponding preset operation accordingly. For example, a keyword can be selected by the user to include the user's name (e.g., “Bob”). Upon determining that the speech text represents someone calling “Bob,” in one embodiment, the sound device can be configured to playback a preconfigured message to the user via the speakers of the sound device, such as “someone just called your name.” In another instance, the sound device can also provide a text, sound, or other suitable forms of notification on a connected device, such as a smartphone, in addition to or in lieu of performing playback of the preconfigured message.
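The command and keyword handling described in the preceding paragraphs can be sketched as a simple dispatch over the derived speech text. The command strings and the play_message and adjust_volume helpers are hypothetical placeholders standing in for the speaker and volume control of an actual sound device.

    # Dispatch a derived speech text to commands or keyword notifications.
    def handle_speech_text(text, keywords, play_message, adjust_volume):
        normalized = text.lower().strip()
        if normalized in ("up volume", "volume up"):
            adjust_volume(+1)                   # incrementally raise volume
        elif normalized in ("lower volume", "volume down"):
            adjust_volume(-1)
        else:
            for keyword in keywords:            # e.g., the user's name "Bob"
                if keyword.lower() in normalized:
                    play_message("someone just called your name")
                    break

    # Example wiring with trivial stand-ins:
    handle_speech_text("Hey Bob!", keywords=["Bob"],
                       play_message=print,
                       adjust_volume=lambda step: print("volume", step))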
In response to determining that the captured sound signal does not include human speech, the sound device can be configured to identify one or more known sounds (e.g., a sound of an approaching vehicle) from the digitized signal based on the sound models. Upon identifying one or more known sounds, the sound device can be configured to select for playback a preconfigured message corresponding to the detected known sounds. For example, upon determining that the identified sound is that of an approaching vehicle, the sound device can be configured to select a preconfigured message such as “warning, vehicle approaching.” In one embodiment, the sound device can then be configured to perform text-to-speech conversion of the selected preconfigured message and then playback the message to the user via the speakers of the sound device. In other embodiments, the sound device can also be configured to provide a text, a sound, a flashing light, or other suitable forms of notification on a connected device (e.g., a smartphone) in addition to or in lieu of playing back the selected message.
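The non-speech branch can be sketched as a lookup from the identified sound to a preconfigured message that is then handed to a text-to-speech stage. The message table and the tts stand-in below are illustrative assumptions.

    # Map an identified known sound to its preconfigured warning message.
    PRESET_MESSAGES = {
        "approaching_vehicle": "warning, vehicle approaching",
        "emergency_siren": "warning, emergency siren nearby",
        "door_knock": "someone is knocking at the door",
    }

    def announce_known_sound(sound_id, tts):
        message = PRESET_MESSAGES.get(sound_id)
        if message is not None:
            tts(message)  # convert the preset text to speech and play it

    announce_known_sound("approaching_vehicle", tts=print)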
Several embodiments of the disclosed technology can thus improve the user's awareness of his/her environment by capturing useful information that is normally discarded when suppressing background noises. For example, by identifying a sound of a vehicle approaching, an emergency siren, or other alarms from background noises, the sound device can promptly provide notifications to the user via the speakers of the sound device and/or a connected smartphone. As such, safety of the person can be improved. In another example, by identifying that a captured sound signal includes a door knock or someone calling the user's name, interaction and attentiveness of the user can also be improved.
In the foregoing description, various operations of intelligent information capturing are described as being performed by the processor of the sound device. In other implementations, at least some of the foregoing operations of intelligent information capturing can be performed by a computing device (e.g., a smartphone) operatively coupled to the sound device via, for instance, a Bluetooth, Wi-Fi, or other suitable connection. As such, the set of sound models can be stored in the computing device instead of the sound device. In further implementations, the sound device and/or the computing device can be communicatively connected to a remote server (e.g., a server in a cloud computing data center), and at least some of the operations of intelligent information capturing, such as identifying sound(s) based on sound models, can be performed by a virtual machine, a container, or other suitable components of the remote server.
Certain embodiments of systems, devices, components, modules, routines, data structures, and processes for intelligent information capturing in sound devices are described below. In the following description, specific details of components are included to provide a thorough understanding of certain embodiments of the disclosed technology. A person skilled in the relevant art will also understand that the technology can have additional embodiments. The technology can also be practiced without several of the details of the embodiments described below with reference to the accompanying figures.
As used herein, “sound” generally refers to a vibration that can propagate as a wave of pressure through a transmission medium such as a gas (e.g., air), liquid (e.g., water), or solid (e.g., wood). A sound can be captured using an acoustic/electric device such as a microphone to convert the sound into an electrical signal. In certain implementations, the electrical signal can be an analog sound signal. In other implementations, the electrical signal can be a digital sound signal obtained by, for example, sampling the analog sound signal using an ADC. A sound can be produced using an electroacoustic transducer, such as a speaker that converts an electrical signal into a corresponding sound.
Also used herein, an “ambient sound” generally refers to a composite sound that can be captured by a microphone or heard by a person in an environment in which the microphone or person resides. Ambient sound can include both desired sound, such as a conversation with another person or music played in a speaker or headphone, and unwanted sound referred to herein as “noise,” “background noise,” or “ambient noise.” Examples of noises can include sounds from traffic, alarms, power tools, air conditioning, or other sound sources.
Noises in an ambient sound can sometimes obscure or even render inaudible desired sound, such as a desired conversation or music. Various techniques have been developed to reduce or remove background noises from a sound signal. For example, certain hearing aids can detect and remove background noises at certain frequencies via spectral extraction, non-linear processing, finite impulse response filtering, or other suitable techniques. By applying such techniques, background noises can be suppressed or attenuated to emphasize desired human speech. In another example, a noise canceling headphone can detect ambient noises (e.g., sounds from refrigerators, fans, etc.) from outside the headphone using one or more microphones. The detected ambient noises can then be removed or suppressed by applying corresponding sound waves with opposite amplitudes. As such, desired music, conversations, or other suitable sound played through the headphone can be heard without interference from the ambient noises.
The foregoing techniques for attenuating background noises, however, have certain drawbacks. For instance, removing background noises from an ambient sound may also remove important information contained in the background noises. In one example, the background noises may contain sounds of an alarm, a door knock, an emergency siren, an approaching vehicle, etc. In another example, a person wearing a noise canceling headphone may not notice someone is calling his/her name or is shouting out a warning about on-coming traffic or other dangers. As such, removing background noises can render a person less aware of his/her environment, and thus negatively impact his/her safety, interactions with other people, or other aspects of the person's daily life.
Several embodiments of the disclosed technology can address at least certain aspects of the foregoing drawbacks by implementing intelligent information capturing in a sound device, such as a hearing aid or headphone. In certain embodiments, an ambient sound can be captured using a microphone. The ambient sound can then be digitized into a digital sound signal. A sound device can then analyze the digital sound signal to determine whether the digital sound signal contains one or more sound profiles that match sound signatures in one or more sound models. In response to determining that the digital sound signal has a sound profile that matches the sound signature of one of the sound models, the sound device can output, via the speaker, an audio message to the user identifying the known sound while suppressing the captured ambient sound from the environment. As such, ambient noises can be suppressed while useful information from the suppressed ambient noises can be maintained, as described in more detail below with reference to the figures.
Components within a system may take different forms within the system. As one example, a system comprising a first component, a second component and a third component can, without limitation, encompass a system that has the first component being a property in source code, the second component being a binary compiled library, and the third component being a thread created at runtime. The computer program, procedure, or process may be compiled into object, intermediate, or machine code and presented for execution by one or more processors of a personal computer, a network server, a laptop computer, a smartphone, and/or other suitable computing devices.
Equally, components may include hardware circuitry. A person of ordinary skill in the art would recognize that hardware may be considered fossilized software, and software may be considered liquefied hardware. As just one example, software instructions in a component may be burned to a Programmable Logic Array circuit or may be designed as a hardware circuit with appropriate integrated circuits. Equally, hardware may be emulated by software. Various implementations of source, intermediate, and/or object code and associated data may be stored in a computer memory that includes read-only memory, random-access memory, magnetic disk storage media, optical storage media, flash memory devices, and/or other suitable computer readable storage media excluding propagated signals.
As shown in the figures, an environment 100 can include a user 101 wearing a sound device 102 configured to perform intelligent information capturing.
In one embodiment, the sound device 102 can be a hearing aid suitable for improving hearing of the user 101 with hearing impairment. In other embodiments, the sound device 102 can also include a noise canceling headphone, a noise isolating headphone, or other suitable types of listening device. As shown in the figures, the sound device 102 can include a microphone 105, a speaker 106, a processor 104, and a memory 108 operatively coupled to one another.
The microphone 105 can be configured to capture the ambient sound 122. The speaker 106 can be configured to produce an output sound 103 to the user 101. In certain embodiments, the microphone 105 can be configured to capture the ambient sound 122 from the environment 100. The captured ambient sound 122 can have a frequency range, such as from about 100 Hz to about 8000 Hz, from about 600 Hz to about 1600 Hz, or other suitable values. In certain implementations, the captured ambient sound 122 can be divided into a number of frequency bands, for instance, ten to fifteen frequency bands from about 100 Hz to about 8000 Hz. The captured ambient sound 122 can then be digitized, for instance, by converting an analog signal from the microphone 105 at each frequency band (or in other suitable manners) into a digital signal (shown in the figures as digitized signal 124) using an analog-to-digital converter (ADC).
The processor 104 can include a microprocessor, a field-programmable gate array, and/or other suitable logic devices. The memory 108 can include volatile and/or nonvolatile media (e.g., ROM, RAM, magnetic disk storage media, optical storage media, flash memory devices, and/or other suitable storage media) and/or other types of computer-readable storage media configured to store data, such as records of sound models 110, as well as instructions for the processor 104 (e.g., instructions for performing the methods discussed below).
The sound signatures can be developed according to various suitable techniques. In certain implementations, a model developer 130 can be configured to develop the sound signatures from training datasets, for example, using the eigenvector-based or neural network techniques described above.
The processor 104 can be configured to execute suitable instructions to provide certain components for facilitating intelligent information capturing in the sound device 102. For example, as shown in the figures, the processor 104 can execute an interface component 132, an analysis component 134, and a control component 136 operatively coupled to one another.
The interface component 132 can be configured to receive input from the microphone 105 as well as provide an output to the speaker 106. In one embodiment, the interface component 132 can include an analog-to-digital converter configured to convert analog signals from the microphone 105 into the digitized signal 124.
As shown in the figures, the analysis component 134 can be configured to compare the digitized signal 124 with the sound models 110 to determine whether the captured ambient sound 122 includes one or more known sounds.
Upon identifying that the digitized signal 124 of the captured ambient sound 122 matches at least one sound model 110, the analysis component 134 can be configured to indicate such matching and provide, for example, a sound identification of the matched sound model 110 to the control component 136 for further processing.
In response to determining that the captured ambient sound 122 does not include human speech, the control component 136 can be configured to identify one or more known sounds and select for playback a preconfigured message corresponding to the detected known sounds. For example, upon determining that the identified sound is that of an approaching vehicle, the control component 136 can be configured to select a preset message 140 such as “warning, vehicle approaching.”
In one embodiment, the control component 136 can then be configured to perform text-to-speech conversion of the selected preset message 140 and then playback the converted message to the user 101 via the speaker 106. In another embodiment, the control component 136 can be configured to select a sound file (not shown) corresponding to the preset message 140 and then instruct the speaker 106 to playback the sound file. In other embodiments, the control component 136 can also be configured to provide a text, a sound, a flashing light, or other suitable forms of notification 142 on a connected device, such as the mobile device 111, in addition to or in lieu of playing back the preset message 140 via the speaker 106.
In response to determining that the ambient sound 122 includes human speech, in one embodiment, the control component 136 can be configured to playback the human speech directly to the user of the sound device 102 via the speaker 106. In other embodiments, the control component 136 can be configured to extract the human speech (e.g., via spectral extraction and/or signal-to-noise enhancement) and perform speech-to-text conversion to derive a text string.
In one implementation, the control component 136 can be configured to determine whether the text string represents a command to the sound device 102, such as “volume up” or “volume down.” In response to determining that the text string represents a command to the sound device 102, the control component 136 can be configured to execute the command to, for instance, adjust a volume of the speaker 106. In another implementation, the control component 136 can be configured to determine whether the text string represents a command to a digital assistant (e.g., Alexa provided by Amazon.com of Seattle, Washington). In response to determining that the text string represents a command to a digital assistant, the control component 136 can be configured to transmit the command to the digital assistant via a computer network (not shown) and/or provide output to the user 101 upon receiving feedback from the digital assistant. In further implementations, the control component 136 can also be configured to determine whether the text string includes one or more keywords pre-identified by the user 101. Examples of the keywords can include a name (e.g., “Bob”) of the user 101. In response to determining that the text string includes one or more keywords pre-identified by the user 101, the control component 136 can be configured to output an audio message to the user 101 informing the user 101 that the one or more keywords have been detected. For instance, upon detecting the keyword “Bob” in the text string, the control component 136 can be configured to output an audio message such as “someone just called your name” via the speaker 106.
In further embodiments, the control component 136 can also be configured to perform sound suppression, compensation, or other suitable operations. For example, the control component 136 can be configured to modify an amplitude of one or more frequency ranges of the captured ambient sound 122 and output the captured ambient sound 122, via the speaker 106, with the modified amplitude at the one or more frequency ranges along with the preset message 140. In another example, the control component 136 can also be configured to generate another digital or analog sound signal (not shown) having the multiple frequency ranges with corresponding amplitudes opposite those of the captured ambient sound 122 and output, via the speaker 106, the generated sound signal along with the preset message 140 to at least partially cancel or attenuate the ambient sound 122.
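Both suppression operations can be sketched naively in the time domain as follows; a practical device would operate per frequency band and manage latency carefully, and the gain value below is an assumption.

    # Attenuation and opposite-amplitude cancellation of an ambient signal.
    import numpy as np

    def attenuate(ambient, gain=0.2):
        """Output the captured sound with its amplitude reduced."""
        return gain * ambient

    def anti_phase(ambient):
        """A signal with opposite amplitude; played together with the
        ambient sound it cancels, since ambient + (-ambient) == 0."""
        return -ambient

    ambient = np.sin(np.linspace(0, 2 * np.pi, 8))
    print(np.all(np.abs(attenuate(ambient)) <= 0.2))        # True
    print(np.allclose(ambient + anti_phase(ambient), 0))    # True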
Even though output provided to the user 101 is shown as being through the speaker 106 in the figures, in other embodiments, output can also be provided to the user 101 via other suitable devices, such as the mobile device 111.
In further embodiments, at least some of the operations of intelligent information capturing can be performed on the mobile device 111. For instance, the sound models 110 can be stored on the mobile device 111 instead of the sound device 102, and the mobile device 111 can perform analysis of the digitized signal 124 based on the sound models 110.
In yet further embodiments, at least some of the operations of intelligent information capturing can be performed on a remote server 121 (e.g., a server in a cloud computing datacenter) communicatively coupled to the sound device 102 and/or the mobile device 111. For example, identifying sounds based on the sound models 110 can be performed by a virtual machine, a container, or other suitable components of the remote server 121.
As shown in the figures, a computing device 300 can be suitable for certain components of the environment 100 described above, such as the mobile device 111 or the remote server 121.
As shown in the figures, in a very basic configuration 302, the computing device 300 can include one or more processors 304 and a system memory 306. A memory bus can be used for communicating between the processor 304 and the system memory 306.
Depending on the desired configuration, the processor 304 can be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 304 can include one or more levels of caching, such as a level-one cache 310 and a level-two cache 312, a processor core 314, and registers 316. An example processor core 314 can include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 318 can also be used with the processor 304, or in some implementations the memory controller 318 can be an internal part of the processor 304.
Depending on the desired configuration, the system memory 306 can be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. The system memory 306 can include an operating system 320, one or more applications 322, and program data 324. This described basic configuration 302 is illustrated in the figures.
The computing device 300 can have additional features or functionality, and additional interfaces to facilitate communications between basic configuration 302 and any other devices and interfaces. For example, a bus/interface controller 330 can be used to facilitate communications between the basic configuration 302 and one or more data storage devices 332 via a storage interface bus 334. The data storage devices 332 can be removable storage devices 336, non-removable storage devices 338, or a combination thereof. Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), and tape drives to name a few. Example computer storage media can include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. The term “computer readable storage media” or “computer readable storage device” excludes propagated signals and communication media.
The system memory 306, removable storage devices 336, and non-removable storage devices 338 are examples of computer readable storage media. Computer readable storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other media that can be used to store the desired information and that can be accessed by computing device 300. Any such computer readable storage media can be a part of computing device 300. The term “computer readable storage medium” excludes propagated signals and communication media.
The computing device 300 can also include an interface bus 340 for facilitating communication from various interface devices (e.g., output devices 342, peripheral interfaces 344, and communication devices 346) to the basic configuration 302 via bus/interface controller 330. Example output devices 342 include a graphics processing unit 348 and an audio processing unit 350, which can be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 352. Example peripheral interfaces 344 include a serial interface controller 354 or a parallel interface controller 356, which can be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 358. An example communication device 346 includes a network controller 360, which can be arranged to facilitate communications with one or more other computing devices 362 over a network communication link via one or more communication ports 364.
The network communication link can be one example of a communication media. Communication media can typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and can include any information delivery media. A “modulated data signal” can be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) and other wireless media. The term computer readable media as used herein can include both storage media and communication media.
The computing device 300 can be implemented as a portion of a small-form-factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions. The computing device 300 can also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.
From the foregoing, it will be appreciated that specific embodiments of the disclosure have been described herein for purposes of illustration, but that various modifications may be made without deviating from the disclosure. In addition, many of the elements of one embodiment may be combined with other embodiments in addition to or in lieu of the elements of the other embodiments. Accordingly, the technology is not limited except as by the appended claims.