The methods, systems and devices described below relate to the field of noise filtering in telecommunications, and more particularly to methods, systems and devices for actively identifying and eliminating unwanted background noise in cellular telephone communications.
Cellular and wireless communication technologies have seen explosive growth over the past few years. This growth has been fueled by better communications hardware and larger networks with more reliable protocols that provide an unprecedented freedom of movement to the mobile public, cutting the tether to hardwired communication systems. As a result of this mobility, more cellular phone and wireless device users are using their devices in places where background noise cannot be controlled, such as in family rooms with children present, on construction sites and on busy streets. Consequently, cellular conversations are often interrupted by annoying background noises (e.g., a dog barking, a car alarm ringing, etc.) that are transmitted with the users' voices.
The various embodiments provide systems, devices, and methods for enabling one party to a telephone call or video conference to indicate sounds for filtering, in response to which one or more components in the communication system record the indicated sounds, generate filtering criteria for suppressing or enhancing those sounds, and then process subsequent communications to suppress or enhance the sound. For example, a first user may designate noise for filtering by pressing a push-to-hush button on the user's phone. This action prompts the user's phone, the other party's phone or a communication component in the intervening network to record sound present while the button is pressed and/or for a certain amount of time thereafter. One of the components in the communication network (either phone or an intermediate component) may analyze the recorded sound, such as to identify frequency, harmonic and amplitude characteristics that can then be used to filter (suppress or enhance) the sound in subsequent communications. Enabling the user to designate sounds for filtering enables generation and use of very specific noise filtering mechanisms (i.e., specific to the annoying or pleasing sound identified by a user). The embodiments also enable the person on the receiving side who is most likely to be bothered by the background noise to designate the sounds for filtering, thereby enabling the parties to the conversation to initiate active sound filtering according to their preferences.
The accompanying drawings, which are incorporated herein and constitute part of this specification, illustrate exemplary embodiments of the invention, and together with the general description given above and the detailed description given below, serve to explain the features of the invention.
The various embodiments will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References made to particular examples and implementations are for illustrative purposes, and are not intended to limit the scope of the invention or the claims.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any implementation described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other implementations.
The terms “cell phone” and “mobile device” are used interchangeably herein to refer to any one or all of telephones, cellular telephones, smart-phones (e.g., the iPhone®), multimedia Internet enabled cellular telephones (e.g., the Blackberry Storm®), personal data assistants (PDAs), laptop computers, personal computers, receivers within vehicles (e.g., automobiles) and similar electronic devices. However, the terms “cell phone” and “mobile device” should not be limited to the enumerated list of devices and may include any device capable of controlling a transducer and/or receiving audio files or signals.
As used herein, the term “file” refers to any of a wide variety of data structures, assemblages of data and database files which may be stored on a computing device.
The various embodiments provide methods, devices and systems for identifying and filtering background noises during a telephone/videophone conversation. As most have experienced, there are times when it is difficult to hold a conversation over a cell phone due to background noises. Current methods for reducing background noises cannot target a particular noise that a user finds annoying (e.g., car alarm, baby crying, barking dog, etc.) while allowing other background noises (e.g., children laughing) to be transmitted along with the user's voice. Existing noise cancellation systems have no mechanisms for differentiating between sounds that a user finds annoying and those that the user wants to hear.
The various embodiments enable any party in a telephone conversation to identify and label sounds as being either annoying or pleasant so that the identified sound can be actively filtered and/or amplified. The embodiments enable the listener to identify a sound by providing a user input option (e.g., a physical or virtual button) that can be pressed when the user hears the annoying (or pleasant) sound. The system analyzes sound while the input is depressed, and uses the result to actively filter the rest of the telephone conversation. By allowing the listeners to actively identify the sounds for filtering, noise filtering/amplification mechanisms may focus their processing on the identified sounds and remove only those sounds deemed annoying, amplify those sounds deemed pleasant, and/or permit all other sounds (e.g., voice) to be heard without any filtering.
In an embodiment, the phones may be configured such that pressing a push-to-hush button on a far-end phone (e.g., a phone in a different location than the source of the background noise) causes a near-end phone (e.g., a phone in the location of the background noise) to capture and filter the sounds before transmission. In other embodiments, the near-end phone captures and transmits the sounds without modification and the phones may be configured such that pressing a push-to-hush button on a far-end phone causes a server or the far-end phone to filter the sounds so they are not produced by the far-end phone. Thus, in the various embodiments, a phone on which the push-to-hush button was pressed (e.g., far-end phone) may capture and filter the sounds, instruct another phone (e.g., near-end phone) to sample and capture the sound, and/or instruct another component (e.g., near-end phone, server) to filter the sounds. In any case, one of the phones captures the sound, digitizes the sound and stores it in a memory, and the stored sounds are analyzed by a component in the communications link (e.g., near end phone 304, far end phone 302 or server 314 in between the phones) to generate filtering criteria (e.g., frequency and harmonic components). The stored sounds may be analyzed using known sound analysis techniques, such as identifying frequencies, amplitudes, and time-varying patterns that characterize the sound or which can otherwise be used to filter out and/or amplify the recorded sounds.
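By way of a non-limiting illustration only, the following Python sketch shows one conventional way that such filtering criteria might be derived from a stored sound sample: the sample is windowed, transformed with an FFT, and its strongest spectral peaks are reported as candidate notch-filter targets. The function name, peak count and normalization are assumptions made for illustration and are not part of the described embodiments.

```python
import numpy as np

def derive_filter_criteria(samples, sample_rate, num_peaks=5):
    """Estimate dominant frequency components of a captured noise sample.

    samples: 1-D numpy array of the digitized sound stored in memory.
    Returns a list of (frequency_hz, relative_amplitude) tuples that a
    downstream component could use as notch-filter targets.
    """
    # Window the sample to reduce spectral leakage, then take the FFT.
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)

    # Pick the strongest spectral peaks as the noise "signature".
    peak_indices = np.argsort(spectrum)[-num_peaks:][::-1]
    max_amp = spectrum[peak_indices[0]] or 1.0
    return [(float(freqs[i]), float(spectrum[i] / max_amp)) for i in peak_indices]

# Example: a synthetic 1-second "car alarm" tone at 2 kHz sampled at 8 kHz.
rate = 8000
t = np.arange(rate) / rate
noise = 0.8 * np.sin(2 * np.pi * 2000 * t)
print(derive_filter_criteria(noise, rate))
```

In practice, harmonic relationships and time-varying amplitude patterns could be extracted in a similar manner and added to the criteria.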
The various embodiments may be implemented within a variety of communication systems, such as a cellular telephone network, an example of which is illustrated in
For ease of reference, the user interface elements 216, 218 are referred to herein as “push-to-hush” or “push-to-amplify” buttons. The term “push-to-hush button” is used herein to describe a user interface element (e.g., buttons 216, 218) that allows users to initiate the sampling of unwanted or “annoying” background sounds and/or “wanted” (e.g., voice) sounds, and the term “push-to-amplify button” is used to describe a user interface element that allows users to initiate the sampling of background sounds for amplification. However, a single “push-to-hush button” may be associated with all of the above-mentioned elements/operations. Therefore, it should be understood that the terms “push-to-hush” and “push-to-amplify” are used generically and interchangeably in this application to describe any user interface element (e.g., button) that initiates the sampling of sounds, and the use of these terms should not be used to limit the scope of the claims.
As discussed above with reference to
Captured sounds may be labeled as being good or annoying via their storage in one or more memories dedicated to each labeling category. For example, each of the components in the communication link (e.g., near end phone 304, far end phones 302, 316 or server 314 in between the phones) may contain a “good” memory (e.g., a buffer) and a “bad” memory. Sounds indicated as being annoying or unwanted (e.g., by pressing the push-to-hush button) may be placed in the “bad” memory. Likewise, sounds may be labeled as wanted via their storage in the good memory. In an embodiment, the sounds that are to be amplified (e.g., sounds indicated as “pleasing” via a pressing of a push-to-amplify button or virtual button) may be stored in an “amplify” memory. Each of the bad, good and amplify memories may be a portion of the device memory that has been reserved for audio samples that are annoying, wanted, or to be amplified, respectively.
In various embodiments, the cellular phones 302, 304, 316 may be configured to capture “good sounds” at various times. A user may indicate or record wanted sounds by pressing and/or holding the push-to-hush button while a wanted sound is heard, such as while one or more parties to the conversation is speaking. The phones 302, 304, 316 may capture these sounds and store them in the “good” memory (e.g., good buffer). In an embodiment, the cellular phones 302, 304, 316 may be configured to automatically sample sounds for an amount of time (e.g., 500 milliseconds, 1 second, 2 seconds, etc.) before and/or after an identified “bad” sound is captured and store these sampled sounds in the “good” memory. For example, the cellular phones 302, 304, 316 may be configured (e.g., via a user setting) to automatically sample sounds for a configurable amount of time (e.g., 500 milliseconds, 1 second, 2 seconds, etc.) after the pressing of the push-to-hush button (i.e., after the initiation of the “bad” sound capture process) and store these sampled sounds in the “good” memory. In an embodiment, the cellular phones 302, 304, 316 may be configured to prompt the user to capture sounds to be stored in the “good” memory by pressing the push-to-hush button at a designated time. In an embodiment, the sampling of sounds to be stored in the “good” memory may be initiated after the push-to-hush button is pressed a certain number of times. For example, the phone may be configured such that the first three times the push-to-hush button is pressed, the captured sounds are stored in the “bad” memory, and the next three times the push-to-hush button is pressed the captured sounds are stored in the “good” memory.
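The following minimal Python sketch illustrates one of the capture policies described above, in which an assumed number of initial button presses route captured sounds to the “bad” memory, later presses route them to the “good” memory, and a configurable window of sound following each “bad” capture is automatically stored as “good”; the class structure and default values are illustrative assumptions only.

```python
class CapturePolicy:
    """Illustrative sketch of one capture policy: the first `bad_presses`
    button presses are stored in the "bad" memory, later presses in the
    "good" memory, and an automatic "good" sample may follow each
    "bad" capture."""

    def __init__(self, bad_presses=3, auto_good_seconds=1.0):
        self.bad_presses = bad_presses              # presses routed to "bad" memory
        self.auto_good_seconds = auto_good_seconds  # configurable follow-up window
        self.press_count = 0
        self.bad_memory, self.good_memory = [], []

    def on_button_press(self, captured_sound, following_sound=None):
        self.press_count += 1
        if self.press_count <= self.bad_presses:
            self.bad_memory.append(captured_sound)
            # Automatically store sound sampled after the "bad" capture as "good".
            if following_sound is not None:
                self.good_memory.append(following_sound)
        else:
            self.good_memory.append(captured_sound)

# Example usage with placeholder sample objects.
policy = CapturePolicy()
policy.on_button_press("dog_bark_sample", following_sound="voice_after_bark")
policy.on_button_press("car_horn_sample")
print(len(policy.bad_memory), len(policy.good_memory))  # -> 2 1
```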
In an embodiment, the cellular phones 302, 304, 316 may be configured to capture sounds to be amplified and store them in an “amplify” memory. In an embodiment, the “amplify” memory may store sounds captured in response to the user pressing the “push-to-amplify” button. In an embodiment, any component in the communication link (i.e., the near-end phone, far-end phone, or server in between) may amplify sound signals matching the frequency range of sounds received and/or transmitted over the air. It should be noted that, in the various embodiments, the “amplify” memory may be completely distinct from the above-mentioned “good” and “bad” memories and store only sounds that are to be amplified. In various embodiments, both the “good” memory and the “bad” memory may be used for noise suppression.
In an embodiment, captured sounds may be stored in a temporary buffer while a mobile device user labels the captured sound as “wanted,” “amplify” or “annoying.” For example, sounds may be stored in a temporary buffer while a mobile phone display prompts the user with options such as “hush,” “pleasant,” “wanted,” “voice” and “amplify.” Based on the user's selection, the sounds may be transferred from the temporary buffer to the appropriate memory (e.g., good, bad, amplify).
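As a non-limiting sketch of the labeling flow described above, the Python fragment below routes a sample held in a temporary buffer into a “bad,” “good” or “amplify” memory according to the option the user selects; the label-to-memory mapping is an assumption chosen only for illustration.

```python
# Illustrative sketch: route a sound from a temporary buffer into the
# "bad", "good" or "amplify" memory based on the user's label selection.
# The label-to-memory mapping below is an assumption for illustration.

LABEL_TO_MEMORY = {
    "hush": "bad",          # annoying / unwanted sounds
    "annoying": "bad",
    "pleasant": "good",
    "wanted": "good",
    "voice": "good",
    "amplify": "amplify",   # sounds to be enhanced
}

def commit_labeled_sound(temp_buffer, label, memories):
    """Move the pending sample out of the temporary buffer into the
    memory that corresponds to the user's menu selection."""
    target = LABEL_TO_MEMORY.get(label)
    if target is None:
        return False            # unknown label: keep the sample pending
    memories.setdefault(target, []).append(temp_buffer.pop())
    return True

memories = {}
temp_buffer = ["captured_sample_bytes"]
commit_labeled_sound(temp_buffer, "amplify", memories)
print(memories)  # {'amplify': ['captured_sample_bytes']}
```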
The system may use intelligence to recognize when an identified sound may encompass the sounds of speech, and behave appropriately. The system may be configured such that sounds that overlay normal speech are not filtered and/or amplified. In an embodiment, the sounds may be filtered only briefly to enable speech to be heard while reducing the background sound. In an embodiment, the filtering device (near end phone, far end phone or server in between) may recognize when a person is and is not talking, and filter out the background noise overlaying speech sounds only when it determines the person is not talking.
Sounds stored in the “good” and “bad” memories may be used to determine what sounds are background sounds (e.g., dog barking, children laughing) and what sounds are foreground sounds (e.g., caller's voice). A background-foreground separation process (e.g., software executed by a device processor, circuit logic embedded within a processor, etc.) may determine if the identified background sounds may be filtered based on the identified foreground sounds. For example, a device processor may determine if sounds stored in the different memories have different spectral frequencies in relation to one another and/or if the stored sounds have a different spectral frequency than that of the sounds transmitted when the button is no longer pressed (e.g., normal conversation). The device processor may also determine whether the sounds are different enough so that they can be removed/amplified without significant distortions. If it is determined that it is possible to filter the background sounds from the foreground sounds without significant distortion, the sounds may be separated for filtering and/or amplification. However, if it is determined that the sounds are too similar (e.g., a car horn is too similar to the sound of caller A's voice), the system may choose not to filter the sounds and no additional filtering/amplification is performed.
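One plausible, non-limiting way to implement the similarity test described above is to compare the normalized magnitude spectra of the stored “good” and “bad” samples and proceed with filtering only when their overlap falls below a threshold. The Python sketch below uses a cosine-similarity measure and an arbitrary 0.8 threshold, both of which are illustrative assumptions rather than features of the described embodiments.

```python
import numpy as np

def spectra_too_similar(good_samples, bad_samples, threshold=0.8):
    """Return True when the 'bad' (background) sound is spectrally so close
    to the 'good' (foreground) sound that filtering it would likely distort
    the voice; the cosine-similarity measure and 0.8 threshold are
    illustrative assumptions, not part of the described embodiments."""
    def normalized_spectrum(x):
        spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
        norm = np.linalg.norm(spectrum)
        return spectrum / norm if norm else spectrum

    n = min(len(good_samples), len(bad_samples))
    good_spec = normalized_spectrum(np.asarray(good_samples[:n], dtype=float))
    bad_spec = normalized_spectrum(np.asarray(bad_samples[:n], dtype=float))
    return float(np.dot(good_spec, bad_spec)) > threshold

# Example: a 300 Hz "voice" tone versus a 2 kHz "car horn" tone.
rate = 8000
t = np.arange(rate) / rate
voice, horn = np.sin(2 * np.pi * 300 * t), np.sin(2 * np.pi * 2000 * t)
print(spectra_too_similar(voice, horn))   # False -> safe to filter
print(spectra_too_similar(voice, voice))  # True  -> skip filtering
```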
In various embodiments, the labeling of the sounds as being “pleasant,” “annoying” or “amplify” may be based on the configuration of the push-to-hush button. For example, if caller B wants to hear the background noise more clearly, he/she may configure the push-to-hush button such that pressing the button indicates that background noises are “pleasant.” In another embodiment, a second push-to-hush button (e.g., a push-to-amplify button) may be used to differentiate background noises deemed annoying from background noises deemed pleasant. Thus, the memories may be chosen based on the button pressed (e.g., push to hush, push to amplify, etc.) or based on the configuration (hush, pleasant, amplify, etc.) of the push-to-hush button.
In an embodiment, the system may determine which sounds are to be placed in a bad buffer and which sounds are to be placed in a good buffer based on user input, user preferences, usage history, and/or direction of incoming sound. The sounds stored in the memories may be used to determine what sounds are background sounds (e.g., dog barking, children laughing) and what sounds are foreground sounds (e.g., caller's voice). A background-foreground filtering algorithm may be executed to determine if the identified background sounds may be separated from the identified foreground sounds. If it is determined that it is possible to separate the background sounds from the foreground sounds, the sounds may be filtered and/or amplified. However, if it is determined that the sounds are too similar (e.g., a car honking is too similar to the sound of caller A's voice), the system may choose not to filter the sounds and no additional filtering/amplification is performed.
As discussed above with reference to
In an embodiment, the user on the near end phone 304 (caller A) may press the push-to-hush button to control the selection of microphone arrays on the near end phone 304 (caller A). In another embodiment, the user on the far end phone 302 (caller B) may press the push-to-hush button to control the selection of microphone arrays on the near end phone 304 (caller A). In this embodiment, the pushing of the push-to-hush button on the far end phone 302 transmits a signal to the near end phone 304 (caller A) via the cellular communication link instructing the near end phone 304 to sample the wanted/unwanted sounds and determine which microphone arrays to select.
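Purely for illustration, the Python sketch below shows one way the microphone array selection described above might be made: the sound captured by each array while the push-to-hush button is pressed is measured for unwanted-sound energy, and the quietest array (or arrays) is selected. The ranking criterion and function names are assumptions, not part of the described embodiments.

```python
import numpy as np

def select_microphone_arrays(array_captures, keep=1):
    """Given the sound captured by each microphone array while the
    push-to-hush button was pressed, rank the arrays by how much of the
    unwanted sound they picked up and keep the quietest ones.

    Selecting the arrays with the lowest unwanted-sound energy is an
    illustrative assumption about how the selection might be made.
    array_captures: dict mapping array id -> 1-D sample array.
    """
    energies = {array_id: float(np.mean(np.square(np.asarray(samples, dtype=float))))
                for array_id, samples in array_captures.items()}
    ranked = sorted(energies, key=energies.get)
    return ranked[:keep]

# Example: the rear-facing array hears the barking dog loudest.
captures = {
    "front_array": 0.1 * np.random.randn(8000),
    "rear_array": 0.9 * np.random.randn(8000),
}
print(select_microphone_arrays(captures))  # likely ['front_array']
```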
In the various embodiments, the system may use intelligence to recognize when an identified sound may encompass the sounds of speech. For example, in an embodiment, the system may be configured such that sounds that overlay normal speech are not filtered and/or amplified. In various embodiments, the sounds may be filtered only briefly to enable speech to be heard while reducing the background sound. In an embodiment, the filtering device (near end phone, far end phone or server in between) may recognize when a person is and is not talking, and filter out the background noise overlaying speech sounds only when it determines the person is not talking.
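A minimal, non-limiting sketch of this speech-gated behavior is shown below in Python: a crude short-term-energy voice activity check (an illustrative stand-in for whatever speech detector a filtering device might actually employ) decides on a frame-by-frame basis whether the background-noise filter is applied.

```python
import numpy as np

def frame_is_speech(frame, energy_threshold=0.01):
    """Crude short-term-energy voice activity check; a stand-in for
    whatever speech detector a filtering device might actually use."""
    return float(np.mean(np.square(frame))) > energy_threshold

def gated_filter(frame, noise_filter, energy_threshold=0.01):
    """Apply the background-noise filter only when the frame does not
    appear to contain speech, as described above."""
    if frame_is_speech(frame, energy_threshold):
        return frame                  # leave speech (and overlaid noise) untouched
    return noise_filter(frame)        # suppress background noise between words

# Example with a trivial "filter" that simply mutes the frame.
rate = 8000
t = np.arange(rate // 50) / rate                      # one 20 ms frame
speech_frame = 0.5 * np.sin(2 * np.pi * 300 * t)
silence_frame = 0.001 * np.random.randn(len(t))
mute = lambda f: np.zeros_like(f)
print(gated_filter(speech_frame, mute).any())   # True  -> speech passed through
print(gated_filter(silence_frame, mute).any())  # False -> background muted
```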
As previously discussed, caller A may remove the background noise by halting the conversation and pressing a push-to-hush button on caller A's phone 304, which instructs caller A's phone 304 to detect and capture all sound present while the button is pressed and/or for a certain amount of time thereafter. In another embodiment, caller B may remove the sounds directly by simply asking caller A to momentarily stop speaking while caller B presses a push-to-hush button located on caller B's phone 302. In this scenario, caller B is the “husher” and the caller B's phone 302 may initiate the capture of the background noise 310. In yet another embodiment, pressing the push-to-hush button on either phone 302, 304 may instruct a server 314 (e.g., a server in the communications link between the two phones) to initiate the sound capture process and to filter the noise. Thus, in the various embodiments, pressing a push-to-hush button on any one of the phones 302, 304 may instruct one or more components within the communications link (e.g., near end phone 304, far end phone 302 or server 314 in between the phones) to sample background noises and/or filter the background noises.
The filtering of sounds may occur either before the transmission of the signal or after the transmission of the signal. For example, the near-end phone 304 may actively sample the background sounds and filter the sounds before transmitting the signal. Alternatively, the far-end phone 302 may actively sample the background sounds and filter the sounds after the signal is received from the near-end phone 304.
In an embodiment, the phones may be configured such that pressing the push-to-hush button on a far-end phone 302, 316 causes the near-end phone 304 to capture and filter the sounds before transmission. In other embodiments, the near-end phone 304 captures and transmits the sounds without modification and the phones may be configured such that pressing a push-to-hush button on a far-end phone 302, 316 causes a server 314 or the far-end phone 302, 316 to filter the sounds so they are not produced by the far-end phone 302, 316. Thus, in the various embodiments, a phone on which the push-to-hush button was pressed (e.g., far-end phone 302) may capture and filter the sounds, instruct another phone (e.g., near-end phone 304) to sample and capture the sound, and/or instruct another component (e.g., near-end phone 304, server 314) to filter the sounds. In any case, one of the phones 302, 304 captures the sound, digitizes the sound and stores it in a memory, and the stored sounds are analyzed by a component in the communications link (e.g., near end phone 304, far end phone 302 or server 314 in between the phones) to generate filtering criteria (e.g., frequency and harmonic components). The stored sounds may be analyzed using known sound analysis techniques, such as identifying frequencies, amplitudes, and time-varying patterns that characterize the sound or which can otherwise be used to filter out and/or amplify the recorded sounds.
As discussed above, current methods for reducing background noises cannot target a particular noise that a user finds annoying (e.g., car horn), and as a result, are unable to differentiate between sounds that a user finds annoying and those that the user wants to hear. By allowing users to actively identify and label the sounds, the various embodiments enable noise filtering/amplification mechanisms to focus their processing on the identified sounds and remove only those sounds deemed annoying, amplify only those sounds deemed pleasant, and/or permit all other (e.g., voice) sounds to be heard without any filtering. For instance, in the illustrated example of
If it is determined that the residual noise volume is more than a preset threshold value, the mobile device may prompt the user to capture more sound samples for one or more categories of labeled sounds (e.g., good, bad). For example, in an embodiment, the phone on which the push-to-hush button was pressed may be configured to prompt the user to press the button again at the next occurrence of the sound (e.g., the next time the car horn sounds) to capture additional sound samples. These additional samples allow the analyzing component (e.g., near end phone 304, far end phone 302 or server 314) to refine the filtering criteria in order to more narrowly target the undesirable sounds. By labeling precise intervals of unwanted sound (e.g., via taking and analyzing additional sound samples), the noise signature of the unwanted sound becomes more readily identifiable, allowing the filtering mechanism to better suppress the undesired noise without significantly impacting the voice signal, resulting in better noise reduction than existing solutions. This improved noise reduction is illustrated by
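As a non-limiting illustration of this refinement step, the Python sketch below averages the magnitude spectra of the accumulated “bad” samples into a single, tighter noise signature, and checks the residual noise level against a preset threshold to decide whether the user should be prompted for another capture; the averaging approach and threshold value are assumptions made for illustration.

```python
import numpy as np

def refine_noise_signature(bad_samples):
    """Average the magnitude spectra of all captured 'bad' samples into a
    single, tighter noise signature; averaging is an illustrative choice."""
    length = min(len(s) for s in bad_samples)
    spectra = [np.abs(np.fft.rfft(np.asarray(s[:length], dtype=float)))
               for s in bad_samples]
    return np.mean(spectra, axis=0)

def needs_more_samples(residual_frame, threshold=0.05):
    """Return True when the residual noise after filtering is still louder
    than the preset threshold, i.e. the user should be prompted to press
    the button again at the next occurrence of the sound."""
    return float(np.sqrt(np.mean(np.square(residual_frame)))) > threshold

# Example: two captures of the same 2 kHz horn refine the signature.
rate = 8000
t = np.arange(rate) / rate
horn = np.sin(2 * np.pi * 2000 * t)
signature = refine_noise_signature([horn, 0.9 * horn])
print(signature.shape, needs_more_samples(0.2 * np.random.randn(rate)))  # likely True -> prompt user
```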
Aggressively filtering background noise may result in filtering out some speech sounds, as represented by gaps in the frequency domain of
An embodiment method 1100 for removing background noises from a communication link is illustrated in
Returning to
Another embodiment method 1200 for removing background noises is illustrated in
Another embodiment method 1300 for removing background noises is illustrated in
Another embodiment method 1400 for removing background noises is illustrated in
In an embodiment, captured sounds may be stored in a database and used for filtering in future conversations. A user specific database to store all the captured sounds and/or filtering criteria may be created on any of the components within the communications link, including a server within the network. In this manner, whenever a user engages in another conversation, the user may choose to continue to hush and/or amplify sounds based on the previously designated and stored sounds. In an embodiment, the captured sounds and/or filtering criteria may be stored in a global database such that any component in the system may choose to amplify or hush sounds that have previously been designated for filtering. In another embodiment, the phones may be configured to filter based on any combination of filtering criteria extracted from any combination of global and user specific databases.
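For illustration only, the sketch below persists per-user filtering criteria with Python's built-in sqlite3 module under an assumed schema, so that sounds a user has previously designated for hushing or amplification can be reloaded in later conversations; the schema, labels and use of a local database are illustrative assumptions, not features of the described embodiments.

```python
import json
import sqlite3

# Illustrative sketch: persist per-user filtering criteria so they can be
# reused in later conversations.  The schema and the use of sqlite3 are
# assumptions for illustration, not part of the described embodiments.
conn = sqlite3.connect(":memory:")  # a real deployment might use a file or a server database
conn.execute("""CREATE TABLE IF NOT EXISTS filter_criteria (
                    user_id  TEXT,
                    label    TEXT,    -- 'hush' or 'amplify'
                    criteria TEXT)    -- JSON list of (freq_hz, amplitude) pairs
             """)

def save_criteria(user_id, label, criteria):
    conn.execute("INSERT INTO filter_criteria VALUES (?, ?, ?)",
                 (user_id, label, json.dumps(criteria)))
    conn.commit()

def load_criteria(user_id, label):
    rows = conn.execute("SELECT criteria FROM filter_criteria WHERE user_id=? AND label=?",
                        (user_id, label)).fetchall()
    return [json.loads(r[0]) for r in rows]

# Example: store the car-horn signature for caller B and reload it later.
save_criteria("caller_b", "hush", [(2000.0, 1.0), (4000.0, 0.4)])
print(load_criteria("caller_b", "hush"))
```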
The various embodiments may be implemented on either phone (e.g., caller A or caller B), on both phones, or on intermediate servers routing the calls between caller A and caller B. Either phone may initiate the sound capture process and any component (e.g., near end phone, far end phone or server in between the phones) may analyze and filter the annoying sound. For example, if the system is only installed on the far end phone (e.g., caller B), the far end phone may record and actively filter the sound. Likewise, if the system is only installed on the near end phone (e.g., caller A), the near end phone may record and actively filter the sound before transmission. However, if the system is installed on both phones, the far end phone (e.g., caller B's phone) may record and filter the sound locally or instruct the server or the near end phone (caller A's phone) to record the sounds and filter the sounds before they are transmitted. Thus, it should be clear from the above examples that, in the various embodiments, the sound may be sampled, recorded and filtered at any point in the communication link between two or more callers.
A typical cell phone 1500 includes a sound encoding/decoding (CODEC) circuit 1524 which digitizes sound received from a microphone into data packets suitable for wireless transmission and decodes received sound data packets to generate analog signals that are provided to the speaker 1554 to generate sound. Also, one or more of the processor 1501, wireless transceiver 1505 and CODEC 1524 may include a digital signal processor (DSP) circuit (not shown separately). Processing of stored sound to generate filtering criteria may be accomplished by one or more DSP circuits within the components of the cell phone 1500 using signal analysis methods that are well known in the DSP arts. Also, the application of filtering criteria to suppress undesirable sounds and/or enhance desirable sounds may be accomplished by one or more DSP circuits within the components of the cell phone 1500 and/or within the CODEC 1524.
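As one conventional DSP realization of applying such filtering criteria, the Python sketch below uses SciPy's IIR notch filter design (scipy.signal.iirnotch with scipy.signal.lfilter) to suppress each dominant frequency identified in a noise signature; the particular filter structure and quality factor are illustrative assumptions, and a production implementation on a DSP circuit would typically use fixed-point equivalents.

```python
import numpy as np
from scipy.signal import iirnotch, lfilter

def apply_filter_criteria(frame, criteria, sample_rate, quality=30.0):
    """Suppress each dominant frequency in the noise signature with an IIR
    notch filter.  Notch filtering and the quality factor are illustrative
    DSP choices; other filter structures could equally be used.

    criteria: iterable of (frequency_hz, relative_amplitude) pairs, e.g. as
    produced by an analysis step such as the one sketched earlier.
    """
    filtered = np.asarray(frame, dtype=float)
    for freq_hz, _amplitude in criteria:
        if 0 < freq_hz < sample_rate / 2:           # only filter realizable bands
            b, a = iirnotch(freq_hz, quality, fs=sample_rate)
            filtered = lfilter(b, a, filtered)
    return filtered

# Example: remove a 2 kHz "car horn" from a mixed voice-plus-horn frame.
rate = 8000
t = np.arange(rate) / rate
mixed = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 2000 * t)
cleaned = apply_filter_criteria(mixed, [(2000.0, 1.0)], rate)
print(round(float(np.std(mixed)), 2), round(float(np.std(cleaned)), 2))
```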
The various embodiments may be implemented within a telecommunications network on any of a variety of commercially available server devices, such as the server 1600 illustrated in
The processors 1501, 1601 may be any programmable microprocessor, microcomputer or multiple processor chip or chips that can be configured by software instructions (applications) to perform a variety of functions, including the functions of the various embodiments described above. In some mobile receiver devices, multiple processors 1601 may be provided, such as one processor dedicated to wireless communication functions and one processor dedicated to running other applications. Typically, software applications may be stored in the internal memory 1502, 1602, 1603 before they are accessed and loaded into the processor 1501, 1601. The processor 1501, 1601 may include internal memory sufficient to store the application software instructions.
The foregoing method descriptions and the process flow diagrams are provided merely as illustrative examples and are not intended to require or imply that the steps of the various embodiments must be performed in the order presented. As will be appreciated by one of skill in the art, the order of steps in the foregoing embodiments may be performed in any order. Words such as “thereafter,” “then,” “next,” etc. are not intended to limit the order of the steps; these words are simply used to guide the reader through the description of the methods. Further, any reference to claim elements in the singular, for example, using the articles “a,” “an” or “the” is not to be construed as limiting the element to the singular.
The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The hardware used to implement the various illustrative logics, logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but, in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Alternatively, some steps or methods may be performed by circuitry that is specific to a given function.
In one or more exemplary aspects, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. The steps of a method or algorithm disclosed herein may be embodied in a processor-executable software module which may reside on a tangible, non-transitory computer-readable storage medium. Tangible, non-transitory computer-readable storage media may be any available media that may be accessed by a computer. By way of example, and not limitation, such non-transitory computer-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of non-transitory computer-readable media. Additionally, the operations of a method or algorithm may reside as one or any combination or set of codes and/or instructions on a tangible, non-transitory machine readable medium and/or computer-readable medium, which may be incorporated into a computer program product.
The preceding description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the following claims and the principles and novel features disclosed herein.