The field of representative embodiments of this disclosure relates to methods, apparatuses, and implementations concerning playback management in an audio device. Applications include, but are not limited to, the detection of certain ambient events, such as near-field sound, proximity sound, and tonal alarms, using spatial processing based on signals received from multiple microphones.
Personal audio devices have become prevalent and are used in diverse ambient environments. The headphones used with these audio devices have become advanced to the point that the occlusion they provide, whether by passive or active methods, prevents a user from keeping track of the ambient sound field external to the audio device. Although increased isolation and uninterrupted listening are preferable in most cases, it is sometimes imperative, for safety or an enhanced user experience, that specific ambient events be heard by the user and that an appropriate action be taken in response to such events. For example, if the user is listening to music through a headset and is interrupted by someone attempting to start a conversation with him or her, it may be difficult to maintain the conversation unless the user pauses the playback signal or reduces its volume.
U.S. Pat. No. 7,903,825, for example, proposes an audio device in which the playback signal is modified depending on the ambient acoustic field. As another example, U.S. Pat. No. 8,804,974 teaches ambient event detection in a personal audio device which can then be used to implement an event-based modification of the playback content. The above-mentioned references also teach the use of microphones to detect various acoustic events. As a further example, U.S. application Ser. No. 14/324,286, filed on Jul. 7, 2014, teaches using a speech detector as an event detector to adjust the playback signal during a conversation. As an additional example, U.S. Pat. No. 8,565,446 teaches the use of a direction of arrival (DOA) estimate and an interference to desired (near-field) speech signal ratio estimate from a set of plural microphones to detect desired speech in the presence of non-stationary background noise to control a speech enhancement algorithm in a noise reduction echo cancellation (NREC) system. Similarly, U.S. application Ser. No. 13/199,593 teaches that a maximum of the normalized cross-correlation statistic that is derived through a cross-correlation analysis of plural microphones may be an effective discriminator to detect near-field speech. A spectral flatness measure-based music detector for an NREC system is proposed in U.S. Pat. No. 8,126,706 to differentiate the presence of background noise from background music. U.S. Pat. Nos. 7,903,825 and 8,804,974; U.S. application Ser. No. 14/324,286; U.S. Pat. No. 8,565,446; U.S. application Ser. No. 13/199,593; and U.S. Pat. No. 8,126,706 are incorporated by reference herein.
In accordance with the teachings of the present disclosure, one or more disadvantages and problems associated with existing approaches to event detection for playback management in a personal audio device may be reduced or eliminated.
In accordance with embodiments of the present disclosure, a method for processing audio information in an audio device may include reproducing audio information by generating an audio output signal for communication to at least one transducer of the audio device, receiving at least one input signal indicative of ambient sound external to the audio device, detecting from the at least one input signal a near-field sound in the ambient sound, and modifying a characteristic of the audio information reproduced to the at least one transducer in response to detection of the near-field sound.
In accordance with these and other embodiments of the present disclosure, an integrated circuit for implementing at least a portion of an audio device may include an audio output configured to reproduce audio information by generating an audio output signal for communication to at least one transducer of the audio device, a microphone input configured to receive an input signal indicative of ambient sound external to the audio device, and a processor configured to detect from the input signal a near-field sound in the ambient sound and modify a characteristic of the audio information in response to detection of the near-field sound.
In accordance with these and other embodiments of the present disclosure, a method for processing audio information in an audio device may include reproducing audio information by generating an audio output signal for communication to at least one transducer of the audio device, receiving at least one input signal indicative of ambient sound external to the audio device, detecting from the at least one input signal an audio event, and modifying a characteristic of the audio information reproduced to the at least one transducer in response to detection of the audio event being persistent for at least a predetermined time.
In accordance with these and other embodiments of the present disclosure, an integrated circuit for implementing at least a portion of an audio device may include an audio output configured to reproduce audio information by generating an audio output signal for communication to at least one transducer of the audio device, a microphone input configured to receive an input signal indicative of ambient sound external to the audio device, and a processor configured to detect from the input signal an audio event and modify a characteristic of the audio information reproduced to the at least one transducer in response to detection of the audio event being persistent for at least a predetermined time.
Technical advantages of the present disclosure may be readily apparent to one of ordinary skill in the art from the figures, description and claims included herein. The objects and advantages of the embodiments will be realized and achieved at least by the elements, features, and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are explanatory examples and are not restrictive of the claims set forth in this disclosure.
A more complete understanding of the example, present embodiments and certain advantages thereof may be acquired by referring to the following description taken in conjunction with the accompanying drawings, in which like reference numbers indicate like features, and wherein:
In accordance with embodiments of this disclosure, systems and methods may use at least three different audio event detectors in an automatic playback management framework. Such audio event detectors for an audio device may include: a near-field detector that may detect sounds in the near-field of the audio device, such as when a user of the audio device (e.g., a user who is wearing or otherwise using the audio device) speaks; a proximity detector that may detect sounds in proximity to the audio device, such as when another person in proximity to the user of the audio device speaks; and a tonal alarm detector that may detect acoustic alarms originating in the vicinity of the audio device.
Near-field detector 3 may detect near-field sounds including speech. When such near-field sound is detected, it may be desirable to modify audio information reproduced to output audio transducer 51, as detection of near-field sound may indicate that a user is participating in a conversation. Such near-field detection may need to be able to detect near-field sound in acoustically noisy conditions and be resilient to false detection of near-field sounds in very diverse background noise conditions (e.g., background noise in a restaurant, acoustical noise when driving a car, etc.). As described in greater detail below, near-field detection may require spatial sound processing using a plurality of microphones 52. In some embodiments, such near-field sound detection may be implemented in a manner identical or similar to that described in U.S. Pat. No. 8,565,446 and/or U.S. application Ser. No. 13/199,593.
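As a rough illustration of one statistic mentioned in the background above, namely the maximum of the normalized cross-correlation between microphone signals, the following Python sketch computes that maximum over a small range of lags for two equal-length frames. The function name, the lag range, and the 0.8 decision threshold in the usage note are assumptions made for illustration only and are not taken from the referenced documents.

```python
import math


def max_normalized_xcorr(x, y, max_lag):
    """Peak of the normalized cross-correlation of two equal-length frames.

    Hypothetical helper: searches lags in [-max_lag, max_lag] and
    normalizes by the geometric mean of the two frame energies.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and the same length")
    energy = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if energy == 0.0:
        return 0.0  # a silent frame carries no correlation information
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += x[i] * y[j]
        best = max(best, acc / energy)
    return best
```

A caller might compare the returned value against an assumed threshold, for example `max_normalized_xcorr(mic1, mic2, 4) > 0.8`, and treat a frame that exceeds it as a near-field candidate. Any real threshold would need to be tuned for the microphone spacing and noise conditions of the device.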
Proximity detector 4 may detect ambient sounds (e.g., speech from a person in proximity to a user, background music, etc.) other than near-field sounds. As described in greater detail below, because it may be difficult to differentiate proximity sounds from non-stationary background noise and background music, proximity detector 4 may utilize a music detector and a noise level estimate to disable proximity detection in order to avoid a poor user experience due to false detection of proximity sounds. In some embodiments, such proximity sound detection may be implemented in a manner identical or similar to that described in U.S. Pat. Nos. 8,126,706, 8,565,446, and/or U.S. application Ser. No. 13/199,593.
Tonal alarm detector 5 may detect tonal alarms (e.g., sirens) proximate to an audio device. To provide the best user experience, it may be desirable for tonal alarm detector 5 to ignore certain alarms (e.g., feeble or low-volume alarms). As described in greater detail below, tonal alarm detection may require spatial sound processing using a plurality of microphones 52. In some embodiments, such tonal alarm detection may be implemented in a manner identical or similar to that described in U.S. Pat. No. 8,126,706 and/or U.S. application Ser. No. 13/199,593.
The various statistics generated by the system of
In some embodiments, thresholds idrThres and imdTh may be dynamically adjusted based on a background noise level estimate.
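The disclosure does not state how thresholds such as idrThres and imdTh would be adjusted, so the following Python sketch shows only one plausible scheme: linear interpolation between a quiet-condition value and a loud-condition value as the background noise estimate rises. The function name, the decibel break points, and the example threshold values are all hypothetical.

```python
def adapt_threshold(noise_db, quiet_db, loud_db, thres_quiet, thres_loud):
    """Interpolate a detection threshold from a background noise estimate.

    Hypothetical scheme: below quiet_db the threshold is thres_quiet,
    above loud_db it is thres_loud, and in between it varies linearly.
    """
    if loud_db <= quiet_db:
        raise ValueError("loud_db must be greater than quiet_db")
    frac = (noise_db - quiet_db) / (loud_db - quiet_db)
    frac = min(1.0, max(0.0, frac))  # clamp outside the interpolation range
    return thres_quiet + frac * (thres_loud - thres_quiet)
```

With assumed break points of 40 dB and 60 dB, a call such as `adapt_threshold(50.0, 40.0, 60.0, 1.0, 3.0)` returns the midpoint value 2.0, and the same function could be applied separately to each threshold.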
Proximity detection by proximity detector 4 may differ from near-field sound detection by near-field detector 3 because the signal characteristics of proximity speech may be very similar to those of ambient signals such as music and noise. Proximity detector 4 must therefore avoid false detection of proximity speech in order to achieve an acceptable user experience. To that end, a music detector 9 may be used to disable proximity detection whenever there is music in the background. Similarly, proximity detector 4 may be disabled whenever the background noise level is above a certain threshold. The threshold value for background noise may be determined a priori such that the likelihood of false detection below the threshold level is very low.
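The gating just described can be sketched in a few lines of Python. This is a minimal illustration under assumed names: `music_detected` stands in for the output of a music detector, and `noise_level_db` and `noise_thres_db` stand in for the background noise estimate and its a priori threshold. None of these identifiers comes from the disclosure.

```python
def proximity_detection_enabled(music_detected, noise_level_db, noise_thres_db):
    """Return True only when proximity detection should be allowed to run.

    Detection is disabled when background music is present or when the
    background noise estimate is above the threshold.
    """
    return (not music_detected) and noise_level_db <= noise_thres_db


def gated_proximity_event(raw_detection, music_detected,
                          noise_level_db, noise_thres_db):
    """Combine a raw proximity detection with the enable gate."""
    enabled = proximity_detection_enabled(
        music_detected, noise_level_db, noise_thres_db)
    return bool(raw_detection) and enabled
```

Keeping the gate separate from the raw detector makes it easy to log how often detections are suppressed, which may help when choosing the noise threshold.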
In some embodiments, the music detector taught in U.S. Pat. No. 8,126,706 may be used to implement music detector 9 to detect the presence of background music. Another embodiment of the proximity speech detector is shown in
This condition is verified by comparing the estimated background noise level with a threshold, noiseLevelThLo. If a low noise level is detected, then the following two conditions are further tested to confirm the presence of proximity speech:
If the above-mentioned background noise level condition is not satisfied at block 31, then the following conditions may be indicative of proximity speech, in order to improve the detection rate of proximity speech without increasing the occurrence of false alarms (e.g., due to background noise conditions):
If the above stationary noise and the direction of arrival conditions are not satisfied at block 32, then the presence of both of the following set of conditions may indicate the presence of proximity speech:
If the above-mentioned direction of arrival condition is not satisfied at block 29, then the presence of the following conditions may be indicative of proximity speech:
Tonal alarm detector 5 may be configured to detect alarm signals that are tonal in nature and whose sonic bandwidth is also narrow (e.g., a siren or buzzer). In some embodiments, the tonality of an ambient sound may be measured by splitting the time domain signal into multiple sub-bands through time-to-frequency-domain transformation and the spectral flatness measure, depicted in
In practice, the instantaneous audio event detections of near-field detector 3, proximity detector 4, and tonal alarm detector 5 as shown in
The following pseudo-code may demonstrate application of the hold-off and hang-over logic to reduce false detection of audio events, in accordance with embodiments of the present disclosure.
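The original pseudo-code is not reproduced in this text. As a stand-in, the Python sketch below shows one common way to apply hold-off and hang-over logic to a stream of per-frame detections. It assumes that hold-off means an event must persist for a number of consecutive frames before it is validated, and that hang-over means a validated event is held for a number of frames after the instantaneous detection drops. The class name and the frame counts are hypothetical.

```python
class EventValidator:
    """Validate instantaneous detections with hold-off and hang-over counters."""

    def __init__(self, hold_off_frames, hang_over_frames):
        self.hold_off_frames = hold_off_frames    # frames required to validate
        self.hang_over_frames = hang_over_frames  # frames held after detection ends
        self.on_count = 0
        self.hang_count = 0
        self.validated = False

    def update(self, instantaneous):
        """Feed one per-frame detection flag; return the validated event flag."""
        if instantaneous:
            self.on_count += 1
            if self.on_count >= self.hold_off_frames:
                self.validated = True
                self.hang_count = self.hang_over_frames  # re-arm the hang-over
        else:
            self.on_count = 0  # a gap restarts the hold-off count
            if self.validated:
                if self.hang_count > 0:
                    self.hang_count -= 1
                else:
                    self.validated = False
        return self.validated
```

In use, one validator instance would be kept per detector and `update` called once per frame, for example `validator = EventValidator(3, 2)` followed by `validator.update(flag)`. Short bursts of detections that end before the hold-off count is reached never produce a validated event.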
A validated event may be further validated before generating the playback mode switching control. For example, the following pseudo-code may demonstrate application of the hold-off and hang-over logic for gracefully switching between a conversational mode (e.g., in which audio information reproduced to output audio transducer 51 may be modified in response to an audio event) and a normal playback mode (e.g., in which the audio information reproduced to output audio transducer 51 is unmodified).
The smoothing parameters alpha and beta may be set at different values to adjust a gain ramping rate.
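The role of the two smoothing parameters can be illustrated with a first-order recursive gain update. This Python sketch assumes that alpha governs the ramp toward a reduced gain when the conversational mode is entered and that beta governs the ramp back to unity gain in the normal playback mode. That assignment, the function name, and the `duck_gain` value are assumptions made for illustration and are not taken from the original pseudo-code.

```python
def ramp_gain(gain, conversational_mode, alpha, beta,
              duck_gain=0.1, normal_gain=1.0):
    """Advance the playback gain by one frame toward its target.

    alpha and beta lie in [0, 1); values closer to 1 give a slower ramp.
    """
    if conversational_mode:
        # Ramp down toward the reduced (hypothetical) conversational gain.
        return alpha * gain + (1.0 - alpha) * duck_gain
    # Ramp back up toward normal playback gain.
    return beta * gain + (1.0 - beta) * normal_gain
```

Calling `ramp_gain` once per frame and multiplying the playback signal by the result yields a gradual transition rather than an abrupt volume change. Choosing alpha smaller than beta would make the volume drop quickly when a conversation starts and recover more slowly afterwards.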
It should be understood, especially by those having ordinary skill in the art with the benefit of this disclosure, that the various operations described herein, particularly in connection with the figures, may be implemented by other circuitry or other hardware components. The order in which each operation of a given method is performed may be changed, and various elements of the systems illustrated herein may be added, reordered, combined, omitted, modified, etc. It is intended that this disclosure embrace all such modifications and changes and, accordingly, the above description should be regarded in an illustrative rather than a restrictive sense.
Similarly, although this disclosure makes reference to specific embodiments, certain modifications and changes can be made to those embodiments without departing from the scope and coverage of this disclosure. Moreover, any benefits, advantages, or solutions to problems that are described herein with regard to specific embodiments are not intended to be construed as a critical, required, or essential feature or element.
Further embodiments will likewise be apparent, with the benefit of this disclosure, to those having ordinary skill in the art, and such embodiments should be deemed as being encompassed herein.
The present disclosure claims priority to U.S. Provisional Patent Application Ser. No. 62/202,303, filed Aug. 7, 2015, U.S. Provisional Patent Application Ser. No. 62/237,868, filed Oct. 6, 2015, and U.S. Provisional Patent Application Ser. No. 62/351,499, filed Jun. 17, 2016, each of which is incorporated by reference herein in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
4306115 | Humphrey | Dec 1981 | A |
7903825 | Melanson | Mar 2011 | B1 |
8126706 | Ebenezer | Feb 2012 | B2 |
8565446 | Ebenezer | Oct 2013 | B1 |
8712069 | Murgia | Apr 2014 | B1 |
8804974 | Melanson | Aug 2014 | B1 |
20080091421 | Gustavsson | Apr 2008 | A1 |
20080240458 | Goldstein et al. | Oct 2008 | A1 |
20100278352 | Petit | Nov 2010 | A1 |
20120046906 | Alameh | Feb 2012 | A1 |
20130282373 | Visser | Oct 2013 | A1 |
20140270200 | Usher | Sep 2014 | A1 |
20140286497 | Thyssen | Sep 2014 | A1 |
20150171813 | Ganatra | Jun 2015 | A1 |
20150289070 | Armstrong-Muntner | Oct 2015 | A1 |
20170280239 | Sekiya | Sep 2017 | A1 |
Number | Date | Country |
---|---|---|
2002516535 | Jun 2002 | JP |
2004013084 | Jan 2004 | JP |
2004187283 | Jul 2004 | JP |
2004336251 | Nov 2004 | JP |
2011-097268 | May 2011 | JP |
2006011310 | Feb 2006 | WO |
2008083315 | Jul 2008 | WO |
WO-2008083315 | Jul 2008 | WO |
2013166439 | Nov 2013 | WO |
Entry |
---|
Ozgur Izmirli, 2000, Using a Spectral Flatness Based Feature for Audio Segmentation and Retrieval Abstract; pp. All (Year: 2000). |
International Search Report and Written Opinion of the International Searching Authority, International Application No. PCT/US2016/045834, dated Mar. 13, 2017. |
Communication pursuant to Article 94(3) EPC, European Patent Office, Application No. 16763354.4, dated Nov. 22, 2019. |
First Examination Opinion Notice, State Intellectual Property Office of the People's Republic of China, Application No. 201680058340.7, dated Dec. 19, 2019. |
Second Examination Opinion Notice, State Intellectual Property Office of the People's Republic of China, Application No. 201680058340.7, dated Jul. 13, 2020. |
Office Action, Japanese Patent Application No. 2018-526614, dated Oct. 1, 2020. |
Number | Date | Country |
---|---|---|
20170040029 A1 | Feb 2017 | US |
Number | Date | Country |
---|---|---|
62351499 | Jun 2016 | US |
62237868 | Oct 2015 | US |
62202303 | Aug 2015 | US |