This application is the U.S. national phase of PCT Application No. PCT/EP2015/071639 filed on Sep. 22, 2015, which claims priority to European Patent Application No. 14186097.3 filed on Sep. 24, 2014, the disclosures of which are incorporated in their entirety by reference herein.
The disclosure relates to audio reproduction systems and methods, in particular to audio reproduction systems and methods with a higher degree of individualization.
A number of algorithms on the market provide binaural playback of audio content over earphones. They are based on synthetic binaural room impulse responses (BRIRs), that is, on generalized head-related transfer functions (HRTFs), such as those of standard dummy heads or generalized functions from a large HRTF database. In addition, some algorithms allow users to select the most suitable BRIR from a given set of BRIRs. Such options can improve listening quality, for example through externalization and out-of-head localization, but individualization (for example, head shadowing, shoulder reflections or the pinna effect) is missing from the signal processing chain. Pinna information in particular is as unique as a fingerprint, so adding individualization by way of a personal BRIR can increase naturalness.
The method described herein includes the following procedures: positioning a mobile device with a built-in loudspeaker at a first location in a listening environment and at least one microphone at at least one second location in the listening environment; emitting test audio content from the loudspeaker of the mobile device at the first location in the listening environment; receiving the test audio content emitted by the loudspeaker using the at least one microphone at the at least one second location in the listening environment; and, based at least in part on the received test audio content, determining one or more adjustments to be applied to desired audio content before playback by at least one earphone; wherein the first location and the second location are spaced apart from each other such that the at least one microphone is within the near-field of the loudspeaker.
The system for measuring the binaural room impulse responses includes a mobile device with a built-in loudspeaker disposed at a first location in a listening environment and at least one microphone disposed at at least one second location in the listening environment. The mobile device is configured to emit test audio content via the loudspeaker at the first location in the listening environment and to receive from the earphones the test audio content emitted by the loudspeaker and received by the earphones at the at least one second location in the listening environment. The mobile device is further configured to determine, based at least in part on the received test audio content, one or more adjustments to be applied by the mobile device to desired audio content before playback by the earphones, wherein the first location and the second location are spaced apart from each other such that the at least one microphone is within the near-field of the loudspeaker.
Other systems, methods, features and advantages will be or will become apparent to one with skill in the art upon examination of the following detailed description and figures. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention and be protected by the following claims.
The system may be better understood with reference to the following description and drawings. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
Recorded “surround sound” is typically delivered through five, six, seven or more speakers. Real-world sounds reach users (also referred to herein as “listeners”, particularly with regard to their acoustic perception) from an infinite number of locations. Listeners readily sense direction on all axes of three-dimensional space, even though the human auditory system is a two-channel system. One route into the human auditory system is via headphones (also referred to herein as “earphones”, particularly with regard to their acoustic behavior relative to each individual ear). The weakness of headphones is their inability to create a spacious and completely accurate sonic image in three dimensions. Some “virtual surround” processors have made incremental progress in this regard, and headphones are in principle able to provide a sonic experience as spacious, precisely localized and vivid as that created by multiple speakers in a real room.
Sounds that come from various directions are altered as they encounter the head, the upper torso and the outer ear (pinna), in a way that depends on their shape and dimensions. The human brain is highly sensitive to these modifications, which are not perceived as tonal alterations but as direction: listeners localize the sounds quite accurately as coming from above, below, front, back or in between. This acoustic alteration can be expressed by the HRTF.
Binaural recording exploits the fact that two audio channels can recreate a three-dimensional experience. Binaural recordings are made with a single pair of closely spaced microphones and are intended for headphone listening. Sometimes the microphones are embedded in a dummy head or head-and-torso simulator so that the recording carries an HRTF, in which case the sense of three-dimensionality is enhanced. The reproduced sound space can be convincing, but without reference to the original environment its accuracy cannot be verified. In any case, these are specialized recordings rarely seen in the commercial catalogue. Recordings intended to capture sounds from the front, rear and sometimes above are made with multiple microphones, stored on multiple channels and intended to be played back on multiple speakers arrayed around the listener.
Other systems (such as the Smyth Realiser) provide a completely different experience, in which a multichannel recording (including stereo) sounds the same through headphones as it does through a loudspeaker array in a real room. In principle, the Smyth Realiser is similar to other systems in that it applies HRTFs to multichannel sound to drive the headphones. Along with other refinements, however, the Smyth Realiser employs three critical components not seen in other products: personalization, head tracking and the capture of the properties of any real listening space and sound system. The Smyth Realiser includes a pair of tiny microphones inserted into earplugs, which are placed in the listener's ears for measurement. The listener sits at the listening position within the loudspeaker array, typically 5.1- or 7.1-channel, although any configuration, including height channels, can be accommodated. A brief set of test signals is played through the loudspeakers; the listener then puts on the headphones and a second brief set of measurements is taken. The whole procedure takes less than five minutes. In the loudspeaker measurement, the Smyth Realiser not only captures the listener's personal HRTF but also fully characterizes the room, the speakers and the electronics driving the speakers. In the headphone measurement, the system gathers data to correct for the interaction between the headphones and the ears and for the response of the headphones themselves. The composite data is stored in memory and can be used to control equalizers connected in the audio signal paths.
As can be seen, taking binaural measurements is cumbersome because it requires dedicated measurement microphones, sound cards and other equipment. The methods and systems described herein measure BRIRs with smartphones, which eases binaural measurement without the need for expensive hardware.
Referring to
In a study, four stimuli (test audio content) were considered in connection with the exemplary system shown in
Acoustic sources such as loudspeakers have both near-field and far-field regions. Within the near-field, the wavefronts produced by the loudspeaker (or speaker for short) are not parallel, and the intensity of the wave oscillates with range. For that reason, echo levels from targets within the near-field region can vary greatly with small changes in location. In the far-field, wavefronts are nearly parallel, and intensity decreases with the square of the range according to the inverse-square law. Within the far-field, the beam is fully formed and echo levels are predictable from standard equations.
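The far-field behavior described above can be made concrete with a short calculation. This Python sketch assumes pure inverse-square spreading (free field, no reflections), so it applies only in the far-field and says nothing about the oscillating near-field intensity. The function name `level_change_db` is illustrative.

```python
import math


def level_change_db(r_ref: float, r: float) -> float:
    """Far-field level change in dB when moving from distance r_ref to distance r.

    Intensity scales as 1 / r**2, so the level changes by
    10 * log10((r_ref / r) ** 2) = 20 * log10(r_ref / r).
    """
    if r_ref <= 0 or r <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(r_ref / r)


print(round(level_change_db(1.0, 2.0), 2))  # -6.02: doubling the distance
print(round(level_change_db(1.0, 0.5), 2))  # 6.02: halving the distance
```

The familiar drop of about 6 dB per doubling of distance follows directly. Close to the source this simple relationship no longer holds.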
It can be seen from
a) Although smartphone speakers have a limited frequency response, they can still render signals above approximately 600 Hz (see also
b) If the smartphone speaker itself is used to render measurement stimuli, the end user does not need to carry additional objects such as balloons for measurement.
c) The swept-sine stimulus is well proven and widely used by manufacturers and researchers, and it can easily be implemented in smartphones.
d) The user can move the smartphone (speaker) to any location around the head, which makes it possible to measure the BRIR at any combination of azimuth and elevation.
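Item c) can be illustrated with the exponential (logarithmic) sine sweep, one common form of swept-sine stimulus. The text does not specify the sweep type, so this Python sketch is an assumption. It generates the sweep together with the usual inverse filter, which is the time-reversed sweep with an amplitude envelope that falls 6 dB per octave. The function name and parameters are hypothetical.

```python
import numpy as np


def exponential_sweep(f1, f2, duration, fs):
    """Return (sweep, inverse_filter) for an exponential sine sweep from f1 to f2 Hz.

    The instantaneous frequency rises as f1 * exp(t / duration * ln(f2 / f1)).
    """
    if not 0 < f1 < f2 <= fs / 2:
        raise ValueError("require 0 < f1 < f2 <= fs / 2")
    t = np.arange(int(round(duration * fs))) / fs
    rate = np.log(f2 / f1)
    phase = 2 * np.pi * f1 * duration / rate * (np.exp(t * rate / duration) - 1.0)
    sweep = np.sin(phase)
    # The sweep spends more time at low frequencies (pink energy spectrum), so the
    # time-reversed copy is weighted by an envelope that falls 6 dB per octave.
    inverse = sweep[::-1] * np.exp(-t * rate / duration)
    return sweep, inverse


# Usage: a 2 s sweep from 100 Hz to 20 kHz at 48 kHz for playback over the phone speaker.
sweep, inverse = exponential_sweep(100.0, 20000.0, 2.0, 48000)
```

Convolving a recorded ear signal with `inverse` compresses the sweep response into an impulse response, up to a constant gain.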
Magnitude response 601 of an exemplary smartphone speaker generated from near-field measurement is shown in
Two exemplary algorithms for BRIR calculation are described below. With the BRIR resulting from a headphone real room (HRR) process, users can listen to their favorite content via headphones, including the acoustic information of the measured room. With the BRIR resulting from a headphone virtual room (HVR) process, users can listen to their favorite content via headphones, including only binaural information. However, the user can optionally add a virtual room to the signal chain.
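In both processes, playback amounts to filtering the content with the left-ear and right-ear BRIRs. This minimal Python sketch assumes a single mono source and BRIRs of equal length stored as NumPy arrays; the function names are hypothetical. For multichannel content, each channel would be filtered with the BRIR pair measured for its loudspeaker direction, and the results would be summed per ear.

```python
import numpy as np


def fft_convolve(x, h):
    """Linear convolution of two 1-D arrays via zero-padded FFTs."""
    n = len(x) + len(h) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n
    y = np.fft.irfft(np.fft.rfft(x, nfft) * np.fft.rfft(h, nfft), nfft)
    return y[:n]


def render_binaural(mono, brir_left, brir_right):
    """Filter a mono signal with a BRIR pair; returns an array of shape (samples, 2)."""
    if len(brir_left) != len(brir_right):
        raise ValueError("left and right BRIRs must have the same length")
    left = fft_convolve(mono, brir_left)
    right = fft_convolve(mono, brir_right)
    return np.stack([left, right], axis=1)
```

FFT-based convolution is chosen here because BRIRs that include room information are typically thousands of samples long. A real-time player would usually process the content block by block instead of in one pass.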
HRR systems and methods aim to render binaural content via headphones (earphones), including the listener's room information. A flow chart of an exemplary application of a BRIR measurement in an HRR system that includes smartphone 701 is given in
The BRIR is measured using smartphone speaker 702 and binaural microphones (not shown) placed at the entrances of the user's ear canals. A sine sweep signal for spectral analysis is played back over smartphone speaker 702 at the desired azimuth and elevation angles. A specially designed pair of binaural microphones that completely block the listener's ear canals may be used. The microphones may be a separate set of binaural microphones, and the measurement hardware may be separate from smartphone 701, similar to the system shown in
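The impulse response at each ear can be recovered from the recorded sweep response by deconvolution. This Python sketch uses regularized spectral division, which is one standard approach; the text does not name the deconvolution method, so this choice is an assumption. The function works with any known broadband excitation. The small regularization term `eps` (a hypothetical parameter) prevents bins where the excitation carries almost no energy, such as those below the sweep's start frequency, from blowing up.

```python
import numpy as np


def deconvolve(recorded, excitation, n_ir, eps=1e-6):
    """Estimate an n_ir-sample impulse response from a recording of a known excitation.

    H = Y * conj(X) / (|X|^2 + eps * max|X|^2), evaluated on a zero-padded FFT grid
    long enough that the linear convolution does not wrap around.
    """
    n = len(recorded) + len(excitation)
    nfft = 1 << (n - 1).bit_length()
    X = np.fft.rfft(excitation, nfft)
    Y = np.fft.rfft(recorded, nfft)
    power = np.abs(X) ** 2
    H = Y * np.conj(X) / (power + eps * power.max())
    return np.fft.irfft(H, nfft)[:n_ir]
```

The function is applied separately to the left-ear and right-ear recordings, with the digital signal sent to the speaker passed as `excitation`. `n_ir` must be long enough to cover any playback or recording latency as well as the room decay.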
Concerning correction for the smartphone speaker deficiency, a full-bandwidth loudspeaker would ideally be required to cover all frequency ranges when measuring the BRIR. Since a band-limited speaker, namely smartphone speaker 702, is used for measurement, the missing frequency range must be compensated. For this, a near-field measurement is taken using one of the binaural microphones. From this, an inverse filter with an exemplary magnitude frequency characteristic (also known as “frequency characteristic” or “frequency response”), as shown in
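The correction can be sketched as a per-frequency gain that flattens the near-field magnitude response of the smartphone speaker and is then applied to the measured BRIR spectrum. The details of this Python sketch are hypothetical. In particular, the reference level (the median of the measured magnitude) and the boost limit `max_boost_db` are assumptions. The limit is added because the speaker emits little energy below roughly 600 Hz, and an unlimited inverse would apply extreme gain there.

```python
import numpy as np


def inverse_magnitude(speaker_mag, max_boost_db=24.0):
    """Per-bin gain that flattens a measured speaker magnitude response.

    The gain is normalized to the median magnitude (assumed in-band reference)
    and clipped to +/- max_boost_db.
    """
    mag = np.asarray(speaker_mag, dtype=float)
    gain = np.median(mag) / np.maximum(mag, 1e-12)
    limit = 10 ** (max_boost_db / 20)
    return np.clip(gain, 1.0 / limit, limit)


def apply_magnitude_correction(brir, gain):
    """Scale each rfft bin of the BRIR by a real, positive gain.

    A real positive gain leaves the phase of every bin unchanged. The filtering is
    circular over len(brir) samples, so zero-pad the BRIR first if needed.
    """
    spectrum = np.fft.rfft(brir)
    if np.shape(gain) != spectrum.shape:
        raise ValueError("gain must have len(brir) // 2 + 1 bins")
    return np.fft.irfft(spectrum * gain, len(brir))
```

Because the gain is real and positive, this approach changes only the magnitude of the BRIR and leaves its phase intact.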
Regarding the (optional) spectral balancer, additional equalization can be applied if the user wishes to give the sound a certain tonality. For this, the average of the left-ear and right-ear BRIRs is taken. A flow chart of the process is given in
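The text does not specify how the two BRIRs are averaged. This Python sketch assumes the average is taken over the magnitude spectra. It also assumes that the desired tonality is given as a hypothetical target magnitude `target_mag` on the same frequency grid. A single gain curve is derived from the average and applied identically to both ears, so the ratio between the two ears, and with it the interaural cues, is left unchanged.

```python
import numpy as np


def average_ear_magnitude(brir_left, brir_right, nfft):
    """Average of the left-ear and right-ear BRIR magnitude spectra (rfft bins)."""
    mag_left = np.abs(np.fft.rfft(brir_left, nfft))
    mag_right = np.abs(np.fft.rfft(brir_right, nfft))
    return 0.5 * (mag_left + mag_right)


def balancing_gain(avg_mag, target_mag, max_db=12.0):
    """Common gain that moves the averaged response toward the target tonality."""
    gain = np.asarray(target_mag, dtype=float) / np.maximum(avg_mag, 1e-12)
    limit = 10 ** (max_db / 20)  # hypothetical safety limit on boost and cut
    return np.clip(gain, 1.0 / limit, limit)
```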
Regarding the headphone equalizer, the frequency characteristics of earphones vary widely, sometimes even among products of the same manufacturer, so an equalizer is required to compensate for the influence of the earphones. This requires the frequency response of the particular earphone. This measurement of the earphone characteristics can be taken using simple equipment, as shown in
A schematic of a corresponding measuring process is given in
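Once the earphone's magnitude response is known, the equalizer could be realized, for example, as a linear-phase FIR filter designed by frequency sampling followed by windowing. This Python sketch shows one possible design, not necessarily the one used here. It assumes the measured response has already been interpolated onto the filter's frequency grid, where bin k lies at k·fs/num_taps. It also limits the boost with a hypothetical `max_boost_db`. An odd number of taps keeps the filter exactly symmetric, that is, linear-phase.

```python
import numpy as np


def headphone_eq_fir(headphone_mag, num_taps, max_boost_db=12.0):
    """Linear-phase FIR approximating the inverse of a measured earphone magnitude.

    headphone_mag holds num_taps // 2 + 1 magnitudes from 0 Hz to just below fs / 2.
    """
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps for an exactly symmetric filter")
    mag = np.asarray(headphone_mag, dtype=float)
    if mag.shape != (num_taps // 2 + 1,):
        raise ValueError("headphone_mag must have num_taps // 2 + 1 bins")
    limit = 10 ** (max_boost_db / 20)
    desired = np.clip(1.0 / np.maximum(mag, 1e-12), 1.0 / limit, limit)
    # Zero-phase prototype (real and even), shifted to the middle and tapered by a Hann window.
    h = np.fft.irfft(desired, num_taps)
    return np.roll(h, num_taps // 2) * np.hanning(num_taps)
```

A linear-phase design delays both ears equally by (num_taps - 1) / 2 samples, so the equalizer does not alter interaural timing.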
Referring again to
A headphone virtual room (HVR) system aims to render binaural content via earphones without the listener's room information. Listeners can optionally add a virtual room to the chain. A schematic of the process is given in
Dereverberator/Smoothing: If the measured room impulse response contains unwanted peaks and notches, unpleasant timbral artifacts may degrade the sound quality. To remove the room information, or to remove the early and late reflections, (temporal and/or spectral) windowing techniques can be incorporated. In this application, a combination of rectangular and Blackman-Harris windows is used, as shown in
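The text does not give the exact window shape or lengths. One plausible arrangement, sketched here in Python under that assumption, works in two stages. A rectangular (flat) section keeps the direct sound untouched. The falling half of a 4-term Blackman-Harris window then fades out the response, which discards the reflections that follow. `flat_len` and `fade_len` are hypothetical parameters. Applying the same window to both ears, starting at the same sample, leaves the interaural time difference intact.

```python
import numpy as np


def blackman_harris(n):
    """Symmetric 4-term Blackman-Harris window of length n (n >= 2)."""
    a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
    x = 2 * np.pi * np.arange(n) / (n - 1)
    return a0 - a1 * np.cos(x) + a2 * np.cos(2 * x) - a3 * np.cos(3 * x)


def rect_blackman_harris_window(length, flat_len, fade_len):
    """1 for flat_len samples, then a Blackman-Harris fade-out over fade_len samples, then 0."""
    if fade_len < 1 or flat_len + fade_len > length:
        raise ValueError("need fade_len >= 1 and flat_len + fade_len <= length")
    w = np.zeros(length)
    w[:flat_len] = 1.0
    w[flat_len:flat_len + fade_len] = blackman_harris(2 * fade_len)[fade_len:]
    return w


def window_brir_pair(brir_left, brir_right, flat_len, fade_len):
    """Apply the same window to both ears so their relative timing is unchanged."""
    if len(brir_left) != len(brir_right):
        raise ValueError("left and right BRIRs must have the same length")
    w = rect_blackman_harris_window(len(brir_left), flat_len, fade_len)
    return brir_left * w, brir_right * w
```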
Artificial reverberator: The previous block removes all room-related information. That is, only directional information (e.g., interaural time difference [ITD] and interaural level difference [ILD]) remains in the BRIR after a windowing function (window) has been applied. Sources therefore appear to be very close to the ears. An artificial reverberator can optionally be used if distance information needs to be incorporated. Any state-of-the-art reverberator can be used for this purpose.
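The simplest stand-in for such a reverberator models a late reverberation tail as Gaussian noise with an exponential envelope that decays by 60 dB over the reverberation time RT60. This model is far from state of the art and is not the method used here. The Python sketch below is hypothetical. Using different noise seeds for the two ears produces decorrelated tails, a common choice that makes the reverberation sound diffuse rather than localized.

```python
import numpy as np


def decaying_noise_tail(rt60, fs, seed=0):
    """Gaussian noise whose amplitude envelope falls by 60 dB after rt60 seconds."""
    n = int(round(rt60 * fs))
    t = np.arange(n) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)  # amplitude factor 1e-3 (= -60 dB) at t = rt60
    return np.random.default_rng(seed).standard_normal(n) * envelope


def add_reverb_tail(brir, tail, start, gain):
    """Mix gain * tail into a copy of the BRIR, beginning at sample `start`."""
    out = np.zeros(max(len(brir), start + len(tail)))
    out[:len(brir)] += brir
    out[start:start + len(tail)] += gain * tail
    return out


# Usage: decorrelated 0.4 s tails for the two ears, mixed in 5 ms after the BRIR start.
fs = 48000
brir_left, brir_right = np.zeros(256), np.zeros(256)  # placeholders for windowed BRIRs
brir_left[0], brir_right[10] = 1.0, 0.8
left = add_reverb_tail(brir_left, decaying_noise_tail(0.4, fs, seed=1), 240, 0.05)
right = add_reverb_tail(brir_right, decaying_noise_tail(0.4, fs, seed=2), 240, 0.05)
```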
As can be seen from
Throughout this study, care was taken not to destroy the phase information of the BRIR. The magnitude frequency response in
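One way to check that processing such as windowing or equalization has preserved the directional cues is to compare ITD and ILD estimates of the BRIR pair before and after processing. This Python sketch is an assumption, not a procedure from the text. It estimates the ITD from the lag of the cross-correlation maximum and the ILD from the energy ratio of the two ears.

```python
import numpy as np


def itd_ild(brir_left, brir_right, fs):
    """Estimate ITD in seconds (positive when the right ear lags) and ILD in dB (left re right)."""
    # np.correlate "full" output index i corresponds to lag i - (len(brir_left) - 1).
    xcorr = np.correlate(brir_right, brir_left, mode="full")
    lag = int(np.argmax(np.abs(xcorr))) - (len(brir_left) - 1)
    ild_db = 10.0 * np.log10(np.sum(brir_left ** 2) / np.sum(brir_right ** 2))
    return lag / fs, ild_db
```

Broadband cross-correlation yields a single ITD value, with a resolution of one sample (about 21 µs at 48 kHz).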
While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
14186097 | Sep 2014 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2015/071639 | 9/22/2015 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2016/046152 | 3/31/2016 | WO | A |
Number | Date | Country | Kind |
---|---|---|---|
20170295445 | Oct 2017 | US | A1 |