The present disclosure relates to improvements to binaural processing. More particularly, it relates to methods and systems for providing a lightweight process for binaural processing.
Audio systems are typically made up of an audio source (such as a radio receiver, smartphone, laptop computer, desktop computer, tablet, television, etc.) and speakers. In some cases, the speakers are worn proximal to the ears of the listener, e.g., headphones and earbuds. In that situation, it is sometimes desirable to emulate the audio qualities of external speakers not proximal to the ears. This can be done by synthesizing the sound to create a binaural effect prior to sending the audio to the proximal speakers (henceforth referred to as headphones).
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art based on this section, unless otherwise indicated.
While it is desirable to synthesize the sound to create a binaural effect prior to sending the audio to the speakers, not all audio sources are set up to perform this synthesis, and conventional synthesizing circuitry is too memory-intensive and complex to be included in headphones or earbuds.
The methods and systems/devices described herein present a lower complexity (lightweight) means of creating quality binaural effects with channel-level controlled reverb. This, among other things, allows for binaural virtualization implementation in small devices, including headphones and earbuds, which would normally not be feasible.
The disclosure herein describes systems and methods for providing lightweight binaural virtualization that could be included in headphones, earbuds, or other devices that are memory and complexity sensitive. The systems and methods can be implemented as part of an audio decoder.
An embodiment of the invention is a device providing binaural virtualization, the device comprising: an input of a left input signal and a right input signal; a virtualizer; an upmixer configured to convert the left input signal and right input signal to a right channel, a left channel, and a center channel; a mixer configured to combine the left input signal with the left channel based on a center-only reverb amount value and combine the right input signal with the right channel based on the center-only reverb amount value, producing a mixer output; and a reverb module configured to apply reverb to the mixer output for the virtualizer.
An embodiment of the invention is a method for providing binaural virtualization, the method comprising: receiving input of a left input signal and a right input signal; upmixing the left input signal and right input signal to a right channel, a left channel, and a center channel; mixing the left input signal with the left channel based on a center-only reverb amount value and mixing the right input signal with the right channel based on the center-only reverb amount value, thereby producing a mixer output; and applying reverb to the mixer output for a virtualizer.
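A minimal sketch of this signal path, processed per stereo sample, is given below. The passive 2-to-3 upmix, the linear crossfade controlled by the center-only reverb amount, and all function names are illustrative assumptions made for this sketch; they are not the specific formulas of this disclosure.

/*
 * Illustrative sketch only: upmix a stereo input to left/right/center,
 * then mix the raw inputs with the upmixed side channels under control
 * of a center-only reverb amount c in [0, 1] to form the reverb input.
 */
#include <stdio.h>

typedef struct { float left, right, center; } Upmix3;

/* Assumed passive upmix: derive a center channel and reduce it in L/R. */
static Upmix3 upmix_2to3(float in_l, float in_r)
{
    Upmix3 u;
    u.center = 0.5f * (in_l + in_r);
    u.left   = in_l - 0.5f * u.center;
    u.right  = in_r - 0.5f * u.center;
    return u;
}

/* Assumed mixer: with c = 0 the reverb input carries only the upmixed
 * side channels (no center content); with c = 1 it carries the full
 * input signals, including the center content. */
static void mix_for_reverb(float in_l, float in_r, const Upmix3 *u,
                           float c, float *rev_l, float *rev_r)
{
    *rev_l = c * in_l + (1.0f - c) * u->left;
    *rev_r = c * in_r + (1.0f - c) * u->right;
}

int main(void)
{
    float in_l = 0.25f, in_r = -0.10f;   /* one stereo sample */
    float c = 0.5f;                      /* center-only reverb amount */
    Upmix3 u = upmix_2to3(in_l, in_r);
    float rev_l, rev_r;
    mix_for_reverb(in_l, in_r, &u, c, &rev_l, &rev_r);
    printf("reverb input: L=%f R=%f\n", rev_l, rev_r);  /* fed to the reverb module */
    return 0;
}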
These embodiments are exemplary and not limiting: other embodiments can be envisioned based on the disclosure herein.
As used herein, “lightweight” refers to a reduced memory and complexity implementation of circuitry. This reduces the footprint and energy consumption of the circuit.
As used herein, “HRIR” refers to the head related impulse response. This can be thought of as the time domain representation of an HRTF (head related transfer function) which describes how an ear receives sound from a source.
As used herein, “ITD” refers to the interaural time difference, which describes the difference in the time at which each ear receives a given instance of sound from a source.
As used herein, “ILD” refers to the interaural level difference, which describes the difference in the perceived amplitude with which each ear receives a given instance of sound from a source.
As used herein, “Butterworth filter” refers to a filter designed to have a frequency response that is essentially flat in the passband.
As used herein, “binaural” refers to sound sent separately to each ear with the effect of a plurality of speakers placed at a distance from the listener and at a distance from each other.
As used herein, “virtualizer” refers to a system that can synthesize binaural sound.
As used herein, “upmixing” is a process where M input channels are converted to N output channels, where N>M (integers). An “upmixer” is a module that performs upmixing.
As used herein, a “signal” is an electronic representation of audio or video, input or output from a system. The signal can be stereo (left and right signals being separate). As used herein, a “channel” is a portion of a signal being processed by a system. Examples of channels are left, right, and center.
As used herein, “module” refers to a part of hardware, software, or firmware that performs a particular function. Modules are not necessarily physically separated from each other in implementation.
As used herein, “input stage” refers to the hardware and/or software/firmware that handles receiving input signals for a device.
An HRIR head model from C. Phillip Brown and Richard O. Duda, “A Structural Model for Binaural Sound Synthesis,” IEEE Transactions on Speech and Audio Processing, vol. 6, no. 5, September 1998, is a combination of ITD and ILD. The ITD model is based on head radius and azimuth angle, following Woodworth and Schlosberg's formula (see Woodworth, R. S., and Schlosberg, H. (1962), Experimental Psychology (Holt, New York), pp. 348-361). With the elevation angle set to zero, the formula becomes:
ITD=(a/c)(θ+sin θ) (1)
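In equation (1), a is the head radius, c is the speed of sound, and θ is the azimuth angle of the source. As a worked example, assuming a typical head radius a ≈ 0.0875 m and c ≈ 343 m/s, a source directly to one side (θ = π/2) gives

$$\mathrm{ITD} = \frac{0.0875}{343}\left(\frac{\pi}{2} + \sin\frac{\pi}{2}\right) \approx (2.55\times10^{-4})(2.57) \approx 0.66\ \text{ms},$$

which is consistent with the maximum interaural delays commonly reported for human listeners.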
By adding a minimum-phase filter to account for the magnitude response (head shadow), one can approximate the ILD cue. The ILD filter can additionally provide the frequency-dependent delay that is observed.
By cascading the ITD delay and the ILD filter, the combined head-model filter is obtained in the time domain.
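One commonly used single-pole, single-zero form of the head-shadow (ILD) filter in the Brown-Duda structural model, given here as an illustrative reconstruction rather than as the exact filter of this disclosure, is

$$H(\omega,\theta)=\frac{1+j\,\alpha(\theta)\,\omega/(2\omega_0)}{1+j\,\omega/(2\omega_0)},\qquad \omega_0=\frac{c}{a},$$

where α(θ) varies with azimuth between roughly 0 (shadowed ear) and 2 (unshadowed ear), for example α(θ) = 1 + cos θ. Cascading this filter with the ITD delay of equation (1) yields the per-ear head-model filter.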
A harmonic generator can generate harmonics, based mostly on the center channel, to provide a virtual bass effect. It multiplies each sample by a function of its own magnitude to generate harmonics:
y =x(1−0.5|x|) (5)
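The waveshaper of equation (5) can be applied per sample. A minimal sketch is shown below; the test tone, its level, and the function names are illustrative only.

#include <math.h>
#include <stdio.h>

/* Per-sample harmonic generator of equation (5): y = x * (1 - 0.5*|x|).
 * Shaping each sample by a function of its own magnitude adds harmonics
 * of the input tone, which supports a virtual bass effect. */
static float harmonic_sample(float x)
{
    return x * (1.0f - 0.5f * fabsf(x));
}

int main(void)
{
    const float pi = 3.14159265f;
    /* Pass one cycle of a low-level tone through the generator. */
    for (int n = 0; n < 8; n++) {
        float x = 0.8f * sinf(2.0f * pi * (float)n / 8.0f);
        printf("x=% .3f  y=% .3f\n", x, harmonic_sample(x));
    }
    return 0;
}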
An equalizer can apply parametric or shelving filters, for example using a method from S. J. Orfanidis, “High-Order Digital Parametric Equalizer Design,” J. Audio Eng. Soc., vol. 53, no. 11, pp. 1026-1046, November 2005.
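For illustration, a simpler second-order (biquad) peaking equalizer, built from the widely published “Audio EQ Cookbook” formulas rather than the high-order Orfanidis design cited above, is sketched below; the sample rate, center frequency, Q, and gain are arbitrary example values.

#include <math.h>
#include <stdio.h>

/* Biquad peaking equalizer (coefficients normalized by a0). */
typedef struct {
    float b0, b1, b2, a1, a2;   /* normalized coefficients */
    float x1, x2, y1, y2;       /* filter state */
} Biquad;

static Biquad peaking_eq(float fs, float f0, float q, float gain_db)
{
    float A     = powf(10.0f, gain_db / 40.0f);
    float w0    = 2.0f * 3.14159265f * f0 / fs;
    float alpha = sinf(w0) / (2.0f * q);
    float a0    = 1.0f + alpha / A;
    Biquad f = {
        (1.0f + alpha * A) / a0,   /* b0 */
        (-2.0f * cosf(w0)) / a0,   /* b1 */
        (1.0f - alpha * A) / a0,   /* b2 */
        (-2.0f * cosf(w0)) / a0,   /* a1 */
        (1.0f - alpha / A) / a0,   /* a2 */
        0.0f, 0.0f, 0.0f, 0.0f
    };
    return f;
}

static float biquad_process(Biquad *f, float x)
{
    float y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
            - f->a1 * f->y1 - f->a2 * f->y2;
    f->x2 = f->x1; f->x1 = x;
    f->y2 = f->y1; f->y1 = y;
    return y;
}

int main(void)
{
    Biquad eq = peaking_eq(48000.0f, 1000.0f, 0.707f, 6.0f); /* +6 dB at 1 kHz */
    for (int n = 0; n < 4; n++)                              /* impulse response */
        printf("%f\n", biquad_process(&eq, n == 0 ? 1.0f : 0.0f));
    return 0;
}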
In some embodiments, binaural sound is synthesized by controlling the amount of reverb on the channels: channel amplitudes are adjusted based on a total reverb amount value. For example, the reverb-path amplitude p_rev of each channel is scaled by α, where α is the total reverb value.
The mixer output (413) is then passed through ipsi (415-I) and contra (415-C) filters, then mixed with the center channel (420), creating the virtualized binaural signal output (42 ).
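For illustration, one simple way to realize the total reverb value α as a per-channel wet/dry blend is sketched below; the blend law and the function name are assumptions, not the specific scaling of this disclosure.

#include <stdio.h>

/* Blend a dry (unreverberated) and wet (reverberated) sample under the
 * control of the total reverb value alpha in [0, 1]. */
static float apply_total_reverb(float dry, float wet, float alpha)
{
    return (1.0f - alpha) * dry + alpha * wet;
}

int main(void)
{
    printf("%f\n", apply_total_reverb(0.5f, 0.2f, 0.3f));  /* mostly dry */
    return 0;
}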
The control of the total reverb amount allows control of the virtualization, thereby allowing the manufacturer of the headphones to adapt the virtualization to the specific hardware of the headphones and/or the user to adjust the virtualization experience. In some embodiments, a center-only reverb amount can be controlled by an API (application programming interface), for example from an app on a device paired with the headphones. This control can be automated by the software of the mobile device (e.g., upon detection of a voice in the audio that should have reduced reverb), or it can be set/adjusted by the user through a user interface to provide a customized virtualization experience, or both. In some embodiments, the center-only reverb amount is set or adjusted by the headphones themselves (e.g., a pre-set value or offset value in the software/firmware) to provide the best balance based on how the hardware handles reverb.
In some embodiments, the center-only reverb amount is controlled independently from the total reverb amount (i.e., the two values can differ from each other). This helps control the balance of reverb between the center channel and the left and right channels to, for example, avoid too much reverb on voice audio in the center channel while still having enough reverb on the music to provide a virtualized 3D experience.
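A minimal sketch of such independent controls is shown below; the structure, field, and function names are hypothetical and are not part of any defined API.

#include <stdio.h>

/* Two independently settable reverb controls, each clamped to [0, 1]. */
typedef struct {
    float total_reverb;        /* overall reverb amount */
    float center_only_reverb;  /* reverb amount applied to center content */
} ReverbControls;

static float clamp01(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

static void set_total_reverb(ReverbControls *c, float v)       { c->total_reverb = clamp01(v); }
static void set_center_only_reverb(ReverbControls *c, float v) { c->center_only_reverb = clamp01(v); }

int main(void)
{
    ReverbControls c = { 0.0f, 0.0f };
    set_total_reverb(&c, 0.8f);          /* e.g., preset by the manufacturer */
    set_center_only_reverb(&c, 0.3f);    /* e.g., reduced for dialog via the API */
    printf("total=%.2f center-only=%.2f\n", c.total_reverb, c.center_only_reverb);
    return 0;
}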
A straightforward way to generate reverb on the center channel is to apply a dedicated reverb directly to the upmixed center channel.
In a more efficient arrangement, the reverb module (555) is instead fed from a mixed input derived from the input channels (565) and the upmixed left and right channels (570) of the upmixer (560). The mixing is controlled by a center-only reverb value (center reverb amount), similarly to the mixing described above.
Both the center-only reverb value and the total reverb value can be separately controlled by an API.
The efficient reverb generation method described above reduces memory use and computational complexity, supporting the lightweight implementation described herein.
In some embodiments, the mix proportion is controlled as a piecewise non-linear function of r, the center-only reverb value (e.g., the API setting); A, a constant to normalize the results (providing a consistent volume); w, a value from the upmixer giving the proportion of a left or right channel (e.g., the left channel) in the center channel; and thr, a threshold value.
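The specific piecewise function is not reproduced here; the sketch below shows one possible shape of such a function and is purely illustrative, with branches and constants chosen only for demonstration.

#include <stdio.h>

/* Hypothetical piecewise non-linear mix proportion: linear in r below the
 * threshold thr, steeper above it, scaled by the upmixer weight w and the
 * normalization constant A. */
static float mix_proportion(float r, float w, float A, float thr)
{
    if (r <= thr)
        return A * r * w;
    return A * (thr * w + (r - thr) * (r - thr));
}

int main(void)
{
    for (float r = 0.0f; r <= 1.001f; r += 0.25f)
        printf("r=%.2f  p=%.3f\n", r, mix_proportion(r, 0.5f, 1.0f, 0.5f));
    return 0;
}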
In some embodiments, reverb generation can be switched between two modes of complexity.
The reverb can be built from a comb filter and an all-pass filter, whose Z-domain transfer functions are parameterized by reflection gains g1 and g2 and a delay of d samples.
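For reference, the standard Schroeder-style forms of a feedback comb filter and an all-pass filter with those parameters are

$$H_{\mathrm{comb}}(z)=\frac{z^{-d}}{1-g_1\,z^{-d}},\qquad H_{\mathrm{ap}}(z)=\frac{-g_2+z^{-d}}{1-g_2\,z^{-d}};$$

these textbook expressions are given for illustration and are not necessarily the exact expressions of this disclosure.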
The following description of a further embodiment will focus on the differences between it and the previously described embodiment. Therefore, features which are common to both embodiments will be omitted from the following description, and it should be assumed that features of the previously described embodiment are, or at least can be, implemented in the further embodiment, unless the following description requires otherwise. In some embodiments, the lightweight virtualizer can detect that virtualization is not needed and bypass the virtualization. This can be done by API instruction, by machine-learning-derived binaural detection (see, e.g., Chunmao Zhang et al., “Blind Detection Of Binauralized Stereo Content”, WO2019/209930A1, incorporated herein by reference in its entirety), or by receiving an identification of the mobile device or mobile device app that is known to perform virtualization.
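A minimal sketch of such a bypass decision is shown below; the enum values and function names are hypothetical and are not part of any defined API.

#include <stdbool.h>
#include <stdio.h>

/* Possible reasons to skip virtualization for the current content. */
typedef enum {
    HINT_NONE,                /* no reason to bypass: virtualize            */
    HINT_API_BYPASS,          /* explicit API instruction to bypass         */
    HINT_DETECTED_BINAURAL,   /* detector flags already-binauralized audio  */
    HINT_SOURCE_VIRTUALIZES   /* paired device/app known to virtualize      */
} BypassHint;

static bool should_bypass(BypassHint hint)
{
    return hint != HINT_NONE;
}

int main(void)
{
    printf("bypass=%d\n", (int)should_bypass(HINT_DETECTED_BINAURAL));
    return 0;
}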
Several embodiments of the disclosure have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the present disclosure. Accordingly, other embodiments are within the scope of the following claims.
The examples set forth above are provided to those of ordinary skill in the art as a complete disclosure and description of how to make and use the embodiments of the disclosure and are not intended to limit the scope of what the inventor/inventors regard as their disclosure.
Modifications of the above-described modes for carrying out the methods and systems herein disclosed that are obvious to persons of skill in the art are intended to be within the scope of the following claims. All patents and publications mentioned in the specification are indicative of the levels of skill of those skilled in the art to which the disclosure pertains.
It is to be understood that the disclosure is not limited to particular methods or systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. The term “plurality” includes two or more referents unless the content clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains.
Number | Date | Country | Kind
---|---|---|---
PCT/CN2021/077922 | Feb. 2021 | WO | International
This application claims priority to U.S. Provisional Application No. 63/266,500 filed on Jan. 6, 2022, and U.S. Provisional Application No. 63/168,340 filed on Mar. 31, 2021, titled “LIGHTWEIGHT VIRTUALIZER FOR BINAURAL SIGNAL GENERATION FROM STEREO” and International Application No. PCT/CN2021/077922 filed on Feb. 25, 2021, the contents of which are incorporated by reference in their entirety herein.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US22/17823 | Feb. 25, 2022 | WO |
Number | Date | Country |
---|---|---|
63/266,500 | Jan. 2022 | US |
63/168,340 | Mar. 2021 | US |