The present invention relates generally to psychoacoustic enhancement of bass sensation, and more particularly to preservation of directionality and stereo image under such enhancement.
Problems of psychoacoustic audio enhancement have been recognized in the conventional art and various techniques have been developed to provide solutions, for example:
Psychoacoustic bass enhancement has received strong interest from consumer electronics manufacturers. Due to physical limitations and cost constraints, products such as low-end speakers and headphones often suffer from inferior bass performance.
Solutions have been proposed based on the psychoacoustic phenomenon known as the “missing fundamental”, whereby the human auditory system can perceive the fundamental frequency of a complex signal according to its higher harmonics.
Many methods of bass enhancement exploit this effect, in essence creating a virtual pitch at low frequencies. It is thus common in the art of audio enhancement to add harmonics to an original signal, without producing the whole low frequency range, so that the audience can perceive the fundamental frequencies even though these frequencies are not physically present in the generated sound, or even if the speakers/headphones cannot generate those frequencies at all.
Some further examples for the psychoacoustic effect are shown in U.S. Pat. No. 5,930,373, in “Ben-Tzur, D. et al.: The Effect of MaxxBass Psychoacoustic Bass Enhancement on Loudspeaker Design, 106th AES Convention, Munich, Germany, 1999”, in “Woon S. Gan, Sen. M. Kuo, Chee W. Toh: Virtual bass for home entertainment, multimedia pc, game station and portable audio systems, IEEE Transactions on Consumer Electronics, Vol. 47, No. 4, November 2001, page 787-794”, at “http://www.srslabs.com/partners/aetech/trubass_theory.asp”, at “http://vst-plugins.homemusician.net/instruments/virtual_bass_vb1.html”, at “http://mp3.deepsound.net/plugins_dynamique.php”, and at “http://www.srs-store.com/store-plugins/mall/pdfWOW%20XT%Plug-inmanual.pdf”.
The references cited above teach background information that may be applicable to the presently disclosed subject matter. Therefore the full contents of these publications are incorporated by reference herein where appropriate for appropriate teachings of additional or alternative details, features and/or technical background.
Existing methods for virtual bass enhancement often replace the fundamental bass frequency with its higher harmonics. Such methods typically generate harmonics based on some type of monophonic signal, such as the sum of the stereo input audio channels. These harmonics are often controlled through a nonlinear gain control as shown in [1] or through an amplifier as shown in [3] and [5]. This gain adjustment is often intended to equalize the perceived loudness of the harmonics signal with the perceived loudness of the input fundamental frequency.
With non-monophonic input signals (e.g. stereo, binaural, surround etc.), these methods can suffer from problems, such as:
These problems can become more severe in some consumer devices where the harmonics must be generated in higher frequencies due to the small size of the loudspeakers—as directional cues in higher frequencies are highly important for the stereo image in stereo audio, and for perceived directionality in a binaural signal.
Among the advantages of some embodiments of the presently disclosed subject matter are: providing a bass enhancement effect which can better preserve stereo image, can better preserve directional perception of binaural signals, and can better preserve directional cues including ILD and ITD.
According to one aspect of the presently disclosed subject matter there is provided a method for conveying to a listener a directionality-preserving pseudo low frequency psycho-acoustic sensation of a multichannel sound signal, comprising:
In addition to the above features, the method according to this aspect of the presently disclosed subject matter can comprise one or more of features (i) to (ix) listed below, in any desired combination or permutation which is technically possible:
According to another aspect of the presently disclosed subject matter there is provided a non-transitory program storage device readable by a processing circuitry, tangibly embodying computer readable instructions executable by the processing circuitry to perform a method for conveying to a listener a directionality-preserving pseudo low frequency psycho-acoustic sensation of a multichannel sound signal, comprising:
In order to understand the invention and to see how it can be carried out in practice, embodiments will be described, by way of non-limiting examples, with reference to the accompanying drawings, in which:
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the presently disclosed subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the presently disclosed subject matter.
Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as “processing”, “computing”, “representing”, “comparing”, “generating”, “assessing”, “matching”, “updating” or the like, refer to the action(s) and/or process(es) of a computer that manipulate and/or transform data into other data, said data represented as physical, such as electronic, quantities and/or said data representing the physical objects. The term “computer” should be expansively construed to cover any kind of hardware-based electronic device with data processing capabilities including, by way of non-limiting example, “processing unit” disclosed in the present application.
The terms “non-transitory memory” and “non-transitory storage medium” used herein should be expansively construed to cover any volatile or non-volatile computer memory suitable to the presently disclosed subject matter.
The operations in accordance with the teachings herein may be performed by a computer specially constructed for the desired purposes or by a general-purpose computer specially configured for the desired purpose by a computer program stored in a non-transitory computer-readable storage medium.
Embodiments of the presently disclosed subject matter are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the presently disclosed subject matter as described herein.
Human perception of direction of sound is based mainly on directional cues such as ILD (inter-aural level difference) and ITD (inter-aural time difference). A multi-channel audio content to be reproduced is assumed to include ILD and ITD cues resulting from the recording or mixing process. For example: stereo music contains several instruments and vocals, each positioned in a different direction in the stereo image, encoded by a stereophonic microphone used for recording, or by amplitude panning in the multi-track mixing process.
When a subject is listening to loudspeakers, due to the cross-talk from each loudspeaker to the opposite ear, the perceived ITD of a sound source is in fact affected by both the time (or phase) and level differences between the channels of the signal.
However, when monophonic bass harmonics have been added to the signal, the perceived ILD of the fundamental frequency in the original sound (as indicated by the ratio between the level of the fundamental frequency in the left channel and the level of the fundamental frequency in the right channel) is not preserved in the harmonics, in both headphone and loudspeaker listening setups. Because the channels are summed to mono before the harmonics are generated, ITD is also not preserved. When the same content is reproduced over limited-range loudspeakers or headphones, lacking bass response, and when some of the bass energy is replaced with higher harmonics for bass-enhancement (e.g. [1]), it is desirable to preserve the directional cues as they would be reproduced by a full-range device.
In order to produce a harmonics signal in a multichannel system that preserves the stereo image and the ILD of binaural content, the following should be taken into consideration:
In the descriptions provided hereinbelow, operations are sometimes described, for reasons of convenience, as being applied to all channels, to all frequencies in a channel, to all ILDs etc. It will be understood that in all these cases that, by way of non-limiting example, these operations can be applied to a subset of the channels, frequencies in a channel etc. in some embodiments of the presently disclosed subject matter.
Similarly, in the descriptions provided hereinbelow, operations are sometimes described, for reasons of convenience, using identifiers such as, for example, 390. It will be understood that such descriptions can also pertain, by way of non-limiting example, to identifiers 390a, 390b etc.
Attention is now directed to
Processing Unit 100 is an exemplary system which implements directionality-preserving bass enhancement. Processing Unit 100 can receive a multichannel input signal 105, which can contain various types of audio content such as, by way of non-limiting example, high fidelity stereophonic audio, binaural or surround-sound game content, etc. Processing Unit 100 can output a loudness-preserving and directionality-preserving enhanced bass multichannel output signal 145, which is, for example, suited for output on a restricted-range sound output device such as earphones or a desktop speaker.
Processing unit 100 can be, for example, a signal processing unit based on analog circuitry. Processing unit 100 can, for example, utilize digital signal processing techniques (for example: instead of or in addition to analog circuitry). In this case processing unit 100 can include a DSP (or other type of CPU) and memory. An input audio signal can then be, for example, converted to a digital signal using techniques well-known in the art, and a resulting digital output signal can, for example, similarly be converted to an analog audio signal for further analog processing. In this case the various units shown in
Processing unit 100 can include separation unit 110. Separation unit 110 can separate the low frequencies over a given range of interest from multichannel input signal 105, resulting in multichannel low-frequency signal 115 and multichannel high-frequency signal 125. Separation unit 110 can be implemented by, for example, directing each channel of multichannel input signal 105 through a high-pass filter (HPF) and a low-pass filter (LPF) (arranged in parallel), and passing the HPF output to multichannel high-frequency signal 125, and the LPF output to multichannel low-frequency signal 115.
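By way of non-limiting illustration, the band separation described above can be sketched in Python as follows. The sketch assumes a simple one-pole low-pass filter and derives the high band as the input minus the low band, which is a simplification of the parallel HPF/LPF arrangement; the function names `split_bands` and `separate` and the cutoff value used in any call are hypothetical and do not come from the disclosure.

```python
import math

def split_bands(samples, cutoff_hz, sample_rate):
    """Split one channel into (low, high) using a one-pole low-pass filter.

    The high band is the input minus the low band, so low + high
    reconstructs the input (up to floating-point rounding).
    """
    # One-pole smoothing coefficient for the chosen cutoff frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    low, high = [], []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)   # low-pass output
        low.append(state)
        high.append(x - state)         # complementary high-pass output
    return low, high

def separate(multichannel, cutoff_hz, sample_rate):
    """Apply split_bands to every channel; returns (low_channels, high_channels)."""
    pairs = [split_bands(ch, cutoff_hz, sample_rate) for ch in multichannel]
    return [p[0] for p in pairs], [p[1] for p in pairs]
```

In such a sketch each channel is filtered independently, so inter-channel level and time differences in both bands are left untouched by the separation step.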
Processing unit 100 can include harmonics unit 120. Harmonics unit 120 can generate—for each channel in the multichannel signal—harmonic frequencies according to the fundamental frequencies present in multichannel low-frequency signal 115, and output multichannel harmonic signal 135.
In some embodiments of the presently disclosed subject matter, harmonics unit 120 produces multichannel harmonic signal 135 with some or all of the following characteristics:
The loudness of one signal can be considered as substantially matching the loudness of another signal when, for example, the criteria for “essentially loudness match” specified in [1] are met. A fundamental frequency from which a harmonic is derived is herein referred to as a corresponding fundamental frequency. A channel in the low-frequency multichannel signal from which a channel in the harmonic multichannel signal is derived is herein referred to as a corresponding channel.
The ILD of one pair of channels of a multichannel signal at a particular frequency can be considered as substantially matching the ILD of another pair of channels in the corresponding multichannel signal at a different frequency when, for example, the ILDs have equivalent perceived level difference according to, for example, a frequency-sensitive head-shadowing model such as, for example, the model described in Brown, C. P., Duda, R. O.: An efficient hrtf model for 3-D sound. In: Proceedings of the IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, IEEE (1997).
Harmonics unit 120 can be implemented in any suitable manner. By way of non-limiting example, harmonics unit 120 can be implemented using a time-domain structure as described herein below with reference to
Processing unit 100 can include mixer unit 130. Mixer unit 130 can combine multichannel high-frequency signal 125 and multichannel harmonic signal 135 to create multichannel output signal 145. Mixer unit 130 can be implemented, for example, by a mixer circuit or by its digital equivalent.
It is noted that the teachings of the presently disclosed subject matter are not bound by the directionality-preserving bass enhancement system described with reference to
It is noted that the teachings of the presently disclosed subject matter are not bound by the flow chart illustrated in
Attention is now directed to
The processor 100 (for example: harmonics unit 120) can, for each channel, generate 210 a per-channel harmonics signal—including harmonic frequencies corresponding to each fundamental frequency in the channel signal.
The processor 100 (for example: harmonics unit 120) can generate 220 a reference signal derived from the multichannel signal (for example: for every sample in the time domain or for every buffer in the frequency domain).
The processor 100 (for example: harmonics unit 120) can generate 230 a loudness gain adjustment according to the loudness characteristics of the reference signal.
The processor 100 (for example: harmonics unit 120) can generate 240 a directionality gain adjustment for each per-channel harmonics signal, according to the directionality cues between the input signal that generated the per-channel harmonics signal and the reference signal.
The processor 100 (for example: harmonics unit 120) can, to each per-channel harmonics signal, apply 250 the generated loudness gain adjustment and ILD gain adjustment.
It is noted that the teachings of the presently disclosed subject matter are not bound by the flow chart illustrated in
Attention is now directed to
For clarity of explanation, exemplary harmonics unit 120 includes processing for two audio channels. It will be clear to one skilled in the art how this teaching is to be applied in embodiments including more than two audio channels.
As described hereinabove with reference to
In some embodiments of the presently disclosed subject matter, the HGU 310a generates, according to its input signal, a harmonics signal 320a consisting of at least the first two harmonic frequencies of each fundamental frequency of the input signal.
A HGU 310 can be implemented, for example, as a recursive feedback loop such as the one described in
In some embodiments of the presently disclosed subject matter, each harmonics signal 320a, 320b is utilized as an input to the Harmonics Level Control unit (HLC) 340. The HLC can output, for example, adjusted harmonics signals 380a, 380b, where the adjusted harmonics signals substantially match both a) the loudness of the corresponding original low frequency channel signals and b) directional cue information such as, for example, the ILD or the ITD.
In some embodiments of the presently disclosed subject matter, the HLC 340 includes envelope components 345a, 345b which can determine an envelope for each per-channel harmonic signal. The per-channel envelope can then serve as input to a maximum selection component 350 and also to unlinked gain curve components 370a, 370b.
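By way of non-limiting illustration, a per-channel envelope of the kind mentioned above could be computed with a peak envelope follower. The following Python sketch is one common approach and is an assumption, not the disclosed implementation; the function name `envelope` and the attack and release time constants are hypothetical.

```python
import math

def envelope(samples, sample_rate, attack_ms=5.0, release_ms=50.0):
    """Peak envelope follower with separate attack and release smoothing."""
    # Per-sample smoothing coefficients derived from the time constants.
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env, out = 0.0, []
    for x in samples:
        level = abs(x)
        # Rise quickly when the signal exceeds the envelope, fall slowly otherwise.
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out
```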
Maximum selection component 350 receives each per-channel envelope as input, and outputs an envelope that is indicative of the loudness of the input channels. In some embodiments of the presently disclosed subject matter, the output envelope can be, for example, the maximum value of the input envelopes. In some embodiments of the presently disclosed subject matter, the output envelope can be, for example, the average value of the input envelopes. The output envelope can be supplied as input to the linked gain curve component 360.
The linked gain curve component 360 can yield a gain curve that adjusts the loudness of the corresponding harmonics signal according to a loudness model such as the Fletcher-Munson model, so that the loudness (for example as measured in phon) of each generated harmonic frequency is the same as the loudness of the fundamental frequency from which the harmonic was generated.
Linked gain curve component 360 can be implemented, for example, as a dynamic range compressor or an AGC as shown in
The nonlinear unlinked gain curve components 370a, 370b can utilize the envelope resulting from the maximum selection component 350 to yield a gain curve that adjusts the level of the corresponding harmonics signal so that the perceived ILD of the harmonics signal substantially matches the ILD of the fundamental frequency.
Unlinked gain curve components 370a, 370b can be implemented, for example, as a dynamic range compressor or an AGC as shown in
The linked gains can then be multiplied by the unlinked gains, and the resulting gain signal is applied to both the harmonic signal 320 and as a control signal to the feedback process of the harmonic generator 310.
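By way of non-limiting illustration, the linking of the per-channel envelopes and the multiplication of the linked and unlinked gains can be sketched in Python as follows. The gain curves themselves are not modeled here, and the feedback path to the harmonic generator is omitted; the function names `linked_envelope` and `apply_combined_gain` are hypothetical.

```python
def linked_envelope(envelopes, mode="max"):
    """Collapse per-channel envelopes into one linked envelope, sample by sample."""
    if mode == "max":
        return [max(values) for values in zip(*envelopes)]
    if mode == "mean":
        return [sum(values) / len(values) for values in zip(*envelopes)]
    raise ValueError("mode must be 'max' or 'mean'")

def apply_combined_gain(harmonics, linked_gain, unlinked_gains):
    """Multiply each channel's harmonics by linked_gain times its own unlinked gain."""
    out = []
    for channel, unlinked in zip(harmonics, unlinked_gains):
        out.append([x * g_l * g_u
                    for x, g_l, g_u in zip(channel, linked_gain, unlinked)])
    return out
```

Because the linked gain is shared by all channels, it changes overall loudness without altering the level ratio between channels; only the unlinked gains change that ratio.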
It is noted that the teachings of the presently disclosed subject matter are not bound by the directionality-preserving bass enhancement system described with reference to
Attention is now drawn to
The processing unit (100) (for example: harmonics generator units 310) can, for each channel, generate 410, according to its input signal, a harmonics signal 320a consisting of at least the first two harmonic frequencies of each fundamental frequency of the input signal.
The processing unit (100) (for example: envelope units 345) can, for each channel, calculate 420 an envelope for the harmonics signal.
The processing unit (100) (for example: maximum unit 350) can determine 430 a linked envelope value.
The processing unit (100) (for example: unlinked gain curve 370) can, for each channel, apply 440 a nonlinear gain curve on the unlinked envelope so as to create a gain curve representing the correct ratio between the harmonics (e.g. according to a head shadowing model).
The processing unit (100) (for example: linked gain curve 360) can apply 450 a nonlinear gain curve on the linked envelope so as to create a gain curve representing the correct loudness of the harmonics.
The processing unit (100) (for example: mixer 240) can, for each channel, combine 460 the unlinked gain with the linked gain.
The processing unit (100) (for example: mixer 330) can, for each channel, apply 470 the combined gain curve to the output harmonics signal.
It is noted that the teachings of the presently disclosed subject matter are not bound by the flow chart illustrated in
Attention is now directed to
For clarity of explanation, exemplary harmonics unit 120 includes processing for two audio channels. It will be clear to one skilled in the art how this teaching is to be applied in embodiments including more than two audio channels.
Harmonics unit 120 can optionally include a downsampling component 510. Downsampling component 510 can reduce the original sampling rate by a factor (termed D) so that the highest harmonic frequency will be below the Nyquist frequency of the new sample rate (sample_rate/(2*D)). By way of non-limiting example, if the highest harmonic frequency is 1400 Hz (the 4th harmonic) and the sample_rate is 48 kHz then D will be 16.
Harmonics unit 120 can include, for example, a Fast Fourier Transform (FFT) component 520. The FFT can convert the input time domain signal to a frequency domain signal. In some embodiments of the presently disclosed subject matter, a different time-domain to frequency-domain conversion method can be used instead of FFT. The FFT can be used, for example, with or without time overlap and/or by summing the bands of a filter-bank.
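By way of non-limiting illustration, the choice of D can be sketched in Python as follows. After decimation by D the sample rate is sample_rate/D, so the Nyquist frequency is sample_rate/(2*D). The sketch assumes that D is restricted to powers of two, which is an assumption made here because it reproduces the value of 16 in the example above; the function name `decimation_factor` is hypothetical.

```python
def decimation_factor(sample_rate, highest_harmonic_hz):
    """Largest power-of-two D with sample_rate / (2 * D) above the highest harmonic."""
    # No decimation is possible if the harmonic is already at or above Nyquist.
    if highest_harmonic_hz <= 0 or sample_rate <= 2 * highest_harmonic_hz:
        return 1
    d = 1
    # Keep doubling while the next candidate still leaves headroom below Nyquist.
    while sample_rate / (2 * (d * 2)) > highest_harmonic_hz:
        d *= 2
    return d
```

For a 48 kHz input and a highest harmonic of 1400 Hz, D = 16 gives a decimated rate of 3000 Hz and a Nyquist frequency of 1500 Hz, whereas D = 32 would give 750 Hz, which is too low.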
FFT 520 can, for example, split the frequency domain signal into a group of frequency bands—where each band contains a single fundamental frequency. Each band can further consist of several bins.
Harmonics unit 120 can include—for each band—a Harmonics Level Control component 530 and a pair of harmonics generator components 540, 542 (one per channel). Harmonics Level Control component 530 and harmonics generator components 540, 542 can, for example, receive the per-band multichannel input signal as input.
where “fund” is the linear sound pressure level in the fundamental bin and hN is the linear sound pressure level in the Nth harmonics bin of the relevant fundamental.
Per-band harmonics generators 540, 542 can generate—for each channel of the multichannel signal—a series of harmonics signals (up to Nyquist frequency) with intensity equal to the fundamental frequency intensity. Per-band harmonics generators 540, 542 can generate the harmonics signals using methods known in the art, such as, for example, by applying a pitch shift of the fundamental as described in [2].
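By way of non-limiting illustration, generating a harmonic series with intensity equal to that of the fundamental can be sketched in Python as follows. The sketch simply places the fundamental's level at integer multiples of the fundamental's bin index, which is a simplification and not the pitch shift method of [2]; the function name `harmonic_bins` and the bin-indexed representation are hypothetical.

```python
def harmonic_bins(fund_bin, fund_level, num_bins):
    """Return {bin_index: level} for harmonics 2, 3, ... of fund_bin.

    num_bins is the number of bins from DC up to (not including) Nyquist.
    """
    if fund_bin <= 0:
        raise ValueError("fund_bin must be positive")
    harmonics = {}
    n = 2
    while n * fund_bin < num_bins:
        # Each harmonic initially carries the same level as the fundamental.
        harmonics[n * fund_bin] = fund_level
        n += 1
    return harmonics
```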
Per-band harmonics level control 530 can select, in each band, the channel with the highest fundamental frequency signal intensity (hereinafter termed channel iMax).
It is noted that at this stage the level of the harmonics is equal to the level of the fundamental.
Per-band harmonics level control 530 can calculate, for each bin in the band for each channel, the LC (loudness compensation), i.e. a gain value to render the loudness of harmonic frequencies of the bin as, for example, substantially matching the loudness of the fundamental frequency of the band in channel iMax. The loudness value can be determined, for example, using a Sound Pressure Level-to-phon ratio based on Fletcher-Munson equal loudness contours.
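By way of non-limiting illustration, a loudness compensation gain can be found by searching for the harmonic level whose loudness in phon equals that of the fundamental. The Python sketch below assumes a caller-supplied loudness model `spl_to_phon` that increases with level; the function names are hypothetical, and `toy_model` is a deliberately artificial stand-in that does not contain Fletcher-Munson data.

```python
def loudness_matching_gain(fund_hz, fund_spl_db, harm_hz, spl_to_phon,
                           lo_db=0.0, hi_db=140.0, iterations=60):
    """Gain in dB that makes a harmonic as loud (in phon) as its fundamental.

    spl_to_phon(freq_hz, spl_db) is a caller-supplied loudness model that
    must increase with spl_db. The harmonic starts at the fundamental's SPL.
    """
    target = spl_to_phon(fund_hz, fund_spl_db)
    for _ in range(iterations):          # bisection on the harmonic's SPL
        mid = 0.5 * (lo_db + hi_db)
        if spl_to_phon(harm_hz, mid) < target:
            lo_db = mid
        else:
            hi_db = mid
    return 0.5 * (lo_db + hi_db) - fund_spl_db

def toy_model(freq_hz, spl_db):
    """Toy stand-in, NOT Fletcher-Munson data: lows need more SPL per phon."""
    return spl_db - (20.0 if freq_hz < 100.0 else 5.0)
```

With the toy model, a 50 Hz fundamental at 80 dB SPL maps to 60 phon, and a 150 Hz harmonic reaches 60 phon at 65 dB SPL, so the returned gain is about -15 dB. A real implementation would substitute measured equal loudness contours.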
Optionally, per-band harmonics level control 530 can smooth the loudness compensation gains over time.
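By way of non-limiting illustration, smoothing gains over time can be done with a one-pole recursion applied once per processing frame. This Python sketch is one common choice and is an assumption; the function name `smooth_gains` and the coefficient `alpha` are hypothetical.

```python
def smooth_gains(previous, current, alpha=0.2):
    """One-pole smoothing: move each gain a fraction alpha toward its new value."""
    return [p + alpha * (c - p) for p, c in zip(previous, current)]
```

A smaller `alpha` gives slower, smoother gain changes; `alpha` equal to 1 disables smoothing.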
Per-band harmonics level control 530 can measure—for each channel and for each band in the channel—an ILD of the fundamental. It can do this, for example, by calculating the ratio between the level of the fundamental frequency in this channel in the input signal and level of the fundamental frequency in channel iMax.
By way of non-limiting example, continuing with the signal described above, the ILD of the fundamental is 0.5/1 i.e. 0.5.
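By way of non-limiting illustration, the selection of channel iMax and the ILD calculation can be sketched in Python as follows. The function name `fundamental_ild` is hypothetical, and returning a ratio of 1 for a silent band is an assumption made here to avoid division by zero.

```python
def fundamental_ild(levels):
    """Return (i_max, ilds): the loudest channel and each channel's level ratio to it."""
    i_max = max(range(len(levels)), key=lambda i: levels[i])
    reference = levels[i_max]
    if reference == 0.0:
        return i_max, [1.0] * len(levels)   # silent band: no level difference
    return i_max, [level / reference for level in levels]
```

For linear fundamental levels of 1 and 0.5, channel 0 is iMax and the ILD of channel 1 is 0.5/1, i.e. 0.5.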
Per-band harmonics level control 530 can calculate—for each channel—for each bin in the band, an ILD compensation gain i.e. a gain value to render the perceived ILD of harmonic frequencies of the bin (relative to channel iMax) as, for example, substantially matching the calculated ILD for the channel (relative to channel iMax).
Perceived ILD can be assessed according to, for example, a head shadowing model such as the exemplary curve shown in
Per-band harmonics level control 530 can derive directionality-preserving compensation gains by, for example, multiplying the calculated ILD of the fundamental by the calculated ILD compensation gains.
Optionally, per-band harmonics level control 530 can smooth the directionality-preserving compensation gains over time.
Per-band harmonics level control 530 can, for each channel and for each band within the channel, apply a spectrum modification for the harmonics signal by multiplying the amplitude of each bin by its LC gain and by its ILD gain to create output gain signals. The respective output gain signals can then be applied to the harmonic signals generated by per-band harmonics generators 540, 542. An exemplary structure for this processing is shown in detail below, with reference to
Harmonics unit 120 can include, for example, adders 550a and 550b (one adder for each channel), which can sum the harmonic signals from each band.
Harmonics unit 120 can include, for example, an inverse fast Fourier transform (IFFT) component to convert the frequency domain harmonics signal to time domain. In some embodiments of the presently disclosed subject matter, the conversion can be accomplished via other methods, for example by sum of sinusoids as described in [4]. IFFT can be used with or without time overlap and/or by summing the bands of a filter-bank.
Harmonics unit 120 can optionally include up-sampling units 570—in ratio D—in order to restore the original sample rate.
It is noted that the teachings of the presently disclosed subject matter are not bound by the directionality-preserving bass enhancement system described with reference to
Attention is now drawn to
The method described hereinbelow can be performed, by way of non-limiting example, on a system such as the one described above with reference to
The following description pertains to a method operating, for example, on a signal within the frequency domain—separated into bands which contain a fundamental frequency. Exemplary descriptions of how a frequency domain signal is obtained or how it is utilized are described above, with reference to
By way of non-limiting example, the original signal can appear as follows:
The processing unit (100) (for example: harmonics level generators 540, 542) can—for each fundamental frequency in each channel signal, generate (610) a series of harmonic frequencies. In some embodiments of the presently disclosed subject matter, the processing unit (100) (for example: harmonics level generators 540, 542) generates, for example, series of harmonic lines up to the Nyquist frequency, with intensity of the frequencies equal to the fundamental frequency. Harmonic series can be generated, for example, by a harmonic generation algorithm such as pitch shift.
By way of non-limiting example, after harmonics generation (where ch1 is the reference signal), the signal can appear thus:
In some embodiments of the presently disclosed subject matter, the processing unit (100) (for example: harmonics level generators 540, 542) can generate the harmonic series using a method that synchronizes the harmonic frequencies with the phase of the fundamental (such as, by way of non-limiting example, the method described in Sanjaume, Jordi Bonada. Audio Time-Scale Modification in the Context of Professional Audio Post-production. Informàtica i Comunicació digital, Universitat Pompeu Fabra Barcelona. Barcelona, Spain, 2002, p. 63, section 5.2.4). Such a method can, for example, ensure that the ITD of the harmonics signal substantially matches the ITD of the input signal so as to preserve directionality perceived by a listener.
Next, the processing unit (100) (for example: harmonics level control 530) can, for each fundamental frequency, determine (620) a reference signal (with a reference signal intensity) based on the input channel signals.
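By way of non-limiting illustration, one simple way to keep harmonics synchronized with the phase of the fundamental is to give the Nth harmonic N times the fundamental's phase. A time delay t between channels produces a phase difference of 2*pi*f*t at frequency f, so multiplying the phase by N at frequency N*f leaves the implied delay unchanged. The Python sketch below illustrates this arithmetic only; it is not the cited method, and the function names are hypothetical.

```python
import cmath
import math

def phase_locked_harmonic(fund_bin_value, n):
    """Harmonic n with the fundamental's magnitude and n times its phase."""
    magnitude = abs(fund_bin_value)
    phase = cmath.phase(fund_bin_value)
    return cmath.rect(magnitude, n * phase)

def implied_delay(bin_a, bin_b, freq_hz):
    """Time delay of channel b relative to a implied by their phase difference."""
    diff = cmath.phase(bin_b / bin_a)      # wrapped to (-pi, pi]
    return -diff / (2.0 * math.pi * freq_hz)
```

The delay recovered this way is only unambiguous while the phase difference at the harmonic frequency stays within half a cycle.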
Next, the processing unit (100) (for example: harmonics level control 530) can determine (630) a loudness compensation value for each harmonic frequency in each channel, according to the loudness of the fundamental frequency in the reference signal.
A loudness compensation value is a gain value to render the loudness of harmonic frequencies of the bin as, for example, substantially matching the loudness of the fundamental frequency of the band in channel iMax. The loudness value can be determined, for example, using a Sound Pressure Level-to-phon ratio based on Fletcher-Munson equal loudness contours.
Optionally, the processing unit (100) (for example: harmonics level control 530) can smooth the loudness compensation gains over time.
The processing unit (100) (for example: harmonics level control 530) can determine (640)—for each channel—for each harmonic frequency in the band, a directionality-preserving ILD compensation value i.e. a gain value to render the perceived ILD of the harmonic frequency (relative to the reference signal) as, for example, substantially matching the calculated ILD for the fundamental channel (relative to the reference signal).
To do this, the processing unit (100) (for example: harmonics level control 530) can first calculate—for each channel and for each band in the channel—an ILD of the fundamental frequency. It can do this, for example, by calculating the ratio between the level of the fundamental frequency in this channel in the input signal and level of the fundamental frequency in the reference signal.
By way of non-limiting example, continuing with the signal described above, the ILD of the fundamental is 0.5/1 i.e. 0.5.
Perceived ILD of a particular harmonic frequency can be assessed according to—for example—the actual observed ILD at the particular frequency, the particular frequency itself, and a model such as—for example—a head shadowing model such as the exemplary curve shown in
By way of non-limiting example, ILD compensation gains for the signal presented above, according to a head shadow curve in relation to the reference signal, can be as follows:
The processing unit (100) (for example: harmonics level control 530) can finally compute directionality-preserving compensation values by, for example, multiplying the calculated ILD of the fundamental by the calculated ILD compensation gains.
Optionally, processing unit (100) (for example: harmonics level control 530) can smooth the directionality-preserving compensation gains over time.
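By way of non-limiting illustration, the multiplication that yields the directionality-preserving compensation values can be sketched in Python as follows. The function name `directionality_gains` is hypothetical, and any compensation gain values passed to it in a call are placeholders, since real values would come from a head shadowing model.

```python
def directionality_gains(fundamental_ild, ild_compensation_gains):
    """Per-harmonic gains for one channel.

    Each result is the ILD of the fundamental (a linear level ratio relative
    to the reference signal) multiplied by that harmonic's ILD compensation gain.
    """
    return [fundamental_ild * g for g in ild_compensation_gains]
```

For a channel whose fundamental has an ILD of 0.5, placeholder compensation gains of 1.0, 0.5 and 0.25 would yield gains of 0.5, 0.25 and 0.125.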
By way of non-limiting example, for the signal above, directionality-preserving compensation gain=(ILD of the fundamental×ILD compensation gains), and appears thus:
It is noted that the teachings of the presently disclosed subject matter are not bound by the flow chart illustrated in
It is to be understood that the invention is not limited in its application to the details set forth in the description contained herein or illustrated in the drawings. The invention is capable of other embodiments and of being practiced and carried out in various ways. Hence, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting. As such, those skilled in the art will appreciate that the conception upon which this disclosure is based may readily be utilized as a basis for designing other structures, methods, and systems for carrying out the several purposes of the presently disclosed subject matter.
It will also be understood that the system according to the invention may be, at least partly, implemented on a suitably programmed computer. Likewise, the invention contemplates a computer program being readable by a computer for executing the method of the invention. The invention further contemplates a non-transitory computer-readable memory tangibly embodying a program of instructions executable by the computer for executing the method of the invention.
Those skilled in the art will readily appreciate that various modifications and changes can be applied to the embodiments of the invention as hereinbefore described without departing from its scope, defined in and by the appended claims.
The present application claims benefit from U.S. provisional application No. 62/535,898 “STEREO VIRTUAL BASS ENHANCEMENT” filed on Jul. 23, 2017, which is incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IL2018/050815 | 7/23/2018 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62535898 | Jul 2017 | US |