Embodiments of the present disclosure generally relate to the field of audio signal processing and, more particularly, to crosstalk interference reduction and spatial enhancement.
Stereophonic sound reproduction involves encoding and reproducing signals that contain the spatial properties of a sound field. Stereophonic sound enables a listener to perceive a sense of space in the reproduced sound field.
An audio processing system adaptively produces two or more output channels with enhanced spatial detectability and reduced crosstalk interference, based on parameters of the speakers and the listener's position relative to the speakers. The audio processing system applies a two-channel input audio signal to multiple audio processing pipelines that adaptively control how a listener perceives (i) the extent to which the rendered sound field expands beyond the physical boundaries of the speakers and (ii) the location and intensity of sound components within the expanded sound field. The audio processing pipelines include a sound field enhancement processing pipeline and a crosstalk cancellation processing pipeline for processing the two-channel input audio signal (e.g., an audio signal for a left channel speaker and an audio signal for a right channel speaker).
In one embodiment, the sound field enhancement processing pipeline preprocesses the input audio signal, prior to crosstalk cancellation processing, to extract spatial and nonspatial components. The preprocessing adjusts the intensity and balance of the energy in the spatial and nonspatial components of the input audio signal. The spatial component corresponds to a non-correlated portion between the two channels (a “side component”), while the nonspatial component corresponds to a correlated portion between the two channels (a “mid component”). The sound field enhancement processing pipeline also enables control of the timbral and spectral characteristics of the spatial and nonspatial components of the input audio signal.
In one aspect of the disclosed embodiments, the sound field enhancement processing pipeline performs a subband spatial enhancement on the input audio signal by dividing each channel of the input audio signal into different frequency subbands and extracting the spatial and nonspatial components in each frequency subband. The sound field enhancement processing pipeline then independently adjusts the energy in one or more of the spatial or nonspatial components in each frequency subband, and adjusts the spectral characteristics of one or more of the spatial and nonspatial components. By dividing the input audio signal into different frequency subbands and by adjusting the energy of a spatial component with respect to a nonspatial component for each frequency subband, the subband spatially enhanced audio signal achieves better spatial localization when reproduced by the speakers. Adjusting the energy of the spatial component with respect to the nonspatial component may be performed by adjusting the spatial component by a first gain coefficient, the nonspatial component by a second gain coefficient, or both.
In one aspect of the disclosed embodiments, the crosstalk cancellation processing pipeline performs crosstalk cancellation on the subband spatially enhanced audio signal output from the sound field enhancement processing pipeline. A signal component (e.g., 118L, 118R) output by a speaker on the same side of the listener's head and received by the listener's ear on that side is herein referred to as “an ipsilateral sound component” (e.g., a left channel signal component received at the left ear, and a right channel signal component received at the right ear). A signal component (e.g., 112L, 112R) output by a speaker on the opposite side of the listener's head is herein referred to as “a contralateral sound component” (e.g., a left channel signal component received at the right ear, and a right channel signal component received at the left ear). Contralateral sound components contribute to crosstalk interference, which results in diminished perception of spatiality. The crosstalk cancellation processing pipeline predicts the contralateral sound components and identifies signal components of the input audio signal contributing to the contralateral sound components. The crosstalk cancellation processing pipeline then modifies each channel of the subband spatially enhanced audio signal by adding an inverse of the identified signal components of a channel to the other channel of the subband spatially enhanced audio signal to generate an output audio signal for reproducing sound. As a result, the disclosed system can reduce the contralateral sound components that contribute to crosstalk interference, and improve the perceived spatiality of the output sound.
In one aspect of the disclosed embodiments, an output audio signal is obtained by adaptively processing the input audio signal through the sound field enhancement processing pipeline and subsequently through the crosstalk cancellation processing pipeline, according to parameters of the speakers and their position relative to the listener. Examples of the parameters of the speakers include a distance between the listener and a speaker and an angle formed by two speakers with respect to the listener. Additional parameters include the frequency response of the speakers, and may include other parameters that can be measured in real time, prior to, or during the pipeline processing. The crosstalk cancellation process is performed using the parameters. For example, a cut-off frequency, delay, and gain associated with the crosstalk cancellation can be determined as a function of the parameters of the speakers. Furthermore, any spectral defects due to the corresponding crosstalk cancellation associated with the parameters of the speakers can be estimated. Moreover, a corresponding crosstalk compensation to compensate for the estimated spectral defects can be performed for one or more subbands through the sound field enhancement processing pipeline.
Accordingly, the sound field enhancement processing, such as the subband spatial enhancement processing and the crosstalk compensation, improves the overall perceived effectiveness of the subsequent crosstalk cancellation processing. As a result, the listener can perceive that the sound is directed to the listener from a large area rather than from specific points in space corresponding to the locations of the speakers, thereby producing a more immersive listening experience for the listener.
The features and advantages described in the specification are not all inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter.
The Figures (FIG.) and the following description relate to the preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of the present invention.
Reference will now be made in detail to several embodiments of the present invention(s), examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. The figures depict embodiments for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
Example Audio Processing System
In one embodiment, the audio processing system 220 includes a sound field enhancement processing pipeline 210, a crosstalk cancellation processing pipeline 270, and a speaker configuration detector 202. The components of the audio processing system 220 may be implemented in electronic circuits. For example, a hardware component may comprise dedicated circuitry or logic that is configured (e.g., as a special purpose processor, such as a digital signal processor (DSP), field programmable gate array (FPGA) or an application specific integrated circuit (ASIC)) to perform certain operations disclosed herein.
The speaker configuration detector 202 determines parameters 204 of the speakers 280. Examples of the parameters of the speakers include a number of speakers, a distance between the listener and a speaker, the subtended listening angle formed by two speakers with respect to the listener (“speaker angle”), the frequency response of the speakers, cutoff frequencies, and other quantities that can be predefined or measured in real time. The speaker configuration detector 202 may obtain information describing a speaker type (e.g., a built-in speaker in a phone, a built-in speaker of a personal computer, a portable speaker, a boom box, etc.) from a user input or a system input (e.g., a headphone jack detection event), and determine the parameters of the speakers according to the type or the model of the speakers 280. Alternatively, the speaker configuration detector 202 can output test signals to each of the speakers 280 and use a built-in microphone (not shown) to sample the speaker outputs. From each sampled output, the speaker configuration detector 202 can determine the speaker distance and response characteristics. The speaker angle can be provided by the user (e.g., the listener 120 or another person), either by selection of an angle amount or based on the speaker type. Alternatively or additionally, the speaker angle can be determined by interpreting captured user- or system-generated sensor data, such as microphone signal analysis; computer vision analysis of an image taken of the speakers (e.g., using the focal distance to estimate the intra-speaker distance, and then taking the arctangent of the ratio of one-half of the intra-speaker distance to the focal distance to obtain the half-speaker angle); or system-integrated gyroscope or accelerometer data.

The sound field enhancement processing pipeline 210 receives the input audio signal X, and performs sound field enhancement on the input audio signal X to generate a precompensated signal T comprising channels TL and TR. The sound field enhancement processing pipeline 210 performs sound field enhancement using a subband spatial enhancement, and may use the parameters 204 of the speakers 280. In particular, the sound field enhancement processing pipeline 210 adaptively (i) performs subband spatial enhancement on the input audio signal X to enhance spatial information of the input audio signal X for one or more frequency subbands, and (ii) performs crosstalk compensation to compensate for any spectral defects due to the subsequent crosstalk cancellation by the crosstalk cancellation processing pipeline 270, according to the parameters of the speakers 280. Detailed implementations and operations of the sound field enhancement processing pipeline 210 are provided below.
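The image-based speaker angle estimate described above reduces to simple trigonometry. The following is a minimal sketch, assuming illustrative function names and example values that are not part of the disclosure:

```python
import math

def half_speaker_angle_deg(intra_speaker_distance: float, focal_distance: float) -> float:
    """Half of the speaker angle: the arctangent of the ratio of one-half of
    the intra-speaker distance to the focal distance, as described above."""
    return math.degrees(math.atan((intra_speaker_distance / 2.0) / focal_distance))

# Hypothetical example: speakers estimated 0.6 m apart, focal distance ~1.0 m
speaker_angle = 2.0 * half_speaker_angle_deg(0.6, 1.0)  # roughly 33 degrees
```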
The crosstalk cancellation processing pipeline 270 receives the precompensated signal T, and performs a crosstalk cancellation on the precompensated signal T to generate the output signal O. The crosstalk cancellation processing pipeline 270 may adaptively perform crosstalk cancellation according to the parameters 204. Detailed implementations and operations of the crosstalk cancellation processing pipeline 270 are provided below.
In one embodiment, configurations (e.g., center or cutoff frequencies, quality factor (Q), gain, delay, etc.) of the sound field enhancement processing pipeline 210 and the crosstalk cancellation processing pipeline 270 are determined according to the parameters 204 of the speakers 280. In one aspect, different configurations of the sound field enhancement processing pipeline 210 and the crosstalk cancellation processing pipeline 270 may be stored as one or more look up tables, which can be accessed according to the speaker parameters 204. Configurations based on the speaker parameters 204 can be identified through the one or more look up tables, and applied for performing the sound field enhancement and the crosstalk cancellation.
In one embodiment, configurations of the sound field enhancement processing pipeline 210 may be identified through a first look up table describing an association between the speaker parameters 204 and corresponding configurations of the sound field enhancement processing pipeline 210. For example, if the speaker parameters 204 specify a listening angle (or range) and further specify a type of speakers (or a frequency response range, e.g., 350 Hz to 12 kHz for portable speakers), configurations of the sound field enhancement processing pipeline 210 may be determined through the first look up table. The first look up table may be generated by simulating spectral artifacts of the crosstalk cancellation under various settings (e.g., varying cut off frequencies, gain, or delay for performing crosstalk cancellation), and predetermining settings of the sound field enhancement to compensate for the corresponding spectral artifacts. Moreover, the speaker parameters 204 can be mapped to configurations of the sound field enhancement processing pipeline 210 according to the crosstalk cancellation. For example, configurations of the sound field enhancement processing pipeline 210 to correct spectral artifacts of a particular crosstalk cancellation may be stored in the first look up table for the speakers 280 associated with that crosstalk cancellation.
In one embodiment, configurations of the crosstalk cancellation processing pipeline 270 are identified through a second look up table describing an association between various speaker parameters 204 and corresponding configurations (e.g., cut off frequency, center frequency, Q, gain, and delay) of the crosstalk cancellation processing pipeline 270. For example, if the speakers 280 of a particular type (e.g., portable speakers) are arranged at a particular angle, configurations of the crosstalk cancellation processing pipeline 270 for performing crosstalk cancellation for the speakers 280 may be determined through the second look up table. The second look up table may be generated through empirical experiments by testing sound generated under various settings (e.g., distance, angle, etc.) of various speakers 280.
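For illustration, the look-up-table scheme may be sketched as follows; the keys (speaker type and speaker angle) and all configuration values are hypothetical placeholders, not values from the disclosure:

```python
# Hypothetical second look up table: (speaker type, speaker angle in degrees)
# -> crosstalk cancellation configuration (center frequency, Q, gain, delay).
CROSSTALK_CANCELLATION_LUT = {
    ("portable", 30): {"center_hz": 7500.0, "q": 0.7, "gain_db": -4.5, "delay_samples": 6},
    ("portable", 45): {"center_hz": 7000.0, "q": 0.6, "gain_db": -5.0, "delay_samples": 8},
}

def lookup_config(speaker_type: str, speaker_angle: float) -> dict:
    """Return the configuration tabulated for the nearest speaker angle."""
    candidates = [key for key in CROSSTALK_CANCELLATION_LUT if key[0] == speaker_type]
    nearest = min(candidates, key=lambda key: abs(key[1] - speaker_angle))
    return CROSSTALK_CANCELLATION_LUT[nearest]
```

In practice, as noted below, configurations for speaker angles between tabulated entries may be interpolated rather than snapped to the nearest entry.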
The subband spatial audio processor 230 receives 370 the input audio signal X comprising two channels, such as left channel XL and right channel XR, and performs 372 a subband spatial enhancement on the input audio signal X to generate a spatially enhanced audio signal Y comprising two channels, such as left channel YL and right channel YR. In one embodiment, the subband spatial enhancement includes applying the left channel XL and right channel XR to a crossover network that divides each channel of the input audio signal X into different input subband signals X(k). The crossover network comprises multiple filters arranged in various circuit topologies, as discussed below with reference to the frequency band divider 410.
The crosstalk compensation processor 240 performs 374 a crosstalk compensation to compensate for artifacts resulting from a crosstalk cancellation. These artifacts, resulting primarily from the summation of the delayed and inverted contralateral sound components with their corresponding ipsilateral sound components in the crosstalk cancellation processor 260, introduce a comb filter-like frequency response into the final rendered result. Based on the specific delay, amplification, or filtering applied in the crosstalk cancellation processor 260, the amount and characteristics (e.g., center frequency, gain, and Q) of sub-Nyquist comb filter peaks and troughs shift up and down in the frequency response, causing variable amplification and/or attenuation of energy in specific regions of the spectrum. The crosstalk compensation may be performed as a preprocessing step by delaying or amplifying, for a given parameter of the speakers 280, the input audio signal X for a particular frequency band, prior to the crosstalk cancellation performed by the crosstalk cancellation processor 260. In one implementation, the crosstalk compensation is performed on the input audio signal X to generate a crosstalk compensation signal Z, in parallel with the subband spatial enhancement performed by the subband spatial audio processor 230. In this implementation, the combiner 250 combines 376 the crosstalk compensation signal Z with each of the two channels YL and YR to generate a precompensated signal T comprising two precompensated channels TL and TR. Alternatively, the crosstalk compensation may be performed sequentially after the subband spatial enhancement, performed after the crosstalk cancellation, or integrated with the subband spatial enhancement. Details of the crosstalk compensation are described below.
The crosstalk cancellation processor 260 performs 378 a crosstalk cancellation to generate output channels OL and OR. More particularly, the crosstalk cancellation processor 260 receives the precompensated channels TL and TR from the combiner 250, and performs a crosstalk cancellation on the precompensated channels TL and TR to generate the output channels OL and OR. For a channel (L/R), the crosstalk cancellation processor 260 estimates a contralateral sound component due to the precompensated channel T(L/R) and identifies a portion of the precompensated channel T(L/R) contributing to the contralateral sound component according to the speaker parameters 204. The crosstalk cancellation processor 260 adds an inverse of the identified portion of the precompensated channel T(L/R) to the other precompensated channel T(R/L) to generate the output channel O(R/L). In this configuration, a wavefront of an ipsilateral sound component output by the speaker 280(R/L) according to the output channel O(R/L), arriving at an ear 125(R/L), can cancel a wavefront of a contralateral sound component output by the other speaker 280(L/R) according to the output channel O(L/R), thereby effectively removing the contralateral sound component due to the output channel O(L/R). Alternatively, the crosstalk cancellation processor 260 may perform the crosstalk cancellation on the spatially enhanced audio signal Y from the subband spatial audio processor 230, or on the input audio signal X, instead. Details of the crosstalk cancellation are described below.
In one configuration, the frequency band divider 410, or filterbank, is a crossover network that includes multiple filters arranged in any of various circuit topologies, such as serial, parallel, or derived. Example filter types included in the crossover network include infinite impulse response (IIR) or finite impulse response (FIR) bandpass filters, IIR peaking and shelving filters, Linkwitz-Riley filters, or other filter types known to those of ordinary skill in the audio signal processing art. The filters divide the left input channel XL into left subband components XL(k), and divide the right input channel XR into right subband components XR(k), for each frequency subband k. In one approach, four bandpass filters, or any combination of low pass, bandpass, and high pass filters, are employed to approximate the critical bands of the human ear. A critical band corresponds to the bandwidth within which a second tone is able to mask an existing primary tone. For example, each of the frequency subbands may correspond to a consolidated Bark scale to mimic the critical bands of human hearing. For example, the frequency band divider 410 divides the left input channel XL into four left subband components XL(k), corresponding to 0 to 300 Hz, 300 to 510 Hz, 510 to 2700 Hz, and 2700 Hz to the Nyquist frequency, respectively, and similarly divides the right input channel XR into the right subband components XR(k) for the corresponding frequency bands. The process of determining a consolidated set of critical bands includes using a corpus of audio samples from a wide variety of musical genres, and determining from the samples a long term average energy ratio of mid to side components over the 24 Bark scale critical bands. Contiguous frequency bands with similar long term average ratios are then grouped together to form the set of critical bands. In other implementations, the filters separate the left and right input channels into fewer or greater than four subbands. The range of frequency bands may be adjustable. The frequency band divider 410 outputs a pair of a left subband component XL(k) and a right subband component XR(k) to a corresponding L/R to M/S converter 420(k).
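As a concrete illustration of such a crossover, the sketch below splits one channel into the four example bands using Butterworth filters; the filter order and types are assumptions for the sketch (a real crossover might instead use Linkwitz-Riley or peaking/shelving filters, as noted above):

```python
from scipy.signal import butter, sosfilt

def split_subbands(x, fs, edges=(300.0, 510.0, 2700.0)):
    """Divide one channel (a NumPy array) into subbands approximating the
    example bands above: 0-300 Hz, 300-510 Hz, 510-2700 Hz, and 2700 Hz
    to the Nyquist frequency. Returns the list of subband signals X(k)."""
    nyq = fs / 2.0
    bounds = [0.0] + list(edges) + [nyq]
    bands = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        if lo <= 0.0:                  # bottom band: low pass
            sos = butter(4, hi / nyq, btype="low", output="sos")
        elif hi >= nyq:                # top band: high pass up to Nyquist
            sos = butter(4, lo / nyq, btype="high", output="sos")
        else:                          # interior bands: bandpass
            sos = butter(4, [lo / nyq, hi / nyq], btype="band", output="sos")
        bands.append(sosfilt(sos, x))
    return bands
```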
An L/R to M/S converter 420(k), a mid/side processor 430(k), and an M/S to L/R converter 440(k) in each frequency subband k operate together to enhance a spatial subband component Xs(k) (also referred to as “a side subband component”) with respect to a nonspatial subband component Xn(k) (also referred to as “a mid subband component”) in the respective frequency subband k. Specifically, each L/R to M/S converter 420(k) receives a pair of subband components XL(k), XR(k) for a given frequency subband k, and converts these inputs into a mid subband component and a side subband component. In one embodiment, the nonspatial subband component Xn(k) corresponds to a correlated portion between the left subband component XL(k) and the right subband component XR(k), and hence includes nonspatial information. Moreover, the spatial subband component Xs(k) corresponds to a non-correlated portion between the left subband component XL(k) and the right subband component XR(k), and hence includes spatial information. The nonspatial subband component Xn(k) may be computed as a sum of the left subband component XL(k) and the right subband component XR(k), and the spatial subband component Xs(k) may be computed as a difference between the left subband component XL(k) and the right subband component XR(k). In one example, the L/R to M/S converter 420 obtains the spatial subband component Xs(k) and the nonspatial subband component Xn(k) of the frequency subband according to the following equations:
Xs(k) = XL(k) − XR(k) for subband k    Eq. (1)

Xn(k) = XL(k) + XR(k) for subband k    Eq. (2)
Each mid/side processor 430(k) enhances the received spatial subband component Xs(k) with respect to the received nonspatial subband component Xn(k) to generate an enhanced spatial subband component Ys(k) and an enhanced nonspatial subband component Yn(k) for a subband k. In one embodiment, the mid/side processor 430(k) adjusts the nonspatial subband component Xn(k) by a corresponding gain coefficient Gn(k), and delays the amplified nonspatial subband component Gn(k)*Xn(k) by a corresponding delay function D[ ] to generate the enhanced nonspatial subband component Yn(k). Similarly, the mid/side processor 430(k) adjusts the received spatial subband component Xs(k) by a corresponding gain coefficient Gs(k), and delays the amplified spatial subband component Gs(k)*Xs(k) by a corresponding delay function D to generate the enhanced spatial subband component Ys(k). The gain coefficients and the delay amounts may be adjustable, and may be determined according to the speaker parameters 204 or fixed for an assumed set of parameter values. Each mid/side processor 430(k) outputs the enhanced nonspatial subband component Yn(k) and the enhanced spatial subband component Ys(k) to a corresponding M/S to L/R converter 440(k) of the respective frequency subband k. The mid/side processor 430(k) of a frequency subband k generates the enhanced nonspatial subband component Yn(k) and the enhanced spatial subband component Ys(k) according to the following equations:
Yn(k) = Gn(k) * D[Xn(k), k] for subband k    Eq. (3)

Ys(k) = Gs(k) * D[Xs(k), k] for subband k    Eq. (4)
Examples of gain and delay coefficients are listed in the following Table 1.
Each M/S to L/R converter 440(k) receives an enhanced nonspatial component Yn(k) and an enhanced spatial component Ys(k), and converts them into an enhanced left subband component YL(k) and an enhanced right subband component YR(k). Assuming that an L/R to M/S converter 420(k) generates the nonspatial subband component Xn(k) and the spatial subband component Xs(k) according to Eq. (1) and Eq. (2) above, the M/S to L/R converter 440(k) generates the enhanced left subband component YL(k) and the enhanced right subband component YR(k) of the frequency subband k according to the following equations:
YL(k) = (Yn(k) + Ys(k)) / 2 for subband k    Eq. (5)

YR(k) = (Yn(k) − Ys(k)) / 2 for subband k    Eq. (6)
In one embodiment, XL(k) and XR(k) in Eq. (1) and Eq. (2) may be swapped, in which case YL(k) and YR(k) in Eq. (5) and Eq. (6) are swapped as well.
The frequency band combiner 450 combines the enhanced left subband components in the different frequency bands from the M/S to L/R converters 440 to generate the left spatially enhanced audio channel YL, and combines the enhanced right subband components in the different frequency bands from the M/S to L/R converters 440 to generate the right spatially enhanced audio channel YR, according to the following equations:
YL = Σk YL(k)    Eq. (7)

YR = Σk YR(k)    Eq. (8)
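Taken together, Eqs. (1) through (8) amount to a few lines of arithmetic per subband. The following is a minimal sketch, assuming caller-supplied gains and delays (the values in Table 1 are not reproduced here) and modeling the delay function D[ ] crudely with np.roll:

```python
import numpy as np

def enhance_subband(xl_k, xr_k, gs_k, gn_k, ds_k=0, dn_k=0):
    """Mid/side enhancement of one frequency subband k per Eqs. (1)-(6)."""
    xs = xl_k - xr_k                  # Eq. (1): spatial (side) component
    xn = xl_k + xr_k                  # Eq. (2): nonspatial (mid) component
    yn = gn_k * np.roll(xn, dn_k)     # Eq. (3): gain Gn(k) and delay D[.]
    ys = gs_k * np.roll(xs, ds_k)     # Eq. (4): gain Gs(k) and delay D[.]
    yl_k = (yn + ys) / 2.0            # Eq. (5)
    yr_k = (yn - ys) / 2.0            # Eq. (6)
    return yl_k, yr_k

def enhance_channels(subbands_l, subbands_r, gains_s, gains_n):
    """Eqs. (7)-(8): sum the enhanced subbands into channels YL, YR."""
    pairs = [enhance_subband(xl, xr, gs, gn)
             for xl, xr, gs, gn in zip(subbands_l, subbands_r, gains_s, gains_n)]
    return sum(p[0] for p in pairs), sum(p[1] for p in pairs)
```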
The subband spatial audio processor 230 receives an input signal comprising input channels XL, XR. The subband spatial audio processor 230 divides 510 the input channel XL into k subband components (e.g., k = 4), e.g., XL(1), XL(2), XL(3), XL(4), and divides the input channel XR into subband components, e.g., XR(1), XR(2), XR(3), XR(4), according to k frequency subbands, e.g., subbands encompassing 0 to 300 Hz, 300 to 510 Hz, 510 to 2700 Hz, and 2700 Hz to the Nyquist frequency, respectively.
The subband spatial audio processor 230 performs subband spatial enhancement on the subband components for each frequency subband k. Specifically, the subband spatial audio processor 230 generates 515, for each subband k, a spatial subband component Xs(k) and a nonspatial subband component Xn(k) based on subband components XL(k), XR(k), for example, according to Eq. (1) and Eq. (2) above. In addition, the subband spatial audio processor 230 generates 520, for the subband k, an enhanced spatial component Ys(k) and an enhanced nonspatial component Yn(k) based on the spatial subband component Xs(k) and nonspatial subband component Xn(k), for example, according to Eq. (3) and Eq. (4) above. Moreover, the subband spatial audio processor 230 generates 525, for the subband k, enhanced subband components YL(k), YR(k) based on the enhanced spatial component Ys(k) and the enhanced nonspatial component Yn(k), for example, according to Eq. (5) and Eq. (6) above.
The subband spatial audio processor 230 generates 530 a spatially enhanced channel YL by combining all enhanced subband components YL(k) and generates a spatially enhanced channel YR by combining all enhanced subband components YR(k).
The L&R combiner 610 receives the left input audio channel XL and the right input audio channel XR, and generates a nonspatial component Xn of the input channels XL, XR. In one aspect of the disclosed embodiments, the nonspatial component Xn corresponds to a correlated portion between the left input channel XL and the right input channel XR. The L&R combiner 610 may add the left input channel XL and the right input channel XR to generate the correlated portion, which corresponds to the nonspatial component Xn of the input audio channels XL, XR as shown in the following equation:
Xn = XL + XR    Eq. (9)
The nonspatial component processor 620 receives the nonspatial component Xn, and performs the nonspatial enhancement on the nonspatial component Xn to generate the crosstalk compensation signal Z. In one aspect of the disclosed embodiments, the nonspatial component processor 620 performs a preprocessing on the nonspatial component Xn of the input channels XL, XR to compensate for any artifacts of a subsequent crosstalk cancellation. A frequency response plot of the nonspatial signal component of a subsequent crosstalk cancellation can be obtained through simulation. In addition, by analyzing the frequency response plot, any spectral defects, such as peaks or troughs in the frequency response plot exceeding a predetermined threshold (e.g., 10 dB), occurring as artifacts of the crosstalk cancellation can be estimated. These artifacts result primarily from the summation of the delayed and inverted contralateral signals with their corresponding ipsilateral signals in the crosstalk cancellation processor 260, effectively introducing a comb filter-like frequency response into the final rendered result. The crosstalk compensation signal Z can be generated by the nonspatial component processor 620 to compensate for the estimated peaks or troughs. Specifically, based on the specific delay, filtering frequency, and gain applied in the crosstalk cancellation processor 260, peaks and troughs shift up and down in the frequency response, causing variable amplification and/or attenuation of energy in specific regions of the spectrum.
In one implementation, the nonspatial component processor 620 includes an amplifier 660, a filter 670, and a delay unit 680 to generate the crosstalk compensation signal Z to compensate for the estimated spectral defects of the crosstalk cancellation. In one example implementation, the amplifier 660 amplifies the nonspatial component Xn by a gain coefficient Gn, and the filter 670 applies a second-order peaking EQ filter F[ ] to the amplified nonspatial component Gn*Xn. The output of the filter 670 may be delayed by the delay unit 680 according to a delay function D. The filter, amplifier, and delay unit may be arranged in cascade in any sequence, and may be implemented with adjustable configurations (e.g., center frequency, cut off frequency, gain coefficient, delay amount, etc.). In one example, the nonspatial component processor 620 generates the crosstalk compensation signal Z according to the equation below:
Z = D[F[Gn * Xn]]    Eq. (10)
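Eq. (10) may be sketched as follows, using a standard (RBJ-style) second-order peaking EQ biquad as one plausible realization of F[ ]; the center frequency, Q, gain, and delay defaults below are illustrative placeholders, since the actual values depend on the speaker parameters 204:

```python
import numpy as np
from scipy.signal import lfilter

def peaking_eq(x, fs, f0, q, gain_db):
    """Second-order peaking EQ (a plausible form of filter 670)."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin])
    a = np.array([1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin])
    return lfilter(b / a[0], a / a[0], x)

def crosstalk_compensation(xl, xr, fs, gn=1.0, f0=700.0, q=1.0,
                           gain_db=6.0, delay_samples=0):
    """Z = D[F[Gn * Xn]] per Eqs. (9)-(10); delay modeled with np.roll."""
    xn = xl + xr                                   # Eq. (9): amplifier input
    z = peaking_eq(gn * xn, fs, f0, q, gain_db)    # F[Gn * Xn]
    return np.roll(z, delay_samples)               # Eq. (10)
```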
As described above, configurations of the crosstalk compensation (e.g., of the amplifier 660, filter 670, and delay unit 680) may be determined according to the speaker parameters 204.
In one example, for a particular type of speakers (e.g., small/portable speakers or large speakers), the filter center frequency, filter gain, and quality factor of the filter 670 can be determined according to the angle formed between the two speakers 280 with respect to the listener. In some embodiments, configurations for speaker angles between tabulated values are obtained by interpolation.
In some embodiments, the nonspatial component processor 620 may be integrated into the subband spatial audio processor 230 (e.g., into the mid/side processor 430) to compensate for spectral artifacts of a subsequent crosstalk cancellation for one or more frequency subbands.
The crosstalk compensation processor 240 receives an input audio signal comprising input channels XL and XR. The crosstalk compensation processor 240 generates 710 a nonspatial component Xn of the input channels XL and XR, for example, according to Eq. (9) above.
The crosstalk compensation processor 240 determines 720 configurations (e.g., filter parameters) for performing crosstalk compensation as described above, and generates the crosstalk compensation signal Z according to the determined configurations, for example, per Eq. (10) above.
By dividing the input audio signal T into different frequency band components and by performing crosstalk cancellation on selective components (e.g., inband components), crosstalk cancellation can be performed for a particular frequency band while avoiding degradation in other frequency bands. If crosstalk cancellation is performed without dividing the input audio signal T into different frequency bands, the audio signal after such crosstalk cancellation may exhibit significant attenuation or amplification of the nonspatial and spatial components at low frequencies (e.g., below 350 Hz), at high frequencies (e.g., above 12000 Hz), or both. By selectively performing crosstalk cancellation on the inband (e.g., between 250 Hz and 14000 Hz), where the vast majority of impactful spatial cues reside, a balanced overall energy across the spectrum, particularly in the nonspatial component, can be retained in the mix.
In one configuration, the frequency band divider 810 or a filterbank divides the input channels TL, TR into inband channels TL,In, TR,In and out of band channels TL,Out, TR,Out, respectively. Particularly, the frequency band divider 810 divides the left input channel TL into a left inband channel TL,In and a left out of band channel TL,Out. Similarly, the frequency band divider 810 divides the right input channel TR into a right inband channel TR,In and a right out of band channel TR,Out. Each inband channel may encompass a portion of a respective input channel corresponding to a frequency range including, for example, 250 Hz to 14 kHz. The range of frequency bands may be adjustable, for example according to speaker parameters 204.
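A minimal sketch of this split, assuming the example 250 Hz to 14 kHz inband range; the out-of-band remainder is approximated here by subtraction, which is a simplification of a true complementary crossover:

```python
from scipy.signal import butter, sosfilt

def split_inband(t, fs, lo=250.0, hi=14000.0):
    """Split one channel T (a NumPy array) into an inband channel and an
    out-of-band remainder, e.g., TL -> (TL,In, TL,Out)."""
    nyq = fs / 2.0
    sos = butter(4, [lo / nyq, hi / nyq], btype="band", output="sos")
    t_in = sosfilt(sos, t)
    t_out = t - t_in  # residual low- and high-frequency content
    return t_in, t_out
```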
The inverter 820A and the contralateral estimator 825A operate together to generate a contralateral cancellation component SL to compensate for a contralateral sound component due to the left inband channel TL,In. Similarly, the inverter 820B and the contralateral estimator 825B operate together to generate a contralateral cancellation component SR to compensate for a contralateral sound component due to the right inband channel TR,In.
In one approach, the inverter 820A receives the inband channel TL,In and inverts a polarity of the received inband channel TL,In to generate an inverted inband channel TL,In′. The contralateral estimator 825A receives the inverted inband channel TL,In′, and extracts a portion of the inverted inband channel TL,In′ corresponding to a contralateral sound component through filtering. Because the filtering is performed on the inverted inband channel TL,In′, the portion extracted by the contralateral estimator 825A is an inverse of the portion of the inband channel TL,In contributing to the contralateral sound component. Hence, the portion extracted by the contralateral estimator 825A becomes a contralateral cancellation component SL, which can be added to the counterpart inband channel TR,In to reduce the contralateral sound component due to the inband channel TL,In. In some embodiments, the inverter 820A and the contralateral estimator 825A are implemented in a different sequence.
The inverter 820B and the contralateral estimator 825B perform similar operations with respect to the inband channel TR,In to generate the contralateral cancellation component SR. Therefore, detailed description thereof is omitted herein for the sake of brevity.
In one example implementation, the contralateral estimator 825A includes a filter 852A, an amplifier 854A, and a delay unit 856A. The filter 852A receives the inverted inband channel TL,In′ and extracts a portion of the inverted inband channel TL,In′ corresponding to a contralateral sound component through a filtering function F. An example filter implementation is a notch or highshelf filter with a center frequency selected between 5000 and 10000 Hz and a Q selected between 0.5 and 1.0. The gain in decibels (GdB) may be derived from the following formula:
GdB = −3.0 − log1.333(D)    Eq. (11)
where D is the delay amount of the delay units 856A/B in samples, for example, at a sampling rate of 48 kHz. An alternate implementation is a lowpass filter with a corner frequency selected between 5000 and 10000 Hz and a Q selected between 0.5 and 1.0. Moreover, the amplifier 854A amplifies the extracted portion by a corresponding gain coefficient GL,In, and the delay unit 856A delays the amplified output from the amplifier 854A according to a delay function D to generate the contralateral cancellation component SL. The contralateral estimator 825B performs similar operations on the inverted inband channel TR,In′ to generate the contralateral cancellation component SR. In one example, the contralateral estimators 825A, 825B generate the contralateral cancellation components SL, SR according to the equations below:
SL = D[GL,In * F[TL,In′]]    Eq. (12)

SR = D[GR,In * F[TR,In′]]    Eq. (13)
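The estimator path of Eqs. (11) through (13) may be sketched as follows; the filter is passed in as a callable (e.g., a notch, highshelf, or lowpass filter as described above), the delay is again modeled with np.roll, and the names are illustrative assumptions:

```python
import math
import numpy as np

def contralateral_gain_db(delay_samples: int) -> float:
    """Eq. (11): GdB = -3.0 - log_1.333(D), with D >= 1 in samples
    (e.g., at a 48 kHz sampling rate)."""
    return -3.0 - math.log(delay_samples, 1.333)

def cancellation_component(t_in, delay_samples, filt):
    """S = D[G * F[T_in']] per Eqs. (12)-(13): invert, filter, amplify, delay."""
    inverted = -t_in                                  # inverter 820A/B
    extracted = filt(inverted)                        # filter 852A/B: F[.]
    gain = 10.0 ** (contralateral_gain_db(delay_samples) / 20.0)
    return np.roll(gain * extracted, delay_samples)   # amplifier 854 + delay 856
```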
As described above, configurations of the crosstalk cancellation (e.g., of the contralateral estimators 825A, 825B) may be determined according to the speaker parameters 204.
In one example, the filter center frequency, delay amount, amplifier gain, and filter gain can be determined according to the angle formed between the two speakers 280 with respect to the listener. In some embodiments, configurations for speaker angles between tabulated values are obtained by interpolation.
The combiner 830A adds the contralateral cancellation component SR to the left inband channel TL,In to generate a left inband compensated channel CL, and the combiner 830B adds the contralateral cancellation component SL to the right inband channel TR,In to generate a right inband compensated channel CR. The frequency band combiner 840 combines the inband compensated channels CL, CR with the out of band channels TL,Out, TR,Out to generate the output audio channels OL, OR, respectively.
Accordingly, the output audio channel OL includes the contralateral cancellation component SR, corresponding to an inverse of the portion of the inband channel TR,In contributing to the contralateral sound, and the output audio channel OR includes the contralateral cancellation component SL, corresponding to an inverse of the portion of the inband channel TL,In contributing to the contralateral sound. In this configuration, a wavefront of an ipsilateral sound component output by the speaker 280R according to the output channel OR, arriving at the right ear, can cancel a wavefront of a contralateral sound component output by the speaker 280L according to the output channel OL. Similarly, a wavefront of an ipsilateral sound component output by the speaker 280L according to the output channel OL, arriving at the left ear, can cancel a wavefront of a contralateral sound component output by the speaker 280R according to the output channel OR. Thus, contralateral sound components can be reduced to enhance spatial detectability.
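With the cancellation components in hand, the combiners reduce to additions; a minimal sketch using the naming from the text:

```python
def combine_outputs(t_l_in, t_r_in, t_l_out, t_r_out, s_l, s_r):
    """Combiners 830A/B and frequency band combiner 840 (NumPy arrays)."""
    c_l = t_l_in + s_r                    # left inband compensated channel CL
    c_r = t_r_in + s_l                    # right inband compensated channel CR
    return c_l + t_l_out, c_r + t_r_out   # output channels OL, OR
```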
The crosstalk cancellation processor 260 receives an input signal comprising input channels TL, TR. The input signal may be the output TL, TR from the combiner 250. The crosstalk cancellation processor 260 divides 910 the input channel TL into an inband channel TL,In and an out of band channel TL,Out. Similarly, the crosstalk cancellation processor 260 divides 915 the input channel TR into an inband channel TR,In and an out of band channel TR,Out. The input channels TL, TR may be divided into the inband channels and the out of band channels by the frequency band divider 810, as described above.
The crosstalk cancellation processor 260 generates 925 a crosstalk cancellation component SL based on a portion of the inband channel TL,In contributing to a contralateral sound component, for example, according to Table 4 and Eq. (12) above. Similarly, the crosstalk cancellation processor 260 generates 935 a crosstalk cancellation component SR based on a portion of the inband channel TR,In contributing to a contralateral sound component, for example, according to Table 4 and Eq. (13).
The crosstalk cancellation processor 260 generates an output audio channel OL by combining 940 the inband channel TL,In, crosstalk cancellation component SR, and out of band channel TL,Out. Similarly, the crosstalk cancellation processor 260 generates an output audio channel OR by combining 945 the inband channel TR,In, crosstalk cancellation component SL, and out of band channel TR,Out.
The output channels OL, OR can be provided to respective speakers to reproduce stereo sound with reduced crosstalk and improved spatial detectability.
Upon reading this disclosure, those of skill in the art will appreciate still additional alternative embodiments through the disclosed principles herein. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the scope described herein.
Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer readable medium (e.g., non-transitory computer readable medium) containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.
This application is a divisional of U.S. patent application Ser. No. 15/409,278, entitled “Subband Spatial and Crosstalk Cancellation for Audio Reproduction,” filed on Jan. 18, 2017, which is a continuation of International Application No. PCT/US17/13061, entitled “Subband Spatial and Crosstalk Cancellation for Audio Reproduction,” filed Jan. 11, 2017, which claims priority under 35 U.S.C. § 119(e) from U.S. Provisional Patent Application No. 62/280,119, entitled “Sub-Band Spatial and Cross-Talk Cancellation Algorithm for Audio Reproduction,” filed on Jan. 18, 2016, and U.S. Provisional Patent Application No. 62/388,366, entitled “Sub-Band Spatial and Cross-Talk Cancellation Algorithm for Audio Reproduction,” filed on Jan. 29, 2016, all of which are incorporated by reference herein in their entirety.