Devices such as notebooks, desktop computers, mobile telephones, tablets, and other such devices may include speakers or utilize headphones to reproduce sound. The sound emitted from such devices may be subject to a variety of processes that modify the sound quality.
Features of the present disclosure are illustrated by way of example and not limited in the following figure(s), in which like numerals indicate like elements, in which:
For simplicity and illustrative purposes, the present disclosure is described by referring mainly to examples. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be readily apparent, however, that the present disclosure may be practiced without limitation to these specific details. In other instances, some methods and structures have not been described in detail so as not to unnecessarily obscure the present disclosure.
Throughout the present disclosure, the terms “a” and “an” are intended to denote at least one of a particular element. As used herein, the term “includes” means includes but not limited to, and the term “including” means including but not limited to. The term “based on” means based at least in part on.
Crosstalk cancellation for speaker-based spatial rendering apparatuses, methods for crosstalk cancellation for speaker-based spatial rendering, and non-transitory computer readable media having stored thereon machine readable instructions to provide crosstalk cancellation for speaker-based spatial rendering are disclosed herein. The apparatuses, methods, and non-transitory computer readable media disclosed herein provide for crosstalk cancellation based on perceptual smoothing of head-related transfer functions (HRTFs), insertion of an inter-aural time difference, and time-domain inversion of a regularized matrix determined from the perceptually smoothed HRTFs.
With respect to crosstalk cancellation, devices such as notebooks, desktop computers, mobile telephones, tablets, and other such devices may include speakers or utilize headphones to reproduce sound. Such devices may utilize a high-quality audio reproduction to create an immersive experience for cinematic and music content. The cinematic content may be multichannel (e.g., 5.1, 7.1, etc., where 5.1 represents “five point one” and includes a six channel surround sound audio system, 7.1 represents “seven point one” and includes an eight channel surround sound audio system, etc.). Elements that contribute towards a high-quality audio experience may include the frequency response (e.g., bass extension) of the speakers or drivers, and proper equalization to attain a desired spectral balance. Other elements that contribute towards a high-quality audio experience may include artifact-free loudness processing to accentuate masked signals and improve loudness, and spatial quality that reflects artistic intent for stereo music and multichannel cinematic content.
With respect to spatial rendering with speakers, crosstalk cancellation may provide for the reproduction of virtual sound sources at a listener's ears by inverting acoustic transfer paths. A crosstalk canceller (e.g., a crosstalk cancellation filter) may be updated in real time according to the head position of a listener, as the angles of the speakers relative to a center of the listener's head change with lateral head movements. Crosstalk cancellers may present technical challenges with respect to the introduction of artifacts in a rendering over the speakers. These artifacts may include frequency-domain-based artifacts (e.g., over-excursion of the speakers in the low and high frequencies, artifacts in the voice region, etc.), as well as temporal artifacts (e.g., metallic and reverberant sound processing).
In order to address at least these technical challenges associated with the introduction of artifacts, the apparatuses, methods, and non-transitory computer readable media disclosed herein provide for crosstalk cancellation that provides for a sense of relatively strong immersion with respect to sound and imperceptible artifacts. In this regard, the apparatuses, methods, and non-transitory computer readable media disclosed herein provide for crosstalk cancellation based on perceptual smoothing of the HRTFs, insertion of an inter-aural time difference, as well as constrained inversion of a cancellation matrix for crosstalk cancellation. An HRTF may be described as a response that characterizes how an ear receives a sound from a point in space.
For the apparatuses, methods, and non-transitory computer readable media disclosed herein, the perceptual smoothing provides for reduction of the effect of a “sweet-spot” caused by lateral head-movements of a listener. In this regard, the sweet-spot may represent a focal point between two speakers where a listener is fully capable of hearing a stereo audio mix the way the audio mix is intended to be heard. The perceptual smoothing also provides for the design of reduced filter orders, for example, by eliminating high-frequency noise and variations in the HRTFs that are not perceptually relevant for spatial reproduction.
For the apparatuses, methods, and non-transitory computer readable media disclosed herein, a constrained inversion of the perceptually smoothed HRTFs may be performed through the use of regularization, and validation of a condition number of a regularized matrix before inversion. In this regard, as disclosed herein, a tradeoff may be achieved, for example, by analyzing the condition number with respect to an objective cancellation performance, a subjective audio quality, and robustness to head-movements.
For the apparatuses, methods, and non-transitory computer readable media disclosed herein, modules, as described herein, may be any combination of hardware and programming to implement the functionalities of the respective modules. In some examples described herein, the combinations of hardware and programming may be implemented in a number of different ways. For example, the programming for the modules may be processor executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the modules may include a processing resource to execute those instructions. In these examples, a computing device implementing such modules may include the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separately stored and accessible by the computing device and the processing resource. In some examples, some modules may be implemented in circuitry.
In some examples, the apparatus 100 may include or be provided as a component of a device such as a notebook, a desktop computer, a mobile telephone, a tablet, and other such devices. For the example of
Referring to
A time difference insertion module 114 is to insert an inter-aural time difference 116 (also designated ITD) in the perceptually smoothed HRTFs corresponding to the contralateral transfer paths. According to an example, the inter-aural time difference may be determined as a function of a head radius of the user, and an angle of one of the speakers (e.g., the speaker 106 or 108) from a median plane of a device (e.g., the device 150) that includes the speakers.
A crosstalk canceller generation module 118 is to generate a crosstalk canceller 120 by inverting the perceptually smoothed HRTFs corresponding to the ipsilateral transfer paths and the perceptually smoothed HRTFs corresponding to the contralateral transfer paths including the inserted inter-aural time difference 116. As disclosed herein, in some examples, the crosstalk canceller 120 may be provided as a component of the device 150 (e.g., see also
According to an example and as disclosed herein, the crosstalk canceller generation module 118 is to generate the crosstalk canceller 120 by performing a time-domain inversion of a regularized matrix determined from the perceptually smoothed HRTFs corresponding to the ipsilateral transfer paths and the perceptually smoothed HRTFs corresponding to the contralateral transfer paths including the inserted inter-aural time difference 116. In this regard, as disclosed herein, the crosstalk canceller generation module 118 is to determine a time-domain matrix from the perceptually smoothed HRTFs corresponding to the ipsilateral transfer paths and the perceptually smoothed HRTFs corresponding to the contralateral transfer paths including the inserted inter-aural time difference 116, determine a regularization term (e.g., β) to control inversion of the time-domain matrix, and invert the time-domain matrix based on the regularization term to generate the regularized matrix. Further, as disclosed herein, the crosstalk canceller generation module 118 is to determine the regularization term to control the inversion of the time-domain matrix by comparing a condition number associated with a transpose of the time-domain matrix to a threshold (e.g., 100), and in response to a determination that the condition number is below the threshold, invert the time-domain matrix based on the regularization term to generate the regularized matrix. Thus, the crosstalk canceller generation module 118 is to validate the condition number of the regularized matrix prior to the performing of the time-domain inversion of the regularized matrix.
Referring to
The immersive audio renderer 200 may be extended to accommodate next-generation audio formats (including channel/objects or pure object-based signals and metadata) as input to the immersive audio renderer 200. In addition to the crosstalk canceller 120, the immersive audio renderer 200 may include a low-frequency extension 202 that performs a synthesis of non-linear terms of the low pass audio signal in the side chain. Specifically, auditory-motivated filterbanks may filter the audio signal, the peak of the signal may be tracked in each filterbank, and the maximum peak over all peaks or each of the peaks may be selected for nonlinear term generation. The nonlinear terms for each filterbank output may then be band pass filtered and summed into each of the channels to create the perception of low frequencies. The immersive audio renderer 200 may include spatial synthesis and binaural downmix 204 where reflections and desired direction sounds may be mixed in prior to crosstalk cancellation. For example, the spatial synthesis and binaural downmix 204 may apply HRTFs to render virtual sources at desired angles (and distances). According to an example, the perceptually smoothed HRTFs may be for angles of ±40° for the front left and front right sources (channels), 0° for the center, and ±110° for the left and right surround sources (channels). The immersive audio renderer 200 may include multiband-range compression 206 that performs multiband compression, for example, by using perfect reconstruction (PR) filterbanks, an International Telecommunication Union (ITU) loudness model, and a neural network to generalize to arbitrary multiband dynamic range compression (DRC) parameter settings.
Referring to
For the example layout of the crosstalk-canceller and the binaural acoustic transfer function of
Referring to
Referring to
With respect to phase and magnitude smoothing, the perceptual smoothing module 102 may include processing such as critical-band smoothing, equivalent rectangular bandwidth (ERB) smoothing, or time-domain fractional octave smoothing that perceptually smooths the temporal response.
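As a rough illustration of fractional octave smoothing, the sketch below averages a magnitude response over a band that is a fixed fraction of an octave wide around each frequency. It is a frequency-domain simplification and not necessarily the processing performed by the perceptual smoothing module 102; the function name `fractional_octave_smooth`, the rectangular averaging window, and the 1/3-octave default are assumptions made for illustration.

```python
def fractional_octave_smooth(freqs_hz, magnitude, fraction=3):
    """Average each magnitude bin over a band 1/fraction octave wide.

    freqs_hz and magnitude are equal-length sequences, with freqs_hz in
    ascending order. ASSUMPTION: a rectangular (unweighted) window is used.
    """
    if len(freqs_hz) != len(magnitude):
        raise ValueError("freqs_hz and magnitude must have the same length")
    # Ratio between the band centre and either band edge.
    half_band_ratio = 2.0 ** (1.0 / (2.0 * fraction))
    smoothed = []
    for f_center, m_center in zip(freqs_hz, magnitude):
        if f_center <= 0.0:
            smoothed.append(m_center)  # DC has no octave band; pass it through
            continue
        lo = f_center / half_band_ratio
        hi = f_center * half_band_ratio
        band = [m for f, m in zip(freqs_hz, magnitude) if lo <= f <= hi]
        smoothed.append(sum(band) / len(band))  # band always holds the centre bin
    return smoothed


if __name__ == "__main__":
    freqs = [i * 100.0 for i in range(201)]               # 0 Hz to 20 kHz
    jagged = [1.5 if i % 2 else 0.5 for i in range(201)]  # alternating ripple
    print(fractional_octave_smooth(freqs, jagged)[100])
```

Because the band widens in proportion to frequency, fine ripple at high frequencies is averaged out while low-frequency detail is largely retained.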
With respect to complex smoothing, the perceptual smoothing module 102 may introduce minimum-phase smoothing, thereby eliminating the time-of-arrival information.
The perceptual smoothing of the HRTFs may degrade the cues associated with time-of-arrival differences between the two ears. In this regard, the time difference insertion module 114 is to re-insert the inter-aural time difference 116 in the perceptually smoothed HRTFs corresponding to the contralateral transfer paths. For example, the time difference insertion module 114 is to re-insert the inter-aural time difference 116 by applying the following Equation (1):
For Equation (1), a=0.0875 m may represent the head radius, θ may represent the angle of the speaker (e.g., the speaker 106 or 108) from a median plane (viz., 15° in this case), and c=343 m/s may represent the speed of sound. In this regard, the re-insertion of the inter-aural time difference 116 may insert a time delay in the contralateral signal of
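To make the role of the head radius, the speaker angle, and the speed of sound concrete, the following sketch computes an inter-aural time difference and applies it as a delay to a contralateral response. Equation (1) is not reproduced in this text, so the sketch assumes Woodworth's spherical-head model, ITD = (a/c)(θ + sin θ), which may differ from the disclosed equation. The function names, the 48 kHz sample rate, and the rounding to whole samples are likewise assumptions.

```python
import math


def woodworth_itd_seconds(head_radius_m=0.0875, angle_deg=15.0,
                          speed_of_sound_mps=343.0):
    """Inter-aural time difference from Woodworth's spherical-head model.

    ASSUMPTION: ITD = (a / c) * (theta + sin(theta)) is a common textbook
    model and may differ from Equation (1) of the disclosure.
    """
    theta = math.radians(angle_deg)
    return (head_radius_m / speed_of_sound_mps) * (theta + math.sin(theta))


def delay_contralateral(response, itd_seconds, sample_rate_hz=48000):
    """Prepend a whole-sample delay to a contralateral impulse response.

    ASSUMPTION: the delay is rounded to the nearest sample and the response
    is lengthened rather than truncated; 48 kHz is a hypothetical rate.
    """
    delay_samples = int(round(itd_seconds * sample_rate_hz))
    return [0.0] * delay_samples + list(response)


if __name__ == "__main__":
    itd = woodworth_itd_seconds()
    print(f"ITD = {itd * 1e6:.1f} microseconds")
    print(delay_contralateral([1.0, 0.5, 0.25], itd))
```

Under these assumptions, a head radius of 0.0875 m and a speaker angle of 15° give an ITD of roughly 0.13 ms, which is about 6 samples at 48 kHz.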
Referring to
With respect to
After smoothing by the perceptual smoothing module 102 as described above, the crosstalk canceller generation module 118 may invert the perceptually smoothed HRTFs corresponding to the ipsilateral transfer paths and the perceptually smoothed HRTFs corresponding to the contralateral transfer paths including the inserted inter-aural time difference 116. In this regard, the crosstalk canceller generation module 118 may generate the crosstalk canceller 120 by determining a Toeplitz convolution matrix that emulates the following matrix Equations (2) to (4):
For Equations (2) to (4), G(z) may represent the ipsilateral and contralateral transfer functions, H(z) may represent the crosstalk canceller filter transfer function to be designed, d may represent the desired delay in samples, I may represent the identity matrix, and $z = e^{j\omega}$, where $\omega$ may represent the angular frequency in radians and $\omega = 2\pi f T$, where f may represent frequency in Hz, T may represent the sampling period, and $\pi \approx 3.14$. With respect to Equations (2) to (4), equalization may be achieved based on the correction of dips and peaks for the ipsilateral ears while minimizing contralateral contribution from DC to 20 kHz by using the matrix inverse $G^{-1}(z)$.
The crosstalk canceller generation module 118 may perform frequency-domain or time-domain inversion of the perceptually smoothed HRTFs corresponding to the ipsilateral transfer paths and the perceptually smoothed HRTFs corresponding to the contralateral transfer paths including the inserted inter-aural time difference.
With respect to frequency-domain inversion, the crosstalk canceller generation module 118 may determine the crosstalk filter (e.g., the crosstalk canceller 120) by direct inversion in the frequency domain of Equation (4) using the perceptually smoothed responses.
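A minimal sketch of direct frequency-domain inversion at a single frequency bin follows. Equations (2) to (4) are not reproduced in this text, so the sketch assumes the relation G(z)H(z) = z^{-d} I, that is, H(z) = z^{-d} G^{-1}(z), which is consistent with the symbol definitions given for those equations but is not confirmed by them. The function name `invert_bin` and the singularity tolerance are hypothetical.

```python
import cmath


def invert_bin(g11, g12, g21, g22, omega, delay_samples):
    """Return (h11, h12, h21, h22) such that G * H = exp(-j*omega*d) * I.

    g11..g22 are the complex values of the 2x2 acoustic transfer matrix at
    angular frequency omega (radians per sample). ASSUMPTION: plain matrix
    inversion with a pure delay, without any regularization.
    """
    det = g11 * g22 - g12 * g21
    if abs(det) < 1e-12:  # hypothetical tolerance for a singular bin
        raise ZeroDivisionError("G is singular at this frequency bin")
    scale = cmath.exp(-1j * omega * delay_samples) / det
    # Closed-form 2x2 inverse (adjugate over determinant), times the delay.
    return (g22 * scale, -g12 * scale, -g21 * scale, g11 * scale)


if __name__ == "__main__":
    contra = 0.4 * cmath.exp(-1j * 0.3)  # weaker, delayed contralateral path
    print(invert_bin(1.0 + 0j, contra, contra, 1.0 + 0j, omega=0.5,
                     delay_samples=4))
```

Repeating this for every bin of the perceptually smoothed responses and taking an inverse transform would yield the filter taps. Bins where the determinant is small produce large gains, which is one motivation for a regularized alternative.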
With respect to time-domain inversion with regularization, gij=(gij,0 . . . gij,L
GH = U    Equation (5)
For Equations (6) to (9), G may represent a time-domain matrix that includes $\tilde{G}_{ij}$ for $\tilde{G}_{11}$, $\tilde{G}_{12}$, $\tilde{G}_{21}$, and $\tilde{G}_{22}$, H may represent time-domain crosstalk canceller filters, and U may represent the identity matrix with appropriate time delays represented along the diagonal for causal filters. In this regard, $\tilde{G}_{ij}$ may represent a convolution matrix in Toeplitz form. The $\tilde{G}_{ij}$ matrix may be expressed as follows:
With respect to Equation (9), the superscript t may denote matrix transpose, with $\tilde{G}_{ij}$ being a real matrix of size $(L_h + L_g - 1) \times L_h$ ($L_h$ being the duration of the desired crosstalk cancellation filter, and $L_g$ being the duration in samples of the perceptually smoothed acoustical path response). The convolution matrix $\tilde{G}_{ij}$ may include the samples $g_{ij,0}$ to $g_{ij,L_g-1}$. For the ipsilateral response, the response may be embedded in the convolution matrix, $\tilde{G}_{ij}$, for example, from sample 0 to sample 500 for the example of
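The structure of such a convolution matrix can be sketched as follows. The example builds a Toeplitz matrix whose product with a filter vector equals the full linear convolution of the path response with that filter, so its row count is the filter length plus the response length minus one. The function name `convolution_matrix` and the use of NumPy are assumptions for illustration, and the layout is not taken from Equation (9).

```python
import numpy as np


def convolution_matrix(g, filter_len):
    """Build the (filter_len + len(g) - 1) x filter_len Toeplitz matrix of g.

    Multiplying the result by a length-filter_len vector h gives the full
    linear convolution of g and h.
    """
    g = np.asarray(g, dtype=float)
    rows = filter_len + g.size - 1
    matrix = np.zeros((rows, filter_len))
    for col in range(filter_len):
        # Each column holds g shifted down by one more sample.
        matrix[col:col + g.size, col] = g
    return matrix


if __name__ == "__main__":
    g = [1.0, 0.5, 0.25]           # toy path response, 3 samples long
    h = np.array([1.0, -1.0, 2.0, 0.5])  # toy filter, 4 taps
    m = convolution_matrix(g, filter_len=h.size)
    print(m.shape)
    print(np.allclose(m @ h, np.convolve(g, h)))
```

Expressing convolution as a matrix product is what allows the filter design to be posed as a linear least-squares problem in the filter taps.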
With respect to the crosstalk canceller generation module 118, given that the matrix G is non-square, a least-squares solution may involve determination of the pseudo-inverse of G as follows:
For Equation (10), $H_{opt}$ may represent an optimal matrix for implementing the crosstalk canceller 120, and β may represent a regularization term to control the inversion. According to an example, β may be determined via listening assessments to include a tradeoff between objective cancellation performance and timbre (e.g., audio quality). In this regard, β may be determined by evaluating the condition number of the square matrix $G^tG$ (which is the ratio of the maximum to minimum singular values, derived from the singular value decomposition of the square matrix) with and without β, assessing the crosstalk cancellation performance, and listening evaluations on headphones with pink noise, music, and speech. For the examples of
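The regularized least-squares step can be sketched as below. Equation (10) is not reproduced in this text, so the sketch assumes the standard Tikhonov-regularized pseudo-inverse, H_opt = (G^t G + βI)^{-1} G^t U, and applies the condition-number check to the regularized square matrix before solving. The symmetric two-speaker layout, the function names, the toy responses, and the value of β are hypothetical; the threshold of 100 follows the example threshold mentioned earlier in the disclosure.

```python
import numpy as np


def convolution_matrix(g, filter_len):
    """Toeplitz matrix whose product with h is the full convolution g * h."""
    g = np.asarray(g, dtype=float)
    matrix = np.zeros((filter_len + g.size - 1, filter_len))
    for col in range(filter_len):
        matrix[col:col + g.size, col] = g
    return matrix


def design_canceller(g_ipsi, g_contra, filter_len, delay, beta,
                     max_condition=100.0):
    """Solve G H = U in the regularized least-squares sense.

    ASSUMPTIONS: a symmetric layout (G11 = G22, G12 = G21), equal-length
    ipsilateral and contralateral responses, and the normal-equation form
    (G^t G + beta * I)^-1 G^t U. Returns (h_opt, condition_number).
    """
    g_i = convolution_matrix(g_ipsi, filter_len)
    g_c = convolution_matrix(g_contra, filter_len)
    big_g = np.block([[g_i, g_c], [g_c, g_i]])
    rows = g_i.shape[0]
    target = np.zeros((2 * rows, 2))
    target[delay, 0] = 1.0          # left ear receives the left program, delayed
    target[rows + delay, 1] = 1.0   # right ear receives the right program, delayed
    normal = big_g.T @ big_g + beta * np.eye(2 * filter_len)
    # 2-norm condition number: ratio of largest to smallest singular value.
    condition = np.linalg.cond(normal)
    if condition >= max_condition:
        raise ValueError(f"condition number {condition:.1f} is not below threshold")
    h_opt = np.linalg.solve(normal, big_g.T @ target)
    return h_opt, condition


if __name__ == "__main__":
    h, cond = design_canceller([1.0, 0.3, 0.1], [0.0, 0.4, 0.2],
                               filter_len=16, delay=4, beta=0.01)
    print(h.shape, cond)
```

A larger β lowers the condition number and limits filter gain at the cost of less complete cancellation, which mirrors the tradeoff between objective cancellation performance and timbre described above.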
Referring to
The processor 1002 of
Referring to
The processor 1002 may fetch, decode, and execute the instructions 1008 to insert (e.g., by the time difference insertion module 114) an inter-aural time difference 116 in the perceptually smoothed HRTFs corresponding to the contralateral transfer paths.
The processor 1002 may fetch, decode, and execute the instructions 1010 to generate (e.g., by the crosstalk canceller generation module 118) a crosstalk canceller 120 by inverting the perceptually smoothed HRTFs corresponding to the ipsilateral transfer paths and the perceptually smoothed HRTFs corresponding to the contralateral transfer paths including the inserted inter-aural time difference 116.
Referring to
At block 1104, the method may include inserting an inter-aural time difference (e.g., by the time difference insertion module 114) in the perceptually smoothed HRTFs corresponding to the contralateral transfer paths.
At block 1106, the method may include generating (e.g., by the crosstalk canceller generation module 118) a crosstalk canceller 120 by performing a time-domain inversion of a regularized matrix determined from the perceptually smoothed HRTFs corresponding to the ipsilateral transfer paths and the perceptually smoothed HRTFs corresponding to the contralateral transfer paths including the inserted inter-aural time difference 116.
Referring to
The processor 1204 may fetch, decode, and execute the instructions 1208 to insert (e.g., by the time difference insertion module 114) an inter-aural time difference 116 in the perceptually smoothed HRTFs corresponding to the contralateral transfer paths.
The processor 1204 may fetch, decode, and execute the instructions 1210 to determine (e.g., by the crosstalk canceller generation module 118) a time-domain matrix from the perceptually smoothed HRTFs corresponding to the ipsilateral transfer paths and the perceptually smoothed HRTFs corresponding to the contralateral transfer paths including the inserted inter-aural time difference 116.
The processor 1204 may fetch, decode, and execute the instructions 1212 to determine (e.g., by the crosstalk canceller generation module 118) a regularization term (e.g., β) to control inversion of the time-domain matrix.
The processor 1204 may fetch, decode, and execute the instructions 1214 to invert (e.g., by the crosstalk canceller generation module 118) the time-domain matrix based on the regularization term to generate a regularized matrix.
The processor 1204 may fetch, decode, and execute the instructions 1216 to generate (e.g., by the crosstalk canceller generation module 118) a crosstalk canceller 120 by performing a time-domain inversion of the regularized matrix.
What has been described and illustrated herein is an example along with some of its variations. The terms, descriptions and figures used herein are set forth by way of illustration only and are not meant as limitations. Many variations are possible within the spirit and scope of the subject matter, which is intended to be defined by the following claims—and their equivalents—in which all terms are meant in their broadest reasonable sense unless otherwise indicated.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2017/027718 | 4/14/2017 | WO | 00 |