The present invention relates to the field of audio signal processing, and discloses methods and systems for efficient estimation of dialogue components, in particular for audio signals having spatialization components, sometimes referred to as immersive audio content.
Any discussion of the background art throughout the specification should in no way be considered as an admission that such art is widely known or forms part of common general knowledge in the field.
Content creation, coding, distribution and reproduction of audio are traditionally performed in a channel-based format, that is, one specific target playback system is envisioned for content throughout the content ecosystem. Examples of such target playback system audio formats are mono, stereo, 5.1, 7.1, and the like, and we refer to these formats as different presentations of the original content. The above-mentioned presentations are typically played back over loudspeakers, but a notable exception is the stereo presentation, which is also commonly played back directly over headphones.
One specific presentation is the binaural presentation, typically targeting playback on headphones. Distinctive to a binaural presentation is that it is a two-channel signal with each channel representing the content as perceived at, or close to, the left and right eardrum, respectively. A binaural presentation can be played back directly over loudspeakers, but preferably the binaural presentation is transformed into a presentation suitable for playback over loudspeakers using cross-talk cancellation techniques.
Different audio reproduction systems have been introduced above, like loudspeakers in different configurations, for example stereo, 5.1, and 7.1, and headphones. It is understood from the examples above that a presentation of the original content has a natural, intended, associated audio reproduction system, but can of course be played back on a different audio reproduction system.
If content is to be reproduced on a playback system other than the intended one, a downmixing or upmixing process can be applied. For example, 5.1 content can be reproduced over a stereo playback system by employing specific downmix equations. Another example is playback of stereo-encoded content over a 7.1 speaker setup, which may involve a so-called upmixing process that may or may not be guided by information present in the stereo signal. One system capable of upmixing is Dolby Pro Logic from Dolby Laboratories Inc (Roger Dressler, “Dolby Pro Logic Surround Decoder, Principles of Operation”, www.Dolby.com).
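The downmix equations are not specified here. As an illustration only, the sketch below assumes a common ITU-R BS.775-style 5.1-to-stereo downmix, with the centre and surround channels mixed in at about -3 dB and the LFE channel dropped; the channel order and coefficients are assumptions, not the equations of any particular product.

```python
import numpy as np

# Assumed 5.1 channel order: L, R, C, LFE, Ls, Rs.
# Coefficients follow a common ITU-R BS.775-style downmix (an assumption).
DOWNMIX_51_TO_STEREO = np.array([
    # L    R    C       LFE  Ls      Rs
    [1.0, 0.0, 0.7071, 0.0, 0.7071, 0.0],     # left output (Lo)
    [0.0, 1.0, 0.7071, 0.0, 0.0,    0.7071],  # right output (Ro)
])

def downmix_51_to_stereo(x51: np.ndarray) -> np.ndarray:
    """Downmix a (6, n_samples) 5.1 signal to a (2, n_samples) stereo signal."""
    if x51.shape[0] != 6:
        raise ValueError("expected 6 channels in L, R, C, LFE, Ls, Rs order")
    return DOWNMIX_51_TO_STEREO @ x51

# A centre-only impulse ends up equally in both stereo channels at about -3 dB.
x = np.zeros((6, 4))
x[2, 0] = 1.0
y = downmix_51_to_stereo(x)
print(y[:, 0])  # [0.7071 0.7071]
```

Practical downmixers often add further normalization to avoid clipping; that is omitted here.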
An alternative audio format system is an audio object format such as that provided by the Dolby Atmos system. In this type of format, objects or components are defined to have a particular location around a listener, which may be time varying. Audio content in this format is sometimes referred to as immersive audio content. It is noted that within the context of this application an audio object format is not considered a presentation as described above, but rather a format of the original content that is rendered to one or more presentations in an encoder, after which the presentation(s) is encoded and transmitted to a decoder.
When multi-channel and object based content is to be transformed into a binaural presentation as mentioned above, the acoustic scene consisting of loudspeakers and objects at particular locations is simulated by means of head-related impulse responses (HRIRs), or binaural room impulse responses (BRIRs), which simulate the acoustical pathway from each loudspeaker/object to the ear drums, in an anechoic or echoic (simulated) environment, respectively. In particular, audio signals can be convolved with HRIRs or BRIRs to re-instate inter-aural level differences (ILDs), inter-aural time differences (ITDs) and spectral cues that allow the listener to determine the location of each individual loudspeaker/object. The simulation of an acoustic environment (reverberation) also helps to achieve a certain perceived distance.
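A minimal sketch of the convolution approach, assuming a hypothetical toy HRIR pair that models only an ITD (an integer-sample delay) and an ILD (a broadband gain). Measured HRIRs or BRIRs would also carry spectral cues and, for BRIRs, reverberation; the function names are illustrative only.

```python
import numpy as np

def toy_hrir_pair(itd_samples: int, ild_db: float, length: int = 32):
    """Hypothetical HRIR pair for a source on the listener's left.

    Only an ITD (delay of the far ear) and an ILD (attenuation of the far ear)
    are modelled; real HRIRs also contain direction-dependent spectral cues.
    """
    h_left = np.zeros(length)
    h_right = np.zeros(length)
    h_left[0] = 1.0                                  # near ear: no delay
    h_right[itd_samples] = 10.0 ** (-ild_db / 20.0)  # far ear: later and quieter
    return h_left, h_right

def binauralize(sources, hrirs):
    """Sum of per-source HRIR convolutions, returned as a (2, n) binaural signal."""
    n = max(len(s) + len(h[0]) - 1 for s, h in zip(sources, hrirs))
    out = np.zeros((2, n))
    for s, (hl, hr) in zip(sources, hrirs):  # one convolution pair per source
        yl = np.convolve(s, hl)
        yr = np.convolve(s, hr)
        out[0, :len(yl)] += yl
        out[1, :len(yr)] += yr
    return out

src = np.zeros(8)
src[0] = 1.0                                         # unit impulse
hl, hr = toy_hrir_pair(itd_samples=20, ild_db=6.0)   # about 0.42 ms at 48 kHz
y = binauralize([src], [(hl, hr)])
print(np.argmax(y[0]), np.argmax(y[1]))  # 0 20
```

The loop performs one convolution pair per source, which is the structural reason the cost grows with the number of channels or objects.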
The HRIR/BRIR convolution approach comes with several drawbacks, one of them being the substantial amount of convolution processing that is required for headphone playback. The HRIR or BRIR convolution needs to be applied for every input object or channel separately, and hence complexity typically grows linearly with the number of channels or objects. As headphones are often used in conjunction with battery-powered portable devices, a high computational complexity is not desirable as it may substantially shorten battery life. Moreover, with the introduction of object-based audio content, which may comprise, say, more than 100 simultaneously active objects, the complexity of HRIR convolution can be substantially higher than for traditional channel-based content.
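The linear growth can be made concrete with a rough count of multiply-accumulate (MAC) operations for direct time-domain convolution. The impulse-response length and sample rate below are assumed values for illustration; FFT-based (partitioned) convolution lowers the constant factor, but the cost still scales with the number of inputs.

```python
def direct_convolution_macs_per_second(n_inputs: int, ir_length: int,
                                       sample_rate: int = 48000) -> int:
    """MACs per second for direct (time-domain) binaural convolution.

    Each input (channel or object) is convolved with one impulse response per
    ear, so the cost is n_inputs * 2 * ir_length MACs per output sample.
    """
    return n_inputs * 2 * ir_length * sample_rate

# Assumed: 512-tap HRIRs at 48 kHz.
print(direct_convolution_macs_per_second(6, 512))    # 294912000  (5.1: six inputs)
print(direct_convolution_macs_per_second(100, 512))  # 4915200000 (100 objects)
```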
For this purpose, co-pending and non-published U.S. Provisional Patent Application Ser. No. 62/209,735, filed Aug. 25, 2015, describes a dual-ended approach for presentation transformations that can be used to efficiently transmit and decode immersive audio for headphones. The coding efficiency and decoding complexity reduction are achieved by splitting the rendering process across encoder and decoder, rather than relying on the decoder alone to render all objects.
A part of the content which during creation is associated with a specific spatial location is referred to as an audio component. The spatial location can be a point in space or a distributed location. Audio components can be thought of as all the individual audio sources that a sound artist mixes, i.e., positions spatially, into a soundtrack. Typically a semantic meaning (e.g. dialogue) is assigned to the components of interest so that the goal of the processing (e.g. dialogue enhancement) becomes defined. It is noted that audio components that are produced during content creation are typically present throughout the processing chain, from the original content to different presentations. For example, in an object format there can be dialogue objects with associated spatial locations. And in a stereo presentation there can be dialogue components that are spatially located in the horizontal plane.
In some applications, it is desirable to extract dialogue components in the audio signal, in order to, e.g., enhance or amplify such components. The goal of dialogue enhancement (DE) may be to modify the speech part of a piece of content that contains a mix of speech and background audio so that the speech becomes more intelligible and/or less fatiguing for an end-user. Another use of DE is to attenuate dialogue that, for example, is perceived as disturbing by an end-user. There are two fundamental classes of DE methods: encoder side and decoder side DE. Decoder side DE (called single ended) operates solely on the decoded parameters and signals that reconstruct the non-enhanced audio, i.e., no dedicated side-information for DE is present in the bitstream. In encoder side DE (called dual ended), dedicated side-information that can be used to do DE in the decoder is computed in the encoder and inserted in the bitstream.
Another approach is disclosed in U.S. Pat. No. 8,315,396. Here, the bitstream to the decoder includes an object downmix signal (e.g. a stereo presentation), object parameters to enable reconstruction of the audio objects, and object based metadata allowing manipulation of the reconstructed audio objects. As indicated in FIG. 10 of U.S. Pat. No. 8,315,396, the manipulation may include amplification of speech related objects. This approach thus requires the reconstruction of the original audio objects on the decoder side, which typically is computationally demanding.
There is a general desire to provide dialogue estimation efficiently also in a binaural context.
It is an object of the invention to provide efficient dialogue enhancement in a binaural context, i.e. when at least one of the audio presentation from which the dialogue component(s) are extracted, or the audio presentation to which the extracted dialogue is added, is an (echoic or anechoic) binaural presentation.
In accordance with a first aspect of the present invention, there is provided a method for dialogue enhancing audio content having one or more audio components, wherein each component is associated with a spatial location, comprising providing a first audio signal presentation of the audio components intended for reproduction on a first audio reproduction system, providing a second audio signal presentation of the audio components intended for reproduction on a second audio reproduction system, receiving a set of dialogue estimation parameters configured to enable estimation of dialogue components from the first audio signal presentation, applying the set of dialogue estimation parameters to the first audio signal presentation, to form a dialogue presentation of the dialogue components; and combining the dialogue presentation with the second audio signal presentation to form a dialogue enhanced audio signal presentation for reproduction on the second audio reproduction system, wherein at least one of the first and second audio signal presentation is a binaural audio signal presentation.
In accordance with a second aspect of the present invention, there is provided a method for dialogue enhancing audio content having one or more audio components, wherein each component is associated with a spatial location, comprising receiving a first audio signal presentation of the audio components intended for reproduction on a first audio reproduction system, receiving a set of presentation transform parameters configured to enable transformation of the first audio signal presentation into a second audio signal presentation intended for reproduction on a second audio reproduction system, receiving a set of dialogue estimation parameters configured to enable estimation of dialogue components from the first audio signal presentation, applying the set of presentation transform parameters to the first audio signal presentation to form a second audio signal presentation, applying the set of dialogue estimation parameters to the first audio signal presentation to form a dialogue presentation of the dialogue components; and combining the dialogue presentation with the second audio signal presentation to form a dialogue enhanced audio signal presentation for reproduction on the second audio reproduction system, wherein only one of the first audio signal presentation and the second audio signal presentation is a binaural audio signal presentation.
In accordance with a third aspect of the present invention, there is provided a method for dialogue enhancing audio content having one or more audio components, wherein each component is associated with a spatial location, comprising receiving a first audio signal presentation of the audio components intended for reproduction on a first audio reproduction system, receiving a set of presentation transform parameters configured to enable transformation of the first audio signal presentation into the second audio signal presentation intended for reproduction on a second audio reproduction system, receiving a set of dialogue estimation parameters configured to enable estimation of dialogue components from the second audio signal presentation, applying the set of presentation transform parameters to the first audio signal presentation to form a second audio signal presentation, applying the set of dialogue estimation parameters to the second audio signal presentation to form a dialogue presentation of the dialogue components; and summing the dialogue presentation with the second audio signal presentation to form a dialogue enhanced audio signal presentation for reproduction on the second audio reproduction system, wherein only one of the first audio signal presentation and the second audio signal presentation is a binaural audio signal presentation.
In accordance with a fourth aspect of the present invention, there is provided a decoder for dialogue enhancing audio content having one or more audio components, wherein each component is associated with a spatial location, comprising a core decoder for receiving and decoding a first audio signal presentation of the audio components intended for reproduction on a first audio reproduction system and a set of dialogue estimation parameters configured to enable estimation of dialogue components from the first audio signal presentation, a dialogue estimator for applying the set of dialogue estimation parameters to the first audio signal presentation, to form a dialogue presentation of the dialogue components, and means for combining the dialogue presentation with the second audio signal presentation to form a dialogue enhanced audio signal presentation for reproduction on the second audio reproduction system, wherein only one of the first and second audio signal presentation is a binaural audio signal presentation.
In accordance with a fifth aspect of the present invention, there is provided a decoder for dialogue enhancing audio content having one or more audio components, wherein each component is associated with a spatial location, comprising a core decoder for receiving a first audio signal presentation of the audio components intended for reproduction on a first audio reproduction system, a set of presentation transform parameters configured to enable transformation of the first audio signal presentation into a second audio signal presentation intended for reproduction on a second audio reproduction system, and a set of dialogue estimation parameters configured to enable estimation of dialogue components from the first audio signal presentation, a transform unit configured to apply the set of presentation transform parameters to the first audio signal presentation to form a second audio signal presentation intended for reproduction on a second audio reproduction system, a dialogue estimator for applying the set of dialogue estimation parameters to the first audio signal presentation to form a dialogue presentation of the dialogue components, and means for combining the dialogue presentation with the second audio signal presentation to form a dialogue enhanced audio signal presentation for reproduction on the second audio reproduction system, wherein only one of the first audio signal presentation and the second audio signal presentation is a binaural audio signal presentation.
In accordance with a sixth aspect of the present invention, there is provided a decoder for dialogue enhancing audio content having one or more audio components, wherein each component is associated with a spatial location, comprising a core decoder for receiving a first audio signal presentation of the audio components intended for reproduction on a first audio reproduction system, a set of presentation transform parameters configured to enable transformation of the first audio signal presentation into a second audio signal presentation intended for reproduction on a second audio reproduction system, and a set of dialogue estimation parameters configured to enable estimation of dialogue components from the first audio signal presentation, a transform unit configured to apply the set of presentation transform parameters to the first audio signal presentation to form a second audio signal presentation intended for reproduction on a second audio reproduction system, a dialogue estimator for applying the set of dialogue estimation parameters to the second audio signal presentation to form a dialogue presentation of the dialogue components, and a summation block for summing the dialogue presentation with the second audio signal presentation to form a dialogue enhanced audio signal presentation for reproduction on the second audio reproduction system, wherein one of the first audio signal presentation and the second audio signal presentation is a binaural audio signal presentation.
The invention is based on the insight that a dedicated parameter set may provide an efficient way to extract a dialogue presentation from one audio signal presentation which may then be combined with another audio signal presentation, where at least one of the presentations is a binaural presentation. It is noted that according to the invention, it is not necessary to reconstruct the original audio objects in order to enhance dialogue. Instead, the dedicated parameters are applied directly on a presentation of the audio objects, e.g. a binaural presentation, a stereo presentation, etc. The inventive concept enables a variety of specific embodiments, each with specific advantages.
It is noted that the expression “dialogue enhancement” here is not restricted to amplifying or boosting dialogue components, but may also relate to attenuation of selected dialogue components. Thus, in general the expression “dialogue enhancement” refers to a level-modification of one or more dialogue related components of the audio content. The gain factor G of the level modification may be less than zero in order to attenuate dialogue, or greater than zero in order to enhance dialogue.
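A sketch of the level modification, assuming the dialogue presentation d̂ is combined additively as y + G·d̂ (in line with the summation described in the aspects above). If d̂ matched the dialogue in y exactly, the dialogue would scale by 1 + G, so a desired change of ΔdB corresponds to G = 10^(ΔdB/20) - 1. The helper names are hypothetical.

```python
import numpy as np

def gain_for_level_change(delta_db: float) -> float:
    """G such that y + G*d_hat changes the dialogue level by delta_db.

    Assumes d_hat equals the dialogue already present in y, so the dialogue
    in y + G*d_hat is (1 + G) times the original dialogue.
    """
    return 10.0 ** (delta_db / 20.0) - 1.0

def dialogue_level_modify(y: np.ndarray, d_hat: np.ndarray, g: float) -> np.ndarray:
    """Combine a presentation y with a dialogue estimate d_hat of the same shape."""
    return y + g * d_hat

rng = np.random.default_rng(0)
d = rng.standard_normal((2, 1000))   # synthetic 2-channel dialogue
b = rng.standard_normal((2, 1000))   # synthetic background
y = d + b

g_boost = gain_for_level_change(6.0)   # G > 0: boost
g_cut = gain_for_level_change(-6.0)    # G < 0: attenuate
print(round(g_boost, 3), round(g_cut, 3))  # 0.995 -0.499
```

With a perfect estimate, G = -1 removes the dialogue entirely; with a real estimate, a negative G attenuates rather than cleanly removes it.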
In some embodiments, the first and second presentations are both (echoic or anechoic) binaural presentations. In case only one of them is binaural, the other presentation may be a stereo or surround audio signal presentation.
In the case of different presentations, the dialogue estimation parameters may be configured to also perform a presentation transform, so that the dialogue presentation corresponds to the second audio signal presentation.
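Because both dialogue extraction and presentation transform are linear within a time/frequency tile, the two steps can be folded into one parameter set. The sketch assumes single-tap (memoryless) parameter matrices applied as X @ W, with channels in the columns of X; the random matrices are placeholders rather than real parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
n, J, I = 256, 2, 2                        # samples in a tile, input/output channels
X = rng.standard_normal((n, J))            # first presentation (e.g. stereo), one tile
W_extract = rng.standard_normal((J, 2))    # hypothetical: stereo -> stereo dialogue
W_transform = rng.standard_normal((2, I))  # hypothetical: stereo -> binaural

two_step = (X @ W_extract) @ W_transform   # extract, then transform
W_combined = W_extract @ W_transform       # one (J x I) parameter matrix
one_step = X @ W_combined

print(np.allclose(two_step, one_step))     # True
```

With convolution taps the cascade is still linear, but the combined parameter set then has taps formed from the combinations of the taps of the two stages.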
The invention may advantageously be implemented in a particular type of so-called simulcast system, where the encoded bitstream also includes a set of transform parameters suitable for transforming the first audio signal presentation into a second audio signal presentation.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
Systems and methods disclosed in the following may be implemented as software, firmware, hardware or a combination thereof. In a hardware implementation, the division of tasks referred to as “stages” in the below description does not necessarily correspond to the division into physical units; to the contrary, one physical component may have multiple functionalities, and one task may be carried out by several physical components in cooperation. Certain components or all components may be implemented as software executed by a digital signal processor or microprocessor, or be implemented as hardware or as an application-specific integrated circuit. Such software may be distributed on computer readable media, which may comprise computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to a person skilled in the art, the term computer storage media includes both volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Further, it is well known to the skilled person that communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
Various ways to implement embodiments of the invention will be discussed with reference to
In the presented embodiments the input signals are preferably analyzed in time/frequency tiles, for example by means of a filter bank such as a quadrature mirror filter (QMF) bank, a discrete Fourier transform (DFT), a discrete cosine transform (DCT), or any other means to split input signals into a variety of frequency bands. The result of such a transform is that an input signal $x_i[n]$ for input with index $i$ and discrete-time index $n$ is represented by sub-band signals $x_i[b,k]$ for time slot (or frame) $k$ and sub-band $b$. Consider for example the estimation of the binaural dialogue presentation from a stereo presentation. Let $x_j[b,k]$, $j=1,2$, denote the sub-band signals of the left and right stereo channels, and $\hat{d}_i[b,k]$, $i=1,2$, denote the sub-band signals of the estimated left and right binaural dialogue signals. The dialogue estimate may be computed as

$$\hat{d}_i[b,k] = \sum_{j=1}^{2} \sum_{m} w_{ijm}^{B_p,K}\, x_j[b,k-m], \qquad b \in B_p,\ k \in K,$$

with $B_p$, $K$ sets of frequency ($b$) and time ($k$) indices corresponding to a desired time/frequency tile, $p$ the parameter band index, $m$ a convolution tap index, and $w_{ijm}^{B_p,K}$ the parameter that weights input channel $j$ at tap $m$ in output channel $i$ for that tile.
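The estimator above can be sketched for a single tile as follows. The filter bank analysis is assumed to have been done already, so `x` holds complex sub-band samples indexed as `x[j, b, k]`; treating samples before the first time slot as zero is an assumption made for the sketch.

```python
import numpy as np

def estimate_tile(x: np.ndarray, w: np.ndarray, bands, slots) -> np.ndarray:
    """Apply one parameter set w to one time/frequency tile (B_p, K).

    x     : (J, n_bands, n_slots) complex sub-band signals x_j[b, k]
    w     : (I, J, M) parameters w_ijm shared by every (b, k) in the tile
    bands : sub-band indices b in B_p
    slots : time-slot indices k in K
    Returns d_hat with shape (I, len(bands), len(slots)).
    """
    n_out, _, n_taps = w.shape
    d_hat = np.zeros((n_out, len(bands), len(slots)), dtype=complex)
    for kk, k in enumerate(slots):
        for m in range(n_taps):
            if k - m < 0:            # assumed zero before the first slot
                continue
            xs = x[:, bands, k - m]  # (J, len(bands)): x_j[b, k - m]
            d_hat[:, :, kk] += w[:, :, m] @ xs  # sums over input channels j
    return d_hat

rng = np.random.default_rng(2)
x = rng.standard_normal((2, 64, 32)) + 1j * rng.standard_normal((2, 64, 32))
bands, slots = list(range(8, 16)), list(range(4, 12))
w = np.zeros((2, 2, 2), dtype=complex)
w[:, :, 0] = 0.5 * np.eye(2)   # current slot, no cross-channel terms
w[0, 1, 1] = 0.25              # one-slot-delayed tap from input 2 into output 1
d_hat = estimate_tile(x, w, bands, slots)
print(d_hat.shape)  # (2, 8, 8)
```

Since the same parameters apply to every sub-band in $B_p$ and every slot in $K$, the parameter count scales with the number of tiles rather than with the number of sub-band samples.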
The dialogue parameters w may be computed in the encoder, and encoded using techniques disclosed in U.S. Provisional Patent Application Ser. No. 62/209,735, filed Aug. 25, 2015, hereby incorporated by reference. The parameters w are then transmitted in the bitstream and decoded by a decoder prior to application using the above equation. Due to the linear nature of the estimate the encoder computation can be implemented using minimum mean squared error (MMSE) methods in cases where the target signal (the clean dialogue or an estimate of the clean dialogue) is available.
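A minimal sketch of the encoder-side MMSE computation for one tile: the delayed input matrices are stacked side by side and the parameters are found with an ordinary least-squares solve against the clean dialogue target. Quantization and encoding of the parameters (per the provisional application cited above) are not shown, and the data are synthetic.

```python
import numpy as np

def mmse_parameters(x_delayed, d_target):
    """Least-squares (MMSE) parameters for one tile.

    x_delayed : list of M arrays X_m of shape (N, J): the J inputs delayed by m
                slots, vectorized over the (b, k) samples of the tile
    d_target  : (N, I) clean dialogue target, vectorized the same way
    Returns [W_0, ..., W_{M-1}], each (J, I), minimizing
    ||d_target - sum_m X_m W_m||^2.
    """
    A = np.hstack(x_delayed)                                  # (N, J*M)
    W_stacked, *_ = np.linalg.lstsq(A, d_target, rcond=None)  # (J*M, I)
    J = x_delayed[0].shape[1]
    return [W_stacked[m * J:(m + 1) * J] for m in range(len(x_delayed))]

# Synthetic single-sub-band example with two taps (m = 0 and m = 1).
rng = np.random.default_rng(3)
N, J = 500, 2
x = rng.standard_normal((N + 1, J))
X0, X1 = x[1:], x[:-1]                   # current slot and one-slot-delayed copy
W0_true = np.array([[0.6, 0.1], [0.1, 0.6]])
W1_true = np.array([[0.2, 0.0], [0.0, 0.2]])
D = X0 @ W0_true + X1 @ W1_true          # noiseless stand-in for the clean dialogue
W0, W1 = mmse_parameters([X0, X1], D)
print(np.allclose(W0, W0_true), np.allclose(W1, W1_true))  # True True
```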
The choice of $P$, and the choice of the number of time slots in $K$, is a trade-off between quality and bit rate. Furthermore, the parameters $w$ can be constrained in order to lower the bit rate (at the cost of lower quality), e.g., by assuming $w_{ijm}^{B_p,K}$
In general it is proposed to use estimators of the form

$$\hat{y}_i[b,k] = \sum_{j=1}^{J} \sum_{m} w_{ijm}^{B_p,K}\, x_j[b,k-m], \qquad i = 1,\dots,I,\ b \in B_p,\ k \in K,$$

where at least one of $\hat{y}$ and $x$ is a binaural signal, i.e., $I=2$ or $J=2$ or $I=J=2$. For notational convenience we will in the following often omit the time/frequency tile indexing $B_p$, $K$ as well as the $i$, $j$, $m$ indexing when referring to different parameter sets used to estimate dialogue.
The above estimator can conveniently be expressed in matrix notation as (omitting the time/frequency tile indexing for ease of notation)

$$\hat{y} = \sum_{m} X_m W_m,$$

where $X_m = [x_1(m) \cdots x_J(m)]$ and $\hat{y} = [\hat{y}_1 \cdots \hat{y}_I]$ contain vectorized versions of $x_j[b,k-m]$ and $\hat{y}_i[b,k]$, respectively, in their columns, and $W_m$ is a parameter matrix with $J$ rows and $I$ columns. The above form of the estimator may be used when performing only dialogue extraction, or only a presentation transform, as well as when both extraction and presentation transform are done using a single set of parameters, as detailed in the embodiments below.
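The sketch below builds the matrices $X_m$ for one tile and checks that $\sum_m X_m W_m$ reproduces the element-wise double sum, with `W[m][j, i]` playing the role of $w_{ijm}$. The vectorization order (band-major, then slot) and the zero-filling for $k-m<0$ are assumptions; any fixed order works as long as it is used consistently.

```python
import numpy as np

def build_X(x, bands, slots, m):
    """X_m: (N, J) matrix whose column j is x_j[b, k - m] vectorized over the tile.

    x is (J, n_bands, n_slots); N = len(bands) * len(slots). Slots with
    k - m < 0 are zero-filled (assumption).
    """
    J = x.shape[0]
    cols = np.zeros((len(bands), len(slots), J), dtype=x.dtype)
    for kk, k in enumerate(slots):
        if k - m >= 0:
            cols[:, kk, :] = x[:, bands, k - m].T
    return cols.reshape(-1, J)  # row index = band_position * len(slots) + slot_position

def estimate_matrix_form(x, W, bands, slots):
    """y_hat = sum_m X_m W_m, with W a list of (J, I) matrices; returns (N, I)."""
    return sum(build_X(x, bands, slots, m) @ W_m for m, W_m in enumerate(W))

rng = np.random.default_rng(4)
J, I, M = 2, 2, 2
x = rng.standard_normal((J, 16, 10))
bands, slots = [3, 4, 5], [2, 3, 4, 5]
W = [rng.standard_normal((J, I)) for _ in range(M)]
Y_hat = estimate_matrix_form(x, W, bands, slots)

# Same result from the element-wise double sum over j and m.
Y_loop = np.zeros_like(Y_hat)
for bi, b in enumerate(bands):
    for kk, k in enumerate(slots):
        for i in range(I):
            Y_loop[bi * len(slots) + kk, i] = sum(
                W[m][j, i] * x[j, b, k - m] for j in range(J) for m in range(M))
print(np.allclose(Y_hat, Y_loop))  # True
```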
With reference to
According to the present invention, at least one of the presentations is a binaural presentation (echoic or anechoic). As will be further discussed in the following, the first and second presentations may be different, and the dialogue presentation may or may not correspond to the second presentation. For example, the first audio signal presentation may be intended for playback on a first audio reproduction system, e.g. a set of loudspeakers, while the second audio signal presentation may be intended for playback on a second audio reproduction system, e.g. headphones.
Single Presentation
In the decoder embodiment in
In the embodiment in
Two Presentations
In the decoder embodiment in
As indicated in
In
Further, it is noted that the dialogue extraction can be one-dimensional, such that the extracted dialogue is a mono representation. The transform parameters D2 are then positional metadata, and the presentation transform comprises rendering the mono dialogue using HRTFs, HRIRs or BRIRs corresponding to the position. Alternatively, if the desired rendered dialogue presentation is intended for loudspeaker playback, the mono dialogue could be rendered using loudspeaker rendering techniques such as amplitude panning or vector-based amplitude panning (VBAP).
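For the loudspeaker case, a minimal sketch of 2-D VBAP for one loudspeaker pair, following Pulkki's formulation g = pᵀL⁻¹ with unit-power normalization. Treating the positional metadata D2 as a single azimuth, and using a ±30° pair, are assumptions for illustration; a full renderer would also select the active pair from the loudspeaker layout.

```python
import numpy as np

def unit(az_deg: float) -> np.ndarray:
    """Unit vector for an azimuth in degrees (0 = front, positive = counter-clockwise)."""
    a = np.radians(az_deg)
    return np.array([np.cos(a), np.sin(a)])

def vbap_pair_gains(source_az: float, spk_az_1: float, spk_az_2: float) -> np.ndarray:
    """2-D VBAP gains for one loudspeaker pair, normalized to unit power.

    Solves p = g1*l1 + g2*l2 for the source direction p.
    """
    L = np.vstack([unit(spk_az_1), unit(spk_az_2)])  # rows: loudspeaker unit vectors
    g = unit(source_az) @ np.linalg.inv(L)
    return g / np.linalg.norm(g)

def render_mono_dialogue(d_mono: np.ndarray, gains: np.ndarray) -> np.ndarray:
    """Place a mono dialogue signal on the loudspeakers: (n,) -> (n_spk, n)."""
    return np.outer(gains, d_mono)

g_centre = vbap_pair_gains(0.0, 30.0, -30.0)  # phantom centre between +/-30 deg
print(np.round(g_centre, 4))                    # [0.7071 0.7071]
out = render_mono_dialogue(np.ones(4), g_centre)
```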
Simulcast Implementation
As illustrated in
In the embodiment in
In the embodiment in
It is noted that the set of parameters w(D1) may be identical to the dialogue enhancement parameters used to provide dialogue enhancement of the stereo signal in a simulcast implementation. This alternative is illustrated in
In one embodiment, the aforementioned dedicated presentation transform w(D2) in
It is noted that combining signals with different presentations, e.g., summing a stereo dialogue signal to a binaural signal (which contains non-enhanced binaural dialogue components) naturally leads to spatial imaging artifacts since the non-enhanced binaural dialogue components are perceived to be spatially different compared to a stereo presentation of the same components.
It is further noted that combining signals with different presentations can lead to constructive summing of dialogue components in certain frequency bands, and destructive summing in other frequency bands. The reason for this is that binaural processing introduces ITDs (phase differences) and we are summing signals that are in-phase in certain frequency bands and out-of-phase in other bands, leading to coloring artifacts in the dialogue components (moreover the coloring can be different in the left and right ear). In one embodiment, phase differences above the phase/magnitude cut-off frequency are avoided in the binaural processing so as to reduce this type of artifact.
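The coloring can be illustrated with a simplified model, assumed here, in which the binaural dialogue at one ear differs from the added stereo dialogue only by a pure delay equal to the ITD τ. The sum then has the magnitude response |1 + e^(-j2πfτ)| = 2|cos(πfτ)|, with nulls at f = (2n+1)/(2τ). Real HRTFs also change magnitude, so an actual response will deviate from this idealized comb.

```python
import numpy as np

def summed_magnitude_db(freqs_hz, itd_s: float, g: float = 1.0) -> np.ndarray:
    """Magnitude in dB of |1 + g*exp(-j*2*pi*f*itd)| at the given frequencies.

    Simplified model (assumption): the two dialogue components differ only by
    a pure delay itd_s and a relative gain g.
    """
    h = 1.0 + g * np.exp(-2j * np.pi * np.asarray(freqs_hz) * itd_s)
    return 20.0 * np.log10(np.maximum(np.abs(h), 1e-12))  # floor avoids log(0)

itd = 0.5e-3                                   # 0.5 ms
first_nulls = [(2 * n + 1) / (2 * itd) for n in range(3)]
print(first_nulls)                             # [1000.0, 3000.0, 5000.0]
print(np.round(summed_magnitude_db([0.0, 500.0, 2000.0], itd), 2))  # [6.02 3.01 6.02]
```

The in-phase bands gain up to 6 dB while the bands near the nulls cancel, and since the ITD differs between the two ears, so does the resulting coloring.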
As a final note on combining signals with different presentations, it is acknowledged that binaural processing can in general reduce the intelligibility of dialogue. In cases where the goal of dialogue enhancement is to maximize intelligibility, it may be advantageous to extract and level-modify (e.g. boost) a dialogue signal that is non-binaural. To elaborate, even if the final presentation intended for playback is binaural, it may be advantageous in such a case to extract and level-modify (e.g. boost) a stereo dialogue signal and combine it with the binaural presentation, trading off the coloring and spatial imaging artifacts described above for increased intelligibility.
In the embodiment in
In some applications, it may be desirable to apply different processing depending on the desired value of the dialogue level modification factor G. In one embodiment, appropriate processing is selected based on a determination of whether the factor G is greater than or smaller than a given threshold. Of course, there may also be more than one threshold, and more than one alternative processing. For example, a first processing may be used when G < th1, a second processing when th1 ≤ G < th2, and a third processing when G ≥ th2, where th1 and th2 are two given threshold values.
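With ascending thresholds, the selection rule reduces to a binary search over the threshold list, which also covers the case of more than two thresholds. The sketch uses Python's `bisect` module; the threshold values and chain labels are hypothetical placeholders.

```python
from bisect import bisect_right

def select_processing(g: float, thresholds, processings):
    """Choose a processing chain from the dialogue gain G.

    thresholds  : ascending values [th1, th2, ...]
    processings : exactly one more entry than thresholds
    With [th1, th2] this returns processings[0] for G < th1,
    processings[1] for th1 <= G < th2 and processings[2] for G >= th2.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be ascending")
    if len(processings) != len(thresholds) + 1:
        raise ValueError("need exactly one more processing than thresholds")
    return processings[bisect_right(thresholds, g)]

# Hypothetical thresholds and chain labels, for illustration only.
chains = ["first processing", "second processing", "third processing"]
print(select_processing(-0.8, [-0.5, 0.5], chains))  # first processing
print(select_processing(-0.5, [-0.5, 0.5], chains))  # second processing
print(select_processing(2.0, [-0.5, 0.5], chains))   # third processing
```

`bisect_right` places a value equal to a threshold after it, which gives exactly the "th1 ≤ G" boundary behaviour described above.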
In a specific example, illustrated in
When the switch is in position A, the circuit is here configured to combine the estimated stereo dialogue from matrix transform 86 with the stereo signal z, and then perform the matrix transform 73 on the combined signal to generate a reconstructed anechoic binaural signal. The output from the feedback delay network 75 is then combined with this signal in 78. It is noted that this processing essentially corresponds to
When the switch is in position B, the circuit is here configured to apply transform parameters w(D2) to the stereo dialogue from matrix transform 86 in order to provide a binaural dialogue estimation. This estimation is then added to the anechoic binaural signal from transform 73, and output from the feedback delay network 75. It is noted that this processing essentially corresponds to
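A per-tile sketch of the two switch positions. It assumes single-tap parameter matrices in the X @ W convention, that the gain G is applied to the stereo dialogue estimate before combination, and that the output of the feedback delay network 75 is available as a ready-made signal; the function names are hypothetical. Under these assumptions the two positions coincide when w(D2) equals transform 73, so any difference comes from using a dedicated w(D2).

```python
import numpy as np

def position_a(z, d_st, g, W73, fdn_out):
    """Position A: add the scaled stereo dialogue to z, then apply transform 73."""
    return (z + g * d_st) @ W73 + fdn_out

def position_b(z, d_st, g, W73, W_D2, fdn_out):
    """Position B: map the stereo dialogue to binaural with w(D2), then add."""
    return z @ W73 + g * (d_st @ W_D2) + fdn_out

rng = np.random.default_rng(5)
N = 128
z, d_st, fdn_out = (rng.standard_normal((N, 2)) for _ in range(3))
W73 = rng.standard_normal((2, 2))   # placeholder for transform 73
g = 1.0

same = np.allclose(position_a(z, d_st, g, W73, fdn_out),
                   position_b(z, d_st, g, W73, W73, fdn_out))
print(same)  # True: identical when w(D2) equals transform 73

W_D2 = rng.standard_normal((2, 2))  # a hypothetical dedicated dialogue transform
differs = not np.allclose(position_a(z, d_st, g, W73, fdn_out),
                          position_b(z, d_st, g, W73, W_D2, fdn_out))
```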
The skilled person will realize many other alternatives for the processing in position A and B, respectively. For example, the processing when the switch is in position B could instead correspond to that in
Interpretation
Reference throughout this specification to “one embodiment”, “some embodiments” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment”, “in some embodiments” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure, in one or more embodiments.
As used herein, unless otherwise specified the use of the ordinal adjectives “first”, “second”, “third”, etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
In the claims below and the description herein, any one of the terms comprising, comprised of or which comprises is an open term that means including at least the elements/features that follow, but not excluding others. Thus, the term comprising, when used in the claims, should not be interpreted as being limitative to the means or elements or steps listed thereafter. For example, the scope of the expression a device comprising A and B should not be limited to devices consisting only of elements A and B. Any one of the terms including or which includes or that includes as used herein is also an open term that also means including at least the elements/features that follow the term, but not excluding others. Thus, including is synonymous with and means comprising.
As used herein, the term “exemplary” is used in the sense of providing examples, as opposed to indicating quality. That is, an “exemplary embodiment” is an embodiment provided as an example, as opposed to necessarily being an embodiment of exemplary quality.
It should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, FIG., or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.
Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those skilled in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Furthermore, some of the embodiments are described herein as a method or combination of elements of a method that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor with the necessary instructions for carrying out such a method or element of a method forms a means for carrying out the method or element of a method. Furthermore, an element described herein of an apparatus embodiment is an example of a means for carrying out the function performed by the element for the purpose of carrying out the invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it is to be noticed that the term coupled, when used in the claims, should not be interpreted as being limited to direct connections only. The terms “coupled” and “connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. Thus, the scope of the expression a device A coupled to a device B should not be limited to devices or systems wherein an output of device A is directly connected to an input of device B. It means that there exists a path between an output of A and an input of B which may be a path including other devices or means. “Coupled” may mean that two or more elements are either in direct physical or electrical contact, or that two or more elements are not in direct contact with each other but yet still co-operate or interact with each other.
Thus, while specific embodiments of the invention have been described, those skilled in the art will recognize that other and further modifications may be made thereto without departing from the spirit of the invention, and it is intended to claim all such changes and modifications as falling within the scope of the invention. For example, any formulas given above are merely representative of procedures that may be used. Functionality may be added or deleted from the block diagrams and operations may be interchanged among functional blocks. Steps may be added or deleted to methods described within the scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
16153468 | Jan 2016 | EP | regional |
The present application claims priority to U.S. Provisional Patent Application No. 62/288,590, filed Jan. 29, 2016, and European Patent Application No. 16153468.0, filed Jan. 29, 2016, both of which are incorporated herein by reference in their entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2017/015165 | 1/26/2017 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2017/132396 | 8/3/2017 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
8315396 | Schreiner | Nov 2012 | B2 |
20080049943 | Faller | Feb 2008 | A1 |
20150348564 | Paulus | Dec 2015 | A1 |
20160225387 | Koppens | Aug 2016 | A1 |
20170309288 | Koppens | Oct 2017 | A1 |
20180233156 | Breebaart | Aug 2018 | A1 |
Number | Date | Country |
---|---|---|
2017035281 | Mar 2017 | WO |
WO-2017035281 | May 2017 | WO |
Entry |
---|
Paulus, J. et al “MPEG-D Spatial Audio Object Coding for Dialogue Enhancement (SAOC-DE)” AES Convention 138, May 2015. |
Dressler, R. “Dolby Surround Pro Logic Decoder Principles of Operation” 2000. |
Wightman, F. et al “Sound Localization” Human Psychophysics, Springer New York, 1993, pp. 155-192. |
Breebaart, J. et al “Spectral and Spatial Parameter Resolution Requirements for Parametric, Filter-Bank-Based HRTF Processing” JAES vol. 58 Issue 3, pp. 126-140, Mar. 2010. |
Number | Date | Country |
---|---|---|
20190037331 A1 | Jan 2019 | US |
Number | Date | Country |
---|---|---|
62288590 | Jan 2016 | US |