Aspects in the disclosure here relate generally to digital audio signal processing techniques for binaural reproduction (e.g., through a headset) of surround sound channels.
Surround sound is a technique that routes multiple audio channels to multiple speakers more or less surrounding a listener to produce the perception of sound spatialization. In one case, this technique relies on a listener's ability to identify the direction and distance of a detected sound, and directs different sound elements to one or more speakers in order to produce a desired localization of each sound element. In another case, more than one speaker may reproduce a generally non-localizable sound field, typically for ambience and reverberation. Several methods of surround sound have been developed, such as multichannel audio, object-based audio, and scene-based audio. Multichannel audio is based on specific loudspeaker layouts, with recorded sound channels in 1:1 correspondence with the speaker channels. Object-based audio may be divided into two types of sound channels: beds and objects. Object-based audio is more flexible than conventional multichannel audio because the strict 1:1 correspondence between recorded channels and loudspeaker channels is not necessary; objects have metadata from which a system derives the best representation of the audio objects over the loudspeaker channels at hand in a given playback environment. All of these methods of surround sound generally include one or more recorded surround channels that contain ambient sound and/or reverberation, which give the listener a sense of envelopment because the listener is not able to localize the source of the sound.
Existing techniques for reproducing the ambient sound of a surround channel during headphone playback have struggled to produce envelopment, i.e., to suppress localization of ambience and reverberation that is meant to be enveloping. Separately, advances have been made in the field of audio virtualization, which attempts to create the perception for the listener that there are many more sources of sound than are actually present.
Generally, aspects of the disclosure here relate to a system and method for binaural reproduction of a surround sound channel using a virtualized line array.
In one aspect, a method for producing a diffuse surround sound field that is non-localizable and with reduced timbre inaccuracies starts with a processor receiving an audio bitstream that contains a surround channel. The processor then renders the surround channel as at least one virtualized line array source. Timbre correction is applied to the virtualized line array source. A number of speaker output signals are generated by a spatial sound processor from the timbre-corrected virtualized line array source, for driving a plurality of speakers.
The virtualized line array source has the characteristics of a line array speaker in a simulated virtual environment. In one aspect of the disclosure, the virtualized line array source is composed of a plurality of finite source elements that may be arranged substantially at the same elevation along the azimuth and at sufficient density so as to appear continuous. In this case, the purpose is to reproduce the surround channel of a large body of content that has been produced in sound channel layouts known as 5.1-channel or 7.1-channel sound.
Any one of several timbre matching techniques may be applied to the virtualized line array source. A first method is to apply an inverse head-related transfer function (HRTF) filter to a second finite source element so as to match the perceived approach vector component response angle of a first finite source element. In another method, the timing of playback by each finite source element of the virtualized line array source is non-uniform and is delayed for elements close to and at the center of the virtualized line array source. In yet another method, comb filtering is applied to the virtualized line array source.
In one aspect, a system for producing a diffuse surround sound field that is non-localizable and with reduced timbre inaccuracies comprises a processor and memory having stored therein instructions that when executed by the processor receive an audio bitstream that contains a surround channel. The processor then renders the surround channel as at least one virtualized line array source. Timbre correction is applied to the virtualized line array source. A number of speaker output signals are generated from the timbre-corrected virtualized line array source for driving a plurality of speakers.
The above summary does not include an exhaustive list of all aspects of the present disclosure. It is contemplated that the invention includes all systems, apparatuses and methods that can be practiced from all suitable combinations of the various aspects summarized above, as well as those disclosed in the Detailed Description below and particularly pointed out in the claims filed with the application. Such combinations may have particular advantages not specifically recited in the above summary.
The aspects of the invention are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that references to “an” or “one” aspect in this disclosure are not necessarily to the same aspect, and they mean at least one. Also, in the interest of conciseness and reducing the total number of figures, a given figure may be used to illustrate the features of more than one aspect of the disclosure, and not all elements in the figure may be required for a given aspect. In the drawings:
In the following description, numerous specific details are set forth. Whenever the shapes, relative positions and other aspects of the parts described are not explicitly defined, the scope of the invention is not limited only to the parts shown, which are meant merely for the purpose of illustration. Also, while numerous details are set forth, it is understood that some aspects of the disclosure may be practiced without these details.
In the description, certain terminology is used to describe the various aspects of the disclosure here. For example, in certain situations, the terms “component,” “unit,” “module,” and “logic” are representative of hardware and/or software configured to perform one or more functions. For instance, examples of “hardware” include, but are not limited or restricted to an integrated circuit such as a processor (e.g., a digital signal processor, microprocessor, application specific integrated circuit, a micro-controller, etc.). Further, “a processor” may encompass one or more processors, such as a processor in a remote server working with a processor on a local client machine. Of course, the hardware may be alternatively implemented as a finite state machine or even combinatorial logic. An example of “software” includes executable code in the form of an application, an applet, a routine or even a series of instructions. The software may be stored in any type of machine-readable medium.
In an aspect, the processor 3 and spatial sound processor 7 may each include a processor, such as a microprocessor, a microcontroller, a digital signal processor, or a central processing unit, and other needed integrated circuits such as glue logic. In an aspect, the processor 3 and spatial sound processor 7 may be a single processor. The term “processor” may refer to a device having two or more processing units or elements, e.g. a CPU with multiple processing cores. The processor 3 and spatial sound processor 7 may execute software instructions or code stored in a storage. The storage may include one or more different types of storage such as hard disk drive storage, nonvolatile memory, and volatile memory such as dynamic random access memory. In some cases, a particular function as described below may be implemented as two or more pieces of software in the storage that are being executed by different hardware units of a processor. The processor 3 includes a vector response processor 6, which may include a vector timing adjuster or an HRTF equalizer used for timbral correction as described below.
The processor 3 receives the audio bitstream containing a surround sound channel, e.g., a 5.1 surround format having left and right surround channels, and renders the surround sound channel into a digital signal that is referred to here as the spatialized surround channel digital signal. As illustrated in
The first virtualized line array source 10 may have the characteristics of a line array source as it would sound in a simulated virtual environment. The first virtualized line array source 10 may be produced by, for example, rendering the surround channel as a plurality of finite source elements 10a located in virtual space that have a plurality of vector component responses, with each vector component response associated with a respective finite source element 10a, and each impulse vector representing the same output of the surround channel. The second virtualized line array source 11 is also produced during the same rendering, and the finite source elements 11a of the second virtualized line array source 11 may work in partnership with the respective finite source elements 10a of the first virtualized line array 10 to produce a sense of immersion in the listener 9 during playback. The plurality of finite source elements 10a may be placed in proximity in virtual space at substantially the same elevation along the azimuth. The plurality of finite source elements 10a may be of sufficient density and unity so as to seem a continuous virtualized line array source 10 to a listener 9, e.g., the listener 9 should not be able to determine that the virtualized line array source 10 is composed of finite source elements 10a. In an example, the finite source elements 10a may all be located at the same elevation along the azimuth as the listener's ears. In another aspect, the finite source elements 10a may be located at the same elevation along an azimuth above the listener 9, such as to form an overhead array. In another aspect, the finite source elements 10a may be located at the same elevation along an azimuth below the listener 9. 
In one aspect, the finite source elements 10a may be located at any of varying elevations and varying azimuths, such that the virtualized line array source 10 may be substantially vertical, horizontal, or at an angle, so long as the virtualized line array source 10 seems continuous to the listener 9. The vector component responses may all be substantially uniform, such that a plurality of virtual listeners that are equidistant from a plurality of finite source elements 10a will all perceive a vector component response at the same time. The processor 3 (see
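The element placement described above can be sketched in code. The element count, azimuth span, elevation, and radius below are purely illustrative assumptions (the disclosure fixes none of these values); the sketch only shows dense, evenly spaced elements at one elevation along the azimuth, around a listener at the origin.

```python
import numpy as np

def line_array_positions(n_elements=32, az_start=-120.0, az_end=120.0,
                         elevation=0.0, radius=1.5):
    """Place finite source elements at a fixed elevation along the azimuth.

    Returns an (n_elements, 3) array of Cartesian positions around a
    listener at the origin. Dense, evenly spaced elements approximate a
    continuous virtualized line array. All names and default values here
    are illustrative assumptions, not values from the disclosure.
    """
    az = np.radians(np.linspace(az_start, az_end, n_elements))
    el = np.radians(elevation)
    x = radius * np.cos(el) * np.cos(az)
    y = radius * np.cos(el) * np.sin(az)
    z = np.full_like(az, radius * np.sin(el))
    return np.column_stack([x, y, z])

pos = line_array_positions()
```

Raising or lowering the `elevation` argument would correspond to the overhead or below-listener arrays mentioned above.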
The processor 3 may also upmix or downmix the surround channel of the audio bitstream. The audio bitstream may take the form of multichannel audio, object-based audio, scene-based audio, or any other type of audio source encoded with a surround sound channel. The audio bitstream may correspond to a music composition, a track for a television show or movie, or any other type of audio work. For an audio bitstream with one surround sound channel, the processor 3 may upmix the one surround sound channel to produce the spatialized surround channel digital signal. For an audio bitstream with a plurality of surround sound channels, the processor 3 may downmix the plurality of surround sound channels to produce the digital signal.
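The downmix and upmix steps might be sketched as follows. The equal-power mixing law used here is one simple assumption; the disclosure does not specify the mixing rule, and a real upmixer would typically also decorrelate the duplicated feeds.

```python
import numpy as np

def downmix_surround(channels):
    """Downmix several surround channels into one feed (illustrative).

    `channels` is a list of equal-length 1-D sample arrays, e.g. the left
    and right surround channels of a 5.1 stream. Equal-power averaging is
    an assumption; the actual mixing law is implementation-defined.
    """
    stack = np.stack(channels)
    return stack.sum(axis=0) / np.sqrt(len(channels))

def upmix_surround(channel, n_out=2):
    """Duplicate a single surround channel into n_out feeds with
    equal-power scaling, a minimal stand-in for a true upmixer."""
    return [channel / np.sqrt(n_out) for _ in range(n_out)]
```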
The vector response processor 6 receives the spatialized surround channel digital signal. In one aspect, the digital signal is sent to the vector timing adjuster, which delays the playback timing of each finite source element 10a so that the playback timing of the vector component responses is non-uniform.
For instance, a first finite source element that is near an end of the virtualized line array source 10 may play a piece of audio content of the surround channel at a desired time. A second finite source element that is between the center point of the virtualized line array source 10 and the first finite source element may play that piece of audio content at a time delay relative to the first finite source element, where the playback time of the second finite source element relative to the first finite source element and the distance from a center point of the virtualized line array source 10 may be calculated through numerical methods. The relationship between the delay in playback time of a finite source element 10a and the distance between the finite source element 10a and the center point of the virtualized line array source 10 may be chosen such that the difference in time between the vector component response of the finite source element 10a reaching the listener and a vector component response emanating from near a center point of the virtualized line array source 10 reaching the listener is negligible. This may have the effect of producing a vector response pattern that is substantially parabolic, e.g., a listener 9 will perceive the first virtualized line array 10 as a curved virtualized line array 12 and the second virtualized line array 11 as a curved virtualized line array 13. In one aspect, the virtualized line array 10 is synthesized by the processor 3 as a curved virtualized line array 12, such as a segment of a circle, such that during playback the individual vector component responses from all finite source elements will reach the listener 9 at substantially the same time.
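One plausible closed-form realization of the non-uniform delay just described: delay each element by the difference between the longest element-to-listener path and its own path, divided by the speed of sound, so that every vector component response arrives at the same instant. The disclosure leaves the numerical method open; the function name and the nominal speed of sound are assumptions.

```python
import numpy as np

def alignment_delays(element_positions, listener_position, c=343.0):
    """Per-element playback delays (in seconds) chosen so that every
    finite source element's wavefront reaches the listener simultaneously.

    Elements nearest the listener (near the array's center point) receive
    the largest delay, matching the non-uniform timing described above.
    The result is acoustically equivalent to bending a straight array
    into an arc of a circle centered on the listener.
    """
    d = np.linalg.norm(element_positions - listener_position, axis=1)
    return (d.max() - d) / c
```

For a straight three-element array in front of the listener, the center element (closest) is delayed while the end elements play immediately, so all arrivals coincide.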
The HRTF equalizer is another tool that reduces timbral differences between finite source elements in the virtualized line array source in the spatialized surround channel digital signal. In contrast to the vector timing adjuster, the HRTF equalizer applies an HRTF filter to each finite source element 11a as seen in
A result of the vector response processor 6 performing a time alignment upon a virtualized speaker array source 10 (as described above using the vector timing adjuster) may be similar in principle to smoothing the HRTF across the range of angles spanned by the elements 10a, 10b, . . . of the virtualized speaker array source 10, e.g., computing an average HRTF across those elements 10a, 10b and then applying, by the binaural processor (spatial sound processor 7) the same, average HRTF to all of the elements of the virtualized speaker array source 10, to produce the left and right headphone signals.
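The HRTF-averaging idea above might be sketched as follows: average the magnitude responses of the per-element head-related impulse responses (HRIRs) for one ear, discard the per-element phase, and build one common filter applied to every element. The FFT size, zero-phase reconstruction, and function name are illustrative assumptions; real HRTF data would come from a measured set.

```python
import numpy as np

def average_hrtf(hrirs, n_fft=256):
    """Average per-element HRIR magnitude responses into one common filter.

    `hrirs` is an (n_elements, taps) array of time-domain HRIRs for one
    ear. Averaging the magnitude spectra and discarding phase yields a
    zero-phase common filter for all elements, analogous to the smoothing
    effect of the time alignment described above.
    """
    mags = np.abs(np.fft.rfft(hrirs, n=n_fft, axis=1))
    avg_mag = mags.mean(axis=0)
    # Zero-phase reconstruction; roll to center the impulse response.
    h = np.fft.irfft(avg_mag, n=n_fft)
    return np.roll(h, n_fft // 2)
```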
Another approach for timbral correction is to configure the vector response processor 6 to apply comb filtering to the spatialized surround channel digital signal. For example,
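A comb filter sums a signal with a delayed, attenuated copy of itself, carving periodic notches into the spectrum. The feedforward form and the default gain below are assumptions; the disclosure does not fix a particular comb topology.

```python
import numpy as np

def comb_filter(x, delay_samples, gain=0.5):
    """Feedforward comb filter: y[n] = x[n] + gain * x[n - delay_samples].

    Applying differently tuned combs to the element feeds can decorrelate
    them or shape their timbre; the parameters here are illustrative.
    """
    y = x.astype(float).copy()
    if delay_samples > 0:
        y[delay_samples:] += gain * x[:-delay_samples]
    return y
```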
In yet another aspect of the disclosure here, rather than having the elements 10a, 10b, of the virtualized speaker array source 10 be equi-spaced as depicted in
When rendering is complete, the processor 3 sends the spatialized surround channel digital signal to a spatial sound processor 7, such as a binaural processor, which then generates a number of speaker output signals. In the case of binaural reproduction, the spatial sound processor 7 generates left and right speaker output signals, which are left and right headphone driver signals, by applying a binaural rendering algorithm to the spatialized surround channel digital signal, and transmits the speaker output signals to the speakers 8, which are in this case headphones (for playback or output as sound). The spatial sound processor 7 generates the speaker output signals by applying head related transfer function (HRTF) filters to the digital signal to produce a left ear component and a right ear component that together drive the headphone speakers to reproduce the acoustical waveforms at a listener's eardrums as they would have been present if the surround sound had emanated from an actual line array source.
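The core of the binaural rendering step can be sketched as per-element HRIR convolution: each finite source element's feed is convolved with the left- and right-ear impulse responses for that element's direction, and the results are summed per ear. Real systems would use measured HRTF sets and fast (FFT-based) convolution; this sketch assumes the HRIRs are simply given as arrays.

```python
import numpy as np

def binauralize(element_signals, hrirs_left, hrirs_right):
    """Render per-element feeds to left/right headphone driver signals.

    Each finite source element's signal is convolved with the HRIR pair
    for that element's direction, then summed per ear. Function name and
    data layout are illustrative assumptions.
    """
    left = sum(np.convolve(s, h) for s, h in zip(element_signals, hrirs_left))
    right = sum(np.convolve(s, h) for s, h in zip(element_signals, hrirs_right))
    return left, right
```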
The following aspects may be described as a process, which may be depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed. A process may correspond to a method, a procedure, etc.
An aspect of the disclosure is a machine-readable medium having stored thereon instructions which program a processor to perform some or all of the operations described above. A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), such as Compact Disc Read-Only Memory (CD-ROM), Read-Only Memory (ROM), Random Access Memory (RAM), and Erasable Programmable Read-Only Memory (EPROM). In other aspects, some of these operations might be performed by specific hardware components that contain hardwired logic. Those operations might alternatively be performed by any combination of programmable computer components and fixed hardware circuit components.
While the disclosure here has been described in terms of several aspects, those of ordinary skill in the art will recognize that the disclosure is not limited to the aspects described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting. There are numerous other variations to different aspects described above, which in the interest of conciseness have not been provided in detail. Accordingly, other aspects are within the scope of the claims.
This non-provisional patent application claims the benefit of the earlier filing date of U.S. provisional application No. 62/738,862 filed 28 Sep. 2018.
U.S. Patent Documents

| Number | Name | Date | Kind |
|---|---|---|---|
| 6498857 | Sibbald | Dec. 2002 | B1 |
| 20130142355 | Isaac et al. | Jun. 2013 | A1 |
| 20150131824 | Nguyen | May 2015 | A1 |
Other Publications

Merimaa, Juha, "Modification of HRTF Filters to Reduce Timbral Effects in Binaural Synthesis," Audio Engineering Society 127th Convention, Oct. 2009, pp. 1-14.

Gardner et al., "HRTF Measurements of a KEMAR Dummy-Head Microphone," MIT Media Lab Technical Report #280, May 1994, pp. 1-7.

Ramo, Jussi and Valimaki, Vesa, "Digital Augmented Reality Audio Headset," Hindawi Publishing Corporation Journal of Electrical and Computer Engineering, vol. 2012, Article ID 457374, 13 pages.

Brunner, Stefan; Maempel, Hans-Joachim; and Weinzierl, Stefan, Audio Engineering Society Convention Paper on the audibility of comb-filter distortions, presented at the 122nd Convention, May 5-8, 2007, Vienna, Austria, 8 pages.

Holman, Tomlinson, "New Factors in Sound for Cinema and Television," presented at the 89th Convention of the Audio Engineering Society, Los Angeles, CA, 1990; J. Audio Eng. Soc., vol. 39, no. 7/8, 1991, 12 pages.
Related U.S. Application Data

| Number | Date | Country |
|---|---|---|
| 62738862 | Sep. 2018 | US |