NONE.
The present invention generally relates to audio rendering in real-time communications, and more particularly relates to real-time spatial audio rendering in a virtual environment. More particularly still, the present disclosure relates to a system and method for rendering real-time stereo audio in a virtual environment.
In real-world communication, people can hear from audio sources and distinguish the direction and distance of the sources. Such determination is based on the binaural effect. The binaural effect requires that the sound wave signals, received by the listener's two ears, have two different time delays and spectral energy distributions. Therefore, spatial audio should have at least two channels (stereo audio) to provide the binaural effect for a user in a real-time communication environment, such as an online game environment. Participating people (or participants in short) are in different room conditions in real-time communication (RTC) virtual environments, such as an online meeting room or a virtual theater. They can move from one place to another within their own rooms. There may be multiple audio sources such as people speaking, TVs, etc. in a room.
However, for real-time communication, many devices such as laptops or mobile phones may only support mono-channel recording. Even if a device supports stereo recording, the audio codec used by the RTC application may not support stereo audio. As a result, the audio in an RTC virtual environment is often in mono format. Besides the limitations of hardware and audio codecs, the position of each speaker in the RTC virtual environment can vary. In other words, the mono audio signals require a new real-time spatial audio rendering system to generate stereo audio according to the real-time positions of sound sources and listeners. An illustrative virtual environment is shown in
Accordingly, there is a need for a new audio rendering system and method that generate stereo audio for a listener in a virtual environment. With mono audio signals from an audio source, the real-time virtual positions of the listener and audio sources, and the real-time orientations of the listener, the real-time spatial audio rendering system needs to provide real-time stereo audio signals for each listener with minimal time delay. The audio sources will be rendered and mixed into a stereo playback format for the listener in the virtual room through the real-time spatial audio rendering system. Furthermore, the listener can distinguish each audio source's direction and distance with the stereo audio, which makes the virtual RTC environment closer to a real-world listening experience. In addition, the real-time spatial audio rendering system needs to generate stereo audio signals with reverberation effects.
Generally speaking, pursuant to the various embodiments, the present disclosure provides a computer-implemented method for rendering real-time spatial audio from mono audio sources in a virtual environment. The method is performed by a real-time spatial audio rendering computer software application within a real-time spatial audio rendering system and includes determining whether reverberation is configured for rendering spatial audio from a set of mono audio sources; determining a set of dynamic locations of the set of mono audio sources relative to a listener's location in a virtual environment respectively; obtaining a set of discrete Head-Related Impulse Responses (HRIRs); converting the set of discrete HRIRs into continuous HRIRs; determining interaural time differences of each mono audio source within the set of mono audio sources based on the set of dynamic locations; modifying said continuous HRIRs with said interaural time differences to generate modified HRIRs; applying gain control on audio signals of each mono audio source within the set of mono audio sources to generate modified audio signals; convoluting the modified audio signals by the modified HRIRs to generate spatial audio signals of each mono audio source within the set of mono audio sources; and combining the spatial audio signals of all mono audio sources within the set of mono audio sources to generate anechoic audio, the anechoic audio adapted to be played back by a communication device. The spatial audio is stereo audio. The method further includes compressing the anechoic audio's level to a target range for playback by the communication device.
When reverberation is configured, the method further includes generating Binaural Room Impulse Responses (BRIRs) based on a set of dimensions of a room of the listener and positions of the listener and the set of mono audio sources; convoluting the audio signals of each mono audio source within the set of mono audio sources with the BRIRs to generate reverberation stereo audio of each mono audio source within the set of mono audio sources; combining the reverberation stereo audio of all mono audio sources within the set of mono audio sources to generate combined reverberation audio; and mixing the anechoic audio with the combined reverberation audio for both a left channel and a right channel to generate final spatial audio for playback on the communication device.
Further in accordance with the present teachings is a real-time spatial audio rendering system having a real-time spatial audio rendering computer software application adapted to run on a communication device. The real-time spatial audio rendering computer software application is adapted to determine whether reverberation is configured for rendering spatial audio from a set of mono audio sources; determine a set of dynamic locations of the set of mono audio sources relative to a listener's location in a virtual environment respectively; obtain a set of discrete Head-Related Impulse Responses (HRIRs); convert the set of discrete HRIRs into continuous HRIRs; determine interaural time differences of each mono audio source within the set of mono audio sources based on the set of dynamic locations; modify said continuous HRIRs with said interaural time differences to generate modified HRIRs; apply gain control on audio signals of each mono audio source within the set of mono audio sources to generate modified audio signals; convolute the modified audio signals by the modified HRIRs to generate spatial audio signals of each mono audio source within the set of mono audio sources; and combine the spatial audio signals of all mono audio sources within the set of mono audio sources to generate anechoic audio, the anechoic audio adapted to be played back by the communication device. In one implementation, the spatial audio is stereo audio. The real-time spatial audio rendering computer software application is further adapted to compress the anechoic audio's level to a target range for playback by the communication device.
When reverberation is configured, the real-time spatial audio rendering computer software application is further adapted to generate Binaural Room Impulse Responses (BRIRs) based on a set of dimensions of a room of the listener and positions of the listener and the set of mono audio sources; convolute the audio signals of each mono audio source within the set of mono audio sources with the BRIRs to generate reverberation stereo audio of each mono audio source within the set of mono audio sources; combine the reverberation stereo audio of all mono audio sources within the set of mono audio sources to generate combined reverberation audio; and mix the anechoic audio with the combined reverberation audio for both a left channel and a right channel to generate final spatial audio for playback on the communication device. In a further implementation, the real-time spatial audio rendering computer software application is further adapted to compress the final spatial audio's level to a target range.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Although the characteristic features of this disclosure will be particularly pointed out in the claims, the invention itself, and the manner in which it may be made and used, may be better understood by referring to the following description taken in connection with the accompanying drawings forming a part hereof, wherein like reference numerals refer to like parts throughout the several views and in which:
A person of ordinary skill in the art will appreciate that elements of the figures above are illustrated for simplicity and clarity, and are not necessarily drawn to scale. The dimensions of some elements in the figures may have been exaggerated relative to other elements to help understanding of the present teachings. Furthermore, a particular order in which certain elements, parts, components, modules, steps, actions, events and/or processes are described or illustrated may not be actually required. A person of ordinary skill in the art will appreciate that, for the purpose of simplicity and clarity of illustration, some commonly known and well-understood elements that are useful and/or necessary in a commercially feasible embodiment may not be depicted in order to provide a clear view of various embodiments in accordance with the present teachings.
The new real-time (RT) spatial audio rendering system provides stereo audio output with or without reverberation. Reverberation conveys a sense of the dimensions of the virtual room. Reverberation is not necessarily required, depending on the usage, because too much reverberation may reduce intelligibility and is not suitable for certain situations, such as a virtual meeting over the Internet with multiple participants. The RT spatial audio rendering system, in one implementation, includes a computer software application (also referred to herein as real-time spatial audio rendering computer software application) running on a communication device operated by the listener or on a server computer for providing stereo audio to a listener from mono audio signals from one or more audio sources. When the server computer performs the spatial audio rendering, the computer software application obtains the input data from the listener's communication device over an Internet connection, generates the stereo audio and forwards the stereo audio data to the listener's communication device over the Internet for playback by the same device. The spatial audio rendering software application includes one or more computer programs that are written in computer software programming languages, such as C, C++, C#, Java, etc.
The process by which the RT spatial audio rendering software application provides spatial audio (such as stereo audio) is further shown and generally indicated at 100 in
The communication device and a server computer are further illustrated by reference to
The communication device 202 (such as a laptop computer, a tablet computer, a smartphone, etc.), is further illustrated in
The server computer 206 is further illustrated in
Referring to
Referring to
Turning back to
In real time, the distance between the listener and an audio source can vary when the listener is mobile. As a result, the distances between the audio source and the listener's two ears also vary. The resulting latency difference between the two ears is very important for the listener's sense of space. Accordingly, at 508, the spatial audio rendering software application determines the interaural time difference (ITD) of each mono audio source within the set of audio sources by calculating the distance from the audio source to each of the listener's two ears, dividing each distance by the speed of sound, and taking the difference between the two resulting propagation delays. The ITD calculation is further shown as follows:
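The calculation described above can be sketched in Python as follows. This is a minimal two-dimensional illustration, not the disclosed equation: the speed of sound (343 m/s), the head radius (0.0875 m), the ear placement on a line through the head center, the sign convention, and the names `ear_positions` and `interaural_time_difference` are all assumptions introduced here.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def ear_positions(listener_xy, yaw_rad, head_radius_m=0.0875):
    """Return (left_ear, right_ear) for a listener facing direction yaw_rad.

    Assumption: the ears sit head_radius_m to either side of the head center,
    on the axis perpendicular to the facing direction.
    """
    x, y = listener_xy
    # Unit vector toward the listener's right (facing direction rotated by -90 degrees).
    right_x, right_y = math.sin(yaw_rad), -math.cos(yaw_rad)
    left = (x - head_radius_m * right_x, y - head_radius_m * right_y)
    right = (x + head_radius_m * right_x, y + head_radius_m * right_y)
    return left, right


def interaural_time_difference(source_xy, listener_xy, yaw_rad, head_radius_m=0.0875):
    """Return the ITD in seconds.

    Each ear's delay is its distance to the source divided by the speed of sound.
    Assumed sign convention: positive when the sound reaches the left ear first.
    """
    left, right = ear_positions(listener_xy, yaw_rad, head_radius_m)
    t_left = math.dist(source_xy, left) / SPEED_OF_SOUND_M_S
    t_right = math.dist(source_xy, right) / SPEED_OF_SOUND_M_S
    return t_right - t_left


# Listener at the origin facing +y, source two meters to the listener's left.
itd = interaural_time_difference((-2.0, 0.0), (0.0, 0.0), math.pi / 2)
print(f"ITD = {itd * 1000:.3f} ms")
```

Because the listener's position and orientation change in real time, a function of this kind would be re-evaluated whenever either the listener or the source moves.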
At 510, the spatial audio rendering software application modifies the continuous HRIRs using the interaural time differences to generate modified HRIRs. In one implementation, additional zero-valued samples are added to the continuous HRIRs. For example, when the audio source is on the left side, the ITD is 1 ms, and the sampling rate of the HRIRs is 48000 Hz, 48 zero-valued samples are added to the beginning of the right-side HRIRs.
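The zero-padding in the example above can be sketched as follows. The function name `apply_itd`, the sign convention (a positive ITD means the source is on the left, so the right ear lags), and the trailing padding that keeps both impulse responses the same length are assumptions made for illustration.

```python
def apply_itd(hrir_left, hrir_right, itd_seconds, sample_rate_hz=48000):
    """Delay the far-ear HRIR by prepending zero-valued samples.

    Assumptions: a positive itd_seconds means the source is on the left, and
    the near-ear HRIR is padded at its end so both outputs have equal length.
    """
    delay_samples = round(abs(itd_seconds) * sample_rate_hz)
    pad = [0.0] * delay_samples
    if itd_seconds >= 0:
        # Source on the left: the right ear hears the sound later.
        return list(hrir_left) + pad, pad + list(hrir_right)
    # Source on the right: the left ear hears the sound later.
    return pad + list(hrir_left), list(hrir_right) + pad


# Toy two-tap HRIRs; a 1 ms ITD at 48000 Hz corresponds to 48 samples.
left, right = apply_itd([1.0, 0.5], [0.8, 0.4], itd_seconds=0.001)
print(len(left), len(right))
```

With the 1 ms ITD from the text, the first 48 samples of the right-ear response are zero and its original taps start at index 48.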
At 512, the spatial audio rendering software application applies gain control on the mono audio signals of the audio source. In particular, at 512, an audio source's volume is modified according to the distance between the mono audio source and listener. A gain adjusting the volume is applied to the audio signals from the audio source. The gain follows the volume propagation attenuation rules. In one implementation, the gain calculation is shown as follows:
Where A(d) is the gain at distance d, dref is the reference distance, and Aref is the reference gain. dref and Aref are predefined parameters, meaning that at distance dref, Aref is the amount of gain to be applied to the mono audio signals. The mono audio signals are multiplied by A(d) to generate modified audio signals of the audio source.
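As a hedged illustration of distance-based gain control, the sketch below assumes an inverse-distance attenuation law, A(d) = Aref · dref / d, which yields Aref at d = dref as the definitions above require. The inverse-distance form itself, the `d_min` clamp that avoids division by zero, and the function names are assumptions and are not taken from the disclosure.

```python
def distance_gain(d, d_ref=1.0, a_ref=1.0, d_min=0.1):
    """Gain at distance d under an assumed inverse-distance law.

    At d == d_ref the gain equals a_ref. Distances below d_min are clamped
    (an assumption) so the gain stays bounded when the source is very close.
    """
    return a_ref * d_ref / max(d, d_min)


def apply_gain(mono_samples, gain):
    """Multiply every mono sample by the gain to obtain the modified signal."""
    return [gain * s for s in mono_samples]


# Doubling the distance from the reference distance halves the amplitude.
gain = distance_gain(2.0, d_ref=1.0, a_ref=1.0)
print(apply_gain([1.0, -0.5], gain))
```

Other attenuation rules (for example, a different roll-off exponent) could be substituted by changing only `distance_gain`.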
At 514, the spatial audio rendering software application convolutes the modified mono audio signals of the audio source by the modified HRIRs (both right and left ears) to generate the stereo audio signals of the audio source. The stereo audio signals include both right and left channels. The ITD enters this step through the zero-padded modified HRIRs generated at 510, and the gain A(d) enters through the modified audio signals generated at 512. At 516, the spatial audio rendering software application combines the stereo audio signals of each audio source within the set of audio sources (such as the audio sources P1 and P2 shown in
When the room reverberation is desired for spatial audio rendering, the reverberation based on the Binaural Room Impulse Response (BRIR) is added during the spatial audio rendering. Referring to
At 704, the spatial audio rendering software application generates BRIRs based on the room dimension and the positions of the listener and the audio sources. An illustrative virtual room is shown in
At 706, the spatial audio rendering software application convolutes the mono audio signals of an audio source with the BRIRs to generate reverberation stereo audio (also referred to herein as reverberation audio and reverberation audio signals) of the audio source. At 708, the spatial audio rendering software application combines the generated reverberation stereo audio signals of all the audio sources within the set of audio sources (such as P1 and P2) to generate the combined reverberation stereo audio signals (or reverberation audio for short). In one implementation, the combination is achieved by adding the reverberation stereo audio signals of the set of audio sources together using the following equation:
where Si stands for the reverberation stereo audio data of the i-th audio source and n stands for the number of audio sources.
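Steps 706 and 708 can be sketched as a direct convolution followed by a sample-wise sum over the n sources. This is a minimal illustration only: the toy impulse responses stand in for real BRIRs, the function names are hypothetical, and zero-padding shorter signals to the longest length is an assumption. A real-time implementation would more likely use block-based FFT convolution than the quadratic loop shown here.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_reverb(mono, brir_left, brir_right):
    """Convolve one mono source with the left and right BRIRs."""
    return convolve(mono, brir_left), convolve(mono, brir_right)


def sum_sources(stereo_signals):
    """Add the stereo signals S_1 ... S_n sample by sample, per channel.

    Assumption: shorter signals are treated as zero-padded to the longest one.
    """
    length = max(max(len(left), len(right)) for left, right in stereo_signals)
    mix_left, mix_right = [0.0] * length, [0.0] * length
    for left, right in stereo_signals:
        for k, v in enumerate(left):
            mix_left[k] += v
        for k, v in enumerate(right):
            mix_right[k] += v
    return mix_left, mix_right


# Two toy sources (standing in for P1 and P2) with made-up impulse responses.
p1 = render_reverb([1.0, 2.0], [1.0, 0.5], [0.5])
p2 = render_reverb([1.0], [1.0], [1.0])
print(sum_sources([p1, p2]))
```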
At 710, the spatial audio rendering software application mixes the anechoic stereo audio and the combined reverberation stereo audio for both the left and right channels to generate the final stereo audio for playback on the device 202. In one implementation, the mixing is the addition of the two categories of audio data. In a further implementation, at 712, the spatial audio rendering software application compresses the final audio signals' level to a target range to prevent the playback from being too loud. For instance, at 712, a dynamic audio compressor is applied to compress the final audio signal level to a target range.
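The mixing at 710 and the level compression at 712 can be sketched as below. The mixing is plain addition, as stated above. The compressor is a deliberately simplified, memoryless stand-in: the threshold, the ratio, the hard knee, and the absence of attack and release smoothing are all assumptions, and a production dynamic compressor would normally track a smoothed signal envelope instead of acting on individual samples.

```python
import math


def mix_channel(anechoic, reverb):
    """Add two channels sample by sample, zero-padding the shorter one."""
    length = max(len(anechoic), len(reverb))
    out = [0.0] * length
    for k, v in enumerate(anechoic):
        out[k] += v
    for k, v in enumerate(reverb):
        out[k] += v
    return out


def mix_stereo(anechoic_lr, reverb_lr):
    """Mix the (left, right) anechoic audio with the (left, right) reverberation audio."""
    return tuple(mix_channel(a, r) for a, r in zip(anechoic_lr, reverb_lr))


def compress(samples, threshold=0.5, ratio=4.0):
    """Reduce the part of each sample's magnitude above the threshold by the ratio."""
    out = []
    for s in samples:
        magnitude = abs(s)
        if magnitude > threshold:
            magnitude = threshold + (magnitude - threshold) / ratio
        out.append(math.copysign(magnitude, s))
    return out


final_left, final_right = mix_stereo(([0.5, 0.5], [0.25]), ([1.0], [0.0, -1.75]))
print(compress(final_left), compress(final_right))
```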
Obviously, many additional modifications and variations of the present disclosure are possible in light of the above teachings. Thus, it is to be understood that, within the scope of the appended claims, the disclosure may be practiced otherwise than is specifically described above.
The foregoing description of the disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. The description was selected to best explain the principles of the present teachings and practical application of these principles to enable others skilled in the art to best utilize the disclosure in various embodiments and various modifications as are suited to the particular use contemplated. It should be recognized that the words “a” or “an” are intended to include both the singular and the plural. Conversely, any reference to plural elements shall, where appropriate, include the singular.
It is intended that the scope of the disclosure not be limited by the specification, but be defined by the claims set forth below. In addition, although narrow claims may be presented below, it should be recognized that the scope of this invention is much broader than presented by the claim(s). It is intended that broader claims will be submitted in one or more applications that claim the benefit of priority from this application. Insofar as the description above and the accompanying drawings disclose additional subject matter that is not within the scope of the claim or claims below, the additional inventions are not dedicated to the public and the right to file one or more applications to claim such additional inventions is reserved.