An aspect of the disclosure here relates to spatializing sound. Other aspects are also described and claimed.
Spatial audio rendering (spatializing sound) may be described as the electronic processing of an audio signal (such as a microphone signal or other recorded or synthesized audio content) to generate multi-channel speaker driver signals that produce sound which is perceived by a listener to be more real. For example, a voice signal (of a person talking) may be electronically processed to generate a virtual point source (of the person's voice) that is perceived by the listener to be emanating from a given location, for example to the right or to the left of the listener, instead of from straight ahead or equally from all directions. Such sound is produced by a spatial audio rendering algorithm that drives a multi-channel speaker setup, e.g., stereo loudspeakers, surround-sound loudspeakers, speaker arrays, or headphones.
An aspect of the disclosure here is a computer-implemented method for reproducing the sound of a data object that may yield a more real listening experience. An audio signal that represents sound of the data object is received by a sound engine. The object includes a visual element to be displayed, e.g., a simulated reality object such as an avatar. The sound engine splits the audio signal into two or more sub-band audio signals including a first sub-band and a second sub-band. The first sub-band may be assigned to a first location in the visual element, and the second sub-band may be assigned to a second location in the visual element that is spaced apart from the first location. A number of speaker driver signals are generated using the sub-band signals, to produce the sound of the object.
In one aspect, this is done by processing the sub-band audio signals, e.g., separately spatializing each sub-band signal, so that sound in the first sub-band emanates from a different location than sound in the second sub-band. Thus, taking a voice signal as an example, the voice signal, which would otherwise be rendered from a single virtual point source (on a virtual mouth), is split into two frequency domain or sub-band components that are assigned to two virtual point sources, respectively, one at the mouth and one at the torso (chest). The mouth sub-band may be in a higher frequency range than the torso sub-band. The speaker driver signals may be binaural left and right headphone driver signals, for driving a headset worn by the listener, or they may be loudspeaker driver signals for a stereo or a surround sound loudspeaker system.
In another aspect, the speaker driver signals may be high frequency and low frequency signals intended for driving the tweeter and the woofer, respectively, of a 2-way speaker system.
In another aspect, one or more cut off frequencies that define the sub-bands are set based on an acoustic characteristic of a room, e.g., its volume or size. The volume of the room may be used to determine the frequency at which sound diffuses around the room, as opposed to remaining directional. The cut off frequency that demarcates the boundary between a low sub-band and a high sub-band may thus change depending on the size of the room.
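By way of illustration only, and not of limitation, one way in which a cut off frequency might be derived from the volume of a room is sketched below. The sketch assumes the well-known Schroeder frequency approximation, f = 2000 * sqrt(RT60 / V), which the disclosure does not name; the function names, the default reverberation time of 0.5 seconds, and the clamping range of 50 Hz to 300 Hz are hypothetical choices made for the example.

```python
import math


def schroeder_frequency_hz(room_volume_m3: float, rt60_s: float) -> float:
    """Estimate the frequency above which the room's sound field is diffuse.

    Assumption: the Schroeder approximation, with volume in cubic metres
    and reverberation time (RT60) in seconds.
    """
    if room_volume_m3 <= 0 or rt60_s <= 0:
        raise ValueError("volume and reverberation time must be positive")
    return 2000.0 * math.sqrt(rt60_s / room_volume_m3)


def choose_cutoff_hz(room_volume_m3: float, rt60_s: float = 0.5,
                     lo_hz: float = 50.0, hi_hz: float = 300.0) -> float:
    """Clamp the estimate to a hypothetical allowed range for the cut off."""
    estimate = schroeder_frequency_hz(room_volume_m3, rt60_s)
    return min(max(estimate, lo_hz), hi_hz)
```

Under these assumptions a larger room yields a lower cut off frequency than a smaller room, so that the boundary between the low sub-band and the high sub-band moves with the size of the room.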
The room may be a virtual room, and a visual element of the object is in the virtual room while both are presented on a display. The listener may be watching the display and wearing a headset (through which the sound of the object is being reproduced.) Alternatively, the room may be a real room in which the listener of the reproduced sound is located, and the listener is wearing a headset while looking through an optical head mounted display in which the object is being presented (as in an augmented reality environment.)
The above summary does not include an exhaustive list of all aspects of the present disclosure. It is contemplated that the disclosure includes all systems and methods that can be practiced from all suitable combinations of the various aspects summarized above, as well as those disclosed in the Detailed Description below and particularly pointed out in the Claims section. Such combinations may have particular advantages not specifically recited in the above summary.
Several aspects of the disclosure here are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that references to “an” or “one” aspect in this disclosure are not necessarily to the same aspect, and they mean at least one. Also, in the interest of conciseness and reducing the total number of figures, a given figure may be used to illustrate the features of more than one aspect of the disclosure, and not all elements in the figure may be required for a given aspect.
Several aspects of the disclosure with reference to the appended drawings are now explained. Whenever the shapes, relative positions and other aspects of the parts described are not explicitly defined, the scope of the invention is not limited only to the parts shown, which are meant merely for the purpose of illustration. Also, while numerous details are set forth, it is understood that some aspects of the disclosure may be practiced without these details. In other instances, well-known circuits, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
One aspect of the disclosure is
The input audio signal (e.g., a monaural signal) is associated with or represents the sound of a data object which is represented by a visual element 2, such as in a simulated reality application program. The visual element 2 of the data object appears on a display 3 after having been rendered by a video engine (not shown). The visual element 2 may be a graphical object area (e.g., drawn on a 2D display) or it may be a graphical object volume (e.g., drawn on a 3D display) of the data object. The data object may be, for example, a person, and the visual element 2 is an avatar of the person, depicted in
The audio system renders a single input audio signal as two or more virtual sound sources or point sources, as follows. A splitter 4 splits the audio signal into two or more sub-band audio signals (components of the input audio signal), including a first sub-band (sub-band A) and a second sub-band (sub-band B.) The splitter may be implemented for example as a filter bank. The sub-band A may be in a higher frequency range of the human audible range than the sub-band B. As an example, the low frequency band (sub-band B) may lie within 50 Hz-200 Hz. In another example, the low frequency band lies within 100 Hz-300 Hz. The high frequency band may lie above those ranges.
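By way of illustration only, a two-band splitter might be sketched as follows. This is a minimal sketch and not the splitter 4 itself: it assumes a one-pole low-pass filter for sub-band B and takes sub-band A as the remainder of the input, whereas a practical filter bank would likely use steeper crossover filters. The function name and parameters are hypothetical.

```python
import math


def split_two_bands(samples, sample_rate_hz, cutoff_hz):
    """Split a mono signal into (sub_band_a, sub_band_b).

    sub_band_b is the low band (one-pole low-pass output) and sub_band_a
    is the high band (input minus low band), so the two bands sum back
    to the input signal.
    """
    # Smoothing coefficient of a one-pole low-pass at the cut off frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    sub_band_a, sub_band_b = [], []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)   # low-pass filter update
        sub_band_b.append(state)       # low-frequency component
        sub_band_a.append(x - state)   # complementary high-frequency component
    return sub_band_a, sub_band_b
```

Because the high band is computed as the remainder, no part of the input signal is lost by the split; the choice trades filter sharpness for simplicity.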
The sub-band A is assigned to a first location in the visual element, which is within the area or volume of the visual element, while the second sub-band is assigned to a second location in the visual element that is spaced apart from the first location (but that is also within the area or volume of the visual element.) As seen in the figure, sub-band A is spatialized as a virtual sound source A or a point source that is located at the person's or avatar's head or mouth, while sub-band B is spatialized as a virtual sound source B located at the person's or avatar's torso. The system generates a set of multi-channel speaker driver signals (two or more speaker driver signals) that drive a listening device to produce the sound of the data object, by processing the two sub-band audio signals and their associated metadata that includes their respective virtual source locations, so that sound of the sub-band A emanates from a different location than sound of the sub-band B. Note here that the location of a virtual sound source may be equivalent to an azimuthal direction or angle, and an elevation direction or angle, for example as viewed from the virtual listening position.
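By way of illustration only, the relationship between a virtual source location and its azimuth and elevation angles, and one very simple way of deriving two speaker driver signals from located sub-band signals, might be sketched as follows. The sketch assumes a coordinate convention (+x to the listener's right, +y straight ahead, +z up) and substitutes constant-power stereo panning by azimuth for a full spatial renderer; a binaural renderer would typically apply head related transfer functions and would also make use of the elevation angle, which this simple panner computes but ignores. All function names are hypothetical.

```python
import math


def direction_from_listener(source_xyz, listener_xyz=(0.0, 0.0, 0.0)):
    """Return (azimuth_deg, elevation_deg) of a source seen from the listener.

    Assumed axes: +x is to the listener's right, +y is ahead, +z is up.
    """
    dx, dy, dz = (s - l for s, l in zip(source_xyz, listener_xyz))
    azimuth = math.degrees(math.atan2(dx, dy))  # 0 = ahead, positive = right
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, elevation


def constant_power_gains(azimuth_deg):
    """Map an azimuth, clamped to [-90, 90] degrees, to (left, right) gains."""
    clamped = max(-90.0, min(90.0, azimuth_deg))
    theta = (clamped + 90.0) / 180.0 * (math.pi / 2.0)  # 0 .. pi/2
    return math.cos(theta), math.sin(theta)


def render_stereo(located_sub_bands):
    """Mix (samples, source_xyz) pairs into left and right driver signals."""
    length = max(len(samples) for samples, _ in located_sub_bands)
    left, right = [0.0] * length, [0.0] * length
    for samples, source_xyz in located_sub_bands:
        azimuth, _elevation = direction_from_listener(source_xyz)
        gain_left, gain_right = constant_power_gains(azimuth)
        for i, x in enumerate(samples):
            left[i] += gain_left * x
            right[i] += gain_right * x
    return left, right
```

In such a sketch, sub-band A would be passed with the location of the mouth and sub-band B with the location of the torso, each as a (samples, source_xyz) pair.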
In the example of
Turning now to
In another instance, rather than spatializing the sound of the data object, sound of the first sub-band signal is produced by a high frequency speaker driver, e.g., a tweeter, while sound of the second sub-band signal is produced by a low frequency speaker driver, e.g., a woofer, of a 2-way or multi-way speaker system. Those speaker drivers may be integrated into the same housing of a listening device such as a laptop computer, a tablet computer, or a head mounted device. In those instances, the listening device also has therein (either integrated or mounted) the display 3.
Another aspect of the disclosure here is to add an audio processing effect, namely a frequency-dependent directivity or a frequency-and-gain dependent directivity, into the chain of signal processing being performed upon the sub-band A audio signal (e.g., a high-frequency band being rendered as emanating from the source, which in this case is the avatar's mouth). In
While certain aspects have been described and shown in the accompanying drawings, it is to be understood that such are merely illustrative of and not restrictive on the broad invention, and that the invention is not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those of ordinary skill in the art. The description is thus to be regarded as illustrative instead of limiting.
Number | Date | Country
---|---|---
63278265 | Nov 2021 | US