This disclosure relates to the technical field of audio processing, in particular to a reverberation processing method, a reverberation processing device and a non-transitory computer-readable storage medium.
Environmental acoustic phenomena are ubiquitous in reality. Therefore, in an immersive virtual environment, in order to simulate as much as possible the various information that the real world gives to humans, it is necessary to simulate with high quality the influence of the virtual scene on the sound within it, so as not to break the user's sense of immersion.
In related arts, there are mainly three categories of methods to simulate environmental acoustic phenomena: wave solvers based on finite element analysis, ray tracing and simplification of the geometric shape of the environment.
According to some embodiments of the present disclosure, there is provided a reverberation processing method, including: estimating shape information of a scene according to a plurality of intersection points of a plurality of sound rays centered on a listener with the scene; calculating a first average acoustic parameter value of a scene material of the scene according to first acoustic parameter values of the scene materials at positions of the plurality of intersection points; and calculating a reverberation time according to the shape information of the scene and the first average acoustic parameter value.
In some embodiments, the estimating the shape information of the scene according to the plurality of intersection points of the plurality of sound rays centered on the listener with the scene includes: calculating a coordinate of an average intersection point according to an average value of coordinates of the plurality of intersection points; and estimating the shape information of the scene according to an average value of distances between each of the plurality of intersection points and the average intersection point.
In some embodiments, the calculating the first average acoustic parameter value of the scene material of the scene according to the first acoustic parameter values of the scene materials at the positions of the plurality of intersection points includes: calculating an average absorption rate of the scene material of the scene according to an average value of absorption rates of the scene materials at the positions of the plurality of intersection points.
In some embodiments, a shape of the scene is a cube, and the shape information includes a side length of the cube.
In some embodiments, the calculating the reverberation time according to the shape information of the scene and the first average acoustic parameter value includes: calculating the reverberation time according to a side length of the scene and the average absorption rate of the scene material of the scene.
In some embodiments, the processing method further includes: calculating a second average acoustic parameter value of the scene material of the scene according to second acoustic parameter values of the scene materials at the positions of the plurality of intersection points; and performing a reverberation processing on a sound source signal according to the second average acoustic parameter value and the reverberation time.
In some embodiments, the calculating the second average acoustic parameter value of the scene material of the scene according to the second acoustic parameter values of the scene materials at the positions of the plurality of intersection points includes: calculating an average scattering rate of the scene material of the scene according to an average value of scattering rates of the scene materials at the positions of the plurality of intersection points.
In some embodiments, the performing the reverberation processing on the sound source signal includes: performing a filtering processing on the sound source signal using an all-pass filter, where the all-pass filter is controlled according to the second average acoustic parameter value.
In some embodiments, the performing the reverberation processing on the sound source signal includes: performing the reverberation processing using one or more feedback gains based on a result of the filtering processing, where the one or more feedback gains are controlled according to the reverberation time.
In some embodiments, the one or more feedback gains are a plurality of feedback gains, and each of the plurality of feedback gains is determined according to a corresponding delay time.
In some embodiments, the performing the reverberation processing using the one or more feedback gains based on the result of the filtering processing includes: performing a delay processing on the result of the filtering processing; processing a result of the delay processing using a reflection matrix; and processing a processing result of the reflection matrix using the one or more feedback gains.
In some embodiments, the performing the delay processing on the result of the filtering processing includes: performing the delay processing on the result of the filtering processing respectively using a plurality of delay times.
In some embodiments, the performing the delay processing on the result of the filtering processing includes: performing the delay processing on a sum of the result of the filtering processing and a processing result using the one or more feedback gains.
According to other embodiments of the present disclosure, there is provided a reverberation processing device, including: an estimation unit configured to estimate shape information of a scene according to a plurality of intersection points of a plurality of sound rays centered on a listener with the scene; and a calculation unit configured to calculate a first average acoustic parameter value of a scene material of the scene according to first acoustic parameter values of the scene materials at positions of the plurality of intersection points, and to calculate a reverberation time according to the shape information of the scene and the first average acoustic parameter value.
In some embodiments, the estimation unit calculates a coordinate of an average intersection point according to an average value of coordinates of the plurality of intersection points; and estimates the shape information of the scene according to an average value of distances between each of the plurality of intersection points and the average intersection point.
In some embodiments, the calculation unit calculates an average absorption rate of the scene material of the scene according to an average value of absorption rates of the scene materials at the positions of the plurality of intersection points.
In some embodiments, a shape of the scene is a cube, and the shape information includes a side length of the cube.
In some embodiments, the calculation unit calculates the reverberation time according to a side length of the scene and the average absorption rate of the scene material of the scene.
In some embodiments, the calculation unit calculates a second average acoustic parameter value of the scene material of the scene according to second acoustic parameter values of the scene materials at the positions of the plurality of intersection points; and the processing device further includes a processing unit configured to perform a reverberation processing on a sound source signal according to the second average acoustic parameter value and the reverberation time.
In some embodiments, the calculation unit calculates an average scattering rate of the scene material of the scene according to an average value of scattering rates of the scene materials at the positions of the plurality of intersection points.
In some embodiments, the processing unit performs a filtering processing on the sound source signal using an all-pass filter, where the all-pass filter is controlled according to the second average acoustic parameter value.
In some embodiments, the processing unit performs the reverberation processing using one or more feedback gains based on a result of the filtering processing, where the one or more feedback gains are controlled according to the reverberation time.
In some embodiments, the one or more feedback gains are a plurality of feedback gains, and each of the plurality of feedback gains is determined according to a corresponding delay time.
In some embodiments, the processing unit performs a delay processing on the result of the filtering processing; processes a result of the delay processing using a reflection matrix; and processes a processing result of the reflection matrix using the one or more feedback gains.
In some embodiments, the processing unit performs the delay processing on the result of the filtering processing respectively using a plurality of delay times.
In some embodiments, the processing unit performs the delay processing on a sum of the result of the filtering processing and a processing result using the one or more feedback gains.
According to still other embodiments of the present disclosure, there is provided a reverberation processing device, including: a memory; and a processor coupled to the memory, the processor configured to perform, based on instructions stored in the memory, the reverberation processing method according to any one of the above embodiments.
According to still further embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements the reverberation processing method according to any one of the above embodiments.
According to still further embodiments of the present disclosure, there is also provided a computer program including instructions that, when executed by a processor, cause the processor to carry out a reverberation processing method according to any one of the above embodiments.
According to still further embodiments of the present disclosure, there is also provided a computer program product including instructions that, when executed by a processor, cause the processor to carry out a reverberation processing method according to any one of the above embodiments.
Other features and advantages of the present disclosure will become clear through detailed descriptions of the exemplary embodiments of the present disclosure with reference to the following accompanying drawings.
The drawings described herein are used to provide further understanding of the present disclosure and constitute a portion of the present application. The schematic embodiments of the present disclosure and their description are used for explaining the present disclosure, and do not constitute improper limitations of the present disclosure. In the accompanying drawings:
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. Obviously, the described embodiments are only a part, rather than all, of the embodiments of the present disclosure. The following description of at least one illustrative embodiment is merely illustrative, and shall not be construed as any limitation on the present disclosure or its application or use. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without any creative effort fall within the protection scope of the present disclosure.
Unless otherwise specifically stated, the relative arrangements, mathematical expressions and values of the components and steps illustrated in these embodiments do not limit the scope of the present disclosure. Meanwhile, it shall be understood that, for ease of description, the dimensions of the various parts shown in the drawings are not drawn according to actual proportional relations. Techniques, methods and devices already known to those of ordinary skill in the art may not be discussed here in detail, but where appropriate, such techniques, methods and devices shall be deemed as part of the specification. In all the examples shown and discussed here, any specific value should be interpreted as merely illustrative and not as a limitation; other examples of the exemplary embodiments may therefore have different values. It should be noted that similar numerals and letters indicate similar items in the following drawings, so once an item is defined in one drawing, it does not need to be further discussed in subsequent drawings.
All sounds in the real world are spatial audio. Sound originates from the vibration of objects and is heard after propagating through a medium. In the real world, vibrating objects can appear anywhere, and they form a three-dimensional direction vector with the human head. The horizontal angle of the direction vector affects the loudness difference, time difference and phase difference of the sound reaching the two ears, and the vertical angle of the direction vector also affects the frequency response of the sound reaching the two ears. It is by relying on this physical information that human beings, through a great deal of acquired and unconscious training, acquire the ability to determine the position of a sound source from binaural sound signals.
In an immersive virtual environment, in order to simulate as much as possible the various information that the real world gives to humans, it is also necessary to simulate with high quality the impact of sound position on the binaural signals heard, so as not to break the user's sense of immersion. In a static environment, this impact is determined by the position of the sound source and the position of the listener, and in this case it can be expressed by an HRTF (Head Related Transfer Function). An HRTF is a two-channel FIR (Finite Impulse Response) filter. By convolving the original signal with the HRTF at a designated position, the signal heard in a case where the sound source is at the designated position can be obtained.
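As a rough illustration of this convolution step (a minimal sketch; the HRTF data below are random placeholders, not measured responses), in Python:

```python
import numpy as np

def render_binaural(mono, hrtf_left, hrtf_right):
    """Convolve a mono source with the two channels of an HRTF.

    The result approximates the binaural signal heard when the source
    is at the position for which the HRTF was measured.
    """
    left = np.convolve(mono, hrtf_left)
    right = np.convolve(mono, hrtf_right)
    return np.stack([left, right], axis=1)  # shape: (num_samples, 2)

# Placeholder 128-tap "HRTFs" standing in for measured filters.
rng = np.random.default_rng(0)
hrtf_l, hrtf_r = 0.1 * rng.normal(size=128), 0.1 * rng.normal(size=128)
mono = np.sin(np.linspace(0.0, 200.0 * np.pi, 9600))  # 100-cycle test tone
stereo = render_binaural(mono, hrtf_l, hrtf_r)
```

For N sound sources, this function would run once per source, which is exactly the 2N-convolution cost discussed next.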
However, one HRTF can only represent the relative position relationship between one fixed sound source and one fixed listener. In a case where N sound sources need to be rendered, N HRTFs are theoretically needed to perform 2N convolutions on the N original signals. Moreover, in a case where the listener rotates, all N HRTFs need to be updated to correctly render the virtual spatial audio scene, resulting in a large amount of computation.
In order to solve the problem of multi-source rendering and 3DOF (3 Degrees of Freedom) rotation of the listener, spherical harmonics are applied to spatial audio rendering. The basic idea of spherical harmonics (Ambisonics) is to imagine that the sound is distributed on a sphere, with N signal channels pointing in different directions, each responsible for the sound in its corresponding direction. The spatial audio rendering algorithm based on Ambisonics is roughly as follows: each sound source signal is encoded into the Ambisonics channels according to its direction; in a case where the listener rotates, the encoded sound field is rotated accordingly; and the Ambisonics signal is finally rendered to both ears or to a speaker array.
In this way, the number of convolutions is only related to the number of Ambisonics channels and has nothing to do with the number of sound sources, and encoding the sound sources into Ambisonics is much faster than convolution. Moreover, in a case where the listener rotates, all Ambisonics channels can be rotated together, and the amount of calculation again has nothing to do with the number of sound sources. In addition to being rendered to both ears, the Ambisonics signal can also be simply rendered to a speaker array.
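A minimal first-order sketch in Python illustrates why the per-source cost is only a few multiplies per sample and why a listener rotation touches only the channel mix; the SN3D-style encoding coefficients are one common convention, assumed here rather than taken from the source:

```python
import numpy as np

def encode_foa(signal, azimuth, elevation):
    """Encode a mono signal into first-order Ambisonics channels (W, X, Y, Z)."""
    w = signal / np.sqrt(2.0)                        # omnidirectional channel
    x = signal * np.cos(azimuth) * np.cos(elevation)
    y = signal * np.sin(azimuth) * np.cos(elevation)
    z = signal * np.sin(elevation)
    return np.stack([w, x, y, z])

def rotate_yaw(foa, yaw):
    """Rotate the encoded sound field about the vertical axis.

    Only the X and Y channels mix; the cost is independent of how many
    sources were encoded into the field.
    """
    w, x, y, z = foa
    xr = np.cos(yaw) * x - np.sin(yaw) * y
    yr = np.sin(yaw) * x + np.cos(yaw) * y
    return np.stack([w, xr, yr, z])

# Any number of sources sum into the same four channels.
rng = np.random.default_rng(0)
field = sum(encode_foa(rng.normal(size=480), az, 0.0) for az in (0.0, 1.2, -2.0))
field = rotate_yaw(field, -np.pi / 4)  # compensate a 45-degree head turn
```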
In the real world, human beings, as well as other animals, perceive not only the direct sound that reaches the ear straight from the sound source, but also the vibration waves of the sound source that are reflected, scattered and diffracted by the environment. Environmental reflected and scattered sounds directly affect the listener's auditory perception of the sound source and of the listener's own environment. This perception ability is the basic principle by which nocturnal animals such as bats can locate their own positions in the dark and understand their surroundings.
Humans may not be as sensitive in hearing as bats, but they can also obtain a lot of information by listening to the influence of the environment on a sound source. For example, when listening to a singer performing, due to different reverberation times, it is easy to distinguish whether one is hearing the performance in a large cathedral or in a parking lot. Because the ratio of reverberation to direct sound differs, even within the cathedral, it is possible to clearly distinguish whether one is listening one meter or twenty meters directly in front of the singer. Likewise, in the cathedral scene, due to differences in the loudness of the early reflected sounds, it is possible to clearly distinguish whether the singer is singing in the center of the cathedral or only ten centimeters away from a wall.
The wave solver based on finite element analysis (wave physical simulation) divides the space to be calculated into densely arranged cubes called “voxels” (similar to the concept of pixels, except that pixels are the smallest area units on a two-dimensional plane while voxels are the smallest volume units in three-dimensional space). The basic process of the algorithm is to inject the source signal into this voxel grid and to advance the discretized wave equation voxel by voxel over successive time slices until the impulse response at the listener has been recorded.
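The concrete steps are not reproduced here; as a generic illustration of such a solver, the following toy two-dimensional finite-difference update advances sound pressure on a voxel grid (periodic boundaries, uniform medium, illustrative parameters only; a real solver would add boundary and material handling):

```python
import numpy as np

# Toy 2-D wave solver: each grid cell is a "voxel" holding sound pressure.
nx, ny, steps = 100, 100, 200
c, dx, dt = 343.0, 0.1, 1e-4            # speed of sound, grid step, time step
assert c * dt / dx < 1 / np.sqrt(2)     # CFL stability condition in 2-D

p_prev = np.zeros((nx, ny))
p_curr = np.zeros((nx, ny))
p_curr[50, 50] = 1.0                    # impulsive source in one voxel

for _ in range(steps):
    lap = (np.roll(p_curr, 1, 0) + np.roll(p_curr, -1, 0)
           + np.roll(p_curr, 1, 1) + np.roll(p_curr, -1, 1)
           - 4.0 * p_curr) / dx**2      # discrete Laplacian
    p_next = 2.0 * p_curr - p_prev + (c * dt) ** 2 * lap
    p_prev, p_curr = p_curr, p_next

response_sample = p_curr[20, 80]        # pressure sampled at a listener voxel
```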
The room acoustic simulation algorithm based on the wave solver has the following advantages:
Its accuracy in both time and space is very high; as long as sufficiently small voxels and a sufficiently short time slice are used, it can be adapted to scenes of any shape and material.
At the same time, the algorithm has the following disadvantages: the amounts of computation and memory grow rapidly with the scene volume and with the highest frequency to be simulated, so the simulation is generally precomputed offline and is difficult to run in real time, especially on devices with limited computing power.
The core idea of the ray tracing algorithm is to find as many sound propagation paths as possible from the sound source to the listener, so as to obtain the energy direction, delay and filtering characteristics brought by these paths. This kind of algorithm is at the core of the room acoustic simulation systems of Oculus and Wwise.
The algorithm for finding the propagation paths from the sound source to the listener can be simply summed up in the following steps: a large number of rays are emitted from the sound source along sampled initial directions; each ray is reflected or scattered wherever it intersects the scene geometry; and each ray that reaches the listener is recorded as one propagation path, together with the distance it has traveled and the materials it has touched.
At this point, some path information has been recorded for each sound source. This information is then used to calculate the energy direction, delay and filtering characteristics of each path of each sound source. These pieces of information are collectively referred to as the spatial impulse response between the sound source and the listener.
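A toy sketch of the path-finding loop follows; the `scene.intersect(origin, direction)` helper is hypothetical (assumed to return a hit point, surface normal and absorption, or None), and real systems such as those named above are far more elaborate:

```python
import numpy as np

def trace_paths(source, listener, scene, n_rays=1024, max_bounces=8,
                listener_radius=0.5, rng=None):
    """Toy path finder: returns (path_length, residual_energy) per found path."""
    rng = rng or np.random.default_rng()
    paths = []
    for _ in range(n_rays):
        d = rng.normal(size=3)
        d /= np.linalg.norm(d)                     # random initial direction
        pos, length, energy = np.asarray(source, float), 0.0, 1.0
        for _ in range(max_bounces):
            hit = scene.intersect(pos, d)          # hypothetical helper
            if hit is None:
                break
            point, normal, absorption = hit
            length += np.linalg.norm(point - pos)
            energy *= 1.0 - absorption             # energy lost at the surface
            d = d - 2.0 * np.dot(d, normal) * normal   # specular reflection
            to_l = np.asarray(listener, float) - point
            proj = np.dot(d, to_l)
            if proj > 0.0 and np.linalg.norm(to_l - proj * d) < listener_radius:
                # Reflected ray passes close enough to the listener: record it.
                paths.append((length + np.linalg.norm(to_l), energy))
            pos = point
    return paths
```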
Finally, as long as the spatial impulse response of each sound source is auralized, a very realistic orientation and distance of the sound source, as well as the characteristics of the environment where the sound source and the listener are located, can be simulated. There are two methods for auralizing the spatial impulse response: one is to synthesize the spatial impulse response into a BRIR (Binaural Room Impulse Response) and convolve it with the original signal; the other is to encode the original signal into spherical harmonics according to the spatial impulse response and then render it.
The environmental acoustic simulation algorithm based on ray tracing has the following advantages: it adapts to scenes of arbitrary shape, including dynamically changing ones, and its amount of computation is far smaller than that of a wave solver for the same scene.
This type of algorithm also has the following disadvantages:
The accuracy of the algorithm depends heavily on the sampling amount of the initial ray directions, that is, on using more rays. However, because the complexity of the ray tracing algorithm is O(n·log(n)), more rays inevitably bring a sharp increase in the amount of computation.
Whether using BRIR convolution or encoding the original signal into spherical harmonics, the amount of computation is considerable. As the number of sound sources in the scene increases, the amount of computation grows linearly, which is unfriendly to mobile devices with limited computing power.
The idea of the algorithm for simplifying the geometric shape of the environment is to try to find an approximate but much simpler geometric shape and surface material after the geometric shape and surface material of the current scene are given, so as to greatly reduce the computation amount of the environmental acoustic simulation. For example, the scene may be approximated by a small number of simple geometric primitives, such as boxes, whose surfaces are assigned averaged material parameters.
This type of algorithm has the following advantages: since the simplified geometry contains far fewer faces than the original scene, the amount of computation is greatly reduced and the rendering speed is fast.
However, this type of algorithm has the following disadvantages: the approximation sacrifices rendering quality, and because the simplification is typically precomputed, dynamically changing scenes cannot be supported.
That is to say, the environmental acoustic simulation algorithm that simplifies the geometric shape of the environment provides a fast rendering speed, but sacrifices the rendering quality, and the rendering framework cannot support dynamically changing scenes, such as opening and closing doors.
Aiming at the above technical problems, the present disclosure renders the influence of the dynamically changing scenes on the environmental sound without significantly affecting the rendering speed, so that devices with weak computing power can also simulate the dynamic environmental sound of a large number of sound sources in real time. Therefore, the efficiency and accuracy of sound rendering can be improved.
As shown in the accompanying drawings, in step 110, shape information of a scene is estimated according to a plurality of intersection points of a plurality of sound rays centered on a listener with the scene.
In some embodiments, a coordinate of an average intersection point is calculated according to an average value of the coordinates of the plurality of intersection points; and the shape information of the scene is estimated according to an average value of the distances between each of the plurality of intersection points and the average intersection point.
In some embodiments, the shape of the scene is a cube, and the shape information includes the side length of the cube. Alternatively, the shape of the scene can be another shape such as a rectangular box.
For example, centered on the listener, N sound rays are randomly and evenly scattered in all directions, and N intersection points Pn, n∈(1, N), of these sound rays with the scene are obtained. The coordinate of the average intersection point is calculated as follows: P̄ = (1/N)·ΣPn, summing over n = 1, …, N.
The shape information of an approximately cubic room is then calculated: the average distance from all intersection points Pn to the coordinate P̄ is computed as r̄ = (1/N)·Σ‖Pn − P̄‖, where the side length of the cubic room is estimated to be 2r̄.
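A small sketch of this estimate in Python (variable names are illustrative, not the source's):

```python
import numpy as np

def estimate_cubic_room(intersections):
    """Estimate a cubic room from N ray/scene intersection points P_n.

    Returns the average intersection point P_bar and the estimated side
    length 2 * r_bar, where r_bar is the mean distance from P_n to P_bar.
    """
    pts = np.asarray(intersections, dtype=float)        # shape (N, 3)
    p_bar = pts.mean(axis=0)                            # average intersection point
    r_bar = np.linalg.norm(pts - p_bar, axis=1).mean()  # average distance
    return p_bar, 2.0 * r_bar

# Toy usage: N points on the surface of a 4 m cube centered at the origin.
rng = np.random.default_rng(0)
n = 256
pts = rng.uniform(-2.0, 2.0, size=(n, 3))
axis = rng.integers(0, 3, size=n)
pts[np.arange(n), axis] = rng.choice([-2.0, 2.0], size=n)  # snap to a face
center, side = estimate_cubic_room(pts)  # side: rough estimate of the 4 m room
```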
In step 120, a first average acoustic parameter value of a scene material of the scene is calculated according to first acoustic parameter values of the scene materials at the positions of the plurality of intersection points.
In some embodiments, an average absorption rate of the scene material of the scene is calculated according to an average value of absorption rates of the scene materials at the positions of the plurality of intersection points.
In some embodiments, a second average acoustic parameter value of the scene material of the scene is calculated according to second acoustic parameter values of the scene materials at the positions of the plurality of intersection points. For example, an average material scattering rate of the scene is calculated according to an average value of material scattering rates at the positions of the plurality of intersection points.
For example, the average acoustic parameter of the scene material is calculated. It is assumed that the absorption rate of the scene material is An and the scattering rate of the scene material is Sn at each intersection point Pn of the N sound rays with the scene mentioned above.
For example, the average absorption rate is: Ā = (1/N)·ΣAn.
For example, the average scattering rate is: S̄ = (1/N)·ΣSn.
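Concretely, with made-up per-intersection values:

```python
import numpy as np

# Hypothetical absorption rates An and scattering rates Sn at N = 4
# intersection points.
A = np.array([0.20, 0.35, 0.10, 0.25])
S = np.array([0.50, 0.40, 0.60, 0.55])
A_bar = A.mean()  # average absorption rate, (1/N)*sum(An) = 0.225
S_bar = S.mean()  # average scattering rate, (1/N)*sum(Sn) = 0.5125
```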
In step 130, the reverberation time is calculated according to the shape information of the scene and the first average acoustic parameter value. For example, the reverberation time is calculated according to the side length of the scene and the average absorption rate of the scene material of the scene.
In some embodiments, reverberation processing is performed on the sound source signal according to the second average acoustic parameter value and the reverberation time.
For example, the reverberation time is calculated by using the estimated cubic room, the average absorption rate of the material and the Eyring formula: T60 = 0.161·V/(−S·ln(1 − Ā)).
Where S is the indoor surface area of the cubic room, and V is the net volume of the cubic room.
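A sketch of this calculation for the estimated cube (side length and absorption values are illustrative):

```python
import numpy as np

def eyring_t60(side_length, avg_absorption):
    """Reverberation time of a cubic room via the Eyring formula.

    T60 = 0.161 * V / (-S * ln(1 - A_bar)), with V = L**3 and S = 6 * L**2
    for a cube of side length L.
    """
    volume = side_length ** 3
    surface = 6.0 * side_length ** 2
    return 0.161 * volume / (-surface * np.log(1.0 - avg_absorption))

t60 = eyring_t60(side_length=4.0, avg_absorption=0.3)  # roughly 0.30 s
```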
In a case where the position of the listener changes, the sound rays emitted from the listener may intersect with different surfaces of the scene objects, causing the reverberation time T60 and the average scattering rate S̄ to change accordingly.
In some embodiments, reverberation processing is performed on the sound source signal according to the reverberation time.
In some embodiments, an average scattering rate of the scene material of the scene is calculated according to an average value of scattering rates of the scene materials at the positions of the plurality of intersection points.
In some embodiments, a filtering processing is performed on the sound source signal using an all-pass filter, where the all-pass filter is controlled according to the second average acoustic parameter value.
In some embodiments, based on the result of the filtering processing, the reverberation processing is performed using one or more feedback gains, where the one or more feedback gains are controlled based on the reverberation time.
In some embodiments, one or more feedback gains are a plurality of feedback gains, and each of the plurality of feedback gains is determined according to the corresponding delay time.
For example, 16 feedback gains are controlled by the reverberation time T60 as follows: gain(n) = 10^(−3·delay(n)/T60).
Where delay(n) is the delay time corresponding to the feedback gain n.
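A sketch of this gain computation (delay values are illustrative); each gain produces 60 dB of attenuation after T60, consistent with the relation above:

```python
import numpy as np

def feedback_gains(delays_s, t60):
    """One gain per delay line: gain(n) = 10 ** (-3 * delay(n) / T60)."""
    return 10.0 ** (-3.0 * np.asarray(delays_s) / t60)

delays = np.linspace(0.020, 0.065, 16)  # 16 mutually different delay times (s)
gains = feedback_gains(delays, t60=0.8)
```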
In some embodiments, delay processing is performed on the result of the filtering processing; the result of the delay processing is processed using a reflection matrix; a processing result of the reflection matrix is processed using the one or more feedback gains.
In some embodiments, the second average acoustic parameter of the scene material includes the scattering rate of the scene material; the all-pass filter is set according to the scattering rate of the scene material.
In some embodiments, a plurality of delay times are used to respectively perform delay processing on the result of filtering processing.
In some embodiments, the delay processing is performed on a sum of the result of the filtering processing and the processing result using the one or more feedback gains.
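One way to assemble these pieces is sketched below: a feedback delay network in the spirit of the description, with a normalized Hadamard matrix standing in for the unspecified reflection matrix; all parameter values are illustrative, not taken from the source:

```python
import numpy as np

def fdn_reverb(x, fs, delays_s, t60):
    """Minimal feedback-delay-network sketch.

    Per sample: the (already all-pass filtered) input plus the gain-scaled
    feedback is written into each delay line; the delay-line outputs are
    mixed by a reflection (here Hadamard) matrix and scaled by per-line
    gains gain(n) = 10 ** (-3 * delay(n) / T60).
    """
    n_lines = len(delays_s)                      # must be a power of two here
    delays = np.maximum(1, (np.asarray(delays_s) * fs).astype(int))
    gains = 10.0 ** (-3.0 * np.asarray(delays_s) / t60)
    h = np.array([[1.0]])
    while h.shape[0] < n_lines:                  # build a Hadamard matrix
        h = np.block([[h, h], [h, -h]])
    h /= np.sqrt(n_lines)                        # orthonormal reflection matrix

    bufs = [np.zeros(d) for d in delays]         # circular delay buffers
    idx = np.zeros(n_lines, dtype=int)
    y = np.zeros_like(x, dtype=float)
    for i, sample in enumerate(x):
        outs = np.array([bufs[n][idx[n]] for n in range(n_lines)])
        y[i] = outs.sum()                        # wet output tap
        feedback = gains * (h @ outs)            # reflection matrix, then gains
        for n in range(n_lines):
            bufs[n][idx[n]] = sample + feedback[n]  # delay input + feedback sum
            idx[n] = (idx[n] + 1) % delays[n]
    return y

fs = 48000
x = np.zeros(fs); x[0] = 1.0                     # impulse through the reverb
wet = fdn_reverb(x, fs, np.linspace(0.020, 0.065, 16), t60=0.8)
```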
In some embodiments, the input information for calculating the reverberation model is the scene shape, the absorption rate of the scene, the scattering rate of the scene and the position of the listener.
Centered on the listener, N sound rays are randomly and evenly scattered in all directions, and N intersection points Pn, n∈(1, N), of these sound rays with the scene are obtained. The coordinate of the average intersection point is calculated as follows: P̄ = (1/N)·ΣPn.
The shape information of an approximately cubic room is calculated: the average distance from all intersection points Pn to the coordinate P̄ is r̄ = (1/N)·Σ‖Pn − P̄‖. It is assumed that the side length of the cubic room is 2r̄.
The average acoustic parameter of the scene material is calculated. It is assumed that the absorption rate of the scene material is An and the scattering rate of the scene material is Sn at each intersection point Pn of the N sound rays with the scene mentioned above.
For example, the average absorption rate is: Ā = (1/N)·ΣAn.
For example, the average scattering rate is: S̄ = (1/N)·ΣSn.
For example, the reverberation time is calculated by using the estimated cubic room, the average absorption rate of the material and the Eyring formula: T60 = 0.161·V/(−S·ln(1 − Ā)), where S is the indoor surface area of the cubic room and V is the net volume of the cubic room.
In a case where the position of the listener changes, the rays emitted from the listener may intersect with different surfaces of the scene objects, causing the reverberation time T60 and the average scattering rate S̄ to change accordingly.
Corresponding to these changing sound field parameters, a reverberation processing chain is proposed that can dynamically adjust the reverberation time and the time-domain density of reflected sounds during operation, so that the calculated, changing sound field parameters dynamically affect the reverberation that is heard, thus realizing dynamic reverberation related to the scene. Dynamic reverberation can produce different reverberation effects following the listener's 6DoF (6 Degrees of Freedom) movement.
The input signal of the dynamic reverberator is the original signal of the sound source, or the original sound source signal processed by one or more of the following effects: loudness attenuation, air absorption filtering, delay processing or a spatialization algorithm.
There are many implementation methods for this dynamically adjustable artificial reverberator; one of the embodiments is shown in the accompanying drawings.
The structure of the Schroeder all-pass filter is shown in the accompanying drawings; its coefficient g is set according to the average scattering rate S̄ of the scene material.
The coefficient 0.3 can also be replaced by other values as required. The larger the value of g, the more dispersed the input energy is on the time axis.
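A sample-by-sample sketch of one Schroeder all-pass section; the mapping from the average scattering rate to g shown in the usage line is hypothetical, since the exact relation is not reproduced here:

```python
import numpy as np

def schroeder_allpass(x, delay_samples, g):
    """Schroeder all-pass: y[n] = -g*x[n] + x[n-m] + g*y[n-m]."""
    y = np.zeros_like(x, dtype=float)
    for n in range(len(x)):
        x_d = x[n - delay_samples] if n >= delay_samples else 0.0
        y_d = y[n - delay_samples] if n >= delay_samples else 0.0
        y[n] = -g * x[n] + x_d + g * y_d
    return y

# Hypothetical mapping: scale the average scattering rate by the 0.3
# coefficient mentioned above; larger g disperses the impulse further.
S_bar = 0.5
g = 0.3 * S_bar
out = schroeder_allpass(np.r_[1.0, np.zeros(999)], delay_samples=113, g=g)
```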
As shown in the accompanying drawings, the result of the all-pass filtering is delayed by a plurality of delay lines, the outputs of the delay lines are processed by a reflection matrix, and the processing result of the reflection matrix is scaled by feedback gains controlled by the reverberation time T60, for example as: gain(n) = 10^(−3·delay(n)/T60).
Where delay(n) is the delay time corresponding to the nth feedback gain.
In the above embodiments, the reverberation model in the dynamically changing scene is calculated by estimating the simplified shape of the room in real time; the dynamically adjustable artificial reverberation is controlled by the reverberation model. In this way, the influence of the dynamically changing scene on the environmental sound can be rendered without significantly affecting the rendering speed, so that devices with weak computing power can also simulate the dynamic ambient sound of a large number of sound sources in real time. Therefore, the efficiency and accuracy of sound rendering can be improved.
As shown in the accompanying drawings, the reverberation processing device 4 includes: an estimation unit 41 configured to estimate shape information of a scene according to a plurality of intersection points of a plurality of sound rays centered on a listener with the scene; and a calculation unit 42 configured to calculate a first average acoustic parameter value of a scene material of the scene according to first acoustic parameter values of the scene materials at the positions of the plurality of intersection points, and to calculate a reverberation time according to the shape information of the scene and the first average acoustic parameter value.
In some embodiments, the estimation unit 41 calculates a coordinate of an average intersection point according to an average value of coordinates of the plurality of intersection points, and estimates the shape information of the scene according to an average value of the distances between each of the plurality of intersection points and the average intersection point.
In some embodiments, the calculation unit 42 calculates an average absorption rate of the scene material of the scene according to an average value of absorption rates of the scene materials at the positions of the plurality of intersection points.
In some embodiments, a shape of the scene is a cube, and the shape information includes a side length of the cube.
In some embodiments, the calculation unit 42 calculates the reverberation time according to a side length of the scene and the average absorption rate of the scene material of the scene.
In some embodiments, the calculation unit 42 calculates a second average acoustic parameter value of the scene material of the scene according to second acoustic parameter values of the scene materials at the positions of the plurality of intersection points; the processing device 4 further includes a processing unit 43 configured to perform a reverberation processing on a sound source signal according to the second average acoustic parameter value and the reverberation time.
In some embodiments, the calculation unit 42 calculates an average scattering rate of the scene material of the scene according to an average value of scattering rates of the scene materials at the positions of the plurality of intersection points.
In some embodiments, the processing unit 43 performs a filtering processing on the sound source signal using an all-pass filter, where the all-pass filter is controlled according to the second average acoustic parameter value.
In some embodiments, the processing unit 43 performs the reverberation processing using one or more feedback gains based on a result of the filtering processing, where the one or more feedback gains are controlled according to the reverberation time.
In some embodiments, the one or more feedback gains are a plurality of feedback gains, and each of the plurality of feedback gains is determined according to a corresponding delay time.
In some embodiments, the processing unit 43 performs a delay processing on the result of the filtering processing, processes a result of the delay processing using a reflection matrix, and processes a processing result of the reflection matrix using the one or more feedback gains.
In some embodiments, the processing unit 43 performs the delay processing on the result of the filtering processing respectively using a plurality of delay times.
In some embodiments, the processing unit 43 performs the delay processing on a sum of the result of the filtering processing and a processing result using one or more feedback gains.
As shown in the accompanying drawings, the reverberation processing device 5 includes: a memory 51; and a processor coupled to the memory 51 and configured to perform, based on instructions stored in the memory 51, the reverberation processing method according to any one of the above embodiments.
The memory 51 may include, for example, a system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader, a database and other programs.
As shown in the accompanying drawings, the reverberation processing device 6 includes a memory 610 and a processor 620 coupled to the memory 610, the processor 620 being configured to execute, based on instructions stored in the memory 610, the reverberation processing method according to any one of the above embodiments.
Memory 610 may include, for example, a system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader, and other programs.
The reverberation processing device 6 may also include an input/output interface 630, a network interface 640, a storage interface 650, etc. These interfaces 630, 640, 650, the memory 610 and the processor 620 may be connected, for example, by a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, a touch screen, a microphone and a speaker. The network interface 640 provides a connection interface for various networking devices. The storage interface 650 provides a connection interface for external storage devices such as an SD card and a USB flash disk.
It shall be understood by those skilled in the art that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, embodiments of the present disclosure can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. Moreover, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including but not limited to disks, CD-ROM, optical storage, etc.) having computer-usable program code embodied in the medium.
Heretofore, all the embodiments of the present disclosure have been described in detail. In order to avoid obscuring the concept of the present disclosure, some details commonly known in the art are not described. Based on the above description, those skilled in the art can fully understand how to carry out the technical solutions disclosed herein.
The method and system of the present disclosure may be implemented in a number of ways. For example, the method and system of the present disclosure may be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the method according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been exemplified in detail, it shall be understood by those skilled in the art that the above examples are only illustrative, but shall by no means limit the scope of the present disclosure. Those skilled in the art will appreciate that the above embodiments may be modified without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| PCT/CN2022/123290 | Sep 2022 | WO | international |
This application is a continuation of PCT Application No. PCT/CN2023/121368, filed on Sep. 26, 2023, which is based on and claims priority to International Application No. PCT/CN2022/123290, filed on Sep. 30, 2022. The disclosures of these PCT applications are incorporated into the present application by reference in their entireties.
| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/CN2023/121368 | Sep 2023 | WO |
| Child | 19094699 | | US |