The invention relates to a method and a system with which an artificial audible impression corresponding to a certain space can be created for a listener. Particularly the invention relates to the processing of directed sound in such an audible impression and to the transmitting of the resulting audible impression in a system where the information presented to the user is transmitted, processed and/or compressed in a digital form.
An acoustic virtual environment means an audible impression with the aid of which the listener to an electrically reproduced sound can imagine that he is in a certain space. Complicated acoustic virtual environments often aim at imitating a real space, which is called auralization of said space. This concept is described for instance in the article M. Kleiner, B.-I. Dalenbäck, P. Svensson: “Auralization - An Overview”, 1993, J. Audio Eng. Soc., vol. 41, No. 11, pp. 861-875. The auralization can be combined in a natural way with the creation of a visual virtual environment, whereby a user provided with suitable displays and speakers or a headset can examine a desired real or imaginary space, and even “move around” in said space, whereby he gets a different visual and acoustic impression depending on which point in said environment he chooses as his examination point.
The creation of an acoustic virtual environment can be divided into three factors which are the modeling of the sound source, the modeling of the space, and the modeling of the listener. The present invention relates particularly to the modeling of a sound source and the early reflections of the sound.
The VRML97 language (Virtual Reality Modeling Language 97) is often used for modeling and processing a visual and acoustic virtual environment, and this language is treated in the publication ISO/IEC JTC/SC24 IS 14772-1, 1997, Information Technology - Computer Graphics and Image Processing - The Virtual Reality Modeling Language (VRML97), April 1997; and on the corresponding pages at the Internet address http://www.vrml.org/Specifications/VRML97/. Another set of rules being developed while this patent application is being written relates to Java3D, which is to become the control and processing environment of the VRML, and which is described for instance in the publication SUN Inc. 1997: JAVA 3D API Specification 1.0; and at the Internet address http://www.javasoft.com/-products/java-media/3D/forDevelopers/3Dguide/-. Further, the MPEG-4 standard (Moving Picture Experts Group 4) under development has as a goal that a multimedia presentation transmitted via a digital communication link can contain real and virtual objects, which together form a certain audiovisual environment. The MPEG-4 standard is described in the publication ISO/IEC JTC/SC29 WG11 CD 14496. 1997: Information technology - Coding of audiovisual objects. November 1997; and on the corresponding pages at the Internet address http://www.cselt.it/-mpeg/public/mpeg-4_cd.htm.
A=−20 dB·(d′/d″)
where d′ is the distance from the surface of the inner ellipsoid to the observation point, as measured along the straight line joining the points 101 and 105, and d″ is the distance between the inner and outer ellipsoids, as measured along the same straight line.
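The attenuation rule above can be illustrated with a short sketch. The function names, the clamping of the ratio to the range 0 to 1, and the conversion from decibels to a linear amplitude factor are assumptions made for illustration only; the text itself defines just the formula and the two distances.

```python
def attenuation_db(d_prime: float, d_double_prime: float) -> float:
    """Attenuation A = -20 dB * (d'/d'') between the inner and outer ellipsoids.

    d_prime: distance from the inner ellipsoid surface to the observation point.
    d_double_prime: distance between the inner and outer ellipsoids,
    both measured along the same straight line.
    """
    if d_double_prime <= 0:
        raise ValueError("d'' must be positive")
    # Assumption: clamp the ratio so that a point on the inner surface gets
    # 0 dB and a point on (or beyond) the outer surface gets -20 dB.
    ratio = min(max(d_prime / d_double_prime, 0.0), 1.0)
    return -20.0 * ratio


def db_to_gain(a_db: float) -> float:
    """Convert a decibel value to a linear amplitude factor (20*log10 convention)."""
    return 10.0 ** (a_db / 20.0)


# Observation point one quarter of the way from the inner to the outer ellipsoid.
a = attenuation_db(1.0, 4.0)
print(a, db_to_gain(a))
```

Under these assumptions the attenuation grows linearly in decibels as the observation point moves from the inner to the outer ellipsoid.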
In Java3D directed sound is modeled with the ConeSound concept which is illustrated in
A known method for modeling the acoustics of a space comprising surfaces is the image source method, in which the original sound source is given a set of imaginary image sources which are mirror images of the sound source in relation to the reflection surfaces to be examined: one image source is placed behind each reflection surface to be examined, whereby the distance measured directly from this image source to the examination point is the same as the distance from the original sound source via the reflection to the examination point. Further, the sound from the image source arrives at the examination point from the same direction as the real reflected sound. The audible impression is obtained by adding the sounds generated by the image sources.
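The geometric property stated above, that the straight-line distance from the image source equals the length of the reflected path, can be checked with a minimal sketch. The helper names, the single planar wall and the example coordinates are hypothetical and serve only as an illustration of a first-order image source.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mirror_point(source, plane_point, normal):
    """Mirror image of `source` in the plane through `plane_point` with `normal`."""
    n_len = math.sqrt(dot(normal, normal))
    n = [c / n_len for c in normal]
    # Signed distance from the source to the plane along the unit normal.
    signed = dot([s - q for s, q in zip(source, plane_point)], n)
    return [s - 2.0 * signed * c for s, c in zip(source, n)]


def reflection_point(image, listener, plane_point, normal):
    """Point where the straight line from the image source to the listener crosses the plane."""
    direction = [l - i for l, i in zip(listener, image)]
    t = dot([q - i for q, i in zip(plane_point, image)], normal) / dot(direction, normal)
    return [i + t * d for i, d in zip(image, direction)]


# Hypothetical example: a wall in the plane x = 0.
source = [1.0, 0.0, 0.0]
listener = [3.0, 4.0, 0.0]
wall_point, wall_normal = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]

image = mirror_point(source, wall_point, wall_normal)
hit = reflection_point(image, listener, wall_point, wall_normal)
direct_from_image = distance(image, listener)
via_reflection = distance(source, hit) + distance(hit, listener)
print(image, hit, direct_from_image, via_reflection)
```

Higher-order reflections would mirror the image sources again in the other surfaces, which is where the number of image sources grows quickly.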
The prior art methods are computationally very heavy. If we assume that the virtual environment is transmitted to the user for instance as a broadcast or via a data network, then the receiver of the user should continuously add the sound generated by even thousands of image sources. Moreover, the basis of the calculation changes every time the user decides to change the location of the examination point. Further, the known solutions completely ignore the fact that in addition to the direction angle the directivity of the sound strongly depends on its wavelength; in other words, sounds with a different pitch are directed differently.
From the Finnish patent application number 974006 (Nokia Corp.) and the corresponding U.S. patent application Ser. No. 09/174,989 there is known a method and a system for processing an acoustic virtual environment. There the surfaces of the environment to be modeled are represented by filters having a certain frequency response. In order to transmit the modeled environment in digital transmission form it is sufficient to present in some way the transfer functions of all essential surfaces belonging to the environment. However, even this does not take into account the effects which the arrival direction or the pitch of the sound has on the direction of the sound.
The disclosed embodiments present a method and a system with which an acoustic virtual environment can be transmitted to the user with a reasonable calculation load. In one aspect, the method and the system are able to take into account how the pitch and the arrival direction of the sound affect the direction of the sound.
In one aspect, the disclosed embodiments model the sound source or its early reflection by a parametrized system function where it is possible to set a desired direction of the sound with the aid of different parameters and to take into account how the direction depends on the frequency and on the direction angle.
In one aspect, a method for processing directed sound in an acoustic virtual environment in an electronic device, said acoustic virtual environment comprising at least one sound source, comprises: defining a reference direction and a set of selected directions for the at least one sound source, each selected direction differing from said reference direction; establishing a direction dependent filtering arrangement having at least one parameter disposed to at least partly determine a filtering effect of the direction dependent filtering arrangement, said at least one parameter enabling the direction dependent filtering arrangement to model how sound emitted by said at least one sound source sounds when listened to from a direction that deviates from said reference direction; for each selected direction, defining a value (values) of said at least one parameter; and
In one aspect, a system for processing directed sound in an acoustic virtual environment comprising at least one sound source comprises:
The model of the sound source or the reflection calculated from it comprises direction dependent digital filters. A certain reference direction, called the zero azimuth, is selected for the sound. This direction can point in any direction in the acoustic virtual environment. In addition to it a number of other directions are selected, in which it is desired to model how the sound is directed. These directions can also be selected arbitrarily. Each selected other direction is modeled by a digital filter having a transfer function which can be selected to be either frequency dependent or frequency independent. When the examination point is located somewhere other than exactly in a direction represented by a filter, it is possible to form different interpolations between the filter transfer functions.
When we want to model sound and how it is directed in a system where the information must be transmitted in a digital form, it is necessary to transmit only the data about each transfer function. The receiving device, knowing the desired examination point, determines how the sound is directed from the location of the sound source towards the examination point with the aid of the transfer functions it has reconstructed. If the location of the examination point changes in relation to the zero azimuth, the receiving device checks how the sound is directed towards the new examination point. There can be several sound sources, whereby the receiving device calculates how the sound is directed from each sound source to the examination point and modifies the sound it reproduces correspondingly. The listener then obtains an impression of a correctly positioned listening place, for instance in relation to a virtual orchestra where the instruments are located in different places and are directed in different ways.
The simplest way to realize direction dependent digital filtering is to attach a certain amplification factor to each selected direction. However, the pitch of the sound is then not taken into account. In a more advanced alternative the examined frequency band is divided into sub-bands, and each sub-band is given its own amplification factor in each selected direction. In a still more advanced version each examined direction is modeled by a general transfer function, for which certain coefficients are indicated which enable the reconstruction of that transfer function.
Below the invention is described in more detail with reference to preferred embodiments presented as examples and to the enclosed figures, in which
a shows in more detail a part of a system according to the invention; and
b shows a detail of
Reference to the
Each filter shown in
Yi(t)=Hi*X(t) (1)
where * represents convolution in relation to the time. The response Yi(t) is the sound directed into the direction in question.
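For sampled signals, the convolution in equation (1) can be sketched as follows. The function name and the representation of the filter by a finite list of impulse response samples are assumptions made for illustration; the text does not prescribe any particular realization.

```python
def convolve(h, x):
    """Discrete-time convolution y = h * x of two finite sample sequences."""
    y = [0.0] * max(len(h) + len(x) - 1, 0)
    for n, hn in enumerate(h):
        for k, xk in enumerate(x):
            # Each input sample contributes a scaled, delayed copy of h.
            y[n + k] += hn * xk
    return y


# A single-tap filter is a plain multiplication by a real number.
print(convolve([0.5], [1.0, 2.0, 3.0]))
# A two-tap filter adds a delayed, attenuated copy of the input.
print(convolve([1.0, 0.5], [1.0, 0.0, 0.0]))
```

The single-tap case corresponds to a transfer function that is just one real amplification factor.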
In its simplest form the transfer function means that the impulse X(t) is multiplied by a real number. Because it is natural to choose the zero azimuth as that direction in which the strongest sound is directed, the simplest transfer functions of the filters 306 to 309 are real numbers between zero and one, these limits included.
A simple multiplication by real numbers does not take into account the importance of the pitch for the directivity of the sound. A more versatile transfer function is one where the impulse is divided into predetermined frequency bands, and each frequency band is multiplied by its own amplification factor, which is a real number. The frequency bands can be defined by one number which represents the highest frequency of the frequency band. Alternatively, certain real number coefficients can be presented for some example frequencies, whereby a suitable interpolation is applied between these frequencies (for instance, if there is given a frequency of 400 Hz and a factor of 0.6, and a frequency of 1000 Hz and a factor of 0.2, then with straightforward linear interpolation we get the factor 0.4 for the frequency 700 Hz).
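The interpolation example given above can be reproduced with a minimal sketch. The function name is hypothetical, and holding the nearest given factor outside the range of example frequencies is an assumption; the text only speaks of a suitable interpolation between the given frequencies.

```python
def factor_at(frequency_hz, points):
    """Linearly interpolate an amplification factor from (frequency, factor) pairs."""
    pts = sorted(points)
    # Assumption: outside the given frequencies, hold the nearest given factor.
    if frequency_hz <= pts[0][0]:
        return pts[0][1]
    if frequency_hz >= pts[-1][0]:
        return pts[-1][1]
    for (f0, g0), (f1, g1) in zip(pts, pts[1:]):
        if f0 <= frequency_hz <= f1:
            w = (frequency_hz - f0) / (f1 - f0)
            return g0 + w * (g1 - g0)


# The values from the text: 0.6 at 400 Hz and 0.2 at 1000 Hz.
print(factor_at(700.0, [(400.0, 0.6), (1000.0, 0.2)]))
```

With the two example points from the text the result at 700 Hz is 0.4, up to floating-point rounding.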
Generally it can be stated that each filter 306 to 309 is a certain IIR or FIR filter (Infinite Impulse Response; Finite Impulse Response) having a transfer function H which can be expressed with the aid of a Z-transform H(z). When we take the Z-transform X(z) of the impulse X(t) and the Z-transform Y(z) of the response Y(t), then we get the definition
whereby it is sufficient to express the coefficients [b0 b1 a1 b2 a2 . . . ] used in modeling the Z-transform in order to express an arbitrary transfer function. The upper limits N and M used in the summing represent the accuracy at which it is desired to define the transfer function. In practice they are determined by how much capacity is available to store and/or to transmit in a transmission system the coefficients used to model each single transfer function.
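As an illustration of how a receiving device might apply a filter reconstructed from such coefficients, the following sketch assumes the common rational form H(z) = (b0 + b1 z^-1 + b2 z^-2 + ...) / (1 + a1 z^-1 + a2 z^-2 + ...), with the leading denominator coefficient normalized to one. That form, the function name and the example coefficient values are assumptions; they are consistent with the coefficient list [b0 b1 a1 b2 a2 . . . ] but are not confirmed by the text.

```python
def apply_filter(b, a, x):
    """Filter x with the difference equation
    y[n] = sum_k b[k] * x[n-k] - sum_{k>=1} a_k * y[n-k].

    b: [b0, b1, b2, ...]  numerator coefficients
    a: [a1, a2, ...]      denominator coefficients (a0 = 1 is implicit);
                          an empty list gives a FIR filter.
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        y.append(acc)
    return y


# Hypothetical one-pole IIR filter: the impulse response decays by half each sample.
print(apply_filter([1.0], [-0.5], [1.0, 0.0, 0.0, 0.0]))
# Hypothetical two-tap FIR filter: a moving average.
print(apply_filter([0.5, 0.5], [], [1.0, 1.0, 1.0]))
```

The number of coefficients transmitted per direction then directly trades accuracy against storage and transmission capacity, as stated above.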
The invention is suitable for reproduction in local equipment where the acoustic virtual environment is created in the computer memory and processed in the same connection, or where it is read from a storage medium, such as a DVD (Digital Versatile Disc), and reproduced to the user via audiovisual presentation means (displays, speakers). The invention is further applicable in a system where the acoustic virtual environment is generated in the equipment of a so called service provider and transmitted to the user via a transmission system. A device which reproduces to a user the directed sound processed in a manner according to the invention, and which typically enables the user to select at which point of the acoustic virtual environment he wants to listen to the reproduced sound, is generally called the receiving device. This term is not intended to be limiting regarding the invention.
When the user has given the receiving device information about in which point of the acoustic virtual environment he wants to listen to the reproduced sound, the receiving device determines in which way the sound is directed from the sound source towards said point. In
According to the invention we can, in addition to the actual sound sources, also model sound reflections, particularly early reflections. In
In the embodiment presented in
The
In the filters 701, 702 and 703 each signal component is divided into the right and the left channels, or in a multichannel system generally into N channels. All signals related to a certain channel are combined in the adder 715 or 716 and directed to the adder 717 or 718, where the post-echo belonging to each signal is added to the signal. The lines 719 and 720 lead to the speakers or to the headset. In
b shows in more detail a possibility to realize the parametrized filter 722 shown in
Above we have generally discussed how the characteristics of the acoustic virtual environment can be processed and transmitted from one device to another device by using parameters. In the following we discuss how the invention is applied to a certain data transmission form. Multimedia means a mutually synchronized presentation of audiovisual objects to the user. It is thought that interactive multimedia presentations will come into large-scale use in the future, for instance as a form of entertainment and teleconferencing. From prior art there are known a number of standards which define different ways to transmit multimedia programs in an electrical form. In this patent application we discuss particularly the so called MPEG standards (Moving Picture Experts Group), of which the MPEG-4 standard, being prepared at the time when this patent application is filed, has as an aim that the transmitted multimedia presentation can contain real and virtual objects, which together form a certain audiovisual environment. The invention is not in any way limited to be used only in connection with the MPEG-4 standard, but it can be applied for instance in the extensions of the VRML97 standard, or even in future audiovisual standards which are unknown for the time being.
A data stream according to the MPEG-4 standard comprises multiplexed audiovisual objects which can contain a section which is continuous in time (such as a synthesized sound) and parameters (such as the location of the sound source in the space to be modeled). The objects can be defined to be hierarchic, whereby so called primitive objects are on the lowest level of the hierarchy. In addition to the objects, a multimedia program according to the MPEG-4 standard includes a so called scene description, which contains information relating to the mutual relations of the objects and to the arrangement of the general setting of the program, and which most advantageously is encoded and decoded separately from the actual objects. The scene description is also called the BIFS section (Binary Format for Scene description). The transmission of an acoustic virtual environment according to the invention is advantageously realized by using the structured audio language defined in the MPEG-4 standard (SAOL/SASL: Structured Audio Orchestra Language/Structured Audio Score Language) or the VRML97 language.
In the above mentioned languages there is at present defined a Sound node which models the sound source. According to the invention it is possible to define an extension of a known Sound node, which in this patent application is called a DirectiveSound node. In addition to the contents of the known Sound node it further contains a field, which here is called the directivity field and which supplies the information required to reconstruct the filters representing the sound directivity. Three different alternatives for modeling the filters were presented above, so below we describe how these alternatives appear in the directivity field of a DirectiveSound node according to the invention.
According to the first alternative each filter modeling a direction different from a certain zero azimuth corresponds to a simple multiplication by an amplification factor which is a normalized real number between 0 and 1. Then the contents of the directivity field could be for instance as follows:
((0.79 0.8) (1.57 0.6) (2.36 0.4) (3.14 0.2))
In this alternative the directivity field contains as many number pairs as there are directions differing from the zero azimuth in the sound source model. The first number of a number pair indicates the angle in radians between the direction in question and the zero azimuth, and the second number indicates the amplification factor in said direction.
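A receiving device could turn such number pairs into an amplification factor for an arbitrary listening direction roughly as sketched below. The function name is hypothetical, and several choices are assumptions made for illustration: a factor of 1.0 at the zero azimuth, symmetry on both sides of the zero azimuth, linear interpolation between the listed directions, and holding the last factor beyond the last listed angle.

```python
# The example directivity field from the text, as (angle in radians, factor) pairs.
DIRECTIVITY = [(0.79, 0.8), (1.57, 0.6), (2.36, 0.4), (3.14, 0.2)]


def direction_factor(angle_rad, directivity):
    """Amplification factor for a direction `angle_rad` away from the zero azimuth."""
    # Assumption: factor 1.0 at the zero azimuth, symmetric on both sides of it.
    points = [(0.0, 1.0)] + sorted(directivity)
    angle = abs(angle_rad)
    if angle >= points[-1][0]:
        return points[-1][1]  # assumption: hold the last listed factor
    for (a0, g0), (a1, g1) in zip(points, points[1:]):
        if a0 <= angle <= a1:
            # Linear interpolation between the two neighbouring directions.
            w = (angle - a0) / (a1 - a0)
            return g0 + w * (g1 - g0)


print(direction_factor(0.0, DIRECTIVITY))
print(direction_factor(1.18, DIRECTIVITY))  # halfway between 0.79 and 1.57
```

Other interpolation schemes between the listed directions are equally possible; linear interpolation is used here only because it is the simplest.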
According to the second alternative the sound in each direction differing from the direction of the zero azimuth is divided into frequency bands, of which each has its own amplification factor. The contents of the directivity field could be for instance as follows:
((0.79 125.0 0.8 1000.0 0.6 4000.0 0.4)
(1.57 125.0 0.7 1000.0 0.5 4000.0 0.3)
(2.36 125.0 0.6 1000.0 0.4 4000.0 0.2)
(3.14 125.0 0.5 1000.0 0.3 4000.0 0.1))
In this alternative the directivity field contains as many number sets, separated from each other by the inner parentheses, as there are directions differing from the direction of the zero azimuth in the sound source model. In each number set the first number indicates the angle in radians between the direction in question and the zero azimuth. After the first number there are number pairs, of which the first one indicates a certain frequency in hertz and the second is the amplification factor. For instance the number set (0.79 125.0 0.8 1000.0 0.6 4000.0 0.4) can be interpreted so that in the direction 0.79 radians an amplification factor of 0.8 is used for the frequencies 0 to 125 Hz, an amplification factor of 0.6 is used for the frequencies 125 to 1000 Hz, and an amplification factor of 0.4 is used for the frequencies 1000 to 4000 Hz. Alternatively it is possible to use a notation where the above mentioned number set means that in the direction 0.79 radians the amplification factor is 0.8 at the frequency 125 Hz, the amplification factor is 0.6 at the frequency 1000 Hz, and the amplification factor is 0.4 at the frequency 4000 Hz, and the amplification factors at other frequencies are calculated from these by interpolation and extrapolation. Regarding the invention it is not essential which notation is used, as long as the used notation is known to both the transmitting device and the receiving device.
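The first of the two notations described above, where each frequency is the upper limit of a band, can be sketched as follows. The function names are hypothetical, and treating the band limits as inclusive and reusing the last factor above the highest listed frequency are assumptions, since the text does not specify either.

```python
def parse_number_set(numbers):
    """Split (angle, f1, g1, f2, g2, ...) into the angle and (frequency, factor) pairs."""
    angle = numbers[0]
    pairs = list(zip(numbers[1::2], numbers[2::2]))
    return angle, pairs


def band_factor(frequency_hz, pairs):
    """Factor for a frequency when each listed frequency is the upper limit of a band."""
    for upper_limit_hz, factor in pairs:
        if frequency_hz <= upper_limit_hz:  # assumption: the limit belongs to the lower band
            return factor
    return pairs[-1][1]  # assumption: above the last band, reuse the last factor


# The first number set of the example directivity field in the text.
angle, pairs = parse_number_set((0.79, 125.0, 0.8, 1000.0, 0.6, 4000.0, 0.4))
print(angle, pairs)
print(band_factor(100.0, pairs), band_factor(500.0, pairs), band_factor(2000.0, pairs))
```

Under the second notation the same pairs would instead be treated as sample points, and the lookup would be replaced by interpolation and extrapolation between them.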
According to the third alternative a transfer function is applied in each direction differing from the zero azimuth, and in order to define the transfer function there are given the a and b coefficients of its Z-transform. The contents of the directivity field could be for instance as follows:
((45 b45,0 b45,1 a45,1 b45,2 a45,2 . . . )
(90 b90,0 b90,1 a90,1 b90,2 a90,2 . . . )
(135 b135,0 b135,1 a135,1 b135,2 a135,2 . . . )
(180 b180,0 b180,1 a180,1 b180,2 a180,2 . . . ))
In this alternative the directivity field also contains as many number sets, separated from each other by the inner parentheses, as there are directions differing from the direction of the zero azimuth in the sound source model. In each number set the first number indicates the angle, this time in degrees, between the direction in question and the zero azimuth; in this case, as also in the cases above, it is possible to use any other known angle units as well. After the first number there are the a and b coefficients which determine the Z-transform of the transfer function used in the direction in question. The points after each number set mean that the invention does not impose any restrictions on how many a and b coefficients define the Z-transform of the transfer function. In different number sets there can be a different number of a and b coefficients. In the third alternative the a and b coefficients could also be given as their own vectors, so that an efficient modeling of FIR or all-pole IIR filters would be possible in the same way as in the publication Ellis, S. 1998: “Towards more realistic sound in VRML”. Proc. VRML '98, Monterey, USA, Feb. 16-19, 1998, pp. 95-100.
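Separating the interleaved coefficients of one number set into their own b and a vectors could look like the sketch below. The function name and the numeric coefficient values are hypothetical; only the ordering (angle, b0, b1, a1, b2, a2, . . . ) is taken from the example above.

```python
def split_coefficients(number_set):
    """Split (angle, b0, b1, a1, b2, a2, ...) into the angle and the b and a vectors."""
    if len(number_set) < 2:
        raise ValueError("a number set needs an angle and at least b0")
    angle = number_set[0]
    coeffs = list(number_set[1:])
    b = [coeffs[0]] + coeffs[1::2]  # b0, then b1, b2, ...
    a = coeffs[2::2]                # a1, a2, ...
    return angle, b, a


# Hypothetical coefficient values for the 45 degree direction.
angle_deg, b, a = split_coefficients((45, 1.0, 0.5, -0.3, 0.25, 0.1))
print(angle_deg, b, a)
```

A number set holding only b coefficients beyond b0 would need a different convention, for instance the separate vectors mentioned above, because the interleaved ordering always alternates b and a.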
The above presented embodiments of the invention are of course only intended as examples, and they do not have any effect of restricting the invention. Particularly the manner in which the parameters representing the filters are arranged in the directivity field of the DirectiveSound node can be chosen in very many ways.
Number | Date | Country | Kind |
---|---|---|---|
980649 | Mar 1998 | FI | national |