The present invention relates to a three-dimensional audio signal processing system and method using a rigid sphere, which can acquire three-dimensional audio signals by using mikes disposed on a rigid sphere and reproduce the three-dimensional audio signals in diverse reproduction environments.
Conventionally, three-dimensional audio signal acquiring systems are mainly based on Binaural technology in which audio signals are acquired by setting up mikes on the ears of dummy heads and reproduced through a headphone.
Since the audio signals are acquired through the mikes set up in the ears of the dummy heads in the Binaural technology, people who listen to the audio signals through a headphone feel as if they are in the place where the sound was acquired.
However, if binaural signals are acquired through the dummy heads and reproduced through speakers, crosstalk occurs. Crosstalk is a phenomenon in which output signals of the left speaker are heard by the right ear while those of the right speaker are heard by the left ear. To remove crosstalk, various methods for designing an inverse filter have been suggested.
Recently, researchers have been studying systems that acquire three-dimensional audio signals through a rigid sphere, a simplified form of a dummy head, which itself resembles a human head. Since the way a rigid sphere shapes an incoming signal can be estimated from its characteristics, the technology can give the effect of a dummy head by acquiring and processing three-dimensional audio signals.
The conventional method of acquiring three-dimensional audio signals by using dummy heads can acquire very natural sound because it uses a dummy head, which resembles the head of a human. However, since the size and shape of a human head differ according to each individual, the audio signals obtained by using the dummy head having a specific size and shape in the conventional method cannot be satisfactory to all people.
Also, in the conventional method, when the binaural signals are reproduced through a speaker, the audio signals acquired by setting up mikes in the ears of the dummy heads travel through the ears of a listener. Thus, the effect of ears imposed on the signals is doubled.
In addition, because the conventional dummy head resembles a human head in size and shape, recording sound with it in public places is subject to many restrictions.
A human being moves his/her head a little to the right and left when determining the direction of a sound. However, the signals acquired from the dummy heads suffer from front-back confusion, in which signals from the front are perceived as coming from the back and signals from the back are perceived as coming from the front. This is because the ears of the dummy heads are fixed in one direction, which makes the direction hard to determine.
Moreover, since the output of a dummy head is basically a two-channel signal, it is hard to extend the output into a multichannel signal.
It is, therefore, an object of the present invention to provide a three-dimensional audio signal processing system and method using a rigid sphere, the system and method that can acquire three-dimensional audio signals by simplifying the shape of a human head into a sphere and disposing mikes on the sphere.
It is another object of the present invention to provide a three-dimensional audio signal processing system and method using a rigid sphere, the system and method that can acquire three-dimensional audio signals by simplifying the shape of a human head into a sphere and disposing mikes on the sphere and applying the acquired three-dimensional audio signals to diverse reproduction systems that exist currently.
In accordance with an aspect of the present invention, there is provided a system for processing three-dimensional audio signals by using a rigid sphere, including: a three-dimensional audio signal acquiring unit for acquiring audio signals by using a predetermined number of mikes set up on the rigid sphere; and a three-dimensional audio signal post-processing unit for converting the acquired audio signals to reproduce in diverse reproduction environments such as five-channel, four-channel, headphone, stereo, and stereo dipole reproduction environments.
In accordance with another aspect of the present invention, there is provided a three-dimensional audio signal processing system, further including a three-dimensional audio signal reproducing unit for reproducing the audio signals obtained from the three-dimensional audio signal post-processing unit in diverse reproduction environments such as five-channel, four-channel, headphone, stereo, and stereo dipole reproduction environments.
In accordance with another aspect of the present invention, there is provided a method for processing three-dimensional audio signals by using a rigid sphere, including the steps of: a) acquiring audio signals by using a predetermined number of mikes set up on the rigid sphere; and b) converting the audio signals to reproduce in diverse reproduction environments such as five-channel, four-channel, headphone, stereo, and stereo dipole reproduction environments.
In accordance with another aspect of the present invention, there is provided a three-dimensional audio signal processing method, further including a step of: c) reproducing the audio signals obtained from the three-dimensional audio signal post-processing unit in diverse reproduction environments such as five-channel, four-channel, headphone, stereo, and stereo dipole reproduction environments.
The above and other objects and features of the present invention will become apparent from the following description of the preferred embodiments given in conjunction with the accompanying drawings.
Other objects and aspects of the invention will become apparent from the following description of the embodiments with reference to the accompanying drawings, which is set forth hereinafter.
First, a conventional three-dimensional audio signal acquiring method using mikes set up at both right and left 90° positions can give a three-dimensional audio effect, because the technology can describe an interaural level difference and an interaural time difference between two ears which a human being uses to sense the direction of sound. However, due to the characteristics of a rigid sphere, signals that enter from the back and front at the same angle have the same characteristics. This causes front and back confusion in which signals from the front and those from the back are not discriminated from each other.
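The front-back ambiguity of a mike pair at the left and right 90° positions can be illustrated numerically. The sketch below uses the Woodworth ray-tracing approximation of the interaural time difference for a rigid sphere, which is a textbook model and not part of this description. The sphere radius `RADIUS_M`, the speed of sound `SPEED_OF_SOUND`, and the function name `woodworth_itd` are assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
RADIUS_M = 0.0875       # m, assumed sphere radius (roughly head-sized)


def woodworth_itd(azimuth_deg: float) -> float:
    """Approximate ITD in seconds for a far-field source on the horizontal plane.

    azimuth_deg is measured from straight ahead (0 = front, 90 = right,
    180 = back). Positive values mean the sound reaches the right side first.
    """
    az = math.radians(azimuth_deg % 360.0)
    # Lateral angle: angle between the source direction and the median plane.
    # It depends only on sin(azimuth), so front and back mirror images coincide.
    lateral = math.asin(math.sin(az))
    return (RADIUS_M / SPEED_OF_SOUND) * (lateral + math.sin(lateral))


# Each pair is a front direction and its mirror image behind the sphere.
for front, back in [(30, 150), (60, 120), (330, 210)]:
    print(front, back, woodworth_itd(front), woodworth_itd(back))
```

Under this model each front direction and its mirror image yield the same time difference, which is why two side mikes alone cannot tell them apart.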
The present invention suggests a system and method that can reduce the front and back confusion by disposing a plurality of mikes on a rigid sphere and thereby differentiating the front and back signals and, additionally, reproduce the signals acquired from the mikes in diverse reproduction environments such as five-channel, four-channel, headphone, stereo, and stereo dipole reproduction environments.
As shown in
The three-dimensional audio signal acquiring unit 110 acquires three-dimensional audio signals from the mikes disposed on the rigid sphere, a simplified form of a human head. It includes a center mike for reinforcing the frontal sound image and two side mikes on each of the right and left sides to compensate for the head movement of a human.
The three-dimensional audio signal post-processing unit 120 performs post-processing to reproduce the three-dimensional audio signals, which are acquired in the three-dimensional audio signal acquiring unit 110 by using the five mikes on the rigid sphere, in diverse reproduction environments. The post-processing includes a 5×5 crosstalk removal filtering, a 4×4 crosstalk removal filtering, a conversion filtering and a 2×2 crosstalk removal filtering. The 5×5 crosstalk removal filtering is a process for reproducing the three-dimensional audio signals by using five channels except a low frequency effect (LFE) channel in a conventional 5.1 channel reproducing system.
The 4×4 crosstalk removal filtering is a process for reproducing the three-dimensional audio signals through a right speaker, a left speaker, a right surround speaker and a left surround speaker by using four channels except the center channel among the five channels.
The conversion filtering is a process for converting multichannel signals into two-channel signals to reproduce them in a headphone. The 2×2 crosstalk removal filtering is a process for reproducing the two-channel signals for the headphone reproduction in stereo and/or stereo dipole reproduction environments.
The three-dimensional audio signal reproducing unit 130 reproduces the three-dimensional audio signals in diverse reproduction environments such as five-channel, four-channel, headphone, stereo, and stereo dipole reproduction environments by converting them in the three-dimensional audio signal post-processing unit 120 adaptively to a reproduction environment.
The three-dimensional audio signal processing system of the present invention will be described in detail with reference to
As shown in
One mike is positioned at the center of the rigid sphere and acquires audio signals from the front. Four side mikes are disposed on the right and left sides, two on each side, at 15° to the front and 15° to the rear, in order to compensate for the slight right/left head movement that a human makes to determine the direction of sound.
The mike for the front side is referred to herein as a first mike and the mikes on the left are referred to as a second mike and a fourth mike. The mikes on the right are referred to as a third mike and a fifth mike. Audio signals acquired by using the five mikes are referred to as audio signals u1, u2, u3, u4, and u5.
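The naming convention above can be captured in a small lookup structure. This is a sketch only: the class name `Mike`, the sign convention for azimuth, the exact angles (15° on either side of the 90° side positions), and the choice of mikes 2 and 3 as the forward ones are all assumptions. The text fixes only the index, the side, and the signal name of each mike.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mike:
    index: int          # 1..5, matching the "first" to "fifth" mike in the text
    signal: str         # name of the acquired signal
    side: str           # "front", "left", or "right"
    azimuth_deg: float  # assumed azimuth, positive toward the left


# Assumed geometry: side mikes sit 15 degrees in front of and behind the
# 90-degree side positions, with mikes 2 and 3 taken as the forward pair.
LAYOUT = (
    Mike(1, "u1", "front", 0.0),
    Mike(2, "u2", "left", 75.0),
    Mike(3, "u3", "right", -75.0),
    Mike(4, "u4", "left", 105.0),
    Mike(5, "u5", "right", -105.0),
)

# Group the signal names by side, as the paragraph above does in prose.
signals_by_side = {}
for mike in LAYOUT:
    signals_by_side.setdefault(mike.side, []).append(mike.signal)
print(signals_by_side)
```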
The three-dimensional audio signal post-processing unit 120 performs post-processing to reproduce the signals u1, u2, u3, u4, and u5 outputted from the five mikes in the three-dimensional audio signal acquiring unit 110 in diverse reproduction systems.
The three-dimensional audio signal post-processing unit 120 is operated as follows.
First, speaker input signals vC5ch, vL5ch, vR5ch, vLS5ch and vRS5ch of a five-channel reproduction system are generated based on the output signals u1, u2, u3, u4, and u5 and the convolution operation in a 5×5 inverse filter 310 for removing crosstalk between five speakers and five target points. Here, vC5ch denotes an input signal to a center speaker; vL5ch denotes an input signal to a left speaker; vR5ch denotes an input signal to a right speaker; vLS5ch denotes an input signal to a left surround speaker; and vRS5ch denotes an input signal to a right surround speaker.
Five target points indicate five points on a horizontal plane of the rigid sphere, which is illustrated in
In case of five-channel reproduction, an inverse filter is used to remove crosstalk between the speakers and target points so that the output signal of the center speaker is observed only in the first target point; that of the left speaker, only in the second target point; that of the right speaker, only in the third target point; that of the left surround speaker, only in the fourth target point; and that of the right surround speaker, only in the fifth target point.
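The requirement above can be written compactly in the frequency domain. The notation is introduced here for illustration and does not appear in the original: $H_{ij}(\omega)$ is the transfer function from speaker $j$ to target point $i$, $\mathbf{C}(\omega)$ is the 5×5 inverse filter, $\mathbf{u}$ stacks the mike signals u1 to u5, $\mathbf{v}$ stacks the speaker input signals in the order center, left, right, left surround, right surround, and $\Delta$ is a modeling delay assumed so that the filter can be made causal.

```latex
\mathbf{H}(\omega)\,\mathbf{C}(\omega) = e^{-j\omega\Delta}\,\mathbf{I}_{5},
\qquad
\mathbf{v}(\omega) = \mathbf{C}(\omega)\,\mathbf{u}(\omega)
\;\;\Longrightarrow\;\;
\mathbf{H}(\omega)\,\mathbf{v}(\omega) = e^{-j\omega\Delta}\,\mathbf{u}(\omega)
```

When the first relation holds, the signal arriving at target point $i$ is a delayed copy of $u_i$ alone, so the contribution intended for one target point does not leak into the others.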
To design the 5×5 inverse filter, five speakers are positioned with the rigid sphere at the center and an impulse is generated from each of the five speakers. Then, the impulse responses between the five speakers and the five target points are obtained by measuring the responses at the five target points on the rigid sphere.
The inverse function of the impulse response is the 5×5 inverse filter that removes crosstalk between the five-channel reproduction system and five target points.
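The description does not say how the inverse function is computed. One common stand-in is a Tikhonov-regularized inversion carried out per frequency bin, sketched below with NumPy. The function name `design_inverse_filter`, the FFT length, the regularization weight `beta`, and the half-length modeling delay are assumptions, and `h_demo` is synthetic data rather than a measurement.

```python
import numpy as np


def design_inverse_filter(h, n_fft=1024, beta=1e-3):
    """Regularized frequency-domain inverse of a square matrix of impulse responses.

    h[i, j, :] is the impulse response from speaker j to target point i.
    Returns c with c[j, k, :] such that sum_j conv(h[i, j], c[j, k])
    approximates a delayed unit impulse when i == k and zero otherwise.
    """
    n = h.shape[0]
    H = np.fft.rfft(h, n=n_fft, axis=-1)      # shape (n, n, bins)
    H = np.moveaxis(H, -1, 0)                 # shape (bins, n, n)
    Hh = np.conj(np.swapaxes(H, -1, -2))      # Hermitian transpose per bin
    # Tikhonov-regularized inverse per bin: (H^H H + beta I)^-1 H^H
    C = np.linalg.solve(Hh @ H + beta * np.eye(n), Hh)
    # Modeling delay of half the filter length keeps the result causal.
    delay = n_fft // 2
    k = np.arange(C.shape[0])
    C = C * np.exp(-2j * np.pi * k * delay / n_fft)[:, None, None]
    return np.fft.irfft(np.moveaxis(C, 0, -1), n=n_fft, axis=-1)


# Synthetic stand-in for measured responses: a direct path on the diagonal
# and weaker, delayed crosstalk everywhere else.
h_demo = np.zeros((5, 5, 8))
h_demo[:, :, 3] = 0.1
h_demo[np.arange(5), np.arange(5), :] = 0.0
h_demo[np.arange(5), np.arange(5), 0] = 1.0
c_demo = design_inverse_filter(h_demo)
print(c_demo.shape)  # (5, 5, 1024)
```

The same function applies unchanged to a 4×4 or 2×2 set of responses, since only the leading dimensions of `h` change.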
The speaker input signals vC5ch, vL5ch, vR5ch, vLS5ch and vRS5ch of the five-channel reproduction system are generated by convolving the output signals u1, u2, u3, u4, and u5 of the three-dimensional audio signal acquiring unit 110 with the 5×5 inverse filter.
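The convolution step can be sketched as a matrix of FIR filters applied to the five mike signals, where each speaker input is the sum of every mike signal convolved with the corresponding filter. The function name `apply_filter_matrix`, the array layout, and the toy data are assumptions made for illustration.

```python
import numpy as np


def apply_filter_matrix(c, u):
    """Convolve a matrix of FIR filters with a set of input channels.

    c[j, i, :] is the filter from input channel i to output channel j and
    u[i, :] is input channel i. Returns v with v[j] = sum_i conv(c[j, i], u[i]).
    """
    n_out, n_in, n_taps = c.shape
    v = np.zeros((n_out, u.shape[1] + n_taps - 1))
    for j in range(n_out):
        for i in range(n_in):
            v[j] += np.convolve(c[j, i], u[i])
    return v


# Toy data: a pass-through filter matrix with one cross term.
c = np.zeros((5, 5, 2))
c[np.arange(5), np.arange(5), 0] = 1.0        # identity on the first tap
c[0, 1, 1] = 0.5                              # u2 feeds output 0, delayed by one sample
u = np.arange(15, dtype=float).reshape(5, 3)  # stand-in for u1..u5

# Output order follows the text: center, left, right, left surround, right surround.
v_c, v_l, v_r, v_ls, v_rs = apply_filter_matrix(c, u)
print(v_c)
```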
Meanwhile, four-channel reproducing signals are intended for the four speakers that remain when the Low Frequency Effect (LFE) channel and the center channel are excluded from the 5.1-channel speaker structure. To generate them, four speaker input signals are generated in a 4×4 inverse filter 320 based on the four mike output signals u2, u3, u4, and u5, that is, the five output signals u1, u2, u3, u4, and u5 of the three-dimensional audio signal acquiring unit 110 excluding the first mike output signal u1.
The speaker input signals vL4ch, vR4ch, vLS4ch and vRS4ch of the four-channel reproduction system are generated based on the output signals u2, u3, u4, and u5 of the three-dimensional audio signal acquiring unit 110 and a convolution operation with a 4×4 inverse filter for removing crosstalk between four speakers and four target points. Here, vL4ch denotes an input signal of a left speaker; vR4ch denotes an input signal of a right speaker; vLS4ch denotes an input signal of a left surround speaker; and vRS4ch denotes an input signal of a right surround speaker.
The four target points denote four points on a horizontal plane of the rigid sphere, as shown in
In case of a four-channel reproduction, an inverse filter is used to remove crosstalk between the speakers and target points so that the output signal of the left speaker is observed only in the second target point; that of the right speaker, only in the third target point; that of the left surround speaker, only in the fourth target point; and that of the right surround speaker, only in the fifth target point.
The 4×4 inverse filter is designed by disposing four speakers with the rigid sphere at the center and generating impulses in the four speakers. Then, an impulse response between the four speakers and four target points is obtained by measuring the responses at the four target points on the rigid sphere.
The inverse function of the impulse response is the 4×4 inverse filter that removes crosstalk between the four-channel reproduction system and four target points.
The speaker input signals vL4ch, vR4ch, vLS4ch and vRS4ch of the four-channel reproduction system are generated by convolving the output signals u2, u3, u4, and u5 of the three-dimensional audio signal acquiring unit 110 with the 4×4 inverse filter.
Meanwhile, headphone reproducing signals can be generated by either of two methods, which are described below.
One method is to put the rigid sphere at the center of the five-channel reproduction system and convert five-channel speaker input signals into two-channel headphone reproducing signals in the 5×2 filter A 330 by using impulse responses from the positions of the five speakers and the right and left 90° positions of the rigid sphere, which is described in
In the drawing, SIR denotes an impulse response of the rigid sphere, i.e., sphere impulse response; LT denotes the left 90° point of the rigid sphere; and RT denotes the right 90° point of the rigid sphere. That is, SIRC-LT denotes an impulse response from a center speaker to the LT.
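The equation for this conversion is not reproduced in the text, so the following is a sketch under one assumption: each headphone signal is the sum of the five speaker input signals, each convolved with the sphere impulse response from that speaker to LT (left) or RT (right). The function name `downmix_to_headphones`, the dictionary layout, and the toy responses are hypothetical.

```python
import numpy as np

SPEAKERS = ("C", "L", "R", "LS", "RS")


def downmix_to_headphones(v, sir_lt, sir_rt):
    """Fold five speaker feeds into two headphone feeds.

    v maps a speaker name to its input signal; sir_lt and sir_rt map a
    speaker name to the sphere impulse response from that speaker to the
    left (LT) and right (RT) 90-degree points. All impulse responses are
    assumed to share one length, and all signals another.
    """
    left = sum(np.convolve(sir_lt[s], v[s]) for s in SPEAKERS)
    right = sum(np.convolve(sir_rt[s], v[s]) for s in SPEAKERS)
    return left, right


# Toy data: only the center speaker carries a unit impulse.
v = {s: np.zeros(4) for s in SPEAKERS}
v["C"][0] = 1.0
sir_lt = {s: np.array([1.0, 0.5]) for s in SPEAKERS}
sir_rt = {s: np.array([0.8, 0.2]) for s in SPEAKERS}
left, right = downmix_to_headphones(v, sir_lt, sir_rt)
print(left, right)
```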
After transfer functions from the five speakers to RT and LT at the right and left 90° positions of the rigid sphere at the center are obtained, right and left headphone reproducing signals vLHP
The other method for generating two-channel signals for headphone reproduction is to use a 5×2 filter B 340 obtained by converting an impulse response of the rigid sphere.
The impulse response of the rigid sphere is measured by setting up a mike at the horizontal 0° position of the rigid sphere and generating an impulse while varying the direction of the speaker in 5° steps.
The headphone reproducing signals are generated based on a filter obtained as follows: among the measured impulse responses, the inverse function of the impulse response at 0°, where the mike and the speaker are aligned with each other, is computed and then convolved with the impulse response at each angle, as expressed in Equation 2.
$$SF_{0\text{-}355} = \mathrm{conv}\left(SIR_{0\text{-}355},\; SIR_{0}^{-1}\right) \qquad \text{(Eq. 2)}$$
where $SIR_{0}^{-1}$ denotes the inverse function of the impulse response at 0°; $SIR_{0\text{-}355}$ denotes the impulse response of the rigid sphere at each angle; and “conv” denotes the convolution operation.
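Equation 2 can be sketched in the frequency domain, where convolution with an inverse response becomes a division. The regularized division, the function name `normalize_by_frontal_response`, the FFT length, and the weight `beta` are assumptions. The synthetic responses are minimum phase, so no modeling delay is used here; measured responses would likely need one.

```python
import numpy as np


def normalize_by_frontal_response(sir, n_fft=512, beta=1e-6):
    """Equalize sphere impulse responses by the 0-degree response.

    sir[k, :] is the impulse response measured at angle 5*k degrees, so
    sir[0] is the 0-degree response. Returns sf with one row per angle,
    where sf[k] approximates conv(sir[k], inverse of sir[0]).
    """
    S = np.fft.rfft(sir, n=n_fft, axis=-1)
    S0 = S[0]
    # Regularized inverse of the 0-degree response; beta avoids division
    # by near-zero bins.
    inv_S0 = np.conj(S0) / (np.abs(S0) ** 2 + beta)
    return np.fft.irfft(S * inv_S0, n=n_fft, axis=-1)


# Synthetic stand-in for the 72 measurements at 0, 5, ..., 355 degrees.
angles = np.arange(0, 360, 5)
sir_demo = np.zeros((angles.size, 8))
sir_demo[:, 0] = 1.0
sir_demo[:, 1] = 0.5 * np.cos(np.radians(angles))
sf_demo = normalize_by_frontal_response(sir_demo)
print(sf_demo.shape)  # (72, 512)
```

By construction the 0° row of the result is close to a unit impulse, so the filters at the other angles describe each direction relative to the frontal response.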
The filter obtained as above and the output signals u1, u2, u3, u4, and u5 of the three-dimensional audio signal acquiring unit 110 go through a convolution operation expressed as Equation 3 to thereby generate headphone reproducing signals.
vLHP
Meanwhile, to generate input signals vRST and vLST to the right and left speakers for stereo reproduction, crosstalk should be removed in a 2×2 inverse filter 350 based on transfer functions between the stereo speaker, which is shown in
The impulse response between the stereo speaker and RT and LT of the rigid sphere is a value obtained by generating impulse in the right and left speakers of the stereo reproduction system, which is shown in
The inverse function of the impulse response is the inverse filter that removes crosstalk between the stereo speaker and the target point (LT and RT) of the rigid sphere.
The input signals vRST and vLST to the right and left speakers of the stereo reproduction system are generated by selecting one of the two-channel headphone reproducing signals A and B and convolving it with the 2×2 inverse filter 350.
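For the 2×2 case the inverse can be written in closed form. The notation below is assumed for illustration: $H_{XY}(\omega)$ is the transfer function from speaker $Y$ (L or R) to point $X$ (LT or RT) on the rigid sphere, and the selected headphone reproducing signals are written $v_{LHP}$ and $v_{RHP}$, the second name being assumed by symmetry.

```latex
\begin{bmatrix} v_{LST} \\ v_{RST} \end{bmatrix}
=
\frac{1}{H_{LT,L}\,H_{RT,R} - H_{LT,R}\,H_{RT,L}}
\begin{bmatrix}
 H_{RT,R} & -H_{LT,R} \\
 -H_{RT,L} & H_{LT,L}
\end{bmatrix}
\begin{bmatrix} v_{LHP} \\ v_{RHP} \end{bmatrix}
```

The denominator is the determinant of the speaker-to-sphere transfer matrix. Where it approaches zero the filter gain grows without bound, which is one reason practical designs often regularize the inversion.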
To generate input signals vRSD and vLSD to the right and left speakers for stereo dipole reproduction, crosstalk should be removed based on a transfer function between a stereo dipole reproduction system, which is shown in
The impulse response between the speaker and the RT and LT of the rigid sphere at the center is a value obtained by generating impulse in the right and left speakers and measuring impulse at the RT and LT which are the right and left 90° positions of the rigid sphere in the stereo dipole reproduction system, which is shown in
The inverse function of the impulse response is the inverse filter that removes crosstalk between the stereo dipole speakers and the target point (LT and RT) of the rigid sphere.
The input signals vRSD and vLSD to the right and left speakers of the stereo dipole reproduction system are generated by selecting one of the two-channel headphone reproducing signals A and B and convolving it with the 2×2 inverse filter 360.
The three-dimensional audio signal reproducing unit 130 reproduces a signal obtained by performing conversion in the three-dimensional audio signal post-processing unit 120 through a conversion filter that is suitable for each reproduction environment.
Five-channel reproducing signals of the three-dimensional audio signal post-processing unit 120 are inputted to a five-channel reproduction system, which is shown in
Headphone reproducing signals A and B are input signals to a headphone, which is shown in
Stereo reproducing signals are input signals to a stereo reproduction system of
As shown, at step S1101, audio signals are acquired by using five mikes disposed on a rigid sphere. At step S1102, post-processing is performed on the acquired audio signals to reproduce them in diverse reproduction environments such as five-channel, four-channel, headphone, stereo, and stereo dipole reproduction environments.
Subsequently, at step S1103, audio signals obtained from the post-processing are reproduced in the actual reproduction environment.
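The post-processing at step S1102 depends on the target reproduction environment. The sketch below lists, for each environment, the filters that the description assigns to it, using the reference numerals 310 to 360 from the text. The function name `processing_chain` and the string labels are illustrative assumptions.

```python
def processing_chain(environment, headphone_method="A"):
    """Return the ordered list of filters applied to the mike signals.

    headphone_method selects between the two headphone conversions:
    "A" converts the five-channel speaker inputs with filter A 330,
    "B" applies filter B 340 to the mike signals.
    """
    headphone = {
        "A": ["5x5 inverse filter 310", "5x2 filter A 330"],
        "B": ["5x2 filter B 340"],
    }[headphone_method]
    chains = {
        "five-channel": ["5x5 inverse filter 310"],
        "four-channel": ["4x4 inverse filter 320"],
        "headphone": headphone,
        # Stereo and stereo dipole start from one of the headphone signals.
        "stereo": headphone + ["2x2 inverse filter 350"],
        "stereo dipole": headphone + ["2x2 inverse filter 360"],
    }
    return chains[environment]


for env in ("five-channel", "four-channel", "headphone", "stereo", "stereo dipole"):
    print(env, "->", processing_chain(env))
```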
The method described above can be embodied as a program and stored in a computer-readable recording medium such as CD-ROMs, RAM, ROM, floppy disks, hard disks, and magneto-optical disks.
The technology of the present invention can acquire three-dimensional audio signals by using five mikes on the rigid sphere and reproduce them in diverse reproduction environments such as five-channel, four-channel, headphone, stereo, and stereo dipole reproduction environments by performing post-processing. Since the rigid sphere with mikes makes people feel comfortable compared to a dummy head, it can be used to acquire three-dimensional audio signals in public places such as concerts.
While the present invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2003-0099168 | Dec 2003 | KR | national |
10-2004-0027214 | Apr 2004 | KR | national |
Number | Date | Country |
---|---|---|
20050141723 A1 | Jun 2005 | US |