The present invention relates to a localization control apparatus, a localization control method, a localization control program, and a computer-readable recording medium that changes a position of a sound image of input sound to be played back. Application of the present invention is not limited to the localization control apparatus, the localization control method, the localization control program, and the computer-readable recording medium above.
With the spread of DVDs and terrestrial digital broadcasts, content with surround sound such as 5.1-channel surround sound has increased. Many speakers are needed to enjoy 5.1-channel surround sound at home. However, a room has limited space for disposing many speakers. Particularly, in many cases, a speaker cannot be disposed behind a listener.
An apparatus has been disclosed that generates a virtual sound image with two filters satisfying $H_l = \frac{SF - AK}{S^2 - A^2}$ and $H_r = \frac{SK - AF}{S^2 - A^2}$ when two front speakers are disposed symmetrically with respect to a listener (see, for example, Patent Document 1). Here, S is the transfer function from each speaker of the pair to the listener's ear on the same side, A is the transfer function from each speaker of the pair to the listener's ear on the opposite side, F is the transfer function from the position at which the sound image is to be localized to the listener's ear on the same side, and K is the transfer function from that position to the listener's ear on the opposite side.
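To make these conditions concrete, the following Python sketch solves the symmetric 2x2 system for a single frequency bin. It assumes that the same-side ear receives S*H_l + A*H_r and the opposite ear receives A*H_l + S*H_r. The complex values in the usage lines are hypothetical and stand for one bin of measured transfer functions. The final check confirms that the resulting filters reproduce F and K at the ears.

```python
def virtual_source_filters(S, A, F, K):
    """Return (H_l, H_r) for one frequency bin of the symmetric 2x2 system.

    Same-side ear:     S*H_l + A*H_r = F
    Opposite-side ear: A*H_l + S*H_r = K
    """
    det = S * S - A * A
    if abs(det) < 1e-12:
        raise ValueError("S^2 - A^2 is (nearly) zero: the system is ill-conditioned")
    H_l = (S * F - A * K) / det
    H_r = (S * K - A * F) / det
    return H_l, H_r


# Hypothetical single-bin values of the four transfer functions.
S, A = 1.0 + 0.0j, 0.4 - 0.3j
F, K = 0.8 + 0.1j, 0.2 - 0.5j
H_l, H_r = virtual_source_filters(S, A, F, K)
print(abs(S * H_l + A * H_r - F) < 1e-12, abs(A * H_l + S * H_r - K) < 1e-12)  # True True
```

Repeating this solve for every bin of measured responses yields the two filters. Any peaks and dips in S, A, F, and K carry over into H_l and H_r.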
Patent Document 1: Japanese Patent Application Laid-open Publication No. H8-265899
However, in a playback sound field, the transfer function to a human ear has various peaks and dips and generally cannot be flat. A filter coefficient calculated from such a transfer function has a similar characteristic. Thus, a problem arises in that, because the conventional filter has a non-flat frequency characteristic, the frequency components of the sound source change drastically, resulting in playback with unnatural sound quality.
Since the listening circumstances and the position of a listener's head are not always constant, and head shape varies from individual to individual, it is generally difficult to find a filter coefficient that is effective for everyone. On the other hand, even if the filter coefficient is approximated by the interaural level difference for each band obtained from a desired head-related transfer function (HRTF), the sound image is not localized at the intended position, because a human detects the position of a sound image in terms of HRTF+α. Although the configuration for adjusting the +α portion can be made simpler, in that case a problem arises in that a logically optimal solution does not always exist.
Furthermore, head shape and playback circumstances vary from user to user. In processing that uses the HRTF, another problem arises in that a coefficient optimal for the circumstances cannot be obtained without measurement using a dummy head. Even if the speakers are disposed symmetrically with respect to a listener, the coefficient that makes a virtual sound image spread most widely to the left and right is, in many cases, asymmetrical. The circumstances of the room and auditory asymmetry are considered to be factors. As a result, a problem arises in that the virtual sound image does not spread widely enough for the listener to listen comfortably.
A localization control apparatus according to the invention of claim 1 outputs an audio signal input thereto to one of a plurality of channels, and based on the audio signal input, outputs a control signal for controlling an audio signal for another channel among the channels. The localization control apparatus includes an attenuating unit that attenuates the audio signal input; a delaying unit that delays the audio signal attenuated by the attenuating unit; and a generating unit that generates the control signal from the audio signal delayed by the delaying unit.
A localization control method according to the invention of claim 7 is for outputting an audio signal input thereto to one of a plurality of channels, and based on the audio signal input, outputting a control signal for controlling an audio signal for another channel among the channels. The localization control method includes an attenuating step of attenuating the audio signal input; a delaying step of delaying the audio signal attenuated at the attenuating step; and a generating step of generating the control signal from the audio signal delayed at the delaying step.
A localization control program according to the invention of claim 8 causes a computer to execute the localization control method according to claim 7.
A computer-readable recording medium according to the invention of claim 9 stores therein the localization control program according to claim 8.
Referring to the accompanying drawings, exemplary embodiments of the localization control apparatus, the localization control method, the localization control program, and the computer-readable recording medium according to the present invention are explained in detail below.
The attenuating unit 101 attenuates the input audio signal. The attenuating unit 101 can attenuate the input audio signal using a bandpass filter. The delaying unit 102 delays the audio signal attenuated by the attenuating unit 101. For example, the delaying unit 102 divides the audio signal attenuated by the attenuating unit 101 into bands and delays the signal in each band.
The generating unit 103 generates a control signal from the audio signal delayed by the delaying unit 102. For example, the generating unit 103 combines each audio signal for each band that is delayed by the delaying unit 102 to generate the control signal. Additionally, the generating unit 103 can generate a control signal for each of the other channels among the plural channels.
The output unit 104 combines the control signal generated by the generating unit 103 with the audio signal for the other channel among the plural channels, and outputs the combined audio signal to that channel. By combining the control signal, the output unit 104 changes the sound pressure level of the audio signal of the other channel, thereby changing the position of the sound image of the sound corresponding to the audio signal. When the input audio signal is to be output to a left speaker, the output unit 104 outputs the input audio signal to the left speaker as it is, and outputs the control signal generated by the generating unit 103 to a right speaker.
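The functional units 101 to 104 can be sketched as plain list operations. This minimal illustration assumes a single broadband delay (the per-band split of the delaying unit 102 is omitted) and a left-channel input. The function names are hypothetical.

```python
def attenuate(x, att):
    """Attenuating unit 101: scale every sample by att (0 <= att <= 1)."""
    return [att * s for s in x]


def delay(x, d):
    """Delaying unit 102 (broadband here): shift x right by d samples, zero-filled."""
    return ([0.0] * d + list(x))[: len(x)]


def generate_control(x, att, d):
    """Generating unit 103: the control signal is the attenuated, delayed input."""
    return delay(attenuate(x, att), d)


def output_unit(left_in, right_in, att, d):
    """Output unit 104: the left input passes unchanged; the control signal is
    added to the right channel's own audio signal."""
    control = generate_control(left_in, att, d)
    left_out = list(left_in)
    right_out = [r + c for r, c in zip(right_in, control)]
    return left_out, right_out


left_out, right_out = output_unit([1.0, 0.0, 0.0, 0.0, 0.0], [0.0] * 5, att=0.5, d=2)
print(right_out)  # [0.0, 0.0, 0.5, 0.0, 0.0]
```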
The input audio signal is output to one of the plural channels, and based on the input audio signal, a control signal for controlling an audio signal for another channel among the plural channels is output (step S204).
The output unit 104 combines the control signal generated by the generating unit 103 with the audio signal for the other channel, and outputs the combined audio signal to the other channel (step S205). By combining the control signal generated by the generating unit 103 and playing back the combined audio signal, the output unit 104 changes the sound pressure levels at both ears, and the position of the sound image of the sound corresponding to the audio signal.
According to the embodiment explained above, the attenuated and delayed audio signal can be output to the other speaker. Because the audio signal output to one speaker is also delayed and output to the other speaker, the sound pressure levels at both ears change, for example. As a result, the position of the sound image perceived by the listener can be changed.
Therefore, even when the appropriate filter coefficient varies, for example because the listening circumstances and the position of the listener's head are not constant or because head shapes differ, the coefficient can be adjusted accordingly and used. Even when sound whose phase alone has been changed is played back through one speaker, the difference in sound quality from the original sound is hardly perceptible. Therefore, localization control is enabled without reducing the quality of the original sound.
A listener 304 listens to the sound played back from the speakers 302 and 303, in which the localization position of the sound image has been changed for the listener 304. As a result, the listener 304 hears the sound as if the speakers 302 and 303 were disposed at virtual positions 305 and 306.
Usually, 5.1-channel content is played back with three front speakers (L, R, and C) and two rear speakers (SL: surround L, and SR: surround R). Here, the sound images of the SL and SR channels can be virtually localized using only the speakers 302 and 303, without speakers dedicated to the SL and SR channels.
Upon hearing sound, a human perceives not only the intensity, pitch, and timbre of the sound but also spatial information such as its direction and distance; this is the ability of sound image localization. The direction of a sound can be approximately determined by analyzing and controlling the physical factors of sound image localization. Cues for sound image localization include the time difference and the intensity difference between the signals arriving at each ear, changes in the frequency characteristic of the sound wave caused by diffraction at the head, the auricles, and the like, and reflections from the walls of a room.
Here, the position of the sound image is changed by changing the level difference of the sound. The localization control apparatus 301 changes the sound image so that the sound can be heard approximately from the intended position of the sound image. The human auditory sense recognizes a “sound image”, such as the perceived direction and intensity of a sound, by integrating information such as the time difference and the level difference between the signals arriving at both ears.
The stereo terminal 401 is a terminal for outputting sound to the speakers 302 and 303 upon receiving sound output from the CPU 402. The CPU 402 controls the entire localization control apparatus 301 of the example. The ROM 403 stores therein a program such as a boot program. The RAM 404 is used as a work area of the CPU 402. The HD 405 is a nonvolatile and rewritable magnetic memory. The sound-source storing unit 406 stores therein sound sources, and sound is played back by the CPU 402 reading the stored sound sources. For example, the sound sources include a CD and a DVD.
The attenuating unit 500 attenuates the input signal by multiplying the input signal by a given coefficient ATT. Here, ATT has a range of 0 to 1, for example ATT=0.5. The attenuating unit 500 attenuates the signal represented by SL by ATT, and outputs the attenuated signal to the delaying unit 510.
The delaying unit 510 includes a delay device 511, a bandpass filter 512, and an adding unit 513. The delay device 511 delays the signal input from the attenuating unit 500 by an amount set for each band. After the delaying, the delay device 511 inputs the delayed signals to the bandpass filter 512.
The bandpass filter 512 includes N bandpass filters, where N is the number of bands into which the band of the signal SL is divided. In the case of 6 bands, N=6, and in the case of 9 bands, N=9. Like the delay device 511, the bandpass filter 512 is divided according to the number of bands.
The bandpass filter 512 filters, band by band, the signals delayed by the delay device 511; each of the N separated signals is filtered according to its respective band. After the filtering, the bandpass filter 512 outputs the filtered signals to the adding unit 513. Although the signals here pass through the bandpass filter 512 after the delay device 511, the delay processing may instead be performed after the filtering. The adding unit 513 combines the delayed signals of all the bands and outputs the combined signal to the speaker 303.
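A batch sketch of the delaying unit 510 follows. It assumes that each of the N bandpass filters is an FIR filter given as a coefficient list and that each band delay d_i is an integer number of samples; both inputs are hypothetical. Because delay and FIR filtering are both linear and time-invariant, the two orderings described for the delaying unit give the same result, which the last lines check.

```python
def convolve(x, h):
    """Causal FIR filtering: linear convolution truncated to len(x)."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k < 0:
                break
            acc += hk * x[n - k]
        y[n] = acc
    return y


def delay(x, d):
    """Delay device: shift x right by d samples, zero-filled, same length."""
    return ([0.0] * d + list(x))[: len(x)]


def delaying_unit(x, delays, filters, delay_first=True):
    """Sum over the N bands of bandpass_i(delay(x, d_i)), like the adding unit 513.

    delay_first=False filters first and delays afterwards (the alternative order).
    """
    if len(delays) != len(filters):
        raise ValueError("one delay per band filter is required")
    out = [0.0] * len(x)
    for d, h in zip(delays, filters):
        band = convolve(delay(x, d), h) if delay_first else delay(convolve(x, h), d)
        out = [o + b for o, b in zip(out, band)]
    return out


x = [1.0, 0.5, -0.25, 0.0, 0.75, 0.0, 0.0, 0.0]
delays = [1, 3]                                  # hypothetical d_i per band
filters = [[0.5, 0.25], [0.2, -0.1, 0.05]]       # hypothetical FIR coefficients
a = delaying_unit(x, delays, filters, delay_first=True)
b = delaying_unit(x, delays, filters, delay_first=False)
print(max(abs(p - q) for p, q in zip(a, b)))     # ordering makes no difference
```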
A signal represented by SL and a signal represented by SR are input to the localization control apparatus 301. SL is output to the adding unit 602. Meanwhile, the same signal is input to the attenuating unit 610. SR is output to the adding unit 612. Meanwhile, the same signal is input to the attenuating unit 600.
Each of the delaying units 601 and 611 has the same configuration as the delaying unit 510 shown in
The adding unit 602 adds SL and the signal from the delaying unit 601, and outputs the added signal to the speaker 302. The adding unit 612 adds SR and the signal from the delaying unit 611, and outputs the added signal to the speaker 303.
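The two-channel structure can be sketched as follows. For brevity, a single broadband delay per direction stands in for the per-band delaying units 601 and 611. Attenuation and delay are set separately for the SL-to-303 path and the SR-to-302 path, so the two directions need not be symmetrical. All parameter values are illustrative.

```python
def delay(x, d):
    """Shift x right by d samples, zero-filled, keeping the original length."""
    return ([0.0] * d + list(x))[: len(x)]


def front_surround(sl, sr, att_l, d_l, att_r, d_r):
    """Return (signal to speaker 302, signal to speaker 303).

    SL goes directly to speaker 302 (adding unit 602) and is also attenuated
    (unit 610), delayed (unit 611) and added to SR for speaker 303 (unit 612).
    SR is handled the same way through units 600 and 601.
    """
    if len(sl) != len(sr):
        raise ValueError("SL and SR must have equal length")
    cross_to_303 = delay([att_l * s for s in sl], d_l)   # units 610 -> 611
    cross_to_302 = delay([att_r * s for s in sr], d_r)   # units 600 -> 601
    out_302 = [a + b for a, b in zip(sl, cross_to_302)]  # adding unit 602
    out_303 = [a + b for a, b in zip(sr, cross_to_303)]  # adding unit 612
    return out_302, out_303


out_302, out_303 = front_surround(
    [1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0],
    att_l=0.5, d_l=2, att_r=0.25, d_r=1,
)
print(out_302, out_303)
```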
The sound pressure indicated by the bar 702 is slightly smaller than that indicated by the bar 701. As a result, the sound image 700 is formed near the speaker 302, and the listener feels as though the sound is coming from the sound image 700.
Not only does the speaker 302 shown in
As a result, the sound pressure indicated by the bar 802 becomes relatively smaller than that indicated by the bar 801. Consequently, the sound image 800 moves rearward or to the side of the listener 304, and the listener 304 feels as though the sound is coming from the sound image 800.
Thus, shifting the waveform by applying a delay achieves substantially the same effect as reversing its phase. In other words, the interaural level difference is changed, which enables the position of the sound image to be changed. In practice, because the wavelength differs from band to band, the effective delay also differs. Delay is therefore applied independently to each band.
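The equivalence between delay and phase reversal can be checked numerically. For a steady sinusoid of frequency f, a delay of fs/(2f) samples shifts it by half a period, which is the same as multiplying it by -1; the required delay therefore depends on the band. The sampling rate of 48 kHz is an assumption, and the computed delays hold exactly only at each band's centre frequency. The text itself treats the per-band delay as a parameter to be tuned.

```python
import math

FS = 48_000  # sampling rate in Hz (assumed; the text does not state one)
CENTERS_HZ = (125, 250, 500, 1000, 2000, 4000)  # band centres given in the text


def half_period_delay_samples(f_hz, fs=FS):
    """Delay, in samples, that shifts a sinusoid of frequency f_hz by half a period."""
    return fs / (2.0 * f_hz)


for fc in CENTERS_HZ:
    print(fc, "Hz:", half_period_delay_samples(fc), "samples")

# A half-period delay turns a steady sinusoid into its negative (phase reversal).
f = 1000.0
d = int(half_period_delay_samples(f))  # 24 samples at 48 kHz
x = [math.sin(2 * math.pi * f * n / FS) for n in range(480)]
delayed = [0.0] * d + x[:-d]
max_err = max(abs(delayed[n] + x[n]) for n in range(d, len(x)))
print("max |delayed + original| after start-up:", max_err)
```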
The attenuated signal is stored in a buffer (step S1103). In other words, a_in is stored in a buffer c_buffer( ) for delay. Usually, a circular buffer of a fixed length is used as the buffer. An output sample out is initialized (step S1104).
A band counter i is initialized to 1 (step S1105). The band is divided into six bands having center frequencies of 125, 250, 500, 1 k, 2 k, and 4 k (Hz). At this time, the bandwidth of each bandpass filter is 1/10 ct. A linear-phase FIR filter is preferably used as each bandpass filter, but an IIR filter may be substituted when the amount of computation must be reduced.
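One way to obtain a linear-phase FIR bandpass filter for a band is the windowed-sinc method, sketched below as an assumption rather than as the patent's own design procedure. The band edges are placed at fc*2^(-b/2) and fc*2^(+b/2) for a bandwidth of b octaves, treating the text's bandwidth figure as a fractional-octave parameter. The one-octave, 1023-tap example is illustrative only. Very narrow bands at low centre frequencies need long FIR filters, which is where substituting an IIR filter saves computation.

```python
import math


def sinc(x):
    """Normalised sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def fir_bandpass(fc_hz, octaves, fs, taps):
    """Hamming-windowed sinc bandpass, geometrically centred on fc_hz.

    taps must be odd so the filter is symmetric (type I linear phase).
    """
    if taps < 3 or taps % 2 == 0:
        raise ValueError("use an odd number of taps >= 3")
    f1 = fc_hz * 2 ** (-octaves / 2) / fs  # lower edge, cycles/sample
    f2 = fc_hz * 2 ** (octaves / 2) / fs   # upper edge, cycles/sample
    m = (taps - 1) / 2
    h = []
    for n in range(taps):
        t = n - m
        ideal = 2 * f2 * sinc(2 * f2 * t) - 2 * f1 * sinc(2 * f1 * t)  # LP(f2) - LP(f1)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))  # Hamming
        h.append(ideal * window)
    return h


def gain_at(h, f_hz, fs):
    """Magnitude of the FIR frequency response at f_hz."""
    w = 2 * math.pi * f_hz / fs
    re = sum(c * math.cos(w * n) for n, c in enumerate(h))
    im = -sum(c * math.sin(w * n) for n, c in enumerate(h))
    return math.hypot(re, im)


FS = 48_000  # assumed sampling rate
h = fir_bandpass(1000, 1.0, FS, 1023)  # illustrative one-octave band at 1 kHz
for f in (100, 1000, 4000):
    print(f, "Hz gain:", round(gain_at(h, f, FS), 4))
```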
A sample at a shift position is retrieved (step S1106). Specifically, in c_buffer( ), a sample that is d(i) samples before the current time is retrieved and regarded as a value bpf_in(i). As shown in
bpf_in(i) is filtered using a filter coefficient coef(i) (the filter coefficient is a vector value) (step S1107). The counter is incremented (step S1108). Specifically, a value of i is incremented by 1.
It is determined whether i is greater than n (here, 6) (step S1109). When i is not greater than n (step S1109: NO), the process returns to step S1106. When i is greater than n (step S1109: YES), the sample is output (step S1110). The sample is output to the channel corresponding to the right speaker (the left speaker in the case of SR input), and the series of processing ends. When the next sampling time arrives, the processing is repeated from the beginning.
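Steps S1103 to S1110 can be sketched per sample as follows. The sketch makes several assumptions that the flowchart leaves implicit: the input is multiplied by ATT inside process(), the per-band FIR coefficient vectors coef(i) are supplied by the caller, each band keeps its own filter history, and the filtered band outputs are accumulated into out, as the adding unit 513 does. The class and attribute names other than c_buffer, a_in, bpf_in, d(i), and coef(i) are hypothetical.

```python
from collections import deque


class BandDelayProcessor:
    """Per-sample sketch of steps S1103-S1110 for one input channel (e.g. SL).

    att:    attenuation coefficient ATT (0..1)
    delays: d(i), delay in samples for band i
    coefs:  coef(i), FIR coefficient vector of the band-i bandpass filter
    """

    def __init__(self, att, delays, coefs):
        if len(delays) != len(coefs):
            raise ValueError("need one delay d(i) per coefficient vector coef(i)")
        if min(delays) < 0:
            raise ValueError("delays must be non-negative")
        self.att = att
        self.delays = list(delays)
        self.coefs = [list(c) for c in coefs]
        self.size = max(self.delays) + 1  # fixed-length circular buffer
        self.c_buffer = [0.0] * self.size
        self.pos = 0  # write position of the current sample
        # Each band's FIR keeps its own input history, newest sample first.
        self.hist = [deque([0.0] * len(c), maxlen=len(c)) for c in self.coefs]

    def process(self, sample):
        a_in = self.att * sample                  # attenuation by ATT
        self.c_buffer[self.pos] = a_in            # S1103: store in c_buffer
        out = 0.0                                 # S1104: initialise output sample
        for i in range(len(self.delays)):         # S1105, S1108, S1109: band loop
            d = self.delays[i]
            bpf_in = self.c_buffer[(self.pos - d) % self.size]  # S1106: d(i) samples back
            self.hist[i].appendleft(bpf_in)
            out += sum(c * v for c, v in zip(self.coefs[i], self.hist[i]))  # S1107
        self.pos = (self.pos + 1) % self.size
        return out                                # S1110: sample for the opposite channel


proc = BandDelayProcessor(att=1.0, delays=[0, 3], coefs=[[1.0], [0.5, 0.5]])
print([proc.process(s) for s in [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
```

A deque with maxlen is used so that pushing a new bpf_in value drops the oldest one automatically; a real-time implementation would more likely use fixed arrays.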
The center attenuating unit 1200 and the right attenuating unit 1210 attenuate the input signals by multiplying them by a given coefficient ATT. The coefficient ATT has a range of 0 to 1 and can be, for example, ATT=0.5. The center attenuating unit 1200 attenuates the signal represented by SL by ATT and outputs the attenuated signal to a center delaying unit 1201. The right attenuating unit 1210 attenuates the signal represented by SL by ATT and outputs the attenuated signal to a right delaying unit 1211.
The center delaying unit 1201 and the right delaying unit 1211 each have the same configuration as the delaying unit 510 shown in
According to the above example, the localization control apparatus has a simple configuration in which one parameter is provided for each band, so the coefficients can easily be tuned and customized to each person's circumstances. Therefore, a filter coefficient can easily be generated according to the transfer characteristic, which varies from listener to listener with differing listening circumstances and head shapes. Even when sound whose phase difference has been changed is played back through one speaker, the difference in sound quality from the original sound is hardly perceptible. Therefore, localization control is enabled without drastically changing the sound quality of the original sound.
Even in the special case of a front surround system, in which a logically optimal solution does not always exist, a subjectively optimal solution can be found instead of a logically optimal one, and a sound image can be localized according to that solution.
Since one parameter is provided for each band to accommodate head shapes and playback circumstances that differ from user to user, and the parameters are easy to set by ear, personalization is easily achieved. Since the processing imposes no frequency characteristic on the signal, playback is enabled without losing the original sound quality.
Since reverse phase is not used, the extra multiplication by (−1) is unnecessary. In general, the sensation of reverse phase is one of the most disliked artifacts in the playback of virtual sound images. In contrast, a method that uses only delay enables playback of more natural sound and does not process the sound source more than necessary.
Even if the speakers are disposed symmetrically with respect to a listener, in many cases the coefficient that makes the virtual sound image spread most widely is not symmetrical. The circumstances of the room, asymmetry of the auditory sense, and the like are considered to be factors. In the present method, however, the delay for each band need not be symmetrical between left and right, so the delay is set independently for the left and right channels.
The pair of speakers used can be changed for each band. In particular, a center speaker may be used as the speaker to which the delayed signal is applied for middle-to-high frequencies. The delay device and the bandpass filter may be integrated into a single filter; an all-pass filter (IIR) that changes only the phase, or an FIR filter, may be used.
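A first-order all-pass filter is sketched below as one possible instance of the phase-only IIR option. The patent does not specify this particular filter, and the coefficient a is a hypothetical tuning parameter. The filter's magnitude response is exactly 1 at every frequency, while its phase, and hence its effective delay, varies with frequency. At low frequencies the phase delay approaches (1 - a)/(1 + a) samples.

```python
import cmath
import math


def allpass1(x, a):
    """First-order IIR all-pass: y[n] = a*x[n] + x[n-1] - a*y[n-1]."""
    if not -1.0 < a < 1.0:
        raise ValueError("|a| < 1 is required for stability")
    y, x1, y1 = [], 0.0, 0.0
    for s in x:
        out = a * s + x1 - a * y1
        y.append(out)
        x1, y1 = s, out
    return y


def response(a, f_hz, fs):
    """Frequency response H(e^{jw}) = (a + e^{-jw}) / (1 + a*e^{-jw})."""
    z1 = cmath.exp(-1j * 2 * math.pi * f_hz / fs)
    return (a + z1) / (1 + a * z1)


def phase_delay_samples(a, f_hz, fs):
    """Phase delay -angle(H)/w, in samples, at frequency f_hz."""
    w = 2 * math.pi * f_hz / fs
    return -cmath.phase(response(a, f_hz, fs)) / w


FS = 48_000  # assumed sampling rate
for f in (100, 1000, 10000):
    print(f, "Hz: |H| =", abs(response(0.3, f, FS)),
          "phase delay =", round(phase_delay_samples(0.3, f, FS), 4), "samples")
```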
In bands in which the wavelength is shorter than the size of the head (middle-to-high frequencies of 1.5-2 kHz), the area within which energy can be reduced by the combination of waveforms is small. Therefore, even if energy is reduced at one point, energy may increase at a neighboring point only a few centimeters away. In some cases, sound intended to be localized to the left is localized at the right speaker if the head moves only slightly.
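The relation between band and wavelength can be checked with lambda = c / f. The speed of sound (343 m/s) and the head dimensions of 0.17 m and 0.23 m are assumed values. They are used only to show that wavelengths become comparable to head size at roughly 1.5 to 2 kHz.

```python
SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C (assumed)


def wavelength_m(f_hz, c=SPEED_OF_SOUND):
    """Acoustic wavelength in metres for frequency f_hz."""
    return c / f_hz


def frequency_for_wavelength(length_m, c=SPEED_OF_SOUND):
    """Frequency whose wavelength equals length_m."""
    return c / length_m


for f in (125, 250, 500, 1000, 1500, 2000, 4000):
    print(f"{f:5d} Hz -> {wavelength_m(f):.3f} m")

# Hypothetical head dimensions (metres) chosen only to show the order of magnitude.
for head in (0.17, 0.23):
    print(f"wavelength = {head} m at about {frequency_for_wavelength(head):.0f} Hz")
```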
Using the center speaker as the speaker to which delay is applied prevents reverse localization, a phenomenon in which sound intended to be localized to the left is localized to the right. This overcomes the shortcoming of conventional HRTF-based localization control, in which the sense of localization becomes unstable when the head moves. Additionally, the amount of computation and the memory used for coefficients can be reduced. Furthermore, this localization control apparatus is applicable to home theater systems and to personal surround systems for devices such as PDPs and other flat-screen TVs, PCs, and portable DVD players.
The localization control method explained in the present embodiment can be implemented by a computer, such as a personal computer or a workstation, executing a program prepared in advance. This program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read from the recording medium by the computer. The program may also be distributed through a transmission medium such as a network, for example the Internet.
Number | Date | Country | Kind |
---|---|---|---|
2005-303560 | Oct 2005 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2006/320324 | 10/11/2006 | WO | 00 | 4/17/2008 |