1. Technical Field
The present invention relates to an audio data processing apparatus, an audio apparatus, an audio data processing method, a program, and a recording medium recording this program.
2. Description of Related Art
In recent years, research on audio systems employing the basic principles of wave field synthesis (WFS) has been actively carried out in Europe and other regions (for example, see Non-patent Document 1 (A. J. Berkhout, D. de Vries, and P. Vogel (The Netherlands), "Acoustic control by wave field synthesis," The Journal of the Acoustical Society of America (J. Acoust. Soc. Am.), Volume 93, Issue 5, May 1993, pp. 2764-2778)). WFS is a technique in which the wave front of sound emitted from a plurality of speakers arranged in the shape of an array (referred to as a "speaker array", hereinafter) is synthesized on the basis of Huygens' principle.
A listener listening to sound in front of a speaker array in the sound space provided by WFS feels as if the sound actually emitted from the speaker array were emitted from a sound source virtually present behind the speaker array (referred to as a "virtual sound source", hereinafter) (for example, see
WFS systems are applicable to movies, audio systems, televisions, AV racks, video conference systems, and video games. For example, in a case that the digital contents are a movie, the voice of each actor is recorded on a medium in the form of a virtual sound source. Thus, when an actor who is speaking moves inside the screen space, the virtual sound source can be located left, right, backward, forward, or in an arbitrary direction within the screen space in accordance with the direction of the actor's movement. For example, Patent Document 1 (Japanese Unexamined Patent Application Publication No. 2007-502590) describes a system achieving such movement of a virtual sound source.
In the physical phenomenon known as the Doppler effect, the observed frequency of a sound wave varies with the relative velocity between the sound source generating the wave and a listener. According to the Doppler effect, when the sound source approaches the listener, the sound waves are compressed and hence the frequency becomes higher. On the contrary, when the sound source departs from the listener, the sound waves are expanded and hence the frequency becomes lower. Note that even when the sound source moves, the total number of waves reaching the listener from the sound source does not change.
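As a concrete illustration of the relation just described, the classical Doppler formula for a stationary listener can be sketched as follows. This is standard physics, not part of the claimed apparatus; the frequency and velocity values are arbitrary examples.

```python
# Hedged sketch of the Doppler relation described above, for a stationary
# listener and a source moving along the line of sight.  The frequency and
# velocity values below are arbitrary examples, not taken from the text.

C = 343.0  # speed of sound in air (m/s), assumed constant

def observed_frequency(f_source, v_source):
    """Observed frequency for a stationary listener.
    v_source > 0 means the source approaches the listener;
    v_source < 0 means the source departs from the listener."""
    return f_source * C / (C - v_source)

# An approaching source is heard higher, a departing one lower.
f_approach = observed_frequency(440.0, 20.0)   # source closing at 20 m/s
f_depart = observed_frequency(440.0, -20.0)    # source receding at 20 m/s
```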
Nevertheless, the technique described in Non-patent Document 1 is premised on a virtual sound source that is fixed and does not move. Thus, the Doppler effect occurring in association with the movement of the virtual sound source is not taken into consideration. When the virtual sound source moves in a direction departing from or approaching the speaker, the number of waves of the audio signal providing the basis of the sound generated by the speaker changes, and this change causes distortion in the waveform. The listener perceives such distortion as noise. Thus, means for resolving the waveform distortion need to be provided. Details of the waveform distortion are described later.
On the other hand, in the method described in Patent Document 1, with the Doppler effect generated in association with the movement of the virtual sound source taken into consideration, a weight coefficient is changed for the audio data in a range from suitable sample data within a particular segment of the audio data providing the basis of the audio signal to suitable sample data in the next segment, so that the audio data in that range is corrected. Here, a "segment" is the unit of processing of audio data. When the audio data is corrected in this manner, extreme distortion in the audio signal waveform is resolved to some extent, and hence noise caused by the waveform distortion is reduced.
Nevertheless, the method described in Patent Document 1 merely smooths the audio data. That is, the method does not identify the waveform distortion in accordance with whether the virtual sound source is approaching or departing from the speaker and then perform a different correction in accordance with the identified waveform distortion. As a result, in the method described in Patent Document 1, waveform distortion frequently remains, and hence a problem arises that a satisfactory effect of avoiding noise caused by waveform distortion is not achieved.
The present invention has been devised in view of this problem. An object of the present invention is to provide an audio data processing apparatus and the like in which the part containing waveform distortion is identified depending on whether the virtual sound source is approaching or departing from the speaker and a different correction is then performed in accordance with the identified waveform distortion, so that waveform distortion generated when the virtual sound source moves is resolved and hence noise caused by the waveform distortion is avoided.
The audio data processing apparatus according to the present invention is an audio data processing apparatus that receives audio data corresponding to sound generated by a moving virtual sound source, a position of the virtual sound source, and a position of a speaker emitting sound on the basis of the audio data and that corrects the audio data on the basis of the position of the virtual sound source and the position of the speaker, the apparatus comprising: calculating means calculating first and second distances measured at two time points from the position of the speaker to the position of the virtual sound source; comparing means comparing the first and the second distances with each other; identifying means, when the first and the second distances are different from each other as a result of comparison, identifying a distorted part in the audio data at the two time points; and correcting means performing different correction on the audio data of the identified part depending on approaching or departing of the virtual sound source relative to the speaker.
In the audio data processing apparatus according to the present invention, the audio data contains sample data, the identifying means identifies a repeated part of the sample data caused by departing of the virtual sound source from the speaker, and the correcting means includes first correcting means correcting the identified repeated part.
In the audio data processing apparatus according to the present invention, the audio data contains sample data, the identifying means identifies a lost part of the sample data caused by approaching of the virtual sound source to the speaker, and the correcting means includes second correcting means correcting the preceding and the following parts of the identified lost part.
In the audio data processing apparatus according to the present invention, the audio data contains sample data, the identifying means identifies a repeated part of the sample data or a lost part of the sample data caused by approaching and departing of the virtual sound source relative to the speaker, and the correcting means includes: first correcting means correcting the identified repeated part; and second correcting means correcting the preceding and the following parts of the identified lost part.
In the audio data processing apparatus according to the present invention, the part to be processed by the correction has a time width equal to the difference between the propagation times of the sound waves over the first and the second distances, or a time width proportional to this difference.
In the audio data processing apparatus according to the present invention, the first correcting means replaces the sample data contained in the identified repeated part with sample data obtained by uniformly expanding, into twice the time width, one of two waveforms formed on the basis of the sample data.
In the audio data processing apparatus according to the present invention, the second correcting means replaces the sample data contained in the identified lost part and in the preceding and the following parts of the lost part with sample data obtained by uniformly compressing into ⅔ of the time width a waveform formed on the basis of the sample data.
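The expansion and compression recited above both amount to uniform time-scaling of a span of sample data. A minimal sketch using linear interpolation follows, assuming the scaling factors of 2 and ⅔ stated above; the interpolation method and the test waveform are illustrative assumptions, not the claimed implementation.

```python
# Hedged sketch: uniform time-scaling of a span of sample data by linear
# interpolation.  The factors 2 (expansion of the repeated part) and 2/3
# (compression around the lost part) follow the text; the interpolation
# method and the test waveform are illustrative assumptions only.

def rescale(samples, factor):
    """Stretch (factor > 1) or compress (factor < 1) a span of samples
    to round(len(samples) * factor) points by linear interpolation."""
    n_in = len(samples)
    n_out = max(1, round(n_in * factor))
    if n_out == 1:
        return [samples[0]]
    out = []
    for i in range(n_out):
        x = i * (n_in - 1) / (n_out - 1)   # position in the input span
        lo = int(x)
        hi = min(lo + 1, n_in - 1)
        frac = x - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out

span = [0.0, 0.2, 0.5, 0.9, 1.0, 0.9, 0.5, 0.2, 0.0]  # illustrative bump
expanded = rescale(span, 2.0)      # repeated part stretched to twice the width
compressed = rescale(span, 2 / 3)  # lost-part neighborhood squeezed to 2/3
```

Endpoints are preserved exactly, so the rescaled span joins its neighbors without introducing a new discontinuity.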
The audio data processing apparatus according to the present invention further comprises means performing gain control on the audio data corrected by the correcting means.
In the audio data processing apparatus according to the present invention, the number of the virtual sound sources is unity or a plurality.
The audio apparatus according to the present invention is an audio apparatus that uses audio data corresponding to sound generated by a moving virtual sound source, a position of the virtual sound source, and a position of a speaker emitting sound on the basis of the audio data and that thereby corrects the audio data on the basis of the position of the virtual sound source and the position of the speaker, the apparatus comprising: a digital contents input part receiving digital contents containing the audio data and the position of the virtual sound source; a contents information separating part analyzing the digital contents received by the digital contents input part and separating audio data and position data of the virtual sound source contained in the digital contents; an audio data processing part, on the basis of the position data of the virtual sound source separated by the contents information separating part and the position data of the speaker, correcting the audio data separated by the contents information separating part; and an audio signal generating part, on the basis of the corrected audio data, generating an audio signal to the speaker, wherein the audio data processing part includes: means calculating first and second distances measured at two time points from the position of the speaker to the position of the virtual sound source; means comparing the first and the second distances with each other; means, when the first and the second distances are different from each other as a result of comparison, identifying a distorted part in the audio data at the two time points; and means performing different correction on the audio data of the identified part depending on approaching or departing of the virtual sound source relative to the speaker.
In the audio apparatus according to the present invention, the digital contents input part receives digital contents from a recording medium storing digital contents, a server distributing digital contents through a network, or a broadcasting station broadcasting digital contents.
The audio data processing method according to the present invention is an audio data processing method employed in an audio data processing apparatus that receives audio data corresponding to sound generated by a moving virtual sound source, a position of the virtual sound source, and a position of a speaker emitting sound on the basis of the audio data and that corrects the audio data on the basis of the position of the virtual sound source and the position of the speaker, the method comprising: a step of calculating first and second distances measured at two time points from the position of the speaker to the position of the virtual sound source; a step of comparing the first and the second distances with each other; a step of, when the first and the second distances are different from each other as a result of comparison, identifying a distorted part in the audio data at the two time points; and a step of performing different correction on the audio data of the identified part depending on approaching or departing of the virtual sound source relative to the speaker.
The program according to the present invention is a program that receives audio data corresponding to sound generated by a moving virtual sound source, a position of the virtual sound source, and a position of a speaker emitting sound on the basis of the audio data and that corrects the audio data on the basis of the position of the virtual sound source and the position of the speaker, the program causing a computer to execute: a step of calculating first and second distances measured at two time points from the position of the speaker to the position of the virtual sound source; a step of comparing the first and the second distances with each other; a step of, when the first and the second distances are different from each other as a result of comparison, identifying a distorted part in the audio data at the two time points; and a step of performing different correction on the audio data of the identified part depending on approaching or departing of the virtual sound source relative to the speaker.
The recording medium according to the present invention records the above-mentioned program.
In the audio data processing apparatus according to the present invention, when the first and the second distances are different from each other, a distorted part is identified in the audio data at two time points. Then, different correction on the audio data of the identified part is performed depending on approaching or departing of the virtual sound source relative to the speaker. Thus, waveform distortion caused by the movement of the virtual sound source is resolved.
In the audio data processing apparatus according to the present invention, correction is performed on the repeated part of the sample data caused by departing of the virtual sound source relative to the speaker. Thus, waveform distortion generated when the virtual sound source is departing from the speaker is resolved.
In the audio data processing apparatus according to the present invention, correction is performed on the lost part of the sample data caused by approaching of the virtual sound source relative to the speaker. Thus, waveform distortion generated when the virtual sound source is approaching the speaker is resolved.
In the audio data processing apparatus according to the present invention, the repeated part of the sample data and the lost part of the sample data caused by approaching and departing of the virtual sound source relative to the speaker are corrected. Thus, waveform distortion generated when the virtual sound source is approaching and departing relative to the speaker is resolved.
In the audio data processing apparatus according to the present invention, correction by gain control is further performed on the sample data having undergone the above-mentioned correction. Thus, waveform distortion caused by approaching and departing of the virtual sound source relative to the speaker is corrected.
In the audio apparatus according to the present invention, when the first and the second distances are different from each other, a distorted part is identified in the audio data at two time points. Then, different correction on the audio data of the identified part is performed depending on approaching or departing of the virtual sound source relative to the speaker. Thus, an audio signal is outputted in which waveform distortion caused by the movement of the virtual sound source is resolved.
In the audio data processing method according to the present invention, when the first and the second distances are different from each other, a distorted part is identified in the audio data at two time points. Then, different correction on the audio data of the identified part is performed depending on approaching or departing of the virtual sound source relative to the speaker. Thus, waveform distortion caused by the movement of the virtual sound source is resolved.
In the program according to the present invention, when the first and the second distances are different from each other, a distorted part is identified in the audio data at two time points. Then, different correction on the audio data of the identified part is performed depending on approaching or departing of the virtual sound source relative to the speaker. Thus, waveform distortion caused by the movement of the virtual sound source is resolved.
In the computer-readable recording medium according to the present invention, when the first and the second distances are different from each other, a distorted part is identified in the audio data at two time points. Then, different correction on the audio data of the identified part is performed depending on approaching or departing of the virtual sound source relative to the speaker. Thus, waveform distortion generated when the virtual sound source moves is resolved.
According to the audio data processing apparatus and the like according to the present invention, audio data is corrected when the virtual sound source moves. Thus, waveform distortion caused by the movement of the virtual sound source is resolved and hence noise caused by the waveform distortion can be avoided.
The above and further objects and features will more fully be apparent from the following detailed description with accompanying drawings.
First, description is given for: a calculation model assuming that the virtual sound source does not move in the sound space provided by WFS; and a calculation model taking the movement of the virtual sound source into consideration. Then, an embodiment is described.
In the present calculation model, sample data at time t is generated for an audio signal provided to the m-th speaker (referred to as the “speaker 103—m”, hereinafter) contained in the speaker array 103. Here, as illustrated in
Here,
qn(t) is sample data at discrete time t of sound wave emitted from the n-th virtual sound source (referred to as the “virtual sound source 101—n”, hereinafter) among the N virtual sound sources 101 and then having reached the speaker 103—m among the M speakers, and
lm(t) is sample data at discrete time t of an audio signal provided to the speaker 103—m.
qn(t)=Gn·sn(t−τmn) (2)
Here,
Gn is a gain coefficient for the virtual sound source 101—n,
sn(t) is sample data at discrete time t of an audio signal provided to the virtual sound source 101—n, and
τmn is the number of samples corresponding to the sound wave propagation time corresponding to the distance between the position of the virtual sound source 101—n and the position of the speaker 103—m.
Here,
w is a weight constant,
rn is the position vector (fixed value) of the virtual sound source 101—n, and
rm is the position vector (fixed value) of the speaker 103—m.
⌊ ⌋ is the floor symbol,
R is the sampling rate, and
c is the speed of sound in air.
Here, the floor symbol expresses "the maximum integer among those not exceeding a given value". As seen from Equations (3) and (4), in the present calculation model, the gain coefficient Gn for the virtual sound source 101—n is inversely proportional to the square root of the distance from the virtual sound source 101—n to the speaker 103—m. This is because the set of the speakers 103—m is modeled as a line sound source. On the other hand, τmn is proportional to the distance from the virtual sound source 101—n to the speaker 103—m.
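Under the proportionalities just stated, Equations (3) and (4) can be sketched as follows. The helper names, the sampling rate R = 48000, and the positions are illustrative assumptions; only the inverse-square-root gain and the floored sample delay are taken from the text.

```python
# Hedged sketch of Equations (3) and (4) as characterized in the text:
# Gn is inversely proportional to the square root of the source-to-speaker
# distance, and tau_mn is the floor of the propagation delay in samples.
# The sampling rate R = 48000 and the positions are illustrative
# assumptions; the patent's exact constant factors are not reproduced.
import math

def gain_coefficient(w, r_n, r_m):
    """Gn = w / sqrt(|rn - rm|) for a line sound source model."""
    return w / math.sqrt(math.dist(r_n, r_m))

def delay_samples(r_n, r_m, R=48000, c=343.0):
    """tau_mn = floor(R * |rn - rm| / c): the floor of Equation (4)."""
    return math.floor(R * math.dist(r_n, r_m) / c)

g = gain_coefficient(1.0, (0.0, 4.0), (0.0, 0.0))  # source 4 m away
tau = delay_samples((0.0, 4.0), (0.0, 0.0))
```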
In Equations (1) to (4), it is premised that the virtual sound source 101—n does not move and stands still at a particular position. Nevertheless, in the real world, persons speak while walking, and automobiles run while generating engine sound. That is, in the real world, a sound source stands still in some cases and moves in other cases. Thus, in order to treat both cases, a new calculation model (the calculation model according to an embodiment) is introduced which takes into consideration a situation in which the sound source moves. This new calculation model is described below.
When a situation that the virtual sound source 101—n moves is taken into consideration, Equations (2) to (4) are replaced by Equations (5) to (7) given below.
qn(t)=Gn,t·sn(t−τmn,t) (5)
Here,
Gn,t is a gain coefficient for the virtual sound source 101—n at discrete time t, and
τmn,t is the number of samples corresponding to the sound wave propagation time corresponding to the distance between the virtual sound source 101—n and the speaker 103—m at discrete time t.
Here,
rn,t is the position vector of the virtual sound source 101—n at discrete time t.
Since the virtual sound source 101—n moves, as seen from Equations (5) to (7), the gain coefficient for the virtual sound source 101—n, the position of the virtual sound source 101—n, and the sound wave propagation time vary as a function of discrete time t.
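Equation (5) can be sketched as follows, with the signal sn modelled as a plain list of samples and the gain and delay supplied per discrete time t. Function and variable names are illustrative only.

```python
# Hedged sketch of Equation (5): the contribution q_n(t) of the moving
# virtual sound source n at speaker m, with the gain and the delay supplied
# per discrete time t.  The signal s_n is modelled as a plain list of
# samples; function and variable names are illustrative only.

def q_n(t, s_n, G_n_t, tau_mn_t):
    """q_n(t) = G_{n,t} * s_n(t - tau_{mn,t}); zero outside the signal."""
    idx = t - tau_mn_t
    if idx < 0 or idx >= len(s_n):
        return 0.0
    return G_n_t * s_n[idx]

s = [0.0, 1.0, 2.0, 3.0, 4.0]
value = q_n(4, s, G_n_t=0.5, tau_mn_t=2)  # 0.5 * s[2]
```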
In general, signal processing on audio data is performed segment by segment. A "segment" is the unit of processing of audio data and is also referred to as a "frame". For example, one segment is composed of 256 or 512 pieces of sample data. Thus, lm(t) (the sample data at discrete time t of the audio signal provided to the speaker 103—m) in Equation (1) is calculated segment by segment. In the present calculation model, the segment of audio data calculated at discrete time t and used for generating the audio signal provided to the speaker 103—m is expressed by a vector Lm,t. In this case, Lm,t is vector data constructed from "a" pieces of sample data (such as 256 or 512 pieces) contained in one segment extending from discrete time t−a+1 to discrete time t. Lm,t is expressed by Equation (8).
Lm,t=(lm(t−a+1), lm(t−a+2), . . . , lm(t)) (8)
Thus, for example, Lm,t0 at discrete time t0 is expressed by
Lm,t0=(lm(t0−a+1), lm(t0−a+2), . . . , lm(t0))
When this Lm,t0 is obtained, Lm,(t0+a) is then calculated.
Lm,(t0+a) is expressed by
Lm,(t0+a)=(lm(t0+1), lm(t0+2), . . . , lm(t0+a))
Since the audio data is processed segment by segment, it is practical that rn,t also is calculated segment by segment. However, the update frequency of rn need not necessarily agree with the segment unit. Then, comparison between the virtual sound source position rn,t0 at discrete time t0 and the virtual sound source position rn,t0−a at discrete time (t0−a) shows that the virtual sound source position varies by the distance that the virtual sound source 101—n has moved between discrete time (t0−a) and discrete time t0. The following description is given for: a case that the virtual sound source 101—n moves in a direction departing from the speaker 103—m (the virtual sound source 101—n is departing from the speaker 103—m); and a case that the virtual sound source 101—n moves in a direction approaching the speaker 103—m (the virtual sound source 101—n is approaching the speaker 103—m).
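The per-segment comparison of the two distances described above can be sketched as follows. The coordinate values and the exact-equality threshold are illustrative simplifications.

```python
# Hedged sketch of the per-segment decision described above: the first and
# the second distance are compared and the expected distortion is
# classified.  Coordinates are in metres; treating exact equality as
# "no movement" is a simplification for illustration.
import math

def classify_segment(r_n_prev, r_n_now, r_m):
    d_prev = math.dist(r_n_prev, r_m)  # first distance, at time t0 - a
    d_now = math.dist(r_n_now, r_m)    # second distance, at time t0
    if d_now > d_prev:
        return "departing: repeated samples expected"
    if d_now < d_prev:
        return "approaching: lost samples expected"
    return "no movement: no distortion"

kind = classify_segment((0.0, 2.0), (0.0, 3.0), (0.0, 0.0))
```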
Gn,t and τmn,t also vary in correspondence to the distance that the virtual sound source 101—n moves between discrete time (t0−a) and discrete time t0. The following Equations (9) and (10) express the amount of variation in the gain coefficient and the amount of variation in the number of samples corresponding to the sound wave propagation time, each varying in accordance with the distance that the virtual sound source 101—n has moved between discrete time (t0−a) and discrete time t0. For example, ΔGn,t0 expresses the amount of variation of the gain coefficient at discrete time t0 relative to the gain coefficient at discrete time (t0−a), and Δτmn,t0 expresses the amount of variation (also referred to as a "time width") of the number of samples corresponding to the sound wave propagation time at discrete time t0 relative to that at discrete time (t0−a). When the virtual sound source moves from discrete time (t0−a) to discrete time t0, these amounts of variation take either a positive or a negative value depending on the direction of movement of the virtual sound source 101—n.
When the virtual sound source 101—n is departing or approaching relative to the speaker 103—m, ΔGn,t0 and time width Δτmn,t0 arise and hence waveform distortion occurs at discrete time t0. Here, a state that “waveform distortion” has occurred indicates a state that the audio signal waveform does not vary continuously and does vary discontinuously to an extent that the part is perceived as noise by the listener.
For example, when the virtual sound source 101—n moves in a direction of departing from the speaker 103—m so that the sound wave propagation time increases, that is, when the time width Δτmn,t0 is positive, in the beginning part of the segment starting at discrete time t0, the audio data of the final part of the preceding segment appears again for the time width Δτmn,t0. In the following description, the preceding segment of the segment starting at discrete time t0 is referred to as a first segment, and the segment starting at discrete time t0 is referred to as a second segment. As a result, distortion occurs in the waveform.
On the other hand, when the virtual sound source 101—n moves in a direction of approaching the speaker 103—m so that the sound wave propagation time decreases, that is, when the time width Δτmn,t0 is negative, a loss of time width Δτmn,t0 is generated between the audio data of the final part of the first segment and the audio data of the beginning part of the second segment. As a result, a discontinuity point arises in the audio signal waveform. This is also waveform distortion. Detailed examples of distortion in the waveform are described below with reference to the drawings.
First, description is given for a case that the virtual sound source 101—n moves in a direction of departing from the speaker 103—m so that the sound wave propagation time corresponding to the distance between the position of the virtual sound source 101—n and the position of the speaker 103—m increases, that is, a case that the time width Δτmn,t0 is positive.
In the present example, it is assumed that the virtual sound source 101—n moves in a direction departing from the speaker 103—m so that the number of samples corresponding to the sound wave propagation time corresponding to the distance from the virtual sound source 101—n to the speaker 103—m in the second segment increases, for example, by five (=Δτmn,t0) points in comparison with that in the first segment. As a result of the increase in the sound wave propagation time, the sample data 308, 309, 310, 311, and 312 of the final part of the first segment illustrated in
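The repetition described in this example can be reproduced with a toy signal: when the per-segment delay grows by five samples, the last five output samples of the first segment reappear at the head of the second segment. The ramp signal, segment length, and delay values are arbitrary illustrative choices.

```python
# Hedged toy reproduction of the repetition described above: when the
# per-segment delay tau grows by five samples between consecutive segments,
# the last five output samples of the first segment reappear at the head of
# the second.  The ramp signal, segment length, and delays are arbitrary
# illustrative choices.

a = 16                     # samples per segment (illustrative)
s = list(range(200))       # source signal: a simple ramp

tau1, tau2 = 10, 15        # delay grew by delta_tau = 5 (departing source)
seg1 = [s[t - tau1] for t in range(a, 2 * a)]      # first segment output
seg2 = [s[t - tau2] for t in range(2 * a, 3 * a)]  # second segment output

overlap = seg2[:tau2 - tau1]        # head of the second segment
repeated = seg1[-(tau2 - tau1):]    # tail of the first segment
```

Since `overlap` equals `repeated`, the head of the second segment duplicates the tail of the first, which is exactly the waveform distortion the first correcting means addresses.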
Description is given below for the contrary case that the virtual sound source 101—n moves in a direction of approaching the speaker 103—m so that the sound wave propagation time decreases, that is, a case that the time width Δτmn,t0 is negative.
The reason why waveform distortion is generated when the virtual sound source 101—n moves has been described above. Next, an embodiment according to the present invention in which audio data is corrected so that waveform distortion is resolved is described in detail with reference to the drawings.
From a recording medium 1117 storing digital contents (such as movies, computer games, and music videos), the reproducing part 1109 reads appropriate digital contents and then outputs the contents to the contents information separating part 1102. The recording medium 1117 is composed of a CD-R (Compact Disc Recordable), a DVD (Digital Versatile Disk), a Blu-ray Disk (registered trademark), or the like. In the digital contents, a plurality of audio data files respectively corresponding to the virtual sound sources 101_1 to 101_N and virtual sound source position data corresponding to the virtual sound sources 101_1 to 101_N are recorded in a manner of correspondence to each other.
The communication interface part 1110 acquires digital contents from a server 1115 distributing digital contents via a communication network such as the Internet 1114, and then outputs the acquired contents to the contents information separating part 1102. Further, the communication interface part 1110 is provided with devices (not illustrated) such as an antenna and a tuner, and receives a program broadcasted from a broadcasting station 1116 and then outputs the received program as digital contents to the contents information separating part 1102.
The contents information separating part 1102 acquires digital contents from the reproducing part 1109 or the communication interface part 1110, and then analyzes the digital contents so as to separate audio data and virtual sound source position data from the digital contents. Then, the contents information separating part 1102 outputs the audio data and the virtual sound source position data obtained by the separation to the audio data storing part 1103 and the virtual sound source position data storing part 1104, respectively. For example, when the digital contents are a music video, the virtual sound source position data is position data corresponding to the relative positions of a singer and a plurality of musical instruments displayed on the video screen. The virtual sound source position data is stored in the digital contents together with the audio data.
The audio data storing part 1103 stores the audio data acquired from the contents information separating part 1102, and the virtual sound source position data storing part 1104 stores the virtual sound source position data acquired from the contents information separating part 1102. The speaker position data storing part 1106 acquires from the speaker position data input part 1105 the speaker position data specifying the within-the-sound-space positions of the speakers 103_1 to 103_M of the speaker array 103, and then stores the acquired data. The speaker position data is information set up by the user on the basis of the positions of the speakers 103_1 to 103_M constituting the speaker array 103. For example, this information is expressed with reference to coordinates in one plane (X-Y coordinate system) fixed to the audio apparatus within the sound space. The user operates the speaker position data input part 1105 so as to store the speaker position data into the speaker position data storing part 1106. In a case that arrangement of the speaker array 103 is determined in advance from a constraint on the practical mounting, the speaker position data is set up as fixed values. On the other hand, in a case that the user is allowed to determine the arrangement of the speaker array 103 arbitrarily to an extent, the speaker position data is set up as variable values.
The audio data processing part 1101 reads from the audio data storing part 1103 the audio files corresponding to the virtual sound sources 101_1 to 101_N. Further, the audio data processing part 1101 reads from the virtual sound source position data storing part 1104 the virtual sound source position data corresponding to the virtual sound sources 101_1 to 101_N. Further, the audio data processing part 1101 reads from the speaker position data storing part 1106 the speaker position data corresponding to the speakers 103_1 to 103_M of the speaker array 103. On the basis of the virtual sound source position data and the speaker position data having been read, the audio data processing part 1101 performs the processing according to the embodiment onto the read-out audio data. That is, the audio data processing part 1101 performs arithmetic processing on the basis of the above-mentioned calculation model in which the movement of the virtual sound sources 101_1 to 101_N is taken into consideration, so as to generate audio data used for forming audio signals to be provided to the speakers 103_1 to 103_M. The audio data generated by the audio data processing part 1101 is outputted as audio signals through the D/A conversion part 1107, and then outputted through the amplifiers 1108_1 to 1108_M to the speakers 103_1 to 103_M. On the basis of these audio signals, the speakers 103_1 to 103_M generate and emit sound to the sound space.
The distance data calculating part 1201 acquires the virtual sound source position data and the speaker position data respectively from the virtual sound source position data storing part 1104 and the speaker position data storing part 1106, then, on the basis of these data, calculates distance data (|rn,t−rm|) between the virtual sound source 101_n and each of the speakers 103_1 to 103_M, and then outputs the calculated data to the sound wave propagation time data calculating part 1202 and the gain coefficient data calculating part 1204. On the basis of the distance data (|rn,t−rm|) acquired from the distance data calculating part 1201, the sound wave propagation time data calculating part 1202 calculates sound wave propagation time data (the number of samples corresponding to the sound wave propagation time) τmn,t (see Equation (7)). The sound wave propagation time data buffer 1203 acquires the sound wave propagation time data τmn,t from the sound wave propagation time data calculating part 1202, and then temporarily stores the sound wave propagation time data corresponding to plural segments. On the basis of the distance data (|rn,t−rm|) acquired from the distance data calculating part 1201, the gain coefficient data calculating part 1204 calculates gain coefficient data Gn,t (see Equation (6)).
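The calculations performed by the distance data calculating part 1201, the sound wave propagation time data calculating part 1202, and the gain coefficient data calculating part 1204 may be sketched as follows. Since Equations (6) and (7) are not reproduced in this section, a 1/distance amplitude decay and a delay of d·fs/c samples are assumed purely for illustration; the function names and constants are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, sound speed in air at room temperature (assumed)
SAMPLE_RATE = 44100      # samples/s (assumed)

def distance(source_pos, speaker_pos):
    """Euclidean distance |r_n,t - r_m| between a virtual sound source
    and a speaker, both given as (x, y) coordinates in the plane fixed
    to the audio apparatus."""
    return math.dist(source_pos, speaker_pos)

def propagation_time_samples(d):
    """Sound wave propagation time expressed as a sample count
    (cf. Equation (7)); a simple rounding of d*fs/c is assumed here."""
    return round(d * SAMPLE_RATE / SPEED_OF_SOUND)

def gain_coefficient(d):
    """Gain coefficient for the distance d (cf. Equation (6));
    a 1/d amplitude decay is assumed here."""
    return 1.0 / max(d, 1e-6)
```

A source 3.43 m from a speaker would thus be delayed by 441 samples at 44.1 kHz under these assumptions.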
The input audio data buffer 1206 acquires from the audio data storing part 1103 the input audio data corresponding to the virtual sound sources 101_1 to 101_N, and then temporarily stores the input audio data corresponding to plural segments. For example, one segment is composed of 256 pieces of audio data or 512 pieces of audio data. Using the sound wave propagation time data τmn,t calculated by the sound wave propagation time data calculating part 1202 and the gain coefficient data Gn,t calculated by the gain coefficient data calculating part 1204, the output audio data generating part 1207 generates output audio data corresponding to the input audio data temporarily stored in the input audio data buffer 1206. The output audio data superposing part 1208 superposes the output audio data generated by the output audio data generating part 1207 so as to synthesize audio data, in accordance with the number of virtual sound sources 101.
Main operation in an embodiment is described below with reference to FIGS. 12 to 14. The input audio data buffer 1206 reads from the audio data storing part 1103 the input audio data of one segment extending from discrete time t1 to discrete time (t1+a−1), and then temporarily stores the read-out data. The following description is given with reference to
The distance data calculating part 1201 calculates the distance data (|r1,t1−r1|) expressing the distance at discrete time t1 between the first virtual sound source (referred to as the “virtual sound source 101_1”, hereinafter) and the first speaker (referred to as the “speaker 103_1”, hereinafter), and then outputs the calculated data to the sound wave propagation time data calculating part 1202 and the gain coefficient data calculating part 1204.
Using Equation (7), on the basis of the distance data (|r1,t1−r1|) acquired from the distance data calculating part 1201, the sound wave propagation time data calculating part 1202 calculates the sound wave propagation time data τ11,t1 and then outputs the calculated data to the sound wave propagation time data buffer 1203.
The sound wave propagation time data buffer 1203 stores the sound wave propagation time data τ11,t1 acquired from the sound wave propagation time data calculating part 1202. With reference to
Using Equation (6), on the basis of the distance data (|r1,t1−r1|) acquired from the distance data calculating part 1201, the gain coefficient data calculating part 1204 calculates gain coefficient data G1,t1 and then outputs the obtained result to the gain coefficient data buffer 1205. The gain coefficient data buffer 1205 stores the gain coefficient data G1,t1 in a form similar to the sound wave propagation time data buffer 1203. Here, the gain coefficient data buffers are prepared in a number equal to (the number of speakers)×(the number of virtual sound sources present at time t1). That is, at least M×N gain coefficient data buffers are prepared, and each buffer stores the gain coefficient data of the past two segments and the present gain coefficient data.
When the above-mentioned processing is repeated by a number of times equal to the number (M pieces) of speakers, the sound wave propagation time data τmn,t1 of the speakers 103_1 to 103_M are stored into the sound wave propagation time data buffer 1203 and the gain coefficient data Gn,t1 of the speakers 103_1 to 103_M are stored into the gain coefficient data buffer 1205.
Then, the input audio data buffer 1206 reads from the audio data storing part 1103 the input audio data within the next segment, that is, within one segment extending from discrete time (t1+a) to discrete time (t1+2a−1), and then temporarily stores the read-out data. Then, the sound wave propagation time data calculating part 1202 and the gain coefficient data calculating part 1204 perform the same processing as the above-mentioned one so as to calculate the sound wave propagation time data τmn,(t1+a) of the speakers 103_1 to 103_M and the gain coefficient data Gn,(t1+a), and then temporarily store the obtained data respectively into the sound wave propagation time data buffer 1203 and the gain coefficient data buffer 1205. At that time, the sound wave propagation time data buffer 1203 stores the sound wave propagation time data τmn,(t1−a), τmn,t1, and τmn,(t1+a) corresponding to three segments respectively starting at discrete time points (t1−a), t1, and (t1+a). Further, the gain coefficient data buffer 1205 stores the gain coefficient data Gn,(t1−a), Gn,t1, and Gn,(t1+a) corresponding to three segments respectively starting at discrete time points (t1−a), t1, and (t1+a).
For the virtual sound source 101_1, the output audio data generating part 1207 generates the output audio data used for forming audio signals to be provided to the speakers 103_1 to 103_M. In order to generate output audio data at discrete time t1, the output audio data generating part 1207 reads from the input audio data buffer 1206 the audio data from discrete time (t1−τmn,t1) to discrete time (t1−τmn,t1+a−1), and then multiplies each data piece by Gn,t1.
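The read-and-scale operation of the output audio data generating part 1207 described above can be sketched as a delay line: for a segment of length a starting at discrete time t1, the part reads the input audio data delayed by τmn,t1 samples and multiplies each sample by Gn,t1. The buffer representation (a list indexed by absolute discrete time) and the function name below are assumptions for illustration.

```python
def generate_output_segment(input_buffer, t1, a, tau, gain):
    """Generate one output segment of length a starting at discrete time t1.

    Reads the input samples from (t1 - tau) through (t1 - tau + a - 1),
    i.e. the input audio as it was tau samples ago, and multiplies each
    sample by the gain coefficient for this source/speaker pair."""
    return [gain * input_buffer[t - tau] for t in range(t1, t1 + a)]
```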
Here, it is assumed that between discrete time t1 and discrete time (t1+a), the virtual sound source 101_n moves in a direction of departing from the speaker 103_m. In this case, the sound wave propagation time data τmn,(t1+a) becomes greater than the sound wave propagation time data τmn,t1. Thus, since Δτmn,(t1+a)=τmn,(t1+a)−τmn,t1 holds, the time width Δτmn,(t1+a) becomes positive. In this case, in the beginning part of the segment starting at discrete time (t1+a), the audio data of the final part of the preceding segment, that is, the segment starting at discrete time t1, is repeated for the time width Δτmn,(t1+a). Here, it is assumed that also between discrete time (t1−a) and discrete time t1, the virtual sound source 101_n moves in a direction of departing from the speaker 103_m. Also in this case, the sound wave propagation time data τmn,t1 similarly becomes greater than the sound wave propagation time data τmn,(t1−a). Thus, since Δτmn,t1=τmn,t1−τmn,(t1−a) holds, the time width Δτmn,t1 becomes positive. Accordingly, in the audio data of the beginning part of the segment starting at discrete time t1, the audio data of the final part of the preceding segment, that is, the segment starting at discrete time (t1−a), is repeated for the time width Δτmn,t1. As a result, in the segment starting at discrete time t1, waveform distortion occurs relative to each of the preceding and the following segments, that is, the segment starting at discrete time (t1−a) and the segment starting at discrete time (t1+a). At this time point, that is, at discrete time t1, the to-be-corrected segment is the segment starting at discrete time t1. Then, the to-be-corrected intervals within the segment are two intervals consisting of the part of time width Δτmn,t1 samples that contains waveform distortion between this segment and the preceding segment and the part of time width Δτmn,(t1+a) samples that contains waveform distortion between this segment and the following segment.
At that time, obviously, to-be-corrected intervals consisting of the part of time width Δτmn,t1 samples and the part of time width Δτmn,(t1+a) samples are contained symmetrically also in the preceding and the following segments of the to-be-corrected segment. Then, in the preceding segment, the interval is already corrected in the correction processing performed for the time point of discrete time (t1−a). Further, in the following segment, the interval is corrected in the future in the correction processing performed for the time point of discrete time (t1+a). Here, the contents of processing are the same for the two correction intervals in the to-be-corrected segment. Thus, the following description, given with reference to the drawings, covers only the correction interval of the distorted waveform generated relative to the following segment, that is, the segment that follows the to-be-corrected segment.
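The sign convention described above can be summarized in a small helper: the widths of the two to-be-corrected intervals follow directly from the propagation time data of three consecutive segments, a positive width indicating repeated samples (a departing source) and a negative width indicating lost samples (an approaching source). This is a sketch; the function name is hypothetical.

```python
def correction_intervals(tau_prev, tau_cur, tau_next):
    """Widths (in samples) of the two to-be-corrected intervals of the
    current segment, relative to the preceding and the following segment.

    tau_prev, tau_cur, tau_next are the propagation time data for the
    segments starting at (t1-a), t1, and (t1+a), respectively.
    Positive width: samples are repeated (source departing).
    Negative width: samples are lost (source approaching)."""
    d_prev = tau_cur - tau_prev    # corresponds to delta-tau_{mn,t1}
    d_next = tau_next - tau_cur    # corresponds to delta-tau_{mn,(t1+a)}
    return d_prev, d_next
```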
With reference to
As a result, as seen from
Next, on the contrary to the above-mentioned example, it is assumed that the virtual sound source 101_n has moved in a direction of approaching the speaker 103_m between discrete time t1 and discrete time (t1+a). In this case, the sound wave propagation time data τmn,(t1+a) becomes smaller than the sound wave propagation time data τmn,t1. Thus, since Δτmn,(t1+a)=τmn,(t1+a)−τmn,t1 holds, the time width Δτmn,(t1+a) becomes negative. In this case, the audio data is lost relative to the segment starting at discrete time t1 and the segment starting at discrete time (t1+a). Further, it is assumed that the virtual sound source 101_n has moved in a direction of approaching the speaker 103_m also between discrete time (t1−a) and discrete time t1. Also in this case, the sound wave propagation time data τmn,t1 becomes smaller than the sound wave propagation time data τmn,(t1−a). Thus, since Δτmn,t1=τmn,t1−τmn,(t1−a) holds, the time width Δτmn,t1 becomes negative. In this case, the audio data is lost relative to the segment starting at discrete time (t1−a) and the segment starting at discrete time t1.
In the above-mentioned case, in the segment starting at discrete time t1, waveform distortion occurs relative to each of the preceding and the following segments, that is, the segment starting at discrete time (t1−a) and the segment starting at discrete time (t1+a). At this time, that is, at discrete time t1, similarly to the above-mentioned example, the to-be-corrected segment is the segment starting at discrete time t1. Then, the to-be-corrected intervals within the segment are two intervals consisting of the part of time width Δτmn,t1 samples that contains waveform distortion between this segment and the preceding segment and the part of time width Δτmn,(t1+a) samples that contains waveform distortion between this segment and the following segment. At that time, obviously, to-be-corrected intervals consisting of the part of time width Δτmn,t1 samples and the part of time width Δτmn,(t1+a) samples are contained symmetrically also in the preceding and the following segments of the to-be-corrected segment. Then, in the preceding segment, the interval is already corrected in the correction processing performed for the time point of discrete time (t1−a). Further, in the following segment, the interval is corrected in the future in the correction processing performed for the time point of discrete time (t1+a). Here, the contents of processing are the same for the two correction intervals in the to-be-corrected segment. Thus, the following description, given with reference to the drawings, covers only the correction interval of the distorted waveform generated relative to the following segment, that is, the segment that follows the to-be-corrected segment.
With reference to
Similar calculation is performed for the next virtual sound source 101_n. Then, the output audio data superposing part 1208 superposes the output audio data for each virtual sound source 101_n so as to synthesize audio data. This processing is repeated by a number of times equal to the number of speakers. In this example, it has been assumed that the correction width for the first part within the segment of the output audio data at discrete time t1 is equal to the time width Δτmn,t1. However, the correction width may be a multiple of the time width Δτmn,t1. Further, as illustrated in
As described above, audio data is corrected so that waveform distortion is resolved. Then, correction processing based on the gain coefficient data is further performed. As described above, when the virtual sound source 101_n moves, the gain coefficient data also varies. Thus, the gain coefficient is changed gradually along the correction interval width. When the gain coefficient for the to-be-corrected segment is G1 and the gain coefficient for the following segment of the to-be-corrected segment is G2, the gain coefficient for each point within the correction interval of interest becomes G=qG2+(1−q)G1. Here, q is changed from 0 to 0.5 along the correction interval. It should be noted that in the first correction interval within the following segment, at the time of correction processing, this q is changed from 0.5 to 1 along the correction interval. The change may be linear or may follow any other function. As a result, the gain coefficient is changed without causing waveform distortion.
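The cross-fade of the gain coefficient described above can be sketched as follows, assuming the linear change of q mentioned in the text; the function name is hypothetical. q runs from 0 to 0.5 over the correction interval at the end of the to-be-corrected segment, and from 0.5 to 1 over the matching interval at the start of the following segment, so that the two halves meet at the segment boundary.

```python
def crossfaded_gains(g1, g2, width, first_half=True):
    """Per-sample gain over a correction interval: G = q*G2 + (1-q)*G1.

    first_half=True  : q runs 0   -> 0.5 (end of to-be-corrected segment)
    first_half=False : q runs 0.5 -> 1   (start of the following segment)
    A linear change of q is assumed; the text allows any function."""
    denom = max(width - 1, 1)
    if first_half:
        qs = [0.5 * i / denom for i in range(width)]
    else:
        qs = [0.5 + 0.5 * i / denom for i in range(width)]
    return [q * g2 + (1 - q) * g1 for q in qs]
```

For G1 = 1.0 and G2 = 2.0, the first half runs from 1.0 to 1.5 and the second half continues from 1.5 to 2.0, so the gain transitions smoothly across the boundary.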
The above-mentioned processing is executed repeatedly in accordance with the number of the virtual sound sources 101_n and the number of the speakers 103_m, so that waveform distortion generated when the virtual sound source 101_n moves is resolved. This remarkably reduces the noise caused by the waveform distortion.
At step S15, when it is judged that the first and the second distances are different from each other (S15: YES), that is, when it is judged that the virtual sound source 101_n has moved, the audio data processing part 1101 identifies and corrects the part of waveform distortion caused by the relation between the to-be-corrected segment and the preceding segment (S16), and then performs gain control (S17). In contrast, when it is judged that the first and the second distances are the same (S15: NO), that is, when it is judged that the virtual sound source 101_n has stood still, the audio data processing part 1101 goes to the processing of step S18. Then, the audio data processing part 1101 compares the second and the third distances with each other similarly to step S15 (S18). At step S18, when it is judged that the second and the third distances are different from each other (S18: YES), that is, when it is judged that the virtual sound source 101_n has moved, the audio data processing part 1101 identifies and corrects the part of waveform distortion caused by the relation between the to-be-corrected segment and the following segment (S19), and then performs gain control (S20). In contrast, when it is judged that the second and the third distances are the same (S18: NO), that is, when it is judged that the virtual sound source 101_n has stood still, the audio data processing part 1101 goes to the processing of step S21. Then, the audio data processing part 1101 adds 1 to the number n of the virtual sound source 101_n (S21), and then judges whether the number n of the virtual sound source 101_n is equal to the maximum value N (S22).
As a result of the judgment at step S22, when the number n of the virtual sound source 101_n is not equal to the maximum value N (S22: NO), the audio data processing part 1101 returns to the processing of step S11, and then performs the processing of step S11 to step S21 for the second virtual sound source 101_2 and the first speaker 103_1. In contrast, when the number n of the virtual sound source 101_n is equal to the maximum value N (S22: YES), audio data is synthesized (S23). Then, the audio data processing part 1101 substitutes 1 into the number n of the virtual sound source 101_n (S24), and then adds 1 to the number m of the speaker (S25). Then, the audio data processing part 1101 judges whether the number m of the speaker is equal to the maximum value M (S26). When the number m of the speaker is not equal to the maximum value M (S26: NO), the audio data processing part 1101 returns to the processing of step S11. When the number m of the speaker is equal to the maximum value M (S26: YES), the audio data processing part 1101 terminates the processing.
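The control flow of steps S11 through S26 described above amounts to a nested loop: for each speaker, every virtual sound source is processed (distance comparison, distortion correction, gain control), after which the per-source outputs are synthesized. The following sketch captures only this loop structure; process_pair and synthesize are hypothetical callbacks standing in for steps S11 to S20 and step S23.

```python
def process_all(num_sources, num_speakers, process_pair, synthesize):
    """Loop structure of steps S11-S26: the inner loop runs over the
    virtual sound sources 101_1 to 101_N for a fixed speaker, and the
    outer loop runs over the speakers 103_1 to 103_M."""
    for m in range(1, num_speakers + 1):
        for n in range(1, num_sources + 1):
            process_pair(n, m)    # steps S11-S20 for source n, speaker m
        synthesize(m)             # step S23: superpose per-source outputs
```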
The audio data processing part 1101 replaces the sample data contained in the repeated part with other data (S33). With reference to
On the other hand, the audio data processing part 1101 compresses the region containing the lost part and the parts preceding and following it, and then replaces the sample data of the compressed region with other data (S34). For example, with reference to
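The replacement of the repeated or lost parts at steps S33 and S34 can be sketched as resampling the affected region: the samples of a region are mapped onto a longer or shorter run so that the waveform is stretched or compressed smoothly instead of jumping. Linear interpolation is assumed here for illustration; the actual replacement rule depends on the figures, which are not reproduced in this section.

```python
def resample_linear(samples, new_len):
    """Map a region of len(samples) points onto new_len points by linear
    interpolation, so a repeated or lost part and its neighborhood can be
    replaced with a smoothly stretched or compressed copy (a sketch)."""
    old_len = len(samples)
    if new_len == 1:
        return [samples[0]]
    out = []
    for i in range(new_len):
        pos = i * (old_len - 1) / (new_len - 1)  # fractional source index
        lo = int(pos)
        frac = pos - lo
        hi = min(lo + 1, old_len - 1)
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```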
The above-mentioned processing is executed in accordance with the number of the virtual sound sources 101 and the number of the speakers 103, so that waveform distortion generated when the virtual sound source 101 moves is resolved. This remarkably reduces the noise caused by the waveform distortion.
The program 231 is not limited to one read from the recording medium 230 and then stored into the EEPROM 24 or the internal storage device 25. That is, the program 231 may be stored in an external memory such as a memory card. In this case, the program 231 is read from an external memory (not illustrated) connected to the CPU 17, and then stored into the EEPROM 24 or the internal storage device 25. Alternatively, communication may be established between a communication part (not illustrated) connected to the CPU 17 and an external computer, and then the program 231 may be downloaded onto the EEPROM 24 or the internal storage device 25.
As this invention may be embodied in several forms without departing from the spirit of essential characteristics thereof, the present embodiments are therefore illustrative and not restrictive, since the scope of the invention is defined by the appended claims rather than by the description preceding them, and all changes that fall within metes and bounds of the claims, or equivalence of such metes and bounds thereof are therefore intended to be embraced by the claims.
Number | Date | Country | Kind |
---|---|---|---|
2009-279794 | Dec 2009 | JP | national |
This application is the national phase under 35 U.S.C. §371 of PCT International Application No. PCT/JP2010/071491 filed on Dec. 1, 2010, which claims priority under 35 U.S.C. 119(a) to Patent Application No. 2009-279794 filed in Japan on Dec. 9, 2009, all of which are hereby expressly incorporated by reference into the present application.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2010/071491 | 12/1/2010 | WO | 00 | 7/6/2012 |