This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2014-105825 filed on May 22, 2014, the entire contents of which are incorporated herein by reference.
The embodiments disclosed herein relate to a voice processing device, a voice processing method, and a voice processing program for controlling, for example, a voice signal.
In recent years, voice processing devices and software applications that utilize the Voice over Internet Protocol (VoIP), in which a voice signal is converted into packets and transferred in real time over the Internet, have come into wide use. A voice processing device or a software application that utilizes VoIP has, in addition to the advantage that communication may be performed among a plurality of users without the intervention of a public switched telephone network, the further advantage that text data or image data may be transmitted and received during communication. Further, for example, Goode, B., "Voice over Internet Protocol (VoIP)," Proceedings of the IEEE, vol. 90, issue 9, September 2002, discloses a method by which, in a voice processing device that utilizes VoIP, the influence of variation in communication delay over the Internet is moderated by a buffer of the voice processing device.
Since a voice processing device that utilizes VoIP uses an existing Internet network rather than a public switched telephone network that occupies a line, a delay of approximately 300 msec occurs before a voice signal arrives as communication reception sound. Therefore, for example, when a plurality of users perform voice communication, users who are far from each other hear the voices of their counterparts only as communication reception sound. Users who are near to each other, however, hear each other's voices both as communication reception sound and as direct sound, in an overlapping relationship with a time lag of approximately 300 msec between them. This phenomenon gives rise to the problem that it becomes rather difficult for the users to hear the sound. It is an object of the present embodiments to provide a voice processing device that makes the sound easier to listen to.
In accordance with an aspect of the embodiments, a voice processing device includes a processor and a memory which stores a plurality of instructions which, when executed by the processor, cause the processor to execute: receiving, through a communication network, a first voice of a first user and a second voice of a second user inputted to a first microphone positioned nearer to the first user than to the second user, and a third voice of the first user and a fourth voice of the second user inputted to a second microphone positioned nearer to the second user than to the first user; calculating a first phase difference between the received first voice and the received second voice and a second phase difference between the received third voice and the received fourth voice; and performing at least one of: controlling transmission of the received second voice or the received fourth voice to a first speaker positioned nearer to the first user than to the second user on the basis of the first phase difference and the second phase difference, and controlling transmission of the received first voice or the received third voice to a second speaker positioned nearer to the second user than to the first user on the basis of the first phase difference and the second phase difference.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
These and/or other aspects and advantages will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
In the following, a working example of a voice processing device, a voice processing method, and a voice processing program according to one embodiment is described with reference to the drawings. It is to be noted that the working example does not restrict the technology disclosed herein.
In the working example 1, for the convenience of description, it is assumed that the first user and the second user exist at the same base (which may be referred to as a floor) and are positioned adjacent to each other. Further, a first voice of the first user and a second voice of the second user are inputted to the first microphone 9 (in other words, even if the second user performs utterance toward the second microphone 11, the first microphone 9 also picks up the utterance). Meanwhile, a third voice of the first user and a fourth voice of the second user are inputted to the second microphone 11 (in other words, even if the first user performs utterance toward the first microphone 9, the second microphone 11 also picks up the utterance). Here, the first and third voices are voices within an arbitrary time period (which may be referred to as a temporal segment) within which the first user performs utterance in a time series, and the second and fourth voices are voices within an arbitrary time period (which may be referred to as a temporal segment) within which the second user performs utterance in a time series. Further, the utterance contents of the first and third voices are the same as each other, and the utterance contents of the second and fourth voices are the same as each other. In other words, where a positional relationship among the first user, second user, first microphone 9, and second microphone 11 in
The reception unit 2 is, for example, a hardware circuit configured by hard-wired logic. The reception unit 2 may alternatively be a functional module implemented by a computer program executed by the voice processing device 1. The reception unit 2 receives a plurality of input voices (which may be referred to as a plurality of voices) inputted to the first microphone 9 to nth microphone 13 through the first terminal 6 to nth terminal 8 and the network 117 as an example of a communication network. It is to be noted that the process described corresponds to step S201 of the flow chart depicted in
The calculation unit 3 is, for example, a hardware circuit configured by hard-wired logic. The calculation unit 3 may alternatively be a functional module implemented by a computer program executed by the voice processing device 1. The calculation unit 3 receives a plurality of voices (which may be referred to as a plurality of input voices) including the first, second, third, and fourth voices from the reception unit 2. The calculation unit 3 distinguishes input voices inputted to the first and second microphones 9 and 11 between a voiced temporal segment and an unvoiced temporal segment and uniquely specifies the first, second, third, and fourth voices from within the voiced temporal segment.
First, a method for distinguishing an input voice between a voiced temporal segment and an unvoiced temporal segment by the calculation unit 3 is described. It is to be noted that the process described corresponds to step S202 of the flow chart depicted in
Here, details of a detection process of a voiced temporal segment and an unvoiced temporal segment by the calculation unit 3 are described.
In
It is to be noted that, in the (Expression 1) given above, n is a frame number successively applied to each frame after inputting of an acoustic frame included in the input voice is started (n is an integer equal to or greater than zero); M is a time length of one frame; t is time; and c(t) is an amplitude (power) of the input voice.
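Since (Expression 1) itself is not reproduced in this text, the per-frame sound volume S(n) described above can only be sketched. The following illustration assumes S(n) is the average power of the M samples of frame n; the function name and the use of average power (rather than, say, a logarithmic level) are assumptions, not taken from the source:

```python
def frame_volume(c, n, M):
    """Sound volume S(n) of frame n: the average power of the M samples
    c[n*M] .. c[(n+1)*M - 1] of the input amplitude sequence c.
    A hedged reading of Expression 1; the averaging form is assumed."""
    frame = c[n * M:(n + 1) * M]
    return sum(x * x for x in frame) / M
```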
The noise estimation unit 21 receives a sound volume S(n) of each frame from the sound volume calculation unit 20. The noise estimation unit 21 estimates noise in each frame and outputs a result of the noise estimation to the average SNR calculation unit 22. Here, for the noise estimation of each frame by the noise estimation unit 21, for example, a (noise estimation method 1) or a (noise estimation method 2) given below may be used.
(Noise Estimation Method 1)
The noise estimation unit 21 may estimate a magnitude (power) N(n) of noise in each frame n on the basis of the sound volume S(n) in the frame n, sound volume S(n-1) in the preceding frame (n-1), and magnitude N(n-1) of noise in accordance with the following expression:
It is to be noted that, in the (Expression 2) above, α and β are constants that may be determined experimentally. For example, α may be set to 0.9 and β to 2.0. The initial value N(−1) of the noise power may also be determined experimentally. In the (Expression 2) above, if the sound volume S(n) of the frame n does not vary by the fixed value β or more with respect to the sound volume S(n-1) of the immediately preceding frame n-1, then the noise power N(n) of the frame n is updated. On the other hand, if the sound volume S(n) of the frame n varies by the fixed value β or more with respect to the sound volume S(n-1) of the immediately preceding frame n-1, then the noise power N(n-1) of the immediately preceding frame n-1 is taken over as the noise power N(n) of the frame n. It is to be noted that the noise power N(n) may also be referred to as the noise estimation result described above.
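Because (Expression 2) is not reproduced here, the update rule of (noise estimation method 1) can only be sketched. The sketch below assumes an exponential-smoothing update with weight α for the frames in which the volume change stays below β; the function name and the exact smoothing form are assumptions:

```python
def estimate_noise_v1(volumes, alpha=0.9, beta=2.0, n_init=0.0):
    """Noise estimation method 1 (sketch). While S(n) changes by less than
    beta from S(n-1), the noise power is updated by exponential smoothing
    (assumed form, since Expression 2 is not reproduced); when the volume
    jumps by beta or more, N(n-1) is carried over unchanged."""
    noise = []
    prev_s, prev_n = None, n_init
    for s in volumes:
        if prev_s is None or abs(s - prev_s) < beta:
            n = alpha * prev_n + (1.0 - alpha) * s  # smooth toward S(n)
        else:
            n = prev_n  # large volume jump: keep previous noise estimate
        noise.append(n)
        prev_s, prev_n = s, n
    return noise
```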
(Noise Estimation Method 2)
The noise estimation unit 21 may update the magnitude of noise on the basis of the ratio between the sound volume S(n) of the frame n and the noise power N(n-1) of the immediately preceding frame n-1 and in accordance with the following (Expression 3):
It is to be noted that, in the (Expression 3) above, γ is a constant that may be determined experimentally. For example, γ may be set to 2.0. The initial value N(−1) of the noise power may also be determined experimentally. In the (Expression 3) above, if the sound volume S(n) of the frame n is equal to or smaller than γ times the noise power N(n-1) of the immediately preceding frame n-1, then the noise power N(n) of the frame n is updated. On the other hand, if the sound volume S(n) of the frame n is greater than γ times the noise power N(n-1) of the immediately preceding frame n-1, then the noise power N(n-1) of the immediately preceding frame n-1 is taken over as the noise power N(n) of the frame n.
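(Expression 3) is likewise not reproduced, so (noise estimation method 2) can only be sketched under assumptions. The version below keys the decision on the ratio of S(n) to N(n-1), as the text describes; the smoothing coefficients and the function name are assumptions:

```python
def estimate_noise_v2(volumes, gamma=2.0, n_init=1.0):
    """Noise estimation method 2 (sketch). The noise power is updated only
    while the frame volume stays at or below gamma times the previous
    estimate; otherwise the previous estimate is carried over. The
    smoothing update itself is an assumed form."""
    noise, prev_n = [], n_init
    for s in volumes:
        if s <= gamma * prev_n:
            prev_n = 0.9 * prev_n + 0.1 * s  # assumed smoothing update
        # else: volume far above noise floor, keep prev_n unchanged
        noise.append(prev_n)
    return noise
```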
Referring to
It is to be noted that, in the (Expression 4) above, L may be set to a value greater than the general length of an assimilated sound and may, for example, be determined as the number of frames corresponding to 0.5 msec.
The temporal segment determination unit 23 receives the average SNR from the average SNR calculation unit 22. The temporal segment determination unit 23 has a buffer or a cache, not depicted, in which a flag n_breath indicative of whether or not the frame processed immediately before by the temporal segment determination unit 23 is within a voiced temporal segment (in other words, within a breath temporal segment) is retained. On the basis of the average SNR and the flag n_breath, the temporal segment determination unit 23 detects a start end tb of a voiced temporal segment in accordance with the following (Expression 5) and detects a last end te of the voiced temporal segment in accordance with the following (Expression 6):
tb=n×M (Expression 5)
(if n_breath=not voiced temporal segment and SNR(n)>THSNR)
te=n×M−1 (Expression 6)
(if n_breath=voiced temporal segment and SNR(n)<THSNR)
Here, THSNR is a threshold value for the consideration by the temporal segment determination unit 23 that the processed frame does not have noise and may be determined experimentally. Further, the temporal segment determination unit 23 may detect a temporal segment of an input voice other than voiced temporal segments as an unvoiced temporal segment.
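The detection by (Expression 5) and (Expression 6) can be sketched directly from the two conditions above. The function below is a hypothetical illustration (the name and the list-of-pairs output format are assumptions); it scans the per-frame average SNR, opens a segment at tb = n×M when the SNR rises above THSNR, and closes it at te = n×M−1 when the SNR falls below:

```python
def detect_voiced_segments(snr, M, th_snr):
    """Detect voiced temporal segments from per-frame average SNR values:
    a segment starts at tb = n*M when SNR(n) > th_snr while outside a
    segment (Expression 5) and ends at te = n*M - 1 when SNR(n) < th_snr
    while inside one (Expression 6)."""
    segments, in_voice, tb = [], False, 0
    for n, s in enumerate(snr):
        if not in_voice and s > th_snr:
            tb, in_voice = n * M, True         # Expression 5
        elif in_voice and s < th_snr:
            segments.append((tb, n * M - 1))   # Expression 6
            in_voice = False
    return segments
```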
Now, a method of uniquely specifying a first voice, a second voice, a third voice, and a fourth voice from within a voiced temporal segment by the calculation unit 3 is described. It is to be noted that this process corresponds to step S203 of the flow chart depicted in
First, the calculation unit 3 identifies, for example, from the input voice inputted to the first microphone 9 and the input voice inputted to the second microphone 11, candidates for the first voice and the third voice, which represent the same utterance contents, on the basis of a first correlation between the first voice and the third voice. The calculation unit 3 calculates a first correlation R1(d) that is a cross-correlation between an arbitrary voiced temporal segment ci(t) included in the input voice inputted to the first microphone 9 and an arbitrary voiced temporal segment cj(t) included in the input voice inputted to the second microphone 11 in accordance with the following expression:
It is to be noted that, in the (Expression 7) above, tbi is a start point of the voiced temporal segment ci(t), and tei is an end point of the voiced temporal segment ci(t). Further, tbj is a start point of the voiced temporal segment cj(t), and tej is an end point of the voiced temporal segment cj(t). Further, m=tbj−tbi, and L=tei−tbi.
Further, when the maximum value of the first correlation R1(d) is higher than an arbitrary threshold value MAX_R (for example, MAX_R=0.95), the calculation unit 3 decides, in accordance with the expression given below, that the utterance contents within the voiced temporal segment ci(t) and within the voiced temporal segment cj(t) are same as each other (in other words, the calculation unit 3 associates the first voice and the third voice with each other).
It is to be noted that, if, in the (Expression 8) above, a difference |(tei−tbi)−(tej−tbj) | between lengths of the voiced temporal segments is greater than an arbitrary threshold value TH_dL (for example, TH_dL=1 second), then the voiced temporal segments may be excluded from a determination target in advance by determining that the utterance contents therein are different from each other. While the description of the working example 1 is directed to the identification method of candidates for the first voice and the third voice, the identification method of candidates for the first voice and the third voice may be similarly applied also to the identification method of candidates for the second voice and the fourth voice. The calculation unit 3 identifies candidates, for example, for the second voice and the fourth voice, which have the same utterance contents, from the input voice inputted from the first microphone 9 and the input voice inputted from the second microphone 11 on the basis of a second correlation R2(d) between the second voice and the fourth voice. To the second correlation R2(d), the right side of the (Expression 7) given hereinabove may be applied as it is.
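The matching decision of (Expression 7) and (Expression 8) can be sketched as follows. Since (Expression 7) itself is not reproduced, the sketch assumes a standard normalized cross-correlation maximized over all lags; the function name and the default th_dl of 8000 samples (1 second at an assumed 8 kHz sampling rate) are assumptions:

```python
import numpy as np

def segments_match(ci, cj, max_r=0.95, th_dl=8000):
    """Decide whether two voiced segments carry the same utterance contents
    (a sketch of Expressions 7 and 8). Segments whose lengths differ by
    more than th_dl samples are rejected up front, as described above;
    otherwise the maximum of the normalized cross-correlation over all
    lags is compared with the threshold max_r (MAX_R in the text)."""
    if abs(len(ci) - len(cj)) > th_dl:
        return False  # length pre-filter noted under Expression 8
    ci = (np.asarray(ci) - np.mean(ci)) / (np.std(ci) + 1e-12)
    cj = (np.asarray(cj) - np.mean(cj)) / (np.std(cj) + 1e-12)
    r = np.correlate(ci, cj, mode="full") / min(len(ci), len(cj))
    return float(np.max(r)) > max_r
```

The same routine may be applied unchanged to candidates for the second and fourth voices, mirroring the reuse of the right side of (Expression 7) for R2(d).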
Then, the calculation unit 3 identifies the voiced temporal segments associated with each other determining that they have the same utterance contents in regard to whether each of the voiced temporal segments includes the utterance of the first user or of the second user. For example, the calculation unit 3 compares average Root Mean Square (RMS) values representing voice levels (which may be referred to as amplitudes) of the two voiced temporal segments associated with each other determining that, for example, they have the same utterance contents (in other words, candidates for the first voice and the third voice or candidates for the second voice and the fourth voice identified in accordance with the (Expression 7) and the (Expression 8) given hereinabove). Then, the calculation unit 3 specifies the microphone from which the input voice including the voiced temporal segment that has a comparatively high value from between the average RMS values is inputted and may specify the user on the basis of the specified microphone. Further, by specifying the user, it is possible to uniquely specify the first voice and the second voice or to uniquely specify the third voice and the fourth voice. For example, if the positional relationship of the first user, second user, first microphone 9, and second microphone 11 in
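The average-RMS comparison described above can be sketched simply. The function below is a hypothetical illustration (its name and its 0/1 return convention are assumptions): it computes the RMS of each of the two matched segments and reports which microphone's copy is louder, which in turn identifies the speaker nearer to that microphone:

```python
import math

def louder_segment(seg_a, seg_b):
    """Compare the average RMS values of two matched voiced segments and
    return 0 if the first microphone's copy is louder, 1 otherwise,
    mirroring the speaker-identification comparison described above."""
    rms = lambda seg: math.sqrt(sum(x * x for x in seg) / len(seg))
    return 0 if rms(seg_a) >= rms(seg_b) else 1
```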
The estimation unit 4 of
dm=(first phase difference+second phase difference)/2×vs (Expression 9)
It is to be noted that, in the (Expression 9) above, vs is the speed of sound. Instead of estimating the distance, the estimation unit 4 may use the comparison between the first and second phase differences simply to calculate the total value of the first and second phase differences. The estimation unit 4 outputs the estimated distance between the first microphone 9 and the second microphone 11, or the total value of the first and second phase differences, to the controlling unit 5.
Here, the technological significance of estimating the distance between the first microphone 9 and the second microphone 11 through comparison of the first and second phase differences by the estimation unit 4 is described. As a result of intensive verification by the inventors of the present technology, the technological matters described below were newly found. For example, when the first microphone 9 and the second microphone 11, or the first terminal 6 and the second terminal 7, are compared with each other, if one of the two microphones or the two terminals is subject to an additional process such as, for example, noise reduction or velocity adjustment, then a delay Δt occurs as a result of the additional process. The delay Δt is caused also by a difference between the line speed between the first terminal 6 and the network 117 and the line speed between the second terminal 7 and the network 117. Although the delay Δt caused by the difference in line speeds does not originate from an additional process, for the convenience of description the term Δt is used for both in a unified manner.
Further, a qualitative reason why the distance between the first microphone 9 and the second microphone 11 may be estimated accurately through comparison between the first and second phase differences by the estimation unit 4 is described. Since the first voice and the third voice of the first user are inputted to the first microphone 9 and the second microphone 11, respectively, a phase difference between the input voices of the first user to the first microphone 9 and the second microphone 11 may be obtained. Further, since the second voice and the fourth voice of the second user are inputted to the first microphone 9 and the second microphone 11, respectively, a phase difference between the input voices of the second user to the first microphone 9 and the second microphone 11 may be obtained.
Here, for example, where the delay amount until the input voice reaches the reception unit 2 of the voice processing device 1 differs between the first microphone 9 and the second microphone 11, if the phase difference between the voices of the first user is determined with reference to the first microphone 9 used by the first user, then the determined phase difference equals the total of the phase difference caused by the distance between the users and the delay of the other microphone (the second microphone 11) relative to the reference microphone (the first microphone 9). Therefore, the phase difference between the voices of the first user is the total of the delay amount caused by the distance between the first user and the second user and the delay amount of the second microphone 11 relative to the first microphone 9. Meanwhile, the phase difference between the voices of the second user is the total of the delay amount caused by the distance between the first user and the second user and the delay amount of the first microphone 9 relative to the second microphone 11. Since the delay amount of the second microphone 11 relative to the first microphone 9 and the delay amount of the first microphone 9 relative to the second microphone 11 are equal in absolute value but opposite in sign, combining the phase difference in the voice of the first user with the phase difference in the voice of the second user removes both of these relative delay amounts from the result.
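The cancellation just described is exactly what (Expression 9) exploits. The function below is a hypothetical illustration (its name and the value of vs are assumptions): averaging the two measured phase differences cancels the device-dependent delay, which enters the two measurements with opposite signs, leaving only the propagation delay between the users:

```python
def estimate_distance(phase_diff_user1, phase_diff_user2, vs=340.0):
    """Expression 9: dm = (first phase difference + second phase
    difference) / 2 * vs. The device delay appears as +delta in one
    measurement and -delta in the other, so the average removes it."""
    return (phase_diff_user1 + phase_diff_user2) / 2.0 * vs
```

For instance, with a true acoustic delay of 0.01 s and a relative device delay of 0.004 s, the two measured phase differences would be 0.014 s and 0.006 s, and the average recovers 0.01 s, i.e. about 3.4 m at 340 m/s.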
Referring to
Further, when the estimated distance between the first microphone 9 and the second microphone 11 or the total value of the first and second phase differences is equal to or greater than the given first threshold value, the controlling unit 5 controls transmission of a plurality of voices (for example, the second voice and the fourth voice) other than the first voice or the third voice to the first speaker 10, and controls transmission of a plurality of voices (for example, the first voice and the third voice) other than the second voice or the fourth voice to the second speaker 12. In particular, when the estimated distance or the total value of the phase differences is equal to or greater than the first threshold value, this signifies that the distance between the first user and the second user is great, so the users hear the voices of their counterparts only as communication reception sound. Therefore, the controlling unit 5 controls the first speaker 10 to output voices other than the first voice or the third voice, which are voices of the first user, and controls the second speaker 12 to output voices other than the second voice or the fourth voice, which are voices of the second user. As a result of this control, the first user or the second user is freed from the situation in which his or her own voice is heard from both communication reception sound and direct sound in a superposed relationship with a time lag between them. Therefore, there is an advantage that the voices may be heard easily.
In the voice processing device 1 of the working example 1, when a plurality of users communicate with each other, the distance between the users is estimated accurately. Further, where the distance between the users is small, the users are freed from the situation in which the voices of their counterparts are heard from both communication reception sound and direct sound in a superposed relationship with a time lag between them. Therefore, the voices may be heard more easily.
While, in the description of the working example 1, a voice process whose subject is a first user and a second user is described, also where three or more users communicate with each other, the present embodiment may accurately estimate the distances between the users. Therefore, in the description of a working example 2, a voice process whose subject is the first terminal 6 corresponding to the first user to the nth terminal 8 corresponding to the nth user of
The calculation unit 3 determines a reference voice and stores a terminal number of an origination source of the reference voice into n (step S803). In particular, at step S803, the calculation unit 3 calculates, for each voiced temporal segment of each of the plurality of input voices, a voice level vi in accordance with the following expression:
In the (Expression 10) above, ci(t) is an input voice i from the ith terminal, and vi is the voice level of the input voice i. tbi and tei are a start frame (which may be referred to as a start point) and an end frame (which may be referred to as an end point) of a voiced temporal segment of the input voice i, respectively. Then, the calculation unit 3 compares the plurality of voice levels vi calculated in accordance with the (Expression 10) above with each other and estimates the terminal of the input voice i having the highest value as the origination source of the utterance. In the description of the working example 2, it is assumed for the convenience of description that the terminal number estimated as the origination source is n (the nth terminal 8).
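The selection of the reference voice can be sketched as follows. Since (Expression 10) itself is not reproduced, the sketch assumes the voice level vi is the mean absolute amplitude over the voiced segment; the function name and that choice of level measure are assumptions:

```python
def reference_terminal(segments):
    """Pick the origination-source terminal: the index of the voiced
    segment with the highest level v_i (a sketch of Expression 10, using
    mean absolute amplitude as the level, which is an assumed form)."""
    levels = [sum(abs(x) for x in seg) / len(seg) for seg in segments]
    return max(range(len(levels)), key=lambda i: levels[i])
```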
The calculation unit 3 sets i=0 (step S804) and then determines whether or not the conditions at step S805 (that i is not equal to n and that a voiced temporal segment of ci(t) and a voiced temporal segment of cn(t) are the same as each other) are satisfied, for example, on the basis of the (Expression 7) and the (Expression 8) given above. If the conditions at step S805 are satisfied (Yes at step S805), then the calculation unit 3 specifies the input voice i that satisfies the condition of the same voiced temporal segment as the mth matching input voice km. It is to be noted that, if the conditions at step S805 are not satisfied (No at step S805), then the processing advances to step S809.
θ(n, km)=tbn−tbkm (Expression 11)
Then, the calculation unit 3 refers to the table 92 to decide whether or not the phase difference θ(km, n) between the input voice n and the input voice km is recorded already in the table 92 (step S807). If the phase difference θ(km, n) is recorded already (Yes at step S807), then the calculation unit 3 updates the value of the table 92 on the basis of the expression given below (step S808). It is to be noted that, if the condition at step S807 is not satisfied (No at step S807), then the processing advances to step S809.
θ(n, km)=(θ(n, km)+θ(km, n))/2
θ(km, n)=(θ(n, km)+θ(km, n))/2 (Expression 12)
In the (Expression 12) above, θ(km, n) has a value calculated in accordance with the following expression when the terminal number estimated as the origination source is km and the voiced temporal segment of ckm(t) is same as the voiced temporal segment of cn(t):
θ(km, n)=tbkm−tbn (Expression 13)
It is to be noted that an initial value of the table 92 may be set to a value equal to or higher than an arbitrary threshold value TH_OFF indicating that the distance between the terminals (between the microphones) is sufficiently great. The value of the threshold value TH_OFF may be, for example, 30 ms, which is the phase difference arising from a distance of approximately 10 m. Alternatively, the initial value may be set to inf, indicating a value equal to or higher than any value that may be set.
After the process at step S808 is completed, or in the case of No at step S805 or No at step S807, the calculation unit 3 increments i (step S809) and then decides whether or not i is smaller than the number of terminals (step S810). If the condition at step S810 is satisfied (Yes at step S810), then the processing returns to step S805. If the condition at step S810 is not satisfied (No at step S810), then the voice processing device 1 completes the process depicted in the flow chart of
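The table update of steps S806 to S808 can be sketched as follows. The function below is a hypothetical illustration (its name, the dictionary representation of the table 92, and the use of the TH_OFF initial value as a "not yet measured" marker are assumptions): it records θ(n, km) = tbn − tbkm (Expression 11) and, when the opposite-direction entry θ(km, n) (Expression 13) was measured earlier, replaces both entries with their average (Expression 12):

```python
def update_phase_table(table, n, km, tb_n, tb_km, th_off=0.03):
    """Sketch of steps S806-S808 on the table 92, represented here as a
    dict keyed by (source, other) terminal pairs. Entries still at the
    th_off initial value are treated as not yet measured."""
    theta = tb_n - tb_km                           # Expression 11
    if abs(table.get((km, n), th_off)) < th_off:   # opposite entry measured?
        avg = (theta + table[(km, n)]) / 2.0       # Expression 12
        table[(n, km)] = avg
        table[(km, n)] = avg
    else:
        table[(n, km)] = theta
    return table
```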
Now, a controlling method of an output voice based on the table 92 by the voice processing device 1 is described.
Then, the controlling unit 5 initializes the terminal number k (k not equal to n) to 0 (step S1004). The controlling unit 5 refers to the table 92 to detect the inter-terminal phase difference θ(n, k) between the terminal number n and the terminal number k in regard to the terminal numbers k (k not equal to n, k=0, . . . , N-1) other than the terminal number n and decides whether or not the inter-terminal phase difference θ(n, k) is equal to or greater than the threshold value TH_OFF (step S1005). If the condition at step S1005 is not satisfied (No at step S1005), then the processing advances to step S1007. If the condition at step S1005 is satisfied (Yes at step S1005), then the controlling unit 5 updates the output voice on(t) in accordance with the following expression (step S1006):
on(t)=on(t)+ck(t) (Expression 14)
After the process at step S1006 is completed, or in the case of No at step S1005, the controlling unit 5 increments k (step S1007) and decides whether or not k is smaller than the number of terminals N (step S1008). If the condition at step S1008 is satisfied (Yes at step S1008), then the processing returns to step S1005. If the condition at step S1008 is not satisfied (No at step S1008), then the controlling unit 5 outputs the output voice on(t) to the terminal of terminal number n (step S1009). Then, the controlling unit 5 increments n (step S1010) and decides whether or not n is smaller than the number of terminals (step S1011). If the condition at step S1011 is satisfied (Yes at step S1011), then the processing returns to step S1003. If the condition at step S1011 is not satisfied (No at step S1011), then the voice processing device 1 completes the process illustrated in the flow chart of
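The output control loop can be sketched for a single terminal n. The function below is a hypothetical illustration (its name, the dict representation of the table 92, and treating missing entries as "distant" via the TH_OFF default are assumptions): following (Expression 14) and the design intent of the working example 1, it mixes into o_n(t) only the voices of terminals whose phase-difference entry indicates a large distance, so that nearby users, who already hear each other directly, are not echoed:

```python
def mix_output(voices, table, n, th_off=0.03):
    """Sketch of steps S1004-S1009: build terminal n's output o_n(t) by
    summing (Expression 14) the voices of every other terminal whose
    phase-difference entry is at or above th_off (sufficiently distant);
    voices of nearby terminals are left out."""
    out = [0.0] * len(voices[n])
    for k, ck in enumerate(voices):
        if k == n:
            continue  # never feed a user's own voice back
        if table.get((n, k), th_off) >= th_off:  # unmeasured pairs default to distant
            out = [o + c for o, c in zip(out, ck)]
    return out
```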
The computer 100 is controlled entirely by a processor 101. To the processor 101, a Random Access Memory (RAM) 102 and a plurality of peripheral apparatuses are coupled through a bus 109. It is to be noted that the processor 101 may be a multiprocessor. Further, the processor 101 is, for example, a Central Processing Unit (CPU), a Micro Processing Unit (MPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC) or a Programmable Logic Device (PLD). Further, the processor 101 may be a combination of two or more of a CPU, an MPU, a DSP, an ASIC, and a PLD. It is to be noted that, for example, the processor 101 may execute processes of functional blocks such as the reception unit 2, calculation unit 3, estimation unit 4, controlling unit 5 and so forth depicted in
The RAM 102 is used as a main memory of the computer 100. The RAM 102 temporarily stores at least part of a program of an Operating System (OS) and application programs to be executed by the processor 101. Further, the RAM 102 stores various data to be used for processing by the processor 101. The peripheral apparatuses coupled to the bus 109 include a Hard Disk Drive (HDD) 103, a graphic processing device 104, an input interface 105, an optical drive unit 106, an apparatus coupling interface 107, and a network interface 108.
The HDD 103 performs writing and reading out of data magnetically on and from a disk built therein. The HDD 103 is used, for example, as an auxiliary storage device of the computer 100. The HDD 103 stores a program of an OS, application programs, and various data. It is to be noted that also a semiconductor storage device such as a flash memory may be used as an auxiliary storage device.
A monitor 110 is coupled to the graphic processing device 104. The graphic processing device 104 controls the monitor 110 to display various images on a screen in accordance with an instruction from the processor 101. The monitor 110 may be a display unit that uses a Cathode Ray Tube (CRT), a liquid crystal display unit or the like.
To the input interface 105, a keyboard 111 and a mouse 112 are coupled. The input interface 105 transmits a signal sent thereto from the keyboard 111 or the mouse 112 to the processor 101. It is to be noted that the mouse 112 is an example of a pointing device and may be configured using a different pointing device. As the different pointing device, a touch panel, a tablet, a touch pad, a track ball and so forth are available.
The optical drive unit 106 performs reading out of data recorded on an optical disc 113 utilizing a laser beam or the like. The optical disc 113 is a portable recording medium on which data are recorded so as to be read by reflection of light. As the optical disc 113, a Digital Versatile Disc (DVD), a DVD-RAM, a Compact Disc Read Only Memory (CD-ROM), a CD-R (Recordable)/RW (ReWritable) and so forth are available. A program stored on the optical disc 113 serving as a portable recording medium is installed into the voice processing device 1 through the optical drive unit 106. The given program installed in the voice processing device 1 is enabled for execution.
The apparatus coupling interface 107 is a communication interface for coupling a peripheral apparatus to the computer 100. For example, a memory device 114 or a memory reader-writer 115 may be coupled to the apparatus coupling interface 107. The memory device 114 is a recording medium that incorporates a communication function with the apparatus coupling interface 107. The memory reader-writer 115 is an apparatus that performs writing of data into a memory card 116 and reading out of data from the memory card 116. The memory card 116 is a card type recording medium.
The network interface 108 is coupled to the network 117. The network interface 108 performs transmission and reception of data to and from a different computer or a communication apparatus through the network 117. For example, the network interface 108 receives a plurality of input voices (which may be referred to as a plurality of voices) inputted to the first microphone 9 to nth microphone 13 depicted in
The computer 100 implements the voice processing function described hereinabove by executing a program recorded, for example, on a computer-readable recording medium. A program that describes the contents of processing to be executed by the computer 100 may be recorded on various recording media. The program may be configured from one or a plurality of functional modules. For example, the program may be configured from functional modules that implement the processes of the reception unit 2, calculation unit 3, estimation unit 4, controlling unit 5 and so forth depicted in
The components of the devices and the apparatus depicted in the figures need not necessarily be configured physically in such a manner as in the figures. In particular, a particular form of integration or disintegration of the devices and apparatus is not limited to that depicted in the figures, and all or part of the devices and apparatus may be configured in a functionally or physically integrated or disintegrated manner in an arbitrary unit in accordance with loads, use situations and so forth of the devices and apparatus. Further, the various processes described in the foregoing description of the working examples may be implemented by execution of a program prepared in advance by a computer such as a personal computer or a work station.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2014-105825 | May 2014 | JP | national |