1) Field of the Invention
The present invention relates to a speech input device for equipment that requires speech input, such as recording equipment, a cellular phone terminal, or a personal computer.
2) Description of the Related Art
In recent years, a data communication function for transmitting and receiving text data of up to several hundred characters is often installed as standard equipment, in addition to a telephone conversation function, in a portable terminal such as a cellular phone terminal or a personal handyphone system (PHS) terminal.
According to IMT-2000 (International Mobile Telecommunications-2000), which is a next-generation communication scheme, one portable terminal uses a plurality of lines, making it possible to perform data communication without disconnecting speech communication that is in progress. Accordingly, a portable terminal of this type may be used to input text by operating keys during a telephone conversation and then to perform data communication as well.
In recent years, attention has also been paid to an Internet Protocol (IP) telephone system, which offers lower call charges than an ordinary telephone call. This IP telephone system is also referred to as an Internet telephone system. It is a communication system that enables a telephone conversation similar to that of an ordinary telephone by exchanging speech data between IP telephone devices, each of which is provided with a microphone and a loudspeaker.
The IP telephone device is a computer that enables network communication and is equipped with an e-mail transmitting/receiving function through the operation of a man-machine interface such as a keyboard and a mouse.
However, if a man-machine interface (keys, a keyboard, or a mouse) is operated during a telephone conversation using a conventional portable terminal or IP telephone device, an operation sound (a click sound or the like), which is regarded as noise, is captured by the microphone and superimposed on the speech. As a result, tone quality deteriorates considerably.
To solve this problem, a noise elimination device may be employed to eliminate the noise (operation sound) component contained in the speech signal that is input into the microphone. With this method, however, the noise elimination device cannot predict when an operation sound will occur, and therefore noise elimination processing must always be applied to the sound signal that is input into the microphone. As a result, the noise elimination processing is applied to the sound signal even when no noise is present, which unavoidably deteriorates tone quality.
It is an object of the present invention to provide a speech input device capable of efficiently eliminating an operation sound regarded as noise that is produced when a man-machine interface is operated and enhancing tone quality.
The speech input device according to one aspect of this invention comprises a speech input unit which inputs speech, a detection unit which detects an operation of a man-machine interface, and a noise eliminator which eliminates a component of an operation sound of the man-machine interface from the speech that is input into the speech input unit within a period in which the operation is detected by the detection unit.
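The gating idea of this aspect can be illustrated with a minimal sketch. The function name `process_frames`, the frame representation, and the placeholder eliminator `zero_frame` are assumptions introduced only for illustration and do not appear in the embodiments; the sketch shows only that elimination is applied to speech captured within the operation-detected period and that all other speech passes through unchanged.

```python
from typing import Callable, List, Sequence


def process_frames(
    frames: Sequence[List[float]],
    operation_detected: Sequence[bool],
    eliminate: Callable[[List[float]], List[float]],
) -> List[List[float]]:
    """Apply `eliminate` only to frames captured while an operation is detected."""
    out = []
    for frame, detected in zip(frames, operation_detected):
        # Frames without a detected operation are copied unchanged, so clean
        # speech is never touched by the noise elimination processing.
        out.append(eliminate(frame) if detected else list(frame))
    return out


def zero_frame(frame: List[float]) -> List[float]:
    """Hypothetical placeholder eliminator used only for this demonstration."""
    return [0.0] * len(frame)


frames = [[0.1, 0.2], [0.9, -0.8], [0.3, 0.1]]
flags = [False, True, False]  # an operation is detected only during the second frame
cleaned = process_frames(frames, flags, zero_frame)
```

In this sketch, only the second frame is handed to the eliminator; the first and third frames are output as they were input.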
The speech input device according to another aspect of this invention comprises a speech input unit which inputs speech, and a control unit which outputs a control signal for controlling respective sections based on an operation signal indicating that a man-machine interface is operated. The speech input device also comprises a detection unit which detects an operation of the man-machine interface based on the control signal, and a noise eliminator which eliminates a component of an operation sound of the man-machine interface from the speech that is input into the speech input unit within a period in which the operation is detected by the detection unit.
The speech input device according to still another aspect of this invention comprises a speech input unit which inputs speech, a speech information accumulation unit which accumulates information on the speech that is input into the speech input unit, a detection unit which detects an operation of a man-machine interface, and a noise eliminator which reads the speech information from the speech information accumulation unit when the operation is detected by the detection unit, and which eliminates a component of an operation sound of the man-machine interface from the speech that is input into the speech input unit within an operation-detected period.
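As one possible illustration of the speech information accumulation unit, the sketch below keeps the most recent samples in a bounded buffer so that speech captured before the operation is still available when the operation is detected. The class name `SpeechBuffer` and the fixed capacity are assumptions made for this sketch; the text above does not state how the accumulation unit is sized or organized.

```python
from collections import deque


class SpeechBuffer:
    """Hypothetical accumulation unit that keeps the most recent `capacity` samples."""

    def __init__(self, capacity: int):
        # A deque with maxlen silently discards the oldest samples when full.
        self._samples = deque(maxlen=capacity)

    def write(self, samples):
        """Accumulate newly digitized speech samples."""
        self._samples.extend(samples)

    def read(self):
        """Return the accumulated samples, oldest first."""
        return list(self._samples)


buf = SpeechBuffer(capacity=4)
buf.write([1, 2, 3])
buf.write([4, 5])
history = buf.read()  # the oldest sample (1) has been discarded
```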
The speech input device according to still another aspect of this invention comprises a speech input unit which inputs speech, and a detection unit which detects an operation of a man-machine interface and outputs information for an operation time which corresponds to a start of the operation and an end of the operation. The speech input device also comprises a noise eliminator which eliminates a component of an operation sound of the man-machine interface from the speech that is input into the speech input unit within an operation-detected period, the period being determined based on the information for the operation time when the operation is detected by the detection unit.
The speech input method according to still another aspect of this invention comprises steps of inputting speech, detecting an operation of a man-machine interface, and eliminating a component of an operation sound of the man-machine interface from the speech that is input in the speech inputting step within a period in which the operation is detected in the detection step.
The speech input program according to still another aspect of this invention allows a computer to function as the components of the respective above-mentioned devices.
The speech input device according to still another aspect of this invention comprises a speech input unit which inputs speech, a detection unit which detects an operation of a man-machine interface, and a suppression processing unit which suppresses a period in which the operation of the man-machine interface is detected, in the speech that is input into the speech input unit within the period in which the operation is detected by the detection unit.
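A minimal sketch of such suppression is given below, assuming that suppression means scaling the samples within the operation-detected period by a gain smaller than one. The function name `suppress`, the sample-index interval, and the `gain` parameter are assumptions for illustration; the text above does not specify how the suppression processing unit attenuates the signal.

```python
def suppress(samples, start, end, gain=0.0):
    """Return a copy of `samples` with samples[start:end] scaled by `gain`."""
    if not 0 <= start <= end <= len(samples):
        raise ValueError("need 0 <= start <= end <= len(samples)")
    out = list(samples)
    for i in range(start, end):
        # Only the operation-detected period is attenuated.
        out[i] = out[i] * gain
    return out


speech = [0.5, 0.4, 0.9, -0.9, 0.3]
# Samples 2 and 3 are assumed to fall within the operation-detected period.
quiet = suppress(speech, 2, 4, gain=0.5)
```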
The speech input method according to still another aspect of this invention comprises steps of inputting speech, detecting an operation of a man-machine interface, and suppressing a period in which the operation of the man-machine interface is detected, in the speech that is input in the speech inputting step within the period in which the operation is detected in the detecting step.
The speech input program according to still another aspect of this invention allows a computer to function as the components of the above-mentioned device.
These and other objects, features and advantages of the present invention are specifically set forth in or will become apparent from the following detailed descriptions of the invention when read in conjunction with the accompanying drawings.
The present invention relates to a speech input device for equipment that requires speech input, such as recording equipment, a cellular phone terminal, or a personal computer. More particularly, the present invention relates to a speech input device capable of efficiently eliminating an operation sound (a click sound or the like), which is regarded as noise and is produced when a man-machine interface such as a key or a mouse is operated in parallel with speech input, and thereby enhancing tone quality.
Embodiments of the speech input device according to the present invention will be explained below in detail with reference to the drawings.
A key section 20 shown in
During this operation, an operation sound (click sound) is produced. This key click sound is captured by a microphone 60, explained later, during a telephone conversation and is input superimposed on the speaker's speech.
A key signal S1 that corresponds to a key code or the like is output from the key section 20 during the operation of the key section 20. A key entry detector 30 outputs a key detection signal S2 indicating that a corresponding key has been operated in response to input of the key signal S1.
A controller 40 generates a control signal (digital) based on the key signal S1 and controls respective sections. For example, the controller 40 performs controls such as interpreting text from the key signal S1 and displaying this text on a display 50 (see
The microphone 60 (see
A noise eliminator 90 functions to eliminate the component of the operation sound in an interval in which the component of the operation sound is superimposed on the speech signal from the first memory 80 as noise, while using the key detection signal S2 as a trigger.
Specifically, as will be explained later, the noise is eliminated by performing waveform interpolation (see
The write section 100 writes the speech signal (or the speech signal from which the operation sound component is eliminated) from the noise eliminator 90 in a second memory 110. An encoder 120 encodes the speech signal from the second memory 110. A transmitter 130 transmits the output signal of the encoder 120.
The operation of the first embodiment will next be explained with reference to flow charts shown in
At step SA1 shown in
Accordingly, the A/D converter 70 outputs the result of determination as “Yes” at step SA1. At step SA2, the A/D converter 70 digitizes the analog speech signal. At step SA3, the speech signal (digital) from the A/D converter 70 is stored in the first memory 80.
At step SA4, the noise eliminator 90 determines whether or not the key detection signal S2 is input from the key entry detector 30. In this case, it is assumed that the determination result is “No” and the speech signal from the first memory 80 is directly output to the write section 100. At step SA5, the write section 100 stores the speech signal in the second memory 110.
At step SA6, the encoder 120 encodes the speech signal from the second memory 110. At step SA7, the transmitter 130 transmits the output signal thus encoded. Thereafter, a series of operations are repeated while the speech signal having a waveform shown in
When the key section 20 is operated at time t0 (see
In response to this, the noise eliminator 90 outputs the determination result of step SA4 as “Yes” and executes waveform interpolation at step SA8. This waveform interpolation is the processing in which a waveform in an N sample interval longer than an interval from time t0 to time t1 during which the operation sound is superimposed on the speech, is interpolated by a waveform which is a waveform before time t0 and which has a high correlation coefficient (
Specifically, at step SB1 shown in
ps≦k≦pe
ps: starting point of search interval of k sample,
pe: end point of search interval of k sample,
x[]: input speech signal, and
t0: starting time of detecting operation sound.
The correlation coefficient represents the correlation between a waveform A in an M sample interval just before time t0 (see
At steps SB1 to SB5 to be explained next, while the M sample interval is shifted rightward one by one from the starting point ps within the search interval of k sample (“k sample search interval”), the coefficient of the correlation between the waveform A and a waveform (in the M sample interval) in the k sample search interval is calculated from the equation (1).
At step SB2, the noise eliminator 90 calculates the coefficient of the correlation between the waveform A and a waveform B at k=0, from the equation (1). At step SB3, the noise eliminator 90 stores, in a memory (not shown), information on each interval for which the correlation coefficient has been calculated (in this case, the M samples from the starting point ps) together with the calculated correlation coefficient. At step SB4, the noise eliminator 90 determines whether or not a waveform (the waveform B in this case) corresponding to the waveform A is in the k sample search interval and outputs a determination result of “Yes” in this case.
At step SB5, the noise eliminator 90 increments k in the equation (1) by one. Accordingly, a waveform which is shifted rightward from the waveform shown in
If the determination result at step SB4 becomes “No”, the noise eliminator 90 calculates time tL at which the correlation coefficient cor[k] becomes the highest from the following equation (2) at step SB6. The correlation coefficient cor[k] is calculated from the equation (1).
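The search of steps SB1 to SB6 can be sketched as follows. Because the equation (1) is not reproduced in this text, the sketch assumes a normalized correlation coefficient and an indexing in which the candidate at lag k is the M sample interval ending k samples before time t0; the function names `correlation` and `find_best_lag` are likewise assumptions and are not taken from the embodiment.

```python
import math


def correlation(a, b):
    """Normalized correlation coefficient of two equal-length sequences (assumed form)."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0


def find_best_lag(x, t0, m, ps, pe):
    """Return (k, cor[k]) for the lag k in [ps, pe] with the highest correlation.

    Waveform A is x[t0 - m:t0]; the candidate at lag k is x[t0 - m - k:t0 - k].
    """
    a = x[t0 - m:t0]
    best_k, best_cor = None, -math.inf
    for k in range(ps, pe + 1):
        start = t0 - m - k
        if start < 0:
            break  # the candidate would extend before the start of the buffer
        c = correlation(a, x[start:start + m])
        if c > best_cor:  # strict comparison keeps the smallest lag on ties
            best_k, best_cor = k, c
    return best_k, best_cor


# Test signal with an exact period of 8 samples.
x = [0, 1, 2, 1, 0, -1, -2, -1] * 8
k, cor = find_best_lag(x, t0=48, m=8, ps=1, pe=20)
```

Under this assumed indexing, the right end of the best-matching waveform lies at sample t0 - k, so for the periodic test signal the search selects the lag of one period.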
In the equation (2), “arg max(cor[k])” is a function which indicates that the time tL at which the correlation coefficient cor[k] becomes the highest is to be calculated in the period from the starting point ps to the end point pe shown in
At step SB7, the noise eliminator 90 interpolates a waveform (which includes an operation sound component) in an N sample interval from time t0 by the waveform in an N sample interval from time tm indicating the right end of the waveform C. Accordingly, in the first embodiment, the waveform is interpolated by the waveform D as shown in
As explained so far, according to the first embodiment, when the operation of the key section 20 which serves as the man-machine interface is detected, the waveform interpolation shown in
In the first embodiment, the configuration example in which the key detection signal S2 is output based on the key signal S1 from the key section 20 shown in
This key entry detector 210 generates a key detection signal S2 from a control signal (digital signal) from a controller 40 and outputs the key detection signal S2 to the noise eliminator 90. It is noted that the basic operations of the second embodiment are the same as those of the first embodiment except for the above operation.
As explained so far, the second embodiment can obtain the same advantages as those of the first embodiment.
In the second embodiment, the configuration example in which the first memory 80 shown in
As explained so far, the third embodiment can obtain the same advantages as those of the first embodiment.
In the first embodiment, the configuration example in which the key detection signal S2 is output based on the key signal S1 from the key section 20 shown in
The A/D converter 410 digitizes a key signal S1 (analog signal) from the key section 20. The key signal holder 420 holds the key signal (digital signal) from the A/D converter 410. The key entry detector 430 generates the key detection signal S2 based on the key signal which is held in the key signal holder 420 and outputs the key detection signal S2 to the noise eliminator 90. The basic operations of the fourth embodiment are the same as those of the first embodiment except for the operations explained above.
As explained so far, the fourth embodiment can obtain the same advantages as those of the first embodiment.
In the first embodiment, the configuration example in which the key detection signal S2 is directly output from the key entry detector 30 to the noise eliminator 90 shown in
This detection time monitor 510 monitors a key entry while using the rise and fall of the key detection signal S2 (see
The noise eliminator 90 executes the processing for waveform interpolation based on the starting time of the operation (“operation start time”) and the end time of the operation (“operation end time”) that are obtained from the detection time signal S3. It is noted that the basic operations of the fifth embodiment are the same as those of the first embodiment except for the operations explained above.
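How an operation start time and an operation end time might be derived from the rise and fall of a detection signal can be sketched as below. The sampled representation of the key detection signal S2 as a sequence of 0/1 levels, the function name `operation_intervals`, and the use of sample indices as times are assumptions made only for this illustration.

```python
def operation_intervals(s2):
    """Return a list of (start, end) sample indices; `end` is exclusive."""
    intervals = []
    start = None
    prev = False
    for i, level in enumerate(s2):
        if level and not prev:
            start = i  # rise of the detection signal: operation start time
        elif not level and prev:
            intervals.append((start, i))  # fall: operation end time
            start = None
        prev = bool(level)
    if start is not None:
        # The signal is still high at the end of the observed data.
        intervals.append((start, len(s2)))
    return intervals


s2 = [0, 0, 1, 1, 1, 0, 0, 1, 0]
intervals = operation_intervals(s2)
```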
As explained so far, the fifth embodiment can obtain the same advantages as those of the first embodiment.
In the fifth embodiment, the configuration example in which the detection time signal S3 is output from the detection time monitor 510 to the noise eliminator 90 shown in
The reference signal generator 610 generates a reference signal S4 having a fixed cycle (known) shown in
As explained so far, the sixth embodiment can obtain the same advantages as those of the first embodiment.
In each of the first to sixth embodiments, the configuration for eliminating the component of the operation sound from the speech signal is applied to the portable terminal. This configuration may instead be applied to an IP telephone system. This configuration example will be explained below as a seventh embodiment.
The IP telephone device 710 includes a computer terminal 711, a keyboard 712, a mouse 713, a microphone 714, a loudspeaker 715, and a display 716. The IP telephone device 710 has a telephone function and a data communication function. The keyboard 712 and the mouse 713 are used to input text and perform various operations during the data communication. The microphone 714 converts speech of a speaker into speech signals during the telephone conversation. The loudspeaker 715 outputs the speech of a counterpart speaker during the telephone conversation.
The IP telephone device 720 has the same configuration as that of the IP telephone device 710. The IP telephone device 720 includes a computer terminal 721, a keyboard 722, a mouse 723, a microphone 724, a loudspeaker 725, and a display 726. The IP telephone device 720 has a telephone function and a data communication function. The keyboard 722 and the mouse 723 are used to input text and perform various operations during the data communication. The microphone 724 converts the speech of a speaker into speech signals during the telephone conversation. The loudspeaker 725 outputs the speech of a counterpart speaker during the telephone conversation.
A key/mouse entry detector 717 detects a key signal indicating that the keyboard 712 is operated and a mouse signal indicating that the mouse 713 is operated, and outputs the result of detection as a key/mouse detection signal.
In the seventh embodiment, when the keyboard 712 or the mouse 713 is operated during a telephone conversation, an operation sound is captured by the microphone 714 and superimposed on a speech signal. A controller 718 generates a control signal based on the key signal or the mouse signal. The controller 718 controls the respective sections based on the control signal.
A detection time monitor 719 monitors a key entry while using the rise and fall of the key/mouse detection signal from the key/mouse entry detector 717 as triggers. The detection time monitor 719 outputs the time of the rise (operation start time) and the time of the fall (operation end time) to the noise eliminator 90 as a detection time signal. The noise eliminator 90 executes the processing for waveform interpolation based on the operation start time and the operation end time which are obtained from the detection time signal.
The basic operations of the seventh embodiment are the same as those of the first embodiment except for the operations explained above. Namely, if the keyboard 712 or the mouse 713 is operated during a telephone conversation, an operation sound is captured by the microphone 714 and superimposed on a speech signal. Accordingly, the noise eliminator 90 executes the waveform interpolation processing in the same manner as that of the first embodiment to thereby eliminate the component of the operation sound from the speech signal and enhance tone quality.
As explained so far, the seventh embodiment can obtain the same advantages as those of the first embodiment.
The first to seventh embodiments of the present invention have been explained in detail so far with reference to the drawings. The concrete configuration examples of the invention are not limited to these first to seventh embodiments. Any changes and the like in design within the scope of the spirit of the present invention are included in the present invention.
For example, in the first to seventh embodiments, a program which realizes the functions (waveform interpolation, waveform suppression of the speech signal, and the like) of the portable terminal or the IP telephone device may be recorded on a computer readable recording medium 900 shown in
The computer 800 shown in
The CPU 810 loads the program recorded on the recording medium 900 through the reader 850 and then executes the program, thereby realizing the functions. The recording medium 900 is exemplified by an optical disk, a flexible disk, a hard disk, and the like.
As explained so far, according to the present invention, when the operation of the man-machine interface is detected, the component of the operation sound of the man-machine interface is eliminated from the speech that is input within an operation-detected period. Therefore, it is advantageously possible to efficiently eliminate the operation sound as noise produced when the man-machine interface is operated, and to enhance tone quality.
According to the present invention, when the operation of the man-machine interface is detected, the component of the operation sound of the man-machine interface is eliminated from the speech that is input within an operation-detected period which is determined based on the information for the operation time. Therefore, it is advantageously possible to efficiently eliminate the operation sound as noise produced when the man-machine interface is operated, and to enhance tone quality.
According to the present invention, when the operation of the man-machine interface is detected, the information for an operation time is output based on a reference signal, and the component of the operation sound of the man-machine interface is eliminated from the speech that is input within an operation-detected period which is determined by this information for the operation time. Therefore, it is advantageously possible to efficiently eliminate the operation sound as noise produced when the man-machine interface is operated, and to enhance tone quality.
According to the present invention, when the operation of the man-machine interface is detected, the component of the operation sound of the man-machine interface is eliminated from the speech that is input within the operation-detected period by performing waveform interpolation. Therefore, it is advantageously possible to efficiently eliminate the operation sound as noise produced when the man-machine interface is operated, and to enhance tone quality.
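The replacement step of the waveform interpolation can be sketched as follows, assuming that the N samples starting at time t0 are overwritten with the N samples starting at the right end tm of the best-correlated earlier waveform. The function name `interpolate` and the sequential handling of an overlap between the source and the repaired interval are assumptions of this sketch rather than details stated in the embodiments.

```python
def interpolate(x, t0, tm, n):
    """Return a copy of x in which x[t0:t0 + n] is replaced by the n samples from tm."""
    if not (0 <= tm < t0 and t0 + n <= len(x)):
        raise ValueError("need 0 <= tm < t0 and t0 + n <= len(x)")
    out = list(x)
    for i in range(n):
        # Sequential copy: if tm + n exceeds t0, the source overlaps the interval
        # being repaired, and samples that were already replaced are reused.
        out[t0 + i] = out[tm + i]
    return out


period = [0, 1, 2, 1, 0, -1, -2, -1]
clean = period * 4
noisy = list(clean)
noisy[16:22] = [9, -9, 9, -9, 9, -9]  # simulated click starting at t0 = 16
repaired = interpolate(noisy, t0=16, tm=8, n=6)
```

For this exactly periodic test signal, the repaired sequence coincides with the clean one; real speech is only approximately periodic, so the result would be an approximation.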
According to the present invention, when the operation of the man-machine interface is detected, a period in which the operation of the man-machine interface is detected, is suppressed in the speech that is input within the operation-detected period. Therefore, it is advantageously possible to efficiently eliminate the operation sound as noise produced when the man-machine interface is operated, and to enhance tone quality.
Although the invention has been described with respect to a specific embodiment for a complete and clear disclosure, the appended claims are not to be thus limited but are to be construed as embodying all modifications and alternative constructions that may occur to one skilled in the art which fairly fall within the basic teaching herein set forth.
Number | Date | Country | Kind |
---|---|---|---|
2002-093165 | Mar 2002 | JP | national |
Number | Date | Country | |
---|---|---|---|
20030187640 A1 | Oct 2003 | US |