This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2014-070717, filed on Mar. 31, 2014, and Japanese Patent Application No. 2014-070718, filed on Mar. 31, 2014. The entire disclosures of both of these applications are incorporated herein by reference.
1. Technical Field
The present disclosure generally relates to a voice response apparatus, a method for voice processing, and a recording medium having a program stored thereon.
2. Description of the Related Art
Voice response and voice interaction technologies may include a barge-in function, which allows voice input to interrupt a response signal or a response voice output by a voice interaction system. For example, the barge-in function may be used in a case where a user restates his original question or response due to an error in his speech or in voice recognition. The barge-in function may also be used in a case where the user immediately performs the next input without waiting for a response, or changes his mind and retries. This, however, may pose the problem of degradation in performance of voice detection and voice recognition as a result of the response signal being mixed with the input from a voice input microphone.
A related approach that addresses the above problem focuses on frequency characteristics of the response signal. Specifically, the conventional approach assigns a smaller weight to a frequency bandwidth for voice detection in which the amount of the response signal is large. By assigning a smaller weight to a bandwidth containing a greater portion of the response signal, the related method may identify whether an input is a voice or non-voice. Thus, the related method can prevent degradation in performance of, for example, voice detection, if the frequency bandwidths of the response signal and of the input voice do not overlap much with each other.
However, when both the response signal and the input voice have frequency bandwidths that overlap, the related approach may have the problem of degradation in performance of, for example, voice detection.
Exemplary embodiments of the present disclosure may solve one or more of the above-noted problems. For example, the exemplary embodiments may provide a voice response technique for enabling accurate voice detection during output of a response signal.
According to a first aspect of the present disclosure, a voice response apparatus is disclosed. The voice response apparatus may include a memory storing instructions; and one or more processors configured to process the instructions to output a response voice, determine a second frequency bandwidth corresponding to the response voice, select a first frequency bandwidth that does not overlap with the second frequency bandwidth, and detect an input voice from an input signal using the first frequency bandwidth.
According to a second aspect of the present disclosure, a voice response apparatus is disclosed. The voice response apparatus may include a memory storing instructions; and one or more processors configured to process the instructions to store a plurality of response voices in the memory. The one or more processors may be further configured to detect a first frequency bandwidth of an input voice from an input signal, and select a second frequency bandwidth based on the first frequency bandwidth. The one or more processors may be further configured to select, from among the plurality of response voices, a response voice containing a predetermined amount of components in the second frequency bandwidth, and output the selected response voice.
A voice processing method according to another aspect of the present disclosure may include outputting a response voice, determining a second frequency bandwidth corresponding to the response voice, selecting a first frequency bandwidth that does not overlap with the second frequency bandwidth, and detecting an input voice from an input signal using the first frequency bandwidth.
A voice processing method according to another aspect of the present disclosure may include storing a plurality of response voices in a memory, detecting a first frequency bandwidth of an input voice from an input signal, and selecting a second frequency bandwidth based on the first frequency bandwidth. The voice processing method may further include selecting, from among the plurality of response voices, a response voice containing a predetermined amount of components in the second frequency bandwidth, and outputting the selected response voice.
A non-transitory computer-readable storage medium may store instructions that when executed by a computer enable the computer to implement a method. The method may include outputting a response voice, determining a second frequency bandwidth corresponding to the response voice, selecting a first frequency bandwidth that does not overlap with the second frequency bandwidth, and detecting an input voice from an input signal using the first frequency bandwidth.
A non-transitory computer-readable storage medium may store instructions that when executed by a computer enable the computer to implement a method. The method may include storing a plurality of response voices in a memory, detecting a first frequency bandwidth of an input voice from an input signal, and selecting a second frequency bandwidth based on the first frequency bandwidth. The method may further include selecting, from among the plurality of response voices, a response voice containing a predetermined amount of components in the second frequency bandwidth, and outputting the selected response voice.
As illustrated in
Response selection unit 111 may select a response voice whose frequency bandwidth is known in advance, and may notify bandwidth selection unit 121 of the selected response voice. A response voice may refer to a voice that is output by voice processing system 1000. For example, a response voice may be a voice whose content represents a response to the content of a user's input voice.
Bandwidth selection unit 121 may select a frequency bandwidth excluding one or more frequencies of the response voice selected by response selection unit 111, and may notify voice detection unit 131 of bandwidth information indicating the selected bandwidth. For example, bandwidth selection unit 121 may select a bandwidth excluding at least part of the frequencies of the response voice. Exemplarily, bandwidth selection unit 121 may select a bandwidth excluding frequencies having large amounts of the response voice.
Voice detection unit 131 may use the bandwidth information to perform voice detection for an input voice signal. Voice detection unit 131 may use at least part of the selected bandwidth to perform the voice detection.
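As a minimal sketch of the selection performed by bandwidth selection unit 121, the following assumes that the response voice is described by a list of occupied frequency ranges and that detection is limited to a fixed overall range. The function name `select_detection_bands`, the `Band` type, and the default range of 100 Hz to 4000 Hz are hypothetical and are not taken from the disclosure.

```python
from typing import List, Tuple

Band = Tuple[float, float]  # (low_hz, high_hz); hypothetical representation


def select_detection_bands(response_bands: List[Band],
                           full_range: Band = (100.0, 4000.0)) -> List[Band]:
    """Return the parts of full_range not covered by any response band."""
    low, high = full_range
    free: List[Band] = []
    cursor = low  # everything below cursor is already handled
    for start, end in sorted(response_bands):
        if end <= cursor:
            continue  # band lies entirely below the current position
        if start >= high:
            break  # remaining bands lie above the detection range
        if start > cursor:
            free.append((cursor, min(start, high)))
        cursor = max(cursor, end)
        if cursor >= high:
            break
    if cursor < high:
        free.append((cursor, high))
    return free


# A response voice occupying 300-900 Hz and 2000-2500 Hz leaves three gaps.
bands = select_detection_bands([(300.0, 900.0), (2000.0, 2500.0)])
```

Voice detection unit 131 could then restrict its analysis to some or all of the returned ranges.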
An exemplary method for voice processing implemented by the first exemplary embodiment is now described in detail with reference to the flowchart of
In step 101, response selection unit 111 may select a response voice whose frequency bandwidth is known in advance. Response selection unit 111 may notify the bandwidth selection unit 121 of the selected response voice. Exemplarily, bandwidth selection unit 121 may select a bandwidth that does not overlap (or excludes) at least part of the frequency bandwidth of the selected response voice.
In step 102, bandwidth selection unit 121 may notify voice detection unit 131 of bandwidth information indicating the selected bandwidth. In step 103, voice detection unit 131 may perform voice detection for the input voice in at least part of the bandwidth selected by bandwidth selection unit 121.
In step 104, voice detection unit 131 may determine if a user input voice has been detected. If voice detection unit 131 detects a voice (“Yes” in step 104), the processing may return to step 101. If voice detection unit 131 does not detect a voice (“No” in step 104), the voice processing may terminate.
Since the voice processing system in this exemplary embodiment performs voice detection for an input signal using a bandwidth excluding the frequency bandwidth of a response voice, the voice detection for the input signal may be possible even during output of the response signal.
Voice response apparatus 2 may include a response selection unit 112, a bandwidth selection unit 122, a voice detection unit 132, a voice recognition unit 142, and a voice reproduction unit 152.
Response selection unit 112 may select a response voice from one or more response voices stored in the response voice storage unit 212, each response voice having a predetermined frequency bandwidth. Response selection unit 112 may further notify bandwidth selection unit 122 and voice reproduction unit 152 of the selected response voice.
Bandwidth selection unit 122 may select a bandwidth excluding the frequency bandwidth of the response voice selected by response selection unit 112, and may notify voice detection unit 132 of bandwidth information, which may be the selected bandwidth.
As the bandwidth information, bandwidth selection unit 122 may select a bandwidth excluding the entire bandwidth covering the response voice. In some aspects, the bandwidth of the response voice may vary over time and therefore, the selected bandwidth may vary accordingly. For example, as illustrated in
Bandwidth selection unit 122 may preliminarily divide the frequency bandwidth in which the voice detection is to be performed into multiple subbands, and discretely select relevant subbands. In some aspects, bandwidth selection unit 122 may weigh each subband depending on the amount of the response voice in the subband. Bandwidth selection unit 122 may assign a smaller weight to a subband in which the amount of the response voice is larger. Techniques of subband-based voice detection may be well known.
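The subband weighting described above may be illustrated by the following sketch. It assumes that the amount of the response voice in each subband is available as a non-negative energy value and that weights are scaled linearly between 0 and 1; both the scaling and the name `subband_weights` are assumptions made for illustration only.

```python
from typing import List


def subband_weights(response_energy: List[float]) -> List[float]:
    """Map per-subband response energy to detection weights in [0, 1].

    The subband with the most response energy gets weight 0, and a subband
    with no response energy gets weight 1 (hypothetical linear scaling).
    """
    peak = max(response_energy, default=0.0)
    if peak <= 0.0:
        # No response energy anywhere: every subband is equally usable.
        return [1.0] * len(response_energy)
    return [1.0 - e / peak for e in response_energy]


weights = subband_weights([0.0, 2.0, 4.0])
```

Other monotonically decreasing mappings would serve equally well; the only property taken from the description is that a larger amount of the response voice yields a smaller weight.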
Voice detection unit 132 may receive an input voice from input unit 172 and the bandwidth information from the bandwidth selection unit 122 and perform voice detection for the input voice.
Once voice detection unit 132 detects a voice, voice detection unit 132 may notify voice reproduction unit 152 (to be described later) of the detection. Voice reproduction unit 152 may stop voice reproduction upon receiving the notification. In some aspects, the voice processing system of this exemplary embodiment can immediately stop the reproduction of the response voice upon successful voice detection, thereby more accurately performing subsequent processing, such as voice detection and voice recognition.
If the bandwidth information received from bandwidth selection unit 122 is weighted on a subband basis, voice detection unit 132 may vary a detection threshold according to the weight. For example, voice detection unit 132 may use the result of detection in a subband having a larger weight as a more reliable result. This may allow voice detection unit 132 to detect the voice more accurately. Voice recognition unit 142 may perform voice recognition for the voice input from input unit 172. Further, response selection unit 112 may select a response voice based on the result of the voice recognition by the voice recognition unit 142.
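One possible reading of the weighted detection performed by voice detection unit 132 is sketched below: per-subband input energies are combined as a weighted average, so that subbands with larger weights contribute more to the decision. The function name `detect_voice`, the energy representation, and the threshold value are hypothetical; the disclosure does not specify a particular decision rule.

```python
from typing import List


def detect_voice(input_energy: List[float], weights: List[float],
                 threshold: float = 1.0) -> bool:
    """Compare the weighted average of subband energies with a threshold."""
    total_weight = sum(weights)
    if total_weight <= 0.0:
        return False  # no usable subband, so nothing can be detected
    score = sum(e * w for e, w in zip(input_energy, weights)) / total_weight
    return score > threshold


# Energy only in a zero-weight subband (dominated by the response voice)
# does not trigger detection; energy in trusted subbands does.
ignored = detect_voice([5.0, 0.1, 0.1], [0.0, 1.0, 1.0])
detected = detect_voice([0.1, 3.0, 3.0], [0.0, 1.0, 1.0])
```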
Voice reproduction unit 152 may cause output unit 162 to reproduce the response voice selected by the response selection unit 112.
A method for voice processing according to the second exemplary embodiment will now be described in detail with reference to the flowchart of
Response selection unit 112 may select, from response voice storage unit 212, a response voice whose frequency bandwidth is known in advance. In step 201, response selection unit 112 may notify voice reproduction unit 152 and bandwidth selection unit 122 of the selected response voice. For example, upon system startup, response selection unit 112 may select a response voice suitable for the start of an interaction, such as “Hello.” Bandwidth selection unit 122 may select a bandwidth excluding the frequency bandwidth of the response voice provided by the response selection unit 112. In step 202, bandwidth selection unit 122 may notify voice detection unit 132 of bandwidth information indicating the selected bandwidth. In step 203, voice reproduction unit 152 may cause output unit 162 to reproduce the response voice provided by response selection unit 112.
In step 204, voice detection unit 132 may receive an input voice from input unit 172 and bandwidth information from bandwidth selection unit 122, and perform voice detection for the input voice. If voice detection unit 132 detects a voice (“Yes” in step 205), voice recognition unit 142 may use the result of the voice detection to perform voice recognition (e.g., step 206 of
Since the voice processing system in this exemplary embodiment performs voice detection for an input signal in a bandwidth excluding the frequency bandwidth of a response voice, the voice detection for the input signal may be possible even during output of the response signal. In some instances, when the frequency bandwidths of the response voice and of the input voice are likely to overlap each other, the voice processing system of this exemplary embodiment may vary the bandwidth in which the voice detection is performed, depending on the temporal variations in the response voice bandwidth, thereby enabling more accurate voice detection. Further, the voice processing system of this exemplary embodiment may select the response voice so that a frequency bandwidth used immediately before is continuously used as much as possible. In some instances, the overlap between the frequency bandwidths can be more accurately avoided when an identical user continuously performs voice input.
Voice response apparatus 3 may include a response selection unit 113, a bandwidth selection unit 123, a voice detection unit 133, a voice recognition unit 143, a voice reproduction unit 153, and a scenario reference unit 183.
In certain aspects, response selection unit 113, bandwidth selection unit 123, voice detection unit 133, voice reproduction unit 153, output unit 163, the input unit 173, and the response voice storage unit 213 are similar in functionality to corresponding elements of the voice processing system described above in reference to
For example, and as described above, response selection unit 113 may select a response voice from one or more response voices stored in the response voice storage unit 213. Response selection unit 113 may further notify bandwidth selection unit 123 and voice reproduction unit 153 of the selected response voice. Bandwidth selection unit 123 may select a bandwidth excluding the frequency bandwidth of the response voice selected by response selection unit 113, and may notify voice detection unit 133 of bandwidth information. Voice detection unit 133 may receive an input voice from input unit 173 and bandwidth information from bandwidth selection unit 123 and perform voice detection for the input voice. Voice reproduction unit 153 may stop voice reproduction upon receiving the notification.
Voice recognition unit 143 may use the result of voice detection provided by voice detection unit 133 to recognize a voice input from input unit 173. Voice recognition unit 143 may notify scenario reference unit 183 of the result of the recognition.
Scenario reference unit 183 may refer to scenario storage unit 223 to notify the response selection unit 113 of a scenario corresponding to the result of the recognition provided by voice recognition unit 143. Scenario storage unit 223 may store scenarios representing the content of responses corresponding to results of voice recognition. The response selection unit 113 may select a response voice corresponding to the received scenario.
A method for voice processing of the third exemplary embodiment will be described in detail with reference to the exemplary flowchart of
In step 301, response selection unit 113 may select a response voice from one or more response voices stored in response voice storage unit 213, each response voice having a frequency bandwidth known in advance. Further, response selection unit 113 may notify bandwidth selection unit 123 and voice reproduction unit 153 of the selected response voice.
In step 302, bandwidth selection unit 123 may select a bandwidth excluding the frequency bandwidth of the response voice provided by response selection unit 113. Further, bandwidth selection unit 123 may notify voice detection unit 133 of bandwidth information indicating the selected bandwidth.
In step 303, voice reproduction unit 153 may cause output unit 163 to reproduce the response voice provided by response selection unit 113.
In step 304, voice detection unit 133 may receive an input voice from input unit 173 and the bandwidth information from bandwidth selection unit 123 to perform voice detection.
If voice detection unit 133 detects a voice (“Yes” in step 305), the voice detection unit 133 may notify voice recognition unit 143 of the result of the detection. Further, voice recognition unit 143 may use the result of the voice detection to perform voice recognition (e.g., step 306 in
In step 307, scenario reference unit 183 may refer to scenario storage unit 223. If a scenario representing the content of a response corresponding to the result of the voice recognition exists in scenario storage unit 223 (“Yes” in step 308), scenario reference unit 183 may notify response selection unit 113 of that scenario, and the processing may return to step 301. If scenario reference unit 183 does not find a scenario representing the content of a response corresponding to the result of the voice recognition (“No” in step 308), the voice processing may terminate.
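The control flow of steps 301 to 308 may be sketched as a simple loop, under the assumption that scenario storage unit 223 behaves like a lookup table keyed by the recognition result. The table contents, the function name `run_dialog`, and the use of already-recognized text in place of detection and recognition are all hypothetical simplifications.

```python
from typing import Dict, List


def run_dialog(recognized_inputs: List[str], scenarios: Dict[str, str],
               first_response: str = "greeting") -> List[str]:
    """Return the sequence of responses selected during one interaction."""
    responses = [first_response]  # initial selection, as in step 301
    for text in recognized_inputs:  # each item stands in for steps 304-306
        scenario = scenarios.get(text)  # lookup, as in steps 307-308
        if scenario is None:
            break  # no matching scenario: processing terminates
        responses.append(scenario)  # return to step 301 with the scenario
    return responses


# Hypothetical scenario table for illustration only.
table = {"weather": "weather_reply", "thanks": "farewell"}
history = run_dialog(["weather", "unknown", "thanks"], table)
```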
Voice detection unit 412 may receive an input voice from input unit 472 and perform voice detection. If a voice is detected, voice detection unit 412 may notify bandwidth estimation unit 422 of detected bandwidth information.
Voice detection unit 412 may receive estimated bandwidth information from bandwidth estimation unit 422 and perform voice detection in a frequency bandwidth based on the estimated bandwidth information, which is discussed in detail below. By performing the voice detection in the frequency bandwidth in which the immediately preceding or past voice has been detected, more accurate voice detection may be possible when an identical user continuously performs voice input. The estimated bandwidth information resulting from the past detection may not exist at the start of the interaction. Therefore, voice detection unit 412 may, for example at the start of the voice detection, perform the voice detection in the entire frequency bandwidth.
Voice detection unit 412 may preliminarily divide the frequency bandwidth in which the voice detection is to be performed into multiple subbands (also referred to herein as “partial bandwidths”). Voice detection unit 412 may further weigh each subband depending on the amount (gain) of the detected input voice in the subband, and notify bandwidth estimation unit 422 of the detected bandwidth information such that a larger weight is assigned to a subband in which the amount of the input voice is larger than a predetermined value. Techniques of subband-based voice detection may be well known to a skilled artisan.
Bandwidth estimation unit 422 may select a bandwidth excluding at least part of the detected frequency bandwidth and notify response selection unit 432 of the selected bandwidth as the estimated bandwidth information. Bandwidth estimation unit 422 may notify voice detection unit 412 of the estimated bandwidth information. In some aspects, voice detection unit 412 may perform voice detection for the next input voice in frequencies excluding the frequencies indicated in the estimated bandwidth information. As the estimated bandwidth information, bandwidth estimation unit 422 may provide a frequency bandwidth estimated from the result of the immediately preceding voice detection, or may provide a frequency bandwidth resulting from smoothing frequency bandwidths estimated from the results of immediately preceding voice detections.
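The smoothing mentioned above is not specified further. As one hedged illustration, exponential smoothing of per-subband weights across successive detections could be used; the function name `smooth_weights`, the per-subband representation, and the smoothing factor `alpha` are assumptions.

```python
from typing import List


def smooth_weights(previous: List[float], current: List[float],
                   alpha: float = 0.5) -> List[float]:
    """Blend the newest per-subband estimate with the running estimate.

    alpha = 1.0 keeps only the immediately preceding detection, while
    smaller values give more influence to older detections.
    """
    return [alpha * c + (1.0 - alpha) * p for p, c in zip(previous, current)]


running = smooth_weights([0.0, 1.0], [1.0, 1.0], alpha=0.5)
```

A moving average over a fixed number of past detections would be an equally plausible choice.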
Response selection unit 432 may select, from the response voice storage unit 512, a response voice appropriate as a response and containing many components of the bandwidth indicated in the estimated bandwidth information provided by the bandwidth estimation unit 422. For example, as illustrated in
Further, if the estimated bandwidth information is weighted on a subband basis, response selection unit 432 may select the response voice based on the weights. As an example, assume that the bandwidth of the input voice is divided in the frequency direction into eight subbands B1 to B8, where B1 is assigned a weight of 0, B2 to B3 are assigned a large weight, B4 to B5 are assigned a small weight, B6 is assigned a large weight, and B7 to B8 are assigned a medium weight. Response selection unit 432 may select a response having fewer components in the subbands B2 to B3 and B6 and more components in the subbands B4 to B5, among the response voice candidates.
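The eight-subband example above may be sketched as follows, assuming that each candidate response voice is summarized by its energy in subbands B1 to B8 and that the candidate with the smallest weighted overlap is preferred. The numeric weights standing in for "large," "medium," and "small," the candidate data, and the function names are hypothetical.

```python
from typing import Dict, List


def overlap_cost(input_weights: List[float],
                 response_energy: List[float]) -> float:
    """Sum of response energy, scaled by how strongly the input uses each subband."""
    return sum(w * e for w, e in zip(input_weights, response_energy))


def select_response(input_weights: List[float],
                    candidates: Dict[str, List[float]]) -> str:
    """Pick the candidate whose energy overlaps least with the input voice."""
    return min(candidates,
               key=lambda name: overlap_cost(input_weights, candidates[name]))


# B1 = 0, B2-B3 large, B4-B5 small, B6 large, B7-B8 medium (assumed values).
weights = [0.0, 1.0, 1.0, 0.2, 0.2, 1.0, 0.5, 0.5]
candidates = {
    "voice_a": [0.0, 3.0, 3.0, 0.0, 0.0, 2.0, 1.0, 1.0],  # energy in B2-B3, B6
    "voice_b": [0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 1.0, 1.0],  # energy in B4-B5
}
chosen = select_response(weights, candidates)
```

In this sketch, the candidate concentrated in B4 to B5 is preferred because those subbands carry little of the input voice.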
Voice reproduction unit 452 may cause output unit 462 to reproduce the response voice selected by response selection unit 432. Voice reproduction unit 452 may be notified when voice detection unit 412 starts voice detection. Voice reproduction unit 452 may stop the voice reproduction upon receiving the notification. In this manner, the reproduction of the response voice may be stopped upon the voice detection, so that voice detection unit 412 can more accurately perform subsequent voice detection.
A method for voice processing of the fourth exemplary embodiment will be described in detail with reference to
In step 501, voice detection unit 412 may receive an input voice and perform voice detection. Further, voice detection unit 412 may provide a notification of detected bandwidth information indicating the bandwidth covering the detected voice.
In step 502, bandwidth estimation unit 422 may select a bandwidth excluding at least part of the bandwidth indicated in the detected bandwidth information. Further, bandwidth estimation unit 422 may notify response selection unit 432 of estimated bandwidth information indicating the selected bandwidth.
In step 503, response selection unit 432 may select, from the response voice storage unit 512, a response voice appropriate as a response and containing many components of the bandwidth indicated in the estimated bandwidth information.
In step 504, voice reproduction unit 452 may cause output unit 462 to reproduce the response voice selected by the response selection unit 432.
Voice detection unit 412 may perform voice detection for the next input voice, and if a voice is detected (“Yes” in step 505), may notify bandwidth estimation unit 422 of detected bandwidth information, and the processing may return to step 502. If voice detection unit 412 does not detect a voice (“No” in step 505), the voice processing may terminate.
The voice processing system of this exemplary embodiment can accurately detect an input voice during output of a response voice. When an identical user continuously performs voice input, the voice processing system of this exemplary embodiment may be able to perform more accurate voice detection by detecting a voice in a frequency bandwidth in which the immediately preceding or past voice has been detected.
With subband-based weighting in the voice detection, the voice processing system of this exemplary embodiment may select a response voice having a small gain or weight in bandwidth portions that have a large gain or weight for the input voice. This may allow expanding the range of variations of the response voice while preventing reduction in accuracy of the voice detection.
In certain aspects, voice detection unit 413, bandwidth estimation unit 423, response selection unit 433, the voice reproduction unit 453, output unit 463, input unit 473, and response voice storage unit 513 are similar in functionality to corresponding elements of the voice processing system described above in reference to
Voice recognition unit 443 may use the result of voice detection provided by voice detection unit 413 to recognize a voice input from the input unit 473. Voice recognition unit 443 may notify scenario reference unit 483 of the result of the recognition.
Scenario reference unit 483 may refer to scenario storage unit 523 to notify response selection unit 433 of a scenario corresponding to the result of the recognition provided by voice recognition unit 443. Response selection unit 433 may select a response voice corresponding to the received scenario. Scenario storage unit 523 may store scenarios representing the content of responses corresponding to results of voice recognition. The scenarios may be defined in a text-representation, or may be described in meta-representation which permits a degree of freedom in expression and vocabulary within the constraint that the meta-representation has the same meaning.
If the scenarios are defined in a text-representation, response voices of an identical text-representation may be voices of speakers having different voice qualities and speaking manners so that their frequency bandwidths do not overlap each other. If the scenarios are described in meta-representation, response voices having an identical meaning may take advantage of differences in expression and vocabulary in addition to differences in voice quality, so that their frequency bandwidths have still less overlap with each other. For example, expressions of asking, “ . . . shitekudasai” and “ . . . wo onegaisimasu,” may use different dominant frequency bandwidths. That is, response selection unit 433 may take advantage of the unevenness of phonemes in response voices to select a response voice containing many phonemes having a smaller amount of overlap between the frequency bandwidths. For example, phonemes of syllables beginning with the s-sound of Japanese may contain many high frequency bandwidth components. Therefore, if the input voice is assumed to contain low frequency bandwidth components, response selection unit 433 may select a response voice with words or expressions containing many phonemes of syllables beginning with the s-sound.
A method for voice processing of the fifth exemplary embodiment will be described in detail with reference to
In step 601, bandwidth estimation unit 423 may perform a first bandwidth estimation and notify response selection unit 433 of the first estimated bandwidth information.
In step 602, response selection unit 433 may select a response voice based on the first estimated bandwidth information.
In step 603, voice reproduction unit 453 may cause output unit 463 to reproduce the response voice provided by response selection unit 433.
In step 604, voice detection unit 413 may perform voice detection for an input voice received from input unit 473.
If a voice is detected (“Yes” in step 605), voice detection unit 413 may notify voice recognition unit 443 of the result of the detection. If voice detection unit 413 does not detect a voice (“No” in step 605), the voice processing may terminate.
In step 606, bandwidth estimation unit 423 may perform a second bandwidth estimation for the detected voice and notify response selection unit 433 of estimated bandwidth information.
In step 607, voice recognition unit 443 may perform voice recognition using the result of the voice detection and notify scenario reference unit 483 of the result of the voice recognition.
In step 608, scenario reference unit 483 may refer to the scenario storage unit 523. If a scenario corresponding to the result of the recognition provided by voice recognition unit 443 exists (“Yes” in step 609), scenario reference unit 483 may notify response selection unit 433 of the corresponding scenario, and the processing may return to step 602. If a scenario corresponding to the result of the recognition provided by voice recognition unit 443 does not exist (“No” in step 609), the voice processing may terminate.
The voice processing system of this exemplary embodiment, as a voice processing system that responds based on a scenario, may accurately perform voice detection for an input signal even during output of a response signal.
As illustrated in
Bandwidth estimation unit 424 may select a bandwidth excluding at least part of the frequency bandwidth indicated in the detected bandwidth information, and notify response selection unit 434 of the selected bandwidth as estimated bandwidth information.
Response selection unit 434 may select a response voice containing many components of the bandwidth indicated in the estimated bandwidth information. A voice may contain many components of a bandwidth if the amount (gain) of the voice in that bandwidth is larger than a predetermined value. A response voice may refer to a voice that is output by the voice processing system. For example, a response voice may be a voice whose content represents a response to the content of an input voice.
Response selection unit 434 may select a response voice based on characteristics of the human voice. A male voice and a female voice may be different in their dominant frequency bandwidth components. Based on this characteristic, for example, when the input voice is assumed to be a male voice, response selection unit 434 may select a female response voice. When the input voice is assumed to be a female voice, response selection unit 434 may select a male response voice. This processing may allow avoiding the overlap between the frequency bandwidths used by the input voice and the response voice.
Response selection unit 434 may also artificially subtract, from the selected response voice, bandwidth components of the frequency bandwidth used for detecting the input voice. This processing may allow avoiding the overlap between the frequency bandwidths used by the input voice and the response voice.
For example, if the frequency bandwidth of 200 to 500 Hz and 2400 to 2800 Hz is detected for the input voice, response selection unit 434 may subtract these frequency bandwidth components from the response voice. In some aspects, if frequency bandwidth components overlapping between the input voice and the response voice are completely deleted from the response voice, the resulting response voice may sound unnatural. Therefore, response selection unit 434 may make a soft decision, for example reducing the gain, for the overlapping frequency bandwidth components, rather than completely deleting the frequency bandwidth components.
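The soft decision described above may be illustrated by the following sketch, which assumes that the response voice is available as a magnitude spectrum given as (frequency, magnitude) pairs. The function name `attenuate_bands` and the gain value of 0.25 are hypothetical; the disclosure states only that the gain may be reduced rather than the components deleted.

```python
from typing import List, Tuple

Bin = Tuple[float, float]    # (frequency_hz, magnitude); assumed representation
Band = Tuple[float, float]   # (low_hz, high_hz)


def attenuate_bands(spectrum: List[Bin], bands: List[Band],
                    gain: float = 0.25) -> List[Bin]:
    """Scale the magnitude of bins inside any listed band by gain."""
    out: List[Bin] = []
    for freq, mag in spectrum:
        in_band = any(lo <= freq <= hi for lo, hi in bands)
        out.append((freq, mag * gain if in_band else mag))
    return out


# Bands detected for the input voice in the example above.
input_bands = [(200.0, 500.0), (2400.0, 2800.0)]
response = [(300.0, 1.0), (1000.0, 1.0), (2500.0, 2.0)]
softened = attenuate_bands(response, input_bands)
```

Setting `gain` to 0.0 would correspond to complete deletion, which the description notes may sound unnatural.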
The following is an exemplary list of rules for response selection unit 434 to identify frequency bandwidth components to be subtracted.
A first rule may be to cut a fixed partial bandwidth in a medium bandwidth. The medium bandwidth is a bandwidth that may be known to be suitable for voice detection, and in which components of human speech certainly exist. Therefore, response selection unit 434 may process the response voice by cutting a fixed frequency range (part of the medium bandwidth) using, for example, a notch filter.
For example, as a criterion for frequencies of the medium bandwidth, the response selection unit 434 may set a bandwidth between 300 Hz and 1 kHz, in which the long-term voice spectra of many people have large values.
As another example, for an input voice of 960 Hz, setting the frequency bandwidth to be subtracted around 0.5 kHz has an undetectable influence on the voice quality of the response voice. If changing the voice quality of the response voice is permitted as in this implementation, the frequency bandwidth to be subtracted may be set around several tens to several hundreds of hertz.
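The notch filter mentioned for the first rule may be sketched as a second-order (biquad) section applied to the response voice samples. The coefficient formulas below follow the widely used audio biquad notch design; the function name `notch_filter`, the 500 Hz center frequency, the 8 kHz sample rate, and the quality factor are illustrative assumptions rather than values from the disclosure.

```python
import math
from typing import List


def notch_filter(samples: List[float], center_hz: float,
                 sample_rate: float, q: float = 10.0) -> List[float]:
    """Apply a biquad notch centered at center_hz; bandwidth is about center_hz / q."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    # Coefficients normalized by a0.
    b0, b1, b2 = 1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0
    a1, a2 = -2.0 * cos_w0 / a0, (1.0 - alpha) / a0
    out: List[float] = []
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


fs = 8000.0
tone_in_notch = [math.sin(2.0 * math.pi * 500.0 * n / fs) for n in range(8000)]
filtered = notch_filter(tone_in_notch, center_hz=500.0, sample_rate=fs)
```

After the initial transient, a tone at the notch center is strongly suppressed, while components far from the center pass almost unchanged.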
A second rule may be to cut formant valleys of a synthetic voice. Human recognition of voices is known to be sensitive to formant peaks and insensitive to formant valleys. Based on this characteristic, response selection unit 434 may cut the frequency bandwidths of formant valleys of the response voice.
For example, using a valley between a first formant peak and a second formant peak, response selection unit 434 may set the frequency bandwidth to be cut between about 200 Hz and 2000 Hz, which is suitable for detection of the input voice. Especially when the response voice is a synthetic voice, response selection unit 434 can make the voice more natural by finely measuring this frequency bandwidth as compared to a natural voice, or by processing the voice to intentionally drop this frequency bandwidth.
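One way to locate such a valley, sketched under simplifying assumptions, is to find the two lowest-frequency local peaks of a spectral envelope and take the minimum between them. The envelope values below are invented for illustration, and treating the first two local peaks as the first and second formants is an assumption; a practical system would likely use a proper formant estimator.

```python
def find_formant_valley(freqs_hz, envelope_db):
    """Return the frequency of the envelope minimum between the first two
    local peaks, or None if fewer than two peaks are found."""
    peaks = [
        i for i in range(1, len(envelope_db) - 1)
        if envelope_db[i - 1] < envelope_db[i] >= envelope_db[i + 1]
    ]
    if len(peaks) < 2:
        return None
    first, second = peaks[0], peaks[1]
    # Two peaks found by the test above are never adjacent, so at least
    # one bin lies strictly between them.
    valley = min(range(first + 1, second), key=lambda i: envelope_db[i])
    return freqs_hz[valley]


# Hypothetical spectral envelope of a synthetic response voice.
freqs = [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]
envelope = [10, 20, 30, 22, 12, 8, 14, 24, 28, 18]

valley_hz = find_formant_valley(freqs, envelope)
print(valley_hz)
```

With this envelope the peaks fall at 600 Hz and 1800 Hz and the valley at 1200 Hz, so a cut band could be placed around 1200 Hz.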
A third rule may be to cut a frequency bandwidth in which the ratio between the long-term spectra of the input voice and of the response voice is large. The long-term spectrum of the input voice may vary with users. Therefore, by determining a frequency bandwidth in which the ratio between the long-term spectra of the input voice and the response voice is large, and cutting that frequency bandwidth, a point may be found at which the response voice is small and the input voice may be easily detected.
In this case, determining the frequency bandwidth based only on the ratio between the long-term spectra may lead to selecting a region where both the input voice and the response voice have low power. Therefore, the third rule may be applied in combination with the first and second rules.
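The third rule and its pitfall can be illustrated with a short sketch. Here a simple minimum-power threshold on the input voice stands in for the combination with the first and second rules; that substitution, the band edges, and the power values are all assumptions made for this example.

```python
def select_cut_band(bands_hz, input_power, response_power, min_input_power=0.0):
    """Pick the band with the largest input-to-response power ratio,
    skipping bands where the input voice power is below the threshold."""
    best_band = None
    best_ratio = 0.0
    for band, p_in, p_resp in zip(bands_hz, input_power, response_power):
        if p_in < min_input_power:
            continue  # too little input voice here to help detection
        ratio = p_in / max(p_resp, 1e-12)  # guard against division by zero
        if best_band is None or ratio > best_ratio:
            best_band, best_ratio = band, ratio
    return best_band


# Hypothetical long-term spectra, one power value per band.
bands = [(100, 300), (300, 1000), (1000, 2000), (2000, 4000)]
input_lts = [0.001, 0.8, 0.5, 0.05]
response_lts = [0.00001, 0.4, 0.1, 0.2]

ratio_only = select_cut_band(bands, input_lts, response_lts)
with_floor = select_cut_band(bands, input_lts, response_lts, min_input_power=0.1)
print(ratio_only, with_floor)
```

Using the ratio alone selects the 100 to 300 Hz band, where both signals are weak. With the threshold, the 1000 to 2000 Hz band is selected instead, where the input voice is strong relative to the response voice.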
Since the frequency bandwidth components may depend on the first phoneme in speech, e.g., human speech, the accuracy of detection of the input voice may be further increased by setting multiple frequency bandwidths corresponding to the components to be subtracted from the response voice.
An exemplary method for voice processing of the sixth exemplary embodiment will be described in detail with reference to
In step 701, voice detection unit 414 may receive an input voice and perform voice detection. Voice detection unit 414 may notify bandwidth estimation unit 424 of detected bandwidth information indicating the frequency bandwidth covering the detected voice.
In step 702, bandwidth estimation unit 424 may select a bandwidth excluding at least part of the frequency bandwidth indicated in the detected bandwidth information provided by voice detection unit 414. Further, bandwidth estimation unit 424 may notify the response selection unit 434 of estimated bandwidth information indicating the selected bandwidth.
In step 703, response selection unit 434 may select a response voice containing many components of the bandwidth indicated in the estimated bandwidth information.
In step 704, the response selection unit 434 may process the selected response voice by subtracting predetermined frequency bandwidth components based on, for example, the above-described rules.
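Steps 702 to 704 can be sketched as follows, taking the detected band from step 701 as given. This is an illustrative outline only: the band layout, the candidate responses and their per-band energies, the gain value, and all function names are hypothetical.

```python
def estimate_band(candidate_bands, detected_band):
    """Step 702: keep candidate bands that do not overlap the detected band."""
    low, high = detected_band
    return [b for b in candidate_bands if b[1] <= low or b[0] >= high]


def select_response(responses, target_bands):
    """Step 703: pick the response with the largest energy share in target_bands."""
    def share(energy_by_band):
        total = sum(energy_by_band.values())
        inside = sum(e for band, e in energy_by_band.items() if band in target_bands)
        return inside / total if total else 0.0
    return max(responses, key=lambda name: share(responses[name]))


def process_response(energy_by_band, detected_band, gain=0.25):
    """Step 704: reduce components that overlap the detected band."""
    low, high = detected_band
    return {
        band: energy * gain if band[0] < high and band[1] > low else energy
        for band, energy in energy_by_band.items()
    }


BANDS = [(0, 500), (500, 1000), (1000, 2000), (2000, 4000)]
detected = (500, 1000)  # assumed output of step 701

# Hypothetical candidate responses with energy per band.
responses = {
    "A": {(0, 500): 1.0, (500, 1000): 6.0, (1000, 2000): 2.0, (2000, 4000): 1.0},
    "B": {(0, 500): 3.0, (500, 1000): 2.0, (1000, 2000): 4.0, (2000, 4000): 1.0},
}

estimated = estimate_band(BANDS, detected)
chosen = select_response(responses, estimated)
processed = process_response(responses[chosen], detected)
print(estimated, chosen, processed)
```

Response "B" is chosen because most of its energy lies outside the detected band, and its remaining energy in the 500 to 1000 Hz band is then reduced rather than removed.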
As illustrated in
Bandwidth setting unit 811 may set at least one of a first frequency bandwidth to be used by voice detection unit 821 (to be described later) for voice detection and a second frequency bandwidth to be contained in a response voice selected by response selection unit 831 (to be described later) so that the first and second frequency bandwidths do not overlap each other.
Bandwidth setting unit 811 may receive detected bandwidth information indicating the frequency bandwidth covering a detected input voice from voice detection unit 821, and set the second frequency bandwidth based on the detected bandwidth information so as not to overlap with the frequency bandwidth indicated in the detected bandwidth information. Bandwidth setting unit 811 may notify response selection unit 831 of estimated bandwidth information indicating the set frequency bandwidth.
Once response selection unit 831 selects a response voice and notifies bandwidth setting unit 811 of the selected response voice, bandwidth setting unit 811 may set a frequency bandwidth excluding the frequency bandwidth of the response voice as the first frequency bandwidth. Bandwidth setting unit 811 may notify voice detection unit 821 of bandwidth information indicating the first frequency bandwidth.
Voice detection unit 821 may receive an input voice and perform voice detection. If a voice is detected, voice detection unit 821 may notify bandwidth setting unit 811 of detected bandwidth information indicating the frequency bandwidth covering the voice. Voice detection unit 821 may receive bandwidth information from bandwidth setting unit 811 and perform voice detection using at least part of the bandwidth indicated in the bandwidth information. Response selection unit 831 may select a response voice containing many components of the bandwidth indicated in the estimated bandwidth information.
A method for voice processing according to the seventh exemplary embodiment will be described in detail with reference to the flowchart of
In step 801, bandwidth setting unit 811 may receive appropriate information from voice detection unit 821 or response selection unit 831.
In step 802, based on the received information, bandwidth setting unit 811 may set at least one of the first frequency bandwidth to be used by voice detection unit 821 for voice detection and the second frequency bandwidth to be contained in a response voice selected by response selection unit 831 so that the first and second frequency bandwidths do not overlap each other.
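The non-overlap condition of step 802 can be illustrated by computing the complement of a known band within an overall range. This is a sketch under assumptions: the 0 to 4000 Hz overall range, the example bands, and the helper names are hypothetical, and the disclosure does not require that the whole complement be used.

```python
def complement_bands(occupied, full_range=(0.0, 4000.0)):
    """Return the parts of full_range not covered by any (low, high) band."""
    range_low, range_high = full_range
    free = []
    cursor = range_low
    for low, high in sorted(occupied):
        low, high = max(low, range_low), min(high, range_high)
        if low >= high or high <= cursor:
            continue  # band is outside the range or already covered
        if low > cursor:
            free.append((cursor, low))
        cursor = high
    if cursor < range_high:
        free.append((cursor, range_high))
    return free


def overlaps(bands_a, bands_b):
    """True if any band in bands_a overlaps any band in bands_b."""
    return any(a_low < b_high and b_low < a_high
               for a_low, a_high in bands_a
               for b_low, b_high in bands_b)


# Case 1: detected bandwidth information arrives from the voice detection unit,
# so the second frequency bandwidth (for the response voice) is set around it.
detected = [(200.0, 500.0), (2400.0, 2800.0)]
second_band = complement_bands(detected)

# Case 2: a response voice has been selected, so the first frequency bandwidth
# (for voice detection) is set to exclude the band of the response voice.
response_band = [(500.0, 2400.0)]
first_band = complement_bands(response_band)

print(second_band, first_band)
```

In both cases the band that is set shares no frequencies with the band it was derived from, which is the condition stated for the first and second frequency bandwidths.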
While the exemplary methods and processes may be described herein as a series of steps, it is to be understood that the order of the steps may be varied. In particular, non-dependent steps may be performed in any order, or in parallel. Also, the above-noted features and other aspects and principles of the present disclosure may be implemented in various environments. Such environments and related applications may be specially constructed for performing the various processes and operations of the disclosure or they may include a general-purpose computer or computing platform selectively activated or reconfigured by code to provide the necessary functionality. The processes disclosed herein are not inherently related to any particular computer or other apparatus, and may be implemented by a suitable combination of hardware, software, and/or firmware. For example, various general-purpose machines may be used with programs written in accordance with teachings of the disclosure, or it may be more convenient to construct a specialized apparatus or system to perform the required methods and techniques.
Systems and methods consistent with the present disclosure also include computer readable media that include program instructions or code for performing various computer-implemented operations based on the methods and processes of the disclosure. The media and program instructions may be those specially designed and constructed for the purposes of the disclosure, or they may be of the kind well known and available to those having skill in the computer software arts. Examples of program instructions include, for example, machine code, such as that produced by a compiler, and files containing high-level code that can be executed by the computer using an interpreter.
Number | Date | Country | Kind |
---|---|---|---|
2014-070717 | Mar 2014 | JP | national |
2014-070718 | Mar 2014 | JP | national |