This application is based on and claims the benefit of priority of Japanese Patent Application No. 2021-061316 filed on Mar. 31, 2021, the entire contents of which are incorporated herein by reference.
The present disclosure relates to an audio feedback detection apparatus and an audio feedback detection method.
JP-A-2004-023722 discloses an audio feedback suppression apparatus including: filter means for filtering an audio signal of at least one of an input side and an output side of signal path setting means for setting signal paths of a plurality of audio signals for each of the signal paths; input and output signal path combination selection means for selecting a combination of one or more input and output signal paths based on a comparison result between a frequency characteristic of the input side audio signal and a frequency characteristic of the output side audio signal; audio feedback detection means for detecting an audio feedback of the selected input and output signal path; filter information generation means for generating filter information on audio feedback suppression based on an audio feedback characteristic; and filter control means for controlling the filter means of the selected input and output signal path. The filter means suppresses the audio feedback occurring in the selected input and output signal path based on the filter information.
Here, consider suppressing audio feedback that may occur in a web conference system when a plurality of PCs, each including a microphone and a speaker, are disposed at an own place and are connected to a partner's place. The configuration disclosed in JP-A-2004-023722 is limited to a case where a plurality of audio input terminals and a plurality of audio output terminals included in the signal path setting means are controlled within the same audio feedback suppression apparatus. Therefore, when each of the plurality of PCs arranged at the own place attempts to detect the audio feedback in the web conference system described above using the configuration disclosed in JP-A-2004-023722, one PC at the own place cannot acquire the audio signal collected by the microphone included in another PC arranged at the same own place or the audio signal output from the speaker included in that other PC. Therefore, it is difficult to apply the configuration disclosed in JP-A-2004-023722 to the web conference system described above.
In addition, a method of performing audio feedback detection based on a frequency characteristic of the audio signal obtained by only one of the audio input terminal (that is, the microphone) and the audio output terminal (that is, the speaker) included in the PC may be considered. However, in this method, the frequency characteristic extracted from the audio signal obtained by only one of the audio input terminal and the audio output terminal is affected by signal processing performed in a transmission path (for example, an application used in the web conference system) of the audio signal. Therefore, the audio feedback cannot be accurately detected, and it is difficult to suppress the occurrence of the audio feedback.
The present disclosure has been made in view of the above-described situations in the related art, and an object thereof is to provide an audio feedback detection apparatus and an audio feedback detection method that suppress occurrence of an audio feedback that may occur between an audio input and output apparatus and another audio input and output apparatus including a microphone and a speaker.
The present disclosure provides an audio feedback detection apparatus including: a communication unit configured to communicate with one or more other terminals via a network; a microphone configured to acquire a first audio signal based on an utterance of a talker; a speaker configured to output a second audio signal from the one or more other terminals, the second audio signal being received by the communication unit and processed by an audio communication application; and an audio signal processing unit configured to detect whether an audio feedback is present based on a correlation between a frequency characteristic of the first audio signal input to the audio communication application and a frequency characteristic of the second audio signal input to the speaker, wherein one of the microphone and the speaker is located on a path in which the audio feedback occurs, and the other of the microphone and the speaker is located on a path in which the audio feedback does not occur.
Further, the present disclosure provides an audio feedback detection apparatus including: a first audio signal input unit configured to acquire a first audio signal collected by a microphone; a second audio signal input unit configured to acquire a second audio signal from an audio communication application to be output to a speaker; and an audio feedback determination unit configured to determine whether an audio feedback is present based on a correlation between a frequency characteristic of the first audio signal input to the audio communication application and a frequency characteristic of the second audio signal input to the speaker, wherein one of the microphone and the speaker is located on a path in which the audio feedback occurs, and the other of the microphone and the speaker is located on a path in which the audio feedback does not occur.
The present disclosure provides an audio feedback detection method executed by a computer including a microphone and a speaker and capable of communicating with one or more other terminals via a network, the audio feedback detection method including: acquiring a first audio signal based on an utterance of a talker by the microphone; acquiring a second audio signal from the one or more other terminals processed by an audio communication application installed so as to be executable by the computer; and detecting whether an audio feedback is present based on a correlation between a frequency characteristic of the first audio signal input to the audio communication application and a frequency characteristic of the second audio signal input to the speaker, wherein one of the microphone and the speaker is located on a path in which an audio feedback occurs, and the other of the microphone and the speaker is located on a path in which the audio feedback does not occur.
Further, the present disclosure provides an audio feedback detection method including the steps of: acquiring a first audio signal collected by a microphone; acquiring a second audio signal from an audio communication application to be output to a speaker; and determining whether an audio feedback is present based on a correlation between a frequency characteristic of the first audio signal input to the audio communication application and a frequency characteristic of the second audio signal input to the speaker, wherein one of the microphone and the speaker is located on a path in which an audio feedback occurs, and the other of the microphone and the speaker is located on a path in which the audio feedback does not occur.
The present disclosure provides an audio feedback detection apparatus including: a microphone configured to acquire a first audio signal based on an utterance of a talker; a communication unit configured to communicate an audio signal obtained by processing the first audio signal by an audio communication application with one or more other terminals via a network; a speaker configured to output a second audio signal from the one or more other terminals received by the communication unit and processed by the audio communication application; and an audio signal processing unit configured to detect whether an audio feedback is present based on the first audio signal before being processed by the audio communication application.
Further, the present disclosure provides an audio feedback detection apparatus including: a first audio signal detection unit configured to acquire a first audio signal collected by a microphone; a second audio signal detection unit configured to acquire a second audio signal from an audio communication application to be output to a speaker; and an audio feedback determination unit configured to determine whether audio feedback is present based on the first audio signal which has not been processed by the audio communication application.
The present disclosure provides an audio feedback detection method executed by a computer including a microphone and a speaker and capable of communicating with one or more other terminals via a network, the method including: acquiring a first audio signal based on an utterance of a talker by the microphone; transmitting an audio signal obtained by processing the first audio signal by an audio communication application installed to be executable by the computer to the one or more other terminals via the network; acquiring a second audio signal from the one or more other terminals processed by the audio communication application installed to be executable by the computer; and detecting whether an audio feedback is present based on the first audio signal before being processed by the audio communication application.
Further, the present disclosure provides an audio feedback detection method including: acquiring a first audio signal collected by a microphone; acquiring a second audio signal from an audio communication application to be output to a speaker; and determining whether audio feedback is present based on the first audio signal which has not been processed by the audio communication application.
According to the present disclosure, occurrence of an audio feedback that may occur with another audio input and output apparatus including a microphone and a speaker can be suppressed.
Hereinafter, an embodiment specifically disclosing an audio feedback detection apparatus and an audio feedback detection method according to the present disclosure will be described in detail with reference to the drawings as appropriate. However, an unnecessarily detailed description may be omitted. For example, a detailed description of a well-known matter or a repeated description of substantially the same configuration may be omitted. This is to avoid unnecessary redundancy in the following description and to facilitate understanding for those skilled in the art. It should be noted that the accompanying drawings and the following description are provided for a thorough understanding of the present disclosure by those skilled in the art, and are not intended to limit the subject matter recited in the claims.
First, before describing a configuration of a web conference system 100 according to a first embodiment, problems to be solved by the web conference system 100 will be briefly described. It is assumed that a plurality of PCs (personal computers) are connected to the same network at an own place (for example, the same space such as the same conference room), and a web conference is performed between the plurality of PCs and a PC at a partner's place connected to the network.
In this case, for example, when both a microphone and a speaker included in each of the plurality of PCs at the own place are turned on, an audio feedback occurs. For example, an audio signal output from a speaker of a first PC at the own place is collected by a microphone of a second PC at the same own place, and the echo of the audio signal continues in a loop, thereby causing unpleasant audio feedback. When the audio feedback occurs, the web conference does not proceed as expected, and work efficiency may deteriorate. Therefore, it is necessary to detect the occurrence of the audio feedback in real time and interrupt the loop by blocking either the audio signal input to and output from the speaker (hereinafter, may be referred to as a “speaker signal”) or the audio signal collected by the microphone (hereinafter, may be referred to as a “microphone signal”).
Therefore, the following first embodiment describes a computer (PC) including a microphone and a speaker as an example of an audio feedback detection apparatus. In the above-described web conference system, this one computer (PC) by its own configuration suppresses the occurrence of an audio feedback that may occur between the computer (PC) and another PC that includes a microphone and a speaker and is provided at the same place.
Instead of the microphone MIC1 and the speaker SPK1 incorporated in the PC 10, an external speaker microphone DV1 integrally provided with configurations and functions of the microphone and the speaker may be connected to the PC 10 for use. That is, the speaker microphone DV1 has both a function of the microphone MIC1 that collects a voice based on an utterance of an operator (talker) of the PC 10 and a function of the speaker SPK1 that outputs a voice from the PCs 20 and 30 other than the PC 10.
The PC 10 includes a memory 11, an operation device 12, a storage 13, a processor 14, a communication interface 15, the microphone MIC1, and the speaker SPK1. These units are connected to each other via an internal bus (not shown) or the like such that data or signals can be transmitted and received.
The memory 11 includes at least a random access memory (RAM) as a work memory used when, for example, the processor 14 performs various kinds of processing, and a read only memory (ROM) that stores a program (including a program of a web conference application 142) for defining the various kinds of processing performed by the processor 14 and data used during the execution of the program. The data or information generated or acquired by the processor 14 is temporarily stored in the RAM. In the ROM, the program for defining the various kinds of processing performed by the processor 14 and the data used during the execution of the program are written.
For example, the memory 11 stores a threshold A and a threshold B. The threshold A is a value used by the processor 14 to determine a protrusion degree with respect to a surrounding bin (see
For example, the memory 11 stores current values (for example, initial values) of a first gain and a second gain set in variable amplifiers VG1 and VG2. However, when the occurrence of an audio feedback is detected by an audio feedback detector HWD as described later, at least one of the first gain and the second gain is adjusted to be lowered, and thus the memory 11 may store at least one of the first gain and the second gain. The first gain and the second gain before the adjustment may be discarded from the memory 11, or may be continuously stored.
In addition, for example, the memory 11 temporarily stores the number of times of audio feedback detection, that is, the number of times the audio feedback detector HWD of an audio signal processing unit 141 of the processor 14 has detected an audio feedback during the web conference using the web conference system 100. Because this number counts detections only during the web conference in progress, it is reset to zero, for example, when the web conference is ended.
The operation device 12 is configured using, for example, at least one of devices such as a mouse, a keyboard, a touch pad, and a touch panel. The operation device 12 receives an input operation performed by the operator (that is, a user of the web conference system 100) who uses the PC 10, and inputs a signal corresponding to the input operation to the processor 14. In the following description, the operator who uses the PC 10 may be referred to as a “PC 10 user”, an operator who uses the PC 20 may be referred to as a “PC 20 user”, and an operator who uses the PC 30 may be referred to as a “PC 30 user” for convenience.
The storage 13 is configured using a storage medium such as a flash memory, a hard disk drive (HDD), or a solid state drive (SSD). The storage 13 stores the data or the information generated or acquired by the processor 14 regardless of whether the PC 10 is powered on.
The processor 14 is configured using a semiconductor chip on which at least one of electronic devices such as a central processing unit (CPU), a digital signal processor (DSP), a graphical processing unit (GPU), and a field programmable gate array (FPGA) is mounted. The processor 14 functions as a controller that controls an overall operation of the PC 10, and performs control processing for controlling operations of each of the units of the PC 10, data input and output processing with each of the units of the PC 10, data calculation processing, and data storage processing. The processor 14 can functionally execute the audio signal processing unit 141 and the web conference application 142 by using the program and the data stored in the ROM of the memory 11. The processor 14 uses the RAM of the memory 11 during the operation, and temporarily stores the data or the information generated or acquired by the processor 14 in the RAM of the memory 11.
The audio signal processing unit 141 includes at least the audio feedback detector HWD and the variable amplifiers VG1 and VG2. The audio signal processing unit 141 inputs an audio signal (that is, a microphone signal in a time domain as an example of a first audio signal) of the PC 10 user after being collected by the microphone MIC1 and before being input to the web conference application 142 and an audio signal (that is, a speaker signal in a time domain as an example of a second audio signal) after being processed by the web conference application 142 and before being input to the speaker SPK1 to the audio feedback detector HWD. Although not shown, the audio signal processing unit 141 includes a first audio input unit that acquires the audio signal (that is, the audio signal to be input to the audio feedback detector HWD) collected by the microphone MIC1, a first audio output unit that outputs the audio signal (that is, an audio signal amplified or suppressed by the variable amplifier VG1) to be output to the web conference application 142, a second audio input unit that acquires the audio signal (that is, an audio signal to be input to the audio feedback detector HWD) received via the web conference application 142, and a second audio output unit that outputs the audio signal (that is, an audio signal amplified or suppressed by the variable amplifier VG2) to be output to the speaker SPK1.
The audio signal processing unit 141 detects whether the audio feedback occurs in the web conference system 100 using the audio feedback detector HWD based on a correlation between frequency characteristics of the input microphone signal and the input speaker signal. In response to the detection of the audio feedback, the audio signal processing unit 141 adjusts at least one of the first gain for suppressing the microphone signal in the time domain described above and the second gain for suppressing the speaker signal in the time domain described above in the variable amplifiers VG1 and VG2. For example, the audio signal processing unit 141 adjusts the gains such that the second gain suppresses the speaker signal more strongly than the first gain suppresses the microphone signal. Accordingly, since the speaker signal output from the PC 10 is more suppressed than the microphone signal to be transmitted to the other PCs 20 and 30, the audio feedback in the web conference system 100 can be effectively suppressed. A configuration example of the audio feedback detector HWD will be described later with reference to
The web conference application 142 is an application executed by the processor 14 during the web conference using the web conference system 100, and is installed in each of the PCs 10, 20, and 30 constituting the web conference system 100 in an executable manner. The web conference application 142 is, for example, an application called Microsoft Teams (registered trademark) provided by Microsoft Corporation or an application called Zoom (registered trademark) provided by Zoom Video Communications, but is not limited thereto. The web conference application 142 performs various types of signal processing, such as amplification and filtering, on the microphone signal based on sound collected by the microphone MIC1, and outputs the microphone signal to the communication interface 15. The web conference application 142 performs various types of signal processing, such as amplification and filtering, on the audio signal (speaker signal) received by the communication interface 15, and outputs the audio signal to the audio signal processing unit 141.
The communication interface 15 is configured using, for example, a communication device capable of transmitting and receiving the data or the information to and from the network NW1. The communication interface 15 transmits, for example, the data or the information (for example, an audio signal Tx of the PC 10 user processed by the web conference application 142) generated or acquired by the processor 14 to the other PCs 20 and 30 via the network NW1. The communication interface 15 receives data or information (for example, an audio signal Rx processed by a web conference application installed in the PC 20 or the PC 30 based on an utterance of the PC 20 user or the PC 30 user) transmitted from the other PCs 20 and 30, and inputs the data or the information to the processor 14.
A display device 16 is configured using, for example, a liquid crystal display (LCD) or an organic electroluminescence (EL) display, and displays the data or the information (for example, display screens MSG1a and MSG1b shown in
The microphone MIC1 collects the sound based on the utterance (for example, an utterance during the web conference using the web conference system 100) of the PC 10 user, and inputs the audio signal obtained by the sound collection to the processor 14. Specifically, the audio signal from the microphone MIC1 is input to the audio signal processing unit 141 of the processor 14.
The speaker SPK1 acoustically outputs the audio signal (for example, an audio signal based on the collection of the sound made by the PC 20 user or the PC 30 user during the web conference using the web conference system 100) processed by the processor 14. A part of the audio signal output from the speaker SPK1 goes around (that is, is diffracted) and is collected by the microphone MIC2 included in another PC 20 (see
Next, an operation outline example of the PC 10 according to the first embodiment will be described with reference to
In
That is, as shown in
The processor 14 of the PC 10 according to the first embodiment can detect whether the audio feedback is present based on the correlation between the frequency characteristics of the microphone signal obtained by the sound collection by the microphone MIC1 and the speaker signal output from the speaker SPK1 (details will be described later). In response to the detection of the occurrence of the audio feedback, the processor 14 of the PC 10 performs processing (gain adjustment processing to be described later) of decreasing a gain to be multiplied by at least one of the microphone signal and the speaker signal in a stepwise manner in order to suppress at least one of the microphone signal in the time domain based on the sound collection by the microphone MIC1 and the speaker signal in the time domain before being input to the speaker SPK1. Accordingly, the occurrence of the audio feedback is gradually suppressed.
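As an illustration of comparing the frequency characteristics of the two signals, the following is a minimal Python sketch and not the implementation of the embodiment. It assumes a naive discrete Fourier transform and takes only the single strongest bin of each signal as its peak. The function names, the sample rate, and the frame length are hypothetical, and the threshold-based peak detection of the embodiment is not modeled.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def dominant_bin(spectrum):
    """Index of the strongest bin, ignoring the DC component (bin 0)."""
    return max(range(1, len(spectrum)), key=lambda k: spectrum[k])


def tone(freq_hz, sample_rate, n, amplitude=1.0):
    """Synthetic sine wave used here in place of real microphone or speaker samples."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


SAMPLE_RATE = 8000  # assumed sample rate in Hz
FRAME_LEN = 64      # assumed frame length, giving a bin width of 125 Hz

# A 1 kHz tone present in both signals stands in for a feedback component.
mic_frame = tone(1000, SAMPLE_RATE, FRAME_LEN, amplitude=0.8)
spk_frame = tone(1000, SAMPLE_RATE, FRAME_LEN, amplitude=0.5)
# A 2 kHz tone stands in for an unrelated speaker signal.
other_frame = tone(2000, SAMPLE_RATE, FRAME_LEN, amplitude=0.5)

mic_peak = dominant_bin(magnitude_spectrum(mic_frame))
spk_peak = dominant_bin(magnitude_spectrum(spk_frame))
other_peak = dominant_bin(magnitude_spectrum(other_frame))

peaks_match = mic_peak == spk_peak          # same dominant frequency in both signals
peaks_differ = mic_peak != other_peak       # unrelated signals do not share a peak
```

In this sketch, a shared dominant bin is only a candidate indication; a practical detector would also check how long the match persists before treating it as audio feedback.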
In response to the detection of the occurrence of an audio feedback, the processor 14 of the PC 10 displays, on the display device 16, a display screen MSG1a for notifying the occurrence of an audio feedback or a display screen MSG1b for notifying that a volume of the audio signal output from the speaker SPK1 is being suppressed. The PC 10 may display both the display screens MSG1a and MSG1b on the display device 16. Thus, the PC 10 user can visually recognize via the display device 16 that the audio feedback occurs during the web conference using the web conference system 100.
When the PC 10 detects the audio feedback, the processor 14 of the PC 10 may transmit display instructions of display screens MSG2 and MSG3 for notifying the occurrence of an audio feedback to the PCs 20 and 30 via the network NW1. The PCs 20 and 30 generate the display screens MSG2 and MSG3 based on the display instructions from the PC 10 and display the display screens MSG2 and MSG3 on the display device, respectively. Each of the display screens MSG2 and MSG3 may be received by each of the PCs 20 and 30 from the PC 10 together with the above-described display instructions generated by the PC 10. Accordingly, each of the PC 20 user and the PC 30 user can visually recognize that the audio feedback occurs during the web conference using the web conference system 100 via the corresponding display devices 16 of the PCs 20 and 30.
Next, a configuration example of the audio feedback detector HWD included in the PC 10 according to the first embodiment will be described with reference to
The audio signal processing unit 141 includes at least the audio feedback detector HWD and the variable amplifiers VG1 and VG2. The audio feedback detector HWD includes a frequency domain conversion unit 21, a peak detection unit 22, a frequency domain conversion unit 23, a peak detection unit 24, a peak match determination unit 25, a peak match time calculation unit 26, an audio feedback determination unit 27, and gain adjustment units 28 and 29. The audio feedback detector HWD temporarily stores the input microphone signal and the input speaker signal in the memory 11, and cooperates with the memory 11 to perform corresponding processing in each of the units constituting the audio feedback detector HWD.
The frequency domain conversion unit 21 converts, for example, the microphone signal in the time domain into the microphone signal Mc1 in the frequency domain by performing Fourier transform on the microphone signal from the microphone MIC1, and outputs the microphone signal Mc1 in the frequency domain to the peak detection unit 22. In the configuration example of the audio signal processing unit 141 shown in
Based on a frequency characteristic of the microphone signal Mc1 in the frequency domain from the frequency domain conversion unit 21, the peak detection unit 22 detects first peak frequencies f1 and f2 (see
The frequency domain conversion unit 23 converts, for example, the speaker signal in the time domain into a speaker signal Sp1 in the frequency domain by performing Fourier transform on the speaker signal before being input to the speaker SPK1, and outputs the speaker signal Sp1 in the frequency domain to the peak detection unit 24. In the configuration example of the audio signal processing unit 141 shown in
Based on a frequency characteristic of the speaker signal Sp1 in the frequency domain from the frequency domain conversion unit 23, the peak detection unit 24 detects second peak frequencies f3 and f4 (see
The peak match determination unit 25 determines whether the first peak frequencies (for example, f1 and f2 shown in
When the determination result from the peak match determination unit 25 indicates that the first peak frequency matches the second peak frequency, the peak match time calculation unit 26 determines whether a time for which the peaks match each other is continuous for a predetermined time (for example, 100 milliseconds) or more. The predetermined time is not limited to 100 milliseconds. The peak match time calculation unit 26 outputs a determination result to the audio feedback determination unit 27.
The audio feedback determination unit 27 determines whether the audio feedback occurs based on the determination result from the peak match time calculation unit 26. Specifically, the audio feedback determination unit 27 determines that the audio feedback occurs when it is determined that the first peak frequency detected by the peak detection unit 22 and the second peak frequency detected by the peak detection unit 24 match each other and the match time is continuous for the predetermined time or more. The audio feedback determination unit 27 outputs the determination result to each of the gain adjustment units 28 and 29.
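The peak match, match time, and determination steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the embodiment: the class name `PeakMatchTimer`, the frame duration, and the frequency tolerance are hypothetical, and peaks are assumed to match when any first peak frequency lies within the tolerance of any second peak frequency. Only the 100 millisecond example value comes from the description.

```python
class PeakMatchTimer:
    """Tracks how long microphone and speaker peaks have matched continuously."""

    def __init__(self, frame_ms, hold_ms=100.0, tolerance_hz=0.0):
        self.frame_ms = frame_ms          # assumed duration covered by one analysis frame
        self.hold_ms = hold_ms            # predetermined time (100 ms in the description)
        self.tolerance_hz = tolerance_hz  # assumed allowance when comparing frequencies
        self.matched_ms = 0.0

    def update(self, mic_peaks_hz, spk_peaks_hz):
        """Feed the peaks of one frame; return True when feedback is determined."""
        matched = any(
            abs(m - s) <= self.tolerance_hz
            for m in mic_peaks_hz
            for s in spk_peaks_hz
        )
        # A frame without a match breaks continuity, so the timer restarts.
        self.matched_ms = self.matched_ms + self.frame_ms if matched else 0.0
        return self.matched_ms >= self.hold_ms


timer = PeakMatchTimer(frame_ms=20.0)

matching = ([1000.0, 2500.0], [1000.0, 3000.0])      # 1000 Hz appears in both signals
not_matching = ([1000.0, 2500.0], [1200.0, 3000.0])  # no common peak frequency

frames = [matching] * 4 + [not_matching] + [matching] * 5
results = [timer.update(mic, spk) for mic, spk in frames]
```

With 20 millisecond frames, four matching frames (80 milliseconds) are not enough, the non-matching frame resets the timer, and the determination becomes positive only on the fifth consecutive matching frame that follows.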
When the determination result indicates that the occurrence of an audio feedback is detected, the audio feedback determination unit 27 outputs an instruction to adjust the first gain and the second gain to the gain adjustment units 28 and 29.
When the determination result from the audio feedback determination unit 27 indicates that the occurrence of an audio feedback is detected, the gain adjustment unit 28 adjusts the first gain by which the microphone signal is multiplied to suppress the power (level) of the microphone signal to be lower than the current value (for example, the initial value) stored in the memory 11 based on the instruction from the audio feedback determination unit 27 and the number of times of audio feedback detection stored in the memory 11. The gain adjustment unit 28 outputs the adjusted first gain to the variable amplifier VG1. An amount of reduction of the first gain from the current value is set in advance (for example, is stored in the memory 11). The gain adjustment unit 28 may include an adaptive notch filter NF1. The adaptive notch filter NF1 adjusts and sets a notch suppression gain (that is, a gain for suppressing the microphone signal) based on the determination result of the audio feedback determination unit 27, and outputs the set notch suppression gain (an example of the first gain described above) to the variable amplifier VG1. When the gain adjustment unit 28 receives a determination result indicating that the audio feedback is not detected from the audio feedback determination unit 27 after adjusting the first gain, the gain adjustment unit 28 may gradually or at once return the adjusted first gain to an original value.
When the determination result from the audio feedback determination unit 27 indicates that the occurrence of an audio feedback is detected, the gain adjustment unit 29 adjusts the second gain by which the speaker signal is multiplied to suppress the power (level) of the speaker signal to be lower than the current value (for example, the initial value) stored in the memory 11 based on the instruction from the audio feedback determination unit 27 and the number of times of audio feedback detection stored in the memory 11. The gain adjustment unit 29 outputs the adjusted second gain to the variable amplifier VG2. An amount of reduction of the second gain from the current value is set in advance (for example, is stored in the memory 11), and may be larger, smaller, or the same as the amount of reduction of the first gain from the current value described above. The gain adjustment unit 29 may include an adaptive notch filter NF2. The adaptive notch filter NF2 adjusts and sets a notch suppression gain (that is, a gain for suppressing the speaker signal) based on the determination result of the audio feedback determination unit 27, and outputs the set notch suppression gain (an example of the second gain described above) to the variable amplifier VG2. When the gain adjustment unit 29 receives a determination result indicating that the audio feedback is not detected from the audio feedback determination unit 27 after adjusting the second gain, the gain adjustment unit 29 may gradually or at once return the adjusted second gain to an original value.
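The gain adjustment performed by the gain adjustment units 28 and 29 can be sketched as follows. This is a minimal Python illustration, not the implementation of the embodiment: the class name `GainAdjuster`, the step sizes, and the lower limit are hypothetical. It assumes that each detection lowers the gain from its current value by a preset amount and that the gain returns gradually toward its initial value while no audio feedback is detected. The adaptive notch filters NF1 and NF2 are not modeled.

```python
class GainAdjuster:
    """Stepwise gain reduction on detection, gradual recovery otherwise."""

    def __init__(self, initial_gain=1.0, reduction_step=0.25, recovery_step=0.25, floor=0.0):
        self.initial_gain = initial_gain
        self.reduction_step = reduction_step  # assumed preset amount of reduction
        self.recovery_step = recovery_step    # assumed amount restored per update
        self.floor = floor                    # assumed lower limit of the gain
        self.gain = initial_gain
        self.detection_count = 0              # number of times of audio feedback detection

    def update(self, feedback_detected):
        """Apply one determination result and return the gain for the variable amplifier."""
        if feedback_detected:
            self.detection_count += 1
            self.gain = max(self.floor, self.gain - self.reduction_step)
        else:
            self.gain = min(self.initial_gain, self.gain + self.recovery_step)
        return self.gain

    def reset(self):
        """Clear the count and restore the gain, for example when the web conference ends."""
        self.detection_count = 0
        self.gain = self.initial_gain


adjuster = GainAdjuster()
trace = [adjuster.update(d) for d in (True, True, False, False, False)]
count_before_reset = adjuster.detection_count
adjuster.reset()
```

Returning the gain at once instead of gradually corresponds to setting `recovery_step` to at least the total reduction. Separate instances with different `reduction_step` values could represent the first gain and the second gain.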
The variable amplifier VG1 is disposed such that the microphone signal from the microphone MIC1 is input to the variable amplifier VG1 before being input to the web conference application 142, and amplifies or suppresses the microphone signal using the first gain instructed by the audio feedback detector HWD. For example, the variable amplifier VG1 amplifies or suppresses the microphone signal based on the adjusted first gain from the gain adjustment unit 28. Therefore, when the first gain is reduced due to the adjustment performed by the gain adjustment unit 28, the microphone signal is suppressed by the variable amplifier VG1.
The variable amplifier VG2 is disposed such that the speaker signal processed by the web conference application 142 is input to the variable amplifier VG2 before being input to the speaker SPK1, and amplifies or suppresses the speaker signal using the second gain instructed by the audio feedback detector HWD. For example, the variable amplifier VG2 amplifies or suppresses the speaker signal based on the adjusted second gain from the gain adjustment unit 29. Therefore, when the second gain is reduced due to the adjustment performed by the gain adjustment unit 29, the speaker signal is suppressed by the variable amplifier VG2.
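The role of the variable amplifiers VG1 and VG2 can be sketched as a multiplication of each sample by a gain. The sketch below assumes the gain is expressed in decibels as an amplitude gain (20·log10 convention); the embodiment does not state the convention, and the names `db_to_linear` and `apply_gain` are hypothetical.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert an amplitude gain in decibels to a linear multiplier."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples: list[float], gain_db: float) -> list[float]:
    """Multiply every sample by the linear equivalent of gain_db.

    A negative gain_db suppresses the signal; a positive one amplifies it.
    """
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]


# A hypothetical frame of the speaker signal, suppressed by 6 dB.
speaker_frame = [0.5, -0.25, 0.1]
suppressed_frame = apply_gain(speaker_frame, -6.0)
```

The same function would serve for the microphone signal with the first gain, since both amplifiers differ only in where they sit in the signal path.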
The configuration of the audio signal processing unit of the processor 14 is not limited to the configuration of the audio signal processing unit 141 shown in
Similarly to the audio signal processing unit 141 shown in
In the configuration example of the audio signal processing unit 141A shown in
When the determination result from the audio feedback determination unit 27 indicates that the occurrence of an audio feedback is detected, the gain adjustment unit 28A adjusts the first gain by which the microphone signal is multiplied to suppress the power (level) of the microphone signal to be lower than the current value (for example, the initial value) stored in the memory 11 based on the instruction from the audio feedback determination unit 27 and the number of times of audio feedback detection stored in the memory 11. The gain adjustment unit 28A outputs the adjusted first gain to the variable amplifier VG1A. The gain adjustment unit 28A may include the adaptive notch filter NF1. The adaptive notch filter NF1 adjusts and sets the notch suppression gain (that is, the gain for suppressing the microphone signal) based on the determination result of the audio feedback determination unit 27, and outputs the set notch suppression gain (the example of the first gain described above) to the variable amplifier VG1A.
The variable amplifier VG1A is disposed such that the microphone signal from the microphone MIC1 is input to the variable amplifier VG1A before being input to each of the frequency domain conversion unit 21 and the web conference application 142. The variable amplifier VG1A amplifies or suppresses the microphone signal using the first gain indicated by the audio feedback detector HWDA. For example, the variable amplifier VG1A amplifies or suppresses the microphone signal based on the adjusted first gain from the gain adjustment unit 28A. Therefore, when the first gain is reduced due to the adjustment performed by the gain adjustment unit 28A, the microphone signal is suppressed by the variable amplifier VG1A and is input to the web conference application 142.
Similarly to the audio signal processing unit 141 shown in
In the configuration example of the audio signal processing unit 141A shown in
When the determination result from the audio feedback determination unit 27 indicates that the occurrence of an audio feedback is detected, the gain adjustment unit 29B adjusts the second gain by which the speaker signal is multiplied to suppress the power (level) of the speaker signal to be lower than the current value (for example, the initial value) stored in the memory 11 based on the instruction from the audio feedback determination unit 27 and the number of times of audio feedback detection stored in the memory 11. The gain adjustment unit 29B outputs the adjusted second gain to the variable amplifier VG2B. The gain adjustment unit 29B may include the adaptive notch filter NF2. The adaptive notch filter NF2 adjusts and sets the notch suppression gain (that is, the gain for suppressing the speaker signal) based on the determination result of the audio feedback determination unit 27, and outputs the set notch suppression gain (an example of the second gain described above) to the variable amplifier VG2B.
The variable amplifier VG2B is disposed at a stage preceding the speaker SPK1 such that the speaker signal is input to the variable amplifier VG2B after being input to the frequency domain conversion unit 23. The variable amplifier VG2B amplifies or suppresses the speaker signal using the second gain instructed from the audio feedback detector HWDB. For example, the variable amplifier VG2B amplifies or suppresses the speaker signal based on the adjusted second gain from the gain adjustment unit 29B. Therefore, when the second gain is reduced due to the adjustment performed by the gain adjustment unit 29B, the speaker signal is suppressed by the variable amplifier VG2B and input to the speaker SPK1.
Similarly to the audio signal processing unit 141 shown in
In the configuration example of the audio signal processing unit 141C shown in
When the determination result from the audio feedback determination unit 27 indicates that the occurrence of an audio feedback is detected, the gain adjustment unit 28A adjusts the first gain by which the microphone signal is multiplied to suppress the power (level) of the microphone signal to be lower than the current value (for example, the initial value) stored in the memory 11 based on the instruction from the audio feedback determination unit 27 and the number of times of audio feedback detection stored in the memory 11. The gain adjustment unit 28A outputs the adjusted first gain to the variable amplifier VG1A. The gain adjustment unit 28A may include the adaptive notch filter NF1. The adaptive notch filter NF1 adjusts and sets the notch suppression gain (that is, the gain for suppressing the microphone signal) based on the determination result of the audio feedback determination unit 27, and outputs the set notch suppression gain (the example of the first gain described above) to the variable amplifier VG1A.
The variable amplifier VG1A is disposed such that the microphone signal from the microphone MIC1 is input to the variable amplifier VG1A before being input to each of the frequency domain conversion unit 21 and the web conference application 142. The variable amplifier VG1A amplifies or suppresses the microphone signal using the first gain indicated by the audio feedback detector HWDA. For example, the variable amplifier VG1A amplifies or suppresses the microphone signal based on the adjusted first gain from the gain adjustment unit 28A. Therefore, when the first gain is reduced due to the adjustment performed by the gain adjustment unit 28A, the microphone signal is suppressed by the variable amplifier VG1A and is input to the web conference application 142.
When the determination result from the audio feedback determination unit 27 indicates that the occurrence of an audio feedback is detected, the gain adjustment unit 29B adjusts the second gain by which the speaker signal is multiplied to suppress the power (level) of the speaker signal to be lower than the current value (for example, the initial value) stored in the memory 11 based on the instruction from the audio feedback determination unit 27 and the number of times of audio feedback detection stored in the memory 11. The gain adjustment unit 29B outputs the adjusted second gain to the variable amplifier VG2B. The gain adjustment unit 29B may include the adaptive notch filter NF2. The adaptive notch filter NF2 adjusts and sets the notch suppression gain (that is, the gain for suppressing the speaker signal) based on the determination result of the audio feedback determination unit 27, and outputs the set notch suppression gain (an example of the second gain described above) to the variable amplifier VG2B.
The variable amplifier VG2B is disposed at a stage preceding the speaker SPK1 such that the speaker signal is input to the variable amplifier VG2B after being input to the frequency domain conversion unit 23. The variable amplifier VG2B amplifies or suppresses the speaker signal using the second gain instructed from the audio feedback detector HWDB. For example, the variable amplifier VG2B amplifies or suppresses the speaker signal based on the adjusted second gain from the gain adjustment unit 29B. Therefore, when the second gain is reduced due to the adjustment performed by the gain adjustment unit 29B, the speaker signal is suppressed by the variable amplifier VG2B and input to the speaker SPK1.
Next, an overall operation procedure of the PC 10 according to the first embodiment will be described with reference to
In
The processor 14 converts the speaker signal in the time domain into the speaker signal Sp1 in the frequency domain by performing a Fourier transform on the speaker signal before being input to the speaker SPK1 (St3). It is assumed that the frequency range to be analyzed is, for example, 100 to 2500 Hz. The processor 14 detects the second peak frequency at which the power of the speaker signal Sp1 obtained for each bin of the speaker signal Sp1 is the maximum value based on the frequency characteristic of the speaker signal (for example, the speaker signal Sp1 shown in
Based on detection results of steps St2 and St4, the processor 14 determines whether the first peak frequency (for example, f1 and f2 shown in
On the other hand, when the processor 14 determines that the first peak frequency and the second peak frequency match each other and the time for which the peaks match each other continues for the predetermined time (for example, 100 milliseconds) or more (YES in St5), the processor 14 determines that the audio feedback occurs, increments the number of times of audio feedback detection, and stores the number of times of audio feedback detection in the memory 11 (St6).
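The determination of steps St5 and St6 can be sketched as a small state machine that tracks how long the peaks have matched. This is an illustration under stated assumptions: peaks are represented as sets of bin indices, a "match" is taken to mean that at least one bin index is shared, the frame length `frame_ms` is hypothetical, and counting each continuous match only once is a choice made here, not something the embodiment specifies.

```python
from dataclasses import dataclass


@dataclass
class FeedbackDetector:
    hold_ms: float = 100.0     # predetermined time (example value from the text)
    frame_ms: float = 10.0     # hypothetical analysis frame length
    matched_ms: float = 0.0    # how long the peaks have matched so far
    detection_count: int = 0   # number of times of audio feedback detection
    in_feedback: bool = False  # True while the current match was already counted

    def update(self, mic_peaks: set[int], spk_peaks: set[int]) -> bool:
        """Feed one frame of peak bins; return True when feedback is newly detected."""
        if mic_peaks & spk_peaks:
            self.matched_ms += self.frame_ms
        else:
            # Peaks no longer match: restart the timer (NO in St5).
            self.matched_ms = 0.0
            self.in_feedback = False
        if self.matched_ms >= self.hold_ms and not self.in_feedback:
            self.in_feedback = True
            self.detection_count += 1  # St6
            return True
        return False


detector = FeedbackDetector()
# Ten consecutive 10 ms frames in which both signals peak at bin 12.
results = [detector.update({12}, {12}) for _ in range(10)]
```

With these values the tenth frame is the first one at which the match has lasted 100 milliseconds, so only the last entry of `results` is `True`.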
In response to the detection of the audio feedback, the processor 14 adjusts the first gain by which the microphone signal is multiplied in order to suppress the power (level) of the microphone signal before being input to the web conference application 142 to be lower than the current value (St7), and adjusts the second gain by which the speaker signal is multiplied in order to suppress the power (level) of the speaker signal before being input to the speaker SPK1 to be lower than the current value (St7). Details of step St7 will be described later with reference to
The processor 14 suppresses the power (level) of the microphone signal before being input to the web conference application 142 using the first gain adjusted in step St7 (St8). The processor 14 suppresses the power (level) of the speaker signal before being input to the speaker SPK1 using the second gain adjusted in step St7 (St9). Based on the occurrence of an audio feedback in step St6, the processor 14 displays, on the display device 16, the display screen MSG1a (see
Next, an operation procedure of detecting the first peak frequency and the second peak frequency by PC 10 according to the first embodiment will be described with reference to
In
The processor 14 determines whether the power in the frequency domain of the microphone signal of the target bin is larger than a multiplication result of the average power calculated in step St11 and the threshold A read from the memory 11 (St12). When the processor 14 determines that the power in the frequency domain of the microphone signal of the target bin is smaller than the multiplication result of the average power calculated in step St11 and the threshold A read from the memory 11 (NO in St12), the processor 14 determines that the power of the target bin is not the peak, and ends the processing shown in
On the other hand, when the processor 14 determines that the power in the frequency domain of the microphone signal of the target bin is larger than the multiplication result of the average power calculated in step St11 and the threshold A read from the memory 11 (YES in St12), the processor 14 determines whether the power in the frequency domain of the microphone signal of the target bin is larger than the power in the frequency band of −1 bin from the target bin (that is, the frequency band reduced by 1 bin from the target bin) (St13). When the processor 14 determines that the power in the frequency domain of the microphone signal of the target bin is smaller than the power in the frequency band of −1 bin from the target bin (that is, the frequency band reduced by 1 bin from the target bin) (NO in St13), the processor 14 determines that the power of the target bin is not the peak, and ends the processing shown in
On the other hand, when the processor 14 determines that the power in the frequency domain of the microphone signal of the target bin is larger than the power in the frequency band of −1 bin from the target bin (that is, the frequency band reduced by 1 bin from the target bin) (YES in St13), the processor 14 determines whether the power in the frequency domain of the microphone signal of the target bin is larger than the power in the frequency band of +1 bin from the target bin (that is, the frequency band increased by 1 bin from the target bin) (St14). When the processor 14 determines that the power in the frequency domain of the microphone signal of the target bin is smaller than the power in the frequency band of +1 bin from the target bin (that is, the frequency band increased by 1 bin from the target bin) (NO in St14), the processor 14 determines that the power of the target bin is not the peak, and ends the processing shown in
On the other hand, when the processor 14 determines that the power in the frequency domain of the microphone signal of the target bin is larger than the power in the frequency band of +1 bin from the target bin (that is, the frequency band increased by 1 bin from the target bin) (YES in St14), the processor 14 determines whether the power in the frequency domain of the microphone signal of the target bin is larger than the threshold B read from the memory 11 (St15). When the processor 14 determines that the power in the frequency domain of the microphone signal of the target bin is smaller than the threshold B read from the memory 11 (NO in St15), the processor 14 determines that the power of the target bin is not the peak, and ends the processing shown in
On the other hand, when the processor 14 determines that the power in the frequency domain of the microphone signal of the target bin is larger than the threshold B read from the memory 11 (YES in St15), the processor 14 determines that the power in the frequency domain of the microphone signal of the target bin is the peak (St16). After step St16, the processing shown in
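The sequence of checks in steps St12 to St16 can be summarized in a short sketch. It assumes that the average power of step St11 is the mean over all bins of the frame, that edge bins without two neighbours are never peaks, and that equality counts as "not larger"; none of these details is stated in the text, and the function names and example values are hypothetical.

```python
def is_peak(power: list[float], k: int,
            threshold_a: float, threshold_b: float) -> bool:
    """Return True when bin k passes the checks of steps St12 to St15."""
    if k <= 0 or k >= len(power) - 1:
        return False                       # edge bins lack one neighbour
    average = sum(power) / len(power)      # St11 (assumed: mean over all bins)
    if power[k] <= average * threshold_a:  # St12: relative threshold A
        return False
    if power[k] <= power[k - 1]:           # St13: compare with the -1 bin
        return False
    if power[k] <= power[k + 1]:           # St14: compare with the +1 bin
        return False
    if power[k] <= threshold_b:            # St15: absolute threshold B
        return False
    return True                            # St16: the target bin is a peak


def find_peaks(power: list[float],
               threshold_a: float, threshold_b: float) -> list[int]:
    """Apply is_peak to every bin and collect the indices that qualify."""
    return [k for k in range(len(power))
            if is_peak(power, k, threshold_a, threshold_b)]


# Hypothetical per-bin power values of one frame.
frame_power = [1.0, 1.2, 9.0, 1.1, 1.0, 0.9, 6.0, 1.0]
peaks = find_peaks(frame_power, threshold_a=2.0, threshold_b=5.0)
```

Here the mean power is 2.65, so threshold A requires more than 5.3; bins 2 and 6 pass every check. Raising threshold B to 7.0 would leave only bin 2.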
Next, the operation procedure of adjusting the first gain and the second gain by the PC 10 according to the first embodiment will be described with reference to
In
On the other hand, when the processor 14 determines that the audio feedback is detected (YES in St21), the processor 14 determines whether the number of times of audio feedback detection stored in the memory 11 is 1 (that is, whether the audio feedback detection is the first detection) (St22).
When the processor 14 determines that the audio feedback detection is the first detection (YES in St22), the processor 14 adjusts, for example, the gain of the microphone MIC1 (that is, the first gain) to be reduced by about 3 dB from the current value (for example, the initial value) of the first gain (St23), and adjusts the gain of the speaker SPK1 (that is, the second gain) to be reduced by about 6 dB from the current value (for example, the initial value) (St24). The adjustment amount (decrease amount) of 3 dB for the first gain and the adjustment amount (decrease amount) of 6 dB for the second gain are merely examples. By reducing the second gain by a larger amount than the first gain, the processor 14 can effectively suppress the audio feedback while suppressing interruption of the web conference. That is, largely reducing the second gain suppresses the audio feedback at an early stage, while only slightly reducing the first gain still contributes to suppressing the audio feedback and avoids a situation in which the level of the utterance of the PC 10 user drops rapidly and other users cannot hear the utterance.
On the other hand, when the processor 14 determines that the detection of the audio feedback is not the first detection (NO in St22), the processor 14 adjusts, for example, the gain of the microphone MIC1 (that is, the first gain) and the gain of the speaker SPK1 (that is, the second gain) to be lower than the current values of the first gain and the second gain by about 1 dB (St25).
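The gain adjustment of steps St21 to St25 can be sketched as follows, using the example amounts of 3 dB, 6 dB, and 1 dB given above. Leaving both gains unchanged when no audio feedback is detected is an assumption made for this sketch, and the function name `adjust_gains` is hypothetical.

```python
def adjust_gains(first_gain_db: float, second_gain_db: float,
                 feedback_detected: bool,
                 detection_count: int) -> tuple[float, float]:
    """Return the adjusted (first gain, second gain) in dB."""
    if not feedback_detected:        # NO in St21 (assumed: no change)
        return first_gain_db, second_gain_db
    if detection_count == 1:         # YES in St22: first detection
        # St23 and St24: the speaker side is reduced more than the microphone side.
        return first_gain_db - 3.0, second_gain_db - 6.0
    # St25: second and subsequent detections use a smaller step.
    return first_gain_db - 1.0, second_gain_db - 1.0


# Three consecutive detections starting from hypothetical initial gains of 0 dB.
gains = (0.0, 0.0)
trace = []
for count in (1, 2, 3):
    gains = adjust_gains(gains[0], gains[1], True, count)
    trace.append(gains)
```

The trace shows the stepwise behaviour: a large first reduction followed by 1 dB steps for each further detection.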
In
The processor 14 determines whether the audio feedback is detected in step St6 of
On the other hand, when the processor 14 determines that the audio feedback is detected (YES in St33), the processor 14 determines whether the number of times of audio feedback detection stored in the memory 11 is 1 (that is, whether the audio feedback detection is the first detection) (St34).
When the processor 14 determines that the audio feedback detection is the first detection (YES in St34), the processor 14 sets, for example, the notch suppression gain in each of the adaptive notch filters NF1 and NF2 to 6 dB (St35). On the other hand, when the processor 14 determines that the audio feedback detection is not the first detection (NO in St34), the processor 14 sets, for example, the notch suppression gain in each of the adaptive notch filters NF1 and NF2 to 3 dB (St36). That is, the gain adjustment processing shown in
After step St35 or step St36, the processor 14 determines operation frequency ranges of the adaptive notch filters NF1 and NF2 based on the audio feedback frequency of the audio feedback detected in step St6 (St37). For example, the processor 14 determines a frequency band in a predetermined range centered on the audio feedback frequency as the operation frequency ranges of the adaptive notch filters NF1 and NF2. As a result, the processor 14 can effectively suppress the microphone signal and the speaker signal in the operation frequency ranges set in step St37, and thus can suppress the occurrence of an audio feedback.
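The determination of the operation frequency range in step St37 and its effect can be sketched in the frequency domain. Note that this is not an adaptive notch filter implementation: it simply attenuates the spectrum bins that fall inside a band centered on the audio feedback frequency. The band half-width, the bin spacing, the amplitude (20·log10) dB convention, and all names are assumptions introduced for illustration.

```python
def notch_band(feedback_hz: float, half_width_hz: float) -> tuple[float, float]:
    """Return the (low, high) edges of a band centered on the feedback frequency."""
    return max(0.0, feedback_hz - half_width_hz), feedback_hz + half_width_hz


def apply_notch(spectrum: list[float], bin_hz: float, feedback_hz: float,
                half_width_hz: float, suppression_db: float) -> list[float]:
    """Attenuate the magnitude of every bin whose frequency lies inside the band."""
    low, high = notch_band(feedback_hz, half_width_hz)
    factor = 10.0 ** (-suppression_db / 20.0)
    return [value * factor if low <= k * bin_hz <= high else value
            for k, value in enumerate(spectrum)]


# Hypothetical flat magnitude spectrum with 100 Hz bins and feedback at 300 Hz.
flat_spectrum = [1.0] * 8
notched = apply_notch(flat_spectrum, bin_hz=100.0, feedback_hz=300.0,
                      half_width_hz=100.0, suppression_db=6.0)
```

With these values, only the bins at 200, 300, and 400 Hz are attenuated; the remaining bins pass unchanged, which is why a narrow band keeps the impact on speech small.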
As described above, in the first embodiment, the PC 10 that performs audio communication via the web conference application 142 detects whether the audio feedback is present based on the correlation between the frequency characteristics of the microphone signal obtained by sound collection by the microphone MIC1 and the speaker signal output from the speaker SPK1. As a result, it is possible to suppress the erroneous detection of an audio feedback caused by the signal processing of the web conference application 142.
For example, as described above, it is assumed that the audio feedback detection is performed by observing characteristics (for example, peaks) of the frequency characteristics. In this case, if the signal processing of the web conference application 142 adds a special frequency characteristic to the audio signal input to the web conference application 142 and the audio feedback detection is performed on the resulting audio signal, the audio feedback may be erroneously detected when the added frequency characteristic matches or is similar to the frequency characteristic observed by the audio feedback detection algorithm. In particular, the business entity (company) providing (a program for executing) the audio signal processing unit 141 generally differs from the business entity providing (a program for executing) the web conference application 142. In this case, since it is unknown what kind of signal processing is applied, in the web conference application 142 and a cloud system connected to the web conference application 142, to the audio signal output from the audio signal processing unit 141, the erroneous detection described above may occur.
Specifically, the web conference application 142 has an echo cancellation function. Taking the PC 10 as an example, the utterance of the PC 20 user or the PC 30 user, which is collected by the other PC 20 or PC 30 and transmitted via the network NW1, is output via the speaker SPK1 and collected by the microphone MIC1, but this utterance is removed by the echo cancellation function. As a result, although the audio quality in the web conference system 100 is improved, the audio signal after the echo cancellation is not optimal for the audio feedback detection because its frequency characteristics have been modified. The web conference application 142 further includes a mute function. When the mute function is executed, no sound is output from the web conference application 142, and thus the audio feedback detection cannot be performed by audio processing in a stage subsequent to the web conference application 142.
In contrast, in the first embodiment, whether the audio feedback is present is determined after confirming that the correlation of the frequency characteristics is high (for example, peak positions match or are similar) in the audio signal before being input to the web conference application 142 and the audio signal after being output from the web conference application 142. Therefore, it is possible to suppress the erroneous detection of characteristics (for example, peaks) about the frequency characteristics caused by the influence of the signal processing of the web conference application 142 as the audio feedback. Further, the audio signal processing unit 141 can suppress the erroneous detection of an audio feedback even when the signal processing performed by the web conference application 142 is unknown.
Although it is preferable to check the correlation of the audio signals when determining whether the audio feedback is present as in the first embodiment, the audio signal processing unit 141 may perform the audio feedback detection using at least the audio signal before being input to the web conference application 142, and may control the first gain and the second gain as described above based on the result of the audio feedback detection. Accordingly, it is possible to suppress the erroneous detection caused by the signal processing of the web conference application 142 and appropriately cope with the detected audio feedback. When the correlation of the frequency characteristics between the input and the output is not checked, the audio feedback detection may be based on the frequency characteristics of the audio signal as described above, or other methods may be used.
As described above, the PC 10 serving as an example of the audio feedback detection apparatus according to the first embodiment includes: the communication unit (for example, the communication interface 15) configured to communicate with one or more other terminals (for example, the PCs 20 and 30) via the network NW1; the microphone MIC1 configured to acquire the first audio signal based on an utterance of a talker (for example, the PC 10 user); the speaker SPK1 configured to output the second audio signal from the other terminals, the second audio signal being received by the communication unit and processed by the audio communication application (for example, the web conference application 142); and the audio signal processing unit 141 configured to detect whether the audio feedback is present based on the correlation between the frequency characteristic of the first audio signal input to the audio communication application and the frequency characteristic of the second audio signal input to the speaker SPK1. Further, one of the microphone MIC1 and the speaker SPK1 of the PC 10 is located on the audio feedback occurrence path PTH1, and the other of the microphone MIC1 and the speaker SPK1 of the PC 10 is located on the audio feedback non-occurrence path NPTH1.
Further, the audio feedback detector HWD serving as an example of the audio feedback detection apparatus according to the first embodiment includes: the first audio signal input unit configured to acquire the first audio signal collected by the microphone MIC1; the second audio signal input unit configured to acquire the second audio signal from the audio communication application to be output to the speaker SPK1; and the audio feedback determination unit 27 configured to determine whether the audio feedback is present based on the correlation between the frequency characteristic of the first audio signal input to the audio communication application and the frequency characteristic of the second audio signal input to the speaker SPK1. Further, one of the microphone MIC1 and the speaker SPK1 of the PC 10 is located on the audio feedback occurrence path PTH1, and the other of the microphone MIC1 and the speaker SPK1 of the PC 10 is located on the audio feedback non-occurrence path NPTH1.
Thereby, it is possible to suppress the occurrence of an audio feedback that may occur with the other audio input and output apparatus (for example, the PC 20) including the microphone and the speaker.
The audio signal processing unit 141 adjusts at least one of the first gain for suppressing the first audio signal and the second gain for suppressing the second audio signal in response to the detection of the audio feedback. Accordingly, the PC 10 can effectively suppress the occurrence of an audio feedback that may occur when the plurality of PCs 10 and 20 each including the microphone and the speaker are connected with each other at the same place (for example, the place B1) in the web conference using the web conference system 100 shown in
The audio signal processing unit 141 adjusts the second gain by a larger amount than the first gain. Accordingly, since the power (level) of the speaker signal output from the speaker SPK1 is suppressed by a larger amount than the power (level) of the microphone signal collected by the microphone MIC1, the power (level) of the speaker signal that is output from the speaker SPK1 and reaches the microphone MIC2 of the other PC 20 is reduced, and thus it is possible to effectively suppress the audio feedback while suppressing interruption of the web conference. That is, largely reducing the second gain suppresses the audio feedback at an early stage, while only slightly reducing the first gain still contributes to suppressing the audio feedback and avoids a situation in which the level of the utterance of the PC 10 user drops rapidly and other users cannot hear the utterance.
The audio signal processing unit 141 sets adjustment amounts of the first gain and the second gain according to the first detection of the audio feedback to be larger than adjustment amounts of the first gain and the second gain according to the second and subsequent detections of the audio feedback. Accordingly, the PC 10 can suppress the occurrence of an audio feedback as much as possible by adjusting the gains of the microphone signal and the speaker signal to suppress the power (level) when the audio feedback is detected for the first time. Further, when the audio feedback occurs for the second and subsequent times, the gains are adjusted by smaller amounts than at the first time, so that the PC 10 can suppress the power (level) of the microphone signal and the speaker signal in a stepwise manner, whereby the audio feedback in the web conference system 100 can be effectively suppressed.
The audio signal processing unit 141 determines that the audio feedback occurs when the first peak frequencies f1 and f2 at which the peaks on the frequency characteristic of the first audio signal are detected match the second peak frequencies f3 and f4 at which the peaks on the frequency characteristic of the second audio signal are detected for a predetermined time or more. In
The audio signal processing unit 141 further includes the first notch filter (for example, the adaptive notch filter NF1) configured to suppress the first audio signal having a frequency in a predetermined range around a frequency at which the audio feedback is detected and the second notch filter (for example, the adaptive notch filter NF2) configured to suppress the second audio signal having the frequency in the predetermined range. Accordingly, the PC 10 can gradually reduce the occurrence of an audio feedback.
The PC 10 serving as an example of the audio feedback detection apparatus according to the first embodiment includes: the microphone MIC1 configured to acquire the first audio signal based on the utterance of the talker (for example, the PC 10 user); the communication unit (for example, the communication interface 15) configured to communicate the audio signal obtained by processing the first audio signal by the audio communication application (for example, the web conference application 142) with one or more other terminals (for example, the PCs 20 and 30) via the network NW1; the speaker SPK1 configured to output the second audio signal from the other terminals received by the communication unit and processed by the audio communication application (for example, the web conference application 142); and the audio signal processing unit 141 configured to detect whether audio feedback is present based on the first audio signal before being input to the audio communication application.
The audio feedback detector HWD serving as an example of the audio feedback detection apparatus according to the first embodiment includes: a first audio signal detection unit configured to acquire a first audio signal collected by the microphone MIC1; a second audio signal detection unit configured to acquire a second audio signal from an audio communication application to be output to the speaker SPK1; and the audio feedback detector HWD configured to determine whether audio feedback is present based on the first audio signal before being input to the audio communication application.
In this case, when the peak detection unit 22 detects the peak of the power (level) in the frequency domain in the frequency characteristic of the first audio signal (for example, the microphone signal) before being input to the audio communication application (for example, the web conference application 142), the PC 10 can detect, for example, the audio feedback when the audio feedback determination unit 27 receives the detection result of the peak detection unit 22 (see a dotted arrow in
Although the various embodiments are described above with reference to the drawings, it is needless to say that the present disclosure is not limited to such examples. It will be apparent to those skilled in the art that various alterations, modifications, substitutions, additions, deletions, and equivalents can be conceived within the scope of the claims, and it should be understood that such changes also belong to the technical scope of the present disclosure. Components in the above-described embodiments may be combined optionally within a range not departing from the spirit of the invention.
The present disclosure is useful as an audio feedback detection apparatus and an audio feedback detection method for suppressing occurrence of an audio feedback that may occur with another audio input and output apparatus including a microphone and a speaker.
Number | Date | Country | Kind |
---|---|---|---|
2021-061316 | Mar 2021 | JP | national |