1. Field of the Invention
The present invention relates to a sound pickup apparatus and a method suitable for use when, for example, a plurality of conference participants in two distant conference rooms hold an audio teleconference using a plurality of microphones, or hold an audio-plus-video conference by further adding video.
More particularly, the present invention relates to a sound pickup apparatus and a method which, in a sound pickup apparatus performing echo cancellation with one echo canceller for a plurality of microphones, remedy the defects in the echo cancellation processing that arise when the internal processing of the echo canceller is switched to the internal processing for a new microphone immediately upon switching microphones.
2. Description of the Related Art
A TV conference system including a sound pickup apparatus, or a sound pickup apparatus to which video is added, has been used to enable conference participants in two conference rooms at distant locations to hold a conference.
In such a sound pickup apparatus, from among the speaking persons using the plurality of microphones, the microphone used by the speaking person whose voice should be transmitted to the conference room of the other party is selected.
In such a sound pickup apparatus, one echo canceller is provided for the plurality of microphones. This is because, although an echo canceller can usually process at high speed, it is realized by an expensive digital signal processor (DSP), so the echo cancellation processing for the plurality of microphones is performed by a single echo canceller.
The echo canceller performs the echo cancellation while performing learning processing on the sound from the selected microphone. Accordingly, the echo canceller holds learning data for the echo cancellation of each microphone.
When one echo canceller performs the echo cancellation processing for a plurality of microphones and switching from a first microphone to a second microphone is performed, if the learning data in the echo canceller is switched immediately to the learning data for the second microphone, a situation arises in which the voice from the second microphone is subjected to echo cancellation processing with the learning data for the first microphone.
This is because the learning data for each microphone, obtained by the learning processing in the echo canceller, is based on sound data acquired continuously over a predetermined time.
An object of the present invention is to provide a sound pickup apparatus and a method that prevent erroneous echo cancellation processing when switching from a first microphone to a second microphone in a sound pickup apparatus performing echo cancellation processing for a plurality of microphones with one echo canceller.
According to a first aspect of the present invention, there is provided a sound pickup apparatus having a plurality of microphones placed based on a predetermined condition, a microphone selector detecting sound pickup signals of the plurality of microphones and selecting the microphone that has detected an effective sound pickup signal among the detected sound pickup signals, an echo cancellation processor performing echo cancellation processing on the sound signal of the selected microphone, and an echo cancellation processing controller stopping the echo cancellation processing for a predetermined period when the sound signal of the microphone is switched.
Preferably, when selecting and outputting the sound pickup signal of a new microphone, the microphone selector cross-fades the sound signal of the previously selected microphone and the sound signal of the new microphone, and the echo cancellation processing controller stops the echo cancellation processing during the cross-fading period.
According to a second aspect of the present invention, there is provided a sound pickup method having a microphone selection step of detecting sound pickup signals of a plurality of microphones placed based on a predetermined condition and selecting the microphone that has detected an effective sound pickup signal among the detected sound pickup signals, an echo cancellation processing step of performing echo cancellation processing on the sound signal of the selected microphone, and an echo cancellation processing control step of stopping the echo cancellation processing for a predetermined period when the sound signal of the microphone is switched in the microphone selection step.
According to the present invention, unnatural echo cancellation processing can be avoided by stopping the echo cancellation processing when selecting (changing) microphones.
These and other objects and features of the present invention will become clearer from the following description of the preferred embodiments given with reference to the accompanying drawings, in which:
Preferred embodiments of the present invention will be described with reference to the accompanying drawings.
Hereinafter, a sound pickup apparatus of an embodiment of the present invention will be explained.
As illustrated in
[Overview of Sound Pickup Apparatus]
Usually, a conversation via the communication line 920 is carried out between one speaker and another, that is, one-to-one, but in the communication apparatus of the embodiment of the present invention, a plurality of conference participants in the conference rooms 901 and 902 can converse with each other by using one communication line 920. Note that in the present embodiment, in order to avoid congestion of audio, the parties speaking at the same time (same period) are limited to one at each side.
As mentioned above, the sound pickup apparatus selects (identifies) a calling party and picks up the audio of the selected calling party.
The picked-up audio and the imaged video are transferred to the conference room of the other side and played in the sound pickup apparatus of the other side.
Details of Communication Apparatus
The configuration of the communication apparatus in the sound pickup apparatus according to an embodiment of the present invention will be explained referring to
As illustrated in
As illustrated in
Speech by a speaking person in the other conference room is output from the receiving and reproduction speaker 16, passes through the upper sound output opening 14c, and is diffused along the space defined by the sound reflection surface 12a of the sound reflection plate 12 and the sound reflection surface 14a of the speaker housing 14 over the entire 360 degrees around the axis C-C. The cross-section of the sound reflection surface 12a of the sound reflection plate 12 draws a gentle trumpet-like arc as illustrated. The cross-section of the sound reflection surface 12a forms the illustrated sectional shape over 360 degrees (the entire orientation) around the axis C-C. Similarly, the cross-section of the sound reflection surface 14a of the speaker housing 14 draws a gentle convex shape as illustrated. The cross-section of the sound reflection surface 14a forms the illustrated sectional shape over 360 degrees (the entire orientation) around the axis C-C.
The sound S output from the receiving and reproduction speaker 16 passes through the upper sound output opening 14c, passes through the sound output space defined by the sound reflection surface 12a and the sound reflection surface 14a and having a trumpet-like cross-section, is diffused along the surface of the table 911 on which the sound pickup apparatus is placed in the entire orientation of 360 degrees around the axis C-C, and is heard with an equal volume by all conference participants A1 to A6. In the present embodiment, the surface of the table 911 is utilized as part of the sound propagating means.
The state of diffusion of the sound S output from the receiving and reproduction speaker 16 is shown by the arrows.
The sound reflection plate 12 supports a printed circuit board 21.
The printed circuit board 21, as illustrated in a plane in
The printed circuit board 21 has dampers 18 attached to it for absorbing vibration from the receiving and reproduction speaker 16 so as to prevent that vibration from being transmitted through the sound reflection plate 12, entering the microphones MC1 to MC6, etc., and becoming noise. Each damper 18 is comprised of a screw and a buffer material, such as a vibration-absorbing rubber insert, interposed between the screw and the printed circuit board 21. The buffer material is fastened by the screw to the printed circuit board 21. Namely, the vibration transmitted from the receiving and reproduction speaker 16 to the printed circuit board 21 is absorbed by the buffer material. Due to this, the microphones MC1 to MC6 are not affected much by sound from the speaker 16.
Arrangement of Microphones
As illustrated in
Each of the microphones MC1 to MC6 is supported by a first microphone support member 22a and a second microphone support member 22b, both having flexibility or resiliency, so that it can rock freely (for simplicity, only the first microphone support member 22a and the second microphone support member 22b of the microphone MC1 are illustrated). In addition to the measure of preventing the influence of vibration from the receiving and reproduction speaker 16 by the dampers 18 using the above buffer materials, the first and second microphone support members 22a and 22b, by their flexibility or resiliency, absorb the vibration of the printed circuit board 21 caused by the receiving and reproduction speaker 16, so that noise from the receiving and reproduction speaker 16 is avoided.
As illustrated in
The conference participants A1 to A6, as illustrated in
As a means for notification of the determination of the speaking person (microphone selection result displaying means), light emission diodes LED1 to LED6 are arranged in the vicinity of the microphones MC1 to MC6. The light emission diodes LED1 to LED6 have to be provided so as to be viewable by all conference participants A1 to A6 even in a state where the upper cover 11 is attached. Accordingly, the upper cover 11 is provided with a transparent window so that the light emission states of the light emission diodes LED1 to LED6 can be viewed. Naturally, openings could also be provided at the portions of the upper cover 11 over the light emission diodes LED1 to LED6, but the transparent window is preferred from the viewpoint of preventing dust from entering the microphone electronic circuit housing 2.
In order to perform the various types of signal processing explained later, the printed circuit board 21 is provided with a first digital signal processor (DSP1) 25, a second digital signal processor (DSP2) 26, and various types of electronic circuits 27 to 29, arranged in the space other than the portion where the microphones MC1 to MC6 are located.
In the present embodiment, the DSP 25 is used as the signal processing means for performing processing such as filter processing and microphone selection processing together with the various types of electronic circuits 27 to 29, and the DSP 26 is used as an echo canceller.
The microprocessor 23 performs the processing for overall control of the microphone electronic circuit housing 2.
The codec 24 compresses and encodes the audio to be transmitted to the conference room of the other party.
The DSP 25 performs the various types of signal processing explained below, for example, the filter processing and the microphone selection processing.
The DSP 26 functions as the echo canceller.
In
In addition, in the microphone electronic circuit housing 2, various types of circuits such as the power supply circuit are mounted on the printed circuit board 21.
In
Note that the A/D converters 271 to 274 can also be configured as A/D converters equipped with variable gain type amplification functions.
Sound pickup signals of the microphones MC1 to MC6 converted at the A/D converters 271 to 273 are input to the DSP 25 where various types of signal processing explained later are carried out.
As one of processing results of the DSP 25, the result of selection of one of the microphones MC1 to MC6 is output to the light emission diodes LED1 to LED6 as one of the examples of the microphone selection result displaying means.
The processing result of the DSP 25 is output to the DSP 26 where the echo cancellation processing is carried out. The DSP 26 has for example an echo cancellation transmitter and an echo cancellation receiver.
The processing results of the DSP 26 are converted to analog signals at the D/A converters 281 and 282. The output from the D/A converter 281 is encoded at the codec 24 according to need, output to a line-out terminal of the telephone line 920 (
The audio from the communication apparatus disposed in the conference room of the other party is input via the line-in terminal of the telephone line 920 (
The output from the D/A converter 282 is output as sound from the receiving and reproduction speaker 16 of the communication apparatus via the amplifier 292. Namely, the conference participants A1 to A6 can also hear audio emitted by the speaking parties in the conference room via the receiving and reproduction speaker 16 in addition to the audio of the selected speaking person of the conference room of the other party from the receiving and reproduction speaker 16 explained above.
Microphones MC1 to MC6
In each single directivity characteristic microphone, as illustrated in
When using microphones having directivity shown in
If, unlike the embodiment of the present invention, microphones having no directivity are used instead of microphones having directivity, all sounds around the microphones are picked up, so the audio of the speaking person is mixed with the surrounding noise and the S/N deteriorates, and good sound cannot be picked up. In order to avoid this, in the present invention, the S/N with respect to the surrounding noise is improved by picking up the sounds with single directivity microphones.
Further, as a method for obtaining directivity in the microphones, a microphone array using a plurality of non-directional microphones can be used. With this method, however, complex processing is required for matching the time axes (phases) of the plurality of signals, so processing takes a long time, the response is slow, and the hardware configuration becomes complex. Namely, complex signal processing is required also in the signal processing system of the DSP. The present invention solves such problems by using microphones having the directivity exemplified in
Further, when combining microphone array signals to use the microphones as directional sound pickup microphones, there is the disadvantage that the outer shape is restricted by the pass frequency characteristic and becomes large. The present invention also solves this problem.
The sound pickup apparatus having the above configuration has the following advantages.
(1) The positional relationships between the even number of microphones MC1 to MC6 arranged radially at equal angles and equal intervals and the receiving and reproduction speaker 16 are constant, and further the distances are very short, so the level of the sound issued from the receiving and reproduction speaker 16 and coming back directly is overwhelmingly larger than and dominant over the level of the sound issued from the receiving and reproduction speaker 16, passing through the conference room (room) environment, and coming back to the microphones MC1 to MC6. Due to this, the characteristics (signal levels (intensities), frequency characteristics (f characteristics), and phases) of the arrival of the sounds from the speaker 16 at the microphones MC1 to MC6 are always the same. That is, the sound pickup apparatus in the embodiment of the present invention has the advantage that the transmission function is always the same.
(2) Therefore, there is the advantage that the transmission function when switching the output of the microphone transmitted to the conference room of the other party when the speaking person changes does not change and it is not necessary to adjust the gain of the microphone system whenever the microphone is switched. In other words, there is the advantage that it is not necessary to re-do the adjustment once adjustment is carried out at the time of manufacture of the communication apparatus.
(3) For the same reason as above, even if the microphone is switched when the speaking person changes, a single echo canceller (DSP) 26 is sufficient. A DSP is expensive. Further, it is not necessary to arrange a plurality of DSPs on the printed circuit board 21, which has little empty space because various members are mounted on it. In addition, the space for arranging the DSP on the printed circuit board 21 may be small. As a result, the printed circuit board 21 and, in turn, the communication apparatus of the present invention can be made small.
(4) As explained above, since the transmission functions between the receiving and reproduction speaker 16 and the microphones MC1 to MC6 are constant, there is the advantage for example that adjustment of the sensitivity difference of the microphones of +3 dB can be carried out solely by the microphone unit of the communication apparatus. Details of the adjustment of the sensitivity difference will be explained later.
(5) By using a round table or a polygonal table as the table on which the sound pickup apparatus is mounted, a speaker system that disperses (scatters) audio of equal quality equally over the entire 360 degrees around the axis C from the single receiving and reproduction speaker 16 in the communication apparatus 1 becomes possible.
(6) There is the advantage that the sound output from the receiving and reproduction speaker 16 propagates along the table surface of the round table (boundary effect), so good quality sound arrives at the conference participants equally and efficiently; in the ceiling direction of the conference room, sounds of opposite phase cancel each other and become small, so there is little sound reflected from the ceiling toward the conference participants, and as a result clear sound is distributed to the participants.
(7) The sound output from the receiving and reproduction speaker 16 arrives at the microphones MC1 to MC6 arranged at equal angles radially and at equal intervals with the same volume simultaneously, therefore a decision of whether sound is audio of a speaking person or received audio becomes easy. As a result, erroneous decision in the microphone selection processing is reduced. Details thereof will be explained later.
(8) By arranging an even number of, for example, six, microphones at equal angles radially and at equal intervals so that a facing pair of microphones are arranged on a straight line, the level comparison for detecting the direction can be easily carried out.
(9) By the dampers 18, the microphone support members 22 and so on, the influence of vibration due to the sound of the receiving and reproduction speaker 16 exerted upon the sound pickup of the microphones MC1 to MC6 can be reduced.
(10) As illustrated in
In the sound pickup apparatus explained referring to
The number of microphones is not limited to six. Any number of microphones, for example, four or eight, may be arranged at equal angles radially and at equal intervals about the axis C so that a plurality of pairs are located on straight lines (in the same direction), for example, like the microphones MC1 and MC4. The reason that two microphones, for example MC1 and MC4, are arranged on a straight line facing each other as a preferable embodiment is for selecting the microphone and identifying the speaking person.
Content of Signal Processing
Hereinafter, the content of the processing performed mainly by the first digital signal processor (DSP) 25 will be explained.
(1) Measurement of Surrounding Noise
As an initial operation, preferably, the noise of the surroundings where the sound pickup apparatus is disposed is measured.
The sound pickup apparatus can be used in various environments (conference rooms). In order to achieve correct selection of the microphone and raise the performance of the sound pickup apparatus, in the present invention, at the initial stage, the noise of the surrounding environment where the sound pickup apparatus is disposed is measured to enable elimination of the influence of that noise from the signals picked up at the microphones.
Naturally, when the sound pickup apparatus is repeatedly used in the same conference room, the noise is measured in advance, so this processing can be omitted when the state of the noise does not change. Note that the noise can also be measured in the normal state.
(2) Selection of Chairperson
For example, when using the sound pickup apparatus for a two-way conference, it is advantageous if there is a chairperson who runs the proceedings in the conference rooms. Accordingly, as an aspect of the present invention, in the initial stage of using the sound pickup apparatus, the chairperson is set from the operation unit 15 of the sound pickup apparatus. As a method for setting the chairperson, for example, the first microphone MC1 located in the vicinity of the operation unit 15 is used as the chairperson's microphone. Naturally, the chairperson's microphone may be any microphone.
Note that when the chairperson repeatedly using the sound pickup apparatus is the same person, this processing can be omitted. Alternatively, the microphone at the position where the chairperson sits may be determined in advance. In this case, no operation for selection of the chairperson is necessary each time.
Naturally, the selection of the chairperson is not limited to the initial state and can be carried out at any time.
(3) Adjustment of Sensitivity Difference of Microphones
As the initial operation, preferably the gain of the amplification unit for amplifying signals of the microphones MC1 to MC6 or the attenuation value of the attenuation unit is automatically adjusted so that the acoustic couplings between the receiving and reproduction speaker 16 and the microphones MC1 to MC6 become equal.
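As a rough illustration of how such an automatic adjustment could be carried out (the concrete calibration procedure is not specified in the text, so the reference-signal approach and the function below are assumptions), one can measure the level of a reference signal from the speaker 16 at each microphone and derive per-microphone gains that equalize the couplings:

```python
def equalize_coupling_gains(measured_levels, target=None):
    """Return one gain per microphone so the acoustic couplings become equal.

    measured_levels: levels of a reference signal played from the speaker 16
    as picked up by MC1 to MC6. The reference-signal procedure is an
    assumption for illustration, not taken from the text.
    """
    if target is None:
        target = min(measured_levels)      # match every channel to the quietest one
    return [target / level for level in measured_levels]


# Usage with hypothetical measured values:
gains = equalize_coupling_gains([0.95, 1.00, 1.05, 0.98, 1.02, 0.97])
```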
As the usual processing, the various types of processing exemplified below are carried out.
(1) Processing for Selection and Switching of Microphones
When a plurality of conference participants speak simultaneously in one conference room, the audio is mixed and hard for the conference participants A1 to A6 in the conference room of the other party to understand. Therefore, in the present invention, in principle, only one person is allowed to speak in a certain time interval. For this purpose, the DSP 25 performs processing for selecting and switching the microphone.
As a result, only the speech from the selected microphone is transmitted to the communication apparatus 1 of the conference room of the other party via the telephone line 920 and output from the speaker. Naturally, as explained by referring to
This processing aims to select the signal of the single directivity microphone facing the speaking person and to send a signal having a good S/N to the other party as the transmission signal.
(2) Display of Selected Microphone
Whether the microphone of a speaking person has been selected, and which microphone belongs to the conference participant permitted to speak, are made easy for all of the conference participants A1 to A6 to recognize by turning on the corresponding microphone selection result displaying means, for example, the light emission diodes LED1 to LED6.
(3) Signal Processing
As background processing for the above microphone selection processing, and in order to execute the microphone selection processing correctly, various types of signal processing exemplified below are carried out.
Measurement of Floor (Environment) Noise
This processing is divided into initial processing immediately after turning on the power of the sound pickup apparatus and the normal processing.
Note that, the processing is carried out under the following typical preconditions.
Generation of Various Types of Frequency Component Signals by Filter Processing
The sound pickup signals of the microphones are processed by an analog low cut filter 101 having a cut-off frequency of, for example, 100 Hz; the filtered voice signals, from which frequencies of 100 Hz or less have been removed, are output to the A/D converter 102; and the sound pickup signals converted to digital signals at the A/D converter 102 are stripped of their high frequency components by the digital high cut filters 103a to 103e (referred to overall as 103) having cut-off frequencies of 7.5 kHz, 4 kHz, 1.5 kHz, 600 Hz, and 250 Hz (high cut processing). From the outputs of the digital high cut filters 103a to 103e, the filter signals of the adjacent digital high cut filters are then subtracted in the subtracters 104a to 104d (referred to overall as 104).
In this embodiment of the present invention, the digital high cut filters 103a to 103e and the subtracters 104a to 104d are actually realized by processing in the DSP 25. The A/D converter 102 can be realized as part of the A/D converter block 27.
Band-Pass Filter Processing and Microphone Signal Level Conversion Processing
As one of the triggers for start of the microphone selection processing, the start and end of the speech is judged. The signal used for this is obtained by the band-pass filter processing and the level conversion processing illustrated in
Each of the level conversion units 202a to 202g has a signal absolute value processing unit 203 and a peak hold processing unit 204. Accordingly, as illustrated by the waveform diagram, the signal absolute value processing unit 203 inverts the sign when receiving as input a negative signal, indicated by a broken line, to convert it to a positive signal. The peak hold processing unit 204 holds the maximum value of the output signals of the signal absolute value processing unit 203. Note that in the present embodiment, the held maximum value drops a little along with the elapse of time. Naturally, it is also possible to improve the peak hold processing unit 204 to reduce the amount of drop and enable the maximum value to be held for a long time.
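As a minimal sketch (not the patent's implementation), the absolute value processing and peak hold with a gradual drop could look like the following per-sample routine; the decay constant is an assumed value.

```python
def make_level_converter(decay=0.999):
    """Absolute-value conversion followed by a peak hold with a slow decay.

    The decay factor controlling how fast the held maximum drops is an
    assumption; the text only states that the maximum drops a little over time.
    """
    peak = 0.0

    def convert(sample: float) -> float:
        nonlocal peak
        rectified = abs(sample)                # negative inputs become positive
        peak = max(rectified, peak * decay)    # hold the maximum, decaying slowly
        return peak

    return convert


# Usage: one converter per band-pass output and per microphone channel.
level = make_level_converter()
for s in (0.2, -0.8, 0.1, 0.05):
    print(level(s))
```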
The band-pass filter will be explained next. The band-pass filter used in the communication apparatus 1 is, for example, comprised of just second-order IIR high cut filters and the low cut filter of the microphone signal input stage. The present embodiment utilizes the fact that if a signal passed through a high cut filter is subtracted from a signal having a flat frequency characteristic, the remainder becomes substantially equivalent to a signal passed through a low cut filter.
In order to match the frequency-level characteristics, one extra full band-pass band becomes necessary. The required band-pass outputs are obtained with a number of filters, and sets of filter coefficients, equal to the number of bands of the band-pass filters + 1. The band frequencies of the band-pass filters required here are the following six bands per channel (CH) of the microphone signal:
With this method, the IIR filter computations in the DSP 25 amount to only 6 CH (channels) × 5 (IIR filters) = 30 filters. Compare this with the configuration of conventional band-pass filters.
In the embodiment of the present invention, the 100 Hz low cut filter processing is realized by the analog filter of the input stage. The prepared second-order IIR high cut filters have five cut-off frequencies: 250 Hz, 600 Hz, 1.5 kHz, 4 kHz, and 7.5 kHz. Since the sampling frequency is 16 kHz, the high cut filter having the cut-off frequency of 7.5 kHz is actually unnecessary, but it is used to intentionally rotate the phase of the minuend signal in order to reduce the phenomenon whereby the output level of the band-pass filter drops due to phase rotation of the IIR filter in the subtraction processing step.
In the filter processing at the DSP 25 illustrated in
First Stage
[1] For the full band-pass filter, the input signal is passed through the 7.5 kHz high cut filter. This filter output signal becomes the band-pass filter output of [100 Hz-7.5 kHz] by combination with the input analog low cut filter.
[2] The input signal is passed through the 4 kHz high cut filter. This filter output signal becomes the band-pass filter output of [100 Hz-4 kHz] by combination with the input analog low cut filter.
[3] The input signal is passed through the 1.5 kHz high cut filter. This filter output signal becomes the band-pass filter output of [100 Hz-1.5 kHz] by combination with the input analog low cut filter.
[4] The input signal is passed through the 600 Hz high cut filter. This filter output signal becomes the band-pass filter output of [100 Hz-600 Hz] by combination with the input analog low cut filter.
[5] The input signal is passed through the 250 Hz high cut filter. This filter output signal becomes the band-pass filter output of [100 Hz-250 Hz] by combination with the input analog low cut filter.
Second Stage
[1] When the band-pass filter (BPF5=[4 kHz to 7.5 kHz]) executes the processing of the filter output [1]-[2] ([100 Hz to 7.5 kHz]-[100 Hz to 4 kHz]), the above signal output [4 kHz to 7.5 kHz] is obtained.
[2] When the band-pass filter (BPF4=[1.5 kHz to 4 kHz]) executes the processing of the filter output [2]-[3] ([100 Hz to 4 kHz]-[100 Hz to 1.5 kHz]), the above signal output [1.5 kHz to 4 kHz] is obtained.
[3] When the band-pass filter (BPF3=[600 Hz to 1.5 kHz]) executes the processing of the filter output [3]-[4] ([100 Hz to 1.5 kHz]-[100 Hz to 600 Hz]), the above signal output [600 Hz to 1.5 kHz] is obtained.
[4] When the band-pass filter (BPF2=[250 Hz to 600 Hz]) executes the processing of the filter output [4]-[5] ([100 Hz to 600 Hz]-[100 Hz to 250 Hz]), the above signal output [250 Hz to 600 Hz] is obtained.
[5] The band-pass filter (BPF1=[100 Hz to 250 Hz]) uses the first-stage signal of [5] above as is as its output signal.
[6] The band-pass filter (BPF6=[100 Hz to 600 Hz]) uses the first-stage signal of [4] above as is as its output signal.
The required band-pass filter output is obtained by the above processing in the DSP 25.
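To make the two-stage processing above concrete, here is an illustrative Python sketch only: the Butterworth designs stand in for the unspecified second-order IIR high cut filters, and the 100 Hz analog low cut is assumed to have been applied upstream.

```python
import numpy as np
from scipy.signal import butter, lfilter

FS = 16_000  # sampling frequency in Hz, per the text

def high_cut(x, fc):
    """Second-order low-pass ('high cut') filter; Butterworth is an assumption."""
    b, a = butter(2, fc, btype="low", fs=FS)
    return lfilter(b, a, x)

def band_outputs(x):
    """First stage: high cut filters. Second stage: subtract adjacent outputs."""
    y75 = high_cut(x, 7500)
    y40 = high_cut(x, 4000)
    y15 = high_cut(x, 1500)
    y06 = high_cut(x, 600)
    y025 = high_cut(x, 250)
    return {
        "BPF5 (4 kHz-7.5 kHz)": y75 - y40,
        "BPF4 (1.5 kHz-4 kHz)": y40 - y15,
        "BPF3 (600 Hz-1.5 kHz)": y15 - y06,
        "BPF2 (250 Hz-600 Hz)": y06 - y025,
        "BPF1 (100 Hz-250 Hz)": y025,   # first-stage output [5] used as is
        "BPF6 (100 Hz-600 Hz)": y06,    # first-stage output [4] used as is
    }

# Usage: x is one channel of microphone samples already low-cut at 100 Hz.
x = np.random.randn(FS)  # placeholder signal for illustration
bands = band_outputs(x)
```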
For the input sound pickup signals MIC1 to MIC6 of the microphones, the sound pressure level of the entire band and the sound pressure levels of the six bands passed through the band-pass filters are constantly updated as in Table 1.
In Table 1, for example, L1-1 indicates the peak level when the sound pickup signal of the microphone MC1 passes through the first band-pass filter 201a. In the judgment of the start and end of speech, use is made of the microphone sound pickup signal passed through the 100 Hz to 600 Hz band-pass filter 201a illustrated in
Processing for Judgment of Start and End of Speech
Based on the value output from the sound pressure level detection unit, as illustrated in
The start judgment of speech judges the start of speech from the time when the sound pressure level data (microphone signal level (1)) passing through the 100 Hz to 600 Hz band-pass filter and converted in sound pressure level at the microphone signal conversion processing unit 202b illustrated in
The DSP 25 is designed not to detect the start of the next speech during the speech end judgment time, for example, 0.5 second, after detecting the start of speech in order to avoid the malfunctions accompanying frequent switching of the microphones.
Microphone Selection
The DSP 25 detects the direction of the speaking person in the mutual speech system and automatically selects the signal of the microphone facing the speaking person based on the so-called “score card method”.
The sound pickup apparatus, as illustrated in
Hereinafter, a description will be given of the operation mainly using the DSP 25 in the sound pickup apparatus by referring to the flowchart of
Step S1: Monitoring of Level Conversion Signal
The signals picked up at the microphones MC1 to MC6 are converted as seven types of level data in the band-pass filter block 201 and the level conversion block 202 explained by referring to
Based on the monitoring results, the DSP 25 shifts to either the speaking person direction detection processing or the speech start/end judgment processing.
Step S2: Processing for Judgment of Speech Start/End
The DSP 25 judges the start and end of speech by referring to
Note that, in the processing for judgment of the start and end of speech at step S2, when the speech level becomes smaller than the speech end level, the timer of the speech end judgment time (for example 0.5 second) is activated. When the speech level is smaller than the speech end level during the speech end judgment, it is judged that the speech has ended.
When it becomes larger than the speech end level during the speech end judgment, the wait processing is entered until it becomes smaller than the speech end level again.
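For illustration, the start/end judgment with the 0.5 second end judgment time could be sketched as the following state machine; the threshold values and frame interval are assumptions, since the text does not give concrete numbers for them.

```python
FRAME_MS = 10                         # assumed interval between level updates
END_HOLD_FRAMES = 500 // FRAME_MS     # 0.5 second speech end judgment time

class SpeechGate:
    """Start/end judgment on the peak-held level of the 100-600 Hz band."""

    def __init__(self, start_level=0.10, end_level=0.05):
        # start_level and end_level are assumed threshold values
        self.start_level = start_level
        self.end_level = end_level
        self.speaking = False
        self.end_timer = 0

    def update(self, level: float) -> str:
        if not self.speaking:
            if level >= self.start_level:
                self.speaking = True
                self.end_timer = 0
                return "start"
            return "idle"
        # End judgment: the level must stay below the end level for the
        # whole end judgment time; if it rises again, wait and restart.
        if level < self.end_level:
            self.end_timer += 1
            if self.end_timer >= END_HOLD_FRAMES:
                self.speaking = False
                return "end"
        else:
            self.end_timer = 0
        return "speaking"
```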
Step S3: Processing for Detection of Speaking Person Direction
The processing for detection of the speaking person direction in the DSP 25 is carried out by searching for the speaking person direction constantly and continuously. Thereafter, the data is supplied to the processing for judgment of the speaking person direction of step S4.
Step S4: Processing for Switching of Speaking Person Direction Microphone
The processing for judgment of timing in the processing for switching the speaking person direction microphone in the DSP 25 instructs the microphone signal switching processing of step S5 to select the microphone in the new speaking person direction when the results of the processing of step S2 and the processing of step S3 show that the speaking person direction detected at that time differs from the speaking person direction that has been selected up to now.
However, when the chairperson's microphone has been set from the operation unit 15 and the chairperson's microphone and other conference participants simultaneously speak, priority is given to the speech of the chairperson.
At this time, the selected microphone information is displayed on the microphone selection result displaying means, for example, the light emission diodes LED1 to LED6.
Step S5: Transmission of Microphone Sound Pickup Signals
The processing for switching the microphone signal transmits only the microphone signal selected by the processing of step S4 from among the six microphone signals, for example, as the transmission signal from the first sound pickup apparatus 10A to the second sound pickup apparatus 10B of the other party via the communication line 920, and outputs it to the line-out terminal of the communication line 920 illustrated in
Judgment of Speech Start
Processing 1: The output levels of the sound pressure level detector corresponding to the six microphones and the threshold value of the speech start level are compared.
The start of speech is judged when the output level exceeds the threshold value of the speech start level. When the output levels of the sound pressure level detector corresponding to all microphones exceed the threshold value of the speech start level, the DSP 25 judges the signal to be from the receiving and reproduction speaker 16 and does not judge that speech has started. This is because the distances between the receiving and reproduction speaker 16 and all microphones MC1 to MC6 are the same, so the sound from the receiving and reproduction speaker 16 reaches all microphones MC1 to MC6 almost equally.
Processing 2: Three sets of microphones each comprised of two single directivity microphones (microphones MC1 and MC4, microphones MC2 and MC5, and microphones MC3 and MC6) obtained by arranging the six microphones illustrated in
Absolute value of (signal level of microphone 1−signal level of microphone 4) [1]
Absolute value of (signal level of microphone 2−signal level of microphone 5) [2]
Absolute value of (signal level of microphone 3−signal level of microphone 6) [3]
The DSP 25 compares the above absolute values [1], [2], and [3] with the threshold value of the speech start level and judges the speech start when the absolute value exceeds the threshold value of the speech start level.
In the case of this processing, all absolute values do not become larger than the threshold value of the speech start level unlike the processing 1 (since sound from the receiving and reproduction speaker 16 equally reaches all microphones), so judgment of whether the sound is from the receiving and reproduction speaker 16 or audio from a speaking person becomes unnecessary.
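Read as code, the two judgment methods could look like the sketch below (one function combining processing 1 and processing 2 for illustration; the levels are the peak-held microphone levels and the threshold value is an assumption):

```python
START_THRESHOLD = 0.10   # assumed speech start threshold value

def speech_started(levels):
    """levels: peak-held levels of the six microphones MC1..MC6 (index 0..5)."""
    # Processing 1: any level over the threshold starts speech, unless all
    # levels are over it (then the sound is judged to come from the speaker 16).
    over = [lv >= START_THRESHOLD for lv in levels]
    if any(over) and not all(over):
        return True
    # Processing 2: level differences of facing microphone pairs; sound from
    # the speaker 16 reaches both microphones of a pair equally, so its
    # difference stays small and no speaker/speech decision is needed.
    pairs = [(0, 3), (1, 4), (2, 5)]   # MC1-MC4, MC2-MC5, MC3-MC6
    return any(abs(levels[a] - levels[b]) >= START_THRESHOLD for a, b in pairs)
```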
Processing for Detection of Speaking Person Direction
For the detection of the speaking person direction, the characteristic of the single directivity microphones exemplified in
The method of judgment applied as the actual processing for detecting the speaking person direction in the sound pickup apparatus according to the embodiment of the present invention will be described next.
Suitable weighting processing is carried out on the output level of each band of the band-pass filters (in 1 dB full scale (1 dBFs) steps: a score of 0 at 0 dBFs, a score of 3 at −3 dBFs, or vice versa). The resolution of the processing is determined by this weighting step.
The above weighting processing is executed for each sample clock, the weighted scores of each microphone are added, the result is averaged over a constant number of samples, and the microphone signal having the smallest (largest) total score is judged to be the microphone facing the speaking person. The following Table 2 illustrates the result.
In the example illustrated in Table 2, the first microphone MC1 has the smallest total score, so the DSP 25 judges that there is a sound source (a speaking person) in the direction of the first microphone MC1. The DSP 25 holds the result in the form of a sound source direction microphone number.
As explained above, the DSP 25 weights the output level of each frequency band of the band-pass filters for each microphone, ranks the microphone signals for each band in order from the smallest (largest) score, and judges the microphone signal ranked first in three or more bands to be from the microphone facing the speaking person. The DSP 25 then prepares a score card as in the following Table 3, indicating that there is a sound source (a speaking person) in the direction of the first microphone MC1.
In actuality, due to the influence of sound reflections and standing waves depending on the characteristics of the room, the first microphone MC1 does not always rank first among the outputs of all the band-pass filters, but if it ranks first in the majority of the five bands, it can be judged that there is a sound source (a speaking person) in the direction of the first microphone MC1. The DSP 25 holds the result in the form of the sound source direction microphone number.
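For illustration only, the weighting and score-card tally described above might be sketched as follows; the averaging over a constant number of samples is omitted, and the convention that the smallest total score wins is assumed (the text allows either convention):

```python
import numpy as np

def direction_by_score_card(band_levels_db):
    """band_levels_db: array of shape (6 microphones, 5 bands), levels in dBFs.

    Weight each band level in 1 dB steps (0 at 0 dBFs, 3 at -3 dBFs, ...),
    total the scores per microphone, and take the microphone with the smallest
    total; also check that it ranks first in at least three of the five bands.
    """
    levels = np.asarray(band_levels_db, dtype=float)
    scores = np.round(-levels)              # 0 dBFs -> 0, -3 dBFs -> 3, ...
    totals = scores.sum(axis=1)             # total score per microphone
    best_mic = int(np.argmin(totals))
    first_in_band = np.argmin(scores, axis=0)        # per-band winner
    wins = int(np.sum(first_in_band == best_mic))    # bands where best_mic is first
    return best_mic, wins >= 3              # (microphone index, confident decision)
```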
The DSP 25 totals up the output level data of the bands of the band-pass filters for each microphone in the form shown below, judges the microphone signal having the largest level to be from the microphone facing the speaking person, and holds the result in the form of the sound source direction microphone number.
MIC1 Level=L1-1+L1-2+L1-3+L1-4+L1-5
MIC2 Level=L2-1+L2-2+L2-3+L2-4+L2-5
MIC3 Level=L3-1+L3-2+L3-3+L3-4+L3-5
MIC4 Level=L4-1+L4-2+L4-3+L4-4+L4-5
MIC5 Level=L5-1+L5-2+L5-3+L5-4+L5-5
MIC6 Level=L6-1+L6-2+L6-3+L6-4+L6-5
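In code, this second judgment is simply a per-microphone sum of the five band levels; the sketch below assumes the levels are held in a 6-by-5 array.

```python
import numpy as np

def direction_by_level_sum(band_levels):
    """band_levels: array (6 microphones, 5 bands) of band-pass output levels.

    Returns the index of the microphone whose summed level
    (MICn Level = Ln-1 + ... + Ln-5) is the largest.
    """
    totals = np.asarray(band_levels, dtype=float).sum(axis=1)
    return int(np.argmax(totals))
```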
Processing for Judgment of Switch Timing of Speaking Person Direction Microphone
When activated by the speech start judgment result of step S2 of
In order to eliminate the influence of reflection sound and the standing wave in a room having a large echo, the DSP 25 prohibits the issuance of a new microphone selection command unless the speech end judgment time (for example 0.5 second) passes after switching the microphone.
It prepares two microphone selection switch timings from the microphone signal level conversion processing result of step S1 of
First Method: Time when Speech Start can be Clearly Judged
Case where speech from the direction of the selected microphone is ended and there is new speech from another direction.
In this case, the DSP 25 decides that speech is started after the speech end judgment time (for example 0.5 second) or more passes after all microphone signal levels (1) and microphone signal levels (2) become the speech end threshold value level or less and when any one microphone signal level (1) becomes the speech start threshold value level or more, determines the microphone facing the speaking person direction as the legitimate sound pickup microphone based on the information of the sound source direction microphone number, and starts the microphone signal selection switch processing of step S5.
Second Method: Case where there is New Speech of Larger Voice from Another Direction During Period where Speech is Continued
In this case, the DSP 25 starts the judgment processing after the speech end judgment time (for example 0.5 second) or more passes from the speech start (time when the microphone signal level (1) becomes the threshold value level or more).
When it judges that the sound source direction microphone number from the processing of S3 changed before the detection of the speech end and it is stable, the DSP 25 decides there is a speaking person speaking with a larger voice than the speaking person which is selected at present at the microphone corresponding to the sound source direction microphone number, determines the sound source direction microphone as the legitimate sound pickup microphone, and activates the microphone signal selection switch processing of step S5.
Processing for Switching Selection of Signal of Microphone Facing Detected Speaking Person
The DSP 25 is activated by the command from the switch timing judgment processing of the speaking person direction microphone of step S4 of
The processing for switching the selection of the microphone signal of the DSP 25 is realized by six multipliers and a six input adder as illustrated in
When the channel gain is switched to [1] or [0] as described above, there is a possibility that a clicking sound will be generated due to the level difference of the microphone signals switched. Therefore, in the sound pickup apparatus, as illustrated in
Further, by setting the maximum channel gain to a value other than [1], for example [0.5], the echo cancellation processing operation in the subsequent DSP 26 can be adjusted.
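As an illustrative sketch of the six-multiplier and adder arrangement with cross-faded channel gains (the linear ramp shape and the block-based processing are assumptions; the maximum gain of 0.5 follows the example above):

```python
import numpy as np

def crossfade_gains(n_samples, max_gain=0.5):
    """Linear gain ramps for the outgoing and the incoming channel."""
    ramp = np.linspace(0.0, 1.0, n_samples)
    return (1.0 - ramp) * max_gain, ramp * max_gain   # (old gain, new gain)

def switch_output(mic_block, old_ch, new_ch, max_gain=0.5):
    """mic_block: array (6 microphones, n_samples) of one processing block.

    Six multipliers and a six-input adder; during the cross-fade only the
    outgoing and incoming channels have nonzero gains, so clicking due to
    level differences between the switched microphone signals is avoided.
    """
    n = mic_block.shape[1]
    g_old, g_new = crossfade_gains(n, max_gain)
    return mic_block[old_ch] * g_old + mic_block[new_ch] * g_new
```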
As explained above, the sound pickup apparatus of the first embodiment of the present invention can be effectively applied to conference call processing without the influence of noise.
The communication apparatus of the first embodiment of the present invention has the following advantages from the viewpoint of structure:
The communication apparatus of the first embodiment of the present invention has the following advantages from the viewpoint of the signal processing:
A second embodiment of the present invention, concerning the details of the echo cancellation processing, will be described with reference to FIGS. 19 to 21.
Sound from the other party, input via the communication path, is output evenly in all directions (360 degrees) from the speaker 16 of the sound pickup apparatus on this side described with reference to
Meanwhile, the sound from the speaker 16 is reflected by the walls, ceiling, and so on of the conference room on this side. That reflected sound is detected as an echo, overlapping the sound of the conference participants on this side, by the plurality of, for example, six microphones MC1 to MC6. Further, the sound from the speaker 16 may enter the microphones MC1 to MC6 directly and be detected by the microphones MC1 to MC6 as an echo overlapping the sound of the conference participants on this side.
As mentioned above, the sound detected by the microphones MC1 to MC6 may include not only the sound of the conference participants in the conference room on this side but also the sound from the sound pickup apparatus of the other party.
Therefore, if such an echo signal is not removed from the sound signal detected by the microphone selected by the sound pickup apparatus on this side, a sound containing the echo is sent to the sound pickup apparatus of the other party, and the sound output from the speaker of the sound pickup apparatus of the other party is heard containing an echo along with the sound sent from this side. It is therefore necessary to remove such an echo.
The second DSP 26 operates as an echo canceller performing the above-mentioned echo cancellation processing.
Such sound from the other party that becomes an echo is not detected identically by the plurality of microphones, due to differences in the positions of the microphones and in the state of reflection from the walls, ceiling, and so on. Therefore, the second DSP 26 performs the echo cancellation processing for each microphone. For this reason, the second DSP 26 is referred to as the echo canceller (EC) 26.
In the present embodiment, particularly, one EC 26 performs the echo cancellation processing for a plurality of, for example, six microphones.
Since the EC 26 is realized by one DSP containing a memory, the processing is actually performed as program processing in the DSP. However, in
The EC processing portion 261 performs echo cancellation processing on the sound signal of the microphone selected by the first DSP 25, which performs the microphone selection processing and so on, and input to the EC 26; the signal after the processing is sent to the sound pickup apparatus of the other party via the D/A converter 281 and the line-out terminal.
The memory 263 stores data used in the EC processing portion 261.
The control processing portion in the EC 264 performs control processing within the EC 26, in particular timing control of the processing in the EC processing portion 261, in cooperation with the first DSP 25.
An exemplification illustrated in
The outputs of the two microphones MCa and MCb are input to the first DSP 25 via two A/D converters 27a and 27b among the A/D converters 27 illustrated in
The sound outputs of the two microphones MCa and MCb, cross-faded via the faders FDa and FDb, are added by the adder ADR and output to the EC 26.
An outline of the method of switching from one of the two microphones MCa and MCb to the other with cross-fading in the first DSP 25 has been explained; the details of the microphone selection and switching methods are based on the above-mentioned method of the first embodiment.
An outline of the processing of the EC processing portion 261 is shown in
The EC processing portion 261 has a first switch SW1, a second switch SW2, first and second transmission characteristic processing portions 2611 and 2612, an adder-subtracter portion 2614, and a learning processing portion 2615.
Under control of the control processing portion in the EC 264, the first switch SW1 connects the output signal S1 of the A/D converter 274 to either the off position or one of the first and second transmission characteristic processing portions 2611 and 2612.
The transmission characteristic processing portions 2611 and 2612 generate the echo cancellation components for the signals of the microphones MCa and MCb, respectively. They have the same transmission characteristic function, but the delay element and the filter coefficients differ according to the microphones MCa and MCb. The transmission characteristic function, the delay element, and the filter coefficients are described later.
Similarly, under control of the control processing portion in the EC 264, the second switch SW2 connects either the off position or one of the first and second transmission characteristic processing portions 2611 and 2612 to the adder-subtracter portion 2614.
The output of whichever transmission characteristic processing portion 2611 or 2612 is connected is subtracted, as the echo cancellation component, from the signal S25 from the adder ADR of the first DSP 25 in the adder-subtracter portion 2614.
The echo component is estimated in the learning processing portion 2615; the delay element and the filter coefficients corresponding to the estimated echo component are stored (updated) in the memory portion 263 and set in the transmission characteristic processing portion 2611 or 2612 corresponding to the microphone MCa or MCb.
The echo cancellation processing in the EC processing portion 261 is an equalization filter processing involving the delay element. The delay element is defined as the average delay time from when the signal transmitted from the sound pickup apparatus of the other party is output until it is reflected by the walls, ceiling, and so on, detected by a microphone on this side, and reaches the EC 26. The echo signal component of the amplitude that should be removed is then defined by the filter coefficients of the equalization filter.
The transmission characteristic processing portions 2611 and 2612 are configured as equalization filters defined by transmission functions of the same configuration; however, the delay element and the filter coefficients differ according to the microphones MCa and MCb. The delay element and the filter coefficients are stored in the memory portion 263 by the learning processing portion 2615.
The learning processing portion 2615 has the same transmission characteristic function as the transmission characteristic processing portions 2611 and 2612. It continuously receives the output signal S1 of the A/D converter 274, which represents the microphone selection signal of the sound pickup apparatus of the other party, the output signal S25 of the adder ADR in the first DSP 25, and the echo cancellation processing result signal S27 of the adder-subtracter portion 2614, learns and estimates a characteristic such that the echo signal corresponding to the microphone selection signal of the sound pickup apparatus of the other party (such as the reflection signal of the speaker 16) is removed, and thereby estimates the delay element and the filter coefficients.
The delay element and the filter coefficients estimated by the learning processing portion 2615 are stored in the memory portion 263 and configure whichever of the transmission characteristic processing portions 2611 and 2612 is connected to the adder-subtracter portion 2614 by the switches SW1 and SW2; that transmission characteristic processing portion equalizes the output signal S1 of the A/D converter 274.
The equalized signal is applied to the adder-subtracter portion 2614 and subtracted from the signal S25, so that the echo signals (such as the reflection signal of the speaker 16) corresponding to the microphone selection signal of the sound pickup apparatus of the other party are removed, and the resulting echo cancellation signal S26 is output to the D/A converter 281.
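The transmission characteristic function itself is not disclosed in detail, so as an illustrative stand-in only, each per-microphone "delay element plus filter coefficients" can be modeled as a bulk delay followed by an adaptive FIR filter, with the learning processing approximated by an NLMS update; all names and parameter values below are assumptions.

```python
import numpy as np

class EchoPath:
    """Per-microphone echo model: bulk delay plus FIR filter coefficients."""

    def __init__(self, delay_samples=64, taps=256):
        self.delay = delay_samples                      # delay element
        self.w = np.zeros(taps)                         # filter coefficients
        self.x_hist = np.zeros(taps + delay_samples)    # far-end signal history

    def push_far_end(self, x):
        """Shift one far-end (speaker) sample S1 into the history buffer."""
        self.x_hist = np.roll(self.x_hist, 1)
        self.x_hist[0] = x

    def echo_estimate(self):
        ref = self.x_hist[self.delay:self.delay + len(self.w)]
        return float(self.w @ ref), ref

    def cancel_and_learn(self, mic_sample, mu=0.5, learn=True):
        """Subtract the echo estimate from the selected microphone signal S25;
        if learning is enabled, update the coefficients by NLMS."""
        y, ref = self.echo_estimate()
        e = mic_sample - y                    # echo-cancelled output, cf. S26
        if learn:
            norm = ref @ ref + 1e-8
            self.w += mu * e * ref / norm     # NLMS coefficient update
        return e
```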
In the present embodiment, the echo cancellation processing is performed on the sound signal from one microphone selected from among a plurality of microphones, for example, the two microphones MCa and MCb in the example illustrated in
When one of the two microphones MCa and MCb is switched to the other, the switching signal is reported to the control processing portion in the EC 264 from the control portion 25MS in the first DSP 25, or from the overall control microprocessor 23 via the control portion 25MS. However, if the control processing portion in the EC 264 immediately operates the switches SW1 and SW2 so that the transmission characteristic processing portion 2611 or 2612 corresponding to the newly selected microphone is connected to the adder-subtracter portion 2614, and if the learning processing portion 2615 immediately switches to the delay element and the filter coefficients stored in the memory portion 263 for the new microphone, the echo cancellation processing goes wrong.
This is because there is a time lag between the signal S1 output from the A/D converter 274 and the echo, such as reflected sound, output from the speaker 16 and detected by the microphones MCa and MCb; if the target of the echo cancellation processing is switched immediately, the signal of the newly switched microphone MCa or MCb will be subjected to echo cancellation processing with the echo cancellation processing signal for the previously selected microphone MCa or MCb.
Then, in the second embodiment of the present invention, the switching of the echo cancellation processing will be performed by a method exemplified in
Hereinafter, the case of performing switching from the first microphone MCa to the second microphone MCb (selection change) will be exemplified.
At the time point t1, when the switching from the first microphone MCa to the second microphone MCb is detected, the detection signal is reported to the control processing portion in the EC 264 from the control portion 25MS of the first DSP 25 via the microprocessor 23 for overall control, or directly from the control portion 25MS in the first DSP 25. Hereinafter, the case of direct reporting from the control portion 25MS to the control processing portion in the EC 264 will be described.
At the time point t2, nearly the same as or slightly later than the time point t1, the control processing portion in the EC 264 orders the learning processing portion 2615 of the EC processing portion 261 to stop its operation. At the same time, the control processing portion in the EC 264 turns off the switches SW1 and SW2, disconnecting the transmission characteristic processing portions 2611 and 2612 from the adder-subtracter portion 2614. As a result, the echo cancellation enters the off state, that is, the echo cancellation processing is not performed in the adder-subtracter portion 2614.
At the time point t3, the control portion 25MS in the first DSP 25 causes the microphones MCa and MCb to cross-fade as described with reference to
The cross-fading time τcf is usually tens of milliseconds, for example, about 10 milliseconds to 80 milliseconds.
At the time point t5, the control processing portion in the EC 264, having been informed of the start of the cross-fading by the control portion 25MS at the time point t3 or t4, orders the learning processing portion 2615 to read out the delay element and the filter coefficients for the microphone MCb from the memory portion 263 and to set them in the transmission characteristic processing portion 2612 to be switched to. The learning processing portion 2615 takes the microphone MCb as the target of the new echo cancellation processing, reads out the delay element and the filter coefficients for the microphone MCb from the memory portion 263, and sets them in the corresponding transmission characteristic processing portion 2612.
At the time point t6, the control processing portion in the EC 264, having been informed of the end of the cross-fading by the control portion 25MS, operates the switch SW1 so that the output signal S1 of the A/D converter 274 is input to the transmission characteristic processing portion 2612 corresponding to the selected microphone MCb. Thereby, an echo cancellation component is calculated in the selected transmission characteristic processing portion 2612 using the delay element and the filter coefficients obtained beforehand and stored in the memory portion 263. However, since the switch SW2 is still off in this state, the output of the transmission characteristic processing portion 2612 is not applied to the adder-subtracter portion 2614.
The learning processing portion 2615 checks whether, assuming the output signal of the selected transmission characteristic processing portion 2612 were applied to the adder-subtracter portion 2614 and the echo cancellation processing were performed, a state would be reached in which the echo cancellation processing is performed well.
The learning processing portion 2615 performs the above check continuously. When it judges that the selected microphone MCb has reached a state in which the echo cancellation processing can be performed adequately, or to a certain degree, the learning processing portion 2615 begins the echo cancellation processing by applying the output signal of the transmission characteristic processing portion 2612 corresponding to the selected microphone MCb.
Alternatively, without the learning processing portion 2615 performing the above check, the time between the time points t6 and t7 may be defined as a preset echo time, and after the predetermined time has elapsed from the time point t6, the above echo cancellation processing may be restarted at the time point t7.
Thereafter, the echo cancellation component for the microphone MCb calculated in the transmission characteristic processing portion 2612 is subtracted in the adder-subtracter portion 2614.
The learning processing portion 2615 estimates the echo cancellation component such that the sound signal from the sound pickup apparatus of the other party is removed from the output of the adder-subtracter portion 2614, learns the delay element and the filter coefficients for this, stores them in the memory portion 263, and sets them in the transmission characteristic processing portion 2612.
Therefore, even if switching from the first microphone MCa to the second microphone MCb is performed, unnatural echo cancellation processing can be prevented from arising.
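As an illustrative sketch of the t1 to t7 control sequence (built around the EchoPath stand-in above; the class shape, method names, and stored-coefficient layout are assumptions, and only the ordering of the steps follows the text):

```python
class EcController:
    """Control logic mirroring the switching sequence: stop learning and open the
    switches at t2, load stored coefficients at t5, reconnect the far-end signal
    at t6, and resume cancellation and learning at t7."""

    def __init__(self, paths, memory):
        self.paths = paths         # dict: microphone id -> EchoPath
        self.memory = memory       # dict: microphone id -> (delay, coefficients)
        self.active = None         # path currently fed by S1 (switch SW1)
        self.cancelling = True     # False while switch SW2 is off

    def on_mic_switch(self):                   # t1/t2
        self.cancelling = False                # stop applying the cancellation
        self.active = None                     # SW1 and SW2 opened

    def on_crossfade_started(self, new_mic):   # t5
        delay, coeffs = self.memory[new_mic]
        path = self.paths[new_mic]
        path.delay, path.w = delay, coeffs.copy()

    def on_crossfade_finished(self, new_mic):  # t6
        self.active = new_mic                  # SW1 closed, SW2 still off

    def on_settled(self):                      # t7
        self.cancelling = True                 # SW2 closed: cancellation resumes

    def process(self, far_end_sample, mic_sample):
        if self.active is None:
            return mic_sample                  # cancellation bypassed entirely
        path = self.paths[self.active]
        path.push_far_end(far_end_sample)
        if not self.cancelling:
            path.echo_estimate()               # path runs, output not applied
            return mic_sample
        return path.cancel_and_learn(mic_sample)
```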
The echo cancellation processing in the EC processing portion 261 described above, for example the transmission characteristic function in the transmission characteristic processing portions 2611 and 2612 and the learning processing in the learning processing portion 2615, is only an example; other echo cancellation processing can be performed.
In the present embodiment, unnatural echo cancellation processing can be prevented by keeping the echo cancellation processing in the off state for a predetermined time with respect to an echo component having a time constant or delay element.
Although the above-mentioned embodiment describes the case of performing cross-fading, when cross-fading is not performed, the processing need only be carried out without considering the cross-fading period.
Although the above-mentioned processing in the second DSP (echo canceller) 26 has been described for the case of performing it with the EC 26 having the components exemplified in
The present embodiment is particularly effective in the case of performing an echo cancellation processing by using one EC 26 (EC processing portion 261) for sound signals of a plurality of microphones.
Further, although the above-mentioned embodiment describes the case where the delay element and the filter coefficients are set in the transmission characteristic processing portions 2611 and 2612 by using the learning processing portion 2615 to estimate the echo cancellation component continuously, a method that does not use the learning processing portion 2615 can also be used.
For example, when the sound pickup apparatus is installed, a transmission characteristic function is obtained for each microphone, a delay element and filter coefficients are obtained for each microphone, and these are stored in the memory portion 263 and used as fixed values. That is, when microphones are switched, at the above-mentioned timing, the control processing portion in the EC 264 sets them, for example, in the transmission characteristic processing portions 2611 and 2612. With such a method, the learning processing portion 2615 becomes unnecessary; since it is not necessary to sequentially perform learning processing in the learning processing portion 2615 and estimate the echo cancellation components, the processing of the second DSP (echo canceller) 26 is reduced.
The above-mentioned embodiments can be combined arbitrarily in the present invention.
Note that the present invention is not limited to the above embodiments and includes modifications within the scope of the claims.
Priority is claimed from application number 2004-037264, filed February 2004, Japan (JP), kind: national.