This application claims the priority benefit of Taiwanese application no. 109138512, filed on Nov. 5, 2020. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The disclosure relates to a voice conference; particularly, the disclosure relates to a conference terminal and a multi-device coordinating method for a conference.
Remote conferences allow people in different locations or spaces to hold conversations, and conference-related equipment, protocols, and/or applications are also well developed. Notably, in practice, several people may participate in a telephone or video conference from the same space, each using their own communication device. When these communication devices are in use at the same time, the microphones on the devices pick up sounds from the speakers of several other devices, forming many unstable feedback loops, causing obvious whistling sounds, and thereby disrupting the conference.
The embodiment of the disclosure provides a conference terminal and a multi-device coordinating method for a conference, so that a plurality of devices participate in a conference call in the same space at the same time without interference.
The multi-device coordinating method for a conference according to an embodiment of the disclosure is adapted for a plurality of conference terminals. Each conference terminal includes a sound receiver and a loudspeaker. The multi-device coordinating method includes (but is not limited to) the following steps. The conference terminals are allocated to a plurality of areas according to a location relationship. Each area includes one or more conference terminals that are close to each other in the location relationship. An input sound signal is obtained from picking up or recording a sound by the conference terminal in each area. The input sound signal of the one or more conference terminals in a first area among the areas is allocated to the one or more conference terminals in a second area among the areas to be played. The input sound signal obtained from picking up the sound by the conference terminal in each area is not played by any of the conference terminals in the same area.
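The allocation of conference terminals to areas may be illustrated by the following minimal sketch. It is not the procedure of the disclosure: it assumes that each terminal has known two-dimensional coordinates and that terminals within a chosen distance of one another share an area. The names `allocate_areas`, `positions`, and `max_distance` are hypothetical.

```python
import math


def allocate_areas(positions, max_distance):
    """Group terminals into areas by proximity.

    positions: dict mapping a terminal id to assumed (x, y) coordinates.
    max_distance: assumed threshold; terminals at most this far apart
    (directly or through a chain of neighbours) share an area.
    Returns a sorted list of areas, each a sorted list of terminal ids.
    """
    ids = sorted(positions)
    parent = {t: t for t in ids}  # union-find forest, one tree per area

    def find(t):
        # Walk up to the representative of t's area, shortening the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(positions[a], positions[b]) <= max_distance:
                parent[find(b)] = find(a)  # merge the two areas

    areas = {}
    for t in ids:
        areas.setdefault(find(t), []).append(t)
    return sorted(areas.values())


# Hypothetical room layout, in metres.
positions = {
    "10a": (0, 0), "10b": (1, 0),
    "10c": (50, 0),
    "10d": (100, 0), "10e": (101, 1),
}
areas = allocate_areas(positions, max_distance=3)
```

With this assumed layout, terminals 10a and 10b fall into one area, terminal 10c into a second, and terminals 10d and 10e into a third. A user-selected conference room number, where available, could replace the distance test.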
The conference terminal according to an embodiment of the disclosure includes (but is not limited to) a sound receiver, a loudspeaker, a communication transceiver, and a processor. The sound receiver is configured to pick up or record a sound in order to obtain an input sound signal. The loudspeaker is configured to play a sound. The communication transceiver is configured to send or receive data. The processor is coupled to the sound receiver, the loudspeaker, and the communication transceiver. The processor is configured to determine, according to a location relationship, that the conference terminal belongs to a first area of a plurality of areas, send the input sound signal through the communication transceiver, and play the input sound signal of the conference terminal in a second area of the areas through the loudspeaker. The first area is different from the second area. The input sound signal obtained from picking up the sound by one or more conference terminals in each area is not played by the loudspeaker of any conference terminal in the same area.
Based on the foregoing, in the conference terminal and the multi-device coordinating method for a conference according to the embodiment of the disclosure, the area to which the conference terminal belongs is determined based on the location of the conference terminal, and the sound signal from one area is allocated as the sound signal to be played in other areas. This prevents cross-interference between sounds or whistling.
To make the aforementioned more comprehensible, several embodiments accompanied with drawings are described in detail as follows.
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the disclosure and, together with the description, serve to explain the principles of the disclosure.
The conference terminals 10a-10e may each be a wired phone, a mobile phone, a tablet computer, a desktop computer, a notebook computer, or a smart speaker. The conference terminals 10a-10e each include (but are not limited to) a sound receiver 11, a loudspeaker 13, a communication transceiver 15, memory 17, and a processor 19.
The sound receiver 11 may be a microphone in any form, such as dynamic, condenser, electret condenser, or the like. The sound receiver 11 may also be a combination of an electronic element, an analog-to-digital converter, a filter, and an audio processor that receives a sound wave (e.g., a human voice, an environmental sound, a machine operation sound, etc.) and converts the sound wave into a sound signal. In an embodiment, the sound receiver 11 is configured to pick up or record a sound of a speaking person to obtain an input sound signal. The input sound signal may include a voice of the speaking person, a sound of the loudspeaker 13, and/or other environmental sounds.
The loudspeaker 13 may be a speaker or an amplifier. In an embodiment, the loudspeaker 13 is configured to play a sound.
The communication transceiver 15 is, for example, a transceiver (including but not limited to a connection interface, a signal converter, a communication protocol processing chip, and other elements) that supports Ethernet, optical fiber networks, cables, or other wired networks. It may also be a transceiver (including but not limited to an antenna, a digital-to-analog/analog-to-digital converter, a communication protocol processing chip, and other elements) that supports Wi-Fi, fourth generation (4G), fifth generation (5G), later generation mobile networks, or other wireless networks. In an embodiment, the communication transceiver 15 is configured to send or receive data.
The memory 17 may include a fixed or removable element in any form, such as a random access memory (RAM) device, a read only memory (ROM) device, a flash memory device, a traditional hard disk drive (HDD), a solid-state drive (SSD), or the like. In an embodiment, the memory 17 is configured to record codes, software modules, configurations, data (e.g., a sound signal, an area list, or the like), or files.
The processor 19 is coupled to the sound receiver 11, the loudspeaker 13, the communication transceiver 15, and the memory 17. The processor 19 may include a central processing unit (CPU), a graphic processing unit (GPU), or any other programmable general-purpose or special-purpose microprocessor, digital signal processor (DSP), programmable controller, field programmable gate array (FPGA), application-specific integrated circuit (ASIC), other similar elements, or a combination of the above elements. In an embodiment, the processor 19 is configured to execute all or some operations of the conference terminals 10a-10e to which the processor 19 belongs, and may load and execute the software modules, files, and data recorded in the memory 17.
The local signal management devices 30 are connected to the conference terminals 10a-10e. The local signal management devices 30 may each be a computer system, a server, or a signal processing device. In an embodiment, the conference terminals 10a-10e may serve as the local signal management devices 30. In another embodiment, the local signal management devices 30 may each be an independent relay device different from the conference terminals 10a-10e. In some embodiments, the local signal management devices 30 each include (but are not limited to) the same or similar communication transceiver 15, memory 17, and processor 19, and the implementation and function of the elements will not be repeatedly described.
The allocation server 50 is connected to the local signal management devices 30. The allocation server 50 may be a computer system, a server, or a signal processing device. In an embodiment, the conference terminals 10a-10e or the local signal management devices 30 may serve as the allocation server 50. In another embodiment, the allocation server 50 may be an independent cloud server different from the conference terminals 10a-10e or the local signal management devices 30. In some embodiments, the allocation server 50 includes (but is not limited to) the same or similar communication transceiver 15, memory 17, and processor 19, and the implementation and function of the elements will not be repeatedly described.
Hereinafter, a method according to an embodiment of the disclosure will be explained accompanied with the various devices, elements, and modules in the conference system 1. Depending on the implementation condition, each procedure of the method may be accordingly adjusted, and is not limited thereto.
In addition, it should be noted that, for the convenience of description, the same element may realize the same or similar operation, and will not be repeatedly described. For example, since the conference terminals 10a-10e may serve as the local signal management devices 30 or the allocation server 50, and the local signal management devices 30 may also serve as the allocation server 50, in some embodiments the processors 19 of the conference terminals 10a-10e, the local signal management devices 30, and the allocation server 50 may each realize the same or similar method according to the embodiment of the disclosure.
In an embodiment, the conference terminals 10a-10e may determine on their own the area to which they belong. For example, a user interface provides area options related to conference room numbers for the speaking person to select from. In another embodiment, each local signal management device 30 serves as a representative of one area, and determines whether the adjacent conference terminals 10a-10e belong to the same area according to a relative distance between the conference terminals 10a-10e. For example, in
The processor 19 of each of the conference terminals 10a-10e may pick up the sound through the sound receiver 11 to obtain the respective input sound signal. For example, with a conference established through video software, voice call software, or a phone call, the speaking person may then start talking. The processor 19 may send the input sound signal through the communication transceiver 15 to the local signal management device 30 in the same area via the network. Namely, in each area, the local signal management device 30 obtains the input sound signals from picking up the sounds by the conference terminals 10a-10e in the area (S230).
In an embodiment, one of the conference terminals 10a-10e serves as the local signal management device 30 (as a master). The master may provide an application that integrates the input sound signals of the sound receivers 11 and the output sound signals of the loudspeakers 13 from all of the conference terminals 10a-10e in the same area. One conference terminal (taking the conference terminal 10a as an example) in this area is selected as the master, and the other conference terminals (taking the conference terminal 10b as an example) serve as slaves. Through virtual audio cable (VAC) technology (i.e., forwarding audio streams), the application extracts the signal of each conference terminal (taking the conference terminals 10a and 10b as an example), and then sends the signal to the master.
In an embodiment, the processor 19 of the local signal management device 30 or the conference terminals 10a-10e serving as the master may separate, from the input sound signal, an individual sound signal recorded from the speaking person corresponding to each of the conference terminals 10a-10e in the same area. Specifically, it may be inevitable that not only the voices of the speaking persons (assumed to be located directly in front of the conference terminals 10a-10e) using the conference terminals 10a-10e are picked up or recorded by the sound receiver 11, but other interference such as the sound of each loudspeaker 13 in the same area, the environmental noise on site, etc. is also picked up by the same sound receiver 11. For example, the sound receiver 11 of the conference terminal 10a in
Notably, in the echo cancellation technology, the relative distance between the sound source and the sound receiver 11 (which is related to the delay) needs to be taken into consideration. Since the conference terminals 10a and 10b or the speaking person may move, the corresponding delay needs to be adjusted dynamically.
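One possible way to track a changing delay is to re-estimate it for every frame of audio. The following is a minimal sketch rather than the echo cancellation of the disclosure: it assumes that the signal played by a loudspeaker is available as a reference, selects the lag with the highest cross-correlation against the recorded signal, and subtracts a least-squares-scaled copy of the delayed reference. The names `estimate_delay`, `cancel_echo`, and `max_lag` are hypothetical.

```python
def estimate_delay(reference, recorded, max_lag):
    """Return the lag (in samples) at which `recorded` best matches `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        overlap = min(len(reference), len(recorded) - lag)
        # Cross-correlation of the reference with the recording shifted by `lag`.
        score = sum(reference[n] * recorded[n + lag] for n in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def cancel_echo(reference, recorded, max_lag):
    """Subtract the delayed, scaled reference from the recording.

    Returns (estimated lag, residual signal). A single gain is assumed here;
    a practical canceller would typically use an adaptive multi-tap filter.
    """
    lag = estimate_delay(reference, recorded, max_lag)
    # Delay the reference by `lag` samples and match the recording length.
    delayed = ([0.0] * lag + list(reference) + [0.0] * len(recorded))[:len(recorded)]
    energy = sum(d * d for d in delayed)
    gain = sum(r * d for r, d in zip(recorded, delayed)) / energy if energy else 0.0
    return lag, [r - gain * d for r, d in zip(recorded, delayed)]


# Hypothetical frame: the loudspeaker signal reaches the microphone
# three samples later at half amplitude.
played = [1.0, -2.0, 3.0, -1.0, 0.0, 0.0, 0.0, 0.0]
picked_up = [0.0, 0.0, 0.0, 0.5, -1.0, 1.5, -0.5, 0.0]
lag, residual = cancel_echo(played, picked_up, max_lag=4)
```

Calling `cancel_echo` once per frame lets the estimated lag follow a terminal or speaking person that moves between frames.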
In an embodiment, the processor 19 of the local signal management device 30 or the conference terminal 10a serving as the master may separate the individual sound signal recorded from the speaking person corresponding to another conference terminal in the same area, with the input sound signal of one of the conference terminals in the area serving as reference noise (S350). For example, with the input sound signal B of the conference terminal 10b (possibly after the echo cancellation in S330) serving as the noise, the processor 19 may cancel the noise from the input sound signal A (possibly after the echo cancellation in S310) with noise suppression (noise reduction or sound source separation) technology (e.g., generating a signal with a phase opposite to that of the noise sound wave, utilizing independent component analysis (ICA), or the like) (S351), to accordingly output an individual sound signal A′ recorded from the speaking person speaking toward the conference terminal 10a. Similarly, with the input sound signal A of the conference terminal 10a (possibly after the echo cancellation in S310) serving as the noise, the processor 19 may cancel the noise from the input sound signal B (possibly after the echo cancellation in S330) with noise suppression technology (S353), to accordingly output an individual sound signal B′ recorded from the speaking person speaking toward the conference terminal 10b.
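The symmetric use of each input sound signal as reference noise for the other may be illustrated as follows. This is a deliberately simple stand-in for the noise suppression or ICA mentioned above, not the technique of the disclosure: it assumes time-aligned signals of equal length and removes only the component of one signal that is proportional to the other (a single least-squares gain). The names `subtract_reference` and `separate_pair` are hypothetical.

```python
def subtract_reference(target, reference):
    """Remove from `target` the component proportional to `reference`."""
    energy = sum(r * r for r in reference)
    # Least-squares gain of the reference noise inside the target signal.
    gain = sum(t * r for t, r in zip(target, reference)) / energy if energy else 0.0
    return [t - gain * r for t, r in zip(target, reference)]


def separate_pair(signal_a, signal_b):
    """Return (A', B'): each signal with the other treated as reference noise."""
    a_individual = subtract_reference(signal_a, signal_b)  # corresponds to S351
    b_individual = subtract_reference(signal_b, signal_a)  # corresponds to S353
    return a_individual, b_individual


# Hypothetical samples: terminal 10a hears its own speaking person plus
# half of the sound near terminal 10b.
voice_a = [1.0, -1.0, 1.0, -1.0]
voice_b = [1.0, 1.0, -1.0, -1.0]
signal_a = [va + 0.5 * vb for va, vb in zip(voice_a, voice_b)]
signal_b = list(voice_b)
a_individual, b_individual = separate_pair(signal_a, signal_b)
```

In this example the reference B contains none of the voice near terminal 10a, so A′ recovers that voice. When the reference also contains the target voice, as for B′ here, a single gain removes part of the target as well, which is one reason adaptive filtering or ICA would usually be preferred.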
Notably, by analogy, input sound signals C, D, and E may be processed to accordingly separate individual sound signals C′, D′, and E′ of speaking persons, which will not be repeatedly described here. In this way, cross-interference or whistling from other areas can be avoided.
The local signal management device 30 in each area may send the input sound signal (it is possible that only one conference terminal is present in the same area, and only the echo cancellation is required) or the individual sound signals A′-E′ that are processed (it is possible that the plurality of conference terminals 10a-10e are present in the same area) to the allocation server 50 via the network. The processor 19 of the allocation server 50 may allocate the input sound signals of the conference terminals 10a-10e in one of the areas to the conference terminals 10a-10e in another of the areas to be played (S250). Specifically, in order to prevent cross-interference of the sounds or whistling in the same area, the input sound signals A-E obtained from picking up the sounds by the conference terminals 10a-10e in each area or the individual sound signals A′-E′ are not played by the loudspeaker 13 of any of the conference terminals 10a-10e in the same area.
Taking Table (1) as an example, assume that the conference terminals 10a and 10b are in a first area, the conference terminal 10c is in a second area, and the conference terminals 10d and 10e are in a third area.
Herein, Tx represents the sound signal that is sent, and is accordingly sent to the allocation server 50 or other communication software for integration. Besides, Rx represents the sound signal that is received, and is accordingly sent to the conference terminals 10a-10e and/or the local signal management devices 30. For example, the local signal management device 30 of the first area sends the individual sound signals A′ and B′ to the allocation server 50, but only receives the individual sound signals C′, D′, and E′. The rest may be understood by analogy, and will not be repeatedly described herein. Namely, the individual sound signals A′-E′ of each speaking person are allocated to other conference terminals 10a-10e in different areas to be played (i.e., as the output sound signals A″-E″ of each of the loudspeakers 13).
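The allocation rule described above, under which each terminal receives only the signals that originate in other areas, may be sketched as follows. The area assignment is the one assumed for Table (1); the function name `allocate_signals` and the dictionary layout are hypothetical and do not reflect any particular implementation of the allocation server 50.

```python
def allocate_signals(areas):
    """Map each terminal to the source terminals whose signals it may play.

    areas: dict mapping an area name to the list of terminal ids in it.
    A terminal never receives a signal picked up in its own area.
    """
    routing = {}
    for area, terminals in areas.items():
        # Collect every terminal that sits in a different area.
        remote = [
            source
            for other_area, members in areas.items()
            if other_area != area
            for source in members
        ]
        for terminal in terminals:
            routing[terminal] = list(remote)
    return routing


areas = {
    "first": ["10a", "10b"],
    "second": ["10c"],
    "third": ["10d", "10e"],
}
routing = allocate_signals(areas)
```

Here `routing["10a"]` lists terminals 10c, 10d, and 10e, matching the first area receiving only the individual sound signals C′, D′, and E′.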
Through the communication transceiver 15, the processors 19 of the conference terminals 10a-10e may receive the allocated input sound signal or individual sound signal, either forwarded by the local signal management devices 30 or received directly. In an embodiment, the processors 19 of the conference terminals 10a-10e may synthesize the individual sound signals or the input sound signals of all or some of the conference terminals 10a-10e in other areas, to be played by the conference terminals 10a-10e in one of the areas (different from any of the above-mentioned other areas). For example, the conference terminal 10a may select any one or more of the individual sound signals C′, D′, and E′ for synthesis, and play a synthesized sound signal (i.e., the output sound signal A″ including the selected individual sound signals) through the loudspeaker 13.
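The synthesis of selected signals may be illustrated by a simple sample-wise sum. The sketch below is only one possible approach: the disclosure does not specify how signals are combined, so both the summation and the clipping to a full-scale `limit` are assumptions, and the name `synthesize` is hypothetical.

```python
def synthesize(signals, selected, limit=1.0):
    """Mix the selected signals into one output signal.

    signals: dict mapping a signal name to a list of samples.
    selected: names of the signals to mix.
    limit: assumed full-scale amplitude; sums beyond it are clipped.
    """
    chosen = [signals[name] for name in selected]
    length = max((len(s) for s in chosen), default=0)
    mixed = []
    for n in range(length):
        # Shorter signals simply contribute nothing past their end.
        total = sum(s[n] for s in chosen if n < len(s))
        mixed.append(max(-limit, min(limit, total)))
    return mixed


# Hypothetical samples of the individual sound signals from other areas.
signals = {
    "C'": [0.25, 0.5],
    "D'": [0.25, 0.75],
    "E'": [-0.5, 0.0],
}
output_a = synthesize(signals, ["C'", "D'", "E'"])
```

Passing a shorter `selected` list corresponds to the terminal choosing only some of the individual sound signals for playback.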
In some embodiments, each of the conference terminals 10a-10e is only allocated with one of the individual sound signals A′-E′ from other areas.
In summary of the foregoing, in the conference terminal and the multi-device coordinating method for a conference according to the embodiment of the disclosure, the conference terminals are allocated to appropriate areas, the signals are distributed by areas (e.g., sending the input sound signal in the same area and receiving only the input sound signal from other areas), sound source separation is performed on the input sound signals that are obtained from picking up the sounds, and the sounds from the conference terminals are synthesized before being played. In this way, during the conference of multiple devices in multiple spaces at the same time, cross-interference of the sounds or whistling in the same area or from different areas can be prevented.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed embodiments without departing from the scope or spirit of the disclosure. In view of the foregoing, it is intended that the disclosure covers modifications and variations provided that they fall within the scope of the following claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
109138512 | Nov 2020 | TW | national |