The invention relates to the field of voice translation, specifically to a bluetooth headset-based voice translation system and method.
With economic globalization, business and everyday exchanges between countries have become increasingly frequent, and international academic talks are often held. Because different countries use different languages, a translation device is generally needed for communication. To keep the translated content confidential and convenient to access, the translation device is often used with a pair of translation bluetooth headsets. Each of the two parties in the communication wears one headset. Each headset collects voice signals and sends them to a translation machine or cloud translation engine for translation, and the translated voice signals are finally sent to the other headset for playback. Although in most cases this method achieves effective communication between two parties speaking different languages, such translation bluetooth headsets have certain limitations. Usually, the two parties are face to face and the distance between them is relatively small. For example, suppose a pair of bluetooth headsets includes a headset A and a headset B, and the two persons in communication are a person A, who wears the headset A, and a person B, who wears the headset B. When the person A is speaking, because the two are close together, both the headset A and the headset B can collect the audio signal of the person A. Since the translation machine or cloud translation engine cannot identify whether an audio signal is from the person A or the person B, it will recognize and translate any audio signal from the headset A and the headset B; that is, it will recognize and translate both the audio signal of the person A collected by the headset A and the audio signal of the person A collected by the headset B.
Similarly, the same situation occurs when only the person B is speaking. This can easily confuse the recognition and translation performed by the translation machine or cloud translation engine, resulting in inaccurate recognition and translation and affecting normal communication. It also adds a large amount of unnecessary translation work, wasting translation traffic resources.
An object of the present invention is to provide a bluetooth headset-based voice translation system that overcomes the defects and shortcomings of the related art. The system can identify which of the two parties in communication is the source of the audio signals collected by the two bluetooth headsets and, for each headset, sends only the audio signals of the person wearing that headset to a translation machine or a cloud translation engine for voice recognition and translation. This resolves the confusion in voice recognition and translation, making communication more effective, and greatly reduces unnecessary voice recognition and translation.
In order to achieve the above object, the present invention provides a bluetooth headset-based voice translation system, including:
According to one embodiment of the present invention, when the signal cross-correlation value of the first audio signal and the second audio signal lies in the range (0.7, 1), the first audio signal and the second audio signal are signals from the same sound source.
According to one embodiment of the present invention, when the first audio signal and the second audio signal are both from the user who wears the first translation bluetooth headset, the first gain factor G1 is set to 1, and the second gain factor G2 is set to 0; when the first audio signal and the second audio signal are both from the user who wears the second translation bluetooth headset, the first gain factor G1 is set to 0, and the second gain factor G2 is set to 1.
According to one embodiment of the present invention, the audio signal processing center further includes a signal amplitude detection module configured to detect signal amplitudes of the first audio signal and the second audio signal; when the first audio signal and the second audio signal are from the same sound source, if the signal amplitude of the first audio signal is greater than the signal amplitude of the second audio signal, the first audio signal and the second audio signal are both from the user who wears the first translation bluetooth headset; otherwise, the first audio signal and the second audio signal are both from the user who wears the second translation bluetooth headset.
Another object of the present invention is to provide a bluetooth headset-based voice translation method including: firstly performing Fourier transformation on the first audio signal and the second audio signal which are respectively collected by the first translation bluetooth headset and the second translation bluetooth headset, to perform time-frequency signal processing; then, performing signal cross-correlation processing on the first audio signal and the second audio signal that have undergone the time-frequency signal processing, thereby obtaining a signal cross-correlation value of the first audio signal and the second audio signal, and judging whether the first audio signal and the second audio signal are from a same sound source according to the size of the signal cross-correlation value; when the first audio signal and the second audio signal are from the same sound source, determining a sound source position according to a time delay relationship between the first audio signal and the second audio signal; setting a gain factor G1 of the first audio signal and a gain factor G2 of the second audio signal according to sound source position information; finally, performing gain operation on the first audio signal and the second audio signal, and transmitting the first audio signal and the second audio signal that have undergone the gain operation to a translation machine or a cloud translation engine for recognition and translation, and sending translated audio signals to the first translation bluetooth headset or the second translation bluetooth headset accordingly.
According to one embodiment of the present invention, the judging whether the first audio signal and the second audio signal are from the same sound source, includes:
According to one embodiment of the present invention, the sound source position is determined according to the time delay relationship between the first audio signal and the second audio signal:
Compared with the related art, the present invention has the following beneficial effects. According to the bluetooth headset-based voice translation system and method provided in the present invention, the system can identify whether the first audio signal and the second audio signal are signals from the same sound source and, if so, determine which user that sound source is, thereby setting corresponding gain factors for the first audio signal and the second audio signal. That is, when the first audio signal and the second audio signal are both from the user who wears the first translation bluetooth headset, the first gain factor G1=1, and the second gain factor G2=0; when the first audio signal and the second audio signal are both from the user who wears the second translation bluetooth headset, the first gain factor G1=0, and the second gain factor G2=1. The first audio signal and the second audio signal that have undergone the gain operation are transmitted to the translation machine or the cloud translation engine for recognition and translation. In this way, the confusion in recognition and translation is resolved, making communication more effective; meanwhile, unnecessary voice recognition and translation are greatly reduced.
The technical solutions of the present invention will be described hereinafter in conjunction with embodiments of the present invention.
As shown in
The voice translation system of the present invention further includes an audio signal processing center, and also includes a storage module, a processor, a power module, etc. for realizing the functions of the translation system. The audio signal processing center is the core of signal processing, and mainly includes a Fourier-transform module, a signal cross-correlation processing module, a judgment module, and a gain module. The signal collected by the first translation bluetooth headset is a first audio signal, and the signal collected by the second translation bluetooth headset is a second audio signal. The first audio signal and the second audio signal are sent to the Fourier-transform module for time-frequency signal processing, and are then subjected to signal cross-correlation processing by the signal cross-correlation processing module, thereby obtaining a signal cross-correlation value. According to the size of the signal cross-correlation value, the judgment module judges whether the first audio signal and the second audio signal are from a same sound source. When it is judged that the first audio signal and the second audio signal are from the same sound source, a sound source position is determined according to a time delay relationship between the first audio signal and the second audio signal. When it is judged that the first audio signal and the second audio signal are not from the same sound source, the audio signal processing center does not process the two audio signals, and maintains only the original signal collection function. The gain module sets a gain factor G1 of the first audio signal and a gain factor G2 of the second audio signal according to the obtained sound source position information.
In the embodiment of the present invention, the voice translation system further includes a translation module. After the first audio signal and the second audio signal are processed by the gain module, the translation module recognizes and translates the first audio signal and the second audio signal, and sends the translated audio signals to the first translation bluetooth headset or the second translation bluetooth headset accordingly. That is, the first audio signal collected by the first translation bluetooth headset, after recognition and translation, is sent to the second translation bluetooth headset for playback; the second audio signal collected by the second translation bluetooth headset, after recognition and translation, is sent to the first translation bluetooth headset for playback. In this way, accurate voice translation is achieved, and translation errors and confusion are avoided.
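As a non-limiting illustration, the routing described above may be sketched in Python as follows. The function name `route_translation`, the labels "first" and "second", and the `translate` callable (a stand-in for the translation machine or cloud translation engine) are hypothetical and are not part of the described system.

```python
def route_translation(source_headset, audio, translate):
    """Translate audio collected by one headset and address it to the other.

    source_headset: "first" or "second" (hypothetical labels).
    translate: hypothetical callable standing in for the translation
               machine or cloud translation engine.
    Returns (destination_headset, translated_audio).
    """
    # Audio collected by the first headset is played back on the second,
    # and audio collected by the second is played back on the first.
    destinations = {"first": "second", "second": "first"}
    if source_headset not in destinations:
        raise ValueError("source_headset must be 'first' or 'second'")
    return destinations[source_headset], translate(audio)
```

Keeping the translation step behind a callable leaves open whether a local translation machine or a cloud engine performs the work.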
In one embodiment of the present invention, whether the first audio signal and the second audio signal are signals from the same sound source is judged according to the size of the signal cross-correlation value of the first audio signal and the second audio signal. When the signal cross-correlation value of the first audio signal and the second audio signal lies in the range (0.7, 1), it can be judged that the first audio signal and the second audio signal are signals from the same sound source; otherwise, the first audio signal and the second audio signal are not signals from the same sound source.
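By way of example only, the same-source judgment may be sketched in Python as follows. The text does not state how the cross-correlation value is normalized; this sketch assumes the peak of the cross-correlation over a small range of lags, divided by the square root of the product of the two signal energies, which keeps the value between 0 and 1. The names `cross_correlation_value`, `is_same_source` and `max_lag` are hypothetical.

```python
import math

SAME_SOURCE_THRESHOLD = 0.7  # lower bound of the (0.7, 1) range in the text


def cross_correlation_value(x1, x2, max_lag):
    """Peak normalized cross-correlation over lags in [-max_lag, max_lag].

    Assumption: normalization by sqrt(E1 * E2), so the result lies in [0, 1].
    """
    energy1 = sum(v * v for v in x1)
    energy2 = sum(v * v for v in x2)
    if energy1 == 0.0 or energy2 == 0.0:
        return 0.0  # a silent channel cannot be matched to anything
    norm = math.sqrt(energy1 * energy2)
    peak = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # R12(lag) = sum over n of x1[n] * x2[n - lag]
        acc = 0.0
        for n in range(len(x1)):
            m = n - lag
            if 0 <= m < len(x2):
                acc += x1[n] * x2[m]
        peak = max(peak, abs(acc) / norm)
    return peak


def is_same_source(x1, x2, max_lag=8):
    """Judge same-source when the value exceeds 0.7.

    Assumption: values up to and including 1 are accepted, since 1 is the
    largest value this normalization can produce.
    """
    return cross_correlation_value(x1, x2, max_lag) > SAME_SOURCE_THRESHOLD
```

Searching over several lags, rather than only lag zero, lets the value stay high when one headset receives the same speech slightly later than the other.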
In one embodiment of the present invention, when the first audio signal and the second audio signal are signals from the same sound source, a next step of signal processing is required. For example, when the first audio signal and the second audio signal are both from the user who wears the first translation bluetooth headset, the first gain factor G1 is set to 1, and the second gain factor G2 is set to 0; when the first audio signal and the second audio signal are both from the user who wears the second translation bluetooth headset, the first gain factor G1 is set to 0, and the second gain factor G2 is set to 1. The purpose of setting the gain factor in this way is to obtain a required target voice signal for accurate recognition and translation.
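As a minimal sketch of the gain operation described above, the following Python fragment sets G1 and G2 from the located speaker and applies them sample by sample. The function names and the "first"/"second" labels are hypothetical; only the gain values themselves come from the text.

```python
def gain_factors(speaker):
    """Return (G1, G2) for the located speaker.

    speaker: "first" if the wearer of the first headset is speaking,
             "second" if the wearer of the second headset is speaking.
    """
    if speaker == "first":
        return 1.0, 0.0  # keep the first audio signal, mute the second
    if speaker == "second":
        return 0.0, 1.0  # mute the first audio signal, keep the second
    raise ValueError("speaker must be 'first' or 'second'")


def apply_gain(x1, x2, speaker):
    """Scale both audio signals by their gain factors."""
    g1, g2 = gain_factors(speaker)
    return [g1 * v for v in x1], [g2 * v for v in x2]
```

With these values, only the signal collected by the speaker's own headset carries content to the recognition and translation stage.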
In one embodiment of the present invention, the audio signal processing center further includes a signal amplitude detection module for detecting signal amplitudes of the first audio signal and the second audio signal. When the first audio signal and the second audio signal are from the same sound source, if the signal amplitude of the first audio signal is greater than the signal amplitude of the second audio signal, the first audio signal and the second audio signal are both from the user who wears the first translation bluetooth headset; otherwise, the first audio signal and the second audio signal are both from the user who wears the second translation bluetooth headset.
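For illustration only, the amplitude comparison may be sketched in Python as follows. The text does not say how the signal amplitude is measured; this sketch assumes the root-mean-square (RMS) value of a frame. The function names are hypothetical.

```python
import math


def rms_amplitude(x):
    """Root-mean-square amplitude of a frame (an assumed amplitude measure)."""
    if not x:
        return 0.0
    return math.sqrt(sum(v * v for v in x) / len(x))


def locate_by_amplitude(x1, x2):
    """Decide which wearer is speaking, given same-source signals.

    Returns "first" when the first audio signal is louder; otherwise
    "second", mirroring the "otherwise" branch in the text.
    """
    return "first" if rms_amplitude(x1) > rms_amplitude(x2) else "second"
```

The comparison is only meaningful after the two signals have been judged to come from the same sound source.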
Another object of the present invention is to provide a bluetooth headset-based voice translation method. The method includes the following steps: firstly performing Fourier transformation on a first audio signal and a second audio signal which are respectively collected by a first translation bluetooth headset and a second translation bluetooth headset, to perform time-frequency signal processing; then, performing signal cross-correlation processing on the first audio signal and the second audio signal that have undergone the time-frequency signal processing, thereby obtaining a signal cross-correlation value of the first audio signal and the second audio signal, and judging whether the first audio signal and the second audio signal are from a same sound source according to the size of the signal cross-correlation value; when the first audio signal and the second audio signal are from the same sound source, determining a sound source position according to a time delay relationship between the first audio signal and the second audio signal; setting a gain factor G1 of the first audio signal and a gain factor G2 of the second audio signal according to sound source position information; finally, performing gain operation on the first audio signal and the second audio signal, and transmitting the first audio signal and the second audio signal to the translation machine or the cloud translation engine for recognition and translation, and sending translated audio signals to the first translation bluetooth headset or the second translation bluetooth headset accordingly.
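The method above applies a Fourier transformation before the cross-correlation processing but does not state how the two are combined. One common approach, shown here only as an assumption, uses the cross-correlation theorem: the inverse transform of X1 multiplied by the complex conjugate of X2 gives the circular cross-correlation of the two frames. The sketch below uses a direct DFT from the Python standard library for clarity; the function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform (O(N^2), for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def circular_cross_correlation(x1, x2):
    """r[k] = sum over n of x1[n] * x2[(n - k) mod N] for real frames,
    computed in the frequency domain as IDFT(X1 * conj(X2))."""
    if len(x1) != len(x2):
        raise ValueError("frames must have the same length")
    spec1, spec2 = dft(x1), dft(x2)
    cross_spectrum = [a * b.conjugate() for a, b in zip(spec1, spec2)]
    # The inputs are real, so the imaginary parts are rounding noise only.
    return [value.real for value in idft(cross_spectrum)]
```

Because the result is circular, a lag of -2 samples appears at index N - 2. Zero-padding both frames before the transform avoids wrap-around, and a fast Fourier transform would replace the direct DFT in practice.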
In one embodiment of the present invention, judging whether the first audio signal and the second audio signal are from the same sound source, includes:
In one embodiment of the present invention, when the first audio signal and the second audio signal are both from the same sound source, the sound source position can be determined according to the time delay relationship between the first audio signal and the second audio signal. Then the formula (4) is transformed into:
According to the characteristics of the correlation function, Rss(τ1−τ2) ≤ Rss(0), that is, the autocorrelation function Rss reaches its maximum at zero lag; thus, the time delay between the first audio signal and the second audio signal can be calculated as τ=τ1−τ2. When the time delay τ is positive, it indicates that the first audio signal and the second audio signal are both from the user who wears the second translation bluetooth headset. On the contrary, when the time delay τ is negative, it indicates that the first audio signal and the second audio signal are both from the user who wears the first translation bluetooth headset. In one embodiment of the present invention, the position of the signal source can also be determined based on amplitudes or strengths of collected signals, that is,
when E1(x1) > E2(x2), where E1(*) represents the energy of the signal x1 and E2(*) represents the energy of the signal x2, it indicates that the first audio signal and the second audio signal are both from the user who wears the first translation bluetooth headset; on the contrary, when E1(x1) < E2(x2), it indicates that the first audio signal and the second audio signal are both from the user who wears the second translation bluetooth headset.
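As a non-limiting sketch, the time-delay rule described above may be expressed in Python as follows. The sketch assumes that τ is estimated, in samples, as the lag at which the cross-correlation R12(k) = Σ x1[n]·x2[n−k] is largest, so that τ corresponds to τ1−τ2. The function names and the `max_lag` parameter are hypothetical, and the text does not state what to do when τ is zero, so that case is returned as undetermined.

```python
def estimate_delay(x1, x2, max_lag):
    """Lag k (in samples) maximizing R12(k) = sum over n of x1[n] * x2[n - k].

    A negative result means the first audio signal leads (arrives earlier);
    a positive result means the second audio signal leads.
    """
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n in range(len(x1)):
            m = n - lag
            if 0 <= m < len(x2):
                acc += x1[n] * x2[m]
        if acc > best_value:
            best_lag, best_value = lag, acc
    return best_lag


def locate_by_delay(x1, x2, max_lag=8):
    """Map the sign of the delay to the speaking wearer.

    tau > 0 -> wearer of the second headset; tau < 0 -> wearer of the first.
    Returns None when tau == 0 (not covered by the text).
    """
    tau = estimate_delay(x1, x2, max_lag)
    if tau > 0:
        return "second"
    if tau < 0:
        return "first"
    return None
```

When the delay is zero, the energy comparison described above is one possible way to break the tie, although the text does not prescribe an order between the two rules.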
In summary, according to the bluetooth headset-based voice translation system and method provided in the present invention, the system can identify whether the first audio signal and the second audio signal are signals from the same sound source and, if so, determine which user that sound source is, thereby setting corresponding gain factors for the first audio signal and the second audio signal. That is, when the first audio signal and the second audio signal are both from the user who wears the first translation bluetooth headset, the first gain factor G1=1, and the second gain factor G2=0; when the first audio signal and the second audio signal are both from the user who wears the second translation bluetooth headset, the first gain factor G1=0, and the second gain factor G2=1. The first audio signal and the second audio signal that have undergone the gain operation are transmitted to the translation machine or the cloud translation engine for recognition and translation. In this way, the confusion in recognition and translation is resolved, making communication more effective; meanwhile, unnecessary voice recognition and translation are greatly reduced.
The preferred embodiments of the present invention are described in detail above, but the present invention is not limited to the above embodiments. Various changes can be made within the knowledge of those of ordinary skill in the art without departing from the purpose of the present invention.
This application is a Bypass Continuation application of PCT International Application No. PCT/CN2023/115929 filed on Aug. 30, 2023, the entire contents of which are incorporated herein by reference.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/CN2023/115929 | Aug 2023 | WO |
Child | 18888120 | | US |