This application is a U.S. non-provisional application claiming the benefit of French Application No. 22 05151, filed on May 30, 2022, which is incorporated herein by reference in its entirety.
The present invention relates to an electronic processing device for an acoustic apparatus.
The invention also relates to an acoustic apparatus comprising a first microphone comprising an electroacoustic transducer adapted to receive acoustic sound waves from a sound signal coming from the vocal cords of a user and to convert said acoustic waves into a first analog signal; a second microphone including a bone-mechanically excited transducer adapted to receive vibratory oscillations of said sound signal by bone conduction and to transform said vibratory oscillations into a second analog signal; such an electronic processing device being connected to the first and the second microphones, the processing device being configured for receiving the first and the second analog signals as input and for delivering a corrected signal as output.
The electronic processing device comprises a hybridization module configured for calculating a hybrid signal from the first and the second analog signals.
The invention also relates to a processing method implemented by such an electronic processing device; and to a non-transitory computer-readable medium including a computer program including software instructions which, when executed by a computer, implement such a processing method.
An acoustic apparatus of the above-mentioned type is known from the document FR 3 019 422 B1. The acoustic apparatus comprises the first microphone with such an electroacoustic transducer, also called an air conduction transducer; the second microphone with such a bone-mechanically excited transducer, also called a structure-borne noise transducer; means for calculating a corrected electrical signal according to the first electrical signal and the second electrical signal, the corrected electrical signal being adapted to be delivered at the output of the acoustic apparatus; and a noise reduction apparatus connected to the output of the electroacoustic transducer for reducing the noise in the first electrical signal; the calculation means being connected to the output of the noise reduction apparatus and to the output of the bone-mechanically excited transducer.
However, with such an acoustic apparatus, noise reduction is not always optimal, and relatively high background noise sometimes remains in the signal delivered at the output of the acoustic apparatus.
The aim of the invention is then to propose an electronic processing device, and an associated processing method, which can be used for further improving the reduction of noise in the signal delivered at the output of the acoustic apparatus, i.e. to reduce the presence of noise in said signal.
To this end, the subject matter of the invention is an electronic processing device for an acoustic apparatus,
With the electronic processing device according to the invention, the fact of estimating the noise in the hybrid signal calculated from the first and the second analog signals, i.e. in the hybrid signal obtained from the signals coming from the electroacoustic, or air conduction, transducer and from the bone-mechanically excited transducer, also called bone conduction transducer, or structure-borne noise transducer, can be used for a more accurate estimation of the noise, and then for obtaining, via the noise reduction module, a corrected signal in which the noise is further reduced.
Preferentially, the hybrid signal includes a plurality of successive segments, each segment corresponding to the hybrid signal during a period of time, and the processing device further includes a voice activity detection module adapted to determine whether or not each segment of the hybrid signal includes the presence of a voice, the estimation module being then configured for estimating the noise in the hybrid signal only from each segment without any voice.
Preferentially, the presence or absence of voice is determined from the second signal coming from the bone conduction transducer, since the presence or absence of voice is more easily detectable in a signal coming from a bone conduction microphone than in a signal coming from an air conduction microphone.
According to other advantageous aspects of the invention, the electronic processing device comprises one or a plurality of the following features, taken individually or according to all technically possible combinations:
The invention further relates to an acoustic apparatus comprising:
According to another advantageous aspect of the invention, the acoustic apparatus further comprises two lateral acoustic modules resting on the lateral flanks of the skull and suitable for transmitting a sound signal to the auditory nerve.
The invention further relates to head fitted equipment for an operator, comprising a protective helmet, and an acoustic apparatus as defined herein.
A further subject matter of the invention is a processing method, the method being implemented by an electronic processing device connected to first and second microphones, the first microphone including an electroacoustic transducer adapted to receive acoustic sound waves from a sound signal from the vocal cords of a user and to convert said acoustic waves into a first analog signal; and the second microphone including a bone-mechanically excited transducer adapted to receive vibratory oscillations of said sound signal by bone conduction and to transform said vibratory oscillations into a second analog signal, the electronic processing device being configured for receiving as input, the first and the second analog signals and for delivering a corrected signal as output, the processing method comprising:
The invention further relates to a non-transitory computer-readable medium including a computer program including software instructions which, when executed by a computer, implement the processing method as defined hereinabove.
Such features and advantages of the invention will become clearer upon reading the following description, given only as a non-limiting example, and made with reference to the enclosed drawings, wherein:
The expression “substantially equal to” defines a relation of equality within plus or minus 20%, preferentially within plus or minus 10%, more preferentially still within plus or minus 5%.
In
The acoustic apparatus 10 comprises a protective housing 18 and a processing device 20 arranged inside the protective housing 18, the processing device 20 being connected to the first microphone 12 and to the second microphone 14, and configured for receiving as input the first and the second analog signals and for delivering as output a corrected signal in which noise has been reduced.
In addition, the acoustic apparatus 10 further comprises two lateral acoustic modules 22, an upper arch 24, a rear arch 26 for connecting the acoustic modules and a connection cable 27, the connection cable 27 being equipped with a connector (not shown) at the end thereof. The lateral acoustic modules 22, the upper arch 24, the rear arch 26 and the connection cable 27 are known per se, e.g. from the document FR 3 019 422 B1.
The first microphone 12 is known, e.g. from the document FR 3 019 422 B1, and includes an electroacoustic transducer (not shown) adapted to receive acoustic sound waves from a sound signal coming from the vocal cords and to convert said acoustic waves into the first electrical signal. The first microphone 12 is connected to the input of the processing device 20.
The second microphone 14 is also known, e.g. from the document FR 3 019 422 B1, and includes a bone-mechanically excited transducer adapted to receive, through bone conduction, in particular through a corresponding bone of the skull, the vibratory waves of the sound signal coming from the vocal cords of the user and to convert same into the second electrical signal. The bone-mechanically excited transducer is also called a bone conduction transducer, or a structure-borne noise transducer. The second microphone 14 is also connected to the input of the processing device 20.
In the example shown in
In a variant, as illustrated in the example shown in FIG. 13 of document FR 3 019 422 B1, the second microphone 14 is not arranged in the protective housing 18, but is arranged in another additional housing, the other additional housing being connected by two connecting arms to one of the two acoustic modules 22. The bone-mechanically excited transducer of the second microphone is then arranged in the other additional housing. The other additional housing is preferentially intended for being applied in contact with the right-hand side of the user's skull and is then preferentially connected to the right-hand acoustic module 22.
In a further variant, as illustrated in the example shown in FIG. 1 of document FR 3 019 422 B1, the first microphone 12 includes a protuberance, e.g. integral with the protective housing 18. According to such variant, the second microphone 14, in particular its bone-mechanically excited transducer, is arranged inside the protective housing 18.
The electronic processing device 20 comprises a hybridization module 30 connected to the first microphone 12 and to the second microphone 14; an estimation module 32 connected to the hybridization module 30; and a noise reduction module 34 connected to the hybridization module 30 and to the estimation module 32, as shown in
As an optional addition, the electronic processing device 20 further comprises a voice activity detection module 36 connected to the hybridization module 30.
In the example shown in
In the example shown in
In a variant (not shown), the hybridization module 30, the estimation module 32, the noise reduction module 34 and, as an optional addition, the voice activity detection module 36 are each produced in the form of a programmable logic component, such as an FPGA (Field Programmable Gate Array), or further of integrated circuit, such as an ASIC (Application Specific Integrated Circuit).
When the electronic processing device 20 is produced in the form of one or a plurality of software programs, i.e. in the form of a computer program, same is further adapted for being recorded on a computer-readable medium (not shown). The computer-readable medium is e.g. a medium adapted to store the electronic instructions and to be coupled to a bus of a computer system. As an example, the readable medium is an optical disk, a magneto-optical disk, a ROM memory, a RAM memory, any type of non-volatile memory (e.g. EPROM, EEPROM, FLASH, NVRAM), a magnetic card or an optical card. A computer program containing software instructions is then stored on the readable medium.
The hybridization module 30 is configured for calculating the hybrid signal from the first and the second analog signals.
The hybridization module 30 is configured, e.g., for obtaining a first filtered signal by applying, to the first signal, a first filter associated with a first frequency range; and for obtaining a second filtered signal by applying, to the second signal, a second filter associated with a second frequency range; the hybrid signal then being calculated by summing the first filtered signal and the second filtered signal, the second frequency range being distinct from the first frequency range.
The first frequency range typically includes frequencies higher than the frequencies of the second frequency range; the first and the second frequency ranges being e.g. disjoint.
The first filter is typically a high-pass filter with a cut-off frequency fc substantially equal to 1000 Hz, the high-pass filter being e.g. a Gaussian high-pass filter. The second filter is typically a low-pass filter with a cut-off frequency also substantially equal to 1000 Hz, the low-pass filter being e.g. a Gaussian low-pass filter. In other words, the first frequency range is then the range of frequencies greater than 1000 Hz, and the second frequency range is the range of frequencies less than 1000 Hz.
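By way of illustration, the filter pair and the hybrid summation described above can be sketched as follows. The classic Gaussian form H_lp(f) = exp(−f²/(2·fc²)) is an assumption; the text only states that both filters are Gaussian with a cut-off around 1000 Hz.

```python
import numpy as np

def gaussian_filter_pair(f, fc=1000.0):
    # Gaussian low-pass / high-pass pair with cut-off fc (Hz); the
    # exact Gaussian form is an assumption, not given in the text.
    lowpass = np.exp(-f ** 2 / (2.0 * fc ** 2))
    highpass = 1.0 - lowpass
    return lowpass, highpass

# frequency axis for N = 512 samples at fe = 22050 Hz (values from the text)
fe, N = 22050.0, 512
f = np.arange(N // 2 + 1) * fe / N
lp, hp = gaussian_filter_pair(f)

# the hybrid spectrum then sums the high-pass-filtered air signal and
# the low-pass-filtered bone signal, weighted by alpha and beta:
#   X_hyb[m] = alpha * hp[m] * X_aer[m] + beta * lp[m] * X_ost[m]
```

The two responses sum to one at every frequency, so a signal picked up identically by both microphones would pass through unchanged when α = β = 1.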
In addition, the hybridization module 30 is configured for converting the first analog signal into a first digital signal as and when the first analog signal is received, and for generating successive first segments from the first digital signal.
According to such addition, the hybridization module 30 is also configured for converting the second analog signal into a second digital signal, as and when the second analog signal is received, and for generating successive second segments from the second digital signal.
According to such optional addition, the hybridization module 30 is then configured for progressively calculating hybrid segments of the hybrid signal, from the first and the second segments generated; the corrected signal then being calculated from said hybrid segments.
In the example shown in
In the example shown in
In the example shown in
In the example shown in
By convention, in the present description, for a signal denoted by x, the continuous form in time thereof is denoted by x(t), and the discretized form thereof is denoted by x[n] where n is a natural integer, n then forming a variable representing the discretized time. In the frequency domain, m represents the discrete frequency variable, between 0 and N/2, where N represents the number of samples per segment, e.g. equal to 512.
The discretized form of each signal then satisfies the following equation:
x[n]=x(n×Te) [1]
The discrete frequency variable m is typically associated with a frequency vector f[m] satisfying the following equation:
The frequency then typically varies between 0 Hz and fe/2 Hz, with a frequency step equal to fe/N.
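As a numerical illustration, the frequency vector consistent with the stated step fe/N and range [0; fe/2] is f[m] = m·fe/N (the values of fe and N below are those used later in the description):

```python
import numpy as np

fe = 22050.0   # sampling frequency in Hz (value used later in the text)
N = 512        # number of samples per segment

m = np.arange(N // 2 + 1)   # discrete frequency variable, 0 .. N/2
f = m * fe / N              # frequency vector f[m]

step = fe / N               # frequency step, about 43 Hz here
```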
By convention, the kth segment of the signal x is denoted by x_k or x_k[n], and by X̃_k[m] in the frequency domain. Spectral subtraction further requires working only on the amplitude spectrum of the signal, the phase being conserved and unchanged throughout the process, with |X̃_k[m]| representing the amplitude spectrum and φ(X̃_k[m]) representing the phase spectrum of x_k[n], respectively. By convention, the spectrum (without any other precision) will refer hereinafter to the amplitude spectrum.
In the example shown in
The hybridization module 30 is then configured e.g. for calculating the hybrid signal X̃_k^hyb by summing the first filtered signal X̃_k^aer, weighted by a constant α, and the second filtered signal X̃_k^ost, weighted by a constant β.
The values of the constants α and β are preferentially adjustable, making it possible to have an output signal at a level equivalent to the input signal of the first air conduction microphone 12. Furthermore, in this way it is possible to give preponderance to the air conduction signal, or to the bone conduction signal, respectively.
As an optional addition, the hybridization module 30 is configured, during the generation of the first successive segments, for generating each new first segment with samples of a preceding first segment and new samples of the first digital signal.
According to such optional addition, the hybridization module 30 is configured in a similar manner, during the generation of the successive second segments, for generating each new second segment with samples from a preceding second segment and new samples from the second digital signal.
There is then an overlap between the first successive segments thus generated, i.e. from a first segment generated to the next; and similarly between the second successive segments thus generated, i.e. from a second segment generated to the next.
An overlap ratio then corresponds to the ratio, within each new first segment, between the number of samples taken from the preceding first segment and the total number of samples of the new first segment; or, respectively, to the same ratio within each new second segment. The overlap ratio is e.g. between 50% and 75%, i.e. between 0.5 and 0.75. In other words, within each new first segment, between half and three-quarters of the last samples of the preceding first segment are used; and similarly within each new second segment, between half and three-quarters of the last samples of the preceding second segment are used. The overlap between segments is illustrated in
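The overlapping segmentation described above can be sketched as follows (a minimal illustration; segment length and overlap ratio are the values discussed in the text):

```python
import numpy as np

def segment_signal(x, n=512, overlap=0.5):
    # Split x into segments of n samples; `overlap` is the ratio of
    # samples shared with the preceding segment (0.5 to 0.75 in the text).
    hop = int(round(n * (1.0 - overlap)))   # number of new samples per segment
    return np.array([x[i:i + n] for i in range(0, len(x) - n + 1, hop)])

x = np.arange(2048.0)
segs = segment_signal(x, n=512, overlap=0.5)
# with a 50% overlap, each new segment reuses the last 256 samples
# of the preceding segment
```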
In
In the example shown in
In
In the case of a 50% overlap, the output segment ykout typically satisfies the following equation:
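One common 50%-overlap reconstruction, given here only as an illustrative sketch, sums the second half of the previous windowed segment with the first half of the current one; the sin² window is an assumption, chosen because adjacent windows then sum to one:

```python
import numpy as np

def output_segment_50(prev_seg, curr_seg):
    # One plausible form of the output segment y_k^out at 50% overlap:
    # second half of the previous processed segment plus first half of
    # the current one, each weighted by a sin^2 window (assumed here;
    # the window choice is not reproduced from the text).
    n = len(curr_seg)
    w = np.sin(np.pi * (np.arange(n) + 0.5) / n) ** 2   # w[i] + w[i + n/2] = 1
    return (prev_seg * w)[n // 2:] + (curr_seg * w)[:n // 2]
```

With this window, a constant signal is reconstructed exactly, which is the property needed to avoid discontinuities between output segments.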
The estimation module 32 is configured for estimating noise in the hybrid signal.
When, as an optional addition, the voice activity detection module 36 is configured for determining the presence of voice or the absence of voice in each segment of the hybrid signal, the estimation module 32 is then configured for estimating the noise in the hybrid signal as a function of each segment with a determined absence of voice.
In other words, when the voice activity detection module 36 determines the presence of voice in a given segment, the noise spectrum is not updated. On the other hand, when the voice activity detection module 36 determines the absence of voice in a given segment, the background noise spectrum is updated. Such an update of the background noise spectrum is thus performed when the segment does not contain voice and the probability that the segment contains only noise is high. The more robust the voice activity detection module 36, the more accurate the estimation and tracking of the noise.
According to such optional addition, the estimation module 32 is typically configured for updating the background noise spectrum |Ñ_k| according to the following equation:
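An illustrative sketch of such a gated update follows; exponential smoothing with a forgetting factor λ is an assumption made for the illustration, while the gating by the voice activity decision follows the text:

```python
import numpy as np

def update_noise_spectrum(noise_mag, seg_mag, voice_present, lam=0.9):
    # Background-noise amplitude-spectrum update, gated by voice
    # activity. The smoothing form and the value of lam are assumptions;
    # the text only specifies that no update occurs on voiced segments.
    if voice_present:
        return noise_mag
    return lam * noise_mag + (1.0 - lam) * seg_mag
```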
The noise reduction module 34 is configured for calculating the corrected signal by applying a generalized spectral subtraction algorithm to the hybrid signal and according to the estimated noise.
In the example shown in
The generalized spectral subtraction algorithm satisfies e.g. the following equation:
The generalized spectral subtraction algorithm is calculated, e.g. in amplitude, and the power coefficient γ is then equal to 1; or further in power, and the power coefficient γ is then equal to 2.
In the case of an amplitude calculation of the generalized spectral subtraction, with γ=1, little musical noise will be produced, but the estimated voice signal could be variably distorted depending on the signal-to-noise ratio. Musical noise is a set of artifacts produced during spectral subtraction, consisting of short-lived tones which produce a relatively unpleasant noise.
In the case of a power calculation of the generalized spectral subtraction, with γ=2, little distortion will be created, but a non-negligible amount of musical noise could be generated.
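A minimal sketch of the generalized spectral subtraction follows; the form below paraphrases the surrounding description (overestimation by α, flooring by β times the noise, exponent γ) rather than reproducing equation (8) verbatim:

```python
import numpy as np

def generalized_spectral_subtraction(x_mag, n_mag, alpha=2.0, beta=0.05, gamma=1.0):
    # |Y_k[m]|^gamma = |X_k[m]|^gamma - alpha * |N_k[m]|^gamma,
    # floored at beta * |N_k[m]|^gamma to avoid negative values.
    # gamma=1: amplitude variant; gamma=2: power variant.
    # alpha: noise overestimation coefficient;
    # beta: noise reinsertion coefficient (e.g. 0.05, i.e. 5% of noise).
    sub = x_mag ** gamma - alpha * n_mag ** gamma
    floor = beta * n_mag ** gamma
    return np.maximum(sub, floor) ** (1.0 / gamma)
```

With β > 0, bins where the subtraction would go negative keep a small fraction of the noise, which is the comfort-noise behavior discussed below.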
The noise overestimation coefficient α is preferentially recalculated at each segment of index k and is then denoted by αk. Such coefficient prevents the generation of too much musical noise. To maximize the efficiency thereof, the coefficient is calculated per frequency band and depends on the signal-to-noise ratio on each of the bands.
The spectra |X̃_k[m]| and |Ñ_k[m]| are first cut into sub-spectra, denoted by |X̃_k^j[m]| and |Ñ_k^j[m]|, where j represents the index of the frequency band. Thus, j values of the signal-to-noise ratio, denoted by SNR_k^j, each associated with a frequency band of index j, are typically calculated according to the following equation:
Then, for each signal-to-noise ratio value, the noise overestimation coefficient α_k^j satisfies e.g. the following equation:
Overall, such calculation of the noise overestimation coefficient α can be used for overestimating the noise when the signal-to-noise ratio is low, and for reducing the introduction of musical noise artifacts.
The noise overestimation coefficient αkj is then converted so that same can be reinserted into equation (8), e.g. according to the following equation:
α_k[m] = α_k^j, ∀ m ∈ [f_j; f_(j+1)]   [11]
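The per-band computation can be sketched as follows. The band SNR is computed in the spirit of the equation above (log ratio of band energies); the SNR-to-α mapping is not reproduced in this text, so the classic linear rule of Berouti et al. (α decreasing with SNR, then clipped) is assumed for illustration:

```python
import numpy as np

def alpha_for_band(x_band_mag, n_band_mag, alpha0=4.0):
    # Per-band noise overestimation coefficient alpha_k^j.
    # The linear Berouti-style mapping below is an assumption,
    # not the text's own equation (10).
    snr_db = 10.0 * np.log10(np.sum(x_band_mag ** 2) / np.sum(n_band_mag ** 2))
    alpha = alpha0 - (3.0 / 20.0) * snr_db
    return float(np.clip(alpha, 1.0, alpha0 + 0.75))
```

This reproduces the behavior described in the text: the lower the band SNR, the larger the overestimation, which limits the introduction of musical noise.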
The correction coefficient δ is a frequency correction coefficient calculated only once, typically at the beginning of the algorithm, and not changing over time.
The coefficient is a simple frequency-dependent pre-factor, used to emphasize certain frequency bands in a manner suitable for voice pick-up.
The correction coefficient δ is e.g. a piecewise constant function, satisfying the following equation:
Given that the calculations are made with the amplitude spectra, the estimate |Ỹ_k[m]|^γ must not be negative, since a negative value would have no mathematical meaning. Thus, equation (8) includes a condition for avoiding negative values.
The noise reinsertion coefficient β can then be used for choosing whether or not to reinsert noise in the case of potentially negative values. When the noise reinsertion coefficient β is chosen to be equal to 0, any subtraction leading to a negative value is replaced by the zero value. On the other hand, for any value greater than 0, noise is reinserted. The above keeps a part of the noise, which can be perceived as a comfort noise masking a part of the musical noise, if there is any.
The noise reinsertion coefficient β is generally equal to a few percent. The noise reinsertion coefficient β is e.g. substantially equal to 0.05, i.e. a reinsertion of 5% of the background noise into the output signal. Such value is a preset parameter.
It should be noted that the lower the signal-to-noise ratio, the less efficient the estimation of the denoised signal and the more the voice is altered. It can thus be advantageous to set a higher value of the noise reinsertion coefficient β in the case of a poor signal-to-noise ratio, in order to recapture, in the background noise, some harmonics of the voice which would otherwise be lost in the spectral subtraction.
In the example shown in
As indicated above, the frequency domain calculations are performed with the amplitude of the signal spectrum of the segment. The phase of the latter, which remains unmodified, is then reintegrated into the signal before the inverse Fourier Transform for returning to the time domain, e.g. according to the following equation:
y_k[n] = IFFT(|Ỹ_k[m]| · e^(iφ(X̃_k[m])))
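This magnitude/phase recombination can be sketched with NumPy's real FFT as follows:

```python
import numpy as np

def resynthesize_segment(y_mag, x_spectrum):
    # Recombine the denoised amplitude spectrum |Y_k[m]| with the
    # unmodified phase of the noisy segment, then apply the inverse
    # FFT to return to the time domain.
    phase = np.angle(x_spectrum)
    return np.fft.irfft(y_mag * np.exp(1j * phase), n=2 * (len(x_spectrum) - 1))
```

When the amplitude is left unmodified, the original segment is recovered exactly, which confirms that conserving the phase loses no information.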
In the example shown in
The voice activity detection module 36 is configured for determining a presence of voice or an absence of voice in each segment of the hybrid signal.
The voice activity detection module 36 is configured e.g. for determining the presence of voice or the absence of voice from the second signal coming from the bone-mechanically excited transducer; and preferentially only from said second signal, without taking the first signal into account.
The second microphone 14, i.e. the bone conduction, or structure-borne noise, microphone, is adapted to measure the vibrations of the skin and of the face related to the stress of the vocal cords, and can be used for picking up the voiced part of a voice signal while being very insensitive to background noise (which a priori does not make the user's skin vibrate enough to be picked up).
The advantage of using the second bone conduction microphone 14 lies in the insensitivity thereof to background noise. Such insensitivity is even greater in the low-frequency part of the acquired signal.
Advantageously, the voice activity detection is then carried out after a filtering of the structure-borne noise signal in the frequency domain (a filtering operating in the time domain is also possible). The voice activity detection module 36 is then preferentially configured for determining the presence of voice or the absence of voice from the second filtered signal X̃_k^ost.
As an optional addition, the voice activity detection module 36 is configured for calculating an RMS value for each segment of the second signal, i.e. for each second segment; then for determining the presence of voice or the absence of voice as a function of respective RMS values.
The processing is based on the calculation of the signal energy, segment by segment. However, herein, due to the noise-insensitive character of the signal of the filtered structure-borne noise microphone, the energy of the voice will always emerge from the noise floor energy. The calculation of the RMS level then makes it possible to know the energy of the signal.
As is known per se, the root mean square (RMS) value of a periodic signal is the square root of the mean square of said quantity over a given time interval or the square root of the second order moment (or variance) of the signal.
For a time segment xk[n] of N samples, the calculation of the RMS value is then typically performed via the following equation:
However, in the frequency domain, using Parseval's identity according to which energy is equal in the frequency and time domains, we obtain the following equation:
The RMS level value is optionally converted to a dBFS value from the following equation:
RMSkdB=20×log10(RMSk) [16]
The dBFS value is typically between −94 dBFS minimum (in the case of a dynamic resolution of 16 bits) and 0 dBFS maximum (for a constant signal which would be equal to 1).
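The RMS and dBFS computations described above can be sketched as follows:

```python
import numpy as np

def rms_dbfs(segment):
    # RMS value of a time segment (square root of the mean square),
    # converted to dBFS via RMS_dB = 20 * log10(RMS), as in equation [16].
    rms = np.sqrt(np.mean(np.asarray(segment, dtype=float) ** 2))
    return 20.0 * np.log10(rms)

# a constant full-scale signal (equal to 1) gives the 0 dBFS maximum
```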
As yet another optional addition, the voice activity detection module 36 is configured for determining the presence of voice or the absence of voice according to an average value of the last M calculated RMS values, also known as the smoothed RMS, and/or according to a variation of the RMS value between a current RMS value and a preceding RMS value, also known as the RMS level variation rate, where M is an integer greater than or equal to 1.
According to such optional addition, the voice activity detection module 36 is configured, e.g., for determining the presence of voice if said average value is greater than or equal to a predefined mean threshold A, or if said RMS value variation is greater than or equal to a predefined variation threshold B.
The value of the RMS level is likely to vary over time, and to undergo sudden variations when the microphone concerned, in particular the second microphone 14, picks up a significant vibration. The optional addition then improves the accuracy and reduces the errors of the algorithm, with averaging over the last M calculated values of the RMS level (during the last M segments). The above is implemented e.g. via a circular buffer which adds the newly calculated RMS value at each new segment, deletes the oldest value, and then averages the M stored values. The smoothed RMS level at the kth segment, denoted by
Monitoring the value of
The value dt can correspond exactly to the time difference between two successive segments, and the variation of the RMS level will then be expressed in dB·s−1, but the latter can take very large values.
In a variant, and for convenience, the value dt is chosen to be equal to 1. Where appropriate, ΔRMSkdB is a rate of variation expressed in dB·segment−1. Such quantity is relevant because, at the moment when a discussion partner begins to speak, the RMS level increases abruptly, resulting in a positive ΔRMSkdB greater than 1 dB·segment−1. Since such quantity varies rapidly, same can be used for detecting the voice very quickly, thus preventing missing the beginning of a sentence.
Decision-making for the instantaneous voice activity detection is then defined e.g. by the following equation:
The threshold values A and B are predefined according to the dynamics of the acoustic apparatus 10, e.g. as a function of the gain of the microphone concerned, in particular of the second microphone 14, etc.
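The instantaneous decision rule described above can be sketched as follows; the default threshold values are illustrative (−35 dBFS is the threshold mentioned later in the description, and 1 dB per segment follows the discussion of the variation rate):

```python
def dav_instantaneous(rms_smooth_db, delta_rms_db, A=-35.0, B=1.0):
    # Instantaneous voice activity decision: voice is declared present
    # (1) when the smoothed RMS level reaches threshold A, or when the
    # RMS variation rate reaches threshold B; otherwise absent (0).
    return 1 if (rms_smooth_db >= A or delta_rms_db >= B) else 0
```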
The voice activity detection calculation described hereinabove gives an instantaneous value for each successive segment (whether overlapped or not). Relying only on an instantaneous value can lead to errors, e.g. a micro-silence in the voice could create an unwanted switch to 0 of the voice activity indicator DAV. On the other hand, a very short impulse noise can lead to a voice activity indicator DAV equal to 1 for only one segment, before returning to 0. Depending on the use of the voice activity detection module 36 (with a mode where the channel is open only if DAV=1 e.g.), such behavior could cause unpleasant artifacts. For this reason, the calculation of voice activity detection is advantageously smoothed so as to avoid such artifacts.
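A minimal sketch of such attack/release smoothing follows; the exact counter rules of Table 1 are not reproduced here, so a simple consecutive-count version with the same qualitative behavior is assumed:

```python
class SmoothedDav:
    # Attack/release smoothing of the instantaneous voice activity
    # indicator, using a segment counter. The consecutive-count rule
    # below is an assumption standing in for the text's Table 1.

    def __init__(self, attack_segments=2, release_segments=3):
        self.attack = attack_segments    # segments of voice before opening
        self.release = release_segments  # segments of silence before closing
        self.count = 0
        self.state = 0                   # smoothed DAV value

    def step(self, dav_inst):
        # count consecutive segments disagreeing with the current state
        self.count = self.count + 1 if dav_inst != self.state else 0
        if self.state == 0 and self.count >= self.attack:
            self.state, self.count = 1, 0
        elif self.state == 1 and self.count >= self.release:
            self.state, self.count = 0, 0
        return self.state

dav = SmoothedDav(attack_segments=2, release_segments=3)
out = [dav.step(v) for v in [1, 1, 0, 0, 0, 0]]
# a lone one-segment impulse (1 followed by 0s) is rejected entirely
```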
The smoothing is carried out e.g. by using an attack time and a release time. When the instantaneous voice activity indicator DAV_k^inst is equal to 1 for at least as long as the attack time (or the equivalent number of segments), the smoothed voice activity indicator DAV_k^smooth becomes equal to 1. On the other hand, when the instantaneous voice activity indicator DAV_k^inst is equal to 0 for at least as long as the release time, the smoothed voice activity indicator DAV_k^smooth returns to 0. In all other cases, the smoothed voice activity indicator DAV_k^smooth retains the value same had in the preceding segment. For the implementation of the smoothing, a counter C_k is e.g. used. The modification of the counter C_k is typically governed by Table 1 below, for each current segment of index k, according to the instantaneous voice activity indicator DAV_k^inst and to the value C_(k-1) of the counter at the preceding segment of index k−1:
Decision-making for the smoothed voice activity detection is then defined e.g. by the following equation:
The operation of the acoustic apparatus 10, and in particular of the processing device 20 according to the invention, will now be explained with reference to
The processing applied to the signal for reducing noise is performed numerically and in real-time. Indeed, when the operator uses the acoustic apparatus 10, the signal has to be denoised and sent to the discussion partner thereof as quickly as possible, seeking to reduce the latency as much as possible, with a desired value of 20 to 30 ms. For qualitative noise reduction, a minimum amount of information to be analyzed has to be available before being able to effectively reduce noise. The processing performed is then a block processing, applied segment by segment to the input signal. As indicated above, each segment typically has a duration of approximately 20 ms. Indeed, over such a period, the voice has a quasi-stationary behavior, whereas the noise has a quasi-stationary behavior over much longer durations.
In order to optimize power consumption, the sampling frequency is preferentially less than 22,050 Hz, leading to a passband in the interval [0; 11,025 Hz]. Consequently, in order to have signal segments of about 20 ms at said sampling frequency, the segments will typically contain 512 samples.
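As a quick check of the stated orders of magnitude:

```python
fe = 22050   # sampling frequency in Hz
N = 512      # samples per segment

duration_ms = 1000.0 * N / fe   # segment duration in milliseconds
# 512 samples at 22 050 Hz last about 23 ms, i.e. the roughly 20 ms
# quasi-stationarity window of the voice mentioned above
```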
The processing applied to the signal to reduce the noise is mostly carried out in the frequency domain, which is more suitable for noise reduction because the aim is to reduce the level in the frequency bands containing the most noise. However, because of working by frequency segments, problems of discontinuities and inaccuracies could appear from one segment to another, and an overlap of the segments, with an overlap ratio preferentially greater than 50%, ideally equal to 75%, as described hereinabove, is then advantageously implemented to attenuate such problems.
During an initial step 100, the processing device 20 then calculates, via the hybridization module 30 thereof, the hybrid signal from the first and the second analog signals, coming from the first and the second microphones 12, 14, as described hereinabove.
During a subsequent optional step 110, the processing device 20 determines, via the voice activity detection module 36 thereof, a presence of voice or an absence of voice in each segment of the hybrid signal, as described hereinabove.
The processing device 20 then estimates, during the next step 120 and via the estimation module 32 thereof, the noise in the hybrid signal obtained beforehand during the hybridization step 100, as described hereinabove.
When optionally a presence of voice or an absence of voice in each segment of the hybrid signal has been determined during the voice activity detection step 110, the noise is then estimated, during the estimation step 120, in the hybrid signal according to each segment with a determined absence of voice, as described hereinabove.
Finally, during the next step 130, the processing device 20 applies, via the noise reduction module 34 thereof, the generalized spectral subtraction algorithm to the hybrid signal and according to the estimated noise, in order to calculate the corrected signal.
As indicated hereinabove, the processing method is applied in real-time or quasi-real-time, with a latency of approximately 20 to 30 ms, and is a block processing, applied segment by segment to the input signal.
Thus, at the end of the step 130, the processing method returns to the initial step 100, and more generally, each of the steps 100, optionally 110, 120 and 130, is repeated regularly so as to be implemented for each successive segment of signal.
In
In the example shown in
In
Finally, through the two examples shown in
The curve 500 is the time-dependent representation of the signal on which is superimposed the decision taken by the voice activity detection, where the grayed out zones 510 correspond to zones for which a presence of voice has been determined, i.e. DAV=1; the other zones, either not grayed out or blank, corresponding to zones for which an absence of voice has been determined, i.e. DAV=0. In
With the processing device 20 according to the invention, a first striking element is that the waveform associated with the filtered bone conduction recording (low-pass filter) is much less marked by noise. Whatever the noise level, the voice emerges very easily therefrom. Such effect is even more visible on the representation of the RMS level of the filtered signal over time, as there is a difference of almost 40 dB between the voice-related peaks and the background noise. Hence, the choice of the threshold value becomes easier and provides greater flexibility than with the processing device of the prior art. The threshold has e.g. been arbitrarily set herein at −35 dBFS, while observing that a threshold value at −25 dBFS or −45 dBFS would have given similar results. Due to such natural emergence, the generalized spectral subtraction algorithm is particularly effective and identifies the voice equally well in the three different noise zones.
Finally, due to the performance thereof, the processing device 20 according to the invention is adapted to accurately detect the time periods in the presence of noise alone. In such way, the averaging of the RMS level of the air conduction microphone only at the moments when DAV=0 can be used for obtaining a good estimation of the level of the background noise, represented by the curve 540.
The results clearly show the advantage of the processing device 20 according to the invention because of the significant gain in performance and in calculation cost, compared with the processing device of the prior art.
Thus, when the user is in a noisy environment, and uses the acoustic apparatus 10, e.g. with a radio, for communicating with a remote correspondent, the signal sent to the correspondent, without implementing the invention, would be altered by unwanted acquisition of a portion of background noise. The electronic processing device 20 according to the invention can be used for reducing the presence of the background noise in the signal sent to the correspondent, and in particular for filtering the voice from the noise, in order to aim to send only the effective signal to the correspondent, via the radio.
The results obtained with the electronic processing device 20 according to the invention, in particular the results presented above with reference to
It will thus be understood that the electronic processing device 20, and the associated processing method, can be used for further improving the reduction of noise in the signal delivered at the output of the acoustic apparatus 10.
Number | Date | Country | Kind |
---|---|---|---
FR 2205151 | May 2022 | FR | national |