The present invention relates to the field of sound signal processing, and particularly relates to a joint spectrum gain adaptation module and method thereof, an audio processing system and an implementation method thereof.
Current digital audio processing systems perform signal processing on digitized sounds.
The spectrum modification module 130 of
The SCE module 170 is used to enhance the contrast between peaks and valleys of the global or local power spectrum, making it easier for listeners to obtain cues for identifying speech and music. For its principle and design examples, refer to reference document 3. However, over-enhancing the spectral contrast leads to strong noise amplification that adversely affects listening. Enhancing the spectral contrast appropriately is therefore the key to helping listeners.
In conventional audio processing, the DRC module 180 is used to adjust the level and transient behavior of the input sound at each channel to modify the sound volume and the sound quality.
Referring to reference document 4, the DRC processing in hearing aids and related applications aims to reduce the dynamic range of the input sound at each channel, so that the resulting sound conforms to the reduced auditory dynamic range of the impaired ear, that is, the range of sound pressure levels between the listener's hearing threshold and the discomfort level at each frequency, thereby mitigating the hearing loss. In
Performing DRC processing with static mapping functions, however, does not take into account auditory masking, in which the perception of a sound is weakened or inhibited by temporally or spectrally adjacent sounds. This effect may not be significant for normal hearing (hereinafter abbreviated as NH) listeners. As auditory masking gets worse with increased hearing loss (i.e. the perception is more strongly influenced by sounds within a wider spectral and temporal region), listeners cannot perceive the compressed sound as expected. To provide better hearing assistance for listeners, the DRC processing should be extended to deal with auditory masking. Similarly, for the designs of the NR and SCE processing, better hearing assistance can be achieved by extending them to deal with the auditory information of hearing impaired listeners.
Further, consider a design that performs DRC processing on the input sound of each ear separately. The ratio of the sound pressures of the two ears at each frequency is changed by the DRC processing because the input spectra and the compression characteristics differ between the ears. This may affect binaural sound localization or related operations.
Furthermore, in a serial signal processing configuration, the functions of a processing stage may be cancelled out by the processing of subsequent stages, for example in
7: B. R. Glasberg, B. C. J. Moore: A model of loudness applicable to time-varying sounds. J. Audio Eng. Soc. 50 (2002) 331-341.
In view of the above issues, an object of the present invention is to provide a joint spectral gain adaptation (hereinafter abbreviated as JSGA) module and a method thereof, and a corresponding audio processing system and an implementation method thereof. This design is based on a loop that feeds back the difference between the output signals of two loudness models adapted to the listener in order to shape the sound spectrum. Extra audio signal processing functions can be inserted in the loop as needed, and their interactions are handled to improve the listener's perception. By applying loudness models, the JSGA design integrates the signal processing functions and associates them with the listener's hearing information to provide more appropriate hearing assistance to hearing impaired listeners.
A first aspect of the present invention provides a JSGA module comprising:
an aided-ear loudness (hereinafter abbreviated as AL) model, wherein an AL spectrum is obtained by performing computations on an aided-ear threshold elevation (hereinafter abbreviated as ATE) profile and a spectrum selected from the group consisting of an input spectrum and a first spectrum derived from the input spectrum;
a bare-ear loudness (hereinafter abbreviated as BL) model, wherein a BL spectrum is obtained by performing computations on a bare-ear threshold elevation (hereinafter abbreviated as BTE) profile and a modified spectrum previously obtained; and
a spectrum shaping (hereinafter abbreviated as SS) sub-module, wherein the modified spectrum previously obtained is passed to the BL model as an input, and a modified spectrum and a linear spectral gain (hereinafter abbreviated as LSG) vector are obtained by performing computations on the input spectrum, the BL spectrum, and a loudness spectrum selected from the group consisting of the AL spectrum and a first loudness spectrum derived from the AL spectrum.
A second aspect of the present invention provides an audio processing system comprising a JSGA module according to the first aspect, wherein a modified spectrum is obtained by performing computations on an ATE profile, a BTE profile, and an input spectrum of each frame period, the audio processing system further comprising:
an ADC unit and a DI signal obtained by performing sampling on an AI signal at a sampling period;
a FWA unit, wherein the input spectrum of each frame period is obtained by performing framing and waveform analysis on the DI signal;
a waveform synthesis unit and a DO signal obtained by performing waveform synthesis on the modified spectrum; and
a DAC unit, wherein the DO signal is converted into an AO signal at the sampling period.
A third aspect of the present invention provides an audio processing system comprising a JSGA module according to the first aspect, wherein a LSG vector is obtained by performing computations on an ATE profile, a BTE profile, and an input spectrum of each time interval, the audio processing system further comprising:
an ADC unit and a DI signal obtained by performing sampling on an AI signal at a sampling period;
an analysis filter bank and a plurality of sub-band signals obtained by performing sub-band filtering on the DI signal;
a sub-band snapshot unit, wherein the input spectrum of each time interval is obtained by performing simultaneous sampling on each sub-band signal at a time interval and ranking the simultaneously sampled values according to their corresponding sub-band center frequencies;
a sub-band signal combining unit and a DO signal obtained by performing weighted combining on the sub-band signals according to the LSG vector corresponding to each sampling period; and
a DAC unit, wherein the DO signal is converted into an AO signal at the sampling period.
A fourth aspect of the present invention provides a JSGA method applied to a JSGA module comprising an AL model, a BL model and a SS sub-module, the JSGA method comprising the following steps:
obtaining an AL spectrum with the AL model by performing computations on an ATE profile and a spectrum selected from the group consisting of an input spectrum and a first spectrum derived from the input spectrum;
passing a modified spectrum previously obtained from the SS sub-module to the BL model as an input, and obtaining a BL spectrum with the BL model by performing computations on a BTE profile and the modified spectrum previously obtained; and
obtaining a modified spectrum and a LSG vector with the SS sub-module by performing computations on the input spectrum, the BL spectrum, and a loudness spectrum selected from the group consisting of the AL spectrum and a first loudness spectrum derived from the AL spectrum.
A fifth aspect of the present invention provides a method of implementing an audio processing system comprising a step of implementing a JSGA method with a JSGA module according to the fourth aspect by performing computations on an ATE profile, a BTE profile, and an input spectrum of each frame period to obtain a modified spectrum, the method of implementing the audio processing system further comprising the following steps:
performing sampling on an AI signal at a sampling period with an ADC unit to obtain a DI signal;
performing framing and waveform analysis on the DI signal with a FWA unit to obtain the input spectrum of each frame period;
performing waveform synthesis on the modified spectrum with a waveform synthesis unit to obtain a DO signal; and
converting the DO signal into an AO signal at the sampling period with a DAC unit.
A sixth aspect of the present invention provides a method of implementing an audio processing system comprising a step of implementing a JSGA method with a JSGA module according to the fourth aspect by performing computations on an ATE profile, a BTE profile, and an input spectrum of each time interval to obtain a LSG vector, the method of implementing the audio processing system further comprising the following steps:
performing sampling on an AI signal at a sampling period with an ADC unit to obtain a DI signal;
performing sub-band filtering on the DI signal with an analysis filter bank to obtain a plurality of sub-band signals;
performing simultaneous sampling on each of the plurality of sub-band signals at a time interval and ranking the simultaneously sampled values according to their corresponding sub-band center frequencies with a sub-band snapshot unit to obtain the input spectrum of each time interval;
performing weighted combining on the plurality of sub-band signals according to the LSG vector corresponding to each sampling period with a sub-band signal combining unit to obtain a DO signal; and
converting the DO signal into an AO signal at the sampling period with a DAC unit.
To make the present invention better understood by those skilled in the art to which the present invention pertains, preferred embodiments of the present invention are detailed below with the accompanying drawings to clarify the composition of the present invention and effects to be achieved.
The ADC unit 110 is used to obtain a DI signal by performing sampling on an AI signal at a time period. The AI signal and the DI signal are of monaural type (in the present invention, it means that information is associated with a single ear). The time period is referred to as the sampling period. Further, if the input signal has been digitized, the ADC unit 110 is not required.
The FWA unit 120 is used to obtain an input spectrum of monaural type of each frame period by performing framing and waveform analysis on the DI signal obtained from the ADC unit 110. Framing is used to arrange the samples of the DI signal into a sequence of equal-length, evenly-spaced, and partially-overlapped waveform frames. Assuming that each waveform frame contains NDATA samples, of which NOVL samples overlap between two consecutive waveform frames, each waveform frame corresponds to a time interval of (NDATA−NOVL) sampling periods, and the time interval is referred to as the frame period.
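The framing step can be illustrated with a minimal Python sketch. The function name `frame_signal` and the choice to drop a trailing partial frame are assumptions made for illustration only and are not part of the disclosure.

```python
def frame_signal(samples, n_data, n_ovl):
    """Split samples into frames of n_data samples with n_ovl overlap.

    Consecutive frames start (n_data - n_ovl) samples apart, which is the
    frame period expressed in sampling periods. A trailing partial frame is
    dropped (an assumption of this sketch).
    """
    if not 0 <= n_ovl < n_data:
        raise ValueError("require 0 <= n_ovl < n_data")
    hop = n_data - n_ovl  # frame period in sampling periods
    last_start = len(samples) - n_data
    return [samples[s:s + n_data] for s in range(0, last_start + 1, hop)]


# Ten samples, frames of 4 samples overlapping by 2: frame period is 2.
frames = frame_signal(list(range(10)), n_data=4, n_ovl=2)
```

With these hypothetical settings each new frame reuses the last two samples of the preceding frame.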
Waveform analysis is used to obtain an input spectrum of each frame period by analyzing the waveform frame of corresponding frame period. For details of the spectral analysis such as the short-time Fourier transform, refer to reference document 1.
The JSGA module 200 is used to obtain a modified spectrum and a LSG vector (not shown in
The waveform synthesis unit 140 is used to obtain a DO signal of monaural type by performing waveform synthesis such as the inverse short-time Fourier transform on the modified spectrum obtained from the JSGA module 200, that is, reconstructing a waveform frame with the modified spectrum of each frame period, weighting the reconstructed waveform frames corresponding to the adjacent frame periods by a window function, and performing overlap-addition on the weighted frames. For details of the inverse short-time Fourier transform, refer to reference document 1.
The DAC unit 150 is used to convert the DO signal obtained from the waveform synthesis unit 140 into an AO signal of monaural type at the sampling period. Further, the DO signal can also be used for other processing or stored as a digital recording file, in which case the DAC unit 150 is omitted.
In the first embodiment, a DI signal is obtained with the ADC unit 110 by performing sampling on an AI signal at a time period. The AI signal and the DI signal are of monaural type. The time period is called a sampling period (step S3000).
Referring to paragraphs [0021] to [0022], an input spectrum of monaural type of each frame period is obtained with the FWA unit 120 by performing framing and waveform-analysis on the DI signal obtained from the ADC unit 110 (step S3100).
A modified spectrum is obtained with the JSGA module 200 by performing computations on an ATE profile, a BTE profile, and the input spectrum of each frame period obtained from the FWA unit 120. The ATE profile, the BTE profile, and the modified spectrum are of monaural type (step S3200). The structure and operation method of various embodiments of the JSGA module 200 in a monaural audio processing system or application are described below, and the supplementary description is made for the corresponding adjustment of the signal, structure and operation method of the JSGA module 200 in a binaural audio system or application.
Referring to paragraph [0024], a DO signal of monaural type is obtained with the waveform synthesis unit 140 by performing waveform synthesis on the modified spectrum obtained from the JSGA module 200 (step S3300).
The DO signal obtained from the waveform synthesis unit 140 is converted into an AO signal of monaural type at the sampling period with the DAC unit 150 (step S3400).
In the block diagram of the JSGA module of the present invention of
The subject's hearing threshold at each frequency can be obtained by interpolating the result of the pure tone audiometry (hereinafter abbreviated as PTA, measuring the hearing thresholds at specified frequencies and recording them in decibels). A threshold elevation profile contains the amount of elevation of the subject's hearing threshold relative to the corresponding NH threshold at each frequency, where the NH threshold is the expectation value of the hearing threshold of NH young listeners which is typically 6 to 10 dB higher than the NH threshold of binaural listening in reference document 7. If the listener is subjected to a single-ear PTA without wearing an assistive device, the monaural bare-ear hearing threshold at each frequency can be obtained, and the BTE profile of monaural type is derived as:
$$\Delta T_{BARE}(z) = T_{q,BARE}(z) - T_{q,NH}(z) \tag{1}$$
where ΔTBARE (z), Tq,BARE(z) and Tq,NH (z) denote the value of the BTE profile, the bare-ear hearing threshold and the NH threshold at the frequency z, respectively. In a binaural system or application, the PTA is performed on both ears of the subject, and the result values are interpolated to obtain both a left-ear bare-ear hearing threshold and a right-ear bare-ear hearing threshold at each frequency, thus to obtain a BTE profile of binaural type (in the present invention, it means that information includes two monaural counterparts associated with the left and right ears of a listener, respectively).
The ATE profile contains the amount of elevation of the measured hearing threshold relative to the NH threshold at each frequency when the subject wears an assistive device during test. It is used as a setting of the corrected hearing ability rather than the result of a hearing test. In
$$\Delta T_{AIDED}(z) = \Delta T_{BARE}(z) - \varphi(z) \cdot \Delta T_{BARE}(z) \tag{2}$$
$$T_{q,AIDED}(z) = (1 - \varphi(z)) \cdot T_{q,BARE}(z) + \varphi(z) \cdot T_{q,NH}(z) \tag{3}$$
where ΔTAIDED(z) denotes the value of the ATE profile at the frequency z, and other notations are as aforementioned.
In Eq. (3), the aided-ear hearing threshold is expressed as a linear interpolation between the bare-ear hearing threshold and the NH threshold according to the correction ratio φ(z). Setting φ(z)=0 implies that the threshold elevation is not corrected and the original hearing ability is maintained. Setting φ(z)=½ implies that half of the hearing threshold elevation is corrected, making the audiometry result after correction close to that of the “½-gain rule” disclosed in reference document 5, which is simple and easy to adopt. Further, in practice the correction ratio at a frequency with severe threshold elevation has to be reduced, that is, φ(z) is decreased as the value of the BTE profile ΔTBARE(z) at the frequency z increases, to avoid listening discomfort. In a binaural system or application, a left-ear correction ratio and a right-ear correction ratio at each frequency are determined according to the BTE profile, and an ATE profile of binaural type is derived with the fitting procedure 210.
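The relation between the BTE profile, the correction ratio, and the ATE profile can be sketched as follows. The function names and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow; they are not measured data or recommended settings.

```python
def aided_threshold_elevation(delta_t_bare, phi):
    """ATE profile in dB: dT_AIDED(z) = dT_BARE(z) - phi(z) * dT_BARE(z)."""
    return [d - p * d for d, p in zip(delta_t_bare, phi)]


def aided_threshold(t_q_bare, t_q_nh, phi):
    """Aided threshold in dB as an interpolation between the bare-ear
    threshold (phi = 0) and the NH threshold (phi = 1)."""
    return [(1.0 - p) * tb + p * tn
            for tb, tn, p in zip(t_q_bare, t_q_nh, phi)]


# Hypothetical thresholds at three frequencies (dB).
t_q_nh = [10.0, 5.0, 8.0]
t_q_bare = [50.0, 45.0, 68.0]
phi = [0.5, 0.5, 0.25]  # smaller ratio where the elevation is severe

delta_t_bare = [tb - tn for tb, tn in zip(t_q_bare, t_q_nh)]  # BTE profile
delta_t_aided = aided_threshold_elevation(delta_t_bare, phi)  # ATE profile
t_q_aided = aided_threshold(t_q_bare, t_q_nh, phi)
```

In this example the aided threshold minus the NH threshold reproduces the ATE profile at every frequency, which is the consistency expected between the two expressions.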
The present invention argues that the listener's original hearing loss and the expected amount of correction on hearing loss should be both taken into account in an audio processing to shape the input spectra, so as to provide appropriate effects to the listener. The argument is employed by the design of the JSGA module and its variants of the present invention, wherein the loudness models of the JSGA module are used to associate the original and expected hearing loss conditions of the listener with the corresponding sound perception behaviors, and to translate the sounds into loudness spectra (in the present invention, a loudness spectrum is a vector representation of the listener's loudness perception at each frequency).
Specifically, the BL model 240 of
The audio signals in real life are usually continuously changing. When the JSGA module 200 receives the audio signals and operates, the difference between the BL spectrum and the AL spectrum (hereinafter referred to as the loudness spectrum error vector) will be continuously presented. Such loudness spectrum error vector is used to adjust the signal gain of each frequency to correct the loudness perception of the listener to achieve the expected effect of hearing assistance.
Unlike conventional designs that adjust the signal gain of each frequency step by step with various types of audio processing, the JSGA module of the present invention operates according to the feedback of the loudness spectrum error vector, and further combines various audio processing functions in the loop computations according to the functional requirements of the system, so as to associate various psychoacoustic effects of the listener with the audio processing functions and to integrate the functions to dynamically adjust the signal gain of each frequency.
In
A BL spectrum is obtained with the BL model 240 by performing computations on the BTE profile and the modified spectrum previously obtained from the SS sub-module 250, wherein the BTE profile contains the amount of elevation of a bare-ear hearing threshold relative to the NH threshold at each frequency. The modified spectrum and the BL spectrum are of monaural type. When the JSGA module 200 starts to perform computations, the modified spectrum previously obtained (i.e. the initial setting of the modified spectrum) can be set equal to the input spectrum. In a binaural system or application, the modified spectrum and the BL spectrum are of binaural type.
The modified spectrum previously obtained from the SS sub-module 250 is passed to the BL model 240, and a modified spectrum and a LSG vector of monaural type are obtained with the SS sub-module 250 by performing computations on the input spectrum obtained from the FWA unit 120, the AL spectrum obtained from the AL model 230 and the BL spectrum obtained from the BL model 240 (in the next turn of the JSGA module operation, the modified spectrum becomes an input of the BL model 240 which is referred to as modified spectrum previously obtained). In some variants of the JSGA module described below, a loudness spectrum derived from the AL spectrum is an input of the SS sub-module 250 in place of the AL spectrum. In a binaural system or application, a modified spectrum and a LSG vector of binaural type are obtained with the SS sub-module 250 by performing computations on the left-ear part and the right-ear part of the input signals (such as the input spectrum, the AL spectrum, and the BL spectrum) separately.
In the field of psychoacoustics, loudness models are used to evaluate the listener's perception of sound intensity affected by the input sound and various parameters. The loudness value corresponds to the neural activity of the auditory system in response to the sound over a certain time period. Reference documents 6 and 7 illustrate the implementation details of different loudness models. These loudness models can handle time-varying wide-band sounds, which cover the sounds present in real life, and are hence suitable for the JSGA module of the present invention after adjusting the computations according to the interface signal formats of the AL model 230 and the BL model 240. Moreover, since the JSGA module 200 performs feedback adjustment according to the loudness spectrum error vector, it is more important for the loudness model to respond to the loudness changes caused by differences in hearing loss than to provide accurate loudness estimates. Omitting the computations that are not affected by the hearing loss helps to reduce the computational cost of the loudness models.
The hearing loss model 340 is used to derive a hearing loss parameter set with a threshold elevation profile (i.e. either the ATE profile or the BTE profile of
The conventional loudness model performs filtering and filter bank processing (or their equivalent processing) on the time-domain input signal to account for the filtering and frequency division functions of the auditory system from the outer ear to the inner ear, and to estimate an output level of each filter of the filter bank (hereinafter referred to as an auditory excitation). A vector in which the auditory excitations are ranked according to the corresponding filter center frequencies is referred to as an excitation pattern.
Since the input of the loudness model of the present invention is a spectrum, the spectrum-to-excitation pattern conversion sub-module 360 is used to obtain an excitation pattern of monaural type by performing computations on a sound spectrum of monaural type. Each auditory excitation in the excitation pattern is calculated as:
$$E_p = \sum_k |X(k)|^2 \, |G(k) H_p(k)|^2 \tag{4}$$
where p denotes the filter index, Hp(k) and Ep denote the frequency response of the p-th filter and the corresponding auditory excitation, respectively, X(k) denotes the input sound spectrum of the loudness model, and G(k) denotes the lumped frequency response of the outer ear and middle ear, which can be found in reference documents 7 and 8. Depending on the loudness model in use, the filter bank can have either fixed coefficients (see reference document 6, which uses fixed filters) or time-varying coefficients (see reference document 8, which adjusts the filter response according to the hearing loss and the input sound level). In a binaural system or application, the spectrum-to-excitation pattern conversion sub-module 360 is used to obtain an excitation pattern of binaural type by performing the aforementioned monaural computations on a left-ear sound spectrum and a right-ear sound spectrum of a sound spectrum separately.
The specific loudness estimation sub-module 320 is used to obtain a specific loudness of monaural type (in the present invention, a specific loudness is a vector of the instantaneous loudness information of a sound over frequency) by performing computations on the excitation pattern obtained from the spectrum-to-excitation pattern conversion sub-module 360 according to the hearing loss parameter set obtained from the hearing loss model 340. The computations include sub-models of the loudness model in use. Taking the loudness model of reference document 6 as an example, the computations include the loudness transformation, the forward masking, and the upward spread of masking. Taking the loudness model of reference document 8 as an example, the computations include the reduction on IHC/neural function and the loudness transformation. In a binaural system or application, the specific loudness estimation sub-module 320 is used to obtain a specific loudness of binaural type by performing the aforementioned computations on the excitation pattern according to the hearing loss parameter set obtained from the hearing loss model 340.
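The spectrum-to-excitation pattern conversion of Eq. (4) can be sketched as below. The rectangular filter responses and the flat-ish ear response are placeholder values assumed for illustration; an actual loudness model would supply its own G(k) and Hp(k).

```python
def excitation_pattern(x, g, h):
    """Eq. (4): E_p = sum_k |X(k)|^2 * |G(k) * H_p(k)|^2.

    x: input spectrum X(k) (complex or real values)
    g: lumped outer/middle-ear response G(k)
    h: list of filter responses, h[p][k] = H_p(k)
    """
    return [sum(abs(xk) ** 2 * abs(gk * hpk) ** 2
                for xk, gk, hpk in zip(x, g, h_p))
            for h_p in h]


# Hypothetical 4-bin spectrum and two rectangular filters (assumed values).
x = [1 + 0j, 0 + 2j, 3 + 0j, 0 + 0j]
g = [1.0, 0.5, 1.0, 1.0]
h = [[1, 1, 0, 0],
     [0, 0, 1, 1]]
e = excitation_pattern(x, g, h)
```

Each output element is the power collected by one filter, so the length of the excitation pattern equals the number of filters rather than the number of spectrum bins.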
The temporal integration sub-module 350 is used to obtain a loudness spectrum of monaural type by performing computations on the specific loudness obtained from the specific loudness estimation sub-module 320. Referring to loudness models in reference documents 6 and 7, the specific loudness is integrated over frequency and the result is fed to a temporal integration model to approximate the effect of loudness perception getting stronger with the increasing of the sound duration. Since the loudness models of the present invention have to generate the frequency-dependent loudness information, the aforementioned integration over frequency is omitted while the temporal integration is applied on each element of the specific loudness. In a binaural system or application, the temporal integration sub-module 350 is used to obtain a loudness spectrum of binaural type by performing computations on the left-ear specific loudness and the right-ear specific loudness of the specific loudness separately.
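A minimal sketch of temporal integration applied to each element of the specific loudness is given below. It assumes one-pole attack/release smoothing, which is the general style of temporal integration used in time-varying loudness models; the function name and the coefficient values are placeholders and are not taken from the reference documents.

```python
def integrate_loudness(prev, specific, attack=0.5, release=0.1):
    """One frame of per-frequency temporal integration.

    prev:     loudness spectrum from the previous frame
    specific: specific loudness of the current frame
    attack, release: smoothing coefficients in (0, 1]; assumed values.
    """
    out = []
    for l_prev, n in zip(prev, specific):
        # Rise quickly when the loudness grows, decay slowly when it drops.
        coeff = attack if n > l_prev else release
        out.append(l_prev + coeff * (n - l_prev))
    return out


loud = [0.0, 4.0]
loud = integrate_loudness(loud, [2.0, 0.0])
```

No summation over frequency takes place, so the output keeps one loudness value per frequency as the loudness spectrum requires.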
The error measurement sub-module 510 is used to obtain a loudness spectrum error vector by performing computations on the AL spectrum obtained from the AL model 230 and the BL spectrum obtained from the BL model 240:
$$L_{ERR.dB}(z) = 10 \cdot \log_{10}(L_{AIDED}(z)) - 10 \cdot \log_{10}(L_{BARE}(z)) \tag{5}$$
where LERR.dB(z), LBARE(z), and LAIDED(z) denote the values of the loudness spectrum error vector, the BL spectrum, and the AL spectrum at the frequency z, respectively. In this embodiment, the signal quality (hereinafter abbreviated as SQ) vector of
$$L_{ERR.dB}(z) = 10 \cdot \log_{10}(L_{AIDED}(z) \cdot W_{SQ}(z)) - 10 \cdot \log_{10}(L_{BARE}(z)) \tag{6}$$
where WSQ(z) denotes the value of the SQ vector at the frequency z, and other notations are as aforementioned. In practice, WSQ(z) can be approximated by the element of the SQ vector that corresponds to the frequency closest to z. The purpose of weighting the AL spectrum by the SQ vector in Eq. (6) is to suppress the spectral gains corresponding to the low signal quality spectrum components to prevent computations of the SS sub-module 250 from enhancing the noise or interference of the input signal.
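The error measurement can be sketched as follows, computing the aided-minus-bare loudness difference in decibels with the optional SQ weighting of Eq. (6). The function name and the sample values are hypothetical; with unit weights the expression reduces to the unweighted error.

```python
import math


def loudness_error_db(l_aided, l_bare, w_sq=None):
    """Loudness spectrum error in dB, weighted by w_sq as in Eq. (6).

    All loudness values and weights must be positive. When w_sq is omitted
    the weights default to 1 (no signal quality weighting).
    """
    if w_sq is None:
        w_sq = [1.0] * len(l_aided)
    return [10.0 * math.log10(la * w) - 10.0 * math.log10(lb)
            for la, lb, w in zip(l_aided, l_bare, w_sq)]


# Hypothetical loudness values at two frequencies.
err = loudness_error_db([10.0, 1.0], [1.0, 1.0])
err_sq = loudness_error_db([10.0, 1.0], [1.0, 1.0], w_sq=[0.1, 1.0])
```

In the second call the low signal quality weight at the first frequency cancels the positive error there, so no gain increase would be requested for that component.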
The gain adjustment sub-module 520 is used to adjust a spectral gain vector according to the loudness spectrum error vector obtained from the error measurement sub-module 510:
where GdB,tmp denotes a temporary variable, GdB,last(z), GdB(z), and GdB,MAX(z) denote the values of the spectral gain vector before adjustment, the spectral gain vector after adjustment, and the gain upper-bound vector at the frequency z, respectively, CATT(z) and CREL(z) denote the values of the loop speed control vector set at the frequency z, and are applied to loudness spectrum errors in negative sign and positive sign, respectively, and other notations are as aforementioned. When the JSGA module 200 starts to perform computations, the spectral gain vector before adjustment (i.e. the initial setting of the spectral gain vector) can be set to all zeros, which matches the initial setting of the modified spectrum being identical to the input spectrum.
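The update equation itself is not reproduced above, so the following sketch only illustrates one plausible form that is consistent with the symbol descriptions: the error is scaled by CATT when negative and by CREL when positive, added to the previous gain, and the result is clipped to the upper bound. This form, the function name, and all numeric values are assumptions and should not be read as the disclosed equation.

```python
def adjust_gain(g_db_last, l_err_db, c_att, c_rel, g_db_max):
    """One loop update of the spectral gain vector (all values in dB).

    ASSUMED form, not the document's equation:
      g_tmp = g_last + c * err, with c = c_att for negative errors and
      c = c_rel for positive errors, then g = min(g_tmp, g_max).
    """
    out = []
    for g, err, ca, cr, gmax in zip(g_db_last, l_err_db, c_att, c_rel,
                                    g_db_max):
        c = ca if err < 0.0 else cr
        out.append(min(g + c * err, gmax))
    return out


# Hypothetical values at three frequencies.
g_db = adjust_gain([0.0, 0.0, 20.0], [8.0, -4.0, 8.0],
                   c_att=[0.5] * 3, c_rel=[0.25] * 3, g_db_max=[21.0] * 3)
```

Under this assumption a larger CATT than CREL makes the gain fall faster than it rises, and the third element shows the upper bound taking effect.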
The format conversion sub-module 540 is used to convert the spectral gain vector obtained from the gain adjustment sub-module 520 into a LSG vector, by performing the frequency axis adjustment and the decibel-to-linear domain conversion described as follows:
(i) Frequency axis adjustment: if a plurality of frequencies corresponding to each element of a vector, a spectrum, or a loudness spectrum are ranked into a frequency vector, the frequency vector is called the frequency axis of the vector, the spectrum, or the loudness spectrum. To properly scale the input spectrum, the spectral gain vector is adjusted in a way of matching the frequency axis with that of the input spectrum obtained from the FWA unit 120. The step is omitted if the frequency axes of the two vectors are identical, otherwise the following interpolation is calculated:
where G̃dB(k) and zk denote the spectral gain and the frequency, after frequency axis adjustment, corresponding to vector index k, respectively; zL and zU denote the two frequencies, lower (zL) and higher (zU), closest to zk on the frequency axis of the spectral gain vector; zMAX denotes the highest frequency of that frequency axis; and zL, zU, and zMAX correspond to the elements GdB(zL), GdB(zU), and GdB(zMAX) of the spectral gain vector, respectively.
(ii) Decibel-to-linear domain conversion: each element G̃dB(k) of the spectral gain vector after frequency axis adjustment is passed through an exponential function to obtain the LSG vector GJSGA:
$$G_{JSGA}(k) = 10^{0.1 \cdot \tilde{G}_{dB}(k)}$$
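The interpolation formula for the frequency axis adjustment is not reproduced above. The sketch below assumes linear interpolation between the two nearest frequencies zL and zU and holds the end values outside the covered range, in particular the gain at zMAX above the highest frequency. The function name and this boundary handling are assumptions made for illustration.

```python
def adjust_frequency_axis(z_gain, g_db, z_in):
    """Resample a gain vector (dB) onto the input spectrum's frequency axis.

    ASSUMED behaviour: linear interpolation between the two nearest
    frequencies z_L <= z_k <= z_U, holding the end values outside the range
    (in particular G_dB(z_MAX) above the highest frequency z_MAX).
    z_gain must be strictly increasing.
    """
    out = []
    for zk in z_in:
        if zk <= z_gain[0]:
            out.append(g_db[0])
        elif zk >= z_gain[-1]:
            out.append(g_db[-1])
        else:
            u = next(i for i, z in enumerate(z_gain) if z >= zk)
            z_l, z_u = z_gain[u - 1], z_gain[u]
            w = (zk - z_l) / (z_u - z_l)
            out.append((1.0 - w) * g_db[u - 1] + w * g_db[u])
    return out


# Hypothetical gain axis (Hz) and a denser input-spectrum axis.
g_tilde = adjust_frequency_axis([100.0, 200.0, 400.0], [0.0, 10.0, 20.0],
                                [50.0, 150.0, 200.0, 300.0, 800.0])
```

When the two frequency axes are identical the function returns the gains unchanged, which corresponds to omitting the step.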
The spectrum scaling sub-module 550 is used to pass the modified spectrum previously obtained to the BL model 240, and obtain a modified spectrum by scaling the input spectrum according to the LSG vector:
$$X_{MOD}(k) = G_{JSGA}(k) \cdot X_{IN}(k) \tag{11}$$
where XIN(k), GJSGA(k), and XMOD(k) denote the values of the input spectrum, the LSG vector, and the modified spectrum at vector index k, respectively.
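The decibel-to-linear conversion and the spectrum scaling of Eq. (11) can be sketched together as follows. The 0.1 exponent factor is taken as it appears in the text, and whether further scaling belongs in the exponent is not confirmed here; the function names and sample values are hypothetical.

```python
def db_to_linear(g_db):
    """Convert gains in dB to linear gains using the 0.1 exponent factor
    that appears in the text (an assumption about the exact convention)."""
    return [10.0 ** (0.1 * g) for g in g_db]


def scale_spectrum(x_in, g_lin):
    """Eq. (11): X_MOD(k) = G_JSGA(k) * X_IN(k)."""
    return [g * x for g, x in zip(g_lin, x_in)]


# Gains of 0 dB, +10 dB and -10 dB applied to a hypothetical 3-bin spectrum.
g_jsga = db_to_linear([0.0, 10.0, -10.0])
x_mod = scale_spectrum([1 + 1j, 2 + 0j, 4 + 0j], g_jsga)
```

Because the gains are real and positive, the scaling changes the magnitude of each spectrum component and leaves its phase untouched.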
In
Referring to paragraphs [0044] to [0050] and [0055], a modified spectrum previously obtained from the spectrum shaping sub-module 250 is passed to the BL model 240, and a BL spectrum is obtained with the BL model 240 by performing computations on a BTE profile and the modified spectrum previously obtained (step S4700). Further, because there is no data dependency between step S4700 and step S4200, step S4700 can also be executed before or in parallel with step S4200 without changing computation results.
Referring to paragraphs [0051] to [0055], a modified spectrum and a LSG vector are obtained with the SS sub-module 250 by performing computations on the input spectrum, the AL spectrum obtained from the AL model 230 and the BL spectrum obtained from the BL model 240 (step S4800).
The JSGA module 200 of the present invention performs computations on the input spectrum of each frame period, where the frame period is typically set between a few milliseconds and tens of milliseconds. With the current hardware capability, such computations can be easily performed more than once in this period. Therefore, the JSGA module 200 of the present invention can be modified to support iterative processing, that is, to perform more than one turn of computations of the BL model 240 and the SS sub-module 250 in one frame period, thereby reducing the value of each element of the loudness spectrum error vector.
The iterative processing is carried out in each frame period by either running a fixed number of iterations, or running iterations according to a weighted sum of the loudness spectrum error vector (hereinafter referred to as loudness spectrum difference). The latter is employed in the embodiments presented below.
By determining whether or not to continue the iterative processing according to the loudness spectrum difference, iterations mainly occur in the frame periods with relatively large loudness spectrum fluctuations over time. Because such frame periods occur infrequently, this approach keeps the average number of iterations per frame low while maintaining the quality of loop convergence.
To conduct iterative processing, the frame operation flow of the JSGA module 200 is changed to the flowchart of the variant of iterative processing of the JSGA module of the present invention shown in
Next, the steps of
Then, whether or not to continue the iterative processing is determined (step S4826). If the loudness spectrum difference is excessive and the iteration count does not exceed the iteration count limit, the iteration count is advanced (step S4828) and the processing flow is continued from step S4700 of
The criterion of excessive loudness spectrum difference in step S4826 is:
where RERR denotes the threshold of the loudness spectrum difference, LBARE(z), LAIDED(z), and S(z) denote the values of the BL spectrum, the AL spectrum, and a weighting vector at the frequency z, respectively. In practice, the weighting S(z) of the frequency in the hearing insensitive region or the frequency with the spectral gain reaching the upper limit can be reduced to relax this criterion to reduce the average number of iterations. In a binaural system or application, the iterative processing of the JSGA module 200 is still performed with the flow of
In each iteration, the BL spectrum, the LSG vector, and the modified spectrum are obtained in that order. If the loudness spectrum difference falls below the threshold RERR before the iteration count reaches the limit, the criterion of loop convergence is met, and the computations corresponding to the next frame period can be performed accordingly.
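The per-frame iteration described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the exact form of the criterion is not reproduced in this text, so the sketch assumes a weighted sum of absolute differences between the BL spectrum and the AL spectrum. The callables `bl_model` and `ss_submodule` and all parameter names are hypothetical stand-ins for the BL model 240 and the SS sub-module 250.

```python
from typing import Callable, List, Sequence, Tuple

Spectrum = List[float]


def loudness_spectrum_difference(l_bare: Sequence[float],
                                 l_aided: Sequence[float],
                                 weights: Sequence[float]) -> float:
    """Weighted sum of the loudness spectrum error vector.

    ASSUMPTION: the error at each frequency is taken as an absolute difference.
    """
    return sum(s * abs(b - a) for s, b, a in zip(weights, l_bare, l_aided))


def run_frame_iterations(
    input_spectrum: Sequence[float],
    l_aided: Sequence[float],
    weights: Sequence[float],
    bl_model: Callable[[Sequence[float]], Spectrum],
    ss_submodule: Callable[..., Tuple[Spectrum, Spectrum]],
    r_err: float,
    iteration_limit: int,
) -> Tuple[Spectrum, Spectrum, int]:
    """Iterate BL model and spectral shaping within one frame period."""
    modified = list(input_spectrum)
    gains = [1.0] * len(modified)
    count = 0
    while True:
        count += 1
        # Each iteration obtains the BL spectrum, then the LSG vector
        # and the modified spectrum, in that order.
        l_bare = bl_model(modified)
        gains, modified = ss_submodule(input_spectrum, l_aided, l_bare, gains)
        difference = loudness_spectrum_difference(l_bare, l_aided, weights)
        # Stop when the criterion of loop convergence is met
        # or the iteration count limit is reached.
        if difference < r_err or count >= iteration_limit:
            return modified, gains, count
```

Lowering an element of `weights` for a hearing-insensitive frequency, or for a frequency whose spectral gain has reached its upper limit, relaxes the criterion in the way the text describes and tends to reduce the number of iterations.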
To simplify texts and figures, iterative processing is not mentioned in flowcharts and text corresponding to the following embodiments of the JSGA module of the present invention. While the operation flow of each embodiment can be modified as
The NR processing aims to suppress the noise in the sound based on the difference in characteristics between noise and speech, with the goal of increasing the audibility or intelligibility of the sound. By attenuating the spectral components with relatively low signal-to-noise ratios, the NR processing reduces the total noise power and improves the overall signal-to-noise ratio (hereinafter abbreviated as SNR) of the sound.
The NR sub-module 1300 is used to obtain a NR spectrum and a SQ vector of monaural type by performing NR processing on the input spectrum obtained from the FWA unit 120. In a binaural system or application, the NR sub-module 1300 is used to obtain a NR spectrum and a SQ vector of binaural type by performing NR processing on the left-ear input spectrum and the right-ear input spectrum of the input spectrum obtained from the FWA unit 120 separately.
The noise estimation sub-module 1310 is used to obtain a noise estimation vector by estimating the noise component of the input spectrum at each frequency. In the variants of the JSGA module of the present invention described below, if the AL spectrum is the input of the NR sub-module 1300, the noise estimation sub-module 1310 is used to obtain a noise estimation vector by estimating the noise component of the AL spectrum at each frequency.
In the signal estimation sub-module 1320, the input spectrum and the noise estimation vector are used to estimate a signal-to-noise ratio of each frequency (hereinafter referred to as a SNR estimation vector), and a NR spectrum is obtained by adjusting the input spectrum according to the SNR estimation vector. If the AL spectrum is the input of the NR sub-module 1300, the noise estimation vector and the AL spectrum are used to estimate a SNR estimation vector, and a noise reduction loudness (hereinafter abbreviated as NRL) spectrum is obtained by adjusting the AL spectrum according to the SNR estimation vector. For the signal processing of noise estimation, signal estimation and SNR estimation, refer to reference document 2, which introduces the design considerations, implementation details, and performance of various kinds of NR processing for speech enhancement.
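As a rough illustration of the signal estimation step, the sketch below estimates a per-frequency SNR from an amplitude spectrum and a noise estimation vector, then attenuates low-SNR components. The text leaves the actual estimator to reference document 2; the power spectral subtraction, the Wiener-style gain, the gain floor, and all function and parameter names here are assumptions chosen only to make the data flow concrete.

```python
from typing import List, Sequence, Tuple


def noise_reduce(amplitude: Sequence[float],
                 noise_estimate: Sequence[float],
                 gain_floor: float = 0.1) -> Tuple[List[float], List[float]]:
    """Return (NR spectrum, SNR estimation vector) for one frame.

    ASSUMPTIONS: SNR is estimated by power spectral subtraction and the
    adjustment is a Wiener-style gain snr / (1 + snr), limited from below
    by gain_floor to avoid fully muting a component.
    """
    nr_spectrum: List[float] = []
    snr_vector: List[float] = []
    for x, n in zip(amplitude, noise_estimate):
        noise_power = n * n
        signal_power = max(x * x - noise_power, 0.0)
        if noise_power > 0.0:
            snr = signal_power / noise_power
            gain = max(snr / (1.0 + snr), gain_floor)
        else:
            # No estimated noise at this frequency: pass it through unchanged.
            snr = float("inf")
            gain = 1.0
        snr_vector.append(snr)
        nr_spectrum.append(x * gain)
    return nr_spectrum, snr_vector
```

The same function could be applied to an AL spectrum in place of the amplitude spectrum, in which case its first output plays the role of the NRL spectrum.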
The SQ estimation sub-module 1330 is used to convert the SNR estimation vector into a SQ vector (i.e. the signal quality estimation of each frequency) to provide the signal quality information required by the subsequent processing, such as the SS sub-module 250. The conversion, for example, is to pass each element of the SNR estimation vector through a monotonic function to obtain the SQ vector. The monotonic function shown in
In
In addition, the SQ vector obtained from the NR sub-module 1300 is passed to the SS sub-module 250. Referring to
Referring to paragraphs [0032] to [0035], [0044] to [0050], the AL spectrum is obtained with the AL model 230 by performing computations on the ATE profile obtained by the fitting procedure 210 and the NR spectrum obtained from the NR sub-module 1300 (step S4202). Since step S4700 of
Referring to paragraphs [0051] to [0055], the modified spectrum and the LSG vector are obtained with the SS sub-module 250 by performing computations on the input spectrum, the AL spectrum obtained from the AL model 230, the BL spectrum obtained from the BL model 240, and the SQ vector obtained from the NR sub-module 1300 (step S4802).
Referring to
Referring to paragraphs [0051] to [0055], the modified spectrum and the LSG vector are obtained with the SS sub-module 250 by performing computations on the NR spectrum and the SQ vector obtained from the NR sub-module 1300, the AL spectrum obtained from the AL model 230, and the BL spectrum obtained from the BL model 240 (step S4803). Since steps S4202 and S4700 of
Owing to the high statistical correlation and identical value range (positive values or zeros) between loudness spectra and the amplitude of acoustic spectra, frequency-domain NR processing designed for the amplitude of acoustic spectra can also be performed on loudness spectra, although the resulting sound effects differ. Performing NR processing on loudness spectra associates the NR processing with the hearing model of the listener, which produces an effect similar to that of the perceptual-based NR processing in reference document 2, which operates in the acoustic spectrum domain. Nonetheless, since the information of the input sound is partially lost, the loudness spectra are not suitable for directly reconstructing the waveform. In this variant of the JSGA module of the present invention, the NRL spectrum is passed to the spectral shaping sub-module 250, thereby feeding the noise-reduced information back to adjust the spectral gain so that the NR processing is performed in an indirect way.
The AL spectrum obtained from the AL model 230 is passed to the NR sub-module 1300 in place of the input spectrum obtained from the FWA unit 120 of
The NRL spectrum becomes the input of the SS sub-module 250 in place of the AL spectrum of
Referring to paragraphs [0051] to [0055], the modified spectrum and the LSG vector are obtained with the SS sub-module 250 by performing computations on the SQ vector and the NRL spectrum obtained from the NR sub-module 1300, the BL spectrum obtained from the BL model 240, and the input spectrum (step S4804). Since steps S4200 and S4700 of
The meaning and effect of performing DRC processing on a loudness spectrum (also referred to as loudness spectrum compression) are different from those of performing DRC processing on an acoustic spectrum. In the JSGA module of the present invention, since the listener's hearing loss and the noise issues have been dealt with by the aforementioned sub-modules, the compression characteristics used in the loudness spectrum compression sub-module 800 can be configured according to the listener's preference rather than the hearing loss condition. Single-channel loudness spectrum compression is therefore applicable even for listeners whose amounts of threshold elevation differ greatly across frequencies.
The present invention argues that, in a binaural system or application, the audio processing should keep the loudness ratio between the two ears at each channel unchanged to reduce the impact on binaural sound localization and related functions. Based on this argument, a CL spectrum of binaural type is obtained with the loudness spectrum compression sub-module 800 by performing DRC processing on the left-ear AL spectrum and the right-ear AL spectrum of the AL spectrum in the same way, that is, the loudness spectra corresponding to the two ears in the frequency range of each channel are both scaled by the same value, referred to as the channel loudness gain.
The channel loudness calculation sub-module 810 is used to obtain a channel loudness corresponding to the channel or each of the plurality of the channels by performing integration on the AL spectrum over the channel frequency range (since the loudness spectrum is represented by finite elements, the integration is represented as a summation):
$$L_{CH} = \sum_{z=z_{CH\_L}(CH)}^{z_{CH\_U}(CH)} L_{AIDED}(z)\cdot\Delta z \tag{13}$$
where CH denotes the channel index corresponding to the channel frequency between zCH_L(CH) and zCH_U(CH), LAIDED (z) and Δz denote the values of the AL spectrum and the reciprocal of the number of the loudness spectrum elements per unit frequency at frequency z, respectively. In a binaural system or application, the channel loudness is calculated as:
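$$L_{CH} = \sum_{z=z_{CH\_L}(CH)}^{z_{CH\_U}(CH)} \left(L_{AIDED,L}(z) + L_{AIDED,R}(z)\right)\cdot\Delta z \tag{14}$$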
where LAIDED,L(z) and LAIDED,R(z) denote the values of the left-ear AL spectrum and the right-ear AL spectrum of the AL spectrum at the frequency z, respectively, and other notations are as aforementioned.
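The channel loudness summation can be sketched as below. The sketch assumes the AL spectrum is stored as a list of uniformly spaced elements and that the channel boundaries are given as inclusive element indices; the function names are illustrative and do not come from the source.

```python
from typing import Sequence


def channel_loudness(l_aided: Sequence[float],
                     z_lower: int,
                     z_upper: int,
                     dz: float) -> float:
    """Monaural channel loudness: sum of AL spectrum elements over the
    channel range [z_lower, z_upper] (inclusive indices), times dz."""
    return sum(l_aided[z_lower:z_upper + 1]) * dz


def channel_loudness_binaural(l_aided_left: Sequence[float],
                              l_aided_right: Sequence[float],
                              z_lower: int,
                              z_upper: int,
                              dz: float) -> float:
    """Binaural channel loudness: the left-ear and right-ear AL spectra
    are added element by element before the summation."""
    return sum(
        left + right
        for left, right in zip(l_aided_left[z_lower:z_upper + 1],
                               l_aided_right[z_lower:z_upper + 1])
    ) * dz
```

For example, with `l_aided = [1.0, 2.0, 3.0, 4.0]`, a channel covering elements 1 to 2, and `dz = 0.5`, the monaural channel loudness is (2.0 + 3.0) × 0.5 = 2.5.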
The compression characteristic substitution sub-module 820 is used to obtain a channel loudness gain GCH corresponding to the channel or each of the plurality of channels, which is the ratio between the compressed channel loudness and the original channel loudness LCH corresponding to the channel or each of the plurality of channels, by substituting the channel loudness LCH corresponding to the channel or each of the plurality of channels into the channel compression characteristics corresponding to the channel or each of the plurality of channels. A channel compression characteristic shown in
The loudness spectrum scaling sub-module 830 is used to obtain a CL spectrum by scaling the AL spectrum with the channel loudness gain corresponding to the channel or each of the plurality of channels in the corresponding frequency range:
$$L_{CMP}(z) = L_{AIDED}(z)\cdot G_{CH}, \quad z_{CH\_L}(CH) \le z \le z_{CH\_U}(CH) \tag{15}$$
where LCMP(z) denotes the value of the CL spectrum at the frequency z, and other notations are as aforementioned. In a binaural system or application, the CL spectrum is calculated as:
$$L_{CMP,L}(z) = L_{AIDED,L}(z)\cdot G_{CH}, \quad L_{CMP,R}(z) = L_{AIDED,R}(z)\cdot G_{CH}, \quad z_{CH\_L}(CH) \le z \le z_{CH\_U}(CH) \tag{16}$$
where LCMP,L(z) and LCMP,R(z) denote the values of the left-ear CL spectrum and the right-ear CL spectrum of the CL spectrum at frequency z, respectively, and other notations are as aforementioned.
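The gain lookup and scaling steps can be illustrated with the sketch below. The channel compression characteristic itself is defined by a figure that is not reproduced here, so the knee-and-ratio curve in `channel_loudness_gain` is purely a hypothetical example; only the scaling of the AL spectrum by a per-channel gain follows the text directly. Channel ranges are assumed to be inclusive element indices.

```python
from typing import List, Sequence, Tuple


def channel_loudness_gain(l_ch: float, knee: float, ratio: float) -> float:
    """Ratio between compressed and original channel loudness.

    ASSUMPTION: a hypothetical characteristic that leaves loudness at or
    below `knee` unchanged and compresses loudness above it by `ratio`.
    """
    if l_ch <= knee or l_ch <= 0.0:
        return 1.0
    compressed = knee * (l_ch / knee) ** (1.0 / ratio)
    return compressed / l_ch


def scale_loudness_spectrum(l_aided: Sequence[float],
                            channels: Sequence[Tuple[int, int]],
                            gains: Sequence[float]) -> List[float]:
    """Scale each AL spectrum element by the gain of the channel whose
    inclusive index range (z_lower, z_upper) contains it."""
    l_cmp = list(l_aided)
    for (z_lower, z_upper), g_ch in zip(channels, gains):
        for z in range(z_lower, z_upper + 1):
            l_cmp[z] = l_aided[z] * g_ch
    return l_cmp
```

In a binaural configuration, calling `scale_loudness_spectrum` on the left-ear and right-ear AL spectra with the same `gains` leaves the loudness ratio between the two ears unchanged in every channel, which is the property the text asks for.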
The CL spectrum is passed to the SS sub-module 250 in place of the AL spectrum obtained from the AL model 230 of
Referring to paragraphs [0051] to [0055], the modified spectrum and the LSG vector are obtained with the SS sub-module 250 by performing computations on the CL spectrum obtained from the loudness spectrum compression sub-module 800, the input spectrum, and the BL spectrum obtained from the BL model 240 (step S4806). Since steps S4200 and S4700 of
Transient sounds are sounds that have dramatic volume changes in the time domain, such as airs or consonants in speech, burst noise and interference sound in the living environment, and sounds introduced by audio processing. An example of the latter is that combined NR and DRC processing can make part of the sound more prominent: the NR processing increases the dynamic range of the sound, while the subsequent dynamic range compression adjusts the noise-reduced sound according to its average volume. At the moment a sound suddenly appears from a lower-volume (e.g. denoised) background, the dynamic range compression is still providing the gain for the lower-volume background, which makes the sound louder and may even cause discomfort to the listener.
On the other hand, transient sounds such as percussion and blasting sounds may be related to safety. Hence detecting and removing transient sounds is not a widely applicable strategy. Unlike conventional transient sound processing, which operates on the sound waveform or its spectrum, the present invention proposes to reduce the total loudness of the sound just enough to avoid listening discomfort by proportionally adjusting the elements of the AL spectrum. Such processing is referred to as attack trimming (hereinafter abbreviated as AT).
The AT sub-module 1100 is used to obtain a trimmed loudness (hereinafter abbreviated as TL) spectrum of monaural type by performing AT processing on the AL spectrum obtained from the AL model 230. In a binaural system or application, the AT sub-module 1100 is used to obtain a TL spectrum of binaural type by performing AT processing on both the left-ear AL spectrum and the right-ear AL spectrum of the AL spectrum.
The total loudness calculation sub-module 1110 is used to obtain a total loudness LTOTAL by performing integration on the AL spectrum over frequency:
$$L_{TOTAL} = \sum_{z} L_{AIDED}(z)\cdot\Delta z \tag{17}$$
where LAIDED(z) and Δz denote the values of the AL spectrum and the reciprocal of the number of the AL spectrum elements per unit frequency at frequency z, respectively. In a binaural system or application, the total loudness is calculated as:
$$L_{TOTAL} = \sum_{z} \left(L_{AIDED,L}(z) + L_{AIDED,R}(z)\right)\cdot\Delta z \tag{18}$$
where LAIDED,L(z) and LAIDED,R(z) denote the values of the left-ear AL spectrum and the right-ear AL spectrum of the AL spectrum at the frequency z, respectively, and other notations are as aforementioned.
The loudness upper-bound estimation sub-module 1120 is used to derive a loudness bound of comfortable listening LBOUND according to the total loudness obtained from the total loudness calculation sub-module 1110, for example, by performing time smoothing on the total loudness to obtain a long-term loudness LLm of the present frame period m, and deriving the loudness bound of comfortable listening according to the long-term loudness:
where LLm−1 denotes the long-term loudness of the previous frame period m−1, CATT,LL and CREL,LL denote the leaky factors of the smoothing operation on the rising and falling of the long-term loudness, respectively, CHEADROOM denotes the instantaneous loudness rising ratio acceptable by the listener, LUCL denotes the setting of a loudness value that makes the listener feel very loud, and other notations are as aforementioned. In a binaural system or application, this sub-module operates in the same way as in a monaural system or application.
The loudness limiting sub-module 1130 is used to derive a rate according to the total loudness obtained from the total loudness calculation sub-module 1110 and the loudness bound of comfortable listening obtained from the loudness upper-bound estimation sub-module 1120, and to obtain a TL spectrum by scaling down the AL spectrum with the rate:
where LTRIM(z) denotes the value of the TL spectrum at the frequency z, and other notations are as aforementioned. In a binaural system or application, the TL spectrum is calculated as:
where LTRIM,L(z) and LTRIM,R(z) denote the values of the left-ear TL spectrum and the right-ear TL spectrum of the TL spectrum at frequency z, respectively, and other notations are as aforementioned.
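The attack trimming chain (total loudness, long-term loudness, loudness bound, trimming rate) can be sketched as follows. The equations for the smoothing, the bound, and the rate are not reproduced in this text, so every formula below other than the total loudness summation is an assumption: a one-pole leaky smoother with separate rising and falling factors, a bound equal to the smaller of `c_headroom` times the long-term loudness and `l_ucl`, and a rate equal to the bound divided by the total loudness, capped at 1. Function names are illustrative.

```python
from typing import List, Sequence, Tuple


def total_loudness(l_aided: Sequence[float], dz: float) -> float:
    """Total loudness: sum of the AL spectrum over frequency, times dz."""
    return sum(l_aided) * dz


def update_long_term_loudness(l_total: float, ll_prev: float,
                              c_att: float, c_rel: float) -> float:
    """ASSUMED one-pole leaky smoothing: c_att is used while the loudness
    rises above the previous long-term value, c_rel while it falls."""
    c = c_att if l_total > ll_prev else c_rel
    return c * ll_prev + (1.0 - c) * l_total


def loudness_bound(ll: float, c_headroom: float, l_ucl: float) -> float:
    """ASSUMED bound: acceptable instantaneous rise over the long-term
    loudness, never exceeding the 'very loud' setting l_ucl."""
    return min(c_headroom * ll, l_ucl)


def trim_loudness_spectrum(l_aided: Sequence[float], l_total: float,
                           l_bound: float) -> Tuple[List[float], float]:
    """Scale the AL spectrum down proportionally so that its total
    loudness does not exceed l_bound (ASSUMED rate formula)."""
    rate = min(1.0, l_bound / l_total) if l_total > 0.0 else 1.0
    return [v * rate for v in l_aided], rate
```

Because every element is multiplied by the same rate, the shape of the loudness spectrum is preserved. In a binaural configuration the total loudness would be computed from the sum of both ears, and the same rate would then be applied to the left-ear and right-ear AL spectra.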
The TL spectrum is passed to the SS sub-module 250 in place of the AL spectrum of
Referring to paragraphs [0051] to [0055], the LSG vector and the modified spectrum are obtained with the SS sub-module 250 by performing computations on the TL spectrum obtained from the AT sub-module 1100, the BL spectrum obtained from the BL model 240, and the input spectrum (step S4808). Since steps S4200 and S4700 of
The CL spectrum obtained from the loudness spectrum compression sub-module 800 of
The TL spectrum obtained from the AT sub-module 1100 of
Referring to paragraphs [0105] to [0108], a TL spectrum is obtained by performing AT processing on the CL spectrum obtained from the loudness spectrum compression sub-module 800 with the AT sub-module 1100. The TL spectrum is passed to the SS sub-module 250 (step S4602). Referring to paragraphs [0051] to [0055], the LSG vector and the modified spectrum are obtained with the SS sub-module 250 by performing computations on the TL spectrum obtained from the AT sub-module 1100, the BL spectrum obtained from the BL model 240, and the input spectrum (step S4808). Since steps S4200 and S4700 of
Generally speaking, frequency-domain NR processing is suitable for suppressing steady noise in speech rather than transient-type noise in speech. When DRC processing is performed after NR processing, their interaction makes the transient-type noise in speech prominent. The following variants of the JSGA module 200 of the present invention further integrate a NR sub-module 1300, a loudness spectrum compression sub-module 800, and an AT sub-module 1100. The purpose is to limit the amount of instantaneous change in loudness while performing both the NR processing and the DRC processing, thereby improving the sound quality perceived by the listener by reducing the interaction between the algorithms.
Referring to paragraphs [0072] to [0075], a NR spectrum and a SQ vector are obtained by performing NR processing on the input spectrum obtained from the FWA unit 120 with the NR sub-module 1300. The NR spectrum is passed to the AL model 230. The SQ vector is passed to the SS sub-module 250. Referring to paragraphs [0044] to [0050], the AL spectrum is obtained with the AL model 230 by performing computations on the ATE profile and the NR spectrum.
The TL spectrum obtained from the AT sub-module 1100 of
Referring to paragraphs [0032] to [0035], [0044] to [0050], the AL spectrum is obtained with the AL model 230 by performing computations on the ATE profile obtained by the fitting procedure 210 and the NR spectrum obtained from the NR sub-module 1300 (step S4202).
Referring to paragraphs [0051] to [0055], the LSG vector and the modified spectrum are obtained with the SS sub-module 250 by performing computations on the SQ vector, the TL spectrum, the BL spectrum, and the input spectrum (step S4812). Since steps S4700, S4502, and S4602 of
Referring to
Referring to paragraphs [0051] to [0055], the LSG vector and the modified spectrum are obtained with the SS sub-module 250 by performing computations on the NR spectrum, the SQ vector, the BL spectrum, and the TL spectrum (step S4805). Since steps S4202, S4700, S4502, and S4602 of
The AL spectrum obtained from the AL model 230 is passed to the NR sub-module 1300. Referring to paragraphs [0072] to [0075], a NRL spectrum and a SQ vector are obtained by performing NR processing on the AL spectrum with the NR sub-module 1300. The NRL spectrum is passed to the loudness spectrum compression sub-module 800. The SQ vector is passed to the SS sub-module 250.
Referring to
Referring to
Referring to paragraphs [0094] to [0097], the CL spectrum is obtained by performing loudness spectrum compression on the NRL spectrum with the loudness spectrum compression sub-module 800. The CL spectrum is passed to the AT sub-module 1100 (step S4506).
Referring to paragraphs [0051] to [0055], the LSG vector and the modified spectrum are obtained with the SS sub-module 250 by performing computations on the SQ vector, the TL spectrum, the BL spectrum, and the input spectrum (step S4812). Since steps S4200, S4700, and S4602 of
The ADC unit 110 is used to obtain a DI signal by performing sampling on an AI signal at a time period. The AI signal and the DI signal are of monaural type. The time period is referred to as the sampling period.
The analysis filter bank 1810 is used to obtain a plurality of sub-band signals of monaural type by performing sub-band filtering on the DI signal obtained from the ADC unit 110, that is, passing the DI signal through each of a plurality of sub-band filters of the filter bank.
The frequency responses of the sub-band filters of the analysis filter bank, as shown in
The sub-band snapshot unit 1820 is used to obtain an input spectrum of each time interval by performing simultaneous sampling on each sub-band signal obtained from the analysis filter bank 1810 at a time interval and ranking simultaneously sampled values according to their corresponding sub-band center frequencies. The input spectrum and simultaneously sampled values are of monaural type.
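The snapshot operation can be sketched as below: every `interval` samples, one value is taken from each sub-band signal at the same sample index, and the values are ordered by sub-band center frequency. The data layout (one list of samples per sub-band) and the names `sub_band_snapshots`, `center_freqs`, and `interval` are assumptions made for illustration.

```python
from typing import List, Sequence


def sub_band_snapshots(sub_bands: Sequence[Sequence[float]],
                       center_freqs: Sequence[float],
                       interval: int) -> List[List[float]]:
    """Return one input spectrum per time interval.

    sub_bands[k] holds the samples of the k-th sub-band signal and
    center_freqs[k] its center frequency. All sub-bands are sampled at
    the same instants, then ranked by ascending center frequency.
    """
    if interval < 1:
        raise ValueError("interval must be a positive number of samples")
    # Sub-band indices ranked by center frequency.
    order = sorted(range(len(sub_bands)), key=lambda k: center_freqs[k])
    length = min(len(band) for band in sub_bands)
    return [[sub_bands[k][n] for k in order]
            for n in range(0, length, interval)]
```

Passing a larger `interval` produces fewer input spectra per second and therefore fewer invocations of the downstream gain computation.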
Referring to block diagrams and related descriptions of the JSGA module and its variants of
The sub-band signal combining unit 1830 is used to obtain a DO signal of monaural type by performing weighted combining on the sub-band signals obtained from the analysis filter bank 1810 according to the LSG vector corresponding to each sampling period:
where n denotes the index of the sampling period, F denotes the number of sub-bands of the filter bank, y(n) and xk(n) denote the DO signal and the k-th sub-band signal of the sampling period n, respectively, and GJSGA(n,k) denotes the k-th sub-band gain of the LSG vector obtained from the JSGA module 200 corresponding to the sampling period n (for example, the LSG vector most recently obtained with the JSGA module 200 before the sampling period n).
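The weighted combining can be sketched as follows, assuming it is a per-sample sum over the sub-bands of each sub-band sample multiplied by its gain. The sketch also assumes that a new LSG vector becomes available every `interval` samples and is held until the next one arrives; this holding scheme and the names used are illustrative, not taken from the source.

```python
from typing import List, Sequence


def combine_sub_bands(sub_bands: Sequence[Sequence[float]],
                      lsg_vectors: Sequence[Sequence[float]],
                      interval: int) -> List[float]:
    """Weighted combination of sub-band signals into the output signal.

    sub_bands[k][n] is the k-th sub-band signal at sampling period n.
    lsg_vectors[i][k] is the k-th sub-band gain of the i-th LSG vector,
    ASSUMED to apply from sample i * interval until the next vector.
    """
    length = min(len(band) for band in sub_bands)
    output: List[float] = []
    for n in range(length):
        # Use the most recently obtained LSG vector for this sample.
        i = min(n // interval, len(lsg_vectors) - 1)
        gains = lsg_vectors[i]
        output.append(sum(g * band[n] for g, band in zip(gains, sub_bands)))
    return output
```

Because the gains are applied directly to the sub-band signals, no synthesis transform is needed to obtain the output in this arrangement.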
The DAC unit 150 is used to convert the DO signal obtained from the sub-band signal combining unit 1830 into an AO signal of monaural type at the sampling period.
In the second embodiment, a DI signal is obtained with the ADC unit 110 by performing sampling on an AI signal at a time period. The AI signal and the DI signal are of monaural type. The time period is called a sampling period (step S3000).
Referring to paragraphs [0137] to [0138], a plurality of sub-band signals of monaural type are obtained with the analysis filter bank 1810 by performing sub-band filtering on the DI signal obtained from the ADC unit 110 (step S3102).
An input spectrum of each time interval is obtained with the sub-band snapshot unit 1820 by performing simultaneous sampling on each sub-band signal obtained from the analysis filter bank 1810 at a time interval and ranking simultaneously sampled values according to their corresponding sub-band center frequencies. The input spectrum and simultaneously sampled values are of monaural type (step S3150).
Referring to flowcharts and descriptions of the JSGA module and its variants of
Referring to paragraph [0141], a DO signal of monaural type is obtained with the sub-band signal combining unit 1830 by performing weighted combining on the sub-band signals obtained from the analysis filter bank 1810 according to the LSG vector corresponding to each sampling period (step S3302).
The DO signal obtained from the sub-band signal combining unit 1830 is converted into an AO signal of monaural type at the sampling period with the DAC unit 150 (step S3402).
Moreover, the audio processing system 102 equipped with the filter bank according to the second embodiment offers the design flexibility that the time interval of the sub-band snapshot unit 1820 can be dynamically adjusted. Hence it is possible to detect the signal dynamics and lengthen the time interval in a quiet environment or under slowly varying input conditions, to reduce the computations of the JSGA module.
The following illustrates how the JSGA module of the present invention is applied to binaural systems. Similar to cases of monaural systems of previous embodiments, the JSGA module can be applied to binaural systems employing the AMS framework and binaural systems employing filter banks.
The ADC unit 110 is used to obtain a DI signal by performing sampling on an AI signal at a time period. The AI signal and the DI signal are of binaural type. The time period is referred to as the sampling period.
Referring to paragraphs [0021] to [0022], the FWA unit 120 is used to obtain an input spectrum of each frame period by performing framing and waveform analysis on the left-ear DI signal and the right-ear DI signal of the DI signal obtained from the ADC unit 110, wherein the input spectrum of each frame period is of binaural type.
Referring to block diagrams and descriptions of the JSGA module and its variants of
Referring to paragraph [0024], the waveform synthesis unit 140 is used to obtain a DO signal of binaural type by performing waveform synthesis on the left-ear modified spectrum and the right-ear modified spectrum of the modified spectrum obtained from the JSGA module 200.
The DAC unit 150 is used to convert the DO signal obtained from the waveform synthesis unit 140 into an AO signal of binaural type at the sampling period.
In the third embodiment, a DI signal is obtained with the ADC unit 110 by performing sampling on an AI signal at a time period. The AI signal and the DI signal are of binaural type. The time period is called a sampling period (step S3010).
Referring to paragraphs [0021] to [0022] and [0154], an input spectrum of each frame period is obtained with the FWA unit 120 by performing framing and waveform analysis on the DI signal obtained from the ADC unit 110, wherein the input spectrum of each frame period is of binaural type (step S3110).
Referring to flowcharts and descriptions of the JSGA module and its variants of
Referring to paragraphs [0024] and [0156], a DO signal of binaural type is obtained with the waveform synthesis unit 140 by performing waveform synthesis on the modified spectrum obtained from the JSGA module 200 (step S3310).
The DO signal obtained from the waveform synthesis unit 140 is converted into an AO signal of binaural type at the sampling period with the DAC unit 150 (step S3410).
The ADC unit 110 is used to obtain a DI signal by performing sampling on an AI signal at a time period. The AI signal and the DI signal are of binaural type. The time period is referred to as the sampling period.
Referring to paragraphs [0137] and [0138], the analysis filter bank 1810 is used to obtain a plurality of sub-band signals of binaural type by performing sub-band filtering on the left-ear DI signal and the right-ear DI signal of the DI signal obtained from the ADC unit 110 separately.
The sub-band snapshot unit 1820 is used to obtain an input spectrum of each time interval by performing simultaneous sampling on each sub-band signal obtained from the analysis filter bank 1810 at a time interval and ranking simultaneously sampled values according to their corresponding sub-band center frequencies. The input spectrum of each time interval and the simultaneously sampled values are of binaural type.
Referring to block diagrams and descriptions of the JSGA module and its variants of
Referring to paragraph [0141], the sub-band signal combining unit 1830 is used to obtain a DO signal of binaural type by performing weighted combining on the left-ear sub-band signals and the right-ear sub-band signals of the sub-band signals obtained from the analysis filter bank 1810 according to the left-ear LSG vector and the right-ear LSG vector of the LSG vector corresponding to each sampling period, respectively.
The DAC unit 150 is used to convert the DO signal obtained from the sub-band signal combining unit 1830 into an AO signal of binaural type at the sampling period.
In the fourth embodiment, a DI signal is obtained with the ADC unit 110 by performing sampling on an AI signal at a time period. The AI signal and the DI signal are of binaural type. The time period is called a sampling period (step S3010).
Referring to paragraphs [0137] to [0138] and [0166], a plurality of sub-band signals of binaural type are obtained with the analysis filter bank 1810 by performing sub-band filtering on the DI signal obtained from the ADC unit 110 (step S3112).
Referring to paragraph [0167], an input spectrum of each time interval is obtained with the sub-band snapshot unit 1820 by performing simultaneous sampling on each sub-band signal obtained from the analysis filter bank 1810 at a time interval and ranking simultaneously sampled values according to their corresponding sub-band center frequencies. The input spectrum of each time interval and the simultaneously sampled values are of binaural type (step S3160).
Referring to flowcharts and descriptions of the JSGA module and its variants of
Referring to paragraphs [0141] and [0169], a DO signal of binaural type is obtained with the sub-band signal combining unit 1830 by performing weighted combining on the sub-band signals obtained from the analysis filter bank 1810 according to the LSG vector corresponding to each sampling period (step S3312).
The DO signal obtained from the sub-band signal combining unit 1830 is converted into an AO signal of binaural type at the sampling period with the DAC unit 150 (step S3412).
Although the present invention has been described above with reference to the preferred embodiments and the accompanying drawings, it shall not be considered as limited thereto. Those skilled in the art can make various modifications, omissions and changes to the details of the embodiments of the present invention without departing from the scope of the claims of the invention.
Number | Date | Country | Kind |
---|---|---|---|
107139003 | Nov 2018 | TW | national |