Noise suppression circuit

Information

  • Patent Grant
  • 6647367
  • Patent Number
    6,647,367
  • Date Filed
    Monday, August 19, 2002
  • Date Issued
    Tuesday, November 11, 2003
Abstract
An adaptive noise suppression system includes an input A/D converter, an analyzer, a filter, and an output D/A converter. The analyzer includes both feed-forward and feedback signal paths that allow it to compute a filtering coefficient, which is input to the filter. In these paths, feed-forward signals are processed by a signal-to-noise ratio estimator, a normalized coherence estimator, and a coherence mask. Also, feedback signals are processed by an auditory mask estimator. These two signal paths are coupled together via a noise suppression filter estimator. A method according to the present invention includes active signal processing to preserve speech-like signals and suppress incoherent noise signals. After a signal is processed in the feed-forward and feedback paths, the noise suppression filter estimator then outputs a filtering coefficient signal to the filter for filtering the noise out of the speech and noise digital signal.
Description




BACKGROUND OF THE INVENTION




1. Field of the Invention




The present invention is in the field of voice coding. More specifically, the invention relates to a system and method for signal enhancement in voice coding that uses active signal processing to preserve speech-like signals and suppresses incoherent noise signals.




2. Description of the Related Art




The emergence of wireless telephony and data terminal products has enabled users to communicate with anyone from almost anywhere. Unfortunately, current products do not perform equally well in many of these environments, and a major source of performance degradation is ambient noise. Further, for safe operation, many of these hand-held products need to offer hands-free operation, and here in particular, ambient noise poses a serious obstacle to the development of acceptable solutions.




Today's wireless products typically use digital modulation techniques to provide reliable transmission across a communication network. The conversion from analog speech to a compressed digital data stream is, however, very error prone when the input signal contains moderate to high ambient noise levels. This is largely due to the fact that the conversion/compression algorithm (the vocoder) assumes the input signal contains only speech. Further, to achieve the high compression rates required in current networks, vocoders must employ parametric models of noise-free speech. The characteristics of ambient noise are poorly captured by these models. Thus, when ambient noise is present, the parameters estimated by the vocoder algorithm may contain significant errors and the reconstructed signal often sounds unlike the original. For the listener, the reconstructed speech is typically fragmented, unintelligible, and contains voice-like modulation of the ambient noise during silent periods. If vocoder performance under these conditions is to be improved, noise suppression techniques tailored to the voice coding problem are needed.




Current telephony and wireless data products are generally designed to be hand held, and it is desirable that these products be capable of hands-free operation. By hands-free operation what is meant is an interface that supports voice commands for controlling the product, and which permits voice communication while the user is in the vicinity of the product. To develop these hands-free products, current designs must be supplemented with a suitably trained voice recognition unit. Like vocoders, most voice recognition methods rely on parametric models of speech and human conversation and do not take into account the effect of ambient noise.




SUMMARY OF THE INVENTION




An adaptive noise suppression system (ANSS) is provided that includes an input A/D converter, an analyzer, a filter, and an output D/A converter. The analyzer includes both feed-forward and feedback signal paths that allow it to compute a filtering coefficient, which is then input to the filter. In these signal paths, feed-forward signals are processed by a signal-to-noise ratio (SNR) estimator, a normalized coherence estimator, and a coherence mask. The feedback signals are processed by an auditory mask estimator. These two signal paths are coupled together via a noise suppression filter estimator. A method according to the present invention includes active signal processing to preserve speech-like signals and suppress incoherent noise signals. After a signal is processed in the feed-forward and feedback paths, the noise suppression filter estimator outputs a filtering coefficient signal to the filter for filtering the noise from the speech-and-noise digital signal.




The present invention provides many advantages over presently known systems and methods, such as: (1) the achievement of noise suppression while preserving speech components in the 100-600 Hz frequency band; (2) the exploitation of time and frequency differences between the speech and noise sources to produce noise suppression; (3) only two microphones are used to achieve effective noise suppression, and these may be placed in an arbitrary geometry; (4) the microphones require no calibration procedures; (5) enhanced performance in diffuse noise environments since it uses a speech component; (6) a normalized coherence estimator that offers improved accuracy over shorter observation periods; (7) the inverse filter length is made dependent on the local signal-to-noise ratio (SNR); (8) spectral continuity is ensured by post filtering and feedback; and (9) the resulting reconstructed signal contains significant noise suppression without loss of intelligibility or fidelity, and the recovered signal is easier for vocoders and voice recognition programs to process. These are just some of the many advantages of the invention, which will become apparent to one of ordinary skill upon reading the description of the preferred embodiment, set forth below.




As will be appreciated, the invention is capable of other and different embodiments, and its several details are capable of modifications in various respects, all without departing from the invention. Accordingly, the drawings and description of the preferred embodiments are illustrative in nature and not restrictive.











BRIEF DESCRIPTION OF THE DRAWING





FIG. 1 is a high-level signal flow block diagram of the preferred embodiment of the present invention; and

FIG. 2 is a detailed signal flow block diagram of FIG. 1.











DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT




Turning now to the drawing figures, FIG. 1 sets forth a preferred embodiment of an adaptive noise suppression system (ANSS) 10 according to the present invention. The data flow through the ANSS 10 passes through an input converting stage 100 and an output converting stage 200. Between the input stage 100 and the output stage 200 are a filtering stage 300 and an analyzing stage 400. The analyzing stage 400 includes a feed-forward path 402 and a feedback path 404.




Analog signals A(n) and B(n) are first received in the input stage 100 at receivers 102 and 104, which are preferably microphones. These analog signals A and B are then converted to digital signals X_n(m) (n=a,b) in input converters 110 and 120. After this conversion, the digital signals X_n(m) are fed to the filtering stage 300 and the feed-forward path 402 of the analyzing stage 400. The filtering stage 300 also receives control signals H_c(m) and r(m) from the analyzing stage 400, which are used to process the digital signals X_n(m).




In the filtering stage 300, the digital signals X_n(m) are passed through a noise suppressor 302 and a signal mixer 304 to generate output digital signals S(m). Subsequently, the output digital signals S(m) from the filtering stage 300 are coupled to the output converter 200 and the feedback path 404. Digital signals X_n(m) and S(m) transmitted through paths 402 and 404 are received by a signal analyzer 500, which processes the digital signals X_n(m) and S(m) and outputs control signals H_c(m) and r(m) to the filtering stage 300. Preferably, the control signals include a filtering coefficient H_c(m) on path 512 and a signal-to-noise ratio value r(m) on path 514. The filtering stage 300 utilizes the filtering coefficient H_c(m) to suppress noise components of the digital input signals. The analyzing stage 400 and the filtering stage 300 may be implemented utilizing either a software-programmable digital signal processor (DSP), or a programmable/hardwired logic device, or any other combination of hardware and software sufficient to carry out the described functionality.




Turning now to FIG. 2, the preferred ANSS 10 is shown in more detail. As seen in this figure, the input converters 110 and 120 include analog-to-digital (A/D) converters 112 and 122 that output digitized signals to Fast Fourier Transform (FFT) devices 114 and 124, which preferably use a short-time Fourier transform. The FFTs 114 and 124 convert the time-domain digital signals from the A/Ds 112, 122 to corresponding frequency-domain digital signals X_n(m), which are then input to the filtering and analyzing stages 300 and 400. The filtering stage 300 includes noise suppressors 302a and 302b, which are preferably digital filters, and a signal mixer 304. Digital frequency-domain signals S(m) from the signal mixer 304 are passed through an Inverse Fast Fourier Transform (IFFT) device 202 in the output converter, which converts these signals back into the time domain s(n). These reconstructed time-domain digital signals s(n) are then coupled to a digital-to-analog (D/A) converter 204, and then output from the ANSS 10 on ANSS output path 206 as analog signals y(n).




With continuing reference to FIG. 2, the feed-forward path 402 of the signal analyzer 500 includes a signal-to-noise ratio estimator (SNRE) 502, a normalized coherence estimator (NCE) 504, and a coherence mask (CM) 506. The feedback path 404 of the analyzing stage 500 further includes an auditory mask estimator (AME) 508. Signals processed in the feed-forward and feedback paths, 402 and 404, respectively, are received by a noise suppression filter estimator (NSFE) 510, which generates a filter coefficient control signal H_c(m) on path 512 that is output to the filtering stage 300.
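For orientation, the block diagram of FIG. 2 can be read as the per-block processing loop sketched below in Python/NumPy. This is only an illustrative rendering of the data flow, not the patented implementation; the stand-in estimator functions (snre, nce, cm, ame, nsfe) are hypothetical placeholders for the SNRE 502, NCE 504, CM 506, AME 508, and NSFE 510 described in the remainder of this section.

```python
import numpy as np

N = 256                              # block size; illustrative value
window = np.hanning(N)               # any suitable analysis window w(n)

# Trivial stand-in estimators, used only to exercise the data flow.
estimators = {
    "snre": lambda Xa, Xb: 0.5,                      # r(m) on paths 514/516
    "nce":  lambda Xa, Xb, r: np.ones(N),            # gamma_ab(m) on path 518
    "cm":   lambda r: np.ones(N),                    # X(m) on path 520
    "ame":  lambda S_prev: np.zeros(N),              # beta_c(m) on path 522
    "nsfe": lambda g, c, b: np.clip(g, 0.0, 1.0),    # Hc(m) on path 512
}

def anss_block(x_a_blk, x_b_blk, S_prev, est=estimators):
    """One pass through the FIG. 2 data flow for a single block m."""
    # Input converting stages 110/120: window and FFT the time-domain blocks.
    Xa = np.fft.fft(window * x_a_blk)
    Xb = np.fft.fft(window * x_b_blk)
    # Analyzing stage 400: feed-forward path 402 ...
    r = est["snre"](Xa, Xb)
    gamma = est["nce"](Xa, Xb, r)
    chi = est["cm"](r)
    # ... and feedback path 404, driven by the previous output block.
    beta = est["ame"](S_prev)
    Hc = est["nsfe"](gamma, chi, beta)
    # Filtering stage 300: noise suppressors 302a/302b and signal mixer 304.
    S = (r * Xa + (1.0 - r) * Xb) * Hc
    return S

S = anss_block(np.random.randn(N), np.random.randn(N), np.zeros(N))
```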




An initial stage of the ANSS 10 is the A/D conversion stage 112 and 122. Here, the analog signal outputs A(n) and B(n) from the microphones 102 and 104 are converted into corresponding digital signals. The two microphones 102 and 104 are positioned in different places in the environment so that when a person speaks both microphones pick up essentially the same voice content, although the noise content is typically different. Next, sequential blocks of time-domain analog signals are selected and transformed into the frequency domain using FFTs 114 and 124. Once transformed, the resulting frequency-domain digital signals X_n(m) are placed on the input data path 402 and passed to the input of the filtering stage 300 and the analyzing stage 400.




A first computational path in the ANSS 10 is the filtering path 300. This path is responsible for the identification of the frequency-domain digital signals of the recovered speech. To achieve this, the filter signal H_c(m) generated by the analysis data path 400 is passed to the digital filters 302a and 302b. The outputs from the digital filters 302a and 302b are then combined into a single output signal S(m) in the signal mixer 304, which is under control of the second feed-forward path signal r(m). The mixer signal S(m) is then placed on the output data path 404 and forwarded to the output conversion stage 200 and the analyzing stage 400.




The filter signal H_c(m) is used in the filters 302a and 302b to suppress the noise component of the digital signal X_n(m). In doing this, the speech component of the digital signal X_n(m) is somewhat enhanced. Thus, the filtering stage 300 produces an output speech signal S(m) whose frequency components have been adjusted in such a way that the resulting output speech signal S(m) is of a higher quality and is more perceptually agreeable than the input speech signal X_n(m), by substantially eliminating the noise component.




The second computation data path in the ANSS 10 is the analyzing stage 400. This path begins with an input data path 402 and the output data path 404 and terminates with the noise suppression filter signal H_c(m) on path 512 and the SNRE signal r(m) on path 514.




In the feed-forward path of the analyzing stage 400, the frequency-domain signals X_n(m) on the input data path 402 are fed into an SNRE 502. The SNRE 502 computes a current SNR level value, r(m), and outputs this value on paths 514 and 516. Path 514 is coupled to the signal mixer 304 of the filtering stage 300, and path 516 is coupled to the CM 506 and the NCE 504. The SNR level value, r(m), is used to control the signal mixer 304. The NCE 504 takes as inputs the frequency-domain signal X_n(m) on the input data path 402 and the SNR level value, r(m), and calculates a normalized coherence value γ(m) that is output on path 518, which couples this value to the NSFE 510. The CM 506 computes a coherence mask value X(m) from the SNR level value r(m) and outputs this mask value X(m) on path 520 to the NSFE 510.




In the feedback path 404 of the analyzing stage 400, the recovered speech signals S(m) on the output data path 404 are input to an AME 508, which computes an auditory masking level value β_c(m) that is placed on path 522. The auditory mask value β_c(m) is also input to the NSFE 510, along with the values X(m) and γ(m) from the feed-forward path. Using these values, the NSFE 510 computes the filter coefficients H_c(m), which are used to control the noise suppressor filters 302a, 302b of the filtering stage 300.




The final stage of the ANSS 10 is the D/A conversion stage 200. Here, the recovered speech coefficients S(m) output by the filtering stage 300 are passed through the IFFT 202 to give an equivalent time series block. Next, this block is concatenated with other blocks to give the complete digital time series s(n). The signals are then converted to equivalent analog signals y(n) in the D/A converter 204, and placed on ANSS output path 206.




The preferred method steps carried out using the ANSS 10 are now described. This method begins with the conversion of the two analog microphone inputs A(n) and B(n) to digital data streams. For this description, let the two analog signals at time t seconds be x_a(t) and x_b(t). During the analog-to-digital conversion step, the time series x_a(n) and x_b(n) are generated using

x_a(n) = x_a(nT_s) and x_b(n) = x_b(nT_s)   (1)

where T_s is the sampling period of the A/D converters, and n is the series index.




Next, x_a(n) and x_b(n) are partitioned into a series of sequential overlapping blocks and each block is transformed into the frequency domain according to equation (2).













X_a(m) = D·W·x_a(m) and X_b(m) = D·W·x_b(m), m = 1, . . . , M   (2)

where

x_a(m) = [x_a(mN_s) . . . x_a(mN_s + (N−1))]^t;

m is the block index;

M is the total number of blocks;

N is the block size;

D is the N×N Discrete Fourier Transform matrix with [D]_uv = e^(j2π(u−1)(v−1)/N), u, v = 1, . . . , N;

W is the N×N diagonal matrix with [W]_uu = w(u), and w(n) is any suitable window function of length N; and

[x_a(m)]^t is the vector transpose of x_a(m).
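For readers who want to see equations (1) and (2) in executable form, the following sketch partitions a sampled signal into overlapping blocks and applies a window and a DFT to each. The block size, block advance N_s, window choice, and sampling rate are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def stft_blocks(x, N=256, Ns=128, window=None):
    """Return X(m) = D W x(m) for each block (Eq. 2).

    x(m) = [x(m*Ns) ... x(m*Ns + N - 1)]^t, W applies the window w(n),
    and D is realized here with NumPy's FFT.
    """
    if window is None:
        window = np.hanning(N)           # w(n): any suitable window of length N
    M = 1 + (len(x) - N) // Ns           # total number of full blocks
    X = np.empty((M, N), dtype=complex)
    for m in range(M):
        block = x[m * Ns : m * Ns + N]   # one overlapping block x(m)
        X[m] = np.fft.fft(window * block)
    return X

# Example: two noisy microphone signals, 1 s at an assumed 8 kHz rate (Eq. 1).
fs = 8000
t = np.arange(fs) / fs
x_a = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(fs)
x_b = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(fs)
X_a, X_b = stft_blocks(x_a), stft_blocks(x_b)
```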




The blocks X_a(m) and X_b(m) are then sequentially transferred to the input data path 402 for further processing by the filtering stage 300 and the analysis stage 400.




The filtering stage 300 contains a computation block 302 with the noise suppression filters 302a, 302b. As inputs, the noise suppression filter 302a accepts X_a(m) and filter 302b accepts X_b(m) from the input data path 402. From the analysis stage data path 512, H_c(m), a set of filter coefficients, is received by filter 302b and passed to filter 302a. The signal mixer 304 receives a signal-combining weighting signal r(m) and the output from the noise suppression filter 302. Next, the signal mixer 304 outputs the frequency-domain coefficients of the recovered speech S(m), which are computed according to equation (3).








S(m) = (r(m)·X_a(m) + (1 − r(m))·X_b(m))·H_c(m)   (3)

where [x·y]_i = [x]_i·[y]_i.






The quantity r(m) is a weighting factor that depends on the estimated SNR for block m and is computed according to equation (5) and placed on data paths 516 and 518.




The filter coefficients H_c(m) are applied to signals X_a(m) and X_b(m) (402) in the noise suppressors 302a and 302b. The signal mixer 304 generates a weighted sum S(m) of the outputs from the noise suppressors under control of the signal r(m) (514). The signal r(m) favors the signal with the higher SNR. The output from the signal mixer 304 is placed on the output data path 404, which provides input to the conversion stage 200 and the analysis stage 400.
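A small numeric illustration of equation (3): the same filter H_c(m) scales both channels bin by bin, and r(m) shifts the mix toward the channel with the better SNR. The values below are arbitrary.

```python
import numpy as np

Xa = np.array([1.0 + 0.0j, 0.5 + 0.5j, 0.2 + 0.0j, 0.0 + 0.1j])  # mic a spectrum
Xb = np.array([0.9 + 0.1j, 0.4 + 0.6j, 0.3 + 0.0j, 0.0 + 0.2j])  # mic b spectrum
Hc = np.array([1.0, 0.8, 0.3, 0.0])      # filter coefficients Hc(m) from path 512
r = 0.7                                  # SNR weighting r(m); favors mic a here

# Eq. 3, with [x.y]_i = [x]_i [y]_i taken element-wise:
S = (r * Xa + (1.0 - r) * Xb) * Hc
print(S)
```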




The analysis filter stage 400 generates the noise suppression filter coefficients, H_c(m), and the signal combining ratio, r(m), using the data present on the input 402 and output 404 data paths. To identify these quantities, five computational blocks are used: the SNRE 502, the CM 506, the NCE 504, the AME 508, and the NSFE 510.




Described below is the computation performed in each of these blocks, beginning with the data flow originating at the input data path 402. Along this path 402, the following computational blocks are processed: the SNRE 502, the NCE 504, and the CM 506. Next, the flow of the speech signal S(m) through the feedback data path 404 originating with the output data path is described. In this path 404, the auditory mask analysis is performed by the AME 508. Lastly, the computation of H_c(m) and r(m) is described.




From the input data path 402, the first computational block encountered in the analysis stage 400 is the SNRE 502. In the SNRE 502, an estimate of the SNR that is used to guide the adaptation rate of the NCE 504 is determined. In the SNRE 502, an estimate of the local noise power in X_a(m) and X_b(m) is computed using the observation that, relative to speech, variations in noise power typically exhibit longer time constants. Once the SNRE estimates are computed, the results are used to ratio-combine the digital filter 302a and 302b outputs and to determine the length of H_c(m) (Eq. 9).




To compute the local SNR in the SNRE 502, exponential averaging is used. By employing different adaptation rates in the filters, the signal and noise power contributions in X_a(m) and X_b(m) can be approximated at block m by








SNR_a(m) = (Es_as_a^H(m)·Es_as_a(m)) / (En_an_a^H(m)·En_an_a(m))
SNR_b(m) = (Es_bs_b^H(m)·Es_bs_b(m)) / (En_bn_b^H(m)·En_bn_b(m))   (4a,b)

where Es_as_a(m), En_an_a(m), Es_bs_b(m), and En_bn_b(m) are the N-element vectors

Es_as_a(m) = Es_as_a(m−1) + α_{s_a}·X_a^*(m)·X_a(m);   (4c)

Es_bs_b(m) = Es_bs_b(m−1) + α_{s_b}·X_b^*(m)·X_b(m);   (4d)

En_an_a(m) = En_an_a(m−1) + α_{n_a}·X_a^*(m)·X_a(m);   (4e)

En_bn_b(m) = En_bn_b(m−1) + α_{n_b}·X_b^*(m)·X_b(m);   (4f)

[α_{s_a}]_i = μ_{s_a} for [Es_as_a(m−1)]_i ≤ [X_a^*(m)·X_a(m)]_i, and δ_{s_a} for [Es_as_a(m−1)]_i > [X_a^*(m)·X_a(m)]_i;   (4g)

[α_{n_a}]_i = μ_{n_a} for [En_an_a(m−1)]_i ≤ [X_a^*(m)·X_a(m)]_i, and δ_{n_a} for [En_an_a(m−1)]_i > [X_a^*(m)·X_a(m)]_i;   (4h)

[α_{s_b}]_i = μ_{s_b} for [Es_bs_b(m−1)]_i ≤ [X_b^*(m)·X_b(m)]_i, and δ_{s_b} for [Es_bs_b(m−1)]_i > [X_b^*(m)·X_b(m)]_i;   (4i)

[α_{n_b}]_i = μ_{n_b} for [En_bn_b(m−1)]_i ≤ [X_b^*(m)·X_b(m)]_i, and δ_{n_b} for [En_bn_b(m−1)]_i > [X_b^*(m)·X_b(m)]_i.   (4j)













In these equations, 4(c)-4(j), x^* is the conjugate of x, and μ_{s_a}, μ_{s_b}, μ_{n_a}, μ_{n_b} are application-specific adaptation parameters associated with the onset of speech and noise, respectively. These may be fixed or adaptively computed from X_a(m) and X_b(m). The values δ_{s_a}, δ_{s_b}, δ_{n_a}, δ_{n_b} are application-specific adaptation parameters associated with the decay portion of speech and noise, respectively. These also may be fixed or adaptively computed from X_a(m) and X_b(m).




Note that the time constants employed in the computation of Es_as_a(m), En_an_a(m), Es_bs_b(m), and En_bn_b(m) depend on the direction of the estimated power gradient. Since speech signals typically have a short attack rate portion and a longer decay rate portion, the use of two time constants permits better tracking of the speech signal power and thereby better SNR estimates.
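The attack/decay behaviour of equations (4c)-(4j) can be sketched as a per-bin power tracker whose smoothing constant switches depending on whether the instantaneous power is rising or falling. The leaky-average form used here, and the sample values of μ and δ, are assumptions made for illustration; the patent states only that the constants are application specific.

```python
import numpy as np

def dual_rate_power(E_prev, X, mu=0.6, delta=0.05):
    """Per-bin power estimate with a fast 'attack' rate and a slow 'decay' rate.

    alpha_i is mu where the instantaneous power exceeds the running estimate
    (onset) and delta where it falls below it (decay), mirroring Eq. (4g)-(4j).
    """
    inst = np.abs(X) ** 2                        # X*(m) . X(m), element-wise
    alpha = np.where(E_prev <= inst, mu, delta)  # choose attack or decay per bin
    return (1.0 - alpha) * E_prev + alpha * inst
```

Running this tracker twice on the same spectrum, once with fast constants for the speech power Es and once with slow constants for the noise power En, gives the vectors that enter the SNR estimates of equations (4a,b).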




The second quantity computed by the SNR estimator 502 is the relative SNR index r(m), which is defined by

r(m) = SNR_a(m) / (SNR_a(m) + SNR_b(m)).   (5)

This ratio is used in the signal mixer 304 (Eq. 3) to ratio-combine the two digital filter output signals.
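Reading equation (4a,b) as a ratio of squared vector norms and applying equation (5) directly gives the following sketch; the toy power vectors are made up for illustration.

```python
import numpy as np

def snr_and_ratio(Es_a, En_a, Es_b, En_b):
    """Per-block SNRs (Eq. 4a,b, read as ratios of squared norms) and the
    relative SNR index r(m) of Eq. (5)."""
    snr_a = np.vdot(Es_a, Es_a).real / np.vdot(En_a, En_a).real
    snr_b = np.vdot(Es_b, Es_b).real / np.vdot(En_b, En_b).real
    r = snr_a / (snr_a + snr_b)
    return snr_a, snr_b, r

# Channel a has the stronger speech-power estimate, so r(m) > 0.5 and the
# signal mixer 304 leans toward channel a.
Es_a, En_a = np.array([4.0, 3.0, 2.0]), np.array([1.0, 1.0, 1.0])
Es_b, En_b = np.array([2.0, 1.5, 1.0]), np.array([1.0, 1.0, 1.0])
print(snr_and_ratio(Es_a, En_a, Es_b, En_b))
```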




From the SNR estimator 502, the analysis stage 400 splits into two parallel computation branches: the CM 506 and the NCE 504.




In the ANSS method, the filtering coefficient H_c(m) is designed to enhance the elements of X_a(m) and X_b(m) that are dominated by speech, and to suppress those elements that are either dominated by noise or contain negligible psycho-acoustic information. To identify the speech-dominant passages, the NCE 504 is employed, and a key to this approach is the assumption that the noise field is spatially diffuse. Under this assumption, only the speech component of x_a(t) and x_b(t) will be highly cross-correlated, with proper placement of the microphones. Further, since speech can be modeled as a combination of narrowband and wideband signals, the evaluation of the cross-correlation is best performed in the frequency domain using the normalized coherence coefficients γ_ab(m). The i-th element of γ_ab(m) is given by












[γ_ab(m)]_i = ([Es_as_b(m) − En_an_b(m)]_i / [Es_as_a(m)·Es_bs_b(m)]_i)^[τ((SNR_a(m)+SNR_b(m))/2)]_i, i = 1, . . . , N   (6)

where

Es_as_b(m) = Es_as_b(m−1) + α_{s_ab}·X_a^*(m)·X_b(m);   (6a)

En_an_b(m) = En_an_b(m−1) + α_{n_ab}·X_a^*(m)·X_b(m);   (6b)

[α_{s_ab}]_i = μ_{s_ab} for |Es_as_b(m−1)|_i ≤ |X_a^*(m)·X_b(m)|_i, and δ_{s_ba} for |Es_as_b(m−1)|_i > |X_a^*(m)·X_b(m)|_i;   (6c)

[α_{n_ab}]_i = μ_{n_ab} for |En_an_b(m−1)|_i ≤ |X_b^*(m)·X_b(m)|_i, and δ_{n_ba} for |En_an_b(m−1)|_i > |X_b^*(m)·X_b(m)|_i.   (6d)













In these equations, 6(a)-6(d), |x|^2 = x^*·x, and τ(a) is a normalization function that depends on the packaging of the microphones and may also include a compensation factor for uncertainty in the time alignment between x_a(t) and x_b(t). The values μ_{s_ab}, μ_{n_ab} are application-specific adaptation parameters associated with the onset of speech, and the values δ_{s_ab}, δ_{n_ba} are application-specific adaptation parameters associated with the decay portion of speech.




After completing the evaluation of equation (6), the resultant γ_ab(m) is placed on the data path 518.
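One reading of equation (6) is: estimate the speech cross-power as the tracked cross-power minus the tracked noise cross-power, normalize it by the product of the two signal auto-powers, and shape the result with an SNR-dependent exponent τ. The sketch below follows that reading; the magnitude and clipping safeguards and the particular form of τ are assumptions, since the patent only says τ depends on the microphone packaging.

```python
import numpy as np

def normalized_coherence(Es_ab, En_ab, Es_aa, Es_bb, snr_a, snr_b,
                         tau=lambda a: min(1.0, a)):
    """Element-wise estimate of gamma_ab(m) following Eq. (6).

    Es_ab, En_ab : tracked speech / noise cross-power vectors (Eq. 6a, 6b)
    Es_aa, Es_bb : tracked signal auto-power of microphones a and b (Eq. 4c, 4d)
    tau          : normalization function of the mean SNR (assumed form here)
    """
    base = np.abs(Es_ab - En_ab) / (np.abs(Es_aa) * np.abs(Es_bb) + 1e-12)
    base = np.clip(base, 0.0, 1.0)          # keep the fractional power real-valued
    return base ** tau(0.5 * (snr_a + snr_b))
```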




The performance of any ANSS system is a compromise between the level of distortion in the desired output signal and the level of noise suppression attained at the output. This proposed ANSS system has the desirable feature that when the input SNR is high, the noise suppression capability of the system is deliberately lowered, in order to achieve lower levels of distortion at the output. When the input SNR is low, the noise suppression capability is enhanced at the expense of more distortion at the output. This desirable dynamic performance characteristic is achieved by generating a filter mask signal X(m) (520) that is convolved with the normalized coherence estimates, γ_ab(m), to give H_c(m) in the NSFE 510. For the ANSS algorithm, the filter mask signal equals








X(m) = D·χ((SNR_a(m) + SNR_b(m))/2)   (7)

where χ(b) is an N-element vector with

[χ(b)]_i = 1 for i ≤ N/2, and [χ(b)]_i = e^(−(b − χ_th)(i − N/2)/χ_s) for N ≥ i > N/2,

and where χ_th, χ_s are implementation-specific parameters.




Once computed, X(m) is placed on the data path 520 and used directly in the computation of H_c(m) (Eq. 9). Note that X(m) controls the effective length of the filtering coefficient H_c(m).
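A sketch of equation (7): χ(b) passes the first N/2 taps and tapers the remaining taps exponentially at a rate set by the average SNR, and the mask X(m) is its DFT. The values of χ_th and χ_s are implementation specific; the ones below are placeholders.

```python
import numpy as np

def coherence_mask(avg_snr, N=256, chi_th=1.0, chi_s=8.0):
    """X(m) = D * chi(b) with b = (SNR_a(m) + SNR_b(m)) / 2  (Eq. 7)."""
    i = np.arange(1, N + 1)
    chi = np.where(i <= N // 2,
                   1.0,                                                 # pass band
                   np.exp(-(avg_snr - chi_th) * (i - N // 2) / chi_s))  # taper
    return np.fft.fft(chi)        # the DFT matrix D applied to chi(b)
```

Because X(m) is later convolved with γ_ab(m) in equation (9), the taper length of χ(b) effectively limits the length of the resulting filter H_c(m), which is the behaviour described above.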




The second input path in the analysis data path is the feedback data path 404, which provides the input to the auditory mask estimator 508. By analyzing the spectrum of the previous block, the N-element auditory mask vector, β_c(m), identifies the relative perceptual importance of each component of S(m). Given this information, and the fact that the spectrum varies slowly for modest block size N, H_c(m) can be modified to cancel those elements of S(m) that contain little psycho-acoustic information and are therefore dominated by noise. This cancellation has the added benefit of generating a spectrum that is easier for most vocoder and voice recognition systems to process.




The AME 508 uses psycho-acoustic theory, which states that if adjacent frequency bands are louder than a middle band, then the human auditory system does not perceive the middle band, and this signal component is discarded. The AME 508 is responsible for identifying those bands that are discarded, since these bands are not perceptually significant. Then, the information from the AME 508 is placed on path 522, which flows to the NSFE 510. Through this, the NSFE 510 computes the coefficients that are placed on path 512 to the digital filter 302, providing the noise suppression.




To identify the auditory mask level, two detection levels must be computed: an absolute auditory threshold and the speech-induced masking threshold, which depends on S(m). The auditory masking level is the maximum of these two thresholds, or






β_c(m) = max(Ψ_abs, Ψ·S(m−1))   (8)

where

Ψ_abs is an N-element vector containing the absolute auditory detection levels at frequencies (u−1)/(N·T_s) Hz, u = 1, . . . , N;

[Ψ_abs]_i = Ψ_a((i−1)/(N·T_s));   (8b)

Ψ_a(f) = (180.17/T_s)·10^(Ψ_c(f)/10 − 12);   (8c)

Ψ_c(f) = 34.97 − 10·(log(f)/log(50)) for f ≤ 500, and 4.97 − 4·(log(f)/log(1000)) for f > 500;   (8d)

Ψ is the N×N Auditory Masking Transform;

[Ψ]_uv = T(2(u−1)/(N·T_s), 2(v−1)/(N·T_s)), u, v = 1, . . . , N;   (8e)

T(f_m, f) = T_max(f_m)·(f/f_m)^28 for f ≤ f_m, and T_max(f_m)·(f/f_m)^(−10) for f > f_m;   (8f)

T_max(f) = 10^(−(14.5 + f/250)/10) for f < 1700, 10^(−2.5) for 1700 ≤ f < 3000, and 10^(−(25 − f/1000)/10) for f ≥ 3000.   (8g)
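The absolute-threshold portion of equation (8) is easy to evaluate directly. The sketch below computes Ψ_c(f), Ψ_a(f), and Ψ_abs for an assumed sampling period and then forms β_c(m) = max(Ψ_abs, Ψ·S(m−1)); the masking transform Ψ is left as a caller-supplied matrix (it would be filled in from equations (8e)-(8g) in the same way), and the clamping of the 0 Hz bin and the magnitude on Ψ·S(m−1) are safeguards added here.

```python
import numpy as np

def psi_c(f):
    """Eq. (8d): threshold curve in dB as a function of frequency f in Hz."""
    f = np.asarray(f, dtype=float)
    return np.where(f <= 500.0,
                    34.97 - 10.0 * np.log10(f) / np.log10(50.0),
                    4.97 - 4.0 * np.log10(f) / np.log10(1000.0))

def psi_abs(N=256, Ts=1.0 / 8000.0):
    """Eq. (8b)-(8c): absolute detection levels at the bin frequencies (i-1)/(N*Ts)."""
    f = np.arange(N) / (N * Ts)
    f[0] = f[1]                     # avoid log(0) at the 0 Hz bin (safeguard)
    return (180.17 / Ts) * 10.0 ** (psi_c(f) / 10.0 - 12.0)

def auditory_mask(S_prev, Psi, N=256, Ts=1.0 / 8000.0):
    """Eq. (8): beta_c(m) = max(Psi_abs, Psi S(m-1)), element-wise."""
    return np.maximum(psi_abs(N, Ts), np.abs(Psi @ S_prev))
```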













The final step in the analysis stage 400 is performed by the NSFE 510. Here the noise suppression filter signal H_c(m) is computed according to equation (9), using the results of the normalized coherence estimator 504 and the CM 506.




The i-th element of H_c(m) is given by











[H_c(m)]_i = 0 for [X(m)*γ_ab(m)]_i ≤ [β_c(m)]_i;
[H_c(m)]_i = 1 for [X(m)*γ_ab(m)]_i ≥ 1;
[H_c(m)]_i = [X(m)*γ_ab(m)]_i elsewhere;   (9)

and where A*B is the convolution of A with B.




Following the completion of equation (9), the filter coefficients are passed to the digital filter 302 to be applied to X_a(m) and X_b(m).
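Equation (9) reads as: convolve the coherence estimate with the mask X(m), zero any bin whose masked coherence falls at or below the auditory mask β_c(m), saturate at unity, and otherwise use the masked coherence itself as the gain. A sketch, with the spectral convolution taken as circular (an assumption) and the gain treated as a magnitude:

```python
import numpy as np

def nsfe(X_m, gamma_ab, beta_c):
    """Per-bin noise suppression filter Hc(m) following Eq. (9).

    X_m      : coherence mask X(m) from Eq. (7)
    gamma_ab : normalized coherence gamma_ab(m) from Eq. (6)
    beta_c   : auditory mask beta_c(m) from Eq. (8)
    """
    # X(m) * gamma_ab(m): circular convolution via the FFT (assumed here).
    masked = np.fft.ifft(np.fft.fft(X_m) * np.fft.fft(gamma_ab))
    masked = np.abs(masked)                       # treat the gain as a magnitude
    return np.where(masked <= beta_c, 0.0,        # inaudible or noise-dominated
           np.where(masked >= 1.0, 1.0, masked))  # saturate the gain at unity
```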




The final stage in the ANSS algorithm involves reconstructing the analog signal from the blocks of frequency coefficients present on the output data path 404. This is achieved by passing S(m) through the Inverse Fourier Transform, as shown in equation (10), to give s(m).







s(m) = D^H·S(m)   (10)

where [D]^H is the Hermitian transpose of D.




Next, the complete time series, s(n), is computed by overlapping and adding each of the blocks. With the completion of the computation of s(n), the ANSS algorithm converts the s(n) signals into the output signal y(n), and then terminates.
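Equation (10) plus the overlap-add step can be sketched as follows; applying the Hermitian transpose of the DFT matrix is taken here to be the inverse FFT (up to the library's scaling convention), and the 50% overlap matches the illustrative block advance used earlier.

```python
import numpy as np

def overlap_add(S_blocks, Ns=128):
    """Reconstruct s(n) from the recovered blocks S(m) (Eq. 10 + overlap-add)."""
    M, N = S_blocks.shape
    s = np.zeros((M - 1) * Ns + N)
    for m in range(M):
        # D^H S(m): inverse transform of one block, then add it at offset m*Ns.
        s[m * Ns : m * Ns + N] += np.fft.ifft(S_blocks[m]).real
    return s
```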




The ANSS method utilizes adaptive filtering that identifies the filter coefficients using several factors, including the correlation between the input signals, the selected filter length, the predicted auditory mask, and the estimated signal-to-noise ratio (SNR). Together, these factors enable the computation of noise suppression filters that dynamically vary their length to maximize noise suppression in low-SNR passages and minimize distortion in high-SNR passages, remove the excessive low-pass filtering found in previous coherence methods, and remove inaudible signal components identified using the auditory masking model.




Although the preferred embodiment has inputs from two microphones, in alternative arrangements the ANS system and method can use more microphones by applying several combining rules. Possible combining rules include, but are not limited to, pair-wise computation followed by averaging, beam-forming, and maximum-likelihood signal combining.




The invention has been described with reference to preferred embodiments. Those skilled in the art will perceive improvements, changes, and modifications. Such improvements, changes and modifications are intended to be covered by the appended claims.



Claims
  • 1. A noise suppression circuit, comprising: an input converting stage for receiving an analog input signal and for generating a digital input signal; a filter stage coupled to the digital input signal for generating a filtered digital signal based upon a pair of control signals, a first control signal comprising a filtering coefficient and a second control signal comprising a signal-to-noise ratio value; an output converting stage coupled to the filtered digital signal for generating a filtered analog output signal; and an analysis stage coupled to the input converting stage and the filter stage, the analysis stage receiving the digital input signal from the input converting stage and the filtered digital signal from the filter stage and generating the first and second control signals to the filter stage.
  • 2. The noise suppression circuit of claim 1, wherein the first control signal is generated by a noise suppression filter estimator coupled to the digital input signal in a feed-forward signal path and to the filtered digital signal in a feed-back signal path.
  • 3. The noise suppression circuit of claim 2, further comprising an auditory mask estimator coupled between the filtered digital signal and the noise suppression filter estimator that computes an auditory masking level value which is used by the noise suppression filter estimator to generate the first control signal.
  • 4. The noise suppression circuit of claim 2, wherein the feed-forward signal path comprises a normalized coherence estimator coupled to the digital input signal that computes a normalized coherence value which is used by the noise suppression filter estimator to generate the first control signal.
  • 5. The noise suppression circuit of claim 4, wherein the normalized coherence estimator is also coupled to a signal to noise ratio estimator circuit which generates the second control signal.
  • 6. The noise suppression circuit of claim 2, wherein the feed-forward signal path comprises a signal to noise ratio estimator circuit which generates the second control signal, the second control signal being coupled to a normalized coherence estimator that computes a normalized coherence value and a coherence mask that computes a coherence mask value, wherein the normalized coherence value and the coherence mask value are used by the noise suppression filter estimator to generate the first control signal.
  • 7. The noise suppression circuit of claim 1, wherein the input converting stage includes an analog to digital converter and a Fast Fourier Transform circuit, the digital input signals comprising frequency domain digital signals.
  • 8. The noise suppression circuit of claim 7, wherein the input converting stage further includes a microphone coupled to the analog to digital converter.
  • 9. The noise suppression circuit of claim 1, wherein the input converting stage includes a pair of microphones, a pair of analog to digital converters, and a pair of Fast Fourier Transform circuits, each microphone being coupled to an analog to digital converter and a Fast Fourier Transform circuit, the digital input signals comprising a pair of frequency domain digital signals.
  • 10. The noise suppression circuit of claim 1, wherein the filter stage further comprises a noise suppressor coupled to the first control signal and a signal mixer coupled to the second control signal.
  • 11. The noise suppression circuit of claim 10, the noise suppressor comprises a digital filter.
  • 12. The noise suppression circuit of claim 1, wherein the filter stage and the analysis stage comprise a digital signal processor.
  • 13. The noise suppression circuit of claim 1, wherein the output converting stage comprises an Inverse Fast Fourier Transform circuit and a digital to analog converter.
  • 14. The noise suppression circuit of claim 1, wherein the filter stage enhances voice components and suppresses noise components in the digital input signal.
  • 15. An adaptive noise suppression system, comprising: an input converting stage for converting analog input signals into digital input signals; an output converting stage for converting digital output signals into analog output signals; a first computation data path coupled between the input converting stage and the output converting stage for receiving the digital input signals and for processing the digital input signals to create the digital output signals based upon a control signal; and a second computation data path for generating the control signal, the second computation data path including a feedback computation data path coupled to the digital input signals and a feed forward computation data path coupled to the digital output signals, wherein the control signal is generated based upon the signals on the feedback computation data path and the feed forward computation data path.
  • 16. The system of claim 15, wherein the first computation data path comprises a filtering stage.
  • 17. The system of claim 16, wherein the input converting stage converts a plurality of analog input signals into a plurality of digital input signals, and wherein the filtering stage filters the plurality of digital input signals and combines the plurality of digital input signals into a digital output signal.
  • 18. The system of claim 17, wherein the input converting stage comprises a plurality of input converters, and wherein the filtering stage comprises a plurality of noise suppression filters coupled to a corresponding one of the plurality of input converters and a signal mixer coupled to the plurality of noise suppression filters.
  • 19. The system of claim 16, wherein the feed forward computation data path and the feedback computation data path are coupled through a filter coefficient estimator configured to compute a filter coefficient, and to output the filter coefficient as the control signal to the filtering stage.
  • 20. The system of claim 16, wherein the feed forward computation data path comprises a signal-to-noise ratio (SNR) estimator for receiving the digital input signals, computing an SNR level value, and outputting the SNR level value as the control signal to the filtering stage.
  • 21. The system of claim 16, wherein: the feed forward computation data path and the feedback computation data path are coupled through a filter coefficient estimator configured to compute a filter coefficient, and to output the filter coefficient as a first control signal to the filtering stage; and the feed forward computation data path comprises a signal-to-noise ratio (SNR) estimator configured to receive the digital input signals, to compute an SNR level value, and to output the SNR level value as a control signal to the filtering stage.
  • 22. The system of claim 21, wherein the feed forward computation data path further comprises: a normalized coherence mask estimator configured to receive the digital input signals and the SNR level value, to compute a normalized coherence value, and to output the normalized coherence value to the filter coefficient estimator; and a coherence mask configured to receive the SNR level value, to compute a coherence mask value, and to output the coherence mask value to the filter coefficient estimator.
  • 23. The system of claim 22, wherein the feedback computation data path comprises an auditory mask estimator configured to receive the digital output signals, to compute an auditory mask, and to output the auditory mask to the filter coefficient estimator.
  • 24. The system of claim 21, wherein the feedback computation data path comprises an auditory mask estimator configured to receive the digital output signals, to compute an auditory mask, and to output the auditory mask to the filter coefficient estimator.
  • 25. A method of suppressing noise, comprising the steps of: receiving an analog input signal and generating a digital input signal; filtering the digital input signal to generate a filtered digital signal based upon a pair of control signals, a first control signal comprising a filtering coefficient and a second control signal comprising a signal-to-noise ratio value; generating a filtered analog output signal from the filtered digital signal; and analyzing the digital input signal and the filtered digital signal to generate the first and second control signals.
  • 26. The method of claim 25, further comprising the step of: providing a noise suppression filter estimator coupled to the digital input signal in a feed-forward signal path and to the filtered digital signal in a feed-back signal path to generate the first control signal.
  • 27. The method of claim 24, further comprising the step of: computing an auditory masking level value which is used by the noise suppression filter estimator to generate the first control signal.
  • 28. The method of claim 24, further comprising the step of: computing a normalized coherence value which is used by the noise suppression filter estimator to generate the first control signal.
  • 29. The method of claim 28, further comprising the step of: providing a signal to noise ratio estimator circuit which generates the second control signal.
  • 30. The method of claim 24, further comprising the step of generating the first control signal using a normalized coherence value and a coherence mask value.
  • 31. The method of claim 25, further comprising the step of: converting the digital input signals into frequency domain digital signals.
  • 32. The method of claim 25, further comprising the step of: receiving the analog input signal with a microphone.
  • 33. A system for suppressing noise, comprising: means for receiving an analog input signal and generating a digital input signal; means for filtering the digital input signal to generate a filtered digital signal based upon a pair of control signals, a first control signal comprising a filtering coefficient and a second control signal comprising a signal-to-noise ratio value; means for generating a filtered analog output signal from the filtered digital signal; and means for analyzing the digital input signal and the filtered digital signal to generate the first and second control signals.
  • 34. The system of claim 33, further comprising: a noise suppression filter estimator coupled to the digital input signal in a feed-forward signal path and to the filtered digital signal in a feed-back signal path to generate the first control signal.
  • 35. The system of claim 34, further comprising: means for computing an auditory masking level value which is used by the noise suppression filter estimator to generate the first control signal.
  • 36. The system of claim 34, further comprising: means for computing a normalized coherence value which is used by the noise suppression filter estimator to generate the first control signal.
  • 37. The system of claim 36, further comprising: a signal to noise ratio estimator circuit which generates the second control signal.
  • 38. The system of claim 34, further comprising: means for generating the first control signal using a normalized coherence value and a coherence mask value.
  • 39. The system of claim 33, further comprising: means for converting the digital input signals into frequency domain digital signals.
Parent Case Info

The application is a continuation of application Ser. No. 09/452,623, filed Dec. 1, 1999, now U.S. Pat. No. 6,473,733.

US Referenced Citations (13)
Number Name Date Kind
4630304 Borth et al. Dec 1986 A
5245665 Lewis et al. Sep 1993 A
5307405 Sih Apr 1994 A
5396189 Hays Mar 1995 A
5507036 Vagher Apr 1996 A
5528196 Baskin et al. Jun 1996 A
5546422 Yokev et al. Aug 1996 A
5598158 Linz Jan 1997 A
5742694 Eatwell Apr 1998 A
5796819 Romesburg Aug 1998 A
5920834 Sih et al. Jul 1999 A
6005640 Strolle et al. Dec 1999 A
6122384 Mauro Sep 2000 A
Foreign Referenced Citations (1)
Number Date Country
196 29 132 Jan 1998 DE
Non-Patent Literature Citations (2)
Entry
Linhard K., “Speech Enhancement Using Two Versions of the Noisy Speech Signal,” 4th European Conference on Speech Communication and Technology, Eurospeech '95, Madrid, Spain, Sep. 18-21, 1995, European Conference on Speech Communication and Technology (Eurospeech), Madrid: Graficas Brens, ES, vol. 3, Conf. 4, Sep. 18, 1995, pp. 2005-2008, XP000855101.
Virag, N., “Speech Enhancement Based on Masking Properties of the Auditory System,” Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Detroit, May 9-12, 1995, Speech, New York, IEEE, US, vol. 1, May 9, 1995, pp. 796-799, XP000658114, ISBN: 0-7803-2432-3.
Continuations (1)
Number Date Country
Parent 09/452623 Dec 1999 US
Child 10/223409 US