Below, the invention is described in more detail by means of examples and the included drawings. The figures show schematically:
The reference symbols used in the figures and their meaning are summarized in the list of reference symbols. Generally, like or like-functioning parts are given the same or similar reference symbols. The described embodiments are meant as examples and shall not limit the invention.
From the input signals S1, the signal processing circuit 3 derives output audio signals S2, which are fed to an output transducer unit 5, e.g., a loudspeaker. The output transducer unit 5 transduces the output audio signals S2 into signals to be perceived by the user of the hearing device, e.g., into acoustic sound, as indicated in
An automatic adaptation of the transfer function G to said current acoustic environment is accomplished in the following manner:
The input audio signals S1 are fed to a classifier unit 4, in which said current acoustic environment is classified, wherein any known classification method can in principle be used. I.e., the current acoustic environment, represented by the input audio signals S1, is compared to N predetermined acoustic environments, each described by one class of a set of N predefined classes C1 . . . CN.
A set of N class similarity factors p1 . . . pN is output, wherein each of the class similarity factors p1 . . . pN is indicative of the similarity of said current acoustic environment with the respective predetermined acoustic environment of classes C1 . . . CN or, put in other words, of the likeness (resemblance) of said current acoustic environment and the respective predetermined acoustic environment, or, expressed differently, of the degree of correspondence between said current acoustic environment and the respective predetermined acoustic environment.
The classification may be accomplished in various ways known in the art. E.g., as indicated in
Today, N may typically be N=2, N=3, N=4, N=5 or possibly larger. Typical classes may be “speech”, “speech in noise”, “noise”, “music” or others. Typical features are, e.g., spectral shape, harmonic structure, coherent frequency and/or amplitude modulations, signal-to-noise ratio, spectral center of gravity, spatial distribution of sound sources and many more.
The automatic adaptation of the transfer function G is on the one hand based on said class similarity factors p1 . . . pN and on the other hand based on base parameter sets. Said base parameter sets are predetermined, and their respective values are usually obtained during a fitting procedure and/or may be at least partly pre-defined in the hearing device 1.
For each sub-function (in
In order to adapt the transfer function G, and in particular each sub-function, to a current acoustic environment, for each sub-function, the base parameter sets are mixed in dependence of their class similarity factors p1 . . . pN. In the embodiment of
Said class weight factors P1 . . . PN are derived from said class similarity factors p1 . . . pN. In the example of
The processing unit 8 outputs an activity parameter set a1 (generally: one for each sub-function), which is fed to the transfer function G, or, more precisely, to the respective sub-function. Accordingly, the transfer function G is adapted to the current acoustic environment in a fashion based on the predetermined base parameter sets.
A simple example:
M=1, g1: beam former; N=2, C1: music, C2: speech in noise. The corresponding base parameter sets B1/1, B1/2 do not have to be derived in a fitting procedure, but can be pre-programmed by the hearing device manufacturer: B1/1=0, B1/2=1, which means that no beam forming (zero activity of g1) shall be used when the user wants to listen to music, and full beam forming (full activity of g1) shall be used when the user wants to understand a speaker in a noisy place. Zero beam forming activity will usually mean that an omnidirectional polar pattern of the input transducer unit 2 shall be used, and full beam forming activity will typically mean that a high sensitivity towards the front direction (along the user's nose) shall be used, with little sensitivity for sound from other directions.
When the user is in an acoustic environment with p1=99% and p2=1%, i.e., the classification result implies that the current acoustic environment is practically pure music, the beam former (realized by sub-function g1) is run with (at least approximately) B1/1, i.e., at practically zero activity (o1=o2=0, f1=f2=1 implied).
When the user is in an acoustic environment with p1=1% and p2=99%, i.e., the classification result implies that the current acoustic environment is practically purely speech-in-noise, the beam former (realized by sub-function g1) is run with (at least approximately) B1/2, i.e., with practically full activity (o1=o2=0, f1=f2=1 implied).
When, however, the user is in an acoustic environment with p1=40% and p2=60% (e.g., in a restaurant situation with background music), i.e., the classification result implies that the current acoustic environment has aspects of music and somewhat stronger aspects of speech-in-noise, the beam former (realized by sub-function g1) is run with 0.4×B1/1+0.6×B1/2, i.e., with moderate activity (o1=o2=0, f1=f2=1 implied). The beam former may provide for a medium emphasis of sound from the front hemisphere and only little suppression of sound from elsewhere.
Of course, instead of the simple linear mixing of the base parameter sets discussed above by way of example, more sophisticated (non-linear) ways of mixing the base parameter sets may also be applied.
If it is particularly important to the user to understand speech in noisy surroundings, whereas he is not particularly fond of music, this individual preference may be taken into account by using something like o1=0, o2=0.3 and/or f1=0.8, f2=1.5, or the like.
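The beam former example above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the relation between class weight factors and class similarity factors is assumed here to be P_k = f_k·p_k + o_k, followed by a renormalization so that the weights sum to 1. Neither this formula nor the renormalization is confirmed by the text; the function names `class_weights` and `mix_activity` are hypothetical.

```python
def class_weights(p, f=None, o=None):
    """Derive class weight factors P_k from class similarity factors p_k.

    ASSUMPTION (not taken from the text): P_k = f_k * p_k + o_k,
    renormalized so that all weights sum to 1.
    """
    n = len(p)
    f = f if f is not None else [1.0] * n   # f1 = f2 = 1 implied
    o = o if o is not None else [0.0] * n   # o1 = o2 = 0 implied
    raw = [fk * pk + ok for pk, fk, ok in zip(p, f, o)]
    total = sum(raw)  # assumed to be non-zero
    return [w / total for w in raw]


def mix_activity(base, weights):
    """Linear mixing of single-valued base parameter sets."""
    return sum(w * b for w, b in zip(weights, base))


# B1/1 = 0 (music: no beam forming), B1/2 = 1 (speech in noise: full beam forming)
base = [0.0, 1.0]

# Restaurant situation with background music: p1 = 40%, p2 = 60%
a_neutral = mix_activity(base, class_weights([0.4, 0.6]))

# Same situation with the individual preference o2 = 0.3, f1 = 0.8, f2 = 1.5
a_pref = mix_activity(base, class_weights([0.4, 0.6], f=[0.8, 1.5], o=[0.0, 0.3]))
```

With neutral weights the beam former activity equals 0.4×B1/1+0.6×B1/2; with the preference settings the activity is shifted towards the speech-in-noise setting, which is the intended effect of the offsets and factors.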
Another simple example:
M=1, g1: gain model (amplification characteristic); N=2, C1: music, C2: speech. The corresponding base parameter sets B1/1, B1/2 will usually be derived in a fitting procedure and indicate the amplification, in dependence on incoming signal power, that shall be used; they are characterized, e.g., in terms of decibel values characterizing the incoming signal power and compression values characterizing the steepness of increase of the output signal with increasing incoming signal power. E.g., B1/1=(50 dB, 2.5; 90 dB, 0.8; 110 dB, 0.3; 0) indicating expansion below 50 dB, light compression up to 90 dB, strong compression up to 110 dB and limiting (infinite compression) thereabove. On the other hand, for speech, other values may be used, e.g., B1/2=(30 dB, 2.5; 80 dB, 0.4; 105 dB, 0.2; 0) indicating expansion below 30 dB, medium compression up to 80 dB, strong compression up to 105 dB and limiting thereabove. These rather arbitrarily chosen numbers for the base parameter sets shall just indicate one possible way of forming base parameter sets. Usually, gain models are furthermore frequency-dependent, so that the base parameter sets will, in addition, comprise frequency values and, accordingly, even more decibel values and compression values (for the various frequency ranges).
When the user is in an acoustic environment with p1=99% and p2=1%, i.e., the classification result implies that the current acoustic environment is practically pure music, the gain model (realized by sub-function g1) is run with (at least approximately) B1/1 (o1=o2=0, f1=f2=1 implied).
When the user is in an acoustic environment with p1=1% and p2=99%, i.e., the classification result implies that the current acoustic environment is practically pure speech, the gain model (g1) is run with (at least approximately) B1/2 (o1=o2=0, f1=f2=1 implied).
When, however, the user is in an acoustic environment with p1=40% and p2=60% (e.g., in a conversation situation with background music), i.e., the classification result implies that the current acoustic environment has aspects of music and somewhat stronger aspects of speech, the gain model (g1) is run with 0.4×B1/1+0.6×B1/2 (o1=o2=0, f1=f2=1 implied). I.e., the gain model is a linear combination of the gain model for music and the gain model for speech, obtained in processing unit 8. The activity parameter set a1 may be identical with this linear combination. Such an activity parameter set a1 is, of course, no longer just a simple strength value or activity setting; it can already be, without further processing, the set of parameters used in the corresponding sub-function.
Of course, instead of the simple linear mixing of the base parameter sets discussed above by way of example, more sophisticated (non-linear) ways of mixing the base parameter sets may also be applied.
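For the gain model example, the linear combination can be illustrated as a component-wise weighted sum of the two example parameter sets for music and speech given above. The following Python sketch assumes that each base parameter set is stored as a flat list (knee point in dB, compression value, ..., limiting value); this storage layout and the name `mix_parameter_sets` are hypothetical and serve only for illustration.

```python
def mix_parameter_sets(base_sets, weights):
    """Component-wise linear combination of equally long base parameter sets."""
    length = len(base_sets[0])
    return [
        sum(w * b[i] for w, b in zip(weights, base_sets))
        for i in range(length)
    ]


# Example base parameter sets from the text, flattened (assumed layout):
# (knee dB, compression; knee dB, compression; knee dB, compression; limiting)
B_music = [50.0, 2.5, 90.0, 0.8, 110.0, 0.3, 0.0]
B_speech = [30.0, 2.5, 80.0, 0.4, 105.0, 0.2, 0.0]

# Conversation with background music: p1 = 40%, p2 = 60%
# (o1 = o2 = 0, f1 = f2 = 1 implied, so the weights equal p1, p2)
a1 = mix_parameter_sets([B_music, B_speech], [0.4, 0.6])
```

For these numbers, the resulting activity parameter set is approximately (38 dB, 2.5; 84 dB, 0.56; 107 dB, 0.24; 0), i.e., a gain model lying between the music and the speech gain model, somewhat closer to the latter.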
Said class similarity factors p1, p2 can be obtained, e.g., in the following manner (in classifier unit 4):
In the feature extractor FE, a number of features is extracted from the input audio signals S1, e.g., rather technical characteristics like the signal power between 200 Hz and 600 Hz relative to the overall signal power and the harmonicity of the signal, or auditory-based characteristics like common build-up and decay processes and coherent amplitude modulations. Each examined feature provides for at least one value in a feature vector. For one specific current acoustic environment (represented by the input audio signals S1), the feature vector might be (3.0; 2.6; 4.1); note that there will typically be between 5 and 10 or even more features and vector components. There is one feature vector for each predetermined acoustic environment, e.g., (5.3; 1.8; 3.6) for class C1 and (1.2; 3.1; 3.9) for class C2. The class similarity factors p1, p2 are a measure for the inverse distance between the feature vector of the current acoustic environment and the feature vector of class C1 and class C2, respectively. I.e., p1, p2 are measures for the closeness of the feature vector of the current acoustic environment and the feature vector of class C1 and class C2, respectively. A measure for said distance can be obtained, e.g., as the Euclidean distance between the vectors, or by means of multivariate variance analysis. For example, the inverse of the square root of the sum of the squares of the differences between the components of the vectors can be used, i.e., $p_k = 1/\sqrt{\sum_i (x_i - c_{k,i})^2}$, where $x_i$ denotes the components of the feature vector of the current acoustic environment and $c_{k,i}$ those of the feature vector of class Ck. For the example vectors above, this yields p1≈0.40 and p2≈0.53.
In this case, the current acoustic environment is more similar to class C2 than to class C1, since p1<p2.
Of course, normalization of each feature vector component (corresponding to a specific feature), e.g., to a range from 0 to 1, and/or a normalization when determining p1, p2 is advisable, and it is also possible to weight different features differently when determining p1, p2. A suitable normalization makes it possible to generate class similarity factors which lie between 0 and 1 and can therefore be expressed in percent (%), wherein the higher (and closer to 100%) a class similarity factor is, the greater the likeness of the current acoustic environment to the corresponding predetermined acoustic environment. The p1, p2 values in the two simple examples above were assumed to be class similarity factors normalized in such a way.
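The determination of class similarity factors from feature vectors may be sketched as follows, using the example vectors given above. The inverse Euclidean distance follows the verbal description; the subsequent normalization, which simply divides each factor by the sum of all factors, is only one assumed possibility among many and is not prescribed by the text. The function names are hypothetical.

```python
import math


def inverse_distance(x, c):
    """Inverse Euclidean distance between two feature vectors.

    Note: undefined (division by zero) if both vectors are identical.
    """
    return 1.0 / math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


def similarity_factors(x, class_vectors):
    """Return (raw, normalized) class similarity factors.

    ASSUMPTION: normalization by the sum of all raw factors, so that the
    normalized factors lie between 0 and 1 and add up to 100%.
    """
    raw = [inverse_distance(x, c) for c in class_vectors]
    total = sum(raw)
    return raw, [r / total for r in raw]


current = (3.0, 2.6, 4.1)        # feature vector of the current acoustic environment
class_c1 = (5.3, 1.8, 3.6)       # feature vector of class C1
class_c2 = (1.2, 3.1, 3.9)       # feature vector of class C2

raw, normalized = similarity_factors(current, [class_c1, class_c2])
```

With these vectors, the raw factors are approximately 0.40 and 0.53, and the assumed normalization yields roughly 43% for class C1 and 57% for class C2.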
The averaging unit 9 outputs time-averaged activity parameter sets a1* . . . aM*, which are used for steering the sub-functions g1 . . . gM. The advantages of this will become clear in the following.
The above-described mixing of base parameter sets already provides for a significant improvement over prior art hearing devices, which can only run one of a number of predetermined hearing programs at a time, wherein these hearing programs correspond to base parameter sets which are optimized for a corresponding predefined class. The switching between the predetermined hearing programs in such prior art hearing devices can be annoying to the user, in particular if similarity values for competing classes are about equal to each other (e.g., about 50% for each of two classes). In that case, a frequent switching between hearing programs may occur. Since the above-described mixing of base parameter sets makes (quasi-)continuous adaptations of the transfer function G possible (without switching), smooth and agreeable changes will take place in most situations.
There are, nevertheless, situations in which undesirable, recognizable changes in the transfer function G might still occur despite the base parameter set mixing. E.g., in a car, classification may change within seconds from nearly 100% speech (conversation at a red light) to nearly 100% noise (acceleration) to nearly 100% music (car radio) to nearly 100% speech-in-noise (car radio speaker at medium or high speeds). Too fast an adaptation of the transfer function may, in such a case, be undesirable.
A preferable behaviour of the adaptation of the transfer function G shall, as far as possible, fulfill the following points:
1. Upon a changing acoustic situation, the hearing device shall change its signal processing sufficiently fast, but as inconspicuously to the user as possible. This should provide for an optimum performance during most of the time.
These features can be accomplished, at least in part, by means of the following behaviour:
a. In a constantly strongly changing situation, the partly significant changes in signal processing, which would be needed for a full adaptation to different sound classes, shall be averaged out, in order to achieve a more constant (more stable) signal processing.
b. When (after strong changes) an acoustic situation is (again) practically stable (for a certain span of time), the signal processing shall slowly fade towards the appropriate parameter set values (activity parameter sets) for this situation.
c. Only when class similarity factors have remained relatively stable for a sufficiently long time (i.e., detection of a rather constant acoustic situation for a certain span of time) shall the hearing device (again) react fast upon a detected significant change in the acoustic environment.
Such a behaviour can be readily implemented in form of software or otherwise. One exemplary implementation is shown in
a1(t) is fed to a differentiator 91, which outputs a value representative of the derivative of a1(t), i.e., a measure for the changes in a1(t). The absolute value thereof is taken (reference 92) and then integrated (summed up) in a leaky integrator 93. The leakage factor α determines how long it takes, after a series of fast input changes, until the circuit again reacts to a fast change of the input.
Accordingly, a measure for the magnitude of changes during the past time is obtained. The corresponding value can be multiplied with a base time constant t0 for adjustment. The so-obtained value is used as the time constant τ for an averager 90, which averages a1(t) during a time span τ and outputs the so-derived a1*(t).
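A discrete-time sketch of such an averaging unit is given below in Python. It is an illustration under several assumptions that are not taken from the text: the averager 90 is approximated by first-order exponential smoothing with coefficient 1/(1+τ), with τ counted in samples; the time constant applied to a sample is derived from the integrator state before that sample, so that the first change after a quiet phase passes quickly; and the values chosen for α and t0 are arbitrary.

```python
def adaptive_average(a, alpha=0.95, t0=20.0):
    """Smooth a sequence of activity values a1(t) with a change-dependent
    time constant. alpha (leakage factor) and t0 (base time constant) are
    arbitrary example values."""
    out = []
    prev = a[0]   # previous input sample, used by the differentiator
    y = a[0]      # averager state, i.e., a1*(t)
    m = 0.0       # leaky-integrator state: magnitude of recent changes
    for x in a:
        tau = t0 * m                 # time constant from past changes only (assumption)
        y += (x - y) / (1.0 + tau)   # averager 90; tau = 0 means "follow immediately"
        out.append(y)
        # differentiator 91, absolute value 92, leaky integrator 93
        m = alpha * m + abs(x - prev)
        prev = x
    return out


# Stable situation followed by a single change: the output reacts at once.
step = adaptive_average([0.0] * 50 + [1.0] * 50)

# Constantly changing situation, followed by a stable one: the changes are
# averaged out, then the output fades towards the new stable value.
busy = adaptive_average([0.0, 1.0] * 100 + [1.0] * 2000)
```

In this sketch, a larger α lets the memory of past changes decay more slowly, so that it takes longer until the unit reacts fast again, while t0 scales how strongly past changes slow down the adaptation.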
When many fast changes occur, using an averager with different attack and release time constants (not shown) allows the output of the averaging unit to settle at a predetermined percentage of the dynamic range of these changes. Only when the input to the averaging unit settles will the output of the averaging unit slowly follow.
Both the averaging in the averaging unit 9 and the processing in the processing unit 8 may be adjusted individually for different parameters of an activity parameter set and/or for parameter sets for different sub-functions.
E.g., for sub-functions which tend to strongly annoy the user when subject to rapid changes, greater time constants for averaging may be chosen (e.g., via t0), whereas a more rapid following of a1(t) by a1*(t) may be chosen for sub-functions that result in less strong irritations when changed. In the case of an averager with different attack and release time constants (not shown), different ratios of attack time constants to release time constants may be chosen for different sub-functions.
As has already been stated above, it is possible to have just one single parameter as a1 for a sub-function. That parameter can be considered the “strength” or the “activity” of the sub-function.
It is to be noted that a time-averaging like the one described above may be used not only for activity parameters (or, more particularly, for each value or number of an activity parameter set), but also, in general, for smoothing any other adjustments of a transfer function G. It is applicable to any (dynamically and/or continuously) adjustable processing algorithm.
It is furthermore to be noted that the various units and parts in the Figures are merely logic units. They may be implemented in various ways, e.g., all in one processor chip or distributed over a number of processors; in one or several pieces of software and so on.
Number | Date | Country
---|---|---
60747330 | May 2006 | US