The invention relates to a method of evaluating perception intensity of an audio signal as stated in claim 1.
Perception intensity estimates of audio signals have been the subject of research for decades. Although audio signal processing and acoustic engineering have made significant progress with respect to different aspects of recording, engineering, storage and reproduction, many key issues have been left as they originally were, namely as aspects to be dealt with on the basis of the subjective analysis of a skilled sound engineer. This manual approach to several key issues is, of course, acceptable to the degree that the individual preference of the recipient, i.e. the listener, determines the individual opinion of the quality of the perceived audio signal.
For different purposes it would, however, be advantageous if a more automated approach to the processing of audio signals were possible. One of these purposes is loudness estimation, which relates to different listeners' perception of how loud a given signal is. Automated loudness estimation of audio signals is needed for purposes such as automatic gain control in relation to broadcasting or, e.g., reproduction of audio signals in a car.
A problem related to the measurement of loudness is that, as has been well accepted for many years, the loudness perception of an audio signal cannot be obtained by just a straightforward measurement and subsequent processing of the audio signal to be evaluated.
A more advanced example of loudness estimation is disclosed in US 2004/0044525 A1, where loudness estimation is based on the assumption that the loudness of speech must be evaluated differently from that of other audio signal components. A problem of the disclosed method is that a signal to be evaluated must initially be processed for the purpose of identifying and separating speech components, which is a relatively complicated and processing-intensive task.
It is the object of the invention to obtain a relatively straightforward and universal loudness evaluation and estimation, which may also serve as the basis for automated gain control.
The invention relates to a method of evaluating perception intensity of an audio input signal (IS) comprising the steps of
receiving the audio input signal (IS),
estimating a time variant distribution function (TVDF) on the basis of said audio input signal (IS) or a derivative thereof,
determining the perception intensity as at least one perception intensity estimate (PIE) on the basis of said estimated time variant distribution function (TVDF).
According to the invention, a perception intensity is obtained on the basis of a time variant distribution function, thereby obtaining an advantageous, universal and flexible determination of perception intensity. The universal applicability is basically obtained due to the fact that a distribution function may match and describe audio input signals of very different natures. Thus, according to the invention, even speech, music and noise may be evaluated on the basis of a distribution function.
In an embodiment of the invention said estimation of a time variant distribution function (TVDF) refers to the audio input signal (IS).
According to an embodiment of the invention, the time variant distribution function should, preferably, be estimated on the basis of the input signal itself; in other words, a feed-forward implementation of the invention. Alternatively, the estimation according to the invention may also be performed on the basis of the output signal.
In an embodiment of the invention said estimation of a time variant distribution function (TVDF) is made on the basis of a modified audio input signal (MIS).
According to an embodiment of the invention, the time variant distribution function should, preferably, be estimated on the basis of the actually modified audio input signal; in other words, a feed-back implementation of the invention.
In an embodiment of the invention said audio input signal comprises a sequence of input samples (IS).
According to a preferred embodiment of the invention, establishment of one perception intensity estimate in the form of a sample should be made on the basis of several audio input signal representative samples, preferably at least two, in order to benefit from the signal history.
In an embodiment of the invention said perception intensity estimate comprises an output sample.
In an embodiment of the invention said time variant distribution function (TVDF) is estimated by a shape description of a distribution function.
Basically, according to the invention, a shape description should make use of not just a simple representation or a single point of such a distribution, but rather a representation of the variation of the distribution function. In this specific context, variation should not be regarded as a strict mathematical expression, e.g. only variance, but should rather reflect the fact that the shape of a distribution function may vary and that this variation may be estimated for the purpose of obtaining an advantageous evaluation of perception intensity. In this context it should also be stressed that a shape description may also comprise parameters or measures which do not relate to a specific point of the distribution function. On the other hand, such parameters or measures should of course be derived from the distribution function.
Note that the shape refers to a time variant distribution function and thus also comprises a location and a scale. Consequently, the shape may form a basis for derivation or direct extraction of relevant feature parameters of the time variant distribution function.
In an embodiment of the invention said time variant distribution comprises an amplitude distribution function.
In an embodiment of the invention said time variant distribution comprises a power distribution function.
In an embodiment of the invention said time variant distribution comprises a sound intensity distribution function.
In an embodiment of the invention said time variant distribution comprises a two-dimensional distribution function.
In an embodiment of the invention the determining of the perception intensity estimate (PIE) is made on the basis of at least two time variant distribution functions (TVDF) estimated at at least two different times.
In an embodiment of the invention the determining of the perception intensity representative output samples (OS) is made on the basis of a weighted accumulation of at least two time variant distribution functions (TVDF) estimated at at least two different times.
According to a preferred embodiment of the invention, the estimated time variant distribution function (TVDF) should be weighted over time in order to facilitate the desired derivation of perception intensity. This feature is particularly advantageous when the perception intensity to be determined relates to a loudness estimate.
In an embodiment of the invention an output sample (OS) is determined on the basis of at least two audio input samples (IS).
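The weighted accumulation over time may be pictured with the following minimal sketch. It is an illustration only: representing each distribution estimate as a normalised histogram, the exponential forgetting factor `decay`, and the function name are assumptions that do not come from the text.

```python
def accumulate_distributions(histograms, decay=0.5):
    """Combine distribution estimates taken at different times.

    histograms: list of equally sized, normalised histograms, oldest first.
    decay: hypothetical forgetting factor; the newest estimate gets weight 1,
           the one before it `decay`, then `decay**2`, and so on.
    """
    n_bins = len(histograms[0])
    accumulated = [0.0] * n_bins
    total_weight = 0.0
    for age, hist in enumerate(reversed(histograms)):
        weight = decay ** age
        total_weight += weight
        for k in range(n_bins):
            accumulated[k] += weight * hist[k]
    # Renormalise so that the result is again a distribution.
    return [value / total_weight for value in accumulated]
```

With two estimates and `decay=0.5`, the newer estimate contributes two thirds and the older one third of the accumulated distribution.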
According to a preferred embodiment of the invention an output sample should, preferably, be based on at least two input samples, thereby obtaining an advantageous description of an input signal, which may broadly be applied for the derivation of a perceptual intensity of representations of audio signals of very different nature.
In an embodiment of the invention the determining of the perception intensity on the basis of said estimated time variant distribution function (TVDF) is done according to at least one non-linear function (NLF).
According to an advantageous embodiment of the invention, a loudness estimate is based on the determination of at least two different statistical functions characterising the evaluated input signal, on the basis of non-linear signal processing.
A typical modification would be applied for the purpose of obtaining automatic equalisation of loudness, although other types of gain control may be applied within the scope of the invention.
According to the invention, a non-linearity may form a necessary and advantageous way of deriving a representative loudness estimate.
In an embodiment of the invention said at least one non-linear function (NLF) is established by an artificial neural network (ANN).
In an embodiment of the invention said artificial neural network comprises a multilayer perceptron.
In an embodiment of the invention said at least one non-linear function is established by means of polynomial fitting.
In an embodiment of the invention said at least one non-linear function is established by means of splining.
In an embodiment of the invention the evaluation is established by a serial arrangement, a parallel arrangement, or a combination thereof, of at least two non-linear functions (NLF).
According to the invention, an overall desired evaluation may advantageously be split up in several different non-linear signal processing steps. Examples of such splitting may, e.g., comprise a pre-processing of an input signal performed by at least one non-linear function in one or several bands or partial representations of the input signal prior to a non-linear processing of the individual or combined representations obtained by the pre-processing. An example of such pre-processing may, e.g., be establishment of non-linear typically well-known statistical functions representing the input signal in one or several bands according to predetermined signal processing and subsequently performing a signal processing of the combined signals on the basis of one or several non-linear functions. The subsequent one or several non-linear functions will typically be non-linear functions adapted specifically for the purpose of bringing the result of the established pre-filtering into an estimate of perception intensity.
Evidently, further processing steps than the above-described may be inserted prior to, between and after the above-explained processing steps.
In an embodiment of the invention said perception intensity comprises loudness.
In an embodiment of the invention said perception intensity comprises sharpness, annoyance, airiness, punchiness, brilliance, presence, fatness, deepness and edginess or any combination thereof.
In an embodiment of the invention the estimation of said time variant distribution function (TVDF) is made on the basis of at least two different feature characterizing parameters of said audio input signal (IS).
In an embodiment of the invention at least one of said at least two different characterizing functions comprises a time variant statistical function.
According to a preferred embodiment of the invention, two statistical functions are applied as a combined representation of the desired time variant distribution function.
In an embodiment of the invention at least one of said feature characterizing parameters comprises a central value over time, such as a mean value, an average value and/or a median.
In an embodiment of the invention at least one of said feature characterizing parameters comprises a measure of the spread over time, such as a standard deviation, a variance or an inter quartile range.
In an embodiment of the invention preprocessing of the audio input signal is done prior to the establishment of said at least two feature characterizing parameters.
In an embodiment of the invention said time variant distribution function is determined in a time window.
According to an advantageous embodiment of the invention, the time variant distribution function should be determined as a function of time and in a time window of the input signal. In this way, a runtime updating of the perception intensity may be obtained and, moreover, applying a time window gives the method a memory with respect to previous behavior of the input signal.
A runtime window may range from, e.g., approximately 1/10 second up to, e.g., 30 seconds. Evidently, the window may in principle be much longer than 30 seconds, depending solely on the input signal to be evaluated and the intentions of the user. An overall perception intensity of an audio signal, e.g. an audio track of a CD or a signal of several minutes, may thus be evaluated according to the invention if so desired.
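The windowed determination may be sketched as follows, assuming a sampled input signal. The function names, the use of the median as the per-window statistic, and the separate hop interval between successive windows are illustrative assumptions.

```python
import statistics

def window_bounds(n_samples, sample_rate, window_s, hop_s):
    """Return (start, end) sample indices of successive analysis windows.

    window_s and hop_s are given in seconds; they need not be equal.
    """
    win = max(1, int(round(window_s * sample_rate)))
    hop = max(1, int(round(hop_s * sample_rate)))
    return [(start, start + win)
            for start in range(0, n_samples - win + 1, hop)]

def windowed_median(samples, sample_rate, window_s, hop_s):
    """One central-value estimate per window, giving a time variant feature."""
    bounds = window_bounds(len(samples), sample_rate, window_s, hop_s)
    return [statistics.median(samples[a:b]) for a, b in bounds]
```

A long window gives the estimate more memory of the past signal, whereas a short hop gives more frequent updates.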
In an embodiment of the invention at least two different partial representations (PR1, PR2, . . . PRn) of the audio input signal (IS) are established,
at least two different statistical functions (SF1, SF2, SFn) are established on the basis of at least one of said different partial representations (PR1, PR2, . . . PRn) of said audio input signal (IS),
said determined statistical functions are combined into a loudness representation by means of at least one non-linear signal processing.
According to a preferred embodiment of the invention, the loudness estimation is initially performed on the basis of an (initial) individual analysis of different bands of the complete audio input signals, which are subsequently combined into at least one, preferably one, combined loudness estimate.
In an embodiment of the invention said audio input signal is modified on the basis of said evaluated perception intensity.
According to the invention the evaluated perception intensity should preferably form the basis of a modification of the input signal or an input signal corresponding thereto. The modification should preferably be automatic by means of signal processing hardware.
In an embodiment of the invention said modifying of the audio input signal is performed as a gain control of the complete or a part of the audio input signal (IS).
According to an embodiment of the invention, different controlling of the input signal may be performed on the basis of the determined loudness estimate although a simple straightforward gain control may typically be quite sufficient in order to establish, e.g., a somewhat smoothed loudness between different input signals. In some embodiments, however, a gain control may, e.g., be narrowed to a certain band or certain bands, e.g. by a boosting or a damping of parts or a part of the input signal.
In an embodiment of the invention said audio input signal (IS) comprises a multichannel signal.
According to an embodiment of the invention, a multichannel signal may, e.g., comprise a stereo signal, a five or six-channel surround sound signal format, etc., all representing an audio representation which may be evaluated advantageously into one or a number of perception intensity representations. One of these may, e.g., be an overall loudness perception intensity of the complete multichannel signal.
In an embodiment of the invention the perception intensity refers to one shared parameter evaluation of the audio input signal or a derivative thereof.
In an embodiment of the invention the audio input signal or a derivative thereof is evaluated with respect to two or more different types of perception intensity and combinations thereof.
According to an embodiment of the invention, the perception intensity of an audio input signal may comprise sharpness, annoyance, airiness, punchiness, brilliance, presence, fatness, deepness and edginess or any combination thereof. For instance, an example of a more complex evaluation of an input signal would be an evaluation of a 5.1 audio input signal with respect to loudness and annoyance.
In an embodiment of the invention said method is implemented in signal processing hardware, such as a digital signal processor and optional supporting electrical circuitry.
In an embodiment of the invention said non-linear function (NLF) is established on the basis of adaptation data (AD).
Adaptation data AD could, e.g., be obtained by registering the user behavior of a signal processing device, e.g. a consumer amplifier, and modifying the performed signal processing accordingly. A specific example of such an embodiment may be an amplifier which may be used in a “learn-mode” by a user, combined with registered user behavior, e.g. registered user settings, modifying the function of the block ASP. This embodiment is particularly advantageous when applying a non-linear transfer function established by a neural network, as the learn mode may be activated on a run-time basis if so desired.
Adaptation data AD could also be a previously collected data set.
Moreover, the invention relates to a perception intensity estimating device comprising signal processing means performing the method according to any of the claims 1-34.
In an embodiment of the invention, the device comprises monitoring means for displaying the estimated perception intensity.
In an embodiment of the invention, the device comprises control means for controlling connected electronic circuitry in response to the established perception intensity.
Moreover, the invention relates to the use of perception intensity established according to any of the claims 1-34 for automatic control of electronic circuitry.
The invention will now be described with reference to the drawings.
Initially, an embodiment of the invention will be described specifically with reference to a specific time varying audio sequence and related to loudness evaluation.
A more detailed and general explanation of the invention will be given subsequently.
Basically, the illustrated audio signal was constructed from six different audio signals, each contributing a two-second sound segment window taken from one of the following sound segments:
A) 1 kHz tone,
B) Pink noise
C) Reference female speech
D) Rock music
E) Big band jazz
F) Clarinet duet
According to the invention an audio input signal, preferably in the form of one or a number of sample streams, should initially be processed in order to extract the necessary and sufficient input signal characterizing features. Examples of such time variant characterizing features are inter quartile range, median, sum of squares, percentiles, average, maximum, minimum, standard deviation, sum or variance and combinations thereof. The combination of these characterizing features should, according to the invention, characterize the distribution function of the audio input signals. The necessary exactness of the time varying functions may vary depending on the desired type of evaluation and the type of input signal. It is generally desired that a two-dimensional representation of the time varying distribution function representing the input signal is obtained.
The specifically chosen and illustrated parameters are statistical parameters such as maximum, median and inter quartile range (IQR), defined as the distance between the first and third quartile of a specific statistical representation of an input audio signal. The illustrated characterizing features are well-known within the art.
In the following, each of the abovementioned six two-second segments will be analyzed individually and non-overlapping in a single frequency band. The two calculated signal features are: the median and the inter-quartile range (IQR) of the dB magnitude of the signal. These two functions are commonly used in descriptive statistics as robust measurements of the central tendency and the spread of a distribution, respectively.
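The two features may be computed along the following lines. This is a minimal sketch: the function names, the level floor that avoids taking the logarithm of zero, and the quartile interpolation rule (`method="inclusive"`) are assumptions, since the text does not specify them.

```python
import math
import statistics

def db_magnitude(samples, floor=1e-10):
    """Magnitude of each sample in dB; `floor` is a hypothetical lower limit."""
    return [20.0 * math.log10(max(abs(x), floor)) for x in samples]

def median_and_iqr(values):
    """Central tendency (median) and spread (inter-quartile range)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q3 - q1

def segment_features(samples):
    """Median and IQR of the dB magnitude of one sound segment."""
    return median_and_iqr(db_magnitude(samples))
```

Both statistics are robust in the sense that a few extreme sample values barely move them, which is why they are commonly preferred over the mean and the standard deviation for skewed data.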
According to the illustrated embodiment of an evaluation of perception intensity—in this embodiment loudness—the input audio signal is initially divided into nine octave bands B1 to B9. The magnitude in each octave frequency band B1 to B9 is illustrated in
An audio input signal representation IS is input to a block FPE performing feature parameter extraction. The feature parameter extraction serves to represent the input signal IS in a form suitable for the further evaluation of the signal.
The audio representative input signal must be represented in a certain way to facilitate the desired evaluation of perception intensity. Basically an at least two-dimensional statistical description over time of the input signal must be estimated for the purpose of evaluating perception intensity according to the invention. More specifically such a two-dimensional description of the input signal is referred to as a distribution function of the input signal.
Several different statistical functions may be applied within the scope of the invention. Examples of such functions are inter quartile range, median, sum of squares, percentiles, average, maximum, minimum, standard deviation, sum and variance.
It must be stressed that the description of the shape of the distribution function may be obtained in several different ways, e.g. by means of at least two at least partly linearly independent functions. Evidently, further descriptive parameters, i.e. further dimensions serving the purpose of providing a more detailed description of the distribution function, may be applied according to the invention. It should also be noted that a partial description of the distribution function of the input signal according to the invention may also be obtained by more conventional filtering not typically regarded as a statistical function. An example of such is a mean value over a time interval, which may, e.g., be obtained by a conventional integrating filter.
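As an illustration of the last point, a running mean may be approximated by a first-order integrating (leaky) filter. The sketch below is one common form of such a filter; the function name and the smoothing coefficient `alpha` are assumptions.

```python
def running_mean(samples, alpha=0.05):
    """Leaky integrator: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    A smaller alpha averages over a longer stretch of the signal.
    """
    y = 0.0
    out = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out
```

For a constant input the output settles on that constant, i.e. on its mean value, without any explicit windowing or sorting of samples.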
It should, moreover, be noted that the shape of a distribution function preferably refers to a shape of a function which has been fixed with respect to the axis of the distribution function.
In this context it should, generally, be stressed that various processing may occur both prior to and subsequent to the estimation of a distribution function of an input audio signal within the scope of the invention. Examples of such pre- or post-processing are the use of an asymmetrical low pass filter, rectification, squaring, evaluation of power functions, taking the logarithm, etc.
Another example is an initial band-pass filtering of an input audio signal into two or several bands for the purpose of individual handling of the different bands prior to the estimation of perception intensity. Such initial splitting of the input signal into different bands may, e.g., ease the process of establishing a non-linear function fitting a relevant perception intensity reference database.
Generally, such preprocessing is preferred, for the purpose of reducing the complexity of the subsequent establishment of a perception intensity estimate.
Specific examples of feature parameters of an input signal have already been given in FIGS. 3 to 8.
The length of the time intervals of the input signal applied for extraction of feature parameters may vary from application to application. Likewise, the interval between the evaluation of a new perception intensity estimate may vary. The two mentioned intervals do not necessarily need to be identical.
In the next block SP a signal processing is performed and a resulting perception intensity estimate PIE is output.
It is stressed that the invention, although very advantageous with respect to loudness as explained above, may be utilized for evaluation of very different types of perception intensity such as sharpness, annoyance, and airiness. In this context it is noted that the invention features a very advantageous adaptation to each purpose as the invention basically needs to adapt ultimately one non-linear function to the purpose as the rest of the processing equipment and critical settings may be fixed or principally fixed. In this context it is noted that an initial setting of a non-linear function may be changed over time, e.g. on the basis of user behavior.
According to an advantageous and preferred embodiment of the invention the signal processing performed in the block SP is based on a non-linear transfer function.
The preferred processing of the estimated distribution function is non-linear as the available non-linear processing is very advantageous in connection with complex evaluation of two or several input parameters. One reason is that a non-linear function may be established on the basis of a multidimensional input by machine-learning, e.g. by means of a neural network.
Several different non-linear functions may, generally, be applied according to the invention. Examples of such functions will be given below.
The non-linear function has generally proven to be very advantageous for the purpose of evaluating perception intensity, and it has proved to be a particularly strong evaluation basis when evaluating audio signals represented by distribution function descriptive parameters. Preferred descriptive parameters comprise two substantially orthogonal or linearly independent descriptive parameters expressing a central tendency and a spread of the distribution of, preferably, the amplitude of an input signal.
The resulting perception intensity estimate PIE may, e.g., be fed to a perception intensity meter for a run-time monitoring of the perception intensity of the input signal IS. An example of such a meter may be a loudness meter.
Evidently, several other blocks or steps may be added to the illustrated embodiment between the processing blocks and as pre-processing, post-processing or combinations thereof. An example of such embodiment will be described subsequently with reference to
In this embodiment, features of an input signal IS are extracted in a feature extraction block FPE, and a perception intensity estimate is subsequently established on the basis of the distribution function established by block FPE.
Moreover, the input signal IS is bypassed to a signal processing block SPA and the input signal IS may then be processed according to the perception intensity estimate PIE established by the block SP. The resulting modified audio signal MIS is subsequently output. A real-life example of such an embodiment is an automatic gain control of an input signal IS.
An input signal IS is fed to a signal processing block SPA and the input signal IS may then be processed according to the perception intensity estimate PIE established by the block SP. The resulting modified audio signal MIS is subsequently output. A real-life example of such an embodiment is an automatic gain control of an input signal IS. According to this embodiment, however, the feature extraction is performed on the resulting modified output signal.
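The difference between the feed-forward and the feed-back arrangement may be sketched as follows. This is an illustration only: `level_db` is a simple block RMS level used as a stand-in for the blocks FPE and SP, not the perception intensity estimator of the invention, and the function names, `target_db` and the loop step size `step` are assumptions.

```python
import math

def level_db(block, floor=1e-10):
    """Stand-in estimator: RMS level of one block in dB (assumption)."""
    rms = math.sqrt(sum(x * x for x in block) / len(block))
    return 20.0 * math.log10(max(rms, floor))

def feed_forward_agc(blocks, target_db):
    """Estimate on the input IS, then apply the gain to that same block."""
    out = []
    for block in blocks:
        gain = 10.0 ** ((target_db - level_db(block)) / 20.0)
        out.append([gain * x for x in block])
    return out

def feedback_agc(blocks, target_db, step=0.5):
    """Estimate on the modified signal MIS and correct the gain gradually."""
    gain_db = 0.0
    out = []
    for block in blocks:
        gain = 10.0 ** (gain_db / 20.0)
        modified = [gain * x for x in block]
        out.append(modified)
        # The error measured at the output steers the gain for the next block.
        gain_db += step * (target_db - level_db(modified))
    return out
```

In the feed-forward case the target level is reached within the block itself, whereas in the feed-back case the output approaches the target over a number of blocks.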
The adaptive signal processing block is adapted for adaptation data AD. Adaptation data AD could, e.g., be obtained by registering the user behavior of a signal processing device, e.g. a consumer amplifier, and modifying the performed signal processing accordingly. A specific example of such an embodiment may be an amplifier which may be used in a “learn-mode” by a user, combined with registered user behavior, e.g. registered user settings, modifying the function of the block ASP. This embodiment is particularly advantageous when applying a non-linear transfer function established by a neural network, as the learn mode may be activated on a run-time basis if so desired.
Adaptation data AD could also be a previously collected data set.
The described flow chart may, e.g., be implemented in a signal processing device or signal processing circuitry described in principles according to
Initially, in step 100 an audio signal representation is provided, typically in the form of a digital audio signal. Evidently, an analog program material may be applied although an initial A/D conversion would be strongly preferred for the purpose of a subsequent streamlined and efficient signal processing.
In step 101 a time window is applied to the provided audio signal representation. In the illustrated embodiment, the selected window is chosen to be the individual sound segments; that is, the six different audio signals as explained with reference to
In step 102 the input audio signal is normalised in level in order to optimize use of the dynamic range of the following steps. The normalization is performed by using a weighted RMS measurement. This level normalisation is compensated at the end of the measurement procedure.
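A minimal sketch of such a level normalisation is given below. The text calls for a weighted RMS measurement but does not state the weighting here, so the sketch uses a plain unweighted RMS as a stand-in; the function name and the level floor are likewise assumptions.

```python
import math

def normalise_level(samples, floor=1e-10):
    """Scale the signal to unit RMS and return the removed level in dB.

    The returned level is kept so that the normalisation can be
    compensated at the end of the measurement procedure.
    """
    rms = max(math.sqrt(sum(x * x for x in samples) / len(samples)), floor)
    level_db = 20.0 * math.log10(rms)
    return [x / rms for x in samples], level_db
```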
In step 113 a broadband crest parameter is calculated as the ratio between the overall unweighted RMS value and a pseudo peak value (attack time 1 ms). This value, Crest, is converted into dB.
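The crest parameter may be sketched as follows. Several details are assumptions: the pseudo peak detector is modelled as a one-pole follower that rises with a 1 ms time constant and holds its value when the signal falls, and the ratio is taken as pseudo peak over RMS so that it agrees with the per-band crest described for step 114.

```python
import math

def pseudo_peak(samples, sample_rate, attack_s=0.001):
    """Peak follower with a finite attack time and indefinite hold (assumption)."""
    coeff = 1.0 - math.exp(-1.0 / (attack_s * sample_rate))
    env = 0.0
    for x in samples:
        rectified = abs(x)
        if rectified > env:
            env += coeff * (rectified - env)
    return env

def crest_db(samples, sample_rate, floor=1e-10):
    """Broadband crest: pseudo peak relative to unweighted RMS, in dB."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    peak = pseudo_peak(samples, sample_rate)
    return 20.0 * math.log10(max(peak, floor) / max(rms, floor))
```

Because of the finite attack time, transients much shorter than 1 ms are registered below their true peak value.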
In step 103 a filterbank is applied as a rough approximation of the frequency analysis in the human ear. The applied filters are octave wide, and an overall bandwidth limitation is also applied.
In step 104 a full wave rectification is applied to the processed signal. Thus, the output of each band is passed through an absolute value (abs) function. This implies that the loudness measurement method is insensitive to the absolute phase of the input signal.
In step 114, for each band, the BandCrest is the maximum value divided by the overall RMS value per band. This value is converted into dB. The BandCrest vector contains one value for each frequency band.
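The per-band crest calculation may be sketched as follows, assuming that each band is available as a list of samples. The function name and the level floor are assumptions.

```python
import math

def band_crest_db(band_signals, floor=1e-10):
    """Return one crest value in dB per frequency band.

    band_signals: list of bands, each a list of samples.
    """
    crest = []
    for band in band_signals:
        rms = math.sqrt(sum(x * x for x in band) / len(band))
        peak = max(abs(x) for x in band)
        crest.append(20.0 * math.log10(max(peak, floor) / max(rms, floor)))
    return crest
```

A band whose energy is spread evenly over time gives a value near 0 dB, whereas a band dominated by isolated peaks gives a large value.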
In step 105, each of the rectified filter output signals is filtered with a first order low pass filter with asymmetric time constants to extract the short-term envelope of each band. For rising level the time constant (natural logarithm based) is 20 ms; for falling level the time constant is 50 ms.
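Steps 104 and 105 may be sketched together as a rectifier followed by a one-pole low pass filter whose coefficient depends on whether the level is rising or falling. The function name and the mapping from time constant to filter coefficient, exp(-1/(tau·fs)), are assumptions consistent with a natural logarithm based time constant.

```python
import math

def short_term_envelope(samples, sample_rate, rise_s=0.020, fall_s=0.050):
    """Full-wave rectify, then smooth with asymmetric time constants."""
    a_rise = math.exp(-1.0 / (rise_s * sample_rate))
    a_fall = math.exp(-1.0 / (fall_s * sample_rate))
    env = 0.0
    out = []
    for x in samples:
        rectified = abs(x)  # step 104: full wave rectification
        a = a_rise if rectified > env else a_fall
        env = a * env + (1.0 - a) * rectified  # step 105: first order low pass
        out.append(env)
    return out
```

After a step from silence to a constant level, the envelope reaches about 63% of that level in 20 ms; after the signal stops, it decays to about 37% in 50 ms.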
In step 106 the level of the processed signal is converted to level in dB by taking 20 times the logarithm (base 10) of the envelope.
In step 107, for each band, two percentiles are calculated: the 50th percentile (corresponding to the median) and the 90th percentile (corresponding to the value which 10% of the values are above). These two statistics are referred to as the lower and the upper percentile, respectively.
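Steps 106 and 107 may be sketched for a single band as follows. The function names, the level floor and the percentile interpolation rule (`method="inclusive"`) are assumptions; the text only names the two percentiles.

```python
import math
import statistics

def envelope_to_db(envelope, floor=1e-10):
    """Step 106: convert the envelope to level in dB."""
    return [20.0 * math.log10(max(v, floor)) for v in envelope]

def lower_upper_percentiles(levels_db):
    """Step 107: 50th and 90th percentile, plus their difference."""
    deciles = statistics.quantiles(levels_db, n=10, method="inclusive")
    lower = statistics.median(levels_db)  # 50th percentile
    upper = deciles[8]                    # 90th percentile
    return lower, upper, upper - lower
```

The difference between the upper and the lower percentile is a simple measure of how far the louder parts of a band rise above its typical level.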
In step 108 a feature vector is constructed from the following parameters:
Each of the linear combinations is implemented by first subtracting a constant value from each contributing parameter, and then multiplying the result by another constant value.
Finally, the products are summed:
N is the number of parameters in each vector. For the percentile differences N=9. For the crest parameters, N=10.
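The summation formula itself is not reproduced in this text. A form consistent with the description above (subtract a constant, multiply by another constant, sum the products) would be the following, where the symbols $x_i$ for the contributing parameters, $a_i$ and $b_i$ for the two constants, and $y$ for the resulting feature are chosen here for illustration only:

```latex
y = \sum_{i=1}^{N} b_i \,\left( x_i - a_i \right)
```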
In step 109 the non-linear function is established for the purpose of mapping the feature parameters into a loudness estimate.
To estimate the loudness value based on the feature set an artificial neural network is employed. The applied network comprises a multi-layer perceptron type having a tan-sigmoid activation function for the units in the single hidden layer and, moreover, it comprises a single output unit with a linear activation function. The tan-sigmoid activation function is expressed as:
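The expression itself is not reproduced in this text. For reference, the tan-sigmoid is conventionally defined as shown below and is mathematically identical to the hyperbolic tangent; the symbol $x$ for the summed input of a unit is chosen here for illustration:

```latex
\operatorname{tansig}(x) = \frac{2}{1 + e^{-2x}} - 1 = \tanh(x)
```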
The topology of the neural network is as follows: There are thirteen input units (normalised features). The first nine represent bands 1-9 from the reference signal, the last 2 plus 2 are the percentile difference and crest features, respectively. These thirteen input units are connected to hidden-layer units of the ANN, and the hidden-layer units are in turn connected to the single output unit. The input to the neural network, thus, consists of the 9+2+2 feature parameters, normalised by addition of real-valued constants in the range [−50,50], and multiplication by real-valued constants in the range [0,10]. The weights connecting the units of the network are optimised to predict the perceived loudness. The neural network weights are real-valued constants in the range [−16,16], and the bias values are real-valued constants in the range [−3,71].
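The forward pass of such a network may be sketched as follows. The function and parameter names are assumptions, the number of hidden units is not stated in the text and is therefore left to the caller, and any weights passed in are placeholders rather than the optimised values.

```python
import math

def mlp_loudness(features, offsets, scales, w_hidden, b_hidden, w_out, b_out):
    """Single hidden layer perceptron with tanh hidden units and a linear output.

    features: the 13 feature parameters (9 band values, 2 percentile
              difference features, 2 crest features).
    offsets, scales: per-feature normalisation constants (add, then multiply).
    w_hidden: one weight row per hidden unit; b_hidden: one bias per hidden unit.
    w_out, b_out: weights and bias of the single linear output unit.
    """
    x = [(f + o) * s for f, o, s in zip(features, offsets, scales)]
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```

The output is still in the normalised level domain and is de-normalised afterwards.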
In step 110 a loudness estimate is determined on the basis of the above-described non-linear function provided according to the previous step.
The last step in computing the relative loudness level value consists of de-normalising the output of the neural network. This may be done by adding the weighted level measured at the start in step 102 to the output of the neural network.
In step 115 the loudness of a reference signal is provided.
Using the model as described in the previous, the loudness of a reference signal is estimated corresponding to the output of block 110. This value is kept as a constant within the model in order to enable calculation of gain correction values. The model itself does not assume any particular relationship between digital levels and playback SPL but a practical value for some purposes would be 100 dB SPL for digital full scale. With this assumption the loudness level estimate of a specific reference signal used is: 72.2 dB (phon).
In steps 111 and 112, a gain correction is computed.
This is done by subtracting the measured loudness estimate from the stored reference loudness. The result is the desired relative loudness estimate, expressed as the gain correction required to bring the tested sound segment to the same perceived level as the reference segment. Evidently, such an estimate may freely be established or calculated according to other methods or ideas of presentation.
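The de-normalisation and the gain correction amount to one addition and one subtraction, and may be sketched as follows. The function names and the numeric inputs are hypothetical; only the reference value of 72.2 is taken from the description, and it holds under the stated assumption of 100 dB SPL for digital full scale.

```python
REFERENCE_LOUDNESS = 72.2  # stored reference loudness estimate, in dB (phon)

def denormalise(network_output, weighted_level):
    # Add the weighted level measured in step 102 back to the network output.
    return network_output + weighted_level

def gain_correction(reference_loudness, estimated_loudness):
    # Positive result: the tested segment must be raised to match the reference.
    # Negative result: the tested segment must be attenuated.
    return reference_loudness - estimated_loudness

# Hypothetical values for a tested sound segment.
estimate = denormalise(network_output=4.0, weighted_level=70.0)
gain = gain_correction(REFERENCE_LOUDNESS, estimate)
print(estimate, gain)
```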
Note that certain steps of the above-described flowchart may be omitted and that the flow chart may include several further process steps within the scope of the invention.
Distribution function characterizing parameters other than those listed above may also be applied according to the invention. Examples of such parameters are listed below. Moreover, it should be noted that the distribution function may be estimated by more than two characterizing parameters, e.g. four, namely a combination of the illustrated parameters of
Applicable distribution function characterizing parameters.
Below is a list of various common scalar or 1-dimensional statistical parameters that may characterize the distribution of a given data sample. For instance, the location, the spread, or the symmetry of the distribution may be measured. In each case, the parameter is calculated from a set of n sample values, denoted xi (i=1 . . . n).
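Mean Values:
The arithmetic mean,
x̄=(x1+x2+ . . . +xn)/n
The geometric mean,
GM=(x1·x2· . . . ·xn)^(1/n)
The harmonic mean,
HM=n/(1/x1+1/x2+ . . . +1/xn)
Variance and Standard Deviation:
The sample variance,
s²=((x1−x̄)²+(x2−x̄)²+ . . . +(xn−x̄)²)/(n−1)
The standard deviation,
s=√(s²)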
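The mean and spread parameters above may be computed as in the following sketch. The function names are hypothetical, and the sample variance is assumed to use the divisor n−1. Note that the geometric and harmonic means are only defined for strictly positive sample values.

```python
import math

def arithmetic_mean(x):
    return sum(x) / len(x)

def geometric_mean(x):
    # Computed via logarithms; requires strictly positive values.
    return math.exp(sum(math.log(v) for v in x) / len(x))

def harmonic_mean(x):
    # Requires non-zero values.
    return len(x) / sum(1.0 / v for v in x)

def sample_variance(x):
    # Assumption: unbiased form with divisor n - 1.
    m = arithmetic_mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

def standard_deviation(x):
    return math.sqrt(sample_variance(x))

samples = [1.0, 2.0, 4.0]
print(arithmetic_mean(samples), geometric_mean(samples), harmonic_mean(samples))
print(sample_variance(samples), standard_deviation(samples))
```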
Average Absolute Deviation and Median Absolute Deviation
The average absolute deviation (AAD) is defined as,
AAD=(|x1−x̄|+|x2−x̄|+ . . . +|xn−x̄|)/n
where x̄ is the arithmetic mean of the data x.
The median absolute deviation (MAD) is defined as,
MAD=median(|xi−x̃|)
where x̃ is the median of the data x.
Coefficient of Variation:
CV=s/x̃*100%
Min, Max, Range and Mid Range:
The min and max are the minimum and maximum values, respectively.
The range of x is then,
Range=max−min
The mid range is,
MidRange=(min+max)/2
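The deviation and range parameters may be sketched as follows. The function names are hypothetical, and the average absolute deviation is assumed to be taken about the arithmetic mean.

```python
import statistics

def average_absolute_deviation(x):
    # Assumption: deviations are taken about the arithmetic mean.
    m = sum(x) / len(x)
    return sum(abs(v - m) for v in x) / len(x)

def median_absolute_deviation(x):
    # Median of the absolute deviations from the median.
    med = statistics.median(x)
    return statistics.median([abs(v - med) for v in x])

def value_range(x):
    return max(x) - min(x)

def mid_range(x):
    return (min(x) + max(x)) / 2.0

samples = [1.0, 2.0, 4.0, 7.0, 11.0]
print(average_absolute_deviation(samples), median_absolute_deviation(samples))
print(value_range(samples), mid_range(samples))
```

The median absolute deviation is less affected by a few extreme samples than the average absolute deviation, which may matter for short signal segments.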
Percentile:
The rth percentile of x is the value such that r percent of the data in x falls at or below that value.
Interpolated Percentile:
Interpolation, such as linear interpolation, may be used in the calculation of the percentile, which makes the percentile parameter ‘smoother’, in particular in cases with small sample sizes.
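An interpolated percentile may be sketched as follows. Several interpolation conventions exist; the one shown here places the r'th percentile at position (n−1)·r/100 in the sorted data and interpolates linearly between the two neighbouring samples. This choice and the function name are assumptions made for the example.

```python
import math

def percentile(x, r):
    # Linearly interpolated r'th percentile, 0 <= r <= 100.
    if not x:
        raise ValueError("empty data")
    if not 0.0 <= r <= 100.0:
        raise ValueError("r must lie in [0, 100]")
    data = sorted(x)
    pos = (len(data) - 1) * r / 100.0      # fractional index into sorted data
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(data) - 1)
    frac = pos - lo
    return data[lo] + frac * (data[hi] - data[lo])

print(percentile([1.0, 2.0, 3.0, 4.0, 5.0], 50))
print(percentile([10.0, 20.0], 25))
```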
Median and Quartiles:
The median is the value such that half of the data in x falls below that value and half above,
median=x̃
The first, second and third quartiles are,
Q1=25th percentile, Q2=50th percentile (the median), Q3=75th percentile
The inter-quartile range (IQR) is,
IQR=Q3−Q1
The mid mean is the mean of the data between the 25th and 75th percentiles.
The trimmed mean is similar to the mid mean except that different percentile values are used. A common choice is to trim 5% of the data in both the lower and upper tails of the distribution, i.e. the trimmed mean is the mean of the data between the 5th and 95th percentiles.
The winsorized mean is similar to the trimmed mean. However, instead of trimming the extreme data samples, they are set to the lowest (or highest) value. For example, all data below the 5th percentile is set equal to the value of the 5th percentile, and all data greater than the 95th percentile is set equal to the 95th percentile.
It should be noted that many of the other parameters can be formulated in ‘trimmed’ or ‘Winsorized’ versions too.
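The mid mean, the trimmed mean and the winsorized mean differ only in how the tails are treated, as the following sketch illustrates. The function names, the linearly interpolated percentile and the inclusive treatment of samples lying exactly on a percentile are assumptions made for the example.

```python
import math

def percentile(x, r):
    # Linearly interpolated r'th percentile (one of several conventions).
    data = sorted(x)
    pos = (len(data) - 1) * r / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])

def trimmed_mean(x, lower=5.0, upper=95.0):
    # Mean of the samples lying between the two percentiles (inclusive).
    lo, hi = percentile(x, lower), percentile(x, upper)
    kept = [v for v in x if lo <= v <= hi]
    if not kept:
        raise ValueError("no samples left after trimming")
    return sum(kept) / len(kept)

def mid_mean(x):
    # Trimmed mean using the 25th and 75th percentiles.
    return trimmed_mean(x, 25.0, 75.0)

def winsorized_mean(x, lower=5.0, upper=95.0):
    # Extreme samples are set to the percentile values instead of being removed.
    lo, hi = percentile(x, lower), percentile(x, upper)
    clipped = [min(max(v, lo), hi) for v in x]
    return sum(clipped) / len(clipped)

samples = [1.0, 2.0, 3.0, 4.0, 100.0]
print(mid_mean(samples), trimmed_mean(samples), winsorized_mean(samples))
```

For this example the plain arithmetic mean is 22, so the single extreme sample dominates it, whereas the mid mean and the trimmed mean are unaffected.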
Mode:
For continuous data distributions, a specific value may occur no more than once. Therefore, the mode may be defined as the midpoint of the histogram interval with the highest peak.
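A histogram-based mode may be sketched as follows. The number of histogram intervals, the use of equal-width intervals spanning the data range and the function name are assumptions made for the example; if several intervals share the highest count, the lowest one is returned.

```python
def histogram_mode(x, n_bins=10):
    # Midpoint of the equal-width histogram interval with the highest count.
    lo, hi = min(x), max(x)
    if lo == hi:
        return lo  # all samples identical
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in x:
        # The maximum value is placed in the last interval.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    peak = counts.index(max(counts))
    return lo + (peak + 0.5) * width

samples = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 5.3, 10.0]
print(histogram_mode(samples, n_bins=2))
```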
Skewness:
The skewness measures the amount of asymmetry of the distribution,
Kurtosis:
The kurtosis measures the concentration of data around the peak and in the tails versus the concentration in the flanks of the distribution.
The r′th Central Moment:
mr=((x1−x̄)^r+(x2−x̄)^r+ . . . +(xn−x̄)^r)/n
where x̄ is the arithmetic mean of the data x. For example, the second central moment (r=2) is the same as the maximum-likelihood estimate of the variance.
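Central moments, and shape parameters derived from them, may be sketched as follows. The central moment uses the divisor n, consistent with the maximum-likelihood variance mentioned above. The moment-ratio forms of skewness and kurtosis shown here are one common convention among several and are an assumption, as are the function names.

```python
def central_moment(x, r):
    # r'th central moment with divisor n.
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** r for v in x) / n

def skewness(x):
    # Moment-ratio form m3 / m2^(3/2); zero for a symmetric sample.
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5

def kurtosis(x):
    # Moment-ratio form m4 / m2^2; equals 3 for a normal distribution.
    return central_moment(x, 4) / central_moment(x, 2) ** 2

samples = [1.0, 2.0, 3.0, 4.0, 5.0]
print(central_moment(samples, 2), skewness(samples), kurtosis(samples))
```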
Outlier-Detectors:
A) The proportion of the data samples that is higher than m standard deviations above, or lower than m standard deviations below the mean value:
B) The proportion of the data samples that is higher than m times the IQR above, or lower than m times the IQR below, the median value.
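The two outlier detectors may be sketched as follows. The function names are hypothetical; the standard deviation is assumed to use the divisor n−1, and the quartiles are taken from a linearly interpolated percentile.

```python
import math
import statistics

def percentile(x, r):
    # Linearly interpolated r'th percentile (one of several conventions).
    data = sorted(x)
    pos = (len(data) - 1) * r / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])

def outlier_proportion_sd(x, m):
    # Detector A: samples more than m standard deviations from the mean.
    mean = statistics.mean(x)
    s = statistics.stdev(x)  # divisor n - 1
    return sum(1 for v in x if abs(v - mean) > m * s) / len(x)

def outlier_proportion_iqr(x, m):
    # Detector B: samples more than m times the IQR from the median.
    med = statistics.median(x)
    iqr = percentile(x, 75.0) - percentile(x, 25.0)
    return sum(1 for v in x if abs(v - med) > m * iqr) / len(x)

samples = [1.0, 2.0, 3.0, 4.0, 100.0]
print(outlier_proportion_sd(samples, 1.0), outlier_proportion_iqr(samples, 1.5))
```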
Miscellaneous:
It should be emphasized that the above-mentioned exemplary distribution function characterizing parameters may be supplemented or combined with other suitable weights or relevant filters fulfilling the requirements of obtaining a suitable description of a distribution function for the purpose of obtaining an evaluation of perception intensity.
The perception intensity evaluator comprises an input block BP comprising a filter bank of band-pass filters, e.g. octave filters adapted in a conventional manner to divide an incoming audio signal into a parallel representation. The parallel representations are fed to an analyzer block DFC. The analyzer block DFC is adapted for extraction of feature parameters of the input signal. Such feature parameters have also been referred to above as distribution function characterizing parameters.
When the distribution functions of the individual bands have been established, they are fed to a processing block NF performing a non-linear processing of the parallel signal. The result of the processing is transformed into one expression of the overall perception intensity in the block PIE. Processing block NF may be adapted to adaptation data AD as previously described with reference to
Subsequently, the established evaluation is fed to a block ACE performing a monitoring of the evaluated perception intensity and/or performing an automatic control of the signal on the basis thereof.
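The chain of blocks BP, DFC, NF/PIE and ACE may be sketched as a composition of functions. Everything below is a structural illustration under stated assumptions: the function names are hypothetical, and the three stand-in functions are toy placeholders that do not implement real band-pass filtering, real feature extraction or a trained non-linear mapping.

```python
import math

def evaluate_perception_intensity(signal, filter_bank, extract_features, nonlinear_map):
    bands = filter_bank(signal)                  # block BP: parallel representation
    features = []
    for band in bands:                           # block DFC: parameters per band
        features.extend(extract_features(band))
    return nonlinear_map(features)               # blocks NF and PIE: single value

def control_gain(estimate, target):
    # Block ACE: correction needed to reach a target perception intensity.
    return target - estimate

# Toy stand-ins (assumptions, for demonstrating the data flow only).
def toy_filter_bank(signal):
    # NOT a band-pass filter bank: merely splits the samples into two streams.
    return [signal[0::2], signal[1::2]]

def toy_features(band):
    # Two characterizing parameters per band: mean and range.
    return [sum(band) / len(band), max(band) - min(band)]

def toy_map(features):
    return math.tanh(sum(features))

signal = [0.0, 1.0, 0.0, -1.0]
estimate = evaluate_perception_intensity(signal, toy_filter_bank, toy_features, toy_map)
print(estimate, control_gain(estimate, target=1.0))
```

Passing the stages in as functions mirrors the block structure, so that any single block may be exchanged without changing the others.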
The illustrated hardware may, e.g., be implemented in a Motorola DSP 56303 and optional supporting circuitry.
Moreover, the illustrated device may comprise monitoring means (not shown) for displaying the estimated perception intensity.
Moreover, the illustrated device may comprise control means for controlling connected electronic circuitry in response to the established perception intensity (not shown).
It should finally be stressed that the above examples should in no way be regarded as an exhaustive and full list of every embodiment applicable within the scope of the invention.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/DK04/00458 | 6/25/2004 | WO | | 12/21/2006 |