The present invention relates to the technical field of audio signal processing, and more specifically, to an audio signal processing system, a loudspeaker and an electronics device.
Improving the sound quality of audio devices is most often done using audio algorithms such as equalizers, dynamic range compressors and limiters to compensate for non-ideal capabilities of the loudspeakers including amplifiers in the devices. Often there is a desire to increase the loudness of a device by means of audio algorithms because it is impractical to do this by using larger loudspeakers and/or amplifiers capable of delivering higher output voltages.
When boosting the audio signal, the amplitude must not exceed the full-scale value. For signal processing in the digital domain, the full-scale value is the digital full-scale value, and for signal processing in the analogue domain, the full-scale value is in this context the maximum input voltage the amplifier can handle. One way of restricting the amplitude to the full-scale limit is to apply clipping. For many audio signals this will result in audible distortion and degraded audio quality. A more common approach is to use a peak limiter, which uses dynamic gain regulation to keep the signal within the full-scale limits. For many signals this approach results in less audible distortion than clipping, but it also yields lower loudness than clipping and may introduce an undesired audible signal modulation known as pumping.
In the area of music production, especially music mastering, a common method for maximizing loudness is to use a combination of peak limiting and clipping. For many music signals it is possible to apply clipping to some parts of the signal while keeping the amount of audible distortion within reasonable limits. This method cannot be directly utilized in the area of audio enhancement because it is highly content dependent and requires knowledge about when it is acceptable from a perceptual viewpoint to apply clipping.
Therefore, there is a demand in the art for a new audio signal processing solution that addresses at least one of the problems in the prior art.
One object of this invention is to provide a new technical solution for audio signal processing.
According to a first aspect of the present invention, an audio signal processing system is provided, which comprises: a clipping threshold estimator, which receives an input audio signal and outputs at least one clipping threshold; and an audio processing unit, which receives the input audio signal, processes the input audio signal to control the non-linear distortion added to the input audio signal based on the clipping threshold and outputs an output audio signal to a loudspeaker driver, wherein the clipping threshold estimator includes: an extraction unit which extracts a set of features from the input audio signal; and a regression or classification unit which receives the set of features and converts the set of features into the at least one clipping threshold by using a regression or classification processing.
According to a second aspect of the present invention, a loudspeaker is provided, which includes: a loudspeaker driver; and the audio signal processing system according to an embodiment of this disclosure, wherein the audio signal processing system outputs the output audio signal to the loudspeaker driver.
According to a third aspect of the present invention, an electronics device including the loudspeaker according to an embodiment of this disclosure is provided.
According to an embodiment of this invention, the performance of an audio processing system can be improved.
Further features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments according to the present invention with reference to the attached drawings.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description thereof, serve to explain the principles of the invention.
Various exemplary embodiments of the present invention will now be described in detail with reference to the drawings. It should be noted that the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods and apparatus as known by one of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all of the examples illustrated and discussed herein, any specific values should be interpreted to be illustrative only and non-limiting. Thus, other examples of the exemplary embodiments could have different values.
Notice that similar reference numerals and letters refer to similar items in the following figures, and thus once an item is defined in one figure, it is possible that it need not be further discussed for following figures.
As shown in
The audio signal processing system 11 comprises a clipping threshold estimator 20 and an audio signal processing unit 30.
The clipping threshold estimator 20 receives an input audio signal and outputs at least one clipping threshold. For example, the clipping threshold estimator 20 may output one clipping threshold for all frequencies of the audio signal, or it can output multiple clipping thresholds, each of which is used for a specific frequency band of the input audio signal.
The audio processing unit 30 receives the input audio signal and processes the input audio signal to control the non-linear distortion added to the input audio signal based on the clipping threshold. The audio processing unit 30 processes the input audio signal to control peaks and clipping levels for the input audio signal based on the clipping threshold. Then, the audio processing unit 30 outputs an output audio signal to the loudspeaker driver 12 for playing.
As shown in
In this disclosure, the clipping threshold estimator 20 uses an estimator algorithm (regression or classification processing) to analyze an audio signal and estimate how much clipping can be applied to the signal while keeping the audible distortion below an acceptable level. The clipping threshold estimator 20 extracts features of the input audio signal and outputs a clipping threshold based on those features. The output of the estimator algorithm is a clipping threshold signal which states by how much peaks in the audio signal can be reduced by means of clipping, limiting and so on. As such, the clipping threshold depends on the content of the input audio signal. A loudspeaker including such an audio signal processing system with a clipping threshold estimator can clip or limit the audio signal with less audible distortion and less perceived pumping for a listener while increasing the loudness.
The regression or classification processing may include at least one of a processing using an artificial neural network, a processing using a decision tree and a logistic regression processing. When producing a clipping threshold, the processing can take the content of the input audio signal into consideration by using the features extracted from it.
The regression or classification unit 22 can be trained in advance using a training set of short audio chunks. The short audio chunks have been clipped at various clipping thresholds and labelled with the degree to which the clipping is audible. For example, listeners can label the short audio chunks by stating how audible the clipping is in each chunk. That is, the clipping threshold is an estimate of how much clipping can be applied to the signal while keeping the audible distortion below an acceptable level.
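As a rough illustration of the training step described above, the following sketch fits a logistic regression (one of the processing options named in this disclosure) that predicts whether clipping is audible, and then picks the deepest clipping whose predicted audibility stays below an acceptable level. The single scalar feature, its scaling to the range 0 to 1, the 12 dB depth range, the 0.5 acceptability level and all function names are assumptions made for this example. The labels are produced by a synthetic rule standing in for listener judgements; they are not listening-test data.

```python
import math

def sigmoid(z):
    # Clamp the argument so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_audibility_model(rows, steps=3000, lr=0.5):
    """Fit P(audible) = sigmoid(w_f*feature + w_d*depth + b) by gradient descent.

    rows: (feature, depth, label) tuples, feature and depth scaled to 0..1,
    label 1 if the clipping was judged audible and 0 otherwise.
    """
    rows = list(rows)
    w_f = w_d = b = 0.0
    n = len(rows)
    for _ in range(steps):
        g_f = g_d = g_b = 0.0
        for feature, depth, label in rows:
            err = sigmoid(w_f * feature + w_d * depth + b) - label
            g_f += err * feature
            g_d += err * depth
            g_b += err
        w_f -= lr * g_f / n
        w_d -= lr * g_d / n
        b -= lr * g_b / n
    return w_f, w_d, b

def max_clip_depth_db(model, feature, acceptable=0.5, max_depth_db=12):
    """Return the deepest clipping (in dB) predicted to stay acceptable."""
    w_f, w_d, b = model
    for depth_db in range(max_depth_db, -1, -1):
        p = sigmoid(w_f * feature + w_d * depth_db / max_depth_db + b)
        if p < acceptable:
            return depth_db
    return 0  # no depth is predicted acceptable: do not clip

# Synthetic placeholder labels (hypothetical rule, not real listener data):
# clipping counts as audible once its normalized depth exceeds the feature.
GRID = [0.0, 0.25, 0.5, 0.75, 1.0]
TOY_ROWS = [(f, d, 1 if d > f else 0) for f in GRID for d in GRID]

model = train_audibility_model(TOY_ROWS)
tonal_depth = max_clip_depth_db(model, feature=0.0)  # tonal-like input
noisy_depth = max_clip_depth_db(model, feature=1.0)  # noise-like input
```

With these toy labels the model allows deeper clipping for the noise-like feature value than for the tonal one. An artificial neural network or a decision tree could take the place of the logistic regression without changing the surrounding structure.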
Alternatively, the regression or classification unit 22 can be updated (trained) while the loudspeaker is in use. For example, one or more sensors can be used to capture the reactions of a listener while an audio signal is played with a recorded clipping threshold, and a processing unit can process the data obtained from the sensors and output an indication of how audible the clipping is likely to have been to the listener. Then, the recorded clipping threshold and the corresponding indication can be used to update the regression or classification unit. The sensors can include at least one of the following components: a camera which captures reactions of the listener such as facial expressions, a microphone which captures sounds made by the listener in reaction, and a log which records the operations by the listener on the volume key of the electronics device where the loudspeaker is located. In this way, the audio signal processing system can be continuously improved as a user uses the electronics device. The recorded clipping threshold and its corresponding indication can be sent to the manufacturer via the Internet and can be used to train other audio signal processing systems (for example, audio signal processing systems in later loudspeakers).
The clipping threshold estimator 20 can further receive update configuration data to update its regression or classification unit 22. As such, the clipping threshold estimator 20 is configurable and updatable to continuously improve the listening experiences for the listener.
For example, the clipping threshold estimator 20 outputs multiple clipping thresholds. Each clipping threshold is an estimate of how audible the clipping is when being applied in a specific frequency band of the input audio signal. The clipping thresholds can be used as control inputs for an algorithm which splits the input signal into frequency bands, applies a boost to each band and reduces the peak amplitude in each band using clipping according to the supplied clipping thresholds. The clipping thresholds could also be used as control inputs for a multiband dynamic range compressor which uses the clipping thresholds to allow clipping in combination with the compression and gain applied in each frequency band.
Each clipping threshold can be calculated using a separate regression or classification unit 22 which can be trained in a similar manner as described in this disclosure. The clipping thresholds can also be estimated from the wideband clipping threshold using simpler means such as a multiplication factor for each band.
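The band-wise processing described above can be sketched as follows: per-band thresholds are derived from a wideband threshold with one multiplication factor per band, the signal is split into bands, each band is boosted and clipped, and the bands are summed. This is a minimal sketch under several assumptions that are not taken from this disclosure: only two bands are used, the split is a one-pole low-pass with its complementary high-pass, thresholds are expressed as linear amplitude ceilings, and all names and numeric values are hypothetical.

```python
def band_thresholds_from_wideband(wideband_threshold, band_factors):
    # One multiplication factor per band, as a simple alternative to
    # running a separate estimator for every band.
    return [wideband_threshold * factor for factor in band_factors]

def split_two_bands(samples, alpha=0.1):
    # One-pole low-pass; the high band is the remainder, so low + high
    # reconstructs the input sample by sample.
    low, high, state = [], [], 0.0
    for s in samples:
        state += alpha * (s - state)
        low.append(state)
        high.append(s - state)
    return low, high

def clip(samples, threshold):
    # Hard clipping to +/- threshold.
    return [max(-threshold, min(threshold, s)) for s in samples]

def boost_and_clip(samples, gains, thresholds, alpha=0.1):
    # Boost each band, clip it at its own threshold, then sum the bands.
    bands = split_two_bands(samples, alpha)
    processed = [
        clip([gain * s for s in band], threshold)
        for band, gain, threshold in zip(bands, gains, thresholds)
    ]
    return [sum(values) for values in zip(*processed)]

# Hypothetical usage: the high band gets half the wideband threshold.
signal = [0.0, 0.5, -0.5, 0.8, -0.8, 0.2]
thresholds = band_thresholds_from_wideband(0.5, (1.0, 0.5))
boosted = boost_and_clip(signal, gains=(4.0, 4.0), thresholds=thresholds)
```

Because each band is clipped separately, the summed output is bounded by the sum of the band thresholds rather than by any single threshold, so a wideband limiter or clipper would normally still follow this stage.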
Generally, clipping introduces distortion in the form of harmonics and intermodulation distortion of the frequency components in the audio signal. How audible these distortion components are depends on how they are masked by other frequency components already present in the audio signal. The audibility of applying clipping to an audio signal is therefore highly correlated with how the energy in the signal is distributed across frequencies. In general, clipping is highly audible if only a few tonal components are present in the signal and less audible if the signal is more noise-like. The inventor found that this relationship can be used to estimate the clipping threshold.
If the input audio signal has a tonal character, the minimum power across all bands will be low (close to zero) and if the audio signal is broad band noise, the minimum power across all bands will be relatively high. In addition, the minimum band power for the higher bands will be relatively high if the input audio signal resembles high frequency noise in which case a high amount of clipping can be applied without being audible.
Here, the two minimum power values (the first minimum normalized power value for all frequency bands and the second minimum normalized power value for a set of the frequency bands covering higher frequencies of the input audio signal) can be used as features to estimate a clipping threshold. This clipping threshold can be used as is or combined with other features to improve the quality of the clipping threshold estimator 20.
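The two features can be illustrated with a small sketch that computes band powers from one frame, normalizes them and takes the two minima. The normalization by total power, the uniform band layout, the frame length, the choice of which bands count as "higher" and all names are assumptions made for this example; a naive DFT is used only to keep the sketch free of dependencies.

```python
import cmath
import math

def band_powers(frame, num_bands=8):
    # Naive DFT over bins 1..N/2, then sum bin powers into uniform bands.
    n = len(frame)
    half = n // 2
    bin_power = []
    for k in range(1, half + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        bin_power.append(abs(acc) ** 2)
    per_band = half // num_bands
    return [sum(bin_power[b * per_band:(b + 1) * per_band])
            for b in range(num_bands)]

def min_power_features(frame, num_bands=8, first_high_band=4):
    # Returns (minimum over all bands, minimum over the higher bands),
    # with band powers normalized by the total power (an assumption).
    powers = band_powers(frame, num_bands)
    total = sum(powers)
    if total == 0.0:
        return 0.0, 0.0
    normalized = [p / total for p in powers]
    return min(normalized), min(normalized[first_high_band:])

# A single sinusoid (tonal) versus a unit impulse (flat spectrum).
tone = [math.sin(2 * math.pi * 4 * i / 64) for i in range(64)]
impulse = [1.0] + [0.0] * 63
tone_features = min_power_features(tone)
impulse_features = min_power_features(impulse)
```

For the sinusoid almost all power falls into one band, so both minima are close to zero. For the impulse the power is spread evenly, so both minima equal one divided by the number of bands. This matches the tonal versus broadband behaviour described above.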
In
In
As shown in
The dynamic booster 304 could be a compressor or multiband compressor. The clipping threshold estimated by the clipping threshold estimator 20 controls the maximum peak level in the limiter 305 such that peaks up to the clipping threshold are allowed in the output of the limiter 305.
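One possible reading of how the clipping threshold steers the limiter is sketched below: the limiter's ceiling is raised above full scale by the allowed amount, and a final hard clip brings the remaining overshoot back to full scale. Expressing the threshold as an allowance in dB above full scale, the instant-attack and exponential-release gain law, the trailing clipper and all names are assumptions of this sketch, not details taken from this disclosure.

```python
def limit_with_clip_allowance(samples, clip_allowance_db,
                              full_scale=1.0, release=0.999):
    # The limiter ceiling sits clip_allowance_db above full scale.
    ceiling = full_scale * 10.0 ** (clip_allowance_db / 20.0)
    gain = 1.0
    limited = []
    for s in samples:
        peak = abs(s)
        # Gain that would bring this sample down to the ceiling.
        target = ceiling / peak if peak > ceiling else 1.0
        if target < gain:
            gain = target  # instant attack
        else:
            gain = release * gain + (1.0 - release) * target  # slow release
        limited.append(s * gain)
    # Peaks between full scale and the ceiling are clipped, not gain-reduced.
    return [max(-full_scale, min(full_scale, v)) for v in limited]

signal = [0.5, 2.0, 0.5, -1.5, 0.3]
limited_only = limit_with_clip_allowance(signal, clip_allowance_db=0.0)
with_clipping = limit_with_clip_allowance(signal, clip_allowance_db=6.0)
```

With a 6 dB allowance the limiter applies far less gain reduction after the large peak, so the following samples stay louder than in the limiter-only case, while the output still never exceeds full scale.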
In
As shown in
Here, the equalizer 306 is used to compensate for a non-ideal frequency response of a loudspeaker in a device, and a multiband compressor 307 is used to apply dynamic gains and clipping in a set of frequency bands to increase bass, treble and overall loudness. Dedicated clipping thresholds for each frequency band are provided by the clipping threshold estimator 20 to control how much clipping is allowed in each band in the multiband compressor 307. A wideband clipping threshold is supplied to the limiter 308. As explained above, a clipper may be placed after the limiter 308.
In
In
In
Here, the input audio signal to the clipping threshold estimator 20 is filtered by a transducer filter 23 tuned to match the linear magnitude response of the loudspeaker driver 12. By taking the magnitude response of the loudspeaker driver into account, a clipping threshold that better matches the audio emitted by the loudspeaker 10 can be obtained because each frequency is weighted according to how it is reproduced by the loudspeaker 10. Hereby frequencies which cannot be reproduced (e.g. frequencies far below the resonance frequency of the loudspeaker) are not taken into account by the clipping threshold estimator 20. In
Similar with
In
The clipping used in the embodiments of this disclosure can be hard clipping or different types of soft clipping. Ideally, the labelled audio chunks used to train the regression or classification unit 22 in the clipping threshold estimator 20 can be created using the clipping type used in the audio processing. In practice, a simple multiplication factor can be applied to the clipping threshold to compensate for different clipping types.
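The difference between clipping types, and the compensation by a simple multiplication factor, can be illustrated as follows. The tanh curve is only one of many possible soft clippers and is chosen here for brevity; the factor value and all names are hypothetical.

```python
import math

def hard_clip(x, threshold):
    # Flat-tops everything beyond +/- threshold.
    return max(-threshold, min(threshold, x))

def soft_clip(x, threshold):
    # Smooth saturation that approaches +/- threshold asymptotically.
    return threshold * math.tanh(x / threshold)

def compensated_threshold(threshold, type_factor):
    # Scale a threshold estimated for one clipping type so that it can
    # be used with another; type_factor would be tuned per clipping type.
    return threshold * type_factor

# Hypothetical usage: reuse a hard-clipping estimate for the soft clipper.
estimated = 0.5
soft_threshold = compensated_threshold(estimated, type_factor=1.2)
sample_out = soft_clip(0.9, soft_threshold)
```

A soft clipper already attenuates samples below the threshold, which is one reason a threshold trained on hard-clipped chunks may need scaling before it is applied to a soft clipper.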
The use of the clipping threshold estimator 20 is not limited to controlling peak and clipping levels. The clipping threshold estimator 20 can also be used to control other parameters which affect the amount of non-linear distortion added to the audio signal, for example the attack and release times in a limiter.
In
As shown in
Although some specific embodiments of the present invention have been demonstrated in detail with examples, it should be understood by a person skilled in the art that the above examples are only intended to be illustrative but not to limit the scope of the present invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2020/136801 | 12/16/2020 | WO |