This disclosure relates to an interference signal reduction system and method of interference signal reduction.
Model-based approaches to interference signal reduction train a machine learning model to estimate the desired target signal based on features extracted from the observed sensor signals. The machine learning model is trained off-line using representative data that only approximates the actual sensor data. Once trained, the machine learning model is deployed on a processing device, where it performs interference signal reduction.
Aspects of the disclosure are defined in the accompanying claims. In a first aspect, there is provided a method of interference signal reduction comprising: receiving an input signal comprising a plurality of signal segments; providing the input signal to a machine learning model configured to output an estimate of a target signal from the input signal; wherein the method further comprises: classifying each input signal segment of the plurality of signal segments as one of a target-only signal segment, an interference-only signal segment and, optionally, an undefined signal segment; and adapting the machine learning model based on the target-only signal segments and the interference-only signal segments.
In some embodiments, classifying each input signal segment further comprises: estimating a target-signal-to-interference ratio of the input signal segment; classifying the input signal segment as a target-only signal segment in response to the target-signal-to-interference ratio exceeding a first threshold value; and classifying the input signal segment as an interference-only signal segment in response to the target-signal-to-interference ratio being less than a second threshold value lower than the first threshold value.
In some embodiments, adapting the machine learning model further comprises: generating a plurality of target-plus-interference signal segments by mixing a selected target-only signal segment and a selected interference-only signal segment with a target-signal-to-interference ratio for each target-plus-interference signal segment. In some embodiments, the input signal further comprises at least one of an audio signal and a video signal. In some embodiments, the input signal further comprises an audio signal and a video signal and wherein classifying the input signal further comprises: determining a noise power from at least one audio component of the input signal segment; classifying the input signal segment as a target-only signal segment in response to the noise power being less than a noise power threshold value and detecting voice activity from a video component of the input signal segment; and classifying the input signal segment as an interference-only signal segment in response to the noise power being greater than the noise power threshold value and not detecting voice activity from the video component of the input signal segment. In some embodiments, adapting the machine learning model further comprises: generating a plurality of target-plus-interference signal segments; wherein the audio component of a target-plus-interference signal segment is generated by mixing the audio component of a selected target-only signal segment and the audio component of a selected interference-only signal segment with a target-signal-to-interference ratio; and the video component of the target-plus-interference signal segment comprises the video component of the selected target-only signal segment.
In some embodiments, adapting the machine learning model further comprises: constructing a target-plus-interference data set from the plurality of target-plus-interference signal segments; constructing a target-only signal data set from the respective selected target-only signal segment of each target-plus-interference signal segment; and applying the target-plus-interference data set and the target-only signal data set to the machine learning model.
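The mixing and data set construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: segments are plain lists of samples of equal length, the target-signal-to-interference ratio is expressed in dB, and the names `power`, `mix_at_sir` and `build_datasets` are hypothetical.

```python
import math
import random


def power(segment):
    """Mean power of a segment given as a list of float samples."""
    return sum(v * v for v in segment) / len(segment)


def mix_at_sir(target, interference, sir_db):
    """Mix a target-only and an interference-only segment.

    The interference is scaled so that the ratio of target power to
    scaled interference power equals sir_db (in dB).
    """
    if len(target) != len(interference):
        raise ValueError("segments must have the same length")
    p_t, p_i = power(target), power(interference)
    if p_t == 0.0 or p_i == 0.0:
        raise ValueError("segments must have non-zero power")
    gain = math.sqrt(p_t / (p_i * 10.0 ** (sir_db / 10.0)))
    return [t + gain * i for t, i in zip(target, interference)]


def build_datasets(targets, interferers, sir_choices_db, count, rng):
    """Build aligned data sets: mixtures[k] was generated from clean[k]."""
    mixtures, clean = [], []
    for _ in range(count):
        t = rng.choice(targets)
        i = rng.choice(interferers)
        mixtures.append(mix_at_sir(t, i, rng.choice(sir_choices_db)))
        clean.append(t)
    return mixtures, clean
```

Keeping the two lists index-aligned means that each target-plus-interference segment can later be paired with the target-only segment it was generated from, which is what a supervised update needs as its reference output.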
In some embodiments, the input signal comprises a digital audio signal, wherein the input signal segment comprises an audio packet including a plurality of audio samples and wherein classifying the input signal segment further comprises: classifying the input signal segment as a target-only signal segment in response to the audio packet being correctly received; and classifying the input signal segment as an interference-only signal segment in response to the audio packet being incorrectly received.
In some embodiments, the method further comprises: updating at least one parameter of a packet loss pattern model with the interference-only signal segments. In some embodiments, adapting the machine learning model further comprises: constructing a target-only signal data set from a plurality of selected target-only signal segments; constructing a target-plus-interference data set by muting a subset of the plurality of selected target-only signal segments according to the packet loss pattern model; and applying the target-plus-interference data set and the target-only signal data set to the machine learning model.
In some embodiments, the method further comprises: storing target-only and interference-only signal segments in separate databases. In some embodiments, the databases are modified to stay within a predefined size after storing additional signal segments. In some embodiments, the method further comprises: receiving a further input signal comprising a plurality of further signal segments; providing the further input signal to the machine learning model further configured to output an estimate of a further target signal from the further input signal; classifying each further input signal segment of the plurality of further signal segments as one of a target-only signal segment, an interference-only signal segment and, optionally, an undefined signal segment; and adapting the machine learning model based on the target-only signal segments and the interference-only signal segments.
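As an illustration of separate databases kept within a predefined size, the sketch below uses one bounded store per segment class. The eviction policy is an assumption: the disclosure does not state how the databases are modified, so the oldest segment is simply dropped here. The class name `SegmentStore` is hypothetical.

```python
from collections import deque


class SegmentStore:
    """Bounded store that discards the oldest segment once it is full."""

    def __init__(self, max_segments):
        # deque(maxlen=...) drops items from the opposite end on append
        self._segments = deque(maxlen=max_segments)

    def add(self, segment):
        self._segments.append(segment)

    def segments(self):
        return list(self._segments)

    def __len__(self):
        return len(self._segments)


# One store per class, as in the separate databases described above.
target_store = SegmentStore(max_segments=2)
interference_store = SegmentStore(max_segments=2)
```

Other policies, such as random replacement or keeping the most diverse segments, would fit the same interface.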
In a second aspect, there is provided an interference signal reduction system comprising: a signal input configured to receive an input signal comprising a plurality of signal segments; an interference signal reduction system output; a target signal estimator configured to output an estimate of the target signal from the input signal, the target signal estimator comprising a machine learning model and having a target signal estimator input coupled to the signal input, a target signal estimator output coupled to the interference signal reduction system output, and an adaptor input; a signal classifier having a signal classifier input coupled to the signal input, a target signal output, and an interference signal output; an adaptor having a target signal input coupled to the target signal output, an interference signal input coupled to the interference signal output, and a model adaptor output coupled to the adaptor input; wherein the signal classifier is configured to classify each input signal segment of the plurality of signal segments as one of a target-only signal segment, an interference-only signal segment and, optionally, an undefined signal segment; and the adaptor is configured to adapt the machine learning model based on the target-only signal segments and the interference-only signal segments.
In some embodiments, the signal classifier is further configured to: estimate a target-signal-to-interference ratio of the input signal segment; classify the input signal segment as a target-only signal segment in response to the target-signal-to-interference ratio exceeding a first threshold value; and classify the input signal segment as an interference-only signal segment in response to the target-signal-to-interference ratio being less than a second threshold value lower than the first threshold value.
In some embodiments, the adaptor is further configured to: generate a plurality of target-plus-interference signal segments by mixing a selected target-only signal segment and a selected interference-only signal segment with a target-signal-to-interference ratio for each target-plus-interference signal segment.
In some embodiments, the input signal further comprises an audio signal and a video signal and wherein the signal classifier is further configured to: determine a noise power from at least one audio component of the input signal segment; classify the input signal segment as a target-only signal segment in response to the noise power being less than a noise power threshold value and voice activity being detected from a video component of the input signal segment; and classify the input signal segment as an interference-only signal segment in response to the noise power being greater than the noise power threshold value and no voice activity being detected from the video component of the input signal segment.
In some embodiments, the adaptor is further configured to generate a plurality of target-plus-interference signal segments; wherein the audio component of a target-plus-interference signal segment is generated by mixing the audio component of a selected target-only signal segment and the audio component of a selected interference-only signal segment with a target-signal-to-interference ratio; and the video component of the target-plus-interference signal segment comprises the video component of the selected target-only signal segment.
In some embodiments, the adaptor is further configured to: construct a target-plus-interference data set from the plurality of target-plus-interference signal segments; construct a target-only signal data set from the respective selected target-only signal segment of each target-plus-interference signal segment; and apply the target-plus-interference data set and the target-only signal data set to the machine learning model.
In some embodiments, the input signal comprises a digital audio signal, the input signal segment comprises an audio packet including a plurality of audio samples, and wherein the signal classifier is further configured to: classify the input signal segment as a target-only signal segment in response to the audio packet being correctly received; classify the input signal segment as an interference-only signal segment in response to the audio packet being incorrectly received; and update at least one parameter of a packet loss pattern model with the interference-only signal segments; and wherein the adaptor is further configured to: construct a target-only signal data set from a plurality of selected target-only signal segments; construct a target-plus-interference data set by muting a subset of the plurality of selected target-only signal segments according to the packet loss pattern model; and apply the target-plus-interference data set and the target-only signal data set to the machine learning model.
In the figures and description like reference numerals refer to like features. Embodiments are now described in detail, by way of example only, illustrated by the accompanying drawings in which:
It should be noted that the Figures are diagrammatic and not drawn to scale. Relative dimensions and proportions of parts of these Figures have been shown exaggerated or reduced in size, for the sake of clarity and convenience in the drawings. The same reference signs are generally used to refer to corresponding or similar features in modified and different embodiments.
In operation, a signal may be received at signal input 110. The signal comprises a target signal from a source 102 which may be degraded by one or more interference sources 104 illustrated as interference process 108.
The received signal may be processed for example by converting the analog signal to a digital signal and provided to the target signal estimator 122 which is implemented as a machine learning model configured to output an estimate of a target signal from the input signal (step 154). The input signal may be considered to consist of a number of signal segments. For example in the digital domain, the input signal segment may consist of a number N of signal samples corresponding to a certain predetermined time duration. The target signal estimator 122 may then output an estimate of the target (desired) signal from the received signal on the interference signal reduction system output 124 for each signal segment.
In step 156, the signal classifier 112 may classify a signal segment of the input signal as one of a target-only signal segment, an interference-only signal segment and optionally an undefined signal segment. The target-only signal segments may be output on the target signal output 114 and the interference-only signal segments may be output on the interference signal output 116. The undefined signal segments may be discarded. The adaptor 118 may adapt the machine learning model based on the target-only signal segments and the interference-only signal segments by constructing a training data set which is used to update the parameters of the machine learning model. The update process typically requires one or more iterations, each consisting of a forward step of applying input data to the machine learning model and observing the output, and a backpropagation step which compares the machine learning model outputs to the reference outputs from the training set and adapts the model parameters to minimize the difference between the reference outputs and the model outputs. In some examples, the forward-backward iterations may be applied to a copy of the machine learning model in the adaptor 118 which then replaces the machine learning model in target signal estimator 122. In other examples, the forward-backward iterations may be applied by the adaptor 118 directly to the machine learning model in the target signal estimator 122 during a down-time period. Steps 152 and 154 may be implemented in software running as foreground processes on a processor. Steps 156 and 158 may be implemented in software running as a background process on the processor.
The interference reduction system 100 and method 150 may allow on-system machine learning for interference signal reduction. In some examples, the system 100 may be implemented as a system-on-chip (SoC), microprocessor, microcontroller, or other integrated circuit device. The machine learning model may be trained or adapted online using the actual sensor signal data collected on the system. The interference reduction system 100 and method 150 may allow personalization of interference signal reduction machine learning models towards potentially dynamic target and interference signal characteristics specific to the device/system, the user and/or environment i.e. personalized machine learning.
For examples where the system 100 is implemented on a device, the system 100 allows the learning or adaptation of interference signal reduction machine learning models to be performed fully on-device, without necessity for data exchange or processing externally to the device i.e. privacy-preserving machine learning.
A target source 202 and interference source(s) 204 are combined during a physical interference process 208, resulting in signals 210, 212, 214 which may be detected by respective sensors 216, 218, 220 observing a mixture of the desired target signal and the undesired interference signal(s). The aim of the interference signal reduction system 200 is the removal of undesired interference signals, while retaining the desired target signal based on the observed sensor signals. The input signal or signals may be received at signal inputs 222, 224, 226. The received signal may be initially detected by a sensor which, depending on the type of target signal, may for example be a microphone 216, a camera 218, or an antenna 220. The interference signal reduction system 200 illustrates three different types of signals, i.e. audio signal 210, video signal 212 and radio frequency (RF) signal 214, which may be processed. In other examples, fewer or more signal types may be processed.
The received signals may be processed for example by converting the sensor signal to a digital signal and provided to the target signal estimator 248 configured to output an estimate of a target signal from the input signals. Each of the input signal types may be considered to consist of a number of signal segments. For example, in the digital domain, an input signal segment may consist of a number N of signal samples corresponding to a certain predetermined time duration. The target signal estimator 248 may then output an estimate of the target (desired) signal from the received signal on the interference signal reduction system output 206 for each signal segment.
The inputs to the interference reduction machine learning model 254 consist of features, extracted from the sensor signals by feature extractor 230. In a background process, the interference reduction machine learning model 254 may be adapted through application of a signal classification process by signal classifier 228 and a model adaptation process by model adaptor 250. The signal classifier 228 may make a relatively coarse classification based on the input sensor signals. The signal classifier 228 may detect the presence of target-only signal or interference-only signal conditions, and in these cases, the respective resulting sensor signal segments output on the target signal output 234 and interference signal output 238 are stored in the respective target signal store 240 or interference signal store 242 as target signal segments or interference signal segments, respectively. In all other signal conditions, the sensor signal segments are not stored as they are not representative of the target signal or the interference signal but may be output on undefined signal output 236. As the signal classifier 228 is designed to detect relatively extreme signal conditions, a complex or adaptive approach is not required; instead relatively simple and fixed implementations can suffice. In the model adaptation process, the stored target signal representations and interference representations may be used to optimize the interference signal reduction performance of the machine learning model for these measured target signal and interference signal conditions.
Different embodiments may have selected implementations of the following aspects:
Sensor configuration: which sensors are used for the background signal classification process, and which are used for the foreground interference reduction process.
Signal classification method: which method is used to decide at which time instances a target signal representation or interference signal representation will be collected.
Estimated signal representations: which type of signal representations are extracted from the on-device sensor signals.
Model adaptation method: which method is used to update the machine learning model based on the collected target signal representations and/or interference signal representations.
The signal classification process is based on a target-signal-to-interference ratio estimation of the microphone sensor signal s(n), where n is the discrete time index. In a first step, the signal s(n) is buffered into an N-sample signal frame, also referred to herein as a signal segment, and the analysis is performed on the resulting signal segment sk=[s((k−1)*N), s((k−1)*N+1), . . . , s(k*N−1)], where k=1, 2, . . . , K is the segment index. In step 272 the target-signal-to-interference ratio may be estimated. The target-signal-to-interference ratio estimation γk=g(sk) can be implemented using any state-of-the-art method, for example the statistical WADA-SNR algorithm which uses waveform amplitude distribution analysis to estimate the signal-to-noise ratio of speech signals. The resulting estimate γk can then be used in a simple threshold logic for signal classification:
In step 274 the target-signal-to-interference ratio is compared with the upper threshold Thi. If the target-signal-to-interference ratio exceeds the upper threshold Thi, the signal segment sk is classified as a target-only signal segment (step 276) and the signal segment sk is stored as a target signal representation. Otherwise in step 278 the target-signal-to-interference ratio is compared with the lower threshold Tlo. If the target-signal-to-interference ratio falls below the lower threshold Tlo, the signal segment sk is classified as an interference-only signal segment (step 280) and the signal segment sk is stored as an interference signal representation. In other cases (step 282), the signal segment sk is classified as an undefined signal segment, which could be a mixture of target and interference, or the absence of both target and interference.
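The buffering and threshold logic of steps 272 to 282 can be sketched as below. This is an illustrative sketch only: the ratio estimate is passed in as a number in dB rather than computed, since no estimator such as WADA-SNR is implemented here, the function names are hypothetical, and segments are indexed from zero whereas the description uses k=1, 2, . . . , K.

```python
def split_into_segments(signal, n):
    """Buffer a signal into consecutive N-sample segments.

    A trailing partial segment is dropped (an assumption of this sketch).
    """
    return [signal[k * n:(k + 1) * n] for k in range(len(signal) // n)]


def classify_segment(sir_db, t_hi_db, t_lo_db):
    """Two-threshold classification of one segment from its ratio estimate."""
    if t_lo_db >= t_hi_db:
        raise ValueError("lower threshold must be below upper threshold")
    if sir_db > t_hi_db:
        return "target-only"        # steps 274 and 276
    if sir_db < t_lo_db:
        return "interference-only"  # steps 278 and 280
    return "undefined"              # step 282
```

For instance, with hypothetical thresholds of 20 dB and 0 dB, an estimate of 25 dB would be labelled target-only, an estimate of −3 dB interference-only, and an estimate of 10 dB undefined.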
Turning now to
The target-plus-interference mixture dataset M and the clean target signal dataset C are used in step 314 to update the machine learning model in an iterative supervised learning process. In each iteration of the supervised learning process, a batch of mixture signals Mi is constructed containing a subset of batch size B signals from the full dataset M. Accordingly, a batch of aligned clean target signals Ci is constructed containing the corresponding subset of batch size B signals from the full dataset C. Given the current machine learning model f( ) with model parameters θi-1, a model adaptation step with a step size u is performed in each iteration i:
and where the data batches Mi and Ci are used to optimize the machine learning model parameter change Δθi in order to minimize a loss function measuring the interference cancellation quality on the signal batch Mi and the target batch Ci, and Δ is an optimization variable. This optimization problem may be solved for example using known gradient descent methods and backpropagation techniques.
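One way to picture the iterative update is the toy sketch below. It is not the model or loss function of the disclosure: the machine learning model is replaced by a single scalar gain w applied to every sample, the loss is assumed to be the mean squared error between the model output and the clean target, the parameter change is taken as the negative gradient, and a fixed iteration count stands in for a stopping criterion. All names are hypothetical.

```python
import random


def mse_and_grad(w, batch_m, batch_c):
    """Mean squared error of w*m against c, and its derivative w.r.t. w."""
    total, grad, count = 0.0, 0.0, 0
    for m_seg, c_seg in zip(batch_m, batch_c):
        for m, c in zip(m_seg, c_seg):
            err = w * m - c
            total += err * err
            grad += 2.0 * err * m
            count += 1
    return total / count, grad / count


def adapt(w, dataset_m, dataset_c, batch_size, step_size, iterations, rng):
    """Run mini-batch gradient descent on aligned data sets M and C."""
    for _ in range(iterations):
        # Draw the same indices from both data sets so batches stay aligned.
        idx = rng.sample(range(len(dataset_m)), batch_size)
        batch_m = [dataset_m[j] for j in idx]
        batch_c = [dataset_c[j] for j in idx]
        _, grad = mse_and_grad(w, batch_m, batch_c)
        w = w - step_size * grad  # parameter change scaled by the step size
    return w
```

In a real system the scalar gain would be replaced by the parameters of the interference reduction model and the gradient would be obtained by backpropagation, but the batch construction and the per-iteration update follow the same pattern.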
The model adaptation process may be considered complete depending on a stopping criterion, for example based on one or more of the following:
Upon completion of the model adaptation process, the adapted machine learning model may replace the previous model in the foreground process, and the interference reduction is performed with the adapted machine learning model. The model adaptation process can be scheduled to be run for example at fixed time intervals using the latest sensor data, or whenever a sufficient amount of new target and interference data has been collected, or based on a user input requesting to update the model.
The signal classification process is now based on both the microphone signal segment, which is also referred to as an audio component of the input signal segment, skmic=[s((k−1)*N), s((k−1)*N+1), . . . , s(k*N−1)], and the corresponding camera signal segment or segments skcam, also referred to as a video component of the input signal segment, typically represented as the RGB values for the different pixels in the image frame. One implementation of the signal classification process is to have a binary voice activity detector vk=gvad(skcam) operating on the camera signal segment, and a noise power estimation nk=gnoise(skmic) operating on the microphone signal segment. The voice activity detector can be implemented using state-of-the-art detectors based on head and lip movement tracking. The noise power estimation can similarly be implemented using a state-of-the-art noise power tracking algorithm. Signal classification is based on a combination logic using both the voice activity detector vk and the noise power estimation nk, for example:
In step 352 a level of voice activity is detected on a video component skcam of the input signal segment sk(cam,mic). In step 354 the noise power nk is estimated from at least one audio component skmic of the input signal segment sk(cam,mic). In step 356, a comparison is made between the noise power estimate nk and the threshold value Tnoise. In case the noise power estimation is below the threshold Tnoise and voice activity is detected in subsequent step 358, the signals skcam and skmic are classified as target-only signal segments (step 360) and may be stored in a target-only signal database as target signal representations. If voice activity is not detected in step 358, the signal segment is classified as an undefined signal segment (step 366). Returning to step 356, in case the noise power estimation exceeds the threshold Tnoise and voice activity is not detected in subsequent step 362, the signals skcam and skmic are classified as interference-only signal segments (step 364). The signal skmic is stored as an interference signal representation; the signal skcam is not stored as it does not bear information related to the acoustic interferences. If voice activity is detected in step 362, the signal segment is classified as an undefined signal segment (step 366).
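The combination logic of steps 352 to 366 can be sketched as follows. The voice activity flag and the noise power are taken as already-computed inputs, since no detector or noise tracker is implemented here. The function names are hypothetical, and treating a noise power exactly equal to the threshold as undefined is an assumption, because the description does not cover that case.

```python
def classify_av_segment(voice_active, noise_power, t_noise):
    """Classify an audio-visual segment from the two detector outputs."""
    if noise_power < t_noise:
        # Low noise: target-only if the camera shows voice activity.
        return "target-only" if voice_active else "undefined"
    if noise_power > t_noise:
        # High noise: interference-only if nobody is speaking.
        return "undefined" if voice_active else "interference-only"
    return "undefined"  # equality case, assumed undefined


def representations_to_store(label, cam_segment, mic_segment):
    """Select which components are kept for a classified segment."""
    if label == "target-only":
        return {"target": (cam_segment, mic_segment)}
    if label == "interference-only":
        # The camera component carries no acoustic interference information.
        return {"interference": mic_segment}
    return {}
```

Returning an empty dictionary for undefined segments mirrors the description, in which such segments are not stored.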
Turning now to
Referring to
Model adaptation method 500 uses a generative data mixing process. The target-only signal segments are used to construct a clean target-only signal dataset C (step 502). A target-plus-interference dataset M is constructed by muting a subset of the plurality of selected target-only signal segments according to the packet loss pattern model (step 504). Batches of pairs of clean audio streams from dataset C and corresponding audio streams with model-generated packet losses from dataset M can be used to update the interference reduction machine learning model using a supervised learning approach (step 506) as described previously for methods 300, 400.
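Steps 502 to 504 can be illustrated with the sketch below. The form of the packet loss pattern model is not specified in this description, so a two-state Markov model (the probability of a loss given whether the previous packet was received or lost) is assumed purely for illustration, and its parameters are fitted from a received/lost flag sequence. All function names are hypothetical.

```python
import random


def fit_loss_model(received):
    """Estimate loss probabilities from a list of bools (True = received).

    Returns (P(loss | previous received), P(loss | previous lost)).
    """
    counts = {True: [0, 0], False: [0, 0]}  # previous state -> [total, losses]
    for prev, cur in zip(received, received[1:]):
        counts[prev][0] += 1
        counts[prev][1] += 0 if cur else 1

    def ratio(c):
        return c[1] / c[0] if c[0] else 0.0

    return ratio(counts[True]), ratio(counts[False])


def sample_loss_pattern(p_loss_after_ok, p_loss_after_loss, length, rng):
    """Generate a loss pattern (True = lost) from the two-state model."""
    lost, pattern = False, []
    for _ in range(length):
        p = p_loss_after_loss if lost else p_loss_after_ok
        lost = rng.random() < p
        pattern.append(lost)
    return pattern


def mute_lost_packets(packets, pattern):
    """Build dataset M entries by zeroing the packets marked as lost."""
    return [[0.0] * len(p) if lost else list(p)
            for p, lost in zip(packets, pattern)]
```

The unmodified packets form the clean dataset C and the muted copies form dataset M, so each pair stays aligned for the supervised update.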
Embodiments of a system and method are described to train models for interference signal reduction on-system, using signal data collected through sensors. A relatively coarse signal classification stage may classify signal segments or frames as target-only and interference-only signal representations. These signal representations are used to optimize interference reduction model performance towards these signal conditions through a model adaptation stage.
Some embodiments may be used for blind source separation in which there may be multiple target signal sources which could all be desired. The invention can be applied in the case of blind source separation to detect presence of the different target sources using the signal classification process, and adapt a blind source separation machine learning system based on an extracted representation of each of the target sources.
Embodiments may allow model personalization towards target and interference signal characteristics specific to the system/device, user and/or environment. Where the interference reduction system is implemented on a device, the learning process is executed fully on-device, without necessity for data exchange or processing externally to the device.
An interference signal reduction apparatus and method of interference signal reduction are described. An input signal includes a plurality of signal segments (frames). The input signal is provided to a machine learning model trained to output an estimate of a desired target signal from the input signal. Each signal segment may be classified as a target-only signal segment, an interference-only signal segment, or an undefined signal segment. The machine learning model may be adapted based on the target-only signal segments and the interference-only signal segments.
In some example embodiments the set of instructions/method steps described above are implemented as functional and software instructions embodied as a set of executable instructions which are effected on a computer or machine which is programmed with and controlled by said executable instructions. Such instructions are loaded for execution on a processor (such as one or more CPUs). The term processor includes microprocessors, microcontrollers, processor modules or subsystems (including one or more microprocessors or microcontrollers), or other control or computing devices. A processor can refer to a single component or to plural components.
In other examples, the set of instructions/methods illustrated herein and data and instructions associated therewith are stored in respective storage devices, which are implemented as one or more non-transient machine or computer-readable or computer-usable storage media or mediums. Such computer-readable or computer usable storage medium or media is (are) considered to be part of an article (or article of manufacture). An article or article of manufacture can refer to any manufactured single component or multiple components. The non-transient machine or computer usable media or mediums as defined herein excludes signals, but such media or mediums may be capable of receiving and processing information from signals and/or other transient mediums.
Example embodiments of the material discussed in this specification can be implemented in whole or in part through network, computer, or data based devices and/or services. These may include cloud, internet, intranet, mobile, desktop, processor, look-up table, microcontroller, consumer equipment, infrastructure, or other enabling devices and services. As may be used herein and in the claims, the following non-exclusive definitions are provided.
In one example, one or more instructions or steps discussed herein are automated. The terms automated or automatically (and like variations thereof) mean controlled operation of an apparatus, system, and/or process using computers and/or mechanical/electrical devices without the necessity of human intervention, observation, effort and/or decision.
Although the appended claims are directed to particular combinations of features, it should be understood that the scope of the disclosure of the present invention also includes any novel feature or any novel combination of features disclosed herein either explicitly or implicitly or any generalisation thereof, whether or not it relates to the same invention as presently claimed in any claim and whether or not it mitigates any or all of the same technical problems as does the present invention.
Features which are described in the context of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination.
The applicant hereby gives notice that new claims may be formulated to such features and/or combinations of such features during the prosecution of the present application or of any further application derived therefrom.
For the sake of completeness it is also stated that the term “comprising” does not exclude other elements or steps, the term “a” or “an” does not exclude a plurality, a single processor or other unit may fulfil the functions of several means recited in the claims and reference signs in the claims shall not be construed as limiting the scope of the claims.
Number | Date | Country | Kind |
---|---|---|---|
23198481.6 | Sep 2023 | EP | regional |