Robustness of speech processing system against ultrasound and dolphin attacks

Description

TECHNICAL FIELD

Embodiments described herein relate to methods and devices for improving the robustness of a speech processing system.

BACKGROUND

Many devices include microphones, which can be used to detect ambient sounds. In many situations, the ambient sounds include the speech of one or more nearby speaker. Audio signals generated by the microphones can be used in many ways. For example, audio signals representing speech can be used as the input to a speech recognition system, allowing a user to control a device or system using spoken commands.

It has been suggested that it is possible to interfere with the operation of such a system by transmitting an ultrasound signal, which is by definition inaudible to the user of the device, but which is converted into a signal in the audio frequency band by non-linear components of the electronic circuitry in the device, and which will be recognised as speech by the speech recognition system. Such a malicious ultrasonics-based attack is sometimes referred to as a “dolphin attack”, due to the similarity with how dolphins communicate in ultrasonic audio bands.

SUMMARY

According to an aspect of the present invention, there is provided a method for improving the robustness of a speech processing system having at least one speech processing module, the method comprising: receiving an input sound signal comprising audio and non-audio frequencies; separating the input sound signal into an audio band component and a non-audio band component; identifying possible interference within the audio band from the non-audio band component; and adjusting the operation of a downstream speech processing module based on said identification.

According to another aspect of the present invention, there is provided a system for improving the robustness of a speech processing system, configured for operating in accordance with the method.

According to another aspect of the present invention, there is provided a device comprising such a system. The device may comprise a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.

According to another aspect of the present invention, there is provided a computer program product, comprising a computer-readable tangible medium, and instructions for performing a method according to the first aspect.

According to another aspect of the present invention, there is provided a non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to the first aspect. According to further aspects of the invention, there is provided a device comprising the non-transitory computer readable storage medium. The device may comprise a mobile telephone, an audio player, a video player, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller or a domestic appliance.

According to another aspect of the present invention, there is provided a method of detecting an ultrasound interference signal, the method comprising:

- filtering an input signal to obtain an audio band component of the input signal;
- filtering the input signal to obtain an ultrasound component of the input signal;
- detecting an envelope of the ultrasound component of the input signal;
- detecting a degree of correlation between the audio band component of the input signal and the envelope of the ultrasound component of the input signal; and
- detecting a presence of an ultrasound interference signal if the degree of correlation between the audio band component of the input signal and the envelope of the ultrasound component of the input signal exceeds a threshold level.

According to another aspect of the present invention, there is provided a method of detecting an ultrasound interference signal, the method comprising:

- filtering an input signal to obtain an audio band component of the input signal;
- filtering the input signal to obtain an ultrasound component of the input signal;
- modifying the ultrasound component to simulate an effect of a non-linear downconversion of the input signal;
- detecting a degree of correlation between the audio band component of the input signal and the modified ultrasound component of the input signal; and
- detecting a presence of an ultrasound interference signal if the degree of correlation between the audio band component of the input signal and the modified ultrasound component of the input signal exceeds a threshold level.

According to another aspect of the present invention, there is provided a method of processing a signal containing an ultrasound interference signal, the method comprising:

- filtering an input signal to obtain an audio band component of the input signal;
- filtering the input signal to obtain an ultrasound component of the input signal;
- modifying the ultrasound component to simulate an effect of a non-linear downconversion of the input signal; and
- comparing the audio band component of the input signal and the modified ultrasound component.

In that case, comparing the audio band component of the input signal and the modified ultrasound component may comprise:

- detecting a degree of correlation between the audio band component of the input signal and the modified ultrasound component of the input signal; and
- detecting a presence of an ultrasound interference signal if the degree of correlation between the audio band component of the input signal and the modified ultrasound component of the input signal exceeds a threshold level.

The method may further comprise sending the audio band component of the input signal to a speech processing module only if no ultrasound interference signal is detected.

The step of comparing the audio band component of the input signal and the modified ultrasound component may comprise:

- applying the modified ultrasound component of the input signal to a filter; and
- subtracting the filtered modified ultrasound component of the input signal from the audio band component of the input signal to obtain an output signal.

The filter may be an adaptive filter, and the method may comprise adapting the adaptive filter such that the component of the filtered modified ultrasound component in the output signal is minimised.

BRIEF DESCRIPTION OF DRAWINGS

For a better understanding of the present invention, and to show how it may be put into effect, reference will now be made to the accompanying drawings, in which:

FIG. 1 illustrates a smartphone;

FIG. 2 is a schematic diagram, illustrating the form of the smartphone;

FIG. 3 illustrates a speech processing system;

FIG. 4 illustrates an effect of using a speech processing system;

FIG. 5 is a flow chart illustrating a method of handling an audio signal;

FIG. 6 is a block diagram illustrating a system using the method of FIG. 5;

FIG. 7 is a block diagram illustrating a system using the method of FIG. 5;

FIG. 8 is a block diagram of a system using the method of FIG. 5;

FIG. 9 is a block diagram of a system using the method of FIG. 5;

FIG. 10 is a block diagram of a system using the method of FIG. 5;

FIG. 11 is a block diagram of a system using the method of FIG. 5;

FIG. 12 is a block diagram of a system using the method of FIG. 5; and

FIG. 13 is a block diagram of a system using the method of FIG. 5.

DETAILED DESCRIPTION OF EMBODIMENTS

The description below sets forth example embodiments according to this disclosure. Further example embodiments and implementations will be apparent to those having ordinary skill in the art. Further, those having ordinary skill in the art will recognize that various equivalent techniques may be applied in lieu of, or in conjunction with, the embodiments discussed below, and all such equivalents should be deemed as being encompassed by the present disclosure.

The methods described herein can be implemented in a wide range of devices and systems. However, for ease of explanation of one embodiment, an illustrative example will be described, in which the implementation occurs in a smartphone.

FIG. 1 illustrates a smartphone 10, having a microphone 12 for detecting ambient sounds. In normal use, the microphone is of course used for detecting the speech of a user who is holding the smartphone 10 close to their face.

FIG. 2 is a schematic diagram, illustrating the form of the smartphone 10.

Specifically, FIG. 2 shows various interconnected components of the smartphone 10. It will be appreciated that the smartphone 10 will in practice contain many other components, but the following description is sufficient for an understanding of the present invention.

Thus, FIG. 2 shows the microphone 12 mentioned above. In certain embodiments, the smartphone 10 is provided with multiple microphones 12, 12a, 12b, etc.

FIG. 2 also shows a memory 14, which may in practice be provided as a single component or as multiple components. The memory 14 is provided for storing data and program instructions.

FIG. 2 also shows a processor 16, which again may in practice be provided as a single component or as multiple components. For example, one component of the processor 16 may be an applications processor of the smartphone 10.

FIG. 2 also shows a transceiver 18, which is provided for allowing the smartphone 10 to communicate with external networks. For example, the transceiver 18 may include circuitry for establishing an internet connection either over a WiFi local area network or over a cellular network.

FIG. 2 also shows audio processing circuitry 20, for performing operations on the audio signals detected by the microphone 12 as required. For example, the audio processing circuitry 20 may filter the audio signals or perform other signal processing operations.

In this embodiment, the smartphone 10 is provided with voice biometric functionality, and with control functionality. Thus, the smartphone 10 is able to perform various functions in response to spoken commands from an enrolled user. The biometric functionality is able to distinguish between spoken commands from the enrolled user, and the same commands when spoken by a different person. Thus, certain embodiments of the invention relate to operation of a smartphone or another portable electronic device with some sort of voice operability, for example a tablet or laptop computer, a games console, a home control system, a home entertainment system, an in-vehicle entertainment system, a domestic appliance, or the like, in which the voice biometric functionality is performed in the device that is intended to carry out the spoken command. Certain other embodiments relate to systems in which the voice biometric functionality is performed on a smartphone or other device, which then transmits the commands to a separate device if the voice biometric functionality is able to confirm that the speaker was the enrolled user.

In some embodiments, while voice biometric functionality is performed on the smartphone 10 or other device that is located close to the user, the spoken commands are transmitted using the transceiver 18 to a remote speech recognition system, which determines the meaning of the spoken commands. For example, the speech recognition system may be located on one or more remote server in a cloud computing environment. Signals based on the meaning of the spoken commands are then returned to the smartphone 10 or other local device.

FIG. 3 is a block diagram illustrating the basic form of a speech processing system in a device 10. Thus, signals received at a microphone 12 are passed to a speech processing block 30. For example, the speech processing block 30 may comprise a voice activity detector, a speaker recognition block for performing a speaker identification or speaker verification process, and/or a speech recognition block for identifying the speech content of the signals. The speech processing block 30 may also comprise signal conditioning circuitry, such as a pre-amplifier, analog-digital conversion circuitry, and the like.

In such a system, there may be a non-linearity in the system. For example, the non-linearity may be in the microphone 12, or may be in signal conditioning circuitry in the speech processing block 30.

The effect of this is non-linearity in the circuitry is that ultrasonic tones may mix down into the audio band.

FIG. 4 illustrates this schematically. Specifically, FIG. 4 shows a situation where there are interfering signals at two frequencies F₁and F₂in the ultrasound frequency range (i.e. at frequencies>20 kHz), which mix down as a result of the circuit non-linearity to form a signal at a frequency F₃in the audio frequency range (i.e. at frequencies between about 20 Hz and 20 kHz).

FIG. 5 is a flow chart, illustrating a method of analysing an audio signal.

In step 52, the method comprises receiving an input sound signal comprising audio and non-audio frequencies.

In step 54, the method comprises separating the input sound signal into an audio band component and a non-audio band component. The non-audio component may be an ultrasonic component.

In step 56, the method comprises identifying possible interference within the audio band from the non-audio band.

Identifying possible interference within the audio band from the non-audio band component may comprise determining whether a power level of the non-audio band component exceeds a threshold value and, if so, identifying possible interference within the audio band from the non-audio band component.

Alternatively, identifying possible interference within the audio band from the non-audio band component may comprise comparing the audio band and non-audio band components.

Separating the input sound signal into an audio component and a non-audio component, such as an ultrasonic component, makes it possible to identify the presence of potentially problematic non-audio band components which may result in interference in the audio band. Such problematic signals may be present accidentally, as the result of relatively high levels of background sound signals, such as ultrasonic signals from ultrasonic sensor devices or modems. Alternatively, the problematic signals may be generated by a malicious actor in an attempt to interfere with or spoof the operation of a speech processing system, for example by generating ultrasonic signals that mix down as a result of circuit non-linearities to form audio band signals that can be misinterpreted as speech, or by generating ultrasonic signals that interfere with other aspects of the processing.

In step 58, the method comprises adjusting the operation of a downstream speech processing module based on said identification of possible interference.

The adjusting of the operation of the speech processing module may take the form of modifications to the speech processing that is performed by the speech processing module, or may take the form of modifications to the signal that is applied to the speech processing module.

For example, modifications to the speech processing that is performed by the speech processing module may involve placing less (or zero) reliance on the speech signal during time periods when possible interference is identified, or warning a user that there is possible interference.

For example, modifications to the signal that is applied to the speech processing module may take the form of attempting to remove the effect of the interference.

FIG. 6 is a block diagram illustrating the basic form of a speech processing system in a device 10. As in FIG. 3, signals received at a microphone 12 are passed to a speech processing block 30. Again, as in FIG. 3, the speech processing block 30 may comprise a voice activity detector, a speaker recognition block for performing a speaker identification or speaker verification process, and/or a speech recognition block for identifying the speech content of the signals. The speech processing block 30 may also comprise signal conditioning circuitry, such as a pre-amplifier, analog-digital conversion circuitry, and the like.

As mentioned with respect to FIG. 3, there may be a non-linearity in the system. For example, the non-linearity may be in the microphone 12, or may be in signal conditioning circuitry in the speech processing block 30.

In the system of FIG. 6, the received signals are also passed to an ultrasound monitoring block 62, which separates the input sound signal into an audio band component and a non-audio band component, which may be an ultrasonic component, and identifies possible interference within the audio band from the non-audio band component.

If a source of possible interference is identified, the speech processing that is performed by the speech processing module may be modified appropriately.

FIG. 7 is a block diagram illustrating the basic form of a speech processing system in a device 10. In the system of FIG. 7, signals received at a microphone 12 are passed to an ultrasound monitoring block 66, which separates the input sound signal into an audio band component and a non-audio band component, which may be an ultrasonic component, and identifies possible interference within the audio band from the non-audio band component, resulting for example from non-linearity in the microphone 12.

If a source of possible interference is identified, the received signal may be modified appropriately, and the modified signal may then be applied to the speech processing module 30.

As in FIG. 3, the speech processing block 30 may comprise a voice activity detector, a speaker recognition block for performing a speaker identification or speaker verification process, and/or a speech recognition block for identifying the speech content of the signals. The speech processing block 30 may also comprise signal conditioning circuitry, such as a pre-amplifier, analog-digital conversion circuitry, and the like.

FIG. 8 is a block diagram, illustrating the form of the ultrasound monitoring block 62 or 66, in some embodiments.

In this embodiment, signals received from the microphone 12 are separated into an audio band component and a non-audio band component. The received signals are passed to a low-pass filter (LPF) 82, for example a low-pass filter with a cut-off frequency at or below ˜20 kHz, which filters the input sound signal to obtain an audio band component of the input sound signal. The received signals are also passed to a high-pass filter (HPF) 84, for example a high-pass filter with a cut-off frequency at or above ˜20 kHz, to obtain a non-audio band component of the input sound signal, which will be an ultrasound signal when the high-pass filter has a cut-off frequency at or above ˜20 kHz. In other embodiments, the HPF 84 may be replaced by a band-pass filter, for example with a pass-band from ˜20 kHz to ˜90 kHz. Again, the non-audio band component of the input sound signal will be an ultrasound signal when the low frequency end of the pass band of the band-pass filter is at or above ˜20 kHz.

The non-audio band component of the input sound signal is passed to a power level detect block 150, which determines whether a power level of the non-audio band component exceeds a threshold value. For example, the power level detect block 150 may determine whether the peak non-audio band (e.g. ultrasound) power level exceeds a threshold. For example, it may determine whether the peak ultrasound power level exceeds −30 dBFS (decibels relative to full scale). Such a level of ultrasound may result from an attack by a malicious party. In any event, if the ultrasound power level exceeds the threshold value, it could be identified that this may result in interference in the audio band due to non-linearities.

The threshold value may be set based on knowledge of the effect of the non-linearity in the circuit. Thus, if the effect of the nonlinearity is known to be a value A(nl), for example a 40 dB mixdown, it is possible to set a threshold A(bb) for a power level in the audio base band which could affect system operation, for example 30 dB SPL.

Then, an ultrasonic signal at or above A(us), where A(us)=A(bb)+A(nl), would cause problems in the audio band, because the non-linearity would cause it to generate a base band signal above the threshold at which system operation could be affected. With the examples given above, where A(nl)=40 dB and A(bb)=30 dB SPL, this gives a threshold value of 70 dB for the ultrasound power level.

If it is determined that the ultrasound power level exceeds the threshold value, the output of the power level detect block 150 may be a flag, to be sent to the downstream speech processing module in step 58 of the method of FIG. 5, in order to control the operation thereof.

FIG. 9 is a block diagram, illustrating the form of the ultrasound monitoring block 62 or 66, in some embodiments.

The non-audio band component of the input sound signal is passed to a power level compare block 160. This compares the audio band and non-audio band components.

For example, in this case, identifying possible interference within the audio band from the non-audio band component may comprise: measuring a signal power in the audio band component P_a; measuring a signal power in the non-audio band component P_b. Then, if (P_a/P_b) is less than a threshold limit, it could be identified that this may result in interference in the audio band due to non-linearities.

In that case, the output of the power level compare block 160 may be a flag, to be sent to the downstream speech processing module in step 58 of the method of FIG. 5, in order to control the operation thereof. More specifically, this flag may indicate to the speech processing module that the quality of the input sound signal is unreliable for speech processing. The operation of the downstream speech processing module may then be controlled based on the flagged unreliable quality.

FIG. 10 is a block diagram, illustrating the form of the ultrasound monitoring block 62 or 66, in some embodiments.

Signals received from the microphone 12 are separated into an audio band component and a non-audio band component. The received signals are passed to a low-pass filter (LPF) 82, for example a low-pass filter with a cut-off frequency at or below ˜20 kHz, which filters the input sound signal to obtain an audio band component of the input sound signal. The received signals are also passed to a high-pass filter (HPF) 84, for example a high-pass filter with a cut-off frequency at or above ˜20 kHz, to obtain a non-audio band component of the input sound signal, which will be an ultrasound signal when the high-pass filter has a cut-off frequency at or above ˜20 kHz. In other embodiments, the HPF 84 may be replaced by a band-pass filter, for example with a pass-band from ˜20 kHz to ˜90 kHz. Again, the non-audio band component of the input sound signal will be an ultrasound signal when the low frequency end of the pass band of the band-pass filter is at or above ˜20 kHz.

The non-audio band component of the input sound signal may be passed to a block 86 that simulates the effect of a non-linearity on the signal, and then to a low-pass filter 88.

The audio band component generated by the low-pass filter 82 and the simulated non-linear signal generated by the block 86 and the low-pass filter 88 are then passed to a comparison block 90.

In one embodiment, the comparison block 90 measures a signal power in the audio band component, measures a signal power in the non-audio band component, and calculates a ratio of the signal power in the audio band component to the signal power in the non-audio band component. If this ratio is below a threshold limit, this is taken to indicate that the input sound signal may contain too high a level of ultrasound to be reliably used for speech processing. In that case, the output of the comparison block 90 may be a flag, to be sent to the downstream speech processing module in step 58 of the method of FIG. 5, in order to control the operation thereof.

In another embodiment, the comparison block 90 detects the envelope of the signal of the non-audio band component, and detects a level of correlation between the envelope of the signal and the audio band component. Detecting the level of correlation may comprise measuring a time-domain correlation between identified signal envelopes of the non-audio band component, and speech components of the audio band component. In this situation, some or all of the audio band component may result from ultrasound signals in the ambient sound, that have been downconverted into the audio band by non-linearities in the microphone 12. This will lead to a correlation with the non-audio band component that is selected by the filter 84. Therefore, the presence of such a correlation exceeding a threshold value is taken as an indication that there may be non-audio band interference within the audio band.

In that case, the output of the comparison block 90 may be a flag, to be sent to the downstream speech processing module in step 58 of the method of FIG. 5, in order to control the operation thereof.

In another embodiment, the block 86 simulates the effect of a non-linearity on the signal, to provide a simulated non-linear signal. For example, the block 86 may attempt to model the non-linearity in the system that may be causing the interference by non-linear downconversion of the input sound signal. The non-linearities simulated by the block 86 may be second-order and/or third-order non-linearities.

In that embodiment, the comparison block 90 then detects a level of correlation between the simulated non-linear signal and the audio band component. If the level of correlation exceeds a threshold value, then it is determined that there may be interference within the audio band caused by signals from the non-audio band.

Again, in that case, the output of the comparison block 90 may be a flag, to be sent to the downstream speech processing module in step 58 of the method of FIG. 5, in order to control the operation thereof.

FIG. 11 is a block diagram, illustrating the form of the ultrasound monitoring block 66, in some other embodiments.

The non-audio band component of the input sound signal may be passed to a block 86 that simulates the effect of a non-linearity on the signal, and then to a low-pass filter 88.

In the case of the embodiments shown in FIG. 11, the adjustment of the operation of the downstream speech processing module, in step 58 of the method of FIG. 5, comprises providing a compensated sound signal to the downstream speech processing module.

The step of providing the compensated sound signal may comprise subtracting the simulated non-linear signal from the audio band component to provide the compensated output signal, which is then provided to the downstream speech processing module.

In the embodiment of FIG. 11, the simulated non-linear signal generated by the block 86 and the low-pass filter 88 are passed to a further filter 100.

The audio band component generated by the low-pass filter 82 is passed to a subtractor 102, and the output of the further filter 100 is subtracted from the audio band component, in order to remove from the audio band signal any component caused by downconversion of ultrasound signals. The further filter 100 may be an adaptive filter, and in its simplest form it may be an adaptive gain. The further filter 100 is adapted such that the component of the filtered simulated non-linearity signal in the compensated output signal is minimised.

The resulting compensated audio band signal is passed to the downstream speech processing module.

FIG. 12 is a block diagram, illustrating the form of the ultrasound monitoring block 66, in some other embodiments.

In the embodiments illustrated above, the signals from the microphone 12 may be analog signals, and they may be passed to an analog-digital converter for conversion to digital form before being passed to the respective filters. However, for ease of illustration, in cases where it is assumed that the analog-digital conversion is not the source of non-linearity that causes ultrasound signals to be mixed down into the audio band, the analog-digital converters have not been shown in the figures.

However, FIG. 12 shows a case in which the analog-digital conversion is not ideal, and so FIG. 12 shows signals received from the microphone 12 being passed to an analog-digital converter (ADC) 120.

Again, the resulting signal is separated into an audio band component and a non-audio band component. The received signals are passed to a low-pass filter (LPF) 82, for example a low-pass filter with a cut-off frequency at or below ˜20 kHz, which filters the input sound signal to obtain an audio band component of the input sound signal.

In general the bandwidth of the ADC must be large enough to be able to handle the ultrasonic components of the received signal. However, in any real ADC, there will be a frequency at which the quantization noise of the ADC will start to rise. This places an upper limit on the frequencies that can be allowed into the non-linearity. Therefore, FIG. 12 shows the output of the ADC 120 being passed not to a high-pass filter, but to a band-pass filter (BPF) 122. The lower end of the pass-band may for example be at ˜20 kHz, with the upper end of the pass-band being at a frequency that excludes the frequencies that are corrupted by quantization noise, for example at ˜90 kHz.

As in other embodiments, the non-audio band component of the input sound signal may be passed to a block 86 that simulates the effect of a non-linearity on the signal, and then to a low-pass filter 88.

In the case of the embodiments shown in FIG. 12, the adjustment of the operation of the downstream speech processing module, in step 58 of the method of FIG. 5, comprises providing a compensated sound signal to the downstream speech processing module.

In this illustrated example, the step of providing the compensated sound signal may comprise subtracting the simulated non-linear signal from the audio band component to provide the compensated output signal, which is then provided to the downstream speech processing module.

Thus, in FIG. 12, the audio band component generated by the low-pass filter 82 is passed to a subtractor 102, and the simulated non-linear signal generated by the block 86 and the low-pass filter 88 is subtracted from the audio band component. This attempts to remove from the audio band signal any component caused by downconversion of ultrasound signals.

The resulting compensated audio band signal is passed to the downstream speech processing module.

FIG. 13 is a block diagram, illustrating the form of the ultrasound monitoring block 66, in some other embodiments, where the non-linearity in the microphone 12 or elsewhere is unknown (for example the magnitude of the non-linearity and/or the relative strengths of 2^ndorder non-linearity and 3^rdorder non-linearity). In this case, the step of simulating a non-linearity comprises providing the non-audio band component to an adaptive non-linearity module, and the method comprises controlling the adaptive non-linearity module such that the component of the simulated non-linearity signal in the compensated output signal is minimised.

Thus, FIG. 13 shows the received signal being passed to a low-pass filter (LPF) 82, for example a low-pass filter with a cut-off frequency at or below ˜20 kHz, which filters the input sound signal to obtain an audio band component of the input sound signal.

FIG. 13 shows the received signal being passed to a band-pass filter (BPF) 122. The lower end of the pass-band may for example be at ˜20 kHz, with the upper end of the pass-band being at a frequency that excludes the frequencies that are corrupted by quantization noise, for example at ˜90 kHz.

In these embodiments, the non-audio band component of the input sound signal may be passed to an adaptive block 140 that simulates the effect of a non-linearity on the signal. The output of the block 140 is passed to a low-pass filter 88.

As before, the adjustment of the operation of the downstream speech processing module, in step 58 of the method of FIG. 5, comprises providing a compensated sound signal to the downstream speech processing module.

More specifically, in this illustrated example, the step of providing the compensated sound signal may comprise subtracting the simulated non-linear signal from the audio band component to provide the compensated output signal, which is then provided to the downstream speech processing module.

Thus, in FIG. 13, the audio band component generated by the low-pass filter 82 is passed to a subtractor 102, and the simulated non-linear signal generated by the block 140 and the low-pass filter 88 is subtracted from the audio band component. This attempts to remove from the audio band signal any component caused by downconversion of ultrasound signals.

The resulting compensated audio band signal is passed to the downstream speech processing module.

In one example, the non-linearity may be modelled in the block 140 with a polynomial p(x), with the error being fed back from the output of the subtractor 102.

The Least Mean Squares algorithm may update the m-th polynomial term p_mas per:

p_m→p_m+μ·ε·x^m
p_m→p_m+μ·(x−α)·x^m.

An alternative version applies a filtering to the error signal:

p_m→p_m+μ·λ{(x−α)·x^m},

where λ is a filter function.

For example a simple Boxcar filter could be used.

Any of the embodiments described above can be used in a two-stage system, in which the first stage corresponds to that shown in FIG. 8. That is, the received signal is filtered to obtain an audio band component and a non-audio band (for example, ultrasound) component of the input signal. It is then determined whether the signal power in the non-audio band component is below or above a threshold value. If there is a low power level in the ultrasound band, this indicates that there is unlikely to be a problem caused by downconversion of audio signals to the audio band. If there is a higher power level in the ultrasound band, there is a possibility of a problem, and so the further processing described above with reference to FIG. 10, 11, 12 or 13 is performed to determine if interference is likely, and to take mitigating action if required. For example, if the measured signal power level in the non-audio band component is below a threshold level X, the input sound signal may be flagged as free of non-audio band interference, and, if the measured signal power level in the non-audio band component is above a threshold level X, the audio band and non-audio band components may be compared to identify possible interference within the audio band from the non-audio band.

This allows for low-power operation, as the comparison step will only be performed in situations where the non-audio band component has a signal power above the threshold level. For a non-audio band component having signal power below such a threshold, it can be assumed that no interference will be present in the input sound signal used for downstream speech processing.

The skilled person will recognise that some aspects of the above-described apparatus and methods may be embodied as processor control code, for example on a non-volatile carrier medium such as a disk, CD- or DVD-ROM, programmed memory such as read only memory (Firmware), or on a data carrier such as an optical or electrical signal carrier. For many applications embodiments of the invention will be implemented on a DSP (Digital Signal Processor), ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array). Thus the code may comprise conventional program code or microcode or, for example code for setting up or controlling an ASIC or FPGA. The code may also comprise code for dynamically configuring re-configurable apparatus such as re-programmable logic gate arrays. Similarly the code may comprise code for a hardware description language such as Verilog™ or VHDL (Very high speed integrated circuit Hardware Description Language). As the skilled person will appreciate, the code may be distributed between a plurality of coupled components in communication with one another. Where appropriate, the embodiments may also be implemented using code running on a field-(re)programmable analogue array or similar device in order to configure analogue hardware.

Note that as used herein the term module shall be used to refer to a functional unit or block which may be implemented at least partly by dedicated hardware components such as custom defined circuitry and/or at least partly be implemented by one or more software processors or appropriate code running on a suitable general purpose processor or the like. A module may itself comprise other modules or functional units. A module may be provided by multiple components or sub-modules which need not be co-located and could be provided on different integrated circuits and/or running on different processors.

Embodiments may be implemented in a host device, especially a portable and/or battery powered host device such as a mobile computing device for example a laptop or tablet computer, a games console, a remote control device, a home automation controller or a domestic appliance including a domestic temperature or lighting control system, a toy, a machine such as a robot, an audio player, a video player, or a mobile telephone for example a smartphone.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim, “a” or “an” does not exclude a plurality, and a single feature or other unit may fulfil the functions of several units recited in the claims. Any reference numerals or labels in the claims shall not be construed so as to limit their scope.

Claims

1. A method for improving the robustness of a speech processing system having at least one speech processing module, the method comprising: receiving an input sound signal comprising audio and non-audio frequencies;separating the input sound signal into an audio band component and a non-audio band component;identifying possible interference within the audio band from the non-audio band component, wherein the step of identifying possible interference within the audio band from the non-audio band component comprises: comparing the audio band and non-audio band components;measuring a signal power in the audio band component Pa;measuring a signal power in the non-audio band component Pb; andif (Pa/Pb)<threshold limit, flagging the quality of the input sound signal as unreliable for speech processing; andadjusting operation of a downstream speech processing module based on said identification, wherein the step of adjusting comprises controlling the operation of a downstream speech processing module based on the flagged unreliable quality.
2. The method of claim 1, wherein identifying possible interference within the audio band from the non-audio band component comprises determining whether a power level of the non-audio band component exceeds a threshold value and, if so, identifying possible interference within the audio band from the non-audio band component.
3. The method of claim 1, wherein the step of separating comprises: filtering the input sound signal to obtain an audio band component of the input sound signal; andfiltering the input sound signal to obtain a non-audio band component of the input sound signal.
4. The method of claim 1, wherein the speech processing system is a voice biometrics system.
5. A method for improving the robustness of a speech processing system having at least one speech processing module, the method comprising: receiving an input sound signal comprising audio and non-audio frequencies;separating the input sound signal into an audio band component and a non-audio band component;identifying possible interference within the audio band from the non-audio band component, wherein the step of identifying possible interference within the audio band from the non-audio band component comprises comparing the audio band and non-audio band components, and wherein the step of comparing comprises: detecting an envelope of the non-audio band component;detecting a level of correlation between the envelope of the non-audio band component and the audio band component; anddetermining possible non-audio band interference within the audio band if the level of correlation exceeds a threshold value; andadjusting operation of a downstream speech processing module based on said identification.
6. The method of claim 5, wherein the step of adjusting comprises flagging a detection of possible non-audio band interference within the audio band to a downstream speech processing module.
7. The method of claim 5, wherein the step of separating comprises: filtering the input sound signal to obtain an audio band component of the input sound signal; andfiltering the input sound signal to obtain a non-audio band component of the input sound signal.
8. The method of claim 5, wherein the speech processing system is a voice biometrics system.
9. A method for improving the robustness of a speech processing system having at least one speech processing module, the method comprising: receiving an input sound signal comprising audio and non-audio frequencies;separating the input sound signal into an audio band component and a non-audio band component;identifying possible interference within the audio band from the non-audio band component, wherein the step of identifying possible interference within the audio band from the non-audio band component comprises comparing the audio band and non-audio band components, and wherein the step of comparing comprises: simulating an effect of a non-linearity on the non-audio band component to provide a simulated non-linear signal;detecting a level of correlation between the simulated non-linear signal and the audio band component; anddetermining possible non-audio band interference within the audio band if the level of correlation exceeds a threshold value; andadjusting operation of a downstream speech processing module based on said identification.
10. The method of claim 9, wherein the step of separating comprises: filtering the input sound signal to obtain an audio band component of the input sound signal; andfiltering the input sound signal to obtain a non-audio band component of the input sound signal.
11. The method of claim 9, wherein the speech processing system is a voice biometrics system.
12. A method for improving the robustness of a speech processing system having at least one speech processing module, the method comprising: receiving an input sound signal comprising audio and non-audio frequencies;separating the input sound signal into an audio band component and a non-audio band component;identifying possible interference within the audio band from the non-audio band component; andadjusting operation of a downstream speech processing module based on said identification, wherein the step of adjusting comprises providing a compensated sound signal to a downstream speech processing module; and wherein the step of providing a compensated sound signal comprises: subtracting a simulated non-linear signal from the audio band component to provide a compensated output signal; andproviding the compensated output signal to a downstream speech processing module.
13. The method of claim 12, wherein the step of subtracting comprises: applying the simulated non-linearity signal to a filter; andsubtracting the filtered simulated non-linearity signal from the audio band component of the input sound signal to provide a compensated output signal.
14. A method according to claim 13, wherein the filter is an adaptive filter, and the method comprises adapting the adaptive filter such that the component of the filtered simulated non-linearity signal in the compensated output signal is minimised.
15. The method of claim 14, wherein adapting the adaptive filter comprises adapting a gain of the filter.
16. The method of claim 14, wherein adapting the adaptive filter comprises adapting filter coefficients of the filter.
17. The method of claim 12, wherein the step of separating comprises: filtering the input sound signal to obtain an audio band component of the input sound signal; andfiltering the input sound signal to obtain a non-audio band component of the input sound signal.
18. The method of claim 12, wherein the speech processing system is a voice biometrics system.
19. A method for improving the robustness of a speech processing system having at least one speech processing module, the method comprising: receiving an input sound signal comprising audio and non-audio frequencies;separating the input sound signal into an audio band component and a non-audio band component;identifying possible interference within the audio band from the non-audio band component, wherein the step of identifying possible interference within the audio band from the non-audio band component comprises comparing the audio band and non-audio band components; andadjusting operation of a downstream speech processing module based on said identification;wherein the steps of comparing and adjusting comprise: simulating an effect of a non-linearity on the non-audio band component to provide a simulated non-linear signal;subtracting the simulated non-linear signal from the audio band component to provide a compensated output signal; andproviding the compensated output signal to a downstream speech processing module.
20. The method of claim 19, wherein the step of simulating the effect of the non-linearity comprises providing the non-audio band component to an adaptive non-linearity module, and wherein the method comprises controlling the adaptive non-linearity module such that the component of the simulated non-linearity signal in the compensated output signal is minimised.
21. The method of claim 19, wherein the step of separating comprises: filtering the input sound signal to obtain an audio band component of the input sound signal; andfiltering the input sound signal to obtain a non-audio band component of the input sound signal.
22. The method of claim 19, wherein the speech processing system is a voice biometrics system.
23. A method for improving the robustness of a speech processing system having at least one speech processing module, the method comprising: receiving an input sound signal comprising audio and non-audio frequencies;separating the input sound signal into an audio band component and a non-audio band component;identifying possible interference within the audio band from the non-audio band component;adjusting operation of a downstream speech processing module based on said identification; andmeasuring a signal power in the non-audio band component Pb, wherein the method is responsive to the step of measuring the signal power, such that: if the measured signal power level Pb is below a threshold level X, the method comprises flagging the input sound signal as free of non-audio band interference, andif the measured signal power level Pb is above a threshold level X, the method performs the step of identifying possible interference within the audio band from the non-audio band component.
24. The method of claim 23, wherein the step of separating comprises: filtering the input sound signal to obtain an audio band component of the input sound signal; andfiltering the input sound signal to obtain a non-audio band component of the input sound signal.
25. The method of claim 23, wherein the speech processing system is a voice biometrics system.
26. A system for improving the robustness of a speech processing system having at least one speech processing module, the system comprising an input for receiving an input sound signal comprising audio and non-audio frequencies; and a filter for separating a non-audio band component from the input sound signal, and the system being configured for: receiving an input sound signal comprising audio and non-audio frequencies;separating the input sound signal into an audio band component and a non-audio band component;identifying possible interference within the audio band from the non-audio band component, wherein the step of identifying possible interference within the audio band from the non-audio band component comprises: comparing the audio band and non-audio band components;measuring a signal power in the audio band component Pa;measuring a signal power in the non-audio band component Pb; andif (Pa/Pb)<threshold limit, flagging the quality of the input sound signal as unreliable for speech processing; andadjusting operation of a downstream speech processing module based on said identification, wherein the step of adjusting comprises controlling operation of a downstream speech processing module based on the flagged unreliable quality.
27. A system for improving the robustness of a speech processing system having at least one speech processing module, the system comprising an input for receiving an input sound signal comprising audio and non-audio frequencies; and a filter for separating a non-audio band component from the input sound signal, and the system being configured for: receiving an input sound signal comprising audio and non-audio frequencies;separating the input sound signal into an audio band component and a non-audio band component;identifying possible interference within the audio band from the non-audio band component, wherein the step of identifying possible interference within the audio band from the non-audio band component comprises comparing the audio band and non-audio band components, and wherein the step of comparing comprises: detecting an envelope of the non-audio band component;detecting a level of correlation between the envelope of the non-audio band component and the audio band component; anddetermining possible non-audio band interference within the audio band if the level of correlation exceeds a threshold value; andadjusting operation of a downstream speech processing module based on said identification.
28. A system for improving the robustness of a speech processing system having at least one speech processing module, the system comprising an input for receiving an input sound signal comprising audio and non-audio frequencies; and a filter for separating a non-audio band component from the input sound signal, and the system being configured for: receiving an input sound signal comprising audio and non-audio frequencies;separating the input sound signal into an audio band component and a non-audio band component;identifying possible interference within the audio band from the non-audio band component, wherein the step of identifying possible interference within the audio band from the non-audio band component comprises comparing the audio band and non-audio band components, and wherein the step of comparing comprises: simulating an effect of a non-linearity on the non-audio band component to provide a simulated non-linear signal;detecting a level of correlation between the simulated non-linear signal and the audio band component; anddetermining possible non-audio band interference within the audio band if the level of correlation exceeds a threshold value; andadjusting operation of a downstream speech processing module based on said identification.
29. A system for improving the robustness of a speech processing system having at least one speech processing module, the system comprising an input for receiving an input sound signal comprising audio and non-audio frequencies; and a filter for separating a non-audio band component from the input sound signal, and the system being configured for: receiving an input sound signal comprising audio and non-audio frequencies;separating the input sound signal into an audio band component and a non-audio band component;identifying possible interference within the audio band from the non-audio band component; andadjusting operation of a downstream speech processing module based on said identification, wherein the step of adjusting comprises providing a compensated sound signal to a downstream speech processing module; and wherein the step of providing a compensated sound signal comprises: subtracting a simulated non-linear signal from the audio band component to provide a compensated output signal; andproviding the compensated output signal to a downstream speech processing module.
30. A system for improving the robustness of a speech processing system having at least one speech processing module, the system comprising an input for receiving an input sound signal comprising audio and non-audio frequencies; and a filter for separating a non-audio band component from the input sound signal, and the system being configured for: receiving an input sound signal comprising audio and non-audio frequencies;separating the input sound signal into an audio band component and a non-audio band component;identifying possible interference within the audio band from the non-audio band component, wherein the step of identifying possible interference within the audio band from the non-audio band component comprises comparing the audio band and non-audio band components; andadjusting operation of a downstream speech processing module based on said identification;wherein the steps of comparing and adjusting comprise: simulating an effect of a non-linearity on the non-audio band component to provide a simulated non-linear signal;subtracting the simulated non-linear signal from the audio band component to provide a compensated output signal; andproviding the compensated output signal to a downstream speech processing module.
31. A system for improving the robustness of a speech processing system having at least one speech processing module, the system comprising an input for receiving an input sound signal comprising audio and non-audio frequencies; and a filter for separating a non-audio band component from the input sound signal, and the system being configured for: receiving an input sound signal comprising audio and non-audio frequencies;separating the input sound signal into an audio band component and a non-audio band component;identifying possible interference within the audio band from the non-audio band component;adjusting operation of a downstream speech processing module based on said identification; andmeasuring a signal power in the non-audio band component Pb, wherein the method is responsive to the step of measuring the signal power, such that: if the measured signal power level Pb is below a threshold level X, the method comprises flagging the input sound signal as free of non-audio band interference, andif the measured signal power level Pb is above a threshold level X, the method performs the step of identifying possible interference within the audio band from the non-audio band component.
32. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to claim 1.

Priority Claims (1)

Number	Date	Country	Kind
1801874.7	Feb 2018	GB	national

US Referenced Citations (202)

Number	Name	Date	Kind
5197113	Mumolo	Mar 1993	A
5568559	Makino	Oct 1996	A
5787187	Bouchard et al.	Jul 1998	A
6480825	Sharma et al.	Nov 2002	B1
7016833	Gable et al.	Mar 2006	B2
7039951	Chaudhari et al.	May 2006	B1
7492913	Connor et al.	Feb 2009	B2
8489399	Gross	Jul 2013	B2
8856541	Chaudhury et al.	Oct 2014	B1
8997191	Stark et al.	Mar 2015	B1
9049983	Baldwin	Jun 2015	B1
9171548	Velius et al.	Oct 2015	B2
9305155	Vo et al.	Apr 2016	B1
9317736	Siddiqui	Apr 2016	B1
9390726	Smus et al.	Jul 2016	B1
9430629	Ziraknejad et al.	Aug 2016	B1
9484036	Kons et al.	Nov 2016	B2
9548979	Johnson et al.	Jan 2017	B1
9641585	Kvaal et al.	May 2017	B2
9646261	Agrafioti et al.	May 2017	B2
9659562	Lovitt	May 2017	B2
9665784	Derakhshani et al.	May 2017	B2
9984314	Philipose et al.	May 2018	B2
10032451	Mamkina et al.	Jul 2018	B1
10063542	Kao	Aug 2018	B1
10079024	Bhimanaik et al.	Sep 2018	B1
10097914	Petrank	Oct 2018	B2
10192553	Chenier et al.	Jan 2019	B1
10204625	Mishra et al.	Feb 2019	B2
10210685	Borgmeyer	Feb 2019	B2
10305895	Barry et al.	May 2019	B2
10318580	Topchy et al.	Jun 2019	B2
10334350	Petrank	Jun 2019	B2
10460095	Boesen	Oct 2019	B2
10467509	Albadawi et al.	Nov 2019	B2
10733987	Govender et al.	Aug 2020	B1
20020194003	Mozer	Dec 2002	A1
20030033145	Petrushin	Feb 2003	A1
20030177006	Ichikawa et al.	Sep 2003	A1
20030177007	Kanazawa et al.	Sep 2003	A1
20040030550	Liu	Feb 2004	A1
20040141418	Matsuo et al.	Jul 2004	A1
20050060153	Gable et al.	Mar 2005	A1
20050171774	Applebaum et al.	Aug 2005	A1
20060171571	Chan et al.	Aug 2006	A1
20070055517	Spector	Mar 2007	A1
20070129941	Tavares	Jun 2007	A1
20070185718	Di Mambro et al.	Aug 2007	A1
20070233483	Kuppuswamy et al.	Oct 2007	A1
20070250920	Lindsay	Oct 2007	A1
20080071532	Ramakrishnan et al.	Mar 2008	A1
20080082510	Wang et al.	Apr 2008	A1
20080223646	White	Sep 2008	A1
20080262382	Akkermans et al.	Oct 2008	A1
20080285813	Holm	Nov 2008	A1
20090087003	Zurek et al.	Apr 2009	A1
20090105548	Bart	Apr 2009	A1
20090167307	Kopp	Jul 2009	A1
20090232361	Miller	Sep 2009	A1
20090281809	Reuss	Nov 2009	A1
20090319270	Gross	Dec 2009	A1
20100004934	Hirose et al.	Jan 2010	A1
20100076770	Ramaswamy	Mar 2010	A1
20100204991	Ramakrishnan et al.	Aug 2010	A1
20100328033	Kamei	Dec 2010	A1
20110051907	Jaiswal et al.	Mar 2011	A1
20110246198	Asenjo et al.	Oct 2011	A1
20110276323	Seytetdinov	Nov 2011	A1
20110314530	Donaldson	Dec 2011	A1
20110317848	Ivanov	Dec 2011	A1
20120110341	Beigi	May 2012	A1
20120223130	Knopp et al.	Sep 2012	A1
20120224456	Visser et al.	Sep 2012	A1
20120249328	Xiong	Oct 2012	A1
20120323796	Udani	Dec 2012	A1
20130024191	Krutsch et al.	Jan 2013	A1
20130058488	Cheng et al.	Mar 2013	A1
20130080167	Mozer	Mar 2013	A1
20130227678	Kang	Aug 2013	A1
20130247082	Wang et al.	Sep 2013	A1
20130279297	Wulff et al.	Oct 2013	A1
20130279724	Stafford et al.	Oct 2013	A1
20130289999	Hymel	Oct 2013	A1
20140059347	Dougherty et al.	Feb 2014	A1
20140149117	Bakish et al.	May 2014	A1
20140188770	Agrafioti et al.	Jul 2014	A1
20140237576	Zhang et al.	Aug 2014	A1
20140241597	Leite	Aug 2014	A1
20140293749	Gervaise	Oct 2014	A1
20140307876	Agiomyrgiannakis et al.	Oct 2014	A1
20140330568	Lewis et al.	Nov 2014	A1
20140337945	Jia et al.	Nov 2014	A1
20140343703	Topchy et al.	Nov 2014	A1
20150006163	Liu et al.	Jan 2015	A1
20150033305	Shear et al.	Jan 2015	A1
20150036462	Calvarese	Feb 2015	A1
20150088509	Gimenez et al.	Mar 2015	A1
20150089616	Brezinski et al.	Mar 2015	A1
20150112682	Rodriguez et al.	Apr 2015	A1
20150134330	Baldwin et al.	May 2015	A1
20150161370	North et al.	Jun 2015	A1
20150161459	Boczek	Jun 2015	A1
20150168996	Sharpe et al.	Jun 2015	A1
20150245154	Dadu et al.	Aug 2015	A1
20150261944	Hosom et al.	Sep 2015	A1
20150301796	Visser et al.	Oct 2015	A1
20150332665	Mishra et al.	Nov 2015	A1
20150347734	Beigi	Dec 2015	A1
20150356974	Tani et al.	Dec 2015	A1
20150371639	Foerster et al.	Dec 2015	A1
20160026781	Boczek	Jan 2016	A1
20160071275	Hirvonen	Mar 2016	A1
20160086609	Yue et al.	Mar 2016	A1
20160111112	Hayakawa	Apr 2016	A1
20160125877	Foerster et al.	May 2016	A1
20160147987	Jang et al.	May 2016	A1
20160210407	Hwang et al.	Jul 2016	A1
20160217321	Gottleib	Jul 2016	A1
20160234204	Rishi et al.	Aug 2016	A1
20160314790	Tsujikawa et al.	Oct 2016	A1
20160324478	Goldstein	Nov 2016	A1
20160330198	Stern et al.	Nov 2016	A1
20160371555	Derakhshani	Dec 2016	A1
20170011406	Tunnell et al.	Jan 2017	A1
20170049335	Duddy	Feb 2017	A1
20170068805	Chandrasekharan et al.	Mar 2017	A1
20170078780	Qian et al.	Mar 2017	A1
20170110121	Warlord et al.	Apr 2017	A1
20170112671	Goldstein	Apr 2017	A1
20170116995	Ady et al.	Apr 2017	A1
20170161482	Elton et al.	Jun 2017	A1
20170169828	Sachdev	Jun 2017	A1
20170200451	Booklet et al.	Jul 2017	A1
20170213268	Puehse et al.	Jul 2017	A1
20170214687	Klein et al.	Jul 2017	A1
20170231534	Agassy et al.	Aug 2017	A1
20170279815	Chung et al.	Sep 2017	A1
20170287490	Biswal et al.	Oct 2017	A1
20170323644	Kawato	Nov 2017	A1
20170347180	Petrank	Nov 2017	A1
20170347348	Masaki et al.	Nov 2017	A1
20170351487	Aviles-Casco Vaquero et al.	Dec 2017	A1
20180018974	Zass	Jan 2018	A1
20180032712	Oh et al.	Feb 2018	A1
20180039769	Saunders et al.	Feb 2018	A1
20180047393	Tian et al.	Feb 2018	A1
20180060557	Valenti et al.	Mar 2018	A1
20180096120	Boesen	Apr 2018	A1
20180107866	Li et al.	Apr 2018	A1
20180108225	Mappus et al.	Apr 2018	A1
20180113673	Sheynblat	Apr 2018	A1
20180121161	Ueno et al.	May 2018	A1
20180146370	Krishnaswamy et al.	May 2018	A1
20180174600	Chaudhuri et al.	Jun 2018	A1
20180176215	Perotti et al.	Jun 2018	A1
20180187969	Kim et al.	Jul 2018	A1
20180191501	Lindemann	Jul 2018	A1
20180232201	Holtmann	Aug 2018	A1
20180232511	Bakish	Aug 2018	A1
20180239955	Rodriguez et al.	Aug 2018	A1
20180240463	Perotti	Aug 2018	A1
20180254046	Khoury et al.	Sep 2018	A1
20180289354	Cvijanovic et al.	Oct 2018	A1
20180292523	Orenstein et al.	Oct 2018	A1
20180308487	Goel et al.	Oct 2018	A1
20180336716	Ramprashad et al.	Nov 2018	A1
20180336901	Masaki et al.	Nov 2018	A1
20180366124	Cilingir et al.	Dec 2018	A1
20180374487	Lesso	Dec 2018	A1
20190005963	Alonso et al.	Jan 2019	A1
20190005964	Alonso et al.	Jan 2019	A1
20190013033	Bhimanaik et al.	Jan 2019	A1
20190030452	Fassbender et al.	Jan 2019	A1
20190042871	Pogorelik	Feb 2019	A1
20190098003	Ota	Mar 2019	A1
20190114496	Lesso	Apr 2019	A1
20190114497	Lesso	Apr 2019	A1
20190115030	Lesso	Apr 2019	A1
20190115032	Lesso	Apr 2019	A1
20190115033	Lesso	Apr 2019	A1
20190115046	Lesso	Apr 2019	A1
20190147888	Lesso	May 2019	A1
20190149932	Lesso	May 2019	A1
20190197755	Vats	Jun 2019	A1
20190199935	Danielsen et al.	Jun 2019	A1
20190228778	Lesso	Jul 2019	A1
20190228779	Lesso	Jul 2019	A1
20190246075	Khadloya et al.	Aug 2019	A1
20190260731	Chandrasekharan et al.	Aug 2019	A1
20190294629	Wexler et al.	Sep 2019	A1
20190295554	Lesso	Sep 2019	A1
20190306594	Aumer et al.	Oct 2019	A1
20190311722	Caldwell	Oct 2019	A1
20190313014	Welbourne et al.	Oct 2019	A1
20190318035	Blanco et al.	Oct 2019	A1
20190356588	Shahraray et al.	Nov 2019	A1
20190371330	Lin et al.	Dec 2019	A1
20190373438	Amir et al.	Dec 2019	A1
20190392145	Komogortsev	Dec 2019	A1
20190394195	Chari et al.	Dec 2019	A1
20200035247	Boyadjiev et al.	Jan 2020	A1
20200204937	Lesso	Jun 2020	A1

Foreign Referenced Citations (39)

Number	Date	Country
2015202397	May 2015	AU
1937955	Mar 2007	CN
104956715	Sep 2015	CN
105185380	Dec 2015	CN
106297772	Jan 2017	CN
106531172	Mar 2017	CN
1205884	May 2002	EP
1600791	Nov 2005	EP
1701587	Sep 2006	EP
1928213	Jun 2008	EP
1965331	Sep 2008	EP
2660813	Nov 2013	EP
2704052	Mar 2014	EP
2860706	Apr 2015	EP
3016314	May 2016	EP
3156978	Apr 2017	EP
2375205	Nov 2002	GB
2493849	Feb 2013	GB
2499781	Sep 2013	GB
2515527	Dec 2014	GB
2541466	Feb 2017	GB
2551209	Dec 2017	GB
2003058190	Feb 2003	JP
2006010809	Jan 2006	JP
2010086328	Apr 2010	JP
9834216	Aug 1998	WO
02103680	Dec 2002	WO
2006054205	May 2006	WO
2007034371	Mar 2007	WO
2008113024	Sep 2008	WO
2010066269	Jun 2010	WO
2013022930	Feb 2013	WO
2013154790	Oct 2013	WO
2014040124	Mar 2014	WO
2015117674	Aug 2015	WO
2015163774	Oct 2015	WO
2016003299	Jan 2016	WO
2017055551	Apr 2017	WO
2017203484	Nov 2017	WO

Non-Patent Literature Citations (56)

Entry
Zhang et al. “DolphinAttack: Inaudible Voice Commands”, Retrieved from Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. Aug. 2017. (Year: 2017).
Song, Liwei, and Prateek Mittal. “Poster: Inaudible voice commands.” Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. Aug. 2017. (Year: 2017).
Andrea Fortuna, [online], DolphinAttack: inaudible voice commands allows attackers to control Siri, Alexa and other digital assistants, Sep. 2017. (Year: 2017).
Combined Search and Examination Report under Sections 17 and 18(3), UKIPO, Application No. GB1801874. dated Jul. 25, 2018.
International Search Report and Written Opinion of the International Searching Authority, International Application No. PCT/GB2019/050185, dated Apr. 2, 2019.
International Search Report and Written Opinion of the International Searching Authority, International Application No. PCT/GB2019/052302, dated Oct. 2, 2019.
Liu, Yuan et al., “Speaker verification with deep features”, Jul. 2014, in International Joint Conference on Neural Networks (IJCNN), pp. 747-753, IEEE.
International Search Report and Written Opinion of the International Searching Authority, International Application No. PCT/GB2018/051927, dated Sep. 25, 2018.
Combined Search and Examination Report under Sections 17 and 18(3), UKIPO, Application No. 1801530.5, dated Jul. 25, 2018.
International Search Report and Written Opinion of the International Searching Authority, International Application No. PCT/GB2018/051924, dated Sep. 26, 2018.
Combined Search and Examination Report under Sections 17 and 18(3), UKIPO, Application No. 1801526.3, dated Jul. 25, 2018.
International Search Report and Written Opinion of the International Searching Authority, International Application No. PCT/GB2018/051931, dated Sep. 27, 2018.
Combined Search and Examination Report under Sections 17 and 18(3), UKIPO, Application No. 1801527.1, dated Jul. 25, 2018.
International Search Report and Written Opinion of the International Searching Authority, International Application No. PCT/GB2018/051925, dated Sep. 26, 2018.
Combined Search and Examination Report under Sections 17 and 18(3), UKIPO, Application No. 1801528.9, dated Jul. 25, 2018.
International Search Report and Written Opinion of the International Searching Authority, International Application No. PCT/GB2018/051928, dated Dec. 3, 2018.
Combined Search and Examination Report under Sections 17 and 18(3), UKIPO, Application No. 1801532.1, dated Jul. 25, 2018.
International Search Report and Written Opinion of the International Searching Authority, International Application No. PCT/GB2018/053274, dated Jan. 24, 2019.
Beigi, Homayoon, “Fundamentals of Speaker Recognition,” Chapters 8-10, ISBN: 978-0-378-77592-0; 2011.
Li, Lantian et al., “A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification”, INTERSPEECH 2017, Jan. 1, 2017, pp. 92-96.
Li, Zhi et al., “Compensation of Hysteresis Nonlinearity in Magnetostrictive Actuators with Inverse Multiplicative Structure for Preisach Model”, IEE Transactions on Automation Science and Engineering, vol. 11, No. 2, Apr. 1, 2014, pp. 613-619.
Partial International Search Report of the International Searching Authority, International Application No. PCT/GB2018/052905, dated Jan. 25, 2019.
Combined Search and Examination Report, UKIPO, Application No. GB1713699.5, dated Feb. 21, 2018.
Combined Search and Examination Report, UKIPO, Application No. GB1713695.3, dated Feb. 19, 2018.
Zhang et al., An Investigation of Deep-Learing Frameworks for Speaker Verification Antispoofing—IEEE Journal of Selected Topics in Signal Processes, Jun. 1, 2017.
Combined Search and Examination Report under Sections 17 and 18(3), UKIPO, Application No. GB1809474.8, dated Jul. 23, 2018.
Wu et al., Anti-Spoofing for text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Issue Date: Apr. 2016.
International Search Report and Written Opinion of the International Searching Authority, International Application No. PCT/GB2018/051760, dated Aug. 3, 2018.
International Search Report and Written Opinion of the International Searching Authority, International Application No. PCT/GB2018/051787, dated Aug. 16, 2018.
International Search Report and Written Opinion of the International Searching Authority, International Application No. PCT/GB2018/052907, dated Jan. 15, 2019.
Ajmera, et al., “Robust Speaker Change Detection,” IEEE Signal Processing Letters, vol. 11, No. 8, pp. 649-651, Aug. 2004.
Combined Search and Examination Report under Sections 17 and 18(3), UKIPO, Application No. GB1719731.0, dated May 16, 2018.
Further Search Report under Sections 17 (6), UKIPO, Application No. GB1719731.0, dated Nov. 26, 2018.
Combined Search and Examination Report under Sections 17 and 18(3), UKIPO, Application No. GB1801663.4, dated Jul. 18, 2018.
Combined Search and Examination Report under Sections 17 and 18(3), UKIPO, Application No. GB1801659.2, dated Jul. 26, 2018.
Combined Search and Examination Report under Sections 17 and 18(3), UKIPO, Application No. GB1801661.8, dated Jul. 30, 2018.
Combined Search and Examination Report under Sections 17 and 18(3), UKIPO, Application No. GB1801684.2, dated Aug. 1, 2018.
Combined Search and Examination Report under Sections 17 and 18(3), UKIPO, Application No. GB1803570.9, dated Aug. 21, 2018.
Combined Search and Examination Report under Sections 17 and 18(3), UKIPO, Application No. GB1804841.9, dated Sep. 27, 2018.
International Search Report and Written Opinion of the International Searching Authority, International Application No. PCT/GB2018/052906, dated Jan. 14, 2019.
Villalba, Jesus et al., Preventing Replay Attacks on Speaker Verification Systems, International Carnahan Conference on Security Technology (ICCST), 2011 IEEE, Oct. 18, 2011, pp. 1-8.
Combined Search and Examination Report, UKIPO, Application No. GB1713697.9, dated Feb. 20, 2018.
Chen et al., “You Can Hear But You Cannot Steal: Defending Against Voice Impersonation Attacks on Smartphones”, Proceedings of the International Conference on Distributed Computing Systems, PD: 20170605.
International Search Report and Written Opinion of the International Searching Authority, International Application No. PCT/GB2018/051765, dated Aug. 16, 2018.
Ohtsuka, Takahiro and Kasuya, Hideki, Robust ARX Speech Analysis Method Taking Voice Source Pulse Train Into Account, Journal of the Acoustical Society of Japan, 58, 7, pp. 386-397, 2002.
Wikipedia, Voice (phonetics), https://en.wikipedia.org/wiki/Voice_(phonetics), accessed Jun. 1, 2020.
Zhang et al., DolphinAttack: Inaudible Voice Commands, Retrieved from Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Aug. 2017.
Song, Liwei, and Prateek Mittal, Poster: Inaudible Voice Commands, Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Aug. 2017.
Fortuna, Andrea, [Online], DolphinAttack: inaudiable voice commands allow attackers to control Siri, Alexa and other digital assistants, Sep. 2017.
Lucas, Jim, What is Electromagnetic Radiation?, Mar. 13, 2015, Live Science, https://www.livescience.com/38169-ectromagnetism.html, pp. 1-11 (Year 2015).
Brownlee, Jason, A Gentle Introduction to Autocorrelation and Partial Autocorrelation, Feb. 6, 2017, https://machinelearningmastery.com/gentle-introduction-autocorrelation-partial-autocorrelationi, accessed Apr. 28, 2020.
First Office Action, China National Intellectual Property Administration, Patent Application No. 2018800418983, dated May 29, 2020.
International Search Report and Written Opinion, International Application No. PCT/GB2020/050723, dated Jun. 16, 2020.
Liu, Yuxi et al., “Earprint: Transient Evoked Otoacoustic Emission for Biometrics”, IEEE Transactions on Information Forensics and Security, IEEE, Piscataway, NJ, US, vol. 9, No. 12, Dec. 1, 2014, pp. 2291-2301.
Seha, Sherif Nagib Abbas et al., “Human recognition using transient auditory evoked potentials: a preliminary study”, IET Biometrics, IEEE, Michael Faraday House, Six Hills Way, Stevenage, Herts., UK, vol. 7, No. 3, May 1, 2018, pp. 242-250.
Liu, Yuxi et al., “Biometric identification based on Transient Evoked Otoacoustic Emission”, IEEE International Symposium on Signal Processing and Information Technology, IEEE, Dec. 12, 2013, pp. 267-271.

Related Publications (1)

	Number	Date	Country
	20190115046 A1	Apr 2019	US

Provisional Applications (1)

	Number	Date	Country
	62571944	Oct 2017	US

Robustness of speech processing system against ultrasound and dolphin attacks

Information

Patent Number

Date Filed

Date Issued

Inventors

Original Assignees

Examiners

Agents

CPC

Field of Search

CPC

International Classifications

Term Extension

Abstract

Description

Claims

Priority Claims (1)

US Referenced Citations (202)

Foreign Referenced Citations (39)

Non-Patent Literature Citations (56)

Related Publications (1)

Provisional Applications (1)