When using electronic devices, such as computing devices, users may play audio data through output devices such as speakers, earphones, or headphones. Such audio data may comprise different types of sound, for instance sounds within the human hearing range, sounds inaudible to the human ear, soft sounds, loud sounds, noise, and music, amongst others. The sources of the audio data may be, for instance, a readable memory belonging to the electronic device, an external readable memory connected to the electronic device, or a remote location accessible through the Internet.
Features of the present disclosure are illustrated by way of example and are not limited in the following figure(s), in which like numerals indicate like elements.
For simplicity and illustrative purposes, the present disclosure is described by referring mainly to examples. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be readily apparent, however, that the present disclosure may be practiced without limitation to these specific details. In other instances, some methods and structures have not been described in detail so as not to unnecessarily obscure the present disclosure.
Throughout the present disclosure, the terms “a” and “an” are intended to denote at least one of a particular element. As used herein, the term “includes” means includes but not limited to, and the term “including” means including but not limited to. The term “based on” means based at least in part on.
Electronic devices may be used to reproduce audio data received from input devices. Such input devices may be within the electronic device, for instance a memory of the electronic device, or may be remote to the electronic device. Examples of remote input devices may be an external electronic device connected to the electronic device, a remote location accessible through a network such as the Internet, or microphones of an external electronic device locally connected to the electronic device and/or connected via a network. In order to play the sound associated with the audio data, the electronic devices comprise output devices. In the same way as the input devices, the output devices may belong to the electronic devices, for instance a speaker of the electronic device, or may be an external output device connected to the electronic device, for instance earphones, headphones, or external speakers. The selection of a specific type of output device may depend on the preferences of the user or other factors, such as the availability of the device(s). Hence, when multiple output devices are available, the user may select one of them at their discretion.
Throughout this description, the term “electronic device” refers generally to electronic devices that are to receive audio data and to transmit the audio data to an output device in order to reproduce it. Examples of electronic devices comprise displays, desktop computers, all-in-one computers, portable computers, printers, smartphones, tablets, and additive manufacturing machines (3D printers), amongst others.
When selecting an output device, users may take into account aspects such as where they are using the electronic device, the presence of people or additional electronic devices near the electronic device, or the applications running on their own electronic device. In some cases, the electronic devices located near the output device, or the electronic device itself, may comprise personal assistant application(s) that are invoked by the usage of a keyword. Therefore, if the audio data received by the input device and subsequently played through the output device contains that keyword, the keyword may activate or wake up the personal assistant application of the user's own electronic device or of electronic devices located near the output device.
Since most electronic devices such as computers and smartphones have personal assistant applications invoked by a keyword, the usage of output devices that reproduce the audio data carries the implicit risk of invoking third-party applications in the user's electronic device or in electronic devices located near the output device.
Examples of keywords used to invoke personal assistant applications may be “Ok Google”, “Alexa”, “Hey, Cortana”, and “Hey, Siri”, amongst others. Hence, if the audio data received by the electronic device contains at least one of these keywords, a personal assistant application(s) associated with the keyword(s) may be triggered if the output device selected for the electronic device enables the personal assistant application(s) to hear such keyword(s). Because users are usually not aware of the content of the sound data beforehand, this scenario is unpredictable for them. For instance, when users are attending a conference call, the speaker may pronounce one of the keywords. Subsequently, the listeners will receive in their electronic devices audio data that, when reproduced on their output device(s), may trigger an action from any personal assistant application near the user(s) that is within hearing range of the keyword, if any.
In order to reduce the risk of unexpectedly triggering a personal assistant application, users may turn off all the personal assistant applications of the electronic device and of the electronic devices located nearby when an output device of the electronic device is to reproduce sounds associated with sound data. However, even though in some scenarios users will be able to turn off (or temporarily block) all the personal assistant applications, this approach is time-consuming for users. Further, users may desire to intentionally utilize such personal assistant applications while also utilizing the audio output device. An alternative approach may be to use earphones or headphones as output devices instead of speakers in order to avoid other electronic devices hearing the sounds. However, in some cases the usage of speakers is inevitable.
In order to improve the transmission of audio data in an electronic device by reducing the risk of triggering personal assistant applications, methods to correct the audio data may be used. Similarly, systems may be used to reduce the risk of invoking or waking up personal assistant applications in the electronic device or in electronic devices located nearby.
According to some examples, personal assistant applications associate keywords with acoustic patterns. Therefore, even though specific keywords are not strictly pronounced in the sound outputted by the output devices, the personal assistant applications may be woken up or invoked. In an example, an acoustic pattern comprises a frequency pattern and an amplitude pattern within a time frame (or cadence time frame). Hence, if a sound matches the frequency and amplitude patterns within the cadence time frame, the personal assistant applications will identify the sound as a keyword and an action will be triggered. In some cases, each of the frequency pattern, the amplitude pattern, and the cadence time frame comprises a tolerance range, i.e., there are multiple values of frequency, amplitude, and cadence time frame that indicate the presence of a keyword.
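As a purely illustrative sketch (not part of the examples above), the following Python fragment shows one way such an acoustic pattern, i.e., a frequency pattern and an amplitude pattern with tolerance ranges within a cadence time frame, might be represented and matched; the class name AcousticPattern, the per-step representation, and the tolerance values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AcousticPattern:
    """Hypothetical representation of a keyword's acoustic pattern."""
    freq_hz: list[float]        # expected frequency per analysis step
    amp: list[float]            # expected amplitude per analysis step
    cadence_s: float            # nominal duration of the keyword
    freq_tol: float = 0.10      # +/-10 % tolerance on frequency
    amp_tol: float = 0.15       # +/-15 % tolerance on amplitude
    cadence_tol: float = 0.20   # +/-20 % tolerance on duration

    def matches(self, freq_hz, amp, duration_s) -> bool:
        """Return True when an observed segment stays inside every tolerance band."""
        if abs(duration_s - self.cadence_s) > self.cadence_tol * self.cadence_s:
            return False
        if len(freq_hz) != len(self.freq_hz):
            return False
        for f_obs, f_ref, a_obs, a_ref in zip(freq_hz, self.freq_hz, amp, self.amp):
            if abs(f_obs - f_ref) > self.freq_tol * f_ref:
                return False
            if abs(a_obs - a_ref) > self.amp_tol * a_ref:
                return False
        return True

# Example: a made-up two-step pattern and an observation that falls inside every band.
pattern = AcousticPattern(freq_hz=[400.0, 650.0], amp=[0.4, 0.7], cadence_s=1.2)
print(pattern.matches([410.0, 630.0], [0.42, 0.68], 1.1))  # True
```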
According to other examples, personal assistant applications may be invoked by sounds which are inaudible to human hearing. Based on the non-linearity of the microphones used by the electronic devices containing the personal assistant applications, a third party may send to the electronic device of the user sounds which are inaudible to the user but within the hearing range of the personal assistant application. In some examples, these sounds may be embedded within audio or video segments.
Referring now to FIG. 1, a method 100 of transmitting audio data to an output device is described.
At block 110, method 100 comprises receiving a first audio stream. The first audio stream may be received, for instance, from an input device. In an example, an electronic device receives the first audio stream through the input device. The first audio stream represents sound data to be outputted by an output device of the electronic device. In an example, the output device may be a speaker. At block 120, method 100 comprises detecting the presence of at least one acoustic pattern within the first audio stream. The acoustic pattern may be detected, for instance, by using a data-processing system to determine a portion of data including an acoustic pattern. Since different keywords may be possible, block 120 comprises detecting an acoustic pattern of a set of acoustic patterns. Hence, method 100 compares portions of the first audio stream with the patterns that would launch (or invoke) a personal assistant application. At block 130, method 100 comprises executing at least one corrective action over a portion of data of the first audio stream including the acoustic pattern such that a second audio stream is obtained. By applying at least one corrective action, the first audio stream is modified so that the second audio stream compensates for the presence of the acoustic pattern. In an example, the corrective actions comprise jamming the portion of data of the first audio stream including the acoustic pattern, omitting the portion of data of the first audio stream including the acoustic pattern, and applying an audio scrambler over the portion of the first audio stream including the acoustic pattern. At block 140, method 100 comprises transmitting the second audio stream to an output device. Since the second audio stream does not contain acoustic pattern(s) associated with the keyword(s), or compensates for the presence of the acoustic pattern(s), playing the second audio stream with the output device will not unexpectedly trigger personal assistant applications.
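The four blocks of method 100 can be sketched as a small pipeline. The fragment below is a minimal, hypothetical illustration of blocks 110-140, assuming the audio stream is a list of samples and that detection and correction are supplied as callables; none of the names (transmit_audio, detect, correct, play) come from the disclosure.

```python
from typing import Callable, Iterable, List, Tuple

Segment = Tuple[int, int]  # start/end sample indices of a detected acoustic pattern

def transmit_audio(first_stream: List[float],
                   detect: Callable[[List[float]], Iterable[Segment]],
                   correct: Callable[[List[float], Segment], List[float]],
                   play: Callable[[List[float]], None]) -> List[float]:
    """Sketch of blocks 110-140: receive, detect, correct, transmit."""
    second_stream = list(first_stream)                    # block 110: stream already received
    for segment in detect(second_stream):                 # block 120: detect acoustic patterns
        second_stream = correct(second_stream, segment)   # block 130: apply a corrective action
    play(second_stream)                                   # block 140: transmit to output device
    return second_stream

# Toy usage: "detect" flags samples 2-4, "correct" mutes them (one possible corrective action).
stream = [0.1, 0.2, 0.9, 0.9, 0.9, 0.2]
transmit_audio(stream,
               detect=lambda s: [(2, 5)],
               correct=lambda s, seg: s[:seg[0]] + [0.0] * (seg[1] - seg[0]) + s[seg[1]:],
               play=print)
```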
As used herein, the term “jamming” refers to a modification of the energy levels of a portion of sound data in order to change the pressure levels generated when the portion of sound data is outputted by an output device.
As used herein, the term “audio scrambling” refers to the modification of a portion of sound data by adding additional audio data such that the resulting portion of sound data is distorted.
In some examples, method 100 may further comprise applying a filter over the second audio stream, wherein the filter comprises filtering out frequencies that are outside of a frequency range and filtering out energy levels that are outside an energy range. Hence, if method 100 includes frequency filters and energy level filters, the inadvertent usage of inaudible sounds to launch, invoke, or execute personal assistant applications in listening devices is prevented.
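A minimal sketch of such a filter, assuming SciPy is available and using illustrative cut-off frequencies (100 Hz-8 kHz) and an illustrative maximum level, might look as follows; the function name and parameter values are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def filter_second_stream(samples: np.ndarray, fs: int,
                         f_low: float = 100.0, f_high: float = 8000.0,
                         max_level: float = 0.9) -> np.ndarray:
    """Keep only content inside an assumed audible frequency range and clamp energy levels."""
    # Band-pass: drop frequencies outside the allowed range.
    sos = butter(4, [f_low, f_high], btype="band", fs=fs, output="sos")
    audible = sosfiltfilt(sos, samples)
    # Level filter: limit sample amplitude so energy stays inside the allowed range.
    return np.clip(audible, -max_level, max_level)

# Toy usage: a one-second stream mixing an audible 440 Hz tone with a 20 kHz component.
fs = 48000
t = np.arange(fs) / fs
stream = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 20000 * t)
print(filter_second_stream(stream, fs).shape)
```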
Referring now to FIG. 2, a flowchart 200 representing the correction of an audio stream before transmission to an output device is described.
The flowchart 200, at block 210, represents the receipt of an audio stream. The audio stream may be received, for instance, from an input device. Upon the audio stream being received, block 220 determines whether or not data within the audio stream fulfills or matches an acoustic pattern of a set of acoustic patterns 225. For instance, in the example of FIG. 2, a portion of data 231a of the audio stream is determined to match an acoustic pattern 226 of the set of acoustic patterns 225.
Then, at block 230, a corrective action 232 is executed over a portion of data 231a satisfying the acoustic pattern 226. As indicated by the arrow between the acoustic pattern 226 and the corrective action 232, the corrective action 232 may be selected based on the acoustic pattern 226. However, in other examples, the corrective action 232 may be selected based on the preferences of the user. Upon the corrective action 232 being executed over the portion of data 231a, a corrected portion of data 231b is obtained. The corrected portion of data 231b, which compensates for, or no longer contains, the acoustic pattern 226, is subsequently inserted into the audio stream in order to replace the portion of data 231a, thereby providing a different audio stream with respect to the audio stream received from the input device, i.e., a corrected audio stream. At block 240, the corrected audio stream is transmitted to an output device. Since the portion of data 231a is no longer included in the audio stream, users may reproduce the corrected audio stream through any kind of output device without waking up or invoking personal assistant applications in either their own electronic device or electronic devices located nearby.
In some examples, the corrective action may be selected based on users' preferences. Hence, if users aim to omit from the audio stream the acoustic patterns determined at block 220, the corrective action may comprise modifying the audio stream received at block 210 to omit the portion of data 231a. In that case, the corrected portion of data 231b may include a sound wave having low energy levels, such as a sound wave having a null amplitude. In other examples, the corrective action 232 may comprise replacing the portion of data 231a with a pre-defined acoustic signal, such as a beep signal.
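A minimal sketch of these two user-selectable corrective actions, omission by a null-amplitude wave and replacement by a beep signal, might look as follows, assuming NumPy arrays of samples; the function names, sample rate, and beep frequency are illustrative.

```python
import numpy as np

def mute_portion(stream: np.ndarray, start: int, end: int) -> np.ndarray:
    """Corrective action: replace the flagged portion with a null-amplitude wave."""
    corrected = stream.copy()
    corrected[start:end] = 0.0
    return corrected

def beep_portion(stream: np.ndarray, start: int, end: int,
                 fs: int = 48000, beep_hz: float = 1000.0) -> np.ndarray:
    """Corrective action: replace the flagged portion with a pre-defined beep signal."""
    corrected = stream.copy()
    t = np.arange(end - start) / fs
    corrected[start:end] = 0.3 * np.sin(2 * np.pi * beep_hz * t)
    return corrected

# Toy usage over a random one-second stream, correcting samples 10000-20000.
stream = np.random.default_rng(0).uniform(-0.5, 0.5, 48000)
print(mute_portion(stream, 10000, 20000)[15000])                      # 0.0
print(np.abs(beep_portion(stream, 10000, 20000)[10000:20000]).max())  # ~0.3
```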
In some other examples, users may wish to minimize the effects of the modifications on the audio stream. Hence, even though users aim to remove the acoustic patterns from the audio stream, they may be interested in keeping the keyword(s) associated with such acoustic patterns in the corrected audio stream in a form that the personal assistant application cannot detect. In that case, instead of omitting the portion of data 231a, the corrective action 232 may comprise modifying specific characteristics of the portion of data 231a so that it is not identified as an acoustic pattern, but the keyword(s) is still audible and/or recognizable by users. Characteristics that can be modified to obtain keywords that do not satisfy the acoustic pattern but are still recognizable by users include the frequency, the energy levels of the audio data, the time over which the sound data is reproduced, or a combination thereof. In some examples, the corrective action may comprise partially modifying the portion of data 231a instead of modifying the whole portion of data 231a.
According to some examples, an acoustic pattern associated with a keyword comprises parameters defining the sound of the keyword. The acoustic pattern, when outputted by an output device, may be identified by personal assistant applications as the keyword. Since sound travels in compression waves made up of areas of increased pressure called compressions and areas of decreased pressure called rarefactions, sounds can be represented as a series of physical parameters such as frequency and amplitude. The amplitude of a sound indicates the amount of energy that the wave carries. As the energy increases, the intensity and volume of the sound increases. The frequency of a sound indicates the number of wavelengths within a unit of time, a wavelength being the distance between two crests or two troughs. Hence, since keywords can be characterized by these physical parameters, electronic devices are capable of determining the presence of a keyword by identifying the presence of these patterns corresponding to such keyword during a time frame, or cadence time. For instance, in the examples of FIGS. 3 and 4, a keyword is characterized by a pattern wave 310 having frequency and amplitude patterns within a time frame 313.
Referring now to FIG. 3, a set of characteristics 300 associated with a keyword is described. In the example of FIG. 3, the set of characteristics 300 comprises a pattern wave 310 having a frequency pattern and an amplitude pattern within a time frame 313.
According to other examples, multiple pattern waves may be possible for the same keyword. Hence, different amplitude values and/or frequencies may be associated with the same keyword. In some other examples, the pattern waves comprise ranges for the frequency and/or the amplitude. Hence, when determining if a portion of data comprises a pattern, ranges for the amplitude and/or the frequency may be used. Similarly, the time frame of the pattern wave may have multiple possible values (for instance a range of values from 1 second to 2 seconds).
Referring now to FIG. 4, a series of charts 400 representing corrective actions executed over the pattern wave 310 is described. The upper left chart of the series of charts 400 represents the pattern wave 310 within the time frame 313.
The series of charts 400 further comprises a first corrective action represented on the upper right chart. The corrective action comprises modifying the frequency values such that the frequency pattern of the corrected portion of data does not match the frequency pattern of the pattern wave 310. In order to modify the frequency pattern within the time frame 313, the amplitude values of the pattern wave 310 are maintained but the frequency is increased, thereby resulting in a first corrected wave 410. The first corrected wave 410 takes a corrected time 411 for a full cycle, i.e., a corrected frequency of one divided by the corrected time 411. As a result, the personal assistant applications won't recognize the keyword associated with the pattern wave 310 because the pattern wave 310 has been replaced by the first corrected wave 410. In some examples, the frequency is partially modified during a portion of the time frame 313 instead of modifying the pattern wave 310 during the entire time frame 313. In other examples, the frequency may be decreased instead of increased. In some other examples, the frequency is increased only as far as the first corrected wave 410 remains audible to human hearing. In further examples, the pattern wave 310 experiences both increases and decreases of frequency, as long as the resulting wave does not match the acoustic pattern.
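One crude way to illustrate the first corrective action is to resample the flagged portion so that its frequency content rises while the amplitude values are kept; the sketch below is hypothetical and, unlike a duration-preserving pitch shifter, also shortens the portion, so the freed samples are zeroed.

```python
import numpy as np

def raise_frequency(stream: np.ndarray, start: int, end: int, factor: float = 1.2) -> np.ndarray:
    """Crude corrective action: resample the flagged portion so its frequency content
    rises by `factor` (the segment also becomes shorter, so the remainder is zeroed)."""
    segment = stream[start:end]
    n_out = int(len(segment) / factor)
    # Linear-interpolation resampling: reading the segment faster raises its pitch.
    positions = np.linspace(0, len(segment) - 1, n_out)
    shifted = np.interp(positions, np.arange(len(segment)), segment)
    corrected = stream.copy()
    corrected[start:end] = 0.0
    corrected[start:start + n_out] = shifted
    return corrected

# Toy usage: a 440 Hz tone whose middle half is shifted towards roughly 528 Hz.
fs = 48000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
print(raise_frequency(tone, fs // 4, 3 * fs // 4).shape)
```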
The series of charts 400 further comprises a second corrective action represented on the bottom left chart. The corrective action comprises applying an audio scrambler to the pattern wave 310 such that a second corrected wave 420 is obtained. The second corrected wave 420, when outputted by an output device, will reproduce a sound that won't be detected as the keyword. Because the amplitude and the frequency of the pattern wave 310 will have changed, the personal assistant applications won't be capable of recognizing the keyword associated with the pattern wave 310. In the example represented in FIG. 4, the audio scrambler is applied over the entire time frame 313; in other examples, the audio scrambler may be applied over a portion of the time frame 313 only.
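A minimal, hypothetical sketch of an audio scrambler that distorts the flagged portion by mixing in additional (noise) audio data might look as follows; the noise level is illustrative.

```python
import numpy as np

def scramble_portion(stream: np.ndarray, start: int, end: int,
                     noise_level: float = 0.2, seed: int = 0) -> np.ndarray:
    """Corrective action sketch: mix noise into the flagged portion so its frequency
    and amplitude patterns no longer match the keyword's acoustic pattern."""
    rng = np.random.default_rng(seed)
    corrected = stream.copy()
    noise = rng.normal(0.0, noise_level, end - start)
    corrected[start:end] = np.clip(corrected[start:end] + noise, -1.0, 1.0)
    return corrected

# Toy usage: scrambling a silent portion leaves clearly non-zero samples behind.
silent = np.zeros(48000)
print(np.abs(scramble_portion(silent, 10000, 20000)[10000:20000]).max() > 0)  # True
```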
On the bottom right chart, a third corrective action is represented. The third corrective action comprises jamming the pattern wave 310 by modifying the amplitude values such that a third corrected wave 430 is obtained. The third corrected wave 430, when outputted by an output device, will reproduce a sound which won't be detected as the keyword because of the changes in the energy levels with respect to the acoustic pattern associated with the keyword. The modification of the energy levels of the audio data will generate different pressure levels when the audio data is outputted by an output device. Since the personal assistant applications comprise a range of pressure levels for the keyword, the third corrected wave, when outputted by the output device, won't be recognized as the keyword. In the same way as the first corrective action and the second corrective action, the third corrective action may be applied to a portion of the pattern wave 310. In other examples, instead of reducing the amplitude values, the third corrective action comprises increasing the amplitude. In further examples, both increases and decreases are performed over the pattern wave 310 as long as the acoustic pattern is not fulfilled by the resulting wave.
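Jamming, understood here as rescaling the energy of the flagged portion so the resulting pressure levels fall outside the expected range, can be sketched as follows; the gain value is illustrative, and a gain greater than one would boost the portion instead of attenuating it.

```python
import numpy as np

def jam_portion(stream: np.ndarray, start: int, end: int, gain: float = 0.2) -> np.ndarray:
    """Corrective action sketch: rescale the energy of the flagged portion so that the
    pressure levels produced at the output fall outside the pattern's expected range."""
    corrected = stream.copy()
    corrected[start:end] *= gain   # attenuate; gain > 1 would boost the portion instead
    return corrected

# Toy usage: samples 200-799 of a constant signal are attenuated to 0.2.
tone = np.ones(1000)
print(jam_portion(tone, 200, 800)[500])  # 0.2
```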
In some other examples, multiple corrective actions may be applied over the portion of data 231a having the pattern wave 310. Therefore, partial and/or total changes of frequency, partial and/or total changes of amplitude, and partial and/or total audio scrambling may be performed over the portion of data 231a. In other examples, different types of corrective actions may be used such as omitting the portion of data 231a from the audio stream, as previously explained in reference to other examples.
According to some examples, a pattern likelihood (or behavior likelihood) for each acoustic pattern may be determined based on a portion of data of an audio stream. The pattern likelihood may represent an accomplished portion of the acoustic pattern with respect to the complete acoustic pattern. In other words, the pattern likelihood monitors, based on a portion of data, whether an acoustic pattern is likely to be present. Hence, even though the set of characteristics associated with a keyword has not been completely found, the pattern likelihood may indicate how close a portion of data is to the whole acoustic pattern. Therefore, if one of the pattern likelihoods reaches a threshold value, a corrective action may be executed over the remaining data so that a personal assistant application won't recognize the keyword, because the acoustic pattern is not completely present in the audio outputted by the output device. In an example, a first likelihood is 66%, a second likelihood is 50%, and a third likelihood is 78%. If the threshold value is 75%, a corrective action may be triggered in order to modify the portion of data which will potentially contain the remaining 22% of the acoustic pattern associated with the third likelihood. If the threshold value is set at 65%, a corrective action may be triggered in order to modify the portion of data which will potentially contain the remaining 34% of the acoustic pattern associated with the first likelihood and the remaining 22% of the acoustic pattern associated with the third likelihood. In some examples, corrective actions may be selected based on the behavior likelihood that exceeds the threshold value.
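The threshold logic of this example can be expressed in a few lines; the sketch below simply reuses the likelihood values from the example above and is not part of the disclosure.

```python
def patterns_to_correct(likelihoods: dict[str, float], threshold: float) -> list[str]:
    """Return the patterns whose accomplished portion already reaches the threshold,
    i.e. those for which the remaining data should receive a corrective action."""
    return [name for name, value in likelihoods.items() if value >= threshold]

# Values taken from the example above.
likelihoods = {"first": 0.66, "second": 0.50, "third": 0.78}
print(patterns_to_correct(likelihoods, 0.75))  # ['third']
print(patterns_to_correct(likelihoods, 0.65))  # ['first', 'third']
```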
Referring now to FIG. 5, a non-transitory computer-readable medium 500 is described. The computer-readable medium 500 comprises instructions 510 to cause a system to receive an input signal representing input data, instructions 520 to determine presence of a pattern of a set of patterns within the input data during at least a time frame, instructions 530 to execute a corrective action over the input data, and instructions 540 to transmit the corrected input data to an output device.
In order to determine presence of patterns during at least a time frame (instructions 520), the system may compare audio data within the input signal with a set of characteristics associated with a specific pattern. If the set of characteristics matches the input data, the audio data is determined to contain a pattern associated with a keyword that may invoke a personal assistant application. In an example, the set of characteristics is the set of characteristics 300 previously described in reference to FIG. 3.
Upon determination of presence of a pattern associated with a keyword within the input data, a corrective action is executed over a portion of data including the pattern. Examples of corrective actions comprise scrambling a portion of the input data, modifying the frequency pattern of a portion of the input data, modifying the amplitude pattern of a portion of the input data, and omitting portions of audio data, amongst others. In other examples, multiple corrective actions may be executed over the portion of data. In some other examples, the corrective actions comprise the examples of first, second, and third corrective actions previously described in reference to FIG. 4.
In some examples, the computer-readable medium 500 comprises further instructions to cause the system to determine a pattern likelihood for each pattern of the set of patterns and execute a corrective action over the input data if one of the pattern likelihoods exceeds a threshold value. As described above, the pattern likelihood may represent an accomplished portion of the pattern, i.e., how much of the pattern has been found in the input data. Hence, if one of the pattern likelihoods exceeds a threshold value, the corrective action is executed over an expected remaining portion of the pattern. In other examples, the corrective action is executed over the portion of the input data that has contributed to the pattern likelihood, i.e., the portion having pattern(s) in common with a pattern of the set of patterns. In some other examples, different threshold values may be defined for different patterns.
In some other examples, executing a corrective action over the input signal to modify the input data during the time frame comprises one of: applying to the input data a filter to modify the behavior during the time frame, and omitting from the input data the time frame containing the pattern.
In further examples, the corrective action executed over the input data if one of the pattern likelihoods exceeds a threshold value is selected based on the pattern of the set of patterns for which the threshold value is exceeded, i.e., the corrective action is selected based on the keyword that could invoke a personal assistant application.
According to some examples, the computer-readable medium 500 comprises further instructions to cause the system to read from a memory the set of patterns, receive user input from a user interface, and modify the set of patterns based on the user input. Since the patterns that invoke a personal assistant application may change, a user is capable of providing an updated version of the set of patterns through the user interface. In other examples, the computer-readable medium 500 comprises instructions to cause the system to periodically check for an updated set of patterns through the Internet, and if any, replace the set of patterns with the updated version.
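A minimal, hypothetical sketch of maintaining the set of patterns, reading it from a local store, applying a user-supplied update, and replacing it with a version fetched over the Internet, might look as follows; the file name and update URL are placeholders, not part of the disclosure.

```python
import json
from pathlib import Path
from urllib.request import urlopen

PATTERNS_FILE = Path("acoustic_patterns.json")              # hypothetical local store
UPDATE_URL = "https://example.com/acoustic_patterns.json"   # hypothetical update location

def load_patterns() -> list[dict]:
    """Read the current set of patterns from memory (here, a local JSON file)."""
    return json.loads(PATTERNS_FILE.read_text()) if PATTERNS_FILE.exists() else []

def apply_user_update(patterns: list[dict], user_input: dict) -> list[dict]:
    """Replace or add the pattern supplied through the user interface."""
    updated = [p for p in patterns if p.get("keyword") != user_input.get("keyword")]
    updated.append(user_input)
    PATTERNS_FILE.write_text(json.dumps(updated))
    return updated

def check_for_remote_update() -> list[dict]:
    """Fetch an updated set of patterns (e.g., on a periodic schedule) and replace the stored one."""
    with urlopen(UPDATE_URL) as response:
        updated = json.loads(response.read())
    PATTERNS_FILE.write_text(json.dumps(updated))
    return updated
```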
Referring now to FIG. 6, an electronic system 600 is described. The electronic system 600 comprises a memory 620 storing a set of instructions 631 to cause the electronic system 600 to receive sound data from an external device, identify portions of the sound data having behavior patterns of a set of behavior patterns, execute a corrective action over the identified portions to obtain corrected sound data, and transmit the corrected sound data to an output device.
In some examples, memory 620 may comprise further instructions to cause the electronic system 600 to identify portions of the audio data that match a set of patterns of the set of behavior patterns. In an example, identifying portions of sound data having behavior patterns comprises comparing a first set of patterns of the portion of sound data with each reference set of patterns of each behavior pattern of the set of behavior patterns within a time frame to determine differences between patterns, and determining a behavior likelihood based on the differences. The reference set of patterns may be, for instance, the set of characteristics 300 previously explained in reference to FIG. 3.
In other examples, the first set of patterns comprises a frequency pattern and an amplitude pattern for the sound data received by the electronic system 600 from the external device, and each reference set of patterns comprises at least a reference frequency pattern, at least a reference amplitude pattern, and at least a cadence time frame. Since multiple combinations of frequency, amplitude, and cadence time are possible, a reference set of patterns may comprise different possibilities associated with the same keyword. In further examples, the set of instructions 631 may comprise further instructions to cause the electronic system 600 to apply a frequency filter and a sound energy level filter over the corrected sound data.
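A behavior likelihood based on the differences between an observed frequency/amplitude pattern and a reference pattern could be sketched as the fraction of analysis steps that fall within tolerance bands; the fragment below is a hypothetical illustration with made-up values and names.

```python
import numpy as np

def behavior_likelihood(obs_freq: np.ndarray, obs_amp: np.ndarray,
                        ref_freq: np.ndarray, ref_amp: np.ndarray,
                        freq_tol: float = 0.10, amp_tol: float = 0.15) -> float:
    """Fraction of analysis steps whose frequency and amplitude differences both fall
    within the tolerance bands of the reference pattern; 1.0 means a complete match."""
    freq_ok = np.abs(obs_freq - ref_freq) <= freq_tol * ref_freq
    amp_ok = np.abs(obs_amp - ref_amp) <= amp_tol * ref_amp
    return float(np.mean(freq_ok & amp_ok))

# Toy usage: three of four steps match the reference, so the likelihood is 0.75.
ref_f, ref_a = np.array([400.0, 500.0, 650.0, 700.0]), np.array([0.4, 0.5, 0.7, 0.6])
obs_f, obs_a = np.array([405.0, 510.0, 640.0, 900.0]), np.array([0.41, 0.52, 0.69, 0.6])
print(behavior_likelihood(obs_f, obs_a, ref_f, ref_a))  # 0.75
```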
In some other examples, the set of instructions 631 of the electronic system 600 corresponds to the instructions 510, 520, 530, and 540 previously explained in reference to FIG. 5.
According to other examples, the corrective actions that may be executed over the portion of data including the behavior pattern comprise jamming the portion of data of the first audio stream including the acoustic pattern, omitting the portion of data of the first audio stream including the acoustic pattern, and applying an audio scrambler over the portion of the first audio stream including the acoustic pattern, as previously explained in reference to FIG. 4.
What has been described and illustrated herein are examples of the disclosure along with some variations. The terms, descriptions, and figures used herein are set forth by way of illustration only and are not meant as limitations. Many variations are possible within the scope of the disclosure, which is intended to be defined by the following claims (and their equivalents) in which all terms are meant in their broadest reasonable sense unless otherwise indicated.