Humans can engage in human-to-computer dialogs with interactive software applications referred to herein as “automated assistants” (also referred to as “digital agents,” “interactive personal assistants,” “intelligent personal assistants,” “assistant applications,” “conversational agents,” etc.). For example, humans (who, when they interact with automated assistants, may be referred to as “users”) may provide commands and/or requests to an automated assistant using spoken natural language input (i.e., utterances), which may in some cases be converted into text and then processed, by providing textual (e.g., typed) natural language input, and/or through touch and/or utterance-free physical movement(s) (e.g., hand gesture(s), eye gaze, facial movement, etc.). An automated assistant responds to a request by providing responsive user interface output (e.g., audible and/or visual user interface output), controlling one or more smart devices, and/or controlling one or more function(s) of a device implementing the automated assistant (e.g., controlling other application(s) of the device).
As mentioned above, many automated assistants are configured to be interacted with via spoken utterances. To preserve user privacy and/or to conserve resources, automated assistants refrain from performing one or more automated assistant functions based on all spoken utterances that are present in audio data detected via microphone(s) of a client device that implements (at least in part) the automated assistant. Rather, certain processing based on spoken utterances occurs only in response to determining certain condition(s) are present.
For example, many client devices that include and/or interface with an automated assistant include a hotword detection model. When microphone(s) of such a client device are not deactivated, the client device can continuously process audio data detected via the microphone(s), using the hotword detection model, to generate predicted output that indicates whether one or more hotwords (inclusive of multi-word phrases) are present, such as “Hey Assistant”, “OK Assistant”, and/or “Assistant”. When the predicted output indicates that a hotword is present, any audio data that follows within a threshold amount of time (and optionally that is determined to include voice activity) can be processed by one or more on-device and/or remote automated assistant components such as speech recognition component(s), voice activity detection component(s), etc. Further, recognized text (from the speech recognition component(s)) can be processed using natural language understanding engine(s) and/or action(s) can be performed based on the natural language understanding engine output. The action(s) can include, for example, generating and providing a response and/or controlling one or more application(s) and/or smart device(s). However, when the predicted output indicates that a hotword is not present, corresponding audio data will be discarded without any further processing, thereby conserving resources and preserving user privacy.
An environment (e.g., a room, home, office, etc.) may include two or more client devices capable of far-field speech recognition, each executing an automated assistant. In such an environment, two or more automated assistants may be triggered (i.e., activated) in response to a spoken utterance of a user that includes a hotword. This activation of two or more automated assistants in response to the spoken utterance may be undesirable as it may not have been intended by the user and may result in unnecessary utilization of processing and/or power (e.g., battery) resources, etc.
Some implementations disclosed herein are directed to multi-device arbitration using pairwise range multilateration between earbuds and a target device. As described in more detail herein, ultra-wideband (UWB) antennas on each earbud in a pair of earbuds may be used to perform a two-range multilateration on a target device that is also equipped with UWB. Based on a spherical threshold on the multilateration result, the target device may be localized down to sub-meter accuracy, and thus multi-device arbitration may be accurately performed. In some implementations, by identifying a target device using pairwise range multilateration, unintended activations of automated assistants may be minimized, and excess consumption of processing and/or power (e.g., battery) resources may be reduced or avoided. Additionally, in some implementations, the target device, once identified, may perform automated speech recognition on an input audio signal, prior to identifying a hotword in the input audio signal.
In various implementations, a method implemented by one or more processors may include: transmitting, by an ultra-wideband transceiver on a first user-worn device, to an ultra-wideband transceiver on a candidate device, a first impulse signal; receiving, by the ultra-wideband transceiver on the first user-worn device, from the ultra-wideband transceiver on the candidate device, a second impulse signal, the second impulse signal being sent as a response to the first impulse signal; determining a physical distance between the first user-worn device and the candidate device based on an elapsed time between transmitting the first impulse signal and receiving the second impulse signal; transmitting, by an ultra-wideband transceiver on a second user-worn device, to the ultra-wideband transceiver on the candidate device, a third impulse signal; receiving, by the ultra-wideband transceiver on the second user-worn device, from the ultra-wideband transceiver on the candidate device, a fourth impulse signal, the fourth impulse signal being sent as a response to the third impulse signal; determining a physical distance between the second user-worn device and the candidate device based on an elapsed time between transmitting the third impulse signal and receiving the fourth impulse signal; and determining, based on (i) the physical distance between the first user-worn device and the candidate device and (ii) the physical distance between the second user-worn device and the candidate device, that the candidate device is a query target.
In some implementations, determining that the candidate device is the query target may include: determining a first absolute value, the first absolute value being an absolute value of a difference between (i) the physical distance between the first user-worn device and the candidate device and (ii) the physical distance between the second user-worn device and the candidate device; and determining that the first absolute value satisfies a first threshold. In some implementations, determining that the candidate device is the query target may further include: determining a second absolute value, the second absolute value being an absolute value of a difference between a ground truth device distance and an average of (i) the physical distance between the first user-worn device and the candidate device and (ii) the physical distance between the second user-worn device and the candidate device; and determining that the second absolute value satisfies a second threshold.
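By way of illustration only, the two checks described above may be expressed as in the following Python sketch; the function name, the reading of “satisfies” as “is at most the threshold”, and the default threshold values are assumptions made for the example rather than values taken from this disclosure.

```python
# A minimal sketch of the two-threshold check described above. Names and
# default threshold values are illustrative assumptions.
def is_query_target(d1_m: float, d2_m: float, ground_truth_m: float,
                    first_threshold_m: float = 0.2,
                    second_threshold_m: float = 0.5) -> bool:
    """Return True if a candidate device should be treated as the query target.

    d1_m: physical distance between the first user-worn device and the candidate.
    d2_m: physical distance between the second user-worn device and the candidate.
    ground_truth_m: calibrated device distance (e.g., measured during setup).
    """
    # First check: the two ranges should be nearly equal, so the candidate lies
    # roughly equidistant from the two user-worn devices.
    first_abs = abs(d1_m - d2_m)
    if first_abs > first_threshold_m:
        return False
    # Second check: the average range should be close to the calibrated
    # ground-truth device distance.
    second_abs = abs((d1_m + d2_m) / 2.0 - ground_truth_m)
    return second_abs <= second_threshold_m
```

In this sketch, a candidate that is roughly equidistant from the two user-worn devices and near the calibrated distance would be selected as the query target.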
In some implementations, the first user-worn device is a first earbud, and the second user-worn device is a second earbud. In some implementations, determining the physical distance between the first user-worn device and the candidate device is further based on a predetermined time in which the candidate device prepares the response to the first impulse signal; and determining the physical distance between the second user-worn device and the candidate device is further based on a predetermined time in which the candidate device prepares the response to the third impulse signal.
In some implementations, the method may further include, in response to determining that the candidate device is the query target, causing the candidate device to activate an automated assistant function. In some implementations, the method may further include causing the candidate device to perform automated speech recognition on an input audio signal, prior to identifying a hotword in the input audio signal.
In some additional or alternative implementations, a computer program product may include one or more computer-readable storage media having program instructions collectively stored on the one or more computer-readable storage media. The program instructions may be executable to: transmit, by an ultra-wideband transceiver on a first user-worn device, to an ultra-wideband transceiver on a candidate device, a first impulse signal; receive, by the ultra-wideband transceiver on the first user-worn device, from the ultra-wideband transceiver on the candidate device, a second impulse signal, the second impulse signal being sent as a response to the first impulse signal; determine a physical distance between the first user-worn device and the candidate device based on an elapsed time between transmitting the first impulse signal and receiving the second impulse signal; transmit, by an ultra-wideband transceiver on a second user-worn device, to the ultra-wideband transceiver on the candidate device, a third impulse signal; receive, by the ultra-wideband transceiver on the second user-worn device, from the ultra-wideband transceiver on the candidate device, a fourth impulse signal, the fourth impulse signal being sent as a response to the third impulse signal; determine a physical distance between the second user-worn device and the candidate device based on an elapsed time between transmitting the third impulse signal and receiving the fourth impulse signal; and determine, based on (i) the physical distance between the first user-worn device and the candidate device and (ii) the physical distance between the second user-worn device and the candidate device, that the candidate device is a query target.
In some additional or alternative implementations, a system may include a processor, a computer-readable memory, one or more computer-readable storage media, and program instructions collectively stored on the one or more computer-readable storage media. The program instructions may be executable to: transmit, by an ultra-wideband transceiver on a first user-worn device, to an ultra-wideband transceiver on a candidate device, a first impulse signal; receive, by the ultra-wideband transceiver on the first user-worn device, from the ultra-wideband transceiver on the candidate device, a second impulse signal, the second impulse signal being sent as a response to the first impulse signal; determine a physical distance between the first user-worn device and the candidate device based on an elapsed time between transmitting the first impulse signal and receiving the second impulse signal; transmit, by an ultra-wideband transceiver on a second user-worn device, to the ultra-wideband transceiver on the candidate device, a third impulse signal; receive, by the ultra-wideband transceiver on the second user-worn device, from the ultra-wideband transceiver on the candidate device, a fourth impulse signal, the fourth impulse signal being sent as a response to the third impulse signal; determine a physical distance between the second user-worn device and the candidate device based on an elapsed time between transmitting the third impulse signal and receiving the fourth impulse signal; and determine, based on (i) the physical distance between the first user-worn device and the candidate device and (ii) the physical distance between the second user-worn device and the candidate device, that the candidate device is a query target.
The above description is provided as an overview of some implementations of the present disclosure. Those implementations, and other implementations, are described in more detail below.
Various implementations can include a non-transitory computer readable storage medium storing instructions executable by one or more processors (e.g., central processing unit(s) (CPU(s)), graphics processing unit(s) (GPU(s)), digital signal processor(s) (DSP(s)), and/or tensor processing unit(s) (TPU(s))) to perform a method such as one or more of the methods described herein. Other implementations can include an automated assistant client device (e.g., a client device including at least an automated assistant interface for interfacing with cloud-based automated assistant component(s)) that includes processor(s) operable to execute stored instructions to perform a method, such as one or more of the methods described herein. Yet other implementations can include a system of one or more servers that include one or more processors operable to execute stored instructions to perform a method such as one or more of the methods described herein.
In implementations, the environment 100 may include a pair of wireless earbuds 110-1, 110-2 that are worn by a user 190. In some implementations, the wireless earbuds 110-1, 110-2 can gently reside in a right ear (in the case of wireless earbud 110-2) and a left ear (in the case of wireless earbud 110-1) of the user 190 for listening to audio generated at an application of another computing device. Wireless earbud 110-1 may include an ultra-wideband transceiver 120-1, and wireless earbud 110-2 may include an ultra-wideband transceiver 120-2. The ultra-wideband transceivers 120-1, 120-2 may include an ultra-wideband chipset and a single antenna capable of ranging against a target ultra-wideband chipset. Additional and/or alternative headphone devices (e.g., on-ear or over-the-ear headphones) may be provided instead of the wireless earbuds 110-1, 110-2.
In other implementations, other wearable devices (e.g., non-headphone devices) may be provided instead of wireless earbuds 110-1, 110-2, and an ultra-wideband transceiver 120-1 may be located on a portion of the wearable device that is worn on a first side of a user's body, and an ultra-wideband transceiver 120-2 may be located on a portion of the wearable device that is worn on a second side of a user's body. For example, a pair of smart glasses (not shown) may be provided instead of wireless earbuds 110-1, 110-2, and the pair of smart glasses may include an ultra-wideband transceiver 120-1 located at the left temple of the smart glasses (e.g., worn on the left side of the user 190), and an ultra-wideband transceiver 120-2 located at the right temple of the smart glasses (e.g., worn on the right side of the user 190).
In implementations, the environment 100 may also include one or more client device(s) 130-1, 130-2, . . . 130-n. The one or more client device(s) 130-1, 130-2, . . . 130-n may include one or more smart devices, such as a standalone assistant-centric interactive speaker, a standalone assistant-centric interactive display with speaker(s), a smart appliance such as a smart television, and/or a wearable apparatus of a user that includes a computing device (e.g., a watch of the user having a computing device, etc.). The one or more client device(s) 130-1, 130-2, . . . 130-n may also include any other computing devices, such as one or more smartphones, desktop computing devices, laptop computing devices, and/or tablet computing devices, etc. Additional and/or alternative client device(s) 130-1, 130-2, . . . 130-n may be provided.
The one or more client device(s) 130-1, 130-2, . . . 130-n may include an instance of an automated assistant 140-1, 140-2, . . . , 140-n. The automated assistant 140-1, 140-2, . . . , 140-n can process user inputs received from input device(s) of I/O components 150-1, 150-2, . . . , 150-n, such as spoken inputs detected via microphone(s) of I/O components 150-1, 150-2, . . . , 150-n, touch inputs received via touch-screen displays of I/O components 150-1, 150-2, . . . , 150-n, images detected via camera(s) of I/O components 150-1, 150-2, . . . , 150-n, etc. Further, the automated assistant 140-1, 140-2, . . . , 140-n can optionally render various outputs via output device(s) of I/O components 150-1, 150-2, . . . , 150-n, such as speaker(s) and/or touch-screen displays of I/O components 150-1, 150-2, . . . , 150-n. The one or more client device(s) 130-1, 130-2, . . . 130-n may also include ultra-wideband transceivers 160-1, 160-2, . . . , 160-n. The ultra-wideband transceivers 160-1, 160-2, . . . , 160-n may include an ultra-wideband chipset and a single antenna capable of ranging against a target ultra-wideband chipset.
In some implementations, the ultra-wideband transceiver 120-1 of the wireless earbud 110-1 may perform ranging against the ultra-wideband transceiver 160-1, 160-2, . . . , 160-n of a particular client device 130-1, 130-2, . . . 130-n, determining a distance between the wireless earbud 110-1 and the particular client device 130-1, 130-2, . . . 130-n. For example, the ultra-wideband transceiver 120-1 of the wireless earbud 110-1 may perform ranging against the ultra-wideband transceiver 160-1 of the client device 130-1, determining a distance 170-1 between the wireless earbud 110-1 and the client device 130-1. The ultra-wideband transceiver 120-1 of the wireless earbud 110-1 may also perform ranging against the ultra-wideband transceiver 160-2 of the client device 130-2, determining a distance 170-2 between the wireless earbud 110-1 and the client device 130-2. The ultra-wideband transceiver 120-1 of the wireless earbud 110-1 may also perform ranging against the ultra-wideband transceiver 160-n of the client device 130-n, determining a distance 170-n between the wireless earbud 110-1 and the client device 130-n.
The ultra-wideband transceiver 120-2 of wireless earbud 110-2 may also perform ranging against the ultra-wideband transceiver 160-1, 160-2, . . . , 160-n of one of the client devices 130-1, 130-2, . . . 130-n, determining a distance between the wireless earbud 110-2 and the particular client device 130-1, 130-2, . . . 130-n. For example, the ultra-wideband transceiver 120-2 of the wireless earbud 110-2 may perform ranging against the ultra-wideband transceiver 160-1 of the client device 130-1, determining a distance 180-1 between the wireless earbud 110-2 and the client device 130-1. The ultra-wideband transceiver 120-2 of the wireless earbud 110-2 may also perform ranging against the ultra-wideband transceiver 160-2 of the client device 130-2, determining a distance 180-2 between the wireless earbud 110-2 and the client device 130-2. The ultra-wideband transceiver 120-2 of the wireless earbud 110-2 may also perform ranging against the ultra-wideband transceiver 160-n of the client device 130-n, determining a distance 180-n between the wireless earbud 110-2 and the client device 130-n.
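For purposes of illustration, the per-device ranging described above might be organized as in the following sketch. The earbud objects, their range() method, and the check callable (e.g., the is_query_target sketch above with a calibrated ground-truth distance bound in) are hypothetical stand-ins, not actual interfaces of the wireless earbuds 110-1, 110-2 or the client devices 130-1, 130-2, . . . 130-n.

```python
# A minimal sketch under assumed interfaces: each earbud object exposes a
# hypothetical range(device) method returning a distance in meters, and
# check(d_left, d_right) is a predicate such as the is_query_target sketch
# above with the calibrated ground-truth distance already bound in.
from typing import Callable, Optional, Sequence

def find_query_target(left_earbud, right_earbud,
                      client_devices: Sequence,
                      check: Callable[[float, float], bool]) -> Optional[object]:
    """Return the first client device whose pair of ranges passes the check."""
    for device in client_devices:
        d_left = left_earbud.range(device)    # e.g., distances 170-1, 170-2, ...
        d_right = right_earbud.range(device)  # e.g., distances 180-1, 180-2, ...
        if check(d_left, d_right):
            return device
    return None
```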
The second ultra-wideband transceiver understands a template a priori for decoding the first impulse signal 220. The second ultra-wideband transceiver prepares a reply to the first impulse signal 220 in a fixed time, T (reply) 250. At time Tx2 260, the second ultra-wideband transceiver sends a second impulse signal 270 to the first ultra-wideband transceiver. The second impulse signal 270 is received by the first ultra-wideband transceiver at time Rx1 280. The elapsed time between Tx2 260 and Rx1 280 is T (2→1) 290.
The first ultra-wideband transceiver then computes the distance d between the first ultra-wideband transceiver and the second ultra-wideband transceiver, based on the roundtrip time RTT, which is the sum of T (1→2) 240, T (reply) 250, and T (2→1) 290. In particular, the first ultra-wideband transceiver may compute the distance d using the equation

d = c × (RTT − T (reply)) / 2,

where c is the speed of light.
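For illustration, this single-sided two-way ranging computation can be sketched as follows, assuming the reply time T (reply) is fixed and known to the initiating transceiver; the function and constant names are illustrative only.

```python
# A minimal sketch of the distance computation above, assuming a fixed,
# known reply time on the responding transceiver.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def distance_from_roundtrip(rtt_s: float, reply_time_s: float) -> float:
    """RTT = T(1->2) + T(reply) + T(2->1); subtracting the known reply time
    and halving gives the one-way time of flight, which times c gives d."""
    time_of_flight_s = (rtt_s - reply_time_s) / 2.0
    return SPEED_OF_LIGHT_M_PER_S * time_of_flight_s
```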
In the example environment 100 of
In some implementations, a second absolute value of a difference between a ground truth device distance and an average of the distance 170-1 and the distance 180-1 may be determined, and the client device 130-1 is determined to be the target device further based on the second absolute value satisfying a second threshold. In some implementations, the ground truth device distance may be calibrated before runtime (e.g., during setup of the client device 130-1).
In the example environment 100 of
In some implementations, based on the client device 130-1 being determined to be the target device for querying, an automated assistant function of the client device 130-1 may be activated. For example, the client device 130-1 may be caused to perform automated speech recognition on an input audio signal, prior to identifying a hotword in the input audio signal. In some implementations, activation of an automated assistant function of the client devices 130-2, 130-n may be suppressed, based on the client device 130-1 being determined to be the target device for querying. In this way, unintended activations of automated assistants running on the client devices 130-2, 130-n may be avoided.
In some implementations, the ultra-wideband transceivers 120-1, 120-2 of the wireless earbuds 110-1, 110-2 may perform ranging against the ultra-wideband transceivers 160-1, 160-2, . . . , 160-n of each of the client devices 130-1, 130-2, . . . 130-n in the environment 100 at a high inference rate (e.g., 30 Hz) at query time with an always-on ultra-wideband chipset. Accordingly, a target device for querying may be rapidly updated as the user 190 turns to face a new one of the client devices 130-1, 130-2, . . . 130-n in the environment 100.
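One way such periodic re-arbitration could be organized is sketched below; the callables stand in for the ranging and threshold checks sketched earlier, and the scheduling details are assumptions for the example rather than part of this disclosure.

```python
# A minimal sketch of re-running arbitration at a fixed rate (e.g., 30 Hz).
# select_target is assumed to wrap the ranging and threshold checks sketched
# earlier; on_new_target is where activation on the newly faced device (and
# suppression on the other devices) could be triggered.
import time
from typing import Callable, Optional

def arbitration_loop(select_target: Callable[[], Optional[object]],
                     on_new_target: Callable[[object], None],
                     rate_hz: float = 30.0) -> None:
    period_s = 1.0 / rate_hz
    current = None
    while True:
        target = select_target()
        if target is not None and target is not current:
            current = target
            on_new_target(target)
        time.sleep(period_s)
```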
In some implementations, ultra-wideband ranging may be initiated by the wireless earbuds 110-1, 110-2 or a computing device (e.g., a smartphone) to which the wireless earbuds 110-1, 110-2 are connected. In other implementations, ultra-wideband ranging may be initiated by one or more of the client devices 130-1, 130-2, . . . 130-n in the environment 100.
In implementations, the environment 300 may also include client devices 310-1, 310-2. The client devices 310-1, 310-2 may include one or more smart devices, such as a standalone assistant-centric interactive speaker, a standalone assistant-centric interactive display with speaker(s), a smart appliance such as a smart television, and/or a wearable apparatus of a user that includes a computing device (e.g., a watch of the user having a computing device, etc.). The client devices 310-1, 310-2 may also include any other computing devices, such as one or more smartphones, desktop computing devices, laptop computing devices, and/or tablet computing devices, etc. Additional and/or alternative client devices 310-1, 310-2 may be provided.
The client devices 310-1, 310-2 may include an instance of an automated assistant 320-1, 320-2. The automated assistant 320-1, 320-2 can process user inputs received from input device(s) of I/O components 330-1, 330-2, such as spoken inputs detected via microphone(s) of I/O components 330-1, 330-2, touch inputs received via touch-screen displays of I/O components 330-1, 330-2, images detected via camera(s) of I/O components 330-1, 330-2, etc. Further, the automated assistant 320-1, 320-2 can optionally render various outputs via output device(s) of I/O components 330-1, 330-2, such as speaker(s) and/or touch-screen displays of I/O components 330-1, 330-2. The client devices 310-1, 310-2 may also include ultra-wideband transceivers 340-1, 340-2. The ultra-wideband transceivers 340-1, 340-2 may include an ultra-wideband chipset and a single antenna capable of ranging against a target ultra-wideband chipset.
In some implementations, the ultra-wideband transceiver 120-1 of the wireless earbud 110-1 may perform ranging against the ultra-wideband transceiver 340-1, 340-2 of a particular client device 310-1, 310-2, determining a distance between the wireless earbud 110-1 and the particular client device 310-1, 310-2. For example, the ultra-wideband transceiver 120-1 of the wireless earbud 110-1 may perform ranging against the ultra-wideband transceiver 340-1 of the client device 310-1, determining a distance 350-1 between the wireless earbud 110-1 and the client device 310-1. The ultra-wideband transceiver 120-1 of the wireless earbud 110-1 may also perform ranging against the ultra-wideband transceiver 340-2 of the client device 310-2, determining a distance 350-2 between the wireless earbud 110-1 and the client device 310-2.
The ultra-wideband transceiver 120-2 of wireless earbud 110-2 may also perform ranging against the ultra-wideband transceiver 340-1, 340-2 of a particular client device 310-1, 310-2, determining a distance between the wireless earbud 110-2 and the particular client device 310-1, 310-2. For example, the ultra-wideband transceiver 120-2 of the wireless earbud 110-2 may perform ranging against the ultra-wideband transceiver 340-1 of the client device 310-1, determining a distance 360-1 between the wireless earbud 110-2 and the client device 310-1. The ultra-wideband transceiver 120-2 of the wireless earbud 110-2 may also perform ranging against the ultra-wideband transceiver 340-2 of the client device 310-2, determining a distance 360-2 between the wireless earbud 110-2 and the client device 310-2.
In the example environment 300 of
In the example environment 300 of
The computing device 402 and/or other third party client devices can be in communication with a server device over a network, such as the internet. Additionally, the computing device 402 and any other computing devices can be in communication with each other over a local area network (LAN), such as a Wi-Fi network. The computing device 402 can offload computational tasks to the server device in order to conserve computational resources at the computing device 402. For instance, the server device can host the automated assistant 404, and/or computing device 402 can transmit inputs received at one or more assistant interfaces 420 to the server device. However, in some implementations, the automated assistant 404 can be hosted at the computing device 402, and various processes that can be associated with automated assistant operations can be performed at the computing device 402.
In various implementations, all or less than all aspects of the automated assistant 404 can be implemented on the computing device 402. In some of those implementations, aspects of the automated assistant 404 are implemented via the computing device 402 and can interface with a server device, which can implement other aspects of the automated assistant 404. The server device can optionally serve a plurality of users and their associated assistant applications via multiple threads. In implementations where all or less than all aspects of the automated assistant 404 are implemented via computing device 402, the automated assistant 404 can be an application that is separate from an operating system of the computing device 402 (e.g., installed “on top” of the operating system)—or can alternatively be implemented directly by the operating system of the computing device 402 (e.g., considered an application of, but integral with, the operating system).
In some implementations, the automated assistant 404 can include an input processing engine 406, which can employ multiple different modules for processing inputs and/or outputs for the computing device 402 and/or a server device. For instance, the input processing engine 406 can include a speech processing engine 408, which can process audio data received at an assistant interface 420 to identify the text embodied in the audio data. The audio data can be transmitted from, for example, the computing device 402 to the server device in order to preserve computational resources at the computing device 402. Additionally, or alternatively, the audio data can be exclusively processed at the computing device 402.
The process for converting the audio data to text can include a speech recognition algorithm, which can employ neural networks, and/or statistical models for identifying groups of audio data corresponding to words or phrases. The text converted from the audio data can be parsed by a data parsing engine 410 and made available to the automated assistant 404 as textual data that can be used to generate and/or identify command phrase(s), intent(s), action(s), slot value(s), and/or any other content specified by the user. In some implementations, output data provided by the data parsing engine 410 can be provided to a parameter engine 412 to determine whether the user provided an input that corresponds to a particular intent, action, and/or routine capable of being performed by the automated assistant 404 and/or an application or agent that is capable of being accessed via the automated assistant 404. For example, assistant data 438 can be stored at the server device and/or the computing device 402, and can include data that defines one or more actions capable of being performed by the automated assistant 404, as well as parameters necessary to perform the actions. The parameter engine 412 can generate one or more parameters for an intent, action, and/or slot value, and provide the one or more parameters to an output generating engine 414. The output generating engine 414 can use the one or more parameters to communicate with an assistant interface 420 for providing an output to a user, and/or communicate with one or more applications 434 for providing an output to one or more applications 434.
In some implementations, the automated assistant 404 can be an application that can be installed “on-top of” an operating system of the computing device 402 and/or can itself form part of (or the entirety of) the operating system of the computing device 402. The automated assistant application includes, and/or has access to, on-device speech recognition, on-device natural language understanding, and on-device fulfillment. For example, on-device speech recognition can be performed using an on-device speech recognition module that processes audio data (detected by the microphone(s)) using an end-to-end speech recognition machine learning model stored locally at the computing device 402. The on-device speech recognition generates recognized text for a spoken utterance (if any) present in the audio data. Also, for example, on-device natural language understanding (NLU) can be performed using an on-device NLU module that processes recognized text, generated using the on-device speech recognition, and optionally contextual data, to generate NLU data.
NLU data can include intent(s) that correspond to the spoken utterance and optionally parameter(s) (e.g., slot values) for the intent(s). On-device fulfillment can be performed using an on-device fulfillment module that utilizes the NLU data (from the on-device NLU), and optionally other local data, to determine action(s) to take to resolve the intent(s) of the spoken utterance (and optionally the parameter(s) for the intent). This can include determining local and/or remote responses (e.g., answers) to the spoken utterance, interaction(s) with locally installed application(s) to perform based on the spoken utterance, command(s) to transmit to internet-of-things (IoT) device(s) (directly or via corresponding remote system(s)) based on the spoken utterance, and/or other resolution action(s) to perform based on the spoken utterance. The on-device fulfillment can then initiate local and/or remote performance/execution of the determined action(s) to resolve the spoken utterance.
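As a purely illustrative sketch of the kind of NLU data and local fulfillment dispatch described above (the dataclass fields and the handler registry are assumptions for the example, not the actual on-device modules):

```python
# An illustrative sketch of NLU data and local fulfillment dispatch; field
# names, example values, and the registry are assumptions.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class NluData:
    intent: str                                          # e.g., "turn_on_light"
    slots: Dict[str, str] = field(default_factory=dict)  # e.g., {"room": "kitchen"}

# Registry mapping intents to local fulfillment handlers (answer a question,
# control a locally installed application, send a command to an IoT device, etc.).
FULFILLMENT_HANDLERS: Dict[str, Callable[[NluData], str]] = {}

def fulfill_on_device(nlu: NluData) -> str:
    """Resolve the intent with a locally registered handler, if one exists."""
    handler = FULFILLMENT_HANDLERS.get(nlu.intent)
    if handler is None:
        raise LookupError(f"no local handler for intent {nlu.intent!r}")
    return handler(nlu)
```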
In various implementations, remote speech processing, remote NLU, and/or remote fulfillment can at least selectively be utilized. For example, recognized text can at least selectively be transmitted to remote automated assistant component(s) for remote NLU and/or remote fulfillment. For instance, the recognized text can optionally be transmitted for remote performance in parallel with on-device performance, or responsive to failure of on-device NLU and/or on-device fulfillment. However, on-device speech processing, on-device NLU, on-device fulfillment, and/or on-device execution can be prioritized at least due to the latency reductions they provide when resolving a spoken utterance (due to no client-server roundtrip(s) being needed to resolve the spoken utterance). Further, on-device functionality can be the only functionality that is available in situations with no or limited network connectivity.
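The “on-device first, remote as fallback” behavior described above might be expressed as in the following sketch, in which the callables stand in for hypothetical local and remote components rather than actual APIs:

```python
# A minimal sketch of preferring on-device processing and falling back to
# remote components on failure; the callables are hypothetical stand-ins.
from typing import Callable

def resolve_utterance(recognized_text: str,
                      on_device_nlu: Callable[[str], object],
                      on_device_fulfillment: Callable[[object], str],
                      remote_resolve: Callable[[str], str]) -> str:
    try:
        # Prefer on-device NLU and fulfillment for lower latency and for
        # operation without network connectivity.
        return on_device_fulfillment(on_device_nlu(recognized_text))
    except Exception:
        # Fall back to remote components, e.g., when no local handler exists
        # or network-backed data is required.
        return remote_resolve(recognized_text)
```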
In some implementations, the computing device 402 can include one or more applications 434 which can be provided by a third-party entity that is different from an entity that provided the computing device 402 and/or the automated assistant 404. An application state engine of the automated assistant 404 and/or the computing device 402 can access application data 430 to determine one or more actions capable of being performed by one or more applications 434, as well as a state of each application of the one or more applications 434 and/or a state of a respective device that is associated with the computing device 402. A device state engine of the automated assistant 404 and/or the computing device 402 can access device data 432 to determine one or more actions capable of being performed by the computing device 402 and/or one or more devices that are associated with the computing device 402. Furthermore, the application data 430 and/or any other data (e.g., device data 432) can be accessed by the automated assistant 404 to generate contextual data 436, which can characterize a context in which a particular application 434 and/or device is executing, and/or a context in which a particular user is accessing the computing device 402, accessing an application 434, and/or any other device or module.
While one or more applications 434 are executing at the computing device 402, the device data 432 can characterize a current operating state of each application 434 executing at the computing device 402. Furthermore, the application data 430 can characterize one or more features of an executing application 434, such as content of one or more graphical user interfaces being rendered at the direction of one or more applications 434. Alternatively, or additionally, the application data 430 can characterize an action schema, which can be updated by a respective application and/or by the automated assistant 404, based on a current operating status of the respective application. Alternatively, or additionally, one or more action schemas for one or more applications 434 can remain static, but can be accessed by the application state engine in order to determine a suitable action to initialize via the automated assistant 404.
The computing device 402 can further include an assistant invocation engine 422 that can use one or more trained machine learning models to process application data 430, device data 432, contextual data 436, and/or any other data that is accessible to the computing device 402. The assistant invocation engine 422 can process this data in order to determine whether or not to wait for a user to explicitly speak an invocation phrase to invoke the automated assistant 404, or consider the data to be indicative of an intent by the user to invoke the automated assistant—in lieu of requiring the user to explicitly speak the invocation phrase. For example, the one or more trained machine learning models can be trained using instances of training data that are based on scenarios in which the user is in an environment where multiple devices and/or applications are exhibiting various operating states. The instances of training data can be generated in order to capture training data that characterizes contexts in which the user invokes the automated assistant and other contexts in which the user does not invoke the automated assistant. When the one or more trained machine learning models are trained according to these instances of training data, the assistant invocation engine 422 can cause the automated assistant 404 to detect, or limit detecting, spoken invocation phrases from a user based on features of a context and/or an environment.
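Purely as an illustration of such a decision, a trained model's score over contextual features might gate whether an explicit invocation phrase is required; the feature representation and threshold below are assumptions, not details of the assistant invocation engine 422.

```python
# An illustrative sketch only: a trained model scores contextual features,
# and a high score is treated as intent to invoke without a spoken phrase.
from typing import Callable, Mapping

def skip_invocation_phrase(context_features: Mapping[str, float],
                           model_score: Callable[[Mapping[str, float]], float],
                           threshold: float = 0.9) -> bool:
    """Return True if the context alone should invoke the automated assistant."""
    return model_score(context_features) >= threshold
```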
At block 510, the system transmits, by an ultra-wideband transceiver (e.g., ultra-wideband transceiver 120-1) on a first user-worn device (e.g., wireless earbud 110-1), to an ultra-wideband transceiver (e.g., ultra-wideband transceiver 160-1, 160-2, . . . , 160-n) on a candidate device (e.g., one of the client devices 130-1, 130-2, . . . , 130-n), a first impulse signal. In some implementations, the first user-worn device is a first earbud (e.g., wireless earbud 110-1).
At block 520, the system receives, by the ultra-wideband transceiver (e.g., ultra-wideband transceiver 120-1) on the first user-worn device (e.g., wireless earbud 110-1), from the ultra-wideband transceiver (e.g., ultra-wideband transceiver 160-1, 160-2, . . . , 160-n) on the candidate device (e.g., one of the client devices 130-1, 130-2, . . . , 130-n), a second impulse signal, the second impulse signal being sent as a response to the first impulse signal.
At block 530, the system determines a physical distance between the first user-worn device (e.g., wireless earbud 110-1) and the candidate device (e.g., one of the client devices 130-1, 130-2, . . . , 130-n) based on an elapsed time between transmitting the first impulse signal and receiving the second impulse signal. In some implementations, determining the physical distance between the first user-worn device (e.g., wireless earbud 110-1) and the candidate device (e.g., one of the client devices 130-1, 130-2, . . . , 130-n) is further based on a predetermined time in which the candidate device prepares the response to the first impulse signal.
At block 540, the system transmits, by an ultra-wideband transceiver (e.g., ultra-wideband transceiver 120-2) on a second user-worn device (e.g., wireless earbud 110-2), to the ultra-wideband transceiver (e.g., ultra-wideband transceiver 160-1, 160-2, . . . , 160-n) on the candidate device (e.g., one of the client devices 130-1, 130-2, . . . , 130-n), a third impulse signal. In some implementations, the second user-worn device is a second earbud (e.g., wireless earbud 110-2).
At block 550, the system receives, by the ultra-wideband transceiver (e.g., ultra-wideband transceiver 120-2) on the second user-worn device (e.g., wireless earbud 110-2), from the ultra-wideband transceiver (e.g., ultra-wideband transceiver 160-1, 160-2, . . . , 160-n) on the candidate device (e.g., one of the client devices 130-1, 130-2, . . . , 130-n), a fourth impulse signal, the fourth impulse signal being sent as a response to the third impulse signal.
At block 560, the system determines a physical distance between the second user-worn device (e.g., wireless earbud 110-2) and the candidate device (e.g., one of the client devices 130-1, 130-2, . . . , 130-n) based on an elapsed time between transmitting the third impulse signal and receiving the fourth impulse signal. In some implementations, determining the physical distance between the second user-worn device (e.g., wireless earbud 110-2) and the candidate device (e.g., one of the client devices 130-1, 130-2, . . . , 130-n) is further based on a predetermined time in which the candidate device prepares the response to the third impulse signal.
At block 570, the system determines, based on (i) the physical distance between the first user-worn device (e.g., wireless earbud 110-1) and the candidate device (e.g., one of the client devices 130-1, 130-2, . . . , 130-n) and (ii) the physical distance between the second user-worn device (e.g., wireless earbud 110-2) and the candidate device (e.g., one of the client devices 130-1, 130-2, . . . , 130-n), that the candidate device is a query target. In some implementations, determining that the candidate device is the query target includes:
determining a first absolute value, the first absolute value being an absolute value of a difference between (i) the physical distance between the first user-worn device and the candidate device and (ii) the physical distance between the second user-worn device and the candidate device; and determining that the first absolute value satisfies a first threshold.
In some implementations, determining that the candidate device is the query target at block 570 further includes: determining a second absolute value, the second absolute value being an absolute value of a difference between a ground truth device distance and an average of (i) the physical distance between the first user-worn device and the candidate device and (ii) the physical distance between the second user-worn device and the candidate device; and determining that the second absolute value satisfies a second threshold.
At block 580, in response to determining that the candidate device (e.g., one of the client devices 130-1, 130-2, . . . , 130-n) is the query target, the system causes the candidate device to activate an automated assistant function. In some implementations, at block 580, the system causes the candidate device to perform automated speech recognition on an input audio signal, prior to identifying a hotword in the input audio signal.
Computing device 610 typically includes at least one processor 614 which communicates with a number of peripheral devices via bus subsystem 612. These peripheral devices may include a storage subsystem 624, including, for example, a memory subsystem 625 and a file storage subsystem 626, user interface output devices 620, user interface input devices 622, and a network interface subsystem 616. The input and output devices allow user interaction with computing device 610. Network interface subsystem 616 provides an interface to outside networks and is coupled to corresponding interface devices in other computing devices.
User interface input devices 622 may include a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a touchscreen incorporated into the display, audio input devices such as voice recognition systems, microphones, and/or other types of input devices. In general, use of the term “input device” is intended to include all possible types of devices and ways to input information into computing device 610 or onto a communication network.
User interface output devices 620 may include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem may include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem may also provide non-visual display such as via audio output devices. In general, use of the term “output device” is intended to include all possible types of devices and ways to output information from computing device 610 to the user or to another machine or computing device.
Storage subsystem 624 stores programming and data constructs that provide the functionality of some or all of the modules described herein. For example, the storage subsystem 624 may include the logic to perform selected aspects of the methods disclosed herein, as well as to implement various components depicted in
These software modules are generally executed by processor 614 alone or in combination with other processors. The memory subsystem 625 included in the storage subsystem 624 can include a number of memories including a main random access memory (RAM) 630 for storage of instructions and data during program execution and a read only memory (ROM) 632 in which fixed instructions are stored. A file storage subsystem 626 can provide persistent storage for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations may be stored by file storage subsystem 626 in the storage subsystem 624, or in other machines accessible by the processor(s) 614.
Bus subsystem 612 provides a mechanism for letting the various components and subsystems of computing device 610 communicate with each other as intended. Although bus subsystem 612 is shown schematically as a single bus, alternative implementations of the bus subsystem may use multiple busses.
Computing device 610 can be of varying types including a workstation, server, computing cluster, blade server, server farm, or any other data processing system or computing device. Due to the ever-changing nature of computers and networks, the description of computing device 610 depicted in
While several implementations have been described and illustrated herein, a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein may be utilized, and each of such variations and/or modifications is deemed to be within the scope of the implementations described herein. More generally, all parameters, dimensions, materials, and configurations described herein are meant to be exemplary, and the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific implementations described herein. It is, therefore, to be understood that the foregoing implementations are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, implementations may be practiced otherwise than as specifically described and claimed. Implementations of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present disclosure.
Number | Date | Country
---|---|---
63615928 | Dec 2023 | US