This relates generally to ultrasonic sensing, including but not limited to, utilizing ultrasound in a smart home environment.
Audio devices having microphones and speakers are used extensively. In particular, usage of audio devices in residential and commercial environments has increased substantially, in part due to lower prices.
As consumer demands change and the complexity of home automation and related systems increases, various new challenges, such as occupancy and positioning detection, arise in designing such audio products. For example, audio devices use excess energy to constantly monitor for audio inputs such as key words, or require manual user interaction to “wake up” prior to receiving audio inputs. As another example, audio devices require manual user interaction to adjust volume and directionality in accordance with the user's relative positioning and the audio background.
Accordingly, there is a need for systems and/or devices with more efficient, accurate, and intuitive methods for sensing and interacting with users. Such systems, devices, and methods optionally complement or replace conventional systems, devices, and methods for sensing and interacting with users.
The disclosed methods enable a wide array of electronic devices to use sound navigation and ranging (SONAR) by modifying the operation of audible microphones and speakers existing on devices. By enabling SONAR in this way, the present disclosure solves the engineering problem of intelligent human sensing and scene understanding in a cost-effective manner. The disclosed SONAR systems optionally capture human static occupancy, proximity, human breathing rates, over-the-air gestures such as waving hands, and relative room temperature profiles by time-of-flight differentials. Another advantage is that the use of SONAR in this manner is inexpensive (no additional hardware required), since modified software may be used to push the audio hardware's operating band to the ultrasonic regime, which is inaudible and reflective in natural scenes, and thus a prime mode for human sensing. Another advantage is that detecting users via SONAR allows the device to modulate outgoing signals (e.g., outgoing ultrasonic and audible signals) based on the users' proximity thereby saving energy and reducing interference between devices.
The present disclosure describes an ultrasonic sensing system (e.g., SONAR) enabled by audible-range audio hardware in accordance with some implementations. One advantage of using the ultrasonic sensing system is being able to detect and interpret human breathing cues to better assist a user. Another advantage is being able to deliver human sensing features (e.g., user proximity) while using no specialized hardware (e.g., only audio software modifications). For example, an audio assistant device may use a same set of microphones and speakers to audibly communicate with nearby persons (e.g., play music, engage in conversations, listen for instructions, etc.) and to perform ultrasonic sensing. As another example, a display assistant may present audiovisual content while concurrently using the same speakers to perform ultrasonic sensing.
In accordance with some implementations, an electronic device (e.g., an audio device and/or smart device) having one or more microphones and speakers is configured to audibly (e.g., verbally) interact with a user and, while doing so, send and receive ultrasonic pulses (e.g., having frequencies at or above 20 kHz). The ultrasonic pulses are optionally used to determine the user's relative position, and enable the device to adjust operation accordingly. Adjusting operation may include adjusting volume levels and/or directionality, e.g., an audio device with multiple speakers may select the speaker facing the user's direction to communicate with the user. A device with multiple microphones may assign a subset of the microphones to the user based on the user's position. Adjusting operation based on the user's position allows the audio device to conserve energy (and extend battery life), reduce audio interference with other nearby persons and/or audio devices, and provide a better user experience.
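As one illustration of the directionality adjustment described above, the following minimal sketch selects the speaker whose facing direction is closest to the user's bearing. The function names, the representation of speaker orientation as a bearing in degrees, and the example values are assumptions made for illustration only; the disclosure does not specify how a facing speaker is chosen.

```python
def angular_gap_deg(a_deg, b_deg):
    """Smallest absolute angle between two bearings, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def select_speaker(speaker_bearings_deg, user_bearing_deg):
    """Return the index of the speaker facing closest to the user.

    speaker_bearings_deg: hypothetical list of directions each speaker faces.
    user_bearing_deg: user's direction, e.g. as estimated from ultrasonic echoes.
    """
    if not speaker_bearings_deg:
        raise ValueError("at least one speaker is required")
    return min(
        range(len(speaker_bearings_deg)),
        key=lambda i: angular_gap_deg(speaker_bearings_deg[i], user_bearing_deg),
    )
```

For a device with four speakers facing 0, 90, 180, and 270 degrees, a user at a bearing of 350 degrees would be served by the speaker at index 0, since the bearing wraps around through 360 degrees.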
In accordance with some implementations, a smart device (e.g., an audio device) includes multiple types of sensors, such as passive infrared (PIR) sensors, ambient light sensors (ALS), microphones (e.g., for audible and/or ultrasound sensing), image sensors, radar modules, and wireless communication (Wi-Fi) signal analysis modules. The sensors are optionally configured to work together to complete tasks. For example, the smart device may operate in a low power mode where some of the sensors are disabled (e.g., the image sensor, radar module, ultrasound module, etc.). In this example, the smart device may use a low power sensor, such as a PIR sensor or ALS, to detect motion and then “wake up,” e.g., activate one or more of the disabled sensors. A specific example would be detecting motion with a PIR sensor and then enabling a camera, radar module, or ultrasound module to characterize and/or track the motion. As another example, in a low light situation, the smart device may detect motion via Wi-Fi signal analysis and then enable the radar module, ultrasound module, or lights (e.g., IR LEDs) and the camera to characterize and/or track the motion.
In accordance with some implementations, a plurality of smart devices are communicatively coupled to one another. The smart devices may include a variety of device types with distinct device capabilities. In some implementations, the smart devices work together to detect, characterize, and respond to events. For example, one or more of the smart devices may receive a request from a user. In this example, the request is processed and a visual and/or audible response is identified. To present the response to the user, the smart devices determine a location of the user (e.g., via radar or ultrasound), determine relative positioning of other smart devices (e.g., via Wi-Fi signal analysis), identify any obstructions between the smart devices and the user (e.g., via radar or ultrasound), and select an optimal device to respond to the user (e.g., based on the type of response, positioning of the devices and user, obstructions, and individual device capabilities).
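The device selection described above may be sketched as follows. This is a simplified illustration under stated assumptions: the `Device` record, its fields, and the rule "nearest unobstructed device that supports the response type" are hypothetical choices, and an actual implementation may weigh the listed factors differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    # All fields are hypothetical stand-ins for the factors named in the text.
    name: str
    outputs: frozenset          # response types supported, e.g. {"audio", "visual"}
    distance_to_user_m: float   # e.g. from radar or ultrasound positioning
    obstructed: bool            # True if an obstruction lies between device and user


def select_responder(devices, response_type):
    """Pick the nearest unobstructed device able to present the response.

    Returns None when no device qualifies.
    """
    candidates = [
        d for d in devices
        if response_type in d.outputs and not d.obstructed
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda d: d.distance_to_user_m)
```

In this sketch a visual response is routed to the nearest device with a display even when a speaker-only device is closer to the user, reflecting the role of individual device capabilities in the selection.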
In one aspect, some implementations include a method performed at an audio device having memory, one or more processors, a speaker, and a microphone. The method includes, while audibly communicating with a user via the speaker and microphone: (1) sending one or more ultrasound pulses via the speaker; (2) receiving, via the microphone, one or more signals corresponding to the one or more ultrasound pulses; and (3) determining positioning of the user based on the one or more received signals.
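Steps (2) and (3) above can be illustrated with a time-of-flight calculation. The sketch below is not the disclosed method; it assumes a matched-filter style search for the echo within the microphone recording, a speed of sound of 343 m/s, and a recording that starts at the moment the pulse is emitted. All names are hypothetical, and the direct speaker-to-microphone path is ignored for simplicity.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: air at roughly 20 degrees Celsius


def estimate_echo_delay_samples(pulse, recording):
    """Return the lag, in samples, at which the pulse best matches the recording."""
    if len(pulse) > len(recording):
        raise ValueError("recording must be at least as long as the pulse")
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(recording) - len(pulse) + 1):
        # Correlate the emitted pulse with the recording at this lag.
        score = sum(p * recording[lag + i] for i, p in enumerate(pulse))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def distance_from_delay_m(delay_samples, sample_rate_hz,
                          speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Convert a round-trip echo delay into a one-way distance in meters."""
    round_trip_s = delay_samples / sample_rate_hz
    return speed_m_per_s * round_trip_s / 2.0  # halve: out to the user and back
```

For example, at a 48 kHz sample rate an echo delayed by 480 samples corresponds to a 10 millisecond round trip, or a reflector about 1.7 meters away.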
In some implementations, the method further includes adjusting one or more parameters of the speaker and/or microphone based on the determined positioning of the user.
In some implementations: (1) the method further includes: (a) scanning ultrasound environs of the audio device; and (b) determining, based on the scanning, one or more ultrasound parameters for the one or more ultrasound pulses; and (2) the one or more ultrasound pulses are sent with the one or more ultrasound parameters.
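One way the scanning step might inform the pulse parameters is sketched below. The disclosure does not state how the parameters are derived from the scan; this example assumes, purely for illustration, that the scan yields a noise level for each candidate frequency and that the quietest candidate is chosen. The function name and the example values are hypothetical.

```python
def pick_pulse_frequency_hz(noise_db_by_frequency_hz):
    """Choose the candidate ultrasound frequency with the least ambient noise.

    noise_db_by_frequency_hz: hypothetical scan result mapping a candidate
    frequency (Hz) to the ambient noise level measured there (dB).
    """
    if not noise_db_by_frequency_hz:
        raise ValueError("scan produced no candidate frequencies")
    return min(noise_db_by_frequency_hz, key=noise_db_by_frequency_hz.get)
```

Other parameters, such as amplitude or pulse duration, could be selected from the same scan in a similar manner.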
In another aspect, some implementations include a method performed at an audio device having memory, one or more processors, a speaker, and a microphone. The method includes: (1) sending a first set of ultrasound chirps at a first rate via the speaker; (2) receiving, via the microphone, a first set of signals corresponding to the first set of ultrasound chirps; (3) determining based on the first set of signals that a person is in proximity to the audio device; and (4) in accordance with the determination that the person is in proximity to the audio device, sending a second set of ultrasound chirps at a second rate, faster than the first rate.
In another aspect, some implementations include a smart or audio device having one or more processors; a microphone; a speaker; and memory storing one or more instructions that, when executed by the one or more processors, perform any of the methods described herein.
In another aspect, some implementations include a non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a (smart or audio) device, cause the device to perform any of the methods described herein.
Thus, devices are provided with more efficient and effective methods for detecting and interacting with users, thereby increasing the accuracy, effectiveness, efficiency, and user satisfaction with such devices, while reducing power consumption and extending battery life. Such devices and methods may complement or replace conventional systems and methods for detecting and interacting with users.
For a better understanding of the various described implementations, reference should be made to the Description of Implementations below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.
A smart home environment may include features that are confounding to various sensors, such as walls and visual obstacles, low light conditions, and atmospheric changes. In addition, multiple smart devices operating in proximity with one another within the smart home environment cause additional interference, e.g., radio interference, infrared interference, and the like. These confounding features interfere with interactions between the smart devices and the user, thereby decreasing accuracy, effectiveness, efficiency, and user satisfaction with such devices.
In accordance with some implementations, a smart device utilizes one or more sensors that can overcome the confounding features, such as an ultrasound module that can detect a user in low light conditions, or a radar module that can detect a user through visual obstacles and in low light. In some implementations, the smart device utilizes a plurality of sensors to detect and interact with a user. For example, the smart device uses a PIR sensor to determine if a user is in proximity to the device, then uses ultrasound to determine the user's positioning and/or interpret the user's gestures, posture, breathing cues, and the like. In this example, the smart device may use an image sensor to identify the user, a microphone to capture audible user requests, radar to track a user as the user passes behind an object, and Wi-Fi signal analysis to determine positioning of other smart devices (e.g., to handoff the user interaction when appropriate, or to modulate signal outputs to reduce interference with the other devices). In this way, the smart device is enabled to overcome the confounding features of the smart home environment and ensure a better user interaction.
Additionally, the smart device can utilize multiple sensors (or multiple settings of a particular sensor) to conserve energy and increase battery life. For example, a device may operate in a low energy “sleep” mode in which higher-energy sensors such as image sensors and radar modules are disabled to conserve energy. In this example, the device may use a PIR or ALS sensor to “wake up” and enable the higher-energy sensors. In another example, a device operates an ultrasound module in a lower power mode, in which pulses are emitted on a low duty cycle (e.g., 1 pulse every 500 milliseconds, 1 second, or 3 seconds). In this example, once motion is detected in the lower power mode, the device transitions to a higher power mode, in which pulses are emitted more frequently (e.g., 1 pulse every 10 milliseconds, 50 milliseconds, or 100 milliseconds). In addition, once the device determines a user's relative positioning, the device may modulate the output to conserve energy and reduce potential interference with other nearby devices (e.g., modulate duty cycle, frequency, amplitude, and/or phase of the signals). For example, in some implementations the device uses a +3 dB ultrasonic signal to detect if a user is in proximity to the device. In this example, once the user is detected within the proximity (e.g., within 10 feet, 5 feet, or 3 feet), the device switches to a +1 dB signal (e.g., optionally with a higher duty cycle). In this way, once a user is identified as being in proximity to the device, ultrasonic pulse energy can be reduced because the ultrasonic pulses do not need to travel as far as when the device is in detection mode, which allows the device to conserve energy.
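The mode switching described above may be sketched as a simple selection between two pulse configurations. The numeric values are taken from the examples in the preceding paragraph (1 pulse per second at +3 dB while detecting; 1 pulse every 50 milliseconds at +1 dB once a user is within 10 feet). The class, constant, and function names are hypothetical, and the sketch assumes the user's distance has already been estimated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PulseSettings:
    interval_ms: int   # time between ultrasonic pulses
    level_db: float    # relative signal level


# Example values drawn from the text above; names are illustrative only.
DETECTION = PulseSettings(interval_ms=1000, level_db=3.0)  # low duty cycle, stronger pulse
TRACKING = PulseSettings(interval_ms=50, level_db=1.0)     # high duty cycle, weaker pulse


def select_settings(user_distance_ft: Optional[float],
                    proximity_ft: float = 10.0) -> PulseSettings:
    """Return tracking settings when a user is within range, else detection settings.

    user_distance_ft is None when no user has been detected.
    """
    if user_distance_ft is not None and user_distance_ft <= proximity_ft:
        return TRACKING
    return DETECTION
```

A device following this sketch would emit stronger, infrequent pulses while searching for a user and weaker, frequent pulses once the user is nearby.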
Table 1 below illustrates the types of sensors that may be included on a smart device as well as example use cases for each type of sensor.
As shown in Table 1, each sensor type has particular uses and advantages. However, each sensor is also potentially susceptible to certain confounding factors. For example: (1) acoustic interference may confound ultrasound imaging components and microphones; (2) changes in atmospheric pressure and temperature may confound ultrasound imaging components and PIR sensors; (3) multiple entities that are equidistant from the sensor may confound ultrasound or radar components using a single receiver; (4) radio interference may confound radar components and wireless communication (Wi-Fi) signal analysis components; (5) infrared interference may confound a PIR sensor; (6) visual obstructions may confound visual imaging, PIR, ALS, and ultrasound components; and (7) low light conditions may confound visual imaging components. Also, since radar can penetrate walls and objects, it may be difficult for the radar component to determine what entities are in line-of-sight of the device.
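The confounding factors enumerated above can be expressed as a lookup that identifies which sensor types remain usable under given conditions. The mapping below restates items (1) through (7); the identifier names and the idea of filtering sensors this way are assumptions made for illustration.

```python
# Mapping of confounding factor -> sensor types it may confound,
# restating items (1)-(7) above. Key and sensor names are hypothetical.
CONFOUNDED_SENSORS = {
    "acoustic_interference": {"ultrasound", "microphone"},
    "atmospheric_change": {"ultrasound", "pir"},
    "equidistant_entities": {"ultrasound", "radar"},  # single-receiver case
    "radio_interference": {"radar", "wifi"},
    "infrared_interference": {"pir"},
    "visual_obstruction": {"camera", "pir", "als", "ultrasound"},
    "low_light": {"camera"},
}

ALL_SENSORS = set().union(*CONFOUNDED_SENSORS.values())


def unaffected_sensors(conditions):
    """Return a sorted list of sensor types not confounded by the given conditions."""
    affected = set()
    for condition in conditions:
        affected |= CONFOUNDED_SENSORS[condition]
    return sorted(ALL_SENSORS - affected)
```

For instance, with both a visual obstruction and low light present, the sketch leaves the microphone, radar, and Wi-Fi signal analysis as candidates.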
In various implementations, the devices described herein (e.g., the electronic, audio, and/or smart devices) include all or a subset of the sensors shown in Table 1 above. In some implementations, the devices described herein include a set of sensors selected to complement one another and overcome various confounding factors as discussed above. In some implementations, the devices described herein include a first set of sensors operable with low power and a second set of sensors that have higher power consumption. In some implementations, the second set of sensors are disabled, or in a sleep mode, until activated based on data from the first set of sensors, thereby saving power consumption at the device.
In some implementations, the devices described herein utilize one or more device interface elements (e.g., microphones and speakers) for multiple purposes. For example, an audio assistant and a display assistant optionally utilize the microphones and speakers for assistant functionalities as well as for ultrasonic scanning (e.g., as illustrated in
Turning now to the figures,
It is to be appreciated that “smart home environments” may refer to smart environments for homes such as a single-family house, but the scope of the present teachings is not so limited. The present teachings are also applicable, without limitation, to duplexes, townhomes, multi-unit apartment buildings, hotels, retail stores, office buildings, industrial buildings, and more generally any living space or work space.
It is also to be appreciated that while the terms user, customer, installer, homeowner, occupant, guest, tenant, landlord, repair person, and the like may be used to refer to the person or persons acting in the context of some particular situations described herein, these references do not limit the scope of the present teachings with respect to the person or persons who are performing such actions. Thus, for example, the terms user, customer, purchaser, installer, subscriber, and homeowner may often refer to the same person in the case of a single-family residential dwelling, because the head of the household is often the person who makes the purchasing decision, buys the unit, and installs and configures the unit, and is also one of the users of the unit. However, in other scenarios, such as a landlord-tenant environment, the customer may be the landlord with respect to purchasing the unit, the installer may be a local apartment supervisor, a first user may be the tenant, and a second user may again be the landlord with respect to remote control functionality. Importantly, while the identity of the person performing the action may be germane to a particular advantage provided by one or more of the implementations, such identity should not be construed in the descriptions that follow as necessarily limiting the scope of the present teachings to those particular individuals having those particular identities.
The depicted structure 150 includes a plurality of rooms 152, separated at least partly from each other via walls 154. The walls 154 may include interior walls or exterior walls. Each room may further include a floor 156 and a ceiling 158. Devices may be mounted on, integrated with and/or supported by a wall 154, floor 156 or ceiling 158.
In some implementations, the integrated devices of the smart home environment 100 include intelligent, multi-sensing, network-connected devices that integrate seamlessly with each other in a smart home network (e.g., 202
In some implementations, the one or more smart thermostats 102 detect ambient climate characteristics (e.g., temperature and/or humidity) and control a HVAC system 103 accordingly. For example, a respective smart thermostat 102 includes an ambient temperature sensor.
The one or more smart hazard detectors 104 may include thermal radiation sensors directed at respective heat sources (e.g., a stove, oven, other appliances, a fireplace, etc.). For example, a smart hazard detector 104 in a kitchen 153 includes a thermal radiation sensor directed at a stove/oven 112. A thermal radiation sensor may determine the temperature of the respective heat source (or a portion thereof) at which it is directed and may provide corresponding blackbody radiation data as output.
The smart doorbell 106 and/or the smart door lock 120 detect a person's approach to or departure from a location (e.g., an outer door), control doorbell/door locking functionality (e.g., receive user inputs from a portable electronic device 166 to actuate a bolt of the smart door lock 120), announce a person's approach or departure via audio or visual means, and/or control settings on a security system (e.g., to activate or deactivate the security system when occupants go and come). In some implementations, the smart doorbell 106 and/or the smart lock 120 are battery-powered (e.g., are not line-powered). In some implementations, the smart doorbell 106 includes some or all of the components and features of the camera 118. In some implementations, the smart doorbell 106 includes a camera 118. In some implementations, the smart doorbell 106 includes a camera 118 that is embedded in the doorbell 106. In some implementations, the smart doorbell 106 includes a camera that is mounted on or near the doorbell 106. In some implementations, the smart doorbell 106 includes a camera 118 that is not mounted in, on, or near the doorbell 106, but is instead mounted in proximity to the doorbell 106. In some implementations, the smart doorbell 106 includes two or more cameras 118 (e.g., one camera facing the entryway, and another camera facing approaching visitors). In some implementations, the smart doorbell 106 has a camera (also sometimes referred to herein as doorbell camera 106) which is separate from a video camera 118. For the purposes of this disclosure, video-related references to doorbell 106 refer to one or more cameras associated with doorbell 106.
The smart alarm system 122 may detect the presence of an individual within close proximity (e.g., using built-in IR sensors), sound an alarm (e.g., through a built-in speaker, or by sending commands to one or more external speakers), and send notifications to entities or users within/outside of the smart home network 100. In some implementations, the smart alarm system 122 also includes one or more input devices or sensors (e.g., keypad, biometric scanner, NFC transceiver, microphone) for verifying the identity of a user, and one or more output devices (e.g., display, speaker). In some implementations, the smart alarm system 122 may also be set to an “armed” mode, such that detection of a trigger condition or event causes the alarm to be sounded unless a disarming action is performed.
In some implementations, the smart home environment 100 includes one or more intelligent, multi-sensing, network-connected wall switches 108 (hereinafter referred to as “smart wall switches 108”), along with one or more intelligent, multi-sensing, network-connected wall plug interfaces 110 (hereinafter referred to as “smart wall plugs 110”). The smart wall switches 108 detect ambient lighting conditions, detect room-occupancy states, and/or control a power and/or dim state of one or more lights. In some instances, smart wall switches 108 also control a power state or speed of a fan, such as a ceiling fan. The smart wall plugs 110 may detect occupancy of a room or enclosure and control supply of power to one or more wall plugs (e.g., such that power is not supplied to the plug if nobody is at home).
In some implementations, the smart home environment 100 of
In some implementations, the smart home environment 100 includes one or more network-connected cameras 118 that are configured to provide video monitoring and security in the smart home environment 100. In some implementations, the cameras 118 are battery-powered (e.g., are not line-powered). In some implementations, as described in more detail below, the cameras 118 are configured to selectively couple to one or more networks and/or selectively capture, store, and/or transmit video data (e.g., based on presence and characterization of motion within the field of view). In some implementations, in a low power mode, a camera 118 detects an approaching visitor using a low power sensor, such as a PIR sensor, which is always on or periodically on.
In some implementations, the cameras 118 are used to determine occupancy of the structure 150 and/or particular rooms 152 in the structure 150, and thus act as occupancy sensors. For example, video captured by the cameras 118 may be processed to identify the presence of an occupant in the structure 150 (e.g., in a particular room 152). Specific individuals may be identified based, for example, on their appearance (e.g., height, face) and/or movement (e.g., their walk/gait). Cameras 118 may additionally include one or more sensors (e.g., IR sensors, motion detectors), input devices (e.g., microphone for capturing audio), and output devices (e.g., speaker for outputting audio). In some implementations, the cameras 118 are each configured to operate in a day mode and in a low-light mode (e.g., a night mode). In some implementations, the cameras 118 each include one or more IR illuminators for providing illumination while the camera is operating in the low-light mode. In some implementations, the cameras 118 include one or more outdoor cameras. In some implementations, the outdoor cameras include additional features and/or components such as weatherproofing and/or solar ray compensation.
In some implementations, the smart home environment 100 includes one or more network-connected doorbells 106 that are configured to provide video monitoring and security in a vicinity of an entryway of the smart home environment 100. The doorbells 106 are optionally used to determine the approach and/or presence of a visitor. Specific individuals are optionally identified based, for example, on their appearance (e.g., height, face) and/or movement (e.g., their walk/gait). A doorbell 106 optionally includes one or more sensors (e.g., IR sensors, motion detectors), input devices (e.g., microphone for capturing audio), and output devices (e.g., speaker for outputting audio). In some implementations, a doorbell 106 is configured to operate in a high-light mode (e.g., a day mode) and in a low-light mode (e.g., a night mode). In some implementations, a doorbell 106 includes one or more IR illuminators for providing illumination while the camera is operating in the low-light mode. In some implementations, a doorbell 106 includes one or more lights (e.g., one or more LEDs) for illuminating the doorbell in low-light conditions and/or giving visual feedback to a visitor. In some implementations, a doorbell 106 includes additional features and/or components such as weatherproofing and/or solar ray compensation. In some implementations, doorbell 106 is battery powered and runs in a low power or a high power mode. In some implementations, in the low power mode, doorbell 106 detects an approaching visitor using a low power sensor such as a PIR sensor which is always on or periodically on. In some implementations, after the visitor approach is detected, doorbell 106 switches to the high power mode to carry out further processing functions (described below).
In some implementations, the smart home environment 100 additionally or alternatively includes one or more other occupancy sensors (e.g., the smart doorbell 106, smart door locks 120, touch screens, IR sensors, microphones, ambient light sensors, motion detectors, smart nightlights 170, etc.). In some implementations, the smart home environment 100 includes radio-frequency identification (RFID) readers (e.g., in each room 152 or a portion thereof) that determine occupancy based on RFID tags located on or embedded in occupants. For example, RFID readers may be integrated into the smart hazard detectors 104.
In some implementations, the smart home environment 100 includes one or more devices outside of the physical home but within a proximate geographical range of the home. For example, the smart home environment 100 may include a pool heater monitor 114 that communicates a current pool temperature to other devices within the smart home environment 100 and/or receives commands for controlling the pool temperature. Similarly, the smart home environment 100 may include an irrigation monitor 116 that communicates information regarding irrigation systems within the smart home environment 100 and/or receives control information for controlling such irrigation systems.
By virtue of network connectivity, one or more of the smart home devices of
As discussed above, users may control smart devices in the smart home environment 100 using a network-connected computer or portable electronic device 166. In some examples, some or all of the occupants (e.g., individuals who live in the home) may register their device 166 with the smart home environment 100. Such registration may be made at a central server to authenticate the occupant and/or the device as being associated with the home and to give permission to the occupant to use the device to control the smart devices in the home. An occupant may use their registered device 166 to remotely control the smart devices of the home, such as when the occupant is at work or on vacation. The occupant may also use their registered device to control the smart devices when the occupant is actually located inside the home, such as when the occupant is sitting on a couch inside the home. It should be appreciated that instead of or in addition to registering devices 166, the smart home environment 100 may make inferences about which individuals live in the home and are therefore occupants and which devices 166 are associated with those individuals. As such, the smart home environment may “learn” who is an occupant and permit the devices 166 associated with those individuals to control the smart devices of the home.
In some implementations, in addition to containing processing and sensing capabilities, the devices 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, and/or 122 (collectively referred to as “the smart devices”) are capable of data communications and information sharing with other smart devices, a central server or cloud-computing system, and/or other devices that are network-connected. Data communications may be carried out using any of a variety of custom or standard wireless protocols (e.g., IEEE 802.15.4, Wi-Fi, ZigBee, 6LoWPAN, Thread, Z-Wave, Bluetooth Smart, ISA100.11a, WirelessHART, MiWi, etc.) and/or any of a variety of custom or standard wired protocols (e.g., Ethernet, HomePlug, etc.), or any other suitable communication protocol, including communication protocols not yet developed as of the filing date of this document.
In some implementations, the smart devices serve as wireless or wired repeaters. In some implementations, a first one of the smart devices communicates with a second one of the smart devices via a wireless router. The smart devices may further communicate with each other via a connection (e.g., network interface 160) to a network, such as the Internet 162. Through the Internet 162, the smart devices may communicate with a server system 164 (also called a central server system and/or a cloud-computing system herein). The server system 164 may be associated with a manufacturer, support entity, or service provider associated with the smart device(s). In some implementations, a user is able to contact customer support using a smart device itself rather than needing to use other communication means, such as a telephone or Internet-connected computer. In some implementations, software updates are automatically sent from the server system 164 to smart devices (e.g., when available, when purchased, or at routine intervals).
In some implementations, the network interface 160 includes a conventional network device (e.g., a router), and the smart home environment 100 of
In some implementations, smart home environment 100 includes a local storage device 190 for storing data related to, or output by, smart devices of smart home environment 100. In some implementations, the data includes one or more of: video data output by a camera device (e.g., a camera included with doorbell 106), metadata output by a smart device, settings information for a smart device, usage logs for a smart device, and the like. In some implementations, local storage device 190 is communicatively coupled to one or more smart devices via a smart home network (e.g., smart home network 202,
In some implementations, some low-power nodes are incapable of bidirectional communication. These low-power nodes send messages, but they are unable to “listen”. Thus, other devices in the smart home environment 100, such as the spokesman nodes, cannot send information to these low-power nodes.
In some implementations, some low-power nodes are capable of only a limited bidirectional communication. For example, other devices are able to communicate with the low-power nodes only during a certain time period.
As described, in some implementations, the smart devices serve as low-power and spokesman nodes to create a mesh network in the smart home environment 100. In some implementations, individual low-power nodes in the smart home environment regularly send out messages regarding what they are sensing, and the other low-power nodes in the smart home environment, in addition to sending out their own messages, forward the messages, thereby causing the messages to travel from node to node (i.e., device to device) throughout the smart home network 202. In some implementations, the spokesman nodes in the smart home network 202, which are able to communicate using a relatively high-power communication protocol, such as IEEE 802.11, are able to switch to a relatively low-power communication protocol, such as IEEE 802.15.4, to receive these messages, translate the messages to other communication protocols, and send the translated messages to other spokesman nodes and/or the server system 164 (using, e.g., the relatively high-power communication protocol). Thus, the low-power nodes using low-power communication protocols are able to send and/or receive messages across the entire smart home network 202, as well as over the Internet 162 to the server system 164. In some implementations, the mesh network enables the server system 164 to regularly receive data from most or all of the smart devices in the home, make inferences based on the data, facilitate state synchronization across devices within and outside of the smart home network 202, and send commands to one or more of the smart devices to perform tasks in the smart home environment.
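The node-to-node forwarding described above can be illustrated with a simplified model in which each node knows only its direct neighbors and a message hops between nodes until it reaches a spokesman node. The sketch uses a breadth-first search to find the shortest such hop sequence; this is a modeling simplification rather than a description of any particular mesh protocol, and the node names and function name are hypothetical.

```python
from collections import deque


def route_to_spokesman(links, spokesmen, source):
    """Return the shortest hop-by-hop path from source to any spokesman node.

    links: dict mapping a node name to the neighbors it can reach directly
           over the low-power protocol (hypothetical topology).
    spokesmen: set of node names that can relay onward over a high-power protocol.
    Returns None if no spokesman node is reachable.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in spokesmen:
            return path
        for neighbor in links.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None
```

In a topology where a nightlight can reach only a hazard detector, and the hazard detector can reach a thermostat acting as a spokesman node, a message from the nightlight would travel nightlight, hazard detector, thermostat before being translated and sent on to the server system.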
As described, the spokesman nodes and some of the low-powered nodes are capable of “listening.” Accordingly, users, other devices, and/or the server system 164 may communicate control commands to the low-powered nodes. For example, a user may use the electronic device 166 (e.g., a smart phone) to send commands over the Internet to the server system 164, which then relays the commands to one or more spokesman nodes in the smart home network 202. The spokesman nodes may use a low-power protocol to communicate the commands to the low-power nodes throughout the smart home network 202, as well as to other spokesman nodes that did not receive the commands directly from the server system 164.
In some implementations, a smart nightlight 170 (
Other examples of low-power nodes include battery-powered versions of the smart hazard detectors 104, cameras 118, doorbells 106, and the like. These battery-powered smart devices are often located in an area without access to constant and reliable power and optionally include any number and type of sensors, such as image sensor(s), occupancy/motion sensors, ambient light sensors, ambient temperature sensors, humidity sensors, smoke/fire/heat sensors (e.g., thermal radiation sensors), carbon monoxide/dioxide sensors, and the like. Furthermore, battery-powered smart devices may send messages that correspond to each of the respective sensors to the other devices and/or the server system 164, such as by using the mesh network as described above.
Examples of spokesman nodes include line-powered smart doorbells 106, smart thermostats 102, smart wall switches 108, and smart wall plugs 110. These devices are located near, and connected to, a reliable power source, and therefore may include more power-consuming components, such as one or more communication chips capable of bidirectional communication in a variety of protocols.
In some implementations, the smart home environment 100 includes service robots 168 (
As explained above with reference to
In some implementations, multiple reviewer accounts are linked to a single smart home environment 100. For example, multiple occupants of a smart home environment 100 may have accounts linked to the smart home environment. In some implementations, each reviewer account is associated with a particular level of access. In some implementations, each reviewer account has personalized notification settings. In some implementations, a single reviewer account is linked to multiple smart home environments 100. For example, a person may own or occupy, or be assigned to review and/or govern, multiple smart home environments 100. In some implementations, the reviewer account has distinct levels of access and/or notification settings for each smart home environment.
In some implementations, each of the video sources 222 includes one or more video cameras 118 or doorbell cameras 106 that capture video and send the captured video to the server system 164 substantially in real-time. In some implementations, each of the video sources 222 includes one or more doorbell cameras 106 that capture video and send the captured video to the server system 164 in real-time (e.g., within 1 second, 10 seconds, 30 seconds, or 1 minute). In some implementations, each of the doorbells 106 includes a video camera that captures video and sends the captured video to the server system 164 in real-time. In some implementations, a video source 222 includes a controller device (not shown) that serves as an intermediary between the one or more doorbells 106 and the server system 164. The controller device receives the video data from the one or more doorbells 106, optionally performs some preliminary processing on the video data, and sends the video data and/or the results of the preliminary processing to the server system 164 on behalf of the one or more doorbells 106 (e.g., in real-time). In some implementations, each camera has its own on-board processing capabilities to perform some preliminary processing on the captured video data before sending the video data (e.g., along with metadata obtained through the preliminary processing) to the controller device and/or the server system 164. In some implementations, one or more of the cameras is configured to optionally locally store the video data (e.g., for later transmission if requested by a user). In some implementations, a camera is configured to perform some processing of the captured video data, and, based on the processing, either send the video data in substantially real-time, store the video data locally, or disregard the video data.
In accordance with some implementations, a client device 220 includes a client-side module or smart home application, such as client-side module 528 in
In some implementations, the server system 164 includes one or more processors 212, a video storage database 210, an account database 214, an I/O interface to one or more client devices 216, and an I/O interface to one or more video sources 218. The I/O interface to one or more clients 216 facilitates the client-facing input and output processing. The account database 214 stores a plurality of profiles for reviewer accounts registered with the video processing server, where a respective user profile includes account credentials for a respective reviewer account, and one or more video sources linked to the respective reviewer account. The I/O interface to one or more video sources 218 facilitates communications with one or more video sources 222 (e.g., groups of one or more doorbells 106, cameras 118, and associated controller devices). The video storage database 210 stores raw video data received from the video sources 222, as well as various types of metadata, such as motion events, event categories, event category models, event filters, and event masks, for use in data processing for event monitoring and review for each reviewer account.
Examples of a representative client device 220 include a handheld computer, a wearable computing device, a personal digital assistant (PDA), a tablet computer, a laptop computer, a desktop computer, a cellular telephone, a smart phone, an enhanced general packet radio service (EGPRS) mobile phone, a media player, a navigation device, a game console, a television, a remote control, a point-of-sale (POS) terminal, a vehicle-mounted computer, an ebook reader, or a combination of any two or more of these data processing devices or other data processing devices.
Examples of the one or more networks 162 include local area networks (LAN) and wide area networks (WAN) such as the Internet. The one or more networks 162 are implemented using any known network protocol, including various wired or wireless protocols, such as Ethernet, Universal Serial Bus (USB), FIREWIRE, Long Term Evolution (LTE), Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wi-Fi, voice over Internet Protocol (VoIP), Wi-MAX, or any other suitable communication protocol.
In some implementations, the server system 164 is implemented on one or more standalone data processing apparatuses or a distributed network of computers. In some implementations, the server system 164 also employs various virtual devices and/or services of third party service providers (e.g., third-party cloud service providers) to provide the underlying computing resources and/or infrastructure resources of the server system 164. In some implementations, the server system 164 includes, but is not limited to, a server computer, a cloud server, a distributed cloud computing system, a handheld computer, a tablet computer, a laptop computer, a desktop computer, or a combination of any two or more of these data processing devices or other data processing devices.
In some implementations, a server-client environment includes both a client-side portion (e.g., the client-side module) and a server-side portion (e.g., the server-side module). The division of functionality between the client and server portions of the operating environment can vary in different implementations. Similarly, the division of functionality between a video source 222 and the server system 164 can vary in different implementations. For example, in some implementations, the client-side module is a thin client that provides only user-facing input and output processing functions, and delegates all other data processing functionality to a backend server (e.g., the server system 164). Similarly, in some implementations, a respective one of the video sources 222 is a simple video capturing device that continuously captures and streams video data to the server system 164 with limited or no local preliminary processing on the video data. Although many aspects of the present technology are described from the perspective of the server system 164, the corresponding actions performed by a client device 220 and/or the video sources 222 would be apparent to one of skill in the art. Similarly, some aspects of the present technology may be described from the perspective of a client device or a video source, and the corresponding actions performed by the video server would be apparent to one of skill in the art. Furthermore, some aspects may be performed by the server system 164, a client device 220, and a video source 222 cooperatively.
In some implementations, a video source 222 (e.g., a camera 118 or doorbell 106 having an image sensor) transmits one or more streams of video data to the server system 164. In some implementations, the one or more streams include multiple streams, of respective resolutions and/or frame rates, of the raw video captured by the image sensor. In some implementations, the multiple streams include a “primary” stream (e.g., 226-1) with a certain resolution and frame rate (e.g., corresponding to the raw video captured by the image sensor), and one or more additional streams (e.g., 226-2 through 226-q). An additional stream is optionally the same video stream as the “primary” stream but at a different resolution and/or frame rate, or a stream that captures a portion of the “primary” stream (e.g., cropped to include a portion of the field of view or pixels of the primary stream) at the same or different resolution and/or frame rate as the “primary” stream. In some implementations, the primary stream and/or the additional streams are dynamically encoded (e.g., based on network conditions, server operating conditions, camera operating conditions, characterization of data in the stream (e.g., whether motion is present), user preferences, and the like).
In some implementations, one or more of the streams 226 is sent from the video source 222 directly to a client device 220 (e.g., without being routed to, or processed by, the server system 164). In some implementations, one or more of the streams is stored at the doorbell 106 (e.g., in memory 426,
In some implementations, the server system 164 transmits one or more streams of video data to a client device 220 to facilitate event monitoring by a user. In some implementations, the one or more streams may include multiple streams, of respective resolutions and/or frame rates, of the same video feed. In some implementations, the multiple streams include a “primary” stream with a certain resolution and frame rate, corresponding to the video feed, and one or more additional streams. An additional stream may be the same video stream as the “primary” stream but at a different resolution and/or frame rate, or a stream that shows a portion of the “primary” stream (e.g., cropped to include a portion of the field of view or pixels of the primary stream) at the same or different resolution and/or frame rate as the “primary” stream.
The server system 164 receives one or more video stream(s) 246 from the video source 241 (e.g., a video source 222 from
A data processing pipeline processes video information (e.g., a live video feed) received from a video source 241 (e.g., including a doorbell 106 and an optional controller device) and/or audio information received from one or more smart devices in real-time (e.g., within 10 seconds, 30 seconds, or 2 minutes) to identify and categorize events occurring in the smart home environment, and sends real-time event alerts (e.g., within 10 seconds, 20 seconds, or 30 seconds) and/or a refreshed event timeline (e.g., within 30 seconds, 1 minute, or 3 minutes) to a client device 220 associated with a reviewer account for the smart home environment. The data processing pipeline also processes stored information (such as stored video feeds from a video source 241) to reevaluate and/or re-categorize events as necessary, such as when new information is obtained regarding the event and/or when new information is obtained regarding event categories (e.g., a new activity zone definition is obtained from the user).
After video and/or audio data is captured at a smart device, the data is processed to determine if any potential event candidates or persons are present. In some implementations, the data is initially processed at the smart device (e.g., video source 241, camera 118, or doorbell 106). Thus, in some implementations, the smart device sends event candidate information, such as event start information, to the server system 164. In some implementations, the data is processed at the server system 164 for event start detection. In some implementations, the video and/or audio data is stored on server system 164 (e.g., in video source database 256). In some implementations, the visual/audio data is stored on a server distinct from server system 164. In some implementations, after a motion start is detected, the relevant portion of the video stream is retrieved from storage (e.g., from video source database 256).
In some implementations, the event identification process includes segmenting the video stream into multiple segments then categorizing the event candidate within each segment. In some implementations, categorizing the event candidate includes an aggregation of background factors, entity detection and identification, motion vector generation for each motion entity, entity features, and scene features to generate motion features for the event candidate. In some implementations, the event identification process further includes categorizing each segment, generating or updating an event log based on categorization of a segment, generating an alert for the event based on categorization of a segment, categorizing the complete event, updating the event log based on the complete event, and generating an alert for the event based on the complete event. In some implementations, a categorization is based on a determination that the event occurred within a particular zone of interest. In some implementations, a categorization is based on a determination that the event candidate involves one or more zones of interest. In some implementations, a categorization is based on audio data and/or audio event characterization.
The event analysis and categorization process may be performed by the smart device (e.g., the video source 241) and the server system 164 cooperatively, and the division of the tasks may vary in different implementations, for different equipment capability configurations, power parameters, and/or for different network, device, and server load situations. After the server system 164 categorizes the event candidate, the result of the event detection and categorization may be sent to a reviewer associated with the smart home environment.
In some implementations, the server system 164 stores raw or compressed video data (e.g., in a video source database 256), event categorization models (e.g., in an event categorization model database 260), and event masks and other event metadata (e.g., in an event data and event mask database 262) for each of the video sources 241. In some implementations, the video data is stored at one or more display resolutions such as 480p, 720p, 1080i, 1080p, and the like.
In some implementations, the video source 241 (e.g., the doorbell 106) transmits a live video feed to the remote server system 164 via one or more networks (e.g., the network(s) 162). In some implementations, the transmission of the video data is continuous as the video data is captured by the doorbell 106. In some implementations, the transmission of video data is irrespective of the content of the video data, and the video data is uploaded from the video source 241 to the server system 164 for storage irrespective of whether any motion event has been captured in the video data. In some implementations, the video data is stored at a local storage device of the video source 241 by default, and only video portions corresponding to motion event candidates detected in the video stream are uploaded to the server system 164 (e.g., in real-time or as requested by a user).
In some implementations, the video source 241 dynamically determines at what display resolution the video stream is to be uploaded to the server system 164. In some implementations, the video source 241 dynamically determines which parts of the video stream are to be uploaded to the server system 164. For example, in some implementations, depending on the current server load and network conditions, the video source 241 optionally prioritizes the uploading of video portions corresponding to newly detected motion event candidates ahead of other portions of the video stream that do not contain any motion event candidates; or the video source 241 uploads the video portions corresponding to newly detected motion event candidates at higher display resolutions than the other portions of the video stream. This upload prioritization helps to ensure that important motion events are detected and alerted to the reviewer in real-time, even when the network conditions and server load are less than optimal. In some implementations, the video source 241 implements two parallel upload connections, one for uploading the continuous video stream captured by the doorbell 106, and the other for uploading video portions corresponding to detected motion event candidates. At any given time, the video source 241 determines whether the uploading of the continuous video stream needs to be suspended temporarily to ensure that sufficient bandwidth is given to the uploading of the video segments corresponding to newly detected motion event candidates.
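The upload prioritization described above can be sketched with a priority queue. This is an illustrative sketch only: the `UploadQueue` class, the two priority levels, and the portion identifiers are hypothetical names introduced here, and the sketch does not model bandwidth, display resolution, or the two parallel connections.

```python
import heapq
import itertools

EVENT, CONTINUOUS = 0, 1  # lower value is uploaded first

class UploadQueue:
    """Orders video portions so motion event candidates upload first."""

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps first-in, first-out order within a priority.
        self._order = itertools.count()

    def push(self, portion_id, has_motion_candidate):
        priority = EVENT if has_motion_candidate else CONTINUOUS
        heapq.heappush(self._heap, (priority, next(self._order), portion_id))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)

queue = UploadQueue()
queue.push("c1", False)
queue.push("c2", False)
queue.push("e1", True)   # newly detected motion event candidate
queue.push("c3", False)
upload_order = [queue.pop() for _ in range(len(queue))]
```

Here the portion containing the motion event candidate is uploaded ahead of the continuous portions that were queued before it, while the continuous portions keep their original order.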
In some implementations, the video stream uploaded for cloud storage is at a lower quality (e.g., lower resolution, lower frame rate, higher compression, etc.) than the video segments uploaded for motion event processing.
As shown in
In some implementations, the smart device sends additional source information 244 to the server system 164. This additional source information 244 may include information regarding a device state (e.g., IR mode, AE mode, DTPZ settings, etc.) and/or information regarding the environment in which the device is located (e.g., indoors, outdoors, night-time, day-time, etc.). In some implementations, the source information 244 is used by the server system 164 to perform event detection, entity recognition, and/or to categorize event candidates. In some implementations, the additional source information 244 includes one or more preliminary results from video processing performed by the video source 241 (e.g., a doorbell 106), such as categorizations, object/entity recognitions, motion masks, and the like.
In some implementations, the video portion after an event start incident is detected is divided into multiple segments. In some implementations, the segmentation continues until event end information (sometimes also called an “end-of-event signal”) is obtained. In some implementations, the segmentation occurs within the server system 164 (e.g., by the event processor 248). In some implementations, the segmentation comprises generating overlapping segments. For example, a 10-second segment is generated every second, such that a new segment overlaps the prior segment by 9 seconds.
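The overlapping segmentation in the example above (a 10-second segment generated every second) can be sketched as a sliding window. The function name and the choice to emit only full-length windows are assumptions made for illustration.

```python
def overlapping_segments(event_start, event_end, length=10.0, step=1.0):
    """Return (start, end) windows of `length` seconds, one every `step` seconds.

    Only full-length windows that fit before `event_end` are returned.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    segments = []
    start = event_start
    while start + length <= event_end:
        segments.append((start, start + length))
        start += step
    return segments

# A 13-second video portion, measured from the event start.
segments = overlapping_segments(0.0, 13.0)
# Overlap between consecutive segments, in seconds.
overlap = segments[0][1] - segments[1][0]
```

With the default values each new segment overlaps the prior segment by 9 seconds, matching the example in the text.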
In some implementations, each of the multiple segments is of the same or similar duration (e.g., each segment has a 10-12 second duration). In some implementations, the first segment has a shorter duration than the subsequent segments. Keeping the first segment short allows for real time initial categorization and alerts based on processing the first segment. The initial categorization may then be revised based on processing of subsequent segments. In some implementations, a new segment is generated if the motion entity enters a new zone of interest.
In some implementations, after the event processor 248 obtains the video portion corresponding to an event candidate, the event processor 248 obtains background factors and performs motion entity detection and identification, motion vector generation for each motion entity, and feature identification. Once the event processor 248 completes these tasks, the event categorizer 252 aggregates all of the information and generates a categorization for the motion event candidate. In some implementations, the event processor 248 and the event categorizer 252 are components of the video processing module 322 (
In some implementations, the video source 241 has sufficient processing capabilities to perform, and does perform, entity detection, person recognition, background estimation, motion entity identification, the motion vector generation, and/or the feature identification.
Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise rearranged in various implementations. In some implementations, the memory 306, optionally, stores a subset of the modules and data structures identified above. Furthermore, the memory 306, optionally, stores additional modules and data structures not described above (e.g., an account management module for linking client devices, smart devices, and smart home environments).
The sensor(s) 422 include, for example, one or more thermal radiation sensors, ambient temperature sensors, humidity sensors, infrared (IR) sensors such as passive infrared (PIR) sensors, proximity sensors, range sensors, occupancy sensors (e.g., using RFID sensors), ambient light sensors (ALS), motion sensors 424, location sensors (e.g., GPS sensors), accelerometers, and/or gyroscopes.
The communication interfaces 404 include, for example, hardware capable of data communications using any of a variety of custom or standard wireless protocols (e.g., IEEE 802.15.4, Wi-Fi, ZigBee, 6LoWPAN, Thread, Z-Wave, Bluetooth Smart, ISA100.5A, WirelessHART, MiWi, etc.) and/or any of a variety of custom or standard wired protocols (e.g., Ethernet, HomePlug, etc.), or any other suitable communication protocol, including communication protocols not yet developed as of the filing date of this document. The radios 406 enable one or more radio communication networks in the smart home environments, and enable a smart device 204 to communicate with other devices. In some implementations, the radios 406 are capable of data communications using any of a variety of custom or standard wireless protocols (e.g., IEEE 802.15.4, Wi-Fi, ZigBee, 6LoWPAN, Thread, Z-Wave, Bluetooth Smart, ISA100.5A, WirelessHART, MiWi, etc.).
The memory 426 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. The memory 426, or alternatively the non-volatile memory within the memory 426, includes a non-transitory computer-readable storage medium. In some implementations, the memory 426, or the non-transitory computer-readable storage medium of the memory 426, stores the following programs, modules, and data structures, or a subset or superset thereof:
Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise rearranged in various implementations. In some implementations, the memory 426, optionally, stores a subset of the modules and data structures identified above. Furthermore, the memory 426, optionally, stores additional modules and data structures not described above, such as a sensor management module for managing operation of the sensor(s) 422.
In some implementations, one or more operations of the smart device 204 are performed by the server system 164 and/or the client device 220. These operations include, but are not necessarily limited to, operations performed by or under control of computer program instructions such as the applications 436, device-side module 438, sensor module 444 and event analysis module 450. In some implementations, device data 458 associated with these operations that are performed by the server system 164 are stored, maintained and updated in whole or in part on or by the server system 164.
In accordance with some implementations, the speaker(s) 482 are configured to emit ultrasonic pulses 490 (also sometimes called ultrasonic chirps) and the microphone(s) 480 are configured to receive corresponding ultrasonic signals 492. In some implementations, the pulses 490 are at a frequency above 20 kilohertz (kHz). In some implementations, the pulses 490 sweep a range of frequencies (e.g., a range between 20 kHz and 60 kHz). In some implementations, the signals 492 are analyzed (e.g., by the sensor module 444 and the processor(s) 402) to determine whether motion is present in proximity to the audio device (e.g., determine whether a user is in proximity to the audio device). In some implementations, analyzing the signals 492 includes comparing the profiles of the signals 492 to the profiles of the pulses 490. In some implementations, analyzing the signals 492 includes comparing the profiles of the signals 492 to one another. In some implementations, analyzing the signals 492 includes analyzing timing between the sending of the pulses 490 and the receiving of the signals 492. In some implementations, the sensor module 444 includes a sound navigation and ranging (SONAR) module. Although pulses 490 are described above, in some implementations, continuous wave signals are emitted. In some implementations, frequency, amplitude, and/or phase of the signals (e.g., pulses or continuous wave) are modulated.
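The timing analysis described above can be illustrated with a synthetic example: a swept pulse is generated, a delayed and attenuated copy stands in for the received signal, and the delay is recovered by correlating the two. This is a sketch under stated assumptions, not the disclosed analysis: the 192 kHz sample rate, the 1 ms pulse length, the 343 m/s speed of sound, and all function names are assumed for illustration, and a real echo would also contain noise and multiple reflections.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)

def linear_chirp(f0, f1, duration, sample_rate):
    """Samples of a sine sweep from f0 to f1 Hz over `duration` seconds."""
    n = int(round(duration * sample_rate))
    k = (f1 - f0) / duration  # sweep rate in Hz per second
    return [
        math.sin(2 * math.pi * (f0 * t + 0.5 * k * t * t))
        for t in (i / sample_rate for i in range(n))
    ]

def estimate_delay_samples(pulse, received):
    """Lag (in samples) at which `pulse` best matches `received`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(received) - len(pulse) + 1):
        score = sum(p * received[lag + i] for i, p in enumerate(pulse))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def distance_from_delay(delay_samples, sample_rate):
    """One-way distance for a round-trip delay (out and back)."""
    return SPEED_OF_SOUND * (delay_samples / sample_rate) / 2.0

sample_rate = 192_000
pulse = linear_chirp(20_000, 60_000, 0.001, sample_rate)

# Synthetic received signal: silence plus one attenuated echo at sample 1120.
received = [0.0] * 1500
for i, p in enumerate(pulse):
    received[1120 + i] += 0.3 * p

delay = estimate_delay_samples(pulse, received)
distance_m = distance_from_delay(delay, sample_rate)
```

In this synthetic case the recovered delay corresponds to a reflector roughly one meter from the device.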
The memory 506 includes high-speed random access memory, such as DRAM, SRAM, DDR SRAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. The memory 506, optionally, includes one or more storage devices remotely located from one or more processing units 602. The memory 506, or alternatively the non-volatile memory within the memory 506, includes a non-transitory computer readable storage medium. In some implementations, the memory 506, or the non-transitory computer readable storage medium of the memory 506, stores the following programs, modules, and data structures, or a subset or superset thereof:
Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, modules or data structures, and thus various subsets of these modules may be combined or otherwise rearranged in various implementations. In some implementations, the memory 506, optionally, stores a subset of the modules and data structures identified above. Furthermore, the memory 506, optionally, stores additional modules and data structures not described above, such as an ultrasound module. In some implementations, the programs, modules, and data structures, or a subset or superset thereof described with reference to
Thus, as illustrated in
Thus, as illustrated in
In some implementations, the audio device scans (1002) ultrasound environs of the audio device (e.g., as shown in
In some implementations, the audio device determines (1004), based on the scanning, one or more ultrasound parameters for subsequent ultrasound pulses. For example, the audio device adjusts a frequency, amplitude, and/or intensity of the ultrasound pulses based on a signal-to-noise ratio for each frequency. In some implementations, determining the parameters includes adjusting timing of pulses to offset from other ultrasonic sources. In some implementations, determining the parameters includes identifying the speaker's ultrasonic band local max region(s) (e.g., 32 kHz) and adjusting the parameters to emit ultrasonic bursts (e.g., chirps) at the local max (e.g., via a 29 kHz-35 kHz sweep). In some implementations, the audio device determines, based on the scanning, one or more audible parameters for subsequent audible outputs (e.g., for subsequent music, TTS, or audiovisual content). In some implementations, based on the scanning, the audio device identifies room boundaries and/or objects within the room.
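The selection of a sweep around the speaker's strongest ultrasonic region can be sketched as follows. This is a simplified illustration: the measured response values are made up, the function name and the 3 kHz half-width are assumptions chosen to reproduce the 29 kHz to 35 kHz example, and the sketch simply picks the strongest measured point in the ultrasonic band rather than performing a true local-maximum search.

```python
def choose_sweep(response_db, half_width_hz=3000, min_hz=20000):
    """Pick a sweep band centered on the strongest ultrasonic frequency.

    `response_db` maps frequency in Hz to measured output level in dB.
    """
    ultrasonic = {f: level for f, level in response_db.items() if f >= min_hz}
    if not ultrasonic:
        raise ValueError("no measurements in the ultrasonic band")
    peak = max(ultrasonic, key=ultrasonic.get)
    # Keep the lower edge of the sweep inside the ultrasonic band.
    return max(min_hz, peak - half_width_hz), peak + half_width_hz

# Hypothetical speaker response from a scan (values are illustrative only).
response_db = {
    18000: -3.0,   # audible, ignored
    24000: -20.0,
    28000: -15.0,
    32000: -9.0,
    36000: -14.0,
    40000: -25.0,
}
sweep = choose_sweep(response_db)
```

For the illustrative response above, the peak lies at 32 kHz and the resulting sweep spans 29 kHz to 35 kHz.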
In some implementations, based on the scanning, the audio device performs a room automatic equalization process. For example, the audio device adjusts one or more bass frequency parameters based on the ultrasonic scanning (e.g., based on the identified room boundaries and/or objects).
Manual room equalization is typically a cumbersome process that must be repeated each time the room or placement changes. A user must have a microphone and has to record audio output responses at various positions in a room using the microphone. The user must then apply the required correction based on the collected responses. Furthermore, this manual equalization process requires a lot of knowledge about speakers, microphones, and rooms, which is too advanced for the average user. Automatic audio equalization provides an opportunity for users to achieve the best listening experience and at the same time, avoid the need for any setup or calibration process. Moreover, an automatic equalization process enables a user to rearrange the room or move the speakers without having to worry about conducting another tiresome manual equalization.
In accordance with some implementations, an automated equalization method uses microphones within the device to sense the relative contributions of a wave traveling from the device toward the wall(s) behind and any waves reflected from those walls. In some instances and implementations, a certain delay, or phase shift, from a wave directly from the speaker is anticipated between the microphones; and a reduction of that delay or phase shift is anticipated between the microphones for a wave reflecting off the wall(s) behind the speaker.
In some implementations, the relative phase (phase difference) between one or more microphones is measured. In some implementations, a frequency (acoustic) response is determined using relative amplitude spectral features. In some implementations, relative amplitude spectral features are used in combination with microphone matching and/or calibration. In some instances and implementations, giving weight to the phase differences minimizes the impact of differences in sensitivities between the microphones on the equalization process. In some implementations, equalization comprises correcting the frequency response below a threshold frequency (e.g., below about 300 Hz, where the wavelength is about 1.1 m). In some instances and implementations, only the frequencies below the threshold frequency propagate in all directions, including backwards, from a speaker, and therefore are the only frequencies impacted by walls or corners behind the speaker.
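The wavelength figure and the notion of a phase difference between microphones can be checked with a short calculation. The sketch below assumes a speed of sound of 343 m/s, a plane wave, and a hypothetical 5 cm difference in path length between two microphones; none of these values come from the disclosure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)

def wavelength_m(frequency_hz):
    """Wavelength of a sound wave in air."""
    return SPEED_OF_SOUND / frequency_hz

def phase_difference_rad(frequency_hz, path_difference_m):
    """Phase shift between two microphones whose distances from the
    source differ by `path_difference_m` (plane-wave approximation)."""
    return 2 * math.pi * frequency_hz * path_difference_m / SPEED_OF_SOUND

threshold_wavelength = wavelength_m(300.0)          # about 1.14 m
mic_phase = phase_difference_rad(300.0, 0.05)       # about 0.27 rad
```

At 300 Hz the wavelength is much larger than a typical device, so the phase difference across closely spaced microphones is a small fraction of a cycle.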
In some implementations, the relative positioning of the microphones with respect to one another is obtained and used to determine phase differences. In some implementations, the automatic equalization is performed without any information regarding relative positioning of the microphones with respect to the speaker(s).
In some implementations, the automatic equalization is carried out based on an acoustical model. In some implementations, the device learns and recognizes patterns based on room position, and applies a corresponding equalization correction.
In some implementations, the automatic equalization is carried out using machine learning. In some implementations, machine learning comprises training the device on desired corrections for a range of positions and/or frequencies (e.g., training targets can be obtained from expert listeners, or by measuring the spectrum at auxiliary microphones in the listening area, or by the ABC method using auxiliary microphones in front of the speaker driver).
In some implementations, the one or more ultrasound parameters include one or more frequency settings and/or one or more timing settings. For example, the audio device analyzes the scan data and determines that a room in which the audio device is situated is 10 feet long. In this example, the audio device adjusts the parameters of subsequent ultrasound pulses to be able to detect users/persons within 10 feet (and to minimize scanning outside of 10 feet). In some implementations, the audio device limits the detection range by adjusting (e.g., increasing) a chirp rate, chirp duration, and/or intensity.
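As an illustrative, non-limiting sketch of one timing setting, the following Python computes how many received samples to retain after each pulse so that echoes from beyond a chosen range are ignored. The function name `range_gate_samples`, the 343 m/s speed of sound, and the 96 kHz sampling rate are assumptions made for illustration and are not taken from this description.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air near 20 degrees C
FEET_TO_METERS = 0.3048


def range_gate_samples(max_range_ft: float, sample_rate_hz: float) -> int:
    """Samples to keep after a pulse so that echoes beyond max_range_ft are dropped."""
    # An echo from max_range_ft travels out and back, so the path is twice the range.
    round_trip_s = 2.0 * max_range_ft * FEET_TO_METERS / SPEED_OF_SOUND_M_S
    return int(round_trip_s * sample_rate_hz)


# Hypothetical usage for the 10-foot room in the example above.
gate = range_gate_samples(max_range_ft=10.0, sample_rate_hz=96_000.0)
```

Under these assumptions the round trip for 10 feet is roughly 18 ms, so samples arriving later than that can be discarded rather than analyzed.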
In some implementations, scanning the ultrasound environs includes sending one or more test pulses. In some implementations, scanning the ultrasound environs includes determining a signal-to-noise ratio for each of a plurality of frequencies.
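One way the per-frequency signal-to-noise determination could be sketched is shown below. The helper names and the power values are hypothetical; the sketch assumes that echo power and background noise power have already been measured at each test frequency.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels for one test frequency."""
    return 10.0 * math.log10(signal_power / noise_power)


def best_frequency(echo_power: dict, noise_power: dict):
    """Return (frequency_hz, snr_db) for the test frequency with the highest SNR."""
    snrs = {f: snr_db(echo_power[f], noise_power[f]) for f in echo_power}
    best = max(snrs, key=snrs.get)
    return best, snrs[best]


# Hypothetical measurements from three test pulses (arbitrary power units).
echo_power = {25_000: 4.0e-4, 30_000: 9.0e-4, 35_000: 2.0e-4}
noise_power = {25_000: 1.0e-5, 30_000: 9.0e-6, 35_000: 1.0e-5}
freq_hz, snr = best_frequency(echo_power, noise_power)
```

A device could then favor the selected frequency when choosing frequency settings for subsequent pulses.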
In some implementations, the audio device determines (1006) whether a person is in a vicinity of the audio device. In some implementations, the audio device utilizes ultrasound signals to determine (1008) whether the person is in the vicinity. In some implementations, the audio device determines that a person (e.g., a user) is in the vicinity by determining that motion is present and characterizing the motion (e.g., determining that the motion is consistent with a person walking, breathing, or skipping).
In some implementations, the audio device: (1) detects motion via a sensor of the audio device (e.g., a PIR sensor); (2) in response to detecting the motion, sends one or more second ultrasound pulses; (3) receives one or more second signals corresponding to the one or more second ultrasound pulses; and (4) characterizes the motion based on the one or more second signals. In some implementations, the sensor comprises a radar component, a Wi-Fi signal analysis component, an image sensor, a PIR sensor, and/or an ALS. In some implementations, a duty cycle of the ultrasound pulses is based on whether motion (or an entity/user) has been detected. For example, the audio device emits an ultrasonic pulse once per second when a user is present and once per minute when a user is not present (e.g., to conserve energy and/or minimize interference with other devices). As another example, the audio device emits ultrasonic pulses with lower intensity when a user is closer to the device and with higher intensity when the user is farther away from the device. In some implementations, the audio device enables a virtual assistant feature based on the characterized motion (e.g., enables hotword detection, greets the user, communicatively couples to a virtual assistant server, and the like). In some implementations, the audio device sends a continuous ultrasonic wave and receives one or more signals corresponding to the continuous wave.
In some implementations, the audio device identifies the user based on the one or more received signals (e.g., based on respiratory patterns, gait, and/or cardiac patterns).
In some implementations, the audio device is configured to operate in three distinct ultrasound modes, including: a first mode for occupancy detection (e.g., with a lowest rate of chirps); a second mode for determining positioning (e.g., triggered after motion is detected in the first mode); and a third mode (e.g., with a highest rate of chirps) for movement analysis, such as respiratory or gesture characterization. As an example, in the third mode, for movement analysis, the audio device is configured to emit chirps with a frequency between 25 kHz and 45 kHz with a 10 millisecond (ms) to 100 ms pulse duration and a 25-75% duty cycle. In some implementations, the first mode utilizes a highest intensity or volume for the ultrasonic pulses, and the second and third modes reduce the intensity or volume for the ultrasonic pulses based on a distance to the user.
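The three modes can be pictured as a small state selection, sketched below in Python. The chirp rates and the rule for entering the third mode are assumptions for illustration; the description above specifies only that the first mode has the lowest chirp rate, that the second is triggered after motion is detected, and that the third has the highest rate.

```python
from enum import Enum


class Mode(Enum):
    OCCUPANCY = 1    # first mode: occupancy detection
    POSITIONING = 2  # second mode: determining positioning
    MOVEMENT = 3     # third mode: movement analysis


# Hypothetical rates in chirps per second, ordered lowest to highest.
CHIRP_RATE_HZ = {Mode.OCCUPANCY: 1.0, Mode.POSITIONING: 5.0, Mode.MOVEMENT: 10.0}


def next_mode(motion_detected: bool, position_known: bool) -> Mode:
    """Pick a mode; entering MOVEMENT once position is known is an assumption."""
    if not motion_detected:
        return Mode.OCCUPANCY
    if not position_known:
        return Mode.POSITIONING
    return Mode.MOVEMENT
```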
In some implementations, the audio device greets (1010) the person in accordance with a determination that the person is in the vicinity of the audio device. In some implementations, the audio device identifies the person as a particular user and the greeting is a personalized greeting (e.g., as shown in
While audibly communicating with the person via a speaker and microphone (1012), the audio device sends (1014) one or more ultrasound pulses via the speaker. For example, the ultrasound pulses are interlaced with audible responses to the user. In some implementations, each ultrasound pulse of the one or more ultrasound pulses is a chirp sweeping across multiple frequencies. In some implementations, the ultrasound pulses are in the range of 20 kHz to 90 kHz. In some implementations, the ultrasound pulses sweep a range of frequencies (e.g., 25 kHz-30 kHz). In some implementations, each ultrasound pulse has a duration between 0.1 and 50 milliseconds. In some implementations, the ultrasound pulses have a duty cycle between 1% and 50%.
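For illustration only, a chirp sweeping 25 kHz to 30 kHz could be synthesized as follows. The linear sweep shape, the 5 ms duration, the 96 kHz output rate, and the function name `linear_chirp` are assumptions; the description above specifies only the frequency range, the duration bounds, and the duty-cycle bounds.

```python
import math


def linear_chirp(f_start_hz: float, f_end_hz: float,
                 duration_s: float, sample_rate_hz: float) -> list:
    """Samples of a sine whose frequency rises linearly from f_start_hz to f_end_hz."""
    n = int(round(duration_s * sample_rate_hz))
    sweep_rate = (f_end_hz - f_start_hz) / duration_s  # Hz per second
    samples = []
    for i in range(n):
        t = i / sample_rate_hz
        # Phase is the integral of the instantaneous frequency f_start + sweep_rate * t.
        phase = 2.0 * math.pi * (f_start_hz * t + 0.5 * sweep_rate * t * t)
        samples.append(math.sin(phase))
    return samples


# Hypothetical 5 ms pulse; 96 kHz keeps the 30 kHz upper edge below the Nyquist limit.
pulse = linear_chirp(25_000.0, 30_000.0, 0.005, 96_000.0)
```

Emitting such a pulse once every 100 ms would correspond to a 5% duty cycle, which falls within the 1% to 50% range noted above.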
While audibly communicating with the person via a speaker and microphone (1012), the audio device receives (1016), via the microphone, one or more signals corresponding to the one or more ultrasound pulses. In some implementations, the audio device is configured to have a sampling rate greater than 90 kHz. In some implementations, the audio device is configured to have a ping rate of 10 Hz.
While audibly communicating with the person via a speaker and microphone (1012), the audio device determines (1018) positioning of the person based on the one or more received signals (e.g., based on impulse responses or linear sums). In some implementations, SONAR techniques are used to determine positioning of the person. In some implementations, the SONAR techniques are used to determine human static occupancy, proximity, human breathing rates, over-the-air gestures (e.g., waving hands), posture, and/or relative room temperature. In some implementations, the audio device determines positioning of the person by isolating the ultrasonic band, performing an absolute value operation, and summing over time. In some implementations, after determining the positioning of the user, the audio device adjusts a gain or volume level of subsequent audible communications and/or ultrasonic pulses (e.g., so as to minimize noise pollution and reduce interference with other audio devices in the smart home environment). For example, after identifying a position of a person who is a participant in an audible/spoken communication/conversation with the audio device, the audio device can increase or decrease the volume of its audible outputs that are part of that communication/conversation to provide an ideal/consistent volume at the user's position. In some implementations, the ideal volume is determined to approximate a default/predefined sound level at the user's position; an inferred speaking volume of the user based on the user's determined position; or an appropriate sound level at the user's position based on a combination of any one or more of the above factors and/or contextual information, such as time of day, location in a home environment of the device, a background environmental sound level, hearing capabilities of the user, and presence and location of other persons in vicinity of the audio device.
For example, in accordance with a determination that a user is quite distant from the device and the time of day is mid-day, the device increases a text-to-speech (TTS) volume to ensure that the user is able to hear and understand the subsequent audible output. As another example, in accordance with a determination that the user is quite close and one or more children are sleeping nearby, the device decreases the TTS volume to a level where the user is able to hear and understand the subsequent audible output, while minimizing the impact on the children.
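A minimal sketch of distance-based volume adjustment follows. It assumes free-field spreading, in which sound level falls by about 6 dB for each doubling of distance; real rooms deviate from this. The function name `tts_gain_db`, the 1 m reference distance, and the gain limits are hypothetical.

```python
import math


def tts_gain_db(distance_m: float, reference_distance_m: float = 1.0,
                min_gain_db: float = -12.0, max_gain_db: float = 12.0) -> float:
    """Gain, relative to the reference distance, that keeps the level at the user roughly constant."""
    # Free-field assumption: level changes by 20*log10 of the distance ratio.
    raw = 20.0 * math.log10(distance_m / reference_distance_m)
    # Clamp so context (for example, quiet hours) can cap the boost via max_gain_db.
    return max(min_gain_db, min(max_gain_db, raw))
```

For instance, a user at 2 m would receive roughly 6 dB of boost under these assumptions, while passing a lower `max_gain_db` is one way to reflect context such as children sleeping nearby.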
In some implementations, determining positioning of the user includes determining a distance between the user and the audio device. In some implementations, the audio device identifies movement of the user based on the received signals. For example, the audio device identifies the movement as the user sitting down and offers to adjust room lighting and/or turn on an entertainment system.
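The absolute-value-and-sum approach to positioning mentioned above can be sketched as follows, here used to estimate a distance from the strongest echo. The sketch assumes the samples have already been band-limited to the ultrasonic band upstream, that sample 0 coincides with pulse emission, and that sound travels at 343 m/s; all function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air near 20 degrees C


def echo_profile(band_samples: list, window: int) -> list:
    """Sum of absolute sample values over consecutive non-overlapping windows."""
    return [sum(abs(x) for x in band_samples[i:i + window])
            for i in range(0, len(band_samples) - window + 1, window)]


def distance_of_strongest_echo(band_samples: list, window: int,
                               sample_rate_hz: float) -> float:
    """Distance in meters implied by the window holding the most echo energy."""
    profile = echo_profile(band_samples, window)
    peak = max(range(len(profile)), key=profile.__getitem__)
    delay_s = (peak * window + window / 2) / sample_rate_hz  # center of peak window
    return SPEED_OF_SOUND_M_S * delay_s / 2.0  # halve for the out-and-back path


# Synthetic example: silence with one echo burst starting at sample 576.
sample_rate_hz = 96_000.0
samples = [0.0] * 960
for i in range(576, 608):
    samples[i] = 0.5 if i % 2 == 0 else -0.5
distance_m = distance_of_strongest_echo(samples, 32, sample_rate_hz)
```

In this synthetic case the burst arrives about 6 ms after emission, which corresponds to a reflector a little over 1 m away.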
In some implementations: (1) the audio device receives positioning data from one or more second audio devices; and (2) determining the positioning of the user is further based on the received positioning data. In some implementations, the received positioning data includes ultrasound data, radar data, channel state information (CSI), received signal strength information (RSSI), visual imaging data, and/or PIR data.
For example, the audio device: (1) sends one or more radar pulses via a transmitter at the audio device; and (2) receives, via a receiver at the audio device, one or more second signals corresponding to the one or more radar pulses. In this example, the positioning of the user is determined based in part on the one or more second signals. In some implementations, the radar data is further used to track the user behind obstructions (e.g., walls and objects) and/or in low light situations.
As another example, the audio device receives one or more wireless communication signals, and in this example, the positioning of the user is further based on analysis of the one or more wireless communication signals, such as channel state information (CSI), received signal strength information (RSSI), and/or bandwidth (BW) information of a Wi-Fi signal. In some implementations, the CSI and/or RSSI is further used to (1) detect motion (e.g., determine if an entity is in proximity); and (2) synchronize with remote devices (e.g., other smart devices 204 in the smart home environment).
As another example, the audio device captures, via an image sensor of the audio device, one or more images of a scene in a field of view of the audio device, and in this example, the positioning of the user is further based on analysis of the one or more images of the scene.
In some implementations, the audio device differentiates between the user and one or more additional entities (e.g., tracks and/or identifies) based on the one or more received signals. In some implementations, the differentiating is further based on additional ultrasound data, radar data, channel state information (CSI), received signal strength information (RSSI), visual imaging data, and/or PIR data. In some implementations, differentiating between the user and additional entities includes identifying distinct respiratory, cardiac, and/or gait patterns.
In some implementations, the audio device identifies (1020) one or more user gestures based on the one or more received signals. In some implementations, the audio device generates (1022) a response to the person based on the one or more user gestures. For example, the audio device identifies one or more hand, foot, or head gestures as a user response (e.g., shaking of the head) and responds accordingly. As an example, while outputting audible content, the audio device identifies a user gesture corresponding to a pause function and, in response, pauses the audible content. In some implementations, the audio device utilizes multiple microphones (at varying distances from the user) to identify three-dimensional gestures from the user.
In some implementations, the audio device identifies (1024) one or more breathing cues of the person based on the one or more received signals. In some implementations, the audio device generates (1026) a response to the person based on the one or more breathing cues. For example, the audio device may detect a breathing problem with the user and notify emergency services. As another example, the audio device is optionally configured to monitor a baby's breathing patterns and notify the baby's guardians if a potential issue is detected. As another example, the audio device may detect that a user is distressed and offer assistance.
In some implementations, the audio device emits (1028) one or more second ultrasound pulses configured to determine a temperature of the environs of the audio device. In some implementations, the audio device receives (1030) one or more second signals corresponding to the second ultrasound pulses. In some implementations, the audio device determines (1032) a temperature of the environs based on the one or more second signals. For example, the audio device uses time-of-flight information to identify changes in temperature and/or determine a room temperature.
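As a hedged illustration of the time-of-flight approach, the sketch below converts an echo delay from a reflector at a known distance into an estimated air temperature. It relies on the common linear approximation for the speed of sound in dry air, c ≈ 331.3 + 0.606·T (with T in °C), and assumes the reflector distance is already known (e.g., from a stored room response). Neither the formula nor the function names appear in this description.

```python
def speed_of_sound_m_s(reflector_distance_m: float, time_of_flight_s: float) -> float:
    """Speed of sound implied by an out-and-back echo from a known distance."""
    return 2.0 * reflector_distance_m / time_of_flight_s


def temperature_c(speed_m_s: float) -> float:
    """Invert the dry-air approximation c = 331.3 + 0.606 * T (T in degrees C)."""
    return (speed_m_s - 331.3) / 0.606


# Hypothetical measurement: wall at 3 m, echo delay consistent with c = 343.42 m/s.
tof_s = 2.0 * 3.0 / 343.42
estimated_c = temperature_c(speed_of_sound_m_s(3.0, tof_s))
```

Because the delay shortens as air warms, comparing successive delays for the same reflector also indicates relative temperature changes without knowing the absolute distance precisely.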
In some implementations, the audio device adjusts (1034) one or more parameters of the speaker and/or the microphone based on the determined positioning of the person. In some implementations, the audio device adjusts (1036) a volume level of the speaker in accordance with the determined positioning of the person (e.g., adjusting a gain of the speaker). In some implementations, the audio device has a plurality of speakers and a particular speaker is selected for audible communications based on the determined positioning. In some implementations, the audio device includes a plurality of microphones and a particular microphone is selected for attending to the user based on the determined positioning. In some implementations, the audio device has a plurality of speakers and a particular speaker is selected for subsequent ultrasonic pulses based on the determined positioning. In some implementations, the audio device includes a plurality of microphones and a particular microphone is selected for receiving subsequent ultrasonic pulses based on the determined positioning.
In some implementations, the audio device sends (1038) a notification to a second electronic device (e.g., “wakes up” the second device) based on the determined positioning of the person. For example, the audio device determines that the user is near, or approaching, a second device (e.g., the user is within sensor range) and sends a notification to the second device.
In some implementations, the audio device is optionally configured to emit ultrasonic pulse(s) configured to discourage an animal (e.g., an animal that can hear sounds in the ultrasonic range) from approaching the audio device. For example, the pulses are emitted to keep wild animals away from a smart home environment, or to enable a user to discourage a pet from entering a particular area (e.g., the user's office).
In some implementations, the audio device determines that an animal (e.g., a family pet that can hear sounds in the ultrasonic range) is in the vicinity and adjusts one or more parameters of subsequent ultrasonic pulses (e.g., modulates subsequent pulses) to minimize upsetting the animal. For example, the audio device adjusts to a higher frequency than the animal can hear, or the audio device decreases an intensity, amplitude, and/or duty cycle to decrease an effect on the animal.
In some implementations, a subset of chirps emitted in the scenario illustrated in
In some implementations, the audio device 602 adjusts a volume of audible outputs and/or a magnification of visual displays based on the relative distance to the user 1106. For example, the audio device 602 reduces volume of audible outputs as the user 1106 approaches to conserve power and present a more consistent audio experience for the user.
The audio device sends (1302) a first set of ultrasound chirps (e.g., the chirps 1202) at a first rate via a speaker (e.g., the speaker 482-1) of the audio device (e.g., as illustrated in
The audio device receives (1304), via a microphone (e.g., the microphone 480-1) of the audio device, a first set of signals (e.g., the responses 1204) corresponding to the first set of ultrasound chirps (e.g., as illustrated in
The audio device determines (1306) based on the first set of signals that a person is in proximity to the audio device. In various implementations, proximity to the audio device corresponds to the person being within range of the ultrasound chirps, within audible range of the audio device, or within a preset threshold distance of the audio device (e.g., 20 feet, 10 feet, or 5 feet of the device). In some implementations, the audio device analyzes temporal differences between responses to determine if the differences are due to a person being in proximity to the audio device. In some implementations, the audio device compares the first set of signals to a room response for the room in which the audio device is positioned. In some implementations, the audio device identifies (1308) proximity of the person based on temporal variations in the first set of signals (e.g., as discussed above with respect to
In some implementations, the audio device is paired with a distinct second device (e.g., via Bluetooth) and utilizes relative device positioning to analyze and interpret variance in chirp responses. In some implementations, the audio device emits the first set of chirps and analyzes responses received at both the audio device and the second audio device (e.g., the second audio device sends response information to the audio device).
In some implementations and circumstances, the audio device detects, based on the first set of signals, that a confounding circumstance is present (e.g., a vibrating or rotating object in proximity to the audio device). In some implementations, in accordance with detecting the confounding circumstance, the audio device adjusts one or more detection criteria (e.g., the audio device masks out, or ignores, variance due to the confounding circumstance). In some implementations, adjusting the one or more detection criteria comprises increasing a proximity detection threshold (e.g., only scanning for motion within a reduced radius of the device). In some implementations, adjusting the one or more detection criteria comprises disabling proximity detection (or analysis) for a preset amount of time or until the confounding circumstance is no longer detected. Confounding circumstances may include a rotating fan, an active blender, or a change in air temperature, pressure, or humidity (e.g., due to activation of an air conditioner). In some implementations, the audio device includes one or more additional sensors (e.g., as illustrated in Table 1 above) to identify and/or overcome the confounding circumstance.
In some implementations, the audio device utilizes machine learning technique(s) to identify and/or mask confounding circumstances. In some implementations, the audio device utilizes machine learning technique(s) to distinguish moving persons from other types of motion or confounding circumstances. In some implementations, the audio device utilizes machine learning technique(s) to identify and distinguish between a plurality of user gestures, postures, and/or breathing patterns. For example, machine learning techniques are used to classify, identify, and respond to sign language from a user.
In some implementations, the audio device: (1) identifies (1310) a segment of the first set of signals, the segment consistent with a person in proximity to the audio device; and (2) determines (1312) whether the segment meets one or more detection criteria. For example, the audio device identifies a segment of the first set of signals indicating that motion is present 5 meters from the device and the audio device analyzes the motion to determine if it corresponds to a moving person (e.g., rather than a moving animal or rotating fan). In some implementations, determining whether the segment meets the one or more detection criteria includes determining whether the detected motion has a velocity, acceleration, and/or size consistent with that of a moving person.
In accordance with a determination that the person is in proximity to the audio device, the audio device sends (1314) a second set of ultrasound chirps at a second rate, faster than the first rate (e.g., as illustrated in
In some implementations, the audio device receives (1316), via the microphone, a second set of signals corresponding to the second set of ultrasound chirps. In some implementations, the second set of signals are analyzed to characterize movement of the person (e.g., characterize breathing patterns, gestures, postures, and/or expressions). For example, the audio device monitors sleep patterns for a person in proximity to the audio device and provides feedback to the person. In some implementations, the audio device identifies (1318) a gesture from the person based on the second set of signals. In some implementations, the audio device compares (1320) signals received by the at least one additional microphone with respective signals of the second set of signals (e.g., to triangulate positioning and/or determine a directionality of detected movement).
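One possible way to derive directionality from two microphones is a far-field time-difference-of-arrival estimate, sketched below. This is an illustrative technique rather than the method of this description; it assumes the reflecting object is far away relative to the microphone spacing, a 343 m/s speed of sound, and hypothetical names throughout.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air near 20 degrees C


def arrival_angle_deg(delay_between_mics_s: float, mic_spacing_m: float) -> float:
    """Angle of arrival relative to broadside, from the delay between two microphones."""
    ratio = SPEED_OF_SOUND_M_S * delay_between_mics_s / mic_spacing_m
    # Clamp against measurement noise and rounding pushing the ratio outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A zero delay corresponds to an echo arriving from directly in front of the microphone pair, and larger delays correspond to echoes arriving from further to one side.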
In some implementations, the audio device determines (1322) a relative distance to the person. For example, the audio device determines a relative distance to the person based on where in the responses a variance is detected, as discussed above with respect to
In some implementations, the audio device adjusts (1326) one or more characteristics of a user interface on the audio device based on relative positioning of the person. For example, the audio device wakes up a display or adjusts a brightness level based on the relative positioning. In some implementations, adjusting the characteristic(s) includes reorienting a user interface based on the relative positioning (e.g., turning a display to face the person). In some implementations, the audio device activates (1328) a display in accordance with the person being within a predetermined distance of the audio device (e.g., as illustrated in
In some implementations, the audio device receives (1334), via the microphone, a second set of signals corresponding to the second set of ultrasound chirps.
In some implementations, the audio device determines based on the second set of signals that the person is in close proximity to the audio device (e.g., within 5 feet, 2 feet, or 1 foot of the device). In some implementations, in accordance with a determination that the person is in close proximity to the audio device, the audio device switches to an interaction mode. In some implementations, the interaction mode includes activating one or more user interface elements. In some implementations, the interaction mode includes reorienting the device to face the person. In some implementations, the interaction mode includes sending a third set of ultrasound chirps at a third rate, faster than the second rate. In some implementations, the interaction mode includes analyzing received signals corresponding to the third set of ultrasound chirps to identify user gestures and/or expressions. In some implementations, the interaction mode includes analyzing received signals corresponding to the third set of ultrasound chirps to monitor respiratory patterns of the user (e.g., to identify health concerns and/or determine mood of the person).
In some implementations, the audio device determines (1336) based on the second set of signals that the person is no longer in proximity to the audio device. For example, the second set of signals indicates that the person is moving, or has moved, away from the audio device beyond a threshold distance. In some implementations, in accordance with the determination that the person is no longer in proximity to the audio device, the audio device sends (1338) a third set of ultrasound chirps at a third rate, slower than the second rate. In some implementations, the third rate is the first rate.
In some implementations, the audio device maintains (1340) a mapping of signals of the first set of signals to respective ultrasound chirps of the first set of ultrasound chirps. For example, the audio device stores vectors and/or matrices of temporal variance between consecutive responses to sent chirps (e.g., stores the variances 1206 in
In some implementations, the audio device identifies (1342) a room response from the mapping. In some implementations, the room response corresponds to a mapping of the room while the room is unoccupied. In some implementations, the room response comprises one or more vectors and/or matrices. In some implementations, the room response is stored locally in the audio device, e.g., as a portion of the device data 458 within the memory 426. In some implementations, the audio device obtains environmental data for environs of the audio device; and updates the identified room response based on the environmental data (e.g., updates the room response based on changing temperatures, pressures, or humidity within the room). In some implementations, the audio device includes one or more environmental sensors configured to detect changes in temperature, pressure, and/or humidity. In some implementations, the audio device receives the environmental data from a remote device or server.
In some implementations, determining that the person is in proximity to the audio device includes identifying (1344) a variance from the room response. For example, a response from one or more chirps is compared to the stored room response to determine if a person is in proximity.
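The comparison against a stored room response can be sketched as follows. The sketch assumes both responses are sample-aligned to the emitted chirp and uses a simple fixed threshold; the function names, threshold, and sample values are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air near 20 degrees C


def first_variance_index(response: list, room_response: list, threshold: float):
    """Index of the first sample deviating from the room response, or None if none does."""
    for i, (r, b) in enumerate(zip(response, room_response)):
        if abs(r - b) > threshold:
            return i
    return None


def distance_at_index_m(index: int, sample_rate_hz: float) -> float:
    """Distance implied by a deviation found `index` samples after emission."""
    return SPEED_OF_SOUND_M_S * (index / sample_rate_hz) / 2.0


# Hypothetical responses: the live response differs from the room response at index 4.
room_response = [0.0, 0.1, 0.4, 0.1, 0.0, 0.0, 0.2, 0.0]
response = [0.0, 0.1, 0.4, 0.1, 0.3, 0.0, 0.2, 0.0]
index = first_variance_index(response, room_response, threshold=0.05)
person_in_proximity = index is not None
```

The position of the first deviation within the response also gives a rough distance, since later samples correspond to echoes from farther away.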
In some circumstances, a disruption may occur in the mapping. For example, the audio device may be using all processing capabilities for other processes and/or experience a buffer overload (overflow) condition and fail to receive or analyze a chirp response. In some implementations, in response to a disruption of the mapping, the audio device discards (1346) the mapping. For example, the audio device determines that a disruption has occurred and discards the mapping as being out of date (e.g., rather than analyze variance based on the pre-disruption mapping).
In some implementations, in response to the disruption of the mapping, the audio device establishes (1348) a new mapping by increasing a chirp rate of subsequent ultrasound chirps for a preset amount of time. For example, the audio device determines that a disruption has occurred and sends chirps at a faster rate (e.g., two times, five times, or ten times the prior rate) so as to more quickly establish the new mapping. For example, prior to the disruption the device is emitting chirps at a rate of eight per second and after the disruption the device emits chirps at a rate of thirty per second to establish the new mapping (e.g., for 5, 10, or 30 chirps). In some implementations, the subsequent ultrasound chirps are sent at a rate that corresponds to a maximum rate for a desired scan distance (e.g., 10 milliseconds for a detection radius of 5 feet). In some implementations, in response to the disruption of the mapping, the audio device disables proximity detection (e.g., detection of persons in proximity to the device) until the new mapping is established.
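The relationship between scan distance and maximum chirp rate can be illustrated with the short calculation below. It assumes that each chirp's echo from the farthest point of interest should return before the next chirp is sent, takes the speed of sound as 343 m/s, and ignores chirp duration and lingering reverberation; the function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air near 20 degrees C
FEET_TO_METERS = 0.3048


def min_chirp_interval_s(scan_distance_ft: float) -> float:
    """Shortest spacing between chirps that lets the farthest echo return first."""
    return 2.0 * scan_distance_ft * FEET_TO_METERS / SPEED_OF_SOUND_M_S


def max_chirp_rate_hz(scan_distance_ft: float) -> float:
    """Highest chirp rate consistent with min_chirp_interval_s."""
    return 1.0 / min_chirp_interval_s(scan_distance_ft)


interval_s = min_chirp_interval_s(5.0)  # detection radius of 5 feet
```

Under these assumptions the round trip for a 5-foot radius takes just under 9 ms, which is consistent with the 10 millisecond spacing given in the example above.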
Although some of the various drawings illustrate a number of logical stages in a particular order, stages that are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be obvious to those of ordinary skill in the art, so the ordering and groupings presented herein are not an exhaustive list of alternatives. Moreover, it should be recognized that the stages could be implemented in hardware, firmware, software or any combination thereof.
It will also be understood that, although the terms first, second, etc. are, in some instances, used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first category could be termed a second category, and, similarly, a second category could be termed a first category, without departing from the scope of the various described implementations. The first category and the second category are both categories, but they are not necessarily the same category.
The terminology used in the description of the various described implementations herein is for the purpose of describing particular implementations only and is not intended to be limiting. As used in the description of the various described implementations and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “includes,” “including,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used herein, the term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting” or “in accordance with a determination that,” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event]” or “in accordance with a determination that [a stated condition or event] is detected,” depending on the context.
The foregoing description, for purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the scope of the claims to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen in order to best explain the principles underlying the claims and their practical applications, to thereby enable others skilled in the art to best use the implementations with various modifications as are suited to the particular uses contemplated.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2018/048780 | 8/30/2018 | WO | 00 |
Number | Date | Country |
---|---|---|
62680982 | Jun 2018 | US |