The present disclosure generally relates to systems, methods, and devices for tracking contexts.
A head-mounted device equipped with a scene camera takes many images of a user's environment. The device can detect contexts in the environment, such as objects or activities of the user (e.g., running, eating, or washing hands) based at least in part on those images. Such detection can be useful for presenting virtual content.
So that the present disclosure can be understood by those of ordinary skill in the art, a more detailed description may be had by reference to aspects of some illustrative implementations, some of which are shown in the accompanying drawings.
In accordance with common practice the various features illustrated in the drawings may not be drawn to scale. Accordingly, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. In addition, some of the drawings may not depict all of the components of a given system, method or device. Finally, like reference numerals may be used to denote like features throughout the specification and figures.
Various implementations disclosed herein include devices, systems, and methods for tracking contexts. In various implementations, the method is performed by a device including an image sensor, one or more processors, and non-transitory memory. The method includes capturing, using the image sensor, an image of an environment at a particular time. The method includes detecting a context based at least in part on the image of the environment. The method includes, in accordance with a determination that the context is included within a predefined set of contexts, storing, in a database, an entry including data indicating detection of the context in association with data indicating the particular time. The method includes receiving a query regarding the context. The method includes providing a response to the query based on the data indicating the particular time.
Various implementations disclosed herein include devices, systems, and methods for tracking statuses of objects. In various implementations, the method is performed at a device including an image sensor, one or more processors, and non-transitory memory. The method includes capturing, using the image sensor, an image of an environment at a particular time. The method includes detecting an object in the image of the environment. The method includes, in accordance with a determination that the object is included within a predefined set of objects, determining a status of the object based at least in part on the image of the environment. The method includes storing, in a database, an entry including data indicating the status of the object in association with data indicating the particular time. The method includes receiving a query regarding the status of the object. The method includes providing a response to the query based on the data indicating the particular time.
In accordance with some implementations, a device includes one or more processors, a non-transitory memory, and one or more programs; the one or more programs are stored in the non-transitory memory and configured to be executed by the one or more processors. The one or more programs include instructions for performing or causing performance of any of the methods described herein. In accordance with some implementations, a non-transitory computer readable storage medium has stored therein instructions, which, when executed by one or more processors of a device, cause the device to perform or cause performance of any of the methods described herein. In accordance with some implementations, a device includes: one or more processors, a non-transitory memory, and means for performing or causing performance of any of the methods described herein.
Numerous details are described in order to provide a thorough understanding of the example implementations shown in the drawings. However, the drawings merely show some example aspects of the present disclosure and are therefore not to be considered limiting. Those of ordinary skill in the art will appreciate that other effective aspects and/or variants do not include all of the specific details described herein. Moreover, well-known systems, methods, components, devices, and circuits have not been described in exhaustive detail so as not to obscure more pertinent aspects of the example implementations described herein.
As noted above, a head-mounted device equipped with a scene camera takes many images of a user's environment throughout days or weeks of usage. The device can detect contexts in the environment, such as objects (e.g., keys or a smartphone), statuses of objects (e.g., on/off or locked/unlocked), or current activities of the user (e.g., running, eating, or washing hands) based at least in part on those images. By storing, in a searchable database, data indicating the detection in association with the time or location at which the context was detected, responses to queries can be generated. For example, a user can query “Where did I leave my keys?” and the device can respond with a location of the user's keys. As another example, a user can query “Is the front door locked?” and the device can respond with the status of the front door as either locked or unlocked. As another example, a user can query “When did I last go for a run?” and the device can provide a time the user last ran.
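By way of an illustrative, non-limiting sketch, the storing and querying described above may be implemented with a simple searchable table keyed by context label and detection time. The table layout, the column names, and the sqlite3 storage backend below are assumptions chosen for illustration rather than features of any particular implementation.

```python
# Minimal sketch (assumed schema): log each detection with its time and,
# optionally, a status, a location, and a cropped image of the detection.
import sqlite3
from datetime import datetime

con = sqlite3.connect("context_log.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS detections (
        context   TEXT NOT NULL,   -- e.g. "keys", "front_door_lock", "running"
        status    TEXT,            -- optional, e.g. "locked" or "72"
        time      TEXT NOT NULL,   -- ISO-8601 timestamp of the detection
        location  TEXT,            -- optional, e.g. "office" or "37.33,-122.01"
        image     BLOB             -- optional portion of the captured image
    )
""")

def record_detection(context, status=None, location=None, image=None):
    """Store one detection in association with the time it occurred."""
    con.execute(
        "INSERT INTO detections (context, status, time, location, image) "
        "VALUES (?, ?, ?, ?, ?)",
        (context, status, datetime.now().isoformat(), location, image),
    )
    con.commit()

def last_seen(context):
    """Answer queries such as "Where did I leave my keys?"."""
    return con.execute(
        "SELECT time, location, status FROM detections "
        "WHERE context = ? ORDER BY time DESC LIMIT 1",
        (context,),
    ).fetchone()  # (time, location, status), or None if never detected

record_detection("keys", location="kitchen counter")
print(last_seen("keys"))
```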
In some implementations, the controller 110 is configured to manage and coordinate an XR experience for the user. In some implementations, the controller 110 includes a suitable combination of software, firmware, and/or hardware. The controller 110 is described in greater detail below with respect to
In some implementations, the electronic device 120 is configured to provide the XR experience to the user. In some implementations, the electronic device 120 includes a suitable combination of software, firmware, and/or hardware. According to some implementations, the electronic device 120 presents, via a display 122, XR content to the user while the user is physically present within the physical environment 105 that includes a table 107 within the field-of-view 111 of the electronic device 120. As such, in some implementations, the user holds the electronic device 120 in his/her hand(s). In some implementations, while providing XR content, the electronic device 120 is configured to display an XR object (e.g., an XR sphere 109) and to enable video pass-through of the physical environment 105 (e.g., including a representation 117 of the table 107) on a display 122. The electronic device 120 is described in greater detail below with respect to
According to some implementations, the electronic device 120 provides an XR experience to the user while the user is virtually and/or physically present within the physical environment 105.
In some implementations, the user wears the electronic device 120 on his/her head. For example, in some implementations, the electronic device includes a head-mounted system (HMS), head-mounted device (HMD), or head-mounted enclosure (HME). As such, the electronic device 120 includes one or more XR displays provided to display the XR content. For example, in various implementations, the electronic device 120 encloses the field-of-view of the user. In some implementations, the electronic device 120 is a handheld device (such as a smartphone or tablet) configured to present XR content, and rather than wearing the electronic device 120, the user holds the device with a display directed towards the field-of-view of the user and a camera directed towards the physical environment 105. In some implementations, the handheld device can be placed within an enclosure that can be worn on the head of the user. In some implementations, the electronic device 120 is replaced with an XR chamber, enclosure, or room configured to present XR content in which the user does not wear or hold the electronic device 120.
In some implementations, the one or more communication buses 204 include circuitry that interconnects and controls communications between system components. In some implementations, the one or more I/O devices 206 include at least one of a keyboard, a mouse, a touchpad, a joystick, one or more microphones, one or more speakers, one or more image sensors, one or more displays, and/or the like.
The memory 220 includes high-speed random-access memory, such as dynamic random-access memory (DRAM), static random-access memory (SRAM), double-data-rate random-access memory (DDR RAM), or other random-access solid-state memory devices. In some implementations, the memory 220 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 220 optionally includes one or more storage devices remotely located from the one or more processing units 202. The memory 220 comprises a non-transitory computer readable storage medium. In some implementations, the memory 220 or the non-transitory computer readable storage medium of the memory 220 stores the following programs, modules and data structures, or a subset thereof including an optional operating system 230 and an XR experience module 240.
The operating system 230 includes procedures for handling various basic system services and for performing hardware dependent tasks. In some implementations, the XR experience module 240 is configured to manage and coordinate one or more XR experiences for one or more users (e.g., a single XR experience for one or more users, or multiple XR experiences for respective groups of one or more users). To that end, in various implementations, the XR experience module 240 includes a data obtaining unit 242, a tracking unit 244, a coordination unit 246, and a data transmitting unit 248.
In some implementations, the data obtaining unit 242 is configured to obtain data (e.g., presentation data, interaction data, sensor data, location data, etc.) from at least the electronic device 120 of
In some implementations, the tracking unit 244 is configured to map the physical environment 105 and to track the position/location of at least the electronic device 120 with respect to the physical environment 105 of
In some implementations, the coordination unit 246 is configured to manage and coordinate the XR experience presented to the user by the electronic device 120. To that end, in various implementations, the coordination unit 246 includes instructions and/or logic therefor, and heuristics and metadata therefor.
In some implementations, the data transmitting unit 248 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the electronic device 120. To that end, in various implementations, the data transmitting unit 248 includes instructions and/or logic therefor, and heuristics and metadata therefor.
Although the data obtaining unit 242, the tracking unit 244, the coordination unit 246, and the data transmitting unit 248 are shown as residing on a single device (e.g., the controller 110), it should be understood that in other implementations, any combination of the data obtaining unit 242, the tracking unit 244, the coordination unit 246, and the data transmitting unit 248 may be located in separate computing devices.
Moreover,
In some implementations, the one or more communication buses 304 include circuitry that interconnects and controls communications between system components. In some implementations, the one or more I/O devices and sensors 306 include at least one of an inertial measurement unit (IMU), an accelerometer, a gyroscope, a thermometer, one or more physiological sensors (e.g., blood pressure monitor, heart rate monitor, blood oxygen sensor, blood glucose sensor, etc.), one or more microphones, one or more speakers, a haptics engine, one or more depth sensors (e.g., a structured light, a time-of-flight, or the like), and/or the like.
In some implementations, the one or more XR displays 312 are configured to provide the XR experience to the user. In some implementations, the one or more XR displays 312 correspond to holographic, digital light processing (DLP), liquid-crystal display (LCD), liquid-crystal on silicon (LCoS), organic light-emitting field-effect transistor (OLET), organic light-emitting diode (OLED), surface-conduction electron-emitter display (SED), field-emission display (FED), quantum-dot light-emitting diode (QD-LED), micro-electro-mechanical system (MEMS), and/or the like display types. In some implementations, the one or more XR displays 312 correspond to diffractive, reflective, polarized, holographic, etc. waveguide displays. For example, the electronic device 120 includes a single XR display. In another example, the electronic device includes an XR display for each eye of the user. In some implementations, the one or more XR displays 312 are capable of presenting MR and VR content.
In some implementations, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the face of the user that includes the eyes of the user (and may be referred to as an eye-tracking camera). In some implementations, the one or more image sensors 314 are configured to be forward-facing so as to obtain image data that corresponds to the scene as would be viewed by the user if the electronic device 120 were not present (and may be referred to as a scene camera). The one or more optional image sensors 314 can include one or more RGB cameras (e.g., with a complementary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor), one or more infrared (IR) cameras, one or more event-based cameras, and/or the like.
The memory 320 includes high-speed random-access memory, such as DRAM, SRAM, DDR RAM, or other random-access solid-state memory devices. In some implementations, the memory 320 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 320 optionally includes one or more storage devices remotely located from the one or more processing units 302. The memory 320 comprises a non-transitory computer readable storage medium. In some implementations, the memory 320 or the non-transitory computer readable storage medium of the memory 320 stores the following programs, modules and data structures, or a subset thereof including an optional operating system 330 and an XR presentation module 340.
The operating system 330 includes procedures for handling various basic system services and for performing hardware dependent tasks. In some implementations, the XR presentation module 340 is configured to present XR content to the user via the one or more XR displays 312. To that end, in various implementations, the XR presentation module 340 includes a data obtaining unit 342, a context tracking unit 344, an XR presenting unit 346, and a data transmitting unit 348.
In some implementations, the data obtaining unit 342 is configured to obtain data (e.g., presentation data, interaction data, sensor data, location data, etc.) from at least the controller 110 of
In some implementations, the context tracking unit 344 is configured to detect contexts and store data indicative of the detected context in association with data indicative of a time the context was detected. To that end, in various implementations, the context tracking unit 344 includes instructions and/or logic therefor, and heuristics and metadata therefor.
In some implementations, the XR presenting unit 346 is configured to present XR content via the one or more XR displays 312, such as a visual response to a query. To that end, in various implementations, the XR presenting unit 346 includes instructions and/or logic therefor, and heuristics and metadata therefor.
In some implementations, the data transmitting unit 348 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the controller 110. To that end, in various implementations, the data transmitting unit 348 includes instructions and/or logic therefor, and heuristics and metadata therefor.
Although the data obtaining unit 342, the context tracking unit 344, the XR presenting unit 346, and the data transmitting unit 348 are shown as residing on a single device (e.g., the electronic device 120), it should be understood that in other implementations, any combination of the data obtaining unit 342, the context tracking unit 344, the XR presenting unit 346, and the data transmitting unit 348 may be located in separate computing devices.
Moreover,
The first XR environment 410 includes a plurality of objects, including one or more physical objects (e.g., a picture 411, a couch 412, a water bottle 413, a door 414, a lock 415, and a thermostat 416) of the physical environment and one or more virtual objects (e.g., a virtual clock 401 and a virtual context tracking window 490). In various implementations, certain objects (such as the physical objects and the virtual context tracking window 490) are presented at a location in the first XR environment 410, e.g., at a location defined by three coordinates in a three-dimensional (3D) XR coordinate system, such that while some objects may exist in the physical world and others may not, a spatial relationship (e.g., distance or orientation) may be defined between them. Accordingly, when the electronic device moves in the first XR environment 410 (e.g., changes position and/or orientation), the objects are moved on the display of the electronic device, but retain their location in the first XR environment 410. Such virtual objects that, in response to motion of the electronic device, move on the display, but retain their position in the first XR environment 410 are referred to as world-locked objects.
In various implementations, certain virtual objects (such as the virtual clock 401) are displayed at locations on the display such that when the electronic device moves in the first XR environment 410, the objects are stationary on the display of the electronic device. Such virtual objects that, in response to motion of the electronic device, retain their location on the display are referred to as display-locked objects.
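The distinction between world-locked and display-locked objects may be illustrated with the following non-limiting sketch. The pinhole projection, focal length, and example poses are assumptions for illustration only and are not intended to describe the actual rendering pipeline of the electronic device 120.

```python
# A world-locked object keeps a fixed position in world coordinates and is
# re-projected every frame from the current device pose; a display-locked
# object keeps a fixed position in screen coordinates regardless of pose.
import numpy as np

def world_to_display(p_world, R, t, focal=500.0, center=(320.0, 240.0)):
    """Project a world-locked point into screen coordinates for the current
    device pose (rotation R, translation t), using a simple pinhole camera."""
    p_cam = R.T @ (p_world - t)          # world -> camera coordinates
    if p_cam[2] <= 0:
        return None                      # behind the camera: not visible
    u = center[0] + focal * p_cam[0] / p_cam[2]
    v = center[1] + focal * p_cam[1] / p_cam[2]
    return (u, v)

sphere_world = np.array([0.0, 0.0, 2.0])     # world-locked object, 2 m ahead
clock_screen = (40.0, 20.0)                  # display-locked virtual clock

pose_a = (np.eye(3), np.zeros(3))            # device at the origin, looking ahead
theta = np.radians(10)                       # device turned 10 degrees
pose_b = (np.array([[np.cos(theta), 0.0, np.sin(theta)],
                    [0.0, 1.0, 0.0],
                    [-np.sin(theta), 0.0, np.cos(theta)]]), np.zeros(3))

# The sphere's on-screen position changes with the pose; the clock's does not.
print(world_to_display(sphere_world, *pose_a), clock_screen)
print(world_to_display(sphere_world, *pose_b), clock_screen)
```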
During the first time period, the user selects the add affordance 491B. In various implementations, the user selects the add affordance 491B by performing a hand gesture (e.g., a pinch-and-release gesture) at the location of the add affordance 491B. In various implementations, the user selects the add affordance 491B by looking at the add affordance 491B and performing a head gesture, such as a nod, a wink, a blink, or an eye swipe (in which the gaze swipes across the add affordance 491B). In various implementations (as illustrated in
Thus, in
During the second time period, the user selects the “Brushing Teeth” entry in the list of available contexts 492A. Thus, in
During the sixth time period, the user selects the object affordance 493A. Thus, in
During the eighth time period, the user selects the add affordance 491B. Thus, in
During the ninth time period, the user selects the custom affordance 492C. Thus, in
During the tenth time period, the user selects the status affordance 493C. Thus, in
During the thirteenth time period, the user selects the add affordance 491B. Thus, in
During the fourteenth time period, the user selects the custom affordance 492C. Thus, in
During the fifteenth time period, the user selects the status affordance 493C. Thus, in
The second XR environment 420 includes a plurality of objects, including one or more physical objects (e.g., a sidewalk 421, a street 422, a tree 423, and a dog 424) of the physical environment and one or more virtual objects (e.g., the virtual clock 401, a virtual running application window 428, and a virtual mile marker 429). The virtual mile marker 429 is a world-locked virtual object. In various implementations, the location in the second XR environment 420 of certain virtual objects (such as the virtual running application window 428) changes based on the pose of the body of the user. Such virtual objects are referred to as body-locked objects. For example, as the user runs, the virtual running application window 428 maintains a location approximately one meter in front and half a meter to the left of the user (e.g., relative to the position and orientation of the user's torso). As the head of the user moves, without the body of the user moving, the virtual running application window 428 appears at a fixed location in the second XR environment 420. The second XR environment 420 further includes the water bottle 413.
During the eighteenth time period, the user is running along the sidewalk 421 and carrying the water bottle 413 in the right hand 408 of the user. The electronic device detects the water bottle 413 in an image of the outdoor physical environment on which the second XR environment 420 is based and stores, in a database, an entry including an indication that the water bottle 413 was detected in association with an indication of the time at which the water bottle 413 was detected, e.g., Tuesday at 8:34 AM. In various implementations, the entry further includes an indication of a location of the electronic device when the water bottle 413 was detected. In various implementations, the entry further includes at least a portion of the image of the outdoor physical environment in which the water bottle 413 was detected.
The third XR environment 430 includes a plurality of objects, including one or more physical objects (e.g., a mirror 431, a sink 432, and a toothbrush 433) of the physical environment and one or more virtual objects (e.g., the virtual clock 401 and a virtual timer 435). The virtual timer 435 is a body-locked virtual object.
During the nineteenth time period, the user is brushing the user's teeth with the toothbrush 433 held in the right hand 408 of the user. The electronic device detects that the user is brushing the user's teeth. In various implementations, the electronic device detects that the user is brushing the user's teeth based on captured images of the third XR environment 430 (e.g., the presence of the toothbrush 433), sound detected in the third XR environment 430 (e.g., the sound of running water or brushing), and/or motion of the electronic device within the third XR environment 430 (e.g., a back-and-forth motion caused by brushing). In response to detecting the context of teeth-brushing, the electronic device stores, in a database, an entry including an indication that teeth-brushing was detected in association with an indication of the time at which teeth-brushing was detected, e.g., Tuesday at 9:15 AM. In various implementations, the entry further includes an indication of a location of the electronic device when teeth-brushing was detected. In various implementations, the entry further includes at least a portion of the image of the physical environment of the bathroom in which teeth-brushing was detected.
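A non-limiting sketch of combining such cues follows. The per-cue scores, the weights, and the threshold are placeholders; the image, audio, and motion classifiers that would produce the scores are assumed and are not specified here.

```python
# Hedged sketch: fuse three per-cue confidence scores (each assumed in [0, 1])
# into a single decision about whether the teeth-brushing context is detected.
def detect_teeth_brushing(image_score, audio_score, motion_score,
                          weights=(0.5, 0.25, 0.25), threshold=0.6):
    """Return True when the weighted combination of cue scores exceeds a
    confidence threshold."""
    fused = (weights[0] * image_score +    # e.g. toothbrush visible in the image
             weights[1] * audio_score +    # e.g. sound of running water/brushing
             weights[2] * motion_score)    # e.g. back-and-forth device motion
    return fused >= threshold

# Example: strong visual evidence, moderate audio, weak motion evidence.
print(detect_teeth_brushing(0.9, 0.6, 0.3))   # True (fused score 0.675)
```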
The fourth XR environment 440 includes a plurality of objects, including one or more physical objects (e.g., a desk 441, a lamp 442, a television 443, and a laptop 444) of the physical environment and one or more virtual objects (e.g., the virtual clock 401). The fourth XR environment 440 further includes the water bottle 413 on the desk 441.
The electronic device detects the water bottle 413 in an image of the physical environment of the office on which the fourth XR environment 440 is based and stores, in a database, an entry including an indication that the water bottle 413 was detected in association with an indication of the time at which the water bottle 413 was detected, e.g., Tuesday at 10:47 AM. In various implementations, the entry further includes an indication of a location of the electronic device when the water bottle 413 was detected. In various implementations, the entry further includes at least a portion of the image of the physical environment of the office in which the water bottle 413 was detected.
The electronic device detects the lock 415 with a locked status in an image of the physical environment of the living room on which the first XR environment 410 is based and stores, in a database, an entry including an indication that the lock 415 with the locked status was detected in association with an indication of the time at which the lock 415 with the locked status was detected, e.g., Tuesday at 8:31 PM. In various implementations, the entry further includes at least a portion of the image of the living room physical environment in which the lock 415 with the locked status was detected.
The electronic device detects the thermostat 416 in an image of the physical environment of the living room on which the first XR environment 410 is based. Performing text recognition on the image, the electronic device determines a status of the thermostat 416, e.g., “72”. The electronic device stores, in a database, an entry including an indication that the thermostat 416 with a status of “72” was detected in association with an indication of the time at which the thermostat with the status of “72” was detected, e.g., Tuesday at 8:31 PM. In various implementations, the entry further includes at least a portion of the image of the living room physical environment in which the thermostat 416 with the status of “72” was detected.
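As a non-limiting illustration of the text-recognition step, the following sketch crops the region of the image showing the thermostat's display and reads the digits with an off-the-shelf OCR library (pytesseract is assumed here; the crop coordinates and the file name are placeholders).

```python
# Hedged sketch: determine a numeric status (e.g. "72") by running OCR on the
# portion of the captured image that shows the thermostat's display.
from PIL import Image
import pytesseract

def read_thermostat_status(image_path, box):
    """Crop the display region and return the digits recognized there,
    or None if nothing is read. box = (left, top, right, bottom)."""
    display = Image.open(image_path).crop(box)
    text = pytesseract.image_to_string(display, config="--psm 7")  # one text line
    digits = "".join(ch for ch in text if ch.isdigit())
    return digits or None

# Hypothetical usage:
# status = read_thermostat_status("living_room.jpg", (410, 220, 470, 260))
```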
During the twenty-second time period, the query indicator 481 is replaced with a response indicator 482. The response indicator 482 is a display-locked virtual object displayed by the electronic device while an audio response to the vocal query is produced by the device. For example, during the twenty-second time period, the electronic device produces the sound of a voice saying “Your water bottle is in the office.” Although
In various implementations, the entry indicating detection of the water bottle 413 further includes at least a portion of the image of the physical environment of the office in which the water bottle 413 was detected. Accordingly, during the twenty-second time period, the first XR environment 410 includes a response window 483 including information from the retrieved entry. For example, in
The fifth XR environment 450 includes a plurality of objects, including one or more physical objects (e.g., a bed 451 and a table 452) of the physical environment and one or more virtual objects (e.g., the virtual clock 401). The fifth XR environment 450 further includes the water bottle 413 on the table 452.
The electronic device detects the water bottle 413 in an image of the physical environment of the bedroom on which the fifth XR environment 450 is based and stores, in a database, an entry including an indication that the water bottle 413 was detected in association with an indication of the time at which the water bottle 413 was detected, e.g., Tuesday at 8:42 PM. In various implementations, the entry further includes an indication of a location of the electronic device when the water bottle 413 was detected. In various implementations, the entry further includes at least a portion of the image of the bedroom physical environment in which the water bottle 413 was detected.
During the twenty-third time period, the fifth XR environment 450 includes the query indicator 481. For example, during the twenty-third time period, the user has vocally asked “Is the front door locked?”
During the twenty-fourth time period, the query indicator 481 is replaced with the response indicator 482. For example, during the twenty-fourth time period, the electronic device produces the sound of a voice saying “The front door is locked.”
In various implementations, the entry indicating detection of the lock 415 with the locked status further includes at least a portion of the image of the physical environment of the living room in which the lock 415 was detected. Accordingly, during the twenty-fourth time period, the fifth XR environment 450 includes the response window 483 including information from the retrieved entry. For example, in
During the twenty-sixth time period, the query indicator 481 is replaced with the response indicator 482. For example, during the twenty-sixth time period, the electronic device produces the sound of a voice saying “The thermostat is set to 72.”
In various implementations, the entry indicating detection of the thermostat 416 with the status of “72” further includes at least a portion of the image of the physical environment of the living room in which the thermostat 416 was detected. Accordingly, during the twenty-sixth time period, the fifth XR environment 450 includes the response window 483 including information from the retrieved entry. For example, in
Whereas
As another example, in various implementations, the query is “How many times did I wash my hands today?” To generate the response, the electronic device searches the database for entries including indications of detection of hand-washing and indications of respective times within the current day. The electronic device then counts the number of such entries to generate the response. For example, in various implementations, the response is “You washed your hands five times today.”
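Continuing the illustrative SQLite sketch above (same assumed "detections" table), a count-style response may be generated as follows; the table name, the column names, and the label "hand_washing" are assumptions.

```python
# Hedged sketch: count entries for a context whose timestamps fall within the
# current day, then phrase the response.
import sqlite3
from datetime import datetime

def count_today(con, context):
    today = datetime.now().date().isoformat()          # e.g. "2022-09-16"
    (n,) = con.execute(
        "SELECT COUNT(*) FROM detections WHERE context = ? AND time LIKE ?",
        (context, today + "%"),                        # ISO timestamps begin with the date
    ).fetchone()
    return n

con = sqlite3.connect("context_log.db")
con.execute("CREATE TABLE IF NOT EXISTS detections "
            "(context TEXT, status TEXT, time TEXT, location TEXT, image BLOB)")
print(f"You washed your hands {count_today(con, 'hand_washing')} times today.")
```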
The method 500 begins, in block 510, with the device capturing, using the image sensor, an image of an environment at a particular time. For example, in
The method 500 continues, in block 520, with the device detecting a context based at least in part on the image of the environment. In various implementations, detecting the context includes detecting a physical object present in the environment (e.g., using an object detection model or neural network classifier configured to classify images of various objects as one of various object types or subtypes). For example, in
The method 500 continues, in block 530, with the device, in accordance with a determination that the context is included within a predefined set of contexts, storing, in a database, an entry including data indicating detection of the context in association with data indicating the particular time. In various implementations, the database is stored on the device, e.g., the non-transitory memory. In various implementations, the database is stored on a server remote from the device.
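A non-limiting sketch of blocks 520 and 530 follows. The classifier, the set of registered context labels, and the storage callback are stand-ins supplied by the caller; none of these names are drawn from the disclosure.

```python
# Hedged sketch: run an (assumed) context classifier on a captured frame and
# store an entry only when the detected context is in the predefined set.
from datetime import datetime

PREDEFINED_CONTEXTS = {"keys", "water_bottle", "running", "brushing_teeth"}

def process_frame(frame, classify_contexts, store_entry):
    """Detect contexts in one frame and persist only the registered ones."""
    captured_at = datetime.now().isoformat()
    for context in classify_contexts(frame):            # block 520: detection
        if context in PREDEFINED_CONTEXTS:               # block 530: set membership
            store_entry(context=context, time=captured_at)

# Hypothetical usage with stand-ins for the model and the database layer:
detections = []
process_frame(frame=None,
              classify_contexts=lambda f: {"water_bottle", "couch"},
              store_entry=lambda **entry: detections.append(entry))
print(detections)   # only the registered "water_bottle" context is stored
```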
In various implementations, the predefined set of contexts includes contexts which are registered by a user via, e.g., a graphical user interface. For example, in
In various implementations, the entry further includes data indicating a location of the device at the particular time. In various implementations, the location of the device is represented by latitude and longitude coordinates, e.g., as determined by a GPS sensor. In various implementations, the location of the device is an address or the name of a business at that address. For example, in
In various implementations, the entry further includes at least a portion of the image of the environment at the particular time. For example, in
In various implementations, the database is queryable or searchable. Thus, the method 500 continues, in block 540, with the device receiving a query regarding the context and further continues, in block 550, with the device providing a response to the query based on the data indicating the particular time.
In various implementations, the query is received from a user. In various implementations, the query received from the user is a verbal query including one or more words. In various implementations, the query received from the user is a vocal query. For example, in
In various implementations, providing the response includes providing a verbal response including one or more words. In various implementations, providing the response includes providing an audio response. For example, in
In various implementations, the response indicates a latest time the context was detected. For example, in various implementations, the query is “Did I brush my teeth this morning?” and the response indicates the last time teeth-brushing was detected. In various implementations, the query is “When did I last take my medicine?” and the response indicates the last time medicine-taking was detected. In various implementations, the query is “When was the last time I went for a run?” and the response indicates the last time running was detected.
In various implementations, the response indicates a location of the device at a latest time the context was detected. For example, in various implementations, the query is “Where is my water bottle?” and the response indicates the location of the device at the last time the water bottle 413 was detected. As another example, in various implementations, the query is “Where did I leave my keys?” and the response indicates the location of the device at the last time the keys of the user were detected. As another example, in various implementations, the query is “Where did I eat breakfast this morning?” and the response indicates a location of the device at the last time eating was detected.
In various implementations, the response indicates a number of times the context was detected within a time window. For example, in various implementations, the query is “How many times have I washed my hands today?” and the response indicates the number of times hand-washing was detected in the current day. As another example, in various implementations, the query is “How many dogs have I seen this week?” and the response indicates the number of times a dog was detected in the current week.
In various implementations, the device detects the context multiple times and stores an entry in the database for each time the context is detected. Thus, in various implementations, the method 500 includes capturing, using the image sensor, a second image of a second environment at a second particular time. In various implementations, the second environment is different than the environment. In various implementations, the second environment is the same as the environment. The method 500 includes detecting the context based at least in part on the second image of the second environment and storing, in the database, a second entry including data indicating detection of the context in association with data indicating the second particular time. In various implementations, providing a response to the query is based on the data indicating the second particular time (and the particular time). For example, as noted above, in various implementations, the response indicates a number of times the context was detected in a time window.
In various implementations, the device detects multiple different contexts and stores an entry in the database for each time any context is detected. Thus, in various implementations, the method 500 includes capturing, using the image sensor, a second image of a second environment at a second particular time. The method 500 includes detecting a second context based at least in part on the second image of the second environment, wherein the second context is different than the first context. The method 500 includes, in accordance with a determination that the second context is included within the predefined set of contexts, storing, in the database, a second entry including data indicating detection of the second context in association with data indicating the second particular time. In various implementations, the method includes receiving, from a user, a second query regarding the second context and providing a response to the second query based on data indicating the second particular time.
In various implementations, the device attempts to detect contexts in each captured image of the environment. However, in various implementations, to reduce processing power expenditure in attempting to detect contexts in each captured image, the device attempts to detect contexts periodically in a subset of the images (e.g., once a second or once a minute). In various implementations, detecting the context (in block 520) is performed in response to determining that a function of the image of the environment breaches an interest threshold. In various implementations, the function of the image of the environment includes a difference between the image of the environment and a baseline image of the environment previously captured. If the function of the image of the environment is greater than the interest threshold, the device detects contexts in the image and sets the image of the environment as the baseline image of the environment. If the function of the image of the environment is less than the interest threshold, the device forgoes detecting contexts in the image. In various implementations, the function of the image of the environment is greater if the user's hands are detected in the image, indicating user interaction and a greater likelihood that an activity context will be detected.
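One non-limiting way to implement such gating is sketched below, assuming grayscale frames as NumPy arrays; the mean absolute pixel difference and the numeric threshold are placeholders for whatever interest function and threshold a given implementation uses.

```python
# Hedged sketch: only run context detection on frames that differ enough from
# the stored baseline frame; otherwise forgo detection to save processing.
import numpy as np

class DetectionGate:
    def __init__(self, interest_threshold=10.0):
        self.baseline = None
        self.interest_threshold = interest_threshold

    def should_detect(self, frame):
        """Return True (and update the baseline) when the frame differs enough
        from the baseline to be worth running context detection on."""
        if self.baseline is None:
            self.baseline = frame
            return True
        difference = float(np.mean(np.abs(frame.astype(np.float32) -
                                          self.baseline.astype(np.float32))))
        if difference > self.interest_threshold:
            self.baseline = frame      # new baseline for future comparisons
            return True
        return False                   # forgo detection on this frame

gate = DetectionGate()
still = np.zeros((240, 320), dtype=np.uint8)
changed = np.full((240, 320), 50, dtype=np.uint8)
print(gate.should_detect(still), gate.should_detect(still), gate.should_detect(changed))
# True (first frame), False (unchanged), True (scene changed)
```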
The method 600 begins, in block 610, with the device capturing, using the image sensor, an image of an environment at a particular time. For example, in
The method 600 continues, in block 620, with the device detecting an object in the image of the environment. For example, in
The method 600 continues, in block 630, with the device, in accordance with a determination that the object is included within a predefined set of objects, determining a status of the object based at least in part on the image of the environment. For example, in
In various implementations, determining the status of the object includes applying a classifier to at least a portion of the image of the environment including the object. For example, in
In various implementations, determining the status of the object includes performing text recognition on text displayed by the object. For example, in
The method 600 continues, in block 640, with the device storing, in a database, an entry including data indicating the status of the object in association with data indicating the particular time. In various implementations, the database is stored on the device, e.g., the non-transitory memory. In various implementations, the database is stored on a server remote from the device.
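The following non-limiting sketch ties blocks 630 and 640 together for a lock: crop the detected object from the frame, classify its state, and store the resulting status with the capture time. The bounding box, the binary classifier, and the storage callback are assumed stand-ins.

```python
# Hedged sketch: determine a locked/unlocked status from the portion of the
# image containing the lock, then store the status with the current time.
from datetime import datetime
import numpy as np

def determine_and_store_lock_status(frame, bbox, classify_lock_state, store_entry):
    left, top, right, bottom = bbox
    crop = frame[top:bottom, left:right]                 # portion of the image with the lock
    status = classify_lock_state(crop)                   # block 630: "locked" or "unlocked"
    store_entry(object="front_door_lock",                # block 640: persist the status
                status=status,
                time=datetime.now().isoformat())
    return status

# Hypothetical usage with stand-ins for the frame, the classifier, and the database:
entries = []
frame = np.zeros((480, 640, 3), dtype=np.uint8)
print(determine_and_store_lock_status(frame, (100, 200, 160, 260),
                                      classify_lock_state=lambda crop: "locked",
                                      store_entry=lambda **e: entries.append(e)))
```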
In various implementations, the predefined set of objects includes objects registered by a user via, e.g., a graphical user interface. For example, in
In various implementations, the entry further includes at least a portion of the image of the environment at the particular time. For example, in
In various implementations, the database is queryable or searchable. Thus, the method 600 continues, in block 650, with the device receiving a query regarding the status of the object and further continues, in block 660, with the device providing a response to the query based on the data indicating the particular time.
In various implementations, the query is received from a user. In various implementations, the query received from the user is a verbal query including one or more words. In various implementations, the query received from the user is a vocal query. For example, in
In various implementations, providing the response includes providing a verbal response including one or more words. In various implementations, providing the response includes providing an audio response. For example, in
In various implementations, the response indicates a latest time the status of the object was determined. For example, in various implementations, the query is “What is the thermostat set to?” and the response indicates the status of the thermostat 416 at the last time the thermostat 416 was detected.
In various implementations, the device detects the object multiple times and stores an entry in the database indicating the status of the object for each time the object is detected. Thus, in various implementations, the method 600 includes capturing, using the image sensor, a second image of a second environment at a second particular time. In various implementations, the second environment is different than the environment. In various implementations, the second environment is the same as the environment. The method 600 includes detecting the object in the second image of the second environment. The method 600 includes determining a second status of the object based at least in part on the second image of the second environment and storing, in the database, a second entry including data indicating the second status of the object in association with data indicating the second particular time. In various implementations, providing a response to the query is based on the data indicating the second particular time (and the particular time). For example, in various implementations, the query is “What was the lowest setting on the thermostat today?” and the response indicates the lowest status of the thermostat 416 in multiple detections.
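Continuing the illustrative SQLite sketch (same assumed "detections" table, with the thermostat's numeric setting stored as text in the "status" column), the lowest setting recorded during the current day may be retrieved as follows.

```python
# Hedged sketch: take the minimum numeric status over all of today's entries
# for the thermostat.
import sqlite3
from datetime import datetime

def lowest_status_today(con, obj="thermostat"):
    today = datetime.now().date().isoformat()
    row = con.execute(
        "SELECT MIN(CAST(status AS INTEGER)) FROM detections "
        "WHERE context = ? AND time LIKE ?",
        (obj, today + "%"),
    ).fetchone()
    return row[0]    # None if the thermostat was not detected today

con = sqlite3.connect("context_log.db")
con.execute("CREATE TABLE IF NOT EXISTS detections "
            "(context TEXT, status TEXT, time TEXT, location TEXT, image BLOB)")
print(lowest_status_today(con))
```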
In various implementations, the device determines the statuses of multiple different objects and stores an entry in the database for each time any of the objects is detected. Thus, in various implementations, the method 600 includes capturing, using the image sensor, a second image of a second environment at a second particular time. The method 600 includes detecting a second object in the second image of the second environment, wherein the second object is different than the object. The method 600 includes, in accordance with a determination that the second object is included within the predefined set of objects, determining a status of the second object based at least in part on the second image of the second environment. The method 600 includes storing, in the database, a second entry including data indicating the status of the second object in association with data indicating the second particular time. In various implementations, the method includes receiving, from a user, a second query regarding the status of the second object and providing a response to the second query based on data indicating the second particular time.
In various implementations, the device attempts to detect contexts in each captured image of the environment. However, in various implementations, to reduce processing power expenditure in attempting to detect contexts in each captured image, the device attempts to detect contexts periodically in a subset of the images (e.g., once a second or once a minute). In various implementations, detecting the object (in block 620) is performed in response to determining that a function of the image of the environment breaches an interest threshold. In various implementations, the function of the image of the environment includes a difference between the image of the environment and a baseline image of the environment previously captured. If the function of the image of the environment is greater than the interest threshold, the device detects contexts in the image and sets the image of the environment as the baseline image of the environment. If the function of the image of the environment is less than the interest threshold, the device forgoes detecting contexts in the image. In various implementations, the function of the image of the environment is greater if the user's hands are detected in the image, indicating user interaction and a greater likelihood that an activity context will be detected.
The described technology may gather and use information from various sources. This information may, in some instances, include personal information that identifies or may be used to locate or contact a specific individual. This personal information may include demographic data, location data, telephone numbers, email addresses, date of birth, social media account names, work or home addresses, data or records associated with a user's health or fitness level, or other personal or identifying information.
The collection, storage, transfer, disclosure, analysis, or other use of personal information should comply with well-established privacy policies or practices. Privacy policies and practices that are generally recognized as meeting or exceeding industry or governmental requirements should be implemented and used. Personal information should be collected for legitimate and reasonable uses and not shared or sold outside of those uses. The collection or sharing of information should occur after receipt of the user's informed consent.
It is contemplated that, in some instances, users may selectively prevent the use of, or access to, personal information. Hardware or software features may be provided to prevent or block access to personal information. Personal information should be handled to reduce the risk of unintentional or unauthorized access or use. Risk can be reduced by limiting the collection of data and deleting the data once it is no longer needed. When applicable, data de-identification may be used to protect a user's privacy.
Although the described technology may broadly include the use of personal information, it may be implemented without accessing such personal information. In other words, the present technology may not be rendered inoperable due to the lack of some or all of such personal information.
While various aspects of implementations within the scope of the appended claims are described above, it should be apparent that the various features of implementations described above may be embodied in a wide variety of forms and that any specific structure and/or function described above is merely illustrative. Based on the present disclosure one skilled in the art should appreciate that an aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method may be practiced using any number of the aspects set forth herein. In addition, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to or other than one or more of the aspects set forth herein.
It will also be understood that, although the terms “first,” “second,” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first node could be termed a second node, and, similarly, a second node could be termed a first node, without changing the meaning of the description, so long as all occurrences of the “first node” are renamed consistently and all occurrences of the “second node” are renamed consistently. The first node and the second node are both nodes, but they are not the same node.
The terminology used herein is for the purpose of describing particular implementations only and is not intended to be limiting of the claims. As used in the description of the implementations and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in accordance with a determination” or “in response to detecting,” that a stated condition precedent is true, depending on the context. Similarly, the phrase “if it is determined [that a stated condition precedent is true]” or “if [a stated condition precedent is true]” or “when [a stated condition precedent is true]” may be construed to mean “upon determining” or “in response to determining” or “in accordance with a determination” or “upon detecting” or “in response to detecting” that the stated condition precedent is true, depending on the context.
This application claims priority to U.S. Provisional Patent App. No. 63/247,978, filed on Sep. 24, 2021, and U.S. Provisional Patent App. No. 63/400,291, filed on Aug. 23, 2022, which are both hereby incorporated by reference in their entirety.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2022/043776 | 9/16/2022 | WO |

Number | Date | Country
---|---|---
63247978 | Sep 2021 | US
63400291 | Aug 2022 | US